LinuxLists.cc - [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

[permalink] [raw]

Subject: [PATCH/RFCv4 1/6] lib: rbtree: rb_root_init() function added

Added a rb_root_init() function which initialises a rb_root
structure as a red-black tree with at most one element. The
rationale is that using rb_root_init(root, node) is more
straightforward and cleaner then first initialising and
empty tree followed by an insert operation.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
include/linux/rbtree.h | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 7066acb..5b6dc66 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,6 +130,17 @@ static inline void rb_set_color(struct rb_node *rb, int color)
}

#define RB_ROOT (struct rb_root) { NULL, }
+
+static inline void rb_root_init(struct rb_root *root, struct rb_node *node)
+{
+ root->rb_node = node;
+ if (node) {
+ node->rb_parent_color = RB_BLACK; /* black, no parent */
+ node->rb_left = NULL;
+ node->rb_right = NULL;
+ }
+}
+
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

#define RB_EMPTY_ROOT(root) ((root)->rb_node == NULL)
--
1.7.1

2010-08-20 09:52:05

[permalink] [raw]

Subject: [PATCH/RFCv4 2/6] mm: cma: Contiguous Memory Allocator added

The Contiguous Memory Allocator framework is a set of APIs for
allocating physically contiguous chunks of memory.

Various chips require contiguous blocks of memory to operate. Those
chips include devices such as cameras, hardware video decoders and
encoders, etc.

The code is highly modular and customisable to suit the needs of
various users. Set of regions reserved for CMA can be configured
per-platform and it is easy to add custom allocator algorithms if one
has such need.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Reviewed-by: Pawel Osciak <[email protected]>
---
Documentation/00-INDEX | 2 +
Documentation/contiguous-memory.txt | 541 +++++++++++++++++++++
include/linux/cma.h | 431 +++++++++++++++++
mm/Kconfig | 34 ++
mm/Makefile | 2 +
mm/cma-best-fit.c | 407 ++++++++++++++++
mm/cma.c | 910 +++++++++++++++++++++++++++++++++++
7 files changed, 2327 insertions(+), 0 deletions(-)
create mode 100644 Documentation/contiguous-memory.txt
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma-best-fit.c
create mode 100644 mm/cma.c

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 8dfc670..f93e787 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -94,6 +94,8 @@ connector/
- docs on the netlink based userspace<->kernel space communication mod.
console/
- documentation on Linux console drivers.
+contiguous-memory.txt
+ - documentation on physically-contiguous memory allocation framework.
cpu-freq/
- info on CPU frequency and voltage scaling.
cpu-hotplug.txt
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
new file mode 100644
index 0000000..8fc2400
--- /dev/null
+++ b/Documentation/contiguous-memory.txt
@@ -0,0 +1,541 @@
+ -*- org -*-
+
+* Contiguous Memory Allocator
+
+ The Contiguous Memory Allocator (CMA) is a framework, which allows
+ setting up a machine-specific configuration for physically-contiguous
+ memory management. Memory for devices is then allocated according
+ to that configuration.
+
+ The main role of the framework is not to allocate memory, but to
+ parse and manage memory configurations, as well as to act as an
+ in-between between device drivers and pluggable allocators. It is
+ thus not tied to any memory allocation method or strategy.
+
+** Why is it needed?
+
+ Various devices on embedded systems have no scatter-getter and/or
+ IO map support and as such require contiguous blocks of memory to
+ operate. They include devices such as cameras, hardware video
+ decoders and encoders, etc.
+
+ Such devices often require big memory buffers (a full HD frame is,
+ for instance, more then 2 mega pixels large, i.e. more than 6 MB
+ of memory), which makes mechanisms such as kmalloc() ineffective.
+
+ Some embedded devices impose additional requirements on the
+ buffers, e.g. they can operate only on buffers allocated in
+ particular location/memory bank (if system has more than one
+ memory bank) or buffers aligned to a particular memory boundary.
+
+ Development of embedded devices have seen a big rise recently
+ (especially in the V4L area) and many such drivers include their
+ own memory allocation code. Most of them use bootmem-based methods.
+ CMA framework is an attempt to unify contiguous memory allocation
+ mechanisms and provide a simple API for device drivers, while
+ staying as customisable and modular as possible.
+
+** Design
+
+ The main design goal for the CMA was to provide a customisable and
+ modular framework, which could be configured to suit the needs of
+ individual systems. Configuration specifies a list of memory
+ regions, which then are assigned to devices. Memory regions can
+ be shared among many device drivers or assigned exclusively to
+ one. This has been achieved in the following ways:
+
+ 1. The core of the CMA does not handle allocation of memory and
+ management of free space. Dedicated allocators are used for
+ that purpose.
+
+ This way, if the provided solution does not match demands
+ imposed on a given system, one can develop a new algorithm and
+ easily plug it into the CMA framework.
+
+ The presented solution includes an implementation of a best-fit
+ algorithm.
+
+ 2. When requesting memory, devices have to introduce themselves.
+ This way CMA knows who the memory is allocated for. This
+ allows for the system architect to specify which memory regions
+ each device should use.
+
+ 3. Memory regions are grouped in various "types". When device
+ requests a chunk of memory, it can specify what type of memory
+ it needs. If no type is specified, "common" is assumed.
+
+ This makes it possible to configure the system in such a way,
+ that a single device may get memory from different memory
+ regions, depending on the "type" of memory it requested. For
+ example, a video codec driver might want to allocate some
+ shared buffers from the first memory bank and the other from
+ the second to get the highest possible memory throughput.
+
+ 4. For greater flexibility and extensibility, the framework allows
+ device drivers to register private regions of reserved memory
+ which then may be used only by them.
+
+ As an effect, if a driver would not use the rest of the CMA
+ interface, it can still use CMA allocators and other
+ mechanisms.
+
+ 4a. Early in boot process, device drivers can also request the
+ CMA framework to a reserve a region of memory for them
+ which then will be used as a private region.
+
+ This way, drivers do not need to directly call bootmem,
+ memblock or similar early allocator but merely register an
+ early region and the framework will handle the rest
+ including choosing the right early allocator.
+
+** Use cases
+
+ Let's analyse some imaginary system that uses the CMA to see how
+ the framework can be used and configured.
+
+
+ We have a platform with a hardware video decoder and a camera each
+ needing 20 MiB of memory in the worst case. Our system is written
+ in such a way though that the two devices are never used at the
+ same time and memory for them may be shared. In such a system the
+ following configuration would be used in the platform
+ initialisation code:
+
+ static struct cma_region regions[] = {
+ { .name = "region", .size = 20 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "video,camera=region";
+
+ cma_set_defaults(regions, map);
+
+ The regions array defines a single 20-MiB region named "region".
+ The map says that drivers named "video" and "camera" are to be
+ granted memory from the previously defined region.
+
+ A shorter map can be used as well:
+
+ static const char map[] __initconst = "*=region";
+
+ The asterisk ("*") matches all devices thus all devices will use
+ the region named "region".
+
+ We can see, that because the devices share the same memory region,
+ we save 20 MiB, compared to the situation when each of the devices
+ would reserve 20 MiB of memory for itself.
+
+
+ Now, let's say that we have also many other smaller devices and we
+ want them to share some smaller pool of memory. For instance 5
+ MiB. This can be achieved in the following way:
+
+ static struct cma_region regions[] = {
+ { .name = "region", .size = 20 << 20 },
+ { .name = "common", .size = 5 << 20 },
+ { }
+ }
+ static const char map[] __initconst =
+ "video,camera=region;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This instructs CMA to reserve two regions and let video and camera
+ use region "region" whereas all other devices should use region
+ "common".
+
+
+ Later on, after some development of the system, it can now run
+ video decoder and camera at the same time. The 20 MiB region is
+ no longer enough for the two to share. A quick fix can be made to
+ grant each of those devices separate regions:
+
+ static struct cma_region regions[] = {
+ { .name = "v", .size = 20 << 20 },
+ { .name = "c", .size = 20 << 20 },
+ { .name = "common", .size = 5 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "video=v;camera=c;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This solution also shows how with CMA you can assign private pools
+ of memory to each device if that is required.
+
+
+ Allocation mechanisms can be replaced dynamically in a similar
+ manner as well. Let's say that during testing, it has been
+ discovered that, for a given shared region of 40 MiB,
+ fragmentation has become a problem. It has been observed that,
+ after some time, it becomes impossible to allocate buffers of the
+ required sizes. So to satisfy our requirements, we would have to
+ reserve a larger shared region beforehand.
+
+ But fortunately, you have also managed to develop a new allocation
+ algorithm -- Neat Allocation Algorithm or "na" for short -- which
+ satisfies the needs for both devices even on a 30 MiB region. The
+ configuration can be then quickly changed to:
+
+ static struct cma_region regions[] = {
+ { .name = "region", .size = 30 << 20, .alloc_name = "na" },
+ { .name = "common", .size = 5 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "video,camera=region;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This shows how you can develop your own allocation algorithms if
+ the ones provided with CMA do not suit your needs and easily
+ replace them, without the need to modify CMA core or even
+ recompiling the kernel.
+
+** Technical Details
+
+*** The attributes
+
+ As shown above, CMA is configured by a two attributes: list
+ regions and map. The first one specifies regions that are to be
+ reserved for CMA. The second one specifies what regions each
+ device is assigned to.
+
+**** Regions
+
+ Regions is a list of regions terminated by a region with size
+ equal zero. The following fields may be set:
+
+ - size -- size of the region (required, must not be zero)
+ - alignment -- alignment of the region; must be power of two or
+ zero (optional)
+ - start -- where the region has to start (optional)
+ - alloc_name -- the name of allocator to use (optional)
+ - alloc -- allocator to use (optional; and besides
+ alloc_name is probably is what you want)
+
+ size, alignment and start is specified in bytes. Size will be
+ aligned up to a PAGE_SIZE. If alignment is less then a PAGE_SIZE
+ it will be set to a PAGE_SIZE. start will be aligned to
+ alignment.
+
+**** Map
+
+ The format of the "map" attribute is as follows:
+
+ map-attr ::= [ rules [ ';' ] ]
+ rules ::= rule [ ';' rules ]
+ rule ::= patterns '=' regions
+
+ patterns ::= pattern [ ',' patterns ]
+
+ regions ::= REG-NAME [ ',' regions ]
+ // list of regions to try to allocate memory
+ // from
+
+ pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ // pattern request must match for the rule to
+ // apply; the first rule that matches is
+ // applied; if dev-pattern part is omitted
+ // value identical to the one used in previous
+ // pattern is assumed.
+
+ dev-pattern ::= PATTERN
+ // pattern that device name must match for the
+ // rule to apply; may contain question marks
+ // which mach any characters and end with an
+ // asterisk which match the rest of the string
+ // (including nothing).
+
+ It is a sequence of rules which specify what regions should given
+ (device, type) pair use. The first rule that matches is applied.
+
+ For rule to match, the pattern must match (dev, type) pair.
+ Pattern consist of the part before and after slash. The first
+ part must match device name and the second part must match kind.
+
+ If the first part is empty, the device name is assumed to match
+ iff it matched in previous pattern. If the second part is
+ omitted it will mach any type of memory requested by device.
+
+ Some examples (whitespace added for better readability):
+
+ cma_map = foo/quaz = r1;
+ // device foo with type == "quaz" uses region r1
+
+ foo/* = r2; // OR:
+ /* = r2;
+ // device foo with any other kind uses region r2
+
+ bar = r1,r2;
+ // device bar uses region r1 or r2
+
+ baz?/a , baz?/b = r3;
+ // devices named baz? where ? is any character
+ // with type being "a" or "b" use r3
+
+*** The device and types of memory
+
+ The name of the device is taken from the device structure. It is
+ not possible to use CMA if driver does not register a device
+ (actually this can be overcome if a fake device structure is
+ provided with at least the name set).
+
+ The type of memory is an optional argument provided by the device
+ whenever it requests memory chunk. In many cases this can be
+ ignored but sometimes it may be required for some devices.
+
+ For instance, let's say that there are two memory banks and for
+ performance reasons a device uses buffers in both of them.
+ Platform defines a memory types "a" and "b" for regions in both
+ banks. The device driver would use those two types then to
+ request memory chunks from different banks. CMA attributes could
+ look as follows:
+
+ static struct cma_region regions[] = {
+ { .name = "a", .size = 32 << 20 },
+ { .name = "b", .size = 32 << 20, .start = 512 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "foo/a=a;foo/b=b;*=a,b";
+
+ And whenever the driver allocated the memory it would specify the
+ kind of memory:
+
+ buffer1 = cma_alloc(dev, "a", 1 << 20, 0);
+ buffer2 = cma_alloc(dev, "b", 1 << 20, 0);
+
+ If it was needed to try to allocate from the other bank as well if
+ the dedicated one is full, the map attributes could be changed to:
+
+ static const char map[] __initconst = "foo/a=a,b;foo/b=b,a;*=a,b";
+
+ On the other hand, if the same driver was used on a system with
+ only one bank, the configuration could be changed just to:
+
+ static struct cma_region regions[] = {
+ { .name = "r", .size = 64 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "*=r";
+
+ without the need to change the driver at all.
+
+*** Device API
+
+ There are three basic calls provided by the CMA framework to
+ devices. To allocate a chunk of memory cma_alloc() function needs
+ to be used:
+
+ dma_addr_t cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment);
+
+ If required, device may specify alignment in bytes that the chunk
+ need to satisfy. It have to be a power of two or zero. The
+ chunks are always aligned at least to a page.
+
+ The type specifies the type of memory as described to in the
+ previous subsection. If device driver does not care about memory
+ type it can safely pass NULL as the type which is the same as
+ possing "common".
+
+ The basic usage of the function is just a:
+
+ addr = cma_alloc(dev, NULL, size, 0);
+
+ The function returns physical address of allocated chunk or
+ a value that evaluates to true if checked with IS_ERR_VALUE(), so
+ the correct way for checking for errors is:
+
+ unsigned long addr = cma_alloc(dev, size);
+ if (IS_ERR_VALUE(addr))
+ /* Error */
+ return (int)addr;
+ /* Allocated */
+
+ (Make sure to include <linux/err.h> which contains the definition
+ of the IS_ERR_VALUE() macro.)
+
+
+ Allocated chunk is freed via a cma_free() function:
+
+ int cma_free(dma_addr_t addr);
+
+ It takes physical address of the chunk as an argument frees it.
+
+
+ The last function is the cma_info() which returns information
+ about regions assigned to given (dev, type) pair. Its syntax is:
+
+ int cma_info(struct cma_info *info,
+ const struct device *dev,
+ const char *type);
+
+ On successful exit it fills the info structure with lower and
+ upper bound of regions, total size and number of regions assigned
+ to given (dev, type) pair.
+
+**** Dynamic and private regions
+
+ In the basic setup, regions are provided and initialised by
+ platform initialisation code (which usually use
+ cma_set_defaults() for that purpose).
+
+ It is, however, possible to create and add regions dynamically
+ using cma_region_register() function.
+
+ int cma_region_register(struct cma_region *reg);
+
+ The region does not have to have name. If it does not, it won't
+ be accessed via standard mapping (the one provided with map
+ attribute). Such regions are private and to allocate chunk from
+ them, one needs to call:
+
+ dma_addr_t cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+
+ It is just like cma_alloc() expect one specifies what region to
+ allocate memory from. The region must have been registered.
+
+**** Allocating from region specified by name
+
+ If a driver preferred allocating from a region or list of regions
+ it knows name of it can use a different call simmilar to the
+ previous:
+
+ dma_addr_t cma_alloc_from(const char *regions,
+ size_t size, dma_addr_t alignment);
+
+ The first argument is a comma-separated list of regions the
+ driver desires CMA to try and allocate from. The list is
+ terminated by a NUL byte or a semicolon.
+
+ Similarly, there is a call for requesting information about named
+ regions:
+
+ int cma_info_about(struct cma_info *info, const char *regions);
+
+ Generally, it should not be needed to use those interfaces but
+ they are provided nevertheless.
+
+**** Registering early regions
+
+ An early region is a region that is managed by CMA early during
+ boot process. It's platforms responsibility to reserve memory
+ for early regions. Later on, when CMA initialises, early regions
+ with reserved memory are registered as normal regions.
+ Registering an early region may be a way for a device to request
+ a private pool of memory without worrying about actually
+ reserving the memory:
+
+ int cma_early_region_register(struct cma_region *reg);
+
+ This needs to be done quite early on in boot process, before
+ platform traverses the cma_early_regions list to reserve memory.
+
+ When boot process ends, device driver may see whether the region
+ was reserved (by checking reg->reserved flag) and if so, whether
+ it was successfully registered as a normal region (by checking
+ the reg->registered flag). If that is the case, device driver
+ can use normal API calls to use the region.
+
+*** Allocator operations
+
+ Creating an allocator for CMA needs four functions to be
+ implemented.
+
+
+ The first two are used to initialise an allocator far given driver
+ and clean up afterwards:
+
+ int cma_foo_init(struct cma_region *reg);
+ void cma_foo_cleanup(struct cma_region *reg);
+
+ The first is called when allocater is attached to region. The
+ cma_region structure has saved starting address of the region as
+ well as its size. Any data that allocate associated with the
+ region can be saved in private_data field.
+
+ The second call cleans up and frees all resources the allocator
+ has allocated for the region. The function can assume that all
+ chunks allocated form this region have been freed thus the whole
+ region is free.
+
+
+ The two other calls are used for allocating and freeing chunks.
+ They are:
+
+ struct cma_chunk *cma_foo_alloc(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+ void cma_foo_free(struct cma_chunk *chunk);
+
+ As names imply the first allocates a chunk and the other frees
+ a chunk of memory. It also manages a cma_chunk object
+ representing the chunk in physical memory.
+
+ Either of those function can assume that they are the only thread
+ accessing the region. Therefore, allocator does not need to worry
+ about concurrency. Moreover, all arguments are guaranteed to be
+ valid (i.e. page aligned size, a power of two alignment no lower
+ the a page size).
+
+
+ When allocator is ready, all that is left is to register it by
+ calling cma_allocator_register() function:
+
+ int cma_allocator_register(struct cma_allocator *alloc);
+
+ The argument is an structure with pointers to the above functions
+ and allocator's name. The whole call may look something like
+ this:
+
+ static struct cma_allocator alloc = {
+ .name = "foo",
+ .init = cma_foo_init,
+ .cleanup = cma_foo_cleanup,
+ .alloc = cma_foo_alloc,
+ .free = cma_foo_free,
+ };
+ return cma_allocator_register(&alloc);
+
+ The name ("foo") will be available to use with command line
+ argument.
+
+*** Integration with platform
+
+ There is one function that needs to be called form platform
+ initialisation code. That is the cma_early_regions_reserve()
+ function:
+
+ void cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+ It traverses list of all of the regions given on command line and
+ reserves memory for them. The only argument is a callback
+ function used to reserve the region. Passing NULL as the argument
+ makes the function use cma_early_region_reserve() function which
+ uses bootmem and memblock for allocating.
+
+ Alternatively, platform code could traverse the cma_early_regions
+ list by itself but this should not be necessary.
+
+
+ Platform has also a way of providing default attributes for CMA,
+ cma_set_defaults() function is used for that purpose:
+
+ int cma_set_defaults(struct cma_region *regions, const char *map)
+
+ It needs to be called prior to reserving regions. It let one
+ specify the list of regions defined by platform and the map
+ attribute. The map may point to a string in __initdata. See
+ above in this document for example usage of this function.
+
+** Future work
+
+ In the future, implementation of mechanisms that would allow the
+ free space inside the regions to be used as page cache, filesystem
+ buffers or swap devices is planned. With such mechanisms, the
+ memory would not be wasted when not used.
+
+ Because all allocations and freeing of chunks pass the CMA
+ framework it can follow what parts of the reserved memory are
+ freed and what parts are allocated. Tracking the unused memory
+ would let CMA use it for other purposes such as page cache, I/O
+ buffers, swap, etc.
diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 0000000..cd63f52
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,431 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+/***************************** Kernel lever API *****************************/
+
+#ifdef __KERNEL__
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+
+
+struct device;
+struct cma_info;
+
+/*
+ * Don't call it directly, use cma_alloc(), cma_alloc_from() or
+ * cma_alloc_from_region().
+ */
+dma_addr_t __must_check
+__cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment);
+
+/* Don't call it directly, use cma_info() or cma_info_about(). */
+int
+__cma_info(struct cma_info *info, const struct device *dev, const char *type);
+
+
+/**
+ * cma_alloc - allocates contiguous chunk of memory.
+ * @dev: The device to perform allocation for.
+ * @type: A type of memory to allocate. Platform may define
+ * several different types of memory and device drivers
+ * can then request chunks of different types. Usually it's
+ * safe to pass NULL here which is the same as passing
+ * "common".
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise physical address of the chunk is returned.
+ */
+static inline dma_addr_t __must_check
+cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment)
+{
+ return dev ? __cma_alloc(dev, type, size, alignment) : -EINVAL;
+}
+
+
+/**
+ * struct cma_info - information about regions returned by cma_info().
+ * @lower_bound: The smallest address that is possible to be
+ * allocated for given (dev, type) pair.
+ * @upper_bound: The one byte after the biggest address that is
+ * possible to be allocated for given (dev, type)
+ * pair.
+ * @total_size: Total size of regions mapped to (dev, type) pair.
+ * @free_size: Total free size in all of the regions mapped to (dev, type)
+ * pair. Because of possible race conditions, it is not
+ * guaranteed that the value will be correct -- it gives only
+ * an approximation.
+ * @count: Number of regions mapped to (dev, type) pair.
+ */
+struct cma_info {
+ dma_addr_t lower_bound, upper_bound;
+ size_t total_size, free_size;
+ unsigned count;
+};
+
+/**
+ * cma_info - queries information about regions.
+ * @info: Pointer to a structure where to save the information.
+ * @dev: The device to query information for.
+ * @type: A type of memory to query information for.
+ * If unsure, pass NULL here which is equal to passing
+ * "common".
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info(struct cma_info *info, const struct device *dev, const char *type)
+{
+ return dev ? __cma_info(info, dev, type) : -EINVAL;
+}
+
+
+/**
+ * cma_free - frees a chunk of memory.
+ * @addr: Beginning of the chunk.
+ *
+ * Returns -ENOENT if there is no chunk at given location; otherwise
+ * zero. In the former case issues a warning.
+ */
+int cma_free(dma_addr_t addr);
+
+
+
+/****************************** Lower lever API *****************************/
+
+/**
+ * cma_alloc_from - allocates contiguous chunk of memory from named regions.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise physical address of the chunk is returned.
+ */
+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size, dma_addr_t alignment)
+{
+ return __cma_alloc(NULL, regions, size, alignment);
+}
+
+/**
+ * cma_info_about - queries information about named regions.
+ * @info: Pointer to a structure where to save the information.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info_about(struct cma_info *info, const const char *regions)
+{
+ return __cma_info(info, NULL, regions);
+}
+
+
+
+struct cma_allocator;
+
+/**
+ * struct cma_region - a region reserved for CMA allocations.
+ * @name: Unique name of the region. Read only.
+ * @start: Physical starting address of the region in bytes. Always
+ * aligned at least to a full page. Read only.
+ * @size: Size of the region in bytes. Multiply of a page size.
+ * Read only.
+ * @free_space: Free space in the region. Read only.
+ * @alignment: Desired alignment of the region in bytes. A power of two,
+ * always at least page size. Early.
+ * @alloc: Allocator used with this region. NULL means allocator is
+ * not attached. Private.
+ * @alloc_name: Allocator name read from cmdline. Private. This may be
+ * different from @alloc->name.
+ * @private_data: Allocator's private data.
+ * @users: Number of chunks allocated in this region.
+ * @list: Entry in list of regions. Private.
+ * @used: Whether region was already used, ie. there was at least
+ * one allocation request for. Private.
+ * @registered: Whether this region has been registered. Read only.
+ * @reserved: Whether this region has been reserved. Early. Read only.
+ * @copy_name: Whether @name and @alloc_name needs to be copied when
+ * this region is converted from early to normal. Early.
+ * Private.
+ * @free_alloc_name: Whether @alloc_name was kmalloced(). Private.
+ *
+ * Regions come in two types: an early region and normal region. The
+ * former can be reserved or not-reserved. Fields marked as "early"
+ * are only meaningful in early regions.
+ *
+ * Early regions are important only during initialisation. The list
+ * of early regions is built from the "cma" command line argument or
+ * platform defaults. Platform initialisation code is responsible for
+ * reserving space for unreserved regions that are placed on
+ * cma_early_regions list.
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+struct cma_region {
+ const char *name;
+ dma_addr_t start;
+ size_t size;
+ union {
+ size_t free_space; /* Normal region */
+ dma_addr_t alignment; /* Early region */
+ };
+
+ struct cma_allocator *alloc;
+ const char *alloc_name;
+ void *private_data;
+
+ unsigned users;
+ struct list_head list;
+
+ unsigned used:1;
+ unsigned registered:1;
+ unsigned reserved:1;
+ unsigned copy_name:1;
+ unsigned free_alloc_name:1;
+};
+
+
+/**
+ * cma_region_register() - registers a region.
+ * @reg: Region to region.
+ *
+ * Region's start and size must be set.
+ *
+ * If name is set the region will be accessible using normal mechanism
+ * like mapping or cma_alloc_from() function otherwise it will be
+ * a private region and accessible only using the
+ * cma_alloc_from_region() function.
+ *
+ * If alloc is set function will try to initialise given allocator
+ * (and will return error if it failes). Otherwise alloc_name may
+ * point to a name of an allocator to use (if not set, the default
+ * will be used).
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or negative error. In particular, -EADDRINUSE if
+ * region overlap with already existing region.
+ */
+int __must_check cma_region_register(struct cma_region *reg);
+
+/**
+ * cma_region_unregister() - unregisters a region.
+ * @reg: Region to unregister.
+ *
+ * Region is unregistered only if there are no chunks allocated for
+ * it. Otherwise, function returns -EBUSY.
+ *
+ * On success returs zero.
+ */
+int __must_check cma_region_unregister(struct cma_region *reg);
+
+
+/**
+ * cma_alloc_from_region() - allocates contiguous chunk of memory from region.
+ * @reg: Region to allocate chunk from.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise physical address of the chunk is returned.
+ */
+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+
+
+
+/****************************** Allocators API ******************************/
+
+/**
+ * struct cma_chunk - an allocated contiguous chunk of memory.
+ * @start: Physical address in bytes.
+ * @size: Size in bytes.
+ * @free_space: Free space in region in bytes. Read only.
+ * @reg: Region this chunk belongs to.
+ * @by_start: A node in an red-black tree with all chunks sorted by
+ * start address.
+ *
+ * The cma_allocator::alloc() operation need to set only the @start
+ * and @size fields. The rest is handled by the caller (ie. CMA
+ * glue).
+ */
+struct cma_chunk {
+ dma_addr_t start;
+ size_t size;
+
+ struct cma_region *reg;
+ struct rb_node by_start;
+};
+
+
+/**
+ * struct cma_allocator - a CMA allocator.
+ * @name: Allocator's unique name
+ * @init: Initialises an allocator on given region.
+ * @cleanup: Cleans up after init. May assume that there are no chunks
+ * allocated in given region.
+ * @alloc: Allocates a chunk of memory of given size in bytes and
+ * with given alignment. Alignment is a power of
+ * two (thus non-zero) and callback does not need to check it.
+ * May also assume that it is the only call that uses given
+ * region (ie. access to the region is synchronised with
+ * a mutex). This has to allocate the chunk object (it may be
+ * contained in a bigger structure with allocator-specific data.
+ * Required.
+ * @free: Frees allocated chunk. May also assume that it is the only
+ * call that uses given region. This has to free() the chunk
+ * object as well. Required.
+ * @list: Entry in list of allocators. Private.
+ */
+ /* * @users: How many regions use this allocator. Private. */
+struct cma_allocator {
+ const char *name;
+
+ int (*init)(struct cma_region *reg);
+ void (*cleanup)(struct cma_region *reg);
+ struct cma_chunk *(*alloc)(struct cma_region *reg, size_t size,
+ dma_addr_t alignment);
+ void (*free)(struct cma_chunk *chunk);
+
+ /* unsigned users; */
+ struct list_head list;
+};
+
+
+/**
+ * cma_allocator_register() - Registers an allocator.
+ * @alloc: Allocator to register.
+ *
+ * Adds allocator to the list of allocators managed by CMA.
+ *
+ * All of the fields of cma_allocator structure must be set except for
+ * optional name and users and list which will be overriden.
+ *
+ * Returns zero or negative error code.
+ */
+int cma_allocator_register(struct cma_allocator *alloc);
+
+
+/**************************** Initialisation API ****************************/
+
+/**
+ * cma_set_defaults() - specifies default command line parameters.
+ * @regions: A zero-sized entry terminated list of early regions.
+ * This array must not be placed in __initdata section.
+ * @map: Map attribute.
+ *
+ * This function should be called prior to cma_early_regions_reserve()
+ * and after early parameters have been parsed.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_set_defaults(struct cma_region *regions, const char *map);
+
+
+/**
+ * cma_early_regions - a list of early regions.
+ *
+ * Platform needs to allocate space for each of the region before
+ * initcalls are executed. If space is reserved, the reserved flag
+ * must be set. Platform initialisation code may choose to use
+ * cma_early_regions_allocate().
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+extern struct list_head cma_early_regions __initdata;
+
+
+/**
+ * cma_early_region_register() - registers an early region.
+ * @reg: Region to add.
+ *
+ * Region's start, size and alignment must be set.
+ *
+ * If name is set the region will be accessible using normal mechanism
+ * like mapping or cma_alloc_from() function otherwise it will be
+ * a private region accessible only using the cma_alloc_from_region().
+ *
+ * If alloc is set function will try to initialise given allocator
+ * when the early region is "converted" to normal region and
+ * registered during CMA initialisation. If this failes, the space
+ * will still be reserved but the region won't be registered.
+ *
+ * As usually, alloc_name may point to a name of an allocator to use
+ * (if both alloc and alloc_name aret set, the default will be used).
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or negative error. No checking if regions overlap is
+ * performed.
+ */
+int __init __must_check cma_early_region_register(struct cma_region *reg);
+
+
+/**
+ * cma_early_region_reserve() - reserves a physically contiguous memory region.
+ * @reg: Early region to reserve memory for.
+ *
+ * If platform supports bootmem this is the first allocator this
+ * function tries to use. If that failes (or bootmem is not
+ * supported) function tries to use memblec if it is available.
+ *
+ * On success sets reg->reserved flag.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_early_region_reserve(struct cma_region *reg);
+
+/**
+ * cma_early_regions_reserver() - helper function for reserving early regions.
+ * @reserve: Callbac function used to reserve space for region. Needs
+ * to return non-negative if allocation succeeded, negative
+ * error otherwise. NULL means cma_early_region_alloc() will
+ * be used.
+ *
+ * This function traverses the %cma_early_regions list and tries to
+ * reserve memory for each early region. It uses the @reserve
+ * callback function for that purpose. The reserved flag of each
+ * region is updated accordingly.
+ */
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+#else
+
+#define cma_defaults(regions, map) ((int)0)
+#define cma_early_regions_reserve(reserve) do { } while (0)
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index f4e516e..3e9317c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -301,3 +301,37 @@ config NOMMU_INITIAL_TRIM_EXCESS
of 1 says that all excess pages should be trimmed.

See Documentation/nommu-mmap.txt for more information.
+
+
+config CMA
+ bool "Contiguous Memory Allocator framework"
+ # Currently there is only one allocator so force it on
+ select CMA_BEST_FIT
+ help
+ This enables the Contiguous Memory Allocator framework which
+ allows drivers to allocate big physically-contiguous blocks of
+ memory for use with hardware components that do not support I/O
+ map nor scatter-gather.
+
+ If you select this option you will also have to select at least
+ one allocator algorithm below.
+
+ To make use of CMA you need to specify the regions and
+ driver->region mapping on command line when booting the kernel.
+
+config CMA_DEBUG
+ bool "CMA debug messages (DEVELOPEMENT)"
+ depends on CMA
+ help
+ Enable debug messages in CMA code.
+
+config CMA_BEST_FIT
+ bool "CMA best-fit allocator"
+ depends on CMA
+ default y
+ help
+ This is a best-fit algorithm running in O(n log n) time where
+ n is the number of existing holes (which is never greater then
+ the number of allocated regions and usually much smaller). It
+ allocates area from the smallest hole that is big enough for
+ allocation in question.
diff --git a/mm/Makefile b/mm/Makefile
index 34b2546..d8c717f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -47,3 +47,5 @@ obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
+obj-$(CONFIG_CMA) += cma.o
+obj-$(CONFIG_CMA_BEST_FIT) += cma-best-fit.o
diff --git a/mm/cma-best-fit.c b/mm/cma-best-fit.c
new file mode 100644
index 0000000..97f8d61
--- /dev/null
+++ b/mm/cma-best-fit.c
@@ -0,0 +1,407 @@
+/*
+ * Contiguous Memory Allocator framework: Best Fit allocator
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: bf: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/slab.h> /* kmalloc() */
+
+#include <linux/cma.h> /* CMA structures */
+
+
+/************************* Data Types *************************/
+
+struct cma_bf_item {
+ struct cma_chunk ch;
+ struct rb_node by_size;
+};
+
+struct cma_bf_private {
+ struct rb_root by_start_root;
+ struct rb_root by_size_root;
+};
+
+
+/************************* Prototypes *************************/
+
+/*
+ * Those are only for holes. They must be called whenever hole's
+ * properties change but also whenever chunk becomes a hole or hole
+ * becames a chunk.
+ */
+static void __cma_bf_hole_insert_by_size(struct cma_bf_item *item);
+static void __cma_bf_hole_erase_by_size(struct cma_bf_item *item);
+static int __must_check
+__cma_bf_hole_insert_by_start(struct cma_bf_item *item);
+static void __cma_bf_hole_erase_by_start(struct cma_bf_item *item);
+
+/**
+ * __cma_bf_hole_take - takes a chunk of memory out of a hole.
+ * @hole: hole to take chunk from
+ * @size: chunk's size
+ * @alignment: chunk's starting address alignment (must be power of two)
+ *
+ * Takes a @size bytes large chunk from hole @hole which must be able
+ * to hold the chunk. The "must be able" includes also alignment
+ * constraint.
+ *
+ * Returns allocated item or NULL on error (if kmalloc() failed).
+ */
+static struct cma_bf_item *__must_check
+__cma_bf_hole_take(struct cma_bf_item *hole, size_t size, dma_addr_t alignment);
+
+/**
+ * __cma_bf_hole_merge_maybe - tries to merge hole with neighbours.
+ * @item: hole to try and merge
+ *
+ * Which items are preserved is undefined so you may not rely on it.
+ */
+static void __cma_bf_hole_merge_maybe(struct cma_bf_item *item);
+
+
+/************************* Device API *************************/
+
+int cma_bf_init(struct cma_region *reg)
+{
+ struct cma_bf_private *prv;
+ struct cma_bf_item *item;
+
+ prv = kzalloc(sizeof *prv, GFP_KERNEL);
+ if (unlikely(!prv))
+ return -ENOMEM;
+
+ item = kzalloc(sizeof *item, GFP_KERNEL);
+ if (unlikely(!item)) {
+ kfree(prv);
+ return -ENOMEM;
+ }
+
+ item->ch.start = reg->start;
+ item->ch.size = reg->size;
+ item->ch.reg = reg;
+
+ rb_root_init(&prv->by_start_root, &item->ch.by_start);
+ rb_root_init(&prv->by_size_root, &item->by_size);
+
+ reg->private_data = prv;
+ return 0;
+}
+
+void cma_bf_cleanup(struct cma_region *reg)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ struct cma_bf_item *item =
+ rb_entry(prv->by_size_root.rb_node,
+ struct cma_bf_item, by_size);
+
+ /* We can assume there is only a single hole in the tree. */
+ WARN_ON(item->by_size.rb_left || item->by_size.rb_right ||
+ item->ch.by_start.rb_left || item->ch.by_start.rb_right);
+
+ kfree(item);
+ kfree(prv);
+}
+
+struct cma_chunk *cma_bf_alloc(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ struct rb_node *node = prv->by_size_root.rb_node;
+ struct cma_bf_item *item = NULL;
+
+ /* First find hole that is large enough */
+ while (node) {
+ struct cma_bf_item *i =
+ rb_entry(node, struct cma_bf_item, by_size);
+
+ if (i->ch.size < size) {
+ node = node->rb_right;
+ } else if (i->ch.size >= size) {
+ node = node->rb_left;
+ item = i;
+ }
+ }
+ if (!item)
+ return NULL;
+
+ /* Now look for items which can satisfy alignment requirements */
+ for (;;) {
+ dma_addr_t start = ALIGN(item->ch.start, alignment);
+ dma_addr_t end = item->ch.start + item->ch.size;
+ if (start < end && end - start >= size) {
+ item = __cma_bf_hole_take(item, size, alignment);
+ return likely(item) ? &item->ch : NULL;
+ }
+
+ node = rb_next(node);
+ if (!node)
+ return NULL;
+
+ item = rb_entry(node, struct cma_bf_item, by_size);
+ }
+}
+
+void cma_bf_free(struct cma_chunk *chunk)
+{
+ struct cma_bf_item *item = container_of(chunk, struct cma_bf_item, ch);
+
+ /* Add new hole */
+ if (unlikely(__cma_bf_hole_insert_by_start(item))) {
+ /*
+ * We're screwed... Just free the item and forget
+ * about it. Things are broken beyond repair so no
+ * sense in trying to recover.
+ */
+ kfree(item);
+ } else {
+ __cma_bf_hole_insert_by_size(item);
+
+ /* Merge with prev and next sibling */
+ __cma_bf_hole_merge_maybe(item);
+ }
+}
+
+
+/************************* Basic Tree Manipulation *************************/
+
+static void __cma_bf_hole_insert_by_size(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ struct rb_node **link = &prv->by_size_root.rb_node, *parent = NULL;
+ const typeof(item->ch.size) value = item->ch.size;
+
+ while (*link) {
+ struct cma_bf_item *i;
+ parent = *link;
+ i = rb_entry(parent, struct cma_bf_item, by_size);
+ link = value <= i->ch.size
+ ? &parent->rb_left
+ : &parent->rb_right;
+ }
+
+ rb_link_node(&item->by_size, parent, link);
+ rb_insert_color(&item->by_size, &prv->by_size_root);
+}
+
+static void __cma_bf_hole_erase_by_size(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ rb_erase(&item->by_size, &prv->by_size_root);
+}
+
+static int __must_check
+__cma_bf_hole_insert_by_start(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ struct rb_node **link = &prv->by_start_root.rb_node, *parent = NULL;
+ const typeof(item->ch.start) value = item->ch.start;
+
+ while (*link) {
+ struct cma_bf_item *i;
+ parent = *link;
+ i = rb_entry(parent, struct cma_bf_item, ch.by_start);
+
+ if (WARN_ON(value == i->ch.start))
+ /*
+ * This should *never* happen. And I mean
+ * *never*. We could even BUG on it but
+ * hopefully things are only a bit broken,
+ * ie. system can still run. We produce
+ * a warning and return an error.
+ */
+ return -EBUSY;
+
+ link = value <= i->ch.start
+ ? &parent->rb_left
+ : &parent->rb_right;
+ }
+
+ rb_link_node(&item->ch.by_start, parent, link);
+ rb_insert_color(&item->ch.by_start, &prv->by_start_root);
+ return 0;
+}
+
+static void __cma_bf_hole_erase_by_start(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ rb_erase(&item->ch.by_start, &prv->by_start_root);
+}
+
+
+/************************* More Tree Manipulation *************************/
+
+static struct cma_bf_item *__must_check
+__cma_bf_hole_take(struct cma_bf_item *hole, size_t size, size_t alignment)
+{
+ struct cma_bf_item *item;
+
+ /*
+ * There are three cases:
+ * 1. the chunk takes the whole hole,
+ * 2. the chunk is at the beginning or at the end of the hole, or
+ * 3. the chunk is in the middle of the hole.
+ */
+
+
+ /* Case 1, the whole hole */
+ if (size == hole->ch.size) {
+ __cma_bf_hole_erase_by_size(hole);
+ __cma_bf_hole_erase_by_start(hole);
+ return hole;
+ }
+
+
+ /* Allocate */
+ item = kmalloc(sizeof *item, GFP_KERNEL);
+ if (unlikely(!item))
+ return NULL;
+
+ item->ch.start = ALIGN(hole->ch.start, alignment);
+ item->ch.size = size;
+
+ /* Case 3, in the middle */
+ if (item->ch.start != hole->ch.start
+ && item->ch.start + item->ch.size !=
+ hole->ch.start + hole->ch.size) {
+ struct cma_bf_item *tail;
+
+ /*
+ * Space between the end of the chunk and the end of
+ * the region, ie. space left after the end of the
+ * chunk. If this is dividable by alignment we can
+ * move the chunk to the end of the hole.
+ */
+ size_t left =
+ hole->ch.start + hole->ch.size -
+ (item->ch.start + item->ch.size);
+ if (left % alignment == 0) {
+ item->ch.start += left;
+ goto case_2;
+ }
+
+ /*
+ * We are going to add a hole at the end. This way,
+ * we will reduce the problem to case 2 -- the chunk
+ * will be at the end of the hole.
+ */
+ tail = kmalloc(sizeof *tail, GFP_KERNEL);
+ if (unlikely(!tail)) {
+ kfree(item);
+ return NULL;
+ }
+
+ tail->ch.start = item->ch.start + item->ch.size;
+ tail->ch.size =
+ hole->ch.start + hole->ch.size - tail->ch.start;
+ tail->ch.reg = hole->ch.reg;
+
+ if (unlikely(__cma_bf_hole_insert_by_start(tail))) {
+ /*
+ * Things are broken beyond repair... Abort
+ * inserting the hole but still continue with
+ * allocation (seems like the best we can do).
+ */
+
+ hole->ch.size = tail->ch.start - hole->ch.start;
+ kfree(tail);
+ } else {
+ __cma_bf_hole_insert_by_size(tail);
+ /*
+ * It's important that we first insert the new
+ * hole in the tree sorted by size and later
+ * reduce the size of the old hole. We will
+ * update the position of the old hole in the
+ * rb tree in code that handles case 2.
+ */
+ hole->ch.size = tail->ch.start - hole->ch.start;
+ }
+
+ /* Go to case 2 */
+ }
+
+
+ /* Case 2, at the beginning or at the end */
+case_2:
+ /* No need to update the tree; order preserved. */
+ if (item->ch.start == hole->ch.start)
+ hole->ch.start += item->ch.size;
+
+ /* Alter hole's size */
+ hole->ch.size -= size;
+ __cma_bf_hole_erase_by_size(hole);
+ __cma_bf_hole_insert_by_size(hole);
+
+ return item;
+}
+
+
+static void __cma_bf_hole_merge_maybe(struct cma_bf_item *item)
+{
+ struct cma_bf_item *prev;
+ struct rb_node *node;
+ int twice = 2;
+
+ node = rb_prev(&item->ch.by_start);
+ if (unlikely(!node))
+ goto next;
+ prev = rb_entry(node, struct cma_bf_item, ch.by_start);
+
+ for (;;) {
+ if (prev->ch.start + prev->ch.size == item->ch.start) {
+ /* Remove previous hole from trees */
+ __cma_bf_hole_erase_by_size(prev);
+ __cma_bf_hole_erase_by_start(prev);
+
+ /* Alter this hole */
+ item->ch.size += prev->ch.size;
+ item->ch.start = prev->ch.start;
+ __cma_bf_hole_erase_by_size(item);
+ __cma_bf_hole_insert_by_size(item);
+ /*
+ * No need to update by start trees as we do
+ * not break sequence order
+ */
+
+ /* Free prev hole */
+ kfree(prev);
+ }
+
+next:
+ if (!--twice)
+ break;
+
+ node = rb_next(&item->ch.by_start);
+ if (unlikely(!node))
+ break;
+ prev = item;
+ item = rb_entry(node, struct cma_bf_item, ch.by_start);
+ }
+}
+
+
+
+/************************* Register *************************/
+static int cma_bf_module_init(void)
+{
+ static struct cma_allocator alloc = {
+ .name = "bf",
+ .init = cma_bf_init,
+ .cleanup = cma_bf_cleanup,
+ .alloc = cma_bf_alloc,
+ .free = cma_bf_free,
+ };
+ return cma_allocator_register(&alloc);
+}
+module_init(cma_bf_module_init);
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..401399c
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,910 @@
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#ifndef CONFIG_NO_BOOTMEM
+# include <linux/bootmem.h> /* alloc_bootmem_pages_nopanic() */
+#endif
+#ifdef CONFIG_HAVE_MEMBLOCK
+# include <linux/memblock.h> /* memblock*() */
+#endif
+#include <linux/device.h> /* struct device, dev_name() */
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR, PTR_ERR, etc. */
+#include <linux/mm.h> /* PAGE_ALIGN() */
+#include <linux/module.h> /* EXPORT_SYMBOL_GPL() */
+#include <linux/mutex.h> /* mutex */
+#include <linux/slab.h> /* kmalloc() */
+#include <linux/string.h> /* str*() */
+
+#include <linux/cma.h>
+
+
+/*
+ * Protects cma_regions, cma_allocators, cma_map, cma_map_length, and
+ * cma_chunks_by_start.
+ */
+static DEFINE_MUTEX(cma_mutex);
+
+
+
+/************************* Map attribute *************************/
+
+static const char *cma_map;
+static size_t cma_map_length;
+
+/*
+ * map-attr ::= [ rules [ ';' ] ]
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ *
+ * See Documentation/contiguous-memory.txt for details.
+ */
+static ssize_t cma_map_validate(const char *param)
+{
+ const char *ch = param;
+
+ if (*ch == '\0' || *ch == '\n')
+ return 0;
+
+ for (;;) {
+ const char *start = ch;
+
+ while (*ch && *ch != '\n' && *ch != ';' && *ch != '=')
+ ++ch;
+
+ if (*ch != '=' || start == ch) {
+ pr_err("map: expecting \"<patterns>=<regions>\" near %s\n",
+ start);
+ return -EINVAL;
+ }
+
+ while (*++ch != ';')
+ if (*ch == '\0' || *ch == '\n')
+ return ch - param;
+ if (ch[1] == '\0' || ch[1] == '\n')
+ return ch - param;
+ ++ch;
+ }
+}
+
+static int __init cma_map_param(char *param)
+{
+ ssize_t len;
+
+ pr_debug("param: map: %s\n", param);
+
+ len = cma_map_validate(param);
+ if (len < 0)
+ return len;
+
+ cma_map = param;
+ cma_map_length = len;
+ return 0;
+}
+
+
+
+/************************* Early regions *************************/
+
+struct list_head cma_early_regions __initdata =
+ LIST_HEAD_INIT(cma_early_regions);
+
+
+int __init __must_check cma_early_region_register(struct cma_region *reg)
+{
+ dma_addr_t start, alignment;
+ size_t size;
+
+ if (reg->alignment & (reg->alignment - 1))
+ return -EINVAL;
+
+ alignment = max(reg->alignment, (dma_addr_t)PAGE_SIZE);
+ start = ALIGN(reg->start, alignment);
+ size = PAGE_ALIGN(reg->size);
+
+ if (start + size < start)
+ return -EINVAL;
+
+ reg->size = size;
+ reg->start = start;
+ reg->alignment = alignment;
+
+ list_add_tail(&reg->list, &cma_early_regions);
+
+ pr_debug("param: registering early region %s (%p@%p/%p)\n",
+ reg->name, (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+
+ return 0;
+}
+
+
+
+/************************* Regions & Allocators *************************/
+
+static int __cma_region_attach_alloc(struct cma_region *reg);
+
+/* List of all regions. Named regions are kept before unnamed. */
+static LIST_HEAD(cma_regions);
+
+#define cma_foreach_region(reg) \
+ list_for_each_entry(reg, &cma_regions, list)
+
+int __must_check cma_region_register(struct cma_region *reg)
+{
+ const char *name, *alloc_name;
+ struct cma_region *r;
+ char *ch = NULL;
+ int ret = 0;
+
+ if (!reg->size || reg->start + reg->size < reg->start)
+ return -EINVAL;
+
+ reg->users = 0;
+ reg->used = 0;
+ reg->private_data = NULL;
+ reg->registered = 0;
+ reg->free_space = reg->size;
+
+ /* Copy name and alloc_name */
+ name = reg->name;
+ alloc_name = reg->alloc_name;
+ if (reg->copy_name && (reg->name || reg->alloc_name)) {
+ size_t name_size, alloc_size;
+
+ name_size = reg->name ? strlen(reg->name) + 1 : 0;
+ alloc_size = reg->alloc_name ? strlen(reg->alloc_name) + 1 : 0;
+
+ ch = kmalloc(name_size + alloc_size, GFP_KERNEL);
+ if (!ch) {
+ pr_err("%s: not enough memory to allocate name\n",
+ reg->name ?: "(private)");
+ return -ENOMEM;
+ }
+
+ if (name_size) {
+ memcpy(ch, reg->name, name_size);
+ name = ch;
+ ch += name_size;
+ }
+
+ if (alloc_size) {
+ memcpy(ch, reg->alloc_name, alloc_size);
+ alloc_name = ch;
+ }
+ }
+
+ mutex_lock(&cma_mutex);
+
+ /* Don't let regions overlap */
+ cma_foreach_region(r)
+ if (r->start + r->size > reg->start &&
+ r->start < reg->start + reg->size) {
+ ret = -EADDRINUSE;
+ goto done;
+ }
+
+ if (reg->alloc) {
+ ret = __cma_region_attach_alloc(reg);
+ if (unlikely(ret < 0))
+ goto done;
+ }
+
+ reg->name = name;
+ reg->alloc_name = alloc_name;
+ reg->registered = 1;
+ ch = NULL;
+
+ /*
+ * Keep named at the beginning and unnamed (private) at the
+ * end. This helps in traversal when named region is looked
+ * for.
+ */
+ if (name)
+ list_add(&reg->list, &cma_regions);
+ else
+ list_add_tail(&reg->list, &cma_regions);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: region %sregistered\n",
+ reg->name ?: "(private)", ret ? "not " : "");
+ kfree(ch);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cma_region_register);
+
+static struct cma_region *__must_check
+__cma_region_find(const char **namep)
+{
+ struct cma_region *reg;
+ const char *ch, *name;
+ size_t n;
+
+ for (ch = *namep; *ch && *ch != ',' && *ch != ';'; ++ch)
+ /* nop */;
+ name = *namep;
+ *namep = *ch == ',' ? ch : (ch + 1);
+ n = ch - name;
+
+ /*
+ * Named regions are kept in front of unnamed so if we
+ * encounter unnamed region we can stop.
+ */
+ cma_foreach_region(reg)
+ if (!reg->name)
+ break;
+ else if (!strncmp(name, reg->name, n) && !reg->name[n])
+ return reg;
+
+ return NULL;
+}
+
+
+/* List of all allocators. */
+static LIST_HEAD(cma_allocators);
+
+#define cma_foreach_allocator(alloc) \
+ list_for_each_entry(alloc, &cma_allocators, list)
+
+int cma_allocator_register(struct cma_allocator *alloc)
+{
+ struct cma_region *reg;
+ int first;
+
+ if (!alloc->alloc || !alloc->free)
+ return -EINVAL;
+
+ /* alloc->users = 0; */
+
+ mutex_lock(&cma_mutex);
+
+ first = list_empty(&cma_allocators);
+
+ list_add_tail(&alloc->list, &cma_allocators);
+
+ /*
+ * Attach this allocator to all allocator-less regions that
+ * request this particular allocator (reg->alloc_name equals
+ * alloc->name) or if region wants the first available
+ * allocator and we are the first.
+ */
+ cma_foreach_region(reg) {
+ if (reg->alloc)
+ continue;
+ if (reg->alloc_name
+ ? alloc->name && !strcmp(alloc->name, reg->alloc_name)
+ : (!reg->used && first))
+ continue;
+
+ reg->alloc = alloc;
+ __cma_region_attach_alloc(reg);
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: allocator registered\n", alloc->name ?: "(unnamed)");
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cma_allocator_register);
+
+static struct cma_allocator *__must_check
+__cma_allocator_find(const char *name)
+{
+ struct cma_allocator *alloc;
+
+ if (!name)
+ return list_empty(&cma_allocators)
+ ? NULL
+ : list_entry(cma_allocators.next,
+ struct cma_allocator, list);
+
+ cma_foreach_allocator(alloc)
+ if (alloc->name && !strcmp(name, alloc->name))
+ return alloc;
+
+ return NULL;
+}
+
+
+
+/************************* Initialise CMA *************************/
+
+int __init cma_set_defaults(struct cma_region *regions, const char *map)
+{
+ if (map) {
+ int ret = cma_map_param((char *)map);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ if (!regions)
+ return 0;
+
+ for (; regions->size; ++regions) {
+ int ret = cma_early_region_register(regions);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ return 0;
+}
+
+
+int __init cma_early_region_reserve(struct cma_region *reg)
+{
+ int tried = 0;
+
+ if (!reg->size || (reg->alignment & (reg->alignment - 1)) ||
+ reg->reserved)
+ return -EINVAL;
+
+#ifndef CONFIG_NO_BOOTMEM
+
+ tried = 1;
+
+ {
+ void *ptr = __alloc_bootmem_nopanic(reg->size, reg->alignment,
+ reg->start);
+ if (ptr) {
+ reg->start = virt_to_phys(ptr);
+ reg->reserved = 1;
+ return 0;
+ }
+ }
+
+#endif
+
+#ifdef CONFIG_HAVE_MEMBLOCK
+
+ tried = 1;
+
+ if (reg->start) {
+ if (memblock_is_region_reserved(reg->start, reg->size) < 0 &&
+ memblock_reserve(reg->start, reg->size) >= 0) {
+ reg->reserved = 1;
+ return 0;
+ }
+ } else {
+ /*
+ * Use __memblock_alloc_base() since
+ * memblock_alloc_base() panic()s.
+ */
+ u64 ret = __memblock_alloc_base(reg->size, reg->alignment, 0);
+ if (ret &&
+ ret < ~(dma_addr_t)0 &&
+ ret + reg->size < ~(dma_addr_t)0 &&
+ ret + reg->size > ret) {
+ reg->start = ret;
+ reg->reserved = 1;
+ return 0;
+ }
+
+ if (ret)
+ memblock_free(ret, reg->size);
+ }
+
+#endif
+
+ return tried ? -ENOMEM : -EOPNOTSUPP;
+}
+
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg))
+{
+ struct cma_region *reg;
+
+ pr_debug("init: reserving early regions\n");
+
+ if (!reserve)
+ reserve = cma_early_region_reserve;
+
+ list_for_each_entry(reg, &cma_early_regions, list) {
+ if (reg->reserved) {
+ /* nothing */
+ } else if (reserve(reg) >= 0) {
+ pr_debug("init: %s: reserved %p@%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size, (void *)reg->start);
+ reg->reserved = 1;
+ } else {
+ pr_warn("init: %s: unable to reserve %p@%p/%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+ }
+ }
+}
+
+
+static int __init cma_init(void)
+{
+ struct cma_region *reg, *n;
+
+ pr_debug("init: initialising\n");
+
+ if (cma_map) {
+ char *val = kmemdup(cma_map, cma_map_length + 1, GFP_KERNEL);
+ cma_map = val;
+ if (!val)
+ return -ENOMEM;
+ val[cma_map_length] = '\0';
+ }
+
+ list_for_each_entry_safe(reg, n, &cma_early_regions, list) {
+ INIT_LIST_HEAD(&reg->list);
+ /*
+ * We don't care if there was an error. It's a pity
+ * but there's not much we can do about it any way.
+ * If the error is on a region that was parsed from
+ * command line then it will stay and waste a bit of
+ * space; if it was registered using
+ * cma_early_region_register() it's caller's
+ * responsibility to do something about it.
+ */
+ if (reg->reserved && cma_region_register(reg) < 0)
+ /* ignore error */;
+ }
+
+ INIT_LIST_HEAD(&cma_early_regions);
+
+ return 0;
+}
+/*
+ * We want to be initialised earlier than module_init/__initcall so
+ * that drivers that want to grab memory at boot time will get CMA
+ * ready. subsys_initcall() seems early enough and not too early at
+ * the same time.
+ */
+subsys_initcall(cma_init);
+
+
+
+/************************* Chunks *************************/
+
+/* All chunks sorted by start address. */
+static struct rb_root cma_chunks_by_start;
+
+static struct cma_chunk *__must_check __cma_chunk_find(dma_addr_t addr)
+{
+ struct cma_chunk *chunk;
+ struct rb_node *n;
+
+ for (n = cma_chunks_by_start.rb_node; n; ) {
+ chunk = rb_entry(n, struct cma_chunk, by_start);
+ if (addr < chunk->start)
+ n = n->rb_left;
+ else if (addr > chunk->start)
+ n = n->rb_right;
+ else
+ return chunk;
+ }
+ WARN(1, KERN_WARNING "no chunk starting at %p\n", (void *)addr);
+ return NULL;
+}
+
+static int __must_check __cma_chunk_insert(struct cma_chunk *chunk)
+{
+ struct rb_node **new, *parent = NULL;
+ typeof(chunk->start) addr = chunk->start;
+
+ for (new = &cma_chunks_by_start.rb_node; *new; ) {
+ struct cma_chunk *c =
+ container_of(*new, struct cma_chunk, by_start);
+
+ parent = *new;
+ if (addr < c->start) {
+ new = &(*new)->rb_left;
+ } else if (addr > c->start) {
+ new = &(*new)->rb_right;
+ } else {
+ /*
+ * We should never be here. If we are it
+ * means allocator gave us an invalid chunk
+ * (one that has already been allocated) so we
+ * refuse to accept it. Our caller will
+ * recover by freeing the chunk.
+ */
+ WARN_ON(1);
+ return -EADDRINUSE;
+ }
+ }
+
+ rb_link_node(&chunk->by_start, parent, new);
+ rb_insert_color(&chunk->by_start, &cma_chunks_by_start);
+
+ return 0;
+}
+
+static void __cma_chunk_free(struct cma_chunk *chunk)
+{
+ rb_erase(&chunk->by_start, &cma_chunks_by_start);
+
+ chunk->reg->alloc->free(chunk);
+ --chunk->reg->users;
+ chunk->reg->free_space += chunk->size;
+}
+
+
+/************************* The Device API *************************/
+
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type);
+
+
+/* Allocate. */
+
+static dma_addr_t __must_check
+__cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ struct cma_chunk *chunk;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!reg || reg->free_space < size)
+ return -ENOMEM;
+
+ if (!reg->alloc) {
+ if (!reg->used)
+ __cma_region_attach_alloc(reg);
+ if (!reg->alloc)
+ return -ENOMEM;
+ }
+
+ chunk = reg->alloc->alloc(reg, size, alignment);
+ if (!chunk)
+ return -ENOMEM;
+
+ if (unlikely(__cma_chunk_insert(chunk) < 0)) {
+ /* We should *never* be here. */
+ chunk->reg->alloc->free(chunk);
+ kfree(chunk);
+ return -EADDRINUSE;
+ }
+
+ chunk->reg = reg;
+ ++reg->users;
+ reg->free_space -= chunk->size;
+ pr_debug("allocated at %p\n", (void *)chunk->start);
+ return chunk->start;
+}
+
+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ dma_addr_t addr;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!size || alignment & (alignment - 1) || !reg)
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ addr = reg->registered ?
+ __cma_alloc_from_region(reg, PAGE_ALIGN(size),
+ max(alignment, (dma_addr_t)PAGE_SIZE)) :
+ -EINVAL;
+
+ mutex_unlock(&cma_mutex);
+
+ return addr;
+}
+EXPORT_SYMBOL_GPL(cma_alloc_from_region);
+
+dma_addr_t __must_check
+__cma_alloc(const struct device *dev, const char *type,
+ dma_addr_t size, dma_addr_t alignment)
+{
+ struct cma_region *reg;
+ const char *from;
+ dma_addr_t addr;
+
+ if (dev)
+ pr_debug("allocate %p/%p for %s/%s\n",
+ (void *)size, (void *)alignment,
+ dev_name(dev), type ?: "");
+
+ if (!size || alignment & (alignment - 1))
+ return -EINVAL;
+
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (unlikely(IS_ERR(from))) {
+ addr = PTR_ERR(from);
+ goto done;
+ }
+
+ pr_debug("allocate %p/%p from one of %s\n",
+ (void *)size, (void *)alignment, from);
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ addr = __cma_alloc_from_region(reg, size, alignment);
+ if (!IS_ERR_VALUE(addr))
+ goto done;
+ }
+
+ pr_debug("not enough memory\n");
+ addr = -ENOMEM;
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ return addr;
+}
+EXPORT_SYMBOL_GPL(__cma_alloc);
+
+
+/* Query information about regions. */
+static void __cma_info_add(struct cma_info *infop, struct cma_region *reg)
+{
+ infop->total_size += reg->size;
+ infop->free_size += reg->free_space;
+ if (infop->lower_bound > reg->start)
+ infop->lower_bound = reg->start;
+ if (infop->upper_bound < reg->start + reg->size)
+ infop->upper_bound = reg->start + reg->size;
+ ++infop->count;
+}
+
+int
+__cma_info(struct cma_info *infop, const struct device *dev, const char *type)
+{
+ struct cma_info info = { ~(dma_addr_t)0, 0, 0, 0, 0 };
+ struct cma_region *reg;
+ const char *from;
+ int ret;
+
+ if (unlikely(!infop))
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (IS_ERR(from)) {
+ ret = PTR_ERR(from);
+ info.lower_bound = 0;
+ goto done;
+ }
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ if (reg)
+ __cma_info_add(&info, reg);
+ }
+
+ ret = 0;
+done:
+ mutex_unlock(&cma_mutex);
+
+ memcpy(infop, &info, sizeof info);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__cma_info);
+
+
+/* Freeing. */
+int cma_free(dma_addr_t addr)
+{
+ struct cma_chunk *c;
+ int ret;
+
+ mutex_lock(&cma_mutex);
+
+ c = __cma_chunk_find(addr);
+
+ if (c) {
+ __cma_chunk_free(c);
+ ret = 0;
+ } else {
+ ret = -ENOENT;
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("free(%p): %s\n", (void *)addr, c ? "freed" : "not found");
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cma_free);
+
+
+/************************* Miscellaneous *************************/
+
+static int __cma_region_attach_alloc(struct cma_region *reg)
+{
+ struct cma_allocator *alloc;
+ int ret;
+
+ /*
+ * If reg->alloc is set then caller wants us to use this
+ * allocator. Otherwise we need to find one by name.
+ */
+ if (reg->alloc) {
+ alloc = reg->alloc;
+ } else {
+ alloc = __cma_allocator_find(reg->alloc_name);
+ if (!alloc) {
+ pr_warn("init: %s: %s: no such allocator\n",
+ reg->name ?: "(private)",
+ reg->alloc_name ?: "(default)");
+ reg->used = 1;
+ return -ENOENT;
+ }
+ }
+
+ /* Try to initialise the allocator. */
+ reg->private_data = NULL;
+ ret = alloc->init ? alloc->init(reg) : 0;
+ if (unlikely(ret < 0)) {
+ pr_err("init: %s: %s: unable to initialise allocator\n",
+ reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ reg->alloc = NULL;
+ reg->used = 1;
+ } else {
+ reg->alloc = alloc;
+ /* ++alloc->users; */
+ pr_debug("init: %s: %s: initialised allocator\n",
+ reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ }
+ return ret;
+}
+
+
+/*
+ * s ::= rules
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ */
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type)
+{
+ /*
+ * This function matches the pattern from the map attribute
+ * agains given device name and type. Type may be of course
+ * NULL or an emtpy string.
+ */
+
+ const char *s, *name;
+ int name_matched = 0;
+
+ /*
+ * If dev is NULL we were called in alternative form where
+ * type is the from string. All we have to do is return it.
+ */
+ if (!dev)
+ return type ?: ERR_PTR(-EINVAL);
+
+ if (!cma_map)
+ return ERR_PTR(-ENOENT);
+
+ name = dev_name(dev);
+ if (WARN_ON(!name || !*name))
+ return ERR_PTR(-EINVAL);
+
+ if (!type)
+ type = "common";
+
+ /*
+ * Now we go throught the cma_map attribute.
+ */
+ for (s = cma_map; *s; ++s) {
+ const char *c;
+
+ /*
+ * If the pattern starts with a slash, the device part of the
+ * pattern matches if it matched previously.
+ */
+ if (*s == '/') {
+ if (!name_matched)
+ goto look_for_next;
+ goto match_type;
+ }
+
+ /*
+ * We are now trying to match the device name. This also
+ * updates the name_matched variable. If, while reading the
+ * spec, we ecnounter comma it means that the pattern does not
+ * match and we need to start over with another pattern (the
+ * one afther the comma). If we encounter equal sign we need
+ * to start over with another rule. If there is a character
+ * that does not match, we neet to look for a comma (to get
+ * another pattern) or semicolon (to get another rule) and try
+ * again if there is one somewhere.
+ */
+
+ name_matched = 0;
+
+ for (c = name; *s != '*' && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*s != '?' && *c != *s)
+ goto look_for_next;
+ if (*s == '*')
+ ++s;
+
+ name_matched = 1;
+
+ /*
+ * Now we need to match the type part of the pattern. If the
+ * pattern is missing it we match only if type points to an
+ * empty string. Otherwise wy try to match it just like name.
+ */
+ if (*s == '/') {
+match_type: /* s points to '/' */
+ ++s;
+
+ for (c = type; *s && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*c != *s)
+ goto look_for_next;
+ }
+
+ /* Return the string behind the '=' sign of the rule. */
+ if (*s == '=')
+ return s + 1;
+ else if (*s == ',')
+ return strchr(s, '=') + 1;
+
+ /* Pattern did not match */
+
+look_for_next:
+ do {
+ ++s;
+ } while (*s != ',' && *s != '=');
+ if (*s == ',')
+ continue;
+
+next_rule: /* s points to '=' */
+ s = strchr(s, ';');
+ if (!s)
+ break;
+
+next_pattern:
+ continue;
+ }
+
+ return ERR_PTR(-ENOENT);
+}
--
1.7.1

2010-08-20 09:52:14

[permalink] [raw]

Subject: [PATCH/RFCv4 3/6] mm: cma: Added SysFS support

The SysFS development interface lets one change the map attribute
at run time as well as observe what regions have been reserved.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
.../ABI/testing/sysfs-kernel-mm-contiguous | 53 +++
Documentation/contiguous-memory.txt | 4 +
include/linux/cma.h | 7 +
mm/Kconfig | 18 +-
mm/cma.c | 345 +++++++++++++++++++-
5 files changed, 423 insertions(+), 4 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-contiguous

diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-contiguous b/Documentation/ABI/testing/sysfs-kernel-mm-contiguous
new file mode 100644
index 0000000..8df15bc
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-contiguous
@@ -0,0 +1,53 @@
+What: /sys/kernel/mm/contiguous/
+Date: August 2010
+Contact: Michal Nazarewicz <[email protected]>
+Description:
+ If CMA has been built with SysFS support,
+ /sys/kernel/mm/contiguous/ contains a file called
+ "map", a file called "allocators" and a directory
+ called "regions".
+
+ The "map" file lets one change the CMA's map attribute
+ at run-time.
+
+ The "allocators" file list all registered allocators.
+ Allocators with no name are listed as a single minus
+ sign.
+
+ The "regions" directory list all reserved regions.
+
+ For more details see
+ Documentation/contiguous-memory.txt.
+
+What: /sys/kernel/mm/contiguous/regions/
+Date: August 2010
+Contact: Michal Nazarewicz <[email protected]>
+Description:
+ The /sys/kernel/mm/contiguous/regions/ directory
+ contain directories for each registered CMA region.
+ The name of the directory is the same as the start
+ address of the region.
+
+ If region is named there is also a symbolic link named
+ like the region pointing to the region's directory.
+
+ Such directory contains the following files:
+
+ * "name" -- the name of the region or an empty file
+ * "start" -- starting address of the region (formatted
+ with %p, ie. hex).
+ * "size" -- size of the region (in bytes).
+ * "free" -- free space in the region (in bytes).
+ * "users" -- number of chunks allocated in the region.
+ * "alloc" -- name of the allocator.
+
+ If allocator is not attached to the region, "alloc" is
+ either the name of desired allocator in square
+ brackets (ie. "[foo]") or an empty file if region is
+ to be attached to default allocator. If an allocator
+ is attached to the region. "alloc" is either its name
+ or "-" if attached allocator has no name.
+
+ If there are no chunks allocated in given region
+ ("users" is "0") then a name of desired allocator can
+ be written to "alloc".
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
index 8fc2400..8d189b8 100644
--- a/Documentation/contiguous-memory.txt
+++ b/Documentation/contiguous-memory.txt
@@ -256,6 +256,10 @@
iff it matched in previous pattern. If the second part is
omitted it will mach any type of memory requested by device.

+ If SysFS support is enabled, this attribute is accessible via
+ SysFS and can be changed at run-time by writing to
+ /sys/kernel/mm/contiguous/map.
+
Some examples (whitespace added for better readability):

cma_map = foo/quaz = r1;
diff --git a/include/linux/cma.h b/include/linux/cma.h
index cd63f52..eede28d 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -17,6 +17,9 @@

#include <linux/rbtree.h>
#include <linux/list.h>
+#if defined CONFIG_CMA_SYSFS
+# include <linux/kobject.h>
+#endif

struct device;
@@ -203,6 +206,10 @@ struct cma_region {
unsigned users;
struct list_head list;

+#if defined CONFIG_CMA_SYSFS
+ struct kobject kobj;
+#endif
+
unsigned used:1;
unsigned registered:1;
unsigned reserved:1;
diff --git a/mm/Kconfig b/mm/Kconfig
index 3e9317c..ac0bb08 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -319,12 +319,26 @@ config CMA
To make use of CMA you need to specify the regions and
driver->region mapping on command line when booting the kernel.

-config CMA_DEBUG
- bool "CMA debug messages (DEVELOPEMENT)"
+config CMA_DEVELOPEMENT
+ bool "Include CMA developement features"
depends on CMA
help
+ This lets you enable some developement features of the CMA
+ freamework.
+
+config CMA_DEBUG
+ bool "CMA debug messages"
+ depends on CMA_DEVELOPEMENT
+ help
Enable debug messages in CMA code.

+config CMA_SYSFS
+ bool "CMA SysFS interface support"
+ depends on CMA_DEVELOPEMENT
+ help
+ Enable support for SysFS interface.
+
+config CMA_CMDLINE
config CMA_BEST_FIT
bool "CMA best-fit allocator"
depends on CMA
diff --git a/mm/cma.c b/mm/cma.c
index 401399c..561d817 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -38,8 +38,8 @@

/*
- * Protects cma_regions, cma_allocators, cma_map, cma_map_length, and
- * cma_chunks_by_start.
+ * Protects cma_regions, cma_allocators, cma_map, cma_map_length,
+ * cma_kobj, cma_sysfs_regions and cma_chunks_by_start.
*/
static DEFINE_MUTEX(cma_mutex);

@@ -143,7 +143,11 @@ int __init __must_check cma_early_region_register(struct cma_region *reg)

/************************* Regions & Allocators *************************/

+static void __cma_sysfs_region_add(struct cma_region *reg);
+
static int __cma_region_attach_alloc(struct cma_region *reg);
+static void __maybe_unused __cma_region_detach_alloc(struct cma_region *reg);
+

/* List of all regions. Named regions are kept before unnamed. */
static LIST_HEAD(cma_regions);
@@ -226,6 +230,8 @@ int __must_check cma_region_register(struct cma_region *reg)
else
list_add_tail(&reg->list, &cma_regions);

+ __cma_sysfs_region_add(reg);
+
done:
mutex_unlock(&cma_mutex);

@@ -483,6 +489,329 @@ subsys_initcall(cma_init);

+/************************* SysFS *************************/
+
+#if defined CONFIG_CMA_SYSFS
+
+static struct kobject cma_sysfs_regions;
+static int cma_sysfs_regions_ready;
+
+
+#define CMA_ATTR_INLINE(_type, _name) \
+ (&((struct cma_ ## _type ## _attribute){ \
+ .attr = { \
+ .name = __stringify(_name), \
+ .mode = 0644, \
+ }, \
+ .show = cma_sysfs_ ## _type ## _ ## _name ## _show, \
+ .store = cma_sysfs_ ## _type ## _ ## _name ## _store, \
+ }).attr)
+
+#define CMA_ATTR_RO_INLINE(_type, _name) \
+ (&((struct cma_ ## _type ## _attribute){ \
+ .attr = { \
+ .name = __stringify(_name), \
+ .mode = 0444, \
+ }, \
+ .show = cma_sysfs_ ## _type ## _ ## _name ## _show, \
+ }).attr)
+
+
+struct cma_root_attribute {
+ struct attribute attr;
+ ssize_t (*show)(char *buf);
+ int (*store)(const char *buf);
+};
+
+static ssize_t cma_sysfs_root_map_show(char *page)
+{
+ ssize_t len;
+
+ len = cma_map_length;
+ if (!len) {
+ *page = 0;
+ len = 0;
+ } else {
+ if (len > (size_t)PAGE_SIZE - 1)
+ len = (size_t)PAGE_SIZE - 1;
+ memcpy(page, cma_map, len);
+ page[len++] = '\n';
+ }
+
+ return len;
+}
+
+static int cma_sysfs_root_map_store(const char *page)
+{
+ ssize_t len = cma_map_validate(page);
+ char *val = NULL;
+
+ if (len < 0)
+ return len;
+
+ if (len) {
+ val = kmemdup(page, len + 1, GFP_KERNEL);
+ if (!val)
+ return -ENOMEM;
+ val[len] = '\0';
+ }
+
+ kfree(cma_map);
+ cma_map = val;
+ cma_map_length = len;
+
+ return 0;
+}
+
+static ssize_t cma_sysfs_root_allocators_show(char *page)
+{
+ struct cma_allocator *alloc;
+ size_t left = PAGE_SIZE;
+ char *ch = page;
+
+ cma_foreach_allocator(alloc) {
+ ssize_t l = snprintf(ch, left, "%s ", alloc->name ?: "-");
+ ch += l;
+ left -= l;
+ }
+
+ if (ch != page)
+ ch[-1] = '\n';
+ return ch - page;
+}
+
+static ssize_t
+cma_sysfs_root_show(struct kobject *kobj, struct attribute *attr, char *buf)
+{
+ struct cma_root_attribute *rattr =
+ container_of(attr, struct cma_root_attribute, attr);
+ ssize_t ret;
+
+ mutex_lock(&cma_mutex);
+ ret = rattr->show(buf);
+ mutex_unlock(&cma_mutex);
+
+ return ret;
+}
+
+static ssize_t
+cma_sysfs_root_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cma_root_attribute *rattr =
+ container_of(attr, struct cma_root_attribute, attr);
+ int ret;
+
+ mutex_lock(&cma_mutex);
+ ret = rattr->store(buf);
+ mutex_unlock(&cma_mutex);
+
+ return ret < 0 ? ret : count;
+}
+
+static struct kobj_type cma_sysfs_root_type = {
+ .sysfs_ops = &(const struct sysfs_ops){
+ .show = cma_sysfs_root_show,
+ .store = cma_sysfs_root_store,
+ },
+ .default_attrs = (struct attribute * []) {
+ CMA_ATTR_INLINE(root, map),
+ CMA_ATTR_RO_INLINE(root, allocators),
+ NULL
+ },
+};
+
+static int __init cma_sysfs_init(void)
+{
+ static struct kobject root;
+ static struct kobj_type fake_type;
+
+ struct cma_region *reg;
+ int ret;
+
+ /* Root */
+ ret = kobject_init_and_add(&root, &cma_sysfs_root_type,
+ mm_kobj, "contiguous");
+ if (unlikely(ret < 0)) {
+ pr_err("init: unable to add root kobject: %d\n", ret);
+ return ret;
+ }
+
+ /* Regions */
+ ret = kobject_init_and_add(&cma_sysfs_regions, &fake_type,
+ &root, "regions");
+ if (unlikely(ret < 0)) {
+ pr_err("init: unable to add regions kobject: %d\n", ret);
+ return ret;
+ }
+
+ mutex_lock(&cma_mutex);
+ cma_sysfs_regions_ready = 1;
+ cma_foreach_region(reg)
+ __cma_sysfs_region_add(reg);
+ mutex_unlock(&cma_mutex);
+
+ return 0;
+}
+device_initcall(cma_sysfs_init);
+
+
+
+struct cma_region_attribute {
+ struct attribute attr;
+ ssize_t (*show)(struct cma_region *reg, char *buf);
+ int (*store)(struct cma_region *reg, const char *buf);
+};
+
+
+static ssize_t cma_sysfs_region_name_show(struct cma_region *reg, char *page)
+{
+ return reg->name ? snprintf(page, PAGE_SIZE, "%s\n", reg->name) : 0;
+}
+
+static ssize_t cma_sysfs_region_start_show(struct cma_region *reg, char *page)
+{
+ return snprintf(page, PAGE_SIZE, "%p\n", (void *)reg->start);
+}
+
+static ssize_t cma_sysfs_region_size_show(struct cma_region *reg, char *page)
+{
+ return snprintf(page, PAGE_SIZE, "%zu\n", reg->size);
+}
+
+static ssize_t cma_sysfs_region_free_show(struct cma_region *reg, char *page)
+{
+ return snprintf(page, PAGE_SIZE, "%zu\n", reg->free_space);
+}
+
+static ssize_t cma_sysfs_region_users_show(struct cma_region *reg, char *page)
+{
+ return snprintf(page, PAGE_SIZE, "%u\n", reg->users);
+}
+
+static ssize_t cma_sysfs_region_alloc_show(struct cma_region *reg, char *page)
+{
+ if (reg->alloc)
+ return snprintf(page, PAGE_SIZE, "%s\n",
+ reg->alloc->name ?: "-");
+ else if (reg->alloc_name)
+ return snprintf(page, PAGE_SIZE, "[%s]\n", reg->alloc_name);
+ else
+ return 0;
+}
+
+static int
+cma_sysfs_region_alloc_store(struct cma_region *reg, const char *page)
+{
+ char *s;
+
+ if (reg->alloc && reg->users)
+ return -EBUSY;
+
+ if (!*page || *page == '\n') {
+ s = NULL;
+ } else {
+ size_t len;
+
+ for (s = (char *)page; *++s && *s != '\n'; )
+ /* nop */;
+
+ len = s - page;
+ s = kmemdup(page, len + 1, GFP_KERNEL);
+ if (!s)
+ return -ENOMEM;
+ s[len] = '\0';
+ }
+
+ if (reg->alloc)
+ __cma_region_detach_alloc(reg);
+
+ if (reg->free_alloc_name)
+ kfree(reg->alloc_name);
+
+ reg->alloc_name = s;
+ reg->free_alloc_name = !!s;
+
+ return 0;
+}
+
+
+static ssize_t
+cma_sysfs_region_show(struct kobject *kobj, struct attribute *attr,
+ char *buf)
+{
+ struct cma_region *reg = container_of(kobj, struct cma_region, kobj);
+ struct cma_region_attribute *rattr =
+ container_of(attr, struct cma_region_attribute, attr);
+ ssize_t ret;
+
+ mutex_lock(&cma_mutex);
+ ret = rattr->show(reg, buf);
+ mutex_unlock(&cma_mutex);
+
+ return ret;
+}
+
+static int
+cma_sysfs_region_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cma_region *reg = container_of(kobj, struct cma_region, kobj);
+ struct cma_region_attribute *rattr =
+ container_of(attr, struct cma_region_attribute, attr);
+ int ret;
+
+ mutex_lock(&cma_mutex);
+ ret = rattr->store(reg, buf);
+ mutex_unlock(&cma_mutex);
+
+ return ret < 0 ? ret : count;
+}
+
+static struct kobj_type cma_sysfs_region_type = {
+ .sysfs_ops = &(const struct sysfs_ops){
+ .show = cma_sysfs_region_show,
+ .store = cma_sysfs_region_store,
+ },
+ .default_attrs = (struct attribute * []) {
+ CMA_ATTR_RO_INLINE(region, name),
+ CMA_ATTR_RO_INLINE(region, start),
+ CMA_ATTR_RO_INLINE(region, size),
+ CMA_ATTR_RO_INLINE(region, free),
+ CMA_ATTR_RO_INLINE(region, users),
+ CMA_ATTR_INLINE(region, alloc),
+ NULL
+ },
+};
+
+static void __cma_sysfs_region_add(struct cma_region *reg)
+{
+ int ret;
+
+ if (!cma_sysfs_regions_ready)
+ return;
+
+ memset(&reg->kobj, 0, sizeof reg->kobj);
+
+ ret = kobject_init_and_add(&reg->kobj, &cma_sysfs_region_type,
+ &cma_sysfs_regions,
+ "%p", (void *)reg->start);
+
+ if (reg->name &&
+ sysfs_create_link(&cma_sysfs_regions, &reg->kobj, reg->name) < 0)
+ /* Ignore any errors. */;
+}
+
+#else
+
+static void __cma_sysfs_region_add(struct cma_region *reg)
+{
+ /* nop */
+}
+
+#endif
+
+
/************************* Chunks *************************/

/* All chunks sorted by start address. */
@@ -784,6 +1113,18 @@ static int __cma_region_attach_alloc(struct cma_region *reg)
return ret;
}

+static void __cma_region_detach_alloc(struct cma_region *reg)
+{
+ if (!reg->alloc)
+ return;
+
+ if (reg->alloc->cleanup)
+ reg->alloc->cleanup(reg);
+
+ reg->alloc = NULL;
+ reg->used = 1;
+}
+

/*
* s ::= rules
--
1.7.1

2010-08-20 09:52:19

[permalink] [raw]

Subject: [PATCH/RFCv4 5/6] mm: cma: Test device and application added

This patch adds a "cma" misc device which lets user space use the
CMA API. This device is meant for testing. A testing application
is also provided.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
drivers/misc/Kconfig | 8 +
drivers/misc/Makefile | 1 +
drivers/misc/cma-dev.c | 185 ++++++++++++++++++++++++
include/linux/cma.h | 30 ++++
tools/cma/cma-test.c | 373 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 597 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/cma-dev.c
create mode 100644 tools/cma/cma-test.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 0b591b6..f93e812 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -395,4 +395,12 @@ source "drivers/misc/eeprom/Kconfig"
source "drivers/misc/cb710/Kconfig"
source "drivers/misc/iwmc3200top/Kconfig"

+config CMA_DEVICE
+ tristate "CMA misc device (DEVELOPEMENT)"
+ depends on CMA_DEVELOPEMENT
+ help
+ The CMA misc device allows allocating contiguous memory areas
+ from user space. This is mostly for testing of the CMA
+ framework.
+
endif # MISC_DEVICES
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 255a80d..2e82898 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -35,3 +35,4 @@ obj-y += eeprom/
obj-y += cb710/
obj-$(CONFIG_VMWARE_BALLOON) += vmware_balloon.o
obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
+obj-$(CONFIG_CMA_DEVICE) += cma-dev.o
diff --git a/drivers/misc/cma-dev.c b/drivers/misc/cma-dev.c
new file mode 100644
index 0000000..de534f0
--- /dev/null
+++ b/drivers/misc/cma-dev.c
@@ -0,0 +1,185 @@
+/*
+ * Contiguous Memory Allocator userspace driver
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR_VALUE() */
+#include <linux/fs.h> /* struct file */
+#include <linux/mm.h> /* Memory stuff */
+#include <linux/mman.h>
+#include <linux/slab.h>
+#include <linux/module.h> /* Standard module stuff */
+#include <linux/device.h> /* struct device, dev_dbg() */
+#include <linux/types.h> /* Just to be safe ;) */
+#include <linux/uaccess.h> /* __copy_{to,from}_user */
+#include <linux/miscdevice.h> /* misc_register() and company */
+
+#include <linux/cma.h>
+
+static int cma_file_open(struct inode *inode, struct file *file);
+static int cma_file_release(struct inode *inode, struct file *file);
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma);
+
+
+static struct miscdevice cma_miscdev = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "cma",
+ .fops = &(const struct file_operations) {
+ .owner = THIS_MODULE,
+ .open = cma_file_open,
+ .release = cma_file_release,
+ .unlocked_ioctl = cma_file_ioctl,
+ .mmap = cma_file_mmap,
+ },
+};
+#define cma_dev (cma_miscdev.this_device)
+
+
+#define cma_file_start(file) (((dma_addr_t *)(file)->private_data)[0])
+#define cma_file_size(file) (((dma_addr_t *)(file)->private_data)[1])
+
+
+static int cma_file_open(struct inode *inode, struct file *file)
+{
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ file->private_data = NULL;
+
+ return 0;
+}
+
+
+static int cma_file_release(struct inode *inode, struct file *file)
+{
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (file->private_data) {
+ cma_free(cma_file_start(file));
+ kfree(file->private_data);
+ }
+
+ return 0;
+}
+
+
+static long cma_file_ioctl(struct file *file, unsigned cmd, unsigned long arg)
+{
+ struct cma_alloc_request req;
+ struct device fake_device;
+ unsigned long addr;
+ long ret;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (cmd != IOCTL_CMA_ALLOC)
+ return -ENOTTY;
+
+ if (!arg)
+ return -EINVAL;
+
+ if (file->private_data) /* Already allocated */
+ return -EBADFD;
+
+ if (copy_from_user(&req, (void *)arg, sizeof req))
+ return -EFAULT;
+
+ if (req.magic != CMA_MAGIC)
+ return -ENOTTY;
+
+ /* May happen on 32 bit system. */
+ if (req.size > ~(typeof(req.size))0 ||
+ req.alignment > ~(typeof(req.alignment))0)
+ return -EINVAL;
+
+ if (strnlen(req.name, sizeof req.name) >= sizeof req.name
+ || strnlen(req.kind, sizeof req.kind) >= sizeof req.kind)
+ return -EINVAL;
+
+ file->private_data = kmalloc(2 * sizeof(dma_addr_t), GFP_KERNEL);
+ if (!file->private_data)
+ return -ENOMEM;
+
+ fake_device.init_name = req.name;
+ fake_device.kobj.name = req.name;
+ addr = cma_alloc(&fake_device, req.kind, req.size, req.alignment);
+ if (IS_ERR_VALUE(addr)) {
+ ret = addr;
+ goto error_priv;
+ }
+
+ if (put_user(addr, (typeof(req.start) *)(arg + offsetof(typeof(req),
+ start)))) {
+ ret = -EFAULT;
+ goto error_put;
+ }
+
+ cma_file_start(file) = addr;
+ cma_file_size(file) = req.size;
+
+ dev_dbg(cma_dev, "allocated %p@%p\n",
+ (void *)(dma_addr_t)req.size, (void *)addr);
+
+ return 0;
+
+error_put:
+ cma_free(addr);
+error_priv:
+ kfree(file->private_data);
+ file->private_data = NULL;
+ return ret;
+}
+
+
+static int cma_file_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ unsigned long pgoff, offset, length;
+
+ dev_dbg(cma_dev, "%s(%p)\n", __func__, (void *)file);
+
+ if (!file->private_data)
+ return -EBADFD;
+
+ pgoff = vma->vm_pgoff;
+ offset = pgoff << PAGE_SHIFT;
+ length = vma->vm_end - vma->vm_start;
+
+ if (offset >= cma_file_size(file)
+ || length > cma_file_size(file)
+ || offset + length > cma_file_size(file))
+ return -ENOSPC;
+
+ return remap_pfn_range(vma, vma->vm_start,
+ __phys_to_pfn(cma_file_start(file) + offset),
+ length, vma->vm_page_prot);
+}
+
+
+
+static int __init cma_dev_init(void)
+{
+ int ret = misc_register(&cma_miscdev);
+ pr_debug("miscdev: register returned: %d\n", ret);
+ return ret;
+}
+module_init(cma_dev_init);
+
+static void __exit cma_dev_exit(void)
+{
+ dev_dbg(cma_dev, "deregisterring\n");
+ misc_deregister(&cma_miscdev);
+}
+module_exit(cma_dev_exit);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index eede28d..4334bb8 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -11,6 +11,36 @@
* See Documentation/contiguous-memory.txt for details.
*/

+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+
+#define CMA_MAGIC (('c' << 24) | ('M' << 16) | ('a' << 8) | 0x42)
+
+/**
+ * An information about area exportable to user space.
+ * @magic: must always be CMA_MAGIC.
+ * @name: name of the device to allocate as.
+ * @kind: kind of the memory.
+ * @_pad: reserved.
+ * @size: size of the chunk to allocate.
+ * @alignment: desired alignment of the chunk (must be power of two or zero).
+ * @start: when ioctl() finishes this stores physical address of the chunk.
+ */
+struct cma_alloc_request {
+ __u32 magic;
+ char name[17];
+ char kind[17];
+ __u16 pad;
+ /* __u64 to be compatible accross 32 and 64 bit systems. */
+ __u64 size;
+ __u64 alignment;
+ __u64 start;
+};
+
+#define IOCTL_CMA_ALLOC _IOWR('p', 0, struct cma_alloc_request)
+
+
/***************************** Kernel lever API *****************************/

#ifdef __KERNEL__
diff --git a/tools/cma/cma-test.c b/tools/cma/cma-test.c
new file mode 100644
index 0000000..567c57b
--- /dev/null
+++ b/tools/cma/cma-test.c
@@ -0,0 +1,373 @@
+/*
+ * cma-test.c -- CMA testing application
+ *
+ * Copyright (C) 2010 Samsung Electronics
+ * Author: Michal Nazarewicz <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+/* $(CROSS_COMPILE)gcc -Wall -Wextra -g -o cma-test cma-test.c */
+
+#include <linux/cma.h>
+
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+
+#include <fcntl.h>
+#include <unistd.h>
+
+#include <ctype.h>
+#include <errno.h>
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+
+static void handle_command(char *line);
+
+int main(void)
+{
+ unsigned no = 1;
+ char line[1024];
+ int skip = 0;
+
+ fputs("commands:\n"
+ " l or list list allocated chunks\n"
+ " a or alloc <name> <size>[/<alignment>] allocate chunk\n"
+ " f or free [<num>] free an chunk\n"
+ " # ... comment\n"
+ " <empty line> repeat previous\n"
+ "\n", stderr);
+
+ while (fgets(line, sizeof line, stdin)) {
+ char *nl = strchr(line, '\n');
+ if (nl) {
+ if (skip) {
+ fprintf(stderr, "cma: %d: line too long\n", no);
+ skip = 0;
+ } else {
+ *nl = '\0';
+ handle_command(line);
+ }
+ ++no;
+ } else {
+ skip = 1;
+ }
+ }
+
+ if (skip)
+ fprintf(stderr, "cma: %d: no new line at EOF\n", no);
+ return 0;
+}
+
+
+
+static void cmd_list(char *name, char *line);
+static void cmd_alloc(char *name, char *line);
+static void cmd_free(char *name, char *line);
+
+static const struct command {
+ const char name[8];
+ void (*handle)(char *name, char *line);
+} commands[] = {
+ { "list", cmd_list },
+ { "l", cmd_list },
+ { "alloc", cmd_alloc },
+ { "a", cmd_alloc },
+ { "free", cmd_free },
+ { "f", cmd_free },
+ { "", NULL }
+};
+
+
+#define SKIP_SPACE(ch) do while (isspace(*(ch))) ++(ch); while (0)
+
+
+static void handle_command(char *line)
+{
+ static char last_line[1024];
+
+ const struct command *cmd;
+ char *name;
+
+ SKIP_SPACE(line);
+ if (*line == '#')
+ return;
+
+ if (!*line)
+ strcpy(line, last_line);
+ else
+ strcpy(last_line, line);
+
+ name = line;
+ while (*line && !isspace(*line))
+ ++line;
+
+ if (*line) {
+ *line = '\0';
+ ++line;
+ }
+
+ for (cmd = commands; *(cmd->name); ++cmd)
+ if (!strcmp(name, cmd->name)) {
+ cmd->handle(name, line);
+ return;
+ }
+
+ fprintf(stderr, "%s: unknown command\n", name);
+}
+
+
+
+struct chunk {
+ struct chunk *next, *prev;
+ int fd;
+ unsigned long size;
+ unsigned long start;
+};
+
+static struct chunk root = {
+ .next = &root,
+ .prev = &root,
+};
+
+#define for_each(a) for (a = root.next; a != &root; a = a->next)
+
+static struct chunk *chunk_create(const char *prefix);
+static void chunk_destroy(struct chunk *chunk);
+static void chunk_add(struct chunk *chunk);
+
+static int memparse(char *ptr, char **retptr, unsigned long *ret);
+
+
+static void cmd_list(char *name, char *line)
+{
+ struct chunk *chunk;
+
+ (void)name; (void)line;
+
+ for_each(chunk)
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+}
+
+
+static void cmd_alloc(char *name, char *line)
+{
+ unsigned long size, alignment = 0;
+ struct cma_alloc_request req;
+ char *dev, *kind = NULL;
+ struct chunk *chunk;
+ int ret;
+
+ SKIP_SPACE(line);
+ if (!*line) {
+ fprintf(stderr, "%s: expecting name\n", name);
+ return;
+ }
+
+ for (dev = line; *line && !isspace(*line); ++line)
+ if (*line == '/')
+ kind = line;
+
+ if (!*line) {
+ fprintf(stderr, "%s: expecting size after name\n", name);
+ return;
+ }
+
+ if (kind)
+ *kind++ = '\0';
+ *line++ = '\0';
+
+ if (( kind && (size_t)(kind - dev ) > sizeof req.name)
+ || (!kind && (size_t)(line - dev ) > sizeof req.name)
+ || ( kind && (size_t)(line - kind) > sizeof req.kind)) {
+ fprintf(stderr, "%s: name or kind too long\n", name);
+ return;
+ }
+
+
+ if (memparse(line, &line, &size) < 0 || !size) {
+ fprintf(stderr, "%s: invalid size\n", name);
+ return;
+ }
+
+ if (*line == '/')
+ if (memparse(line, &line, &alignment) < 0) {
+ fprintf(stderr, "%s: invalid alignment\n", name);
+ return;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr, "%s: unknown arguments at the end: %s\n",
+ name, line);
+ return;
+ }
+
+
+ chunk = chunk_create(name);
+ if (!chunk)
+ return;
+
+ fprintf(stderr, "%s: allocating %p/%p\n", name,
+ (void *)size, (void *)alignment);
+
+ req.magic = CMA_MAGIC;
+ req.size = size;
+ req.alignment = alignment;
+
+ strcpy(req.name, dev);
+ if (kind)
+ strcpy(req.kind, kind);
+ else
+ req.kind[0] = '\0';
+
+
+ ret = ioctl(chunk->fd, IOCTL_CMA_ALLOC, &req);
+ if (ret < 0) {
+ fprintf(stderr, "%s: cma_alloc: %s\n", name, strerror(errno));
+ chunk_destroy(chunk);
+ } else {
+ chunk_add(chunk);
+ chunk->size = req.size;
+ chunk->start = req.start;
+
+ printf("%3d: %p@%p\n", chunk->fd,
+ (void *)chunk->size, (void *)chunk->start);
+ }
+}
+
+
+static void cmd_free(char *name, char *line)
+{
+ struct chunk *chunk;
+
+ SKIP_SPACE(line);
+
+ if (*line) {
+ unsigned long num;
+
+ errno = 0;
+ num = strtoul(line, &line, 10);
+
+ if (errno || num > INT_MAX) {
+ fprintf(stderr, "%s: invalid number\n", name);
+ return;
+ }
+
+ SKIP_SPACE(line);
+ if (*line) {
+ fprintf(stderr, "%s: unknown arguments at the end: %s\n",
+ name, line);
+ return;
+ }
+
+ for_each(chunk)
+ if (chunk->fd == (int)num)
+ goto ok;
+ fprintf(stderr, "%s: no chunk %3lu\n", name, num);
+ return;
+
+ } else {
+ chunk = root.prev;
+ if (chunk == &root) {
+ fprintf(stderr, "%s: no chunks\n", name);
+ return;
+ }
+ }
+
+ok:
+ fprintf(stderr, "%s: freeing %p@%p\n", name,
+ (void *)chunk->size, (void *)chunk->start);
+ chunk_destroy(chunk);
+}
+
+
+static struct chunk *chunk_create(const char *prefix)
+{
+ struct chunk *chunk;
+ int fd;
+
+ chunk = malloc(sizeof *chunk);
+ if (!chunk) {
+ fprintf(stderr, "%s: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ fd = open("/dev/cma", O_RDWR);
+ if (fd < 0) {
+ fprintf(stderr, "%s: /dev/cma: %s\n", prefix, strerror(errno));
+ return NULL;
+ }
+
+ chunk->prev = chunk;
+ chunk->next = chunk;
+ chunk->fd = fd;
+ return chunk;
+}
+
+static void chunk_destroy(struct chunk *chunk)
+{
+ chunk->prev->next = chunk->next;
+ chunk->next->prev = chunk->prev;
+ close(chunk->fd);
+}
+
+static void chunk_add(struct chunk *chunk)
+{
+ chunk->next = &root;
+ chunk->prev = root.prev;
+ root.prev->next = chunk;
+ root.prev = chunk;
+}
+
+
+
+static int memparse(char *ptr, char **retptr, unsigned long *ret)
+{
+ unsigned long val;
+
+ SKIP_SPACE(ptr);
+
+ errno = 0;
+ val = strtoul(ptr, &ptr, 0);
+ if (errno)
+ return -1;
+
+ switch (*ptr) {
+ case 'G':
+ case 'g':
+ val <<= 10;
+ case 'M':
+ case 'm':
+ val <<= 10;
+ case 'K':
+ case 'k':
+ val <<= 10;
+ ++ptr;
+ }
+
+ if (retptr) {
+ SKIP_SPACE(ptr);
+ *retptr = ptr;
+ }
+
+ *ret = val;
+ return 0;
+}
--
1.7.1

2010-08-20 09:52:53

[permalink] [raw]

Subject: [PATCH/RFCv4 4/6] mm: cma: Added command line parameters support

This patch adds a pair of early parameters ("cma" and
"cma.map") which let one override the CMA configuration
given by platform without the need to recompile the kernel.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
Documentation/contiguous-memory.txt | 85 ++++++++++++++++++++++--
Documentation/kernel-parameters.txt | 7 ++
mm/Kconfig | 6 ++
mm/cma.c | 125 +++++++++++++++++++++++++++++++++++
4 files changed, 218 insertions(+), 5 deletions(-)

diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
index 8d189b8..95faec1 100644
--- a/Documentation/contiguous-memory.txt
+++ b/Documentation/contiguous-memory.txt
@@ -88,6 +88,20 @@
early region and the framework will handle the rest
including choosing the right early allocator.

+ 4. CMA allows a run-time configuration of the memory regions it
+ will use to allocate chunks of memory from. The set of memory
+ regions is given on command line so it can be easily changed
+ without the need for recompiling the kernel.
+
+ Each region has it's own size, alignment demand, a start
+ address (physical address where it should be placed) and an
+ allocator algorithm assigned to the region.
+
+ This means that there can be different algorithms running at
+ the same time, if different devices on the platform have
+ distinct memory usage characteristics and different algorithm
+ match those the best way.
+
** Use cases

Let's analyse some imaginary system that uses the CMA to see how
@@ -162,7 +176,6 @@
This solution also shows how with CMA you can assign private pools
of memory to each device if that is required.

-
Allocation mechanisms can be replaced dynamically in a similar
manner as well. Let's say that during testing, it has been
discovered that, for a given shared region of 40 MiB,
@@ -217,6 +230,42 @@
it will be set to a PAGE_SIZE. start will be aligned to
alignment.

+ If command line parameter support is enabled, this attribute can
+ also be overriden by a command line "cma" parameter. When given
+ on command line its forrmat is as follows:
+
+ regions-attr ::= [ regions [ ';' ] ]
+ regions ::= region [ ';' regions ]
+
+ region ::= REG-NAME
+ '=' size
+ [ '@' start ]
+ [ '/' alignment ]
+ [ ':' ALLOC-NAME ]
+
+ size ::= MEMSIZE // size of the region
+ start ::= MEMSIZE // desired start address of
+ // the region
+ alignment ::= MEMSIZE // alignment of the start
+ // address of the region
+
+ REG-NAME specifies the name of the region. All regions given at
+ via the regions attribute need to have a name. Moreover, all
+ regions need to have a unique name. If two regions have the same
+ name it is unspecified which will be used when requesting to
+ allocate memory from region with given name.
+
+ ALLOC-NAME specifies the name of allocator to be used with the
+ region. If no allocator name is provided, the "default"
+ allocator will be used with the region. The "default" allocator
+ is, of course, the first allocator that has been registered. ;)
+
+ size, start and alignment are specified in bytes with suffixes
+ that memparse() accept. If start is given, the region will be
+ reserved on given starting address (or at close to it as
+ possible). If alignment is specified, the region will be aligned
+ to given value.
+
**** Map

The format of the "map" attribute is as follows:
@@ -260,8 +309,33 @@
SysFS and can be changed at run-time by writing to
/sys/kernel/mm/contiguous/map.

+ If command line parameter support is enabled, this attribute can
+ also be overriden by a command line "cma.map" parameter.
+
+**** Examples
+
Some examples (whitespace added for better readability):

+ cma = r1 = 64M // 64M region
+ @512M // starting at address 512M
+ // (or at least as near as possible)
+ /1M // make sure it's aligned to 1M
+ :foo(bar); // uses allocator "foo" with "bar"
+ // as parameters for it
+ r2 = 64M // 64M region
+ /1M; // make sure it's aligned to 1M
+ // uses the first available allocator
+ r3 = 64M // 64M region
+ @512M // starting at address 512M
+ :foo; // uses allocator "foo" with no parameters
+
+ cma_map = foo = r1;
+ // device foo with kind==NULL uses region r1
+
+ foo/quaz = r2; // OR:
+ /quaz = r2;
+ // device foo with kind == "quaz" uses region r2
+
cma_map = foo/quaz = r1;
// device foo with type == "quaz" uses region r1

@@ -526,10 +600,11 @@

int cma_set_defaults(struct cma_region *regions, const char *map)

- It needs to be called prior to reserving regions. It let one
- specify the list of regions defined by platform and the map
- attribute. The map may point to a string in __initdata. See
- above in this document for example usage of this function.
+ It needs to be called after early params have been parsed but
+ prior to reserving regions. It let one specify the list of
+ regions defined by platform and the map attribute. The map may
+ point to a string in __initdata. See above in this document for
+ example usage of this function.

** Future work

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index cf81298..db53b76 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -43,6 +43,7 @@ parameter is applicable:
AVR32 AVR32 architecture is enabled.
AX25 Appropriate AX.25 support is enabled.
BLACKFIN Blackfin architecture is enabled.
+ CMA Contiguous Memory Allocator is enabled.
DRM Direct Rendering Management support is enabled.
EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
EFI EFI Partitioning (GPT) is enabled
@@ -477,6 +478,12 @@ and is between 256 and 4096 characters. It is defined in the file
Also note the kernel might malfunction if you disable
some critical bits.

+ cma= [CMA] List of CMA regions.
+ See Documentation/contiguous-memory.txt for details.
+
+ cma.map= [CMA] CMA mapping
+ See Documentation/contiguous-memory.txt for details.
+
cmo_free_hint= [PPC] Format: { yes | no }
Specify whether pages are marked as being inactive
when they are freed. This is used in CMO environments
diff --git a/mm/Kconfig b/mm/Kconfig
index ac0bb08..05404fc 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -339,6 +339,12 @@ config CMA_SYSFS
Enable support for SysFS interface.

config CMA_CMDLINE
+ bool "CMA command line parameters support"
+ depends on CMA_DEVELOPEMENT
+ help
+ Enable support for cma, cma.map and cma.asterisk command line
+ parameters.
+
config CMA_BEST_FIT
bool "CMA best-fit allocator"
depends on CMA
diff --git a/mm/cma.c b/mm/cma.c
index 561d817..33d48d8 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -103,6 +103,12 @@ static int __init cma_map_param(char *param)
return 0;
}

+#if defined CONFIG_CMA_CMDLINE
+
+early_param("cma.map", cma_map_param);
+
+#endif
+

/************************* Early regions *************************/
@@ -110,6 +116,125 @@ static int __init cma_map_param(char *param)
struct list_head cma_early_regions __initdata =
LIST_HEAD_INIT(cma_early_regions);

+#ifdef CONFIG_CMA_CMDLINE
+
+/*
+ * regions-attr ::= [ regions [ ';' ] ]
+ * regions ::= region [ ';' regions ]
+ *
+ * region ::= [ '-' ] reg-name
+ * '=' size
+ * [ '@' start ]
+ * [ '/' alignment ]
+ * [ ':' alloc-name ]
+ *
+ * See Documentation/contiguous-memory.txt for details.
+ *
+ * Example:
+ * cma=reg1=64M:bf;reg2=32M@0x100000:bf;reg3=64M/1M:bf
+ *
+ * If allocator is ommited the first available allocater will be used.
+ */
+
+#define NUMPARSE(cond_ch, type, cond) ({ \
+ unsigned long long v = 0; \
+ if (*param == (cond_ch)) { \
+ const char *const msg = param + 1; \
+ v = memparse(msg, &param); \
+ if (!v || v > ~(type)0 || !(cond)) { \
+ pr_err("param: invalid value near %s\n", msg); \
+ ret = -EINVAL; \
+ break; \
+ } \
+ } \
+ v; \
+ })
+
+static int __init cma_param_parse(char *param)
+{
+ static struct cma_region regions[16];
+
+ size_t left = ARRAY_SIZE(regions);
+ struct cma_region *reg = regions;
+ int ret = 0;
+
+ pr_debug("param: %s\n", param);
+
+ for (; *param; ++reg) {
+ dma_addr_t start, alignment;
+ size_t size;
+
+ if (unlikely(!--left)) {
+ pr_err("param: too many early regions\n");
+ return -ENOSPC;
+ }
+
+ /* Parse name */
+ reg->name = param;
+ param = strchr(param, '=');
+ if (!param || param == reg->name) {
+ pr_err("param: expected \"<name>=\" near %s\n",
+ reg->name);
+ ret = -EINVAL;
+ break;
+ }
+ *param = '\0';
+
+ /* Parse numbers */
+ size = NUMPARSE('\0', size_t, true);
+ start = NUMPARSE('@', dma_addr_t, true);
+ alignment = NUMPARSE('/', dma_addr_t, (v & (v - 1)) == 0);
+
+ alignment = max(alignment, (dma_addr_t)PAGE_SIZE);
+ start = ALIGN(start, alignment);
+ size = PAGE_ALIGN(size);
+ if (start + size < start) {
+ pr_err("param: invalid start, size combination\n");
+ ret = -EINVAL;
+ break;
+ }
+
+ /* Parse allocator */
+ if (*param == ':') {
+ reg->alloc_name = ++param;
+ while (*param && *param != ';')
+ ++param;
+ if (param == reg->alloc_name)
+ reg->alloc_name = NULL;
+ }
+
+ /* Go to next */
+ if (*param == ';') {
+ *param = '\0';
+ ++param;
+ } else if (*param) {
+ pr_err("param: expecting ';' or end of parameter near %s\n",
+ param);
+ ret = -EINVAL;
+ break;
+ }
+
+ /* Add */
+ reg->size = size;
+ reg->start = start;
+ reg->alignment = alignment;
+ reg->copy_name = 1;
+
+ list_add_tail(&reg->list, &cma_early_regions);
+
+ pr_debug("param: registering early region %s (%p@%p/%p)\n",
+ reg->name, (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+ }
+
+ return ret;
+}
+early_param("cma", cma_param_parse);
+
+#undef NUMPARSE
+
+#endif
+

int __init __must_check cma_early_region_register(struct cma_region *reg)
{
--
1.7.1

2010-08-20 09:52:28

[permalink] [raw]

Subject: [PATCH/RFCv4 6/6] arm: Added CMA to Aquila and Goni

Added the CMA initialisation code to two Samsung platforms.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
---
arch/arm/mach-s5pv210/mach-aquila.c | 31 +++++++++++++++++++++++++++++++
arch/arm/mach-s5pv210/mach-goni.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 0dda801..3561859 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -19,6 +19,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -493,6 +494,35 @@ static void __init aquila_map_io(void)
s3c24xx_init_uarts(aquila_uartcfgs, ARRAY_SIZE(aquila_uartcfgs));
}

+static void __init aquila_reserve(void)
+{
+ static struct cma_region regions[] = {
+ {
+ .name = "fw",
+ .size = 1 << 20,
+ { .alignment = 128 << 10 },
+ },
+ {
+ .name = "b1",
+ .size = 32 << 20,
+ .asterisk = 1,
+ },
+ {
+ .name = "b2",
+ .size = 16 << 20,
+ .start = 0x40000000,
+ .asterisk = 1,
+ },
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s3c-mfc5/f=fw;s3c-mfc5/a=b1;s3c-mfc5/b=b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
static void __init aquila_machine_init(void)
{
/* PMIC */
@@ -523,4 +553,5 @@ MACHINE_START(AQUILA, "Aquila")
.map_io = aquila_map_io,
.init_machine = aquila_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = aquila_reserve,
MACHINE_END
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c
index 53754d7..edeb93f 100644
--- a/arch/arm/mach-s5pv210/mach-goni.c
+++ b/arch/arm/mach-s5pv210/mach-goni.c
@@ -19,6 +19,7 @@
#include <linux/gpio_keys.h>
#include <linux/input.h>
#include <linux/gpio.h>
+#include <linux/cma.h>

#include <asm/mach/arch.h>
#include <asm/mach/map.h>
@@ -474,6 +475,35 @@ static void __init goni_map_io(void)
s3c24xx_init_uarts(goni_uartcfgs, ARRAY_SIZE(goni_uartcfgs));
}

+static void __init goni_reserve(void)
+{
+ static struct cma_region regions[] = {
+ {
+ .name = "fw",
+ .size = 1 << 20,
+ { .alignment = 128 << 10 },
+ },
+ {
+ .name = "b1",
+ .size = 32 << 20,
+ .asterisk = 1,
+ },
+ {
+ .name = "b2",
+ .size = 16 << 20,
+ .start = 0x40000000,
+ .asterisk = 1,
+ },
+ { }
+ };
+
+ static const char map[] __initconst =
+ "s3c-mfc5/f=fw;s3c-mfc5/a=b1;s3c-mfc5/b=b2";
+
+ cma_set_defaults(regions, map);
+ cma_early_regions_reserve(NULL);
+}
+
static void __init goni_machine_init(void)
{
/* PMIC */
@@ -498,4 +528,5 @@ MACHINE_START(GONI, "GONI")
.map_io = goni_map_io,
.init_machine = goni_machine_init,
.timer = &s3c24xx_timer,
+ .reserve = goni_reserve,
MACHINE_END
--
1.7.1

2010-08-20 13:16:36

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> Hello everyone,
>
> The following patchset implements a Contiguous Memory Allocator. For
> those who have not yet stumbled across CMA an excerpt from
> documentation:
>
> The Contiguous Memory Allocator (CMA) is a framework, which allows
> setting up a machine-specific configuration for physically-contiguous
> memory management. Memory for devices is then allocated according
> to that configuration.
>
> The main role of the framework is not to allocate memory, but to
> parse and manage memory configurations, as well as to act as an
> in-between between device drivers and pluggable allocators. It is
> thus not tied to any memory allocation method or strategy.
>
> For more information please refer to the second patch from the
> patchset which contains the documentation.

So the idea is to grab a large chunk of memory at boot time and then
later allow some device to use it?

I'd much rather we'd improve the regular page allocator to be smarter
about this. We recently added a lot of smarts to it like memory
compaction, which allows large gobs of contiguous memory to be freed for
things like huge pages.

If you want guarantees you can free stuff, why not add constraints to
the page allocation type and only allow MIGRATE_MOVABLE pages inside a
certain region, those pages are easily freed/moved aside to satisfy
large contiguous allocations.

Also, please remove --chain-reply-to from your git config. You're using
1.7 which should do the right thing (--no-chain-reply-to) by default.

2010-08-25 20:35:38

by Konrad Rzeszutek Wilk

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 2/6] mm: cma: Contiguous Memory Allocator added

2010-08-25 20:39:38

by Konrad Rzeszutek Wilk

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 3/6] mm: cma: Added SysFS support

On Fri, Aug 20, 2010 at 11:50:43AM +0200, Michal Nazarewicz wrote:
> The SysFS development interface lets one change the map attribute
> at run time as well as observe what regions have been reserved.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> ---
> .../ABI/testing/sysfs-kernel-mm-contiguous | 53 +++
> Documentation/contiguous-memory.txt | 4 +
> include/linux/cma.h | 7 +
> mm/Kconfig | 18 +-
> mm/cma.c | 345 +++++++++++++++++++-
> 5 files changed, 423 insertions(+), 4 deletions(-)
> create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-contiguous
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-contiguous b/Documentation/ABI/testing/sysfs-kernel-mm-contiguous
> new file mode 100644
> index 0000000..8df15bc
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-contiguous
> @@ -0,0 +1,53 @@
> +What: /sys/kernel/mm/contiguous/
> +Date: August 2010
> +Contact: Michal Nazarewicz <[email protected]>
> +Description:
> + If CMA has been built with SysFS support,
> + /sys/kernel/mm/contiguous/ contains a file called
> + "map", a file called "allocators" and a directory
> + called "regions".
> +
> + The "map" file lets one change the CMA's map attribute
> + at run-time.
> +
> + The "allocators" file list all registered allocators.
> + Allocators with no name are listed as a single minus
> + sign.
> +
> + The "regions" directory list all reserved regions.
> +
> + For more details see
> + Documentation/contiguous-memory.txt.
> +
> +What: /sys/kernel/mm/contiguous/regions/
> +Date: August 2010
> +Contact: Michal Nazarewicz <[email protected]>
> +Description:
> + The /sys/kernel/mm/contiguous/regions/ directory
> + contain directories for each registered CMA region.
> + The name of the directory is the same as the start
> + address of the region.
> +
> + If region is named there is also a symbolic link named
> + like the region pointing to the region's directory.
> +
> + Such directory contains the following files:
> +
> + * "name" -- the name of the region or an empty file
> + * "start" -- starting address of the region (formatted
> + with %p, ie. hex).
> + * "size" -- size of the region (in bytes).
> + * "free" -- free space in the region (in bytes).
> + * "users" -- number of chunks allocated in the region.
> + * "alloc" -- name of the allocator.
> +
> + If allocator is not attached to the region, "alloc" is
> + either the name of desired allocator in square
> + brackets (ie. "[foo]") or an empty file if region is
> + to be attached to default allocator. If an allocator
> + is attached to the region. "alloc" is either its name
> + or "-" if attached allocator has no name.
> +
> + If there are no chunks allocated in given region
> + ("users" is "0") then a name of desired allocator can
> + be written to "alloc".
> diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
> index 8fc2400..8d189b8 100644
> --- a/Documentation/contiguous-memory.txt
> +++ b/Documentation/contiguous-memory.txt
> @@ -256,6 +256,10 @@
> iff it matched in previous pattern. If the second part is
> omitted it will mach any type of memory requested by device.
>
> + If SysFS support is enabled, this attribute is accessible via
> + SysFS and can be changed at run-time by writing to
> + /sys/kernel/mm/contiguous/map.
> +
> Some examples (whitespace added for better readability):
>
> cma_map = foo/quaz = r1;
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index cd63f52..eede28d 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -17,6 +17,9 @@
>
> #include <linux/rbtree.h>
> #include <linux/list.h>
> +#if defined CONFIG_CMA_SYSFS
> +# include <linux/kobject.h>
> +#endif
>
>
> struct device;
> @@ -203,6 +206,10 @@ struct cma_region {
> unsigned users;
> struct list_head list;
>
> +#if defined CONFIG_CMA_SYSFS
> + struct kobject kobj;
> +#endif
> +
> unsigned used:1;
> unsigned registered:1;
> unsigned reserved:1;
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 3e9317c..ac0bb08 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -319,12 +319,26 @@ config CMA
> To make use of CMA you need to specify the regions and
> driver->region mapping on command line when booting the kernel.
>
> -config CMA_DEBUG
> - bool "CMA debug messages (DEVELOPEMENT)"
> +config CMA_DEVELOPEMENT
> + bool "Include CMA developement features"
> depends on CMA
> help
> + This lets you enable some developement features of the CMA
> + freamework.
> +
> +config CMA_DEBUG
> + bool "CMA debug messages"
> + depends on CMA_DEVELOPEMENT
> + help
> Enable debug messages in CMA code.
>
> +config CMA_SYSFS
> + bool "CMA SysFS interface support"
> + depends on CMA_DEVELOPEMENT
> + help
> + Enable support for SysFS interface.

Whats the rationale for having those #ifdef CONFIG_CMA_SYSFS sprinkled
in the C code? Is SysFS not used on StrongARM? Why not implicitly include
the SysFS support?

2010-08-25 22:58:55

by Andrew Morton

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Fri, 20 Aug 2010 15:15:10 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> > Hello everyone,
> >
> > The following patchset implements a Contiguous Memory Allocator. For
> > those who have not yet stumbled across CMA an excerpt from
> > documentation:
> >
> > The Contiguous Memory Allocator (CMA) is a framework, which allows
> > setting up a machine-specific configuration for physically-contiguous
> > memory management. Memory for devices is then allocated according
> > to that configuration.
> >
> > The main role of the framework is not to allocate memory, but to
> > parse and manage memory configurations, as well as to act as an
> > in-between between device drivers and pluggable allocators. It is
> > thus not tied to any memory allocation method or strategy.
> >
> > For more information please refer to the second patch from the
> > patchset which contains the documentation.
>
> So the idea is to grab a large chunk of memory at boot time and then
> later allow some device to use it?
>
> I'd much rather we'd improve the regular page allocator to be smarter
> about this. We recently added a lot of smarts to it like memory
> compaction, which allows large gobs of contiguous memory to be freed for
> things like huge pages.
>
> If you want guarantees you can free stuff, why not add constraints to
> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> certain region, those pages are easily freed/moved aside to satisfy
> large contiguous allocations.

That would be good. Although I expect that the allocation would need
to be 100% rock-solid reliable, otherwise the end user has a
non-functioning device. Could generic core VM provide the required level
of service?

Anyway, these patches are going to be hard to merge but not impossible.
Keep going. Part of the problem is cultural, really: the consumers of
this interface are weird dinky little devices which the core MM guys
tend not to work with a lot, and it adds code which they wouldn't use.

I agree that having two "contiguous memory allocators" floating about
on the list is distressing. Are we really all 100% diligently certain
that there is no commonality here with Zach's work?

I agree that Peter's above suggestion would be the best thing to do.
Please let's take a look at that without getting into sunk cost
fallacies with existing code!

It would help (a lot) if we could get more attention and buyin and
fedback from the potential clients of this code. rmk's feedback is
valuable. Have we heard from the linux-media people? What other
subsystems might use it? ieee1394 perhaps? Please help identify
specific subsystems and I can perhaps help to wake people up.

And I agree that this code (or one of its alternatives!) would benefit
from having a core MM person take a close interest. Any volunteers?

Please cc me on future emails on this topic?

2010-08-25 23:26:54

by Daniel Walker

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Wed, 2010-08-25 at 15:58 -0700, Andrew Morton wrote:
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device. Could generic core VM provide the required
> level
> of service?
>
> Anyway, these patches are going to be hard to merge but not
> impossible.
> Keep going. Part of the problem is cultural, really: the consumers of
> this interface are weird dinky little devices which the core MM guys
> tend not to work with a lot, and it adds code which they wouldn't use.
>
> I agree that having two "contiguous memory allocators" floating about
> on the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?

There is some commonality with Zach's work, but Zach should be following
all of this development .. So presumably he has no issues with Michal's
changes. I think Zach's solution has a similar direction to this.

If Michal is active (he seems more so than Zach), and follows community
comments (including Zach's , but I haven't seen any) then we can defer
to that solution ..

Daniel

--

Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.

2010-08-25 23:31:29

by Jonathan Corbet

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Wed, 25 Aug 2010 15:58:14 -0700
Andrew Morton <[email protected]> wrote:

> > If you want guarantees you can free stuff, why not add constraints to
> > the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> > certain region, those pages are easily freed/moved aside to satisfy
> > large contiguous allocations.
>
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device. Could generic core VM provide the required level
> of service?

The original OLPC has a camera controller which requires three contiguous,
image-sized buffers in memory. That system is a little memory constrained
(OK, it's desperately short of memory), so, in the past, the chances of
being able to allocate those buffers anytime some kid decides to start
taking pictures was poor. Thus, cafe_ccic.c has an option to snag the
memory at initialization time and never let go even if you threaten its
family. Hell hath no fury like a little kid whose new toy^W educational
tool stops taking pictures.

That, of course, is not a hugely efficient use of memory on a
memory-constrained system. If the VM could reliably satisfy those
allocation requestss, life would be wonderful. Seems difficult. But it
would be a nicer solution than CMA, which, to a great extent, is really
just a standardized mechanism for grabbing memory and never letting go.

> It would help (a lot) if we could get more attention and buyin and
> fedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people? What other
> subsystems might use it? ieee1394 perhaps? Please help identify
> specific subsystems and I can perhaps help to wake people up.

If this code had been present when I did the Cafe driver, I would have used
it. I think it could be made useful to a number of low-end camera drivers
if the videobuf layer were made to talk to it in a way which Just Works.

With a bit of tweaking, I think it could be made useful in other
situations: the viafb driver, for example, really needs an allocator for
framebuffer memory and it seems silly to create one from scratch. Of
course, there might be other possible solutions, like adding a "zones"
concept to LMB^W memblock.

The problem which is being addressed here is real.

That said, the complexity of the solution still bugs me a bit, and the core
idea is still to take big chunks of memory out of service for specific
needs. It would be far better if the VM could just provide big chunks on
demand. Perhaps compaction and the pressures of making transparent huge
pages work will get us there, but I'm not sure we're there yet.

jon

2010-08-26 01:04:10

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Wed, 25 Aug 2010 15:58:14 -0700
Andrew Morton <[email protected]> wrote:

> On Fri, 20 Aug 2010 15:15:10 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> > > Hello everyone,
> > >
> > > The following patchset implements a Contiguous Memory Allocator. For
> > > those who have not yet stumbled across CMA an excerpt from
> > > documentation:
> > >
> > > The Contiguous Memory Allocator (CMA) is a framework, which allows
> > > setting up a machine-specific configuration for physically-contiguous
> > > memory management. Memory for devices is then allocated according
> > > to that configuration.
> > >
> > > The main role of the framework is not to allocate memory, but to
> > > parse and manage memory configurations, as well as to act as an
> > > in-between between device drivers and pluggable allocators. It is
> > > thus not tied to any memory allocation method or strategy.
> > >
> > > For more information please refer to the second patch from the
> > > patchset which contains the documentation.
> >
> > So the idea is to grab a large chunk of memory at boot time and then
> > later allow some device to use it?
> >
> > I'd much rather we'd improve the regular page allocator to be smarter
> > about this. We recently added a lot of smarts to it like memory
> > compaction, which allows large gobs of contiguous memory to be freed for
> > things like huge pages.
> >
> > If you want guarantees you can free stuff, why not add constraints to
> > the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> > certain region, those pages are easily freed/moved aside to satisfy
> > large contiguous allocations.
>
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device. Could generic core VM provide the required level
> of service?
>
> Anyway, these patches are going to be hard to merge but not impossible.
> Keep going. Part of the problem is cultural, really: the consumers of
> this interface are weird dinky little devices which the core MM guys
> tend not to work with a lot, and it adds code which they wouldn't use.
>
> I agree that having two "contiguous memory allocators" floating about
> on the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?
>
> I agree that Peter's above suggestion would be the best thing to do.
> Please let's take a look at that without getting into sunk cost
> fallacies with existing code!
>
> It would help (a lot) if we could get more attention and buyin and
> fedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people? What other
> subsystems might use it? ieee1394 perhaps? Please help identify
> specific subsystems and I can perhaps help to wake people up.
>
> And I agree that this code (or one of its alternatives!) would benefit
> from having a core MM person take a close interest. Any volunteers?
>
> Please cc me on future emails on this topic?
>

Hmm, you may not like this..but how about following kind of interface ?

Now, memoyr hotplug supports following operation to free and _isolate_
memory region.
# echo offline > /sys/devices/system/memory/memoryX/state

Then, a region of memory will be isolated. (This succeeds if there are free
memory.)

Add a new interface.

% echo offline > /sys/devices/system/memory/memoryX/state
# extract memory from System RAM and make them invisible from buddy allocator.

% echo cma > /sys/devices/system/memory/memoryX/state
# move invisible memory to cma.

Then, a chunk of memory will be moved into contiguous-memory-allocator.

To move "cma" region as usual region,
# echo offline > /sys/devices/system/memory/memoryX/state
# echo online > /sys/devices/system/memory/memoryX/state

Maybe "used-for-cma" memory are can be populated via /proc/iomem
As,
100000000-63fffffff : System RAM
640000000-800000000 : Contiguous RAM (Used for drivers)
(And you have to skip small memory holes by seeing this file)

Of course, cma guys can keep continue to use their own boot option.
With memory hotplug, kernelcore=xxxM interface can be used for creating
ZONE_MOVABLE. Some complicated work may be needed as

# echo movable > /sys/devices/system/memory/memoryX/state
(online pages and move them into ZONE_MOVABLE)

If anyone interested in, I may be able to offer some help.

Thanks,
-Kame

2010-08-26 01:22:06

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 3/6] mm: cma: Added SysFS support

On Wed, 25 Aug 2010 22:37:08 +0200, Konrad Rzeszutek Wilk <[email protected]> wrote:
> Whats the rationale for having those #ifdef CONFIG_CMA_SYSFS sprinkled
> in the C code? Is SysFS not used on StrongARM? Why not implicitly include
> the SysFS support?

The SysFS CMA interface is meant for development only and because of that
I decided to separate it form the core in a separate patch and enable it
only when explicitly requested.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 01:23:08

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 2/6] mm: cma: Contiguous Memory Allocator added

On Wed, 25 Aug 2010 22:32:37 +0200, Konrad Rzeszutek Wilk <[email protected]> wrote:

> On Fri, Aug 20, 2010 at 11:50:42AM +0200, Michal Nazarewicz wrote:
>> The Contiguous Memory Allocator framework is a set of APIs for
>> allocating physically contiguous chunks of memory.
>>
>> Various chips require contiguous blocks of memory to operate. Those
>> chips include devices such as cameras, hardware video decoders and
>> encoders, etc.
>
> I am not that familiar with how StrongARM works, and I took a bit look
> at the arch/arm/mach-s* and then some of the
> drivers/video/media/video/cx88 to get an idea how the hardware video
> decoders would work this.
>
> What I got from this patch review is that you are writting an IOMMU

No. CMA's designed for systems without IOMMU. If system has IOMMU then
there is no need for contiguous memory blocks since all discontiguousnesses
can be hidden by the IOMMU.

> that is on steroids. It essentially knows that this device and that
> device can both share the same region, and it has fancy plugin system
> to deal with fragmentation and offers an simple API for other to
> write their own "allocators".

Dunno if the plugin system is "fancy" but essentially the above is true. ;)

> Even better, during init, the sub-platform can use
> cma_early_regions_reserve(<func>) to register their own function
> for reserving large regions of memory. Which from my experience (with
> Xen) means that there is a mechanism in place to have it setup
> contingous regions using sub-platform code.

Essentially that's the idea. Platform init code adds early regions and later
on reserves memory for all of the early regions. For the former some
additional helper functions are provided which can be used.

> This is how I think it works, but I am not sure if I got it right. From
> looking at 'cma_alloc' and 'cma_alloc_from_region' - both return
> an dma_addr_t, which is what is usually feed in the DMA API. And looking
> at the cx88 driver I see it using that API..
>
> I do understand that under ARM platform you might not have a need for
> DMA at all, and you use the 'dma_addr_t' just as handles, but for
> other platforms this would be used.

In the first version I've used unsigned long as return type but then it
was suggested that maybe dma_addr_t would be better. This is easily
changed at this stage so I'd be more then happy to hear any comments.

> So here is the bit where I am confused. Why not have this
> as Software IOMMU that would utilize the IOMMU API? There would be some
> technical questions to be answered (such as, what to do when you have
> another IOMMU and can you stack them on top of each other).

If I understood you correctly this is something I'm thinking about. I'm
actually thinking of ways to integrate CMA with Zach's IOMMU proposal posted
some time ago. The idea would be to define a subset of functionalities
of the IOMMU API that would work on systems with and without hardware IOMMU.
If platform had no IOMMU CMA would be used.

I'm currently trying to fully understand Zach's proposal to see how such an
approach could be pursued.

> A light review below:

Thanks! Greatly appreciated.

>> + * cma_alloc_from - allocates contiguous chunk of memory from named regions.
>> + * @regions: Comma separated list of region names. Terminated by NUL
>
> I think you mean 'NULL'

No, a NUL byte, ie. '\0'.

>> + * byte or a semicolon.
>
> Uh, really? Why? Why not just simplify your life and make it \0?

This is a consequence of how map is stored. It's stored as a single string
with entries separated by semicolons.

>> + * The cma_allocator::alloc() operation need to set only the @start
> ^^- C++, eh?

Well, I'm unaware of a C way to reference "methods" so I just borrowed C++ style.

>> +int __init cma_early_region_reserve(struct cma_region *reg)
>> +{
[...]
>> +#ifndef CONFIG_NO_BOOTMEM
[...]
>> +#endif
>> +
>> +#ifdef CONFIG_HAVE_MEMBLOCK
[...]
>> +#endif

> Those two #ifdefs are pretty ugly. What if you defined in a header
> something along this:
>
> #ifdef CONFIG_HAVE_MEMBLOCK
> int __init default_early_region_reserve(struct cma_region *reg) {
> .. do it using memblock
> }
> #endif
> #ifdef CONFIG_NO_BOOTMEM
> int __init default_early_region_reserve(struct cma_region *reg) {
> .. do it using bootmem
> }
> #endif

I wanted the function to try all possible allocators. As a matter of fact,
both APIs (memblock and bootmem) can be supported at the same time.

> and you would cut the API by one function, the
> cma_early_regions_reserve(struct cma_region *reg)

Actually, I would prefer to leave it. It may be useful for platform
initialisation code. Especially if platform has some special regions
which are allocated in a different but for the rest wants to use the
default CMA's reserve call.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 01:23:16

by Pawel Osciak

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

Hi Andrew,

Thank you for your comments and interest in this!

On 08/26/2010 07:58 AM, Andrew Morton wrote:
> On Fri, 20 Aug 2010 15:15:10 +0200
> Peter Zijlstra<[email protected]> wrote:
>
>
>> On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
>>
>>> Hello everyone,
>>>
>>> The following patchset implements a Contiguous Memory Allocator. For
>>> those who have not yet stumbled across CMA an excerpt from
>>> documentation:
>>>
>>> The Contiguous Memory Allocator (CMA) is a framework, which allows
>>> setting up a machine-specific configuration for physically-contiguous
>>> memory management. Memory for devices is then allocated according
>>> to that configuration.
>>>
>>> The main role of the framework is not to allocate memory, but to
>>> parse and manage memory configurations, as well as to act as an
>>> in-between between device drivers and pluggable allocators. It is
>>> thus not tied to any memory allocation method or strategy.
>>>
>>> For more information please refer to the second patch from the
>>> patchset which contains the documentation.
>>>
>> So the idea is to grab a large chunk of memory at boot time and then
>> later allow some device to use it?
>>
>> I'd much rather we'd improve the regular page allocator to be smarter
>> about this. We recently added a lot of smarts to it like memory
>> compaction, which allows large gobs of contiguous memory to be freed for
>> things like huge pages.
>>
>> If you want guarantees you can free stuff, why not add constraints to
>> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
>> certain region, those pages are easily freed/moved aside to satisfy
>> large contiguous allocations.
>>
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device. Could generic core VM provide the required level
> of service?
>
> Anyway, these patches are going to be hard to merge but not impossible.
> Keep going. Part of the problem is cultural, really: the consumers of
> this interface are weird dinky little devices which the core MM guys
> tend not to work with a lot, and it adds code which they wouldn't use.
>

This is encouraging, thanks. Merging a contiguous allocator seems like a
lost cause, with a relative disinterest of non-embedded people, and on
the other hand because of the difficulty to satisfy those actually
interested. With virtually everybody having their own, custom solutions,
agreeing on one is nearly impossible.

> I agree that having two "contiguous memory allocators" floating about
> on the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?
>

I think Zach's work is more focused on IOMMU and on unifying virtual
memory handling. As far as I understand, any physical allocator can be
plugged into it, including CMA. CMA solves a different set of problems.

> I agree that Peter's above suggestion would be the best thing to do.
> Please let's take a look at that without getting into sunk cost
> fallacies with existing code!
>
> It would help (a lot) if we could get more attention and buyin and
> fedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people? What other
> subsystems might use it? ieee1394 perhaps? Please help identify
> specific subsystems and I can perhaps help to wake people up.
>

As a media developer myself, I talked with people and many have
expressed their interest. Among them were developers from ST-Ericsson,
Intel and TI, to name a few. Their SoCs, like ours at Samsung, require
contiguous memory allocation schemes as well.

I am working on a driver framework for media for memory management (on
the logical, not physical level). One of the goals is to allow plugging
in custom allocators and memory handling functions (cache management,
etc.). CMA is intended to be used as one of the pluggable allocators for
it. Right now, many media drivers have to provide their own, more or
less complicated, memory handling, which is of course undesirable. Some
of those make it to the kernel, many are maintained outside the mainline.

The problem is that, as far as I am aware, there have already been quite
a few proposals for such allocators and none made it to the mainline. So
companies develop their own solutions and maintain them outside the
mainline.

I think that the interest is definitely there, but people have their
deadlines and assume that it is close to impossible to have a contiguous
allocator merged.

Your help and support would be very much appreciated. Working in
embedded Linux for some time now, I feel that the need is definitely
there and is quite substantial.

--
Best regards,
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center

2010-08-26 01:29:17

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Fri, 20 Aug 2010 15:15:10 +0200, Peter Zijlstra <[email protected]> wrote:
> So the idea is to grab a large chunk of memory at boot time and then
> later allow some device to use it?
>
> I'd much rather we'd improve the regular page allocator to be smarter
> about this. We recently added a lot of smarts to it like memory
> compaction, which allows large gobs of contiguous memory to be freed for
> things like huge pages.
>
> If you want guarantees you can free stuff, why not add constraints to
> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> certain region, those pages are easily freed/moved aside to satisfy
> large contiguous allocations.

I'm aware that grabbing a large chunk at boot time is a bit of waste of
space and because of it I'm hoping to came up with a way of reusing the
space when it's not used by CMA-aware devices. My current idea was to
use it for easily discardable data (page cache?).

> Also, please remove --chain-reply-to from your git config. You're using
> 1.7 which should do the right thing (--no-chain-reply-to) by default.

OK.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 01:38:46

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 01:26:34 +0200, Daniel Walker <[email protected]> wrote:
> If Michal is active, and follows community comments (including Zach's,
> but I haven't seen any) then we can defer to that solution ..

Comments are always welcome. :)

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 01:39:21

by Pawel Osciak

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On 08/26/2010 08:31 AM, Jonathan Corbet wrote:
> On Wed, 25 Aug 2010 15:58:14 -0700
> Andrew Morton<[email protected]> wrote:
>
>
>>> If you want guarantees you can free stuff, why not add constraints to
>>> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
>>> certain region, those pages are easily freed/moved aside to satisfy
>>> large contiguous allocations.
>>>
>> That would be good. Although I expect that the allocation would need
>> to be 100% rock-solid reliable, otherwise the end user has a
>> non-functioning device. Could generic core VM provide the required level
>> of service?
>>
> The original OLPC has a camera controller which requires three contiguous,
> image-sized buffers in memory. That system is a little memory constrained
> (OK, it's desperately short of memory), so, in the past, the chances of
> being able to allocate those buffers anytime some kid decides to start
> taking pictures was poor. Thus, cafe_ccic.c has an option to snag the
> memory at initialization time and never let go even if you threaten its
> family. Hell hath no fury like a little kid whose new toy^W educational
> tool stops taking pictures.
>
> That, of course, is not a hugely efficient use of memory on a
> memory-constrained system. If the VM could reliably satisfy those
> allocation requestss, life would be wonderful. Seems difficult. But it
> would be a nicer solution than CMA, which, to a great extent, is really
> just a standardized mechanism for grabbing memory and never letting go.
>

The main problem is of course fragmentation, for this there is no
solution in CMA. It has a feature intended to at least reduce memory
usage though, if only a little bit. It is region sharing. It allows
platform architects to define regions shared by more than one driver, as
explained by Michal in the RFC. So we can at least try to reuse each
chunk of memory as much as possible and not hold separate regions for
each driver when they are not intended to work simultaneously. Not a
silver bullet, but is there any though?

>> It would help (a lot) if we could get more attention and buyin and
>> fedback from the potential clients of this code. rmk's feedback is
>> valuable. Have we heard from the linux-media people? What other
>> subsystems might use it? ieee1394 perhaps? Please help identify
>> specific subsystems and I can perhaps help to wake people up.
>>
> If this code had been present when I did the Cafe driver, I would have used
> it. I think it could be made useful to a number of low-end camera drivers
> if the videobuf layer were made to talk to it in a way which Just Works.
>

I am working on new videobuf which will (hopefully) Just Work. CMA is
intended to be pluggable into it, as should be any other allocator for
that matter.

--

Best regards,
Pawel Osciak
Linux Platform Group
Samsung Poland R&D Center

2010-08-26 01:49:44

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 01:31:25 +0200, Jonathan Corbet <[email protected]> wrote:
> The original OLPC has a camera controller which requires three contiguous,
> image-sized buffers in memory. That system is a little memory constrained
> (OK, it's desperately short of memory), so, in the past, the chances of
> being able to allocate those buffers anytime some kid decides to start
> taking pictures was poor. Thus, cafe_ccic.c has an option to snag the
> memory at initialization time and never let go even if you threaten its
> family. Hell hath no fury like a little kid whose new toy^W educational
> tool stops taking pictures.
>
> That, of course, is not a hugely efficient use of memory on a
> memory-constrained system. If the VM could reliably satisfy those
> allocation requestss, life would be wonderful. Seems difficult. But it
> would be a nicer solution than CMA, which, to a great extent, is really
> just a standardized mechanism for grabbing memory and never letting go.

At this moment it seems nothing more then that but they way I see it
is that with a common, standardised, centrally-managed mechanism for
grabbing memory we can start thinking about the ways to reuse the memory.

If each driver were to grab it's own memory in a way know to itself only
the memory is truly lost but with CMA not only regions can be reused among
devices but also the framework can manage the unallocated memory and try to
utilize it in other ways (movable pages? cache? buffers? some kind of
compressed memory swap?).

What I'm trying to say is that I totally agree with your and other's comments
about CMA essentially grabbing memory and never releasing it but I believe
this can be combat with time when overall idea of haw the CMA API should look
like is agreed upon.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 02:13:15

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 02:58:57 +0200, KAMEZAWA Hiroyuki <[email protected]> wrote:
> Hmm, you may not like this..but how about following kind of interface ?
>
> Now, memoyr hotplug supports following operation to free and _isolate_
> memory region.
> # echo offline > /sys/devices/system/memory/memoryX/state
>
> Then, a region of memory will be isolated. (This succeeds if there are free
> memory.)
>
> Add a new interface.
>
> % echo offline > /sys/devices/system/memory/memoryX/state
> # extract memory from System RAM and make them invisible from buddy allocator.
>
> % echo cma > /sys/devices/system/memory/memoryX/state
> # move invisible memory to cma.

At this point I need to say that I have no experience with hotplug memory but
I think that for this to make sense the regions of memory would have to be
smaller. Unless I'm misunderstanding something, the above would convert
a region of sizes in order of GiBs to use for CMA.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 02:41:41

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

Hello Andrew,

I think Pawel has replied to most of your comments, so I'll just add my own
0.02 KRW. ;)

> Peter Zijlstra <[email protected]> wrote:
>> So the idea is to grab a large chunk of memory at boot time and then
>> later allow some device to use it?
>>
>> I'd much rather we'd improve the regular page allocator to be smarter
>> about this. We recently added a lot of smarts to it like memory
>> compaction, which allows large gobs of contiguous memory to be freed for
>> things like huge pages.
>>
>> If you want guarantees you can free stuff, why not add constraints to
>> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
>> certain region, those pages are easily freed/moved aside to satisfy
>> large contiguous allocations.

On Thu, 26 Aug 2010 00:58:14 +0200, Andrew Morton <[email protected]> wrote:
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device. Could generic core VM provide the required level
> of service?

I think that the biggest problem is fragmentation here. For instance,
I think that a situation where there is enough free space but it's
fragmented so no single contiguous chunk can be allocated is a serious
problem. However, I would argue that if there's simply no space left,
a multimedia device could fail and even though it's not desirable, it
would not be such a big issue in my eyes.

So, if only movable or discardable pages are allocated in CMA managed
regions all should work well. When a device needs memory discardable
pages would get freed and movable moved unless there is no space left
on the device in which case allocation would fail.

Critical devices (just a hypothetical entities) could have separate
regions on which only discardable pages can be allocated so that memory
can always be allocated for them.

> I agree that having two "contiguous memory allocators" floating about
> on the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?

As Pawel said, I think Zach's trying to solve a different problem. No
matter, as I've said in response to Konrad's message, I have thought
about unifying Zach's IOMMU and CMA in such a way that devices could
work on both systems with and without IOMMU if only they would limit
the usage of the API to some subset which always works.

> Please cc me on future emails on this topic?

Not a problem.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 02:49:30

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 8:31 AM, Jonathan Corbet <[email protected]> wrote:
> On Wed, 25 Aug 2010 15:58:14 -0700
> Andrew Morton <[email protected]> wrote:
>
>> > If you want guarantees you can free stuff, why not add constraints to
>> > the page allocation type and only allow MIGRATE_MOVABLE pages inside a
>> > certain region, those pages are easily freed/moved aside to satisfy
>> > large contiguous allocations.
>>
>> That would be good. ?Although I expect that the allocation would need
>> to be 100% rock-solid reliable, otherwise the end user has a
>> non-functioning device. ?Could generic core VM provide the required level
>> of service?
>
> The original OLPC has a camera controller which requires three contiguous,
> image-sized buffers in memory. ?That system is a little memory constrained
> (OK, it's desperately short of memory), so, in the past, the chances of
> being able to allocate those buffers anytime some kid decides to start
> taking pictures was poor. ?Thus, cafe_ccic.c has an option to snag the
> memory at initialization time and never let go even if you threaten its
> family. ?Hell hath no fury like a little kid whose new toy^W educational
> tool stops taking pictures.
>
> That, of course, is not a hugely efficient use of memory on a
> memory-constrained system. ?If the VM could reliably satisfy those
> allocation requestss, life would be wonderful. ?Seems difficult. ?But it
> would be a nicer solution than CMA, which, to a great extent, is really
> just a standardized mechanism for grabbing memory and never letting go.
>
>> It would help (a lot) if we could get more attention and buyin and
>> fedback from the potential clients of this code. ?rmk's feedback is
>> valuable. ?Have we heard from the linux-media people? ?What other
>> subsystems might use it? ?ieee1394 perhaps? ?Please help identify
>> specific subsystems and I can perhaps help to wake people up.
>
> If this code had been present when I did the Cafe driver, I would have used
> it. ?I think it could be made useful to a number of low-end camera drivers
> if the videobuf layer were made to talk to it in a way which Just Works.
>
> With a bit of tweaking, I think it could be made useful in other
> situations: the viafb driver, for example, really needs an allocator for
> framebuffer memory and it seems silly to create one from scratch. ?Of
> course, there might be other possible solutions, like adding a "zones"
> concept to LMB^W memblock.
>
> The problem which is being addressed here is real.
>
> That said, the complexity of the solution still bugs me a bit, and the core
> idea is still to take big chunks of memory out of service for specific
> needs. ?It would be far better if the VM could just provide big chunks on
> demand. ?Perhaps compaction and the pressures of making transparent huge
> pages work will get us there, but I'm not sure we're there yet.
>
> jon

I agree. compaction and movable zone will be one of good solutions.

If some driver needs big contiguous chunk to work, it should make sure
to be allowable to have memory size for it before going. To make sure
it, we have to consider compaction of ZONE_MOVABLE zone. But one of
problems is anonymous page which can be has a role of pinned page in
non-swapsystem. Even most of embedded system has no swap.
But it's not hard to solve it.

We needs Mel's opinion, too.

--
Kind regards,
Minchan Kim

2010-08-26 02:55:32

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 04:12:10 +0200
Michał Nazarewicz <[email protected]> wrote:

> On Thu, 26 Aug 2010 02:58:57 +0200, KAMEZAWA Hiroyuki <[email protected]> wrote:
> > Hmm, you may not like this..but how about following kind of interface ?
> >
> > Now, memoyr hotplug supports following operation to free and _isolate_
> > memory region.
> > # echo offline > /sys/devices/system/memory/memoryX/state
> >
> > Then, a region of memory will be isolated. (This succeeds if there are free
> > memory.)
> >
> > Add a new interface.
> >
> > % echo offline > /sys/devices/system/memory/memoryX/state
> > # extract memory from System RAM and make them invisible from buddy allocator.
> >
> > % echo cma > /sys/devices/system/memory/memoryX/state
> > # move invisible memory to cma.
>
> At this point I need to say that I have no experience with hotplug memory but
> I think that for this to make sense the regions of memory would have to be
> smaller. Unless I'm misunderstanding something, the above would convert
> a region of sizes in order of GiBs to use for CMA.
>

Now, x86's section size is
==
#ifdef CONFIG_X86_32
# ifdef CONFIG_X86_PAE
# define SECTION_SIZE_BITS 29
# define MAX_PHYSADDR_BITS 36
# define MAX_PHYSMEM_BITS 36
# else
# define SECTION_SIZE_BITS 26
# define MAX_PHYSADDR_BITS 32
# define MAX_PHYSMEM_BITS 32
# endif
#else /* CONFIG_X86_32 */
# define SECTION_SIZE_BITS 27 /* matt - 128 is convenient right now */
# define MAX_PHYSADDR_BITS 44
# define MAX_PHYSMEM_BITS 46
#endif
==

128MB...too big ? But it's depend on config.

IBM's ppc guys used 16MB section, and recently, a new interface to shrink
the number of /sys files are added, maybe usable.

Something good with this approach will be you can create "cma" memory
before installing driver.

But yes, complicated and need some works.

Bye,
-Kame

2010-08-26 03:04:46

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 11:49 AM, Minchan Kim <[email protected]> wrote:
> On Thu, Aug 26, 2010 at 8:31 AM, Jonathan Corbet <[email protected]> wrote:
>> On Wed, 25 Aug 2010 15:58:14 -0700
>> Andrew Morton <[email protected]> wrote:
>>
>>> > If you want guarantees you can free stuff, why not add constraints to
>>> > the page allocation type and only allow MIGRATE_MOVABLE pages inside a
>>> > certain region, those pages are easily freed/moved aside to satisfy
>>> > large contiguous allocations.
>>>
>>> That would be good. ?Although I expect that the allocation would need
>>> to be 100% rock-solid reliable, otherwise the end user has a
>>> non-functioning device. ?Could generic core VM provide the required level
>>> of service?
>>
>> The original OLPC has a camera controller which requires three contiguous,
>> image-sized buffers in memory. ?That system is a little memory constrained
>> (OK, it's desperately short of memory), so, in the past, the chances of
>> being able to allocate those buffers anytime some kid decides to start
>> taking pictures was poor. ?Thus, cafe_ccic.c has an option to snag the
>> memory at initialization time and never let go even if you threaten its
>> family. ?Hell hath no fury like a little kid whose new toy^W educational
>> tool stops taking pictures.
>>
>> That, of course, is not a hugely efficient use of memory on a
>> memory-constrained system. ?If the VM could reliably satisfy those
>> allocation requestss, life would be wonderful. ?Seems difficult. ?But it
>> would be a nicer solution than CMA, which, to a great extent, is really
>> just a standardized mechanism for grabbing memory and never letting go.
>>
>>> It would help (a lot) if we could get more attention and buyin and
>>> fedback from the potential clients of this code. ?rmk's feedback is
>>> valuable. ?Have we heard from the linux-media people? ?What other
>>> subsystems might use it? ?ieee1394 perhaps? ?Please help identify
>>> specific subsystems and I can perhaps help to wake people up.
>>
>> If this code had been present when I did the Cafe driver, I would have used
>> it. ?I think it could be made useful to a number of low-end camera drivers
>> if the videobuf layer were made to talk to it in a way which Just Works.
>>
>> With a bit of tweaking, I think it could be made useful in other
>> situations: the viafb driver, for example, really needs an allocator for
>> framebuffer memory and it seems silly to create one from scratch. ?Of
>> course, there might be other possible solutions, like adding a "zones"
>> concept to LMB^W memblock.
>>
>> The problem which is being addressed here is real.
>>
>> That said, the complexity of the solution still bugs me a bit, and the core
>> idea is still to take big chunks of memory out of service for specific
>> needs. ?It would be far better if the VM could just provide big chunks on
>> demand. ?Perhaps compaction and the pressures of making transparent huge
>> pages work will get us there, but I'm not sure we're there yet.
>>
>> jon
>
> I agree. compaction and movable zone will be one of good solutions.
>
> If some driver needs big contiguous chunk to work, it should make sure
> to be allowable to have memory size for it before going. To make sure
> it, we have to consider compaction of ZONE_MOVABLE zone. But one of
> problems is anonymous page which can be has a role of pinned page in
> non-swapsystem. Even most of embedded system has no swap.
> But it's not hard to solve it.
>
> We needs Mel's opinion, too.

I elaborates my statement for preventing confusing due to using _pinned page_.
I means that anon pages isn't not a fragment problem but space problem
for the devices.
It would be better to move the pages into !ZONE_MOVABLE zone.

>
> --
> Kind regards,
> Minchan Kim
>

--
Kind regards,
Minchan Kim

2010-08-26 03:49:44

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 11:50:17 +0900
KAMEZAWA Hiroyuki <[email protected]> wrote:

> 128MB...too big ? But it's depend on config.
>
> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
> the number of /sys files are added, maybe usable.
>
> Something good with this approach will be you can create "cma" memory
> before installing driver.
>
> But yes, complicated and need some works.
>
Ah, I need to clarify what I want to say.

With compaction, it's helpful, but you can't get contiguous memory larger
than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
memory hot-plug code has almost all necessary things.

you may able to add
# echo 0xa0000000-0xa80000000 > /sys/devices/system/memory/cma

to get contiguous isolated memory.

BTW, just curious...the memory for cma need not to be saved at
hibernation ? Or drivers has to write its own hibernation ops by driver suspend
udev or some ?

Thanks,
-Kame

2010-08-26 04:03:04

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

KAMEZAWA Hiroyuki <[email protected]> wrote:
> 128MB...too big ? But it's depend on config.

On embedded systems it may be like half of the RAM. Or a quarter. So bigger
granularity could be desired on some platforms.

> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
> the number of /sys files are added, maybe usable.
>
> Something good with this approach will be you can create "cma" memory
> before installing driver.

That's how CMA works at the moment. But if I understand you correctly, what
you are proposing would allow to reserve memory *at* *runtime* long after system
has booted. This would be a nice feature as well though.

> But yes, complicated and need some works.

> Ah, I need to clarify what I want to say.
>
> With compaction, it's helpful, but you can't get contiguous memory larger
> than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
> memory hot-plug code has almost all necessary things.

I'll try to look at it then.

> BTW, just curious...the memory for cma need not to be saved at
> hibernation ? Or drivers has to write its own hibernation ops by driver suspend
> udev or some ?

Hibernation was not considered as of yet but I think it's device driver's
responsibility more then CMA's especially since it may make little sense to save
some of the buffers -- ie. no need to keep a frame from camera since it'll be
overwritten just after system wakes up from hibernation. It may also be better
to stop playback and resume it later on rather than trying to save decoder's
state. Again though, I haven't thought about hibernation as of yet.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-26 04:06:31

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Thu, 26 Aug 2010 11:50:17 +0900
> KAMEZAWA Hiroyuki <[email protected]> wrote:
>
>> 128MB...too big ? But it's depend on config.
>>
>> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
>> the number of /sys files are added, maybe usable.
>>
>> Something good with this approach will be you can create "cma" memory
>> before installing driver.
>>
>> But yes, complicated and need some works.
>>
> Ah, I need to clarify what I want to say.
>
> With compaction, it's helpful, but you can't get contiguous memory larger
> than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
> memory hot-plug code has almost all necessary things.

True. Doesn't patch's idea of Christoph helps this ?
http://lwn.net/Articles/200699/

--
Kind regards,
Minchan Kim

2010-08-26 04:14:11

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 1:06 PM, Minchan Kim <[email protected]> wrote:
> On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
>> On Thu, 26 Aug 2010 11:50:17 +0900
>> KAMEZAWA Hiroyuki <[email protected]> wrote:
>>
>>> 128MB...too big ? But it's depend on config.
>>>
>>> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
>>> the number of /sys files are added, maybe usable.
>>>
>>> Something good with this approach will be you can create "cma" memory
>>> before installing driver.
>>>
>>> But yes, complicated and need some works.
>>>
>> Ah, I need to clarify what I want to say.
>>
>> With compaction, it's helpful, but you can't get contiguous memory larger
>> than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
>> memory hot-plug code has almost all necessary things.
>
> True. Doesn't patch's idea of Christoph helps this ?
> http://lwn.net/Articles/200699/

Of course, It itself can't meet our requirement but idea of range
allocation seem to be good.
I think it can be start point.

>
> --
> Kind regards,
> Minchan Kim
>

--
Kind regards,
Minchan Kim

2010-08-26 04:35:53

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 13:06:28 +0900
Minchan Kim <[email protected]> wrote:

> On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Thu, 26 Aug 2010 11:50:17 +0900
> > KAMEZAWA Hiroyuki <[email protected]> wrote:
> >
> >> 128MB...too big ? But it's depend on config.
> >>
> >> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
> >> the number of /sys files are added, maybe usable.
> >>
> >> Something good with this approach will be you can create "cma" memory
> >> before installing driver.
> >>
> >> But yes, complicated and need some works.
> >>
> > Ah, I need to clarify what I want to say.
> >
> > With compaction, it's helpful, but you can't get contiguous memory larger
> > than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
> > memory hot-plug code has almost all necessary things.
>
> True. Doesn't patch's idea of Christoph helps this ?
> http://lwn.net/Articles/200699/
>

yes, I think so. But, IIRC, it's own purpose of Chirstoph's work is
for removing zones. please be careful what's really necessary.

Thanks,
-Kame

2010-08-26 04:45:12

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 06:01:56 +0200
Michał Nazarewicz <[email protected]> wrote:

> KAMEZAWA Hiroyuki <[email protected]> wrote:
> > 128MB...too big ? But it's depend on config.
>
> On embedded systems it may be like half of the RAM. Or a quarter. So bigger
> granularity could be desired on some platforms.
>
> > IBM's ppc guys used 16MB section, and recently, a new interface to shrink
> > the number of /sys files are added, maybe usable.
> >
> > Something good with this approach will be you can create "cma" memory
> > before installing driver.
>
> That's how CMA works at the moment. But if I understand you correctly, what
> you are proposing would allow to reserve memory *at* *runtime* long after system
> has booted. This would be a nice feature as well though.
>
Yes, my proposal is that.

> > But yes, complicated and need some works.
>
> > Ah, I need to clarify what I want to say.
> >
> > With compaction, it's helpful, but you can't get contiguous memory larger
> > than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
> > memory hot-plug code has almost all necessary things.
>
> I'll try to look at it then.
>

mm/memory_hotplug.c::offline_pages() does

1. disallow new allocation of memory in [start_pfn...end_pfn)
2. move all LRU pages to other regions than [start_pfn...end_pfn)
3. finally, mark all pages as PG_reserved (see __offline_isolated_pages())

What's required for cma will be
a. remove _section_ limitation, which is done as BUG_ON().
b. replace 'step 3' with cma code.

Maybe you can do similar just using compaction logic. The biggest difference will
be 'step 1'.

> > BTW, just curious...the memory for cma need not to be saved at
> > hibernation ? Or drivers has to write its own hibernation ops by driver suspend
> > udev or some ?
>
> Hibernation was not considered as of yet but I think it's device driver's
> responsibility more then CMA's especially since it may make little sense to save
> some of the buffers -- ie. no need to keep a frame from camera since it'll be
> overwritten just after system wakes up from hibernation. It may also be better
> to stop playback and resume it later on rather than trying to save decoder's
> state. Again though, I haven't thought about hibernation as of yet.
>

Hmm, ok, use-case dependent and it's a job of a driver.

Thanks,
-Kame

2010-08-26 05:50:12

by Cong Wang

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 06:01:56AM +0200, Michał Nazarewicz wrote:
>KAMEZAWA Hiroyuki <[email protected]> wrote:
>>128MB...too big ? But it's depend on config.
>
>On embedded systems it may be like half of the RAM. Or a quarter. So bigger
>granularity could be desired on some platforms.
>
>>IBM's ppc guys used 16MB section, and recently, a new interface to shrink
>>the number of /sys files are added, maybe usable.
>>
>>Something good with this approach will be you can create "cma" memory
>>before installing driver.
>
>That's how CMA works at the moment. But if I understand you correctly, what
>you are proposing would allow to reserve memory *at* *runtime* long after system
>has booted. This would be a nice feature as well though.
>

Yeah, if we can do this, that will avoid rebooting for kdump to reserve
memory.

Thanks.

2010-08-26 06:26:11

[permalink] [raw]

Subject: [PATCH/RFCv4.1 2/6] mm: cma: Contiguous Memory Allocator added

The Contiguous Memory Allocator framework is a set of APIs for
allocating physically contiguous chunks of memory.

Various chips require contiguous blocks of memory to operate. Those
chips include devices such as cameras, hardware video decoders and
encoders, etc.

The code is highly modular and customisable to suit the needs of
various users. Set of regions reserved for CMA can be configured
per-platform and it is easy to add custom allocator algorithms if one
has such need.

Signed-off-by: Michal Nazarewicz <[email protected]>
Signed-off-by: Kyungmin Park <[email protected]>
Reviewed-by: Pawel Osciak <[email protected]>
---

Just a quick bugfix (i n certain conditions, CMA went into infinite
loop) and update in response to Konrad's comments.

Documentation/00-INDEX | 2 +
Documentation/contiguous-memory.txt | 544 +++++++++++++++++++++
include/linux/cma.h | 432 +++++++++++++++++
mm/Kconfig | 34 ++
mm/Makefile | 2 +
mm/cma-best-fit.c | 407 ++++++++++++++++
mm/cma.c | 911 +++++++++++++++++++++++++++++++++++
7 files changed, 2332 insertions(+), 0 deletions(-)
create mode 100644 Documentation/contiguous-memory.txt
create mode 100644 include/linux/cma.h
create mode 100644 mm/cma-best-fit.c
create mode 100644 mm/cma.c

diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 8dfc670..f93e787 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -94,6 +94,8 @@ connector/
- docs on the netlink based userspace<->kernel space communication mod.
console/
- documentation on Linux console drivers.
+contiguous-memory.txt
+ - documentation on physically-contiguous memory allocation framework.
cpu-freq/
- info on CPU frequency and voltage scaling.
cpu-hotplug.txt
diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
new file mode 100644
index 0000000..cc43440
--- /dev/null
+++ b/Documentation/contiguous-memory.txt
@@ -0,0 +1,544 @@
+ -*- org -*-
+
+* Contiguous Memory Allocator
+
+ The Contiguous Memory Allocator (CMA) is a framework, which allows
+ setting up a machine-specific configuration for physically-contiguous
+ memory management. Memory for devices is then allocated according
+ to that configuration.
+
+ The main role of the framework is not to allocate memory, but to
+ parse and manage memory configurations, as well as to act as an
+ in-between between device drivers and pluggable allocators. It is
+ thus not tied to any memory allocation method or strategy.
+
+** Why is it needed?
+
+ Various devices on embedded systems have no scatter-getter and/or
+ IO map support and as such require contiguous blocks of memory to
+ operate. They include devices such as cameras, hardware video
+ decoders and encoders, etc.
+
+ Such devices often require big memory buffers (a full HD frame is,
+ for instance, more then 2 mega pixels large, i.e. more than 6 MB
+ of memory), which makes mechanisms such as kmalloc() ineffective.
+
+ Some embedded devices impose additional requirements on the
+ buffers, e.g. they can operate only on buffers allocated in
+ particular location/memory bank (if system has more than one
+ memory bank) or buffers aligned to a particular memory boundary.
+
+ Development of embedded devices have seen a big rise recently
+ (especially in the V4L area) and many such drivers include their
+ own memory allocation code. Most of them use bootmem-based methods.
+ CMA framework is an attempt to unify contiguous memory allocation
+ mechanisms and provide a simple API for device drivers, while
+ staying as customisable and modular as possible.
+
+** Design
+
+ The main design goal for the CMA was to provide a customisable and
+ modular framework, which could be configured to suit the needs of
+ individual systems. Configuration specifies a list of memory
+ regions, which then are assigned to devices. Memory regions can
+ be shared among many device drivers or assigned exclusively to
+ one. This has been achieved in the following ways:
+
+ 1. The core of the CMA does not handle allocation of memory and
+ management of free space. Dedicated allocators are used for
+ that purpose.
+
+ This way, if the provided solution does not match demands
+ imposed on a given system, one can develop a new algorithm and
+ easily plug it into the CMA framework.
+
+ The presented solution includes an implementation of a best-fit
+ algorithm.
+
+ 2. When requesting memory, devices have to introduce themselves.
+ This way CMA knows who the memory is allocated for. This
+ allows for the system architect to specify which memory regions
+ each device should use.
+
+ 3. Memory regions are grouped in various "types". When device
+ requests a chunk of memory, it can specify what type of memory
+ it needs. If no type is specified, "common" is assumed.
+
+ This makes it possible to configure the system in such a way,
+ that a single device may get memory from different memory
+ regions, depending on the "type" of memory it requested. For
+ example, a video codec driver might want to allocate some
+ shared buffers from the first memory bank and the other from
+ the second to get the highest possible memory throughput.
+
+ 4. For greater flexibility and extensibility, the framework allows
+ device drivers to register private regions of reserved memory
+ which then may be used only by them.
+
+ As an effect, if a driver would not use the rest of the CMA
+ interface, it can still use CMA allocators and other
+ mechanisms.
+
+ 4a. Early in boot process, device drivers can also request the
+ CMA framework to a reserve a region of memory for them
+ which then will be used as a private region.
+
+ This way, drivers do not need to directly call bootmem,
+ memblock or similar early allocator but merely register an
+ early region and the framework will handle the rest
+ including choosing the right early allocator.
+
+** Use cases
+
+ Let's analyse some imaginary system that uses the CMA to see how
+ the framework can be used and configured.
+
+
+ We have a platform with a hardware video decoder and a camera each
+ needing 20 MiB of memory in the worst case. Our system is written
+ in such a way though that the two devices are never used at the
+ same time and memory for them may be shared. In such a system the
+ following configuration would be used in the platform
+ initialisation code:
+
+ static struct cma_region regions[] = {
+ { .name = "region", .size = 20 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "video,camera=region";
+
+ cma_set_defaults(regions, map);
+
+ The regions array defines a single 20-MiB region named "region".
+ The map says that drivers named "video" and "camera" are to be
+ granted memory from the previously defined region.
+
+ A shorter map can be used as well:
+
+ static const char map[] __initconst = "*=region";
+
+ The asterisk ("*") matches all devices thus all devices will use
+ the region named "region".
+
+ We can see, that because the devices share the same memory region,
+ we save 20 MiB, compared to the situation when each of the devices
+ would reserve 20 MiB of memory for itself.
+
+
+ Now, let's say that we have also many other smaller devices and we
+ want them to share some smaller pool of memory. For instance 5
+ MiB. This can be achieved in the following way:
+
+ static struct cma_region regions[] = {
+ { .name = "region", .size = 20 << 20 },
+ { .name = "common", .size = 5 << 20 },
+ { }
+ }
+ static const char map[] __initconst =
+ "video,camera=region;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This instructs CMA to reserve two regions and let video and camera
+ use region "region" whereas all other devices should use region
+ "common".
+
+
+ Later on, after some development of the system, it can now run
+ video decoder and camera at the same time. The 20 MiB region is
+ no longer enough for the two to share. A quick fix can be made to
+ grant each of those devices separate regions:
+
+ static struct cma_region regions[] = {
+ { .name = "v", .size = 20 << 20 },
+ { .name = "c", .size = 20 << 20 },
+ { .name = "common", .size = 5 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "video=v;camera=c;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This solution also shows how with CMA you can assign private pools
+ of memory to each device if that is required.
+
+
+ Allocation mechanisms can be replaced dynamically in a similar
+ manner as well. Let's say that during testing, it has been
+ discovered that, for a given shared region of 40 MiB,
+ fragmentation has become a problem. It has been observed that,
+ after some time, it becomes impossible to allocate buffers of the
+ required sizes. So to satisfy our requirements, we would have to
+ reserve a larger shared region beforehand.
+
+ But fortunately, you have also managed to develop a new allocation
+ algorithm -- Neat Allocation Algorithm or "na" for short -- which
+ satisfies the needs for both devices even on a 30 MiB region. The
+ configuration can be then quickly changed to:
+
+ static struct cma_region regions[] = {
+ { .name = "region", .size = 30 << 20, .alloc_name = "na" },
+ { .name = "common", .size = 5 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "video,camera=region;*=common";
+
+ cma_set_defaults(regions, map);
+
+ This shows how you can develop your own allocation algorithms if
+ the ones provided with CMA do not suit your needs and easily
+ replace them, without the need to modify CMA core or even
+ recompiling the kernel.
+
+** Technical Details
+
+*** The attributes
+
+ As shown above, CMA is configured by a two attributes: list
+ regions and map. The first one specifies regions that are to be
+ reserved for CMA. The second one specifies what regions each
+ device is assigned to.
+
+**** Regions
+
+ Regions is a list of regions terminated by a region with size
+ equal zero. The following fields may be set:
+
+ - size -- size of the region (required, must not be zero)
+ - alignment -- alignment of the region; must be power of two or
+ zero (optional)
+ - start -- where the region has to start (optional)
+ - alloc_name -- the name of allocator to use (optional)
+ - alloc -- allocator to use (optional; and besides
+ alloc_name is probably is what you want)
+
+ size, alignment and start is specified in bytes. Size will be
+ aligned up to a PAGE_SIZE. If alignment is less then a PAGE_SIZE
+ it will be set to a PAGE_SIZE. start will be aligned to
+ alignment.
+
+**** Map
+
+ The format of the "map" attribute is as follows:
+
+ map-attr ::= [ rules [ ';' ] ]
+ rules ::= rule [ ';' rules ]
+ rule ::= patterns '=' regions
+
+ patterns ::= pattern [ ',' patterns ]
+
+ regions ::= REG-NAME [ ',' regions ]
+ // list of regions to try to allocate memory
+ // from
+
+ pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ // pattern request must match for the rule to
+ // apply; the first rule that matches is
+ // applied; if dev-pattern part is omitted
+ // value identical to the one used in previous
+ // pattern is assumed.
+
+ dev-pattern ::= PATTERN
+ // pattern that device name must match for the
+ // rule to apply; may contain question marks
+ // which mach any characters and end with an
+ // asterisk which match the rest of the string
+ // (including nothing).
+
+ It is a sequence of rules which specify what regions should given
+ (device, type) pair use. The first rule that matches is applied.
+
+ For rule to match, the pattern must match (dev, type) pair.
+ Pattern consist of the part before and after slash. The first
+ part must match device name and the second part must match kind.
+
+ If the first part is empty, the device name is assumed to match
+ iff it matched in previous pattern. If the second part is
+ omitted it will mach any type of memory requested by device.
+
+ Some examples (whitespace added for better readability):
+
+ cma_map = foo/quaz = r1;
+ // device foo with type == "quaz" uses region r1
+
+ foo/* = r2; // OR:
+ /* = r2;
+ // device foo with any other kind uses region r2
+
+ bar = r1,r2;
+ // device bar uses region r1 or r2
+
+ baz?/a , baz?/b = r3;
+ // devices named baz? where ? is any character
+ // with type being "a" or "b" use r3
+
+*** The device and types of memory
+
+ The name of the device is taken from the device structure. It is
+ not possible to use CMA if driver does not register a device
+ (actually this can be overcome if a fake device structure is
+ provided with at least the name set).
+
+ The type of memory is an optional argument provided by the device
+ whenever it requests memory chunk. In many cases this can be
+ ignored but sometimes it may be required for some devices.
+
+ For instance, let's say that there are two memory banks and for
+ performance reasons a device uses buffers in both of them.
+ Platform defines a memory types "a" and "b" for regions in both
+ banks. The device driver would use those two types then to
+ request memory chunks from different banks. CMA attributes could
+ look as follows:
+
+ static struct cma_region regions[] = {
+ { .name = "a", .size = 32 << 20 },
+ { .name = "b", .size = 32 << 20, .start = 512 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "foo/a=a;foo/b=b;*=a,b";
+
+ And whenever the driver allocated the memory it would specify the
+ kind of memory:
+
+ buffer1 = cma_alloc(dev, "a", 1 << 20, 0);
+ buffer2 = cma_alloc(dev, "b", 1 << 20, 0);
+
+ If it was needed to try to allocate from the other bank as well if
+ the dedicated one is full, the map attributes could be changed to:
+
+ static const char map[] __initconst = "foo/a=a,b;foo/b=b,a;*=a,b";
+
+ On the other hand, if the same driver was used on a system with
+ only one bank, the configuration could be changed just to:
+
+ static struct cma_region regions[] = {
+ { .name = "r", .size = 64 << 20 },
+ { }
+ }
+ static const char map[] __initconst = "*=r";
+
+ without the need to change the driver at all.
+
+*** Device API
+
+ There are three basic calls provided by the CMA framework to
+ devices. To allocate a chunk of memory cma_alloc() function needs
+ to be used:
+
+ dma_addr_t cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment);
+
+ If required, device may specify alignment in bytes that the chunk
+ need to satisfy. It have to be a power of two or zero. The
+ chunks are always aligned at least to a page.
+
+ The type specifies the type of memory as described to in the
+ previous subsection. If device driver does not care about memory
+ type it can safely pass NULL as the type which is the same as
+ possing "common".
+
+ The basic usage of the function is just a:
+
+ addr = cma_alloc(dev, NULL, size, 0);
+
+ The function returns bus address of allocated chunk or a value
+ that evaluates to true if checked with IS_ERR_VALUE(), so the
+ correct way for checking for errors is:
+
+ unsigned long addr = cma_alloc(dev, size);
+ if (IS_ERR_VALUE(addr))
+ /* Error */
+ return (int)addr;
+ /* Allocated */
+
+ (Make sure to include <linux/err.h> which contains the definition
+ of the IS_ERR_VALUE() macro.)
+
+
+ Allocated chunk is freed via a cma_free() function:
+
+ int cma_free(dma_addr_t addr);
+
+ It takes bus address of the chunk as an argument frees it.
+
+
+ The last function is the cma_info() which returns information
+ about regions assigned to given (dev, type) pair. Its syntax is:
+
+ int cma_info(struct cma_info *info,
+ const struct device *dev,
+ const char *type);
+
+ On successful exit it fills the info structure with lower and
+ upper bound of regions, total size and number of regions assigned
+ to given (dev, type) pair.
+
+**** Dynamic and private regions
+
+ In the basic setup, regions are provided and initialised by
+ platform initialisation code (which usually use
+ cma_set_defaults() for that purpose).
+
+ It is, however, possible to create and add regions dynamically
+ using cma_region_register() function.
+
+ int cma_region_register(struct cma_region *reg);
+
+ The region does not have to have name. If it does not, it won't
+ be accessed via standard mapping (the one provided with map
+ attribute). Such regions are private and to allocate chunk from
+ them, one needs to call:
+
+ dma_addr_t cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+
+ It is just like cma_alloc() expect one specifies what region to
+ allocate memory from. The region must have been registered.
+
+**** Allocating from region specified by name
+
+ If a driver preferred allocating from a region or list of regions
+ it knows name of it can use a different call simmilar to the
+ previous:
+
+ dma_addr_t cma_alloc_from(const char *regions,
+ size_t size, dma_addr_t alignment);
+
+ The first argument is a comma-separated list of regions the
+ driver desires CMA to try and allocate from. The list is
+ terminated by a NUL byte or a semicolon.
+
+ Similarly, there is a call for requesting information about named
+ regions:
+
+ int cma_info_about(struct cma_info *info, const char *regions);
+
+ Generally, it should not be needed to use those interfaces but
+ they are provided nevertheless.
+
+**** Registering early regions
+
+ An early region is a region that is managed by CMA early during
+ boot process. It's platforms responsibility to reserve memory
+ for early regions. Later on, when CMA initialises, early regions
+ with reserved memory are registered as normal regions.
+ Registering an early region may be a way for a device to request
+ a private pool of memory without worrying about actually
+ reserving the memory:
+
+ int cma_early_region_register(struct cma_region *reg);
+
+ This needs to be done quite early on in boot process, before
+ platform traverses the cma_early_regions list to reserve memory.
+
+ When boot process ends, device driver may see whether the region
+ was reserved (by checking reg->reserved flag) and if so, whether
+ it was successfully registered as a normal region (by checking
+ the reg->registered flag). If that is the case, device driver
+ can use normal API calls to use the region.
+
+*** Allocator operations
+
+ Creating an allocator for CMA needs four functions to be
+ implemented.
+
+
+ The first two are used to initialise an allocator for given driver
+ and clean up afterwards:
+
+ int cma_foo_init(struct cma_region *reg);
+ void cma_foo_cleanup(struct cma_region *reg);
+
+ The first is called when allocator is attached to region. When
+ the function is called, the cma_region structure is fully
+ initialised (ie. starting address and size have correct values).
+ As a meter of fact, allocator should never modify the cma_region
+ structure other then the private_data field which it may use to
+ point to it's private data.
+
+ The second call cleans up and frees all resources the allocator
+ has allocated for the region. The function can assume that all
+ chunks allocated form this region have been freed thus the whole
+ region is free.
+
+
+ The two other calls are used for allocating and freeing chunks.
+ They are:
+
+ struct cma_chunk *cma_foo_alloc(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+ void cma_foo_free(struct cma_chunk *chunk);
+
+ As names imply the first allocates a chunk and the other frees
+ a chunk of memory. It also manages a cma_chunk object
+ representing the chunk in physical memory.
+
+ Either of those function can assume that they are the only thread
+ accessing the region. Therefore, allocator does not need to worry
+ about concurrency. Moreover, all arguments are guaranteed to be
+ valid (i.e. page aligned size, a power of two alignment no lower
+ the a page size).
+
+
+ When allocator is ready, all that is left is to register it by
+ calling cma_allocator_register() function:
+
+ int cma_allocator_register(struct cma_allocator *alloc);
+
+ The argument is an structure with pointers to the above functions
+ and allocator's name. The whole call may look something like
+ this:
+
+ static struct cma_allocator alloc = {
+ .name = "foo",
+ .init = cma_foo_init,
+ .cleanup = cma_foo_cleanup,
+ .alloc = cma_foo_alloc,
+ .free = cma_foo_free,
+ };
+ return cma_allocator_register(&alloc);
+
+ The name ("foo") will be used when a this particular allocator is
+ requested as an allocator for given region.
+
+*** Integration with platform
+
+ There is one function that needs to be called form platform
+ initialisation code. That is the cma_early_regions_reserve()
+ function:
+
+ void cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+ It traverses list of all of the early regions provided by platform
+ and registered by drivers and reserves memory for them. The only
+ argument is a callback function used to reserve the region.
+ Passing NULL as the argument is the same as passing
+ cma_early_region_reserve() function which uses bootmem and
+ memblock for allocating.
+
+ Alternatively, platform code could traverse the cma_early_regions
+ list by itself but this should never be necessary.
+
+
+ Platform has also a way of providing default attributes for CMA,
+ cma_set_defaults() function is used for that purpose:
+
+ int cma_set_defaults(struct cma_region *regions, const char *map)
+
+ It needs to be called prior to reserving regions. It let one
+ specify the list of regions defined by platform and the map
+ attribute. The map may point to a string in __initdata. See
+ above in this document for example usage of this function.
+
+** Future work
+
+ In the future, implementation of mechanisms that would allow the
+ free space inside the regions to be used as page cache, filesystem
+ buffers or swap devices is planned. With such mechanisms, the
+ memory would not be wasted when not used.
+
+ Because all allocations and freeing of chunks pass the CMA
+ framework it can follow what parts of the reserved memory are
+ freed and what parts are allocated. Tracking the unused memory
+ would let CMA use it for other purposes such as page cache, I/O
+ buffers, swap, etc.
diff --git a/include/linux/cma.h b/include/linux/cma.h
new file mode 100644
index 0000000..9f6ee57
--- /dev/null
+++ b/include/linux/cma.h
@@ -0,0 +1,432 @@
+#ifndef __LINUX_CMA_H
+#define __LINUX_CMA_H
+
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+/***************************** Kernel lever API *****************************/
+
+#ifdef __KERNEL__
+
+#include <linux/rbtree.h>
+#include <linux/list.h>
+
+
+struct device;
+struct cma_info;
+
+/*
+ * Don't call it directly, use cma_alloc(), cma_alloc_from() or
+ * cma_alloc_from_region().
+ */
+dma_addr_t __must_check
+__cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment);
+
+/* Don't call it directly, use cma_info() or cma_info_about(). */
+int
+__cma_info(struct cma_info *info, const struct device *dev, const char *type);
+
+
+/**
+ * cma_alloc - allocates contiguous chunk of memory.
+ * @dev: The device to perform allocation for.
+ * @type: A type of memory to allocate. Platform may define
+ * several different types of memory and device drivers
+ * can then request chunks of different types. Usually it's
+ * safe to pass NULL here which is the same as passing
+ * "common".
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise bus address of the chunk is returned.
+ */
+static inline dma_addr_t __must_check
+cma_alloc(const struct device *dev, const char *type,
+ size_t size, dma_addr_t alignment)
+{
+ return dev ? __cma_alloc(dev, type, size, alignment) : -EINVAL;
+}
+
+
+/**
+ * struct cma_info - information about regions returned by cma_info().
+ * @lower_bound: The smallest address that is possible to be
+ * allocated for given (dev, type) pair.
+ * @upper_bound: The one byte after the biggest address that is
+ * possible to be allocated for given (dev, type)
+ * pair.
+ * @total_size: Total size of regions mapped to (dev, type) pair.
+ * @free_size: Total free size in all of the regions mapped to (dev, type)
+ * pair. Because of possible race conditions, it is not
+ * guaranteed that the value will be correct -- it gives only
+ * an approximation.
+ * @count: Number of regions mapped to (dev, type) pair.
+ */
+struct cma_info {
+ dma_addr_t lower_bound, upper_bound;
+ size_t total_size, free_size;
+ unsigned count;
+};
+
+/**
+ * cma_info - queries information about regions.
+ * @info: Pointer to a structure where to save the information.
+ * @dev: The device to query information for.
+ * @type: A type of memory to query information for.
+ * If unsure, pass NULL here which is equal to passing
+ * "common".
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info(struct cma_info *info, const struct device *dev, const char *type)
+{
+ return dev ? __cma_info(info, dev, type) : -EINVAL;
+}
+
+
+/**
+ * cma_free - frees a chunk of memory.
+ * @addr: Beginning of the chunk.
+ *
+ * Returns -ENOENT if there is no chunk at given location; otherwise
+ * zero. In the former case issues a warning.
+ */
+int cma_free(dma_addr_t addr);
+
+
+
+/****************************** Lower lever API *****************************/
+
+/**
+ * cma_alloc_from - allocates contiguous chunk of memory from named regions.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise bus address of the chunk is returned.
+ */
+static inline dma_addr_t __must_check
+cma_alloc_from(const char *regions, size_t size, dma_addr_t alignment)
+{
+ return __cma_alloc(NULL, regions, size, alignment);
+}
+
+/**
+ * cma_info_about - queries information about named regions.
+ * @info: Pointer to a structure where to save the information.
+ * @regions: Comma separated list of region names. Terminated by NUL
+ * byte or a semicolon.
+ *
+ * On error returns a negative error, zero otherwise.
+ */
+static inline int
+cma_info_about(struct cma_info *info, const const char *regions)
+{
+ return __cma_info(info, NULL, regions);
+}
+
+
+
+struct cma_allocator;
+
+/**
+ * struct cma_region - a region reserved for CMA allocations.
+ * @name: Unique name of the region. Read only.
+ * @start: Bus address of the region in bytes. Always aligned at
+ * least to a full page. Read only.
+ * @size: Size of the region in bytes. Multiply of a page size.
+ * Read only.
+ * @free_space: Free space in the region. Read only.
+ * @alignment: Desired alignment of the region in bytes. A power of two,
+ * always at least page size. Early.
+ * @alloc: Allocator used with this region. NULL means allocator is
+ * not attached. Private.
+ * @alloc_name: Allocator name read from cmdline. Private. This may be
+ * different from @alloc->name.
+ * @private_data: Allocator's private data.
+ * @users: Number of chunks allocated in this region.
+ * @list: Entry in list of regions. Private.
+ * @used: Whether region was already used, ie. there was at least
+ * one allocation request for. Private.
+ * @registered: Whether this region has been registered. Read only.
+ * @reserved: Whether this region has been reserved. Early. Read only.
+ * @copy_name: Whether @name and @alloc_name needs to be copied when
+ * this region is converted from early to normal. Early.
+ * Private.
+ * @free_alloc_name: Whether @alloc_name was kmalloced(). Private.
+ *
+ * Regions come in two types: an early region and normal region. The
+ * former can be reserved or not-reserved. Fields marked as "early"
+ * are only meaningful in early regions.
+ *
+ * Early regions are important only during initialisation. The list
+ * of early regions is built from the "cma" command line argument or
+ * platform defaults. Platform initialisation code is responsible for
+ * reserving space for unreserved regions that are placed on
+ * cma_early_regions list.
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+struct cma_region {
+ const char *name;
+ dma_addr_t start;
+ size_t size;
+ union {
+ size_t free_space; /* Normal region */
+ dma_addr_t alignment; /* Early region */
+ };
+
+ struct cma_allocator *alloc;
+ const char *alloc_name;
+ void *private_data;
+
+ unsigned users;
+ struct list_head list;
+
+ unsigned used:1;
+ unsigned registered:1;
+ unsigned reserved:1;
+ unsigned copy_name:1;
+ unsigned free_alloc_name:1;
+};
+
+
+/**
+ * cma_region_register() - registers a region.
+ * @reg: Region to region.
+ *
+ * Region's start and size must be set.
+ *
+ * If name is set the region will be accessible using normal mechanism
+ * like mapping or cma_alloc_from() function otherwise it will be
+ * a private region and accessible only using the
+ * cma_alloc_from_region() function.
+ *
+ * If alloc is set function will try to initialise given allocator
+ * (and will return error if it failes). Otherwise alloc_name may
+ * point to a name of an allocator to use (if not set, the default
+ * will be used).
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or negative error. In particular, -EADDRINUSE if
+ * region overlap with already existing region.
+ */
+int __must_check cma_region_register(struct cma_region *reg);
+
+/**
+ * cma_region_unregister() - unregisters a region.
+ * @reg: Region to unregister.
+ *
+ * Region is unregistered only if there are no chunks allocated for
+ * it. Otherwise, function returns -EBUSY.
+ *
+ * On success returs zero.
+ */
+int __must_check cma_region_unregister(struct cma_region *reg);
+
+
+/**
+ * cma_alloc_from_region() - allocates contiguous chunk of memory from region.
+ * @reg: Region to allocate chunk from.
+ * @size: Size of the memory to allocate in bytes.
+ * @alignment: Desired alignment in bytes. Must be a power of two or
+ * zero. If alignment is less then a page size it will be
+ * set to page size. If unsure, pass zero here.
+ *
+ * On error returns a negative error cast to dma_addr_t. Use
+ * IS_ERR_VALUE() to check if returned value is indeed an error.
+ * Otherwise bus address of the chunk is returned.
+ */
+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment);
+
+
+
+/****************************** Allocators API ******************************/
+
+/**
+ * struct cma_chunk - an allocated contiguous chunk of memory.
+ * @start: Bus address in bytes.
+ * @size: Size in bytes.
+ * @free_space: Free space in region in bytes. Read only.
+ * @reg: Region this chunk belongs to.
+ * @by_start: A node in an red-black tree with all chunks sorted by
+ * start address.
+ *
+ * The cma_allocator::alloc() operation need to set only the @start
+ * and @size fields. The rest is handled by the caller (ie. CMA
+ * glue).
+ */
+struct cma_chunk {
+ dma_addr_t start;
+ size_t size;
+
+ struct cma_region *reg;
+ struct rb_node by_start;
+};
+
+
+/**
+ * struct cma_allocator - a CMA allocator.
+ * @name: Allocator's unique name
+ * @init: Initialises an allocator on given region.
+ * @cleanup: Cleans up after init. May assume that there are no chunks
+ * allocated in given region.
+ * @alloc: Allocates a chunk of memory of given size in bytes and
+ * with given alignment. Alignment is a power of
+ * two (thus non-zero) and callback does not need to check it.
+ * May also assume that it is the only call that uses given
+ * region (ie. access to the region is synchronised with
+ * a mutex). This has to allocate the chunk object (it may be
+ * contained in a bigger structure with allocator-specific data.
+ * Required.
+ * @free: Frees allocated chunk. May also assume that it is the only
+ * call that uses given region. This has to free() the chunk
+ * object as well. Required.
+ * @list: Entry in list of allocators. Private.
+ */
+struct cma_allocator {
+ const char *name;
+
+ int (*init)(struct cma_region *reg);
+ void (*cleanup)(struct cma_region *reg);
+ struct cma_chunk *(*alloc)(struct cma_region *reg, size_t size,
+ dma_addr_t alignment);
+ void (*free)(struct cma_chunk *chunk);
+
+ struct list_head list;
+};
+
+
+/**
+ * cma_allocator_register() - Registers an allocator.
+ * @alloc: Allocator to register.
+ *
+ * Adds allocator to the list of allocators managed by CMA.
+ *
+ * All of the fields of cma_allocator structure must be set except for
+ * the optional name and the list's head which will be overriden
+ * anyway.
+ *
+ * Returns zero or negative error code.
+ */
+int cma_allocator_register(struct cma_allocator *alloc);
+
+
+/**************************** Initialisation API ****************************/
+
+/**
+ * cma_set_defaults() - specifies default command line parameters.
+ * @regions: A zero-sized entry terminated list of early regions.
+ * This array must not be placed in __initdata section.
+ * @map: Map attribute.
+ *
+ * This function should be called prior to cma_early_regions_reserve()
+ * and after early parameters have been parsed.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_set_defaults(struct cma_region *regions, const char *map);
+
+
+/**
+ * cma_early_regions - a list of early regions.
+ *
+ * Platform needs to allocate space for each of the region before
+ * initcalls are executed. If space is reserved, the reserved flag
+ * must be set. Platform initialisation code may choose to use
+ * cma_early_regions_allocate().
+ *
+ * Later, during CMA initialisation all reserved regions from the
+ * cma_early_regions list are registered as normal regions and can be
+ * used using standard mechanisms.
+ */
+extern struct list_head cma_early_regions __initdata;
+
+
+/**
+ * cma_early_region_register() - registers an early region.
+ * @reg: Region to add.
+ *
+ * Region's size, start and alignment must be set (however the last
+ * two can be zero). If name is set the region will be accessible
+ * using normal mechanism like mapping or cma_alloc_from() function
+ * otherwise it will be a private region accessible only using the
+ * cma_alloc_from_region().
+ *
+ * During platform initialisation, space is reserved for early
+ * regions. Later, when CMA initialises, the early regions are
+ * "converted" into normal regions. If cma_region::alloc is set, CMA
+ * will then try to setup given allocator on the region. Failure to
+ * do so will result in the region not being registered even though
+ * the space for it will still be reserved. If cma_region::alloc is
+ * not set, allocator will be attached to the region on first use and
+ * the value of cma_region::alloc_name will be taken into account if
+ * set.
+ *
+ * All other fields are ignored and/or overwritten.
+ *
+ * Returns zero or negative error. No checking if regions overlap is
+ * performed.
+ */
+int __init __must_check cma_early_region_register(struct cma_region *reg);
+
+
+/**
+ * cma_early_region_reserve() - reserves a physically contiguous memory region.
+ * @reg: Early region to reserve memory for.
+ *
+ * If platform supports bootmem this is the first allocator this
+ * function tries to use. If that failes (or bootmem is not
+ * supported) function tries to use memblec if it is available.
+ *
+ * On success sets reg->reserved flag.
+ *
+ * Returns zero or negative error.
+ */
+int __init cma_early_region_reserve(struct cma_region *reg);
+
+/**
+ * cma_early_regions_reserver() - helper function for reserving early regions.
+ * @reserve: Callbac function used to reserve space for region. Needs
+ * to return non-negative if allocation succeeded, negative
+ * error otherwise. NULL means cma_early_region_alloc() will
+ * be used.
+ *
+ * This function traverses the %cma_early_regions list and tries to
+ * reserve memory for each early region. It uses the @reserve
+ * callback function for that purpose. The reserved flag of each
+ * region is updated accordingly.
+ */
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
+
+#else
+
+#define cma_defaults(regions, map) ((int)0)
+#define cma_early_regions_reserve(reserve) do { } while (0)
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index f4e516e..3e9317c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -301,3 +301,37 @@ config NOMMU_INITIAL_TRIM_EXCESS
of 1 says that all excess pages should be trimmed.

See Documentation/nommu-mmap.txt for more information.
+
+
+config CMA
+ bool "Contiguous Memory Allocator framework"
+ # Currently there is only one allocator so force it on
+ select CMA_BEST_FIT
+ help
+ This enables the Contiguous Memory Allocator framework which
+ allows drivers to allocate big physically-contiguous blocks of
+ memory for use with hardware components that do not support I/O
+ map nor scatter-gather.
+
+ If you select this option you will also have to select at least
+ one allocator algorithm below.
+
+ To make use of CMA you need to specify the regions and
+ driver->region mapping on command line when booting the kernel.
+
+config CMA_DEBUG
+ bool "CMA debug messages (DEVELOPEMENT)"
+ depends on CMA
+ help
+ Enable debug messages in CMA code.
+
+config CMA_BEST_FIT
+ bool "CMA best-fit allocator"
+ depends on CMA
+ default y
+ help
+ This is a best-fit algorithm running in O(n log n) time where
+ n is the number of existing holes (which is never greater then
+ the number of allocated regions and usually much smaller). It
+ allocates area from the smallest hole that is big enough for
+ allocation in question.
diff --git a/mm/Makefile b/mm/Makefile
index 34b2546..d8c717f 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -47,3 +47,5 @@ obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
+obj-$(CONFIG_CMA) += cma.o
+obj-$(CONFIG_CMA_BEST_FIT) += cma-best-fit.o
diff --git a/mm/cma-best-fit.c b/mm/cma-best-fit.c
new file mode 100644
index 0000000..97f8d61
--- /dev/null
+++ b/mm/cma-best-fit.c
@@ -0,0 +1,407 @@
+/*
+ * Contiguous Memory Allocator framework: Best Fit allocator
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+#define pr_fmt(fmt) "cma: bf: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#include <linux/errno.h> /* Error numbers */
+#include <linux/slab.h> /* kmalloc() */
+
+#include <linux/cma.h> /* CMA structures */
+
+
+/************************* Data Types *************************/
+
+struct cma_bf_item {
+ struct cma_chunk ch;
+ struct rb_node by_size;
+};
+
+struct cma_bf_private {
+ struct rb_root by_start_root;
+ struct rb_root by_size_root;
+};
+
+
+/************************* Prototypes *************************/
+
+/*
+ * Those are only for holes. They must be called whenever hole's
+ * properties change but also whenever chunk becomes a hole or hole
+ * becames a chunk.
+ */
+static void __cma_bf_hole_insert_by_size(struct cma_bf_item *item);
+static void __cma_bf_hole_erase_by_size(struct cma_bf_item *item);
+static int __must_check
+__cma_bf_hole_insert_by_start(struct cma_bf_item *item);
+static void __cma_bf_hole_erase_by_start(struct cma_bf_item *item);
+
+/**
+ * __cma_bf_hole_take - takes a chunk of memory out of a hole.
+ * @hole: hole to take chunk from
+ * @size: chunk's size
+ * @alignment: chunk's starting address alignment (must be power of two)
+ *
+ * Takes a @size bytes large chunk from hole @hole which must be able
+ * to hold the chunk. The "must be able" includes also alignment
+ * constraint.
+ *
+ * Returns allocated item or NULL on error (if kmalloc() failed).
+ */
+static struct cma_bf_item *__must_check
+__cma_bf_hole_take(struct cma_bf_item *hole, size_t size, dma_addr_t alignment);
+
+/**
+ * __cma_bf_hole_merge_maybe - tries to merge hole with neighbours.
+ * @item: hole to try and merge
+ *
+ * Which items are preserved is undefined so you may not rely on it.
+ */
+static void __cma_bf_hole_merge_maybe(struct cma_bf_item *item);
+
+
+/************************* Device API *************************/
+
+int cma_bf_init(struct cma_region *reg)
+{
+ struct cma_bf_private *prv;
+ struct cma_bf_item *item;
+
+ prv = kzalloc(sizeof *prv, GFP_KERNEL);
+ if (unlikely(!prv))
+ return -ENOMEM;
+
+ item = kzalloc(sizeof *item, GFP_KERNEL);
+ if (unlikely(!item)) {
+ kfree(prv);
+ return -ENOMEM;
+ }
+
+ item->ch.start = reg->start;
+ item->ch.size = reg->size;
+ item->ch.reg = reg;
+
+ rb_root_init(&prv->by_start_root, &item->ch.by_start);
+ rb_root_init(&prv->by_size_root, &item->by_size);
+
+ reg->private_data = prv;
+ return 0;
+}
+
+void cma_bf_cleanup(struct cma_region *reg)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ struct cma_bf_item *item =
+ rb_entry(prv->by_size_root.rb_node,
+ struct cma_bf_item, by_size);
+
+ /* We can assume there is only a single hole in the tree. */
+ WARN_ON(item->by_size.rb_left || item->by_size.rb_right ||
+ item->ch.by_start.rb_left || item->ch.by_start.rb_right);
+
+ kfree(item);
+ kfree(prv);
+}
+
+struct cma_chunk *cma_bf_alloc(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ struct cma_bf_private *prv = reg->private_data;
+ struct rb_node *node = prv->by_size_root.rb_node;
+ struct cma_bf_item *item = NULL;
+
+ /* First find hole that is large enough */
+ while (node) {
+ struct cma_bf_item *i =
+ rb_entry(node, struct cma_bf_item, by_size);
+
+ if (i->ch.size < size) {
+ node = node->rb_right;
+ } else if (i->ch.size >= size) {
+ node = node->rb_left;
+ item = i;
+ }
+ }
+ if (!item)
+ return NULL;
+
+ /* Now look for items which can satisfy alignment requirements */
+ for (;;) {
+ dma_addr_t start = ALIGN(item->ch.start, alignment);
+ dma_addr_t end = item->ch.start + item->ch.size;
+ if (start < end && end - start >= size) {
+ item = __cma_bf_hole_take(item, size, alignment);
+ return likely(item) ? &item->ch : NULL;
+ }
+
+ node = rb_next(node);
+ if (!node)
+ return NULL;
+
+ item = rb_entry(node, struct cma_bf_item, by_size);
+ }
+}
+
+void cma_bf_free(struct cma_chunk *chunk)
+{
+ struct cma_bf_item *item = container_of(chunk, struct cma_bf_item, ch);
+
+ /* Add new hole */
+ if (unlikely(__cma_bf_hole_insert_by_start(item))) {
+ /*
+ * We're screwed... Just free the item and forget
+ * about it. Things are broken beyond repair so no
+ * sense in trying to recover.
+ */
+ kfree(item);
+ } else {
+ __cma_bf_hole_insert_by_size(item);
+
+ /* Merge with prev and next sibling */
+ __cma_bf_hole_merge_maybe(item);
+ }
+}
+
+
+/************************* Basic Tree Manipulation *************************/
+
+static void __cma_bf_hole_insert_by_size(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ struct rb_node **link = &prv->by_size_root.rb_node, *parent = NULL;
+ const typeof(item->ch.size) value = item->ch.size;
+
+ while (*link) {
+ struct cma_bf_item *i;
+ parent = *link;
+ i = rb_entry(parent, struct cma_bf_item, by_size);
+ link = value <= i->ch.size
+ ? &parent->rb_left
+ : &parent->rb_right;
+ }
+
+ rb_link_node(&item->by_size, parent, link);
+ rb_insert_color(&item->by_size, &prv->by_size_root);
+}
+
+static void __cma_bf_hole_erase_by_size(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ rb_erase(&item->by_size, &prv->by_size_root);
+}
+
+static int __must_check
+__cma_bf_hole_insert_by_start(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ struct rb_node **link = &prv->by_start_root.rb_node, *parent = NULL;
+ const typeof(item->ch.start) value = item->ch.start;
+
+ while (*link) {
+ struct cma_bf_item *i;
+ parent = *link;
+ i = rb_entry(parent, struct cma_bf_item, ch.by_start);
+
+ if (WARN_ON(value == i->ch.start))
+ /*
+ * This should *never* happen. And I mean
+ * *never*. We could even BUG on it but
+ * hopefully things are only a bit broken,
+ * ie. system can still run. We produce
+ * a warning and return an error.
+ */
+ return -EBUSY;
+
+ link = value <= i->ch.start
+ ? &parent->rb_left
+ : &parent->rb_right;
+ }
+
+ rb_link_node(&item->ch.by_start, parent, link);
+ rb_insert_color(&item->ch.by_start, &prv->by_start_root);
+ return 0;
+}
+
+static void __cma_bf_hole_erase_by_start(struct cma_bf_item *item)
+{
+ struct cma_bf_private *prv = item->ch.reg->private_data;
+ rb_erase(&item->ch.by_start, &prv->by_start_root);
+}
+
+
+/************************* More Tree Manipulation *************************/
+
+static struct cma_bf_item *__must_check
+__cma_bf_hole_take(struct cma_bf_item *hole, size_t size, size_t alignment)
+{
+ struct cma_bf_item *item;
+
+ /*
+ * There are three cases:
+ * 1. the chunk takes the whole hole,
+ * 2. the chunk is at the beginning or at the end of the hole, or
+ * 3. the chunk is in the middle of the hole.
+ */
+
+
+ /* Case 1, the whole hole */
+ if (size == hole->ch.size) {
+ __cma_bf_hole_erase_by_size(hole);
+ __cma_bf_hole_erase_by_start(hole);
+ return hole;
+ }
+
+
+ /* Allocate */
+ item = kmalloc(sizeof *item, GFP_KERNEL);
+ if (unlikely(!item))
+ return NULL;
+
+ item->ch.start = ALIGN(hole->ch.start, alignment);
+ item->ch.size = size;
+
+ /* Case 3, in the middle */
+ if (item->ch.start != hole->ch.start
+ && item->ch.start + item->ch.size !=
+ hole->ch.start + hole->ch.size) {
+ struct cma_bf_item *tail;
+
+ /*
+ * Space between the end of the chunk and the end of
+ * the region, ie. space left after the end of the
+ * chunk. If this is dividable by alignment we can
+ * move the chunk to the end of the hole.
+ */
+ size_t left =
+ hole->ch.start + hole->ch.size -
+ (item->ch.start + item->ch.size);
+ if (left % alignment == 0) {
+ item->ch.start += left;
+ goto case_2;
+ }
+
+ /*
+ * We are going to add a hole at the end. This way,
+ * we will reduce the problem to case 2 -- the chunk
+ * will be at the end of the hole.
+ */
+ tail = kmalloc(sizeof *tail, GFP_KERNEL);
+ if (unlikely(!tail)) {
+ kfree(item);
+ return NULL;
+ }
+
+ tail->ch.start = item->ch.start + item->ch.size;
+ tail->ch.size =
+ hole->ch.start + hole->ch.size - tail->ch.start;
+ tail->ch.reg = hole->ch.reg;
+
+ if (unlikely(__cma_bf_hole_insert_by_start(tail))) {
+ /*
+ * Things are broken beyond repair... Abort
+ * inserting the hole but still continue with
+ * allocation (seems like the best we can do).
+ */
+
+ hole->ch.size = tail->ch.start - hole->ch.start;
+ kfree(tail);
+ } else {
+ __cma_bf_hole_insert_by_size(tail);
+ /*
+ * It's important that we first insert the new
+ * hole in the tree sorted by size and later
+ * reduce the size of the old hole. We will
+ * update the position of the old hole in the
+ * rb tree in code that handles case 2.
+ */
+ hole->ch.size = tail->ch.start - hole->ch.start;
+ }
+
+ /* Go to case 2 */
+ }
+
+
+ /* Case 2, at the beginning or at the end */
+case_2:
+ /* No need to update the tree; order preserved. */
+ if (item->ch.start == hole->ch.start)
+ hole->ch.start += item->ch.size;
+
+ /* Alter hole's size */
+ hole->ch.size -= size;
+ __cma_bf_hole_erase_by_size(hole);
+ __cma_bf_hole_insert_by_size(hole);
+
+ return item;
+}
+
+
+static void __cma_bf_hole_merge_maybe(struct cma_bf_item *item)
+{
+ struct cma_bf_item *prev;
+ struct rb_node *node;
+ int twice = 2;
+
+ node = rb_prev(&item->ch.by_start);
+ if (unlikely(!node))
+ goto next;
+ prev = rb_entry(node, struct cma_bf_item, ch.by_start);
+
+ for (;;) {
+ if (prev->ch.start + prev->ch.size == item->ch.start) {
+ /* Remove previous hole from trees */
+ __cma_bf_hole_erase_by_size(prev);
+ __cma_bf_hole_erase_by_start(prev);
+
+ /* Alter this hole */
+ item->ch.size += prev->ch.size;
+ item->ch.start = prev->ch.start;
+ __cma_bf_hole_erase_by_size(item);
+ __cma_bf_hole_insert_by_size(item);
+ /*
+ * No need to update by start trees as we do
+ * not break sequence order
+ */
+
+ /* Free prev hole */
+ kfree(prev);
+ }
+
+next:
+ if (!--twice)
+ break;
+
+ node = rb_next(&item->ch.by_start);
+ if (unlikely(!node))
+ break;
+ prev = item;
+ item = rb_entry(node, struct cma_bf_item, ch.by_start);
+ }
+}
+
+
+
+/************************* Register *************************/
+static int cma_bf_module_init(void)
+{
+ static struct cma_allocator alloc = {
+ .name = "bf",
+ .init = cma_bf_init,
+ .cleanup = cma_bf_cleanup,
+ .alloc = cma_bf_alloc,
+ .free = cma_bf_free,
+ };
+ return cma_allocator_register(&alloc);
+}
+module_init(cma_bf_module_init);
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..ba9adb7
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,911 @@
+/*
+ * Contiguous Memory Allocator framework
+ * Copyright (c) 2010 by Samsung Electronics.
+ * Written by Michal Nazarewicz ([email protected])
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your optional) any later version of the license.
+ */
+
+/*
+ * See Documentation/contiguous-memory.txt for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+# define DEBUG
+#endif
+
+#ifndef CONFIG_NO_BOOTMEM
+# include <linux/bootmem.h> /* alloc_bootmem_pages_nopanic() */
+#endif
+#ifdef CONFIG_HAVE_MEMBLOCK
+# include <linux/memblock.h> /* memblock*() */
+#endif
+#include <linux/device.h> /* struct device, dev_name() */
+#include <linux/errno.h> /* Error numbers */
+#include <linux/err.h> /* IS_ERR, PTR_ERR, etc. */
+#include <linux/mm.h> /* PAGE_ALIGN() */
+#include <linux/module.h> /* EXPORT_SYMBOL_GPL() */
+#include <linux/mutex.h> /* mutex */
+#include <linux/slab.h> /* kmalloc() */
+#include <linux/string.h> /* str*() */
+
+#include <linux/cma.h>
+
+
+/*
+ * Protects cma_regions, cma_allocators, cma_map, cma_map_length, and
+ * cma_chunks_by_start.
+ */
+static DEFINE_MUTEX(cma_mutex);
+
+
+
+/************************* Map attribute *************************/
+
+static const char *cma_map;
+static size_t cma_map_length;
+
+/*
+ * map-attr ::= [ rules [ ';' ] ]
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ *
+ * See Documentation/contiguous-memory.txt for details.
+ */
+static ssize_t cma_map_validate(const char *param)
+{
+ const char *ch = param;
+
+ if (*ch == '\0' || *ch == '\n')
+ return 0;
+
+ for (;;) {
+ const char *start = ch;
+
+ while (*ch && *ch != '\n' && *ch != ';' && *ch != '=')
+ ++ch;
+
+ if (*ch != '=' || start == ch) {
+ pr_err("map: expecting \"<patterns>=<regions>\" near %s\n",
+ start);
+ return -EINVAL;
+ }
+
+ while (*++ch != ';')
+ if (*ch == '\0' || *ch == '\n')
+ return ch - param;
+ if (ch[1] == '\0' || ch[1] == '\n')
+ return ch - param;
+ ++ch;
+ }
+}
+
+static int __init cma_map_param(char *param)
+{
+ ssize_t len;
+
+ pr_debug("param: map: %s\n", param);
+
+ len = cma_map_validate(param);
+ if (len < 0)
+ return len;
+
+ cma_map = param;
+ cma_map_length = len;
+ return 0;
+}
+
+
+
+/************************* Early regions *************************/
+
+struct list_head cma_early_regions __initdata =
+ LIST_HEAD_INIT(cma_early_regions);
+
+
+int __init __must_check cma_early_region_register(struct cma_region *reg)
+{
+ dma_addr_t start, alignment;
+ size_t size;
+
+ if (reg->alignment & (reg->alignment - 1))
+ return -EINVAL;
+
+ alignment = max(reg->alignment, (dma_addr_t)PAGE_SIZE);
+ start = ALIGN(reg->start, alignment);
+ size = PAGE_ALIGN(reg->size);
+
+ if (start + size < start)
+ return -EINVAL;
+
+ reg->size = size;
+ reg->start = start;
+ reg->alignment = alignment;
+
+ list_add_tail(&reg->list, &cma_early_regions);
+
+ pr_debug("param: registering early region %s (%p@%p/%p)\n",
+ reg->name, (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+
+ return 0;
+}
+
+
+
+/************************* Regions & Allocators *************************/
+
+static int __cma_region_attach_alloc(struct cma_region *reg);
+
+/* List of all regions. Named regions are kept before unnamed. */
+static LIST_HEAD(cma_regions);
+
+#define cma_foreach_region(reg) \
+ list_for_each_entry(reg, &cma_regions, list)
+
+int __must_check cma_region_register(struct cma_region *reg)
+{
+ const char *name, *alloc_name;
+ struct cma_region *r;
+ char *ch = NULL;
+ int ret = 0;
+
+ if (!reg->size || reg->start + reg->size < reg->start)
+ return -EINVAL;
+
+ reg->users = 0;
+ reg->used = 0;
+ reg->private_data = NULL;
+ reg->registered = 0;
+ reg->free_space = reg->size;
+
+ /* Copy name and alloc_name */
+ name = reg->name;
+ alloc_name = reg->alloc_name;
+ if (reg->copy_name && (reg->name || reg->alloc_name)) {
+ size_t name_size, alloc_size;
+
+ name_size = reg->name ? strlen(reg->name) + 1 : 0;
+ alloc_size = reg->alloc_name ? strlen(reg->alloc_name) + 1 : 0;
+
+ ch = kmalloc(name_size + alloc_size, GFP_KERNEL);
+ if (!ch) {
+ pr_err("%s: not enough memory to allocate name\n",
+ reg->name ?: "(private)");
+ return -ENOMEM;
+ }
+
+ if (name_size) {
+ memcpy(ch, reg->name, name_size);
+ name = ch;
+ ch += name_size;
+ }
+
+ if (alloc_size) {
+ memcpy(ch, reg->alloc_name, alloc_size);
+ alloc_name = ch;
+ }
+ }
+
+ mutex_lock(&cma_mutex);
+
+ /* Don't let regions overlap */
+ cma_foreach_region(r)
+ if (r->start + r->size > reg->start &&
+ r->start < reg->start + reg->size) {
+ ret = -EADDRINUSE;
+ goto done;
+ }
+
+ if (reg->alloc) {
+ ret = __cma_region_attach_alloc(reg);
+ if (unlikely(ret < 0))
+ goto done;
+ }
+
+ reg->name = name;
+ reg->alloc_name = alloc_name;
+ reg->registered = 1;
+ ch = NULL;
+
+ /*
+ * Keep named at the beginning and unnamed (private) at the
+ * end. This helps in traversal when named region is looked
+ * for.
+ */
+ if (name)
+ list_add(&reg->list, &cma_regions);
+ else
+ list_add_tail(&reg->list, &cma_regions);
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: region %sregistered\n",
+ reg->name ?: "(private)", ret ? "not " : "");
+ kfree(ch);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cma_region_register);
+
+static struct cma_region *__must_check
+__cma_region_find(const char **namep)
+{
+ struct cma_region *reg;
+ const char *ch, *name;
+ size_t n;
+
+ ch = *namep;
+ while (*ch && *ch != ',' && *ch != ';')
+ ++ch;
+ name = *namep;
+ *namep = *ch == ',' ? ch + 1 : ch;
+ n = ch - name;
+
+ /*
+ * Named regions are kept in front of unnamed so if we
+ * encounter unnamed region we can stop.
+ */
+ cma_foreach_region(reg)
+ if (!reg->name)
+ break;
+ else if (!strncmp(name, reg->name, n) && !reg->name[n])
+ return reg;
+
+ return NULL;
+}
+
+
+/* List of all allocators. */
+static LIST_HEAD(cma_allocators);
+
+#define cma_foreach_allocator(alloc) \
+ list_for_each_entry(alloc, &cma_allocators, list)
+
+int cma_allocator_register(struct cma_allocator *alloc)
+{
+ struct cma_region *reg;
+ int first;
+
+ if (!alloc->alloc || !alloc->free)
+ return -EINVAL;
+
+ /* alloc->users = 0; */
+
+ mutex_lock(&cma_mutex);
+
+ first = list_empty(&cma_allocators);
+
+ list_add_tail(&alloc->list, &cma_allocators);
+
+ /*
+ * Attach this allocator to all allocator-less regions that
+ * request this particular allocator (reg->alloc_name equals
+ * alloc->name) or if region wants the first available
+ * allocator and we are the first.
+ */
+ cma_foreach_region(reg) {
+ if (reg->alloc)
+ continue;
+ if (reg->alloc_name
+ ? alloc->name && !strcmp(alloc->name, reg->alloc_name)
+ : (!reg->used && first))
+ continue;
+
+ reg->alloc = alloc;
+ __cma_region_attach_alloc(reg);
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("%s: allocator registered\n", alloc->name ?: "(unnamed)");
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cma_allocator_register);
+
+static struct cma_allocator *__must_check
+__cma_allocator_find(const char *name)
+{
+ struct cma_allocator *alloc;
+
+ if (!name)
+ return list_empty(&cma_allocators)
+ ? NULL
+ : list_entry(cma_allocators.next,
+ struct cma_allocator, list);
+
+ cma_foreach_allocator(alloc)
+ if (alloc->name && !strcmp(name, alloc->name))
+ return alloc;
+
+ return NULL;
+}
+
+
+
+/************************* Initialise CMA *************************/
+
+int __init cma_set_defaults(struct cma_region *regions, const char *map)
+{
+ if (map) {
+ int ret = cma_map_param((char *)map);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ if (!regions)
+ return 0;
+
+ for (; regions->size; ++regions) {
+ int ret = cma_early_region_register(regions);
+ if (unlikely(ret < 0))
+ return ret;
+ }
+
+ return 0;
+}
+
+
+int __init cma_early_region_reserve(struct cma_region *reg)
+{
+ int tried = 0;
+
+ if (!reg->size || (reg->alignment & (reg->alignment - 1)) ||
+ reg->reserved)
+ return -EINVAL;
+
+#ifndef CONFIG_NO_BOOTMEM
+
+ tried = 1;
+
+ {
+ void *ptr = __alloc_bootmem_nopanic(reg->size, reg->alignment,
+ reg->start);
+ if (ptr) {
+ reg->start = virt_to_phys(ptr);
+ reg->reserved = 1;
+ return 0;
+ }
+ }
+
+#endif
+
+#ifdef CONFIG_HAVE_MEMBLOCK
+
+ tried = 1;
+
+ if (reg->start) {
+ if (memblock_is_region_reserved(reg->start, reg->size) < 0 &&
+ memblock_reserve(reg->start, reg->size) >= 0) {
+ reg->reserved = 1;
+ return 0;
+ }
+ } else {
+ /*
+ * Use __memblock_alloc_base() since
+ * memblock_alloc_base() panic()s.
+ */
+ u64 ret = __memblock_alloc_base(reg->size, reg->alignment, 0);
+ if (ret &&
+ ret < ~(dma_addr_t)0 &&
+ ret + reg->size < ~(dma_addr_t)0 &&
+ ret + reg->size > ret) {
+ reg->start = ret;
+ reg->reserved = 1;
+ return 0;
+ }
+
+ if (ret)
+ memblock_free(ret, reg->size);
+ }
+
+#endif
+
+ return tried ? -ENOMEM : -EOPNOTSUPP;
+}
+
+void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg))
+{
+ struct cma_region *reg;
+
+ pr_debug("init: reserving early regions\n");
+
+ if (!reserve)
+ reserve = cma_early_region_reserve;
+
+ list_for_each_entry(reg, &cma_early_regions, list) {
+ if (reg->reserved) {
+ /* nothing */
+ } else if (reserve(reg) >= 0) {
+ pr_debug("init: %s: reserved %p@%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size, (void *)reg->start);
+ reg->reserved = 1;
+ } else {
+ pr_warn("init: %s: unable to reserve %p@%p/%p\n",
+ reg->name ?: "(private)",
+ (void *)reg->size, (void *)reg->start,
+ (void *)reg->alignment);
+ }
+ }
+}
+
+
+static int __init cma_init(void)
+{
+ struct cma_region *reg, *n;
+
+ pr_debug("init: initialising\n");
+
+ if (cma_map) {
+ char *val = kmemdup(cma_map, cma_map_length + 1, GFP_KERNEL);
+ cma_map = val;
+ if (!val)
+ return -ENOMEM;
+ val[cma_map_length] = '\0';
+ }
+
+ list_for_each_entry_safe(reg, n, &cma_early_regions, list) {
+ INIT_LIST_HEAD(&reg->list);
+ /*
+ * We don't care if there was an error. It's a pity
+ * but there's not much we can do about it any way.
+ * If the error is on a region that was parsed from
+ * command line then it will stay and waste a bit of
+ * space; if it was registered using
+ * cma_early_region_register() it's caller's
+ * responsibility to do something about it.
+ */
+ if (reg->reserved && cma_region_register(reg) < 0)
+ /* ignore error */;
+ }
+
+ INIT_LIST_HEAD(&cma_early_regions);
+
+ return 0;
+}
+/*
+ * We want to be initialised earlier than module_init/__initcall so
+ * that drivers that want to grab memory at boot time will get CMA
+ * ready. subsys_initcall() seems early enough and not too early at
+ * the same time.
+ */
+subsys_initcall(cma_init);
+
+
+
+/************************* Chunks *************************/
+
+/* All chunks sorted by start address. */
+static struct rb_root cma_chunks_by_start;
+
+static struct cma_chunk *__must_check __cma_chunk_find(dma_addr_t addr)
+{
+ struct cma_chunk *chunk;
+ struct rb_node *n;
+
+ for (n = cma_chunks_by_start.rb_node; n; ) {
+ chunk = rb_entry(n, struct cma_chunk, by_start);
+ if (addr < chunk->start)
+ n = n->rb_left;
+ else if (addr > chunk->start)
+ n = n->rb_right;
+ else
+ return chunk;
+ }
+ WARN(1, KERN_WARNING "no chunk starting at %p\n", (void *)addr);
+ return NULL;
+}
+
+static int __must_check __cma_chunk_insert(struct cma_chunk *chunk)
+{
+ struct rb_node **new, *parent = NULL;
+ typeof(chunk->start) addr = chunk->start;
+
+ for (new = &cma_chunks_by_start.rb_node; *new; ) {
+ struct cma_chunk *c =
+ container_of(*new, struct cma_chunk, by_start);
+
+ parent = *new;
+ if (addr < c->start) {
+ new = &(*new)->rb_left;
+ } else if (addr > c->start) {
+ new = &(*new)->rb_right;
+ } else {
+ /*
+ * We should never be here. If we are it
+ * means allocator gave us an invalid chunk
+ * (one that has already been allocated) so we
+ * refuse to accept it. Our caller will
+ * recover by freeing the chunk.
+ */
+ WARN_ON(1);
+ return -EADDRINUSE;
+ }
+ }
+
+ rb_link_node(&chunk->by_start, parent, new);
+ rb_insert_color(&chunk->by_start, &cma_chunks_by_start);
+
+ return 0;
+}
+
+static void __cma_chunk_free(struct cma_chunk *chunk)
+{
+ rb_erase(&chunk->by_start, &cma_chunks_by_start);
+
+ chunk->reg->alloc->free(chunk);
+ --chunk->reg->users;
+ chunk->reg->free_space += chunk->size;
+}
+
+
+/************************* The Device API *************************/
+
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type);
+
+
+/* Allocate. */
+
+static dma_addr_t __must_check
+__cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ struct cma_chunk *chunk;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!reg || reg->free_space < size)
+ return -ENOMEM;
+
+ if (!reg->alloc) {
+ if (!reg->used)
+ __cma_region_attach_alloc(reg);
+ if (!reg->alloc)
+ return -ENOMEM;
+ }
+
+ chunk = reg->alloc->alloc(reg, size, alignment);
+ if (!chunk)
+ return -ENOMEM;
+
+ if (unlikely(__cma_chunk_insert(chunk) < 0)) {
+ /* We should *never* be here. */
+ chunk->reg->alloc->free(chunk);
+ kfree(chunk);
+ return -EADDRINUSE;
+ }
+
+ chunk->reg = reg;
+ ++reg->users;
+ reg->free_space -= chunk->size;
+ pr_debug("allocated at %p\n", (void *)chunk->start);
+ return chunk->start;
+}
+
+dma_addr_t __must_check
+cma_alloc_from_region(struct cma_region *reg,
+ size_t size, dma_addr_t alignment)
+{
+ dma_addr_t addr;
+
+ pr_debug("allocate %p/%p from %s\n",
+ (void *)size, (void *)alignment,
+ reg ? reg->name ?: "(private)" : "(null)");
+
+ if (!size || alignment & (alignment - 1) || !reg)
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ addr = reg->registered ?
+ __cma_alloc_from_region(reg, PAGE_ALIGN(size),
+ max(alignment, (dma_addr_t)PAGE_SIZE)) :
+ -EINVAL;
+
+ mutex_unlock(&cma_mutex);
+
+ return addr;
+}
+EXPORT_SYMBOL_GPL(cma_alloc_from_region);
+
+dma_addr_t __must_check
+__cma_alloc(const struct device *dev, const char *type,
+ dma_addr_t size, dma_addr_t alignment)
+{
+ struct cma_region *reg;
+ const char *from;
+ dma_addr_t addr;
+
+ if (dev)
+ pr_debug("allocate %p/%p for %s/%s\n",
+ (void *)size, (void *)alignment,
+ dev_name(dev), type ?: "");
+
+ if (!size || alignment & (alignment - 1))
+ return -EINVAL;
+
+ size = PAGE_ALIGN(size);
+ if (alignment < PAGE_SIZE)
+ alignment = PAGE_SIZE;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (unlikely(IS_ERR(from))) {
+ addr = PTR_ERR(from);
+ goto done;
+ }
+
+ pr_debug("allocate %p/%p from one of %s\n",
+ (void *)size, (void *)alignment, from);
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ addr = __cma_alloc_from_region(reg, size, alignment);
+ if (!IS_ERR_VALUE(addr))
+ goto done;
+ }
+
+ pr_debug("not enough memory\n");
+ addr = -ENOMEM;
+
+done:
+ mutex_unlock(&cma_mutex);
+
+ return addr;
+}
+EXPORT_SYMBOL_GPL(__cma_alloc);
+
+
+/* Query information about regions. */
+static void __cma_info_add(struct cma_info *infop, struct cma_region *reg)
+{
+ infop->total_size += reg->size;
+ infop->free_size += reg->free_space;
+ if (infop->lower_bound > reg->start)
+ infop->lower_bound = reg->start;
+ if (infop->upper_bound < reg->start + reg->size)
+ infop->upper_bound = reg->start + reg->size;
+ ++infop->count;
+}
+
+int
+__cma_info(struct cma_info *infop, const struct device *dev, const char *type)
+{
+ struct cma_info info = { ~(dma_addr_t)0, 0, 0, 0, 0 };
+ struct cma_region *reg;
+ const char *from;
+ int ret;
+
+ if (unlikely(!infop))
+ return -EINVAL;
+
+ mutex_lock(&cma_mutex);
+
+ from = __cma_where_from(dev, type);
+ if (IS_ERR(from)) {
+ ret = PTR_ERR(from);
+ info.lower_bound = 0;
+ goto done;
+ }
+
+ while (*from && *from != ';') {
+ reg = __cma_region_find(&from);
+ if (reg)
+ __cma_info_add(&info, reg);
+ }
+
+ ret = 0;
+done:
+ mutex_unlock(&cma_mutex);
+
+ memcpy(infop, &info, sizeof info);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(__cma_info);
+
+
+/* Freeing. */
+int cma_free(dma_addr_t addr)
+{
+ struct cma_chunk *c;
+ int ret;
+
+ mutex_lock(&cma_mutex);
+
+ c = __cma_chunk_find(addr);
+
+ if (c) {
+ __cma_chunk_free(c);
+ ret = 0;
+ } else {
+ ret = -ENOENT;
+ }
+
+ mutex_unlock(&cma_mutex);
+
+ pr_debug("free(%p): %s\n", (void *)addr, c ? "freed" : "not found");
+ return ret;
+}
+EXPORT_SYMBOL_GPL(cma_free);
+
+
+/************************* Miscellaneous *************************/
+
+static int __cma_region_attach_alloc(struct cma_region *reg)
+{
+ struct cma_allocator *alloc;
+ int ret;
+
+ /*
+ * If reg->alloc is set then caller wants us to use this
+ * allocator. Otherwise we need to find one by name.
+ */
+ if (reg->alloc) {
+ alloc = reg->alloc;
+ } else {
+ alloc = __cma_allocator_find(reg->alloc_name);
+ if (!alloc) {
+ pr_warn("init: %s: %s: no such allocator\n",
+ reg->name ?: "(private)",
+ reg->alloc_name ?: "(default)");
+ reg->used = 1;
+ return -ENOENT;
+ }
+ }
+
+ /* Try to initialise the allocator. */
+ reg->private_data = NULL;
+ ret = alloc->init ? alloc->init(reg) : 0;
+ if (unlikely(ret < 0)) {
+ pr_err("init: %s: %s: unable to initialise allocator\n",
+ reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ reg->alloc = NULL;
+ reg->used = 1;
+ } else {
+ reg->alloc = alloc;
+ /* ++alloc->users; */
+ pr_debug("init: %s: %s: initialised allocator\n",
+ reg->name ?: "(private)", alloc->name ?: "(unnamed)");
+ }
+ return ret;
+}
+
+
+/*
+ * s ::= rules
+ * rules ::= rule [ ';' rules ]
+ * rule ::= patterns '=' regions
+ * patterns ::= pattern [ ',' patterns ]
+ * regions ::= REG-NAME [ ',' regions ]
+ * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
+ */
+static const char *__must_check
+__cma_where_from(const struct device *dev, const char *type)
+{
+ /*
+ * This function matches the pattern from the map attribute
+ * agains given device name and type. Type may be of course
+ * NULL or an emtpy string.
+ */
+
+ const char *s, *name;
+ int name_matched = 0;
+
+ /*
+ * If dev is NULL we were called in alternative form where
+ * type is the from string. All we have to do is return it.
+ */
+ if (!dev)
+ return type ?: ERR_PTR(-EINVAL);
+
+ if (!cma_map)
+ return ERR_PTR(-ENOENT);
+
+ name = dev_name(dev);
+ if (WARN_ON(!name || !*name))
+ return ERR_PTR(-EINVAL);
+
+ if (!type)
+ type = "common";
+
+ /*
+ * Now we go throught the cma_map attribute.
+ */
+ for (s = cma_map; *s; ++s) {
+ const char *c;
+
+ /*
+ * If the pattern starts with a slash, the device part of the
+ * pattern matches if it matched previously.
+ */
+ if (*s == '/') {
+ if (!name_matched)
+ goto look_for_next;
+ goto match_type;
+ }
+
+ /*
+ * We are now trying to match the device name. This also
+ * updates the name_matched variable. If, while reading the
+ * spec, we ecnounter comma it means that the pattern does not
+ * match and we need to start over with another pattern (the
+ * one afther the comma). If we encounter equal sign we need
+ * to start over with another rule. If there is a character
+ * that does not match, we neet to look for a comma (to get
+ * another pattern) or semicolon (to get another rule) and try
+ * again if there is one somewhere.
+ */
+
+ name_matched = 0;
+
+ for (c = name; *s != '*' && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*s != '?' && *c != *s)
+ goto look_for_next;
+ if (*s == '*')
+ ++s;
+
+ name_matched = 1;
+
+ /*
+ * Now we need to match the type part of the pattern. If the
+ * pattern is missing it we match only if type points to an
+ * empty string. Otherwise wy try to match it just like name.
+ */
+ if (*s == '/') {
+match_type: /* s points to '/' */
+ ++s;
+
+ for (c = type; *s && *c; ++c, ++s)
+ if (*s == '=')
+ goto next_rule;
+ else if (*s == ',')
+ goto next_pattern;
+ else if (*c != *s)
+ goto look_for_next;
+ }
+
+ /* Return the string behind the '=' sign of the rule. */
+ if (*s == '=')
+ return s + 1;
+ else if (*s == ',')
+ return strchr(s, '=') + 1;
+
+ /* Pattern did not match */
+
+look_for_next:
+ do {
+ ++s;
+ } while (*s != ',' && *s != '=');
+ if (*s == ',')
+ continue;
+
+next_rule: /* s points to '=' */
+ s = strchr(s, ';');
+ if (!s)
+ break;
+
+next_pattern:
+ continue;
+ }
+
+ return ERR_PTR(-ENOENT);
+}
--
1.7.1

2010-08-26 08:17:42

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 2010-08-26 at 03:28 +0200, Michał Nazarewicz wrote:
> On Fri, 20 Aug 2010 15:15:10 +0200, Peter Zijlstra <[email protected]> wrote:
> > So the idea is to grab a large chunk of memory at boot time and then
> > later allow some device to use it?
> >
> > I'd much rather we'd improve the regular page allocator to be smarter
> > about this. We recently added a lot of smarts to it like memory
> > compaction, which allows large gobs of contiguous memory to be freed for
> > things like huge pages.
> >
> > If you want guarantees you can free stuff, why not add constraints to
> > the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> > certain region, those pages are easily freed/moved aside to satisfy
> > large contiguous allocations.
>
> I'm aware that grabbing a large chunk at boot time is a bit of waste of
> space and because of it I'm hoping to came up with a way of reusing the
> space when it's not used by CMA-aware devices. My current idea was to
> use it for easily discardable data (page cache?).

Right, so to me that looks like going at the problem backwards. That
will complicate the page-cache instead of your bad hardware drivers
(really, hardware should use IOMMUs already).

So why not work on the page allocator to improve its contiguous
allocation behaviour. If you look at the thing you'll find pageblocks
and migration types. If you change it so that you pin the migration type
of one or a number of contiguous pageblocks to say MIGRATE_MOVABLE, so
that they cannot be used for anything but movable pages you're pretty
much there.

2010-08-26 08:19:24

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 2010-08-26 at 04:40 +0200, Michał Nazarewicz wrote:
> I think that the biggest problem is fragmentation here. For instance,
> I think that a situation where there is enough free space but it's
> fragmented so no single contiguous chunk can be allocated is a serious
> problem. However, I would argue that if there's simply no space left,
> a multimedia device could fail and even though it's not desirable, it
> would not be such a big issue in my eyes.
>
> So, if only movable or discardable pages are allocated in CMA managed
> regions all should work well. When a device needs memory discardable
> pages would get freed and movable moved unless there is no space left
> on the device in which case allocation would fail.

If you'd actually looked at the page allocator you'd see its capable of
doing exactly that!

I has the notion of movable pages, it can defragment free space (called
compaction).

Use it!

2010-08-26 08:20:57

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 2010-08-26 at 11:49 +0900, Minchan Kim wrote:
> But one of
> problems is anonymous page which can be has a role of pinned page in
> non-swapsystem.

Well, compaction can move those around, but if you've got too many of
them its a simple matter of over-commit and for that we've got the
OOM-killer ;-)

2010-08-26 09:29:25

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 5:20 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-08-26 at 11:49 +0900, Minchan Kim wrote:
>> But one of
>> problems is anonymous page which can be has a role of pinned page in
>> non-swapsystem.
>
> Well, compaction can move those around, but if you've got too many of
> them its a simple matter of over-commit and for that we've got the
> OOM-killer ;-)
>

As I said following mail, I said about free space problem.
Of course, compaction could move anon pages into somewhere.
What's is somewhere? At last, it's same zone.
It can prevent fragment problem but not size of free space.
So I mean it would be better to move it into another zone(ex, HIGHMEM)
rather than OOM kill.

--
Kind regards,
Minchan Kim

2010-08-26 09:36:33

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 1:30 PM, KAMEZAWA Hiroyuki
<[email protected]> wrote:
> On Thu, 26 Aug 2010 13:06:28 +0900
> Minchan Kim <[email protected]> wrote:
>
>> On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
>> <[email protected]> wrote:
>> > On Thu, 26 Aug 2010 11:50:17 +0900
>> > KAMEZAWA Hiroyuki <[email protected]> wrote:
>> >
>> >> 128MB...too big ? But it's depend on config.
>> >>
>> >> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
>> >> the number of /sys files are added, maybe usable.
>> >>
>> >> Something good with this approach will be you can create "cma" memory
>> >> before installing driver.
>> >>
>> >> But yes, complicated and need some works.
>> >>
>> > Ah, I need to clarify what I want to say.
>> >
>> > With compaction, it's helpful, but you can't get contiguous memory larger
>> > than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
>> > memory hot-plug code has almost all necessary things.
>>
>> True. Doesn't patch's idea of Christoph helps this ?
>> http://lwn.net/Articles/200699/
>>
>
> yes, I think so. But, IIRC, ?it's own purpose of Chirstoph's work is
> for removing zones. please be careful what's really necessary.

Ahh. Sorry for missing point.
You're right. The patch can't help our problem.

How about changing following this?
The thing is MAX_ORDER is static. But we want to avoid too big
MAX_ORDER of whole zones to support devices which requires big
allocation chunk.
So let's add MAX_ORDER into each zone and then, each zone can have
different max order.
For example, while DMA[32], NORMAL, HIGHMEM can have normal size 11,
MOVABLE zone could have a 15.

This approach has a big side effect?

--
Kind regards,
Minchan Kim

2010-08-26 10:06:35

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 2010-08-26 at 18:29 +0900, Minchan Kim wrote:
> As I said following mail, I said about free space problem.
> Of course, compaction could move anon pages into somewhere.
> What's is somewhere? At last, it's same zone.
> It can prevent fragment problem but not size of free space.
> So I mean it would be better to move it into another zone(ex, HIGHMEM)
> rather than OOM kill.

Real machines don't have highmem, highmem sucks!! /me runs

Does cross zone movement really matter, I though these crappy devices
were mostly used on crappy hardware with very limited memory, so pretty
much everything would be in zone_normal.. no?

But sure, if there's really a need we can look at maybe doing cross zone
movement.

2010-08-26 10:12:46

by Mel Gorman

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Fri, Aug 20, 2010 at 03:15:10PM +0200, Peter Zijlstra wrote:
> On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> > Hello everyone,
> >
> > The following patchset implements a Contiguous Memory Allocator. For
> > those who have not yet stumbled across CMA an excerpt from
> > documentation:
> >
> > The Contiguous Memory Allocator (CMA) is a framework, which allows
> > setting up a machine-specific configuration for physically-contiguous
> > memory management. Memory for devices is then allocated according
> > to that configuration.
> >
> > The main role of the framework is not to allocate memory, but to
> > parse and manage memory configurations, as well as to act as an
> > in-between between device drivers and pluggable allocators. It is
> > thus not tied to any memory allocation method or strategy.
> >
> > For more information please refer to the second patch from the
> > patchset which contains the documentation.
>

I'm only taking a quick look at this - slow as ever so pardon me if I
missed anything.

> So the idea is to grab a large chunk of memory at boot time and then
> later allow some device to use it?
>
> I'd much rather we'd improve the regular page allocator to be smarter
> about this. We recently added a lot of smarts to it like memory
> compaction, which allows large gobs of contiguous memory to be freed for
> things like huge pages.
>

Quick glance tells me that buffer sizes of 20MB are being thrown about
which the core page allocator doesn't handle very well (and couldn't
without major modification). Fragmentation avoidance only works well on
sizes < MAX_ORDER_NR_PAGES which likely will be 2MB or 4MB.

That said, there are things the core VM can do to help. One is related
to ZONE_MOVABLE and the second is on the use of MIGRATE_ISOLATE.

ZONE_MOVABLE is setup when the command line has kernelcore= or movablecore=
specified. In ZONE_MOVABLE only pages that can be migrated are allocated
(or huge pages if specifically configured to be allowed). The zone is setup
during initialisation by slicing pieces from the end of existing zones and
for various reasons, it would be best to maintain that behaviour unless CMA
had a specific requirement for memory in the middle of an existing zone.

So lets say the maximum amount of contiguous memory required by all
devices is 64M and ZONE_MOVABLE is 64M. During normal operation, normal
order-0 pages can be allocated from this zone meaning the memory is not
pinned and unusable by anybody else. This avoids wasting memory. When a
device needs a new buffer, compaction would need some additional smarts
to compact or reclaim the size of memory needed by the driver but
because all the pages in the zone are movable, it should be possible.
Ideally it would have swap to reclaim because if not, compaction needs
to know how to move pages outside a zone (something it currently
avoids).

Essentially, cma_alloc() would be a normal alloc_pages that uses
ZONE_MOVABLE for buffers < MAX_ORDER_NR_PAGES but would need additional
compaction smarts for the larger buffers. I think it would reuse as much
of the existing VM as possible but without reviewing the code, I don't
know for sure how useful the suggestion is.

> If you want guarantees you can free stuff, why not add constraints to
> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> certain region, those pages are easily freed/moved aside to satisfy
> large contiguous allocations.
>

Relatively handy to do something like this. It can also be somewhat
contrained by doing something similar to MIGRATE_ISOLATE to have
contiguous regions of memory in a zone unusable by non-movable
allocationos. It would be a lot trickier when interacting with reclaim
though so using ZONE_MOVABLE would have less gotchas.

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-08-26 10:19:12

by Mel Gorman

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 04:40:46AM +0200, Micha?? Nazarewicz wrote:
> Hello Andrew,
>
> I think Pawel has replied to most of your comments, so I'll just add my own
> 0.02 KRW. ;)
>
>> Peter Zijlstra <[email protected]> wrote:
>>> So the idea is to grab a large chunk of memory at boot time and then
>>> later allow some device to use it?
>>>
>>> I'd much rather we'd improve the regular page allocator to be smarter
>>> about this. We recently added a lot of smarts to it like memory
>>> compaction, which allows large gobs of contiguous memory to be freed for
>>> things like huge pages.
>>>
>>> If you want guarantees you can free stuff, why not add constraints to
>>> the page allocation type and only allow MIGRATE_MOVABLE pages inside a
>>> certain region, those pages are easily freed/moved aside to satisfy
>>> large contiguous allocations.
>
> On Thu, 26 Aug 2010 00:58:14 +0200, Andrew Morton <[email protected]> wrote:
>> That would be good. Although I expect that the allocation would need
>> to be 100% rock-solid reliable, otherwise the end user has a
>> non-functioning device. Could generic core VM provide the required level
>> of service?
>
> I think that the biggest problem is fragmentation here. For instance,
> I think that a situation where there is enough free space but it's
> fragmented so no single contiguous chunk can be allocated is a serious
> problem. However, I would argue that if there's simply no space left,
> a multimedia device could fail and even though it's not desirable, it
> would not be such a big issue in my eyes.
>

For handling fragmentation, there is the option of ZONE_MOVABLE so it's
usable by normal allocations but the CMA can take action to get it
cleared out if necessary. Another option that is trickier but less
disruptive would be to select a range of memory in a normal zone for CMA
and mark it MIGRATE_MOVABLE so that movable pages are allocated from it.
The trickier part is you need to make that bit stick so that non-movable
pages are never allocated from that range. That would be trickish to
implement but possible and it would avoid the fragmentation
problem without pinning memory.

> So, if only movable or discardable pages are allocated in CMA managed
> regions all should work well. When a device needs memory discardable
> pages would get freed and movable moved unless there is no space left
> on the device in which case allocation would fail.
>
> Critical devices (just a hypothetical entities) could have separate
> regions on which only discardable pages can be allocated so that memory
> can always be allocated for them.
>
>> I agree that having two "contiguous memory allocators" floating about
>> on the list is distressing. Are we really all 100% diligently certain
>> that there is no commonality here with Zach's work?
>
> As Pawel said, I think Zach's trying to solve a different problem. No
> matter, as I've said in response to Konrad's message, I have thought
> about unifying Zach's IOMMU and CMA in such a way that devices could
> work on both systems with and without IOMMU if only they would limit
> the usage of the API to some subset which always works.
>
>> Please cc me on future emails on this topic?
>
> Not a problem.
>
> --
> Best regards, _ _
> | Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
> | Computer Science, Micha?? "mina86" Nazarewicz (o o)
> +----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to [email protected]. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"[email protected]"> [email protected] </a>
>

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-08-26 10:21:59

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, Aug 26, 2010 at 7:06 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, 2010-08-26 at 18:29 +0900, Minchan Kim wrote:
>> As I said following mail, I said about free space problem.
>> Of course, compaction could move anon pages into somewhere.
>> What's is somewhere? At last, it's same zone.
>> It can prevent fragment problem but not size of free space.
>> So I mean it would be better to move it into another zone(ex, HIGHMEM)
>> rather than OOM kill.
>
> Real machines don't have highmem, highmem sucks!! /me runs

It's another topic.
I agree highmem isn't a gorgeous. But my desktop isn't real machine?
Important thing is that we already have a highmem and many guys
include you(kmap stacking patch :))try to improve highmem problems. :)

>
> Does cross zone movement really matter, I though these crappy devices
> were mostly used on crappy hardware with very limited memory, so pretty
> much everything would be in zone_normal.. no?

No. Until now, many embedded devices have used to small memory. In
that case, only there is a DMA zone in system. But as I know, mobile
phone starts to use big(?) memory like 1G or above sooner or later. So
they starts to use HIGHMEM. Otherwise, 2G/2G space configuration.
Some embedded device uses many thread model to port easily from RTOS.
In that case, they don't have enough address space for application if
it uses 2G/2G model.

So we should care of HIGHMEM in embedded system from now on.

>
> But sure, if there's really a need we can look at maybe doing cross zone
> movement.
>

--
Kind regards,
Minchan Kim

2010-08-26 11:05:57

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

Even more offtopic ;-)

On Thu, 2010-08-26 at 19:21 +0900, Minchan Kim wrote:
> I agree highmem isn't a gorgeous. But my desktop isn't real machine?
> Important thing is that we already have a highmem and many guys
> include you(kmap stacking patch :))try to improve highmem problems. :)

I have exactly 0 machines in daily use that use highmem, I had to test
that kmap stuff in a 32bit qemu.

Sadly some hardware folks still think its a sane thing to do, like ARM
announcing 40bit PAE, I mean really?!

At least AMD announced a 64bit tiny-chip and hopefully Intel Atom will
soon be all 64bit too (please?!).

2010-08-26 13:47:55

by Mel Gorman

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 2/6] mm: cma: Contiguous Memory Allocator added

On Fri, Aug 20, 2010 at 11:50:42AM +0200, Michal Nazarewicz wrote:
> The Contiguous Memory Allocator framework is a set of APIs for
> allocating physically contiguous chunks of memory.
>
> Various chips require contiguous blocks of memory to operate. Those
> chips include devices such as cameras, hardware video decoders and
> encoders, etc.
>
> The code is highly modular and customisable to suit the needs of
> various users. Set of regions reserved for CMA can be configured
> per-platform and it is easy to add custom allocator algorithms if one
> has such need.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> Reviewed-by: Pawel Osciak <[email protected]>
> ---

Please do not consider this a proper review. I'm only glancing through
it.

> Documentation/00-INDEX | 2 +
> Documentation/contiguous-memory.txt | 541 +++++++++++++++++++++
> include/linux/cma.h | 431 +++++++++++++++++
> mm/Kconfig | 34 ++
> mm/Makefile | 2 +
> mm/cma-best-fit.c | 407 ++++++++++++++++
> mm/cma.c | 910 +++++++++++++++++++++++++++++++++++
> 7 files changed, 2327 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/contiguous-memory.txt
> create mode 100644 include/linux/cma.h
> create mode 100644 mm/cma-best-fit.c
> create mode 100644 mm/cma.c
>
> diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
> index 8dfc670..f93e787 100644
> --- a/Documentation/00-INDEX
> +++ b/Documentation/00-INDEX
> @@ -94,6 +94,8 @@ connector/
> - docs on the netlink based userspace<->kernel space communication mod.
> console/
> - documentation on Linux console drivers.
> +contiguous-memory.txt
> + - documentation on physically-contiguous memory allocation framework.
> cpu-freq/
> - info on CPU frequency and voltage scaling.
> cpu-hotplug.txt
> diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
> new file mode 100644
> index 0000000..8fc2400
> --- /dev/null
> +++ b/Documentation/contiguous-memory.txt
> @@ -0,0 +1,541 @@
> + -*- org -*-
> +
> +* Contiguous Memory Allocator
> +
> + The Contiguous Memory Allocator (CMA) is a framework, which allows
> + setting up a machine-specific configuration for physically-contiguous
> + memory management. Memory for devices is then allocated according
> + to that configuration.
> +
> + The main role of the framework is not to allocate memory, but to
> + parse and manage memory configurations, as well as to act as an
> + in-between between device drivers and pluggable allocators. It is
> + thus not tied to any memory allocation method or strategy.
> +
> +** Why is it needed?
> +
> + Various devices on embedded systems have no scatter-getter and/or
> + IO map support and as such require contiguous blocks of memory to
> + operate. They include devices such as cameras, hardware video
> + decoders and encoders, etc.
> +
> + Such devices often require big memory buffers (a full HD frame is,
> + for instance, more then 2 mega pixels large, i.e. more than 6 MB
> + of memory), which makes mechanisms such as kmalloc() ineffective.
> +

So more than 6MB of memory means the page allocator cannot automatically
grant the requests. That's fine but I'd still like to be as close to the
page allocator if possible.

> + Some embedded devices impose additional requirements on the
> + buffers, e.g. they can operate only on buffers allocated in
> + particular location/memory bank (if system has more than one
> + memory bank) or buffers aligned to a particular memory boundary.
> +

An important consideration is if the alignment is always a natural
alignment? i.e. a 64K buffer must be 64K aligned, 128K must be 128K aligned
etc. I ask because the buddy allocator is great at granting natural alignments
but is difficult to work with for other alignments.

> + Development of embedded devices have seen a big rise recently
> + (especially in the V4L area) and many such drivers include their
> + own memory allocation code. Most of them use bootmem-based methods.
> + CMA framework is an attempt to unify contiguous memory allocation
> + mechanisms and provide a simple API for device drivers, while
> + staying as customisable and modular as possible.
> +

If drivers are using bootmem and custom allocators, I agree that some common
framework is needed. If every device depended on bootmem, there would be huge
chunks of unusable memory. i.e. At first glance, I think this is important
in concept.

> +** Design
> +
> + The main design goal for the CMA was to provide a customisable and
> + modular framework, which could be configured to suit the needs of
> + individual systems. Configuration specifies a list of memory
> + regions, which then are assigned to devices. Memory regions can
> + be shared among many device drivers or assigned exclusively to
> + one. This has been achieved in the following ways:
> +

It'd be very nice if the shared regions could also be used by normal movable
memory allocations to minimise the amount of wastage. I imagine this would
be particularly important on memory-constrained devices. So right now,
we have

#define MIGRATE_UNMOVABLE 0
#define MIGRATE_RECLAIMABLE 1
#define MIGRATE_MOVABLE 2
#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */
#define MIGRATE_RESERVE 3
#define MIGRATE_ISOLATE 4 /* can't allocate from here */
#define MIGRATE_TYPES 5

Conceptually speaking we also want

MIGRATE_MOVABLE_STICKY /* Set by CMA, used by CMA and GFP_MOVABLE */
MIGRATE_MOVABLE_EXCLUSIVE /* Set by CMA, exclusive use of a device */

Sticky would be usable by the page allocator and other than forcing
the migrate_type to be MIGRATE_MOVABLE, it would otherwise be normal
memory. Exclusive would be isolated from normal usage by taking the pages from
the normal free lists and putting them on a free list managed by CMA. Normally
the page allocator uses zone->free_area[] for its free lists. The allocator
would need to handle either free_area from a zone or one provided by a device
using CMA. Would be tricky to pass through admittedly.

I recognise this is not straight-forward so consider these to be suggestions,
not requirements. Glancing through, I don't see why these patches could not
evolve to be closer to the page allocator for example rather than happening
at the very start.

> + 1. The core of the CMA does not handle allocation of memory and
> + management of free space. Dedicated allocators are used for
> + that purpose.
> +

My ideal would be the default allocator was the buddy allocator
beginning at __rmqueue_smallest() with a custom allocator provided only
if absolutly required.

Custom allocators are not easy to get right :/

> + This way, if the provided solution does not match demands
> + imposed on a given system, one can develop a new algorithm and
> + easily plug it into the CMA framework.
> +
> + The presented solution includes an implementation of a best-fit
> + algorithm.
> +
> + 2. When requesting memory, devices have to introduce themselves.
> + This way CMA knows who the memory is allocated for. This
> + allows for the system architect to specify which memory regions
> + each device should use.
> +
> + 3. Memory regions are grouped in various "types". When device
> + requests a chunk of memory, it can specify what type of memory
> + it needs. If no type is specified, "common" is assumed.
> +
> + This makes it possible to configure the system in such a way,
> + that a single device may get memory from different memory
> + regions, depending on the "type" of memory it requested. For
> + example, a video codec driver might want to allocate some
> + shared buffers from the first memory bank and the other from
> + the second to get the highest possible memory throughput.
> +
> + 4. For greater flexibility and extensibility, the framework allows
> + device drivers to register private regions of reserved memory
> + which then may be used only by them.
> +
> + As an effect, if a driver would not use the rest of the CMA
> + interface, it can still use CMA allocators and other
> + mechanisms.
> +
> + 4a. Early in boot process, device drivers can also request the
> + CMA framework to a reserve a region of memory for them
> + which then will be used as a private region.
> +
> + This way, drivers do not need to directly call bootmem,
> + memblock or similar early allocator but merely register an
> + early region and the framework will handle the rest
> + including choosing the right early allocator.
> +
> +** Use cases
> +
> + Let's analyse some imaginary system that uses the CMA to see how
> + the framework can be used and configured.
> +
> +
> + We have a platform with a hardware video decoder and a camera each
> + needing 20 MiB of memory in the worst case. Our system is written
> + in such a way though that the two devices are never used at the
> + same time and memory for them may be shared. In such a system the
> + following configuration would be used in the platform
> + initialisation code:
> +
> + static struct cma_region regions[] = {
> + { .name = "region", .size = 20 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "video,camera=region";
> +
> + cma_set_defaults(regions, map);
> +
> + The regions array defines a single 20-MiB region named "region".
> + The map says that drivers named "video" and "camera" are to be
> + granted memory from the previously defined region.
> +
> + A shorter map can be used as well:
> +
> + static const char map[] __initconst = "*=region";
> +
> + The asterisk ("*") matches all devices thus all devices will use
> + the region named "region".
> +
> + We can see, that because the devices share the same memory region,
> + we save 20 MiB, compared to the situation when each of the devices
> + would reserve 20 MiB of memory for itself.
> +
> +
> + Now, let's say that we have also many other smaller devices and we
> + want them to share some smaller pool of memory. For instance 5
> + MiB. This can be achieved in the following way:
> +
> + static struct cma_region regions[] = {
> + { .name = "region", .size = 20 << 20 },
> + { .name = "common", .size = 5 << 20 },
> + { }
> + }
> + static const char map[] __initconst =
> + "video,camera=region;*=common";
> +
> + cma_set_defaults(regions, map);
> +
> + This instructs CMA to reserve two regions and let video and camera
> + use region "region" whereas all other devices should use region
> + "common".
> +

Based on these requirements I guess it would go something like

For camera=region
1. Allocate free_area for free lists and associate with cma_region
2. Find contiguous range of MIGRATE_MOVABLE blocks
3. Mark MIGRATE_MOVABLE_STICKY
4. Remove pages from zone free lists and add to cma_region freelist

On allocation, cma_alloc passes a cma_control structure
including the cma_region. Bypass the per-cpu allocator and all
that. Use the normal allocator but use the cma_region free
lists.

All allocations for CMA must be compound so that the page has a
destructor. Store what cma_region the compound page belongs on the
struct page. This is tricky for single pages so it would be
ideal if the page could always be compound.

On free, the destructor adds the page back onto the cma_region
free list. Hugetlbfs does something like this

So, other than where the free list is, allocation is using the
core page allocator

That all said, you could just always go with your BEST_FIT
allocator when the use is exclusive.

For *=common
1. Find contiguous range of MIGRATE_MOVABLE blocks
2. Mark MIGRATE_MOVABLE_STICKY

On allocation for < MAX_ORDER_NR_PAGES, just specify __GFP_CMA. This
will allow allocation from regions marked MIGRATE_MOVABLE_STICKY.
If a suitable page is not found, compaction is used to vacate all
MOVABLE pages from all MIGRATE_MOVABLE_STICKY regions (you could be
smarter about it but as a start, move everything)

If the allocation is > MAX_ORDER_NR_PAGES, start by migrating all
movable pages out of the MIGRATE_MOVABLE_STICKY region and then fall
back to a linear scan. You could fall back to BEST_FIT if and only
if all the other MOVABLE pages that the best fit algorithm is not
aware of got moved out of the way or that the best fit algorithm
was informed where the unmovable pages happen to be.

Again, I recognise this is not easy and there will be some weird interaction
with page reclaim which will need to take the number of CMA regions into
account. It's just a suggestion on what direction you could take it to avoid
a mess of custom allocators. The current design looks like it could migrate
towards the core page allocator and share pages to minimise wastage so it's
not a blocker to merging.

> +
> + Later on, after some development of the system, it can now run
> + video decoder and camera at the same time. The 20 MiB region is
> + no longer enough for the two to share. A quick fix can be made to
> + grant each of those devices separate regions:
> +
> + static struct cma_region regions[] = {
> + { .name = "v", .size = 20 << 20 },
> + { .name = "c", .size = 20 << 20 },
> + { .name = "common", .size = 5 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "video=v;camera=c;*=common";
> +
> + cma_set_defaults(regions, map);
> +
> + This solution also shows how with CMA you can assign private pools
> + of memory to each device if that is required.
> +
> +
> + Allocation mechanisms can be replaced dynamically in a similar
> + manner as well. Let's say that during testing, it has been
> + discovered that, for a given shared region of 40 MiB,
> + fragmentation has become a problem. It has been observed that,
> + after some time, it becomes impossible to allocate buffers of the
> + required sizes. So to satisfy our requirements, we would have to
> + reserve a larger shared region beforehand.
> +
> + But fortunately, you have also managed to develop a new allocation
> + algorithm -- Neat Allocation Algorithm or "na" for short -- which
> + satisfies the needs for both devices even on a 30 MiB region. The
> + configuration can be then quickly changed to:
> +
> + static struct cma_region regions[] = {
> + { .name = "region", .size = 30 << 20, .alloc_name = "na" },
> + { .name = "common", .size = 5 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "video,camera=region;*=common";
> +
> + cma_set_defaults(regions, map);
> +
> + This shows how you can develop your own allocation algorithms if
> + the ones provided with CMA do not suit your needs and easily
> + replace them, without the need to modify CMA core or even
> + recompiling the kernel.
> +
> +** Technical Details
> +
> +*** The attributes
> +
> + As shown above, CMA is configured by a two attributes: list
> + regions and map. The first one specifies regions that are to be
> + reserved for CMA. The second one specifies what regions each
> + device is assigned to.
> +
> +**** Regions
> +
> + Regions is a list of regions terminated by a region with size
> + equal zero. The following fields may be set:
> +
> + - size -- size of the region (required, must not be zero)
> + - alignment -- alignment of the region; must be power of two or
> + zero (optional)
> + - start -- where the region has to start (optional)
> + - alloc_name -- the name of allocator to use (optional)
> + - alloc -- allocator to use (optional; and besides
> + alloc_name is probably is what you want)
> +
> + size, alignment and start is specified in bytes. Size will be
> + aligned up to a PAGE_SIZE. If alignment is less then a PAGE_SIZE
> + it will be set to a PAGE_SIZE. start will be aligned to
> + alignment.
> +
> +**** Map
> +
> + The format of the "map" attribute is as follows:
> +
> + map-attr ::= [ rules [ ';' ] ]
> + rules ::= rule [ ';' rules ]
> + rule ::= patterns '=' regions
> +
> + patterns ::= pattern [ ',' patterns ]
> +
> + regions ::= REG-NAME [ ',' regions ]
> + // list of regions to try to allocate memory
> + // from
> +
> + pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
> + // pattern request must match for the rule to
> + // apply; the first rule that matches is
> + // applied; if dev-pattern part is omitted
> + // value identical to the one used in previous
> + // pattern is assumed.
> +
> + dev-pattern ::= PATTERN
> + // pattern that device name must match for the
> + // rule to apply; may contain question marks
> + // which mach any characters and end with an
> + // asterisk which match the rest of the string
> + // (including nothing).
> +
> + It is a sequence of rules which specify what regions should given
> + (device, type) pair use. The first rule that matches is applied.
> +
> + For rule to match, the pattern must match (dev, type) pair.
> + Pattern consist of the part before and after slash. The first
> + part must match device name and the second part must match kind.
> +
> + If the first part is empty, the device name is assumed to match
> + iff it matched in previous pattern. If the second part is
> + omitted it will mach any type of memory requested by device.
> +
> + Some examples (whitespace added for better readability):
> +
> + cma_map = foo/quaz = r1;
> + // device foo with type == "quaz" uses region r1
> +
> + foo/* = r2; // OR:
> + /* = r2;
> + // device foo with any other kind uses region r2
> +
> + bar = r1,r2;
> + // device bar uses region r1 or r2
> +
> + baz?/a , baz?/b = r3;
> + // devices named baz? where ? is any character
> + // with type being "a" or "b" use r3
> +
> +*** The device and types of memory
> +
> + The name of the device is taken from the device structure. It is
> + not possible to use CMA if driver does not register a device
> + (actually this can be overcome if a fake device structure is
> + provided with at least the name set).
> +
> + The type of memory is an optional argument provided by the device
> + whenever it requests memory chunk. In many cases this can be
> + ignored but sometimes it may be required for some devices.
> +
> + For instance, let's say that there are two memory banks and for
> + performance reasons a device uses buffers in both of them.
> + Platform defines a memory types "a" and "b" for regions in both
> + banks. The device driver would use those two types then to
> + request memory chunks from different banks. CMA attributes could
> + look as follows:
> +
> + static struct cma_region regions[] = {
> + { .name = "a", .size = 32 << 20 },
> + { .name = "b", .size = 32 << 20, .start = 512 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "foo/a=a;foo/b=b;*=a,b";
> +
> + And whenever the driver allocated the memory it would specify the
> + kind of memory:
> +
> + buffer1 = cma_alloc(dev, "a", 1 << 20, 0);
> + buffer2 = cma_alloc(dev, "b", 1 << 20, 0);
> +
> + If it was needed to try to allocate from the other bank as well if
> + the dedicated one is full, the map attributes could be changed to:
> +
> + static const char map[] __initconst = "foo/a=a,b;foo/b=b,a;*=a,b";
> +
> + On the other hand, if the same driver was used on a system with
> + only one bank, the configuration could be changed just to:
> +
> + static struct cma_region regions[] = {
> + { .name = "r", .size = 64 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "*=r";
> +
> + without the need to change the driver at all.
> +
> +*** Device API
> +
> + There are three basic calls provided by the CMA framework to
> + devices. To allocate a chunk of memory cma_alloc() function needs
> + to be used:
> +
> + dma_addr_t cma_alloc(const struct device *dev, const char *type,
> + size_t size, dma_addr_t alignment);
> +
> + If required, device may specify alignment in bytes that the chunk
> + need to satisfy. It have to be a power of two or zero. The
> + chunks are always aligned at least to a page.
> +
> + The type specifies the type of memory as described to in the
> + previous subsection. If device driver does not care about memory
> + type it can safely pass NULL as the type which is the same as
> + possing "common".
> +
> + The basic usage of the function is just a:
> +
> + addr = cma_alloc(dev, NULL, size, 0);
> +
> + The function returns physical address of allocated chunk or
> + a value that evaluates to true if checked with IS_ERR_VALUE(), so
> + the correct way for checking for errors is:
> +
> + unsigned long addr = cma_alloc(dev, size);
> + if (IS_ERR_VALUE(addr))
> + /* Error */
> + return (int)addr;
> + /* Allocated */
> +
> + (Make sure to include <linux/err.h> which contains the definition
> + of the IS_ERR_VALUE() macro.)
> +
> +
> + Allocated chunk is freed via a cma_free() function:
> +
> + int cma_free(dma_addr_t addr);
> +
> + It takes physical address of the chunk as an argument frees it.
> +
> +
> + The last function is the cma_info() which returns information
> + about regions assigned to given (dev, type) pair. Its syntax is:
> +
> + int cma_info(struct cma_info *info,
> + const struct device *dev,
> + const char *type);
> +
> + On successful exit it fills the info structure with lower and
> + upper bound of regions, total size and number of regions assigned
> + to given (dev, type) pair.
> +
> +**** Dynamic and private regions
> +
> + In the basic setup, regions are provided and initialised by
> + platform initialisation code (which usually use
> + cma_set_defaults() for that purpose).
> +
> + It is, however, possible to create and add regions dynamically
> + using cma_region_register() function.
> +
> + int cma_region_register(struct cma_region *reg);
> +
> + The region does not have to have name. If it does not, it won't
> + be accessed via standard mapping (the one provided with map
> + attribute). Such regions are private and to allocate chunk from
> + them, one needs to call:
> +
> + dma_addr_t cma_alloc_from_region(struct cma_region *reg,
> + size_t size, dma_addr_t alignment);
> +
> + It is just like cma_alloc() expect one specifies what region to
> + allocate memory from. The region must have been registered.
> +
> +**** Allocating from region specified by name
> +
> + If a driver preferred allocating from a region or list of regions
> + it knows name of it can use a different call simmilar to the
> + previous:
> +
> + dma_addr_t cma_alloc_from(const char *regions,
> + size_t size, dma_addr_t alignment);
> +
> + The first argument is a comma-separated list of regions the
> + driver desires CMA to try and allocate from. The list is
> + terminated by a NUL byte or a semicolon.
> +
> + Similarly, there is a call for requesting information about named
> + regions:
> +
> + int cma_info_about(struct cma_info *info, const char *regions);
> +
> + Generally, it should not be needed to use those interfaces but
> + they are provided nevertheless.
> +
> +**** Registering early regions
> +
> + An early region is a region that is managed by CMA early during
> + boot process. It's platforms responsibility to reserve memory
> + for early regions. Later on, when CMA initialises, early regions
> + with reserved memory are registered as normal regions.
> + Registering an early region may be a way for a device to request
> + a private pool of memory without worrying about actually
> + reserving the memory:
> +
> + int cma_early_region_register(struct cma_region *reg);
> +
> + This needs to be done quite early on in boot process, before
> + platform traverses the cma_early_regions list to reserve memory.
> +
> + When boot process ends, device driver may see whether the region
> + was reserved (by checking reg->reserved flag) and if so, whether
> + it was successfully registered as a normal region (by checking
> + the reg->registered flag). If that is the case, device driver
> + can use normal API calls to use the region.
> +
> +*** Allocator operations
> +
> + Creating an allocator for CMA needs four functions to be
> + implemented.
> +
> +
> + The first two are used to initialise an allocator far given driver
> + and clean up afterwards:
> +
> + int cma_foo_init(struct cma_region *reg);
> + void cma_foo_cleanup(struct cma_region *reg);
> +
> + The first is called when allocater is attached to region. The
> + cma_region structure has saved starting address of the region as
> + well as its size. Any data that allocate associated with the
> + region can be saved in private_data field.
> +
> + The second call cleans up and frees all resources the allocator
> + has allocated for the region. The function can assume that all
> + chunks allocated form this region have been freed thus the whole
> + region is free.
> +
> +
> + The two other calls are used for allocating and freeing chunks.
> + They are:
> +
> + struct cma_chunk *cma_foo_alloc(struct cma_region *reg,
> + size_t size, dma_addr_t alignment);
> + void cma_foo_free(struct cma_chunk *chunk);
> +
> + As names imply the first allocates a chunk and the other frees
> + a chunk of memory. It also manages a cma_chunk object
> + representing the chunk in physical memory.
> +
> + Either of those function can assume that they are the only thread
> + accessing the region. Therefore, allocator does not need to worry
> + about concurrency. Moreover, all arguments are guaranteed to be
> + valid (i.e. page aligned size, a power of two alignment no lower
> + the a page size).
> +
> +
> + When allocator is ready, all that is left is to register it by
> + calling cma_allocator_register() function:
> +
> + int cma_allocator_register(struct cma_allocator *alloc);
> +
> + The argument is an structure with pointers to the above functions
> + and allocator's name. The whole call may look something like
> + this:
> +
> + static struct cma_allocator alloc = {
> + .name = "foo",
> + .init = cma_foo_init,
> + .cleanup = cma_foo_cleanup,
> + .alloc = cma_foo_alloc,
> + .free = cma_foo_free,
> + };
> + return cma_allocator_register(&alloc);
> +
> + The name ("foo") will be available to use with command line
> + argument.
> +
> +*** Integration with platform
> +
> + There is one function that needs to be called form platform
> + initialisation code. That is the cma_early_regions_reserve()
> + function:
> +
> + void cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
> +
> + It traverses list of all of the regions given on command line and
> + reserves memory for them. The only argument is a callback
> + function used to reserve the region. Passing NULL as the argument
> + makes the function use cma_early_region_reserve() function which
> + uses bootmem and memblock for allocating.
> +
> + Alternatively, platform code could traverse the cma_early_regions
> + list by itself but this should not be necessary.
> +
> +
> + Platform has also a way of providing default attributes for CMA,
> + cma_set_defaults() function is used for that purpose:
> +
> + int cma_set_defaults(struct cma_region *regions, const char *map)
> +
> + It needs to be called prior to reserving regions. It let one
> + specify the list of regions defined by platform and the map
> + attribute. The map may point to a string in __initdata. See
> + above in this document for example usage of this function.
> +
> +** Future work
> +
> + In the future, implementation of mechanisms that would allow the
> + free space inside the regions to be used as page cache, filesystem
> + buffers or swap devices is planned. With such mechanisms, the
> + memory would not be wasted when not used.
> +
> + Because all allocations and freeing of chunks pass the CMA
> + framework it can follow what parts of the reserved memory are
> + freed and what parts are allocated. Tracking the unused memory
> + would let CMA use it for other purposes such as page cache, I/O
> + buffers, swap, etc.
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> new file mode 100644
> index 0000000..cd63f52
> --- /dev/null
> +++ b/include/linux/cma.h
> @@ -0,0 +1,431 @@
> +#ifndef __LINUX_CMA_H
> +#define __LINUX_CMA_H
> +
> +/*
> + * Contiguous Memory Allocator framework
> + * Copyright (c) 2010 by Samsung Electronics.
> + * Written by Michal Nazarewicz ([email protected])
> + */
> +
> +/*
> + * See Documentation/contiguous-memory.txt for details.
> + */
> +
> +/***************************** Kernel lever API *****************************/
> +

s/lever/level/ ?

> +#ifdef __KERNEL__
> +
> +#include <linux/rbtree.h>
> +#include <linux/list.h>
> +
> +

Unnecessary whitespace. I'll keep these comments to a minimum. They are
distracting at best but I suggest you take a pass at cleaning up stuff
like this. It'll avoid your feedback being a mess of trivial cleanups
and no "proper" feedback :)

> +struct device;
> +struct cma_info;
> +
> +/*
> + * Don't call it directly, use cma_alloc(), cma_alloc_from() or
> + * cma_alloc_from_region().
> + */
> +dma_addr_t __must_check
> +__cma_alloc(const struct device *dev, const char *type,
> + size_t size, dma_addr_t alignment);
> +

So, lets say hypothetically speaking you used the core page allocator
where possible, you would be tranlating a size to an order. This is not
a problem but you'd have to watch the alignment because the page
allocator is only suitable for natural alignments.

> +/* Don't call it directly, use cma_info() or cma_info_about(). */
> +int
> +__cma_info(struct cma_info *info, const struct device *dev, const char *type);
> +

Don't put it in the header then :/

> +
> +/**
> + * cma_alloc - allocates contiguous chunk of memory.
> + * @dev: The device to perform allocation for.
> + * @type: A type of memory to allocate. Platform may define
> + * several different types of memory and device drivers
> + * can then request chunks of different types. Usually it's
> + * safe to pass NULL here which is the same as passing
> + * "common".
> + * @size: Size of the memory to allocate in bytes.
> + * @alignment: Desired alignment in bytes. Must be a power of two or

Nice one, must be a power of two implies natural alignment to the page
allocator!

> + * zero. If alignment is less then a page size it will be
> + * set to page size. If unsure, pass zero here.
> + *
> + * On error returns a negative error cast to dma_addr_t. Use
> + * IS_ERR_VALUE() to check if returned value is indeed an error.
> + * Otherwise physical address of the chunk is returned.
> + */
> +static inline dma_addr_t __must_check
> +cma_alloc(const struct device *dev, const char *type,
> + size_t size, dma_addr_t alignment)
> +{
> + return dev ? __cma_alloc(dev, type, size, alignment) : -EINVAL;
> +}
> +
> +
> +/**
> + * struct cma_info - information about regions returned by cma_info().
> + * @lower_bound: The smallest address that is possible to be
> + * allocated for given (dev, type) pair.
> + * @upper_bound: The one byte after the biggest address that is
> + * possible to be allocated for given (dev, type)
> + * pair.
> + * @total_size: Total size of regions mapped to (dev, type) pair.
> + * @free_size: Total free size in all of the regions mapped to (dev, type)
> + * pair. Because of possible race conditions, it is not
> + * guaranteed that the value will be correct -- it gives only
> + * an approximation.
> + * @count: Number of regions mapped to (dev, type) pair.
> + */
> +struct cma_info {
> + dma_addr_t lower_bound, upper_bound;
> + size_t total_size, free_size;
> + unsigned count;
> +};
> +
> +/**
> + * cma_info - queries information about regions.
> + * @info: Pointer to a structure where to save the information.
> + * @dev: The device to query information for.
> + * @type: A type of memory to query information for.
> + * If unsure, pass NULL here which is equal to passing
> + * "common".
> + *
> + * On error returns a negative error, zero otherwise.
> + */
> +static inline int
> +cma_info(struct cma_info *info, const struct device *dev, const char *type)
> +{
> + return dev ? __cma_info(info, dev, type) : -EINVAL;
> +}
> +
> +
> +/**
> + * cma_free - frees a chunk of memory.
> + * @addr: Beginning of the chunk.
> + *
> + * Returns -ENOENT if there is no chunk at given location; otherwise
> + * zero. In the former case issues a warning.
> + */
> +int cma_free(dma_addr_t addr);
> +

Is it not an error to free a non-existant chunk? Hope it WARN()s at
least.

> +
> +
> +/****************************** Lower lever API *****************************/
> +

How lower? If it can be hidden, put it in a private header.

> +/**
> + * cma_alloc_from - allocates contiguous chunk of memory from named regions.

Ideally named regions would be managed by default by free_area and the core
page allocator.

> + * @regions: Comma separated list of region names. Terminated by NUL
> + * byte or a semicolon.
> + * @size: Size of the memory to allocate in bytes.
> + * @alignment: Desired alignment in bytes. Must be a power of two or
> + * zero. If alignment is less then a page size it will be
> + * set to page size. If unsure, pass zero here.
> + *
> + * On error returns a negative error cast to dma_addr_t. Use
> + * IS_ERR_VALUE() to check if returned value is indeed an error.
> + * Otherwise physical address of the chunk is returned.
> + */
> +static inline dma_addr_t __must_check
> +cma_alloc_from(const char *regions, size_t size, dma_addr_t alignment)
> +{
> + return __cma_alloc(NULL, regions, size, alignment);
> +}
> +
> +/**
> + * cma_info_about - queries information about named regions.
> + * @info: Pointer to a structure where to save the information.
> + * @regions: Comma separated list of region names. Terminated by NUL
> + * byte or a semicolon.
> + *
> + * On error returns a negative error, zero otherwise.
> + */
> +static inline int
> +cma_info_about(struct cma_info *info, const const char *regions)
> +{
> + return __cma_info(info, NULL, regions);
> +}
> +
> +
> +
> +struct cma_allocator;
> +

So, I would hope that a default allocator would be something that sits above
__rmqueue_smallest with some juggling to allow __rmqueue_smallest to take
an arbitrary free_area. The CMA wrapper around it would need to know things
like how to call compaction to move pages out of MIGRATE_MOVABLE_STICKY
if necessary.

> +/**
> + * struct cma_region - a region reserved for CMA allocations.
> + * @name: Unique name of the region. Read only.
> + * @start: Physical starting address of the region in bytes. Always
> + * aligned at least to a full page. Read only.
> + * @size: Size of the region in bytes. Multiply of a page size.
> + * Read only.
> + * @free_space: Free space in the region. Read only.
> + * @alignment: Desired alignment of the region in bytes. A power of two,
> + * always at least page size. Early.
> + * @alloc: Allocator used with this region. NULL means allocator is
> + * not attached. Private.
> + * @alloc_name: Allocator name read from cmdline. Private. This may be
> + * different from @alloc->name.
> + * @private_data: Allocator's private data.
> + * @users: Number of chunks allocated in this region.
> + * @list: Entry in list of regions. Private.
> + * @used: Whether region was already used, ie. there was at least
> + * one allocation request for. Private.
> + * @registered: Whether this region has been registered. Read only.
> + * @reserved: Whether this region has been reserved. Early. Read only.
> + * @copy_name: Whether @name and @alloc_name needs to be copied when
> + * this region is converted from early to normal. Early.
> + * Private.
> + * @free_alloc_name: Whether @alloc_name was kmalloced(). Private.
> + *
> + * Regions come in two types: an early region and normal region. The
> + * former can be reserved or not-reserved. Fields marked as "early"
> + * are only meaningful in early regions.
> + *
> + * Early regions are important only during initialisation. The list
> + * of early regions is built from the "cma" command line argument or
> + * platform defaults. Platform initialisation code is responsible for
> + * reserving space for unreserved regions that are placed on
> + * cma_early_regions list.
> + *
> + * Later, during CMA initialisation all reserved regions from the
> + * cma_early_regions list are registered as normal regions and can be
> + * used using standard mechanisms.
> + */
> +struct cma_region {
> + const char *name;
> + dma_addr_t start;
> + size_t size;
> + union {
> + size_t free_space; /* Normal region */
> + dma_addr_t alignment; /* Early region */
> + };
> +
> + struct cma_allocator *alloc;
> + const char *alloc_name;
> + void *private_data;
> +
> + unsigned users;
> + struct list_head list;
> +
> + unsigned used:1;
> + unsigned registered:1;
> + unsigned reserved:1;
> + unsigned copy_name:1;
> + unsigned free_alloc_name:1;
> +};
> +
> +
> +/**
> + * cma_region_register() - registers a region.
> + * @reg: Region to region.
> + *
> + * Region's start and size must be set.
> + *
> + * If name is set the region will be accessible using normal mechanism
> + * like mapping or cma_alloc_from() function otherwise it will be
> + * a private region and accessible only using the
> + * cma_alloc_from_region() function.
> + *
> + * If alloc is set function will try to initialise given allocator
> + * (and will return error if it failes). Otherwise alloc_name may
> + * point to a name of an allocator to use (if not set, the default
> + * will be used).
> + *
> + * All other fields are ignored and/or overwritten.
> + *
> + * Returns zero or negative error. In particular, -EADDRINUSE if
> + * region overlap with already existing region.
> + */
> +int __must_check cma_region_register(struct cma_region *reg);
> +
> +/**
> + * cma_region_unregister() - unregisters a region.
> + * @reg: Region to unregister.
> + *
> + * Region is unregistered only if there are no chunks allocated for
> + * it. Otherwise, function returns -EBUSY.
> + *
> + * On success returs zero.
> + */
> +int __must_check cma_region_unregister(struct cma_region *reg);
> +
> +
> +/**
> + * cma_alloc_from_region() - allocates contiguous chunk of memory from region.
> + * @reg: Region to allocate chunk from.
> + * @size: Size of the memory to allocate in bytes.
> + * @alignment: Desired alignment in bytes. Must be a power of two or
> + * zero. If alignment is less then a page size it will be
> + * set to page size. If unsure, pass zero here.
> + *
> + * On error returns a negative error cast to dma_addr_t. Use
> + * IS_ERR_VALUE() to check if returned value is indeed an error.
> + * Otherwise physical address of the chunk is returned.
> + */
> +dma_addr_t __must_check
> +cma_alloc_from_region(struct cma_region *reg,
> + size_t size, dma_addr_t alignment);
> +
> +
> +
> +/****************************** Allocators API ******************************/
> +
> +/**
> + * struct cma_chunk - an allocated contiguous chunk of memory.
> + * @start: Physical address in bytes.
> + * @size: Size in bytes.
> + * @free_space: Free space in region in bytes. Read only.
> + * @reg: Region this chunk belongs to.
> + * @by_start: A node in an red-black tree with all chunks sorted by
> + * start address.
> + *
> + * The cma_allocator::alloc() operation need to set only the @start
> + * and @size fields. The rest is handled by the caller (ie. CMA
> + * glue).
> + */
> +struct cma_chunk {
> + dma_addr_t start;
> + size_t size;
> +
> + struct cma_region *reg;
> + struct rb_node by_start;
> +};
> +

Is there any scope for reusing parts of kernel/resource.c? Frankly, I
didn't look at your requirements closely enough or at kernel/resource.c
capabilities but at a glance, there appears to be some commonality.

Minimally, if there is a good reason to *not* use resource.c, it should
be in the changelog or I guarantee that in 3 months time, someone else
will ask you exactly the same question :)

> +
> +/**
> + * struct cma_allocator - a CMA allocator.
> + * @name: Allocator's unique name
> + * @init: Initialises an allocator on given region.
> + * @cleanup: Cleans up after init. May assume that there are no chunks
> + * allocated in given region.
> + * @alloc: Allocates a chunk of memory of given size in bytes and
> + * with given alignment. Alignment is a power of
> + * two (thus non-zero) and callback does not need to check it.
> + * May also assume that it is the only call that uses given
> + * region (ie. access to the region is synchronised with
> + * a mutex). This has to allocate the chunk object (it may be
> + * contained in a bigger structure with allocator-specific data.
> + * Required.
> + * @free: Frees allocated chunk. May also assume that it is the only
> + * call that uses given region. This has to free() the chunk
> + * object as well. Required.
> + * @list: Entry in list of allocators. Private.
> + */
> + /* * @users: How many regions use this allocator. Private. */
> +struct cma_allocator {
> + const char *name;
> +
> + int (*init)(struct cma_region *reg);
> + void (*cleanup)(struct cma_region *reg);
> + struct cma_chunk *(*alloc)(struct cma_region *reg, size_t size,
> + dma_addr_t alignment);
> + void (*free)(struct cma_chunk *chunk);
> +
> + /* unsigned users; */
> + struct list_head list;
> +};
> +

Again, with some jiggery pokery I think you could make the guts of the page
allocator the default cma_allocator.

> +
> +/**
> + * cma_allocator_register() - Registers an allocator.
> + * @alloc: Allocator to register.
> + *
> + * Adds allocator to the list of allocators managed by CMA.
> + *
> + * All of the fields of cma_allocator structure must be set except for
> + * optional name and users and list which will be overriden.
> + *
> + * Returns zero or negative error code.
> + */
> +int cma_allocator_register(struct cma_allocator *alloc);
> +
> +
> +/**************************** Initialisation API ****************************/
> +

As an aside, it does not seem necessary to have everything CMA related
in the same header. Maybe split it out to minimise the risk of drivers
abusing the layers. Up to you really, I don't feel very strongly on
header layout.

> +/**
> + * cma_set_defaults() - specifies default command line parameters.
> + * @regions: A zero-sized entry terminated list of early regions.
> + * This array must not be placed in __initdata section.
> + * @map: Map attribute.
> + *
> + * This function should be called prior to cma_early_regions_reserve()
> + * and after early parameters have been parsed.
> + *
> + * Returns zero or negative error.
> + */
> +int __init cma_set_defaults(struct cma_region *regions, const char *map);
> +
> +
> +/**
> + * cma_early_regions - a list of early regions.
> + *
> + * Platform needs to allocate space for each of the region before
> + * initcalls are executed. If space is reserved, the reserved flag
> + * must be set. Platform initialisation code may choose to use
> + * cma_early_regions_allocate().
> + *
> + * Later, during CMA initialisation all reserved regions from the
> + * cma_early_regions list are registered as normal regions and can be
> + * used using standard mechanisms.
> + */
> +extern struct list_head cma_early_regions __initdata;
> +
> +
> +/**
> + * cma_early_region_register() - registers an early region.
> + * @reg: Region to add.
> + *
> + * Region's start, size and alignment must be set.
> + *
> + * If name is set the region will be accessible using normal mechanism
> + * like mapping or cma_alloc_from() function otherwise it will be
> + * a private region accessible only using the cma_alloc_from_region().
> + *
> + * If alloc is set function will try to initialise given allocator
> + * when the early region is "converted" to normal region and
> + * registered during CMA initialisation. If this failes, the space
> + * will still be reserved but the region won't be registered.
> + *
> + * As usually, alloc_name may point to a name of an allocator to use
> + * (if both alloc and alloc_name aret set, the default will be used).
> + *
> + * All other fields are ignored and/or overwritten.
> + *
> + * Returns zero or negative error. No checking if regions overlap is
> + * performed.
> + */
> +int __init __must_check cma_early_region_register(struct cma_region *reg);
> +
> +
> +/**
> + * cma_early_region_reserve() - reserves a physically contiguous memory region.
> + * @reg: Early region to reserve memory for.
> + *
> + * If platform supports bootmem this is the first allocator this
> + * function tries to use. If that failes (or bootmem is not
> + * supported) function tries to use memblec if it is available.
> + *
> + * On success sets reg->reserved flag.
> + *
> + * Returns zero or negative error.
> + */
> +int __init cma_early_region_reserve(struct cma_region *reg);
> +
> +/**
> + * cma_early_regions_reserver() - helper function for reserving early regions.
> + * @reserve: Callbac function used to reserve space for region. Needs
> + * to return non-negative if allocation succeeded, negative
> + * error otherwise. NULL means cma_early_region_alloc() will
> + * be used.
> + *
> + * This function traverses the %cma_early_regions list and tries to
> + * reserve memory for each early region. It uses the @reserve
> + * callback function for that purpose. The reserved flag of each
> + * region is updated accordingly.
> + */
> +void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg));
> +
> +#else
> +
> +#define cma_defaults(regions, map) ((int)0)
> +#define cma_early_regions_reserve(reserve) do { } while (0)
> +
> +#endif
> +
> +#endif
> diff --git a/mm/Kconfig b/mm/Kconfig
> index f4e516e..3e9317c 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -301,3 +301,37 @@ config NOMMU_INITIAL_TRIM_EXCESS
> of 1 says that all excess pages should be trimmed.
>
> See Documentation/nommu-mmap.txt for more information.
> +
> +
> +config CMA
> + bool "Contiguous Memory Allocator framework"
> + # Currently there is only one allocator so force it on
> + select CMA_BEST_FIT

and hopefully there will be a CMA_CORE_PGALLOC in the future. The
principal advantage again of such a move is that the pages would be
usable for normal allocations while the device is inactive.

> + help
> + This enables the Contiguous Memory Allocator framework which
> + allows drivers to allocate big physically-contiguous blocks of
> + memory for use with hardware components that do not support I/O
> + map nor scatter-gather.
> +
> + If you select this option you will also have to select at least
> + one allocator algorithm below.
> +
> + To make use of CMA you need to specify the regions and
> + driver->region mapping on command line when booting the kernel.
> +
> +config CMA_DEBUG
> + bool "CMA debug messages (DEVELOPEMENT)"
> + depends on CMA
> + help
> + Enable debug messages in CMA code.
> +
> +config CMA_BEST_FIT
> + bool "CMA best-fit allocator"
> + depends on CMA
> + default y

You don't need to default y this if CMA is selecting it, right?

also CMA should default n.

> + help
> + This is a best-fit algorithm running in O(n log n) time where
> + n is the number of existing holes (which is never greater then
> + the number of allocated regions and usually much smaller). It
> + allocates area from the smallest hole that is big enough for
> + allocation in question.
> diff --git a/mm/Makefile b/mm/Makefile
> index 34b2546..d8c717f 100644
> --- a/mm/Makefile
> +++ b/mm/Makefile
> @@ -47,3 +47,5 @@ obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o
> obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
> obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
> obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
> +obj-$(CONFIG_CMA) += cma.o
> +obj-$(CONFIG_CMA_BEST_FIT) += cma-best-fit.o
> diff --git a/mm/cma-best-fit.c b/mm/cma-best-fit.c
> new file mode 100644
> index 0000000..97f8d61
> --- /dev/null
> +++ b/mm/cma-best-fit.c
> @@ -0,0 +1,407 @@
> +/*
> + * Contiguous Memory Allocator framework: Best Fit allocator
> + * Copyright (c) 2010 by Samsung Electronics.
> + * Written by Michal Nazarewicz ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your optional) any later version of the license.
> + */
> +
> +#define pr_fmt(fmt) "cma: bf: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +# define DEBUG
> +#endif
> +
> +#include <linux/errno.h> /* Error numbers */
> +#include <linux/slab.h> /* kmalloc() */
> +
> +#include <linux/cma.h> /* CMA structures */
> +
> +
> +/************************* Data Types *************************/
> +
> +struct cma_bf_item {
> + struct cma_chunk ch;
> + struct rb_node by_size;
> +};
> +
> +struct cma_bf_private {
> + struct rb_root by_start_root;
> + struct rb_root by_size_root;
> +};
> +
> +
> +/************************* Prototypes *************************/
> +
> +/*
> + * Those are only for holes. They must be called whenever hole's
> + * properties change but also whenever chunk becomes a hole or hole
> + * becames a chunk.
> + */
> +static void __cma_bf_hole_insert_by_size(struct cma_bf_item *item);
> +static void __cma_bf_hole_erase_by_size(struct cma_bf_item *item);
> +static int __must_check
> +__cma_bf_hole_insert_by_start(struct cma_bf_item *item);
> +static void __cma_bf_hole_erase_by_start(struct cma_bf_item *item);
> +
> +/**
> + * __cma_bf_hole_take - takes a chunk of memory out of a hole.
> + * @hole: hole to take chunk from
> + * @size: chunk's size
> + * @alignment: chunk's starting address alignment (must be power of two)
> + *
> + * Takes a @size bytes large chunk from hole @hole which must be able
> + * to hold the chunk. The "must be able" includes also alignment
> + * constraint.
> + *
> + * Returns allocated item or NULL on error (if kmalloc() failed).
> + */
> +static struct cma_bf_item *__must_check
> +__cma_bf_hole_take(struct cma_bf_item *hole, size_t size, dma_addr_t alignment);
> +
> +/**
> + * __cma_bf_hole_merge_maybe - tries to merge hole with neighbours.
> + * @item: hole to try and merge
> + *
> + * Which items are preserved is undefined so you may not rely on it.
> + */
> +static void __cma_bf_hole_merge_maybe(struct cma_bf_item *item);
> +
> +
> +/************************* Device API *************************/
> +
> +int cma_bf_init(struct cma_region *reg)
> +{
> + struct cma_bf_private *prv;
> + struct cma_bf_item *item;
> +
> + prv = kzalloc(sizeof *prv, GFP_KERNEL);
> + if (unlikely(!prv))
> + return -ENOMEM;
> +
> + item = kzalloc(sizeof *item, GFP_KERNEL);
> + if (unlikely(!item)) {
> + kfree(prv);
> + return -ENOMEM;
> + }
> +
> + item->ch.start = reg->start;
> + item->ch.size = reg->size;
> + item->ch.reg = reg;
> +
> + rb_root_init(&prv->by_start_root, &item->ch.by_start);
> + rb_root_init(&prv->by_size_root, &item->by_size);
> +
> + reg->private_data = prv;
> + return 0;
> +}
> +
> +void cma_bf_cleanup(struct cma_region *reg)
> +{
> + struct cma_bf_private *prv = reg->private_data;
> + struct cma_bf_item *item =
> + rb_entry(prv->by_size_root.rb_node,
> + struct cma_bf_item, by_size);
> +
> + /* We can assume there is only a single hole in the tree. */
> + WARN_ON(item->by_size.rb_left || item->by_size.rb_right ||
> + item->ch.by_start.rb_left || item->ch.by_start.rb_right);
> +
> + kfree(item);
> + kfree(prv);
> +}
> +
> +struct cma_chunk *cma_bf_alloc(struct cma_region *reg,
> + size_t size, dma_addr_t alignment)
> +{
> + struct cma_bf_private *prv = reg->private_data;
> + struct rb_node *node = prv->by_size_root.rb_node;
> + struct cma_bf_item *item = NULL;
> +
> + /* First find hole that is large enough */
> + while (node) {
> + struct cma_bf_item *i =
> + rb_entry(node, struct cma_bf_item, by_size);
> +
> + if (i->ch.size < size) {
> + node = node->rb_right;
> + } else if (i->ch.size >= size) {
> + node = node->rb_left;
> + item = i;
> + }
> + }
> + if (!item)
> + return NULL;
> +
> + /* Now look for items which can satisfy alignment requirements */
> + for (;;) {
> + dma_addr_t start = ALIGN(item->ch.start, alignment);
> + dma_addr_t end = item->ch.start + item->ch.size;
> + if (start < end && end - start >= size) {
> + item = __cma_bf_hole_take(item, size, alignment);
> + return likely(item) ? &item->ch : NULL;
> + }
> +
> + node = rb_next(node);
> + if (!node)
> + return NULL;
> +
> + item = rb_entry(node, struct cma_bf_item, by_size);
> + }
> +}
> +
> +void cma_bf_free(struct cma_chunk *chunk)
> +{
> + struct cma_bf_item *item = container_of(chunk, struct cma_bf_item, ch);
> +
> + /* Add new hole */
> + if (unlikely(__cma_bf_hole_insert_by_start(item))) {
> + /*
> + * We're screwed... Just free the item and forget
> + * about it. Things are broken beyond repair so no
> + * sense in trying to recover.
> + */
> + kfree(item);
> + } else {
> + __cma_bf_hole_insert_by_size(item);
> +
> + /* Merge with prev and next sibling */
> + __cma_bf_hole_merge_maybe(item);
> + }
> +}
> +
> +
> +/************************* Basic Tree Manipulation *************************/
> +
> +static void __cma_bf_hole_insert_by_size(struct cma_bf_item *item)
> +{
> + struct cma_bf_private *prv = item->ch.reg->private_data;
> + struct rb_node **link = &prv->by_size_root.rb_node, *parent = NULL;
> + const typeof(item->ch.size) value = item->ch.size;
> +
> + while (*link) {
> + struct cma_bf_item *i;
> + parent = *link;
> + i = rb_entry(parent, struct cma_bf_item, by_size);
> + link = value <= i->ch.size
> + ? &parent->rb_left
> + : &parent->rb_right;
> + }
> +
> + rb_link_node(&item->by_size, parent, link);
> + rb_insert_color(&item->by_size, &prv->by_size_root);
> +}
> +
> +static void __cma_bf_hole_erase_by_size(struct cma_bf_item *item)
> +{
> + struct cma_bf_private *prv = item->ch.reg->private_data;
> + rb_erase(&item->by_size, &prv->by_size_root);
> +}
> +
> +static int __must_check
> +__cma_bf_hole_insert_by_start(struct cma_bf_item *item)
> +{
> + struct cma_bf_private *prv = item->ch.reg->private_data;
> + struct rb_node **link = &prv->by_start_root.rb_node, *parent = NULL;
> + const typeof(item->ch.start) value = item->ch.start;
> +
> + while (*link) {
> + struct cma_bf_item *i;
> + parent = *link;
> + i = rb_entry(parent, struct cma_bf_item, ch.by_start);
> +
> + if (WARN_ON(value == i->ch.start))
> + /*
> + * This should *never* happen. And I mean
> + * *never*. We could even BUG on it but
> + * hopefully things are only a bit broken,
> + * ie. system can still run. We produce
> + * a warning and return an error.
> + */
> + return -EBUSY;
> +
> + link = value <= i->ch.start
> + ? &parent->rb_left
> + : &parent->rb_right;
> + }
> +
> + rb_link_node(&item->ch.by_start, parent, link);
> + rb_insert_color(&item->ch.by_start, &prv->by_start_root);
> + return 0;
> +}
> +
> +static void __cma_bf_hole_erase_by_start(struct cma_bf_item *item)
> +{
> + struct cma_bf_private *prv = item->ch.reg->private_data;
> + rb_erase(&item->ch.by_start, &prv->by_start_root);
> +}
> +
> +
> +/************************* More Tree Manipulation *************************/
> +
> +static struct cma_bf_item *__must_check
> +__cma_bf_hole_take(struct cma_bf_item *hole, size_t size, size_t alignment)
> +{
> + struct cma_bf_item *item;
> +
> + /*
> + * There are three cases:
> + * 1. the chunk takes the whole hole,
> + * 2. the chunk is at the beginning or at the end of the hole, or
> + * 3. the chunk is in the middle of the hole.
> + */
> +
> +
> + /* Case 1, the whole hole */
> + if (size == hole->ch.size) {
> + __cma_bf_hole_erase_by_size(hole);
> + __cma_bf_hole_erase_by_start(hole);
> + return hole;
> + }
> +
> +
> + /* Allocate */
> + item = kmalloc(sizeof *item, GFP_KERNEL);
> + if (unlikely(!item))
> + return NULL;
> +
> + item->ch.start = ALIGN(hole->ch.start, alignment);
> + item->ch.size = size;
> +
> + /* Case 3, in the middle */
> + if (item->ch.start != hole->ch.start
> + && item->ch.start + item->ch.size !=
> + hole->ch.start + hole->ch.size) {
> + struct cma_bf_item *tail;
> +
> + /*
> + * Space between the end of the chunk and the end of
> + * the region, ie. space left after the end of the
> + * chunk. If this is dividable by alignment we can
> + * move the chunk to the end of the hole.
> + */
> + size_t left =
> + hole->ch.start + hole->ch.size -
> + (item->ch.start + item->ch.size);
> + if (left % alignment == 0) {
> + item->ch.start += left;
> + goto case_2;
> + }
> +
> + /*
> + * We are going to add a hole at the end. This way,
> + * we will reduce the problem to case 2 -- the chunk
> + * will be at the end of the hole.
> + */
> + tail = kmalloc(sizeof *tail, GFP_KERNEL);
> + if (unlikely(!tail)) {
> + kfree(item);
> + return NULL;
> + }
> +
> + tail->ch.start = item->ch.start + item->ch.size;
> + tail->ch.size =
> + hole->ch.start + hole->ch.size - tail->ch.start;
> + tail->ch.reg = hole->ch.reg;
> +
> + if (unlikely(__cma_bf_hole_insert_by_start(tail))) {
> + /*
> + * Things are broken beyond repair... Abort
> + * inserting the hole but still continue with
> + * allocation (seems like the best we can do).
> + */
> +
> + hole->ch.size = tail->ch.start - hole->ch.start;
> + kfree(tail);
> + } else {
> + __cma_bf_hole_insert_by_size(tail);
> + /*
> + * It's important that we first insert the new
> + * hole in the tree sorted by size and later
> + * reduce the size of the old hole. We will
> + * update the position of the old hole in the
> + * rb tree in code that handles case 2.
> + */
> + hole->ch.size = tail->ch.start - hole->ch.start;
> + }
> +
> + /* Go to case 2 */
> + }
> +
> +
> + /* Case 2, at the beginning or at the end */
> +case_2:
> + /* No need to update the tree; order preserved. */
> + if (item->ch.start == hole->ch.start)
> + hole->ch.start += item->ch.size;
> +
> + /* Alter hole's size */
> + hole->ch.size -= size;
> + __cma_bf_hole_erase_by_size(hole);
> + __cma_bf_hole_insert_by_size(hole);
> +
> + return item;
> +}
> +
> +
> +static void __cma_bf_hole_merge_maybe(struct cma_bf_item *item)
> +{
> + struct cma_bf_item *prev;
> + struct rb_node *node;
> + int twice = 2;
> +
> + node = rb_prev(&item->ch.by_start);
> + if (unlikely(!node))
> + goto next;
> + prev = rb_entry(node, struct cma_bf_item, ch.by_start);
> +
> + for (;;) {
> + if (prev->ch.start + prev->ch.size == item->ch.start) {
> + /* Remove previous hole from trees */
> + __cma_bf_hole_erase_by_size(prev);
> + __cma_bf_hole_erase_by_start(prev);
> +
> + /* Alter this hole */
> + item->ch.size += prev->ch.size;
> + item->ch.start = prev->ch.start;
> + __cma_bf_hole_erase_by_size(item);
> + __cma_bf_hole_insert_by_size(item);
> + /*
> + * No need to update by start trees as we do
> + * not break sequence order
> + */
> +
> + /* Free prev hole */
> + kfree(prev);
> + }
> +
> +next:
> + if (!--twice)
> + break;
> +
> + node = rb_next(&item->ch.by_start);
> + if (unlikely(!node))
> + break;
> + prev = item;
> + item = rb_entry(node, struct cma_bf_item, ch.by_start);
> + }
> +}
> +

I confess I didn't review this part at all. I'm only commenting on how
you might better integrate with the core allocator in the future.

> +
> +
> +/************************* Register *************************/
> +static int cma_bf_module_init(void)
> +{
> + static struct cma_allocator alloc = {
> + .name = "bf",
> + .init = cma_bf_init,
> + .cleanup = cma_bf_cleanup,
> + .alloc = cma_bf_alloc,
> + .free = cma_bf_free,
> + };
> + return cma_allocator_register(&alloc);
> +}
> +module_init(cma_bf_module_init);
> diff --git a/mm/cma.c b/mm/cma.c
> new file mode 100644
> index 0000000..401399c
> --- /dev/null
> +++ b/mm/cma.c
> @@ -0,0 +1,910 @@
> +/*
> + * Contiguous Memory Allocator framework
> + * Copyright (c) 2010 by Samsung Electronics.
> + * Written by Michal Nazarewicz ([email protected])
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation; either version 2 of the
> + * License or (at your optional) any later version of the license.

I'm not certain about the "any later version" part of this license and
how it applies to kernel code but I'm no licensing guru. I know we have
duel licensing elsewhere for BSD but someone should double check this
license is ok.

> + */
> +
> +/*
> + * See Documentation/contiguous-memory.txt for details.
> + */
> +
> +#define pr_fmt(fmt) "cma: " fmt
> +
> +#ifdef CONFIG_CMA_DEBUG
> +# define DEBUG
> +#endif
> +
> +#ifndef CONFIG_NO_BOOTMEM
> +# include <linux/bootmem.h> /* alloc_bootmem_pages_nopanic() */
> +#endif
> +#ifdef CONFIG_HAVE_MEMBLOCK
> +# include <linux/memblock.h> /* memblock*() */
> +#endif
> +#include <linux/device.h> /* struct device, dev_name() */
> +#include <linux/errno.h> /* Error numbers */
> +#include <linux/err.h> /* IS_ERR, PTR_ERR, etc. */
> +#include <linux/mm.h> /* PAGE_ALIGN() */
> +#include <linux/module.h> /* EXPORT_SYMBOL_GPL() */
> +#include <linux/mutex.h> /* mutex */
> +#include <linux/slab.h> /* kmalloc() */
> +#include <linux/string.h> /* str*() */
> +
> +#include <linux/cma.h>
> +
> +
> +/*
> + * Protects cma_regions, cma_allocators, cma_map, cma_map_length, and
> + * cma_chunks_by_start.
> + */
> +static DEFINE_MUTEX(cma_mutex);
> +
> +
> +
> +/************************* Map attribute *************************/
> +
> +static const char *cma_map;
> +static size_t cma_map_length;
> +
> +/*
> + * map-attr ::= [ rules [ ';' ] ]
> + * rules ::= rule [ ';' rules ]
> + * rule ::= patterns '=' regions
> + * patterns ::= pattern [ ',' patterns ]
> + * regions ::= REG-NAME [ ',' regions ]
> + * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
> + *
> + * See Documentation/contiguous-memory.txt for details.
> + */
> +static ssize_t cma_map_validate(const char *param)
> +{
> + const char *ch = param;
> +
> + if (*ch == '\0' || *ch == '\n')
> + return 0;
> +
> + for (;;) {
> + const char *start = ch;
> +
> + while (*ch && *ch != '\n' && *ch != ';' && *ch != '=')
> + ++ch;
> +
> + if (*ch != '=' || start == ch) {
> + pr_err("map: expecting \"<patterns>=<regions>\" near %s\n",
> + start);
> + return -EINVAL;
> + }
> +
> + while (*++ch != ';')
> + if (*ch == '\0' || *ch == '\n')
> + return ch - param;
> + if (ch[1] == '\0' || ch[1] == '\n')
> + return ch - param;
> + ++ch;
> + }
> +}
> +
> +static int __init cma_map_param(char *param)
> +{
> + ssize_t len;
> +
> + pr_debug("param: map: %s\n", param);
> +
> + len = cma_map_validate(param);
> + if (len < 0)
> + return len;
> +
> + cma_map = param;
> + cma_map_length = len;
> + return 0;
> +}
> +
> +
> +
> +/************************* Early regions *************************/
> +
> +struct list_head cma_early_regions __initdata =
> + LIST_HEAD_INIT(cma_early_regions);
> +
> +
> +int __init __must_check cma_early_region_register(struct cma_region *reg)
> +{
> + dma_addr_t start, alignment;
> + size_t size;
> +
> + if (reg->alignment & (reg->alignment - 1))
> + return -EINVAL;
> +
> + alignment = max(reg->alignment, (dma_addr_t)PAGE_SIZE);
> + start = ALIGN(reg->start, alignment);
> + size = PAGE_ALIGN(reg->size);
> +
> + if (start + size < start)
> + return -EINVAL;
> +
> + reg->size = size;
> + reg->start = start;
> + reg->alignment = alignment;
> +
> + list_add_tail(&reg->list, &cma_early_regions);
> +
> + pr_debug("param: registering early region %s (%p@%p/%p)\n",
> + reg->name, (void *)reg->size, (void *)reg->start,
> + (void *)reg->alignment);
> +
> + return 0;
> +}
> +
> +
> +
> +/************************* Regions & Allocators *************************/
> +
> +static int __cma_region_attach_alloc(struct cma_region *reg);
> +
> +/* List of all regions. Named regions are kept before unnamed. */
> +static LIST_HEAD(cma_regions);
> +
> +#define cma_foreach_region(reg) \
> + list_for_each_entry(reg, &cma_regions, list)
> +
> +int __must_check cma_region_register(struct cma_region *reg)
> +{
> + const char *name, *alloc_name;
> + struct cma_region *r;
> + char *ch = NULL;
> + int ret = 0;
> +
> + if (!reg->size || reg->start + reg->size < reg->start)
> + return -EINVAL;
> +
> + reg->users = 0;
> + reg->used = 0;
> + reg->private_data = NULL;
> + reg->registered = 0;
> + reg->free_space = reg->size;
> +
> + /* Copy name and alloc_name */
> + name = reg->name;
> + alloc_name = reg->alloc_name;
> + if (reg->copy_name && (reg->name || reg->alloc_name)) {
> + size_t name_size, alloc_size;
> +
> + name_size = reg->name ? strlen(reg->name) + 1 : 0;
> + alloc_size = reg->alloc_name ? strlen(reg->alloc_name) + 1 : 0;
> +
> + ch = kmalloc(name_size + alloc_size, GFP_KERNEL);
> + if (!ch) {
> + pr_err("%s: not enough memory to allocate name\n",
> + reg->name ?: "(private)");
> + return -ENOMEM;
> + }
> +
> + if (name_size) {
> + memcpy(ch, reg->name, name_size);
> + name = ch;
> + ch += name_size;
> + }
> +
> + if (alloc_size) {
> + memcpy(ch, reg->alloc_name, alloc_size);
> + alloc_name = ch;
> + }
> + }
> +
> + mutex_lock(&cma_mutex);
> +
> + /* Don't let regions overlap */
> + cma_foreach_region(r)
> + if (r->start + r->size > reg->start &&
> + r->start < reg->start + reg->size) {
> + ret = -EADDRINUSE;
> + goto done;
> + }
> +
> + if (reg->alloc) {
> + ret = __cma_region_attach_alloc(reg);
> + if (unlikely(ret < 0))
> + goto done;
> + }
> +
> + reg->name = name;
> + reg->alloc_name = alloc_name;
> + reg->registered = 1;
> + ch = NULL;
> +
> + /*
> + * Keep named at the beginning and unnamed (private) at the
> + * end. This helps in traversal when named region is looked
> + * for.
> + */
> + if (name)
> + list_add(&reg->list, &cma_regions);
> + else
> + list_add_tail(&reg->list, &cma_regions);
> +
> +done:
> + mutex_unlock(&cma_mutex);
> +
> + pr_debug("%s: region %sregistered\n",
> + reg->name ?: "(private)", ret ? "not " : "");
> + kfree(ch);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(cma_region_register);
> +
> +static struct cma_region *__must_check
> +__cma_region_find(const char **namep)
> +{
> + struct cma_region *reg;
> + const char *ch, *name;
> + size_t n;
> +
> + for (ch = *namep; *ch && *ch != ',' && *ch != ';'; ++ch)
> + /* nop */;
> + name = *namep;
> + *namep = *ch == ',' ? ch : (ch + 1);
> + n = ch - name;
> +
> + /*
> + * Named regions are kept in front of unnamed so if we
> + * encounter unnamed region we can stop.
> + */
> + cma_foreach_region(reg)
> + if (!reg->name)
> + break;
> + else if (!strncmp(name, reg->name, n) && !reg->name[n])
> + return reg;
> +
> + return NULL;
> +}
> +
> +
> +/* List of all allocators. */
> +static LIST_HEAD(cma_allocators);
> +
> +#define cma_foreach_allocator(alloc) \
> + list_for_each_entry(alloc, &cma_allocators, list)
> +
> +int cma_allocator_register(struct cma_allocator *alloc)
> +{
> + struct cma_region *reg;
> + int first;
> +
> + if (!alloc->alloc || !alloc->free)
> + return -EINVAL;
> +
> + /* alloc->users = 0; */
> +

Odd comment.

> + mutex_lock(&cma_mutex);
> +
> + first = list_empty(&cma_allocators);
> +
> + list_add_tail(&alloc->list, &cma_allocators);
> +
> + /*
> + * Attach this allocator to all allocator-less regions that
> + * request this particular allocator (reg->alloc_name equals
> + * alloc->name) or if region wants the first available
> + * allocator and we are the first.
> + */
> + cma_foreach_region(reg) {
> + if (reg->alloc)
> + continue;
> + if (reg->alloc_name
> + ? alloc->name && !strcmp(alloc->name, reg->alloc_name)
> + : (!reg->used && first))
> + continue;
> +
> + reg->alloc = alloc;
> + __cma_region_attach_alloc(reg);
> + }
> +
> + mutex_unlock(&cma_mutex);
> +
> + pr_debug("%s: allocator registered\n", alloc->name ?: "(unnamed)");
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(cma_allocator_register);
> +
> +static struct cma_allocator *__must_check
> +__cma_allocator_find(const char *name)
> +{
> + struct cma_allocator *alloc;
> +
> + if (!name)
> + return list_empty(&cma_allocators)
> + ? NULL
> + : list_entry(cma_allocators.next,
> + struct cma_allocator, list);
> +
> + cma_foreach_allocator(alloc)
> + if (alloc->name && !strcmp(name, alloc->name))
> + return alloc;
> +
> + return NULL;
> +}
> +
> +
> +
> +/************************* Initialise CMA *************************/
> +
> +int __init cma_set_defaults(struct cma_region *regions, const char *map)
> +{
> + if (map) {
> + int ret = cma_map_param((char *)map);
> + if (unlikely(ret < 0))
> + return ret;
> + }
> +
> + if (!regions)
> + return 0;
> +
> + for (; regions->size; ++regions) {
> + int ret = cma_early_region_register(regions);
> + if (unlikely(ret < 0))
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +
> +int __init cma_early_region_reserve(struct cma_region *reg)
> +{
> + int tried = 0;
> +
> + if (!reg->size || (reg->alignment & (reg->alignment - 1)) ||
> + reg->reserved)
> + return -EINVAL;
> +
> +#ifndef CONFIG_NO_BOOTMEM
> +
> + tried = 1;
> +
> + {
> + void *ptr = __alloc_bootmem_nopanic(reg->size, reg->alignment,
> + reg->start);
> + if (ptr) {
> + reg->start = virt_to_phys(ptr);
> + reg->reserved = 1;
> + return 0;
> + }
> + }
> +
> +#endif
> +
> +#ifdef CONFIG_HAVE_MEMBLOCK
> +
> + tried = 1;
> +
> + if (reg->start) {
> + if (memblock_is_region_reserved(reg->start, reg->size) < 0 &&
> + memblock_reserve(reg->start, reg->size) >= 0) {
> + reg->reserved = 1;
> + return 0;
> + }
> + } else {
> + /*
> + * Use __memblock_alloc_base() since
> + * memblock_alloc_base() panic()s.
> + */
> + u64 ret = __memblock_alloc_base(reg->size, reg->alignment, 0);
> + if (ret &&
> + ret < ~(dma_addr_t)0 &&
> + ret + reg->size < ~(dma_addr_t)0 &&
> + ret + reg->size > ret) {
> + reg->start = ret;
> + reg->reserved = 1;
> + return 0;
> + }
> +
> + if (ret)
> + memblock_free(ret, reg->size);
> + }
> +
> +#endif
> +
> + return tried ? -ENOMEM : -EOPNOTSUPP;
> +}
> +
> +void __init cma_early_regions_reserve(int (*reserve)(struct cma_region *reg))
> +{
> + struct cma_region *reg;
> +
> + pr_debug("init: reserving early regions\n");
> +
> + if (!reserve)
> + reserve = cma_early_region_reserve;
> +
> + list_for_each_entry(reg, &cma_early_regions, list) {
> + if (reg->reserved) {
> + /* nothing */
> + } else if (reserve(reg) >= 0) {
> + pr_debug("init: %s: reserved %p@%p\n",
> + reg->name ?: "(private)",
> + (void *)reg->size, (void *)reg->start);
> + reg->reserved = 1;
> + } else {
> + pr_warn("init: %s: unable to reserve %p@%p/%p\n",
> + reg->name ?: "(private)",
> + (void *)reg->size, (void *)reg->start,
> + (void *)reg->alignment);
> + }
> + }
> +}
> +
> +
> +static int __init cma_init(void)
> +{
> + struct cma_region *reg, *n;
> +
> + pr_debug("init: initialising\n");
> +
> + if (cma_map) {
> + char *val = kmemdup(cma_map, cma_map_length + 1, GFP_KERNEL);
> + cma_map = val;
> + if (!val)
> + return -ENOMEM;
> + val[cma_map_length] = '\0';
> + }
> +
> + list_for_each_entry_safe(reg, n, &cma_early_regions, list) {
> + INIT_LIST_HEAD(&reg->list);
> + /*
> + * We don't care if there was an error. It's a pity
> + * but there's not much we can do about it any way.
> + * If the error is on a region that was parsed from
> + * command line then it will stay and waste a bit of
> + * space; if it was registered using
> + * cma_early_region_register() it's caller's
> + * responsibility to do something about it.
> + */
> + if (reg->reserved && cma_region_register(reg) < 0)
> + /* ignore error */;
> + }
> +
> + INIT_LIST_HEAD(&cma_early_regions);
> +
> + return 0;
> +}
> +/*
> + * We want to be initialised earlier than module_init/__initcall so
> + * that drivers that want to grab memory at boot time will get CMA
> + * ready. subsys_initcall() seems early enough and not too early at
> + * the same time.
> + */
> +subsys_initcall(cma_init);
> +
> +
> +
> +/************************* Chunks *************************/
> +
> +/* All chunks sorted by start address. */
> +static struct rb_root cma_chunks_by_start;
> +
> +static struct cma_chunk *__must_check __cma_chunk_find(dma_addr_t addr)
> +{
> + struct cma_chunk *chunk;
> + struct rb_node *n;
> +
> + for (n = cma_chunks_by_start.rb_node; n; ) {
> + chunk = rb_entry(n, struct cma_chunk, by_start);
> + if (addr < chunk->start)
> + n = n->rb_left;
> + else if (addr > chunk->start)
> + n = n->rb_right;
> + else
> + return chunk;
> + }
> + WARN(1, KERN_WARNING "no chunk starting at %p\n", (void *)addr);
> + return NULL;
> +}
> +
> +static int __must_check __cma_chunk_insert(struct cma_chunk *chunk)
> +{
> + struct rb_node **new, *parent = NULL;
> + typeof(chunk->start) addr = chunk->start;
> +
> + for (new = &cma_chunks_by_start.rb_node; *new; ) {
> + struct cma_chunk *c =
> + container_of(*new, struct cma_chunk, by_start);
> +
> + parent = *new;
> + if (addr < c->start) {
> + new = &(*new)->rb_left;
> + } else if (addr > c->start) {
> + new = &(*new)->rb_right;
> + } else {
> + /*
> + * We should never be here. If we are it
> + * means allocator gave us an invalid chunk
> + * (one that has already been allocated) so we
> + * refuse to accept it. Our caller will
> + * recover by freeing the chunk.
> + */
> + WARN_ON(1);
> + return -EADDRINUSE;
> + }
> + }
> +
> + rb_link_node(&chunk->by_start, parent, new);
> + rb_insert_color(&chunk->by_start, &cma_chunks_by_start);
> +
> + return 0;
> +}
> +
> +static void __cma_chunk_free(struct cma_chunk *chunk)
> +{
> + rb_erase(&chunk->by_start, &cma_chunks_by_start);
> +
> + chunk->reg->alloc->free(chunk);
> + --chunk->reg->users;
> + chunk->reg->free_space += chunk->size;
> +}
> +

For the most part other than style issues, nothing horrible jumped out.
There is nothing "surprising" about the allocator or how it is
structured as such. At least, not at first glance :)

There are concepts it shares with a standard arena allocator and the managing
of buffer information is similar to how slab manages objects. There might
be some scope with getting closer to slab in the future but it would be
premature as a starting point.

I am curious about one thing though. Have you considered reusing the bootmem
allocator code to manage the regions instead of your custom stuff here? Instead
of the cma_regions core structures, you would associate cma_region with
a new bootmem_data_t, keep the bootmem code around and allocate using its
allocator. It's a bitmap allocator too and would be less code in the kernel?

> +
> +/************************* The Device API *************************/
> +
> +static const char *__must_check
> +__cma_where_from(const struct device *dev, const char *type);
> +
> +
> +/* Allocate. */
> +
> +static dma_addr_t __must_check
> +__cma_alloc_from_region(struct cma_region *reg,
> + size_t size, dma_addr_t alignment)
> +{
> + struct cma_chunk *chunk;
> +
> + pr_debug("allocate %p/%p from %s\n",
> + (void *)size, (void *)alignment,
> + reg ? reg->name ?: "(private)" : "(null)");
> +
> + if (!reg || reg->free_space < size)
> + return -ENOMEM;
> +
> + if (!reg->alloc) {
> + if (!reg->used)
> + __cma_region_attach_alloc(reg);
> + if (!reg->alloc)
> + return -ENOMEM;
> + }
> +
> + chunk = reg->alloc->alloc(reg, size, alignment);
> + if (!chunk)
> + return -ENOMEM;
> +
> + if (unlikely(__cma_chunk_insert(chunk) < 0)) {
> + /* We should *never* be here. */
> + chunk->reg->alloc->free(chunk);
> + kfree(chunk);
> + return -EADDRINUSE;
> + }
> +
> + chunk->reg = reg;
> + ++reg->users;
> + reg->free_space -= chunk->size;
> + pr_debug("allocated at %p\n", (void *)chunk->start);
> + return chunk->start;
> +}
> +
> +dma_addr_t __must_check
> +cma_alloc_from_region(struct cma_region *reg,
> + size_t size, dma_addr_t alignment)
> +{
> + dma_addr_t addr;
> +
> + pr_debug("allocate %p/%p from %s\n",
> + (void *)size, (void *)alignment,
> + reg ? reg->name ?: "(private)" : "(null)");
> +
> + if (!size || alignment & (alignment - 1) || !reg)
> + return -EINVAL;
> +
> + mutex_lock(&cma_mutex);
> +
> + addr = reg->registered ?
> + __cma_alloc_from_region(reg, PAGE_ALIGN(size),
> + max(alignment, (dma_addr_t)PAGE_SIZE)) :
> + -EINVAL;
> +
> + mutex_unlock(&cma_mutex);
> +
> + return addr;
> +}
> +EXPORT_SYMBOL_GPL(cma_alloc_from_region);
> +
> +dma_addr_t __must_check
> +__cma_alloc(const struct device *dev, const char *type,
> + dma_addr_t size, dma_addr_t alignment)
> +{
> + struct cma_region *reg;
> + const char *from;
> + dma_addr_t addr;
> +
> + if (dev)
> + pr_debug("allocate %p/%p for %s/%s\n",
> + (void *)size, (void *)alignment,
> + dev_name(dev), type ?: "");
> +
> + if (!size || alignment & (alignment - 1))
> + return -EINVAL;
> +
> + size = PAGE_ALIGN(size);
> + if (alignment < PAGE_SIZE)
> + alignment = PAGE_SIZE;
> +
> + mutex_lock(&cma_mutex);
> +
> + from = __cma_where_from(dev, type);
> + if (unlikely(IS_ERR(from))) {
> + addr = PTR_ERR(from);
> + goto done;
> + }
> +
> + pr_debug("allocate %p/%p from one of %s\n",
> + (void *)size, (void *)alignment, from);
> +
> + while (*from && *from != ';') {
> + reg = __cma_region_find(&from);
> + addr = __cma_alloc_from_region(reg, size, alignment);
> + if (!IS_ERR_VALUE(addr))
> + goto done;
> + }
> +
> + pr_debug("not enough memory\n");
> + addr = -ENOMEM;
> +
> +done:
> + mutex_unlock(&cma_mutex);
> +
> + return addr;
> +}
> +EXPORT_SYMBOL_GPL(__cma_alloc);
> +
> +
> +/* Query information about regions. */
> +static void __cma_info_add(struct cma_info *infop, struct cma_region *reg)
> +{
> + infop->total_size += reg->size;
> + infop->free_size += reg->free_space;
> + if (infop->lower_bound > reg->start)
> + infop->lower_bound = reg->start;
> + if (infop->upper_bound < reg->start + reg->size)
> + infop->upper_bound = reg->start + reg->size;
> + ++infop->count;
> +}
> +
> +int
> +__cma_info(struct cma_info *infop, const struct device *dev, const char *type)
> +{
> + struct cma_info info = { ~(dma_addr_t)0, 0, 0, 0, 0 };
> + struct cma_region *reg;
> + const char *from;
> + int ret;
> +
> + if (unlikely(!infop))
> + return -EINVAL;
> +
> + mutex_lock(&cma_mutex);
> +
> + from = __cma_where_from(dev, type);
> + if (IS_ERR(from)) {
> + ret = PTR_ERR(from);
> + info.lower_bound = 0;
> + goto done;
> + }
> +
> + while (*from && *from != ';') {
> + reg = __cma_region_find(&from);
> + if (reg)
> + __cma_info_add(&info, reg);
> + }
> +
> + ret = 0;
> +done:
> + mutex_unlock(&cma_mutex);
> +
> + memcpy(infop, &info, sizeof info);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(__cma_info);
> +
> +
> +/* Freeing. */
> +int cma_free(dma_addr_t addr)
> +{
> + struct cma_chunk *c;
> + int ret;
> +
> + mutex_lock(&cma_mutex);
> +
> + c = __cma_chunk_find(addr);
> +
> + if (c) {
> + __cma_chunk_free(c);
> + ret = 0;
> + } else {
> + ret = -ENOENT;
> + }
> +
> + mutex_unlock(&cma_mutex);
> +
> + pr_debug("free(%p): %s\n", (void *)addr, c ? "freed" : "not found");
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(cma_free);
> +
> +
> +/************************* Miscellaneous *************************/
> +
> +static int __cma_region_attach_alloc(struct cma_region *reg)
> +{
> + struct cma_allocator *alloc;
> + int ret;
> +
> + /*
> + * If reg->alloc is set then caller wants us to use this
> + * allocator. Otherwise we need to find one by name.
> + */
> + if (reg->alloc) {
> + alloc = reg->alloc;
> + } else {
> + alloc = __cma_allocator_find(reg->alloc_name);
> + if (!alloc) {
> + pr_warn("init: %s: %s: no such allocator\n",
> + reg->name ?: "(private)",
> + reg->alloc_name ?: "(default)");
> + reg->used = 1;
> + return -ENOENT;
> + }
> + }
> +
> + /* Try to initialise the allocator. */
> + reg->private_data = NULL;
> + ret = alloc->init ? alloc->init(reg) : 0;
> + if (unlikely(ret < 0)) {
> + pr_err("init: %s: %s: unable to initialise allocator\n",
> + reg->name ?: "(private)", alloc->name ?: "(unnamed)");
> + reg->alloc = NULL;
> + reg->used = 1;
> + } else {
> + reg->alloc = alloc;
> + /* ++alloc->users; */
> + pr_debug("init: %s: %s: initialised allocator\n",
> + reg->name ?: "(private)", alloc->name ?: "(unnamed)");
> + }
> + return ret;
> +}
> +
> +
> +/*
> + * s ::= rules
> + * rules ::= rule [ ';' rules ]
> + * rule ::= patterns '=' regions
> + * patterns ::= pattern [ ',' patterns ]
> + * regions ::= REG-NAME [ ',' regions ]
> + * pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
> + */
> +static const char *__must_check
> +__cma_where_from(const struct device *dev, const char *type)
> +{
> + /*
> + * This function matches the pattern from the map attribute
> + * agains given device name and type. Type may be of course
> + * NULL or an emtpy string.
> + */
> +
> + const char *s, *name;
> + int name_matched = 0;
> +
> + /*
> + * If dev is NULL we were called in alternative form where
> + * type is the from string. All we have to do is return it.
> + */
> + if (!dev)
> + return type ?: ERR_PTR(-EINVAL);
> +
> + if (!cma_map)
> + return ERR_PTR(-ENOENT);
> +
> + name = dev_name(dev);
> + if (WARN_ON(!name || !*name))
> + return ERR_PTR(-EINVAL);
> +
> + if (!type)
> + type = "common";
> +
> + /*
> + * Now we go throught the cma_map attribute.
> + */
> + for (s = cma_map; *s; ++s) {
> + const char *c;
> +
> + /*
> + * If the pattern starts with a slash, the device part of the
> + * pattern matches if it matched previously.
> + */
> + if (*s == '/') {
> + if (!name_matched)
> + goto look_for_next;
> + goto match_type;
> + }
> +
> + /*
> + * We are now trying to match the device name. This also
> + * updates the name_matched variable. If, while reading the
> + * spec, we ecnounter comma it means that the pattern does not
> + * match and we need to start over with another pattern (the
> + * one afther the comma). If we encounter equal sign we need
> + * to start over with another rule. If there is a character
> + * that does not match, we neet to look for a comma (to get
> + * another pattern) or semicolon (to get another rule) and try
> + * again if there is one somewhere.
> + */
> +
> + name_matched = 0;
> +
> + for (c = name; *s != '*' && *c; ++c, ++s)
> + if (*s == '=')
> + goto next_rule;
> + else if (*s == ',')
> + goto next_pattern;
> + else if (*s != '?' && *c != *s)
> + goto look_for_next;
> + if (*s == '*')
> + ++s;
> +
> + name_matched = 1;
> +
> + /*
> + * Now we need to match the type part of the pattern. If the
> + * pattern is missing it we match only if type points to an
> + * empty string. Otherwise wy try to match it just like name.
> + */
> + if (*s == '/') {
> +match_type: /* s points to '/' */
> + ++s;
> +
> + for (c = type; *s && *c; ++c, ++s)
> + if (*s == '=')
> + goto next_rule;
> + else if (*s == ',')
> + goto next_pattern;
> + else if (*c != *s)
> + goto look_for_next;
> + }
> +
> + /* Return the string behind the '=' sign of the rule. */
> + if (*s == '=')
> + return s + 1;
> + else if (*s == ',')
> + return strchr(s, '=') + 1;
> +
> + /* Pattern did not match */
> +
> +look_for_next:
> + do {
> + ++s;
> + } while (*s != ',' && *s != '=');
> + if (*s == ',')
> + continue;
> +
> +next_rule: /* s points to '=' */
> + s = strchr(s, ';');
> + if (!s)
> + break;
> +
> +next_pattern:
> + continue;
> + }
> +
> + return ERR_PTR(-ENOENT);
> +}

I'm afraid I ran out of beans reading the patch so it isn't a detailed
review. Nothing horrible jumps out but I'd be interested in hearing your
thoughts on suggestions for bringing it closer to the page allocator and
reusing the bootmem allocator instead of introducing new code for CMA.

Thanks

--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab

2010-08-27 02:10:47

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 2/6] mm: cma: Contiguous Memory Allocator added

> An important consideration is if the alignment is always a natural
> alignment? i.e. a 64K buffer must be 64K aligned, 128K must be 128K aligned
> etc. I ask because the buddy allocator is great at granting natural alignments
> but is difficult to work with for other alignments.

I'm not sure what you mean by "natural alignment". If 1M alignment of a 64K buffer
is natural then yes, presented API requires alignment to be natural. In short,
alignment must be a power of two and is never less then a PAGE_SIZE but can be more
then the size of the requested chunk.

>> + The main design goal for the CMA was to provide a customisable and
>> + modular framework, which could be configured to suit the needs of
>> + individual systems. Configuration specifies a list of memory
>> + regions, which then are assigned to devices. Memory regions can
>> + be shared among many device drivers or assigned exclusively to
>> + one. This has been achieved in the following ways:

> It'd be very nice if the shared regions could also be used by normal movable
> memory allocations to minimise the amount of wastage.

Yes. I hope to came up with a CMA version that will allow reserved spec to be
reused by the rest of memory management code. For now, I won't respond to your
suggestions regarding the use of page allocator but I hope to write something
later today in response to Peter's and Minchan's mails. I'll make sure to cc you
as well.

>> +/* Don't call it directly, use cma_info() or cma_info_about(). */
>> +int
>> +__cma_info(struct cma_info *info, const struct device *dev, const char *type);
>> +

> Don't put it in the header then :/

It's in the header to allow cma_info() and cma_info_about() to be static inlines.
The idea is not to generate too many exported symbols. Also, it's not like usage
of __cma_info() is in any way more dangerous then cma_info() or cma_info_about().

>> +/**
>> + * cma_free - frees a chunk of memory.
>> + * @addr: Beginning of the chunk.
>> + *
>> + * Returns -ENOENT if there is no chunk at given location; otherwise
>> + * zero. In the former case issues a warning.
>> + */
>> +int cma_free(dma_addr_t addr);
>> +

> Is it not an error to free a non-existant chunk? Hope it WARN()s at
> least.

No WARN() is generated but -ENOENT is returned so it is considered an error.
I've also changed the code to use pr_err() when chunk is not found (it used
pr_debug() previously).

I'm still wondering whether the use of address is the best idea or whether
passing a cma_chunk structure would be a better option. In this way, cma_alloc()
would return cma_chunk structure rather then dma_addr_t.

>> +/****************************** Lower lever API *****************************/

> How lower? If it can be hidden, put it in a private header.

It's meant to be used by drivers even though the idea is that most drivers will
stick to the API above.

>> + * cma_alloc_from - allocates contiguous chunk of memory from named regions.

> Ideally named regions would be managed by default by free_area and the core
> page allocator.

Not sure what you mean.

>> +struct cma_chunk {
>> + dma_addr_t start;
>> + size_t size;
>> +
>> + struct cma_region *reg;
>> + struct rb_node by_start;
>> +};
>> +
>
> Is there any scope for reusing parts of kernel/resource.c? Frankly, I
> didn't look at your requirements closely enough or at kernel/resource.c
> capabilities but at a glance, there appears to be some commonality.

I'm not sure how resources.c could be reused. It puts resources in hierarchy
whereas CMA does not care about hierarchy that much plus has only two levels
(regions on top and then chunks allocated inside).

> As an aside, it does not seem necessary to have everything CMA related
> in the same header. Maybe split it out to minimise the risk of drivers
> abusing the layers. Up to you really, I don't feel very strongly on
> header layout.

I dunno, I first created two header files but then decided to put everything
in one file. I dunno if anything is gained by exporting a few functions to
a separate header. I think it only complicates things.

>> +config CMA
>> + bool "Contiguous Memory Allocator framework"
>> + # Currently there is only one allocator so force it on
>> + select CMA_BEST_FIT

>> +config CMA_BEST_FIT
>> + bool "CMA best-fit allocator"
>> + depends on CMA
>> + default y

> You don't need to default y this if CMA is selecting it, right?

True.

> also CMA should default n.

CMA defaults to n.

>> +/*
>> + * Contiguous Memory Allocator framework
>> + * Copyright (c) 2010 by Samsung Electronics.
>> + * Written by Michal Nazarewicz ([email protected])
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License as
>> + * published by the Free Software Foundation; either version 2 of the
>> + * License or (at your optional) any later version of the license.
>
> I'm not certain about the "any later version" part of this license and
> how it applies to kernel code but I'm no licensing guru. I know we have
> duel licensing elsewhere for BSD but someone should double check this
> license is ok.

Why wouldn't it? All it says is that this particular file can be distributed
under GPLv2 or GPLv3 (or any later if FSF decides to publish updated version).
There is no difference between licensing GPLv2/BSD and GPLv2/GPLv3+.

> I am curious about one thing though. Have you considered reusing the bootmem
> allocator code to manage the regions instead of your custom stuff here? Instead
> of the cma_regions core structures, you would associate cma_region with
> a new bootmem_data_t, keep the bootmem code around and allocate using its
> allocator. It's a bitmap allocator too and would be less code in the kernel?

I haven't looked at bootmem in such perspective. I'll add that to my TODO list.
On the other hand, however, it seems bootmem is passée so I'm not sure if it's a
good idea to integrate with it that much.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-27 02:42:14

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 10:17:07 +0200, Peter Zijlstra <[email protected]> wrote:
> So why not work on the page allocator to improve its contiguous
> allocation behaviour. If you look at the thing you'll find pageblocks
> and migration types. If you change it so that you pin the migration type
> of one or a number of contiguous pageblocks to say MIGRATE_MOVABLE, so
> that they cannot be used for anything but movable pages you're pretty
> much there.

And that's exactly where I'm headed. I've created API that seems to be
usable and meat mine and others requirements (not that I'm not saying it
cannot be improved -- I'm always happy to hear comments) and now I'm
starting to concentrate on the reusing of the grabbed memory. At first
I wasn't sure how this can be managed but thanks to many comments
(including yours, thanks!) I have an idea of how the thing should work
and what I should do from now.

--
Best regards, _ _
| Humble Liege of Serenely Enlightened Majesty of o' \,=./ `o
| Computer Science, Michał "mina86" Nazarewicz (o o)
+----[mina86*mina86.com]---[mina86*jabber.org]----ooO--(_)--Ooo--

2010-08-27 08:21:54

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thu, 26 Aug 2010 18:36:24 +0900
Minchan Kim <[email protected]> wrote:

> On Thu, Aug 26, 2010 at 1:30 PM, KAMEZAWA Hiroyuki
> <[email protected]> wrote:
> > On Thu, 26 Aug 2010 13:06:28 +0900
> > Minchan Kim <[email protected]> wrote:
> >
> >> On Thu, Aug 26, 2010 at 12:44 PM, KAMEZAWA Hiroyuki
> >> <[email protected]> wrote:
> >> > On Thu, 26 Aug 2010 11:50:17 +0900
> >> > KAMEZAWA Hiroyuki <[email protected]> wrote:
> >> >
> >> >> 128MB...too big ? But it's depend on config.
> >> >>
> >> >> IBM's ppc guys used 16MB section, and recently, a new interface to shrink
> >> >> the number of /sys files are added, maybe usable.
> >> >>
> >> >> Something good with this approach will be you can create "cma" memory
> >> >> before installing driver.
> >> >>
> >> >> But yes, complicated and need some works.
> >> >>
> >> > Ah, I need to clarify what I want to say.
> >> >
> >> > With compaction, it's helpful, but you can't get contiguous memory larger
> >> > than MAX_ORDER, I think. To get memory larger than MAX_ORDER on demand,
> >> > memory hot-plug code has almost all necessary things.
> >>
> >> True. Doesn't patch's idea of Christoph helps this ?
> >> http://lwn.net/Articles/200699/
> >>
> >
> > yes, I think so. But, IIRC, it's own purpose of Chirstoph's work is
> > for removing zones. please be careful what's really necessary.
>
> Ahh. Sorry for missing point.
> You're right. The patch can't help our problem.
>
> How about changing following this?
> The thing is MAX_ORDER is static. But we want to avoid too big
> MAX_ORDER of whole zones to support devices which requires big
> allocation chunk.
> So let's add MAX_ORDER into each zone and then, each zone can have
> different max order.
> For example, while DMA[32], NORMAL, HIGHMEM can have normal size 11,
> MOVABLE zone could have a 15.
>
> This approach has a big side effect?
>

Hm...need to check hard coded MAX_ORDER usages...I don't think
side-effect is big. Hmm. But I think enlarging MAX_ORDER isn't an
important thing. A code which strips contiguous chunks of pages from
buddy allocator is a necessaty thing, as..

What I can think of at 1st is...
==
int steal_pages(unsigned long start_pfn, unsigned long end_pfn)
{
/* Be careful mutal execution with memory hotplug, because reusing code */

split [start_pfn, end_pfn) to pageblock_order

for each pageblock in the range {
Mark this block as MIGRATE_ISOLATE
try-to-free pages in the range or
migrate pages in the range to somewhere.
/* Here all pages in the range are on buddy allocator
and free and never be allocated by anyone else. */
}

please see __rmqueue_fallback(). it selects migration-type at 1st.
Then, if you can pass start_migratetype of MIGLATE_ISOLATE,
you can automatically strip all MIGRATE_ISOLATE pages from free_area[].

return chunk of pages.
}
==

Thanks,
-Kame

2010-08-27 08:38:29

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Fri, 2010-08-27 at 17:16 +0900, KAMEZAWA Hiroyuki wrote:
> > How about changing following this?
> > The thing is MAX_ORDER is static. But we want to avoid too big
> > MAX_ORDER of whole zones to support devices which requires big
> > allocation chunk.
> > So let's add MAX_ORDER into each zone and then, each zone can have
> > different max order.
> > For example, while DMA[32], NORMAL, HIGHMEM can have normal size 11,
> > MOVABLE zone could have a 15.
> >
> > This approach has a big side effect?

The side effect of increasing MAX_ORDER is that page allocations get
more expensive since the buddy tree gets larger, yielding more
splits/merges.

> Hm...need to check hard coded MAX_ORDER usages...I don't think
> side-effect is big. Hmm. But I think enlarging MAX_ORDER isn't an
> important thing. A code which strips contiguous chunks of pages from
> buddy allocator is a necessaty thing, as..

Right, once we can explicitly free the pages we want, crossing MAX_ORDER
isn't too hard like you say, we can simply continue with freeing the
next in order page.

2010-08-28 12:38:52

by Hans Verkuil

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 2/6] mm: cma: Contiguous Memory Allocator added

On Friday, August 20, 2010 11:50:42 Michal Nazarewicz wrote:
> The Contiguous Memory Allocator framework is a set of APIs for
> allocating physically contiguous chunks of memory.
>
> Various chips require contiguous blocks of memory to operate. Those
> chips include devices such as cameras, hardware video decoders and
> encoders, etc.
>
> The code is highly modular and customisable to suit the needs of
> various users. Set of regions reserved for CMA can be configured
> per-platform and it is easy to add custom allocator algorithms if one
> has such need.
>
> Signed-off-by: Michal Nazarewicz <[email protected]>
> Signed-off-by: Kyungmin Park <[email protected]>
> Reviewed-by: Pawel Osciak <[email protected]>
> ---
> Documentation/00-INDEX | 2 +
> Documentation/contiguous-memory.txt | 541 +++++++++++++++++++++
> include/linux/cma.h | 431 +++++++++++++++++
> mm/Kconfig | 34 ++
> mm/Makefile | 2 +
> mm/cma-best-fit.c | 407 ++++++++++++++++
> mm/cma.c | 910 +++++++++++++++++++++++++++++++++++
> 7 files changed, 2327 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/contiguous-memory.txt
> create mode 100644 include/linux/cma.h
> create mode 100644 mm/cma-best-fit.c
> create mode 100644 mm/cma.c
>
> diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
> index 8dfc670..f93e787 100644
> --- a/Documentation/00-INDEX
> +++ b/Documentation/00-INDEX
> @@ -94,6 +94,8 @@ connector/
> - docs on the netlink based userspace<->kernel space communication mod.
> console/
> - documentation on Linux console drivers.
> +contiguous-memory.txt
> + - documentation on physically-contiguous memory allocation framework.
> cpu-freq/
> - info on CPU frequency and voltage scaling.
> cpu-hotplug.txt
> diff --git a/Documentation/contiguous-memory.txt b/Documentation/contiguous-memory.txt
> new file mode 100644
> index 0000000..8fc2400
> --- /dev/null
> +++ b/Documentation/contiguous-memory.txt
> @@ -0,0 +1,541 @@
> + -*- org -*-
> +
> +* Contiguous Memory Allocator
> +
> + The Contiguous Memory Allocator (CMA) is a framework, which allows
> + setting up a machine-specific configuration for physically-contiguous
> + memory management. Memory for devices is then allocated according
> + to that configuration.
> +
> + The main role of the framework is not to allocate memory, but to
> + parse and manage memory configurations, as well as to act as an
> + in-between between device drivers and pluggable allocators. It is
> + thus not tied to any memory allocation method or strategy.
> +
> +** Why is it needed?
> +
> + Various devices on embedded systems have no scatter-getter and/or
> + IO map support and as such require contiguous blocks of memory to
> + operate. They include devices such as cameras, hardware video
> + decoders and encoders, etc.
> +
> + Such devices often require big memory buffers (a full HD frame is,
> + for instance, more then 2 mega pixels large, i.e. more than 6 MB
> + of memory), which makes mechanisms such as kmalloc() ineffective.
> +
> + Some embedded devices impose additional requirements on the
> + buffers, e.g. they can operate only on buffers allocated in
> + particular location/memory bank (if system has more than one
> + memory bank) or buffers aligned to a particular memory boundary.
> +
> + Development of embedded devices have seen a big rise recently
> + (especially in the V4L area) and many such drivers include their
> + own memory allocation code. Most of them use bootmem-based methods.
> + CMA framework is an attempt to unify contiguous memory allocation
> + mechanisms and provide a simple API for device drivers, while
> + staying as customisable and modular as possible.
> +
> +** Design
> +
> + The main design goal for the CMA was to provide a customisable and
> + modular framework, which could be configured to suit the needs of
> + individual systems. Configuration specifies a list of memory
> + regions, which then are assigned to devices. Memory regions can
> + be shared among many device drivers or assigned exclusively to
> + one. This has been achieved in the following ways:
> +
> + 1. The core of the CMA does not handle allocation of memory and
> + management of free space. Dedicated allocators are used for
> + that purpose.
> +
> + This way, if the provided solution does not match demands
> + imposed on a given system, one can develop a new algorithm and
> + easily plug it into the CMA framework.
> +
> + The presented solution includes an implementation of a best-fit
> + algorithm.
> +
> + 2. When requesting memory, devices have to introduce themselves.
> + This way CMA knows who the memory is allocated for. This
> + allows for the system architect to specify which memory regions
> + each device should use.
> +
> + 3. Memory regions are grouped in various "types". When device
> + requests a chunk of memory, it can specify what type of memory
> + it needs. If no type is specified, "common" is assumed.
> +
> + This makes it possible to configure the system in such a way,
> + that a single device may get memory from different memory
> + regions, depending on the "type" of memory it requested. For
> + example, a video codec driver might want to allocate some
> + shared buffers from the first memory bank and the other from
> + the second to get the highest possible memory throughput.
> +
> + 4. For greater flexibility and extensibility, the framework allows
> + device drivers to register private regions of reserved memory
> + which then may be used only by them.
> +
> + As an effect, if a driver would not use the rest of the CMA
> + interface, it can still use CMA allocators and other
> + mechanisms.
> +
> + 4a. Early in boot process, device drivers can also request the
> + CMA framework to a reserve a region of memory for them
> + which then will be used as a private region.
> +
> + This way, drivers do not need to directly call bootmem,
> + memblock or similar early allocator but merely register an
> + early region and the framework will handle the rest
> + including choosing the right early allocator.
> +
> +** Use cases
> +
> + Let's analyse some imaginary system that uses the CMA to see how
> + the framework can be used and configured.
> +
> +
> + We have a platform with a hardware video decoder and a camera each
> + needing 20 MiB of memory in the worst case. Our system is written
> + in such a way though that the two devices are never used at the
> + same time and memory for them may be shared. In such a system the
> + following configuration would be used in the platform
> + initialisation code:
> +
> + static struct cma_region regions[] = {
> + { .name = "region", .size = 20 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "video,camera=region";
> +
> + cma_set_defaults(regions, map);
> +
> + The regions array defines a single 20-MiB region named "region".
> + The map says that drivers named "video" and "camera" are to be
> + granted memory from the previously defined region.
> +
> + A shorter map can be used as well:
> +
> + static const char map[] __initconst = "*=region";
> +
> + The asterisk ("*") matches all devices thus all devices will use
> + the region named "region".
> +
> + We can see, that because the devices share the same memory region,
> + we save 20 MiB, compared to the situation when each of the devices
> + would reserve 20 MiB of memory for itself.
> +
> +
> + Now, let's say that we have also many other smaller devices and we
> + want them to share some smaller pool of memory. For instance 5
> + MiB. This can be achieved in the following way:
> +
> + static struct cma_region regions[] = {
> + { .name = "region", .size = 20 << 20 },
> + { .name = "common", .size = 5 << 20 },
> + { }
> + }
> + static const char map[] __initconst =
> + "video,camera=region;*=common";
> +
> + cma_set_defaults(regions, map);
> +
> + This instructs CMA to reserve two regions and let video and camera
> + use region "region" whereas all other devices should use region
> + "common".
> +
> +
> + Later on, after some development of the system, it can now run
> + video decoder and camera at the same time. The 20 MiB region is
> + no longer enough for the two to share. A quick fix can be made to
> + grant each of those devices separate regions:
> +
> + static struct cma_region regions[] = {
> + { .name = "v", .size = 20 << 20 },
> + { .name = "c", .size = 20 << 20 },
> + { .name = "common", .size = 5 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "video=v;camera=c;*=common";
> +
> + cma_set_defaults(regions, map);
> +
> + This solution also shows how with CMA you can assign private pools
> + of memory to each device if that is required.
> +
> +
> + Allocation mechanisms can be replaced dynamically in a similar
> + manner as well. Let's say that during testing, it has been
> + discovered that, for a given shared region of 40 MiB,
> + fragmentation has become a problem. It has been observed that,
> + after some time, it becomes impossible to allocate buffers of the
> + required sizes. So to satisfy our requirements, we would have to
> + reserve a larger shared region beforehand.
> +
> + But fortunately, you have also managed to develop a new allocation
> + algorithm -- Neat Allocation Algorithm or "na" for short -- which
> + satisfies the needs for both devices even on a 30 MiB region. The
> + configuration can be then quickly changed to:
> +
> + static struct cma_region regions[] = {
> + { .name = "region", .size = 30 << 20, .alloc_name = "na" },
> + { .name = "common", .size = 5 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "video,camera=region;*=common";
> +
> + cma_set_defaults(regions, map);
> +
> + This shows how you can develop your own allocation algorithms if
> + the ones provided with CMA do not suit your needs and easily
> + replace them, without the need to modify CMA core or even
> + recompiling the kernel.
> +
> +** Technical Details
> +
> +*** The attributes
> +
> + As shown above, CMA is configured by a two attributes: list
> + regions and map. The first one specifies regions that are to be
> + reserved for CMA. The second one specifies what regions each
> + device is assigned to.
> +
> +**** Regions
> +
> + Regions is a list of regions terminated by a region with size
> + equal zero. The following fields may be set:
> +
> + - size -- size of the region (required, must not be zero)
> + - alignment -- alignment of the region; must be power of two or
> + zero (optional)

Just wondering: is alignment really needed since we already align to the
PAGE_SIZE? Do you know of hardware with alignment requirements > PAGE_SIZE?

> + - start -- where the region has to start (optional)
> + - alloc_name -- the name of allocator to use (optional)
> + - alloc -- allocator to use (optional; and besides
> + alloc_name is probably is what you want)

I would make this field internal only. At least for now.

> +
> + size, alignment and start is specified in bytes. Size will be
> + aligned up to a PAGE_SIZE. If alignment is less then a PAGE_SIZE
> + it will be set to a PAGE_SIZE. start will be aligned to
> + alignment.
> +
> +**** Map
> +
> + The format of the "map" attribute is as follows:
> +
> + map-attr ::= [ rules [ ';' ] ]
> + rules ::= rule [ ';' rules ]
> + rule ::= patterns '=' regions
> +
> + patterns ::= pattern [ ',' patterns ]
> +
> + regions ::= REG-NAME [ ',' regions ]
> + // list of regions to try to allocate memory
> + // from
> +
> + pattern ::= dev-pattern [ '/' TYPE-NAME ] | '/' TYPE-NAME
> + // pattern request must match for the rule to
> + // apply; the first rule that matches is
> + // applied; if dev-pattern part is omitted
> + // value identical to the one used in previous
> + // pattern is assumed.
> +
> + dev-pattern ::= PATTERN
> + // pattern that device name must match for the
> + // rule to apply; may contain question marks
> + // which mach any characters and end with an
> + // asterisk which match the rest of the string
> + // (including nothing).
> +
> + It is a sequence of rules which specify what regions should given
> + (device, type) pair use. The first rule that matches is applied.
> +
> + For rule to match, the pattern must match (dev, type) pair.
> + Pattern consist of the part before and after slash. The first
> + part must match device name and the second part must match kind.
> +
> + If the first part is empty, the device name is assumed to match
> + iff it matched in previous pattern. If the second part is
> + omitted it will mach any type of memory requested by device.
> +
> + Some examples (whitespace added for better readability):
> +
> + cma_map = foo/quaz = r1;
> + // device foo with type == "quaz" uses region r1
> +
> + foo/* = r2; // OR:
> + /* = r2;
> + // device foo with any other kind uses region r2
> +
> + bar = r1,r2;
> + // device bar uses region r1 or r2
> +
> + baz?/a , baz?/b = r3;
> + // devices named baz? where ? is any character
> + // with type being "a" or "b" use r3
> +
> +*** The device and types of memory
> +
> + The name of the device is taken from the device structure. It is
> + not possible to use CMA if driver does not register a device
> + (actually this can be overcome if a fake device structure is
> + provided with at least the name set).
> +
> + The type of memory is an optional argument provided by the device
> + whenever it requests memory chunk. In many cases this can be
> + ignored but sometimes it may be required for some devices.

This really should not be optional but compulsory. 'type' has the same function
as the GFP flags with kmalloc. They tell the kernel where the memory should be
allocated. Only if you do not care at all can you pass in NULL. But in almost
all cases the memory should be at least DMA-able (and yes, for a lot of SoCs that
is the same as any memory -- for now).

Memory types should be defined in the platform code. Some can be generic
like 'dma' (i.e. any DMAable memory), 'dma32' (32-bit DMA) and 'common' (any
memory). Others are platform specific like 'banka' and 'bankb'.

A memory type definition can either be a start address/size pair but it can
perhaps also be a GFP type (e.g. .name = "dma32", .gfp = GFP_DMA32).

Regions should be of a single memory type. So when you define the region it
should have a memory type field.

Drivers request memory of whatever type they require. The mapping just maps
one or more regions to the driver and the cma allocator will pick only those
regions with the required type and ignore those that do not match.

> + For instance, let's say that there are two memory banks and for
> + performance reasons a device uses buffers in both of them.
> + Platform defines a memory types "a" and "b" for regions in both
> + banks. The device driver would use those two types then to
> + request memory chunks from different banks. CMA attributes could
> + look as follows:
> +
> + static struct cma_region regions[] = {
> + { .name = "a", .size = 32 << 20 },
> + { .name = "b", .size = 32 << 20, .start = 512 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "foo/a=a;foo/b=b;*=a,b";

So this would become something like this:

static struct cma_memtype types[] = {
{ .name = "a", .size = 32 << 20 },
{ .name = "b", .size = 32 << 20, .start = 512 << 20 },
// For example:
{ .name = "dma", .gfp = GFP_DMA },
{ }
}
static struct cma_region regions[] = {
// size may of course be smaller than the memtype size.
{ .name = "a", type = "a", .size = 32 << 20 },
{ .name = "b", type = "b", .size = 32 << 20 },
{ }
}
static const char map[] __initconst = "*=a,b";

No need to do anything special for driver foo here: cma_alloc will pick the
correct region based on the memory type requested by the driver.

It is probably no longer needed to specify the memory type in the mapping when
this is in place.

> +
> + And whenever the driver allocated the memory it would specify the
> + kind of memory:
> +
> + buffer1 = cma_alloc(dev, "a", 1 << 20, 0);
> + buffer2 = cma_alloc(dev, "b", 1 << 20, 0);
> +
> + If it was needed to try to allocate from the other bank as well if
> + the dedicated one is full, the map attributes could be changed to:
> +
> + static const char map[] __initconst = "foo/a=a,b;foo/b=b,a;*=a,b";

This would be something for the driver to decide. If the driver can handle
this, then the driver should just try memtype "a" first, and then "b".

> + On the other hand, if the same driver was used on a system with
> + only one bank, the configuration could be changed just to:
> +
> + static struct cma_region regions[] = {
> + { .name = "r", .size = 64 << 20 },
> + { }
> + }
> + static const char map[] __initconst = "*=r";
> +
> + without the need to change the driver at all.

The only change needed here is that the region gets a '.type = "dma"' specifier
as well.

I think I am otherwise quite happy with this code (as least from the PoV of
v4l). But that 'memory kind' handling never felt like it was handled at the
right level.

Regards,

Hans

--
Hans Verkuil - video4linux developer - sponsored by TANDBERG, part of Cisco

2010-08-28 13:10:14

by Hans Verkuil

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Thursday, August 26, 2010 00:58:14 Andrew Morton wrote:
> On Fri, 20 Aug 2010 15:15:10 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2010-08-20 at 11:50 +0200, Michal Nazarewicz wrote:
> > > Hello everyone,
> > >
> > > The following patchset implements a Contiguous Memory Allocator. For
> > > those who have not yet stumbled across CMA an excerpt from
> > > documentation:
> > >
> > > The Contiguous Memory Allocator (CMA) is a framework, which allows
> > > setting up a machine-specific configuration for physically-contiguous
> > > memory management. Memory for devices is then allocated according
> > > to that configuration.
> > >
> > > The main role of the framework is not to allocate memory, but to
> > > parse and manage memory configurations, as well as to act as an
> > > in-between between device drivers and pluggable allocators. It is
> > > thus not tied to any memory allocation method or strategy.
> > >
> > > For more information please refer to the second patch from the
> > > patchset which contains the documentation.
> >
> > So the idea is to grab a large chunk of memory at boot time and then
> > later allow some device to use it?
> >
> > I'd much rather we'd improve the regular page allocator to be smarter
> > about this. We recently added a lot of smarts to it like memory
> > compaction, which allows large gobs of contiguous memory to be freed for
> > things like huge pages.
> >
> > If you want guarantees you can free stuff, why not add constraints to
> > the page allocation type and only allow MIGRATE_MOVABLE pages inside a
> > certain region, those pages are easily freed/moved aside to satisfy
> > large contiguous allocations.
>
> That would be good. Although I expect that the allocation would need
> to be 100% rock-solid reliable, otherwise the end user has a
> non-functioning device.

Yes, indeed. And you have to be careful as well how you move pages around.
Say that you have a capture and an output v4l device: the first one needs
64 MB contiguous memory and so it allocates that amount, moving pages around
as needed. Once allocated that memory is pinned in place since it is needed
for DMA. So if the output device also needs 64 MB, then you must have a
guarantee that the first allocation didn't fragment the available contiguous
memory.

I also wonder how expensive it is to move all the pages around. E.g. if you
have a digital camera and want to make a hires picture, then it wouldn't
do if it takes a second to move all the pages around making room for the
captured picture. The CPUs in many SoCs are not very powerful compared to
your average desktop.

And how would memory allocations in specific memory ranges (e.g. memory
banks) work?

Note also that these issues are not limited to embedded systems, also PCI(e)
boards can sometimes require massive amounts of DMA-able memory. I have had
this happen in the past with the ivtv driver with customers that had 15 or so
capture cards in one box. And I'm sure it will happen in the future as well,
esp. with upcoming 4k video formats.

Video is a major memory consumer, particularly in embedded systems. And there
usually is no room for failure.

> Could generic core VM provide the required level
> of service?
>
> Anyway, these patches are going to be hard to merge but not impossible.
> Keep going. Part of the problem is cultural, really: the consumers of
> this interface are weird dinky little devices which the core MM guys
> tend not to work with a lot, and it adds code which they wouldn't use.

It's not really that weird. The same problems can actually occur as well
with the more 'mainstream' consumer level video boards, although you need
more extreme environments for these problems to surface.

> I agree that having two "contiguous memory allocators" floating about
> on the list is distressing. Are we really all 100% diligently certain
> that there is no commonality here with Zach's work?
>
> I agree that Peter's above suggestion would be the best thing to do.
> Please let's take a look at that without getting into sunk cost
> fallacies with existing code!
>
> It would help (a lot) if we could get more attention and buyin and
> fedback from the potential clients of this code. rmk's feedback is
> valuable. Have we heard from the linux-media people?

I'm doing the reviewing for linux-media. It would be really nice to have a
good system for this in place. For example, the current TI davinci capture
driver will only work reliably (memory-wise) if you also use the out-of-tree
TI cmem module. Hardly a desirable situation. Basically a fair amount of
custom hacks is required at the memory to have reliable video streaming on
embedded systems due to the lack of a cma-type framework.

> What other
> subsystems might use it? ieee1394 perhaps? Please help identify
> specific subsystems and I can perhaps help to wake people up.

The video subsystem is the other candidate. Probably not for the current generation
of GPUs (these all have hardware IOMMUs I suspect), but definitely for the framebuffer
based devices that are often present on SoCs.

Regards,

Hans

> And I agree that this code (or one of its alternatives!) would benefit
> from having a core MM person take a close interest. Any volunteers?
>
> Please cc me on future emails on this topic?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-media" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

--
Hans Verkuil - video4linux developer - sponsored by TANDBERG, part of Cisco

2010-08-28 13:35:53

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Sat, 2010-08-28 at 15:08 +0200, Hans Verkuil wrote:

> > That would be good. Although I expect that the allocation would need
> > to be 100% rock-solid reliable, otherwise the end user has a
> > non-functioning device.
>
> Yes, indeed. And you have to be careful as well how you move pages around.
> Say that you have a capture and an output v4l device: the first one needs
> 64 MB contiguous memory and so it allocates that amount, moving pages around
> as needed. Once allocated that memory is pinned in place since it is needed
> for DMA. So if the output device also needs 64 MB, then you must have a
> guarantee that the first allocation didn't fragment the available contiguous
> memory.

Isn't the proposed CMA thing vulnerable to the exact same problem? If
you allow sharing of regions and plug some allocator in there you get
the same problem. If you can solve it there, you can solve it for any
kind of reservation scheme.

> I also wonder how expensive it is to move all the pages around. E.g. if you
> have a digital camera and want to make a hires picture, then it wouldn't
> do if it takes a second to move all the pages around making room for the
> captured picture. The CPUs in many SoCs are not very powerful compared to
> your average desktop.

Well, that's a trade-off, if you want to have the memory be usable for
anything else (which I understood people did want) then you have to pay
for cleaning it up when you need to use it.

As for the cost of compaction vs regular page-out of random page-cache
memory, compaction is actually cheaper, since it doesn't need to write
out dirty data, and page-out driven writeback sucks due to the
non-linear nature of it.

> And how would memory allocations in specific memory ranges (e.g. memory
> banks) work?

Make sure you reserve pageblocks in the desired range.

> Note also that these issues are not limited to embedded systems, also PCI(e)
> boards can sometimes require massive amounts of DMA-able memory. I have had
> this happen in the past with the ivtv driver with customers that had 15 or so
> capture cards in one box. And I'm sure it will happen in the future as well,
> esp. with upcoming 4k video formats.

I would sincerely hope PCI(e) devices come with an IOMMU (and all memory
lines wired up), really, any hardware that doesn't isn't worth the
silicon its engraved in. Just don't buy it.

2010-08-28 13:59:44

by Hans Verkuil

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Saturday, August 28, 2010 15:34:46 Peter Zijlstra wrote:
> On Sat, 2010-08-28 at 15:08 +0200, Hans Verkuil wrote:
>
> > > That would be good. Although I expect that the allocation would need
> > > to be 100% rock-solid reliable, otherwise the end user has a
> > > non-functioning device.
> >
> > Yes, indeed. And you have to be careful as well how you move pages around.
> > Say that you have a capture and an output v4l device: the first one needs
> > 64 MB contiguous memory and so it allocates that amount, moving pages around
> > as needed. Once allocated that memory is pinned in place since it is needed
> > for DMA. So if the output device also needs 64 MB, then you must have a
> > guarantee that the first allocation didn't fragment the available contiguous
> > memory.
>
> Isn't the proposed CMA thing vulnerable to the exact same problem? If
> you allow sharing of regions and plug some allocator in there you get
> the same problem. If you can solve it there, you can solve it for any
> kind of reservation scheme.

Since with cma you can assign a region exclusively to a driver you can ensure
that this problem does not occur. Of course, if you allow sharing then you will
end up with the same type of problem unless you know that there is only one
driver at a time that will use that memory.

> > I also wonder how expensive it is to move all the pages around. E.g. if you
> > have a digital camera and want to make a hires picture, then it wouldn't
> > do if it takes a second to move all the pages around making room for the
> > captured picture. The CPUs in many SoCs are not very powerful compared to
> > your average desktop.
>
> Well, that's a trade-off, if you want to have the memory be usable for
> anything else (which I understood people did want) then you have to pay
> for cleaning it up when you need to use it.
>
> As for the cost of compaction vs regular page-out of random page-cache
> memory, compaction is actually cheaper, since it doesn't need to write
> out dirty data, and page-out driven writeback sucks due to the
> non-linear nature of it.

There is obviously a trade-off. I was just wondering how costly it is.
E.g. would it be a noticeable delay making 64 MB memory available in this
way on a, say, 600 MHz ARM.

> > And how would memory allocations in specific memory ranges (e.g. memory
> > banks) work?
>
> Make sure you reserve pageblocks in the desired range.
>
> > Note also that these issues are not limited to embedded systems, also PCI(e)
> > boards can sometimes require massive amounts of DMA-able memory. I have had
> > this happen in the past with the ivtv driver with customers that had 15 or so
> > capture cards in one box. And I'm sure it will happen in the future as well,
> > esp. with upcoming 4k video formats.
>
> I would sincerely hope PCI(e) devices come with an IOMMU (and all memory
> lines wired up), really, any hardware that doesn't isn't worth the
> silicon its engraved in. Just don't buy it.

In the case of the ivtv driver the PCI device had a broken scatter-gather DMA
engine, which is the underlying reason for these issues. Since I was maintainer
of this driver for a few years I would love to have a reliable solution for the
memory issues. It's not a big deal, 99.99% of all users will never notice
anything, but still... And I don't think there are any affordable or easily
obtainable alternatives to this hardware with similar feature sets, even after
all these years.

Anyway, I agree with your sentiment, but reality can be disappointingly
different :-( And especially with regards to video hardware the creativity of
the hardware designers is boundless -- to the dismay of us linux-media developers.

Regards,

Hans

--
Hans Verkuil - video4linux developer - sponsored by TANDBERG, part of Cisco

2010-08-28 14:16:38

[permalink] [raw]

Subject: Re: [PATCH/RFCv4 0/6] The Contiguous Memory Allocator framework

On Sat, 2010-08-28 at 15:58 +0200, Hans Verkuil wrote:
> > Isn't the proposed CMA thing vulnerable to the exact same problem? If
> > you allow sharing of regions and plug some allocator in there you get
> > the same problem. If you can solve it there, you can solve it for any
> > kind of reservation scheme.
>
> Since with cma you can assign a region exclusively to a driver you can ensure
> that this problem does not occur. Of course, if you allow sharing then you will
> end up with the same type of problem unless you know that there is only one
> driver at a time that will use that memory.

I think you could do the same thing, the proposed page allocator
solutions still needs to manage pageblock state, you can manage those
the same as you would your cma regions -- the difference is that you get
the option of letting the rest of the system use the memory in a
transparent manner if you don't need it.

> There is obviously a trade-off. I was just wondering how costly it is.
> E.g. would it be a noticeable delay making 64 MB memory available in this
> way on a, say, 600 MHz ARM.

Right, dunno really, rather depends on the memory bandwidth of your arm
device I suspect. It is something you'd have to test.

In case the machine isn't fast enough, there really isn't anything you
can do but keep the memory empty at all times; unless of course the
device in question needs it.

2010-08-29 01:49:25