This patch contains the documentation for the API, termed the Virtual
Contiguous Memory Manager. Its use would allow all of the IOMMU-to-VM,
VM-to-device and device-to-IOMMU interoperation code to be refactored
into platform-independent code.
Comments, suggestions and criticisms are welcome and wanted.
Signed-off-by: Zach Pfeffer <[email protected]>
---
Documentation/vcm.txt | 583 +++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 583 insertions(+), 0 deletions(-)
create mode 100644 Documentation/vcm.txt
diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt
new file mode 100644
index 0000000..d29c757
--- /dev/null
+++ b/Documentation/vcm.txt
@@ -0,0 +1,583 @@
+What is this document about?
+============================
+
+This document covers how to use the Virtual Contiguous Memory Manager
+(VCMM), how the first implementation works with a specific low-level
+Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used
+from user-space. It also contains a section that describes why something
+like the VCMM is needed in the kernel.
+
+If anything in this document is wrong please send patches to the
+maintainer of this file, listed at the bottom of the document.
+
+
+The Virtual Contiguous Memory Manager
+=====================================
+
+The VCMM was built to solve the system-wide memory mapping issues that
+occur when many bus-masters have IOMMUs.
+
+An IOMMU maps device addresses to physical addresses. It also insulates
+the system from spurious or malicious device bus transactions and allows
+fine-grained mapping attribute control. The Linux kernel core does not
+contain a generic API to handle IOMMU mapped memory; device driver writers
+must implement device specific code to interoperate with the Linux kernel
+core. As the number of IOMMUs increases, coordinating the many address
+spaces mapped by all discrete IOMMUs becomes difficult without in-kernel
+support.
+
+The VCMM API enables device-independent IOMMU control, virtual memory
+manager (VMM) interoperation and non-IOMMU enabled device interoperation
+by treating devices (with or without IOMMUs), CPUs (with or without
+MMUs), their mapping contexts and their mappings with common
+abstractions. Physical hardware is given a generic device type and
+mapping contexts are abstracted into Virtual Contiguous Memory (VCM)
+regions. Users "reserve" memory from VCMs and "back" their reservations
+with physical memory.
+
+Why the VCMM is Needed
+----------------------
+
+Driver writers who control devices with IOMMUs must contend with device
+control and memory management. Driver writers have a large device driver
+API that they can leverage to control their devices, but they lack a
+unified API to help them program mappings into IOMMUs and share those
+mappings with other devices and CPUs in the system.
+
+Sharing is complicated by Linux's CPU-centric VMM. The CPU-centric model
+generally makes sense because average hardware only contains an MMU for
+the CPU and possibly a graphics MMU. If every device in the system has
+one or more MMUs the CPU-centric memory management (MM) programming
+model breaks down.
+
+Abstracting IOMMU device programming into a common API has already begun
+in the Linux kernel. It was built to abstract the difference between
+AMD's and Intel's IOMMUs to support x86 virtualization on both
+platforms. The interface is listed in include/linux/iommu.h. It contains
+interfaces for mapping and unmapping as well as domain management. This
+interface has not gained widespread use outside of x86; the PA-RISC,
+Alpha and SPARC architectures and the ARM and PowerPC platforms all use
+their own mapping modules to control their IOMMUs. The VCMM contains an
+IOMMU programming layer, but since its abstraction supports map
+management independent of device control, the layer is not used
+directly. This higher-level view enables a new kernel service, not just
+an IOMMU interoperation layer.
+
+The General Idea: Map Management using Graphs
+---------------------------------------------
+
+Looking at mapping from a system-wide perspective reveals a general
+graph problem. The VCMM's API is built to manage the general mapping
+graph. Each node that talks to memory, either through an MMU or directly
+(physically mapped), can be thought of as the device end of a mapping
+edge. The other end of the edge is the physical memory (or intermediate
+virtual space) that is mapped.
+
+In the direct mapped case the device is assigned a one-to-one MMU. This
+scheme allows direct mapped devices to participate in general graph
+management.
+
+The CPU nodes can also be brought under the same mapping abstraction with
+the use of a light overlay on the existing VMM. This light overlay allows
+VMM managed mappings to interoperate with the common API. The light
+overlay enables this without substantial modifications to the existing
+VMM.
+
+In addition to CPU nodes that are running Linux (and the VMM), remote CPU
+nodes that may be running other operating systems can be brought into the
+general abstraction. Routing all memory management requests from a remote
+node through the central memory management framework enables new features
+like system-wide memory migration. This feature may only be feasible for
+large buffers that are managed outside of the fast-path, but having remote
+allocation in a system enables features that are impossible to build
+without it.
+
+The fundamental objects that support graph-based map management are:
+
+1) Virtual Contiguous Memory Regions
+
+2) Reservations
+
+3) Associated Virtual Contiguous Memory Regions
+
+4) Memory Targets
+
+5) Physical Memory Allocations
+
+Usage Overview
+--------------
+
+In a nutshell, users allocate Virtual Contiguous Memory Regions and
+associate those regions with one or more devices by creating an Associated
+Virtual Contiguous Memory Region. Users then create Reservations from the
+Virtual Contiguous Memory Region. At this point no physical memory has
+been committed to the reservation. To associate physical memory with a
+reservation a Physical Memory Allocation is created and the Reservation is
+backed with this allocation.
+
+include/linux/vcm.h includes comments documenting each API.
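+
+As a compact sketch of that flow (the names are placeholders and error
+handling is omitted; the detailed walk through below expands each step):
+
+    struct vcm *vcm = vcm_create(start_addr, len);
+    struct avcm *avcm = vcm_assoc(vcm, dev, assoc_attr);
+    struct res *res;
+    struct physmem *physmem;
+
+    vcm_activate(avcm);
+    res = vcm_reserve(vcm, buf_len, res_attr);
+    physmem = vcm_phys_alloc(memtype, buf_len, phys_attr);
+    vcm_back(res, physmem);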
+
+Virtual Contiguous Memory Regions
+---------------------------------
+
+A Virtual Contiguous Memory Region (VCM) abstracts the memory space a
+device sees. The addresses of the region are only used by the devices
+which are associated with the region. This address space would normally be
+implemented as a device page-table.
+
+A VCM is created and destroyed with three functions:
+
+ struct vcm *vcm_create(size_t start_addr, size_t len);
+
+ struct vcm *vcm_create_from_prebuilt(size_t ext_vcm_id);
+
+ int vcm_free(struct vcm *vcm);
+
+start_addr is an offset into the address space where allocations will
+start from. len is the length from start_addr of the VCM. Both functions
+generate an instance of a VCM.
+
+ext_vcm_id is used to pass a request to the VMM to generate a VCM
+instance. In the current implementation the call simply makes a note
+that the VCM instance is a VMM VCM instance for use by the other
+interfaces. This muxing is seen throughout the implementation.
+
+vcm_create() and vcm_create_from_prebuilt() produce VCM instances for
+virtually mapped devices (IOMMUs and CPUs). To create a one-to-one
+mapped VCM, users pass the start_addr and len of the physical
+region. The VCMM matches these against the physical region it manages
+and records that the VCM instance is a one-to-one VCM.
+
+The newly created VCM instance can be passed to any function that needs to
+operate on or with a virtual contiguous memory region. Its main attributes
+are a start_addr and a len as well as an internal setting that allows the
+implementation to mux between true virtual spaces, one-to-one mapped
+spaces and VMM managed spaces.
+
+The current implementation uses the genalloc library to manage the VCM for
+IOMMU devices. Return values and more in-depth per-function documentation
+for these and the ones listed below are in include/linux/vcm.h.
+
+Reservations
+------------
+
+A Reservation is a contiguous region allocated from a VCM. There is no
+physical memory associated with it.
+
+A Reservation is created and destroyed with:
+
+ struct res *vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr);
+
+ int vcm_unreserve(struct res *res);
+
+A vcm is a VCM created above. len is the length of the request. It can
+be up to the length of the VCM region the reservation is being created
+from. attr specifies mapping attributes: read, write, execute, user,
+supervisor, secure, not-cached, write-back/write-allocate, write-back/no
+write-allocate, write-through. These attrs are appropriate for ARM but
+can be changed to match any architecture.
+
+The implementation calls gen_pool_alloc() for IOMMU devices,
+alloc_vm_area() for VMM areas and is a pass-through for one-to-one
+mapped areas.
+
+Associated Virtual Contiguous Memory Regions and Activation
+-----------------------------------------------------------
+
+An Associated Virtual Contiguous Memory Region (AVCM) is a mapping of a
+VCM to a device. The mapping can be active or inactive.
+
+An AVCM is managed with:
+
+ struct avcm *vcm_assoc(struct vcm *vcm, size_t dev, uint32_t attr);
+
+ int vcm_deassoc(struct avcm *avcm);
+
+ int vcm_activate(struct avcm *avcm);
+
+ int vcm_deactivate(struct avcm *avcm);
+
+A VCM instance is a VCM created above. A dev is an opaque device handle
+that's passed down to the device driver the VCMM muxes in to handle a
+request. attr specifies association attributes: split, use-high or
+use-low. split controls which transactions hit a high-address page-table
+and which transactions hit a low-address page-table. For instance, all
+transactions whose most significant address bit is one would use the
+high-address page-table; any other transaction would use the low-address
+page-table. This scheme is ARM-specific and could be changed in other
+architectures. One VCM instance can be associated with many devices and
+many VCM instances can be associated with one device.
+
+An AVCM is only a link. To program and deprogram a device with a VCM
+the user calls vcm_activate() and vcm_deactivate(). For IOMMU devices,
+activating a mapping programs the base address of a page-table into an
+IOMMU. For VMM and one-to-one based devices, mappings are active
+immediately, but the API still requires an activation call for internal
+reference counting.
+
+Memory Targets
+--------------
+
+A Memory Target is a platform-independent way of specifying a physical
+pool; it abstracts a pool of physical memory. The physical memory pool
+may be physically discontiguous, may need to be allocated in a unique
+way or may have other user-defined attributes.
+
+Physical Memory Allocation and Reservation Backing
+--------------------------------------------------
+
+Physical memory is allocated as a separate step from reserving
+memory. This allows multiple reservations to back the same physical
+memory.
+
+A Physical Memory Allocation is managed using the following functions:
+
+ struct physmem *vcm_phys_alloc(enum memtype_t memtype, size_t len,
+ uint32_t attr);
+
+ int vcm_phys_free(struct physmem *physmem);
+
+ int vcm_back(struct res *res, struct physmem *physmem);
+
+ int vcm_unback(struct res *res);
+
+attr can include an alignment request, a specification to map memory using
+various block sizes and/or to use physically contiguous memory. memtype is
+one of the memory types listed in Memory Targets.
+
+The current implementation manages two pools of memory. One pool is a
+contiguous block of memory and the other is a set of contiguous block
+pools. In the current implementation the block pools contain 4K, 64K and
+1M blocks. The physical allocator does not try to split blocks from the
+contiguous block pools to satisfy requests.
+
+The use of 4K, 64K and 1M blocks solves a problem with some IOMMU
+hardware. IOMMUs are placed in front of multimedia engines to provide a
+contiguous address space to the device. Multimedia devices need large
+buffers and large buffers may map to a large number of physical
+blocks. IOMMUs tend to have small translation lookaside buffers
+(TLBs). Since the TLB is small the number of physical blocks that map a
+given range needs to be small or else the IOMMU will continually fetch new
+translations during a typical streamed multimedia flow. By using a 1 MB
+mapping (or 64K mapping) instead of a 4K mapping the number of misses can
+be minimized, allowing the multimedia block to meet its performance goals.
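+
+For example, using the names from the current implementation (a sketch;
+the authoritative attribute and memtype names are in
+include/linux/vcm_types.h), a large contiguous buffer could be
+requested with:
+
+    physmem = vcm_phys_alloc(VCM_MEMTYPE_0, SZ_4M, VCM_PHYS_CONT);
+
+Without VCM_PHYS_CONT the allocator satisfies the request with a
+max-munch walk over the 1M, 64K and 4K block pools.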
+
+Low Level Control
+-----------------
+
+It is necessary in some instances to access attributes and provide
+higher-level control of the low-level hardware abstraction. The API
+contains many functions for this task but the two that are typically used
+are:
+
+ size_t vcm_get_dev_addr(struct res *res);
+
+ int vcm_hook(size_t dev, vcm_handler handler, void *data);
+
+The first function, vcm_get_dev_addr(), returns a device address given
+a reservation. This device address is a virtual IOMMU address for
+reservations on IOMMU VCMs, a virtual VMM address for reservations on
+VMM VCMs and a virtual (really physical, since it's one-to-one mapped)
+address for one-to-one devices.
+
+The second function, vcm_hook(), allows a caller in the kernel to
+register a user_handler. During a fault, the handler is passed the data
+pointer that was passed to vcm_hook(). The user can return 1 to indicate
+that the underlying driver should handle the fault and retry the
+transaction, or return 0 to halt the transaction. If the user doesn't
+register a handler the low-level driver will print a warning and
+terminate the transaction.
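+
+A minimal sketch of registering a handler (the handler prototype shown
+here is an assumption; the actual vcm_handler typedef is in
+include/linux/vcm_types.h, and the names are illustrative):
+
+    /* assumed prototype; invoked with the data pointer during a fault */
+    static int sample_fault_handler(size_t dev, void *data)
+    {
+            pr_err("device fault, context %p\n", data);
+            return 0;       /* halt the faulting transaction */
+    }
+
+    ret = vcm_hook(dev_id, sample_fault_handler, ctx_data);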
+
+A Detailed Walk Through
+-----------------------
+
+The following call sequence walks through a typical allocation
+sequence. In the first stage the memory for a device is reserved and
+backed. This occurs without mapping the memory into a VMM VCM region. The
+second stage maps the first VCM region into a VMM VCM region so the kernel
+can read or write it. The second stage is not necessary if the VMM does
+not need to read or modify the contents of the original mapping.
+
+ Stage 1: Map and Allocate Memory for a Device
+
+ The call sequence starts by creating a VCM region:
+
+ vcm = vcm_create(start_addr, len);
+
+ The next call associates a VCM region with a device:
+
+ avcm = vcm_assoc(vcm, dev, attr);
+
+ To activate the association users call vcm_activate() on the avcm
+ from the associate call. This programs the underlying device with the
+ mappings.
+
+ ret = vcm_activate(avcm);
+
+ Once a VCM region is created and associated, memory can be reserved
+ from it with:
+
+ res = vcm_reserve(vcm, res_len, res_attr);
+
+ A user then allocates physical memory with:
+
+ physmem = vcm_phys_alloc(memtype, len, phys_attr);
+
+ To back the reservation with the physical memory allocation the user
+ calls:
+
+ vcm_back(res, physmem);
+
+
+ Stage 2: Map the Device's Memory into the VMM's VCM region
+
+ If the VMM needs to read and/or write the region that was just created
+ the following calls are made.
+
+ The first call creates a prebuilt VCM with:
+
+    vcm_vmm = vcm_create_from_prebuilt(ext_vcm_id);
+
+ The prebuilt VCM is associated with the CPU device and activated with:
+
+ avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr);
+ vcm_activate(avcm_vmm);
+
+ A reservation is made on the VMM VCM with:
+
+ res_vmm = vcm_reserve(vcm_vmm, res_len, attr);
+
+ Finally, once the topology has been set up a vcm_back() allows the VMM
+ to read the memory using the physmem generated in stage 1:
+
+ vcm_back(res_vmm, physmem);
+
+Mapping IOMMU, one-to-one and VMM Reservations
+----------------------------------------------
+
+The following example demonstrates mapping IOMMU, one-to-one and VMM
+reservations to the same physical memory. It shows the use of phys_addr
+and phys_size to create a contiguous VCM for one-to-one mapped devices.
+
+ The user allocates physical memory:
+
+    physmem = vcm_phys_alloc(memtype, SZ_2M + SZ_4K, CONTIGUOUS);
+
+ Creates an IOMMU VCM:
+
+ vcm_iommu = vcm_create(SZ_1K, SZ_16M);
+
+ Creates a one-to-one VCM:
+
+ vcm_onetoone = vcm_create(phys_addr, phys_size);
+
+ Creates a prebuilt VCM:
+
+    vcm_vmm = vcm_create_from_prebuilt(ext_vcm_id);
+
+ Associate and activate all three to their respective devices:
+
+ avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0);
+ avcm_onetoone = vcm_assoc(vcm_onetoone, dev_onetoone, attr1);
+ avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr2);
+ vcm_activate(avcm_iommu);
+ vcm_activate(avcm_onetoone);
+ vcm_activate(avcm_vmm);
+
+ And finally, creates and backs reservations on all three such that
+ they all point to the same memory:
+
+    res_iommu = vcm_reserve(vcm_iommu, SZ_2M + SZ_4K, attr);
+    res_onetoone = vcm_reserve(vcm_onetoone, SZ_2M + SZ_4K, attr);
+    res_vmm = vcm_reserve(vcm_vmm, SZ_2M + SZ_4K, attr);
+ vcm_back(res_iommu, physmem);
+ vcm_back(res_onetoone, physmem);
+ vcm_back(res_vmm, physmem);
+
+VCM Summary
+-----------
+
+The VCMM is an attempt to abstract attributes of three distinct classes of
+mappings into one API. The VCMM allows users to reason about mappings as
+first class objects. It also allows memory mappings to flow from the
+traditional 4K mappings prevalent on systems today to more efficient block
+sizes. Finally, it allows users to manage mapping interoperation without
+becoming VMM experts. These features will allow future systems with many
+MMU mapped devices to interoperate simply and therefore correctly.
+
+
+IOMMU Hardware Control
+======================
+
+The VCM currently supports a single type of IOMMU, a Qualcomm System MMU
+(SMMU). The SMMU interface contains functions to map and unmap virtual
+addresses, perform address translations and initialize hardware. A
+Qualcomm SMMU can contain multiple MMU contexts. Each context can
+translate in parallel. All contexts in a SMMU share one global translation
+look-aside buffer (TLB).
+
+To support context muxing the SMMU module creates and manages
+device-independent virtual contexts. These context abstractions are
+bound to actual contexts at run-time. Once bound, a context can be
+activated. This activation programs the underlying context with the
+virtual context, effecting a context switch.
+
+The following functions are all documented in:
+
+ arch/arm/mach-msm/include/mach/smmu_driver.h.
+
+Mapping
+-------
+
+To map and unmap a virtual page into physical space the VCM calls:
+
+ int smmu_map(struct smmu_dev *dev, unsigned long pa,
+ unsigned long va, unsigned long len, unsigned int attr);
+
+ int smmu_unmap(struct smmu_dev *dev, unsigned long va,
+ unsigned long len);
+
+ int smmu_update_start(struct smmu_dev *dev);
+
+ int smmu_update_done(struct smmu_dev *dev);
+
+The size given to map must be 4K, 64K, 1M or 16M and the VA and PA must be
+aligned to the given size. smmu_update_start() and smmu_update_done()
+should be called before and after each map or unmap.
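+
+A sketch of a single 64K mapping (dev, pa and va are assumed to be set
+up elsewhere and 64K-aligned; error checking omitted):
+
+    smmu_update_start(dev);
+    ret = smmu_map(dev, pa, va, SZ_64K, attr);
+    smmu_update_done(dev);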
+
+Translation
+-----------
+
+To request a hardware VA to PA translation on a single address the VCM
+calls:
+
+ unsigned long smmu_translate(struct smmu_dev *dev,
+ unsigned long va);
+
+Fault Handling
+--------------
+
+To register an interrupt handler for a context the VCM calls:
+
+ int smmu_hook_irpt(struct smmu_dev *dev, vcm_handler handler,
+ void *data);
+
+The registered interrupt handler should return 1 if it wants the SMMU
+driver to retry the transaction and 0 if it wants the SMMU driver to
+terminate the transaction.
+
+Managing SMMU Initialization and Contexts
+-----------------------------------------
+
+SMMU hardware initialization and management happens in two steps. The
+first step initializes global SMMU devices and abstract device
+contexts. The second step binds contexts and devices.
+
+A SMMU hardware instance is built with:
+
+ int smmu_drvdata_init(struct smmu_driver *drv, unsigned long base,
+ int irq);
+
+A SMMU context is initialized and deinitialized with:
+
+ struct smmu_dev *smmu_ctx_init(int ctx);
+ int smmu_ctx_deinit(struct smmu_dev *dev);
+
+An abstract SMMU context is bound to a particular SMMU with:
+
+ int smmu_ctx_bind(struct smmu_dev *ctx, struct smmu_driver *drv);
+
+Activation
+----------
+
+Activation effects a context switch.
+
+Activation, deactivation and activation state testing are done with:
+
+ int smmu_activate(struct smmu_dev *dev);
+ int smmu_deactivate(struct smmu_dev *dev);
+ int smmu_is_active(struct smmu_dev *dev);
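+
+A sketch of bringing up a single context (base, irq and the context
+number are placeholders; error checking omitted):
+
+    struct smmu_driver drv;
+    struct smmu_dev *dev;
+
+    smmu_drvdata_init(&drv, base, irq);
+    dev = smmu_ctx_init(0);
+    smmu_ctx_bind(dev, &drv);
+    smmu_activate(dev);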
+
+
+Userspace Access to Devices with IOMMUs
+=======================================
+
+A device that issues transactions through an IOMMU must work with two
+APIs. The first API is the VCM. The VCM API is device independent. Users
+pass the VCM a dev_id and the VCM makes calls on the hardware device it
+has been configured with using this dev_id. The second API is whatever
+device topology has been created to organize the particular IOMMUs in a
+system. The only constraint on this second API is that it must give the
+user a single dev_id that it can pass to the VCM.
+
+For the Qualcomm SMMUs the second API consists of a tree of platform
+devices and two platform drivers as well as a context lookup function that
+traverses the device tree and returns a dev_id given a context name.
+
+Qualcomm SMMU Device Tree
+-------------------------
+
+The current implementation organizes the devices into a tree that looks
+like the following:
+
+smmu/
+ smmu0/
+ ctx0
+ ctx1
+ ctx2
+ smmu1/
+ ctx3
+
+
+Each context, ctx[n], and each SMMU, smmu[n], is given a name. Since
+users are interested in contexts rather than SMMUs, the context's name
+is passed to a function to find the dev_id associated with that
+name. The functions to find a context, free it and get its base address
+(the device probe function calls ioremap to map the SMMU's
+configuration registers into the kernel) are listed here:
+
+ struct smmu_dev *smmu_get_ctx_instance(char *ctx_name);
+ int smmu_free_ctx_instance(struct smmu_dev *dev);
+ unsigned long smmu_get_base_addr(struct smmu_dev *dev);
+
+Documentation for these functions is in:
+
+ arch/arm/mach-msm/include/mach/smmu_device.h
+
+Each context is given a dev node named after the context. For example:
+
+ /dev/vcodec_a_mm1
+ /dev/vcodec_b_mm2
+ /dev/vcodec_stream
+ etc...
+
+Users open, close and mmap these nodes to access VCM buffers from
+userspace in the same way that they used to open, close and mmap /dev
+nodes that represented large physically contiguous buffers (called PMEM
+buffers on Android).
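+
+A user-space sketch (the node name and buffer length are illustrative):
+
+    #include <fcntl.h>
+    #include <sys/mman.h>
+    #include <unistd.h>
+
+    int fd = open("/dev/vcodec_a_mm1", O_RDWR);
+    void *buf = mmap(NULL, buf_len, PROT_READ | PROT_WRITE,
+                     MAP_SHARED, fd, 0);
+
+    /* ... use buf ... */
+
+    munmap(buf, buf_len);
+    close(fd);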
+
+Example
+-------
+
+An abbreviated example is shown here:
+
+Users get the dev_id associated with their target context, create a VCM
+topology appropriate for their device and finally associate the VCMs of
+the topology with the contexts that will take the VCMs:
+
+    dev_id = smmu_get_ctx_instance("vcodec_a_stream");
+
+create vcm and needed topology
+
+ avcm = vcm_assoc(vcm, dev_id, attr);
+
+Tying it all Together
+---------------------
+
+VCMs, IOMMUs and the device tree all work to support system-wide memory
+mappings. The use of each API in this system allows users to concentrate
+on the relevant details without needing to worry about low-level
+minutiae. The APIs' clear separation of memory spaces and the devices
+that support those memory spaces continues Linux's tradition of
+abstracting the what from the how.
+
+
+Maintainer: Zach Pfeffer <[email protected]>
--
1.7.0.2
The Virtual Contiguous Memory Manager (VCMM) allows the CPU, IOMMU
devices and physically mapped devices to map the same physical memory
using a common API. It achieves this by abstracting mapping into graph
management.
Within the graph abstraction, the device end-points are CPUs with and
without MMUs and devices with and without IOMMUs. The mapped memory is
the other end-point. In the [IO]MMU case this is readily apparent. In
the non-[IO]MMU case, it becomes just as apparent if you give each
device a "one-to-one" mapper. These mappings are wrapped up into
"reservations" represented by struct res's.
Also part of this graph is a mapping of a virtual space, struct vcm,
to a device. One virtual space may be associated with multiple devices
and multiple virtual-spaces may be associated with the same device. In
the case of a one-to-one device this virtual-space mirrors the
physical-space. A virtual-space is tied to a device using a
"association", a struct avcm. A device may be associated with a
virtual-space without being programed to translate based on that
virtual-space. Programming occurs using an activate call.
The physical side of the mapping (or intermediate address-space side
in virtualized environments) is represented by a struct physmem. Due to
the peculiar needs of various IOMMUs, the VCM contains a physical
allocation subsystem that manages blocks of different sizes from
different pools. This allows fine-grained control of block size and
physical placement, a feature many advanced IOMMU devices need.
Once a user has made a reservation on a VCM and allocated physical
memory, the two graph end-points are joined in a backing step. This
step allows multiple reservations to map the same physical location.
Many of the functions in the API take various attributes that provide
fine grained control of the objects they create. For instance,
reservations can map cacheable memory and physical allocations can be
constrained to use a particular subset of block sizes.
Signed-off-by: Zach Pfeffer <[email protected]>
---
arch/arm/mm/vcm.c | 1901 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/vcm.h | 701 +++++++++++++++++
include/linux/vcm_types.h | 318 ++++++++
3 files changed, 2920 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/mm/vcm.c
create mode 100644 include/linux/vcm.h
create mode 100644 include/linux/vcm_types.h
diff --git a/arch/arm/mm/vcm.c b/arch/arm/mm/vcm.c
new file mode 100644
index 0000000..04ff2d4
--- /dev/null
+++ b/arch/arm/mm/vcm.c
@@ -0,0 +1,1901 @@
+/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA.
+ */
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/vcm_mm.h>
+#include <linux/vcm.h>
+#include <linux/vcm_types.h>
+#include <linux/errno.h>
+#include <linux/spinlock.h>
+
+#include <asm/page.h>
+#include <asm/sizes.h>
+
+#ifdef CONFIG_SMMU
+#include <mach/smmu_driver.h>
+#endif
+
+/* alloc_vm_area */
+#include <linux/pfn.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+
+/* may be temporary */
+#include <linux/bootmem.h>
+
+#include <asm/cacheflush.h>
+#include <asm/mach/map.h>
+
+#define BOOTMEM_SZ SZ_32M
+#define BOOTMEM_ALIGN SZ_1M
+
+#define CONT_SZ SZ_8M
+#define CONT_ALIGN SZ_1M
+
+#define ONE_TO_ONE_CHK 1
+
+#define vcm_err(a, ...) \
+ pr_err("ERROR %s %i " a, __func__, __LINE__, ##__VA_ARGS__)
+
+static void *bootmem;
+static void *bootmem_cont;
+static struct vcm *cont_vcm_id;
+static struct phys_chunk *cont_phys_chunk;
+
+static DEFINE_SPINLOCK(vcmlock);
+
+static int vcm_no_res(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ return list_empty(&vcm->res_head);
+fail:
+ return -EINVAL;
+}
+
+static int vcm_no_assoc(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ return list_empty(&vcm->assoc_head);
+fail:
+ return -EINVAL;
+}
+
+static int vcm_all_activated(struct vcm *vcm)
+{
+ struct avcm *avcm;
+
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ list_for_each_entry(avcm, &vcm->assoc_head, assoc_elm)
+ if (!avcm->is_active)
+ return 0;
+
+ return 1;
+fail:
+ return -1;
+}
+
+static void vcm_destroy_common(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ return;
+ }
+
+ memset(vcm, 0, sizeof(*vcm));
+ kfree(vcm);
+}
+
+static struct vcm *vcm_create_common(void)
+{
+ struct vcm *vcm = 0;
+
+ vcm = kzalloc(sizeof(*vcm), GFP_KERNEL);
+ if (!vcm) {
+		vcm_err("kzalloc(%zu, GFP_KERNEL) ret 0\n",
+			sizeof(*vcm));
+ goto fail;
+ }
+
+ INIT_LIST_HEAD(&vcm->res_head);
+ INIT_LIST_HEAD(&vcm->assoc_head);
+
+ return vcm;
+
+fail:
+ return NULL;
+}
+
+
+static int vcm_create_pool(struct vcm *vcm, size_t start_addr, size_t len)
+{
+ int ret = 0;
+
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ vcm->start_addr = start_addr;
+ vcm->len = len;
+
+ vcm->pool = gen_pool_create(PAGE_SHIFT, -1);
+ if (!vcm->pool) {
+ vcm_err("gen_pool_create(%x, -1) ret 0\n", PAGE_SHIFT);
+ goto fail;
+ }
+
+ ret = gen_pool_add(vcm->pool, start_addr, len, -1);
+ if (ret) {
+ vcm_err("gen_pool_add(%p, %p, %i, -1) ret %i\n", vcm->pool,
+ (void *) start_addr, len, ret);
+ goto fail2;
+ }
+
+ return 0;
+
+fail2:
+ gen_pool_destroy(vcm->pool);
+fail:
+ return -1;
+}
+
+
+static struct vcm *vcm_create_flagged(int flag, size_t start_addr, size_t len)
+{
+ int ret = 0;
+ struct vcm *vcm = 0;
+
+ vcm = vcm_create_common();
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ /* special one-to-one mapping case */
+ if ((flag & ONE_TO_ONE_CHK) &&
+ bootmem_cont &&
+ __pa(bootmem_cont) &&
+ start_addr == __pa(bootmem_cont) &&
+ len == CONT_SZ) {
+ vcm->type = VCM_ONE_TO_ONE;
+ } else {
+ ret = vcm_create_pool(vcm, start_addr, len);
+ vcm->type = VCM_DEVICE;
+ }
+
+ if (ret) {
+ vcm_err("vcm_create_pool(%p, %p, %i) ret %i\n", vcm,
+ (void *) start_addr, len, ret);
+ goto fail2;
+ }
+
+ return vcm;
+
+fail2:
+ vcm_destroy_common(vcm);
+fail:
+ return NULL;
+}
+
+struct vcm *vcm_create(size_t start_addr, size_t len)
+{
+ unsigned long flags;
+ struct vcm *vcm;
+
+ spin_lock_irqsave(&vcmlock, flags);
+ vcm = vcm_create_flagged(ONE_TO_ONE_CHK, start_addr, len);
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return vcm;
+}
+
+
+static int ext_vcm_id_valid(size_t ext_vcm_id)
+{
+ return ((ext_vcm_id == VCM_PREBUILT_KERNEL) ||
+ (ext_vcm_id == VCM_PREBUILT_USER));
+}
+
+
+struct vcm *vcm_create_from_prebuilt(size_t ext_vcm_id)
+{
+ unsigned long flags;
+ struct vcm *vcm = 0;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!ext_vcm_id_valid(ext_vcm_id)) {
+ vcm_err("ext_vcm_id_valid(%i) ret 0\n", ext_vcm_id);
+ goto fail;
+ }
+
+ vcm = vcm_create_common();
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ if (ext_vcm_id == VCM_PREBUILT_KERNEL)
+ vcm->type = VCM_EXT_KERNEL;
+ else if (ext_vcm_id == VCM_PREBUILT_USER)
+ vcm->type = VCM_EXT_USER;
+ else {
+ vcm_err("UNREACHABLE ext_vcm_id is illegal\n");
+ goto fail_free;
+ }
+
+ /* TODO: set kernel and userspace start_addr and len, if this
+ * makes sense */
+
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return vcm;
+
+fail_free:
+ vcm_destroy_common(vcm);
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return NULL;
+}
+
+
+struct vcm *vcm_clone(struct vcm *vcm_id)
+{
+ return 0;
+}
+
+
+/* No lock needed, vcm->start_addr is never updated after creation */
+size_t vcm_get_start_addr(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ return 1;
+ }
+
+ return vcm->start_addr;
+}
+
+
+/* No lock needed, vcm->len is never updated after creation */
+size_t vcm_get_len(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ return 0;
+ }
+
+ return vcm->len;
+}
+
+
+static int vcm_free_common_rule(struct vcm *vcm)
+{
+ int ret;
+
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ ret = vcm_no_res(vcm);
+ if (!ret) {
+ vcm_err("vcm_no_res(%p) ret 0\n", vcm);
+ goto fail_busy;
+ }
+
+ if (ret == -EINVAL) {
+ vcm_err("vcm_no_res(%p) ret -EINVAL\n", vcm);
+ goto fail;
+ }
+
+ ret = vcm_no_assoc(vcm);
+ if (!ret) {
+ vcm_err("vcm_no_assoc(%p) ret 0\n", vcm);
+ goto fail_busy;
+ }
+
+ if (ret == -EINVAL) {
+ vcm_err("vcm_no_assoc(%p) ret -EINVAL\n", vcm);
+ goto fail;
+ }
+
+ return 0;
+
+fail_busy:
+ return -EBUSY;
+fail:
+ return -EINVAL;
+}
+
+
+static int vcm_free_pool_rule(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ /* A vcm always has a valid pool, don't free the vcm because
+ what we got is probably invalid.
+ */
+ if (!vcm->pool) {
+ vcm_err("NULL vcm->pool\n");
+ goto fail;
+ }
+
+ return 0;
+
+fail:
+ return -EINVAL;
+}
+
+
+static void vcm_free_common(struct vcm *vcm)
+{
+ memset(vcm, 0, sizeof(*vcm));
+
+ kfree(vcm);
+}
+
+
+static int vcm_free_pool(struct vcm *vcm)
+{
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ gen_pool_destroy(vcm->pool);
+
+ return 0;
+
+fail:
+ return -1;
+}
+
+
+static int __vcm_free(struct vcm *vcm)
+{
+ int ret;
+
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ ret = vcm_free_common_rule(vcm);
+ if (ret != 0) {
+ vcm_err("vcm_free_common_rule(%p) ret %i\n", vcm, ret);
+ goto fail;
+ }
+
+ if (vcm->type == VCM_DEVICE) {
+ ret = vcm_free_pool_rule(vcm);
+ if (ret != 0) {
+ vcm_err("vcm_free_pool_rule(%p) ret %i\n",
+ (void *) vcm, ret);
+ goto fail;
+ }
+
+ ret = vcm_free_pool(vcm);
+ if (ret != 0) {
+ vcm_err("vcm_free_pool(%p) ret %i", (void *) vcm, ret);
+ goto fail;
+ }
+ }
+
+ vcm_free_common(vcm);
+
+ return 0;
+
+fail:
+ return -EINVAL;
+}
+
+int vcm_free(struct vcm *vcm)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&vcmlock, flags);
+ ret = __vcm_free(vcm);
+ spin_unlock_irqrestore(&vcmlock, flags);
+
+ return ret;
+}
+
+
+static struct res *__vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr)
+{
+ struct res *res = NULL;
+
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ if (len == 0) {
+ vcm_err("len is 0\n");
+ goto fail;
+ }
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (!res) {
+		vcm_err("kzalloc(%zu, GFP_KERNEL) ret 0\n", sizeof(*res));
+ goto fail;
+ }
+
+ INIT_LIST_HEAD(&res->res_elm);
+ res->vcm_id = vcm;
+ res->len = len;
+ res->attr = attr;
+
+ if (len/SZ_1M)
+ res->alignment_req = SZ_1M;
+ else if (len/SZ_64K)
+ res->alignment_req = SZ_64K;
+ else
+ res->alignment_req = SZ_4K;
+
+ res->aligned_len = res->alignment_req + len;
+
+ switch (vcm->type) {
+ case VCM_DEVICE:
+ /* should always be not zero */
+ if (!vcm->pool) {
+ vcm_err("NULL vcm->pool\n");
+ goto fail2;
+ }
+
+ res->ptr = gen_pool_alloc(vcm->pool, res->aligned_len);
+ if (!res->ptr) {
+ vcm_err("gen_pool_alloc(%p, %i) ret 0\n",
+ vcm->pool, res->aligned_len);
+ goto fail2;
+ }
+
+ /* Calculate alignment... this will all change anyway */
+ res->aligned_ptr = res->ptr +
+ (res->alignment_req -
+ (res->ptr & (res->alignment_req - 1)));
+
+ break;
+ case VCM_EXT_KERNEL:
+ res->vm_area = alloc_vm_area(res->aligned_len);
+ res->mapped = 0; /* be explicit */
+ if (!res->vm_area) {
+ vcm_err("NULL res->vm_area\n");
+ goto fail2;
+ }
+
+ res->aligned_ptr = (size_t) res->vm_area->addr +
+ (res->alignment_req -
+ ((size_t) res->vm_area->addr &
+ (res->alignment_req - 1)));
+
+ break;
+ case VCM_ONE_TO_ONE:
+ break;
+ default:
+ vcm_err("%i is an invalid vcm->type\n", vcm->type);
+ goto fail2;
+ }
+
+ list_add_tail(&res->res_elm, &vcm->res_head);
+
+ return res;
+
+fail2:
+ kfree(res);
+fail:
+ return 0;
+}
+
+
+struct res *vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr)
+{
+ unsigned long flags;
+ struct res *res;
+
+ spin_lock_irqsave(&vcmlock, flags);
+ res = __vcm_reserve(vcm, len, attr);
+ spin_unlock_irqrestore(&vcmlock, flags);
+
+ return res;
+}
+
+
+struct res *vcm_reserve_at(enum memtarget_t memtarget, struct vcm *vcm,
+			   size_t len, uint32_t attr)
+{
+ return 0;
+}
+
+
+/* No lock needed, res->vcm_id is never updated after creation */
+struct vcm *vcm_get_vcm_from_res(struct res *res)
+{
+ if (!res) {
+ vcm_err("NULL res\n");
+ return 0;
+ }
+
+ return res->vcm_id;
+}
+
+
+static int __vcm_unreserve(struct res *res)
+{
+ struct vcm *vcm;
+
+ if (!res) {
+ vcm_err("NULL res\n");
+ goto fail;
+ }
+
+ if (!res->vcm_id) {
+ vcm_err("NULL res->vcm_id\n");
+ goto fail;
+ }
+
+ vcm = res->vcm_id;
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ switch (vcm->type) {
+ case VCM_DEVICE:
+ if (!res->vcm_id->pool) {
+ vcm_err("NULL (res->vcm_id))->pool\n");
+ goto fail;
+ }
+
+ /* res->ptr could be zero, this isn't an error */
+ gen_pool_free(res->vcm_id->pool, res->ptr,
+ res->aligned_len);
+ break;
+ case VCM_EXT_KERNEL:
+ if (res->mapped) {
+ vcm_err("res->mapped is true\n");
+ goto fail;
+ }
+
+ /* This may take a little explaining.
+ * In the kernel vunmap will free res->vm_area
+ * so if we've called it then we shouldn't call
+ * free_vm_area(). If we've called it we set
+ * res->vm_area to 0.
+ */
+ if (res->vm_area) {
+ free_vm_area(res->vm_area);
+ res->vm_area = 0;
+ }
+
+ break;
+ case VCM_ONE_TO_ONE:
+ break;
+ default:
+ vcm_err("%i is an invalid vcm->type\n", vcm->type);
+ goto fail;
+ }
+
+ list_del(&res->res_elm);
+
+ /* be extra careful by clearing the memory before freeing it */
+ memset(res, 0, sizeof(*res));
+
+ kfree(res);
+
+ return 0;
+
+fail:
+ return -EINVAL;
+}
+
+
+int vcm_unreserve(struct res *res)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&vcmlock, flags);
+ ret = __vcm_unreserve(res);
+ spin_unlock_irqrestore(&vcmlock, flags);
+
+ return ret;
+}
+
+
+/* No lock needed, res->len is never updated after creation */
+size_t vcm_get_res_len(struct res *res)
+{
+ if (!res) {
+ vcm_err("res is 0\n");
+ return 0;
+ }
+
+ return res->len;
+}
+
+
+int vcm_set_res_attr(struct res *res, uint32_t attr)
+{
+ return 0;
+}
+
+
+uint32_t vcm_get_res_attr(struct res *res)
+{
+ return 0;
+}
+
+
+size_t vcm_get_num_res(struct vcm *vcm)
+{
+ return 0;
+}
+
+
+struct res *vcm_get_next_res(struct vcm *vcm, struct res *res)
+{
+ return 0;
+}
+
+
+size_t vcm_res_copy(struct res *to, size_t to_off, struct res *from,
+ size_t from_off, size_t len)
+{
+ return 0;
+}
+
+
+size_t vcm_get_min_page_size(void)
+{
+ return PAGE_SIZE;
+}
+
+
+static int vcm_to_smmu_attr(uint32_t attr)
+{
+ int smmu_attr = 0;
+
+ switch (attr & VCM_CACHE_POLICY) {
+ case VCM_NOTCACHED:
+ smmu_attr = VCM_DEV_ATTR_NONCACHED;
+ break;
+ case VCM_WB_WA:
+ smmu_attr = VCM_DEV_ATTR_CACHED_WB_WA;
+ smmu_attr |= VCM_DEV_ATTR_SH;
+ break;
+ case VCM_WB_NWA:
+ smmu_attr = VCM_DEV_ATTR_CACHED_WB_NWA;
+ smmu_attr |= VCM_DEV_ATTR_SH;
+ break;
+ case VCM_WT:
+ smmu_attr = VCM_DEV_ATTR_CACHED_WT;
+ smmu_attr |= VCM_DEV_ATTR_SH;
+ break;
+ default:
+ return -1;
+ }
+
+ return smmu_attr;
+}
+
+
+/* TBD if you vcm_back again what happens? */
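+/*
+ * Walk each allocated physical chunk and map it, starting at the
+ * reservation's device address, into every device associated with the
+ * reservation's VCM. VCM_EXT_KERNEL regions are ioremapped page by
+ * page; VCM_ONE_TO_ONE regions need no device programming.
+ */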
+int vcm_back(struct res *res, struct physmem *physmem)
+{
+ unsigned long flags;
+ struct vcm *vcm;
+ struct phys_chunk *chunk;
+ size_t va = 0;
+ int ret;
+ int attr;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!res) {
+ vcm_err("NULL res\n");
+ goto fail;
+ }
+
+ vcm = res->vcm_id;
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ switch (vcm->type) {
+ case VCM_DEVICE:
+ case VCM_EXT_KERNEL: /* hack part 1 */
+ attr = vcm_to_smmu_attr(res->attr);
+ if (attr == -1) {
+ vcm_err("Bad SMMU attr\n");
+ goto fail;
+ }
+ break;
+ default:
+ attr = 0;
+ break;
+ }
+
+ if (!physmem) {
+ vcm_err("NULL physmem\n");
+ goto fail;
+ }
+
+ if (res->len == 0) {
+ vcm_err("res->len is 0\n");
+ goto fail;
+ }
+
+ if (physmem->len == 0) {
+ vcm_err("physmem->len is 0\n");
+ goto fail;
+ }
+
+ if (res->len != physmem->len) {
+ vcm_err("res->len (%i) != physmem->len (%i)\n",
+ res->len, physmem->len);
+ goto fail;
+ }
+
+ if (physmem->is_cont) {
+ if (physmem->res == 0) {
+ vcm_err("cont physmem->res is 0");
+ goto fail;
+ }
+ } else {
+ /* fail if no physmem */
+ if (list_empty(&physmem->alloc_head.allocated)) {
+ vcm_err("no allocated phys memory");
+ goto fail;
+ }
+ }
+
+ ret = vcm_no_assoc(res->vcm_id);
+ if (ret == 1) {
+		vcm_err("can't back an unassociated VCM\n");
+ goto fail;
+ }
+
+ if (ret == -1) {
+ vcm_err("vcm_no_assoc() ret -1\n");
+ goto fail;
+ }
+
+ ret = vcm_all_activated(res->vcm_id);
+ if (ret == 0) {
+ vcm_err("can't back, not all associations are activated\n");
+ goto fail_eagain;
+ }
+
+ if (ret == -1) {
+ vcm_err("vcm_all_activated() ret -1\n");
+ goto fail;
+ }
+
+ va = res->aligned_ptr;
+
+ list_for_each_entry(chunk, &physmem->alloc_head.allocated,
+ allocated) {
+ struct vcm *vcm = res->vcm_id;
+ size_t chunk_size = vcm_alloc_idx_to_size(chunk->size_idx);
+
+ switch (vcm->type) {
+ case VCM_DEVICE:
+ {
+#ifdef CONFIG_SMMU
+ struct avcm *avcm;
+ /* map all */
+ list_for_each_entry(avcm, &vcm->assoc_head,
+ assoc_elm) {
+
+ ret = smmu_map(
+ (struct smmu_dev *) avcm->dev_id,
+ chunk->pa, va, chunk_size, attr);
+ if (ret != 0) {
+ vcm_err("smmu_map(%p, %p, %p, 0x%x,"
+ "0x%x)"
+ " ret %i",
+ (void *) avcm->dev_id,
+ (void *) chunk->pa,
+ (void *) va,
+ (int) chunk_size, attr, ret);
+ goto fail;
+ /* TODO handle weird inter-map case */
+ }
+ }
+ break;
+#else
+ vcm_err("No SMMU support - VCM_DEVICE not supported\n");
+ goto fail;
+#endif
+ }
+
+ case VCM_EXT_KERNEL:
+ {
+ unsigned int pages_in_chunk = chunk_size / PAGE_SIZE;
+ unsigned long loc_va = va;
+ unsigned long loc_pa = chunk->pa;
+
+ const struct mem_type *mtype;
+
+ /* TODO: get this based on MEMTYPE */
+ mtype = get_mem_type(MT_DEVICE);
+ if (!mtype) {
+ vcm_err("mtype is 0\n");
+ goto fail;
+ }
+
+ /* TODO: Map with the same chunk size */
+ while (pages_in_chunk--) {
+ ret = ioremap_page(loc_va,
+ loc_pa,
+ mtype);
+ if (ret != 0) {
+ vcm_err("ioremap_page(%p, %p, %p) ret"
+ " %i", (void *) loc_va,
+ (void *) loc_pa,
+ (void *) mtype, ret);
+ goto fail;
+ /* TODO handle weird
+ inter-map case */
+ }
+
+ /* hack part 2 */
+ /* we're changing the PT entry behind
+ * linux's back
+ */
+ ret = cpu_set_attr(loc_va, PAGE_SIZE, attr);
+ if (ret != 0) {
+ vcm_err("cpu_set_attr(%p, %lu, %x)"
+ "ret %i\n",
+ (void *) loc_va, PAGE_SIZE,
+ attr, ret);
+ goto fail;
+ /* TODO handle weird
+ inter-map case */
+ }
+
+ res->mapped = 1;
+
+ loc_va += PAGE_SIZE;
+ loc_pa += PAGE_SIZE;
+ }
+
+ flush_cache_vmap(va, loc_va);
+ break;
+ }
+ case VCM_ONE_TO_ONE:
+ va = chunk->pa;
+ break;
+ default:
+ /* this should never happen */
+ goto fail;
+ }
+
+ va += chunk_size;
+ /* also add res to the allocated chunk list of refs */
+ }
+
+ /* note the reservation */
+ res->physmem_id = physmem;
+
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+fail_eagain:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EAGAIN;
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EINVAL;
+}
+
+
+int vcm_unback(struct res *res)
+{
+ unsigned long flags;
+ struct vcm *vcm;
+ struct physmem *physmem;
+ int ret;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!res)
+ goto fail;
+
+ vcm = res->vcm_id;
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ if (!res->physmem_id) {
+ vcm_err("can't unback a non-backed reservation\n");
+ goto fail;
+ }
+
+ physmem = res->physmem_id;
+ if (!physmem) {
+ vcm_err("physmem is NULL\n");
+ goto fail;
+ }
+
+ if (list_empty(&physmem->alloc_head.allocated)) {
+ vcm_err("physmem allocation is empty\n");
+ goto fail;
+ }
+
+ ret = vcm_no_assoc(res->vcm_id);
+ if (ret == 1) {
+		vcm_err("can't unback an unassociated reservation\n");
+ goto fail;
+ }
+
+ if (ret == -1) {
+ vcm_err("vcm_no_assoc(%p) ret -1\n", (void *) res->vcm_id);
+ goto fail;
+ }
+
+ ret = vcm_all_activated(res->vcm_id);
+ if (ret == 0) {
+ vcm_err("can't unback, not all associations are active\n");
+ goto fail_eagain;
+ }
+
+ if (ret == -1) {
+ vcm_err("vcm_all_activated(%p) ret -1\n", (void *) res->vcm_id);
+ goto fail;
+ }
+
+
+ switch (vcm->type) {
+ case VCM_EXT_KERNEL:
+ if (!res->mapped) {
+ vcm_err("can't unback an unmapped VCM_EXT_KERNEL"
+ " VCM\n");
+ goto fail;
+ }
+
+ /* vunmap free's vm_area */
+ vunmap(res->vm_area->addr);
+ res->vm_area = 0;
+
+ res->mapped = 0;
+ break;
+
+ case VCM_DEVICE:
+ {
+#ifdef CONFIG_SMMU
+ struct phys_chunk *chunk;
+ size_t va = res->aligned_ptr;
+
+ list_for_each_entry(chunk, &physmem->alloc_head.allocated,
+ allocated) {
+ struct vcm *vcm = res->vcm_id;
+ size_t chunk_size =
+ vcm_alloc_idx_to_size(chunk->size_idx);
+ struct avcm *avcm;
+
+ /* un map all */
+ list_for_each_entry(avcm, &vcm->assoc_head, assoc_elm) {
+ ret = smmu_unmap(
+ (struct smmu_dev *) avcm->dev_id,
+ va, chunk_size);
+ if (ret != 0) {
+ vcm_err("smmu_unmap(%p, %p, 0x%x)"
+ " ret %i",
+ (void *) avcm->dev_id,
+ (void *) va,
+ (int) chunk_size, ret);
+ goto fail;
+ /* TODO handle weird inter-unmap state*/
+ }
+ }
+ va += chunk_size;
+			/* may do a light unback, depending on the requested
+			 * functionality
+			 */
+ }
+#else
+ vcm_err("No SMMU support - VCM_DEVICE memory not supported\n");
+ goto fail;
+#endif
+ break;
+ }
+
+ case VCM_ONE_TO_ONE:
+ break;
+ default:
+ /* this should never happen */
+ goto fail;
+ }
+
+ /* clear the reservation */
+ res->physmem_id = 0;
+
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+fail_eagain:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EAGAIN;
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EINVAL;
+}
+
+
+enum memtarget_t vcm_get_memtype_of_res(struct res *res)
+{
+ return VCM_INVALID;
+}
+
+static int vcm_free_max_munch_cont(struct phys_chunk *head)
+{
+ struct phys_chunk *chunk, *tmp;
+
+ if (!head)
+ return -1;
+
+ list_for_each_entry_safe(chunk, tmp, &head->allocated,
+ allocated) {
+ list_del_init(&chunk->allocated);
+ }
+
+ return 0;
+}
+
+static int vcm_alloc_max_munch_cont(size_t start_addr, size_t len,
+ struct phys_chunk *head)
+{
+ /* this function should always succeed, since it
+ parallels a VCM */
+
+ int i, j;
+
+ if (!head) {
+ vcm_err("head is NULL in continuous map.\n");
+ goto fail;
+ }
+
+ if (start_addr < __pa(bootmem_cont)) {
+ vcm_err("phys start addr (%p) < base (%p)\n",
+ (void *) start_addr, (void *) __pa(bootmem_cont));
+ goto fail;
+ }
+
+ if ((start_addr + len) >= (__pa(bootmem_cont) + CONT_SZ)) {
+ vcm_err("requested region (%p + %i) > "
+ " available region (%p + %i)",
+ (void *) start_addr, (int) len,
+ (void *) __pa(bootmem_cont), CONT_SZ);
+ goto fail;
+ }
+
+ i = (start_addr - __pa(bootmem_cont))/SZ_4K;
+
+ for (j = 0; j < ARRAY_SIZE(chunk_sizes); ++j) {
+ while (len/chunk_sizes[j]) {
+ if (!list_empty(&cont_phys_chunk[i].allocated)) {
+ vcm_err("chunk %i ( addr %p) already mapped\n",
+ i, (void *) (start_addr +
+ (i*chunk_sizes[j])));
+ goto fail_free;
+ }
+ list_add_tail(&cont_phys_chunk[i].allocated,
+ &head->allocated);
+ cont_phys_chunk[i].size_idx = j;
+
+ len -= chunk_sizes[j];
+ i += chunk_sizes[j]/SZ_4K;
+ }
+ }
+
+ if (len % SZ_4K) {
+ if (!list_empty(&cont_phys_chunk[i].allocated)) {
+ vcm_err("chunk %i (addr %p) already mapped\n",
+ i, (void *) (start_addr + (i*SZ_4K)));
+ goto fail_free;
+ }
+ len -= SZ_4K;
+ list_add_tail(&cont_phys_chunk[i].allocated,
+ &head->allocated);
+
+ i++;
+ }
+
+ return i;
+
+fail_free:
+ {
+ struct phys_chunk *chunk, *tmp;
+ /* just remove from list, if we're double alloc'ing
+ we don't want to stamp on the other guy */
+ list_for_each_entry_safe(chunk, tmp, &head->allocated,
+ allocated) {
+ list_del(&chunk->allocated);
+ }
+ }
+fail:
+ return 0;
+}
+
+struct physmem *vcm_phys_alloc(enum memtype_t memtype, size_t len,
+ uint32_t attr)
+{
+ unsigned long flags;
+ int ret;
+ struct physmem *physmem = NULL;
+ int blocks_allocated;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ physmem = kzalloc(sizeof(*physmem), GFP_KERNEL);
+ if (!physmem) {
+ vcm_err("physmem is NULL\n");
+ goto fail;
+ }
+
+ physmem->memtype = memtype;
+ physmem->len = len;
+ physmem->attr = attr;
+
+ INIT_LIST_HEAD(&physmem->alloc_head.allocated);
+
+ if (attr & VCM_PHYS_CONT) {
+ if (!cont_vcm_id) {
+ vcm_err("cont_vcm_id is NULL\n");
+ goto fail2;
+ }
+
+ physmem->is_cont = 1;
+
+ /* TODO: get attributes */
+ physmem->res = __vcm_reserve(cont_vcm_id, len, 0);
+ if (physmem->res == 0) {
+ vcm_err("contiguous space allocation failed\n");
+ goto fail2;
+ }
+
+ /* if we're here we know we have memory, create
+ the shadow physmem links*/
+ blocks_allocated =
+ vcm_alloc_max_munch_cont(
+ vcm_get_dev_addr(physmem->res),
+ len,
+ &physmem->alloc_head);
+
+ if (blocks_allocated == 0) {
+ vcm_err("shadow physmem allocation failed\n");
+ goto fail3;
+ }
+ } else {
+ blocks_allocated = vcm_alloc_max_munch(len,
+ &physmem->alloc_head);
+ if (blocks_allocated == 0) {
+ vcm_err("physical allocation failed:"
+ " vcm_alloc_max_munch(%i, %p) ret 0\n",
+ len, &physmem->alloc_head);
+ goto fail2;
+ }
+ }
+
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return physmem;
+
+fail3:
+ ret = __vcm_unreserve(physmem->res);
+ if (ret != 0) {
+ vcm_err("vcm_unreserve(%p) ret %i during cleanup",
+ (void *) physmem->res, ret);
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+ }
+fail2:
+ kfree(physmem);
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+}
+
+
+int vcm_phys_free(struct physmem *physmem)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!physmem) {
+ vcm_err("physmem is NULL\n");
+ goto fail;
+ }
+
+ if (physmem->is_cont) {
+ if (physmem->res == 0) {
+ vcm_err("contiguous reservation is NULL\n");
+ goto fail;
+ }
+
+ ret = vcm_free_max_munch_cont(&physmem->alloc_head);
+ if (ret != 0) {
+ vcm_err("failed to free physical blocks:"
+ " vcm_free_max_munch_cont(%p) ret %i\n",
+ (void *) &physmem->alloc_head, ret);
+ goto fail;
+ }
+
+ ret = __vcm_unreserve(physmem->res);
+ if (ret != 0) {
+ vcm_err("failed to free virtual blocks:"
+ " vcm_unreserve(%p) ret %i\n",
+ (void *) physmem->res, ret);
+ goto fail;
+ }
+
+ } else {
+
+ ret = vcm_alloc_free_blocks(&physmem->alloc_head);
+ if (ret != 0) {
+ vcm_err("failed to free physical blocks:"
+ " vcm_alloc_free_blocks(%p) ret %i\n",
+ (void *) &physmem->alloc_head, ret);
+ goto fail;
+ }
+ }
+
+ memset(physmem, 0, sizeof(*physmem));
+
+ kfree(physmem);
+
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EINVAL;
+}
+
+
+struct avcm *vcm_assoc(struct vcm *vcm, size_t dev_id, uint32_t attr)
+{
+ unsigned long flags;
+ struct avcm *avcm = NULL;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!vcm) {
+ vcm_err("vcm is NULL\n");
+ goto fail;
+ }
+
+ if (!dev_id) {
+ vcm_err("dev_id is NULL\n");
+ goto fail;
+ }
+
+ if (vcm->type == VCM_EXT_KERNEL && !list_empty(&vcm->assoc_head)) {
+		vcm_err("only one device may be associated with a"
+			" VCM_EXT_KERNEL\n");
+ goto fail;
+ }
+
+ avcm = kzalloc(sizeof(*avcm), GFP_KERNEL);
+ if (!avcm) {
+		vcm_err("kzalloc(%zu, GFP_KERNEL) ret NULL\n", sizeof(*avcm));
+ goto fail;
+ }
+
+ avcm->dev_id = dev_id;
+
+ avcm->vcm_id = vcm;
+ avcm->attr = attr;
+ avcm->is_active = 0;
+
+ INIT_LIST_HEAD(&avcm->assoc_elm);
+ list_add(&avcm->assoc_elm, &vcm->assoc_head);
+
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return avcm;
+
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+}
+
+
+int vcm_deassoc(struct avcm *avcm)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!avcm) {
+ vcm_err("avcm is NULL\n");
+ goto fail;
+ }
+
+ if (list_empty(&avcm->assoc_elm)) {
+ vcm_err("nothing to deassociate\n");
+ goto fail;
+ }
+
+ if (avcm->is_active) {
+ vcm_err("association still activated\n");
+ goto fail_busy;
+ }
+
+ list_del(&avcm->assoc_elm);
+
+ memset(avcm, 0, sizeof(*avcm));
+
+ kfree(avcm);
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+fail_busy:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EBUSY;
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EINVAL;
+}
+
+
+int vcm_set_assoc_attr(struct avcm *avcm, uint32_t attr)
+{
+ return 0;
+}
+
+
+uint32_t vcm_get_assoc_attr(struct avcm *avcm)
+{
+ return 0;
+}
+
+
+int vcm_activate(struct avcm *avcm)
+{
+ unsigned long flags;
+ struct vcm *vcm;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!avcm) {
+ vcm_err("avcm is NULL\n");
+ goto fail;
+ }
+
+ vcm = avcm->vcm_id;
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ if (!avcm->dev_id) {
+ vcm_err("cannot activate without a device\n");
+ goto fail_nodev;
+ }
+
+ if (avcm->is_active) {
+ vcm_err("double activate\n");
+ goto fail_busy;
+ }
+
+ if (vcm->type == VCM_DEVICE) {
+#ifdef CONFIG_SMMU
+ int ret = smmu_is_active((struct smmu_dev *) avcm->dev_id);
+ if (ret == -1) {
+ vcm_err("smmu_is_active(%p) ret -1\n",
+ (void *) avcm->dev_id);
+ goto fail_dev;
+ }
+
+ if (ret == 1) {
+ vcm_err("SMMU is already active\n");
+ goto fail_busy;
+ }
+
+ /* TODO, pmem check */
+ ret = smmu_activate((struct smmu_dev *) avcm->dev_id);
+ if (ret != 0) {
+ vcm_err("smmu_activate(%p) ret %i"
+ " SMMU failed to activate\n",
+ (void *) avcm->dev_id, ret);
+ goto fail_dev;
+ }
+#else
+ vcm_err("No SMMU support - cannot activate/deactivate\n");
+ goto fail_nodev;
+#endif
+ }
+
+ avcm->is_active = 1;
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+
+#ifdef CONFIG_SMMU
+fail_dev:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -1;
+#endif
+fail_busy:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EBUSY;
+fail_nodev:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -ENODEV;
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EINVAL;
+}
+
+
+int vcm_deactivate(struct avcm *avcm)
+{
+ unsigned long flags;
+ struct vcm *vcm;
+
+ spin_lock_irqsave(&vcmlock, flags);
+
+ if (!avcm)
+ goto fail;
+
+ vcm = avcm->vcm_id;
+ if (!vcm) {
+ vcm_err("NULL vcm\n");
+ goto fail;
+ }
+
+ if (!avcm->dev_id) {
+ vcm_err("cannot deactivate without a device\n");
+ goto fail;
+ }
+
+ if (!avcm->is_active) {
+ vcm_err("double deactivate\n");
+ goto fail_nobusy;
+ }
+
+ if (vcm->type == VCM_DEVICE) {
+#ifdef CONFIG_SMMU
+ int ret = smmu_is_active((struct smmu_dev *) avcm->dev_id);
+ if (ret == -1) {
+ vcm_err("smmu_is_active(%p) ret %i\n",
+ (void *) avcm->dev_id, ret);
+ goto fail_dev;
+ }
+
+ if (ret == 0) {
+ vcm_err("double SMMU deactivation\n");
+ goto fail_nobusy;
+ }
+
+ /* TODO, pmem check */
+ ret = smmu_deactivate((struct smmu_dev *) avcm->dev_id);
+ if (ret != 0) {
+ vcm_err("smmu_deactivate(%p) ret %i\n",
+ (void *) avcm->dev_id, ret);
+ goto fail_dev;
+ }
+#else
+ vcm_err("No SMMU support - cannot activate/deactivate\n");
+ goto fail;
+#endif
+ }
+
+ avcm->is_active = 0;
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return 0;
+#ifdef CONFIG_SMMU
+fail_dev:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -1;
+#endif
+fail_nobusy:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -ENOENT;
+fail:
+ spin_unlock_irqrestore(&vcmlock, flags);
+ return -EINVAL;
+}
+
+struct bound *vcm_create_bound(struct vcm *vcm, size_t len)
+{
+ return 0;
+}
+
+
+int vcm_free_bound(struct bound *bound)
+{
+ return -1;
+}
+
+
+struct res *vcm_reserve_from_bound(struct bound *bound, size_t len,
+ uint32_t attr)
+{
+ return 0;
+}
+
+
+size_t vcm_get_bound_start_addr(struct bound *bound)
+{
+ return 0;
+}
+
+
+size_t vcm_get_bound_len(struct bound *bound)
+{
+ return 0;
+}
+
+
+struct physmem *vcm_map_phys_addr(size_t phys, size_t len)
+{
+ return 0;
+}
+
+
+size_t vcm_get_next_phys_addr(struct physmem *physmem, size_t phys, size_t *len)
+{
+ return 0;
+}
+
+
+size_t vcm_get_dev_addr(struct res *res)
+{
+ if (!res) {
+ vcm_err("res is NULL\n");
+ return 0;
+ }
+
+ return res->aligned_ptr;
+}
+
+
+struct res *vcm_get_res(size_t dev_addr, struct vcm *vcm)
+{
+ return 0;
+}
+
+
+size_t vcm_translate(size_t src_dev, struct vcm *src_vcm, struct vcm *dst_vcm)
+{
+ return 0;
+}
+
+
+size_t vcm_get_phys_num_res(size_t phys)
+{
+ return 0;
+}
+
+
+struct res *vcm_get_next_phys_res(size_t phys, struct res *res_id, size_t *len)
+{
+ return 0;
+}
+
+
+size_t vcm_get_pgtbl_pa(struct vcm *vcm)
+{
+ return 0;
+}
+
+
+/* No lock needed, smmu_translate has its own lock */
+size_t vcm_dev_addr_to_phys_addr(size_t dev_id, size_t dev_addr)
+{
+#ifdef CONFIG_SMMU
+ int ret;
+ ret = smmu_translate((struct smmu_dev *) dev_id, dev_addr);
+ if (ret == -1)
+ vcm_err("smmu_translate(%p, %p) ret %i\n",
+ (void *) dev_id, (void *) dev_addr, ret);
+
+ return ret;
+#else
+ vcm_err("No support for SMMU - manual translation not supported\n");
+ return -1;
+#endif
+}
+
+
+/* No lock needed, bootmem_cont never changes after */
+size_t vcm_get_cont_memtype_pa(enum memtype_t memtype)
+{
+ if (memtype != VCM_MEMTYPE_0) {
+ vcm_err("memtype != VCM_MEMTYPE_0\n");
+ goto fail;
+ }
+
+ if (!bootmem_cont) {
+ vcm_err("bootmem_cont 0\n");
+ goto fail;
+ }
+
+ return (size_t) __pa(bootmem_cont);
+fail:
+ return 0;
+}
+
+
+/* No lock needed, constant */
+size_t vcm_get_cont_memtype_len(enum memtype_t memtype)
+{
+ if (memtype != VCM_MEMTYPE_0) {
+ vcm_err("memtype != VCM_MEMTYPE_0\n");
+ return 0;
+ }
+
+ return CONT_SZ;
+}
+
+int vcm_hook(size_t dev_id, vcm_handler handler, void *data)
+{
+#ifdef CONFIG_SMMU
+ int ret;
+
+ ret = smmu_hook_irpt((struct smmu_dev *) dev_id, handler, data);
+ if (ret != 0)
+ vcm_err("smmu_hook_irpt(%p, %p, %p) ret %i\n", (void *) dev_id,
+ (void *) handler, (void *) data, ret);
+
+ return ret;
+#else
+ vcm_err("No support for SMMU - interrupts not supported\n");
+ return -1;
+#endif
+}
+
+
+size_t vcm_hw_ver(size_t dev_id)
+{
+ return 0;
+}
+
+
+static int vcm_cont_phys_chunk_init(void)
+{
+ int i;
+ int cont_pa;
+
+ if (!cont_phys_chunk) {
+ vcm_err("cont_phys_chunk 0\n");
+ goto fail;
+ }
+
+ if (!bootmem_cont) {
+ vcm_err("bootmem_cont 0\n");
+ goto fail;
+ }
+
+ cont_pa = (int) __pa(bootmem_cont);
+
+ for (i = 0; i < CONT_SZ/PAGE_SIZE; ++i) {
+		cont_phys_chunk[i].pa = (int) cont_pa;
+		cont_pa += PAGE_SIZE;
+ cont_phys_chunk[i].size_idx = IDX_4K;
+ INIT_LIST_HEAD(&cont_phys_chunk[i].allocated);
+ }
+
+ return 0;
+
+fail:
+ return -1;
+}
+
+
+int vcm_sys_init(void)
+{
+ int ret;
+ printk(KERN_INFO "VCM Initialization\n");
+ if (!bootmem) {
+ vcm_err("bootmem is 0\n");
+ ret = -1;
+ goto fail;
+ }
+
+ if (!bootmem_cont) {
+ vcm_err("bootmem_cont is 0\n");
+ ret = -1;
+ goto fail;
+ }
+
+ ret = vcm_setup_tex_classes();
+ if (ret != 0) {
+ printk(KERN_INFO "Could not determine TEX attribute mapping\n");
+ ret = -1;
+ goto fail;
+ }
+
+
+ ret = vcm_alloc_init(__pa(bootmem));
+ if (ret != 0) {
+ vcm_err("vcm_alloc_init(%p) ret %i\n", (void *) __pa(bootmem),
+ ret);
+ ret = -1;
+ goto fail;
+ }
+
+ cont_phys_chunk = kzalloc(sizeof(*cont_phys_chunk)*(CONT_SZ/PAGE_SIZE),
+ GFP_KERNEL);
+ if (!cont_phys_chunk) {
+		vcm_err("kzalloc(%lu, GFP_KERNEL) ret 0\n",
+ sizeof(*cont_phys_chunk)*(CONT_SZ/PAGE_SIZE));
+ goto fail_free;
+ }
+
+ /* the address and size will hit our special case unless we
+ pass an override */
+ cont_vcm_id = vcm_create_flagged(0, __pa(bootmem_cont), CONT_SZ);
+ if (cont_vcm_id == 0) {
+ vcm_err("vcm_create_flagged(0, %p, %i) ret 0\n",
+ (void *) __pa(bootmem_cont), CONT_SZ);
+ ret = -1;
+ goto fail_free2;
+ }
+
+ ret = vcm_cont_phys_chunk_init();
+ if (ret != 0) {
+ vcm_err("vcm_cont_phys_chunk_init() ret %i\n", ret);
+ goto fail_free3;
+ }
+
+ printk(KERN_INFO "VCM Initialization OK\n");
+ return 0;
+
+fail_free3:
+ ret = __vcm_free(cont_vcm_id);
+ if (ret != 0) {
+ vcm_err("vcm_free(%p) ret %i during failure path\n",
+ (void *) cont_vcm_id, ret);
+ return -1;
+ }
+
+fail_free2:
+ kfree(cont_phys_chunk);
+ cont_phys_chunk = 0;
+
+fail_free:
+ ret = vcm_alloc_destroy();
+ if (ret != 0)
+ vcm_err("vcm_alloc_destroy() ret %i during failure path\n",
+ ret);
+
+ ret = -1;
+fail:
+ return ret;
+}
+
+
+int vcm_sys_destroy(void)
+{
+ int ret = 0;
+
+ if (!cont_phys_chunk) {
+ vcm_err("cont_phys_chunk is 0\n");
+ return -1;
+ }
+
+ if (!cont_vcm_id) {
+ vcm_err("cont_vcm_id is 0\n");
+ return -1;
+ }
+
+ ret = __vcm_free(cont_vcm_id);
+ if (ret != 0) {
+ vcm_err("vcm_free(%p) ret %i\n", (void *) cont_vcm_id, ret);
+ return -1;
+ }
+
+ cont_vcm_id = 0;
+
+ kfree(cont_phys_chunk);
+ cont_phys_chunk = 0;
+
+ ret = vcm_alloc_destroy();
+ if (ret != 0) {
+ vcm_err("vcm_alloc_destroy() ret %i\n", ret);
+ return -1;
+ }
+
+ return ret;
+}
+
+int vcm_init(void)
+{
+ int ret;
+
+ bootmem = __alloc_bootmem(BOOTMEM_SZ, BOOTMEM_ALIGN, 0);
+ if (!bootmem) {
+ vcm_err("segregated block pool alloc failed:"
+ " __alloc_bootmem(%i, %i, 0)\n",
+ BOOTMEM_SZ, BOOTMEM_ALIGN);
+ goto fail;
+ }
+
+ bootmem_cont = __alloc_bootmem(CONT_SZ, CONT_ALIGN, 0);
+ if (!bootmem_cont) {
+ vcm_err("contiguous pool alloc failed:"
+ " __alloc_bootmem(%i, %i, 0)\n",
+ CONT_SZ, CONT_ALIGN);
+ goto fail_free;
+ }
+
+ ret = vcm_sys_init();
+ if (ret != 0) {
+ vcm_err("vcm_sys_init() ret %i\n", ret);
+ goto fail_free2;
+ }
+
+ return 0;
+
+fail_free2:
+ free_bootmem(__pa(bootmem_cont), CONT_SZ);
+fail_free:
+ free_bootmem(__pa(bootmem), BOOTMEM_SZ);
+fail:
+ return -1;
+}
+
+/* Useful for testing, and if VCM is ever unloaded */
+void vcm_exit(void)
+{
+ int ret;
+
+ if (!bootmem_cont) {
+ vcm_err("bootmem_cont is 0\n");
+ goto fail;
+ }
+
+ if (!bootmem) {
+ vcm_err("bootmem is 0\n");
+ goto fail;
+ }
+
+ ret = vcm_sys_destroy();
+ if (ret != 0) {
+ vcm_err("vcm_sys_destroy() ret %i\n", ret);
+ goto fail;
+ }
+
+ free_bootmem(__pa(bootmem_cont), CONT_SZ);
+ free_bootmem(__pa(bootmem), BOOTMEM_SZ);
+fail:
+ return;
+}
+early_initcall(vcm_init);
+module_exit(vcm_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Zach Pfeffer <[email protected]>");
diff --git a/include/linux/vcm.h b/include/linux/vcm.h
new file mode 100644
index 0000000..d2a1cd1
--- /dev/null
+++ b/include/linux/vcm.h
@@ -0,0 +1,701 @@
+/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ * * Neither the name of Code Aurora Forum, Inc. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef _VCM_H_
+#define _VCM_H_
+
+/* All undefined types must be defined using platform specific headers */
+
+#include <linux/vcm_types.h>
+
+/*
+ * Virtual contiguous memory (VCM) region primitives.
+ *
+ * Current memory mapping software uses a CPU-centric management
+ * model. This makes sense in general; average hardware only contains a
+ * CPU MMU and possibly a graphics MMU. If every device in the system
+ * has one or more MMUs a CPU-centric MM programming model breaks down.
+ *
+ * Looking at mapping from a system-wide perspective reveals a general
+ * graph problem. Each node that talks to memory, either through an MMU
+ * or directly (via physical memory) can be thought of as the device end
+ * of a mapping edge. The other edge is the physical memory that is
+ * mapped.
+ *
+ * In the direct mapped case, it is useful to give the device an
+ * MMU. This one-to-one MMU allows direct mapped devices to
+ * participate in graph management; they simply see memory through a
+ * one-to-one mapping.
+ *
+ * The CPU nodes can also be brought under the same mapping
+ * abstraction with the use of a light overlay on the existing
+ * VMM. This light overlay brings the VMM's page table abstraction for
+ * each process and the kernel into the graph management API.
+ *
+ * Taken together this system wide approach provides a capability that
+ * is greater than the sum of its parts by allowing users to reason
+ * about system-wide mapping issues without getting bogged down in
+ * CPU-centric device page table management issues.
+ */
+
+
+/*
+ * Creating, freeing and managing VCMs.
+ *
+ * A VCM region is a virtual space that can be reserved from and
+ * associated with one or more devices. At creation the user can
+ * specify the offset at which addresses start and the length of the
+ * entire VCM region. Reservations out of a VCM region are always
+ * contiguous.
+ */
+
+/**
+ * vcm_create() - Create a VCM region
+ * @start_addr The starting address of the VCM region.
+ * @len The len of the VCM region. This must be at least
+ * vcm_get_min_page_size() bytes.
+ *
+ * A VCM typically abstracts a page table.
+ *
+ * All functions in this API are passed and return opaque things
+ * because the underlying implementations will vary. The goal
+ * is really graph management. vcm_create() creates the "device end"
+ * of an edge in the mapping graph.
+ *
+ * The return value is non-zero if a VCM has successfully been
+ * created. It will return zero if a VCM region cannot be created or
+ * len is invalid.
+ */
+struct vcm *vcm_create(size_t start_addr, size_t len);
+
+
+/**
+ * vcm_create_from_prebuilt() - Create a VCM region from an existing region
+ * @ext_vcm_id An external opaque value that allows the
+ * implementation to reference an already built table.
+ *
+ * The ext_vcm_id will probably reference a page table that's been built
+ * by the VM.
+ *
+ * The platform specific implementation will provide this.
+ *
+ * The return value is non-zero if a VCM has successfully been created.
+ */
+struct vcm *vcm_create_from_prebuilt(size_t ext_vcm_id);
+
+
+/**
+ * vcm_clone() - Clone a VCM
+ * @vcm_id A VCM to clone from.
+ *
+ * Perform a VCM "deep copy." The resulting VCM will match the original at
+ * the point of cloning. Subsequent updates to either VCM will only be
+ * seen by that VCM.
+ *
+ * The return value is non-zero if a VCM has been successfully cloned.
+ */
+struct vcm *vcm_clone(struct vcm *vcm_id);
+
+
+/**
+ * vcm_get_start_addr() - Get the starting address of the VCM region.
+ * @vcm_id The VCM we're interested in getting the starting address of.
+ *
+ * The return value will be 1 if an error has occurred.
+ */
+size_t vcm_get_start_addr(struct vcm *vcm_id);
+
+
+/**
+ * vcm_get_len() - Get the length of the VCM region.
+ * @vcm_id The VCM we're interested in reading the length from.
+ *
+ * The return value will be non-zero for a valid VCM. VCM regions
+ * cannot have 0 len.
+ */
+size_t vcm_get_len(struct vcm *vcm_id);
+
+
+/**
+ * vcm_free() - Free a VCM.
+ * @vcm_id The VCM we're interested in freeing.
+ *
+ * The return value is 0 if the VCM has been freed or:
+ * -EBUSY The VCM region contains reservations or has been associated
+ * (active or not) and cannot be freed.
+ * -EINVAL The vcm argument is invalid.
+ */
+int vcm_free(struct vcm *vcm_id);
+
+
+/*
+ * Creating, freeing and managing reservations out of a VCM.
+ *
+ */
+
+/**
+ * vcm_reserve() - Create a reservation from a VCM region.
+ * @vcm_id The VCM region to reserve from.
+ * @len The length of the reservation. Must be at least
+ * vcm_get_min_page_size() bytes.
+ * @attr See 'Reservation Attributes'.
+ *
+ * A reservation, res_t, is a contiguous range from a VCM region.
+ *
+ * The return value is non-zero if a reservation has been successfully
+ * created. It is 0 if any of the parameters are invalid.
+ */
+struct res *vcm_reserve(struct vcm *vcm_id, size_t len, uint32_t attr);
+
+
+/**
+ * vcm_reserve_at() - Make a reservation at a given logical location.
+ * @memtarget A logical location to start the reservation from.
+ * @vcm_id The VCM region to start the reservation from.
+ * @len The length of the reservation.
+ * @attr See 'Reservation Attributes'.
+ *
+ * The return value is non-zero if a reservation has been successfully
+ * created.
+ */
+struct res *vcm_reserve_at(enum memtarget_t memtarget, struct vcm *vcm_id,
+ size_t len, uint32_t attr);
+
+
+/**
+ * vcm_get_vcm_from_res() - Return the VCM region of a reservation.
+ * @res_id The reservation to return the VCM region of.
+ *
+ * The return value will be non-zero if the reservation is valid. A valid
+ * reservation is always associated with a VCM region; there is no such
+ * thing as an orphan reservation.
+ */
+struct vcm *vcm_get_vcm_from_res(struct res *res_id);
+
+
+/**
+ * vcm_unreserve() - Unreserve the reservation.
+ * @res_id The reservation to unreserve.
+ *
+ * The return value will be 0 if the reservation was successfully
+ * unreserved and:
+ * -EBUSY The reservation is still backed,
+ * -EINVAL The vcm argument is invalid.
+ */
+int vcm_unreserve(struct res *res_id);
+
+
+/**
+ * vcm_get_res_len() - Return a reservation's contiguous length.
+ * @res_id The reservation of interest.
+ *
+ * The return value will be 0 if res is invalid; reservations cannot
+ * have 0 length so there's no error return value ambiguity.
+ */
+size_t vcm_get_res_len(struct res *res_id);
+
+
+/**
+ * vcm_set_res_attr() - Set attributes of an existing reservation.
+ * @res_id An existing reservation of interest.
+ * @attr See 'Reservation Attributes'.
+ *
+ * This function can only be used on an existing reservation; there
+ * are no orphan reservations. All attributes can be set on an existing
+ * reservation.
+ *
+ * The return value will be 0 for a success, otherwise it will be:
+ * -EINVAL res or attr are invalid.
+ */
+int vcm_set_res_attr(struct res *res_id, uint32_t attr);
+
+
+/**
+ * vcm_get_res_attr() - Return a reservation's attributes.
+ * @res_id An existing reservation of interest.
+ *
+ * The return value will be 0 if res is invalid.
+ */
+uint32_t vcm_get_res_attr(struct res *res_id);
+
+
+/**
+ * vcm_get_num_res() - Return the number of reservations in a VCM region.
+ * @vcm_id The VCM region of interest.
+ */
+size_t vcm_get_num_res(struct vcm *vcm_id);
+
+
+/**
+ * vcm_get_next_res() - Read each reservation one at a time.
+ * @vcm_id The VCM region of interest.
+ * @res_id Contains the last reservation. Pass NULL on the first call.
+ *
+ * This function works like a foreach reservation in a VCM region.
+ *
+ * The return value will be non-zero for each reservation in a VCM. A
+ * zero indicates no further reservations.
+ */
+struct res *vcm_get_next_res(struct vcm *vcm_id, struct res *res_id);
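+
+/*
+ * A minimal iteration sketch (illustrative only; vcm is an existing VCM
+ * and use() stands in for any per-reservation work):
+ *
+ *	struct res *res = NULL;
+ *
+ *	while ((res = vcm_get_next_res(vcm, res)))
+ *		use(res);
+ */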
+
+
+/**
+ * vcm_res_copy() - Copy len bytes from one reservation to another.
+ * @to	The reservation to copy to.
+ * @to_off	The offset into the destination reservation.
+ * @from	The reservation to copy from.
+ * @from_off	The offset into the source reservation.
+ * @len	The number of bytes to copy.
+ *
+ * The return value is the number of bytes copied.
+ */
+size_t vcm_res_copy(struct res *to, size_t to_off, struct res *from,
+		    size_t from_off, size_t len);
+
+
+/**
+ * vcm_get_min_page_size() - Return the minimum page size supported by
+ * the architecture.
+ */
+size_t vcm_get_min_page_size(void);
+
+
+/**
+ * vcm_back() - Physically back a reservation.
+ * @res_id The reservation containing the virtual contiguous region to
+ * back.
+ * @physmem_id The physical memory that will back the virtual contiguous
+ * memory region.
+ *
+ * One VCM can be associated with multiple devices. When you call
+ * vcm_back() each association must be active. This requirement is not
+ * strictly necessary and may be relaxed in the future.
+ *
+ * This function returns 0 on a successful physical backing. Otherwise
+ * it returns:
+ * -EINVAL res or physmem is invalid or res's len
+ * is different from physmem's len.
+ * -EAGAIN try again, one of the devices hasn't been activated.
+ */
+int vcm_back(struct res *res_id, struct physmem *physmem_id);
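+
+/*
+ * An end-to-end sketch of the intended call flow (illustrative only;
+ * dev_id, attr, the addresses and the sizes are all hypothetical):
+ *
+ *	struct vcm *vcm = vcm_create(0x1000, SZ_16M);
+ *	struct avcm *avcm = vcm_assoc(vcm, dev_id, attr);
+ *	struct res *res = vcm_reserve(vcm, SZ_64K, VCM_READ | VCM_WRITE);
+ *	struct physmem *pm = vcm_phys_alloc(VCM_MEMTYPE_0, SZ_64K, VCM_4KB);
+ *
+ *	vcm_activate(avcm);
+ *	vcm_back(res, pm);
+ */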
+
+
+/**
+ * vcm_unback() - Unback a reservation.
+ * @res_id The reservation to unback.
+ *
+ * One VCM can be associated with multiple devices. When you call
+ * vcm_unback() each association must be active.
+ *
+ * This function returns 0 on a successful unbacking. Otherwise
+ * it returns:
+ * -EINVAL res is invalid.
+ * -EAGAIN try again, one of the devices hasn't been activated.
+ */
+int vcm_unback(struct res *res_id);
+
+
+/**
+ * vcm_phys_alloc() - Allocate physical memory for the VCM region.
+ * @memtype The memory type to allocate.
+ * @len The length of the allocation.
+ * @attr See 'Physical Allocation Attributes'.
+ *
+ * This function will allocate chunks of memory according to the attr
+ * it is passed.
+ *
+ * The return value is non-zero if physical memory has been
+ * successfully allocated.
+ */
+struct physmem *vcm_phys_alloc(enum memtype_t memtype, size_t len,
+ uint32_t attr);
+
+
+/**
+ * vcm_phys_free() - Free a physical allocation.
+ * @physmem_id The physical allocation to free.
+ *
+ * The return value is 0 if the physical allocation has been freed or:
+ * -EBUSY	There are reservations mapping the physical memory.
+ * -EINVAL The physmem argument is invalid.
+ */
+int vcm_phys_free(struct physmem *physmem_id);
+
+
+/**
+ * vcm_get_physmem_from_res() - Return a reservation's physmem_id
+ * @res_id	An existing reservation of interest.
+ *
+ * The return value will be non-zero on success, otherwise it will be:
+ * -EINVAL res is invalid
+ * -ENOMEM res is unbacked
+ */
+struct physmem *vcm_get_physmem_from_res(struct res *res_id);
+
+
+/**
+ * vcm_get_memtype_of_physalloc() - Return the memtype of an allocation.
+ * @physmem_id	The physical allocation of interest.
+ *
+ * This function returns the memtype of a physical allocation or
+ * VCM_INVALID if physmem is invalid.
+ */
+enum memtype_t vcm_get_memtype_of_physalloc(struct physmem *physmem_id);
+
+
+/*
+ * Associate a VCM with a device, activate that association and remove it.
+ *
+ */
+
+/**
+ * vcm_assoc() - Associate a VCM with a device.
+ * @vcm_id The VCM region of interest.
+ * @dev_id The device to associate the VCM with.
+ * @attr See 'Association Attributes'.
+ *
+ * This function returns non-zero if an association is made. It returns 0
+ * if any of its parameters are invalid or VCM_ATTR_VALID is not present.
+ */
+struct avcm *vcm_assoc(struct vcm *vcm_id, size_t dev_id, uint32_t attr);
+
+
+/**
+ * vcm_deassoc() - Deassociate a VCM from a device.
+ * @avcm_id The association we want to break.
+ *
+ * The function returns 0 on success or:
+ * -EBUSY The association is currently activated.
+ * -EINVAL The avcm parameter is invalid.
+ */
+int vcm_deassoc(struct avcm *avcm_id);
+
+
+/**
+ * vcm_set_assoc_attr() - Set an AVCM's attributes.
+ * @avcm_id The AVCM of interest.
+ * @attr The new attr. See 'Association Attributes'.
+ *
+ * Every attribute can be set at runtime if an association isn't activated.
+ *
+ * This function returns 0 on success or:
+ * -EBUSY The association is currently activated.
+ * -EINVAL The avcm parameter is invalid.
+ */
+int vcm_set_assoc_attr(struct avcm *avcm_id, uint32_t attr);
+
+
+/**
+ * vcm_get_assoc_attr() - Return an AVCM's attributes.
+ * @avcm_id The AVCM of interest.
+ *
+ * This function returns 0 on error.
+ */
+uint32_t vcm_get_assoc_attr(struct avcm *avcm_id);
+
+
+/**
+ * vcm_activate() - Activate an AVCM.
+ * @avcm_id The AVCM to activate.
+ *
+ * An active AVCM must be deactivated before it can be activated again.
+ *
+ * This function returns 0 on success or:
+ * -EINVAL avcm is invalid
+ * -ENODEV no device
+ * -EBUSY device is already active
+ * -1 hardware failure
+ */
+int vcm_activate(struct avcm *avcm_id);
+
+
+/**
+ * vcm_deactivate() - Deactivate an association.
+ * @avcm_id The AVCM to deactivate.
+ *
+ * This function returns 0 on success or:
+ * -ENOENT	avcm is not active
+ * -EINVAL avcm is invalid
+ * -1 hardware failure
+ */
+int vcm_deactivate(struct avcm *avcm_id);
+
+
+/**
+ * vcm_is_active() - Query if an AVCM is active.
+ * @avcm_id The AVCM of interest.
+ *
+ * Returns 0 for not active, 1 for active or -EINVAL on error.
+ *
+ */
+int vcm_is_active(struct avcm *avcm_id);
+
+
+
+/*
+ * Create, manage and remove a boundary in a VCM.
+ */
+
+/**
+ * vcm_create_bound() - Create a bound in a VCM.
+ * @vcm_id The VCM that needs a bound.
+ * @len The len of the bound.
+ *
+ * The allocator picks the virtual addresses of the bound.
+ *
+ * This function returns non-zero if a bound was created.
+ */
+struct bound *vcm_create_bound(struct vcm *vcm_id, size_t len);
+
+
+/**
+ * vcm_free_bound() - Free a bound.
+ * @bound_id The bound to remove.
+ *
+ * This function returns 0 if bound has been removed or:
+ * -EBUSY The bound contains reservations and cannot be removed.
+ * -EINVAL The bound is invalid.
+ */
+int vcm_free_bound(struct bound *bound_id);
+
+
+/**
+ * vcm_reserve_from_bound() - Make a reservation from a bounded area.
+ * @bound_id The bound to reserve from.
+ * @len The len of the reservation.
+ * @attr See 'Reservation Attributes'.
+ *
+ * The return value is non-zero on success. It is 0 if any parameter
+ * is invalid.
+ */
+struct res *vcm_reserve_from_bound(struct bound *bound_id, size_t len,
+ uint32_t attr);
+
+
+/**
+ * vcm_get_bound_start_addr() - Return the starting device address of the bound.
+ * @bound_id The bound of interest.
+ *
+ * On success this function returns the starting address of the bound. On error
+ * it returns:
+ * 1 bound_id is invalid.
+ */
+size_t vcm_get_bound_start_addr(struct bound *bound_id);
+
+
+/**
+ * vcm_get_bound_len() - Return the len of a bound.
+ * @bound_id The bound of interest.
+ *
+ * This function returns non-zero on success, 0 on failure.
+ */
+size_t vcm_get_bound_len(struct bound *bound_id);
+
+
+
+/*
+ * Perform low-level control over VCM regions and reservations.
+ */
+
+/**
+ * vcm_map_phys_addr() - Produce a physmem_id from a contiguous
+ * physical address
+ *
+ * @phys The physical address of the contiguous range.
+ * @len The len of the contiguous address range.
+ *
+ * Returns non-zero on success, 0 on failure.
+ */
+struct physmem *vcm_map_phys_addr(size_t phys, size_t len);
+
+
+/**
+ * vcm_get_next_phys_addr() - Get the next physical addr and len of a
+ * physmem_id.
+ * @physmem_id	The physmem_id of interest.
+ * @phys The current physical address. Set this to NULL to start the
+ * iteration.
+ * @len An output: the len of the next physical segment.
+ *
+ * physmem_id's may contain physically discontiguous sections. This
+ * function returns the next physical address and len. Pass NULL to
+ * phys to get the first physical address. The len of the physical
+ * segment is returned in *len.
+ *
+ * Returns 0 if there is no next physical address.
+ */
+size_t vcm_get_next_phys_addr(struct physmem *physmem_id, size_t phys,
+ size_t *len);
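+
+/*
+ * A minimal iteration sketch (illustrative only; pm is a physmem from
+ * vcm_phys_alloc() and use() stands in for any per-segment work):
+ *
+ *	size_t len;
+ *	size_t pa = vcm_get_next_phys_addr(pm, (size_t) NULL, &len);
+ *
+ *	while (pa) {
+ *		use(pa, len);
+ *		pa = vcm_get_next_phys_addr(pm, pa, &len);
+ *	}
+ */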
+
+
+/**
+ * vcm_get_dev_addr() - Return the device address of a reservation.
+ * @res_id The reservation of interest.
+ *
+ *
+ * On success this function returns the device address of a reservation. On
+ * error it returns:
+ * 1 res_id is invalid.
+ *
+ * Note: This may return a kernel address if the reservation was
+ * created from vcm_create_from_prebuilt() and the prebuilt ext_vcm_id
+ * references a VM page table.
+ */
+size_t vcm_get_dev_addr(struct res *res_id);
+
+
+/**
+ * vcm_get_res() - Return the reservation from a device address and a VCM
+ * @dev_addr The device address of interest.
+ * @vcm_id The VCM that contains the reservation
+ *
+ * This function returns 0 if there is no reservation whose device
+ * address is dev_addr.
+ */
+struct res *vcm_get_res(size_t dev_addr, struct vcm *vcm_id);
+
+
+/**
+ * vcm_translate() - Translate from one device address to another.
+ * @src_dev_id The source device address.
+ * @src_vcm_id The source VCM region.
+ * @dst_vcm_id The destination VCM region.
+ *
+ * Derive the device address from a VCM region that maps the same physical
+ * memory as a device address from another VCM region.
+ *
+ * On success this function returns the device address of a translation. On
+ * error it returns:
+ * 1	src_dev_id, src_vcm_id or dst_vcm_id is invalid.
+ */
+size_t vcm_translate(size_t src_dev_id, struct vcm *src_vcm_id,
+ struct vcm *dst_vcm_id);
+
+
+/**
+ * vcm_get_phys_num_res() - Return the number of reservations mapping a
+ * physical address.
+ * @phys The physical address to read.
+ */
+size_t vcm_get_phys_num_res(size_t phys);
+
+
+/**
+ * vcm_get_next_phys_res() - Return the next reservation mapped to a physical
+ * address.
+ * @phys The physical address to map.
+ * @res_id The starting reservation. Set this to NULL for the first
+ * reservation.
+ * @len The virtual length of the reservation
+ *
+ * This function returns 0 for the last reservation or no reservation.
+ */
+struct res *vcm_get_next_phys_res(size_t phys, struct res *res_id, size_t *len);
+
+
+/**
+ * vcm_get_pgtbl_pa() - Return the physical address of a VCM's page table.
+ * @vcm_id The VCM region of interest.
+ *
+ * This function returns non-zero on success.
+ */
+size_t vcm_get_pgtbl_pa(struct vcm *vcm_id);
+
+
+/**
+ * vcm_get_cont_memtype_pa() - Return the phys base addr of a memtype's
+ * first contiguous region.
+ * @memtype The memtype of interest.
+ *
+ * This function returns non-zero on success. A zero return indicates that
+ * the given memtype does not have a contiguous region or that the memtype
+ * is invalid.
+ */
+size_t vcm_get_cont_memtype_pa(enum memtype_t memtype);
+
+
+/**
+ * vcm_get_cont_memtype_len() - Return the len of a memtype's
+ * first contiguous region.
+ * @memtype The memtype of interest.
+ *
+ * This function returns non-zero on success. A zero return indicates that
+ * the given memtype does not have a contiguous region or that the memtype
+ * is invalid.
+ */
+size_t vcm_get_cont_memtype_len(enum memtype_t memtype);
+
+
+/**
+ * vcm_dev_addr_to_phys_addr() - Perform a device address page-table lookup.
+ * @dev_id The device that has the table.
+ * @dev_addr The device address to map.
+ *
+ * This function returns the physical address that dev_addr maps to in
+ * the device's page table. It will fault if dev_addr is not mapped.
+ */
+size_t vcm_dev_addr_to_phys_addr(size_t dev_id, size_t dev_addr);
+
+
+/*
+ * Fault Hooks
+ *
+ * vcm_hook()
+ */
+
+/**
+ * vcm_hook() - Add a fault handler.
+ * @dev_id The device.
+ * @handler The handler.
+ * @data A private piece of data that will get passed to the handler.
+ *
+ * This function returns 0 for a successful registration or:
+ * -EINVAL The arguments are invalid.
+ */
+int vcm_hook(size_t dev_id, vcm_handler handler, void *data);
+
+
+
+/*
+ * Low level, platform agnostic, HW control.
+ *
+ * vcm_hw_ver()
+ */
+
+/**
+ * vcm_hw_ver() - Return the hardware version of a device, if it has one.
+ * @dev_id The device.
+ */
+size_t vcm_hw_ver(size_t dev_id);
+
+
+
+/* bring-up init, destroy */
+int vcm_sys_init(void);
+int vcm_sys_destroy(void);
+
+#endif /* _VCM_H_ */
+
diff --git a/include/linux/vcm_types.h b/include/linux/vcm_types.h
new file mode 100644
index 0000000..2cc4770
--- /dev/null
+++ b/include/linux/vcm_types.h
@@ -0,0 +1,318 @@
+/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ * * Neither the name of Code Aurora Forum, Inc. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef VCM_TYPES_H
+#define VCM_TYPES_H
+
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/spinlock.h>
+#include <linux/genalloc.h>
+#include <linux/vcm_alloc.h>
+#include <linux/list.h>
+
+/*
+ * Reservation Attributes
+ *
+ * Used in vcm_reserve(), vcm_reserve_at(), vcm_set_res_attr() and
+ * vcm_reserve_from_bound().
+ *
+ * VCM_READ Specifies that the reservation can be read.
+ * VCM_WRITE Specifies that the reservation can be written.
+ * VCM_EXECUTE Specifies that the reservation can be executed.
+ * VCM_USER Specifies that this reservation is used for
+ * userspace access.
+ * VCM_SUPERVISOR Specifies that this reservation is used for
+ * supervisor access.
+ * VCM_SECURE Specifies that the target of the reservation is
+ * secure. The usage of this setting is TBD.
+ *
+ * Caching behavior as a 4 bit field:
+ * VCM_NOTCACHED The VCM region is not cached.
+ * VCM_INNER_WB_WA The VCM region is inner cached
+ * and is write-back and write-allocate.
+ * VCM_INNER_WT_NWA The VCM region is inner cached and is
+ * write-through and no-write-allocate.
+ * VCM_INNER_WB_NWA The VCM region is inner cached and is
+ * write-back and no-write-allocate.
+ * VCM_OUTER_WB_WA The VCM region is outer cached and is
+ * write-back and write-allocate.
+ * VCM_OUTER_WT_NWA The VCM region is outer cached and is
+ * write-through and no-write-allocate.
+ * VCM_OUTER_WB_NWA The VCM region is outer cached and is
+ * write-back and no-write-allocate.
+ * VCM_WB_WA The VCM region is cached and is write
+ * -back and write-allocate.
+ * VCM_WT_NWA The VCM region is cached and is write
+ * -through and no-write-allocate.
+ * VCM_WB_NWA The VCM region is cached and is write
+ * -back and no-write-allocate.
+ */
+
+#define VCM_CACHE_POLICY (0xF << 0)
+
+#define VCM_READ (1UL << 9)
+#define VCM_WRITE (1UL << 8)
+#define VCM_EXECUTE (1UL << 7)
+#define VCM_USER (1UL << 6)
+#define VCM_SUPERVISOR (1UL << 5)
+#define VCM_SECURE (1UL << 4)
+#define VCM_NOTCACHED (0UL << 0)
+#define VCM_WB_WA (1UL << 0)
+#define VCM_WB_NWA (2UL << 0)
+#define VCM_WT (3UL << 0)
+
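+/*
+ * An example attribute composition (illustrative): a supervisor-mode,
+ * read/write, write-back/write-allocate cached reservation:
+ *
+ *	uint32_t attr = VCM_SUPERVISOR | VCM_READ | VCM_WRITE | VCM_WB_WA;
+ */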
+
+/*
+ * Physical Allocation Attributes
+ *
+ * Used in vcm_phys_alloc().
+ *
+ * Alignment as a power of 2 starting at 4 KB. 5 bit field.
+ * 0 = 4KB, 1 = 8KB, etc. (see ALIGN_REQ_BYTES() below).
+ *
+ * Specifies that the reservation should have the
+ * alignment specified.
+ *
+ * VCM_4KB Specifies that the reservation should use 4KB pages.
+ * VCM_64KB Specifies that the reservation should use 64KB pages.
+ * VCM_1MB specifies that the reservation should use 1MB pages.
+ * VCM_ALL Specifies that the reservation should use all
+ * available page sizes.
+ * VCM_PHYS_CONT Specifies that a reservation should be backed with
+ * physically contiguous memory.
+ * VCM_COHERENT Specifies that the reservation must be kept coherent
+ * because it's shared.
+ */
+
+#define VCM_ALIGNMENT_MASK (0x1FUL << 6) /* 5-bit field */
+#define VCM_4KB (1UL << 5)
+#define VCM_64KB (1UL << 4)
+#define VCM_1MB (1UL << 3)
+#define VCM_ALL (1UL << 2)
+#define VCM_PAGE_SEL_MASK (0xFUL << 2)
+#define VCM_PHYS_CONT (1UL << 1)
+#define VCM_COHERENT (1UL << 0)
+
+
+#define SHIFT_4KB (12)
+
+#define ALIGN_REQ_BYTES(attr) (1UL << (((attr & VCM_ALIGNMENT_MASK) >> 6) + 12))
+/* set the alignment in pow 2, 0 = 4KB */
+#define SET_ALIGN_REQ_BYTES(attr, align) \
+ ((attr & ~VCM_ALIGNMENT_MASK) | ((align << 6) & VCM_ALIGNMENT_MASK))
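+/* e.g. (illustrative) SET_ALIGN_REQ_BYTES(attr, 8) encodes an alignment
+ * request of 1UL << (8 + 12) bytes, i.e. 1 MB. */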
+
+/*
+ * Association Attributes
+ *
+ * Used in vcm_assoc(), vcm_set_assoc_attr().
+ *
+ * VCM_USE_LOW_BASE Use the low base register.
+ * VCM_USE_HIGH_BASE Use the high base register.
+ *
+ * VCM_SPLIT A 5 bit field that defines the
+ * high/low split. This value defines
+ * the number of 0's left-filled into the
+ * split register. Addresses that match
+ * this will use VCM_USE_LOW_BASE
+ * otherwise they'll use
+ * VCM_USE_HIGH_BASE. An all 0's value
+ * directs all translations to
+ * VCM_USE_LOW_BASE.
+ */
+
+#define VCM_SPLIT (1UL << 3)
+#define VCM_USE_LOW_BASE (1UL << 2)
+#define VCM_USE_HIGH_BASE (1UL << 1)
+
+
+/*
+ * External VCMs
+ *
+ * Used in vcm_create_from_prebuilt()
+ *
+ * Externally created VCM IDs for creating kernel and user space
+ * mappings to VCMs and kernel and user space buffers out of
+ * VCM_MEMTYPE_0,1,2, etc.
+ *
+ */
+#define VCM_PREBUILT_KERNEL 1
+#define VCM_PREBUILT_USER 2
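+
+/* e.g. (illustrative) vcm_create_from_prebuilt(VCM_PREBUILT_KERNEL)
+ * yields a VCM that wraps the kernel's existing page table. */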
+
+/**
+ * enum memtarget_t - A logical location in a VCM.
+ *
+ * VCM_START Indicates the start of a VCM_REGION.
+ */
+enum memtarget_t {
+ VCM_START
+};
+
+
+/**
+ * enum memtype_t - A platform-specific memory arrangement.
+ *
+ * VCM_MEMTYPE_0 Generic memory type 0
+ * VCM_MEMTYPE_1 Generic memory type 1
+ * VCM_MEMTYPE_2 Generic memory type 2
+ *
+ * A memtype encapsulates a platform specific memory arrangement. The
+ * memtype needn't refer to a single type of memory, it can refer to a
+ * set of memories that can back a reservation.
+ *
+ */
+enum memtype_t {
+ VCM_INVALID,
+ VCM_MEMTYPE_0,
+ VCM_MEMTYPE_1,
+ VCM_MEMTYPE_2,
+};
+
+
+/**
+ * vcm_handler - The signature of the fault hook.
+ * @dev_id The device id of the faulting device.
+ * @data The generic data pointer.
+ * @fault_data System specific common fault data.
+ *
+ * The handler should return 0 for success. This indicates that the
+ * fault was handled. A non-zero return value is an error and will be
+ * propagated up the stack.
+ */
+typedef int (*vcm_handler)(size_t dev_id, void *data, void *fault_data);
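+
+/*
+ * A minimal handler sketch (illustrative only; all names are
+ * hypothetical):
+ *
+ *	static int demo_handler(size_t dev_id, void *data, void *fault_data)
+ *	{
+ *		return 0;	(the fault was handled)
+ *	}
+ *
+ * registered with vcm_hook(dev_id, demo_handler, NULL).
+ */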
+
+
+enum vcm_type {
+ VCM_DEVICE,
+ VCM_EXT_KERNEL,
+ VCM_EXT_USER,
+ VCM_ONE_TO_ONE,
+};
+
+
+/**
+ * vcm - A Virtual Contiguous Memory region.
+ * @start_addr The starting address of the VCM region.
+ * @len The len of the VCM region. This must be at least
+ * vcm_get_min_page_size() bytes.
+ */
+struct vcm {
+ enum vcm_type type;
+
+ size_t start_addr;
+ size_t len;
+
+ size_t dev_id; /* opaque device control */
+
+ /* allocator dependent */
+ struct gen_pool *pool;
+
+ struct list_head res_head;
+
+ /* this will be a very short list */
+ struct list_head assoc_head;
+};
+
+/**
+ * avcm - A VCM to device association
+ * @vcm The VCM region of interest.
+ * @dev_id The device to associate the VCM with.
+ * @attr See 'Association Attributes'.
+ */
+struct avcm {
+ struct vcm *vcm_id;
+ size_t dev_id;
+ uint32_t attr;
+
+ struct list_head assoc_elm;
+
+ int is_active; /* is this particular association active */
+};
+
+/**
+ * bound - A boundary to reserve from in a VCM region.
+ * @vcm The VCM that needs a bound.
+ * @len The len of the bound.
+ */
+struct bound {
+ struct vcm *vcm_id;
+ size_t len;
+};
+
+
+/**
+ * physmem - A physical memory allocation.
+ * @memtype	The memory type of the allocation.
+ * @len The len of the physical memory allocation.
+ * @attr See 'Physical Allocation Attributes'.
+ *
+ */
+
+struct physmem {
+ enum memtype_t memtype;
+ size_t len;
+ uint32_t attr;
+
+ struct phys_chunk alloc_head;
+
+ /* if the physmem is cont then use the built in VCM */
+ int is_cont;
+ struct res *res;
+};
+
+/**
+ * res - A reservation in a VCM region.
+ * @vcm The VCM region to reserve from.
+ * @len	The length of the reservation. Must be at least
+ *	vcm_get_min_page_size() bytes.
+ * @attr See 'Reservation Attributes'.
+ */
+struct res {
+ struct vcm *vcm_id;
+ struct physmem *physmem_id;
+ size_t len;
+ uint32_t attr;
+
+ /* allocator dependent */
+ size_t alignment_req;
+ size_t aligned_len;
+ unsigned long ptr;
+ size_t aligned_ptr;
+
+ struct list_head res_elm;
+
+
+ /* type VCM_EXT_KERNEL */
+ struct vm_struct *vm_area;
+ int mapped;
+};
+
+extern int chunk_sizes[NUM_CHUNK_SIZES];
+
+#endif /* VCM_TYPES_H */
--
1.7.0.2
The Virtual Contiguous Memory Manager (VCMM) needs a physical pool to
allocate from. It breaks up the pool into sub-pools of same-sized
chunks. In particular, it breaks the pool it manages into sub-pools of
1 MB, 64 KB and 4 KB chunks.
When a user makes a request, this allocator satisfies that request
from the sub-pools using a "maximum-munch" strategy. This strategy
attempts to satisfy a request using the largest chunk-size without
over-allocating, then moving on to the next smallest size without
over-allocating and finally completing the request with the smallest
sized chunk, over-allocating if necessary.
The maximum-munch strategy lets devices with small TLBs map a given
range using the minimum number of mappings.
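For example (the arithmetic follows directly from the chunk sizes): a
1,116 KB request is satisfied with one 1 MB chunk (92 KB left), one
64 KB chunk (28 KB left) and seven 4 KB chunks, for 9 mappings in total
instead of the 279 that a 4 KB-only allocator would need.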
Although the allocator has been configured for 1 MB, 64 KB and 4 KB
chunks, it can be easily extended to other chunk sizes.
Signed-off-by: Zach Pfeffer <[email protected]>
---
arch/arm/mm/vcm_alloc.c | 425 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/vcm_alloc.h | 70 ++++++++
2 files changed, 495 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/mm/vcm_alloc.c
create mode 100644 include/linux/vcm_alloc.h
diff --git a/arch/arm/mm/vcm_alloc.c b/arch/arm/mm/vcm_alloc.c
new file mode 100644
index 0000000..e592e71
--- /dev/null
+++ b/arch/arm/mm/vcm_alloc.c
@@ -0,0 +1,425 @@
+/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/vcm_alloc.h>
+#include <linux/string.h>
+#include <asm/sizes.h>
+
+/* Amount of memory managed by VCM */
+#define TOTAL_MEM_SIZE SZ_32M
+
+static unsigned int base_pa = 0x80000000;
+int basicalloc_init;
+
+int chunk_sizes[NUM_CHUNK_SIZES] = {SZ_1M, SZ_64K, SZ_4K};
+int init_num_chunks[] = {
+ (TOTAL_MEM_SIZE/2) / SZ_1M,
+ (TOTAL_MEM_SIZE/4) / SZ_64K,
+ (TOTAL_MEM_SIZE/4) / SZ_4K
+};
+#define LAST_SZ() (ARRAY_SIZE(chunk_sizes) - 1)
+
+#define vcm_alloc_err(a, ...) \
+ pr_err("ERROR %s %i " a, __func__, __LINE__, ##__VA_ARGS__)
+
+struct phys_chunk_head {
+ struct list_head head;
+ int num;
+};
+
+struct phys_mem {
+ struct phys_chunk_head heads[ARRAY_SIZE(chunk_sizes)];
+} phys_mem;
+
+static int is_allocated(struct list_head *allocated)
+{
+ /* This should not happen under normal conditions */
+ if (!allocated) {
+ vcm_alloc_err("no allocated\n");
+ return 0;
+ }
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return 0;
+ }
+ return !list_empty(allocated);
+}
+
+static int count_allocated_size(enum chunk_size_idx idx)
+{
+ int cnt = 0;
+ struct phys_chunk *chunk, *tmp;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return 0;
+ }
+
+ list_for_each_entry_safe(chunk, tmp,
+ &phys_mem.heads[idx].head, list) {
+ if (is_allocated(&chunk->allocated))
+ cnt++;
+ }
+
+ return cnt;
+}
+
+
+int vcm_alloc_get_mem_size(void)
+{
+ return TOTAL_MEM_SIZE;
+}
+EXPORT_SYMBOL(vcm_alloc_get_mem_size);
+
+
+int vcm_alloc_blocks_avail(enum chunk_size_idx idx)
+{
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return 0;
+ }
+
+ return phys_mem.heads[idx].num;
+}
+EXPORT_SYMBOL(vcm_alloc_blocks_avail);
+
+
+int vcm_alloc_get_num_chunks(void)
+{
+ return ARRAY_SIZE(chunk_sizes);
+}
+EXPORT_SYMBOL(vcm_alloc_get_num_chunks);
+
+
+int vcm_alloc_all_blocks_avail(void)
+{
+ int i;
+ int cnt = 0;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return 0;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i)
+ cnt += vcm_alloc_blocks_avail(i);
+ return cnt;
+}
+EXPORT_SYMBOL(vcm_alloc_all_blocks_avail);
+
+
+int vcm_alloc_count_allocated(void)
+{
+ int i;
+ int cnt = 0;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return 0;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i)
+ cnt += count_allocated_size(i);
+ return cnt;
+}
+EXPORT_SYMBOL(vcm_alloc_count_allocated);
+
+
+void vcm_alloc_print_list(int just_allocated)
+{
+ int i;
+ struct phys_chunk *chunk, *tmp;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i) {
+ if (list_empty(&phys_mem.heads[i].head))
+ continue;
+ list_for_each_entry_safe(chunk, tmp,
+ &phys_mem.heads[i].head, list) {
+ if (just_allocated && !is_allocated(&chunk->allocated))
+ continue;
+
+ printk(KERN_INFO "pa = %#x, size = %#x\n",
+ chunk->pa, chunk_sizes[chunk->size_idx]);
+ }
+ }
+}
+EXPORT_SYMBOL(vcm_alloc_print_list);
+
+
+int vcm_alloc_idx_to_size(int idx)
+{
+ return chunk_sizes[idx];
+}
+EXPORT_SYMBOL(vcm_alloc_idx_to_size);
+
+
+int vcm_alloc_destroy(void)
+{
+ int i;
+ struct phys_chunk *chunk, *tmp;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ return -1;
+ }
+
+ /* can't destroy a space that has allocations */
+ if (vcm_alloc_count_allocated()) {
+ vcm_alloc_err("allocations still present\n");
+ return -1;
+ }
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i) {
+
+ if (list_empty(&phys_mem.heads[i].head))
+ continue;
+ list_for_each_entry_safe(chunk, tmp,
+ &phys_mem.heads[i].head, list) {
+ list_del(&chunk->list);
+ memset(chunk, 0, sizeof(*chunk));
+ kfree(chunk);
+ }
+ }
+
+ basicalloc_init = 0;
+
+ return 0;
+}
+EXPORT_SYMBOL(vcm_alloc_destroy);
+
+
+int vcm_alloc_init(unsigned int set_base_pa)
+{
+ int i = 0, j = 0;
+ struct phys_chunk *chunk;
+ int pa;
+
+ if (set_base_pa)
+ base_pa = set_base_pa;
+
+ pa = base_pa;
+
+ /* no double inits */
+ if (basicalloc_init) {
+ vcm_alloc_err("double basicalloc_init\n");
+ BUG();
+ return -1;
+ }
+
+ /* separate out to ensure good cleanup */
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i) {
+ INIT_LIST_HEAD(&phys_mem.heads[i].head);
+ phys_mem.heads[i].num = 0;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i) {
+ for (j = 0; j < init_num_chunks[i]; ++j) {
+ chunk = kzalloc(sizeof(*chunk), GFP_KERNEL);
+ if (!chunk) {
+ vcm_alloc_err("null chunk\n");
+ goto fail;
+ }
+			chunk->pa = pa;
+			pa += chunk_sizes[i];
+ chunk->size_idx = i;
+ INIT_LIST_HEAD(&chunk->allocated);
+ list_add_tail(&chunk->list, &phys_mem.heads[i].head);
+ phys_mem.heads[i].num++;
+ }
+ }
+
+ basicalloc_init = 1;
+ return 0;
+fail:
+ vcm_alloc_destroy();
+ return -1;
+}
+EXPORT_SYMBOL(vcm_alloc_init);
+
+
+int vcm_alloc_free_blocks(struct phys_chunk *alloc_head)
+{
+ struct phys_chunk *chunk, *tmp;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ goto fail;
+ }
+
+ if (!alloc_head) {
+ vcm_alloc_err("no alloc_head\n");
+ goto fail;
+ }
+
+ list_for_each_entry_safe(chunk, tmp, &alloc_head->allocated,
+ allocated) {
+ list_del_init(&chunk->allocated);
+ phys_mem.heads[chunk->size_idx].num++;
+ }
+
+ return 0;
+fail:
+ return -1;
+}
+EXPORT_SYMBOL(vcm_alloc_free_blocks);
+
+
+int vcm_alloc_num_blocks(int num,
+ enum chunk_size_idx idx, /* chunk size */
+ struct phys_chunk *alloc_head)
+{
+ struct phys_chunk *chunk;
+ int num_allocated = 0;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("no basicalloc_init\n");
+ goto fail;
+ }
+
+ if (!alloc_head) {
+ vcm_alloc_err("no alloc_head\n");
+ goto fail;
+ }
+
+ if (list_empty(&phys_mem.heads[idx].head)) {
+ vcm_alloc_err("list is empty\n");
+ goto fail;
+ }
+
+ if (vcm_alloc_blocks_avail(idx) < num) {
+ vcm_alloc_err("not enough blocks? num=%d\n", num);
+ goto fail;
+ }
+
+ list_for_each_entry(chunk, &phys_mem.heads[idx].head, list) {
+ if (num_allocated == num)
+ break;
+ if (is_allocated(&chunk->allocated))
+ continue;
+
+ list_add_tail(&chunk->allocated, &alloc_head->allocated);
+ phys_mem.heads[idx].num--;
+ num_allocated++;
+ }
+ return num_allocated;
+fail:
+ return 0;
+}
+EXPORT_SYMBOL(vcm_alloc_num_blocks);
+
+
+int vcm_alloc_max_munch(int len,
+ struct phys_chunk *alloc_head)
+{
+ int i;
+
+ int blocks_req = 0;
+ int block_residual = 0;
+ int blocks_allocated = 0;
+
+ int ba = 0;
+
+ if (!basicalloc_init) {
+ vcm_alloc_err("basicalloc_init is 0\n");
+ goto fail;
+ }
+
+ if (!alloc_head) {
+ vcm_alloc_err("alloc_head is NULL\n");
+ goto fail;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(chunk_sizes); ++i) {
+ blocks_req = len / chunk_sizes[i];
+ block_residual = len % chunk_sizes[i];
+
+ len = block_residual; /* len left */
+ if (blocks_req) {
+ int blocks_available = 0;
+ int blocks_diff = 0;
+ int bytes_diff = 0;
+
+ blocks_available = vcm_alloc_blocks_avail(i);
+ if (blocks_available < blocks_req) {
+ blocks_diff =
+ (blocks_req - blocks_available);
+ bytes_diff =
+ blocks_diff * chunk_sizes[i];
+
+ /* add back in the rest */
+ len += bytes_diff;
+ } else {
+ /* got all the blocks I need */
+ blocks_available =
+ (blocks_available > blocks_req)
+ ? blocks_req : blocks_available;
+ }
+
+ ba = vcm_alloc_num_blocks(blocks_available, i,
+ alloc_head);
+
+ if (ba != blocks_available) {
+ vcm_alloc_err("blocks allocated (%i) !="
+ " blocks_available (%i):"
+ " chunk size = %#x,"
+ " alloc_head = %p\n",
+ ba, blocks_available,
+					chunk_sizes[i], (void *) alloc_head);
+ goto fail;
+ }
+ blocks_allocated += blocks_available;
+ }
+ }
+
+ if (len) {
+ int blocks_available = 0;
+
+ blocks_available = vcm_alloc_blocks_avail(LAST_SZ());
+
+ if (blocks_available > 1) {
+ ba = vcm_alloc_num_blocks(1, LAST_SZ(), alloc_head);
+ if (ba != 1) {
+ vcm_alloc_err("blocks allocated (%i) !="
+ " blocks_available (%i):"
+ " chunk size = %#x,"
+ " alloc_head = %p\n",
+ ba, 1,
+					chunk_sizes[LAST_SZ()],
+ (void *) alloc_head);
+ goto fail;
+ }
+ blocks_allocated += 1;
+ } else {
+ vcm_alloc_err("blocks_available (%#x) <= 1\n",
+ blocks_available);
+ goto fail;
+ }
+ }
+
+ return blocks_allocated;
+fail:
+ vcm_alloc_free_blocks(alloc_head);
+ return 0;
+}
+EXPORT_SYMBOL(vcm_alloc_max_munch);
diff --git a/include/linux/vcm_alloc.h b/include/linux/vcm_alloc.h
new file mode 100644
index 0000000..e3e3b31
--- /dev/null
+++ b/include/linux/vcm_alloc.h
@@ -0,0 +1,70 @@
+/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ * * Neither the name of Code Aurora Forum, Inc. nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED
+ * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
+ * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
+ * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
+ * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
+ * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ */
+
+#ifndef VCM_ALLOC_H
+#define VCM_ALLOC_H
+
+#include <linux/list.h>
+
+#define NUM_CHUNK_SIZES 3
+
+enum chunk_size_idx {
+ IDX_1M = 0,
+ IDX_64K,
+ IDX_4K
+};
+
+struct phys_chunk {
+ struct list_head list;
+	struct list_head allocated; /* used to record whether allocated */
+
+ struct list_head refers_to;
+
+ /* TODO: change to unsigned long */
+ int pa;
+ int size_idx;
+};
+
+int vcm_alloc_get_mem_size(void);
+int vcm_alloc_blocks_avail(enum chunk_size_idx idx);
+int vcm_alloc_get_num_chunks(void);
+int vcm_alloc_all_blocks_avail(void);
+int vcm_alloc_count_allocated(void);
+void vcm_alloc_print_list(int just_allocated);
+int vcm_alloc_idx_to_size(int idx);
+int vcm_alloc_destroy(void);
+int vcm_alloc_init(unsigned int set_base_pa);
+int vcm_alloc_free_blocks(struct phys_chunk *alloc_head);
+int vcm_alloc_num_blocks(int num,
+ enum chunk_size_idx idx, /* chunk size */
+ struct phys_chunk *alloc_head);
+int vcm_alloc_max_munch(int len,
+ struct phys_chunk *alloc_head);
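+
+/*
+ * A usage sketch (illustrative only; assumes vcm_alloc_init() has set
+ * up the pool and use_chunks() stands in for any consumer):
+ *
+ *	struct phys_chunk head;
+ *
+ *	INIT_LIST_HEAD(&head.allocated);
+ *	if (vcm_alloc_max_munch(SZ_1M + SZ_64K, &head))
+ *		use_chunks(&head.allocated);
+ *	vcm_alloc_free_blocks(&head);
+ */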
+
+#endif /* VCM_ALLOC_H */
--
1.7.0.2
On Tue, 29 Jun 2010 22:55:48 -0700 Zach Pfeffer wrote:
> This patch contains the documentation for the API, termed the Virtual
> Contiguous Memory Manager. Its use would allow all of the IOMMU to VM,
> VM to device and device to IOMMU interoperation code to be refactored
> into platform independent code.
>
> Comments, suggestions and criticisms are welcome and wanted.
>
> Signed-off-by: Zach Pfeffer <[email protected]>
> ---
> Documentation/vcm.txt | 583 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 583 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/vcm.txt
>
> diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt
> new file mode 100644
> index 0000000..d29c757
> --- /dev/null
> +++ b/Documentation/vcm.txt
> @@ -0,0 +1,583 @@
> +What is this document about?
> +============================
> +
> +This document covers how to use the Virtual Contiguous Memory Manager
> +(VCMM), how the first implmentation works with a specific low-level
> +Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used
> +from user-space. It also contains a section that describes why something
> +like the VCMM is needed in the kernel.
> +
> +If anything in this document is wrong please send patches to the
is wrong,
> +maintainer of this file, listed at the bottom of the document.
> +
> +
> +The Virtual Contiguous Memory Manager
> +=====================================
> +
> +The VCMM was built to solve the system-wide memory mapping issues that
> +occur when many bus-masters have IOMMUs.
> +
> +An IOMMU maps device addresses to physical addresses. It also insulates
> +the system from spurious or malicious device bus transactions and allows
> +fine-grained mapping attribute control. The Linux kernel core does not
> +contain a generic API to handle IOMMU mapped memory; device driver writers
> +must implement device specific code to interoperate with the Linux kernel
> +core. As the number of IOMMUs increases, coordinating the many address
> +spaces mapped by all discrete IOMMUs becomes difficult without in-kernel
> +support.
> +
> +The VCMM API enables device independent IOMMU control, virtual memory
> +manager (VMM) interoperation and non-IOMMU enabled device interoperation
> +by treating devices with or without IOMMUs and all CPUs with or without
> +MMUs, their mapping contexts and their mappings using common
> +abstractions. Physical hardware is given a generic device type and mapping
> +contexts are abstracted into Virtual Contiguous Memory (VCM)
> +regions. Users "reserve" memory from VCMs and "back" their reservations
> +with physical memory.
> +
> +Why the VCMM is Needed
> +----------------------
> +
> +Driver writers who control devices with IOMMUs must contend with device
> +control and memory management. Driver writers have a large device driver
> +API that they can leverage to control their devices, but they are lacking
> +a unified API to help them program mappings into IOMMUs and share those
> +mappings with other devices and CPUs in the system.
> +
> +Sharing is complicated by Linux's CPU centric VMM. The CPU centric model
CPU-centric CPU-centric
> +generally makes sense because average hardware only contains a MMU for the
> +CPU and possibly a graphics MMU. If every device in the system has one or
> +more MMUs the CPU centric memory management (MM) programming model breaks
ditto
> +down.
> +
> +Abstracting IOMMU device programming into a common API has already begun
> +in the Linux kernel. It was built to abstract the difference between AMDs
AMD's
(or just "AMD")
> +and Intels IOMMUs to support x86 virtualization on both platforms. The
Intel's (or just "Intel")
> +interface is listed in kernel/include/linux/iommu.h. It contains
drop "kernel/"
> +interfaces for mapping and unmapping as well as domain management. This
> +interface has not gained widespread use outside the x86; PA-RISC, Alpha
> +and SPARC architectures and ARM and PowerPC platforms all use their own
> +mapping modules to control their IOMMUs. The VCMM contains an IOMMU
> +programming layer, but since its abstraction supports map management
> +independent of device control, the layer is not used directly. This
> +higher-level view enables a new kernel service, not just an IOMMU
> +interoperation layer.
> +
> +The General Idea: Map Management using Graphs
> +---------------------------------------------
> +
> +Looking at mapping from a system-wide perspective reveals a general graph
> +problem. The VCMMs API is built to manage the general mapping graph. Each
VCMM's
> +node that talks to memory, either through an MMU or directly (physically
> +mapped) can be thought of as the device-end of a mapping edge. The other
> +edge is the physical memory (or intermediate virtual space) that is
> +mapped.
> +
> +In the direct mapped case the device is assigned a one-to-one MMU. This
direct-mapped
> +scheme allows direct mapped devices to participate in general graph
> +management.
> +
> +The CPU nodes can also be brought under the same mapping abstraction with
> +the use of a light overlay on the existing VMM. This light overlay allows
> +VMM managed mappings to interoperate with the common API. The light
VMM-managed
> +overlay enables this without substantial modifications to the existing
> +VMM.
> +
> +In addition to CPU nodes that are running Linux (and the VMM), remote CPU
> +nodes that may be running other operating systems can be brought into the
> +general abstraction. Routing all memory management requests from a remote
> +node through the central memory management framework enables new features
> +like system-wide memory migration. This feature may only be feasible for
> +large buffers that are managed outside of the fast-path, but having remote
> +allocation in a system enables features that are impossible to build
> +without it.
> +
> +The fundamental objects that support graph-based map management are:
> +
> +1) Virtual Contiguous Memory Regions
> +
> +2) Reservations
> +
> +3) Associated Virtual Contiguous Memory Regions
> +
> +4) Memory Targets
> +
> +5) Physical Memory Allocations
> +
> +Usage Overview
> +--------------
> +
> +In a nut-shell, users allocate Virtual Contiguous Memory Regions and
nutshell,
> +associate those regions with one or more devices by creating an Associated
> +Virtual Contiguous Memory Region. Users then create Reservations from the
> +Virtual Contiguous Memory Region. At this point no physical memory has
> +been committed to the reservation. To associate physical memory with a
> +reservation a Physical Memory Allocation is created and the Reservation is
> +backed with this allocation.
> +
> +include/linux/vcm.h includes comments documenting each API.
> +
> +Virtual Contiguous Memory Regions
> +---------------------------------
> +
> +A Virtual Contiguous Memory Region (VCM) abstracts the memory space a
> +device sees. The addresses of the region are only used by the devices
> +which are associated with the region. This address space would normally be
> +implemented as a device page-table.
page table.
> +
> +A VCM is created and destroyed with three functions:
> +
> + struct vcm *vcm_create(size_t start_addr, size_t len);
Seems odd to use size_t for start_addr.
> + struct vcm *vcm_create_from_prebuilt(size_t ext_vcm_id);
> +
> + int vcm_free(struct vcm *vcm);
> +
> +start_addr is an offset into the address space where allocations will
> +start from. len is the length from start_addr of the VCM. Both functions
> +generate an instance of a VCM.
> +
> +ext_vcm_id is used to pass a request to the VMM to generate a VCM
> +instance. In the current implementation the call simply makes a note that
> +the VCM instance is a VMM VCM instance for other interfaces usage. This
> +muxing is seen throughout the implementation.
> +
> +vcm_create() and vcm_create_from_prebuilt() produce VCM instances for
> +virtually mapped devices (IOMMUs and CPUs). To create a one-to-one mapped
> +VCM users pass the start_addr and len of the physical region. The VCMM
VCM,
> +matches this and records that the VCM instance is a one-to-one VCM.
> +
> +The newly created VCM instance can be passed to any function that needs to
> +operate on or with a virtual contiguous memory region. Its main attributes
> +are a start_addr and a len as well as an internal setting that allows the
> +implementation to mux between true virtual spaces, one-to-one mapped
> +spaces and VMM managed spaces.
> +
> +The current implementation uses the genalloc library to manage the VCM for
> +IOMMU devices. Return values and more in-depth per-function documentation
> +for these and the ones listed below are in include/linux/vcm.h.
> +
> +Reservations
> +------------
> +
> +A Reservation is a contiguous region allocated from a VCM. There is no
> +physical memory associated with it.
> +
> +A Reservation is created and destroyed with:
> +
> + struct res *vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr);
s/uint32_t/u32/ ?
> + int vcm_unreserve(struct res *res);
> +
> +A vcm is a VCM created above. len is the length of the request. It can be
> +up to the length of the VCM region the reservation is being created
> +from. attr are mapping attributes: read, write, execute, user, supervisor,
> +secure, not-cached, write-back/write-allocate, write-back/no
> +write-allocate, write-through. These attrs are appropriate for ARM but can
> +be changed to match any architecture.
> +
> +The implementation calls gen_pool_alloc() for IOMMU devices,
> +alloc_vm_area() for VMM areas and is a pass through for one-to-one mapped
pass-through
> +areas.
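> +
> +A reservation's lifetime, sketched (attr is a platform attribute
> +bitmask built from the list above; a zero return is assumed to
> +indicate failure, as with vcm_create()):
> +
> +    struct res *res;
> +
> +    res = vcm_reserve(vcm, SZ_64K, attr);
> +    if (!res)
> +        return -ENOMEM;
> +
> +    /* ... back the reservation with physical memory, use it ... */
> +
> +    vcm_unreserve(res);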
> +
> +Associated Virtual Contiguous Memory Regions and Activation
> +-----------------------------------------------------------
> +
> +An Associated Virtual Contiguous Memory Region (AVCM) is a mapping of a
> +VCM to a device. The mapping can be active or inactive.
> +
> +An AVCM is managed with:
> +
> + struct avcm *vcm_assoc(struct vcm *vcm, size_t dev, uint32_t attr);
> +
> + int vcm_deassoc(struct avcm *avcm);
> +
> + int vcm_activate(struct avcm *avcm);
> +
> + int vcm_deactivate(struct avcm *avcm);
> +
> +A VCM instance is a VCM created above. A dev is an opaque device handle
> +that's passed down to the device driver the VCMM muxes in to handle a
> +request. attr are association attributes: split, use-high or
size_t used for an opaque device handle seems odd.
> +use-low. split controls which transactions hit a high-address page-table
> +and which transactions hit a low-address page-table. For instance, all
> +transactions whose most significant address bit is one would use the
> +high-address page-table, any other transaction would use the low address
> +page-table. This scheme is ARM specific and could be changed in other
ARM-specific
> +architectures. One VCM instance can be associated with many devices and
> +many VCM instances can be associated with one device.
> +
> +An AVCM is only a link. To program and deprogram a device with a VCM the
> +user calls vcm_activate() and vcm_deactivate().For IOMMU devices,
vcm_deactivate(). For
> +activating a mapping programs the base address of a page-table into an
page table
> +IOMMU. For VMM and one-to-one based devices, mappings are active
> +immediately, but the API still requires an activation call for internal
> +reference counting.
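> +
> +A minimal association sequence might look like the following sketch
> +(assuming vcm_assoc() returns zero on failure and vcm_activate()
> +returns a negative error code, per include/linux/vcm.h):
> +
> +    struct avcm *avcm;
> +    int ret;
> +
> +    avcm = vcm_assoc(vcm, dev, attr);
> +    if (!avcm)
> +        return -ENOMEM;
> +
> +    ret = vcm_activate(avcm);
> +    if (ret) {
> +        vcm_deassoc(avcm);
> +        return ret;
> +    }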
> +
> +Memory Targets
> +--------------
> +
> +A Memory Target is a platform independent way of specifying a physical
> +pool; it abstracts a pool of physical memory. The physical memory pool may
> +be physically discontiguous, need to be allocated from in a unique way or
> +have other user-defined attributes.
> +
> +Physical Memory Allocation and Reservation Backing
> +--------------------------------------------------
> +
> +Physical memory is allocated as a separate step from reserving
> +memory. This allows multiple reservations to back the same physical
> +memory.
> +
> +A Physical Memory Allocation is managed using the following functions:
> +
> + struct physmem *vcm_phys_alloc(enum memtype_t memtype, size_t len,
> + uint32_t attr);
> +
> + int vcm_phys_free(struct physmem *physmem);
> +
> + int vcm_back(struct res *res, struct physmem *physmem);
> +
> + int vcm_unback(struct res *res);
> +
> +attr can include an alignment request, a specification to map memory using
> +various block sizes and/or to use physically contiguous memory. memtype is
> +one of the memory types listed in Memory Targets.
> +
> +The current implementation manages two pools of memory. One pool is a
> +contiguous block of memory and the other is a set of contiguous block
> +pools. In the current implementation the block pools contain 4K, 64K and
> +1M blocks. The physical allocator does not try to split blocks from the
> +contiguous block pools to satisfy requests.
> +
> +The use of 4K, 64K and 1M blocks solves a problem with some IOMMU
> +hardware. IOMMUs are placed in front of multimedia engines to provide a
> +contiguous address space to the device. Multimedia devices need large
> +buffers and large buffers may map to a large number of physical
> +blocks. IOMMUs tend to have small translation lookaside buffers
> +(TLBs). Since the TLB is small the number of physical blocks that map a
> +given range needs to be small or else the IOMMU will continually fetch new
> +translations during a typical streamed multimedia flow. By using a 1 MB
> +mapping (or 64K mapping) instead of a 4K mapping the number of misses can
> +be minimized, allowing the multimedia block to meet its performance goals.
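> +
> +A typical allocate-and-back sequence is sketched below (VCM_MEMTYPE_0
> +and attr are placeholders; res is an existing reservation of at least
> +SZ_1M):
> +
> +    struct physmem *physmem;
> +    int ret;
> +
> +    physmem = vcm_phys_alloc(VCM_MEMTYPE_0, SZ_1M, attr);
> +    if (!physmem)
> +        return -ENOMEM;
> +
> +    ret = vcm_back(res, physmem);
> +    if (ret) {
> +        vcm_phys_free(physmem);
> +        return ret;
> +    }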
> +
> +Low Level Control
> +-----------------
> +
> +It is necessary in some instances to access attributes and provide
> +higher-level control of the low-level hardware abstraction. The API
> +contains many functions for this task but the two that are typically used
> +are:
> +
> + size_t vcm_get_dev_addr(struct res *res);
> +
> + int vcm_hook(size_t dev, vcm_handler handler, void *data);
> +
> +The first function, vcm_get_dev_addr(), returns a device address given a
> +reservation. This device address is a virtual IOMMU address for
> +reservations on IOMMU VCMs, a virtual VMM address for reservations on VMM
> +VCMs and a virtual (really physical, since it's one-to-one mapped) address
> +for one-to-one devices.
> +
> +The second function, vcm_hook allows a caller in the kernel to register a
vcm_hook,
> +user_handler. During a fault, the handler is passed the data pointer
> +that was registered with vcm_hook(). The user can return 1 to indicate
> +that the underlying driver should handle the fault and retry the
> +transaction, or return 0 to halt the transaction. If the user doesn't
> +register a handler the low-level driver will print a warning and
> +terminate the transaction.
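> +
> +A sketch of a fault hook (the handler signature is the vcm_handler
> +typedef from include/linux/vcm_types.h; my_data stands for whatever
> +context the caller wants handed back during a fault):
> +
> +    static int my_fault_handler(size_t dev_id, void *data,
> +                                void *fault_data)
> +    {
> +        /* data is the pointer that was registered with vcm_hook() */
> +        pr_warn("unhandled fault on device %zu\n", dev_id);
> +        return 0;    /* 0 halts the faulting transaction */
> +    }
> +
> +    ret = vcm_hook(dev, my_fault_handler, my_data);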
> +
> +A Detailed Walk Through
> +-----------------------
> +
> +The following call sequence walks through a typical allocation
> +sequence. In the first stage the memory for a device is reserved and
> +backed. This occurs without mapping the memory into a VMM VCM region. The
> +second stage maps the first VCM region into a VMM VCM region so the kernel
> +can read or write it. The second stage is not necessary if the VMM does
> +not need to read or modify the contents of the original mapping.
> +
> + Stage 1: Map and Allocate Memory for a Device
> +
> + The call sequence starts by creating a VCM region:
> +
> + vcm = vcm_create(start_addr, len);
> +
> + The next call associates a VCM region with a device:
> +
> + avcm = vcm_assoc(vcm, dev, attr);
> +
> + To activate the association users call vcm_activate() on the avcm from
association,
> + the associate call. This programs the underlying device with the
> + mappings.
> +
> + ret = vcm_activate(avcm);
> +
> + Once a VCM region is created and associated, users can reserve from
> + it with:
> +
> + res = vcm_reserve(vcm, res_len, res_attr);
> +
> + A user then allocates physical memory with:
> +
> + physmem = vcm_phys_alloc(memtype, len, phys_attr);
> +
> + To back the reservation with the physical memory allocation the user
> + calls:
> +
> + vcm_back(res, physmem);
> +
> +
> + Stage 2: Map the Device's Memory into the VMM's VCM region
> +
> + If the VMM needs to read and/or write the region that was just created
created,
> + the following calls are made.
> +
> + The first call creates a prebuilt VCM with:
> +
> + vcm_vmm = vcm_create_from_prebuilt(ext_vcm_id);
> +
> + The prebuilt VCM is associated with the CPU device and activated with:
> +
> + avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr);
> + vcm_activate(avcm_vmm);
> +
> + A reservation is made on the VMM VCM with:
> +
> + res_vmm = vcm_reserve(vcm_vmm, res_len, attr);
> +
> + Finally, once the topology has been set up, a vcm_back() allows the VMM
> + to read the memory using the physmem generated in stage 1:
> +
> + vcm_back(res_vmm, physmem);
> +
> +Mapping IOMMU, one-to-one and VMM Reservations
> +----------------------------------------------
> +
> +The following example demonstrates mapping IOMMU, one-to-one and VMM
> +reservations to the same physical memory. It shows the use of phys_addr
> +and phys_size to create a contiguous VCM for one-to-one mapped devices.
> +
> + The user allocates physical memory:
> +
> + physmem = vcm_phys_alloc(memtype, SZ_2MB + SZ_4K, CONTIGUOUS);
> +
> + Creates an IOMMU VCM:
> +
> + vcm_iommu = vcm_create(SZ_1K, SZ_16M);
> +
> + Creates an one-to-one VCM:
a one-to-one
> +
> + vcm_onetoone = vcm_create(phys_addr, phys_size);
> +
> + Creates a Prebuilt VCM:
> +
> + vcm_vmm = vcm_create_from_prebuilt(ext_vcm_id);
> +
> + Associate and activate all three to their respective devices:
> +
> + avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0);
> + avcm_onetoone = vcm_assoc(vcm_onetoone, dev_onetoone, attr1);
> + avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr2);
error handling on vcm_assoc() failures?
> + vcm_activate(avcm_iommu);
> + vcm_activate(avcm_onetoone);
> + vcm_activate(avcm_vmm);
> +
> + And finally, creates and backs reservations on all 3 such that they
> + all point to the same memory:
> +
> + res_iommu = vcm_reserve(vcm_iommu, SZ_2MB + SZ_4K, attr);
> + res_onetoone = vcm_reserve(vcm_onetoone, SZ_2MB + SZ_4K, attr);
> + res_vmm = vcm_reserve(vcm_vmm, SZ_2MB + SZ_4K, attr);
error handling?
> + vcm_back(res_iommu, physmem);
> + vcm_back(res_onetoone, physmem);
> + vcm_back(res_vmm, physmem);
> +
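> +Each of the calls above can fail. A more complete sequence checks every
> +return value and unwinds in reverse order; sketched here for the IOMMU
> +path only (a NULL return is assumed to signal failure for the
> +pointer-returning calls, a negative code for the rest):
> +
> +    avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0);
> +    if (!avcm_iommu)
> +        goto err_assoc;
> +    if (vcm_activate(avcm_iommu))
> +        goto err_activate;
> +    res_iommu = vcm_reserve(vcm_iommu, SZ_2MB + SZ_4K, attr);
> +    if (!res_iommu)
> +        goto err_reserve;
> +    if (vcm_back(res_iommu, physmem))
> +        goto err_back;
> +
> +    return 0;
> +
> +err_back:
> +    vcm_unreserve(res_iommu);
> +err_reserve:
> +    vcm_deactivate(avcm_iommu);
> +err_activate:
> +    vcm_deassoc(avcm_iommu);
> +err_assoc:
> +    return -ENOMEM;
> +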
> +VCM Summary
> +-----------
> +
> +The VCMM is an attempt to abstract attributes of three distinct classes of
> +mappings into one API. The VCMM allows users to reason about mappings as
> +first-class objects. It also allows memory mappings to flow from the
> +traditional 4K mappings prevalent on systems today to more efficient block
> +sizes. Finally, it allows users to manage mapping interoperation without
> +becoming VMM experts. These features will allow future systems with many
> +MMU mapped devices to interoperate simply and therefore correctly.
> +
> +
> +IOMMU Hardware Control
> +======================
> +
> +The VCM currently supports a single type of IOMMU, a Qualcomm System MMU
> +(SMMU). The SMMU interface contains functions to map and unmap virtual
> +addresses, perform address translations and initialize hardware. A
> +Qualcomm SMMU can contain multiple MMU contexts. Each context can
> +translate in parallel. All contexts in an SMMU share one global
> +translation lookaside buffer (TLB).
> +
> +To support context muxing the SMMU module creates and manages device
> +independent virtual contexts. These context abstractions are bound to
> +actual contexts at run-time. Once bound, a context can be activated. This
> +activation programs the underlying context with the virtual context,
> +effecting a context switch.
> +
> +The following functions are all documented in:
> +
> + arch/arm/mach-msm/include/mach/smmu_driver.h.
> +
> +Mapping
> +-------
> +
> +To map and unmap a virtual page into physical space the VCM calls:
> +
> + int smmu_map(struct smmu_dev *dev, unsigned long pa,
> + unsigned long va, unsigned long len, unsigned int attr);
> +
> + int smmu_unmap(struct smmu_dev *dev, unsigned long va,
> + unsigned long len);
> +
> + int smmu_update_start(struct smmu_dev *dev);
> +
> + int smmu_update_done(struct smmu_dev *dev);
> +
> +The size given to map must be 4K, 64K, 1M or 16M and the VA and PA must be
> +aligned to the given size. smmu_update_start() and smmu_update_done()
> +should be called before and after each map or unmap.
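> +
> +For example, to map a single 1M block (pa and va must both be aligned
> +to 1M here; attr is a platform mapping attribute word):
> +
> +    smmu_update_start(dev);
> +    ret = smmu_map(dev, pa, va, SZ_1M, attr);
> +    smmu_update_done(dev);
> +    if (ret)
> +        return ret;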
> +
> +Translation
> +-----------
> +
> +To request a hardware VA to PA translation on a single address the VCM
> +calls:
> +
> + unsigned long smmu_translate(struct smmu_dev *dev,
> + unsigned long va);
> +
> +Fault Handling
> +--------------
> +
> +To register an interrupt handler for a context the VCM calls:
> +
> + int smmu_hook_irpt(struct smmu_dev *dev, vcm_handler handler,
> + void *data);
We try to spell out things like "interrupt" in function names
because I/we can't remember just how you decided to abbreviate it.
> +
> +The registered interrupt handler should return 1 if it wants the SMMU
> +driver to retry the transaction again and 0 if it wants the SMMU driver to
> +terminate the transaction.
> +
> +Managing SMMU Initialization and Contexts
> +-----------------------------------------
> +
> +SMMU hardware initialization and management happens in 2 steps. The first
> +step initializes global SMMU devices and abstract device contexts. The
> +second step binds contexts and devices.
> +
> +A SMMU hardware instance is built with:
An SMMU
> +
> + int smmu_drvdata_init(struct smmu_driver *drv, unsigned long base,
> + int irq);
> +
> +A SMMU context is Initialized and deinitialized with:
An SMMU initialized
> +
> + struct smmu_dev *smmu_ctx_init(int ctx);
> + int smmu_ctx_deinit(struct smmu_dev *dev);
> +
> +An abstract SMMU context is bound to a particular SMMU with:
> +
> + int smmu_ctx_bind(struct smmu_dev *ctx, struct smmu_driver *drv);
> +
> +Activation
> +----------
> +
> +Activation effects a context switch.
> +
> +Activation, deactivation and activation state testing are done with:
> +
> + int smmu_activate(struct smmu_dev *dev);
> + int smmu_deactivate(struct smmu_dev *dev);
> + int smmu_is_active(struct smmu_dev *dev);
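> +
> +A bring-up sequence is sketched below (smmu_base and smmu_irq are
> +platform-specific values; the failure conventions are assumptions to
> +be checked against smmu_driver.h):
> +
> +    struct smmu_driver drv;
> +    struct smmu_dev *ctx;
> +    int ret;
> +
> +    ret = smmu_drvdata_init(&drv, smmu_base, smmu_irq);
> +    if (ret)
> +        return ret;
> +
> +    ctx = smmu_ctx_init(0);    /* abstract context 0 */
> +    if (!ctx)
> +        return -ENOMEM;
> +
> +    ret = smmu_ctx_bind(ctx, &drv);
> +    if (ret)
> +        goto err;
> +
> +    ret = smmu_activate(ctx);
> +    if (ret)
> +        goto err;
> +    return 0;
> +
> +err:
> +    smmu_ctx_deinit(ctx);
> +    return ret;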
> +
> +
> +Userspace Access to Devices with IOMMUs
> +=======================================
> +
> +A device that issues transactions through an IOMMU must work with two
> +APIs. The first API is the VCM. The VCM API is device independent. Users
> +pass the VCM a dev_id and the VCM makes calls on the hardware device it
> +has been configured with using this dev_id. The second API is whatever
> +device topology has been created to organize the particular IOMMUs in a
> +system. The only constraint on this second API is that it must give the
> +user a single dev_id that it can pass through the VCM.
> +
> +For the Qualcomm SMMUs the second API consists of a tree of platform
> +devices and two platform drivers as well as a context lookup function that
> +traverses the device tree and returns a dev_id given a context name.
> +
> +Qualcomm SMMU Device Tree
> +-------------------------
> +
> +The current implementation organizes the devices into a tree that looks
> +like the following:
> +
> +smmu/
> + smmu0/
> + ctx0
> + ctx1
> + ctx2
> + smmu1/
> + ctx3
> +
> +
> +Each context, ctx[n] and each smmu, smmu[n] is given a name. Since users
> +are interested in contexts not smmus, the contexts name is passed to a
context
> +function to find the dev_id associated with that name. The functions to
> +find, free and get the base address (since the device probe function calls
> +ioremap to map the SMMU's configuration registers into the kernel) are
> +listed here:
> +
> + struct smmu_dev *smmu_get_ctx_instance(char *ctx_name);
> + int smmu_free_ctx_instance(struct smmu_dev *dev);
> + unsigned long smmu_get_base_addr(struct smmu_dev *dev);
> +
> +Documentation for these functions is in:
> +
> + arch/arm/mach-msm/include/mach/smmu_device.h
> +
> +Each context is given a dev node named after the context. For example:
> +
> + /dev/vcodec_a_mm1
> + /dev/vcodec_b_mm2
> + /dev/vcodec_stream
> + etc...
> +
> +Users open, close and mmap these nodes to access VCM buffers from
> +userspace in the same way that they used to open, close and mmap /dev
> +nodes that represented large physically contiguous buffers (called PMEM
> +buffers on Android).
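> +
> +From userspace the access pattern is the usual open/mmap sequence; a
> +sketch (buf_len and the node name are illustrative and depend on the
> +buffer the driver exports):
> +
> +    #include <fcntl.h>
> +    #include <sys/mman.h>
> +    #include <unistd.h>
> +
> +    int fd = open("/dev/vcodec_a_mm1", O_RDWR);
> +    if (fd < 0)
> +        return -1;
> +
> +    void *buf = mmap(NULL, buf_len, PROT_READ | PROT_WRITE,
> +                     MAP_SHARED, fd, 0);
> +    if (buf == MAP_FAILED) {
> +        close(fd);
> +        return -1;
> +    }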
> +
> +Example
> +-------
> +
> +An abbreviated example is shown here:
> +
> +Users get the dev_id associated with their target context, create a VCM
> +topology appropriate for their device and finally associate the VCMs of
> +the topology with the contexts that will take the VCMs:
> +
> + dev_id = smmu_get_ctx_instance("vcodec_a_stream");
> +
> +create vcm and needed topology
> +
> + avcm = vcm_assoc(vcm, dev_id, attr);
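> +
> +The elided "create vcm and needed topology" step could, for one simple
> +topology, expand to the following sketch (lengths, memtype and the
> +attribute words are placeholders):
> +
> +    vcm = vcm_create(SZ_1K, SZ_16M);
> +    res = vcm_reserve(vcm, SZ_1M, res_attr);
> +    physmem = vcm_phys_alloc(memtype, SZ_1M, phys_attr);
> +    vcm_back(res, physmem);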
> +
> +Tying it all Together
> +---------------------
> +
> +VCMs, IOMMUs and the device tree all work to support system-wide memory
> +mappings. The use of each API in this system allows users to concentrate
> +on the relevant details without needing to worry about low-level
> +details. The APIs clear separation of memory spaces and the devices that
API's
> +support those memory spaces continues Linuxs tradition of abstracting the
the Linux tradition
> +what from the how.
> +
> +
> +Maintainer: Zach Pfeffer <[email protected]>
> --
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
Thank you for the corrections. I'm correcting them now. Some responses:
Randy Dunlap wrote:
>> + struct vcm *vcm_create(size_t start_addr, size_t len);
>
> Seems odd to use size_t for start_addr.
I used size_t because I wanted to allow the start_addr the same range
as len. Is there a better type to use? I see 'unsigned long' used
throughout the mm code. Perhaps that's better for both the start_addr
and len.
>> +A Reservation is created and destroyed with:
>> +
>> + struct res *vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr);
>
> s/uint32_t/u32/ ?
Sure.
>> + Associate and activate all three to their respective devices:
>> +
>> + avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0);
>> + avcm_onetoone = vcm_assoc(vcm_onetoone, dev_onetoone, attr1);
>> + avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr2);
>
> error handling on vcm_assoc() failures?
I'll add the deassociate call to the example.
>> + res_iommu = vcm_reserve(vcm_iommu, SZ_2MB + SZ_4K, attr);
>> + res_onetoone = vcm_reserve(vcm_onetoone, SZ_2MB + SZ_4K, attr);
>> + res_vmm = vcm_reserve(vcm_vmm, SZ_2MB + SZ_4K, attr);
>
> error handling?
I'll add it here too.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Tue, 29 Jun 2010 22:55:50 -0700 Zach Pfeffer wrote:
> arch/arm/mm/vcm.c | 1901 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/vcm.h | 701 +++++++++++++++++
> include/linux/vcm_types.h | 318 ++++++++
> 3 files changed, 2920 insertions(+), 0 deletions(-)
> create mode 100644 arch/arm/mm/vcm.c
> create mode 100644 include/linux/vcm.h
> create mode 100644 include/linux/vcm_types.h
> diff --git a/include/linux/vcm.h b/include/linux/vcm.h
> new file mode 100644
> index 0000000..d2a1cd1
> --- /dev/null
> +++ b/include/linux/vcm.h
> @@ -0,0 +1,701 @@
> +/* Copyright (c) 2010, Code Aurora Forum. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> + * met:
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials provided
> + * with the distribution.
> + * * Neither the name of Code Aurora Forum, Inc. nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED "AS IS" AND ANY EXPRESS OR IMPLIED
> + * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT
> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS
> + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
> + * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
> + * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
> + * OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
> + * IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + */
What license (name/type) is this?
> +
> +#ifndef _VCM_H_
> +#define _VCM_H_
> +
> +/* All undefined types must be defined using platform specific headers */
> +
> +#include <linux/vcm_types.h>
> +
> +/*
> + * Creating, freeing and managing VCMs.
> + *
> + * A VCM region is a virtual space that can be reserved from and
> + * associated with one or more devices. At creation the user can
> + * specify an offset to start addresses and a length of the entire VCM
> + * region. Reservations out of a VCM region are always contiguous.
> + */
> +
> +/**
> + * vcm_create() - Create a VCM region
> + * @start_addr The starting address of the VCM region.
* @start_addr: text
* @len: text
I.e., need a colon (':') after each param name. (in multiple places)
> + * @len The len of the VCM region. This must be at least
> + * vcm_get_min_page_size() bytes.
> + *
> + * A VCM typically abstracts a page table.
> + *
> + * All functions in this API are passed and return opaque things
> + * because the underlying implementations will vary. The goal
> + * is really graph management. vcm_create() creates the "device end"
> + * of an edge in the mapping graph.
> + *
> + * The return value is non-zero if a VCM has successfully been
> + * created. It will return zero if a VCM region cannot be created or
> + * len is invalid.
> + */
> +struct vcm *vcm_create(size_t start_addr, size_t len);
> +/**
> + * vcm_get_physmem_from_res() - Return a reservation's physmem_id
> + * @ res_id An existing reservation of interest.
* @res_id: <text>
No space between @ and res_id.
> + *
> + * The return value will be non-zero on success, otherwise it will be:
> + * -EINVAL res is invalid
> + * -ENOMEM res is unbacked
> + */
> +struct physmem *vcm_get_physmem_from_res(struct res *res_id);
> diff --git a/include/linux/vcm_types.h b/include/linux/vcm_types.h
> new file mode 100644
> index 0000000..2cc4770
> --- /dev/null
> +++ b/include/linux/vcm_types.h
> @@ -0,0 +1,318 @@
> +/**
> + * enum memtarget_t - A logical location in a VCM.
> + *
> + * VCM_START Indicates the start of a VCM_REGION.
This is not quite kernel-doc notation (as indicated by the beginning /**).
Please see Documentation/kernel-doc-nano-HOWTO.txt for details, or ask me
if you need help with it.
> + */
> +enum memtarget_t {
> + VCM_START
> +};
> +
> +
> +/**
> + * enum memtype_t - A logical location in a VCM.
not quite kernel-doc notation...
> + *
> + * VCM_MEMTYPE_0 Generic memory type 0
> + * VCM_MEMTYPE_1 Generic memory type 1
> + * VCM_MEMTYPE_2 Generic memory type 2
> + *
> + * A memtype encapsulates a platform specific memory arrangement. The
> + * memtype needn't refer to a single type of memory, it can refer to a
> + * set of memories that can back a reservation.
> + *
> + */
> +enum memtype_t {
> + VCM_INVALID,
> + VCM_MEMTYPE_0,
> + VCM_MEMTYPE_1,
> + VCM_MEMTYPE_2,
> +};
> +
> +
> +/**
> + * vcm_handler - The signature of the fault hook.
> + * @dev_id The device id of the faulting device.
> + * @data The generic data pointer.
> + * @fault_data System specific common fault data.
ditto.
> + *
> + * The handler should return 0 for success. This indicates that the
> + * fault was handled. A non-zero return value is an error and will be
> + * propagated up the stack.
> + */
> +typedef int (*vcm_handler)(size_t dev_id, void *data, void *fault_data);
> +
> +
> +enum vcm_type {
> + VCM_DEVICE,
> + VCM_EXT_KERNEL,
> + VCM_EXT_USER,
> + VCM_ONE_TO_ONE,
> +};
> +
> +
> +/**
> + * vcm - A Virtually Contiguous Memory region.
* struct vcm - ...
and add colon after each struct @member:
> + * @start_addr The starting address of the VCM region.
> + * @len The len of the VCM region. This must be at least
> + * vcm_min() bytes.
and missing lots of struct members here.
If some of them are private, you can use:
/* private: */
...
/* public: */
comments in the struct below and then don't add the private ones to the
kernel-doc notation above.
> + */
> +struct vcm {
> + enum vcm_type type;
> +
> + size_t start_addr;
> + size_t len;
> +
> + size_t dev_id; /* opaque device control */
> +
> + /* allocator dependent */
> + struct gen_pool *pool;
> +
> + struct list_head res_head;
> +
> + /* this will be a very short list */
> + struct list_head assoc_head;
> +};
> +
> +/**
> + * avcm - A VCM to device association
not quite kernel-doc notation.
> + * @vcm The VCM region of interest.
> + * @dev_id The device to associate the VCM with.
> + * @attr See 'Association Attributes'.
> + */
> +struct avcm {
> + struct vcm *vcm_id;
> + size_t dev_id;
> + uint32_t attr;
> +
> + struct list_head assoc_elm;
> +
> + int is_active; /* is this particular association active */
> +};
> +
> +/**
> + * bound - A boundary to reserve from in a VCM region.
ditto.
> + * @vcm The VCM that needs a bound.
> + * @len The len of the bound.
> + */
> +struct bound {
> + struct vcm *vcm_id;
> + size_t len;
> +};
> +
> +
> +/**
> + * physmem - A physical memory allocation.
ditto.
> + * @memtype The memory type of the VCM region.
> + * @len The len of the physical memory allocation.
> + * @attr See 'Physical Allocation Attributes'.
> + *
> + */
> +
> +struct physmem {
> + enum memtype_t memtype;
> + size_t len;
> + uint32_t attr;
> +
> + struct phys_chunk alloc_head;
> +
> + /* if the physmem is cont then use the built in VCM */
> + int is_cont;
> + struct res *res;
> +};
> +
> +/**
> + * res - A reservation in a VCM region.
ditto.
> + * @vcm The VCM region to reserve from.
> + * @len The length of the reservation. Must be at least vcm_min()
> + * bytes.
> + * @attr See 'Reservation Attributes'.
> + */
> +struct res {
> + struct vcm *vcm_id;
> + struct physmem *physmem_id;
> + size_t len;
> + uint32_t attr;
> +
> + /* allocator dependent */
> + size_t alignment_req;
> + size_t aligned_len;
> + unsigned long ptr;
> + size_t aligned_ptr;
> +
> + struct list_head res_elm;
> +
> +
> + /* type VCM_EXT_KERNEL */
> + struct vm_struct *vm_area;
> + int mapped;
> +};
> +
> +extern int chunk_sizes[NUM_CHUNK_SIZES];
> +
> +#endif /* VCM_TYPES_H */
> --
---
~Randy
> What license (name/type) is this?
IANAL, but AFAIK standard wisdom is that "disclaimer in the documentation
and/or other materials provided" is generally not acceptable for Linux
because it's an excessive burden for all distributors.
Also for me it's still quite unclear why we would want this code at all...
It doesn't seem to do anything you couldn't do with the existing interfaces.
-Andi
On Thu, 2010-07-01 at 20:02 +0200, Andi Kleen wrote:
> > What license (name/type) is this?
>
> IANAL, but AFAIK standard wisdom is that "disclaimer in the documentation
> and/or other materials provided" is generally not acceptable for Linux
> because it's an excessive burden for all distributors.
It's the BSD license ..
> Also for me it's still quite unclear why we would want this code at all...
> It doesn't seem to do anything you couldn't do with the existing interfaces.
I don't know all that much about what Zach's done here, but from what
he's said so far it looks like this helps to manage lots of IOMMUs on a
single system.. On x86 it seems like there's not all that many IOMMUs in
comparison .. Zach mentioned 10 to 100 IOMMUs ..
Daniel
--
Sent by a consultant of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Thu, Jul 01, 2010 at 12:28:23PM -0700, Daniel Walker wrote:
> On Thu, 2010-07-01 at 20:02 +0200, Andi Kleen wrote:
> > > What license (name/type) is this?
> >
> > IANAL, but AFAIK standard wisdom is that "disclaimer in the documentation
> > and/or other materials provided" is generally not acceptable for Linux
> > because it's an excessive burden for all distributors.
>
> It's the BSD license ..
It's the old version of the BSD license that no one uses anymore because of its
problems: it's really an unreasonable burden to include hundreds or thousands of
attributions for every contributor in every printed manual you ship.
The BSDs have all switched to the 2-clause version (without this clause)
because of this.
>
> > Also for me it's still quite unclear why we would want this code at all...
> > It doesn't seem to do anything you couldn't do with the existing interfaces.
>
> I don't know all that much about what Zach's done here, but from what
> he's said so far it looks like this helps to manage lots of IOMMUs on a
> single system.. On x86 it seems like there's not all that many IOMMUs in
> comparison .. Zach mentioned 10 to 100 IOMMUs ..
The current code can manage multiple IOMMUs fine.
-Andi
--
[email protected] -- Speaking for myself only.
On Thu, 2010-07-01 at 21:38 +0200, Andi Kleen wrote:
> >
> > > Also for me it's still quite unclear why we would want this code at all...
> > > It doesn't seem to do anything you couldn't do with the existing interfaces.
> >
> > I don't know all that much about what Zach's done here, but from what
> > he's said so far it looks like this helps to manage lots of IOMMUs on a
> > single system.. On x86 it seems like there's not all that many IOMMUs in
> > comparison .. Zach mentioned 10 to 100 IOMMUs ..
>
> The current code can manage multiple IOMMUs fine.
He demonstrated the usage of his code in one of the emails he sent out
initially. Did you go over that, and what (or how many) steps would you
use with the current code to do the same thing?
Daniel
Randy,
On Thu, 1 Jul 2010, Randy Dunlap wrote:
> > + * @start_addr The starting address of the VCM region.
> > + * @len The len of the VCM region. This must be at least
> > + * vcm_min() bytes.
>
> and missing lots of struct members here.
> If some of them are private, you can use:
>
> /* private: */
> ...
> /* public: */
> comments in the struct below and then don't add the private ones to the
> kernel-doc notation above.
To avoid wasting space in structures, it makes sense to place fields
smaller than the alignment width together in the structure definition.
If one were to do this and follow your proposal, some structures may need
multiple "private" and "public" comments, which seems undesirable. The
alternative, wasting memory, also seems undesirable. Perhaps you might
have a proposal for a way to resolve this?
- Paul
On 07/01/10 13:59, Paul Walmsley wrote:
> Randy,
>
> On Thu, 1 Jul 2010, Randy Dunlap wrote:
>
>>> + * @start_addr The starting address of the VCM region.
>>> + * @len The len of the VCM region. This must be at least
>>> + * vcm_min() bytes.
>>
>> and missing lots of struct members here.
>> If some of them are private, you can use:
>>
>> /* private: */
>> ...
>> /* public: */
>> comments in the struct below and then don't add the private ones to the
>> kernel-doc notation above.
>
> To avoid wasting space in structures, it makes sense to place fields
> smaller than the alignment width together in the structure definition.
> If one were to do this and follow your proposal, some structures may need
> multiple "private" and "public" comments, which seems undesirable. The
> alternative, wasting memory, also seems undesirable. Perhaps you might
> have a proposal for a way to resolve this?
I don't know of a really good way. There are a few structs that have
multiple private/public entries, and that is OK.
Or you can describe all of the entries with kernel-doc notation.
Or you can choose not to use kernel-doc notation on some structs.
--
~Randy
Andi Kleen wrote:
>>> Also for me it's still quite unclear why we would want this code at all...
>>> It doesn't seem to do anything you couldn't do with the existing interfaces.
>> I don't know all that much about what Zach's done here, but from what
>> he's said so far it looks like this helps to manage lots of IOMMUs on a
>> single system.. On x86 it seems like there's not all that many IOMMUs in
>> comparison .. Zach mentioned 10 to 100 IOMMUs ..
>
> The current code can manage multiple IOMMUs fine.
That's fair. The current code does manage multiple IOMMUs without issue
for a static map topology. Its core function 'map' maps a physical chunk
of some size into an IOMMU's address space and the kernel's address
space for some domain.
The VCMM provides a more abstract, global view with finer-grained
control of each mapping a user wants to create. For instance, the
semantics of iommu_map preclude its use in setting up just the IOMMU
side of a mapping. With a one-sided map, two IOMMU devices can be
pointed to the same physical memory without mapping that same memory
into the kernel's address space.
Additionally, the current IOMMU interface does not allow users to
associate one page table with multiple IOMMUs unless the user explicitly
wrote a muxed device underneath the IOMMU interface. This also could be
done, but would have to be done for every such use case. Since the
particular topology is run-time configurable all of these use-cases and
more can be expressed without pushing the topology into the low-level
IOMMU driver.
The VCMM takes the long view. It's designed for a future in which the
number of IOMMUs will go up and the ways in which these IOMMUs are
composed will vary from system to system, and may vary at
runtime. Already, there are ~20 different IOMMU map implementations in
the kernel. Had the Linux kernel had the VCMM, many of those
implementations could have leveraged the mapping and topology management
of a VCMM, while focusing on a few key hardware specific functions (map
this physical address, program the page table base register).
On Thu, 2010-07-01 at 15:00 -0700, Zach Pfeffer wrote:
> Additionally, the current IOMMU interface does not allow users to
> associate one page table with multiple IOMMUs unless the user explicitly
> wrote a muxed device underneath the IOMMU interface. This also could be
> done, but would have to be done for every such use case. Since the
> particular topology is run-time configurable all of these use-cases and
> more can be expressed without pushing the topology into the low-level
> IOMMU driver.
>
> The VCMM takes the long view. It's designed for a future in which the
> number of IOMMUs will go up and the ways in which these IOMMUs are
> composed will vary from system to system, and may vary at
> runtime. Already, there are ~20 different IOMMU map implementations in
> the kernel. Had the Linux kernel had the VCMM, many of those
> implementations could have leveraged the mapping and topology management
> of a VCMM, while focusing on a few key hardware specific functions (map
> this physical address, program the page table base register).
So if we include this code, which "map implementations" could you
collapse into this implementation? Generally, what currently existing
code can VCMM help to eliminate?
Daniel
>
> He demonstrated the usage of his code in one of the emails he sent out
> initially. Did you go over that, and what (or how many) steps would you
> use with the current code to do the same thing?
-- So is this patch set adding layers and abstractions to help the user?
If the idea is to share some memory across multiple devices, I guess
you can achieve the same by calling the map function provided by iommu
module and sharing the mapped address to the 10's or 100's of devices
to access the buffers. You would only need a dedicated virtual pool
per IOMMU device to manage its virtual memory allocations.
Hari
> The VCMM takes the long view. It's designed for a future in which the
> number of IOMMUs will go up and the ways in which these IOMMUs are
> composed will vary from system to system, and may vary at
> runtime. Already, there are ~20 different IOMMU map implementations in
> the kernel. Had the Linux kernel had the VCMM, many of those
> implementations could have leveraged the mapping and topology management
> of a VCMM, while focusing on a few key hardware specific functions (map
> this physical address, program the page table base register).
>
-- Sounds good.
Did you think of a way to handle the cases where one of the devices
that is using the mapped address crashes?
How is the physical address unbacked in this case?
Hari
> The VCMM provides a more abstract, global view with finer-grained
> control of each mapping a user wants to create. For instance, the
> semantics of iommu_map preclude its use in setting up just the IOMMU
> side of a mapping. With a one-sided map, two IOMMU devices can be
Hmm? dma_map_* does not change any CPU mappings. It only sets up
DMA mapping(s).
> Additionally, the current IOMMU interface does not allow users to
> associate one page table with multiple IOMMUs unless the user explicitly
That assumes that all the IOMMUs on the system support the same page table
format, right?
As I understand it, your approach would help if you have different
IOMMUs with a different low-level interface, which just
happen to have the same pte format. Is that very likely?
I would assume if you have lots of copies of the same IOMMU
in the system then you could just use a single driver with multiple
instances that share some state for all of them. That model
would fit in the current interfaces. There's no reason multiple
instances couldn't share the same allocation data structure.
And if you have lots of truly different IOMMUs then they likely
won't be able to share PTEs at the hardware level anyways, because
the formats are too different.
> The VCMM takes the long view. It's designed for a future in which the
> number of IOMMUs will go up and the ways in which these IOMMUs are
> composed will vary from system to system, and may vary at
> runtime. Already, there are ~20 different IOMMU map implementations in
> the kernel. Had the Linux kernel had the VCMM, many of those
> implementations could have leveraged the mapping and topology management
> of a VCMM, while focusing on a few key hardware specific functions (map
> this physical address, program the page table base register).
The standard Linux approach to such a problem is to write
a library that drivers can use for common functionality, not put a middle
layer in between. Libraries are much more flexible than layers.
That said I'm not sure there's all that much duplicated code anyways.
A lot of the code is always IOMMU specific. The only piece
which might be shareable is the mapping allocation, but I don't
think that's very much of a typical driver.
In my old pci-gart driver the allocation was only a few lines of code,
although admittedly it was somewhat dumb in this regard because it only managed a
small remapping window.
-Andi
Andi Kleen wrote:
>> The VCMM provides a more abstract, global view with finer-grained
>> control of each mapping a user wants to create. For instance, the
>> semantics of iommu_map preclude its use in setting up just the IOMMU
>> side of a mapping. With a one-sided map, two IOMMU devices can be
>
> Hmm? dma_map_* does not change any CPU mappings. It only sets up
> DMA mapping(s).
Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
mappings, it sets up both the iommu and kernel buffer mappings.
>
>> Additionally, the current IOMMU interface does not allow users to
>> associate one page table with multiple IOMMUs unless the user explicitly
>
> That assumes that all the IOMMUs on the system support the same page table
> format, right?
Actually no. Since the VCMM abstracts a page-table as a Virtual
Contiguous Region (VCM) a VCM can be associated with any device,
regardless of their individual page table format.
>
> As I understand it, your approach would help if you have different
> IOMMUs with a different low-level interface, which just
> happen to have the same pte format. Is that very likely?
>
> I would assume if you have lots of copies of the same IOMMU
> in the system then you could just use a single driver with multiple
> instances that share some state for all of them. That model
> would fit in the current interfaces. There's no reason multiple
> instances couldn't share the same allocation data structure.
>
> And if you have lots of truly different IOMMUs then they likely
> won't be able to share PTEs at the hardware level anyways, because
> the formats are too different.
See VCMs above.
>
>> The VCMM takes the long view. It's designed for a future in which the
>> number of IOMMUs will go up and the ways in which these IOMMUs are
>> composed will vary from system to system, and may vary at
>> runtime. Already, there are ~20 different IOMMU map implementations in
>> the kernel. Had the Linux kernel had the VCMM, many of those
>> implementations could have leveraged the mapping and topology management
>> of a VCMM, while focusing on a few key hardware specific functions (map
>> this physical address, program the page table base register).
>
> The standard Linux approach to such a problem is to write
> a library that drivers can use for common functionality, not put a middle
> layer in between. Libraries are much more flexible than layers.
That's true up to the "is this middle layer so useful that it's worth
it" point. The VM is a middle layer, you could make the same argument
about it, "the mapping code isn't too hard, just map in the memory
that you need and be done with it". But the VM middle layer provides a
clean separation between page frames and pages which turns out to be
infinitely useful. The VCMM is built in the same spirit. It says
things like, "mapping is a global problem, I'm going to abstract
entire virtual spaces and allow people arbitrary chunk size
allocation, I'm not going to care that my device is physically mapping
this buffer and this other device is a virtual, virtual device."
>
> That said I'm not sure there's all that much duplicated code anyways.
> A lot of the code is always IOMMU specific. The only piece
> which might be shareable is the mapping allocation, but I don't
> think that's very much of a typical driver.
>
> In my old pci-gart driver the allocation was only a few lines of code,
> although admittedly it was somewhat dumb in this regard because it only managed a
> small remapping window.
I agree that it's not a lot of code, and that this layer may be a bit
heavy, but I'd like to focus on whether a global mapping view is useful
and, if so, whether something like the graph management that the VCMM
provides is generally useful.
On Thu, Jul 01, 2010 at 12:16:18AM -0700, Zach Pfeffer wrote:
> Thank you for the corrections. I'm correcting them now. Some responses:
>
> Randy Dunlap wrote:
> >> + struct vcm *vcm_create(size_t start_addr, size_t len);
> >
> > Seems odd to use size_t for start_addr.
>
> I used size_t because I wanted to allow the start_addr the same range
> as len. Is there a better type to use? I see 'unsigned long' used
> throughout the mm code. Perhaps that's better for both the start_addr
> and len.
>
phys_addr_t or resource_size_t.
Hari Kanigeri wrote:
>> He demonstrated the usage of his code in one of the emails he sent out
>> initially. Did you go over that, and what (or how many) steps would you
>> use with the current code to do the same thing?
>
> -- So is this patch set adding layers and abstractions to help the user?
>
> If the idea is to share some memory across multiple devices, I guess
> you can achieve the same by calling the map function provided by iommu
> module and sharing the mapped address to the 10's or 100's of devices
> to access the buffers. You would only need a dedicated virtual pool
> per IOMMU device to manage its virtual memory allocations.
Yeah, you can do that. My idea is to get away from explicit addressing
and encapsulate the "device address to physical address" link into a
mapping.
Hari Kanigeri wrote:
>> The VCMM takes the long view. It's designed for a future in which the
>> number of IOMMUs will go up and the ways in which these IOMMUs are
>> composed will vary from system to system, and may vary at
>> runtime. Already, there are ~20 different IOMMU map implementations in
>> the kernel. Had the Linux kernel had the VCMM, many of those
>> implementations could have leveraged the mapping and topology management
>> of a VCMM, while focusing on a few key hardware specific functions (map
>> this physical address, program the page table base register).
>>
>
> -- Sounds good.
> Did you think of a way to handle the cases where one of the devices
> that is using the mapped address crashes?
> How is the physical address unbacked in this case?
Actually the API takes care of that by design. Since the physical
space is managed apart from the mapper, the mapper can crash without
affecting the physical memory allocation.
Daniel Walker wrote:
> On Thu, 2010-07-01 at 15:00 -0700, Zach Pfeffer wrote:
>
>
>> Additionally, the current IOMMU interface does not allow users to
>> associate one page table with multiple IOMMUs unless the user explicitly
>> wrote a muxed device underneath the IOMMU interface. This also could be
>> done, but would have to be done for every such use case. Since the
>> particular topology is run-time configurable all of these use-cases and
>> more can be expressed without pushing the topology into the low-level
>> IOMMU driver.
>>
>> The VCMM takes the long view. It's designed for a future in which the
>> number of IOMMUs will go up and the ways in which these IOMMUs are
>> composed will vary from system to system, and may vary at
>> runtime. Already, there are ~20 different IOMMU map implementations in
>> the kernel. Had the Linux kernel had the VCMM, many of those
>> implementations could have leveraged the mapping and topology management
>> of a VCMM, while focusing on a few key hardware specific functions (map
>> this physical address, program the page table base register).
>
> So if we include this code, which "map implementations" could you
> collapse into this implementation? Generally, what currently existing
> code can VCMM help to eliminate?
In theory, it can eliminate all code that interoperates between IOMMU,
CPU and non-IOMMU based devices and all the mapping code, alignment,
mapping attribute and special block size support that's been
implemented.
On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote:
> Andi Kleen wrote:
> >> The VCMM provides a more abstract, global view with finer-grained
> >> control of each mapping a user wants to create. For instance, the
> >> semantics of iommu_map preclude its use in setting up just the IOMMU
> >> side of a mapping. With a one-sided map, two IOMMU devices can be
> >
> > Hmm? dma_map_* does not change any CPU mappings. It only sets up
> > DMA mapping(s).
>
> Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
> mappings, it sets up both the iommu and kernel buffer mappings.
Normally the data is already in the kernel or mappings, so why
would you need another CPU mapping too? Sometimes the CPU
code has to scatter-gather, but that is considered acceptable
(and if it really cannot be rewritten to support sg it's better
to have an explicit vmap operation)
In general on larger systems with many CPUs changing CPU mappings
also gets expensive (because you have to communicate with all cores),
and is not a good idea on frequent IO paths.
>
> >
> >> Additionally, the current IOMMU interface does not allow users to
> >> associate one page table with multiple IOMMUs unless the user explicitly
> >
> > That assumes that all the IOMMUs on the system support the same page table
> > format, right?
>
> Actually no. Since the VCMM abstracts a page-table as a Virtual
> Contiguous Region (VCM) a VCM can be associated with any device,
> regardless of their individual page table format.
But then there is no real page table sharing, is there?
The real information should be in the page tables, nowhere else.
> > The standard Linux approach to such a problem is to write
> > a library that drivers can use for common functionality, not put a middle
> > layer in between. Libraries are much more flexible than layers.
>
> That's true up to the "is this middle layer so useful that it's worth
> it" point. The VM is a middle layer, you could make the same argument
> about it, "the mapping code isn't too hard, just map in the memory
> that you need and be done with it". But the VM middle layer provides a
> clean separation between page frames and pages which turns out to be
Actually we use both PFNs and struct page *s in many layers up
and down, there's not really any layering in that.
-Andi
Andi Kleen wrote:
> On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote:
>> Andi Kleen wrote:
>>>> The VCMM provides a more abstract, global view with finer-grained
>>>> control of each mapping a user wants to create. For instance, the
>>>> semantics of iommu_map preclude its use in setting up just the IOMMU
>>>> side of a mapping. With a one-sided map, two IOMMU devices can be
>>> Hmm? dma_map_* does not change any CPU mappings. It only sets up
>>> DMA mapping(s).
>> Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
>> mappings, it sets up both the iommu and kernel buffer mappings.
>
> Normally the data is already in the kernel or mappings, so why
> would you need another CPU mapping too? Sometimes the CPU
> code has to scatter-gather, but that is considered acceptable
> (and if it really cannot be rewritten to support sg it's better
> to have an explicit vmap operation)
>
> In general on larger systems with many CPUs changing CPU mappings
> also gets expensive (because you have to communicate with all cores),
> and is not a good idea on frequent IO paths.
That's all true, but what a VCMM allows is for these trade-offs to be
made by the user for future systems. It may not be too expensive to
change the IO path around on future chips or the user may be okay with
the performance penalty. A VCMM doesn't enforce a policy on the user,
it lets the user make their own policy.
>>>> Additionally, the current IOMMU interface does not allow users to
>>>> associate one page table with multiple IOMMUs unless the user explicitly
>>> That assumes that all the IOMMUs on the system support the same page table
>>> format, right?
>> Actually no. Since the VCMM abstracts a page-table as a Virtual
>> Contiguous Region (VCM) a VCM can be associated with any device,
>> regardless of their individual page table format.
>
> But then there is no real page table sharing, is there?
> The real information should be in the page tables, nowhere else.
Yeah, and the implementation ensures that. The VCMM just adds a few
fields like start_addr, len and the device. The device still manages
its page-tables.
>>> The standard Linux approach to such a problem is to write
>>> a library that drivers can use for common functionality, not put a middle
>>> layer in between. Libraries are much more flexible than layers.
>> That's true up to the "is this middle layer so useful that it's worth
>> it" point. The VM is a middle layer, you could make the same argument
>> about it, "the mapping code isn't too hard, just map in the memory
>> that you need and be done with it". But the VM middle layer provides a
>> clean separation between page frames and pages which turns out to be
>
> Actually we use both PFNs and struct page *s in many layers up
> and down, there's not really any layering in that.
Sure, but the PFNs and the struct page *s are the middle layer. It's
just that things haven't been layered on top of them. A VCMM is the
higher level abstraction, since it allows the size of the PFs to vary
and the consumers of the VCMs to be determined at run-time.
Andi Kleen wrote:
> The standard Linux approach to such a problem is to write
> a library that drivers can use for common functionality, not put a middle
> layer in between. Libraries are much more flexible than layers.
I've been thinking about this statement. It's very true. I use the
genalloc lib which is a great piece of software to manage VCMs
(domains in linux/iommu.h parlance?).
On our hardware we have 3 things we have to do: use the minimum set of
mappings to map a buffer because of the extremely small TLBs in all the
IOMMUs we have to support, use special virtual alignments and direct
various multimedia flows through certain IOMMUs. To support this we:
1. Use the genalloc lib to allocate virtual space for our IOMMUs,
allowing virtual alignment to be specified.
2. Have a maxmunch allocator that manages our own physical pool.
I think I may be able to support this using the iommu interface and
some util functions. The big thing that's lost is the unified topology
management, but as demonstrated that may fall out from a refactor.
Anyhow, sounds like a few things to try. Thanks for the feedback so
far. I'll do some refactoring and see what's missing.
On Fri, Jul 02, 2010 at 12:09:02AM -0700, Zach Pfeffer wrote:
> Hari Kanigeri wrote:
> >> He demonstrated the usage of his code in one of the emails he sent out
> >> initially. Did you go over that, and what (or how many) steps would you
> >> use with the current code to do the same thing?
> >
> > -- So is this patch set adding layers and abstractions to help the user?
> >
> > If the idea is to share some memory across multiple devices, I guess
> > you can achieve the same by calling the map function provided by iommu
> > module and sharing the mapped address to the 10's or 100's of devices
> > to access the buffers. You would only need a dedicated virtual pool
> > per IOMMU device to manage its virtual memory allocations.
>
> Yeah, you can do that. My idea is to get away from explicit addressing
> and encapsulate the "device address to physical address" link into a
> mapping.
The DMA-API already does this with the help of IOMMUs if they are
present. What is the benefit of your approach over that?
Joerg
On Thu, Jul 01, 2010 at 03:00:17PM -0700, Zach Pfeffer wrote:
> Additionally, the current IOMMU interface does not allow users to
> associate one page table with multiple IOMMUs [...]
That's not true. Multiple IOMMUs are completely handled by the IOMMU
drivers. In the case of the IOMMU-API backend drivers this also includes
the ability to use page-tables on multiple IOMMUs.
> Since the particular topology is run-time configurable all of these
> use-cases and more can be expressed without pushing the topology into
> the low-level IOMMU driver.
The IOMMU driver has to know about the topology anyway because it needs
to know which IOMMU it needs to program for a particular device.
> Already, there are ~20 different IOMMU map implementations in the
> kernel. Had the Linux kernel had the VCMM, many of those
> implementations could have leveraged the mapping and topology
> management of a VCMM, while focusing on a few key hardware specific
> functions (map this physical address, program the page table base
> register).
I partially agree here. All the IOMMU implementations in the Linux
kernel have a lot of functionality in common where code could be
shared. Work to share code has been done in the past by Fujita Tomonori
but there are more places to work on. I am just not sure if a new
front-end API is the right way to do this.
Joerg
On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote:
> Daniel Walker wrote:
> > So if we include this code, which "map implementations" could you
> > collapse into this implementation? Generally, what currently existing
> > code can VCMM help to eliminate?
>
> In theory, it can eliminate all code that interoperates between IOMMU,
> CPU and non-IOMMU based devices and all the mapping code, alignment,
> mapping attribute and special block size support that's been
> implemented.
That's a very abstract statement. Can you point to particular code files
and give a rough sketch how it could be improved using VCMM?
Joerg
On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote:
> Andi Kleen wrote:
> > Hmm? dma_map_* does not change any CPU mappings. It only sets up
> > DMA mapping(s).
>
> Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
> mappings, it sets up both the IOMMU and kernel buffer mappings.
What do you mean by kernel buffer mappings?
> > That assumes that all the IOMMUs on the system support the same page table
> > format, right?
>
> Actually no. Since the VCMM abstracts a page-table as a Virtual
> Contiguous Memory (VCM) region, a VCM can be associated with any device,
> regardless of their individual page table format.
The IOMMU-API abstracts a page-table as a domain which can also be
associated with any device (behind an iommu).
Joerg
Joerg Roedel wrote:
> On Fri, Jul 02, 2010 at 12:09:02AM -0700, Zach Pfeffer wrote:
>> Hari Kanigeri wrote:
>>>> He demonstrated the usage of his code in one of the emails he sent out
>>>> initially. Did you go over that, and what (or how many) steps would you
>>>> use with the current code to do the same thing?
>>> -- So is this patch set adding layers and abstractions to help the user?
>>>
>>> If the idea is to share some memory across multiple devices, I guess
>>> you can achieve the same by calling the map function provided by iommu
>>> module and sharing the mapped address to the 10's or 100's of devices
>>> to access the buffers. You would only need a dedicated virtual pool
>>> per IOMMU device to manage its virtual memory allocations.
>> Yeah, you can do that. My idea is to get away from explicit addressing
>> and encapsulate the "device address to physical address" link into a
>> mapping.
>
> The DMA-API already does this with the help of IOMMUs if they are
> present. What is the benefit of your approach over that?
The grist to the DMA-API mill is the opaque scatterlist. Each
scatterlist element brings together a physical address and a bus
address that may be different. The set of scatterlist elements
constitutes both the set of physical buffers and the mappings to those
buffers. My approach separates these two things into a struct physmem
which contains the set of physical buffers and a struct reservation
which contains the set of bus addresses (or device addresses). Each
element in the struct physmem may have a different length (without
resorting to chaining). A map call maps the one set to the other.
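In rough outline (field and call names here are an illustrative
sketch, not lifted verbatim from the patch set):

/* One physical chunk; chunks within a physmem may differ in size. */
struct phys_chunk {
        phys_addr_t     pa;
        size_t          len;
};

/* The set of physical buffers, carrying no device addresses. */
struct physmem {
        struct phys_chunk *chunks;
        int nr_chunks;
};

/* A run of device (bus) addresses, carrying no physical memory. */
struct reservation {
        unsigned long   dev_addr;
        size_t          len;
};

/* A map call such as
 *
 *      ret = vcm_back(res, physmem);
 *
 * binds one set to the other, so the same physmem can later back a
 * different reservation without reallocating the memory. */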
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
Joerg Roedel wrote:
> On Thu, Jul 01, 2010 at 03:00:17PM -0700, Zach Pfeffer wrote:
>> Additionally, the current IOMMU interface does not allow users to
>> associate one page table with multiple IOMMUs [...]
>
> That's not true. Multiple IOMMUs are completely handled by the IOMMU
> drivers. In the case of the IOMMU-API backend drivers this also includes
> the ability to use page-tables on multiple IOMMUs.
Yeah. I see that now.
>
>> Since the particular topology is run-time configurable all of these
>> use-cases and more can be expressed without pushing the topology into
>> the low-level IOMMU driver.
>
> The IOMMU driver has to know about the topology anyway because it needs
> to know which IOMMU it needs to program for a particular device.
Perhaps, but why not create a VCM which can be shared across all
mappers in the system? Why bury it in a device driver and make all
IOMMU device drivers manage their own virtual spaces? Practically
this would entail a minor refactor to the fledgling IOMMU interface:
adding associate and activate ops.
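Something like the following, say (a hypothetical extension of struct
iommu_ops; these two members are illustrative and not part of the
mainline interface):

struct iommu_ops {
        /* ... the existing map/unmap and domain management ops ... */

        /* Tie a shared virtual region (a VCM) to a device's IOMMU
         * context without touching the hardware yet. */
        int (*associate)(struct iommu_domain *domain, struct device *dev);

        /* Make the association live, e.g. by programming the IOMMU's
         * page-table base register for the device. */
        int (*activate)(struct iommu_domain *domain, struct device *dev);
};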
>
>> Already, there are ~20 different IOMMU map implementations in the
>> kernel. Had the Linux kernel had the VCMM, many of those
>> implementations could have leveraged the mapping and topology
>> management of a VCMM, while focusing on a few key hardware specific
>> functions (map this physical address, program the page table base
>> register).
>
> I partially agree here. All the IOMMU implementations in the Linux
> kernel have a lot of functionality in common where code could be
> shared. Work to share code has been done in the past by Fujita Tomonori
> but there are more places to work on. I am just not sure if a new
> front-end API is the right way to do this.
I don't really think it's a new front-end API. It's just an API that
allows easier mapping manipulation than the current APIs.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
Joerg Roedel wrote:
> On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote:
>> Daniel Walker wrote:
>
>>> So if we include this code which "map implementations" could you
>>> collapse into this implementation? Generally, what currently existing
>>> code can VCMM help to eliminate?
>> In theory, it can eliminate all code that interoperates between IOMMU,
>> CPU and non-IOMMU based devices and all the mapping code, alignment,
>> mapping attribute and special block size support that's been
>> implemented.
>
> That's a very abstract statement. Can you point to particular code files
> and give a rough sketch how it could be improved using VCMM?
I can. Not to single out a particular subsystem, but the video4linux
code contains interoperation code to abstract the difference between
sg buffers, vmalloc buffers and physically contiguous buffers. The
VCMM is an attempt to provide a framework where these and all the
other buffer types can be unified.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
Joerg Roedel wrote:
> On Thu, Jul 01, 2010 at 11:17:34PM -0700, Zach Pfeffer wrote:
>> Andi Kleen wrote:
>
>>> Hmm? dma_map_* does not change any CPU mappings. It only sets up
>>> DMA mapping(s).
>> Sure, but I was saying that iommu_map() doesn't just set up the IOMMU
>> mappings, it sets up both the IOMMU and kernel buffer mappings.
>
> What do you mean by kernel buffer mappings?
In-kernel mappings whose addresses can be dereferenced.
>
>
>>> That assumes that all the IOMMUs on the system support the same page table
>>> format, right?
>> Actually no. Since the VCMM abstracts a page-table as a Virtual
>> Contiguous Memory (VCM) region, a VCM can be associated with any device,
>> regardless of their individual page table format.
>
> The IOMMU-API abstracts a page-table as a domain which can also be
> associated with any device (behind an iommu).
It does, but only by convention. The domain member is just a big
catchall void *. It would be more useful to factor out a VCM
abstraction, with associated ops. As it stands, all IOMMU device driver
writers have to re-invent IOMMU virtual address management.
--
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Mon, 12 Jul 2010 22:46:59 -0700
Zach Pfeffer <[email protected]> wrote:
> Joerg Roedel wrote:
> > On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote:
> >> Daniel Walker wrote:
> >
> >>> So if we include this code which "map implementations" could you
> >>> collapse into this implementation? Generally, what currently existing
> >>> code can VCMM help to eliminate?
> >> In theory, it can eliminate all code that interoperates between IOMMU,
> >> CPU and non-IOMMU based devices and all the mapping code, alignment,
> >> mapping attribute and special block size support that's been
> >> implemented.
> >
> > That's a very abstract statement. Can you point to particular code files
> > and give a rough sketch how it could be improved using VCMM?
>
> I can. Not to single out a particular subsystem, but the video4linux
> code contains interoperation code to abstract the difference between
> sg buffers, vmalloc buffers and physically contiguous buffers. The
> VCMM is an attempt to provide a framework where these and all the
> other buffer types can be unified.
Why can't video4linux use the DMA API? Doing DMA with vmalloc'ed
buffers is something we should avoid (there are some exceptions,
like xfs).
> Why can't video4linux use the DMA API? Doing DMA with vmalloc'ed
> buffers is something we should avoid (there are some exceptions,
> like xfs).
Vmalloc is about the only API for creating virtually linear memory areas.
The video stuff really needs that to avoid lots of horrible special cases
when doing buffer processing and the like.
Pretty much each driver using it has a pair of functions 'rvmalloc' and
'rvfree', so given a proper "vmalloc_for_dma()" type interface they could
easily be switched.
Alan
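The pattern in question appears in several v4l drivers, roughly as
follows (a sketch; details vary per driver):

#include <linux/vmalloc.h>
#include <linux/mm.h>
#include <linux/string.h>

static void *rvmalloc(unsigned long size)
{
        void *mem;
        unsigned long adr;

        size = PAGE_ALIGN(size);
        mem = vmalloc_32(size);         /* virtually linear, DMA32-safe */
        if (!mem)
                return NULL;
        memset(mem, 0, size);
        /* Reserve each page so it can be mmap'ed to user space. */
        for (adr = (unsigned long)mem; size > 0;
             adr += PAGE_SIZE, size -= PAGE_SIZE)
                SetPageReserved(vmalloc_to_page((void *)adr));
        return mem;
}

static void rvfree(void *mem, unsigned long size)
{
        unsigned long adr;

        if (!mem)
                return;
        size = PAGE_ALIGN(size);
        for (adr = (unsigned long)mem; size > 0;
             adr += PAGE_SIZE, size -= PAGE_SIZE)
                ClearPageReserved(vmalloc_to_page((void *)adr));
        vfree(mem);
}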
On Tue, 13 Jul 2010 09:20:12 +0100
Alan Cox <[email protected]> wrote:
> > Why can't video4linux use the DMA API? Doing DMA with vmalloc'ed
> > buffers is something we should avoid (there are some exceptions,
> > like xfs).
>
> Vmalloc is about the only API for creating virtually linear memory areas.
> The video stuff really needs that to avoid lots of horrible special cases
> when doing buffer processing and the like.
>
> Pretty much each driver using it has a pair of functions 'rvmalloc' and
> 'rvfree', so given a proper "vmalloc_for_dma()" type interface they could
> easily be switched.
We already have helper functions for DMA with vmap pages,
flush_kernel_vmap_range and invalidate_kernel_vmap_range.
I think that the current DMA API with the above helper functions
should work well for drivers that want virtually linear large memory
areas (such as xfs).
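A minimal sketch of that pair in use around a transfer into a vmap'ed
buffer (the DMA submission itself is elided):

#include <linux/highmem.h>

static void vmap_dma_cycle(void *buf, int len)
{
        /* Before DMA: write back dirty lines in the kernel's vmap
         * alias so the device sees current data. */
        flush_kernel_vmap_range(buf, len);

        /* ... submit the DMA and wait for completion ... */

        /* After DMA from the device: discard lines the CPU may have
         * speculatively fetched through the vmap alias meanwhile. */
        invalidate_kernel_vmap_range(buf, len);
}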
On Tue, 13 Jul 2010 17:30:43 +0900
FUJITA Tomonori <[email protected]> wrote:
> On Tue, 13 Jul 2010 09:20:12 +0100
> Alan Cox <[email protected]> wrote:
>
> > > Why can't video4linux use the DMA API? Doing DMA with vmalloc'ed
> > > buffers is something we should avoid (there are some exceptions,
> > > like xfs).
> >
> > Vmalloc is about the only API for creating virtually linear memory areas.
> > The video stuff really needs that to avoid lots of horrible special cases
> > when doing buffer processing and the like.
> >
> > Pretty much each driver using it has a pair of functions 'rvmalloc' and
> > 'rvfree', so given a proper "vmalloc_for_dma()" type interface they could
> > easily be switched.
>
> We already have helper functions for DMA with vmap pages,
> flush_kernel_vmap_range and invalidate_kernel_vmap_range.
I'm not sure they help at all because the DMA user for these pages isn't
the video driver - it's the USB layer, and the USB layer isn't
specifically aware it is being passed vmap pages.
Alan
On Tue, 13 Jul 2010 09:42:44 +0100
Alan Cox <[email protected]> wrote:
> On Tue, 13 Jul 2010 17:30:43 +0900
> FUJITA Tomonori <[email protected]> wrote:
>
> > On Tue, 13 Jul 2010 09:20:12 +0100
> > Alan Cox <[email protected]> wrote:
> >
> > > > Why can't video4linux use the DMA API? Doing DMA with vmalloc'ed
> > > > buffers is something we should avoid (there are some exceptions,
> > > > like xfs).
> > >
> > > Vmalloc is about the only API for creating virtually linear memory areas.
> > > The video stuff really needs that to avoid lots of horrible special cases
> > > when doing buffer processing and the like.
> > >
> > > Pretty much each driver using it has a pair of functions 'rvmalloc' and
> > > 'rvfree', so given a proper "vmalloc_for_dma()" type interface they could
> > > easily be switched.
> >
> > We already have helper functions for DMA with vmap pages,
> > flush_kernel_vmap_range and invalidate_kernel_vmap_range.
>
> I'm not sure they help at all because the DMA user for these pages isn't
> the video driver - it's the USB layer, and the USB layer isn't
> specifically aware it is being passed vmap pages.
Drivers can tell the USB layer that these are vmapped buffers? Adding
something to struct urb? I might be totally wrong since I don't know
anything about the USB layer.
On Tue, Jul 13, 2010 at 05:45:39PM +0900, FUJITA Tomonori wrote:
> Drivers can tell the USB layer that these are vmapped buffers? Adding
> something to struct urb? I might be totally wrong since I don't know
> anything about the USB layer.
With non-DMA coherent aliasing caches, you need to know where the page
is mapped into the virtual address space, so you can deal with aliases.
You'd need to tell the USB layer about the other mappings of the page
which you'd like to be coherent (such as the vmalloc area - and there's
also the possible userspace mapping to think about too, but that's
a separate issue.)
I wonder if we should have had:
vmalloc_prepare_dma(void *, size_t, enum dma_direction)
vmalloc_finish_dma(void *, size_t, enum dma_direction)
rather than flush_kernel_vmap_range and invalidate_kernel_vmap_range,
which'd make their use entirely obvious.
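A minimal sketch, assuming the wrappers would simply dress up the
existing helpers (the names are the proposal above; the direction
argument uses the existing enum dma_data_direction):

#include <linux/highmem.h>
#include <linux/dma-mapping.h>

static inline void vmalloc_prepare_dma(void *vaddr, int size,
                                       enum dma_data_direction dir)
{
        /* Write back the vmalloc alias before the device touches
         * the underlying pages. */
        flush_kernel_vmap_range(vaddr, size);
}

static inline void vmalloc_finish_dma(void *vaddr, int size,
                                      enum dma_data_direction dir)
{
        /* Only DMA from the device can leave stale lines behind. */
        if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
                invalidate_kernel_vmap_range(vaddr, size);
}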
However, this brings up a question - how does the driver (eg, v4l, xfs)
which is preparing the buffer for another driver (eg, usb host, block
dev) know that DMA will be performed on the buffer rather than PIO?
That's a very relevant question, because for speculatively prefetching
CPUs, we need to invalidate caches after a DMA-from-device operation -
but if PIO-from-device happened, this would destroy data read from the
device.
That problem goes away if we decide that PIO drivers must have the same
apparent semantics as DMA drivers - in that data must end up beyond the
point of DMA coherency (eg, physical page) - but that's been proven to
be very hard to achieve, especially with block device drivers.
On Tue, Jul 13, 2010 at 02:59:08PM +0900, FUJITA Tomonori wrote:
> On Mon, 12 Jul 2010 22:46:59 -0700
> Zach Pfeffer <[email protected]> wrote:
>
> > Joerg Roedel wrote:
> > > On Fri, Jul 02, 2010 at 12:33:51AM -0700, Zach Pfeffer wrote:
> > >> Daniel Walker wrote:
> > >
> > >>> So if we include this code which "map implementations" could you
> > >>> collapse into this implementation? Generally, what currently existing
> > >>> code can VCMM help to eliminate?
> > >> In theory, it can eliminate all code that interoperates between IOMMU,
> > >> CPU and non-IOMMU based devices and all the mapping code, alignment,
> > >> mapping attribute and special block size support that's been
> > >> implemented.
> > >
> > > That's a very abstract statement. Can you point to particular code files
> > > and give a rough sketch how it could be improved using VCMM?
> >
> > I can. Not to single out a particular subsystem, but the video4linux
> > code contains interoperation code to abstract the difference between
> > sg buffers, vmalloc buffers and physically contiguous buffers. The
> > VCMM is an attempt to provide a framework where these and all the
> > other buffer types can be unified.
>
> Why can't video4linux use the DMA API? Doing DMA with vmalloc'ed
> buffers is something we should avoid (there are some exceptions,
> like xfs).
I'm not sure, but I know that it makes the distinction. From
video4linux/videobuf:
<media/videobuf-dma-sg.h> /* Physically scattered */
<media/videobuf-vmalloc.h> /* vmalloc() buffers */
<media/videobuf-dma-contig.h> /* Physically contiguous */
On Tue, 13 Jul 2010 10:02:23 +0100
Russell King - ARM Linux <[email protected]> wrote:
> On Tue, Jul 13, 2010 at 05:45:39PM +0900, FUJITA Tomonori wrote:
> > Drivers can tell the USB layer that these are vmapped buffers? Adding
> > something to struct urb? I might be totally wrong since I don't know
> > anything about the USB layer.
>
> With non-DMA coherent aliasing caches, you need to know where the page
> is mapped into the virtual address space, so you can deal with aliases.
>
> You'd need to tell the USB layer about the other mappings of the page
> which you'd like to be coherent (such as the vmalloc area - and there's
> also the possible userspace mapping to think about too, but that's
> a separate issue.)
>
> I wonder if we should have had:
>
> vmalloc_prepare_dma(void *, size_t, enum dma_direction)
> vmalloc_finish_dma(void *, size_t, enum dma_direction)
>
> rather than flush_kernel_vmap_range and invalidate_kernel_vmap_range,
> which'd make their use entirely obvious.
>
> However, this brings up a question - how does the driver (eg, v4l, xfs)
> which is preparing the buffer for another driver (eg, usb host, block
> dev) know that DMA will be performed on the buffer rather than PIO?
>
> That's a very relevant question, because for speculatively prefetching
> CPUs, we need to invalidate caches after a DMA-from-device operation -
> but if PIO-from-device happened, this would destroy data read from the
> device.
>
> That problem goes away if we decide that PIO drivers must have the same
> apparent semantics as DMA drivers - in that data must end up beyond the
> point of DMA coherency (eg, physical page) - but that's been proven to
> be very hard to achieve, especially with block device drivers.
Yeah, the last thing we want to do is convert all the PIO drivers.
It seems that we are far from the original discussion (most of the
above topics have been discussed in the past, and we kinda agree that
we have to do something some time, I guess).
Zach Pfeffer said this new VCM infrastructure can be useful for
video4linux. However, I don't think we need another 3,000-line
abstraction layer to solve video4linux's issues nicely.
I can't find any compelling reason to merge VCM; it seems that the
combination of the current APIs (or with some small extensions) can
work for the issues that VCM tries to solve.
On Mon, Jul 12, 2010 at 10:21:05PM -0700, Zach Pfeffer wrote:
> Joerg Roedel wrote:
> > The DMA-API already does this with the help of IOMMUs if they are
> > present. What is the benefit of your approach over that?
>
> The grist to the DMA-API mill is the opaque scatterlist. Each
> scatterlist element brings together a physical address and a bus
> address that may be different. The set of scatterlist elements
> constitutes both the set of physical buffers and the mappings to those
> buffers. My approach separates these two things into a struct physmem
> which contains the set of physical buffers and a struct reservation
> which contains the set of bus addresses (or device addresses). Each
> element in the struct physmem may have a different length (without
> resorting to chaining). A map call maps the one set to the other.
Okay, that's a different concept; where is the benefit?
Joerg
On Wed, Jul 14, 2010 at 09:34:03PM +0200, Joerg Roedel wrote:
> On Mon, Jul 12, 2010 at 10:21:05PM -0700, Zach Pfeffer wrote:
> > Joerg Roedel wrote:
>
> > > The DMA-API already does this with the help of IOMMUs if they are
> > > present. What is the benefit of your approach over that?
> >
> > The grist to the DMA-API mill is the opaque scatterlist. Each
> > scatterlist element brings together a physical address and a bus
> > address that may be different. The set of scatterlist elements
> > constitutes both the set of physical buffers and the mappings to those
> > buffers. My approach separates these two things into a struct physmem
> > which contains the set of physical buffers and a struct reservation
> > which contains the set of bus addresses (or device addresses). Each
> > element in the struct physmem may have a different length (without
> > resorting to chaining). A map call maps the one set to the other.
>
> Okay, that's a different concept; where is the benefit?
The benefit is that virtual address space and physical address space
are managed independently. This may be useful if you want to reuse the
same set of physical buffers: a user simply maps them when they're
needed. It also means that different physical memories could be
targeted, and a virtual allocation could map those memories without
worrying about where they were.
This whole concept is just a logical extension of the already existing
separation between pages and page frames... in fact the separation
between physical memory and what is mapped to that memory is
fundamental to the Linux kernel. This approach just says that
arbitrarily long buffers should work the same way.
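Concretely, the flow being argued for looks something like this; the
call names follow the VCM patch set's style, but the exact signatures
here are an illustrative sketch (see vcm.txt for the real interface),
with DEV_VA_START, DEV_VA_LEN, MEMTYPE_0 and BUF_LEN as placeholder
constants:

static void vcm_flow_sketch(void)
{
        struct vcm *vcm;        /* one device virtual space (sharable)  */
        struct physmem *mem;    /* the physical buffers, allocated once */
        struct res *res;        /* a reservation in the virtual space   */

        vcm = vcm_create(DEV_VA_START, DEV_VA_LEN);
        mem = vcm_phys_alloc(MEMTYPE_0, BUF_LEN, 0 /* attrs */);

        res = vcm_reserve(vcm, BUF_LEN, 0 /* attrs */);
        vcm_back(res, mem);             /* bind physical to virtual */

        /* ... the device works through the mapping ... */

        vcm_unback(res, mem);           /* unbind; mem survives and can */
        vcm_unreserve(res);             /* back another reservation     */
        vcm_phys_free(mem);
        vcm_free(vcm);
}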
On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote:
> On Tue, 13 Jul 2010 10:02:23 +0100
>
> Zach Pfeffer said this new VCM infrastructure can be useful for
> video4linux. However, I don't think we need another 3,000-line
> abstraction layer to solve video4linux's issues nicely.
It's only 3000 lines because I haven't converted the code to use
function pointers.
> I can't find any reasonable reasons that we need to merge VCM; seems
> that the combination of the current APIs (or with some small
> extensions) can work for the issues that VCM tries to solve.
Getting back to the point. There is no API that can handle large
buffer allocation and sharing with low-level attribute control for
virtual address spaces outside the CPU. At this point, if you need to
work with big buffers (1 MB, 16 MB, etc.) and map those big buffers
to non-CPU virtual spaces, you need to explicitly carve them out and
set up the mappings and sharing by hand. It's reasonable to have an API
that can do this, especially since IOMMUs are going to become more
prevalent. The DMA API et al. take a CPU-centric view of virtual space
management; sharing has to be explicitly written, and external virtual
space management is left up to device driver writers. Given a system
where each device has an IOMMU or an MMU, the whole concept of a
scatterlist goes away. The VCM API gets a jump on it.
On Wed, 21 Jul 2010 20:50:26 -0700
Zach Pfeffer <[email protected]> wrote:
> On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote:
> > On Tue, 13 Jul 2010 10:02:23 +0100
> >
> > Zach Pfeffer said this new VCM infrastructure can be useful for
> > video4linux. However, I don't think we need another 3,000-line
> > abstraction layer to solve video4linux's issues nicely.
>
> It's only 3000 lines because I haven't converted the code to use
> function pointers.
The main point is that this adds a new abstraction that doesn't
provide a huge benefit.
On Wed, Jul 21, 2010 at 08:50:26PM -0700, Zach Pfeffer wrote:
> On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote:
> > On Tue, 13 Jul 2010 10:02:23 +0100
> >
> > Zach Pfeffer said this new VCM infrastructure can be useful for
> > video4linux. However, I don't think we need another 3,000-line
> > abstraction layer to solve video4linux's issues nicely.
>
> It's only 3000 lines because I haven't converted the code to use
> function pointers.
I don't understand - you've made this claim a couple of times. I
can't see how converting the code to use function pointers (presumably
to eliminate those switch statements) would reduce the number of lines
of code.
Please explain (or show via new patches) how converting this to
function pointers significantly reduces the number of lines of code.
We might then be able to put just _one_ of these issues to bed.
> Getting back to the point. There is no API that can handle large
> buffer allocation and sharing with low-level attribute control for
> virtual address spaces outside the CPU.
I think we've dealt with the attribute issue to death now. Shall we
repeat it again?
> The DMA API et al. take a CPU-centric view of virtual space
> management; sharing has to be explicitly written, and external virtual
> space management is left up to device driver writers.
I think I've also shown that not to be the case with example code.
The code behind the DMA API can be changed on a per-device basis
(currently on ARM we haven't supported that because no one's asked
for it yet) so that it can support multiple IOMMUs even of multiple
different types.
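The per-device substitution described above corresponds, roughly, to
the generic dma_map_ops mechanism (my_iommu_dma_ops below is a
hypothetical IOMMU-backed implementation of the DMA-API entry points):

#include <linux/device.h>
#include <linux/dma-mapping.h>

extern const struct dma_map_ops my_iommu_dma_ops;

static void use_iommu_dma_ops(struct device *dev)
{
        /* From here on, dma_map_single()/dma_map_sg() calls on this
         * device are routed through the IOMMU-aware implementation
         * rather than the direct one. */
        set_dma_ops(dev, &my_iommu_dma_ops);
}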
On Thu, Jul 22, 2010 at 08:51:51AM +0100, Russell King - ARM Linux wrote:
> On Wed, Jul 21, 2010 at 08:50:26PM -0700, Zach Pfeffer wrote:
> > On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote:
> > > On Tue, 13 Jul 2010 10:02:23 +0100
> > >
> > > Zach Pfeffer said this new VCM infrastructure can be useful for
> > > video4linux. However, I don't think we need another 3,000-line
> > > abstraction layer to solve video4linux's issues nicely.
> >
> > It's only 3000 lines because I haven't converted the code to use
> > function pointers.
>
> I don't understand - you've made this claim a couple of times. I
> can't see how converting the code to use function pointers (presumably
> to eliminate those switch statements) would reduce the number of lines
> of code.
>
> Please explain (or show via new patches) how converting this to
> function pointers significantly reduces the number of lines of code.
>
> We might then be able to put just _one_ of these issues to bed.
Aye. It's getting worked on. Once it's done I'll push it.
>
> > Getting back to the point. There is no API that can handle large
> > buffer allocation and sharing with low-level attribute control for
> > virtual address spaces outside the CPU.
>
> I think we've dealt with the attribute issue to death now. Shall we
> repeat it again?
I think the only point of agreement is that all mappings must have
compatible attributes; the issue of multiple mappings is still
outstanding, as is the need for more fine-grained control of the
attributes of a set of compatible mappings (I still need to digest
your examples a little).
>
> > The DMA API et al. take a CPU-centric view of virtual space
> > management; sharing has to be explicitly written, and external virtual
> > space management is left up to device driver writers.
>
> I think I've also shown that not to be the case with example code.
>
> The code behind the DMA API can be changed on a per-device basis
> (currently on ARM we haven't supported that because no one's asked
> for it yet) so that it can support multiple IOMMUs even of multiple
> different types.
I'm seeing that now. As I become more familiar with the DMA API the
way forward may become more clear to me. I certainly appreciate the
time you've spent discussing things and the code examples you've
listed. For example, it's fairly clear how I can use a scatterlist to
describe a mapping of big buffers. I can start down this path and see
what shakes out.
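That path might start out like this (a sketch: describe one large,
page-based buffer with a scatterlist and hand it to the DMA API, which
may merge the entries behind an IOMMU):

#include <linux/scatterlist.h>
#include <linux/dma-mapping.h>

static int map_big_buffer(struct device *dev, struct page **pages,
                          int npages)
{
        struct scatterlist *sg;
        struct sg_table sgt;
        int i, nents;

        if (sg_alloc_table(&sgt, npages, GFP_KERNEL))
                return -ENOMEM;

        for_each_sg(sgt.sgl, sg, npages, i)
                sg_set_page(sg, pages[i], PAGE_SIZE, 0);

        nents = dma_map_sg(dev, sgt.sgl, npages, DMA_BIDIRECTIONAL);
        if (!nents) {
                sg_free_table(&sgt);
                return -EIO;
        }

        /* Program the device from sg_dma_address()/sg_dma_len(). */
        return nents;
}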
On Thu, Jul 22, 2010 at 01:47:36PM +0900, FUJITA Tomonori wrote:
> On Wed, 21 Jul 2010 20:50:26 -0700
> Zach Pfeffer <[email protected]> wrote:
>
> > On Wed, Jul 14, 2010 at 10:59:43AM +0900, FUJITA Tomonori wrote:
> > > On Tue, 13 Jul 2010 10:02:23 +0100
> > >
> > > Zach Pfeffer said this new VCM infrastructure can be useful for
> > > video4linux. However, I don't think we need another 3,000-line
> > > abstraction layer to solve video4linux's issues nicely.
> >
> > It's only 3000 lines because I haven't converted the code to use
> > function pointers.
>
> The main point is that this adds a new abstraction that doesn't
> provide a huge benefit.
I disagree. In its current form the API may not be appropriate for
inclusion into the kernel, but it provides a common framework for
handling a class of problems that have been solved many times in the
kernel: large buffer management, IOMMU interoperation and fine-grained
mapping control.