Date: Wed, 30 Jun 2010 16:40:58 -0700
From: Randy Dunlap
To: Zach Pfeffer
Cc: mel@csn.ul.ie, andi@firstfloor.org, dwalker@codeaurora.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-arm-msm@vger.kernel.org, linux-omap@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC 1/3] mm: iommu: An API to unify IOMMU, CPU and device memory management
Message-Id: <20100630164058.aa6aa3a2.randy.dunlap@oracle.com>
In-Reply-To: <1277877350-2147-1-git-send-email-zpfeffer@codeaurora.org>
References: <1277877350-2147-1-git-send-email-zpfeffer@codeaurora.org>

On Tue, 29 Jun 2010 22:55:48 -0700 Zach Pfeffer wrote:

> This patch contains the documentation for the API, termed the Virtual
> Contiguous Memory Manager. Its use would allow all of the IOMMU to VM,
> VM to device and device to IOMMU interoperation code to be refactored
> into platform independent code.
>
> Comments, suggestions and criticisms are welcome and wanted.
>
> Signed-off-by: Zach Pfeffer
> ---
>  Documentation/vcm.txt |  583 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 583 insertions(+), 0 deletions(-)
>  create mode 100644 Documentation/vcm.txt
>
> diff --git a/Documentation/vcm.txt b/Documentation/vcm.txt
> new file mode 100644
> index 0000000..d29c757
> --- /dev/null
> +++ b/Documentation/vcm.txt
> @@ -0,0 +1,583 @@
> +What is this document about?
> +============================
> +
> +This document covers how to use the Virtual Contiguous Memory Manager
> +(VCMM), how the first implementation works with a specific low-level
> +Input/Output Memory Management Unit (IOMMU) and the way the VCMM is used
> +from user-space. It also contains a section that describes why something
> +like the VCMM is needed in the kernel.
> +
> +If anything in this document is wrong please send patches to the

                                 is wrong,

> +maintainer of this file, listed at the bottom of the document.
> +
> +
> +The Virtual Contiguous Memory Manager
> +=====================================
> +
> +The VCMM was built to solve the system-wide memory mapping issues that
> +occur when many bus-masters have IOMMUs.
> +
> +An IOMMU maps device addresses to physical addresses. It also insulates
> +the system from spurious or malicious device bus transactions and allows
> +fine-grained mapping attribute control. The Linux kernel core does not
> +contain a generic API to handle IOMMU mapped memory; device driver writers
> +must implement device specific code to interoperate with the Linux kernel
> +core. As the number of IOMMUs increases, coordinating the many address
> +spaces mapped by all discrete IOMMUs becomes difficult without in-kernel
> +support.
> +
> +The VCMM API enables device independent IOMMU control, virtual memory
> +manager (VMM) interoperation and non-IOMMU enabled device interoperation
> +by treating devices with or without IOMMUs and all CPUs with or without
> +MMUs, their mapping contexts and their mappings using common
> +abstractions. Physical hardware is given a generic device type and mapping
> +contexts are abstracted into Virtual Contiguous Memory (VCM)
> +regions. Users "reserve" memory from VCMs and "back" their reservations
> +with physical memory.
> +
> +Why the VCMM is Needed
> +----------------------
> +
> +Driver writers who control devices with IOMMUs must contend with device
> +control and memory management. Driver writers have a large device driver
> +API that they can leverage to control their devices, but they are lacking
> +a unified API to help them program mappings into IOMMUs and share those
> +mappings with other devices and CPUs in the system.
> +
> +Sharing is complicated by Linux's CPU centric VMM. The CPU centric model

                                    CPU-centric          CPU-centric

> +generally makes sense because average hardware only contains a MMU for the
> +CPU and possibly a graphics MMU. If every device in the system has one or
> +more MMUs the CPU centric memory management (MM) programming model breaks

                 ditto

> +down.
> +
> +Abstracting IOMMU device programming into a common API has already begun
> +in the Linux kernel. It was built to abstract the difference between AMDs

                                                                       AMD's (or just "AMD")

> +and Intels IOMMUs to support x86 virtualization on both platforms. The

       Intel's (or just "Intel")

> +interface is listed in kernel/include/linux/iommu.h. It contains

                          drop "kernel/"

> +interfaces for mapping and unmapping as well as domain management. This
> +interface has not gained widespread use outside the x86; PA-RISC, Alpha
> +and SPARC architectures and ARM and PowerPC platforms all use their own
> +mapping modules to control their IOMMUs. The VCMM contains an IOMMU
> +programming layer, but since its abstraction supports map management
> +independent of device control, the layer is not used directly. This
> +higher-level view enables a new kernel service, not just an IOMMU
> +interoperation layer.
> +
> +The General Idea: Map Management using Graphs
> +---------------------------------------------
> +
> +Looking at mapping from a system-wide perspective reveals a general graph
> +problem. The VCMMs API is built to manage the general mapping graph. Each

              VCMM's

> +node that talks to memory, either through an MMU or directly (physically
> +mapped) can be thought of as the device-end of a mapping edge. The other
> +edge is the physical memory (or intermediate virtual space) that is
> +mapped.
> +
> +In the direct mapped case the device is assigned a one-to-one MMU. This

         direct-mapped

> +scheme allows direct mapped devices to participate in general graph
> +management.
> +
> +The CPU nodes can also be brought under the same mapping abstraction with
> +the use of a light overlay on the existing VMM. This light overlay allows
> +VMM managed mappings to interoperate with the common API. The light

   VMM-managed

> +overlay enables this without substantial modifications to the existing
> +VMM.
> +
> +In addition to CPU nodes that are running Linux (and the VMM), remote CPU
> +nodes that may be running other operating systems can be brought into the
> +general abstraction. Routing all memory management requests from a remote
> +node through the central memory management framework enables new features
> +like system-wide memory migration. This feature may only be feasible for
> +large buffers that are managed outside of the fast-path, but having remote
> +allocation in a system enables features that are impossible to build
> +without it.
> +
> +The fundamental objects that support graph-based map management are:
> +
> +1) Virtual Contiguous Memory Regions
> +
> +2) Reservations
> +
> +3) Associated Virtual Contiguous Memory Regions
> +
> +4) Memory Targets
> +
> +5) Physical Memory Allocations
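
As a reader aid, here is how I map those five objects onto the
create/destroy calls that the rest of this document introduces. The
pairings are my summary of the later sections, not text from the patch:

        /*
         * 1) VCM region:      vcm_create() or vcm_create_from_prebuilt(),
         *                     destroyed with vcm_free()
         * 2) Reservation:     vcm_reserve() / vcm_unreserve()
         * 3) Associated VCM:  vcm_assoc() / vcm_deassoc(), switched on and
         *                     off with vcm_activate() / vcm_deactivate()
         * 4) Memory Target:   named by the memtype argument of
         *                     vcm_phys_alloc()
         * 5) Physical Memory
         *    Allocation:      vcm_phys_alloc() / vcm_phys_free(), attached
         *                     to a Reservation with vcm_back() / vcm_unback()
         */
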
> +
> +Usage Overview
> +--------------
> +
> +In a nut-shell, users allocate Virtual Contiguous Memory Regions and

      nutshell,

> +associate those regions with one or more devices by creating an Associated
> +Virtual Contiguous Memory Region. Users then create Reservations from the
> +Virtual Contiguous Memory Region. At this point no physical memory has
> +been committed to the reservation. To associate physical memory with a
> +reservation a Physical Memory Allocation is created and the Reservation is
> +backed with this allocation.
> +
> +include/linux/vcm.h includes comments documenting each API.
> +
> +Virtual Contiguous Memory Regions
> +---------------------------------
> +
> +A Virtual Contiguous Memory Region (VCM) abstracts the memory space a
> +device sees. The addresses of the region are only used by the devices
> +which are associated with the region. This address space would normally be
> +implemented as a device page-table.

                            page table.

> +
> +A VCM is created and destroyed with three functions:
> +
> +  struct vcm *vcm_create(size_t start_addr, size_t len);

Seems odd to use size_t for start_addr.

> +
> +  struct vcm *vcm_create_from_prebuilt(size_t ext_vcm_id);
> +
> +  int vcm_free(struct vcm *vcm);
> +
> +start_addr is an offset into the address space where allocations will
> +start from. len is the length from start_addr of the VCM. Both functions
> +generate an instance of a VCM.
> +
> +ext_vcm_id is used to pass a request to the VMM to generate a VCM
> +instance. In the current implementation the call simply makes a note that
> +the VCM instance is a VMM VCM instance for other interfaces' usage. This
> +muxing is seen throughout the implementation.
> +
> +vcm_create() and vcm_create_from_prebuilt() produce VCM instances for
> +virtually mapped devices (IOMMUs and CPUs). To create a one-to-one mapped
> +VCM users pass the start_addr and len of the physical region. The VCMM

   VCM,

> +matches this and records that the VCM instance is a one-to-one VCM.
> +
> +The newly created VCM instance can be passed to any function that needs to
> +operate on or with a virtual contiguous memory region. Its main attributes
> +are a start_addr and a len as well as an internal setting that allows the
> +implementation to mux between true virtual spaces, one-to-one mapped
> +spaces and VMM managed spaces.
> +
> +The current implementation uses the genalloc library to manage the VCM for
> +IOMMU devices. Return values and more in-depth per-function documentation
> +for these and the ones listed below are in include/linux/vcm.h.
> +
> +Reservations
> +------------
> +
> +A Reservation is a contiguous region allocated from a VCM. There is no
> +physical memory associated with it.
> +
> +A Reservation is created and destroyed with:
> +
> +  struct res *vcm_reserve(struct vcm *vcm, size_t len, uint32_t attr);

                                                         s/uint32_t/u32/ ?

> +
> +  int vcm_unreserve(struct res *res);
> +
> +A vcm is a VCM created above. len is the length of the request. It can be
> +up to the length of the VCM region the reservation is being created
> +from. attr are mapping attributes: read, write, execute, user, supervisor,
> +secure, not-cached, write-back/write-allocate, write-back/no
> +write-allocate, write-through. These attrs are appropriate for ARM but can
> +be changed to match any architecture.
> +
> +The implementation calls gen_pool_alloc() for IOMMU devices,
> +alloc_vm_area() for VMM areas and is a pass through for one-to-one mapped

                                         pass-through

> +areas.
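
To make the reserve/unreserve contract concrete, a sketch. The VCM_ATTR_*
flag names are invented here for illustration (the document only lists the
attribute meanings), and the NULL-on-failure / 0-on-success conventions are
assumptions that would have to be checked against include/linux/vcm.h:

        /* Sketch only: VCM_ATTR_* are hypothetical names for the listed
         * mapping attributes, not definitions from the VCM headers. */
        static struct res *make_window(struct vcm *vcm, size_t len)
        {
                struct res *res;

                /* a read/write, not-cached window in this VCM */
                res = vcm_reserve(vcm, len, VCM_ATTR_READ |
                                  VCM_ATTR_WRITE | VCM_ATTR_NONCACHED);
                if (!res)
                        return NULL;    /* assuming NULL means no space */

                return res;             /* back it later with vcm_back() */
        }

        static void drop_window(struct res *res)
        {
                if (vcm_unreserve(res)) /* assuming 0 on success */
                        printk(KERN_WARNING "vcm: unreserve failed\n");
        }
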
> +
> +Associated Virtual Contiguous Memory Regions and Activation
> +-----------------------------------------------------------
> +
> +An Associated Virtual Contiguous Memory Region (AVCM) is a mapping of a
> +VCM to a device. The mapping can be active or inactive.
> +
> +An AVCM is managed with:
> +
> +  struct avcm *vcm_assoc(struct vcm *vcm, size_t dev, uint32_t attr);
> +
> +  int vcm_deassoc(struct avcm *avcm);
> +
> +  int vcm_activate(struct avcm *avcm);
> +
> +  int vcm_deactivate(struct avcm *avcm);
> +
> +A VCM instance is a VCM created above. A dev is an opaque device handle
> +that's passed down to the device driver the VCMM muxes in to handle a
> +request. attr are association attributes: split, use-high or

size_t used for an opaque device handle seems odd.

> +use-low. split controls which transactions hit a high-address page-table
> +and which transactions hit a low-address page-table. For instance, all
> +transactions whose most significant address bit is one would use the
> +high-address page-table, any other transaction would use the low address
> +page-table. This scheme is ARM specific and could be changed in other

                              ARM-specific

> +architectures. One VCM instance can be associated with many devices and
> +many VCM instances can be associated with one device.
> +
> +An AVCM is only a link. To program and deprogram a device with a VCM the
> +user calls vcm_activate() and vcm_deactivate().For IOMMU devices,

                              vcm_deactivate(). For

> +activating a mapping programs the base address of a page-table into an

                                                     page table

> +IOMMU. For VMM and one-to-one based devices, mappings are active
> +immediately but the API does require an activation call for them for
> +internal reference counting.
> +
> +Memory Targets
> +--------------
> +
> +A Memory Target is a platform independent way of specifying a physical
> +pool; it abstracts a pool of physical memory. The physical memory pool may
> +be physically discontiguous, need to be allocated from in a unique way or
> +have other user-defined attributes.
> +
> +Physical Memory Allocation and Reservation Backing
> +--------------------------------------------------
> +
> +Physical memory is allocated as a separate step from reserving
> +memory. This allows multiple reservations to back the same physical
> +memory.
> +
> +A Physical Memory Allocation is managed using the following functions:
> +
> +  struct physmem *vcm_phys_alloc(enum memtype_t memtype, size_t len,
> +                                 uint32_t attr);
> +
> +  int vcm_phys_free(struct physmem *physmem);
> +
> +  int vcm_back(struct res *res, struct physmem *physmem);
> +
> +  int vcm_unback(struct res *res);
> +
> +attr can include an alignment request, a specification to map memory using
> +various block sizes and/or to use physically contiguous memory. memtype is
> +one of the memory types listed in Memory Targets.
> +
> +The current implementation manages two pools of memory. One pool is a
> +contiguous block of memory and the other is a set of contiguous block
> +pools. In the current implementation the block pools contain 4K, 64K and
> +1M blocks. The physical allocator does not try to split blocks from the
> +contiguous block pools to satisfy requests.
> +
> +The use of 4K, 64K and 1M blocks solves a problem with some IOMMU
> +hardware. IOMMUs are placed in front of multimedia engines to provide a
> +contiguous address space to the device. Multimedia devices need large
> +buffers and large buffers may map to a large number of physical
> +blocks. IOMMUs tend to have small translation lookaside buffers
> +(TLBs). Since the TLB is small the number of physical blocks that map a
> +given range needs to be small or else the IOMMU will continually fetch new
> +translations during a typical streamed multimedia flow. By using a 1 MB
> +mapping (or 64K mapping) instead of a 4K mapping the number of misses can
> +be minimized, allowing the multimedia block to meet its performance goals.
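
The decoupling of allocation from reservation is what lets several
reservations share one allocation. A sketch of that pattern, assuming
NULL-on-failure for the pointer-returning calls and 0-on-success for the
int-returning ones (the real conventions are in include/linux/vcm.h):

        /* Sketch: back two reservations, in two different VCMs, with the
         * same physical allocation. */
        static int share_buffer(struct vcm *vcm_dev, struct vcm *vcm_cpu,
                                enum memtype_t memtype, uint32_t attr)
        {
                struct physmem *physmem;
                struct res *res_dev, *res_cpu;

                physmem = vcm_phys_alloc(memtype, SZ_1M, attr);
                if (!physmem)
                        return -ENOMEM;

                res_dev = vcm_reserve(vcm_dev, SZ_1M, attr);
                res_cpu = vcm_reserve(vcm_cpu, SZ_1M, attr);
                if (!res_dev || !res_cpu)
                        goto err;

                /* Both reservations now name the same physical blocks. */
                if (vcm_back(res_dev, physmem) || vcm_back(res_cpu, physmem))
                        goto err; /* full version would vcm_unback() here */

                return 0;

        err:
                if (res_cpu)
                        vcm_unreserve(res_cpu);
                if (res_dev)
                        vcm_unreserve(res_dev);
                vcm_phys_free(physmem);
                return -ENOMEM;
        }
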
> +
> +Low Level Control
> +-----------------
> +
> +It is necessary in some instances to access attributes and provide
> +higher-level control of the low-level hardware abstraction. The API
> +contains many functions for this task but the two that are typically used
> +are:
> +
> +  size_t vcm_get_dev_addr(struct res *res);
> +
> +  int vcm_hook(size_t dev, vcm_handler handler, void *data);
> +
> +The first function, vcm_get_dev_addr() returns a device address given a
> +reservation. This device address is a virtual IOMMU address for
> +reservations on IOMMU VCMs, a virtual VMM address for reservations on VMM
> +VCMs and a virtual (really physical since it's one-to-one mapped) address
> +for one-to-one devices.
> +
> +The second function, vcm_hook allows a caller in the kernel to register a

                       vcm_hook,

> +user_handler. The handler is passed the data member passed to vcm_hook
> +during a fault. The user can return 1 to indicate that the underlying
> +driver should handle the fault and retry the transaction or they can
> +return 0 to halt the transaction. If the user doesn't register a handler
> +the low-level driver will print a warning and terminate the transaction.
> +
> +A Detailed Walk Through
> +-----------------------
> +
> +The following call sequence walks through a typical allocation
> +sequence. In the first stage the memory for a device is reserved and
> +backed. This occurs without mapping the memory into a VMM VCM region. The
> +second stage maps the first VCM region into a VMM VCM region so the kernel
> +can read or write it. The second stage is not necessary if the VMM does
> +not need to read or modify the contents of the original mapping.
> +
> +  Stage 1: Map and Allocate Memory for a Device
> +
> +  The call sequence starts by creating a VCM region:
> +
> +    vcm = vcm_create(start_addr, len);
> +
> +  The next call associates a VCM region with a device:
> +
> +    avcm = vcm_assoc(vcm, dev, attr);
> +
> +  To activate the association users call vcm_activate() on the avcm from

                           association,

> +  the associate call. This programs the underlying device with the
> +  mappings.
> +
> +    ret = vcm_activate(avcm);
> +
> +  Once a VCM region is created and associated it can be reserved from
> +  with:
> +
> +    res = vcm_reserve(vcm, res_len, res_attr);
> +
> +  A user then allocates physical memory with:
> +
> +    physmem = vcm_phys_alloc(memtype, len, phys_attr);
> +
> +  To back the reservation with the physical memory allocation the user
> +  calls:
> +
> +    vcm_back(res, physmem);
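
Pulling stage 1 together with the error unwinding the text leaves
implicit. This is a sketch, not the author's code: it assumes NULL means
failure for the pointer-returning calls and a negative errno for the
int-returning ones, and it tears objects down in the reverse order they
were set up. A real caller would also keep the vcm, avcm and physmem
handles around for later teardown:

        static struct res *stage1_setup(size_t start_addr, size_t len,
                                        size_t dev, uint32_t attr,
                                        enum memtype_t memtype,
                                        size_t res_len)
        {
                struct vcm *vcm;
                struct avcm *avcm;
                struct res *res;
                struct physmem *physmem;

                vcm = vcm_create(start_addr, len);
                if (!vcm)
                        return NULL;

                avcm = vcm_assoc(vcm, dev, attr);
                if (!avcm)
                        goto free_vcm;

                if (vcm_activate(avcm))
                        goto deassoc;

                res = vcm_reserve(vcm, res_len, attr);
                if (!res)
                        goto deactivate;

                physmem = vcm_phys_alloc(memtype, res_len, attr);
                if (!physmem)
                        goto unreserve;

                if (vcm_back(res, physmem))
                        goto free_phys;

                return res;

        free_phys:
                vcm_phys_free(physmem);
        unreserve:
                vcm_unreserve(res);
        deactivate:
                vcm_deactivate(avcm);
        deassoc:
                vcm_deassoc(avcm);
        free_vcm:
                vcm_free(vcm);
                return NULL;
        }
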
> +
> +  Stage 2: Map the Device's Memory into the VMM's VCM region
> +
> +  If the VMM needs to read and/or write the region that was just created

                                                               created,

> +  the following calls are made.
> +
> +  The first call creates a prebuilt VCM with:
> +
> +    vcm_vmm = vcm_create_from_prebuilt(ext_vcm_id);
> +
> +  The prebuilt VCM is associated with the CPU device and activated with:
> +
> +    avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr);
> +    vcm_activate(avcm_vmm);
> +
> +  A reservation is made on the VMM VCM with:
> +
> +    res_vmm = vcm_reserve(vcm_vmm, res_len, attr);
> +
> +  Finally, once the topology has been set up a vcm_back() allows the VMM
> +  to read the memory using the physmem generated in stage 1:
> +
> +    vcm_back(res_vmm, physmem);
> +
> +Mapping IOMMU, one-to-one and VMM Reservations
> +----------------------------------------------
> +
> +The following example demonstrates mapping IOMMU, one-to-one and VMM
> +reservations to the same physical memory. It shows the use of phys_addr
> +and phys_size to create a contiguous VCM for one-to-one mapped devices.
> +
> +  The user allocates physical memory:
> +
> +    physmem = vcm_phys_alloc(memtype, SZ_2MB + SZ_4K, CONTIGUOUS);
> +
> +  Creates an IOMMU VCM:
> +
> +    vcm_iommu = vcm_create(SZ_1K, SZ_16M);
> +
> +  Creates an one-to-one VCM:

           a one-to-one

> +
> +    vcm_onetoone = vcm_create(phys_addr, phys_size);
> +
> +  Creates a Prebuilt VCM:
> +
> +    vcm_vmm = vcm_create_from_prebuilt(ext_vcm_id);
> +
> +  Associate and activate all three to their respective devices:
> +
> +    avcm_iommu = vcm_assoc(vcm_iommu, dev_iommu, attr0);
> +    avcm_onetoone = vcm_assoc(vcm_onetoone, dev_onetoone, attr1);
> +    avcm_vmm = vcm_assoc(vcm_vmm, dev_cpu, attr2);

error handling on vcm_assoc() failures?

> +    vcm_activate(avcm_iommu);
> +    vcm_activate(avcm_onetoone);
> +    vcm_activate(avcm_vmm);
> +
> +  And finally, creates and backs reservations on all 3 such that they
> +  all point to the same memory:
> +
> +    res_iommu = vcm_reserve(vcm_iommu, SZ_2MB + SZ_4K, attr);
> +    res_onetoone = vcm_reserve(vcm_onetoone, SZ_2MB + SZ_4K, attr);
> +    res_vmm = vcm_reserve(vcm_vmm, SZ_2MB + SZ_4K, attr);

error handling?

> +    vcm_back(res_iommu, physmem);
> +    vcm_back(res_onetoone, physmem);
> +    vcm_back(res_vmm, physmem);
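
Two notes on this example. First, echoing the two questions above: every
vcm_assoc(), vcm_activate(), vcm_reserve() and vcm_back() here can fail
and would need the same unwind treatment as the stage 1 sketch earlier.
Second, once the backing is in place, vcm_get_dev_addr() from the Low
Level Control section gives each client its view of the shared memory. A
sketch; the cast assumes a VMM reservation's device address is directly
usable as a kernel pointer, which is implied but not stated by the text:

        /* CPU view: initialize the shared buffer via the VMM window. */
        void *kva = (void *)vcm_get_dev_addr(res_vmm);

        memset(kva, 0, SZ_2MB + SZ_4K);

        /* Device views of the very same physical blocks: */
        size_t iommu_va = vcm_get_dev_addr(res_iommu);    /* IOMMU virtual */
        size_t one_va = vcm_get_dev_addr(res_onetoone);   /* == physical  */
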
> +
> +VCM Summary
> +-----------
> +
> +The VCMM is an attempt to abstract attributes of three distinct classes of
> +mappings into one API. The VCMM allows users to reason about mappings as
> +first class objects. It also allows memory mappings to flow from the
> +traditional 4K mappings prevalent on systems today to more efficient block
> +sizes. Finally, it allows users to manage mapping interoperation without
> +becoming VMM experts. These features will allow future systems with many
> +MMU mapped devices to interoperate simply and therefore correctly.
> +
> +
> +IOMMU Hardware Control
> +======================
> +
> +The VCM currently supports a single type of IOMMU, a Qualcomm System MMU
> +(SMMU). The SMMU interface contains functions to map and unmap virtual
> +addresses, perform address translations and initialize hardware. A
> +Qualcomm SMMU can contain multiple MMU contexts. Each context can
> +translate in parallel. All contexts in a SMMU share one global translation
> +look-aside buffer (TLB).
> +
> +To support context muxing the SMMU module creates and manages device
> +independent virtual contexts. These context abstractions are bound to
> +actual contexts at run-time. Once bound, a context can be activated. This
> +activation programs the underlying context with the virtual context,
> +effecting a context switch.
> +
> +The following functions are all documented in:
> +
> +  arch/arm/mach-msm/include/mach/smmu_driver.h.
> +
> +Mapping
> +-------
> +
> +To map and unmap a virtual page into physical space the VCM calls:
> +
> +  int smmu_map(struct smmu_dev *dev, unsigned long pa,
> +               unsigned long va, unsigned long len, unsigned int attr);
> +
> +  int smmu_unmap(struct smmu_dev *dev, unsigned long va,
> +                 unsigned long len);
> +
> +  int smmu_update_start(struct smmu_dev *dev);
> +
> +  int smmu_update_done(struct smmu_dev *dev);
> +
> +The size given to map must be 4K, 64K, 1M or 16M and the VA and PA must be
> +aligned to the given size. smmu_update_start() and smmu_update_done()
> +should be called before and after each map or unmap.
> +
> +Translation
> +-----------
> +
> +To request a hardware VA to PA translation on a single address the VCM
> +calls:
> +
> +  unsigned long smmu_translate(struct smmu_dev *dev,
> +                               unsigned long va);
> +
> +Fault Handling
> +--------------
> +
> +To register an interrupt handler for a context the VCM calls:
> +
> +  int smmu_hook_irpt(struct smmu_dev *dev, vcm_handler handler,
> +                     void *data);

We try to spell out things like "interrupt" in function names because
I/we can't remember just how you decided to abbreviate it.

> +
> +The registered interrupt handler should return 1 if it wants the SMMU
> +driver to retry the transaction again and 0 if it wants the SMMU driver to
> +terminate the transaction.
> +
> +Managing SMMU Initialization and Contexts
> +-----------------------------------------
> +
> +SMMU hardware initialization and management happens in 2 steps. The first
> +step initializes global SMMU devices and abstract device contexts. The
> +second step binds contexts and devices.
> +
> +A SMMU hardware instance is built with:

   An SMMU

> +
> +  int smmu_drvdata_init(struct smmu_driver *drv, unsigned long base,
> +                        int irq);
> +
> +A SMMU context is Initialized and deinitialized with:

   An SMMU        is initialized

> +
> +  struct smmu_dev *smmu_ctx_init(int ctx);
> +  int smmu_ctx_deinit(struct smmu_dev *dev);
> +
> +An abstract SMMU context is bound to a particular SMMU with:
> +
> +  int smmu_ctx_bind(struct smmu_dev *ctx, struct smmu_driver *drv);
> +
> +Activation
> +----------
> +
> +Activation effects a context switch.
> +
> +Activation, deactivation and activation state testing are done with:
> +
> +  int smmu_activate(struct smmu_dev *dev);
> +  int smmu_deactivate(struct smmu_dev *dev);
> +  int smmu_is_active(struct smmu_dev *dev);
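
The two-step bring-up plus the fault hook, condensed into one sketch.
The return conventions (0 on success, negative on error; NULL on failure
for smmu_ctx_init()) and the handler prototype are assumptions on my
part; only the function names come from this document:

        /* Illustrative handler; the real vcm_handler prototype is in the
         * VCM headers and may differ. */
        static int my_fault_handler(size_t dev, void *data)
        {
                /* data is the cookie passed at registration time */
                return 0;   /* 0 = terminate the transaction, 1 = retry */
        }

        static struct smmu_dev *bringup_ctx(struct smmu_driver *drv,
                                            unsigned long base, int irq,
                                            int ctx)
        {
                struct smmu_dev *dev;

                if (smmu_drvdata_init(drv, base, irq)) /* step 1: instance */
                        return NULL;

                dev = smmu_ctx_init(ctx);       /* abstract device context */
                if (!dev)
                        return NULL;

                if (smmu_ctx_bind(dev, drv))    /* step 2: bind ctx to SMMU */
                        goto deinit;

                if (smmu_hook_irpt(dev, my_fault_handler, NULL))
                        goto deinit;

                if (smmu_activate(dev))         /* effects a context switch */
                        goto deinit;

                return dev;

        deinit:
                smmu_ctx_deinit(dev);
                return NULL;
        }
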
> +
> +
> +Userspace Access to Devices with IOMMUs
> +=======================================
> +
> +A device that issues transactions through an IOMMU must work with two
> +APIs. The first API is the VCM. The VCM API is device independent. Users
> +pass the VCM a dev_id and the VCM makes calls on the hardware device it
> +has been configured with using this dev_id. The second API is whatever
> +device topology has been created to organize the particular IOMMUs in a
> +system. The only constraint on this second API is that it must give the
> +user a single dev_id that it can pass through the VCM.
> +
> +For the Qualcomm SMMUs the second API consists of a tree of platform
> +devices and two platform drivers as well as a context lookup function that
> +traverses the device tree and returns a dev_id given a context name.
> +
> +Qualcomm SMMU Device Tree
> +-------------------------
> +
> +The current tree organizes the devices into a tree that looks like the
> +following:
> +
> +smmu/
> +  smmu0/
> +    ctx0
> +    ctx1
> +    ctx2
> +  smmu1/
> +    ctx3
> +
> +
> +Each context, ctx[n] and each smmu, smmu[n] is given a name. Since users
> +are interested in contexts not smmus, the contexts name is passed to a

                                         a context

> +function to find the dev_id associated with that name. The functions to
> +find, free and get the base address (since the device probe function calls
> +ioremap to map the SMMUs configuration registers into the kernel) are
> +listed here:
> +
> +  struct smmu_dev *smmu_get_ctx_instance(char *ctx_name);
> +  int smmu_free_ctx_instance(struct smmu_dev *dev);
> +  unsigned long smmu_get_base_addr(struct smmu_dev *dev);
> +
> +Documentation for these functions is in:
> +
> +  arch/arm/mach-msm/include/mach/smmu_device.h
> +
> +Each context is given a dev node named after the context. For example:
> +
> +  /dev/vcodec_a_mm1
> +  /dev/vcodec_b_mm2
> +  /dev/vcodec_stream
> +  etc...
> +
> +Users open, close and mmap these nodes to access VCM buffers from
> +userspace in the same way that they used to open, close and mmap /dev
> +nodes that represented large physically contiguous buffers (called PMEM
> +buffers on Android).
> +
> +Example
> +-------
> +
> +An abbreviated example is shown here:
> +
> +Users get the dev_id associated with their target context, create a VCM
> +topology appropriate for their device and finally associate the VCMs of
> +the topology with the contexts that will take the VCMs:
> +
> +  dev_id = smmu_get_ctx_instance(vcodec_a_stream);
> +
> +create vcm and needed topology
> +
> +  avcm = vcm_assoc(vcm, dev_id, attr);
> +
> +Tying it all Together
> +---------------------
> +
> +VCMs, IOMMUs and the device tree all work to support system-wide memory
> +mappings. The use of each API in this system allows users to concentrate
> +on the relevant details without needing to worry about low-level
> +details. The APIs clear separation of memory spaces and the devices that

               API's

> +support those memory spaces continues Linuxs tradition of abstracting the

                                        the Linux tradition

> +what from the how.
> +
> +
> +Maintainer: Zach Pfeffer
> --

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***