Subject: Re: [PATCH v7 06/10] iommu/dma-reserved-iommu: iommu_get/put_reserved_iova
To: Robin Murphy, eric.auger@st.com, alex.williamson@redhat.com,
 will.deacon@arm.com, joro@8bytes.org, tglx@linutronix.de,
 jason@lakedaemon.net, marc.zyngier@arm.com, christoffer.dall@linaro.org,
 linux-arm-kernel@lists.infradead.org
References: <1461084994-2355-1-git-send-email-eric.auger@linaro.org>
 <1461084994-2355-7-git-send-email-eric.auger@linaro.org>
 <5717B541.7090003@arm.com>
Cc: patches@linaro.org, linux-kernel@vger.kernel.org,
 Bharat.Bhushan@freescale.com, pranav.sawargaonkar@gmail.com,
 p.fedin@samsung.com, iommu@lists.linux-foundation.org,
 Jean-Philippe.Brucker@arm.com, julien.grall@arm.com
From: Eric Auger
Message-ID: <5718929A.5050703@linaro.org>
Date: Thu, 21 Apr 2016 10:43:06 +0200
In-Reply-To: <5717B541.7090003@arm.com>

Hi Robin,

On 04/20/2016 06:58 PM, Robin Murphy wrote:
> On 19/04/16 17:56, Eric Auger wrote:
>> This patch introduces iommu_get/put_reserved_iova.
>>
>> iommu_get_reserved_iova allows to iommu map a contiguous physical region
>> onto a reserved contiguous IOVA region. The physical region base address
>> does not need to be iommu page size aligned. iova pages are allocated and
>> mapped so that they cover all the physical region. This mapping is
>> tracked as a whole (and cannot be split) in an RB tree indexed by PA.
>>
>> In case a mapping already exists for the physical pages, the IOVA mapped
>> to the PA base is directly returned.
>>
>> Each time the get succeeds a binding ref count is incremented.
>>
>> iommu_put_reserved_iova decrements the ref count and when this latter
>> is null, the mapping is destroyed and the iovas are released.
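
For reference, the call pattern I have in mind looks roughly like the
sketch below (the caller, the doorbell address and the prot flags are
made up for illustration; error handling is trimmed):

    /* hypothetical caller: bind a 4kB MSI doorbell at doorbell_pa and
     * return the IOVA to use in the MSI message (sketch, not real code) */
    static int bind_doorbell(struct iommu_domain *domain,
                             phys_addr_t doorbell_pa, dma_addr_t *msi_iova)
    {
            int ret;

            ret = iommu_get_reserved_iova(domain, doorbell_pa, 0x1000,
                                          IOMMU_WRITE, msi_iova);
            if (ret)
                    return ret;

            /* ... compose the MSI message using *msi_iova ... */
            return 0;
    }

    /* later, on teardown: the last put destroys the mapping and
     * releases the iovas */
    iommu_put_reserved_iova(domain, doorbell_pa);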
>>
>> Signed-off-by: Eric Auger
>>
>> ---
>> v7:
>> - change title and rework commit message with new name of the functions
>> and size parameter
>> - fix locking
>> - rework header doc comments
>> - put now takes a phys_addr_t
>> - check prot argument against reserved_iova_domain prot flags
>>
>> v5 -> v6:
>> - revisit locking with spin_lock instead of mutex
>> - do not kref_get on 1st get
>> - add size parameter to the get function following Marc's request
>> - use the iova domain shift instead of using the smallest supported
>> page size
>>
>> v3 -> v4:
>> - formerly in iommu: iommu_get/put_single_reserved &
>> iommu/arm-smmu: implement iommu_get/put_single_reserved
>> - Attempted to address Marc's doubts about missing size/alignment
>> at VFIO level (user-space knows the IOMMU page size and the number
>> of IOVA pages to provision)
>>
>> v2 -> v3:
>> - remove static implementation of iommu_get_single_reserved &
>> iommu_put_single_reserved when CONFIG_IOMMU_API is not set
>>
>> v1 -> v2:
>> - previously a VFIO API, named vfio_alloc_map/unmap_free_reserved_iova
>> ---
>> drivers/iommu/dma-reserved-iommu.c | 150
>> +++++++++++++++++++++++++++++++++++++
>> include/linux/dma-reserved-iommu.h | 38 ++++++++++
>> 2 files changed, 188 insertions(+)
>>
>> diff --git a/drivers/iommu/dma-reserved-iommu.c
>> b/drivers/iommu/dma-reserved-iommu.c
>> index f6fa18e..426d339 100644
>> --- a/drivers/iommu/dma-reserved-iommu.c
>> +++ b/drivers/iommu/dma-reserved-iommu.c
>> @@ -135,6 +135,22 @@ unlock:
>> }
>> EXPORT_SYMBOL_GPL(iommu_alloc_reserved_iova_domain);
>>
>> +/* called with domain's reserved_lock held */
>> +static void reserved_binding_release(struct kref *kref)
>> +{
>> +        struct iommu_reserved_binding *b =
>> +                container_of(kref, struct iommu_reserved_binding, kref);
>> +        struct iommu_domain *d = b->domain;
>> +        struct reserved_iova_domain *rid =
>> +                (struct reserved_iova_domain *)d->reserved_iova_cookie;
>
> Either it's a void *, in which case you don't need to cast it, or it
> should be the appropriate type as I mentioned earlier, in which case you
> still wouldn't need to cast it.

ok
>
>> +        unsigned long order;
>> +
>> +        order = iova_shift(rid->iovad);
>> +        free_iova(rid->iovad, b->iova >> order);
>
> iova_pfn() ?

ok
>
>> +        unlink_reserved_binding(d, b);
>> +        kfree(b);
>> +}
>> +
>> void iommu_free_reserved_iova_domain(struct iommu_domain *domain)
>> {
>> struct reserved_iova_domain *rid;
>> @@ -160,3 +176,137 @@ unlock:
>> }
>> }
>> EXPORT_SYMBOL_GPL(iommu_free_reserved_iova_domain);
>> +
>> +int iommu_get_reserved_iova(struct iommu_domain *domain,
>> +                            phys_addr_t addr, size_t size, int prot,
>> +                            dma_addr_t *iova)
>> +{
>> +        unsigned long base_pfn, end_pfn, nb_iommu_pages, order, flags;
>> +        struct iommu_reserved_binding *b, *newb;
>> +        size_t iommu_page_size, binding_size;
>> +        phys_addr_t aligned_base, offset;
>> +        struct reserved_iova_domain *rid;
>> +        struct iova_domain *iovad;
>> +        struct iova *p_iova;
>> +        int ret = -EINVAL;
>> +
>> +        newb = kzalloc(sizeof(*newb), GFP_KERNEL);
>> +        if (!newb)
>> +                return -ENOMEM;
>> +
>> +        spin_lock_irqsave(&domain->reserved_lock, flags);
>> +
>> +        rid = (struct reserved_iova_domain *)domain->reserved_iova_cookie;
>> +        if (!rid)
>> +                goto free_newb;
>> +
>> +        if ((prot & IOMMU_READ & !(rid->prot & IOMMU_READ)) ||
>> +            (prot & IOMMU_WRITE & !(rid->prot & IOMMU_WRITE)))
>
> Are devices wanting to read from MSI doorbells really a thing?

The rationale is that Alex asked for propagating the VFIO DMA MAP prot
flag down to this API (for genericity). The latter is stored in
rid->prot, and here I am just checking that the iova gets mapped
according to the direction userspace expects.
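
Concretely, the prot recorded in rid->prot would be derived from the
user's VFIO_IOMMU_MAP_DMA flags the same way vfio type1 already does it
today (sketch below; the helper name is made up, and "flags" stands for
the usual struct vfio_iommu_type1_dma_map flags field):

    static int reserved_prot_from_vfio(u32 flags)
    {
            int prot = 0;

            if (flags & VFIO_DMA_MAP_FLAG_READ)
                    prot |= IOMMU_READ;
            if (flags & VFIO_DMA_MAP_FLAG_WRITE)
                    prot |= IOMMU_WRITE;

            /* stored in rid->prot; iommu_get_reserved_iova() then
             * rejects requests asking for a direction the reserved
             * domain was not created with */
            return prot;
    }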
>
>> +                goto free_newb;
>> +
>> +        iovad = rid->iovad;
>> +        order = iova_shift(iovad);
>> +        base_pfn = addr >> order;
>> +        end_pfn = (addr + size - 1) >> order;
>> +        aligned_base = base_pfn << order;
>> +        offset = addr - aligned_base;
>> +        nb_iommu_pages = end_pfn - base_pfn + 1;
>> +        iommu_page_size = 1 << order;
>> +        binding_size = nb_iommu_pages * iommu_page_size;
>
> offset = iova_offset(iovad, addr);
> aligned_base = addr - offset;
> binding_size = iova_align(iovad, size + offset);
>
> Am I right?

Looks so. Will further test it. Thanks
>
>> +
>> +        b = find_reserved_binding(domain, aligned_base, binding_size);
>> +        if (b) {
>> +                *iova = b->iova + offset + aligned_base - b->addr;
>> +                kref_get(&b->kref);
>> +                ret = 0;
>> +                goto free_newb;
>> +        }
>> +
>> +        p_iova = alloc_iova(iovad, nb_iommu_pages,
>> +                            iovad->dma_32bit_pfn, true);
>> +        if (!p_iova) {
>> +                ret = -ENOMEM;
>> +                goto free_newb;
>> +        }
>> +
>> +        *iova = iova_dma_addr(iovad, p_iova);
>> +
>> +        /* unlock to call iommu_map which is not guaranteed to be atomic */
>
> Hmm, that's concerning, because the ARM DMA mapping code, and
> consequently the iommu-dma layer, has always relied on it being so. On
> brief inspection, it looks to be only the AMD IOMMU doing something
> obviously non-atomic (taking a mutex) in its map callback, but then that
> has a separate DMA ops implementation. It doesn't look like it would be
> too intrusive to change, either, but that's an idea for its own thread.

Yes. Making no assumption about the atomicity of the iommu_map/unmap ops
brought some extra complexity here. It also forced me to separate the
alloc/map step from the iommu "binding" lookup. But now that it is done,
I think it brings some added value. Typically, the fact that we
introduced an irq-chip op to retrieve the doorbell characteristics is
valuable for enumerating their number.
>
>> +        spin_unlock_irqrestore(&domain->reserved_lock, flags);
>> +
>> +        ret = iommu_map(domain, *iova, aligned_base, binding_size, prot);
>> +
>> +        spin_lock_irqsave(&domain->reserved_lock, flags);
>> +
>> +        rid = (struct reserved_iova_domain *) domain->reserved_iova_cookie;
>> +        if (!rid || (rid->iovad != iovad)) {
>> +                /* reserved iova domain was destroyed in our back */
>
> That that could happen at all is terrifying! Surely the reserved domain
> should be set up immediately after iommu_domain_alloc() and torn down
> immediately before iommu_domain_free(). Things going missing while a
> domain is live smacks of horrible brokenness.

The VFIO user client creates the "reserved iova domain" with the
VFIO_IOMMU_MAP_DMA ioctl. This can happen anytime after the iommu domain
creation (on the VFIO_SET_IOMMU ioctl). User-space is currently allowed
to unregister this iova domain at any time too. I think this is wrong: I
should have two reserved iova domain destroy functions, one used by
user-space and one used by the kernel. In the user-space path I should
reject any attempt to destroy the reserved iova domain as long as
bindings still exist.
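
Something along these lines (a rough sketch only; the function and
helper names below are made up and do not exist in the series yet):

    /* user-space triggered path: refuse to tear the reserved iova
     * domain down while bindings are still referenced */
    int iommu_put_reserved_iova_domain(struct iommu_domain *domain)
    {
            unsigned long flags;
            int ret = 0;

            spin_lock_irqsave(&domain->reserved_lock, flags);
            if (reserved_bindings_exist(domain))        /* assumed helper */
                    ret = -EBUSY;
            else
                    __free_reserved_iova_domain(domain); /* assumed helper */
            spin_unlock_irqrestore(&domain->reserved_lock, flags);

            return ret;
    }

    /* kernel path, called just before iommu_domain_free(): tears the
     * reserved iova domain down unconditionally */
    void iommu_free_reserved_iova_domain(struct iommu_domain *domain);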
>
>> +                ret = -EBUSY;
>> +                goto free_newb; /* iova already released */
>> +        }
>> +
>> +        /* no change in iova reserved domain but iommu_map failed */
>> +        if (ret)
>> +                goto free_iova;
>> +
>> +        /* everything is fine, add in the new node in the rb tree */
>> +        kref_init(&newb->kref);
>> +        newb->domain = domain;
>> +        newb->addr = aligned_base;
>> +        newb->iova = *iova;
>> +        newb->size = binding_size;
>> +
>> +        link_reserved_binding(domain, newb);
>> +
>> +        *iova += offset;
>> +        goto unlock;
>> +
>> +free_iova:
>> +        free_iova(rid->iovad, p_iova->pfn_lo);
>> +free_newb:
>> +        kfree(newb);
>> +unlock:
>> +        spin_unlock_irqrestore(&domain->reserved_lock, flags);
>> +        return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_get_reserved_iova);
>> +
>> +void iommu_put_reserved_iova(struct iommu_domain *domain, phys_addr_t
>> addr)
>> +{
>> +        phys_addr_t aligned_addr, page_size, mask;
>> +        struct iommu_reserved_binding *b;
>> +        struct reserved_iova_domain *rid;
>> +        unsigned long order, flags;
>> +        struct iommu_domain *d;
>> +        dma_addr_t iova;
>> +        size_t size;
>> +        int ret = 0;
>> +
>> +        spin_lock_irqsave(&domain->reserved_lock, flags);
>> +
>> +        rid = (struct reserved_iova_domain *)domain->reserved_iova_cookie;
>> +        if (!rid)
>> +                goto unlock;
>> +
>> +        order = iova_shift(rid->iovad);
>> +        page_size = (uint64_t)1 << order;
>> +        mask = page_size - 1;
>> +        aligned_addr = addr & ~mask;
>
> addr & ~iova_mask(rid->iovad)

OK
>
>> +
>> +        b = find_reserved_binding(domain, aligned_addr, page_size);
>> +        if (!b)
>> +                goto unlock;
>> +
>> +        iova = b->iova;
>> +        size = b->size;
>> +        d = b->domain;
>> +
>> +        ret = kref_put(&b->kref, reserved_binding_release);
>> +
>> +unlock:
>> +        spin_unlock_irqrestore(&domain->reserved_lock, flags);
>> +        if (ret)
>> +                iommu_unmap(d, iova, size);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_put_reserved_iova);
>> +
>> diff --git a/include/linux/dma-reserved-iommu.h
>> b/include/linux/dma-reserved-iommu.h
>> index 01ec385..8722131 100644
>> --- a/include/linux/dma-reserved-iommu.h
>> +++ b/include/linux/dma-reserved-iommu.h
>> @@ -42,6 +42,34 @@ int iommu_alloc_reserved_iova_domain(struct
>> iommu_domain *domain,
>> */
>> void iommu_free_reserved_iova_domain(struct iommu_domain *domain);
>>
>> +/**
>> + * iommu_get_reserved_iova: allocate a contiguous set of iova pages and
>> + * map them to the physical range defined by @addr and @size.
>> + *
>> + * @domain: iommu domain handle
>> + * @addr: physical address to bind
>> + * @size: size of the binding
>> + * @prot: mapping protection attribute
>> + * @iova: returned iova
>> + *
>> + * Mapped physical pfns are within [@addr >> order, (@addr + size -1) >>
>> order]
>> + * where order corresponds to the reserved iova domain order.
>> + * This mapping is tracked and reference counted with the minimal
>> granularity
>> + * of @size.
>> + */
>> +int iommu_get_reserved_iova(struct iommu_domain *domain,
>> +                            phys_addr_t addr, size_t size, int prot,
>> +                            dma_addr_t *iova);
>> +
>> +/**
>> + * iommu_put_reserved_iova: decrement a ref count of the reserved
>> mapping
>> + *
>> + * @domain: iommu domain handle
>> + * @addr: physical address whose binding ref count is decremented
>> + *
>> + * if the binding ref count is null, destroy the reserved mapping
>> + */
>> +void iommu_put_reserved_iova(struct iommu_domain *domain, phys_addr_t
>> addr);
>> #else
>>
>> static inline int
>> @@ -55,5 +83,15 @@ iommu_alloc_reserved_iova_domain(struct
>> iommu_domain *domain,
>> static inline void
>> iommu_free_reserved_iova_domain(struct iommu_domain *domain) {}
>>
>> +static inline int iommu_get_reserved_iova(struct iommu_domain *domain,
>> +                                          phys_addr_t addr, size_t size,
>> +                                          int prot, dma_addr_t *iova)
>> +{
>> +        return -ENOENT;
>> +}
>> +
>> +static inline void iommu_put_reserved_iova(struct iommu_domain *domain,
>> +                                           phys_addr_t addr) {}
>> +
>> #endif /* CONFIG_IOMMU_DMA_RESERVED */
>> #endif /* __DMA_RESERVED_IOMMU_H */
>>
>
> I worry that this all falls into the trap of trying too hard to abstract
> something which doesn't need abstracting. AFAICS all we need is
> something for VFIO to keep track of its own IOVA usage vs. userspace's,
> plus a list of MSI descriptors (with IOVAs) wrapped in refcounts hanging
> off the iommu_domain, with a handful of functions to manage them. The
> former is as good as solved already - stick an iova_domain or even just
> a bitmap in the iova_cookie and use it directly - and the latter would
> actually be reusable elsewhere (e.g. for iommu-dma domains). What I'm
> seeing here is layers upon layers of complexity with no immediate
> justification, that's 'generic' enough to not directly solve the problem
> at hand, but in a way that still makes it more or less unusable for
> solving equivalent problems elsewhere.
>
> Since I don't like that everything I have to say about this series so
> far seems negative, I'll plan to spend some time next week having a go
> at hardening my 50-line proof-of-concept for stage 1 MSIs, and see if I
> can offer code instead of criticism :)

No worries. I really appreciate the time you've already spent reading
this code ;-) I acknowledge it is a lot of trouble for mapping a single
page, in my case! Anyway I will take your comments into account and
simplify things accordingly. Let's see how we can converge...

Best Regards

Eric

>
> Robin.