Subject: Re: [PATCH 1/2] dma-iommu: fix swiotlb SKIP_CPU_SYNC and arch sync
To: Joerg Roedel, David Stevens
Cc: iommu@lists.linux-foundation.org, Will Deacon, linux-kernel@vger.kernel.org, Tom Murphy
References: <20210702053742.842850-1-stevensd@google.com>
From: Robin Murphy
Message-ID: <3f63e831-72a2-6482-aeea-7c8e84228b8c@arm.com>
Date: Thu, 8 Jul 2021 16:43:01 +0100
On 2021-07-08 10:17, Joerg Roedel wrote:
> Adding Robin.
>
> On Fri, Jul 02, 2021 at 02:37:41PM +0900, David Stevens wrote:
>> From: David Stevens
>>
>> Make map_swiotlb and unmap_swiotlb responsible only for mapping, and
>> consistently use the sync_single_for and sync_sg_for functions for
>> swiotlb sync and arch sync. This ensures that the same code path is
>> responsible for syncing regardless of whether or not SKIP_CPU_SYNC is
>> set. In the process, fix various places where the original physical
>> address and the swiotlb tlb_addr are mixed up:
>> - Make the sync_sg functions call the sync_single functions for
>>   untrusted devices, so they use tlb_addr when checking
>>   is_swiotlb_buffer and when doing arch sync if necessary.
>> - Use tlb_addr for arch sync in map_page if necessary.
>> - In map_sg, map before syncing so that arch sync can target the
>>   bounce buffer if necessary.
>> - Pass SKIP_CPU_SYNC to swiotlb map and unmap to avoid double syncing
>>   the swiotlb. This had previously only happened in the unmap_page
>>   case, but is now necessary for all swiotlb cases.
>>
>> Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")

Hmm, there's a lot going on here, and it's not really clear to me
what's what, especially WRT the comment about addresses being mixed
up. Is there an actual correctness bug anywhere, i.e. an address which
*should* be synced *isn't*? If so, please fix that minimally in a
standalone patch, without all the additional complication.

If it's purely an efficiency improvement - i.e. the bounce page
getting synced twice (which shouldn't have *too* much impact), or the
original copy getting redundantly synced *as well as* the bounce page
- then please clarify whether you've measured a significant impact in
practice, or whether this is just an observation from code inspection.

TBH I don't find the overall justification in the commit message
particularly convincing - to a less generous reading it kind of comes
across as "move things around for the sake of it". Yes, one can argue
for keeping all the cache maintenance in one place, but one can
equally argue for keeping all the low-level practicalities of bouncing
under SWIOTLB's umbrella, such that they don't clutter up the
IOMMU-focused code. The point of utilising the SWIOTLB machinery in
the first place was to share and reuse as much of its "normal" flow as
possible to keep things simple (see the thread at [1] for where the
history began).

Now, what is true is that the integration into iommu-dma is a bit
rough and bolted-on in places. IIRC it was done expediently to
un-block further development on the Intel driver, and it was always
the plan to come back and improve it later (like only bouncing the
unaligned start and end of larger buffers, such that whole pages in
the middle can still be mapped in-place). As such I'm very happy to
see that someone has the time and interest to improve things, but some
of this patch looks like adding more mess to work around the existing
mess, rather than more fundamentally cleaning it up. For example,
rather than further cementing the bodge of having two separate map_sg
implementations, I think the bouncing could be streamlined into
iommu-dma's own flow as part of its scatterlist transformation (AFAICS
temporarily swizzling sg_page so that iommu_map_sg() maps the bounced
copy would be no worse than what it's already doing) - a rough sketch
of the sort of thing I mean is below.
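Purely for illustration, and in no way a worked-through implementation:
something along these lines is roughly what I have in mind. Note that
the iommu_dma_sgl_bounce() name is made up, and the untrusted-device
check, IOVA-granule alignment/padding, error unwinding and the matching
unmap/restore of the original pages are all glossed over - I haven't
tried to compile it, it's only meant to show the shape of the idea.

static int iommu_dma_sgl_bounce(struct device *dev, struct scatterlist *sgl,
                int nents, enum dma_data_direction dir, unsigned long attrs)
{
        struct scatterlist *sg;
        int i;

        for_each_sg(sgl, sg, nents, i) {
                phys_addr_t phys = sg_phys(sg);
                phys_addr_t tlb_addr;

                /*
                 * Bounce the original data up front; real code would
                 * align the allocation size to the IOVA granule and zero
                 * any padding, as __iommu_dma_map_swiotlb() already does.
                 */
                tlb_addr = swiotlb_tbl_map_single(dev, phys, sg->length,
                                sg->length, dir,
                                attrs | DMA_ATTR_SKIP_CPU_SYNC);
                if (tlb_addr == DMA_MAPPING_ERROR)
                        return -ENOMEM; /* caller unwinds earlier entries */

                /*
                 * Swizzle the entry to point at the bounce buffer, so the
                 * normal iommu_map_sg() call later in iommu_dma_map_sg()
                 * simply maps the bounced copy instead of the original.
                 */
                sg_set_page(sg, pfn_to_page(PHYS_PFN(tlb_addr)), sg->length,
                                offset_in_page(tlb_addr));
        }

        return 0;
}

After a pass like that, the rest of the existing iommu_dma_map_sg()
flow could carry on unchanged, mapping whatever pages the entries now
point at, and the separate iommu_dma_map_sg_swiotlb() path could go
away.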
Thanks,
Robin.

[1] https://lore.kernel.org/linux-iommu/20190312163845.GA13141@infradead.org/

>> Signed-off-by: David Stevens
>> ---
>>  drivers/iommu/dma-iommu.c | 82 ++++++++++++++++++++++++---------------
>>  1 file changed, 51 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
>> index 7bcdd1205535..24d1042cd052 100644
>> --- a/drivers/iommu/dma-iommu.c
>> +++ b/drivers/iommu/dma-iommu.c
>> @@ -505,7 +505,8 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
>>  	__iommu_dma_unmap(dev, dma_addr, size);
>>  
>>  	if (unlikely(is_swiotlb_buffer(phys)))
>> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
>> +		swiotlb_tbl_unmap_single(dev, phys, size, dir,
>> +					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
>>  }
>>  
>>  static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>> @@ -536,7 +537,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
>>  
>>  static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>>  		size_t org_size, dma_addr_t dma_mask, bool coherent,
>> -		enum dma_data_direction dir, unsigned long attrs)
>> +		enum dma_data_direction dir, unsigned long attrs,
>> +		phys_addr_t *adj_phys)
>>  {
>>  	int prot = dma_info_to_prot(dir, coherent, attrs);
>>  	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>> @@ -555,7 +557,8 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>>  	    iova_offset(iovad, phys | org_size)) {
>>  		aligned_size = iova_align(iovad, org_size);
>>  		phys = swiotlb_tbl_map_single(dev, phys, org_size,
>> -				aligned_size, dir, attrs);
>> +				aligned_size, dir,
>> +				attrs | DMA_ATTR_SKIP_CPU_SYNC);
>>  
>>  		if (phys == DMA_MAPPING_ERROR)
>>  			return DMA_MAPPING_ERROR;
>> @@ -573,6 +576,8 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
>>  
>>  		memset(padding_start, 0, padding_size);
>>  	}
>> +	if (adj_phys)
>> +		*adj_phys = phys;
>>  
>>  	iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
>>  	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
>> @@ -785,15 +790,17 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>>  		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
>>  }
>>  
>> -static void iommu_dma_sync_single_for_device(struct device *dev,
>> -		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>> +static void __iommu_dma_sync_single_for_device(struct device *dev,
>> +		dma_addr_t dma_handle, size_t size,
>> +		enum dma_data_direction dir, phys_addr_t phys)
>>  {
>> -	phys_addr_t phys;
>> -
>>  	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
>>  		return;
>>  
>> -	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
>> +	if (phys == 0)
>> +		phys = iommu_iova_to_phys(iommu_get_dma_domain(dev),
>> +					  dma_handle);
>> +
>>  	if (is_swiotlb_buffer(phys))
>>  		swiotlb_sync_single_for_device(dev, phys, size, dir);
>>  
>> @@ -801,6 +808,12 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
>>  		arch_sync_dma_for_device(phys, size, dir);
>>  }
>>  
>> +static void iommu_dma_sync_single_for_device(struct device *dev,
>> +		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>> +{
>> +	__iommu_dma_sync_single_for_device(dev, dma_handle, size, dir, 0);
>> +}
>> +
>>  static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>>  		struct scatterlist *sgl, int nelems,
>>  		enum dma_data_direction dir)
>> @@ -811,14 +824,13 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
>>  	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
>>  		return;
>>  
>> -	for_each_sg(sgl, sg, nelems, i) {
>> -		if (!dev_is_dma_coherent(dev))
>> +	if (dev_is_untrusted(dev))
>> +		for_each_sg(sgl, sg, nelems, i)
>> +			iommu_dma_sync_single_for_cpu(dev, sg_dma_address(sg),
>> +						      sg->length, dir);
>> +	else
>> +		for_each_sg(sgl, sg, nelems, i)
>>  			arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
>> -
>> -		if (is_swiotlb_buffer(sg_phys(sg)))
>> -			swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
>> -						    sg->length, dir);
>> -	}
>>  }
>>  
>>  static void iommu_dma_sync_sg_for_device(struct device *dev,
>> @@ -831,29 +843,30 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
>>  	if (dev_is_dma_coherent(dev) && !dev_is_untrusted(dev))
>>  		return;
>>  
>> -	for_each_sg(sgl, sg, nelems, i) {
>> -		if (is_swiotlb_buffer(sg_phys(sg)))
>> -			swiotlb_sync_single_for_device(dev, sg_phys(sg),
>> -						       sg->length, dir);
>> -
>> -		if (!dev_is_dma_coherent(dev))
>> +	if (dev_is_untrusted(dev))
>> +		for_each_sg(sgl, sg, nelems, i)
>> +			__iommu_dma_sync_single_for_device(dev,
>> +							   sg_dma_address(sg),
>> +							   sg->length, dir, 0);
>> +	else
>> +		for_each_sg(sgl, sg, nelems, i)
>>  			arch_sync_dma_for_device(sg_phys(sg), sg->length, dir);
>> -	}
>>  }
>>  
>>  static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>>  		unsigned long offset, size_t size, enum dma_data_direction dir,
>>  		unsigned long attrs)
>>  {
>> -	phys_addr_t phys = page_to_phys(page) + offset;
>> +	phys_addr_t phys = page_to_phys(page) + offset, adj_phys;
>>  	bool coherent = dev_is_dma_coherent(dev);
>>  	dma_addr_t dma_handle;
>>  
>> -	dma_handle = __iommu_dma_map_swiotlb(dev, phys, size, dma_get_mask(dev),
>> -			coherent, dir, attrs);
>> -	if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>> +	dma_handle = __iommu_dma_map_swiotlb(dev, phys, size,
>> +			dma_get_mask(dev), coherent, dir, attrs, &adj_phys);
>> +	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>>  	    dma_handle != DMA_MAPPING_ERROR)
>> -		arch_sync_dma_for_device(phys, size, dir);
>> +		__iommu_dma_sync_single_for_device(dev, dma_handle, size,
>> +						   dir, adj_phys);
>>  	return dma_handle;
>>  }
>>  
>> @@ -960,7 +973,7 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
>>  	for_each_sg(sg, s, nents, i) {
>>  		sg_dma_address(s) = __iommu_dma_map_swiotlb(dev, sg_phys(s),
>>  				s->length, dma_get_mask(dev),
>> -				dev_is_dma_coherent(dev), dir, attrs);
>> +				dev_is_dma_coherent(dev), dir, attrs, NULL);
>>  		if (sg_dma_address(s) == DMA_MAPPING_ERROR)
>>  			goto out_unmap;
>>  		sg_dma_len(s) = s->length;
>> @@ -991,17 +1004,24 @@ static int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>>  	dma_addr_t iova;
>>  	size_t iova_len = 0;
>>  	unsigned long mask = dma_get_seg_boundary(dev);
>> -	int i;
>> +	int i, early_mapped = 0;
>>  
>>  	if (static_branch_unlikely(&iommu_deferred_attach_enabled) &&
>>  	    iommu_deferred_attach(dev, domain))
>>  		return 0;
>>  
>> +	if (dev_is_untrusted(dev)) {
>> +		early_mapped = iommu_dma_map_sg_swiotlb(dev, sg, nents,
>> +							dir, attrs);
>> +		if (!early_mapped)
>> +			return 0;
>> +	}
>> +
>>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>>  		iommu_dma_sync_sg_for_device(dev, sg, nents, dir);
>>  
>> -	if (dev_is_untrusted(dev))
>> -		return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
>> +	if (early_mapped)
>> +		return early_mapped;
>>  
>>  	/*
>>  	 * Work out how much IOVA space we need, and align the segments to
>> -- 
>> 2.32.0.93.g670b81a890-goog