Subject: Re: [PATCH v10 10/19] vfio iommu: Add blocking notifier to notify DMA_UNMAP
To: Alex Williamson, Jike Song
References: <1477517366-27871-1-git-send-email-kwankhede@nvidia.com>
 <1477517366-27871-11-git-send-email-kwankhede@nvidia.com>
 <5812FF66.6020801@intel.com>
 <20161028064045.0e8ca7dc@t450s.home>
 <20161028143350.45df29c1@t450s.home>
 <9cfebf8f-7c30-6d2c-a1ec-cc9c9ee1bdd7@nvidia.com>
 <20161029080301.5e464435@t450s.home>
 <20161101034558.GA7186@bjsdjshi@linux.vnet.ibm.com>
From: Kirti Wankhede
Message-ID: <0fc2c739-b3ed-6cb1-7fd4-180da07607e1@nvidia.com>
Date: Tue, 1 Nov 2016 13:17:19 +0530
In-Reply-To: <20161101034558.GA7186@bjsdjshi@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/1/2016 9:15 AM, Dong Jia Shi wrote:
> * Alex Williamson [2016-10-29 08:03:01 -0600]:
>
>> On Sat, 29 Oct 2016 16:07:05 +0530
>> Kirti Wankhede wrote:
>>
>>> On 10/29/2016 2:03 AM, Alex Williamson wrote:
>>>> On Sat, 29 Oct 2016 01:32:35 +0530
>>>> Kirti Wankhede wrote:
>>>>
>>>>> On 10/28/2016 6:10 PM, Alex Williamson wrote:
>>>>>> On Fri, 28 Oct 2016 15:33:58 +0800
>>>>>> Jike Song wrote:
>>>>>>
>>> ...
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * This function finds pfn in domain->external_addr_space->pfn_list for given
>>>>>>>> + * iova range. If pfn exist, notify pfn to registered notifier list. On
>>>>>>>> + * receiving notifier callback, vendor driver should invalidate the mapping and
>>>>>>>> + * call vfio_unpin_pages() to unpin this pfn. With that vfio_pfn for this pfn
>>>>>>>> + * gets removed from rb tree of pfn_list. That re-arranges rb tree, so while
>>>>>>>> + * searching for next vfio_pfn in rb tree, start search from first node again.
>>>>>>>> + * If any vendor driver doesn't unpin that pfn, vfio_pfn would not get removed
>>>>>>>> + * from rb tree and so in next search vfio_pfn would be same as previous
>>>>>>>> + * vfio_pfn. In that case, exit from loop.
>>>>>>>> + */
>>>>>>>> +static void vfio_notifier_call_chain(struct vfio_iommu *iommu,
>>>>>>>> +				     struct vfio_iommu_type1_dma_unmap *unmap)
>>>>>>>> +{
>>>>>>>> +	struct vfio_domain *domain = iommu->external_domain;
>>>>>>>> +	struct rb_node *n;
>>>>>>>> +	struct vfio_pfn *vpfn = NULL, *prev_vpfn;
>>>>>>>> +
>>>>>>>> +	do {
>>>>>>>> +		prev_vpfn = vpfn;
>>>>>>>> +		mutex_lock(&domain->external_addr_space->pfn_list_lock);
>>>>>>>> +
>>>>>>>> +		n = rb_first(&domain->external_addr_space->pfn_list);
>>>>>>>> +
>>>>>>>> +		for (; n; n = rb_next(n), vpfn = NULL) {
>>>>>>>> +			vpfn = rb_entry(n, struct vfio_pfn, node);
>>>>>>>> +
>>>>>>>> +			if ((vpfn->iova >= unmap->iova) &&
>>>>>>>> +			    (vpfn->iova < unmap->iova + unmap->size))
>>>>>>>> +				break;
>>>>>>>> +		}
>>>>>>>> +
>>>>>>>> +		mutex_unlock(&domain->external_addr_space->pfn_list_lock);
>>>>>>>> +
>>>>>>>> +		/* Notify any listeners about DMA_UNMAP */
>>>>>>>> +		if (vpfn)
>>>>>>>> +			blocking_notifier_call_chain(&iommu->notifier,
>>>>>>>> +						     VFIO_IOMMU_NOTIFY_DMA_UNMAP,
>>>>>>>> +						     &vpfn->pfn);
>>>>>>>
>>>>>>> Hi Kirti,
>>>>>>>
>>>>>>> The information carried by notifier is only a pfn.
>>>>>>>
>>>>>>> Since your pin/unpin interfaces design, it's the vendor driver who should
>>>>>>> guarantee pin/unpin same times. To achieve that, the vendor driver must
>>>>>>> cache it's iova->pfn mapping on its side, to avoid pinning a same page
>>>>>>> for multiple times.
>>>>>>>
>>>>>>> With the notifier carrying only a pfn, to find the iova by this pfn,
>>>>>>> the vendor driver must *also* keep a reverse-mapping. That's a bit
>>>>>>> too much.
>>>>>>>
>>>>>>> Since the vendor could also suffer from IOMMU-compatible problem,
>>>>>>> which means a local cache is always helpful, so I'd like to have the
>>>>>>> iova carried to the notifier.
>>>>>>>
>>>>>>> What'd you say?
>>>>>>
>>>>>> I agree, the pfn is not unique, multiple guest pfns (iovas) might be
>>>>>> backed by the same host pfn. DMA_UNMAP calls are based on iova, the
>>>>>> notifier through to the vendor driver must be based on the same.
>>>>>
>>>>> Host pfn should be unique, right?
>>>>
>>>> Let's say a user does a malloc of a single page and does 100 calls to
>>>> MAP_DMA populating 100 pages of IOVA space all backed by the same
>>>> malloc'd page. This is valid, I have unit tests that do essentially
>>>> this. Those will all have the same pfn. The user then does an
>>>> UNMAP_DMA to a single one of those IOVA pages. Did the user unmap
>>>> everything matching that pfn? Of course not, they only unmapped that
>>>> one IOVA page. There is no guarantee of a 1:1 mapping of pfn to IOVA.
>>>> UNMAP_DMA works based on IOVA. Invalidation broadcasts to the vendor
>>>> driver MUST therefore also work based on IOVA. This is not an academic
>>>> problem, address space aliases exist in real VMs, imagine a virtual
>>>> IOMMU. Thanks,
>>>>
>>>
>>>
>>> So struct vfio_iommu_type1_dma_unmap should be passed as argument to
>>> notifier callback:
>>>
>>>	if (unmapped && iommu->external_domain)
>>> -		vfio_notifier_call_chain(iommu, unmap);
>>> +		blocking_notifier_call_chain(&iommu->notifier,
>>> +					     VFIO_IOMMU_NOTIFY_DMA_UNMAP,
>>> +					     unmap);
>>>
>>> Then vendor driver should find pfns he has pinned from this range of
>>> iovas, then invalidate and unpin pfns. Right?
>>
>> That seems like a valid choice. It's probably better than calling the
>> notifier for each page of iova.
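
For illustration, the vendor-driver side of that interface could look
roughly like the sketch below: the notifier data is the
struct vfio_iommu_type1_dma_unmap that triggered the callback, and the
driver walks its own iova->pfn cache to invalidate and unpin everything
inside [iova, iova + size). The per-device state, the list-based cache
and the helper names here are made up for the example, and the
vfio_unpin_pages() call assumes the user-pfn based signature proposed in
this series, so treat it as a sketch rather than the final interface:

#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mutex.h>
#include <linux/notifier.h>
#include <linux/slab.h>
#include <linux/vfio.h>

/* Illustrative per-device pin cache kept by the vendor driver. */
struct pin_entry {
	struct list_head next;
	unsigned long iova;		/* guest iova of the pinned page */
	unsigned long user_pfn;		/* iova >> PAGE_SHIFT, used for unpin */
};

struct my_mdev_state {
	struct device *dev;
	struct mutex pin_lock;
	struct list_head pinned;	/* filled when the driver pins pages */
	struct notifier_block nb;
};

static int my_mdev_dma_unmap_cb(struct notifier_block *nb,
				unsigned long action, void *data)
{
	struct my_mdev_state *state =
			container_of(nb, struct my_mdev_state, nb);
	struct vfio_iommu_type1_dma_unmap *unmap = data;
	struct pin_entry *p, *tmp;

	if (action != VFIO_IOMMU_NOTIFY_DMA_UNMAP)
		return NOTIFY_DONE;

	mutex_lock(&state->pin_lock);
	list_for_each_entry_safe(p, tmp, &state->pinned, next) {
		if (p->iova < unmap->iova ||
		    p->iova >= unmap->iova + unmap->size)
			continue;

		/* Invalidate the device's use of this iova here, then unpin. */
		vfio_unpin_pages(state->dev, &p->user_pfn, 1);

		list_del(&p->next);
		kfree(p);
	}
	mutex_unlock(&state->pin_lock);

	return NOTIFY_OK;
}

Unpinning one page per call keeps the example simple; a real driver
would presumably collect the user pfns for the whole range and hand them
to vfio_unpin_pages() in one batch.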
>> Thanks,
>>
>> Alex
>>
> Hi Kirti,
>
> This version requires the *vendor driver* call vfio_register_notifier
> for an mdev device before any pinning operations. I guess all of the
> vendor drivers may have some alike code for notifier
> registration/unregistration.
>
> My question is, how about letting the mdev framework managing the
> notifier registration/unregistration process?
>
> We could add a notifier_fn_t callback to "struct parent_ops", then the
> mdev framework should make sure that the vendor driver assigned a value
> to this callback. The mdev core could initiate a notifier_block for each
> parent driver with its callback, and register/unregister it to vfio in
> the right time.
>

The mdev_core module is independent of VFIO so far, and it should stay
independent of the VFIO module. Adding a notifier callback to parent_ops
is a good suggestion, but we shouldn't call vfio_register_notifier()/
vfio_unregister_notifier() from the mdev core module. The vfio_mdev
module would take care of registering/unregistering the notifier with
vfio from its vfio_mdev_open()/vfio_mdev_release() calls. That looks
cleaner to me (a rough sketch follows at the end of this mail).

The notifier callback in parent_ops should be optional, since not all
vendor drivers pin/unpin pages, for example the sample mtty driver. The
notifier would be registered with the vfio module only if the vendor
driver provides the callback.

If this looks reasonable, I'll include it in the next version of the
patch series.

Thanks,
Kirti
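
Rough sketch of the vfio_mdev side I have in mind, for illustration
only: the optional "notifier" member of parent_ops is the addition
proposed in this thread, the struct vfio_mdev wrapper is hypothetical,
and the vfio_register_notifier()/vfio_unregister_notifier() signatures
and the mdev_device/parent field accesses are assumptions based on this
patch series, not final code.

#include <linux/mdev.h>
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/vfio.h>

/* Hypothetical per-open state kept by vfio_mdev for this sketch. */
struct vfio_mdev {
	struct mdev_device *mdev;
	struct notifier_block nb;	/* wraps parent_ops->notifier, if any */
	bool nb_registered;
};

static int vfio_mdev_open(void *device_data)
{
	struct vfio_mdev *vmdev = device_data;
	struct mdev_device *mdev = vmdev->mdev;
	const struct parent_ops *ops = mdev->parent->ops;
	int ret;

	/* Register with vfio only if the vendor driver provides a callback. */
	if (ops->notifier) {
		vmdev->nb.notifier_call = ops->notifier;
		ret = vfio_register_notifier(&mdev->dev, &vmdev->nb);
		if (ret)
			return ret;
		vmdev->nb_registered = true;
	}

	ret = ops->open(mdev);
	if (ret && vmdev->nb_registered) {
		vfio_unregister_notifier(&mdev->dev, &vmdev->nb);
		vmdev->nb_registered = false;
	}
	return ret;
}

static void vfio_mdev_release(void *device_data)
{
	struct vfio_mdev *vmdev = device_data;
	struct mdev_device *mdev = vmdev->mdev;

	mdev->parent->ops->release(mdev);

	if (vmdev->nb_registered) {
		vfio_unregister_notifier(&mdev->dev, &vmdev->nb);
		vmdev->nb_registered = false;
	}
}

With this shape, a vendor driver that never pins pages (such as the
sample mtty driver) simply leaves parent_ops->notifier unset and nothing
is registered on its behalf.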