Date: Tue, 15 Nov 2016 20:58:15 -0700
From: Alex Williamson
To: Kirti Wankhede
Subject: Re: [PATCH v13 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP
Message-ID: <20161115205815.492670b2@t450s.home>
In-Reply-To: <473d10c5-b2cb-e976-a923-b5add22bcde6@nvidia.com>
References: <1479223805-22895-1-git-send-email-kwankhede@nvidia.com>
 <1479223805-22895-12-git-send-email-kwankhede@nvidia.com>
 <20161115151950.1e8ab7d6@t450s.home>
 <20161115201612.103893d7@t450s.home>
 <20161115202522.16d1990e@t450s.home>
 <473d10c5-b2cb-e976-a923-b5add22bcde6@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 16 Nov 2016 09:13:37 +0530
Kirti Wankhede wrote:

> On 11/16/2016 8:55 AM, Alex Williamson wrote:
> > On Tue, 15 Nov 2016 20:16:12 -0700
> > Alex Williamson wrote:
> >
> >> On Wed, 16 Nov 2016 08:16:15 +0530
> >> Kirti Wankhede wrote:
> >>
> >>> On 11/16/2016 3:49 AM, Alex Williamson wrote:
> >>>> On Tue, 15 Nov 2016 20:59:54 +0530
> >>>> Kirti Wankhede wrote:
> >>>>
> >>> ...
> >>>
> >>>>> @@ -854,7 +857,28 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >>>>>  		 */
> >>>>>  		if (dma->task->mm != current->mm)
> >>>>>  			break;
> >>>>> +
> >>>>>  		unmapped += dma->size;
> >>>>> +
> >>>>> +		if (iommu->external_domain && !RB_EMPTY_ROOT(&dma->pfn_list)) {
> >>>>> +			struct vfio_iommu_type1_dma_unmap nb_unmap;
> >>>>> +
> >>>>> +			nb_unmap.iova = dma->iova;
> >>>>> +			nb_unmap.size = dma->size;
> >>>>> +
> >>>>> +			/*
> >>>>> +			 * Notifier callback would call vfio_unpin_pages() which
> >>>>> +			 * would acquire iommu->lock. Release lock here and
> >>>>> +			 * reacquire it again.
> >>>>> +			 */
> >>>>> +			mutex_unlock(&iommu->lock);
> >>>>> +			blocking_notifier_call_chain(&iommu->notifier,
> >>>>> +						    VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> >>>>> +						    &nb_unmap);
> >>>>> +			mutex_lock(&iommu->lock);
> >>>>> +			if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
> >>>>> +				break;
> >>>>> +		}
> >>>>
> >>>>
> >>>> Why exactly do we need to notify per vfio_dma rather than per unmap
> >>>> request? If we do the latter we can send the notify first, limiting us
> >>>> to races where a page is pinned between the notify and the locking,
> >>>> whereas here, even our dma pointer is suspect once we re-acquire the
> >>>> lock, we don't technically know if another unmap could have removed
> >>>> that already. Perhaps something like this (untested):
> >>>>
> >>>
> >>> There are checks to validate unmap request, like v2 check and who is
> >>> calling unmap and is it allowed for that task to unmap. Before these
> >>> checks its not sure that unmap region range which asked for would be
> >>> unmapped all. Notify call should be at the place where its sure that the
> >>> range provided to notify call is definitely going to be removed. My
> >>> change do that.
> >>
> >> Ok, but that does solve the problem. What about this (untested):
> >
> > s/does/does not/
> >
> > BTW, I like how the retries here fill the gap in my previous proposal
> > where we could still race re-pinning. We've given it an honest shot or
> > someone is not participating if we've retried 10 times. I don't
> > understand why the test for iommu->external_domain was there, clearly
> > if the list is not empty, we need to notify. Thanks,
> >
>
> Ok. Retry is good to give a chance to unpin all. But is it really
> required to use BUG_ON() that would panic the host. I think WARN_ON
> should be fine and then when container is closed or when the last group
> is removed from the container, vfio_iommu_type1_release() is called and
> we have a chance to unpin it all.

See my comments on patch 10/22, we need to be vigilant that the vendor
driver is participating. I don't think we should be cleaning up after
the vendor driver on release, if we need to do that, it implies we
already have problems in multi-mdev containers since we'll be left with
pfn_list entries that no longer have an owner. Thanks,

Alex
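
For readers following the thread, the drop-lock, notify, retake-lock, retry pattern discussed above can be sketched in userspace C. This is only an illustration under stated assumptions, not the untested kernel code referred to in the thread: `toy_iommu`, `toy_unmap`, `toy_unpin_all`, and the two callbacks are hypothetical names, a pthread mutex stands in for `iommu->lock`, a counter stands in for the `pfn_list` entries, and a single function pointer stands in for the notifier chain. The retry limit of 10 is the figure mentioned in the thread. Where the thread debates `BUG_ON()` versus `WARN_ON()`, the sketch simply returns false.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_UNMAP_RETRIES 10	/* retry count mentioned in the thread */

/* Hypothetical stand-in for the iommu state; not the kernel structure. */
struct toy_iommu {
	pthread_mutex_t lock;				/* models iommu->lock */
	size_t pinned;					/* models pfn_list entries */
	void (*notify)(struct toy_iommu *iommu);	/* models the notifier chain */
};

/* The unpin path takes the lock itself, which is why the caller must drop it. */
static void toy_unpin_all(struct toy_iommu *iommu)
{
	pthread_mutex_lock(&iommu->lock);
	iommu->pinned = 0;
	pthread_mutex_unlock(&iommu->lock);
}

/* A participating vendor driver: unpins everything when notified. */
static void cooperative_notify(struct toy_iommu *iommu)
{
	toy_unpin_all(iommu);
}

/* A non-participating vendor driver: ignores the notification. */
static void absent_notify(struct toy_iommu *iommu)
{
	(void)iommu;
}

/*
 * Returns true if nothing is pinned once the lock is held again, false if
 * pages are still pinned after MAX_UNMAP_RETRIES notifications.
 */
static bool toy_unmap(struct toy_iommu *iommu)
{
	int retries = 0;
	bool ok;

	assert(iommu != NULL && iommu->notify != NULL);

	pthread_mutex_lock(&iommu->lock);
	while (iommu->pinned != 0 && retries < MAX_UNMAP_RETRIES) {
		/* Drop the lock so the callback can take it to unpin. */
		pthread_mutex_unlock(&iommu->lock);
		iommu->notify(iommu);
		pthread_mutex_lock(&iommu->lock);
		/* State may have changed while unlocked; the loop re-checks it. */
		retries++;
	}
	ok = (iommu->pinned == 0);
	pthread_mutex_unlock(&iommu->lock);
	return ok;
}
```

With `cooperative_notify` the loop exits after one pass and `toy_unmap()` returns true. With `absent_notify` it gives up after ten attempts and returns false, leaving the pinned count untouched, which is the "someone is not participating" case.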