Date: Tue, 15 Nov 2016 20:58:15 -0700
From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: <pbonzini@redhat.com>, <kraxel@redhat.com>, <cjia@nvidia.com>,
        <qemu-devel@nongnu.org>, <kvm@vger.kernel.org>, <kevin.tian@intel.com>,
        <jike.song@intel.com>, <bjsdjshi@linux.vnet.ibm.com>,
        <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v13 11/22] vfio iommu: Add blocking notifier to notify
 DMA_UNMAP
Message-ID: <20161115205815.492670b2@t450s.home>
In-Reply-To: <473d10c5-b2cb-e976-a923-b5add22bcde6@nvidia.com>
References: <1479223805-22895-1-git-send-email-kwankhede@nvidia.com>
        <1479223805-22895-12-git-send-email-kwankhede@nvidia.com>
        <20161115151950.1e8ab7d6@t450s.home>
        <ff36c637-92ec-a768-7bc4-3015c30dba12@nvidia.com>
        <20161115201612.103893d7@t450s.home>
        <20161115202522.16d1990e@t450s.home>
        <473d10c5-b2cb-e976-a923-b5add22bcde6@nvidia.com>

On Wed, 16 Nov 2016 09:13:37 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 11/16/2016 8:55 AM, Alex Williamson wrote:
> > On Tue, 15 Nov 2016 20:16:12 -0700
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> >> On Wed, 16 Nov 2016 08:16:15 +0530
> >> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>  
> >>> On 11/16/2016 3:49 AM, Alex Williamson wrote:    
> >>>> On Tue, 15 Nov 2016 20:59:54 +0530
> >>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>       
> >>> ...
> >>>     
> >>>>> @@ -854,7 +857,28 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> >>>>>  		 */
> >>>>>  		if (dma->task->mm != current->mm)
> >>>>>  			break;
> >>>>> +
> >>>>>  		unmapped += dma->size;
> >>>>> +
> >>>>> +		if (iommu->external_domain && !RB_EMPTY_ROOT(&dma->pfn_list)) {
> >>>>> +			struct vfio_iommu_type1_dma_unmap nb_unmap;
> >>>>> +
> >>>>> +			nb_unmap.iova = dma->iova;
> >>>>> +			nb_unmap.size = dma->size;
> >>>>> +
> >>>>> +			/*
> >>>>> +			 * Notifier callback would call vfio_unpin_pages() which
> >>>>> +			 * would acquire iommu->lock. Release lock here and
> >>>>> +			 * reacquire it again.
> >>>>> +			 */
> >>>>> +			mutex_unlock(&iommu->lock);
> >>>>> +			blocking_notifier_call_chain(&iommu->notifier,
> >>>>> +						    VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> >>>>> +						    &nb_unmap);
> >>>>> +			mutex_lock(&iommu->lock);
> >>>>> +			if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list)))
> >>>>> +				break;
> >>>>> +		}      
> >>>>
> >>>>
> >>>> Why exactly do we need to notify per vfio_dma rather than per unmap
> >>>> request?  If we do the latter we can send the notify first, limiting us
> >>>> to races where a page is pinned between the notify and the locking,
> >>>> whereas here even our dma pointer is suspect once we re-acquire the
> >>>> lock; we don't technically know whether another unmap has already
> >>>> removed it.  Perhaps something like this (untested):
> >>>>       
> >>>
> >>> There are checks to validate the unmap request, like the v2 check,
> >>> who is calling unmap, and whether that task is allowed to unmap.
> >>> Before these checks it's not certain that the whole requested range
> >>> will be unmapped. The notify call should be at the place where it's
> >>> certain that the range passed to the notifier is definitely going to
> >>> be removed. My change does that.
> >>
> >> Ok, but that does solve the problem.  What about this (untested):  
> > 
> > s/does/does not/
> > 
> > BTW, I like how the retries here fill the gap in my previous proposal
> > where we could still race re-pinning.  If we've retried 10 times, we've
> > given it an honest shot, or someone is not participating.  I don't
> > understand why the test for iommu->external_domain was there; clearly,
> > if the list is not empty, we need to notify.  Thanks,
> >   
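The bounded-retry idea discussed above can be illustrated with a small
userspace model. This is a hypothetical sketch, not the snipped proposal
and not kernel code: every model_* name, the bool standing in for
iommu->lock, the function pointer standing in for the blocking notifier
chain, and resetting the retry count whenever a different vfio_dma is hit
are assumptions; only the limit of 10 retries comes from the thread. It
shows two points: the notifier runs with the lock dropped because the
unpin path takes the lock itself, and after re-acquiring the lock the walk
restarts instead of trusting the old dma pointer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the unmap/notify/retry loop; not kernel code. */
#define MODEL_MAX_RETRIES 10 /* the "retried 10 times" bound from the thread */
#define MODEL_NR_DMA 4

struct model_dma {
	bool present; /* mapping still exists */
	int pinned;   /* nonzero stands in for a non-empty dma->pfn_list */
};

struct model_iommu {
	bool locked; /* stands in for iommu->lock */
	struct model_dma dma[MODEL_NR_DMA];
	/* stands in for the blocking notifier chain; may be NULL */
	void (*notify_unmap)(struct model_iommu *iommu, int idx);
};

static void model_lock(struct model_iommu *iommu)
{
	assert(!iommu->locked); /* a mutex is not recursive */
	iommu->locked = true;
}

static void model_unlock(struct model_iommu *iommu)
{
	assert(iommu->locked);
	iommu->locked = false;
}

/* Vendor callback that participates: like vfio_unpin_pages(), it takes the
 * lock itself, so it must be called with the lock dropped. */
static void model_vendor_unpins(struct model_iommu *iommu, int idx)
{
	model_lock(iommu);
	iommu->dma[idx].pinned = 0;
	model_unlock(iommu);
}

/* Vendor callback that ignores the notification. */
static void model_vendor_ignores(struct model_iommu *iommu, int idx)
{
	(void)iommu;
	(void)idx;
}

/* Unmap every present entry.  Returns 0 on success, or -1 if one entry
 * stayed pinned after MODEL_MAX_RETRIES further notifications. */
static int model_unmap_all(struct model_iommu *iommu)
{
	int last = -1, retries = 0;

	model_lock(iommu);
again:
	for (int i = 0; i < MODEL_NR_DMA; i++) {
		struct model_dma *dma = &iommu->dma[i];

		if (!dma->present)
			continue;
		if (dma->pinned) {
			if (i == last) {
				if (++retries > MODEL_MAX_RETRIES) {
					model_unlock(iommu);
					return -1; /* WARN_ON() vs BUG_ON() territory */
				}
			} else {
				last = i; /* assumption: count resets per entry */
				retries = 0;
			}
			model_unlock(iommu);
			if (iommu->notify_unmap)
				iommu->notify_unmap(iommu, i);
			model_lock(iommu);
			/* State may have changed while unlocked: restart the
			 * walk rather than trusting 'dma'. */
			goto again;
		}
		dma->present = false; /* the actual unmap */
	}
	model_unlock(iommu);
	return 0;
}
```

With a vendor callback that unpins, model_unmap_all() returns 0; with one
that ignores the notification, it gives up after the retry bound and
returns -1, which is the point where the thread debates WARN_ON() versus
BUG_ON().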
> 
> Ok. Retrying is good to give a chance to unpin everything. But is it
> really required to use BUG_ON(), which would panic the host? I think
> WARN_ON should be fine, and then when the container is closed or when the
> last group is removed from the container, vfio_iommu_type1_release() is
> called and we have a chance to unpin it all.
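To make the multi-mdev concern concrete, here is a hypothetical userspace
model in which each pinned entry records which mdev pinned it. The real
pfn_list entries are not assumed to carry such an owner field; struct
model_pin, model_detach() and the attached[] flags are illustrative
assumptions. It shows that once a vendor driver detaches without
unpinning, its entries are already orphaned by the time a release-time
cleanup would run, with no attached driver left to ask to unpin them.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PINS   8
#define MODEL_MAX_OWNERS 4

/* Hypothetical stand-in for a pfn_list entry, tagged with the mdev that pinned it. */
struct model_pin {
	bool used;
	unsigned long pfn;
	int owner;
};

/* Hypothetical stand-in for one container shared by several mdev devices. */
struct model_container {
	struct model_pin pins[MODEL_MAX_PINS];
	bool attached[MODEL_MAX_OWNERS];
};

/* Record a pin on behalf of an attached owner; returns 0, or -1 if full. */
static int model_pin(struct model_container *c, int owner, unsigned long pfn)
{
	assert(owner >= 0 && owner < MODEL_MAX_OWNERS && c->attached[owner]);
	for (int i = 0; i < MODEL_MAX_PINS; i++) {
		if (!c->pins[i].used) {
			c->pins[i] = (struct model_pin){
				.used = true, .pfn = pfn, .owner = owner };
			return 0;
		}
	}
	return -1;
}

/* Detach an mdev; a participating vendor driver first unpins all it pinned. */
static void model_detach(struct model_container *c, int owner, bool vendor_unpins)
{
	if (vendor_unpins) {
		for (int i = 0; i < MODEL_MAX_PINS; i++)
			if (c->pins[i].used && c->pins[i].owner == owner)
				c->pins[i].used = false;
	}
	c->attached[owner] = false;
}

/* Count pins whose owner is already gone: nobody is left to unpin them. */
static int model_count_orphans(const struct model_container *c)
{
	int n = 0;

	for (int i = 0; i < MODEL_MAX_PINS; i++)
		if (c->pins[i].used && !c->attached[c->pins[i].owner])
			n++;
	return n;
}
```

For example, if two mdevs share a container and one detaches after
unpinning while the other detaches without unpinning its two pins,
model_count_orphans() reports 2.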

See my comments on patch 10/22; we need to be vigilant that the vendor
driver is participating.  I don't think we should be cleaning up after
the vendor driver on release.  If we need to do that, it implies we
already have problems in multi-mdev containers, since we'll be left with
pfn_list entries that no longer have an owner.  Thanks,

Alex