Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934339AbcKPPW3 (ORCPT ); Wed, 16 Nov 2016 10:22:29 -0500 Received: from hqemgate16.nvidia.com ([216.228.121.65]:3971 "EHLO hqemgate16.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933122AbcKPPW0 (ORCPT ); Wed, 16 Nov 2016 10:22:26 -0500 X-PGP-Universal: processed; by hqpgpgate101.nvidia.com on Wed, 16 Nov 2016 07:22:24 -0800 Subject: Re: [PATCH v13 11/22] vfio iommu: Add blocking notifier to notify DMA_UNMAP To: Alex Williamson References: <1479223805-22895-1-git-send-email-kwankhede@nvidia.com> <1479223805-22895-12-git-send-email-kwankhede@nvidia.com> <20161115151950.1e8ab7d6@t450s.home> <20161115201612.103893d7@t450s.home> <20161115202522.16d1990e@t450s.home> <473d10c5-b2cb-e976-a923-b5add22bcde6@nvidia.com> <20161115205815.492670b2@t450s.home> <20161115213626.50566052@t450s.home> CC: , , , , , , , , X-Nvconfidentiality: public From: Kirti Wankhede Message-ID: <156be3c0-c794-41ca-bbda-10c7cd8ae4ce@nvidia.com> Date: Wed, 16 Nov 2016 20:52:14 +0530 MIME-Version: 1.0 In-Reply-To: <20161115213626.50566052@t450s.home> X-Originating-IP: [10.24.216.210] X-ClientProxiedBy: DRUKMAIL101.nvidia.com (10.25.59.19) To bgmail102.nvidia.com (10.25.59.11) Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5596 Lines: 123 On 11/16/2016 10:06 AM, Alex Williamson wrote: > On Wed, 16 Nov 2016 09:46:20 +0530 > Kirti Wankhede wrote: > >> On 11/16/2016 9:28 AM, Alex Williamson wrote: >>> On Wed, 16 Nov 2016 09:13:37 +0530 >>> Kirti Wankhede wrote: >>> >>>> On 11/16/2016 8:55 AM, Alex Williamson wrote: >>>>> On Tue, 15 Nov 2016 20:16:12 -0700 >>>>> Alex Williamson wrote: >>>>> >>>>>> On Wed, 16 Nov 2016 08:16:15 +0530 >>>>>> Kirti Wankhede wrote: >>>>>> >>>>>>> On 11/16/2016 3:49 AM, Alex Williamson wrote: >>>>>>>> On Tue, 15 Nov 2016 20:59:54 +0530 >>>>>>>> Kirti Wankhede wrote: >>>>>>>> >>>>>>> ... >>>>>>> >>>>>>>>> @@ -854,7 +857,28 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, >>>>>>>>> */ >>>>>>>>> if (dma->task->mm != current->mm) >>>>>>>>> break; >>>>>>>>> + >>>>>>>>> unmapped += dma->size; >>>>>>>>> + >>>>>>>>> + if (iommu->external_domain && !RB_EMPTY_ROOT(&dma->pfn_list)) { >>>>>>>>> + struct vfio_iommu_type1_dma_unmap nb_unmap; >>>>>>>>> + >>>>>>>>> + nb_unmap.iova = dma->iova; >>>>>>>>> + nb_unmap.size = dma->size; >>>>>>>>> + >>>>>>>>> + /* >>>>>>>>> + * Notifier callback would call vfio_unpin_pages() which >>>>>>>>> + * would acquire iommu->lock. Release lock here and >>>>>>>>> + * reacquire it again. >>>>>>>>> + */ >>>>>>>>> + mutex_unlock(&iommu->lock); >>>>>>>>> + blocking_notifier_call_chain(&iommu->notifier, >>>>>>>>> + VFIO_IOMMU_NOTIFY_DMA_UNMAP, >>>>>>>>> + &nb_unmap); >>>>>>>>> + mutex_lock(&iommu->lock); >>>>>>>>> + if (WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list))) >>>>>>>>> + break; >>>>>>>>> + } >>>>>>>> >>>>>>>> >>>>>>>> Why exactly do we need to notify per vfio_dma rather than per unmap >>>>>>>> request? If we do the latter we can send the notify first, limiting us >>>>>>>> to races where a page is pinned between the notify and the locking, >>>>>>>> whereas here, even our dma pointer is suspect once we re-acquire the >>>>>>>> lock, we don't technically know if another unmap could have removed >>>>>>>> that already. Perhaps something like this (untested): >>>>>>>> >>>>>>> >>>>>>> There are checks to validate unmap request, like v2 check and who is >>>>>>> calling unmap and is it allowed for that task to unmap. Before these >>>>>>> checks its not sure that unmap region range which asked for would be >>>>>>> unmapped all. Notify call should be at the place where its sure that the >>>>>>> range provided to notify call is definitely going to be removed. My >>>>>>> change do that. >>>>>> >>>>>> Ok, but that does solve the problem. What about this (untested): >>>>> >>>>> s/does/does not/ >>>>> >>>>> BTW, I like how the retries here fill the gap in my previous proposal >>>>> where we could still race re-pinning. We've given it an honest shot or >>>>> someone is not participating if we've retried 10 times. I don't >>>>> understand why the test for iommu->external_domain was there, clearly >>>>> if the list is not empty, we need to notify. Thanks, >>>>> >>>> >>>> Ok. Retry is good to give a chance to unpin all. But is it really >>>> required to use BUG_ON() that would panic the host. I think WARN_ON >>>> should be fine and then when container is closed or when the last group >>>> is removed from the container, vfio_iommu_type1_release() is called and >>>> we have a chance to unpin it all. >>> >>> See my comments on patch 10/22, we need to be vigilant that the vendor >>> driver is participating. I don't think we should be cleaning up after >>> the vendor driver on release, if we need to do that, it implies we >>> already have problems in multi-mdev containers since we'll be left with >>> pfn_list entries that no longer have an owner. Thanks, >>> >> >> If any vendor driver doesn't clean its pinned pages and there are >> entries in pfn_list with no owner, that would be indicated by WARN_ON, >> which should be fixed by that vendor driver. I still feel it shouldn't >> cause host panic. >> When such warning is seen with multiple mdev devices in container, it is >> easy to isolate and find which vendor driver is not cleaning their >> stuff, same warning would be seen with single mdev device in a >> container. To isolate and find which vendor driver is culprit check with >> one mdev device at a time. >> Finally, we have a chance to clean all residue from >> vfio_iommu_type1_release() so that vfio_iommu_type1 module doesn't leave >> any leaks. > > How can we claim that we've resolved anything by unpinning the > residue? In fact, is it actually safe to unpin any residue left by the > vendor driver or does it imply that we're promoting a simple memory > leak to a security issue because we can't verify whether the vendor > driver has disabled access to that pfn, which may not reference a user > page after we unpin it. That, in addition to the fact that I don't > need to figure out how to break from the loop with a BUG_ON, is why I > chose that rather than a WARN_ON. The release path could probably be a > WARN_ON since the user no longer has access to the device, so we have a > consistency error with the vendor driver, but we're probably not > promoting it further by unpinning the pages. Thanks, > Ok. Agree with the security concern you mentioned. Changing to BUG_ON as you suggested in vfio_dma_do_unmap() and replacing 'unpinning remaining pages on detach_group and release' with 'WARN_ON' if there are unpinned pages. Thanks, Kirti