Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753662AbcJZPQS (ORCPT ); Wed, 26 Oct 2016 11:16:18 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34330 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751399AbcJZPQR (ORCPT ); Wed, 26 Oct 2016 11:16:17 -0400 Date: Wed, 26 Oct 2016 09:16:15 -0600 From: Alex Williamson To: "Tian, Kevin" Cc: Kirti Wankhede , "pbonzini@redhat.com" , "kraxel@redhat.com" , "cjia@nvidia.com" , "qemu-devel@nongnu.org" , "kvm@vger.kernel.org" , "Song, Jike" , "bjsdjshi@linux.vnet.ibm.com" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH v9 04/12] vfio iommu: Add support for mediated devices Message-ID: <20161026091615.42f0b385@t450s.home> In-Reply-To: References: <1476739332-4911-1-git-send-email-kwankhede@nvidia.com> <1476739332-4911-5-git-send-email-kwankhede@nvidia.com> <20161019150231.19e32b84@t450s.home> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Wed, 26 Oct 2016 15:16:16 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3630 Lines: 77 On Wed, 26 Oct 2016 07:53:43 +0000 "Tian, Kevin" wrote: > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > Sent: Thursday, October 20, 2016 5:03 AM > > > @@ -83,6 +92,21 @@ struct vfio_group { > > > }; > > > > > > /* > > > + * Guest RAM pinning working set or DMA target > > > + */ > > > +struct vfio_pfn { > > > + struct rb_node node; > > > + unsigned long vaddr; /* virtual addr */ > > > + dma_addr_t iova; /* IOVA */ > > > + unsigned long pfn; /* Host pfn */ > > > + int prot; > > > + atomic_t ref_count; > > > +}; > > > > Somehow we're going to need to fit an invalidation callback here too. > > How would we handle a case where there are multiple mdev devices, from > > different vendor drivers, that all have the same pfn pinned? I'm > > already concerned about the per pfn overhead we're introducing here so > > clearly we cannot store an invalidation callback per pinned page, per > > vendor driver. Perhaps invalidations should be done using a notifier > > chain per vfio_iommu, the vendor drivers are required to register on > > that chain (fail pinning with empty notifier list) user unmapping > > will be broadcast to the notifier chain, the vendor driver will be > > responsible for deciding if each unmap is relevant to them (potentially > > it's for a pinning from another driver). > > > > I expect we also need to enforce that vendors perform a synchronous > > unmap such that after returning from the notifier list call, the > > vfio_pfn should no longer exist. If it does we might need to BUG_ON. > > Also be careful to pay attention to the locking of the notifier vs > > unpin callbacks to avoid deadlocks. > > > > What about just requesting vendor driver to provide a callback in parent > device ops? How does the iommu driver get to the mdev vendor driver callback? We can also have pages pinned by multiple vendor drivers, I don't think we want the additional overhead of a per page list of invalidation callbacks. > Curious in which scenario the user application (say Qemu here) may > unmap memory pages which are still pinned by vendor driver... Is it > purely about a corner case which we want to handle elegantly? The vfio type1 iommu API provides a MAP and UNMAP interface. The unmap call is expected to work regardless of how it might inhibit the device from working. This is currently true of iommu protected devices today, a user can unmap pages which might be DMA targets for the device and the iommu prevents further access to those pages, possibly at the expense of device operation. We cannot support an interface where a user can unmap a set of pages and map in new pages to replace them when the vendor driver might be caching stale mappings. In normal VM operation perhaps this is a corner case, but the API is not defined only for the normal and expected behavior of a VM. > If yes, possibly a simpler way is to force destroying mdev instead of > asking vendor driver to take care of each invalidation request under > such situation. Since anyway the mdev device won't be in an usable > state anymore... (sorry if I missed the key problem here.) That's a pretty harsh response for an operation which is completely valid from an API perspective. What if the VM does an unmap of all memory around reset? We cannot guarantee that the guest driver will have a chance to do cleanup, the guest may have crashed or a system_reset invoked. Would you have the mdev destroyed in this case? How could QEMU, which has no device specific driver to know that vendor pinnings are present, recover from this? Thanks, Alex