Date: Wed, 9 Nov 2016 20:31:45 +0000
From: Will Deacon
To: Christoffer Dall
Cc: Don Dutile, Alex Williamson, Eric Auger, eric.auger.pro@gmail.com,
    marc.zyngier@arm.com, robin.murphy@arm.com, joro@8bytes.org,
    tglx@linutronix.de, jason@lakedaemon.net,
    linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
    drjones@redhat.com, linux-kernel@vger.kernel.org,
    pranav.sawargaonkar@gmail.com, iommu@lists.linux-foundation.org,
    punit.agrawal@arm.com, diana.craciun@nxp.com, benh@kernel.crashing.org,
    arnd@arndb.de, jcm@redhat.com, dwmw@amazon.co.uk
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Message-ID: <20161109203145.GO17771@arm.com>
In-Reply-To: <20161109192303.GD15676@cbox>

On Wed, Nov 09, 2016 at 08:23:03PM +0100, Christoffer Dall wrote:
> On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> > On 11/09/2016 12:03 PM, Will Deacon wrote:
> > >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> > >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> > >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> > >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> > >>>the device, and lose data and potentially trigger spurious interrupts on
> > >>>DMA write from the device. Thanks,
> > >>>
> > >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> > >>they address match before the SMMU checks are done. if
> > >>all DMA addrs had to go through SMMU first, then the DMA access could
> > >>be ignored/rejected.
> > >
> > >That's actually not true :( The SMMU can't generally distinguish between MSI
> > >writes and DMA writes, so it would just see a write transaction to the
> > >doorbell address, regardless of how it was generated by the endpoint.
> > >
> > So, we have real systems where MSI doorbells are placed at the same IOVA
> > that could have memory for a guest
>
> I don't think this is a property of a hardware system. The problem is
> userspace not knowing where in the IOVA space the kernel is going to
> place the doorbell, so you can end up (basically by chance) that some
> IPA range of guest memory overlaps with the IOVA space for the doorbell.

I think the case that Don has in mind is where the host is using the SMMU
for DMA mapping. In that case, yes, the IOVAs assigned by things like
dma_map_single mustn't collide with any fixed MSI mapping. We currently take
care to avoid PCI windows, but nobody has added the code for the fixed MSI
mappings yet (I think we should put the onus on the people with the broken
systems for that). Depending on how the firmware describes the fixed MSI
address, either the irqchip driver can take care of it in compose_msi_msg,
or we could do something in the iommu_dma_map_msi_msg path to ensure that
the fixed region is preallocated in the msi_page_list.

I'm less fussed about this issue because there's not a user ABI involved,
so it can all be fixed later.

> >, but not at the same IOVA as memory on real hw ?
>
> On real hardware without an IOMMU the system designer would have to
> separate the IOVA and RAM in the physical address space. With an IOMMU,
> the SMMU driver just makes sure to allocate separate regions in the IOVA
> space.
>
> The challenge, as I understand it, happens with the VM, because the VM
> doesn't allocate the IOVA for the MSI doorbell itself, but the host
> kernel does this, independently from the attributes (e.g. memory map) of
> the VM.
>
> Because the IOVA is a single resource, but with two independent entities
> allocating chunks of it (the host kernel for the MSI doorbell IOVA, and
> the VFIO user for other DMA operations), you have to provide some
> coordination between those two entities to avoid conflicts. In the case
> of KVM, the two entities are the host kernel and the VFIO user (QEMU/the
> VM), and the host kernel informs the VFIO user to never attempt to use
> the doorbell IOVA already reserved by the host kernel for DMA.
>
> One way to do that is to ensure that the IPA space of the VFIO user
> corresponding to the doorbell IOVA is simply not valid, ie. the reserved
> regions that avoid for example QEMU to allocate RAM there.
>
> (I suppose it's technically possible to get around this issue by letting
> QEMU place RAM wherever it wants but tell the guest to never use a
> particular subset of its RAM for DMA, because that would conflict with
> the doorbell IOVA or be seen as p2p transactions. But I think we all
> probably agree that it's a disgusting idea.)

Disgusting, yes, but Ben's idea of hotplugging on the host controller with
firmware tables describing the reserved regions is something that we could
do in the distant future. In the meantime, I don't think that VFIO should
explicitly reject overlapping mappings if userspace asks for them.

Will
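
For illustration only, and not taken from any patch in this thread: the
conflict discussed above comes down to an interval-overlap test between a
reserved doorbell IOVA window and the guest RAM regions that userspace wants
to map. The sketch below assumes half-open ranges; struct addr_range,
ranges_overlap() and first_ram_conflict() are hypothetical names, not kernel
or VFIO API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open address range: [start, start + size). */
struct addr_range {
	uint64_t start;
	uint64_t size;
};

/*
 * Return true if the two ranges share at least one address.
 * The comparison uses differences of the start addresses rather than
 * start + size, so it cannot wrap around at the top of the 64-bit space.
 */
bool ranges_overlap(struct addr_range a, struct addr_range b)
{
	if (a.size == 0 || b.size == 0)
		return false;
	if (a.start < b.start)
		return b.start - a.start < a.size;
	return a.start - b.start < b.size;
}

/*
 * Return the index of the first guest RAM region that collides with the
 * reserved doorbell window, or -1 if none does. This is the check that a
 * VFIO user could run before placing RAM, assuming it has been told where
 * the doorbell window is.
 */
int first_ram_conflict(const struct addr_range *ram, size_t nr_ram,
		       struct addr_range doorbell)
{
	for (size_t i = 0; i < nr_ram; i++) {
		if (ranges_overlap(ram[i], doorbell))
			return (int)i;
	}
	return -1;
}
```

Whether a conflict should be reported or rejected is a policy question that
the sketch deliberately leaves to the caller; it only reports the first
colliding region.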