Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753283AbcKITXH (ORCPT ); Wed, 9 Nov 2016 14:23:07 -0500
Received: from mail-lf0-f46.google.com ([209.85.215.46]:33291 "EHLO mail-lf0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751496AbcKITXE (ORCPT ); Wed, 9 Nov 2016 14:23:04 -0500
Date: Wed, 9 Nov 2016 20:23:03 +0100
From: Christoffer Dall
To: Don Dutile
Cc: Will Deacon, Alex Williamson, Eric Auger, eric.auger.pro@gmail.com,
	marc.zyngier@arm.com, robin.murphy@arm.com, joro@8bytes.org,
	tglx@linutronix.de, jason@lakedaemon.net,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	drjones@redhat.com, linux-kernel@vger.kernel.org,
	pranav.sawargaonkar@gmail.com, iommu@lists.linux-foundation.org,
	punit.agrawal@arm.com, diana.craciun@nxp.com, benh@kernel.crashing.org,
	arnd@arndb.de, jcm@redhat.com, dwmw@amazon.co.uk
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Message-ID: <20161109192303.GD15676@cbox>
References: <1478209178-3009-1-git-send-email-eric.auger@redhat.com> <20161103220205.37715b49@t450s.home> <20161108024559.GA20591@arm.com> <20161108202922.GC15676@cbox> <20161108163508.1bcae0c2@t450s.home> <58228F71.6020108@redhat.com> <20161109170326.GG17771@arm.com> <582371FB.2040808@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <582371FB.2040808@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4237
Lines: 86

On Wed, Nov 09, 2016 at 01:59:07PM -0500, Don Dutile wrote:
> On 11/09/2016 12:03 PM, Will Deacon wrote:
> >On Tue, Nov 08, 2016 at 09:52:33PM -0500, Don Dutile wrote:
> >>On 11/08/2016 06:35 PM, Alex Williamson wrote:
> >>>On Tue, 8 Nov 2016 21:29:22 +0100
> >>>Christoffer Dall wrote:
> >>>>Is my understanding correct, that you need to tell userspace about the
> >>>>location of the doorbell (in the IOVA space) in case (2), because even
> >>>>though the configuration of the device is handled by the (host) kernel
> >>>>through trapping of the BARs, we have to avoid the VFIO user programming
> >>>>the device to create other DMA transactions to this particular address,
> >>>>since that will obviously conflict and either not produce the desired
> >>>>DMA transactions or result in unintended weird interrupts?
> >
> >Yes, that's the crux of the issue.
> >
> >>>Correct, if the MSI doorbell IOVA range overlaps RAM in the VM, then
> >>>it's potentially a DMA target and we'll get bogus data on DMA read from
> >>>the device, and lose data and potentially trigger spurious interrupts on
> >>>DMA write from the device.  Thanks,
> >>>
> >>That's b/c the MSI doorbells are not positioned *above* the SMMU, i.e.,
> >>they address match before the SMMU checks are done.  if
> >>all DMA addrs had to go through SMMU first, then the DMA access could
> >>be ignored/rejected.
> >
> >That's actually not true :( The SMMU can't generally distinguish between MSI
> >writes and DMA writes, so it would just see a write transaction to the
> >doorbell address, regardless of how it was generated by the endpoint.
> >
> >Will
> >
> So, we have real systems where MSI doorbells are placed at the same IOVA
> that could have memory for a guest

I don't think this is a property of a hardware system.  The problem is
that userspace doesn't know where in the IOVA space the kernel is going
to place the doorbell, so you can end up (basically by chance) with some
IPA range of guest memory overlapping the IOVA space for the doorbell.

>, but not at the same IOVA as memory on real hw ?

On real hardware without an IOMMU the system designer would have to
separate the IOVA and RAM in the physical address space.  With an IOMMU,
the SMMU driver just makes sure to allocate separate regions in the IOVA
space.

The challenge, as I understand it, happens with the VM, because the VM
doesn't allocate the IOVA for the MSI doorbell itself; the host kernel
does this, independently of the attributes (e.g. memory map) of the VM.

Because the IOVA space is a single resource, but with two independent
entities allocating chunks of it (the host kernel for the MSI doorbell
IOVA, and the VFIO user for other DMA operations), you have to provide
some coordination between those two entities to avoid conflicts.  In the
case of KVM, the two entities are the host kernel and the VFIO user
(QEMU/the VM), and the host kernel tells the VFIO user never to use for
DMA the doorbell IOVA that the host kernel has already reserved.

One way to do that is to ensure that the IPA space of the VFIO user
corresponding to the doorbell IOVA is simply not valid, i.e. reserved
regions that prevent, for example, QEMU from allocating RAM there.

(I suppose it's technically possible to get around this issue by letting
QEMU place RAM wherever it wants but tell the guest to never use a
particular subset of its RAM for DMA, because that would conflict with
the doorbell IOVA or be seen as p2p transactions.  But I think we all
probably agree that it's a disgusting idea.)

> How are memory holes passed to SMMU so it doesn't have this issue for bare-metal
> (assign an IOVA that overlaps an MSI doorbell address)?
>

As I understand it, the SMMU driver manages the whole IOVA space when
VFIO is *not* involved, so it simply allocates non-overlapping regions.

The problem occurs when you have two independent entities essentially
attempting to manage the same resource (and the problem is exacerbated
by the VM potentially allocating slots in the IOVA space which may have
other limitations it doesn't know about, for example the p2p regions,
because the VM doesn't know anything about the topology of the
underlying physical system).

Christoffer
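
As an illustration only, the coordination described above can be sketched
as a plain range check: given the IOVA windows the host has reserved, the
VFIO user keeps guest RAM out of them.  This is not taken from any kernel
or QEMU code; Region, overlaps() and carve_holes() are hypothetical names,
and the addresses in the usage note are made up.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A half-open address range [start, start + size). Hypothetical type."""
    start: int
    size: int

    @property
    def end(self) -> int:
        return self.start + self.size


def overlaps(a: Region, b: Region) -> bool:
    """True if the two half-open ranges share at least one address."""
    return a.start < b.end and b.start < a.end


def carve_holes(ram: Region, reserved: List[Region]) -> List[Region]:
    """Split one candidate RAM region so that it avoids every reserved
    window (for example a doorbell IOVA reported by the host)."""
    pieces = []
    cursor = ram.start
    for res in sorted(reserved, key=lambda r: r.start):
        if res.end <= cursor or res.start >= ram.end:
            continue  # window lies entirely outside what is left of RAM
        if res.start > cursor:
            pieces.append(Region(cursor, res.start - cursor))
        cursor = max(cursor, res.end)
    if cursor < ram.end:
        pieces.append(Region(cursor, ram.end - cursor))
    return pieces
```

For example, with a made-up 64 KiB doorbell window at 0x08000000 and 1 GiB
of guest RAM starting at 0, carve_holes() returns two pieces, one below
the window and one above it, so no guest page shares an IOVA with the
doorbell.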