Date: Fri, 11 Nov 2016 08:50:56 -0700
From: Alex Williamson
To: Joerg Roedel
Cc: Auger Eric, Will Deacon, drjones@redhat.com, Christoffer Dall,
    jason@lakedaemon.net, kvm@vger.kernel.org, marc.zyngier@arm.com,
    benh@kernel.crashing.org, punit.agrawal@arm.com,
    linux-kernel@vger.kernel.org, diana.craciun@nxp.com,
    iommu@lists.linux-foundation.org, pranav.sawargaonkar@gmail.com,
    arnd@arndb.de, dwmw@amazon.co.uk, jcm@redhat.com, Don Dutile,
    tglx@linutronix.de, robin.murphy@arm.com,
    linux-arm-kernel@lists.infradead.org, eric.auger.pro@gmail.com
Subject: Re: Summary of LPC guest MSI discussion in Santa Fe
Message-ID: <20161111085056.4cf8989d@t450s.home>
In-Reply-To: <20161111111944.GO2078@8bytes.org>
References: <20161109151709.74927f83@t450s.home>
    <20161109222522.GS17771@arm.com>
    <20161109162458.39594fdb@t450s.home>
    <20161109233847.GT17771@arm.com>
    <20161109165957.62c1eb61@t450s.home>
    <83b6440a-31eb-c1b4-642c-a4c311f37ef2@redhat.com>
    <20161109175517.174e7803@t450s.home>
    <20161110020130.GA19108@arm.com>
    <20161110104601.0939ba9a@t450s.home>
    <20161111111944.GO2078@8bytes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 11 Nov 2016 12:19:44 +0100
Joerg Roedel wrote:

> On Thu, Nov 10, 2016 at 10:46:01AM -0700, Alex Williamson wrote:
> > In the case of x86, we know that DMA mappings overlapping the MSI
> > doorbells won't be translated correctly, it's not a valid mapping for
> > that range, and therefore the iommu driver backing the IOMMU API
> > should describe that reserved range and reject mappings to it.
>
> The drivers actually allow mappings to the MSI region via the IOMMU-API,
> and I think it should stay this way also for other reserved ranges.
> Address space management is done by the IOMMU-API user already (and has
> to be done there nowadays), be it a DMA-API implementation which just
> reserves these regions in its address space allocator or be it VFIO with
> QEMU, which don't map RAM there anyway. So there is no point of checking
> this again in the IOMMU drivers and we can keep that out of the
> mapping/unmapping fast-path.

It's really just happenstance that we don't map RAM over the x86 MSI
range though.  That property can't be guaranteed once we mix
architectures, such as running an aarch64 VM on an x86 host via TCG.
AIUI, the MSI range is actually handled differently than other DMA
ranges, so an iommu_map() overlapping a range that the iommu cannot map
should fail, just like an attempt to map beyond the address width of the
iommu.

> > For PCI devices userspace can examine the topology of the iommu group
> > and exclude MMIO ranges of peer devices based on the BARs, which are
> > exposed in various places, pci-sysfs as well as /proc/iomem.  For
> > non-PCI or MSI controllers... ???
>
> Right, the hardware resources can be examined. But maybe this can be
> extended to also cover RMRR ranges? Then we would be able to assign
> devices with RMRR mappings to guests.

RMRRs are special in a different way: the VT-d spec requires that the
OS honor RMRRs, but the user has no responsibility (and currently no
visibility) to make that same arrangement.  In order to potentially
protect the physical host platform, the iommu drivers should prevent a
user from remapping RMRRs.  Maybe there needs to be a different
interface used by untrusted users vs in-kernel drivers, but I think the
kernel really needs to be defensive in the case of user mappings, which
is where the IOMMU API is rooted.  Thanks,

Alex
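
To make the "reject rather than silently accept" argument above concrete, here is a rough user-space sketch of the kind of check being discussed. It is not kernel code and not the existing IOMMU API: struct reserved_range, range_is_mappable() and the 48-bit ADDR_WIDTH_BITS are all hypothetical names and values chosen for illustration. The only real-world detail assumed is the x86 MSI doorbell window at 0xfee00000-0xfeefffff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one IOVA range the IOMMU cannot translate. */
struct reserved_range {
	uint64_t start;	/* first reserved address */
	uint64_t end;	/* last reserved address, inclusive */
};

/* Assumed for illustration: only the x86 MSI doorbell window is reserved. */
static const struct reserved_range reserved[] = {
	{ 0xfee00000ULL, 0xfeefffffULL },
};

/* Assumed IOMMU address width; must be less than 64 for the shift below. */
#define ADDR_WIDTH_BITS 48

/*
 * Return true if [iova, iova + size) could be mapped, false if the
 * request is empty, wraps, exceeds the address width, or overlaps a
 * reserved range.
 */
static bool range_is_mappable(uint64_t iova, uint64_t size)
{
	uint64_t last;
	size_t i;

	if (size == 0)
		return false;

	last = iova + size - 1;
	if (last < iova)			/* wrapped past 2^64 */
		return false;

	if ((last >> ADDR_WIDTH_BITS) != 0)	/* beyond address width */
		return false;

	for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++) {
		/* Two inclusive ranges overlap iff each starts before the other ends. */
		if (iova <= reserved[i].end && last >= reserved[i].start)
			return false;
	}

	return true;
}
```

With these assumptions, range_is_mappable(0xfedff000, 0x1000) succeeds because the page ends just below the doorbell window, while range_is_mappable(0xfedff000, 0x2000) fails because its second page lands inside it. The point of contention in the thread is where such a check should live: in the IOMMU-API user, as Joerg suggests, or in the iommu driver itself, at some cost to the map/unmap fast path.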