Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753749AbcLHPrg (ORCPT ); Thu, 8 Dec 2016 10:47:36 -0500 Received: from foss.arm.com ([217.140.101.70]:35774 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753004AbcLHPr3 (ORCPT ); Thu, 8 Dec 2016 10:47:29 -0500 Subject: Re: [RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions To: Auger Eric , eric.auger.pro@gmail.com, christoffer.dall@linaro.org, marc.zyngier@arm.com, alex.williamson@redhat.com, will.deacon@arm.com, joro@8bytes.org, tglx@linutronix.de, jason@lakedaemon.net, linux-arm-kernel@lists.infradead.org References: <1479215363-2898-1-git-send-email-eric.auger@redhat.com> <48e7d75a-8499-7f41-42d0-08fe081c192b@redhat.com> Cc: kvm@vger.kernel.org, drjones@redhat.com, linux-kernel@vger.kernel.org, pranav.sawargaonkar@gmail.com, iommu@lists.linux-foundation.org, punit.agrawal@arm.com, diana.craciun@nxp.com, Don Dutile From: Robin Murphy Message-ID: Date: Thu, 8 Dec 2016 15:46:33 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <48e7d75a-8499-7f41-42d0-08fe081c192b@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 7239 Lines: 167 On 08/12/16 13:36, Auger Eric wrote: > Hi Robin, > > On 08/12/2016 14:14, Robin Murphy wrote: >> On 08/12/16 09:36, Auger Eric wrote: >>> Hi, >>> >>> On 15/11/2016 14:09, Eric Auger wrote: >>>> Following LPC discussions, we now report reserved regions through >>>> iommu-group sysfs reserved_regions attribute file. >>> >>> >>> While I am respinning this series into v4, here is a tentative summary >>> of technical topics for which no consensus was reached at this point. >>> >>> 1) Shall we report the usable IOVA range instead of reserved IOVA >>> ranges. Not discussed at/after LPC. >>> x I currently report reserved regions. Alex expressed the need to >>> report the full usable IOVA range instead (x86 min-max range >>> minus MSI APIC window). I think this is meaningful for ARM >>> too where arm-smmu might not support the full 64b range. >>> x Any objection we report the usable IOVA regions instead? >> >> The issue with that is that we can't actually report "the usable >> regions" at the moment, as that involves pulling together disjoint >> properties of arbitrary hardware unrelated to the IOMMU. We'd be >> reporting "the not-definitely-unusable regions, which may have some >> unusable holes in them still". That seems like an ABI nightmare - I'd >> still much rather say "here are some, but not necessarily all, regions >> you definitely can't use", because saying "here are some regions which >> you might be able to use most of, probably" is what we're already doing >> today, via a single implicit region from 0 to ULONG_MAX ;) >> >> The address space limits are definitely useful to know, but I think it >> would be better to expose them separately to avoid the ambiguity. At >> worst, I guess it would be reasonable to express the limits via an >> "out-of-range" reserved region type for 0 to $base and $top to >> ULONG-MAX. To *safely* expose usable regions, we'd have to start out >> with a very conservative assumption (e.g. only IOVAs matching physical >> RAM), and only expand them once we're sure we can detect every possible >> bit of problematic hardware in the system - that's just too limiting to >> be useful. And if we expose something knowingly inaccurate, we risk >> having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and >> nobody wants that... > Makes sense to me. "out-of-range reserved region type for 0 to $base and > $top to ULONG-MAX" can be an alternative to fulfill the requirement. >> >>> 2) Shall the kernel check collision with MSI window* when userspace >>> calls VFIO_IOMMU_MAP_DMA? >>> Joerg/Will No; Alex yes >>> *for IOVA regions consumed downstream to the IOMMU: everyone says NO >> >> If we're starting off by having the SMMU drivers expose it as a fake >> fixed region, I don't think we need to worry about this yet. We all seem >> to agree that as long as we communicate the fixed regions to userspace, >> it's then userspace's job to work around them. Let's come back to this >> one once we actually get to the point of dynamically sizing and >> allocating 'real' MSI remapping region(s). >> >> Ultimately, the kernel *will* police collisions either way, because an >> underlying iommu_map() is going to fail if overlapping IOVAs are ever >> actually used, so it's really just a question of whether to have a more >> user-friendly failure mode. > That's true on ARM but not on x86 where the APIC MSI region is not > mapped I think. Yes, but the APIC interrupt region is fixed, i.e. it falls under "IOVA regions consumed downstream to the IOMMU" since the DMAR units are physically incapable of remapping those addresses. I take "MSI window" to mean specifically the thing we have to set aside and get a magic remapping cookie for, which is also why it warrants its own special internal type - we definitely *don't* want VFIO trying to set up a remapping cookie on x86. We just don't let userspace know about the difference, yet (if ever). Robin. >>> 3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no >>> My current series does not expose them in iommu group sysfs. >>> I understand we can expose the RMRR regions in the iomm group sysfs >>> without necessarily supporting RMRR requiring device assignment. >>> We can also add this support later. >> >> As you say, reporting them doesn't necessitate allowing device >> assignment, and it's information which can already be easily grovelled >> out of dmesg (for intel-iommu at least) - there doesn't seem to be any >> need to hide them, but the x86 folks can have the final word on that. > agreed > > Thanks > > Eric >> >> Robin. >> >>> Thanks >>> >>> Eric >>> >>> >>>> >>>> Reserved regions are populated through the IOMMU get_resv_region callback >>>> (former get_dm_regions), now implemented by amd-iommu, intel-iommu and >>>> arm-smmu. >>>> >>>> The intel-iommu reports the [FEE0_0000h - FEF0_000h] MSI window as an >>>> IOMMU_RESV_NOMAP reserved region. >>>> >>>> arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and >>>> 1MB large) and the PCI host bridge windows. >>>> >>>> The series integrates a not officially posted patch from Robin: >>>> "iommu/dma: Allow MSI-only cookies". >>>> >>>> This series currently does not address IRQ safety assessment. >>>> >>>> Best Regards >>>> >>>> Eric >>>> >>>> Git: complete series available at >>>> https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3 >>>> >>>> History: >>>> RFC v2 -> v3: >>>> - switch to an iommu-group sysfs API >>>> - use new dummy allocator provided by Robin >>>> - dummy allocator initialized by vfio-iommu-type1 after enumerating >>>> the reserved regions >>>> - at the moment ARM MSI base address/size is left unchanged compared >>>> to v2 >>>> - we currently report reserved regions and not usable IOVA regions as >>>> requested by Alex >>>> >>>> RFC v1 -> v2: >>>> - fix intel_add_reserved_regions >>>> - add mutex lock/unlock in vfio_iommu_type1 >>>> >>>> >>>> Eric Auger (10): >>>> iommu/dma: Allow MSI-only cookies >>>> iommu: Rename iommu_dm_regions into iommu_resv_regions >>>> iommu: Add new reserved IOMMU attributes >>>> iommu: iommu_alloc_resv_region >>>> iommu: Do not map reserved regions >>>> iommu: iommu_get_group_resv_regions >>>> iommu: Implement reserved_regions iommu-group sysfs file >>>> iommu/vt-d: Implement reserved region get/put callbacks >>>> iommu/arm-smmu: Implement reserved region get/put callbacks >>>> vfio/type1: Get MSI cookie >>>> >>>> drivers/iommu/amd_iommu.c | 20 +++--- >>>> drivers/iommu/arm-smmu.c | 52 +++++++++++++++ >>>> drivers/iommu/dma-iommu.c | 116 ++++++++++++++++++++++++++------- >>>> drivers/iommu/intel-iommu.c | 50 ++++++++++---- >>>> drivers/iommu/iommu.c | 141 ++++++++++++++++++++++++++++++++++++---- >>>> drivers/vfio/vfio_iommu_type1.c | 26 ++++++++ >>>> include/linux/dma-iommu.h | 7 ++ >>>> include/linux/iommu.h | 49 ++++++++++---- >>>> 8 files changed, 391 insertions(+), 70 deletions(-) >>>> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe kvm" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>