2014-06-13 16:31:19

by Alex Williamson

Subject: [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

The user of the IOMMU API domain expects to have full control of
the IOVA space for the domain. RMRRs are fundamentally incompatible
with that idea. We can neither map the RMRR into the IOMMU API
domain, nor can we guarantee that the device won't continue DMA with
the area described by the RMRR as part of the new domain. Therefore
we must prevent such devices from being used by the IOMMU API.

Signed-off-by: Alex Williamson <[email protected]>
Cc: [email protected]
---

v2: consolidate test to a single, well documented function.

drivers/iommu/intel-iommu.c | 49 ++++++++++++++++++++++++++++++++++---------
1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c4f11c0..253d598 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2511,22 +2511,46 @@ static bool device_has_rmrr(struct device *dev)
 	return false;
 }

+/*
+ * There are a couple cases where we need to restrict the functionality of
+ * devices associated with RMRRs.  The first is when evaluating a device for
+ * identity mapping because problems exist when devices are moved in and out
+ * of domains and their respective RMRR information is lost.  This means that
+ * a device with associated RMRRs will never be in a "passthrough" domain.
+ * The second is use of the device through the IOMMU API.  This interface
+ * expects to have full control of the IOVA space for the device.  We cannot
+ * satisfy both the requirement that RMRR access is maintained and have an
+ * unencumbered IOVA space.  We also have no ability to quiesce the device's
+ * use of the RMRR space or even inform the IOMMU API user of the restriction.
+ * We therefore prevent devices associated with an RMRR from participating in
+ * the IOMMU API, which eliminates them from device assignment.
+ *
+ * In both cases we assume that PCI USB devices with RMRRs have them largely
+ * for historical reasons and that the RMRR space is not actively used post
+ * boot.  This exclusion may change if vendors begin to abuse it.
+ */
+static bool device_is_rmrr_locked(struct device *dev)
+{
+	if (!device_has_rmrr(dev))
+		return false;
+
+	if (dev_is_pci(dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+
+		if ((pdev->class >> 8) == PCI_CLASS_SERIAL_USB)
+			return false;
+	}
+
+	return true;
+}
+
 static int iommu_should_identity_map(struct device *dev, int startup)
 {

 	if (dev_is_pci(dev)) {
 		struct pci_dev *pdev = to_pci_dev(dev);

-		/*
-		 * We want to prevent any device associated with an RMRR from
-		 * getting placed into the SI Domain. This is done because
-		 * problems exist when devices are moved in and out of domains
-		 * and their respective RMRR info is lost. We exempt USB devices
-		 * from this process due to their usage of RMRRs that are known
-		 * to not be needed after BIOS hand-off to OS.
-		 */
-		if (device_has_rmrr(dev) &&
-		    (pdev->class >> 8) != PCI_CLASS_SERIAL_USB)
+		if (device_is_rmrr_locked(dev))
 			return 0;

 		if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev))
@@ -4171,6 +4195,11 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
 	int addr_width;
 	u8 bus, devfn;

+	if (device_is_rmrr_locked(dev)) {
+		dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.\n");
+		return -EPERM;
+	}
+
 	/* normally dev is not mapped */
 	if (unlikely(domain_context_mapped(dev))) {
 		struct dmar_domain *old_domain;
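
The decision made by device_is_rmrr_locked() can be tried outside the kernel with a small userspace model. This is only a sketch: struct toy_dev and toy_is_rmrr_locked() are hypothetical stand-ins for struct device / struct pci_dev and the function in the patch, and the has_rmrr flag replaces the real device_has_rmrr() lookup. The class constant has the same value the kernel uses for PCI_CLASS_SERIAL_USB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as PCI_CLASS_SERIAL_USB in the kernel's pci_ids.h. */
#define PCI_CLASS_SERIAL_USB 0x0c03

/* Hypothetical stand-in for struct device / struct pci_dev. */
struct toy_dev {
	bool is_pci;	/* models dev_is_pci() */
	bool has_rmrr;	/* models device_has_rmrr() */
	uint32_t class;	/* 24 bits: base class, sub-class, prog-if */
};

/* Mirrors the patch: RMRR devices are locked, except PCI USB controllers. */
static bool toy_is_rmrr_locked(const struct toy_dev *dev)
{
	assert(dev);

	if (!dev->has_rmrr)
		return false;

	/* Shifting out the prog-if byte leaves base class + sub-class. */
	if (dev->is_pci && (dev->class >> 8) == PCI_CLASS_SERIAL_USB)
		return false;

	return true;
}
```

With this model, a USB controller (class 0x0c0330) that has an RMRR is not locked, while a VGA device (class 0x030000) with an RMRR is, which is the case the rest of the thread discusses.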


2014-06-17 05:35:31

by Alex Williamson

Subject: Re: [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Fri, 2014-06-13 at 10:30 -0600, Alex Williamson wrote:
> The user of the IOMMU API domain expects to have full control of
> the IOVA space for the domain. RMRRs are fundamentally incompatible
> with that idea. We can neither map the RMRR into the IOMMU API
> domain, nor can we guarantee that the device won't continue DMA with
> the area described by the RMRR as part of the new domain. Therefore
> we must prevent such devices from being used by the IOMMU API.
>
> Signed-off-by: Alex Williamson <[email protected]>
> Cc: [email protected]
> ---

David,

Any idea what an off-the-shelf Asus motherboard would be doing with an
RMRR on the Intel HD graphics?

dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]

Thanks,

Alex
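
For scale, the range in the two log lines above covers 0x4200000 bytes, i.e. 66 MiB. A minimal sketch of that arithmetic, assuming the exact "dmar: RMRR base: ... end: ..." format shown in the log; rmrr_size_from_log() is a hypothetical helper written for this illustration, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a "dmar: RMRR base: 0x... end: 0x..." log line and return the
 * size of the region in bytes.  The end address is inclusive, hence
 * the +1.  Returns 0 if the line does not match or the range is bogus.
 */
static uint64_t rmrr_size_from_log(const char *line)
{
	unsigned long long base, end;

	assert(line);

	/* %llx accepts an optional "0x" prefix. */
	if (sscanf(line, "dmar: RMRR base: %llx end: %llx", &base, &end) != 2)
		return 0;
	if (end < base)
		return 0;

	return (uint64_t)(end - base + 1);
}
```

Shifting the result right by 20 gives the size in MiB.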

> v2: consolidate test to a single, well documented function.
>
> drivers/iommu/intel-iommu.c | 49 ++++++++++++++++++++++++++++++++++---------
> 1 file changed, 39 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c4f11c0..253d598 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2511,22 +2511,46 @@ static bool device_has_rmrr(struct device *dev)
>  	return false;
>  }
>
> +/*
> + * There are a couple cases where we need to restrict the functionality of
> + * devices associated with RMRRs.  The first is when evaluating a device for
> + * identity mapping because problems exist when devices are moved in and out
> + * of domains and their respective RMRR information is lost.  This means that
> + * a device with associated RMRRs will never be in a "passthrough" domain.
> + * The second is use of the device through the IOMMU API.  This interface
> + * expects to have full control of the IOVA space for the device.  We cannot
> + * satisfy both the requirement that RMRR access is maintained and have an
> + * unencumbered IOVA space.  We also have no ability to quiesce the device's
> + * use of the RMRR space or even inform the IOMMU API user of the restriction.
> + * We therefore prevent devices associated with an RMRR from participating in
> + * the IOMMU API, which eliminates them from device assignment.
> + *
> + * In both cases we assume that PCI USB devices with RMRRs have them largely
> + * for historical reasons and that the RMRR space is not actively used post
> + * boot.  This exclusion may change if vendors begin to abuse it.
> + */
> +static bool device_is_rmrr_locked(struct device *dev)
> +{
> +	if (!device_has_rmrr(dev))
> +		return false;
> +
> +	if (dev_is_pci(dev)) {
> +		struct pci_dev *pdev = to_pci_dev(dev);
> +
> +		if ((pdev->class >> 8) == PCI_CLASS_SERIAL_USB)
> +			return false;
> +	}
> +
> +	return true;
> +}
> +
>  static int iommu_should_identity_map(struct device *dev, int startup)
>  {
>
>  	if (dev_is_pci(dev)) {
>  		struct pci_dev *pdev = to_pci_dev(dev);
>
> -		/*
> -		 * We want to prevent any device associated with an RMRR from
> -		 * getting placed into the SI Domain. This is done because
> -		 * problems exist when devices are moved in and out of domains
> -		 * and their respective RMRR info is lost. We exempt USB devices
> -		 * from this process due to their usage of RMRRs that are known
> -		 * to not be needed after BIOS hand-off to OS.
> -		 */
> -		if (device_has_rmrr(dev) &&
> -		    (pdev->class >> 8) != PCI_CLASS_SERIAL_USB)
> +		if (device_is_rmrr_locked(dev))
>  			return 0;
>
>  		if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev))
> @@ -4171,6 +4195,11 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
>  	int addr_width;
>  	u8 bus, devfn;
>
> +	if (device_is_rmrr_locked(dev)) {
> +		dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.\n");
> +		return -EPERM;
> +	}
> +
>  	/* normally dev is not mapped */
>  	if (unlikely(domain_context_mapped(dev))) {
>  		struct dmar_domain *old_domain;
>


2014-06-17 07:05:00

by David Woodhouse

Subject: Re: [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
>
> Any idea what an off-the-shelf Asus motherboard would be doing with an
> RMRR on the Intel HD graphics?
>
> dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
> IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]

Hm, we should have thought of that sooner. That's quite normal — it's
for the 'stolen' memory used for the framebuffer. And maybe also the
GTT, and shadow GTT and other things; I forget precisely what, and it
varies from one setup to another.

I'd expect fairly much all systems to have an RMRR for the integrated
graphics device if they have one, and your patch¹ is going to prevent
assignment of those to guests... as you've presumably noticed.

I'm not sure if the i915 driver is capable of fully reprogramming the
hardware to completely stop using that region, to allow assignment to a
guest with a 'pure' memory map and no stolen region. I suppose it must,
if assignment to guests was working correctly before?

Perhaps the better answer here is not to have the special cases in
'device_is_rmrr_locked()', and instead allow a device driver to call a
'iommu_release_rmrrs()' function once it's reset the hardware to *stop*
doing whatever DMA the BIOS set it up with.
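
To make the suggestion concrete, here is a toy userspace model of how such a release hook could gate attachment. It is purely illustrative: iommu_release_rmrrs() does not exist in the kernel at this point in the thread, and struct rel_dev, toy_release_rmrrs() and toy_attach_device() are hypothetical names invented for this sketch. The only thing taken from the patch is the -EPERM return for a device that is still tied to its RMRR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; not a kernel structure. */
struct rel_dev {
	bool has_rmrr;		/* platform lists an RMRR for this device */
	bool rmrr_released;	/* driver has stopped the BIOS-era DMA */
};

/*
 * Models the proposed hook: a driver would call this only after it has
 * reset the hardware so that it no longer touches the RMRR region.
 */
static void toy_release_rmrrs(struct rel_dev *dev)
{
	assert(dev);
	dev->rmrr_released = true;
}

/* Models the attach path: refuse devices still bound to an RMRR. */
static int toy_attach_device(const struct rel_dev *dev)
{
	assert(dev);

	if (dev->has_rmrr && !dev->rmrr_released)
		return -EPERM;

	return 0;
}
```

In this model the special cases move out of the IOMMU code: instead of a class check, each driver that knows how to quiesce its hardware opts in by calling the release hook.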


--
David Woodhouse Open Source Technology Centre
[email protected] Intel Corporation

¹ Alex's patch prevents assignment to VM guests of *any* device which has
RMRRs (reserved regions requiring an IOMMU 1:1 mapping) in its DMAR
table.



2014-06-17 07:15:30

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, Jun 17, 2014 at 9:04 AM, David Woodhouse <[email protected]> wrote:
> On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
>>
>> Any idea what an off-the-shelf Asus motherboard would be doing with an
>> RMRR on the Intel HD graphics?
>>
>> dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
>> IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
>
> Hm, we should have thought of that sooner. That's quite normal — it's
> for the 'stolen' memory used for the framebuffer. And maybe also the
> GTT, and shadow GTT and other things; I forget precisely what, and it
> varies from one setup to another.
>
> I'd expect fairly much all systems to have an RMRR for the integrated
> graphics device if they have one, and your patch¹ is going to prevent
> assignment of those to guests... as you've presumably noticed.
>
> I'm not sure if the i915 driver is capable of fully reprogramming the
> hardware to completely stop using that region, to allow assignment to a
> guest with a 'pure' memory map and no stolen region. I suppose it must,
> if assignment to guests was working correctly before?
>
> Perhaps the better answer here is not to have the special cases in
> 'device_is_rmrr_locked()', and instead allow a device driver to call a
> 'iommu_release_rmrrs()' function once it's reset the hardware to *stop*
> doing whatever DMA the BIOS set it up with.

We've always been struggling with stolen handling, and we've always
been struggling with VT-d stuff. Also pass-through seems to be a major
pain (I've never tried it myself). Given all that, I'm voting for keeping
the RMRR and everything else as close to the normal case as possible,
since I have no idea what exactly must be remapped and what's optional.
The GPU is definitely keeping a lot of its own private stuff in various
chunks of stolen memory.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-06-17 07:21:41

by David Woodhouse

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 09:15 +0200, Daniel Vetter wrote:
> We've always been struggling with stolen handling, and we've always
> been struggling with VT-d stuff. Also pass-through seems to be a major
> pain (I've never tried it myself). Given all that, I'm voting for keeping
> the RMRR and everything else as close to the normal case as possible,
> since I have no idea what exactly must be remapped and what's optional.
> The GPU is definitely keeping a lot of its own private stuff in various
> chunks of stolen memory.

Keeping it like the normal case is distinctly non-trivial. I raised that
possibility, and it's hard. You have to make the guests' address maps
match the host, in that the E820-reserved regions used for DMA and
listed in RMRRs must also appear as reserved for the guests.

That was bad enough when it was just 'BIOS might be doing something evil
behind our back' and we didn't need to let the guest *access* those
pages. But in the i915 case we do actually map and access the stolen
region too, so the task is even harder. We'd need to be able to decide
when those regions should actually be mapped into the guest.
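
The constraint described above, that every RMRR range must show up as reserved in the guest's address map, can be expressed as a simple check. This is a sketch under stated assumptions: struct toy_e820_entry and rmrr_is_reserved() are hypothetical names, the map is assumed to be sorted by address with non-overlapping entries, and addr + size is assumed not to overflow. The type values 1 (RAM) and 2 (reserved) match the standard E820 encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_e820_type { TOY_E820_RAM = 1, TOY_E820_RESERVED = 2 };

/* Hypothetical, simplified view of one guest E820 entry. */
struct toy_e820_entry {
	uint64_t addr;
	uint64_t size;
	enum toy_e820_type type;
};

/*
 * Return true if the inclusive range [base, end] is completely covered
 * by reserved entries.  Assumes map[] is sorted by address and that
 * entries do not overlap.
 */
static bool rmrr_is_reserved(const struct toy_e820_entry *map, size_t n,
			     uint64_t base, uint64_t end)
{
	uint64_t next = base;	/* first byte not yet known to be reserved */

	assert(end >= base);

	for (size_t i = 0; i < n; i++) {
		uint64_t lo = map[i].addr;
		uint64_t hi = lo + map[i].size;	/* exclusive upper bound */

		if (map[i].type != TOY_E820_RESERVED)
			continue;
		if (hi <= next)
			continue;	/* entirely below the part we still need */
		if (lo > next)
			return false;	/* gap before this reserved entry */
		if (hi > end)
			return true;	/* covered through the last byte */
		next = hi;
	}

	return false;
}
```

A guest map that lists the range as RAM, or reserves only part of it, fails the check. Note that this covers only the layout half of the problem; deciding when the pages should actually be mapped into the guest is a separate question.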

--
dwmw2



2014-06-17 08:14:25

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, Jun 17, 2014 at 08:21:31AM +0100, David Woodhouse wrote:
> On Tue, 2014-06-17 at 09:15 +0200, Daniel Vetter wrote:
> > We've always been struggling with stolen handling, and we've always
> > been struggling with VT-d stuff. Also pass-through seems to be a major
> > pain (I've never tried it myself). Given all that, I'm voting for keeping
> > the RMRR and everything else as close to the normal case as possible,
> > since I have no idea what exactly must be remapped and what's optional.
> > The GPU is definitely keeping a lot of its own private stuff in various
> > chunks of stolen memory.
>
> Keeping it like the normal case is distinctly non-trivial. I raised that
> possibility, and it's hard. You have to make the guests' address maps
> match the host, in that the E820-reserved regions used for DMA and
> listed in RMRRs must also appear as reserved for the guests.
>
> That was bad enough when it was just 'BIOS might be doing something evil
> behind our back' and we didn't need to let the guest *access* those
> pages. But in the i915 case we do actually map and access the stolen
> region too, so the task is even harder. We'd need to be able to decide
> when those regions should actually be mapped into the guest.

Hm, we check some registers (which IIRC are set up by the BIOS) to detect
the address and size of stolen memory. And if it's there, we use it. So not
sure what to do really.

For my understanding: the tricky part with RMRR isn't the mapping, but
making sure that the guest memory layout has the corresponding range
properly marked as reserved in the e820 map (i.e. like on real machines)?
I guess we wouldn't need to care about the actual memory since the host
Linux can't access it either (without i915.ko).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-06-17 12:23:06

by Alex Williamson

Subject: Re: [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 08:04 +0100, David Woodhouse wrote:
> On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
> >
> > Any idea what an off-the-shelf Asus motherboard would be doing with an
> > RMRR on the Intel HD graphics?
> >
> > dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
> > IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
>
> Hm, we should have thought of that sooner. That's quite normal — it's
> for the 'stolen' memory used for the framebuffer. And maybe also the
> GTT, and shadow GTT and other things; I forget precisely what, and it
> varies from one setup to another.

Why exactly do these things need to be identity mapped through the
IOMMU? This sounds like something a normal device might do with a
coherent mapping.

> I'd expect fairly much all systems to have an RMRR for the integrated
> graphics device if they have one, and your patch¹ is going to prevent
> assignment of those to guests... as you've presumably noticed.
>
> I'm not sure if the i915 driver is capable of fully reprogramming the
> hardware to completely stop using that region, to allow assignment to a
> guest with a 'pure' memory map and no stolen region. I suppose it must,
> if assignment to guests was working correctly before?

IGD assignment has never worked with KVM.

> Perhaps the better answer here is not to have the special cases in
> 'device_is_rmrr_locked()', and instead allow a device driver to call a
> 'iommu_release_rmrrs()' function once it's reset the hardware to *stop*
> doing whatever DMA the BIOS set it up with.

IGD supports FLR, which is good, but I would assume an FLR doesn't
necessarily release use of this region and being a root complex device I
don't think we have a bigger hammer reset option. Thanks,

Alex

2014-06-17 12:41:18

by David Woodhouse

Subject: Re: [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 06:22 -0600, Alex Williamson wrote:
> On Tue, 2014-06-17 at 08:04 +0100, David Woodhouse wrote:
> > On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
> > >
> > > Any idea what an off-the-shelf Asus motherboard would be doing with an
> > > RMRR on the Intel HD graphics?
> > >
> > > dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
> > > IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
> >
> > Hm, we should have thought of that sooner. That's quite normal — it's
> > for the 'stolen' memory used for the framebuffer. And maybe also the
> > GTT, and shadow GTT and other things; I forget precisely what, and it
> > varies from one setup to another.
>
> Why exactly do these things need to be identity mapped through the
> IOMMU? This sounds like something a normal device might do with a
> coherent mapping.

The BIOS (EFI or VESA) sets up a framebuffer in stolen main memory. It's
accessed by DMA, using the physical address. The RMRR exists because we
need it *not* to suddenly stop working the moment the OS turns on the
IOMMU.

The OS graphics driver, if any, is not loaded at this point.

And even later, the OS graphics driver may choose to make use of the
'stolen' memory for various purposes. And since it was already stolen,
it doesn't go and set up *another* mapping for it; it knows that a
mapping already exists.

> > I'd expect fairly much all systems to have an RMRR for the integrated
> > graphics device if they have one, and your patch¹ is going to prevent
> > assignment of those to guests... as you've presumably noticed.
> >
> > I'm not sure if the i915 driver is capable of fully reprogramming the
> > hardware to completely stop using that region, to allow assignment to a
> > guest with a 'pure' memory map and no stolen region. I suppose it must,
> > if assignment to guests was working correctly before?
>
> IGD assignment has never worked with KVM.

Hm. It works with Xen though, doesn't it?

Are we content to say that it'll *never* work with KVM, and thus we can
live with the fact that your patch makes it harder to fix whatever was
wrong in the first place?

--
dwmw2



2014-06-17 13:16:58

by Alex Williamson

Subject: Re: [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 13:41 +0100, David Woodhouse wrote:
> On Tue, 2014-06-17 at 06:22 -0600, Alex Williamson wrote:
> > On Tue, 2014-06-17 at 08:04 +0100, David Woodhouse wrote:
> > > On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
> > > >
> > > > Any idea what an off-the-shelf Asus motherboard would be doing with an
> > > > RMRR on the Intel HD graphics?
> > > >
> > > > dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
> > > > IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
> > >
> > > Hm, we should have thought of that sooner. That's quite normal — it's
> > > for the 'stolen' memory used for the framebuffer. And maybe also the
> > > GTT, and shadow GTT and other things; I forget precisely what, and it
> > > varies from one setup to another.
> >
> > Why exactly do these things need to be identity mapped through the
> > IOMMU? This sounds like something a normal device might do with a
> > coherent mapping.
>
> The BIOS (EFI or VESA) sets up a framebuffer in stolen main memory. It's
> accessed by DMA, using the physical address. The RMRR exists because we
> need it *not* to suddenly stop working the moment the OS turns on the
> IOMMU.
>
> The OS graphics driver, if any, is not loaded at this point.
>
> And even later, the OS graphics driver may choose to make use of the
> 'stolen' memory for various purposes. And since it was already stolen,
> it doesn't go and set up *another* mapping for it; it knows that a
> mapping already exists.
>
> > > I'd expect fairly much all systems to have an RMRR for the integrated
> > > graphics device if they have one, and your patch¹ is going to prevent
> > > assignment of those to guests... as you've presumably noticed.
> > >
> > > I'm not sure if the i915 driver is capable of fully reprogramming the
> > > hardware to completely stop using that region, to allow assignment to a
> > > guest with a 'pure' memory map and no stolen region. I suppose it must,
> > > if assignment to guests was working correctly before?
> >
> > IGD assignment has never worked with KVM.
>
> Hm. It works with Xen though, doesn't it?

Apparently

> Are we content to say that it'll *never* work with KVM, and thus we can
> live with the fact that your patch makes it harder to fix whatever was
> wrong in the first place?

Probably not. However, it seems like you're saying that this RMRR is
used by and visible to OS level drivers, versus backchannel
communication channels, invisible to the OS. I think the latter is
specifically what we want to prevent by excluding devices with RMRRs.
This is a challenging use case, but it seems to be understood. If when
IGD is bound to vfio-pci we can be sure that access to the RMRR area
ceases, then we can tear it down and re-establish it from
userspace/QEMU, describe it to the guest in an e820 reserved region, and
never consider hotplug of the device for guests. If that's the case,
maybe it's another exception, like USB. I'll need to look through i915
more to find how the region is discovered. Thanks,

Alex

2014-06-17 13:44:17

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, Jun 17, 2014 at 07:16:22AM -0600, Alex Williamson wrote:
> On Tue, 2014-06-17 at 13:41 +0100, David Woodhouse wrote:
> > On Tue, 2014-06-17 at 06:22 -0600, Alex Williamson wrote:
> > > On Tue, 2014-06-17 at 08:04 +0100, David Woodhouse wrote:
> > > > On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
> > > > >
> > > > > Any idea what an off-the-shelf Asus motherboard would be doing with an
> > > > > RMRR on the Intel HD graphics?
> > > > >
> > > > > dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
> > > > > IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
> > > >
> > > > Hm, we should have thought of that sooner. That's quite normal — it's
> > > > for the 'stolen' memory used for the framebuffer. And maybe also the
> > > > GTT, and shadow GTT and other things; I forget precisely what, and it
> > > > varies from one setup to another.
> > >
> > > Why exactly do these things need to be identity mapped through the
> > > IOMMU? This sounds like something a normal device might do with a
> > > coherent mapping.
> >
> > The BIOS (EFI or VESA) sets up a framebuffer in stolen main memory. It's
> > accessed by DMA, using the physical address. The RMRR exists because we
> > need it *not* to suddenly stop working the moment the OS turns on the
> > IOMMU.
> >
> > The OS graphics driver, if any, is not loaded at this point.
> >
> > And even later, the OS graphics driver may choose to make use of the
> > 'stolen' memory for various purposes. And since it was already stolen,
> > it doesn't go and set up *another* mapping for it; it knows that a
> > mapping already exists.
> >
> > > > I'd expect fairly much all systems to have an RMRR for the integrated
> > > > graphics device if they have one, and your patch¹ is going to prevent
> > > > assignment of those to guests... as you've presumably noticed.
> > > >
> > > > I'm not sure if the i915 driver is capable of fully reprogramming the
> > > > hardware to completely stop using that region, to allow assignment to a
> > > > guest with a 'pure' memory map and no stolen region. I suppose it must,
> > > > if assignment to guests was working correctly before?
> > >
> > > IGD assignment has never worked with KVM.
> >
> > Hm. It works with Xen though, doesn't it?
>
> Apparently
>
> > Are we content to say that it'll *never* work with KVM, and thus we can
> > live with the fact that your patch makes it harder to fix whatever was
> > wrong in the first place?
>
> Probably not. However, it seems like you're saying that this RMRR is
> used by and visible to OS level drivers, versus backchannel
> communication channels, invisible to the OS. I think the latter is
> specifically what we want to prevent by excluding devices with RMRRs.
> This is a challenging use case, but it seems to be understood. If when
> IGD is bound to vfio-pci we can be sure that access to the RMRR area
> ceases, then we can tear it down and re-establish it from
> userspace/QEMU, describe it to the guest in an e820 reserved region, and
> never consider hotplug of the device for guests. If that's the case,
> maybe it's another exception, like USB. I'll need to look through i915
> more to find how the region is discovered. Thanks,

We have a bunch of registers in the MMIO BAR, set up by the BIOS, that tell
us the address and size of the stolen range we can use. We need the address
for programming PTEs, and the size to know how much there is. We also
have an early boot PCI quirk in x86 nowadays to make sure the PCI layer
doesn't put random stuff in that range.

See drivers/gpu/drm/i915/i915_gem_gtt.c (search for stolen size),
i915_gem_stolen.c (look at stolen_to_phys) and the early quirks in
arch/x86/kernel/early-quirks.c for copies of the same code.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-06-17 14:16:24

by Alex Williamson

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 15:44 +0200, Daniel Vetter wrote:
> On Tue, Jun 17, 2014 at 07:16:22AM -0600, Alex Williamson wrote:
> > On Tue, 2014-06-17 at 13:41 +0100, David Woodhouse wrote:
> > > On Tue, 2014-06-17 at 06:22 -0600, Alex Williamson wrote:
> > > > On Tue, 2014-06-17 at 08:04 +0100, David Woodhouse wrote:
> > > > > On Mon, 2014-06-16 at 23:35 -0600, Alex Williamson wrote:
> > > > > >
> > > > > > Any idea what an off-the-shelf Asus motherboard would be doing with an
> > > > > > RMRR on the Intel HD graphics?
> > > > > >
> > > > > > dmar: RMRR base: 0x000000bb800000 end: 0x000000bf9fffff
> > > > > > IOMMU: Setting identity map for device 0000:00:02.0 [0xbb800000 - 0xbf9fffff]
> > > > >
> > > > > Hm, we should have thought of that sooner. That's quite normal — it's
> > > > > for the 'stolen' memory used for the framebuffer. And maybe also the
> > > > > GTT, and shadow GTT and other things; I forget precisely what, and it
> > > > > varies from one setup to another.
> > > >
> > > > Why exactly do these things need to be identity mapped through the
> > > > IOMMU? This sounds like something a normal device might do with a
> > > > coherent mapping.
> > >
> > > The BIOS (EFI or VESA) sets up a framebuffer in stolen main memory. It's
> > > accessed by DMA, using the physical address. The RMRR exists because we
> > > need it *not* to suddenly stop working the moment the OS turns on the
> > > IOMMU.
> > >
> > > The OS graphics driver, if any, is not loaded at this point.
> > >
> > > And even later, the OS graphics driver may choose to make use of the
> > > 'stolen' memory for various purposes. And since it was already stolen,
> > > it doesn't go and set up *another* mapping for it; it knows that a
> > > mapping already exists.
> > >
> > > > > I'd expect fairly much all systems to have an RMRR for the integrated
> > > > > graphics device if they have one, and your patch¹ is going to prevent
> > > > > assignment of those to guests... as you've presumably noticed.
> > > > >
> > > > > I'm not sure if the i915 driver is capable of fully reprogramming the
> > > > > hardware to completely stop using that region, to allow assignment to a
> > > > > guest with a 'pure' memory map and no stolen region. I suppose it must,
> > > > > if assignment to guests was working correctly before?
> > > >
> > > > IGD assignment has never worked with KVM.
> > >
> > > Hm. It works with Xen though, doesn't it?
> >
> > Apparently
> >
> > > Are we content to say that it'll *never* work with KVM, and thus we can
> > > live with the fact that your patch makes it harder to fix whatever was
> > > wrong in the first place?
> >
> > Probably not. However, it seems like you're saying that this RMRR is
> > used by and visible to OS level drivers, versus backchannel
> > communication channels, invisible to the OS. I think the latter is
> > specifically what we want to prevent by excluding devices with RMRRs.
> > This is a challenging use case, but it seems to be understood. If when
> > IGD is bound to vfio-pci we can be sure that access to the RMRR area
> > ceases, then we can tear it down and re-establish it from
> > userspace/QEMU, describe it to the guest in an e820 reserved region, and
> > never consider hotplug of the device for guests. If that's the case,
> > maybe it's another exception, like USB. I'll need to look through i915
> > more to find how the region is discovered. Thanks,
>
> We have a bunch of registers in the MMIO BAR, set up by the BIOS, that tell
> us the address and size of the stolen range we can use. We need the address
> for programming PTEs, and the size to know how much there is. We also
> have an early boot PCI quirk in x86 nowadays to make sure the PCI layer
> doesn't put random stuff in that range.
>
> See drivers/gpu/drm/i915/i915_gem_gtt.c (search for stolen size),
> i915_gem_stolen.c (look at stolen_to_phys) and the early quirks in
> arch/x86/kernel/early-quirks.c for copies of the same code.

Thanks for the tips. If the purpose of the RMRR is to maintain
consistency across the OS enabling VT-d, then there's really no reason
for this to be identity mapped in a guest (where VT-d is not exposed), is
there? Not setting up an identity map may waste the memory that's already
reserved on the platform, but I could back stolen memory with
non-stolen user memory, couldn't I? It might be nice to avoid adding an
identity mapping interface to the IOMMU API, even if it costs some
memory to do so. Or maybe I could expose the RMRR area through the VFIO
device file descriptor, allow it to be mmap'd there, then allow that
mmap to be mapped through the IOMMU. Thanks,

Alex

2014-06-17 16:45:42

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, Jun 17, 2014 at 08:15:47AM -0600, Alex Williamson wrote:
> Thanks for the tips. If the purpose of the RMRR is to maintain
> consistency across the OS enabling VT-d, then there's really no reason
> for this to be identity mapped in a guest (where VT-d is not exposed), is
> there? Not setting up an identity map may waste the memory that's
> already reserved on the platform, but I could back stolen memory by
> non-stolen user memory, couldn't I? It might be nice to avoid adding an
> identity mapping interface to the IOMMU API, even if it costs some
> memory to do so. Or maybe I could expose the RMRR area through the VFIO
> device file descriptor, allow it to be mmap'd there, then allow that
> mmap to be mapped through the IOMMU. Thanks,

The stolen range is locked down at boot in the memory controller and, at
least on some platforms, is not cpu accessible. Also, our gpu is famous for
warts in the tlb and pte lookup hw, so I wouldn't be surprised at all if
the stolen range couldn't be backed by normal memory. Our driver, otoh, will
survive if you set the stolen size to 0 (with slight feature degradation).
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-06-17 17:00:24

by Alex Williamson

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 18:45 +0200, Daniel Vetter wrote:
> The stolen range is locked down at boot in the memory controller and, at
> least on some platforms, is not cpu accessible. Also, our gpu is famous for
> warts in the tlb and pte lookup hw, so I wouldn't be surprised at all if
> the stolen range couldn't be backed by normal memory. Our driver, otoh, will
> survive if you set the stolen size to 0 (with slight feature degradation).

Do you know if the same is true of the Windows driver for stolen size?
We can easily set the guest physical address of stolen memory to match
the physical hardware, which would hopefully keep the GPU happy, but if
it's special at the memory controller level, it sounds like we'd really
need to identity map it. Thanks,

Alex

2014-06-17 17:53:24

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, Jun 17, 2014 at 10:59:51AM -0600, Alex Williamson wrote:
> Do you know if the same is true of the Windows driver for stolen size?
> We can easily set the guest physical address of stolen memory to match
> the physical hardware, which would hopefully keep the GPU happy, but if
> it's special at the memory controller level, it sounds like we'd really
> need to identity map it. Thanks,

No idea what Windows does here, and the path between me and the Windows
team for such inquiries is extremely long :(
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-06-18 21:48:59

by Alex Williamson

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Tue, 2014-06-17 at 15:44 +0200, Daniel Vetter wrote:
> We have a bunch of registers in the mmio bar set up by the bios that tell
> us the address and size of the stolen range we can use. The address we
> need for programming ptes, the size to know how much there is. We also
> have an early boot pci quirk in x86 nowadays to make sure the pci layer
> doesn't put random stuff in that range.
>
> See drivers/gpu/drm/i915/i915_gem_gtt.c (search for stolen size)
> i915_gem_stolen.c (look at stolen_to_phys) and the early quirks in
> arch/x86/kernel/early-quirks.c for copies of the same code.

Ok, here's what I observe on my system for a few settings of iGPU memory
size in the BIOS. The device ID for this IGD is 0152, so I'm using the
gen6_stolen_funcs stolen functions from early quirks for stolen
base/size. I also report the ASL Storage base, i.e. the opregion, since
that also needs to be punched through if this device were to be
assigned.

"1024M"
[ 0.628033] IOMMU: Setting identity map for device 0000:00:02.0 [0xbf800000 - 0xbf9fffff]
[ 0.000000] BIOS-e820: [mem 0x00000000bf800000-0x00000000bf9fffff] reserved

setpci -s 2.0 5c.l
7fa00001
setpci -s 2.0 50.l
00000289

(0x289 >> 3) & 0x1f = 0x11, 17 * 32M = 544M

stolen memory range: 7fa00000-a19fffff

setpci -s 2.0 fc.l
7ebb7018

So for the max iGPU memory option, our RMRR is 2M and it contains
neither the stolen memory nor the opregion (it never contains the
opregion apparently). If the purpose of the RMRR is to maintain access
to the framebuffer in stolen memory across VT-d enabling, how does it
work here? What's in the 2M RMRR and would it need to be mapped to a
guest if we wanted to support IGD assignment?
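
For reference, the arithmetic above can be written out as a small C
sketch. This is an illustration only, not the early-quirks code itself:
the helper names (stolen_size, stolen_range), the struct, and the
assumption that the low 20 bits of the value read at 5ch are flag bits
to be masked off are mine, chosen to match the numbers shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, inferred from the numbers in this mail (not from a spec). */
#define STOLEN_BASE_MASK 0xfff00000u	/* low 20 bits of 5ch treated as flags */
#define GMS_SHIFT        3		/* size field starts at bit 3 of 50h */
#define GMS_MASK         0x1fu		/* 5-bit size field */
#define GMS_UNIT_BYTES   (32u << 20)	/* each step of the field is 32M */

struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Stolen memory size in bytes, decoded from the dword read at 50h. */
static uint64_t stolen_size(uint32_t reg_50)
{
	return (uint64_t)((reg_50 >> GMS_SHIFT) & GMS_MASK) * GMS_UNIT_BYTES;
}

/* Stolen memory range built from the dwords read at 5ch and 50h. */
static struct mem_range stolen_range(uint32_t reg_5c, uint32_t reg_50)
{
	struct mem_range r;
	uint64_t size = stolen_size(reg_50);

	assert(size > 0);	/* a zero size field would mean no stolen memory */
	r.start = reg_5c & STOLEN_BASE_MASK;
	r.end = r.start + size - 1;
	return r;
}
```

With the readings above (5ch = 7fa00001, 50h = 00000289) this gives a
base of 7fa00000 and a size of 544M.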

"512M"
[ 0.627083] IOMMU: Setting identity map for device 0000:00:02.0 [0x9f800000 - 0xbf9fffff]
[ 0.000000] BIOS-e820: [mem 0x000000009f800000-0x00000000bf9fffff] reserved

setpci -s 2.0 5c.l
9fa00001
setpci -s 2.0 50.l
00000281

(0x281 >> 3) & 0x1f = 0x10, 16 * 32M = 512M

stolen memory range: 9fa00000-bf9fffff

setpci -s 2.0 fc.l
9ebb7018

With 512M iGPU memory, we're at least now using the RMRR for stolen
memory, but we still have an additional mystery 2M in the RMRR since
it's actually a 514M range.

"256M"
[ 0.626030] IOMMU: Setting identity map for device 0000:00:02.0 [0xaf800000 - 0xbf9fffff]
[ 0.000000] BIOS-e820: [mem 0x00000000af800000-0x00000000bf9fffff] reserved

setpci -s 2.0 5c.l
afa00001
setpci -s 2.0 50.l
00000241

(0x241 >> 3) & 0x1f = 0x8, 8 * 32M = 256M

stolen memory range: afa00000-bf9fffff

setpci -s 2.0 fc.l
aebb7018

The 256M setting is a repeat of 512M: the RMRR is 258M and 256M of it is
stolen memory.
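
The comparison made for each setting can be sketched in C as below. It
is only a rough illustration: the struct and the helper names
(range_size, range_contains, rmrr_unexplained) are made up for this
mail and do not exist in the kernel. Ranges use inclusive end
addresses, as in the boot messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Number of bytes covered by an inclusive range. */
static uint64_t range_size(struct mem_range r)
{
	assert(r.end >= r.start);
	return r.end - r.start + 1;
}

/* True if 'inner' lies entirely within 'outer'. */
static bool range_contains(struct mem_range outer, struct mem_range inner)
{
	return outer.start <= inner.start && inner.end <= outer.end;
}

/*
 * Bytes of the RMRR that stolen memory does not account for. If the
 * RMRR does not contain the stolen range, the whole RMRR is counted.
 */
static uint64_t rmrr_unexplained(struct mem_range rmrr, struct mem_range stolen)
{
	if (!range_contains(rmrr, stolen))
		return range_size(rmrr);
	return range_size(rmrr) - range_size(stolen);
}
```

For the 512M and 256M settings the RMRR contains the stolen range and
rmrr_unexplained() comes out at 2M; for the 1024M setting the RMRR
does not contain it and the whole 2M RMRR is unexplained.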

So we can say that sometimes the RMRR contains the stolen memory used as
a framebuffer, but that stolen memory is not always mapped with an RMRR
and there's an additional 2M in the RMRR that's still a mystery. If we
wanted to support assignment of IGD, we could map the stolen memory and
the opregion, but what do we do with that extra RMRR space? Ignore it?
Map it? How do we find it from the device? Thanks,

Alex

2014-06-19 01:47:49

by Alex Williamson

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Wed, 2014-06-18 at 15:48 -0600, Alex Williamson wrote:
> So we can say that sometimes the RMRR contains the stolen memory used as
> a framebuffer, but that stolen memory is not always mapped with an RMRR
> and there's an additional 2M in the RMRR that's still a mystery. If we
> wanted to support assignment of IGD, we could map the stolen memory and
> the opregion, but what do we do with that extra RMRR space? Ignore it?
> Map it? How do we find it from the device? Thanks,

Finding some more specs... the MGGC0 register (50h) seems to indicate
the GTT stolen memory size is 2M, which sounds suspiciously like the 2M
that the RMRR is reporting. However, from the IvyBridge MMIO, Media
Registers & Programming Env manual:

4.6.1 Changes to GTT

The GTT is constrained to be located at the beginning of a
special section of stolen memory called the GTT stolen memory
(GSM). There is no longer an MMIO register containing the
physical base address of the GTT as on prior devices. Instead of
using the PGTBL_CTL register to specify the base address of the
GTT, the GTT base is now defined to be at the bottom (offset 0)
of GSM.

Since the graphics device (including the driver) knows nothing
about the location of GSM, it does not “know” where the GTT is
located in memory. In fact, the CPU cannot directly access the
GSM containing the GTT.

That seems to suggest we can't discover this region from the device, but
the device does need to maintain access to it... I don't know how to
resolve that without exposing the RMRR through the IOMMU API.
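
The 2M figure can be reproduced from the 50h values reported earlier.
A minimal sketch, assuming the GTT stolen size is a 2-bit field at bits
9:8 of MGGC0 counted in 1M units (that field position is my reading and
is not stated in this thread; the macro and function names are made up
for illustration):

```c
#include <stdint.h>

/* Assumed position of the GTT stolen size field within MGGC0 (50h). */
#define GGMS_SHIFT 8
#define GGMS_MASK  0x3u

/* GTT stolen memory (GSM) size in bytes, decoded from MGGC0. */
static uint32_t gtt_stolen_size(uint32_t mggc0)
{
	return ((mggc0 >> GGMS_SHIFT) & GGMS_MASK) << 20;
}
```

Under that assumption all three values read from 50h (289, 281 and
241) decode to 2M, independent of the iGPU memory setting.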

In any case, I don't know that any of this should block the original
patch. All of this seems like "acceptable" use of RMRRs that we can
later add an exception to allow if we get to the point of understanding
it and being able to reproduce any required mappings in the guest.
Thanks,

Alex

2014-06-19 06:10:04

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Thu, Jun 19, 2014 at 3:47 AM, Alex Williamson
<[email protected]> wrote:
> Finding some more specs... the MGGC0 register (50h) seems to indicate
> the GTT stolen memory size is 2M, which sounds suspiciously like the 2M
> that the RMRR is reporting. However, from the IvyBridge MMIO, Media
> Registers & Programming Env manual:
>
> 4.6.1 Changes to GTT
>
> The GTT is constrained to be located at the beginning of a
> special section of stolen memory called the GTT stolen memory
> (GSM). There is no longer an MMIO register containing the
> physical base address of the GTT as on prior devices. Instead of
> using the PGTBL_CTL register to specify the base address of the
> GTT, the GTT base is now defined to be at the bottom (offset 0)
> of GSM.
>
> Since the graphics device (including the driver) knows nothing
> about the location of GSM, it does not “know” where the GTT is
> located in memory. In fact, the CPU cannot directly access the
> GSM containing the GTT.
>
> That seems to suggest we can't discover this region from the device, but
> the device does need to maintain access to it... I don't know how to
> resolve that without exposing the RMRR through the IOMMU API.
>
> In any case, I don't know that any of this should block the original
> patch. All of this seems like "acceptable" use of RMRRs that we can
> later add an exception to allow if we get to the point of understanding
> it and being able to reproduce any required mappings in the guest.
> Thanks,

GTT stolen is the place where the gpu stores page tables. We never
access them directly but through a special mmio range so that the gpu
can intercept pte updates and invalidate tlbs accordingly. So yeah, we
need this, too.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2014-06-19 14:29:48

by Alex Williamson

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Thu, 2014-06-19 at 08:10 +0200, Daniel Vetter wrote:
> On Thu, Jun 19, 2014 at 3:47 AM, Alex Williamson
> <[email protected]> wrote:
> > Finding some more specs... the MGGC0 register (50h) seems to indicate
> > the GTT stolen memory size is 2M, which sounds suspiciously like the 2M
> > that the RMRR is reporting. However, from the IvyBridge MMIO, Media
> > Registers & Programming Env manual:
> >
> > 4.6.1 Changes to GTT
> >
> > The GTT is constrained to be located at the beginning of a
> > special section of stolen memory called the GTT stolen memory
> > (GSM). There is no longer an MMIO register containing the
> > physical base address of the GTT as on prior devices. Instead of
> > using the PGTBL_CTL register to specify the base address of the
> > GTT, the GTT base is now defined to be at the bottom (offset 0)
> > of GSM.
> >
> > Since the graphics device (including the driver) knows nothing
> > about the location of GSM, it does not “know” where the GTT is
> > located in memory. In fact, the CPU cannot directly access the
> > GSM containing the GTT.
> >
> > That seems to suggest we can't discover this region from the device, but
> > the device does need to maintain access to it... I don't know how to
> > resolve that without exposing the RMRR through the IOMMU API.
> >
> > In any case, I don't know that any of this should block the original
> > patch. All of this seems like "acceptable" use of RMRRs that we can
> > later add an exception to allow if we get to the point of understanding
> > it and being able to reproduce any required mappings in the guest.
> > Thanks,
>
> GTT stolen is the place where the gpu stores page tables. We never
> access them directly but through a special mmio range so that the gpu
> can intercept pte updates and invalidate tlbs accordingly. So yeah, we
> need this, too.

But is there a way for software to discover its location from the
device? If so, then I think we can recreate all the identity maps we'd
need for a guest from the device. If not, then we'd need to figure out
some IOMMU API extension to handle the mapping. The spec excerpt above
seems to indicate that hardware designers decided software doesn't need
to know about it, but the RMRR seems to be the "oh crap" moment when
they realized that yes we do need to know about it. Thanks,

Alex

2014-06-19 14:41:38

by Daniel Vetter

Subject: Re: [Intel-gfx] [PATCH v2] iommu/intel: Exclude devices using RMRRs from IOMMU API domains

On Thu, Jun 19, 2014 at 4:29 PM, Alex Williamson
<[email protected]> wrote:
> But is there a way for software to discover its location from the
> device? If so, then I think we can recreate all the identity maps we'd
> need for a guest from the device. If not, then we'd need to figure out
> some IOMMU API extension to handle the mapping. The spec excerpt above
> seems to indicate that hardware designers decided software doesn't need
> to know about it, but the RMRR seems to be the "oh crap" moment when
> they realized that yes we do need to know about it. Thanks,

It's all specified somewhere how exactly it works. But we've just had
piles of fun trying to get the stolen range (i.e. for gfx buffer
usage, not the gtt pte block) to work correctly and it's not been fun.
The issue is that these registers are sw-defined and set by the bios.
And the bios team occasionally smokes strong stuff and willy-nilly
changes the definitions without telling anyone ... And we know that
there's more reserved stuff in that stolen range that occasionally
shouldn't be used by the driver. We have regular discussions with
them.

Otoh the same bios teams also set up the RMRR ranges with equally
predictable results.

I don't have a recommendation here, but expect breakage no matter what you do.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch