2015-11-20 08:10:45

by Tian, Kevin

Subject: RE: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

> From: Tian, Kevin
> Sent: Friday, November 20, 2015 3:10 PM

> > > >
> > > > The proposal is therefore that GPU vendors can expose vGPUs to
> > > > userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> > > > supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> > > > module (or extension of i915) can register as a vfio bus driver, create
> > > > a struct device per vGPU, create an IOMMU group for that device, and
> > > > register that device with the vfio-core. Since we don't rely on the
> > > > system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> > > > extension of the same module) can register a "type1" compliant IOMMU
> > > > driver into vfio-core. From the perspective of QEMU then, all of the
> > > > existing vfio-pci code is re-used, QEMU remains largely unaware of any
> > > > specifics of the vGPU being assigned, and the only necessary change so
> > > > far is how QEMU traverses sysfs to find the device and thus the IOMMU
> > > > group leading to the vfio group.
> > >
> > > GVT-g requires pinning guest memory and querying GPA->HPA information,
> > > upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> > > to (GMA->HPA). So yes, a dummy or simple "type1"-compliant IOMMU
> > > can be introduced here just for this requirement.
> > >
> > > However, there's one tricky point where I'm not sure whether the
> > > overall VFIO concept would be violated. GVT-g doesn't require the
> > > system IOMMU to function, but the host system may enable the system
> > > IOMMU just for hardening purposes. This means two levels of
> > > translation exist (GMA->IOVA->HPA), so the dummy IOMMU driver has to
> > > request the system IOMMU driver to allocate IOVAs for the VMs and
> > > then set up the IOVA->HPA mappings in the IOMMU page table. In this
> > > case, multiple VMs' translations are multiplexed in one IOMMU page
> > > table.
> > >
> > > We might need to create some group/sub-group or parent/child concepts
> > > among those IOMMUs for thorough permission control.
> >
> > My thought here is that this is all abstracted through the vGPU IOMMU
> > and device vfio backends. It's the GPU driver itself, or some vfio
> > extension of that driver, mediating access to the device and deciding
> > when to configure GPU MMU mappings. That driver has access to the GPA
> > to HVA translations thanks to the type1-compliant IOMMU it implements
> > and can pin pages as needed to create GPA to HPA mappings. That should
> > give it all the pieces it needs to fully set up mappings for the vGPU.
> > Whether or not there's a system IOMMU is simply an exercise for that
> > driver. It needs to do a DMA mapping operation through the system IOMMU
> > the same for a vGPU as if it were doing it for itself, because they are
> > in fact one and the same. The GMA to IOVA mapping seems like an internal
> > detail. I assume the IOVA is some sort of GPA, and the GMA is managed
> > through mediation of the device.
>
> Sorry, I'm not familiar with the VFIO internals. My original worry is that
> the system IOMMU for the GPU may already be claimed by another vfio driver
> (e.g. the host kernel wants to harden the gfx driver against the rest of the
> sub-systems, regardless of whether a vGPU is created or not). In that case
> the vGPU IOMMU driver shouldn't manage the system IOMMU directly.
>
> btw, I'm curious how VFIO today coordinates with the system IOMMU driver
> regarding whether an IOMMU is used to control device assignment or used for
> kernel hardening. Somehow the two conflict, since different address spaces
> are concerned (GPA vs. IOVA)...
>

Here is a more concrete example:

KVMGT doesn't require an IOMMU. All DMA targets are already replaced with
HPAs through the shadow GTT, so DMA requests from the GPU all contain HPAs.

When the IOMMU is enabled, one simple approach is to have the vGPU IOMMU
driver configure the system IOMMU with an identity mapping (HPA->HPA). We
can't use (GPA->HPA) since GPAs from multiple VMs conflict.

However, we still have the host gfx driver running. When the IOMMU is
enabled, dma_alloc_*** will return an IOVA (drivers/iommu/iova.c) in the
host gfx driver, which will have IOVA->HPA programmed into the system IOMMU.

One IOMMU device entry can only translate one address space, so here
comes a conflict (HPA->HPA vs. IOVA->HPA). To solve this, the vGPU IOMMU
driver needs to allocate IOVAs from iova.c for each VM with a vGPU assigned,
and then KVMGT will program those IOVAs into the shadow GTT accordingly.
This adds one additional mapping layer (GPA->IOVA->HPA). In this way the
two requirements can be unified, since only the IOVA->HPA mapping needs to
be built.

So unlike the existing type1 IOMMU driver, which controls the IOMMU alone,
the vGPU IOMMU driver needs to cooperate with another agent (iova.c here)
to co-manage the system IOMMU. This may not impact the existing VFIO
framework; I just want to highlight the additional work here when
implementing the vGPU IOMMU driver.
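
To make the co-management idea more concrete, here is a minimal, purely
hypothetical sketch (not actual KVMGT code) of how such a vGPU IOMMU driver
could borrow the IOVA allocator in drivers/iommu/iova.c and install the
IOVA->HPA mapping itself. The function vgpu_map_guest_page() and its
parameters are invented for illustration; locking and the unmap path are
omitted.

/*
 * Hypothetical sketch only -- illustrates the flow described above,
 * not actual KVMGT code. Allocates a 4KiB IOVA from the allocator in
 * drivers/iommu/iova.c and maps IOVA->HPA in the system IOMMU, so the
 * returned IOVA can be written into the shadow GTT (GMA->GPA->IOVA).
 */
#include <linux/dma-mapping.h>
#include <linux/iommu.h>
#include <linux/io.h>
#include <linux/iova.h>
#include <linux/mm.h>

static dma_addr_t vgpu_map_guest_page(struct iommu_domain *domain,
				      struct iova_domain *iovad,
				      struct page *page)
{
	struct iova *iova;
	dma_addr_t dma_addr;

	/* One page of IOVA space, below 4GiB, size-aligned. */
	iova = alloc_iova(iovad, 1, DMA_BIT_MASK(32) >> PAGE_SHIFT, true);
	if (!iova)
		return 0;	/* 0 doubles as the error sentinel in this sketch */

	dma_addr = (dma_addr_t)iova->pfn_lo << PAGE_SHIFT;

	/* IOVA->HPA entry in the system IOMMU page table. */
	if (iommu_map(domain, dma_addr, page_to_phys(page),
		      PAGE_SIZE, IOMMU_READ | IOMMU_WRITE)) {
		__free_iova(iovad, iova);
		return 0;
	}

	/* KVMGT would then write dma_addr into the shadow GTT entry. */
	return dma_addr;
}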

Thanks
Kevin


2015-11-20 17:25:08

by Alex Williamson

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Fri, 2015-11-20 at 08:10 +0000, Tian, Kevin wrote:
> > From: Tian, Kevin
> > Sent: Friday, November 20, 2015 3:10 PM
>
> > > > >
> > > > > The proposal is therefore that GPU vendors can expose vGPUs to
> > > > > userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> > > > > supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> > > > > module (or extension of i915) can register as a vfio bus driver, create
> > > > > a struct device per vGPU, create an IOMMU group for that device, and
> > > > > register that device with the vfio-core. Since we don't rely on the
> > > > > system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> > > > > extension of the same module) can register a "type1" compliant IOMMU
> > > > > driver into vfio-core. From the perspective of QEMU then, all of the
> > > > > existing vfio-pci code is re-used, QEMU remains largely unaware of any
> > > > > specifics of the vGPU being assigned, and the only necessary change so
> > > > > far is how QEMU traverses sysfs to find the device and thus the IOMMU
> > > > > group leading to the vfio group.
> > > >
> > > > GVT-g requires pinning guest memory and querying GPA->HPA information,
> > > > upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> > > > to (GMA->HPA). So yes, a dummy or simple "type1"-compliant IOMMU
> > > > can be introduced here just for this requirement.
> > > >
> > > > However, there's one tricky point where I'm not sure whether the
> > > > overall VFIO concept would be violated. GVT-g doesn't require the
> > > > system IOMMU to function, but the host system may enable the system
> > > > IOMMU just for hardening purposes. This means two levels of
> > > > translation exist (GMA->IOVA->HPA), so the dummy IOMMU driver has to
> > > > request the system IOMMU driver to allocate IOVAs for the VMs and
> > > > then set up the IOVA->HPA mappings in the IOMMU page table. In this
> > > > case, multiple VMs' translations are multiplexed in one IOMMU page
> > > > table.
> > > >
> > > > We might need to create some group/sub-group or parent/child concepts
> > > > among those IOMMUs for thorough permission control.
> > >
> > > My thought here is that this is all abstracted through the vGPU IOMMU
> > > and device vfio backends. It's the GPU driver itself, or some vfio
> > > extension of that driver, mediating access to the device and deciding
> > > when to configure GPU MMU mappings. That driver has access to the GPA
> > > to HVA translations thanks to the type1-compliant IOMMU it implements
> > > and can pin pages as needed to create GPA to HPA mappings. That should
> > > give it all the pieces it needs to fully set up mappings for the vGPU.
> > > Whether or not there's a system IOMMU is simply an exercise for that
> > > driver. It needs to do a DMA mapping operation through the system IOMMU
> > > the same for a vGPU as if it were doing it for itself, because they are
> > > in fact one and the same. The GMA to IOVA mapping seems like an internal
> > > detail. I assume the IOVA is some sort of GPA, and the GMA is managed
> > > through mediation of the device.
> >
> > Sorry, I'm not familiar with the VFIO internals. My original worry is that
> > the system IOMMU for the GPU may already be claimed by another vfio driver
> > (e.g. the host kernel wants to harden the gfx driver against the rest of
> > the sub-systems, regardless of whether a vGPU is created or not). In that
> > case the vGPU IOMMU driver shouldn't manage the system IOMMU directly.
> >
> > btw, I'm curious how VFIO today coordinates with the system IOMMU driver
> > regarding whether an IOMMU is used to control device assignment or used
> > for kernel hardening. Somehow the two conflict, since different address
> > spaces are concerned (GPA vs. IOVA)...
> >
>
> Here is a more concrete example:
>
> KVMGT doesn't require an IOMMU. All DMA targets are already replaced with
> HPAs through the shadow GTT, so DMA requests from the GPU all contain HPAs.
>
> When the IOMMU is enabled, one simple approach is to have the vGPU IOMMU
> driver configure the system IOMMU with an identity mapping (HPA->HPA). We
> can't use (GPA->HPA) since GPAs from multiple VMs conflict.
>
> However, we still have the host gfx driver running. When the IOMMU is
> enabled, dma_alloc_*** will return an IOVA (drivers/iommu/iova.c) in the
> host gfx driver, which will have IOVA->HPA programmed into the system IOMMU.
>
> One IOMMU device entry can only translate one address space, so here
> comes a conflict (HPA->HPA vs. IOVA->HPA). To solve this, the vGPU IOMMU
> driver needs to allocate IOVAs from iova.c for each VM with a vGPU assigned,
> and then KVMGT will program those IOVAs into the shadow GTT accordingly.
> This adds one additional mapping layer (GPA->IOVA->HPA). In this way the
> two requirements can be unified, since only the IOVA->HPA mapping needs to
> be built.
>
> So unlike the existing type1 IOMMU driver, which controls the IOMMU alone,
> the vGPU IOMMU driver needs to cooperate with another agent (iova.c here)
> to co-manage the system IOMMU. This may not impact the existing VFIO
> framework; I just want to highlight the additional work here when
> implementing the vGPU IOMMU driver.

Right, so the existing i915 driver needs to use the DMA API and calls
like dma_map_page() to enable translations through the IOMMU. With
dma_map_page(), the caller provides a page address (~HPA) and is
returned an IOVA. So unfortunately you don't get to take the shortcut
of having an identity mapping through the IOMMU unless you want to
convert i915 entirely to using the IOMMU API, because we also can't have
the conflict that an HPA could overlap an IOVA for a previously mapped
page.

The double translation, once through the GPU MMU and once through the
system IOMMU, is going to happen regardless of whether we can identity
map through the IOMMU. The only solution to this would be for the GPU
to participate in ATS and provide pre-translated transactions from the
GPU. All of this is internal to the i915 driver (or vfio extension of
that driver) and needs to be done regardless of what sort of interface
we're using to expose the vGPU to QEMU. It just seems like VFIO
provides a convenient way of doing this, since you'll have ready access
to the HVA-GPA mappings for the user.

I think the key points though are:

* the VFIO type1 IOMMU stores GPA to HVA translations
* get_user_pages() on the HVA will pin the page and give you a page
* dma_map_page() receives that page, programs the system IOMMU, and
  provides an IOVA
* the GPU MMU can then be programmed with the GPA to IOVA translations
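
Putting those points together, here is a rough, hypothetical sketch of that
pipeline in driver-style C. It assumes the GPA->HVA lookup has already been
done against the type1 IOMMU's bookkeeping; vgpu_pin_and_map() is an
invented name, and get_user_pages_fast() stands in for the get_user_pages()
step above.

/*
 * Hypothetical sketch of the pin + map pipeline, not real driver code.
 * 'hva' is the host virtual address obtained from the type1 IOMMU's
 * GPA->HVA bookkeeping for the guest page in question.
 */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

static dma_addr_t vgpu_pin_and_map(struct device *dev, unsigned long hva)
{
	struct page *page;
	dma_addr_t iova;

	/* Pin the page backing the HVA (writable, one page). */
	if (get_user_pages_fast(hva & PAGE_MASK, 1, 1, &page) != 1)
		return 0;	/* 0 doubles as the error sentinel in this sketch */

	/* Program the system IOMMU via the DMA API; an IOVA comes back. */
	iova = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dev, iova)) {
		put_page(page);
		return 0;
	}

	/* The GPU MMU / shadow GTT can now be programmed with GPA->IOVA. */
	return iova;
}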

Thanks,
Alex

2015-11-23 05:06:23

by Jike Song

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On 11/21/2015 01:25 AM, Alex Williamson wrote:
> On Fri, 2015-11-20 at 08:10 +0000, Tian, Kevin wrote:
>>
>> Here is a more concrete example:
>>
>> KVMGT doesn't require an IOMMU. All DMA targets are already replaced with
>> HPAs through the shadow GTT, so DMA requests from the GPU all contain HPAs.
>>
>> When the IOMMU is enabled, one simple approach is to have the vGPU IOMMU
>> driver configure the system IOMMU with an identity mapping (HPA->HPA). We
>> can't use (GPA->HPA) since GPAs from multiple VMs conflict.
>>
>> However, we still have the host gfx driver running. When the IOMMU is
>> enabled, dma_alloc_*** will return an IOVA (drivers/iommu/iova.c) in the
>> host gfx driver, which will have IOVA->HPA programmed into the system IOMMU.
>>
>> One IOMMU device entry can only translate one address space, so here
>> comes a conflict (HPA->HPA vs. IOVA->HPA). To solve this, the vGPU IOMMU
>> driver needs to allocate IOVAs from iova.c for each VM with a vGPU assigned,
>> and then KVMGT will program those IOVAs into the shadow GTT accordingly.
>> This adds one additional mapping layer (GPA->IOVA->HPA). In this way the
>> two requirements can be unified, since only the IOVA->HPA mapping needs to
>> be built.
>>
>> So unlike the existing type1 IOMMU driver, which controls the IOMMU alone,
>> the vGPU IOMMU driver needs to cooperate with another agent (iova.c here)
>> to co-manage the system IOMMU. This may not impact the existing VFIO
>> framework; I just want to highlight the additional work here when
>> implementing the vGPU IOMMU driver.
>
> Right, so the existing i915 driver needs to use the DMA API and calls
> like dma_map_page() to enable translations through the IOMMU. With
> dma_map_page(), the caller provides a page address (~HPA) and is
> returned an IOVA. So unfortunately you don't get to take the shortcut
> of having an identity mapping through the IOMMU unless you want to
> convert i915 entirely to using the IOMMU API, because we also can't have
> the conflict that an HPA could overlap an IOVA for a previously mapped
> page.
>
> The double translation, once through the GPU MMU and once through the
> system IOMMU, is going to happen regardless of whether we can identity
> map through the IOMMU. The only solution to this would be for the GPU
> to participate in ATS and provide pre-translated transactions from the
> GPU. All of this is internal to the i915 driver (or vfio extension of
> that driver) and needs to be done regardless of what sort of interface
> we're using to expose the vGPU to QEMU. It just seems like VFIO
> provides a convenient way of doing this, since you'll have ready access
> to the HVA-GPA mappings for the user.
>
> I think the key points though are:
>
> * the VFIO type1 IOMMU stores GPA to HVA translations
> * get_user_pages() on the HVA will pin the page and give you a page
> * dma_map_page() receives that page, programs the system IOMMU, and
>   provides an IOVA
> * the GPU MMU can then be programmed with the GPA to IOVA translations

Thanks for such a nice example! I'll do my homework and get back to you
shortly. :)

>
> Thanks,
> Alex
>

--
Thanks,
Jike