2015-11-18 18:12:25

by Alex Williamson

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

[cc +qemu-devel, +paolo, +gerd]

On Tue, 2015-10-27 at 17:25 +0800, Jike Song wrote:
> Hi all,
>
> We are pleased to announce another update of Intel GVT-g for Xen.
>
> Intel GVT-g is a full GPU virtualization solution with mediated
> pass-through, starting from 4th generation Intel Core(TM) processors
> with Intel Graphics processors. A virtual GPU instance is maintained
> for each VM, with part of performance critical resources directly
> assigned. The capability of running native graphics driver inside a
> VM, without hypervisor intervention in performance critical paths,
> achieves a good balance among performance, feature, and sharing
> capability. Xen is currently supported on Intel Processor Graphics
> (a.k.a. XenGT); and the core logic can be easily ported to other
> hypervisors.
>
>
> Repositories
>
> Kernel: https://github.com/01org/igvtg-kernel (2015q3-3.18.0 branch)
> Xen: https://github.com/01org/igvtg-xen (2015q3-4.5 branch)
> Qemu: https://github.com/01org/igvtg-qemu (xengt_public2015q3 branch)
>
>
> This update consists of:
>
> - XenGT is now merged with KVMGT in unified repositories (kernel and qemu), but currently
> different branches for qemu. XenGT and KVMGT share the same iGVT-g core logic.

Hi!

At redhat we've been thinking about how to support vGPUs from multiple
vendors in a common way within QEMU. We want to enable code sharing
between vendors and give new vendors an easy path to add their own
support. We also have the complication that not all vGPU vendors are as
open source friendly as Intel, so being able to abstract the device
mediation and access outside of QEMU is a big advantage.

The proposal I'd like to make is that a vGPU, whether it is from Intel
or another vendor, is predominantly a PCI(e) device. We have an
interface in QEMU already for exposing arbitrary PCI devices, vfio-pci.
Currently vfio-pci uses the VFIO API to interact with "physical" devices
and system IOMMUs. I highlight /physical/ there because some of these
physical devices are SR-IOV VFs, which is somewhat of a fuzzy concept,
somewhere between fixed hardware and a virtual device implemented in
software. That software just happens to be running on the physical
endpoint.

vGPUs are similar, with the virtual device created at a different point,
host software. They also rely on different IOMMU constructs, making use
of the MMU capabilities of the GPU (GTTs and such), but really having
similar requirements.

The proposal is therefore that GPU vendors can expose vGPUs to
userspace, and thus to QEMU, using the VFIO API. For instance, vfio
supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
module (or extension of i915) can register as a vfio bus driver, create
a struct device per vGPU, create an IOMMU group for that device, and
register that device with the vfio-core. Since we don't rely on the
system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
extension of the same module) can register a "type1" compliant IOMMU
driver into vfio-core. From the perspective of QEMU then, all of the
existing vfio-pci code is re-used, QEMU remains largely unaware of any
specifics of the vGPU being assigned, and the only necessary change so
far is how QEMU traverses sysfs to find the device and thus the IOMMU
group leading to the vfio group.

There are a few areas where we know we'll need to extend the VFIO API to
make this work, but it seems like they can all be done generically. One
is that PCI BARs are described through the VFIO API as regions and each
region has a single flag describing whether mmap (ie. direct mapping) of
that region is possible. We expect that vGPUs likely need finer
granularity, enabling some areas within a BAR to be trapped and forwarded
as a read or write access for the vGPU-vfio-device module to emulate,
while other regions, like framebuffers or texture regions, are directly
mapped. I have prototype code to enable this already.
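
One way to express that finer granularity is for the region info to carry a list of directly-mappable sub-ranges, with accesses outside them trapped and forwarded for emulation. The structures below are purely illustrative; no such extension existed in the VFIO uapi at the time of this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension of VFIO's per-region info: instead of one
 * region-wide MMAP flag, describe which sub-ranges of a BAR may be
 * mmap'd directly (framebuffers, texture apertures); everything else
 * is trapped and forwarded to the vendor module. */
struct vgpu_sparse_area {
    uint64_t offset;        /* offset within the region */
    uint64_t size;
};

struct vgpu_region_info {
    uint32_t index;         /* BAR number */
    uint32_t nr_areas;      /* number of directly-mappable sub-ranges */
    uint64_t size;          /* total region size */
    struct vgpu_sparse_area areas[];
};

/* Would a guest access at 'off' go through mmap, or be trapped? */
static int is_direct_mapped(const struct vgpu_region_info *info, uint64_t off)
{
    for (uint32_t i = 0; i < info->nr_areas; i++)
        if (off >= info->areas[i].offset &&
            off < info->areas[i].offset + info->areas[i].size)
            return 1;
    return 0;
}
```
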

Another area is that we really don't want to proliferate each vGPU
needing a new IOMMU type within vfio. The existing type1 IOMMU provides
potentially the most simple mapping and unmapping interface possible.
We'd therefore need to allow multiple "type1" IOMMU drivers for vfio,
making type1 be more of an interface specification rather than a single
implementation. This is a trivial change to make within vfio and one
that I believe is compatible with the existing API. Note that
implementing a type1-compliant vfio IOMMU does not imply pinning and
mapping every registered page. A vGPU, with mediated device access, may
use this only to track the current HVA to GPA mappings for a VM. Only
when a DMA is enabled for the vGPU instance is that HVA pinned and an
HPA to GPA translation programmed into the GPU MMU.
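
For context, the map request a type1-compliant driver must accept is tiny; the struct below is a simplified local mirror of `struct vfio_iommu_type1_dma_map` from `<linux/vfio.h>`, so the sketch stands alone. The toy table after it sketches the lazy bookkeeping described above: record the HVA/GPA pair at map time, translate (and pin) only when a DMA is actually enabled:

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the VFIO type1 uapi map request (see <linux/vfio.h>). */
#define VFIO_DMA_MAP_FLAG_READ  (1 << 0)
#define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)

struct vfio_iommu_type1_dma_map {
    uint32_t argsz;
    uint32_t flags;
    uint64_t vaddr;   /* process virtual address (HVA) */
    uint64_t iova;    /* IO virtual address (GPA for a VM) */
    uint64_t size;    /* bytes, page-aligned */
};

/* A mediated vGPU's "type1" driver may simply record HVA<->GPA here
 * and defer pinning until the device programs a real DMA. */
#define MAX_MAPS 16
static struct vfio_iommu_type1_dma_map map_table[MAX_MAPS];
static int nr_maps;

static int record_map(uint64_t hva, uint64_t gpa, uint64_t size)
{
    if (nr_maps == MAX_MAPS)
        return -1;
    map_table[nr_maps++] = (struct vfio_iommu_type1_dma_map){
        .argsz = sizeof(struct vfio_iommu_type1_dma_map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = hva, .iova = gpa, .size = size,
    };
    return 0;
}

/* What the vendor module needs when it later pins a page for DMA:
 * the HVA backing a given guest physical address. */
static uint64_t gpa_to_hva(uint64_t gpa)
{
    for (int i = 0; i < nr_maps; i++)
        if (gpa >= map_table[i].iova &&
            gpa < map_table[i].iova + map_table[i].size)
            return map_table[i].vaddr + (gpa - map_table[i].iova);
    return 0;
}
```
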

Another area of extension is how to expose a framebuffer to QEMU for
seamless integration into a SPICE/VNC channel. For this I believe we
could use a new region, much like we've done to expose VGA access
through a vfio device file descriptor. An area within this new
framebuffer region could be directly mappable in QEMU while a
non-mappable page, at a standard location with standardized format,
provides a description of framebuffer and potentially even a
communication channel to synchronize framebuffer captures. This would
be new code for QEMU, but something we could share among all vGPU
implementations.
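
As a sketch of what that standardized description page might contain (every field name here is an assumption for illustration, not an existing format):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the non-mappable "description page" at a
 * standard location in the proposed framebuffer region.  Nothing like
 * this existed in the VFIO uapi at the time of this thread. */
struct vgpu_fb_desc {
    uint32_t version;      /* layout version, bump on change */
    uint32_t flags;
    uint32_t width;        /* pixels */
    uint32_t height;
    uint32_t stride;       /* bytes per scanline */
    uint32_t fourcc;       /* pixel format, DRM-style fourcc */
    uint64_t data_offset;  /* offset of the mappable pixel data */
    uint64_t data_size;
    uint32_t update_seq;   /* bumped by the device model on each flip,
                            * so QEMU can cheaply detect changes */
};

/* Pack four characters into a little-endian fourcc code. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))
```
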

Another obvious area to be standardized would be how to discover,
create, and destroy vGPU instances. SR-IOV has a standard mechanism to
create VFs in sysfs and I would propose that vGPU vendors try to
standardize on similar interfaces to enable libvirt to easily discover
the vGPU capabilities of a given GPU and manage the lifecycle of a vGPU
instance.
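
The SR-IOV precedent is a sysfs attribute that takes a VF count (`sriov_numvfs`); a vGPU analog could use the same write-a-number idiom. In this sketch the attribute path is a parameter so the helper isn't tied to real sysfs, and the `vgpu_numinstances` name in the comment is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* SR-IOV creates VFs by writing a count to
 * /sys/bus/pci/devices/<addr>/sriov_numvfs.  A vGPU equivalent could
 * use the same idiom, e.g. a (hypothetical) vgpu_numinstances
 * attribute; writing 0 would destroy the instances again. */
static int set_instance_count(const char *attr_path, int n)
{
    FILE *f = fopen(attr_path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%d\n", n) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```
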

This is obviously a lot to digest, but I'd certainly be interested in
hearing feedback on this proposal, as well as trying to clarify anything
I've left out or misrepresented above. Another benefit to this
mechanism is that direct GPU assignment and vGPU assignment use the same
code within QEMU and same API to the kernel, which should make debugging
and code support between the two easier. I'd really like to start a
discussion around this proposal, and of course the first open source
implementation of this sort of model will really help to drive the
direction it takes. Thanks!

Alex


2015-11-19 04:06:44

by Tian, Kevin

Subject: RE: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

> From: Alex Williamson [mailto:[email protected]]
> Sent: Thursday, November 19, 2015 2:12 AM
>
> [cc +qemu-devel, +paolo, +gerd]
>
> On Tue, 2015-10-27 at 17:25 +0800, Jike Song wrote:
> > Hi all,
> >
> > We are pleased to announce another update of Intel GVT-g for Xen.
> >
> > Intel GVT-g is a full GPU virtualization solution with mediated
> > pass-through, starting from 4th generation Intel Core(TM) processors
> > with Intel Graphics processors. A virtual GPU instance is maintained
> > for each VM, with part of performance critical resources directly
> > assigned. The capability of running native graphics driver inside a
> > VM, without hypervisor intervention in performance critical paths,
> > achieves a good balance among performance, feature, and sharing
> > capability. Xen is currently supported on Intel Processor Graphics
> > (a.k.a. XenGT); and the core logic can be easily ported to other
> > hypervisors.
> >
> >
> > Repositories
> >
> > Kernel: https://github.com/01org/igvtg-kernel (2015q3-3.18.0 branch)
> > Xen: https://github.com/01org/igvtg-xen (2015q3-4.5 branch)
> > Qemu: https://github.com/01org/igvtg-qemu (xengt_public2015q3 branch)
> >
> >
> > This update consists of:
> >
> > - XenGT is now merged with KVMGT in unified repositories(kernel and qemu), but
> currently
> > different branches for qemu. XenGT and KVMGT share same iGVT-g core logic.
>
> Hi!
>
> At redhat we've been thinking about how to support vGPUs from multiple
> vendors in a common way within QEMU. We want to enable code sharing
> between vendors and give new vendors an easy path to add their own
> support. We also have the complication that not all vGPU vendors are as
> open source friendly as Intel, so being able to abstract the device
> mediation and access outside of QEMU is a big advantage.
>
> The proposal I'd like to make is that a vGPU, whether it is from Intel
> or another vendor, is predominantly a PCI(e) device. We have an
> interface in QEMU already for exposing arbitrary PCI devices, vfio-pci.
> Currently vfio-pci uses the VFIO API to interact with "physical" devices
> and system IOMMUs. I highlight /physical/ there because some of these
> physical devices are SR-IOV VFs, which is somewhat of a fuzzy concept,
> somewhere between fixed hardware and a virtual device implemented in
> software. That software just happens to be running on the physical
> endpoint.

Agree.

One clarification for the rest of the discussion: we're talking about the
GVT-g vGPU here, which is a pure software GPU virtualization technique.
GVT-d (note some uses in the text) refers to passing through the whole GPU
or a specific VF. GVT-d already fits the existing VFIO APIs nicely (though
there is some ongoing effort to remove Intel-specific platform stickiness
from the gfx driver). :-)

>
> vGPUs are similar, with the virtual device created at a different point,
> host software. They also rely on different IOMMU constructs, making use
> of the MMU capabilities of the GPU (GTTs and such), but really having
> similar requirements.

One important difference between the system IOMMU and the GPU MMU here:
the system IOMMU translates from a DMA target (IOVA on native, or GPA
in the virtualization case) to HPA. The GPU's internal MMU, however,
translates from a Graphics Memory Address (GMA) to a DMA target (HPA if
the system IOMMU is disabled, or IOVA/GPA if it is enabled). GMA is an
internal address space within the GPU, not exposed to QEMU and fully
managed by the GVT-g device model. Since it's not a standard PCI-defined
resource, we don't need to abstract this capability in the VFIO
interface.

>
> The proposal is therefore that GPU vendors can expose vGPUs to
> userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> module (or extension of i915) can register as a vfio bus driver, create
> a struct device per vGPU, create an IOMMU group for that device, and
> register that device with the vfio-core. Since we don't rely on the
> system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> extension of the same module) can register a "type1" compliant IOMMU
> driver into vfio-core. From the perspective of QEMU then, all of the
> existing vfio-pci code is re-used, QEMU remains largely unaware of any
> specifics of the vGPU being assigned, and the only necessary change so
> far is how QEMU traverses sysfs to find the device and thus the IOMMU
> group leading to the vfio group.

GVT-g needs to pin guest memory and query GPA->HPA information, from
which the shadow GTTs are updated accordingly, from (GMA->GPA) to
(GMA->HPA). So yes, a dummy or simple "type1"-compliant IOMMU can be
introduced here just for this requirement.
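
That shadow-GTT update is essentially the composition of two tables: the guest-written GMA->GPA entry with the host's GPA->HPA pinning result. A toy model in page-frame numbers (no real GTT encoding, and the fixed GPA->HPA offset is purely for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define GTT_ENTRIES 8
#define INVALID ((uint64_t)-1)

/* Guest GTT: GMA page index -> GPA page frame (what the guest wrote). */
static uint64_t guest_gtt[GTT_ENTRIES];

/* Shadow GTT actually programmed into hardware: GMA -> HPA.  The
 * mediator rebuilds entries whenever it traps a guest GTT write. */
static uint64_t shadow_gtt[GTT_ENTRIES];

/* Toy pinning result: pretend all guest RAM sits at one fixed offset.
 * A real implementation looks this up from the pinned-page records. */
static uint64_t gpa_to_hpa(uint64_t gpa_pfn)
{
    return gpa_pfn + 0x40000;
}

static void shadow_update(int idx, uint64_t gpa_pfn)
{
    guest_gtt[idx]  = gpa_pfn;
    shadow_gtt[idx] = (gpa_pfn == INVALID) ? INVALID
                                           : gpa_to_hpa(gpa_pfn);
}
```
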

However, there's one tricky point where I'm not sure whether the overall
VFIO concept would be violated. GVT-g doesn't require the system IOMMU
to function, but the host system may enable the system IOMMU just for
hardening purposes. This means two levels of translation exist
(GMA->IOVA->HPA), so the dummy IOMMU driver has to ask the system IOMMU
driver to allocate IOVAs for VMs and then set up the IOVA->HPA mappings
in the IOMMU page table. In this case, multiple VMs' translations are
multiplexed in one IOMMU page table.

We might need to create some group/sub-group or parent/child concepts
among those IOMMUs for thorough permission control.

>
> There are a few areas where we know we'll need to extend the VFIO API to
> make this work, but it seems like they can all be done generically. One
> is that PCI BARs are described through the VFIO API as regions and each
> region has a single flag describing whether mmap (ie. direct mapping) of
> that region is possible. We expect that vGPUs likely need finer
> granularity, enabling some areas within a BAR to be trapped and forwarded
> as a read or write access for the vGPU-vfio-device module to emulate,
> while other regions, like framebuffers or texture regions, are directly
> mapped. I have prototype code to enable this already.

Yes, in GVT-g one BAR resource might be partitioned among multiple vGPUs.
If VFIO can support such partial resource assignment, that'd be great. A
similar parent/child concept might also be required here, so that any
resource enumerated on a vGPU can't break the limitations enforced on the
physical device.

One unique requirement for GVT-g here, though, is that the vGPU device
model needs to know the guest BAR configuration for proper emulation
(e.g. to register an I/O emulation handler with KVM). The same applies to
the guest MSI vector for virtual interrupt injection. I'm not sure how
this can fit into the common VFIO model. Does VFIO allow vendor-specific
extensions today?

>
> Another area is that we really don't want to proliferate each vGPU
> needing a new IOMMU type within vfio. The existing type1 IOMMU provides
> potentially the most simple mapping and unmapping interface possible.
> We'd therefore need to allow multiple "type1" IOMMU drivers for vfio,
> making type1 be more of an interface specification rather than a single
> implementation. This is a trivial change to make within vfio and one
> that I believe is compatible with the existing API. Note that
> implementing a type1-compliant vfio IOMMU does not imply pinning and
> mapping every registered page. A vGPU, with mediated device access, may
> use this only to track the current HVA to GPA mappings for a VM. Only
> when a DMA is enabled for the vGPU instance is that HVA pinned and an
> HPA to GPA translation programmed into the GPU MMU.
>
> Another area of extension is how to expose a framebuffer to QEMU for
> seamless integration into a SPICE/VNC channel. For this I believe we
> could use a new region, much like we've done to expose VGA access
> through a vfio device file descriptor. An area within this new
> framebuffer region could be directly mappable in QEMU while a
> non-mappable page, at a standard location with standardized format,
> provides a description of framebuffer and potentially even a
> communication channel to synchronize framebuffer captures. This would
> be new code for QEMU, but something we could share among all vGPU
> implementations.

GVT-g already provides an interface to decode framebuffer information,
with the assumption that the framebuffer will be further composited
through OpenGL APIs. So the format is defined according to the OpenGL
definition. Does that meet the SPICE requirement?

One more thing to add: framebuffers are frequently switched in reality.
So either QEMU needs to poll or a notification mechanism is required.
And since it's dynamic, having the framebuffer page directly exposed in
the new region might be tricky. We could instead just expose the
framebuffer information (base, format, etc.) and let QEMU map it
separately, outside the VFIO interface.

And... this works fine with the vGPU model, since software knows all the
details about the framebuffer. In the pass-through case, however, who do
you expect to provide that information? Is it OK to introduce
vGPU-specific APIs in VFIO?

>
> Another obvious area to be standardized would be how to discover,
> create, and destroy vGPU instances. SR-IOV has a standard mechanism to
> create VFs in sysfs and I would propose that vGPU vendors try to
> standardize on similar interfaces to enable libvirt to easily discover
> the vGPU capabilities of a given GPU and manage the lifecycle of a vGPU
> instance.

There is no standard today. We expose vGPU life-cycle management APIs
through sysfs (under the i915 node), which is very Intel-specific. In
reality, different vendors have quite different capabilities for their
vGPUs, so I'm not sure how standard such a mechanism can be. But this
code should be minor to maintain in libvirt.

>
> This is obviously a lot to digest, but I'd certainly be interested in
> hearing feedback on this proposal as well as try to clarify anything
> I've left out or misrepresented above. Another benefit to this
> mechanism is that direct GPU assignment and vGPU assignment use the same
> code within QEMU and same API to the kernel, which should make debugging
> and code support between the two easier. I'd really like to start a
> discussion around this proposal, and of course the first open source
> implementation of this sort of model will really help to drive the
> direction it takes. Thanks!
>

Thanks for starting this discussion. Intel will definitely work with the
community on this. Based on the earlier comments, I'm not sure whether we
can use exactly the same code for direct GPU assignment and vGPU
assignment, since even if we extend VFIO, some interfaces might be
vGPU-specific. Does this approach still achieve your end goal?

Thanks
Kevin

2015-11-19 07:23:25

by Jike Song

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Hi Alex,
On 11/19/2015 12:06 PM, Tian, Kevin wrote:
>> From: Alex Williamson [mailto:[email protected]]
>> Sent: Thursday, November 19, 2015 2:12 AM
>>
>> [cc +qemu-devel, +paolo, +gerd]
>>
>> On Tue, 2015-10-27 at 17:25 +0800, Jike Song wrote:
>>> {snip}
>>
>> Hi!
>>
>> At redhat we've been thinking about how to support vGPUs from multiple
>> vendors in a common way within QEMU. We want to enable code sharing
>> between vendors and give new vendors an easy path to add their own
>> support. We also have the complication that not all vGPU vendors are as
>> open source friendly as Intel, so being able to abstract the device
>> mediation and access outside of QEMU is a big advantage.
>>
>> The proposal I'd like to make is that a vGPU, whether it is from Intel
>> or another vendor, is predominantly a PCI(e) device. We have an
>> interface in QEMU already for exposing arbitrary PCI devices, vfio-pci.
>> Currently vfio-pci uses the VFIO API to interact with "physical" devices
>> and system IOMMUs. I highlight /physical/ there because some of these
>> physical devices are SR-IOV VFs, which is somewhat of a fuzzy concept,
>> somewhere between fixed hardware and a virtual device implemented in
>> software. That software just happens to be running on the physical
>> endpoint.
>
> Agree.
>
> One clarification for rest discussion, is that we're talking about GVT-g vGPU
> here which is a pure software GPU virtualization technique. GVT-d (note
> some use in the text) refers to passing through the whole GPU or a specific
> VF. GVT-d already falls into existing VFIO APIs nicely (though some on-going
> effort to remove Intel specific platform stickness from gfx driver). :-)
>

Hi Alex, thanks for the discussion.

In addition to Kevin's replies, I have a high-level question: can VFIO
be used by QEMU for both KVM and Xen?

--
Thanks,
Jike


>>
>> vGPUs are similar, with the virtual device created at a different point,
>> host software. They also rely on different IOMMU constructs, making use
>> of the MMU capabilities of the GPU (GTTs and such), but really having
>> similar requirements.
>
> One important difference between system IOMMU and GPU-MMU here.
> System IOMMU is very much about translation from a DMA target
> (IOVA on native, or GPA in virtualization case) to HPA. However GPU
> internal MMUs is to translate from Graphics Memory Address (GMA)
> to DMA target (HPA if system IOMMU is disabled, or IOVA/GPA if system
> IOMMU is enabled). GMA is an internal addr space within GPU, not
> exposed to Qemu and fully managed by GVT-g device model. Since it's
> not a standard PCI defined resource, we don't need abstract this capability
> in VFIO interface.
>
>>
>> The proposal is therefore that GPU vendors can expose vGPUs to
>> userspace, and thus to QEMU, using the VFIO API. For instance, vfio
>> supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
>> module (or extension of i915) can register as a vfio bus driver, create
>> a struct device per vGPU, create an IOMMU group for that device, and
>> register that device with the vfio-core. Since we don't rely on the
>> system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
>> extension of the same module) can register a "type1" compliant IOMMU
>> driver into vfio-core. From the perspective of QEMU then, all of the
>> existing vfio-pci code is re-used, QEMU remains largely unaware of any
>> specifics of the vGPU being assigned, and the only necessary change so
>> far is how QEMU traverses sysfs to find the device and thus the IOMMU
>> group leading to the vfio group.
>
> GVT-g requires to pin guest memory and query GPA->HPA information,
> upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> to (GMA->HPA). So yes, here a dummy or simple "type1" compliant IOMMU
> can be introduced just for this requirement.
>
> However there's one tricky point which I'm not sure whether overall
> VFIO concept will be violated. GVT-g doesn't require system IOMMU
> to function, however host system may enable system IOMMU just for
> hardening purpose. This means two-level translations existing (GMA->
> IOVA->HPA), so the dummy IOMMU driver has to request system IOMMU
> driver to allocate IOVA for VMs and then setup IOVA->HPA mapping
> in IOMMU page table. In this case, multiple VM's translations are
> multiplexed in one IOMMU page table.
>
> We might need create some group/sub-group or parent/child concepts
> among those IOMMUs for thorough permission control.
>
>>
>> There are a few areas where we know we'll need to extend the VFIO API to
>> make this work, but it seems like they can all be done generically. One
>> is that PCI BARs are described through the VFIO API as regions and each
>> region has a single flag describing whether mmap (ie. direct mapping) of
>> that region is possible. We expect that vGPUs likely need finer
>> granularity, enabling some areas within a BAR to be trapped and forwarded
>> as a read or write access for the vGPU-vfio-device module to emulate,
>> while other regions, like framebuffers or texture regions, are directly
>> mapped. I have prototype code to enable this already.
>
> Yes in GVT-g one BAR resource might be partitioned among multiple vGPUs.
> If VFIO can support such partial resource assignment, it'd be great. Similar
> parent/child concept might also be required here, so any resource enumerated
> on a vGPU shouldn't break limitations enforced on the physical device.
>
> One unique requirement for GVT-g here, though, is that vGPU device model
> need to know guest BAR configuration for proper emulation (e.g. register
> IO emulation handler to KVM). Similar is about guest MSI vector for virtual
> interrupt injection. Not sure how this can be fit into common VFIO model.
> Does VFIO allow vendor specific extension today?
>
>>
>> Another area is that we really don't want to proliferate each vGPU
>> needing a new IOMMU type within vfio. The existing type1 IOMMU provides
>> potentially the most simple mapping and unmapping interface possible.
>> We'd therefore need to allow multiple "type1" IOMMU drivers for vfio,
>> making type1 be more of an interface specification rather than a single
>> implementation. This is a trivial change to make within vfio and one
>> that I believe is compatible with the existing API. Note that
>> implementing a type1-compliant vfio IOMMU does not imply pinning and
>> mapping every registered page. A vGPU, with mediated device access, may
>> use this only to track the current HVA to GPA mappings for a VM. Only
>> when a DMA is enabled for the vGPU instance is that HVA pinned and an
>> HPA to GPA translation programmed into the GPU MMU.
>>
>> Another area of extension is how to expose a framebuffer to QEMU for
>> seamless integration into a SPICE/VNC channel. For this I believe we
>> could use a new region, much like we've done to expose VGA access
>> through a vfio device file descriptor. An area within this new
>> framebuffer region could be directly mappable in QEMU while a
>> non-mappable page, at a standard location with standardized format,
>> provides a description of framebuffer and potentially even a
>> communication channel to synchronize framebuffer captures. This would
>> be new code for QEMU, but something we could share among all vGPU
>> implementations.
>
> Now GVT-g already provides an interface to decode framebuffer information,
> w/ an assumption that the framebuffer will be further composited into
> OpenGL APIs. So the format is defined according to OpenGL definition.
> Does that meet SPICE requirement?
>
> Another thing to be added. Framebuffers are frequently switched in
> reality. So either Qemu needs to poll or a notification mechanism is required.
> And since it's dynamic, having framebuffer page directly exposed in the
> new region might be tricky. We can just expose framebuffer information
> (including base, format, etc.) and let Qemu to map separately out of VFIO
> interface.
>
> And... this works fine with vGPU model since software knows all the
> detail about framebuffer. However in pass-through case, who do you expect
> to provide that information? Is it OK to introduce vGPU specific APIs in
> VFIO?
>
>>
>> Another obvious area to be standardized would be how to discover,
>> create, and destroy vGPU instances. SR-IOV has a standard mechanism to
>> create VFs in sysfs and I would propose that vGPU vendors try to
>> standardize on similar interfaces to enable libvirt to easily discover
>> the vGPU capabilities of a given GPU and manage the lifecycle of a vGPU
>> instance.
>
> Now there is no standard. We expose vGPU life-cycle mgmt. APIs through
> sysfs (under i915 node), which is very Intel specific. In reality different
> vendors have quite different capabilities for their own vGPUs, so not sure
> how standard we can define such a mechanism. But this code should be
> minor to be maintained in libvirt.
>
>>
>> This is obviously a lot to digest, but I'd certainly be interested in
>> hearing feedback on this proposal as well as try to clarify anything
>> I've left out or misrepresented above. Another benefit to this
>> mechanism is that direct GPU assignment and vGPU assignment use the same
>> code within QEMU and same API to the kernel, which should make debugging
>> and code support between the two easier. I'd really like to start a
>> discussion around this proposal, and of course the first open source
>> implementation of this sort of model will really help to drive the
>> direction it takes. Thanks!
>>
>
> Thanks for starting this discussion. Intel will definitely work with
> community on this work. Based on earlier comments, I'm not sure
> whether we can exactly same code for direct GPU assignment and
> vGPU assignment, since even we extend VFIO some interfaces might
> be vGPU specific. Does this way still achieve your end goal?
>
> Thanks
> Kevin
>

2015-11-19 08:41:07

by Gerd Hoffmann

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Hi,

> > Another area of extension is how to expose a framebuffer to QEMU for
> > seamless integration into a SPICE/VNC channel. For this I believe we
> > could use a new region, much like we've done to expose VGA access
> > through a vfio device file descriptor. An area within this new
> > framebuffer region could be directly mappable in QEMU while a
> > non-mappable page, at a standard location with standardized format,
> > provides a description of framebuffer and potentially even a
> > communication channel to synchronize framebuffer captures. This would
> > be new code for QEMU, but something we could share among all vGPU
> > implementations.
>
> Now GVT-g already provides an interface to decode framebuffer information,
> w/ an assumption that the framebuffer will be further composited into
> OpenGL APIs.

Can I have a pointer to docs / code?

iGVT-g_Setup_Guide.txt mentions an "Indirect Display Mode", but doesn't
explain how the guest framebuffer can be accessed in that mode.

> So the format is defined according to OpenGL definition.
> Does that meet SPICE requirement?

Yes and no ;)

Some more background: We basically have two rendering paths in qemu.
The classic one, without opengl, and a new, still emerging one, using
opengl and dma-bufs (gtk support merged for qemu 2.5, sdl2 support will
land in 2.6, spice support still WIP, hopefully 2.6 too). For best
performance you probably want to use the new opengl-based rendering
whenever possible. However I do *not* expect the classic rendering path
to disappear; we'll continue to need it in various cases, the most
prominent one being vnc support.

So, for non-opengl rendering qemu needs the guest framebuffer data so it
can feed it into the vnc server. The vfio framebuffer region is meant
to support this use case.

> Another thing to be added. Framebuffers are frequently switched in
> reality. So either Qemu needs to poll or a notification mechanism is required.

The idea is to have qemu poll (and adapt the poll rate, i.e. without a
vnc client connected qemu will poll a lot less frequently).
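
That adaptive rate could be as simple as scaling the interval by client presence and recent screen activity; all constants below are illustrative, not QEMU's actual numbers:

```c
#include <assert.h>
#include <stdbool.h>

/* Pick the next framebuffer poll interval.  With no display client
 * attached there is little point polling often; with a client
 * attached, poll fast while the screen is changing and back off as
 * consecutive polls find no change. */
static int next_poll_ms(bool client_connected, int idle_polls)
{
    if (!client_connected)
        return 2000;                 /* near-idle background rate */
    if (idle_polls == 0)
        return 33;                   /* ~30 fps while screen changes */
    int ms = 33 * (1 + idle_polls);  /* back off linearly when idle */
    return ms > 500 ? 500 : ms;
}
```
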

> And since it's dynamic, having framebuffer page directly exposed in the
> new region might be tricky. We can just expose framebuffer information
> (including base, format, etc.) and let Qemu to map separately out of VFIO
> interface.

Allocate some memory, ask gpu to blit the guest framebuffer there, i.e.
provide a snapshot of the current guest display instead of playing
mapping tricks?

> And... this works fine with vGPU model since software knows all the
> detail about framebuffer. However in pass-through case, who do you expect
> to provide that information? Is it OK to introduce vGPU specific APIs in
> VFIO?

It will only be used in the vgpu case, not for pass-through.

We think it is better to extend the vfio interface to improve vgpu
support rather than inventing something new, since vfio can already
satisfy 90% of the vgpu needs. We want to avoid vendor-specific
extensions though; the vgpu extension should work across vendors.

> Now there is no standard. We expose vGPU life-cycle mgmt. APIs through
> sysfs (under i915 node), which is very Intel specific. In reality different
> vendors have quite different capabilities for their own vGPUs, so not sure
> how standard we can define such a mechanism.

Agree when it comes to create vGPU instances.

> But this code should be
> minor to be maintained in libvirt.

As far as I know, libvirt only needs to discover those devices. If they
look like sr/iov devices in sysfs, this might work without any changes
to libvirt.

cheers,
Gerd

2015-11-19 11:09:20

by Paolo Bonzini

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel



On 19/11/2015 09:40, Gerd Hoffmann wrote:
>> > But this code should be
>> > minor to be maintained in libvirt.
> As far I know libvirt only needs to discover those devices. If they
> look like sr/iov devices in sysfs this might work without any changes to
> libvirt.

I don't think they will look like SR/IOV devices.

The interface may look a little like the sysfs interface that GVT-g is
already using. However, it should at least be extended to support
multiple vGPUs in a single VM. This might not be possible for Intel
integrated graphics, but it should definitely be possible for discrete
graphics cards.

Another nit is that the VM id should probably be replaced by a UUID
(because it's too easy to stumble on an existing VM id), assuming a VM
id is needed at all.

Paolo

2015-11-19 15:32:27

by Stefano Stabellini

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Thu, 19 Nov 2015, Jike Song wrote:
> Hi Alex, thanks for the discussion.
>
> In addition to Kevin's replies, I have a high-level question: can VFIO
> be used by QEMU for both KVM and Xen?

No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
is owned by Xen.

2015-11-19 15:49:57

by Paolo Bonzini

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel



On 19/11/2015 16:32, Stefano Stabellini wrote:
> > In addition to Kevin's replies, I have a high-level question: can VFIO
> > be used by QEMU for both KVM and Xen?
>
> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> is owned by Xen.

I don't think QEMU command line compatibility between KVM and Xen should
be a design goal for GVT-g.

Nevertheless, it shouldn't be a problem to use a "virtual" VFIO (which
doesn't need the IOMMU, because it uses the MMU in the physical GPU)
even under Xen.

Paolo

2015-11-19 15:52:49

by Alex Williamson

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Thu, 2015-11-19 at 15:32 +0000, Stefano Stabellini wrote:
> On Thu, 19 Nov 2015, Jike Song wrote:
> > Hi Alex, thanks for the discussion.
> >
> > In addition to Kevin's replies, I have a high-level question: can VFIO
> > be used by QEMU for both KVM and Xen?
>
> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> is owned by Xen.

Right, but in this case we're talking about device MMUs, which are owned
by the device driver which I think is running in dom0, right? This
proposal doesn't require support of the system IOMMU, the dom0 driver
maps IOVA translations just as it would for itself. We're largely
proposing use of the VFIO API to provide a common interface to expose a
PCI(e) device to QEMU, but what happens in the vGPU vendor device and
IOMMU backends is specific to the device and perhaps even specific to
the hypervisor. Thanks,

Alex

2015-11-19 16:12:44

by Stefano Stabellini

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Thu, 19 Nov 2015, Paolo Bonzini wrote:
> On 19/11/2015 16:32, Stefano Stabellini wrote:
> > > In addition to Kevin's replies, I have a high-level question: can VFIO
> > > be used by QEMU for both KVM and Xen?
> >
> > No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> > is owned by Xen.
>
> I don't think QEMU command line compatibility between KVM and Xen should
> be a design goal for GVT-g.

Right, I agree.

In fact I don't want my comment to be taken as "VFIO should not be used
at all". I only meant to reply to the question. I think it is unlikely
to be the best path for Xen, but it could very well be the right answer
for KVM.


> Nevertheless, it shouldn't be a problem to use a "virtual" VFIO (which
> doesn't need the IOMMU, because it uses the MMU in the physical GPU)
> even under Xen.

That could be true, but I would expect some extra work to be needed to
make use of VFIO on Xen. Also it might cause some duplication of
functionality with the current Xen passthrough code base.

2015-11-19 20:02:41

by Alex Williamson

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Hi Kevin,

On Thu, 2015-11-19 at 04:06 +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:[email protected]]
> > Sent: Thursday, November 19, 2015 2:12 AM
> >
> > [cc +qemu-devel, +paolo, +gerd]
> >
> > On Tue, 2015-10-27 at 17:25 +0800, Jike Song wrote:
> > > Hi all,
> > >
> > > We are pleased to announce another update of Intel GVT-g for Xen.
> > >
> > > Intel GVT-g is a full GPU virtualization solution with mediated
> > > pass-through, starting from 4th generation Intel Core(TM) processors
> > > with Intel Graphics processors. A virtual GPU instance is maintained
> > > for each VM, with part of performance critical resources directly
> > > assigned. The capability of running native graphics driver inside a
> > > VM, without hypervisor intervention in performance critical paths,
> > > achieves a good balance among performance, feature, and sharing
> > > capability. Xen is currently supported on Intel Processor Graphics
> > > (a.k.a. XenGT); and the core logic can be easily ported to other
> > > hypervisors.
> > >
> > >
> > > Repositories
> > >
> > > Kernel: https://github.com/01org/igvtg-kernel (2015q3-3.18.0 branch)
> > > Xen: https://github.com/01org/igvtg-xen (2015q3-4.5 branch)
> > > Qemu: https://github.com/01org/igvtg-qemu (xengt_public2015q3 branch)
> > >
> > >
> > > This update consists of:
> > >
> > > - XenGT is now merged with KVMGT in unified repositories(kernel and qemu), but
> > currently
> > > different branches for qemu. XenGT and KVMGT share same iGVT-g core logic.
> >
> > Hi!
> >
> > At redhat we've been thinking about how to support vGPUs from multiple
> > vendors in a common way within QEMU. We want to enable code sharing
> > between vendors and give new vendors an easy path to add their own
> > support. We also have the complication that not all vGPU vendors are as
> > open source friendly as Intel, so being able to abstract the device
> > mediation and access outside of QEMU is a big advantage.
> >
> > The proposal I'd like to make is that a vGPU, whether it is from Intel
> > or another vendor, is predominantly a PCI(e) device. We have an
> > interface in QEMU already for exposing arbitrary PCI devices, vfio-pci.
> > Currently vfio-pci uses the VFIO API to interact with "physical" devices
> > and system IOMMUs. I highlight /physical/ there because some of these
> > physical devices are SR-IOV VFs, which is somewhat of a fuzzy concept,
> > somewhere between fixed hardware and a virtual device implemented in
> > software. That software just happens to be running on the physical
> > endpoint.
>
> Agree.
>
> One clarification for the rest of the discussion: we're talking about the GVT-g vGPU
> here, which is a pure software GPU virtualization technique. GVT-d (note
> some use in the text) refers to passing through the whole GPU or a specific
> VF. GVT-d already falls into the existing VFIO APIs nicely (though there is
> on-going effort to remove Intel-specific platform stickiness from the gfx driver). :-)
>
> >
> > vGPUs are similar, with the virtual device created at a different point,
> > host software. They also rely on different IOMMU constructs, making use
> > of the MMU capabilities of the GPU (GTTs and such), but really having
> > similar requirements.
>
> One important difference between the system IOMMU and the GPU MMU here:
> the system IOMMU is very much about translation from a DMA target
> (IOVA on native, or GPA in the virtualization case) to HPA. However, the GPU's
> internal MMU translates from a Graphics Memory Address (GMA)
> to a DMA target (HPA if the system IOMMU is disabled, or IOVA/GPA if the
> system IOMMU is enabled). GMA is an internal address space within the GPU, not
> exposed to Qemu and fully managed by the GVT-g device model. Since it's
> not a standard PCI-defined resource, we don't need to abstract this capability
> in the VFIO interface.
>
> >
> > The proposal is therefore that GPU vendors can expose vGPUs to
> > userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> > supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> > module (or extension of i915) can register as a vfio bus driver, create
> > a struct device per vGPU, create an IOMMU group for that device, and
> > register that device with the vfio-core. Since we don't rely on the
> > system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> > extension of the same module) can register a "type1" compliant IOMMU
> > driver into vfio-core. From the perspective of QEMU then, all of the
> > existing vfio-pci code is re-used, QEMU remains largely unaware of any
> > specifics of the vGPU being assigned, and the only necessary change so
> > far is how QEMU traverses sysfs to find the device and thus the IOMMU
> > group leading to the vfio group.
>
> GVT-g requires pinning guest memory and querying GPA->HPA information,
> upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> to (GMA->HPA). So yes, here a dummy or simple "type1" compliant IOMMU
> can be introduced just for this requirement.
>
> However there's one tricky point where I'm not sure whether the overall
> VFIO concept would be violated. GVT-g doesn't require the system IOMMU
> to function; however, the host system may enable the system IOMMU just for
> hardening purposes. This means two levels of translation exist (GMA->
> IOVA->HPA), so the dummy IOMMU driver has to request that the system IOMMU
> driver allocate IOVAs for VMs and then set up the IOVA->HPA mappings
> in the IOMMU page table. In this case, multiple VMs' translations are
> multiplexed in one IOMMU page table.
>
> We might need to create some group/sub-group or parent/child concepts
> among those IOMMUs for thorough permission control.

My thought here is that this is all abstracted through the vGPU IOMMU
and device vfio backends. It's the GPU driver itself, or some vfio
extension of that driver, mediating access to the device and deciding
when to configure GPU MMU mappings. That driver has access to the GPA
to HVA translations thanks to the type1 compliant IOMMU it implements
and can pin pages as needed to create GPA to HPA mappings. That should
give it all the pieces it needs to fully set up mappings for the vGPU.
Whether or not there's a system IOMMU is simply an exercise for that
driver. It needs to do a DMA mapping operation through the system IOMMU
the same for a vGPU as it would for itself, because they are in fact one
and the same. The GMA to IOVA mapping seems like an internal
detail. I assume the IOVA is some sort of GPA, and the GMA is managed
through mediation of the device.


> > There are a few areas where we know we'll need to extend the VFIO API to
> > make this work, but it seems like they can all be done generically. One
> > is that PCI BARs are described through the VFIO API as regions and each
> > region has a single flag describing whether mmap (ie. direct mapping) of
> > that region is possible. We expect that vGPUs likely need finer
> > granularity, enabling some areas within a BAR to be trapped and forwarded
> > as a read or write access for the vGPU-vfio-device module to emulate,
> > while other regions, like framebuffers or texture regions, are directly
> > mapped. I have prototype code to enable this already.
>
> Yes, in GVT-g one BAR resource might be partitioned among multiple vGPUs.
> If VFIO can support such partial resource assignment, it'd be great. A similar
> parent/child concept might also be required here, so any resource enumerated
> on a vGPU shouldn't break limitations enforced on the physical device.

To be clear, I'm talking about partitioning of the BAR exposed to the
guest. Partitioning of the physical BAR would be managed by the vGPU
vfio device driver. For instance when the guest mmap's a section of the
virtual BAR, the vGPU device driver would map that to a portion of the
physical device BAR.

> One unique requirement for GVT-g here, though, is that the vGPU device model
> needs to know the guest BAR configuration for proper emulation (e.g. to register
> an I/O emulation handler with KVM). Similar is the guest MSI vector for virtual
> interrupt injection. Not sure how this fits into the common VFIO model.
> Does VFIO allow vendor-specific extensions today?

As a vfio device driver all config accesses and interrupt configuration
would be forwarded to you, so I don't see this being a problem.

> >
> > Another area is that we really don't want to proliferate each vGPU
> > needing a new IOMMU type within vfio. The existing type1 IOMMU provides
> > potentially the most simple mapping and unmapping interface possible.
> > We'd therefore need to allow multiple "type1" IOMMU drivers for vfio,
> > making type1 be more of an interface specification rather than a single
> > implementation. This is a trivial change to make within vfio and one
> > that I believe is compatible with the existing API. Note that
> > implementing a type1-compliant vfio IOMMU does not imply pinning and
> > mapping every registered page. A vGPU, with mediated device access, may
> > use this only to track the current HVA to GPA mappings for a VM. Only
> > when a DMA is enabled for the vGPU instance is that HVA pinned and an
> > HPA to GPA translation programmed into the GPU MMU.
> >
> > Another area of extension is how to expose a framebuffer to QEMU for
> > seamless integration into a SPICE/VNC channel. For this I believe we
> > could use a new region, much like we've done to expose VGA access
> > through a vfio device file descriptor. An area within this new
> > framebuffer region could be directly mappable in QEMU while a
> > non-mappable page, at a standard location with standardized format,
> > provides a description of framebuffer and potentially even a
> > communication channel to synchronize framebuffer captures. This would
> > be new code for QEMU, but something we could share among all vGPU
> > implementations.
>
> Now GVT-g already provides an interface to decode framebuffer information,
> with an assumption that the framebuffer will be further composited into
> OpenGL APIs, so the format is defined according to the OpenGL definition.
> Does that meet the SPICE requirement?
>
> Another thing to be added: framebuffers are frequently switched in
> reality, so either Qemu needs to poll or a notification mechanism is required.
> And since it's dynamic, having the framebuffer page directly exposed in the
> new region might be tricky. We can just expose the framebuffer information
> (including base, format, etc.) and let Qemu map it separately outside of the
> VFIO interface.

Sure, we'll need to work out that interface, but it's also possible that
the framebuffer region is simply remapped to another area of the device
(ie. multiple interfaces mapping the same thing) by the vfio device
driver. Whether it's easier to do that or make the framebuffer region
reference another region is something we'll need to see.

> And... this works fine with vGPU model since software knows all the
> detail about framebuffer. However in pass-through case, who do you expect
> to provide that information? Is it OK to introduce vGPU specific APIs in
> VFIO?

Yes, vGPU may have additional features, like a framebuffer area, that
aren't present or optional for direct assignment. Obviously we support
direct assignment of GPUs for some vendors already without this feature.

> > Another obvious area to be standardized would be how to discover,
> > create, and destroy vGPU instances. SR-IOV has a standard mechanism to
> > create VFs in sysfs and I would propose that vGPU vendors try to
> > standardize on similar interfaces to enable libvirt to easily discover
> > the vGPU capabilities of a given GPU and manage the lifecycle of a vGPU
> > instance.
>
> Now there is no standard. We expose vGPU life-cycle mgmt. APIs through
> sysfs (under the i915 node), which is very Intel specific. In reality different
> vendors have quite different capabilities for their own vGPUs, so I'm not sure
> how standard a mechanism we can define. But the code to maintain in
> libvirt should be minor.

Every difference is a barrier. I imagine we can come up with some basic
interfaces that everyone could use, even if they don't allow fine tuning
every detail specific to a vendor.

> > This is obviously a lot to digest, but I'd certainly be interested in
> > hearing feedback on this proposal as well as try to clarify anything
> > I've left out or misrepresented above. Another benefit to this
> > mechanism is that direct GPU assignment and vGPU assignment use the same
> > code within QEMU and same API to the kernel, which should make debugging
> > and code support between the two easier. I'd really like to start a
> > discussion around this proposal, and of course the first open source
> > implementation of this sort of model will really help to drive the
> > direction it takes. Thanks!
> >
>
> Thanks for starting this discussion. Intel will definitely work with the
> community on this work. Based on earlier comments, I'm not sure
> whether we can use exactly the same code for direct GPU assignment and
> vGPU assignment, since even if we extend VFIO, some interfaces might
> be vGPU specific. Does this way still achieve your end goal?

The backends will certainly be different for vGPU vs direct assignment,
but hopefully the QEMU code is almost entirely reused, modulo some
features like framebuffers that are likely only to be seen on vGPU.
Thanks,

Alex

2015-11-20 02:47:42

by Jike Song

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On 11/19/2015 07:09 PM, Paolo Bonzini wrote:
> On 19/11/2015 09:40, Gerd Hoffmann wrote:
>>>> But the code to maintain
>>>> in libvirt should be minor.
>> As far as I know libvirt only needs to discover those devices. If they
>> look like SR-IOV devices in sysfs this might work without any changes to
>> libvirt.
>
> I don't think they will look like SR/IOV devices.
>
> The interface may look a little like the sysfs interface that GVT-g is
> already using. However, it should at least be extended to support
> multiple vGPUs in a single VM. This might not be possible for Intel
> integrated graphics, but it should definitely be possible for discrete
> graphics cards.

I hadn't heard about multiple vGPUs for a single VM before. Yes, if we
expect the same vGPU interfaces across vendors, the abstraction and the
vendor-specific parts should be implemented.


> Another nit is that the VM id should probably be replaced by a UUID
> (because it's too easy to stumble on an existing VM id), assuming a VM
> id is needed at all.

For the last assumption, yes, a VM id is not necessary for gvt-g; it's
only a temporary implementation.

As long as libvirt is used, a UUID should be enough for gvt-g. However,
a UUID is not mandatory, is it? What should we do if the user doesn't
specify a UUID on the QEMU command line?

>
> Paolo
>

--
Thanks,
Jike

2015-11-20 02:58:56

by Jike Song

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On 11/19/2015 11:52 PM, Alex Williamson wrote:
> On Thu, 2015-11-19 at 15:32 +0000, Stefano Stabellini wrote:
>> On Thu, 19 Nov 2015, Jike Song wrote:
>>> Hi Alex, thanks for the discussion.
>>>
>>> In addition to Kevin's replies, I have a high-level question: can VFIO
>>> be used by QEMU for both KVM and Xen?
>>
>> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
>> is owned by Xen.
>
> Right, but in this case we're talking about device MMUs, which are owned
> by the device driver which I think is running in dom0, right? This
> proposal doesn't require support of the system IOMMU, the dom0 driver
> maps IOVA translations just as it would for itself. We're largely
> proposing use of the VFIO API to provide a common interface to expose a
> PCI(e) device to QEMU, but what happens in the vGPU vendor device and
> IOMMU backends is specific to the device and perhaps even specific to
> the hypervisor. Thanks,

Let me summarize this, and please correct me if I've misread anything: the
vGPU interface between the kernel and QEMU will be through VFIO, with a new
VFIO backend (instead of the existing type1), for both KVMGT and XenGT?


>
> Alex
>

--
Thanks,
Jike

2015-11-20 04:22:56

by Alex Williamson

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Fri, 2015-11-20 at 10:58 +0800, Jike Song wrote:
> On 11/19/2015 11:52 PM, Alex Williamson wrote:
> > On Thu, 2015-11-19 at 15:32 +0000, Stefano Stabellini wrote:
> >> On Thu, 19 Nov 2015, Jike Song wrote:
> >>> Hi Alex, thanks for the discussion.
> >>>
> >>> In addition to Kevin's replies, I have a high-level question: can VFIO
> >>> be used by QEMU for both KVM and Xen?
> >>
> >> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> >> is owned by Xen.
> >
> > Right, but in this case we're talking about device MMUs, which are owned
> > by the device driver which I think is running in dom0, right? This
> > proposal doesn't require support of the system IOMMU, the dom0 driver
> > maps IOVA translations just as it would for itself. We're largely
> > proposing use of the VFIO API to provide a common interface to expose a
> > PCI(e) device to QEMU, but what happens in the vGPU vendor device and
> > IOMMU backends is specific to the device and perhaps even specific to
> > the hypervisor. Thanks,
>
> Let me summarize this, and please correct me if I've misread anything: the
> vGPU interface between the kernel and QEMU will be through VFIO, with a new
> VFIO backend (instead of the existing type1), for both KVMGT and XenGT?

My primary concern is KVM and QEMU upstream, the proposal is not
specifically directed at XenGT, but does not exclude it either. Xen is
welcome to adopt this proposal as well, it simply defines the channel
through which vGPUs are exposed to QEMU as the VFIO API. The core VFIO
code in the Linux kernel is just as available for use in Xen dom0 as it
is for a KVM host. VFIO in QEMU certainly knows about some
accelerations for KVM, but these are almost entirely around allowing
eventfd based interrupts to be injected through KVM, which is something
I'm sure Xen could provide as well. These accelerations are also not
required, VFIO based device assignment in QEMU works with or without
KVM. Likewise, the VFIO kernel interface knows nothing about KVM and
has no dependencies on it.

There are two components to the VFIO API, one is the type1 compliant
IOMMU interface, which for this proposal is really doing nothing more
than tracking the HVA to GPA mappings for the VM. This much seems
entirely common regardless of the hypervisor. The other part is the
device interface. The lifecycle of the virtual device seems like it
would be entirely shared, as does much of the emulation components of
the device. When we get to pinning pages, providing direct access to
memory ranges for a VM, and accelerating interrupts, the vGPU drivers
will likely need some per hypervisor branches, but these are areas where
that's true no matter what the interface. I'm probably over
simplifying, but hopefully not too much, correct me if I'm wrong.

The benefit of course is that aside from some extensions to the API, the
QEMU components are already in place and there's a lot more leverage for
getting both QEMU and libvirt support upstream in being able to support
multiple vendors, perhaps multiple hypervisors, with the same code.
Also, I'm not sure how useful it is, but VFIO is a userspace driver
interface, where here we're predominantly talking about that userspace
driver being QEMU. It's not limited to that though. A userspace
compute application could have direct access to a vGPU through this
model. Thanks,

Alex

2015-11-20 05:52:19

by Jike Song

Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On 11/20/2015 12:22 PM, Alex Williamson wrote:
> On Fri, 2015-11-20 at 10:58 +0800, Jike Song wrote:
>> On 11/19/2015 11:52 PM, Alex Williamson wrote:
>>> On Thu, 2015-11-19 at 15:32 +0000, Stefano Stabellini wrote:
>>>> On Thu, 19 Nov 2015, Jike Song wrote:
>>>>> Hi Alex, thanks for the discussion.
>>>>>
>>>>> In addition to Kevin's replies, I have a high-level question: can VFIO
>>>>> be used by QEMU for both KVM and Xen?
>>>>
>>>> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
>>>> is owned by Xen.
>>>
>>> Right, but in this case we're talking about device MMUs, which are owned
>>> by the device driver which I think is running in dom0, right? This
>>> proposal doesn't require support of the system IOMMU, the dom0 driver
>>> maps IOVA translations just as it would for itself. We're largely
>>> proposing use of the VFIO API to provide a common interface to expose a
>>> PCI(e) device to QEMU, but what happens in the vGPU vendor device and
>>> IOMMU backends is specific to the device and perhaps even specific to
>>> the hypervisor. Thanks,
>>
>> Let me summarize this, and please correct me if I've misread anything: the
>> vGPU interface between the kernel and QEMU will be through VFIO, with a new
>> VFIO backend (instead of the existing type1), for both KVMGT and XenGT?
>
> My primary concern is KVM and QEMU upstream, the proposal is not
> specifically directed at XenGT, but does not exclude it either. Xen is
> welcome to adopt this proposal as well, it simply defines the channel
> through which vGPUs are exposed to QEMU as the VFIO API. The core VFIO
> code in the Linux kernel is just as available for use in Xen dom0 as it
> is for a KVM host. VFIO in QEMU certainly knows about some
> accelerations for KVM, but these are almost entirely around allowing
> eventfd based interrupts to be injected through KVM, which is something
> I'm sure Xen could provide as well. These accelerations are also not
> required, VFIO based device assignment in QEMU works with or without
> KVM. Likewise, the VFIO kernel interface knows nothing about KVM and
> has no dependencies on it.
>
> There are two components to the VFIO API, one is the type1 compliant
> IOMMU interface, which for this proposal is really doing nothing more
> than tracking the HVA to GPA mappings for the VM. This much seems
> entirely common regardless of the hypervisor. The other part is the
> device interface. The lifecycle of the virtual device seems like it
> would be entirely shared, as does much of the emulation components of
> the device. When we get to pinning pages, providing direct access to
> memory ranges for a VM, and accelerating interrupts, the vGPU drivers
> will likely need some per hypervisor branches, but these are areas where
> that's true no matter what the interface. I'm probably over
> simplifying, but hopefully not too much, correct me if I'm wrong.
>

Thanks for the confirmation. For QEMU/KVM, I totally agree with your point;
however, if we take XenGT into consideration, it's a bit more complex: with
the Xen hypervisor and the Dom0 kernel running at different levels, it's not
straightforward for QEMU to do something like mapping a portion of an MMIO
BAR via VFIO in the Dom0 kernel, instead of calling hypercalls directly.

I don't know if there is a better way to handle this. But I do agree that
a channel between the kernel and Qemu via VFIO is a good idea, even though we
may have to split KVMGT/XenGT in Qemu a bit. We are currently working on
moving all of the PCI CFG emulation from the kernel to Qemu; hopefully we can
release it by the end of this year and work with you guys to adjust it to
the agreed method.


> The benefit of course is that aside from some extensions to the API, the
> QEMU components are already in place and there's a lot more leverage for
> getting both QEMU and libvirt support upstream in being able to support
> multiple vendors, perhaps multiple hypervisors, with the same code.
> Also, I'm not sure how useful it is, but VFIO is a userspace driver
> interface, where here we're predominantly talking about that userspace
> driver being QEMU. It's not limited to that though. A userspace
> compute application could have direct access to a vGPU through this
> model. Thanks,


>
> Alex
>
--
Thanks,
Jike

2015-11-20 06:01:59

by Tian, Kevin

Subject: RE: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

> From: Song, Jike
> Sent: Friday, November 20, 2015 1:52 PM
>
> On 11/20/2015 12:22 PM, Alex Williamson wrote:
> > On Fri, 2015-11-20 at 10:58 +0800, Jike Song wrote:
> >> On 11/19/2015 11:52 PM, Alex Williamson wrote:
> >>> On Thu, 2015-11-19 at 15:32 +0000, Stefano Stabellini wrote:
> >>>> On Thu, 19 Nov 2015, Jike Song wrote:
> >>>>> Hi Alex, thanks for the discussion.
> >>>>>
> >>>>> In addition to Kevin's replies, I have a high-level question: can VFIO
> >>>>> be used by QEMU for both KVM and Xen?
> >>>>
> >>>> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> >>>> is owned by Xen.
> >>>
> >>> Right, but in this case we're talking about device MMUs, which are owned
> >>> by the device driver which I think is running in dom0, right? This
> >>> proposal doesn't require support of the system IOMMU, the dom0 driver
> >>> maps IOVA translations just as it would for itself. We're largely
> >>> proposing use of the VFIO API to provide a common interface to expose a
> >>> PCI(e) device to QEMU, but what happens in the vGPU vendor device and
> >>> IOMMU backends is specific to the device and perhaps even specific to
> >>> the hypervisor. Thanks,

As I commented in another thread, let's not include the device MMU in this
discussion; it is purely device internal, so not in the scope of VFIO (Qemu
doesn't need to know about it). Let's keep the discussion about a dummy type1
IOMMU driver for maintaining the G2H (guest-to-host) mapping.

> >>
> >> Let me conclude this, and please correct me in case of any misread: the
> >> vGPU interface between kernel and QEMU will be through VFIO, with a new
> >> VFIO backend (instead of the existing type1), for both KVMGT and XenGT?
> >
> > My primary concern is KVM and QEMU upstream, the proposal is not
> > specifically directed at XenGT, but does not exclude it either. Xen is
> > welcome to adopt this proposal as well, it simply defines the channel
> > through which vGPUs are exposed to QEMU as the VFIO API. The core VFIO
> > code in the Linux kernel is just as available for use in Xen dom0 as it
> > is for a KVM host. VFIO in QEMU certainly knows about some
> > accelerations for KVM, but these are almost entirely around allowing
> > eventfd based interrupts to be injected through KVM, which is something
> > I'm sure Xen could provide as well. These accelerations are also not
> > required, VFIO based device assignment in QEMU works with or without
> > KVM. Likewise, the VFIO kernel interface knows nothing about KVM and
> > has no dependencies on it.
> >
> > There are two components to the VFIO API, one is the type1 compliant
> > IOMMU interface, which for this proposal is really doing nothing more
> > than tracking the HVA to GPA mappings for the VM. This much seems
> > entirely common regardless of the hypervisor. The other part is the
> > device interface. The lifecycle of the virtual device seems like it
> > would be entirely shared, as does much of the emulation components of
> > the device. When we get to pinning pages, providing direct access to
> > memory ranges for a VM, and accelerating interrupts, the vGPU drivers
> > will likely need some per hypervisor branches, but these are areas where
> > that's true no matter what the interface. I'm probably over
> > simplifying, but hopefully not too much, correct me if I'm wrong.
> >
>
> Thanks for the confirmation. For QEMU/KVM, I totally agree with your point;
> however, if we take XenGT into consideration, it's a bit more complex: with
> the Xen hypervisor and the Dom0 kernel running at different levels, it's not
> straightforward for QEMU to do something like mapping a portion of an MMIO
> BAR via VFIO in the Dom0 kernel, instead of calling hypercalls directly.
>
> I don't know if there is a better way to handle this. But I do agree that
> a channel between the kernel and Qemu via VFIO is a good idea, even though we
> may have to split KVMGT/XenGT in Qemu a bit. We are currently working on
> moving all of the PCI CFG emulation from the kernel to Qemu; hopefully we can
> release it by the end of this year and work with you guys to adjust it to
> the agreed method.

The pass-through path in QEMU is already different between Xen and KVM.
For now, let's keep things simple and focus on how to extend VFIO to manage
vGPUs. If Xen decides to use VFIO in the future, it should not be that
difficult to add a Xen-specific vfio driver there.

>
>
> > The benefit of course is that aside from some extensions to the API, the
> > QEMU components are already in place and there's a lot more leverage for
> > getting both QEMU and libvirt support upstream in being able to support
> > multiple vendors, perhaps multiple hypervisors, with the same code.
> > Also, I'm not sure how useful it is, but VFIO is a userspace driver
> > interface, where here we're predominantly talking about that userspace
> > driver being QEMU. It's not limited to that though. A userspace
> > compute application could have direct access to a vGPU through this
> > model. Thanks,
>

One idea we have in mind is to extend vGPU to native applications, e.g. to
provide better isolation among containers for GPU workloads.

Thanks
Kevin

2015-11-20 06:12:56

by Tian, Kevin

[permalink] [raw]
Subject: RE: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

> From: Gerd Hoffmann [mailto:[email protected]]
> Sent: Thursday, November 19, 2015 4:41 PM
>
> Hi,
>
> > > Another area of extension is how to expose a framebuffer to QEMU for
> > > seamless integration into a SPICE/VNC channel. For this I believe we
> > > could use a new region, much like we've done to expose VGA access
> > > through a vfio device file descriptor. An area within this new
> > > framebuffer region could be directly mappable in QEMU while a
> > > non-mappable page, at a standard location with standardized format,
> > > provides a description of framebuffer and potentially even a
> > > communication channel to synchronize framebuffer captures. This would
> > > be new code for QEMU, but something we could share among all vGPU
> > > implementations.
> >
> > Now GVT-g already provides an interface to decode framebuffer information,
> > w/ an assumption that the framebuffer will be further composited into
> > OpenGL APIs.
>
> Can I have a pointer to docs / code?
>
> iGVT-g_Setup_Guide.txt mentions a "Indirect Display Mode", but doesn't
> explain how the guest framebuffer can be accessed then.

You can check "fb_decoder.h". One thing to clarify: its format is
actually based on the drm definitions, not OpenGL. Sorry for the
confusion.

>
> > So the format is defined according to OpenGL definition.
> > Does that meet SPICE requirement?
>
> Yes and no ;)
>
> Some more background: We basically have two rendering paths in qemu.
> The classic one, without opengl, and a new, still emerging one, using
> opengl and dma-bufs (gtk support merged for qemu 2.5, sdl2 support will
> land in 2.6, spice support still WIP, hopefully 2.6 too). For best
> performance you probably want use the new opengl-based rendering
> whenever possible. However I do *not* expect the classic rendering path
> disappear, we'll continue to need that in various cases, most prominent
> one being vnc support.
>
> So, for non-opengl rendering qemu needs the guest framebuffer data so it
> can feed it into the vnc server. The vfio framebuffer region is meant
> to support this use case.

What's the format requirement on that framebuffer? If you are familiar
with Intel Graphics, there's a so-called tiling feature applied to the
framebuffer, so it can't be used as raw input to the VNC server. Without
OpenGL you need to do some conversion on the CPU first.
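For reference, a minimal sketch of what that CPU-side conversion looks like, assuming the simple X-tiling layout (4 KiB tiles of 512 bytes x 8 rows) and ignoring hardware swizzling. The tile geometry follows Intel's public documentation, but the code itself is illustrative, not the driver's:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* X-tiling stores the surface as 4 KiB tiles of 512 bytes x 8 rows, so a
 * linear byte (x, y) lives at a swizzled offset inside the tiled buffer. */
#define TILE_W 512 /* bytes per tile row */
#define TILE_H 8   /* rows per tile */

static size_t xtile_offset(size_t x, size_t y, size_t pitch)
{
    size_t tiles_per_row = pitch / TILE_W;
    size_t tile = (y / TILE_H) * tiles_per_row + x / TILE_W;
    return tile * (TILE_W * TILE_H) + (y % TILE_H) * TILE_W + x % TILE_W;
}

/* CPU de-tiling pass: copy a tiled framebuffer into a linear one that a
 * VNC server can consume directly. */
static void detile(uint8_t *linear, const uint8_t *tiled,
                   size_t pitch, size_t height)
{
    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < pitch; x++)
            linear[y * pitch + x] = tiled[xtile_offset(x, y, pitch)];
}
```

With OpenGL available, the same reordering is done by the GPU's texture sampler, which is why the blit/glReadPixels path discussed below avoids the CPU cost.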

>
> > Another thing to be added. Framebuffers are frequently switched in
> > reality. So either Qemu needs to poll or a notification mechanism is required.
>
> The idea is to have qemu poll (and adapt poll rate, i.e. without vnc
> client connected qemu will poll alot less frequently).
>
> > And since it's dynamic, having framebuffer page directly exposed in the
> > new region might be tricky. We can just expose framebuffer information
> > (including base, format, etc.) and let Qemu to map separately out of VFIO
> > interface.
>
> Allocate some memory, ask gpu to blit the guest framebuffer there, i.e.
> provide a snapshot of the current guest display instead of playing
> mapping tricks?

Yes, that works, but it's better done at the user level.

>
> > And... this works fine with vGPU model since software knows all the
> > detail about framebuffer. However in pass-through case, who do you expect
> > to provide that information? Is it OK to introduce vGPU specific APIs in
> > VFIO?
>
> It will only be used in the vgpu case, not for pass-though.
>
> We think it is better to extend the vfio interface to improve vgpu
> support rather than inventing something new while vfio can satisfy 90%
> of the vgpu needs already. We want avoid vendor-specific extensions
> though, the vgpu extension should work across vendors.

It's fine, as long as a vgpu-specific interface is allowed. :-)

>
> > Now there is no standard. We expose vGPU life-cycle mgmt. APIs through
> > sysfs (under i915 node), which is very Intel specific. In reality different
> > vendors have quite different capabilities for their own vGPUs, so not sure
> > how standard we can define such a mechanism.
>
> Agree when it comes to create vGPU instances.
>
> > But this code should be
> > minor to be maintained in libvirt.
>
> As far I know libvirt only needs to discover those devices. If they
> look like sr/iov devices in sysfs this might work without any changes to
> libvirt.
>
> cheers,
> Gerd
>


2015-11-20 07:09:48

by Tian, Kevin

[permalink] [raw]
Subject: RE: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

> From: Alex Williamson [mailto:[email protected]]
> Sent: Friday, November 20, 2015 4:03 AM
>
> > >
> > > The proposal is therefore that GPU vendors can expose vGPUs to
> > > userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> > > supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> > > module (or extension of i915) can register as a vfio bus driver, create
> > > a struct device per vGPU, create an IOMMU group for that device, and
> > > register that device with the vfio-core. Since we don't rely on the
> > > system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> > > extension of the same module) can register a "type1" compliant IOMMU
> > > driver into vfio-core. From the perspective of QEMU then, all of the
> > > existing vfio-pci code is re-used, QEMU remains largely unaware of any
> > > specifics of the vGPU being assigned, and the only necessary change so
> > > far is how QEMU traverses sysfs to find the device and thus the IOMMU
> > > group leading to the vfio group.
> >
> > GVT-g requires to pin guest memory and query GPA->HPA information,
> > upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> > to (GMA->HPA). So yes, here a dummy or simple "type1" compliant IOMMU
> > can be introduced just for this requirement.
> >
> > However there's one tricky point which I'm not sure whether overall
> > VFIO concept will be violated. GVT-g doesn't require system IOMMU
> > to function, however host system may enable system IOMMU just for
> > hardening purpose. This means two-level translations existing (GMA->
> > IOVA->HPA), so the dummy IOMMU driver has to request system IOMMU
> > driver to allocate IOVA for VMs and then setup IOVA->HPA mapping
> > in IOMMU page table. In this case, multiple VM's translations are
> > multiplexed in one IOMMU page table.
> >
> > We might need create some group/sub-group or parent/child concepts
> > among those IOMMUs for thorough permission control.
>
> My thought here is that this is all abstracted through the vGPU IOMMU
> and device vfio backends. It's the GPU driver itself, or some vfio
> extension of that driver, mediating access to the device and deciding
> when to configure GPU MMU mappings. That driver has access to the GPA
> to HVA translations thanks to the type1 complaint IOMMU it implements
> and can pin pages as needed to create GPA to HPA mappings. That should
> give it all the pieces it needs to fully setup mappings for the vGPU.
> Whether or not there's a system IOMMU is simply an exercise for that
> driver. It needs to do a DMA mapping operation through the system IOMMU
> the same for a vGPU as if it was doing it for itself, because they are
> in fact one in the same. The GMA to IOVA mapping seems like an internal
> detail. I assume the IOVA is some sort of GPA, and the GMA is managed
> through mediation of the device.

Sorry, I'm not familiar with VFIO internals. My original worry is that the
system IOMMU for the GPU may already be claimed by another vfio driver (e.g.
the host kernel wants to harden the gfx driver against the rest of the
sub-systems, regardless of whether a vGPU is created or not). In that case
the vGPU IOMMU driver shouldn't manage the system IOMMU directly.

BTW, I'm curious how VFIO coordinates with the system IOMMU driver today
regarding whether an IOMMU is used to control device assignment or used for
kernel hardening. The two somehow conflict, since different address spaces
are concerned (GPA vs. IOVA)...

>
>
> > > There are a few areas where we know we'll need to extend the VFIO API to
> > > make this work, but it seems like they can all be done generically. One
> > > is that PCI BARs are described through the VFIO API as regions and each
> > > region has a single flag describing whether mmap (ie. direct mapping) of
> > > that region is possible. We expect that vGPUs likely need finer
> > > granularity, enabling some areas within a BAR to be trapped and fowarded
> > > as a read or write access for the vGPU-vfio-device module to emulate,
> > > while other regions, like framebuffers or texture regions, are directly
> > > mapped. I have prototype code to enable this already.
> >
> > Yes in GVT-g one BAR resource might be partitioned among multiple vGPUs.
> > If VFIO can support such partial resource assignment, it'd be great. Similar
> > parent/child concept might also be required here, so any resource enumerated
> > on a vGPU shouldn't break limitations enforced on the physical device.
>
> To be clear, I'm talking about partitioning of the BAR exposed to the
> guest. Partitioning of the physical BAR would be managed by the vGPU
> vfio device driver. For instance when the guest mmap's a section of the
> virtual BAR, the vGPU device driver would map that to a portion of the
> physical device BAR.
>
> > One unique requirement for GVT-g here, though, is that vGPU device model
> > need to know guest BAR configuration for proper emulation (e.g. register
> > IO emulation handler to KVM). Similar is about guest MSI vector for virtual
> > interrupt injection. Not sure how this can be fit into common VFIO model.
> > Does VFIO allow vendor specific extension today?
>
> As a vfio device driver all config accesses and interrupt configuration
> would be forwarded to you, so I don't see this being a problem.

Sure, nice to know that.

Thanks
Kevin

2015-11-20 08:26:18

by Gerd Hoffmann

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Hi,

> > iGVT-g_Setup_Guide.txt mentions a "Indirect Display Mode", but doesn't
> > explain how the guest framebuffer can be accessed then.
>
> You can check "fb_decoder.h". One thing to clarify. Its format is
> actually based on drm definition, instead of OpenGL. Sorry for
> that.

drm is fine. That header explains the format, but not how it can be
accessed. Is the guest fb exported as dma-buf?

> > So, for non-opengl rendering qemu needs the guest framebuffer data so it
> > can feed it into the vnc server. The vfio framebuffer region is meant
> > to support this use case.
>
> what's the format requirement on that framebuffer? If you are familiar
> with Intel Graphics, there's a so-called tiling feature applied on frame
> buffer so it can't be used as a raw input to vnc server. w/o opengl you
> need do some conversion on CPU first.

Yes, that conversion needs to happen; qemu can't deal with tiled
graphics. Anything which pixman can handle will work. Preferred would
be PIXMAN_x8r8g8b8 (aka DRM_FORMAT_XRGB8888 on a little endian host), which
is the format used by the vnc server (and other places in qemu)
internally.
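As a concrete illustration of that format (my sketch, not qemu code): DRM_FORMAT_XRGB8888 is defined as a little-endian 32-bit word per pixel, so packing a pixel looks like this:

```c
#include <assert.h>
#include <stdint.h>

/* DRM_FORMAT_XRGB8888 / PIXMAN_x8r8g8b8: one little-endian 32-bit word
 * per pixel, bits 31..24 unused (X), then R (23..16), G (15..8), B (7..0).
 * In memory on a little-endian host that is the byte order B, G, R, X. */
static uint32_t pack_xrgb8888(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}
```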

qemu can also use the opengl texture for the guest fb, then fetch the
data with glReadPixels(), which will probably do exactly the same
conversion. But it'll add an opengl dependency to the non-opengl
rendering path in qemu; it would be nice if we can avoid that.

While being at it: When importing a dma-buf with a tiled framebuffer
into opengl (via eglCreateImageKHR + EGL_LINUX_DMA_BUF_EXT) I suspect we
have to pass in the tile size as attribute to make it work. Is that
correct?

cheers,
Gerd

2015-11-20 08:36:52

by Tian, Kevin

[permalink] [raw]
Subject: RE: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

> From: Gerd Hoffmann [mailto:[email protected]]
> Sent: Friday, November 20, 2015 4:26 PM
>
> Hi,
>
> > > iGVT-g_Setup_Guide.txt mentions a "Indirect Display Mode", but doesn't
> > > explain how the guest framebuffer can be accessed then.
> >
> > You can check "fb_decoder.h". One thing to clarify. Its format is
> > actually based on drm definition, instead of OpenGL. Sorry for
> > that.
>
> drm is fine. That header explains the format, but not how it can be
> accessed. Is the guest fb exported as dma-buf?

Currently not, but per our previous discussion we should move to
dma-buf. We have some demo code in user space; I'm not sure whether
it's public yet. Jike, could you help check?

>
> > > So, for non-opengl rendering qemu needs the guest framebuffer data so it
> > > can feed it into the vnc server. The vfio framebuffer region is meant
> > > to support this use case.
> >
> > what's the format requirement on that framebuffer? If you are familiar
> > with Intel Graphics, there's a so-called tiling feature applied on frame
> > buffer so it can't be used as a raw input to vnc server. w/o opengl you
> > need do some conversion on CPU first.
>
> Yes, that conversion needs to happen, qemu can't deal with tiled
> graphics. Anything which pixman can handle will work. Prefered would
> be PIXMAN_x8r8g8b8 (aka DRM_FORMAT_XRGB8888 on little endian host) which
> is the format used by the vnc server (and other places in qemu)
> internally.
>
> qemu can also use the opengl texture for the guest fb, then fetch the
> data with glReadPixels(). Which will probably do exactly the same
> conversion. But it'll add a opengl dependency to the non-opengl
> rendering path in qemu, would be nice if we can avoid that.
>
> While being at it: When importing a dma-buf with a tiled framebuffer
> into opengl (via eglCreateImageKHR + EGL_LINUX_DMA_BUF_EXT) I suspect we
> have to pass in the tile size as attribute to make it work. Is that
> correct?
>

I'd guess so, but we need to double-confirm later when we reach that level
of detail. Some homework on dma-buf is required first. :-)

Thanks
Kevin

2015-11-20 08:55:19

by Lv, Zhiyuan

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Fri, Nov 20, 2015 at 04:36:15PM +0800, Tian, Kevin wrote:
> > From: Gerd Hoffmann [mailto:[email protected]]
> > Sent: Friday, November 20, 2015 4:26 PM
> >
> > Hi,
> >
> > > > iGVT-g_Setup_Guide.txt mentions a "Indirect Display Mode", but doesn't
> > > > explain how the guest framebuffer can be accessed then.
> > >
> > > You can check "fb_decoder.h". One thing to clarify. Its format is
> > > actually based on drm definition, instead of OpenGL. Sorry for
> > > that.
> >
> > drm is fine. That header explains the format, but not how it can be
> > accessed. Is the guest fb exported as dma-buf?
>
> Currently not, but per our previous discussion we should move to use
> dma-buf. We have some demo code in user space. Not sure whether
> they're public now. Jike could you help do a check?

Our current implementation does not use dma-buf yet; it is still based on the
DRM flink interface. We will switch to dma-buf. Thanks!

Regards,
-Zhiyuan

>
> >
> > > > So, for non-opengl rendering qemu needs the guest framebuffer data so it
> > > > can feed it into the vnc server. The vfio framebuffer region is meant
> > > > to support this use case.
> > >
> > > what's the format requirement on that framebuffer? If you are familiar
> > > with Intel Graphics, there's a so-called tiling feature applied on frame
> > > buffer so it can't be used as a raw input to vnc server. w/o opengl you
> > > need do some conversion on CPU first.
> >
> > Yes, that conversion needs to happen, qemu can't deal with tiled
> > graphics. Anything which pixman can handle will work. Prefered would
> > be PIXMAN_x8r8g8b8 (aka DRM_FORMAT_XRGB8888 on little endian host) which
> > is the format used by the vnc server (and other places in qemu)
> > internally.
> >
> > qemu can also use the opengl texture for the guest fb, then fetch the
> > data with glReadPixels(). Which will probably do exactly the same
> > conversion. But it'll add a opengl dependency to the non-opengl
> > rendering path in qemu, would be nice if we can avoid that.
> >
> > While being at it: When importing a dma-buf with a tiled framebuffer
> > into opengl (via eglCreateImageKHR + EGL_LINUX_DMA_BUF_EXT) I suspect we
> > have to pass in the tile size as attribute to make it work. Is that
> > correct?
> >
>
> I'd guess so, but need double confirm later when reaching that level of detail.
> some homework on dma-buf is required first. :-)
>
> Thanks
> Kevin

2015-11-20 16:40:45

by Alex Williamson

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Fri, 2015-11-20 at 13:51 +0800, Jike Song wrote:
> On 11/20/2015 12:22 PM, Alex Williamson wrote:
> > On Fri, 2015-11-20 at 10:58 +0800, Jike Song wrote:
> >> On 11/19/2015 11:52 PM, Alex Williamson wrote:
> >>> On Thu, 2015-11-19 at 15:32 +0000, Stefano Stabellini wrote:
> >>>> On Thu, 19 Nov 2015, Jike Song wrote:
> >>>>> Hi Alex, thanks for the discussion.
> >>>>>
> >>>>> In addition to Kevin's replies, I have a high-level question: can VFIO
> >>>>> be used by QEMU for both KVM and Xen?
> >>>>
> >>>> No. VFIO cannot be used with Xen today. When running on Xen, the IOMMU
> >>>> is owned by Xen.
> >>>
> >>> Right, but in this case we're talking about device MMUs, which are owned
> >>> by the device driver which I think is running in dom0, right? This
> >>> proposal doesn't require support of the system IOMMU, the dom0 driver
> >>> maps IOVA translations just as it would for itself. We're largely
> >>> proposing use of the VFIO API to provide a common interface to expose a
> >>> PCI(e) device to QEMU, but what happens in the vGPU vendor device and
> >>> IOMMU backends is specific to the device and perhaps even specific to
> >>> the hypervisor. Thanks,
> >>
> >> Let me conclude this, and please correct me in case of any misread: the
> >> vGPU interface between kernel and QEMU will be through VFIO, with a new
> >> VFIO backend (instead of the existing type1), for both KVMGT and XenGT?
> >
> > My primary concern is KVM and QEMU upstream, the proposal is not
> > specifically directed at XenGT, but does not exclude it either. Xen is
> > welcome to adopt this proposal as well, it simply defines the channel
> > through which vGPUs are exposed to QEMU as the VFIO API. The core VFIO
> > code in the Linux kernel is just as available for use in Xen dom0 as it
> > is for a KVM host. VFIO in QEMU certainly knows about some
> > accelerations for KVM, but these are almost entirely around allowing
> > eventfd based interrupts to be injected through KVM, which is something
> > I'm sure Xen could provide as well. These accelerations are also not
> > required, VFIO based device assignment in QEMU works with or without
> > KVM. Likewise, the VFIO kernel interface knows nothing about KVM and
> > has no dependencies on it.
> >
> > There are two components to the VFIO API, one is the type1 compliant
> > IOMMU interface, which for this proposal is really doing nothing more
> > than tracking the HVA to GPA mappings for the VM. This much seems
> > entirely common regardless of the hypervisor. The other part is the
> > device interface. The lifecycle of the virtual device seems like it
> > would be entirely shared, as does much of the emulation components of
> > the device. When we get to pinning pages, providing direct access to
> > memory ranges for a VM, and accelerating interrupts, the vGPU drivers
> > will likely need some per hypervisor branches, but these are areas where
> > that's true no matter what the interface. I'm probably over
> > simplifying, but hopefully not too much, correct me if I'm wrong.
> >
>
> Thanks for confirmation. For QEMU/KVM, I totally agree your point; However,
> if we take XenGT to consider, it will be a bit more complex: with Xen
> hypervisor and Dom0 kernel running in different level, it's not a straight-
> forward way for QEMU to do something like mapping a portion of MMIO BAR
> via VFIO in Dom0 kernel, instead of calling hypercalls directly.

This would need to be part of the support added for Xen. To directly
map a device MMIO space to the VM, VFIO provides an mmap; QEMU registers
that mmap with KVM, or Xen. It's all just MemoryRegions in QEMU.
Perhaps it's even already supported by Xen.

> I don't know if there is a better way to handle this. But I do agree that
> channels between kernel and Qemu via VFIO is a good idea, even though we
> may have to split KVMGT/XenGT in Qemu a bit. We are currently working on
> moving all of PCI CFG emulation from kernel to Qemu, hopefully we can
> release it by end of this year and work with you guys to adjust it for
> the agreed method.

Well, moving PCI config space emulation from kernel to QEMU is exactly
the wrong direction to take for this proposal. Config space access to
the vGPU would occur through the VFIO API. So if you already have
config space emulation in the kernel, that's already one less piece of
work for a VFIO model, it just needs to be "wired up" through the VFIO
API. Thanks,

Alex

2015-11-20 17:03:08

by Alex Williamson

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Fri, 2015-11-20 at 07:09 +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:[email protected]]
> > Sent: Friday, November 20, 2015 4:03 AM
> >
> > > >
> > > > The proposal is therefore that GPU vendors can expose vGPUs to
> > > > userspace, and thus to QEMU, using the VFIO API. For instance, vfio
> > > > supports modular bus drivers and IOMMU drivers. An intel-vfio-gvt-d
> > > > module (or extension of i915) can register as a vfio bus driver, create
> > > > a struct device per vGPU, create an IOMMU group for that device, and
> > > > register that device with the vfio-core. Since we don't rely on the
> > > > system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
> > > > extension of the same module) can register a "type1" compliant IOMMU
> > > > driver into vfio-core. From the perspective of QEMU then, all of the
> > > > existing vfio-pci code is re-used, QEMU remains largely unaware of any
> > > > specifics of the vGPU being assigned, and the only necessary change so
> > > > far is how QEMU traverses sysfs to find the device and thus the IOMMU
> > > > group leading to the vfio group.
> > >
> > > GVT-g requires to pin guest memory and query GPA->HPA information,
> > > upon which shadow GTTs will be updated accordingly from (GMA->GPA)
> > > to (GMA->HPA). So yes, here a dummy or simple "type1" compliant IOMMU
> > > can be introduced just for this requirement.
> > >
> > > However there's one tricky point which I'm not sure whether overall
> > > VFIO concept will be violated. GVT-g doesn't require system IOMMU
> > > to function, however host system may enable system IOMMU just for
> > > hardening purpose. This means two-level translations existing (GMA->
> > > IOVA->HPA), so the dummy IOMMU driver has to request system IOMMU
> > > driver to allocate IOVA for VMs and then setup IOVA->HPA mapping
> > > in IOMMU page table. In this case, multiple VM's translations are
> > > multiplexed in one IOMMU page table.
> > >
> > > We might need create some group/sub-group or parent/child concepts
> > > among those IOMMUs for thorough permission control.
> >
> > My thought here is that this is all abstracted through the vGPU IOMMU
> > and device vfio backends. It's the GPU driver itself, or some vfio
> > extension of that driver, mediating access to the device and deciding
> > when to configure GPU MMU mappings. That driver has access to the GPA
> > to HVA translations thanks to the type1 complaint IOMMU it implements
> > and can pin pages as needed to create GPA to HPA mappings. That should
> > give it all the pieces it needs to fully setup mappings for the vGPU.
> > Whether or not there's a system IOMMU is simply an exercise for that
> > driver. It needs to do a DMA mapping operation through the system IOMMU
> > the same for a vGPU as if it was doing it for itself, because they are
> > in fact one in the same. The GMA to IOVA mapping seems like an internal
> > detail. I assume the IOVA is some sort of GPA, and the GMA is managed
> > through mediation of the device.
>
> Sorry I'm not familiar with VFIO internal. My original worry is that system
> IOMMU for GPU may be already claimed by another vfio driver (e.g. host kernel
> wants to harden gfx driver from rest sub-systems, regardless of whether vGPU
> is created or not). In that case vGPU IOMMU driver shouldn't manage system
> IOMMU directly.

There are different APIs for the IOMMU depending on how it's being used.
If the IOMMU is being used for inter-device isolation in the host, then
the DMA API (ex. dma_map_page) transparently makes use of the IOMMU.
When we're doing device assignment, we make use of the IOMMU API, which
allows more explicit control (ex. iommu_domain_alloc,
iommu_attach_device, iommu_map, etc). A vGPU is not an SR-IOV VF; it
doesn't have a unique requester ID that allows the IOMMU to
differentiate one vGPU from another, or a vGPU from the GPU. All mappings
for vGPUs need to occur for the GPU. It's therefore the GPU driver, or
this vfio extension of that driver, that needs to perform the IOMMU
mapping for the vGPU.

My expectation is therefore that once the GMA to IOVA mapping is
configured in the GPU MMU, the IOVA to HPA mapping needs to be programmed,
as if the GPU driver was performing the setup itself, which it is. Before
the device mediation that triggered the mapping setup completes, the GPU
MMU and the system IOMMU (if present) should be configured to enable that
DMA. The GPU MMU provides the isolation of the vGPU; the system IOMMU
enables the DMA to occur.
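The two-level arrangement can be sketched as a toy model (single-level page tables, contents invented for illustration): the GPU MMU / shadow GTT maps GMA to IOVA, and the system IOMMU maps IOVA to HPA, so a device access composes both:

```c
#include <assert.h>
#include <stdint.h>

#define PAGES 8
#define PAGE_SHIFT 12
#define PAGE_MASK ((1u << PAGE_SHIFT) - 1)

/* GMA page -> IOVA page, as programmed into the shadow GTT by the GPU
 * driver mediating the vGPU. */
static uint64_t gtt[PAGES];
/* IOVA page -> HPA page, as programmed into the system IOMMU via an
 * iommu_map()-style call. */
static uint64_t iommu[PAGES];

/* What the hardware effectively does for a vGPU access: GMA -> IOVA via
 * the GPU MMU, then IOVA -> HPA via the system IOMMU (if present). */
static uint64_t translate(uint64_t gma)
{
    uint64_t iova = (gtt[gma >> PAGE_SHIFT] << PAGE_SHIFT) | (gma & PAGE_MASK);
    return (iommu[iova >> PAGE_SHIFT] << PAGE_SHIFT) | (iova & PAGE_MASK);
}
```

Without a system IOMMU the second table is effectively the identity map and the GPU MMU alone provides the vGPU isolation, matching the paragraph above.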

> btw, curious today how VFIO coordinates with system IOMMU driver regarding
> to whether a IOMMU is used to control device assignment, or used for kernel
> hardening. Somehow two are conflicting since different address spaces are
> concerned (GPA vs. IOVA)...

When devices unbind from native host drivers, any previous IOMMU
mappings and domains are removed. These are typically created via the
DMA API above. The initialization operations of the VFIO API (creating
containers, attaching groups to containers, and setting the IOMMU model
for a container) work through the IOMMU API to create a new domain and
isolate devices within it. The type1 VFIO IOMMU interface is then
effectively a passthrough to the iommu_map() and iommu_unmap()
interfaces of the IOMMU API, modulo page pinning, accounting and
tracking. When a VFIO instance is destroyed, the devices are detached
from the IOMMU domain, the devices are unbound from vfio and re-bound to
host drivers and the DMA API can reclaim the devices for host isolation.
Thanks,

Alex
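The lifecycle Alex describes has a well-defined userspace shape. A sketch using the real type1 ioctls follows; the container/group fds, HVA, and GPA values are hypothetical, and the ioctl call is injectable so the flow can be shown without hardware:

```c
#include <linux/vfio.h>
#include <stdint.h>

typedef int (*ioctl_fn)(int fd, unsigned long request, void *arg);

/* Container setup as described above: check the API version, attach the
 * IOMMU group to the container, select the type1 IOMMU model, then map
 * (and pin) a run of guest memory so the device can DMA to it. */
static int setup_type1(int container, int group, ioctl_fn do_ioctl)
{
    if (do_ioctl(container, VFIO_GET_API_VERSION, 0) != VFIO_API_VERSION)
        return -1;
    if (do_ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0)
        return -1;
    if (do_ioctl(container, VFIO_SET_IOMMU,
                 (void *)(unsigned long)VFIO_TYPE1_IOMMU) < 0)
        return -1;

    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = 0x7f0000000000ull, /* hypothetical HVA of guest RAM */
        .iova  = 0x40000000ull,     /* hypothetical GPA used as IOVA */
        .size  = 1ull << 20,
    };
    return do_ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}
```

In the real flow, do_ioctl is just ioctl(2) on fds opened from /dev/vfio/vfio and /dev/vfio/<group>; for a vGPU the "type1" backend behind these calls would be the vendor driver's, pinning pages and tracking HVA to GPA as discussed.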

2015-11-23 04:52:50

by Jike Song

[permalink] [raw]
Subject: Re: [Qemu-devel] [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On 11/21/2015 12:40 AM, Alex Williamson wrote:
>>
>> Thanks for confirmation. For QEMU/KVM, I totally agree your point; However,
>> if we take XenGT to consider, it will be a bit more complex: with Xen
>> hypervisor and Dom0 kernel running in different level, it's not a straight-
>> forward way for QEMU to do something like mapping a portion of MMIO BAR
>> via VFIO in Dom0 kernel, instead of calling hypercalls directly.
>
> This would need to be part of the support added for Xen. To directly
> map a device MMIO space to the VM, VFIO provides an mmap, QEMU registers
> that mmap with KVM, or Xen. It's all just MemoryRegions in QEMU.
> Perhaps it's even already supported by Xen.
>

AFAICT, things are different here for Xen. To establish mappings between
Dom0 pfns and DomU gfns, one has to call Xen hypercalls. In the scenario
above, either QEMU calls the hypercall directly, or it asks VFIO in the
dom0 kernel to do it.

I'm not saying that VFIO is not applicable for XenGT. I just want to
say that given the VFIO-based kernel/QEMU split model, additional effort
is needed for XenGT.

>> I don't know if there is a better way to handle this. But I do agree that
>> channels between kernel and Qemu via VFIO is a good idea, even though we
>> may have to split KVMGT/XenGT in Qemu a bit. We are currently working on
>> moving all of PCI CFG emulation from kernel to Qemu, hopefully we can
>> release it by end of this year and work with you guys to adjust it for
>> the agreed method.
>
> Well, moving PCI config space emulation from kernel to QEMU is exactly
> the wrong direction to take for this proposal. Config space access to
> the vGPU would occur through the VFIO API. So if you already have
> config space emulation in the kernel, that's already one less piece of
> work for a VFIO model, it just needs to be "wired up" through the VFIO
> API. Thanks,

If I understand correctly, the idea of moving PCI CFG emulation to QEMU is
actually very similar to your VFIO design:

a) the VM accesses a CFG register
b) KVM hands the access over to QEMU
c) QEMU may emulate it, and when necessary, ioctl into the kernel (i915/vgt)
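The a/b/c split can be modeled with a toy dispatcher (all names and the 0x40 boundary invented for illustration): QEMU emulates the standard config-space header from its own shadow copy and forwards device-specific offsets to the kernel-side vGPU model:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the ioctl into the kernel device model (step c). */
typedef uint32_t (*kernel_fallback_fn)(uint16_t offset);

/* Config-space read as it would look after a KVM exit hands the access
 * to QEMU (step b): emulate the standard header locally, otherwise ask
 * the kernel-side vGPU model. */
static uint32_t cfg_read(uint16_t offset, const uint32_t *shadow,
                         kernel_fallback_fn fallback)
{
    if (offset < 0x40)            /* standard PCI header: emulate in QEMU */
        return shadow[offset / 4];
    return fallback(offset);      /* device-specific: ioctl into kernel */
}
```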


>
> Alex
>
>
--
Thanks,
Jike

2015-11-24 11:19:25

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Thu, Nov 19, 2015 at 01:02:36PM -0700, Alex Williamson wrote:
> On Thu, 2015-11-19 at 04:06 +0000, Tian, Kevin wrote:
> > > From: Alex Williamson [mailto:[email protected]]
> > > Sent: Thursday, November 19, 2015 2:12 AM
> > >
> > > [cc +qemu-devel, +paolo, +gerd]
> > >
> > > Another area of extension is how to expose a framebuffer to QEMU for
> > > seamless integration into a SPICE/VNC channel. For this I believe we
> > > could use a new region, much like we've done to expose VGA access
> > > through a vfio device file descriptor. An area within this new
> > > framebuffer region could be directly mappable in QEMU while a
> > > non-mappable page, at a standard location with standardized format,
> > > provides a description of framebuffer and potentially even a
> > > communication channel to synchronize framebuffer captures. This would
> > > be new code for QEMU, but something we could share among all vGPU
> > > implementations.
> >
> > Now GVT-g already provides an interface to decode framebuffer information,
> > with the assumption that the framebuffer will be further composited via
> > OpenGL APIs. So the format is defined according to the OpenGL definition.
> > Does that meet the SPICE requirements?
> >
> > Another thing to be added: framebuffers are frequently switched in
> > reality, so either Qemu needs to poll or a notification mechanism is required.
> > And since it's dynamic, having the framebuffer page directly exposed in the
> > new region might be tricky. We can just expose the framebuffer information
> > (including base, format, etc.) and let Qemu map it separately, outside the
> > VFIO interface.
>
> Sure, we'll need to work out that interface, but it's also possible that
> the framebuffer region is simply remapped to another area of the device
> (ie. multiple interfaces mapping the same thing) by the vfio device
> driver. Whether it's easier to do that or make the framebuffer region
> reference another region is something we'll need to see.
>
> > And... this works fine with the vGPU model since software knows all the
> > details about the framebuffer. However in the pass-through case, who do you
> > expect to provide that information? Is it OK to introduce vGPU-specific
> > APIs in VFIO?
>
> Yes, vGPU may have additional features, like a framebuffer area, that
> aren't present or optional for direct assignment. Obviously we support
> direct assignment of GPUs for some vendors already without this feature.

For exposing framebuffers for spice/vnc I highly recommend against
anything that looks like a bar/fixed mmio range mapping. First, this means
the kernel driver needs to internally fake remapping, which isn't fun.
Second, we can't get at the memory in an easy fashion for hw-accelerated
compositing.

My recommendation is to build the actual memory access for the underlying
framebuffers on top of dma-buf, so that it can be vacuumed up by e.g. the
host gpu driver again for rendering. For userspace the generic part would
simply be an invalidate-fb signal, with the new dma-buf supplied.

Upsides:
- You can composite stuff with the gpu.
- VRAM and other kinds of resources (even stuff not visible in pci bars)
can be represented.

Downside: Tracking mapping changes on the guest side won't be any easier.
This is mostly a problem for integrated gpus, since discrete ones usually
require contiguous vram for scanout. I think saying "don't do that" is a
valid option though, i.e. we're assuming that page mappings for an in-use
scanout range never change on the guest side. That is true for at least
all the current linux drivers.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2015-11-24 11:50:12

by Chris Wilson

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Tue, Nov 24, 2015 at 12:19:18PM +0100, Daniel Vetter wrote:
> Downside: Tracking mapping changes on the guest side won't be any easier.
> This is mostly a problem for integrated gpus, since discrete ones usually
> require contiguous vram for scanout. I think saying "don't do that" is a
> valid option though, i.e. we're assuming that page mappings for an in-use
> scanout range never change on the guest side. That is true for at least
> all the current linux drivers.

Except that we already suffer from the limitations of fixed mappings, and
we have patches that want to change the page mapping of active scanouts.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

2015-11-24 12:39:05

by Gerd Hoffmann

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Hi,

> > Yes, vGPU may have additional features, like a framebuffer area, that
> > aren't present or optional for direct assignment. Obviously we support
> > direct assignment of GPUs for some vendors already without this feature.
>
> For exposing framebuffers for spice/vnc I highly recommend against
> anything that looks like a bar/fixed mmio range mapping. First this means
> the kernel driver needs to internally fake remapping, which isn't fun.

Sure. I don't think we should remap here. More below.

> My recommendation is to build the actual memory access for the underlying
> framebuffers on top of dma-buf, so that it can be vacuumed up by e.g. the
> host gpu driver again for rendering.

We want that too ;)

Some more background:

OpenGL support in qemu is still young and emerging, and we are actually
building on dma-bufs here. There are a bunch of different ways guest
display output can be handled. At the end of the day it boils down to
only two fundamental cases though:

(a) Where qemu doesn't need access to the guest framebuffer
- qemu directly renders via opengl (works today with virtio-gpu
and will be in the qemu 2.5 release)
- qemu passes the dma-buf on to the spice client for local display
(experimental code exists).
- qemu feeds the guest display into gpu-assisted video encoder
to send a stream over the network (no code yet).

(b) Where qemu must read the guest framebuffer.
- qemu's builtin vnc server.
- qemu writing screenshots to file.
- (non-opengl legacy code paths for local display, will
hopefully disappear long-term though ...)

So, the question is how best to support (b). Even with OpenGL support
in qemu improving over time, I don't expect it to go away completely
anytime soon.

I think it makes sense to have a special vfio region for that. I don't
think remapping makes sense there. It doesn't need to be "live", and it
doesn't need to support high refresh rates. Placing a copy of the guest
framebuffer there on request (and converting from tiled to linear while
we're at it) is perfectly fine. qemu has an adaptive update rate and
will stop issuing frequent update requests when the vnc client
disconnects, so there will be nothing to do if nobody actually wants to
see the guest display.

A possible alternative approach would be to import a dma-buf, then use
glReadPixels(). I suspect that when doing the copy in the kernel the
driver could just ask the gpu to blit the guest framebuffer. I don't know
the gfx hardware well enough to be sure though; comments are welcome.

cheers,
Gerd

2015-11-24 13:31:42

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Tue, Nov 24, 2015 at 01:38:55PM +0100, Gerd Hoffmann wrote:
> Hi,
>
> > > Yes, vGPU may have additional features, like a framebuffer area, that
> > > aren't present or optional for direct assignment. Obviously we support
> > > direct assignment of GPUs for some vendors already without this feature.
> >
> > For exposing framebuffers for spice/vnc I highly recommend against
> > anything that looks like a bar/fixed mmio range mapping. First this means
> > the kernel driver needs to internally fake remapping, which isn't fun.
>
> Sure. I don't think we should remap here. More below.
>
> > My recommendation is to build the actual memory access for the underlying
> > framebuffers on top of dma-buf, so that it can be vacuumed up by e.g. the
> > host gpu driver again for rendering.
>
> We want that too ;)
>
> Some more background:
>
> OpenGL support in qemu is still young and emerging, and we are actually
> building on dma-bufs here. There are a bunch of different ways guest
> display output can be handled. At the end of the day it boils down to
> only two fundamental cases though:
>
> (a) Where qemu doesn't need access to the guest framebuffer
> - qemu directly renders via opengl (works today with virtio-gpu
> and will be in the qemu 2.5 release)
> - qemu passes the dma-buf on to the spice client for local display
> (experimental code exists).
> - qemu feeds the guest display into gpu-assisted video encoder
> to send a stream over the network (no code yet).
>
> (b) Where qemu must read the guest framebuffer.
> - qemu's builtin vnc server.
> - qemu writing screenshots to file.
> - (non-opengl legacy code paths for local display, will
> hopefully disappear long-term though ...)
>
> So, the question is how best to support (b). Even with OpenGL support
> in qemu improving over time, I don't expect it to go away completely
> anytime soon.
>
> I think it makes sense to have a special vfio region for that. I don't
> think remapping makes sense there. It doesn't need to be "live", and it
> doesn't need to support high refresh rates. Placing a copy of the guest
> framebuffer there on request (and converting from tiled to linear while
> we're at it) is perfectly fine. qemu has an adaptive update rate and
> will stop issuing frequent update requests when the vnc client
> disconnects, so there will be nothing to do if nobody actually wants to
> see the guest display.
>
> A possible alternative approach would be to import a dma-buf, then use
> glReadPixels(). I suspect that when doing the copy in the kernel the
> driver could just ask the gpu to blit the guest framebuffer. I don't know
> the gfx hardware well enough to be sure though; comments are welcome.

Generally the kernel can't do gpu blits, since the required massive state
setup lives only in the userspace side of the GL driver stack. But
glReadPixels can do tricks for detiling, and if you use pixel buffer
objects or something similar it'll even be amortized reasonably.

But there's some work underway to add generic mmap support to dma-bufs,
and for the really simple case (where we don't have a gl driver to handle
the dma-buf specially) with untiled framebuffers that would be all we need?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2015-11-24 14:12:38

by Gerd Hoffmann

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Hi,

> But there's some work to add generic mmap support to dma-bufs, and for
> really simple case (where we don't have a gl driver to handle the dma-buf
> specially) for untiled framebuffers that would be all we need?

Not requiring gl is certainly a bonus; people might want to build qemu
without opengl support to reduce the attack surface and/or the package
dependency chain.

And, yes, the requirements for the non-gl rendering path are pretty low.
qemu needs something it can mmap, and which it can ask pixman to handle.
The preferred format is PIXMAN_x8r8g8b8 (qemu uses that internally in a
lot of places, so this avoids conversions).

The current plan is to have a special vfio region (not visible to the
guest) where the framebuffer lives, with one or two pages at the end for
metadata (format and size). A status field is there too and will be used
by qemu to request updates and by the kernel to signal update completion.
Guess I should write that down as a vfio RFC patch ...

I don't think it makes sense to have fields to notify qemu about which
framebuffer regions have been updated; I'd expect that with the
full-screen compositing we have these days this information isn't
available anyway. Maybe a flag indicating whether there have been updates
or not, so qemu can skip update processing in case we have the
screensaver showing a black screen all day long.

cheers,
Gerd

2015-11-24 14:19:53

by Daniel Vetter

[permalink] [raw]
Subject: Re: [Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

On Tue, Nov 24, 2015 at 03:12:31PM +0100, Gerd Hoffmann wrote:
> Hi,
>
> > But there's some work to add generic mmap support to dma-bufs, and for
> > really simple case (where we don't have a gl driver to handle the dma-buf
> > specially) for untiled framebuffers that would be all we need?
>
> Not requiring gl is certainly a bonus; people might want to build qemu
> without opengl support to reduce the attack surface and/or the package
> dependency chain.
>
> And, yes, the requirements for the non-gl rendering path are pretty low.
> qemu needs something it can mmap, and which it can ask pixman to handle.
> The preferred format is PIXMAN_x8r8g8b8 (qemu uses that internally in a
> lot of places, so this avoids conversions).
>
> The current plan is to have a special vfio region (not visible to the
> guest) where the framebuffer lives, with one or two pages at the end for
> metadata (format and size). A status field is there too and will be used
> by qemu to request updates and by the kernel to signal update completion.
> Guess I should write that down as a vfio RFC patch ...
>
> I don't think it makes sense to have fields to notify qemu about which
> framebuffer regions have been updated; I'd expect that with the
> full-screen compositing we have these days this information isn't
> available anyway. Maybe a flag indicating whether there have been updates
> or not, so qemu can skip update processing in case we have the
> screensaver showing a black screen all day long.

GL, wayland, X, EGL and soon-ish Android's surface flinger (hwc already
has it afaik) all track damage. There are plans to add the same to the
atomic kms api too. But if you do damage tracking you really don't want
to support frontbuffer rendering (though you might allow it for perf
reasons if the guest is stupid), which means you need buffer handles +
damage, and not a static region.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch