On Wed, 5 Jul 2023, Roger Pau Monné wrote:
> On Tue, Jul 04, 2023 at 08:14:59PM +0300, Oleksandr Tyshchenko wrote:
> > On Tue, Jul 4, 2023 at 5:49 PM Roger Pau Monné <[email protected]> wrote:
> >
> > Hello all.
> >
> > [sorry for the possible format issues]
> >
> >
> > On Tue, Jul 04, 2023 at 01:43:46PM +0200, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > >
> > > > FWIW, I have ran into this issue some time ago too. I run Xen on top of
> > > > KVM and then passthrough some of the virtio devices (network one
> > > > specifically) into a (PV) guest. So, I hit both cases, the dom0 one and
> > > > domU one. As a temporary workaround I needed to disable
> > > > CONFIG_XEN_VIRTIO completely (just disabling
> > > > CONFIG_XEN_VIRTIO_FORCE_GRANT was not enough to fix it).
> > > > With that context in place, the actual response below.
> > > >
> > > > On Tue, Jul 04, 2023 at 12:39:40PM +0200, Juergen Gross wrote:
> > > > > On 04.07.23 09:48, Roger Pau Monné wrote:
> > > > > > On Thu, Jun 29, 2023 at 03:44:04PM -0700, Stefano Stabellini wrote:
> > > > > > > On Thu, 29 Jun 2023, Oleksandr Tyshchenko wrote:
> > > > > > > > On 29.06.23 04:00, Stefano Stabellini wrote:
> > > > > > > > > I think we need to add a second way? It could be anything that
> > > can help
> > > > > > > > > us distinguish between a non-grants-capable virtio backend and
> > > a
> > > > > > > > > grants-capable virtio backend, such as:
> > > > > > > > > - a string on xenstore
> > > > > > > > > - a xen param
> > > > > > > > > - a special PCI configuration register value
> > > > > > > > > - something in the ACPI tables
> > > > > > > > > - the QEMU machine type
> > > > > > > >
> > > > > > > >
> > > > > > > > Yes, I remember there was a discussion regarding that. The point
> > > is to
> > > > > > > > choose a solution to be functional for both PV and HVM *and* to
> > > be able
> > > > > > > > to support a hotplug. IIRC, the xenstore could be a possible
> > > candidate.
> > > > > > >
> > > > > > > xenstore would be among the easiest to make work. The only
> > > downside is
> > > > > > > the dependency on xenstore which otherwise virtio+grants doesn't
> > > have.
> > > > > >
> > > > > > I would avoid introducing a dependency on xenstore, if nothing else
> > > we
> > > > > > know it's a performance bottleneck.
> > > > > >
> > > > > > We would also need to map the virtio device topology into xenstore,
> > > so
> > > > > > that we can pass different options for each device.
> > > > >
> > > > > This aspect (different options) is important. How do you want to pass
> > > virtio
> > > > > device configuration parameters from dom0 to the virtio backend
> > > domain? You
> > > > > probably need something like Xenstore (a virtio based alternative like
> > > virtiofs
> > > > > would work, too) for that purpose.
> > > > >
> > > > > Mapping the topology should be rather easy via the PCI-Id, e.g.:
> > > > >
> > > > > /local/domain/42/device/virtio/0000:00:1c.0/backend
> > > >
> > > > While I agree this would probably be the simplest to implement, I don't
> > > > like introducing xenstore dependency into virtio frontend either.
> > > > Toolstack -> backend communication is probably easier to solve, as it's
> > > > much more flexible (could use qemu cmdline, QMP, other similar
> > > > mechanisms for non-qemu backends etc).
> > >
> > > I also think features should be exposed uniformly for devices, it's at
> > > least weird to have certain features exposed in the PCI config space
> > > while other features exposed in xenstore.
> > >
> > > For virtio-mmio this might get a bit confusing, are we going to add
> > > xenstore entries based on the position of the device config mmio
> > > region?
> > >
> > > I think on Arm PCI enumeration is not (usually?) done by the firmware,
> > > at which point the SBDF expected by the tools/backend might be
> > > different than the value assigned by the guest OS.
> > >
> > > I think there are two slightly different issues, one is how to pass
> > > information to virtio backends, I think doing this initially based on
> > > xenstore is not that bad, because it's an internal detail of the
> > > backend implementation. However passing information to virtio
> > > frontends using xenstore is IMO a bad idea, there's already a way to
> > > negotiate features between virtio frontends and backends, and Xen
> > > should just expand and use that.
> > >
> > >
> >
> > On Arm with device-tree we have a special bindings which purpose is to
> > inform us whether we need to use grants for virtio and backend domid for a
> > particular device.Here on x86, we don't have a device tree, so cannot
> > (easily?) reuse this logic.
> >
> > I have just recollected one idea suggested by Stefano some time ago [1].
> > The context of discussion was about what to do when device-tree and ACPI
> > cannot be reused (or something like that).The idea won't cover virtio-mmio,
> > but I have heard that virtio-mmio usage with x86 Xen is rather unusual case.
> >
> > I will paste the text below for convenience.

Thanks Oleksandr! I had forgotten about this, but in retrospect it was a
good suggestion :-D

> > **********
> >
> > Part 1 (intro):
> >
> > We could reuse a PCI config space register to expose the backend id.
> > However this solution requires a backend change (QEMU) to expose the
> > backend id via an emulated register for each emulated device.
> >
> > To avoid having to introduce a special config space register in all
> > emulated PCI devices (virtio-net, virtio-block, etc) I wonder if we
> > could add a special PCI config space register at the emulated PCI Root
> > Complex level.
> >
> > Basically the workflow would be as follow:
> >
> > - Linux recognizes the PCI Root Complex as a Xen PCI Root Complex
> > - Linux writes to special PCI config space register of the Xen PCI Root
> > Complex the PCI device id (basically the BDF)
> > - The Xen PCI Root Complex emulated by Xen answers by writing back to
> > the same location the backend id (domid of the backend)
> > - Linux reads back the same PCI config space register of the Xen PCI
> > Root Complex and learn the relevant domid
>
> IMO this seems awfully complex. I'm not familiar with the VirtIO
> spec, but I see there's a Vendor data capability, could we possibly
> expose Xen-specific information on that capability?

That is also a possibility too. Also we could use a PCI conf register
which is known to be unused in the Virtio spec to expose the grant
capability and backend domid.

> > Part 2 (clarification):
> >
> > I think using a special config space register in the root complex would
> > not be terrible in terms of guest changes because it is easy to
> > introduce a new root complex driver in Linux and other OSes. The root
> > complex would still be ECAM compatible so the regular ECAM driver would
> > still work. A new driver would only be necessary if you want to be able
> > to access the special config space register.
>
> I'm slightly worry of this approach, we end up modifying a root
> complex emulation in order to avoid modifying a PCI device emulation
> on QEMU, not sure that's a good trade off.
>
> Note also that different architectures will likely have different root
> complex, and so you might need to modify several of them, plus then
> arrange the PCI layout correctly in order to have the proper hierarchy
> so that devices belonging to different driver domains are assigned to
> different bridges.

I do think that adding something to the PCI conf register somewhere is
the best option because it is not dependent on ACPI and it is not
dependent on xenstore both of which are very undesirable.

I am not sure where specifically is the best place. These are 3 ideas
we came up with:
1. PCI root complex
2. a register on the device itself
3. a new capability of the device
4. add one extra dummy PCI device for the sole purpose of exposing the
grants capability

Looking at the spec, there is a way to add a vendor-specific capability
(cap_vndr = 0x9). Could we use that? It doesn't look like it is used
today, Linux doesn't parse it.

> >
> >
> > **********
> > What do you think about it? Are there any pitfalls, etc? This also requires
> > system changes, but at least without virtio spec changes.
>
> Why are we so reluctant to add spec changes? I understand this might
> take time an effort, but it's the only way IMO to build a sustainable
> VirtIO Xen implementation. Did we already attempt to negotiate with
> Oasis Xen related spec changes and those where refused?

That's because spec changes can be very slow. This is a bug that we need
a relatively quick solution for and waiting 12-24 months for a spec
update is not realistic.

I think a spec change would be best as a long term solution. We also
need a short term solution. The short term solution doesn't have to be
ideal but it has to work now.

2023-07-06 22:22:03

by Stefano Stabellini

On Fri, 7 Jul 2023:
> On 07.07.23 16:42, Roger Pau Monné wrote:
> > On Fri, Jul 07, 2023 at 04:10:14PM +0200, Juergen Gross wrote:
> > > On 07.07.23 11:50, Roger Pau Monné wrote:
> > > > On Fri, Jul 07, 2023 at 06:38:48AM +0200, Juergen Gross wrote:
> > > > > On 06.07.23 23:49, Stefano Stabellini wrote:
> > > > > > On Thu, 6 Jul 2023, Roger Pau Monné wrote:
> > > > > > > On Wed, Jul 05, 2023 at 03:41:10PM -0700, Stefano Stabellini
> > > > > > > wrote:
> > > > > > > > On Wed, 5 Jul 2023, Roger Pau Monné wrote:
> > > > > > > > > On Tue, Jul 04, 2023 at 08:14:59PM +0300, Oleksandr Tyshchenko
> > > > > > > > > wrote:
> > > > > > > > > > Part 2 (clarification):
> > > > > > > > > >
> > > > > > > > > > I think using a special config space register in the root
> > > > > > > > > > complex would
> > > > > > > > > > not be terrible in terms of guest changes because it is easy
> > > > > > > > > > to
> > > > > > > > > > introduce a new root complex driver in Linux and other OSes.
> > > > > > > > > > The root
> > > > > > > > > > complex would still be ECAM compatible so the regular ECAM
> > > > > > > > > > driver would
> > > > > > > > > > still work. A new driver would only be necessary if you want
> > > > > > > > > > to be able
> > > > > > > > > > to access the special config space register.
> > > > > > > > >
> > > > > > > > > I'm slightly worry of this approach, we end up modifying a
> > > > > > > > > root
> > > > > > > > > complex emulation in order to avoid modifying a PCI device
> > > > > > > > > emulation
> > > > > > > > > on QEMU, not sure that's a good trade off.
> > > > > > > > >
> > > > > > > > > Note also that different architectures will likely have
> > > > > > > > > different root
> > > > > > > > > complex, and so you might need to modify several of them, plus
> > > > > > > > > then
> > > > > > > > > arrange the PCI layout correctly in order to have the proper
> > > > > > > > > hierarchy
> > > > > > > > > so that devices belonging to different driver domains are
> > > > > > > > > assigned to
> > > > > > > > > different bridges.
> > > > > > > >
> > > > > > > > I do think that adding something to the PCI conf register
> > > > > > > > somewhere is
> > > > > > > > the best option because it is not dependent on ACPI and it is
> > > > > > > > not
> > > > > > > > dependent on xenstore both of which are very undesirable.
> > > > > > > >
> > > > > > > > I am not sure where specifically is the best place. These are 3
> > > > > > > > ideas
> > > > > > > > we came up with:
> > > > > > > > 1. PCI root complex
> > > > > > > > 2. a register on the device itself
> > > > > > > > 3. a new capability of the device
> > > > > > > > 4. add one extra dummy PCI device for the sole purpose of
> > > > > > > > exposing the
> > > > > > > > grants capability
> > > > > > > >
> > > > > > > >
> > > > > > > > Looking at the spec, there is a way to add a vendor-specific
> > > > > > > > capability
> > > > > > > > (cap_vndr = 0x9). Could we use that? It doesn't look like it is
> > > > > > > > used
> > > > > > > > today, Linux doesn't parse it.
> > > > > > >
> > > > > > > I did wonder the same from a quick look at the spec. There's
> > > > > > > however
> > > > > > > a text in the specification that says:
> > > > > > >
> > > > > > > "The driver SHOULD NOT use the Vendor data capability except for
> > > > > > > debugging and reporting purposes."
> > > > > > >
> > > > > > > So we would at least need to change that because the capability
> > > > > > > would
> > > > > > > then be used by other purposes different than debugging and
> > > > > > > reporting.
> > > > > > >
> > > > > > > Seems like a minor adjustment, so might we worth asking upstream
> > > > > > > about
> > > > > > > their opinion, and to get a conversation started.
> > > > > >
> > > > > > Wait, wouldn't this use-case fall under "reporting" ? It is exactly
> > > > > > what
> > > > > > we are doing, right?
> > > > >
> > > > > I'd understand "reporting" as e.g. logging, transferring statistics,
> > > > > ...
> > > > >
> > > > > We'd like to use it for configuration purposes.
> > > >
> > > > I've also read it that way.
> > > >
> > > > > Another idea would be to enhance the virtio IOMMU device to suit our
> > > > > needs:
> > > > > we could add the domid as another virtio IOMMU device capability and
> > > > > (for now)
> > > > > use bypass mode for all "productive" devices.
> > > >
> > > > If we have to start adding capabilties, won't it be easier to just add
> > > > it to the each device instead of adding it to virtio IOMMU. Or is the
> > > > parsing of capabilities device specific, and hence we would have to
> > > > implement such parsing for each device? I would expect some
> > > > capabilities are shared between all devices, and a Xen capability could
> > > > be one of those.
> > >
> > > Have a look at [1], which is describing the common device config layout.
> > > The problem here is that we'd need to add the domid after the queue
> > > specific
> > > data, resulting in a mess if further queue fields would be added later.
> > >
> > > We could try that, of course.
> >
> > Right, we must make it part of the standard if we modify
> > virtio_pci_common_cfg, or else newly added fields would overlap the
> > Xen specific one.
> >
> > Would it be possible to signal Xen-grants support in the
> > `device_feature` field, and then expose it from a vendor capability?
> > IOW, would it be possible to add a Xen-specific hook in the parsing of
> > virtio_pci_common_cfg that would then fetch additional data from a
> > capability?
>
> TBH, I don't know. It might require some changes in the central parsing
> logic, but this shouldn't be too hard to do.
>
> > That would likely be less intrusive than adding a new Xen-specific
> > field to virtio_pci_common_cfg while still allowing us to do Xen
> > specific configuration for all VirtIO devices.
>
> In case we want to go that route, this should be in a new "platform config"
> capability, which might be just another form of a vendor capability.

I think this is the best idea. We should look into this.

> > > > > Later we could even add grant-V3 support to Xen and to the virtio
> > > > > IOMMU device
> > > > > (see my last year Xen Summit design session). This could be usable for
> > > > > disaggregated KVM setups, too, so I believe there is a chance to get
> > > > > this
> > > > > accepted.
> > > > >
> > > > > > > > > > **********
> > > > > > > > > > What do you think about it? Are there any pitfalls, etc?
> > > > > > > > > > This also requires
> > > > > > > > > > system changes, but at least without virtio spec changes.
> > > > > > > > >
> > > > > > > > > Why are we so reluctant to add spec changes? I understand
> > > > > > > > > this might
> > > > > > > > > take time an effort, but it's the only way IMO to build a
> > > > > > > > > sustainable
> > > > > > > > > VirtIO Xen implementation. Did we already attempt to
> > > > > > > > > negotiate with
> > > > > > > > > Oasis Xen related spec changes and those where refused?
> > > > > > > >
> > > > > > > > That's because spec changes can be very slow. This is a bug that
> > > > > > > > we need
> > > > > > > > a relatively quick solution for and waiting 12-24 months for a
> > > > > > > > spec
> > > > > > > > update is not realistic.
> > > > > > > >
> > > > > > > > I think a spec change would be best as a long term solution. We
> > > > > > > > also
> > > > > > > > need a short term solution. The short term solution doesn't have
> > > > > > > > to be
> > > > > > > > ideal but it has to work now.
> > > > > > >
> > > > > > > My fear with such approach is that once a bodge is in place people
> > > > > > > move on to other stuff and this never gets properly fixed.
> > > > > > >
> > > > > > > I know this might not be a well received opinion, but it would be
> > > > > > > better if such bodge is kept in each interested party patchqueue
> > > > > > > for
> > > > > > > the time being, until a proper solution is implemented. That way
> > > > > > > there's an interest from parties into properly fixing it upstream.
> > > > > >
> > > > > > Unfortunately we are in the situation where we have an outstanding
> > > > > > upstream bug, so we have to take action one way or the other.
> > > > >
> > > > > The required virtio IOMMU device modification would be rather small,
> > > > > so
> > > > > adding it maybe under a CONFIG option defaulting to off might be
> > > > > acceptable.
> > > >
> > > > Would you then do the grant allocation as part of virtio IOMMU?
> > >
> > > Long term, maybe. Do you remember my Grant-V3 design session last year?
> > > Being
> > > able to reuse the same layout for virtio IOMMU was one of the basic ideas
> > > for
> > > that layout (this would need some heavy work on the virtio IOMMU frontend
> > > and
> > > backend, of course).
> >
> > While this might well be the best option, do we have anyone with the
> > time and expertise to work on this? I might be wrong, but it seems
> > like a huge task.
>
> As a background project I'd like to pursue it. OTOH I'm not sure how much time
> I could spend on it.

Not only it is complex but also has severe implications in terms of
security, safety, and needs to interact with potential virtual IOMMUs in
the guest (virtual IOMMUs to expose another IOMMU stage of translation
in the guest.)

This is definitely not simple.

At that point I would feel more confident in a solution that uses ACPI
tables to add the necessary information the same way we use Device Tree
to do it on ARM. Keep in mind that if an existing ACPI table doesn't
have the fields that we need, we can introduce a new ACPI table.

2023-07-07 21:10:53

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 2/2] xen/virtio: Avoid use of the dom0 backend in dom0

On Fri, 7 Jul 2023, Juergen Gross wrote:
> On 26.06.23 15:17, Petr Pavlu wrote:
> > On 6/21/23 19:58, Oleksandr Tyshchenko wrote:
> > > On 21.06.23 16:12, Petr Pavlu wrote:
> > > > When attempting to run Xen on a QEMU/KVM virtual machine with virtio
> > > > devices (all x86_64), dom0 tries to establish a grant for itself which
> > > > eventually results in a hang during the boot.
> > > >
> > > > The backtrace looks as follows, the while loop in __send_control_msg()
> > > > makes no progress:
> > > >
> > > > #0 virtqueue_get_buf_ctx (_vq=_vq@entry=0xffff8880074a8400,
> > > > len=len@entry=0xffffc90000413c94, ctx=ctx@entry=0x0 <fixed_percpu_data>)
> > > > at ../drivers/virtio/virtio_ring.c:2326
> > > > #1 0xffffffff817086b7 in virtqueue_get_buf
> > > > (_vq=_vq@entry=0xffff8880074a8400, len=len@entry=0xffffc90000413c94) at
> > > > ../drivers/virtio/virtio_ring.c:2333
> > > > #2 0xffffffff8175f6b2 in __send_control_msg (portdev=<optimized
> > > > out>, port_id=0xffffffff, event=0x0, value=0x1) at
> > > > ../drivers/char/virtio_console.c:562
> > > > #3 0xffffffff8175f6ee in __send_control_msg (portdev=<optimized
> > > > out>, port_id=<optimized out>, event=<optimized out>, value=<optimized
> > > > out>) at ../drivers/char/virtio_console.c:569
> > > > #4 0xffffffff817618b1 in virtcons_probe (vdev=0xffff88800585e800)
> > > > at ../drivers/char/virtio_console.c:2098
> > > > #5 0xffffffff81707117 in virtio_dev_probe (_d=0xffff88800585e810)
> > > > at ../drivers/virtio/virtio.c:305
> > > > #6 0xffffffff8198e348 in call_driver_probe (drv=0xffffffff82be40c0
> > > > <virtio_console>, drv=0xffffffff82be40c0 <virtio_console>,
> > > > dev=0xffff88800585e810) at ../drivers/base/dd.c:579
> > > > #7 really_probe (dev=dev@entry=0xffff88800585e810,
> > > > drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > ../drivers/base/dd.c:658
> > > > #8 0xffffffff8198e58f in __driver_probe_device
> > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>,
> > > > dev=dev@entry=0xffff88800585e810) at ../drivers/base/dd.c:800
> > > > #9 0xffffffff8198e65a in driver_probe_device
> > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>,
> > > > dev=dev@entry=0xffff88800585e810) at ../drivers/base/dd.c:830
> > > > #10 0xffffffff8198e832 in __driver_attach (dev=0xffff88800585e810,
> > > > data=0xffffffff82be40c0 <virtio_console>) at ../drivers/base/dd.c:1216
> > > > #11 0xffffffff8198bfb2 in bus_for_each_dev (bus=<optimized out>,
> > > > start=start@entry=0x0 <fixed_percpu_data>,
> > > > data=data@entry=0xffffffff82be40c0 <virtio_console>,
> > > > fn=fn@entry=0xffffffff8198e7b0 <__driver_attach>) at
> > > > ../drivers/base/bus.c:368
> > > > #12 0xffffffff8198db65 in driver_attach
> > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > ../drivers/base/dd.c:1233
> > > > #13 0xffffffff8198d207 in bus_add_driver
> > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > ../drivers/base/bus.c:673
> > > > #14 0xffffffff8198f550 in driver_register
> > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > ../drivers/base/driver.c:246
> > > > #15 0xffffffff81706b47 in register_virtio_driver
> > > > (driver=driver@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > ../drivers/virtio/virtio.c:357
> > > > #16 0xffffffff832cd34b in virtio_console_init () at
> > > > ../drivers/char/virtio_console.c:2258
> > > > #17 0xffffffff8100105c in do_one_initcall (fn=0xffffffff832cd2e0
> > > > <virtio_console_init>) at ../init/main.c:1246
> > > > #18 0xffffffff83277293 in do_initcall_level
> > > > (command_line=0xffff888003e2f900 "root", level=0x6) at
> > > > ../init/main.c:1319
> > > > #19 do_initcalls () at ../init/main.c:1335
> > > > #20 do_basic_setup () at ../init/main.c:1354
> > > > #21 kernel_init_freeable () at ../init/main.c:1571
> > > > #22 0xffffffff81f64be1 in kernel_init (unused=<optimized out>) at
> > > > ../init/main.c:1462
> > > > #23 0xffffffff81001f49 in ret_from_fork () at
> > > > ../arch/x86/entry/entry_64.S:308
> > > > #24 0x0000000000000000 in ?? ()
> > > >
> > > > Fix the problem by preventing xen_grant_init_backend_domid() from
> > > > setting dom0 as a backend when running in dom0.
> > > >
> > > > Fixes: 035e3a4321f7 ("xen/virtio: Optimize the setup of "xen-grant-dma"
> > > > devices")
> > >
> > >
> > > I am not 100% sure whether the Fixes tag points to precise commit. If I
> > > am not mistaken, the said commit just moves the code in the context
> > > without changing the logic of CONFIG_XEN_VIRTIO_FORCE_GRANT, this was
> > > introduced before.
> >
> > I see, the tag should better point to 7228113d1fa0 ("xen/virtio: use
> > dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT") which
> > introduced the original logic to use dom0 as backend.
> >
> > Commit 035e3a4321f7 ("xen/virtio: Optimize the setup of "xen-grant-dma"
> > devices") is relevant in sense that it extended when this logic is
> > active by adding an OR check for xen_pv_domain().
> >
> > >
> > >
> > > > Signed-off-by: Petr Pavlu <[email protected]>
> > > > ---
> > > > drivers/xen/grant-dma-ops.c | 4 +++-
> > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> > > > index 76f6f26265a3..29ed27ac450e 100644
> > > > --- a/drivers/xen/grant-dma-ops.c
> > > > +++ b/drivers/xen/grant-dma-ops.c
> > > > @@ -362,7 +362,9 @@ static int xen_grant_init_backend_domid(struct
> > > > device *dev,
> > > > if (np) {
> > > > ret = xen_dt_grant_init_backend_domid(dev, np,
> > > > backend_domid);
> > > > of_node_put(np);
> > > > - } else if (IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) ||
> > > > xen_pv_domain()) {
> > > > + } else if ((IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) ||
> > > > + xen_pv_domain()) &&
> > > > + !xen_initial_domain()) {
> > >
> > > The commit lgtm, just one note:
> > >
> > >
> > > I would even bail out early in xen_virtio_restricted_mem_acc() instead,
> > > as I assume the same issue could happen on Arm with DT (although there
> > > we don't guess the backend's domid, we read it from DT and quite
> > > unlikely we get Dom0 being in Dom0 with correct DT).
> > >
> > > Something like:
> > >
> > > @@ -416,6 +421,10 @@ bool xen_virtio_restricted_mem_acc(struct
> > > virtio_device *dev)
> > > {
> > > domid_t backend_domid;
> > >
> > > + /* Xen grant DMA ops are not used when running as initial domain
> > > */
> > > + if (xen_initial_domain())
> > > + return false;
> > > +
> > > if (!xen_grant_init_backend_domid(dev->dev.parent,
> > > &backend_domid)) {
> > > xen_grant_setup_dma_ops(dev->dev.parent, backend_domid);
> > > return true;
> > > (END)
> > >
> > >
> > >
> > > If so, that commit subject would need to be updated accordingly.
> > >
> > > Let's see what other reviewers will say.
> >
> > Ok, makes sense.
>
> I think this is okay for a fix of the current problem.
>
> Passing through virtio devices to a PV domU is not covered by this fix, but
> this
> should be a rather rare configuration, which doesn't work today either. So the
> suggested patch would fix the current issue without introducing a regression.
>
> Anything else can be done later.

Why do you say that passing through virtio devices to a PV domU doesn't
work today anyway? Also, as you know many people use Xen outside of
datacenter deployments (laptops, embedded etc.) where drivers domains
and device assignment are very common. You could assign a virtio network
card to a domU and use PV network to share the network with other
guests. Physical virtio devices, especially virtio-net devices, exist. I
could probably repro this problem today in a domU just installing
QubesOS inside QEMU. QubesOS uses network driver domains and if QEMU
provides a virtio-net network card, this would break even with this
patch.

2023-07-08 11:15:30

by Jürgen Groß

[permalink] [raw]

Subject: Re: [PATCH 2/2] xen/virtio: Avoid use of the dom0 backend in dom0

On 07.07.23 23:02, Stefano Stabellini wrote:
> On Fri, 7 Jul 2023, Juergen Gross wrote:
>> On 26.06.23 15:17, Petr Pavlu wrote:
>>> On 6/21/23 19:58, Oleksandr Tyshchenko wrote:
>>>> On 21.06.23 16:12, Petr Pavlu wrote:
>>>>> When attempting to run Xen on a QEMU/KVM virtual machine with virtio
>>>>> devices (all x86_64), dom0 tries to establish a grant for itself which
>>>>> eventually results in a hang during the boot.
>>>>>
>>>>> The backtrace looks as follows, the while loop in __send_control_msg()
>>>>> makes no progress:
>>>>>
>>>>> #0 virtqueue_get_buf_ctx (_vq=_vq@entry=0xffff8880074a8400,
>>>>> len=len@entry=0xffffc90000413c94, ctx=ctx@entry=0x0 <fixed_percpu_data>)
>>>>> at ../drivers/virtio/virtio_ring.c:2326
>>>>> #1 0xffffffff817086b7 in virtqueue_get_buf
>>>>> (_vq=_vq@entry=0xffff8880074a8400, len=len@entry=0xffffc90000413c94) at
>>>>> ../drivers/virtio/virtio_ring.c:2333
>>>>> #2 0xffffffff8175f6b2 in __send_control_msg (portdev=<optimized
>>>>> out>, port_id=0xffffffff, event=0x0, value=0x1) at
>>>>> ../drivers/char/virtio_console.c:562
>>>>> #3 0xffffffff8175f6ee in __send_control_msg (portdev=<optimized
>>>>> out>, port_id=<optimized out>, event=<optimized out>, value=<optimized
>>>>> out>) at ../drivers/char/virtio_console.c:569
>>>>> #4 0xffffffff817618b1 in virtcons_probe (vdev=0xffff88800585e800)
>>>>> at ../drivers/char/virtio_console.c:2098
>>>>> #5 0xffffffff81707117 in virtio_dev_probe (_d=0xffff88800585e810)
>>>>> at ../drivers/virtio/virtio.c:305
>>>>> #6 0xffffffff8198e348 in call_driver_probe (drv=0xffffffff82be40c0
>>>>> <virtio_console>, drv=0xffffffff82be40c0 <virtio_console>,
>>>>> dev=0xffff88800585e810) at ../drivers/base/dd.c:579
>>>>> #7 really_probe (dev=dev@entry=0xffff88800585e810,
>>>>> drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
>>>>> ../drivers/base/dd.c:658
>>>>> #8 0xffffffff8198e58f in __driver_probe_device
>>>>> (drv=drv@entry=0xffffffff82be40c0 <virtio_console>,
>>>>> dev=dev@entry=0xffff88800585e810) at ../drivers/base/dd.c:800
>>>>> #9 0xffffffff8198e65a in driver_probe_device
>>>>> (drv=drv@entry=0xffffffff82be40c0 <virtio_console>,
>>>>> dev=dev@entry=0xffff88800585e810) at ../drivers/base/dd.c:830
>>>>> #10 0xffffffff8198e832 in __driver_attach (dev=0xffff88800585e810,
>>>>> data=0xffffffff82be40c0 <virtio_console>) at ../drivers/base/dd.c:1216
>>>>> #11 0xffffffff8198bfb2 in bus_for_each_dev (bus=<optimized out>,
>>>>> start=start@entry=0x0 <fixed_percpu_data>,
>>>>> data=data@entry=0xffffffff82be40c0 <virtio_console>,
>>>>> fn=fn@entry=0xffffffff8198e7b0 <__driver_attach>) at
>>>>> ../drivers/base/bus.c:368
>>>>> #12 0xffffffff8198db65 in driver_attach
>>>>> (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
>>>>> ../drivers/base/dd.c:1233
>>>>> #13 0xffffffff8198d207 in bus_add_driver
>>>>> (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
>>>>> ../drivers/base/bus.c:673
>>>>> #14 0xffffffff8198f550 in driver_register
>>>>> (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
>>>>> ../drivers/base/driver.c:246
>>>>> #15 0xffffffff81706b47 in register_virtio_driver
>>>>> (driver=driver@entry=0xffffffff82be40c0 <virtio_console>) at
>>>>> ../drivers/virtio/virtio.c:357
>>>>> #16 0xffffffff832cd34b in virtio_console_init () at
>>>>> ../drivers/char/virtio_console.c:2258
>>>>> #17 0xffffffff8100105c in do_one_initcall (fn=0xffffffff832cd2e0
>>>>> <virtio_console_init>) at ../init/main.c:1246
>>>>> #18 0xffffffff83277293 in do_initcall_level
>>>>> (command_line=0xffff888003e2f900 "root", level=0x6) at
>>>>> ../init/main.c:1319
>>>>> #19 do_initcalls () at ../init/main.c:1335
>>>>> #20 do_basic_setup () at ../init/main.c:1354
>>>>> #21 kernel_init_freeable () at ../init/main.c:1571
>>>>> #22 0xffffffff81f64be1 in kernel_init (unused=<optimized out>) at
>>>>> ../init/main.c:1462
>>>>> #23 0xffffffff81001f49 in ret_from_fork () at
>>>>> ../arch/x86/entry/entry_64.S:308
>>>>> #24 0x0000000000000000 in ?? ()
>>>>>
>>>>> Fix the problem by preventing xen_grant_init_backend_domid() from
>>>>> setting dom0 as a backend when running in dom0.
>>>>>
>>>>> Fixes: 035e3a4321f7 ("xen/virtio: Optimize the setup of "xen-grant-dma"
>>>>> devices")
>>>>
>>>>
>>>> I am not 100% sure whether the Fixes tag points to precise commit. If I
>>>> am not mistaken, the said commit just moves the code in the context
>>>> without changing the logic of CONFIG_XEN_VIRTIO_FORCE_GRANT, this was
>>>> introduced before.
>>>
>>> I see, the tag should better point to 7228113d1fa0 ("xen/virtio: use
>>> dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT") which
>>> introduced the original logic to use dom0 as backend.
>>>
>>> Commit 035e3a4321f7 ("xen/virtio: Optimize the setup of "xen-grant-dma"
>>> devices") is relevant in sense that it extended when this logic is
>>> active by adding an OR check for xen_pv_domain().
>>>
>>>>
>>>>
>>>>> Signed-off-by: Petr Pavlu <[email protected]>
>>>>> ---
>>>>> drivers/xen/grant-dma-ops.c | 4 +++-
>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
>>>>> index 76f6f26265a3..29ed27ac450e 100644
>>>>> --- a/drivers/xen/grant-dma-ops.c
>>>>> +++ b/drivers/xen/grant-dma-ops.c
>>>>> @@ -362,7 +362,9 @@ static int xen_grant_init_backend_domid(struct
>>>>> device *dev,
>>>>> if (np) {
>>>>> ret = xen_dt_grant_init_backend_domid(dev, np,
>>>>> backend_domid);
>>>>> of_node_put(np);
>>>>> - } else if (IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) ||
>>>>> xen_pv_domain()) {
>>>>> + } else if ((IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) ||
>>>>> + xen_pv_domain()) &&
>>>>> + !xen_initial_domain()) {
>>>>
>>>> The commit lgtm, just one note:
>>>>
>>>>
>>>> I would even bail out early in xen_virtio_restricted_mem_acc() instead,
>>>> as I assume the same issue could happen on Arm with DT (although there
>>>> we don't guess the backend's domid, we read it from DT and quite
>>>> unlikely we get Dom0 being in Dom0 with correct DT).
>>>>
>>>> Something like:
>>>>
>>>> @@ -416,6 +421,10 @@ bool xen_virtio_restricted_mem_acc(struct
>>>> virtio_device *dev)
>>>> {
>>>> domid_t backend_domid;
>>>>
>>>> + /* Xen grant DMA ops are not used when running as initial domain
>>>> */
>>>> + if (xen_initial_domain())
>>>> + return false;
>>>> +
>>>> if (!xen_grant_init_backend_domid(dev->dev.parent,
>>>> &backend_domid)) {
>>>> xen_grant_setup_dma_ops(dev->dev.parent, backend_domid);
>>>> return true;
>>>> (END)
>>>>
>>>>
>>>>
>>>> If so, that commit subject would need to be updated accordingly.
>>>>
>>>> Let's see what other reviewers will say.
>>>
>>> Ok, makes sense.
>>
>> I think this is okay for a fix of the current problem.
>>
>> Passing through virtio devices to a PV domU is not covered by this fix, but
>> this
>> should be a rather rare configuration, which doesn't work today either. So the
>> suggested patch would fix the current issue without introducing a regression.
>>
>> Anything else can be done later.
>
> Why do you say that passing through virtio devices to a PV domU doesn't
> work today anyway? Also, as you know many people use Xen outside of
> datacenter deployments (laptops, embedded etc.) where drivers domains
> and device assignment are very common. You could assign a virtio network
> card to a domU and use PV network to share the network with other
> guests. Physical virtio devices, especially virtio-net devices, exist. I
> could probably repro this problem today in a domU just installing
> QubesOS inside QEMU. QubesOS uses network driver domains and if QEMU
> provides a virtio-net network card, this would break even with this
> patch.

I might be wrong, but I don't think all virtio frontends will work in that
scenario. The main reason is the PFN/MFN difference: a frontend using guest
consecutive memory for doing large I/Os will fail miserably. This was the
main reason why I had to add the functionality of consecutive grants for
large I/O buffers. The same goes for multi-page virtio ring pages.

Juergen

Attachments:

OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key OpenPGP_signature (505.00 B)
OpenPGP digital signature Download all attachments

2023-07-08 19:08:44

by Stefano Stabellini

[permalink] [raw]

Subject: Re: [PATCH 2/2] xen/virtio: Avoid use of the dom0 backend in dom0

On Sat, 8 Jul 2023, Juergen Gross wrote:
> On 07.07.23 23:02, Stefano Stabellini wrote:
> > On Fri, 7 Jul 2023, Juergen Gross wrote:
> > > On 26.06.23 15:17, Petr Pavlu wrote:
> > > > On 6/21/23 19:58, Oleksandr Tyshchenko wrote:
> > > > > On 21.06.23 16:12, Petr Pavlu wrote:
> > > > > > When attempting to run Xen on a QEMU/KVM virtual machine with virtio
> > > > > > devices (all x86_64), dom0 tries to establish a grant for itself
> > > > > > which
> > > > > > eventually results in a hang during the boot.
> > > > > >
> > > > > > The backtrace looks as follows, the while loop in
> > > > > > __send_control_msg()
> > > > > > makes no progress:
> > > > > >
> > > > > > #0 virtqueue_get_buf_ctx (_vq=_vq@entry=0xffff8880074a8400,
> > > > > > len=len@entry=0xffffc90000413c94, ctx=ctx@entry=0x0
> > > > > > <fixed_percpu_data>)
> > > > > > at ../drivers/virtio/virtio_ring.c:2326
> > > > > > #1 0xffffffff817086b7 in virtqueue_get_buf
> > > > > > (_vq=_vq@entry=0xffff8880074a8400, len=len@entry=0xffffc90000413c94)
> > > > > > at
> > > > > > ../drivers/virtio/virtio_ring.c:2333
> > > > > > #2 0xffffffff8175f6b2 in __send_control_msg
> > > > > > (portdev=<optimized
> > > > > > out>, port_id=0xffffffff, event=0x0, value=0x1) at
> > > > > > ../drivers/char/virtio_console.c:562
> > > > > > #3 0xffffffff8175f6ee in __send_control_msg
> > > > > > (portdev=<optimized
> > > > > > out>, port_id=<optimized out>, event=<optimized out>,
> > > > > > value=<optimized
> > > > > > out>) at ../drivers/char/virtio_console.c:569
> > > > > > #4 0xffffffff817618b1 in virtcons_probe
> > > > > > (vdev=0xffff88800585e800)
> > > > > > at ../drivers/char/virtio_console.c:2098
> > > > > > #5 0xffffffff81707117 in virtio_dev_probe
> > > > > > (_d=0xffff88800585e810)
> > > > > > at ../drivers/virtio/virtio.c:305
> > > > > > #6 0xffffffff8198e348 in call_driver_probe
> > > > > > (drv=0xffffffff82be40c0
> > > > > > <virtio_console>, drv=0xffffffff82be40c0 <virtio_console>,
> > > > > > dev=0xffff88800585e810) at ../drivers/base/dd.c:579
> > > > > > #7 really_probe (dev=dev@entry=0xffff88800585e810,
> > > > > > drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > > > ../drivers/base/dd.c:658
> > > > > > #8 0xffffffff8198e58f in __driver_probe_device
> > > > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>,
> > > > > > dev=dev@entry=0xffff88800585e810) at ../drivers/base/dd.c:800
> > > > > > #9 0xffffffff8198e65a in driver_probe_device
> > > > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>,
> > > > > > dev=dev@entry=0xffff88800585e810) at ../drivers/base/dd.c:830
> > > > > > #10 0xffffffff8198e832 in __driver_attach
> > > > > > (dev=0xffff88800585e810,
> > > > > > data=0xffffffff82be40c0 <virtio_console>) at
> > > > > > ../drivers/base/dd.c:1216
> > > > > > #11 0xffffffff8198bfb2 in bus_for_each_dev (bus=<optimized
> > > > > > out>,
> > > > > > start=start@entry=0x0 <fixed_percpu_data>,
> > > > > > data=data@entry=0xffffffff82be40c0 <virtio_console>,
> > > > > > fn=fn@entry=0xffffffff8198e7b0 <__driver_attach>) at
> > > > > > ../drivers/base/bus.c:368
> > > > > > #12 0xffffffff8198db65 in driver_attach
> > > > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > > > ../drivers/base/dd.c:1233
> > > > > > #13 0xffffffff8198d207 in bus_add_driver
> > > > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > > > ../drivers/base/bus.c:673
> > > > > > #14 0xffffffff8198f550 in driver_register
> > > > > > (drv=drv@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > > > ../drivers/base/driver.c:246
> > > > > > #15 0xffffffff81706b47 in register_virtio_driver
> > > > > > (driver=driver@entry=0xffffffff82be40c0 <virtio_console>) at
> > > > > > ../drivers/virtio/virtio.c:357
> > > > > > #16 0xffffffff832cd34b in virtio_console_init () at
> > > > > > ../drivers/char/virtio_console.c:2258
> > > > > > #17 0xffffffff8100105c in do_one_initcall
> > > > > > (fn=0xffffffff832cd2e0
> > > > > > <virtio_console_init>) at ../init/main.c:1246
> > > > > > #18 0xffffffff83277293 in do_initcall_level
> > > > > > (command_line=0xffff888003e2f900 "root", level=0x6) at
> > > > > > ../init/main.c:1319
> > > > > > #19 do_initcalls () at ../init/main.c:1335
> > > > > > #20 do_basic_setup () at ../init/main.c:1354
> > > > > > #21 kernel_init_freeable () at ../init/main.c:1571
> > > > > > #22 0xffffffff81f64be1 in kernel_init (unused=<optimized out>)
> > > > > > at
> > > > > > ../init/main.c:1462
> > > > > > #23 0xffffffff81001f49 in ret_from_fork () at
> > > > > > ../arch/x86/entry/entry_64.S:308
> > > > > > #24 0x0000000000000000 in ?? ()
> > > > > >
> > > > > > Fix the problem by preventing xen_grant_init_backend_domid() from
> > > > > > setting dom0 as a backend when running in dom0.
> > > > > >
> > > > > > Fixes: 035e3a4321f7 ("xen/virtio: Optimize the setup of
> > > > > > "xen-grant-dma"
> > > > > > devices")
> > > > >
> > > > >
> > > > > I am not 100% sure whether the Fixes tag points to precise commit. If
> > > > > I
> > > > > am not mistaken, the said commit just moves the code in the context
> > > > > without changing the logic of CONFIG_XEN_VIRTIO_FORCE_GRANT, this was
> > > > > introduced before.
> > > >
> > > > I see, the tag should better point to 7228113d1fa0 ("xen/virtio: use
> > > > dom0 as default backend for CONFIG_XEN_VIRTIO_FORCE_GRANT") which
> > > > introduced the original logic to use dom0 as backend.
> > > >
> > > > Commit 035e3a4321f7 ("xen/virtio: Optimize the setup of "xen-grant-dma"
> > > > devices") is relevant in sense that it extended when this logic is
> > > > active by adding an OR check for xen_pv_domain().
> > > >
> > > > >
> > > > >
> > > > > > Signed-off-by: Petr Pavlu <[email protected]>
> > > > > > ---
> > > > > > drivers/xen/grant-dma-ops.c | 4 +++-
> > > > > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/xen/grant-dma-ops.c
> > > > > > b/drivers/xen/grant-dma-ops.c
> > > > > > index 76f6f26265a3..29ed27ac450e 100644
> > > > > > --- a/drivers/xen/grant-dma-ops.c
> > > > > > +++ b/drivers/xen/grant-dma-ops.c
> > > > > > @@ -362,7 +362,9 @@ static int xen_grant_init_backend_domid(struct
> > > > > > device *dev,
> > > > > > if (np) {
> > > > > > ret = xen_dt_grant_init_backend_domid(dev, np,
> > > > > > backend_domid);
> > > > > > of_node_put(np);
> > > > > > - } else if (IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) ||
> > > > > > xen_pv_domain()) {
> > > > > > + } else if ((IS_ENABLED(CONFIG_XEN_VIRTIO_FORCE_GRANT) ||
> > > > > > + xen_pv_domain()) &&
> > > > > > + !xen_initial_domain()) {
> > > > >
> > > > > The commit lgtm, just one note:
> > > > >
> > > > >
> > > > > I would even bail out early in xen_virtio_restricted_mem_acc()
> > > > > instead,
> > > > > as I assume the same issue could happen on Arm with DT (although there
> > > > > we don't guess the backend's domid, we read it from DT and quite
> > > > > unlikely we get Dom0 being in Dom0 with correct DT).
> > > > >
> > > > > Something like:
> > > > >
> > > > > @@ -416,6 +421,10 @@ bool xen_virtio_restricted_mem_acc(struct
> > > > > virtio_device *dev)
> > > > > {
> > > > > domid_t backend_domid;
> > > > >
> > > > > + /* Xen grant DMA ops are not used when running as initial
> > > > > domain
> > > > > */
> > > > > + if (xen_initial_domain())
> > > > > + return false;
> > > > > +
> > > > > if (!xen_grant_init_backend_domid(dev->dev.parent,
> > > > > &backend_domid)) {
> > > > > xen_grant_setup_dma_ops(dev->dev.parent,
> > > > > backend_domid);
> > > > > return true;
> > > > > (END)
> > > > >
> > > > >
> > > > >
> > > > > If so, that commit subject would need to be updated accordingly.
> > > > >
> > > > > Let's see what other reviewers will say.
> > > >
> > > > Ok, makes sense.
> > >
> > > I think this is okay for a fix of the current problem.
> > >
> > > Passing through virtio devices to a PV domU is not covered by this fix,
> > > but
> > > this
> > > should be a rather rare configuration, which doesn't work today either. So
> > > the
> > > suggested patch would fix the current issue without introducing a
> > > regression.
> > >
> > > Anything else can be done later.
> >
> > Why do you say that passing through virtio devices to a PV domU doesn't
> > work today anyway? Also, as you know many people use Xen outside of
> > datacenter deployments (laptops, embedded etc.) where drivers domains
> > and device assignment are very common. You could assign a virtio network
> > card to a domU and use PV network to share the network with other
> > guests. Physical virtio devices, especially virtio-net devices, exist. I
> > could probably repro this problem today in a domU just installing
> > QubesOS inside QEMU. QubesOS uses network driver domains and if QEMU
> > provides a virtio-net network card, this would break even with this
> > patch.
>
> I might be wrong, but I don't think all virtio frontends will work in that
> scenario. The main reason is the PFN/MFN difference: a frontend using guest
> consecutive memory for doing large I/Os will fail miserably. This was the
> main reason why I had to add the functionality of consecutive grants for
> large I/O buffers. The same goes for multi-page virtio ring pages.

I think for PV guests the virtio frontends would work but they might be
slow as they would have to bounce over swiotlb-xen every multi-page
buffer. But the virtio frontends should work OK for PVH and HVM guests.