2007-11-06 17:16:18

by Anthony Liguori

Subject: Use of virtio device IDs

Hi Rusty,

I've written a PCI virtio transport and noticed something strange. All
current in-tree virtio devices register ID tables that match a specific
device ID, but any vendor ID.
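
For illustration, the registration pattern looks roughly like this in
the current in-tree drivers (a sketch; the device ID here is the virtio
net ID, and VIRTIO_DEV_ANY_ID wildcards the vendor):

/* Sketch of the current in-tree pattern: match a specific virtio
 * device ID (here 1, the net device), but accept any vendor. */
static struct virtio_device_id id_table[] = {
	{ 1 /* VIRTIO_ID_NET */, VIRTIO_DEV_ANY_ID },
	{ 0 },
};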

This is incompatible with using PCI vendor/device IDs for virtio
vendor/device IDs since vendors control what device IDs mean. A simple
solution would be to assign a fixed vendor ID to all current virtio
devices. This doesn't solve the problem completely though since you
would create a conflict between the PCI vendor ID space and the virtio
vendor ID space.

The only solutions seem to be virtualizing the virtio vendor/device IDs
(which is what I'm currently doing) or to mandate that the virtio vendor
ID be within the PCI vendor ID space. It's probably not necessary to
make the same requirement for device IDs though.

What are your thoughts?

Regards,

Anthony Liguori


2007-11-06 18:49:56

by Anthony Liguori

Subject: Re: Use of virtio device IDs

Anthony Liguori wrote:
> Hi Rusty,
>
> I've written a PCI virtio transport and noticed something strange.
> All current in-tree virtio devices register ID tables that match a
> specific device ID, but any vendor ID.
>
> This is incompatible with using PCI vendor/device IDs for virtio
> vendor/device IDs since vendors control what device IDs mean. A
> simple solution would be to assign a fixed vendor ID to all current
> virtio devices. This doesn't solve the problem completely though
> since you would create a conflict between the PCI vendor ID space and
> the virtio vendor ID space.
>
> The only solutions seem to be virtualizing the virtio vendor/device
> IDs (which is what I'm currently doing) or to mandate that the virtio
> vendor ID be within the PCI vendor ID space. It's probably not
> necessary to make the same requirement for device IDs though.

There's another ugly bit in the current implementation.

Right now, we would have to have every PCI vendor/device ID pair in the
virtio PCI driver ID table for every virtio device.

This means every time a virtio device is added to Linux, the virtio PCI
driver has to be modified (assuming that each virtio device uses a
unique PCI vendor/device ID) :-/
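
To make this concrete, the transport's ID table would have to grow an
entry per device type, something like this sketch (the ID numbers are
hypothetical):

/* Hypothetical: one PCI ID pair per virtio device type, all of which
 * the transport driver must know about in advance. */
static struct pci_device_id virtio_pci_id_table[] = {
	{ PCI_DEVICE(0x5002, 0x2258) },	/* hypothetical virtio-net */
	{ PCI_DEVICE(0x5002, 0x2259) },	/* hypothetical virtio-blk */
	/* ...a new entry for every new virtio device... */
	{ 0 },
};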

Regards,

Anthony Liguori

> What are your thoughts?
>
> Regards,
>
> Anthony Liguori
>

2007-11-07 03:39:35

by Gregory Haskins

Subject: Re: Use of virtio device IDs

Anthony Liguori wrote:

>
> Right now, we would have to have every PCI vendor/device ID pair in the
> virtio PCI driver ID table for every virtio device.

I realize you guys are probably far down this road in the design
process, but FWIW: this is a major motivation for why the IOQ stuff I
posted a while back used strings for device identification instead of a
fixed-length, centrally managed namespace like PCI vendor/device IDs.
Then you can just name your device something reasonably unique (e.g.
"qumranet::veth" or "ibm-pvirt-clock").

(I realize that if you are going to do PCI, you need to make it
PCI-like. But I think using PCI in the first place is probably the
wrong direction. IMHO, there's really not a lot of reason to be
constrained by a hardware specification once you decide to go PV. This
is even more true if you want to support as many platforms as possible,
i.e. platforms that don't have PCI natively.)

Regards,
-Greg

2007-11-07 05:40:39

by Avi Kivity

Subject: Re: Use of virtio device IDs

Gregory Haskins wrote:
> Anthony Liguori wrote:
>
>
>> Right now, we would have to have every PCI vendor/device ID pair in the
>> virtio PCI driver ID table for every virtio device.
>>
>
> I realize you guys are probably far down this road in the design
> process,

That doesn't mean we can't change it if it's wrong.

> but FWIW: This is a major motivation for the reason that the
> IOQ stuff I posted a while back used strings for device identification
> instead of a fixed length, centrally managed namespace like PCI
> vendor/dev-id. Then you can just name your device something reasonably
> unique (e.g. "qumranet::veth", or "ibm-pvirt-clock").
>

I dislike strings. They make it look as if you have a nice extensible
interface, where in reality you have a poorly documented interface which
leads to poor interoperability.

I prefer a nice structure where you can see all the limitations
immediately.

> (I realize that if you are going to do PCI, you need to make it
> PCI-like. But I think using PCI in the first place is probably the
> wrong direction. IMHO, there's really not a lot of reason to be
> constrained by a hardware specification once you decide to go PV. This
> is even more true if you want to support as many platforms as possible,
> i.e. platforms that don't have PCI natively.)
>
>

PCI means that you can reuse all of the platform's infrastructure for
irq allocation, discovery, device hotplug, and management. You can
write it for new guests but backporting it to older guests will be a
huge task.
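
As a rough sketch of what that reuse buys (vp_interrupt here is a
placeholder handler): by the time probe() runs, the PCI core has
already discovered the device and routed its interrupt:

static irqreturn_t vp_interrupt(int irq, void *opaque);	/* elsewhere */

static int __devinit vp_probe(struct pci_dev *pdev,
			      const struct pci_device_id *id)
{
	int err;

	/* Discovery, BAR assignment, and IRQ routing were all done by
	 * the platform's PCI code; we just switch the device on. */
	err = pci_enable_device(pdev);
	if (err)
		return err;

	err = request_irq(pdev->irq, vp_interrupt, IRQF_SHARED,
			  "virtio-pci", pdev);
	if (err)
		pci_disable_device(pdev);
	return err;
}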

We will support non-PCI for s390, but in order to support Windows and
older Linux, PCI is necessary.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-11-07 06:09:26

by Rusty Russell

Subject: Re: Use of virtio device IDs

On Wednesday 07 November 2007 16:40:13 Avi Kivity wrote:
> Gregory Haskins wrote:
> > but FWIW: This is a major motivation for the reason that the
> > IOQ stuff I posted a while back used strings for device identification
> > instead of a fixed length, centrally managed namespace like PCI
> > vendor/dev-id. Then you can just name your device something reasonably
> > unique (e.g. "qumranet::veth", or "ibm-pvirt-clock").
>
> I dislike strings. They make it look as if you have a nice extensible
> interface, where in reality you have a poorly documented interface which
> leads to poor interoperability.

Yes, you end up with exactly names like "qumranet::veth"
and "ibm-pvirt-clock". I would recommend looking very hard at /proc, Open
Firmware on a modern system, or the Xen store, to see what a lack of
limitation can do to you :)

> We will support non-pci for s390, but in order to support Windows and
> older Linux PCI is necessary.

The aim is that PCI support is clean, but that we're not really tied to PCI.
I think we're getting closer with the recent config changes.

Cheers,
Rusty.

2007-11-07 06:29:17

by Anthony Liguori

Subject: Re: Use of virtio device IDs

Rusty Russell wrote:
> On Wednesday 07 November 2007 16:40:13 Avi Kivity wrote:
>
>> Gregory Haskins wrote:
>>
>>> but FWIW: This is a major motivation for the reason that the
>>> IOQ stuff I posted a while back used strings for device identification
>>> instead of a fixed length, centrally managed namespace like PCI
>>> vendor/dev-id. Then you can just name your device something reasonably
>>> unique (e.g. "qumranet::veth", or "ibm-pvirt-clock").
>>>
>> I dislike strings. They make it look as if you have a nice extensible
>> interface, where in reality you have a poorly documented interface which
>> leads to poor interoperability.
>>
>
> Yes, you end up with exactly names like "qumranet::veth"
> and "ibm-pvirt-clock". I would recommend looking very hard at /proc, Open
> Firmware on a modern system, or the Xen store, to see what a lack of
> limitation can do to you :)
>
>
>> We will support non-pci for s390, but in order to support Windows and
>> older Linux PCI is necessary.
>>
>
> The aim is that PCI support is clean, but that we're not really tied to PCI.
> I think we're getting closer with the recent config changes.
>

Yes, my main desire was to ensure that we had a clean PCI ABI that would
be natural to implement on a platform like Windows. I think with the
recent config_ops refactoring, we can now do that.
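
For reference, the rough shape of that indirection (a sketch; names
approximate): each transport supplies its own accessors, so the PCI
version can be implemented naturally on any guest OS:

/* Approximate sketch: the virtio core only talks to a device through
 * these ops, so a PCI transport can back them with PCI accesses while
 * another transport does something else entirely. */
struct virtio_config_ops {
	void (*get)(struct virtio_device *vdev, unsigned offset,
		    void *buf, unsigned len);
	void (*set)(struct virtio_device *vdev, unsigned offset,
		    const void *buf, unsigned len);
	u8 (*get_status)(struct virtio_device *vdev);
	void (*set_status)(struct virtio_device *vdev, u8 status);
	struct virtqueue *(*find_vq)(struct virtio_device *vdev,
				     unsigned index,
				     void (*callback)(struct virtqueue *vq));
	void (*del_vq)(struct virtqueue *vq);
};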

Regards,

Anthony Liguori

> Cheers,
> Rusty.
>

2007-11-07 17:33:14

by Anthony Liguori

Subject: Re: Use of virtio device IDs

Rusty Russell wrote:
> On Wednesday 07 November 2007 16:40:13 Avi Kivity wrote:
>
>> Gregory Haskins wrote:
>>
>>> but FWIW: This is a major motivation for the reason that the
>>> IOQ stuff I posted a while back used strings for device identification
>>> instead of a fixed length, centrally managed namespace like PCI
>>> vendor/dev-id. Then you can just name your device something reasonably
>>> unique (e.g. "qumranet::veth", or "ibm-pvirt-clock").
>>>
>> I dislike strings. They make it look as if you have a nice extensible
>> interface, where in reality you have a poorly documented interface which
>> leads to poor interoperability.
>>
>
> Yes, you end up with exactly names like "qumranet::veth"
> and "ibm-pvirt-clock". I would recommend looking very hard at /proc, Open
> Firmware on a modern system, or the Xen store, to see what a lack of
> limitation can do to you :)
>

FWIW, I've switched to using the PCI subsystem vendor/device IDs for
virtio, as Rusty suggested. I think this makes even more sense than
using the main vendor/device ID, since I think we should use only a
single vendor/device ID for all virtio PCI devices and then
differentiate based on the subsystem IDs.
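
Concretely (the ID numbers below are hypothetical), the transport then
claims a single vendor/device pair and reads the virtio identity out of
the subsystem fields in its probe():

/* Sketch, hypothetical IDs: one PCI vendor/device pair covers all
 * virtio devices; the subsystem IDs carry the virtio vendor/device. */
static struct pci_device_id virtio_pci_ids[] = {
	{ 0x5002, 0x2258, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 },
	{ 0 },
};

/* ...and in the transport's probe routine: */
vp_dev->vdev.id.vendor = pdev->subsystem_vendor;
vp_dev->vdev.id.device = pdev->subsystem_device;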

Regards,

Anthony Liguori

>> We will support non-pci for s390, but in order to support Windows and
>> older Linux PCI is necessary.
>>
>
> The aim is that PCI support is clean, but that we're not really tied to PCI.
> I think we're getting closer with the recent config changes.
>
> Cheers,
> Rusty.
>

2007-11-07 20:39:54

by Gregory Haskins

Subject: Re: Use of virtio device IDs

Avi Kivity wrote:
>
> I dislike strings. They make it look as if you have a nice extensible
> interface, where in reality you have a poorly documented interface which
> leads to poor interoperability.

It's not really a full-fledged interface, but rather just a simple ID
mechanism: a decentralized one with less administrative burden.

On the flip side, a centralized namespace has the advantage of
controlling collisions at the expense of administrative overhead. After
designing systems both ways in the past, I prefer to reduce the admin
burden, but that is just me.


> PCI means that you can reuse all of the platform's infrastructure for
> irq allocation, discovery, device hotplug, and management.

It's tempting to use, yes. However, most of that infrastructure is
completely inappropriate for a PV implementation, IMHO. You are
probably better off designing something that is PV-specific instead of
shoehorning it in to fit a different model (at least for the things I
have in mind). It's not a heck of a lot of code to write a pv-centric
version of these facilities.

> You can write it for new guests but backporting it to older guests will be a
> huge task.
>
> We will support non-pci for s390, but in order to support Windows and
> older Linux PCI is necessary.

I don't know if I would agree with "necessary". "Easier" perhaps. ;) By
definition, once you are PV you are hypervisor-aware. Now it's just a
matter of plugging in the appropriate plumbing to bridge the hypervisor
to the guest OS. Some might be easier than others, sure. But all
should be extensible to a degree.

But I digress. I haven't really had much of a chance to follow the
latest developments here as I have been lost in -rt land for a few
months now. But I know Anthony and Rusty are top-notch, so I'm sure you
guys have it under control. Hopefully, one day soon I will be able to
join you guys again (perhaps to the KVM team's dismay ;).

Regards,
-Greg


2007-11-08 06:37:27

by Avi Kivity

Subject: Re: Use of virtio device IDs

Gregory Haskins wrote:
>
>> PCI means that you can reuse all of the platform's infrastructure for
>> irq allocation, discovery, device hotplug, and management.
>>
>
> It's tempting to use, yes. However, most of that infrastructure is
> completely inappropriate for a PV implementation, IMHO.

Why?

> You are
> probably better off designing something that is PV specific instead of
> shoehorning it in to fit a different model (at least for the things I
> have in mind).

Well, if we design our pv devices to look like hardware, they will fit
quite well. Both to the guest OS and to user's expectations.

> It's not a heck of a lot of code to write a pv-centric
> version of these facilities.
>
>

It is. Especially if you consider Windows and a gazillion versions of
deployed, non-pv-capable Linux systems. For pv-friendly newer Linux,
it's probably doable, but why?

Look at the mess Xen finds itself in.

>> You can write it for new guests but backporting it to older guests will be a
>> huge task.
>>
>> We will support non-pci for s390, but in order to support Windows and
>> older Linux PCI is necessary.
>>
>
> I don't know if I would agree with "necessary". "Easier" perhaps. ;) By
> definition, once you are PV you are hypervisor-aware. Now it's just a
> matter of plugging in the appropriate plumbing to bridge the hypervisor
> to the guest-os. Some might be easier than others, sure. But all
> should be extensible to a degree.
>
>

It's "necessary" in a pragmatic sense: we want to deliver drivers that
provide features for a wide variety of guests in a reasonable
timeframe. And that means no rewriting guest OS infrastructure.


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-11-08 09:18:21

by Gerd Hoffmann

Subject: Re: Use of virtio device IDs

Avi Kivity wrote:
>> You are
>> probably better off designing something that is PV specific instead of
>> shoehorning it in to fit a different model (at least for the things I
>> have in mind).
>
> Well, if we design our pv devices to look like hardware, they will fit
> quite well. Both to the guest OS and to user's expectations.

Disclaimer: Haven't looked at the virtio code much.

I think we should keep the door open for both models and not nail the
virtio infrastructure to either of them.

For pure PV devices I don't see the point in trying to squeeze them
into the PCI model. Also, s390 has no PCI, so there effectively is no
way around it: we must be able to have some pure virtual bus like
xenbus.

IMHO the PCI model is most useful for emulated devices with an optional
PV path. Take the usual emulated piix3 IDE controller: give it an
additional PV mode, so the guest can drive it in PV mode if it knows
about it, and use the generic piix IDE driver if it doesn't. That kind
of device we probably want to identify by PCI subsystem ID.
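
A sketch of what I mean, with made-up subsystem IDs and a hypothetical
helper; the driver checks for the PV capability and otherwise keeps
driving the device as plain emulated hardware:

/* Sketch, hypothetical IDs: probe the subsystem IDs to see whether
 * this emulated controller also offers a PV fast path. */
static int piix_has_pv_mode(struct pci_dev *pdev)
{
	u16 subven, subdev;

	pci_read_config_word(pdev, PCI_SUBSYSTEM_VENDOR_ID, &subven);
	pci_read_config_word(pdev, PCI_SUBSYSTEM_ID, &subdev);
	return subven == 0x5ace && subdev == 0x0001;
}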

>> Its not a heck of a lot of code to write a pv-centric
>> version of these facilities.
>
> It is. Especially if you consider Windows and a gazillion versions of
> deployed, non-pv-capable Linux systems. For pv-friendly newer Linux,
> it's probably doable, but why?
>
> Look at the mess Xen finds itself in.

Uhm, well, yea. Guess you are referring to the pv-on-hvm drivers. Been
there, dealt with it. What exactly do you think is messy there?

IMHO the messiest thing is the boot problem: the HVM BIOS can't deal
with PV disks, so you can't boot with PV disks only. This is "fixed" by
having the (boot) disk twice in the system, once via emulated IDE and
once as a PV disk. Ouch.

Other than that I don't see major problems with the virtual bus model.

> It's "necessary" in a pragmatic sense: we want to deliver drivers that
> provide features for a wide variety of guests in a reasonable
> timeframe. And that means no rewriting guest OS infrastructure.

At least for any udev-based Linux distro there is no need to rewrite
any guest OS infrastructure. Your virtual bus driver needs a proper
uevent callback, the virtual device drivers need a module alias, and
you are done: udev autoloads your virtual device driver modules nicely,
without any distro tool hacking or config file writing.
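
Roughly like this (a sketch assuming the 2.6.23-style uevent API; the
alias format is just whatever the bus defines, shown here in a
virtio-like style):

/* Sketch: the bus emits a MODALIAS uevent built from the device IDs. */
static int virtual_bus_uevent(struct device *dev,
			      struct kobj_uevent_env *env)
{
	struct virtio_device *vdev =
		container_of(dev, struct virtio_device, dev);

	return add_uevent_var(env, "MODALIAS=virtio:d%08Xv%08X",
			      vdev->id.device, vdev->id.vendor);
}

/* ...and each device driver declares a matching alias, e.g.: */
MODULE_ALIAS("virtio:d00000001v*");	/* hypothetical: the net device */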

cheers,
Gerd

2007-11-08 16:41:17

by Anthony Liguori

Subject: Re: Use of virtio device IDs

Gerd Hoffmann wrote:
> Avi Kivity wrote:
>
>>> You are
>>> probably better off designing something that is PV specific instead of
>>> shoehorning it in to fit a different model (at least for the things I
>>> have in mind).
>>>
>> Well, if we design our pv devices to look like hardware, they will fit
>> quite well. Both to the guest OS and to user's expectations.
>>
>
> Disclaimer: Haven't looked at the virtio code much.
>
> I think we should keep the door open for both models and don't nail the
> virtio infrastructure to one of them.
>
> For pure pv devices I don't see the point in trying to squeeze it into
> the PCI model. Also s390 has no PCI, so there effectively is no way
> around that, we must be able to have some pure virtual bus like xenbus.
>

I don't really agree with this assessment. There is no performance
advantage to using a pure virtual bus. If you have a pure PV device
that looks and acts like a PCI device, then besides the obvious
advantage of easy portability to other guest OSes (since everything
supports PCI, whereas porting XenBus, even to Linux 2.4.x, was a royal
pain), it is also very easy to support the device on other VMMs.

For instance, the PCI device that I just posted would allow virtio
devices to be used trivially with HVM on Xen. In fact, once the
backends are complete and merged into QEMU, the next time Xen rebases
QEMU they'll get the virtio PV-on-HVM drivers for free. To me, that's a
pretty significant advantage.

> Uhm, well, yea. Guess you are referring to the pv-on-hvm drivers. Been
> there, dealt with it. What exactly do you think is messy there?
>
> IMHO the most messy thing is the boot problem. hvm bios can't deal with
> pv disks, so you can't boot with pv disks only. "fixed" by having the
> (boot) disk twice in the system, once via emulated ide, once as pv disk.
> Ouch.
>

I have actually addressed this problem with a PV option ROM for QEMU.
I expect to get time to submit the QEMU patches by the end of the year.
See http://hg.codemonkey.ws/extboot

Regards,

Anthony Liguori

2007-11-13 13:19:54

by Gregory Haskins

Subject: Re: Use of virtio device IDs

Avi Kivity wrote:
> Gregory Haskins wrote:
>>> PCI means that you can reuse all of the platform's infrastructure for
>>> irq allocation, discovery, device hotplug, and management.
>>>
>> It's tempting to use, yes. However, most of that infrastructure is
>> completely inappropriate for a PV implementation, IMHO.
>
> Why?

(Sorry for the delay)

Since PCI was designed as a hardware solution it has all kinds of stuff
specifically geared towards hardware constraints. Those constraints
are different in a virtualized platform, so some things do not translate
well to an optimal solution. Half of the stuff wouldn't be used, and
the other half has some nasty baggage associated with it (like still
requiring partial emulation in a PV environment).

The point of PV, of course, is high-performance guest/host interfaces,
and PCI baggage just gets in the way in many respects. Once a
particular guest subsystem is HV-aware, we no longer strictly require
legacy emulation; it should know how to talk to the host using whatever
is appropriate. I aim to strip out all of those emulation points
whenever possible. Once you do that, the number of PCI features that
we would actually use drops to zero.

On the flip side, we have full emulation if you want broader
compatibility with legacy guests, at the expense of performance.

>
>> You are
>> probably better off designing something that is PV specific instead of
>> shoehorning it in to fit a different model (at least for the things I
>> have in mind).
>
> Well, if we design our pv devices to look like hardware, they will fit
> quite well. Both to the guest OS and to user's expectations.

Like what hardware? Like PCI hardware? What if the platform in
question doesn't have PCI? Also note that devices don't have to look
like emulated PCI devices per se to look like hardware to the
guest-OS/user. That is just one way to do it.

>
>> It's not a heck of a lot of code to write a pv-centric
>> version of these facilities.
>>
>>
>
> It is.

After having done it in the past, I disagree. But it sounds like you
are lumping core-pv and io-pv together. To be clear, I am not. I agree
that core-pv is invasive and not legacy-friendly. io-pv is different,
however, and generally can be retrofitted to an OS in the same way that
support for an arbitrary device X over subsystem Y (PCI, USB, pv, etc.)
can be, e.g. you load a driver for the subsystem.

> Especially if you consider Windows and a gazillion versions of
> deployed, non-pv-capable Linux systems.

I am ;)

> For pv-friendly newer Linux,
> it's probably doable, but why?
>
> Look at the mess Xen finds itself in.

I see hanging our hats on PCI as creating a mess for KVM/virtio ;)

>
>>> You can write it for new guests but backporting it to older guests will be a
>>> huge task.
>>>
>>> We will support non-pci for s390, but in order to support Windows and
>>> older Linux PCI is necessary.
>>>
>> I don't know if I would agree with "necessary". "Easier" perhaps. ;) By
>> definition, once you are PV you are hypervisor-aware. Now it's just a
>> matter of plugging in the appropriate plumbing to bridge the hypervisor
>> to the guest-os. Some might be easier than others, sure. But all
>> should be extensible to a degree.
>>
>>
>
> It's "necessary" in a pragmatic sense: we want to deliver drivers that
> provide features for a wide variety of guests in a reasonable
> timeframe. And that means no rewriting guest OS infrastructure.
>

I guess what I am really trying to say in all this is: I would be
careful about painting KVM into a PCI corner. If you want to "render" a
view of PV devices as PCI for platforms that can utilize it, there is
probably no harm in that.

However, I believe having things be PCI-centric, especially in the long
term, will easily turn into a mistake for the project. I don't think
it's going to be as useful as you think it is, and then we might find
ourselves in a "backwards compatibility maintenance" situation to
support all these decisions. If the current architecture is already
shaping up to be PCI-neutral, great! (I haven't had a chance to follow
lately.) If not, I think we should reconsider before things get too
messy.

Regards,
-Greg

2007-11-13 13:56:50

by Zachary Amsden

Subject: Re: Use of virtio device IDs

On Tue, 2007-11-13 at 08:18 -0500, Gregory Haskins wrote:

> Since PCI was designed as a hardware solution it has all kinds of stuff
> specifically geared towards hardware constraints. Those constraints
> are different in a virtualized platform, so some things do not translate
> well to an optimal solution. Half of the stuff wouldn't be used, and
> the other half has some nasty baggage associated with it (like still
> requiring partial emulation in a PV environment).
>
> The point of PV, of course, is high performance guest/host interfaces,
> and PCI baggage just gets in the way in many respects. Once a

I would tend to disagree with that statement. The point of PV is a
simpler-to-implement guest/host interface, which sometimes results in a
higher-performance interface. PV does not always mean high performance,
nor does high performance imply that PV is necessary.

> particular guest's subsystem is HV aware we no longer strictly require
> legacy emulation. It should know how to talk to the host using whatever
> is appropriate. I aim to strip out all of those emulation points
> whenever possible. Once you do that, all of the PCI features that we
> would use drops to zero.

Device discovery, bus enumeration and shared memory configuration space
mapping are all very useful features that require complex negotiation
with the hypervisor. Those are provided implicitly in a standardized
way by PCI (or by any good hardware bus protocol).

> On the flip side, we have full emulation if you want broader
> compatibility with legacy, at the expense of performance.

There is no reason you need to sacrifice performance for broader
compatibility with well-designed virtual devices on modern hardware.

>
> >
> >> You are
> >> probably better off designing something that is PV specific instead of
> >> shoehorning it in to fit a different model (at least for the things I
> >> have in mind).
> >
> > Well, if we design our pv devices to look like hardware, they will fit
> > quite well. Both to the guest OS and to user's expectations.
>
> Like what hardware? Like PCI hardware? What if the platform in
> question doesn't have PCI? Also note that devices don't have to look
> like emulated PCI devices per se to look like hardware to the
> guest-OS/user. That is just one way to do it.

What if the platform in question does have PCI? How are you going to
write drivers for non-Linux guests? Design a new bus protocol and
driver system for PV-only devices which can't run anywhere except for a
couple of selected guests, or design along the lines of what the
physical hardware on your platform actually looks like? It's not like
you can construct a full-featured virtualization of x86 without
implementing PCI at some level anyway.

Note this is not an argument for PCI. It is an argument for devices
that look like hardware on whatever platform you are virtualizing. On
s390, that might be a bit different than on x86. But the key idea is to
reuse the platform architecture as much as possible. This gets you far
more code sharing and interoperability for devices.

Just because you can paravirtualize everything does not mean it is a
good idea or more efficient. A good high-performance "paravirtualized"
network driver only needs one efficient place to trap to the
hypervisor: kicking off a TX queue. Nothing else needs to be
paravirtualized to make this efficient, and now you have a driver that
is fairly easily ported among different operating systems because it
reuses many architectural primitives.
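
A sketch of that single trap point, with a hypothetical register layout
and device structure:

/* Sketch, hypothetical layout: the only hypervisor trap on the TX path
 * is a single I/O write telling the host to drain the TX ring. */
#define PVNET_REG_NOTIFY	0x10	/* hypothetical notify register */
#define PVNET_QUEUE_TX		1	/* hypothetical TX queue index */

struct pvnet_dev {
	void __iomem *ioaddr;		/* mapped notification BAR */
};

static void pvnet_tx_kick(struct pvnet_dev *vp)
{
	iowrite16(PVNET_QUEUE_TX, vp->ioaddr + PVNET_REG_NOTIFY);
}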

So I think it is a good thing for virtio to allow coupling to a PCI
device backend for x86, but be generic enough to allow coupling to other
backends for non-PCI architectures. Perhaps the top-level device code
and TX/RX queues can be re-used, although I'm not convinced sharing at
this layer gives so much benefit in terms of raw lines of code.

>
> >
> >> It's not a heck of a lot of code to write a pv-centric
> >> version of these facilities.
> >>
> >>
> >
> > It is.
>
> After having done it in the past, I disagree. But it sounds like you

What about XenBus? Device discovery and configuration are a huge amount
of work, especially when done without a standard to work from.

Just my $0.02.

Zach