2009-12-07 18:53:30

by Gregory Haskins

Subject: [GIT PULL] AlacrityVM guest drivers for 2.6.33

Hi Linus,

Please pull AlacrityVM guest support for 2.6.33 from:

git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
for-linus

All of these patches have stewed in linux-next for quite a while now:

Gregory Haskins (26):
shm-signal: shared-memory signals
ioq: Add basic definitions for a shared-memory, lockless queue
vbus: add a "vbus-proxy" bus model for vbus_driver objects
vbus-proxy: add a pci-to-vbus bridge
ioq: add driver-side vbus helpers
net: add vbus_enet driver
venet: Update maintainer
venet: fix gso.hdr_len to report correct length
venet: add pre-mapped tx descriptor feature
venet: report actual used descriptor size
venet: cache the ringlen values at init
venet: add eventq protocol
venet: use an skblist for outstanding descriptors
venet: add a tx-complete event for out-of-order support
venet: add Layer-4 Reassembler Offload (L4RO) support
vbus: allow shmsignals to be named
vbus: register shm-signal events as standard Linux IRQ vectors
net: fix vbus-enet Kconfig dependencies
venet: fix locking issue with dev_kfree_skb()
vbus: fix kmalloc() from interrupt context to use GFP_ATOMIC
fix irq resource leak
vbus: remove create_irq() references from the pcibridge
vbus: make library code properly declared as GPL
venet: add missing ethtool include
vbus: add autoprobe capability to guest
vbus: fix pcibridge busmaster support

Jaswinder Singh Rajput (1):
ioq: includecheck fix

Patrick Mullaney (1):
vbus-enet: fix l4ro pool non-atomic allocations in softirq context

Rakib Mullick (1):
vbus: Fix section mismatch warnings in pci-bridge.c

Randy Dunlap (2):
vbus-proxy also uses ioq, so it should select IOQ.
Eliminate all cast warnings in vbus-enet.c and pci-bridge.c.

Thadeu Lima de Souza Cascardo (1):
trivial: fix a typo in SHM_SIGNAL config description

MAINTAINERS | 25 +
arch/x86/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/net/Kconfig | 14 +
drivers/net/Makefile | 1 +
drivers/net/vbus-enet.c | 1560 +++++++++++++++++++++++++++++++++++++++++++
drivers/vbus/Kconfig | 25 +
drivers/vbus/Makefile | 6 +
drivers/vbus/bus-proxy.c | 247 +++++++
drivers/vbus/pci-bridge.c | 1015 ++++++++++++++++++++++++++++
include/linux/Kbuild | 4 +
include/linux/ioq.h | 414 ++++++++++++
include/linux/shm_signal.h | 189 ++++++
include/linux/vbus_driver.h | 83 +++
include/linux/vbus_pci.h | 145 ++++
include/linux/venet.h | 133 ++++
lib/Kconfig | 21 +
lib/Makefile | 2 +
lib/ioq.c | 300 +++++++++
lib/shm_signal.c | 196 ++++++
20 files changed, 4383 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/vbus-enet.c
create mode 100644 drivers/vbus/Kconfig
create mode 100644 drivers/vbus/Makefile
create mode 100644 drivers/vbus/bus-proxy.c
create mode 100644 drivers/vbus/pci-bridge.c
create mode 100644 include/linux/ioq.h
create mode 100644 include/linux/shm_signal.h
create mode 100644 include/linux/vbus_driver.h
create mode 100644 include/linux/vbus_pci.h
create mode 100644 include/linux/venet.h
create mode 100644 lib/ioq.c
create mode 100644 lib/shm_signal.c



2009-12-18 21:51:29

by Ingo Molnar

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Gregory Haskins <[email protected]> wrote:

> Hi Linus,
>
> Please pull AlacrityVM guest support for 2.6.33 from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
> for-linus
>
> All of these patches have stewed in linux-next for quite a while now:
>
> Gregory Haskins (26):

I think it would be fair to point out that these patches have been objected
to by the KVM folks quite extensively, on multiple technical grounds - as
basically this tree forks the KVM driver space, and no valid technical
reason for doing so could be offered by you in a discussion spanning more
than 100 mails.

(And yes, I've been Cc:-ed on much of that thread.)

The result will IMO be pain for users because now we'll have two frameworks,
tooling incompatibilities, etc. etc.

I've extended the Cc: for the KVM folks to have a chance to reply. Please try
_much_ harder to work with the KVM folks instead of ignoring their feedback
and de-facto forking their project (and not mentioning any of this in your
pull request). We should unify, not fracture.

Ingo

2009-12-21 15:34:35

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/18/09 4:51 PM, Ingo Molnar wrote:
>
> * Gregory Haskins <[email protected]> wrote:
>
>> Hi Linus,
>>
>> Please pull AlacrityVM guest support for 2.6.33 from:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
>> for-linus
>>
>> All of these patches have stewed in linux-next for quite a while now:
>>
>> Gregory Haskins (26):
>
> I think it would be fair to point out that these patches have been objected to
> by the KVM folks quite extensively,

Actually, these patches have nothing to do with the KVM folks. You are
perhaps confusing this with the hypervisor-side discussion, about which
there is indeed much disagreement.

To that point, it's certainly fair to point out the controversy on the
host side. It ultimately is what forced the creation of the AlacrityVM
project, after all. However, it should also be pointed out that this
pull request is not KVM specific, nor even KVM related per se. These
patches can (and in fact, do) work in other environments that do not use
KVM or even AlacrityVM at all.

VBUS, the underlying technology here, is a framework for creating
optimized software-based device models with a Linux kernel as the host,
together with the corresponding "driver" resources that connect a guest
to that backend. AlacrityVM is the application of these technologies
using KVM/Linux/Qemu as a base, but that is an implementation detail.

For more details, please see the project wiki

http://developer.novell.com/wiki/index.php/AlacrityVM

This pull request is for drivers to support running a Linux kernel as a
guest in this environment, so it actually doesn't affect KVM in any way.
They are just standard Linux drivers and in fact can load as
stand-alone KMPs in any modern vanilla distro. I haven't even pushed
the host-side code to linux-next yet, specifically because of the
controversy you mention.


> on multiple technical grounds - as
> basically this tree forks the KVM driver space for which no valid technical
> reason could be offered by you in a 100+ mails long discussion.

You will have to be more specific about these technical grounds you
mention, because I believe I satisfactorily rebutted any issues raised.
To say that there is no technical reason is, at best, a matter of
opinion. I have in fact listed numerous reasons, on technical, feature,
and architectural grounds, for what differentiates my approach, and
provided numbers which highlight its merits. Given that they are all
recorded in the archives of said 100+ email thread as well as numerous
others, I won't rehash the entire list here. Instead, I will post a
summary of the problem space from the performance perspective, since
that seems to be of most interest at the moment.

From my research, the reason why virt in general, and KVM in particular,
suffers on the IO performance front is as follows: IOs
(traps+interrupts) are more expensive than on bare metal, and real
hardware is naturally concurrent (your HBAs and NICs are effectively
parallel execution engines, etc).

Assuming my observations are correct, in order to squeeze maximum
performance from a given guest, you need to do three things: A)
eliminate as many IOs as you possibly can, B) reduce the cost of the
ones you can't avoid, and C) run your algorithms in parallel to emulate
concurrent silicon.

So on that front, we move the device models to the kernel (where they
are closest to the physical IO devices) and use "cheap" instructions
like PIOs/hypercalls for (B), and exploit spare host-side SMP resources
via kthreads for (C). For (A), part of the problem is that virtio-pci
is not designed optimally to address the problem space, and part of it
is a limitation of the PCI transport underneath it.
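
To make (C) a bit more concrete, the host-side pattern boils down to
something like the sketch below. This is illustrative only, not the
actual vbus/venet code; the tx_ring type and the ring_has_work(),
ring_pop() and xmit_one() helpers are made-up stand-ins:

/* Sketch only: a host-side kthread drains the guest's tx ring on a
 * spare core, in parallel with the vcpu.  All names are hypothetical. */
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/types.h>

struct tx_ring;                                     /* hypothetical ring   */
bool ring_has_work(struct tx_ring *ring);           /* hypothetical helper */
void *ring_pop(struct tx_ring *ring);               /* hypothetical helper */
void xmit_one(void *desc);                          /* hypothetical helper */

static DECLARE_WAIT_QUEUE_HEAD(tx_waitq);

static int venet_tx_thread(void *arg)
{
	struct tx_ring *ring = arg;

	while (!kthread_should_stop()) {
		wait_event_interruptible(tx_waitq,
					 ring_has_work(ring) ||
					 kthread_should_stop());

		/* Drain everything the guest queued since the last
		 * doorbell: one "kick" exit can cover many packets. */
		while (ring_has_work(ring))
			xmit_one(ring_pop(ring));
	}
	return 0;
}

/* The doorbell (PIO/hypercall) handler only has to kick the worker: */
static void venet_doorbell(void)
{
	wake_up_interruptible(&tx_waitq);
}

/* At device setup: kthread_run(venet_tx_thread, ring, "venet-tx"); */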

For example, PCI is somewhat of a unique bus design in that it wants to
map signals to interrupts 1:1. This works fine for real hardware, where
interrupts are relatively cheap, but is quite suboptimal on virt, where
the window-exits, injection-exits, and MMIO-based EOIs hurt
substantially (multiple microseconds each).

One core observation is that we don't technically need 1:1 interrupts to
signals in order to function properly. Ideally we will only bother the
CPU when work of a higher priority becomes ready. So the AlacrityVM
connector to vbus uses a model where we deploy a lockless shared-memory
queue to inject interrupts. This means that temporally close interrupts
(of both the intra- and inter-device variety) of similar priority can
queue without incurring any extra IO. That means fewer exits, fewer
EOIs, etc.
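
As a rough illustration of that queue (this is not the actual
shm-signal/ioq implementation, just a simplified single-producer/
single-consumer sketch with invented names):

/* Simplified sketch, not the real ABI: a lockless SPSC ring of pending
 * signal ids.  The host posts; the guest drains.  Only the empty ->
 * non-empty transition needs a physical interrupt injection, so
 * coincident signals share a single exit/EOI. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u                   /* power of two */

struct signal_ring {
	_Atomic uint32_t head;           /* producer (host) index  */
	_Atomic uint32_t tail;           /* consumer (guest) index */
	uint32_t slot[RING_SIZE];
};

/* Host side: queue a signal; returns true if an injection is needed. */
static bool ring_post(struct signal_ring *r, uint32_t signal_id)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE) {
		/* Full: the real protocol falls back to a slower
		 * synchronous path here; omitted in this sketch. */
		return true;
	}

	r->slot[head & (RING_SIZE - 1)] = signal_id;
	atomic_store_explicit(&r->head, head + 1, memory_order_release);

	return head == tail;             /* was empty: raise an interrupt */
}

/* Guest side: one injection drains everything queued behind it. */
static void ring_drain(struct signal_ring *r, void (*handle)(uint32_t))
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	while (tail != atomic_load_explicit(&r->head, memory_order_acquire)) {
		handle(r->slot[tail & (RING_SIZE - 1)]);
		tail++;
		atomic_store_explicit(&r->tail, tail, memory_order_release);
	}
}

(The real protocol also carries priority information and an idle/busy
handshake, which this sketch elides.)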

The end result is that I can demonstrate that even with a single stream
to a single device, I can reduce the exit rate by over 45% and the
interrupt rate by more than 50% when compared to the equivalent
virtio-pci ABI. This scales even higher when you add additional devices
to the mix. The bottom line is that we use significantly less CPU while
producing the highest throughput and lowest latency. In fact, to my
knowledge vbus+venet is still the highest performing 802.x device for
KVM, even when turning off its advanced features like zero-copy.

The parties involved have demonstrated a closed-mindedness to the
concepts I've introduced, which is ultimately why today we have two
projects. I would much prefer that we didn't, but that is not in my
control. Note that the KVM folks eventually came around regarding the
in-kernel and concurrent execution concepts, which is a good first step.
I have yet to convince them about the perils of relying on PCI, which I
believe is an architectural mistake. I suspect at this point it will
take community demand and independent reports from users of the
technology to convince them further. The goal of the AlacrityVM project
is to make it easy for interested users to do so.

Don't get me wrong. PCI is a critical feature for full-virt guests.
But IMO it has limited applicability once we start talking about PV, and
AlacrityVM aims to correct that.

>
> (And yes, i've been Cc:-ed to much of that thread.)
>
> The result will IMO be pain for users because now we'll have two frameworks,
> tooling incompatibilities, etc. etc.

Precedent defies your claim: that situation already exists today, and it
has nothing to do with my work. Even if you scoped the discussion
specifically to KVM, users can already select various incompatible IO
methods ([realtek, e1000, virtio-net], [ide, lsi-scsi, virtio-blk],
[std-vga, cirrus-vga], etc), so this claim about user pain seems dubious
at best. I suspect that if a new choice is available that offers
feature/performance improvements, users are best served by having that
choice to make themselves, instead of having that choice simply
unavailable.

The reason why we are here having this particular conversation as it
pertains to KVM is that I do not believe you can achieve the
performance/feature goals that I have set for the project in a
backwards-compatible way (i.e. virtio-pci compatible). At least, not in
a way that is not a complete disaster code-base-wise. So while I agree
that a new incompatible framework vs a backwards-compatible one is
suboptimal, I believe it's necessary in order to ultimately fix the
problems in the most ideal way. Therefore, I would rather take this
lump now than 5 years from now.

The KVM maintainers apparently do not agree on that fundamental point,
so we are deadlocked.

So far, the only legitimate objection I have seen to these guest side
drivers is Linus', and I see his point. I won't make a pull request
again until I feel enough community demand has been voiced to warrant a
reconsideration.

Kind Regards,
-Greg



2009-12-21 15:43:41

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 05:34 PM, Gregory Haskins wrote:
>
>> I think it would be fair to point out that these patches have been objected to
>> by the KVM folks quite extensively,
>>
> Actually, these patches have nothing to do with the KVM folks. You are
> perhaps confusing this with the hypervisor-side discussion, of which
> there is indeed much disagreement.
>

This is true, though these drivers are fairly pointless for
virtualization without the host side support.

I did have a few issues with the guest drivers:
- the duplication of effort wrt virtio. These drivers don't cover
exactly the same problem space, but nearly so.
- no effort at scalability - all interrupts are taken on one cpu
- the patches introduce a new virtual interrupt controller for dubious
(IMO) benefits

> From my research, the reason why virt in general, and KVM in particular
> suffers on the IO performance front is as follows: IOs
> (traps+interrupts) are more expensive than bare-metal, and real hardware
> is naturally concurrent (your hbas and nics are effectively parallel
> execution engines, etc).
>
> Assuming my observations are correct, in order to squeeze maximum
> performance from a given guest, you need to do three things: A)
> eliminate as many IOs as you possibly can, B) reduce the cost of the
> ones you can't avoid, and C) run your algorithms in parallel to emulate
> concurrent silicon.
>

All these are addressed by vhost-net without introducing new drivers.

--
error compiling committee.c: too many arguments to function

2009-12-21 16:04:28

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 10:43 AM, Avi Kivity wrote:
> On 12/21/2009 05:34 PM, Gregory Haskins wrote:
>>
>>> I think it would be fair to point out that these patches have been
>>> objected to
>>> by the KVM folks quite extensively,
>>>
>> Actually, these patches have nothing to do with the KVM folks. You are
>> perhaps confusing this with the hypervisor-side discussion, of which
>> there is indeed much disagreement.
>>
>
> This is true, though these drivers are fairly pointless for
> virtualization without the host side support.

The host side support is available in various forms (git tree, rpm, etc)
from our project page. I would encourage any interested parties to
check it out:

Here is the git tree

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=summary

Here are some RPMs:

http://download.opensuse.org/repositories/devel://LLDC://alacrity/openSUSE_11.1/

And the main project site:

http://developer.novell.com/wiki/index.php/AlacrityVM

>
> I did have a few issues with the guest drivers:
> - the duplication of effort wrt virtio. These drivers don't cover
> exactly the same problem space, but nearly so.

Virtio itself is more or less compatible with this effort, as we have
discussed (see my virtio-vbus transport, for instance). I have issues
with some of the design decisions in the virtio device and ring models,
but they are minor in comparison to the beef I have with the virtio-pci
transport as a whole.

> - no effort at scalability - all interrupts are taken on one cpu

Addressed by the virtual-interrupt controller. This will enable us to
route shm-signal messages to a core, under guidance from the standard
irq-balance facilities.

> - the patches introduce a new virtual interrupt controller for dubious
> (IMO) benefits

See above. It's not fully plumbed yet, which is perhaps the reason for
the confusion as to its merits. Eventually I will trap the affinity
calls and pass them to the host, too. Today, it at least lets us see
the shm-signal statistics under /proc/interrupts, which is nice and is
consistent with other IO mechanisms.


>
>> From my research, the reason why virt in general, and KVM in particular
>> suffers on the IO performance front is as follows: IOs
>> (traps+interrupts) are more expensive than bare-metal, and real hardware
>> is naturally concurrent (your hbas and nics are effectively parallel
>> execution engines, etc).
>>
>> Assuming my observations are correct, in order to squeeze maximum
>> performance from a given guest, you need to do three things: A)
>> eliminate as many IOs as you possibly can, B) reduce the cost of the
>> ones you can't avoid, and C) run your algorithms in parallel to emulate
>> concurrent silicon.
>>
>
> All these are addressed by vhost-net without introducing new drivers.

No, B and C definitely are, but A is lacking. And the performance
suffers as a result in my testing (vhost-net still throws a ton of exits
as it's limited by virtio-pci, and only adds about 1Gb/s over userspace
virtio, far behind venet even with things like zero-copy turned off).

I will also point out that these performance aspects are only a subset
of the discussion, since we are also addressing things like
qos/priority, alternate fabric types, etc. I do not expect you to
understand and agree with where I am going per se. We can have that
discussion when I once again ask you for merge consideration. But if
you say "they are the same", I will call you on it, because they are
demonstrably unique capability sets.

Kind Regards,
-Greg






2009-12-21 16:37:16

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 10:04 AM, Gregory Haskins wrote:
> No, B and C definitely are, but A is lacking. And the performance
> suffers as a result in my testing (vhost-net still throws a ton of exits
> as its limited by virtio-pci and only adds about 1Gb/s to virtio-u, far
> behind venet even with things like zero-copy turned off).
>

How does virtio-pci limit vhost-net? The only time exits should occur
is when the guest notifies the host that something has been placed on
the ring. Since vhost-net has no tx mitigation scheme right now, the
result may be that it's taking an io exit on every single packet, but
this is orthogonal to virtio-pci.

Since virtio-pci supports MSI-X, there should be no IO exits on
host->guest notification other than the EOI in the virtual APIC. This
is a lightweight exit today and will likely disappear entirely with
newer hardware.

Regards,

Anthony Liguori

2009-12-21 16:41:13

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 06:37 PM, Anthony Liguori wrote:
> Since virtio-pci supports MSI-X, there should be no IO exits on
> host->guest notification other than EOI in the virtual APIC. This is
> a light weight exit today and will likely disappear entirely with
> newer hardware.

I'm working on disappearing EOI exits on older hardware as well. Same
idea as the old TPR patching, without most of the magic.

--
error compiling committee.c: too many arguments to function

2009-12-21 16:46:21

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 11:37 AM, Anthony Liguori wrote:
> On 12/21/2009 10:04 AM, Gregory Haskins wrote:
>> No, B and C definitely are, but A is lacking. And the performance
>> suffers as a result in my testing (vhost-net still throws a ton of exits
>> as its limited by virtio-pci and only adds about 1Gb/s to virtio-u, far
>> behind venet even with things like zero-copy turned off).
>>
>
> How does virtio-pci limit vhost-net? The only time exits should occur
> are when the guest notifies the host that something has been placed on
> the ring. Since vhost-net has no tx mitigation scheme right now, the
> result may be that it's taking an io exit on every single packet but
> this is orthogonal to virtio-pci.
>
> Since virtio-pci supports MSI-X, there should be no IO exits on
> host->guest notification other than EOI in the virtual APIC.

The very best you can hope to achieve is 1:1 EOI per signal (though
today virtio-pci is even worse than that). As I indicated above, I can
eliminate more than 50% of even the EOIs in trivial examples, and even
more as we scale up the number of devices or the IO load (or both).

> This is a
> light weight exit today and will likely disappear entirely with newer
> hardware.

By that argument, this is all moot. New hardware will likely obsolete
the need for venet or virtio-net anyway. The goal of my work is to
provide an easy-to-use framework for maximizing the IO transport _in
lieu_ of hardware acceleration. Software will always be leading here,
so we don't want to get into a pattern of waiting for new hardware to
cover for poor software engineering. It's simply not necessary in most
cases. A little smart software design, and a framework that allows it
to be easily exploited/reused, is the best step forward, IMO.

Kind Regards,
-Greg






2009-12-21 16:57:11

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 11:40 AM, Avi Kivity wrote:
> On 12/21/2009 06:37 PM, Anthony Liguori wrote:
>> Since virtio-pci supports MSI-X, there should be no IO exits on
>> host->guest notification other than EOI in the virtual APIC. This is
>> a light weight exit today and will likely disappear entirely with
>> newer hardware.
>
> I'm working on disappearing EOI exits on older hardware as well. Same
> idea as the old TPR patching, without most of the magic.
>

While I applaud any engineering effort that results in more optimal
execution, if you are talking about what we have discussed in the past,
it's not quite in the same league as my proposal.

You are talking about the ability to optimize the final EOI if there are
no pending interrupts remaining, right? The problem with this approach
is that it addresses the wrong side of the curve: that is, it optimizes
the code as it's about to go io-idle. You still have to take an extra
exit for each injection during the heat of battle, which is when you
actually need it most.

To that front, what I have done is re-use the lockless shared-memory
concept for even "interrupt injection". Lockless shared-memory rings
have the property that both producer and consumer can simultaneously
manipulate the ring. So what we do in AlacrityVM is deliver shm-signals
(shm-signal == "interrupt" in vbus) over a ring, so that the host can
inject a signal to a running vcpu and the vcpu can complete an
ack/re-inject cycle directly from vcpu context. Therefore, we only need
a physical IDT injection when the vcpu transitions from io-idle to
io-busy, and we remain completely in parallel guest/host context until
we go idle again.
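
In rough code, the handshake looks something like this (invented names,
not the actual vbus connector ABI; plain C11 atomics, i.e. sequentially
consistent, which is what keeps the idle/busy race closed):

#include <stdatomic.h>
#include <stdbool.h>

struct shm_irq_state {
	atomic_uint busy;        /* guest is (or will be) draining     */
	atomic_uint pending;     /* signals posted but not yet handled */
};

/* Host side: returns true only when a physical injection is required,
 * i.e. on the io-idle -> io-busy transition. */
static bool host_signal(struct shm_irq_state *s)
{
	atomic_fetch_add(&s->pending, 1);
	return atomic_exchange(&s->busy, 1) == 0;
}

/* Guest side: entered once per physical injection; acks and re-checks
 * entirely from vcpu context until the device is truly idle again. */
static void guest_isr(struct shm_irq_state *s, void (*process)(unsigned))
{
	do {
		unsigned n;

		while ((n = atomic_exchange(&s->pending, 0)) != 0)
			process(n);          /* drain without exiting   */

		atomic_store(&s->busy, 0);   /* declare ourselves idle  */

		/* Re-check to close the race with a host-side post
		 * that observed busy == 1 and skipped the injection. */
	} while (atomic_load(&s->pending) != 0 &&
		 atomic_exchange(&s->busy, 1) == 0);
}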

That said, your suggestion would play nicely with the above mentioned
scheme, so I look forward to seeing it in the tree. Feel free to send
me patches for testing.

Kind Regards,
-Greg



2009-12-21 17:05:47

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 06:56 PM, Gregory Haskins wrote:
>> I'm working on disappearing EOI exits on older hardware as well. Same
>> idea as the old TPR patching, without most of the magic.
>>
>>
> While I applaud any engineering effort that results in more optimal
> execution, if you are talking about what we have discussed in the past
> its not quite in the same league as my proposal.
>

I don't doubt this for a minute.

> You are talking about the ability to optimize the final EOI if there are
> no pending interrupts remaining, right? The problem with this approach
> is it addresses the wrong side of the curve: That is, it optimizes the
> code as its about to go io-idle. You still have to take an extra exit
> for each injection during the heat of battle, which is when you actually
> need it most.
>

No, it's completely orthogonal. An interrupt is injected, the handler
disables further interrupts and EOIs, then schedules the rest of the
handling code. So long as there are packets in the ring, interrupts
won't be enabled, and hence there won't be any reinjections.
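
In rough pseudo-C (illustrative names only; these helpers are stand-ins
for the real NAPI calls, not the actual virtio-net driver):

#include <linux/interrupt.h>

struct mydev;                                        /* hypothetical device */
void disable_device_rx_interrupts(struct mydev *d);  /* hypothetical        */
void enable_device_rx_interrupts(struct mydev *d);   /* hypothetical        */
void schedule_poll(struct mydev *d);                 /* cf. napi_schedule() */
void complete_poll(struct mydev *d);                 /* cf. napi_complete() */
int  process_rx_ring(struct mydev *d, int budget);   /* hypothetical        */

static irqreturn_t rx_interrupt(int irq, void *dev_id)
{
	struct mydev *dev = dev_id;

	disable_device_rx_interrupts(dev);  /* no reinjection while polling */
	schedule_poll(dev);                 /* defer the real work          */
	return IRQ_HANDLED;                 /* one EOI covers this burst    */
}

static int rx_poll(struct mydev *dev, int budget)
{
	int done = process_rx_ring(dev, budget);

	if (done < budget) {
		/* Ring drained: only now re-enable interrupts, so the
		 * packets that arrived meanwhile never caused another
		 * injection or EOI. */
		complete_poll(dev);
		enable_device_rx_interrupts(dev);
	}
	return done;
}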

Different interrupt sources still need different interrupts, but as all
of your tests have been single-interface, this can't be the reason for
your performance.

--
error compiling committee.c: too many arguments to function

2009-12-21 17:21:07

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 10:46 AM, Gregory Haskins wrote:
> The very best you can hope to achieve is 1:1 EOI per signal (though
> today virtio-pci is even worse than that). As I indicated above, I can
> eliminate more than 50% of even the EOIs in trivial examples, and even
> more as we scale up the number of devices or the IO load (or both).
>

If optimizing EOI is the main technical advantage of vbus, then surely
we could paravirtualize EOI access and get that benefit in KVM without
introducing a whole new infrastructure, no?

>> This is a
>> light weight exit today and will likely disappear entirely with newer
>> hardware.
>>
> By that argument, this is all moot. New hardware will likely obsolete
> the need for venet or virtio-net anyway.

Not at all. But let's focus on concrete data. For a given workload,
how many exits do you see due to EOI? They should be relatively rare
because obtaining good receive batching is pretty easy. Considering
these are lightweight exits (on the order of 1-2us), you need an awfully
large number of interrupts before you get a really significant
performance impact. You would think NAPI would kick in at this point
anyway.

Do you have data demonstrating the advantage of EOI mitigation?

Regards,

Anthony Liguori

2009-12-21 17:24:43

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 12:05 PM, Avi Kivity wrote:
> On 12/21/2009 06:56 PM, Gregory Haskins wrote:
>>> I'm working on disappearing EOI exits on older hardware as well. Same
>>> idea as the old TPR patching, without most of the magic.
>>>
>>>
>> While I applaud any engineering effort that results in more optimal
>> execution, if you are talking about what we have discussed in the past
>> its not quite in the same league as my proposal.
>>
>
> I don't doubt this for a minute.
>
>> You are talking about the ability to optimize the final EOI if there are
>> no pending interrupts remaining, right? The problem with this approach
>> is it addresses the wrong side of the curve: That is, it optimizes the
>> code as its about to go io-idle. You still have to take an extra exit
>> for each injection during the heat of battle, which is when you actually
>> need it most.
>>
>
> No, it's completely orthogonal. An interrupt is injected, the handler
> disables further interrupts and EOIs, then schedules the rest of the
> handling code. So long as there as packets in the ring interrupts won't
> be enabled and hence there won't be any reinjections.

I meant inter-vector "next-interrupt" injects. For lack of a better
term, I called it reinject, but I realize in retrospect that this is
ambiguous.

>
> Different interrupt sources still need different interrupts, but as all
> of your tests have been single-interface, this can't be the reason for
> your performance.
>

Actually I have tested both single and multi-homed setups, but it
doesn't matter. Even a single device can benefit, since even single
devices may have multiple vector sources that are highly likely to
generate coincident events. For instance, consider that even a basic
ethernet device may have separate vectors for "rx" and "tx-complete".
A simple ping is likely to generate both vectors at approximately the
same time, given how the host-side resources often work.

Trying to condense multiple vectors into one means it's up to the driver
to implement any kind of prioritization on its own (or worse, it just
suffers from priority inversion). Likewise, implementing them as unique
vectors means you are likely to have coincident events for certain
workloads.

What AlacrityVM tries to do is recognize these points and optimize for
both cases. It means we still retain framework-managed prioritized
callbacks, yet optimize away extraneous IO for coincident signals. IOW:
the best of both worlds.

Kind Regards,
-Greg





2009-12-21 17:44:26

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 12:20 PM, Anthony Liguori wrote:
> On 12/21/2009 10:46 AM, Gregory Haskins wrote:
>> The very best you can hope to achieve is 1:1 EOI per signal (though
>> today virtio-pci is even worse than that). As I indicated above, I can
>> eliminate more than 50% of even the EOIs in trivial examples, and even
>> more as we scale up the number of devices or the IO load (or both).
>>
>
> If optimizing EOI is the main technical advantage of vbus, then surely
> we could paravirtualize EOI access and get that benefit in KVM without
> introducing a whole new infrastructure, no?

No, because I never claimed optimizing EOI was the main/only advantage.
The feature set has all been covered in extensive detail on the lists,
however, so I will refer you to Google and the archives for your reading
pleasure.

>
>>> This is a
>>> light weight exit today and will likely disappear entirely with newer
>>> hardware.
>>>
>> By that argument, this is all moot. New hardware will likely obsolete
>> the need for venet or virtio-net anyway.
>
> Not at all.

Well, surely something like SR-IOV is moving in that direction, no?

> But let's focus on concrete data. For a given workload,
> how many exits do you see due to EOI?

It's of course highly workload-dependent, and I've published these
details in the past, I believe. Off the top of my head, I recall that
virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
venet on a 10GE box, but I don't recall what ratio of those exits are
EOIs. To be perfectly honest, I don't care. I do not discriminate
against the exit type... I want to eliminate as many as possible,
regardless of the type. That's how you go fast and yet use less CPU.

> They should be relatively rare
> because obtaining good receive batching is pretty easy.

Batching is the poor man's throughput (it's easy when you don't care
about latency), so we generally avoid it as much as possible.

> Considering
> these are lightweight exits (on the order of 1-2us),

APIC EOIs on x86 are MMIO based, so they are generally much heavier than
that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
never mind executing the locking/apic-emulation code.

> you need an awfully
> large amount of interrupts before you get really significant performance
> impact. You would think NAPI would kick in at this point anyway.
>

Whether NAPI can kick in or not is workload-dependent, and it also does
not address coincident events. But on that topic, you can think of
AlacrityVM's interrupt controller as "NAPI for interrupts", because it
operates on the same principle. For what it's worth, it also operates
on a "NAPI for hypercalls" concept.

> Do you have data demonstrating the advantage of EOI mitigation?

I have non-scientifically gathered numbers in my notebook that put it at
an average reduction of about 55%-60% in EOIs for inbound netperf runs,
for instance. I don't have time to gather more in the near term, but
it's typically in that range for a chatty enough workload, and it goes
up as you add devices. I would certainly formally generate those
numbers when I make another merge request in the future, but I don't
have them now.

Kind Regards,
-Greg



2009-12-22 00:12:48

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 11:44 AM, Gregory Haskins wrote:
> Well, surely something like SR-IOV is moving in that direction, no?
>

Not really, but that's a different discussion.

>> But let's focus on concrete data. For a given workload,
>> how many exits do you see due to EOI?
>>
> Its of course highly workload dependent, and I've published these
> details in the past, I believe. Off the top of my head, I recall that
> virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
> venet on a 10GE box, but I don't recall what ratio of those exits are
> EOI.

Was this userspace virtio-pci or was this vhost-net? If it was the
former, then were you using MSI-X? If you weren't, there would be an
additional (rather heavy) exit per interrupt to clear the ISR, which
would certainly account for a large portion of the additional exits.

> To be perfectly honest, I don't care. I do not discriminate
> against the exit type...I want to eliminate as many as possible,
> regardless of the type. That's how you go fast and yet use less CPU.
>

It's important to understand why one mechanism is better than another.
All I'm looking for is a set of bullet points that say, vbus does this,
vhost-net does that, therefore vbus is better. We would then either
say, oh, that's a good idea, let's change vhost-net to do that, or we
would say, hrm, well, we can't change vhost-net to do that because of
some fundamental flaw, let's drop it and adopt vbus.

It's really that simple :-)


>> They should be relatively rare
>> because obtaining good receive batching is pretty easy.
>>
> Batching is poor mans throughput (its easy when you dont care about
> latency), so we generally avoid as much as possible.
>

Fair enough.

>> Considering
>> these are lightweight exits (on the order of 1-2us),
>>
> APIC EOIs on x86 are MMIO based, so they are generally much heavier than
> that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
> never mind executing the locking/apic-emulation code.
>

You won't like to hear me say this, but Woodcrests are pretty old and
clunky as far as VT goes :-)

On a modern Nehalem, I would be surprised if an MMIO exit handled in the
kernel was much more than 2us. The hardware is getting very, very
fast. The trends here are very important to consider when we're looking
at architectures that we are potentially going to support for a long
time.

>> you need an awfully
>> large amount of interrupts before you get really significant performance
>> impact. You would think NAPI would kick in at this point anyway.
>>
>>
> Whether NAPI can kick in or not is workload dependent, and it also does
> not address coincident events. But on that topic, you can think of
> AlacrityVM's interrupt controller as "NAPI for interrupts", because it
> operates on the same principle. For what its worth, it also operates on
> a "NAPI for hypercalls" concept too.
>

The concept of always batching hypercalls has certainly been explored
within the context of Xen. But then when you look at something like
KVM's hypercall support, it turns out that with sufficient cleverness in
the host, we don't even bother with the MMU hypercalls anymore.

Doing fancy things in the guest is difficult to support from a long term
perspective. It'll more or less never work for Windows and even the lag
with Linux makes it difficult for users to see the benefit of these
changes. You get a lot more flexibility trying to solve things in the
host even if it's convoluted (like TPR patching).

>> Do you have data demonstrating the advantage of EOI mitigation?
>>
> I have non-scientifically gathered numbers in my notebook that put it on
> average of about 55%-60% reduction in EOIs for inbound netperf runs, for
> instance. I don't have time to gather more in the near term, but its
> typically in that range for a chatty enough workload, and it goes up as
> you add devices. I would certainly formally generate those numbers when
> I make another merge request in the future, but I don't have them now.
>

I don't think it's possible to make progress with vbus without detailed
performance data comparing both vbus and virtio (vhost-net). On the
virtio/vhost-net side, I think we'd be glad to help gather/analyze that
data. We have to understand why one is better than the other, and then
we have to evaluate whether we can bring those benefits into the latter.
If we can't, we merge vbus. If we can, we fix virtio.

Regards,

Anthony Liguori

> Kind Regards,
> -Greg
>
>

2009-12-22 07:23:06

by Gleb Natapov

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Mon, Dec 21, 2009 at 12:44:17PM -0500, Gregory Haskins wrote:
> > They should be relatively rare
> > because obtaining good receive batching is pretty easy.
>
> Batching is poor mans throughput (its easy when you dont care about
> latency), so we generally avoid as much as possible.
>
> > Considering
> > these are lightweight exits (on the order of 1-2us),
>
> APIC EOIs on x86 are MMIO based, so they are generally much heavier than
> that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
> never mind executing the locking/apic-emulation code.
>
With x2APIC, EOIs are no longer MMIO.

--
Gleb.

2009-12-22 07:58:26

by Ingo Molnar

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Gregory Haskins <[email protected]> wrote:

> On 12/18/09 4:51 PM, Ingo Molnar wrote:
> >
> > * Gregory Haskins <[email protected]> wrote:
> >
> >> Hi Linus,
> >>
> >> Please pull AlacrityVM guest support for 2.6.33 from:
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
> >> for-linus
> >>
> >> All of these patches have stewed in linux-next for quite a while now:
> >>
> >> Gregory Haskins (26):
> >
> > I think it would be fair to point out that these patches have been objected to
> > by the KVM folks quite extensively,
>
> Actually, these patches have nothing to do with the KVM folks. [...]

That claim is curious to me - the AlacrityVM host is 90% based on KVM
code, so how can it not be about KVM? I just checked: most of the
changes that the AlacrityVM host makes to KVM are in adding the
host-side interfaces for these guest drivers:

virt/kvm/Kconfig | 11 +
virt/kvm/coalesced_mmio.c | 65 +++---
virt/kvm/coalesced_mmio.h | 1 +
virt/kvm/eventfd.c | 599 +++++++++++++++++++++++++++++++++++++++++++++
virt/kvm/ioapic.c | 118 +++++++--
virt/kvm/ioapic.h | 5 +
virt/kvm/iodev.h | 55 +++--
virt/kvm/irq_comm.c | 267 ++++++++++++++-------
virt/kvm/kvm_main.c | 127 ++++++++--
virt/kvm/xinterface.c | 587 ++++++++++++++++++++++++++++++++++++++++++++
10 files changed, 1649 insertions(+), 186 deletions(-)

[ stat for virt/kvm/ taken as of today, AlacrityVM host tree commit 84afcc7 ]

So as far as the kernel code modifications of AlacrityVM go, it's very
much about KVM.

> [...] You are perhaps confusing this with the hypervisor-side discussion,
> of which there is indeed much disagreement.

Are the guest drivers living in a vacuum? The whole purpose of the
AlacrityVM guest drivers is to ... enable AlacrityVM support, right? So
how can it not be about KVM?

Gregory, it would be nice if you worked _much_ harder with the KVM folks
before giving up. It's not like there's much valid technical
disagreement that I can identify in any of the threads - the strongest
argument I could find was: "I want to fork KVM so please let me do it,
nobody is harmed, choice is good".

Ingo

2009-12-22 08:00:15

by Ingo Molnar

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Anthony Liguori <[email protected]> wrote:

> It's important to understand why one mechanism is better than another. All
> I'm looking for is a set of bullet points that say, vbus does this,
> vhost-net does that, therefore vbus is better. We would then either say,
> oh, that's a good idea, let's change vhost-net to do that, or we would say,
> hrm, well, we can't change vhost-net to do that because of some fundamental
> flaw, let's drop it and adopt vbus.
>
> It's really that simple :-)

That makes a lot of sense to me.

I think we had better have damn good technical reasons before we
encourage a fork of a subsystem within the kernel. Technical truth is
not something we can 'agree to disagree' on, and it is not something we
can really compromise on.

Both the host and the guest code are in Linux, so adding another variant
without that variant replacing the old one (on the spot or gradually)
makes no technical sense.

Gregory, I'd suggest that you shape this as a "this and this aspect of
KVM needs to be replaced/fixed" list of items, as suggested by Anthony.
In my experience the KVM folks are very approachable and very reasonable
about addressing technical shortcomings and acting upon feedback (and
they happily accept code as well) - so to the extent there's room for
improvement here, it should be done by shaping KVM, not by forking and
rebranding it.

Ingo

2009-12-22 11:49:41

by Andi Kleen

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

Anthony Liguori <[email protected]> writes:
>
> On a modern Nehalem, I would be surprised if an MMIO exit handled in
> the kernel was muck more than 2us. The hardware is getting very, very
> fast. The trends here are very important to consider when we're
> looking at architectures that we potentially are going to support for
> a long time.

When you talk about trends, the trend for IO is also to get faster.

An exit will always be more expensive than passing something from
another CPU in shared memory. An exit is much more work, with lots of
saved context, and it is fundamentally synchronous, even with all the
tricks hardware can do. And then there's the in-kernel handler too.

Shared memory passing from another CPU is a much cheaper
operation and more likely to scale with IO rate improvements.

The basic problem in this discussion seems to be the usual
disconnect between working code (I understand Gregory has working
code that demonstrates the performance advances he's claiming)
and unwritten optimizations.

Unwritten code tends to always sound nicer, but it remains to be seen
if it can deliver what it promises.

From an abstract standpoint, having efficient paravirtual IO interfaces
seems attractive.

I also personally don't see a big problem in having another set of
virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
s390-vm, ...) and it's not that they would be a particular maintenance
burden impacting the kernel core.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-22 15:31:40

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 05:49 AM, Andi Kleen wrote:
> Anthony Liguori<[email protected]> writes:
>
>> On a modern Nehalem, I would be surprised if an MMIO exit handled in
>> the kernel was muck more than 2us. The hardware is getting very, very
>> fast. The trends here are very important to consider when we're
>> looking at architectures that we potentially are going to support for
>> a long time.
>>
> When you talk about trends the trend for IO is also to get faster.
>
> An exit will be always more expensive than passing something from
> another CPU in shared memory. An exit is much more work,
> with lots of saved context and fundamentally synchronous,
> even with all the tricks hardware can do. And then there's the
> in kernel handler too.
>

No one is advocating avoiding shared memory and doing more exits in the
IO path :-)

Whether it's x2apic support or more sophisticated hardware APIC
virtualization support, the point remains that taking an exit due to EOI
is likely not to be required in the near-term future.

So far, the only actual technical advantage I've seen is that vbus
avoids EOI exits. My response is that I don't think that's so
important, especially when you consider that it's not going to matter so
much in the future, and that Avi has some ideas about how to eliminate
some of those exits even on older hardware. I'm also suspicious that
EOI exits alone would result in a huge performance differential between
the two architectures.

We think we understand why vbus does better than the current userspace
virtio backend. That's why we're building vhost-net. It's not done
yet, but our expectation is that it will do just as well if not better.

> Shared memory passing from another CPU is a much cheaper
> operation and more likely to scale with IO rate improvements.
>
> The basic problem in this discussion seems to be the usual
> disconnect between working code (I understand Gregory has working
> code that demonstrates the performance advances he's claiming)
> versus unwritten optimizations.
>

vbus has one driver (networking) that supports one guest (very new Linux
kernels). It supports one hypervisor (KVM) on one architecture (x86).

On the other hand, virtio has six upstream drivers (console, network,
block, rng, balloon, 9p) with at least as many in development. It
supports kernels going back to at least 2.6.18, almost all versions of
Windows, and has experimental drivers for other OSes. It supports KVM,
lguest, VirtualBox, with support for additional hypervisors under
development. It supports at least five architectures (x86, ppc, s390,
ia64, arm).

You are correct, vbus has better numbers than virtio today. But so far,
it's hardly an apples-to-apples comparison. Our backend networking
driver has been implemented entirely in userspace up until very
recently. There really isn't any good performance data comparing vbus
to vhost-net largely because vhost-net is still under active development.

The most important point, though, is that so far, I don't think Greg has
been able to articulate _why_ vbus would perform better than vhost-net.

If that can be articulated in a way that we all agree vbus has a
technical advantage over vhost-net, then I'm absolutely in agreement
that it should be merged.

I think the comparison would be if someone submitted a second e1000
driver that happened to do better on one netperf test than the current
e1000 driver.

You can argue, hey, choice is good, let's let a user choose if they want
to use the faster e1000 driver. But surely, the best thing for a user
is to figure out why the second e1000 driver is better on that one test
and integrate that change into the current e1000 driver, or to decide
that the new e1000 driver is superior in architecture and do the
required work to make the new e1000 driver a full replacement for the
old one.

Regards,

Anthony Liguori

> Unwritten code tends to always sound nicer, but it remains to be seen
> if it can deliver what it promises.
>
> From a abstract stand point having efficient paravirtual IO interfaces
> seem attractive.
>
> I also personally don't see a big problem in having another set of
> virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
> s390-vm, ...) and it's not that they would be a particular maintenance
> burden impacting the kernel core.
>
> -Andi
>

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tuesday 22 December 2009 04:31:32 pm Anthony Liguori wrote:

> I think the comparison would be if someone submitted a second e1000
> driver that happened to do better on one netperf test than the current
> e1000 driver.
>
> You can argue, hey, choice is good, let's let a user choose if they want
> to use the faster e1000 driver. But surely, the best thing for a user
> is to figure out why the second e1000 driver is better on that one test,
> integrate that change into the current e1000 driver, or decided that the

Even though this is a "Won't somebody please think of the users?"
argument, such work would be much welcomed. Sending patches would be a
great start.

> new e1000 driver is more superior in architecture and do the required
> work to make the new e1000 driver a full replacement for the old one.

Right, like everyone actually does things this way...

I wonder why we still have the OSS, old FireWire and IDE stacks around,
then?

> Regards,
>
> Anthony Liguori
>
> > Unwritten code tends to always sound nicer, but it remains to be seen
> > if it can deliver what it promises.
> >
> > From a abstract stand point having efficient paravirtual IO interfaces
> > seem attractive.
> >
> > I also personally don't see a big problem in having another set of
> > virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
> > s390-vm, ...) and it's not that they would be a particular maintenance
> > burden impacting the kernel core.

Exactly, I also don't see any problem here, especially since the
AlacrityVM drivers have a much cleaner design / internal architecture
than some of their competitors.

--
Bartlomiej Zolnierkiewicz

2009-12-22 16:21:21

by Andi Kleen

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> So far, the only actual technical advantage I've seen is that vbus avoids
> EOI exits.

The technical advantage is that it's significantly faster today.

Maybe your proposed alternative is as fast, or maybe it's not. Who knows?

> We think we understand why vbus does better than the current userspace
> virtio backend. That's why we're building vhost-net. It's not done yet,
> but our expectation is that it will do just as well if not better.

That's the vapourware vs working code disconnect I mentioned. One side
has hard numbers & working code, and the other has expectations. I
usually find it sad when the vapourware holds up the working code.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-22 16:21:47

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
>> new e1000 driver is more superior in architecture and do the required
>> work to make the new e1000 driver a full replacement for the old one.
>>
> Right, like everyone actually does things this way..
>
> I wonder why do we have OSS, old Firewire and IDE stacks still around then?
>

And it's always a source of pain, isn't it.

>>> I also personally don't see a big problem in having another set of
>>> virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
>>> s390-vm, ...) and it's not that they would be a particular maintenance
>>> burden impacting the kernel core.
>>>
> Exactly, I also don't see any problem here, especially since AlacrityVM
> drivers have much cleaner design / internal architecture than some of their
> competitors..
>

Care to provide some actual objective argument as to why it's better
than what we already have?

Regards,

Anthony Liguori

2009-12-22 16:27:39

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 10:21 AM, Andi Kleen wrote:
>> So far, the only actual technical advantage I've seen is that vbus avoids
>> EOI exits.
>>
> The technical advantage is that it's significantly faster today.
>

There are two separate pieces of code in question. There are front-end
drivers and there are back-end drivers.

Right now, there are only front-end drivers in the kernel. The
combination of vbus front-end drivers and *kernel* back-end drivers is
faster than the *combination* of virtio front-end drivers and
*userspace* back-end drivers.

vhost-net is our kernel back-end driver. No one has yet established
that the combination of virtio front-end driver and kernel back-end
driver is really significantly slower than vbus.

> Maybe your proposed alternative is as fast, or maybe it's not. Who knows?
>
>
>> We think we understand why vbus does better than the current userspace
>> virtio backend. That's why we're building vhost-net. It's not done yet,
>> but our expectation is that it will do just as well if not better.
>>
> That's the vapourware vs working code disconnect I mentioned. One side has hard
> numbers&working code and the other has expectations. I usually find it sad when the
> vapourware holds up the working code.
>

We're not talking about vaporware. vhost-net exists.

Regards,

Anthony Liguori

> -Andi
>

2009-12-22 17:06:14

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 06:21 PM, Andi Kleen wrote:
>> So far, the only actual technical advantage I've seen is that vbus avoids
>> EOI exits.
>>
> The technical advantage is that it's significantly faster today.
>
> Maybe your proposed alternative is as fast, or maybe it's not. Who knows?
>

We're working on numbers for the proposed alternative, so we should know
soon. Are the AlacrityVM folks working on having all the virtio drivers
for all the virtio archs?

We shouldn't drop everything and switch to new code just because someone
came up with a new idea. The default should be to enhance the existing
code.

>> We think we understand why vbus does better than the current userspace
>> virtio backend. That's why we're building vhost-net. It's not done yet,
>> but our expectation is that it will do just as well if not better.
>>
> That's the vapourware vs working code disconnect I mentioned. One side has hard
> numbers&working code and the other has expectations. I usually find it sad when the
> vapourware holds up the working code.
>

vhost-net is working code and is queued for 2.6.33.

--
error compiling committee.c: too many arguments to function

2009-12-22 17:33:33

by Andi Kleen

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> We're not talking about vaporware. vhost-net exists.

Is it as fast as the AlacrityVM setup, then, e.g. for network traffic?

Last I heard, the AlacrityVM setup could do wire-speed 10Gbit/s on
standard hardware. Can vhost-net do the same thing?

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-22 17:36:34

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 07:33 PM, Andi Kleen wrote:
>> We're not talking about vaporware. vhost-net exists.
>>
> Is it as fast as the alacrityvm setup then e.g. for network traffic?
>
> Last I heard the first could do wirespeed 10Gbit/s on standard hardware.
>

That was with zero-copy IIRC, which is known to be broken. There's
nothing alacrity-specific about zero-copy (and in fact the first
zero-copy patches were from Rusty).

> Can vhost-net do the same thing?
>

I've heard unofficial numbers which approach that, but let's wait for
the official ones.

--
error compiling committee.c: too many arguments to function

2009-12-22 17:37:13

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:57 AM, Ingo Molnar wrote:
>
> * Gregory Haskins <[email protected]> wrote:
>
>> On 12/18/09 4:51 PM, Ingo Molnar wrote:
>>>
>>> * Gregory Haskins <[email protected]> wrote:
>>>
>>>> Hi Linus,
>>>>
>>>> Please pull AlacrityVM guest support for 2.6.33 from:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
>>>> for-linus
>>>>
>>>> All of these patches have stewed in linux-next for quite a while now:
>>>>
>>>> Gregory Haskins (26):
>>>
>>> I think it would be fair to point out that these patches have been objected to
>>> by the KVM folks quite extensively,
>>
>> Actually, these patches have nothing to do with the KVM folks. [...]
>
> That claim is curious to me - the AlacrityVM host

It's quite simple, really. These drivers support accessing vbus, and
vbus is hypervisor-agnostic. In fact, vbus isn't necessarily even
hypervisor-related. It may be used anywhere a Linux kernel is the
"io backend", which includes hypervisors like AlacrityVM, but also
userspace apps and interconnected physical systems as well.

The vbus-core on the backend and the drivers on the frontend operate
completely independently of the underlying hypervisor. A glue piece
called a "connector" ties them together, and any hypervisor-specific
details are encapsulated in the connector module. In this case, the
connector surfaces to the guest side as a pci-bridge, so even that is
not hypervisor-specific per se. It will work with any pci-bridge that
exposes a compatible ABI, which conceivably could be actual hardware.
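
To give a feel for the shape of that split, a connector essentially
reduces to an ops table like the following (purely illustrative names,
not the actual vbus headers):

/* Illustrative sketch only -- invented names, not the real vbus API. */
#include <linux/types.h>

struct vbus_connector;

struct vbus_connector_ops {
	/* how a shm-signal reaches the guest (KVM irq, eventfd, ...)  */
	int   (*signal_guest)(struct vbus_connector *conn, u32 shm_id);
	/* how backend code maps guest memory for the shared rings     */
	void *(*map_guest)(struct vbus_connector *conn, u64 gpa, size_t len);
	void  (*unmap_guest)(struct vbus_connector *conn, void *ptr);
	void  (*release)(struct vbus_connector *conn);
};

struct vbus_connector {
	const char                      *name;  /* e.g. "kvm", "userspace" */
	const struct vbus_connector_ops *ops;
};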

The AlacrityVM project just so happens to be the primary consumer, and
is therefore the most convenient way to package them up at the moment.

> is 90% based on KVM code, so
> how can it not be about KVM? I just checked, most of the changes that
> AlacrityVM host does to KVM is in adding the host side interfaces for these
> guest drivers:
>
> virt/kvm/Kconfig | 11 +
> virt/kvm/coalesced_mmio.c | 65 +++---
> virt/kvm/coalesced_mmio.h | 1 +
> virt/kvm/eventfd.c | 599 +++++++++++++++++++++++++++++++++++++++++++++
> virt/kvm/ioapic.c | 118 +++++++--
> virt/kvm/ioapic.h | 5 +
> virt/kvm/iodev.h | 55 +++--
> virt/kvm/irq_comm.c | 267 ++++++++++++++-------
> virt/kvm/kvm_main.c | 127 ++++++++--
> virt/kvm/xinterface.c | 587 ++++++++++++++++++++++++++++++++++++++++++++
> 10 files changed, 1649 insertions(+), 186 deletions(-)
>
> [ stat for virt/kvm/ taken as of today, AlacrityVM host tree commit 84afcc7 ]
>
> So as far as kernel code modifications of AlacrityVM goes, it's very much
> about KVM.

I think you are confused. Even if we entertained the notion that the
host side diffstat were somehow relevant here, you are probably
comparing the kvm.git backports that are in my tree. The only real KVM
specific change that is in my tree is the 587 lines for the xinterface.c
module, which is roughly 4%, not 90%. Also note that I have pushed this
xinterface logic upstream already, but it just hasn't been accepted yet.

If I wanted to be extremely generous, you could include the entire "KVM
connector" code that bridges vbus-core to kvm-core, but even that tops
out at a total of ~17% of the changes in my tree. So I am still not
seeing the 90% nor how it is relevant.

>
>> [...] You are perhaps confusing this with the hypervisor-side discussion,
>> of which there is indeed much disagreement.
>
> Are the guest drivers living in a vacuum? The whole purpose of the AlacrityVM
> guest drivers is to ... enable AlacrityVM support, right?

More specifically, the purpose of these drivers, like any drivers, is to
enable support for the underlying devices to which they are related. In
this case, the devices are vbus based devices. Of those, AlacrityVM is
the only available platform that exposes them. However, that is a
maturity/adoption detail, not a technical limitation. Simply
implementing a new connector would bridge these drivers to other
environments as well. There are community members working on these as
we speak, as a matter of fact.

> So how can it be not about KVM?

Because AlacrityVM is a hypervisor that supports VBUS for PV IO, and KVM
is not. In addition, the presence of these drivers in no way alters,
interferes with, or diminishes features found in KVM today. So it is not,
and never will be, about KVM until upstream KVM decides that they want to
support VBUS based PV-IO.

If you want to talk about the host side, then I have +587 lines that
hang in the balance that affect KVM, yes. But that isn't what $subject
was about.

>
> Gregory, it would be nice if you worked _much_ harder with the KVM folks
> before giving up.

I think the 5+ months that I politely tried to convince the KVM folks
that this was a good idea was pretty generous of my employer. The KVM
maintainers have ultimately made it clear they are not interested in
directly supporting this concept (which is their prerogative), but are
perhaps willing to support the peripheral logic needed to allow it to
easily interface with KVM. I can accept that, and thus AlacrityVM was born.

Note that upstream KVM are also only a subset of the mindshare needed
for this project anyway, since most of the core is independent of KVM.
Perhaps the KVM folks will reconsider if/when other community members
start to see the merit in the work. Perhaps not. It's out of my
control at this point.

> It's not like there's much valid technical disagreement that
> i can identify in any of the threads

While I am sorry to hear that, it should be noted that this doesn't mean
that your perception is accurate, either. It was quite a long and
fragmented set of threads over those 5+ months, so absorbing the gist of
the vision from casual observation is not likely trivial.

> - the strongest one i could identify was:
> "I want to fork KVM so please let me do it, nobody is harmed, choice is good".

Everyone is of course entitled to an opinion, but I would respectfully
disagree with your statement (as I did last time you made the same
claim, as well). I have not now, nor ever, wanted a fork. But I also
believe in the work I am doing, so I won't roll over and die just
because a certain group doesn't share the vision per se either, sorry.
I get the impression that you would not either if you were in a similar
situation, so perhaps you can respect that.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 18:55:12

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 07:36 PM, Gregory Haskins wrote:
>
>> Gregory, it would be nice if you worked _much_ harder with the KVM folks
>> before giving up.
>>
> I think the 5+ months that I politely tried to convince the KVM folks
> that this was a good idea was pretty generous of my employer. The KVM
> maintainers have ultimately made it clear they are not interested in
> directly supporting this concept (which is their prerogative), but are
> perhaps willing to support the peripheral logic needed to allow it to
> easily interface with KVM. I can accept that, and thus AlacrityVM was born.
>

Review pointed out locking issues with xinterface which I have not seen
addressed. I asked why the irqfd/ioeventfd mechanisms are insufficient,
and you did not reply.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 18:56:48

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 1:53 PM, Avi Kivity wrote:
> On 12/22/2009 07:36 PM, Gregory Haskins wrote:
>>
>>> Gregory, it would be nice if you worked _much_ harder with the KVM folks
>>> before giving up.
>>>
>> I think the 5+ months that I politely tried to convince the KVM folks
>> that this was a good idea was pretty generous of my employer. The KVM
>> maintainers have ultimately made it clear they are not interested in
>> directly supporting this concept (which is their prerogative), but are
>> perhaps willing to support the peripheral logic needed to allow it to
>> easily interface with KVM. I can accept that, and thus AlacrityVM was
>> born.
>>
>
> Review pointed out locking issues with xinterface which I have not seen
> addressed. I asked why the irqfd/ioeventfd mechanisms are insufficient,
> and you did not reply.
>

Yes, I understand. I've been too busy to rework the code for an
upstream push. I will certainly address those questions when I make the
next attempt, but they weren't relevant to the guest side.


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:15:50

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 1:53 PM, Avi Kivity wrote:
> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>

BTW: the ioeventfd issue just fell through the cracks, so sorry about
that. Note that I have no specific issue with irqfd ever since the
lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two
reasons:

1) the underlying eventfd is called in atomic context. I had posted
patches to Davide to address that limitation, but I believe he rejected
them on the grounds that they are only relevant to KVM.

2) it cannot retain the data field passed in the PIO. I wanted to have
one vector that could tell me what value was written, and this cannot be
expressed in ioeventfd.
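
To illustrate point 2: an eventfd only transfers an accumulating 64-bit
counter, so a per-event payload such as the written PIO value simply
cannot ride along. A trivial userspace example (nothing KVM-specific
about it):

/* Two "signals" of 1 and 5 are read back as a single counter of 6;
 * the individual values are gone. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
        uint64_t val;
        int fd = eventfd(0, 0);

        val = 1;
        write(fd, &val, sizeof(val));   /* signal #1 */
        val = 5;
        write(fd, &val, sizeof(val));   /* signal #2 */

        read(fd, &val, sizeof(val));
        printf("counter = %llu\n", (unsigned long long)val); /* prints 6 */

        close(fd);
        return 0;
}

(KVM's ioeventfd can match on the written value when deciding whether to
signal, but what the consumer sees is still only this counter.)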

Based on this, it was a better decision to add a ioevent interface to
xinterface. It neatly solves both problems.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:26:30

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:15 PM, Gregory Haskins wrote:
> On 12/22/09 1:53 PM, Avi Kivity wrote:
>
>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>>
>>
> BTW: the ioeventfd issue just fell through the cracks, so sorry about
> that. Note that I have no specific issue with irqfd ever since the
> lockless IRQ injection code was added.
>
> ioeventfd turned out to be suboptimal for me in the fast path for two
> reasons:
>
> 1) the underlying eventfd is called in atomic context. I had posted
> patches to Davide to address that limitation, but I believe he rejected
> them on the grounds that they are only relevant to KVM.
>

If you're not doing something pretty minor, you're better off waking up a
thread (perhaps _sync if you want to keep on the same cpu). With the
new user return notifier thingie, that's pretty cheap.
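
One minimal way to do that, as a sketch (the example_* names are
invented; schedule_work()/INIT_WORK() are the stock workqueue API):

#include <linux/kernel.h>
#include <linux/workqueue.h>

struct example_backend {
        struct work_struct      work;
        /* ... device state ... */
};

/* Runs later in process context: free to sleep, take mutexes, etc. */
static void example_work_fn(struct work_struct *work)
{
        struct example_backend *be =
                container_of(work, struct example_backend, work);

        /* do the heavy lifting here */
        (void)be;
}

/* Called from the eventfd wakeup path, i.e. atomic context:
 * do nothing but queue the work. */
static void example_signal(struct example_backend *be)
{
        schedule_work(&be->work);
}

static void example_init(struct example_backend *be)
{
        INIT_WORK(&be->work, example_work_fn);
}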

> 2) it cannot retain the data field passed in the PIO. I wanted to have
> one vector that could tell me what value was written, and this cannot be
> expressed in ioeventfd.
>
>

It would be easier to add data logging support to ioeventfd, if it was
needed that badly.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:32:35

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:25 PM, Avi Kivity wrote:
> On 12/22/2009 09:15 PM, Gregory Haskins wrote:
>> On 12/22/09 1:53 PM, Avi Kivity wrote:
>>
>>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you
>>> did not reply.
>>>
>>>
>> BTW: the ioeventfd issue just fell through the cracks, so sorry about
>> that. Note that I have no specific issue with irqfd ever since the
>> lockless IRQ injection code was added.
>>
>> ioeventfd turned out to be suboptimal for me in the fast path for two
>> reasons:
>>
>> 1) the underlying eventfd is called in atomic context. I had posted
>> patches to Davide to address that limitation, but I believe he rejected
>> them on the grounds that they are only relevant to KVM.
>>
>
> If you're not doing something pretty minor, you're better of waking up a
> thread (perhaps _sync if you want to keep on the same cpu). With the
> new user return notifier thingie, that's pretty cheap.

We have exploits that take advantage of IO heuristics. When triggered
they do more work in vcpu context than normal, which reduces latency
under certain circumstances. But you definitely do _not_ want to do
them in-atomic ;)

>
>> 2) it cannot retain the data field passed in the PIO. I wanted to have
>> one vector that could tell me what value was written, and this cannot be
>> expressed in ioeventfd.
>>
>>
>
> It would be easier to add data logging support to ioeventfd, if it was
> needed that badly.

"Better design"? perhaps. "More easily"? no. Besides, Davide has
already expressed dissatisfaction with the KVM-isms creeping into
eventfd, so it's not likely to ever be accepted regardless of your own
disposition.

xinterface, as it turns out, is a great KVM interface for me and easy to
extend, all without conflicting with the changes in upstream. The old
way was via the kvm ioctl interface, but that sucked as the ABI was
always moving. Where is the problem? ioeventfd still works fine as it is.

Kind Regards,
-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:37:36

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:32 PM, Gregory Haskins wrote:
> On 12/22/09 2:25 PM, Avi Kivity wrote:

>>
>> If you're not doing something pretty minor, you're better of waking up a
>> thread (perhaps _sync if you want to keep on the same cpu). With the
>> new user return notifier thingie, that's pretty cheap.
>
> We have exploits that take advantage of IO heuristics. When triggered
> they do more work in vcpu context than normal, which reduces latency
> under certain circumstances. But you definitely do _not_ want to do
> them in-atomic ;)

And I almost forgot: dev->call() is an RPC to the backend device.
Therefore, it must be synchronous, yet we don't want it locked either. I
think that was actually the primary motivation for the change, now that
I think about it.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:39:32

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:32 PM, Gregory Haskins wrote:
> xinterface, as it turns out, is a great KVM interface for me and easy to
> extend, all without conflicting with the changes in upstream. The old
> way was via the kvm ioctl interface, but that sucked as the ABI was
> always moving. Where is the problem? ioeventfd still works fine as it is.
>

It means that kvm locking suddenly affects more of the kernel.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:40:07

by Davide Libenzi

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tue, 22 Dec 2009, Gregory Haskins wrote:

> On 12/22/09 1:53 PM, Avi Kivity wrote:
> > I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
> >
>
> BTW: the ioeventfd issue just fell through the cracks, so sorry about
> that. Note that I have no specific issue with irqfd ever since the
> lockless IRQ injection code was added.
>
> ioeventfd turned out to be suboptimal for me in the fast path for two
> reasons:
>
> 1) the underlying eventfd is called in atomic context. I had posted
> patches to Davide to address that limitation, but I believe he rejected
> them on the grounds that they are only relevant to KVM.

I thought we addressed this already, in the few hundreds of email we
exchanged back then :)



> 2) it cannot retain the data field passed in the PIO. I wanted to have
> one vector that could tell me what value was written, and this cannot be
> expressed in ioeventfd.

As Avi might have hinted in his reply, couldn't you add data support to the
ioeventfd bits in KVM, instead of leaking them into mainline eventfd?



- Davide

2009-12-22 19:41:48

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:32 PM, Gregory Haskins wrote:
> Besides, Davide has
> already expressed dissatisfaction with the KVM-isms creeping into
> eventfd, so its not likely to ever be accepted regardless of your own
> disposition.
>

Why don't you duplicate eventfd, then? It should be easier than duplicating
virtio.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:41:30

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:38 PM, Avi Kivity wrote:
> On 12/22/2009 09:32 PM, Gregory Haskins wrote:
>> xinterface, as it turns out, is a great KVM interface for me and easy to
>> extend, all without conflicting with the changes in upstream. The old
>> way was via the kvm ioctl interface, but that sucked as the ABI was
>> always moving. Where is the problem? ioeventfd still works fine as
>> it is.
>>
>
> It means that kvm locking suddenly affects more of the kernel.
>

That's ok. This would only be w.r.t. devices that are bound to the KVM
instance anyway, so they better know what they are doing (and they do).

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:44:40

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:41 PM, Gregory Haskins wrote:
>
>> It means that kvm locking suddenly affects more of the kernel.
>>
>>
> Thats ok. This would only be w.r.t. devices that are bound to the KVM
> instance anyway, so they better know what they are doing (and they do).
>
>

It's okay for the author of that device. It's not okay for the kvm
developers who are still evolving the locking and have to handle all
devices that use xinterface.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:47:37

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:43 PM, Avi Kivity wrote:
> On 12/22/2009 09:41 PM, Gregory Haskins wrote:
>>
>>> It means that kvm locking suddenly affects more of the kernel.
>>>
>>>
>> Thats ok. This would only be w.r.t. devices that are bound to the KVM
>> instance anyway, so they better know what they are doing (and they do).
>>
>>
>
> It's okay to the author of that device. It's not okay to the kvm
> developers who are still evolving the locking and have to handle all
> devices that use xinterface.

Perhaps, but like it or not, if you want to do in-kernel IO you need to
invoke backends. And if you want to invoke backends, limiting it to
thread wakeups is, well, limiting. For one, you miss out on that
exploit I mentioned earlier which can help sometimes.

Besides, the direction that Marcelo and I left the mmio/pio bus was that
it would go lockless eventually, not "more lockful" ;)

Has that changed? I honestly haven't followed what's going on in the
io-bus code in a while.

-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:53:28

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:39 PM, Davide Libenzi wrote:
> On Tue, 22 Dec 2009, Gregory Haskins wrote:
>
>> On 12/22/09 1:53 PM, Avi Kivity wrote:
>>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>>>
>>
>> BTW: the ioeventfd issue just fell through the cracks, so sorry about
>> that. Note that I have no specific issue with irqfd ever since the
>> lockless IRQ injection code was added.
>>
>> ioeventfd turned out to be suboptimal for me in the fast path for two
>> reasons:
>>
>> 1) the underlying eventfd is called in atomic context. I had posted
>> patches to Davide to address that limitation, but I believe he rejected
>> them on the grounds that they are only relevant to KVM.
>
> I thought we addressed this already, in the few hundreds of email we
> exchanged back then :)

We addressed the race conditions, but not the atomic callbacks. I can't
remember exactly what you said, but the effect was "no", so I dropped it. ;)

This was the thread.

http://www.archivum.info/[email protected]/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

>
>
>
>> 2) it cannot retain the data field passed in the PIO. I wanted to have
>> one vector that could tell me what value was written, and this cannot be
>> expressed in ioeventfd.
>
> Like might have hinted in his reply, couldn't you add data support to the
> ioeventfd bits in KVM, instead of leaking them into mainline eventfd?
>

Perhaps, or even easier I could extend xinterface. Which is what I did ;)

The problem with the first proposal is that you would no longer actually
have an eventfd based mechanism...so any code using ioeventfd (like
Michael Tsirkin's for instance) would break.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 20:41:52

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 7:12 PM, Anthony Liguori wrote:
> On 12/21/2009 11:44 AM, Gregory Haskins wrote:
>> Well, surely something like SR-IOV is moving in that direction, no?
>>
>
> Not really, but that's a different discussion.

Ok, but my general point still stands. At some level, some crafty
hardware engineer may invent something that obsoletes the
need for, say, PV 802.x drivers because it can hit 40GE line rate at the
same performance level as bare metal with some kind of pass-through
trick. But I still do not see that as an excuse for sloppy software in
the meantime, as there will always be older platforms, older IO cards,
or different IO types that are not beneficiaries of said hw based
optimizations.

>
>>> But let's focus on concrete data. For a given workload,
>>> how many exits do you see due to EOI?
>>>
>> Its of course highly workload dependent, and I've published these
>> details in the past, I believe. Off the top of my head, I recall that
>> virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
>> venet on a 10GE box, but I don't recall what ratio of those exits are
>> EOI.
>
> Was this userspace virtio-pci or was this vhost-net?

Both, actually, though userspace is obviously even worse.

> If it was the
> former, then were you using MSI-X?

MSI-X

> If you weren't, there would be an
> additional (rather heavy) exit per-interrupt to clear the ISR which
> would certainly account for a large portion of the additional exits.
>

Yep, if you don't use MSI it is significantly worse as expected.


>> To be perfectly honest, I don't care. I do not discriminate
>> against the exit type...I want to eliminate as many as possible,
>> regardless of the type. That's how you go fast and yet use less CPU.
>>
>
> It's important to understand why one mechanism is better than another.

Agreed, but note _I_ already understand why. I've certainly spent
countless hours/emails trying to get others to understand as well, but
it seems most are too busy to actually listen.


> All I'm looking for is a set of bullet points that say, vbus does this,
> vhost-net does that, therefore vbus is better. We would then either
> say, oh, that's a good idea, let's change vhost-net to do that, or we
> would say, hrm, well, we can't change vhost-net to do that because of
> some fundamental flaw, let's drop it and adopt vbus.
>
> It's really that simple :-)

This has all been covered ad nauseam, directly with yourself in many
cases. Google is your friend.

Here are some tips while you research: Do not fall into the trap of
vhost-net vs vbus, or venet vs virtio-net, or you miss the point
entirely. Recall that venet was originally crafted to demonstrate the
virtues of my three performance objectives (kill exits, reduce exit
overhead, and run concurrently). Then there is all the stuff we are
laying on top, like qos, real-time, advanced fabrics, and easy adoption
for various environments (so it doesn't need to be redefined each time).

Therefore if you only look at the limited feature set of virtio-net, you
will miss the majority of the points of the framework. virtio tried to
capture some of these ideas, but it missed the mark on several levels
and was only partially defined. Incidentally, you can still run virtio
over vbus if desired, but so far no one has tried to use my transport.

>
>
>>> They should be relatively rare
>>> because obtaining good receive batching is pretty easy.
>>>
>> Batching is poor mans throughput (its easy when you dont care about
>> latency), so we generally avoid as much as possible.
>>
>
> Fair enough.
>
>>> Considering
>>> these are lightweight exits (on the order of 1-2us),
>>>
>> APIC EOIs on x86 are MMIO based, so they are generally much heavier than
>> that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
>> never mind executing the locking/apic-emulation code.
>>
>
> You won't like to hear me say this, but Woodcrests are pretty old and
> clunky as far as VT goes :-)

Fair enough.

>
> On a modern Nehalem, I would be surprised if an MMIO exit handled in the
> kernel was muck more than 2us. The hardware is getting very, very
> fast. The trends here are very important to consider when we're looking
> at architectures that we potentially are going to support for a long time.

The exit you do not take will always be infinitely faster.

>
>>> you need an awfully
>>> large amount of interrupts before you get really significant performance
>>> impact. You would think NAPI would kick in at this point anyway.
>>>
>>>
>> Whether NAPI can kick in or not is workload dependent, and it also does
>> not address coincident events. But on that topic, you can think of
>> AlacrityVM's interrupt controller as "NAPI for interrupts", because it
>> operates on the same principle. For what its worth, it also operates on
>> a "NAPI for hypercalls" concept too.
>>
>
> The concept of always batching hypercalls has certainly been explored
> within the context of Xen.

I am not talking about batching, which again is a poor man's throughput
trick at the expense of latency. This literally is a "NAPI"-like
signaled/polled hybrid, just going in the south direction.
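
In the abstract, the pattern looks something like the sketch below
(illustration only, not the actual AlacrityVM code; every example_* name
is made up):

#include <linux/types.h>

struct example_ring;    /* opaque for the sketch */

bool example_ring_has_work(struct example_ring *ring);
void example_process_one(struct example_ring *ring);
void example_mask_notifications(struct example_ring *ring);
void example_unmask_notifications(struct example_ring *ring);

/*
 * Take one notification, then suppress further notifications and poll
 * until the ring is drained.  Re-check after unmasking to close the
 * obvious race.
 */
static void example_notify_handler(struct example_ring *ring)
{
        for (;;) {
                example_mask_notifications(ring);

                while (example_ring_has_work(ring))
                        example_process_one(ring);

                example_unmask_notifications(ring);

                if (!example_ring_has_work(ring))
                        break;
        }
}

Under load the ring rarely goes empty, so very few notifications (and
therefore exits and EOIs) are taken; when traffic is light it
degenerates to the ordinary one-notification-per-event behavior.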

> But then when you look at something like
> KVM's hypercall support, it turns out that with sufficient cleverness in
> the host, we don't even bother with the MMU hypercalls anymore.
>
> Doing fancy things in the guest is difficult to support from a long term
> perspective. It'll more or less never work for Windows and even the lag
> with Linux makes it difficult for users to see the benefit of these
> changes. You get a lot more flexibility trying to solve things in the
> host even if it's convoluted (like TPR patching).
>
>>> Do you have data demonstrating the advantage of EOI mitigation?
>>>
>> I have non-scientifically gathered numbers in my notebook that put it on
>> average of about 55%-60% reduction in EOIs for inbound netperf runs, for
>> instance. I don't have time to gather more in the near term, but its
>> typically in that range for a chatty enough workload, and it goes up as
>> you add devices. I would certainly formally generate those numbers when
>> I make another merge request in the future, but I don't have them now.
>>
>
> I don't think it's possible to make progress with vbus without detailed
> performance data comparing both vbus and virtio (vhost-net). On the
> virtio/vhost-net side, I think we'd be glad to help gather/analyze that
> data. We have to understand why one's better than the other and then we
> have to evaluate whether we can bring those benefits into the later. If
> we can't, we merge vbus. If we can, we fix virtio.

You will need apples to apples to gain any meaningful data, and that
means running both on the same setup on the same base kernel, etc. My
trees, and instructions on how to run them, are referenced on the
alacrityvm site. I can probably send you a quilt series for any recent
kernel you may wish to try if the git tree is not sufficient.

Note that if you enable zero-copy (which is on by default), you may want
to increase the guest's wmem buffers since the transmit buffer reclaim
path is longer and you can artificially stall the guest side stack.
Generally 1MB-2MB should suffice. Otherwise just disable zero-copy with
"echo 0 > /sys/vbus/devices/$dev/zcthresh" on the host.

After you try basic tests, try lots of request-response and multi-homed
configurations, and watch your exit and interrupt rates as you do so, in
addition to the obvious metrics.

Good luck, and of course ping me with any troubles getting it to run.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 21:14:26

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 11:33 AM, Andi Kleen wrote:
>> We're not talking about vaporware. vhost-net exists.
>>
> Is it as fast as the alacrityvm setup then e.g. for network traffic?
>
> Last I heard the first could do wirespeed 10Gbit/s on standard hardware.
>

I'm very wary of any such claims. As far as I know, no one has done an
exhaustive study of vbus and published the results. This is why it's so
important to understand why the results are what they are when we see
numbers posted.

For instance, check out
http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf slide 32.

These benchmarks show KVM without vhost-net pretty closely pacing
native. With large message sizes, it's awfully close to line rate.

Comparatively speaking, consider
http://developer.novell.com/wiki/index.php/AlacrityVM/Results

vbus here is pretty far off of native, and virtio-net is ridiculous.

Why are the results so different? Because benchmarking is fickle and
networking performance is complicated. No single benchmarking scenario is
going to give you a very good picture overall. It's also relatively
easy to stack the cards in favor of one approach versus another. The
virtio-net setup probably made extensive use of pinning and other tricks
to make things faster than a normal user would see them. It ends up
creating a perfect combination of batching which is pretty much just
cooking the mitigation schemes to do extremely well for one benchmark.

This is why it's so important to look at vbus from the perspective of
critically asking, what precisely makes it better than virtio. A couple
benchmarks on a single piece of hardware does not constitute an
existence proof that it's better overall.

There are a ton of differences between virtio and vbus because vbus was
written in a vacuum wrt virtio. I'm not saying we are totally committed
to virtio no matter what, but it should take a whole lot more than a
couple netperf runs on a single piece of hardware for a single kind of
driver to justify replacing it.

> Can vhost-net do the same thing?

I think the fundamental question is, what makes vbus better than
vhost-net? vhost-net exists and is further along upstream than vbus is
at the moment. If that question cannot be answered with technical facts
and numbers to back them up, then we're just arguing for the sake of
arguing.

Regards,

Anthony Liguori

> -Andi
>

2009-12-23 00:03:47

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Anthony Liguori ([email protected]) wrote:
> The
> virtio-net setup probably made extensive use of pinning and other tricks
> to make things faster than a normal user would see them. It ends up
> creating a perfect combination of batching which is pretty much just
> cooking the mitigation schemes to do extremely well for one benchmark.

Just pinning; the rest is stock virtio features like mergeable rx buffers,
GRO, GSO (tx mitigation is actually disabled). Certainly doesn't show
throughput in terms of cpu cycle cost (scaling) nor latency per-packet
(exit and mitigation).

thanks,
-chris

2009-12-23 01:05:25

by Davide Libenzi

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tue, 22 Dec 2009, Gregory Haskins wrote:

> On 12/22/09 2:39 PM, Davide Libenzi wrote:
> > On Tue, 22 Dec 2009, Gregory Haskins wrote:
> >
> >> On 12/22/09 1:53 PM, Avi Kivity wrote:
> >>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
> >>>
> >>
> >> BTW: the ioeventfd issue just fell through the cracks, so sorry about
> >> that. Note that I have no specific issue with irqfd ever since the
> >> lockless IRQ injection code was added.
> >>
> >> ioeventfd turned out to be suboptimal for me in the fast path for two
> >> reasons:
> >>
> >> 1) the underlying eventfd is called in atomic context. I had posted
> >> patches to Davide to address that limitation, but I believe he rejected
> >> them on the grounds that they are only relevant to KVM.
> >
> > I thought we addressed this already, in the few hundreds of email we
> > exchanged back then :)
>
> We addressed the race conditions, but not the atomic callbacks. I can't
> remember exactly what you said, but the effect was "no", so I dropped it. ;)
>
> This was the thread.
>
> http://www.archivum.info/[email protected]/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

Didn't that end up with schedule_work() being just fine, and no need
for pre-emptible callbacks?



> >> 2) it cannot retain the data field passed in the PIO. I wanted to have
> >> one vector that could tell me what value was written, and this cannot be
> >> expressed in ioeventfd.
> >
> > Like might have hinted in his reply, couldn't you add data support to the
> > ioeventfd bits in KVM, instead of leaking them into mainline eventfd?
> >
>
> Perhaps, or even easier I could extend xinterface. Which is what I did ;)
>
> The problem with the first proposal is that you would no longer actually
> have an eventfd based mechanism...so any code using ioeventfd (like
> Michael Tsirkin's for instance) would break.

At that point, the KVM eventfd can take care of things so that Michael's
bits do not break.



- Davide

2009-12-23 06:15:52

by Kyle Moffett

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
<[email protected]> wrote:
> On 12/22/09 2:57 AM, Ingo Molnar wrote:
>> * Gregory Haskins <[email protected]> wrote:
>>> Actually, these patches have nothing to do with the KVM folks. [...]
>>
>> That claim is curious to me - the AlacrityVM host
>
> It's quite simple, really.  These drivers support accessing vbus, and
> vbus is hypervisor agnostic.  In fact, vbus isn't necessarily even
> hypervisor related.  It may be used anywhere where a Linux kernel is the
> "io backend", which includes hypervisors like AlacrityVM, but also
> userspace apps, and interconnected physical systems as well.
>
> The vbus-core on the backend, and the drivers on the frontend operate
> completely independent of the underlying hypervisor.  A glue piece
> called a "connector" ties them together, and any "hypervisor" specific
> details are encapsulated in the connector module.  In this case, the
> connector surfaces to the guest side as a pci-bridge, so even that is
> not hypervisor specific per se.  It will work with any pci-bridge that
> exposes a compatible ABI, which conceivably could be actual hardware.

This is actually something that is of particular interest to me. I
have a few prototype boards right now with programmable PCI-E
host/device links on them; one of my long-term plans is to finagle
vbus into providing multiple "virtual" devices across that single
PCI-E interface.

Specifically, I want to be able to provide virtual NIC(s), serial
ports and serial consoles, virtual block storage, and possibly other
kinds of interfaces. My big problem with existing virtio right now
(although I would be happy to be proven wrong) is that it seems to
need some sort of out-of-band communication channel for setting up
devices, not to mention it seems to need one PCI device per virtual
device.

So I would love to be able to port something like vbus to my nifty PCI
hardware and write some backend drivers... then my PCI-E connected
systems would dynamically provide a list of highly-efficient "virtual"
devices to each other, with only one 4-lane PCI-E bus.

Cheers,
Kyle Moffett

2009-12-23 06:51:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Anthony Liguori <[email protected]> wrote:

> On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
> >>new e1000 driver is more superior in architecture and do the required
> >>work to make the new e1000 driver a full replacement for the old one.
> >Right, like everyone actually does things this way..
> >
> >I wonder why do we have OSS, old Firewire and IDE stacks still around then?
>
> And it's always a source of pain, isn't it.

Even putting aside the fact that such overlap sucks and is a pain to users
(and that 98% of driver and subsystem version transitions are done completely
seamlessly to users - the examples that were cited were the odd ones out of
150K commits in the past 4 years, 149K+ of which are seamless), the comparison
does not even apply really.

e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not
fully understood externality, with its inevitable set of compatibility woes.
There are often situations where one piece of hardware still works better with
the old driver, for some odd (or not so odd) reason.

Also, note that the 'new' hw drivers are generally intended and are maintained
as clear replacements for the old ones, and do so with minimal ABI changes -
or preferably with no ABI changes at all. Most driver developers just switch
from old to new and the old bits are left around and are phased out. We phased
out old OSS recently.

That is a very different situation from the AlacrityVM patches, which:

- Are a pure software concept and any compatibility mismatch is
self-inflicted. The patches are in fact breaking the ABI to KVM
intentionally (for better or worse).

- Gregory claims that the AlacrityVM patches are not a replacement for KVM.
I.e. there's no intention to phase out any 'old stuff' and it splits the
pool of driver developers.

i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
is, if AlacrityVM is better, and KVM developers are not willing to fix their
stuff, replace KVM with it.

It's a bit as if someone found a performance problem with sys_open() and came
up with sys_open_v2() and claimed that he wants to work with the VFS
developers while not really doing so but advances sys_open_v2() all the time.

Do we allow sys_open_v2() upstream, in the name of openness and diversity,
letting some apps use that syscall while other apps still use sys_open()? Or
do we say "enough is enough of this stupidity, come up with some strong
reasons to replace sys_open, and if so, replace the thing and be done with the
pain!".

Overlap and forking can still be done in special circumstances, when a project
splits and a hostile fork is inevitable due to prolonged and irreconcilable
differences between the parties and if there's no strong technical advantage
on either side. I haven't seen evidence of this yet though: Gregory claims that
he wants to 'work with the community' and the KVM guys seem to agree violently
that performance can be improved - and are doing so (and are asking Gregory to
take part in that effort).

The main difference is that Gregory claims that improved performance is not
possible within the existing KVM framework, while the KVM developers disagree.
The good news is that this is a hard, testable fact.

I think we should try _much_ harder before giving up and forking the ABI of a
healthy project and intentionally inflicting pain on our users.

And, at minimum, such kinds of things _have to_ be pointed out in pull
requests, because it's utterly important. In fact I couldn't list a more
important thing to point out in a pull request.

Ingo

2009-12-23 10:13:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing

Nearly. There was no equivalent of a kernel based virtual driver host
before.

> - Are a pure software concept and any compatibility mismatch is
> self-inflicted. The patches are in fact breaking the ABI to KVM

In practice, especially considering older kernel releases, VMs
behave like hardware, with all its quirks, compatibility requirements,
sometimes not fully understood, etc.

> is, if AlacricityVM is better, and KVM developers are not willing to fix their
> stuff, replace KVM with it.

In the end the driver model is only a very small part of KVM though.

>
> It's a bit as if someone found a performance problem with sys_open() and came
> up with sys_open_v2() and claimed that he wants to work with the VFS
> developers while not really doing so but advances sys_open_v2() all the time.

AFAIK Gregory tried for several months to work with the KVM maintainers,
but failed at their NIH filter.

>
> Do we allow sys_open_v2() upstream, in the name of openness and diversity,
> letting some apps use that syscall while other apps still use sys_open()? Or
> do we say "enough is enough of this stupidity, come up with some strong
> reasons to replace sys_open, and if so, replace the thing and be done with the
> pain!".

I thought the published benchmark numbers were strong reasons.
I certainly haven't seen similarly convincing numbers for vhost-net.

> The main difference is that Gregory claims that improved performance is not
> possible within the existing KVM framework, while the KVM developers disagree.
> The good news is that this is a hard, testable fact.

Yes clearly the onus at this point is on the vhost-net developers/
"pci is all that is ever needed for PV" proponents to show similar numbers
with their current code.

If they can show the same performance there's really no need for
the alacrityvm model (or at least I haven't seen a convincing reason
other than performance so far to have a separate model)

I heard claims earlier from one side or the other that some benchmarks
were not fair. Such accusations are not very constructive
(I don't think anyone is trying to intentionally mislead others here),
but even if that's the case, surely the other side can do similar
benchmarks and demonstrate they are as fast.

-Andi

2009-12-23 10:23:44

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 12:13 PM, Andi Kleen wrote:
>> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
>>
> Nearly. There was no equivalent of a kernel based virtual driver host
> before.
>

These are guest drivers. We have virtio drivers, and Xen drivers (which
are Xen-specific).

>> - Are a pure software concept and any compatibility mismatch is
>> self-inflicted. The patches are in fact breaking the ABI to KVM
>>
> In practice, especially considering older kernel releases, VMs
> behave like hardware, with all its quirks, compatibility requirements,
> sometimes not fully understood, etc.
>

There was no attempt by Gregory to improve virtio-net.

>> It's a bit as if someone found a performance problem with sys_open() and came
>> up with sys_open_v2() and claimed that he wants to work with the VFS
>> developers while not really doing so but advances sys_open_v2() all the time.
>>
> AFAIK Gregory tried for several months to work with the KVM maintainers,
> but failed at their NIH filter.
>

It was the backwards compatibility, live migration, unneeded complexity,
and scalability filters from where I sit. vbus fails on all four.

>> The main difference is that Gregory claims that improved performance is not
>> possible within the existing KVM framework, while the KVM developers disagree.
>> The good news is that this is a hard, testable fact.
>>
> Yes clearly the onus at this point is on the vhost-net developers/
> "pci is all that is ever needed for PV" proponents to show similar numbers
> with their current code.
>
> If they can show the same performance there's really no need for
> the alacrityvm model (or at least I haven't seen a convincing reason
> other than performance so far to have a separate model)
>

Anthony posted this:

http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf

See slide 32. This is without vhost-net.

--
error compiling committee.c: too many arguments to function

2009-12-23 12:14:38

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>
> See slide 32. This is without vhost-net.

Thanks. Do you also have latency numbers?

It seems like there's definitely still potential for improvement
with messages <4K. But for the large messages they indeed
look rather good.

It's unclear what message size the Alacrity numbers used, but I presume
it was rather large.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-23 12:49:31

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 02:14 PM, Andi Kleen wrote:
>> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>>
>> See slide 32. This is without vhost-net.
>>
> Thanks. Do you also have latency numbers?
>

No. Copying Chris. This was with the tx mitigation timer disabled, so
you won't see the usual atrocious userspace virtio latencies, but it
won't be as good as a host kernel implementation since we take a
heavyweight exit and qemu is pretty unoptimized.

> It seems like there's definitely still potential for improvement
> with messages<4K. But for the large messages they indeed
> look rather good.
>

There's still a lot of optimization to be done, but I hope this proves
there is nothing inherently slow about virtio.

--
error compiling committee.c: too many arguments to function

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wednesday 23 December 2009 07:51:29 am Ingo Molnar wrote:
>
> * Anthony Liguori <[email protected]> wrote:
>
> > On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
> > >>new e1000 driver is more superior in architecture and do the required
> > >>work to make the new e1000 driver a full replacement for the old one.
> > >Right, like everyone actually does things this way..
> > >
> > >I wonder why do we have OSS, old Firewire and IDE stacks still around then?
> >
> > And it's always a source of pain, isn't it.
>
> Even putting aside the fact that such overlap sucks and is a pain to users
> (and that 98% of driver and subsystem version transitions are done completely
> seemlessly to users - the examples that were cited were the odd ones out of
> 150K commits in the past 4 years, 149K+ of which are seemless), the comparison
> does not even apply really.

Total commit number has nothing to do with the issue raised since the problem
is in the total source code complexity and the need to maintain separate code
bases.

[ BTW I find your habit of bringing the completely unrelated numbers into
the discussion quite annoying. Do you really think that throwing in some
random numbers automatically increases the credibility of your opinion? ]

> e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not
> fully understood externality, with its inevitable set of compatibility voes.
> There's often situations where one piece of hardware still works better with
> the old driver, for some odd (or not so odd) reason.
>
> Also, note that the 'new' hw drivers are generally intended and are maintained
> as clear replacements for the old ones, and do so with minimal ABI changes -
> or preferably with no ABI changes at all. Most driver developers just switch
> from old to new and the old bits are left around and are phased out. We phased
> out old OSS recently.

'We' as Fedora?

old OSS stuff is still there (sound/oss/ which is almost 45KLOC and would
be much more if not for past efforts from Adrian Bunk to shrink it down)

Besides, the 'phase out' that you are talking about comes down to just waiting
for the old hardware/user base to die, and it takes years to accomplish..

I can understand how this is not an issue from e.g. Red Hat's POV when you
have one 'set in stone' set of drivers in RHEL and the other 'constant
development flux' one in Fedora (which because of this fact can no
longer be considered a real distribution for real users BTW) but for everybody
else this is simply untrue.

> That is a very different situation from the AlacrityVM patches, which:
>
> - Are a pure software concept and any compatibility mismatch is
> self-inflicted. The patches are in fact breaking the ABI to KVM
> intentionally (for better or worse).

Care to explain the 'breakage' and why KVM is more special in this regard
than other parts of the kernel (where we don't keep any such requirements)?

Truth be told, KVM is just another driver/subsystem and Gregory's changes
are only 4KLOC of clean and easily maintainable code..

> - Gregory claims that the AlacricityVM patches are not a replacement for KVM.
> I.e. there's no intention to phase out any 'old stuff' and it splits the
> pool of driver developers.

Talk about double standards. It was you & co. that officially legitimized this
style of doing things, and now you are complaining about it?

> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
> is, if AlacricityVM is better, and KVM developers are not willing to fix their
> stuff, replace KVM with it.
>
> It's a bit as if someone found a performance problem with sys_open() and came
> up with sys_open_v2() and claimed that he wants to work with the VFS
> developers while not really doing so but advances sys_open_v2() all the time.
>
> Do we allow sys_open_v2() upstream, in the name of openness and diversity,
> letting some apps use that syscall while other apps still use sys_open()? Or
> do we say "enough is enough of this stupidity, come up with some strong
> reasons to replace sys_open, and if so, replace the thing and be done with the
> pain!".

I certainly missed the time when KVM became officially part of core ABI..

> Overlap and forking can still be done in special circumstances, when a project
> splits and a hostile fork is inevitable due to prolongued and irreconcilable
> differences between the parties and if there's no strong technical advantage
> on either side. I havent seen evidence of this yet though: Gregory claims that
> he wants to 'work with the community' and the KVM guys seem to agree violently
> that performance can be improved - and are doing so (and are asking Gregory to
> take part in that effort).

How is it different from any past forks?

The onus of proving that the existing framework is sufficient was always on
the original authors or current maintainers.

The KVM guys were offered assistance from Gregory and had a few months to prove
that they could get the same kind of performance using the existing
architecture, and they DID NOT do it.

> The main difference is that Gregory claims that improved performance is not
> possible within the existing KVM framework, while the KVM developers disagree.
>
> The good news is that this is a hard, testable fact.
>
> I think we should try _much_ harder before giving up and forking the ABI of a
> healthy project and intentionally inflicting pain on our users.

Then please try harder. Gregory posted his initial patches in August,
it is December now and we only see artificial road-blocks instead of code
from KVM folks.

> And, at minimum, such kinds of things _have to_ be pointed out in pull
> requests, because it's like utterly important. In fact i couldnt list any more
> important thing to point out in a pull request.

I think that this part should be easily fixable..

--
Bartlomiej Zolnierkiewicz

2009-12-23 13:31:34

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 03:07 PM, Bartlomiej Zolnierkiewicz wrote:
>
>> That is a very different situation from the AlacrityVM patches, which:
>>
>> - Are a pure software concept and any compatibility mismatch is
>> self-inflicted. The patches are in fact breaking the ABI to KVM
>> intentionally (for better or worse).
>>
> Care to explain the 'breakage' and why KVM is more special in this regard
> than other parts of the kernel (where we don't keep any such requirements)?
>

The device model is exposed to the guest. If you change it, the guest
breaks.

So we have two options:
- phase out virtio, users don't see new improvements, ask them to
change to vbus/venet
- maintain the two in parallel

Neither appeals to me.

> Truth to be told KVM is just another driver/subsystem and Gregory's changes
> are only 4KLOC of clean and easily maintainable code..
>

This 4K is only the beginning. There are five more virtio drivers, plus
features in virtio-net not ported to venet, plus the host support, plus
qemu support, plus Windows drivers, plus adapters for non-pci (lguest
and s390), plus live migration support. vbus itself still has scaling
issues.

Virtio was under development for years. Sure you can focus on one
dimension only (performance) and get good results but real life is more
complicated.

> I certainly missed the time when KVM became officially part of core ABI..
>

It's more akin to the hardware interface. We don't change the hardware
underneath the guest.

>> Overlap and forking can still be done in special circumstances, when a project
>> splits and a hostile fork is inevitable due to prolongued and irreconcilable
>> differences between the parties and if there's no strong technical advantage
>> on either side. I havent seen evidence of this yet though: Gregory claims that
>> he wants to 'work with the community' and the KVM guys seem to agree violently
>> that performance can be improved - and are doing so (and are asking Gregory to
>> take part in that effort).
>>
> How it is different from any past forks?
>
> The odium of proving that the existing framework is sufficient was always on
> original authors or current maintainers.
>
> KVM guys were offered assistance from Gregory and had few months to prove that
> they can get the same kind of performance using existing architecture and they
> DID NOT do it.
>

Look at the results from Chris Wright's presentation. Hopefully in a
few days some results from vhost-net.

> Then please try harder. Gregory posted his initial patches in August,
> it is December now and we only see artificial road-blocks instead of code
> from KVM folks.
>

What artificial road blocks?

--
error compiling committee.c: too many arguments to function

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wednesday 23 December 2009 02:31:11 pm Avi Kivity wrote:
> On 12/23/2009 03:07 PM, Bartlomiej Zolnierkiewicz wrote:
> >
> >> That is a very different situation from the AlacrityVM patches, which:
> >>
> >> - Are a pure software concept and any compatibility mismatch is
> >> self-inflicted. The patches are in fact breaking the ABI to KVM
> >> intentionally (for better or worse).
> >>
> > Care to explain the 'breakage' and why KVM is more special in this regard
> > than other parts of the kernel (where we don't keep any such requirements)?
> >
>
> The device model is exposed to the guest. If you change it, the guest
> breaks.

Huh? Shouldn't non-vbus aware guests continue to work just fine?

> > I certainly missed the time when KVM became officially part of core ABI..
> >
>
> It's more akin to the hardware interface. We don't change the hardware
> underneath the guest.

As far as my limited understanding of things goes, vbus is completely opt-in,
so it is like adding new real hardware to the host. Where is the problem?

--
Bartlomiej Zolnierkiewicz

2009-12-23 14:29:11

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 04:08 PM, Bartlomiej Zolnierkiewicz wrote:
>
>> The device model is exposed to the guest. If you change it, the guest
>> breaks.
>>
> Huh? Shouldn't non-vbus aware guests continue to work just fine?
>

Sure. But we aren't merging this code in order not to use it. If we
switch development focus to vbus, we have to ask everyone who's riding
on virtio to switch. Alternatively we maintain both models.

If vbus was the only way to get this kind of performance, I know what
I'd choose. But if it isn't, why inflict the change on users?

Consider a pxe-booting guest (or virtio-blk vs. a future veblk). Is
switching drivers in initrd something you want your users to do? [1]

One of the advantages of virtualization is stable hardware. I don't
want to let it go without a very good reason.

[1] I remember the move from /dev/hda to /dev/sda a few years ago, it
isn't a good memory.

--
error compiling committee.c: too many arguments to function

2009-12-23 14:57:44

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 07:07 AM, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 23 December 2009 07:51:29 am Ingo Molnar wrote:

> KVM guys were offered assistance from Gregory and had few months to prove that
> they can get the same kind of performance using existing architecture and they
> DID NOT do it.

With all due respect, there is a huge misunderstanding that's underpinning
this thread, which is that vbus is absolutely more performant than
virtio-net and that we've failed to demonstrate that we can obtain the
same level of performance in virtio-net. This is simply untrue.

In fact, within a week or so of Greg's first posting of vbus, I posted a
proof of concept patch to the virtio-net backend that got equivalent
results. But I did not feel at the time that this was the right
solution to the problem and we've been trying to do something much
better. By the same token, I don't feel that vbus is the right approach
to solving the problem.

There are really three factors that affect networking performance in a
virtual environment: the number of copies of the data, the number of
exits required per packet transmitted, and the cost of each exit.

The "poor" packet latency of virtio-net is a result of the fact that we
do software timer based TX mitigation. We do this such that we can
decrease the number of exits per-packet and increase throughput. We set
a timer for 250ms and per-packet latency will be at least that much.

We have to use a timer for the userspace backend because the tun/tap
device is rather quick to queue a packet which means that we get no
feedback that we can use to trigger TX mitigation.
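
In rough, illustrative pseudo-C (all of these names are invented; this
is not qemu's actual code), the userspace flow is:

#include <stdbool.h>
#include <stddef.h>

struct backend;         /* opaque: tap fd, tx ring, timer, ... */

bool  backend_timer_armed(struct backend *be);
void  backend_arm_timer(struct backend *be);    /* fires after the mitigation delay */
void  backend_disarm_timer(struct backend *be);
void *backend_pop_tx(struct backend *be, size_t *len);
void  backend_send_to_tap(struct backend *be, void *buf, size_t len);

/* Guest kicked us: do not flush yet, just make sure the timer is armed.
 * This is the step that trades per-packet latency for fewer exits. */
static void on_guest_tx_notify(struct backend *be)
{
        if (!backend_timer_armed(be))
                backend_arm_timer(be);
}

/* Timer fired: flush everything that accumulated, in one batch. */
static void on_tx_timer(struct backend *be)
{
        size_t len;
        void *buf;

        while ((buf = backend_pop_tx(be, &len)) != NULL)
                backend_send_to_tap(be, buf, len);

        backend_disarm_timer(be);
}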

vbus works around this by introducing a transmit and receive thread and
relies on the time it takes to schedule those threads to do TX
mitigation. The version of KVM in RHEL5.4 does the same thing. How
effective this is depends on a lot of factors including the overall
system load, the default time slice length, etc.

This tends to look really good when you're trying to drive line speed
but it absolutely sucks when you're looking at the CPU cost of low
packet rates. IOW, this is a heuristic that looks really good when
doing netperf TCP_RR and TCP_STREAM, but it starts to look really bad
when doing things like partial load CPU usage comparisons with other
hypervisors.

vhost-net takes a different, IMHO superior, approach in that it
associates with some type of network device (tun/tap or physical device)
and uses the device's transmit interface to determine how to mitigate
packets. This means that we can potentially get to the point where
instead of relying on short timeouts to do TX mitigation, we can use the
underlying physical device's packet processing state which will provide
better results in most circumstances.

N.B. using a separate thread for transmit mitigation looks really good
on benchmarks because when doing a simple ping test, you'll see very
short latencies because you're not batching at all. It's somewhat
artificial in this regard.

With respect to number of copies, vbus up until recently had the same
number of copies as virtio-net. Greg has been working on zero-copy
transmit, which is great stuff, but Rusty Russell had done the same
thing with virtio-net and tun/tap. There are some hidden nasties when
using skb destructors to achieve this and I think the feeling was this
wasn't going to work. Hopefully, Greg has better luck but suffice to
say, we've definitely demonstrated this before with virtio-net. If the
issues around skb destruction can be resolved, we can incorporate this
into tun/tap (and therefore, use it in virtio) very easily.
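
Roughly, the skb-destructor trick looks like the sketch below; the helper
name is made up and this is not the actual virtio-net or vbus patch, just
the shape of the idea:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: unpin the guest pages, post a tx-complete event. */
extern void guest_tx_complete(struct sk_buff *skb);

static void zc_skb_destructor(struct sk_buff *skb)
{
        /* Runs when the skb is finally freed (e.g. after the physical
         * NIC's DMA completed).  Only now is it safe to release the
         * guest buffer and tell the guest its descriptor is done. */
        guest_tx_complete(skb);
}

static int zc_xmit(struct sk_buff *skb)
{
        /* The skb frags point straight at pinned guest pages, so no
         * copy happened on the way here.  The "hidden nasties": the
         * skb may be cloned, or skb_orphan() may clear ->destructor
         * long before the data has really been consumed. */
        skb->destructor = zc_skb_destructor;
        return dev_queue_xmit(skb);
}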

In terms of the cost per exit, the main advantage vbus had over
virtio-net was that virtio-net's userspace backend was in userspace
which required a heavy-weight exit which is a few times more expensive
than a lightweight exit. We've addressed this with vhost-net which
implements the backend in the kernel. Originally, vbus was able to do
edge triggered interrupts whereas virtio-pci was using level triggered
interrupts. We've since implemented MSI-X support (already merged
upstream) and we now can also do edge triggered interrupts with virtio.

The only remaining difference is the fact that vbus can mitigate exits
due to EOI's in the virtual APIC because it relies on a paravirtual
interrupt controller.

This is rather controversial for a few reasons. The first is that there
is absolutely no way that a paravirtual interrupt controller would work
for Windows, older Linux guests, or probably any non-Linux guest. As a
design point, this is a big problem for KVM. We've seen the struggle
with this sort of thing with Xen. The second is that it's very likely
that this problem will go away on its own either because we'll rely on
x2apic (which will eventually work with Windows) or we'll see better
hardware support for eoi shadowing (there is already hardware support
for tpr shadowing). Most importantly though, it's unclear how much EOI
mitigation actually matters. Since we don't know how much of a win this
is, we have no way of evaluating whether it's even worth doing in the
first place.

At any rate, a paravirtual interrupt controller is entirely orthogonal
to a paravirtual IO model. You could use a paravirtual interrupt
controller with virtio and KVM as well as you could use it with vbus.
In fact, if that bit was split out of vbus and considered separately,
then I don't think there would be objections to it in principle
(although Avi has some scalability concerns with the current
implementation).

vbus also uses hypercalls instead of PIO. I think we've established
pretty concretely that the two are almost identical though from a
performance perspective. We could easily use hypercalls with virtio-pci
but our understanding is that the difference in performance would be
lost in the noise.
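
To make the comparison concrete, the two guest-side notification ("kick")
paths look roughly like this; the hypercall number below is made up for the
sketch:

#include <linux/io.h>
#include <linux/virtio_pci.h>
#include <asm/kvm_para.h>

/* Legacy virtio-pci kick: a single 16-bit PIO write to the notify
 * register in the device's I/O BAR.  One PIO exit per notification. */
static void kick_pio(void __iomem *ioaddr, u16 queue_index)
{
        iowrite16(queue_index, ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}

/* vbus-style kick: a hypercall carrying the shm-signal id.
 * HC_SHMSIGNAL is a made-up number for this sketch.  Either way the
 * guest takes one VM exit of roughly the same cost. */
#define HC_SHMSIGNAL 42

static void kick_hypercall(unsigned long signal_id)
{
        kvm_hypercall1(HC_SHMSIGNAL, signal_id);
}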

Then there's an awful lot of other things that vbus does differently
but AFAICT, none of them have any impact on performance whatsoever. The
shared memory abstraction is at a different level. virtio models
something of a bulk memory transfer API whereas vbus models a shared
memory API. Bulk memory transfer was chosen for virtio in order to
support hypervisors like Xen that aren't capable of doing robust shared
memory and instead rely on either page flipping or a fixed sharing pool
that often requires copying into or out of that pool.

vbus has a very different discovery mechanism that is more akin to Xen's
paravirtual I/O mechanism. virtio has no baked-in concept of discovery,
although we most commonly piggyback off of PCI for discovery. The way
devices are created and managed is very different in vbus. vbus also
has some provisions in it to support non-virtualized environments. I
think virtio is fundamentally capable of that but it's not a design
point for virtio.

We could take any of these other differences, and have a discussion about
whether it makes sense to introduce such a thing in virtio or what the
use cases are for that. I don't think Greg is really interested in
that. I think he wants all of vbus or nothing at all. I don't see the
point of having multiple I/O models supported in upstream Linux though
or in upstream KVM. It's bad for users and it splits development effort.

Greg, if there are other things that you think come into play with
respect to performance, please do speak up. This is the best that
"google" is able to answer my questions ;-)

Regards,

Anthony Liguori

2009-12-23 15:04:09

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 06:14 AM, Andi Kleen wrote:
>> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>>
>> See slide 32. This is without vhost-net.
>
> Thanks. Do you also have latency numbers?

They'll be along the lines of the vbus numbers.

But I caution people from relying too much on just netperf TCP_RR and
TCP_STREAM numbers. There's a lot of heuristics in play in getting this
sort of numbers. They really aren't good ways to compare different drivers.

A better thing to do is look more deeply at the architectures and
consider things like the amount of copying imposed, the cost of
processing an exit, and the mechanisms for batching packet transmissions.

The real argument that vbus needs to make IMHO is not "look how much
better my netperf TCP_STREAM results are" but "we can eliminate N copies
from the transmit path and virtio-net cannot" or "we require N exits to
handle a submission and virtio-net requires N+M".

It's too easy to tweak for benchmarks.

Regards,

Anthony Liguori

2009-12-23 15:09:31

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 12:15 AM, Kyle Moffett wrote:
> This is actually something that is of particular interest to me. I
> have a few prototype boards right now with programmable PCI-E
> host/device links on them; one of my long-term plans is to finagle
> vbus into providing multiple "virtual" devices across that single
> PCI-E interface.
>
> Specifically, I want to be able to provide virtual NIC(s), serial
> ports and serial consoles, virtual block storage, and possibly other
> kinds of interfaces. My big problem with existing virtio right now
> (although I would be happy to be proven wrong) is that it seems to
> need some sort of out-of-band communication channel for setting up
> devices, not to mention it seems to need one PCI device per virtual
> device.

We've been thinking about doing a virtio-over-IP mechanism such that you
could remote the entire virtio bus to a separate physical machine.
virtio-over-IB is probably more interesting since you can make use of
RDMA. virtio-over-PCI-e would work just as well.

virtio is a layered architecture. Device enumeration/discovery sits at
a lower level than the actual device ABIs. The device ABIs are
implemented on top of a bulk data transfer API. The reason for this
layering is so that we can reuse PCI as an enumeration/discovery
mechanism. This tremendously simplifies porting drivers to other OSes
and lets us use PCI hotplug automatically. We get integration into all
the fancy userspace hotplug support for free.

But both virtio-lguest and virtio-s390 use in-band enumeration and
discovery since they do not have support for PCI on either platform.
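
To illustrate the layering, a guest driver never sees the transport at all;
it just matches on a device id and gets probed by whichever bus did the
discovery.  A bare (non-functional) skeleton:

#include <linux/module.h>
#include <linux/virtio.h>
#include <linux/virtio_ids.h>

static struct virtio_device_id skel_id_table[] = {
        { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
        { 0 },
};

static int skel_probe(struct virtio_device *vdev)
{
        /* find_vqs(), feature negotiation, netdev registration ... */
        return 0;
}

static void skel_remove(struct virtio_device *vdev)
{
        /* tear down the vqs, unregister the netdev ... */
}

static struct virtio_driver skel_driver = {
        .driver.name  = "virtio-skel",
        .driver.owner = THIS_MODULE,
        .id_table     = skel_id_table,
        .probe        = skel_probe,
        .remove       = skel_remove,
};

static int __init skel_init(void)
{
        return register_virtio_driver(&skel_driver);
}
module_init(skel_init);

static void __exit skel_exit(void)
{
        unregister_virtio_driver(&skel_driver);
}
module_exit(skel_exit);

MODULE_LICENSE("GPL");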

Regards,

Anthony Liguori

2009-12-23 15:12:33

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 06:02 PM, Chris Wright wrote:
> * Anthony Liguori ([email protected]) wrote:
>> The
>> virtio-net setup probably made extensive use of pinning and other tricks
>> to make things faster than a normal user would see them. It ends up
>> creating a perfect combination of batching which is pretty much just
>> cooking the mitigation schemes to do extremely well for one benchmark.
>
> Just pinning, the rest is stock virtio features like mergeable rx buffers,
> GRO, GSO (tx mitigation is actually disabled).

Technically, tx mitigation isn't disabled. The heuristic is changed
such that instead of relying on a fixed timer, tx notification is
disabled until you can switch to another thread and process packets.

The effect is that depending on time slice length and system load, you
adaptively enable tx mitigation. It's heavily dependent on the
particulars of the system and the overall load.
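
In ring terms, the thread-based scheme boils down to toggling the no-notify
flag while the host thread is running.  A rough sketch only (not the actual
vhost-net or venet-tap code; memory barriers and the drain loop omitted):

#include <linux/virtio_ring.h>

static void tx_thread_work(struct vring *vr)
{
        /* Tell the guest not to kick again while we are running. */
        vr->used->flags |= VRING_USED_F_NO_NOTIFY;

        /* drain the avail ring into the tap/physical device ... */

        /* Re-enable notifications; recheck for anything that raced in. */
        vr->used->flags &= ~VRING_USED_F_NO_NOTIFY;
}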

For instance, this mitigation scheme looks great at high throughputs but
looks very bad at mid-to-low throughputs compared to timer based
mitigation (at least, when comparing CPU cost).

Regards,

Anthony Liguori

2009-12-23 15:18:31

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Anthony Liguori ([email protected]) wrote:
> On 12/22/2009 06:02 PM, Chris Wright wrote:
>> * Anthony Liguori ([email protected]) wrote:
>>> The
>>> virtio-net setup probably made extensive use of pinning and other tricks
>>> to make things faster than a normal user would see them. It ends up
>>> creating a perfect combination of batching which is pretty much just
>>> cooking the mitigation schemes to do extremely well for one benchmark.
>>
>> Just pinning, the rest is stock virtio features like mergeable rx buffers,
>> GRO, GSO (tx mitigation is actually disabled).
>
> Technically, tx mitigation isn't disabled. The heuristic is changed
> such that instead of relying on a fixed timer, tx notification is
> disabled until you can switch to another thread and process packets.
>
> The effect is that depending on time slice length and system load, you
> adaptively enable tx mitigation. It's heavily dependent on the
> particulars of the system and the overall load.
>
> For instance, this mitigation scheme looks great at high throughputs but
> looks very bad at mid-to-low throughputs compared to timer based
> mitigation (at least, when comparing CPU cost).

Yep, you're right.

thanks,
-chris

2009-12-23 16:44:49

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 1:51 AM, Ingo Molnar wrote:
>
> * Anthony Liguori <[email protected]> wrote:
>
>> On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
>>>> new e1000 driver is more superior in architecture and do the required
>>>> work to make the new e1000 driver a full replacement for the old one.
>>> Right, like everyone actually does things this way..
>>>
>>> I wonder why do we have OSS, old Firewire and IDE stacks still around then?
>>
>> And it's always a source of pain, isn't it.
>
> Even putting aside the fact that such overlap sucks and is a pain to users
> (and that 98% of driver and subsystem version transitions are done completely
> seemlessly to users - the examples that were cited were the odd ones out of
> 150K commits in the past 4 years, 149K+ of which are seemless), the comparison
> does not even apply really.
>
> e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not
> fully understood externality, with its inevitable set of compatibility voes.
> There's often situations where one piece of hardware still works better with
> the old driver, for some odd (or not so odd) reason.
>
> Also, note that the 'new' hw drivers are generally intended and are maintained
> as clear replacements for the old ones, and do so with minimal ABI changes -
> or preferably with no ABI changes at all. Most driver developers just switch
> from old to new and the old bits are left around and are phased out. We phased
> out old OSS recently.
>
> That is a very different situation from the AlacrityVM patches, which:
>
> - Are a pure software concept

By design. In fact, I would describe it as "software to software
optimized" as opposed to trying to shoehorn into something that was
designed as a software-to-hardware interface (and therefore has
assumptions about the constraints in that environment that are not
applicable in software-only).

> and any compatibility mismatch is self-inflicted.

.. because the old model is not great for the intended use cases and has
issues. I've already covered the reasons why ad nauseam.

> The patches are in fact breaking the ABI to KVM intentionally (for better or worse).

No, at the very worst they are _augmenting_ the ABI, as evident from the
fact that AlacrityVM is a superset of the entire KVM ABI.

>
> - Gregory claims that the AlacricityVM patches are not a replacement for KVM.
> I.e. there's no intention to phase out any 'old stuff'

There's no reason to phase anything out, except perhaps the virtio-pci
transport. This is one more transport, plugging into virtio underneath
(just like virtio-pci, virtio-lguest, and virtio-s390). I am not even
suggesting that the old transport has to go away, per se. It is the KVM
maintainers who insist on it being all or nothing. For me, I do not see
the big deal in having one more "model" option in the qemu cmd-line, but
that is just my opinion. If the maintainers were really so adamant that
choice is pure evil, I am not sure why we don't see patches for removing
everything but one model type in each IO category. But I digress.

> and it splits the pool of driver developers.

..it is these dumb threads that are splitting driver developers with
ignorant statements, irrelevant numbers, and dubious "facts". I
actually tried many many times to ask others to join the effort, and
instead _they_ forked off and made vhost-net with a "sorry, not
interested in working with you"(and based the design largely on the
ideas proposed in my framework, I might add). Thats fine, and it's
their prerogative. I can easily maintain my project out of tree if
upstream is not interested. But do not turn around and try to blame me
for the current state of affairs.

>
> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
> is, if AlacricityVM is better, and KVM developers are not willing to fix their
> stuff, replace KVM with it.
>
> It's a bit as if someone found a performance problem with sys_open() and came
> up with sys_open_v2() and claimed that he wants to work with the VFS
> developers while not really doing so but advances sys_open_v2() all the time.

No, it's more like if I suggested sys_open_vbus() to augment
sys_open_pci(), sys_open_lguest(), sys_open_s390() because our
fictitious glibc natively supported modular open() implementations. And
I tried to work for a very long time convincing the VFS developers that
this new transport was a good idea to add because it was optimized for
the problem space, made it easy for API callers to reuse the important
elements of the design (e.g. built in "tx-mitigation" instead of waiting
for someone to write it for each device), had new features like the
ability to prioritize signals, create/route signal paths arbitrarily,
implement raw shared memory for idempotent updates, and didn't require
the massive and useless PCI/APIC emulation logic to function like
sys_open_pci() (e.g. easier to port around).

Ultimately, the "VFS developers" said "I know we let other transports in
in the past, but now all transports must be sys_open_pci() based going
forward". Game over, since sys_open_pci cannot support the features I
need, and/or it makes incredibly easy things complex when they don't
need to be, so it's a poor choice.

>
> Do we allow sys_open_v2() upstream, in the name of openness and diversity,
> letting some apps use that syscall while other apps still use sys_open()?

s/sys_open_v2/sys_open_vbus to portray it accurately, and sure, why not?
There is plenty of precedent already. It's just the top-edge IO ABI.
You can choose realtek, e1000, virtio-net 802.x ABIs today for instance.
This is one more, and despite attempts at painting it duplicative, it
is indeed an evolutionary upgrade IMO especially when you glance beyond
the 802.x world and look at the actual device model presented.

And its moot, anyway, as I have already retracted my one outstanding
pull request based on Linus' observation. So at this time, I am not
advocating _anything_ for upstream inclusion. And I am contemplating
_never_ doing so again. It's not worth _this_.

> Or do we say "enough is enough of this stupidity,

I certainly agree that this thread has indeed introduced a significant
degree of stupidity, yes.

> come up with some strong
> reasons to replace sys_open, and if so, replace the thing and be done with the
> pain!".
>

I am open to this, but powerless to control the decision in the upstream
variant other than to describe what I did, and rebut FUD against it to
make sure the record is straight.


> Overlap and forking can still be done in special circumstances, when a project
> splits and a hostile fork is inevitable due to prolongued and irreconcilable
> differences between the parties

You are certainly a contributing factor in pushing things in that direction.

> and if there's no strong technical advantage
> on either side. I havent seen evidence of this yet though: Gregory
claims that
> he wants to 'work with the community'

Well, I sincerely did in the beginning in the spirit of FOSS. I have to
admit that the desire is constantly eroded after dealing with the likes
of you. So if I have seemed more standoffish as of late, that is the
reason. If that was your goal, congratulations: You have irritated me
into submission. And no, I don't expect you to care.

> and the KVM guys seem to agree violently
> that performance can be improved - and are doing so (and are asking Gregory to
> take part in that effort).

And as I indicated to you in my first reply to this thread: the
performance aspects are but one goal here. Some of the performance
aspects cannot be achieved with their approach (like EOI mitigation as
an example), and some of the other feature based aspects cannot be
achieved either (interrupt priority, dynamic signals, etc). That is why
the calls to unify behind virtio-pci have gone unanswered by me: That
approach is orthogonal to the vbus project goals. Their ability to
understand or agree with that difference has no bearing on whether there
is any technical merit here. I think this is what you are failing to grasp.

There will be people that will say "Well, we can do a PV-APIC and get
EOI mitigation in PCI too". THAT IS WHAT VBUS IS FOR!!! Implementing
linux-kernel backed, shared-memory, high performance devices. Something
like a shared-memory based interrupt controller would be exactly the
kind of thing I envision here. We can also do other things, like high
performance timers, scheduler coordinators, etc. I don't know how many
different ways to describe it in a way that will be understood. I
started with 802.x because its easy to show the performance gains. If I
knew that the entire community would get bent around the axle on 802.x
when I started, I never would have broached the subject like this. C'est
la vie.

>
> The main difference is that Gregory claims that improved performance is not
> possible within the existing KVM framework, while the KVM developers disagree.
> The good news is that this is a hard, testable fact.

Yes, I encourage a bakeoff and have code/instructions available to
anyone interested. I also encourage people to think about the other
facilities that are being introduced in addition to performance
enhancements for simple 802.x, or even KVM. This is about building a
modular framework that encompasses both sides of the links (guest AND
host), and implements "best practices" for optimized PV IO ingrained in
its DNA. It tries to do this in such a way that we don't need to write
new backends for every environment that comes along, or rely on
unnecessary emulation layers (PCI/APIC) to achieve it. It's about
extending Linux as an "io-visor", much as it is for userspace apps, for any
environment, using a tried and true shared-memory based approach.

>
> I think we should try _much_ harder before giving up and forking the ABI of a
> healthy project and intentionally inflicting pain on our users.
>
> And, at minimum, such kinds of things _have to_ be pointed out in pull
> requests, because it's like utterly important. In fact i couldnt list any more
> important thing to point out in a pull request.

Mea Culpa. Since I've already established that the pull request didn't
directly relate to the controversy, I didn't think to mention that at
the time. These were just a few more drivers to join the ranks of 1000s
more within Linux. In retrospect, I probably should have so I apologize
for that. It was my first pull request to Linus, so I was bound to
screw something up.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 17:00:59

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Avi Kivity ([email protected]) wrote:
> On 12/23/2009 02:14 PM, Andi Kleen wrote:
> >>http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
> >>
> >>See slide 32. This is without vhost-net.
> >Thanks. Do you also have latency numbers?
>
> No. Copying Chris. This was with the tx mitigation timer disabled,
> so you won't see the usual atrocious userspace virtio latencies, but
> it won't be as good as a host kernel implementation since we take a
> heavyweight exit and qemu is pretty unoptimized.

Those numbers don't show cpu cycles per packet nor do they show latencies.
You won't see the timer based latency, because the tx mitigation scheme
is not timer based for those numbers. Below are some numbers comparing
bare metal, an assigned device, and virtio (not vhost-net, so we are still
doing a heavy-weight exit to qemu and syscalls to deliver to tap device).

> >It seems like there's definitely still potential for improvement
> >with messages<4K. But for the large messages they indeed
> >look rather good.

You are misreading the graph. At <4K it is tracking bare metal (the
green and yellow lines are bare metal, the red and blue bars are virtio).
At >4k we start to drop off (esp. on RX).

This (slide 9) shows AMQP latencies for bare metal, an assigned device,
and virtio.
http://www.redhat.com/f/pdf/summit/bche_320_red_hat_enterprise_mrg.pdf

Similarly, here's some much rawer latency numbers from netpipe, all done
in usecs.

                 bare metal     assigned PCI NIC     virtio
                 (usecs)        (usecs)              (usecs)
                 ----------     ----------------     ------
1 bytes 22.20 36.16 53.19
2 bytes 22.21 35.98 53.23
3 bytes 22.22 36.18 53.29
4 bytes 22.25 33.77 53.43
6 bytes 22.33 36.33 53.48
8 bytes 22.32 36.24 53.27
12 bytes 22.25 35.97 53.33
13 bytes 22.40 35.94 53.54
16 bytes 22.36 35.98 53.60
19 bytes 22.40 35.95 53.51
21 bytes 22.42 35.94 53.76
24 bytes 22.32 36.18 53.45
27 bytes 22.34 36.08 53.48
29 bytes 22.36 36.02 53.42
32 bytes 22.46 36.15 53.23
35 bytes 22.36 36.23 53.13
45 bytes 26.32 36.17 53.29
48 bytes 26.24 35.94 53.50
51 bytes 26.44 36.01 53.66
61 bytes 26.43 33.66 53.28
64 bytes 26.66 36.32 53.17
67 bytes 26.35 36.21 53.53
93 bytes 26.59 36.49 45.75
96 bytes 26.48 36.28 45.72
99 bytes 26.51 36.47 45.72
125 bytes 26.74 36.48 45.99
128 bytes 26.44 36.52 45.69
131 bytes 26.52 35.71 45.80
189 bytes 26.77 36.99 46.78
192 bytes 26.96 37.45 47.00
195 bytes 26.96 37.45 47.10
253 bytes 27.01 38.03 47.36
256 bytes 27.09 37.85 47.23
259 bytes 26.98 37.82 47.28
381 bytes 26.61 38.38 47.84
384 bytes 26.72 38.54 48.01
387 bytes 26.76 38.65 47.80
509 bytes 25.13 39.19 48.30
512 bytes 25.13 36.69 56.05
515 bytes 25.15 37.42 55.70
765 bytes 25.29 40.31 57.26
768 bytes 25.25 39.76 57.32
771 bytes 25.26 40.33 57.06
1021 bytes 49.27 57.00 63.73
1024 bytes 49.33 57.09 63.70
1027 bytes 49.07 57.25 63.70
1533 bytes 50.11 58.98 70.57
1536 bytes 50.09 59.30 70.22
1539 bytes 50.18 59.27 70.35
2045 bytes 50.44 59.42 74.31
2048 bytes 50.33 59.29 75.31
2051 bytes 50.32 59.14 74.02
3069 bytes 62.71 64.20 96.87
3072 bytes 62.78 64.94 96.84
3075 bytes 62.83 65.13 96.62
4093 bytes 62.56 64.78 99.63
4096 bytes 62.46 65.04 99.54
4099 bytes 62.47 65.87 99.65
6141 bytes 63.35 65.39 104.03
6144 bytes 63.59 66.16 104.66
6147 bytes 63.74 66.04 104.61
8189 bytes 63.65 66.52 107.75
8192 bytes 63.64 66.71 108.17
8195 bytes 63.66 67.08 109.11
12285 bytes 63.26 84.58 114.13
12288 bytes 63.28 85.38 114.55
12291 bytes 63.22 83.71 114.40
16381 bytes 62.87 98.19 120.48
16384 bytes 63.12 97.96 122.19
16387 bytes 63.48 98.48 121.68
24573 bytes 93.26 108.93 152.67
24576 bytes 94.40 109.42 152.14
24579 bytes 93.37 108.86 153.51
32765 bytes 102.84 115.46 169.04
32768 bytes 100.01 114.62 166.19
32771 bytes 102.61 115.97 167.96
49149 bytes 125.46 144.78 209.99
49152 bytes 123.76 139.70 187.17
49155 bytes 125.13 137.97 185.44

2009-12-23 17:10:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> And its moot, anyway, as I have already retracted my one outstanding
> pull request based on Linus' observation. So at this time, I am not
> advocating _anything_ for upstream inclusion. And I am contemplating
> _never_ doing so again. It's not worth _this_.

That certainly sounds like the wrong reaction. Out of tree drivers
are typically a pain to use.

And upstream submission is not always like this!

-Andi

2009-12-23 17:17:56

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 12:10 PM, Andi Kleen wrote:
>> And its moot, anyway, as I have already retracted my one outstanding
>> pull request based on Linus' observation. So at this time, I am not
>> advocating _anything_ for upstream inclusion. And I am contemplating
>> _never_ doing so again. It's not worth _this_.
>
> That certainly sounds like the wrong reaction. Out of tree drivers
> are typically a pain to use.

Well, to Linus' point, it shouldn't go in until a critical mass of users
have expressed desire to see it in tree, which seems reasonable to me.
For the admittedly small group that are using it today, modern tools
like the opensuse-build-service ease the deployment as a KMP, so that
can suffice for now. Its actually what most of the alacrityvm community
uses today anyway (as opposed to using a merged tree in the guest)

>
> And upstream submission is not always like this!

I would think the process would come to a grinding halt if it were ;)

Thanks Andi,
-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 17:20:08

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> > >It seems like there's definitely still potential for improvement
> > >with messages<4K. But for the large messages they indeed
> > >look rather good.
>
> You are misreading the graph. At <4K it is tracking bare metal (the
> green and yellow lines are bare metal, the red and blue bars are virtio).
> At >4k we start to drop off (esp. on RX).

I see. True.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 17:30:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33



On Wed, 23 Dec 2009, Gregory Haskins wrote:
> >
> > And upstream submission is not always like this!
>
> I would think the process would come to a grinding halt if it were ;)

Well, in all honesty, if it had been non-virtualized drivers I would just
have pulled. The pull request all looked sane, the diffstat looked clean
and non-intrusive, and I had no problems with any of that.

But the virtualization people always argue about the fifty-eleven
different ways of doing things, and unlike real drivers - where the actual
hardware places constraints on what the heck is going on - virtualization
people seem to revel in making new interfaces weekly, and tend to be only
incidentally limited by hardware (ie hardware interfaces may limit some
_details_, but seldom any higher-level arguments).

So when I see another virtualization interface, I want the virtualization
people to just argue it out amongst themselves. Thanks to the virtue of me
personally not caring one whit about virtualization, I can stand back and
just watch the fireworks.

Which is not to say that I enjoy it (I like the occasional flame-fest, but
in order to like them I need to _care_ enough to get fired up about
them!).

So I just don't want the in-fighting to take place in my tree, so I'd
rather see the fighting die out _before_ I actually pull.

You people are all crazy.

Linus

2009-12-23 17:33:45

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 12:17:48PM -0500, Gregory Haskins wrote:
> On 12/23/09 12:10 PM, Andi Kleen wrote:
> >> And its moot, anyway, as I have already retracted my one outstanding
> >> pull request based on Linus' observation. So at this time, I am not
> >> advocating _anything_ for upstream inclusion. And I am contemplating
> >> _never_ doing so again. It's not worth _this_.
> >
> > That certainly sounds like the wrong reaction. Out of tree drivers
> > are typically a pain to use.
>
> Well, to Linus' point, it shouldn't go in until a critical mass of users
> have expressed desire to see it in tree, which seems reasonable to me.
> For the admittedly small group that are using it today, modern tools
> like the opensuse-build-service ease the deployment as a KMP, so that
> can suffice for now. Its actually what most of the alacrityvm community
> uses today anyway (as opposed to using a merged tree in the guest)

It would be probably also good to have some more exhaustive data
showing any performance improvements.

Your numbers are very hard to compare to Chris' numbers and not
as comprehensive (e.g. no latencies)

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 17:34:51

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 1:15 AM, Kyle Moffett wrote:
> On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
> <[email protected]> wrote:
>> On 12/22/09 2:57 AM, Ingo Molnar wrote:
>>> * Gregory Haskins <[email protected]> wrote:
>>>> Actually, these patches have nothing to do with the KVM folks. [...]
>>>
>>> That claim is curious to me - the AlacrityVM host
>>
>> It's quite simple, really. These drivers support accessing vbus, and
>> vbus is hypervisor agnostic. In fact, vbus isn't necessarily even
>> hypervisor related. It may be used anywhere where a Linux kernel is the
>> "io backend", which includes hypervisors like AlacrityVM, but also
>> userspace apps, and interconnected physical systems as well.
>>
>> The vbus-core on the backend, and the drivers on the frontend operate
>> completely independent of the underlying hypervisor. A glue piece
>> called a "connector" ties them together, and any "hypervisor" specific
>> details are encapsulated in the connector module. In this case, the
>> connector surfaces to the guest side as a pci-bridge, so even that is
>> not hypervisor specific per se. It will work with any pci-bridge that
>> exposes a compatible ABI, which conceivably could be actual hardware.
>
> This is actually something that is of particular interest to me. I
> have a few prototype boards right now with programmable PCI-E
> host/device links on them; one of my long-term plans is to finagle
> vbus into providing multiple "virtual" devices across that single
> PCI-E interface.
>
> Specifically, I want to be able to provide virtual NIC(s), serial
> ports and serial consoles, virtual block storage, and possibly other
> kinds of interfaces. My big problem with existing virtio right now
> (although I would be happy to be proven wrong) is that it seems to
> need some sort of out-of-band communication channel for setting up
> devices, not to mention it seems to need one PCI device per virtual
> device.
>
> So I would love to be able to port something like vbus to my nify PCI
> hardware and write some backend drivers... then my PCI-E connected
> systems would dynamically provide a list of highly-efficient "virtual"
> devices to each other, with only one 4-lane PCI-E bus.

Hi Kyle,

We indeed have others that are doing something similar. I have CC'd Ira
who may be able to provide you more details. I would also point you at
the canonical example for what you would need to write to tie your
systems together. Its the "null connector", which you can find here:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/connectors/null.c;h=b6d16cb68b7e49e07528278bc9f5b73e1dac0c2f;hb=HEAD

Do not hesitate to ask any questions, though you may want to take the
conversation to the alacrityvm-devel list as to not annoy the current CC
list any further than I already have ;)

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 18:12:33

by Peter W. Morreale

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, 2009-12-23 at 13:14 +0100, Andi Kleen wrote:
> > http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
> >
> > See slide 32. This is without vhost-net.
>
> Thanks. Do you also have latency numbers?
>
> It seems like there's definitely still potential for improvement
> with messages <4K. But for the large messages they indeed
> look rather good.
>
> It's unclear what message size the Alacrity numbers used, but I presume
> it was rather large.
>

No. It was 1500. Please see:

http://developer.novell.com/wiki/index.php/AlacrityVM/Results


Best,
-PWM


> -Andi

2009-12-23 18:15:22

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 5:22 AM, Avi Kivity wrote:

>
> There was no attempt by Gregory to improve virtio-net.

If you truly do not understand why your statement is utterly wrong at
this point in the discussion, I feel sorry for you. If you are trying
to be purposely disingenuous, you should be ashamed of yourself. In any
case, your statement is demonstrably bogus, but you should already know
this given that we talked about at least several times.

To refresh your memory: http://patchwork.kernel.org/patch/17428/

In case its not blatantly clear, which I would hope it would be to
anyone that understands the problem space: What that patch would do is
allow an unmodified virtio-net to bridge to a vbus based virtio-net
backend. (Also note that this predates vhost-net by months (the date in
that thread is 4/9/2009) in case you are next going to try to argue that
it does nothing over vhost-net).

This would mean that virtio-net would gain most of the benefits I have
been advocating (fewer exits, cheaper exits, concurrent execution, etc).
So this would very much improve virtio-net indeed, given how poorly the
current backend was performing. I tried to convince the team to help me
build it out to completion on multiple occasions, but that request was
answered with "sorry, we are doing our own thing instead". You can say
that you didn't like my approach, since that is a subjective opinion.
But to say that I didn't attempt to improve it is flat out wrong, and
I do not appreciate it.

-Greg




Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 18:17:35

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 12:52 PM, Peter W. Morreale wrote:
> On Wed, 2009-12-23 at 13:14 +0100, Andi Kleen wrote:
>>> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>>>
>>> See slide 32. This is without vhost-net.
>>
>> Thanks. Do you also have latency numbers?
>>
>> It seems like there's definitely still potential for improvement
>> with messages <4K. But for the large messages they indeed
>> look rather good.
>>
>> It's unclear what message size the Alacrity numbers used, but I presume
>> it was rather large.
>>
>
> No. It was 1500. Please see:
>
> http://developer.novell.com/wiki/index.php/AlacrityVM/Results
>

Note: 1500 was the L2 MTU, not necessarily the L3/L4 size which was
probably much larger (though I do not recall what exactly atm).

-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 18:23:39

by Chris Wright

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Peter W. Morreale ([email protected]) wrote:
> On Wed, 2009-12-23 at 13:14 +0100, Andi Kleen wrote:
> > > http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
> > >
> > > See slide 32. This is without vhost-net.
> >
> > Thanks. Do you also have latency numbers?
> >
> > It seems like there's definitely still potential for improvement
> > with messages <4K. But for the large messages they indeed
> > look rather good.
> >
> > It's unclear what message size the Alacrity numbers used, but I presume
> > it was rather large.
>
> No. It was 1500. Please see:
>
> http://developer.novell.com/wiki/index.php/AlacrityVM/Results

That's just MTU. Not the message size. We can infer the message size by
the bare metal results (reasonably large), but is helpful to record that.

thanks,
-chris

2009-12-23 18:34:53

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Anthony Liguori ([email protected]) wrote:
> The "poor" packet latency of virtio-net is a result of the fact that we
> do software timer based TX mitigation. We do this such that we can
> decrease the number of exits per-packet and increase throughput. We set
> a timer for 250ms and per-packet latency will be at least that much.

Actually that's 150us ;-) It's the AlacrityVM numbers that show 250us
(note micro, not milli) for latency. That makes sense, shave off 150us
for the timer and you're left w/ 100us, which is not substantially
slower than what we see (for that bare metal latency we see ~60us)
when we switched tx mitigation schemes from timer based to thread
scheduling. Quite similar to the 56.8us that vbus/venet shows.

thanks,
-chris

2009-12-23 18:52:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Andi Kleen <[email protected]> wrote:

> > - Are a pure software concept and any compatibility mismatch is
> > self-inflicted. The patches are in fact breaking the ABI to KVM
>
> In practice, especially considering older kernel releases, VMs behave like
> hardware, with all its quirks, compatibility requirements, sometimes not
> fully understood, etc.

I stopped reading your reply here. That's not actually fully true of KVM, at
all.

Virtualization isn't voodoo magic with some hidden sauce in some magic hardware
component that no-one can understand fully. This isn't some mystic hardware
vendor coming up with some code and going away in the next quarter, with
barely anything documented and thousands of users left with hardware
components which we need to support under Linux somehow.

This is Linux virtualization, where _both_ the host and the guest source code
is fully known, and bugs (if any) can be found with a high degree of
determinism. This is Linux where the players don't just vanish overnight, and
are expected to do a proper job.

Yes, there's (obviously) compatibility requirements and artifacts and past
mistakes (as with any software interface), but you need to admit it to
yourself that your "virtualization is sloppy just like hardware" claim is just
a cheap excuse to not do a proper job of interface engineering.

Thanks,

Ingo

2009-12-23 19:27:14

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

Ingo Molnar <[email protected]> writes:

> Yes, there's (obviously) compatibility requirements and artifacts and past
> mistakes (as with any software interface), but you need to admit it to

Yes that's exactly what I meant.

> yourself that your "virtualization is sloppy just like hardware" claim is just

In my experience hardware is a lot less sloppy than software.
Imagine your latest CPU had as many regressions as 2.6.32 @)

I wish software and even VMs were as good.

> a cheap excuse to not do a proper job of interface engineering.

Past mistakes cannot be easily fixed. And undoubtedly even the new
shiny interfaces will have bugs and problems. Also the behaviour is
often not completely understood. Maybe it can be easier debugged with
fully available source, but even then it's hard to fix the old
software (or rather even if you can fix it deploy the fixes). In
that regard it's a lot like hardware.

I agree with you that this makes it important to design good
interfaces, but again realistically mistakes will be made
and they cannot be all fixed retroactively.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 19:28:13

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 12:34:44PM -0500, Gregory Haskins wrote:
> On 12/23/09 1:15 AM, Kyle Moffett wrote:
> > On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
> > <[email protected]> wrote:
> >> On 12/22/09 2:57 AM, Ingo Molnar wrote:
> >>> * Gregory Haskins <[email protected]> wrote:
> >>>> Actually, these patches have nothing to do with the KVM folks. [...]
> >>>
> >>> That claim is curious to me - the AlacrityVM host
> >>
> >> It's quite simple, really. These drivers support accessing vbus, and
> >> vbus is hypervisor agnostic. In fact, vbus isn't necessarily even
> >> hypervisor related. It may be used anywhere where a Linux kernel is the
> >> "io backend", which includes hypervisors like AlacrityVM, but also
> >> userspace apps, and interconnected physical systems as well.
> >>
> >> The vbus-core on the backend, and the drivers on the frontend operate
> >> completely independent of the underlying hypervisor. A glue piece
> >> called a "connector" ties them together, and any "hypervisor" specific
> >> details are encapsulated in the connector module. In this case, the
> >> connector surfaces to the guest side as a pci-bridge, so even that is
> >> not hypervisor specific per se. It will work with any pci-bridge that
> >> exposes a compatible ABI, which conceivably could be actual hardware.
> >
> > This is actually something that is of particular interest to me. I
> > have a few prototype boards right now with programmable PCI-E
> > host/device links on them; one of my long-term plans is to finagle
> > vbus into providing multiple "virtual" devices across that single
> > PCI-E interface.
> >
> > Specifically, I want to be able to provide virtual NIC(s), serial
> > ports and serial consoles, virtual block storage, and possibly other
> > kinds of interfaces. My big problem with existing virtio right now
> > (although I would be happy to be proven wrong) is that it seems to
> > need some sort of out-of-band communication channel for setting up
> > devices, not to mention it seems to need one PCI device per virtual
> > device.
> >

Greg, thanks for CC'ing me.

Hello Kyle,

I've got a similar situation here. I've got many PCI agents (devices)
plugged into a PCI backplane. I want to use the network to communicate
from the agents to the PCI master (host system).

At the moment, I'm using a custom driver, heavily based on the PCINet
driver posted on the linux-netdev mailing list. David Miller rejected
this approach, and suggested I use virtio instead.

My first approach with virtio was to create a "crossed-wires" driver,
which connected two virtio-net drivers together. While this worked, it
doesn't support feature negotiation properly, and so it was scrapped.
You can find this posted on linux-netdev with the title
"virtio-over-PCI".

I started writing a "virtio-phys" layer which creates the appropriate
distinction between frontend (guest driver) and backend (kvm, qemu,
etc.). This effort has been put on hold for lack of time, and because
there is no example code which shows how to create an interface from
virtio rings to TUN/TAP. The vhost-net driver is supposed to fill this
role, but I haven't seen any test code for that either. The developers
haven't been especially helpful answering questions like: how would I
use vhost-net with a DMA engine.

(You'll quickly find that you must use DMA to transfer data across PCI.
AFAIK, CPUs cannot do burst accesses to the PCI bus. I get a 10+ times
speedup using DMA.)
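
For reference, the DMA path with the in-kernel dmaengine API looks roughly
like the sketch below; names are trimmed, error handling is minimal, and
this is not the actual PCINet or virtio-phys code:

#include <linux/dmaengine.h>
#include <linux/errno.h>

static int dma_copy_to_bar(dma_addr_t bar_dst, dma_addr_t src, size_t len)
{
        dma_cap_mask_t mask;
        struct dma_chan *chan;
        struct dma_async_tx_descriptor *tx;
        dma_cookie_t cookie;

        dma_cap_zero(mask);
        dma_cap_set(DMA_MEMCPY, mask);
        chan = dma_request_channel(mask, NULL, NULL);
        if (!chan)
                return -ENODEV;

        /* The DMA controller bursts on the bus; the CPU does not. */
        tx = chan->device->device_prep_dma_memcpy(chan, bar_dst, src,
                                                  len, DMA_PREP_INTERRUPT);
        if (!tx) {
                dma_release_channel(chan);
                return -EIO;
        }

        cookie = tx->tx_submit(tx);
        dma_async_issue_pending(chan);
        dma_sync_wait(chan, cookie);    /* or use a completion callback */

        dma_release_channel(chan);
        return 0;
}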

The virtio-phys work is mostly lacking a backend for virtio-net. It is
still incomplete, but at least devices can be registered, etc. It is
available at:
http://www.mmarray.org/~iws/virtio-phys/

Another thing you'll notice about virtio-net (and vbus' venet) is that
they DO NOT specify endianness. This means that they cannot be used with
a big-endian guest and a little-endian host, or vice versa. This means
they will not work in certain QEMU setups today.
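
Fixing that is mostly a matter of nailing down the wire format and converting
at the edges.  An illustrative sketch only, not a proposed ABI:

#include <linux/types.h>
#include <asm/byteorder.h>

/* Descriptor with an explicit little-endian wire format, instead of
 * writing raw CPU-endian values into shared memory. */
struct wire_desc {
        __le64 addr;
        __le32 len;
        __le16 flags;
        __le16 id;
};

static void fill_desc(struct wire_desc *d, u64 addr, u32 len,
                      u16 flags, u16 id)
{
        d->addr  = cpu_to_le64(addr);
        d->len   = cpu_to_le32(len);
        d->flags = cpu_to_le16(flags);
        d->id    = cpu_to_le16(id);
}

static u32 desc_len(const struct wire_desc *d)
{
        return le32_to_cpu(d->len);     /* no-op on LE, swab on BE */
}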

Another problem with virtio is that you'll need to invent your own bus
model. QEMU/KVM has their bus model, lguest uses a different one, and
s390 uses yet another, IIRC. At least vbus provides a standardized bus
model.

All in all, I've written a lot of virtio code, and it has pretty much
all been shot down. It isn't very encouraging.

> > So I would love to be able to port something like vbus to my nify PCI
> > hardware and write some backend drivers... then my PCI-E connected
> > systems would dynamically provide a list of highly-efficient "virtual"
> > devices to each other, with only one 4-lane PCI-E bus.

I've written some IOQ test code, all of which is posted on the
alacrityvm-devel mailing list. If we can figure out how to make IOQ use
the proper ioread32()/iowrite32() accessors for accessing ioremap()ed
PCI BARs, then I can pretty easily write the rest of a "vbus-phys"
connector.
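
To show what the accessor question is about, here is a tiny sketch (the
register layout and names are hypothetical, not the actual IOQ code): once a
BAR has been ioremap()ed, fields living in it have to go through
ioread32()/iowrite32() rather than plain pointer dereferences.

#include <linux/io.h>
#include <linux/pci.h>
#include <linux/stddef.h>

/* Hypothetical layout of ring registers inside the shared BAR. */
struct remote_ring_regs {
        u32 head;
        u32 tail;
};

static void __iomem *map_ring_bar(struct pci_dev *pdev, int bar)
{
        return pci_iomap(pdev, bar, 0);
}

static u32 ring_read_head(void __iomem *regs)
{
        return ioread32(regs + offsetof(struct remote_ring_regs, head));
}

static void ring_write_tail(void __iomem *regs, u32 val)
{
        iowrite32(val, regs + offsetof(struct remote_ring_regs, tail));
}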

>
> Hi Kyle,
>
> We indeed have others that are doing something similar. I have CC'd Ira
> who may be able to provide you more details. I would also point you at
> the canonical example for what you would need to write to tie your
> systems together. Its the "null connector", which you can find here:
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/connectors/null.c;h=b6d16cb68b7e49e07528278bc9f5b73e1dac0c2f;hb=HEAD
>
> Do not hesitate to ask any questions, though you may want to take the
> conversation to the alacrityvm-devel list as to not annoy the current CC
> list any further than I already have ;)
>

IMO, they should at least see the issues here. They can reply back if
they want to be removed.

I hope it helps. Feel free to contact me off-list with any other
questions.

Ira

2009-12-23 19:51:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

"Ira W. Snyder" <[email protected]> writes:

> (You'll quickly find that you must use DMA to transfer data across PCI.
> AFAIK, CPU's cannot do burst accesses to the PCI bus. I get a 10+ times

AFAIK that's what write-combining on x86 does. DMA has other
advantages of course.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-23 19:54:24

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:
> On 12/23/2009 12:15 AM, Kyle Moffett wrote:
> > This is actually something that is of particular interest to me. I
> > have a few prototype boards right now with programmable PCI-E
> > host/device links on them; one of my long-term plans is to finagle
> > vbus into providing multiple "virtual" devices across that single
> > PCI-E interface.
> >
> > Specifically, I want to be able to provide virtual NIC(s), serial
> > ports and serial consoles, virtual block storage, and possibly other
> > kinds of interfaces. My big problem with existing virtio right now
> > (although I would be happy to be proven wrong) is that it seems to
> > need some sort of out-of-band communication channel for setting up
> > devices, not to mention it seems to need one PCI device per virtual
> > device.
>
> We've been thinking about doing a virtio-over-IP mechanism such that you
> could remote the entire virtio bus to a separate physical machine.
> virtio-over-IB is probably more interesting since you can make use of
> RDMA. virtio-over-PCI-e would work just as well.
>

I didn't know you were interested in this as well. See my later reply to
Kyle for a lot of code that I've written with this in mind.

> virtio is a layered architecture. Device enumeration/discovery sits at
> a lower level than the actual device ABIs. The device ABIs are
> implemented on top of a bulk data transfer API. The reason for this
> layering is so that we can reuse PCI as an enumeration/discovery
> mechanism. This tremendenously simplifies porting drivers to other OSes
> and let's us use PCI hotplug automatically. We get integration into all
> the fancy userspace hotplug support for free.
>
> But both virtio-lguest and virtio-s390 use in-band enumeration and
> discovery since they do not have support for PCI on either platform.
>

I'm interested in the same thing, just over PCI. The only PCI agent
systems I've used are not capable of manipulating the PCI configuration
space in such a way that virtio-pci is usable on them. This means
creating your own enumeration mechanism. Which sucks. See my virtio-phys
code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
did it. It was modeled on lguest. Help is appreciated.

Ira

2009-12-23 20:21:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 08:15 PM, Gregory Haskins wrote:
> On 12/23/09 5:22 AM, Avi Kivity wrote:
>
>
>> There was no attempt by Gregory to improve virtio-net.
>>
> If you truly do not understand why your statement is utterly wrong at
> this point in the discussion, I feel sorry for you. If you are trying
> to be purposely disingenuous, you should be ashamed of yourself. In any
> case, your statement is demonstrably bogus, but you should already know
> this given that we talked about at least several times.
>

There's no need to feel sorry for me, thanks. There's no reason for me
to be ashamed, either. And there's no need to take the discussion to
personal levels. Please keep it technical.


> To refresh your memory: http://patchwork.kernel.org/patch/17428/
>

This is not an attempt to improve virtio-net, it's an attempt to push
vbus. With this, virtio-net doesn't become any faster, since the
greatest bottleneck is not removed, it remains in userspace.

If you wanted to improve virtio-net, you would port venet-host to the
virtio-net guest/host interface, and port any secret sauce in
venet(-guest) to virtio-net. After that we could judge what vbus'
contribution to the equation is.

> In case its not blatantly clear, which I would hope it would be to
> anyone that understands the problem space: What that patch would do is
> allow an unmodified virtio-net to bridge to a vbus based virtio-net
> backend. (Also note that this predates vhost-net by months (the date in
> that thread is 4/9/2009) in case you are next going to try to argue that
> it does nothing over vhost-net).
>

Without the backend, it is useless. It demonstrates vbus' flexibility
quite well, but does nothing for virtio-net or its users, at least
without a lot more work.

> This would mean that virtio-net would gain most of the benefits I have
> been advocating (fewer exits, cheaper exits, concurrent execution, etc).
> So this would very much improve virtio-net indeed, given how poorly the
> current backend was performing. I tried to convince the team to help me
> build it out to completion on multiple occasions, but that request was
> answered with "sorry, we are doing our own thing instead". You can say
> that you didn't like my approach, since that is a subjective opinion.
> But to say that I didn't attempt to improve it is a flat out wrong, and
> I do not appreciate it.
>

Cutting down on the rhetoric is more important than cutting down exits
at this point in time.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 20:27:08

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 09:27 PM, Andi Kleen wrote:
> Ingo Molnar<[email protected]> writes:
>
>
>> Yes, there's (obviously) compatibility requirements and artifacts and past
>> mistakes (as with any software interface), but you need to admit it to
>>
> Yes that's exactly what I meant.
>

And we do make plenty of mistakes. And when we fix them, we have to
maintain bug-compatibility to allow live migration from the broken
version to the good version. If you're ever feeling overly happy, do
some compat work in qemu and it will suck a year's worth or two of your
life force a pop.

>> yourself that your "virtualization is sloppy just like hardware" claim is just
>>
> In my experience hardware is a lot less sloppy than software.
> Imagine your latest CPU had as many regressions as 2.6.32 @)
>
> I wish software and even VMs were as good.
>
>

Me too.

>> a cheap excuse to not do a proper job of interface engineering.
>>
> Past mistakes cannot be easily fixed. And undoubtedly even the new
> shiny interfaces will have bugs and problems. Also the behaviour is
> often not completely understood. Maybe it can be easier debugged with
> fully available source, but even then it's hard to fix the old
> software (or rather even if you can fix it deploy the fixes). In
> that regard it's a lot like hardware.
>
> I agree with you that this makes it important to design good
> interfaces, but again realistically mistakes will be made
> and they cannot be all fixed retroactively.
>

Our principal tool for this is to avoid introducing new interfaces
whenever possible. We try to stick to established hardware standards
(so we don't need to sloppily define them, and get guest compatibility
for free).

Hardware (both virt and non-virt) faces the same problems as software
here. So as hardware solutions are introduced, we adopt them, and
usually the virt extensions vendors follow with accelerations for these
paths as well.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 20:37:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>
>> - Are a pure software concept
>>
> By design. In fact, I would describe it as "software to software
> optimized" as opposed to trying to shoehorn into something that was
> designed as a software-to-hardware interface (and therefore has
> assumptions about the constraints in that environment that are not
> applicable in software-only).
>
>

And that's the biggest mistake you can make. Look at Xen, for
instance. They paravirtualized the fork out of everything that moved in
order to get x86 virt going. And where are they now? x86_64 syscalls
are slow since they have to trap to the hypervisor and (partially) flush
the tlb. With npt or ept capable hosts performance is better for many
workloads on fullvirt. And paravirt doesn't support Windows. Their
unsung hero Jeremy is still trying to upstream dom0 Xen support. And
they get to support it forever.

VMware stuck with the hardware defined interfaces. Sure they had to
implement binary translation to get there, but as a result, they only
have to support one interface, all guests support it, and they can drop
it on newer hosts where it doesn't give them anything.

We had the advantage of course of starting with virt extensions, so it
was a no-brainer: paravirt only where absolutely required. Where we
deviated from this, it backfired.

>> - Gregory claims that the AlacricityVM patches are not a replacement for KVM.
>> I.e. there's no intention to phase out any 'old stuff'
>>
> There's no reason to phase anything out, except perhaps the virtio-pci
> transport. This is one more transport, plugging into virtio underneath
> (just like virtio-pci, virtio-lguest, and virtio-s390). I am not even
> suggesting that the old transport has to go away, per se. It is the KVM
> maintainers who insist on it being all or nothing. For me, I do not see
> the big deal in having one more "model" option in the qemu cmd-line, but
> that is just my opinion. If the maintainers were really so adamant that
> choice is pure evil, I am not sure why we don't see patches for removing
> everything but one model type in each IO category. But I digress.
>

We have to support users (also known as customers in some areas), so we
have to keep the old stuff. We have limited resources, so we want to
maintain as little as possible. We'll do it if we have to, but I'm
totally unconvinced we have to.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 21:02:26

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 10:36 PM, Avi Kivity wrote:
> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>
>>> - Are a pure software concept
>> By design. In fact, I would describe it as "software to software
>> optimized" as opposed to trying to shoehorn into something that was
>> designed as a software-to-hardware interface (and therefore has
>> assumptions about the constraints in that environment that are not
>> applicable in software-only).
>>
>
> And that's the biggest mistake you can make. Look at Xen, for
> instance. They paravirtualized the fork out of everything that moved
> in order to get x86 virt going. And where are they now? x86_64
> syscalls are slow since they have to trap to the hypervisor and
> (partially) flush the tlb. With npt or ept capable hosts performance
> is better for many workloads on fullvirt. And paravirt doesn't
> support Windows. Their unsung hero Jeremy is still trying to upstream
> dom0 Xen support. And they get to support it forever.
>
> VMware stuck with the hardware defined interfaces. Sure they had to
> implement binary translation to get there, but as a result, they only
> have to support one interface, all guests support it, and they can
> drop it on newer hosts where it doesn't give them anything.

As a twist on this, the VMware paravirt driver interface is so
hardware-like that they're getting hardware vendors to supply cards that
implement it. Try that with a pure software approach.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 21:25:21

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

(Sorry for top post...on a mobile)

When someone repeatedly makes a claim you believe to be wrong and you
correct them, you start to wonder if that person has a less than
honorable agenda. In any case, I overreacted. For that, I apologize.

That said, you are still incorrect. With what I proposed, the model
will run as an in-kernel vbus device, and no longer run in userspace.
It would therefore improve virtio-net as I stated, much in the same
way vhost-net or venet-tap do today.

FYI I am about to log out for the long holiday, so will be
unresponsive for a bit.

Kind Regards,
-Greg

On 12/23/09, Avi Kivity <[email protected]> wrote:
> On 12/23/2009 08:15 PM, Gregory Haskins wrote:
>> On 12/23/09 5:22 AM, Avi Kivity wrote:
>>
>>
>>> There was no attempt by Gregory to improve virtio-net.
>>>
>> If you truly do not understand why your statement is utterly wrong at
>> this point in the discussion, I feel sorry for you. If you are trying
>> to be purposely disingenuous, you should be ashamed of yourself. In any
>> case, your statement is demonstrably bogus, but you should already know
>> this given that we talked about it at least several times.
>>
>
> There's no need to feel sorry for me, thanks. There's no reason for me
> to be ashamed, either. And there's no need to take the discussion to
> personal levels. Please keep it technical.
>
>
>> To refresh your memory: http://patchwork.kernel.org/patch/17428/
>>
>
> This is not an attempt to improve virtio-net, it's an attempt to push
> vbus. With this, virtio-net doesn't become any faster, since the
> greatest bottleneck is not removed, it remains in userspace.
>
> If you wanted to improve virtio-net, you would port venet-host to the
> virtio-net guest/host interface, and port any secret sauce in
> venet(-guest) to virtio-net. After that we could judge what vbus'
> contribution to the equation is.
>
>> In case it's not blatantly clear, which I would hope it would be to
>> anyone that understands the problem space: What that patch would do is
>> allow an unmodified virtio-net to bridge to a vbus based virtio-net
>> backend. (Also note that this predates vhost-net by months (the date in
>> that thread is 4/9/2009) in case you are next going to try to argue that
>> it does nothing over vhost-net).
>>
>
> Without the backend, it is useless. It demonstrates vbus' flexibility
> quite well, but does nothing for virtio-net or its users, at least
> without a lot more work.
>
>> This would mean that virtio-net would gain most of the benefits I have
>> been advocating (fewer exits, cheaper exits, concurrent execution, etc).
>> So this would very much improve virtio-net indeed, given how poorly the
>> current backend was performing. I tried to convince the team to help me
>> build it out to completion on multiple occasions, but that request was
>> answered with "sorry, we are doing our own thing instead". You can say
>> that you didn't like my approach, since that is a subjective opinion.
>> But to say that I didn't attempt to improve it is a flat out wrong, and
>> I do not appreciate it.
>>
>
> Cutting down on the rhetoric is more important than cutting down exits
> at this point in time.
>
> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
>
>

--
Sent from my mobile device

2009-12-23 22:58:56

by Anthony Liguori

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 01:54 PM, Ira W. Snyder wrote:
> On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:

> I didn't know you were interested in this as well. See my later reply to
> Kyle for a lot of code that I've written with this in mind.


BTW, in the future, please CC me or CC
[email protected]. Or certainly kvm@vger. I
never looked at the virtio-over-pci patchset although I've heard it
referenced before.

>> But both virtio-lguest and virtio-s390 use in-band enumeration and
>> discovery since they do not have support for PCI on either platform.
>>
>
> I'm interested in the same thing, just over PCI. The only PCI agent
> systems I've used are not capable of manipulating the PCI configuration
> space in such a way that virtio-pci is usable on them.

virtio-pci is the wrong place to start if you want to use a PCI *device*
as the virtio bus. virtio-pci is meant to use the PCI bus as the virtio
bus. That's a very important requirement for us because it maintains
the relationship of each device looking like a normal PCI device.

> This means
> creating your own enumeration mechanism. Which sucks.

I don't think it sucks. The idea is that we don't want to unnecessarily
reinvent things.

Of course, the key feature of virtio is that it makes it possible for
you to create your own enumeration mechanism if you're so inclined.

> See my virtio-phys
> code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
> did it. It was modeled on lguest. Help is appreciated.

If it were me, I'd take a much different approach. I would use a very
simple device with a single transmit and receive queue. I'd create a
standard header, and then implement a command protocol on top of it.
You'll be able to support zero copy I/O (although you'll have a fixed
number of outstanding requests). You would need a single large ring.
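
To make that concrete, here is a minimal sketch of what such a header and
command ring might look like (purely illustrative; every name and field
below is hypothetical, not an existing ABI; __leXX types as in
<linux/types.h>):

struct ring_desc {
	__le64 addr;	/* bus address of the data buffer */
	__le32 len;	/* length of the buffer in bytes */
	__le16 flags;	/* e.g. DESC_F_WRITE for device-writable buffers */
	__le16 id;	/* cookie echoed back on completion */
};

struct cmd_hdr {
	__le16 type;	/* CMD_NET_TX, CMD_NET_RX, CMD_BLK_READ, ... */
	__le16 flags;
	__le32 seq;	/* request sequence number */
	__le32 status;	/* filled in by the other side on completion */
	__le32 ndesc;	/* number of ring_descs following this header */
};

Each slot in the single large ring would carry a cmd_hdr followed by its
descriptors, which is what gives zero-copy I/O with a fixed number of
outstanding requests.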

But then again, I have no idea what your requirements are. You could
probably get far treating the thing as a network device and just doing
ATAoE or something like that.

Regards,

Anthony Liguori

> Ira

2009-12-23 23:27:22

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 11:29 AM, Linus Torvalds wrote:
>
>
> On Wed, 23 Dec 2009, Gregory Haskins wrote:
>>>
>>> And upstream submission is not always like this!
>>
>> I would think the process would come to a grinding halt if it were ;)
>
> Well, in all honesty, if it had been non-virtualized drivers I would just
> have pulled. The pull request all looked sane, the diffstat looked clean
> and non-intrusive, and I had no problems with any of that.
>
> But the virtualization people always argue about the fifty-eleven
> different ways of doing things, and unlike real drivers - where the actual
> hardware places constraints on what the heck is going on - virtualization
> people seem to revel in making new interfaces weekly, and tend to be only
> incidentally limited by hardware (ie hardware interfaces may limit some
> _details_, but seldom any higher-level arguments).
>
> So when I see another virtualization interface, I want the virtualization
> people to just argue it out amongst themselves.

Actually, this sentiment is really the basis of this whole discussion.
KVM is the product of learning the hard way that inventing
interfaces just because we can is a total waste of time.

Our current I/O infrastructure is based on PCI devices that we can
emulate efficiently. They look, feel, and taste like real hardware
devices. We try to be as boring as humanly possible and so far, it's
worked out extremely well for us.

> Thanks to the virtue of me
> personally not caring one whit about virtualization, I can stand back and
> just watch the fireworks.

That's a shame, because I wish more people with a practical sentiment
cared about virtualization to discourage the general silliness that
seems to be all too common in this space.

Regards,

Anthony Liguori

2009-12-23 23:43:06

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 04:58:37PM -0600, Anthony Liguori wrote:
> On 12/23/2009 01:54 PM, Ira W. Snyder wrote:
> > On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:
>
> > I didn't know you were interested in this as well. See my later reply to
> > Kyle for a lot of code that I've written with this in mind.
>
>
> BTW, in the future, please CC me or CC
> [email protected]. Or certainly kvm@vger. I
> never looked at the virtio-over-pci patchset although I've heard it
> referenced before.
>

Will do. I wouldn't think kvm@vger would be on-topic. I'm not interested
in KVM (though I do use it constantly, it is great). I'm only interested
in using virtio as a transport between physical systems. Is it a place
where discussing virtio by itself is on-topic?

> >> But both virtio-lguest and virtio-s390 use in-band enumeration and
> >> discovery since they do not have support for PCI on either platform.
> >>
> >
> > I'm interested in the same thing, just over PCI. The only PCI agent
> > systems I've used are not capable of manipulating the PCI configuration
> > space in such a way that virtio-pci is usable on them.
>
> virtio-pci is the wrong place to start if you want to use a PCI *device*
> as the virtio bus. virtio-pci is meant to use the PCI bus as the virtio
> bus. That's a very important requirement for us because it maintains
> the relationship of each device looking like a normal PCI device.
>
> > This means
> > creating your own enumeration mechanism. Which sucks.
>
> I don't think it sucks. The idea is that we don't want to unnecessarily
> reinvent things.
>
> Of course, the key feature of virtio is that it makes it possible for
> you to create your own enumeration mechanism if you're so inclined.
>
> > See my virtio-phys
> > code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
> > did it. It was modeled on lguest. Help is appreciated.
>
> If it were me, I'd take a much different approach. I would use a very
> simple device with a single transmit and receive queue. I'd create a
> standard header, and then implement a command protocol on top of it.
> You'll be able to support zero copy I/O (although you'll have a fixed
> number of outstanding requests). You would need a single large ring.
>
> But then again, I have no idea what your requirements are. You could
> probably get far treating the thing as a network device and just doing
> ATAoE or something like that.
>

I've got a single PCI Host (master) with ~20 PCI slots. Physically, it
is a backplane in a cPCI chassis, but the form factor is irrelevant. It
is regular PCI from a software perspective.

Into this backplane, I plug up to 20 PCI Agents (slaves). They are
powerpc computers, almost identical to the Freescale MPC8349EMDS board.
They're full-featured powerpc computers, with CPU, RAM, etc. They can
run standalone.

I want to use the PCI backplane as a data transport. Specifically, I
want to transport ethernet over the backplane, so I can have the powerpc
boards mount their rootfs via NFS, etc. Everyone knows how to write
network daemons. It is a good and very well known way to transport data
between systems.

On the PCI bus, the powerpc systems expose 3 PCI BARs. The size is
configurable, as is the memory location at which they point. What I
cannot do is get notified when a read/write hits the BAR. There is a
feature on the board which allows me to generate interrupts in either
direction: agent->master (PCI INTX) and master->agent (via an MMIO
register). The PCI vendor ID and device ID are not configurable.

One thing I cannot assume is that the PCI master system is capable of
performing DMA. In my system, it is a Pentium3 class x86 machine, which
has no DMA engine. However, the PowerPC systems do have DMA engines. In
virtio terms, it was suggested to make the powerpc systems the "virtio
hosts" (running the backends) and make the x86 (PCI master) the "virtio
guest" (running virtio-net, etc.).

I'm not sure what you're suggesting in the paragraph above. I want to
use virtio-net as the transport, I do not want to write my own
virtual-network driver. Can you please clarify?

Hopefully that explains what I'm trying to do. I'd love someone to help
guide me in the right direction here. I want something to fill this need
in mainline. I've been contacted separately by 10+ people also looking
for a similar solution. I hunch most of them end up doing what I did:
write a quick-and-dirty network driver. I've been working on this for a
year, just to give an idea.

PS - should I create a new thread on the two mailing lists mentioned
above? I don't want to go too far off-topic in an alacrityvm thread. :)

Ira

2009-12-24 04:52:54

by Kyle Moffett

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 17:58, Anthony Liguori <[email protected]> wrote:
> On 12/23/2009 01:54 PM, Ira W. Snyder wrote:
>> On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:
>>> But both virtio-lguest and virtio-s390 use in-band enumeration and
>>> discovery since they do not have support for PCI on either platform.
>>
>> I'm interested in the same thing, just over PCI. The only PCI agent
>> systems I've used are not capable of manipulating the PCI configuration
>> space in such a way that virtio-pci is usable on them.
>
> virtio-pci is the wrong place to start if you want to use a PCI *device* as
> the virtio bus. virtio-pci is meant to use the PCI bus as the virtio bus.
>  That's a very important requirement for us because it maintains the
> relationship of each device looking like a normal PCI device.
>
>> This means
>> creating your own enumeration mechanism. Which sucks.
>
> I don't think it sucks.  The idea is that we don't want to unnecessarily
> reinvent things.
>
> Of course, the key feature of virtio is that it makes it possible for you to
> create your own enumeration mechanism if you're so inclined.

See... the thing is... a lot of us random embedded board developers
don't *want* to create our own enumeration mechanisms. I see a huge
amount of value in vbus as a common zero-copy DMA-capable
virtual-device interface, especially over miscellaneous non-PCI-bus
interconnects. I mentioned my PCI-E boards earlier, but I would also
personally be interested in using infiniband with RDMA as a virtual
device bus.

Basically, what it comes down to is vbus is practically useful as a
generic way to provide a large number of hotpluggable virtual devices
across an arbitrary interconnect. I agree that virtio works fine if
you have some out-of-band enumeration and hotplug transport (like
emulated PCI), but if you *don't* have that, it's pretty much faster
to write your own set of paired network drivers than it is to write a
whole enumeration and transport stack for virtio.

On top of *that*, with the virtio approach I would need to write a
whole bunch of tools to manage the set of virtual devices on my custom
hardware. With vbus that management interface would be entirely
common code across a potentially large number of virtualized physical
transports.

If vbus actually gets merged I will most likely be able to spend the
time to get the PCI-E crosslinks on my boards talking vbus, otherwise
it's liable to get completely shelved as "not worth the effort" to
write all the glue to make virtio work.

>> See my virtio-phys
>> code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
>> did it. It was modeled on lguest. Help is appreciated.
>
> If it were me, I'd take a much different approach.  I would use a very
> simple device with a single transmit and receive queue.  I'd create a
> standard header, and then implement a command protocol on top of it. You'll
> be able to support zero copy I/O (although you'll have a fixed number of
> outstanding requests).  You would need a single large ring.

That's basically about as much work as writing entirely new network
and serial drivers over PCI. Not only that, but the beauty of
vbus for me is that I could write a fairly simple logical-to-physical
glue driver which lets vbus talk over my PCI-E or infiniband link and
then I'm basically done.

Not only that, but the tools for adding new virtual devices (ethernet,
serial, block, etc) over vbus would be the same no matter what the
underlying transport.

> But then again, I have no idea what your requirements are.  You could
> probably get far treating the thing as a network device and just doing ATAoE
> or something like that.

<sarcasm>Oh... yes... clearly the right solution is to forgo the whole
zero-copy direct DMA of block writes and instead shuffle the whole
thing into 16kB ATAoE packets. That would obviously be much faster on
my little 1GHz PowerPC boards </sarcasm>

Sorry for the rant, but I really do think vbus is a valuable
technology and it's a damn shame to see Gregory Haskins being put
through this whole hassle. While most everybody else was griping
about problems he sat down and wrote some very nice clean maintainable
code to do what he needed. Not only that, but he designed a good
enough model that it could be ported to run over almost everything
from a single PCI-E link to an infiniband network.

I personally would love to see vbus merged, into staging at the very
least. I would definitely spend some time trying to make it work
across PCI-E on my *very* *real* embedded boards. Look at vbus not as
another virtualization ABI, but as a multiprotocol high-level device
abstraction API that already has one well-implemented and
high-performance user.

Cheers,
Kyle Moffett

2009-12-24 06:59:14

by Gleb Natapov

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 07:51:50PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > - Are a pure software concept and any compatibility mismatch is
> > > self-inflicted. The patches are in fact breaking the ABI to KVM
> >
> > In practice, especially considering older kernel releases, VMs behave like
> > hardware, with all its quirks, compatibility requirements, sometimes not
> > fully understood, etc.
>
> I stopped reading your reply here. That's not actually fully true of KVM, at
> all.
>
> Virtualization isn't voodoo magic with some hidden source in some magic hardware
> component that no-one can understand fully. This isn't some mystic hardware
> vendor coming up with some code and going away in the next quarter, with
> barely anything documented and thousands of users left with hardware
> components which we need to support under Linux somehow.
>
> This is Linux virtualization, where _both_ the host and the guest source code
> is fully known, and bugs (if any) can be found with a high degree of
It may sound strange, but Windows is a very popular guest, and last I
checked there were no Windows sources on my HW. The answer to that
is to emulate HW as closely as possible to the real thing, and then
closed-source guests will have no reason to be upset.

> determinism. This is Linux where the players don't just vanish overnight, and
> are expected to do a proper job.
>
> Yes, there's (obviously) compatibility requirements and artifacts and past
> mistakes (as with any software interface), but you need to admit it to
> yourself that your "virtualization is sloppy just like hardware" claim is just
> a cheap excuse to not do a proper job of interface engineering.
>
> Thanks,
>
> Ingo

--
Gleb.

2009-12-24 09:31:22

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 3:36 PM, Avi Kivity wrote:
> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>
>>> - Are a pure software concept
>>>
>> By design. In fact, I would describe it as "software to software
>> optimized" as opposed to trying to shoehorn into something that was
>> designed as a software-to-hardware interface (and therefore has
>> assumptions about the constraints in that environment that are not
>> applicable in software-only).
>>
>>
>
> And that's the biggest mistake you can make.

Sorry, that is just wrong or you wouldn't have virtio either.

> Look at Xen, for
> instance. They paravirtualized the fork out of everything that moved in
> order to get x86 virt going. And where are they now? x86_64 syscalls
> are slow since they have to trap to the hypervisor and (partially) flush
> the tlb. With npt or ept capable hosts performance is better for many
> workloads on fullvirt. And paravirt doesn't support Windows. Their
> unsung hero Jeremy is still trying to upstream dom0 Xen support. And
> they get to support it forever.

We are only talking about PV-IO here, so not apples to apples to what
Xen is going through.

>
> VMware stuck with the hardware defined interfaces. Sure they had to
> implement binary translation to get there, but as a result, they only
> have to support one interface, all guests support it, and they can drop
> it on newer hosts where it doesn't give them anything.

Again, you are confusing PV-IO. Not relevant here. Afaict, vmware,
kvm, xen, etc, all still do PV-IO and likely will for the foreseeable
future.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-24 09:36:36

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 4:01 PM, Avi Kivity wrote:
> On 12/23/2009 10:36 PM, Avi Kivity wrote:
>> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>>
>>>> - Are a pure software concept
>>> By design. In fact, I would describe it as "software to software
>>> optimized" as opposed to trying to shoehorn into something that was
>>> designed as a software-to-hardware interface (and therefore has
>>> assumptions about the constraints in that environment that are not
>>> applicable in software-only).
>>>
>>
>> And that's the biggest mistake you can make. Look at Xen, for
>> instance. They paravirtualized the fork out of everything that moved
>> in order to get x86 virt going. And where are they now? x86_64
>> syscalls are slow since they have to trap to the hypervisor and
>> (partially) flush the tlb. With npt or ept capable hosts performance
>> is better for many workloads on fullvirt. And paravirt doesn't
>> support Windows. Their unsung hero Jeremy is still trying to upstream
>> dom0 Xen support. And they get to support it forever.
>>
>> VMware stuck with the hardware defined interfaces. Sure they had to
>> implement binary translation to get there, but as a result, they only
>> have to support one interface, all guests support it, and they can
>> drop it on newer hosts where it doesn't give them anything.
>
> As a twist on this, the VMware paravirt driver interface is so
> hardware-like that they're getting hardware vendors to supply cards that
> implement it. Try that with a pure software approach.

Any hardware engineer (myself included) will tell you that, generally
speaking, what you can do in hardware you can do in software (think of
what QEMU does today, for instance). It's purely a cost/performance
tradeoff.

I can at least tell you that is true of vbus. Anything on the vbus side
would be equally eligible for a hardware implementation, though there is
no reason to do this today since we have equivalent functionality in
baremetal already. The only motivation is if you wanted to preserve
ABI etc, which is what vmware is presumably after. However, I am not
advocating this as necessary at this juncture.

So sorry, your statement is not relevant.

-Greg




>



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-24 10:07:00

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 11:28:08AM -0800, Ira W. Snyder wrote:
> On Wed, Dec 23, 2009 at 12:34:44PM -0500, Gregory Haskins wrote:
> > On 12/23/09 1:15 AM, Kyle Moffett wrote:
> > > On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
> > > <[email protected]> wrote:
> > >> On 12/22/09 2:57 AM, Ingo Molnar wrote:
> > >>> * Gregory Haskins <[email protected]> wrote:
> > >>>> Actually, these patches have nothing to do with the KVM folks. [...]
> > >>>
> > >>> That claim is curious to me - the AlacrityVM host
> > >>
> > >> It's quite simple, really. These drivers support accessing vbus, and
> > >> vbus is hypervisor agnostic. In fact, vbus isn't necessarily even
> > >> hypervisor related. It may be used anywhere where a Linux kernel is the
> > >> "io backend", which includes hypervisors like AlacrityVM, but also
> > >> userspace apps, and interconnected physical systems as well.

So a focus on interconnecting physical systems would, I think, be one way
for vbus to stop conflicting with KVM. If drivers for such systems
appear, I expect that the relevant (hypervisor-agnostic) vbus bits would be
very uncontroversial.

This would not be the first technology to make the jump from attempting
to be a PCI replacement to being an interconnect btw, I think infiniband
did this as well.

--
MST

2009-12-24 16:57:15

by Anthony Liguori

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 10:52 PM, Kyle Moffett wrote:
> On Wed, Dec 23, 2009 at 17:58, Anthony Liguori<[email protected]> wrote:
>> Of course, the key feature of virtio is that it makes it possible for you to
>> create your own enumeration mechanism if you're so inclined.
>
> See... the thing is... a lot of us random embedded board developers
> don't *want* to create our own enumeration mechanisms. I see a huge
> amount of value in vbus as a common zero-copy DMA-capable
> virtual-device interface, especially over miscellaneous non-PCI-bus
> interconnects. I mentioned my PCI-E boards earlier, but I would also
> personally be interested in using infiniband with RDMA as a virtual
> device bus.

I understand what you're saying, but is there really a practical
argument here? Infiniband already supports things like IPoIB and SCSI
over IB. Is it necessary to add another layer on top of it?

That said, it's easy enough to create a common enumeration mechanism for
people to use with virtio. I doubt it's really that interesting but
it's certainly quite reasonable. In fact, a lot of code could be reused
from virtio-s390 or virtio-lguest.


> Basically, what it comes down to is vbus is practically useful as a
> generic way to provide a large number of hotpluggable virtual devices
> across an arbitrary interconnect. I agree that virtio works fine if
> you have some out-of-band enumeration and hotplug transport (like
> emulated PCI), but if you *don't* have that, it's pretty much faster
> to write your own set of paired network drivers than it is to write a
> whole enumeration and transport stack for virtio.
>
> On top of *that*, with the virtio approach I would need to write a
> whole bunch of tools to manage the set of virtual devices on my custom
> hardware. With vbus that management interface would be entirely
> common code across a potentially large number of virtualized physical
> transports.


This particular use case really has nothing to do with virtualization.
You really want an infiniband replacement using the PCI-e bus. There's
so much on the horizon in this space that's being standardized in
PCI-sig like MR-IOV.

>> If it were me, I'd take a much different approach. I would use a very
>> simple device with a single transmit and receive queue. I'd create a
>> standard header, and the implement a command protocol on top of it. You'll
>> be able to support zero copy I/O (although you'll have a fixed number of
>> outstanding requests). You would need a single large ring.
>
> That's basically about as much work as writing entirely new network
> and serial drivers over PCI. Not only that, but the beauty of
> vbus for me is that I could write a fairly simple logical-to-physical
> glue driver which lets vbus talk over my PCI-E or infiniband link and
> then I'm basically done.

Is this something you expect people to use or is this a one-off project?

> I personally would love to see vbus merged, into staging at the very
> least. I would definitely spend some time trying to make it work
> across PCI-E on my *very* *real* embedded boards. Look at vbus not as
> another virtualization ABI, but as a multiprotocol high-level device
> abstraction API that already has one well-implemented and
> high-performance user.

If someone wants to advocate vbus for non-virtualized purposes, I have
no problem with that.

I just don't think it makes sense for KVM. virtio is not intended to
be used for any possible purpose.

Regards,

Anthony Liguori

2009-12-24 17:09:48

by Anthony Liguori

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 05:42 PM, Ira W. Snyder wrote:
>
> I've got a single PCI Host (master) with ~20 PCI slots. Physically, it
> is a backplane in a cPCI chassis, but the form factor is irrelevant. It
> is regular PCI from a software perspective.
>
> Into this backplane, I plug up to 20 PCI Agents (slaves). They are
> powerpc computers, almost identical to the Freescale MPC8349EMDS board.
> They're full-featured powerpc computers, with CPU, RAM, etc. They can
> run standalone.
>
> I want to use the PCI backplane as a data transport. Specifically, I
> want to transport ethernet over the backplane, so I can have the powerpc
> boards mount their rootfs via NFS, etc. Everyone knows how to write
> network daemons. It is a good and very well known way to transport data
> between systems.
>
> On the PCI bus, the powerpc systems expose 3 PCI BARs. The size is
> configurable, as is the memory location at which they point. What I
> cannot do is get notified when a read/write hits the BAR. There is a
> feature on the board which allows me to generate interrupts in either
> direction: agent->master (PCI INTX) and master->agent (via an MMIO
> register). The PCI vendor ID and device ID are not configurable.
>
> One thing I cannot assume is that the PCI master system is capable of
> performing DMA. In my system, it is a Pentium3 class x86 machine, which
> has no DMA engine. However, the PowerPC systems do have DMA engines. In
> virtio terms, it was suggested to make the powerpc systems the "virtio
> hosts" (running the backends) and make the x86 (PCI master) the "virtio
> guest" (running virtio-net, etc.).

IMHO, virtio and vbus are both the wrong model for what you're doing.
The key reason why is that virtio and vbus are generally designed around
the concept that there is shared cache coherent memory from which you
can use lock-less ring queues to implement efficient I/O.

In your architecture, you do not have cache coherent shared memory.
Instead, you have two systems connected via a PCI backplane with
non-coherent shared memory.

You probably need to use the shared memory as a bounce buffer and
implement a driver on top of that.
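
As a rough illustration of that bounce-buffer model (a hypothetical
layout in kernel-style C, not an existing driver), the shared window
could look like this:

struct bounce_slot {
	u32 len;		/* 0 = slot free, else bytes valid in data[] */
	u8  data[2048];		/* one frame, copied in by the producer */
};

struct bounce_window {			/* lives in one of the agent's BARs */
	u32 doorbell;			/* MMIO write here raises the agent IRQ */
	struct bounce_slot tx[16];	/* host -> agent */
	struct bounce_slot rx[16];	/* agent -> host */
};

The producer memcpy()s a frame into a free slot, sets len, and kicks the
doorbell; the consumer copies (or DMAs) it out into its own RAM and clears
len to return the slot. No pointers cross the link, so coherent shared
memory is not required.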

> I'm not sure what you're suggesting in the paragraph above. I want to
> use virtio-net as the transport, I do not want to write my own
> virtual-network driver. Can you please clarify?

virtio-net and vbus are going to be overly painful for you to use
because neither end can access arbitrary memory in the other end.

> Hopefully that explains what I'm trying to do. I'd love someone to help
> guide me in the right direction here. I want something to fill this need
> in mainline.

If I were you, I would write a custom network driver. virtio-net is
awfully small (just a few hundred lines). I'd use that as a basis but I
would not tie into virtio or vbus. The paradigms don't match.

> I've been contacted separately by 10+ people also looking
> for a similar solution. I hunch most of them end up doing what I did:
> write a quick-and-dirty network driver. I've been working on this for a
> year, just to give an idea.

The whole architecture of having multiple heterogeneous systems on a
common high speed backplane is what IBM refers to as "hybrid computing".
It's a model that I think will become a lot more common in the
future. I think there are typically two types of hybrid models
depending on whether the memory sharing is cache coherent or not. If
you have coherent shared memory, the problem looks an awful lot like
virtualization. If you don't have coherent shared memory, then the
shared memory basically becomes a pool to bounce into and out of.

> PS - should I create a new thread on the two mailing lists mentioned
> above? I don't want to go too far off-topic in an alacrityvm thread. :)

Couldn't hurt.

Regards,

Anthony Liguori

2009-12-24 20:41:30

by Roland Dreier

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


> > This is Linux virtualization, where _both_ the host and the guest source code
> > is fully known, and bugs (if any) can be found with a high degree of

> It may sound strange, but Windows is a very popular guest, and last I
> checked there were no Windows sources on my HW. The answer to that
> is to emulate HW as closely as possible to the real thing, and then
> closed-source guests will have no reason to be upset.
>
> > determinism. This is Linux where the players don't just vanish overnight, and
> > are expected to do a proper job.

And without even getting into closed/proprietary guests, virt is useful
for testing/developing/deploying many free OSes, eg FreeBSD, NetBSD,
OpenBSD, Hurd, <random research OS>, etc. Not to mention just wanting a
stable [virtual] platform to run <old enterprise Linux distro> on. So
having a virtual platform whose interface doesn't change very often or
very much has a lot of value at least in avoiding churn in guest OSes.

- R.

2009-12-25 00:38:43

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Thu, Dec 24, 2009 at 11:09:39AM -0600, Anthony Liguori wrote:
> On 12/23/2009 05:42 PM, Ira W. Snyder wrote:
> >
> > I've got a single PCI Host (master) with ~20 PCI slots. Physically, it
> > is a backplane in a cPCI chassis, but the form factor is irrelevant. It
> > is regular PCI from a software perspective.
> >
> > Into this backplane, I plug up to 20 PCI Agents (slaves). They are
> > powerpc computers, almost identical to the Freescale MPC8349EMDS board.
> > They're full-featured powerpc computers, with CPU, RAM, etc. They can
> > run standalone.
> >
> > I want to use the PCI backplane as a data transport. Specifically, I
> > want to transport ethernet over the backplane, so I can have the powerpc
> > boards mount their rootfs via NFS, etc. Everyone knows how to write
> > network daemons. It is a good and very well known way to transport data
> > between systems.
> >
> > On the PCI bus, the powerpc systems expose 3 PCI BARs. The size is
> > configurable, as is the memory location at which they point. What I
> > cannot do is get notified when a read/write hits the BAR. There is a
> > feature on the board which allows me to generate interrupts in either
> > direction: agent->master (PCI INTX) and master->agent (via an MMIO
> > register). The PCI vendor ID and device ID are not configurable.
> >
> > One thing I cannot assume is that the PCI master system is capable of
> > performing DMA. In my system, it is a Pentium3 class x86 machine, which
> > has no DMA engine. However, the PowerPC systems do have DMA engines. In
> > virtio terms, it was suggested to make the powerpc systems the "virtio
> > hosts" (running the backends) and make the x86 (PCI master) the "virtio
> > guest" (running virtio-net, etc.).
>
> IMHO, virtio and vbus are both the wrong model for what you're doing.
> The key reason why is that virtio and vbus are generally designed around
> the concept that there is shared cache coherent memory from which you
> can use lock-less ring queues to implement efficient I/O.
>
> In your architecture, you do not have cache coherent shared memory.
> Instead, you have two systems connected via a PCI backplane with
> non-coherent shared memory.
>
> You probably need to use the shared memory as a bounce buffer and
> implement a driver on top of that.
>
> > I'm not sure what you're suggesting in the paragraph above. I want to
> > use virtio-net as the transport, I do not want to write my own
> > virtual-network driver. Can you please clarify?
>
> virtio-net and vbus are going to be overly painful for you to use
> because no one end can access arbitrary memory in the other end.
>

The PCI Agents (powerpc's) can access the lowest 4GB of the PCI Master's
memory. Not all at the same time, but I have a 1GB movable window into
PCI address space. I hunch Kyle's setup is similar.

I've proved that virtio can work via my "crossed-wires" driver, hooking
two virtio-net's together. With a proper in-kernel backend, I think the
issues would be gone, and things would work great.

> > Hopefully that explains what I'm trying to do. I'd love someone to help
> > guide me in the right direction here. I want something to fill this need
> > in mainline.
>
> If I were you, I would write a custom network driver. virtio-net is
> awfully small (just a few hundred lines). I'd use that as a basis but I
> would not tie into virtio or vbus. The paradigms don't match.
>

This is exactly what I did first. I proposed it for mainline, and David
Miller shot it down, saying: you're creating your own virtualization
scheme, use virtio instead. Arnd Bergmann is maintaining a driver
out-of-tree for some IBM cell boards which is very similar, IIRC.

In my driver, I used the PCI Agent's PCI BARs to contain ring
descriptors. The PCI Agent actually handles all data transfer (via the
onboard DMA engine). It works great. I'll gladly post it if you'd like
to see it.
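
For readers who have not seen that style of driver, the implied shape is
roughly the following (an illustrative sketch with made-up names, not
Ira's actual code):

struct bar_ring_desc {			/* an array of these lives in a BAR */
	u32 host_addr_lo;		/* PCI address of the host-side buffer */
	u32 host_addr_hi;
	u32 len;			/* bytes to transfer */
	u32 flags;			/* OWNED_BY_AGENT, DIR_TX/DIR_RX, ... */
};

Because the descriptors sit in the agent's BAR, the x86 host (which has no
DMA engine) only performs cheap MMIO writes to post work; the powerpc
agent takes an interrupt, walks the ring, and uses its onboard DMA engine
to move the actual payload in either direction.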

In my driver, I had to use 64K MTU to get acceptable performance. I'm
not entirely sure how to implement a driver that can handle
scatter/gather (fragmented skb's). It clearly isn't that easy to tune a
network driver for good performance. For reference, my "crossed-wires"
virtio drivers achieved excellent performance (10x better than my custom
driver) with 1500 byte MTU.

> > I've been contacted separately by 10+ people also looking
> > for a similar solution. I hunch most of them end up doing what I did:
> > write a quick-and-dirty network driver. I've been working on this for a
> > year, just to give an idea.
>
> The whole architecture of having multiple heterogeneous systems on a
> common high speed backplane is what IBM refers to as "hybrid computing".
> It's a model that I think will become a lot more common in the
> future. I think there are typically two types of hybrid models
> depending on whether the memory sharing is cache coherent or not. If
> you have coherent shared memory, the problem looks an awful lot like
> virtualization. If you don't have coherent shared memory, then the
> shared memory basically becomes a pool to bounce into and out of.
>

Let's say I could get David Miller to accept a driver as described
above. Would you really want 10+ separate but extremely similar drivers
for similar boards? Such as mine, Arnd's, Kyle's, etc. It is definitely
a niche that Linux is lacking support for. And as you say, it is
growing.

It seems that no matter what I try, everyone says: no, go do this other
thing instead. Before I go and write the 5th iteration of this, I'll be
looking for a maintainer who says: this is the correct thing to be
doing, I'll help you push this towards mainline. It's been frustrating.

Ira

2009-12-27 09:16:49

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 11:21 PM, Gregory Haskins wrote:
> That said, you are still incorrect. With what I proposed, the model
> will run as an in-kernel vbus device, and no longer run in userspace.
> It would therefore improve virtio-net as I stated, much in the same
> way vhost-net or venet-tap do today.
>

That can't work. virtio-net has its own ABI on top of virtio, for
example it prepends a header for TSO information. Maybe if you disable
all features it becomes compatible with venet, but that cripples it.

--
error compiling committee.c: too many arguments to function

2009-12-27 09:30:33

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/24/2009 11:31 AM, Gregory Haskins wrote:
> On 12/23/09 3:36 PM, Avi Kivity wrote:
>
>> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>
>>>
>>>> - Are a pure software concept
>>>>
>>>>
>>> By design. In fact, I would describe it as "software to software
>>> optimized" as opposed to trying to shoehorn into something that was
>>> designed as a software-to-hardware interface (and therefore has
>>> assumptions about the constraints in that environment that are not
>>> applicable in software-only).
>>>
>>>
>>>
>> And that's the biggest mistake you can make.
>>
> Sorry, that is just wrong or you wouldn't have virtio either.
>

Things are not black and white. I prefer not to have paravirtualization
at all. When there is no alternative, I prefer to limit it to the
device level and keep it off the bus level.

>> Look at Xen, for
>> instance. They paravirtualized the fork out of everything that moved in
>> order to get x86 virt going. And where are they now? x86_64 syscalls
>> are slow since they have to trap to the hypervisor and (partially) flush
>> the tlb. With npt or ept capable hosts performance is better for many
>> workloads on fullvirt. And paravirt doesn't support Windows. Their
>> unsung hero Jeremy is still trying to upstream dom0 Xen support. And
>> they get to support it forever.
>>
> We are only talking about PV-IO here, so not apples to apples to what
> Xen is going through.
>

The same principles apply.

>> VMware stuck with the hardware defined interfaces. Sure they had to
>> implement binary translation to get there, but as a result, they only
>> have to support one interface, all guests support it, and they can drop
>> it on newer hosts where it doesn't give them anything.
>>
> Again, you are confusing PV-IO. Not relevant here. Afaict, vmware,
> kvm, xen, etc, all still do PV-IO and likely will for the foreseeable
> future.
>

They're all doing it very differently:

- pure emulation (qemu e1000, etc.)
- pci device (vmware, virtio/pci)
- paravirt bus bridged through a pci device (Xen hvm, Hyper-V (I think),
venet/vbus)
- paravirt bus (Xen pv, early vbus, virtio/lguest, virtio/s390)

The higher you are up this scale the easier things are, so once you get
reasonable performance there is no need to descend further.

--
error compiling committee.c: too many arguments to function

2009-12-27 09:34:22

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>> As a twist on this, the VMware paravirt driver interface is so
>> hardware-like that they're getting hardware vendors to supply cards that
>> implement it. Try that with a pure software approach.
>>
> Any hardware engineer (myself included) will tell you that, generally
> speaking, what you can do in hardware you can do in software (think of
> what QEMU does today, for instance). It's purely a cost/performance
> tradeoff.
>
> I can at least tell you that is true of vbus. Anything on the vbus side
> would be equally eligible for a hardware implementation, though there is
> no reason to do this today since we have equivalent functionality in
> baremetal already.

There's a huge difference in the probability of vmware getting cards to
their spec, or x86 vendors improving interrupt delivery to guests,
compared to vbus being implemented in hardware.

> The only motivation is if you wanted to preserve
> ABI etc, which is what vmware is presumably after. However, I am not
> advocating this as necessary at this juncture.
>

Maybe AlacrityVM users don't care about compatibility, but my users do.

--
error compiling committee.c: too many arguments to function

2009-12-27 13:18:45

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 4:15 AM, Avi Kivity wrote:
> On 12/23/2009 11:21 PM, Gregory Haskins wrote:
>> That said, you are still incorrect. With what I proposed, the model
>> will run as an in-kernel vbus device, and no longer run in userspace.
>> It would therefore improve virtio-net as I stated, much in the same
>> way vhost-net or venet-tap do today.
>>
>
> That can't work. virtio-net has its own ABI on top of virtio, for
> example it prepends a header for TSO information. Maybe if you disable
> all features it becomes compatible with venet, but that cripples it.
>


You are confused. The backend would be virtio-net specific, and would
therefore understand the virtio-net ABI. It would support any feature
of virtio-net as long as it was implemented and negotiated by both sides
of the link.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-27 13:28:24

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/2009 03:18 PM, Gregory Haskins wrote:
> On 12/27/09 4:15 AM, Avi Kivity wrote:
>
>> On 12/23/2009 11:21 PM, Gregory Haskins wrote:
>>
>>> That said, you are still incorrect. With what I proposed, the model
>>> will run as an in-kernel vbus device, and no longer run in userspace.
>>> It would therefore improve virtio-net as I stated, much in the same
>>> way vhost-net or venet-tap do today.
>>>
>>>
>> That can't work. virtio-net has its own ABI on top of virtio, for
>> example it prepends a header for TSO information. Maybe if you disable
>> all features it becomes compatible with venet, but that cripples it.
>>
>>
> You are confused. The backend would be virtio-net specific, and would
> therefore understand the virtio-net ABI. It would support any feature
> of virtio-net as long as it was implemented and negotiated by both sides
> of the link.
>

Then we're back to square one. A nice demonstration of vbus
flexibility, but no help for virtio.

--
error compiling committee.c: too many arguments to function

2009-12-27 13:34:53

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 4:33 AM, Avi Kivity wrote:
> On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>>> As a twist on this, the VMware paravirt driver interface is so
>>> hardware-like that they're getting hardware vendors to supply cards that
>>> implement it. Try that with a pure software approach.
>>>
>> Any hardware engineer (myself included) will tell you that, generally
>> speaking, what you can do in hardware you can do in software (think of
>> what QEMU does today, for instance). It's purely a cost/performance
>> tradeoff.
>>
>> I can at least tell you that is true of vbus. Anything on the vbus side
>> would be equally eligible for a hardware implementation, though there is
>> no reason to do this today since we have equivalent functionality in
>> baremetal already.
>
> There's a huge difference in the probability of vmware getting cards to
> their spec, or x86 vendors improving interrupt delivery to guests,
> compared to vbus being implemented in hardware.

That's not relevant, however. I said in the original quote that you
snipped that I made it a software design on purpose, and you tried to
somehow paint that as a negative because vmware made theirs
"hardware-like" and you implied it could not be done with my approach
with the statement "try that with a pure software approach". And the
bottom line is that the statement is incorrect and/or misleading.

>
>> The only motivation is if you wanted to preserve
>> ABI etc, which is what vmware is presumably after. However, I am not
>> advocating this as necessary at this juncture.
>>
>
> Maybe AlacrityVM users don't care about compatibility, but my users do.

Again, not relevant to this thread. Making your interface
"hardware-like" buys you nothing in the end, as you ultimately need to
load drivers in the guest either way, and any major OS lets you extend
both devices and buses with relative ease. The only counter example
would be if you truly were "hardware-exactly" like e1000 emulation, but
we already know that this means it is hardware centric and not
"exit-rate aware" and would perform poorly. Otherwise "compatible" is
purely a point on the time line (for instance, the moment virtio-pci ABI
shipped), not an architectural description such as "hardware-like".



>



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-27 13:39:35

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 8:27 AM, Avi Kivity wrote:
> On 12/27/2009 03:18 PM, Gregory Haskins wrote:
>> On 12/27/09 4:15 AM, Avi Kivity wrote:
>>
>>> On 12/23/2009 11:21 PM, Gregory Haskins wrote:
>>>
>>>> That said, you are still incorrect. With what I proposed, the model
>>>> will run as an in-kernel vbus device, and no longer run in userspace.
>>>> It would therefore improve virtio-net as I stated, much in the same
>>>> way vhost-net or venet-tap do today.
>>>>
>>>>
>>> That can't work. virtio-net has its own ABI on top of virtio, for
>>> example it prepends a header for TSO information. Maybe if you disable
>>> all features it becomes compatible with venet, but that cripples it.
>>>
>>>
>> You are confused. The backend would be virtio-net specific, and would
>> therefore understand the virtio-net ABI. It would support any feature
>> of virtio-net as long as it was implemented and negotiated by both sides
>> of the link.
>>
>
> Then we're back to square one. A nice demonstration of vbus
> flexibility, but no help for virtio.
>

No, where we are is at the point where we demonstrate that your original
statement that I did nothing to improve virtio was wrong.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-27 13:49:47

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/2009 03:34 PM, Gregory Haskins wrote:
> On 12/27/09 4:33 AM, Avi Kivity wrote:
>
>> On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>>
>>>> As a twist on this, the VMware paravirt driver interface is so
>>>> hardware-like that they're getting hardware vendors to supply cards that
>>>> implement it. Try that with a pure software approach.
>>>>
>>>>
>>> Any hardware engineer (myself included) will tell you that, generally
>>> speaking, what you can do in hardware you can do in software (think of
>>> what QEMU does today, for instance). It's purely a cost/performance
>>> tradeoff.
>>>
>>> I can at least tell you that is true of vbus. Anything on the vbus side
>>> would be equally eligible for a hardware implementation, though there is
>>> no reason to do this today since we have equivalent functionality in
>>> baremetal already.
>>>
>> There's a huge difference in the probability of vmware getting cards to
>> their spec, or x86 vendors improving interrupt delivery to guests,
>> compared to vbus being implemented in hardware.
>>
> That's not relevant, however. I said in the original quote that you
> snipped that I made it a software design on purpose, and you tried to
> somehow paint that as a negative because vmware made theirs
> "hardware-like" and you implied it could not be done with my approach
> with the statement "try that with a pure software approach". And the
> bottom line is that the statement is incorrect and/or misleading.
>

It's not incorrect. VMware stuck to the pci specs and as a result they
can have hardware implement their virtual NIC protocol. For vbus this
is much harder to do since you need a side-channel between different
cards to coordinate interrupt delivery. In theory you can do everything
if you don't consider practicalities.

That's a digression, though, I'm not suggesting we'll see virtio
hardware or that this is a virtio/pci advantage vs. vbus. It's an
anecdote showing that sticking with specs has its advantages.

wrt pci vs vbus, the difference is in the ability to use improvements in
interrupt delivery accelerations in virt hardware. If this happens,
virtio/pci can immediately take advantage of it, while vbus has to stick
with software delivery for backward compatibility, and all that code
becomes a useless support burden.

As an example of what hardware can do when it really sets its mind to
it, s390 can IPI from vcpu to vcpu without exiting to the host.

>>> The only motivation is if you wanted to preserve
>>> ABI etc, which is what vmware is presumably after. However, I am not
>>> advocating this as necessary at this juncture.
>>>
>>>
>> Maybe AlacrityVM users don't care about compatibility, but my users do.
>>
> Again, not relevant to this thread. Making your interface
> "hardware-like" buys you nothing in the end, as you ultimately need to
> load drivers in the guest either way, and any major OS lets you extend
> both devices and buses with relative ease. The only counter example
> would be if you truly were "hardware-exactly" like e1000 emulation, but
> we already know that this means it is hardware centric and not
> "exit-rate aware" and would perform poorly. Otherwise "compatible" is
> purely a point on the time line (for instance, the moment virtio-pci ABI
> shipped), not an architectural description such as "hardware-like".
>

True, not related to the thread. But it is a problem. The difference
between virtio and vbus here is that virtio is already deployed and its
users expect not to reinstall drivers [1]. Before virtio existed,
people could not deploy performance sensitive applications on kvm. Now
that it exists, we have to support it without requiring users to touch
their guests.

That means that without proof that virtio cannot be scaled, we'll keep
supporting and extending it.


[1] Another difference is the requirement for writing a "bus driver" for
every supported guest, which means dealing with icky bits like hotplug.

--
error compiling committee.c: too many arguments to function

2009-12-27 13:50:30

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/2009 03:39 PM, Gregory Haskins wrote:
> No, where we are is at the point where we demonstrate that your original
> statement that I did nothing to improve virtio was wrong.
>
>

I stand by it. virtio + your patch does nothing without a ton more work
(more or less equivalent to vhost-net).

--
error compiling committee.c: too many arguments to function

2009-12-27 14:29:55

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 8:49 AM, Avi Kivity wrote:
> On 12/27/2009 03:34 PM, Gregory Haskins wrote:
>> On 12/27/09 4:33 AM, Avi Kivity wrote:
>>
>>> On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>>>
>>>>> As a twist on this, the VMware paravirt driver interface is so
>>>>> hardware-like that they're getting hardware vendors to supply cards
>>>>> that
>>>>> implement it. Try that with a pure software approach.
>>>>>
>>>>>
>>>> Any hardware engineer (myself included) will tell you that, generally
>>>> speaking, what you can do in hardware you can do in software (think of
>>>> what QEMU does today, for instance). It's purely a cost/performance
>>>> tradeoff.
>>>>
>>>> I can at least tell you that is true of vbus. Anything on the vbus
>>>> side
>>>> would be equally eligible for a hardware implementation, though
>>>> there is
>>>> no reason to do this today since we have equivalent functionality in
>>>> baremetal already.
>>>>
>>> There's a huge difference in the probability of vmware getting cards to
>>> their spec, or x86 vendors improving interrupt delivery to guests,
>>> compared to vbus being implemented in hardware.
>>>
>> Thats not relevant, however. I said in the original quote that you
>> snipped that I made it a software design on purpose, and you tried to
>> somehow paint that as a negative because vmware made theirs
>> "hardware-like" and you implied it could not be done with my approach
>> with the statement "try that with a pure software approach". And the
>> bottom line is that the statement is incorrect and/or misleading.
>>
>
> It's not incorrect.

At the very best it's misleading.

> VMware stuck to the PCI specs and as a result they
> can have hardware implement their virtual NIC protocol. For vbus this
> is much harder

Not really.

> to do since you need a side-channel between different
> cards to coordinate interrupt delivery. In theory you can do everything
> if you don't consider practicalities.

PCI-based designs, such as VMware's and virtio-pci, aren't free of this
notion either. They simply rely on APIC emulation for the irq-chip, and
it just so happens that vbus implements a different irq-chip (more
specifically, the connector that we employ between the guest and vbus
does). On the one hand, you have the advantage of the guest already
supporting the irq-chip ABI; on the other, you have an optimized
(e.g. shared-memory-based inject/ack) and feature-enhanced ABI
(interrupt priority, no IDT constraints, etc). There are pros and cons
to either direction, but the vbus project charter is to go for maximum
performance and features, so that is acceptable to us.
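
To make the shared-memory inject/ack point concrete, here is a minimal
sketch of the idea (illustrative only -- the names and layout are
invented and are not the actual shm-signal/vbus ABI):

#include <stdint.h>

/*
 * Illustrative sketch only: "inject" and "ack" are atomic flag flips in
 * memory shared by host and guest, so an exit (the "kick") is needed
 * only on an idle -> pending transition.
 */
struct shm_irq_desc {
        uint32_t pending;   /* host sets this when an event is injected */
        uint32_t enabled;   /* guest gates delivery (mask/unmask)       */
        uint32_t priority;  /* guest services highest priority first    */
};

/* Host side: mark the event pending; kick only on a 0 -> 1 edge. */
static void shm_irq_inject(struct shm_irq_desc *d, void (*kick)(void))
{
        uint32_t was = __atomic_exchange_n(&d->pending, 1, __ATOMIC_ACQ_REL);

        if (!was && __atomic_load_n(&d->enabled, __ATOMIC_ACQUIRE))
                kick();     /* e.g. a single doorbell/hypercall */
}

/*
 * Guest side: clear the flag before servicing so an injection that
 * races with processing is not lost.  Returns nonzero if work was
 * pending.
 */
static int shm_irq_ack(struct shm_irq_desc *d)
{
        return __atomic_exchange_n(&d->pending, 0, __ATOMIC_ACQ_REL);
}

The ack path touches only shared memory, which is the kind of
optimization being referred to above; the priority field stands in for
the "interrupt priority" feature mentioned in the parenthetical.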


>
> That's a digression, though, I'm not suggesting we'll see virtio
> hardware or that this is a virtio/pci advantage vs. vbus. It's an
> anecdote showing that sticking with specs has its advantages.

It also has distinct disadvantages. For instance, the PCI spec is
gigantic, yet almost none of it is needed to do the job here. When you
are talking full-virt, you are left with no choice. With para-virt, you
do have a choice, and the vbus-connector for AlacrityVM capitalizes on this.

As an example, think about all the work that went into emulating the PCI
chipset, the APIC chipset, MSI-X support, irq-routing, etc, when all you
needed was a simple event-queue to indicate that an event (e.g. an
"interrupt") occurred.

This is an example connector in vbus:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/connectors/null.c;h=b6d16cb68b7e49e07528278bc9f5b73e1dac0c2f;hb=HEAD

It encapsulates all of hotplug, signal (interrupt) routing, and memory
routing for both sides of the "link" in 584 lines of code. It also
implicitly brings in device discovery and configuration, since those
are covered by the vbus framework. Try doing that with PCI, especially
when you are not already under the qemu umbrella, and the
"standards-based" approach suddenly doesn't look very attractive.

>
> wrt pci vs vbus, the difference is in the ability to use improvements in
> interrupt delivery accelerations in virt hardware.

Most of which will apply to the current vbus design as well, since at
some point I have to have an underlying IDT mechanism too, btw.

> If this happens,
> virtio/pci can immediately take advantage of it, while vbus has to stick
> with software delivery for backward compatibility, and all that code
> becomes a useless support burden.
>

The shared-memory path will always be the fastest anyway, so I am not
too worried about it. But vbus supports feature negotiation, so we can
always phase that out if need be, same as anything else.
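
For context, the kind of feature negotiation meant here is typically
just a bitmask handshake. A generic sketch (the bit names are invented;
this is not the vbus wire format):

#include <stdint.h>

#define FEAT_SW_IRQ_INJECT  (1ULL << 0)  /* shared-memory injection path  */
#define FEAT_HW_IRQ_ACCEL   (1ULL << 1)  /* hypothetical hw-assisted path */

/* Both sides act only on the intersection of what the host offers and
 * what the guest understands. */
static uint64_t negotiate(uint64_t host_offers, uint64_t guest_wants)
{
        return host_offers & guest_wants;
}

Phasing out the software path then just means a newer host stops
advertising FEAT_SW_IRQ_INJECT, while guests that still need it keep
working against hosts that still offer it.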

> As an example of what hardware can do when it really sets its mind to
> it, s390 can IPI from vcpu to vcpu without exiting to the host.

Great! I am just not in the habit of waiting for hardware to cover for
sloppy software. Doing so is impractical for a number of reasons, not
least that the hardware, even once available, will not be ubiquitous
instantly.

>
>>>> The only motivation is if you wanted to preserve
>>>> ABI etc, which is what vmware is presumably after. However, I am not
>>>> advocating this as necessary at this juncture.
>>>>
>>>>
>>> Maybe AlacrityVM users don't care about compatibility, but my users do.
>>>
>> Again, not relevant to this thread. Making your interface
>> "hardware-like" buys you nothing in the end, as you ultimately need to
>> load drivers in the guest either way, and any major OS lets you extend
>> both devices and buses with relative ease. The only counter example
>> would be if you truly were "hardware-exactly" like e1000 emulation, but
>> we already know that this means it is hardware centric and not
>> "exit-rate aware" and would perform poorly. Otherwise "compatible" is
>> purely a point on the time line (for instance, the moment virtio-pci ABI
>> shipped), not an architectural description such as "hardware-like".
>>
>
> True, not related to the thread. But it is a problem.

Agreed. It is a distinct disadvantage to switching. Note that I am not
advocating that we need to switch. virtio-pci can coexist peacefully
from my perspective, and AlacrityVM does exactly this.

-Greg



2009-12-28 01:01:28

by Gregory Haskins

Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 8:49 AM, Avi Kivity wrote:
> On 12/27/2009 03:39 PM, Gregory Haskins wrote:
>> No, where we are is at the point where we demonstrate that your original
>> statement that I did nothing to improve virtio was wrong.
>>
>>
>
> I stand by it. virtio + your patch does nothing without a ton more work
> (more or less equivalent to vhost-net).
>

Perhaps, but my work predates vhost-net by months and that has nothing
to do with what we are talking about anyway. Since you snipped your
original comment that started the thread, here it is again:

On 12/23/09 5:22 AM, Avi Kivity wrote:
> >
> > There was no attempt by Gregory to improve virtio-net.

It's not a gray area, nor open to interpretation. That statement was,
is, and will always be demonstrably false, so I'm sorry but you are
still wrong.

-Greg

