2009-12-07 18:53:30

by Gregory Haskins

Subject: [GIT PULL] AlacrityVM guest drivers for 2.6.33

Hi Linus,

Please pull AlacrityVM guest support for 2.6.33 from:

git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
for-linus

All of these patches have stewed in linux-next for quite a while now:

Gregory Haskins (26):
shm-signal: shared-memory signals
ioq: Add basic definitions for a shared-memory, lockless queue
vbus: add a "vbus-proxy" bus model for vbus_driver objects
vbus-proxy: add a pci-to-vbus bridge
ioq: add driver-side vbus helpers
net: add vbus_enet driver
venet: Update maintainer
venet: fix gso.hdr_len to report correct length
venet: add pre-mapped tx descriptor feature
venet: report actual used descriptor size
venet: cache the ringlen values at init
venet: add eventq protocol
venet: use an skblist for outstanding descriptors
venet: add a tx-complete event for out-of-order support
venet: add Layer-4 Reassembler Offload (L4RO) support
vbus: allow shmsignals to be named
vbus: register shm-signal events as standard Linux IRQ vectors
net: fix vbus-enet Kconfig dependencies
venet: fix locking issue with dev_kfree_skb()
vbus: fix kmalloc() from interrupt context to use GFP_ATOMIC
fix irq resource leak
vbus: remove create_irq() references from the pcibridge
vbus: make library code properly declared as GPL
venet: add missing ethtool include
vbus: add autoprobe capability to guest
vbus: fix pcibridge busmaster support

Jaswinder Singh Rajput (1):
ioq: includecheck fix

Patrick Mullaney (1):
vbus-enet: fix l4ro pool non-atomic allocations in softirq context

Rakib Mullick (1):
vbus: Fix section mismatch warnings in pci-bridge.c

Randy Dunlap (2):
vbus-proxy also uses ioq, so it should select IOQ.
Eliminate all cast warnings in vbus-enet.c and pci-bridge.c.

Thadeu Lima de Souza Cascardo (1):
trivial: fix a typo in SHM_SIGNAL config description

MAINTAINERS | 25 +
arch/x86/Kconfig | 2 +
drivers/Makefile | 1 +
drivers/net/Kconfig | 14 +
drivers/net/Makefile | 1 +
drivers/net/vbus-enet.c | 1560 +++++++++++++++++++++++++++++++++++++++++++
drivers/vbus/Kconfig | 25 +
drivers/vbus/Makefile | 6 +
drivers/vbus/bus-proxy.c | 247 +++++++
drivers/vbus/pci-bridge.c | 1015 ++++++++++++++++++++++++++++
include/linux/Kbuild | 4 +
include/linux/ioq.h | 414 ++++++++++++
include/linux/shm_signal.h | 189 ++++++
include/linux/vbus_driver.h | 83 +++
include/linux/vbus_pci.h | 145 ++++
include/linux/venet.h | 133 ++++
lib/Kconfig | 21 +
lib/Makefile | 2 +
lib/ioq.c | 300 +++++++++
lib/shm_signal.c | 196 ++++++
20 files changed, 4383 insertions(+), 0 deletions(-)
create mode 100644 drivers/net/vbus-enet.c
create mode 100644 drivers/vbus/Kconfig
create mode 100644 drivers/vbus/Makefile
create mode 100644 drivers/vbus/bus-proxy.c
create mode 100644 drivers/vbus/pci-bridge.c
create mode 100644 include/linux/ioq.h
create mode 100644 include/linux/shm_signal.h
create mode 100644 include/linux/vbus_driver.h
create mode 100644 include/linux/vbus_pci.h
create mode 100644 include/linux/venet.h
create mode 100644 lib/ioq.c
create mode 100644 lib/shm_signal.c



2009-12-18 21:51:29

by Ingo Molnar

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Gregory Haskins <[email protected]> wrote:

> Hi Linus,
>
> Please pull AlacrityVM guest support for 2.6.33 from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
> for-linus
>
> All of these patches have stewed in linux-next for quite a while now:
>
> Gregory Haskins (26):

I think it would be fair to point out that these patches have been objected
to by the KVM folks quite extensively, on multiple technical grounds - as
basically this tree forks the KVM driver space, and no valid technical
reason for doing so could be offered by you in a discussion spanning more
than 100 mails.

(And yes, I've been Cc:-ed on much of that thread.)

The result will IMO be pain for users because now we'll have two frameworks,
tooling incompatibilities, etc. etc.

I've extended the Cc: for the KVM folks to have a chance to reply. Please try
_much_ harder to work with the KVM folks instead of ignoring their feedback
and de-facto forking their project (and not mentioning any of this in your
pull request). We should unify, not fracture.

Ingo

2009-12-21 15:34:35

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/18/09 4:51 PM, Ingo Molnar wrote:
>
> * Gregory Haskins <[email protected]> wrote:
>
>> Hi Linus,
>>
>> Please pull AlacrityVM guest support for 2.6.33 from:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
>> for-linus
>>
>> All of these patches have stewed in linux-next for quite a while now:
>>
>> Gregory Haskins (26):
>
> I think it would be fair to point out that these patches have been objected to
> by the KVM folks quite extensively,

Actually, these patches have nothing to do with the KVM folks. You are
perhaps confusing this with the hypervisor-side discussion, about which
there is indeed much disagreement.

To that point, it's certainly fair to point out the controversy on the
host side. It ultimately is what forced the creation of the AlacrityVM
project, after all. However, it should also be pointed out that this
pull request is not KVM specific, nor even KVM related per se. These
patches can (and in fact, do) work in other environments that do not use
KVM or even AlacrityVM at all.

VBUS, the underlying technology here, is a framework for creating
optimized software-based device models with a Linux kernel as the host,
together with the corresponding "driver" resources that connect a guest
to that backend. AlacrityVM is the application of these technologies
using KVM/Linux/Qemu as a base, but that is an implementation detail.

For more details, please see the project wiki

http://developer.novell.com/wiki/index.php/AlacrityVM

This pull request is for drivers to support running a Linux kernel as a
guest in this environment, so it actually doesn't affect KVM in any way.
They are just standard Linux drivers and in fact can load as
stand-alone KMPs in any modern vanilla distro. I haven't even pushed
the host-side code to linux-next yet, specifically because of the
controversy you mention.


> on multiple technical grounds - as
> basically this tree forks the KVM driver space for which no valid technical
> reason could be offered by you in a 100+ mails long discussion.

You will have to be more specific about these technical grounds you
mention, because I believe I satisfactorily rebutted any issues raised.
To say that there is no technical reason is, at best, a matter of
opinion. I have in fact listed numerous reasons, on technical, feature,
and architectural grounds, for what differentiates my approach, and
provided numbers which highlight its merits. Given that they are all
recorded in the archives of said 100+ email thread as well as numerous
others, I won't rehash the entire list here. Instead, I will post a
summary of the problem space from the performance perspective, since
that seems to be of most interest at the moment.

From my research, the reason why virt in general, and KVM in particular,
suffers on the IO performance front is as follows: IOs
(traps+interrupts) are more expensive than on bare metal, and real
hardware is naturally concurrent (your HBAs and NICs are effectively
parallel execution engines, etc).

Assuming my observations are correct, in order to squeeze maximum
performance from a given guest, you need to do three things: A)
eliminate as many IOs as you possibly can, B) reduce the cost of the
ones you can't avoid, and C) run your algorithms in parallel to emulate
concurrent silicon.

So on that front, we move the device models to the kernel (where they
are closest to the physical IO devices) and use "cheap" instructions
like PIOs/hypercalls for (B), and exploit spare host-side SMP resources
via kthreads for (C). For (A), part of the problem is that virtio-pci
is not designed optimally to address the problem space, and part of it
is a limitation of the PCI transport underneath it.
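
To make (C) a bit more concrete, the host-side pattern boils down to
something like the sketch below. This is illustrative only, not the
actual vbus/venet code; the tx_ring type and the ring_has_work(),
ring_pop() and xmit_one() helpers are made-up stand-ins:

/* Sketch only: a host-side kthread drains the guest's tx ring on a
 * spare core, in parallel with the vcpu.  All names are hypothetical. */
#include <linux/kthread.h>
#include <linux/wait.h>
#include <linux/types.h>

struct tx_ring;                                     /* hypothetical ring   */
bool ring_has_work(struct tx_ring *ring);           /* hypothetical helper */
void *ring_pop(struct tx_ring *ring);               /* hypothetical helper */
void xmit_one(void *desc);                          /* hypothetical helper */

static DECLARE_WAIT_QUEUE_HEAD(tx_waitq);

static int venet_tx_thread(void *arg)
{
	struct tx_ring *ring = arg;

	while (!kthread_should_stop()) {
		wait_event_interruptible(tx_waitq,
					 ring_has_work(ring) ||
					 kthread_should_stop());

		/* Drain everything the guest queued since the last
		 * doorbell: one "kick" exit can cover many packets. */
		while (ring_has_work(ring))
			xmit_one(ring_pop(ring));
	}
	return 0;
}

/* The doorbell (PIO/hypercall) handler only has to kick the worker: */
static void venet_doorbell(void)
{
	wake_up_interruptible(&tx_waitq);
}

/* At device setup: kthread_run(venet_tx_thread, ring, "venet-tx"); */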

For example, PCI is somewhat of a unique bus design in that it wants to
map signals to interrupts 1:1. This works fine for real hardware, where
interrupts are relatively cheap, but is quite suboptimal on virt, where
the window-exits, injection-exits, and MMIO-based EOIs hurt
substantially (multiple microseconds each).

One core observation is that we don't technically need 1:1 interrupts to
signals in order to function properly. Ideally we will only bother the
CPU when work of a higher priority becomes ready. So the AlacrityVM
connector to vbus uses a model where we deploy a lockless shared-memory
queue to inject interrupts. This means that temporally close interrupts
(of both the intra- and inter-device variety) of similar priority can
queue without incurring any extra IO. That means fewer exits, fewer
EOIs, etc.
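
As a rough illustration of that queue (this is not the actual
shm-signal/ioq implementation, just a simplified single-producer/
single-consumer sketch with invented names):

/* Simplified sketch, not the real ABI: a lockless SPSC ring of pending
 * signal ids.  The host posts; the guest drains.  Only the empty ->
 * non-empty transition needs a physical interrupt injection, so
 * coincident signals share a single exit/EOI. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u                   /* power of two */

struct signal_ring {
	_Atomic uint32_t head;           /* producer (host) index  */
	_Atomic uint32_t tail;           /* consumer (guest) index */
	uint32_t slot[RING_SIZE];
};

/* Host side: queue a signal; returns true if an injection is needed. */
static bool ring_post(struct signal_ring *r, uint32_t signal_id)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE) {
		/* Full: the real protocol falls back to a slower
		 * synchronous path here; omitted in this sketch. */
		return true;
	}

	r->slot[head & (RING_SIZE - 1)] = signal_id;
	atomic_store_explicit(&r->head, head + 1, memory_order_release);

	return head == tail;             /* was empty: raise an interrupt */
}

/* Guest side: one injection drains everything queued behind it. */
static void ring_drain(struct signal_ring *r, void (*handle)(uint32_t))
{
	uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);

	while (tail != atomic_load_explicit(&r->head, memory_order_acquire)) {
		handle(r->slot[tail & (RING_SIZE - 1)]);
		tail++;
		atomic_store_explicit(&r->tail, tail, memory_order_release);
	}
}

(The real protocol also carries priority information and an idle/busy
handshake, which this sketch elides.)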

The end result is that I can demonstrate that even with a single stream
to a single device, I can reduce the exit rate by over 45% and the
interrupt rate by more than 50% when compared to the equivalent
virtio-pci ABI. This scales even higher when you add additional devices
to the mix. The bottom line is that we use significantly less CPU while
producing the highest throughput and lowest latency. In fact, to my
knowledge vbus+venet is still the highest performing 802.x device for
KVM, even when turning off its advanced features like zero-copy.

The parties involved have demonstrated a closed-mindedness to the
concepts I've introduced, which is ultimately why today we have two
projects. I would much prefer that we didn't, but that is not in my
control. Note that the KVM folks eventually came around regarding the
in-kernel and concurrent execution concepts, which is a good first step.
I have yet to convince them about the perils of relying on PCI, which I
believe is an architectural mistake. I suspect at this point it will
take community demand and independent reports from users of the
technology to convince them further. The goal of the AlacrityVM project
is to make it easy for interested users to do so.

Don't get me wrong. PCI is a critical feature for full-virt guests.
But IMO it has limited applicability once we start talking about PV, and
AlacrityVM aims to correct that.

>
> (And yes, i've been Cc:-ed to much of that thread.)
>
> The result will IMO be pain for users because now we'll have two frameworks,
> tooling incompatibilities, etc. etc.

Precedent defies your claim: that situation already exists today, and it
has nothing to do with my work. Even if you scoped the discussion
specifically to KVM, users can already select various incompatible IO
methods ([realtek, e1000, virtio-net], [ide, lsi-scsi, virtio-blk],
[std-vga, cirrus-vga], etc), so this claim about user pain seems dubious
at best. I suspect that if a new choice is available that offers
feature/performance improvements, users are best served by having that
choice to make themselves, instead of having that choice simply
unavailable.

The reason why we are here having this particular conversation as it
pertains to KVM is that I do not believe you can achieve the
performance/feature goals that I have set for the project in a
backwards-compatible way (i.e. virtio-pci compatible). At least, not in
a way that is not a complete disaster code-base-wise. So while I agree
that a new incompatible framework vs a backwards-compatible one is
suboptimal, I believe it's necessary in order to ultimately fix the
problems in the most ideal way. Therefore, I would rather take this
lump now than 5 years from now.

The KVM maintainers apparently do not agree on that fundamental point,
so we are deadlocked.

So far, the only legitimate objection I have seen to these guest side
drivers is Linus', and I see his point. I won't make a pull request
again until I feel enough community demand has been voiced to warrant a
reconsideration.

Kind Regards,
-Greg



2009-12-21 15:43:41

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 05:34 PM, Gregory Haskins wrote:
>
>> I think it would be fair to point out that these patches have been objected to
>> by the KVM folks quite extensively,
>>
> Actually, these patches have nothing to do with the KVM folks. You are
> perhaps confusing this with the hypervisor-side discussion, of which
> there is indeed much disagreement.
>

This is true, though these drivers are fairly pointless for
virtualization without the host side support.

I did have a few issues with the guest drivers:
- the duplication of effort wrt virtio. These drivers don't cover
exactly the same problem space, but nearly so.
- no effort at scalability - all interrupts are taken on one cpu
- the patches introduce a new virtual interrupt controller for dubious
(IMO) benefits

> From my research, the reason why virt in general, and KVM in particular
> suffers on the IO performance front is as follows: IOs
> (traps+interrupts) are more expensive than bare-metal, and real hardware
> is naturally concurrent (your hbas and nics are effectively parallel
> execution engines, etc).
>
> Assuming my observations are correct, in order to squeeze maximum
> performance from a given guest, you need to do three things: A)
> eliminate as many IOs as you possibly can, B) reduce the cost of the
> ones you can't avoid, and C) run your algorithms in parallel to emulate
> concurrent silicon.
>

All these are addressed by vhost-net without introducing new drivers.

--
error compiling committee.c: too many arguments to function

2009-12-21 16:04:28

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 10:43 AM, Avi Kivity wrote:
> On 12/21/2009 05:34 PM, Gregory Haskins wrote:
>>
>>> I think it would be fair to point out that these patches have been
>>> objected to
>>> by the KVM folks quite extensively,
>>>
>> Actually, these patches have nothing to do with the KVM folks. You are
>> perhaps confusing this with the hypervisor-side discussion, of which
>> there is indeed much disagreement.
>>
>
> This is true, though these drivers are fairly pointless for
> virtualization without the host side support.

The host side support is available in various forms (git tree, rpm, etc)
from our project page. I would encourage any interested parties to
check it out:

Here is the git tree

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=summary

Here are some RPMs:

http://download.opensuse.org/repositories/devel://LLDC://alacrity/openSUSE_11.1/

And the main project site:

http://developer.novell.com/wiki/index.php/AlacrityVM

>
> I did have a few issues with the guest drivers:
> - the duplication of effort wrt virtio. These drivers don't cover
> exactly the same problem space, but nearly so.

Virtio itself is more or less compatible with this effort, as we have
discussed (see my virtio-vbus transport, for instance). I have issues
with some of the design decisions in the virtio device and ring models,
but they are minor in comparison to the beef I have with the virtio-pci
transport as a whole.

> - no effort at scalability - all interrupts are taken on one cpu

Addressed by the virtual-interrupt controller. This will enable us to
route shm-signal messages to a core, under guidance from the standard
irq-balance facilities.

> - the patches introduce a new virtual interrupt controller for dubious
> (IMO) benefits

See above. It's not fully plumbed yet, which is perhaps the reason for
the confusion as to its merits. Eventually I will trap the affinity
calls and pass them to the host, too. Today, it at least lets us see
the shm-signal statistics under /proc/interrupts, which is nice and is
consistent with other IO mechanisms.


>
>> From my research, the reason why virt in general, and KVM in particular
>> suffers on the IO performance front is as follows: IOs
>> (traps+interrupts) are more expensive than bare-metal, and real hardware
>> is naturally concurrent (your hbas and nics are effectively parallel
>> execution engines, etc).
>>
>> Assuming my observations are correct, in order to squeeze maximum
>> performance from a given guest, you need to do three things: A)
>> eliminate as many IOs as you possibly can, B) reduce the cost of the
>> ones you can't avoid, and C) run your algorithms in parallel to emulate
>> concurrent silicon.
>>
>
> All these are addressed by vhost-net without introducing new drivers.

No, B and C definitely are, but A is lacking. And the performance
suffers as a result in my testing (vhost-net still throws a ton of exits
as it's limited by virtio-pci, and only adds about 1Gb/s over userspace
virtio, far behind venet even with things like zero-copy turned off).

I will also point out that these performance aspects are only a subset
of the discussion, since we are also addressing things like
qos/priority, alternate fabric types, etc. I do not expect you to
understand and agree with where I am going per se. We can have that
discussion when I once again ask you for merge consideration. But if
you say "they are the same", I will call you on it, because they are
demonstrably unique capability sets.

Kind Regards,
-Greg






2009-12-21 16:37:16

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 10:04 AM, Gregory Haskins wrote:
> No, B and C definitely are, but A is lacking. And the performance
> suffers as a result in my testing (vhost-net still throws a ton of exits
> as its limited by virtio-pci and only adds about 1Gb/s to virtio-u, far
> behind venet even with things like zero-copy turned off).
>

How does virtio-pci limit vhost-net? The only time exits should occur
is when the guest notifies the host that something has been placed on
the ring. Since vhost-net has no tx mitigation scheme right now, the
result may be that it's taking an io exit on every single packet, but
this is orthogonal to virtio-pci.

Since virtio-pci supports MSI-X, there should be no IO exits on
host->guest notification other than the EOI in the virtual APIC. This
is a lightweight exit today and will likely disappear entirely with
newer hardware.

Regards,

Anthony Liguori

2009-12-21 16:41:13

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 06:37 PM, Anthony Liguori wrote:
> Since virtio-pci supports MSI-X, there should be no IO exits on
> host->guest notification other than EOI in the virtual APIC. This is
> a light weight exit today and will likely disappear entirely with
> newer hardware.

I'm working on disappearing EOI exits on older hardware as well. Same
idea as the old TPR patching, without most of the magic.

--
error compiling committee.c: too many arguments to function

2009-12-21 16:46:21

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 11:37 AM, Anthony Liguori wrote:
> On 12/21/2009 10:04 AM, Gregory Haskins wrote:
>> No, B and C definitely are, but A is lacking. And the performance
>> suffers as a result in my testing (vhost-net still throws a ton of exits
>> as its limited by virtio-pci and only adds about 1Gb/s to virtio-u, far
>> behind venet even with things like zero-copy turned off).
>>
>
> How does virtio-pci limit vhost-net? The only time exits should occur
> are when the guest notifies the host that something has been placed on
> the ring. Since vhost-net has no tx mitigation scheme right now, the
> result may be that it's taking an io exit on every single packet but
> this is orthogonal to virtio-pci.
>
> Since virtio-pci supports MSI-X, there should be no IO exits on
> host->guest notification other than EOI in the virtual APIC.

The very best you can hope to achieve is 1:1 EOI per signal (though
today virtio-pci is even worse than that). As I indicated above, I can
eliminate more than 50% of even the EOIs in trivial examples, and even
more as we scale up the number of devices or the IO load (or both).

> This is a
> light weight exit today and will likely disappear entirely with newer
> hardware.

By that argument, this is all moot. New hardware will likely obsolete
the need for venet or virtio-net anyway. The goal of my work is to
provide an easy-to-use framework for maximizing the IO transport _in
lieu_ of hardware acceleration. Software will always be leading here,
so we don't want to get into a pattern of waiting for new hardware to
cover for poor software engineering. It's simply not necessary in most
cases. A little smart software design, and a framework that allows it
to be easily exploited/reused, is the best step forward, IMO.

Kind Regards,
-Greg






2009-12-21 16:57:11

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 11:40 AM, Avi Kivity wrote:
> On 12/21/2009 06:37 PM, Anthony Liguori wrote:
>> Since virtio-pci supports MSI-X, there should be no IO exits on
>> host->guest notification other than EOI in the virtual APIC. This is
>> a light weight exit today and will likely disappear entirely with
>> newer hardware.
>
> I'm working on disappearing EOI exits on older hardware as well. Same
> idea as the old TPR patching, without most of the magic.
>

While I applaud any engineering effort that results in more optimal
execution, if you are talking about what we have discussed in the past,
it's not quite in the same league as my proposal.

You are talking about the ability to optimize the final EOI if there are
no pending interrupts remaining, right? The problem with this approach
is that it addresses the wrong side of the curve: that is, it optimizes
the code as it's about to go io-idle. You still have to take an extra
exit for each injection during the heat of battle, which is when you
actually need it most.

To that front, what I have done is re-use the lockless shared-memory
concept for even "interrupt injection". Lockless shared-memory rings
have the property that both producer and consumer can simultaneously
manipulate the ring. So what we do in AlacrityVM is deliver shm-signals
(shm-signal == "interrupt" in vbus) over a ring, so that the host can
inject a signal to a running vcpu and the vcpu can complete an
ack/re-inject cycle directly from vcpu context. Therefore, we only need
a physical IDT injection when the vcpu transitions from io-idle to
io-busy, and we remain completely in parallel guest/host context until
we go idle again.
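
In rough code, the handshake looks something like this (invented names,
not the actual vbus connector ABI; plain C11 atomics, i.e. sequentially
consistent, which is what keeps the idle/busy race closed):

#include <stdatomic.h>
#include <stdbool.h>

struct shm_irq_state {
	atomic_uint busy;        /* guest is (or will be) draining     */
	atomic_uint pending;     /* signals posted but not yet handled */
};

/* Host side: returns true only when a physical injection is required,
 * i.e. on the io-idle -> io-busy transition. */
static bool host_signal(struct shm_irq_state *s)
{
	atomic_fetch_add(&s->pending, 1);
	return atomic_exchange(&s->busy, 1) == 0;
}

/* Guest side: entered once per physical injection; acks and re-checks
 * entirely from vcpu context until the device is truly idle again. */
static void guest_isr(struct shm_irq_state *s, void (*process)(unsigned))
{
	do {
		unsigned n;

		while ((n = atomic_exchange(&s->pending, 0)) != 0)
			process(n);          /* drain without exiting   */

		atomic_store(&s->busy, 0);   /* declare ourselves idle  */

		/* Re-check to close the race with a host-side post
		 * that observed busy == 1 and skipped the injection. */
	} while (atomic_load(&s->pending) != 0 &&
		 atomic_exchange(&s->busy, 1) == 0);
}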

That said, your suggestion would play nicely with the above mentioned
scheme, so I look forward to seeing it in the tree. Feel free to send
me patches for testing.

Kind Regards,
-Greg



2009-12-21 17:05:47

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 06:56 PM, Gregory Haskins wrote:
>> I'm working on disappearing EOI exits on older hardware as well. Same
>> idea as the old TPR patching, without most of the magic.
>>
>>
> While I applaud any engineering effort that results in more optimal
> execution, if you are talking about what we have discussed in the past
> its not quite in the same league as my proposal.
>

I don't doubt this for a minute.

> You are talking about the ability to optimize the final EOI if there are
> no pending interrupts remaining, right? The problem with this approach
> is it addresses the wrong side of the curve: That is, it optimizes the
> code as its about to go io-idle. You still have to take an extra exit
> for each injection during the heat of battle, which is when you actually
> need it most.
>

No, it's completely orthogonal. An interrupt is injected, the handler
disables further interrupts and EOIs, then schedules the rest of the
handling code. So long as there are packets in the ring, interrupts
won't be enabled, and hence there won't be any reinjections.
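
In rough pseudo-C (illustrative names only; these helpers are stand-ins
for the real NAPI calls, not the actual virtio-net driver):

#include <linux/interrupt.h>

struct mydev;                                        /* hypothetical device */
void disable_device_rx_interrupts(struct mydev *d);  /* hypothetical        */
void enable_device_rx_interrupts(struct mydev *d);   /* hypothetical        */
void schedule_poll(struct mydev *d);                 /* cf. napi_schedule() */
void complete_poll(struct mydev *d);                 /* cf. napi_complete() */
int  process_rx_ring(struct mydev *d, int budget);   /* hypothetical        */

static irqreturn_t rx_interrupt(int irq, void *dev_id)
{
	struct mydev *dev = dev_id;

	disable_device_rx_interrupts(dev);  /* no reinjection while polling */
	schedule_poll(dev);                 /* defer the real work          */
	return IRQ_HANDLED;                 /* one EOI covers this burst    */
}

static int rx_poll(struct mydev *dev, int budget)
{
	int done = process_rx_ring(dev, budget);

	if (done < budget) {
		/* Ring drained: only now re-enable interrupts, so the
		 * packets that arrived meanwhile never caused another
		 * injection or EOI. */
		complete_poll(dev);
		enable_device_rx_interrupts(dev);
	}
	return done;
}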

Different interrupt sources still need different interrupts, but as all
of your tests have been single-interface, this can't be the reason for
your performance.

--
error compiling committee.c: too many arguments to function

2009-12-21 17:21:07

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 10:46 AM, Gregory Haskins wrote:
> The very best you can hope to achieve is 1:1 EOI per signal (though
> today virtio-pci is even worse than that). As I indicated above, I can
> eliminate more than 50% of even the EOIs in trivial examples, and even
> more as we scale up the number of devices or the IO load (or both).
>

If optimizing EOI is the main technical advantage of vbus, then surely
we could paravirtualize EOI access and get that benefit in KVM without
introducing a whole new infrastructure, no?

>> This is a
>> light weight exit today and will likely disappear entirely with newer
>> hardware.
>>
> By that argument, this is all moot. New hardware will likely obsolete
> the need for venet or virtio-net anyway.

Not at all. But let's focus on concrete data. For a given workload,
how many exits do you see due to EOI? They should be relatively rare
because obtaining good receive batching is pretty easy. Considering
these are lightweight exits (on the order of 1-2us), you need an awfully
large number of interrupts before you get a really significant
performance impact. You would think NAPI would kick in at this point
anyway.

Do you have data demonstrating the advantage of EOI mitigation?

Regards,

Anthony Liguori

2009-12-21 17:24:43

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 12:05 PM, Avi Kivity wrote:
> On 12/21/2009 06:56 PM, Gregory Haskins wrote:
>>> I'm working on disappearing EOI exits on older hardware as well. Same
>>> idea as the old TPR patching, without most of the magic.
>>>
>>>
>> While I applaud any engineering effort that results in more optimal
>> execution, if you are talking about what we have discussed in the past
>> its not quite in the same league as my proposal.
>>
>
> I don't doubt this for a minute.
>
>> You are talking about the ability to optimize the final EOI if there are
>> no pending interrupts remaining, right? The problem with this approach
>> is it addresses the wrong side of the curve: That is, it optimizes the
>> code as its about to go io-idle. You still have to take an extra exit
>> for each injection during the heat of battle, which is when you actually
>> need it most.
>>
>
> No, it's completely orthogonal. An interrupt is injected, the handler
> disables further interrupts and EOIs, then schedules the rest of the
> handling code. So long as there as packets in the ring interrupts won't
> be enabled and hence there won't be any reinjections.

I meant inter-vector "next-interrupt" injects. For lack of a better
term, I called it reinject, but I realize in retrospect that this is
ambiguous.

>
> Different interrupt sources still need different interrupts, but as all
> of your tests have been single-interface, this can't be the reason for
> your performance.
>

Actually I have tested both single and multi-homed setups, but it
doesn't matter. Even a single device can benefit, since even single
devices may have multiple vector sources that are highly likely to
generate coincident events. For instance, consider that even a basic
ethernet device may have separate vectors for "rx" and "tx-complete".
A simple ping is likely to generate both vectors at approximately the
same time, given how the host-side resources often work.

Trying to condense multiple vectors into one means it's up to the driver
to implement any kind of prioritization on its own (or worse, it just
suffers from priority inversion). Likewise, implementing them as unique
vectors means you are likely to have coincident events for certain
workloads.

What AlacrityVM tries to do is recognize these points and optimize for
both cases. It means we still retain framework-managed prioritized
callbacks, yet optimize away extraneous IO for coincident signals. IOW:
the best of both worlds.

Kind Regards,
-Greg





2009-12-21 17:44:26

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 12:20 PM, Anthony Liguori wrote:
> On 12/21/2009 10:46 AM, Gregory Haskins wrote:
>> The very best you can hope to achieve is 1:1 EOI per signal (though
>> today virtio-pci is even worse than that). As I indicated above, I can
>> eliminate more than 50% of even the EOIs in trivial examples, and even
>> more as we scale up the number of devices or the IO load (or both).
>>
>
> If optimizing EOI is the main technical advantage of vbus, then surely
> we could paravirtualize EOI access and get that benefit in KVM without
> introducing a whole new infrastructure, no?

No, because I never claimed optimizing EOI was the main/only advantage.
The feature set has all been covered in extensive detail on the lists,
however, so I will refer you to Google and the archives for your reading
pleasure.

>
>>> This is a
>>> light weight exit today and will likely disappear entirely with newer
>>> hardware.
>>>
>> By that argument, this is all moot. New hardware will likely obsolete
>> the need for venet or virtio-net anyway.
>
> Not at all.

Well, surely something like SR-IOV is moving in that direction, no?

> But let's focus on concrete data. For a given workload,
> how many exits do you see due to EOI?

It's of course highly workload-dependent, and I've published these
details in the past, I believe. Off the top of my head, I recall that
virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
venet on a 10GE box, but I don't recall what ratio of those exits are
EOIs. To be perfectly honest, I don't care. I do not discriminate
against the exit type... I want to eliminate as many as possible,
regardless of the type. That's how you go fast and yet use less CPU.

> They should be relatively rare
> because obtaining good receive batching is pretty easy.

Batching is the poor man's throughput (it's easy when you don't care
about latency), so we generally avoid it as much as possible.

> Considering
> these are lightweight exits (on the order of 1-2us),

APIC EOIs on x86 are MMIO based, so they are generally much heavier than
that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
never mind executing the locking/apic-emulation code.

> you need an awfully
> large amount of interrupts before you get really significant performance
> impact. You would think NAPI would kick in at this point anyway.
>

Whether NAPI can kick in or not is workload-dependent, and it also does
not address coincident events. But on that topic, you can think of
AlacrityVM's interrupt controller as "NAPI for interrupts", because it
operates on the same principle. For what it's worth, it also operates
on a "NAPI for hypercalls" concept.

> Do you have data demonstrating the advantage of EOI mitigation?

I have non-scientifically gathered numbers in my notebook that put it at
an average reduction of about 55%-60% in EOIs for inbound netperf runs,
for instance. I don't have time to gather more in the near term, but
it's typically in that range for a chatty enough workload, and it goes
up as you add devices. I would certainly formally generate those
numbers when I make another merge request in the future, but I don't
have them now.

Kind Regards,
-Greg



2009-12-22 00:12:48

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/2009 11:44 AM, Gregory Haskins wrote:
> Well, surely something like SR-IOV is moving in that direction, no?
>

Not really, but that's a different discussion.

>> But let's focus on concrete data. For a given workload,
>> how many exits do you see due to EOI?
>>
> Its of course highly workload dependent, and I've published these
> details in the past, I believe. Off the top of my head, I recall that
> virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
> venet on a 10GE box, but I don't recall what ratio of those exits are
> EOI.

Was this userspace virtio-pci or was this vhost-net? If it was the
former, then were you using MSI-X? If you weren't, there would be an
additional (rather heavy) exit per interrupt to clear the ISR, which
would certainly account for a large portion of the additional exits.

> To be perfectly honest, I don't care. I do not discriminate
> against the exit type...I want to eliminate as many as possible,
> regardless of the type. That's how you go fast and yet use less CPU.
>

It's important to understand why one mechanism is better than another.
All I'm looking for is a set of bullet points that say, vbus does this,
vhost-net does that, therefore vbus is better. We would then either
say, oh, that's a good idea, let's change vhost-net to do that, or we
would say, hrm, well, we can't change vhost-net to do that because of
some fundamental flaw, let's drop it and adopt vbus.

It's really that simple :-)


>> They should be relatively rare
>> because obtaining good receive batching is pretty easy.
>>
> Batching is poor mans throughput (its easy when you dont care about
> latency), so we generally avoid as much as possible.
>

Fair enough.

>> Considering
>> these are lightweight exits (on the order of 1-2us),
>>
> APIC EOIs on x86 are MMIO based, so they are generally much heavier than
> that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
> never mind executing the locking/apic-emulation code.
>

You won't like to hear me say this, but Woodcrests are pretty old and
clunky as far as VT goes :-)

On a modern Nehalem, I would be surprised if an MMIO exit handled in the
kernel was much more than 2us. The hardware is getting very, very
fast. The trends here are very important to consider when we're looking
at architectures that we are potentially going to support for a long
time.

>> you need an awfully
>> large amount of interrupts before you get really significant performance
>> impact. You would think NAPI would kick in at this point anyway.
>>
>>
> Whether NAPI can kick in or not is workload dependent, and it also does
> not address coincident events. But on that topic, you can think of
> AlacrityVM's interrupt controller as "NAPI for interrupts", because it
> operates on the same principle. For what its worth, it also operates on
> a "NAPI for hypercalls" concept too.
>

The concept of always batching hypercalls has certainly been explored
within the context of Xen. But then when you look at something like
KVM's hypercall support, it turns out that with sufficient cleverness in
the host, we don't even bother with the MMU hypercalls anymore.

Doing fancy things in the guest is difficult to support from a long term
perspective. It'll more or less never work for Windows and even the lag
with Linux makes it difficult for users to see the benefit of these
changes. You get a lot more flexibility trying to solve things in the
host even if it's convoluted (like TPR patching).

>> Do you have data demonstrating the advantage of EOI mitigation?
>>
> I have non-scientifically gathered numbers in my notebook that put it on
> average of about 55%-60% reduction in EOIs for inbound netperf runs, for
> instance. I don't have time to gather more in the near term, but its
> typically in that range for a chatty enough workload, and it goes up as
> you add devices. I would certainly formally generate those numbers when
> I make another merge request in the future, but I don't have them now.
>

I don't think it's possible to make progress with vbus without detailed
performance data comparing both vbus and virtio (vhost-net). On the
virtio/vhost-net side, I think we'd be glad to help gather/analyze that
data. We have to understand why one is better than the other, and then
we have to evaluate whether we can bring those benefits into the latter.
If we can't, we merge vbus. If we can, we fix virtio.

Regards,

Anthony Liguori

> Kind Regards,
> -Greg
>
>

2009-12-22 07:23:06

by Gleb Natapov

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Mon, Dec 21, 2009 at 12:44:17PM -0500, Gregory Haskins wrote:
> > They should be relatively rare
> > because obtaining good receive batching is pretty easy.
>
> Batching is poor mans throughput (its easy when you dont care about
> latency), so we generally avoid as much as possible.
>
> > Considering
> > these are lightweight exits (on the order of 1-2us),
>
> APIC EOIs on x86 are MMIO based, so they are generally much heavier than
> that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
> never mind executing the locking/apic-emulation code.
>
With x2APIC, EOIs are no longer MMIO.

--
Gleb.

2009-12-22 07:58:26

by Ingo Molnar

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Gregory Haskins <[email protected]> wrote:

> On 12/18/09 4:51 PM, Ingo Molnar wrote:
> >
> > * Gregory Haskins <[email protected]> wrote:
> >
> >> Hi Linus,
> >>
> >> Please pull AlacrityVM guest support for 2.6.33 from:
> >>
> >> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
> >> for-linus
> >>
> >> All of these patches have stewed in linux-next for quite a while now:
> >>
> >> Gregory Haskins (26):
> >
> > I think it would be fair to point out that these patches have been objected to
> > by the KVM folks quite extensively,
>
> Actually, these patches have nothing to do with the KVM folks. [...]

That claim is curious to me - the AlacrityVM host is 90% based on KVM
code, so how can it not be about KVM? I just checked: most of the
changes that the AlacrityVM host makes to KVM are in adding the
host-side interfaces for these guest drivers:

virt/kvm/Kconfig | 11 +
virt/kvm/coalesced_mmio.c | 65 +++---
virt/kvm/coalesced_mmio.h | 1 +
virt/kvm/eventfd.c | 599 +++++++++++++++++++++++++++++++++++++++++++++
virt/kvm/ioapic.c | 118 +++++++--
virt/kvm/ioapic.h | 5 +
virt/kvm/iodev.h | 55 +++--
virt/kvm/irq_comm.c | 267 ++++++++++++++-------
virt/kvm/kvm_main.c | 127 ++++++++--
virt/kvm/xinterface.c | 587 ++++++++++++++++++++++++++++++++++++++++++++
10 files changed, 1649 insertions(+), 186 deletions(-)

[ stat for virt/kvm/ taken as of today, AlacrityVM host tree commit 84afcc7 ]

So as far as the kernel code modifications of AlacrityVM go, it's very
much about KVM.

> [...] You are perhaps confusing this with the hypervisor-side discussion,
> of which there is indeed much disagreement.

Are the guest drivers living in a vacuum? The whole purpose of the
AlacrityVM guest drivers is to ... enable AlacrityVM support, right? So
how can it not be about KVM?

Gregory, it would be nice if you worked _much_ harder with the KVM folks
before giving up. It's not like there's much valid technical
disagreement that I can identify in any of the threads - the strongest
argument I could find was: "I want to fork KVM so please let me do it,
nobody is harmed, choice is good".

Ingo

2009-12-22 08:00:15

by Ingo Molnar

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Anthony Liguori <[email protected]> wrote:

> It's important to understand why one mechanism is better than another. All
> I'm looking for is a set of bullet points that say, vbus does this,
> vhost-net does that, therefore vbus is better. We would then either say,
> oh, that's a good idea, let's change vhost-net to do that, or we would say,
> hrm, well, we can't change vhost-net to do that because of some fundamental
> flaw, let's drop it and adopt vbus.
>
> It's really that simple :-)

That makes a lot of sense to me.

I think we had better have damn good technical reasons before we
encourage a fork of a subsystem within the kernel. Technical truth is
not something we can 'agree to disagree' on, and it is not something we
can really compromise on.

Both the host and the guest code are in Linux, so adding another variant
without that variant replacing the old one (on the spot or gradually)
makes no technical sense.

Gregory, I'd suggest that you shape this as a "this and this aspect of
KVM needs to be replaced/fixed" list of items, as suggested by Anthony.
In my experience the KVM folks are very approachable and very reasonable
about addressing technical shortcomings and acting upon feedback (and
they happily accept code as well) - so to the extent there's room for
improvement here, it should be done by shaping KVM, not by forking and
rebranding it.

Ingo

2009-12-22 11:49:41

by Andi Kleen

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

Anthony Liguori <[email protected]> writes:
>
> On a modern Nehalem, I would be surprised if an MMIO exit handled in
> the kernel was muck more than 2us. The hardware is getting very, very
> fast. The trends here are very important to consider when we're
> looking at architectures that we potentially are going to support for
> a long time.

When you talk about trends, the trend for IO is also to get faster.

An exit will always be more expensive than passing something from
another CPU in shared memory. An exit is much more work, with lots of
saved context, and it is fundamentally synchronous, even with all the
tricks hardware can do. And then there's the in-kernel handler too.

Shared memory passing from another CPU is a much cheaper
operation and more likely to scale with IO rate improvements.

The basic problem in this discussion seems to be the usual
disconnect between working code (I understand Gregory has working
code that demonstrates the performance advances he's claiming)
and unwritten optimizations.

Unwritten code tends to always sound nicer, but it remains to be seen
if it can deliver what it promises.

From an abstract standpoint, having efficient paravirtual IO interfaces
seems attractive.

I also personally don't see a big problem in having another set of
virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
s390-vm, ...) and it's not that they would be a particular maintenance
burden impacting the kernel core.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-22 15:31:40

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 05:49 AM, Andi Kleen wrote:
> Anthony Liguori<[email protected]> writes:
>
>> On a modern Nehalem, I would be surprised if an MMIO exit handled in
>> the kernel was muck more than 2us. The hardware is getting very, very
>> fast. The trends here are very important to consider when we're
>> looking at architectures that we potentially are going to support for
>> a long time.
>>
> When you talk about trends the trend for IO is also to get faster.
>
> An exit will be always more expensive than passing something from
> another CPU in shared memory. An exit is much more work,
> with lots of saved context and fundamentally synchronous,
> even with all the tricks hardware can do. And then there's the
> in kernel handler too.
>

No one is advocating avoiding shared memory and doing more exits in the
IO path :-)

Whether it's x2apic support or more sophisticated hardware APIC
virtualization support, the point remains that taking an exit due to EOI
is likely not to be required in the near-term future.

So far, the only actual technical advantage I've seen is that vbus
avoids EOI exits. My response is that I don't think that's so
important, especially when you consider that it's not going to matter so
much in the future, and that Avi has some ideas about how to eliminate
some of those exits even on older hardware. I'm also suspicious that
EOI exits alone would result in a huge performance differential between
the two architectures.

We think we understand why vbus does better than the current userspace
virtio backend. That's why we're building vhost-net. It's not done
yet, but our expectation is that it will do just as well if not better.

> Shared memory passing from another CPU is a much cheaper
> operation and more likely to scale with IO rate improvements.
>
> The basic problem in this discussion seems to be the usual
> disconnect between working code (I understand Gregory has working
> code that demonstrates the performance advances he's claiming)
> versus unwritten optimizations.
>

vbus has one driver (networking) that supports one guest (very new Linux
kernels). It supports one hypervisor (KVM) on one architecture (x86).

On the other hand, virtio has six upstream drivers (console, network,
block, rng, balloon, 9p) with at least as many in development. It
supports kernels going back to at least 2.6.18, almost all versions of
Windows, and has experimental drivers for other OSes. It supports KVM,
lguest, VirtualBox, with support for additional hypervisors under
development. It supports at least five architectures (x86, ppc, s390,
ia64, arm).

You are correct, vbus has better numbers than virtio today. But so far,
it's hardly an apples-to-apples comparison. Our backend networking
driver has been implemented entirely in userspace up until very
recently. There really isn't any good performance data comparing vbus
to vhost-net largely because vhost-net is still under active development.

The most important point, though, is that so far, I don't think Greg has
been able to articulate _why_ vbus would perform better than vhost-net.

If that can be articulated in a way that we all agree vbus has a
technical advantage over vhost-net, then I'm absolutely in agreement
that it should be merged.

I think the comparison would be if someone submitted a second e1000
driver that happened to do better on one netperf test than the current
e1000 driver.

You can argue, hey, choice is good, let's let a user choose if they want
to use the faster e1000 driver. But surely, the best thing for a user
is to figure out why the second e1000 driver is better on that one test
and integrate that change into the current e1000 driver, or to decide
that the new e1000 driver is superior in architecture and do the
required work to make the new e1000 driver a full replacement for the
old one.

Regards,

Anthony Liguori

> Unwritten code tends to always sound nicer, but it remains to be seen
> if it can deliver what it promises.
>
> From a abstract stand point having efficient paravirtual IO interfaces
> seem attractive.
>
> I also personally don't see a big problem in having another set of
> virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
> s390-vm, ...) and it's not that they would be a particular maintenance
> burden impacting the kernel core.
>
> -Andi
>

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tuesday 22 December 2009 04:31:32 pm Anthony Liguori wrote:

> I think the comparison would be if someone submitted a second e1000
> driver that happened to do better on one netperf test than the current
> e1000 driver.
>
> You can argue, hey, choice is good, let's let a user choose if they want
> to use the faster e1000 driver. But surely, the best thing for a user
> is to figure out why the second e1000 driver is better on that one test,
> integrate that change into the current e1000 driver, or decided that the

Even though this is a "Won't somebody please think of the users?"
argument, such work would be much welcomed. Sending patches would be a
great start.

> new e1000 driver is more superior in architecture and do the required
> work to make the new e1000 driver a full replacement for the old one.

Right, like everyone actually does things this way...

I wonder why we still have the OSS, old FireWire and IDE stacks around,
then?

> Regards,
>
> Anthony Liguori
>
> > Unwritten code tends to always sound nicer, but it remains to be seen
> > if it can deliver what it promises.
> >
> > From a abstract stand point having efficient paravirtual IO interfaces
> > seem attractive.
> >
> > I also personally don't see a big problem in having another set of
> > virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
> > s390-vm, ...) and it's not that they would be a particular maintenance
> > burden impacting the kernel core.

Exactly, I also don't see any problem here, especially since the
AlacrityVM drivers have a much cleaner design / internal architecture
than some of their competitors.

--
Bartlomiej Zolnierkiewicz

2009-12-22 16:21:21

by Andi Kleen

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> So far, the only actual technical advantage I've seen is that vbus avoids
> EOI exits.

The technical advantage is that it's significantly faster today.

Maybe your proposed alternative is as fast, or maybe it's not. Who knows?

> We think we understand why vbus does better than the current userspace
> virtio backend. That's why we're building vhost-net. It's not done yet,
> but our expectation is that it will do just as well if not better.

That's the vapourware vs working code disconnect I mentioned. One side
has hard numbers & working code, and the other has expectations. I
usually find it sad when the vapourware holds up the working code.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-22 16:21:47

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
>> new e1000 driver is more superior in architecture and do the required
>> work to make the new e1000 driver a full replacement for the old one.
>>
> Right, like everyone actually does things this way..
>
> I wonder why do we have OSS, old Firewire and IDE stacks still around then?
>

And it's always a source of pain, isn't it.

>>> I also personally don't see a big problem in having another set of
>>> virtual drivers -- Linux already has plenty (vmware, xen, virtio, power,
>>> s390-vm, ...) and it's not that they would be a particular maintenance
>>> burden impacting the kernel core.
>>>
> Exactly, I also don't see any problem here, especially since AlacrityVM
> drivers have much cleaner design / internal architecture than some of their
> competitors..
>

Care to provide some actual objective argument as to why it's better
than what we already have?

Regards,

Anthony Liguori

2009-12-22 16:27:39

by Anthony Liguori

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 10:21 AM, Andi Kleen wrote:
>> So far, the only actual technical advantage I've seen is that vbus avoids
>> EOI exits.
>>
> The technical advantage is that it's significantly faster today.
>

There are two separate pieces of code in question. There are front-end
drivers and there are back-end drivers.

Right now, there are only front-end drivers in the kernel. The
combination of vbus front-end drivers and *kernel* back-end drivers is
faster than the *combination* of virtio front-end drivers and
*userspace* back-end drivers.

vhost-net is our kernel back-end driver. No one has yet established
that the combination of virtio front-end driver and kernel back-end
driver is really significantly slower than vbus.

> Maybe your proposed alternative is as fast, or maybe it's not. Who knows?
>
>
>> We think we understand why vbus does better than the current userspace
>> virtio backend. That's why we're building vhost-net. It's not done yet,
>> but our expectation is that it will do just as well if not better.
>>
> That's the vapourware vs working code disconnect I mentioned. One side has hard
> numbers&working code and the other has expectations. I usually find it sad when the
> vapourware holds up the working code.
>

We're not talking about vaporware. vhost-net exists.

Regards,

Anthony Liguori

> -Andi
>

2009-12-22 17:06:14

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 06:21 PM, Andi Kleen wrote:
>> So far, the only actual technical advantage I've seen is that vbus avoids
>> EOI exits.
>>
> The technical advantage is that it's significantly faster today.
>
> Maybe your proposed alternative is as fast, or maybe it's not. Who knows?
>

We're working on numbers for the proposed alternative, so we should know
soon. Are the AlacrityVM folks working on having all the virtio drivers
for all the virtio archs?

We shouldn't drop everything and switch to new code just because someone
came up with a new idea. The default should be to enhance the existing
code.

>> We think we understand why vbus does better than the current userspace
>> virtio backend. That's why we're building vhost-net. It's not done yet,
>> but our expectation is that it will do just as well if not better.
>>
> That's the vapourware vs working code disconnect I mentioned. One side has hard
> numbers&working code and the other has expectations. I usually find it sad when the
> vapourware holds up the working code.
>

vhost-net is working code and is queued for 2.6.33.

--
error compiling committee.c: too many arguments to function

2009-12-22 17:33:33

by Andi Kleen

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> We're not talking about vaporware. vhost-net exists.

Is it as fast as the AlacrityVM setup, then, e.g. for network traffic?

Last I heard, the AlacrityVM setup could do wire-speed 10Gbit/s on
standard hardware. Can vhost-net do the same thing?

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-22 17:36:34

by Avi Kivity

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 07:33 PM, Andi Kleen wrote:
>> We're not talking about vaporware. vhost-net exists.
>>
> Is it as fast as the alacrityvm setup then e.g. for network traffic?
>
> Last I heard the first could do wirespeed 10Gbit/s on standard hardware.
>

That was with zero-copy IIRC, which is known to be broken. There's
nothing alacrity-specific about zero-copy (and in fact the first
zero-copy patches were from Rusty).

> Can vhost-net do the same thing?
>

I've heard unofficial numbers which approach that, but let's wait for
the official ones.

--
error compiling committee.c: too many arguments to function

2009-12-22 17:37:13

by Gregory Haskins

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:57 AM, Ingo Molnar wrote:
>
> * Gregory Haskins <[email protected]> wrote:
>
>> On 12/18/09 4:51 PM, Ingo Molnar wrote:
>>>
>>> * Gregory Haskins <[email protected]> wrote:
>>>
>>>> Hi Linus,
>>>>
>>>> Please pull AlacrityVM guest support for 2.6.33 from:
>>>>
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git
>>>> for-linus
>>>>
>>>> All of these patches have stewed in linux-next for quite a while now:
>>>>
>>>> Gregory Haskins (26):
>>>
>>> I think it would be fair to point out that these patches have been objected to
>>> by the KVM folks quite extensively,
>>
>> Actually, these patches have nothing to do with the KVM folks. [...]
>
> That claim is curious to me - the AlacrityVM host

It's quite simple, really. These drivers support accessing vbus, and
vbus is hypervisor-agnostic. In fact, vbus isn't necessarily even
hypervisor-related. It may be used anywhere a Linux kernel is the
"io backend", which includes hypervisors like AlacrityVM, but also
userspace apps and interconnected physical systems as well.

The vbus-core on the backend and the drivers on the frontend operate
completely independently of the underlying hypervisor. A glue piece
called a "connector" ties them together, and any hypervisor-specific
details are encapsulated in the connector module. In this case, the
connector surfaces to the guest side as a pci-bridge, so even that is
not hypervisor-specific per se. It will work with any pci-bridge that
exposes a compatible ABI, which conceivably could be actual hardware.
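
To give a feel for the shape of that split, a connector essentially
reduces to an ops table like the following (purely illustrative names,
not the actual vbus headers):

/* Illustrative sketch only -- invented names, not the real vbus API. */
#include <linux/types.h>

struct vbus_connector;

struct vbus_connector_ops {
	/* how a shm-signal reaches the guest (KVM irq, eventfd, ...)  */
	int   (*signal_guest)(struct vbus_connector *conn, u32 shm_id);
	/* how backend code maps guest memory for the shared rings     */
	void *(*map_guest)(struct vbus_connector *conn, u64 gpa, size_t len);
	void  (*unmap_guest)(struct vbus_connector *conn, void *ptr);
	void  (*release)(struct vbus_connector *conn);
};

struct vbus_connector {
	const char                      *name;  /* e.g. "kvm", "userspace" */
	const struct vbus_connector_ops *ops;
};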

The AlacrityVM project just so happens to be the primary consumer, and
is therefore the most convenient way to package them up at the moment.

> is 90% based on KVM code, so
> how can it not be about KVM? I just checked, most of the changes that
> AlacrityVM host does to KVM is in adding the host side interfaces for these
> guest drivers:
>
> virt/kvm/Kconfig | 11 +
> virt/kvm/coalesced_mmio.c | 65 +++---
> virt/kvm/coalesced_mmio.h | 1 +
> virt/kvm/eventfd.c | 599 +++++++++++++++++++++++++++++++++++++++++++++
> virt/kvm/ioapic.c | 118 +++++++--
> virt/kvm/ioapic.h | 5 +
> virt/kvm/iodev.h | 55 +++--
> virt/kvm/irq_comm.c | 267 ++++++++++++++-------
> virt/kvm/kvm_main.c | 127 ++++++++--
> virt/kvm/xinterface.c | 587 ++++++++++++++++++++++++++++++++++++++++++++
> 10 files changed, 1649 insertions(+), 186 deletions(-)
>
> [ stat for virt/kvm/ taken as of today, AlacrityVM host tree commit 84afcc7 ]
>
> So as far as kernel code modifications of AlacrityVM goes, it's very much
> about KVM.

I think you are confused. Even if we entertained the notion that the
host side diffstat were somehow relevant here, you are probably
comparing the kvm.git backports that are in my tree. The only real KVM
specific change that is in my tree is the 587 lines for the xinterface.c
module, which is roughly 4%, not 90%. Also note that I have pushed this
xinterface logic upstream already, but it just hasn't been accepted yet.

If I wanted to be extremely generous, you could include the entire "KVM
connector" code that bridges vbus-core to kvm-core, but even that tops
out at a total of ~17% of the changes in my tree. So I am still not
seeing the 90% nor how it is relevant.

>
>> [...] You are perhaps confusing this with the hypervisor-side discussion,
>> of which there is indeed much disagreement.
>
> Are the guest drivers living in a vacuum? The whole purpose of the AlacrityVM
> guest drivers is to ... enable AlacrityVM support, right?

More specifically, the purpose of these drivers, like any drivers, is to
enable support for the underlying devices to which they are related. In
this case, the devices are vbus based devices. Of those, AlacrityVM is
the only available platform that exposes them. However, that is a
maturity/adoption detail, not a technical limitation. Simply
implementing a new connector would bridge these drivers to other
environments as well. There are community members working on these as
we speak, as a matter of fact.

> So how can it be not about KVM?

Because AlacrityVM is a hypervisor that supports VBUS for PV IO, and KVM
is not. In addition, the presence of these drivers in no way alters,
interferes with, or diminishes features found in KVM today. So it is not,
and never will be, about KVM until upstream KVM decides that they want to
support VBUS based PV-IO.

If you want to talk about the host side, then I have +587 lines that
hang in the balance that affect KVM, yes. But that isn't what $subject
was about.

>
> Gregory, it would be nice if you worked _much_ harder with the KVM folks
> before giving up.

I think the 5+ months that I politely tried to convince the KVM folks
that this was a good idea was pretty generous of my employer. The KVM
maintainers have ultimately made it clear they are not interested in
directly supporting this concept (which is their prerogative), but are
perhaps willing to support the peripheral logic needed to allow it to
easily interface with KVM. I can accept that, and thus AlacrityVM was born.

Note that upstream KVM are also only a subset of the mindshare needed
for this project anyway, since most of the core is independent of KVM.
Perhaps the KVM folks will reconsider if/when other community members
start to see the merit in the work. Perhaps not. It's out of my
control at this point.

> It's not like there's much valid technical disagreement that
> i can identify in any of the threads

While I am sorry to hear that, it should be noted that this doesn't mean
that your perception is accurate, either. It was quite a long and
fragmented set of threads over those 5+ months, so absorbing the gist of
the vision from casual observation is not likely trivial.

> - the strongest one i could identify was:
> "I want to fork KVM so please let me do it, nobody is harmed, choice is good".

Everyone is of course entitled to an opinion, but I would respectfully
disagree with your statement (as I did last time you made the same
claim, as well). I have not now, nor ever, wanted a fork. But I also
believe in the work I am doing, so I won't roll over and die just
because a certain group doesn't share the vision per se either, sorry.
I get the impression that you would not either if you were in a similar
situation, so perhaps you can respect that.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 18:55:12

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 07:36 PM, Gregory Haskins wrote:
>
>> Gregory, it would be nice if you worked _much_ harder with the KVM folks
>> before giving up.
>>
> I think the 5+ months that I politely tried to convince the KVM folks
> that this was a good idea was pretty generous of my employer. The KVM
> maintainers have ultimately made it clear they are not interested in
> directly supporting this concept (which is their prerogative), but are
> perhaps willing to support the peripheral logic needed to allow it to
> easily interface with KVM. I can accept that, and thus AlacrityVM was born.
>

Review pointed out locking issues with xinterface which I have not seen
addressed. I asked why the irqfd/ioeventfd mechanisms are insufficient,
and you did not reply.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 18:56:48

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 1:53 PM, Avi Kivity wrote:
> On 12/22/2009 07:36 PM, Gregory Haskins wrote:
>>
>>> Gregory, it would be nice if you worked _much_ harder with the KVM folks
>>> before giving up.
>>>
>> I think the 5+ months that I politely tried to convince the KVM folks
>> that this was a good idea was pretty generous of my employer. The KVM
>> maintainers have ultimately made it clear they are not interested in
>> directly supporting this concept (which is their prerogative), but are
>> perhaps willing to support the peripheral logic needed to allow it to
>> easily interface with KVM. I can accept that, and thus AlacrityVM was
>> born.
>>
>
> Review pointed out locking issues with xinterface which I have not seen
> addressed. I asked why the irqfd/ioeventfd mechanisms are insufficient,
> and you did not reply.
>

Yes, I understand. I've been too busy to rework the code for an
upstream push. I will certainly address those questions when I make the
next attempt, but they weren't relevant to the guest side.


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:15:50

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 1:53 PM, Avi Kivity wrote:
> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>

BTW: the ioeventfd issue just fell through the cracks, so sorry about
that. Note that I have no specific issue with irqfd ever since the
lockless IRQ injection code was added.

ioeventfd turned out to be suboptimal for me in the fast path for two
reasons:

1) the underlying eventfd is called in atomic context. I had posted
patches to Davide to address that limitation, but I believe he rejected
them on the grounds that they are only relevant to KVM.

2) it cannot retain the data field passed in the PIO. I wanted to have
one vector that could tell me what value was written, and this cannot be
expressed in ioeventfd.
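
To illustrate point 2: an eventfd only transfers an accumulating 64-bit
counter, so a per-event payload such as the written PIO value simply
cannot ride along. A trivial userspace example (nothing KVM-specific
about it):

/* Two "signals" of 1 and 5 are read back as a single counter of 6;
 * the individual values are gone. */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
        uint64_t val;
        int fd = eventfd(0, 0);

        val = 1;
        write(fd, &val, sizeof(val));   /* signal #1 */
        val = 5;
        write(fd, &val, sizeof(val));   /* signal #2 */

        read(fd, &val, sizeof(val));
        printf("counter = %llu\n", (unsigned long long)val); /* prints 6 */

        close(fd);
        return 0;
}

(KVM's ioeventfd can match on the written value when deciding whether to
signal, but what the consumer sees is still only this counter.)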

Based on this, it was a better decision to add a ioevent interface to
xinterface. It neatly solves both problems.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:26:30

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:15 PM, Gregory Haskins wrote:
> On 12/22/09 1:53 PM, Avi Kivity wrote:
>
>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>>
>>
> BTW: the ioeventfd issue just fell through the cracks, so sorry about
> that. Note that I have no specific issue with irqfd ever since the
> lockless IRQ injection code was added.
>
> ioeventfd turned out to be suboptimal for me in the fast path for two
> reasons:
>
> 1) the underlying eventfd is called in atomic context. I had posted
> patches to Davide to address that limitation, but I believe he rejected
> them on the grounds that they are only relevant to KVM.
>

If you're not doing something pretty minor, you're better off waking up a
thread (perhaps _sync if you want to keep on the same cpu). With the
new user return notifier thingie, that's pretty cheap.
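
One minimal way to do that, as a sketch (the example_* names are
invented; schedule_work()/INIT_WORK() are the stock workqueue API):

#include <linux/kernel.h>
#include <linux/workqueue.h>

struct example_backend {
        struct work_struct      work;
        /* ... device state ... */
};

/* Runs later in process context: free to sleep, take mutexes, etc. */
static void example_work_fn(struct work_struct *work)
{
        struct example_backend *be =
                container_of(work, struct example_backend, work);

        /* do the heavy lifting here */
        (void)be;
}

/* Called from the eventfd wakeup path, i.e. atomic context:
 * do nothing but queue the work. */
static void example_signal(struct example_backend *be)
{
        schedule_work(&be->work);
}

static void example_init(struct example_backend *be)
{
        INIT_WORK(&be->work, example_work_fn);
}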

> 2) it cannot retain the data field passed in the PIO. I wanted to have
> one vector that could tell me what value was written, and this cannot be
> expressed in ioeventfd.
>
>

It would be easier to add data logging support to ioeventfd, if it was
needed that badly.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:32:35

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:25 PM, Avi Kivity wrote:
> On 12/22/2009 09:15 PM, Gregory Haskins wrote:
>> On 12/22/09 1:53 PM, Avi Kivity wrote:
>>
>>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you
>>> did not reply.
>>>
>>>
>> BTW: the ioeventfd issue just fell through the cracks, so sorry about
>> that. Note that I have no specific issue with irqfd ever since the
>> lockless IRQ injection code was added.
>>
>> ioeventfd turned out to be suboptimal for me in the fast path for two
>> reasons:
>>
>> 1) the underlying eventfd is called in atomic context. I had posted
>> patches to Davide to address that limitation, but I believe he rejected
>> them on the grounds that they are only relevant to KVM.
>>
>
> If you're not doing something pretty minor, you're better of waking up a
> thread (perhaps _sync if you want to keep on the same cpu). With the
> new user return notifier thingie, that's pretty cheap.

We have exploits that take advantage of IO heuristics. When triggered
they do more work in vcpu context than normal, which reduces latency
under certain circumstances. But you definitely do _not_ want to do
them in-atomic ;)

>
>> 2) it cannot retain the data field passed in the PIO. I wanted to have
>> one vector that could tell me what value was written, and this cannot be
>> expressed in ioeventfd.
>>
>>
>
> It would be easier to add data logging support to ioeventfd, if it was
> needed that badly.

"Better design"? perhaps. "More easily"? no. Besides, Davide has
already expressed dissatisfaction with the KVM-isms creeping into
eventfd, so it's not likely to ever be accepted regardless of your own
disposition.

xinterface, as it turns out, is a great KVM interface for me and easy to
extend, all without conflicting with the changes in upstream. The old
way was via the kvm ioctl interface, but that sucked as the ABI was
always moving. Where is the problem? ioeventfd still works fine as it is.

Kind Regards,
-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:37:36

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:32 PM, Gregory Haskins wrote:
> On 12/22/09 2:25 PM, Avi Kivity wrote:

>>
>> If you're not doing something pretty minor, you're better of waking up a
>> thread (perhaps _sync if you want to keep on the same cpu). With the
>> new user return notifier thingie, that's pretty cheap.
>
> We have exploits that take advantage of IO heuristics. When triggered
> they do more work in vcpu context than normal, which reduces latency
> under certain circumstances. But you definitely do _not_ want to do
> them in-atomic ;)

And I almost forgot: dev->call() is an RPC to the backend device.
Therefore, it must be synchronous, yet we don't want it locked either. I
think that was actually the primary motivation for the change, now that
I think about it.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:39:32

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:32 PM, Gregory Haskins wrote:
> xinterface, as it turns out, is a great KVM interface for me and easy to
> extend, all without conflicting with the changes in upstream. The old
> way was via the kvm ioctl interface, but that sucked as the ABI was
> always moving. Where is the problem? ioeventfd still works fine as it is.
>

It means that kvm locking suddenly affects more of the kernel.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:40:07

by Davide Libenzi

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tue, 22 Dec 2009, Gregory Haskins wrote:

> On 12/22/09 1:53 PM, Avi Kivity wrote:
> > I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
> >
>
> BTW: the ioeventfd issue just fell through the cracks, so sorry about
> that. Note that I have no specific issue with irqfd ever since the
> lockless IRQ injection code was added.
>
> ioeventfd turned out to be suboptimal for me in the fast path for two
> reasons:
>
> 1) the underlying eventfd is called in atomic context. I had posted
> patches to Davide to address that limitation, but I believe he rejected
> them on the grounds that they are only relevant to KVM.

I thought we addressed this already, in the few hundreds of email we
exchanged back then :)



> 2) it cannot retain the data field passed in the PIO. I wanted to have
> one vector that could tell me what value was written, and this cannot be
> expressed in ioeventfd.

As Avi might have hinted in his reply, couldn't you add data support to the
ioeventfd bits in KVM, instead of leaking them into mainline eventfd?



- Davide

2009-12-22 19:41:48

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:32 PM, Gregory Haskins wrote:
> Besides, Davide has
> already expressed dissatisfaction with the KVM-isms creeping into
> eventfd, so its not likely to ever be accepted regardless of your own
> disposition.
>

Why don't you duplicate eventfd, then? It should be easier than duplicating
virtio.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:41:30

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:38 PM, Avi Kivity wrote:
> On 12/22/2009 09:32 PM, Gregory Haskins wrote:
>> xinterface, as it turns out, is a great KVM interface for me and easy to
>> extend, all without conflicting with the changes in upstream. The old
>> way was via the kvm ioctl interface, but that sucked as the ABI was
>> always moving. Where is the problem? ioeventfd still works fine as
>> it is.
>>
>
> It means that kvm locking suddenly affects more of the kernel.
>

That's ok. This would only be w.r.t. devices that are bound to the KVM
instance anyway, so they better know what they are doing (and they do).

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:44:40

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 09:41 PM, Gregory Haskins wrote:
>
>> It means that kvm locking suddenly affects more of the kernel.
>>
>>
> Thats ok. This would only be w.r.t. devices that are bound to the KVM
> instance anyway, so they better know what they are doing (and they do).
>
>

It's okay for the author of that device. It's not okay for the kvm
developers who are still evolving the locking and have to handle all
devices that use xinterface.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-22 19:47:37

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:43 PM, Avi Kivity wrote:
> On 12/22/2009 09:41 PM, Gregory Haskins wrote:
>>
>>> It means that kvm locking suddenly affects more of the kernel.
>>>
>>>
>> Thats ok. This would only be w.r.t. devices that are bound to the KVM
>> instance anyway, so they better know what they are doing (and they do).
>>
>>
>
> It's okay to the author of that device. It's not okay to the kvm
> developers who are still evolving the locking and have to handle all
> devices that use xinterface.

Perhaps, but like it or not, if you want to do in-kernel IO you need to
invoke backends. And if you want to invoke backends, limiting it to
thread wakeups is, well, limiting. For one, you miss out on that
exploit I mentioned earlier which can help sometimes.

Besides, the direction that Marcelo and I left the mmio/pio bus was that
it would go lockless eventually, not "more lockful" ;)

Has that changed? I honestly haven't followed what's going on in the
io-bus code in a while.

-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 19:53:28

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/09 2:39 PM, Davide Libenzi wrote:
> On Tue, 22 Dec 2009, Gregory Haskins wrote:
>
>> On 12/22/09 1:53 PM, Avi Kivity wrote:
>>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
>>>
>>
>> BTW: the ioeventfd issue just fell through the cracks, so sorry about
>> that. Note that I have no specific issue with irqfd ever since the
>> lockless IRQ injection code was added.
>>
>> ioeventfd turned out to be suboptimal for me in the fast path for two
>> reasons:
>>
>> 1) the underlying eventfd is called in atomic context. I had posted
>> patches to Davide to address that limitation, but I believe he rejected
>> them on the grounds that they are only relevant to KVM.
>
> I thought we addressed this already, in the few hundreds of email we
> exchanged back then :)

We addressed the race conditions, but not the atomic callbacks. I can't
remember exactly what you said, but the effect was "no", so I dropped it. ;)

This was the thread.

http://www.archivum.info/[email protected]/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

>
>
>
>> 2) it cannot retain the data field passed in the PIO. I wanted to have
>> one vector that could tell me what value was written, and this cannot be
>> expressed in ioeventfd.
>
> Like might have hinted in his reply, couldn't you add data support to the
> ioeventfd bits in KVM, instead of leaking them into mainline eventfd?
>

Perhaps, or even easier I could extend xinterface. Which is what I did ;)

The problem with the first proposal is that you would no longer actually
have an eventfd based mechanism...so any code using ioeventfd (like
Michael Tsirkin's for instance) would break.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 20:41:52

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/21/09 7:12 PM, Anthony Liguori wrote:
> On 12/21/2009 11:44 AM, Gregory Haskins wrote:
>> Well, surely something like SR-IOV is moving in that direction, no?
>>
>
> Not really, but that's a different discussion.

Ok, but my general point still stands. At some level, some crafty
hardware engineer may invent something that obsoletes the
need for, say, PV 802.x drivers because it can hit 40GE line rate at the
same performance level as bare metal with some kind of pass-through
trick. But I still do not see that as an excuse for sloppy software in
the meantime, as there will always be older platforms, older IO cards,
or different IO types that are not beneficiaries of said hw based
optimizations.

>
>>> But let's focus on concrete data. For a given workload,
>>> how many exits do you see due to EOI?
>>>
>> Its of course highly workload dependent, and I've published these
>> details in the past, I believe. Off the top of my head, I recall that
>> virtio-pci tends to throw about 65k exits per second, vs about 32k/s for
>> venet on a 10GE box, but I don't recall what ratio of those exits are
>> EOI.
>
> Was this userspace virtio-pci or was this vhost-net?

Both, actually, though userspace is obviously even worse.

> If it was the
> former, then were you using MSI-X?

MSI-X

> If you weren't, there would be an
> additional (rather heavy) exit per-interrupt to clear the ISR which
> would certainly account for a large portion of the additional exits.
>

Yep, if you don't use MSI it is significantly worse as expected.


>> To be perfectly honest, I don't care. I do not discriminate
>> against the exit type...I want to eliminate as many as possible,
>> regardless of the type. That's how you go fast and yet use less CPU.
>>
>
> It's important to understand why one mechanism is better than another.

Agreed, but note _I_ already understand why. I've certainly spent
countless hours/emails trying to get others to understand as well, but
it seems most are too busy to actually listen.


> All I'm looking for is a set of bullet points that say, vbus does this,
> vhost-net does that, therefore vbus is better. We would then either
> say, oh, that's a good idea, let's change vhost-net to do that, or we
> would say, hrm, well, we can't change vhost-net to do that because of
> some fundamental flaw, let's drop it and adopt vbus.
>
> It's really that simple :-)

This has all been covered ad nauseam, directly with yourself in many
cases. Google is your friend.

Here are some tips while you research: Do not fall into the trap of
vhost-net vs vbus, or venet vs virtio-net, or you miss the point
entirely. Recall that venet was originally crafted to demonstrate the
virtues of my three performance objectives (kill exits, reduce exit
overhead, and run concurrently). Then there is all the stuff we are
laying on top, like qos, real-time, advanced fabrics, and easy adoption
for various environments (so it doesn't need to be redefined each time).

Therefore if you only look at the limited feature set of virtio-net, you
will miss the majority of the points of the framework. virtio tried to
capture some of these ideas, but it missed the mark on several levels
and was only partially defined. Incidentally, you can still run virtio
over vbus if desired, but so far no one has tried to use my transport.

>
>
>>> They should be relatively rare
>>> because obtaining good receive batching is pretty easy.
>>>
>> Batching is poor mans throughput (its easy when you dont care about
>> latency), so we generally avoid as much as possible.
>>
>
> Fair enough.
>
>>> Considering
>>> these are lightweight exits (on the order of 1-2us),
>>>
>> APIC EOIs on x86 are MMIO based, so they are generally much heavier than
>> that. I measure at least 4-5us just for the MMIO exit on my Woodcrest,
>> never mind executing the locking/apic-emulation code.
>>
>
> You won't like to hear me say this, but Woodcrests are pretty old and
> clunky as far as VT goes :-)

Fair enough.

>
> On a modern Nehalem, I would be surprised if an MMIO exit handled in the
> kernel was muck more than 2us. The hardware is getting very, very
> fast. The trends here are very important to consider when we're looking
> at architectures that we potentially are going to support for a long time.

The exit you do not take will always be infinitely faster.

>
>>> you need an awfully
>>> large amount of interrupts before you get really significant performance
>>> impact. You would think NAPI would kick in at this point anyway.
>>>
>>>
>> Whether NAPI can kick in or not is workload dependent, and it also does
>> not address coincident events. But on that topic, you can think of
>> AlacrityVM's interrupt controller as "NAPI for interrupts", because it
>> operates on the same principle. For what its worth, it also operates on
>> a "NAPI for hypercalls" concept too.
>>
>
> The concept of always batching hypercalls has certainly been explored
> within the context of Xen.

I am not talking about batching, which again is a poor man's throughput
trick at the expense of latency. This literally is a "NAPI"-like
signaled/polled hybrid, just going in the south direction.
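
In the abstract, the pattern looks something like the sketch below
(illustration only, not the actual AlacrityVM code; every example_* name
is made up):

#include <linux/types.h>

struct example_ring;    /* opaque for the sketch */

bool example_ring_has_work(struct example_ring *ring);
void example_process_one(struct example_ring *ring);
void example_mask_notifications(struct example_ring *ring);
void example_unmask_notifications(struct example_ring *ring);

/*
 * Take one notification, then suppress further notifications and poll
 * until the ring is drained.  Re-check after unmasking to close the
 * obvious race.
 */
static void example_notify_handler(struct example_ring *ring)
{
        for (;;) {
                example_mask_notifications(ring);

                while (example_ring_has_work(ring))
                        example_process_one(ring);

                example_unmask_notifications(ring);

                if (!example_ring_has_work(ring))
                        break;
        }
}

Under load the ring rarely goes empty, so very few notifications (and
therefore exits and EOIs) are taken; when traffic is light it
degenerates to the ordinary one-notification-per-event behavior.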

> But then when you look at something like
> KVM's hypercall support, it turns out that with sufficient cleverness in
> the host, we don't even bother with the MMU hypercalls anymore.
>
> Doing fancy things in the guest is difficult to support from a long term
> perspective. It'll more or less never work for Windows and even the lag
> with Linux makes it difficult for users to see the benefit of these
> changes. You get a lot more flexibility trying to solve things in the
> host even if it's convoluted (like TPR patching).
>
>>> Do you have data demonstrating the advantage of EOI mitigation?
>>>
>> I have non-scientifically gathered numbers in my notebook that put it on
>> average of about 55%-60% reduction in EOIs for inbound netperf runs, for
>> instance. I don't have time to gather more in the near term, but its
>> typically in that range for a chatty enough workload, and it goes up as
>> you add devices. I would certainly formally generate those numbers when
>> I make another merge request in the future, but I don't have them now.
>>
>
> I don't think it's possible to make progress with vbus without detailed
> performance data comparing both vbus and virtio (vhost-net). On the
> virtio/vhost-net side, I think we'd be glad to help gather/analyze that
> data. We have to understand why one's better than the other and then we
> have to evaluate whether we can bring those benefits into the later. If
> we can't, we merge vbus. If we can, we fix virtio.

You will need apples to apples to gain any meaningful data, and that
means running both on the same setup on the same base kernel, etc. My
trees, and instructions on how to run them, are referenced on the
alacrityvm site. I can probably send you a quilt series for any recent
kernel you may wish to try if the git tree is not sufficient.

Note that if you enable zero-copy (which is on by default), you may want
to increase the guest's wmem buffers since the transmit buffer reclaim
path is longer and you can artificially stall the guest side stack.
Generally 1MB-2MB should suffice. Otherwise just disable zero-copy with
"echo 0 > /sys/vbus/devices/$dev/zcthresh" on the host.

After you try basic tests, try lots of request-response and multi-homed
configurations, and watch your exit and interrupt rates as you do so, in
addition to the obvious metrics.

Good luck, and of course ping me with any troubles getting it to run.

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-22 21:14:26

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 11:33 AM, Andi Kleen wrote:
>> We're not talking about vaporware. vhost-net exists.
>>
> Is it as fast as the alacrityvm setup then e.g. for network traffic?
>
> Last I heard the first could do wirespeed 10Gbit/s on standard hardware.
>

I'm very wary of any such claims. As far as I know, no one has done an
exhaustive study of vbus and published the results. This is why it's so
important to understand why the results are what they are when we see
numbers posted.

For instance, check out
http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf slide 32.

These benchmarks show KVM without vhost-net pretty closely pacing
native. With large message sizes, it's awfully close to line rate.

Comparatively speaking, consider
http://developer.novell.com/wiki/index.php/AlacrityVM/Results

vbus here is pretty far off of native, and virtio-net is ridiculous.

Why are the results so different? Because benchmarking is fickle and
networking performance is complicated. No single benchmarking scenario is
going to give you a very good picture overall. It's also relatively
easy to stack the cards in favor of one approach versus another. The
virtio-net setup probably made extensive use of pinning and other tricks
to make things faster than a normal user would see them. It ends up
creating a perfect combination of batching which is pretty much just
cooking the mitigation schemes to do extremely well for one benchmark.

This is why it's so important to look at vbus from the perspective of
critically asking, what precisely makes it better than virtio. A couple
benchmarks on a single piece of hardware does not constitute an
existence proof that it's better overall.

There are a ton of differences between virtio and vbus because vbus was
written in a vacuum wrt virtio. I'm not saying we are totally committed
to virtio no matter what, but it should take a whole lot more than a
couple netperf runs on a single piece of hardware for a single kind of
driver to justify replacing it.

> Can vhost-net do the same thing?

I think the fundamental question is, what makes vbus better than
vhost-net? vhost-net exists and is further along upstream than vbus is
at the moment. If that question cannot be answered with technical facts
and numbers to back them up, then we're just arguing for the sake of
arguing.

Regards,

Anthony Liguori

> -Andi
>

2009-12-23 00:03:47

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Anthony Liguori ([email protected]) wrote:
> The
> virtio-net setup probably made extensive use of pinning and other tricks
> to make things faster than a normal user would see them. It ends up
> creating a perfect combination of batching which is pretty much just
> cooking the mitigation schemes to do extremely well for one benchmark.

Just pinning; the rest is stock virtio features like mergeable rx buffers,
GRO, GSO (tx mitigation is actually disabled). Certainly doesn't show
throughput in terms of cpu cycle cost (scaling) nor latency per-packet
(exit and mitigation).

thanks,
-chris

2009-12-23 01:05:25

by Davide Libenzi

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tue, 22 Dec 2009, Gregory Haskins wrote:

> On 12/22/09 2:39 PM, Davide Libenzi wrote:
> > On Tue, 22 Dec 2009, Gregory Haskins wrote:
> >
> >> On 12/22/09 1:53 PM, Avi Kivity wrote:
> >>> I asked why the irqfd/ioeventfd mechanisms are insufficient, and you did not reply.
> >>>
> >>
> >> BTW: the ioeventfd issue just fell through the cracks, so sorry about
> >> that. Note that I have no specific issue with irqfd ever since the
> >> lockless IRQ injection code was added.
> >>
> >> ioeventfd turned out to be suboptimal for me in the fast path for two
> >> reasons:
> >>
> >> 1) the underlying eventfd is called in atomic context. I had posted
> >> patches to Davide to address that limitation, but I believe he rejected
> >> them on the grounds that they are only relevant to KVM.
> >
> > I thought we addressed this already, in the few hundreds of email we
> > exchanged back then :)
>
> We addressed the race conditions, but not the atomic callbacks. I can't
> remember exactly what you said, but the effect was "no", so I dropped it. ;)
>
> This was the thread.
>
> http://www.archivum.info/[email protected]/2009-06/08548/Re:_[KVM-RFC_PATCH_1_2]_eventfd:_add_an_explicit_srcu_based_notifier_interface

Didn't that end up with schedule_work() being just fine, and no need
for pre-emptible callbacks?



> >> 2) it cannot retain the data field passed in the PIO. I wanted to have
> >> one vector that could tell me what value was written, and this cannot be
> >> expressed in ioeventfd.
> >
> > Like might have hinted in his reply, couldn't you add data support to the
> > ioeventfd bits in KVM, instead of leaking them into mainline eventfd?
> >
>
> Perhaps, or even easier I could extend xinterface. Which is what I did ;)
>
> The problem with the first proposal is that you would no longer actually
> have an eventfd based mechanism...so any code using ioeventfd (like
> Michael Tsirkin's for instance) would break.

At that point, the KVM eventfd can take care of things so that Michael's
bits do not break.



- Davide

2009-12-23 06:15:52

by Kyle Moffett

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
<[email protected]> wrote:
> On 12/22/09 2:57 AM, Ingo Molnar wrote:
>> * Gregory Haskins <[email protected]> wrote:
>>> Actually, these patches have nothing to do with the KVM folks. [...]
>>
>> That claim is curious to me - the AlacrityVM host
>
> It's quite simple, really.  These drivers support accessing vbus, and
> vbus is hypervisor agnostic.  In fact, vbus isn't necessarily even
> hypervisor related.  It may be used anywhere where a Linux kernel is the
> "io backend", which includes hypervisors like AlacrityVM, but also
> userspace apps, and interconnected physical systems as well.
>
> The vbus-core on the backend, and the drivers on the frontend operate
> completely independent of the underlying hypervisor.  A glue piece
> called a "connector" ties them together, and any "hypervisor" specific
> details are encapsulated in the connector module.  In this case, the
> connector surfaces to the guest side as a pci-bridge, so even that is
> not hypervisor specific per se.  It will work with any pci-bridge that
> exposes a compatible ABI, which conceivably could be actual hardware.

This is actually something that is of particular interest to me. I
have a few prototype boards right now with programmable PCI-E
host/device links on them; one of my long-term plans is to finagle
vbus into providing multiple "virtual" devices across that single
PCI-E interface.

Specifically, I want to be able to provide virtual NIC(s), serial
ports and serial consoles, virtual block storage, and possibly other
kinds of interfaces. My big problem with existing virtio right now
(although I would be happy to be proven wrong) is that it seems to
need some sort of out-of-band communication channel for setting up
devices, not to mention it seems to need one PCI device per virtual
device.

So I would love to be able to port something like vbus to my nifty PCI
hardware and write some backend drivers... then my PCI-E connected
systems would dynamically provide a list of highly-efficient "virtual"
devices to each other, with only one 4-lane PCI-E bus.

Cheers,
Kyle Moffett

2009-12-23 06:51:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Anthony Liguori <[email protected]> wrote:

> On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
> >>new e1000 driver is more superior in architecture and do the required
> >>work to make the new e1000 driver a full replacement for the old one.
> >Right, like everyone actually does things this way..
> >
> >I wonder why do we have OSS, old Firewire and IDE stacks still around then?
>
> And it's always a source of pain, isn't it.

Even putting aside the fact that such overlap sucks and is a pain to users
(and that 98% of driver and subsystem version transitions are done completely
seamlessly to users - the examples that were cited were the odd ones out of
150K commits in the past 4 years, 149K+ of which are seamless), the comparison
does not even apply really.

e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not
fully understood externality, with its inevitable set of compatibility woes.
There are often situations where one piece of hardware still works better with
the old driver, for some odd (or not so odd) reason.

Also, note that the 'new' hw drivers are generally intended and are maintained
as clear replacements for the old ones, and do so with minimal ABI changes -
or preferably with no ABI changes at all. Most driver developers just switch
from old to new and the old bits are left around and are phased out. We phased
out old OSS recently.

That is a very different situation from the AlacrityVM patches, which:

- Are a pure software concept and any compatibility mismatch is
self-inflicted. The patches are in fact breaking the ABI to KVM
intentionally (for better or worse).

- Gregory claims that the AlacrityVM patches are not a replacement for KVM.
I.e. there's no intention to phase out any 'old stuff' and it splits the
pool of driver developers.

i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
is, if AlacrityVM is better, and KVM developers are not willing to fix their
stuff, replace KVM with it.

It's a bit as if someone found a performance problem with sys_open() and came
up with sys_open_v2() and claimed that he wants to work with the VFS
developers while not really doing so but advances sys_open_v2() all the time.

Do we allow sys_open_v2() upstream, in the name of openness and diversity,
letting some apps use that syscall while other apps still use sys_open()? Or
do we say "enough is enough of this stupidity, come up with some strong
reasons to replace sys_open, and if so, replace the thing and be done with the
pain!".

Overlap and forking can still be done in special circumstances, when a project
splits and a hostile fork is inevitable due to prolonged and irreconcilable
differences between the parties and if there's no strong technical advantage
on either side. I haven't seen evidence of this yet though: Gregory claims that
he wants to 'work with the community' and the KVM guys seem to agree violently
that performance can be improved - and are doing so (and are asking Gregory to
take part in that effort).

The main difference is that Gregory claims that improved performance is not
possible within the existing KVM framework, while the KVM developers disagree.
The good news is that this is a hard, testable fact.

I think we should try _much_ harder before giving up and forking the ABI of a
healthy project and intentionally inflicting pain on our users.

And, at minimum, such kinds of things _have to_ be pointed out in pull
requests, because it's utterly important. In fact I couldn't list a more
important thing to point out in a pull request.

Ingo

2009-12-23 10:13:43

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing

Nearly. There was no equivalent of a kernel based virtual driver host
before.

> - Are a pure software concept and any compatibility mismatch is
> self-inflicted. The patches are in fact breaking the ABI to KVM

In practice, especially considering older kernel releases, VMs
behave like hardware, with all its quirks, compatibility requirements,
sometimes not fully understood, etc.

> is, if AlacricityVM is better, and KVM developers are not willing to fix their
> stuff, replace KVM with it.

In the end the driver model is only a very small part of KVM though.

>
> It's a bit as if someone found a performance problem with sys_open() and came
> up with sys_open_v2() and claimed that he wants to work with the VFS
> developers while not really doing so but advances sys_open_v2() all the time.

AFAIK Gregory tried for several months to work with the KVM maintainers,
but failed at their NIH filter.

>
> Do we allow sys_open_v2() upstream, in the name of openness and diversity,
> letting some apps use that syscall while other apps still use sys_open()? Or
> do we say "enough is enough of this stupidity, come up with some strong
> reasons to replace sys_open, and if so, replace the thing and be done with the
> pain!".

I thought the published benchmark numbers were strong reasons.
I certainly haven't seen similarly convincing numbers for vhost-net.

> The main difference is that Gregory claims that improved performance is not
> possible within the existing KVM framework, while the KVM developers disagree.
> The good news is that this is a hard, testable fact.

Yes clearly the onus at this point is on the vhost-net developers/
"pci is all that is ever needed for PV" proponents to show similar numbers
with their current code.

If they can show the same performance there's really no need for
the alacrityvm model (or at least I haven't seen a convincing reason
other than performance so far to have a separate model)

I heard claims earlier from one side or the other that some benchmarks
were not fair. Such accusations are not very constructive
(I don't think anyone is trying to intentionally mislead others here),
but even if that's the case, surely the other side can do similar
benchmarks and demonstrate they are as fast.

-Andi

2009-12-23 10:23:44

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 12:13 PM, Andi Kleen wrote:
>> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
>>
> Nearly. There was no equivalent of a kernel based virtual driver host
> before.
>

These are guest drivers. We have virtio drivers, and Xen drivers (which
are Xen-specific).

>> - Are a pure software concept and any compatibility mismatch is
>> self-inflicted. The patches are in fact breaking the ABI to KVM
>>
> In practice, especially considering older kernel releases, VMs
> behave like hardware, with all its quirks, compatibility requirements,
> sometimes not fully understood, etc.
>

There was no attempt by Gregory to improve virtio-net.

>> It's a bit as if someone found a performance problem with sys_open() and came
>> up with sys_open_v2() and claimed that he wants to work with the VFS
>> developers while not really doing so but advances sys_open_v2() all the time.
>>
> AFAIK Gregory tried for several months to work with the KVM maintainers,
> but failed at their NIH filter.
>

It was the backwards compatibility, live migration, unneeded complexity,
and scalability filters from where I sit. vbus fails on all four.

>> The main difference is that Gregory claims that improved performance is not
>> possible within the existing KVM framework, while the KVM developers disagree.
>> The good news is that this is a hard, testable fact.
>>
> Yes clearly the onus at this point is on the vhost-net developers/
> "pci is all that is ever needed for PV" proponents to show similar numbers
> with their current code.
>
> If they can show the same performance there's really no need for
> the alacrityvm model (or at least I haven't seen a convincing reason
> other than performance so far to have a separate model)
>

Anthony posted this:

http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf

See slide 32. This is without vhost-net.

--
error compiling committee.c: too many arguments to function

2009-12-23 12:14:38

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>
> See slide 32. This is without vhost-net.

Thanks. Do you also have latency numbers?

It seems like there's definitely still potential for improvement
with messages <4K. But for the large messages they indeed
look rather good.

It's unclear what message size the Alacrity numbers used, but I presume
it was rather large.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-23 12:49:31

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 02:14 PM, Andi Kleen wrote:
>> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>>
>> See slide 32. This is without vhost-net.
>>
> Thanks. Do you also have latency numbers?
>

No. Copying Chris. This was with the tx mitigation timer disabled, so
you won't see the usual atrocious userspace virtio latencies, but it
won't be as good as a host kernel implementation since we take a
heavyweight exit and qemu is pretty unoptimized.

> It seems like there's definitely still potential for improvement
> with messages<4K. But for the large messages they indeed
> look rather good.
>

There's still a lot of optimization to be done, but I hope this proves
there is nothing inherently slow about virtio.

--
error compiling committee.c: too many arguments to function

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wednesday 23 December 2009 07:51:29 am Ingo Molnar wrote:
>
> * Anthony Liguori <[email protected]> wrote:
>
> > On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
> > >>new e1000 driver is more superior in architecture and do the required
> > >>work to make the new e1000 driver a full replacement for the old one.
> > >Right, like everyone actually does things this way..
> > >
> > >I wonder why do we have OSS, old Firewire and IDE stacks still around then?
> >
> > And it's always a source of pain, isn't it.
>
> Even putting aside the fact that such overlap sucks and is a pain to users
> (and that 98% of driver and subsystem version transitions are done completely
> seemlessly to users - the examples that were cited were the odd ones out of
> 150K commits in the past 4 years, 149K+ of which are seemless), the comparison
> does not even apply really.

Total commit number has nothing to do with the issue raised since the problem
is in the total source code complexity and the need to maintain separate code
bases.

[ BTW I find your habit of bringing the completely unrelated numbers into
the discussion quite annoying. Do you really think that throwing in some
random numbers automatically increases the credibility of your opinion? ]

> e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not
> fully understood externality, with its inevitable set of compatibility voes.
> There's often situations where one piece of hardware still works better with
> the old driver, for some odd (or not so odd) reason.
>
> Also, note that the 'new' hw drivers are generally intended and are maintained
> as clear replacements for the old ones, and do so with minimal ABI changes -
> or preferably with no ABI changes at all. Most driver developers just switch
> from old to new and the old bits are left around and are phased out. We phased
> out old OSS recently.

'We' as Fedora?

old OSS stuff is still there (sound/oss/ which is almost 45KLOC and would
be much more if not for past efforts from Adrian Bunk to shrink it down)

Besides, the 'phase out' that you are talking about comes down to just waiting
for the old hardware/user base to die, and it takes years to accomplish..

I can understand how this is not an issue from e.g. Red Hat's POV when you
have one 'set in stone' set of drivers in RHEL and the other 'constant
development flux' one in Fedora (which because of this fact can no
longer be considered a real distribution for real users BTW) but for everybody
else this is simply untrue.

> That is a very different situation from the AlacrityVM patches, which:
>
> - Are a pure software concept and any compatibility mismatch is
> self-inflicted. The patches are in fact breaking the ABI to KVM
> intentionally (for better or worse).

Care to explain the 'breakage' and why KVM is more special in this regard
than other parts of the kernel (where we don't keep any such requirements)?

Truth be told, KVM is just another driver/subsystem and Gregory's changes
are only 4KLOC of clean and easily maintainable code..

> - Gregory claims that the AlacricityVM patches are not a replacement for KVM.
> I.e. there's no intention to phase out any 'old stuff' and it splits the
> pool of driver developers.

Talk about double standards. It was you & co. that officially legitimized this
style of doing things, and now you are complaining about it?

> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
> is, if AlacricityVM is better, and KVM developers are not willing to fix their
> stuff, replace KVM with it.
>
> It's a bit as if someone found a performance problem with sys_open() and came
> up with sys_open_v2() and claimed that he wants to work with the VFS
> developers while not really doing so but advances sys_open_v2() all the time.
>
> Do we allow sys_open_v2() upstream, in the name of openness and diversity,
> letting some apps use that syscall while other apps still use sys_open()? Or
> do we say "enough is enough of this stupidity, come up with some strong
> reasons to replace sys_open, and if so, replace the thing and be done with the
> pain!".

I certainly missed the time when KVM became officially part of core ABI..

> Overlap and forking can still be done in special circumstances, when a project
> splits and a hostile fork is inevitable due to prolongued and irreconcilable
> differences between the parties and if there's no strong technical advantage
> on either side. I havent seen evidence of this yet though: Gregory claims that
> he wants to 'work with the community' and the KVM guys seem to agree violently
> that performance can be improved - and are doing so (and are asking Gregory to
> take part in that effort).

How is it different from any past forks?

The onus of proving that the existing framework is sufficient was always on
the original authors or current maintainers.

The KVM guys were offered assistance from Gregory and had a few months to prove
that they could get the same kind of performance using the existing
architecture, and they DID NOT do it.

> The main difference is that Gregory claims that improved performance is not
> possible within the existing KVM framework, while the KVM developers disagree.
>
> The good news is that this is a hard, testable fact.
>
> I think we should try _much_ harder before giving up and forking the ABI of a
> healthy project and intentionally inflicting pain on our users.

Then please try harder. Gregory posted his initial patches in August,
it is December now and we only see artificial road-blocks instead of code
from KVM folks.

> And, at minimum, such kinds of things _have to_ be pointed out in pull
> requests, because it's like utterly important. In fact i couldnt list any more
> important thing to point out in a pull request.

I think that this part should be easily fixable..

--
Bartlomiej Zolnierkiewicz

2009-12-23 13:31:34

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 03:07 PM, Bartlomiej Zolnierkiewicz wrote:
>
>> That is a very different situation from the AlacrityVM patches, which:
>>
>> - Are a pure software concept and any compatibility mismatch is
>> self-inflicted. The patches are in fact breaking the ABI to KVM
>> intentionally (for better or worse).
>>
> Care to explain the 'breakage' and why KVM is more special in this regard
> than other parts of the kernel (where we don't keep any such requirements)?
>

The device model is exposed to the guest. If you change it, the guest
breaks.

So we have two options:
- phase out virtio, users don't see new improvements, ask them to
change to vbus/venet
- maintain the two in parallel

Neither appeals to me.

> Truth to be told KVM is just another driver/subsystem and Gregory's changes
> are only 4KLOC of clean and easily maintainable code..
>

This 4K is only the beginning. There are five more virtio drivers, plus
features in virtio-net not ported to venet, plus the host support, plus
qemu support, plus Windows drivers, plus adapters for non-pci (lguest
and s390), plus live migration support. vbus itself still has scaling
issues.

Virtio was under development for years. Sure you can focus on one
dimension only (performance) and get good results but real life is more
complicated.

> I certainly missed the time when KVM became officially part of core ABI..
>

It's more akin to the hardware interface. We don't change the hardware
underneath the guest.

>> Overlap and forking can still be done in special circumstances, when a project
>> splits and a hostile fork is inevitable due to prolongued and irreconcilable
>> differences between the parties and if there's no strong technical advantage
>> on either side. I havent seen evidence of this yet though: Gregory claims that
>> he wants to 'work with the community' and the KVM guys seem to agree violently
>> that performance can be improved - and are doing so (and are asking Gregory to
>> take part in that effort).
>>
> How it is different from any past forks?
>
> The odium of proving that the existing framework is sufficient was always on
> original authors or current maintainers.
>
> KVM guys were offered assistance from Gregory and had few months to prove that
> they can get the same kind of performance using existing architecture and they
> DID NOT do it.
>

Look at the results from Chris Wright's presentation. Hopefully in a
few days some results from vhost-net.

> Then please try harder. Gregory posted his initial patches in August,
> it is December now and we only see artificial road-blocks instead of code
> from KVM folks.
>

What artificial road blocks?

--
error compiling committee.c: too many arguments to function

Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wednesday 23 December 2009 02:31:11 pm Avi Kivity wrote:
> On 12/23/2009 03:07 PM, Bartlomiej Zolnierkiewicz wrote:
> >
> >> That is a very different situation from the AlacrityVM patches, which:
> >>
> >> - Are a pure software concept and any compatibility mismatch is
> >> self-inflicted. The patches are in fact breaking the ABI to KVM
> >> intentionally (for better or worse).
> >>
> > Care to explain the 'breakage' and why KVM is more special in this regard
> > than other parts of the kernel (where we don't keep any such requirements)?
> >
>
> The device model is exposed to the guest. If you change it, the guest
> breaks.

Huh? Shouldn't non-vbus aware guests continue to work just fine?

> > I certainly missed the time when KVM became officially part of core ABI..
> >
>
> It's more akin to the hardware interface. We don't change the hardware
> underneath the guest.

As far as my limited understanding of things goes, vbus is completely opt-in,
so it is like adding new real hardware to the host. Where is the problem?

--
Bartlomiej Zolnierkiewicz

2009-12-23 14:29:11

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 04:08 PM, Bartlomiej Zolnierkiewicz wrote:
>
>> The device model is exposed to the guest. If you change it, the guest
>> breaks.
>>
> Huh? Shouldn't non-vbus aware guests continue to work just fine?
>

Sure. But we aren't merging this code in order not to use it. If we
switch development focus to vbus, we have to ask everyone who's riding
on virtio to switch. Alternatively we maintain both models.

If vbus was the only way to get this kind of performance, I know what
I'd choose. But if it isn't, why inflict the change on users?

Consider a pxe-booting guest (or virtio-blk vs. a future veblk). Is
switching drivers in initrd something you want your users to do? [1]

One of the advantages of virtualization is stable hardware. I don't
want to let it go without a very good reason.

[1] I remember the move from /dev/hda to /dev/sda a few years ago, it
isn't a good memory.

--
error compiling committee.c: too many arguments to function

2009-12-23 14:57:44

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 07:07 AM, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 23 December 2009 07:51:29 am Ingo Molnar wrote:

> KVM guys were offered assistance from Gregory and had few months to prove that
> they can get the same kind of performance using existing architecture and they
> DID NOT do it.

With all due respect, there is a huge misunderstanding that's underpinning
this thread, which is that vbus is absolutely more performant than
virtio-net and that we've failed to demonstrate that we can obtain the
same level of performance in virtio-net. This is simply untrue.

In fact, within a week or so of Greg's first posting of vbus, I posted a
proof of concept patch to the virtio-net backend that got equivalent
results. But I did not feel at the time that this was the right
solution to the problem and we've been trying to do something much
better. By the same token, I don't feel that vbus is the right approach
to solving the problem.

There are really three factors that affect networking performance in a
virtual environment: the number of copies of the data, the number of
exits required per packet transmitted, and the cost of each exit.

The "poor" packet latency of virtio-net is a result of the fact that we
do software timer based TX mitigation. We do this such that we can
decrease the number of exits per-packet and increase throughput. We set
a timer for 250ms and per-packet latency will be at least that much.

We have to use a timer for the userspace backend because the tun/tap
device is rather quick to queue a packet which means that we get no
feedback that we can use to trigger TX mitigation.
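
In rough, illustrative pseudo-C (all of these names are invented; this
is not qemu's actual code), the userspace flow is:

#include <stdbool.h>
#include <stddef.h>

struct backend;         /* opaque: tap fd, tx ring, timer, ... */

bool  backend_timer_armed(struct backend *be);
void  backend_arm_timer(struct backend *be);    /* fires after the mitigation delay */
void  backend_disarm_timer(struct backend *be);
void *backend_pop_tx(struct backend *be, size_t *len);
void  backend_send_to_tap(struct backend *be, void *buf, size_t len);

/* Guest kicked us: do not flush yet, just make sure the timer is armed.
 * This is the step that trades per-packet latency for fewer exits. */
static void on_guest_tx_notify(struct backend *be)
{
        if (!backend_timer_armed(be))
                backend_arm_timer(be);
}

/* Timer fired: flush everything that accumulated, in one batch. */
static void on_tx_timer(struct backend *be)
{
        size_t len;
        void *buf;

        while ((buf = backend_pop_tx(be, &len)) != NULL)
                backend_send_to_tap(be, buf, len);

        backend_disarm_timer(be);
}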

vbus works around this by introducing a transmit and receive thread and
relies on the time it takes to schedule those threads to do TX
mitigation. The version of KVM in RHEL5.4 does the same thing. How
effective this is depends on a lot of factors including the overall
system load, the default time slice length, etc.

This tends to look really good when you're trying to drive line speed
but it absolutely sucks when you're looking at the CPU cost of low
packet rates. IOW, this is a heuristic that looks really good when
doing netperf TCP_RR and TCP_STREAM, but it starts to look really bad
when doing things like partial load CPU usage comparisons with other
hypervisors.

vhost-net takes a different, IMHO superior, approach in that it
associates with some type of network device (tun/tap or physical device)
and uses the device's transmit interface to determine how to mitigate
packets. This means that we can potentially get to the point where
instead of relying on short timeouts to do TX mitigation, we can use the
underlying physical device's packet processing state which will provide
better results in most circumstances.

N.B. using a separate thread for transmit mitigation looks really good
on benchmarks because when doing a simple ping test, you'll see very
short latencies because you're not batching at all. It's somewhat
artificial in this regard.

With respect to number of copies, vbus up until recently had the same
number of copies as virtio-net. Greg has been working on zero-copy
transmit, which is great stuff, but Rusty Russell had done the same
thing with virtio-net and tun/tap. There are some hidden nasties when
using skb destructors to achieve this and I think the feeling was this
wasn't going to work. Hopefully, Greg has better luck but suffice to
say, we've definitely demonstrated this before with virtio-net. If the
issues around skb destruction can be resolved, we can incorporate this
into tun/tap (and therefore, use it in virtio) very easily.
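
Roughly, the skb-destructor trick looks like the sketch below; the helper
name is made up and this is not the actual virtio-net or vbus patch, just
the shape of the idea:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical helper: unpin the guest pages, post a tx-complete event. */
extern void guest_tx_complete(struct sk_buff *skb);

static void zc_skb_destructor(struct sk_buff *skb)
{
        /* Runs when the skb is finally freed (e.g. after the physical
         * NIC's DMA completed).  Only now is it safe to release the
         * guest buffer and tell the guest its descriptor is done. */
        guest_tx_complete(skb);
}

static int zc_xmit(struct sk_buff *skb)
{
        /* The skb frags point straight at pinned guest pages, so no
         * copy happened on the way here.  The "hidden nasties": the
         * skb may be cloned, or skb_orphan() may clear ->destructor
         * long before the data has really been consumed. */
        skb->destructor = zc_skb_destructor;
        return dev_queue_xmit(skb);
}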

In terms of the cost per exit, the main advantage vbus had over
virtio-net was that virtio-net's userspace backend was in userspace
which required a heavy-weight exit which is a few times more expensive
than a lightweight exit. We've addressed this with vhost-net which
implements the backend in the kernel. Originally, vbus was able to do
edge triggered interrupts whereas virtio-pci was using level triggered
interrupts. We've since implemented MSI-X support (already merged
upstream) and we now can also do edge triggered interrupts with virtio.

The only remaining difference is the fact that vbus can mitigate exits
due to EOI's in the virtual APIC because it relies on a paravirtual
interrupt controller.

This is rather controversial for a few reasons. The first is that there
is absolutely no way that a paravirtual interrupt controller would work
for Windows, older Linux guests, or probably any non-Linux guest. As a
design point, this is a big problem for KVM. We've seen the struggle
with this sort of thing with Xen. The second is that it's very likely
that this problem will go away on its own either because we'll rely on
x2apic (which will eventually work with Windows) or we'll see better
hardware support for eoi shadowing (there is already hardware support
for tpr shadowing). Most importantly though, it's unclear how much EOI
mitigation actually matters. Since we don't know how much of a win this
is, we have no way of evaluating whether it's even worth doing in the
first place.

At any rate, a paravirtual interrupt controller is entirely orthogonal
to a paravirtual IO model. You could use a paravirtual interrupt
controller with virtio and KVM as well as you could use it with vbus.
In fact, if that bit was split out of vbus and considered separately,
then I don't think there would be objections to it in principle
(although Avi has some scalability concerns with the current
implementation).

vbus also uses hypercalls instead of PIO. I think we've established
pretty concretely that the two are almost identical though from a
performance perspective. We could easily use hypercalls with virtio-pci
but our understanding is that the difference in performance would be
lost in the noise.
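
To make the comparison concrete, the two guest-side notification ("kick")
paths look roughly like this; the hypercall number below is made up for the
sketch:

#include <linux/io.h>
#include <linux/virtio_pci.h>
#include <asm/kvm_para.h>

/* Legacy virtio-pci kick: a single 16-bit PIO write to the notify
 * register in the device's I/O BAR.  One PIO exit per notification. */
static void kick_pio(void __iomem *ioaddr, u16 queue_index)
{
        iowrite16(queue_index, ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}

/* vbus-style kick: a hypercall carrying the shm-signal id.
 * HC_SHMSIGNAL is a made-up number for this sketch.  Either way the
 * guest takes one VM exit of roughly the same cost. */
#define HC_SHMSIGNAL 42

static void kick_hypercall(unsigned long signal_id)
{
        kvm_hypercall1(HC_SHMSIGNAL, signal_id);
}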

Then there's an awful lot of other things that vbus does differently
but AFAICT, none of them have any impact on performance whatsoever. The
shared memory abstraction is at a different level. virtio models
something of a bulk memory transfer API whereas vbus models a shared
memory API. Bulk memory transfer was chosen for virtio in order to
support hypervisors like Xen that aren't capable of doing robust shared
memory and instead rely on either page flipping or a fixed sharing pool
that often requires copying into or out of that pool.

vbus has a very different discovery mechanism that is more akin to Xen's
paravirtual I/O mechanism. virtio has no baked-in concept of discovery,
although we most commonly piggyback off of PCI for discovery. The way
devices are created and managed is very different in vbus. vbus also
has some provisions in it to support non-virtualized environments. I
think virtio is fundamentally capable of that but it's not a design
point for virtio.

We could take any of these other differences, and have a discussion about
whether it makes sense to introduce such a thing in virtio or what the
use cases are for that. I don't think Greg is really interested in
that. I think he wants all of vbus or nothing at all. I don't see the
point of having multiple I/O models supported in upstream Linux though
or in upstream KVM. It's bad for users and it splits development effort.

Greg, if there are other things that you think come into play with
respect to performance, please do speak up. This is the best that
"google" is able to answer my questions ;-)

Regards,

Anthony Liguori

2009-12-23 15:04:09

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 06:14 AM, Andi Kleen wrote:
>> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>>
>> See slide 32. This is without vhost-net.
>
> Thanks. Do you also have latency numbers?

They'll be along the lines of the vbus numbers.

But I caution people from relying too much on just netperf TCP_RR and
TCP_STREAM numbers. There's a lot of heuristics in play in getting this
sort of numbers. They really aren't good ways to compare different drivers.

A better thing to do is look more deeply at the architectures and
consider things like the amount of copying imposed, the cost of
processing an exit, and the mechanisms for batching packet transmissions.

The real argument that vbus needs to make IMHO is not "look how much
better my netperf TCP_STREAM results are" but "we can eliminate N copies
from the transmit path and virtio-net cannot" or "we require N exits to
handle a submission and virtio-net requires N+M".

It's too easy to tweak for benchmarks.

Regards,

Anthony Liguori

2009-12-23 15:09:31

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 12:15 AM, Kyle Moffett wrote:
> This is actually something that is of particular interest to me. I
> have a few prototype boards right now with programmable PCI-E
> host/device links on them; one of my long-term plans is to finagle
> vbus into providing multiple "virtual" devices across that single
> PCI-E interface.
>
> Specifically, I want to be able to provide virtual NIC(s), serial
> ports and serial consoles, virtual block storage, and possibly other
> kinds of interfaces. My big problem with existing virtio right now
> (although I would be happy to be proven wrong) is that it seems to
> need some sort of out-of-band communication channel for setting up
> devices, not to mention it seems to need one PCI device per virtual
> device.

We've been thinking about doing a virtio-over-IP mechanism such that you
could remote the entire virtio bus to a separate physical machine.
virtio-over-IB is probably more interesting since you can make use of
RDMA. virtio-over-PCI-e would work just as well.

virtio is a layered architecture. Device enumeration/discovery sits at
a lower level than the actual device ABIs. The device ABIs are
implemented on top of a bulk data transfer API. The reason for this
layering is so that we can reuse PCI as an enumeration/discovery
mechanism. This tremendously simplifies porting drivers to other OSes
and lets us use PCI hotplug automatically. We get integration into all
the fancy userspace hotplug support for free.

But both virtio-lguest and virtio-s390 use in-band enumeration and
discovery since they do not have support for PCI on either platform.
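
To illustrate the layering, a guest driver never sees the transport at all;
it just matches on a device id and gets probed by whichever bus did the
discovery.  A bare (non-functional) skeleton:

#include <linux/module.h>
#include <linux/virtio.h>
#include <linux/virtio_ids.h>

static struct virtio_device_id skel_id_table[] = {
        { VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID },
        { 0 },
};

static int skel_probe(struct virtio_device *vdev)
{
        /* find_vqs(), feature negotiation, netdev registration ... */
        return 0;
}

static void skel_remove(struct virtio_device *vdev)
{
        /* tear down the vqs, unregister the netdev ... */
}

static struct virtio_driver skel_driver = {
        .driver.name  = "virtio-skel",
        .driver.owner = THIS_MODULE,
        .id_table     = skel_id_table,
        .probe        = skel_probe,
        .remove       = skel_remove,
};

static int __init skel_init(void)
{
        return register_virtio_driver(&skel_driver);
}
module_init(skel_init);

static void __exit skel_exit(void)
{
        unregister_virtio_driver(&skel_driver);
}
module_exit(skel_exit);

MODULE_LICENSE("GPL");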

Regards,

Anthony Liguori

2009-12-23 15:12:33

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/22/2009 06:02 PM, Chris Wright wrote:
> * Anthony Liguori ([email protected]) wrote:
>> The
>> virtio-net setup probably made extensive use of pinning and other tricks
>> to make things faster than a normal user would see them. It ends up
>> creating a perfect combination of batching which is pretty much just
>> cooking the mitigation schemes to do extremely well for one benchmark.
>
> Just pinning, the rest is stock virtio features like mergeable rx buffers,
> GRO, GSO (tx mitigation is actually disabled).

Technically, tx mitigation isn't disabled. The heuristic is changed
such that instead of relying on a fixed timer, tx notification is
disabled until you can switch to another thread and process packets.

The effect is that depending on time slice length and system load, you
adaptively enable tx mitigation. It's heavily dependent on the
particulars of the system and the overall load.
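
In ring terms, the thread-based scheme boils down to toggling the no-notify
flag while the host thread is running.  A rough sketch only (not the actual
vhost-net or venet-tap code; memory barriers and the drain loop omitted):

#include <linux/virtio_ring.h>

static void tx_thread_work(struct vring *vr)
{
        /* Tell the guest not to kick again while we are running. */
        vr->used->flags |= VRING_USED_F_NO_NOTIFY;

        /* drain the avail ring into the tap/physical device ... */

        /* Re-enable notifications; recheck for anything that raced in. */
        vr->used->flags &= ~VRING_USED_F_NO_NOTIFY;
}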

For instance, this mitigation scheme looks great at high throughputs but
looks very bad at mid-to-low throughputs compared to timer based
mitigation (at least, when comparing CPU cost).

Regards,

Anthony Liguori

2009-12-23 15:18:31

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Anthony Liguori ([email protected]) wrote:
> On 12/22/2009 06:02 PM, Chris Wright wrote:
>> * Anthony Liguori ([email protected]) wrote:
>>> The
>>> virtio-net setup probably made extensive use of pinning and other tricks
>>> to make things faster than a normal user would see them. It ends up
>>> creating a perfect combination of batching which is pretty much just
>>> cooking the mitigation schemes to do extremely well for one benchmark.
>>
>> Just pinning, the rest is stock virtio features like mergeable rx buffers,
>> GRO, GSO (tx mitigation is actually disabled).
>
> Technically, tx mitigation isn't disabled. The heuristic is changed
> such that instead of relying on a fixed timer, tx notification is
> disabled until you can switch to another thread and process packets.
>
> The effect is that depending on time slice length and system load, you
> adaptively enable tx mitigation. It's heavily dependent on the
> particulars of the system and the overall load.
>
> For instance, this mitigation scheme looks great at high throughputs but
> looks very bad at mid-to-low throughputs compared to timer based
> mitigation (at least, when comparing CPU cost).

Yep, you're right.

thanks,
-chris

2009-12-23 16:44:49

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 1:51 AM, Ingo Molnar wrote:
>
> * Anthony Liguori <[email protected]> wrote:
>
>> On 12/22/2009 10:01 AM, Bartlomiej Zolnierkiewicz wrote:
>>>> new e1000 driver is more superior in architecture and do the required
>>>> work to make the new e1000 driver a full replacement for the old one.
>>> Right, like everyone actually does things this way..
>>>
>>> I wonder why do we have OSS, old Firewire and IDE stacks still around then?
>>
>> And it's always a source of pain, isn't it.
>
> Even putting aside the fact that such overlap sucks and is a pain to users
> (and that 98% of driver and subsystem version transitions are done completely
> seemlessly to users - the examples that were cited were the odd ones out of
> 150K commits in the past 4 years, 149K+ of which are seemless), the comparison
> does not even apply really.
>
> e1000, OSS, old Firewire and IDE are hardware stacks, where hardware is a not
> fully understood externality, with its inevitable set of compatibility voes.
> There's often situations where one piece of hardware still works better with
> the old driver, for some odd (or not so odd) reason.
>
> Also, note that the 'new' hw drivers are generally intended and are maintained
> as clear replacements for the old ones, and do so with minimal ABI changes -
> or preferably with no ABI changes at all. Most driver developers just switch
> from old to new and the old bits are left around and are phased out. We phased
> out old OSS recently.
>
> That is a very different situation from the AlacrityVM patches, which:
>
> - Are a pure software concept

By design. In fact, I would describe it as "software to software
optimized" as opposed to trying to shoehorn into something that was
designed as a software-to-hardware interface (and therefore has
assumptions about the constraints in that environment that are not
applicable in software-only).

> and any compatibility mismatch is self-inflicted.

.. because the old model is not great for the intended use cases and has
issues. I've already covered the reasons why ad nauseam.

> The patches are in fact breaking the ABI to KVM intentionally (for better or worse).

No, at the very worst they are _augmenting_ the ABI, as evident from the
fact that AlacrityVM is a superset of the entire KVM ABI.

>
> - Gregory claims that the AlacricityVM patches are not a replacement for KVM.
> I.e. there's no intention to phase out any 'old stuff'

There's no reason to phase anything out, except perhaps the virtio-pci
transport. This is one more transport, plugging into virtio underneath
(just like virtio-pci, virtio-lguest, and virtio-s390). I am not even
suggesting that the old transport has to go away, per se. It is the KVM
maintainers who insist on it being all or nothing. For me, I do not see
the big deal in having one more "model" option in the qemu cmd-line, but
that is just my opinion. If the maintainers were really so adamant that
choice is pure evil, I am not sure why we don't see patches for removing
everything but one model type in each IO category. But I digress.

> and it splits the pool of driver developers.

..it is these dumb threads that are splitting driver developers with
ignorant statements, irrelevant numbers, and dubious "facts". I
actually tried many many times to ask others to join the effort, and
instead _they_ forked off and made vhost-net with a "sorry, not
interested in working with you"(and based the design largely on the
ideas proposed in my framework, I might add). Thats fine, and it's
their prerogative. I can easily maintain my project out of tree if
upstream is not interested. But do not turn around and try to blame me
for the current state of affairs.

>
> i.e. it has all the makings of a stupid, avoidable, permanent fork. The thing
> is, if AlacricityVM is better, and KVM developers are not willing to fix their
> stuff, replace KVM with it.
>
> It's a bit as if someone found a performance problem with sys_open() and came
> up with sys_open_v2() and claimed that he wants to work with the VFS
> developers while not really doing so but advances sys_open_v2() all the time.

No, it's more like if I suggested sys_open_vbus() to augment
sys_open_pci(), sys_open_lguest(), sys_open_s390() because our
fictitious glibc natively supported modular open() implementations. And
I tried to work for a very long time convincing the VFS developers that
this new transport was a good idea to add because it was optimized for
the problem space, made it easy for API callers to reuse the important
elements of the design (e.g. built in "tx-mitigation" instead of waiting
for someone to write it for each device), had new features like the
ability to prioritize signals, create/route signal paths arbitrarily,
implement raw shared memory for idempotent updates, and didn't require
the massive and useless PCI/APIC emulation logic to function like
sys_open_pci() (e.g. easier to port around).

Ultimately, the "VFS developers" said "I know we let other transports in
in the past, but now all transports must be sys_open_pci() based going
forward". Game over, since sys_open_pci cannot support the features I
need, and/or it makes incredibly easy things complex when they don't
need to be, so it's a poor choice.

>
> Do we allow sys_open_v2() upstream, in the name of openness and diversity,
> letting some apps use that syscall while other apps still use sys_open()?

s/sys_open_v2/sys_open_vbus to portray it accurately, and sure, why not?
There is plenty of precedent already. It's just the top-edge IO ABI.
You can choose realtek, e1000, virtio-net 802.x ABIs today for instance.
This is one more, and despite attempts at painting it duplicative, it
is indeed an evolutionary upgrade IMO especially when you glance beyond
the 802.x world and look at the actual device model presented.

And its moot, anyway, as I have already retracted my one outstanding
pull request based on Linus' observation. So at this time, I am not
advocating _anything_ for upstream inclusion. And I am contemplating
_never_ doing so again. It's not worth _this_.

> Or do we say "enough is enough of this stupidity,

I certainly agree that this thread has indeed introduced a significant
degree of stupidity, yes.

> come up with some strong
> reasons to replace sys_open, and if so, replace the thing and be done with the
> pain!".
>

I am open to this, but powerless to control the decision in the upstream
variant other than to describe what I did, and rebut FUD against it to
make sure the record is straight.


> Overlap and forking can still be done in special circumstances, when a project
> splits and a hostile fork is inevitable due to prolongued and irreconcilable
> differences between the parties

You are certainly a contributing factor in pushing things in that direction.

> and if there's no strong technical advantage
> on either side. I havent seen evidence of this yet though: Gregory
claims that
> he wants to 'work with the community'

Well, I sincerely did in the beginning in the spirit of FOSS. I have to
admit that the desire is constantly eroded after dealing with the likes
of you. So if I have seemed more standoffish as of late, that is the
reason. If that was your goal, congratulations: You have irritated me
into submission. And no, I don't expect you to care.

> and the KVM guys seem to agree violently
> that performance can be improved - and are doing so (and are asking Gregory to
> take part in that effort).

And as I indicated to you in my first reply to this thread: the
performance aspects are but one goal here. Some of the performance
aspects cannot be achieved with their approach (like EOI mitigation as
an example), and some of the other feature based aspects cannot be
achieved either (interrupt priority, dynamic signals, etc). That is why
the calls to unify behind virtio-pci have gone unanswered by me: That
approach is orthogonal to the vbus project goals. Their ability to
understand or agree with that difference has no bearing on whether there
is any technical merit here. I think this is what you are failing to grasp.

There will be people that will say "Well, we can do a PV-APIC and get
EOI mitigation in PCI too". THAT IS WHAT VBUS IS FOR!!! Implementing
linux-kernel backed, shared-memory, high performance devices. Something
like a shared-memory based interrupt controller would be exactly the
kind of thing I envision here. We can also do other things, like high
performance timers, scheduler coordinators, etc. I don't know how many
different ways to describe it in a way that will be understood. I
started with 802.x because its easy to show the performance gains. If I
knew that the entire community would get bent around the axle on 802.x
when I started, I never would have broached the subject like this. C'est
la vie.

>
> The main difference is that Gregory claims that improved performance is not
> possible within the existing KVM framework, while the KVM developers disagree.
> The good news is that this is a hard, testable fact.

Yes, I encourage a bakeoff and have code/instructions available to
anyone interested. I also encourage people to think about the other
facilities that are being introduced in addition to performance
enhancements for simple 802.x, or even KVM. This is about building a
modular framework that encompasses both sides of the links (guest AND
host), and implements "best practices" for optimized PV IO ingrained in
its DNA. It tries to do this in such a way that we don't need to write
new backends for every environment that comes along, or rely on
unnecessary emulation layers (PCI/APIC) to achieve it. It's about
extending Linux as an "io-visor", much as it is for userspace apps, for any
environment, using a tried and true shared-memory based approach.

>
> I think we should try _much_ harder before giving up and forking the ABI of a
> healthy project and intentionally inflicting pain on our users.
>
> And, at minimum, such kinds of things _have to_ be pointed out in pull
> requests, because it's like utterly important. In fact i couldnt list any more
> important thing to point out in a pull request.

Mea Culpa. Since I've already established that the pull request didn't
directly relate to the controversy, I didn't think to mention that at
the time. These were just a few more drivers to join the ranks of 1000s
more within Linux. In retrospect, I probably should have so I apologize
for that. It was my first pull request to Linus, so I was bound to
screw something up.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 17:00:59

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Avi Kivity ([email protected]) wrote:
> On 12/23/2009 02:14 PM, Andi Kleen wrote:
> >>http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
> >>
> >>See slide 32. This is without vhost-net.
> >Thanks. Do you also have latency numbers?
>
> No. Copying Chris. This was with the tx mitigation timer disabled,
> so you won't see the usual atrocious userspace virtio latencies, but
> it won't be as good as a host kernel implementation since we take a
> heavyweight exit and qemu is pretty unoptimized.

Those numbers don't show cpu cycles per packet nor do they show latencies.
You won't see the timer based latency, because the tx mitigation scheme
is not timer based for those numbers. Below are some numbers comparing
bare metal, an assigned device, and virtio (not vhost-net, so we are still
doing a heavy-weight exit to qemu and syscalls to deliver to tap device).

> >It seems like there's definitely still potential for improvement
> >with messages<4K. But for the large messages they indeed
> >look rather good.

You are misreading the graph. At <4K it is tracking bare metal (the
green and yellow lines are bare metal, the red and blue bars are virtio).
At >4k we start to drop off (esp. on RX).

This (slide 9) shows AMQP latencies for bare metal, an assigned device,
and virtio.
http://www.redhat.com/f/pdf/summit/bche_320_red_hat_enterprise_mrg.pdf

Similarly, here's some much rawer latency numbers from netpipe, all done
in usecs.

                 bare metal     assigned PCI NIC     virtio
                 (usecs)        (usecs)              (usecs)
                 ----------     ----------------     ------
1 bytes 22.20 36.16 53.19
2 bytes 22.21 35.98 53.23
3 bytes 22.22 36.18 53.29
4 bytes 22.25 33.77 53.43
6 bytes 22.33 36.33 53.48
8 bytes 22.32 36.24 53.27
12 bytes 22.25 35.97 53.33
13 bytes 22.40 35.94 53.54
16 bytes 22.36 35.98 53.60
19 bytes 22.40 35.95 53.51
21 bytes 22.42 35.94 53.76
24 bytes 22.32 36.18 53.45
27 bytes 22.34 36.08 53.48
29 bytes 22.36 36.02 53.42
32 bytes 22.46 36.15 53.23
35 bytes 22.36 36.23 53.13
45 bytes 26.32 36.17 53.29
48 bytes 26.24 35.94 53.50
51 bytes 26.44 36.01 53.66
61 bytes 26.43 33.66 53.28
64 bytes 26.66 36.32 53.17
67 bytes 26.35 36.21 53.53
93 bytes 26.59 36.49 45.75
96 bytes 26.48 36.28 45.72
99 bytes 26.51 36.47 45.72
125 bytes 26.74 36.48 45.99
128 bytes 26.44 36.52 45.69
131 bytes 26.52 35.71 45.80
189 bytes 26.77 36.99 46.78
192 bytes 26.96 37.45 47.00
195 bytes 26.96 37.45 47.10
253 bytes 27.01 38.03 47.36
256 bytes 27.09 37.85 47.23
259 bytes 26.98 37.82 47.28
381 bytes 26.61 38.38 47.84
384 bytes 26.72 38.54 48.01
387 bytes 26.76 38.65 47.80
509 bytes 25.13 39.19 48.30
512 bytes 25.13 36.69 56.05
515 bytes 25.15 37.42 55.70
765 bytes 25.29 40.31 57.26
768 bytes 25.25 39.76 57.32
771 bytes 25.26 40.33 57.06
1021 bytes 49.27 57.00 63.73
1024 bytes 49.33 57.09 63.70
1027 bytes 49.07 57.25 63.70
1533 bytes 50.11 58.98 70.57
1536 bytes 50.09 59.30 70.22
1539 bytes 50.18 59.27 70.35
2045 bytes 50.44 59.42 74.31
2048 bytes 50.33 59.29 75.31
2051 bytes 50.32 59.14 74.02
3069 bytes 62.71 64.20 96.87
3072 bytes 62.78 64.94 96.84
3075 bytes 62.83 65.13 96.62
4093 bytes 62.56 64.78 99.63
4096 bytes 62.46 65.04 99.54
4099 bytes 62.47 65.87 99.65
6141 bytes 63.35 65.39 104.03
6144 bytes 63.59 66.16 104.66
6147 bytes 63.74 66.04 104.61
8189 bytes 63.65 66.52 107.75
8192 bytes 63.64 66.71 108.17
8195 bytes 63.66 67.08 109.11
12285 bytes 63.26 84.58 114.13
12288 bytes 63.28 85.38 114.55
12291 bytes 63.22 83.71 114.40
16381 bytes 62.87 98.19 120.48
16384 bytes 63.12 97.96 122.19
16387 bytes 63.48 98.48 121.68
24573 bytes 93.26 108.93 152.67
24576 bytes 94.40 109.42 152.14
24579 bytes 93.37 108.86 153.51
32765 bytes 102.84 115.46 169.04
32768 bytes 100.01 114.62 166.19
32771 bytes 102.61 115.97 167.96
49149 bytes 125.46 144.78 209.99
49152 bytes 123.76 139.70 187.17
49155 bytes 125.13 137.97 185.44

2009-12-23 17:10:56

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> And its moot, anyway, as I have already retracted my one outstanding
> pull request based on Linus' observation. So at this time, I am not
> advocating _anything_ for upstream inclusion. And I am contemplating
> _never_ doing so again. It's not worth _this_.

That certainly sounds like the wrong reaction. Out of tree drivers
are typically a pain to use.

And upstream submission is not always like this!

-Andi

2009-12-23 17:17:56

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 12:10 PM, Andi Kleen wrote:
>> And its moot, anyway, as I have already retracted my one outstanding
>> pull request based on Linus' observation. So at this time, I am not
>> advocating _anything_ for upstream inclusion. And I am contemplating
>> _never_ doing so again. It's not worth _this_.
>
> That certainly sounds like the wrong reaction. Out of tree drivers
> are typically a pain to use.

Well, to Linus' point, it shouldn't go in until a critical mass of users
have expressed desire to see it in tree, which seems reasonable to me.
For the admittedly small group that are using it today, modern tools
like the opensuse-build-service ease the deployment as a KMP, so that
can suffice for now. Its actually what most of the alacrityvm community
uses today anyway (as opposed to using a merged tree in the guest)

>
> And upstream submission is not always like this!

I would think the process would come to a grinding halt if it were ;)

Thanks Andi,
-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 17:20:08

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

> > >It seems like there's definitely still potential for improvement
> > >with messages<4K. But for the large messages they indeed
> > >look rather good.
>
> You are misreading the graph. At <4K it is tracking bare metal (the
> green and yellow lines are bare metal, the red and blue bars are virtio).
> At >4k we start to drop off (esp. on RX).

I see. True.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 17:30:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33



On Wed, 23 Dec 2009, Gregory Haskins wrote:
> >
> > And upstream submission is not always like this!
>
> I would think the process would come to a grinding halt if it were ;)

Well, in all honesty, if it had been non-virtualized drivers I would just
have pulled. The pull request all looked sane, the diffstat looked clean
and non-intrusive, and I had no problems with any of that.

But the virtualization people always argue about the fifty-eleven
different ways of doing things, and unlike real drivers - where the actual
hardware places constraints on what the heck is going on - virtualization
people seem to revel in making new interfaces weekly, and tend to be only
incidentally limited by hardware (ie hardware interfaces may limit some
_details_, but seldom any higher-level arguments).

So when I see another virtualization interface, I want the virtualization
people to just argue it out amongst themselves. Thanks to the virtue of me
personally not caring one whit about virtualization, I can stand back and
just watch the fireworks.

Which is not to say that I enjoy it (I like the occasional flame-fest, but
in order to like them I need to _care_ enough to get fired up about
them!).

So I just don't want the in-fighting to take place in my tree, so I'd
rather see the fighting die out _before_ I actually pull.

You people are all crazy.

Linus

2009-12-23 17:33:45

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 12:17:48PM -0500, Gregory Haskins wrote:
> On 12/23/09 12:10 PM, Andi Kleen wrote:
> >> And its moot, anyway, as I have already retracted my one outstanding
> >> pull request based on Linus' observation. So at this time, I am not
> >> advocating _anything_ for upstream inclusion. And I am contemplating
> >> _never_ doing so again. It's not worth _this_.
> >
> > That certainly sounds like the wrong reaction. Out of tree drivers
> > are typically a pain to use.
>
> Well, to Linus' point, it shouldn't go in until a critical mass of users
> have expressed desire to see it in tree, which seems reasonable to me.
> For the admittedly small group that are using it today, modern tools
> like the opensuse-build-service ease the deployment as a KMP, so that
> can suffice for now. Its actually what most of the alacrityvm community
> uses today anyway (as opposed to using a merged tree in the guest)

It would be probably also good to have some more exhaustive data
showing any performance improvements.

Your numbers are very hard to compare to Chris' numbers and not
as comprehensive (e.g. no latencies)

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 17:34:51

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 1:15 AM, Kyle Moffett wrote:
> On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
> <[email protected]> wrote:
>> On 12/22/09 2:57 AM, Ingo Molnar wrote:
>>> * Gregory Haskins <[email protected]> wrote:
>>>> Actually, these patches have nothing to do with the KVM folks. [...]
>>>
>>> That claim is curious to me - the AlacrityVM host
>>
>> It's quite simple, really. These drivers support accessing vbus, and
>> vbus is hypervisor agnostic. In fact, vbus isn't necessarily even
>> hypervisor related. It may be used anywhere where a Linux kernel is the
>> "io backend", which includes hypervisors like AlacrityVM, but also
>> userspace apps, and interconnected physical systems as well.
>>
>> The vbus-core on the backend, and the drivers on the frontend operate
>> completely independent of the underlying hypervisor. A glue piece
>> called a "connector" ties them together, and any "hypervisor" specific
>> details are encapsulated in the connector module. In this case, the
>> connector surfaces to the guest side as a pci-bridge, so even that is
>> not hypervisor specific per se. It will work with any pci-bridge that
>> exposes a compatible ABI, which conceivably could be actual hardware.
>
> This is actually something that is of particular interest to me. I
> have a few prototype boards right now with programmable PCI-E
> host/device links on them; one of my long-term plans is to finagle
> vbus into providing multiple "virtual" devices across that single
> PCI-E interface.
>
> Specifically, I want to be able to provide virtual NIC(s), serial
> ports and serial consoles, virtual block storage, and possibly other
> kinds of interfaces. My big problem with existing virtio right now
> (although I would be happy to be proven wrong) is that it seems to
> need some sort of out-of-band communication channel for setting up
> devices, not to mention it seems to need one PCI device per virtual
> device.
>
> So I would love to be able to port something like vbus to my nify PCI
> hardware and write some backend drivers... then my PCI-E connected
> systems would dynamically provide a list of highly-efficient "virtual"
> devices to each other, with only one 4-lane PCI-E bus.

Hi Kyle,

We indeed have others that are doing something similar. I have CC'd Ira
who may be able to provide you more details. I would also point you at
the canonical example for what you would need to write to tie your
systems together. Its the "null connector", which you can find here:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/connectors/null.c;h=b6d16cb68b7e49e07528278bc9f5b73e1dac0c2f;hb=HEAD

Do not hesitate to ask any questions, though you may want to take the
conversation to the alacrityvm-devel list as to not annoy the current CC
list any further than I already have ;)

Kind Regards,
-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 18:12:33

by Peter W. Morreale

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, 2009-12-23 at 13:14 +0100, Andi Kleen wrote:
> > http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
> >
> > See slide 32. This is without vhost-net.
>
> Thanks. Do you also have latency numbers?
>
> It seems like there's definitely still potential for improvement
> with messages <4K. But for the large messages they indeed
> look rather good.
>
> It's unclear what message size the Alacrity numbers used, but I presume
> it was rather large.
>

No. It was 1500. Please see:

http://developer.novell.com/wiki/index.php/AlacrityVM/Results


Best,
-PWM


> -Andi

2009-12-23 18:15:22

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 5:22 AM, Avi Kivity wrote:

>
> There was no attempt by Gregory to improve virtio-net.

If you truly do not understand why your statement is utterly wrong at
this point in the discussion, I feel sorry for you. If you are trying
to be purposely disingenuous, you should be ashamed of yourself. In any
case, your statement is demonstrably bogus, but you should already know
this given that we talked about at least several times.

To refresh your memory: http://patchwork.kernel.org/patch/17428/

In case its not blatantly clear, which I would hope it would be to
anyone that understands the problem space: What that patch would do is
allow an unmodified virtio-net to bridge to a vbus based virtio-net
backend. (Also note that this predates vhost-net by months (the date in
that thread is 4/9/2009) in case you are next going to try to argue that
it does nothing over vhost-net).

This would mean that virtio-net would gain most of the benefits I have
been advocating (fewer exits, cheaper exits, concurrent execution, etc).
So this would very much improve virtio-net indeed, given how poorly the
current backend was performing. I tried to convince the team to help me
build it out to completion on multiple occasions, but that request was
answered with "sorry, we are doing our own thing instead". You can say
that you didn't like my approach, since that is a subjective opinion.
But to say that I didn't attempt to improve it is flat out wrong, and
I do not appreciate it.

-Greg




Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 18:17:35

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 12:52 PM, Peter W. Morreale wrote:
> On Wed, 2009-12-23 at 13:14 +0100, Andi Kleen wrote:
>>> http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
>>>
>>> See slide 32. This is without vhost-net.
>>
>> Thanks. Do you also have latency numbers?
>>
>> It seems like there's definitely still potential for improvement
>> with messages <4K. But for the large messages they indeed
>> look rather good.
>>
>> It's unclear what message size the Alacrity numbers used, but I presume
>> it was rather large.
>>
>
> No. It was 1500. Please see:
>
> http://developer.novell.com/wiki/index.php/AlacrityVM/Results
>

Note: 1500 was the L2 MTU, not necessarily the L3/L4 size which was
probably much larger (though I do not recall what exactly atm).

-Greg



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-23 18:23:39

by Chris Wright

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Peter W. Morreale ([email protected]) wrote:
> On Wed, 2009-12-23 at 13:14 +0100, Andi Kleen wrote:
> > > http://www.redhat.com/f/pdf/summit/cwright_11_open_source_virt.pdf
> > >
> > > See slide 32. This is without vhost-net.
> >
> > Thanks. Do you also have latency numbers?
> >
> > It seems like there's definitely still potential for improvement
> > with messages <4K. But for the large messages they indeed
> > look rather good.
> >
> > It's unclear what message size the Alacrity numbers used, but I presume
> > it was rather large.
>
> No. It was 1500. Please see:
>
> http://developer.novell.com/wiki/index.php/AlacrityVM/Results

That's just MTU. Not the message size. We can infer the message size by
the bare metal results (reasonably large), but is helpful to record that.

thanks,
-chris

2009-12-23 18:34:53

by Chris Wright

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

* Anthony Liguori ([email protected]) wrote:
> The "poor" packet latency of virtio-net is a result of the fact that we
> do software timer based TX mitigation. We do this such that we can
> decrease the number of exits per-packet and increase throughput. We set
> a timer for 250ms and per-packet latency will be at least that much.

Actually that's 150us ;-) It's the AlacrityVM numbers that show 250us
(note micro, not milli) for latency. That makes sense, shave off 150us
for the timer and you're left w/ 100us, which is not substantially
slower than what we see (for that bare metal latency we see ~60us)
when we switched tx mitigation schemes from timer based to thread
scheduling. Quite similar to the 56.8us that vbus/venet shows.

thanks,
-chris

2009-12-23 18:52:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


* Andi Kleen <[email protected]> wrote:

> > - Are a pure software concept and any compatibility mismatch is
> > self-inflicted. The patches are in fact breaking the ABI to KVM
>
> In practice, especially considering older kernel releases, VMs behave like
> hardware, with all its quirks, compatibility requirements, sometimes not
> fully understood, etc.

I stopped reading your reply here. That's not actually fully true of KVM, at
all.

Virtualization isn't voodoo magic with some hidden sauce in some magic hardware
component that no-one can understand fully. This isn't some mystic hardware
vendor coming up with some code and going away in the next quarter, with
barely anything documented and thousands of users left with hardware
components which we need to support under Linux somehow.

This is Linux virtualization, where _both_ the host and the guest source code
is fully known, and bugs (if any) can be found with a high degree of
determinism. This is Linux where the players don't just vanish overnight, and
are expected to do a proper job.

Yes, there's (obviously) compatibility requirements and artifacts and past
mistakes (as with any software interface), but you need to admit it to
yourself that your "virtualization is sloppy just like hardware" claim is just
a cheap excuse to not do a proper job of interface engineering.

Thanks,

Ingo

2009-12-23 19:27:14

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

Ingo Molnar <[email protected]> writes:

> Yes, there's (obviously) compatibility requirements and artifacts and past
> mistakes (as with any software interface), but you need to admit it to

Yes that's exactly what I meant.

> yourself that your "virtualization is sloppy just like hardware" claim is just

In my experience hardware is a lot less sloppy than software.
Imagine your latest CPU had as many regressions as 2.6.32 @)

I wish software and even VMs were as good.

> a cheap excuse to not do a proper job of interface engineering.

Past mistakes cannot be easily fixed. And undoubtedly even the new
shiny interfaces will have bugs and problems. Also the behaviour is
often not completely understood. Maybe it can be easier debugged with
fully available source, but even then it's hard to fix the old
software (or rather even if you can fix it deploy the fixes). In
that regard it's a lot like hardware.

I agree with you that this makes it important to design good
interfaces, but again realistically mistakes will be made
and they cannot be all fixed retroactively.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-23 19:28:13

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 12:34:44PM -0500, Gregory Haskins wrote:
> On 12/23/09 1:15 AM, Kyle Moffett wrote:
> > On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
> > <[email protected]> wrote:
> >> On 12/22/09 2:57 AM, Ingo Molnar wrote:
> >>> * Gregory Haskins <[email protected]> wrote:
> >>>> Actually, these patches have nothing to do with the KVM folks. [...]
> >>>
> >>> That claim is curious to me - the AlacrityVM host
> >>
> >> It's quite simple, really. These drivers support accessing vbus, and
> >> vbus is hypervisor agnostic. In fact, vbus isn't necessarily even
> >> hypervisor related. It may be used anywhere where a Linux kernel is the
> >> "io backend", which includes hypervisors like AlacrityVM, but also
> >> userspace apps, and interconnected physical systems as well.
> >>
> >> The vbus-core on the backend, and the drivers on the frontend operate
> >> completely independent of the underlying hypervisor. A glue piece
> >> called a "connector" ties them together, and any "hypervisor" specific
> >> details are encapsulated in the connector module. In this case, the
> >> connector surfaces to the guest side as a pci-bridge, so even that is
> >> not hypervisor specific per se. It will work with any pci-bridge that
> >> exposes a compatible ABI, which conceivably could be actual hardware.
> >
> > This is actually something that is of particular interest to me. I
> > have a few prototype boards right now with programmable PCI-E
> > host/device links on them; one of my long-term plans is to finagle
> > vbus into providing multiple "virtual" devices across that single
> > PCI-E interface.
> >
> > Specifically, I want to be able to provide virtual NIC(s), serial
> > ports and serial consoles, virtual block storage, and possibly other
> > kinds of interfaces. My big problem with existing virtio right now
> > (although I would be happy to be proven wrong) is that it seems to
> > need some sort of out-of-band communication channel for setting up
> > devices, not to mention it seems to need one PCI device per virtual
> > device.
> >

Greg, thanks for CC'ing me.

Hello Kyle,

I've got a similar situation here. I've got many PCI agents (devices)
plugged into a PCI backplane. I want to use the network to communicate
from the agents to the PCI master (host system).

At the moment, I'm using a custom driver, heavily based on the PCINet
driver posted on the linux-netdev mailing list. David Miller rejected
this approach, and suggested I use virtio instead.

My first approach with virtio was to create a "crossed-wires" driver,
which connected two virtio-net drivers together. While this worked, it
doesn't support feature negotiation properly, and so it was scrapped.
You can find this posted on linux-netdev with the title
"virtio-over-PCI".

I started writing a "virtio-phys" layer which creates the appropriate
distinction between frontend (guest driver) and backend (kvm, qemu,
etc.). This effort has been put on hold for lack of time, and because
there is no example code which shows how to create an interface from
virtio rings to TUN/TAP. The vhost-net driver is supposed to fill this
role, but I haven't seen any test code for that either. The developers
haven't been especially helpful answering questions like: how would I
use vhost-net with a DMA engine.

(You'll quickly find that you must use DMA to transfer data across PCI.
AFAIK, CPUs cannot do burst accesses to the PCI bus. I get a 10+ times
speedup using DMA.)
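
For reference, the DMA path with the in-kernel dmaengine API looks roughly
like the sketch below; names are trimmed, error handling is minimal, and
this is not the actual PCINet or virtio-phys code:

#include <linux/dmaengine.h>
#include <linux/errno.h>

static int dma_copy_to_bar(dma_addr_t bar_dst, dma_addr_t src, size_t len)
{
        dma_cap_mask_t mask;
        struct dma_chan *chan;
        struct dma_async_tx_descriptor *tx;
        dma_cookie_t cookie;

        dma_cap_zero(mask);
        dma_cap_set(DMA_MEMCPY, mask);
        chan = dma_request_channel(mask, NULL, NULL);
        if (!chan)
                return -ENODEV;

        /* The DMA controller bursts on the bus; the CPU does not. */
        tx = chan->device->device_prep_dma_memcpy(chan, bar_dst, src,
                                                  len, DMA_PREP_INTERRUPT);
        if (!tx) {
                dma_release_channel(chan);
                return -EIO;
        }

        cookie = tx->tx_submit(tx);
        dma_async_issue_pending(chan);
        dma_sync_wait(chan, cookie);    /* or use a completion callback */

        dma_release_channel(chan);
        return 0;
}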

The virtio-phys work is mostly lacking a backend for virtio-net. It is
still incomplete, but at least devices can be registered, etc. It is
available at:
http://www.mmarray.org/~iws/virtio-phys/

Another thing you'll notice about virtio-net (and vbus' venet) is that
they DO NOT specify endianness. This means that they cannot be used with
a big-endian guest and a little-endian host, or vice versa. This means
they will not work in certain QEMU setups today.
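
Fixing that is mostly a matter of nailing down the wire format and converting
at the edges.  An illustrative sketch only, not a proposed ABI:

#include <linux/types.h>
#include <asm/byteorder.h>

/* Descriptor with an explicit little-endian wire format, instead of
 * writing raw CPU-endian values into shared memory. */
struct wire_desc {
        __le64 addr;
        __le32 len;
        __le16 flags;
        __le16 id;
};

static void fill_desc(struct wire_desc *d, u64 addr, u32 len,
                      u16 flags, u16 id)
{
        d->addr  = cpu_to_le64(addr);
        d->len   = cpu_to_le32(len);
        d->flags = cpu_to_le16(flags);
        d->id    = cpu_to_le16(id);
}

static u32 desc_len(const struct wire_desc *d)
{
        return le32_to_cpu(d->len);     /* no-op on LE, swab on BE */
}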

Another problem with virtio is that you'll need to invent your own bus
model. QEMU/KVM has their bus model, lguest uses a different one, and
s390 uses yet another, IIRC. At least vbus provides a standardized bus
model.

All in all, I've written a lot of virtio code, and it has pretty much
all been shot down. It isn't very encouraging.

> > So I would love to be able to port something like vbus to my nify PCI
> > hardware and write some backend drivers... then my PCI-E connected
> > systems would dynamically provide a list of highly-efficient "virtual"
> > devices to each other, with only one 4-lane PCI-E bus.

I've written some IOQ test code, all of which is posted on the
alacrityvm-devel mailing list. If we can figure out how to make IOQ use
the proper ioread32()/iowrite32() accessors for accessing ioremap()ed
PCI BARs, then I can pretty easily write the rest of a "vbus-phys"
connector.
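
To show what the accessor question is about, here is a tiny sketch (the
register layout and names are hypothetical, not the actual IOQ code): once a
BAR has been ioremap()ed, fields living in it have to go through
ioread32()/iowrite32() rather than plain pointer dereferences.

#include <linux/io.h>
#include <linux/pci.h>
#include <linux/stddef.h>

/* Hypothetical layout of ring registers inside the shared BAR. */
struct remote_ring_regs {
        u32 head;
        u32 tail;
};

static void __iomem *map_ring_bar(struct pci_dev *pdev, int bar)
{
        return pci_iomap(pdev, bar, 0);
}

static u32 ring_read_head(void __iomem *regs)
{
        return ioread32(regs + offsetof(struct remote_ring_regs, head));
}

static void ring_write_tail(void __iomem *regs, u32 val)
{
        iowrite32(val, regs + offsetof(struct remote_ring_regs, tail));
}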

>
> Hi Kyle,
>
> We indeed have others that are doing something similar. I have CC'd Ira
> who may be able to provide you more details. I would also point you at
> the canonical example for what you would need to write to tie your
> systems together. Its the "null connector", which you can find here:
>
> http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/connectors/null.c;h=b6d16cb68b7e49e07528278bc9f5b73e1dac0c2f;hb=HEAD
>
> Do not hesitate to ask any questions, though you may want to take the
> conversation to the alacrityvm-devel list as to not annoy the current CC
> list any further than I already have ;)
>

IMO, they should at least see the issues here. They can reply back if
they want to be removed.

I hope it helps. Feel free to contact me off-list with any other
questions.

Ira

2009-12-23 19:51:00

by Andi Kleen

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

"Ira W. Snyder" <[email protected]> writes:

> (You'll quickly find that you must use DMA to transfer data across PCI.
> AFAIK, CPU's cannot do burst accesses to the PCI bus. I get a 10+ times

AFAIK that's what write-combining on x86 does. DMA has other
advantages of course.

-Andi
--
[email protected] -- Speaking for myself only.

2009-12-23 19:54:24

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:
> On 12/23/2009 12:15 AM, Kyle Moffett wrote:
> > This is actually something that is of particular interest to me. I
> > have a few prototype boards right now with programmable PCI-E
> > host/device links on them; one of my long-term plans is to finagle
> > vbus into providing multiple "virtual" devices across that single
> > PCI-E interface.
> >
> > Specifically, I want to be able to provide virtual NIC(s), serial
> > ports and serial consoles, virtual block storage, and possibly other
> > kinds of interfaces. My big problem with existing virtio right now
> > (although I would be happy to be proven wrong) is that it seems to
> > need some sort of out-of-band communication channel for setting up
> > devices, not to mention it seems to need one PCI device per virtual
> > device.
>
> We've been thinking about doing a virtio-over-IP mechanism such that you
> could remote the entire virtio bus to a separate physical machine.
> virtio-over-IB is probably more interesting since you can make use of
> RDMA. virtio-over-PCI-e would work just as well.
>

I didn't know you were interested in this as well. See my later reply to
Kyle for a lot of code that I've written with this in mind.

> virtio is a layered architecture. Device enumeration/discovery sits at
> a lower level than the actual device ABIs. The device ABIs are
> implemented on top of a bulk data transfer API. The reason for this
> layering is so that we can reuse PCI as an enumeration/discovery
> mechanism. This tremendenously simplifies porting drivers to other OSes
> and let's us use PCI hotplug automatically. We get integration into all
> the fancy userspace hotplug support for free.
>
> But both virtio-lguest and virtio-s390 use in-band enumeration and
> discovery since they do not have support for PCI on either platform.
>

I'm interested in the same thing, just over PCI. The only PCI agent
systems I've used are not capable of manipulating the PCI configuration
space in such a way that virtio-pci is usable on them. This means
creating your own enumeration mechanism. Which sucks. See my virtio-phys
code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
did it. It was modeled on lguest. Help is appreciated.

Ira

2009-12-23 20:21:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 08:15 PM, Gregory Haskins wrote:
> On 12/23/09 5:22 AM, Avi Kivity wrote:
>
>
>> There was no attempt by Gregory to improve virtio-net.
>>
> If you truly do not understand why your statement is utterly wrong at
> this point in the discussion, I feel sorry for you. If you are trying
> to be purposely disingenuous, you should be ashamed of yourself. In any
> case, your statement is demonstrably bogus, but you should already know
> this given that we talked about at least several times.
>

There's no need to feel sorry for me, thanks. There's no reason for me
to be ashamed, either. And there's no need to take the discussion to
personal levels. Please keep it technical.


> To refresh your memory: http://patchwork.kernel.org/patch/17428/
>

This is not an attempt to improve virtio-net, it's an attempt to push
vbus. With this, virtio-net doesn't become any faster, since the
greatest bottleneck is not removed, it remains in userspace.

If you wanted to improve virtio-net, you would port venet-host to the
virtio-net guest/host interface, and port any secret sauce in
venet(-guest) to virtio-net. After that we could judge what vbus'
contribution to the equation is.

> In case its not blatantly clear, which I would hope it would be to
> anyone that understands the problem space: What that patch would do is
> allow an unmodified virtio-net to bridge to a vbus based virtio-net
> backend. (Also note that this predates vhost-net by months (the date in
> that thread is 4/9/2009) in case you are next going to try to argue that
> it does nothing over vhost-net).
>

Without the backend, it is useless. It demonstrates vbus' flexibility
quite well, but does nothing for virtio-net or its users, at least
without a lot more work.

> This would mean that virtio-net would gain most of the benefits I have
> been advocating (fewer exits, cheaper exits, concurrent execution, etc).
> So this would very much improve virtio-net indeed, given how poorly the
> current backend was performing. I tried to convince the team to help me
> build it out to completion on multiple occasions, but that request was
> answered with "sorry, we are doing our own thing instead". You can say
> that you didn't like my approach, since that is a subjective opinion.
> But to say that I didn't attempt to improve it is a flat out wrong, and
> I do not appreciate it.
>

Cutting down on the rhetoric is more important than cutting down exits
at this point in time.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 20:27:08

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 09:27 PM, Andi Kleen wrote:
> Ingo Molnar<[email protected]> writes:
>
>
>> Yes, there's (obviously) compatibility requirements and artifacts and past
>> mistakes (as with any software interface), but you need to admit it to
>>
> Yes that's exactly what I meant.
>

And we do make plenty of mistakes. And when we fix them, we have to
maintain bug-compatibility to allow live migration from the broken
version to the good version. If you're ever feeling overly happy, do
some compat work in qemu and it will suck a year's worth or two of your
life force a pop.

>> yourself that your "virtualization is sloppy just like hardware" claim is just
>>
> In my experience hardware is a lot less sloppy than software.
> Imagine your latest CPU had as many regressions as 2.6.32 @)
>
> I wish software and even VMs were as good.
>
>

Me too.

>> a cheap excuse to not do a proper job of interface engineering.
>>
> Past mistakes cannot be easily fixed. And undoubtedly even the new
> shiny interfaces will have bugs and problems. Also the behaviour is
> often not completely understood. Maybe it can be easier debugged with
> fully available source, but even then it's hard to fix the old
> software (or rather even if you can fix it deploy the fixes). In
> that regard it's a lot like hardware.
>
> I agree with you that this makes it important to design good
> interfaces, but again realistically mistakes will be made
> and they cannot be all fixed retroactively.
>

Our principal tool for this is to avoid introducing new interfaces
whenever possible. We try to stick to established hardware standards
(so we don't need to sloppily define them, and get guest compatibility
for free).

Hardware (both virt and non-virt) faces the same problems as software
here. So as hardware solutions are introduced, we adopt them, and
usually the virt extensions vendors follow with accelerations for these
paths as well.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 20:37:19

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>
>> - Are a pure software concept
>>
> By design. In fact, I would describe it as "software to software
> optimized" as opposed to trying to shoehorn into something that was
> designed as a software-to-hardware interface (and therefore has
> assumptions about the constraints in that environment that are not
> applicable in software-only).
>
>

And that's the biggest mistake you can make. Look at Xen, for
instance. They paravirtualized the fork out of everything that moved in
order to get x86 virt going. And where are they now? x86_64 syscalls
are slow since they have to trap to the hypervisor and (partially) flush
the tlb. With npt or ept capable hosts performance is better for many
workloads on fullvirt. And paravirt doesn't support Windows. Their
unsung hero Jeremy is still trying to upstream dom0 Xen support. And
they get to support it forever.

VMware stuck with the hardware defined interfaces. Sure they had to
implement binary translation to get there, but as a result, they only
have to support one interface, all guests support it, and they can drop
it on newer hosts where it doesn't give them anything.

We had the advantage of course of starting with virt extensions, so it
was a no-brainer: paravirt only where absolutely required. Where we
deviated from this, it backfired.

>> - Gregory claims that the AlacricityVM patches are not a replacement for KVM.
>> I.e. there's no intention to phase out any 'old stuff'
>>
> There's no reason to phase anything out, except perhaps the virtio-pci
> transport. This is one more transport, plugging into virtio underneath
> (just like virtio-pci, virtio-lguest, and virtio-s390). I am not even
> suggesting that the old transport has to go away, per se. It is the KVM
> maintainers who insist on it being all or nothing. For me, I do not see
> the big deal in having one more "model" option in the qemu cmd-line, but
> that is just my opinion. If the maintainers were really so adamant that
> choice is pure evil, I am not sure why we don't see patches for removing
> everything but one model type in each IO category. But I digress.
>

We have to support users (also known as customers in some areas), so we
have to keep the old stuff. We have limited resources, so we want to
maintain as little as possible. We'll do it if we have to, but I'm
totally unconvinced we have to.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 21:02:26

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 10:36 PM, Avi Kivity wrote:
> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>
>>> - Are a pure software concept
>> By design. In fact, I would describe it as "software to software
>> optimized" as opposed to trying to shoehorn into something that was
>> designed as a software-to-hardware interface (and therefore has
>> assumptions about the constraints in that environment that are not
>> applicable in software-only).
>>
>
> And that's the biggest mistake you can make. Look at Xen, for
> instance. They paravirtualized the fork out of everything that moved
> in order to get x86 virt going. And where are they now? x86_64
> syscalls are slow since they have to trap to the hypervisor and
> (partially) flush the tlb. With npt or ept capable hosts performance
> is better for many workloads on fullvirt. And paravirt doesn't
> support Windows. Their unsung hero Jeremy is still trying to upstream
> dom0 Xen support. And they get to support it forever.
>
> VMware stuck with the hardware defined interfaces. Sure they had to
> implement binary translation to get there, but as a result, they only
> have to support one interface, all guests support it, and they can
> drop it on newer hosts where it doesn't give them anything.

As a twist on this, the VMware paravirt driver interface is so
hardware-like that they're getting hardware vendors to supply cards that
implement it. Try that with a pure software approach.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

2009-12-23 21:25:21

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

(Sorry for top post...on a mobile)

When someone repeatedly makes a claim you believe to be wrong and you
correct them, you start to wonder if that person has a less than
honorable agenda. In any case, I overreacted. For that, I apologize.

That said, you are still incorrect. With what I proposed, the model
will run as an in-kernel vbus device, and no longer run in userspace.
It would therefore improve virtio-net as I stated, much in the same
way vhost-net or venet-tap do today.

FYI I am about to log out for the long holiday, so will be
unresponsive for a bit.

Kind Regards,
-Greg

On 12/23/09, Avi Kivity <[email protected]> wrote:
> On 12/23/2009 08:15 PM, Gregory Haskins wrote:
>> On 12/23/09 5:22 AM, Avi Kivity wrote:
>>
>>
>>> There was no attempt by Gregory to improve virtio-net.
>>>
>> If you truly do not understand why your statement is utterly wrong at
>> this point in the discussion, I feel sorry for you. If you are trying
>> to be purposely disingenuous, you should be ashamed of yourself. In any
>> case, your statement is demonstrably bogus, but you should already know
>> this given that we talked about it at least several times.
>>
>
> There's no need to feel sorry for me, thanks. There's no reason for me
> to be ashamed, either. And there's no need to take the discussion to
> personal levels. Please keep it technical.
>
>
>> To refresh your memory: http://patchwork.kernel.org/patch/17428/
>>
>
> This is not an attempt to improve virtio-net, it's an attempt to push
> vbus. With this, virtio-net doesn't become any faster, since the
> greatest bottleneck is not removed, it remains in userspace.
>
> If you wanted to improve virtio-net, you would port venet-host to the
> virtio-net guest/host interface, and port any secret sauce in
> venet(-guest) to virtio-net. After that we could judge what vbus'
> contribution to the equation is.
>
>> In case it's not blatantly clear, which I would hope it would be to
>> anyone that understands the problem space: What that patch would do is
>> allow an unmodified virtio-net to bridge to a vbus based virtio-net
>> backend. (Also note that this predates vhost-net by months (the date in
>> that thread is 4/9/2009) in case you are next going to try to argue that
>> it does nothing over vhost-net).
>>
>
> Without the backend, it is useless. It demonstrates vbus' flexibility
> quite well, but does nothing for virtio-net or its users, at least
> without a lot more work.
>
>> This would mean that virtio-net would gain most of the benefits I have
>> been advocating (fewer exits, cheaper exits, concurrent execution, etc).
>> So this would very much improve virtio-net indeed, given how poorly the
>> current backend was performing. I tried to convince the team to help me
>> build it out to completion on multiple occasions, but that request was
>> answered with "sorry, we are doing our own thing instead". You can say
>> that you didn't like my approach, since that is a subjective opinion.
>> But to say that I didn't attempt to improve it is a flat out wrong, and
>> I do not appreciate it.
>>
>
> Cutting down on the rhetoric is more important than cutting down exits
> at this point in time.
>
> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
>
>

--
Sent from my mobile device

2009-12-23 22:58:56

by Anthony Liguori

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 01:54 PM, Ira W. Snyder wrote:
> On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:

> I didn't know you were interested in this as well. See my later reply to
> Kyle for a lot of code that I've written with this in mind.


BTW, in the future, please CC me or CC
[email protected]. Or certainly kvm@vger. I
never looked at the virtio-over-pci patchset although I've heard it
referenced before.

>> But both virtio-lguest and virtio-s390 use in-band enumeration and
>> discovery since they do not have support for PCI on either platform.
>>
>
> I'm interested in the same thing, just over PCI. The only PCI agent
> systems I've used are not capable of manipulating the PCI configuration
> space in such a way that virtio-pci is usable on them.

virtio-pci is the wrong place to start if you want to use a PCI *device*
as the virtio bus. virtio-pci is meant to use the PCI bus as the virtio
bus. That's a very important requirement for us because it maintains
the relationship of each device looking like a normal PCI device.

> This means
> creating your own enumeration mechanism. Which sucks.

I don't think it sucks. The idea is that we don't want to unnecessarily
reinvent things.

Of course, the key feature of virtio is that it makes it possible for
you to create your own enumeration mechanism if you're so inclined.

> See my virtio-phys
> code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
> did it. It was modeled on lguest. Help is appreciated.

If it were me, I'd take a much different approach. I would use a very
simple device with a single transmit and receive queue. I'd create a
standard header, and then implement a command protocol on top of it.
You'll be able to support zero copy I/O (although you'll have a fixed
number of outstanding requests). You would need a single large ring.
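
To make that concrete, here is a minimal sketch of what such a header and
command ring might look like (purely illustrative; every name and field
below is hypothetical, not an existing ABI; __leXX types as in
<linux/types.h>):

struct ring_desc {
	__le64 addr;	/* bus address of the data buffer */
	__le32 len;	/* length of the buffer in bytes */
	__le16 flags;	/* e.g. DESC_F_WRITE for device-writable buffers */
	__le16 id;	/* cookie echoed back on completion */
};

struct cmd_hdr {
	__le16 type;	/* CMD_NET_TX, CMD_NET_RX, CMD_BLK_READ, ... */
	__le16 flags;
	__le32 seq;	/* request sequence number */
	__le32 status;	/* filled in by the other side on completion */
	__le32 ndesc;	/* number of ring_descs following this header */
};

Each slot in the single large ring would carry a cmd_hdr followed by its
descriptors, which is what gives zero-copy I/O with a fixed number of
outstanding requests.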

But then again, I have no idea what your requirements are. You could
probably get far treating the thing as a network device and just doing
ATAoE or something like that.

Regards,

Anthony Liguori

> Ira

2009-12-23 23:27:22

by Anthony Liguori

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 11:29 AM, Linus Torvalds wrote:
>
>
> On Wed, 23 Dec 2009, Gregory Haskins wrote:
>>>
>>> And upstream submission is not always like this!
>>
>> I would think the process would come to a grinding halt if it were ;)
>
> Well, in all honesty, if it had been non-virtualized drivers I would just
> have pulled. The pull request all looked sane, the diffstat looked clean
> and non-intrusive, and I had no problems with any of that.
>
> But the virtualization people always argue about the fifty-eleven
> different ways of doing things, and unlike real drivers - where the actual
> hardware places constraints on what the heck is going on - virtualization
> people seem to revel in making new interfaces weekly, and tend to be only
> incidentally limited by hardware (ie hardware interfaces may limit some
> _details_, but seldom any higher-level arguments).
>
> So when I see another virtualization interface, I want the virtualization
> people to just argue it out amongst themselves.

Actually, this sentiment is really the basis of this whole discussion.
KVM is the product of learning the hard way that inventing
interfaces just because we can is a total waste of time.

Our current I/O infrastructure is based on PCI devices that we can
emulate efficiently. They look, feel, and taste like real hardware
devices. We try to be as boring as humanly possible and so far, it's
worked out extremely well for us.

> Thanks to the virtue of me
> personally not caring one whit about virtualization, I can stand back and
> just watch the fireworks.

That's a shame, because I wish more people with a practical sentiment
cared about virtualization to discourage the general silliness that
seems to be all too common in this space.

Regards,

Anthony Liguori

2009-12-23 23:43:06

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 04:58:37PM -0600, Anthony Liguori wrote:
> On 12/23/2009 01:54 PM, Ira W. Snyder wrote:
> > On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:
>
> > I didn't know you were interested in this as well. See my later reply to
> > Kyle for a lot of code that I've written with this in mind.
>
>
> BTW, in the future, please CC me or CC
> [email protected]. Or certainly kvm@vger. I
> never looked at the virtio-over-pci patchset although I've heard it
> referenced before.
>

Will do. I wouldn't think kvm@vger would be on-topic. I'm not interested
in KVM (though I do use it constantly, it is great). I'm only interested
in using virtio as a transport between physical systems. Is it a place
where discussing virtio by itself is on-topic?

> >> But both virtio-lguest and virtio-s390 use in-band enumeration and
> >> discovery since they do not have support for PCI on either platform.
> >>
> >
> > I'm interested in the same thing, just over PCI. The only PCI agent
> > systems I've used are not capable of manipulating the PCI configuration
> > space in such a way that virtio-pci is usable on them.
>
> virtio-pci is the wrong place to start if you want to use a PCI *device*
> as the virtio bus. virtio-pci is meant to use the PCI bus as the virtio
> bus. That's a very important requirement for us because it maintains
> the relationship of each device looking like a normal PCI device.
>
> > This means
> > creating your own enumeration mechanism. Which sucks.
>
> I don't think it sucks. The idea is that we don't want to unnecessarily
> reinvent things.
>
> Of course, the key feature of virtio is that it makes it possible for
> you to create your own enumeration mechanism if you're so inclined.
>
> > See my virtio-phys
> > code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
> > did it. It was modeled on lguest. Help is appreciated.
>
> If it were me, I'd take a much different approach. I would use a very
> simple device with a single transmit and receive queue. I'd create a
> standard header, and then implement a command protocol on top of it.
> You'll be able to support zero copy I/O (although you'll have a fixed
> number of outstanding requests). You would need a single large ring.
>
> But then again, I have no idea what your requirements are. You could
> probably get far treating the thing as a network device and just doing
> ATAoE or something like that.
>

I've got a single PCI Host (master) with ~20 PCI slots. Physically, it
is a backplane in a cPCI chassis, but the form factor is irrelevant. It
is regular PCI from a software perspective.

Into this backplane, I plug up to 20 PCI Agents (slaves). They are
powerpc computers, almost identical to the Freescale MPC8349EMDS board.
They're full-featured powerpc computers, with CPU, RAM, etc. They can
run standalone.

I want to use the PCI backplane as a data transport. Specifically, I
want to transport ethernet over the backplane, so I can have the powerpc
boards mount their rootfs via NFS, etc. Everyone knows how to write
network daemons. It is a good and very well known way to transport data
between systems.

On the PCI bus, the powerpc systems expose 3 PCI BARs. The size is
configurable, as is the memory location at which they point. What I
cannot do is get notified when a read/write hits the BAR. There is a
feature on the board which allows me to generate interrupts in either
direction: agent->master (PCI INTX) and master->agent (via an MMIO
register). The PCI vendor ID and device ID are not configurable.

One thing I cannot assume is that the PCI master system is capable of
performing DMA. In my system, it is a Pentium3 class x86 machine, which
has no DMA engine. However, the PowerPC systems do have DMA engines. In
virtio terms, it was suggested to make the powerpc systems the "virtio
hosts" (running the backends) and make the x86 (PCI master) the "virtio
guest" (running virtio-net, etc.).

I'm not sure what you're suggesting in the paragraph above. I want to
use virtio-net as the transport, I do not want to write my own
virtual-network driver. Can you please clarify?

Hopefully that explains what I'm trying to do. I'd love someone to help
guide me in the right direction here. I want something to fill this need
in mainline. I've been contacted separately by 10+ people also looking
for a similar solution. I hunch most of them end up doing what I did:
write a quick-and-dirty network driver. I've been working on this for a
year, just to give an idea.

PS - should I create a new thread on the two mailing lists mentioned
above? I don't want to go too far off-topic in an alacrityvm thread. :)

Ira

2009-12-24 04:52:54

by Kyle Moffett

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 17:58, Anthony Liguori <[email protected]> wrote:
> On 12/23/2009 01:54 PM, Ira W. Snyder wrote:
>> On Wed, Dec 23, 2009 at 09:09:21AM -0600, Anthony Liguori wrote:
>>> But both virtio-lguest and virtio-s390 use in-band enumeration and
>>> discovery since they do not have support for PCI on either platform.
>>
>> I'm interested in the same thing, just over PCI. The only PCI agent
>> systems I've used are not capable of manipulating the PCI configuration
>> space in such a way that virtio-pci is usable on them.
>
> virtio-pci is the wrong place to start if you want to use a PCI *device* as
> the virtio bus. virtio-pci is meant to use the PCI bus as the virtio bus.
>  That's a very important requirement for us because it maintains the
> relationship of each device looking like a normal PCI device.
>
>> This means
>> creating your own enumeration mechanism. Which sucks.
>
> I don't think it sucks.  The idea is that we don't want to unnecessarily
> reinvent things.
>
> Of course, the key feature of virtio is that it makes it possible for you to
> create your own enumeration mechanism if you're so inclined.

See... the thing is... a lot of us random embedded board developers
don't *want* to create our own enumeration mechanisms. I see a huge
amount of value in vbus as a common zero-copy DMA-capable
virtual-device interface, especially over miscellaneous non-PCI-bus
interconnects. I mentioned my PCI-E boards earlier, but I would also
personally be interested in using infiniband with RDMA as a virtual
device bus.

Basically, what it comes down to is vbus is practically useful as a
generic way to provide a large number of hotpluggable virtual devices
across an arbitrary interconnect. I agree that virtio works fine if
you have some out-of-band enumeration and hotplug transport (like
emulated PCI), but if you *don't* have that, it's pretty much faster
to write your own set of paired network drivers than it is to write a
whole enumeration and transport stack for virtio.

On top of *that*, with the virtio approach I would need to write a
whole bunch of tools to manage the set of virtual devices on my custom
hardware. With vbus that management interface would be entirely
common code across a potentially large number of virtualized physical
transports.

If vbus actually gets merged I will most likely be able to spend the
time to get the PCI-E crosslinks on my boards talking vbus, otherwise
it's liable to get completely shelved as "not worth the effort" to
write all the glue to make virtio work.

>> See my virtio-phys
>> code (http://www.mmarray.org/~iws/virtio-phys/) for an example of how I
>> did it. It was modeled on lguest. Help is appreciated.
>
> If it were me, I'd take a much different approach.  I would use a very
> simple device with a single transmit and receive queue.  I'd create a
> standard header, and then implement a command protocol on top of it. You'll
> be able to support zero copy I/O (although you'll have a fixed number of
> outstanding requests).  You would need a single large ring.

That's basically about as much work as writing entirely new network
and serial drivers over PCI. Not only that, but the beauty of
vbus for me is that I could write a fairly simple logical-to-physical
glue driver which lets vbus talk over my PCI-E or infiniband link and
then I'm basically done.

Not only that, but the tools for adding new virtual devices (ethernet,
serial, block, etc) over vbus would be the same no matter what the
underlying transport.

> But then again, I have no idea what your requirements are.  You could
> probably get far treating the thing as a network device and just doing ATAoE
> or something like that.

<sarcasm>Oh... yes... clearly the right solution is to forgo the whole
zero-copy direct DMA of block writes and instead shuffle the whole
thing into 16kB ATAoE packets. That would obviously be much faster on
my little 1GHz PowerPC boards </sarcasm>

Sorry for the rant, but I really do think vbus is a valuable
technology and it's a damn shame to see Gregory Haskins being put
through this whole hassle. While most everybody else was griping
about problems he sat down and wrote some very nice clean maintainable
code to do what he needed. Not only that, but he designed a good
enough model that it could be ported to run over almost everything
from a single PCI-E link to an infiniband network.

I personally would love to see vbus merged, into staging at the very
least. I would definitely spend some time trying to make it work
across PCI-E on my *very* *real* embedded boards. Look at vbus not as
another virtualization ABI, but as a multiprotocol high-level device
abstraction API that already has one well-implemented and
high-performance user.

Cheers,
Kyle Moffett

2009-12-24 06:59:14

by Gleb Natapov

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 07:51:50PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > - Are a pure software concept and any compatibility mismatch is
> > > self-inflicted. The patches are in fact breaking the ABI to KVM
> >
> > In practice, especially considering older kernel releases, VMs behave like
> > hardware, with all its quirks, compatibility requirements, sometimes not
> > fully understood, etc.
>
> I stopped reading your reply here. That's not actually fully true of KVM, at
> all.
>
> Virtualization isn't voodoo magic with some hidden source in some magic hardware
> component that no-one can understand fully. This isn't some mystic hardware
> vendor coming up with some code and going away in the next quarter, with
> barely anything documented and thousands of users left with hardware
> components which we need to support under Linux somehow.
>
> This is Linux virtualization, where _both_ the host and the guest source code
> is fully known, and bugs (if any) can be found with a high degree of
It may sound strange, but Windows is a very popular guest, and last I
checked there were no Windows sources on my HW. The answer to that
is to emulate HW as closely as possible to the real thing, and then
closed-source guests will have no reason to be upset.

> determinism. This is Linux where the players don't just vanish overnight, and
> are expected to do a proper job.
>
> Yes, there's (obviously) compatibility requirements and artifacts and past
> mistakes (as with any software interface), but you need to admit it to
> yourself that your "virtualization is sloppy just like hardware" claim is just
> a cheap excuse to not do a proper job of interface engineering.
>
> Thanks,
>
> Ingo

--
Gleb.

2009-12-24 09:31:22

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 3:36 PM, Avi Kivity wrote:
> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>
>>> - Are a pure software concept
>>>
>> By design. In fact, I would describe it as "software to software
>> optimized" as opposed to trying to shoehorn into something that was
>> designed as a software-to-hardware interface (and therefore has
>> assumptions about the constraints in that environment that are not
>> applicable in software-only).
>>
>>
>
> And that's the biggest mistake you can make.

Sorry, that is just wrong or you wouldn't have virtio either.

> Look at Xen, for
> instance. They paravirtualized the fork out of everything that moved in
> order to get x86 virt going. And where are they now? x86_64 syscalls
> are slow since they have to trap to the hypervisor and (partially) flush
> the tlb. With npt or ept capable hosts performance is better for many
> workloads on fullvirt. And paravirt doesn't support Windows. Their
> unsung hero Jeremy is still trying to upstream dom0 Xen support. And
> they get to support it forever.

We are only talking about PV-IO here, so not apples to apples to what
Xen is going through.

>
> VMware stuck with the hardware defined interfaces. Sure they had to
> implement binary translation to get there, but as a result, they only
> have to support one interface, all guests support it, and they can drop
> it on newer hosts where it doesn't give them anything.

Again, you are confusing PV-IO. Not relevant here. Afaict, vmware,
kvm, xen, etc, all still do PV-IO and likely will for the foreseeable
future.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-24 09:36:36

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/09 4:01 PM, Avi Kivity wrote:
> On 12/23/2009 10:36 PM, Avi Kivity wrote:
>> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>>
>>>> - Are a pure software concept
>>> By design. In fact, I would describe it as "software to software
>>> optimized" as opposed to trying to shoehorn into something that was
>>> designed as a software-to-hardware interface (and therefore has
>>> assumptions about the constraints in that environment that are not
>>> applicable in software-only).
>>>
>>
>> And that's the biggest mistake you can make. Look at Xen, for
>> instance. They paravirtualized the fork out of everything that moved
>> in order to get x86 virt going. And where are they now? x86_64
>> syscalls are slow since they have to trap to the hypervisor and
>> (partially) flush the tlb. With npt or ept capable hosts performance
>> is better for many workloads on fullvirt. And paravirt doesn't
>> support Windows. Their unsung hero Jeremy is still trying to upstream
>> dom0 Xen support. And they get to support it forever.
>>
>> VMware stuck with the hardware defined interfaces. Sure they had to
>> implement binary translation to get there, but as a result, they only
>> have to support one interface, all guests support it, and they can
>> drop it on newer hosts where it doesn't give them anything.
>
> As a twist on this, the VMware paravirt driver interface is so
> hardware-like that they're getting hardware vendors to supply cards that
> implement it. Try that with a pure software approach.

Any hardware engineer (myself included) will tell you that, generally
speaking, what you can do in hardware you can do in software (think of
what QEMU does today, for instance). It's purely a cost/performance
tradeoff.

I can at least tell you that is true of vbus. Anything on the vbus side
would be equally eligible for a hardware implementation, though there is
no reason to do this today since we have equivalent functionality in
baremetal already. The only motivation is if you wanted to preserve
ABI etc, which is what vmware is presumably after. However, I am not
advocating this as necessary at this juncture.

So sorry, your statement is not relevant.

-Greg




>



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-24 10:07:00

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Wed, Dec 23, 2009 at 11:28:08AM -0800, Ira W. Snyder wrote:
> On Wed, Dec 23, 2009 at 12:34:44PM -0500, Gregory Haskins wrote:
> > On 12/23/09 1:15 AM, Kyle Moffett wrote:
> > > On Tue, Dec 22, 2009 at 12:36, Gregory Haskins
> > > <[email protected]> wrote:
> > >> On 12/22/09 2:57 AM, Ingo Molnar wrote:
> > >>> * Gregory Haskins <[email protected]> wrote:
> > >>>> Actually, these patches have nothing to do with the KVM folks. [...]
> > >>>
> > >>> That claim is curious to me - the AlacrityVM host
> > >>
> > >> It's quite simple, really. These drivers support accessing vbus, and
> > >> vbus is hypervisor agnostic. In fact, vbus isn't necessarily even
> > >> hypervisor related. It may be used anywhere where a Linux kernel is the
> > >> "io backend", which includes hypervisors like AlacrityVM, but also
> > >> userspace apps, and interconnected physical systems as well.

So a focus on interconnecting physical systems would, I think, be one way
for vbus to stop conflicting with KVM. If drivers for such systems
appear, I expect that the relevant (hypervisor-agnostic) vbus bits would be
very uncontroversial.

This would not be the first technology to make the jump from attempting
to be a PCI replacement to being an interconnect btw, I think infiniband
did this as well.

--
MST

2009-12-24 16:57:15

by Anthony Liguori

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 10:52 PM, Kyle Moffett wrote:
> On Wed, Dec 23, 2009 at 17:58, Anthony Liguori<[email protected]> wrote:
>> Of course, the key feature of virtio is that it makes it possible for you to
>> create your own enumeration mechanism if you're so inclined.
>
> See... the thing is... a lot of us random embedded board developers
> don't *want* to create our own enumeration mechanisms. I see a huge
> amount of value in vbus as a common zero-copy DMA-capable
> virtual-device interface, especially over miscellaneous non-PCI-bus
> interconnects. I mentioned my PCI-E boards earlier, but I would also
> personally be interested in using infiniband with RDMA as a virtual
> device bus.

I understand what you're saying, but is there really a practical
argument here? Infiniband already supports things like IPoIB and SCSI
over IB. Is it necessary to add another layer on top of it?

That said, it's easy enough to create a common enumeration mechanism for
people to use with virtio. I doubt it's really that interesting but
it's certainly quite reasonable. In fact, a lot of code could be reused
from virtio-s390 or virtio-lguest.


> Basically, what it comes down to is vbus is practically useful as a
> generic way to provide a large number of hotpluggable virtual devices
> across an arbitrary interconnect. I agree that virtio works fine if
> you have some out-of-band enumeration and hotplug transport (like
> emulated PCI), but if you *don't* have that, it's pretty much faster
> to write your own set of paired network drivers than it is to write a
> whole enumeration and transport stack for virtio.
>
> On top of *that*, with the virtio approach I would need to write a
> whole bunch of tools to manage the set of virtual devices on my custom
> hardware. With vbus that management interface would be entirely
> common code across a potentially large number of virtualized physical
> transports.


This particular use case really has nothing to do with virtualization.
You really want an infiniband replacement using the PCI-e bus. There's
so much on the horizon in this space that's being standardized in
PCI-sig like MR-IOV.

>> If it were me, I'd take a much different approach. I would use a very
>> simple device with a single transmit and receive queue. I'd create a
>> standard header, and the implement a command protocol on top of it. You'll
>> be able to support zero copy I/O (although you'll have a fixed number of
>> outstanding requests). You would need a single large ring.
>
> That's basically about as much work as writing entirely new network
> and serial drivers over PCI. Not only that, but the beauty of
> vbus for me is that I could write a fairly simple logical-to-physical
> glue driver which lets vbus talk over my PCI-E or infiniband link and
> then I'm basically done.

Is this something you expect people to use or is this a one-off project?

> I personally would love to see vbus merged, into staging at the very
> least. I would definitely spend some time trying to make it work
> across PCI-E on my *very* *real* embedded boards. Look at vbus not as
> another virtualization ABI, but as a multiprotocol high-level device
> abstraction API that already has one well-implemented and
> high-performance user.

If someone wants to advocate vbus for non-virtualized purposes, I have
no problem with that.

I just don't think it makes sense for KVM. virtio is not intended to
be used for any possible purpose.

Regards,

Anthony Liguori

2009-12-24 17:09:48

by Anthony Liguori

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 05:42 PM, Ira W. Snyder wrote:
>
> I've got a single PCI Host (master) with ~20 PCI slots. Physically, it
> is a backplane in a cPCI chassis, but the form factor is irrelevant. It
> is regular PCI from a software perspective.
>
> Into this backplane, I plug up to 20 PCI Agents (slaves). They are
> powerpc computers, almost identical to the Freescale MPC8349EMDS board.
> They're full-featured powerpc computers, with CPU, RAM, etc. They can
> run standalone.
>
> I want to use the PCI backplane as a data transport. Specifically, I
> want to transport ethernet over the backplane, so I can have the powerpc
> boards mount their rootfs via NFS, etc. Everyone knows how to write
> network daemons. It is a good and very well known way to transport data
> between systems.
>
> On the PCI bus, the powerpc systems expose 3 PCI BARs. The size is
> configurable, as is the memory location at which they point. What I
> cannot do is get notified when a read/write hits the BAR. There is a
> feature on the board which allows me to generate interrupts in either
> direction: agent->master (PCI INTX) and master->agent (via an MMIO
> register). The PCI vendor ID and device ID are not configurable.
>
> One thing I cannot assume is that the PCI master system is capable of
> performing DMA. In my system, it is a Pentium3 class x86 machine, which
> has no DMA engine. However, the PowerPC systems do have DMA engines. In
> virtio terms, it was suggested to make the powerpc systems the "virtio
> hosts" (running the backends) and make the x86 (PCI master) the "virtio
> guest" (running virtio-net, etc.).

IMHO, virtio and vbus are both the wrong model for what you're doing.
The key reason why is that virtio and vbus are generally designed around
the concept that there is shared cache coherent memory from which you
can use lock-less ring queues to implement efficient I/O.

In your architecture, you do not have cache coherent shared memory.
Instead, you have two systems connected via a PCI backplane with
non-coherent shared memory.

You probably need to use the shared memory as a bounce buffer and
implement a driver on top of that.
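
As a rough illustration of that bounce-buffer model (a hypothetical
layout in kernel-style C, not an existing driver), the shared window
could look like this:

struct bounce_slot {
	u32 len;		/* 0 = slot free, else bytes valid in data[] */
	u8  data[2048];		/* one frame, copied in by the producer */
};

struct bounce_window {			/* lives in one of the agent's BARs */
	u32 doorbell;			/* MMIO write here raises the agent IRQ */
	struct bounce_slot tx[16];	/* host -> agent */
	struct bounce_slot rx[16];	/* agent -> host */
};

The producer memcpy()s a frame into a free slot, sets len, and kicks the
doorbell; the consumer copies (or DMAs) it out into its own RAM and clears
len to return the slot. No pointers cross the link, so coherent shared
memory is not required.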

> I'm not sure what you're suggesting in the paragraph above. I want to
> use virtio-net as the transport, I do not want to write my own
> virtual-network driver. Can you please clarify?

virtio-net and vbus are going to be overly painful for you to use
because neither end can access arbitrary memory in the other end.

> Hopefully that explains what I'm trying to do. I'd love someone to help
> guide me in the right direction here. I want something to fill this need
> in mainline.

If I were you, I would write a custom network driver. virtio-net is
awfully small (just a few hundred lines). I'd use that as a basis but I
would not tie into virtio or vbus. The paradigms don't match.

> I've been contacted separately by 10+ people also looking
> for a similar solution. I hunch most of them end up doing what I did:
> write a quick-and-dirty network driver. I've been working on this for a
> year, just to give an idea.

The whole architecture of having multiple heterogeneous systems on a
common high speed backplane is what IBM refers to as "hybrid computing".
It's a model that I think will become a lot more common in the
future. I think there are typically two types of hybrid models
depending on whether the memory sharing is cache coherent or not. If
you have coherent shared memory, the problem looks an awful lot like
virtualization. If you don't have coherent shared memory, then the
shared memory basically becomes a pool to bounce into and out of.

> PS - should I create a new thread on the two mailing lists mentioned
> above? I don't want to go too far off-topic in an alacrityvm thread. :)

Couldn't hurt.

Regards,

Anthony Liguori

2009-12-24 20:41:30

by Roland Dreier

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33


> > This is Linux virtualization, where _both_ the host and the guest source code
> > is fully known, and bugs (if any) can be found with a high degree of

> It may sound strange, but Windows is a very popular guest, and last I
> checked there were no Windows sources on my HW. The answer to that
> is to emulate HW as closely as possible to the real thing, and then
> closed-source guests will have no reason to be upset.
>
> > determinism. This is Linux where the players don't just vanish overnight, and
> > are expected to do a proper job.

And without even getting into closed/proprietary guests, virt is useful
for testing/developing/deploying many free OSes, eg FreeBSD, NetBSD,
OpenBSD, Hurd, <random research OS>, etc. Not to mention just wanting a
stable [virtual] platform to run <old enterprise Linux distro> on. So
having a virtual platform whose interface doesn't change very often or
very much has a lot of value at least in avoiding churn in guest OSes.

- R.

2009-12-25 00:38:43

by Ira W. Snyder

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On Thu, Dec 24, 2009 at 11:09:39AM -0600, Anthony Liguori wrote:
> On 12/23/2009 05:42 PM, Ira W. Snyder wrote:
> >
> > I've got a single PCI Host (master) with ~20 PCI slots. Physically, it
> > is a backplane in a cPCI chassis, but the form factor is irrelevant. It
> > is regular PCI from a software perspective.
> >
> > Into this backplane, I plug up to 20 PCI Agents (slaves). They are
> > powerpc computers, almost identical to the Freescale MPC8349EMDS board.
> > They're full-featured powerpc computers, with CPU, RAM, etc. They can
> > run standalone.
> >
> > I want to use the PCI backplane as a data transport. Specifically, I
> > want to transport ethernet over the backplane, so I can have the powerpc
> > boards mount their rootfs via NFS, etc. Everyone knows how to write
> > network daemons. It is a good and very well known way to transport data
> > between systems.
> >
> > On the PCI bus, the powerpc systems expose 3 PCI BARs. The size is
> > configurable, as is the memory location at which they point. What I
> > cannot do is get notified when a read/write hits the BAR. There is a
> > feature on the board which allows me to generate interrupts in either
> > direction: agent->master (PCI INTX) and master->agent (via an MMIO
> > register). The PCI vendor ID and device ID are not configurable.
> >
> > One thing I cannot assume is that the PCI master system is capable of
> > performing DMA. In my system, it is a Pentium3 class x86 machine, which
> > has no DMA engine. However, the PowerPC systems do have DMA engines. In
> > virtio terms, it was suggested to make the powerpc systems the "virtio
> > hosts" (running the backends) and make the x86 (PCI master) the "virtio
> > guest" (running virtio-net, etc.).
>
> IMHO, virtio and vbus are both the wrong model for what you're doing.
> The key reason why is that virtio and vbus are generally designed around
> the concept that there is shared cache coherent memory from which you
> can use lock-less ring queues to implement efficient I/O.
>
> In your architecture, you do not have cache coherent shared memory.
> Instead, you have two systems connected via a PCI backplane with
> non-coherent shared memory.
>
> You probably need to use the shared memory as a bounce buffer and
> implement a driver on top of that.
>
> > I'm not sure what you're suggesting in the paragraph above. I want to
> > use virtio-net as the transport, I do not want to write my own
> > virtual-network driver. Can you please clarify?
>
> virtio-net and vbus are going to be overly painful for you to use
> because no one end can access arbitrary memory in the other end.
>

The PCI Agents (powerpc's) can access the lowest 4GB of the PCI Master's
memory. Not all at the same time, but I have a 1GB movable window into
PCI address space. I hunch Kyle's setup is similar.

I've proved that virtio can work via my "crossed-wires" driver, hooking
two virtio-net's together. With a proper in-kernel backend, I think the
issues would be gone, and things would work great.

> > Hopefully that explains what I'm trying to do. I'd love someone to help
> > guide me in the right direction here. I want something to fill this need
> > in mainline.
>
> If I were you, I would write a custom network driver. virtio-net is
> awfully small (just a few hundred lines). I'd use that as a basis but I
> would not tie into virtio or vbus. The paradigms don't match.
>

This is exactly what I did first. I proposed it for mainline, and David
Miller shot it down, saying: you're creating your own virtualization
scheme, use virtio instead. Arnd Bergmann is maintaining a driver
out-of-tree for some IBM cell boards which is very similar, IIRC.

In my driver, I used the PCI Agent's PCI BARs to contain ring
descriptors. The PCI Agent actually handles all data transfer (via the
onboard DMA engine). It works great. I'll gladly post it if you'd like
to see it.
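
For readers who have not seen that style of driver, the implied shape is
roughly the following (an illustrative sketch with made-up names, not
Ira's actual code):

struct bar_ring_desc {			/* an array of these lives in a BAR */
	u32 host_addr_lo;		/* PCI address of the host-side buffer */
	u32 host_addr_hi;
	u32 len;			/* bytes to transfer */
	u32 flags;			/* OWNED_BY_AGENT, DIR_TX/DIR_RX, ... */
};

Because the descriptors sit in the agent's BAR, the x86 host (which has no
DMA engine) only performs cheap MMIO writes to post work; the powerpc
agent takes an interrupt, walks the ring, and uses its onboard DMA engine
to move the actual payload in either direction.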

In my driver, I had to use 64K MTU to get acceptable performance. I'm
not entirely sure how to implement a driver that can handle
scatter/gather (fragmented skb's). It clearly isn't that easy to tune a
network driver for good performance. For reference, my "crossed-wires"
virtio drivers achieved excellent performance (10x better than my custom
driver) with 1500 byte MTU.

> > I've been contacted separately by 10+ people also looking
> > for a similar solution. I hunch most of them end up doing what I did:
> > write a quick-and-dirty network driver. I've been working on this for a
> > year, just to give an idea.
>
> The whole architecture of having multiple heterogeneous systems on a
> common high speed backplane is what IBM refers to as "hybrid computing".
> It's a model that I think will become a lot more common in the
> future. I think there are typically two types of hybrid models
> depending on whether the memory sharing is cache coherent or not. If
> you have coherent shared memory, the problem looks an awful lot like
> virtualization. If you don't have coherent shared memory, then the
> shared memory basically becomes a pool to bounce into and out of.
>

Let's say I could get David Miller to accept a driver as described
above. Would you really want 10+ separate but extremely similar drivers
for similar boards? Such as mine, Arnd's, Kyle's, etc. It is definitely
a niche that Linux is lacking support for. And as you say, it is
growing.

It seems that no matter what I try, everyone says: no, go do this other
thing instead. Before I go and write the 5th iteration of this, I'll be
looking for a maintainer who says: this is the correct thing to be
doing, I'll help you push this towards mainline. It's been frustrating.

Ira

2009-12-27 09:16:49

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/23/2009 11:21 PM, Gregory Haskins wrote:
> That said, you are still incorrect. With what I proposed, the model
> will run as an in-kernel vbus device, and no longer run in userspace.
> It would therefore improve virtio-net as I stated, much in the same
> way vhost-net or venet-tap do today.
>

That can't work. virtio-net has its own ABI on top of virtio, for
example it prepends a header for TSO information. Maybe if you disable
all features it becomes compatible with venet, but that cripples it.

--
error compiling committee.c: too many arguments to function

2009-12-27 09:30:33

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/24/2009 11:31 AM, Gregory Haskins wrote:
> On 12/23/09 3:36 PM, Avi Kivity wrote:
>
>> On 12/23/2009 06:44 PM, Gregory Haskins wrote:
>>
>>>
>>>> - Are a pure software concept
>>>>
>>>>
>>> By design. In fact, I would describe it as "software to software
>>> optimized" as opposed to trying to shoehorn into something that was
>>> designed as a software-to-hardware interface (and therefore has
>>> assumptions about the constraints in that environment that are not
>>> applicable in software-only).
>>>
>>>
>>>
>> And that's the biggest mistake you can make.
>>
> Sorry, that is just wrong or you wouldn't have virtio either.
>

Things are not black and white. I prefer not to have paravirtualization
at all. When there is no alternative, I prefer to limit it to the
device level and keep it off the bus level.

>> Look at Xen, for
>> instance. They paravirtualized the fork out of everything that moved in
>> order to get x86 virt going. And where are they now? x86_64 syscalls
>> are slow since they have to trap to the hypervisor and (partially) flush
>> the tlb. With npt or ept capable hosts performance is better for many
>> workloads on fullvirt. And paravirt doesn't support Windows. Their
>> unsung hero Jeremy is still trying to upstream dom0 Xen support. And
>> they get to support it forever.
>>
> We are only talking about PV-IO here, so not apples to apples to what
> Xen is going through.
>

The same principles apply.

>> VMware stuck with the hardware defined interfaces. Sure they had to
>> implement binary translation to get there, but as a result, they only
>> have to support one interface, all guests support it, and they can drop
>> it on newer hosts where it doesn't give them anything.
>>
> Again, you are confusing PV-IO. Not relevant here. Afaict, vmware,
> kvm, xen, etc, all still do PV-IO and likely will for the foreseeable
> future.
>

They're all doing it very differently:

- pure emulation (qemu e1000, etc.)
- pci device (vmware, virtio/pci)
- paravirt bus bridged through a pci device (Xen hvm, Hyper-V (I think),
venet/vbus)
- paravirt bus (Xen pv, early vbus, virtio/lguest, virtio/s390)

The higher you are up this scale the easier things are, so once you get
reasonable performance there is no need to descend further.

--
error compiling committee.c: too many arguments to function

2009-12-27 09:34:22

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>> As a twist on this, the VMware paravirt driver interface is so
>> hardware-like that they're getting hardware vendors to supply cards that
>> implement it. Try that with a pure software approach.
>>
> Any hardware engineer (myself included) will tell you that, generally
> speaking, what you can do in hardware you can do in software (think of
> what QEMU does today, for instance). It's purely a cost/performance
> tradeoff.
>
> I can at least tell you that is true of vbus. Anything on the vbus side
> would be equally eligible for a hardware implementation, though there is
> no reason to do this today since we have equivalent functionality in
> baremetal already.

There's a huge difference in the probability of vmware getting cards to
their spec, or x86 vendors improving interrupt delivery to guests,
compared to vbus being implemented in hardware.

> The only motivation is if you wanted to preserve
> ABI etc, which is what vmware is presumably after. However, I am not
> advocating this as necessary at this juncture.
>

Maybe AlacrityVM users don't care about compatibility, but my users do.

--
error compiling committee.c: too many arguments to function

2009-12-27 13:18:45

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 4:15 AM, Avi Kivity wrote:
> On 12/23/2009 11:21 PM, Gregory Haskins wrote:
>> That said, you are still incorrect. With what I proposed, the model
>> will run as an in-kernel vbus device, and no longer run in userspace.
>> It would therefore improve virtio-net as I stated, much in the same
>> way vhost-net or venet-tap do today.
>>
>
> That can't work. virtio-net has its own ABI on top of virtio, for
> example it prepends a header for TSO information. Maybe if you disable
> all features it becomes compatible with venet, but that cripples it.
>


You are confused. The backend would be virtio-net specific, and would
therefore understand the virtio-net ABI. It would support any feature
of virtio-net as long as it was implemented and negotiated by both sides
of the link.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-27 13:28:24

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/2009 03:18 PM, Gregory Haskins wrote:
> On 12/27/09 4:15 AM, Avi Kivity wrote:
>
>> On 12/23/2009 11:21 PM, Gregory Haskins wrote:
>>
>>> That said, you are still incorrect. With what I proposed, the model
>>> will run as an in-kernel vbus device, and no longer run in userspace.
>>> It would therefore improve virtio-net as I stated, much in the same
>>> way vhost-net or venet-tap do today.
>>>
>>>
>> That can't work. virtio-net has its own ABI on top of virtio, for
>> example it prepends a header for TSO information. Maybe if you disable
>> all features it becomes compatible with venet, but that cripples it.
>>
>>
> You are confused. The backend would be virtio-net specific, and would
> therefore understand the virtio-net ABI. It would support any feature
> of virtio-net as long as it was implemented and negotiated by both sides
> of the link.
>

Then we're back to square one. A nice demonstration of vbus
flexibility, but no help for virtio.

--
error compiling committee.c: too many arguments to function

2009-12-27 13:34:53

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 4:33 AM, Avi Kivity wrote:
> On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>>> As a twist on this, the VMware paravirt driver interface is so
>>> hardware-like that they're getting hardware vendors to supply cards that
>>> implement it. Try that with a pure software approach.
>>>
>> Any hardware engineer (myself included) will tell you that, generally
>> speaking, what you can do in hardware you can do in software (think of
>> what QEMU does today, for instance). It's purely a cost/performance
>> tradeoff.
>>
>> I can at least tell you that is true of vbus. Anything on the vbus side
>> would be equally eligible for a hardware implementation, though there is
>> no reason to do this today since we have equivalent functionality in
>> baremetal already.
>
> There's a huge difference in the probability of vmware getting cards to
> their spec, or x86 vendors improving interrupt delivery to guests,
> compared to vbus being implemented in hardware.

That's not relevant, however. I said in the original quote that you
snipped that I made it a software design on purpose, and you tried to
somehow paint that as a negative because vmware made theirs
"hardware-like" and you implied it could not be done with my approach
with the statement "try that with a pure software approach". And the
bottom line is that the statement is incorrect and/or misleading.

>
>> The only motivation is if you wanted to preserve
>> ABI etc, which is what vmware is presumably after. However, I am not
>> advocating this as necessary at this juncture.
>>
>
> Maybe AlacrityVM users don't care about compatibility, but my users do.

Again, not relevant to this thread. Making your interface
"hardware-like" buys you nothing in the end, as you ultimately need to
load drivers in the guest either way, and any major OS lets you extend
both devices and buses with relative ease. The only counter example
would be if you truly were "hardware-exactly" like e1000 emulation, but
we already know that this means it is hardware centric and not
"exit-rate aware" and would perform poorly. Otherwise "compatible" is
purely a point on the time line (for instance, the moment virtio-pci ABI
shipped), not an architectural description such as "hardware-like".



>



Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-27 13:39:35

by Gregory Haskins

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 8:27 AM, Avi Kivity wrote:
> On 12/27/2009 03:18 PM, Gregory Haskins wrote:
>> On 12/27/09 4:15 AM, Avi Kivity wrote:
>>
>>> On 12/23/2009 11:21 PM, Gregory Haskins wrote:
>>>
>>>> That said, you are still incorrect. With what I proposed, the model
>>>> will run as an in-kernel vbus device, and no longer run in userspace.
>>>> It would therefore improve virtio-net as I stated, much in the same
>>>> way vhost-net or venet-tap do today.
>>>>
>>>>
>>> That can't work. virtio-net has its own ABI on top of virtio, for
>>> example it prepends a header for TSO information. Maybe if you disable
>>> all features it becomes compatible with venet, but that cripples it.
>>>
>>>
>> You are confused. The backend would be virtio-net specific, and would
>> therefore understand the virtio-net ABI. It would support any feature
>> of virtio-net as long as it was implemented and negotiated by both sides
>> of the link.
>>
>
> Then we're back to square one. A nice demonstration of vbus
> flexibility, but no help for virtio.
>

No, where we are is at the point where we demonstrate that your original
statement that I did nothing to improve virtio was wrong.

-Greg


Attachments:
signature.asc (267.00 B)
OpenPGP digital signature

2009-12-27 13:49:47

by Avi Kivity

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/2009 03:34 PM, Gregory Haskins wrote:
> On 12/27/09 4:33 AM, Avi Kivity wrote:
>
>> On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>>
>>>> As a twist on this, the VMware paravirt driver interface is so
>>>> hardware-like that they're getting hardware vendors to supply cards that
>>>> implement it. Try that with a pure software approach.
>>>>
>>>>
>>> Any hardware engineer (myself included) will tell you that, generally
>>> speaking, what you can do in hardware you can do in software (think of
>>> what QEMU does today, for instance). It's purely a cost/performance
>>> tradeoff.
>>>
>>> I can at least tell you that is true of vbus. Anything on the vbus side
>>> would be equally eligible for a hardware implementation, though there is
>>> no reason to do this today since we have equivalent functionality in
>>> baremetal already.
>>>
>> There's a huge difference in the probability of vmware getting cards to
>> their spec, or x86 vendors improving interrupt delivery to guests,
>> compared to vbus being implemented in hardware.
>>
> That's not relevant, however. I said in the original quote that you
> snipped that I made it a software design on purpose, and you tried to
> somehow paint that as a negative because vmware made theirs
> "hardware-like" and you implied it could not be done with my approach
> with the statement "try that with a pure software approach". And the
> bottom line is that the statement is incorrect and/or misleading.
>

It's not incorrect. VMware stuck to the pci specs and as a result they
can have hardware implement their virtual NIC protocol. For vbus this
is much harder to do since you need a side-channel between different
cards to coordinate interrupt delivery. In theory you can do everything
if you don't consider practicalities.

That's a digression, though, I'm not suggesting we'll see virtio
hardware or that this is a virtio/pci advantage vs. vbus. It's an
anecdote showing that sticking with specs has its advantages.

wrt pci vs vbus, the difference is in the ability to use improvements in
interrupt delivery accelerations in virt hardware. If this happens,
virtio/pci can immediately take advantage of it, while vbus has to stick
with software delivery for backward compatibility, and all that code
becomes a useless support burden.

As an example of what hardware can do when it really sets its mind to
it, s390 can IPI from vcpu to vcpu without exiting to the host.

>>> The only motivation is if you wanted to preserve
>>> ABI etc, which is what vmware is presumably after. However, I am not
>>> advocating this as necessary at this juncture.
>>>
>>>
>> Maybe AlacrityVM users don't care about compatibility, but my users do.
>>
> Again, not relevant to this thread. Making your interface
> "hardware-like" buys you nothing in the end, as you ultimately need to
> load drivers in the guest either way, and any major OS lets you extend
> both devices and buses with relative ease. The only counter example
> would be if you truly were "hardware-exactly" like e1000 emulation, but
> we already know that this means it is hardware centric and not
> "exit-rate aware" and would perform poorly. Otherwise "compatible" is
> purely a point on the time line (for instance, the moment virtio-pci ABI
> shipped), not an architectural description such as "hardware-like".
>

True, not related to the thread. But it is a problem. The difference
between virtio and vbus here is that virtio is already deployed and its
users expect not to reinstall drivers [1]. Before virtio existed,
people could not deploy performance sensitive applications on kvm. Now
that it exists, we have to support it without requiring users to touch
their guests.

That means that without proof that virtio cannot be scaled, we'll keep
supporting and extending it.


[1] Another difference is the requirement for writing a "bus driver" for
every supported guest, which means dealing with icky bits like hotplug.

--
error compiling committee.c: too many arguments to function

2009-12-27 13:50:30

by Avi Kivity

[permalink] [raw]
Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/2009 03:39 PM, Gregory Haskins wrote:
> No, where we are is at the point where we demonstrate that your original
> statement that I did nothing to improve virtio was wrong.
>
>

I stand by it. virtio + your patch does nothing without a ton more work
(more or less equivalent to vhost-net).

--
error compiling committee.c: too many arguments to function

2009-12-27 14:29:55

by Gregory Haskins

[permalink] [raw]
Subject: Re: [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 8:49 AM, Avi Kivity wrote:
> On 12/27/2009 03:34 PM, Gregory Haskins wrote:
>> On 12/27/09 4:33 AM, Avi Kivity wrote:
>>
>>> On 12/24/2009 11:36 AM, Gregory Haskins wrote:
>>>
>>>>> As a twist on this, the VMware paravirt driver interface is so
>>>>> hardware-like that they're getting hardware vendors to supply cards
>>>>> that
>>>>> implement it. Try that with a pure software approach.
>>>>>
>>>>>
>>>> Any hardware engineer (myself included) will tell you that, generally
>>>> speaking, what you can do in hardware you can do in software (think of
>>>> what QEMU does today, for instance). It's purely a cost/performance
>>>> tradeoff.
>>>>
>>>> I can at least tell you that is true of vbus. Anything on the vbus
>>>> side
>>>> would be equally eligible for a hardware implementation, though
>>>> there is
>>>> no reason to do this today since we have equivalent functionality in
>>>> baremetal already.
>>>>
>>> There's a huge difference in the probability of vmware getting cards to
>>> their spec, or x86 vendors improving interrupt delivery to guests,
>>> compared to vbus being implemented in hardware.
>>>
>> Thats not relevant, however. I said in the original quote that you
>> snipped that I made it a software design on purpose, and you tried to
>> somehow paint that as a negative because vmware made theirs
>> "hardware-like" and you implied it could not be done with my approach
>> with the statement "try that with a pure software approach". And the
>> bottom line is that the statement is incorrect and/or misleading.
>>
>
> It's not incorrect.

At the very best it's misleading.

> VMware stuck to the PCI specs and as a result they
> can have hardware implement their virtual NIC protocol. For vbus this
> is much harder

Not really.

> to do since you need a side-channel between different
> cards to coordinate interrupt delivery. In theory you can do everything
> if you don't consider practicalities.

PCI-based designs, such as VMware's and virtio-pci, aren't free of this
notion either. They simply rely on APIC emulation for the irq-chip, and
it just so happens that vbus implements a different irq-chip (more
specifically, the connector that we employ between the guest and vbus
does). On the one hand, you have the advantage of the guest already
supporting the irq-chip ABI; on the other, you have an optimized
(e.g. shared-memory-based inject/ack) and feature-enhanced ABI
(interrupt priority, no IDT constraints, etc). There are pros and cons
to either direction, but the vbus project charter is to go for maximum
performance and features, so that is acceptable to us.
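
To make the shared-memory inject/ack point concrete, here is a minimal
sketch of the idea (illustrative only -- the names and layout are
invented and are not the actual shm-signal/vbus ABI):

#include <stdint.h>

/*
 * Illustrative sketch only: "inject" and "ack" are atomic flag flips in
 * memory shared by host and guest, so an exit (the "kick") is needed
 * only on an idle -> pending transition.
 */
struct shm_irq_desc {
        uint32_t pending;   /* host sets this when an event is injected */
        uint32_t enabled;   /* guest gates delivery (mask/unmask)       */
        uint32_t priority;  /* guest services highest priority first    */
};

/* Host side: mark the event pending; kick only on a 0 -> 1 edge. */
static void shm_irq_inject(struct shm_irq_desc *d, void (*kick)(void))
{
        uint32_t was = __atomic_exchange_n(&d->pending, 1, __ATOMIC_ACQ_REL);

        if (!was && __atomic_load_n(&d->enabled, __ATOMIC_ACQUIRE))
                kick();     /* e.g. a single doorbell/hypercall */
}

/*
 * Guest side: clear the flag before servicing so an injection that
 * races with processing is not lost.  Returns nonzero if work was
 * pending.
 */
static int shm_irq_ack(struct shm_irq_desc *d)
{
        return __atomic_exchange_n(&d->pending, 0, __ATOMIC_ACQ_REL);
}

The ack path touches only shared memory, which is the kind of
optimization being referred to above; the priority field stands in for
the "interrupt priority" feature mentioned in the parenthetical.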


>
> That's a digression, though, I'm not suggesting we'll see virtio
> hardware or that this is a virtio/pci advantage vs. vbus. It's an
> anecdote showing that sticking with specs has its advantages.

It also has distinct disadvantages. For instance, the PCI spec is
gigantic, yet almost none of it is needed to do the job here. When you
are talking full-virt, you are left with no choice. With para-virt, you
do have a choice, and the vbus-connector for AlacrityVM capitalizes on this.

As an example, think about all the work that went into emulating the PCI
chipset, the APIC chipset, MSI-X support, irq-routing, etc, when all you
needed was a simple event-queue to indicate that an event (e.g. an
"interrupt") occurred.

This is an example connector in vbus:

http://git.kernel.org/?p=linux/kernel/git/ghaskins/alacrityvm/linux-2.6.git;a=blob;f=kernel/vbus/connectors/null.c;h=b6d16cb68b7e49e07528278bc9f5b73e1dac0c2f;hb=HEAD

It encapsulates all of hotplug, signal (interrupt) routing, and memory
routing for both sides of the "link" in 584 lines of code. It also
implicitly brings in device discovery and configuration, since those
are covered by the vbus framework. Try doing that with PCI, especially
when you are not already under the qemu umbrella, and the
"standards-based" approach suddenly doesn't look very attractive.

>
> wrt pci vs vbus, the difference is in the ability to use improvements in
> interrupt delivery accelerations in virt hardware.

Most of which will apply to the current vbus design as well, since at
some point I have to have an underlying IDT mechanism too, btw.

> If this happens,
> virtio/pci can immediately take advantage of it, while vbus has to stick
> with software delivery for backward compatibility, and all that code
> becomes a useless support burden.
>

The shared-memory path will always be the fastest anyway, so I am not
too worried about it. But vbus supports feature negotiation, so we can
always phase that out if need be, same as anything else.
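
For context, the kind of feature negotiation meant here is typically
just a bitmask handshake. A generic sketch (the bit names are invented;
this is not the vbus wire format):

#include <stdint.h>

#define FEAT_SW_IRQ_INJECT  (1ULL << 0)  /* shared-memory injection path  */
#define FEAT_HW_IRQ_ACCEL   (1ULL << 1)  /* hypothetical hw-assisted path */

/* Both sides act only on the intersection of what the host offers and
 * what the guest understands. */
static uint64_t negotiate(uint64_t host_offers, uint64_t guest_wants)
{
        return host_offers & guest_wants;
}

Phasing out the software path then just means a newer host stops
advertising FEAT_SW_IRQ_INJECT, while guests that still need it keep
working against hosts that still offer it.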

> As an example of what hardware can do when it really sets its mind to
> it, s390 can IPI from vcpu to vcpu without exiting to the host.

Great! I am just not in the habit of waiting for hardware to cover for
sloppy software. Doing so is impractical for a number of reasons, not
least that the hardware, even once available, will not be ubiquitous
instantly.

>
>>>> The only motivation is if you wanted to preserve
>>>> ABI etc, which is what vmware is presumably after. However, I am not
>>>> advocating this as necessary at this juncture.
>>>>
>>>>
>>> Maybe AlacrityVM users don't care about compatibility, but my users do.
>>>
>> Again, not relevant to this thread. Making your interface
>> "hardware-like" buys you nothing in the end, as you ultimately need to
>> load drivers in the guest either way, and any major OS lets you extend
>> both devices and buses with relative ease. The only counter example
>> would be if you truly were "hardware-exactly" like e1000 emulation, but
>> we already know that this means it is hardware centric and not
>> "exit-rate aware" and would perform poorly. Otherwise "compatible" is
>> purely a point on the time line (for instance, the moment virtio-pci ABI
>> shipped), not an architectural description such as "hardware-like".
>>
>
> True, not related to the thread. But it is a problem.

Agreed. It is a distinct disadvantage to switching. Note that I am not
advocating that we need to switch. virtio-pci can coexist peacefully
from my perspective, and AlacrityVM does exactly this.

-Greg



2009-12-28 01:01:28

by Gregory Haskins

Subject: Re: [Alacrityvm-devel] [GIT PULL] AlacrityVM guest drivers for 2.6.33

On 12/27/09 8:49 AM, Avi Kivity wrote:
> On 12/27/2009 03:39 PM, Gregory Haskins wrote:
>> No, where we are is at the point where we demonstrate that your original
>> statement that I did nothing to improve virtio was wrong.
>>
>>
>
> I stand by it. virtio + your patch does nothing without a ton more work
> (more or less equivalent to vhost-net).
>

Perhaps, but my work predates vhost-net by months and that has nothing
to do with what we are talking about anyway. Since you snipped your
original comment that started the thread, here it is again:

On 12/23/09 5:22 AM, Avi Kivity wrote:
> >
> > There was no attempt by Gregory to improve virtio-net.

It's not a gray area, nor open to interpretation. That statement was,
is, and will always be demonstrably false, so I'm sorry but you are
still wrong.

-Greg

