2007-08-22 05:28:32

by Zachary Amsden

[permalink] [raw]
Subject: [PATCH] Add I/O hypercalls for i386 paravirt

In general, I/O in a virtual guest is subject to performance problems.
The I/O can not be completed physically, but must be virtualized. This
means trapping and decoding port I/O instructions from the guest OS.
Not only is the trap for a #GP heavyweight, both in the processor and
the hypervisor (which usually has a complex #GP path), but this forces
the hypervisor to decode the individual instruction which has faulted.
Worse, even with hardware assist such as VT, the exit reason alone is
not sufficient to determine the true nature of the faulting instruction,
requiring a complex and costly instruction decode and simulation.

This patch provides hypercalls for the i386 port I/O instructions, which
vastly helps guests which use native-style drivers. For certain VMI
workloads, this provides a performance boost of up to 30%. We expect
KVM and lguest to be able to achieve similar gains on I/O intensive
workloads.

This patch is against 2.6.23-rc2-mm2, and should be targeted for 2.6.24.

Zach


Attachments:
i386-mm-paravirt-io-ops.patch (8.79 kB)

2007-08-22 05:34:36

by Avi Kivity

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Zachary Amsden wrote:
> In general, I/O in a virtual guest is subject to performance
> problems. The I/O can not be completed physically, but must be
> virtualized. This means trapping and decoding port I/O instructions
> from the guest OS. Not only is the trap for a #GP heavyweight, both
> in the processor and the hypervisor (which usually has a complex #GP
> path), but this forces the hypervisor to decode the individual
> instruction which has faulted. Worse, even with hardware assist such
> as VT, the exit reason alone is not sufficient to determine the true
> nature of the faulting instruction, requiring a complex and costly
> instruction decode and simulation.
>
> This patch provides hypercalls for the i386 port I/O instructions,
> which vastly helps guests which use native-style drivers. For certain
> VMI workloads, this provides a performance boost of up to 30%. We
> expect KVM and lguest to be able to achieve similar gains on I/O
> intensive workloads.
>


Won't these workloads be better off using paravirtualized drivers?
i.e., do the native drivers with paravirt I/O instructions get anywhere
near the performance of paravirt drivers?


--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-08-22 05:46:07

by Zachary Amsden

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Avi Kivity wrote:
> Zachary Amsden wrote:
>
>> In general, I/O in a virtual guest is subject to performance
>> problems. The I/O can not be completed physically, but must be
>> virtualized. This means trapping and decoding port I/O instructions
>> from the guest OS. Not only is the trap for a #GP heavyweight, both
>> in the processor and the hypervisor (which usually has a complex #GP
>> path), but this forces the hypervisor to decode the individual
>> instruction which has faulted. Worse, even with hardware assist such
>> as VT, the exit reason alone is not sufficient to determine the true
>> nature of the faulting instruction, requiring a complex and costly
>> instruction decode and simulation.
>>
>> This patch provides hypercalls for the i386 port I/O instructions,
>> which vastly helps guests which use native-style drivers. For certain
>> VMI workloads, this provides a performance boost of up to 30%. We
>> expect KVM and lguest to be able to achieve similar gains on I/O
>> intensive workloads.
>>
>>
>
>
> Won't these workloads be better off using paravirtualized drivers?
> i.e., do the native drivers with paravirt I/O instructions get anywhere
> near the performance of paravirt drivers?
>

Yes, in general, this is true (better off with paravirt drivers).
However, we have "paravirt" drivers which run in both
fully-paravirtualized and fully traditionally virtualized environments.
As a result, they use native port I/O operations to interact with
virtual hardware.

Since not all hypervisors have paravirtualized driver infrastructures
and guest O/S support yet, these hypercalls can be advantageous in a
wide range of scenarios. Using I/O hypercalls as such gives exactly the same
performance as paravirt drivers for us, by eliminating the costly decode
path, and the simplicity of using the same driver code makes this a huge
win in code complexity.

Zach

2007-08-22 06:00:47

by H. Peter Anvin

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Zachary Amsden wrote:
> In general, I/O in a virtual guest is subject to performance problems.
> The I/O can not be completed physically, but must be virtualized. This
> means trapping and decoding port I/O instructions from the guest OS.
> Not only is the trap for a #GP heavyweight, both in the processor and
> the hypervisor (which usually has a complex #GP path), but this forces
> the hypervisor to decode the individual instruction which has faulted.
> Worse, even with hardware assist such as VT, the exit reason alone is
> not sufficient to determine the true nature of the faulting instruction,
> requiring a complex and costly instruction decode and simulation.
>
> This patch provides hypercalls for the i386 port I/O instructions, which
> vastly helps guests which use native-style drivers. For certain VMI
> workloads, this provides a performance boost of up to 30%. We expect
> KVM and lguest to be able to achieve similar gains on I/O intensive
> workloads.
>

What about cost on hardware?

-hpa

2007-08-22 06:08:31

by Zachary Amsden

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

H. Peter Anvin wrote:
> Zachary Amsden wrote:
>
>> In general, I/O in a virtual guest is subject to performance problems.
>> The I/O can not be completed physically, but must be virtualized. This
>> means trapping and decoding port I/O instructions from the guest OS.
>> Not only is the trap for a #GP heavyweight, both in the processor and
>> the hypervisor (which usually has a complex #GP path), but this forces
>> the hypervisor to decode the individual instruction which has faulted.
>> Worse, even with hardware assist such as VT, the exit reason alone is
>> not sufficient to determine the true nature of the faulting instruction,
>> requiring a complex and costly instruction decode and simulation.
>>
>> This patch provides hypercalls for the i386 port I/O instructions, which
>> vastly helps guests which use native-style drivers. For certain VMI
>> workloads, this provides a performance boost of up to 30%. We expect
>> KVM and lguest to be able to achieve similar gains on I/O intensive
>> workloads.
>>
>>
>
> What about cost on hardware?
>

On modern hardware, port I/O is about the most expensive thing you can
do. The extra function call cost is totally masked by the stall. We
have measured with port I/O converted like this on real hardware, and
have seen zero measurable impact on macro-benchmarks. Micro-benchmarks
that generate massively repeated port I/O might show some effect on
ancient hardware, but I can't even imagine a workload which does such a
thing, other than a polling port I/O loop perhaps - which would not be
performance critical in any case I can reasonably imagine.

Zach

2007-08-22 06:26:44

by Rusty Russell

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, 2007-08-22 at 08:34 +0300, Avi Kivity wrote:
> Zachary Amsden wrote:
> > This patch provides hypercalls for the i386 port I/O instructions,
> > which vastly helps guests which use native-style drivers. For certain
> > VMI workloads, this provides a performance boost of up to 30%. We
> > expect KVM and lguest to be able to achieve similar gains on I/O
> > intensive workloads.
>
> Won't these workloads be better off using paravirtualized drivers?
> i.e., do the native drivers with paravirt I/O instructions get anywhere
> near the performance of paravirt drivers?

This patch also means I can kill off the emulation code in
drivers/lguest/core.c, which is a real relief.

Cheers,
Rusty.


2007-08-22 08:37:46

by Avi Kivity

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Zachary Amsden wrote:
> Avi Kivity wrote:
>> Zachary Amsden wrote:
>>
>>> In general, I/O in a virtual guest is subject to performance
>>> problems. The I/O can not be completed physically, but must be
>>> virtualized. This means trapping and decoding port I/O instructions
>>> from the guest OS. Not only is the trap for a #GP heavyweight, both
>>> in the processor and the hypervisor (which usually has a complex #GP
>>> path), but this forces the hypervisor to decode the individual
>>> instruction which has faulted. Worse, even with hardware assist such
>>> as VT, the exit reason alone is not sufficient to determine the true
>>> nature of the faulting instruction, requiring a complex and costly
>>> instruction decode and simulation.
>>>
>>> This patch provides hypercalls for the i386 port I/O instructions,
>>> which vastly helps guests which use native-style drivers. For certain
>>> VMI workloads, this provides a performance boost of up to 30%. We
>>> expect KVM and lguest to be able to achieve similar gains on I/O
>>> intensive workloads.
>>>
>>>
>>
>>
>> Won't these workloads be better off using paravirtualized drivers?
>> i.e., do the native drivers with paravirt I/O instructions get anywhere
>> near the performance of paravirt drivers?
>>
>
> Yes, in general, this is true (better off with paravirt drivers).
> However, we have "paravirt" drivers which run in both
> fully-paravirtualized and fully traditionally virtualized
> environments. As a result, they use native port I/O operations to
> interact with virtual hardware.

Suffering from terminology overdose here: "fully traditionally
virtualized, fully-paravirtualized, para-fullyvirtualized".

Since this is only for newer kernels, won't updating the driver to use a
hypercall be more efficient? Or is this for existing out-of-tree drivers?

>
> Since not all hypervisors have paravirtualized driver infrastructures
> and guest O/S support yet, these hypercalls can be advantageous in a
> wide range of scenarios. Using I/O hypercalls as such gives exactly
> the same performance as paravirt drivers for us, by eliminating the
> costly decode path, and the simplicity of using the same driver code
> makes this a huge win in code complexity.

Ah, seems the answer to the last question is yes.


--
error compiling committee.c: too many arguments to function

2007-08-22 09:28:36

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote:
> In general, I/O in a virtual guest is subject to performance problems.
> The I/O can not be completed physically, but must be virtualized. This
> means trapping and decoding port I/O instructions from the guest OS.
> Not only is the trap for a #GP heavyweight, both in the processor and
> the hypervisor (which usually has a complex #GP path), but this forces
> the hypervisor to decode the individual instruction which has faulted.

Is that really that expensive? Hard to imagine.

e.g. you could always have a fast check for inb/outb at the beginning
of the #GP handler. And is your initial #GP entry really more expensive
than a hypercall?

> Worse, even with hardware assist such as VT, the exit reason alone is
> not sufficient to determine the true nature of the faulting instruction,
> requiring a complex and costly instruction decode and simulation.

It's unclear to me why that should be that costly.

Worst case it's a switch()

-Andi

2007-08-22 09:42:03

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

> This patch also means I can kill off the emulation code in
> drivers/lguest/core.c, which is a real relief.

But would it be faster? If not, or only by an insignificant amount, I
think I would prefer you keep it. Hooking I/O is quite intrusive
because it's done by so many drivers.

-Andi

2007-08-22 09:51:18

by Avi Kivity

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
>> This patch also means I can kill off the emulation code in
>> drivers/lguest/core.c, which is a real relief.
>>
>
> But would it be faster? If not or only insignificant amount I think I would
> prefer you keep it. Hooking IO is quite intrusive because it's done
> by so many drivers.
>

I don't see why it's intrusive -- they all use the APIs, right?


--
error compiling committee.c: too many arguments to function

2007-08-22 10:14:27

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

> I don't see why it's intrusive -- they all use the APIs, right?

Yes, but it still changes them. It might have a larger impact
on code size for example.

-Andi

2007-08-22 10:23:38

by Avi Kivity

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
>> I don't see why it's intrusive -- they all use the APIs, right?
>>
>
> Yes, but it still changes them. It might have a larger impact
> on code size for example.
>

Only if CONFIG_PARAVIRT is defined. And even then, all the performance
sensitive stuff uses mmio, no?


--
error compiling committee.c: too many arguments to function

2007-08-22 10:30:17

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, Aug 22, 2007 at 01:23:43PM +0300, Avi Kivity wrote:
> Andi Kleen wrote:
> >>I don't see why it's intrusive -- they all use the APIs, right?
> >>
> >
> >Yes, but it still changes them. It might have a larger impact
> >on code size for example.
> >
>
> Only if CONFIG_PARAVIRT is defined.

Which eventually distribution kernels will do.

> And even then, all the performance
> sensitive stuff uses mmio, no?

Not worried about performance, but just impact on code size etc.

-Andi

2007-08-22 12:09:08

by Avi Kivity

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
> On Wed, Aug 22, 2007 at 01:23:43PM +0300, Avi Kivity wrote:
>
>> Andi Kleen wrote:
>>
>>>> I don't see why it's intrusive -- they all use the APIs, right?
>>>>
>>>>
>>> Yes, but it still changes them. It might have a larger impact
>>> on code size for example.
>>>
>>>
>> Only if CONFIG_PARAVIRT is defined.
>>
>
> Which eventually distribution kernels will do.
>
>
>> And even then, all the performance
>> sensitive stuff uses mmio, no?
>>
>
> Not worried about performance, but just impact on code size etc.
>

Ah. But that's mostly modules, so real in-core changes should be very
small (say 10 bytes per call site x 10 call sites per driver x 10
drivers... even if off by an order of magnitude, it's still tiny).


--
error compiling committee.c: too many arguments to function

2007-08-22 12:22:08

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

> Ah. But that's mostly modules, so real in-core changes should be very

Yes, that's the big difference. Nearly all paravirt ops are concentrated
in the core kernel, but this one affects lots of people.

And why "but"? -- modules are as important as the core kernel. They're
not second-class citizens.

-Andi

2007-08-22 12:32:35

by Avi Kivity

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
>> Ah. But that's mostly modules, so real in-core changes should be very
>>
>
> Yes that's the big difference. Near all paravirt ops are concentrated
> on the core kernel, but this one affects lots of people.
>
> And why "but"? -- modules are as important as the core kernel. They're
> not second citizens.
>

It's not second-class treatment; only a few modules are loaded at
runtime, so most of the code-size impact is on disk. The in-core impact
is small. If paravirt I/O insns are worthwhile, I don't think code size
is an issue.

--
error compiling committee.c: too many arguments to function

2007-08-22 16:11:23

by Jeff Garzik

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Avi Kivity wrote:
> And even then, all the performance
> sensitive stuff uses mmio, no?


Depends on the hardware.

Jeff


2007-08-22 16:21:38

by Zachary Amsden

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Avi Kivity wrote:
>
> Since this is only for newer kernels, won't updating the driver to use
> a hypercall be more efficient? Or is this for existing out-of-tree
> drivers?

Actually, it is for in-tree drivers that we emulate but don't want to
pollute, and one out-of-tree driver (that will hopefully be in tree
soon!) that has no way to determine if making hypercalls is acceptable.

Zach

2007-08-22 16:53:46

by Zachary Amsden

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
> On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote:
>
>> In general, I/O in a virtual guest is subject to performance problems.
>> The I/O can not be completed physically, but must be virtualized. This
>> means trapping and decoding port I/O instructions from the guest OS.
>> Not only is the trap for a #GP heavyweight, both in the processor and
>> the hypervisor (which usually has a complex #GP path), but this forces
>> the hypervisor to decode the individual instruction which has faulted.
>>
>
> Is that really that expensive? Hard to imagine.
>

You have an expensive privilege transition (16x the cost of a hypercall
on some processors), you have to decode the instruction, and then you
have to verify that the page tables mapping the page allow execution
(P, !NX, and U/S checks). This is a lot more expensive than a
hypercall.

> e.g. you could always have a fast check for inb/outb at the beginning
> of the #GP handler. And is your initial #GP entry really more expensive
> than a hypercall?
>

The number of reasons for #GP is enormous, and there are too many paths
to optimize with fast checks. We do have a fast check for inb/outb;
it's just not fast enough.

On P4, hypercall entry is 120 cycles. #GP is about 2000. Modern
processors are better, but a hypercall is always faster than a fault.
Many times, the hypercall can be handled and ready to return before a
#GP would even complete.

On workloads that sit there and hammer network cards, these costs become
significant, and latency sensitive network benchmarks suffer.

>> Worse, even with hardware assist such as VT, the exit reason alone is
>> not sufficient to determine the true nature of the faulting instruction,
>> requiring a complex and costly instruction decode and simulation.
>>
>
> It's unclear to me why that should be that costly.
>
> Worst case it's a switch()
>

There are 24 different possible I/O operations; sometimes with a port
encoded in the instruction, sometimes with input in the DX register,
sometimes with a rep prefix, and for 3 different operand sizes.

Combine that with the MMU checks required, and it's complex and branchy
enough to justify short-circuiting the whole thing with a simple hypercall.

Zach

2007-08-22 17:05:37

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, Aug 22, 2007 at 09:48:25AM -0700, Zachary Amsden wrote:
> Andi Kleen wrote:
> >On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote:
> >
> >>In general, I/O in a virtual guest is subject to performance problems.
> >>The I/O can not be completed physically, but must be virtualized. This
> >>means trapping and decoding port I/O instructions from the guest OS.
> >>Not only is the trap for a #GP heavyweight, both in the processor and
> >>the hypervisor (which usually has a complex #GP path), but this forces
> >>the hypervisor to decode the individual instruction which has faulted.
> >>
> >
> >Is that really that expensive? Hard to imagine.
> >
>
> You have an expensive (16x cost of hypercall on some processors)

Where is the difference coming from? Are you using SYSENTER
for the hypercall? I can't really see you using SYSENTER,
because how would you do system calls then? I bet system calls
are more frequent than in/out, so if you have to decide between the
two, using it for syscalls is likely faster.

For an int XYZ gate i wouldn't expect that much difference to
a #GP fault.

Also I fail to see the fundamental speed difference between

mov index,register
int 0x...
...
switch (register)
case xxxx: do emulation

versus

out ...
#gp
-> switch (*eip) {
case 0xee: /* etc. */
do emulation

> privilege transition, you have to decode the instruction, then you have

out is usually a single byte. Shouldn't be very expensive
to decode. In fact it should be roughly equivalent to your
hypercall multiplex.

> to verify protection in the page tables mapping the page allows
> execution (P, !NX, and U/S check). This is a lot more expensive than a

When the page is not executable or not present you get #PF not #GP.
So the hardware already checks that.

The only case where you would need to check yourself is if you emulate
NX on non NX capable hardware, but I can't see you doing that.

> There are 24 different possible I/O operations; sometimes with a port
> encoded in the instruction, sometimes with input in the DX register,
> sometimes with a rep prefix, and for 3 different operand sizes.

Most of this is a single byte which is the same as the hypercall
demux. Essentially a table lookup if you use the obvious switch()

-Andi

2007-08-22 17:13:09

by Zachary Amsden

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
> On Wed, Aug 22, 2007 at 09:48:25AM -0700, Zachary Amsden wrote:
>
>> Andi Kleen wrote:
>>
>>> On Tue, Aug 21, 2007 at 10:23:14PM -0700, Zachary Amsden wrote:
>>>
>>>
>>>> In general, I/O in a virtual guest is subject to performance problems.
>>>> The I/O can not be completed physically, but must be virtualized. This
>>>> means trapping and decoding port I/O instructions from the guest OS.
>>>> Not only is the trap for a #GP heavyweight, both in the processor and
>>>> the hypervisor (which usually has a complex #GP path), but this forces
>>>> the hypervisor to decode the individual instruction which has faulted.
>>>>
>>>>
>>> Is that really that expensive? Hard to imagine.
>>>
>>>
>> You have an expensive (16x cost of hypercall on some processors)
>>
>
> Where is the difference coming from? Are you using SYSENTER
> for the hypercall? I can't really see you using SYSENTER,
> because how would you do system calls then? I bet system calls
> are more frequent than in/out, so if you have to decide between the
> two, using it for syscalls is likely faster.
>

We use sysenter for hypercalls and also for system calls. :)

> Also I fail to see the fundamental speed difference between
>
> mov index,register
> int 0x...
> ...
> switch (register)
> case xxxx: do emulation
>

Int (on P4) is ~680 cycles.

> versus
>
> out ...
> #gp
> -> switch (*eip) {
> case 0xee: /* etc. */
> do emulation
>

#GP is ~2000 cycles.

>> to verify protection in the page tables mapping the page allows
>> execution (P, !NX, and U/S check). This is a lot more expensive than a
>>
>
> When the page is not executable or not present you get #PF not #GP.
> So the hardware already checks that.
>
> The only case where you would need to check yourself is if you emulate
> NX on non NX capable hardware, but I can't see you doing that.
>

No, it doesn't. Between the #GP and decode, you have an SMP race where
another processor can rewrite the instruction.

Zach

2007-08-22 17:27:52

by Alan

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

> out is usually a single byte. Shouldn't be very expensive
> to decode. In fact it should be roughly equivalent to your
> hypercall multiplex.

Why is a performance critical path on a paravirt kernel even using I/O
instructions and not paravirtual device drivers ?

It clearly makes sense to virtualise I/O operations if you are doing that
(so you can do posting, triggers and predicted reply handling guest side
to keep the trap rate sane) but I don't see why this situation occurs in
the first place for paravirt.

2007-08-22 18:52:50

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, Aug 22, 2007 at 10:07:47AM -0700, Zachary Amsden wrote:
> >Also I fail to see the fundamental speed difference between
> >
> >mov index,register
> >int 0x...
> >...
> >switch (register)
> >case xxxx: do emulation
> >
>
> Int (on p4 == ~680 cycles).
>
> >versus
> >
> >out ...
> >#gp
> >-> switch (*eip) {
> >case 0xee: /* etc. */
> > do emulation
> >
>
> GP = ~2000 cycles.

How is that measured? In a loop? In the same pipeline state?

It seems a little dubious to me.

>
> >>to verify protection in the page tables mapping the page allows
> >>execution (P, !NX, and U/S check). This is a lot more expensive than a
> >>
> >
> >When the page is not executable or not present you get #PF not #GP.
> >So the hardware already checks that.
> >
> >The only case where you would need to check yourself is if you emulate
> >NX on non NX capable hardware, but I can't see you doing that.
> >
>
> No, it doesn't. Between the #GP and decode, you have an SMP race where
> another processor can rewrite the instruction.

That can be ignored imho. If the page goes away you'll notice
when you handle the page fault on read. If it becomes NX then the execution
just happened to be logically a little earlier.

My other objection to this scheme is that you'll change a zillion
drivers you'll never emulate which seems just stupid. You could
just change the small handful that you emulate to use hypercalls.

Or easier to just write a backend for the lguest virtio drivers,
that will be likely faster in the end anyways than this gross
hack.

Really, LinuxHAL^Wparavirt ops is already so complicated that
any new hooks need an extremely good justification, and that is
just not here for this.

We can add it if you find an equivalent number of hooks
to eliminate.

-Andi

2007-08-22 20:48:56

by Zachary Amsden

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
>
> How is that measured? In a loop? In the same pipeline state?
>
> It seems a little dubious to me.
>

I did the experiments in a controlled environment, with interrupts
disabled and care taken to get the pipeline into the same state. It was
a perfectly repeatable experiment. I don't have the exact cycle times
anymore, but they were the tightest measurements I've ever seen on
cycle counts, because of the unique nature of serializing the processor
for the fault / privilege transition. I tested a variety of different
conditions, including different types of #GP (yes, the cost does vary),
#NP, #PF, sysenter, and int $0xxx. Sysenter was the fastest, by far.
Int was about 5x the cost. #GP and friends were all about similar
costs. #PF was the most expensive.


>
>>>> to verify protection in the page tables mapping the page allows
>>>> execution (P, !NX, and U/S check). This is a lot more expensive than a
>>>>
>>>>
>>> When the page is not executable or not present you get #PF not #GP.
>>> So the hardware already checks that.
>>>
>>> The only case where you would need to check yourself is if you emulate
>>> NX on non NX capable hardware, but I can't see you doing that.
>>>
>>>
>> No, it doesn't. Between the #GP and decode, you have an SMP race where
>> another processor can rewrite the instruction.
>>
>
> That can be ignored imho. If the page goes away you'll notice
> when you handle the page fault on read. If it becomes NX then the execution
> just happened to be logically a little earlier.
>
>

No, you can't ignore it. The page protections won't change between the
#GP and the decoder execution, but the instruction can, causing you to
decode into the next page where the processor would not have. A
not-present (!P) page becomes obvious, but failure to respect NX or U/S
is an exploitable bug. Put a 1-byte instruction at the end of a page,
crossing into an NX (or supervisor) page. Then, remotely, keep
switching between the instruction and a segment override.

Result: the user executes an instruction on a supervisor code page,
learning data as a result; code on an NX page gets executed.

> Or easier to just write a backend for the lguest virtio drivers,
> that will be likely faster in the end anyways than this gross
> hack.
>

We already have drivers for all of our hardware in Linux. Most of the
hardware we emulate is physical hardware, and there are no virtual
drivers for it. Should we take the BusLogic driver and "paravirtualize"
it by adding VMI hypercalls? We might benefit from it, but would the
BusLogic driver? It sets a nasty precedent for maintenance as different
hypervisors and emulators hack up different drivers for their own
performance.

Our SCSI and IDE emulation and thus the drivers used by Linux are pretty
much fixed in stone; we are not going to go about changing a tricky
hardware interface to a virtual one, it is simply too risky for
something as critical as storage. We might be able to move our network
driver over to virtio, but that is not a short-term prospect either.

There is great advantage in talking to our existing device layer faster,
and this is something that is valuable today.

> Really LinuxHAL^wparavirt ops is already so complicated that
> any new hooks need an extremly good justification and that is
> just not here for this.
>
> We can add it if you find an equivalent number of hooks
> to eliminate.
>

Interesting trade. What if I sanitized the whole mess of I/O macros
into something fun and friendly:

native_port_in(int port, iosize_t opsize, int delay)
native_port_out(int port, iosize_t opsize, u32 output, int delay)
native_port_string_in(int port, void *ptr, iosize_t opsize,
                      unsigned count, int delay)
native_port_string_out(int port, void *ptr, iosize_t opsize,
                       unsigned count, int delay)

Then we can be rid of all the macro goo in io.h, which frightens my
mother. We might even be able to get rid of the umpteen different
places where drivers wrap iospace access with their own byte / word /
long functions so they can switch between port I/O and memory mapped I/O
by moving it all into common infrastructure.

We could make similar (unwelcome?) advances on the pte functions if it
were not for the regrettable disconnect between pte_high / pte_low and
the rest. Perhaps if it was hidden in macros?

Zach

2007-08-22 21:11:31

by Andi Kleen

Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

> No, you can't ignore it. The page protections won't change between the
> GP and the decoder execution, but the instruction can, causing you to
> decode into the next page where the processor would not have. !P
> becomes obvious, but failure to respect NX or U/S is an exploitable
> bug. Put a 1 byte instruction at the end of a page crossing into a NX
> (or supervisor page). Remotely, change keep switching between the
> instruction and a segment override.

Then you do this check only if the instruction crosses a page.
That is very cheap to test.

> Result: user executes instruction on supervisor code page, learning data
> as a result of this; code on NX page gets executed.

Most systems probably have gaps between user and supervisor
(like Linux), but ok.

> We already have drivers for all of our hardware in Linux. Most of the
> hardware we emulate is physical hardware, and there are no virtual
> drivers for it. Should we take the BusLogic driver and "paravirtualize"
> it by adding VMI hypercalls?

You're proposing instead to paravirtualize all drivers, even if 99.99%
of those will never ever have a driver model.

> We might benefit from it, but would the
> BusLogic driver? It sets a nasty precedent for maintenance as different
> hypervisors and emulators hack up different drivers for their own
> performance.

I still think it's preferable to change some drivers than everybody.

AFAIK BusLogic as real hardware is pretty much dead anyways,
so you're probably the only primary user of it anyways.
Go wild on it!

If you worry about it, do your own drivers like the other hypervisors.
I still suspect you could go faster if you used a paravirtually
optimized driver, and I'm not going to speculate on the
reasons why you don't want to do that.


> Our SCSI and IDE emulation and thus the drivers used by Linux are pretty
> much fixed in stone; we are not going to go about changing a tricky
> hardware interface to a virtual one, it is simply too risky for

You wouldn't need to change it; just add a very simple new one
(e.g. the lguest interface is nearly trivial)

> There is great advantage in talking to our existing device layer faster,
> and this is something that is valuable today.

Well that might be. I just think it would be a mistake
to design paravirt_ops based on someone's short term release engineering
considerations.

>
> >Really LinuxHAL^wparavirt ops is already so complicated that
> >any new hooks need an extremly good justification and that is
> >just not here for this.
> >
> >We can add it if you find an equivalent number of hooks
> >to eliminate.
> >
>
> Interesting trade. What if I sanitized the whole I/O messy macros into
> something fun and friendly:

That would be a cool project anyways. e.g. just moving
the NUMAQ support separately would clean it up. But probably not enough
on its own, sorry.

But you'll need separate interfaces anyways if you want to
go down the BusLogic change path. That could well be coupled
with a cleanup.

> We might even be able to get rid of the umpteen different
> places where drivers wrap iospace access with their own byte / word /
> long functions so they can switch between port I/O and memory mapped I/O
> by moving it all into common infrastructure.

I thought we had that already? But can't find it now :/

>
> We could make similar (unwelcome?) advances on the pte functions if it
> were not for the regrettable disconnect between pte_high / pte_low and
> the rest. Perhaps if it was hidden in macros?

You want to do what exactly?

If you mean PAE and non-PAE in the same binary: that would likely
need abstracted page tables first.

-Andi

2007-08-22 21:18:45

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

> I still think it's preferable to change some drivers than everybody.
>
> AFAIK BusLogic as real hardware is pretty much dead anyways,
> so you're probably the only primary user of it anyways.
> Go wild on it!

I don't believe anyone is materially maintaining the buslogic driver and
in time it's going to break completely.

> Well that might be. I just think it would be a mistake
> to design paravirt_ops based on someone's short term release engineering
> considerations.

Agreed, especially as an interface where each in or out traps into the
hypervisor is broken even for the model of virtualising hardware.

> I thought we had that already? But can't find it now :/

pci_iomap() and friends.

Alan

2007-08-22 21:46:50

by Zachary Amsden

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Alan Cox wrote:
>> I still think it's preferable to change some drivers than everybody.
>>
>> AFAIK BusLogic as real hardware is pretty much dead anyways,
>> so you're probably the only primary user of it anyways.
>> Go wild on it!
>>
>
> I don't believe anyone is materially maintaining the buslogic driver and
> in time it's going to break completely.
>

I think I was actually the last person to touch it ;)

>
>> Well that might be. I just think it would be a mistake
>> to design paravirt_ops based on someone's short term release engineering
>> considerations.
>>
>
> Agreed, especially as an interface where each in or out traps into the
> hypervisor is broken even for the model of virtualising hardware.
>

Well, it's not necessarily broken; it's just a different model. At some
point the cost of maintaining a whole suite of virtual drivers becomes
greater than leveraging a bunch of legacy drivers. If you can eliminate
most of the performance cost of that by changing something at a layer
below (port I/O), it is a win even if it is not a perfect solution.

But I think I've lost the argument anyways; it doesn't seem to be for
the greater good of Linux, and there are alternatives we can take.
Unfortunately for me, they require a lot more work.

Zach

2007-08-22 21:50:50

by Zachary Amsden

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
>
>> We might benefit from it, but would the
>> BusLogic driver? It sets a nasty precedent for maintenance as different
>> hypervisors and emulators hack up different drivers for their own
>> performance.
>>
>
> I still think it's preferable to change some drivers than everybody.
>
> AFAIK BusLogic as real hardware is pretty much dead anyways,
> so you're probably the only primary user of it anyways.
> Go wild on it!
>

It is looking juicy. Maybe another day.

Zach

2007-08-22 21:54:24

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Zachary Amsden wrote:
> This patch provides hypercalls for the i386 port I/O instructions,
> which vastly helps guests which use native-style drivers. For certain
> VMI workloads, this provides a performance boost of up to 30%. We
> expect KVM and lguest to be able to achieve similar gains on I/O
> intensive workloads.

Two comments:

- I should dust off my "break up paravirt_ops" patch, and this would fit
nicely into it (I think we already discussed this)

- What happens if you *don't* want to pv some of the io instructions?
What if you have a device which is directly exposed to the guest?

J

2007-08-22 22:21:29

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

* James Courtier-Dutton ([email protected]) wrote:
> If one could directly expose a device to the guest, this feature could
> be extremely useful for me.
> Is it possible? How would it manage to handle the DMA bus mastering?

Yes it's possible (Xen supports pci pass through). Without an IOMMU
(like Intel VT-d or AMD IOMMU) it's not DMA safe.

thanks,
-chris

2007-08-22 22:25:54

by James Courtier-Dutton

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> This patch provides hypercalls for the i386 port I/O instructions,
>> which vastly helps guests which use native-style drivers. For certain
>> VMI workloads, this provides a performance boost of up to 30%. We
>> expect KVM and lguest to be able to achieve similar gains on I/O
>> intensive workloads.
>
> Two comments:
>
> - I should dust off my "break up paravirt_ops" patch, and this would fit
> nicely into it (I think we already discussed this)
>
> - What happens if you *don't* want to pv some of the io instructions?
> What if you have a device which is directly exposed to the guest?


If one could directly expose a device to the guest, this feature could
be extremely useful for me.
Is it possible? How would it manage to handle the DMA bus mastering?

James

2007-08-22 22:35:05

by Chris Wright

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

* James Courtier-Dutton ([email protected]) wrote:
> Ok, so I need to get a new CPU like the Intel Core Duo that has VT
> features? I have an old Pentium 4 at the moment, without any VT features.

Depends on your goals. You can certainly give a paravirt Xen guest[1]
physical hardware without any VT extensions. But that guest will be
able to DMA anywhere in memory without VT-d, so if it's an untrusted
guest you'd be taking a huge risk.

thanks,
-chris

[1] Note: this is with the xenbits.xensource.com kernel, not with a
kernel you'll get from kernel.org ATM.

2007-08-22 22:39:50

by James Courtier-Dutton

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Chris Wright wrote:
> * James Courtier-Dutton ([email protected]) wrote:
>> If one could directly expose a device to the guest, this feature could
>> be extremely useful for me.
>> Is it possible? How would it manage to handle the DMA bus mastering?
>
> Yes it's possible (Xen supports pci pass through). Without an IOMMU
> (like Intel VT-d or AMD IOMMU) it's not DMA safe.
>
> thanks,
> -chris

Ok, so I need to get a new CPU like the Intel Core Duo that has VT
features? I have an old Pentium 4 at the moment, without any VT features.

Kind Regards

James


2007-08-22 23:14:51

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

James Courtier-Dutton wrote:
> Ok, so I need to get a new CPU like the Intel Core Duo that has VT
> features? I have an old Pentium 4 at the moment, without any VT features.
>

No, VT-d (as opposed to VT) is a chipset feature which allows the
hypervisor to control who's allowed to DMA where. So you'd need a very
new machine with a VT-d capable chipset (which would also have VT, since
all new processors do).

J

2007-08-22 23:53:19

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, Aug 22, 2007 at 04:14:41PM -0700, Jeremy Fitzhardinge wrote:
> (which would also have VT, since
> all new processors do).

Not true unfortunately. The Intel low end parts like Celerons (which
are actually shipped in very large numbers) don't. Also Intel
is still shipping some CPUs that don't support it at all, like
the ULV Centrinos which are based on an older core.

-Andi

2007-08-23 00:38:36

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Andi Kleen wrote:
> On Wed, Aug 22, 2007 at 04:14:41PM -0700, Jeremy Fitzhardinge wrote:
>
>> (which would also have VT, since
>> all new processors do).
>>
>
> Not true unfortunately. The Intel low end parts like Celerons (which
> are actually shipped in very large numbers) don't. Also Intel
> is still shipping some CPUs that don't support it at all, like
> the ULV Centrinos which are based on an older core.
>

Likely to be missing VT-d too, right?

J

2007-08-23 01:40:30

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, Aug 22, 2007 at 05:38:31PM -0700, Jeremy Fitzhardinge wrote:
> Andi Kleen wrote:
> > On Wed, Aug 22, 2007 at 04:14:41PM -0700, Jeremy Fitzhardinge wrote:
> >
> >> (which would also have VT, since
> >> all new processors do).
> >>
> >
> > Not true unfortunately. The Intel low end parts like Celerons (which
> > are actually shipped in very large numbers) don't. Also Intel
> > is still shipping some CPUs that don't support it at all, like
> > the ULV Centrinos which are based on an older core.
> >
>
> Likely to be missing VT-d too, right?

VT-d is chipset functionality. So it depends on the chipset.

At least initially the non-Intel chipsets and low-end chips are unlikely
to get IOMMUs, I guess.

There might be some exceptions. e.g. the GPU vendors seem
to want to do their own IOMMUs, so perhaps graphics devices
might have them anyways.

-Andi

2007-08-23 05:35:48

by Rusty Russell

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, 2007-08-22 at 22:25 +0100, Alan Cox wrote:
> > I still think it's preferable to change some drivers than everybody.
> >
> > AFAIK BusLogic as real hardware is pretty much dead anyways,
> > so you're probably the only primary user of it anyways.
> > Go wild on it!
>
> I don't believe anyone is materially maintaining the buslogic driver and
> in time it's going to break completely.
>
> > Well that might be. I just think it would be a mistake
> > to design paravirt_ops based on someone's short term release engineering
> > considerations.
>
> Agreed, especially as an interface where each in or out traps into the
> hypervisor is broken even for the model of virtualising hardware.

I'd really like lguest guests not to do ins and outs, but that's likely
to be more invasive a change than this. We do it to find the PCI bus
IIRC, and a couple of other early probe bits.

It's just unfortunate that it's the one place lguest has to emulate
because of lack of paravirt_ops coverage.

Rusty.

2007-08-24 12:20:50

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Hi!

> >>In general, I/O in a virtual guest is subject to
> >>performance problems. The I/O can not be completed
> >>physically, but must be virtualized. This
> >>means trapping and decoding port I/O instructions from
> >>the guest OS. Not only is the trap for a #GP
> >>heavyweight, both in the processor and
> >>the hypervisor (which usually has a complex #GP path),
> >>but this forces
> >>the hypervisor to decode the individual instruction
> >>which has faulted. Worse, even with hardware assist
> >>such as VT, the exit reason alone is
> >>not sufficient to determine the true nature of the
> >>faulting instruction,
> >>requiring a complex and costly instruction decode and
> >>simulation.
> >>
> >>This patch provides hypercalls for the i386 port I/O
> >>instructions, which
> >>vastly helps guests which use native-style drivers.
> >>For certain VMI
> >>workloads, this provides a performance boost of up to
> >>30%. We expect
> >>KVM and lguest to be able to achieve similar gains on
> >>I/O intensive
> >>workloads.
> >>
> >>
> >
> >What about cost on hardware?
> >
>
> On modern hardware, port I/O is about the most expensive
> thing you can do. The extra function call cost is
> totally masked by the stall. We have measured with port
> I/O converted like this on real hardware, and have seen
> zero measurable impact on macro-benchmarks.
> Micro-benchmarks that generate massively repeated port
> I/O might show some effect on ancient hardware, but I
> can't even imagine a workload which does such a thing,
> other than a polling port I/O loop perhaps - which would
> not be performance critical in any case I can reasonably
> imagine.

SCSI controller in ISA slot? IDE without DMA enabled?

Yes, those are performance-critical. The second case seems common with
CompactFlash cards.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-08-28 06:49:53

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Wed, 2007-08-22 at 16:25 +1000, Rusty Russell wrote:
> On Wed, 2007-08-22 at 08:34 +0300, Avi Kivity wrote:
> > Zachary Amsden wrote:
> > > This patch provides hypercalls for the i386 port I/O instructions,
> > > which vastly helps guests which use native-style drivers. For certain
> > > VMI workloads, this provides a performance boost of up to 30%. We
> > > expect KVM and lguest to be able to achieve similar gains on I/O
> > > intensive workloads.
> >
> > Won't these workloads be better off using paravirtualized drivers?
> > i.e., do the native drivers with paravirt I/O instructions get anywhere
> > near the performance of paravirt drivers?
>
> This patch also means I can kill off the emulation code in
> drivers/lguest/core.c, which is a real relief.

Hrm... how do you deal with X doing IOs?

Ben.


2007-08-28 06:50:16

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

On Tue, 2007-08-21 at 22:23 -0700, Zachary Amsden wrote:
> In general, I/O in a virtual guest is subject to performance problems.
> The I/O can not be completed physically, but must be virtualized. This
> means trapping and decoding port I/O instructions from the guest OS.
> Not only is the trap for a #GP heavyweight, both in the processor and
> the hypervisor (which usually has a complex #GP path), but this forces
> the hypervisor to decode the individual instruction which has faulted.
> Worse, even with hardware assist such as VT, the exit reason alone is
> not sufficient to determine the true nature of the faulting instruction,
> requiring a complex and costly instruction decode and simulation.

.../...

How about userland? Things like X typically do IOs... You still need
to trap/emulate for these, no?

Cheers,
Ben.

2007-08-28 06:58:11

by Zachary Amsden

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt

Benjamin Herrenschmidt wrote:
> On Wed, 2007-08-22 at 16:25 +1000, Rusty Russell wrote:
>
>> On Wed, 2007-08-22 at 08:34 +0300, Avi Kivity wrote:
>>
>>> Zachary Amsden wrote:
>>>
>>>> This patch provides hypercalls for the i386 port I/O instructions,
>>>> which vastly helps guests which use native-style drivers. For certain
>>>> VMI workloads, this provides a performance boost of up to 30%. We
>>>> expect KVM and lguest to be able to achieve similar gains on I/O
>>>> intensive workloads.
>>>>
>>> Won't these workloads be better off using paravirtualized drivers?
>>> i.e., do the native drivers with paravirt I/O instructions get anywhere
>>> near the performance of paravirt drivers?
>>>
>> This patch also means I can kill off the emulation code in
>> drivers/lguest/core.c, which is a real relief.
>>
>
> Hrm... how do you deal with X doing IOs ?
>
> Ben.
>

We have an X driver that does minimal performance-costing operations,
as we should and will for our other drivers.

Zach

2007-08-29 17:08:01

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] Add I/O hypercalls for i386 paravirt


> We have an X driver that does minimal performance costing operations.
> As we should and will have for our other drivers.

Ok, so you use your own DDX and prevent X vgacrapware from kicking in?
Makes sense.

Ben.