LinuxLists.cc - [PATCH] Refactor hypercall infrastructure

[permalink] [raw]

Subject: Re: [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> This patch refactors the current hypercall infrastructure to better support live
> migration and SMP. It eliminates the hypercall page by trapping the UD
> exception that would occur if you used the wrong hypercall instruction for the
> underlying architecture and replacing it with the right one lazily.
>

I guess it would be pretty rude/unlikely for these opcodes to get reused
in other implementations... But couldn't you make the page trap
instead, rather than relying on an instruction fault?

> It also introduces the infrastructure to probe for hypercall available via
> CPUID leaves 0x40000002. CPUID leaf 0x40000003 should be filled out by
> userspace.
>

Is this compatible with Xen's (and other's) use of cpuid? That is,
0x40000000 returns a hypervisor-specific signature in e[bcd]x, and eax
has the max hypervisor leaf.

J

2007-09-14 21:03:16

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>
>> This patch refactors the current hypercall infrastructure to better support live
>> migration and SMP. It eliminates the hypercall page by trapping the UD
>> exception that would occur if you used the wrong hypercall instruction for the
>> underlying architecture and replacing it with the right one lazily.
>>
>>
>
> I guess it would be pretty rude/unlikely for these opcodes to get reused
> in other implementations... But couldn't you make the page trap
> instead, rather than relying on an instruction fault?
>

The whole point of using the instruction is to allow hypercalls to be
used in many locations. This has the nice side effect of not requiring
a central hypercall initialization routine in the guest to fetch the
hypercall page. A PV driver can be completely independent of any other
code provided that it restricts itself to it's hypercall namespace.

>> It also introduces the infrastructure to probe for hypercall available via
>> CPUID leaves 0x40000002. CPUID leaf 0x40000003 should be filled out by
>> userspace.
>>
>>
>
> Is this compatible with Xen's (and other's) use of cpuid? That is,
> 0x40000000 returns a hypervisor-specific signature in e[bcd]x, and eax
> has the max hypervisor leaf.
>

Xen is currently using 0/1/2. I had thought it was only using 0/1. The
intention was not to squash Xen's current CPUID usage so that it would
still be possible for Xen to make use of the guest code. Can we agree
that Xen won't squash leaves 3/4 or is it not worth trying to be
compatible at this point?

Regards,

Anthony Liguori

> J
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> kvm-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>
>

2007-09-14 21:20:54

by Zachary Amsden

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

On Fri, 2007-09-14 at 16:02 -0500, Anthony Liguori wrote:
> Jeremy Fitzhardinge wrote:
> > Anthony Liguori wrote:
> >
> >> This patch refactors the current hypercall infrastructure to better support live
> >> migration and SMP. It eliminates the hypercall page by trapping the UD
> >> exception that would occur if you used the wrong hypercall instruction for the
> >> underlying architecture and replacing it with the right one lazily.
> >>
> >>
> >
> > I guess it would be pretty rude/unlikely for these opcodes to get reused
> > in other implementations... But couldn't you make the page trap
> > instead, rather than relying on an instruction fault?
> >
>
> The whole point of using the instruction is to allow hypercalls to be
> used in many locations. This has the nice side effect of not requiring
> a central hypercall initialization routine in the guest to fetch the
> hypercall page. A PV driver can be completely independent of any other
> code provided that it restricts itself to it's hypercall namespace.

But if the instruction is architecture dependent, and you run on the
wrong architecture, now you have to patch many locations at fault time,
introducing some nasty runtime code / data cache overlap performance
problems. Granted, they go away eventually.

I prefer the idea of a hypercall page, but not a central initialization.
Rather, a decentralized approach where PV drivers can detect using CPUID
which hypervisor is present, and a common MSR shared by all hypervisors
that provides the location of the hypercall page.

Zach

2007-09-14 21:22:51

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> The whole point of using the instruction is to allow hypercalls to be
> used in many locations. This has the nice side effect of not
> requiring a central hypercall initialization routine in the guest to
> fetch the hypercall page. A PV driver can be completely independent
> of any other code provided that it restricts itself to it's hypercall
> namespace.

I see. So you take the fault, disassemble the instruction, see that its
another CPU's vmcall instruction, and then replace it with the current
CPU's vmcall?

> Xen is currently using 0/1/2. I had thought it was only using 0/1.
> The intention was not to squash Xen's current CPUID usage so that it
> would still be possible for Xen to make use of the guest code. Can we
> agree that Xen won't squash leaves 3/4 or is it not worth trying to be
> compatible at this point?

No, the point is that you're supposed to work out which hypervisor it is
from the signature in leaf 0, and then the hypervisor can put anything
it wants in the other leaves.

J

2007-09-14 21:45:25

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Zachary Amsden wrote:
> On Fri, 2007-09-14 at 16:02 -0500, Anthony Liguori wrote:
>
>> Jeremy Fitzhardinge wrote:
>>
>>> Anthony Liguori wrote:
>>>
>>>
>>>> This patch refactors the current hypercall infrastructure to better support live
>>>> migration and SMP. It eliminates the hypercall page by trapping the UD
>>>> exception that would occur if you used the wrong hypercall instruction for the
>>>> underlying architecture and replacing it with the right one lazily.
>>>>
>>>>
>>>>
>>> I guess it would be pretty rude/unlikely for these opcodes to get reused
>>> in other implementations... But couldn't you make the page trap
>>> instead, rather than relying on an instruction fault?
>>>
>>>
>> The whole point of using the instruction is to allow hypercalls to be
>> used in many locations. This has the nice side effect of not requiring
>> a central hypercall initialization routine in the guest to fetch the
>> hypercall page. A PV driver can be completely independent of any other
>> code provided that it restricts itself to it's hypercall namespace.
>>
>
> But if the instruction is architecture dependent, and you run on the
> wrong architecture, now you have to patch many locations at fault time,
> introducing some nasty runtime code / data cache overlap performance
> problems. Granted, they go away eventually.
>

We're addressing that by blowing away the shadow cache and holding the
big kvm lock to ensure SMP safety. Not a great thing to do from a
performance perspective but the whole point of patching is that the cost
is amortized.

> I prefer the idea of a hypercall page, but not a central initialization.
> Rather, a decentralized approach where PV drivers can detect using CPUID
> which hypervisor is present, and a common MSR shared by all hypervisors
> that provides the location of the hypercall page.
>

So then each module creates a hypercall page using this magic MSR and
the hypervisor has to keep track of it so that it can appropriately
change the page on migration. The page can only contain a single
instruction or else it cannot be easily changed (or you have to be able
to prevent the guest from being migrated while in the hypercall page).

We're really talking about identical models. Instead of an MSR, the #GP
is what tells the hypervisor to update the instruction. The nice thing
about this is that you don't have to keep track of all the current
hypercall page locations in the hypervisor.

Regards,

Anthony Liguori

> Zach
>
>
>

2007-09-14 21:46:40

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>
>> The whole point of using the instruction is to allow hypercalls to be
>> used in many locations. This has the nice side effect of not
>> requiring a central hypercall initialization routine in the guest to
>> fetch the hypercall page. A PV driver can be completely independent
>> of any other code provided that it restricts itself to it's hypercall
>> namespace.
>>
>
> I see. So you take the fault, disassemble the instruction, see that its
> another CPU's vmcall instruction, and then replace it with the current
> CPU's vmcall?
>

Yup.

>> Xen is currently using 0/1/2. I had thought it was only using 0/1.
>> The intention was not to squash Xen's current CPUID usage so that it
>> would still be possible for Xen to make use of the guest code. Can we
>> agree that Xen won't squash leaves 3/4 or is it not worth trying to be
>> compatible at this point?
>>
>
> No, the point is that you're supposed to work out which hypervisor it is
> from the signature in leaf 0, and then the hypervisor can put anything
> it wants in the other leaves.
>

Yeah, see, the initial goal was to make it possible to use the KVM
paravirtualizations on other hypervisors. However, I don't think this
is really going to be possible in general so maybe it's better to just
use leaf 0. I'll let others chime in before sending a new patch.

Regards,

Anthony Liguori

> J
>
>

2007-09-14 21:53:16

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> Yeah, see, the initial goal was to make it possible to use the KVM
> paravirtualizations on other hypervisors. However, I don't think this
> is really going to be possible in general so maybe it's better to just
> use leaf 0. I'll let others chime in before sending a new patch.

Hm. Obviously you can just define a signature for "kvm-compatible
hypercall interface" and make it common that way, but it gets tricky if
the hypervisor supports multiple hypercall interfaces, including the kvm
one. Start the kvm leaves at 0x40001000 or something?

J

2007-09-14 22:08:36

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
>
>> Yeah, see, the initial goal was to make it possible to use the KVM
>> paravirtualizations on other hypervisors. However, I don't think this
>> is really going to be possible in general so maybe it's better to just
>> use leaf 0. I'll let others chime in before sending a new patch.
>>
>
> Hm. Obviously you can just define a signature for "kvm-compatible
> hypercall interface" and make it common that way, but it gets tricky if
> the hypervisor supports multiple hypercall interfaces, including the kvm
> one. Start the kvm leaves at 0x40001000 or something?
>

Yeah, that works with me.

Regards,

Anthony Liguori

> J
>
>

2007-09-14 22:40:39

[permalink] [raw]

Subject: RE: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> Jeremy Fitzhardinge wrote:
> > Anthony Liguori wrote:
> >
> > > Yeah, see, the initial goal was to make it possible to use the KVM
> > > paravirtualizations on other hypervisors. However, I don't think
this
> > > is really going to be possible in general so maybe it's better to
just
> > > use leaf 0. I'll let others chime in before sending a new patch.
> > >
> >
> > Hm. Obviously you can just define a signature for "kvm-compatible
> > hypercall interface" and make it common that way, but it gets tricky
if
> > the hypervisor supports multiple hypercall interfaces, including the
kvm
> > one. Start the kvm leaves at 0x40001000 or something?
> >
>
> Yeah, that works with me.

To me this is the beginning of fragmentation. Why do we need different
and VMM-specific Linux paravirtualization for hardware-assisted
virtualization? That would not be good for Linux.

>
> Regards,
>
> Anthony Liguori
>
> > J

Jun
---
Intel Open Source Technology Center

2007-09-14 23:00:39

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
>>> one. Start the kvm leaves at 0x40001000 or something?
>>>
>>>
>> Yeah, that works with me.
>>
>
> To me this is the beginning of fragmentation. Why do we need different
> and VMM-specific Linux paravirtualization for hardware-assisted
> virtualization? That would not be good for Linux.
>

On the contrary. Xen already has a hypercall interface, and we need to
keep supporting it. If we were to also support a vmm-independent
interface (aka "kvm interface"), then we need to be able to do that in
parallel. If we have a cpuid leaf clash, then its impossible to do so;
if we define the new interface to be disjoint from other current users
of cpuid, then we can support them concurrently.

J

2007-09-15 00:11:00

[permalink] [raw]

Subject: RE: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > > > one. Start the kvm leaves at 0x40001000 or something?
> > > >
> > > >
> > > Yeah, that works with me.
> > >
> >
> > To me this is the beginning of fragmentation. Why do we need
different
> > and VMM-specific Linux paravirtualization for hardware-assisted
> > virtualization? That would not be good for Linux.
> >
>
> On the contrary. Xen already has a hypercall interface, and we need
to
> keep supporting it. If we were to also support a vmm-independent
> interface (aka "kvm interface"), then we need to be able to do that in
> parallel. If we have a cpuid leaf clash, then its impossible to do
so;
> if we define the new interface to be disjoint from other current users
> of cpuid, then we can support them concurrently.
>
> J

Today, 3 CPUID leaves starting from 0x4000_0000 are defined in a generic
fashion (hypervisor detection, version, and hypercall page), and those
are the ones used by Xen today. We should extend those leaves (e.g.
starting from 0x4000_0003) for the vmm-independent features as well.

If Xen needs additional Xen-specific features, we need to allocate some
leaves for those (e.g. 0x4000_1000)

Jun
---
Intel Open Source Technology Center

2007-09-15 00:28:18

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
> Today, 3 CPUID leaves starting from 0x4000_0000 are defined in a generic
> fashion (hypervisor detection, version, and hypercall page), and those
> are the ones used by Xen today. We should extend those leaves (e.g.
> starting from 0x4000_0003) for the vmm-independent features as well.
>
> If Xen needs additional Xen-specific features, we need to allocate some
> leaves for those (e.g. 0x4000_1000)

But the signature is "XenVMMXenVMM", which isn't very generic. If we're
presenting a generic interface, it needs to have a generic signature,
otherwise guests will need to have a list of all hypervisor signatures
supporting their interface. Since 0x40000000 has already been
established as the base leaf of the hypervisor-specific interfaces, the
generic interface will have to be elsewhere.

J

2007-09-15 01:04:25

[permalink] [raw]

Subject: RE: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > Today, 3 CPUID leaves starting from 0x4000_0000 are defined in a
generic
> > fashion (hypervisor detection, version, and hypercall page), and
those
> > are the ones used by Xen today. We should extend those leaves (e.g.
> > starting from 0x4000_0003) for the vmm-independent features as well.
> >
> > If Xen needs additional Xen-specific features, we need to allocate
some
> > leaves for those (e.g. 0x4000_1000)
>
> But the signature is "XenVMMXenVMM", which isn't very generic. If
we're
> presenting a generic interface, it needs to have a generic signature,
> otherwise guests will need to have a list of all hypervisor signatures
> supporting their interface. Since 0x40000000 has already been
> established as the base leaf of the hypervisor-specific interfaces,
the
> generic interface will have to be elsewhere.

The hypervisor detection machanism is generic, and the signature
returned is implentation specific. Having a list of all hypervisor
signatures sounds fine to me as we are detecting vendor-specific
processor(s) in the native. And I don't expect the list is large.

>
> J

Jun
---
Intel Open Source Technology Center

2007-09-15 03:05:55

by Rusty Russell

[permalink] [raw]

Subject: Re: [PATCH] Refactor hypercall infrastructure

On Fri, 2007-09-14 at 13:53 -0700, Jeremy Fitzhardinge wrote:
> Anthony Liguori wrote:
> > This patch refactors the current hypercall infrastructure to better support live
> > migration and SMP. It eliminates the hypercall page by trapping the UD
> > exception that would occur if you used the wrong hypercall instruction for the
> > underlying architecture and replacing it with the right one lazily.
> >
>
> I guess it would be pretty rude/unlikely for these opcodes to get reused
> in other implementations... But couldn't you make the page trap
> instead, rather than relying on an instruction fault?

That's a pain for inline hypercalls tho. I was planning on moving
lguest to this model (which is interesting, because AFAICT this insn
will cause a #UD or #GP depending on whether VT is supported on this box
so I have to look for both).

Cheers,
Rusty.

2007-09-15 03:38:05

by Zachary Amsden

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

On Fri, 2007-09-14 at 16:44 -0500, Anthony Liguori wrote:

> So then each module creates a hypercall page using this magic MSR and
> the hypervisor has to keep track of it so that it can appropriately
> change the page on migration. The page can only contain a single
> instruction or else it cannot be easily changed (or you have to be able
> to prevent the guest from being migrated while in the hypercall page).
>
> We're really talking about identical models. Instead of an MSR, the #GP
> is what tells the hypervisor to update the instruction. The nice thing
> about this is that you don't have to keep track of all the current
> hypercall page locations in the hypervisor.

I agree, multiple hypercall pages is insane. I was thinking more of a
single hypercall page, fixed in place by the hypervisor, not the kernel.

Then each module can read an MSR saying what VA the hypercall page is
at, and the hypervisor can simply flip one page to switch architectures.

Zach

2007-09-15 04:53:51

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
> The hypervisor detection machanism is generic, and the signature
> returned is implentation specific. Having a list of all hypervisor
> signatures sounds fine to me as we are detecting vendor-specific
> processor(s) in the native. And I don't expect the list is large.
>
>

I'm confused about what you're proposing. I was thinking that a kernel
looking for the generic hypervisor interface would check for a specific
signature at some cpuid leaf, and then go about using it from there. If
not, how does is it supposed to detect the generic hypervisor interface?

J

2007-09-15 06:11:32

[permalink] [raw]

Subject: RE: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
> Nakajima, Jun wrote:
> > The hypervisor detection machanism is generic, and the signature
> > returned is implentation specific. Having a list of all hypervisor
> > signatures sounds fine to me as we are detecting vendor-specific
> > processor(s) in the native. And I don't expect the list is large.
> >
> >
>
> I'm confused about what you're proposing. I was thinking that a
kernel
> looking for the generic hypervisor interface would check for a
specific
> signature at some cpuid leaf, and then go about using it from there.
If
> not, how does is it supposed to detect the generic hypervisor
interface?
>
> J

I'm suggesting that we use CPUID.0x4000000Y (Y: TBD, e.g. 6) for Linux
paravirtualization. The ebx, ecx and edx return the Linux
paravirtualization features available on that hypervisor. Those features
are defined architecturally (not VMM specific).

Like CPUID.0, CPUID.0x40000000 is used to detect the hypervisor with the
vendor identification string returned in ebx, edx, and ecx (as we are
doing in Xen). The eax returns the max leaf (which is 0x40000002 on Xen
today). And like CPUID.1, CPUID.0x40000001 returns the version number in
eax, and each VMM should be able to define a number of VMM-specific
features available in ebx, ecx, and edx returned (which are reserved,
i.e. not used in Xen today).

Suppose we knew (i.e. tested) Xen and KVM supported Linux
paravirtualization, the Linux code does:
1. detect Xen or KVM <the list> using CPUID.0x40000000
2. Check the version if necessary using CPUID.0x40000001
3. Check the Linux paravirtualization features available using
CPUID.0x4000000Y.

Jun
---
Intel Open Source Technology Center

2007-09-15 07:53:26

by Avi Kivity

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> Jeremy Fitzhardinge wrote:
>
>> Anthony Liguori wrote:
>>
>>
>>> This patch refactors the current hypercall infrastructure to better support live
>>> migration and SMP. It eliminates the hypercall page by trapping the UD
>>> exception that would occur if you used the wrong hypercall instruction for the
>>> underlying architecture and replacing it with the right one lazily.
>>>
>>>
>>>
>> I guess it would be pretty rude/unlikely for these opcodes to get reused
>> in other implementations... But couldn't you make the page trap
>> instead, rather than relying on an instruction fault?
>>
>>
>
> The whole point of using the instruction is to allow hypercalls to be
> used in many locations. This has the nice side effect of not requiring
> a central hypercall initialization routine in the guest to fetch the
> hypercall page. A PV driver can be completely independent of any other
> code provided that it restricts itself to it's hypercall namespace.
>
>

It also has the benefit of not requiring an initialization protocol, and
of reducing complaints about the hypervisor injecting code into the guest.

>>> It also introduces the infrastructure to probe for hypercall available via
>>> CPUID leaves 0x40000002. CPUID leaf 0x40000003 should be filled out by
>>> userspace.
>>>
>>>
>>>
>> Is this compatible with Xen's (and other's) use of cpuid? That is,
>> 0x40000000 returns a hypervisor-specific signature in e[bcd]x, and eax
>> has the max hypervisor leaf.
>>
>>
>
> Xen is currently using 0/1/2. I had thought it was only using 0/1. The
> intention was not to squash Xen's current CPUID usage so that it would
> still be possible for Xen to make use of the guest code. Can we agree
> that Xen won't squash leaves 3/4 or is it not worth trying to be
> compatible at this point?
>

I definitely want kvm to be able to emulate the Xen hypercall interface,
but there's no need to allow both concurrently. So I'd say use
0x40000000 for detection and the rest cannot clash because detection fails.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-09-15 08:01:00

by Avi Kivity

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
> To me this is the beginning of fragmentation. Why do we need different
> and VMM-specific Linux paravirtualization for hardware-assisted
> virtualization? That would not be good for Linux.
>
>

The only way to have a single interface is if a central authority
defines and documents that interface, and all hypervisor implementors
agree not to implement extensions. Do you see that happening?

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-09-15 08:09:03

by Avi Kivity

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Zachary Amsden wrote:
> On Fri, 2007-09-14 at 16:44 -0500, Anthony Liguori wrote:
>
>
>> So then each module creates a hypercall page using this magic MSR and
>> the hypervisor has to keep track of it so that it can appropriately
>> change the page on migration. The page can only contain a single
>> instruction or else it cannot be easily changed (or you have to be able
>> to prevent the guest from being migrated while in the hypercall page).
>>
>> We're really talking about identical models. Instead of an MSR, the #GP
>> is what tells the hypervisor to update the instruction. The nice thing
>> about this is that you don't have to keep track of all the current
>> hypercall page locations in the hypervisor.
>>
>
> I agree, multiple hypercall pages is insane. I was thinking more of a
> single hypercall page, fixed in place by the hypervisor, not the kernel.
>
> Then each module can read an MSR saying what VA the hypercall page is
> at, and the hypervisor can simply flip one page to switch architectures.
>

VA as in "Virtual Address"? the ppc people don't have
hypervisor-visible virtual addresses, and the hypervisor (on x86) can't
safely select a virtual address, and ...

That means you need a physical address, so you need a central
initialization routine, and drivers for unmodified OSes can no longer be
self contained.

--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

2007-09-15 17:33:50

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Zachary Amsden wrote:
> On Fri, 2007-09-14 at 16:44 -0500, Anthony Liguori wrote:
>
>
>> So then each module creates a hypercall page using this magic MSR and
>> the hypervisor has to keep track of it so that it can appropriately
>> change the page on migration. The page can only contain a single
>> instruction or else it cannot be easily changed (or you have to be able
>> to prevent the guest from being migrated while in the hypercall page).
>>
>> We're really talking about identical models. Instead of an MSR, the #GP
>> is what tells the hypervisor to update the instruction. The nice thing
>> about this is that you don't have to keep track of all the current
>> hypercall page locations in the hypervisor.
>>
>
> I agree, multiple hypercall pages is insane. I was thinking more of a
> single hypercall page, fixed in place by the hypervisor, not the kernel.
>
> Then each module can read an MSR saying what VA the hypercall page is
> at, and the hypervisor can simply flip one page to switch architectures.
>

That requires a memory hole though. In KVM, we don't have a memory hole.

Regards,

Anthony Liguori

> Zach
>
>

2007-09-15 18:24:13

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
> Jeremy Fitzhardinge wrote:
>
>> Nakajima, Jun wrote:
>>
>>> The hypervisor detection machanism is generic, and the signature
>>> returned is implentation specific. Having a list of all hypervisor
>>> signatures sounds fine to me as we are detecting vendor-specific
>>> processor(s) in the native. And I don't expect the list is large.
>>>
>>>
>>>
>> I'm confused about what you're proposing. I was thinking that a
>>
> kernel
>
>> looking for the generic hypervisor interface would check for a
>>
> specific
>
>> signature at some cpuid leaf, and then go about using it from there.
>>
> If
>
>> not, how does is it supposed to detect the generic hypervisor
>>
> interface?
>
>> J
>>
>
> I'm suggesting that we use CPUID.0x4000000Y (Y: TBD, e.g. 6) for Linux
> paravirtualization. The ebx, ecx and edx return the Linux
> paravirtualization features available on that hypervisor. Those features
> are defined architecturally (not VMM specific).
>
> Like CPUID.0, CPUID.0x40000000 is used to detect the hypervisor with the
> vendor identification string returned in ebx, edx, and ecx (as we are
> doing in Xen). The eax returns the max leaf (which is 0x40000002 on Xen
> today).

I don't understand the purpose of returning the max leaf. Who is that
information useful for?

I like Jeremy's suggesting of starting with 0x40001000 for KVM. Xen has
an established hypercall interface and that isn't going to change.
However, in the future, if other Operating Systems (like the BSDs)
choose to implement the KVM paravirtualization interface, then that
leaves open the possibility for Xen to also support this interface to
get good performance for those OSes. It's necessary to be able to
support both at once if you wish to support these interfaces without
user interaction.

There's no tangible benefit to us to use 0x40000000. Therefore I'm
inclined to lean toward making things easier for others.

Regards,

Anthony Liguori

> And like CPUID.1, CPUID.0x40000001 returns the version number in
> eax, and each VMM should be able to define a number of VMM-specific
> features available in ebx, ecx, and edx returned (which are reserved,
> i.e. not used in Xen today).
>
> Suppose we knew (i.e. tested) Xen and KVM supported Linux
> paravirtualization, the Linux code does:
> 1. detect Xen or KVM <the list> using CPUID.0x40000000
> 2. Check the version if necessary using CPUID.0x40000001
> 3. Check the Linux paravirtualization features available using
> CPUID.0x4000000Y.
>
> Jun
> ---
> Intel Open Source Technology Center
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> kvm-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>

2007-09-17 18:15:22

[permalink] [raw]

Subject: RE: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> Nakajima, Jun wrote:
<snip>
> >
> > I'm suggesting that we use CPUID.0x4000000Y (Y: TBD, e.g. 6) for
Linux
> > paravirtualization. The ebx, ecx and edx return the Linux
> > paravirtualization features available on that hypervisor. Those
features
> > are defined architecturally (not VMM specific).
> >
> > Like CPUID.0, CPUID.0x40000000 is used to detect the hypervisor with
the
> > vendor identification string returned in ebx, edx, and ecx (as we
are
> > doing in Xen). The eax returns the max leaf (which is 0x40000002 on
Xen
> > today).
>
> I don't understand the purpose of returning the max leaf. Who is that
> information useful for?

Well, this is the key info to the user of CPUID. It tells which leaves
are valid to use. Otherwise, the user cannot tell whether the results of
CPUID.0x4000000N are valid or not (i.e. junk). BTW, this is what we are
doing on the native (for the leaf 0, 0x80000000, for example). The fact
that Xen returns 0x40000002 means it only uses 3 leaves today.

>
> I like Jeremy's suggesting of starting with 0x40001000 for KVM. Xen
has
> an established hypercall interface and that isn't going to change.
> However, in the future, if other Operating Systems (like the BSDs)
> choose to implement the KVM paravirtualization interface, then that
> leaves open the possibility for Xen to also support this interface to
> get good performance for those OSes. It's necessary to be able to
> support both at once if you wish to support these interfaces without
> user interaction.

Using CPUID.0x4000000N (N > 2) does not prevent Xen from doing that,
either. If you use 0x40001000, 1) you need to say the leaves from
0x40000000 through 0x40001000 are all valid, OR 2) you create/fork a
new/odd leaf (with 0x1000 offset) repeating the detection redundantly.

>
> There's no tangible benefit to us to use 0x40000000. Therefore I'm
> inclined to lean toward making things easier for others.

Again, 0x40000000 is not Xen specific. If the leaf 0x40000000 is used
for any guest to detect any hypervisor, that would be compelling
benefit. For future Xen-specific features, it's safe for Xen to use
other bigger leaves (like 0x40001000) because the guest starts looking
at them after detection of Xen.

Likewise if KVM paravirtualization interface (as kind of "open source
paravirtualization interface") is detected in the generic areas (not in
vender-specific), any guest can check the features available without
knowing which hypervisor uses which CPUID for that.

>
> Regards,
>
> Anthony Liguori
>
> > And like CPUID.1, CPUID.0x40000001 returns the version number in
> > eax, and each VMM should be able to define a number of VMM-specific
> > features available in ebx, ecx, and edx returned (which are
reserved, i.e.
> > not used in Xen today).
> >
> > Suppose we knew (i.e. tested) Xen and KVM supported Linux
> > paravirtualization, the Linux code does:
> > 1. detect Xen or KVM <the list> using CPUID.0x40000000
> > 2. Check the version if necessary using CPUID.0x40000001
> > 3. Check the Linux paravirtualization features available using
> > CPUID.0x4000000Y.
> >
> > Jun
> > ---
> > Intel Open Source Technology Center

Jun
---
Intel Open Source Technology Center

2007-09-17 18:27:35

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
>> I don't understand the purpose of returning the max leaf. Who is that
>> information useful for?
>>
>
> Well, this is the key info to the user of CPUID. It tells which leaves
> are valid to use. Otherwise, the user cannot tell whether the results of
> CPUID.0x4000000N are valid or not (i.e. junk). BTW, this is what we are
> doing on the native (for the leaf 0, 0x80000000, for example). The fact
> that Xen returns 0x40000002 means it only uses 3 leaves today.
>

Then it's just a version ID. You pretty much have to treat it as a
version id because if it returns 0x4000 0003 and you only know what 0002
is, then you can't actually use it.

I much prefer the current use of CPUID in KVM. If 1000 returns the KVM
signature, then 1001 *must* be valid and contain a set of feature bits.
If we wish to use additional CPUID leaves in the future, then we can
just use a feature bit. The real benefit to us is that we can use a
discontiguous set of leaves whereas the Xen approach is forced to use a
linear set (at least for the result to be meaningful).

>> I like Jeremy's suggesting of starting with 0x40001000 for KVM. Xen
>>
> has
>
>> an established hypercall interface and that isn't going to change.
>> However, in the future, if other Operating Systems (like the BSDs)
>> choose to implement the KVM paravirtualization interface, then that
>> leaves open the possibility for Xen to also support this interface to
>> get good performance for those OSes. It's necessary to be able to
>> support both at once if you wish to support these interfaces without
>> user interaction.
>>
>
> Using CPUID.0x4000000N (N > 2) does not prevent Xen from doing that,
> either. If you use 0x40001000, 1) you need to say the leaves from
> 0x40000000 through 0x40001000 are all valid, OR 2) you create/fork a
> new/odd leaf (with 0x1000 offset) repeating the detection redundantly.
>

Why do 0000-1000 have to be valid? Xen is not going to change what they
have today--they can't. However, if down the road, they decided that
since so many guests use KVM's paravirtualization interface other than
Linux, there's value in supporting it, by using 1000, they can.

>> There's no tangible benefit to us to use 0x40000000. Therefore I'm
>> inclined to lean toward making things easier for others.
>>
>
> Again, 0x40000000 is not Xen specific. If the leaf 0x40000000 is used
> for any guest to detect any hypervisor, that would be compelling
> benefit. For future Xen-specific features, it's safe for Xen to use
> other bigger leaves (like 0x40001000) because the guest starts looking
> at them after detection of Xen.
>

I'm starting to lean toward just using 0000. If for no other reason
than the hypercall space is unsharable.

Regards,

Anthony Liguori

> Likewise if KVM paravirtualization interface (as kind of "open source
> paravirtualization interface") is detected in the generic areas (not in
> vender-specific), any guest can check the features available without
> knowing which hypervisor uses which CPUID for that.
>
>
>> Regards,
>>
>> Anthony Liguori
>>
>>
>>> And like CPUID.1, CPUID.0x40000001 returns the version number in
>>> eax, and each VMM should be able to define a number of VMM-specific
>>> features available in ebx, ecx, and edx returned (which are
>>>
> reserved, i.e.
>
>>> not used in Xen today).
>>>
>>> Suppose we knew (i.e. tested) Xen and KVM supported Linux
>>> paravirtualization, the Linux code does:
>>> 1. detect Xen or KVM <the list> using CPUID.0x40000000
>>> 2. Check the version if necessary using CPUID.0x40000001
>>> 3. Check the Linux paravirtualization features available using
>>> CPUID.0x4000000Y.
>>>
>>> Jun
>>> ---
>>> Intel Open Source Technology Center
>>>
>
> Jun
> ---
> Intel Open Source Technology Center
>

2007-09-17 19:15:37

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Nakajima, Jun wrote:
> Using CPUID.0x4000000N (N > 2) does not prevent Xen from doing that,
> either. If you use 0x40001000, 1) you need to say the leaves from
> 0x40000000 through 0x40001000 are all valid, OR 2) you create/fork a
> new/odd leaf (with 0x1000 offset) repeating the detection redundantly.
>

I don't see a particular problem with that. If the whole 0x4xxxxxxx
range is reserved for hypervisor use, and existing hypervisors are
already using 0x400000xx in hypervisor-specific ways, then it makes
sense to start the generic stuff at 0x40001xxx (or some other offset).
But without a few more implementations of the "generic" interface its
all a bit moot (ie, where's your code? ;).

> Again, 0x40000000 is not Xen specific. If the leaf 0x40000000 is used
> for any guest to detect any hypervisor, that would be compelling
> benefit. For future Xen-specific features, it's safe for Xen to use
> other bigger leaves (like 0x40001000) because the guest starts looking
> at them after detection of Xen.
>
> Likewise if KVM paravirtualization interface (as kind of "open source
> paravirtualization interface") is detected in the generic areas (not in
> vender-specific), any guest can check the features available without
> knowing which hypervisor uses which CPUID for that.
>

This just seems a bit grotty. You're relying on the fact that you can
overlay Xen's current use of 0x4000000x for the generic interface by
freezing Xen's current use of 40000000-2. 0x40000000 becomes a more or
less useless hypervisor-identification signature (useless because you
need to assume that leaves 4000000x, x>2 implement the generic interface
anyway, where x=1,2 are reserved for Xen (=hypervisor-specific) uses).

In other words, what mechanism can a guest use to explicitly identify
the existence of the generic interface? There needs to be a signature
for that somewhere.

J

2007-09-17 19:15:52

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Anthony Liguori wrote:
> Nakajima, Jun wrote:
>>> I don't understand the purpose of returning the max leaf. Who is that
>>> information useful for?
>>>
>>
>> Well, this is the key info to the user of CPUID. It tells which leaves
>> are valid to use. Otherwise, the user cannot tell whether the results of
>> CPUID.0x4000000N are valid or not (i.e. junk). BTW, this is what we are
>> doing on the native (for the leaf 0, 0x80000000, for example). The fact
>> that Xen returns 0x40000002 means it only uses 3 leaves today.
>
> Then it's just a version ID. You pretty much have to treat it as a
> version id because if it returns 0x4000 0003 and you only know what
> 0002 is, then you can't actually use it.

Yeah. It's the way all the other cpuid leaf/level stuff works, so it's
reasonable to do the same thing here. The question it helps answer is
"I understand leaf 33, does the [v]CPU?".

> I much prefer the current use of CPUID in KVM. If 1000 returns the
> KVM signature, then 1001 *must* be valid and contain a set of feature
> bits. If we wish to use additional CPUID leaves in the future, then
> we can just use a feature bit. The real benefit to us is that we can
> use a discontiguous set of leaves whereas the Xen approach is forced
> to use a linear set (at least for the result to be meaningful).

Well, its also what the CPU itself does. The feature bits tend to
relate to specific CPU features rather than CPUID instruction leaves.
The features themselves may also have corresponding leaves, but that's
secondary. IOW, if feature bit X is set, it may use leaf 0x4000101f,
but that doesn't mean leaves 0x40001001-1f are necessarily defined.

> I'm starting to lean toward just using 0000. If for no other reason
> than the hypercall space is unsharable.

Well, it could be, but it would take affirmative action on the guest's
part. If there's feature bits for each supported hypercall interface,
then you could have a magic MSR to select which interface you want to
use now. That would allow a generic-interface-using guest to probe for
the generic interface at cpuid leaf 0x40001000, use 40001001 to
determine whether the hypercall interface is available, 4000100x to find
the base of the magic msrs, and write appropriate msr to set the desired
hypercall style (and all this can be done without using vmcall, so it
doesn't matter that hypercall interface is initially established).

J

2007-09-17 19:34:20

[permalink] [raw]

Subject: Re: [kvm-devel] [PATCH] Refactor hypercall infrastructure

Jeremy Fitzhardinge wrote:
>> I'm starting to lean toward just using 0000. If for no other reason
>> than the hypercall space is unsharable.
>>
>
> Well, it could be, but it would take affirmative action on the guest's
> part. If there's feature bits for each supported hypercall interface,
> then you could have a magic MSR to select which interface you want to
> use now. That would allow a generic-interface-using guest to probe for
> the generic interface at cpuid leaf 0x40001000, use 40001001 to
> determine whether the hypercall interface is available, 4000100x to find
> the base of the magic msrs, and write appropriate msr to set the desired
> hypercall style (and all this can be done without using vmcall, so it
> doesn't matter that hypercall interface is initially established).
>

The main thing keeping me from doing this ATM is what I perceive as lack
of interest in a generic interface. I think it's also a little
premature given that we don't have any features on the plate yet.
However, I don't think that means that we cannot turn KVM's PV into a
generic one. So here's what I propose.

Let's start building the KVM PV interface on 4000 0000. That means that
Xen cannot initially use it but that's okay. Once KVM-lite is merged
and we have some solid features (and other guests start implementing
them), we can also advertise this interface as a "generic interface" by
also supporting the signature on leave 4000 1000 and using the MSR
trickery that you propose.

As long as we all agree not to use 4000 1000 for now, it leaves open the
possibility of having a generic interface in the future.

Regards,

Anthony Liguori

> J
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Microsoft
> Defy all challenges. Microsoft(R) Visual Studio 2005.
> http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
> _______________________________________________
> kvm-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>
>

2007-09-17 20:52:38