2023-11-28 07:08:23

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Wed, 2023-11-08 at 11:17 +0000, Nicolas Saenz Julienne wrote:
> VTL call/return hypercalls have their own entry points in the hypercall
> page because they don't follow normal hyper-v hypercall conventions.
> Move the VTL call/return control input into ECX/RAX and set the
> hypercall code into EAX/RCX before calling the hypercall instruction in
> order to be able to use the Hyper-V hypercall entry function.
>
> Guests can read an emulated code page offsets register to know the
> offsets into the hypercall page for the VTL call/return entries.
>
> Signed-off-by: Nicolas Saenz Julienne <[email protected]>
>
> ---
>
> My tree has the additional patch, we're still trying to understand under
> what conditions Windows expects the offset to be fixed.
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 54f7f36a89bf..9f2ea8c34447 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -294,6 +294,7 @@ static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
>
> /* VTL call/return entries */
> if (!kvm_xen_hypercall_enabled(kvm) && kvm_hv_vsm_enabled(kvm)) {
> + i = 22;
> #ifdef CONFIG_X86_64
> if (is_64_bit_mode(vcpu)) {
> /*
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/hyperv.c | 78 ++++++++++++++++++++++++++++++-
> include/asm-generic/hyperv-tlfs.h | 11 +++++
> 3 files changed, 90 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a2f224f95404..00cd21b09f8c 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1105,6 +1105,8 @@ struct kvm_hv {
> u64 hv_tsc_emulation_status;
> u64 hv_invtsc_control;
>
> + union hv_register_vsm_code_page_offsets vsm_code_page_offsets;
> +
> /* How many vCPUs have VP index != vCPU index */
> atomic_t num_mismatched_vp_indexes;
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 78d053042667..d4b1b53ea63d 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -259,7 +259,8 @@ static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
> static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> {
> struct kvm *kvm = vcpu->kvm;
> - u8 instructions[9];
> + struct kvm_hv *hv = to_kvm_hv(kvm);
> + u8 instructions[0x30];
> int i = 0;
> u64 addr;
>
> @@ -285,6 +286,81 @@ static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> /* ret */
> ((unsigned char *)instructions)[i++] = 0xc3;
>
> + /* VTL call/return entries */
> + if (!kvm_xen_hypercall_enabled(kvm) && kvm_hv_vsm_enabled(kvm)) {
> +#ifdef CONFIG_X86_64
> + if (is_64_bit_mode(vcpu)) {
> + /*
> + * VTL call 64-bit entry prologue:
> + * mov %rcx, %rax
> + * mov $0x11, %ecx
> + * jmp 0:

This isn't really 'jmp 0' as I first wondered but actually backward jump 32 bytes back (if I did the calculation correctly).
This is very dangerous because code that was before can change and in fact I don't think that this
offset is even correct now, and on top of that it depends on support for xen hypercalls as well.

This can be fixed by calculating the offset in runtime, however I am thinking:


Since userspace will have to be aware of the offsets in this page, and since
pretty much everything else is done in userspace, it might make sense to create
the hypercall page in the userspace.

In fact, the fact that KVM currently overwrites the guest page, is a violation of
the HV spec.

It's more correct regardless of VTL to do userspace vm exit and let the userspace put a memslot ("overlay")
over the address, and put whatever userspace wants there, including the above code.

Then we won't need the new ioctl as well.

To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.

Best regards,
Maxim Levitsky


> + */
> + hv->vsm_code_page_offsets.vtl_call_offset = i;
> + instructions[i++] = 0x48;
> + instructions[i++] = 0x89;
> + instructions[i++] = 0xc8;
> + instructions[i++] = 0xb9;
> + instructions[i++] = 0x11;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0xeb;
> + instructions[i++] = 0xe0;
> + /*
> + * VTL return 64-bit entry prologue:
> + * mov %rcx, %rax
> + * mov $0x12, %ecx
> + * jmp 0:
> + */
> + hv->vsm_code_page_offsets.vtl_return_offset = i;
> + instructions[i++] = 0x48;
> + instructions[i++] = 0x89;
> + instructions[i++] = 0xc8;
> + instructions[i++] = 0xb9;
> + instructions[i++] = 0x12;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0xeb;
> + instructions[i++] = 0xd6;
> + } else
> +#endif
> + {
> + /*
> + * VTL call 32-bit entry prologue:
> + * mov %eax, %ecx
> + * mov $0x11, %eax
> + * jmp 0:
> + */
> + hv->vsm_code_page_offsets.vtl_call_offset = i;
> + instructions[i++] = 0x89;
> + instructions[i++] = 0xc1;
> + instructions[i++] = 0xb8;
> + instructions[i++] = 0x11;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0xeb;
> + instructions[i++] = 0xf3;
> + /*
> + * VTL return 32-bit entry prologue:
> + * mov %eax, %ecx
> + * mov $0x12, %eax
> + * jmp 0:
> + */
> + hv->vsm_code_page_offsets.vtl_return_offset = i;
> + instructions[i++] = 0x89;
> + instructions[i++] = 0xc1;
> + instructions[i++] = 0xb8;
> + instructions[i++] = 0x12;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0x00;
> + instructions[i++] = 0xeb;
> + instructions[i++] = 0xea;
> + }
> + }
> addr = data & HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_MASK;
> if (kvm_vcpu_write_guest(vcpu, addr, instructions, i))
> return 1;
> diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
> index fdac4a1714ec..0e7643c1ef01 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -823,4 +823,15 @@ struct hv_mmio_write_input {
> u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
> } __packed;
>
> +/*
> + * VTL call/return hypercall page offsets register
> + */
> +union hv_register_vsm_code_page_offsets {
> + u64 as_u64;
> + struct {
> + u64 vtl_call_offset:12;
> + u64 vtl_return_offset:12;
> + u64 reserved:40;
> + } __packed;
> +};
> #endif






2023-11-28 16:34:10

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Tue, Nov 28, 2023, Maxim Levitsky wrote:
> On Wed, 2023-11-08 at 11:17 +0000, Nicolas Saenz Julienne wrote:
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 78d053042667..d4b1b53ea63d 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -259,7 +259,8 @@ static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
> > static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> > {
> > struct kvm *kvm = vcpu->kvm;
> > - u8 instructions[9];
> > + struct kvm_hv *hv = to_kvm_hv(kvm);
> > + u8 instructions[0x30];
> > int i = 0;
> > u64 addr;
> >
> > @@ -285,6 +286,81 @@ static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> > /* ret */
> > ((unsigned char *)instructions)[i++] = 0xc3;
> >
> > + /* VTL call/return entries */
> > + if (!kvm_xen_hypercall_enabled(kvm) && kvm_hv_vsm_enabled(kvm)) {
> > +#ifdef CONFIG_X86_64
> > + if (is_64_bit_mode(vcpu)) {
> > + /*
> > + * VTL call 64-bit entry prologue:
> > + * mov %rcx, %rax
> > + * mov $0x11, %ecx
> > + * jmp 0:
>
> This isn't really 'jmp 0' as I first wondered but actually backward jump 32
> bytes back (if I did the calculation correctly). This is very dangerous
> because code that was before can change and in fact I don't think that this
> offset is even correct now, and on top of that it depends on support for xen
> hypercalls as well.
>
> This can be fixed by calculating the offset in runtime, however I am
> thinking:
>
>
> Since userspace will have to be aware of the offsets in this page, and since
> pretty much everything else is done in userspace, it might make sense to
> create the hypercall page in the userspace.
>
> In fact, the fact that KVM currently overwrites the guest page, is a
> violation of the HV spec.
>
> It's more correct regardless of VTL to do userspace vm exit and let the
> userspace put a memslot ("overlay") over the address, and put whatever
> userspace wants there, including the above code.
>
> Then we won't need the new ioctl as well.
>
> To support this I think that we can add a userspace msr filter on the
> HV_X64_MSR_HYPERCALL, although I am not 100% sure if a userspace msr filter
> overrides the in-kernel msr handling.

Yep, userspace MSR filters override in-kernel handling.

2023-12-01 16:21:26

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Tue Nov 28, 2023 at 7:08 AM UTC, Maxim Levitsky wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Wed, 2023-11-08 at 11:17 +0000, Nicolas Saenz Julienne wrote:
> > VTL call/return hypercalls have their own entry points in the hypercall
> > page because they don't follow normal hyper-v hypercall conventions.
> > Move the VTL call/return control input into ECX/RAX and set the
> > hypercall code into EAX/RCX before calling the hypercall instruction in
> > order to be able to use the Hyper-V hypercall entry function.
> >
> > Guests can read an emulated code page offsets register to know the
> > offsets into the hypercall page for the VTL call/return entries.
> >
> > Signed-off-by: Nicolas Saenz Julienne <[email protected]>
> >
> > ---
> >
> > My tree has the additional patch, we're still trying to understand under
> > what conditions Windows expects the offset to be fixed.
> >
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 54f7f36a89bf..9f2ea8c34447 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -294,6 +294,7 @@ static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> >
> > /* VTL call/return entries */
> > if (!kvm_xen_hypercall_enabled(kvm) && kvm_hv_vsm_enabled(kvm)) {
> > + i = 22;
> > #ifdef CONFIG_X86_64
> > if (is_64_bit_mode(vcpu)) {
> > /*
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 +
> > arch/x86/kvm/hyperv.c | 78 ++++++++++++++++++++++++++++++-
> > include/asm-generic/hyperv-tlfs.h | 11 +++++
> > 3 files changed, 90 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index a2f224f95404..00cd21b09f8c 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1105,6 +1105,8 @@ struct kvm_hv {
> > u64 hv_tsc_emulation_status;
> > u64 hv_invtsc_control;
> >
> > + union hv_register_vsm_code_page_offsets vsm_code_page_offsets;
> > +
> > /* How many vCPUs have VP index != vCPU index */
> > atomic_t num_mismatched_vp_indexes;
> >
> > diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> > index 78d053042667..d4b1b53ea63d 100644
> > --- a/arch/x86/kvm/hyperv.c
> > +++ b/arch/x86/kvm/hyperv.c
> > @@ -259,7 +259,8 @@ static void synic_exit(struct kvm_vcpu_hv_synic *synic, u32 msr)
> > static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> > {
> > struct kvm *kvm = vcpu->kvm;
> > - u8 instructions[9];
> > + struct kvm_hv *hv = to_kvm_hv(kvm);
> > + u8 instructions[0x30];
> > int i = 0;
> > u64 addr;
> >
> > @@ -285,6 +286,81 @@ static int patch_hypercall_page(struct kvm_vcpu *vcpu, u64 data)
> > /* ret */
> > ((unsigned char *)instructions)[i++] = 0xc3;
> >
> > + /* VTL call/return entries */
> > + if (!kvm_xen_hypercall_enabled(kvm) && kvm_hv_vsm_enabled(kvm)) {
> > +#ifdef CONFIG_X86_64
> > + if (is_64_bit_mode(vcpu)) {
> > + /*
> > + * VTL call 64-bit entry prologue:
> > + * mov %rcx, %rax
> > + * mov $0x11, %ecx
> > + * jmp 0:
>
> This isn't really 'jmp 0' as I first wondered but actually backward jump 32 bytes back (if I did the calculation correctly).
> This is very dangerous because code that was before can change and in fact I don't think that this
> offset is even correct now, and on top of that it depends on support for xen hypercalls as well.

You're absolutely right. The offset is wrong as is, and the overall
approach might break in the future.

Another solution is to explicitly do the vmcall and avoid any jumps.
This seems to be what Hyper-V does:
https://hal.science/hal-03117362/document (Figure 8).

> This can be fixed by calculating the offset in runtime, however I am thinking:
>
>
> Since userspace will have to be aware of the offsets in this page, and since
> pretty much everything else is done in userspace, it might make sense to create
> the hypercall page in the userspace.
>
> In fact, the fact that KVM currently overwrites the guest page, is a violation of
> the HV spec.
>
> It's more correct regardless of VTL to do userspace vm exit and let the userspace put a memslot ("overlay")
> over the address, and put whatever userspace wants there, including the above code.

I agree we should be on the safe side and fully implement overlays. That
said, I suspect they are not actually necessary in practice (with or
without VSM support).

> Then we won't need the new ioctl as well.
>
> To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.

I thought about it at the time. It's not that simple though, we should
still let KVM set the hypercall bytecode, and other quirks like the Xen
one. Additionally, we have no way of knowing where they are going to be
located. We could do something like this, but it's not pretty:

- Exit to user-space on HV_X64_MSR_HYPERCALL (it overrides the msr
handling).
- Setup the overlay.
- Call HV_X64_MSR_HYPERCALL from user-space so KVM writes its share of
the hypercall page.
- Copy the VSM parts from user-space in an area we know to be safe.

We could maybe introduce and extension CAP that provides a safe offset
in the hypercall page for user-space to use?

Nicolas

2023-12-01 16:32:46

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
>
> I thought about it at the time. It's not that simple though, we should
> still let KVM set the hypercall bytecode, and other quirks like the Xen
> one.

Yeah, that Xen quirk is quite the killer.

Can you provide pseudo-assembly for what the final page is supposed to look like?
I'm struggling mightily to understand what this is actually trying to do.

2023-12-01 16:51:31

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> >
> > I thought about it at the time. It's not that simple though, we should
> > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > one.
>
> Yeah, that Xen quirk is quite the killer.
>
> Can you provide pseudo-assembly for what the final page is supposed to look like?
> I'm struggling mightily to understand what this is actually trying to do.

I'll make it as simple as possible (diregard 32bit support and that xen
exists):

vmcall <- Offset 0, regular Hyper-V hypercalls enter here
ret
mov rax,rcx <- VTL call hypercall enters here
mov rcx,0x11
vmcall
ret
mov rax,rcx <- VTL return hypercall enters here
mov rcx,0x12
vmcall
ret

rcx needs to be saved as it contains a "VTL call control input to the
hypervisor" (TLFS 15.6.1). I don't remember seeing it being used in
practice. Then, KVM expects the hypercall code in rcx, hence the
0x11/0x12 mov.

Nicolas

2023-12-01 17:47:23

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > >
> > > I thought about it at the time. It's not that simple though, we should
> > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > one.
> >
> > Yeah, that Xen quirk is quite the killer.
> >
> > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > I'm struggling mightily to understand what this is actually trying to do.
>
> I'll make it as simple as possible (diregard 32bit support and that xen
> exists):
>
> vmcall <- Offset 0, regular Hyper-V hypercalls enter here
> ret
> mov rax,rcx <- VTL call hypercall enters here

I'm missing who/what defines "here" though. What generates the CALL that points
at this exact offset? If the exact offset is dictated in the TLFS, then aren't
we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?

2023-12-01 18:16:24

by Nicolas Saenz Julienne

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Fri Dec 1, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>
>
> On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > > >
> > > > I thought about it at the time. It's not that simple though, we should
> > > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > > one.
> > >
> > > Yeah, that Xen quirk is quite the killer.
> > >
> > > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > > I'm struggling mightily to understand what this is actually trying to do.
> >
> > I'll make it as simple as possible (diregard 32bit support and that xen
> > exists):
> >
> > vmcall <- Offset 0, regular Hyper-V hypercalls enter here
> > ret
> > mov rax,rcx <- VTL call hypercall enters here
>
> I'm missing who/what defines "here" though. What generates the CALL that points
> at this exact offset? If the exact offset is dictated in the TLFS, then aren't
> we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?

Yes, sorry, I should've included some more context.

Here's a rundown (from memory) of how the first VTL call happens:
- CPU0 start running at VTL0.
- Hyper-V enables VTL1 on the partition.
- Hyper-V enabled VTL1 on CPU0, but doesn't yet switch to it. It passes
the initial VTL1 CPU state alongside the enablement hypercall
arguments.
- Hyper-V sets the Hypercall page overlay address through
HV_X64_MSR_HYPERCALL. KVM fills it.
- Hyper-V gets the VTL-call and VTL-return offset into the hypercall
page using the VP Register HvRegisterVsmCodePageOffsets (VP register
handling is in user-space).
- Hyper-V performs the first VTL-call, and has all it needs to move
between VTL0/1.

Nicolas

2023-12-05 19:22:11

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> On Fri Dec 1, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > > > >
> > > > > I thought about it at the time. It's not that simple though, we should
> > > > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > > > one.
> > > >
> > > > Yeah, that Xen quirk is quite the killer.
> > > >
> > > > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > > > I'm struggling mightily to understand what this is actually trying to do.
> > >
> > > I'll make it as simple as possible (diregard 32bit support and that xen
> > > exists):
> > >
> > > vmcall <- Offset 0, regular Hyper-V hypercalls enter here
> > > ret
> > > mov rax,rcx <- VTL call hypercall enters here
> >
> > I'm missing who/what defines "here" though. What generates the CALL that points
> > at this exact offset? If the exact offset is dictated in the TLFS, then aren't
> > we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?
>
> Yes, sorry, I should've included some more context.
>
> Here's a rundown (from memory) of how the first VTL call happens:
> - CPU0 start running at VTL0.
> - Hyper-V enables VTL1 on the partition.
> - Hyper-V enabled VTL1 on CPU0, but doesn't yet switch to it. It passes
> the initial VTL1 CPU state alongside the enablement hypercall
> arguments.
> - Hyper-V sets the Hypercall page overlay address through
> HV_X64_MSR_HYPERCALL. KVM fills it.
> - Hyper-V gets the VTL-call and VTL-return offset into the hypercall
> page using the VP Register HvRegisterVsmCodePageOffsets (VP register
> handling is in user-space).

Ah, so the guest sets the offsets by "writing" HvRegisterVsmCodePageOffsets via
a HvSetVpRegisters() hypercall.

I don't see a sane way to handle this in KVM if userspace handles HvSetVpRegisters().
E.g. if the guest requests offsets that don't leave enough room for KVM to shove
in its data, then presumably userspace needs to reject HvSetVpRegisters(). But
that requires userspace to know exactly how many bytes KVM is going to write at
each offsets.

My vote is to have userspace do all the patching. IIUC, all of this is going to
be mutually exclusive with kvm_xen_hypercall_enabled(), i.e. userspace doesn't
need to worry about setting RAX[31]. At that point, it's just VMCALL versus
VMMCALL, and userspace is more than capable of identifying whether its running
on Intel or AMD.

> - Hyper-V performs the first VTL-call, and has all it needs to move
> between VTL0/1.
>
> Nicolas

2023-12-05 20:04:44

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Tue, 2023-12-05 at 11:21 -0800, Sean Christopherson wrote:
> On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > On Fri Dec 1, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > > > > >
> > > > > > I thought about it at the time. It's not that simple though, we should
> > > > > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > > > > one.
> > > > >
> > > > > Yeah, that Xen quirk is quite the killer.
> > > > >
> > > > > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > > > > I'm struggling mightily to understand what this is actually trying to do.
> > > >
> > > > I'll make it as simple as possible (diregard 32bit support and that xen
> > > > exists):
> > > >
> > > > vmcall <- Offset 0, regular Hyper-V hypercalls enter here
> > > > ret
> > > > mov rax,rcx <- VTL call hypercall enters here
> > >
> > > I'm missing who/what defines "here" though. What generates the CALL that points
> > > at this exact offset? If the exact offset is dictated in the TLFS, then aren't
> > > we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?
> >
> > Yes, sorry, I should've included some more context.
> >
> > Here's a rundown (from memory) of how the first VTL call happens:
> > - CPU0 start running at VTL0.
> > - Hyper-V enables VTL1 on the partition.
> > - Hyper-V enabled VTL1 on CPU0, but doesn't yet switch to it. It passes
> > the initial VTL1 CPU state alongside the enablement hypercall
> > arguments.
> > - Hyper-V sets the Hypercall page overlay address through
> > HV_X64_MSR_HYPERCALL. KVM fills it.
> > - Hyper-V gets the VTL-call and VTL-return offset into the hypercall
> > page using the VP Register HvRegisterVsmCodePageOffsets (VP register
> > handling is in user-space).
>
> Ah, so the guest sets the offsets by "writing" HvRegisterVsmCodePageOffsets via
> a HvSetVpRegisters() hypercall.

No, you didn't understand this correctly.

The guest writes the HV_X64_MSR_HYPERCALL, and in the response hyperv fills the hypercall page,
including the VSM thunks.

Then the guest can _read_ the offsets, hyperv chose there by issuing another hypercall.

In the current implementation,
the offsets that the kernel choose are first exposed to the userspace via new ioctl, and then the userspace
exposes these offsets to the guest via that 'another hypercall'
(reading a pseudo partition register 'HvRegisterVsmCodePageOffsets')

I personally don't know for sure anymore if the userspace or kernel based hypercall page is better
here, it's ugly regardless :(


Best regards,
Maxim Levitsky

>
> I don't see a sane way to handle this in KVM if userspace handles HvSetVpRegisters().
> E.g. if the guest requests offsets that don't leave enough room for KVM to shove
> in its data, then presumably userspace needs to reject HvSetVpRegisters(). But
> that requires userspace to know exactly how many bytes KVM is going to write at
> each offsets.
>
> My vote is to have userspace do all the patching. IIUC, all of this is going to
> be mutually exclusive with kvm_xen_hypercall_enabled(), i.e. userspace doesn't
> need to worry about setting RAX[31]. At that point, it's just VMCALL versus
> VMMCALL, and userspace is more than capable of identifying whether its running
> on Intel or AMD.
>
> > - Hyper-V performs the first VTL-call, and has all it needs to move
> > between VTL0/1.
> >
> > Nicolas


2023-12-06 00:07:36

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Tue, Dec 05, 2023, Maxim Levitsky wrote:
> On Tue, 2023-12-05 at 11:21 -0800, Sean Christopherson wrote:
> > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > On Fri Dec 1, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > > > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > > > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > > > > > >
> > > > > > > I thought about it at the time. It's not that simple though, we should
> > > > > > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > > > > > one.
> > > > > >
> > > > > > Yeah, that Xen quirk is quite the killer.
> > > > > >
> > > > > > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > > > > > I'm struggling mightily to understand what this is actually trying to do.
> > > > >
> > > > > I'll make it as simple as possible (diregard 32bit support and that xen
> > > > > exists):
> > > > >
> > > > > vmcall <- Offset 0, regular Hyper-V hypercalls enter here
> > > > > ret
> > > > > mov rax,rcx <- VTL call hypercall enters here
> > > >
> > > > I'm missing who/what defines "here" though. What generates the CALL that points
> > > > at this exact offset? If the exact offset is dictated in the TLFS, then aren't
> > > > we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?
> > >
> > > Yes, sorry, I should've included some more context.
> > >
> > > Here's a rundown (from memory) of how the first VTL call happens:
> > > - CPU0 start running at VTL0.
> > > - Hyper-V enables VTL1 on the partition.
> > > - Hyper-V enabled VTL1 on CPU0, but doesn't yet switch to it. It passes
> > > the initial VTL1 CPU state alongside the enablement hypercall
> > > arguments.
> > > - Hyper-V sets the Hypercall page overlay address through
> > > HV_X64_MSR_HYPERCALL. KVM fills it.
> > > - Hyper-V gets the VTL-call and VTL-return offset into the hypercall
> > > page using the VP Register HvRegisterVsmCodePageOffsets (VP register
> > > handling is in user-space).
> >
> > Ah, so the guest sets the offsets by "writing" HvRegisterVsmCodePageOffsets via
> > a HvSetVpRegisters() hypercall.
>
> No, you didn't understand this correctly.
>
> The guest writes the HV_X64_MSR_HYPERCALL, and in the response hyperv fills

When people say "Hyper-V", do y'all mean "root partition"? If so, can we just
say "root partition"? Part of my confusion is that I don't instinctively know
whether things like "Hyper-V enables VTL1 on the partition" are talking about the
root partition (or I guess parent partition?) or the hypervisor. Functionally it
probably doesn't matter, it's just hard to reconcile things with the TLFS, which
is written largely to describe the hypervisor's behavior.

> the hypercall page, including the VSM thunks.
>
> Then the guest can _read_ the offsets, hyperv chose there by issuing another hypercall.

Hrm, now I'm really confused. Ah, the TLFS contradicts itself. The blurb for
AccessVpRegisters says:

The partition can invoke the hypercalls HvSetVpRegisters and HvGetVpRegisters.

And HvSetVpRegisters confirms that requirement:

The caller must either be the parent of the partition specified by PartitionId,
or the partition specified must be “self” and the partition must have the
AccessVpRegisters privilege

But it's absent from HvGetVpRegisters:

The caller must be the parent of the partition specified by PartitionId or the
partition specifying its own partition ID.

> In the current implementation, the offsets that the kernel choose are first
> exposed to the userspace via new ioctl, and then the userspace exposes these
> offsets to the guest via that 'another hypercall' (reading a pseudo partition
> register 'HvRegisterVsmCodePageOffsets')
>
> I personally don't know for sure anymore if the userspace or kernel based
> hypercall page is better here, it's ugly regardless :(

Hrm. Requiring userspace to intercept the WRMSR will be a mess because then KVM
will have zero knowledge of the hypercall page, e.g. userspace would be forced to
intercept HV_X64_MSR_GUEST_OS_ID as well. That's not the end of the world, but
it's not exactly ideal either.

What if we exit to userspace with a new kvm_hyperv_exit reason that requires
completion? I.e. punt to userspace if VSM is enabled, but still record the data
in KVM? Ugh, but even that's a mess because kvm_hv_set_msr_pw() is deep in the
WRMSR emulation call stack and can't easily signal that an exit to userspace is
needed. Blech.

2023-12-06 16:21:06

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [RFC 05/33] KVM: x86: hyper-v: Introduce VTL call/return prologues in hypercall page

On Tue, 2023-12-05 at 16:07 -0800, Sean Christopherson wrote:
> On Tue, Dec 05, 2023, Maxim Levitsky wrote:
> > On Tue, 2023-12-05 at 11:21 -0800, Sean Christopherson wrote:
> > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > On Fri Dec 1, 2023 at 5:47 PM UTC, Sean Christopherson wrote:
> > > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > > On Fri Dec 1, 2023 at 4:32 PM UTC, Sean Christopherson wrote:
> > > > > > > On Fri, Dec 01, 2023, Nicolas Saenz Julienne wrote:
> > > > > > > > > To support this I think that we can add a userspace msr filter on the HV_X64_MSR_HYPERCALL,
> > > > > > > > > although I am not 100% sure if a userspace msr filter overrides the in-kernel msr handling.
> > > > > > > >
> > > > > > > > I thought about it at the time. It's not that simple though, we should
> > > > > > > > still let KVM set the hypercall bytecode, and other quirks like the Xen
> > > > > > > > one.
> > > > > > >
> > > > > > > Yeah, that Xen quirk is quite the killer.
> > > > > > >
> > > > > > > Can you provide pseudo-assembly for what the final page is supposed to look like?
> > > > > > > I'm struggling mightily to understand what this is actually trying to do.
> > > > > >
> > > > > > I'll make it as simple as possible (diregard 32bit support and that xen
> > > > > > exists):
> > > > > >
> > > > > > vmcall <- Offset 0, regular Hyper-V hypercalls enter here
> > > > > > ret
> > > > > > mov rax,rcx <- VTL call hypercall enters here
> > > > >
> > > > > I'm missing who/what defines "here" though. What generates the CALL that points
> > > > > at this exact offset? If the exact offset is dictated in the TLFS, then aren't
> > > > > we screwed with the whole Xen quirk, which inserts 5 bytes before that first VMCALL?
> > > >
> > > > Yes, sorry, I should've included some more context.
> > > >
> > > > Here's a rundown (from memory) of how the first VTL call happens:
> > > > - CPU0 start running at VTL0.
> > > > - Hyper-V enables VTL1 on the partition.
> > > > - Hyper-V enabled VTL1 on CPU0, but doesn't yet switch to it. It passes
> > > > the initial VTL1 CPU state alongside the enablement hypercall
> > > > arguments.
> > > > - Hyper-V sets the Hypercall page overlay address through
> > > > HV_X64_MSR_HYPERCALL. KVM fills it.
> > > > - Hyper-V gets the VTL-call and VTL-return offset into the hypercall
> > > > page using the VP Register HvRegisterVsmCodePageOffsets (VP register
> > > > handling is in user-space).
> > >
> > > Ah, so the guest sets the offsets by "writing" HvRegisterVsmCodePageOffsets via
> > > a HvSetVpRegisters() hypercall.
> >
> > No, you didn't understand this correctly.
> >
> > The guest writes the HV_X64_MSR_HYPERCALL, and in the response hyperv fills
>
> When people say "Hyper-V", do y'all mean "root partition"?
> If so, can we just
> say "root partition"? Part of my confusion is that I don't instinctively know
> whether things like "Hyper-V enables VTL1 on the partition" are talking about the
> root partition (or I guess parent partition?) or the hypervisor. Functionally it
> probably doesn't matter, it's just hard to reconcile things with the TLFS, which
> is written largely to describe the hypervisor's behavior.
>
> > the hypercall page, including the VSM thunks.
> >
> > Then the guest can _read_ the offsets, hyperv chose there by issuing another hypercall.
>
> Hrm, now I'm really confused. Ah, the TLFS contradicts itself. The blurb for
> AccessVpRegisters says:
>
> The partition can invoke the hypercalls HvSetVpRegisters and HvGetVpRegisters.
>
> And HvSetVpRegisters confirms that requirement:
>
> The caller must either be the parent of the partition specified by PartitionId,
> or the partition specified must be “self” and the partition must have the
> AccessVpRegisters privilege
>
> But it's absent from HvGetVpRegisters:
>
> The caller must be the parent of the partition specified by PartitionId or the
> partition specifying its own partition ID.

Yes, it is indeed very strange, that a partition would do a hypercall to read its own
registers - but then the 'register' is also not really a register but more of a 'hack', and I guess
they allowed it in this particular case. That is why I wrote the 'another hypercall'
thing, because it is very strange that they (ab)used the HvGetVpRegisters for that.


But regardless of the above, guests (root partition or any other partition) do the
VTL calls, and in order to do a VTL call, that guest has to know the hypercall page offsets,
and for that the guest has to do the HvGetVpRegisters hypercall first.

>
> > In the current implementation, the offsets that the kernel choose are first
> > exposed to the userspace via new ioctl, and then the userspace exposes these
> > offsets to the guest via that 'another hypercall' (reading a pseudo partition
> > register 'HvRegisterVsmCodePageOffsets')
> >
> > I personally don't know for sure anymore if the userspace or kernel based
> > hypercall page is better here, it's ugly regardless :(
>
> Hrm. Requiring userspace to intercept the WRMSR will be a mess because then KVM
> will have zero knowledge of the hypercall page, e.g. userspace would be forced to
> intercept HV_X64_MSR_GUEST_OS_ID as well.
> That's not the end of the world, but
> it's not exactly ideal either.
>
> What if we exit to userspace with a new kvm_hyperv_exit reason that requires
> completion?

BTW the other option is to do the whole thing in kernel - the offset bug in the hypercall page
can be easily solved with a variable, and then the kernel can also intercept the HvGetVpRegisters
hypercall and return these offsets for HvRegisterVsmCodePageOffsets, and for all
other VP registers it can still exit to userspace - that way we also avoid adding a new ioctl,
and have the whole thing in one place.

All of the above can even be done unconditionally (or be conditionally tied to a Kconfig option),
because it doesn't add much overhead and neither should break backward compatibility - I don't think
hyperv guests rely on hypervisor not touching the hypercall page beyond the few bytes that KVM
does write currently.

Best regards,
Maxim Levitsky


> I.e. punt to userspace if VSM is enabled, but still record the data
> in KVM? Ugh, but even that's a mess because kvm_hv_set_msr_pw() is deep in the
> WRMSR emulation call stack and can't easily signal that an exit to userspace is
> needed. Blech.
>