2018-01-24 15:14:25

by Vitaly Kuznetsov

Subject: [PATCH] x86/kvm: disable fast MMIO when running nested

I was investigating an issue with seabios >= 1.10 which stopped working
for nested KVM on Hyper-V. The problem appears to be in the
handle_ept_misconfig() function: when we do fast MMIO we need to skip
the instruction, so we call kvm_skip_emulated_instruction(). This, however,
depends on the VM_EXIT_INSTRUCTION_LEN field being set correctly in the
VMCS, which is not always the case.

Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
EPT MISCONFIG occurs. While on real hardware it was observed to be set,
some hypervisors follow the spec and don't set it; we end up advancing
IP with some random value.

I checked with Microsoft and they confirmed they don't fill
VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.

Fix the issue by disabling fast mmio when running nested.

Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
arch/x86/kvm/vmx.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..54afb446f38e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
/*
* A nested guest cannot optimize MMIO vmexits, because we have an
* nGPA here instead of the required GPA.
+ * Skipping instruction below depends on undefined behavior: Intel's
+ * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
+ * when EPT MISCONFIG occurs and while on real hardware it was observed
+ * to be set, other hypervisors (namely Hyper-V) don't set it, we end
+ * up advancing IP with some random value. Disable fast mmio when
+ * running nested and keep it for real hardware in hope that
+ * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
*/
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
- if (!is_guest_mode(vcpu) &&
+ if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
trace_kvm_fast_mmio(gpa);
return kvm_skip_emulated_instruction(vcpu);
--
2.14.3
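
To make the failure mode described in the patch concrete, here is a minimal user-space sketch (not kernel code). `struct toy_vcpu` and `toy_skip_instruction()` are hypothetical names; they only model the one effect that matters here, namely that the skip advances the guest instruction pointer by whatever VM_EXIT_INSTRUCTION_LEN reports. The real kvm_skip_emulated_instruction() does more than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the vCPU state relevant to the skip. */
struct toy_vcpu {
    uint64_t rip;           /* guest instruction pointer */
    uint32_t exit_insn_len; /* models VM_EXIT_INSTRUCTION_LEN as read from the VMCS */
};

/*
 * Models the skip: RIP is advanced by the reported length, with no way
 * to tell whether that length was actually filled in for this exit.
 */
static uint64_t toy_skip_instruction(struct toy_vcpu *v)
{
    assert(v);
    v->rip += v->exit_insn_len;
    return v->rip;
}
```

In this model, a correct length of 3 moves RIP from 0x1000 to 0x1003. If the field was never filled and happens to read 0, RIP does not move and the guest re-executes the same MMIO write; any other stale value moves RIP to an arbitrary offset.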



2018-01-25 07:56:40

by Wanpeng Li

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

2018-01-24 23:12 GMT+08:00 Vitaly Kuznetsov <[email protected]>:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by disabling fast mmio when running nested.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>

Reviewed-by: Wanpeng Li <[email protected]>

> ---
> arch/x86/kvm/vmx.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> /*
> * A nested guest cannot optimize MMIO vmexits, because we have an
> * nGPA here instead of the required GPA.
> + * Skipping instruction below depends on undefined behavior: Intel's
> + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> + * when EPT MISCONFIG occurs and while on real hardware it was observed
> + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> + * up advancing IP with some random value. Disable fast mmio when
> + * running nested and keep it for real hardware in hope that
> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> */
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> - if (!is_guest_mode(vcpu) &&
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> trace_kvm_fast_mmio(gpa);
> return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3
>

2018-01-25 09:56:15

by Liran Alon

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested


----- [email protected] wrote:

> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by disabling fast mmio when running nested.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> /*
> * A nested guest cannot optimize MMIO vmexits, because we have an
> * nGPA here instead of the required GPA.
> + * Skipping instruction below depends on undefined behavior: Intel's
> + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> + * when EPT MISCONFIG occurs and while on real hardware it was observed
> + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> + * up advancing IP with some random value. Disable fast mmio when
> + * running nested and keep it for real hardware in hope that
> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.

If the Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in
the VMCS on EPT_MISCONFIG, I don't think we should rely on it on real
hardware either.

-Liran

> */
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> - if (!is_guest_mode(vcpu) &&
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> trace_kvm_fast_mmio(gpa);
> return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3

2018-01-25 14:39:58

by Radim Krčmář

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

2018-01-25 01:55-0800, Liran Alon:
> ----- [email protected] wrote:
> > I was investigating an issue with seabios >= 1.10 which stopped working
> > for nested KVM on Hyper-V. The problem appears to be in
> > handle_ept_violation() function: when we do fast mmio we need to skip
> > the instruction so we do kvm_skip_emulated_instruction(). This, however,
> > depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> > However, this is not the case.
> >
> > Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> > EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> > some hypervisors follow the spec and don't set it; we end up advancing
> > IP with some random value.
> >
> > I checked with Microsoft and they confirmed they don't fill
> > VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> >
> > Fix the issue by disabling fast mmio when running nested.
> >
> > Signed-off-by: Vitaly Kuznetsov <[email protected]>
> > ---
> > arch/x86/kvm/vmx.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index c829d89e2e63..54afb446f38e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> > /*
> > * A nested guest cannot optimize MMIO vmexits, because we have an
> > * nGPA here instead of the required GPA.
> > + * Skipping instruction below depends on undefined behavior: Intel's
> > + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> > + * when EPT MISCONFIG occurs and while on real hardware it was observed
> > + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> > + * up advancing IP with some random value. Disable fast mmio when
> > + * running nested and keep it for real hardware in hope that
> > + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>
> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
> I don't think we should do this on real-hardware as-well.

Neither do I, but you can see the last discussion on this topic,
https://patchwork.kernel.org/patch/9903811/. In short, we've agreed to
limit the hack to real hardware and wait for Intel or virtio changes.

Michael and Jason, any progress on implementing a fast virtio mechanism
that doesn't rely on undefined behavior?

(Encode the length of the writing instruction into the last 4 bits of the
MMIO address, use a side channel to declare that accesses to the MMIO area
always use a certain instruction length, use a hypercall, ...)
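
One possible reading of the first idea, as a user-space sketch only: x86 instructions are at most 15 bytes long, so the length fits in 4 bits. The sketch assumes the notification window is 16-byte aligned and that the guest writes to `base | insn_len`; the host then recovers the length from the faulting address instead of from the VMCS. All names (`NOTIFY_LEN_MASK`, `notify_addr_encode()`, and so on) are hypothetical, and nothing in this thread specifies the actual layout.

```c
#include <assert.h>
#include <stdint.h>

/* Low 4 address bits carry the instruction length (1..15 on x86). */
#define NOTIFY_LEN_MASK 0xfULL

/* Guest side: build the address to write to. base must be 16-byte aligned. */
static uint64_t notify_addr_encode(uint64_t base, unsigned int insn_len)
{
    assert((base & NOTIFY_LEN_MASK) == 0);
    assert(insn_len >= 1 && insn_len <= 15);
    return base | insn_len;
}

/* Host side: recover the instruction length from the faulting address. */
static unsigned int notify_addr_insn_len(uint64_t gpa)
{
    return (unsigned int)(gpa & NOTIFY_LEN_MASK);
}

/* Host side: recover the aligned base used to look up the device. */
static uint64_t notify_addr_base(uint64_t gpa)
{
    return gpa & ~NOTIFY_LEN_MASK;
}
```

The cost of this scheme is that the guest driver must know the exact length of the store instruction it emits, which is part of why it comes across as fragile.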

Thanks.

2018-01-25 14:41:40

by Paolo Bonzini

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested



----- Original Message -----
> From: "Vitaly Kuznetsov" <[email protected]>
> To: [email protected]
> Cc: [email protected], [email protected], "Paolo Bonzini" <[email protected]>, "Radim Krčmář"
> <[email protected]>
> Sent: Wednesday, January 24, 2018 4:12:34 PM
> Subject: [PATCH] x86/kvm: disable fast MMIO when running nested
>
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by disabling fast mmio when running nested.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> /*
> * A nested guest cannot optimize MMIO vmexits, because we have an
> * nGPA here instead of the required GPA.
> + * Skipping instruction below depends on undefined behavior: Intel's
> + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> + * when EPT MISCONFIG occurs and while on real hardware it was observed
> + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> + * up advancing IP with some random value. Disable fast mmio when
> + * running nested and keep it for real hardware in hope that
> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> */
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> - if (!is_guest_mode(vcpu) &&
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> trace_kvm_fast_mmio(gpa);
> return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3

Vitaly,

could you base the X86_FEATURE_HYPERVISOR case on the patch at
https://patchwork.kernel.org/patch/9903811/?

By using EMULTYPE_SKIP, the eventfd is signaled before entering the
emulator and the impact on latency is small.

Thanks,

Paolo

2018-01-25 14:42:44

by Radim Krčmář

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

2018-01-24 16:12+0100, Vitaly Kuznetsov:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by disabling fast mmio when running nested.
>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..54afb446f38e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> /*
> * A nested guest cannot optimize MMIO vmexits, because we have an
> * nGPA here instead of the required GPA.
> + * Skipping instruction below depends on undefined behavior: Intel's
> + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
> + * when EPT MISCONFIG occurs and while on real hardware it was observed
> + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
> + * up advancing IP with some random value. Disable fast mmio when
> + * running nested and keep it for real hardware in hope that
> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> */
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
> - if (!is_guest_mode(vcpu) &&
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&

I realized that Paolo kept a minor optimization while getting rid of the
undefined behavior (https://patchwork.kernel.org/patch/9903811/).
Please do the same trick that signals kvm_io_bus_write() before going to
x86_emulate_instruction(... EMULTYPE_SKIP ...), but add a branch to use
kvm_skip_emulated_instruction() for bare-metal,

thanks.
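
The control flow requested above can be sketched in user space as follows. This is an assumption about the shape of the change, not the final patch: `struct toy_env`, `enum skip_path`, and `toy_handle_ept_misconfig()` are hypothetical, and the three booleans merely stand in for static_cpu_has(X86_FEATURE_HYPERVISOR), is_guest_mode(vcpu), and a successful kvm_io_bus_write() on KVM_FAST_MMIO_BUS.

```c
#include <assert.h>
#include <stdbool.h>

/* Which mechanism is used to step over the MMIO instruction. */
enum skip_path {
    SKIP_NONE,     /* not a fast MMIO hit: take the regular MMIO path */
    SKIP_VMCS_LEN, /* models kvm_skip_emulated_instruction() */
    SKIP_EMULATOR  /* models x86_emulate_instruction(..., EMULTYPE_SKIP, ...) */
};

/* Hypothetical inputs standing in for the kernel checks. */
struct toy_env {
    bool running_on_hypervisor; /* X86_FEATURE_HYPERVISOR is set */
    bool guest_mode;            /* vCPU is running a nested guest */
    bool fast_mmio_hit;         /* the fast MMIO bus write succeeded */
};

static enum skip_path toy_handle_ept_misconfig(const struct toy_env *e)
{
    assert(e);
    if (e->guest_mode || !e->fast_mmio_hit)
        return SKIP_NONE;

    /* The eventfd has been signaled by now; only the skip remains. */
    if (!e->running_on_hypervisor)
        return SKIP_VMCS_LEN; /* bare metal: keep trusting the VMCS field */
    return SKIP_EMULATOR;     /* nested: decode the instruction to get its length */
}
```

The point of this ordering is that the notification is delivered before the slower skip runs, so only the instruction-skipping cost differs between bare metal and nested.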

> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> trace_kvm_fast_mmio(gpa);
> return kvm_skip_emulated_instruction(vcpu);
> --
> 2.14.3
>

2018-01-25 14:47:10

by Jason Wang

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested



On 2018-01-25 22:16, Radim Krčmář wrote:
> 2018-01-25 01:55-0800, Liran Alon:
>> ----- [email protected] wrote:
>>> I was investigating an issue with seabios >= 1.10 which stopped working
>>> for nested KVM on Hyper-V. The problem appears to be in
>>> handle_ept_violation() function: when we do fast mmio we need to skip
>>> the instruction so we do kvm_skip_emulated_instruction(). This, however,
>>> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
>>> However, this is not the case.
>>>
>>> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
>>> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
>>> some hypervisors follow the spec and don't set it; we end up advancing
>>> IP with some random value.
>>>
>>> I checked with Microsoft and they confirmed they don't fill
>>> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>>>
>>> Fix the issue by disabling fast mmio when running nested.
>>>
>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>>> ---
>>> arch/x86/kvm/vmx.c | 9 ++++++++-
>>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index c829d89e2e63..54afb446f38e 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>>> /*
>>> * A nested guest cannot optimize MMIO vmexits, because we have an
>>> * nGPA here instead of the required GPA.
>>> + * Skipping instruction below depends on undefined behavior: Intel's
>>> + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
>>> + * when EPT MISCONFIG occurs and while on real hardware it was observed
>>> + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
>>> + * up advancing IP with some random value. Disable fast mmio when
>>> + * running nested and keep it for real hardware in hope that
>>> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>> If Intel manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS on EPT_MISCONFIG,
>> I don't think we should do this on real-hardware as-well.
> Neither do I, but you can see the last discussion on this topic,
> https://patchwork.kernel.org/patch/9903811/. In short, we've agreed to
> limit the hack to real hardware and wait for Intel or virtio changes.
>
> Michael and Jason, any progress on implementing a fast virtio mechanism
> that doesn't rely on undefined behavior?
>
> (Encode writing instruction length into last 4 bits of MMIO address,
> side-channel say that accesses to the MMIO area always use certain
> instruction length, use hypercall, ...)
>
> Thanks.

No progress from my side. But we can use PIO for virtio 1.0, and it's
faster than fast MMIO (qemu supports a modern PIO notification BAR; we can
make it the default). It looks to me that neither encoding nor a hypercall
will work for a real hardware virtio device.

Thanks

2018-01-25 14:51:14

by Paolo Bonzini

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

> > Michael and Jason, any progress on implementing a fast virtio mechanism
> > that doesn't rely on undefined behavior?
> >
> > (Encode writing instruction length into last 4 bits of MMIO address,
> > side-channel say that accesses to the MMIO area always use certain
> > instruction length, use hypercall, ...)
> >
> > Thanks.
>
> No progress from my side. But we can use PIO for virtio 1.0 and it's
> faster than fast MMIO (qemu supports modern pio notification bar, we can
> make it as default). It looks to me that neither encoding nor hypercall
> will work for real hardware virtio device.

Encoding the instruction length would work; the h/w virtio devices would
just ignore it. But... it is really ugly.

Using PIO would be a small step backwards for PCIe. As long as the device
only needs *one* notification register (either MMIO or PIO) to initialize
successfully, it's okay. Then if there is no PIO space you'd just fall back
to the slower MMIO notification.
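
The fallback described above can be sketched as a small selection routine. This is an illustration under assumptions, not driver code: `struct toy_notify_caps`, `enum notify_kind`, and `toy_pick_notify()` are hypothetical names, and the sketch assumes the driver simply prefers a PIO notification register when one is available.

```c
#include <assert.h>
#include <stdbool.h>

/* Which notification register the driver ends up using. */
enum notify_kind {
    NOTIFY_NONE, /* no usable register: initialization fails */
    NOTIFY_PIO,  /* preferred when PIO space is available */
    NOTIFY_MMIO  /* slower fallback */
};

/* Hypothetical summary of what the device exposes. */
struct toy_notify_caps {
    bool has_pio_bar;
    bool has_mmio_bar;
};

/* One register of either kind is enough to initialize successfully. */
static enum notify_kind toy_pick_notify(const struct toy_notify_caps *c)
{
    assert(c);
    if (c->has_pio_bar)
        return NOTIFY_PIO;
    if (c->has_mmio_bar)
        return NOTIFY_MMIO;
    return NOTIFY_NONE;
}
```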

Paolo

2018-01-25 14:51:51

by Vitaly Kuznetsov

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

Paolo Bonzini <[email protected]> writes:

> ----- Original Message -----
>> From: "Vitaly Kuznetsov" <[email protected]>
>> To: [email protected]
>> Cc: [email protected], [email protected], "Paolo Bonzini" <[email protected]>, "Radim Krčmář"
>> <[email protected]>
>> Sent: Wednesday, January 24, 2018 4:12:34 PM
>> Subject: [PATCH] x86/kvm: disable fast MMIO when running nested
>>
>> I was investigating an issue with seabios >= 1.10 which stopped working
>> for nested KVM on Hyper-V. The problem appears to be in
>> handle_ept_violation() function: when we do fast mmio we need to skip
>> the instruction so we do kvm_skip_emulated_instruction(). This, however,
>> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
>> However, this is not the case.
>>
>> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
>> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
>> some hypervisors follow the spec and don't set it; we end up advancing
>> IP with some random value.
>>
>> I checked with Microsoft and they confirmed they don't fill
>> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>>
>> Fix the issue by disabling fast mmio when running nested.
>>
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/kvm/vmx.c | 9 ++++++++-
>> 1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c829d89e2e63..54afb446f38e 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -6558,9 +6558,16 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
>> /*
>> * A nested guest cannot optimize MMIO vmexits, because we have an
>> * nGPA here instead of the required GPA.
>> + * Skipping instruction below depends on undefined behavior: Intel's
>> + * manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set in VMCS
>> + * when EPT MISCONFIG occurs and while on real hardware it was observed
>> + * to be set, other hypervisors (namely Hyper-V) don't set it, we end
>> + * up advancing IP with some random value. Disable fast mmio when
>> + * running nested and keep it for real hardware in hope that
>> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
>> */
>> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>> - if (!is_guest_mode(vcpu) &&
>> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR) && !is_guest_mode(vcpu) &&
>> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
>> trace_kvm_fast_mmio(gpa);
>> return kvm_skip_emulated_instruction(vcpu);
>> --
>> 2.14.3
>
> Vitaly,
>
> could you base the X86_FEATURE_HYPERVISOR case on the patch at
> https://patchwork.kernel.org/patch/9903811/?
>
> By using EMULTYPE_SKIP, the eventfd is signaled before entering the
> emulator and the impact on latency is small.
>

Oh, I didn't know there was a story! I'll try EMULTYPE_SKIP, thanks!

--
Vitaly

2018-01-25 17:13:00

by Michael S. Tsirkin

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
> > > Michael and Jason, any progress on implementing a fast virtio mechanism
> > > that doesn't rely on undefined behavior?
> > >
> > > (Encode writing instruction length into last 4 bits of MMIO address,
> > > side-channel say that accesses to the MMIO area always use certain
> > > instruction length, use hypercall, ...)
> > >
> > > Thanks.
> >
> > No progress from my side. But we can use PIO for virtio 1.0 and it's
> > faster than fast MMIO (qemu supports modern pio notification bar, we can
> > make it as default). It looks to me that neither encoding nor hypercall
> > will work for real hardware virtio device.
>
> Encoding the instruction length would work, the h/w virtio devices would
> just ignore it. But... it is really ugly.
>
> Using PIO would be a small step backwards for PCIe. As long as the device
> only needs *one* notification register (either MMIO or PIO) to initialize
> successfully, it's okay. Then if there is no PIO space you'd just fall back
> to the slower MMIO notification.
>
> Paolo

A bigger issue for PIO is that it causes exits for hw devices.


--
MST

2018-01-26 02:42:55

by Jason Wang

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested



On 2018-01-26 01:11, Michael S. Tsirkin wrote:
> On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
>>>> Michael and Jason, any progress on implementing a fast virtio mechanism
>>>> that doesn't rely on undefined behavior?
>>>>
>>>> (Encode writing instruction length into last 4 bits of MMIO address,
>>>> side-channel say that accesses to the MMIO area always use certain
>>>> instruction length, use hypercall, ...)
>>>>
>>>> Thanks.
>>> No progress from my side. But we can use PIO for virtio 1.0 and it's
>>> faster than fast MMIO (qemu supports modern pio notification bar, we can
>>> make it as default). It looks to me that neither encoding nor hypercall
>>> will work for real hardware virtio device.
>> Encoding the instruction length would work, the h/w virtio devices would
>> just ignore it. But... it is really ugly.
>>
>> Using PIO would be a small step backwards for PCIe. As long as the device
>> only needs *one* notification register (either MMIO or PIO) to initialize
>> successfully, it's okay. Then if there is no PIO space you'd just fall back
>> to the slower MMIO notification.
>>
>> Paolo
> A bigger issue for PIO is it's causing exits for hw devices.
>
>

Just to make sure I understand: by exits, do you mean vmexits? I believe
MMIO will cause a vmexit too.

Thanks

2018-01-26 02:51:35

by Michael S. Tsirkin

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested

On Fri, Jan 26, 2018 at 10:41:58AM +0800, Jason Wang wrote:
>
>
> On 2018-01-26 01:11, Michael S. Tsirkin wrote:
> > On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
> > > > > Michael and Jason, any progress on implementing a fast virtio mechanism
> > > > > that doesn't rely on undefined behavior?
> > > > >
> > > > > (Encode writing instruction length into last 4 bits of MMIO address,
> > > > > side-channel say that accesses to the MMIO area always use certain
> > > > > instruction length, use hypercall, ...)
> > > > >
> > > > > Thanks.
> > > > No progress from my side. But we can use PIO for virtio 1.0 and it's
> > > > faster than fast MMIO (qemu supports modern pio notification bar, we can
> > > > make it as default). It looks to me that neither encoding nor hypercall
> > > > will work for real hardware virtio device.
> > > Encoding the instruction length would work, the h/w virtio devices would
> > > just ignore it. But... it is really ugly.
> > >
> > > Using PIO would be a small step backwards for PCIe. As long as the device
> > > only needs *one* notification register (either MMIO or PIO) to initialize
> > > successfully, it's okay. Then if there is no PIO space you'd just fall back
> > > to the slower MMIO notification.
> > >
> > > Paolo
> > A bigger issue for PIO is it's causing exits for hw devices.
> >
> >
>
> Just to make sure I understand. For exits you mean vmexit? I believe MMIO
> will cause vmexit too.
>
> Thanks

Not with an assigned device where the PTE is marked as present, it
won't.

--
MST

2018-01-26 03:22:47

by Jason Wang

Subject: Re: [PATCH] x86/kvm: disable fast MMIO when running nested



On 2018-01-26 10:49, Michael S. Tsirkin wrote:
> On Fri, Jan 26, 2018 at 10:41:58AM +0800, Jason Wang wrote:
>>
>> On 2018-01-26 01:11, Michael S. Tsirkin wrote:
>>> On Thu, Jan 25, 2018 at 09:49:22AM -0500, Paolo Bonzini wrote:
>>>>>> Michael and Jason, any progress on implementing a fast virtio mechanism
>>>>>> that doesn't rely on undefined behavior?
>>>>>>
>>>>>> (Encode writing instruction length into last 4 bits of MMIO address,
>>>>>> side-channel say that accesses to the MMIO area always use certain
>>>>>> instruction length, use hypercall, ...)
>>>>>>
>>>>>> Thanks.
>>>>> No progress from my side. But we can use PIO for virtio 1.0 and it's
>>>>> faster than fast MMIO (qemu supports modern pio notification bar, we can
>>>>> make it as default). It looks to me that neither encoding nor hypercall
>>>>> will work for real hardware virtio device.
>>>> Encoding the instruction length would work, the h/w virtio devices would
>>>> just ignore it. But... it is really ugly.
>>>>
>>>> Using PIO would be a small step backwards for PCIe. As long as the device
>>>> only needs *one* notification register (either MMIO or PIO) to initialize
>>>> successfully, it's okay. Then if there is no PIO space you'd just fall back
>>>> to the slower MMIO notification.
>>>>
>>>> Paolo
>>> A bigger issue for PIO is it's causing exits for hw devices.
>>>
>>>
>> Just to make sure I understand. For exits you mean vmexit? I believe MMIO
>> will cause vmexit too.
>>
>> Thanks
> Not with an assigned device where the PTE is marked as present, it
> won't.
>

So in this case, the assigned device can just provide an MMIO BAR.

Thanks