I was investigating an issue with seabios >= 1.10 which stopped working
for nested KVM on Hyper-V. The problem appears to be in
handle_ept_violation() function: when we do fast mmio we need to skip
the instruction so we do kvm_skip_emulated_instruction(). This, however,
depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
However, this is not the case.
Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
EPT MISCONFIG occurs. While on real hardware it was observed to be set,
some hypervisors follow the spec and don't set it; we end up advancing
IP with some random value.
I checked with Microsoft and they confirmed they don't fill
VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
Fix the issue by doing instruction skip through emulator when running
nested.
Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
Suggested-by: Radim Krčmář <[email protected]>
Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Vitaly Kuznetsov <[email protected]>
---
v1 -> v2:
inlay X86_FEATURE_HYPERVISOR case with EMULTYPE_SKIP optimization
[Paolo Bonzini, Radim Krčmář]
---
arch/x86/kvm/vmx.c | 16 +++++++++++++++-
arch/x86/kvm/x86.c | 3 ++-
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c829d89e2e63..e105b439c372 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6563,7 +6563,21 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
if (!is_guest_mode(vcpu) &&
!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
trace_kvm_fast_mmio(gpa);
- return kvm_skip_emulated_instruction(vcpu);
+ /*
+ * Doing kvm_skip_emulated_instruction() depends on undefined
+ * behavior: Intel's manual doesn't mandate
+ * VM_EXIT_INSTRUCTION_LEN to be set in VMCS when EPT MISCONFIG
+ * occurs and while on real hardware it was observed to be set,
+ * other hypervisors (namely Hyper-V) don't set it, we end up
+ * advancing IP with some random value. Disable fast mmio when
+ * running nested and keep it for real hardware in hope that
+ * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
+ */
+ if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+ return kvm_skip_emulated_instruction(vcpu);
+ else
+ return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
+ NULL, 0) == EMULATE_DONE;
}
ret = kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1cec2c62a0b0..930aba87a723 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5703,7 +5703,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
* handle watchpoints yet, those would be handled in
* the emulate_ops.
*/
- if (kvm_vcpu_check_breakpoint(vcpu, &r))
+ if (!(emulation_type & EMULTYPE_SKIP) &&
+ kvm_vcpu_check_breakpoint(vcpu, &r))
return r;
ctxt->interruptibility = 0;
--
2.14.3
On Thu, Jan 25, 2018 at 04:37:07PM +0100, Vitaly Kuznetsov wrote:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by doing instruction skip through emulator when running
> nested.
>
> Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
> Suggested-by: Radim Krčmář <[email protected]>
> Suggested-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
I would maybe also disable this when this is a kvm host
running a nested *guest*, just in case.
Acked-by: Michael S. Tsirkin <[email protected]>
> ---
> v1 -> v2:
> inlay X86_FEATURE_HYPERVISOR case with EMULTYPE_SKIP optimization
> [Paolo Bonzini, Radim Krčmář]
> ---
> arch/x86/kvm/vmx.c | 16 +++++++++++++++-
> arch/x86/kvm/x86.c | 3 ++-
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..e105b439c372 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6563,7 +6563,21 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> if (!is_guest_mode(vcpu) &&
> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> trace_kvm_fast_mmio(gpa);
> - return kvm_skip_emulated_instruction(vcpu);
> + /*
> + * Doing kvm_skip_emulated_instruction() depends on undefined
> + * behavior: Intel's manual doesn't mandate
> + * VM_EXIT_INSTRUCTION_LEN to be set in VMCS when EPT MISCONFIG
> + * occurs and while on real hardware it was observed to be set,
> + * other hypervisors (namely Hyper-V) don't set it, we end up
> + * advancing IP with some random value. Disable fast mmio when
> + * running nested and keep it for real hardware in hope that
> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> + */
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> + return kvm_skip_emulated_instruction(vcpu);
> + else
> + return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
> + NULL, 0) == EMULATE_DONE;
> }
>
> ret = kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1cec2c62a0b0..930aba87a723 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5703,7 +5703,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> * handle watchpoints yet, those would be handled in
> * the emulate_ops.
> */
> - if (kvm_vcpu_check_breakpoint(vcpu, &r))
> + if (!(emulation_type & EMULTYPE_SKIP) &&
> + kvm_vcpu_check_breakpoint(vcpu, &r))
> return r;
>
> ctxt->interruptibility = 0;
> --
> 2.14.3
2018-01-25 19:16+0200, Michael S. Tsirkin:
> On Thu, Jan 25, 2018 at 04:37:07PM +0100, Vitaly Kuznetsov wrote:
> > I was investigating an issue with seabios >= 1.10 which stopped working
> > for nested KVM on Hyper-V. The problem appears to be in
> > handle_ept_violation() function: when we do fast mmio we need to skip
> > the instruction so we do kvm_skip_emulated_instruction(). This, however,
> > depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> > However, this is not the case.
> >
> > Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> > EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> > some hypervisors follow the spec and don't set it; we end up advancing
> > IP with some random value.
> >
> > I checked with Microsoft and they confirmed they don't fill
> > VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
> >
> > Fix the issue by doing instruction skip through emulator when running
> > nested.
> >
> > Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
> > Suggested-by: Radim Krčmář <[email protected]>
> > Suggested-by: Paolo Bonzini <[email protected]>
> > Signed-off-by: Vitaly Kuznetsov <[email protected]>
>
> I would maybe also disable this when this is a kvm host
> running a nested *guest*, just in case.
You mean to keep the fast path when running on KVM hypervisor?
(We already skip the path for nested guests.)
I'd prefer not to make this any uglier.
> Acked-by: Michael S. Tsirkin <[email protected]>
>
> > ---
> > v1 -> v2:
> > inlay X86_FEATURE_HYPERVISOR case with EMULTYPE_SKIP optimization
> > [Paolo Bonzini, Radim Krčmář]
Queued, thanks.
On Thu, Jan 25, 2018 at 04:37:07PM +0100, Vitaly Kuznetsov wrote:
> I was investigating an issue with seabios >= 1.10 which stopped working
> for nested KVM on Hyper-V. The problem appears to be in
> handle_ept_violation() function: when we do fast mmio we need to skip
> the instruction so we do kvm_skip_emulated_instruction(). This, however,
> depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
> However, this is not the case.
>
> Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
> EPT MISCONFIG occurs. While on real hardware it was observed to be set,
> some hypervisors follow the spec and don't set it; we end up advancing
> IP with some random value.
>
> I checked with Microsoft and they confirmed they don't fill
> VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
>
> Fix the issue by doing instruction skip through emulator when running
> nested.
>
> Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
> Suggested-by: Radim Krčmář <[email protected]>
> Suggested-by: Paolo Bonzini <[email protected]>
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
> ---
> v1 -> v2:
> inlay X86_FEATURE_HYPERVISOR case with EMULTYPE_SKIP optimization
> [Paolo Bonzini, Radim Krčmář]
> ---
> arch/x86/kvm/vmx.c | 16 +++++++++++++++-
> arch/x86/kvm/x86.c | 3 ++-
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c829d89e2e63..e105b439c372 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6563,7 +6563,21 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> if (!is_guest_mode(vcpu) &&
> !kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
> trace_kvm_fast_mmio(gpa);
> - return kvm_skip_emulated_instruction(vcpu);
> + /*
> + * Doing kvm_skip_emulated_instruction() depends on undefined
> + * behavior: Intel's manual doesn't mandate
> + * VM_EXIT_INSTRUCTION_LEN to be set in VMCS when EPT MISCONFIG
> + * occurs and while on real hardware it was observed to be set,
> + * other hypervisors (namely Hyper-V) don't set it, we end up
> + * advancing IP with some random value. Disable fast mmio when
> + * running nested and keep it for real hardware in hope that
> + * VM_EXIT_INSTRUCTION_LEN will always be set correctly.
> + */
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> + return kvm_skip_emulated_instruction(vcpu);
> + else
> + return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
> + NULL, 0) == EMULATE_DONE;
> }
>
> ret = kvm_mmu_page_fault(vcpu, gpa, PFERR_RSVD_MASK, NULL, 0);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1cec2c62a0b0..930aba87a723 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5703,7 +5703,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
> * handle watchpoints yet, those would be handled in
> * the emulate_ops.
> */
> - if (kvm_vcpu_check_breakpoint(vcpu, &r))
> + if (!(emulation_type & EMULTYPE_SKIP) &&
> + kvm_vcpu_check_breakpoint(vcpu, &r))
> return r;
>
> ctxt->interruptibility = 0;
> --
> 2.14.3