Microsoft pointed out privately to me that KVM's handling of
KVM_FAST_MMIO_BUS is invalid. Using kvm_skip_emulated_instruction is invalid
in EPT misconfiguration vmexit handlers, because neither EPT violations
nor misconfigurations are listed in the manual among the VM exits that
set the VM-exit instruction length field.

While physical processors seem to set the field, this is not architectural
and is just a side effect of the implementation. I couldn't convince
myself of any condition on the exit qualification where VM-exit
instruction length "has" to be defined; there are no trap-like VM-exits
that can be repurposed; and fault-like VM-exits such as descriptor-table
exits provide no decoding information. So I don't really see any way
to keep the full speedup.

What we can do is use EMULTYPE_SKIP; it only saves 200 clock cycles
because computing the physical RIP and reading the instruction is
expensive, but at least the eventfd is signaled before entering the
emulator. This saves on latency. While at it, don't check breakpoints
when skipping the instruction, as presumably any side effect has been
exposed already.

Adding a hypercall or MSR write that does a fast MMIO write to a physical
address would do it, but it adds hypervisor knowledge in virtio, including
CPUID handling. So it would be pretty ugly in the guest-side implementation,
but if somebody wants to do it and the virtio side is acceptable to the
virtio maintainers, I am okay with it.

Cc: Michael S. Tsirkin <[email protected]>
Cc: [email protected]
Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
Suggested-by: Radim Krčmář <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/vmx.c | 3 ++-
arch/x86/kvm/x86.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index df8d2f127508..5ec47fd0b990 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6407,7 +6407,8 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
 	gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
 	if (!kvm_io_bus_write(vcpu, KVM_FAST_MMIO_BUS, gpa, 0, NULL)) {
 		trace_kvm_fast_mmio(gpa);
-		return kvm_skip_emulated_instruction(vcpu);
+		return x86_emulate_instruction(vcpu, gpa, EMULTYPE_SKIP,
+					       NULL, 0) == EMULATE_DONE;
 	}
 
 	ret = handle_mmio_page_fault(vcpu, gpa, true);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e10eda86bc7b..e74b79dab343 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5654,7 +5654,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 		 * handle watchpoints yet, those would be handled in
 		 * the emulate_ops.
 		 */
-		if (kvm_vcpu_check_breakpoint(vcpu, &r))
+		if (!(emulation_type & EMULTYPE_SKIP) &&
+		    kvm_vcpu_check_breakpoint(vcpu, &r))
 			return r;
 
 		ctxt->interruptibility = 0;
--
1.8.3.1
On Wed, Aug 16, 2017 at 03:34:54PM +0200, Paolo Bonzini wrote:
> Microsoft pointed out privately to me that KVM's handling of
> KVM_FAST_MMIO_BUS is invalid. Using skip_emulation_instruction is invalid
> in EPT misconfiguration vmexit handlers, because neither EPT violations
> nor misconfigurations are listed in the manual among the VM exits that
> set the VM-exit instruction length field.
>
> While physical processors seem to set the field, this is not architectural
> and is just a side effect of the implementation. I couldn't convince
> myself of any condition on the exit qualification where VM-exit
> instruction length "has" to be defined; there are no trap-like VM-exits
> that can be repurposed; and fault-like VM-exits such as descriptor-table
> exits provide no decoding information. So I don't really see any way
> to keep the full speedup.
>
> What we can do is use EMULTYPE_SKIP; it only saves 200 clock cycles
> because computing the physical RIP and reading the instruction is
> expensive, but at least the eventfd is signaled before entering the
> emulator. This saves on latency. While at it, don't check breakpoints
> when skipping the instruction, as presumably any side effect has been
> exposed already.
>
> Adding a hypercall or MSR write that does a fast MMIO write to a physical
> address would do it, but it adds hypervisor knowledge in virtio, including
> CPUID handling. So it would be pretty ugly in the guest-side implementation,
> but if somebody wants to do it and the virtio side is acceptable to the
> virtio maintainers, I am okay with it.
>
> Cc: Michael S. Tsirkin <[email protected]>
> Cc: [email protected]
> Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
> Suggested-by: Radim Krčmář <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
Jason (cc'd), who worked on the original optimization, said he can
test the performance impact.
I suggest we don't rush this: it has been like this for 2 years,
and the issue seems to be largely theoretical.
2017-08-16 17:10+0300, Michael S. Tsirkin:
> Jason (cc) who worked on the original optimization said he can
> work to test the performance impact.
> I suggest we don't rush this (it's been like this for 2 years),
> and the issue seems to be largely theoretical.
Paolo, did Microsoft point it out because they hit the bug when running
KVM on Hyper-V?
Thanks.
On Wed, Aug 16, 2017 at 06:56:25PM +0200, Radim Krčmář wrote:
> 2017-08-16 17:10+0300, Michael S. Tsirkin:
> > Jason (cc) who worked on the original optimization said he can
> > work to test the performance impact.
> > I suggest we don't rush this (it's been like this for 2 years),
> > and the issue seems to be largely theoretical.
>
> Paolo, did Microsoft point it out because they hit the bug when running
> KVM on Hyper-V?
>
> Thanks.
Seems likely. If we take this patch and limit it to nested virt (I
would not limit it to just Hyper-V), it would be reasonable to merge very
quickly - maybe even in 4.13 - and we can discuss the theoretical
issue at leisure, maybe after some feedback from Intel.
--
MST
On 2017/8/17 0:56, Radim Krčmář wrote:
> 2017-08-16 17:10+0300, Michael S. Tsirkin:
>> Jason (cc) who worked on the original optimization said he can
>> work to test the performance impact.
>> I suggest we don't rush this (it's been like this for 2 years),
>> and the issue seems to be largely theoretical.
>
> Paolo, did Microsoft point it out because they hit the bug when running
> KVM on Hyper-V?
Does this mean that KVM's nested emulation of EPT violations and
misconfigurations doesn't strictly follow the manual, since we
didn't hit the bug in KVM?
>
> Thanks.
>
--
Yang
Alibaba Cloud Computing
2017-08-17 16:07 GMT+08:00 Yang Zhang <[email protected]>:
> On 2017/8/17 0:56, Radim Krčmář wrote:
>>
>> 2017-08-16 17:10+0300, Michael S. Tsirkin:
>>> Jason (cc) who worked on the original optimization said he can
>>> work to test the performance impact.
>>> I suggest we don't rush this (it's been like this for 2 years),
>>> and the issue seems to be largely theoretical.
>>
>>
>> Paolo, did Microsoft point it out because they hit the bug when running
>> KVM on Hyper-V?
>
>
> Does this mean the nested emulation of EPT violation and misconfiguration in
> KVM side doesn't strictly follow the manual since we didn't hit the bug in
> KVM?
The VM-exit instruction length in vmcs12 is copied from vmcs02
(prepare_vmcs12()), so it is only wrong if the length in vmcs02 is
wrong. In addition, IIUC, an instruction such as mov that triggers the
EPT violation/misconfig in the guest has already been decoded before it
executes, so the exit qualification can carry the information about
the instruction length.
Regards,
Wanpeng Li
2017-08-17 16:28 GMT+08:00 Wanpeng Li <[email protected]>:
> 2017-08-17 16:07 GMT+08:00 Yang Zhang <[email protected]>:
>> On 2017/8/17 0:56, Radim Krčmář wrote:
>>>
>>> 2017-08-16 17:10+0300, Michael S. Tsirkin:
>>>> Jason (cc) who worked on the original optimization said he can
>>>> work to test the performance impact.
>>>> I suggest we don't rush this (it's been like this for 2 years),
>>>> and the issue seems to be largely theoretical.
>>>
>>>
>>> Paolo, did Microsoft point it out because they hit the bug when running
>>> KVM on Hyper-V?
>>
>>
>> Does this mean the nested emulation of EPT violation and misconfiguration in
>> KVM side doesn't strictly follow the manual since we didn't hit the bug in
>> KVM?
>
> The VM-exit instruction length of vmcs12 is provided by vmcs02
> (prepare_vmcs12()), so unless the length from vmcs02 is wrong. In
> addition, something like mov instruction which can trigger the EPT
> violation/misconfig in guest has already been decoded before executing
> I think, IIUC, then exit qualification can have the information about
> the instruction length.
s/exit qualification/VM-exit instruction length/
On 2017/8/17 16:31, Wanpeng Li wrote:
> 2017-08-17 16:28 GMT+08:00 Wanpeng Li <[email protected]>:
>> 2017-08-17 16:07 GMT+08:00 Yang Zhang <[email protected]>:
>>> On 2017/8/17 0:56, Radim Krčmář wrote:
>>>>
>>>> 2017-08-16 17:10+0300, Michael S. Tsirkin:
>>>>> Jason (cc) who worked on the original optimization said he can
>>>>> work to test the performance impact.
>>>>> I suggest we don't rush this (it's been like this for 2 years),
>>>>> and the issue seems to be largely theoretical.
>>>>
>>>>
>>>> Paolo, did Microsoft point it out because they hit the bug when running
>>>> KVM on Hyper-V?
>>>
>>>
>>> Does this mean the nested emulation of EPT violation and misconfiguration in
>>> KVM side doesn't strictly follow the manual since we didn't hit the bug in
>>> KVM?
>>
>> The VM-exit instruction length of vmcs12 is provided by vmcs02
>> (prepare_vmcs12()), so unless the length from vmcs02 is wrong. In
>> addition, something like mov instruction which can trigger the EPT
>> violation/misconfig in guest has already been decoded before executing
>> I think, IIUC, then exit qualification can have the information about
>> the instruction length.
>
> s/exit qualification/VM-exit instruction length
According to Paolo's comment "neither EPT violations nor
misconfigurations are listed in the manual among the VM exits that set
the VM-exit instruction length field", it seems that setting the
instruction length in vmcs12 is not right, though it is harmless.
--
Yang
Alibaba Cloud Computing
2017-08-17 16:48 GMT+08:00 Yang Zhang <[email protected]>:
> On 2017/8/17 16:31, Wanpeng Li wrote:
>>
>> 2017-08-17 16:28 GMT+08:00 Wanpeng Li <[email protected]>:
>>>
>>> 2017-08-17 16:07 GMT+08:00 Yang Zhang <[email protected]>:
>>>>
>>>> On 2017/8/17 0:56, Radim Krčmář wrote:
>>>>>
>>>>>
>>>>> 2017-08-16 17:10+0300, Michael S. Tsirkin:
>>>>>> Jason (cc) who worked on the original optimization said he can
>>>>>> work to test the performance impact.
>>>>>> I suggest we don't rush this (it's been like this for 2 years),
>>>>>> and the issue seems to be largely theoretical.
>>>>>
>>>>>
>>>>>
>>>>> Paolo, did Microsoft point it out because they hit the bug when running
>>>>> KVM on Hyper-V?
>>>>
>>>>
>>>>
>>>> Does this mean the nested emulation of EPT violation and misconfiguration in
>>>> KVM side doesn't strictly follow the manual since we didn't hit the bug in
>>>> KVM?
>>>
>>>
>>> The VM-exit instruction length of vmcs12 is provided by vmcs02
>>> (prepare_vmcs12()), so unless the length from vmcs02 is wrong. In
>>> addition, something like mov instruction which can trigger the EPT
>>> violation/misconfig in guest has already been decoded before executing
>>> I think, IIUC, then exit qualification can have the information about
>>> the instruction length.
>>
>>
>> s/exit qualification/VM-exit instruction length
>
>
> According to Paolo's comment "neither EPT violations nor misconfigurations
> are listed in the manual among the VM exits that set the VM-exit instruction
> length field", it seems to set the instruction length in vmcs12 is not right
> though it is harmless.
But Paolo also mentioned this: "It just happens that the actual
condition for VM-exit instruction length being set correctly is "the
fault was taken after the accessing instruction has been decoded"."
Regards,
Wanpeng Li
On 2017/8/17 16:51, Wanpeng Li wrote:
> 2017-08-17 16:48 GMT+08:00 Yang Zhang <[email protected]>:
>> On 2017/8/17 16:31, Wanpeng Li wrote:
>>>
>>> 2017-08-17 16:28 GMT+08:00 Wanpeng Li <[email protected]>:
>>>>
>>>> 2017-08-17 16:07 GMT+08:00 Yang Zhang <[email protected]>:
>>>>>
>>>>> On 2017/8/17 0:56, Radim Krčmář wrote:
>>>>>>
>>>>>>
>>>>>> 2017-08-16 17:10+0300, Michael S. Tsirkin:
>>>>>>> Jason (cc) who worked on the original optimization said he can
>>>>>>> work to test the performance impact.
>>>>>>> I suggest we don't rush this (it's been like this for 2 years),
>>>>>>> and the issue seems to be largely theoretical.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Paolo, did Microsoft point it out because they hit the bug when running
>>>>>> KVM on Hyper-V?
>>>>>
>>>>>
>>>>>
>>>>> Does this mean the nested emulation of EPT violation and misconfiguration in
>>>>> KVM side doesn't strictly follow the manual since we didn't hit the bug in
>>>>> KVM?
>>>>
>>>>
>>>> The VM-exit instruction length of vmcs12 is provided by vmcs02
>>>> (prepare_vmcs12()), so unless the length from vmcs02 is wrong. In
>>>> addition, something like mov instruction which can trigger the EPT
>>>> violation/misconfig in guest has already been decoded before executing
>>>> I think, IIUC, then exit qualification can have the information about
>>>> the instruction length.
>>>
>>>
>>> s/exit qualification/VM-exit instruction length
>>
>>
>> According to Paolo's comment "neither EPT violations nor misconfigurations
>> are listed in the manual among the VM exits that set the VM-exit instruction
>> length field", it seems to set the instruction length in vmcs12 is not right
>> though it is harmless.
>
> But Paolo also mentioned this "It just happens that the actual
> condition for VM-exit instruction length being set correctly is "the
> fault was taken after the accessing instruction has been decoded"."
We are talking about different things. As the manual says, "All VM exits
other than those listed in the above items leave this field undefined."
If we set the field for a VM exit that is not in that list, our
emulation is not correct. But I have checked the code: KVM just reads the
instruction length from hardware, which means we don't set it artificially.
>
> Regards,
> Wanpeng Li
>
--
Yang
Alibaba Cloud Computing
On 2017-08-16 22:10, Michael S. Tsirkin wrote:
> On Wed, Aug 16, 2017 at 03:34:54PM +0200, Paolo Bonzini wrote:
>> Microsoft pointed out privately to me that KVM's handling of
>> KVM_FAST_MMIO_BUS is invalid. Using skip_emulation_instruction is invalid
>> in EPT misconfiguration vmexit handlers, because neither EPT violations
>> nor misconfigurations are listed in the manual among the VM exits that
>> set the VM-exit instruction length field.
>>
>> While physical processors seem to set the field, this is not architectural
>> and is just a side effect of the implementation. I couldn't convince
>> myself of any condition on the exit qualification where VM-exit
>> instruction length "has" to be defined; there are no trap-like VM-exits
>> that can be repurposed; and fault-like VM-exits such as descriptor-table
>> exits provide no decoding information. So I don't really see any way
>> to keep the full speedup.
>>
>> What we can do is use EMULTYPE_SKIP; it only saves 200 clock cycles
>> because computing the physical RIP and reading the instruction is
>> expensive, but at least the eventfd is signaled before entering the
>> emulator. This saves on latency. While at it, don't check breakpoints
>> when skipping the instruction, as presumably any side effect has been
>> exposed already.
>>
>> Adding a hypercall or MSR write that does a fast MMIO write to a physical
>> address would do it, but it adds hypervisor knowledge in virtio, including
>> CPUID handling. So it would be pretty ugly in the guest-side implementation,
>> but if somebody wants to do it and the virtio side is acceptable to the
>> virtio maintainers, I am okay with it.
>>
>> Cc: Michael S. Tsirkin <[email protected]>
>> Cc: [email protected]
>> Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
>> Suggested-by: Radim Krčmář <[email protected]>
>> Signed-off-by: Paolo Bonzini <[email protected]>
> Jason (cc) who worked on the original optimization said he can
> work to test the performance impact.
I see regressions in both latency and CPU utilization in the netperf
TCP_RR test:
pkt_size/sessions/+transaction_rate%/+per_cpu_transaction_rate%
    1/     1/   +0%/   -5%
    1/    25/   -1%/   -2%
    1/    50/   -9%/  -10%
   64/     1/   -3%/   -9%
   64/    25/    0%/   -2%
   64/    50/  -10%/  -11%
  256/     1/  -10%/  -17%
  256/    25/  -11%/  -12%
  256/    50/   -9%/  -11%
Thanks
On 16.08.2017 15:34, Paolo Bonzini wrote:
> Microsoft pointed out privately to me that KVM's handling of
> KVM_FAST_MMIO_BUS is invalid. Using skip_emulation_instruction is invalid
> in EPT misconfiguration vmexit handlers, because neither EPT violations
> nor misconfigurations are listed in the manual among the VM exits that
> set the VM-exit instruction length field.
>
> While physical processors seem to set the field, this is not architectural
> and is just a side effect of the implementation. I couldn't convince
> myself of any condition on the exit qualification where VM-exit
> instruction length "has" to be defined; there are no trap-like VM-exits
> that can be repurposed; and fault-like VM-exits such as descriptor-table
> exits provide no decoding information. So I don't really see any way
> to keep the full speedup.
What about a hack:
1. clear instruction length when entering
2. check if instruction length is set when trying to forward the RIP
2a. if set, use it
2b. if not set, compute it
this at least should give full speedup in existing setups. Not 99%
architecturally correct but might just work.
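
A minimal sketch of the steps above, in C. The struct and helper names
(fake_vmcs, prepare_entry, decode_insn_len, advance_rip) are hypothetical
stand-ins and not KVM's real API. The sketch only models the control flow,
and it assumes the field reliably stays zero whenever the CPU does not
write it, which is exactly the part that is not architecturally guaranteed.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one VMCS field the hack cares about. */
struct fake_vmcs {
    uint32_t exit_insn_len;   /* models "VM-exit instruction length" */
};

/* Step 1: clear the field before entering the guest. */
static void prepare_entry(struct fake_vmcs *vmcs)
{
    vmcs->exit_insn_len = 0;
}

/* Hypothetical slow path: stands in for decoding the instruction at RIP. */
static uint32_t decode_insn_len(uint64_t rip)
{
    (void)rip;
    return 3;   /* placeholder length, for illustration only */
}

/* Step 2: forward RIP, using the field if set (2a) or computing it (2b). */
static uint64_t advance_rip(const struct fake_vmcs *vmcs, uint64_t rip)
{
    uint32_t len = vmcs->exit_insn_len;

    if (len == 0)               /* 2b: field not set, compute it */
        len = decode_insn_len(rip);
    return rip + len;           /* 2a: otherwise the stored value is used */
}
```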
>
> What we can do is use EMULTYPE_SKIP; it only saves 200 clock cycles
> because computing the physical RIP and reading the instruction is
> expensive, but at least the eventfd is signaled before entering the
> emulator. This saves on latency. While at it, don't check breakpoints
> when skipping the instruction, as presumably any side effect has been
> exposed already.
>
> Adding a hypercall or MSR write that does a fast MMIO write to a physical
> address would do it, but it adds hypervisor knowledge in virtio, including
> CPUID handling. So it would be pretty ugly in the guest-side implementation,
> but if somebody wants to do it and the virtio side is acceptable to the
> virtio maintainers, I am okay with it.
>
--
Thanks,
David
On 18/08/2017 13:57, David Hildenbrand wrote:
> What about a hack:
>
> 1. clear instruction length when entering
> 2. check if instruction length is set when trying to forward the RIP
> 2a. if set, use it
> 2b. if not set, compute it
It's undefined, so we don't know that the instruction length remains
zero (also, on older processors and possibly some nested setups the
field is read-only).
Testing the hypervisor bit is the first line of action.
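
For reference, the "hypervisor bit" is CPUID.01H:ECX bit 31, which is
clear on bare metal and set by hypervisors for their guests. A minimal
sketch of such a check follows. The function names, the choice to pass in
the raw ECX value, and the policy in can_trust_exit_insn_len() are
illustrative assumptions, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 31: "hypervisor present". */
#define CPUID_1_ECX_HYPERVISOR (UINT32_C(1) << 31)

/* Takes the ECX value returned by CPUID leaf 1 (obtained elsewhere). */
static bool running_under_hypervisor(uint32_t cpuid_1_ecx)
{
    return (cpuid_1_ecx & CPUID_1_ECX_HYPERVISOR) != 0;
}

/*
 * Hypothetical policy: only rely on the VM-exit instruction length for
 * EPT misconfiguration exits when not running nested under another
 * hypervisor; otherwise fall back to the slower skip path.
 */
static bool can_trust_exit_insn_len(uint32_t cpuid_1_ecx)
{
    return !running_under_hypervisor(cpuid_1_ecx);
}
```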
Paolo
> this at least should give full speedup in existing setups. Not 99%
> architecturally correct but might just work.
>
On 18.08.2017 14:35, Paolo Bonzini wrote:
> On 18/08/2017 13:57, David Hildenbrand wrote:
>> What about a hack:
>>
>> 1. clear instruction length when entering
>> 2. check if instruction length is set when trying to forward the RIP
>> 2a. if set, use it
>> 2b. if not set, compute it
>
> It's undefined, so we don't know that the instruction length remains
> zero (also, on older processors and possibly some nested setups the
> field is read-only).
Oh I see, too bad :(
>
> Testing the hypervisor bit is the first line of action.
>
> Paolo
--
Thanks,
David
2017-08-18 16:46+0800, Jason Wang:
>
>
> On 2017-08-16 22:10, Michael S. Tsirkin wrote:
> > On Wed, Aug 16, 2017 at 03:34:54PM +0200, Paolo Bonzini wrote:
> > > Microsoft pointed out privately to me that KVM's handling of
> > > KVM_FAST_MMIO_BUS is invalid. Using skip_emulation_instruction is invalid
> > > in EPT misconfiguration vmexit handlers, because neither EPT violations
> > > nor misconfigurations are listed in the manual among the VM exits that
> > > set the VM-exit instruction length field.
> > >
> > > While physical processors seem to set the field, this is not architectural
> > > and is just a side effect of the implementation. I couldn't convince
> > > myself of any condition on the exit qualification where VM-exit
> > > instruction length "has" to be defined; there are no trap-like VM-exits
> > > that can be repurposed; and fault-like VM-exits such as descriptor-table
> > > exits provide no decoding information. So I don't really see any way
> > > to keep the full speedup.
> > >
> > > What we can do is use EMULTYPE_SKIP; it only saves 200 clock cycles
> > > because computing the physical RIP and reading the instruction is
> > > expensive, but at least the eventfd is signaled before entering the
> > > emulator. This saves on latency. While at it, don't check breakpoints
> > > when skipping the instruction, as presumably any side effect has been
> > > exposed already.
> > >
> > > Adding a hypercall or MSR write that does a fast MMIO write to a physical
> > > address would do it, but it adds hypervisor knowledge in virtio, including
> > > CPUID handling. So it would be pretty ugly in the guest-side implementation,
> > > but if somebody wants to do it and the virtio side is acceptable to the
> > > virtio maintainers, I am okay with it.
> > >
> > > Cc: Michael S. Tsirkin <[email protected]>
> > > Cc: [email protected]
> > > Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
> > > Suggested-by: Radim Krčmář <[email protected]>
> > > Signed-off-by: Paolo Bonzini <[email protected]>
> > Jason (cc) who worked on the original optimization said he can
> > work to test the performance impact.
>
> I see regressions on both latency and cpu utilization through netperf TCP_RR
> test:
>
> pkt_size/sessions/+transaction_rate%/+per_cpu_transaction_rate%
>     1/     1/   +0%/   -5%
>     1/    25/   -1%/   -2%
>     1/    50/   -9%/  -10%
>    64/     1/   -3%/   -9%
>    64/    25/    0%/   -2%
>    64/    50/  -10%/  -11%
>   256/     1/  -10%/  -17%
>   256/    25/  -11%/  -12%
>   256/    50/   -9%/  -11%
Might be noticeable ... I'm ok with the hypervisor detection workaround.
Still, we will need a replacement mechanism for virtio if Intel doesn't
change the SDM, and we should drop this workaround once a solution has been
implemented.