2018-02-05 11:06:43

by Wanpeng Li

[permalink] [raw]
Subject: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

From: Wanpeng Li <[email protected]>

In L0, Haswell client host:

nested_vmx_exit_reflected failed vm entry 7
WARNING: CPU: 6 PID: 6797 at kvm/arch/x86/kvm//vmx.c:6206 handle_desc+0x2d/0x40 [kvm_intel]
CPU: 6 PID: 6797 Comm: qemu-system-x86 Tainted: G W OE 4.15.0+ #4
RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
Call Trace:
vmx_handle_exit+0xbd/0xe20 [kvm_intel]
? kvm_arch_vcpu_ioctl_run+0xcde/0x1c00 [kvm]
kvm_arch_vcpu_ioctl_run+0xd5a/0x1c00 [kvm]
kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
? __fget+0xfc/0x210
? __fget+0xfc/0x210
do_vfs_ioctl+0xa4/0x6a0
? __fget+0x11d/0x210
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x25/0x9c

This can be reproduced by running kvm-unit-tests/run_tests.sh vmx_controls in
L1. UMIP CPUID bit is exposed to the L1 UMIP aware guest since it is emulated
by enabling descriptor-table exits on L0. There is a vmentry fail when
L0 tries to run L2 directly, the L1 guest architectural CR4 is not restored
after this failure since commit 4f350c6dbcb (kvm: nVMX: Handle deferred early
VMLAUNCH/VMRESUME failure properly). The L2 is kvm-unit-tests which will not
write CR4 w/ X86_CR4_UMIP bit. After another L1 access descriptor vmexit, we
check L2's architectural CR4 instead of L1's architectural CR4. This patch
fixes it by restoring L1's architectural CR4 after L0's VMLAUNCH/VMRESUME
failure.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/kvm/vmx.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 23789c9..9fc0492 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11633,6 +11633,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
*/
nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);

+ vcpu->arch.cr4 = vmcs12->host_cr4;
load_vmcs12_mmu_host_state(vcpu, vmcs12);

/*
--
2.7.4



2018-02-05 18:58:46

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

[Resending as plain text]

On Mon, Feb 5, 2018 at 10:21 AM Jim Mattson <[email protected]> wrote:

> This is incorrect. In the event of an early VM-entry failure (e.g. a
> VM-entry failure for "VM entry with invalid control field(s)"), no host
> state should be loaded from the VMCS12. Of course, no guest state should
> have been loaded from the VMCS12 either, but that's a problem we have with
> deferring some VMCS12 control field checks to the hardware.

> CR4 should be unchanged from the time of the VMLAUNCH/VMRESUME. There is
> no guarantee that vmcs12->host_cr4 holds the correct value.


> On Mon, Feb 5, 2018 at 3:05 AM Wanpeng Li <[email protected]> wrote:

>> From: Wanpeng Li <[email protected]>

>> In L0, Haswell client host:

>> nested_vmx_exit_reflected failed vm entry 7
>> WARNING: CPU: 6 PID: 6797 at kvm/arch/x86/kvm//vmx.c:6206
>> handle_desc+0x2d/0x40 [kvm_intel]
>> CPU: 6 PID: 6797 Comm: qemu-system-x86 Tainted: G W OE
>> 4.15.0+ #4
>> RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
>> Call Trace:
>> vmx_handle_exit+0xbd/0xe20 [kvm_intel]
>> ? kvm_arch_vcpu_ioctl_run+0xcde/0x1c00 [kvm]
>> kvm_arch_vcpu_ioctl_run+0xd5a/0x1c00 [kvm]
>> kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>> ? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>> ? __fget+0xfc/0x210
>> ? __fget+0xfc/0x210
>> do_vfs_ioctl+0xa4/0x6a0
>> ? __fget+0x11d/0x210
>> SyS_ioctl+0x79/0x90
>> entry_SYSCALL_64_fastpath+0x25/0x9c

>> This can be reproduced by running kvm-unit-tests/run_tests.sh
>> vmx_controls in
>> L1. UMIP CPUID bit is exposed to the L1 UMIP aware guest since it is
>> emulated
>> by enabling descriptor-table exits on L0. There is a vmentry fail when
>> L0 tries to run L2 directly, the L1 guest architectural CR4 is not
>> restored
>> after this failure since commit 4f350c6dbcb (kvm: nVMX: Handle deferred
>> early
>> VMLAUNCH/VMRESUME failure properly). The L2 is kvm-unit-tests which will
>> not
>> write CR4 w/ X86_CR4_UMIP bit. After another L1 access descriptor vmexit,
>> we
>> check L2's architectural CR4 instead of L1's architectural CR4. This
>> patch
>> fixes it by restoring L1's architectural CR4 after L0's VMLAUNCH/VMRESUME
>> failure.

>> Cc: Paolo Bonzini <[email protected]>
>> Cc: Radim Krčmář <[email protected]>
>> Signed-off-by: Wanpeng Li <[email protected]>
>> ---
>> arch/x86/kvm/vmx.c | 1 +
>> 1 file changed, 1 insertion(+)

>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 23789c9..9fc0492 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11633,6 +11633,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu
>> *vcpu, u32 exit_reason,
>> */
>> nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);

>> + vcpu->arch.cr4 = vmcs12->host_cr4;
>> load_vmcs12_mmu_host_state(vcpu, vmcs12);

>> /*
>> --
>> 2.7.4

2018-02-06 00:59:01

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

2018-02-06 2:24 GMT+08:00 Jim Mattson <[email protected]>:
> [Resending as plain text]
>
> On Mon, Feb 5, 2018 at 10:21 AM Jim Mattson <[email protected]> wrote:
>
>> This is incorrect. In the event of an early VM-entry failure (e.g. a
>> VM-entry failure for "VM entry with invalid control field(s)"), no host
>> state should be loaded from the VMCS12. Of course, no guest state should
>> have been loaded from the VMCS12 either, but that's a problem we have with
>> deferring some VMCS12 control field checks to the hardware.
>
>> CR4 should be unchanged from the time of the VMLAUNCH/VMRESUME. There is

This is effective one, what I restore in this patch is
achitectural/guest visible.

Regards,
Wanpeng Li

>> no guarantee that vmcs12->host_cr4 holds the correct value.
>
>
>> On Mon, Feb 5, 2018 at 3:05 AM Wanpeng Li <[email protected]> wrote:
>
>>> From: Wanpeng Li <[email protected]>
>
>>> In L0, Haswell client host:
>
>>> nested_vmx_exit_reflected failed vm entry 7
>>> WARNING: CPU: 6 PID: 6797 at kvm/arch/x86/kvm//vmx.c:6206
>>> handle_desc+0x2d/0x40 [kvm_intel]
>>> CPU: 6 PID: 6797 Comm: qemu-system-x86 Tainted: G W OE
>>> 4.15.0+ #4
>>> RIP: 0010:handle_desc+0x2d/0x40 [kvm_intel]
>>> Call Trace:
>>> vmx_handle_exit+0xbd/0xe20 [kvm_intel]
>>> ? kvm_arch_vcpu_ioctl_run+0xcde/0x1c00 [kvm]
>>> kvm_arch_vcpu_ioctl_run+0xd5a/0x1c00 [kvm]
>>> kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>>> ? kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
>>> ? __fget+0xfc/0x210
>>> ? __fget+0xfc/0x210
>>> do_vfs_ioctl+0xa4/0x6a0
>>> ? __fget+0x11d/0x210
>>> SyS_ioctl+0x79/0x90
>>> entry_SYSCALL_64_fastpath+0x25/0x9c
>
>>> This can be reproduced by running kvm-unit-tests/run_tests.sh
>>> vmx_controls in
>>> L1. UMIP CPUID bit is exposed to the L1 UMIP aware guest since it is
>>> emulated
>>> by enabling descriptor-table exits on L0. There is a vmentry fail when
>>> L0 tries to run L2 directly, the L1 guest architectural CR4 is not
>>> restored
>>> after this failure since commit 4f350c6dbcb (kvm: nVMX: Handle deferred
>>> early
>>> VMLAUNCH/VMRESUME failure properly). The L2 is kvm-unit-tests which will
>>> not
>>> write CR4 w/ X86_CR4_UMIP bit. After another L1 access descriptor vmexit,
>>> we
>>> check L2's architectural CR4 instead of L1's architectural CR4. This
>>> patch
>>> fixes it by restoring L1's architectural CR4 after L0's VMLAUNCH/VMRESUME
>>> failure.
>
>>> Cc: Paolo Bonzini <[email protected]>
>>> Cc: Radim Krčmář <[email protected]>
>>> Signed-off-by: Wanpeng Li <[email protected]>
>>> ---
>>> arch/x86/kvm/vmx.c | 1 +
>>> 1 file changed, 1 insertion(+)
>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 23789c9..9fc0492 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -11633,6 +11633,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu
>>> *vcpu, u32 exit_reason,
>>> */
>>> nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD);
>
>>> + vcpu->arch.cr4 = vmcs12->host_cr4;
>>> load_vmcs12_mmu_host_state(vcpu, vmcs12);
>
>>> /*
>>> --
>>> 2.7.4

2018-02-06 17:00:05

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <[email protected]> wrote:

> This is effective one, what I restore in this patch is
> achitectural/guest visible.

This patch doesn't "restore" the guest visible CR4 to its value at the
time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
That behavior is incorrect.

2018-02-07 08:32:21

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

2018-02-07 0:58 GMT+08:00 Jim Mattson <[email protected]>:
> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <[email protected]> wrote:
>
>> This is effective one, what I restore in this patch is
>> achitectural/guest visible.
>
> This patch doesn't "restore" the guest visible CR4 to its value at the
> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
> That behavior is incorrect.

You have another pointing out about this.
https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
up-to-date value when L1 is running, it is still up-to-date after
vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
comments, why vmcs12->host_cr3/cr4 is not the value which we should
restore?

* After an early L2 VM-entry failure, we're now back
* in L1 which thinks it just finished a VMLAUNCH or
* VMRESUME instruction

Regards,
Wanpeng Li

2018-02-07 16:59:17

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
running. L1 can vmwrite any values there. We know at this point that
they are legal (because we checked them), but that's about it. If the
VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
execution just falls through to the next instruction with VMFailValid
semantics.

On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <[email protected]> wrote:
> 2018-02-07 0:58 GMT+08:00 Jim Mattson <[email protected]>:
>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <[email protected]> wrote:
>>
>>> This is effective one, what I restore in this patch is
>>> achitectural/guest visible.
>>
>> This patch doesn't "restore" the guest visible CR4 to its value at the
>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>> That behavior is incorrect.
>
> You have another pointing out about this.
> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
> up-to-date value when L1 is running, it is still up-to-date after
> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
> comments, why vmcs12->host_cr3/cr4 is not the value which we should
> restore?
>
> * After an early L2 VM-entry failure, we're now back
> * in L1 which thinks it just finished a VMLAUNCH or
> * VMRESUME instruction
>
> Regards,
> Wanpeng Li

2018-02-08 06:24:05

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

2018-02-08 0:57 GMT+08:00 Jim Mattson <[email protected]>:
> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
> running. L1 can vmwrite any values there. We know at this point that

It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
if vmcs shadow is enabled since host_cr[34] is not shadowed in the
bitmap, why it is not up-to-date when L1 is running?

Regards,
Wanpeng Li

> they are legal (because we checked them), but that's about it. If the
> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
> execution just falls through to the next instruction with VMFailValid
> semantics.
>
> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <[email protected]> wrote:
>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <[email protected]>:
>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <[email protected]> wrote:
>>>
>>>> This is effective one, what I restore in this patch is
>>>> achitectural/guest visible.
>>>
>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>> That behavior is incorrect.
>>
>> You have another pointing out about this.
>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>> up-to-date value when L1 is running, it is still up-to-date after
>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>> restore?
>>
>> * After an early L2 VM-entry failure, we're now back
>> * in L1 which thinks it just finished a VMLAUNCH or
>> * VMRESUME instruction
>>
>> Regards,
>> Wanpeng Li

2018-02-08 15:30:53

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

Consider the following scenario:

L1 has never successfully executed VMLAUNCH. It has written 0 to
vmcs12's host CR3 field using VMWRITE, but the current host CR3 value
is actually 3e7000. It has written some illegal control field that the
L0 KVM doesn't check itself, but defers to the hardware checks on
vmcs02 instead. So, when L1 tries to execute VMLAUNCH, L0 follows this
path for "VM-entry to vmcs02 failed due to invalid control field(s)."
Your change would set CR3 to 0, which is incorrect. CR3 should
actually be set to 3e7000. Now, if L0 is sane and using EPT, then it
can find the correct L1 CR3 value in vmcs01's Guest CR3 field, but if
for some reason L0 is using shadow paging to execute L1, that won't
work. Similarly, the correct L1 CR4 value should be in vmcs01's CR4
read shadow field.

You can't just assume that L1 has written values to the vmcs12 host
fields that actually match the current host values. There is nothing
in the architecture that would require this behavior.

On Wed, Feb 7, 2018 at 10:22 PM, Wanpeng Li <[email protected]> wrote:
> 2018-02-08 0:57 GMT+08:00 Jim Mattson <[email protected]>:
>> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
>> running. L1 can vmwrite any values there. We know at this point that
>
> It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
> if vmcs shadow is enabled since host_cr[34] is not shadowed in the
> bitmap, why it is not up-to-date when L1 is running?
>
> Regards,
> Wanpeng Li
>
>> they are legal (because we checked them), but that's about it. If the
>> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
>> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
>> execution just falls through to the next instruction with VMFailValid
>> semantics.
>>
>> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <[email protected]> wrote:
>>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <[email protected]>:
>>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <[email protected]> wrote:
>>>>
>>>>> This is effective one, what I restore in this patch is
>>>>> achitectural/guest visible.
>>>>
>>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>>> That behavior is incorrect.
>>>
>>> You have another pointing out about this.
>>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>>> up-to-date value when L1 is running, it is still up-to-date after
>>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>>> restore?
>>>
>>> * After an early L2 VM-entry failure, we're now back
>>> * in L1 which thinks it just finished a VMLAUNCH or
>>> * VMRESUME instruction
>>>
>>> Regards,
>>> Wanpeng Li

2018-02-08 15:55:52

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

On Thu, Feb 8, 2018 at 7:29 AM, Jim Mattson <[email protected]> wrote:
> Similarly, the correct L1 CR4 value should be in vmcs01's CR4
> read shadow field.

Sorry; that's wrong. L1's CR4 value has to be reconstructed from the
vmcs01 guest CR4 field and CR4 shadow field using the cr4 guest/host
mask. But there is no way to get it from any field(s) in vmcs12.

2018-02-08 16:42:13

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

On 08/02/2018 16:54, Jim Mattson wrote:
> On Thu, Feb 8, 2018 at 7:29 AM, Jim Mattson <[email protected]> wrote:
>> Similarly, the correct L1 CR4 value should be in vmcs01's CR4
>> read shadow field.
> Sorry; that's wrong. L1's CR4 value has to be reconstructed from the
> vmcs01 guest CR4 field and CR4 shadow field using the cr4 guest/host
> mask. But there is no way to get it from any field(s) in vmcs12.

Now that we have the prepare_vmcs02_full/prepare_vmcs02 split, we
probably should do more checks in there, and not rely on the processor
anymore.

Paolo

2018-02-11 11:57:26

by Wanpeng Li

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

2018-02-08 23:29 GMT+08:00 Jim Mattson <[email protected]>:
> Consider the following scenario:
>
> L1 has never successfully executed VMLAUNCH. It has written 0 to
> vmcs12's host CR3 field using VMWRITE, but the current host CR3 value

Writes 0 to cr3 can't be detected during vmentry checks by hardware.

> is actually 3e7000. It has written some illegal control field that the
> L0 KVM doesn't check itself, but defers to the hardware checks on
> vmcs02 instead. So, when L1 tries to execute VMLAUNCH, L0 follows this
> path for "VM-entry to vmcs02 failed due to invalid control field(s)."
> Your change would set CR3 to 0, which is incorrect. CR3 should
> actually be set to 3e7000. Now, if L0 is sane and using EPT, then it
> can find the correct L1 CR3 value in vmcs01's Guest CR3 field, but if
> for some reason L0 is using shadow paging to execute L1, that won't
> work. Similarly, the correct L1 CR4 value should be in vmcs01's CR4
> read shadow field.
>
> You can't just assume that L1 has written values to the vmcs12 host
> fields that actually match the current host values. There is nothing
> in the architecture that would require this behavior.
>
> On Wed, Feb 7, 2018 at 10:22 PM, Wanpeng Li <[email protected]> wrote:
>> 2018-02-08 0:57 GMT+08:00 Jim Mattson <[email protected]>:
>>> vmcs12->host_cr[34] does not contain the up-to-date values when L1 is
>>> running. L1 can vmwrite any values there. We know at this point that
>>
>> It will incur a vmexit to emulate L1 vmwrites vmcs12->host_cr[34] even
>> if vmcs shadow is enabled since host_cr[34] is not shadowed in the
>> bitmap, why it is not up-to-date when L1 is running?
>>
>> Regards,
>> Wanpeng Li
>>
>>> they are legal (because we checked them), but that's about it. If the
>>> VMLAUNCH/VMRESUME of vmcs12 fails for "invalid control field," there
>>> is no VM-exit from L2 to L1, and these fields are not loaded. Instead,
>>> execution just falls through to the next instruction with VMFailValid
>>> semantics.
>>>
>>> On Wed, Feb 7, 2018 at 12:31 AM, Wanpeng Li <[email protected]> wrote:
>>>> 2018-02-07 0:58 GMT+08:00 Jim Mattson <[email protected]>:
>>>>> On Mon, Feb 5, 2018 at 4:57 PM, Wanpeng Li <[email protected]> wrote:
>>>>>
>>>>>> This is effective one, what I restore in this patch is
>>>>>> achitectural/guest visible.
>>>>>
>>>>> This patch doesn't "restore" the guest visible CR4 to its value at the
>>>>> time of VMLAUNCH/VMRESUME. It loads a new CR4 value from the vmcs12.
>>>>> That behavior is incorrect.
>>>>
>>>> You have another pointing out about this.
>>>> https://lkml.org/lkml/2018/2/5/518 vmcs12->host_cr3/host_cr4 has the
>>>> up-to-date value when L1 is running, it is still up-to-date after
>>>> vmexit due to L1 executes VMLAUNCH/VMRESUME, I think the value stays
>>>> the same before L0 emulates the VMLAUNCH/VMRESUME, according to below
>>>> comments, why vmcs12->host_cr3/cr4 is not the value which we should
>>>> restore?
>>>>
>>>> * After an early L2 VM-entry failure, we're now back
>>>> * in L1 which thinks it just finished a VMLAUNCH or
>>>> * VMRESUME instruction
>>>>
>>>> Regards,
>>>> Wanpeng Li

2018-02-12 17:39:23

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Fix CR4 after VMLAUNCH/VMRESUME failure

On Sun, Feb 11, 2018 at 3:56 AM, Wanpeng Li <[email protected]> wrote:

> Writes 0 to cr3 can't be detected during vmentry checks by hardware.

I never suggested otherwise. I was just trying to explain why you
can't assume that the host CR3 field in the VMCS matches the host CR3
at the time of VMLAUNCH.

KVM is set up for failure, because it loads a bunch of guest state
before checking the validity of all of the control fields. If a
control field in the vmcs12 is invalid, but KVM doesn't recognize this
until after it has loaded guest state, all of the host state that has
been overwritten should be restored. "Restored" does not mean "loaded
from the vmcs12." It means reverted to its state at the time of the
failed VMLAUNCH/VMRESUME.