> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <[email protected]> wrote:
>
> When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
> according to TLFS, this should "transition an enlightened VMCS from the
> active to the non-active state". It is, however, wrong to assume that
> it is only valid to do VMCLEAR for the eVMCS which is currently active
> on the vCPU performing VMCLEAR.
>
> Currently, the logic in handle_vmclear() is broken: in case, there is no
> active eVMCS on the vCPU doing VMCLEAR we treat the argument as a 'normal'
> VMCS and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly
> corrupts the memory area.
>
> So, in case the VMCLEAR argument is not the current active eVMCS on the
> vCPU, how can we know if the area it is pointing to is a normal or an
> enlightened VMCS?
> Thanks to the bug in Hyper-V (see commit 72aeb60c52bf7 ("KVM: nVMX: Verify
> eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we can
> not, the revision can't be used to distinguish between them. So let's
> assume it is always enlightened in case enlightened vmentry is enabled in
> the assist page. Also, check if vmx->nested.enlightened_vmcs_enabled to
> minimize the impact for 'unenlightened' workloads.
>
> Fixes: b8bbab928fb1 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
> Signed-off-by: Vitaly Kuznetsov <[email protected]>
> ---
> arch/x86/kvm/vmx/evmcs.c | 18 ++++++++++++++++++
> arch/x86/kvm/vmx/evmcs.h | 1 +
> arch/x86/kvm/vmx/nested.c | 19 ++++++++-----------
> 3 files changed, 27 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 1a6b3e1581aa..eae636ec0cc8 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -3,6 +3,7 @@
> #include <linux/errno.h>
> #include <linux/smp.h>
>
> +#include "../hyperv.h"
> #include "evmcs.h"
> #include "vmcs.h"
> #include "vmx.h"
> @@ -309,6 +310,23 @@ void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
> }
> #endif
>
> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
i.e. Don’t return evmcs_ptr by pointer but instead as a return-value and get rid of the bool.
> +{
> + struct hv_vp_assist_page assist_page;
> +
> + *evmptr = -1ull;
> +
> + if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> + return false;
> +
> + if (unlikely(!assist_page.enlighten_vmentry))
> + return false;
> +
> + *evmptr = assist_page.current_nested_vmcs;
> +
> + return true;
> +}
> +
> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
> index e0fcef85b332..c449e79a9c4a 100644
> --- a/arch/x86/kvm/vmx/evmcs.h
> +++ b/arch/x86/kvm/vmx/evmcs.h
> @@ -195,6 +195,7 @@ static inline void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) {}
> static inline void evmcs_touch_msr_bitmap(void) {}
> #endif /* IS_ENABLED(CONFIG_HYPERV) */
>
> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr);
> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
> uint16_t *vmcs_version);
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 9214b3aea1f9..ee8dda7d8a03 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -1765,26 +1765,21 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
> bool from_launch)
> {
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> - struct hv_vp_assist_page assist_page;
> + u64 evmptr;
I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>
> if (likely(!vmx->nested.enlightened_vmcs_enabled))
> return 1;
>
> - if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
> + if (!nested_enlightened_vmentry(vcpu, &evmptr))
> return 1;
>
> - if (unlikely(!assist_page.enlighten_vmentry))
> - return 1;
> -
> - if (unlikely(assist_page.current_nested_vmcs !=
> - vmx->nested.hv_evmcs_vmptr)) {
> -
> + if (unlikely(evmptr != vmx->nested.hv_evmcs_vmptr)) {
> if (!vmx->nested.hv_evmcs)
> vmx->nested.current_vmptr = -1ull;
>
> nested_release_evmcs(vcpu);
>
> - if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs),
> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(evmptr),
> &vmx->nested.hv_evmcs_map))
> return 0;
>
> @@ -1826,7 +1821,7 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
> */
> vmx->nested.hv_evmcs->hv_clean_fields &=
> ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
> - vmx->nested.hv_evmcs_vmptr = assist_page.current_nested_vmcs;
> + vmx->nested.hv_evmcs_vmptr = evmptr;
>
> /*
> * Unlike normal vmcs12, enlightened vmcs12 is not fully
> @@ -4331,6 +4326,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
> struct vcpu_vmx *vmx = to_vmx(vcpu);
> u32 zero = 0;
> gpa_t vmptr;
> + u64 evmptr;
I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>
> if (!nested_vmx_check_permission(vcpu))
> return 1;
> @@ -4346,7 +4342,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
> return nested_vmx_failValid(vcpu,
> VMXERR_VMCLEAR_VMXON_POINTER);
>
> - if (vmx->nested.hv_evmcs_map.hva) {
> + if (unlikely(vmx->nested.enlightened_vmcs_enabled) &&
> + nested_enlightened_vmentry(vcpu, &evmptr)) {
> if (vmptr == vmx->nested.hv_evmcs_vmptr)
Shouldn’t you also remove the (vmptr == vmx->nested.hv_evmcs_vmptr) condition?
To my understanding, vmx->nested.hv_evmcs_vmptr represents the address of the loaded eVMCS on current vCPU.
But according to commit message, it is valid for vCPU to perform VMCLEAR on eVMCS that differ from loaded eVMCS on vCPU.
E.g. The current vCPU may even have vmx->nested.hv_evmcs_vmptr set to -1ull.
-Liran
> nested_release_evmcs(vcpu);
> } else {
> --
> 2.20.1
>
Liran Alon <[email protected]> writes:
>> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <[email protected]> wrote:
>>
>> When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
>> according to TLFS, this should "transition an enlightened VMCS from the
>> active to the non-active state". It is, however, wrong to assume that
>> it is only valid to do VMCLEAR for the eVMCS which is currently active
>> on the vCPU performing VMCLEAR.
>>
>> Currently, the logic in handle_vmclear() is broken: in case, there is no
>> active eVMCS on the vCPU doing VMCLEAR we treat the argument as a 'normal'
>> VMCS and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly
>> corrupts the memory area.
>>
>> So, in case the VMCLEAR argument is not the current active eVMCS on the
>> vCPU, how can we know if the area it is pointing to is a normal or an
>> enlightened VMCS?
>> Thanks to the bug in Hyper-V (see commit 72aeb60c52bf7 ("KVM: nVMX: Verify
>> eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we can
>> not, the revision can't be used to distinguish between them. So let's
>> assume it is always enlightened in case enlightened vmentry is enabled in
>> the assist page. Also, check if vmx->nested.enlightened_vmcs_enabled to
>> minimize the impact for 'unenlightened' workloads.
>>
>> Fixes: b8bbab928fb1 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>> ---
>> arch/x86/kvm/vmx/evmcs.c | 18 ++++++++++++++++++
>> arch/x86/kvm/vmx/evmcs.h | 1 +
>> arch/x86/kvm/vmx/nested.c | 19 ++++++++-----------
>> 3 files changed, 27 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 1a6b3e1581aa..eae636ec0cc8 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -3,6 +3,7 @@
>> #include <linux/errno.h>
>> #include <linux/smp.h>
>>
>> +#include "../hyperv.h"
>> #include "evmcs.h"
>> #include "vmcs.h"
>> #include "vmx.h"
>> @@ -309,6 +310,23 @@ void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
>> }
>> #endif
>>
>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
>
> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
> In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
> i.e. Don’t return evmcs_ptr by pointer but instead as a return-value
> and get rid of the bool.
Sure, can do in v2.
>
>> +{
>> + struct hv_vp_assist_page assist_page;
>> +
>> + *evmptr = -1ull;
>> +
>> + if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
>> + return false;
>> +
>> + if (unlikely(!assist_page.enlighten_vmentry))
>> + return false;
>> +
>> + *evmptr = assist_page.current_nested_vmcs;
>> +
>> + return true;
>> +}
>> +
>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>> {
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>> index e0fcef85b332..c449e79a9c4a 100644
>> --- a/arch/x86/kvm/vmx/evmcs.h
>> +++ b/arch/x86/kvm/vmx/evmcs.h
>> @@ -195,6 +195,7 @@ static inline void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) {}
>> static inline void evmcs_touch_msr_bitmap(void) {}
>> #endif /* IS_ENABLED(CONFIG_HYPERV) */
>>
>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr);
>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
>> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>> uint16_t *vmcs_version);
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 9214b3aea1f9..ee8dda7d8a03 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -1765,26 +1765,21 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
>> bool from_launch)
>> {
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> - struct hv_vp_assist_page assist_page;
>> + u64 evmptr;
>
> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>
Sure.
>>
>> if (likely(!vmx->nested.enlightened_vmcs_enabled))
>> return 1;
>>
>> - if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
>> + if (!nested_enlightened_vmentry(vcpu, &evmptr))
>> return 1;
>>
>> - if (unlikely(!assist_page.enlighten_vmentry))
>> - return 1;
>> -
>> - if (unlikely(assist_page.current_nested_vmcs !=
>> - vmx->nested.hv_evmcs_vmptr)) {
>> -
>> + if (unlikely(evmptr != vmx->nested.hv_evmcs_vmptr)) {
>> if (!vmx->nested.hv_evmcs)
>> vmx->nested.current_vmptr = -1ull;
>>
>> nested_release_evmcs(vcpu);
>>
>> - if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs),
>> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(evmptr),
>> &vmx->nested.hv_evmcs_map))
>> return 0;
>>
>> @@ -1826,7 +1821,7 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
>> */
>> vmx->nested.hv_evmcs->hv_clean_fields &=
>> ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
>> - vmx->nested.hv_evmcs_vmptr = assist_page.current_nested_vmcs;
>> + vmx->nested.hv_evmcs_vmptr = evmptr;
>>
>> /*
>> * Unlike normal vmcs12, enlightened vmcs12 is not fully
>> @@ -4331,6 +4326,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> u32 zero = 0;
>> gpa_t vmptr;
>> + u64 evmptr;
>
> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>
Sure.
>>
>> if (!nested_vmx_check_permission(vcpu))
>> return 1;
>> @@ -4346,7 +4342,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>> return nested_vmx_failValid(vcpu,
>> VMXERR_VMCLEAR_VMXON_POINTER);
>>
>> - if (vmx->nested.hv_evmcs_map.hva) {
>> + if (unlikely(vmx->nested.enlightened_vmcs_enabled) &&
>> + nested_enlightened_vmentry(vcpu, &evmptr)) {
>> if (vmptr == vmx->nested.hv_evmcs_vmptr)
>
> Shouldn’t you also remove the (vmptr == vmx->nested.hv_evmcs_vmptr) condition?
> To my understanding, vmx->nested.hv_evmcs_vmptr represents the address of the loaded eVMCS on current vCPU.
> But according to commit message, it is valid for vCPU to perform
> VMCLEAR on eVMCS that differ from loaded eVMCS on vCPU.
> E.g. The current vCPU may even have vmx->nested.hv_evmcs_vmptr set to
> -1ull.
nested_release_evmcs() unmaps current eVMCS on the vCPU, we can't easily
unmap eVMCS on other vCPUs without somehow synchronizing with
them. Actually, if we remove nested_release_evmcs() from here nothing is
going to change: the fact that eVMCS is mapped doesn't hurt much. If the
next enlightened vmentry is going to happen with the same evmptr we'll
have to map it back, in case a different one will be used - we'll unmap
the old.
In KVM, there's nothing we *have* to do to transition an eVMCS from
active to non-activer state. We, for example, don't enforce the
requirement that it can only be active on one vCPU at a time. Enforcing
it is expensive (some synchronization is required) and if L1 hypervisor
is misbehaving than, well, things are not going to work anyway.
That said I'm ok with dropping nested_release_evmcs() for consistency
but we can't just drop 'if (vmptr == vmx->nested.hv_evmcs_vmptr)'.
Thanks for your review!
--
Vitaly
> On 24 Jun 2019, at 17:16, Vitaly Kuznetsov <[email protected]> wrote:
>
> Liran Alon <[email protected]> writes:
>
>>> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <[email protected]> wrote:
>>>
>>> When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
>>> according to TLFS, this should "transition an enlightened VMCS from the
>>> active to the non-active state". It is, however, wrong to assume that
>>> it is only valid to do VMCLEAR for the eVMCS which is currently active
>>> on the vCPU performing VMCLEAR.
>>>
>>> Currently, the logic in handle_vmclear() is broken: in case, there is no
>>> active eVMCS on the vCPU doing VMCLEAR we treat the argument as a 'normal'
>>> VMCS and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly
>>> corrupts the memory area.
>>>
>>> So, in case the VMCLEAR argument is not the current active eVMCS on the
>>> vCPU, how can we know if the area it is pointing to is a normal or an
>>> enlightened VMCS?
>>> Thanks to the bug in Hyper-V (see commit 72aeb60c52bf7 ("KVM: nVMX: Verify
>>> eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we can
>>> not, the revision can't be used to distinguish between them. So let's
>>> assume it is always enlightened in case enlightened vmentry is enabled in
>>> the assist page. Also, check if vmx->nested.enlightened_vmcs_enabled to
>>> minimize the impact for 'unenlightened' workloads.
>>>
>>> Fixes: b8bbab928fb1 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
>>> Signed-off-by: Vitaly Kuznetsov <[email protected]>
>>> ---
>>> arch/x86/kvm/vmx/evmcs.c | 18 ++++++++++++++++++
>>> arch/x86/kvm/vmx/evmcs.h | 1 +
>>> arch/x86/kvm/vmx/nested.c | 19 ++++++++-----------
>>> 3 files changed, 27 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>>> index 1a6b3e1581aa..eae636ec0cc8 100644
>>> --- a/arch/x86/kvm/vmx/evmcs.c
>>> +++ b/arch/x86/kvm/vmx/evmcs.c
>>> @@ -3,6 +3,7 @@
>>> #include <linux/errno.h>
>>> #include <linux/smp.h>
>>>
>>> +#include "../hyperv.h"
>>> #include "evmcs.h"
>>> #include "vmcs.h"
>>> #include "vmx.h"
>>> @@ -309,6 +310,23 @@ void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
>>> }
>>> #endif
>>>
>>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
>>
>> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>> In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
>> i.e. Don’t return evmcs_ptr by pointer but instead as a return-value
>> and get rid of the bool.
>
> Sure, can do in v2.
>
>>
>>> +{
>>> + struct hv_vp_assist_page assist_page;
>>> +
>>> + *evmptr = -1ull;
>>> +
>>> + if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
>>> + return false;
>>> +
>>> + if (unlikely(!assist_page.enlighten_vmentry))
>>> + return false;
>>> +
>>> + *evmptr = assist_page.current_nested_vmcs;
>>> +
>>> + return true;
>>> +}
>>> +
>>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>>> {
>>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>>> index e0fcef85b332..c449e79a9c4a 100644
>>> --- a/arch/x86/kvm/vmx/evmcs.h
>>> +++ b/arch/x86/kvm/vmx/evmcs.h
>>> @@ -195,6 +195,7 @@ static inline void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) {}
>>> static inline void evmcs_touch_msr_bitmap(void) {}
>>> #endif /* IS_ENABLED(CONFIG_HYPERV) */
>>>
>>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr);
>>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
>>> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>>> uint16_t *vmcs_version);
>>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>>> index 9214b3aea1f9..ee8dda7d8a03 100644
>>> --- a/arch/x86/kvm/vmx/nested.c
>>> +++ b/arch/x86/kvm/vmx/nested.c
>>> @@ -1765,26 +1765,21 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
>>> bool from_launch)
>>> {
>>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> - struct hv_vp_assist_page assist_page;
>>> + u64 evmptr;
>>
>> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>>
>
> Sure.
Sorry I meant evmcs_vmptr to be consistent with vmx->nested.hv_evmcs_vmptr.
>
>>>
>>> if (likely(!vmx->nested.enlightened_vmcs_enabled))
>>> return 1;
>>>
>>> - if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
>>> + if (!nested_enlightened_vmentry(vcpu, &evmptr))
>>> return 1;
>>>
>>> - if (unlikely(!assist_page.enlighten_vmentry))
>>> - return 1;
>>> -
>>> - if (unlikely(assist_page.current_nested_vmcs !=
>>> - vmx->nested.hv_evmcs_vmptr)) {
>>> -
>>> + if (unlikely(evmptr != vmx->nested.hv_evmcs_vmptr)) {
>>> if (!vmx->nested.hv_evmcs)
>>> vmx->nested.current_vmptr = -1ull;
>>>
>>> nested_release_evmcs(vcpu);
>>>
>>> - if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs),
>>> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(evmptr),
>>> &vmx->nested.hv_evmcs_map))
>>> return 0;
>>>
>>> @@ -1826,7 +1821,7 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
>>> */
>>> vmx->nested.hv_evmcs->hv_clean_fields &=
>>> ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
>>> - vmx->nested.hv_evmcs_vmptr = assist_page.current_nested_vmcs;
>>> + vmx->nested.hv_evmcs_vmptr = evmptr;
>>>
>>> /*
>>> * Unlike normal vmcs12, enlightened vmcs12 is not fully
>>> @@ -4331,6 +4326,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>>> u32 zero = 0;
>>> gpa_t vmptr;
>>> + u64 evmptr;
>>
>> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>>
>
> Sure.
>
>>>
>>> if (!nested_vmx_check_permission(vcpu))
>>> return 1;
>>> @@ -4346,7 +4342,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>>> return nested_vmx_failValid(vcpu,
>>> VMXERR_VMCLEAR_VMXON_POINTER);
>>>
>>> - if (vmx->nested.hv_evmcs_map.hva) {
>>> + if (unlikely(vmx->nested.enlightened_vmcs_enabled) &&
>>> + nested_enlightened_vmentry(vcpu, &evmptr)) {
>>> if (vmptr == vmx->nested.hv_evmcs_vmptr)
>>
>> Shouldn’t you also remove the (vmptr == vmx->nested.hv_evmcs_vmptr) condition?
>> To my understanding, vmx->nested.hv_evmcs_vmptr represents the address of the loaded eVMCS on current vCPU.
>> But according to commit message, it is valid for vCPU to perform
>> VMCLEAR on eVMCS that differ from loaded eVMCS on vCPU.
>> E.g. The current vCPU may even have vmx->nested.hv_evmcs_vmptr set to
>> -1ull.
>
> nested_release_evmcs() unmaps current eVMCS on the vCPU, we can't easily
> unmap eVMCS on other vCPUs without somehow synchronizing with
> them. Actually, if we remove nested_release_evmcs() from here nothing is
> going to change: the fact that eVMCS is mapped doesn't hurt much. If the
> next enlightened vmentry is going to happen with the same evmptr we'll
> have to map it back, in case a different one will be used - we'll unmap
> the old.
Right.
>
> In KVM, there's nothing we *have* to do to transition an eVMCS from
> active to non-activer state. We, for example, don't enforce the
> requirement that it can only be active on one vCPU at a time. Enforcing
> it is expensive (some synchronization is required) and if L1 hypervisor
> is misbehaving than, well, things are not going to work anyway.
Right.
>
> That said I'm ok with dropping nested_release_evmcs() for consistency
> but we can't just drop 'if (vmptr == vmx->nested.hv_evmcs_vmptr)’.
Right. I meant that we can just change code to:
/* Add relevant comment here as this is not trivial why we do this */
If (likely(!vmx->nested.enlightened_vmcs_enabled) ||
nested_enlightened_vmentry(vcpu, &evmptr)) {
if (vmptr == vmx->nested.current_vmptr)
nested_release_vmcs12(vcpu);
kvm_vcpu_write_guest(…);
}
-Liran
>
> Thanks for your review!
>
> --
> Vitaly
Liran Alon <[email protected]> writes:
>> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <[email protected]> wrote:
>>
>>
>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
>
> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
> In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
> i.e. Don’t return evmcs_ptr by pointer but instead as a return-value
> and get rid of the bool.
Actually no, sorry, I'm having second thoughts here: in handle_vmclear()
we don't care about the value of evmcs_ptr, we only want to check that
enlightened vmentry bit is enabled in assist page. If we switch to
checking evmcs_ptr against '-1', for example, we will make '-1' a magic
value which is not in the TLFS. Windows may decide to use it for
something else - and we will get a hard-to-debug bug again.
If you still dislike nested_enlightened_vmentry() having the side effect
of fetching evmcs_ptr I can get rid of it by splitting the function into
two, however, it will be less efficient for
nested_vmx_handle_enlightened_vmptrld(). Or we can just leave things as
they are there and use the newly introduced function in handle_vmclear()
only.
--
Vitaly
> On 25 Jun 2019, at 11:51, Vitaly Kuznetsov <[email protected]> wrote:
>
> Liran Alon <[email protected]> writes:
>
>>> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <[email protected]> wrote:
>>>
>>>
>>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
>>
>> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>> In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
>> i.e. Don’t return evmcs_ptr by pointer but instead as a return-value
>> and get rid of the bool.
>
> Actually no, sorry, I'm having second thoughts here: in handle_vmclear()
> we don't care about the value of evmcs_ptr, we only want to check that
> enlightened vmentry bit is enabled in assist page. If we switch to
> checking evmcs_ptr against '-1', for example, we will make '-1' a magic
> value which is not in the TLFS. Windows may decide to use it for
> something else - and we will get a hard-to-debug bug again.
I’m not sure I understand.
You are worried that when guest have setup a valid assist-page and set enlighten_vmentry to true,
that assist_page.current_nested_vmcs can be -1ull and still be considered a valid eVMCS?
I don't think that's reasonable.
i.e. I thought about having this version of the method:
+u64 nested_enlightened_vmentry(struct kvm_vcpu *vcpu)
+{
+ struct hv_vp_assist_page assist_page;
+
+ if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
+ return -1ull;
+
+ if (unlikely(!assist_page.enlighten_vmentry))
+ return -1ull;
+
+ return assist_page.current_nested_vmcs;
+}
+
-Liran
>
> If you still dislike nested_enlightened_vmentry() having the side effect
> of fetching evmcs_ptr I can get rid of it by splitting the function into
> two, however, it will be less efficient for
> nested_vmx_handle_enlightened_vmptrld(). Or we can just leave things as
> they are there and use the newly introduced function in handle_vmclear()
> only.
>
> --
> Vitaly
Liran Alon <[email protected]> writes:
>> On 25 Jun 2019, at 11:51, Vitaly Kuznetsov <[email protected]> wrote:
>>
>> Liran Alon <[email protected]> writes:
>>
>>>> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <[email protected]> wrote:
>>>>
>>>>
>>>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
>>>
>>> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>>> In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
>>> i.e. Don’t return evmcs_ptr by pointer but instead as a return-value
>>> and get rid of the bool.
>>
>> Actually no, sorry, I'm having second thoughts here: in handle_vmclear()
>> we don't care about the value of evmcs_ptr, we only want to check that
>> enlightened vmentry bit is enabled in assist page. If we switch to
>> checking evmcs_ptr against '-1', for example, we will make '-1' a magic
>> value which is not in the TLFS. Windows may decide to use it for
>> something else - and we will get a hard-to-debug bug again.
>
> I’m not sure I understand.
> You are worried that when guest have setup a valid assist-page and set
> enlighten_vmentry to true,
> that assist_page.current_nested_vmcs can be -1ull and still be considered a valid eVMCS?
> I don't think that's reasonable.
No, -1ull is not a valid eVMCS - but this shouldn't change VMCLEAR
semantics as VMCLEAR has it's own argument. It's perfectly valid to try
to put a eVMCS which was previously used on a different vCPU (and thus
which is 'active') to non-active state. The fact that we don't have an
active eVMCS on the vCPU doing VMCLEAR shouldn't matter at all.
--
Vitaly
Liran Alon <[email protected]> writes:
>> On 24 Jun 2019, at 17:16, Vitaly Kuznetsov <[email protected]> wrote:
>>
>>
>> That said I'm ok with dropping nested_release_evmcs() for consistency
>> but we can't just drop 'if (vmptr == vmx->nested.hv_evmcs_vmptr)’.
>
> Right. I meant that we can just change code to:
>
> /* Add relevant comment here as this is not trivial why we do this */
> If (likely(!vmx->nested.enlightened_vmcs_enabled) ||
> nested_enlightened_vmentry(vcpu, &evmptr)) {
>
> if (vmptr == vmx->nested.current_vmptr)
> nested_release_vmcs12(vcpu);
>
> kvm_vcpu_write_guest(…);
> }
>
The change, to my surprise, resulted in a set of L2 guest crashes. After
some debugging I figured out that clean fields is to blame: after
Windows does VMCLEAR it doesn't maintain clean field data before the
next VMLAUNCH - and nested_vmx_handle_enlightened_vmptrld() does nothing
in case evmcs_vmptr stays unchanged (so VMLAUNCH follows VMCLEAR on the
same vCPU). We apparently need to invalidate clean fields data on every
VMLAUCH. This is fix of its own, I'll do more testing and send v2.
--
Vitaly
On 24/06/19 16:16, Vitaly Kuznetsov wrote:
>>> gpa_t vmptr;
>>> + u64 evmptr;
>> I prefer to rename evmptr to evmcs_ptr. I think it’s more readable and sufficiently short.
>>
> Sure.
>
Let's make it evmcs_gpa instead. "*_ptr" or "p_*" should be for host
pointers.
Paolo