2022-06-28 10:48:30

by Anirudh Rayabharam

[permalink] [raw]
Subject: [PATCH v2] KVM: nVMX: Don't expose eVMCS unsupported fields to L1

When running cloud-hypervisor tests, VM entry into an L2 guest on KVM on
Hyper-V fails with this splat (stripped for brevity):

[ 1481.600386] WARNING: CPU: 4 PID: 7641 at arch/x86/kvm/vmx/nested.c:4563 nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
[ 1481.600427] CPU: 4 PID: 7641 Comm: vcpu2 Not tainted 5.15.0-1008-azure #9-Ubuntu
[ 1481.600429] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/22/2021
[ 1481.600430] RIP: 0010:nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
[ 1481.600447] Call Trace:
[ 1481.600449] <TASK>
[ 1481.600451] nested_vmx_reflect_vmexit+0x10b/0x440 [kvm_intel]
[ 1481.600457] __vmx_handle_exit+0xef/0x670 [kvm_intel]
[ 1481.600467] vmx_handle_exit+0x12/0x50 [kvm_intel]
[ 1481.600472] vcpu_enter_guest+0x83a/0xfd0 [kvm]
[ 1481.600524] vcpu_run+0x5e/0x240 [kvm]
[ 1481.600560] kvm_arch_vcpu_ioctl_run+0xd7/0x550 [kvm]
[ 1481.600597] kvm_vcpu_ioctl+0x29a/0x6d0 [kvm]
[ 1481.600634] __x64_sys_ioctl+0x91/0xc0
[ 1481.600637] do_syscall_64+0x5c/0xc0
[ 1481.600667] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1481.600670] RIP: 0033:0x7f688becdaff
[ 1481.600686] </TASK>

TSC multiplier field is currently not supported in EVMCS in KVM. It was
previously not supported from Hyper-V but has been added since. Because
it is not supported in KVM the use "TSC scaling control" is filtered out
of vmcs_config by evmcs_sanitize_exec_ctrls().

However, in nested_vmx_setup_ctls_msrs(), TSC scaling is exposed to L1.
eVMCS unsupported fields are not sanitized. When L1 tries to launch an L2
guest, vmcs12 has TSC scaling enabled. This propagates to vmcs02. But KVM
doesn't set the TSC multiplier value because kvm_has_tsc_control is false.
Due to this VM entry for L2 guest fails. (VM entry fails if
"use TSC scaling" is 1 but TSC multiplier is 0.)

To fix, in nested_vmx_setup_ctls_msrs(), sanitize the values read from MSRs
by filtering out fields that are not supported by eVMCS.

This is a stable-friendly intermediate fix. A more comprehensive fix is
in progress [1] but is probably too complicated to safely apply to
stable.

[1]: https://lore.kernel.org/kvm/[email protected]/

Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling")
Signed-off-by: Anirudh Rayabharam <[email protected]>
---

Changes since v1:
- Sanitize all eVMCS unsupported fields instead of just TSC scaling.

v1: https://lore.kernel.org/lkml/[email protected]/

---
arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..f88d748c7cc6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->pinbased_ctls_high);
msrs->pinbased_ctls_low |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
+#endif
msrs->pinbased_ctls_high &=
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
@@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->exit_ctls_low =
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;

+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+#endif
msrs->exit_ctls_high &=
#ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->entry_ctls_high);
msrs->entry_ctls_low =
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+#endif
msrs->entry_ctls_high &=
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
@@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->secondary_ctls_high);

msrs->secondary_ctls_low = 0;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+#endif
msrs->secondary_ctls_high &=
SECONDARY_EXEC_DESC |
SECONDARY_EXEC_ENABLE_RDTSCP |
--
2.34.1


2022-06-28 12:26:28

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: nVMX: Don't expose eVMCS unsupported fields to L1

Anirudh Rayabharam <[email protected]> writes:

> When running cloud-hypervisor tests, VM entry into an L2 guest on KVM on
> Hyper-V fails with this splat (stripped for brevity):
>
> [ 1481.600386] WARNING: CPU: 4 PID: 7641 at arch/x86/kvm/vmx/nested.c:4563 nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
> [ 1481.600427] CPU: 4 PID: 7641 Comm: vcpu2 Not tainted 5.15.0-1008-azure #9-Ubuntu
> [ 1481.600429] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/22/2021
> [ 1481.600430] RIP: 0010:nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
> [ 1481.600447] Call Trace:
> [ 1481.600449] <TASK>
> [ 1481.600451] nested_vmx_reflect_vmexit+0x10b/0x440 [kvm_intel]
> [ 1481.600457] __vmx_handle_exit+0xef/0x670 [kvm_intel]
> [ 1481.600467] vmx_handle_exit+0x12/0x50 [kvm_intel]
> [ 1481.600472] vcpu_enter_guest+0x83a/0xfd0 [kvm]
> [ 1481.600524] vcpu_run+0x5e/0x240 [kvm]
> [ 1481.600560] kvm_arch_vcpu_ioctl_run+0xd7/0x550 [kvm]
> [ 1481.600597] kvm_vcpu_ioctl+0x29a/0x6d0 [kvm]
> [ 1481.600634] __x64_sys_ioctl+0x91/0xc0
> [ 1481.600637] do_syscall_64+0x5c/0xc0
> [ 1481.600667] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1481.600670] RIP: 0033:0x7f688becdaff
> [ 1481.600686] </TASK>
>
> TSC multiplier field is currently not supported in EVMCS in KVM. It was
> previously not supported from Hyper-V but has been added since. Because
> it is not supported in KVM the use "TSC scaling control" is filtered out
> of vmcs_config by evmcs_sanitize_exec_ctrls().
>
> However, in nested_vmx_setup_ctls_msrs(), TSC scaling is exposed to L1.
> eVMCS unsupported fields are not sanitized. When L1 tries to launch an L2
> guest, vmcs12 has TSC scaling enabled. This propagates to vmcs02. But KVM
> doesn't set the TSC multiplier value because kvm_has_tsc_control is false.
> Due to this VM entry for L2 guest fails. (VM entry fails if
> "use TSC scaling" is 1 but TSC multiplier is 0.)
>
> To fix, in nested_vmx_setup_ctls_msrs(), sanitize the values read from MSRs
> by filtering out fields that are not supported by eVMCS.
>
> This is a stable-friendly intermediate fix. A more comprehensive fix is
> in progress [1] but is probably too complicated to safely apply to
> stable.
>
> [1]: https://lore.kernel.org/kvm/[email protected]/
>
> Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling")
> Signed-off-by: Anirudh Rayabharam <[email protected]>
> ---
>
> Changes since v1:
> - Sanitize all eVMCS unsupported fields instead of just TSC scaling.
>
> v1: https://lore.kernel.org/lkml/[email protected]/
>
> ---
> arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index f5cb18e00e78..f88d748c7cc6 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> msrs->pinbased_ctls_high);
> msrs->pinbased_ctls_low |=
> PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
> +#if IS_ENABLED(CONFIG_HYPERV)
> + if (static_branch_unlikely(&enable_evmcs))
> + msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
> +#endif
> msrs->pinbased_ctls_high &=
> PIN_BASED_EXT_INTR_MASK |
> PIN_BASED_NMI_EXITING |
> @@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> msrs->exit_ctls_low =
> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
>
> +#if IS_ENABLED(CONFIG_HYPERV)
> + if (static_branch_unlikely(&enable_evmcs))
> + msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
> +#endif
> msrs->exit_ctls_high &=
> #ifdef CONFIG_X86_64
> VM_EXIT_HOST_ADDR_SPACE_SIZE |
> @@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> msrs->entry_ctls_high);
> msrs->entry_ctls_low =
> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
> +#if IS_ENABLED(CONFIG_HYPERV)
> + if (static_branch_unlikely(&enable_evmcs))
> + msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
> +#endif
> msrs->entry_ctls_high &=
> #ifdef CONFIG_X86_64
> VM_ENTRY_IA32E_MODE |
> @@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> msrs->secondary_ctls_high);
>
> msrs->secondary_ctls_low = 0;
> +#if IS_ENABLED(CONFIG_HYPERV)
> + if (static_branch_unlikely(&enable_evmcs))
> + msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
> +#endif
> msrs->secondary_ctls_high &=
> SECONDARY_EXEC_DESC |
> SECONDARY_EXEC_ENABLE_RDTSCP |

(In theory, threre's also EVMCS1_UNSUPPORTED_VMFUNC filtering out
VMX_VMFUNC_EPTP_SWITCHING (as eVMCS EPTP_LIST_ADDRESS) but it is not
used by KVM)

As I said in another thread, I think this is fine as a
stable@/intermediate fix. Assuming the way to go for mainline is my
"KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs", this patch
won't be needed and can be reverted.

--
Vitaly

2022-06-30 15:44:00

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH v2] KVM: nVMX: Don't expose eVMCS unsupported fields to L1

Vitaly Kuznetsov <[email protected]> writes:

> Anirudh Rayabharam <[email protected]> writes:
>
>> When running cloud-hypervisor tests, VM entry into an L2 guest on KVM on
>> Hyper-V fails with this splat (stripped for brevity):
>>
>> [ 1481.600386] WARNING: CPU: 4 PID: 7641 at arch/x86/kvm/vmx/nested.c:4563 nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
>> [ 1481.600427] CPU: 4 PID: 7641 Comm: vcpu2 Not tainted 5.15.0-1008-azure #9-Ubuntu
>> [ 1481.600429] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/22/2021
>> [ 1481.600430] RIP: 0010:nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
>> [ 1481.600447] Call Trace:
>> [ 1481.600449] <TASK>
>> [ 1481.600451] nested_vmx_reflect_vmexit+0x10b/0x440 [kvm_intel]
>> [ 1481.600457] __vmx_handle_exit+0xef/0x670 [kvm_intel]
>> [ 1481.600467] vmx_handle_exit+0x12/0x50 [kvm_intel]
>> [ 1481.600472] vcpu_enter_guest+0x83a/0xfd0 [kvm]
>> [ 1481.600524] vcpu_run+0x5e/0x240 [kvm]
>> [ 1481.600560] kvm_arch_vcpu_ioctl_run+0xd7/0x550 [kvm]
>> [ 1481.600597] kvm_vcpu_ioctl+0x29a/0x6d0 [kvm]
>> [ 1481.600634] __x64_sys_ioctl+0x91/0xc0
>> [ 1481.600637] do_syscall_64+0x5c/0xc0
>> [ 1481.600667] entry_SYSCALL_64_after_hwframe+0x44/0xae
>> [ 1481.600670] RIP: 0033:0x7f688becdaff
>> [ 1481.600686] </TASK>
>>
>> TSC multiplier field is currently not supported in EVMCS in KVM. It was
>> previously not supported from Hyper-V but has been added since. Because
>> it is not supported in KVM the use "TSC scaling control" is filtered out
>> of vmcs_config by evmcs_sanitize_exec_ctrls().
>>
>> However, in nested_vmx_setup_ctls_msrs(), TSC scaling is exposed to L1.
>> eVMCS unsupported fields are not sanitized. When L1 tries to launch an L2
>> guest, vmcs12 has TSC scaling enabled. This propagates to vmcs02. But KVM
>> doesn't set the TSC multiplier value because kvm_has_tsc_control is false.
>> Due to this VM entry for L2 guest fails. (VM entry fails if
>> "use TSC scaling" is 1 but TSC multiplier is 0.)
>>
>> To fix, in nested_vmx_setup_ctls_msrs(), sanitize the values read from MSRs
>> by filtering out fields that are not supported by eVMCS.
>>
>> This is a stable-friendly intermediate fix. A more comprehensive fix is
>> in progress [1] but is probably too complicated to safely apply to
>> stable.
>>
>> [1]: https://lore.kernel.org/kvm/[email protected]/
>>
>> Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling")
>> Signed-off-by: Anirudh Rayabharam <[email protected]>
>> ---
>>
>> Changes since v1:
>> - Sanitize all eVMCS unsupported fields instead of just TSC scaling.
>>
>> v1: https://lore.kernel.org/lkml/[email protected]/
>>
>> ---
>> arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index f5cb18e00e78..f88d748c7cc6 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>> msrs->pinbased_ctls_high);
>> msrs->pinbased_ctls_low |=
>> PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> + if (static_branch_unlikely(&enable_evmcs))
>> + msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
>> +#endif
>> msrs->pinbased_ctls_high &=
>> PIN_BASED_EXT_INTR_MASK |
>> PIN_BASED_NMI_EXITING |
>> @@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>> msrs->exit_ctls_low =
>> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
>>
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> + if (static_branch_unlikely(&enable_evmcs))
>> + msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
>> +#endif
>> msrs->exit_ctls_high &=
>> #ifdef CONFIG_X86_64
>> VM_EXIT_HOST_ADDR_SPACE_SIZE |
>> @@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>> msrs->entry_ctls_high);
>> msrs->entry_ctls_low =
>> VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> + if (static_branch_unlikely(&enable_evmcs))
>> + msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
>> +#endif
>> msrs->entry_ctls_high &=
>> #ifdef CONFIG_X86_64
>> VM_ENTRY_IA32E_MODE |
>> @@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>> msrs->secondary_ctls_high);
>>
>> msrs->secondary_ctls_low = 0;
>> +#if IS_ENABLED(CONFIG_HYPERV)
>> + if (static_branch_unlikely(&enable_evmcs))
>> + msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
>> +#endif
>> msrs->secondary_ctls_high &=
>> SECONDARY_EXEC_DESC |
>> SECONDARY_EXEC_ENABLE_RDTSCP |
>
> (In theory, threre's also EVMCS1_UNSUPPORTED_VMFUNC filtering out
> VMX_VMFUNC_EPTP_SWITCHING (as eVMCS EPTP_LIST_ADDRESS) but it is not
> used by KVM)
>
> As I said in another thread, I think this is fine as a
> stable@/intermediate fix. Assuming the way to go for mainline is my
> "KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs", this patch
> won't be needed and can be reverted.

(I've already said that in another thread, putting it here for
visibility)

I have to take this back. As-is, the patch is likely to break live
migration because of the reasons expressed by Jim:
https://lore.kernel.org/kvm/CALMp9eSBLcvuNDquvSfUnaF3S3f4ZkzqDRSsz-v93ZeX=xnssg@mail.gmail.com/

In case we filter out SECONDARY_EXEC_TSC_SCALING here in
nested_vmx_setup_ctls_msrs(), KVM_SET_MSRS on the destination will fail
because the data will have an unsupported bit.

It seems the easiest way to go for stable@ is to actually enable TSC
scaling, like the 4 patches I've suggested here:
https://lore.kernel.org/kvm/[email protected]/

The full solution including vmcs_config usage for
nested_vmx_setup_ctls_msrs() is 28 patches now, it's likely a no-go for
stable :-(

--
Vitaly