2022-06-13 19:14:16

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On 6/13/22 18:16, Anirudh Rayabharam wrote:
> + if (!kvm_has_tsc_control)
> + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> +
> msrs->secondary_ctls_low = 0;
> msrs->secondary_ctls_high &=
> SECONDARY_EXEC_DESC |
> @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> SECONDARY_EXEC_RDRAND_EXITING |
> SECONDARY_EXEC_ENABLE_INVPCID |
> SECONDARY_EXEC_RDSEED_EXITING |
> - SECONDARY_EXEC_XSAVES |
> - SECONDARY_EXEC_TSC_SCALING;
> + SECONDARY_EXEC_XSAVES;
>
> /*

This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
even if kvm_has_tsc_control == true.

That said, I think a better implementation of this patch is to just add
a version of evmcs_sanitize_exec_ctrls that takes a struct
nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like

evmcs_sanitize_nested_vmx_vsrs(msrs);

Even better (but I cannot "mentally test it" offhand) would be just

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e802f71a9e8d..b3425ce835c5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
* sanity checking and refuse to boot. Filter all unsupported
* features out.
*/
- if (!msr_info->host_initiated &&
+ if (static_branch_unlikely(&enable_evmcs) ||
vmx->nested.enlightened_vmcs_enabled)
nested_evmcs_filter_control_msr(msr_info->index,
&msr_info->data);

I cannot quite understand the host_initiated check, so I'll defer to
Vitaly on why it is needed. Most likely, removing it would cause some
warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
userspace bug and we should remove that part of the condition. You
don't need to worry about that part, we'll cross that bridge if the
above patch works for your case.

Thanks,

Paolo


2022-06-13 19:14:43

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Mon, Jun 13, 2022, Paolo Bonzini wrote:
> On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > + if (!kvm_has_tsc_control)
> > + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > +
> > msrs->secondary_ctls_low = 0;
> > msrs->secondary_ctls_high &=
> > SECONDARY_EXEC_DESC |
> > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > SECONDARY_EXEC_RDRAND_EXITING |
> > SECONDARY_EXEC_ENABLE_INVPCID |
> > SECONDARY_EXEC_RDSEED_EXITING |
> > - SECONDARY_EXEC_XSAVES |
> > - SECONDARY_EXEC_TSC_SCALING;
> > + SECONDARY_EXEC_XSAVES;
> > /*
>
> This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> even if kvm_has_tsc_control == true.
>
> That said, I think a better implementation of this patch is to just add
> a version of evmcs_sanitize_exec_ctrls that takes a struct
> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
>
> evmcs_sanitize_nested_vmx_vsrs(msrs);

Any reason not to use the already sanitized vmcs_config? I can't think of any
reason why the nested path should blindly use the raw MSR values from hardware.

2022-06-14 04:58:08

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Mon, Jun 13, 2022 at 06:49:17PM +0200, Paolo Bonzini wrote:
> On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > + if (!kvm_has_tsc_control)
> > + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > +
> > msrs->secondary_ctls_low = 0;
> > msrs->secondary_ctls_high &=
> > SECONDARY_EXEC_DESC |
> > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > SECONDARY_EXEC_RDRAND_EXITING |
> > SECONDARY_EXEC_ENABLE_INVPCID |
> > SECONDARY_EXEC_RDSEED_EXITING |
> > - SECONDARY_EXEC_XSAVES |
> > - SECONDARY_EXEC_TSC_SCALING;
> > + SECONDARY_EXEC_XSAVES;
> > /*
>
> This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> even if kvm_has_tsc_control == true.

The MSR actually allows 1-setting of the "use TSC scaling" control. So this
line is redundant anyway.

>
> That said, I think a better implementation of this patch is to just add
> a version of evmcs_sanitize_exec_ctrls that takes a struct
> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
>
> evmcs_sanitize_nested_vmx_vsrs(msrs);

Sanitize at the end might not work because I see some cases in
nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
even though the hardware doesn't support it.

>
> Even better (but I cannot "mentally test it" offhand) would be just
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e802f71a9e8d..b3425ce835c5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> * sanity checking and refuse to boot. Filter all unsupported
> * features out.
> */
> - if (!msr_info->host_initiated &&
> + if (static_branch_unlikely(&enable_evmcs) ||
> vmx->nested.enlightened_vmcs_enabled)
> nested_evmcs_filter_control_msr(msr_info->index,
> &msr_info->data);

I will try this.

Thanks,

Anirudh.

>
> I cannot quite understand the host_initiated check, so I'll defer to
> Vitaly on why it is needed. Most likely, removing it would cause some
> warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
> userspace bug and we should remove that part of the condition. You
> don't need to worry about that part, we'll cross that bridge if the
> above patch works for your case.
>
> Thanks,
>
> Paolo

2022-06-14 12:31:33

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On 6/14/22 06:55, Anirudh Rayabharam wrote:
>> That said, I think a better implementation of this patch is to just add
>> a version of evmcs_sanitize_exec_ctrls that takes a struct
>> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
>>
>> evmcs_sanitize_nested_vmx_vsrs(msrs);
> Sanitize at the end might not work because I see some cases in
> nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
> even though the hardware doesn't support it.
>

Yes, but these will never include eVMCS-unsupported features.

Paolo

2022-06-14 12:31:40

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Paolo Bonzini <[email protected]> writes:

> On 6/13/22 18:16, Anirudh Rayabharam wrote:
>> + if (!kvm_has_tsc_control)
>> + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
>> +
>> msrs->secondary_ctls_low = 0;
>> msrs->secondary_ctls_high &=
>> SECONDARY_EXEC_DESC |
>> @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
>> SECONDARY_EXEC_RDRAND_EXITING |
>> SECONDARY_EXEC_ENABLE_INVPCID |
>> SECONDARY_EXEC_RDSEED_EXITING |
>> - SECONDARY_EXEC_XSAVES |
>> - SECONDARY_EXEC_TSC_SCALING;
>> + SECONDARY_EXEC_XSAVES;
>>
>> /*
>
> This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> even if kvm_has_tsc_control == true.
>
> That said, I think a better implementation of this patch is to just add
> a version of evmcs_sanitize_exec_ctrls that takes a struct
> nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
>
> evmcs_sanitize_nested_vmx_vsrs(msrs);
>
> Even better (but I cannot "mentally test it" offhand) would be just
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e802f71a9e8d..b3425ce835c5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> * sanity checking and refuse to boot. Filter all unsupported
> * features out.
> */
> - if (!msr_info->host_initiated &&
> + if (static_branch_unlikely(&enable_evmcs) ||
> vmx->nested.enlightened_vmcs_enabled)
> nested_evmcs_filter_control_msr(msr_info->index,
> &msr_info->data);
>
> I cannot quite understand the host_initiated check, so I'll defer to
> Vitaly on why it is needed. Most likely, removing it would cause some
> warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
> userspace bug and we should remove that part of the condition.

I forgot the details, of course, but 31de3d2500e4 says:

```
With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails.
```

so it certainly was a workaround for QEMU.

--
Vitaly

2022-06-14 15:18:48

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Tue, Jun 14, 2022 at 02:16:00PM +0200, Paolo Bonzini wrote:
> On 6/14/22 06:55, Anirudh Rayabharam wrote:
> > > That said, I think a better implementation of this patch is to just add
> > > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> > >
> > > evmcs_sanitize_nested_vmx_vsrs(msrs);
> > Sanitize at the end might not work because I see some cases in
> > nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
> > even though the hardware doesn't support it.
> >
>
> Yes, but these will never include eVMCS-unsupported features.

How are you so sure?

For example, SECONDARY_EXEC_SHADOW_VMCS is unsupported in eVMCS but in
nested_vmx_setup_ctls_msrs() we do:

6675 /*
6676 * We can emulate "VMCS shadowing," even if the hardware
6677 * doesn't support it.
6678 */
6679 msrs->secondary_ctls_high |=
6680 SECONDARY_EXEC_SHADOW_VMCS;

If we sanitize this out it might cause some regression right?

Thanks!

Anirudh.
>
> Paolo

2022-06-14 15:29:36

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Tue, Jun 14, 2022 at 10:25:52AM +0530, Anirudh Rayabharam wrote:
> On Mon, Jun 13, 2022 at 06:49:17PM +0200, Paolo Bonzini wrote:
> > On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > > + if (!kvm_has_tsc_control)
> > > + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > > +
> > > msrs->secondary_ctls_low = 0;
> > > msrs->secondary_ctls_high &=
> > > SECONDARY_EXEC_DESC |
> > > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > > SECONDARY_EXEC_RDRAND_EXITING |
> > > SECONDARY_EXEC_ENABLE_INVPCID |
> > > SECONDARY_EXEC_RDSEED_EXITING |
> > > - SECONDARY_EXEC_XSAVES |
> > > - SECONDARY_EXEC_TSC_SCALING;
> > > + SECONDARY_EXEC_XSAVES;
> > > /*
> >
> > This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> > even if kvm_has_tsc_control == true.
>
> The MSR actually allows 1-setting of the "use TSC scaling" control. So this
> line is redundant anyway.
>
> >
> > That said, I think a better implementation of this patch is to just add
> > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> >
> > evmcs_sanitize_nested_vmx_vsrs(msrs);
>
> Sanitize at the end might not work because I see some cases in
> nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
> even though the hardware doesn't support it.
>
> >
> > Even better (but I cannot "mentally test it" offhand) would be just
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index e802f71a9e8d..b3425ce835c5 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -1862,7 +1862,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > * sanity checking and refuse to boot. Filter all unsupported
> > * features out.
> > */
> > - if (!msr_info->host_initiated &&
> > + if (static_branch_unlikely(&enable_evmcs) ||
> > vmx->nested.enlightened_vmcs_enabled)
> > nested_evmcs_filter_control_msr(msr_info->index,
> > &msr_info->data);
>
> I will try this.

This patch fixed the issue for me. But again, this might filter out
things that we wan't to expose to L1 even if not available in hardware.

Thanks,

Anirudh.

>
> Thanks,
>
> Anirudh.
>
> >
> > I cannot quite understand the host_initiated check, so I'll defer to
> > Vitaly on why it is needed. Most likely, removing it would cause some
> > warnings in QEMU with e.g. "-cpu Haswell,+vmx"; but I think it's a
> > userspace bug and we should remove that part of the condition. You
> > don't need to worry about that part, we'll cross that bridge if the
> > above patch works for your case.
> >
> > Thanks,
> >
> > Paolo

2022-06-14 15:33:30

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> On Mon, Jun 13, 2022, Paolo Bonzini wrote:
> > On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > > + if (!kvm_has_tsc_control)
> > > + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > > +
> > > msrs->secondary_ctls_low = 0;
> > > msrs->secondary_ctls_high &=
> > > SECONDARY_EXEC_DESC |
> > > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > > SECONDARY_EXEC_RDRAND_EXITING |
> > > SECONDARY_EXEC_ENABLE_INVPCID |
> > > SECONDARY_EXEC_RDSEED_EXITING |
> > > - SECONDARY_EXEC_XSAVES |
> > > - SECONDARY_EXEC_TSC_SCALING;
> > > + SECONDARY_EXEC_XSAVES;
> > > /*
> >
> > This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> > even if kvm_has_tsc_control == true.
> >
> > That said, I think a better implementation of this patch is to just add
> > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> >
> > evmcs_sanitize_nested_vmx_vsrs(msrs);
>
> Any reason not to use the already sanitized vmcs_config? I can't think of any
> reason why the nested path should blindly use the raw MSR values from hardware.

vmcs_config has the sanitized exec controls. But how do we construct MSR
values using them?

Thanks,

Anirudh.

2022-06-14 16:09:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> > On Mon, Jun 13, 2022, Paolo Bonzini wrote:
> > > On 6/13/22 18:16, Anirudh Rayabharam wrote:
> > > > + if (!kvm_has_tsc_control)
> > > > + msrs->secondary_ctls_high &= ~SECONDARY_EXEC_TSC_SCALING;
> > > > +
> > > > msrs->secondary_ctls_low = 0;
> > > > msrs->secondary_ctls_high &=
> > > > SECONDARY_EXEC_DESC |
> > > > @@ -6667,8 +6670,7 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> > > > SECONDARY_EXEC_RDRAND_EXITING |
> > > > SECONDARY_EXEC_ENABLE_INVPCID |
> > > > SECONDARY_EXEC_RDSEED_EXITING |
> > > > - SECONDARY_EXEC_XSAVES |
> > > > - SECONDARY_EXEC_TSC_SCALING;
> > > > + SECONDARY_EXEC_XSAVES;
> > > > /*
> > >
> > > This is wrong because it _always_ disables SECONDARY_EXEC_TSC_SCALING,
> > > even if kvm_has_tsc_control == true.
> > >
> > > That said, I think a better implementation of this patch is to just add
> > > a version of evmcs_sanitize_exec_ctrls that takes a struct
> > > nested_vmx_msrs *, and call it at the end of nested_vmx_setup_ctl_msrs like
> > >
> > > evmcs_sanitize_nested_vmx_vsrs(msrs);
> >
> > Any reason not to use the already sanitized vmcs_config? I can't think of any
> > reason why the nested path should blindly use the raw MSR values from hardware.
>
> vmcs_config has the sanitized exec controls. But how do we construct MSR
> values using them?

I was thinking we could use the sanitized controls for the allowed-1 bits, and then
take the required-1 bits from the CPU. And then if we wanted to avoid the redundant
RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.

Hastily constructed and compile-tested only, proceed with caution :-)

---
arch/x86/kvm/vmx/nested.c | 35 ++++++++++++++++++++---------------
arch/x86/kvm/vmx/nested.h | 2 +-
arch/x86/kvm/vmx/vmx.c | 5 ++---
3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..67cbb6643efa 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6541,8 +6541,13 @@ static u64 nested_vmx_calc_vmcs_enum_msr(void)
* bit in the high half is on if the corresponding bit in the control field
* may be on. See also vmx_control_verify().
*/
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
{
+ struct nested_vmx_msrs *msrs = &vmcs_config.nested;
+
+ /* Take the allowed-1 bits from KVM's sanitized VMCS configuration. */
+ u32 ignore_high;
+
/*
* Note that as a general rule, the high half of the MSRs (bits in
* the control fields which may be 1) should be initialized by the
@@ -6559,11 +6564,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
*/

/* pin-based controls */
- rdmsr(MSR_IA32_VMX_PINBASED_CTLS,
- msrs->pinbased_ctls_low,
- msrs->pinbased_ctls_high);
+ rdmsr(MSR_IA32_VMX_PINBASED_CTLS, msrs->pinbased_ctls_low, ignore_high);
msrs->pinbased_ctls_low |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+ msrs->pinbased_ctls_high = vmcs_conf->pin_based_exec_ctrl;
msrs->pinbased_ctls_high &=
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
@@ -6574,12 +6579,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
PIN_BASED_VMX_PREEMPTION_TIMER;

/* exit controls */
- rdmsr(MSR_IA32_VMX_EXIT_CTLS,
- msrs->exit_ctls_low,
- msrs->exit_ctls_high);
+ rdmsr(MSR_IA32_VMX_EXIT_CTLS, msrs->exit_ctls_low, ignore_high);
msrs->exit_ctls_low =
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;

+ msrs->exit_ctls_high = vmcs_conf->vmexit_ctrl;
msrs->exit_ctls_high &=
#ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6595,11 +6599,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->exit_ctls_low &= ~VM_EXIT_SAVE_DEBUG_CONTROLS;

/* entry controls */
- rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
- msrs->entry_ctls_low,
- msrs->entry_ctls_high);
+ rdmsr(MSR_IA32_VMX_ENTRY_CTLS, msrs->entry_ctls_low, ignore_high);
msrs->entry_ctls_low =
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+
+ msrs->entry_ctls_high = vmcs_conf->vmentry_ctrl;
msrs->entry_ctls_high &=
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
@@ -6613,11 +6617,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->entry_ctls_low &= ~VM_ENTRY_LOAD_DEBUG_CONTROLS;

/* cpu-based controls */
- rdmsr(MSR_IA32_VMX_PROCBASED_CTLS,
- msrs->procbased_ctls_low,
- msrs->procbased_ctls_high);
+ rdmsr(MSR_IA32_VMX_PROCBASED_CTLS, msrs->procbased_ctls_low, ignore_high);
msrs->procbased_ctls_low =
CPU_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+
+ msrs->procbased_ctls_high = vmcs_conf->cpu_based_exec_ctrl;
msrs->procbased_ctls_high &=
CPU_BASED_INTR_WINDOW_EXITING |
CPU_BASED_NMI_WINDOW_EXITING | CPU_BASED_USE_TSC_OFFSETTING |
@@ -6653,10 +6657,11 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
*/
if (msrs->procbased_ctls_high & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)
rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2,
- msrs->secondary_ctls_low,
- msrs->secondary_ctls_high);
+ msrs->secondary_ctls_low, ignore_high);

msrs->secondary_ctls_low = 0;
+
+ msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
msrs->secondary_ctls_high &=
SECONDARY_EXEC_DESC |
SECONDARY_EXEC_ENABLE_RDTSCP |
diff --git a/arch/x86/kvm/vmx/nested.h b/arch/x86/kvm/vmx/nested.h
index c92cea0b8ccc..fae047c6204b 100644
--- a/arch/x86/kvm/vmx/nested.h
+++ b/arch/x86/kvm/vmx/nested.h
@@ -17,7 +17,7 @@ enum nvmx_vmentry_status {
};

void vmx_leave_nested(struct kvm_vcpu *vcpu);
-void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps);
+void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps);
void nested_vmx_hardware_unsetup(void);
__init int nested_vmx_hardware_setup(int (*exit_handlers[])(struct kvm_vcpu *));
void nested_vmx_set_vmcs_shadowing_bitmap(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9bd86ecccdab..cd0d0ffae0bf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7139,7 +7139,7 @@ static int __init vmx_check_processor_compat(void)
if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0)
return -EIO;
if (nested)
- nested_vmx_setup_ctls_msrs(&vmcs_conf.nested, vmx_cap.ept);
+ nested_vmx_setup_ctls_msrs(&vmcs_conf, vmx_cap.ept);
if (memcmp(&vmcs_config, &vmcs_conf, sizeof(struct vmcs_config)) != 0) {
printk(KERN_ERR "kvm: CPU %d feature inconsistency!\n",
smp_processor_id());
@@ -8079,8 +8079,7 @@ static __init int hardware_setup(void)
setup_default_sgx_lepubkeyhash();

if (nested) {
- nested_vmx_setup_ctls_msrs(&vmcs_config.nested,
- vmx_capability.ept);
+ nested_vmx_setup_ctls_msrs(&vmcs_config, vmx_capability.ept);

r = nested_vmx_hardware_setup(kvm_vmx_exit_handlers);
if (r)

base-commit: b821e4ff9e35a8fc999685e8d44c0644cfeaa228
--

2022-06-14 17:45:54

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On 6/14/22 17:13, Anirudh Rayabharam wrote:
>>> Sanitize at the end might not work because I see some cases in
>>> nested_vmx_setup_ctls_msrs() where we want to expose some things to L1
>>> even though the hardware doesn't support it.
>>
>> Yes, but these will never include eVMCS-unsupported features.
>
> How are you so sure?
>
> For example, SECONDARY_EXEC_SHADOW_VMCS is unsupported in eVMCS but in
> nested_vmx_setup_ctls_msrs() we do:
>
> 6675 /*
> 6676 * We can emulate "VMCS shadowing," even if the hardware
> 6677 * doesn't support it.
> 6678 */
> 6679 msrs->secondary_ctls_high |=
> 6680 SECONDARY_EXEC_SHADOW_VMCS;
>
> If we sanitize this out it might cause some regression right?

Yes, you're right, shadow VMCS is special: it is not supported by
enlightened VMCS, but it is emulated rather than virtualized.
Therefore, if L1 does not use the enlightened VMCS, it can indeed use
shadow VMCS.

Paolo

2022-06-22 08:12:08

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Sean Christopherson <[email protected]> writes:

> On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
>> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:

...

>> >
>> > Any reason not to use the already sanitized vmcs_config? I can't think of any
>> > reason why the nested path should blindly use the raw MSR values from hardware.
>>
>> vmcs_config has the sanitized exec controls. But how do we construct MSR
>> values using them?
>
> I was thinking we could use the sanitized controls for the allowed-1 bits, and then
> take the required-1 bits from the CPU. And then if we wanted to avoid the redundant
> RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
>
> Hastily constructed and compile-tested only, proceed with caution :-)

Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
the suggested patch and try to construct something for required-1 bits.

Thanks!

--
Vitaly

2022-06-22 14:07:01

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
> Sean Christopherson <[email protected]> writes:
>
> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
>
> ...
>
> >> >
> >> > Any reason not to use the already sanitized vmcs_config? I can't think of any
> >> > reason why the nested path should blindly use the raw MSR values from hardware.
> >>
> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
> >> values using them?
> >
> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
> > take the required-1 bits from the CPU. And then if we wanted to avoid the redundant
> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
> >
> > Hastily constructed and compile-tested only, proceed with caution :-)
>
> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
> the suggested patch and try to construct something for required-1 bits.

I tried this patch today but it causes some regression which causes
/dev/kvm to be unavailable in L1. I didn't get a chance to look into it
closely but I am guessing it has something to do with the fact that
vmcs_config reflects the config that L0 chose to use rather than what is
available to use. So constructing allowed-1 MSR bits based on what bits
are set in exec controls maybe isn't correct.


Thanks!

- Anirudh.

2022-06-22 15:01:51

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Anirudh Rayabharam <[email protected]> writes:

> On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
>> Sean Christopherson <[email protected]> writes:
>>
>> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
>> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
>>
>> ...
>>
>> >> >
>> >> > Any reason not to use the already sanitized vmcs_config? I can't think of any
>> >> > reason why the nested path should blindly use the raw MSR values from hardware.
>> >>
>> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
>> >> values using them?
>> >
>> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
>> > take the required-1 bits from the CPU. And then if we wanted to avoid the redundant
>> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
>> >
>> > Hastily constructed and compile-tested only, proceed with caution :-)
>>
>> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
>> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
>> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
>> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
>> the suggested patch and try to construct something for required-1 bits.
>
> I tried this patch today but it causes some regression which causes
> /dev/kvm to be unavailable in L1. I didn't get a chance to look into it
> closely but I am guessing it has something to do with the fact that
> vmcs_config reflects the config that L0 chose to use rather than what is
> available to use. So constructing allowed-1 MSR bits based on what bits
> are set in exec controls maybe isn't correct.

I've tried to pick it up but it's actually much harder than I think. The
patch has some minor issues ('&vmcs_config.nested' needs to be switched
to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
identify at least:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5e14e4c40007..8076352174ad 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2483,8 +2483,14 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
CPU_BASED_INVLPG_EXITING |
CPU_BASED_RDPMC_EXITING;

- opt = CPU_BASED_TPR_SHADOW |
+ opt = CPU_BASED_INTR_WINDOW_EXITING |
+ CPU_BASED_NMI_WINDOW_EXITING |
+ CPU_BASED_TPR_SHADOW |
+ CPU_BASED_USE_IO_BITMAPS |
CPU_BASED_USE_MSR_BITMAPS |
+ CPU_BASED_MONITOR_TRAP_FLAG |
+ CPU_BASED_RDTSC_EXITING |
+ CPU_BASED_PAUSE_EXITING |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
@@ -2582,6 +2588,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
#endif
opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
VM_EXIT_LOAD_IA32_PAT |
+ VM_EXIT_SAVE_IA32_PAT |
VM_EXIT_LOAD_IA32_EFER |
VM_EXIT_CLEAR_BNDCFGS |
VM_EXIT_PT_CONCEAL_PIP |
@@ -2604,7 +2611,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
_pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;

min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
- opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
+ opt =
+#ifdef CONFIG_X86_64
+ VM_ENTRY_IA32E_MODE |
+#endif
+ VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
VM_ENTRY_LOAD_IA32_PAT |
VM_ENTRY_LOAD_IA32_EFER |
VM_ENTRY_LOAD_BNDCFGS |

but it is 1) not sufficient because some controls are smartly filtered
out just because we don't want them for L1 -- and this doesn't mean that
L2 doesn't need them and 2) because if we add some 'opt' controls to
setup_vmcs_config() we need to filter them out somewhere else.

I'm starting to think we may just want to store raw VMX MSR values in
vmcs_config first, then sanitize them (eVMCS, vmx preemtoion timer bug,
perf_ctrl bug,..) and then do the adjust_vmx_controls() magic.

I'm not giving up yet but don't expect something small and backportable
to stable :-)

--
Vitaly

2022-06-22 16:27:04

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
> Anirudh Rayabharam <[email protected]> writes:
>
> > On Wed, Jun 22, 2022 at 10:00:29AM +0200, Vitaly Kuznetsov wrote:
> >> Sean Christopherson <[email protected]> writes:
> >>
> >> > On Tue, Jun 14, 2022, Anirudh Rayabharam wrote:
> >> >> On Mon, Jun 13, 2022 at 04:57:49PM +0000, Sean Christopherson wrote:
> >>
> >> ...
> >>
> >> >> >
> >> >> > Any reason not to use the already sanitized vmcs_config? I can't think of any
> >> >> > reason why the nested path should blindly use the raw MSR values from hardware.
> >> >>
> >> >> vmcs_config has the sanitized exec controls. But how do we construct MSR
> >> >> values using them?
> >> >
> >> > I was thinking we could use the sanitized controls for the allowed-1 bits, and then
> >> > take the required-1 bits from the CPU. And then if we wanted to avoid the redundant
> >> > RDMSRs in a follow-up patch we could add required-1 fields to vmcs_config.
> >> >
> >> > Hastily constructed and compile-tested only, proceed with caution :-)
> >>
> >> Independently from "[PATCH 00/11] KVM: VMX: Support TscScaling and
> >> EnclsExitingBitmap whith eVMCS" which is supposed to fix the particular
> >> TSC scaling issue, I like the idea to make nested_vmx_setup_ctls_msrs()
> >> use both allowed-1 and required-1 bits from vmcs_config. I'll pick up
> >> the suggested patch and try to construct something for required-1 bits.
> >
> > I tried this patch today but it causes some regression which causes
> > /dev/kvm to be unavailable in L1. I didn't get a chance to look into it
> > closely but I am guessing it has something to do with the fact that
> > vmcs_config reflects the config that L0 chose to use rather than what is
> > available to use. So constructing allowed-1 MSR bits based on what bits
> > are set in exec controls maybe isn't correct.
>
> I've tried to pick it up but it's actually much harder than I think. The
> patch has some minor issues ('&vmcs_config.nested' needs to be switched
> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
> identify at least:
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 5e14e4c40007..8076352174ad 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2483,8 +2483,14 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> CPU_BASED_INVLPG_EXITING |
> CPU_BASED_RDPMC_EXITING;
>
> - opt = CPU_BASED_TPR_SHADOW |
> + opt = CPU_BASED_INTR_WINDOW_EXITING |
> + CPU_BASED_NMI_WINDOW_EXITING |
> + CPU_BASED_TPR_SHADOW |
> + CPU_BASED_USE_IO_BITMAPS |
> CPU_BASED_USE_MSR_BITMAPS |
> + CPU_BASED_MONITOR_TRAP_FLAG |
> + CPU_BASED_RDTSC_EXITING |
> + CPU_BASED_PAUSE_EXITING |
> CPU_BASED_ACTIVATE_SECONDARY_CONTROLS |
> CPU_BASED_ACTIVATE_TERTIARY_CONTROLS;
> if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PROCBASED_CTLS,
> @@ -2582,6 +2588,7 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> #endif
> opt = VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL |
> VM_EXIT_LOAD_IA32_PAT |
> + VM_EXIT_SAVE_IA32_PAT |
> VM_EXIT_LOAD_IA32_EFER |
> VM_EXIT_CLEAR_BNDCFGS |
> VM_EXIT_PT_CONCEAL_PIP |
> @@ -2604,7 +2611,11 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> _pin_based_exec_control &= ~PIN_BASED_POSTED_INTR;
>
> min = VM_ENTRY_LOAD_DEBUG_CONTROLS;
> - opt = VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
> + opt =
> +#ifdef CONFIG_X86_64
> + VM_ENTRY_IA32E_MODE |
> +#endif
> + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL |
> VM_ENTRY_LOAD_IA32_PAT |
> VM_ENTRY_LOAD_IA32_EFER |
> VM_ENTRY_LOAD_BNDCFGS |
>
> but it is 1) not sufficient because some controls are smartly filtered
> out just because we don't want them for L1 -- and this doesn't mean that
> L2 doesn't need them and 2) because if we add some 'opt' controls to
> setup_vmcs_config() we need to filter them out somewhere else.
>
> I'm starting to think we may just want to store raw VMX MSR values in
> vmcs_config first, then sanitize them (eVMCS, vmx preemtoion timer bug,
> perf_ctrl bug,..) and then do the adjust_vmx_controls() magic.
>
> I'm not giving up yet but don't expect something small and backportable
> to stable :-)

How about we do something simple like the patch below to start with?
This will easily apply to stable and we can continue improving upon
it with follow up patches on mainline.

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..f88d748c7cc6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->pinbased_ctls_high);
msrs->pinbased_ctls_low |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
+#endif
msrs->pinbased_ctls_high &=
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
@@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->exit_ctls_low =
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;

+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+#endif
msrs->exit_ctls_high &=
#ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->entry_ctls_high);
msrs->entry_ctls_low =
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+#endif
msrs->entry_ctls_high &=
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
@@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->secondary_ctls_high);

msrs->secondary_ctls_low = 0;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+#endif
msrs->secondary_ctls_high &=
SECONDARY_EXEC_DESC |
SECONDARY_EXEC_ENABLE_RDTSCP |

2022-06-22 16:57:30

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Anirudh Rayabharam <[email protected]> writes:

> On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:

...

>>
>> I've tried to pick it up but it's actually much harder than I think. The
>> patch has some minor issues ('&vmcs_config.nested' needs to be switched
>> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
>> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
>> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
>> identify at least:

...

I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
setting up nested VMX MSRs" which implements Sean's suggestion. Hope
this is the way to go for mainline.

>
> How about we do something simple like the patch below to start with?
> This will easily apply to stable and we can continue improving upon
> it with follow up patches on mainline.
>

Personally, I'm not against this for @stable. Alternatively, in case the
only observed issue is with TSC scaling, we can add support for it for
KVM-on-Hyper-V but not for Hyper-V-on-KVM (a small subset of "[PATCH
00/11] KVM: VMX: Support TscScaling and EnclsExitingBitmap whith
eVMCS"). I can prepare patches if needed.

--
Vitaly

2022-06-23 10:26:13

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Wed, Jun 22, 2022 at 06:48:50PM +0200, Vitaly Kuznetsov wrote:
> Anirudh Rayabharam <[email protected]> writes:
>
> > On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
>
> ...
>
> >>
> >> I've tried to pick it up but it's actually much harder than I think. The
> >> patch has some minor issues ('&vmcs_config.nested' needs to be switched
> >> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
> >> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
> >> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
> >> identify at least:
>
> ...
>
> I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
> setting up nested VMX MSRs" which implements Sean's suggestion. Hope
> this is the way to go for mainline.
>
> >
> > How about we do something simple like the patch below to start with?
> > This will easily apply to stable and we can continue improving upon
> > it with follow up patches on mainline.
> >
>
> Personally, I'm not against this for @stable. Alternatively, in case the

I think it's a good intermediate fix for mainline too. It is easier to land
it in stable if it already exists in mainline. It can stay in mainline
until your series lands and replaces it with the vmcs_config approach.

What do you think?

> only observed issue is with TSC scaling, we can add support for it for
> KVM-on-Hyper-V but not for Hyper-V-on-KVM (a small subset of "[PATCH
> 00/11] KVM: VMX: Support TscScaling and EnclsExitingBitmap whith
> eVMCS"). I can prepare patches if needed.

Will it fit in stable's 100 line rule?

Thanks!

- Anirudh.

2022-06-23 11:58:12

by Vitaly Kuznetsov

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

Anirudh Rayabharam <[email protected]> writes:

> On Wed, Jun 22, 2022 at 06:48:50PM +0200, Vitaly Kuznetsov wrote:
>> Anirudh Rayabharam <[email protected]> writes:
>>
>> > On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
>>
>> ...
>>
>> >>
>> >> I've tried to pick it up but it's actually much harder than I think. The
>> >> patch has some minor issues ('&vmcs_config.nested' needs to be switched
>> >> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
>> >> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
>> >> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
>> >> identify at least:
>>
>> ...
>>
>> I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
>> setting up nested VMX MSRs" which implements Sean's suggestion. Hope
>> this is the way to go for mainline.
>>
>> >
>> > How about we do something simple like the patch below to start with?
>> > This will easily apply to stable and we can continue improving upon
>> > it with follow up patches on mainline.
>> >
>>
>> Personally, I'm not against this for @stable. Alternatively, in case the
>
> I think it's a good intermediate fix for mainline too. It is easier to land
> it in stable if it already exists in mainline. It can stay in mainline
> until your series lands and replaces it with the vmcs_config approach.
>
> What do you think?
>

Paolo's call but personally I think both series can make 5.20 so there's
no need for an intermediate solution.

>> only observed issue is with TSC scaling, we can add support for it for
>> KVM-on-Hyper-V but not for Hyper-V-on-KVM (a small subset of "[PATCH
>> 00/11] KVM: VMX: Support TscScaling and EnclsExitingBitmap whith
>> eVMCS"). I can prepare patches if needed.
>
> Will it fit in stable's 100 line rule?
>

Yes, please take a look at the attached patches (5.18.y based). First 3
are identical to what I've sent for mainline, the last one is reduced to
only support TSC scaling for KVM on Hyper-V (but not Hyper-V on
KVM). Compile tested only, proceed with caution)

--
Vitaly


Attachments:
0001-x86-hyperv-Fix-struct-hv_enlightened_vmcs-definition.patch (2.16 kB)
0002-x86-hyperv-Update-struct-hv_enlightened_vmcs-definit.patch (1.58 kB)
0003-KVM-VMX-Define-VMCS-to-EVMCS-conversion-for-the-new-.patch (3.12 kB)
0004-KVM-VMX-Support-TSC-scaling-with-enlightened-VMCS.patch (3.32 kB)
Download all attachments

2022-06-28 11:03:32

by Anirudh Rayabharam

[permalink] [raw]
Subject: Re: [PATCH] KVM: nVMX: Don't expose TSC scaling to L1 when on Hyper-V

On Thu, Jun 23, 2022 at 01:49:30PM +0200, Vitaly Kuznetsov wrote:
> Anirudh Rayabharam <[email protected]> writes:
>
> > On Wed, Jun 22, 2022 at 06:48:50PM +0200, Vitaly Kuznetsov wrote:
> >> Anirudh Rayabharam <[email protected]> writes:
> >>
> >> > On Wed, Jun 22, 2022 at 04:35:27PM +0200, Vitaly Kuznetsov wrote:
> >>
> >> ...
> >>
> >> >>
> >> >> I've tried to pick it up but it's actually much harder than I think. The
> >> >> patch has some minor issues ('&vmcs_config.nested' needs to be switched
> >> >> to '&vmcs_conf->nested' in nested_vmx_setup_ctls_msrs()), but the main
> >> >> problem is that the set of controls nested_vmx_setup_ctls_msrs() needs
> >> >> is NOT a subset of vmcs_config (setup_vmcs_config()). I was able to
> >> >> identify at least:
> >>
> >> ...
> >>
> >> I've jsut sent "[PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for
> >> setting up nested VMX MSRs" which implements Sean's suggestion. Hope
> >> this is the way to go for mainline.
> >>
> >> >
> >> > How about we do something simple like the patch below to start with?
> >> > This will easily apply to stable and we can continue improving upon
> >> > it with follow up patches on mainline.
> >> >
> >>
> >> Personally, I'm not against this for @stable. Alternatively, in case the
> >
> > I think it's a good intermediate fix for mainline too. It is easier to land
> > it in stable if it already exists in mainline. It can stay in mainline
> > until your series lands and replaces it with the vmcs_config approach.
> >
> > What do you think?
> >
>
> Paolo's call but personally I think both series can make 5.20 so there's
> no need for an intermediate solution.

Only reason I see for this intermediate solution is to automatically
land the fix in stable without bothering to write a special backport.

I will send it as a proper patch and see if there is any interest in
taking it.

- Anirudh.