2020-08-28 08:55:23

by Chenyi Qiang

[permalink] [raw]
Subject: [PATCH 0/5] Fix nested VMX controls MSRs

The first three patches fix a issue for the nested VMX controls MSRs. The
issue happens when I use QEMU to run nested VM. The VM_{ENTRY,
EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS
in L1 MSR_IA32_VMX_TRUE_{ENTRY, EXIT}_CTLS MSR are always cleared
regardless of whether it supports in L1. This is because QEMU gets the
nested VMX MSRs from vmcs_config.nested_vmx_msrs which doesn't expose
these two fields. Then, when QEMU initializes the features MSRs after
SET_CPUID, it will override the nested VMX MSR values which has been
updated according to guest CPUID during SET_CPUID. This patch series
just expose the missing fields in nested VMX {ENTRY, EXIT} controls
MSR and adds the support to update nested VMX MSRs after set_vmx_msrs.

The last two patches are a minor fix and cleanup.

Chenyi Qiang (5):
KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled
KVM: nVMX: Verify the VMX controls MSRs with the global capability
when setting VMX MSRs
KVM: nVMX: Update VMX controls MSR according to guest CPUID after
setting VMX MSRs
KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL
control
KVM: nVMX: Simplify the initialization of nested_vmx_msrs

arch/x86/kvm/vmx/nested.c | 79 +++++++++++++++++++++++++++------------
arch/x86/kvm/vmx/vmx.c | 9 +++--
2 files changed, 62 insertions(+), 26 deletions(-)

--
2.17.1


2020-08-28 08:55:36

by Chenyi Qiang

[permalink] [raw]
Subject: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled

KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they doesn't expose during
the setup of nested VMX controls MSR.

Signed-off-by: Chenyi Qiang <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 23b58c28a1c9..6e0e71f4d45f 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6310,7 +6310,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
#ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
#endif
- VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
+ VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
+ VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
msrs->exit_ctls_high |=
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
@@ -6329,7 +6330,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
#endif
- VM_ENTRY_LOAD_IA32_PAT;
+ VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
+ VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
msrs->entry_ctls_high |=
(VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER);

--
2.17.1

2020-08-28 08:56:00

by Chenyi Qiang

[permalink] [raw]
Subject: [PATCH 3/5] KVM: nVMX: Update VMX controls MSR according to guest CPUID after setting VMX MSRs

Update the fields (i.e. VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS and
VM_{ENTRY, EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL) in
nested MSR_IA32_VMX_TRUE_{ENTRY, EXIT}_CTLS according to guest CPUID
when user space initializes the features MSRs. Regardless of the order
of SET_CPUID and SET_MSRS from the user space, do the update to avoid
MSR values overriding.

Signed-off-by: Chenyi Qiang <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 819c185adf09..f9664ccc003b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -345,6 +345,7 @@ static bool guest_state_valid(struct kvm_vcpu *vcpu);
static u32 vmx_segment_access_rights(struct kvm_segment *var);
static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
u32 msr, int type);
+static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);

void vmx_vmexit(void);

@@ -2161,7 +2162,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1; /* they are read-only */
if (!nested_vmx_allowed(vcpu))
return 1;
- return vmx_set_vmx_msr(vcpu, msr_index, data);
+ ret = vmx_set_vmx_msr(vcpu, msr_index, data);
+ nested_vmx_pmu_entry_exit_ctls_update(vcpu);
+ nested_vmx_entry_exit_ctls_update(vcpu);
+ break;
case MSR_IA32_RTIT_CTL:
if (!vmx_pt_mode_is_host_guest() ||
vmx_rtit_ctl_check(vcpu, data) ||
--
2.17.1

2020-08-28 08:57:11

by Chenyi Qiang

[permalink] [raw]
Subject: [PATCH 5/5] KVM: nVMX: Simplify the initialization of nested_vmx_msrs

The nested VMX controls MSRs can be initialized by the global capability
values stored in vmcs_config.

Signed-off-by: Chenyi Qiang <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f9664ccc003b..0c36cd664222 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7019,8 +7019,7 @@ static int vmx_create_vcpu(struct kvm_vcpu *vcpu)
}

if (nested)
- nested_vmx_setup_ctls_msrs(&vmx->nested.msrs,
- vmx_capability.ept);
+ memcpy(&vmx->nested.msrs, &vmcs_config.nested, sizeof(vmx->nested.msrs));
else
memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs));

--
2.17.1

2020-08-28 08:58:36

by Chenyi Qiang

[permalink] [raw]
Subject: [PATCH 4/5] KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control

A minor fix for the update of VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL field
in exit_ctls_high.

Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
VM-{Entry,Exit} control")
Signed-off-by: Chenyi Qiang <[email protected]>
Reviewed-by: Xiaoyao Li <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 47bee53e235a..7d15765dfe17 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4699,7 +4699,7 @@ void nested_vmx_pmu_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
vmx->nested.msrs.entry_ctls_high &=
~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
vmx->nested.msrs.exit_ctls_high &=
- ~VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
+ ~VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
}
}

--
2.17.1

2020-08-28 08:59:18

by Chenyi Qiang

[permalink] [raw]
Subject: [PATCH 2/5] KVM: nVMX: Verify the VMX controls MSRs with the global capability when setting VMX MSRs

When setting the nested VMX MSRs, verify it with the values in
vmcs_config.nested_vmx_msrs, which reflects the global capability of
VMX controls MSRs.

Signed-off-by: Chenyi Qiang <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 71 ++++++++++++++++++++++++++++-----------
1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6e0e71f4d45f..47bee53e235a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1234,7 +1234,7 @@ static int vmx_restore_vmx_basic(struct vcpu_vmx *vmx, u64 data)
BIT_ULL(49) | BIT_ULL(54) | BIT_ULL(55) |
/* reserved */
BIT_ULL(31) | GENMASK_ULL(47, 45) | GENMASK_ULL(63, 56);
- u64 vmx_basic = vmx->nested.msrs.basic;
+ u64 vmx_basic = vmcs_config.nested.basic;

if (!is_bitwise_subset(vmx_basic, data, feature_and_reserved))
return -EINVAL;
@@ -1265,24 +1265,24 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)

switch (msr_index) {
case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
- lowp = &vmx->nested.msrs.pinbased_ctls_low;
- highp = &vmx->nested.msrs.pinbased_ctls_high;
+ lowp = &vmcs_config.nested.pinbased_ctls_low;
+ highp = &vmcs_config.nested.pinbased_ctls_high;
break;
case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
- lowp = &vmx->nested.msrs.procbased_ctls_low;
- highp = &vmx->nested.msrs.procbased_ctls_high;
+ lowp = &vmcs_config.nested.procbased_ctls_low;
+ highp = &vmcs_config.nested.procbased_ctls_high;
break;
case MSR_IA32_VMX_TRUE_EXIT_CTLS:
- lowp = &vmx->nested.msrs.exit_ctls_low;
- highp = &vmx->nested.msrs.exit_ctls_high;
+ lowp = &vmcs_config.nested.exit_ctls_low;
+ highp = &vmcs_config.nested.exit_ctls_high;
break;
case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
- lowp = &vmx->nested.msrs.entry_ctls_low;
- highp = &vmx->nested.msrs.entry_ctls_high;
+ lowp = &vmcs_config.nested.entry_ctls_low;
+ highp = &vmcs_config.nested.entry_ctls_high;
break;
case MSR_IA32_VMX_PROCBASED_CTLS2:
- lowp = &vmx->nested.msrs.secondary_ctls_low;
- highp = &vmx->nested.msrs.secondary_ctls_high;
+ lowp = &vmcs_config.nested.secondary_ctls_low;
+ highp = &vmcs_config.nested.secondary_ctls_high;
break;
default:
BUG();
@@ -1298,8 +1298,30 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
if (!is_bitwise_subset(supported, data, GENMASK_ULL(63, 32)))
return -EINVAL;

- *lowp = data;
- *highp = data >> 32;
+ switch (msr_index) {
+ case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
+ vmx->nested.msrs.pinbased_ctls_low = data;
+ vmx->nested.msrs.pinbased_ctls_high = data >> 32;
+ break;
+ case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+ vmx->nested.msrs.procbased_ctls_low = data;
+ vmx->nested.msrs.procbased_ctls_high = data >> 32;
+ break;
+ case MSR_IA32_VMX_TRUE_EXIT_CTLS:
+ vmx->nested.msrs.exit_ctls_low = data;
+ vmx->nested.msrs.exit_ctls_high = data >> 32;
+ break;
+ case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
+ vmx->nested.msrs.entry_ctls_low = data;
+ vmx->nested.msrs.entry_ctls_high = data >> 32;
+ break;
+ case MSR_IA32_VMX_PROCBASED_CTLS2:
+ vmx->nested.msrs.secondary_ctls_low = data;
+ vmx->nested.msrs.secondary_ctls_high = data >> 32;
+ break;
+ default:
+ BUG();
+ }
return 0;
}

@@ -1313,8 +1335,8 @@ static int vmx_restore_vmx_misc(struct vcpu_vmx *vmx, u64 data)
GENMASK_ULL(13, 9) | BIT_ULL(31);
u64 vmx_misc;

- vmx_misc = vmx_control_msr(vmx->nested.msrs.misc_low,
- vmx->nested.msrs.misc_high);
+ vmx_misc = vmx_control_msr(vmcs_config.nested.misc_low,
+ vmcs_config.nested.misc_high);

if (!is_bitwise_subset(vmx_misc, data, feature_and_reserved_bits))
return -EINVAL;
@@ -1344,8 +1366,8 @@ static int vmx_restore_vmx_ept_vpid_cap(struct vcpu_vmx *vmx, u64 data)
{
u64 vmx_ept_vpid_cap;

- vmx_ept_vpid_cap = vmx_control_msr(vmx->nested.msrs.ept_caps,
- vmx->nested.msrs.vpid_caps);
+ vmx_ept_vpid_cap = vmx_control_msr(vmcs_config.nested.ept_caps,
+ vmcs_config.nested.vpid_caps);

/* Every bit is either reserved or a feature bit. */
if (!is_bitwise_subset(vmx_ept_vpid_cap, data, -1ULL))
@@ -1362,10 +1384,10 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)

switch (msr_index) {
case MSR_IA32_VMX_CR0_FIXED0:
- msr = &vmx->nested.msrs.cr0_fixed0;
+ msr = &vmcs_config.nested.cr0_fixed0;
break;
case MSR_IA32_VMX_CR4_FIXED0:
- msr = &vmx->nested.msrs.cr4_fixed0;
+ msr = &vmcs_config.nested.cr4_fixed0;
break;
default:
BUG();
@@ -1378,7 +1400,16 @@ static int vmx_restore_fixed0_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
if (!is_bitwise_subset(data, *msr, -1ULL))
return -EINVAL;

- *msr = data;
+ switch (msr_index) {
+ case MSR_IA32_VMX_CR0_FIXED0:
+ vmx->nested.msrs.cr0_fixed0 = data;
+ break;
+ case MSR_IA32_VMX_CR4_FIXED0:
+ vmx->nested.msrs.cr4_fixed0 = data;
+ break;
+ default:
+ BUG();
+ }
return 0;
}

--
2.17.1

2020-08-28 17:47:40

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled

On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
>
> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
> VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they doesn't expose during
> the setup of nested VMX controls MSR.
>

Aren't these features added conditionally in
nested_vmx_entry_exit_ctls_update() and
nested_vmx_pmu_entry_exit_ctls_update()?

2020-08-28 18:25:11

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: nVMX: Verify the VMX controls MSRs with the global capability when setting VMX MSRs

On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
>
> When setting the nested VMX MSRs, verify it with the values in
> vmcs_config.nested_vmx_msrs, which reflects the global capability of
> VMX controls MSRs.
>
> Signed-off-by: Chenyi Qiang <[email protected]>

You seem to have entirely missed the point of this code, which is to
prevent userspace from adding features that have previously been
removed for this vCPU (e.g as a side-effect of KVM_SET_CPUID).

2020-08-28 18:30:31

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 4/5] KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control

On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
>
> A minor fix for the update of VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL field
> in exit_ctls_high.
>
> Fixes: 03a8871add95 ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
> VM-{Entry,Exit} control")
> Signed-off-by: Chenyi Qiang <[email protected]>
> Reviewed-by: Xiaoyao Li <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2020-08-28 20:40:56

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 3/5] KVM: nVMX: Update VMX controls MSR according to guest CPUID after setting VMX MSRs

On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
>
> Update the fields (i.e. VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS and
> VM_{ENTRY, EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL) in
> nested MSR_IA32_VMX_TRUE_{ENTRY, EXIT}_CTLS according to guest CPUID
> when user space initializes the features MSRs. Regardless of the order
> of SET_CPUID and SET_MSRS from the user space, do the update to avoid
> MSR values overriding.
>
> Signed-off-by: Chenyi Qiang <[email protected]>
> ---
> arch/x86/kvm/vmx/vmx.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 819c185adf09..f9664ccc003b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -345,6 +345,7 @@ static bool guest_state_valid(struct kvm_vcpu *vcpu);
> static u32 vmx_segment_access_rights(struct kvm_segment *var);
> static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> u32 msr, int type);
> +static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
>
> void vmx_vmexit(void);
>
> @@ -2161,7 +2162,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> return 1; /* they are read-only */
> if (!nested_vmx_allowed(vcpu))
> return 1;
> - return vmx_set_vmx_msr(vcpu, msr_index, data);
> + ret = vmx_set_vmx_msr(vcpu, msr_index, data);
> + nested_vmx_pmu_entry_exit_ctls_update(vcpu);
> + nested_vmx_entry_exit_ctls_update(vcpu);
> + break;

Now I see what you're doing. This commit should probably come before
the previous commit, so that at no point in the series can userspace
set VMX MSR bits that should be cleared based on the guest CPUID.

There's an ABI change here: userspace may no longer get -EINVAL if it
tries to set an illegal VMX MSR bit. Instead, some illegal bits are
silently cleared. Moreover, these functions will potentially set VMX
MSR bits that userspace has just asked to clear.

> case MSR_IA32_RTIT_CTL:
> if (!vmx_pt_mode_is_host_guest() ||
> vmx_rtit_ctl_check(vcpu, data) ||
> --
> 2.17.1
>

2020-08-29 01:52:51

by Chenyi Qiang

[permalink] [raw]
Subject: Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled



On 8/29/2020 1:43 AM, Jim Mattson wrote:
> On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
>>
>> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
>> VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they doesn't expose during
>> the setup of nested VMX controls MSR.
>>
>
> Aren't these features added conditionally in
> nested_vmx_entry_exit_ctls_update() and
> nested_vmx_pmu_entry_exit_ctls_update()?
>

Yes, but I assume vmcs_config.nested should reflect the global
capability of VMX MSR. KVM supports these two controls, so should be
exposed here.

2020-08-29 02:53:36

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled

On 8/29/2020 9:49 AM, Chenyi Qiang wrote:
>
>
> On 8/29/2020 1:43 AM, Jim Mattson wrote:
>> On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]>
>> wrote:
>>>
>>> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
>>> VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they doesn't expose during
>>> the setup of nested VMX controls MSR.
>>>
>>
>> Aren't these features added conditionally in
>> nested_vmx_entry_exit_ctls_update() and
>> nested_vmx_pmu_entry_exit_ctls_update()?
>>
>
> Yes, but I assume vmcs_config.nested should reflect the global
> capability of VMX MSR. KVM supports these two controls, so should be
> exposed here.

No. I prefer to say they are removed conditionally in
nested_vmx_entry_exit_ctls_update() and
nested_vmx_pmu_entry_exit_ctls_update().

Userspace calls vmx_get_msr_feature() to query what KVM supports for
these VMX MSR. In vmx_get_msr_feature(), it returns the value of
vmcs_config.nested. As KVM supports these two bits, we should advertise
them in vmcs_config.nested and report to userspace.

2020-08-31 03:19:16

by Chenyi Qiang

[permalink] [raw]
Subject: Re: [PATCH 2/5] KVM: nVMX: Verify the VMX controls MSRs with the global capability when setting VMX MSRs



On 8/29/2020 2:23 AM, Jim Mattson wrote:
> On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
>>
>> When setting the nested VMX MSRs, verify it with the values in
>> vmcs_config.nested_vmx_msrs, which reflects the global capability of
>> VMX controls MSRs.
>>
>> Signed-off-by: Chenyi Qiang <[email protected]>
>
> You seem to have entirely missed the point of this code, which is to
> prevent userspace from adding features that have previously been
> removed for this vCPU (e.g as a side-effect of KVM_SET_CPUID).
>

We only have the case that the scope of features set by userspace is
always reduced, right? If so, we don't need the change here.

2020-08-31 17:39:05

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled

On Fri, Aug 28, 2020 at 7:51 PM Xiaoyao Li <[email protected]> wrote:
>
> On 8/29/2020 9:49 AM, Chenyi Qiang wrote:
> >
> >
> > On 8/29/2020 1:43 AM, Jim Mattson wrote:
> >> On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]>
> >> wrote:
> >>>
> >>> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
> >>> VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they doesn't expose during
> >>> the setup of nested VMX controls MSR.
> >>>
> >>
> >> Aren't these features added conditionally in
> >> nested_vmx_entry_exit_ctls_update() and
> >> nested_vmx_pmu_entry_exit_ctls_update()?
> >>
> >
> > Yes, but I assume vmcs_config.nested should reflect the global
> > capability of VMX MSR. KVM supports these two controls, so should be
> > exposed here.
>
> No. I prefer to say they are removed conditionally in
> nested_vmx_entry_exit_ctls_update() and
> nested_vmx_pmu_entry_exit_ctls_update().
>
> Userspace calls vmx_get_msr_feature() to query what KVM supports for
> these VMX MSR. In vmx_get_msr_feature(), it returns the value of
> vmcs_config.nested. As KVM supports these two bits, we should advertise
> them in vmcs_config.nested and report to userspace.

It would be nice if there was an API to query what MSR values KVM
supports for a specific VCPU configuration, but given that this is a
system ioctl, I agree with you.

2020-09-01 03:28:36

by Xiaoyao Li

[permalink] [raw]
Subject: Re: [PATCH 1/5] KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled

On 8/28/2020 4:56 PM, Chenyi Qiang wrote:
> KVM supports the nested VM_{EXIT, ENTRY}_LOAD_IA32_PERF_GLOBAL_CTRL and
> VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS, but they doesn't expose during
> the setup of nested VMX controls MSR.
>
> Signed-off-by: Chenyi Qiang <[email protected]>

Reviewed-by: Xiaoyao Li <[email protected]>

> ---
> arch/x86/kvm/vmx/nested.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 23b58c28a1c9..6e0e71f4d45f 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -6310,7 +6310,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> #ifdef CONFIG_X86_64
> VM_EXIT_HOST_ADDR_SPACE_SIZE |
> #endif
> - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
> + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
> + VM_EXIT_CLEAR_BNDCFGS | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL;
> msrs->exit_ctls_high |=
> VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER |
> @@ -6329,7 +6330,8 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
> #ifdef CONFIG_X86_64
> VM_ENTRY_IA32E_MODE |
> #endif
> - VM_ENTRY_LOAD_IA32_PAT;
> + VM_ENTRY_LOAD_IA32_PAT | VM_ENTRY_LOAD_BNDCFGS |
> + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL;
> msrs->entry_ctls_high |=
> (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER);
>
>

2020-09-02 19:38:48

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH 3/5] KVM: nVMX: Update VMX controls MSR according to guest CPUID after setting VMX MSRs

On Fri, Aug 28, 2020 at 01:39:39PM -0700, Jim Mattson wrote:
> On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
> >
> > Update the fields (i.e. VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS and
> > VM_{ENTRY, EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL) in
> > nested MSR_IA32_VMX_TRUE_{ENTRY, EXIT}_CTLS according to guest CPUID
> > when user space initializes the features MSRs. Regardless of the order
> > of SET_CPUID and SET_MSRS from the user space, do the update to avoid
> > MSR values overriding.
> >
> > Signed-off-by: Chenyi Qiang <[email protected]>
> > ---
> > arch/x86/kvm/vmx/vmx.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 819c185adf09..f9664ccc003b 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -345,6 +345,7 @@ static bool guest_state_valid(struct kvm_vcpu *vcpu);
> > static u32 vmx_segment_access_rights(struct kvm_segment *var);
> > static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> > u32 msr, int type);
> > +static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
> >
> > void vmx_vmexit(void);
> >
> > @@ -2161,7 +2162,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > return 1; /* they are read-only */
> > if (!nested_vmx_allowed(vcpu))
> > return 1;
> > - return vmx_set_vmx_msr(vcpu, msr_index, data);
> > + ret = vmx_set_vmx_msr(vcpu, msr_index, data);
> > + nested_vmx_pmu_entry_exit_ctls_update(vcpu);
> > + nested_vmx_entry_exit_ctls_update(vcpu);
> > + break;
>
> Now I see what you're doing. This commit should probably come before
> the previous commit, so that at no point in the series can userspace
> set VMX MSR bits that should be cleared based on the guest CPUID.
>
> There's an ABI change here: userspace may no longer get -EINVAL if it
> tries to set an illegal VMX MSR bit. Instead, some illegal bits are
> silently cleared. Moreover, these functions will potentially set VMX
> MSR bits that userspace has just asked to clear.

Can we simply remove nested_vmx_entry_exit_ctls_update() and
nested_vmx_pmu_entry_exit_ctls_update()? It's userspace's responsibility
to present a valid vCPU model to the guest, I don't see any reason to
silently tweak the VMX MSRs unless allowing the bogus config breaks KVM.
E.g. there are many more controls that are non-sensical without "native"
support for the associated feature.

2020-09-02 19:39:11

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 3/5] KVM: nVMX: Update VMX controls MSR according to guest CPUID after setting VMX MSRs

On Wed, Sep 2, 2020 at 11:16 AM Sean Christopherson
<[email protected]> wrote:
>
> On Fri, Aug 28, 2020 at 01:39:39PM -0700, Jim Mattson wrote:
> > On Fri, Aug 28, 2020 at 1:54 AM Chenyi Qiang <[email protected]> wrote:
> > >
> > > Update the fields (i.e. VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS and
> > > VM_{ENTRY, EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL) in
> > > nested MSR_IA32_VMX_TRUE_{ENTRY, EXIT}_CTLS according to guest CPUID
> > > when user space initializes the features MSRs. Regardless of the order
> > > of SET_CPUID and SET_MSRS from the user space, do the update to avoid
> > > MSR values overriding.
> > >
> > > Signed-off-by: Chenyi Qiang <[email protected]>
> > > ---
> > > arch/x86/kvm/vmx/vmx.c | 6 +++++-
> > > 1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > > index 819c185adf09..f9664ccc003b 100644
> > > --- a/arch/x86/kvm/vmx/vmx.c
> > > +++ b/arch/x86/kvm/vmx/vmx.c
> > > @@ -345,6 +345,7 @@ static bool guest_state_valid(struct kvm_vcpu *vcpu);
> > > static u32 vmx_segment_access_rights(struct kvm_segment *var);
> > > static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
> > > u32 msr, int type);
> > > +static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
> > >
> > > void vmx_vmexit(void);
> > >
> > > @@ -2161,7 +2162,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > > return 1; /* they are read-only */
> > > if (!nested_vmx_allowed(vcpu))
> > > return 1;
> > > - return vmx_set_vmx_msr(vcpu, msr_index, data);
> > > + ret = vmx_set_vmx_msr(vcpu, msr_index, data);
> > > + nested_vmx_pmu_entry_exit_ctls_update(vcpu);
> > > + nested_vmx_entry_exit_ctls_update(vcpu);
> > > + break;
> >
> > Now I see what you're doing. This commit should probably come before
> > the previous commit, so that at no point in the series can userspace
> > set VMX MSR bits that should be cleared based on the guest CPUID.
> >
> > There's an ABI change here: userspace may no longer get -EINVAL if it
> > tries to set an illegal VMX MSR bit. Instead, some illegal bits are
> > silently cleared. Moreover, these functions will potentially set VMX
> > MSR bits that userspace has just asked to clear.
>
> Can we simply remove nested_vmx_entry_exit_ctls_update() and
> nested_vmx_pmu_entry_exit_ctls_update()? It's userspace's responsibility
> to present a valid vCPU model to the guest, I don't see any reason to
> silently tweak the VMX MSRs unless allowing the bogus config breaks KVM.
> E.g. there are many more controls that are non-sensical without "native"
> support for the associated feature.

We might need a test for kvm_mpx_supported() here:

/* If not VM_EXIT_CLEAR_BNDCFGS, the L2 value propagates to L1. */
if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
vmcs_write64(GUEST_BNDCFGS, 0);

BTW, where does the L2 value propagate to L1 if not VM_EXIT_CLEAR_BNDCFGS?

2020-09-11 17:12:40

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 0/5] Fix nested VMX controls MSRs

On 28/08/20 10:56, Chenyi Qiang wrote:
> The first three patches fix a issue for the nested VMX controls MSRs. The
> issue happens when I use QEMU to run nested VM. The VM_{ENTRY,
> EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and VM_{ENTRY_LOAD, EXIT_CLEAR}_BNDCFGS
> in L1 MSR_IA32_VMX_TRUE_{ENTRY, EXIT}_CTLS MSR are always cleared
> regardless of whether it supports in L1. This is because QEMU gets the
> nested VMX MSRs from vmcs_config.nested_vmx_msrs which doesn't expose
> these two fields. Then, when QEMU initializes the features MSRs after
> SET_CPUID, it will override the nested VMX MSR values which has been
> updated according to guest CPUID during SET_CPUID. This patch series
> just expose the missing fields in nested VMX {ENTRY, EXIT} controls
> MSR and adds the support to update nested VMX MSRs after set_vmx_msrs.
>
> The last two patches are a minor fix and cleanup.
>
> Chenyi Qiang (5):
> KVM: nVMX: Fix VMX controls MSRs setup when nested VMX enabled
> KVM: nVMX: Verify the VMX controls MSRs with the global capability
> when setting VMX MSRs
> KVM: nVMX: Update VMX controls MSR according to guest CPUID after
> setting VMX MSRs
> KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL
> control
> KVM: nVMX: Simplify the initialization of nested_vmx_msrs
>
> arch/x86/kvm/vmx/nested.c | 79 +++++++++++++++++++++++++++------------
> arch/x86/kvm/vmx/vmx.c | 9 +++--
> 2 files changed, 62 insertions(+), 26 deletions(-)
>

Queued patch 1/4/5, thanks.

Paolo

2020-09-11 17:13:31

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/5] KVM: nVMX: Update VMX controls MSR according to guest CPUID after setting VMX MSRs

On 02/09/20 20:32, Jim Mattson wrote:
>
> /* If not VM_EXIT_CLEAR_BNDCFGS, the L2 value propagates to L1. */
> if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
> vmcs_write64(GUEST_BNDCFGS, 0);
>
> BTW, where does the L2 value propagate to L1 if not VM_EXIT_CLEAR_BNDCFGS?

Hmm, nowhere. :/ Probably something like this (not really thought through):

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 1e903d51912b..aba76aa99465 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3317,7 +3317,8 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
if (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS))
vmx->nested.vmcs01_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
if (kvm_mpx_supported() &&
- !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ (!(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS) ||
+ !(vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)))
vmx->nested.vmcs01_guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS);

/*
@@ -4186,9 +4187,12 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
vmcs_write32(GUEST_IDTR_LIMIT, 0xFFFF);
vmcs_write32(GUEST_GDTR_LIMIT, 0xFFFF);

- /* If not VM_EXIT_CLEAR_BNDCFGS, the L2 value propagates to L1. */
- if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
- vmcs_write64(GUEST_BNDCFGS, 0);
+ if (kvm_mpx_supported()) {
+ if (vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS)
+ vmcs_write64(GUEST_BNDCFGS, 0);
+ else
+ vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
+ }

if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_PAT) {
vmcs_write64(GUEST_IA32_PAT, vmcs12->host_ia32_pat);
@@ -4466,6 +4470,10 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_set_virtual_apic_mode(vcpu);
}

+ /* If not VM_EXIT_CLEAR_BNDCFGS, the L2 value propagates to L1. */
+ if (!(vmcs12->vm_exit_controls & VM_EXIT_CLEAR_BNDCFGS))
+ vmx->nested.vmcs01_guest_bndcfgs = vmcs12->guest_bndcfgs;
+
/* Unpin physical memory we referred to in vmcs02 */
if (vmx->nested.apic_access_page) {
kvm_release_page_clean(vmx->nested.apic_access_page);


which will also work in the failed vmentry case.

Paolo