2020-10-07 02:00:06

by Sean Christopherson

Subject: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.

The overarching issue is that kvm_x86_ops.set_cr4() can fail, but its
invocation from __set_sregs(), a.k.a. KVM_SET_SREGS, ignores the result.
Fix the issue by moving all validity checks out of .set_cr4() in one way
or another.

I intentionally omitted a Cc to stable. The first bug fix in particular
may break stable trees as it simply removes a check, and I don't know that
stable trees have the generic CR4 reserved bit check that is needed to
prevent the guest from setting VMXE when nVMX is not allowed.

Sean Christopherson (6):
KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()
KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()
KVM: SVM: Drop VMXE check from svm_set_cr4()
KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
KVM: selftests: Verify supported CR4 bits can be set before
KVM_SET_CPUID2

arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/svm/svm.c | 12 ++-
arch/x86/kvm/svm/svm.h | 2 +-
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 35 +++----
arch/x86/kvm/vmx/vmx.h | 2 +-
arch/x86/kvm/x86.c | 28 +++---
arch/x86/kvm/x86.h | 2 +-
.../selftests/kvm/include/x86_64/processor.h | 17 ++++
.../selftests/kvm/include/x86_64/vmx.h | 4 -
.../selftests/kvm/x86_64/set_sregs_test.c | 92 ++++++++++++++++++-
12 files changed, 153 insertions(+), 48 deletions(-)

--
2.28.0


2020-10-07 02:52:54

by Sean Christopherson

Subject: [PATCH 5/6] KVM: x86: Return bool instead of int for CR4 and SREGS validity checks

Rework the common CR4 and SREGS checks to return a bool instead of an
int, i.e. true/false instead of 0/-EINVAL, and add "is" to the name to
clarify the polarity of the return value (which is effectively inverted
by this change).
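
As a rough illustration of the polarity flip, the following standalone sketch puts the two conventions side by side. The `sketch_` names and the single reserved bit are assumptions for the example; the real helpers take a vCPU and check more than one mask.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reserved-bit mask, for illustration only. */
#define SKETCH_CR4_RSVD (1ul << 13)

/* Old convention: 0 means valid, -EINVAL means invalid. */
static int sketch_valid_cr4(unsigned long cr4)
{
	return (cr4 & SKETCH_CR4_RSVD) ? -EINVAL : 0;
}

/* New convention: true means valid, as the "is" in the name suggests. */
static bool sketch_is_valid_cr4(unsigned long cr4)
{
	return !(cr4 & SKETCH_CR4_RSVD);
}

/* Old caller: a nonzero result means "invalid", which reads backwards. */
static int sketch_set_cr4_old(unsigned long cr4)
{
	if (sketch_valid_cr4(cr4))
		return 1;
	return 0;
}

/* New caller: rejects exactly the same values, with the intent spelled out. */
static int sketch_set_cr4_new(unsigned long cr4)
{
	if (!sketch_is_valid_cr4(cr4))
		return 1;
	return 0;
}
```

Both callers reject the same inputs; only the spelling of the condition changes.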

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/x86.c | 28 ++++++++++++----------------
arch/x86/kvm/x86.h | 2 +-
4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index ba50ff6e35c7..114e0e8561bc 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -254,7 +254,7 @@ static bool nested_vmcb_checks(struct vcpu_svm *svm, struct vmcb *vmcb12)
(vmcb12->save.cr3 & MSR_CR3_LONG_MBZ_MASK))
return false;
}
- if (kvm_valid_cr4(&svm->vcpu, vmcb12->save.cr4))
+ if (!kvm_is_valid_cr4(&svm->vcpu, vmcb12->save.cr4))
return false;

return nested_vmcb_check_controls(&vmcb12->control);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5aa0a3af7dbb..ac69aa3076d8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3081,7 +3081,7 @@ static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
/*
* We operate under the default treatment of SMM, so VMX cannot be
* enabled under SMM. Note, whether or not VMXE is allowed at all is
- * handled by kvm_valid_cr4().
+ * handled by kvm_is_valid_cr4().
*/
if ((cr4 & X86_CR4_VMXE) && is_smm(vcpu))
return false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 64cc86f4f18f..5870aa6cbad2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -965,20 +965,17 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
}
EXPORT_SYMBOL_GPL(kvm_set_xcr);

-int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
if (cr4 & cr4_reserved_bits)
- return -EINVAL;
+ return false;

if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
- return -EINVAL;
+ return false;

- if (!kvm_x86_ops.is_valid_cr4(vcpu, cr4))
- return -EINVAL;
-
- return 0;
+ return kvm_x86_ops.is_valid_cr4(vcpu, cr4);
}
-EXPORT_SYMBOL_GPL(kvm_valid_cr4);
+EXPORT_SYMBOL_GPL(kvm_is_valid_cr4);

int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
{
@@ -986,7 +983,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
X86_CR4_SMEP;

- if (kvm_valid_cr4(vcpu, cr4))
+ if (!kvm_is_valid_cr4(vcpu, cr4))
return 1;

if (is_long_mode(vcpu)) {
@@ -9422,7 +9419,7 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
}
EXPORT_SYMBOL_GPL(kvm_task_switch);

-static int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
+static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
{
if ((sregs->efer & EFER_LME) && (sregs->cr0 & X86_CR0_PG)) {
/*
@@ -9430,19 +9427,18 @@ static int kvm_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
* 64-bit mode (though maybe in a 32-bit code segment).
* CR4.PAE and EFER.LMA must be set.
*/
- if (!(sregs->cr4 & X86_CR4_PAE)
- || !(sregs->efer & EFER_LMA))
- return -EINVAL;
+ if (!(sregs->cr4 & X86_CR4_PAE) || !(sregs->efer & EFER_LMA))
+ return false;
} else {
/*
* Not in 64-bit mode: EFER.LMA is clear and the code
* segment cannot be 64-bit.
*/
if (sregs->efer & EFER_LMA || sregs->cs.l)
- return -EINVAL;
+ return false;
}

- return kvm_valid_cr4(vcpu, sregs->cr4);
+ return kvm_is_valid_cr4(vcpu, sregs->cr4);
}

static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
@@ -9454,7 +9450,7 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
struct desc_ptr dt;
int ret = -EINVAL;

- if (kvm_valid_sregs(vcpu, sregs))
+ if (!kvm_is_valid_sregs(vcpu, sregs))
goto out;

apic_base_msr.data = sregs->apic_base;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 3900ab0c6004..b3b1d237ffe5 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -369,7 +369,7 @@ static inline bool kvm_dr6_valid(u64 data)
void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu);
void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu);
int kvm_spec_ctrl_test_value(u64 value);
-int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
+bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4);
bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu);
int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
struct x86_exception *e);
--
2.28.0

2020-10-07 03:07:35

by Sean Christopherson

Subject: [PATCH 6/6] KVM: selftests: Verify supported CR4 bits can be set before KVM_SET_CPUID2

Extend the KVM_SET_SREGS test to verify that all supported CR4 bits, as
enumerated by KVM, can be set before KVM_SET_CPUID2, i.e. without first
defining the vCPU model. KVM is supposed to skip guest CPUID checks
when host userspace is stuffing guest state.

Check the inverse as well, i.e. that KVM rejects KVM_SET_SREGS if CR4
has one or more unsupported bits set.
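
The mapping from supported CPUID bits to CR4 feature bits can also be written as a table, sketched below as standalone C. This is not the selftest code: the `MAP_` macros, `enum map_reg`, and `map_cr4_from_cpuid()` are hypothetical names, the raw register values are passed in directly instead of being fetched from KVM, and the CR4 bits that need no CPUID feature (VME, PVI, and so on) are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* CR4 feature bits at their architectural positions. */
#define MAP_CR4_UMIP     ((uint64_t)1 << 11)
#define MAP_CR4_LA57     ((uint64_t)1 << 12)
#define MAP_CR4_VMXE     ((uint64_t)1 << 13)
#define MAP_CR4_SMXE     ((uint64_t)1 << 14)
#define MAP_CR4_FSGSBASE ((uint64_t)1 << 16)
#define MAP_CR4_PCIDE    ((uint64_t)1 << 17)
#define MAP_CR4_OSXSAVE  ((uint64_t)1 << 18)
#define MAP_CR4_SMEP     ((uint64_t)1 << 20)
#define MAP_CR4_SMAP     ((uint64_t)1 << 21)
#define MAP_CR4_PKE      ((uint64_t)1 << 22)

/* Which CPUID output register a feature bit lives in. */
enum map_reg { MAP_CPUID_1_ECX, MAP_CPUID_7_EBX, MAP_CPUID_7_ECX, MAP_REG_COUNT };

struct map_entry {
	enum map_reg reg;   /* CPUID register holding the feature flag */
	uint32_t cpuid_bit; /* feature flag within that register */
	uint64_t cr4_bit;   /* CR4 bit the feature enables */
};

static const struct map_entry map_table[] = {
	{ MAP_CPUID_7_ECX, 1u << 2,  MAP_CR4_UMIP },
	{ MAP_CPUID_7_ECX, 1u << 16, MAP_CR4_LA57 },
	{ MAP_CPUID_1_ECX, 1u << 5,  MAP_CR4_VMXE },
	{ MAP_CPUID_1_ECX, 1u << 6,  MAP_CR4_SMXE },
	{ MAP_CPUID_7_EBX, 1u << 0,  MAP_CR4_FSGSBASE },
	{ MAP_CPUID_1_ECX, 1u << 17, MAP_CR4_PCIDE },
	{ MAP_CPUID_1_ECX, 1u << 26, MAP_CR4_OSXSAVE },
	{ MAP_CPUID_7_EBX, 1u << 7,  MAP_CR4_SMEP },
	{ MAP_CPUID_7_EBX, 1u << 20, MAP_CR4_SMAP },
	{ MAP_CPUID_7_ECX, 1u << 3,  MAP_CR4_PKE },
};

/* Return the CR4 feature bits implied by the given CPUID register values. */
static uint64_t map_cr4_from_cpuid(const uint32_t regs[MAP_REG_COUNT])
{
	uint64_t cr4 = 0;
	size_t i;

	for (i = 0; i < sizeof(map_table) / sizeof(map_table[0]); i++) {
		if (regs[map_table[i].reg] & map_table[i].cpuid_bit)
			cr4 |= map_table[i].cr4_bit;
	}
	return cr4;
}
```

Any bit missing from the returned mask is one the test expects KVM to reject.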

Signed-off-by: Sean Christopherson <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 17 ++++
.../selftests/kvm/include/x86_64/vmx.h | 4 -
.../selftests/kvm/x86_64/set_sregs_test.c | 92 ++++++++++++++++++-
3 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 82b7fe16a824..29f0bd7d8271 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -27,6 +27,7 @@
#define X86_CR4_OSFXSR (1ul << 9)
#define X86_CR4_OSXMMEXCPT (1ul << 10)
#define X86_CR4_UMIP (1ul << 11)
+#define X86_CR4_LA57 (1ul << 12)
#define X86_CR4_VMXE (1ul << 13)
#define X86_CR4_SMXE (1ul << 14)
#define X86_CR4_FSGSBASE (1ul << 16)
@@ -36,6 +37,22 @@
#define X86_CR4_SMAP (1ul << 21)
#define X86_CR4_PKE (1ul << 22)

+/* CPUID.1.ECX */
+#define CPUID_VMX (1ul << 5)
+#define CPUID_SMX (1ul << 6)
+#define CPUID_PCID (1ul << 17)
+#define CPUID_XSAVE (1ul << 26)
+
+/* CPUID.7.EBX */
+#define CPUID_FSGSBASE (1ul << 0)
+#define CPUID_SMEP (1ul << 7)
+#define CPUID_SMAP (1ul << 20)
+
+/* CPUID.7.ECX */
+#define CPUID_UMIP (1ul << 2)
+#define CPUID_PKU (1ul << 3)
+#define CPUID_LA57 (1ul << 16)
+
/* General Registers in 64-Bit Mode */
struct gpr64_regs {
u64 rax;
diff --git a/tools/testing/selftests/kvm/include/x86_64/vmx.h b/tools/testing/selftests/kvm/include/x86_64/vmx.h
index 54d624dd6c10..e4da3e784f90 100644
--- a/tools/testing/selftests/kvm/include/x86_64/vmx.h
+++ b/tools/testing/selftests/kvm/include/x86_64/vmx.h
@@ -11,10 +11,6 @@
#include <stdint.h>
#include "processor.h"

-#define CPUID_VMX_BIT 5
-
-#define CPUID_VMX (1 << 5)
-
/*
* Definitions of Primary Processor-Based VM-Execution Controls.
*/
diff --git a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
index 9f7656184f31..318be0bf77ab 100644
--- a/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/set_sregs_test.c
@@ -24,16 +24,106 @@

#define VCPU_ID 5

+static void test_cr4_feature_bit(struct kvm_vm *vm, struct kvm_sregs *orig,
+ uint64_t feature_bit)
+{
+ struct kvm_sregs sregs;
+ int rc;
+
+ /* Skip the sub-test, the feature is supported. */
+ if (orig->cr4 & feature_bit)
+ return;
+
+ memcpy(&sregs, orig, sizeof(sregs));
+ sregs.cr4 |= feature_bit;
+
+ rc = _vcpu_sregs_set(vm, VCPU_ID, &sregs);
+ TEST_ASSERT(rc, "KVM allowed unsupported CR4 bit (0x%lx)", feature_bit);
+
+ /* Sanity check that KVM didn't change anything. */
+ vcpu_sregs_get(vm, VCPU_ID, &sregs);
+ TEST_ASSERT(!memcmp(&sregs, orig, sizeof(sregs)), "KVM modified sregs");
+}
+
+static uint64_t calc_cr4_feature_bits(struct kvm_vm *vm)
+{
+ struct kvm_cpuid_entry2 *cpuid_1, *cpuid_7;
+ uint64_t cr4;
+
+ cpuid_1 = kvm_get_supported_cpuid_entry(1);
+ cpuid_7 = kvm_get_supported_cpuid_entry(7);
+
+ cr4 = X86_CR4_VME | X86_CR4_PVI | X86_CR4_TSD | X86_CR4_DE |
+ X86_CR4_PSE | X86_CR4_PAE | X86_CR4_MCE | X86_CR4_PGE |
+ X86_CR4_PCE | X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT;
+ if (cpuid_7->ecx & CPUID_UMIP)
+ cr4 |= X86_CR4_UMIP;
+ if (cpuid_7->ecx & CPUID_LA57)
+ cr4 |= X86_CR4_LA57;
+ if (cpuid_1->ecx & CPUID_VMX)
+ cr4 |= X86_CR4_VMXE;
+ if (cpuid_1->ecx & CPUID_SMX)
+ cr4 |= X86_CR4_SMXE;
+ if (cpuid_7->ebx & CPUID_FSGSBASE)
+ cr4 |= X86_CR4_FSGSBASE;
+ if (cpuid_1->ecx & CPUID_PCID)
+ cr4 |= X86_CR4_PCIDE;
+ if (cpuid_1->ecx & CPUID_XSAVE)
+ cr4 |= X86_CR4_OSXSAVE;
+ if (cpuid_7->ebx & CPUID_SMEP)
+ cr4 |= X86_CR4_SMEP;
+ if (cpuid_7->ebx & CPUID_SMAP)
+ cr4 |= X86_CR4_SMAP;
+ if (cpuid_7->ecx & CPUID_PKU)
+ cr4 |= X86_CR4_PKE;
+
+ return cr4;
+}
+
int main(int argc, char *argv[])
{
struct kvm_sregs sregs;
struct kvm_vm *vm;
+ uint64_t cr4;
int rc;

/* Tell stdout not to buffer its content */
setbuf(stdout, NULL);

- /* Create VM */
+ /*
+ * Create a dummy VM, specifically to avoid doing KVM_SET_CPUID2, and
+ * use it to verify all supported CR4 bits can be set prior to defining
+ * the vCPU model, i.e. without doing KVM_SET_CPUID2.
+ */
+ vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
+ vm_vcpu_add(vm, VCPU_ID);
+
+ vcpu_sregs_get(vm, VCPU_ID, &sregs);
+
+ sregs.cr4 |= calc_cr4_feature_bits(vm);
+ cr4 = sregs.cr4;
+
+ rc = _vcpu_sregs_set(vm, VCPU_ID, &sregs);
+ TEST_ASSERT(!rc, "Failed to set supported CR4 bits (0x%lx)", cr4);
+
+ vcpu_sregs_get(vm, VCPU_ID, &sregs);
+ TEST_ASSERT(sregs.cr4 == cr4, "sregs.CR4 (0x%llx) != CR4 (0x%lx)",
+ sregs.cr4, cr4);
+
+ /* Verify all unsupported features are rejected by KVM. */
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_UMIP);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_LA57);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_VMXE);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_SMXE);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_FSGSBASE);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_PCIDE);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_OSXSAVE);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_SMEP);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_SMAP);
+ test_cr4_feature_bit(vm, &sregs, X86_CR4_PKE);
+ kvm_vm_free(vm);
+
+ /* Create a "real" VM and verify APIC_BASE can be set. */
vm = vm_create_default(VCPU_ID, 0, NULL);

vcpu_sregs_get(vm, VCPU_ID, &sregs);
--
2.28.0

2020-10-07 03:07:42

by Sean Christopherson

Subject: [PATCH 1/6] KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()

Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now
that common x86 handles the check by incorporating VMXE into the CR4
reserved bits, i.e. in cr4_guest_rsvd_bits. This fixes a bug where KVM
incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed
before KVM_SET_CPUID{,2}.

Fixes: 5e1746d6205d ("KVM: nVMX: Allow setting the VMXE bit in CR4")
Reported-by: Stas Sergeev <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e23c41ccfac9..99ea57ba2a84 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3110,9 +3110,10 @@ int vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
* must first be able to turn on cr4.VMXE (see handle_vmon()).
* So basically the check on whether to allow nested VMX
* is here. We operate under the default treatment of SMM,
- * so VMX cannot be enabled under SMM.
+ * so VMX cannot be enabled under SMM. Note, guest CPUID is
+ * intentionally ignored, it's handled by cr4_guest_rsvd_bits.
*/
- if (!nested_vmx_allowed(vcpu) || is_smm(vcpu))
+ if (!nested || is_smm(vcpu))
return 1;
}

--
2.28.0

2020-10-07 03:08:31

by Sean Christopherson

Subject: [PATCH 3/6] KVM: SVM: Drop VMXE check from svm_set_cr4()

Drop svm_set_cr4()'s explicit check on CR4.VMXE now that common x86 handles
the check by incorporating VMXE into the CR4 reserved bits, via
kvm_cpu_caps. SVM obviously does not set X86_FEATURE_VMX.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/svm/svm.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4f401fc6a05d..f92a19b77da3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1684,9 +1684,6 @@ int svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
unsigned long host_cr4_mce = cr4_read_shadow() & X86_CR4_MCE;
unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4;

- if (cr4 & X86_CR4_VMXE)
- return 1;
-
if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE))
svm_flush_tlb(vcpu);

--
2.28.0

2020-10-08 16:04:57

by stsp

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

07.10.2020 04:44, Sean Christopherson wrote:
> Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
Hi Sean & KVM devs.

I tested the patches, and wherever I
set VMXE in CR4, I now get
KVM: KVM_SET_SREGS: Invalid argument
Before the patch I was able (with many
problems, but still) to set VMXE sometimes.

So it's a NAK so far, waiting for an update. :)

2020-10-08 19:19:29

by stsp

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

08.10.2020 20:59, Sean Christopherson wrote:
> On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
>> 07.10.2020 04:44, Sean Christopherson wrote:
>>> Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
>> Hi Sean & KVM devs.
>>
>> I tested the patches, and wherever I
>> set VMXE in CR4, I now get
>> KVM: KVM_SET_SREGS: Invalid argument
>> Before the patch I was able (with many
>> problems, but still) to set VMXE sometimes.
>>
>> So its a NAK so far, waiting for an update. :)
> IIRC, you said you were going to test on AMD? Assuming that's correct,

Yes, that is true.


> -EINVAL
> is the expected behavior. KVM was essentially lying before; it never actually
> set CR4.VMXE in hardware, it just didn't properly detect the error and so VMXE
> was set in KVM's shadow of the guest's CR4.

Hmm. But at least it was lying
similarly on AMD and Intel CPUs. :)
So I was able to reproduce the problems
myself.
Do you mean, any AMD tests are now
useless, and we need to proceed with
Intel tests only?

Then an additional question.
On old Intel CPUs we needed to set
VMXE in the guest to make it work in
nested-guest mode.
Is it still needed even with your patches?
Or the nested-guest mode will work
now even on older Intel CPUs and KVM
will set VMXE for us itself, when needed?

2020-10-08 20:07:33

by Sean Christopherson

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
> 07.10.2020 04:44, Sean Christopherson wrote:
> >Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
> Hi Sean & KVM devs.
>
> I tested the patches, and wherever I
> set VMXE in CR4, I now get
> KVM: KVM_SET_SREGS: Invalid argument
> Before the patch I was able (with many
> problems, but still) to set VMXE sometimes.
>
> So its a NAK so far, waiting for an update. :)

IIRC, you said you were going to test on AMD? Assuming that's correct, -EINVAL
is the expected behavior. KVM was essentially lying before; it never actually
set CR4.VMXE in hardware, it just didn't properly detect the error and so VMXE
was set in KVM's shadow of the guest's CR4.

2020-10-09 04:33:00

by Sean Christopherson

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

On Thu, Oct 08, 2020 at 09:18:18PM +0300, stsp wrote:
> 08.10.2020 20:59, Sean Christopherson wrote:
> >On Thu, Oct 08, 2020 at 07:00:13PM +0300, stsp wrote:
> >>07.10.2020 04:44, Sean Christopherson wrote:
> >>>Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
> >>Hi Sean & KVM devs.
> >>
> >>I tested the patches, and wherever I
> >>set VMXE in CR4, I now get
> >>KVM: KVM_SET_SREGS: Invalid argument
> >>Before the patch I was able (with many
> >>problems, but still) to set VMXE sometimes.
> >>
> >>So its a NAK so far, waiting for an update. :)
> >IIRC, you said you were going to test on AMD? Assuming that's correct,
>
> Yes, that is true.
>
>
> > -EINVAL
> >is the expected behavior. KVM was essentially lying before; it never actually
> >set CR4.VMXE in hardware, it just didn't properly detect the error and so VMXE
> >was set in KVM's shadow of the guest's CR4.
>
> Hmm. But at least it was lying
> similarly on AMD and Intel CPUs. :)
> So I was able to reproduce the problems
> myself.
> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> tests only?

For anything VMXE related, yes.

> Then additional question.
> On old Intel CPUs we needed to set VMXE in guest to make it to work in
> nested-guest mode.
> Is it still needed even with your patches?
> Or the nested-guest mode will work now even on older Intel CPUs and KVM will
> set VMXE for us itself, when needed?

I'm struggling to even come up with a theory as to how setting VMXE from
userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
anything.

CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
the guest. But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
the guest's actual value (the guest sees a shadow value when it reads CR4).
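
The split between the value the guest reads and the value loaded into hardware can be sketched with a tiny standalone model. It is an illustration under simplifying assumptions, not the vmx.c code: `struct shadow_vcpu` and the `shadow_` helpers are hypothetical, and every other bit that KVM adjusts in the hardware value is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* CR4.VMXE is bit 13. */
#define SHADOW_CR4_VMXE (1ul << 13)

/* Hypothetical model of the two CR4 values tracked for a guest. */
struct shadow_vcpu {
	unsigned long read_shadow; /* what the guest sees when it reads CR4 */
	unsigned long hw_cr4;      /* what the CPU uses while the guest runs */
};

/* Record the guest's value, but always force VMXE on in the hardware copy. */
static void shadow_write_cr4(struct shadow_vcpu *vcpu, unsigned long cr4)
{
	assert(vcpu != NULL);
	vcpu->read_shadow = cr4;
	vcpu->hw_cr4 = cr4 | SHADOW_CR4_VMXE;
}

/* Guest reads of CR4 are satisfied from the shadow, not the hardware copy. */
static unsigned long shadow_guest_read_cr4(const struct shadow_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return vcpu->read_shadow;
}
```

In this model, writing VMXE=0 or VMXE=1 produces the same hardware value; only the value the guest reads back differs.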

And unless I grossly misunderstand dosemu2, it's not doing anything related to
nested virtualization, i.e. stuffing VMXE=1 for the guest's shadow value
should have absolutely zero impact.

More than likely, VMXE was a red herring. Given that the reporter is also
seeing the same bug on bare metal after moving to kernel 5.4, odds are good
the issue is related to unrestricted_guest=n and has nothing to do with nVMX.

2020-10-09 14:14:02

by stsp

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

09.10.2020 07:04, Sean Christopherson wrote:
>> Hmm. But at least it was lying
>> similarly on AMD and Intel CPUs. :)
>> So I was able to reproduce the problems
>> myself.
>> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
>> tests only?
> For anything VMXE related, yes.

What would be the expected behaviour
on Intel, if it is set? Any difference with AMD?


>> Then additional question.
>> On old Intel CPUs we needed to set VMXE in guest to make it to work in
>> nested-guest mode.
>> Is it still needed even with your patches?
>> Or the nested-guest mode will work now even on older Intel CPUs and KVM will
>> set VMXE for us itself, when needed?
> I'm struggling to even come up with a theory as to how setting VMXE from
> userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
> anything.
>
> CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
> the guest. But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
> the guest's actual value (the guest sees a shadow value when it reads CR4).
>
> And unless I grossly misunderstand dosemu2, it's not doing anything related to
> nested virtualization, i.e. the stuffing VMXE=1 for the guest's shadow value
> should have absolutely zero impact.
>
> More than likely, VMXE was a red herring.

Yes, it was. :(
(as you can see from the end of the
github thread)


> Given that the reporter is also
> seeing the same bug on bare metal after moving to kernel 5.4, odds are good
> the issue is related to unrestricted_guest=n and has nothing to do with nVMX.

But we do not use unrestricted guest.
We use v86 under KVM.
The only other effect of setting VMXE
was clearing VME. Which shouldn't affect
anything either, right?

2020-10-09 15:32:41

by Sean Christopherson

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
> 09.10.2020 07:04, Sean Christopherson wrote:
> >>Hmm. But at least it was lying
> >>similarly on AMD and Intel CPUs. :)
> >>So I was able to reproduce the problems
> >>myself.
> >>Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> >>tests only?
> >For anything VMXE related, yes.
>
> What would be the expected behaviour on Intel, if it is set? Any difference
> with AMD?

On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'. Note,
'nested' is enabled by default on kernel 5.0 and later.

With AMD, setting CR4.VMXE=1 is never allowed as AMD doesn't support VMX;
AMD's virtualization solution is called SVM (Secure Virtual Machine). KVM
doesn't support nesting VMX within SVM and vice versa.
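
The expected outcomes for stuffing CR4.VMXE=1 can be summarized in a short standalone model. The names below (`enum model_vendor`, `model_set_sregs_cr4()`) are made up for the illustration, and the sketch deliberately ignores SMM and every CR4 bit other than VMXE.

```c
#include <errno.h>
#include <stdbool.h>

/* CR4.VMXE is bit 13. */
#define MODEL_CR4_VMXE (1ul << 13)

/* Hypothetical vendor tag: Intel uses VMX, AMD uses SVM. */
enum model_vendor { MODEL_VENDOR_INTEL, MODEL_VENDOR_AMD };

/*
 * Model of the expected KVM_SET_SREGS result for a given CR4 value:
 * VMXE is accepted only on Intel with the 'nested' module param enabled.
 */
static int model_set_sregs_cr4(enum model_vendor vendor, bool nested,
			       unsigned long cr4)
{
	bool vmxe_allowed = (vendor == MODEL_VENDOR_INTEL) && nested;

	if ((cr4 & MODEL_CR4_VMXE) && !vmxe_allowed)
		return -EINVAL;
	return 0;
}
```

Under this model, Intel with nested=1 accepts VMXE, while Intel with nested=0 and AMD in either configuration return -EINVAL.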

> >>Then additional question.
> >>On old Intel CPUs we needed to set VMXE in guest to make it to work in
> >>nested-guest mode.
> >>Is it still needed even with your patches?
> >>Or the nested-guest mode will work now even on older Intel CPUs and KVM will
> >>set VMXE for us itself, when needed?
> >I'm struggling to even come up with a theory as to how setting VMXE from
> >userspace would have impacted KVM with unrestricted_guest=n, let alone fixed
> >anything.
> >
> >CR4.VMXE must always be 1 in _hardware_ when VMX is on, including when running
> >the guest. But KVM forces vmcs.GUEST_CR4.VMXE=1 at all times, regardless of
> >the guest's actual value (the guest sees a shadow value when it reads CR4).
> >
> >And unless I grossly misunderstand dosemu2, it's not doing anything related to
> >nested virtualization, i.e. the stuffing VMXE=1 for the guest's shadow value
> >should have absolutely zero impact.
> >
> >More than likely, VMXE was a red herring.
>
> Yes, it was. :( (as you can see from the end of the github thread)
>
>
> > Given that the reporter is also
> >seeing the same bug on bare metal after moving to kernel 5.4, odds are good
> >the issue is related to unrestricted_guest=n and has nothing to do with nVMX.
>
> But we do not use unrestricted guest.
> We use v86 under KVM.

Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
segmentation checks that apply if and only if unrestricted_guest=0. Long story
short, without a deep audit, it's basically impossible to rule out a dependency
on unrestricted guest since you're playing around with v86.

> The only other effect of setting VMXE was clearing VME. Which shouldn't
> affect anything either, right?

Hmm, clearing VME would mean that exceptions/interrupts within the guest would
trigger a switch out of v86 and into vanilla protected mode. v86 and PM have
different consistency checks, particularly for segmentation, so it's plausible
that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
guest state for v86.

2020-10-09 15:50:42

by stsp

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

09.10.2020 18:30, Sean Christopherson wrote:
> On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
>> 09.10.2020 07:04, Sean Christopherson wrote:
>>>> Hmm. But at least it was lying
>>>> similarly on AMD and Intel CPUs. :)
>>>> So I was able to reproduce the problems
>>>> myself.
>>>> Do you mean, any AMD tests are now useless, and we need to proceed with Intel
>>>> tests only?
>>> For anything VMXE related, yes.
>> What would be the expected behaviour on Intel, if it is set? Any difference
>> with AMD?
> On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
> the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'. Note,
> 'nested' is enabled by default on kernel 5.0 and later.

So if I understand you correctly, we
need to test that:
- with nested=0 VMXE gives EINVAL
- with nested=1 VMXE changes nothing
visible, except probably to allow guest
to read that value (we won't test guest
reading though).

Is this correct?


> With AMD, setting CR4.VMXE=1 is never allowed as AMD doesn't support VMX,

OK, for that I can give you a
Tested-by: Stas Sergeev <[email protected]>

because I confirm that on AMD it now
consistently returns EINVAL, whereas
without your patches it did random crap,
depending on whether it is a first call to
KVM_SET_SREGS, or not first.


>> But we do not use unrestricted guest.
>> We use v86 under KVM.
> Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
> segmentation checks that apply if and only if unrestricted_guest=0. Long story
> short, without a deep audit, it's basically impossible to rule out a dependency
> on unrestricted guest since you're playing around with v86.

You mean "unrestricted_guest" as a module
parameter, rather than the similarly named CPU
feature, right? So we may depend on
unrestricted_guest parameter, but not on a
hardware feature, correct?


>> The only other effect of setting VMXE was clearing VME. Which shouldn't
>> affect anything either, right?
> Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> trigger a switch out of v86 and into vanilla protected mode. v86 and PM have
> different consistency checks, particularly for segmentation, so it's plausible
> that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> guest state for v86.

Let's assume that was the case.
With those github guys it's not possible
to do any consistent checks. :(

2020-10-09 16:14:02

by Sean Christopherson

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

On Fri, Oct 09, 2020 at 06:48:21PM +0300, stsp wrote:
> 09.10.2020 18:30, Sean Christopherson wrote:
> >On Fri, Oct 09, 2020 at 05:11:51PM +0300, stsp wrote:
> >>09.10.2020 07:04, Sean Christopherson wrote:
> >>>>Hmm. But at least it was lying
> >>>>similarly on AMD and Intel CPUs. :)
> >>>>So I was able to reproduce the problems
> >>>>myself.
> >>>>Do you mean, any AMD tests are now useless, and we need to proceed with Intel
> >>>>tests only?
> >>>For anything VMXE related, yes.
> >>What would be the expected behaviour on Intel, if it is set? Any difference
> >>with AMD?
> >On Intel, userspace should be able to stuff CR4.VMXE=1 via KVM_SET_SREGS if
> >the 'nested' module param is 1, e.g. if 'modprobe kvm_intel nested=1'. Note,
> >'nested' is enabled by default on kernel 5.0 and later.
>
> So if I understand you correctly, we
> need to test that:
> - with nested=0 VMXE gives EINVAL
> - with nested=1 VMXE changes nothing
> visible, except probably to allow guest
> to read that value (we won't test guest
> reading though).
>
> Is this correct?

Yep, exactly!

> >With AMD, setting CR4.VMXE=1 is never allowed as AMD doesn't support VMX,
>
> OK, for that I can give you a
> Tested-by: Stas Sergeev <[email protected]>
>
> because I confirm that on AMD it now consistently returns EINVAL, whereas
> without your patches it did random crap, depending on whether it is a first
> call to KVM_SET_SREGS, or not first.
>
>
> >>But we do not use unrestricted guest.
> >>We use v86 under KVM.
> >Unrestricted guest can kick in even if CR0.PG=1 && CR0.PE=1, e.g. there are
> >segmentation checks that apply if and only if unrestricted_guest=0. Long story
> >short, without a deep audit, it's basically impossible to rule out a dependency
> >on unrestricted guest since you're playing around with v86.
>
> You mean "unrestricted_guest" as a module parameter, rather than the similar
> named CPU feature, right? So we may depend on unrestricted_guest parameter,
> but not on a hardware feature, correct?

The unrestricted_guest module param is tied directly to the hardware feature,
i.e. if kvm_intel.unrestricted_guest=0 then KVM will run guests with
unrestricted guest disabled. That doesn't necessarily mean any of the
behavior that is allowed by unrestricted guest will be encountered, but if
it is encountered, then it will be handled by the CPU instead of causing a
VM-Exit and requiring KVM emulation.

The reporter is using an old CPU that doesn't support unrestricted guest,
so both the hardware feature and the module param will be off/0.

> >>The only other effect of setting VMXE was clearing VME. Which shouldn't
> >>affect anything either, right?
> >Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> >trigger a switch out of v86 and into vanilla protected mode. v86 and PM have
> >different consistency checks, particularly for segmentation, so it's plausible
> >that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> >guest state for v86.
>
> Lets assume that was the case. With those github guys its not possible to do
> any consistent checks. :(

K. If this is ever a problem in the future, having a relatively simple
reproducer, e.g. something we can run without having to build/install a
variety of tools, would make it easier to debug. In theory, the bug should be
reproducible even on modern hardware by loading KVM with unrestricted_guest=0.

2020-11-13 12:07:04

by Paolo Bonzini

Subject: Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup

On 07/10/20 03:44, Sean Christopherson wrote:
> Two bug fixes to handle KVM_SET_SREGS without a preceding KVM_SET_CPUID2.
>
> The overarching issue is that kvm_x86_ops.set_cr4() can fail, but its
> invocation from __set_sregs(), a.k.a. KVM_SET_SREGS, ignores the result.
> Fix the issue by moving all validity checks out of .set_cr4() in one way
> or another.
>
> I intentionally omitted a Cc to stable. The first bug fix in particular
> may break stable trees as it simply removes a check, and I don't know that
> stable trees have the generic CR4 reserved bit check that is needed to
> prevent the guest from setting VMXE when nVMX is not allowed.
>
> Sean Christopherson (6):
> KVM: VMX: Drop guest CPUID check for VMXE in vmx_set_cr4()
> KVM: VMX: Drop explicit 'nested' check from vmx_set_cr4()
> KVM: SVM: Drop VMXE check from svm_set_cr4()
> KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook
> KVM: x86: Return bool instead of int for CR4 and SREGS validity checks
> KVM: selftests: Verify supported CR4 bits can be set before
> KVM_SET_CPUID2
>
> arch/x86/include/asm/kvm_host.h | 3 +-
> arch/x86/kvm/svm/nested.c | 2 +-
> arch/x86/kvm/svm/svm.c | 12 ++-
> arch/x86/kvm/svm/svm.h | 2 +-
> arch/x86/kvm/vmx/nested.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 35 +++----
> arch/x86/kvm/vmx/vmx.h | 2 +-
> arch/x86/kvm/x86.c | 28 +++---
> arch/x86/kvm/x86.h | 2 +-
> .../selftests/kvm/include/x86_64/processor.h | 17 ++++
> .../selftests/kvm/include/x86_64/vmx.h | 4 -
> .../selftests/kvm/x86_64/set_sregs_test.c | 92 ++++++++++++++++++-
> 12 files changed, 153 insertions(+), 48 deletions(-)
>

Queued, thanks.

Paolo

2020-12-07 11:24:00

by stsp

[permalink] [raw]
Subject: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

09.10.2020 18:30, Sean Christopherson wrote:
>> The only other effect of setting VMXE was clearing VME. Which shouldn't
>> affect anything either, right?
> Hmm, clearing VME would mean that exceptions/interrupts within the guest would
> trigger a switch out of v86 and into vanilla protected mode. v86 and PM have
> different consistency checks, particularly for segmentation, so it's plausible
> that clearing CR4.VME inadvertently worked around the bug by avoiding invalid
> guest state for v86.

Almost.

So with your patch set (thanks!) and a
bit of further investigation, it has now become
clear where the problem is.
We have this code:
---

cpuid->nent = 2;
// Use the same values as in emu-i386/simx86/interp.c
// (Pentium 133-200MHz, "GenuineIntel")
cpuid->entries[0] = (struct kvm_cpuid_entry) {
    .function = 0, .eax = 1, .ebx = 0x756e6547,
    .ecx = 0x6c65746e, .edx = 0x49656e69 };
// family 5, model 2, stepping 12, fpu vme de pse tsc msr mce cx8
cpuid->entries[1] = (struct kvm_cpuid_entry) {
    .function = 1, .eax = 0x052c, .ebx = 0, .ecx = 0, .edx = 0x1bf };
ret = ioctl(vcpufd, KVM_SET_CPUID, cpuid);
free(cpuid);
if (ret == -1) {
    perror("KVM: KVM_SET_CPUID");
    return 0;
}

---

It tries to enable VME among other things.
qemu appears to disable VME by default,
unless you do "-cpu host". So we have a situation where
the host (which is qemu) doesn't have VME,
and guest (dosemu) is trying to enable it.
Now obviously KVM_SET_CPUID doesn't check anything
at all and returns success. That later turns
into an invalid guest state.

Question: should KVM_SET_CPUID check for
supported bits, and return error if not everything
is supported?

2020-12-07 11:28:56

by stsp

[permalink] [raw]
Subject: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

[re-send because of bad formatting]

09.10.2020 18:30, Sean Christopherson wrote:
>> The only other effect of setting VMXE was clearing VME. Which shouldn't
>> affect anything either, right?
> Hmm, clearing VME would mean that exceptions/interrupts within the
> guest would
> trigger a switch out of v86 and into vanilla protected mode. v86 and
> PM have
> different consistency checks, particularly for segmentation, so it's
> plausible
> that clearing CR4.VME inadvertently worked around the bug by avoiding
> invalid
> guest state for v86.

Almost.

So with your patch set (thanks!) and a
bit of further investigation, it has now become
clear where the problem is.
We have this code:
---

cpuid->nent = 2;
// Use the same values as in emu-i386/simx86/interp.c
// (Pentium 133-200MHz, "GenuineIntel")
cpuid->entries[0] = (struct kvm_cpuid_entry) {
    .function = 0, .eax = 1, .ebx = 0x756e6547,
    .ecx = 0x6c65746e, .edx = 0x49656e69 };
// family 5, model 2, stepping 12, fpu vme de pse tsc msr mce cx8
cpuid->entries[1] = (struct kvm_cpuid_entry) {
    .function = 1, .eax = 0x052c, .ebx = 0, .ecx = 0, .edx = 0x1bf };
ret = ioctl(vcpufd, KVM_SET_CPUID, cpuid);
free(cpuid);
if (ret == -1) {
    perror("KVM: KVM_SET_CPUID");
    return 0;
}

---
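
For readers following along, the constants in the snippet above can be decoded with a few shifts and masks. This is only a minimal sketch and not part of the original message: the helper names are hypothetical, and the family/model decoding deliberately ignores the extended family and model fields, which do not matter for a family 5 part.

```c
#include <stdint.h>

/* CPUID.0 returns the vendor string in EBX, EDX, ECX order, low byte first. */
static inline void demo_cpuid_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx,
                                     char out[13])
{
	const uint32_t regs[3] = { ebx, edx, ecx };

	for (int r = 0; r < 3; r++)
		for (int b = 0; b < 4; b++)
			out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xff);
	out[12] = '\0';
}

/* CPUID.1:EAX base fields (extended family/model intentionally ignored). */
static inline uint32_t demo_cpuid_stepping(uint32_t eax) { return eax & 0xf; }
static inline uint32_t demo_cpuid_model(uint32_t eax)    { return (eax >> 4) & 0xf; }
static inline uint32_t demo_cpuid_family(uint32_t eax)   { return (eax >> 8) & 0xf; }

/* CPUID.1:EDX bit 1 advertises VME (virtual-8086 mode extensions). */
static inline int demo_cpuid_has_vme(uint32_t leaf1_edx)
{
	return (leaf1_edx >> 1) & 1;
}
```

With the values from the snippet, the vendor registers spell "GenuineIntel", 0x052c decodes to family 5, model 2, stepping 12, and 0x1bf has the VME bit set.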


It tries to enable VME among other things.
qemu appears to disable VME by default,
unless you do "-cpu host". So we have a situation where
the host (which is qemu) doesn't have VME,
and guest (dosemu) is trying to enable it.
Now obviously KVM_SET_CPUID doesn't check anything
at all and returns success. That later turns
into an invalid guest state.


Question: should KVM_SET_CPUID check for
supported bits, and return error if not everything
is supported?

2020-12-07 11:34:23

by Paolo Bonzini

[permalink] [raw]
Subject: Re: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

On 07/12/20 12:24, stsp wrote:
> It tries to enable VME among other things.
> qemu appears to disable VME by default,
> unless you do "-cpu host". So we have a situation where
> the host (which is qemu) doesn't have VME,
> and guest (dosemu) is trying to enable it.
> Now obviously KVM_SET_CPUID doesn't check anything
> at all and returns success. That later turns
> into an invalid guest state.
>
>
> Question: should KVM_SET_CPUID check for
> supported bits, and return error if not everything
> is supported?

No, it is intentional. Most bits of CPUID are not ever checked by KVM,
so userspace is supposed to set values that makes sense or just copy the
value of KVM_GET_SUPPORTED_CPUID more or less blindly.

Paolo

2020-12-07 11:58:45

by stsp

[permalink] [raw]
Subject: Re: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

07.12.2020 14:29, Paolo Bonzini wrote:
> On 07/12/20 12:24, stsp wrote:
>> It tries to enable VME among other things.
>> qemu appears to disable VME by default,
>> unless you do "-cpu host". So we have a situation where
>> the host (which is qemu) doesn't have VME,
>> and guest (dosemu) is trying to enable it.
>> Now obviously KVM_SET_CPUID doesn't check anything
>> at all and returns success. That later turns
>> into an invalid guest state.
>>
>>
>> Question: should KVM_SET_CPUID check for
>> supported bits, and return error if not everything
>> is supported?
>
> No, it is intentional.  Most bits of CPUID are not ever checked by
> KVM, so userspace is supposed to set values that makes sense
By "that makes sense" you probably
meant to say "bits_that_makes_sense masked
with the ones returned by KVM_GET_SUPPORTED_CPUID"?

So am I right that KVM_SET_CPUID only "lowers"
the supported bits? In which case I don't need to
call it at all, but instead just call KVM_GET_SUPPORTED_CPUID
and see if the needed bits are supported, and
exit otherwise, right?

2020-12-07 14:08:55

by stsp

[permalink] [raw]
Subject: Re: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

07.12.2020 16:35, Paolo Bonzini wrote:
>
>
> On Mon, 7 Dec 2020, 12:47 stsp <[email protected]> wrote:
>
> So am I right that KVM_SET_CPUID only "lowers"
> the supported bits? In which case I don't need to
> call it at all, but instead just call KVM_GET_SUPPORTED_CPUID
> and see if the needed bits are supported, and
> exit otherwise, right?
>
>
> You always have to call KVM_SET_CPUID2, but you can just pass in
> whatever you got from KVM_GET_SUPPORTED_CPUID.
OK, done that, thanks.
(after checking that KVM_GET_SUPPORTED_CPUID
actually has the needed features itself, otherwise exit).

Perhaps it would be good for the guest cpuid to
default to the values of KVM_GET_SUPPORTED_CPUID,
so that the user doesn't have to make needless
calls just to copy host features to guest cpuid.
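
The check described above (verify that the needed features appear in the supported list, otherwise exit) might look roughly like the sketch below. It is only an illustration under stated assumptions: `struct demo_cpuid_entry` is a local stand-in that mirrors a few fields of the kernel's struct kvm_cpuid_entry2, and the helper names are hypothetical. In a real program the array would be the one filled in by KVM_GET_SUPPORTED_CPUID and then handed to KVM_SET_CPUID2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring a few fields of struct kvm_cpuid_entry2. */
struct demo_cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Find the entry for a given CPUID function, or NULL if it is absent. */
static inline const struct demo_cpuid_entry *
demo_find_leaf(const struct demo_cpuid_entry *e, size_t n, uint32_t function)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].function == function)
			return &e[i];
	return NULL;
}

/*
 * Return the required CPUID.1:EDX bits that the supported list does NOT
 * offer. A result of 0 means every required feature bit is available.
 */
static inline uint32_t
demo_missing_leaf1_edx(const struct demo_cpuid_entry *supported, size_t n,
                       uint32_t required)
{
	const struct demo_cpuid_entry *leaf1 = demo_find_leaf(supported, n, 1);

	if (!leaf1)
		return required;	/* no leaf 1 at all: nothing is offered */
	return required & ~leaf1->edx;
}
```

For the case discussed in this thread, a supported list whose leaf 1 EDX lacks bit 1 would report 0x2 (VME) as missing when 0x1bf is required, which is the point at which the caller would exit.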

2020-12-07 14:33:56

by stsp

[permalink] [raw]
Subject: Re: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

07.12.2020 17:09, Paolo Bonzini wrote:
>
>
> On Mon, 7 Dec 2020, 15:04 stsp <[email protected]> wrote:
>
> Perhaps it would be good for the guest cpuid to
> default to the values of KVM_GET_SUPPORTED_CPUID,
> so that the user doesn't have to make needless
> calls just to copy host features to guest cpuid.
>
>
> It is too late to change that aspect of the API, unfortunately. We
> don't know how various userspaces would behave.
Which means some sensible behaviour
already exists if I don't call KVM_SET_CPUID2.
So what is it, #UD on CPUID?
Would be good to have that documented.

2020-12-07 14:45:25

by stsp

[permalink] [raw]
Subject: Re: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

07.12.2020 17:34, Paolo Bonzini wrote:
>
> > It is too late to change that aspect of the API, unfortunately. We
> > don't know how various userspaces would behave.
> Which means some sensible behaviour
> already exists if I don't call KVM_SET_CPUID2.
> So what is it, #UD on CPUID?
>
>
> I would have to check but I think you always get zeroes; not entirely
> sensible.
In that case I would argue that you can't
break anything by changing that to something
sensible. :)
But anyway, since my problem is solved,
this is just a potential improvement for the
future, or a case for documenting it.

2020-12-08 01:19:04

by Jim Mattson

[permalink] [raw]
Subject: Re: KVM_SET_CPUID doesn't check supported bits (was Re: [PATCH 0/6] KVM: x86: KVM_SET_SREGS.CR4 bug fixes and cleanup)

On Mon, Dec 7, 2020 at 3:47 AM stsp <[email protected]> wrote:
>
> 07.12.2020 14:29, Paolo Bonzini wrote:
> > On 07/12/20 12:24, stsp wrote:
> >> It tries to enable VME among other things.
> >> qemu appears to disable VME by default,
> >> unless you do "-cpu host". So we have a situation where
> >> the host (which is qemu) doesn't have VME,
> >> and guest (dosemu) is trying to enable it.
> >> Now obviously KVM_SET_CPUID doesn't check anything
> >> at all and returns success. That later turns
> >> into an invalid guest state.
> >>
> >>
> >> Question: should KVM_SET_CPUID check for
> >> supported bits, and return error if not everything
> >> is supported?
> >
> > No, it is intentional. Most bits of CPUID are not ever checked by
> > KVM, so userspace is supposed to set values that makes sense
> By "that makes sense" you probably
> meant to say "bits_that_makes_sense masked
> with the ones returned by KVM_GET_SUPPORTED_CPUID"?
>
> So am I right that KVM_SET_CPUID only "lowers"
> the supported bits? In which case I don't need to
> call it at all, but instead just call KVM_GET_SUPPORTED_CPUID
> and see if the needed bits are supported, and
> exit otherwise, right?

"Lowers" is a tricky concept for CPUID information. Some feature bits
report 0 for "present" and 1 for "not-present." Some multi-bit fields
are interpreted as numbers, which may be signed or unsigned. Some
multi-bit fields are strings. Some fields have dependencies on other
fields. Etc.
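
To illustrate the multi-bit field point with the values from this thread: treating CPUID.1:EAX as a bitmap and "lowering" it with a bitwise AND produces a signature that describes neither CPU. The sketch below is an assumption-laden illustration only; the helper names are hypothetical and 0x0663 is just an arbitrary second signature (family 6, model 6, stepping 3) chosen for the arithmetic.

```c
#include <stdint.h>

/* Base family field of CPUID.1:EAX, bits 11:8 (extended family ignored). */
static inline uint32_t demo_sig_family(uint32_t eax)
{
	return (eax >> 8) & 0xf;
}

/* Naive "lowering": correct for one-bit feature flags, wrong for numbers. */
static inline uint32_t demo_naive_lower(uint32_t a, uint32_t b)
{
	return a & b;
}
```

Here `demo_naive_lower(0x052c, 0x0663)` yields 0x0420, i.e. family 4, model 2, stepping 0, which matches neither input. A numeric field needs its own rule (for example, taking the value of one side wholesale) rather than a mask.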