2022-08-03 15:54:20

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH v3 00/13] SMM emulation and interrupt shadow fixes

This patch series is a result of long debug work to find out why

sometimes guests with win11 secure boot

were failing during boot.



During writing a unit test I found another bug, turns out

that on rsm emulation, if the rsm instruction was done in real

or 32 bit mode, KVM would truncate the restored RIP to 32 bit.



I also refactored the way we write SMRAM so it is easier

now to understand what is going on.



The main bug in this series which I fixed is that we

allowed #SMI to happen during the STI interrupt shadow,

and we did nothing to both reset it on #SMI handler

entry and restore it on RSM.



V3: addressed most of the review feedback from Sean (thanks!)



Best regards,

Maxim Levitsky



Maxim Levitsky (13):

bug: introduce ASSERT_STRUCT_OFFSET

KVM: x86: emulator: em_sysexit should update ctxt->mode

KVM: x86: emulator: introduce emulator_recalc_and_set_mode

KVM: x86: emulator: update the emulation mode after rsm

KVM: x86: emulator: update the emulation mode after CR0 write

KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on

the image format

KVM: x86: emulator/smm: add structs for KVM's smram layout

KVM: x86: emulator/smm: use smram structs in the common code

KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore

KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore

KVM: x86: SVM: use smram structs

KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode

capable

KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM



arch/x86/include/asm/kvm_host.h | 11 +-

arch/x86/kvm/emulate.c | 305 +++++++++++++++++---------------

arch/x86/kvm/kvm_emulate.h | 223 ++++++++++++++++++++++-

arch/x86/kvm/svm/svm.c | 30 ++--

arch/x86/kvm/vmx/vmcs12.h | 5 +-

arch/x86/kvm/vmx/vmx.c | 4 +-

arch/x86/kvm/x86.c | 175 +++++++++---------

include/linux/build_bug.h | 9 +

8 files changed, 497 insertions(+), 265 deletions(-)



--

2.26.3






2022-08-03 15:54:27

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH v3 04/13] KVM: x86: emulator: update the emulation mode after rsm

This ensures that RIP will be correctly written back,
because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.

This fixes a guest crash in case the #SMI is received
while the guest runs a code from an address > 32 bit.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index bc70caf403c2b4..5e91b26cc1d8aa 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2666,6 +2666,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ret != X86EMUL_CONTINUE)
goto emulate_shutdown;

+
+ ret = emulator_recalc_and_set_mode(ctxt);
+ if (ret != X86EMUL_CONTINUE)
+ goto emulate_shutdown;
+
/*
* Note, the ctxt->ops callbacks are responsible for handling side
* effects when writing MSRs and CRs, e.g. MMU context resets, CPUID
--
2.26.3


2022-08-03 15:54:48

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH v3 10/13] KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore

Use kvm_smram_state_64 struct to save/restore the 64 bit SMM state
(used when X86_FEATURE_LM is present in the guest CPUID,
regardless of 32-bitness of the guest).

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 88 ++++++++++++++----------------------------
arch/x86/kvm/x86.c | 75 ++++++++++++++++-------------------
2 files changed, 62 insertions(+), 101 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 3339d542a25439..4bdbc5893a1657 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2385,24 +2385,16 @@ static void rsm_load_seg_32(struct x86_emulate_ctxt *ctxt,
}

#ifdef CONFIG_X86_64
-static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
- int n)
+static void rsm_load_seg_64(struct x86_emulate_ctxt *ctxt,
+ const struct kvm_smm_seg_state_64 *state,
+ int n)
{
struct desc_struct desc;
- int offset;
- u16 selector;
- u32 base3;
-
- offset = 0x7e00 + n * 16;
-
- selector = GET_SMSTATE(u16, smstate, offset);
- rsm_set_desc_flags(&desc, GET_SMSTATE(u16, smstate, offset + 2) << 8);
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, offset + 4));
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, offset + 8));
- base3 = GET_SMSTATE(u32, smstate, offset + 12);

- ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
- return X86EMUL_CONTINUE;
+ rsm_set_desc_flags(&desc, state->attributes << 8);
+ set_desc_limit(&desc, state->limit);
+ set_desc_base(&desc, (u32)state->base);
+ ctxt->ops->set_segment(ctxt, state->selector, &desc, state->base >> 32, n);
}
#endif

@@ -2496,71 +2488,49 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,

#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
- const char *smstate)
+ const struct kvm_smram_state_64 *smstate)
{
- struct desc_struct desc;
struct desc_ptr dt;
- u64 val, cr0, cr3, cr4;
- u32 base3;
- u16 selector;
int i, r;

for (i = 0; i < 16; i++)
- *reg_write(ctxt, i) = GET_SMSTATE(u64, smstate, 0x7ff8 - i * 8);
+ *reg_write(ctxt, i) = smstate->gprs[15 - i];

- ctxt->_eip = GET_SMSTATE(u64, smstate, 0x7f78);
- ctxt->eflags = GET_SMSTATE(u32, smstate, 0x7f70) | X86_EFLAGS_FIXED;
+ ctxt->_eip = smstate->rip;
+ ctxt->eflags = smstate->rflags | X86_EFLAGS_FIXED;

- val = GET_SMSTATE(u64, smstate, 0x7f68);
-
- if (ctxt->ops->set_dr(ctxt, 6, val))
+ if (ctxt->ops->set_dr(ctxt, 6, smstate->dr6))
return X86EMUL_UNHANDLEABLE;
-
- val = GET_SMSTATE(u64, smstate, 0x7f60);
-
- if (ctxt->ops->set_dr(ctxt, 7, val))
+ if (ctxt->ops->set_dr(ctxt, 7, smstate->dr7))
return X86EMUL_UNHANDLEABLE;

- cr0 = GET_SMSTATE(u64, smstate, 0x7f58);
- cr3 = GET_SMSTATE(u64, smstate, 0x7f50);
- cr4 = GET_SMSTATE(u64, smstate, 0x7f48);
- ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smstate, 0x7f00));
- val = GET_SMSTATE(u64, smstate, 0x7ed0);
+ ctxt->ops->set_smbase(ctxt, smstate->smbase);

- if (ctxt->ops->set_msr(ctxt, MSR_EFER, val & ~EFER_LMA))
+ if (ctxt->ops->set_msr(ctxt, MSR_EFER, smstate->efer & ~EFER_LMA))
return X86EMUL_UNHANDLEABLE;

- selector = GET_SMSTATE(u32, smstate, 0x7e90);
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, 0x7e92) << 8);
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, 0x7e94));
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, 0x7e98));
- base3 = GET_SMSTATE(u32, smstate, 0x7e9c);
- ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_TR);
+ rsm_load_seg_64(ctxt, &smstate->tr, VCPU_SREG_TR);

- dt.size = GET_SMSTATE(u32, smstate, 0x7e84);
- dt.address = GET_SMSTATE(u64, smstate, 0x7e88);
+ dt.size = smstate->idtr.limit;
+ dt.address = smstate->idtr.base;
ctxt->ops->set_idt(ctxt, &dt);

- selector = GET_SMSTATE(u32, smstate, 0x7e70);
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, 0x7e72) << 8);
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, 0x7e74));
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, 0x7e78));
- base3 = GET_SMSTATE(u32, smstate, 0x7e7c);
- ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_LDTR);
+ rsm_load_seg_64(ctxt, &smstate->ldtr, VCPU_SREG_LDTR);

- dt.size = GET_SMSTATE(u32, smstate, 0x7e64);
- dt.address = GET_SMSTATE(u64, smstate, 0x7e68);
+ dt.size = smstate->gdtr.limit;
+ dt.address = smstate->gdtr.base;
ctxt->ops->set_gdt(ctxt, &dt);

- r = rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
+ r = rsm_enter_protected_mode(ctxt, smstate->cr0, smstate->cr3, smstate->cr4);
if (r != X86EMUL_CONTINUE)
return r;

- for (i = 0; i < 6; i++) {
- r = rsm_load_seg_64(ctxt, smstate, i);
- if (r != X86EMUL_CONTINUE)
- return r;
- }
+ rsm_load_seg_64(ctxt, &smstate->es, VCPU_SREG_ES);
+ rsm_load_seg_64(ctxt, &smstate->cs, VCPU_SREG_CS);
+ rsm_load_seg_64(ctxt, &smstate->ss, VCPU_SREG_SS);
+ rsm_load_seg_64(ctxt, &smstate->ds, VCPU_SREG_DS);
+ rsm_load_seg_64(ctxt, &smstate->fs, VCPU_SREG_FS);
+ rsm_load_seg_64(ctxt, &smstate->gs, VCPU_SREG_GS);

return X86EMUL_CONTINUE;
}
@@ -2635,7 +2605,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)

#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
- ret = rsm_load_state_64(ctxt, (const char *)&smram);
+ ret = rsm_load_state_64(ctxt, &smram.smram64);
else
#endif
ret = rsm_load_state_32(ctxt, &smram.smram32);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6abe35f7687e2c..4e3ef63baf83df 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9848,20 +9848,17 @@ static void enter_smm_save_seg_32(struct kvm_vcpu *vcpu,
}

#ifdef CONFIG_X86_64
-static void enter_smm_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
+static void enter_smm_save_seg_64(struct kvm_vcpu *vcpu,
+ struct kvm_smm_seg_state_64 *state,
+ int n)
{
struct kvm_segment seg;
- int offset;
- u16 flags;

kvm_get_segment(vcpu, &seg, n);
- offset = 0x7e00 + n * 16;
-
- flags = enter_smm_get_segment_flags(&seg) >> 8;
- put_smstate(u16, buf, offset, seg.selector);
- put_smstate(u16, buf, offset + 2, flags);
- put_smstate(u32, buf, offset + 4, seg.limit);
- put_smstate(u64, buf, offset + 8, seg.base);
+ state->selector = seg.selector;
+ state->attributes = enter_smm_get_segment_flags(&seg) >> 8;
+ state->limit = seg.limit;
+ state->base = seg.base;
}
#endif

@@ -9909,57 +9906,51 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, struct kvm_smram_stat
}

#ifdef CONFIG_X86_64
-static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
+static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, struct kvm_smram_state_64 *smram)
{
struct desc_ptr dt;
- struct kvm_segment seg;
unsigned long val;
int i;

for (i = 0; i < 16; i++)
- put_smstate(u64, buf, 0x7ff8 - i * 8, kvm_register_read_raw(vcpu, i));
+ smram->gprs[15 - i] = kvm_register_read_raw(vcpu, i);
+
+ smram->rip = kvm_rip_read(vcpu);
+ smram->rflags = kvm_get_rflags(vcpu);

- put_smstate(u64, buf, 0x7f78, kvm_rip_read(vcpu));
- put_smstate(u32, buf, 0x7f70, kvm_get_rflags(vcpu));

kvm_get_dr(vcpu, 6, &val);
- put_smstate(u64, buf, 0x7f68, val);
+ smram->dr6 = val;
kvm_get_dr(vcpu, 7, &val);
- put_smstate(u64, buf, 0x7f60, val);
-
- put_smstate(u64, buf, 0x7f58, kvm_read_cr0(vcpu));
- put_smstate(u64, buf, 0x7f50, kvm_read_cr3(vcpu));
- put_smstate(u64, buf, 0x7f48, kvm_read_cr4(vcpu));
+ smram->dr7 = val;

- put_smstate(u32, buf, 0x7f00, vcpu->arch.smbase);
+ smram->cr0 = kvm_read_cr0(vcpu);
+ smram->cr3 = kvm_read_cr3(vcpu);
+ smram->cr4 = kvm_read_cr4(vcpu);

- /* revision id */
- put_smstate(u32, buf, 0x7efc, 0x00020064);
+ smram->smbase = vcpu->arch.smbase;
+ smram->smm_revison = 0x00020064;

- put_smstate(u64, buf, 0x7ed0, vcpu->arch.efer);
+ smram->efer = vcpu->arch.efer;

- kvm_get_segment(vcpu, &seg, VCPU_SREG_TR);
- put_smstate(u16, buf, 0x7e90, seg.selector);
- put_smstate(u16, buf, 0x7e92, enter_smm_get_segment_flags(&seg) >> 8);
- put_smstate(u32, buf, 0x7e94, seg.limit);
- put_smstate(u64, buf, 0x7e98, seg.base);
+ enter_smm_save_seg_64(vcpu, &smram->tr, VCPU_SREG_TR);

static_call(kvm_x86_get_idt)(vcpu, &dt);
- put_smstate(u32, buf, 0x7e84, dt.size);
- put_smstate(u64, buf, 0x7e88, dt.address);
+ smram->idtr.limit = dt.size;
+ smram->idtr.base = dt.address;

- kvm_get_segment(vcpu, &seg, VCPU_SREG_LDTR);
- put_smstate(u16, buf, 0x7e70, seg.selector);
- put_smstate(u16, buf, 0x7e72, enter_smm_get_segment_flags(&seg) >> 8);
- put_smstate(u32, buf, 0x7e74, seg.limit);
- put_smstate(u64, buf, 0x7e78, seg.base);
+ enter_smm_save_seg_64(vcpu, &smram->ldtr, VCPU_SREG_LDTR);

static_call(kvm_x86_get_gdt)(vcpu, &dt);
- put_smstate(u32, buf, 0x7e64, dt.size);
- put_smstate(u64, buf, 0x7e68, dt.address);
+ smram->gdtr.limit = dt.size;
+ smram->gdtr.base = dt.address;

- for (i = 0; i < 6; i++)
- enter_smm_save_seg_64(vcpu, buf, i);
+ enter_smm_save_seg_64(vcpu, &smram->es, VCPU_SREG_ES);
+ enter_smm_save_seg_64(vcpu, &smram->cs, VCPU_SREG_CS);
+ enter_smm_save_seg_64(vcpu, &smram->ss, VCPU_SREG_SS);
+ enter_smm_save_seg_64(vcpu, &smram->ds, VCPU_SREG_DS);
+ enter_smm_save_seg_64(vcpu, &smram->fs, VCPU_SREG_FS);
+ enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
}
#endif

@@ -9973,7 +9964,7 @@ static void enter_smm(struct kvm_vcpu *vcpu)
memset(smram.bytes, 0, sizeof(smram.bytes));
#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
- enter_smm_save_state_64(vcpu, (char *)&smram);
+ enter_smm_save_state_64(vcpu, &smram.smram64);
else
#endif
enter_smm_save_state_32(vcpu, &smram.smram32);
--
2.26.3


2022-08-03 15:54:49

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH v3 12/13] KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode capable

When the guest CPUID doesn't have support for long mode, 32 bit SMRAM
layout is used and it has no support for preserving EFER and/or SVM
state.

Note that this isn't relevant to running 32 bit guests on VM which is
long mode capable - such VM can still run 32 bit guests in compatibility
mode.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7ca5e06878e19a..64cfd26bc5e7a6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4442,6 +4442,15 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
if (!is_guest_mode(vcpu))
return 0;

+ /*
+ * 32 bit SMRAM format doesn't preserve EFER and SVM state.
+ * SVM should not be enabled by the userspace without marking
+ * the CPU as at least long mode capable.
+ */
+
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ return 1;
+
smram->smram64.svm_guest_flag = 1;
smram->smram64.svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;

--
2.26.3


2022-08-03 16:42:48

by Maxim Levitsky

[permalink] [raw]
Subject: [PATCH v3 03/13] KVM: x86: emulator: introduce emulator_recalc_and_set_mode

Some instructions update the cpu execution mode, which needs
to update the emulation mode.

Extract this code, and make assign_eip_far use it.

assign_eip_far now reads CS, instead of getting it via a parameter,
which is ok, because callers always assign CS to the
same value before calling it.

No functional change is intended.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 85 ++++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7bdc495710bd0e..bc70caf403c2b4 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -805,8 +805,7 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
ctxt->mode, linear);
}

-static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst,
- enum x86emul_mode mode)
+static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
{
ulong linear;
int rc;
@@ -816,41 +815,71 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst,

if (ctxt->op_bytes != sizeof(unsigned long))
addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
- rc = __linearize(ctxt, addr, &max_size, 1, false, true, mode, &linear);
+ rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
if (rc == X86EMUL_CONTINUE)
ctxt->_eip = addr.ea;
return rc;
}

+static inline int emulator_recalc_and_set_mode(struct x86_emulate_ctxt *ctxt)
+{
+ u64 efer;
+ struct desc_struct cs;
+ u16 selector;
+ u32 base3;
+
+ ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
+
+ if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {
+ /* Real mode. cpu must not have long mode active */
+ if (efer & EFER_LMA)
+ return X86EMUL_UNHANDLEABLE;
+ ctxt->mode = X86EMUL_MODE_REAL;
+ return X86EMUL_CONTINUE;
+ }
+
+ if (ctxt->eflags & X86_EFLAGS_VM) {
+ /* Protected/VM86 mode. cpu must not have long mode active */
+ if (efer & EFER_LMA)
+ return X86EMUL_UNHANDLEABLE;
+ ctxt->mode = X86EMUL_MODE_VM86;
+ return X86EMUL_CONTINUE;
+ }
+
+ if (!ctxt->ops->get_segment(ctxt, &selector, &cs, &base3, VCPU_SREG_CS))
+ return X86EMUL_UNHANDLEABLE;
+
+ if (efer & EFER_LMA) {
+ if (cs.l) {
+ /* Proper long mode */
+ ctxt->mode = X86EMUL_MODE_PROT64;
+ } else if (cs.d) {
+ /* 32 bit compatibility mode*/
+ ctxt->mode = X86EMUL_MODE_PROT32;
+ } else {
+ ctxt->mode = X86EMUL_MODE_PROT16;
+ }
+ } else {
+ /* Legacy 32 bit / 16 bit mode */
+ ctxt->mode = cs.d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
+ }
+
+ return X86EMUL_CONTINUE;
+}
+
static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
{
- return assign_eip(ctxt, dst, ctxt->mode);
+ return assign_eip(ctxt, dst);
}

-static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
- const struct desc_struct *cs_desc)
+static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst)
{
- enum x86emul_mode mode = ctxt->mode;
- int rc;
+ int rc = emulator_recalc_and_set_mode(ctxt);

-#ifdef CONFIG_X86_64
- if (ctxt->mode >= X86EMUL_MODE_PROT16) {
- if (cs_desc->l) {
- u64 efer = 0;
+ if (rc != X86EMUL_CONTINUE)
+ return rc;

- ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
- if (efer & EFER_LMA)
- mode = X86EMUL_MODE_PROT64;
- } else
- mode = X86EMUL_MODE_PROT32; /* temporary value */
- }
-#endif
- if (mode == X86EMUL_MODE_PROT16 || mode == X86EMUL_MODE_PROT32)
- mode = cs_desc->d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
- rc = assign_eip(ctxt, dst, mode);
- if (rc == X86EMUL_CONTINUE)
- ctxt->mode = mode;
- return rc;
+ return assign_eip(ctxt, dst);
}

static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
@@ -2184,7 +2213,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
+ rc = assign_eip_far(ctxt, ctxt->src.val);
/* Error handling is not implemented. */
if (rc != X86EMUL_CONTINUE)
return X86EMUL_UNHANDLEABLE;
@@ -2262,7 +2291,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
&new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
- rc = assign_eip_far(ctxt, eip, &new_desc);
+ rc = assign_eip_far(ctxt, eip);
/* Error handling is not implemented. */
if (rc != X86EMUL_CONTINUE)
return X86EMUL_UNHANDLEABLE;
@@ -3482,7 +3511,7 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
+ rc = assign_eip_far(ctxt, ctxt->src.val);
if (rc != X86EMUL_CONTINUE)
goto fail;

--
2.26.3


2022-08-10 12:34:15

by Thomas Lamprecht

[permalink] [raw]
Subject: Re: [PATCH v3 00/13] SMM emulation and interrupt shadow fixes

On 03/08/2022 17:49, Maxim Levitsky wrote:
> This patch series is a result of long debug work to find out why
> sometimes guests with win11 secure boot
> were failing during boot.
>
> During writing a unit test I found another bug, turns out
> that on rsm emulation, if the rsm instruction was done in real
> or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
>
> I also refactored the way we write SMRAM so it is easier
> now to understand what is going on.
>
> The main bug in this series which I fixed is that we
> allowed #SMI to happen during the STI interrupt shadow,
> and we did nothing to both reset it on #SMI handler
> entry and restore it on RSM.
>
> V3: addressed most of the review feedback from Sean (thanks!)
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (13):
> bug: introduce ASSERT_STRUCT_OFFSET
> KVM: x86: emulator: em_sysexit should update ctxt->mode
> KVM: x86: emulator: introduce emulator_recalc_and_set_mode
> KVM: x86: emulator: update the emulation mode after rsm
> KVM: x86: emulator: update the emulation mode after CR0 write
> KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
> the image format
> KVM: x86: emulator/smm: add structs for KVM's smram layout
> KVM: x86: emulator/smm: use smram structs in the common code
> KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
> KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
> KVM: x86: SVM: use smram structs
> KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode
> capable
> KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
>
> arch/x86/include/asm/kvm_host.h | 11 +-
> arch/x86/kvm/emulate.c | 305 +++++++++++++++++---------------
> arch/x86/kvm/kvm_emulate.h | 223 ++++++++++++++++++++++-
> arch/x86/kvm/svm/svm.c | 30 ++--
> arch/x86/kvm/vmx/vmcs12.h | 5 +-
> arch/x86/kvm/vmx/vmx.c | 4 +-
> arch/x86/kvm/x86.c | 175 +++++++++---------
> include/linux/build_bug.h | 9 +
> 8 files changed, 497 insertions(+), 265 deletions(-)
>

FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption
required (mostly unrelated context change) and now also updated that backport
to the v3 of this patch series.

Our reproducer got fixed with either, but v3 now also avoids triggering logs like:

Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns
Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns

which happened earlier (not sure how deep that correlates with the v2 vs. v3, but
it stuck out, so mentioning for sake of completeness).

For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in
the SMRAM image depends on the image format", as that constant was there yet and
the actual values stayed the same for our case FWICT and adapted to slight context
changes for the others.

So, the approach seems to fix our issue and we are already rolling out a kernel
to users for testing and got positive feedback there too.

With above in mind:

Tested-by: Thomas Lamprecht <[email protected]>

It would be also great to see this backported to still supported upstream stable kernels
from 5.15 onwards, as there the TDP MMU got by default enabled, and that is at least
increasing the chance of our reproducer to trigger dramatically.

thx & cheers
Thomas

2022-08-10 13:38:46

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 00/13] SMM emulation and interrupt shadow fixes

On Wed, 2022-08-10 at 14:00 +0200, Thomas Lamprecht wrote:
> On 03/08/2022 17:49, Maxim Levitsky wrote:
> > This patch series is a result of long debug work to find out why
> > sometimes guests with win11 secure boot
> > were failing during boot.
> >
> > During writing a unit test I found another bug, turns out
> > that on rsm emulation, if the rsm instruction was done in real
> > or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
> >
> > I also refactored the way we write SMRAM so it is easier
> > now to understand what is going on.
> >
> > The main bug in this series which I fixed is that we
> > allowed #SMI to happen during the STI interrupt shadow,
> > and we did nothing to both reset it on #SMI handler
> > entry and restore it on RSM.
> >
> > V3: addressed most of the review feedback from Sean (thanks!)
> >
> > Best regards,
> >         Maxim Levitsky
> >
> > Maxim Levitsky (13):
> >   bug: introduce ASSERT_STRUCT_OFFSET
> >   KVM: x86: emulator: em_sysexit should update ctxt->mode
> >   KVM: x86: emulator: introduce emulator_recalc_and_set_mode
> >   KVM: x86: emulator: update the emulation mode after rsm
> >   KVM: x86: emulator: update the emulation mode after CR0 write
> >   KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
> >     the image format
> >   KVM: x86: emulator/smm: add structs for KVM's smram layout
> >   KVM: x86: emulator/smm: use smram structs in the common code
> >   KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
> >   KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
> >   KVM: x86: SVM: use smram structs
> >   KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode
> >     capable
> >   KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
> >
> >  arch/x86/include/asm/kvm_host.h |  11 +-
> >  arch/x86/kvm/emulate.c          | 305 +++++++++++++++++---------------
> >  arch/x86/kvm/kvm_emulate.h      | 223 ++++++++++++++++++++++-
> >  arch/x86/kvm/svm/svm.c          |  30 ++--
> >  arch/x86/kvm/vmx/vmcs12.h       |   5 +-
> >  arch/x86/kvm/vmx/vmx.c          |   4 +-
> >  arch/x86/kvm/x86.c              | 175 +++++++++---------
> >  include/linux/build_bug.h       |   9 +
> >  8 files changed, 497 insertions(+), 265 deletions(-)
> >
>
> FWIW, we tested the v2 on 5.19 and backported it to 5.15 with minimal adaption
> required (mostly unrelated context change) and now also updated that backport
> to the v3 of this patch series.
>
> Our reproducer got fixed with either, but v3 now also avoids triggering logs like:
>
>  Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
>  Jul 29 04:59:18 mits4 QEMU[2775]: kvm: Could not update PFLASH: Stale file handle
>  Jul 29 07:15:46 mits4 kernel: kvm: vcpu 1: requested 191999 ns lapic timer period limited to 200000 ns
>  Jul 29 11:06:31 mits4 kernel: kvm: vcpu 1: requested 105786 ns lapic timer period limited to 200000 ns
>
> which happened earlier (not sure how deep that correlates with the v2 vs. v3, but
> it stuck out, so mentioning for sake of completeness).

This is likely just a coincidence because V3 should not contain any functional changes vs v2.
(If I remember correctly)

>
> For the backport to 5.15 we skipped "KVM: x86: emulator/smm: number of GPRs in
> the SMRAM image depends on the image format", as that constant was there yet and
> the actual values stayed the same for our case FWICT and adapted to slight context
> changes for the others.
>
> So, the approach seems to fix our issue and we are already rolling out a kernel
> to users for testing and got positive feedback there too.
>
> With above in mind:
>
> Tested-by: Thomas Lamprecht <[email protected]>

Thank you very much for testing!

>
> It would be also great to see this backported to still supported upstream stable kernels
> from 5.15 onwards, as there the TDP MMU got by default enabled, and that is at least
> increasing the chance of our reproducer to trigger dramatically.

Best regards,
Maxim Levitsky

>
> thx & cheers
> Thomas
>


2022-08-11 16:12:54

by Yang, Weijiang

[permalink] [raw]
Subject: Re: [PATCH v3 03/13] KVM: x86: emulator: introduce emulator_recalc_and_set_mode


On 8/3/2022 11:50 PM, Maxim Levitsky wrote:
> [...]
> +static inline int emulator_recalc_and_set_mode(struct x86_emulate_ctxt *ctxt)
> +{
> + u64 efer;
> + struct desc_struct cs;
> + u16 selector;
> + u32 base3;
> +
> + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
> +
> + if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {
Shouldn't this be:  !(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) ?
> + /* Real mode. cpu must not have long mode active */
> + if (efer & EFER_LMA)
> + return X86EMUL_UNHANDLEABLE;
> + ctxt->mode = X86EMUL_MODE_REAL;
> + return X86EMUL_CONTINUE;
> + }
> +
[...]
> --
> 2.26.3
>

2022-08-12 06:55:44

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 03/13] KVM: x86: emulator: introduce emulator_recalc_and_set_mode

On Thu, 2022-08-11 at 23:33 +0800, Yang, Weijiang wrote:
> On 8/3/2022 11:50 PM, Maxim Levitsky wrote:
> > [...]
> > +static inline int emulator_recalc_and_set_mode(struct x86_emulate_ctxt *ctxt)
> > +{
> > + u64 efer;
> > + struct desc_struct cs;
> > + u16 selector;
> > + u32 base3;
> > +
> > + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
> > +
> > + if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {

Ouch, thanks for catching this!!

I wonder how I didn't catch this in unit tests....
(I'll check on this Sunday)


Best regards,
Maxim Levitsky

> Shouldn't this be: !(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) ?
> > + /* Real mode. cpu must not have long mode active */
> > + if (efer & EFER_LMA)
> > + return X86EMUL_UNHANDLEABLE;
> > + ctxt->mode = X86EMUL_MODE_REAL;
> > + return X86EMUL_CONTINUE;
> > + }
> > +
> [...]
> > --
> > 2.26.3
> >


2022-08-17 14:57:23

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v3 03/13] KVM: x86: emulator: introduce emulator_recalc_and_set_mode

On Thu, 2022-08-11 at 23:33 +0800, Yang, Weijiang wrote:
>
> On 8/3/2022 11:50 PM, Maxim Levitsky wrote:
> > [...]
> > +static inline int emulator_recalc_and_set_mode(struct x86_emulate_ctxt *ctxt)
> > +{
> > +       u64 efer;
> > +       struct desc_struct cs;
> > +       u16 selector;
> > +       u32 base3;
> > +
> > +       ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
> > +
> > +       if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {
> Shouldn't this be:  !(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) ?

Ouch, thanks for noticing this!
I'll fix it in v4 (I'll wait a bit if I get more review feedback before sending it).

Best regards,
Maxim Levitsky




> > +               /* Real mode. cpu must not have long mode active */
> > +               if (efer & EFER_LMA)
> > +                       return X86EMUL_UNHANDLEABLE;
> > +               ctxt->mode = X86EMUL_MODE_REAL;
> > +               return X86EMUL_CONTINUE;
> > +       }
> > +
> [...]
> > --
> > 2.26.3
> >
>


2022-08-24 21:55:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 04/13] KVM: x86: emulator: update the emulation mode after rsm

On Wed, Aug 03, 2022, Maxim Levitsky wrote:

Please make the changelog standalone, even though it means restating the shortlog
in most cases. When viewing git commits, the shortlog+changelog are bundled
fairly close together, but when viewing patches in a mail client, e.g. when doing
initial review, the shortlog is in the subject which may be far away or even
completely hidden.

> This ensures that RIP will be correctly written back,
> because the RSM instruction can switch the CPU mode from
> 32 bit (or less) to 64 bit.

Wrap closer to ~75 chars.

>
> This fixes a guest crash in case the #SMI is received
> while the guest runs a code from an address > 32 bit.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index bc70caf403c2b4..5e91b26cc1d8aa 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2666,6 +2666,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
> if (ret != X86EMUL_CONTINUE)
> goto emulate_shutdown;
>
> +

Unnecessary newline.

> + ret = emulator_recalc_and_set_mode(ctxt);
> + if (ret != X86EMUL_CONTINUE)
> + goto emulate_shutdown;
> +
> /*
> * Note, the ctxt->ops callbacks are responsible for handling side
> * effects when writing MSRs and CRs, e.g. MMU context resets, CPUID
> --
> 2.26.3
>

2022-08-24 23:06:53

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 12/13] KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode capable

On Wed, Aug 03, 2022, Maxim Levitsky wrote:
> When the guest CPUID doesn't have support for long mode, 32 bit SMRAM
> layout is used and it has no support for preserving EFER and/or SVM
> state.
>
> Note that this isn't relevant to running 32 bit guests on VM which is
> long mode capable - such VM can still run 32 bit guests in compatibility
> mode.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/svm/svm.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7ca5e06878e19a..64cfd26bc5e7a6 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4442,6 +4442,15 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
> if (!is_guest_mode(vcpu))
> return 0;
>
> + /*
> + * 32 bit SMRAM format doesn't preserve EFER and SVM state.
> + * SVM should not be enabled by the userspace without marking
> + * the CPU as at least long mode capable.

Hmm, or userspace can ensure SMIs never get delivered. Maybe?

/*
* 32-bit SMRAM format doesn't preserve EFER and SVM state. Userspace is
* responsible for ensuring nested SVM and SMIs are mutually exclusive.
*/

> + */
> +

Unnecessary newline.

> + if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
> + return 1;

This doesn't actually fix anything, RSM will still jump to L2 state but in L1
context. I think we first need to actually handle errors from
static_call(kvm_x86_enter_smm).

Given that SVM can't even guarantee nested_svm_simple_vmexit() succeeds, i.e. KVM
can't force the vCPU out of L2 to ensure triple fault would hit L1, killing the VM
seems like the least awful solution (and it's still quite awful).

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 54fa0aa95785..38a6f4089296 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9985,7 +9985,10 @@ static void enter_smm(struct kvm_vcpu *vcpu)
* state (e.g. leave guest mode) after we've saved the state into the
* SMM state-save area.
*/
- static_call(kvm_x86_enter_smm)(vcpu, &smram);
+ if (static_call(kvm_x86_enter_smm)(vcpu, &smram)) {
+ kvm_vm_dead(vcpu->vm);
+ return;
+ }

kvm_smm_changed(vcpu, true);
kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, &smram, sizeof(smram));

> +
> smram->smram64.svm_guest_flag = 1;
> smram->smram64.svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;
>
> --
> 2.26.3
>

2022-08-24 23:27:06

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 10/13] KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore

On Wed, Aug 03, 2022, Maxim Levitsky wrote:
> @@ -9909,57 +9906,51 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, struct kvm_smram_stat
> }
>
> #ifdef CONFIG_X86_64
> -static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
> +static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, struct kvm_smram_state_64 *smram)

Please put these on different lines.

> struct desc_ptr dt;
> - struct kvm_segment seg;
> unsigned long val;
> int i;
>
> for (i = 0; i < 16; i++)
> - put_smstate(u64, buf, 0x7ff8 - i * 8, kvm_register_read_raw(vcpu, i));
> + smram->gprs[15 - i] = kvm_register_read_raw(vcpu, i);

Blech, why do I get the feeling that the original layout was designed so that
ucode could use PUSHAD? This look so weird...