2022-06-21 15:12:57

by Maxim Levitsky

Subject: [PATCH v2 00/11] SMM emulation and interrupt shadow fixes

This patch series is the result of a long debugging effort to find out why guests with win11 secure boot were sometimes failing during boot.

While writing a unit test I found another bug: it turns out that on RSM emulation, if the RSM instruction was executed in real or 32-bit mode, KVM would truncate the restored RIP to 32 bits.

I also refactored the way we write SMRAM so it is now easier to understand what is going on.

The main bug fixed in this series is that we allowed #SMI to happen during the STI interrupt shadow, and did nothing to either reset it on #SMI handler entry or restore it on RSM.
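
In short, the flow added by the last patch looks roughly like this (a sketch of the relevant lines, see the patch itself for the full diff):

    /* on SMM entry: stash the interrupt shadow in SMRAM, then clear it */
    smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
    static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);

    /* on RSM: restore the interrupt shadow that was active before the #SMI */
    ctxt->ops->set_int_shadow(ctxt, 0);
    ctxt->interruptibility = (u8)smstate->int_shadow;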



Best regards,

Maxim Levitsky



Maxim Levitsky (11):
  KVM: x86: emulator: em_sysexit should update ctxt->mode
  KVM: x86: emulator: introduce update_emulation_mode
  KVM: x86: emulator: remove assign_eip_near/far
  KVM: x86: emulator: update the emulation mode after rsm
  KVM: x86: emulator: update the emulation mode after CR0 write
  KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
    the image format
  KVM: x86: emulator/smm: add structs for KVM's smram layout
  KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
  KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
  KVM: x86: SVM: use smram structs
  KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM



arch/x86/include/asm/kvm_host.h |   6 -
arch/x86/kvm/emulate.c          | 305 ++++++++++++++++----------------
arch/x86/kvm/kvm_emulate.h      | 146 +++++++++++++++
arch/x86/kvm/svm/svm.c          |  28 +--
arch/x86/kvm/x86.c              | 162 ++++++++---------
5 files changed, 394 insertions(+), 253 deletions(-)



--
2.26.3





2022-06-21 15:12:57

by Maxim Levitsky

Subject: [PATCH v2 03/11] KVM: x86: emulator: remove assign_eip_near/far

Now that assign_eip_far just updates the emulation mode in addition to updating the RIP, it doesn't make sense to keep these thin wrappers.

Move the mode update to the callers and remove both assign_eip_near and assign_eip_far.

No functional change is intended.
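
For illustration, the far control transfer handlers now open-code the pattern that assign_eip_far used to wrap (this is just the shape of the change; the diff below has the real code):

    rc = update_emulation_mode(ctxt);
    if (rc != X86EMUL_CONTINUE)
        return rc;

    rc = assign_eip(ctxt, ctxt->src.val);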

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 47 +++++++++++++++++++++---------------------
1 file changed, 24 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2c0087df2d7e6a..334a06e6c9b093 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -866,24 +866,9 @@ static inline int update_emulation_mode(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}

-static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
-{
- return assign_eip(ctxt, dst);
-}
-
-static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst)
-{
- int rc = update_emulation_mode(ctxt);
-
- if (rc != X86EMUL_CONTINUE)
- return rc;
-
- return assign_eip(ctxt, dst);
-}
-
static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
{
- return assign_eip_near(ctxt, ctxt->_eip + rel);
+ return assign_eip(ctxt, ctxt->_eip + rel);
}

static int linear_read_system(struct x86_emulate_ctxt *ctxt, ulong linear,
@@ -2213,7 +2198,12 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- rc = assign_eip_far(ctxt, ctxt->src.val);
+ rc = update_emulation_mode(ctxt);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ rc = assign_eip(ctxt, ctxt->src.val);
+
/* Error handling is not implemented. */
if (rc != X86EMUL_CONTINUE)
return X86EMUL_UNHANDLEABLE;
@@ -2223,7 +2213,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)

static int em_jmp_abs(struct x86_emulate_ctxt *ctxt)
{
- return assign_eip_near(ctxt, ctxt->src.val);
+ return assign_eip(ctxt, ctxt->src.val);
}

static int em_call_near_abs(struct x86_emulate_ctxt *ctxt)
@@ -2232,7 +2222,7 @@ static int em_call_near_abs(struct x86_emulate_ctxt *ctxt)
long int old_eip;

old_eip = ctxt->_eip;
- rc = assign_eip_near(ctxt, ctxt->src.val);
+ rc = assign_eip(ctxt, ctxt->src.val);
if (rc != X86EMUL_CONTINUE)
return rc;
ctxt->src.val = old_eip;
@@ -2270,7 +2260,7 @@ static int em_ret(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- return assign_eip_near(ctxt, eip);
+ return assign_eip(ctxt, eip);
}

static int em_ret_far(struct x86_emulate_ctxt *ctxt)
@@ -2291,7 +2281,13 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
&new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
- rc = assign_eip_far(ctxt, eip);
+
+ rc = update_emulation_mode(ctxt);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ rc = assign_eip(ctxt, eip);
+
/* Error handling is not implemented. */
if (rc != X86EMUL_CONTINUE)
return X86EMUL_UNHANDLEABLE;
@@ -3511,7 +3507,12 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- rc = assign_eip_far(ctxt, ctxt->src.val);
+ rc = update_emulation_mode(ctxt);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ rc = assign_eip(ctxt, ctxt->src.val);
+
if (rc != X86EMUL_CONTINUE)
goto fail;

@@ -3544,7 +3545,7 @@ static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)
rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
- rc = assign_eip_near(ctxt, eip);
+ rc = assign_eip(ctxt, eip);
if (rc != X86EMUL_CONTINUE)
return rc;
rsp_increment(ctxt, ctxt->src.val);
--
2.26.3

2022-06-21 15:13:09

by Maxim Levitsky

Subject: [PATCH v2 07/11] KVM: x86: emulator/smm: add structs for KVM's smram layout

These structs will be used to read/write the SMRAM state image.

Also document the differences between KVM's SMRAM layout and the SMRAM layout used by real Intel/AMD CPUs.
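
Both layouts are meant to exactly cover the 512-byte state save area at offsets 0xFE00-0xFFFF of the SMRAM page, so size checks along these lines would hold (just a sketch with a made-up helper name, not part of this patch):

    /* sketch only: both layouts must span exactly 512 bytes (0xFE00-0xFFFF) */
    static inline void kvm_smram_layout_check(void)
    {
        BUILD_BUG_ON(sizeof(struct kvm_smm_seg_state_32) != 12);
        BUILD_BUG_ON(sizeof(struct kvm_smram_state_32) != 512);
        BUILD_BUG_ON(sizeof(struct kvm_smram_state_64) != 512);
    }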

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/kvm_emulate.h | 139 +++++++++++++++++++++++++++++++++++++
1 file changed, 139 insertions(+)

diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 89246446d6aa9d..7015728da36d5f 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -503,6 +503,145 @@ enum x86_intercept {
nr_x86_intercepts
};

+
+/*
+ * 32 bit KVM's emulated SMM layout
+ * Loosely based on Intel's layout
+ */
+
+struct kvm_smm_seg_state_32 {
+ u32 flags;
+ u32 limit;
+ u32 base;
+} __packed;
+
+struct kvm_smram_state_32 {
+
+ u32 reserved1[62]; /* FE00 - FEF7 */
+ u32 smbase; /* FEF8 */
+ u32 smm_revision; /* FEFC */
+ u32 reserved2[5]; /* FF00-FF13 */
+ /* CR4 is not present in Intel/AMD SMRAM image*/
+ u32 cr4; /* FF14 */
+ u32 reserved3[5]; /* FF18 */
+
+ /*
+ * Segment state is not present/documented in the
+ * Intel/AMD SMRAM image
+ */
+ struct kvm_smm_seg_state_32 ds; /* FF2C */
+ struct kvm_smm_seg_state_32 fs; /* FF38 */
+ struct kvm_smm_seg_state_32 gs; /* FF44 */
+ /* idtr has only base and limit*/
+ struct kvm_smm_seg_state_32 idtr; /* FF50 */
+ struct kvm_smm_seg_state_32 tr; /* FF5C */
+ u32 reserved; /* FF68 */
+ /* gdtr has only base and limit*/
+ struct kvm_smm_seg_state_32 gdtr; /* FF6C */
+ struct kvm_smm_seg_state_32 ldtr; /* FF78 */
+ struct kvm_smm_seg_state_32 es; /* FF84 */
+ struct kvm_smm_seg_state_32 cs; /* FF90 */
+ struct kvm_smm_seg_state_32 ss; /* FF9C */
+
+ u32 es_sel; /* FFA8 */
+ u32 cs_sel; /* FFAC */
+ u32 ss_sel; /* FFB0 */
+ u32 ds_sel; /* FFB4 */
+ u32 fs_sel; /* FFB8 */
+ u32 gs_sel; /* FFBC */
+ u32 ldtr_sel; /* FFC0 */
+ u32 tr_sel; /* FFC4 */
+
+ u32 dr7; /* FFC8 */
+ u32 dr6; /* FFCC */
+
+ /* GPRS in the "natural" X86 order (RAX/RCX/RDX.../RDI)*/
+ u32 gprs[8]; /* FFD0-FFEC */
+
+ u32 eip; /* FFF0 */
+ u32 eflags; /* FFF4 */
+ u32 cr3; /* FFF8 */
+ u32 cr0; /* FFFC */
+} __packed;
+
+/*
+ * 64 bit KVM's emulated SMM layout
+ * Based on AMD64 layout
+ */
+
+struct kvm_smm_seg_state_64 {
+ u16 selector;
+ u16 attributes;
+ u32 limit;
+ u64 base;
+};
+
+struct kvm_smram_state_64 {
+ struct kvm_smm_seg_state_64 es; /* FE00 (R/O) */
+ struct kvm_smm_seg_state_64 cs; /* FE10 (R/O) */
+ struct kvm_smm_seg_state_64 ss; /* FE20 (R/O) */
+ struct kvm_smm_seg_state_64 ds; /* FE30 (R/O) */
+ struct kvm_smm_seg_state_64 fs; /* FE40 (R/O) */
+ struct kvm_smm_seg_state_64 gs; /* FE50 (R/O) */
+
+ /* gdtr has only base and limit*/
+ struct kvm_smm_seg_state_64 gdtr; /* FE60 (R/O) */
+ struct kvm_smm_seg_state_64 ldtr; /* FE70 (R/O) */
+
+ /* idtr has only base and limit*/
+ struct kvm_smm_seg_state_64 idtr; /* FE80 (R/O) */
+ struct kvm_smm_seg_state_64 tr; /* FE90 (R/O) */
+
+ /* I/O restart and auto halt restart are not implemented by KVM */
+ u64 io_restart_rip; /* FEA0 (R/O) */
+ u64 io_restart_rcx; /* FEA8 (R/O) */
+ u64 io_restart_rsi; /* FEB0 (R/O) */
+ u64 io_restart_rdi; /* FEB8 (R/O) */
+ u32 io_restart_dword; /* FEC0 (R/O) */
+ u32 reserved1; /* FEC4 */
+ u8 io_instruction_restart; /* FEC8 (R/W) */
+ u8 auto_halt_restart; /* FEC9 (R/W) */
+ u8 reserved2[6]; /* FECA-FECF */
+
+ u64 efer; /* FED0 (R/O) */
+
+ /*
+ * Implemented on AMD only, to store current SVM guest address.
+ * svm_guest_virtual_int has unknown purpose, not implemented.
+ */
+
+ u64 svm_guest_flag; /* FED8 (R/O) */
+ u64 svm_guest_vmcb_gpa; /* FEE0 (R/O) */
+ u64 svm_guest_virtual_int; /* FEE8 (R/O) */
+
+ u32 reserved3[3]; /* FEF0-FEFB */
+ u32 smm_revison; /* FEFC (R/O) */
+ u32 smbase; /* FF00 (R/W) */
+ u32 reserved4[5]; /* FF04-FF17 */
+
+ /* SSP and SVM fields below are not implemented by KVM */
+ u64 ssp; /* FF18 (R/W) */
+ u64 svm_guest_pat; /* FF20 (R/O) */
+ u64 svm_host_efer; /* FF28 (R/O) */
+ u64 svm_host_cr4; /* FF30 (R/O) */
+ u64 svm_host_cr3; /* FF38 (R/O) */
+ u64 svm_host_cr0; /* FF40 (R/O) */
+
+ u64 cr4; /* FF48 (R/O) */
+ u64 cr3; /* FF50 (R/O) */
+ u64 cr0; /* FF58 (R/O) */
+
+ u64 dr7; /* FF60 (R/O) */
+ u64 dr6; /* FF68 (R/O) */
+
+ u64 rflags; /* FF70 (R/W) */
+ u64 rip; /* FF78 (R/W) */
+
+ /* GPRS in a reversed "natural" X86 order (R15/R14/../RCX/RAX.) */
+ u64 gprs[16]; /* FF80-FFFF (R/W) */
+};
+
+
/* Host execution mode. */
#if defined(CONFIG_X86_32)
#define X86EMUL_MODE_HOST X86EMUL_MODE_PROT32
--
2.26.3

2022-06-21 15:13:16

by Maxim Levitsky

Subject: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

When #SMI is asserted, the CPU can be in an interrupt shadow
due to sti or mov ss.

The Intel/AMD PRMs do not mandate that #SMI be blocked during
the shadow, and on top of that, since neither SVM nor VMX has
true support for an SMI window, waiting for one more instruction
would mean single stepping the guest.

Instead, allow #SMI in this case, but reset the interrupt shadow
on SMM entry and stash its value in SMRAM so it can be restored
on exit from SMM.

This fixes rare failures seen mostly on Windows guests on VMX,
when #SMI falls on the sti instruction, which manifests as a
VM entry failure because EFLAGS.IF is not set while the STI
interrupt shadow is still set in the VMCS.


Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 17 ++++++++++++++---
arch/x86/kvm/kvm_emulate.h | 13 ++++++++++---
arch/x86/kvm/x86.c | 12 ++++++++++++
3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7a3a042d6b862a..d4ede5216491ad 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2443,7 +2443,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
struct kvm_smram_state_32 *smstate)
{
struct desc_ptr dt;
- int i;
+ int i, r;

ctxt->eflags = smstate->eflags | X86_EFLAGS_FIXED;
ctxt->_eip = smstate->eip;
@@ -2478,8 +2478,16 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,

ctxt->ops->set_smbase(ctxt, smstate->smbase);

- return rsm_enter_protected_mode(ctxt, smstate->cr0,
- smstate->cr3, smstate->cr4);
+ r = rsm_enter_protected_mode(ctxt, smstate->cr0,
+ smstate->cr3, smstate->cr4);
+
+ if (r != X86EMUL_CONTINUE)
+ return r;
+
+ ctxt->ops->set_int_shadow(ctxt, 0);
+ ctxt->interruptibility = (u8)smstate->int_shadow;
+
+ return X86EMUL_CONTINUE;
}

#ifdef CONFIG_X86_64
@@ -2528,6 +2536,9 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
rsm_load_seg_64(ctxt, &smstate->fs, VCPU_SREG_FS);
rsm_load_seg_64(ctxt, &smstate->gs, VCPU_SREG_GS);

+ ctxt->ops->set_int_shadow(ctxt, 0);
+ ctxt->interruptibility = (u8)smstate->int_shadow;
+
return X86EMUL_CONTINUE;
}
#endif
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index 7015728da36d5f..11928306439c77 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -232,6 +232,7 @@ struct x86_emulate_ops {
bool (*guest_has_rdpid)(struct x86_emulate_ctxt *ctxt);

void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
+ void (*set_int_shadow)(struct x86_emulate_ctxt *ctxt, u8 shadow);

unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
void (*exiting_smm)(struct x86_emulate_ctxt *ctxt);
@@ -520,7 +521,9 @@ struct kvm_smram_state_32 {
u32 reserved1[62]; /* FE00 - FEF7 */
u32 smbase; /* FEF8 */
u32 smm_revision; /* FEFC */
- u32 reserved2[5]; /* FF00-FF13 */
+ u32 reserved2[4]; /* FF00-FF0F*/
+ /* int_shadow is KVM extension*/
+ u32 int_shadow; /* FF10 */
/* CR4 is not present in Intel/AMD SMRAM image*/
u32 cr4; /* FF14 */
u32 reserved3[5]; /* FF18 */
@@ -592,13 +595,17 @@ struct kvm_smram_state_64 {
struct kvm_smm_seg_state_64 idtr; /* FE80 (R/O) */
struct kvm_smm_seg_state_64 tr; /* FE90 (R/O) */

- /* I/O restart and auto halt restart are not implemented by KVM */
+ /*
+ * I/O restart and auto halt restart are not implemented by KVM
+ * int_shadow is KVM's extension
+ */
+
u64 io_restart_rip; /* FEA0 (R/O) */
u64 io_restart_rcx; /* FEA8 (R/O) */
u64 io_restart_rsi; /* FEB0 (R/O) */
u64 io_restart_rdi; /* FEB8 (R/O) */
u32 io_restart_dword; /* FEC0 (R/O) */
- u32 reserved1; /* FEC4 */
+ u32 int_shadow; /* FEC4 (R/O) */
u8 io_instruction_restart; /* FEC8 (R/W) */
u8 auto_halt_restart; /* FEC9 (R/W) */
u8 reserved2[6]; /* FECA-FECF */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1b138f0815d30..665134b1096b25 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7887,6 +7887,11 @@ static void emulator_set_nmi_mask(struct x86_emulate_ctxt *ctxt, bool masked)
static_call(kvm_x86_set_nmi_mask)(emul_to_vcpu(ctxt), masked);
}

+static void emulator_set_int_shadow(struct x86_emulate_ctxt *ctxt, u8 shadow)
+{
+ static_call(kvm_x86_set_interrupt_shadow)(emul_to_vcpu(ctxt), shadow);
+}
+
static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
{
return emul_to_vcpu(ctxt)->arch.hflags;
@@ -7967,6 +7972,7 @@ static const struct x86_emulate_ops emulate_ops = {
.guest_has_fxsr = emulator_guest_has_fxsr,
.guest_has_rdpid = emulator_guest_has_rdpid,
.set_nmi_mask = emulator_set_nmi_mask,
+ .set_int_shadow = emulator_set_int_shadow,
.get_hflags = emulator_get_hflags,
.exiting_smm = emulator_exiting_smm,
.leave_smm = emulator_leave_smm,
@@ -9744,6 +9750,8 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, struct kvm_smram_stat
smram->cr4 = kvm_read_cr4(vcpu);
smram->smm_revision = 0x00020000;
smram->smbase = vcpu->arch.smbase;
+
+ smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
}

#ifdef CONFIG_X86_64
@@ -9792,6 +9800,8 @@ static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, struct kvm_smram_stat
enter_smm_save_seg_64(vcpu, &smram->ds, VCPU_SREG_DS);
enter_smm_save_seg_64(vcpu, &smram->fs, VCPU_SREG_FS);
enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
+
+ smram->int_shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
}
#endif

@@ -9828,6 +9838,8 @@ static void enter_smm(struct kvm_vcpu *vcpu)
kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
kvm_rip_write(vcpu, 0x8000);

+ static_call(kvm_x86_set_interrupt_shadow)(vcpu, 0);
+
cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG);
static_call(kvm_x86_set_cr0)(vcpu, cr0);
vcpu->arch.cr0 = cr0;
--
2.26.3

2022-06-21 15:13:32

by Maxim Levitsky

Subject: [PATCH v2 09/11] KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore

Use the kvm_smram_state_64 struct to save/restore the 64-bit SMM state
(used when X86_FEATURE_LM is present in the guest CPUID,
regardless of whether the guest itself runs in 32-bit mode).

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 88 ++++++++++++++----------------------------
arch/x86/kvm/x86.c | 75 ++++++++++++++++-------------------
2 files changed, 62 insertions(+), 101 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6d263906054689..7a3a042d6b862a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2381,24 +2381,16 @@ static void rsm_load_seg_32(struct x86_emulate_ctxt *ctxt,
}

#ifdef CONFIG_X86_64
-static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, const char *smstate,
- int n)
+static void rsm_load_seg_64(struct x86_emulate_ctxt *ctxt,
+ struct kvm_smm_seg_state_64 *state,
+ int n)
{
struct desc_struct desc;
- int offset;
- u16 selector;
- u32 base3;
-
- offset = 0x7e00 + n * 16;
-
- selector = GET_SMSTATE(u16, smstate, offset);
- rsm_set_desc_flags(&desc, GET_SMSTATE(u16, smstate, offset + 2) << 8);
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, offset + 4));
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, offset + 8));
- base3 = GET_SMSTATE(u32, smstate, offset + 12);

- ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
- return X86EMUL_CONTINUE;
+ rsm_set_desc_flags(&desc, state->attributes << 8);
+ set_desc_limit(&desc, state->limit);
+ set_desc_base(&desc, (u32)state->base);
+ ctxt->ops->set_segment(ctxt, state->selector, &desc, state->base >> 32, n);
}
#endif

@@ -2492,71 +2484,49 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,

#ifdef CONFIG_X86_64
static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
- const char *smstate)
+ struct kvm_smram_state_64 *smstate)
{
- struct desc_struct desc;
struct desc_ptr dt;
- u64 val, cr0, cr3, cr4;
- u32 base3;
- u16 selector;
int i, r;

for (i = 0; i < 16; i++)
- *reg_write(ctxt, i) = GET_SMSTATE(u64, smstate, 0x7ff8 - i * 8);
+ *reg_write(ctxt, i) = smstate->gprs[15 - i];

- ctxt->_eip = GET_SMSTATE(u64, smstate, 0x7f78);
- ctxt->eflags = GET_SMSTATE(u32, smstate, 0x7f70) | X86_EFLAGS_FIXED;
+ ctxt->_eip = smstate->rip;
+ ctxt->eflags = smstate->rflags | X86_EFLAGS_FIXED;

- val = GET_SMSTATE(u64, smstate, 0x7f68);
-
- if (ctxt->ops->set_dr(ctxt, 6, val))
+ if (ctxt->ops->set_dr(ctxt, 6, smstate->dr6))
return X86EMUL_UNHANDLEABLE;
-
- val = GET_SMSTATE(u64, smstate, 0x7f60);
-
- if (ctxt->ops->set_dr(ctxt, 7, val))
+ if (ctxt->ops->set_dr(ctxt, 7, smstate->dr7))
return X86EMUL_UNHANDLEABLE;

- cr0 = GET_SMSTATE(u64, smstate, 0x7f58);
- cr3 = GET_SMSTATE(u64, smstate, 0x7f50);
- cr4 = GET_SMSTATE(u64, smstate, 0x7f48);
- ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smstate, 0x7f00));
- val = GET_SMSTATE(u64, smstate, 0x7ed0);
+ ctxt->ops->set_smbase(ctxt, smstate->smbase);

- if (ctxt->ops->set_msr(ctxt, MSR_EFER, val & ~EFER_LMA))
+ if (ctxt->ops->set_msr(ctxt, MSR_EFER, smstate->efer & ~EFER_LMA))
return X86EMUL_UNHANDLEABLE;

- selector = GET_SMSTATE(u32, smstate, 0x7e90);
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, 0x7e92) << 8);
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, 0x7e94));
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, 0x7e98));
- base3 = GET_SMSTATE(u32, smstate, 0x7e9c);
- ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_TR);
+ rsm_load_seg_64(ctxt, &smstate->tr, VCPU_SREG_TR);

- dt.size = GET_SMSTATE(u32, smstate, 0x7e84);
- dt.address = GET_SMSTATE(u64, smstate, 0x7e88);
+ dt.size = smstate->idtr.limit;
+ dt.address = smstate->idtr.base;
ctxt->ops->set_idt(ctxt, &dt);

- selector = GET_SMSTATE(u32, smstate, 0x7e70);
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, 0x7e72) << 8);
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, 0x7e74));
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, 0x7e78));
- base3 = GET_SMSTATE(u32, smstate, 0x7e7c);
- ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_LDTR);
+ rsm_load_seg_64(ctxt, &smstate->ldtr, VCPU_SREG_LDTR);

- dt.size = GET_SMSTATE(u32, smstate, 0x7e64);
- dt.address = GET_SMSTATE(u64, smstate, 0x7e68);
+ dt.size = smstate->gdtr.limit;
+ dt.address = smstate->gdtr.base;
ctxt->ops->set_gdt(ctxt, &dt);

- r = rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
+ r = rsm_enter_protected_mode(ctxt, smstate->cr0, smstate->cr3, smstate->cr4);
if (r != X86EMUL_CONTINUE)
return r;

- for (i = 0; i < 6; i++) {
- r = rsm_load_seg_64(ctxt, smstate, i);
- if (r != X86EMUL_CONTINUE)
- return r;
- }
+ rsm_load_seg_64(ctxt, &smstate->es, VCPU_SREG_ES);
+ rsm_load_seg_64(ctxt, &smstate->cs, VCPU_SREG_CS);
+ rsm_load_seg_64(ctxt, &smstate->ss, VCPU_SREG_SS);
+ rsm_load_seg_64(ctxt, &smstate->ds, VCPU_SREG_DS);
+ rsm_load_seg_64(ctxt, &smstate->fs, VCPU_SREG_FS);
+ rsm_load_seg_64(ctxt, &smstate->gs, VCPU_SREG_GS);

return X86EMUL_CONTINUE;
}
@@ -2629,7 +2599,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)

#ifdef CONFIG_X86_64
if (emulator_has_longmode(ctxt))
- ret = rsm_load_state_64(ctxt, buf);
+ ret = rsm_load_state_64(ctxt, (struct kvm_smram_state_64 *)buf);
else
#endif
ret = rsm_load_state_32(ctxt, (struct kvm_smram_state_32 *)buf);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a1bbf2ed520769..a1b138f0815d30 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9689,20 +9689,17 @@ static void enter_smm_save_seg_32(struct kvm_vcpu *vcpu,
}

#ifdef CONFIG_X86_64
-static void enter_smm_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
+static void enter_smm_save_seg_64(struct kvm_vcpu *vcpu,
+ struct kvm_smm_seg_state_64 *state,
+ int n)
{
struct kvm_segment seg;
- int offset;
- u16 flags;

kvm_get_segment(vcpu, &seg, n);
- offset = 0x7e00 + n * 16;
-
- flags = enter_smm_get_segment_flags(&seg) >> 8;
- put_smstate(u16, buf, offset, seg.selector);
- put_smstate(u16, buf, offset + 2, flags);
- put_smstate(u32, buf, offset + 4, seg.limit);
- put_smstate(u64, buf, offset + 8, seg.base);
+ state->selector = seg.selector;
+ state->attributes = enter_smm_get_segment_flags(&seg) >> 8;
+ state->limit = seg.limit;
+ state->base = seg.base;
}
#endif

@@ -9750,57 +9747,51 @@ static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, struct kvm_smram_stat
}

#ifdef CONFIG_X86_64
-static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, char *buf)
+static void enter_smm_save_state_64(struct kvm_vcpu *vcpu, struct kvm_smram_state_64 *smram)
{
struct desc_ptr dt;
- struct kvm_segment seg;
unsigned long val;
int i;

for (i = 0; i < 16; i++)
- put_smstate(u64, buf, 0x7ff8 - i * 8, kvm_register_read_raw(vcpu, i));
+ smram->gprs[15 - i] = kvm_register_read_raw(vcpu, i);
+
+ smram->rip = kvm_rip_read(vcpu);
+ smram->rflags = kvm_get_rflags(vcpu);

- put_smstate(u64, buf, 0x7f78, kvm_rip_read(vcpu));
- put_smstate(u32, buf, 0x7f70, kvm_get_rflags(vcpu));

kvm_get_dr(vcpu, 6, &val);
- put_smstate(u64, buf, 0x7f68, val);
+ smram->dr6 = val;
kvm_get_dr(vcpu, 7, &val);
- put_smstate(u64, buf, 0x7f60, val);
-
- put_smstate(u64, buf, 0x7f58, kvm_read_cr0(vcpu));
- put_smstate(u64, buf, 0x7f50, kvm_read_cr3(vcpu));
- put_smstate(u64, buf, 0x7f48, kvm_read_cr4(vcpu));
+ smram->dr7 = val;

- put_smstate(u32, buf, 0x7f00, vcpu->arch.smbase);
+ smram->cr0 = kvm_read_cr0(vcpu);
+ smram->cr3 = kvm_read_cr3(vcpu);
+ smram->cr4 = kvm_read_cr4(vcpu);

- /* revision id */
- put_smstate(u32, buf, 0x7efc, 0x00020064);
+ smram->smbase = vcpu->arch.smbase;
+ smram->smm_revison = 0x00020064;

- put_smstate(u64, buf, 0x7ed0, vcpu->arch.efer);
+ smram->efer = vcpu->arch.efer;

- kvm_get_segment(vcpu, &seg, VCPU_SREG_TR);
- put_smstate(u16, buf, 0x7e90, seg.selector);
- put_smstate(u16, buf, 0x7e92, enter_smm_get_segment_flags(&seg) >> 8);
- put_smstate(u32, buf, 0x7e94, seg.limit);
- put_smstate(u64, buf, 0x7e98, seg.base);
+ enter_smm_save_seg_64(vcpu, &smram->tr, VCPU_SREG_TR);

static_call(kvm_x86_get_idt)(vcpu, &dt);
- put_smstate(u32, buf, 0x7e84, dt.size);
- put_smstate(u64, buf, 0x7e88, dt.address);
+ smram->idtr.limit = dt.size;
+ smram->idtr.base = dt.address;

- kvm_get_segment(vcpu, &seg, VCPU_SREG_LDTR);
- put_smstate(u16, buf, 0x7e70, seg.selector);
- put_smstate(u16, buf, 0x7e72, enter_smm_get_segment_flags(&seg) >> 8);
- put_smstate(u32, buf, 0x7e74, seg.limit);
- put_smstate(u64, buf, 0x7e78, seg.base);
+ enter_smm_save_seg_64(vcpu, &smram->ldtr, VCPU_SREG_LDTR);

static_call(kvm_x86_get_gdt)(vcpu, &dt);
- put_smstate(u32, buf, 0x7e64, dt.size);
- put_smstate(u64, buf, 0x7e68, dt.address);
+ smram->gdtr.limit = dt.size;
+ smram->gdtr.base = dt.address;

- for (i = 0; i < 6; i++)
- enter_smm_save_seg_64(vcpu, buf, i);
+ enter_smm_save_seg_64(vcpu, &smram->es, VCPU_SREG_ES);
+ enter_smm_save_seg_64(vcpu, &smram->cs, VCPU_SREG_CS);
+ enter_smm_save_seg_64(vcpu, &smram->ss, VCPU_SREG_SS);
+ enter_smm_save_seg_64(vcpu, &smram->ds, VCPU_SREG_DS);
+ enter_smm_save_seg_64(vcpu, &smram->fs, VCPU_SREG_FS);
+ enter_smm_save_seg_64(vcpu, &smram->gs, VCPU_SREG_GS);
}
#endif

@@ -9814,7 +9805,7 @@ static void enter_smm(struct kvm_vcpu *vcpu)
memset(buf, 0, 512);
#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
- enter_smm_save_state_64(vcpu, buf);
+ enter_smm_save_state_64(vcpu, (struct kvm_smram_state_64 *)buf);
else
#endif
enter_smm_save_state_32(vcpu, (struct kvm_smram_state_32 *)buf);
--
2.26.3

2022-06-21 15:15:29

by Maxim Levitsky

Subject: [PATCH v2 05/11] KVM: x86: emulator: update the emulation mode after CR0 write

CR0.PE toggles real/protected mode, thus writing it should also
update the emulation mode.

This is likely a benign bug because there is no writeback
of state other than the RIP increment, and when toggling
CR0.PE the CPU has to execute code from a very low memory address.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6f4632babc4cd8..002687d17f9364 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3659,11 +3659,22 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)

static int em_cr_write(struct x86_emulate_ctxt *ctxt)
{
- if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
+ int cr_num = ctxt->modrm_reg;
+ int r;
+
+ if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val))
return emulate_gp(ctxt, 0);

/* Disable writeback. */
ctxt->dst.type = OP_NONE;
+
+ if (cr_num == 0) {
+ /* CR0 write might have updated CR0.PE */
+ r = update_emulation_mode(ctxt);
+ if (r != X86EMUL_CONTINUE)
+ return r;
+ }
+
return X86EMUL_CONTINUE;
}

--
2.26.3

2022-06-21 15:15:59

by Maxim Levitsky

Subject: [PATCH v2 10/11] KVM: x86: SVM: use smram structs

This removes the last users of put_smstate/GET_SMSTATE, so
drop these macros as well.

Also add a sanity check that we don't attempt to enter SMM
while a nested guest is running on a vCPU that is not long-mode capable.
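
To illustrate, saving the VMCB address of the running nested guest changes from a magic offset to a named field, roughly:

    /* before: raw put_smstate() with a hardcoded offset */
    put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb12_gpa);

    /* after: named field of the 64 bit SMRAM layout struct */
    smram->svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;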

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ------
arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++-----------
2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1038ccb7056a39..9e8467be96b4e6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2057,12 +2057,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
#endif
}

-#define put_smstate(type, buf, offset, val) \
- *(type *)((buf) + (offset) - 0x7e00) = val
-
-#define GET_SMSTATE(type, buf, offset) \
- (*(type *)((buf) + (offset) - 0x7e00))
-
int kvm_cpu_dirty_log_size(void);

int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 136298cfb3fb57..8dcbbe839bef36 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4399,6 +4399,7 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)

static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
{
+ struct kvm_smram_state_64 *smram = (struct kvm_smram_state_64 *)smstate;
struct vcpu_svm *svm = to_svm(vcpu);
struct kvm_host_map map_save;
int ret;
@@ -4406,10 +4407,17 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
if (!is_guest_mode(vcpu))
return 0;

- /* FED8h - SVM Guest */
- put_smstate(u64, smstate, 0x7ed8, 1);
- /* FEE0h - SVM Guest VMCB Physical Address */
- put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb12_gpa);
+ /*
+ * 32 bit SMRAM format doesn't preserve EFER and SVM state.
+ * SVM should not be enabled by the userspace without marking
+ * the CPU as at least long mode capable.
+ */
+
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
+ return 1;
+
+ smram->svm_guest_flag = 1;
+ smram->svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;

svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
@@ -4446,9 +4454,9 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)

static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
{
+ struct kvm_smram_state_64 *smram = (struct kvm_smram_state_64 *)smstate;
struct vcpu_svm *svm = to_svm(vcpu);
struct kvm_host_map map, map_save;
- u64 saved_efer, vmcb12_gpa;
struct vmcb *vmcb12;
int ret;

@@ -4456,18 +4464,16 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
return 0;

/* Non-zero if SMI arrived while vCPU was in guest mode. */
- if (!GET_SMSTATE(u64, smstate, 0x7ed8))
+ if (!smram->svm_guest_flag)
return 0;

if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
return 1;

- saved_efer = GET_SMSTATE(u64, smstate, 0x7ed0);
- if (!(saved_efer & EFER_SVME))
+ if (!(smram->efer & EFER_SVME))
return 1;

- vmcb12_gpa = GET_SMSTATE(u64, smstate, 0x7ee0);
- if (kvm_vcpu_map(vcpu, gpa_to_gfn(vmcb12_gpa), &map) == -EINVAL)
+ if (kvm_vcpu_map(vcpu, gpa_to_gfn(smram->svm_guest_vmcb_gpa), &map) == -EINVAL)
return 1;

ret = 1;
@@ -4493,7 +4499,7 @@ static int svm_leave_smm(struct kvm_vcpu *vcpu, const char *smstate)
vmcb12 = map.hva;
nested_copy_vmcb_control_to_cache(svm, &vmcb12->control);
nested_copy_vmcb_save_to_cache(svm, &vmcb12->save);
- ret = enter_svm_guest_mode(vcpu, vmcb12_gpa, vmcb12, false);
+ ret = enter_svm_guest_mode(vcpu, smram->svm_guest_vmcb_gpa, vmcb12, false);

if (ret)
goto unmap_save;
--
2.26.3

2022-06-21 15:26:13

by Maxim Levitsky

Subject: [PATCH v2 04/11] KVM: x86: emulator: update the emulation mode after rsm

This ensures that RIP will be correctly written back,
because the RSM instruction can switch the CPU mode from
32 bit (or less) to 64 bit.

This fixes a guest crash in case the #SMI is received
while the guest runs code from an address that doesn't fit into 32 bits.

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 334a06e6c9b093..6f4632babc4cd8 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2662,6 +2662,11 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
if (ret != X86EMUL_CONTINUE)
goto emulate_shutdown;

+
+ ret = update_emulation_mode(ctxt);
+ if (ret != X86EMUL_CONTINUE)
+ goto emulate_shutdown;
+
/*
* Note, the ctxt->ops callbacks are responsible for handling side
* effects when writing MSRs and CRs, e.g. MMU context resets, CPUID
--
2.26.3

2022-06-21 15:49:24

by Maxim Levitsky

Subject: [PATCH v2 02/11] KVM: x86: emulator: introduce update_emulation_mode

Some instructions update the CPU execution mode, which in turn
needs to update the emulation mode.

Extract this code into a new update_emulation_mode() helper and
make assign_eip_far use it.

assign_eip_far now reads CS itself instead of getting it via a
parameter, which is fine because the callers always assign CS to the
same value before calling it.

No functional change is intended.
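
For reference, the mode selection in the new helper boils down to the following (a simplified sketch with the error handling omitted; see the diff below for the real code):

    if (!(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE))
        ctxt->mode = X86EMUL_MODE_REAL;
    else if (ctxt->eflags & X86_EFLAGS_VM)
        ctxt->mode = X86EMUL_MODE_VM86;
    else if ((efer & EFER_LMA) && cs.l)
        ctxt->mode = X86EMUL_MODE_PROT64;
    else
        ctxt->mode = cs.d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;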

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 85 ++++++++++++++++++++++++++++--------------
1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 5aeb343ca8b007..2c0087df2d7e6a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -804,8 +804,7 @@ static int linearize(struct x86_emulate_ctxt *ctxt,
ctxt->mode, linear);
}

-static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst,
- enum x86emul_mode mode)
+static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst)
{
ulong linear;
int rc;
@@ -815,41 +814,71 @@ static inline int assign_eip(struct x86_emulate_ctxt *ctxt, ulong dst,

if (ctxt->op_bytes != sizeof(unsigned long))
addr.ea = dst & ((1UL << (ctxt->op_bytes << 3)) - 1);
- rc = __linearize(ctxt, addr, &max_size, 1, false, true, mode, &linear);
+ rc = __linearize(ctxt, addr, &max_size, 1, false, true, ctxt->mode, &linear);
if (rc == X86EMUL_CONTINUE)
ctxt->_eip = addr.ea;
return rc;
}

+static inline int update_emulation_mode(struct x86_emulate_ctxt *ctxt)
+{
+ u64 efer;
+ struct desc_struct cs;
+ u16 selector;
+ u32 base3;
+
+ ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
+
+ if (!(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE)) {
+ /* Real mode. cpu must not have long mode active */
+ if (efer & EFER_LMA)
+ return X86EMUL_UNHANDLEABLE;
+ ctxt->mode = X86EMUL_MODE_REAL;
+ return X86EMUL_CONTINUE;
+ }
+
+ if (ctxt->eflags & X86_EFLAGS_VM) {
+ /* Protected/VM86 mode. cpu must not have long mode active */
+ if (efer & EFER_LMA)
+ return X86EMUL_UNHANDLEABLE;
+ ctxt->mode = X86EMUL_MODE_VM86;
+ return X86EMUL_CONTINUE;
+ }
+
+ if (!ctxt->ops->get_segment(ctxt, &selector, &cs, &base3, VCPU_SREG_CS))
+ return X86EMUL_UNHANDLEABLE;
+
+ if (efer & EFER_LMA) {
+ if (cs.l) {
+ /* Proper long mode */
+ ctxt->mode = X86EMUL_MODE_PROT64;
+ } else if (cs.d) {
+ /* 32 bit compatibility mode*/
+ ctxt->mode = X86EMUL_MODE_PROT32;
+ } else {
+ ctxt->mode = X86EMUL_MODE_PROT16;
+ }
+ } else {
+ /* Legacy 32 bit / 16 bit mode */
+ ctxt->mode = cs.d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
+ }
+
+ return X86EMUL_CONTINUE;
+}
+
static inline int assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
{
- return assign_eip(ctxt, dst, ctxt->mode);
+ return assign_eip(ctxt, dst);
}

-static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst,
- const struct desc_struct *cs_desc)
+static int assign_eip_far(struct x86_emulate_ctxt *ctxt, ulong dst)
{
- enum x86emul_mode mode = ctxt->mode;
- int rc;
+ int rc = update_emulation_mode(ctxt);

-#ifdef CONFIG_X86_64
- if (ctxt->mode >= X86EMUL_MODE_PROT16) {
- if (cs_desc->l) {
- u64 efer = 0;
+ if (rc != X86EMUL_CONTINUE)
+ return rc;

- ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
- if (efer & EFER_LMA)
- mode = X86EMUL_MODE_PROT64;
- } else
- mode = X86EMUL_MODE_PROT32; /* temporary value */
- }
-#endif
- if (mode == X86EMUL_MODE_PROT16 || mode == X86EMUL_MODE_PROT32)
- mode = cs_desc->d ? X86EMUL_MODE_PROT32 : X86EMUL_MODE_PROT16;
- rc = assign_eip(ctxt, dst, mode);
- if (rc == X86EMUL_CONTINUE)
- ctxt->mode = mode;
- return rc;
+ return assign_eip(ctxt, dst);
}

static inline int jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
@@ -2184,7 +2213,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
+ rc = assign_eip_far(ctxt, ctxt->src.val);
/* Error handling is not implemented. */
if (rc != X86EMUL_CONTINUE)
return X86EMUL_UNHANDLEABLE;
@@ -2262,7 +2291,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
&new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;
- rc = assign_eip_far(ctxt, eip, &new_desc);
+ rc = assign_eip_far(ctxt, eip);
/* Error handling is not implemented. */
if (rc != X86EMUL_CONTINUE)
return X86EMUL_UNHANDLEABLE;
@@ -3482,7 +3511,7 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;

- rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
+ rc = assign_eip_far(ctxt, ctxt->src.val);
if (rc != X86EMUL_CONTINUE)
goto fail;

--
2.26.3

2022-06-21 15:52:35

by Maxim Levitsky

Subject: [PATCH v2 06/11] KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on the image format

On a 64-bit host, if the guest doesn't have X86_FEATURE_LM, we would
access 16 GPRs in the 32-bit SMRAM image, causing an out-of-bounds
access (the loop would go up to offset 0x7fd0 + 15 * 4 = 0x800c, past
the end of the save area at 0x7fff).

On a 32-bit host, rsm_load_state_64/enter_smm_save_state_64
are compiled out, thus the overflow can't happen.
Fixes: b443183a25ab61 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM")

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 002687d17f9364..ce186aebca8e83 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2469,7 +2469,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
ctxt->eflags = GET_SMSTATE(u32, smstate, 0x7ff4) | X86_EFLAGS_FIXED;
ctxt->_eip = GET_SMSTATE(u32, smstate, 0x7ff0);

- for (i = 0; i < NR_EMULATOR_GPRS; i++)
+ for (i = 0; i < 8; i++)
*reg_write(ctxt, i) = GET_SMSTATE(u32, smstate, 0x7fd0 + i * 4);

val = GET_SMSTATE(u32, smstate, 0x7fcc);
@@ -2526,7 +2526,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
u16 selector;
int i, r;

- for (i = 0; i < NR_EMULATOR_GPRS; i++)
+ for (i = 0; i < 16; i++)
*reg_write(ctxt, i) = GET_SMSTATE(u64, smstate, 0x7ff8 - i * 8);

ctxt->_eip = GET_SMSTATE(u64, smstate, 0x7f78);
--
2.26.3

2022-06-21 15:52:45

by Maxim Levitsky

Subject: [PATCH v2 08/11] KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore

Use the kvm_smram_state_32 struct to save/restore the 32-bit SMM state
(used when X86_FEATURE_LM is not present in the guest CPUID).
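
In other words, the opaque 512-byte buffer that enter_smm() builds is now simply reinterpreted through the layout struct, roughly:

    char buf[512];
    ...
    enter_smm_save_state_32(vcpu, (struct kvm_smram_state_32 *)buf);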

Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 81 +++++++++++++++---------------------------
arch/x86/kvm/x86.c | 75 +++++++++++++++++---------------------
2 files changed, 60 insertions(+), 96 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ce186aebca8e83..6d263906054689 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2367,25 +2367,17 @@ static void rsm_set_desc_flags(struct desc_struct *desc, u32 flags)
desc->type = (flags >> 8) & 15;
}

-static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, const char *smstate,
+static void rsm_load_seg_32(struct x86_emulate_ctxt *ctxt,
+ struct kvm_smm_seg_state_32 *state,
+ u16 selector,
int n)
{
struct desc_struct desc;
- int offset;
- u16 selector;
-
- selector = GET_SMSTATE(u32, smstate, 0x7fa8 + n * 4);
-
- if (n < 3)
- offset = 0x7f84 + n * 12;
- else
- offset = 0x7f2c + (n - 3) * 12;

- set_desc_base(&desc, GET_SMSTATE(u32, smstate, offset + 8));
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, offset + 4));
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, offset));
+ set_desc_base(&desc, state->base);
+ set_desc_limit(&desc, state->limit);
+ rsm_set_desc_flags(&desc, state->flags);
ctxt->ops->set_segment(ctxt, selector, &desc, 0, n);
- return X86EMUL_CONTINUE;
}

#ifdef CONFIG_X86_64
@@ -2456,63 +2448,46 @@ static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
}

static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
- const char *smstate)
+ struct kvm_smram_state_32 *smstate)
{
- struct desc_struct desc;
struct desc_ptr dt;
- u16 selector;
- u32 val, cr0, cr3, cr4;
int i;

- cr0 = GET_SMSTATE(u32, smstate, 0x7ffc);
- cr3 = GET_SMSTATE(u32, smstate, 0x7ff8);
- ctxt->eflags = GET_SMSTATE(u32, smstate, 0x7ff4) | X86_EFLAGS_FIXED;
- ctxt->_eip = GET_SMSTATE(u32, smstate, 0x7ff0);
+ ctxt->eflags = smstate->eflags | X86_EFLAGS_FIXED;
+ ctxt->_eip = smstate->eip;

for (i = 0; i < 8; i++)
- *reg_write(ctxt, i) = GET_SMSTATE(u32, smstate, 0x7fd0 + i * 4);
-
- val = GET_SMSTATE(u32, smstate, 0x7fcc);
+ *reg_write(ctxt, i) = smstate->gprs[i];

- if (ctxt->ops->set_dr(ctxt, 6, val))
+ if (ctxt->ops->set_dr(ctxt, 6, smstate->dr6))
return X86EMUL_UNHANDLEABLE;
-
- val = GET_SMSTATE(u32, smstate, 0x7fc8);
-
- if (ctxt->ops->set_dr(ctxt, 7, val))
+ if (ctxt->ops->set_dr(ctxt, 7, smstate->dr7))
return X86EMUL_UNHANDLEABLE;

- selector = GET_SMSTATE(u32, smstate, 0x7fc4);
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, 0x7f64));
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, 0x7f60));
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, 0x7f5c));
- ctxt->ops->set_segment(ctxt, selector, &desc, 0, VCPU_SREG_TR);
+ rsm_load_seg_32(ctxt, &smstate->tr, smstate->tr_sel, VCPU_SREG_TR);
+ rsm_load_seg_32(ctxt, &smstate->ldtr, smstate->ldtr_sel, VCPU_SREG_LDTR);

- selector = GET_SMSTATE(u32, smstate, 0x7fc0);
- set_desc_base(&desc, GET_SMSTATE(u32, smstate, 0x7f80));
- set_desc_limit(&desc, GET_SMSTATE(u32, smstate, 0x7f7c));
- rsm_set_desc_flags(&desc, GET_SMSTATE(u32, smstate, 0x7f78));
- ctxt->ops->set_segment(ctxt, selector, &desc, 0, VCPU_SREG_LDTR);

- dt.address = GET_SMSTATE(u32, smstate, 0x7f74);
- dt.size = GET_SMSTATE(u32, smstate, 0x7f70);
+ dt.address = smstate->gdtr.base;
+ dt.size = smstate->gdtr.limit;
ctxt->ops->set_gdt(ctxt, &dt);

- dt.address = GET_SMSTATE(u32, smstate, 0x7f58);
- dt.size = GET_SMSTATE(u32, smstate, 0x7f54);
+ dt.address = smstate->idtr.base;
+ dt.size = smstate->idtr.limit;
ctxt->ops->set_idt(ctxt, &dt);

- for (i = 0; i < 6; i++) {
- int r = rsm_load_seg_32(ctxt, smstate, i);
- if (r != X86EMUL_CONTINUE)
- return r;
- }
+ rsm_load_seg_32(ctxt, &smstate->es, smstate->es_sel, VCPU_SREG_ES);
+ rsm_load_seg_32(ctxt, &smstate->cs, smstate->cs_sel, VCPU_SREG_CS);
+ rsm_load_seg_32(ctxt, &smstate->ss, smstate->ss_sel, VCPU_SREG_SS);

- cr4 = GET_SMSTATE(u32, smstate, 0x7f14);
+ rsm_load_seg_32(ctxt, &smstate->ds, smstate->ds_sel, VCPU_SREG_DS);
+ rsm_load_seg_32(ctxt, &smstate->fs, smstate->fs_sel, VCPU_SREG_FS);
+ rsm_load_seg_32(ctxt, &smstate->gs, smstate->gs_sel, VCPU_SREG_GS);

- ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smstate, 0x7ef8));
+ ctxt->ops->set_smbase(ctxt, smstate->smbase);

- return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4);
+ return rsm_enter_protected_mode(ctxt, smstate->cr0,
+ smstate->cr3, smstate->cr4);
}

#ifdef CONFIG_X86_64
@@ -2657,7 +2632,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
ret = rsm_load_state_64(ctxt, buf);
else
#endif
- ret = rsm_load_state_32(ctxt, buf);
+ ret = rsm_load_state_32(ctxt, (struct kvm_smram_state_32 *)buf);

if (ret != X86EMUL_CONTINUE)
goto emulate_shutdown;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 00e23dc518e091..a1bbf2ed520769 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9674,22 +9674,18 @@ static u32 enter_smm_get_segment_flags(struct kvm_segment *seg)
return flags;
}

-static void enter_smm_save_seg_32(struct kvm_vcpu *vcpu, char *buf, int n)
+static void enter_smm_save_seg_32(struct kvm_vcpu *vcpu,
+ struct kvm_smm_seg_state_32 *state,
+ u32 *selector,
+ int n)
{
struct kvm_segment seg;
- int offset;

kvm_get_segment(vcpu, &seg, n);
- put_smstate(u32, buf, 0x7fa8 + n * 4, seg.selector);
-
- if (n < 3)
- offset = 0x7f84 + n * 12;
- else
- offset = 0x7f2c + (n - 3) * 12;
-
- put_smstate(u32, buf, offset + 8, seg.base);
- put_smstate(u32, buf, offset + 4, seg.limit);
- put_smstate(u32, buf, offset, enter_smm_get_segment_flags(&seg));
+ *selector = seg.selector;
+ state->base = seg.base;
+ state->limit = seg.limit;
+ state->flags = enter_smm_get_segment_flags(&seg);
}

#ifdef CONFIG_X86_64
@@ -9710,54 +9706,47 @@ static void enter_smm_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
}
#endif

-static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, char *buf)
+static void enter_smm_save_state_32(struct kvm_vcpu *vcpu, struct kvm_smram_state_32 *smram)
{
struct desc_ptr dt;
- struct kvm_segment seg;
unsigned long val;
int i;

- put_smstate(u32, buf, 0x7ffc, kvm_read_cr0(vcpu));
- put_smstate(u32, buf, 0x7ff8, kvm_read_cr3(vcpu));
- put_smstate(u32, buf, 0x7ff4, kvm_get_rflags(vcpu));
- put_smstate(u32, buf, 0x7ff0, kvm_rip_read(vcpu));
+ smram->cr0 = kvm_read_cr0(vcpu);
+ smram->cr3 = kvm_read_cr3(vcpu);
+ smram->eflags = kvm_get_rflags(vcpu);
+ smram->eip = kvm_rip_read(vcpu);

for (i = 0; i < 8; i++)
- put_smstate(u32, buf, 0x7fd0 + i * 4, kvm_register_read_raw(vcpu, i));
+ smram->gprs[i] = kvm_register_read_raw(vcpu, i);

kvm_get_dr(vcpu, 6, &val);
- put_smstate(u32, buf, 0x7fcc, (u32)val);
+ smram->dr6 = (u32)val;
kvm_get_dr(vcpu, 7, &val);
- put_smstate(u32, buf, 0x7fc8, (u32)val);
+ smram->dr7 = (u32)val;

- kvm_get_segment(vcpu, &seg, VCPU_SREG_TR);
- put_smstate(u32, buf, 0x7fc4, seg.selector);
- put_smstate(u32, buf, 0x7f64, seg.base);
- put_smstate(u32, buf, 0x7f60, seg.limit);
- put_smstate(u32, buf, 0x7f5c, enter_smm_get_segment_flags(&seg));
-
- kvm_get_segment(vcpu, &seg, VCPU_SREG_LDTR);
- put_smstate(u32, buf, 0x7fc0, seg.selector);
- put_smstate(u32, buf, 0x7f80, seg.base);
- put_smstate(u32, buf, 0x7f7c, seg.limit);
- put_smstate(u32, buf, 0x7f78, enter_smm_get_segment_flags(&seg));
+ enter_smm_save_seg_32(vcpu, &smram->tr, &smram->tr_sel, VCPU_SREG_TR);
+ enter_smm_save_seg_32(vcpu, &smram->ldtr, &smram->ldtr_sel, VCPU_SREG_LDTR);

static_call(kvm_x86_get_gdt)(vcpu, &dt);
- put_smstate(u32, buf, 0x7f74, dt.address);
- put_smstate(u32, buf, 0x7f70, dt.size);
+ smram->gdtr.base = dt.address;
+ smram->gdtr.limit = dt.size;

static_call(kvm_x86_get_idt)(vcpu, &dt);
- put_smstate(u32, buf, 0x7f58, dt.address);
- put_smstate(u32, buf, 0x7f54, dt.size);
+ smram->idtr.base = dt.address;
+ smram->idtr.limit = dt.size;

- for (i = 0; i < 6; i++)
- enter_smm_save_seg_32(vcpu, buf, i);
+ enter_smm_save_seg_32(vcpu, &smram->es, &smram->es_sel, VCPU_SREG_ES);
+ enter_smm_save_seg_32(vcpu, &smram->cs, &smram->cs_sel, VCPU_SREG_CS);
+ enter_smm_save_seg_32(vcpu, &smram->ss, &smram->ss_sel, VCPU_SREG_SS);

- put_smstate(u32, buf, 0x7f14, kvm_read_cr4(vcpu));
+ enter_smm_save_seg_32(vcpu, &smram->ds, &smram->ds_sel, VCPU_SREG_DS);
+ enter_smm_save_seg_32(vcpu, &smram->fs, &smram->fs_sel, VCPU_SREG_FS);
+ enter_smm_save_seg_32(vcpu, &smram->gs, &smram->gs_sel, VCPU_SREG_GS);

- /* revision id */
- put_smstate(u32, buf, 0x7efc, 0x00020000);
- put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
+ smram->cr4 = kvm_read_cr4(vcpu);
+ smram->smm_revision = 0x00020000;
+ smram->smbase = vcpu->arch.smbase;
}

#ifdef CONFIG_X86_64
@@ -9828,7 +9817,7 @@ static void enter_smm(struct kvm_vcpu *vcpu)
enter_smm_save_state_64(vcpu, buf);
else
#endif
- enter_smm_save_state_32(vcpu, buf);
+ enter_smm_save_state_32(vcpu, (struct kvm_smram_state_32 *)buf);

/*
* Give enter_smm() a chance to make ISA-specific changes to the vCPU
--
2.26.3

2022-06-29 07:30:42

by Maxim Levitsky

Subject: Re: [PATCH v2 00/11] SMM emulation and interrupt shadow fixes

On Tue, 2022-06-21 at 18:08 +0300, Maxim Levitsky wrote:
> This patch series is a result of long debug work to find out why
> sometimes guests with win11 secure boot
> were failing during boot.
>
> During writing a unit test I found another bug, turns out
> that on rsm emulation, if the rsm instruction was done in real
> or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
>
> I also refactored the way we write SMRAM so it is easier
> now to understand what is going on.
>
> The main bug in this series which I fixed is that we
> allowed #SMI to happen during the STI interrupt shadow,
> and we did nothing to both reset it on #SMI handler
> entry and restore it on RSM.
>
> Best regards,
> Maxim Levitsky
>
> Maxim Levitsky (11):
> KVM: x86: emulator: em_sysexit should update ctxt->mode
> KVM: x86: emulator: introduce update_emulation_mode
> KVM: x86: emulator: remove assign_eip_near/far
> KVM: x86: emulator: update the emulation mode after rsm
> KVM: x86: emulator: update the emulation mode after CR0 write
> KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
> the image format
> KVM: x86: emulator/smm: add structs for KVM's smram layout
> KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
> KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
> KVM: x86: SVM: use smram structs
> KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
>
> arch/x86/include/asm/kvm_host.h | 6 -
> arch/x86/kvm/emulate.c | 305 ++++++++++++++++----------------
> arch/x86/kvm/kvm_emulate.h | 146 +++++++++++++++
> arch/x86/kvm/svm/svm.c | 28 +--
> arch/x86/kvm/x86.c | 162 ++++++++---------
> 5 files changed, 394 insertions(+), 253 deletions(-)
>
> --
> 2.26.3
>
>
A very gentle ping on the patch series.

Best regards,
Maxim Levitsky

2022-06-29 16:55:11

by Jim Mattson

Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <[email protected]> wrote:
>
> When #SMI is asserted, the CPU can be in interrupt shadow
> due to sti or mov ss.
>
> It is not mandatory in Intel/AMD prm to have the #SMI
> blocked during the shadow, and on top of
> that, since neither SVM nor VMX has true support for SMI
> window, waiting for one instruction would mean single stepping
> the guest.
>
> Instead, allow #SMI in this case, but both reset the interrupt
> window and stash its value in SMRAM to restore it on exit
> from SMM.
>
> This fixes rare failures seen mostly on windows guests on VMX,
> when #SMI falls on the sti instruction which mainfest in
> VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> window still being set in the VMCS.

I think you're just making stuff up! See Note #5 at
https://sandpile.org/x86/inter.htm.

Can you reference the vendors' documentation that supports this change?

2022-06-30 06:34:19

by Maxim Levitsky

Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <[email protected]> wrote:
> > When #SMI is asserted, the CPU can be in interrupt shadow
> > due to sti or mov ss.
> >
> > It is not mandatory in Intel/AMD prm to have the #SMI
> > blocked during the shadow, and on top of
> > that, since neither SVM nor VMX has true support for SMI
> > window, waiting for one instruction would mean single stepping
> > the guest.
> >
> > Instead, allow #SMI in this case, but both reset the interrupt
> > window and stash its value in SMRAM to restore it on exit
> > from SMM.
> >
> > This fixes rare failures seen mostly on windows guests on VMX,
> > when #SMI falls on the sti instruction which mainfest in
> > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > window still being set in the VMCS.
>
> I think you're just making stuff up! See Note #5 at
> https://sandpile.org/x86/inter.htm.
>
> Can you reference the vendors' documentation that supports this change?
>

First of all, just to note that the actual issue here was that
we don't clear the shadow bits in the guest interruptibility field
in the VMCS on SMM entry, which triggered a consistency check because
we do clear EFLAGS.IF.
Preserving the interrupt shadow is just nice to have.
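
For reference, the consistency check in question is the VMX guest-state check that the "blocking by STI" interruptibility bit must be 0 whenever RFLAGS.IF is 0; roughly, in pseudo-C:

    if (!(guest_rflags & X86_EFLAGS_IF) &&
        (guest_interruptibility & GUEST_INTR_STATE_STI))
        ; /* VM entry fails with invalid guest state */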


That's what Intel's spec says for 'STI':

"The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable
interrupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
following an execution of STI that begins with IF = 0."

Thus it is likely that #SMIs are simply blocked while in the shadow, but it is easier to implement
it this way (it avoids single-stepping the guest) and there is no user-visible difference.
As I noted in the patch description, there are two ways to solve this,
and preserving the interrupt shadow in SMRAM is just the simpler one.


As for CPUs that neither block #SMI nor preserve the interrupt shadow: in theory they can exist, but that would
break things, as noted in this mail

https://lore.kernel.org/lkml/[email protected]/

It is possible, though, that a real CPU supports the HLT restart flag, which makes this a non-issue.
Still, I can't rule out that a real CPU doesn't preserve the interrupt shadow on #SMI, but
I don't see why we can't do this to make things more robust.

Best regards,
Maxim Levitsky

2022-06-30 16:08:41

by Jim Mattson

Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky <[email protected]> wrote:
>
> On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <[email protected]> wrote:
> > > When #SMI is asserted, the CPU can be in interrupt shadow
> > > due to sti or mov ss.
> > >
> > > It is not mandatory in Intel/AMD prm to have the #SMI
> > > blocked during the shadow, and on top of
> > > that, since neither SVM nor VMX has true support for SMI
> > > window, waiting for one instruction would mean single stepping
> > > the guest.
> > >
> > > Instead, allow #SMI in this case, but both reset the interrupt
> > > window and stash its value in SMRAM to restore it on exit
> > > from SMM.
> > >
> > > This fixes rare failures seen mostly on windows guests on VMX,
> > > when #SMI falls on the sti instruction which mainfest in
> > > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > > window still being set in the VMCS.
> >
> > I think you're just making stuff up! See Note #5 at
> > https://sandpile.org/x86/inter.htm.
> >
> > Can you reference the vendors' documentation that supports this change?
> >
>
> First of all, just to note that the actual issue here was that
> we don't clear the shadow bits in the guest interruptability field
> in the vmcb on SMM entry, that triggered a consistency check because
> we do clear EFLAGS.IF.
> Preserving the interrupt shadow is just nice to have.
>
>
> That what Intel's spec says for the 'STI':
>
> "The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable inter-
> rupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
> following an execution of STI that begins with IF = 0."
>
> Thus it is likely that #SMI are just blocked when in shadow, but it is easier to implement
> it this way (avoids single stepping the guest) and without any user visable difference,
> which I noted in the patch description, I noted that there are two ways to solve this,
> and preserving the int shadow in SMRAM is just more simple way.

It's not true that there is no user-visible difference. In your
implementation, the SMI handler can see that the interrupt was
delivered in the interrupt shadow.

The right fix for this problem is to block SMI in an interrupt shadow,
as is likely the case for all modern CPUs.

>
> As for CPUS that neither block SMI nor preserve the int shadaw, in theory they can, but that would
> break things, as noted in this mail
>
> https://lore.kernel.org/lkml/[email protected]/
>
> It is possible though that real cpu supports HLT restart flag, which makes this a non issue,
> still. I can't rule out that a real cpu doesn't preserve the interrupt shadow on SMI, but
> I don't see why we can't do this to make things more robust.

Because, as I said, I think you're just making stuff up...unless, of
course, you have documentation to back this up.

2022-07-05 14:40:27

by Maxim Levitsky

Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Tue, 2022-07-05 at 16:38 +0300, Maxim Levitsky wrote:
> On Thu, 2022-06-30 at 09:00 -0700, Jim Mattson wrote:
> > On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky <[email protected]> wrote:
> > >
> > > On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > > > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <[email protected]> wrote:
> > > > > When #SMI is asserted, the CPU can be in interrupt shadow
> > > > > due to sti or mov ss.
> > > > >
> > > > > It is not mandatory in  Intel/AMD prm to have the #SMI
> > > > > blocked during the shadow, and on top of
> > > > > that, since neither SVM nor VMX has true support for SMI
> > > > > window, waiting for one instruction would mean single stepping
> > > > > the guest.
> > > > >
> > > > > Instead, allow #SMI in this case, but both reset the interrupt
> > > > > window and stash its value in SMRAM to restore it on exit
> > > > > from SMM.
> > > > >
> > > > > This fixes rare failures seen mostly on windows guests on VMX,
> > > > > when #SMI falls on the sti instruction which mainfest in
> > > > > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > > > > window still being set in the VMCS.
> > > >
> > > > I think you're just making stuff up! See Note #5 at
> > > > https://sandpile.org/x86/inter.htm.
> > > >
> > > > Can you reference the vendors' documentation that supports this change?
> > > >
> > >
> > > First of all, just to note that the actual issue here was that
> > > we don't clear the shadow bits in the guest interruptability field
> > > in the vmcb on SMM entry, that triggered a consistency check because
> > > we do clear EFLAGS.IF.
> > > Preserving the interrupt shadow is just nice to have.
> > >
> > >
> > > That what Intel's spec says for the 'STI':
> > >
> > > "The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable inter-
> > > rupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
> > > following an execution of STI that begins with IF = 0."
> > >
> > > Thus it is likely that #SMI are just blocked when in shadow, but it is easier to implement
> > > it this way (avoids single stepping the guest) and without any user visable difference,
> > > which I noted in the patch description, I noted that there are two ways to solve this,
> > > and preserving the int shadow in SMRAM is just more simple way.
> >
> > It's not true that there is no user-visible difference. In your
> > implementation, the SMI handler can see that the interrupt was
> > delivered in the interrupt shadow.
>
> Most of the SMI save state area is reserved, and the handler has no way of knowing
> what CPU stored there, it can only access the fields that are reserved in the spec.
I mean fields that are not reserved in the spec.

Best regards,
Maxim Levitsky
>
> Yes, if the SMI handler really insists it can see that the saved RIP points to an
> instruction that follows the STI, but does that really matter? It is allowed by the
> spec explicitly anyway.
>
> Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> we as I found out flat out write over the fields that have other meaning in the X86 spec.
>
> Also I proposed to preserve the int shadow in internal kvm state and migrate
> it in upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
> Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
> and you didn't object to this, and now after I refactored and implemented
> the whole thing you suddently do.
>
> BTW, just FYI, I found out that qemu doesn't migrate the 'shadow' field,
> this needs to be fixed (not related to the issue, just FYI).
>
> >
> > The right fix for this problem is to block SMI in an interrupt shadow,
> > as is likely the case for all modern CPUs.
>
> Yes, I agree that this is the most correct fix. 
>
> However AMD just recently posted a VNMI patch series to avoid
> single stepping the CPU when NMI is blocked due to the same reason, because
> it is fragile.
>
> Do you really want KVM to single step the guest in this case, to deliver the #SMI?
> I can do it, but it is bound to cause lot of trouble.
>
> Note that I will have to do it on both Intel and AMD, as neither has support for SMI
> window, unless I were to use MTF, which is broken on nested virt as you know,
> so a nested hypervisor running a guest with SMI will now have to cope with broken MTF.
>
> Note that I can't use the VIRQ hack we use for interrupt window, because there
> is no guarantee that the guest's EFLAGS.IF is on.
>
> Best regards,   
>         Maxim Levitsky
>
> >
> > >
> > > As for CPUS that neither block SMI nor preserve the int shadaw, in theory they can, but that would
> > > break things, as noted in this mail
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > It is possible though that real cpu supports HLT restart flag, which makes this a non issue,
> > > still. I can't rule out that a real cpu doesn't preserve the interrupt shadow on SMI, but
> > > I don't see why we can't do this to make things more robust.
> >
> > Because, as I said, I think you're just making stuff up...unless, of
> > course, you have documentation to back this up.
> >
>


2022-07-05 14:41:12

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Thu, 2022-06-30 at 09:00 -0700, Jim Mattson wrote:
> On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky <[email protected]> wrote:
> >
> > On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <[email protected]> wrote:
> > > > When #SMI is asserted, the CPU can be in interrupt shadow
> > > > due to sti or mov ss.
> > > >
> > > > It is not mandatory in  Intel/AMD prm to have the #SMI
> > > > blocked during the shadow, and on top of
> > > > that, since neither SVM nor VMX has true support for SMI
> > > > window, waiting for one instruction would mean single stepping
> > > > the guest.
> > > >
> > > > Instead, allow #SMI in this case, but both reset the interrupt
> > > > window and stash its value in SMRAM to restore it on exit
> > > > from SMM.
> > > >
> > > > This fixes rare failures seen mostly on windows guests on VMX,
> > > > when #SMI falls on the sti instruction which mainfest in
> > > > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > > > window still being set in the VMCS.
> > >
> > > I think you're just making stuff up! See Note #5 at
> > > https://sandpile.org/x86/inter.htm.
> > >
> > > Can you reference the vendors' documentation that supports this change?
> > >
> >
> > First of all, just to note that the actual issue here was that
> > we don't clear the shadow bits in the guest interruptability field
> > in the vmcb on SMM entry, that triggered a consistency check because
> > we do clear EFLAGS.IF.
> > Preserving the interrupt shadow is just nice to have.
> >
> >
> > That what Intel's spec says for the 'STI':
> >
> > "The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable inter-
> > rupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
> > following an execution of STI that begins with IF = 0."
> >
> > Thus it is likely that #SMI are just blocked when in shadow, but it is easier to implement
> > it this way (avoids single stepping the guest) and without any user visable difference,
> > which I noted in the patch description, I noted that there are two ways to solve this,
> > and preserving the int shadow in SMRAM is just more simple way.
>
> It's not true that there is no user-visible difference. In your
> implementation, the SMI handler can see that the interrupt was
> delivered in the interrupt shadow.

Most of the SMI save state area is reserved, and the handler has no way of knowing
what the CPU stored there; it can only access the fields that are reserved in the spec.

Yes, if the SMI handler really insists it can see that the saved RIP points to an
instruction that follows the STI, but does that really matter? It is allowed by the
spec explicitly anyway.

Plus our SMI layout (at least for 32 bit) doesn't conform to the X86 spec anyway;
as I found out, we flat out write over fields that have another meaning in the X86 spec.

Also, I proposed to preserve the int shadow in internal KVM state and migrate
it in the upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
and you didn't object to this, and now after I refactored and implemented
the whole thing you suddenly do.

BTW, just FYI, I found out that qemu doesn't migrate the 'shadow' field;
this needs to be fixed (not related to this issue).

>
> The right fix for this problem is to block SMI in an interrupt shadow,
> as is likely the case for all modern CPUs.

Yes, I agree that this is the most correct fix. 

However, AMD just recently posted a vNMI patch series to avoid
single stepping the CPU when NMIs are blocked, for the same reason:
single stepping is fragile.

Do you really want KVM to single step the guest in this case, to deliver the #SMI?
I can do it, but it is bound to cause a lot of trouble.

Note that I would have to do it on both Intel and AMD, as neither has support for an SMI
window, unless I were to use MTF, which as you know is broken on nested virt,
so a nested hypervisor running a guest with SMI would then have to cope with broken MTF.

Note that I can't use the VIRQ hack we use for interrupt window, because there
is no guarantee that the guest's EFLAGS.IF is on.

Best regards,
Maxim Levitsky

>
> >
> > As for CPUS that neither block SMI nor preserve the int shadaw, in theory they can, but that would
> > break things, as noted in this mail
> >
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > It is possible though that real cpu supports HLT restart flag, which makes this a non issue,
> > still. I can't rule out that a real cpu doesn't preserve the interrupt shadow on SMI, but
> > I don't see why we can't do this to make things more robust.
>
> Because, as I said, I think you're just making stuff up...unless, of
> course, you have documentation to back this up.
>


2022-07-05 15:00:31

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Tue, 2022-07-05 at 16:40 +0300, Maxim Levitsky wrote:
> On Tue, 2022-07-05 at 16:38 +0300, Maxim Levitsky wrote:
> > On Thu, 2022-06-30 at 09:00 -0700, Jim Mattson wrote:
> > > On Wed, Jun 29, 2022 at 11:00 PM Maxim Levitsky <[email protected]> wrote:
> > > >
> > > > On Wed, 2022-06-29 at 09:31 -0700, Jim Mattson wrote:
> > > > > On Tue, Jun 21, 2022 at 8:09 AM Maxim Levitsky <[email protected]> wrote:
> > > > > > When #SMI is asserted, the CPU can be in interrupt shadow
> > > > > > due to sti or mov ss.
> > > > > >
> > > > > > It is not mandatory in  Intel/AMD prm to have the #SMI
> > > > > > blocked during the shadow, and on top of
> > > > > > that, since neither SVM nor VMX has true support for SMI
> > > > > > window, waiting for one instruction would mean single stepping
> > > > > > the guest.
> > > > > >
> > > > > > Instead, allow #SMI in this case, but both reset the interrupt
> > > > > > window and stash its value in SMRAM to restore it on exit
> > > > > > from SMM.
> > > > > >
> > > > > > This fixes rare failures seen mostly on windows guests on VMX,
> > > > > > when #SMI falls on the sti instruction which mainfest in
> > > > > > VM entry failure due to EFLAGS.IF not being set, but STI interrupt
> > > > > > window still being set in the VMCS.
> > > > >
> > > > > I think you're just making stuff up! See Note #5 at
> > > > > https://sandpile.org/x86/inter.htm.
> > > > >
> > > > > Can you reference the vendors' documentation that supports this change?
> > > > >
> > > >
> > > > First of all, just to note that the actual issue here was that
> > > > we don't clear the shadow bits in the guest interruptability field
> > > > in the vmcb on SMM entry, that triggered a consistency check because
> > > > we do clear EFLAGS.IF.
> > > > Preserving the interrupt shadow is just nice to have.
> > > >
> > > >
> > > > That what Intel's spec says for the 'STI':
> > > >
> > > > "The IF flag and the STI and CLI instructions do not prohibit the generation of exceptions and nonmaskable inter-
> > > > rupts (NMIs). However, NMIs (and system-management interrupts) may be inhibited on the instruction boundary
> > > > following an execution of STI that begins with IF = 0."
> > > >
> > > > Thus it is likely that #SMI are just blocked when in shadow, but it is easier to implement
> > > > it this way (avoids single stepping the guest) and without any user visable difference,
> > > > which I noted in the patch description, I noted that there are two ways to solve this,
> > > > and preserving the int shadow in SMRAM is just more simple way.
> > >
> > > It's not true that there is no user-visible difference. In your
> > > implementation, the SMI handler can see that the interrupt was
> > > delivered in the interrupt shadow.
> >
> > Most of the SMI save state area is reserved, and the handler has no way of knowing
> > what CPU stored there, it can only access the fields that are reserved in the spec.
> I mean fields that are not reserved in the spec.
>
> Best regards,
>         Maxim Levitsky
> >
> > Yes, if the SMI handler really insists it can see that the saved RIP points to an
> > instruction that follows the STI, but does that really matter? It is allowed by the
> > spec explicitly anyway.
> >
> > Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> > we as I found out flat out write over the fields that have other meaning in the X86 spec.
> >
> > Also I proposed to preserve the int shadow in internal kvm state and migrate
> > it in upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
> > Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
> > and you didn't object to this, and now after I refactored and implemented
> > the whole thing you suddently do.
> >
> > BTW, just FYI, I found out that qemu doesn't migrate the 'shadow' field,
> > this needs to be fixed (not related to the issue, just FYI).
> >
> > >
> > > The right fix for this problem is to block SMI in an interrupt shadow,
> > > as is likely the case for all modern CPUs.
> >
> > Yes, I agree that this is the most correct fix. 
> >
> > However AMD just recently posted a VNMI patch series to avoid
> > single stepping the CPU when NMI is blocked due to the same reason, because
> > it is fragile.
> >
> > Do you really want KVM to single step the guest in this case, to deliver the #SMI?
> > I can do it, but it is bound to cause lot of trouble.
> >
> > Note that I will have to do it on both Intel and AMD, as neither has support for SMI
> > window, unless I were to use MTF, which is broken on nested virt as you know,
> > so a nested hypervisor running a guest with SMI will now have to cope with broken MTF.
> >
> > Note that I can't use the VIRQ hack we use for interrupt window, because there
> > is no guarantee that the guest's EFLAGS.IF is on.
> >
> > Best regards,   
> >         Maxim Levitsky
> >
> > >
> > > >
> > > > As for CPUS that neither block SMI nor preserve the int shadaw, in theory they can, but that would
> > > > break things, as noted in this mail
> > > >
> > > > https://lore.kernel.org/lkml/[email protected]/
> > > >
> > > > It is possible though that real cpu supports HLT restart flag, which makes this a non issue,
> > > > still. I can't rule out that a real cpu doesn't preserve the interrupt shadow on SMI, but
> > > > I don't see why we can't do this to make things more robust.
> > >
> > > Because, as I said, I think you're just making stuff up...unless, of
> > > course, you have documentation to back this up.

Again, I clearly explained that I chose to implement it this way because it
is more robust, _and_ it was approved by both Sean and Paolo.

It is not 'making stuff up' - I never claimed that a real
CPU does it this way.

Best regards,
Maxim Levitsky

> > >
> >
>
>


2022-07-06 18:22:55

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Tue, Jul 5, 2022 at 6:38 AM Maxim Levitsky <[email protected]> wrote:

> Most of the SMI save state area is reserved, and the handler has no way of knowing
> what CPU stored there, it can only access the fields that are reserved in the spec.
>
> Yes, if the SMI handler really insists it can see that the saved RIP points to an
> instruction that follows the STI, but does that really matter? It is allowed by the
> spec explicitly anyway.

I was just pointing out that the difference between blocking SMI and
not blocking SMI is, in fact, observable.

> Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> we as I found out flat out write over the fields that have other meaning in the X86 spec.

Shouldn't we fix that?

> Also I proposed to preserve the int shadow in internal kvm state and migrate
> it in upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
> Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
> and you didn't object to this, and now after I refactored and implemented
> the whole thing you suddently do.

I did not see the prior conversations. I rarely get an opportunity to
read the list.

> However AMD just recently posted a VNMI patch series to avoid
> single stepping the CPU when NMI is blocked due to the same reason, because
> it is fragile.

The vNMI feature isn't available in any shipping processor yet, is it?

> Do you really want KVM to single step the guest in this case, to deliver the #SMI?
> I can do it, but it is bound to cause lot of trouble.

Perhaps you could document this as a KVM erratum...one of many
involving virtual SMI delivery.

2022-07-06 20:05:19

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Wed, 2022-07-06 at 11:13 -0700, Jim Mattson wrote:
> On Tue, Jul 5, 2022 at 6:38 AM Maxim Levitsky <[email protected]> wrote:
>
> > Most of the SMI save state area is reserved, and the handler has no way of knowing
> > what CPU stored there, it can only access the fields that are reserved in the spec.
> >
> > Yes, if the SMI handler really insists it can see that the saved RIP points to an
> > instruction that follows the STI, but does that really matter? It is allowed by the
> > spec explicitly anyway.
>
> I was just pointing out that the difference between blocking SMI and
> not blocking SMI is, in fact, observable.

Yes, and I agree; I should have said that while it is observable,
it should cause no problems.


>
> > Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> > we as I found out flat out write over the fields that have other meaning in the X86 spec.
>
> Shouldn't we fix that?
I am afraid we can't, because that would break (in theory) backward compatibility
(e.g. if someone migrates a VM while in SMM).

Plus, this is only for the 32 bit layout, which is only used when the guest has no long
mode in CPUID; these days that only happens with 32 bit qemu.

(I found this out the hard way when I noticed that SMM with a nested guest doesn't work
for me on 32 bit, and it was because KVM doesn't bother to save/restore the
running nested guest's vmcb address when we use the 32 bit SMM layout, which makes
sense because truly 32-bit-only AMD CPUs likely didn't have SVM.)

But then, after looking at the SDM, I also found out that Intel and AMD have completely
different SMM layouts for 64 bit. We follow AMD's layout, but we don't
implement many fields, including some that are barely documented or not documented at all
(e.g. what is svm_guest_virtual_int?).

In theory we could use Intel's layout when we run with Intel's vendor ID
and AMD's vice versa, but we probably won't bother, plus once again there
is the issue of backward compatibility.

Feel free to look at the patch series; I fully documented the SMRAM layout
that KVM uses, including all the places where it differs from the real
thing.


>
> > Also I proposed to preserve the int shadow in internal kvm state and migrate
> > it in upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
> > Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
> > and you didn't object to this, and now after I refactored and implemented
> > the whole thing you suddently do.
>
> I did not see the prior conversations. I rarely get an opportunity to
> read the list.
I understand.

>
> > However AMD just recently posted a VNMI patch series to avoid
> > single stepping the CPU when NMI is blocked due to the same reason, because
> > it is fragile.
>
> The vNMI feature isn't available in any shipping processor yet, is it?
Yes, but one of its purposes is to avoid single stepping the guest,
which is especially painful on AMD because there is no MTF, so
you have to 'borrow' the TF flag in EFLAGS, and that can leak into
the guest state (e.g. be pushed onto the stack).


>
> > Do you really want KVM to single step the guest in this case, to deliver the #SMI?
> > I can do it, but it is bound to cause lot of trouble.
>
> Perhaps you could document this as a KVM erratum...one of many
> involving virtual SMI delivery.

Absolutely, I can document that we choose to save/restore the int shadow in
SMRAM, something that CPUs usually don't really do, but happens to be the best way
to deal with this corner case.

(Actually, looking at the clause on the default treatment of SMIs in Intel's PRM,
they do mention that the int shadow is preserved somewhere, at least
on some Intel CPUs.)


BTW, according to my observations it is really hard to hit this problem,
because it looks like when the CPU is in an interrupt shadow, it doesn't process
_real_ interrupts either (despite the fact that in a VM, real interrupts
should never be blocked(*)); yet that is what I observed on both AMD and Intel.

(*) You can allow the guest to control the real EFLAGS.IF on both VMX and SVM
(in which case the int shadow should indeed work as on bare metal),
but KVM of course doesn't do it.

I observed that when KVM sends a #SMI from another vCPU, it sends a vCPU kick,
and the kick never arrives inside the interrupt shadow.
I have seen this on both VMX and SVM.

What still triggers this problem is that the instruction which is in the interrupt
shadow can still get a VM exit (e.g. an EPT/NPT violation), and then KVM can notice
the pending SMI.

I think it has to be an EPT/NPT violation, btw, because most if not all other VM exits
are instruction intercepts, which cause KVM to emulate the instruction
and clear the interrupt shadow, and only after that will it enter SMM.

Even MMIO/IOPORT access is emulated by KVM.

It's not the case with an EPT/NPT violation, because KVM will in that case re-execute
the instruction after it 'fixes' the fault.

Best regards,
Maxim Levitsky



>


2022-07-06 20:54:35

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Wed, Jul 6, 2022 at 1:00 PM Maxim Levitsky <[email protected]> wrote:
>
> On Wed, 2022-07-06 at 11:13 -0700, Jim Mattson wrote:
> > On Tue, Jul 5, 2022 at 6:38 AM Maxim Levitsky <[email protected]> wrote:
...
> > > Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> > > we as I found out flat out write over the fields that have other meaning in the X86 spec.
> >
> > Shouldn't we fix that?
> I am afraid we can't because that will break (in theory) the backward compatibility
> (e.g if someone migrates a VM while in SMM).

Every time someone says, "We can't fix this, because it breaks
backward compatibility," I think, "Another potential use of
KVM_CAP_DISABLE_QUIRKS2?"

...
> But then after looking at SDM I also found out that Intel and AMD have completely
> different SMM layout for 64 bit. We follow the AMD's layout, but we don't
> implement many fields, including some that are barely/not documented.
> (e.g what is svm_guest_virtual_int?)
>
> In theory we could use Intel's layout when we run with Intel's vendor ID,
> and AMD's vise versa, but we probably won't bother + once again there
> is an issue of backward compatibility.

This seems pretty egregious, since the SDM specifically states, "Some
of the registers in the SMRAM state save area (marked YES in column 3)
may be read and changed by the
SMI handler, with the changed values restored to the processor
registers by the RSM instruction." How can that possibly work with
AMD's layout?
(See my comment above regarding backwards compatibility.)

<soapbox>I wish KVM would stop offering virtual CPU features that are
completely broken.</soapbox>

> > The vNMI feature isn't available in any shipping processor yet, is it?
> Yes, but one of its purposes is to avoid single stepping the guest,
> which is especially painful on AMD, because there is no MTF, so
> you have to 'borrow' the TF flag in the EFLAGS, and that can leak into
> the guest state (e.g pushed onto the stack).

So, what's the solution for all of today's SVM-capable processors? KVM
will probably be supporting AMD CPUs without vNMI for the next decade
or two.


> (Actually looking at clause of default treatment of SMIs in Intel's PRM,
> they do mention that they preserve the int shadow somewhere at least
> on some Intel's CPUs).

Yes, this is a required part of VMX-critical state for processors that
support SMI recognition while there is blocking by STI or by MOV SS.
However, I don't believe that KVM actually saves VMX-critical state on
delivery of a virtual SMI.

>
> BTW, according to my observations, it is really hard to hit this problem,
> because it looks like when the CPU is in interrupt shadow, it doesn't process
> _real_ interrupts as well (despite the fact that in VM, real interrupts
> should never be blocked(*), but yet, that is what I observed on both AMD and Intel.
>
> (*) You can allow the guest to control the real EFLAGS.IF on both VMX and SVM,
> (in which case int shadow should indeed work as on bare metal)
> but KVM of course doesn't do it.

It doesn't surprise me that hardware treats a virtual interrupt shadow
as a physical interrupt shadow. IIRC, each vendor has a way of
breaking an endless chain of interrupt shadows, so a malicious guest
can't defer interrupts indefinitely.

> I observed that when KVM sends #SMI from other vCPU, it sends a vCPU kick,
> and the kick never arrives inside the interrupt shadow.
> I have seen it on both VMX and SVM.
>
> What still triggers this problem, is that the instruction which is in the interrupt
> shadow can still get a VM exit, (e.g EPT/NPT violation) and then it can notice
> the pending SMI.
>
> I think it has to be EPT/NPT violation btw, because, IMHO most if not all other VM exits I
> think are instruction intercepts, which will cause KVM to emulate the instruction
> and clear the interrupt shadow, and only after that it will enter SMM.
>
> Even MMIO/IOPORT access is emulated by the KVM.
>
> Its not the case with EPT/NPT violation, because the KVM will in this case re-execute
> the instruction after it 'fixes' the fault.

Probably #PF as well, then, if TDP is disabled.

2022-07-10 16:42:33

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

On Wed, 2022-07-06 at 13:38 -0700, Jim Mattson wrote:
> On Wed, Jul 6, 2022 at 1:00 PM Maxim Levitsky <[email protected]> wrote:
> > On Wed, 2022-07-06 at 11:13 -0700, Jim Mattson wrote:
> > > On Tue, Jul 5, 2022 at 6:38 AM Maxim Levitsky <[email protected]> wrote:
> ...
> > > > Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> > > > we as I found out flat out write over the fields that have other meaning in the X86 spec.
> > >
> > > Shouldn't we fix that?
> > I am afraid we can't because that will break (in theory) the backward compatibility
> > (e.g if someone migrates a VM while in SMM).
>
> Every time someone says, "We can't fix this, because it breaks
> backward compatibility," I think, "Another potential use of
> KVM_CAP_DISABLE_QUIRKS2?"
>
> ...
> > But then after looking at SDM I also found out that Intel and AMD have completely
> > different SMM layout for 64 bit. We follow the AMD's layout, but we don't
> > implement many fields, including some that are barely/not documented.
> > (e.g what is svm_guest_virtual_int?)
> >
> > In theory we could use Intel's layout when we run with Intel's vendor ID,
> > and AMD's vise versa, but we probably won't bother + once again there
> > is an issue of backward compatibility.
>
> This seems pretty egregious, since the SDM specifically states, "Some
> of the registers in the SMRAM state save area (marked YES in column 3)
> may be read and changed by the
> SMI handler, with the changed values restored to the processor
> registers by the RSM instruction." How can that possibly work with
> AMD's layout?
> (See my comment above regarding backwards compatibility.)
>
> <soapbox>I wish KVM would stop offering virtual CPU features that are
> completely broken.</soapbox>
>
> > > The vNMI feature isn't available in any shipping processor yet, is it?
> > Yes, but one of its purposes is to avoid single stepping the guest,
> > which is especially painful on AMD, because there is no MTF, so
> > you have to 'borrow' the TF flag in the EFLAGS, and that can leak into
> > the guest state (e.g pushed onto the stack).
>
> So, what's the solution for all of today's SVM-capable processors? KVM
> will probably be supporting AMD CPUs without vNMI for the next decade
> or two.

I did some homework on this a few months ago, so here goes:

First of all, let's assume that GIF is set, because when it is clear we just
intercept STGI to deliver the #NMI there. Same for #SMI.
In other words, GIF is easy with regard to the interrupt window.

So it works like this:

When we inject an #NMI, we enable the IRET intercept (in svm_inject_nmi).
As long as we haven't hit the IRET, NMIs are still blocked (that is our
NMI-blocked window), so enable_nmi_window does nothing.

We also mark this situation with

vcpu->arch.hflags |= HF_NMI_MASK;

This means that we are in NMI, but haven't yet
seen IRET.

When we hit the IRET interception, which is a fault-like interception,
we are still in the NMI until the IRET completes.

We mark this situation with

vcpu->arch.hflags |= HF_IRET_MASK;

Now both HF_NMI_MASK and HF_IRET_MASK are set.


If at that point someone enables the NMI window,
the NMI window code (enable_nmi_window) detects
(HF_NMI_MASK | HF_IRET_MASK), enables single stepping,
and remembers the current RIP.


Finally, svm_complete_interrupts (which is called on each VM exit)
notices the HF_IRET_MASK flag; if it is set, and RIP is not the same as
it was when we enabled single stepping, it clears HF_NMI_MASK
and raises KVM_REQ_EVENT to possibly inject another NMI now.

Of course, if for example the IRET gets an exception (or even an interrupt,
since EFLAGS.IF can be set), then the TF flag we force-enabled will be pushed
onto the exception stack and leaked to the guest, which is not nice.


Note that the same problem doesn't happen with the STGI interception,
because unlike IRET we fully emulate STGI, so upon completion of its
emulation the NMI window is open.

If we could fully emulate IRET, we could do the same with it as well,
but that is hard, and of course in the case of skipping over the interrupt shadow,
we would have to emulate *any* instruction which happens to be there,
which is not feasible at all for KVM's emulator.


That also doesn't work with SEV-ES, due to the encrypted nature of the guest
(but then emulated SMM won't work either); I guess this is another reason
for the vNMI feature.

TL;DR: on #NMI injection we intercept IRET and rely on that interception
to signal that the NMI window is about to open, but this still leaves the short
window of executing the IRET itself during which NMIs are still blocked,
so we have to single step over it.

Note that there is no issue with interrupt shadow here because NMI doesn't
respect it.
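
To make the flow above easier to follow, here is a condensed sketch in C-like
pseudocode (simplified, not the actual KVM code; intercept_iret(),
enable_single_step() and the singlestep_rip field are made-up names, while the
hflags bits and the other helpers are the ones mentioned above):

/* #NMI injection: mask further NMIs until the guest retires an IRET. */
static void sketch_inject_nmi(struct vcpu_svm *svm)
{
	svm->vcpu.arch.hflags |= HF_NMI_MASK;
	intercept_iret(svm);			/* made-up helper */
}

/* IRET intercept (fault-like): the IRET was reached but has not completed. */
static void sketch_iret_interception(struct vcpu_svm *svm)
{
	svm->vcpu.arch.hflags |= HF_IRET_MASK;
}

/* NMI window request: single step over the IRET itself. */
static void sketch_enable_nmi_window(struct vcpu_svm *svm)
{
	if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) ==
	    (HF_NMI_MASK | HF_IRET_MASK)) {
		svm->singlestep_rip = kvm_rip_read(&svm->vcpu);	/* made-up field */
		enable_single_step(svm);	/* made-up helper, borrows EFLAGS.TF */
	}
}

/* On every VM exit: once RIP has moved past the IRET, unmask NMIs again. */
static void sketch_complete_interrupts(struct vcpu_svm *svm)
{
	if ((svm->vcpu.arch.hflags & HF_IRET_MASK) &&
	    kvm_rip_read(&svm->vcpu) != svm->singlestep_rip) {
		svm->vcpu.arch.hflags &= ~(HF_NMI_MASK | HF_IRET_MASK);
		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
	}
}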



>
>
> > (Actually looking at clause of default treatment of SMIs in Intel's PRM,
> > they do mention that they preserve the int shadow somewhere at least
> > on some Intel's CPUs).
>
> Yes, this is a required part of VMX-critical state for processors that
> support SMI recognition while there is blocking by STI or by MOV SS.
> However, I don't believe that KVM actually saves VMX-critical state on
> delivery of a virtual SMI.

Yes, but that does suggest that older CPUs which allowed SMI in an interrupt
shadow did preserve it *somewhere*. It's also not a spec violation to preserve it
in this way.

>
> > BTW, according to my observations, it is really hard to hit this problem,
> > because it looks like when the CPU is in interrupt shadow, it doesn't process
> > _real_ interrupts as well (despite the fact that in VM, real interrupts
> > should never be blocked(*), but yet, that is what I observed on both AMD and Intel.
> >
> > (*) You can allow the guest to control the real EFLAGS.IF on both VMX and SVM,
> > (in which case int shadow should indeed work as on bare metal)
> > but KVM of course doesn't do it.
>
> It doesn't surprise me that hardware treats a virtual interrupt shadow
> as a physical interrupt shadow. IIRC, each vendor has a way of
> breaking an endless chain of interrupt shadows, so a malicious guest
> can't defer interrupts indefinitely.

Thankfully a malicious guest can't abuse the STI interrupt shadow this way, I think,
because the STI interrupt shadow only arises if the STI actually enables EFLAGS.IF.
If it was already set, there is no shadow.
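
(For reference, a tiny example of the pattern in question, as C with inline asm;
the shadow guarantees that nothing is delivered between the sti and the hlt:)

	asm volatile("cli");
	/* IF was 0, so the sti arms the shadow: the hlt executes before any
	 * interrupt (and, on CPUs that block it in the shadow, any SMI) can
	 * be delivered, so the wakeup event cannot be missed. */
	asm volatile("sti; hlt");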

I don't know how they deal with repeated MOV SS instructions. Maybe this one
doesn't enable a real interrupt shadow, or doesn't enable the shadow if
it is already active; I don't know.

>
> > I observed that when KVM sends #SMI from other vCPU, it sends a vCPU kick,
> > and the kick never arrives inside the interrupt shadow.
> > I have seen it on both VMX and SVM.
> >
> > What still triggers this problem, is that the instruction which is in the interrupt
> > shadow can still get a VM exit, (e.g EPT/NPT violation) and then it can notice
> > the pending SMI.
> >
> > I think it has to be EPT/NPT violation btw, because, IMHO most if not all other VM exits I
> > think are instruction intercepts, which will cause KVM to emulate the instruction
> > and clear the interrupt shadow, and only after that it will enter SMM.
> >
> > Even MMIO/IOPORT access is emulated by the KVM.
> >
> > Its not the case with EPT/NPT violation, because the KVM will in this case re-execute
> > the instruction after it 'fixes' the fault.
>
> Probably #PF as well, then, if TDP is disabled.

Yep, no doubt about it.

Also, come to think of it, we also intercept #AC and just forward it to the guest,
and since we let the instruction be re-executed, that won't clear the interrupt
shadow either.
#UD is also intercepted, and if the emulator can't emulate it, it should also be forwarded
to the guest. That gives me an idea to improve my test by sticking a UD2 there.
I'll take a look.


Best regards,
Maxim Levitsky

>




2022-07-14 12:07:23

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] SMM emulation and interrupt shadow fixes

On Tue, 2022-06-21 at 18:08 +0300, Maxim Levitsky wrote:
> This patch series is a result of long debug work to find out why
> sometimes guests with win11 secure boot
> were failing during boot.
>
> During writing a unit test I found another bug, turns out
> that on rsm emulation, if the rsm instruction was done in real
> or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
>
> I also refactored the way we write SMRAM so it is easier
> now to understand what is going on.
>
> The main bug in this series which I fixed is that we
> allowed #SMI to happen during the STI interrupt shadow,
> and we did nothing to both reset it on #SMI handler
> entry and restore it on RSM.
>
> Best regards,
>         Maxim Levitsky
>
> Maxim Levitsky (11):
>   KVM: x86: emulator: em_sysexit should update ctxt->mode
>   KVM: x86: emulator: introduce update_emulation_mode
>   KVM: x86: emulator: remove assign_eip_near/far
>   KVM: x86: emulator: update the emulation mode after rsm
>   KVM: x86: emulator: update the emulation mode after CR0 write
>   KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
>     the image format
>   KVM: x86: emulator/smm: add structs for KVM's smram layout
>   KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
>   KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
>   KVM: x86: SVM: use smram structs
>   KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
>
>  arch/x86/include/asm/kvm_host.h |   6 -
>  arch/x86/kvm/emulate.c          | 305 ++++++++++++++++----------------
>  arch/x86/kvm/kvm_emulate.h      | 146 +++++++++++++++
>  arch/x86/kvm/svm/svm.c          |  28 +--
>  arch/x86/kvm/x86.c              | 162 ++++++++---------
>  5 files changed, 394 insertions(+), 253 deletions(-)
>
> --
> 2.26.3
>
>
A kind ping on these patches.

Best regards,
Maxim Levitsky

2022-07-20 08:53:58

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 00/11] SMM emulation and interrupt shadow fixes

On Thu, 2022-07-14 at 14:06 +0300, Maxim Levitsky wrote:
> On Tue, 2022-06-21 at 18:08 +0300, Maxim Levitsky wrote:
> > This patch series is a result of long debug work to find out why
> > sometimes guests with win11 secure boot
> > were failing during boot.
> >
> > During writing a unit test I found another bug, turns out
> > that on rsm emulation, if the rsm instruction was done in real
> > or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
> >
> > I also refactored the way we write SMRAM so it is easier
> > now to understand what is going on.
> >
> > The main bug in this series which I fixed is that we
> > allowed #SMI to happen during the STI interrupt shadow,
> > and we did nothing to both reset it on #SMI handler
> > entry and restore it on RSM.
> >
> > Best regards,
> >         Maxim Levitsky
> >
> > Maxim Levitsky (11):
> >   KVM: x86: emulator: em_sysexit should update ctxt->mode
> >   KVM: x86: emulator: introduce update_emulation_mode
> >   KVM: x86: emulator: remove assign_eip_near/far
> >   KVM: x86: emulator: update the emulation mode after rsm
> >   KVM: x86: emulator: update the emulation mode after CR0 write
> >   KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on
> >     the image format
> >   KVM: x86: emulator/smm: add structs for KVM's smram layout
> >   KVM: x86: emulator/smm: use smram struct for 32 bit smram load/restore
> >   KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore
> >   KVM: x86: SVM: use smram structs
> >   KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM
> >
> >  arch/x86/include/asm/kvm_host.h |   6 -
> >  arch/x86/kvm/emulate.c          | 305 ++++++++++++++++----------------
> >  arch/x86/kvm/kvm_emulate.h      | 146 +++++++++++++++
> >  arch/x86/kvm/svm/svm.c          |  28 +--
> >  arch/x86/kvm/x86.c              | 162 ++++++++---------
> >  5 files changed, 394 insertions(+), 253 deletions(-)
> >
> > --
> > 2.26.3
> >
> >
> A kind ping on these patches.

Another kind ping on this patch series.

Best regards,
Maxim Levitsky

>
> Best regards,
>         Maxim Levitsky
>


2022-07-21 00:04:00

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 02/11] KVM: x86: emulator: introduce update_emulation_mode

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> +static inline int update_emulation_mode(struct x86_emulate_ctxt *ctxt)

Maybe emulator_recalc_and_set_mode()? It took me a second to understand that
"update" also involves determining the "new" mode, e.g. I was trying to figure
out where @mode was :-)

> +{
> + u64 efer;
> + struct desc_struct cs;
> + u16 selector;
> + u32 base3;
> +
> + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
> +
> + if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {
> + /* Real mode. cpu must not have long mode active */
> + if (efer & EFER_LMA)
> + return X86EMUL_UNHANDLEABLE;

If we hit this, is there any hope of X86EMUL_UNHANDLEABLE doing the right thing?
Ah, SMM and the ability to swizzle SMRAM state. Bummer. I was hoping we could
just bug the VM.

> + ctxt->mode = X86EMUL_MODE_REAL;
> + return X86EMUL_CONTINUE;
> + }

2022-07-21 00:04:50

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] KVM: x86: emulator: update the emulation mode after CR0 write

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> CR0.PE toggles real/protected mode, thus its update
> should update the emulation mode.
>
> This is likely a benign bug because there is no writeback
> of state, other than the RIP increment, and when toggling
> CR0.PE, the CPU has to execute code from a very low memory address.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 6f4632babc4cd8..002687d17f9364 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -3659,11 +3659,22 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
>
> static int em_cr_write(struct x86_emulate_ctxt *ctxt)
> {
> - if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> + int cr_num = ctxt->modrm_reg;
> + int r;
> +
> + if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val))
> return emulate_gp(ctxt, 0);
>
> /* Disable writeback. */
> ctxt->dst.type = OP_NONE;
> +
> + if (cr_num == 0) {
> + /* CR0 write might have updated CR0.PE */

Or toggled CR0.PG. It's probably also worth noting that ->set_cr() handles side
effects to other registers, e.g. the lack of an EFER.LMA update makes this look
suspicious at first glance.

> + r = update_emulation_mode(ctxt);
> + if (r != X86EMUL_CONTINUE)
> + return r;
> + }
> +
> return X86EMUL_CONTINUE;
> }
>
> --
> 2.26.3
>

2022-07-21 00:09:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on the image format

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> On 64 bit host, if the guest doesn't have X86_FEATURE_LM, we would

s/we would/KVM will

> access 16 gprs to 32-bit smram image, causing out-ouf-bound ram
> access.
>
> On 32 bit host, the rsm_load_state_64/enter_smm_save_state_64
> is compiled out, thus access overflow can't happen.
>
> Fixes: b443183a25ab61 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM")

Argh, I forgot that this is one of the like five places KVM actually respects the
long mode flag. Even worse, I fixed basically the same thing a while back,
commit b68f3cc7d978 ("KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels").

We should really harden put_smstate() and GET_SMSTATE()...
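
For instance, even a simple bounds check on the constant offset would help;
a rough sketch, not an actual patch (GET_SMSTATE_CHECKED is a made-up name):

#define GET_SMSTATE_CHECKED(type, buf, offset)				\
({									\
	/* The save area maps offsets 0x7e00..0x7fff onto a 512	\
	 * byte buffer; reject anything outside of that range.		\
	 */								\
	BUILD_BUG_ON((offset) < 0x7e00 ||				\
		     (offset) + sizeof(type) > 0x7e00 + 512);		\
	*(type *)((buf) + (offset) - 0x7e00);				\
})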

> Signed-off-by: Maxim Levitsky <[email protected]>
> ---

Nits aside,

Reviewed-by: Sean Christopherson <[email protected]>

> arch/x86/kvm/emulate.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 002687d17f9364..ce186aebca8e83 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2469,7 +2469,7 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt,
> ctxt->eflags = GET_SMSTATE(u32, smstate, 0x7ff4) | X86_EFLAGS_FIXED;
> ctxt->_eip = GET_SMSTATE(u32, smstate, 0x7ff0);
>
> - for (i = 0; i < NR_EMULATOR_GPRS; i++)
> + for (i = 0; i < 8; i++)
> *reg_write(ctxt, i) = GET_SMSTATE(u32, smstate, 0x7fd0 + i * 4);
>
> val = GET_SMSTATE(u32, smstate, 0x7fcc);
> @@ -2526,7 +2526,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt,
> u16 selector;
> int i, r;
>
> - for (i = 0; i < NR_EMULATOR_GPRS; i++)
> + for (i = 0; i < 16; i++)
> *reg_write(ctxt, i) = GET_SMSTATE(u64, smstate, 0x7ff8 - i * 8);
>
> ctxt->_eip = GET_SMSTATE(u64, smstate, 0x7f78);
> --
> 2.26.3
>

2022-07-21 00:12:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 06/11] KVM: x86: emulator/smm: number of GPRs in the SMRAM image depends on the image format

On Thu, Jul 21, 2022, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > On 64 bit host, if the guest doesn't have X86_FEATURE_LM, we would
>
> s/we would/KVM will
>
> > access 16 gprs to 32-bit smram image, causing out-ouf-bound ram
> > access.
> >
> > On 32 bit host, the rsm_load_state_64/enter_smm_save_state_64
> > is compiled out, thus access overflow can't happen.
> >
> > Fixes: b443183a25ab61 ("KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM")
>
> Argh, I forgot that this one of the like five places KVM actually respects the
> long mode flag. Even worse, I fixed basically the same thing a while back,
> commit b68f3cc7d978 ("KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels").
>
> We should really harden put_smstate() and GET_SMSTATE()...

Or I could read the next few patches and see that they go away...

2022-07-21 00:29:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 03/11] KVM: x86: emulator: remove assign_eip_near/far

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> Now the assign_eip_far just updates the emulation mode in addition to
> updating the rip, it doesn't make sense to keep that function.
>
> Move mode update to the callers and remove these functions.

I disagree, IMO there's a lot of value in differentiating between near and far.
Yeah, the assign_eip_near() wrapper is kinda silly, but having that instead of
a bare assign_eip() documents that e.g. jmp_rel() is a near jump and that it's
not missing an update.

2022-07-21 00:30:39

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] KVM: x86: SVM: use smram structs

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> This removes the last user of put_smstate/GET_SMSTATE so
> remove these functions as well.
>
> Also add a sanity check that we don't attempt to enter the SMM
> on non long mode capable guest CPU with a running nested guest.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 6 ------
> arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++-----------
> 2 files changed, 17 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1038ccb7056a39..9e8467be96b4e6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -2057,12 +2057,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
> #endif
> }
>
> -#define put_smstate(type, buf, offset, val) \
> - *(type *)((buf) + (offset) - 0x7e00) = val
> -
> -#define GET_SMSTATE(type, buf, offset) \
> - (*(type *)((buf) + (offset) - 0x7e00))
> -
> int kvm_cpu_dirty_log_size(void);
>
> int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 136298cfb3fb57..8dcbbe839bef36 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4399,6 +4399,7 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
>
> static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> {
> + struct kvm_smram_state_64 *smram = (struct kvm_smram_state_64 *)smstate;
> struct vcpu_svm *svm = to_svm(vcpu);
> struct kvm_host_map map_save;
> int ret;
> @@ -4406,10 +4407,17 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> if (!is_guest_mode(vcpu))
> return 0;
>
> - /* FED8h - SVM Guest */
> - put_smstate(u64, smstate, 0x7ed8, 1);
> - /* FEE0h - SVM Guest VMCB Physical Address */
> - put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb12_gpa);
> + /*
> + * 32 bit SMRAM format doesn't preserve EFER and SVM state.
> + * SVM should not be enabled by the userspace without marking
> + * the CPU as at least long mode capable.
> + */
> +
> + if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
> + return 1;

Isn't this an independent bug fix? I.e. should it be in its own patch?

> +
> + smram->svm_guest_flag = 1;
> + smram->svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;
>
> svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
> svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];

2022-07-21 00:52:36

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] KVM: x86: SVM: use smram structs

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> + if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))

I think you want X86_FEATURE_LM, not X86_FEATURE_SVM.
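
I.e., presumably (just restating the suggestion as code):

	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
		return 1;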

2022-07-21 01:06:32

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 07/11] KVM: x86: emulator/smm: add structs for KVM's smram layout

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> Those structs will be used to read/write the smram state image.
>
> Also document the differences between KVM's SMRAM layout and SMRAM
> layout that is used by real Intel/AMD cpus.
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> arch/x86/kvm/kvm_emulate.h | 139 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 139 insertions(+)
>
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 89246446d6aa9d..7015728da36d5f 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -503,6 +503,145 @@ enum x86_intercept {
> nr_x86_intercepts
> };
>
> +
> +/*
> + * 32 bit KVM's emulated SMM layout
> + * Loosely based on Intel's layout
> + */
> +
> +struct kvm_smm_seg_state_32 {
> + u32 flags;
> + u32 limit;
> + u32 base;
> +} __packed;
> +
> +struct kvm_smram_state_32 {
> +
> + u32 reserved1[62]; /* FE00 - FEF7 */
> + u32 smbase; /* FEF8 */
> + u32 smm_revision; /* FEFC */
> + u32 reserved2[5]; /* FF00-FF13 */
> + /* CR4 is not present in Intel/AMD SMRAM image*/
> + u32 cr4; /* FF14 */
> + u32 reserved3[5]; /* FF18 */

Again, I love this approach, but we should have compile-time asserts to verify
the layout, e.g. see vmx_check_vmcs12_offsets().
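
Something minimal along these lines would already catch layout slips; just a
sketch (the function name is made up, and it assumes the image starts at SMRAM
offset 0xfe00 and is exactly 512 bytes, per the comments above):

static inline void kvm_smram32_check_offsets_sketch(void)
{
	/* Offsets taken from the comments next to the fields above. */
	BUILD_BUG_ON(offsetof(struct kvm_smram_state_32, smbase)       != 0xfef8 - 0xfe00);
	BUILD_BUG_ON(offsetof(struct kvm_smram_state_32, smm_revision) != 0xfefc - 0xfe00);
	BUILD_BUG_ON(offsetof(struct kvm_smram_state_32, cr4)          != 0xff14 - 0xfe00);
	/* The whole image must stay exactly 512 bytes. */
	BUILD_BUG_ON(sizeof(struct kvm_smram_state_32) != 512);
}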

2022-07-21 01:11:45

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 09/11] KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore

On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> Use kvm_smram_state_64 struct to save/restore the 64 bit SMM state
> (used when X86_FEATURE_LM is present in the guest CPUID,
> regardless of 32-bitness of the guest).
>
> Signed-off-by: Maxim Levitsky <[email protected]>
> ---
> @@ -9814,7 +9805,7 @@ static void enter_smm(struct kvm_vcpu *vcpu)
> memset(buf, 0, 512);
> #ifdef CONFIG_X86_64
> if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> - enter_smm_save_state_64(vcpu, buf);
> + enter_smm_save_state_64(vcpu, (struct kvm_smram_state_64 *)buf);
> else
> #endif
> enter_smm_save_state_32(vcpu, (struct kvm_smram_state_32 *)buf);

Hrm, I _love_ the approach overall, but I really dislike having to cast an
arbitrary buffer, especially in the SVM code.

Aha! Rather than keeping a buffer and casting, create a union to hold everything:

union kvm_smram {
struct kvm_smram_state_64 smram64;
struct kvm_smram_state_32 smram32;
u8 bytes[512];
};

and then enter_smm() becomes:

static void enter_smm(struct kvm_vcpu *vcpu)
{
struct kvm_segment cs, ds;
struct desc_ptr dt;
unsigned long cr0;

union kvm_smram smram;

BUILD_BUG_ON(sizeof(smram) != 512);

memset(smram.bytes, 0, sizeof(smram));
#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
enter_smm_save_state_64(vcpu, &smram.smram64);
else
#endif
enter_smm_save_state_32(vcpu, &smram.smram32);

/*
* Give enter_smm() a chance to make ISA-specific changes to the vCPU
* state (e.g. leave guest mode) after we've saved the state into the
* SMM state-save area.
*/
static_call(kvm_x86_enter_smm)(vcpu, &smram);

kvm_smm_changed(vcpu, true);
kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, smram.bytes, sizeof(smram));

and em_rsm() gets similar treatment. Then the vendor code doesn't have to cast,
e.g. SVM can do:

smram->smram64.svm_guest_flag = 1;
smram->smram64.svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;

That way we don't have to refactor this all again if we want to use SMRAM to save
something on Intel for VMX (though I agree with Jim that that's probably a bad idea).

2022-07-21 11:55:43

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 02/11] KVM: x86: emulator: introduce update_emulation_mode

On Wed, 2022-07-20 at 23:44 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > +static inline int update_emulation_mode(struct x86_emulate_ctxt *ctxt)
>
> Maybe emulator_recalc_and_set_mode()? It took me a second to understand that
> "update" also involves determining the "new" mode, e.g. I was trying to figure
> out where @mode was :-)

I don't mind at all, will update in v3.

>
> > +{
> > + u64 efer;
> > + struct desc_struct cs;
> > + u16 selector;
> > + u32 base3;
> > +
> > + ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
> > +
> > + if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {
> > + /* Real mode. cpu must not have long mode active */
> > + if (efer & EFER_LMA)
> > + return X86EMUL_UNHANDLEABLE;
>
> If we hit this, is there any hope of X86EMUL_UNHANDLEABLE doing the right thing?
> Ah, SMM and the ability to swizzle SMRAM state. Bummer. I was hoping we could
> just bug the VM.

I just tried to be a good citizen here; it is probably impossible to hit this case.
(RSM ignores the LMA bit in the EFER value stored in SMRAM.)

Best regards,
Maxim Levitsky



>
> > + ctxt->mode = X86EMUL_MODE_REAL;
> > + return X86EMUL_CONTINUE;
> > + }


2022-07-21 11:56:16

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] KVM: x86: emulator: update the emulation mode after CR0 write

On Wed, 2022-07-20 at 23:50 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > CR0.PE toggles real/protected mode, thus its update
> > should update the emulation mode.
> >
> > This is likely a benign bug because there is no writeback
> > of state, other than the RIP increment, and when toggling
> > CR0.PE, the CPU has to execute code from a very low memory address.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/emulate.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > index 6f4632babc4cd8..002687d17f9364 100644
> > --- a/arch/x86/kvm/emulate.c
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -3659,11 +3659,22 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
> >
> > static int em_cr_write(struct x86_emulate_ctxt *ctxt)
> > {
> > - if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> > + int cr_num = ctxt->modrm_reg;
> > + int r;
> > +
> > + if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val))
> > return emulate_gp(ctxt, 0);
> >
> > /* Disable writeback. */
> > ctxt->dst.type = OP_NONE;
> > +
> > + if (cr_num == 0) {
> > + /* CR0 write might have updated CR0.PE */
>
> Or toggled CR0.PG.

I thought about it, but paging does not actually affect the CPU mode.

E.g. if you are in protected mode, instructions execute the same regardless
of whether paging is enabled or not.

(There are probably some exceptions, but you understand what I mean).

Best regards,
Maxim Levitsky

> It's probably also worth noting that ->set_cr() handles side
> effects to other registers, e.g. the lack of an EFER.LMA update makes this look
> suspicious at first glance.

>
> > + r = update_emulation_mode(ctxt);
> > + if (r != X86EMUL_CONTINUE)
> > + return r;
> > + }
> > +
> > return X86EMUL_CONTINUE;
> > }
> >
> > --
> > 2.26.3
> >


2022-07-21 11:56:18

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 07/11] KVM: x86: emulator/smm: add structs for KVM's smram layout

On Thu, 2022-07-21 at 00:40 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > Those structs will be used to read/write the smram state image.
> >
> > Also document the differences between KVM's SMRAM layout and SMRAM
> > layout that is used by real Intel/AMD cpus.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/kvm/kvm_emulate.h | 139 +++++++++++++++++++++++++++++++++++++
> > 1 file changed, 139 insertions(+)
> >
> > diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> > index 89246446d6aa9d..7015728da36d5f 100644
> > --- a/arch/x86/kvm/kvm_emulate.h
> > +++ b/arch/x86/kvm/kvm_emulate.h
> > @@ -503,6 +503,145 @@ enum x86_intercept {
> > nr_x86_intercepts
> > };
> >
> > +
> > +/*
> > + * 32 bit KVM's emulated SMM layout
> > + * Loosely based on Intel's layout
> > + */
> > +
> > +struct kvm_smm_seg_state_32 {
> > + u32 flags;
> > + u32 limit;
> > + u32 base;
> > +} __packed;
> > +
> > +struct kvm_smram_state_32 {
> > +
> > + u32 reserved1[62]; /* FE00 - FEF7 */
> > + u32 smbase; /* FEF8 */
> > + u32 smm_revision; /* FEFC */
> > + u32 reserved2[5]; /* FF00-FF13 */
> > + /* CR4 is not present in Intel/AMD SMRAM image*/
> > + u32 cr4; /* FF14 */
> > + u32 reserved3[5]; /* FF18 */
>
> Again, I love this approach, but we should have compile-time asserts to verify
> the layout, e.g. see vmx_check_vmcs12_offsets().
>

No objections, will do.

Best regards,
Maxim Levitsky

2022-07-21 11:57:13

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] KVM: x86: SVM: use smram structs

On Thu, 2022-07-21 at 00:18 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > This removes the last user of put_smstate/GET_SMSTATE so
> > remove these functions as well.
> >
> > Also add a sanity check that we don't attempt to enter the SMM
> > on non long mode capable guest CPU with a running nested guest.
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 6 ------
> > arch/x86/kvm/svm/svm.c | 28 +++++++++++++++++-----------
> > 2 files changed, 17 insertions(+), 17 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 1038ccb7056a39..9e8467be96b4e6 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -2057,12 +2057,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
> > #endif
> > }
> >
> > -#define put_smstate(type, buf, offset, val) \
> > - *(type *)((buf) + (offset) - 0x7e00) = val
> > -
> > -#define GET_SMSTATE(type, buf, offset) \
> > - (*(type *)((buf) + (offset) - 0x7e00))
> > -
> > int kvm_cpu_dirty_log_size(void);
> >
> > int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 136298cfb3fb57..8dcbbe839bef36 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4399,6 +4399,7 @@ static int svm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> >
> > static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> > {
> > + struct kvm_smram_state_64 *smram = (struct kvm_smram_state_64 *)smstate;
> > struct vcpu_svm *svm = to_svm(vcpu);
> > struct kvm_host_map map_save;
> > int ret;
> > @@ -4406,10 +4407,17 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> > if (!is_guest_mode(vcpu))
> > return 0;
> >
> > - /* FED8h - SVM Guest */
> > - put_smstate(u64, smstate, 0x7ed8, 1);
> > - /* FEE0h - SVM Guest VMCB Physical Address */
> > - put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb12_gpa);
> > + /*
> > + * 32 bit SMRAM format doesn't preserve EFER and SVM state.
> > + * SVM should not be enabled by the userspace without marking
> > + * the CPU as at least long mode capable.
> > + */
> > +
> > + if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
> > + return 1;
>
> Isn't this an independent bug fix? I.e. shouldn't it be in its own patch?

Yes, to some extent; the patch was small so I didn't bother, but I don't mind splitting it up.

Best regards,
Maxim Levitsky

>
> > +
> > + smram->svm_guest_flag = 1;
> > + smram->svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;
> >
> > svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX];
> > svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];


2022-07-21 11:58:03

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 10/11] KVM: x86: SVM: use smram structs

On Thu, 2022-07-21 at 00:39 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > + if (!guest_cpuid_has(vcpu, X86_FEATURE_SVM))
>
> I think you want X86_FEATURE_LM, not X86_FEATURE_SVM.
>

Yes, sorry about that! Thanks!
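
I.e. the check in svm_enter_smm() becomes (relative to this patch):

	if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
		return 1;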

Best regards,
Maxim Levitsky

2022-07-21 12:17:15

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 09/11] KVM: x86: emulator/smm: use smram struct for 64 bit smram load/restore

On Thu, 2022-07-21 at 00:38 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > Use kvm_smram_state_64 struct to save/restore the 64 bit SMM state
> > (used when X86_FEATURE_LM is present in the guest CPUID,
> > regardless of 32-bitness of the guest).
> >
> > Signed-off-by: Maxim Levitsky <[email protected]>
> > ---
> > @@ -9814,7 +9805,7 @@ static void enter_smm(struct kvm_vcpu *vcpu)
> > memset(buf, 0, 512);
> > #ifdef CONFIG_X86_64
> > if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> > - enter_smm_save_state_64(vcpu, buf);
> > + enter_smm_save_state_64(vcpu, (struct kvm_smram_state_64 *)buf);
> > else
> > #endif
> > enter_smm_save_state_32(vcpu, (struct kvm_smram_state_32 *)buf);
>
> Hrm, I _love_ the approach overall, but I really dislike having to cast an
> arbitrary buffer, especially in the SVM code.
>
> Aha! Rather than keeping a buffer and casting, create a union to hold everything:
>
> union kvm_smram {
> struct kvm_smram_state_64 smram64;
> struct kvm_smram_state_32 smram32;
> u8 bytes[512];
> };

Great idea, will do in v3.

>
> and then enter_smm() becomes:
>
> static void enter_smm(struct kvm_vcpu *vcpu)
> {
> struct kvm_segment cs, ds;
> struct desc_ptr dt;
> unsigned long cr0;
>
> union kvm_smram smram;
>
> BUILD_BUG_ON(sizeof(smram) != 512);
>
> memset(smram.bytes, 0, sizeof(smram));
> #ifdef CONFIG_X86_64
> if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> enter_smm_save_state_64(vcpu, &smram.smram64);
> else
> #endif
> enter_smm_save_state_32(vcpu, &smram.smram32);
>
> /*
> * Give enter_smm() a chance to make ISA-specific changes to the vCPU
> * state (e.g. leave guest mode) after we've saved the state into the
> * SMM state-save area.
> */
> static_call(kvm_x86_enter_smm)(vcpu, &smram);
>
> kvm_smm_changed(vcpu, true);
> kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, smram.bytes, sizeof(smram));
>
> and em_rsm() gets similar treatment. Then the vendor code doesn't have to cast,
> e.g. SVM can do:
>
> smram->smram64.svm_guest_flag = 1;
> smram->smram64.svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;
>
> That way we don't have to refactor this all again if we want to use SMRAM to save
> something on Intel for VMX (though I agree with Jim that that's probably a bad idea).
>

Best regards,
Maxim Levitsky

2022-07-21 12:58:34

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 03/11] KVM: x86: emulator: remove assign_eip_near/far

On Wed, 2022-07-20 at 23:51 +0000, Sean Christopherson wrote:
> On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > Now the assign_eip_far just updates the emulation mode in addition to
> > updating the rip, it doesn't make sense to keep that function.
> >
> > Move mode update to the callers and remove these functions.
>
> I disagree, IMO there's a lot of value in differentiating between near and far.
> Yeah, the assign_eip_near() wrapper is kinda silly, but having that instead of
> a bare assign_eip() documents that e.g. jmp_rel() is a near jump and that it's
> not missing an update.
>


OK, I'll drop this patch then.

Best regards,
Maxim Levitsky

2022-07-21 15:04:28

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 02/11] KVM: x86: emulator: introduce update_emulation_mode

On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> On Wed, 2022-07-20 at 23:44 +0000, Sean Christopherson wrote:
> > On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > > + if (!ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE) {
> > > + /* Real mode. cpu must not have long mode active */
> > > + if (efer & EFER_LMA)
> > > + return X86EMUL_UNHANDLEABLE;
> >
> > If we hit this, is there any hope of X86EMUL_UNHANDLEABLE doing the right thing?
> > Ah, SMM and the ability to swizzle SMRAM state. Bummer. I was hoping we could
> > just bug the VM.
>
> I just tried to be a good citizen here; it is probably impossible to hit this case.
> (RSM ignores the LMA bit in the EFER value in SMRAM).

The reason I asked is that if all of the X86EMUL_UNHANDLEABLE paths are impossible,
then my preference would be to refactor this slightly to:

static int emulator_calc_cpu_mode(const struct x86_emulate_ctxt *ctxt)

and return the mode instead of success/failure, and turn those checks into:

KVM_EMULATOR_BUG_ON(efer & EFER_LMA);

with the callers being:

ctxt->mode = emulator_calc_cpu_mode(ctxt);

But I think this one:

if (!ctxt->ops->get_segment(ctxt, &selector, &cs, &base3, VCPU_SREG_CS))
return X86EMUL_UNHANDLEABLE;

is reachable in the em_rsm() case :-/
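
Roughly something like this (completely untested; I dropped the const since
the ->get_cr()/->get_msr() hooks take a non-const ctxt, and only the real
mode leg is shown):

static int emulator_calc_cpu_mode(struct x86_emulate_ctxt *ctxt)
{
	u64 efer = 0;

	ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);

	if (!(ctxt->ops->get_cr(ctxt, 0) & X86_CR0_PE)) {
		/* Real mode: long mode must not be active. */
		KVM_EMULATOR_BUG_ON(efer & EFER_LMA);
		return X86EMUL_MODE_REAL;
	}

	/* ... VM86/protected/long mode handling as in update_emulation_mode() ... */
}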

2022-07-21 15:08:00

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] KVM: x86: emulator: update the emulation mode after CR0 write

On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> On Wed, 2022-07-20 at 23:50 +0000, Sean Christopherson wrote:
> > On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > > CR0.PE toggles real/protected mode, thus its update
> > > should update the emulation mode.
> > >
> > > This is likely a benign bug because there is no writeback
> > > of state, other than the RIP increment, and when toggling
> > > CR0.PE, the CPU has to execute code from a very low memory address.
> > >
> > > Signed-off-by: Maxim Levitsky <[email protected]>
> > > ---
> > > arch/x86/kvm/emulate.c | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > > index 6f4632babc4cd8..002687d17f9364 100644
> > > --- a/arch/x86/kvm/emulate.c
> > > +++ b/arch/x86/kvm/emulate.c
> > > @@ -3659,11 +3659,22 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
> > >
> > > static int em_cr_write(struct x86_emulate_ctxt *ctxt)
> > > {
> > > - if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> > > + int cr_num = ctxt->modrm_reg;
> > > + int r;
> > > +
> > > + if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val))
> > > return emulate_gp(ctxt, 0);
> > >
> > > /* Disable writeback. */
> > > ctxt->dst.type = OP_NONE;
> > > +
> > > + if (cr_num == 0) {
> > > + /* CR0 write might have updated CR0.PE */
> >
> > Or toggled CR0.PG.
>
> I thought about it but paging actually does not affect the CPU mode.

Toggling CR0.PG when EFER.LME=1 (and CR4.PAE=1) switches the CPU in and out of
long mode. That's why I mentioned the EFER.LMA thing below. It's also notable
in that the only reason we don't have to handle CR4 here is because clearing
CR4.PAE while long mode is active causes a #GP.

> E.g. if you are in protected mode, instructions execute the same regardless
> of whether you have paging or not.
>
> (There are probably some exceptions but you understand what I mean).
>
> Best regards,
> Maxim Levitsky
>
> > It's probably also worth noting that ->set_cr() handles side
> > effects to other registers, e.g. the lack of an EFER.LMA update makes this look
> > suspicious at first glance.

2022-07-21 15:09:54

by Maxim Levitsky

[permalink] [raw]
Subject: Re: [PATCH v2 05/11] KVM: x86: emulator: update the emulation mode after CR0 write

On Thu, 2022-07-21 at 14:11 +0000, Sean Christopherson wrote:
> On Thu, Jul 21, 2022, Maxim Levitsky wrote:
> > On Wed, 2022-07-20 at 23:50 +0000, Sean Christopherson wrote:
> > > On Tue, Jun 21, 2022, Maxim Levitsky wrote:
> > > > CR0.PE toggles real/protected mode, thus its update
> > > > should update the emulation mode.
> > > >
> > > > This is likely a benign bug because there is no writeback
> > > > of state, other than the RIP increment, and when toggling
> > > > CR0.PE, the CPU has to execute code from a very low memory address.
> > > >
> > > > Signed-off-by: Maxim Levitsky <[email protected]>
> > > > ---
> > > > arch/x86/kvm/emulate.c | 13 ++++++++++++-
> > > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > > > index 6f4632babc4cd8..002687d17f9364 100644
> > > > --- a/arch/x86/kvm/emulate.c
> > > > +++ b/arch/x86/kvm/emulate.c
> > > > @@ -3659,11 +3659,22 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
> > > >
> > > > static int em_cr_write(struct x86_emulate_ctxt *ctxt)
> > > > {
> > > > - if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> > > > + int cr_num = ctxt->modrm_reg;
> > > > + int r;
> > > > +
> > > > + if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val))
> > > > return emulate_gp(ctxt, 0);
> > > >
> > > > /* Disable writeback. */
> > > > ctxt->dst.type = OP_NONE;
> > > > +
> > > > + if (cr_num == 0) {
> > > > + /* CR0 write might have updated CR0.PE */
> > >
> > > Or toggled CR0.PG.
> >
> > I thought about it but paging actually does not affect the CPU mode.
>
> Toggling CR0.PG when EFER.LME=1 (and CR4.PAE=1) switches the CPU in and out of
> long mode. That's why I mentioned the EFER.LMA thing below. It's also notable
> in that the only reason we don't have to handle CR4 here is because clearing
> CR4.PAE while long mode is active causes a #GP.

I had a distinct feeling that this is related to LMA/LME, which I always learn and then forget.
Now I do remember, and I wrote a short summary for myself to refresh my memory when I forget about this again :-)

I'll update the comment again in v3.
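
Probably something like this then (exact comment wording not final):

	if (cr_num == 0) {
		/*
		 * CR0 write might have updated CR0.PE and/or toggled
		 * CR0.PG, which together with EFER.LME controls whether
		 * the CPU is in long mode, thus the emulation mode
		 * might need to be updated.
		 */
		r = update_emulation_mode(ctxt);
		if (r != X86EMUL_CONTINUE)
			return r;
	}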

Thanks a lot,
Best regards,
Maxim Levitsky

>
> > E.g. if you are in protected mode, instructions execute the same regardless
> > of whether you have paging or not.
> >
> > (There are probably some exceptions but you understand what I mean).
> >
> > Best regards,
> > Maxim Levitsky
> >
> > > It's probably also worth noting that ->set_cr() handles side
> > > effects to other registers, e.g. the lack of an EFER.LMA update makes this look
> > > suspicious at first glance.