This patch series is a result of long debug work to find out why
sometimes guests with win11 secure boot
were failing during boot.
During writing a unit test I found another bug, turns out
that on rsm emulation, if the rsm instruction was done in real
or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
I also refactored the way we write SMRAM so it is easier
now to understand what is going on.
The main bug in this series which I fixed is that we
allowed #SMI to happen during the STI interrupt shadow,
and we did nothing to both reset it on #SMI handler
entry and restore it on RSM.
V4:
- rebased on top of patch series from Paolo which
allows smm support to be disabled by Kconfig option.
- addressed review feedback.
I included these patches in the series for reference.
Best regards,
Maxim Levitsky
Maxim Levitsky (15):
bug: introduce ASSERT_STRUCT_OFFSET
KVM: x86: emulator: em_sysexit should update ctxt->mode
KVM: x86: emulator: introduce emulator_recalc_and_set_mode
KVM: x86: emulator: update the emulation mode after rsm
KVM: x86: emulator: update the emulation mode after CR0 write
KVM: x86: smm: number of GPRs in the SMRAM image depends on the image
format
KVM: x86: smm: check for failures on smm entry
KVM: x86: smm: add structs for KVM's smram layout
KVM: x86: smm: use smram structs in the common code
KVM: x86: smm: use smram struct for 32 bit smram load/restore
KVM: x86: smm: use smram struct for 64 bit smram load/restore
KVM: svm: drop explicit return value of kvm_vcpu_map
KVM: x86: SVM: use smram structs
KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode
capable
KVM: x86: smm: preserve interrupt shadow in SMRAM
Paolo Bonzini (8):
KVM: x86: start moving SMM-related functions to new files
KVM: x86: move SMM entry to a new file
KVM: x86: move SMM exit to a new file
KVM: x86: do not go through ctxt->ops when emulating rsm
KVM: allow compiling out SMM support
KVM: x86: compile out vendor-specific code if SMM is disabled
KVM: x86: remove SMRAM address space if SMM is not supported
KVM: x86: do not define KVM_REQ_SMI if SMM disabled
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 29 +-
arch/x86/kvm/Kconfig | 11 +
arch/x86/kvm/Makefile | 1 +
arch/x86/kvm/emulate.c | 458 +++----------
arch/x86/kvm/kvm_cache_regs.h | 5 -
arch/x86/kvm/kvm_emulate.h | 47 +-
arch/x86/kvm/lapic.c | 14 +-
arch/x86/kvm/lapic.h | 7 +-
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/smm.c | 637 ++++++++++++++++++
arch/x86/kvm/smm.h | 160 +++++
arch/x86/kvm/svm/nested.c | 3 +
arch/x86/kvm/svm/svm.c | 43 +-
arch/x86/kvm/vmx/nested.c | 1 +
arch/x86/kvm/vmx/vmcs12.h | 5 +-
arch/x86/kvm/vmx/vmx.c | 11 +-
arch/x86/kvm/x86.c | 353 +---------
include/linux/build_bug.h | 9 +
tools/testing/selftests/kvm/x86_64/smm_test.c | 2 +
20 files changed, 1031 insertions(+), 768 deletions(-)
create mode 100644 arch/x86/kvm/smm.c
create mode 100644 arch/x86/kvm/smm.h
--
2.34.3
Update the emulation mode when handling writes to CR0, because
toggling CR0.PE switches between Real and Protected Mode, and toggling
CR0.PG when EFER.LME=1 switches between Long and Protected Mode.
This is likely a benign bug because there is no writeback of state,
other than the RIP increment, and when toggling CR0.PE, the CPU has
to execute code from a very low memory address.
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2c56d08b426065..5cc3efa0e21c17 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3290,11 +3290,25 @@ static int em_movbe(struct x86_emulate_ctxt *ctxt)
static int em_cr_write(struct x86_emulate_ctxt *ctxt)
{
- if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
+ int cr_num = ctxt->modrm_reg;
+ int r;
+
+ if (ctxt->ops->set_cr(ctxt, cr_num, ctxt->src.val))
return emulate_gp(ctxt, 0);
/* Disable writeback. */
ctxt->dst.type = OP_NONE;
+
+ if (cr_num == 0) {
+ /*
+ * CR0 write might have updated CR0.PE and/or CR0.PG
+ * which can affect the cpu's execution mode.
+ */
+ r = emulator_recalc_and_set_mode(ctxt);
+ if (r != X86EMUL_CONTINUE)
+ return r;
+ }
+
return X86EMUL_CONTINUE;
}
--
2.34.3
From: Paolo Bonzini <[email protected]>
This ensures that all the relevant code is compiled out, in fact
the process_smi stub can be removed too.
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/smm.h | 1 -
arch/x86/kvm/x86.c | 6 ++++++
3 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 28e41926027e7a..0b0a82c0bb5cbd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -81,7 +81,9 @@
#define KVM_REQ_NMI KVM_ARCH_REQ(9)
#define KVM_REQ_PMU KVM_ARCH_REQ(10)
#define KVM_REQ_PMI KVM_ARCH_REQ(11)
+#ifdef CONFIG_KVM_SMM
#define KVM_REQ_SMI KVM_ARCH_REQ(12)
+#endif
#define KVM_REQ_MASTERCLOCK_UPDATE KVM_ARCH_REQ(13)
#define KVM_REQ_MCLOCK_INPROGRESS \
KVM_ARCH_REQ_FLAGS(14, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
diff --git a/arch/x86/kvm/smm.h b/arch/x86/kvm/smm.h
index 7ccce6b655cacf..a6795b93ba3002 100644
--- a/arch/x86/kvm/smm.h
+++ b/arch/x86/kvm/smm.h
@@ -28,7 +28,6 @@ void process_smi(struct kvm_vcpu *vcpu);
static inline int kvm_inject_smi(struct kvm_vcpu *vcpu) { return -ENOTTY; }
static inline bool is_smm(struct kvm_vcpu *vcpu) { return false; }
static inline void kvm_smm_changed(struct kvm_vcpu *vcpu, bool in_smm) { WARN_ON_ONCE(1); }
-static inline void process_smi(struct kvm_vcpu *vcpu) { WARN_ON_ONCE(1); }
/*
* emulator_leave_smm is used as a function pointer, so the
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8394cd62c2854c..56004890a71742 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5033,8 +5033,10 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
process_nmi(vcpu);
+#ifdef CONFIG_KVM_SMM
if (kvm_check_request(KVM_REQ_SMI, vcpu))
process_smi(vcpu);
+#endif
/*
* KVM's ABI only allows for one exception to be migrated. Luckily,
@@ -10207,8 +10209,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}
if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
record_steal_time(vcpu);
+#ifdef CONFIG_KVM_SMM
if (kvm_check_request(KVM_REQ_SMI, vcpu))
process_smi(vcpu);
+#endif
if (kvm_check_request(KVM_REQ_NMI, vcpu))
process_nmi(vcpu);
if (kvm_check_request(KVM_REQ_PMU, vcpu))
@@ -12565,7 +12569,9 @@ bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
return true;
if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
+#ifdef CONFIG_KVM_SMM
kvm_test_request(KVM_REQ_SMI, vcpu) ||
+#endif
kvm_test_request(KVM_REQ_EVENT, vcpu))
return true;
--
2.34.3
SYSEXIT is one of the instructions that can change the
processor mode, thus ctxt->mode should be updated after it.
Note that this is likely a benign bug, because the only problematic
mode change is from 32 bit to 64 bit which can lead to truncation of RIP,
and it is not possible to do with sysexit,
since sysexit running in 32 bit mode will be limited to 32 bit version.
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/emulate.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 671f7e5871ff70..e23c984d6aae09 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2525,6 +2525,7 @@ static int em_sysexit(struct x86_emulate_ctxt *ctxt)
ops->set_segment(ctxt, ss_sel, &ss, 0, VCPU_SREG_SS);
ctxt->_eip = rdx;
+ ctxt->mode = usermode;
*reg_write(ctxt, VCPU_REGS_RSP) = rcx;
return X86EMUL_CONTINUE;
--
2.34.3
In the rare case of the failure on SMM entry, the KVM should at
least terminate the VM instead of going south.
Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/smm.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index b290ad14070f72..1191a79cf027e5 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -211,11 +211,17 @@ void enter_smm(struct kvm_vcpu *vcpu)
* Give enter_smm() a chance to make ISA-specific changes to the vCPU
* state (e.g. leave guest mode) after we've saved the state into the
* SMM state-save area.
+ *
+ * Kill the VM in the unlikely case of failure, because the VM
+ * can be in undefined state in this case.
*/
- static_call(kvm_x86_enter_smm)(vcpu, buf);
+ if (static_call(kvm_x86_enter_smm)(vcpu, buf))
+ goto error;
kvm_smm_changed(vcpu, true);
- kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, buf, sizeof(buf));
+
+ if (kvm_vcpu_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, buf, sizeof(buf)))
+ goto error;
if (static_call(kvm_x86_get_nmi_mask)(vcpu))
vcpu->arch.hflags |= HF_SMM_INSIDE_NMI_MASK;
@@ -235,7 +241,8 @@ void enter_smm(struct kvm_vcpu *vcpu)
dt.address = dt.size = 0;
static_call(kvm_x86_set_idt)(vcpu, &dt);
- kvm_set_dr(vcpu, 7, DR7_FIXED_1);
+ if (WARN_ON_ONCE(kvm_set_dr(vcpu, 7, DR7_FIXED_1)))
+ goto error;
cs.selector = (vcpu->arch.smbase >> 4) & 0xffff;
cs.base = vcpu->arch.smbase;
@@ -264,11 +271,15 @@ void enter_smm(struct kvm_vcpu *vcpu)
#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
- static_call(kvm_x86_set_efer)(vcpu, 0);
+ if (static_call(kvm_x86_set_efer)(vcpu, 0))
+ goto error;
#endif
kvm_update_cpuid_runtime(vcpu);
kvm_mmu_reset_context(vcpu);
+ return;
+error:
+ kvm_vm_dead(vcpu->kvm);
}
static void rsm_set_desc_flags(struct kvm_segment *desc, u32 flags)
--
2.34.3
When the guest CPUID doesn't have support for long mode, 32 bit SMRAM
layout is used and it has no support for preserving EFER and/or SVM
state.
Note that this isn't relevant to running 32 bit guests on VM which is
long mode capable - such VM can still run 32 bit guests in compatibility
mode.
Signed-off-by: Maxim Levitsky <[email protected]>
---
arch/x86/kvm/svm/svm.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3004a5ff3fbf79..d22a809d923339 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4445,6 +4445,14 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, union kvm_smram *smram)
if (!is_guest_mode(vcpu))
return 0;
+ /*
+ * 32-bit SMRAM format doesn't preserve EFER and SVM state. Userspace is
+ * responsible for ensuring nested SVM and SMIs are mutually exclusive.
+ */
+
+ if (!guest_cpuid_has(vcpu, X86_FEATURE_LM))
+ return 1;
+
smram->smram64.svm_guest_flag = 1;
smram->smram64.svm_guest_vmcb_gpa = svm->nested.vmcb12_gpa;
--
2.34.3
On 10/25/22 14:47, Maxim Levitsky wrote:
> This patch series is a result of long debug work to find out why
> sometimes guests with win11 secure boot
> were failing during boot.
>
> During writing a unit test I found another bug, turns out
> that on rsm emulation, if the rsm instruction was done in real
> or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
>
> I also refactored the way we write SMRAM so it is easier
> now to understand what is going on.
>
> The main bug in this series which I fixed is that we
> allowed #SMI to happen during the STI interrupt shadow,
> and we did nothing to both reset it on #SMI handler
> entry and restore it on RSM.
I have now sent out the final/new version of the first 8 patches and
will review these tomorrow. Thanks for your patience. :)
Paolo
On Thu, 2022-10-27 at 18:49 +0200, Paolo Bonzini wrote:
> On 10/25/22 14:47, Maxim Levitsky wrote:
> > This patch series is a result of long debug work to find out why
> > sometimes guests with win11 secure boot
> > were failing during boot.
> >
> > During writing a unit test I found another bug, turns out
> > that on rsm emulation, if the rsm instruction was done in real
> > or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
> >
> > I also refactored the way we write SMRAM so it is easier
> > now to understand what is going on.
> >
> > The main bug in this series which I fixed is that we
> > allowed #SMI to happen during the STI interrupt shadow,
> > and we did nothing to both reset it on #SMI handler
> > entry and restore it on RSM.
>
> I have now sent out the final/new version of the first 8 patches and
> will review these tomorrow. Thanks for your patience. :)
>
> Paolo
>
Thank you very much!!
Best regards,
Maxim Levitsky
On 10/27/22 19:06, Maxim Levitsky wrote:
> On Thu, 2022-10-27 at 18:49 +0200, Paolo Bonzini wrote:
>> On 10/25/22 14:47, Maxim Levitsky wrote:
>>> This patch series is a result of long debug work to find out why
>>> sometimes guests with win11 secure boot
>>> were failing during boot.
>>>
>>> During writing a unit test I found another bug, turns out
>>> that on rsm emulation, if the rsm instruction was done in real
>>> or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
>>>
>>> I also refactored the way we write SMRAM so it is easier
>>> now to understand what is going on.
>>>
>>> The main bug in this series which I fixed is that we
>>> allowed #SMI to happen during the STI interrupt shadow,
>>> and we did nothing to both reset it on #SMI handler
>>> entry and restore it on RSM.
>>
>> I have now sent out the final/new version of the first 8 patches and
>> will review these tomorrow. Thanks for your patience. :)
>>
>> Paolo
>>
> Thank you very much!!
Queued, thanks. Note that some emulator patches should go in stable
releases so I have reordered them in front.
Paolo
On Fri, Oct 28, 2022, Paolo Bonzini wrote:
> On 10/27/22 19:06, Maxim Levitsky wrote:
> > On Thu, 2022-10-27 at 18:49 +0200, Paolo Bonzini wrote:
> > > On 10/25/22 14:47, Maxim Levitsky wrote:
> > > > This patch series is a result of long debug work to find out why
> > > > sometimes guests with win11 secure boot
> > > > were failing during boot.
> > > >
> > > > During writing a unit test I found another bug, turns out
> > > > that on rsm emulation, if the rsm instruction was done in real
> > > > or 32 bit mode, KVM would truncate the restored RIP to 32 bit.
> > > >
> > > > I also refactored the way we write SMRAM so it is easier
> > > > now to understand what is going on.
> > > >
> > > > The main bug in this series which I fixed is that we
> > > > allowed #SMI to happen during the STI interrupt shadow,
> > > > and we did nothing to both reset it on #SMI handler
> > > > entry and restore it on RSM.
> > >
> > > I have now sent out the final/new version of the first 8 patches and
> > > will review these tomorrow. Thanks for your patience. :)
> > >
> > > Paolo
> > >
> > Thank you very much!!
>
> Queued, thanks. Note that some emulator patches should go in stable
> releases so I have reordered them in front.
Can you fix patch 04 (also patch 04 in your series[*]) before pushing to kvm/queue?
The unused variable breaks CONFIG_KVM_WERROR=y builds.
arch/x86/kvm/smm.c: In function ‘emulator_leave_smm’:
arch/x86/kvm/smm.c:567:33: error: unused variable ‘efer’ [-Werror=unused-variable]
567 | unsigned long cr0, cr4, efer;
| ^~~~
arch/x86/kvm/smm.c:567:28: error: unused variable ‘cr4’ [-Werror=unused-variable]
567 | unsigned long cr0, cr4, efer;
|
[*] https://lore.kernel.org/all/[email protected]