2015-04-30 11:36:33

by Paolo Bonzini

Subject: [RFC PATCH 00/13] KVM: x86: SMM support

This patch series introduces system management mode support.
There is still some work to do, namely:

- test without unrestricted guest support
- test on AMD
- disable the capability if !unrestricted guest and !emulate invalid
  guest state(*)
- test with a QEMU that understands KVM_MEM_X86_SMRAM
- actually post the QEMU patches that let you use this

(*) newer chipsets moved away from legacy SMRAM at 0xa0000; thus,
support for a real-mode CS base above 1 MB is necessary

Because legacy SMRAM is a mess, I have tried these patches with Q35's
high SMRAM (at 0xfeda0000). This means that right now this isn't
the easiest thing to test; you need QEMU patches that add support
for high SMRAM, and SeaBIOS patches to use high SMRAM. Also, until
QEMU support for KVM_MEM_X86_SMRAM is in place, I'm keeping SMRAM open
in SeaBIOS.

That said, even this clumsy and incomplete userspace configuration is
enough to test all patches except 11 and 12.

The series is structured as follows.

Patch 1 is an unrelated bugfix (I think). Patches 2 to 6 extend some
infrastructure functions. Patches 1 to 4 could be committed right now.

Patches 7 to 9 implement basic support for SMM in the KVM API
and teach KVM about doing the world switch on SMI and RSM.

Patch 10 changes all places in KVM that read/write guest memory to
go through an x86-specific function. The x86-specific function takes
a VCPU rather than a struct kvm. This is used in patches 11 and 12
to limit access to specially marked SMRAM slots unless the VCPU is
in system management mode.
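
As a rough illustration of the shape of that wrapper (a sketch only,
not the exact code in patches 10-12; x86_gfn_to_memslot is a
placeholder name):

    int x86_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
                            void *data, int offset, int len)
    {
            /* Resolve the slot through a vcpu-aware helper, so that
             * x86 can refuse or redirect accesses to SMRAM-flagged
             * slots when the VCPU is not in SMM. */
            struct kvm_memory_slot *slot = x86_gfn_to_memslot(vcpu, gfn);
            unsigned long addr = gfn_to_hva_memslot(slot, gfn);

            if (kvm_is_error_hva(addr))
                    return -EFAULT;
            if (__copy_from_user(data, (void __user *)addr + offset, len))
                    return -EFAULT;
            return 0;
    }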

Finally, patch 13 exposes the new capability for userspace to probe.

Paolo

Paolo Bonzini (12):
KVM: MMU: fix for CR4.SMEP=1, CR0.WP=0?
KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async
KVM: remove unnecessary arg from mark_page_dirty_in_slot, export it
KVM: x86: pass host_initiated to functions that read MSRs
KVM: x86: pass the whole hflags field to emulator and back
KVM: x86: API changes for SMM support
KVM: x86: stubs for SMM support
KVM: x86: save/load state on SMM switch
KVM: x86: add vcpu-specific functions to read/write/translate GFNs
KVM: x86: add SMM to the MMU role
KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag
KVM: x86: advertise KVM_CAP_X86_SMM

Radim Krčmář (1):
KVM: reuse memslot in kvm_write_guest_page

Documentation/virtual/kvm/api.txt | 52 ++++-
arch/x86/include/asm/kvm_emulate.h | 7 +-
arch/x86/include/asm/kvm_host.h | 39 ++--
arch/x86/include/asm/vmx.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 10 +
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/cpuid.h | 8 +
arch/x86/kvm/emulate.c | 257 ++++++++++++++++++++-
arch/x86/kvm/kvm_cache_regs.h | 5 +
arch/x86/kvm/lapic.c | 26 ++-
arch/x86/kvm/mmu.c | 20 +-
arch/x86/kvm/paging_tmpl.h | 8 +-
arch/x86/kvm/smram.c | 229 +++++++++++++++++++
arch/x86/kvm/svm.c | 63 +++---
arch/x86/kvm/vmx.c | 74 +++---
arch/x86/kvm/x86.c | 452 ++++++++++++++++++++++++++++++-------
include/linux/kvm_host.h | 20 +-
include/uapi/linux/kvm.h | 5 +-
virt/kvm/kvm_main.c | 48 ++--
19 files changed, 1095 insertions(+), 231 deletions(-)
create mode 100644 arch/x86/kvm/smram.c

--
1.8.3.1


2015-04-30 11:36:38

by Paolo Bonzini

Subject: [PATCH 01/13] KVM: MMU: fix for CR4.SMEP=1, CR0.WP=0?

smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages
should not be reused for different values of it. Thus, it has to be
added to the mask in kvm_mmu_pte_write.
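
For context, kvm_mmu_pte_write uses that mask to decide whether a
shadow page's role matches the current base role closely enough for
the written PTE to be updated in place, roughly like this (simplified
sketch, not the literal upstream code):

    union kvm_mmu_page_role mask = { .word = 0 };

    mask.cr0_wp = mask.cr4_pae = mask.nxe = mask.smep_andnot_wp = 1;
    ...
    /* only compare the role bits selected by the mask */
    if (!((sp->role.word ^ vcpu->arch.mmu.base_role.word) & mask.word))
            /* safe to update the spte for this page in place */;

Without smep_andnot_wp in the mask, a page created for a different
value of that bit could wrongly be treated as compatible.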

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d43867c33bc4..209fe1477465 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4238,7 +4238,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
++vcpu->kvm->stat.mmu_pte_write;
kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);

- mask.cr0_wp = mask.cr4_pae = mask.nxe = 1;
+ mask.cr0_wp = mask.cr4_pae = mask.nxe = mask.smep_andnot_wp = 1;
for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
if (detect_write_misaligned(sp, gpa, bytes) ||
detect_write_flooding(sp)) {
--
1.8.3.1

2015-04-30 11:40:17

by Paolo Bonzini

Subject: [PATCH 02/13] KVM: reuse memslot in kvm_write_guest_page

From: Radim Krčmář <[email protected]>

Saves one O(log N) search.

Signed-off-by: Radim Krčmář <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
virt/kvm/kvm_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 90977418aeb6..b6d415156283 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1590,15 +1590,17 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
int offset, int len)
{
int r;
+ struct kvm_memory_slot *memslot;
unsigned long addr;

- addr = gfn_to_hva(kvm, gfn);
+ memslot = gfn_to_memslot(kvm, gfn);
+ addr = gfn_to_hva_memslot(memslot, gfn);
if (kvm_is_error_hva(addr))
return -EFAULT;
r = __copy_to_user((void __user *)addr + offset, data, len);
if (r)
return -EFAULT;
- mark_page_dirty(kvm, gfn);
+ mark_page_dirty_in_slot(kvm, memslot, gfn);
return 0;
}
EXPORT_SYMBOL_GPL(kvm_write_guest_page);
--
1.8.3.1

2015-04-30 11:39:38

by Paolo Bonzini

Subject: [PATCH 03/13] KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async

gfn_to_pfn_async is used in just one place, and because of x86-specific
treatment that place will need to look at the memory slot. Hence inline
it into try_async_pf and export __gfn_to_pfn_memslot.

The patch also switches the subsequent call to gfn_to_pfn_prot to use
__gfn_to_pfn_memslot. For now this is just a small optimization, but
having a memslot argument will also be useful when implementing SMRAM
(which will need an x86-specific function for gfn-to-memslot conversion).

Finally, remove the now-unused async argument of __gfn_to_pfn.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu.c | 9 +++++----
include/linux/kvm_host.h | 4 ++--
virt/kvm/kvm_main.c | 26 ++++++++------------------
3 files changed, 15 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 209fe1477465..371109546382 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3475,10 +3475,12 @@ static bool can_do_async_pf(struct kvm_vcpu *vcpu)
static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
gva_t gva, pfn_t *pfn, bool write, bool *writable)
{
+ struct kvm_memory_slot *slot;
bool async;

- *pfn = gfn_to_pfn_async(vcpu->kvm, gfn, &async, write, writable);
-
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ async = false;
+ *pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async, write, writable);
if (!async)
return false; /* *pfn has correct page already */

@@ -3492,8 +3494,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
return true;
}

- *pfn = gfn_to_pfn_prot(vcpu->kvm, gfn, write, writable);
-
+ *pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL, write, writable);
return false;
}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad45054309a0..647ad05b05af 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -538,13 +538,13 @@ void kvm_release_page_dirty(struct page *page);
void kvm_set_page_accessed(struct page *page);

pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn);
-pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
- bool write_fault, bool *writable);
pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn);
pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
pfn_t gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
pfn_t gfn_to_pfn_memslot_atomic(struct kvm_memory_slot *slot, gfn_t gfn);
+pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, bool atomic,
+ bool *async, bool write_fault, bool *writable);

void kvm_release_pfn_clean(pfn_t pfn);
void kvm_set_pfn_dirty(pfn_t pfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b6d415156283..3382de0302a0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1355,9 +1355,8 @@ exit:
return pfn;
}

-static pfn_t
-__gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, bool atomic,
- bool *async, bool write_fault, bool *writable)
+pfn_t __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, bool atomic,
+ bool *async, bool write_fault, bool *writable)
{
unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);

@@ -1376,44 +1375,35 @@ __gfn_to_pfn_memslot(struct kvm_memory_slot *slot, gfn_t gfn, bool atomic,
return hva_to_pfn(addr, atomic, async, write_fault,
writable);
}
+EXPORT_SYMBOL_GPL(__gfn_to_pfn_memslot);

-static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic, bool *async,
+static pfn_t __gfn_to_pfn(struct kvm *kvm, gfn_t gfn, bool atomic,
bool write_fault, bool *writable)
{
struct kvm_memory_slot *slot;

- if (async)
- *async = false;
-
slot = gfn_to_memslot(kvm, gfn);

- return __gfn_to_pfn_memslot(slot, gfn, atomic, async, write_fault,
+ return __gfn_to_pfn_memslot(slot, gfn, atomic, NULL, write_fault,
writable);
}

pfn_t gfn_to_pfn_atomic(struct kvm *kvm, gfn_t gfn)
{
- return __gfn_to_pfn(kvm, gfn, true, NULL, true, NULL);
+ return __gfn_to_pfn(kvm, gfn, true, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_atomic);

-pfn_t gfn_to_pfn_async(struct kvm *kvm, gfn_t gfn, bool *async,
- bool write_fault, bool *writable)
-{
- return __gfn_to_pfn(kvm, gfn, false, async, write_fault, writable);
-}
-EXPORT_SYMBOL_GPL(gfn_to_pfn_async);
-
pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn)
{
- return __gfn_to_pfn(kvm, gfn, false, NULL, true, NULL);
+ return __gfn_to_pfn(kvm, gfn, false, true, NULL);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn);

pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable)
{
- return __gfn_to_pfn(kvm, gfn, false, NULL, write_fault, writable);
+ return __gfn_to_pfn(kvm, gfn, false, write_fault, writable);
}
EXPORT_SYMBOL_GPL(gfn_to_pfn_prot);

--
1.8.3.1

2015-04-30 11:36:45

by Paolo Bonzini

Subject: [PATCH 04/13] KVM: remove unnecessary arg from mark_page_dirty_in_slot, export it

Reviewed-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 14 ++++++--------
2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 647ad05b05af..1edb63dbdead 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -572,6 +572,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot, gfn_t gfn);

void kvm_vcpu_block(struct kvm_vcpu *vcpu);
void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 3382de0302a0..e82c46b9860f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -103,8 +103,6 @@ static void hardware_disable_all(void);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);

static void kvm_release_pfn_dirty(pfn_t pfn);
-static void mark_page_dirty_in_slot(struct kvm *kvm,
- struct kvm_memory_slot *memslot, gfn_t gfn);

__visible bool kvm_rebooting;
EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -1590,7 +1588,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
r = __copy_to_user((void __user *)addr + offset, data, len);
if (r)
return -EFAULT;
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ mark_page_dirty_in_slot(memslot, gfn);
return 0;
}
EXPORT_SYMBOL_GPL(kvm_write_guest_page);
@@ -1673,7 +1671,7 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
r = __copy_to_user((void __user *)ghc->hva, data, len);
if (r)
return -EFAULT;
- mark_page_dirty_in_slot(kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+ mark_page_dirty_in_slot(ghc->memslot, ghc->gpa >> PAGE_SHIFT);

return 0;
}
@@ -1731,9 +1729,8 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
}
EXPORT_SYMBOL_GPL(kvm_clear_guest);

-static void mark_page_dirty_in_slot(struct kvm *kvm,
- struct kvm_memory_slot *memslot,
- gfn_t gfn)
+void mark_page_dirty_in_slot(struct kvm_memory_slot *memslot,
+ gfn_t gfn)
{
if (memslot && memslot->dirty_bitmap) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
@@ -1741,13 +1738,14 @@ static void mark_page_dirty_in_slot(struct kvm *kvm,
set_bit_le(rel_gfn, memslot->dirty_bitmap);
}
}
+EXPORT_SYMBOL_GPL(mark_page_dirty_in_slot);

void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
{
struct kvm_memory_slot *memslot;

memslot = gfn_to_memslot(kvm, gfn);
- mark_page_dirty_in_slot(kvm, memslot, gfn);
+ mark_page_dirty_in_slot(memslot, gfn);
}
EXPORT_SYMBOL_GPL(mark_page_dirty);

--
1.8.3.1

2015-04-30 11:39:33

by Paolo Bonzini

Subject: [PATCH 05/13] KVM: x86: pass host_initiated to functions that read MSRs

SMBASE is only readable from SMM for the VCPU, but it must always be
accessible if userspace is accessing it. Thus, all functions that
read MSRs are changed to accept a struct msr_data; the host_initiated
and index fields are pre-initialized by the caller, while the data
field is filled in on return.
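
The resulting calling convention on the read side looks roughly like
this (illustrative sketch):

    struct msr_data msr;

    msr.index = MSR_EFER;           /* which MSR to read */
    msr.host_initiated = false;     /* guest-initiated access */
    if (kvm_get_msr(vcpu, &msr))
            kvm_inject_gp(vcpu, 0);
    else
            data = msr.data;        /* filled in by the callee */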

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 +--
arch/x86/kvm/svm.c | 54 ++++++++++----------
arch/x86/kvm/vmx.c | 62 ++++++++++++-----------
arch/x86/kvm/x86.c | 106 ++++++++++++++++++++++++----------------
4 files changed, 127 insertions(+), 101 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dea2e7e962e3..b0a97b4dbde4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,7 +716,7 @@ struct kvm_x86_ops {
void (*vcpu_put)(struct kvm_vcpu *vcpu);

void (*update_db_bp_intercept)(struct kvm_vcpu *vcpu);
- int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata);
+ int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
u64 (*get_segment_base)(struct kvm_vcpu *vcpu, int seg);
void (*get_segment)(struct kvm_vcpu *vcpu,
@@ -935,7 +935,7 @@ static inline int emulate_instruction(struct kvm_vcpu *vcpu,

void kvm_enable_efer_bits(u64);
bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *data);
+int kvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
int kvm_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);

struct x86_emulate_ctxt;
@@ -964,7 +964,7 @@ void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw);
void kvm_get_cs_db_l_bits(struct kvm_vcpu *vcpu, int *db, int *l);
int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr);

-int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata);
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);
int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr);

unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ce741b8650f6..f2cc241c71c4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3067,42 +3067,42 @@ static u64 svm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
svm_scale_tsc(vcpu, host_tsc);
}

-static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
+static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
struct vcpu_svm *svm = to_svm(vcpu);

- switch (ecx) {
+ switch (msr_info->index) {
case MSR_IA32_TSC: {
- *data = svm->vmcb->control.tsc_offset +
+ msr_info->data = svm->vmcb->control.tsc_offset +
svm_scale_tsc(vcpu, native_read_tsc());

break;
}
case MSR_STAR:
- *data = svm->vmcb->save.star;
+ msr_info->data = svm->vmcb->save.star;
break;
#ifdef CONFIG_X86_64
case MSR_LSTAR:
- *data = svm->vmcb->save.lstar;
+ msr_info->data = svm->vmcb->save.lstar;
break;
case MSR_CSTAR:
- *data = svm->vmcb->save.cstar;
+ msr_info->data = svm->vmcb->save.cstar;
break;
case MSR_KERNEL_GS_BASE:
- *data = svm->vmcb->save.kernel_gs_base;
+ msr_info->data = svm->vmcb->save.kernel_gs_base;
break;
case MSR_SYSCALL_MASK:
- *data = svm->vmcb->save.sfmask;
+ msr_info->data = svm->vmcb->save.sfmask;
break;
#endif
case MSR_IA32_SYSENTER_CS:
- *data = svm->vmcb->save.sysenter_cs;
+ msr_info->data = svm->vmcb->save.sysenter_cs;
break;
case MSR_IA32_SYSENTER_EIP:
- *data = svm->sysenter_eip;
+ msr_info->data = svm->sysenter_eip;
break;
case MSR_IA32_SYSENTER_ESP:
- *data = svm->sysenter_esp;
+ msr_info->data = svm->sysenter_esp;
break;
/*
* Nobody will change the following 5 values in the VMCB so we can
@@ -3110,31 +3110,31 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
* implemented.
*/
case MSR_IA32_DEBUGCTLMSR:
- *data = svm->vmcb->save.dbgctl;
+ msr_info->data = svm->vmcb->save.dbgctl;
break;
case MSR_IA32_LASTBRANCHFROMIP:
- *data = svm->vmcb->save.br_from;
+ msr_info->data = svm->vmcb->save.br_from;
break;
case MSR_IA32_LASTBRANCHTOIP:
- *data = svm->vmcb->save.br_to;
+ msr_info->data = svm->vmcb->save.br_to;
break;
case MSR_IA32_LASTINTFROMIP:
- *data = svm->vmcb->save.last_excp_from;
+ msr_info->data = svm->vmcb->save.last_excp_from;
break;
case MSR_IA32_LASTINTTOIP:
- *data = svm->vmcb->save.last_excp_to;
+ msr_info->data = svm->vmcb->save.last_excp_to;
break;
case MSR_VM_HSAVE_PA:
- *data = svm->nested.hsave_msr;
+ msr_info->data = svm->nested.hsave_msr;
break;
case MSR_VM_CR:
- *data = svm->nested.vm_cr_msr;
+ msr_info->data = svm->nested.vm_cr_msr;
break;
case MSR_IA32_UCODE_REV:
- *data = 0x01000065;
+ msr_info->data = 0x01000065;
break;
default:
- return kvm_get_msr_common(vcpu, ecx, data);
+ return kvm_get_msr_common(vcpu, msr_info);
}
return 0;
}
@@ -3142,16 +3142,20 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data)
static int rdmsr_interception(struct vcpu_svm *svm)
{
u32 ecx = kvm_register_read(&svm->vcpu, VCPU_REGS_RCX);
- u64 data;
+ struct msr_data msr_info;

- if (svm_get_msr(&svm->vcpu, ecx, &data)) {
+ msr_info.index = ecx;
+ msr_info.host_initiated = false;
+ if (svm_get_msr(&svm->vcpu, &msr_info)) {
trace_kvm_msr_read_ex(ecx);
kvm_inject_gp(&svm->vcpu, 0);
} else {
- trace_kvm_msr_read(ecx, data);
+ trace_kvm_msr_read(ecx, msr_info.data);

- kvm_register_write(&svm->vcpu, VCPU_REGS_RAX, data & 0xffffffff);
- kvm_register_write(&svm->vcpu, VCPU_REGS_RDX, data >> 32);
+ kvm_register_write(&svm->vcpu, VCPU_REGS_RAX,
+ msr_info.data & 0xffffffff);
+ kvm_register_write(&svm->vcpu, VCPU_REGS_RDX,
+ msr_info.data >> 32);
svm->next_rip = kvm_rip_read(&svm->vcpu) + 2;
skip_emulated_instruction(&svm->vcpu);
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 509470a4c49e..4e14dfa83ce0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2623,76 +2623,69 @@ static int vmx_get_vmx_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
* Returns 0 on success, non-0 otherwise.
* Assumes vcpu_load() was already called.
*/
-static int vmx_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
- u64 data;
struct shared_msr_entry *msr;

- if (!pdata) {
- printk(KERN_ERR "BUG: get_msr called with NULL pdata\n");
- return -EINVAL;
- }
-
- switch (msr_index) {
+ switch (msr_info->index) {
#ifdef CONFIG_X86_64
case MSR_FS_BASE:
- data = vmcs_readl(GUEST_FS_BASE);
+ msr_info->data = vmcs_readl(GUEST_FS_BASE);
break;
case MSR_GS_BASE:
- data = vmcs_readl(GUEST_GS_BASE);
+ msr_info->data = vmcs_readl(GUEST_GS_BASE);
break;
case MSR_KERNEL_GS_BASE:
vmx_load_host_state(to_vmx(vcpu));
- data = to_vmx(vcpu)->msr_guest_kernel_gs_base;
+ msr_info->data = to_vmx(vcpu)->msr_guest_kernel_gs_base;
break;
#endif
case MSR_EFER:
- return kvm_get_msr_common(vcpu, msr_index, pdata);
+ return kvm_get_msr_common(vcpu, msr_info);
case MSR_IA32_TSC:
- data = guest_read_tsc();
+ msr_info->data = guest_read_tsc();
break;
case MSR_IA32_SYSENTER_CS:
- data = vmcs_read32(GUEST_SYSENTER_CS);
+ msr_info->data = vmcs_read32(GUEST_SYSENTER_CS);
break;
case MSR_IA32_SYSENTER_EIP:
- data = vmcs_readl(GUEST_SYSENTER_EIP);
+ msr_info->data = vmcs_readl(GUEST_SYSENTER_EIP);
break;
case MSR_IA32_SYSENTER_ESP:
- data = vmcs_readl(GUEST_SYSENTER_ESP);
+ msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP);
break;
case MSR_IA32_BNDCFGS:
if (!vmx_mpx_supported())
return 1;
- data = vmcs_read64(GUEST_BNDCFGS);
+ msr_info->data = vmcs_read64(GUEST_BNDCFGS);
break;
case MSR_IA32_FEATURE_CONTROL:
if (!nested_vmx_allowed(vcpu))
return 1;
- data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
+ msr_info->data = to_vmx(vcpu)->nested.msr_ia32_feature_control;
break;
case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC:
if (!nested_vmx_allowed(vcpu))
return 1;
- return vmx_get_vmx_msr(vcpu, msr_index, pdata);
+ return vmx_get_vmx_msr(vcpu, msr_info->index, &msr_info->data);
case MSR_IA32_XSS:
if (!vmx_xsaves_supported())
return 1;
- data = vcpu->arch.ia32_xss;
+ msr_info->data = vcpu->arch.ia32_xss;
break;
case MSR_TSC_AUX:
if (!to_vmx(vcpu)->rdtscp_enabled)
return 1;
/* Otherwise falls through */
default:
- msr = find_msr_entry(to_vmx(vcpu), msr_index);
+ msr = find_msr_entry(to_vmx(vcpu), msr_info->index);
if (msr) {
- data = msr->data;
+ msr_info->data = msr->data;
break;
}
- return kvm_get_msr_common(vcpu, msr_index, pdata);
+ return kvm_get_msr_common(vcpu, msr_info);
}

- *pdata = data;
return 0;
}

@@ -5475,19 +5468,21 @@ static int handle_cpuid(struct kvm_vcpu *vcpu)
static int handle_rdmsr(struct kvm_vcpu *vcpu)
{
u32 ecx = vcpu->arch.regs[VCPU_REGS_RCX];
- u64 data;
+ struct msr_data msr_info;

- if (vmx_get_msr(vcpu, ecx, &data)) {
+ msr_info.index = ecx;
+ msr_info.host_initiated = false;
+ if (vmx_get_msr(vcpu, &msr_info)) {
trace_kvm_msr_read_ex(ecx);
kvm_inject_gp(vcpu, 0);
return 1;
}

- trace_kvm_msr_read(ecx, data);
+ trace_kvm_msr_read(ecx, msr_info.data);

/* FIXME: handling of bits 32:63 of rax, rdx */
- vcpu->arch.regs[VCPU_REGS_RAX] = data & -1u;
- vcpu->arch.regs[VCPU_REGS_RDX] = (data >> 32) & -1u;
+ vcpu->arch.regs[VCPU_REGS_RAX] = msr_info.data & -1u;
+ vcpu->arch.regs[VCPU_REGS_RDX] = (msr_info.data >> 32) & -1u;
skip_emulated_instruction(vcpu);
return 1;
}
@@ -9152,6 +9147,7 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
struct vmx_msr_entry e;

for (i = 0; i < count; i++) {
+ struct msr_data msr_info;
if (kvm_read_guest(vcpu->kvm,
gpa + i * sizeof(e),
&e, 2 * sizeof(u32))) {
@@ -9166,7 +9162,9 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
__func__, i, e.index, e.reserved);
return -EINVAL;
}
- if (kvm_get_msr(vcpu, e.index, &e.value)) {
+ msr_info.host_initiated = false;
+ msr_info.index = e.index;
+ if (kvm_get_msr(vcpu, &msr_info)) {
pr_warn_ratelimited(
"%s cannot read MSR (%u, 0x%x)\n",
__func__, i, e.index);
@@ -9175,10 +9173,10 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
if (kvm_write_guest(vcpu->kvm,
gpa + i * sizeof(e) +
offsetof(struct vmx_msr_entry, value),
- &e.value, sizeof(e.value))) {
+ &msr_info.data, sizeof(msr_info.data))) {
pr_warn_ratelimited(
"%s cannot write MSR (%u, 0x%x, 0x%llx)\n",
- __func__, i, e.index, e.value);
+ __func__, i, e.index, msr_info.data);
return -EINVAL;
}
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c73efcd03e29..856598afa6b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1048,6 +1048,21 @@ EXPORT_SYMBOL_GPL(kvm_set_msr);
/*
* Adapt set_msr() to msr_io()'s calling convention
*/
+static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
+{
+ struct msr_data msr;
+ int r;
+
+ msr.index = index;
+ msr.host_initiated = true;
+ r = kvm_get_msr(vcpu, &msr);
+ if (r)
+ return r;
+
+ *data = msr.data;
+ return 0;
+}
+
static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
{
struct msr_data msr;
@@ -2381,9 +2396,9 @@ EXPORT_SYMBOL_GPL(kvm_set_msr_common);
* Returns 0 on success, non-0 otherwise.
* Assumes vcpu_load() was already called.
*/
-int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
+int kvm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
{
- return kvm_x86_ops->get_msr(vcpu, msr_index, pdata);
+ return kvm_x86_ops->get_msr(vcpu, msr);
}
EXPORT_SYMBOL_GPL(kvm_get_msr);

@@ -2520,11 +2535,11 @@ static int get_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
return 0;
}

-int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
+int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
u64 data;

- switch (msr) {
+ switch (msr_info->index) {
case MSR_IA32_PLATFORM_ID:
case MSR_IA32_EBL_CR_POWERON:
case MSR_IA32_DEBUGCTLMSR:
@@ -2547,26 +2562,26 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_AMD64_NB_CFG:
case MSR_FAM10H_MMIO_CONF_BASE:
case MSR_AMD64_BU_CFG2:
- data = 0;
+ msr_info->data = 0;
break;
case MSR_P6_PERFCTR0:
case MSR_P6_PERFCTR1:
case MSR_P6_EVNTSEL0:
case MSR_P6_EVNTSEL1:
- if (kvm_pmu_msr(vcpu, msr))
- return kvm_pmu_get_msr(vcpu, msr, pdata);
- data = 0;
+ if (kvm_pmu_msr(vcpu, msr_info->index))
+ return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
+ msr_info->data = 0;
break;
case MSR_IA32_UCODE_REV:
- data = 0x100000000ULL;
+ msr_info->data = 0x100000000ULL;
break;
case MSR_MTRRcap:
- data = 0x500 | KVM_NR_VAR_MTRR;
+ msr_info->data = 0x500 | KVM_NR_VAR_MTRR;
break;
case 0x200 ... 0x2ff:
- return get_msr_mtrr(vcpu, msr, pdata);
+ return get_msr_mtrr(vcpu, msr_info->index, &msr_info->data);
case 0xcd: /* fsb frequency */
- data = 3;
+ msr_info->data = 3;
break;
/*
* MSR_EBC_FREQUENCY_ID
@@ -2580,48 +2595,48 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
* multiplying by zero otherwise.
*/
case MSR_EBC_FREQUENCY_ID:
- data = 1 << 24;
+ msr_info->data = 1 << 24;
break;
case MSR_IA32_APICBASE:
- data = kvm_get_apic_base(vcpu);
+ msr_info->data = kvm_get_apic_base(vcpu);
break;
case APIC_BASE_MSR ... APIC_BASE_MSR + 0x3ff:
- return kvm_x2apic_msr_read(vcpu, msr, pdata);
+ return kvm_x2apic_msr_read(vcpu, msr_info->index, &msr_info->data);
break;
case MSR_IA32_TSCDEADLINE:
- data = kvm_get_lapic_tscdeadline_msr(vcpu);
+ msr_info->data = kvm_get_lapic_tscdeadline_msr(vcpu);
break;
case MSR_IA32_TSC_ADJUST:
- data = (u64)vcpu->arch.ia32_tsc_adjust_msr;
+ msr_info->data = (u64)vcpu->arch.ia32_tsc_adjust_msr;
break;
case MSR_IA32_MISC_ENABLE:
- data = vcpu->arch.ia32_misc_enable_msr;
+ msr_info->data = vcpu->arch.ia32_misc_enable_msr;
break;
case MSR_IA32_PERF_STATUS:
/* TSC increment by tick */
- data = 1000ULL;
+ msr_info->data = 1000ULL;
/* CPU multiplier */
data |= (((uint64_t)4ULL) << 40);
break;
case MSR_EFER:
- data = vcpu->arch.efer;
+ msr_info->data = vcpu->arch.efer;
break;
case MSR_KVM_WALL_CLOCK:
case MSR_KVM_WALL_CLOCK_NEW:
- data = vcpu->kvm->arch.wall_clock;
+ msr_info->data = vcpu->kvm->arch.wall_clock;
break;
case MSR_KVM_SYSTEM_TIME:
case MSR_KVM_SYSTEM_TIME_NEW:
- data = vcpu->arch.time;
+ msr_info->data = vcpu->arch.time;
break;
case MSR_KVM_ASYNC_PF_EN:
- data = vcpu->arch.apf.msr_val;
+ msr_info->data = vcpu->arch.apf.msr_val;
break;
case MSR_KVM_STEAL_TIME:
- data = vcpu->arch.st.msr_val;
+ msr_info->data = vcpu->arch.st.msr_val;
break;
case MSR_KVM_PV_EOI_EN:
- data = vcpu->arch.pv_eoi.msr_val;
+ msr_info->data = vcpu->arch.pv_eoi.msr_val;
break;
case MSR_IA32_P5_MC_ADDR:
case MSR_IA32_P5_MC_TYPE:
@@ -2629,7 +2644,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
case MSR_IA32_MCG_CTL:
case MSR_IA32_MCG_STATUS:
case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
- return get_msr_mce(vcpu, msr, pdata);
+ return get_msr_mce(vcpu, msr_info->index, &msr_info->data);
case MSR_K7_CLK_CTL:
/*
* Provide expected ramp-up count for K7. All other
@@ -2640,17 +2655,17 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
* type 6, model 8 and higher from exploding due to
* the rdmsr failing.
*/
- data = 0x20000000;
+ msr_info->data = 0x20000000;
break;
case HV_X64_MSR_GUEST_OS_ID ... HV_X64_MSR_SINT15:
- if (kvm_hv_msr_partition_wide(msr)) {
+ if (kvm_hv_msr_partition_wide(msr_info->index)) {
int r;
mutex_lock(&vcpu->kvm->lock);
- r = get_msr_hyperv_pw(vcpu, msr, pdata);
+ r = get_msr_hyperv_pw(vcpu, msr_info->index, &msr_info->data);
mutex_unlock(&vcpu->kvm->lock);
return r;
} else
- return get_msr_hyperv(vcpu, msr, pdata);
+ return get_msr_hyperv(vcpu, msr_info->index, &msr_info->data);
break;
case MSR_IA32_BBL_CR_CTL3:
/* This legacy MSR exists but isn't fully documented in current
@@ -2663,31 +2678,30 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
* L2 cache control register 3: 64GB range, 256KB size,
* enabled, latency 0x1, configured
*/
- data = 0xbe702111;
+ msr_info->data = 0xbe702111;
break;
case MSR_AMD64_OSVW_ID_LENGTH:
if (!guest_cpuid_has_osvw(vcpu))
return 1;
- data = vcpu->arch.osvw.length;
+ msr_info->data = vcpu->arch.osvw.length;
break;
case MSR_AMD64_OSVW_STATUS:
if (!guest_cpuid_has_osvw(vcpu))
return 1;
- data = vcpu->arch.osvw.status;
+ msr_info->data = vcpu->arch.osvw.status;
break;
default:
- if (kvm_pmu_msr(vcpu, msr))
- return kvm_pmu_get_msr(vcpu, msr, pdata);
+ if (kvm_pmu_msr(vcpu, msr_info->index))
+ return kvm_pmu_get_msr(vcpu, msr_info->index, &msr_info->data);
if (!ignore_msrs) {
- vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr);
+ vcpu_unimpl(vcpu, "unhandled rdmsr: 0x%x\n", msr_info->index);
return 1;
} else {
- vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr);
- data = 0;
+ vcpu_unimpl(vcpu, "ignored rdmsr: 0x%x\n", msr_info->index);
+ msr_info->data = 0;
}
break;
}
- *pdata = data;
return 0;
}
EXPORT_SYMBOL_GPL(kvm_get_msr_common);
@@ -3456,7 +3470,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break;
}
case KVM_GET_MSRS:
- r = msr_io(vcpu, argp, kvm_get_msr, 1);
+ r = msr_io(vcpu, argp, do_get_msr, 1);
break;
case KVM_SET_MSRS:
r = msr_io(vcpu, argp, do_set_msr, 0);
@@ -4948,7 +4962,17 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
u32 msr_index, u64 *pdata)
{
- return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
+ struct msr_data msr;
+ int r;
+
+ msr.index = msr_index;
+ msr.host_initiated = false;
+ r = kvm_get_msr(emul_to_vcpu(ctxt), &msr);
+ if (r)
+ return r;
+
+ *pdata = msr.data;
+ return 0;
}

static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
--
1.8.3.1

2015-04-30 11:39:06

by Paolo Bonzini

Subject: [PATCH 06/13] KVM: x86: pass the whole hflags field to emulator and back

The hflags field will contain information about system management mode
and will be useful for the emulator. Pass the entire field rather than
just the guest-mode information.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_emulate.h | 5 ++++-
arch/x86/kvm/emulate.c | 6 +++---
arch/x86/kvm/x86.c | 4 +++-
3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 57a9d94fe160..7410879a41f7 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -262,6 +262,9 @@ enum x86emul_mode {
X86EMUL_MODE_PROT64, /* 64-bit (long) mode. */
};

+/* These match some of the HF_* flags defined in kvm_host.h */
+#define X86EMUL_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */
+
struct x86_emulate_ctxt {
const struct x86_emulate_ops *ops;

@@ -273,8 +276,8 @@ struct x86_emulate_ctxt {

/* interruptibility state, as a result of execution of STI or MOV SS */
int interruptibility;
+ int emul_flags;

- bool guest_mode; /* guest running a nested guest */
bool perm_ok; /* do not check permissions if true */
bool ud; /* inject an #UD if host doesn't support insn */

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 630bcb0d7a04..cdb612b50910 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4871,7 +4871,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
fetch_possible_mmx_operand(ctxt, &ctxt->dst);
}

- if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
+ if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && ctxt->intercept) {
rc = emulator_check_intercept(ctxt, ctxt->intercept,
X86_ICPT_PRE_EXCEPT);
if (rc != X86EMUL_CONTINUE)
@@ -4900,7 +4900,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
goto done;
}

- if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
+ if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
rc = emulator_check_intercept(ctxt, ctxt->intercept,
X86_ICPT_POST_EXCEPT);
if (rc != X86EMUL_CONTINUE)
@@ -4953,7 +4953,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)

special_insn:

- if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
+ if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
rc = emulator_check_intercept(ctxt, ctxt->intercept,
X86_ICPT_POST_MEMACCESS);
if (rc != X86EMUL_CONTINUE)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 856598afa6b4..6009e6a0e406 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5132,7 +5132,8 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
(cs_l && is_long_mode(vcpu)) ? X86EMUL_MODE_PROT64 :
cs_db ? X86EMUL_MODE_PROT32 :
X86EMUL_MODE_PROT16;
- ctxt->guest_mode = is_guest_mode(vcpu);
+ BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
+ ctxt->emul_flags = vcpu->arch.hflags;

init_decode_cache(ctxt);
vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
@@ -5500,6 +5501,7 @@ restart:
unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
+ vcpu->arch.hflags = ctxt->emul_flags;
kvm_rip_write(vcpu, ctxt->eip);
if (r == EMULATE_DONE)
kvm_vcpu_check_singlestep(vcpu, rflags, &r);
--
1.8.3.1

2015-04-30 11:36:53

by Paolo Bonzini

Subject: [PATCH 07/13] KVM: x86: API changes for SMM support

This patch includes changes to the external API for SMM support.
All the changes are predicated on the availability of a new
capability, KVM_CAP_X86_SMM, which is added at the end of the
patch series.
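
From userspace the new pieces would be used roughly as follows
(illustrative sketch; kvm_fd and vcpu_fd are assumed to be the usual
/dev/kvm and VCPU file descriptors):

    if (ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_X86_SMM) > 0)
            ioctl(vcpu_fd, KVM_SMI, 0);     /* queue an SMI on this VCPU */

    /* after KVM_RUN returns, run->flags & KVM_RUN_X86_SMM reports
     * whether the VCPU is currently in system management mode */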

Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virtual/kvm/api.txt | 34 ++++++++++++++++++++++++++++++----
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm.h | 7 +++++++
arch/x86/kvm/kvm_cache_regs.h | 5 +++++
arch/x86/kvm/x86.c | 30 +++++++++++++++++++++++++-----
include/uapi/linux/kvm.h | 5 ++++-
6 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 9fa2bf8c3f6f..c1afcb2e4b89 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -820,11 +820,19 @@ struct kvm_vcpu_events {
} nmi;
__u32 sipi_vector;
__u32 flags;
+ struct {
+ __u8 smm;
+ __u8 pad[3];
+ } smi;
};

-KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
-interrupt.shadow contains a valid state. Otherwise, this field is undefined.
+Only two fields are defined in the flags field:
+
+- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
+ interrupt.shadow contains a valid state.

+- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
+ smi.smm contains a valid state.

4.32 KVM_SET_VCPU_EVENTS

@@ -841,17 +849,20 @@ vcpu.
See KVM_GET_VCPU_EVENTS for the data structure.

Fields that may be modified asynchronously by running VCPUs can be excluded
-from the update. These fields are nmi.pending and sipi_vector. Keep the
+from the update. These fields are nmi.pending, sipi_vector, smi.smm. Keep the
corresponding bits in the flags field cleared to suppress overwriting the
current in-kernel state. The bits are:

KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
+KVM_VCPUEVENT_VALID_SMM - transfer smi.smm

If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
the flags field to signal that interrupt.shadow contains a valid state and
shall be written into the VCPU.

+KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
+

4.33 KVM_GET_DEBUGREGS

@@ -2978,6 +2989,16 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
which is the maximum number of possibly pending cpu-local interrupts.

+4.90 KVM_SMI
+
+Capability: KVM_CAP_X86_SMM
+Architectures: x86
+Type: vcpu ioctl
+Parameters: none
+Returns: 0 on success, -1 on error
+
+Queues an SMI on the thread's vcpu.
+
5. The kvm_run structure
------------------------

@@ -3013,7 +3034,12 @@ an interrupt can be injected now with KVM_INTERRUPT.
The value of the current interrupt flag. Only valid if in-kernel
local APIC is not used.

- __u8 padding2[2];
+ __u16 flags;
+
+More architecture-specific flags, detailing state of the VCPU that may
+affect the device's behavior. The only currently defined flag is
+KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
+VCPU is in system management mode.

/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b0a97b4dbde4..1171e346bb33 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1109,6 +1109,7 @@ enum {
#define HF_NMI_MASK (1 << 3)
#define HF_IRET_MASK (1 << 4)
#define HF_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */
+#define HF_SMM_MASK (1 << 6)

/*
* Hardware virtualization extension instructions may fault if a
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d7dcef58aefa..0bb09e2e549e 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -106,6 +106,8 @@ struct kvm_ioapic_state {
#define KVM_IRQCHIP_IOAPIC 2
#define KVM_NR_IRQCHIPS 3

+#define KVM_RUN_X86_SMM (1 << 0)
+
/* for KVM_GET_REGS and KVM_SET_REGS */
struct kvm_regs {
/* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
@@ -281,6 +283,7 @@ struct kvm_reinject_control {
#define KVM_VCPUEVENT_VALID_NMI_PENDING 0x00000001
#define KVM_VCPUEVENT_VALID_SIPI_VECTOR 0x00000002
#define KVM_VCPUEVENT_VALID_SHADOW 0x00000004
+#define KVM_VCPUEVENT_VALID_SMM 0x00000008

/* Interrupt shadow states */
#define KVM_X86_SHADOW_INT_MOV_SS 0x01
@@ -309,6 +312,10 @@ struct kvm_vcpu_events {
} nmi;
__u32 sipi_vector;
__u32 flags;
+ struct {
+ __u8 smm;
+ __u8 pad[3];
+ } smi;
__u32 reserved[10];
};

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 544076c4f44b..e1e89ee4af75 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -99,4 +99,9 @@ static inline bool is_guest_mode(struct kvm_vcpu *vcpu)
return vcpu->arch.hflags & HF_GUEST_MASK;
}

+static inline bool is_smm(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.hflags & HF_SMM_MASK;
+}
+
#endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6009e6a0e406..c088b057aad5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3032,6 +3032,11 @@ static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
return 0;
}

+static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu)
+{
+ return 0;
+}
+
static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu,
struct kvm_tpr_access_ctl *tac)
{
@@ -3116,29 +3121,31 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
struct kvm_vcpu_events *events)
{
process_nmi(vcpu);
+
+ memset(events, 0, sizeof(*events));
events->exception.injected =
vcpu->arch.exception.pending &&
!kvm_exception_is_soft(vcpu->arch.exception.nr);
events->exception.nr = vcpu->arch.exception.nr;
events->exception.has_error_code = vcpu->arch.exception.has_error_code;
- events->exception.pad = 0;
events->exception.error_code = vcpu->arch.exception.error_code;

events->interrupt.injected =
vcpu->arch.interrupt.pending && !vcpu->arch.interrupt.soft;
events->interrupt.nr = vcpu->arch.interrupt.nr;
- events->interrupt.soft = 0;
events->interrupt.shadow = kvm_x86_ops->get_interrupt_shadow(vcpu);

events->nmi.injected = vcpu->arch.nmi_injected;
events->nmi.pending = vcpu->arch.nmi_pending != 0;
events->nmi.masked = kvm_x86_ops->get_nmi_mask(vcpu);
- events->nmi.pad = 0;

events->sipi_vector = 0; /* never valid when reporting to user space */

+ events->smi.smm = is_smm(vcpu);
+
events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
- | KVM_VCPUEVENT_VALID_SHADOW);
+ | KVM_VCPUEVENT_VALID_SHADOW
+ | KVM_VCPUEVENT_VALID_SMM);
memset(&events->reserved, 0, sizeof(events->reserved));
}

@@ -3147,7 +3154,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
{
if (events->flags & ~(KVM_VCPUEVENT_VALID_NMI_PENDING
| KVM_VCPUEVENT_VALID_SIPI_VECTOR
- | KVM_VCPUEVENT_VALID_SHADOW))
+ | KVM_VCPUEVENT_VALID_SHADOW
+ | KVM_VCPUEVENT_VALID_SMM))
return -EINVAL;

process_nmi(vcpu);
@@ -3172,6 +3180,13 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
kvm_vcpu_has_lapic(vcpu))
vcpu->arch.apic->sipi_vector = events->sipi_vector;

+ if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
+ if (events->smi.smm)
+ vcpu->arch.hflags |= HF_SMM_MASK;
+ else
+ vcpu->arch.hflags &= ~HF_SMM_MASK;
+ }
+
kvm_make_request(KVM_REQ_EVENT, vcpu);

return 0;
@@ -3431,6 +3446,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_nmi(vcpu);
break;
}
+ case KVM_SMI: {
+ r = kvm_vcpu_ioctl_smi(vcpu);
+ break;
+ }
case KVM_SET_CPUID: {
struct kvm_cpuid __user *cpuid_arg = argp;
struct kvm_cpuid cpuid;
@@ -6067,6 +6086,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
struct kvm_run *kvm_run = vcpu->run;

kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
+ kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;
kvm_run->cr8 = kvm_get_cr8(vcpu);
kvm_run->apic_base = kvm_get_apic_base(vcpu);
if (irqchip_in_kernel(vcpu->kvm))
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4b60056776d1..cd51f0289e9f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -202,7 +202,7 @@ struct kvm_run {
__u32 exit_reason;
__u8 ready_for_interrupt_injection;
__u8 if_flag;
- __u8 padding2[2];
+ __u16 flags;

/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
@@ -814,6 +814,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_S390_INJECT_IRQ 113
#define KVM_CAP_S390_IRQ_STATE 114
#define KVM_CAP_PPC_HWRNG 115
+#define KVM_CAP_X86_SMM 120

#ifdef KVM_CAP_IRQ_ROUTING

@@ -1199,6 +1200,8 @@ struct kvm_s390_ucas_mapping {
/* Available with KVM_CAP_S390_IRQ_STATE */
#define KVM_S390_SET_IRQ_STATE _IOW(KVMIO, 0xb5, struct kvm_s390_irq_state)
#define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
+/* Available with KVM_CAP_X86_SMM */
+#define KVM_SMI _IO(KVMIO, 0xb7)

#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
--
1.8.3.1

2015-04-30 11:38:43

by Paolo Bonzini

Subject: [PATCH 08/13] KVM: x86: stubs for SMM support

This patch adds the interface between x86.c and the emulator: the
SMBASE register, a new emulator flag, the RSM instruction. It also
adds a new request bit that will be used by the KVM_SMI ioctl.
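
With these stubs in place, userspace can already save and restore
SMBASE through the MSR ioctls; a host-initiated access works even when
the VCPU is outside SMM. Roughly (illustrative sketch, vcpu_fd being
the VCPU file descriptor):

    struct {
            struct kvm_msrs hdr;
            struct kvm_msr_entry entry;
    } msrs = {
            .hdr.nmsrs   = 1,
            .entry.index = MSR_IA32_SMBASE,
    };

    ioctl(vcpu_fd, KVM_GET_MSRS, &msrs);    /* msrs.entry.data = SMBASE */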

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_emulate.h | 2 ++
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/vmx.h | 1 +
arch/x86/kvm/emulate.c | 10 +++++++++-
arch/x86/kvm/lapic.c | 4 +++-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 34 ++++++++++++++++++++++++++++++++++
include/linux/kvm_host.h | 1 +
9 files changed, 53 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 7410879a41f7..efe9b9565914 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -193,6 +193,7 @@ struct x86_emulate_ops {
int (*cpl)(struct x86_emulate_ctxt *ctxt);
int (*get_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong *dest);
int (*set_dr)(struct x86_emulate_ctxt *ctxt, int dr, ulong value);
+ void (*set_smbase)(struct x86_emulate_ctxt *ctxt, u64 smbase);
int (*set_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 data);
int (*get_msr)(struct x86_emulate_ctxt *ctxt, u32 msr_index, u64 *pdata);
int (*check_pmc)(struct x86_emulate_ctxt *ctxt, u32 pmc);
@@ -264,6 +265,7 @@ enum x86emul_mode {

/* These match some of the HF_* flags defined in kvm_host.h */
#define X86EMUL_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */
+#define X86EMUL_SMM_MASK (1 << 6)

struct x86_emulate_ctxt {
const struct x86_emulate_ops *ops;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1171e346bb33..f0db43e80d09 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -367,6 +367,7 @@ struct kvm_vcpu_arch {
int32_t apic_arb_prio;
int mp_state;
u64 ia32_misc_enable_msr;
+ u64 smbase;
bool tpr_access_reporting;
u64 ia32_xss;

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index da772edd19ab..e8f998114df0 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -108,6 +108,7 @@
#define VMX_MISC_PREEMPTION_TIMER_RATE_MASK 0x0000001f
#define VMX_MISC_SAVE_EFER_LMA 0x00000020
#define VMX_MISC_ACTIVITY_HLT 0x00000040
+#define VMX_MISC_IA32_SMBASE_MSR 0x00008000

/* VMCS Encodings */
enum vmcs_field {
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index cdb612b50910..f6b641207416 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2262,6 +2262,14 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
return rc;
}

+static int em_rsm(struct x86_emulate_ctxt *ctxt)
+{
+ if ((ctxt->emul_flags & X86EMUL_SMM_MASK) == 0)
+ return emulate_ud(ctxt);
+
+ return X86EMUL_UNHANDLEABLE;
+}
+
static void
setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
struct desc_struct *cs, struct desc_struct *ss)
@@ -4173,7 +4181,7 @@ static const struct opcode twobyte_table[256] = {
F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
/* 0xA8 - 0xAF */
I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
- DI(ImplicitOps, rsm),
+ II(No64 | EmulateOnUD | ImplicitOps, em_rsm, rsm),
F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 629af0f1c5c4..0b2e45e3dc4c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -799,7 +799,9 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
break;

case APIC_DM_SMI:
- apic_debug("Ignoring guest SMI\n");
+ result = 1;
+ kvm_make_request(KVM_REQ_SMI, vcpu);
+ kvm_vcpu_kick(vcpu);
break;

case APIC_DM_NMI:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f2cc241c71c4..0cbb49b8d555 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3392,6 +3392,7 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_MWAIT] = mwait_interception,
[SVM_EXIT_XSETBV] = xsetbv_interception,
[SVM_EXIT_NPF] = pf_interception,
+ [SVM_EXIT_RSM] = emulate_on_interception,
};

static void dump_vmcb(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4e14dfa83ce0..6d296fcb68f4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2505,7 +2505,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
vmx->nested.nested_vmx_misc_low &= VMX_MISC_SAVE_EFER_LMA;
vmx->nested.nested_vmx_misc_low |=
VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
- VMX_MISC_ACTIVITY_HLT;
+ VMX_MISC_ACTIVITY_HLT | VMX_MISC_IA32_SMBASE_MSR;
vmx->nested.nested_vmx_misc_high = 0;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c088b057aad5..4cd7a2a18e93 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -953,6 +953,7 @@ static const u32 emulated_msrs[] = {
MSR_IA32_MISC_ENABLE,
MSR_IA32_MCG_STATUS,
MSR_IA32_MCG_CTL,
+ MSR_IA32_SMBASE,
};

bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
@@ -2217,6 +2218,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_MISC_ENABLE:
vcpu->arch.ia32_misc_enable_msr = data;
break;
+ case MSR_IA32_SMBASE:
+ if (!msr_info->host_initiated)
+ return 1;
+ vcpu->arch.smbase = data;
+ break;
case MSR_KVM_WALL_CLOCK_NEW:
case MSR_KVM_WALL_CLOCK:
vcpu->kvm->arch.wall_clock = data;
@@ -2612,6 +2618,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_IA32_MISC_ENABLE:
msr_info->data = vcpu->arch.ia32_misc_enable_msr;
break;
+ case MSR_IA32_SMBASE:
+ if (!msr_info->host_initiated && !is_smm(vcpu))
+ return 1;
+ msr_info->data = vcpu->arch.smbase;
+ break;
case MSR_IA32_PERF_STATUS:
/* TSC increment by tick */
msr_info->data = 1000ULL;
@@ -3034,6 +3045,8 @@ static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)

static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu)
{
+ kvm_make_request(KVM_REQ_SMI, vcpu);
+
return 0;
}

@@ -5005,6 +5018,13 @@ static int emulator_set_msr(struct x86_emulate_ctxt *ctxt,
return kvm_set_msr(emul_to_vcpu(ctxt), &msr);
}

+static void emulator_set_smbase(struct x86_emulate_ctxt *ctxt, u64 smbase)
+{
+ struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
+
+ vcpu->arch.smbase = smbase;
+}
+
static int emulator_check_pmc(struct x86_emulate_ctxt *ctxt,
u32 pmc)
{
@@ -5090,6 +5110,7 @@ static const struct x86_emulate_ops emulate_ops = {
.cpl = emulator_get_cpl,
.get_dr = emulator_get_dr,
.set_dr = emulator_set_dr,
+ .set_smbase = emulator_set_smbase,
.set_msr = emulator_set_msr,
.get_msr = emulator_get_msr,
.check_pmc = emulator_check_pmc,
@@ -5152,6 +5173,7 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
cs_db ? X86EMUL_MODE_PROT32 :
X86EMUL_MODE_PROT16;
BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
+ BUILD_BUG_ON(HF_SMM_MASK != X86EMUL_SMM_MASK);
ctxt->emul_flags = vcpu->arch.hflags;

init_decode_cache(ctxt);
@@ -6210,6 +6232,14 @@ static void process_nmi(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_EVENT, vcpu);
}

+static void process_smi(struct kvm_vcpu *vcpu)
+{
+ if (is_smm(vcpu))
+ return;
+
+ printk_once(KERN_DEBUG "Ignoring guest SMI\n");
+}
+
static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
{
u64 eoi_exit_bitmap[4];
@@ -6316,6 +6346,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
}
if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
record_steal_time(vcpu);
+ if (kvm_check_request(KVM_REQ_SMI, vcpu))
+ process_smi(vcpu);
if (kvm_check_request(KVM_REQ_NMI, vcpu))
process_nmi(vcpu);
if (kvm_check_request(KVM_REQ_PMU, vcpu))
@@ -7208,6 +7240,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
vcpu->arch.regs_avail = ~0;
vcpu->arch.regs_dirty = ~0;

+ vcpu->arch.smbase = 0x30000;
+
kvm_x86_ops->vcpu_reset(vcpu);
}

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 1edb63dbdead..e7cfee09970a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -134,6 +134,7 @@ static inline bool is_error_page(struct page *page)
#define KVM_REQ_ENABLE_IBS 23
#define KVM_REQ_DISABLE_IBS 24
#define KVM_REQ_APIC_PAGE_RELOAD 25
+#define KVM_REQ_SMI 26

#define KVM_USERSPACE_IRQ_SOURCE_ID 0
#define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
--
1.8.3.1

2015-04-30 11:38:27

by Paolo Bonzini

Subject: [PATCH 09/13] KVM: x86: save/load state on SMM switch

The big ugly one. This patch adds support for switching in and out of
system management mode, respectively upon receiving KVM_REQ_SMI and upon
executing an RSM instruction. Both 32- and 64-bit formats are supported
for the SMM state save area.
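
For orientation, all state-save accesses below use the same addressing
scheme: field offsets (0x7e00-0x7fff, e.g. 0x7ff0 for EIP in the 32-bit
layout and 0x7f78 for RIP in the 64-bit layout) are relative to
SMBASE + 0x8000, so the save map occupies the top of that 32K region.
A trivial sketch of the address computation:

    /* guest-physical address of a state-save field */
    static u64 smstate_field_gpa(u64 smbase, u32 offset)
    {
            return smbase + 0x8000 + offset;
    }

In the patch this shows up as rsm_load_state_*(ctxt, smbase + 0x8000)
plus the per-field offsets passed to get_smstate().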

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/cpuid.h | 8 ++
arch/x86/kvm/emulate.c | 243 ++++++++++++++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/x86.c | 216 ++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 465 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c3b1ad9fca81..02b67616435d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -70,6 +70,14 @@ static inline bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu)
return best && (best->ebx & bit(X86_FEATURE_FSGSBASE));
}

+static inline bool guest_cpuid_has_longmode(struct kvm_vcpu *vcpu)
+{
+ struct kvm_cpuid_entry2 *best;
+
+ best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
+ return best && (best->edx & bit(X86_FEATURE_LM));
+}
+
static inline bool guest_cpuid_has_osvw(struct kvm_vcpu *vcpu)
{
struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f6b641207416..a49606871277 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2262,12 +2262,253 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
return rc;
}

+static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
+{
+ u32 eax, ebx, ecx, edx;
+
+ eax = 0x80000001;
+ ecx = 0;
+ ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+ return edx & bit(X86_FEATURE_LM);
+}
+
+#define get_smstate(type, smbase, offset) \
+ ({ \
+ type __val; \
+ int r = ctxt->ops->read_std(ctxt, smbase + offset, &__val, \
+ sizeof(__val), NULL); \
+ if (r != X86EMUL_CONTINUE) \
+ return X86EMUL_UNHANDLEABLE; \
+ __val; \
+ })
+
+static void rsm_set_desc_flags(struct desc_struct *desc, u16 flags)
+{
+ desc->g = (flags >> 15) & 1;
+ desc->d = (flags >> 14) & 1;
+ desc->l = (flags >> 13) & 1;
+ desc->avl = (flags >> 12) & 1;
+ desc->p = (flags >> 7) & 1;
+ desc->dpl = (flags >> 5) & 3;
+ desc->s = (flags >> 4) & 1;
+ desc->type = flags & 15;
+}
+
+static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
+{
+ struct desc_struct desc;
+ int offset;
+ u16 selector;
+
+ selector = get_smstate(u32, smbase, 0x7fa8 + n * 4);
+
+ if (n < 3)
+ offset = 0x7f84 + n * 12;
+ else
+ offset = 0x7f2c + (n - 3) * 12;
+
+ set_desc_base(&desc, get_smstate(u32, smbase, offset + 8));
+ set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4));
+ rsm_set_desc_flags(&desc, get_smstate(u32, smbase, offset));
+ ctxt->ops->set_segment(ctxt, selector, &desc, 0, n);
+ return X86EMUL_CONTINUE;
+}
+
+static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
+{
+ struct desc_struct desc;
+ int offset;
+ u16 selector;
+ u32 base3;
+
+ offset = 0x7e00 + n * 16;
+
+ selector = get_smstate(u16, smbase, offset);
+ rsm_set_desc_flags(&desc, get_smstate(u16, smbase, offset + 2));
+ set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4));
+ set_desc_base(&desc, get_smstate(u32, smbase, offset + 8));
+ base3 = get_smstate(u32, smbase, offset + 12);
+
+ ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
+ return X86EMUL_CONTINUE;
+}
+
+static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
+ u64 cr0, u64 cr4)
+{
+ int bad;
+
+ /*
+ * First enable PAE, long mode needs it before CR0.PG = 1 is set.
+ * Then enable protected mode. However, PCID cannot be enabled
+ * if EFER.LMA=0, so set it separately.
+ */
+ bad = ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
+
+ bad = ctxt->ops->set_cr(ctxt, 0, cr0);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
+
+ if (cr4 & X86_CR4_PCIDE) {
+ bad = ctxt->ops->set_cr(ctxt, 4, cr4);
+ if (bad)
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ return X86EMUL_CONTINUE;
+}
+
+static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase)
+{
+ struct desc_struct desc;
+ struct desc_ptr dt;
+ u16 selector;
+ u32 val, cr0, cr4;
+ int i;
+
+ cr0 = get_smstate(u32, smbase, 0x7ffc);
+ ctxt->ops->set_cr(ctxt, 3, get_smstate(u32, smbase, 0x7ff8));
+ ctxt->eflags = get_smstate(u32, smbase, 0x7ff4) | X86_EFLAGS_FIXED;
+ ctxt->_eip = get_smstate(u32, smbase, 0x7ff0);
+
+ for (i = 0; i < 8; i++)
+ *reg_write(ctxt, i) = get_smstate(u32, smbase, 0x7fd0 + i * 4);
+
+ val = get_smstate(u32, smbase, 0x7fcc);
+ ctxt->ops->set_dr(ctxt, 6, (val & DR6_VOLATILE) | DR6_FIXED_1);
+ val = get_smstate(u32, smbase, 0x7fc8);
+ ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1);
+
+ selector = get_smstate(u32, smbase, 0x7fc4);
+ set_desc_base(&desc, get_smstate(u32, smbase, 0x7f64));
+ set_desc_limit(&desc, get_smstate(u32, smbase, 0x7f60));
+ rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7f5c));
+ ctxt->ops->set_segment(ctxt, selector, &desc, 0, VCPU_SREG_TR);
+
+ selector = get_smstate(u32, smbase, 0x7fc0);
+ set_desc_base(&desc, get_smstate(u32, smbase, 0x7f80));
+ set_desc_limit(&desc, get_smstate(u32, smbase, 0x7f7c));
+ rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7f78));
+ ctxt->ops->set_segment(ctxt, selector, &desc, 0, VCPU_SREG_LDTR);
+
+ dt.address = get_smstate(u32, smbase, 0x7f74);
+ dt.size = get_smstate(u32, smbase, 0x7f70);
+ ctxt->ops->set_gdt(ctxt, &dt);
+
+ dt.address = get_smstate(u32, smbase, 0x7f58);
+ dt.size = get_smstate(u32, smbase, 0x7f54);
+ ctxt->ops->set_idt(ctxt, &dt);
+
+ for (i = 0; i < 6; i++) {
+ int r = rsm_load_seg_32(ctxt, smbase, i);
+ if (r != X86EMUL_CONTINUE)
+ return r;
+ }
+
+ cr4 = get_smstate(u32, smbase, 0x7f14);
+
+ ctxt->ops->set_smbase(ctxt, get_smstate(u32, smbase, 0x7ef8));
+
+ return rsm_enter_protected_mode(ctxt, cr0, cr4);
+}
+
+static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
+{
+ struct desc_struct desc;
+ struct desc_ptr dt;
+ u64 val, cr0, cr4;
+ u32 base3;
+ u16 selector;
+ int i;
+
+ for (i = 0; i < 16; i++)
+ *reg_write(ctxt, i) = get_smstate(u64, smbase, 0x7ff8 - i * 8);
+
+ ctxt->_eip = get_smstate(u64, smbase, 0x7f78);
+ ctxt->eflags = get_smstate(u32, smbase, 0x7f70) | X86_EFLAGS_FIXED;
+
+ val = get_smstate(u32, smbase, 0x7f68);
+ ctxt->ops->set_dr(ctxt, 6, (val & DR6_VOLATILE) | DR6_FIXED_1);
+ val = get_smstate(u32, smbase, 0x7f60);
+ ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1);
+
+ cr0 = get_smstate(u64, smbase, 0x7f58);
+ ctxt->ops->set_cr(ctxt, 3, get_smstate(u64, smbase, 0x7f50));
+ cr4 = get_smstate(u64, smbase, 0x7f48);
+ ctxt->ops->set_smbase(ctxt, get_smstate(u32, smbase, 0x7f00));
+ val = get_smstate(u64, smbase, 0x7ed0);
+ ctxt->ops->set_msr(ctxt, MSR_EFER, val & ~EFER_LMA);
+
+ selector = get_smstate(u32, smbase, 0x7e90);
+ rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7e92));
+ set_desc_limit(&desc, get_smstate(u32, smbase, 0x7e94));
+ set_desc_base(&desc, get_smstate(u32, smbase, 0x7e98));
+ base3 = get_smstate(u32, smbase, 0x7e9c);
+ ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_TR);
+
+ dt.size = get_smstate(u32, smbase, 0x7e84);
+ dt.address = get_smstate(u64, smbase, 0x7e88);
+ ctxt->ops->set_idt(ctxt, &dt);
+
+ selector = get_smstate(u32, smbase, 0x7e70);
+ rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7e72));
+ set_desc_limit(&desc, get_smstate(u32, smbase, 0x7e74));
+ set_desc_base(&desc, get_smstate(u32, smbase, 0x7e78));
+ base3 = get_smstate(u32, smbase, 0x7e7c);
+ ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_LDTR);
+
+ dt.size = get_smstate(u32, smbase, 0x7e64);
+ dt.address = get_smstate(u64, smbase, 0x7e68);
+ ctxt->ops->set_gdt(ctxt, &dt);
+
+ for (i = 0; i < 6; i++) {
+ int r = rsm_load_seg_64(ctxt, smbase, i);
+ if (r != X86EMUL_CONTINUE)
+ return r;
+ }
+
+ return rsm_enter_protected_mode(ctxt, cr0, cr4);
+}
+
static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
+ unsigned long cr0, cr4, efer;
+ u64 smbase;
+ int ret;
+
+ printk("rsm\n");
if ((ctxt->emul_flags & X86EMUL_SMM_MASK) == 0)
return emulate_ud(ctxt);

- return X86EMUL_UNHANDLEABLE;
+ /*
+ * Get back to real mode, to prepare a safe state in which
+ * to load CR0/CR3/CR4/EFER. Also this will ensure that
+ * addresses passed to read_std/write_std are not virtual.
+ */
+ cr0 = ctxt->ops->get_cr(ctxt, 0);
+ if (cr0 & X86_CR0_PE)
+ ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
+ cr4 = ctxt->ops->get_cr(ctxt, 4);
+ if (cr4 & X86_CR4_PAE)
+ ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE);
+ efer = 0;
+ ctxt->ops->set_msr(ctxt, MSR_EFER, efer);
+
+ ctxt->ops->get_msr(ctxt, MSR_IA32_SMBASE, &smbase);
+ if (emulator_has_longmode(ctxt))
+ ret = rsm_load_state_64(ctxt, smbase + 0x8000);
+ else
+ ret = rsm_load_state_32(ctxt, smbase + 0x8000);
+
+ if (ret != X86EMUL_CONTINUE) {
+ /* FIXME: should triple fault */
+ return X86EMUL_UNHANDLEABLE;
+ }
+
+ ctxt->emul_flags &= ~X86EMUL_SMM_MASK;
+ return X86EMUL_CONTINUE;
}

static void
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4cd7a2a18e93..ab6a38617813 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6232,12 +6232,226 @@ static void process_nmi(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_EVENT, vcpu);
}

+#define put_smstate(type, buf, offset, val) \
+ *(type *)((buf) + (offset) - 0x7e00) = val
+
+static u16 process_smi_get_segment_flags(struct kvm_segment *seg)
+{
+ u16 flags = 0;
+ flags |= seg->g << 15;
+ flags |= seg->db << 14;
+ flags |= seg->l << 13;
+ flags |= seg->avl << 12;
+ flags |= seg->present << 7;
+ flags |= seg->dpl << 5;
+ flags |= seg->s << 4;
+ flags |= seg->type;
+ return flags;
+}
+
+static void process_smi_save_seg_32(struct kvm_vcpu *vcpu, char *buf, int n)
+{
+ struct kvm_segment seg;
+ int offset;
+
+ kvm_get_segment(vcpu, &seg, n);
+ put_smstate(u32, buf, 0x7fa8 + n * 4, seg.selector);
+
+ if (n < 3)
+ offset = 0x7f84 + n * 12;
+ else
+ offset = 0x7f2c + (n - 3) * 12;
+
+ put_smstate(u32, buf, offset + 8, seg.base);
+ put_smstate(u32, buf, offset + 4, seg.limit);
+ put_smstate(u32, buf, offset, process_smi_get_segment_flags(&seg));
+}
+
+static void process_smi_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
+{
+ struct kvm_segment seg;
+ int offset;
+
+ kvm_get_segment(vcpu, &seg, n);
+ offset = 0x7e00 + n * 16;
+
+ put_smstate(u16, buf, offset, seg.selector);
+ put_smstate(u16, buf, offset + 2, process_smi_get_segment_flags(&seg));
+ put_smstate(u32, buf, offset + 4, seg.limit);
+ put_smstate(u64, buf, offset + 8, seg.base);
+}
+
+static void process_smi_save_state_32(struct kvm_vcpu *vcpu, char *buf)
+{
+ struct desc_ptr dt;
+ struct kvm_segment seg;
+ unsigned long val;
+ int i;
+
+ put_smstate(u32, buf, 0x7ffc, kvm_read_cr0(vcpu));
+ put_smstate(u32, buf, 0x7ff8, kvm_read_cr3(vcpu));
+ put_smstate(u32, buf, 0x7ff4, kvm_get_rflags(vcpu));
+ put_smstate(u32, buf, 0x7ff0, kvm_rip_read(vcpu));
+
+ for (i = 0; i < 8; i++)
+ put_smstate(u32, buf, 0x7fd0 + i * 4, kvm_register_read(vcpu, i));
+
+ kvm_get_dr(vcpu, 6, &val);
+ put_smstate(u32, buf, 0x7fcc, (u32)val);
+ kvm_get_dr(vcpu, 7, &val);
+ put_smstate(u32, buf, 0x7fc8, (u32)val);
+
+ kvm_get_segment(vcpu, &seg, VCPU_SREG_TR);
+ put_smstate(u32, buf, 0x7fc4, seg.selector);
+ put_smstate(u32, buf, 0x7f64, seg.base);
+ put_smstate(u32, buf, 0x7f60, seg.limit);
+ put_smstate(u32, buf, 0x7f5c, process_smi_get_segment_flags(&seg));
+
+ kvm_get_segment(vcpu, &seg, VCPU_SREG_LDTR);
+ put_smstate(u32, buf, 0x7fc0, seg.selector);
+ put_smstate(u32, buf, 0x7f80, seg.base);
+ put_smstate(u32, buf, 0x7f7c, seg.limit);
+ put_smstate(u32, buf, 0x7f78, process_smi_get_segment_flags(&seg));
+
+ kvm_x86_ops->get_gdt(vcpu, &dt);
+ put_smstate(u32, buf, 0x7f74, dt.address);
+ put_smstate(u32, buf, 0x7f70, dt.size);
+
+ kvm_x86_ops->get_idt(vcpu, &dt);
+ put_smstate(u32, buf, 0x7f58, dt.address);
+ put_smstate(u32, buf, 0x7f54, dt.size);
+
+ for (i = 0; i < 6; i++)
+ process_smi_save_seg_32(vcpu, buf, i);
+
+ put_smstate(u32, buf, 0x7f14, kvm_read_cr4(vcpu));
+
+ /* revision id */
+ put_smstate(u32, buf, 0x7efc, 0x00020000);
+ put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
+}
+
+static void process_smi_save_state_64(struct kvm_vcpu *vcpu, char *buf)
+{
+#ifdef CONFIG_X86_64
+ struct desc_ptr dt;
+ struct kvm_segment seg;
+ unsigned long val;
+ int i;
+
+ for (i = 0; i < 16; i++)
+ put_smstate(u64, buf, 0x7ff8 - i * 8, kvm_register_read(vcpu, i));
+
+ put_smstate(u64, buf, 0x7f78, kvm_rip_read(vcpu));
+ put_smstate(u32, buf, 0x7f70, kvm_get_rflags(vcpu));
+
+ kvm_get_dr(vcpu, 6, &val);
+ put_smstate(u64, buf, 0x7f68, val);
+ kvm_get_dr(vcpu, 7, &val);
+ put_smstate(u64, buf, 0x7f60, val);
+
+ put_smstate(u64, buf, 0x7f58, kvm_read_cr0(vcpu));
+ put_smstate(u64, buf, 0x7f50, kvm_read_cr3(vcpu));
+ put_smstate(u64, buf, 0x7f48, kvm_read_cr4(vcpu));
+
+ put_smstate(u32, buf, 0x7f00, vcpu->arch.smbase);
+
+ /* revision id */
+ put_smstate(u32, buf, 0x7efc, 0x00020064);
+
+ put_smstate(u64, buf, 0x7ed0, vcpu->arch.efer);
+
+ kvm_get_segment(vcpu, &seg, VCPU_SREG_TR);
+ put_smstate(u16, buf, 0x7e90, seg.selector);
+ put_smstate(u16, buf, 0x7e92, process_smi_get_segment_flags(&seg));
+ put_smstate(u32, buf, 0x7e94, seg.limit);
+ put_smstate(u64, buf, 0x7e98, seg.base);
+
+ kvm_x86_ops->get_idt(vcpu, &dt);
+ put_smstate(u32, buf, 0x7e84, dt.size);
+ put_smstate(u64, buf, 0x7e88, dt.address);
+
+ kvm_get_segment(vcpu, &seg, VCPU_SREG_LDTR);
+ put_smstate(u16, buf, 0x7e70, seg.selector);
+ put_smstate(u16, buf, 0x7e72, process_smi_get_segment_flags(&seg));
+ put_smstate(u32, buf, 0x7e74, seg.limit);
+ put_smstate(u64, buf, 0x7e78, seg.base);
+
+ kvm_x86_ops->get_gdt(vcpu, &dt);
+ put_smstate(u32, buf, 0x7e64, dt.size);
+ put_smstate(u64, buf, 0x7e68, dt.address);
+
+ for (i = 0; i < 6; i++)
+ process_smi_save_seg_64(vcpu, buf, i);
+#else
+ WARN_ON_ONCE(1);
+#endif
+}
+
static void process_smi(struct kvm_vcpu *vcpu)
{
+ struct kvm_segment cs, ds;
+ char buf[512];
+ u32 cr0;
+ int r;
+
if (is_smm(vcpu))
return;

- printk_once(KERN_DEBUG "Ignoring guest SMI\n");
+ printk("smm %llx\n", vcpu->arch.smbase);
+ vcpu->arch.hflags |= HF_SMM_MASK;
+ memset(buf, 0, 512);
+ if (guest_cpuid_has_longmode(vcpu))
+ process_smi_save_state_64(vcpu, buf);
+ else
+ process_smi_save_state_32(vcpu, buf);
+
+ r = kvm_write_guest(vcpu->kvm, vcpu->arch.smbase + 0xfe00, buf, sizeof(buf));
+ if (r < 0)
+ return;
+
+ kvm_x86_ops->set_nmi_mask(vcpu, true);
+ kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
+ kvm_rip_write(vcpu, 0x8000);
+
+ cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG);
+ kvm_x86_ops->set_cr0(vcpu, cr0);
+ vcpu->arch.cr0 = cr0;
+
+ kvm_x86_ops->set_cr4(vcpu, 0);
+
+ __kvm_set_dr(vcpu, 7, DR7_FIXED_1);
+
+ cs.selector = (vcpu->arch.smbase >> 4) & 0xffff;
+ cs.base = vcpu->arch.smbase;
+
+ ds.selector = 0;
+ ds.base = 0;
+
+ cs.limit = ds.limit = 0xffffffff;
+ cs.type = ds.type = 0x3;
+ cs.dpl = ds.dpl = 0;
+ cs.db = ds.db = 0;
+ cs.s = ds.s = 1;
+ cs.l = ds.l = 0;
+ cs.g = ds.g = 1;
+ cs.avl = ds.avl = 0;
+ cs.present = ds.present = 1;
+ cs.unusable = ds.unusable = 0;
+ cs.padding = ds.padding = 0;
+
+ kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
+ kvm_set_segment(vcpu, &ds, VCPU_SREG_DS);
+ kvm_set_segment(vcpu, &ds, VCPU_SREG_ES);
+ kvm_set_segment(vcpu, &ds, VCPU_SREG_FS);
+ kvm_set_segment(vcpu, &ds, VCPU_SREG_GS);
+ kvm_set_segment(vcpu, &ds, VCPU_SREG_SS);
+
+ if (guest_cpuid_has_longmode(vcpu))
+ kvm_x86_ops->set_efer(vcpu, 0);
+
+ kvm_update_cpuid(vcpu);
+ kvm_mmu_reset_context(vcpu);
}

static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
--
1.8.3.1

2015-04-30 11:37:05

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 10/13] KVM: x86: add vcpu-specific functions to read/write/translate GFNs

We need to hide SMRAM from guests not running in SMM. Therefore,
all uses of kvm_read_guest* and kvm_write_guest* must be changed to
check whether the VCPU is in system management mode. Introduce a new
family of VCPU-based functions for this.

For now, the code is just copied from virt/kvm/kvm_main.c, except
for calls to other read/write functions.
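
As an illustration of where this is heading, a minimal sketch of a
VCPU-based lookup (the KVM_MEM_X86_SMRAM flag and the actual check are
only introduced by later patches in this series):

    struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
    {
            struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);

            /* SMRAM-only slots are invisible unless the VCPU is in SMM. */
            if (slot && (slot->flags & KVM_MEM_X86_SMRAM) && !is_smm(vcpu))
                    return NULL;

            return slot;
    }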

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 20 ++++
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/lapic.c | 22 ++---
arch/x86/kvm/mmu.c | 8 +-
arch/x86/kvm/paging_tmpl.h | 8 +-
arch/x86/kvm/smram.c | 208 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/svm.c | 8 +-
arch/x86/kvm/vmx.c | 10 +-
arch/x86/kvm/x86.c | 62 ++++++------
9 files changed, 288 insertions(+), 60 deletions(-)
create mode 100644 arch/x86/kvm/smram.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f0db43e80d09..68a09a4394d8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -980,6 +980,26 @@ void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
gfn_t gfn, void *data, int offset, int len,
u32 access);
+
+struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn);
+unsigned long x86_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn);
+int x86_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+ gpa_t gpa, unsigned long len);
+int x86_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
+ int len);
+int x86_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa, void *data,
+ unsigned long len);
+int x86_read_guest(struct kvm_vcpu *vcpu, gpa_t gpa, void *data,
+ unsigned long len);
+int x86_read_guest_cached(struct kvm_vcpu *vcpu, struct gfn_to_hva_cache *ghc,
+ void *data, unsigned long len);
+int x86_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, const void *data,
+ int offset, int len);
+int x86_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
+ unsigned long len);
+int x86_write_guest_cached(struct kvm_vcpu *vcpu, struct gfn_to_hva_cache *ghc,
+ void *data, unsigned long len);
+
bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 16e8f962eaad..02b40b128490 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,7 +12,7 @@ kvm-y += $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o

kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
- i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
+ i8254.o ioapic.o irq_comm.o cpuid.o pmu.o smram.o
kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o
kvm-intel-y += vmx.o
kvm-amd-y += svm.o
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0b2e45e3dc4c..302027217553 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -497,15 +497,15 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq,
static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val)
{

- return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, &val,
+ return x86_write_guest_cached(vcpu, &vcpu->arch.pv_eoi.data, &val,
sizeof(val));
}

static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val)
{

- return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data, val,
- sizeof(*val));
+ return x86_read_guest_cached(vcpu, &vcpu->arch.pv_eoi.data, val,
+ sizeof(*val));
}

static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu)
@@ -1879,8 +1879,8 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu)
if (!test_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention))
return;

- kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.apic->vapic_cache, &data,
- sizeof(u32));
+ x86_read_guest_cached(vcpu, &vcpu->arch.apic->vapic_cache, &data,
+ sizeof(u32));

apic_set_tpr(vcpu->arch.apic, data & 0xff);
}
@@ -1931,16 +1931,16 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu)
max_isr = 0;
data = (tpr & 0xff) | ((max_isr & 0xf0) << 8) | (max_irr << 24);

- kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apic->vapic_cache, &data,
- sizeof(u32));
+ x86_write_guest_cached(vcpu, &vcpu->arch.apic->vapic_cache, &data,
+ sizeof(u32));
}

int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
{
if (vapic_addr) {
- if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
- &vcpu->arch.apic->vapic_cache,
- vapic_addr, sizeof(u32)))
+ if (x86_gfn_to_hva_cache_init(vcpu->kvm,
+ &vcpu->arch.apic->vapic_cache,
+ vapic_addr, sizeof(u32)))
return -EINVAL;
__set_bit(KVM_APIC_CHECK_VAPIC, &vcpu->arch.apic_attention);
} else {
@@ -2032,7 +2032,7 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data)
vcpu->arch.pv_eoi.msr_val = data;
if (!pv_eoi_enabled(vcpu))
return 0;
- return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
+ return x86_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
addr, sizeof(u8));
}

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 371109546382..4694ad42aa8b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -875,7 +875,7 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
{
struct kvm_memory_slot *slot;

- slot = gfn_to_memslot(vcpu->kvm, gfn);
+ slot = x86_gfn_to_memslot(vcpu, gfn);
if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
(no_dirty_log && slot->dirty_bitmap))
slot = NULL;
@@ -3460,7 +3460,7 @@ static int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn)
arch.direct_map = vcpu->arch.mmu.direct_map;
arch.cr3 = vcpu->arch.mmu.get_cr3(vcpu);

- return kvm_setup_async_pf(vcpu, gva, gfn_to_hva(vcpu->kvm, gfn), &arch);
+ return kvm_setup_async_pf(vcpu, gva, x86_gfn_to_hva(vcpu, gfn), &arch);
}

static bool can_do_async_pf(struct kvm_vcpu *vcpu)
@@ -3478,7 +3478,7 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
struct kvm_memory_slot *slot;
bool async;

- slot = gfn_to_memslot(vcpu->kvm, gfn);
+ slot = x86_gfn_to_memslot(vcpu, gfn);
async = false;
*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async, write, writable);
if (!async)
@@ -4108,7 +4108,7 @@ static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa,
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
*gpa &= ~(gpa_t)7;
*bytes = 8;
- r = kvm_read_guest(vcpu->kvm, *gpa, &gentry, 8);
+ r = x86_read_guest(vcpu, *gpa, &gentry, 8);
if (r)
gentry = 0;
new = (const u8 *)&gentry;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index fd49c867b25a..197599c09bbc 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -511,11 +511,11 @@ static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
base_gpa = pte_gpa & ~mask;
index = (pte_gpa - base_gpa) / sizeof(pt_element_t);

- r = kvm_read_guest_atomic(vcpu->kvm, base_gpa,
+ r = x86_read_guest_atomic(vcpu, base_gpa,
gw->prefetch_ptes, sizeof(gw->prefetch_ptes));
curr_pte = gw->prefetch_ptes[index];
} else
- r = kvm_read_guest_atomic(vcpu->kvm, pte_gpa,
+ r = x86_read_guest_atomic(vcpu, pte_gpa,
&curr_pte, sizeof(curr_pte));

return r || curr_pte != gw->ptes[level - 1];
@@ -862,7 +862,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva)
if (!rmap_can_add(vcpu))
break;

- if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte,
+ if (x86_read_guest_atomic(vcpu, pte_gpa, &gpte,
sizeof(pt_element_t)))
break;

@@ -949,7 +949,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)

pte_gpa = first_pte_gpa + i * sizeof(pt_element_t);

- if (kvm_read_guest_atomic(vcpu->kvm, pte_gpa, &gpte,
+ if (x86_read_guest_atomic(vcpu, pte_gpa, &gpte,
sizeof(pt_element_t)))
return -EINVAL;

diff --git a/arch/x86/kvm/smram.c b/arch/x86/kvm/smram.c
new file mode 100644
index 000000000000..73616edab631
--- /dev/null
+++ b/arch/x86/kvm/smram.c
@@ -0,0 +1,208 @@
+/*
+ * Helpers for SMRAM access
+ * Copyright 2015 Red Hat, Inc. and/or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+
+struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
+
+ return slot;
+}
+EXPORT_SYMBOL_GPL(x86_gfn_to_memslot);
+
+static unsigned long x86_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn,
+ bool *writable)
+{
+ struct kvm_memory_slot *slot = x86_gfn_to_memslot(vcpu, gfn);
+
+ return gfn_to_hva_memslot_prot(slot, gfn, writable);
+}
+
+unsigned long x86_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+ struct kvm_memory_slot *slot = x86_gfn_to_memslot(vcpu, gfn);
+
+ return gfn_to_hva_memslot(slot, gfn);
+}
+EXPORT_SYMBOL_GPL(x86_gfn_to_hva);
+
+static int next_segment(unsigned long len, int offset)
+{
+ if (len > PAGE_SIZE - offset)
+ return PAGE_SIZE - offset;
+ else
+ return len;
+}
+
+int x86_read_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, void *data, int offset,
+ int len)
+{
+ int r;
+ unsigned long addr;
+
+ addr = x86_gfn_to_hva_prot(vcpu, gfn, NULL);
+ if (kvm_is_error_hva(addr))
+ return -EFAULT;
+ r = __copy_from_user(data, (void __user *)addr + offset, len);
+ if (r)
+ return -EFAULT;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_read_guest_page);
+
+int x86_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa, void *data,
+ unsigned long len)
+{
+ int r;
+ unsigned long addr;
+ gfn_t gfn = gpa >> PAGE_SHIFT;
+ int offset = offset_in_page(gpa);
+
+ addr = x86_gfn_to_hva_prot(vcpu, gfn, NULL);
+ if (kvm_is_error_hva(addr))
+ return -EFAULT;
+ pagefault_disable();
+ r = __copy_from_user_inatomic(data, (void __user *)addr + offset, len);
+ pagefault_enable();
+ if (r)
+ return -EFAULT;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_read_guest_atomic);
+
+int x86_read_guest(struct kvm_vcpu *vcpu, gpa_t gpa, void *data,
+ unsigned long len)
+{
+ gfn_t gfn = gpa >> PAGE_SHIFT;
+ int seg;
+ int offset = offset_in_page(gpa);
+ int ret;
+
+ while ((seg = next_segment(len, offset)) != 0) {
+ ret = x86_read_guest_page(vcpu, gfn, data, offset, seg);
+ if (ret < 0)
+ return ret;
+ offset = 0;
+ len -= seg;
+ data += seg;
+ ++gfn;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_read_guest);
+
+int x86_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
+ gpa_t gpa, unsigned long len)
+{
+ return kvm_gfn_to_hva_cache_init(kvm, ghc, gpa, len);
+}
+EXPORT_SYMBOL_GPL(x86_gfn_to_hva_cache_init);
+
+int x86_read_guest_cached(struct kvm_vcpu *vcpu, struct gfn_to_hva_cache *ghc,
+ void *data, unsigned long len)
+{
+ struct kvm_memslots *slots = kvm_memslots(vcpu->kvm);
+ int r;
+
+ BUG_ON(len > ghc->len);
+
+ if (slots->generation != ghc->generation)
+ x86_gfn_to_hva_cache_init(vcpu->kvm, ghc, ghc->gpa, ghc->len);
+
+ if (unlikely(!ghc->memslot))
+ return x86_read_guest(vcpu, ghc->gpa, data, len);
+
+ if (kvm_is_error_hva(ghc->hva))
+ return -EFAULT;
+
+ r = __copy_from_user(data, (void __user *)ghc->hva, len);
+ if (r)
+ return -EFAULT;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_read_guest_cached);
+
+int x86_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, const void *data,
+ int offset, int len)
+{
+ struct kvm_memory_slot *slot;
+ int r;
+ unsigned long addr;
+
+ slot = x86_gfn_to_memslot(vcpu, gfn);
+ addr = gfn_to_hva_memslot(slot, gfn);
+ if (kvm_is_error_hva(addr))
+ return -EFAULT;
+ r = __copy_to_user((void __user *)addr + offset, data, len);
+ if (r)
+ return -EFAULT;
+ mark_page_dirty_in_slot(slot, gfn);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_write_guest_page);
+
+int x86_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
+ unsigned long len)
+{
+ gfn_t gfn = gpa >> PAGE_SHIFT;
+ int seg;
+ int offset = offset_in_page(gpa);
+ int ret;
+
+ while ((seg = next_segment(len, offset)) != 0) {
+ ret = x86_write_guest_page(vcpu, gfn, data, offset, seg);
+ if (ret < 0)
+ return ret;
+ offset = 0;
+ len -= seg;
+ data += seg;
+ ++gfn;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_write_guest);
+
+int x86_write_guest_cached(struct kvm_vcpu *vcpu, struct gfn_to_hva_cache *ghc,
+ void *data, unsigned long len)
+{
+ struct kvm_memslots *slots = kvm_memslots(vcpu->kvm);
+ int r;
+
+ BUG_ON(len > ghc->len);
+
+ if (slots->generation != ghc->generation)
+ x86_gfn_to_hva_cache_init(vcpu->kvm, ghc, ghc->gpa, ghc->len);
+
+ if (unlikely(!ghc->memslot))
+ return x86_write_guest(vcpu, ghc->gpa, data, len);
+
+ if (kvm_is_error_hva(ghc->hva))
+ return -EFAULT;
+
+ r = __copy_to_user((void __user *)ghc->hva, data, len);
+ if (r)
+ return -EFAULT;
+ mark_page_dirty_in_slot(ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(x86_write_guest_cached);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0cbb49b8d555..b7650c786474 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1953,7 +1953,7 @@ static u64 nested_svm_get_tdp_pdptr(struct kvm_vcpu *vcpu, int index)
u64 pdpte;
int ret;

- ret = kvm_read_guest_page(vcpu->kvm, gpa_to_gfn(cr3), &pdpte,
+ ret = x86_read_guest_page(vcpu, gpa_to_gfn(cr3), &pdpte,
offset_in_page(cr3) + index * 8, 8);
if (ret)
return 0;
@@ -2151,7 +2151,7 @@ static int nested_svm_intercept_ioio(struct vcpu_svm *svm)
mask = (0xf >> (4 - size)) << start_bit;
val = 0;

- if (kvm_read_guest(svm->vcpu.kvm, gpa, &val, iopm_len))
+ if (x86_read_guest(&svm->vcpu, gpa, &val, iopm_len))
return NESTED_EXIT_DONE;

return (val & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
@@ -2176,7 +2176,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm)
/* Offset is in 32 bit units but need in 8 bit units */
offset *= 4;

- if (kvm_read_guest(svm->vcpu.kvm, svm->nested.vmcb_msrpm + offset, &value, 4))
+ if (x86_read_guest(&svm->vcpu, svm->nested.vmcb_msrpm + offset, &value, 4))
return NESTED_EXIT_DONE;

return (value & mask) ? NESTED_EXIT_DONE : NESTED_EXIT_HOST;
@@ -2447,7 +2447,7 @@ static bool nested_svm_vmrun_msrpm(struct vcpu_svm *svm)
p = msrpm_offsets[i];
offset = svm->nested.vmcb_msrpm + (p * 4);

- if (kvm_read_guest(svm->vcpu.kvm, offset, &value, 4))
+ if (x86_read_guest(&svm->vcpu, offset, &value, 4))
return false;

svm->nested.msrpm[p] = svm->msrpm[p] | value;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6d296fcb68f4..a0f5952ed0e9 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7328,7 +7328,7 @@ static bool nested_vmx_exit_handled_io(struct kvm_vcpu *vcpu,
bitmap += (port & 0x7fff) / 8;

if (last_bitmap != bitmap)
- if (kvm_read_guest(vcpu->kvm, bitmap, &b, 1))
+ if (x86_read_guest(vcpu, bitmap, &b, 1))
return true;
if (b & (1 << (port & 7)))
return true;
@@ -7372,7 +7372,7 @@ static bool nested_vmx_exit_handled_msr(struct kvm_vcpu *vcpu,
/* Then read the msr_index'th bit from this bitmap: */
if (msr_index < 1024*8) {
unsigned char b;
- if (kvm_read_guest(vcpu->kvm, bitmap + msr_index/8, &b, 1))
+ if (x86_read_guest(vcpu, bitmap + msr_index/8, &b, 1))
return true;
return 1 & (b >> (msr_index & 7));
} else
@@ -9114,7 +9114,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)

msr.host_initiated = false;
for (i = 0; i < count; i++) {
- if (kvm_read_guest(vcpu->kvm, gpa + i * sizeof(e),
+ if (x86_read_guest(vcpu, gpa + i * sizeof(e),
&e, sizeof(e))) {
pr_warn_ratelimited(
"%s cannot read MSR entry (%u, 0x%08llx)\n",
@@ -9148,7 +9148,7 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)

for (i = 0; i < count; i++) {
struct msr_data msr_info;
- if (kvm_read_guest(vcpu->kvm,
+ if (x86_read_guest(vcpu,
gpa + i * sizeof(e),
&e, 2 * sizeof(u32))) {
pr_warn_ratelimited(
@@ -9170,7 +9170,7 @@ static int nested_vmx_store_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
__func__, i, e.index);
return -EINVAL;
}
- if (kvm_write_guest(vcpu->kvm,
+ if (x86_write_guest(vcpu,
gpa + i * sizeof(e) +
offsetof(struct vmx_msr_entry, value),
&msr_info.data, sizeof(msr_info.data))) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ab6a38617813..90ab62f54e1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -475,7 +475,7 @@ EXPORT_SYMBOL_GPL(kvm_require_dr);

/*
* This function will be used to read from the physical memory of the currently
- * running guest. The difference to kvm_read_guest_page is that this function
+ * running guest. The difference to x86_read_guest_page is that this function
* can read from guest physical or from the guest's guest physical memory.
*/
int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
@@ -493,7 +493,7 @@ int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,

real_gfn = gpa_to_gfn(real_gfn);

- return kvm_read_guest_page(vcpu->kvm, real_gfn, data, offset, len);
+ return x86_read_guest_page(vcpu, real_gfn, data, offset, len);
}
EXPORT_SYMBOL_GPL(kvm_read_guest_page_mmu);

@@ -1125,7 +1125,7 @@ void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
}

-static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
+static void kvm_write_wall_clock(struct kvm_vcpu *vcpu, gpa_t wall_clock)
{
int version;
int r;
@@ -1135,7 +1135,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
if (!wall_clock)
return;

- r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+ r = x86_read_guest(vcpu, wall_clock, &version, sizeof(version));
if (r)
return;

@@ -1144,7 +1144,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)

++version;

- kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
+ x86_write_guest(vcpu, wall_clock, &version, sizeof(version));

/*
* The guest calculates current wall clock time by adding
@@ -1154,18 +1154,18 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
*/
getboottime(&boot);

- if (kvm->arch.kvmclock_offset) {
- struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+ if (vcpu->kvm->arch.kvmclock_offset) {
+ struct timespec ts = ns_to_timespec(vcpu->kvm->arch.kvmclock_offset);
boot = timespec_sub(boot, ts);
}
wc.sec = boot.tv_sec;
wc.nsec = boot.tv_nsec;
wc.version = version;

- kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
+ x86_write_guest(vcpu, wall_clock, &wc, sizeof(wc));

version++;
- kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
+ x86_write_guest(vcpu, wall_clock, &version, sizeof(version));
}

static uint32_t div_frac(uint32_t dividend, uint32_t divisor)
@@ -1681,7 +1681,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset;
vcpu->last_guest_tsc = tsc_timestamp;

- if (unlikely(kvm_read_guest_cached(v->kvm, &vcpu->pv_time,
+ if (unlikely(x86_read_guest_cached(v, &vcpu->pv_time,
&guest_hv_clock, sizeof(guest_hv_clock))))
return 0;

@@ -1724,7 +1724,7 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)

trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock);

- kvm_write_guest_cached(v->kvm, &vcpu->pv_time,
+ x86_write_guest_cached(v, &vcpu->pv_time,
&vcpu->hv_clock,
sizeof(vcpu->hv_clock));

@@ -1965,7 +1965,7 @@ static int xen_hvm_config(struct kvm_vcpu *vcpu, u64 data)
r = PTR_ERR(page);
goto out;
}
- if (kvm_write_guest(kvm, page_addr, page, PAGE_SIZE))
+ if (x86_write_guest(vcpu, page_addr, page, PAGE_SIZE))
goto out_free;
r = 0;
out_free:
@@ -2018,7 +2018,7 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
break;
}
gfn = data >> HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT;
- addr = gfn_to_hva(kvm, gfn);
+ addr = x86_gfn_to_hva(vcpu, gfn);
if (kvm_is_error_hva(addr))
return 1;
kvm_x86_ops->patch_hypercall(vcpu, instructions);
@@ -2037,7 +2037,7 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE))
break;
gfn = data >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
- if (kvm_write_guest(kvm, gfn << HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT,
+ if (x86_write_guest(vcpu, gfn << HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT,
&tsc_ref, sizeof(tsc_ref)))
return 1;
mark_page_dirty(kvm, gfn);
@@ -2065,7 +2065,7 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data)
break;
}
gfn = data >> HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT;
- addr = gfn_to_hva(vcpu->kvm, gfn);
+ addr = x86_gfn_to_hva(vcpu, gfn);
if (kvm_is_error_hva(addr))
return 1;
if (__clear_user((void __user *)addr, PAGE_SIZE))
@@ -2107,8 +2107,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
return 0;
}

- if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
- sizeof(u32)))
+ if (x86_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.apf.data, gpa,
+ sizeof(u32)))
return 1;

vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
@@ -2138,7 +2138,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
return;

- if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+ if (unlikely(x86_read_guest_cached(vcpu, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
return;

@@ -2146,7 +2146,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
vcpu->arch.st.steal.version += 2;
vcpu->arch.st.accum_steal = 0;

- kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+ x86_write_guest_cached(vcpu, &vcpu->arch.st.stime,
&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
}

@@ -2226,7 +2226,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
case MSR_KVM_WALL_CLOCK_NEW:
case MSR_KVM_WALL_CLOCK:
vcpu->kvm->arch.wall_clock = data;
- kvm_write_wall_clock(vcpu->kvm, data);
+ kvm_write_wall_clock(vcpu, data);
break;
case MSR_KVM_SYSTEM_TIME_NEW:
case MSR_KVM_SYSTEM_TIME: {
@@ -2254,7 +2254,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)

gpa_offset = data & ~(PAGE_MASK | 1);

- if (kvm_gfn_to_hva_cache_init(vcpu->kvm,
+ if (x86_gfn_to_hva_cache_init(vcpu->kvm,
&vcpu->arch.pv_time, data & ~1ULL,
sizeof(struct pvclock_vcpu_time_info)))
vcpu->arch.pv_time_enabled = false;
@@ -2275,9 +2275,9 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
if (data & KVM_STEAL_RESERVED_MASK)
return 1;

- if (kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.st.stime,
- data & KVM_STEAL_VALID_BITS,
- sizeof(struct kvm_steal_time)))
+ if (x86_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.st.stime,
+ data & KVM_STEAL_VALID_BITS,
+ sizeof(struct kvm_steal_time)))
return 1;

vcpu->arch.st.msr_val = data;
@@ -4301,7 +4301,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,

if (gpa == UNMAPPED_GVA)
return X86EMUL_PROPAGATE_FAULT;
- ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, data,
+ ret = x86_read_guest_page(vcpu, gpa >> PAGE_SHIFT, data,
offset, toread);
if (ret < 0) {
r = X86EMUL_IO_NEEDED;
@@ -4335,7 +4335,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
offset = addr & (PAGE_SIZE-1);
if (WARN_ON(offset + bytes > PAGE_SIZE))
bytes = (unsigned)PAGE_SIZE - offset;
- ret = kvm_read_guest_page(vcpu->kvm, gpa >> PAGE_SHIFT, val,
+ ret = x86_read_guest_page(vcpu, gpa >> PAGE_SHIFT, val,
offset, bytes);
if (unlikely(ret < 0))
return X86EMUL_IO_NEEDED;
@@ -4382,7 +4382,7 @@ int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,

if (gpa == UNMAPPED_GVA)
return X86EMUL_PROPAGATE_FAULT;
- ret = kvm_write_guest(vcpu->kvm, gpa, data, towrite);
+ ret = x86_write_guest(vcpu, gpa, data, towrite);
if (ret < 0) {
r = X86EMUL_IO_NEEDED;
goto out;
@@ -4435,7 +4435,7 @@ int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
{
int ret;

- ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
+ ret = x86_write_guest(vcpu, gpa, val, bytes);
if (ret < 0)
return 0;
kvm_mmu_pte_write(vcpu, gpa, val, bytes);
@@ -4469,7 +4469,7 @@ static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes)
static int read_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
void *val, int bytes)
{
- return !kvm_read_guest(vcpu->kvm, gpa, val, bytes);
+ return !x86_read_guest(vcpu, gpa, val, bytes);
}

static int write_emulate(struct kvm_vcpu *vcpu, gpa_t gpa,
@@ -6406,7 +6406,7 @@ static void process_smi(struct kvm_vcpu *vcpu)
else
process_smi_save_state_32(vcpu, buf);

- r = kvm_write_guest(vcpu->kvm, vcpu->arch.smbase + 0xfe00, buf, sizeof(buf));
+ r = x86_write_guest(vcpu, vcpu->arch.smbase + 0xfe00, buf, sizeof(buf));
if (r < 0)
return;

@@ -8175,7 +8175,7 @@ static void kvm_del_async_pf_gfn(struct kvm_vcpu *vcpu, gfn_t gfn)
static int apf_put_user(struct kvm_vcpu *vcpu, u32 val)
{

- return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.data, &val,
+ return x86_write_guest_cached(vcpu, &vcpu->arch.apf.data, &val,
sizeof(val));
}

--
1.8.3.1

2015-04-30 11:37:00

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 11/13] KVM: x86: add SMM to the MMU role

Whenever the SMM flag may change, the MMU context is reset.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 11 +----------
arch/x86/kvm/mmu.c | 5 ++++-
arch/x86/kvm/x86.c | 5 ++++-
3 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 68a09a4394d8..b43c3a708328 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -184,16 +184,6 @@ struct kvm_mmu_memory_cache {
void *objects[KVM_NR_MEM_OBJS];
};

-/*
- * kvm_mmu_page_role, below, is defined as:
- *
- * bits 0:3 - total guest paging levels (2-4, or zero for real mode)
- * bits 4:7 - page table level for this shadow (1-4)
- * bits 8:9 - page table quadrant for 2-level guests
- * bit 16 - direct mapping of virtual to physical mapping at gfn
- * used for real mode and two-dimensional paging
- * bits 17:19 - common access permissions for all ptes in this shadow page
- */
union kvm_mmu_page_role {
unsigned word;
struct {
@@ -207,6 +197,7 @@ union kvm_mmu_page_role {
unsigned nxe:1;
unsigned cr0_wp:1;
unsigned smep_andnot_wp:1;
+ unsigned smm:1;
};
};

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4694ad42aa8b..043680d90964 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3879,6 +3879,7 @@ static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu)
struct kvm_mmu *context = &vcpu->arch.mmu;

context->base_role.word = 0;
+ context->base_role.smm = is_smm(vcpu);
context->page_fault = tdp_page_fault;
context->sync_page = nonpaging_sync_page;
context->invlpg = nonpaging_invlpg;
@@ -3937,6 +3938,7 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu)
context->base_role.cr0_wp = is_write_protection(vcpu);
context->base_role.smep_andnot_wp
= smep && !is_write_protection(vcpu);
+ context->base_role.smm = is_smm(vcpu);
}
EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);

@@ -4239,7 +4241,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
++vcpu->kvm->stat.mmu_pte_write;
kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);

- mask.cr0_wp = mask.cr4_pae = mask.nxe = mask.smep_andnot_wp = 1;
+ mask.cr0_wp = mask.cr4_pae = mask.nxe = mask.smep_andnot_wp =
+ mask.smm = 1;
for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
if (detect_write_misaligned(sp, gpa, bytes) ||
detect_write_flooding(sp)) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 90ab62f54e1c..a4192c352744 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5542,7 +5542,10 @@ restart:
unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
- vcpu->arch.hflags = ctxt->emul_flags;
+ if (ctxt->emul_flags != vcpu->arch.hflags) {
+ vcpu->arch.hflags = ctxt->emul_flags;
+ kvm_mmu_reset_context(vcpu);
+ }
kvm_rip_write(vcpu, ctxt->eip);
if (r == EMULATE_DONE)
kvm_vcpu_check_singlestep(vcpu, rflags, &r);
--
1.8.3.1

2015-04-30 11:38:03

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag

This adds an arch-specific memslot flag that hides slots unless the
VCPU is in system management mode.

Some care is needed in order to limit the overhead of x86_gfn_to_memslot
compared with gfn_to_memslot. Thankfully, __gfn_to_memslot is just an
inline wrapper around search_memslots, so we can add an extra output
parameter to search_memslots. The compiler will optimize it away as dead
code in __gfn_to_memslot, and will use it to thread jumps in
x86_gfn_to_memslot.

Signed-off-by: Paolo Bonzini <[email protected]>
---
Documentation/virtual/kvm/api.txt | 18 ++++++++++++------
arch/x86/include/uapi/asm/kvm.h | 3 +++
arch/x86/kvm/smram.c | 25 +++++++++++++++++++++++--
include/linux/kvm_host.h | 14 ++++++++++----
virt/kvm/kvm_main.c | 4 ++++
5 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index c1afcb2e4b89..bc71f347782d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -931,18 +931,24 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr
be identical. This allows large pages in the guest to be backed by large
pages in the host.

-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of
-writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to
-use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
-to make a new slot read-only. In this case, writes to this memory will be
-posted to userspace as KVM_EXIT_MMIO exits.
+The flags field supports two architecture-independent flags:
+KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY. The former can be set
+to instruct KVM to keep track of writes to memory within the slot.
+See KVM_GET_DIRTY_LOG ioctl to know how to use it. The latter can
+be set, if KVM_CAP_READONLY_MEM capability allows it, to make a new
+slot read-only. In this case, writes to this memory will be posted to
+userspace as KVM_EXIT_MMIO exits.

When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
the memory region are automatically reflected into the guest. For example, an
mmap() that affects the region will be made visible immediately. Another
example is madvise(MADV_DROP).

+Each architecture can support other, architecture-specific flags. Right now
+only one is defined. On x86, if KVM_CAP_X86_SMM is available, the
+KVM_MEM_X86_SMRAM flag hides the slot from VCPUs that are not
+in system management mode.
+
It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl.
The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0bb09e2e549e..093b0dcb4c0c 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -45,6 +45,9 @@
#define __KVM_HAVE_XCRS
#define __KVM_HAVE_READONLY_MEM

+#define __KVM_ARCH_VALID_FLAGS KVM_MEM_X86_SMRAM
+#define KVM_MEM_X86_SMRAM (1 << 24)
+
/* Architectural interrupt line count. */
#define KVM_NR_INTERRUPTS 256

diff --git a/arch/x86/kvm/smram.c b/arch/x86/kvm/smram.c
index 73616edab631..e7dd933673a4 100644
--- a/arch/x86/kvm/smram.c
+++ b/arch/x86/kvm/smram.c
@@ -19,10 +19,23 @@

#include <linux/module.h>
#include <linux/kvm_host.h>
+#include "kvm_cache_regs.h"

struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
{
- struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
+ /* By using search_memslots directly the compiler can optimize away
+ * the "if (found)" check below.
+ *
+ * It cannot do the same for gfn_to_memslot because it is not inlined,
+ * and it also cannot do the same for __gfn_to_memslot because the
+ * kernel is compiled with -fno-delete-null-pointer-checks.
+ */
+ bool found;
+ struct kvm_memslots *memslots = kvm_memslots(vcpu->kvm);
+ struct kvm_memory_slot *slot = search_memslots(memslots, gfn, &found);
+
+ if (found && unlikely(slot->flags & KVM_MEM_X86_SMRAM) && !is_smm(vcpu))
+ return NULL;

return slot;
}
@@ -112,7 +125,15 @@ EXPORT_SYMBOL_GPL(x86_read_guest);
int x86_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
gpa_t gpa, unsigned long len)
{
- return kvm_gfn_to_hva_cache_init(kvm, ghc, gpa, len);
+ int r = kvm_gfn_to_hva_cache_init(kvm, ghc, gpa, len);
+
+ if (r < 0)
+ return r;
+
+ /* Use slow path for reads and writes to SMRAM. */
+ if (ghc->memslot && (ghc->memslot->flags & KVM_MEM_X86_SMRAM))
+ ghc->memslot = NULL;
+ return r;
}
EXPORT_SYMBOL_GPL(x86_gfn_to_hva_cache_init);

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e7cfee09970a..ea3db5d89f67 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -800,16 +800,18 @@ static inline void kvm_guest_exit(void)
* gfn_to_memslot() itself isn't here as an inline because that would
* bloat other code too much.
*/
-static inline struct kvm_memory_slot *
-search_memslots(struct kvm_memslots *slots, gfn_t gfn)
+static __always_inline struct kvm_memory_slot *
+search_memslots(struct kvm_memslots *slots, gfn_t gfn, bool *found)
{
int start = 0, end = slots->used_slots;
int slot = atomic_read(&slots->lru_slot);
struct kvm_memory_slot *memslots = slots->memslots;

if (gfn >= memslots[slot].base_gfn &&
- gfn < memslots[slot].base_gfn + memslots[slot].npages)
+ gfn < memslots[slot].base_gfn + memslots[slot].npages) {
+ *found = true;
return &memslots[slot];
+ }

while (start < end) {
slot = start + (end - start) / 2;
@@ -823,16 +825,20 @@ search_memslots(struct kvm_memslots *slots, gfn_t gfn)
if (gfn >= memslots[start].base_gfn &&
gfn < memslots[start].base_gfn + memslots[start].npages) {
atomic_set(&slots->lru_slot, start);
+ *found = true;
return &memslots[start];
}

+ *found = false;
return NULL;
}

static inline struct kvm_memory_slot *
__gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
{
- return search_memslots(slots, gfn);
+ bool found;
+
+ return search_memslots(slots, gfn, &found);
}

static inline unsigned long
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e82c46b9860f..b8b76106a003 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -716,6 +716,10 @@ static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
#ifdef __KVM_HAVE_READONLY_MEM
valid_flags |= KVM_MEM_READONLY;
#endif
+#ifdef __KVM_ARCH_VALID_FLAGS
+ BUILD_BUG_ON(__KVM_ARCH_VALID_FLAGS & KVM_MEMSLOT_INVALID);
+ valid_flags |= __KVM_ARCH_VALID_FLAGS;
+#endif

if (mem->flags & ~valid_flags)
return -EINVAL;
--
1.8.3.1

2015-04-30 11:37:08

by Paolo Bonzini

[permalink] [raw]
Subject: [PATCH 13/13] KVM: x86: advertise KVM_CAP_X86_SMM

TODO: disable if !unrestricted_guest && !emulate_invalid_guest_state.

Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/x86.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a4192c352744..ceb50dc9e71c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2801,6 +2801,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_MP_STATE:
case KVM_CAP_SYNC_MMU:
case KVM_CAP_USER_NMI:
+ case KVM_CAP_X86_SMM:
case KVM_CAP_REINJECT_CONTROL:
case KVM_CAP_IRQ_INJECT_STATUS:
case KVM_CAP_IOEVENTFD:
--
1.8.3.1

2015-05-04 14:02:00

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH 05/13] KVM: x86: pass host_initiated to functions that read MSRs

2015-04-30 13:36+0200, Paolo Bonzini:
> SMBASE is only readable from SMM for the VCPU, but it must be always
> accessible if userspace is accessing it. Thus, all functions that
> read MSRs are changed to accept a struct msr_data; the host_initiated
> and index fields are pre-initialized, while the data field is filled
> on return.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -1048,6 +1048,21 @@ EXPORT_SYMBOL_GPL(kvm_set_msr);
> /*
> * Adapt set_msr() to msr_io()'s calling convention
> */
> +static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
> +{
> + struct msr_data msr;
> + int r;
> +
> + msr.index = index;
> + msr.host_initiated = true;
> + r = kvm_set_msr(vcpu, &msr);

Should be kvm_get_msr().

> + if (r)
> + return r;
> +
> + *data = msr.data;
> + return 0;
> +}
> +
> @@ -3456,7 +3470,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> break;
> }
> case KVM_GET_MSRS:
> - r = msr_io(vcpu, argp, kvm_get_msr, 1);
> + r = msr_io(vcpu, argp, do_get_msr, 1);
> break;
> case KVM_SET_MSRS:
> r = msr_io(vcpu, argp, do_set_msr, 0);
> @@ -4948,7 +4962,17 @@ static void emulator_set_segment(struct x86_emulate_ctxt *ctxt, u16 selector,
> static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
> u32 msr_index, u64 *pdata)
> {
> - return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
> + struct msr_data msr;
> + int r;
> +
> + msr.index = msr_index;
> + msr.host_initiated = false;
> + r = kvm_get_msr(emul_to_vcpu(ctxt), &msr);
> + if (r)
> + return r;
> +
> + *pdata = msr.data;
> + return 0;
> }

(Only msr.host_initiated changed from do_get_msr() ...
I'd add a function with an extra bool arg and call it twice.)
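
Something along these lines would do it (an untested sketch of the
suggested helper, not code from the series; the __do_get_msr name is
made up):

    static int __do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data,
                            bool host_initiated)
    {
            struct msr_data msr;
            int r;

            msr.index = index;
            msr.host_initiated = host_initiated;
            r = kvm_get_msr(vcpu, &msr);
            if (r)
                    return r;

            *data = msr.data;
            return 0;
    }

    static int do_get_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
    {
            return __do_get_msr(vcpu, index, data, true);
    }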

2015-05-04 15:38:26

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: x86: API changes for SMM support

2015-04-30 13:36+0200, Paolo Bonzini:
> This patch includes changes to the external API for SMM support.
> All the changes are predicated by the availability of a new
> capability, KVM_CAP_X86_SMM, which is added at the end of the
> patch series.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> @@ -820,11 +820,19 @@ struct kvm_vcpu_events {
> } nmi;
> __u32 sipi_vector;
> __u32 flags;
> + struct {
> + __u8 smm;

34.3.1 Entering SMM:
Subsequent SMI requests are not acknowledged while the processor is in
SMM. The first SMI interrupt request that occurs while the processor
is in SMM (that is, after SMM has been acknowledged to external
hardware) is latched and serviced when the processor exits SMM with
the RSM instruction. The processor will latch only one SMI while in
SMM.

The final code doesn't handle pending SMIs at all, so we'll need to
store the pending SMI somewhere and expose it to userspace here.
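
A rough sketch of what that could look like (untested; the smi_pending
field, the smi_request() name and the KVM_REQ_SMI request-bit name are
assumptions, not code from this series):

    /* Latch an SMI that arrives while already in SMM; it is serviced
     * again when RSM leaves SMM, matching 34.3.1.
     */
    static void smi_request(struct kvm_vcpu *vcpu)
    {
            if (is_smm(vcpu))
                    vcpu->arch.smi_pending = true;
            else
                    kvm_make_request(KVM_REQ_SMI, vcpu);
    }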

> + __u8 pad[3];
> + } smi;
> };
>
> -KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
> -interrupt.shadow contains a valid state. Otherwise, this field is undefined.
> +Only two fields are defined in the flags field:

(Aren't KVM_VCPUEVENT_VALID_NMI_PENDING/KVM_VCPUEVENT_VALID_SIPI_VECTOR
defined and always 1/0?)

> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> @@ -281,6 +283,7 @@ struct kvm_reinject_control {
> #define KVM_VCPUEVENT_VALID_NMI_PENDING 0x00000001
> #define KVM_VCPUEVENT_VALID_SIPI_VECTOR 0x00000002
> #define KVM_VCPUEVENT_VALID_SHADOW 0x00000004
> +#define KVM_VCPUEVENT_VALID_SMM 0x00000008
>
> /* Interrupt shadow states */
> #define KVM_X86_SHADOW_INT_MOV_SS 0x01
> @@ -309,6 +312,10 @@ struct kvm_vcpu_events {
> } nmi;
> __u32 sipi_vector;
> __u32 flags;
> + struct {
> + __u8 smm;
> + __u8 pad[3];
> + } smi;
> __u32 reserved[10];

Nicely padded to use reserved[9] here :)

> };
>
> diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
> +static inline bool is_smm(struct kvm_vcpu *vcpu)
> +{
> + return vcpu->arch.hflags & HF_SMM_MASK;
> +}
> @@ -3116,29 +3121,31 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> struct kvm_vcpu_events *events)
> {
> + events->smi.smm = is_smm(vcpu);
>
> @@ -3172,6 +3180,13 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> kvm_vcpu_has_lapic(vcpu))
> vcpu->arch.apic->sipi_vector = events->sipi_vector;
>
> + if (events->flags & KVM_VCPUEVENT_VALID_SMM) {
> + if (events->smi.smm)
> + vcpu->arch.hflags |= HF_SMM_MASK;
> + else
> + vcpu->arch.hflags &= ~HF_SMM_MASK;
> + }

SMM is not really an event ... is this interface intended to be used for
emulating RSM in userspace (and after resume)?

> kvm_make_request(KVM_REQ_EVENT, vcpu);
>
> return 0;
> @@ -3431,6 +3446,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> r = kvm_vcpu_ioctl_nmi(vcpu);
> break;
> }
> + case KVM_SMI: {
> + r = kvm_vcpu_ioctl_smi(vcpu);
> + break;
> + }
> case KVM_SET_CPUID: {
> struct kvm_cpuid __user *cpuid_arg = argp;
> struct kvm_cpuid cpuid;
> @@ -6067,6 +6086,7 @@ static void post_kvm_run_save(struct kvm_vcpu *vcpu)
> struct kvm_run *kvm_run = vcpu->run;
>
> kvm_run->if_flag = (kvm_get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
> + kvm_run->flags = is_smm(vcpu) ? KVM_RUN_X86_SMM : 0;

This mirrors the smm bit from events -- was it added as an optimization?

> kvm_run->cr8 = kvm_get_cr8(vcpu);
> kvm_run->apic_base = kvm_get_apic_base(vcpu);
> if (irqchip_in_kernel(vcpu->kvm))
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4b60056776d1..cd51f0289e9f 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -202,7 +202,7 @@ struct kvm_run {
> __u32 exit_reason;
> __u8 ready_for_interrupt_injection;
> __u8 if_flag;
> - __u8 padding2[2];
> + __u16 flags;

(I think u8 would be better, it's not yet clear that we are going to
have that many flags. Little endianness of x86 makes later extension to
u16 easy, so u8 would just be keeping more options open.)

>
> /* in (pre_kvm_run), out (post_kvm_run) */
> __u64 cr8;

2015-05-04 16:03:07

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 07/13] KVM: x86: API changes for SMM support



On 04/05/2015 17:37, Radim Krčmář wrote:
> 2015-04-30 13:36+0200, Paolo Bonzini:
>> This patch includes changes to the external API for SMM support.
>> All the changes are predicated by the availability of a new
>> capability, KVM_CAP_X86_SMM, which is added at the end of the
>> patch series.
>>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> ---
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> @@ -820,11 +820,19 @@ struct kvm_vcpu_events {
>> } nmi;
>> __u32 sipi_vector;
>> __u32 flags;
>> + struct {
>> + __u8 smm;
>
> 34.3.1 Entering SMM:
> Subsequent SMI requests are not acknowledged while the processor is in
> SMM. The first SMI interrupt request that occurs while the processor
> is in SMM (that is, after SMM has been acknowledged to external
> hardware) is latched and serviced when the processor exits SMM with
> the RSM instruction. The processor will latch only one SMI while in
> SMM.
>
> The final code doesn't handle pending SMIs at all, so we'll need to
> store the pending SMI somewhere and expose it to userspace here.

Right, and I can add this to the slow path I already have for SMM exits:

+ if (ctxt->emul_flags != vcpu->arch.hflags) {
+ vcpu->arch.hflags = ctxt->emul_flags;
+ kvm_mmu_reset_context(vcpu);
+ }

Paolo

2015-05-04 16:04:50

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 05/13] KVM: x86: pass host_initiated to functions that read MSRs



On 04/05/2015 16:01, Radim Krčmář wrote:
>> static int emulator_get_msr(struct x86_emulate_ctxt *ctxt,
>> u32 msr_index, u64 *pdata)
>> {
>> - return kvm_get_msr(emul_to_vcpu(ctxt), msr_index, pdata);
>> + struct msr_data msr;
>> + int r;
>> +
>> + msr.index = msr_index;
>> + msr.host_initiated = false;
>> + r = kvm_get_msr(emul_to_vcpu(ctxt), &msr);
>> + if (r)
>> + return r;
>> +
>> + *pdata = msr.data;
>> + return 0;
>> }
>
> (Only msr.host_initiated changed from do_get_msr() ...
> I'd add a function with an extra bool arg and call it twice.)

True, but the same situation is there already for do_set_msr vs.
emulator_set_msr

Paolo

2015-05-04 17:51:12

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH 08/13] KVM: x86: stubs for SMM support

2015-04-30 13:36+0200, Paolo Bonzini:
> This patch adds the interface between x86.c and the emulator: the
> SMBASE register, a new emulator flag, the RSM instruction. It also
> adds a new request bit that will be used by the KVM_SMI ioctl.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -2505,7 +2505,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx)
> vmx->nested.nested_vmx_misc_low &= VMX_MISC_SAVE_EFER_LMA;
> vmx->nested.nested_vmx_misc_low |=
> VMX_MISC_EMULATED_PREEMPTION_TIMER_RATE |
> - VMX_MISC_ACTIVITY_HLT;
> + VMX_MISC_ACTIVITY_HLT | VMX_MISC_IA32_SMBASE_MSR;
> vmx->nested.nested_vmx_misc_high = 0;
> }
>
> bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
> @@ -2217,6 +2218,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_MISC_ENABLE:
> vcpu->arch.ia32_misc_enable_msr = data;
> break;
> + case MSR_IA32_SMBASE:
> + if (!msr_info->host_initiated)
> + return 1;
> + vcpu->arch.smbase = data;
> + break;
> case MSR_KVM_WALL_CLOCK_NEW:
> case MSR_KVM_WALL_CLOCK:
> vcpu->kvm->arch.wall_clock = data;
> @@ -2612,6 +2618,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> case MSR_IA32_MISC_ENABLE:
> msr_info->data = vcpu->arch.ia32_misc_enable_msr;
> break;
> + case MSR_IA32_SMBASE:
> + if (!msr_info->host_initiated && !is_smm(vcpu))
> + return 1;
> + msr_info->data = vcpu->arch.smbase;
> + break;

(I'm not sure if this is supported if IA32_VMX_BASIC[49] = 0.
34.15.6.4 Saving Guest State
The SMM-transfer monitor (STM) can also discover the current value of
the SMBASE register by using the RDMSR

but it's not possible to get into STM without having support for it
noted in IA32_VMX_BASIC[49] and more magic we also don't emulate to
actually enable it.)

> @@ -7208,6 +7240,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> vcpu->arch.regs_avail = ~0;
> vcpu->arch.regs_dirty = ~0;
>
> + vcpu->arch.smbase = 0x30000;

It's not reset on INIT, only on RESET. (34.11 SMBASE RELOCATION)
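
A possible shape for that (just a sketch; the init_event flag is a
hypothetical way to tell INIT apart from RESET and is not part of this
series):

    static void x86_reset_smbase(struct kvm_vcpu *vcpu, bool init_event)
    {
            /* 34.11: SMBASE survives INIT and is only reset on RESET. */
            if (!init_event)
                    vcpu->arch.smbase = 0x30000;
    }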

2015-05-04 19:59:13

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch

2015-04-30 13:36+0200, Paolo Bonzini:
> The big ugly one. This patch adds support for switching in and out of
> system management mode, respectively upon receiving KVM_REQ_SMI and upon
> executing a RSM instruction. Both 32- and 64-bit formats are supported
> for the SMM state save area.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> +static void rsm_set_desc_flags(struct desc_struct *desc, u16 flags)
> +{
> + desc->g = (flags >> 15) & 1;
> + desc->d = (flags >> 14) & 1;
> + desc->l = (flags >> 13) & 1;
> + desc->avl = (flags >> 12) & 1;
> + desc->p = (flags >> 7) & 1;
> + desc->dpl = (flags >> 5) & 3;
> + desc->s = (flags >> 4) & 1;
> + desc->type = flags & 15;

I can't find a description of this ... can you point me to a place where
the gap between 'p' and 'avl' is documented?
(Not that it matters unless the guest reads it, but it's a bit weird.)

Thanks.

2015-05-05 09:37:22

by Paolo Bonzini

Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch



On 04/05/2015 21:59, Radim Krčmář wrote:
> > The big ugly one. This patch adds support for switching in and out of
> > system management mode, respectively upon receiving KVM_REQ_SMI and upon
> > executing a RSM instruction. Both 32- and 64-bit formats are supported
> > for the SMM state save area.
> >
> > Signed-off-by: Paolo Bonzini <[email protected]>
> > ---
> > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > +static void rsm_set_desc_flags(struct desc_struct *desc, u16 flags)
> > +{
> > + desc->g = (flags >> 15) & 1;
> > + desc->d = (flags >> 14) & 1;
> > + desc->l = (flags >> 13) & 1;
> > + desc->avl = (flags >> 12) & 1;
> > + desc->p = (flags >> 7) & 1;
> > + desc->dpl = (flags >> 5) & 3;
> > + desc->s = (flags >> 4) & 1;
> > + desc->type = flags & 15;
>
> I can't find a description of this ... can you point me to a place where
> the gap between 'p' and 'avl' is documented?
> (Not that it matters unless the guest reads it, but it's a bit weird.)

It turns out that access rights are stored in the same format as the VMX
access rights. However, they are shifted by 8, which my code above
doesn't do (bug).

The documentation is, of course, QEMU and Bochs :) but you can also find
it in http://www.rcollins.org/ftp/source/include/struc.inc. It is not
exactly for SMM, but it is more or less the same.

Paolo

2015-05-05 09:38:07

by Paolo Bonzini

Subject: Re: [PATCH 08/13] KVM: x86: stubs for SMM support



On 04/05/2015 19:51, Radim Krčmář wrote:
> (I'm not sure if this is supported if IA32_VMX_BASIC[49] = 0.
> 34.15.6.4 Saving Guest State
> The SMM-transfer monitor (STM) can also discover the current value of
> the SMBASE register by using the RDMSR
>
> but it's not possible to get into STM without having a support for it
> noted in IA32_VMX_BASIC[49] and more magic we also don't emulate to
> actually enable it.)

I can certainly make the SMBASE MSR host-accessible only.
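
i.e. the read side would keep only the host_initiated check, something like:

        case MSR_IA32_SMBASE:
                if (!msr_info->host_initiated)
                        return 1;       /* no longer readable from SMM */
                msr_info->data = vcpu->arch.smbase;
                break;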

Paolo

2015-05-05 12:48:46

by Radim Krčmář

Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch

2015-05-05 11:37+0200, Paolo Bonzini:
> On 04/05/2015 21:59, Radim Krčmář wrote:
> > > The big ugly one. This patch adds support for switching in and out of
> > > system management mode, respectively upon receiving KVM_REQ_SMI and upon
> > > executing a RSM instruction. Both 32- and 64-bit formats are supported
> > > for the SMM state save area.
> > >
> > > Signed-off-by: Paolo Bonzini <[email protected]>
> > > ---
> > > diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> > > +static void rsm_set_desc_flags(struct desc_struct *desc, u16 flags)
> > > +{
> > > + desc->g = (flags >> 15) & 1;
> > > + desc->d = (flags >> 14) & 1;
> > > + desc->l = (flags >> 13) & 1;
> > > + desc->avl = (flags >> 12) & 1;
> > > + desc->p = (flags >> 7) & 1;
> > > + desc->dpl = (flags >> 5) & 3;
> > > + desc->s = (flags >> 4) & 1;
> > > + desc->type = flags & 15;
> >
> > I can't find a description of this ... can you point me to a place where
> > the gap between 'p' and 'avl' is documented?
> > (Not that it matters unless the guest reads it, but it's a bit weird.)
>
> It turns out that access rights are stored in the same format as the VMX
> access rights.

Thanks, so it really has a "reserved" space in the middle, to save some
processing because the format is horrible (backward compatible).

> access rights. However, they are shifted by 8, which my code above
> doesn't do (bug).

I think you are shifting it right, though ... they wouldn't fit into a
word if shifted left.

(I'd just shorten it after finding the right name for mask
u32 mask = 0x00f0ff00;
desc->b = (desc->b & ~mask) | ((flags << 8) & mask);

and maybe define 'attributes' field in desc that is already shifted.)

> The documentation is, of course, QEMU and Bochs :) but you can also find
> it in http://www.rcollins.org/ftp/source/include/struc.inc. It is not
> exactly for SMM, but it is more or less the same.

(It's a register hidden from software, so I have some trust issues :])

2015-05-05 13:50:38

by Paolo Bonzini

Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch



On 05/05/2015 14:48, Radim Krčmář wrote:
>>>> > > > +{
>>>> > > > + desc->g = (flags >> 15) & 1;
>>>> > > > + desc->d = (flags >> 14) & 1;
>>>> > > > + desc->l = (flags >> 13) & 1;
>>>> > > > + desc->avl = (flags >> 12) & 1;
>>>> > > > + desc->p = (flags >> 7) & 1;
>>>> > > > + desc->dpl = (flags >> 5) & 3;
>>>> > > > + desc->s = (flags >> 4) & 1;
>>>> > > > + desc->type = flags & 15;
>>> > >
>>> > > I can't find a description of this ... can you point me to a place where
>>> > > the gap between 'p' and 'avl' is documented?
>>> > > (Not that it matters unless the guest reads it, but it's a bit weird.)
>> >
>> > It turns out that access rights are stored in the same format as the VMX
>> > access rights.
> Thanks, so it really has a "reserved" space in the middle, to save some
> processing because the format is horrible (backward compatible).
>
>> > access rights. However, they are shifted by 8, which my code above
>> > doesn't do (bug).
> I think you are shifting it right, though ... they wouldn't fit into a
> word if shifted left.

Right, they have to be shifted right in the 64-bit case but not the
32-bit case.

Paolo

2015-05-05 16:08:45

by Bandan Das

Subject: Re: [PATCH 02/13] KVM: reuse memslot in kvm_write_guest_page

Paolo Bonzini <[email protected]> writes:

> From: Radim Krčmář <[email protected]>
>
> Saves one O(log N) search.

It really doesn't hurt to change this to:
"Caching memslot value helps us avoid another O(log N) search
later when calling mark_page_dirty_in_slot()."

Sorry I am not a fan of commit messages less than 75 characters :)
Or you can also call me dumb :)

> Signed-off-by: Radim Krčmář <[email protected]>
> Message-Id: <[email protected]>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> virt/kvm/kvm_main.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 90977418aeb6..b6d415156283 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1590,15 +1590,17 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
> int offset, int len)
> {
> int r;
> + struct kvm_memory_slot *memslot;
> unsigned long addr;
>
> - addr = gfn_to_hva(kvm, gfn);
> + memslot = gfn_to_memslot(kvm, gfn);
> + addr = gfn_to_hva_memslot(memslot, gfn);
> if (kvm_is_error_hva(addr))
> return -EFAULT;
> r = __copy_to_user((void __user *)addr + offset, data, len);
> if (r)
> return -EFAULT;
> - mark_page_dirty(kvm, gfn);
> + mark_page_dirty_in_slot(kvm, memslot, gfn);
> return 0;
> }
> EXPORT_SYMBOL_GPL(kvm_write_guest_page);

2015-05-05 16:11:18

by Bandan Das

Subject: Re: [PATCH 06/13] KVM: x86: pass the whole hflags field to emulator and back

Paolo Bonzini <[email protected]> writes:

> The hflags field will contain information about system management mode
> and will be useful for the emulator. Pass the entire field rather than
> just the guest-mode information.

With respect to maintaining maximum isolation between vcpu internals and
the emulator, why not just "bool smm_mode" ?

> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> arch/x86/include/asm/kvm_emulate.h | 5 ++++-
> arch/x86/kvm/emulate.c | 6 +++---
> arch/x86/kvm/x86.c | 4 +++-
> 3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
> index 57a9d94fe160..7410879a41f7 100644
> --- a/arch/x86/include/asm/kvm_emulate.h
> +++ b/arch/x86/include/asm/kvm_emulate.h
> @@ -262,6 +262,9 @@ enum x86emul_mode {
> X86EMUL_MODE_PROT64, /* 64-bit (long) mode. */
> };
>
> +/* These match some of the HF_* flags defined in kvm_host.h */
> +#define X86EMUL_GUEST_MASK (1 << 5) /* VCPU is in guest-mode */
> +
> struct x86_emulate_ctxt {
> const struct x86_emulate_ops *ops;
>
> @@ -273,8 +276,8 @@ struct x86_emulate_ctxt {
>
> /* interruptibility state, as a result of execution of STI or MOV SS */
> int interruptibility;
> + int emul_flags;
>
> - bool guest_mode; /* guest running a nested guest */
> bool perm_ok; /* do not check permissions if true */
> bool ud; /* inject an #UD if host doesn't support insn */
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 630bcb0d7a04..cdb612b50910 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -4871,7 +4871,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
> fetch_possible_mmx_operand(ctxt, &ctxt->dst);
> }
>
> - if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
> + if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && ctxt->intercept) {
> rc = emulator_check_intercept(ctxt, ctxt->intercept,
> X86_ICPT_PRE_EXCEPT);
> if (rc != X86EMUL_CONTINUE)
> @@ -4900,7 +4900,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
> goto done;
> }
>
> - if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
> + if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
> rc = emulator_check_intercept(ctxt, ctxt->intercept,
> X86_ICPT_POST_EXCEPT);
> if (rc != X86EMUL_CONTINUE)
> @@ -4953,7 +4953,7 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>
> special_insn:
>
> - if (unlikely(ctxt->guest_mode) && (ctxt->d & Intercept)) {
> + if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
> rc = emulator_check_intercept(ctxt, ctxt->intercept,
> X86_ICPT_POST_MEMACCESS);
> if (rc != X86EMUL_CONTINUE)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 856598afa6b4..6009e6a0e406 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5132,7 +5132,8 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
> (cs_l && is_long_mode(vcpu)) ? X86EMUL_MODE_PROT64 :
> cs_db ? X86EMUL_MODE_PROT32 :
> X86EMUL_MODE_PROT16;
> - ctxt->guest_mode = is_guest_mode(vcpu);
> + BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
> + ctxt->emul_flags = vcpu->arch.hflags;
>
> init_decode_cache(ctxt);
> vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
> @@ -5500,6 +5501,7 @@ restart:
> unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
> toggle_interruptibility(vcpu, ctxt->interruptibility);
> vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
> + vcpu->arch.hflags = ctxt->emul_flags;
> kvm_rip_write(vcpu, ctxt->eip);
> if (r == EMULATE_DONE)
> kvm_vcpu_check_singlestep(vcpu, rflags, &r);

2015-05-05 16:16:55

by Paolo Bonzini

Subject: Re: [PATCH 06/13] KVM: x86: pass the whole hflags field to emulator and back



On 05/05/2015 17:47, Bandan Das wrote:
> > The hflags field will contain information about system management mode
> > and will be useful for the emulator. Pass the entire field rather than
> > just the guest-mode information.
>
> With respect to maintaining maximum isolation between vcpu internals and
> the emulator,

Isolation is maintained. hflags are simply parts of the processor state
that are not visible to the guest, you can choose to include them as
separate flags or in a single one. Bundling them in a single flag makes
it a bit faster to pass around many of them.

While we do not need all of them in the emulator, that's in some cases a
limitation of the emulator. For example, if we wanted to emulate
CLGI/STGI we would need either HF_GIF_MASK or another flag. Likewise,
HF_NMI_MASK could replace emulator_set_nmi_mask (especially in v2 of the
series, which will add kvm_set_hflags).
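
(Concretely, the SMM flag gets paired with its HF_* counterpart the same way
the guest-mode flag already is in patch 6; the bit value here is only
illustrative:)

/* These match some of the HF_* flags defined in kvm_host.h */
#define X86EMUL_GUEST_MASK   (1 << 5) /* VCPU is in guest-mode */
#define X86EMUL_SMM_MASK     (1 << 6) /* VCPU is in system management mode */

and init_emulate_ctxt would grow a second BUILD_BUG_ON:

        BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
        BUILD_BUG_ON(HF_SMM_MASK != X86EMUL_SMM_MASK);
        ctxt->emul_flags = vcpu->arch.hflags;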

However, if you prefer, I can change it to "bool smm_mode" + a new
smm_exit emulator callback.

Paolo

> why not just "bool smm_mode" ?
>

2015-05-05 16:31:05

by Radim Krčmář

Subject: Re: [PATCH 02/13] KVM: reuse memslot in kvm_write_guest_page

2015-05-05 11:03-0400, Bandan Das:
> Paolo Bonzini <[email protected]> writes:
>
>> From: Radim Krčmář <[email protected]>
>>
>> Saves one O(log N) search.
>
> It really doesn't hurt to change this to:
> "Caching memslot value helps us avoid another O(log N) search
> later when calling mark_page_dirty_in_slot()."
>
> Sorry I am not a fan of commit messages less than 75 characters :)

Sure, I just couldn't find a message that would be easier to understand
than the code and not misleading at the same time ... what came out was
a teaser for people who care about performance :)

In verbose mode, I'd write it like this:
gfn_to_hva() and mark_page_dirty() both call gfn_to_memslot().
We can store a result of gfn_to_memslot() (memslot) and unwrap
gfn_to_hva() and mark_page_dirty() to directly call functions that
accept a memslot.
gfn_to_memslot() does a binary search, so it also saves some cycles.

2015-05-05 16:36:48

by Bandan Das

Subject: Re: [PATCH 07/13] KVM: x86: API changes for SMM support

Paolo Bonzini <[email protected]> writes:

> This patch includes changes to the external API for SMM support.
> All the changes are predicated by the availability of a new
> capability, KVM_CAP_X86_SMM, which is added at the end of the
> patch series.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> Documentation/virtual/kvm/api.txt | 34 ++++++++++++++++++++++++++++++----
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/include/uapi/asm/kvm.h | 7 +++++++
> arch/x86/kvm/kvm_cache_regs.h | 5 +++++
> arch/x86/kvm/x86.c | 30 +++++++++++++++++++++++++-----
> include/uapi/linux/kvm.h | 5 ++++-
> 6 files changed, 72 insertions(+), 10 deletions(-)
...
>
>
> +#define KVM_RUN_X86_SMM (1 << 0)
> +
> /* for KVM_GET_REGS and KVM_SET_REGS */
> struct kvm_regs {
> /* out (KVM_GET_REGS) / in (KVM_SET_REGS) */
> @@ -281,6 +283,7 @@ struct kvm_reinject_control {
> #define KVM_VCPUEVENT_VALID_NMI_PENDING 0x00000001
> #define KVM_VCPUEVENT_VALID_SIPI_VECTOR 0x00000002
> #define KVM_VCPUEVENT_VALID_SHADOW 0x00000004
> +#define KVM_VCPUEVENT_VALID_SMM 0x00000008

Note the formatting above is different in Linus' tree.
I can't seem to find where it changed.
/* When set in flags, include corresponding fields on KVM_SET_VCPU_EVENTS */
#define KVM_VCPUEVENT_VALID_NMI_PENDING 0x00000001
#define KVM_VCPUEVENT_VALID_SIPI_VECTOR 0x00000002
#define KVM_VCPUEVENT_VALID_SHADOW 0x00000004

...
> +static int kvm_vcpu_ioctl_smi(struct kvm_vcpu *vcpu)
> +{
> + return 0;
> +}
> +
> static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu,
> struct kvm_tpr_access_ctl *tac)
> {
> @@ -3116,29 +3121,31 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> struct kvm_vcpu_events *events)
> {
> process_nmi(vcpu);
> +
> + memset(events, 0, sizeof(*events));
I think that it's better that the caller memsets this before passing it over
to the *_get_vcpu_events function.
...
>
> /* in (pre_kvm_run), out (post_kvm_run) */
> __u64 cr8;
> @@ -814,6 +814,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_S390_INJECT_IRQ 113
> #define KVM_CAP_S390_IRQ_STATE 114
> #define KVM_CAP_PPC_HWRNG 115
> +#define KVM_CAP_X86_SMM 120
Why didn't we reserve the next available number here ?

> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -1199,6 +1200,8 @@ struct kvm_s390_ucas_mapping {
> /* Available with KVM_CAP_S390_IRQ_STATE */
> #define KVM_S390_SET_IRQ_STATE _IOW(KVMIO, 0xb5, struct kvm_s390_irq_state)
> #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
> +/* Available with KVM_CAP_X86_SMM */
> +#define KVM_SMI _IO(KVMIO, 0xb7)
>
> #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
> #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)

2015-05-05 16:45:56

by Paolo Bonzini

Subject: Re: [PATCH 07/13] KVM: x86: API changes for SMM support



On 05/05/2015 18:36, Bandan Das wrote:
>> > @@ -3116,29 +3121,31 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
>> > struct kvm_vcpu_events *events)
>> > {
>> > process_nmi(vcpu);
>> > +
>> > + memset(events, 0, sizeof(*events));
> I think that it's better that the caller memsets this before passing it over
> to the *_get_vcpu_events function.

I should have actually removed the

memset(&events->reserved, 0, sizeof(events->reserved));

and padding memsets as well. But I'll instead remove the memset and
make sure to zero out events->smi.pad.
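
Roughly, assuming the smi block from patch 7 keeps an explicit pad byte:

        events->smi.smm = is_smm(vcpu);
        events->smi.pad = 0;
        /* the existing memsets of events->reserved and the other
         * pad fields stay as they are */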

>> >
>> > /* in (pre_kvm_run), out (post_kvm_run) */
>> > __u64 cr8;
>> > @@ -814,6 +814,7 @@ struct kvm_ppc_smmu_info {
>> > #define KVM_CAP_S390_INJECT_IRQ 113
>> > #define KVM_CAP_S390_IRQ_STATE 114
>> > #define KVM_CAP_PPC_HWRNG 115
>> > +#define KVM_CAP_X86_SMM 120
> Why didn't we reserve the next available number here ?

Because it will be a while before this series is ready, and I got bored
of having to modify QEMU every time somebody used the next capability. :)

Paolo

>> > #ifdef KVM_CAP_IRQ_ROUTING
>> >
>> > @@ -1199,6 +1200,8 @@ struct kvm_s390_ucas_mapping {
>> > /* Available with KVM_CAP_S390_IRQ_STATE */
>> > #define KVM_S390_SET_IRQ_STATE _IOW(KVMIO, 0xb5, struct kvm_s390_irq_state)
>> > #define KVM_S390_GET_IRQ_STATE _IOW(KVMIO, 0xb6, struct kvm_s390_irq_state)
>> > +/* Available with KVM_CAP_X86_SMM */
>> > +#define KVM_SMI _IO(KVMIO, 0xb7)
>> >
>> > #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
>> > #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)

2015-05-05 17:17:53

by Radim Krčmář

Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag

2015-04-30 13:36+0200, Paolo Bonzini:
> This adds an arch-specific memslot flag that hides slots unless the
> VCPU is in system management mode.
>
> Some care is needed in order to limit the overhead of x86_gfn_to_memslot
> when compared with gfn_to_memslot. Thankfully, we have __gfn_to_memslot
> and search_memslots which are the same, so we can add some extra output
> to search_memslots. The compiler will optimize it as dead code in
> __gfn_to_memslot, and will use it to thread jumps in x86_gfn_to_memslot.
>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> @@ -19,10 +19,23 @@
>
> #include <linux/module.h>
> #include <linux/kvm_host.h>
> +#include "kvm_cache_regs.h"
>
> struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
> {
> - struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
> + /* By using search_memslots directly the compiler can optimize away
> + * the "if (found)" check below.
> + *
> + * It cannot do the same for gfn_to_memslot because it is not inlined,
> + * and it also cannot do the same for __gfn_to_memslot because the
> + * kernel is compiled with -fno-delete-null-pointer-checks.
> + */
> + bool found;
> + struct kvm_memslots *memslots = kvm_memslots(vcpu->kvm);
> + struct kvm_memory_slot *slot = search_memslots(memslots, gfn, &found);
> +
> + if (found && unlikely(slot->flags & KVM_MEM_X86_SMRAM) && !is_smm(vcpu))
> + return NULL;

Patch [10/13] made me sad and IIUIC, the line above is the only reason
for it ... what about renaming and changing kvm_* memory function to
vcpu_* and create
bool kvm_arch_vcpu_can_access_slot(vcpu, slot)
which could also be inline in arch/*/include/asm/kvm_host.h thanks to
the way we build.
We could be passing both kvm and vcpu in internal memslot operations and
not checking if vcpu is NULL. This should allow all possible operations
with little code duplication and the compiler could also optimize the
case where vcpu is NULL.
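
(For concreteness, the x86 version of that helper could be as small as the
sketch below; the name and placement are only the suggestion above, nothing
that exists yet:)

static inline bool kvm_arch_vcpu_can_access_slot(struct kvm_vcpu *vcpu,
                                                 struct kvm_memory_slot *slot)
{
        /* SMRAM slots are visible only while the VCPU is in SMM. */
        return !(slot->flags & KVM_MEM_X86_SMRAM) || is_smm(vcpu);
}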

Another option is adding something like "vcpu kvm_arch_fake_vcpu(kvm)"
for cases where the access doesn't have an associated vcpu, so it would
always succeed. (Might not be generic enough.)

I prefer everything to copy-pasting code, so I'll try to come up with
more ideas if you don't like these :)

2015-05-05 18:39:11

by Bandan Das

Subject: Re: [PATCH 08/13] KVM: x86: stubs for SMM support

Radim Krčmář <[email protected]> writes:
...
>> + break;
>
> (I'm not sure if this is supported if IA32_VMX_BASIC[49] = 0.
> 34.15.6.4 Saving Guest State
> The SMM-transfer monitor (STM) can also discover the current value of
> the SMBASE register by using the RDMSR
>
> but it's not possible to get into STM without having a support for it
> noted in IA32_VMX_BASIC[49] and more magic we also don't emulate to
> actually enable it.)

Where does it mention IA32_VMX_BASIC[49] ? I only see "IA32_VMX_MISC[15] should be 1"
in 34.15.6.4. Anyway, I think we should do what the spec says..

>> @@ -7208,6 +7240,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
>> vcpu->arch.regs_avail = ~0;
>> vcpu->arch.regs_dirty = ~0;
>>
>> + vcpu->arch.smbase = 0x30000;
>
> It's not reset on INIT, only on RESET. (34.11 SMBASE RELOCATION)
I remember mentioning it elsewhere - IMO kvm_vcpu_reset() and kvm_vcpu_init()
should really be two different interfaces. I don't mean code duplication - one
can just call the other but different names will be of some help when it comes
to the million places where the spec mentions INIT and RESET have different
behavior.

Bandan

2015-05-05 18:41:01

by Radim Krčmář

Subject: Re: [RFC PATCH 00/13] KVM: x86: SMM support

2015-04-30 13:35+0200, Paolo Bonzini:
> This patch series introduces system management mode support.
> There is still some work to do, namely: test without unrestricted
> guest support, test on AMD, disable the capability if !unrestricted
> guest and !emulate invalid guest state(*), test with a QEMU that
> understand KVM_MEM_X86_SMRAM, actually post QEMU patches that let
> you use this.
>
> (*) newer chipsets moved away from legacy SMRAM at 0xa0000,
> thus support for real mode CS base above 1M is necessary
>
> Because legacy SMRAM is a mess, I have tried these patches with Q35's
> high SMRAM (at 0xfeda0000). This means that right now this isn't
> the easiest thing to test; you need QEMU patches that add support
> for high SMRAM, and SeaBIOS patches to use high SMRAM. Until QEMU
> support for KVM_MEM_X86_SMRAM is in place, also, I'm keeping SMRAM
> open in SeaBIOS.
>
> That said, even this clumsy and incomplete userspace configuration is
> enough to test all patches except 11 and 12.
>
> The series is structured as follows.
>
> Patch 1 is an unrelated bugfix (I think). Patches 2 to 6 extend some
> infrastructure functions. Patches 1 to 4 could be committed right now.
>
> Patches 7 to 9 implement basic support for SMM in the KVM API
> and teach KVM about doing the world switch on SMI and RSM.
>
> Patch 10 touches all places in KVM that read/write guest memory to
> go through an x86-specific function. The x86-specific function takes
> a VCPU rather than a struct kvm. This is used in patches 11 and 12
> to limits access to specially marked SMRAM slots unless the VCPU is
> in system management mode.
>
> Finally, patch 13 exposes the new capability for userspace to probe.

I lost all concentration, so I'll write down design problems I noticed
while reviewing till now in case it helps:
(haven't mentioned the first three yet.)

- Whole SMRAM is writeable. Spec says that parts of state should be
read-only. (This seems hard to fix without trapping all writes.)

- I/O restarting is not enabled. (APM 2:10.2.4 SMM-Revision Identifier
says that AMD64 always sets this bit.)

- NMI handling has some quirks. (Software can enable NMI and another
SMI should mask them again.)

- SMIs received while in SMM aren't handled. (One can be pending.)

- SMM and userspace.
We can get if smm is enabled at two separate places (flag from KVM_RUN
and in KVM_GET_VCPU_EVENTS) and toggle it via KVM_SET_VCPU_EVENTS.

It's not an event, so I wouldn't include it in EVENTS API ...
Letting the flag in KVM_RUN also toggle SMM would be easiest.
Otherwise, wouldn't GET/SET_ONE_REG be a better match for it?

- [10/13] :)

2015-05-05 18:48:38

by Radim Krčmář

Subject: Re: [PATCH 08/13] KVM: x86: stubs for SMM support

2015-05-05 14:38-0400, Bandan Das:
> Radim Krčmář <[email protected]> writes:
> ...
> >> + break;
> >
> > (I'm not sure if this is supported if IA32_VMX_BASIC[49] = 0.
> > 34.15.6.4 Saving Guest State
> > The SMM-transfer monitor (STM) can also discover the current value of
> > the SMBASE register by using the RDMSR
> >
> > but it's not possible to get into STM without having a support for it
> > noted in IA32_VMX_BASIC[49] and more magic we also don't emulate to
> > actually enable it.)
>
> Where does it mention IA32_VMX_BASIC[49] ? I only see "IA32_VMX_MISC[15] should be 1"
> in 34.15.6.4. Anyway, I think we should do what the spec says..

The relevant part is "SMM-transfer monitor (STM) can" -- you can't be
STM without IA32_VMX_MISC[15] and a bunch of other stuff.

Testing on real hardware would be the best way to tell ...
(Till we know, I'm okay with anything.)

> >> @@ -7208,6 +7240,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu)
> >> vcpu->arch.regs_avail = ~0;
> >> vcpu->arch.regs_dirty = ~0;
> >>
> >> + vcpu->arch.smbase = 0x30000;
> >
> > It's not reset on INIT, only on RESET. (34.11 SMBASE RELOCATION)
> I remember mentioning it elsewhere - IMO kvm_vcpu_reset() and kvm_vcpu_init()
> should really be two different interfaces. I don't mean code duplication - one
> can just call the other but different names will be of some help when it comes
> to the million places where the spec mentions INIT and RESET have different
> behavior.

Agreed.

2015-05-05 20:44:44

by Bandan Das

Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch

Paolo Bonzini <[email protected]> writes:

> +static inline bool guest_cpuid_has_longmode(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpuid_entry2 *best;
> +
> + best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
> + return best && (best->edx & bit(X86_FEATURE_LM));
> +}
> +

We could combine all guest_cpuid_has* functions into a single
function guest_cpuid_has_feature(vcpu, feature) and avoid code duplication.
Not relevant to this change - just a note (to self).
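
(Such a helper could be as simple as the sketch below; the name and the
register-selector argument are made up for illustration:)

static bool guest_cpuid_has_feature(struct kvm_vcpu *vcpu, u32 function,
                                    u32 index, int reg, u32 mask)
{
        struct kvm_cpuid_entry2 *best;

        best = kvm_find_cpuid_entry(vcpu, function, index);
        if (!best)
                return false;

        switch (reg) {
        case 0: return best->eax & mask;
        case 1: return best->ebx & mask;
        case 2: return best->ecx & mask;
        default: return best->edx & mask;
        }
}

/* guest_cpuid_has_longmode() would then be
 * guest_cpuid_has_feature(vcpu, 0x80000001, 0, 3, bit(X86_FEATURE_LM))
 */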

> static inline bool guest_cpuid_has_osvw(struct kvm_vcpu *vcpu)
> {
> struct kvm_cpuid_entry2 *best;
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index f6b641207416..a49606871277 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2262,12 +2262,253 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
> return rc;
> }
>
> +static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
> +{
> + u32 eax, ebx, ecx, edx;
> +
> + eax = 0x80000001;
> + ecx = 0;
> + ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
> + return edx & bit(X86_FEATURE_LM);
> +}
> +
> +#define get_smstate(type, smbase, offset) \
> + ({ \
> + type __val; \
> + int r = ctxt->ops->read_std(ctxt, smbase + offset, &__val, \
> + sizeof(__val), NULL); \
> + if (r != X86EMUL_CONTINUE) \
> + return X86EMUL_UNHANDLEABLE; \
> + __val; \
> + })
> +
> +static void rsm_set_desc_flags(struct desc_struct *desc, u16 flags)
> +{
> + desc->g = (flags >> 15) & 1;
> + desc->d = (flags >> 14) & 1;
> + desc->l = (flags >> 13) & 1;
> + desc->avl = (flags >> 12) & 1;
> + desc->p = (flags >> 7) & 1;
> + desc->dpl = (flags >> 5) & 3;
> + desc->s = (flags >> 4) & 1;
> + desc->type = flags & 15;
> +}
> +
> +static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
> +{
> + struct desc_struct desc;
> + int offset;
> + u16 selector;
> +
> + selector = get_smstate(u32, smbase, 0x7fa8 + n * 4);
Probably a good idea to use #defines for all the offsets here
and elsewhere.

> +
> + if (n < 3)
> + offset = 0x7f84 + n * 12;
> + else
> + offset = 0x7f2c + (n - 3) * 12;
> +
> + set_desc_base(&desc, get_smstate(u32, smbase, offset + 8));
> + set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4));
> + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, offset));
> + ctxt->ops->set_segment(ctxt, selector, &desc, 0, n);
> + return X86EMUL_CONTINUE;
> +}
> +
> +static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
> +{
> + struct desc_struct desc;
> + int offset;
> + u16 selector;
> + u32 base3;
> +
> + offset = 0x7e00 + n * 16;
> +
> + selector = get_smstate(u16, smbase, offset);
> + rsm_set_desc_flags(&desc, get_smstate(u16, smbase, offset + 2));
> + set_desc_limit(&desc, get_smstate(u32, smbase, offset + 4));
> + set_desc_base(&desc, get_smstate(u32, smbase, offset + 8));
> + base3 = get_smstate(u32, smbase, offset + 12);
> +
> + ctxt->ops->set_segment(ctxt, selector, &desc, base3, n);
> + return X86EMUL_CONTINUE;
> +}
> +
> +static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt,
> + u64 cr0, u64 cr4)
> +{
> + int bad;
> +
> + /*
> + * First enable PAE, long mode needs it before CR0.PG = 1 is set.
> + * Then enable protected mode. However, PCID cannot be enabled
> + * if EFER.LMA=0, so set it separately.
> + */
> + bad = ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);
> + if (bad)
> + return X86EMUL_UNHANDLEABLE;
> +
> + bad = ctxt->ops->set_cr(ctxt, 0, cr0);
> + if (bad)
> + return X86EMUL_UNHANDLEABLE;
> +
> + if (cr4 & X86_CR4_PCIDE) {
> + bad = ctxt->ops->set_cr(ctxt, 4, cr4);
> + if (bad)
> + return X86EMUL_UNHANDLEABLE;
> + }
> +
> + return X86EMUL_CONTINUE;
> +}
> +
> +static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase)
> +{
> + struct desc_struct desc;
> + struct desc_ptr dt;
> + u16 selector;
> + u32 val, cr0, cr4;
> + int i;
> +
> + cr0 = get_smstate(u32, smbase, 0x7ffc);
> + ctxt->ops->set_cr(ctxt, 3, get_smstate(u32, smbase, 0x7ff8));
> + ctxt->eflags = get_smstate(u32, smbase, 0x7ff4) | X86_EFLAGS_FIXED;
> + ctxt->_eip = get_smstate(u32, smbase, 0x7ff0);
> +
> + for (i = 0; i < 8; i++)
> + *reg_write(ctxt, i) = get_smstate(u32, smbase, 0x7fd0 + i * 4);
> +
> + val = get_smstate(u32, smbase, 0x7fcc);
> + ctxt->ops->set_dr(ctxt, 6, (val & DR6_VOLATILE) | DR6_FIXED_1);
> + val = get_smstate(u32, smbase, 0x7fc8);
> + ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1);
> +
> + selector = get_smstate(u32, smbase, 0x7fc4);
> + set_desc_base(&desc, get_smstate(u32, smbase, 0x7f64));
> + set_desc_limit(&desc, get_smstate(u32, smbase, 0x7f60));
> + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7f5c));
> + ctxt->ops->set_segment(ctxt, selector, &desc, 0, VCPU_SREG_TR);
> +
> + selector = get_smstate(u32, smbase, 0x7fc0);
> + set_desc_base(&desc, get_smstate(u32, smbase, 0x7f80));
> + set_desc_limit(&desc, get_smstate(u32, smbase, 0x7f7c));
> + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7f78));
> + ctxt->ops->set_segment(ctxt, selector, &desc, 0, VCPU_SREG_LDTR);
> +
> + dt.address = get_smstate(u32, smbase, 0x7f74);
> + dt.size = get_smstate(u32, smbase, 0x7f70);
> + ctxt->ops->set_gdt(ctxt, &dt);
> +
> + dt.address = get_smstate(u32, smbase, 0x7f58);
> + dt.size = get_smstate(u32, smbase, 0x7f54);
> + ctxt->ops->set_idt(ctxt, &dt);
> +
> + for (i = 0; i < 6; i++) {
> + int r = rsm_load_seg_32(ctxt, smbase, i);
> + if (r != X86EMUL_CONTINUE)
> + return r;
This return is redundant since rsm_load_seg_32 always returns
success. Same for rsm_load_seg_64()

> + }
> +
> + cr4 = get_smstate(u32, smbase, 0x7f14);
> +
> + ctxt->ops->set_smbase(ctxt, get_smstate(u32, smbase, 0x7ef8));
> +
> + return rsm_enter_protected_mode(ctxt, cr0, cr4);
> +}
> +
> +static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase)
> +{
> + struct desc_struct desc;
> + struct desc_ptr dt;
> + u64 val, cr0, cr4;
> + u32 base3;
> + u16 selector;
> + int i;
> +
> + for (i = 0; i < 16; i++)
> + *reg_write(ctxt, i) = get_smstate(u64, smbase, 0x7ff8 - i * 8);
> +
> + ctxt->_eip = get_smstate(u64, smbase, 0x7f78);
> + ctxt->eflags = get_smstate(u32, smbase, 0x7f70) | X86_EFLAGS_FIXED;
> +
> + val = get_smstate(u32, smbase, 0x7f68);
> + ctxt->ops->set_dr(ctxt, 6, (val & DR6_VOLATILE) | DR6_FIXED_1);
> + val = get_smstate(u32, smbase, 0x7f60);
> + ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1);
> +
> + cr0 = get_smstate(u64, smbase, 0x7f58);
> + ctxt->ops->set_cr(ctxt, 3, get_smstate(u64, smbase, 0x7f50));
> + cr4 = get_smstate(u64, smbase, 0x7f48);
> + ctxt->ops->set_smbase(ctxt, get_smstate(u32, smbase, 0x7f00));
> + val = get_smstate(u64, smbase, 0x7ed0);
> + ctxt->ops->set_msr(ctxt, MSR_EFER, val & ~EFER_LMA);
> +
> + selector = get_smstate(u32, smbase, 0x7e90);
> + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7e92));
> + set_desc_limit(&desc, get_smstate(u32, smbase, 0x7e94));
> + set_desc_base(&desc, get_smstate(u32, smbase, 0x7e98));
> + base3 = get_smstate(u32, smbase, 0x7e9c);
> + ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_TR);
> +
> + dt.size = get_smstate(u32, smbase, 0x7e84);
> + dt.address = get_smstate(u64, smbase, 0x7e88);
> + ctxt->ops->set_idt(ctxt, &dt);
> +
> + selector = get_smstate(u32, smbase, 0x7e70);
> + rsm_set_desc_flags(&desc, get_smstate(u32, smbase, 0x7e72));
> + set_desc_limit(&desc, get_smstate(u32, smbase, 0x7e74));
> + set_desc_base(&desc, get_smstate(u32, smbase, 0x7e78));
> + base3 = get_smstate(u32, smbase, 0x7e7c);
> + ctxt->ops->set_segment(ctxt, selector, &desc, base3, VCPU_SREG_LDTR);
> +
> + dt.size = get_smstate(u32, smbase, 0x7e64);
> + dt.address = get_smstate(u64, smbase, 0x7e68);
> + ctxt->ops->set_gdt(ctxt, &dt);
> +
> + for (i = 0; i < 6; i++) {
> + int r = rsm_load_seg_64(ctxt, smbase, i);
> + if (r != X86EMUL_CONTINUE)
> + return r;
> + }
> +
> + return rsm_enter_protected_mode(ctxt, cr0, cr4);
> +}
> +
> static int em_rsm(struct x86_emulate_ctxt *ctxt)
> {
> + unsigned long cr0, cr4, efer;
> + u64 smbase;
> + int ret;
> +
> + printk("rsm\n");
> if ((ctxt->emul_flags & X86EMUL_SMM_MASK) == 0)
> return emulate_ud(ctxt);
>
> - return X86EMUL_UNHANDLEABLE;
> + /*
> + * Get back to real mode, to prepare a safe state in which
> + * to load CR0/CR3/CR4/EFER. Also this will ensure that
> + * addresses passed to read_std/write_std are not virtual.
> + */
I am trying to understand this. Aren't we already in real mode
here since we are in SMM, or am I missing something?

> + cr0 = ctxt->ops->get_cr(ctxt, 0);
> + if (cr0 & X86_CR0_PE)
> + ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
> + cr4 = ctxt->ops->get_cr(ctxt, 4);
> + if (cr0 & X86_CR4_PAE)
> + ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE);
> + efer = 0;
> + ctxt->ops->set_msr(ctxt, MSR_EFER, efer);
> +
> + ctxt->ops->get_msr(ctxt, MSR_IA32_SMBASE, &smbase);
> + if (emulator_has_longmode(ctxt))
> + ret = rsm_load_state_64(ctxt, smbase + 0x8000);
> + else
> + ret = rsm_load_state_32(ctxt, smbase + 0x8000);
> +
> + if (ret != X86EMUL_CONTINUE) {
> + /* FIXME: should triple fault */
> + return X86EMUL_UNHANDLEABLE;
> + }
> +
> + ctxt->emul_flags &= ~X86EMUL_SMM_MASK;
> + return X86EMUL_CONTINUE;
> }
>
> static void
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4cd7a2a18e93..ab6a38617813 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6232,12 +6232,226 @@ static void process_nmi(struct kvm_vcpu *vcpu)
> kvm_make_request(KVM_REQ_EVENT, vcpu);
> }
>
> +#define put_smstate(type, buf, offset, val) \
> + *(type *)((buf) + (offset) - 0x7e00) = val
> +
> +static u16 process_smi_get_segment_flags(struct kvm_segment *seg)
> +{
> + u16 flags = 0;
> + flags |= seg->g << 15;
> + flags |= seg->db << 14;
> + flags |= seg->l << 13;
> + flags |= seg->avl << 12;
> + flags |= seg->present << 7;
> + flags |= seg->dpl << 5;
> + flags |= seg->s << 4;
> + flags |= seg->type;
> + return flags;
> +}
> +
> +static void process_smi_save_seg_32(struct kvm_vcpu *vcpu, char *buf, int n)
> +{
> + struct kvm_segment seg;
> + int offset;
> +
> + kvm_get_segment(vcpu, &seg, n);
> + put_smstate(u32, buf, 0x7fa8 + n * 4, seg.selector);
> +
> + if (n < 3)
> + offset = 0x7f84 + n * 12;
> + else
> + offset = 0x7f2c + (n - 3) * 12;
> +
> + put_smstate(u32, buf, offset + 8, seg.base);
> + put_smstate(u32, buf, offset + 4, seg.limit);
> + put_smstate(u32, buf, offset, process_smi_get_segment_flags(&seg));
> +}
> +
> +static void process_smi_save_seg_64(struct kvm_vcpu *vcpu, char *buf, int n)
> +{
> + struct kvm_segment seg;
> + int offset;
> +
> + kvm_get_segment(vcpu, &seg, n);
> + offset = 0x7e00 + n * 16;
> +
> + put_smstate(u16, buf, offset, seg.selector);
> + put_smstate(u16, buf, offset + 2, process_smi_get_segment_flags(&seg));
> + put_smstate(u32, buf, offset + 4, seg.limit);
> + put_smstate(u64, buf, offset + 8, seg.base);
> +}
> +
> +static void process_smi_save_state_32(struct kvm_vcpu *vcpu, char *buf)
> +{
> + struct desc_ptr dt;
> + struct kvm_segment seg;
> + unsigned long val;
> + int i;
> +
> + put_smstate(u32, buf, 0x7ffc, kvm_read_cr0(vcpu));
> + put_smstate(u32, buf, 0x7ff8, kvm_read_cr3(vcpu));
> + put_smstate(u32, buf, 0x7ff4, kvm_get_rflags(vcpu));
> + put_smstate(u32, buf, 0x7ff0, kvm_rip_read(vcpu));
> +
> + for (i = 0; i < 8; i++)
> + put_smstate(u32, buf, 0x7fd0 + i * 4, kvm_register_read(vcpu, i));
> +
> + kvm_get_dr(vcpu, 6, &val);
> + put_smstate(u32, buf, 0x7fcc, (u32)val);
> + kvm_get_dr(vcpu, 7, &val);
> + put_smstate(u32, buf, 0x7fc8, (u32)val);
> +
> + kvm_get_segment(vcpu, &seg, VCPU_SREG_TR);
> + put_smstate(u32, buf, 0x7fc4, seg.selector);
> + put_smstate(u32, buf, 0x7f64, seg.base);
> + put_smstate(u32, buf, 0x7f60, seg.limit);
> + put_smstate(u32, buf, 0x7f5c, process_smi_get_segment_flags(&seg));
> +
> + kvm_get_segment(vcpu, &seg, VCPU_SREG_LDTR);
> + put_smstate(u32, buf, 0x7fc0, seg.selector);
> + put_smstate(u32, buf, 0x7f80, seg.base);
> + put_smstate(u32, buf, 0x7f7c, seg.limit);
> + put_smstate(u32, buf, 0x7f78, process_smi_get_segment_flags(&seg));
> +
> + kvm_x86_ops->get_gdt(vcpu, &dt);
> + put_smstate(u32, buf, 0x7f74, dt.address);
> + put_smstate(u32, buf, 0x7f70, dt.size);
> +
> + kvm_x86_ops->get_idt(vcpu, &dt);
> + put_smstate(u32, buf, 0x7f58, dt.address);
> + put_smstate(u32, buf, 0x7f54, dt.size);
> +
> + for (i = 0; i < 6; i++)
> + process_smi_save_seg_32(vcpu, buf, i);
> +
> + put_smstate(u32, buf, 0x7f14, kvm_read_cr4(vcpu));
> +
> + /* revision id */
> + put_smstate(u32, buf, 0x7efc, 0x00020000);
> + put_smstate(u32, buf, 0x7ef8, vcpu->arch.smbase);
> +}
> +
> +static void process_smi_save_state_64(struct kvm_vcpu *vcpu, char *buf)
> +{
> +#ifdef CONFIG_X86_64
> + struct desc_ptr dt;
> + struct kvm_segment seg;
> + unsigned long val;
> + int i;
> +
> + for (i = 0; i < 16; i++)
> + put_smstate(u64, buf, 0x7ff8 - i * 8, kvm_register_read(vcpu, i));
> +
> + put_smstate(u64, buf, 0x7f78, kvm_rip_read(vcpu));
> + put_smstate(u32, buf, 0x7f70, kvm_get_rflags(vcpu));
> +
> + kvm_get_dr(vcpu, 6, &val);
> + put_smstate(u64, buf, 0x7f68, val);
> + kvm_get_dr(vcpu, 7, &val);
> + put_smstate(u64, buf, 0x7f60, val);
> +
> + put_smstate(u64, buf, 0x7f58, kvm_read_cr0(vcpu));
> + put_smstate(u64, buf, 0x7f50, kvm_read_cr3(vcpu));
> + put_smstate(u64, buf, 0x7f48, kvm_read_cr4(vcpu));
> +
> + put_smstate(u32, buf, 0x7f00, vcpu->arch.smbase);
> +
> + /* revision id */
> + put_smstate(u32, buf, 0x7efc, 0x00020064);
Is the revision id (and 0x00020000 for process_smi*_32()) from the
spec ? I can't seem to find them.

Bandan
...

2015-05-06 09:47:47

by Paolo Bonzini

Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag



On 05/05/2015 19:17, Radim Krčmář wrote:
> 2015-04-30 13:36+0200, Paolo Bonzini:
>> This adds an arch-specific memslot flag that hides slots unless the
>> VCPU is in system management mode.
>>
>> Some care is needed in order to limit the overhead of x86_gfn_to_memslot
>> when compared with gfn_to_memslot. Thankfully, we have __gfn_to_memslot
>> and search_memslots which are the same, so we can add some extra output
>> to search_memslots. The compiler will optimize it as dead code in
>> __gfn_to_memslot, and will use it to thread jumps in x86_gfn_to_memslot.
>>
>> Signed-off-by: Paolo Bonzini <[email protected]>
>> ---
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> @@ -19,10 +19,23 @@
>>
>> #include <linux/module.h>
>> #include <linux/kvm_host.h>
>> +#include "kvm_cache_regs.h"
>>
>> struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
>> {
>> - struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
>> + /* By using search_memslots directly the compiler can optimize away
>> + * the "if (found)" check below.
>> + *
>> + * It cannot do the same for gfn_to_memslot because it is not inlined,
>> + * and it also cannot do the same for __gfn_to_memslot because the
>> + * kernel is compiled with -fno-delete-null-pointer-checks.
>> + */
>> + bool found;
>> + struct kvm_memslots *memslots = kvm_memslots(vcpu->kvm);
>> + struct kvm_memory_slot *slot = search_memslots(memslots, gfn, &found);
>> +
>> + if (found && unlikely(slot->flags & KVM_MEM_X86_SMRAM) && !is_smm(vcpu))
>> + return NULL;
>
> Patch [10/13] made me sad and IIUIC, the line above is the only reason
> for it ...

Yes, all the differences trickle down to using x86_gfn_to_memslot.

On the other hand, there are already cut-and-pasted loops for guest
memory access, see kvm_write_guest_virt_system or
kvm_read_guest_virt_helper.

We could add __-prefixed macros like

#define __kvm_write_guest(fn_page, gpa, data, len, args...) \
({ \
gpa_t _gpa = (gpa); \
void *_data = (data); \
int _len = (len); \
gfn_t _gfn = _gpa >> PAGE_SHIFT; \
int _offset = offset_in_page(_gpa); \
int _seg, _ret; \
while ((_seg = next_segment(_len, _offset)) != 0) { \
_ret = (fn_page)(args##, _gfn, _data, _offset, _seg); \
if (_ret < 0) \
break; \
_offset = 0; \
_len -= _seg; \
_data += _seg; \
++_gfn; \
} \
_ret; \
})

...

int x86_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn, const void *data,
int offset, unsigned long len)
{
return __kvm_write_guest_page(x86_gfn_to_memslot, gfn, data, offset, len, vcpu);
}

int x86_write_guest_cached(struct kvm_vcpu *vcpu, struct gfn_to_hva_cache *ghc,
const void *data, unsigned long len)
{
return __kvm_write_guest_cached(x86_gfn_to_hva_cache_init,
x86_write_guest, ghc, data, len, vcpu);
}

int x86_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
unsigned long len)
{
return __kvm_write_guest(x86_write_guest_page, gpa, data, len, vcpu);
}

but frankly it seems worse than the disease.

> what about renaming and changing kvm_* memory function to
> vcpu_* and create
> bool kvm_arch_vcpu_can_access_slot(vcpu, slot)
> which could also be inline in arch/*/include/asm/kvm_host.h thanks to
> the way we build.
> We could be passing both kvm and vcpu in internal memslot operations and
> not checking if vcpu is NULL. This should allow all possible operations
> with little code duplication and the compiler could also optimize the
> case where vcpu is NULL.

That would be a huge patch, and most architectures do not (yet) need it.

I can change the functions to kvm_vcpu_read_* and when a second architecture
needs it, we move it from arch/x86/kvm/ to virt/kvm. I named it x86_ just
because it was the same length as kvm_ and thus hardly needed reindentation.

> Another option is adding something like "vcpu kvm_arch_fake_vcpu(kvm)"
> for cases where the access doesn't have an associated vcpu, so it would
> always succeed. (Might not be generic enough.)

That's ugly...

The question is also how often the copied code is changed, and the answer is
that most of it was never changed since it was introduced in 2007
(commit 195aefde9cc2, "KVM: Add general accessors to read and write guest
memory"). Before then, KVM used kmap_atomic directly!

Only the cache code is more recent, but that also has only been changed a
couple of times after introducing it in 2010 (commit 49c7754ce570, "KVM:
Add memory slot versioning and use it to provide fast guest write interface").
It is very stable code.

Paolo

2015-05-06 10:39:52

by Paolo Bonzini

Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch



On 05/05/2015 22:44, Bandan Das wrote:
> Paolo Bonzini <[email protected]> writes:
>
>> +static inline bool guest_cpuid_has_longmode(struct kvm_vcpu *vcpu)
>> +{
>> + struct kvm_cpuid_entry2 *best;
>> +
>> + best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0);
>> + return best && (best->edx & bit(X86_FEATURE_LM));
>> +}
>> +
>
> We could combine all guest_cpuid_has* functions into a single
> function guest_cpuid_has_feature(vcpu, feature) and avoid code duplication.
> Not relevant to this change - just a note (to self).
>
>> static inline bool guest_cpuid_has_osvw(struct kvm_vcpu *vcpu)
>> {
>> struct kvm_cpuid_entry2 *best;
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index f6b641207416..a49606871277 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -2262,12 +2262,253 @@ static int em_lseg(struct x86_emulate_ctxt *ctxt)
>> return rc;
>> }
>>
>> +static int emulator_has_longmode(struct x86_emulate_ctxt *ctxt)
>> +{
>> + u32 eax, ebx, ecx, edx;
>> +
>> + eax = 0x80000001;
>> + ecx = 0;
>> + ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
>> + return edx & bit(X86_FEATURE_LM);
>> +}
>> +
>> +#define get_smstate(type, smbase, offset) \
>> + ({ \
>> + type __val; \
>> + int r = ctxt->ops->read_std(ctxt, smbase + offset, &__val, \
>> + sizeof(__val), NULL); \
>> + if (r != X86EMUL_CONTINUE) \
>> + return X86EMUL_UNHANDLEABLE; \
>> + __val; \
>> + })
>> +
>> +static void rsm_set_desc_flags(struct desc_struct *desc, u16 flags)
>> +{
>> + desc->g = (flags >> 15) & 1;
>> + desc->d = (flags >> 14) & 1;
>> + desc->l = (flags >> 13) & 1;
>> + desc->avl = (flags >> 12) & 1;
>> + desc->p = (flags >> 7) & 1;
>> + desc->dpl = (flags >> 5) & 3;
>> + desc->s = (flags >> 4) & 1;
>> + desc->type = flags & 15;
>> +}
>> +
>> +static int rsm_load_seg_32(struct x86_emulate_ctxt *ctxt, u64 smbase, int n)
>> +{
>> + struct desc_struct desc;
>> + int offset;
>> + u16 selector;
>> +
>> + selector = get_smstate(u32, smbase, 0x7fa8 + n * 4);
> Probably a good idea to use #defines for all the offsets here
> and elsewhere.

Uff, those would be a lot of offsets. It's just as easy to get the
#defines right, as it is to get them right in get_smstate... >80
character lines would also mess everything up, I think.

> This return is redundant since rsm_load_seg_32 always returns
> success. Same for rsm_load_seg_64()

Note that the get_smstate macro can do an early return.

>> static int em_rsm(struct x86_emulate_ctxt *ctxt)
>> {
>> + unsigned long cr0, cr4, efer;
>> + u64 smbase;
>> + int ret;
>> +
>> + printk("rsm\n");
>> if ((ctxt->emul_flags & X86EMUL_SMM_MASK) == 0)
>> return emulate_ud(ctxt);
>>
>> - return X86EMUL_UNHANDLEABLE;
>> + /*
>> + * Get back to real mode, to prepare a safe state in which
>> + * to load CR0/CR3/CR4/EFER. Also this will ensure that
>> + * addresses passed to read_std/write_std are not virtual.
>> + */
>
> I am trying to understand this. Aren't we already in real mode
> here since we are in SMM, or am I missing something?

SMM starts in big real mode. The SMI handler can go to protected mode
and even enable paging; as long as it's not 64-bit, it can execute an RSM.

I should check and disable CR4.PCIDE before CR0.PG.

>> + cr0 = ctxt->ops->get_cr(ctxt, 0);
>> + if (cr0 & X86_CR0_PE)
>> + ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
>> + cr4 = ctxt->ops->get_cr(ctxt, 4);
>> + if (cr0 & X86_CR4_PAE)

Oops, cr4 here.
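
(Putting the two fixes together, the prologue would end up roughly like
this; just a sketch, not the v2 patch:)

        /* CR4.PCIDE must be cleared before CR0.PG. */
        cr4 = ctxt->ops->get_cr(ctxt, 4);
        if (cr4 & X86_CR4_PCIDE)
                ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PCIDE);

        cr0 = ctxt->ops->get_cr(ctxt, 0);
        if (cr0 & X86_CR0_PE)
                ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));

        cr4 = ctxt->ops->get_cr(ctxt, 4);
        if (cr4 & X86_CR4_PAE)                  /* cr4, not cr0 */
                ctxt->ops->set_cr(ctxt, 4, cr4 & ~X86_CR4_PAE);

        ctxt->ops->set_msr(ctxt, MSR_EFER, 0);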

>> + /* revision id */
>> + put_smstate(u32, buf, 0x7efc, 0x00020064);
> Is the revision id (and 0x00020000 for process_smi*_32()) from the
> spec ? I can't seem to find them.

This revision id is in the AMD spec. You can see that SeaBIOS checks
for it and the 32-bit one as well (which I cannot find anywhere).

SeaBIOS should also accept 0x30000 and 0x30064. Sending a patch.

Paolo

> Bandan
> ...
>

2015-05-06 11:19:05

by Paolo Bonzini

Subject: Re: [RFC PATCH 00/13] KVM: x86: SMM support



On 05/05/2015 20:40, Radim Krčmář wrote:
> - Whole SMRAM is writeable. Spec says that parts of state should be
> read-only. (This seems hard to fix without trapping all writes.)

Read-only here just means that you shouldn't touch it. It says "Some
register images are read-only, and must not be modified (modifying these
registers will result in unpredictable behavior)".

But actually the behavior is very predictable, and can be very fun. You
can do stuff such as interrupting a VM86 task with an SMI, and prepare
an SMM handler that returns to VM86 with CPL=0 (by setting SS.DPL=0 in
the SS access rights field). That's very illegal compared to big real
mode. :)

Or you can fake a processor reset straight after RSM, which includes
setting the right segment base, limit and access rights (again you need
to set SS.DPL=0 to affect the CPL).

Worst case, you get a failed VM entry (e.g. if you set up an invalid
combination of segment limit and segment G flag). If you care, disable
unrestricted_guest. :)

> - I/O restarting is not enabled. (APM 2:10.2.4 SMM-Revision Identifier
> says that AMD64 always sets this bit.)

Yes, unfortunately if I do enable it SeaBIOS breaks. So it's left for
later.

I/O restarting is meant for stuff like emulating the i8042 on top of a
USB keyboard. We luckily don't care (do not get strange ideas about
reducing the QEMU attack surface).

> - NMI handling has some quirks. (Software can enable NMI and another
> SMI should mask them again.)

Fixed now. The handling is pretty insane, and requires another field
smi.rsm_unmasks_nmi in KVM_GET_VCPU_EVENTS.
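
(So the smi block in struct kvm_vcpu_events ends up with something like the
layout below; the exact packing is a guess at this point:)

        struct {
                __u8 smm;
                __u8 pending;
                __u8 rsm_unmasks_nmi;
                __u8 pad;
        } smi;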

> - SMIs received while in SMM aren't handled. (One can be pending.)

Yup, fixed.

> - SMM and userspace.
> We can get if smm is enabled at two separate places (flag from KVM_RUN
> and in KVM_GET_VCPU_EVENTS) and toggle it via KVM_SET_VCPU_EVENTS.
>
> It's not an event, so I wouldn't include it in EVENTS API ...

Well, neither is nmi.masked nor interrupt.shadow. In the end, smi.smm is
just "smi.masked" (except that it also doubles as "is RSM allowed/is
SMRAM accessible").

> Letting the flag in KVM_RUN also toggle SMM would be easiest.

I'm worried about breaking userspace with that. I would probably have
to enable the SMM capability manually.

By comparison, the current implementation is entirely transparent as
long as the guest only generates SMIs through the APIC: all QEMU changes
are needed to support SMRAM and generation of SMIs through port 0xB2,
but the feature otherwise has zero impact on userspace.

But the main point in favor of "smi.smm" IMO is that it doubles as
"smi.masked".

> Otherwise, wouldn't GET/SET_ONE_REG be a better match for it?

Perhaps, but then smi.pending would still be a better match for
KVM_GET_VCPU_EVENTS than for ONE_REG. (And again, so would
"smi.masked"---it just happens that "masked SMIs == CPU in SMM").

Paolo

2015-05-06 16:25:00

by Radim Krčmář

Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag

2015-05-06 11:47+0200, Paolo Bonzini:
> On 05/05/2015 19:17, Radim Krčmář wrote:
>> 2015-04-30 13:36+0200, Paolo Bonzini:
>>> struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
>>> {
>>> - struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
>>> + bool found;
>>> + struct kvm_memslots *memslots = kvm_memslots(vcpu->kvm);
>>> + struct kvm_memory_slot *slot = search_memslots(memslots, gfn, &found);
>>> +
>>> + if (found && unlikely(slot->flags & KVM_MEM_X86_SMRAM) && !is_smm(vcpu))
>>> + return NULL;
>>
>> Patch [10/13] made me sad and IIUIC, the line above is the only reason
>> for it ...
>
> Yes, all the differences trickle down to using x86_gfn_to_memslot.
>
> On the other hand, there are already cut-and-pasted loops for guest
> memory access, see kvm_write_guest_virt_system or
> kvm_read_guest_virt_helper.

(Yeah ... not introducing a new problem is a good first step to fixing the
existing one. I can accept that both are okay -- the definition is up
to us -- but not that we are adding an abomination on purpose.)

> We could add __-prefixed macros like
>
> #define __kvm_write_guest(fn_page, gpa, data, len, args...) \
> ({ \
> gpa_t _gpa = (gpa); \
> void *_data = (data); \
> int _len = (len); \
> gfn_t _gfn = _gpa >> PAGE_SHIFT; \
> int _offset = offset_in_page(_gpa); \
> int _seg, _ret; \
> while ((_seg = next_segment(_len, _offset)) != 0) { \
> _ret = (fn_page)(args##, _gfn, _data, _offset, _seg); \
> if (_ret < 0) \
> break; \
> _offset = 0; \
> _len -= _seg; \
> _data += _seg; \
> ++_gfn; \
> } \
> _ret; \
> })
>
> ...
>
> int x86_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
> unsigned long len)
> {
> return __kvm_write_guest(x86_write_guest_page, gpa, data, len, vcpu);
> }
>
> but frankly it seems worse than the disease.

Well, it's a good approach, but the C language makes it awkward.
(I like first class functions.)

> what about renaming and changing kvm_* memory function to
>> vcpu_* and create
>> bool kvm_arch_vcpu_can_access_slot(vcpu, slot)
>> which could also be inline in arch/*/include/asm/kvm_host.h thanks to
>> the way we build.
>> We could be passing both kvm and vcpu in internal memslot operations and
>> not checking if vcpu is NULL. This should allow all possible operations
>> with little code duplication and the compiler could also optimize the
>> case where vcpu is NULL.
>
> That would be a huge patch, and most architectures do not (yet) need it.

Not that huge ... a trivial extension for passing an extra argument around,
adding a few wrappers to keep compatibility, and then a bunch of
static inline bool .*(vcpu, slot) { return true; }
for remaining arches. (We could have a default unless an arch #defines
KVM_ARCH_VCPU_SLOT_CHECKING or some other hack to anger programmers.)
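
(i.e. everyone except x86 gets a trivial default, something like:)

#ifndef KVM_ARCH_VCPU_SLOT_CHECKING
static inline bool kvm_arch_vcpu_can_access_slot(struct kvm_vcpu *vcpu,
                                                 struct kvm_memory_slot *slot)
{
        /* no per-VCPU slot restrictions on this architecture */
        return true;
}
#endif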

The hard part is having the same object code and added flexibility in C.

> I can change the functions to kvm_vcpu_read_* and when a second architecture
> needs it, we move it from arch/x86/kvm/ to virt/kvm. I named it x86_ just
> because it was the same length as kvm_ and thus hardly needed reindentation.

That doesn't improve the main issue, so x86 is good.

>> Another option is adding something like "vcpu kvm_arch_fake_vcpu(kvm)"
>> for cases where the access doesn't have an associated vcpu, so it would
>> always succeed. (Might not be generic enough.)
>
> That's ugly...

Yes. (And I still prefer it.)

> The question is also how often the copied code is changed, and the answer is
> that most of it was never changed since it was introduced in 2007
> (commit 195aefde9cc2, "KVM: Add general accessors to read and write guest
> memory"). Before then, KVM used kmap_atomic directly!
>
> Only the cache code is more recent, but that also has only been changed a
> couple of times after introducing it in 2010 (commit 49c7754ce570, "KVM:
> Add memory slot versioning and use it to provide fast guest write interface").
> It is very stable code.

We have different views on code duplication :)

The feature you wanted exposed a flaw in the code, so an extension was
needed. Copying code is the last resort after all options of
abstracting were exhausted ... I might be forcing common paths when
writing it twice requires less brain power, but 200 lines of
structurally identical code seem far from it.
Reworking stable code is simpler, as we can just cover all features
needed now and omit the hard thinking about future extensions.
(For me, stable code is the first candidate for generalization ...
and I wouldn't copy it, even though it's mostly fine in practice.)

It's all nice in theory; I'll prepare a patch we can discuss.
(And maybe agree with this one after understanding all challenges.)

2015-05-06 16:49:50

by Bandan Das

Subject: Re: [PATCH 06/13] KVM: x86: pass the whole hflags field to emulator and back

Paolo Bonzini <[email protected]> writes:

> On 05/05/2015 17:47, Bandan Das wrote:
>> > The hflags field will contain information about system management mode
>> > and will be useful for the emulator. Pass the entire field rather than
>> > just the guest-mode information.
>>
>> With respect to maintaining maximum isolation between vcpu internals and
>> the emulator,
>
> Isolation is maintained. hflags are simply parts of the processor state
> that are not visible to the guest, you can choose to include them as
> separate flags or in a single one. Bundling them in a single flag makes
> it a bit faster to pass around many of them.
>
> While we do not need all of them in the emulator, that's in some cases a
> limitation of the emulator. For example, if we wanted to emulate
> CLGI/STGI we would need either HF_GIF_MASK or another . Likewise,
> HF_NMI_MASK could replace emulator_set_nmi_mask (especially in v2 of the
> series, which will add kvm_set_hflags).
>
> However, if you prefer, I can change it to "bool smm_mode" + a new
> smm_exit emulator callback.

No, this makes sense. It's better to just pass hflags than to pass
the bits individually.
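
For context, the hflags bits look roughly like this (an illustrative
subset, with values as they appear in later mainline kernels; the RFC
may differ). Passing the whole field lets the emulator test any of them
without adding a callback per bit:

/* Illustrative subset of arch/x86/include/asm/kvm_host.h. */
#define HF_GIF_MASK		(1 << 0)	/* SVM global interrupt flag */
#define HF_NMI_MASK		(1 << 3)	/* NMIs masked */
#define HF_GUEST_MASK		(1 << 5)	/* in nested guest mode */
#define HF_SMM_MASK		(1 << 6)	/* in system management mode */

static inline int hflags_in_smm(unsigned int hflags)
{
	return (hflags & HF_SMM_MASK) != 0;
}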

> Paolo
>
>> why not just "bool smm_mode" ?
>>

2015-05-06 17:14:43

by Radim Krčmář

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] KVM: x86: SMM support

2015-05-06 13:18+0200, Paolo Bonzini:
> On 05/05/2015 20:40, Radim Krčmář wrote:
> > - Whole SMRAM is writeable. Spec says that parts of state should be
> > read-only. (This seems hard to fix without trapping all writes.)
>
> Read-only here just means that you shouldn't touch it. It says "Some
> register images are read-only, and must not be modified (modifying these
> registers will result in unpredictable behavior)".

I haven't seen the note that they mustn't be modified, sorry.

> But actually the behavior is very predictable, and can be very fun. You
> can do stuff such as interrupting a VM86 task with an SMI, and prepare
> an SMM handler that returns to VM86 with CPL=0 (by setting SS.DPL=0 in
> the SS access rights field). That's very illegal compared to big real
> mode. :)
>
> Or you can fake a processor reset straight after RSM, which includes
> setting the right segment base, limit and access rights (again you need
> to set SS.DPL=0 to affect the CPL).
>
> Worst case, you get a failed VM entry (e.g. if you set up an invalid
> combination of segment limit and segment G flag). If you care, disable
> unrestricted_guest. :)

Nice, thanks.
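
A minimal sketch of the SS.DPL trick mentioned above, assuming the
standard descriptor attribute-byte layout (type in bits 3:0, S in bit 4,
DPL in bits 6:5, P in bit 7); the helper name is made up:

#include <stdint.h>

#define SEG_AR_DPL_SHIFT	5
#define SEG_AR_DPL_MASK		(3u << SEG_AR_DPL_SHIFT)

/* An SMM handler that stores 0 in the DPL bits of the saved SS access
 * rights makes RSM resume the interrupted VM86 task with CPL=0. */
static inline uint32_t seg_ar_set_dpl(uint32_t ar, uint32_t dpl)
{
	return (ar & ~SEG_AR_DPL_MASK) | ((dpl & 3u) << SEG_AR_DPL_SHIFT);
}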

> > - I/O restarting is not enabled. (APM 2:10.2.4 SMM-Revision Identifier
> > says that AMD64 always sets this bit.)
>
> Yes, unfortunately if I do enable it SeaBIOS breaks. So it's left for
> later.
>
> I/O restarting is meant for stuff like emulating the i8042 on top of a
> USB keyboard. We luckily don't care (do not get strange ideas about
> reducing the QEMU attack surface).

Ok. (SMM handlers doing sanity checks on their environment are probably
the biggest obstacle.)

> > - SMM and userspace.
> > We can get if smm is enabled at two separate places (flag from KVM_RUN
> > and in KVM_GET_VCPU_EVENTS) and toggle it via KVM_SET_VCPU_EVENTS.
> >
> > It's not an event, so I wouldn't include it in EVENTS API ...
>
> Well, neither is nmi.masked or interrupt.shadow. In the end, smi.smm is
> just "smi.masked" (except that it also doubles as "is RSM allowed/is
> SMRAM accessible").

Yeah, that double function is bugging me ... SMI can be masked for
reasons other than being in SMM, so the connection is not obvious.
(But all cases I know of are handled differently in KVM.)

The other case is that when emulating the SMM switch in userspace, the
EVENTS ioctl wouldn't be the place where I would expect a toggle for
KVM to be.

> > Letting the flag in KVM_RUN also toggle SMM would be easiest.
>
> I'm worried about breaking userspace with that. I would probably have
> to enable the SMM capability manually.
>
> By comparison, the current implementation is entirely transparent as
> long as the guest only generates SMIs through the APIC: all QEMU changes
> are needed to support SMRAM and generation of SMIs through port 0xB2,
> but the feature otherwise has zero impact on userspace.

They should be equally transparent. Userspace needs to preserve all
reserved bits, and hopefully does. (It's the same with SET_EVENTS.)
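
The usual userspace pattern that makes this work is read-modify-write
(a sketch only, written against the API as it eventually looks; error
handling is minimal and the smi/flags fields are the ones this series
would add):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Queue an SMI while preserving every other (possibly reserved) bit. */
static int queue_smi(int vcpu_fd)
{
	struct kvm_vcpu_events ev;

	if (ioctl(vcpu_fd, KVM_GET_VCPU_EVENTS, &ev) < 0)
		return -1;
	ev.flags |= KVM_VCPUEVENT_VALID_SMM;	/* validity flag in the final API */
	ev.smi.pending = 1;
	return ioctl(vcpu_fd, KVM_SET_VCPU_EVENTS, &ev);
}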

> But the main point in favor of "smi.smm" IMO is that it doubles as
> "smi.masked".

True. 'smi.masked_as_we_are_in_smm' :)

> > Otherwise, wouldn't GET/SET_ONE_REG be a better match for it?
>
> Perhaps, but then smi.pending would still be a better match for
> KVM_GET_VCPU_EVENTS than for ONE_REG. (And again, so would
> "smi.masked"---it just happens that "masked SMIs == CPU in SMM").

smi.pending makes sense in events, so the state would be split ...

Your original solution is a good one. (Others aren't any better.)
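
For reference, the smi block roughly as it ended up in the uapi (an
excerpt from a later mainline include/uapi/linux/kvm.h, shown for
illustration; the RFC may differ):

struct kvm_vcpu_events {
	/* ... exception, interrupt, nmi, sipi_vector, flags ... */
	struct {
		__u8 smm;		/* in SMM, i.e. SMIs are masked */
		__u8 pending;		/* an SMI is waiting to be injected */
		__u8 smm_inside_nmi;
		__u8 latched_init;
	} smi;
	/* ... */
};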

2015-05-06 17:55:21

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch

Paolo Bonzini <[email protected]> writes:
..
>>> +
>>> + selector = get_smstate(u32, smbase, 0x7fa8 + n * 4);
>> Probably a good idea to use #defines for all the offsets here
>> and elsewhere.
>
> Uff, those would be a lot of offsets. It's just as easy to get the
> #defines right, as it is to get them right in get_smstate... >80
> character lines would also mess everything up, I think.

Valid point. But I would say cryptic names will always be better than
hex offsets. It's a one-time pain in the neck to get the defines
right.

I found this header -
ftp://ftp.xskernel.org/soft/linux-src/bochs-20080511/cpu/smm.h

Maybe we can pick it up ? I can sanity check the list and
make sure the offsets are correct if you prefer.

>> This return is redundant since rsm_load_seg_32 always returns
>> success. Same for rsm_load_seg_64()
>
> Note that the get_smstate macro can do an early return.
>
Oops! Sorry, right. I suggest changing get_smstate to get_smstate_iam_a_macro_not_a_function ;)

>>> static int em_rsm(struct x86_emulate_ctxt *ctxt)
>>> {
>>> + unsigned long cr0, cr4, efer;
>>> + u64 smbase;
>>> + int ret;
>>> +
>>> + printk("rsm\n");
>>> if ((ctxt->emul_flags & X86EMUL_SMM_MASK) == 0)
>>> return emulate_ud(ctxt);
>>>
>>> - return X86EMUL_UNHANDLEABLE;
>>> + /*
>>> + * Get back to real mode, to prepare a safe state in which
>>> + * to load CR0/CR3/CR4/EFER. Also this will ensure that
>>> + * addresses passed to read_std/write_std are not virtual.
>>> + */
>>
>> I am trying to understand this. Aren't we already in real mode
>> here since we are in SMM, or am I missing something?
>
> SMM starts in big real mode. The SMI handler can go to protected mode
> and even enable paging; as long as it's not 64-bit, it can execute an RSM.

Ok. This part. rsm can be called from protected mode. Thanks for the
explanation.

> I should check and disable CR4.PCIDE before CR0.PG.
>
Actually, 4.10.1 says that "The processor ensures that CR4.PCIDE can
only be 1 in IA32e mode". At this point it should already be 0.

Bandan

>>> + cr0 = ctxt->ops->get_cr(ctxt, 0);
>>> + if (cr0 & X86_CR0_PE)
>>> + ctxt->ops->set_cr(ctxt, 0, cr0 & ~(X86_CR0_PG | X86_CR0_PE));
>>> + cr4 = ctxt->ops->get_cr(ctxt, 4);
>>> + if (cr0 & X86_CR4_PAE)
>
> Oops, cr4 here.
>
>>> + /* revision id */
>>> + put_smstate(u32, buf, 0x7efc, 0x00020064);
>> Is the revision id (and 0x00020000 for process_smi*_32()) from the
>> spec ? I can't seem to find them.
>
> This revision id is in the AMD spec. You can see that SeaBIOS checks
> for it and the 32-bit one as well (which I cannot find anywhere).
>
> SeaBIOS should also accept 0x30000 and 0x30064. Sending a patch.
>
> Paolo
>
>> Bandan
>> ...
>>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2015-05-06 18:15:16

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag

Radim Krčmář <[email protected]> writes:
...
>
> That doesn't improve the main issue, so x86 is good.
>
>>> Another option is adding something like "vcpu kvm_arch_fake_vcpu(kvm)"
>>> for cases where the access doesn't have an associated vcpu, so it would
>>> always succeed. (Might not be generic enough.)
>>
>> That's ugly...
>
> Yes. (And I still prefer it.)

Ooh, I hope we don't go this route :) I understand the motivation, but
if there are cases where we would have to fake the condition
to be true, maybe we should reconsider our design.

Bandan

2015-05-06 19:38:49

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch



On 06/05/2015 19:55, Bandan Das wrote:
> Valid point. But I would say cryptic names will always be better than
> hex offsets. It's a one-time pain in the neck to get the defines
> right.
>
> I found this header -
> ftp://ftp.xskernel.org/soft/linux-src/bochs-20080511/cpu/smm.h
>
> Maybe we can pick it up ? I can sanity check the list and
> make sure the offsets are correct if you prefer.

Thanks, I like that header. :) No need to sanity check yourself.

I'll have to adjust to define both 32-bit and 64-bit offsets. Ok to use
just SMRAM32_OFS_xyz and SMRAM64_OFS_xyz? No need for LO32 and HI32, in
particular.

The one problem is that I am doing math on the offsets in some places.
It would suck to save/restore 16 registers manually. Any suggestions?
I will probably send v1 for review without the offsets.

Paolo

2015-05-06 19:43:26

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag



On 06/05/2015 18:24, Radim Krčmář wrote:
> The feature you wanted exposed a flaw in the code, so an extension was
> needed. Copying code is the last resort after all options of
> abstracting were exhausted ... I might be forcing common paths when
> writing it twice requires less brain power, but 200 lines of
> structurally identical code seem far from it.

Note that it didn't really expose a flaw in the code, just a limitation.
There are cases even on x86 where you have no vcpu, for example filling
in the EPT identity page tables or VMX TSS.

> Reworking stable code is simpler, as we can just cover all features
> needed now and omit the hard thinking about future extensions.
> (For me, stable code is the first candidate for generalization ...
> and I wouldn't copy it, even though it's mostly fine in practice.)

Stable, simple code is also important to keep simple, though. Sometimes
code duplication is preferable to obfuscation.

I agree that copying 200 lines of code because of one function three
levels down the call chain isn't nice. However, it doesn't seem
particularly easy to avoid the duplication even with C++ templates. C
is worse. OCaml or Haskell would be nicer. :)

Paolo

2015-05-08 02:55:59

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 01/13] KVM: MMU: fix for CR4.SMEP=1, CR0.WP=0?



On 04/30/2015 07:36 PM, Paolo Bonzini wrote:
> smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages
> should not be reused for different values of it. Thus, it has to be
> added to the mask in kvm_mmu_pte_write.
>


Good catch!

Reviewed-by: Xiao Guangrong <[email protected]>

2015-05-12 23:56:21

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch

Paolo Bonzini <[email protected]> writes:

> On 06/05/2015 19:55, Bandan Das wrote:
>> Valid point. But I would say cryptic names will always be better than
>> hex offsets. It's a one-time pain in the neck to get the defines
>> right.
>>
>> I found this header -
>> ftp://ftp.xskernel.org/soft/linux-src/bochs-20080511/cpu/smm.h
>>
>> Maybe we can pick it up ? I can sanity check the list and
>> make sure the offsets are correct if you prefer.
>
> Thanks, I like that header. :) No need to sanity check yourself.
>
> I'll have to adjust to define both 32-bit and 64-bit offsets. Ok to use
> just SMRAM32_OFS_xyz and SMRAM64_OFS_xyz? No need for LO32 and HI32, in
> particular.
Agreed, that sounds like the right thing to do.

> The one problem is that I am doing math on the offsets in some places.
> It would suck to save/restore 16 registers manually. Any suggestions?
> I will probably send v1 for review without the offsets.

I think it's ok just to replace the offset number with a name.
If you are concerned about readability, we could probably wrap the
"for loops" in the math calculations with a more serious sounding macro
eg. "#define foreach_smm_offset(val, start_offset, jump, count)" :)
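
For concreteness, such a macro could look something like this (a sketch
only; the name and parameters follow the joke above, nothing here is
from the actual series):

/* Iterate over 'count' save-area fields that sit 'stride' bytes apart,
 * starting at offset 'start'. */
#define foreach_smm_offset(i, off, start, stride, count)	\
	for ((i) = 0, (off) = (start);				\
	     (i) < (count);					\
	     (i)++, (off) += (stride))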

> Paolo

2015-05-13 06:58:12

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 09/13] KVM: x86: save/load state on SMM switch



On 13/05/2015 01:56, Bandan Das wrote:
> I think it's ok just to replace the offset number with a name.
> If you are concerned about readability, we could probably wrap the
> "for loops" in the math calculations with a more serious sounding macro
> eg. "#define foreach_smm_offset(val, start_offset, jump, count)" :)

That doesn't help. The problem is that the for loops assume the offsets
are a constant distance apart, and that you know that distance.
This is already not trivial for the registers (is RAX at the highest or
lowest offset?); for the selectors and descriptor caches it could be
even worse than hardcoded offsets.

Paolo

2015-05-15 20:32:22

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag

On 04/30/2015 02:36 PM, Paolo Bonzini wrote:
> This adds an arch-specific memslot flag that hides slots unless the
> VCPU is in system management mode.
>
> Some care is needed in order to limit the overhead of x86_gfn_to_memslot
> when compared with gfn_to_memslot. Thankfully, we have __gfn_to_memslot
> and search_memslots which are the same, so we can add some extra output
> to search_memslots. The compiler will optimize it as dead code in
> __gfn_to_memslot, and will use it to thread jumps in x86_gfn_to_memslot.
>
>
> struct kvm_memory_slot *x86_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
> {
> - struct kvm_memory_slot *slot = gfn_to_memslot(vcpu->kvm, gfn);
> + /* By using search_memslots directly the compiler can optimize away
> + * the "if (found)" check below.
> + *
> + * It cannot do the same for gfn_to_memslot because it is not inlined,
> + * and it also cannot do the same for __gfn_to_memslot because the
> + * kernel is compiled with -fno-delete-null-pointer-checks.
> + */
> + bool found;
> + struct kvm_memslots *memslots = kvm_memslots(vcpu->kvm);
> + struct kvm_memory_slot *slot = search_memslots(memslots, gfn, &found);
> +
> + if (found && unlikely(slot->flags & KVM_MEM_X86_SMRAM) && !is_smm(vcpu))
> + return NULL;
>

Alternative approach: store the memslot pointer in the vcpu, instead of
in vcpu->kvm. The uapi interface code can generate two memslot tables
to be stored in kvm->, one for smm and one for !smm.

This is a little more generic and can be used for other causes of
asymmetric memory maps (the only one I can think of, mapping the APIC to
different physical addresses, can't be efficiently supported).

> return slot;
> }
> @@ -112,7 +125,15 @@ EXPORT_SYMBOL_GPL(x86_read_guest);
> int x86_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
> gpa_t gpa, unsigned long len)
> {
> - return kvm_gfn_to_hva_cache_init(kvm, ghc, gpa, len);
> + int r = kvm_gfn_to_hva_cache_init(kvm, ghc, gpa, len);
> +
> + if (r < 0)
> + return r;
> +
> + /* Use slow path for reads and writes to SMRAM. */
> + if (ghc->memslot && (ghc->memslot->flags & KVM_MEM_X86_SMRAM))
> + ghc->memslot = NULL;
> + return r;
> }
> EXPORT_SYMBOL_GPL(x86_gfn_to_hva_cache_init);
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index e7cfee09970a..ea3db5d89f67 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -800,16 +800,18 @@ static inline void kvm_guest_exit(void)
> * gfn_to_memslot() itself isn't here as an inline because that would
> * bloat other code too much.
> */
> -static inline struct kvm_memory_slot *
> -search_memslots(struct kvm_memslots *slots, gfn_t gfn)
> +static __always_inline struct kvm_memory_slot *
> +search_memslots(struct kvm_memslots *slots, gfn_t gfn, bool *found)
> {
> int start = 0, end = slots->used_slots;
> int slot = atomic_read(&slots->lru_slot);
> struct kvm_memory_slot *memslots = slots->memslots;
>
> if (gfn >= memslots[slot].base_gfn &&
> - gfn < memslots[slot].base_gfn + memslots[slot].npages)
> + gfn < memslots[slot].base_gfn + memslots[slot].npages) {
> + *found = true;
> return &memslots[slot];
> + }
>
> while (start < end) {
> slot = start + (end - start) / 2;
> @@ -823,16 +825,20 @@ search_memslots(struct kvm_memslots *slots, gfn_t gfn)
> if (gfn >= memslots[start].base_gfn &&
> gfn < memslots[start].base_gfn + memslots[start].npages) {
> atomic_set(&slots->lru_slot, start);
> + *found = true;
> return &memslots[start];
> }
>
> + *found = false;
> return NULL;
> }
>
> static inline struct kvm_memory_slot *
> __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
> {
> - return search_memslots(slots, gfn);
> + bool found;
> +
> + return search_memslots(slots, gfn, &found);
> }
>
> static inline unsigned long
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e82c46b9860f..b8b76106a003 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -716,6 +716,10 @@ static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> #ifdef __KVM_HAVE_READONLY_MEM
> valid_flags |= KVM_MEM_READONLY;
> #endif
> +#ifdef __KVM_ARCH_VALID_FLAGS
> + BUILD_BUG_ON(__KVM_ARCH_VALID_FLAGS & KVM_MEMSLOT_INVALID);
> + valid_flags |= __KVM_ARCH_VALID_FLAGS;
> +#endif
>
> if (mem->flags & ~valid_flags)
> return -EINVAL;

2015-05-18 08:31:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 12/13] KVM: x86: add KVM_MEM_X86_SMRAM memory slot flag



On 15/05/2015 22:32, Avi Kivity wrote:
> Alternative approach: store the memslot pointer in the vcpu, instead of
> in vcpu->kvm. The uapi interface code can generate two memslot tables
> to be stored in kvm->, one for smm and one for !smm.

That's a great suggestion---and easy to implement too. Thanks Avi for
chiming in!

Paolo
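
A rough sketch of what the suggestion boils down to (close to the
two-address-space design that later landed in mainline KVM; the names
below are illustrative, not from this series):

struct kvm_memslots;			/* real definitions are in      */
struct kvm_vcpu;			/* include/linux/kvm_host.h     */

struct kvm_smm_demo {			/* stand-in for struct kvm */
	struct kvm_memslots *memslots[2];	/* [0] = normal, [1] = SMM */
};

/* Pick the table based on whether the vcpu is in SMM; in real code the
 * decision would come from an is_smm()-style helper on the vcpu. */
static inline struct kvm_memslots *demo_vcpu_memslots(struct kvm_smm_demo *kvm,
						      int in_smm)
{
	return kvm->memslots[in_smm ? 1 : 0];
}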

2015-05-19 14:27:32

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [RFC PATCH 00/13] KVM: x86: SMM support

Paolo Bonzini wrote on 2015-04-30:
> This patch series introduces system management mode support.

Just curious, what's the motivation for adding vSMM support? Is there any use case inside the guest that requires SMM? Thanks.

> There is still some work to do, namely: test without unrestricted
> guest support, test on AMD, disable the capability if !unrestricted
> guest and !emulate invalid guest state(*), test with a QEMU that
> understand KVM_MEM_X86_SMRAM, actually post QEMU patches that let you use this.
>
> (*) newer chipsets moved away from legacy SMRAM at 0xa0000,
> thus support for real mode CS base above 1M is necessary
>
> Because legacy SMRAM is a mess, I have tried these patches with Q35's
> high SMRAM (at 0xfeda0000). This means that right now this isn't the
> easiest thing to test; you need QEMU patches that add support for high
> SMRAM, and SeaBIOS patches to use high SMRAM. Until QEMU support for
> KVM_MEM_X86_SMRAM is in place, also, I'm keeping SMRAM open in SeaBIOS.
>
> That said, even this clumsy and incomplete userspace configuration is
> enough to test all patches except 11 and 12.
>
> The series is structured as follows.
>
> Patch 1 is an unrelated bugfix (I think). Patches 2 to 6 extend some
> infrastructure functions. Patches 1 to 4 could be committed right now.
>
> Patches 7 to 9 implement basic support for SMM in the KVM API and
> teach KVM about doing the world switch on SMI and RSM.
>
> Patch 10 touches all places in KVM that read/write guest memory to go
> through an x86-specific function. The x86-specific function takes a
> VCPU rather than a struct kvm. This is used in patches 11 and 12 to
> limits access to specially marked SMRAM slots unless the VCPU is in
> system management mode.
>
> Finally, patch 13 exposes the new capability for userspace to probe.
>


Best regards,
Yang



2015-05-19 14:29:32

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] KVM: x86: SMM support



On 19/05/2015 16:25, Zhang, Yang Z wrote:
> Paolo Bonzini wrote on 2015-04-30:
>> This patch series introduces system management mode support.
>
> Just curious, what's the motivation for adding vSMM support? Is there
> any use case inside the guest that requires SMM? Thanks.

SMM provides the trusted base for UEFI Secure Boot.

Paolo

2015-05-20 01:04:14

by Zhang, Yang Z

[permalink] [raw]
Subject: RE: [RFC PATCH 00/13] KVM: x86: SMM support

Paolo Bonzini wrote on 2015-05-19:
>
>
> On 19/05/2015 16:25, Zhang, Yang Z wrote:
>> Paolo Bonzini wrote on 2015-04-30:
>>> This patch series introduces system management mode support.
>>
>> Just curious, what's the motivation for adding vSMM support? Is there
>> any use case inside the guest that requires SMM? Thanks.
>
> SMM provides the trusted base for UEFI Secure Boot.
>
> Paolo

OK, I see it. Thanks for your explanation.

Best regards,
Yang



2015-05-20 15:22:21

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH 00/13] KVM: x86: SMM support

Paolo Bonzini <[email protected]> writes:

> On 19/05/2015 16:25, Zhang, Yang Z wrote:
>> Paolo Bonzini wrote on 2015-04-30:
>>> This patch series introduces system management mode support.
>>
>> Just curious, what's the motivation for adding vSMM support? Is there
>> any use case inside the guest that requires SMM? Thanks.
>
> SMM provides the trusted base for UEFI Secure Boot.

I was wondering the same thing. This definitely should be somewhere in
the changelogs!

-Andi

--
[email protected] -- Speaking for myself only