Following is the kvm patchqueue for the 2.6.29 merge window. Due to the
number of patches, this will be sent in three batches, of which this is the
first.
Amit Shah (3):
KVM: x86: Fix typo in function name
KVM: SVM: Set the 'g' bit of the cs selector for cross-vendor
migration
KVM: SVM: Set the 'busy' flag of the TR selector
Gleb Natapov (1):
KVM: call kvm_arch_vcpu_reset() instead of the kvm_x86_ops callback
Guillaume Thouvenin (4):
KVM: x86 emulator: consolidate push reg
KVM: x86 emulator: Add decode entries for 0x04 and 0x05 opcodes (add
acc, imm)
KVM: allow emulator to adjust rip for emulated pio instructions
KVM: VMX: Handle mmio emulation when guest state is invalid
Hollis Blanchard (8):
KVM: ppc: Move 440-specific TLB code into 44x_tlb.c
KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe"
KVM: ppc: combine booke_guest.c and booke_host.c
KVM: ppc: Refactor powerpc.c to relocate 440-specific code
ppc: Create disassemble.h to extract instruction fields
KVM: ppc: refactor instruction emulation into generic and
core-specific pieces
KVM: ppc: Move the last bits of 44x code out of booke.c
KVM: ppc: create struct kvm_vcpu_44x and introduce container_of()
accessor
Izik Eidus (1):
KVM: MMU: Fix aliased gfns treated as unaliased
Jan Kiszka (15):
KVM: VMX: include all IRQ window exits in statistics
KVM: VMX: Use INTR_TYPE_NMI_INTR instead of magic value
KVM: VMX: Support for NMI task gates
KVM: x86: Reset pending/inject NMI state on CPU reset
KVM: VMX: refactor/fix IRQ and NMI injectability determination
KVM: VMX: refactor IRQ and NMI window enabling
KVM: VMX: fix real-mode NMI support
KVM: x86: Enable NMI Watchdog via in-kernel PIT source
KVM: x86: VCPU with pending NMI is runnabled
KVM: Kick NMI receiving VCPU
KVM: x86: Support for user space injected NMIs
KVM: VMX: Provide support for user space injected NMIs
KVM: VMX: work around lacking VNMI support
KVM: x86: Fix and refactor NMI watchdog emulation
KVM: x86: Optimize NMI watchdog delivery
Sheng Yang (11):
x86: Rename mtrr_state struct and macro names
x86: Export some definition of MTRR
KVM: Improve MTRR structure
KVM: VMX: Add PAT support for EPT
KVM: Add local get_mtrr_type() to support MTRR
KVM: Enable MTRR for EPT
KVM: Clean up kvm_x86_emulate.h
KVM: MMU: Extend kvm_mmu_page->slot_bitmap size
KVM: VMX: Move private memory slot position
KVM: IRQ ACK notifier should be used with in-kernel irqchip
KVM: Enable Function Level Reset for assigned device
Xiantao Zhang (2):
KVM: ia64: Re-organize data sturure of guests' data area
KVM: ia64: Remove lock held by halted vcpu
arch/ia64/include/asm/kvm_host.h | 192 ++++++++-----
arch/ia64/kvm/kvm-ia64.c | 62 ++--
arch/ia64/kvm/kvm_minstate.h | 4 +-
arch/ia64/kvm/misc.h | 3 +-
arch/ia64/kvm/vcpu.c | 5 +-
arch/ia64/kvm/vtlb.c | 4 +-
arch/powerpc/include/asm/disassemble.h | 80 +++++
arch/powerpc/include/asm/kvm_44x.h | 47 +++
arch/powerpc/include/asm/kvm_host.h | 12 +-
arch/powerpc/include/asm/kvm_ppc.h | 75 ++---
arch/powerpc/kernel/asm-offsets.c | 16 +-
arch/powerpc/kvm/44x.c | 234 ++++++++++++++
arch/powerpc/kvm/44x_emulate.c | 335 ++++++++++++++++++++
arch/powerpc/kvm/44x_tlb.c | 179 ++++++++++-
arch/powerpc/kvm/44x_tlb.h | 29 ++-
arch/powerpc/kvm/Kconfig | 11 +-
arch/powerpc/kvm/Makefile | 11 +-
arch/powerpc/kvm/{booke_guest.c => booke.c} | 187 +++++++-----
arch/powerpc/kvm/booke.h | 39 +++
arch/powerpc/kvm/booke_host.c | 83 -----
arch/powerpc/kvm/booke_interrupts.S | 6 +-
arch/powerpc/kvm/emulate.c | 437 ++-------------------------
arch/powerpc/kvm/powerpc.c | 126 +-------
arch/x86/include/asm/kvm_host.h | 22 +-
arch/x86/include/asm/kvm_x86_emulate.h | 10 +-
arch/x86/include/asm/mtrr.h | 25 ++
arch/x86/kernel/cpu/mtrr/generic.c | 12 +-
arch/x86/kernel/cpu/mtrr/main.c | 4 +-
arch/x86/kernel/cpu/mtrr/mtrr.h | 18 +-
arch/x86/kvm/i8254.c | 19 ++
arch/x86/kvm/irq.h | 1 +
arch/x86/kvm/lapic.c | 58 +++-
arch/x86/kvm/mmu.c | 135 ++++++++-
arch/x86/kvm/svm.c | 23 ++
arch/x86/kvm/vmx.c | 345 +++++++++++++++-------
arch/x86/kvm/vmx.h | 12 +-
arch/x86/kvm/x86.c | 133 +++++++-
arch/x86/kvm/x86_emulate.c | 12 +-
include/linux/kvm.h | 11 +-
include/linux/kvm_host.h | 3 +-
virt/kvm/ioapic.c | 1 +
virt/kvm/irq_comm.c | 8 +-
virt/kvm/kvm_main.c | 16 +-
43 files changed, 1940 insertions(+), 1105 deletions(-)
create mode 100644 arch/powerpc/include/asm/disassemble.h
create mode 100644 arch/powerpc/include/asm/kvm_44x.h
create mode 100644 arch/powerpc/kvm/44x.c
create mode 100644 arch/powerpc/kvm/44x_emulate.c
rename arch/powerpc/kvm/{booke_guest.c => booke.c} (80%)
create mode 100644 arch/powerpc/kvm/booke.h
delete mode 100644 arch/powerpc/kvm/booke_host.c
From: Jan Kiszka <[email protected]>
irq_window_exits only tracks IRQ window exits due to user space
requests, nmi_window_exits include all exits. The latter makes more
sense, so let's adjust irq_window_exits accounting.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a4018b0..ac34537 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2767,6 +2767,7 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
KVMTRACE_0D(PEND_INTR, vcpu, handler);
+ ++vcpu->stat.irq_window_exits;
/*
* If the user space waits to inject interrupts, exit as soon as
@@ -2775,7 +2776,6 @@ static int handle_interrupt_window(struct kvm_vcpu *vcpu,
if (kvm_run->request_interrupt_window &&
!vcpu->arch.irq_summary) {
kvm_run->exit_reason = KVM_EXIT_IRQ_WINDOW_OPEN;
- ++vcpu->stat.irq_window_exits;
return 0;
}
return 1;
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Properly set GUEST_INTR_STATE_NMI and reset nmi_injected when a
task-switch vmexit happened due to a task gate being used for handling
NMIs. Also avoid the false warning about valid vectoring info in
kvm_handle_exit.
Based on original patch by Gleb Natapov.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 81cf12b..8d0fc68 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2832,6 +2832,7 @@ static int handle_apic_access(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
static int handle_task_switch(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
unsigned long exit_qualification;
u16 tss_selector;
int reason;
@@ -2839,6 +2840,15 @@ static int handle_task_switch(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
reason = (u32)exit_qualification >> 30;
+ if (reason == TASK_SWITCH_GATE && vmx->vcpu.arch.nmi_injected &&
+ (vmx->idt_vectoring_info & VECTORING_INFO_VALID_MASK) &&
+ (vmx->idt_vectoring_info & VECTORING_INFO_TYPE_MASK)
+ == INTR_TYPE_NMI_INTR) {
+ vcpu->arch.nmi_injected = false;
+ if (cpu_has_virtual_nmis())
+ vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
+ GUEST_INTR_STATE_NMI);
+ }
tss_selector = exit_qualification;
return kvm_task_switch(vcpu, tss_selector, reason);
@@ -3012,9 +3022,11 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
if ((vectoring_info & VECTORING_INFO_VALID_MASK) &&
(exit_reason != EXIT_REASON_EXCEPTION_NMI &&
- exit_reason != EXIT_REASON_EPT_VIOLATION))
- printk(KERN_WARNING "%s: unexpected, valid vectoring info and "
- "exit reason is 0x%x\n", __func__, exit_reason);
+ exit_reason != EXIT_REASON_EPT_VIOLATION &&
+ exit_reason != EXIT_REASON_TASK_SWITCH))
+ printk(KERN_WARNING "%s: unexpected, valid vectoring info "
+ "(0x%x) and exit reason is 0x%x\n",
+ __func__, vectoring_info, exit_reason);
if (exit_reason < kvm_vmx_max_exit_handlers
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run);
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ac34537..81cf12b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2492,7 +2492,7 @@ static int handle_exception(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
set_bit(irq / BITS_PER_LONG, &vcpu->arch.irq_summary);
}
- if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200) /* nmi */
+ if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR)
return 1; /* already handled by vmx_vcpu_run() */
if (is_no_device(intr_info)) {
@@ -3337,7 +3337,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* We need to handle NMIs before interrupts are enabled */
- if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == 0x200 &&
+ if ((intr_info & INTR_INFO_INTR_TYPE_MASK) == INTR_TYPE_NMI_INTR &&
(intr_info & INTR_INFO_VALID_MASK)) {
KVMTRACE_0D(NMI, vcpu, handler);
asm("int $2");
--
1.6.0.3
From: Sheng Yang <[email protected]>
For EPT memory type support.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/mmu.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 104 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 410ddbc..ac2304f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1393,6 +1393,110 @@ struct page *gva_to_page(struct kvm_vcpu *vcpu, gva_t gva)
return page;
}
+/*
+ * The function is based on mtrr_type_lookup() in
+ * arch/x86/kernel/cpu/mtrr/generic.c
+ */
+static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
+ u64 start, u64 end)
+{
+ int i;
+ u64 base, mask;
+ u8 prev_match, curr_match;
+ int num_var_ranges = KVM_NR_VAR_MTRR;
+
+ if (!mtrr_state->enabled)
+ return 0xFF;
+
+ /* Make end inclusive end, instead of exclusive */
+ end--;
+
+ /* Look in fixed ranges. Just return the type as per start */
+ if (mtrr_state->have_fixed && (start < 0x100000)) {
+ int idx;
+
+ if (start < 0x80000) {
+ idx = 0;
+ idx += (start >> 16);
+ return mtrr_state->fixed_ranges[idx];
+ } else if (start < 0xC0000) {
+ idx = 1 * 8;
+ idx += ((start - 0x80000) >> 14);
+ return mtrr_state->fixed_ranges[idx];
+ } else if (start < 0x1000000) {
+ idx = 3 * 8;
+ idx += ((start - 0xC0000) >> 12);
+ return mtrr_state->fixed_ranges[idx];
+ }
+ }
+
+ /*
+ * Look in variable ranges
+ * Look of multiple ranges matching this address and pick type
+ * as per MTRR precedence
+ */
+ if (!(mtrr_state->enabled & 2))
+ return mtrr_state->def_type;
+
+ prev_match = 0xFF;
+ for (i = 0; i < num_var_ranges; ++i) {
+ unsigned short start_state, end_state;
+
+ if (!(mtrr_state->var_ranges[i].mask_lo & (1 << 11)))
+ continue;
+
+ base = (((u64)mtrr_state->var_ranges[i].base_hi) << 32) +
+ (mtrr_state->var_ranges[i].base_lo & PAGE_MASK);
+ mask = (((u64)mtrr_state->var_ranges[i].mask_hi) << 32) +
+ (mtrr_state->var_ranges[i].mask_lo & PAGE_MASK);
+
+ start_state = ((start & mask) == (base & mask));
+ end_state = ((end & mask) == (base & mask));
+ if (start_state != end_state)
+ return 0xFE;
+
+ if ((start & mask) != (base & mask))
+ continue;
+
+ curr_match = mtrr_state->var_ranges[i].base_lo & 0xff;
+ if (prev_match == 0xFF) {
+ prev_match = curr_match;
+ continue;
+ }
+
+ if (prev_match == MTRR_TYPE_UNCACHABLE ||
+ curr_match == MTRR_TYPE_UNCACHABLE)
+ return MTRR_TYPE_UNCACHABLE;
+
+ if ((prev_match == MTRR_TYPE_WRBACK &&
+ curr_match == MTRR_TYPE_WRTHROUGH) ||
+ (prev_match == MTRR_TYPE_WRTHROUGH &&
+ curr_match == MTRR_TYPE_WRBACK)) {
+ prev_match = MTRR_TYPE_WRTHROUGH;
+ curr_match = MTRR_TYPE_WRTHROUGH;
+ }
+
+ if (prev_match != curr_match)
+ return MTRR_TYPE_UNCACHABLE;
+ }
+
+ if (prev_match != 0xFF)
+ return prev_match;
+
+ return mtrr_state->def_type;
+}
+
+static u8 get_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+ u8 mtrr;
+
+ mtrr = get_mtrr_type(&vcpu->arch.mtrr_state, gfn << PAGE_SHIFT,
+ (gfn << PAGE_SHIFT) + PAGE_SIZE);
+ if (mtrr == 0xfe || mtrr == 0xff)
+ mtrr = MTRR_TYPE_WRBACK;
+ return mtrr;
+}
+
static int kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
unsigned index;
--
1.6.0.3
From: Sheng Yang <[email protected]>
Also remove unnecessary parameter of unregister irq ack notifier.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
include/linux/kvm_host.h | 3 +--
virt/kvm/irq_comm.c | 8 ++++++--
virt/kvm/kvm_main.c | 2 +-
3 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bb92be2..3a0fb77 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -316,8 +316,7 @@ void kvm_set_irq(struct kvm *kvm, int irq_source_id, int irq, int level);
void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi);
void kvm_register_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian);
-void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
- struct kvm_irq_ack_notifier *kian);
+void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian);
int kvm_request_irq_source_id(struct kvm *kvm);
void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 55ad76e..9fbbdea 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -58,12 +58,16 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned gsi)
void kvm_register_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian)
{
+ /* Must be called with in-kernel IRQ chip, otherwise it's nonsense */
+ ASSERT(irqchip_in_kernel(kvm));
+ ASSERT(kian);
hlist_add_head(&kian->link, &kvm->arch.irq_ack_notifier_list);
}
-void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
- struct kvm_irq_ack_notifier *kian)
+void kvm_unregister_irq_ack_notifier(struct kvm_irq_ack_notifier *kian)
{
+ if (!kian)
+ return;
hlist_del(&kian->link);
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a87f45e..4f43abe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -143,7 +143,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
if (irqchip_in_kernel(kvm) && assigned_dev->irq_requested)
free_irq(assigned_dev->host_irq, (void *)assigned_dev);
- kvm_unregister_irq_ack_notifier(kvm, &assigned_dev->ack_notifier);
+ kvm_unregister_irq_ack_notifier(&assigned_dev->ack_notifier);
kvm_free_irq_source_id(kvm, assigned_dev->irq_source_id);
if (cancel_work_sync(&assigned_dev->interrupt_work))
--
1.6.0.3
From: Sheng Yang <[email protected]>
For KVM can reuse the type define, and need them to support shadow MTRR.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/mtrr.h | 25 +++++++++++++++++++++++++
arch/x86/kernel/cpu/mtrr/generic.c | 12 +++---------
arch/x86/kernel/cpu/mtrr/mtrr.h | 17 -----------------
3 files changed, 28 insertions(+), 26 deletions(-)
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 7c1e425..cb988aa 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -57,6 +57,31 @@ struct mtrr_gentry {
};
#endif /* !__i386__ */
+struct mtrr_var_range {
+ u32 base_lo;
+ u32 base_hi;
+ u32 mask_lo;
+ u32 mask_hi;
+};
+
+/* In the Intel processor's MTRR interface, the MTRR type is always held in
+ an 8 bit field: */
+typedef u8 mtrr_type;
+
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
+struct mtrr_state_type {
+ struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+ mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
+ unsigned char enabled;
+ unsigned char have_fixed;
+ mtrr_type def_type;
+};
+
+#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
+#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
+
/* These are the various ioctls */
#define MTRRIOC_ADD_ENTRY _IOW(MTRR_IOCTL_BASE, 0, struct mtrr_sentry)
#define MTRRIOC_SET_ENTRY _IOW(MTRR_IOCTL_BASE, 1, struct mtrr_sentry)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 90db91e..b59ddcc 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,14 +14,6 @@
#include <asm/pat.h>
#include "mtrr.h"
-struct mtrr_state_type {
- struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
- mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
- unsigned char enabled;
- unsigned char have_fixed;
- mtrr_type def_type;
-};
-
struct fixed_range_block {
int base_msr; /* start address of an MTRR block */
int ranges; /* number of MTRRs in this block */
@@ -35,10 +27,12 @@ static struct fixed_range_block fixed_range_blocks[] = {
};
static unsigned long smp_changes_mask;
-static struct mtrr_state_type mtrr_state = {};
static int mtrr_state_set;
u64 mtrr_tom2;
+struct mtrr_state_type mtrr_state = {};
+EXPORT_SYMBOL_GPL(mtrr_state);
+
#undef MODULE_PARAM_PREFIX
#define MODULE_PARAM_PREFIX "mtrr."
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 9885382..ffd6040 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -8,12 +8,6 @@
#define MTRRcap_MSR 0x0fe
#define MTRRdefType_MSR 0x2ff
-#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
-#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
-
-#define MTRR_NUM_FIXED_RANGES 88
-#define MTRR_MAX_VAR_RANGES 256
-
#define MTRRfix64K_00000_MSR 0x250
#define MTRRfix16K_80000_MSR 0x258
#define MTRRfix16K_A0000_MSR 0x259
@@ -30,10 +24,6 @@
#define MTRR_CHANGE_MASK_VARIABLE 0x02
#define MTRR_CHANGE_MASK_DEFTYPE 0x04
-/* In the Intel processor's MTRR interface, the MTRR type is always held in
- an 8 bit field: */
-typedef u8 mtrr_type;
-
extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
struct mtrr_ops {
@@ -71,13 +61,6 @@ struct set_mtrr_context {
u32 ccr3;
};
-struct mtrr_var_range {
- u32 base_lo;
- u32 base_hi;
- u32 mask_lo;
- u32 mask_hi;
-};
-
void set_mtrr_done(struct set_mtrr_context *ctxt);
void set_mtrr_cache_disable(struct set_mtrr_context *ctxt);
void set_mtrr_prepare_save(struct set_mtrr_context *ctxt);
--
1.6.0.3
From: Amit Shah <[email protected]>
The busy flag of the TR selector is not set by the hardware. This breaks
migration from amd hosts to intel hosts.
Signed-off-by: Amit Shah <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/svm.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 665008d..743aebd 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -781,6 +781,13 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
if (seg == VCPU_SREG_CS)
var->g = s->limit > 0xfffff;
+ /*
+ * Work around a bug where the busy flag in the tr selector
+ * isn't exposed
+ */
+ if (seg == VCPU_SREG_TR)
+ var->type |= 0x2;
+
var->unusable = !var->present;
}
--
1.6.0.3
From: Sheng Yang <[email protected]>
Ideally, every assigned device should in a clear condition before and after
assignment, so that the former state of device won't affect later work.
Some devices provide a mechanism named Function Level Reset, which is
defined in PCI/PCI-e document. We should execute it before and after device
assignment.
(But sadly, the feature is new, and most device on the market now don't
support it. We are considering using D0/D3hot transmit to emulate it later,
but not that elegant and reliable as FLR itself.)
[Update: Reminded by Xiantao, execute FLR after we ensure that the device can
be assigned to the guest.]
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
virt/kvm/kvm_main.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 38f79b6..9a4a39c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4148,8 +4148,8 @@ static void kvm_free_vcpus(struct kvm *kvm)
void kvm_arch_destroy_vm(struct kvm *kvm)
{
- kvm_iommu_unmap_guest(kvm);
kvm_free_all_assigned_devices(kvm);
+ kvm_iommu_unmap_guest(kvm);
kvm_free_pit(kvm);
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f43abe..1838052 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -152,6 +152,8 @@ static void kvm_free_assigned_device(struct kvm *kvm,
*/
kvm_put_kvm(kvm);
+ pci_reset_function(assigned_dev->dev);
+
pci_release_regions(assigned_dev->dev);
pci_disable_device(assigned_dev->dev);
pci_dev_put(assigned_dev->dev);
@@ -283,6 +285,9 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
__func__);
goto out_disable;
}
+
+ pci_reset_function(dev);
+
match->assigned_dev_id = assigned_dev->assigned_dev_id;
match->host_busnr = assigned_dev->busnr;
match->host_devfn = assigned_dev->devfn;
--
1.6.0.3
From: Xiantao Zhang <[email protected]>
Remove the lock protection for kvm halt logic, otherwise,
once other vcpus want to acquire the lock, and they have to
wait all vcpus are waken up from halt.
Signed-off-by: Xiantao Zhang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/ia64/kvm/kvm-ia64.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index 43e45f6..70eb829 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -439,7 +439,6 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
expires = div64_u64(itc_diff, cyc_per_usec);
kt = ktime_set(0, 1000 * expires);
- down_read(&vcpu->kvm->slots_lock);
vcpu->arch.ht_active = 1;
hrtimer_start(p_ht, kt, HRTIMER_MODE_ABS);
@@ -452,7 +451,6 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
vcpu->arch.mp_state =
KVM_MP_STATE_RUNNABLE;
- up_read(&vcpu->kvm->slots_lock);
if (vcpu->arch.mp_state != KVM_MP_STATE_RUNNABLE)
return -EINTR;
--
1.6.0.3
From: Sheng Yang <[email protected]>
As well as reset mmu context when set MTRR.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 5 ++-
arch/x86/kvm/x86.c | 61 +++++++++++++++++++++++++++++++++++++-
2 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a40fa84..8082e87 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -21,6 +21,7 @@
#include <asm/pvclock-abi.h>
#include <asm/desc.h>
+#include <asm/mtrr.h>
#define KVM_MAX_VCPUS 16
#define KVM_MEMORY_SLOTS 32
@@ -86,6 +87,7 @@
#define KVM_MIN_FREE_MMU_PAGES 5
#define KVM_REFILL_PAGES 25
#define KVM_MAX_CPUID_ENTRIES 40
+#define KVM_NR_FIXED_MTRR_REGION 88
#define KVM_NR_VAR_MTRR 8
extern spinlock_t kvm_lock;
@@ -329,7 +331,8 @@ struct kvm_vcpu_arch {
bool nmi_injected;
bool nmi_window_open;
- u64 mtrr[0x100];
+ struct mtrr_state_type mtrr_state;
+ u32 pat;
};
struct kvm_mem_alias {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a2c4b55..f5b2334 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -39,6 +39,7 @@
#include <asm/uaccess.h>
#include <asm/msr.h>
#include <asm/desc.h>
+#include <asm/mtrr.h>
#define MAX_IO_MSRS 256
#define CR0_RESERVED_BITS \
@@ -650,10 +651,38 @@ static bool msr_mtrr_valid(unsigned msr)
static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
+ u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
+
if (!msr_mtrr_valid(msr))
return 1;
- vcpu->arch.mtrr[msr - 0x200] = data;
+ if (msr == MSR_MTRRdefType) {
+ vcpu->arch.mtrr_state.def_type = data;
+ vcpu->arch.mtrr_state.enabled = (data & 0xc00) >> 10;
+ } else if (msr == MSR_MTRRfix64K_00000)
+ p[0] = data;
+ else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
+ p[1 + msr - MSR_MTRRfix16K_80000] = data;
+ else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
+ p[3 + msr - MSR_MTRRfix4K_C0000] = data;
+ else if (msr == MSR_IA32_CR_PAT)
+ vcpu->arch.pat = data;
+ else { /* Variable MTRRs */
+ int idx, is_mtrr_mask;
+ u64 *pt;
+
+ idx = (msr - 0x200) / 2;
+ is_mtrr_mask = msr - 0x200 - 2 * idx;
+ if (!is_mtrr_mask)
+ pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
+ else
+ pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
+ *pt = data;
+ }
+
+ kvm_mmu_reset_context(vcpu);
return 0;
}
@@ -749,10 +778,37 @@ int kvm_get_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 *pdata)
static int get_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
{
+ u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
+
if (!msr_mtrr_valid(msr))
return 1;
- *pdata = vcpu->arch.mtrr[msr - 0x200];
+ if (msr == MSR_MTRRdefType)
+ *pdata = vcpu->arch.mtrr_state.def_type +
+ (vcpu->arch.mtrr_state.enabled << 10);
+ else if (msr == MSR_MTRRfix64K_00000)
+ *pdata = p[0];
+ else if (msr == MSR_MTRRfix16K_80000 || msr == MSR_MTRRfix16K_A0000)
+ *pdata = p[1 + msr - MSR_MTRRfix16K_80000];
+ else if (msr >= MSR_MTRRfix4K_C0000 && msr <= MSR_MTRRfix4K_F8000)
+ *pdata = p[3 + msr - MSR_MTRRfix4K_C0000];
+ else if (msr == MSR_IA32_CR_PAT)
+ *pdata = vcpu->arch.pat;
+ else { /* Variable MTRRs */
+ int idx, is_mtrr_mask;
+ u64 *pt;
+
+ idx = (msr - 0x200) / 2;
+ is_mtrr_mask = msr - 0x200 - 2 * idx;
+ if (!is_mtrr_mask)
+ pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].base_lo;
+ else
+ pt =
+ (u64 *)&vcpu->arch.mtrr_state.var_ranges[idx].mask_lo;
+ *pdata = *pt;
+ }
+
return 0;
}
@@ -3942,6 +3998,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
/* We do fxsave: this must be aligned. */
BUG_ON((unsigned long)&vcpu->arch.host_fx_image & 0xF);
+ vcpu->arch.mtrr_state.have_fixed = 1;
vcpu_load(vcpu);
r = kvm_arch_vcpu_reset(vcpu);
if (r == 0)
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Older VMX supporting CPUs do not provide the "Virtual NMI" feature for
tracking the NMI-blocked state after injecting such events. For now
KVM is unable to inject NMIs on those CPUs.
Derived from Sheng Yang's suggestion to use the IRQ window notification
for detecting the end of NMI handlers, this patch implements virtual
NMI support without impact on the host's ability to receive real NMIs.
The downside is that the given approach requires some heuristics that
can cause NMI nesting in vary rare corner cases.
The approach works as follows:
- inject NMI and set a software-based NMI-blocked flag
- arm the IRQ window start notification whenever an NMI window is
requested
- if the guest exits due to an opening IRQ window, clear the emulated
NMI-blocked flag
- if the guest net execution time with NMI-blocked but without an IRQ
window exceeds 1 second, force NMI-blocked reset and inject anyway
This approach covers most practical scenarios:
- succeeding NMIs are seperated by at least one open IRQ window
- the guest may spin with IRQs disabled (e.g. due to a bug), but
leaving the NMI handler takes much less time than one second
- the guest does not rely on strict ordering or timing of NMIs
(would be problematic in virtualized environments anyway)
Successfully tested with the 'nmi n' monitor command, the kgdbts
testsuite on smp guests (additional patches required to add debug
register support to kvm) + the kernel's nmi_watchdog=1, and a Siemens-
specific board emulation (+ guest) that comes with its own NMI
watchdog mechanism.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 174 ++++++++++++++++++++++++++++++++++------------------
1 files changed, 115 insertions(+), 59 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f16a62c..2180109 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -90,6 +90,11 @@ struct vcpu_vmx {
} rmode;
int vpid;
bool emulation_required;
+
+ /* Support for vnmi-less CPUs */
+ int soft_vnmi_blocked;
+ ktime_t entry_time;
+ s64 vnmi_blocked_time;
};
static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
@@ -2230,6 +2235,8 @@ static int vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmx->vcpu.arch.rmode.active = 0;
+ vmx->soft_vnmi_blocked = 0;
+
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
msr = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
@@ -2335,6 +2342,29 @@ out:
return ret;
}
+static void enable_irq_window(struct kvm_vcpu *vcpu)
+{
+ u32 cpu_based_vm_exec_control;
+
+ cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
+static void enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+ u32 cpu_based_vm_exec_control;
+
+ if (!cpu_has_virtual_nmis()) {
+ enable_irq_window(vcpu);
+ return;
+ }
+
+ cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
static void vmx_inject_irq(struct kvm_vcpu *vcpu, int irq)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -2360,6 +2390,19 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ if (!cpu_has_virtual_nmis()) {
+ /*
+ * Tracking the NMI-blocked state in software is built upon
+ * finding the next open IRQ window. This, in turn, depends on
+ * well-behaving guests: They have to keep IRQs disabled at
+ * least as long as the NMI handler runs. Otherwise we may
+ * cause NMI nesting, maybe breaking the guest. But as this is
+ * highly unlikely, we can live with the residual risk.
+ */
+ vmx->soft_vnmi_blocked = 1;
+ vmx->vnmi_blocked_time = 0;
+ }
+
++vcpu->stat.nmi_injections;
if (vcpu->arch.rmode.active) {
vmx->rmode.irq.pending = true;
@@ -2384,6 +2427,8 @@ static void vmx_update_window_states(struct kvm_vcpu *vcpu)
!(guest_intr & (GUEST_INTR_STATE_STI |
GUEST_INTR_STATE_MOV_SS |
GUEST_INTR_STATE_NMI));
+ if (!cpu_has_virtual_nmis() && to_vmx(vcpu)->soft_vnmi_blocked)
+ vcpu->arch.nmi_window_open = 0;
vcpu->arch.interrupt_window_open =
((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
@@ -2403,55 +2448,31 @@ static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
kvm_queue_interrupt(vcpu, irq);
}
-static void enable_irq_window(struct kvm_vcpu *vcpu)
-{
- u32 cpu_based_vm_exec_control;
-
- cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
- cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
-}
-
-static void enable_nmi_window(struct kvm_vcpu *vcpu)
-{
- u32 cpu_based_vm_exec_control;
-
- if (!cpu_has_virtual_nmis())
- return;
-
- cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
- cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
-}
-
static void do_interrupt_requests(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
vmx_update_window_states(vcpu);
- if (cpu_has_virtual_nmis()) {
- if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
- if (vcpu->arch.nmi_window_open) {
- vcpu->arch.nmi_pending = false;
- vcpu->arch.nmi_injected = true;
- } else {
- enable_nmi_window(vcpu);
- return;
- }
- }
- if (vcpu->arch.nmi_injected) {
- vmx_inject_nmi(vcpu);
- if (vcpu->arch.nmi_pending
- || kvm_run->request_nmi_window)
- enable_nmi_window(vcpu);
- else if (vcpu->arch.irq_summary
- || kvm_run->request_interrupt_window)
- enable_irq_window(vcpu);
+ if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
+ if (vcpu->arch.nmi_window_open) {
+ vcpu->arch.nmi_pending = false;
+ vcpu->arch.nmi_injected = true;
+ } else {
+ enable_nmi_window(vcpu);
return;
}
- if (!vcpu->arch.nmi_window_open || kvm_run->request_nmi_window)
+ }
+ if (vcpu->arch.nmi_injected) {
+ vmx_inject_nmi(vcpu);
+ if (vcpu->arch.nmi_pending || kvm_run->request_nmi_window)
enable_nmi_window(vcpu);
+ else if (vcpu->arch.irq_summary
+ || kvm_run->request_interrupt_window)
+ enable_irq_window(vcpu);
+ return;
}
+ if (!vcpu->arch.nmi_window_open || kvm_run->request_nmi_window)
+ enable_nmi_window(vcpu);
if (vcpu->arch.interrupt_window_open) {
if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
@@ -3097,6 +3118,37 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
printk(KERN_WARNING "%s: unexpected, valid vectoring info "
"(0x%x) and exit reason is 0x%x\n",
__func__, vectoring_info, exit_reason);
+
+ if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked)) {
+ if (vcpu->arch.interrupt_window_open) {
+ vmx->soft_vnmi_blocked = 0;
+ vcpu->arch.nmi_window_open = 1;
+ } else if (vmx->vnmi_blocked_time > 1000000000LL &&
+ (kvm_run->request_nmi_window || vcpu->arch.nmi_pending)) {
+ /*
+ * This CPU don't support us in finding the end of an
+ * NMI-blocked window if the guest runs with IRQs
+ * disabled. So we pull the trigger after 1 s of
+ * futile waiting, but inform the user about this.
+ */
+ printk(KERN_WARNING "%s: Breaking out of NMI-blocked "
+ "state on VCPU %d after 1 s timeout\n",
+ __func__, vcpu->vcpu_id);
+ vmx->soft_vnmi_blocked = 0;
+ vmx->vcpu.arch.nmi_window_open = 1;
+ }
+
+ /*
+ * If the user space waits to inject an NNI, exit ASAP
+ */
+ if (vcpu->arch.nmi_window_open && kvm_run->request_nmi_window
+ && !vcpu->arch.nmi_pending) {
+ kvm_run->exit_reason = KVM_EXIT_NMI_WINDOW_OPEN;
+ ++vcpu->stat.nmi_window_exits;
+ return 0;
+ }
+ }
+
if (exit_reason < kvm_vmx_max_exit_handlers
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu, kvm_run);
@@ -3146,7 +3198,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
if (unblock_nmi && vector != DF_VECTOR)
vmcs_set_bits(GUEST_INTERRUPTIBILITY_INFO,
GUEST_INTR_STATE_NMI);
- }
+ } else if (unlikely(vmx->soft_vnmi_blocked))
+ vmx->vnmi_blocked_time +=
+ ktime_to_ns(ktime_sub(ktime_get(), vmx->entry_time));
idt_vectoring_info = vmx->idt_vectoring_info;
idtv_info_valid = idt_vectoring_info & VECTORING_INFO_VALID_MASK;
@@ -3186,27 +3240,25 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
vmx_update_window_states(vcpu);
- if (cpu_has_virtual_nmis()) {
- if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
- if (vcpu->arch.interrupt.pending) {
- enable_nmi_window(vcpu);
- } else if (vcpu->arch.nmi_window_open) {
- vcpu->arch.nmi_pending = false;
- vcpu->arch.nmi_injected = true;
- } else {
- enable_nmi_window(vcpu);
- return;
- }
- }
- if (vcpu->arch.nmi_injected) {
- vmx_inject_nmi(vcpu);
- if (vcpu->arch.nmi_pending)
- enable_nmi_window(vcpu);
- else if (kvm_cpu_has_interrupt(vcpu))
- enable_irq_window(vcpu);
+ if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
+ if (vcpu->arch.interrupt.pending) {
+ enable_nmi_window(vcpu);
+ } else if (vcpu->arch.nmi_window_open) {
+ vcpu->arch.nmi_pending = false;
+ vcpu->arch.nmi_injected = true;
+ } else {
+ enable_nmi_window(vcpu);
return;
}
}
+ if (vcpu->arch.nmi_injected) {
+ vmx_inject_nmi(vcpu);
+ if (vcpu->arch.nmi_pending)
+ enable_nmi_window(vcpu);
+ else if (kvm_cpu_has_interrupt(vcpu))
+ enable_irq_window(vcpu);
+ return;
+ }
if (!vcpu->arch.interrupt.pending && kvm_cpu_has_interrupt(vcpu)) {
if (vcpu->arch.interrupt_window_open)
kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
@@ -3255,6 +3307,10 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
struct vcpu_vmx *vmx = to_vmx(vcpu);
u32 intr_info;
+ /* Record the guest's net vcpu time for enforced NMI injections. */
+ if (unlikely(!cpu_has_virtual_nmis() && vmx->soft_vnmi_blocked))
+ vmx->entry_time = ktime_get();
+
/* Handle invalid guest state instead of entering VMX */
if (vmx->emulation_required && emulate_invalid_guest_state) {
handle_invalid_guest_state(vcpu, kvm_run);
--
1.6.0.3
From: Sheng Yang <[email protected]>
The effective memory type of EPT is the mixture of MSR_IA32_CR_PAT and memory
type field of EPT entry.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/mmu.c | 11 ++++++++++-
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/vmx.c | 10 ++++++++--
arch/x86/kvm/x86.c | 2 +-
5 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8082e87..93040b5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -483,6 +483,7 @@ struct kvm_x86_ops {
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
int (*get_tdp_level)(void);
+ int (*get_mt_mask_shift)(void);
};
extern struct kvm_x86_ops *kvm_x86_ops;
@@ -496,7 +497,7 @@ int kvm_mmu_setup(struct kvm_vcpu *vcpu);
void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte);
void kvm_mmu_set_base_ptes(u64 base_pte);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
- u64 dirty_mask, u64 nx_mask, u64 x_mask);
+ u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask);
int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac2304f..09d05f5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -168,6 +168,7 @@ static u64 __read_mostly shadow_x_mask; /* mutual exclusive with nx_mask */
static u64 __read_mostly shadow_user_mask;
static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
+static u64 __read_mostly shadow_mt_mask;
void kvm_mmu_set_nonpresent_ptes(u64 trap_pte, u64 notrap_pte)
{
@@ -183,13 +184,14 @@ void kvm_mmu_set_base_ptes(u64 base_pte)
EXPORT_SYMBOL_GPL(kvm_mmu_set_base_ptes);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
- u64 dirty_mask, u64 nx_mask, u64 x_mask)
+ u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 mt_mask)
{
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
shadow_dirty_mask = dirty_mask;
shadow_nx_mask = nx_mask;
shadow_x_mask = x_mask;
+ shadow_mt_mask = mt_mask;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);
@@ -1546,6 +1548,8 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
{
u64 spte;
int ret = 0;
+ u64 mt_mask = shadow_mt_mask;
+
/*
* We don't set the accessed bit, since we sometimes want to see
* whether the guest actually used the pte (in order to detect
@@ -1564,6 +1568,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *shadow_pte,
spte |= shadow_user_mask;
if (largepage)
spte |= PT_PAGE_SIZE_MASK;
+ if (mt_mask) {
+ mt_mask = get_memory_type(vcpu, gfn) <<
+ kvm_x86_ops->get_mt_mask_shift();
+ spte |= mt_mask;
+ }
spte |= (u64)pfn << PAGE_SHIFT;
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 9c4ce65..05efc4e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1912,6 +1912,11 @@ static int get_npt_level(void)
#endif
}
+static int svm_get_mt_mask_shift(void)
+{
+ return 0;
+}
+
static struct kvm_x86_ops svm_x86_ops = {
.cpu_has_kvm_support = has_svm,
.disabled_by_bios = is_disabled,
@@ -1967,6 +1972,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.set_tss_addr = svm_set_tss_addr,
.get_tdp_level = get_npt_level,
+ .get_mt_mask_shift = svm_get_mt_mask_shift,
};
static int __init svm_init(void)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b4c95a5..dae134f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3574,6 +3574,11 @@ static int get_ept_level(void)
return VMX_EPT_DEFAULT_GAW + 1;
}
+static int vmx_get_mt_mask_shift(void)
+{
+ return VMX_EPT_MT_EPTE_SHIFT;
+}
+
static struct kvm_x86_ops vmx_x86_ops = {
.cpu_has_kvm_support = cpu_has_kvm_support,
.disabled_by_bios = vmx_disabled_by_bios,
@@ -3629,6 +3634,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.set_tss_addr = vmx_set_tss_addr,
.get_tdp_level = get_ept_level,
+ .get_mt_mask_shift = vmx_get_mt_mask_shift,
};
static int __init vmx_init(void)
@@ -3685,10 +3691,10 @@ static int __init vmx_init(void)
bypass_guest_pf = 0;
kvm_mmu_set_base_ptes(VMX_EPT_READABLE_MASK |
VMX_EPT_WRITABLE_MASK |
- VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT |
VMX_EPT_IGMT_BIT);
kvm_mmu_set_mask_ptes(0ull, 0ull, 0ull, 0ull,
- VMX_EPT_EXECUTABLE_MASK);
+ VMX_EPT_EXECUTABLE_MASK,
+ VMX_EPT_DEFAULT_MT << VMX_EPT_MT_EPTE_SHIFT);
kvm_enable_tdp();
} else
kvm_disable_tdp();
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0edf753..f175b79 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2615,7 +2615,7 @@ int kvm_arch_init(void *opaque)
kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
- PT_DIRTY_MASK, PT64_NX_MASK, 0);
+ PT_DIRTY_MASK, PT64_NX_MASK, 0, 0);
return 0;
out:
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
This introduces a set of core-provided hooks. For 440, some of these are
implemented by booke.c, with the rest in (the new) 44x.c.
Note that these hooks are link-time, not run-time. Since it is not possible to
build a single kernel for both e500 and 440 (for example), using function
pointers would only add overhead.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/kvm_host.h | 3 +
arch/powerpc/include/asm/kvm_ppc.h | 34 +++++-----
arch/powerpc/kvm/44x.c | 123 +++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/44x_tlb.h | 1 +
arch/powerpc/kvm/Kconfig | 11 +--
arch/powerpc/kvm/Makefile | 4 +-
arch/powerpc/kvm/booke.c | 61 +++++++++++++----
arch/powerpc/kvm/emulate.c | 2 +-
arch/powerpc/kvm/powerpc.c | 103 +++--------------------------
9 files changed, 207 insertions(+), 135 deletions(-)
create mode 100644 arch/powerpc/kvm/44x.c
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index df73351..f5850d7 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -74,6 +74,9 @@ struct kvmppc_44x_tlbe {
struct kvm_arch {
};
+/* XXX Can't include mmu-44x.h because it redefines struct mm_context. */
+#define PPC44x_TLB_SIZE 64
+
struct kvm_vcpu_arch {
/* Unmodified copy of the guest's TLB. */
struct kvmppc_44x_tlbe guest_tlb[PPC44x_TLB_SIZE];
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 39daeaa..96d5de9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -61,23 +61,6 @@ extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn,
extern void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode);
extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
-/* XXX Book E specific */
-extern void kvmppc_tlbe_set_modified(struct kvm_vcpu *vcpu, unsigned int i);
-
-extern void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu);
-
-static inline void kvmppc_queue_exception(struct kvm_vcpu *vcpu, int exception)
-{
- unsigned int priority = exception_priority[exception];
- set_bit(priority, &vcpu->arch.pending_exceptions);
-}
-
-static inline void kvmppc_clear_exception(struct kvm_vcpu *vcpu, int exception)
-{
- unsigned int priority = exception_priority[exception];
- clear_bit(priority, &vcpu->arch.pending_exceptions);
-}
-
/* Helper function for "full" MSR writes. No need to call this if only EE is
* changing. */
static inline void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
@@ -99,6 +82,23 @@ static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 new_pid)
}
}
+/* Core-specific hooks */
+
+extern int kvmppc_core_check_processor_compat(void);
+
+extern void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
+extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
+
+extern void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu);
+
+extern void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu);
+extern int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_program(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
+extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+ struct kvm_interrupt *irq);
+
extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
#endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
new file mode 100644
index 0000000..fcf8c7d
--- /dev/null
+++ b/arch/powerpc/kvm/44x.c
@@ -0,0 +1,123 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[email protected]>
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/reg.h>
+#include <asm/cputable.h>
+#include <asm/tlbflush.h>
+
+#include "44x_tlb.h"
+
+/* Note: clearing MSR[DE] just means that the debug interrupt will not be
+ * delivered *immediately*. Instead, it simply sets the appropriate DBSR bits.
+ * If those DBSR bits are still set when MSR[DE] is re-enabled, the interrupt
+ * will be delivered as an "imprecise debug event" (which is indicated by
+ * DBSR[IDE].
+ */
+static void kvm44x_disable_debug_interrupts(void)
+{
+ mtmsr(mfmsr() & ~MSR_DE);
+}
+
+void kvmppc_core_load_host_debugstate(struct kvm_vcpu *vcpu)
+{
+ kvm44x_disable_debug_interrupts();
+
+ mtspr(SPRN_IAC1, vcpu->arch.host_iac[0]);
+ mtspr(SPRN_IAC2, vcpu->arch.host_iac[1]);
+ mtspr(SPRN_IAC3, vcpu->arch.host_iac[2]);
+ mtspr(SPRN_IAC4, vcpu->arch.host_iac[3]);
+ mtspr(SPRN_DBCR1, vcpu->arch.host_dbcr1);
+ mtspr(SPRN_DBCR2, vcpu->arch.host_dbcr2);
+ mtspr(SPRN_DBCR0, vcpu->arch.host_dbcr0);
+ mtmsr(vcpu->arch.host_msr);
+}
+
+void kvmppc_core_load_guest_debugstate(struct kvm_vcpu *vcpu)
+{
+ struct kvm_guest_debug *dbg = &vcpu->guest_debug;
+ u32 dbcr0 = 0;
+
+ vcpu->arch.host_msr = mfmsr();
+ kvm44x_disable_debug_interrupts();
+
+ /* Save host debug register state. */
+ vcpu->arch.host_iac[0] = mfspr(SPRN_IAC1);
+ vcpu->arch.host_iac[1] = mfspr(SPRN_IAC2);
+ vcpu->arch.host_iac[2] = mfspr(SPRN_IAC3);
+ vcpu->arch.host_iac[3] = mfspr(SPRN_IAC4);
+ vcpu->arch.host_dbcr0 = mfspr(SPRN_DBCR0);
+ vcpu->arch.host_dbcr1 = mfspr(SPRN_DBCR1);
+ vcpu->arch.host_dbcr2 = mfspr(SPRN_DBCR2);
+
+ /* set registers up for guest */
+
+ if (dbg->bp[0]) {
+ mtspr(SPRN_IAC1, dbg->bp[0]);
+ dbcr0 |= DBCR0_IAC1 | DBCR0_IDM;
+ }
+ if (dbg->bp[1]) {
+ mtspr(SPRN_IAC2, dbg->bp[1]);
+ dbcr0 |= DBCR0_IAC2 | DBCR0_IDM;
+ }
+ if (dbg->bp[2]) {
+ mtspr(SPRN_IAC3, dbg->bp[2]);
+ dbcr0 |= DBCR0_IAC3 | DBCR0_IDM;
+ }
+ if (dbg->bp[3]) {
+ mtspr(SPRN_IAC4, dbg->bp[3]);
+ dbcr0 |= DBCR0_IAC4 | DBCR0_IDM;
+ }
+
+ mtspr(SPRN_DBCR0, dbcr0);
+ mtspr(SPRN_DBCR1, 0);
+ mtspr(SPRN_DBCR2, 0);
+}
+
+void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+{
+ int i;
+
+ /* Mark every guest entry in the shadow TLB entry modified, so that they
+ * will all be reloaded on the next vcpu run (instead of being
+ * demand-faulted). */
+ for (i = 0; i <= tlb_44x_hwater; i++)
+ kvmppc_tlbe_set_modified(vcpu, i);
+}
+
+void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
+{
+ /* Don't leave guest TLB entries resident when being de-scheduled. */
+ /* XXX It would be nice to differentiate between heavyweight exit and
+ * sched_out here, since we could avoid the TLB flush for heavyweight
+ * exits. */
+ _tlbia();
+}
+
+int kvmppc_core_check_processor_compat(void)
+{
+ int r;
+
+ if (strcmp(cur_cpu_spec->platform, "ppc440") == 0)
+ r = 0;
+ else
+ r = -ENOTSUPP;
+
+ return r;
+}
diff --git a/arch/powerpc/kvm/44x_tlb.h b/arch/powerpc/kvm/44x_tlb.h
index e5b0a76..357d79a 100644
--- a/arch/powerpc/kvm/44x_tlb.h
+++ b/arch/powerpc/kvm/44x_tlb.h
@@ -29,6 +29,7 @@ extern struct kvmppc_44x_tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu,
gva_t eaddr);
extern struct kvmppc_44x_tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu,
gva_t eaddr);
+extern void kvmppc_tlbe_set_modified(struct kvm_vcpu *vcpu, unsigned int i);
/* TLB helper functions */
static inline unsigned int get_tlb_size(const struct kvmppc_44x_tlbe *tlbe)
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index ffed96f..37e9b3c 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -16,11 +16,9 @@ if VIRTUALIZATION
config KVM
bool "Kernel-based Virtual Machine (KVM) support"
- depends on 44x && EXPERIMENTAL
+ depends on EXPERIMENTAL
select PREEMPT_NOTIFIERS
select ANON_INODES
- # We can only run on Book E hosts so far
- select KVM_BOOKE
---help---
Support hosting virtualized guest machines. You will also
need to select one or more of the processor modules below.
@@ -30,12 +28,11 @@ config KVM
If unsure, say N.
-config KVM_BOOKE
- bool "KVM support for Book E PowerPC processors"
+config KVM_440
+ bool "KVM support for PowerPC 440 processors"
depends on KVM && 44x
---help---
- Provides host support for KVM on Book E PowerPC processors. Currently
- this works on 440 processors only.
+ KVM can run unmodified 440 guest kernels on 440 host processors.
config KVM_TRACE
bool "KVM trace support"
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index a7f8574..f5e3375 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -13,5 +13,5 @@ obj-$(CONFIG_KVM) += kvm.o
AFLAGS_booke_interrupts.o := -I$(obj)
-kvm-booke-objs := booke.o booke_interrupts.o 44x_tlb.o
-obj-$(CONFIG_KVM_BOOKE) += kvm-booke.o
+kvm-440-objs := booke.o booke_interrupts.o 44x.o 44x_tlb.o
+obj-$(CONFIG_KVM_440) += kvm-440.o
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index b1e90a1..138014a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -134,6 +134,40 @@ void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
}
}
+static void kvmppc_booke_queue_exception(struct kvm_vcpu *vcpu, int exception)
+{
+ unsigned int priority = exception_priority[exception];
+ set_bit(priority, &vcpu->arch.pending_exceptions);
+}
+
+static void kvmppc_booke_clear_exception(struct kvm_vcpu *vcpu, int exception)
+{
+ unsigned int priority = exception_priority[exception];
+ clear_bit(priority, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_program(struct kvm_vcpu *vcpu)
+{
+ kvmppc_booke_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM);
+}
+
+void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu)
+{
+ kvmppc_booke_queue_exception(vcpu, BOOKE_INTERRUPT_DECREMENTER);
+}
+
+int kvmppc_core_pending_dec(struct kvm_vcpu *vcpu)
+{
+ unsigned int priority = exception_priority[BOOKE_INTERRUPT_DECREMENTER];
+ return test_bit(priority, &vcpu->arch.pending_exceptions);
+}
+
+void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
+ struct kvm_interrupt *irq)
+{
+ kvmppc_booke_queue_exception(vcpu, BOOKE_INTERRUPT_EXTERNAL);
+}
+
/* Check if we are ready to deliver the interrupt */
static int kvmppc_can_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
{
@@ -168,7 +202,7 @@ static int kvmppc_can_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
return r;
}
-static void kvmppc_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
+static void kvmppc_booke_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
{
switch (interrupt) {
case BOOKE_INTERRUPT_DECREMENTER:
@@ -183,7 +217,7 @@ static void kvmppc_deliver_interrupt(struct kvm_vcpu *vcpu, int interrupt)
}
/* Check pending exceptions and deliver one, if possible. */
-void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu)
+void kvmppc_core_deliver_interrupts(struct kvm_vcpu *vcpu)
{
unsigned long *pending = &vcpu->arch.pending_exceptions;
unsigned int exception;
@@ -193,8 +227,8 @@ void kvmppc_check_and_deliver_interrupts(struct kvm_vcpu *vcpu)
while (priority <= BOOKE_MAX_INTERRUPT) {
exception = priority_exception[priority];
if (kvmppc_can_deliver_interrupt(vcpu, exception)) {
- kvmppc_clear_exception(vcpu, exception);
- kvmppc_deliver_interrupt(vcpu, exception);
+ kvmppc_booke_clear_exception(vcpu, exception);
+ kvmppc_booke_deliver_interrupt(vcpu, exception);
break;
}
@@ -251,7 +285,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* Program traps generated by user-level software must be handled
* by the guest kernel. */
vcpu->arch.esr = vcpu->arch.fault_esr;
- kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM);
+ kvmppc_booke_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM);
r = RESUME_GUEST;
break;
}
@@ -284,27 +318,27 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
break;
case BOOKE_INTERRUPT_FP_UNAVAIL:
- kvmppc_queue_exception(vcpu, exit_nr);
+ kvmppc_booke_queue_exception(vcpu, exit_nr);
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_DATA_STORAGE:
vcpu->arch.dear = vcpu->arch.fault_dear;
vcpu->arch.esr = vcpu->arch.fault_esr;
- kvmppc_queue_exception(vcpu, exit_nr);
+ kvmppc_booke_queue_exception(vcpu, exit_nr);
vcpu->stat.dsi_exits++;
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_INST_STORAGE:
vcpu->arch.esr = vcpu->arch.fault_esr;
- kvmppc_queue_exception(vcpu, exit_nr);
+ kvmppc_booke_queue_exception(vcpu, exit_nr);
vcpu->stat.isi_exits++;
r = RESUME_GUEST;
break;
case BOOKE_INTERRUPT_SYSCALL:
- kvmppc_queue_exception(vcpu, exit_nr);
+ kvmppc_booke_queue_exception(vcpu, exit_nr);
vcpu->stat.syscall_exits++;
r = RESUME_GUEST;
break;
@@ -318,7 +352,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
gtlbe = kvmppc_44x_dtlb_search(vcpu, eaddr);
if (!gtlbe) {
/* The guest didn't have a mapping for it. */
- kvmppc_queue_exception(vcpu, exit_nr);
+ kvmppc_booke_queue_exception(vcpu, exit_nr);
vcpu->arch.dear = vcpu->arch.fault_dear;
vcpu->arch.esr = vcpu->arch.fault_esr;
vcpu->stat.dtlb_real_miss_exits++;
@@ -360,7 +394,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
gtlbe = kvmppc_44x_itlb_search(vcpu, eaddr);
if (!gtlbe) {
/* The guest didn't have a mapping for it. */
- kvmppc_queue_exception(vcpu, exit_nr);
+ kvmppc_booke_queue_exception(vcpu, exit_nr);
vcpu->stat.itlb_real_miss_exits++;
break;
}
@@ -380,8 +414,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
gtlbe->word2);
} else {
/* Guest mapped and leaped at non-RAM! */
- kvmppc_queue_exception(vcpu,
- BOOKE_INTERRUPT_MACHINE_CHECK);
+ kvmppc_booke_queue_exception(vcpu, BOOKE_INTERRUPT_MACHINE_CHECK);
}
break;
@@ -409,7 +442,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
local_irq_disable();
- kvmppc_check_and_deliver_interrupts(vcpu);
+ kvmppc_core_deliver_interrupts(vcpu);
/* Do some exit accounting. */
vcpu->stat.sum_exits++;
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 0ce8ed5..c5d2bfc 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -139,7 +139,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
switch (get_op(inst)) {
case 3: /* trap */
printk("trap!\n");
- kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_PROGRAM);
+ kvmppc_core_queue_program(vcpu);
advance = 0;
break;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index fda9baa..800fe97 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -98,14 +98,7 @@ void kvm_arch_hardware_unsetup(void)
void kvm_arch_check_processor_compat(void *rtn)
{
- int r;
-
- if (strcmp(cur_cpu_spec->platform, "ppc440") == 0)
- r = 0;
- else
- r = -ENOTSUPP;
-
- *(int *)rtn = r;
+ *(int *)rtn = kvmppc_core_check_processor_compat();
}
struct kvm *kvm_arch_create_vm(void)
@@ -211,16 +204,14 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
- unsigned int priority = exception_priority[BOOKE_INTERRUPT_DECREMENTER];
-
- return test_bit(priority, &vcpu->arch.pending_exceptions);
+ return kvmppc_core_pending_dec(vcpu);
}
static void kvmppc_decrementer_func(unsigned long data)
{
struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
- kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_DECREMENTER);
+ kvmppc_core_queue_dec(vcpu);
if (waitqueue_active(&vcpu->wq)) {
wake_up_interruptible(&vcpu->wq);
@@ -241,96 +232,20 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
kvmppc_core_destroy_mmu(vcpu);
}
-/* Note: clearing MSR[DE] just means that the debug interrupt will not be
- * delivered *immediately*. Instead, it simply sets the appropriate DBSR bits.
- * If those DBSR bits are still set when MSR[DE] is re-enabled, the interrupt
- * will be delivered as an "imprecise debug event" (which is indicated by
- * DBSR[IDE].
- */
-static void kvmppc_disable_debug_interrupts(void)
-{
- mtmsr(mfmsr() & ~MSR_DE);
-}
-
-static void kvmppc_restore_host_debug_state(struct kvm_vcpu *vcpu)
-{
- kvmppc_disable_debug_interrupts();
-
- mtspr(SPRN_IAC1, vcpu->arch.host_iac[0]);
- mtspr(SPRN_IAC2, vcpu->arch.host_iac[1]);
- mtspr(SPRN_IAC3, vcpu->arch.host_iac[2]);
- mtspr(SPRN_IAC4, vcpu->arch.host_iac[3]);
- mtspr(SPRN_DBCR1, vcpu->arch.host_dbcr1);
- mtspr(SPRN_DBCR2, vcpu->arch.host_dbcr2);
- mtspr(SPRN_DBCR0, vcpu->arch.host_dbcr0);
- mtmsr(vcpu->arch.host_msr);
-}
-
-static void kvmppc_load_guest_debug_registers(struct kvm_vcpu *vcpu)
-{
- struct kvm_guest_debug *dbg = &vcpu->guest_debug;
- u32 dbcr0 = 0;
-
- vcpu->arch.host_msr = mfmsr();
- kvmppc_disable_debug_interrupts();
-
- /* Save host debug register state. */
- vcpu->arch.host_iac[0] = mfspr(SPRN_IAC1);
- vcpu->arch.host_iac[1] = mfspr(SPRN_IAC2);
- vcpu->arch.host_iac[2] = mfspr(SPRN_IAC3);
- vcpu->arch.host_iac[3] = mfspr(SPRN_IAC4);
- vcpu->arch.host_dbcr0 = mfspr(SPRN_DBCR0);
- vcpu->arch.host_dbcr1 = mfspr(SPRN_DBCR1);
- vcpu->arch.host_dbcr2 = mfspr(SPRN_DBCR2);
-
- /* set registers up for guest */
-
- if (dbg->bp[0]) {
- mtspr(SPRN_IAC1, dbg->bp[0]);
- dbcr0 |= DBCR0_IAC1 | DBCR0_IDM;
- }
- if (dbg->bp[1]) {
- mtspr(SPRN_IAC2, dbg->bp[1]);
- dbcr0 |= DBCR0_IAC2 | DBCR0_IDM;
- }
- if (dbg->bp[2]) {
- mtspr(SPRN_IAC3, dbg->bp[2]);
- dbcr0 |= DBCR0_IAC3 | DBCR0_IDM;
- }
- if (dbg->bp[3]) {
- mtspr(SPRN_IAC4, dbg->bp[3]);
- dbcr0 |= DBCR0_IAC4 | DBCR0_IDM;
- }
-
- mtspr(SPRN_DBCR0, dbcr0);
- mtspr(SPRN_DBCR1, 0);
- mtspr(SPRN_DBCR2, 0);
-}
-
void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
- int i;
-
if (vcpu->guest_debug.enabled)
- kvmppc_load_guest_debug_registers(vcpu);
+ kvmppc_core_load_guest_debugstate(vcpu);
- /* Mark every guest entry in the shadow TLB entry modified, so that they
- * will all be reloaded on the next vcpu run (instead of being
- * demand-faulted). */
- for (i = 0; i <= tlb_44x_hwater; i++)
- kvmppc_tlbe_set_modified(vcpu, i);
+ kvmppc_core_vcpu_load(vcpu, cpu);
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
if (vcpu->guest_debug.enabled)
- kvmppc_restore_host_debug_state(vcpu);
+ kvmppc_core_load_host_debugstate(vcpu);
- /* Don't leave guest TLB entries resident when being de-scheduled. */
- /* XXX It would be nice to differentiate between heavyweight exit and
- * sched_out here, since we could avoid the TLB flush for heavyweight
- * exits. */
- _tlbia();
+ kvmppc_core_vcpu_put(vcpu);
}
int kvm_arch_vcpu_ioctl_debug_guest(struct kvm_vcpu *vcpu,
@@ -459,7 +374,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
vcpu->arch.dcr_needed = 0;
}
- kvmppc_check_and_deliver_interrupts(vcpu);
+ kvmppc_core_deliver_interrupts(vcpu);
local_irq_disable();
kvm_guest_enter();
@@ -477,7 +392,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq)
{
- kvmppc_queue_exception(vcpu, BOOKE_INTERRUPT_EXTERNAL);
+ kvmppc_core_queue_external(vcpu, irq);
if (waitqueue_active(&vcpu->wq)) {
wake_up_interruptible(&vcpu->wq);
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
This will make it easier to provide implementations for other cores.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 +-
arch/powerpc/kvm/44x_tlb.c | 138 +++++++++++++++++++++++++++++++++++-
arch/powerpc/kvm/booke_guest.c | 26 -------
arch/powerpc/kvm/emulate.c | 120 ++-----------------------------
4 files changed, 145 insertions(+), 143 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index bb62ad8..4adb4a3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -58,11 +58,11 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
extern int kvmppc_emulate_instruction(struct kvm_run *run,
struct kvm_vcpu *vcpu);
extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
+extern int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws);
+extern int kvmppc_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, u8 rc);
extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn,
u64 asid, u32 flags);
-extern void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, gva_t eaddr,
- gva_t eend, u32 asid);
extern void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode);
extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index ad72c6f..dd75ab8 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -32,6 +32,34 @@
static unsigned int kvmppc_tlb_44x_pos;
+#ifdef DEBUG
+void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
+{
+ struct kvmppc_44x_tlbe *tlbe;
+ int i;
+
+ printk("vcpu %d TLB dump:\n", vcpu->vcpu_id);
+ printk("| %2s | %3s | %8s | %8s | %8s |\n",
+ "nr", "tid", "word0", "word1", "word2");
+
+ for (i = 0; i < PPC44x_TLB_SIZE; i++) {
+ tlbe = &vcpu->arch.guest_tlb[i];
+ if (tlbe->word0 & PPC44x_TLB_VALID)
+ printk(" G%2d | %02X | %08X | %08X | %08X |\n",
+ i, tlbe->tid, tlbe->word0, tlbe->word1,
+ tlbe->word2);
+ }
+
+ for (i = 0; i < PPC44x_TLB_SIZE; i++) {
+ tlbe = &vcpu->arch.shadow_tlb[i];
+ if (tlbe->word0 & PPC44x_TLB_VALID)
+ printk(" S%2d | %02X | %08X | %08X | %08X |\n",
+ i, tlbe->tid, tlbe->word0, tlbe->word1,
+ tlbe->word2);
+ }
+}
+#endif
+
static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode)
{
/* Mask off reserved bits. */
@@ -191,8 +219,8 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
handler);
}
-void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, gva_t eaddr,
- gva_t eend, u32 asid)
+static void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, gva_t eaddr,
+ gva_t eend, u32 asid)
{
unsigned int pid = !(asid & 0xff);
int i;
@@ -249,3 +277,109 @@ void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
vcpu->arch.shadow_pid = !usermode;
}
+
+static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
+ const struct tlbe *tlbe)
+{
+ gpa_t gpa;
+
+ if (!get_tlb_v(tlbe))
+ return 0;
+
+ /* Does it match current guest AS? */
+ /* XXX what about IS != DS? */
+ if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS))
+ return 0;
+
+ gpa = get_tlb_raddr(tlbe);
+ if (!gfn_to_memslot(vcpu->kvm, gpa >> PAGE_SHIFT))
+ /* Mapping is not for RAM. */
+ return 0;
+
+ return 1;
+}
+
+int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
+{
+ u64 eaddr;
+ u64 raddr;
+ u64 asid;
+ u32 flags;
+ struct tlbe *tlbe;
+ unsigned int index;
+
+ index = vcpu->arch.gpr[ra];
+ if (index > PPC44x_TLB_SIZE) {
+ printk("%s: index %d\n", __func__, index);
+ kvmppc_dump_vcpu(vcpu);
+ return EMULATE_FAIL;
+ }
+
+ tlbe = &vcpu->arch.guest_tlb[index];
+
+ /* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
+ if (tlbe->word0 & PPC44x_TLB_VALID) {
+ eaddr = get_tlb_eaddr(tlbe);
+ asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid;
+ kvmppc_mmu_invalidate(vcpu, eaddr, get_tlb_end(tlbe), asid);
+ }
+
+ switch (ws) {
+ case PPC44x_TLB_PAGEID:
+ tlbe->tid = vcpu->arch.mmucr & 0xff;
+ tlbe->word0 = vcpu->arch.gpr[rs];
+ break;
+
+ case PPC44x_TLB_XLAT:
+ tlbe->word1 = vcpu->arch.gpr[rs];
+ break;
+
+ case PPC44x_TLB_ATTRIB:
+ tlbe->word2 = vcpu->arch.gpr[rs];
+ break;
+
+ default:
+ return EMULATE_FAIL;
+ }
+
+ if (tlbe_is_host_safe(vcpu, tlbe)) {
+ eaddr = get_tlb_eaddr(tlbe);
+ raddr = get_tlb_raddr(tlbe);
+ asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid;
+ flags = tlbe->word2 & 0xffff;
+
+ /* Create a 4KB mapping on the host. If the guest wanted a
+ * large page, only the first 4KB is mapped here and the rest
+ * are mapped on the fly. */
+ kvmppc_mmu_map(vcpu, eaddr, raddr >> PAGE_SHIFT, asid, flags);
+ }
+
+ KVMTRACE_5D(GTLB_WRITE, vcpu, index,
+ tlbe->tid, tlbe->word0, tlbe->word1, tlbe->word2,
+ handler);
+
+ return EMULATE_DONE;
+}
+
+int kvmppc_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, u8 rc)
+{
+ u32 ea;
+ int index;
+ unsigned int as = get_mmucr_sts(vcpu);
+ unsigned int pid = get_mmucr_stid(vcpu);
+
+ ea = vcpu->arch.gpr[rb];
+ if (ra)
+ ea += vcpu->arch.gpr[ra];
+
+ index = kvmppc_44x_tlb_index(vcpu, ea, pid, as);
+ if (rc) {
+ if (index < 0)
+ vcpu->arch.cr &= ~0x20000000;
+ else
+ vcpu->arch.cr |= 0x20000000;
+ }
+ vcpu->arch.gpr[rt] = index;
+
+ return EMULATE_DONE;
+}
diff --git a/arch/powerpc/kvm/booke_guest.c b/arch/powerpc/kvm/booke_guest.c
index 7b2591e..c0f8532 100644
--- a/arch/powerpc/kvm/booke_guest.c
+++ b/arch/powerpc/kvm/booke_guest.c
@@ -111,32 +111,6 @@ const unsigned char priority_exception[] = {
};
-void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
-{
- struct tlbe *tlbe;
- int i;
-
- printk("vcpu %d TLB dump:\n", vcpu->vcpu_id);
- printk("| %2s | %3s | %8s | %8s | %8s |\n",
- "nr", "tid", "word0", "word1", "word2");
-
- for (i = 0; i < PPC44x_TLB_SIZE; i++) {
- tlbe = &vcpu->arch.guest_tlb[i];
- if (tlbe->word0 & PPC44x_TLB_VALID)
- printk(" G%2d | %02X | %08X | %08X | %08X |\n",
- i, tlbe->tid, tlbe->word0, tlbe->word1,
- tlbe->word2);
- }
-
- for (i = 0; i < PPC44x_TLB_SIZE; i++) {
- tlbe = &vcpu->arch.shadow_tlb[i];
- if (tlbe->word0 & PPC44x_TLB_VALID)
- printk(" S%2d | %02X | %08X | %08X | %08X |\n",
- i, tlbe->tid, tlbe->word0, tlbe->word1,
- tlbe->word2);
- }
-}
-
/* TODO: use vcpu_printf() */
void kvmppc_dump_vcpu(struct kvm_vcpu *vcpu)
{
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 0fce4fb..0ce8ed5 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -29,8 +29,6 @@
#include <asm/byteorder.h>
#include <asm/kvm_ppc.h>
-#include "44x_tlb.h"
-
/* Instruction decoding */
static inline unsigned int get_op(u32 inst)
{
@@ -87,96 +85,6 @@ static inline unsigned int get_d(u32 inst)
return inst & 0xffff;
}
-static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
- const struct tlbe *tlbe)
-{
- gpa_t gpa;
-
- if (!get_tlb_v(tlbe))
- return 0;
-
- /* Does it match current guest AS? */
- /* XXX what about IS != DS? */
- if (get_tlb_ts(tlbe) != !!(vcpu->arch.msr & MSR_IS))
- return 0;
-
- gpa = get_tlb_raddr(tlbe);
- if (!gfn_to_memslot(vcpu->kvm, gpa >> PAGE_SHIFT))
- /* Mapping is not for RAM. */
- return 0;
-
- return 1;
-}
-
-static int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u32 inst)
-{
- u64 eaddr;
- u64 raddr;
- u64 asid;
- u32 flags;
- struct tlbe *tlbe;
- unsigned int ra;
- unsigned int rs;
- unsigned int ws;
- unsigned int index;
-
- ra = get_ra(inst);
- rs = get_rs(inst);
- ws = get_ws(inst);
-
- index = vcpu->arch.gpr[ra];
- if (index > PPC44x_TLB_SIZE) {
- printk("%s: index %d\n", __func__, index);
- kvmppc_dump_vcpu(vcpu);
- return EMULATE_FAIL;
- }
-
- tlbe = &vcpu->arch.guest_tlb[index];
-
- /* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
- if (tlbe->word0 & PPC44x_TLB_VALID) {
- eaddr = get_tlb_eaddr(tlbe);
- asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid;
- kvmppc_mmu_invalidate(vcpu, eaddr, get_tlb_end(tlbe), asid);
- }
-
- switch (ws) {
- case PPC44x_TLB_PAGEID:
- tlbe->tid = vcpu->arch.mmucr & 0xff;
- tlbe->word0 = vcpu->arch.gpr[rs];
- break;
-
- case PPC44x_TLB_XLAT:
- tlbe->word1 = vcpu->arch.gpr[rs];
- break;
-
- case PPC44x_TLB_ATTRIB:
- tlbe->word2 = vcpu->arch.gpr[rs];
- break;
-
- default:
- return EMULATE_FAIL;
- }
-
- if (tlbe_is_host_safe(vcpu, tlbe)) {
- eaddr = get_tlb_eaddr(tlbe);
- raddr = get_tlb_raddr(tlbe);
- asid = (tlbe->word0 & PPC44x_TLB_TS) | tlbe->tid;
- flags = tlbe->word2 & 0xffff;
-
- /* Create a 4KB mapping on the host. If the guest wanted a
- * large page, only the first 4KB is mapped here and the rest
- * are mapped on the fly. */
- kvmppc_mmu_map(vcpu, eaddr, raddr >> PAGE_SHIFT, asid, flags);
- }
-
- KVMTRACE_5D(GTLB_WRITE, vcpu, index,
- tlbe->tid, tlbe->word0, tlbe->word1, tlbe->word2,
- handler);
-
- return EMULATE_DONE;
-}
-
static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.tcr & TCR_DIE) {
@@ -222,6 +130,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
int rc;
int rs;
int rt;
+ int ws;
int sprn;
int dcrn;
enum emulation_result emulated = EMULATE_DONE;
@@ -630,33 +539,18 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
break;
case 978: /* tlbwe */
- emulated = kvmppc_emul_tlbwe(vcpu, inst);
+ ra = get_ra(inst);
+ rs = get_rs(inst);
+ ws = get_ws(inst);
+ emulated = kvmppc_emul_tlbwe(vcpu, ra, rs, ws);
break;
- case 914: { /* tlbsx */
- int index;
- unsigned int as = get_mmucr_sts(vcpu);
- unsigned int pid = get_mmucr_stid(vcpu);
-
+ case 914: /* tlbsx */
rt = get_rt(inst);
ra = get_ra(inst);
rb = get_rb(inst);
rc = get_rc(inst);
-
- ea = vcpu->arch.gpr[rb];
- if (ra)
- ea += vcpu->arch.gpr[ra];
-
- index = kvmppc_44x_tlb_index(vcpu, ea, pid, as);
- if (rc) {
- if (index < 0)
- vcpu->arch.cr &= ~0x20000000;
- else
- vcpu->arch.cr |= 0x20000000;
- }
- vcpu->arch.gpr[rt] = index;
-
- }
+ emulated = kvmppc_emul_tlbsx(vcpu, rt, ra, rb, rc);
break;
case 790: /* lhbrx */
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
This patch doesn't yet move all 44x-specific data into the new structure, but
is the first step down that path. In the future we may also want to create a
struct kvm_vcpu_booke.
Based on patch from Liu Yu <[email protected]>.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/kvm_44x.h | 47 ++++++++++++++++++++++++++
arch/powerpc/include/asm/kvm_host.h | 13 -------
arch/powerpc/include/asm/kvm_ppc.h | 6 +++
arch/powerpc/kernel/asm-offsets.c | 14 +++++---
arch/powerpc/kvm/44x.c | 62 +++++++++++++++++++++++++++++++++-
arch/powerpc/kvm/44x_tlb.c | 37 +++++++++++++-------
arch/powerpc/kvm/booke.c | 9 ++---
arch/powerpc/kvm/booke_interrupts.S | 6 ++--
arch/powerpc/kvm/powerpc.c | 23 +------------
9 files changed, 154 insertions(+), 63 deletions(-)
create mode 100644 arch/powerpc/include/asm/kvm_44x.h
diff --git a/arch/powerpc/include/asm/kvm_44x.h b/arch/powerpc/include/asm/kvm_44x.h
new file mode 100644
index 0000000..dece093
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_44x.h
@@ -0,0 +1,47 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[email protected]>
+ */
+
+#ifndef __ASM_44X_H__
+#define __ASM_44X_H__
+
+#include <linux/kvm_host.h>
+
+/* XXX Can't include mmu-44x.h because it redefines struct mm_context. */
+#define PPC44x_TLB_SIZE 64
+
+struct kvmppc_vcpu_44x {
+ /* Unmodified copy of the guest's TLB. */
+ struct kvmppc_44x_tlbe guest_tlb[PPC44x_TLB_SIZE];
+ /* TLB that's actually used when the guest is running. */
+ struct kvmppc_44x_tlbe shadow_tlb[PPC44x_TLB_SIZE];
+ /* Pages which are referenced in the shadow TLB. */
+ struct page *shadow_pages[PPC44x_TLB_SIZE];
+
+ /* Track which TLB entries we've modified in the current exit. */
+ u8 shadow_tlb_mod[PPC44x_TLB_SIZE];
+
+ struct kvm_vcpu vcpu;
+};
+
+static inline struct kvmppc_vcpu_44x *to_44x(struct kvm_vcpu *vcpu)
+{
+ return container_of(vcpu, struct kvmppc_vcpu_44x, vcpu);
+}
+
+#endif /* __ASM_44X_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f5850d7..765d8ec 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -74,20 +74,7 @@ struct kvmppc_44x_tlbe {
struct kvm_arch {
};
-/* XXX Can't include mmu-44x.h because it redefines struct mm_context. */
-#define PPC44x_TLB_SIZE 64
-
struct kvm_vcpu_arch {
- /* Unmodified copy of the guest's TLB. */
- struct kvmppc_44x_tlbe guest_tlb[PPC44x_TLB_SIZE];
- /* TLB that's actually used when the guest is running. */
- struct kvmppc_44x_tlbe shadow_tlb[PPC44x_TLB_SIZE];
- /* Pages which are referenced in the shadow TLB. */
- struct page *shadow_pages[PPC44x_TLB_SIZE];
-
- /* Track which TLB entries we've modified in the current exit. */
- u8 shadow_tlb_mod[PPC44x_TLB_SIZE];
-
u32 host_stack;
u32 host_pid;
u32 host_dbcr0;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index d593325..976ecc4 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -62,6 +62,9 @@ extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
/* Core-specific hooks */
+extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm,
+ unsigned int id);
+extern void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu);
extern int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu);
extern int kvmppc_core_check_processor_compat(void);
extern int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
@@ -85,6 +88,9 @@ extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
extern int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs);
extern int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt);
+extern int kvmppc_booke_init(void);
+extern void kvmppc_booke_exit(void);
+
extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
#endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 1d15420..b6367e8 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -23,9 +23,6 @@
#include <linux/mm.h>
#include <linux/suspend.h>
#include <linux/hrtimer.h>
-#ifdef CONFIG_KVM
-#include <linux/kvm_host.h>
-#endif
#ifdef CONFIG_PPC64
#include <linux/time.h>
#include <linux/hardirq.h>
@@ -51,6 +48,9 @@
#ifdef CONFIG_PPC_ISERIES
#include <asm/iseries/alpaca.h>
#endif
+#ifdef CONFIG_KVM
+#include <asm/kvm_44x.h>
+#endif
#if defined(CONFIG_BOOKE) || defined(CONFIG_40x)
#include "head_booke.h"
@@ -357,10 +357,14 @@ int main(void)
#ifdef CONFIG_KVM
DEFINE(TLBE_BYTES, sizeof(struct kvmppc_44x_tlbe));
+ DEFINE(VCPU_TO_44X, offsetof(struct kvmppc_vcpu_44x, vcpu));
+ DEFINE(VCPU44x_SHADOW_TLB,
+ offsetof(struct kvmppc_vcpu_44x, shadow_tlb));
+ DEFINE(VCPU44x_SHADOW_MOD,
+ offsetof(struct kvmppc_vcpu_44x, shadow_tlb_mod));
+
DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
- DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, arch.shadow_tlb));
- DEFINE(VCPU_SHADOW_MOD, offsetof(struct kvm_vcpu, arch.shadow_tlb_mod));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index f5d7028..22054b1 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -18,9 +18,13 @@
*/
#include <linux/kvm_host.h>
+#include <linux/err.h>
+
#include <asm/reg.h>
#include <asm/cputable.h>
#include <asm/tlbflush.h>
+#include <asm/kvm_44x.h>
+#include <asm/kvm_ppc.h>
#include "44x_tlb.h"
@@ -124,7 +128,8 @@ int kvmppc_core_check_processor_compat(void)
int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
{
- struct kvmppc_44x_tlbe *tlbe = &vcpu->arch.guest_tlb[0];
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
+ struct kvmppc_44x_tlbe *tlbe = &vcpu_44x->guest_tlb[0];
tlbe->tid = 0;
tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
@@ -150,6 +155,7 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
struct kvmppc_44x_tlbe *gtlbe;
int index;
gva_t eaddr;
@@ -166,7 +172,7 @@ int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
return 0;
}
- gtlbe = &vcpu->arch.guest_tlb[index];
+ gtlbe = &vcpu_44x->guest_tlb[index];
tr->physical_address = tlb_xlate(gtlbe, eaddr);
/* XXX what does "writeable" and "usermode" even mean? */
@@ -174,3 +180,55 @@ int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
return 0;
}
+
+struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
+{
+ struct kvmppc_vcpu_44x *vcpu_44x;
+ struct kvm_vcpu *vcpu;
+ int err;
+
+ vcpu_44x = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
+ if (!vcpu_44x) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ vcpu = &vcpu_44x->vcpu;
+ err = kvm_vcpu_init(vcpu, kvm, id);
+ if (err)
+ goto free_vcpu;
+
+ return vcpu;
+
+free_vcpu:
+ kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
+out:
+ return ERR_PTR(err);
+}
+
+void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
+{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
+
+ kvm_vcpu_uninit(vcpu);
+ kmem_cache_free(kvm_vcpu_cache, vcpu_44x);
+}
+
+static int kvmppc_44x_init(void)
+{
+ int r;
+
+ r = kvmppc_booke_init();
+ if (r)
+ return r;
+
+ return kvm_init(NULL, sizeof(struct kvmppc_vcpu_44x), THIS_MODULE);
+}
+
+static void kvmppc_44x_exit(void)
+{
+ kvmppc_booke_exit();
+}
+
+module_init(kvmppc_44x_init);
+module_exit(kvmppc_44x_exit);
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index bb6da13..8b65fbd 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -24,6 +24,7 @@
#include <linux/highmem.h>
#include <asm/mmu-44x.h>
#include <asm/kvm_ppc.h>
+#include <asm/kvm_44x.h>
#include "44x_tlb.h"
@@ -43,7 +44,7 @@ void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
"nr", "tid", "word0", "word1", "word2");
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
- tlbe = &vcpu->arch.guest_tlb[i];
+ tlbe = &vcpu_44x->guest_tlb[i];
if (tlbe->word0 & PPC44x_TLB_VALID)
printk(" G%2d | %02X | %08X | %08X | %08X |\n",
i, tlbe->tid, tlbe->word0, tlbe->word1,
@@ -51,7 +52,7 @@ void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
}
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
- tlbe = &vcpu->arch.shadow_tlb[i];
+ tlbe = &vcpu_44x->shadow_tlb[i];
if (tlbe->word0 & PPC44x_TLB_VALID)
printk(" S%2d | %02X | %08X | %08X | %08X |\n",
i, tlbe->tid, tlbe->word0, tlbe->word1,
@@ -82,11 +83,12 @@ static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode)
int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
unsigned int as)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
int i;
/* XXX Replace loop with fancy data structures. */
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
- struct kvmppc_44x_tlbe *tlbe = &vcpu->arch.guest_tlb[i];
+ struct kvmppc_44x_tlbe *tlbe = &vcpu_44x->guest_tlb[i];
unsigned int tid;
if (eaddr < get_tlb_eaddr(tlbe))
@@ -114,25 +116,27 @@ int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
struct kvmppc_44x_tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu,
gva_t eaddr)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
unsigned int as = !!(vcpu->arch.msr & MSR_IS);
unsigned int index;
index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
if (index == -1)
return NULL;
- return &vcpu->arch.guest_tlb[index];
+ return &vcpu_44x->guest_tlb[index];
}
struct kvmppc_44x_tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu,
gva_t eaddr)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
unsigned int as = !!(vcpu->arch.msr & MSR_DS);
unsigned int index;
index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
if (index == -1)
return NULL;
- return &vcpu->arch.guest_tlb[index];
+ return &vcpu_44x->guest_tlb[index];
}
static int kvmppc_44x_tlbe_is_writable(struct kvmppc_44x_tlbe *tlbe)
@@ -143,8 +147,9 @@ static int kvmppc_44x_tlbe_is_writable(struct kvmppc_44x_tlbe *tlbe)
static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu,
unsigned int index)
{
- struct kvmppc_44x_tlbe *stlbe = &vcpu->arch.shadow_tlb[index];
- struct page *page = vcpu->arch.shadow_pages[index];
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
+ struct kvmppc_44x_tlbe *stlbe = &vcpu_44x->shadow_tlb[index];
+ struct page *page = vcpu_44x->shadow_pages[index];
if (get_tlb_v(stlbe)) {
if (kvmppc_44x_tlbe_is_writable(stlbe))
@@ -164,7 +169,9 @@ void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu)
void kvmppc_tlbe_set_modified(struct kvm_vcpu *vcpu, unsigned int i)
{
- vcpu->arch.shadow_tlb_mod[i] = 1;
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
+
+ vcpu_44x->shadow_tlb_mod[i] = 1;
}
/* Caller must ensure that the specified guest TLB entry is safe to insert into
@@ -172,6 +179,7 @@ void kvmppc_tlbe_set_modified(struct kvm_vcpu *vcpu, unsigned int i)
void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
u32 flags)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
struct page *new_page;
struct kvmppc_44x_tlbe *stlbe;
hpa_t hpaddr;
@@ -182,7 +190,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
victim = kvmppc_tlb_44x_pos++;
if (kvmppc_tlb_44x_pos > tlb_44x_hwater)
kvmppc_tlb_44x_pos = 0;
- stlbe = &vcpu->arch.shadow_tlb[victim];
+ stlbe = &vcpu_44x->shadow_tlb[victim];
/* Get reference to new page. */
new_page = gfn_to_page(vcpu->kvm, gfn);
@@ -196,7 +204,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
/* Drop reference to old page. */
kvmppc_44x_shadow_release(vcpu, victim);
- vcpu->arch.shadow_pages[victim] = new_page;
+ vcpu_44x->shadow_pages[victim] = new_page;
/* XXX Make sure (va, size) doesn't overlap any other
* entries. 440x6 user manual says the result would be
@@ -224,12 +232,13 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
static void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, gva_t eaddr,
gva_t eend, u32 asid)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
unsigned int pid = !(asid & 0xff);
int i;
/* XXX Replace loop with fancy data structures. */
for (i = 0; i <= tlb_44x_hwater; i++) {
- struct kvmppc_44x_tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
+ struct kvmppc_44x_tlbe *stlbe = &vcpu_44x->shadow_tlb[i];
unsigned int tid;
if (!get_tlb_v(stlbe))
@@ -259,12 +268,13 @@ static void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, gva_t eaddr,
* switching address spaces. */
void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
int i;
if (vcpu->arch.swap_pid) {
/* XXX Replace loop with fancy data structures. */
for (i = 0; i <= tlb_44x_hwater; i++) {
- struct kvmppc_44x_tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
+ struct kvmppc_44x_tlbe *stlbe = &vcpu_44x->shadow_tlb[i];
/* Future optimization: clear only userspace mappings. */
kvmppc_44x_shadow_release(vcpu, i);
@@ -303,6 +313,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
{
+ struct kvmppc_vcpu_44x *vcpu_44x = to_44x(vcpu);
u64 eaddr;
u64 raddr;
u64 asid;
@@ -317,7 +328,7 @@ int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
return EMULATE_FAIL;
}
- tlbe = &vcpu->arch.guest_tlb[index];
+ tlbe = &vcpu_44x->guest_tlb[index];
/* Invalidate shadow mappings for the about-to-be-clobbered TLBE. */
if (tlbe->word0 & PPC44x_TLB_VALID) {
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index c619d1b..883e9db 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -573,7 +573,7 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return kvmppc_core_vcpu_translate(vcpu, tr);
}
-static int kvmppc_booke_init(void)
+int kvmppc_booke_init(void)
{
unsigned long ivor[16];
unsigned long max_ivor = 0;
@@ -618,14 +618,11 @@ static int kvmppc_booke_init(void)
flush_icache_range(kvmppc_booke_handlers,
kvmppc_booke_handlers + max_ivor + kvmppc_handler_len);
- return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
+ return 0;
}
-static void __exit kvmppc_booke_exit(void)
+void __exit kvmppc_booke_exit(void)
{
free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER);
kvm_exit();
}
-
-module_init(kvmppc_booke_init)
-module_exit(kvmppc_booke_exit)
diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index 95e165b..8d6929b 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -349,8 +349,8 @@ lightweight_exit:
lis r5, tlb_44x_hwater@ha
lwz r5, tlb_44x_hwater@l(r5)
mtctr r5
- addi r9, r4, VCPU_SHADOW_TLB
- addi r5, r4, VCPU_SHADOW_MOD
+ addi r9, r4, -VCPU_TO_44X + VCPU44x_SHADOW_TLB
+ addi r5, r4, -VCPU_TO_44X + VCPU44x_SHADOW_MOD
li r3, 0
1:
lbzx r7, r3, r5
@@ -377,7 +377,7 @@ lightweight_exit:
/* Clear bitmap of modified TLB entries */
li r5, PPC44x_TLB_SIZE>>2
mtctr r5
- addi r5, r4, VCPU_SHADOW_MOD - 4
+ addi r5, r4, -VCPU_TO_44X + VCPU44x_SHADOW_MOD - 4
li r6, 0
1:
stwu r6, 4(r5)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 800fe97..dd279ee 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -170,31 +170,12 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
{
- struct kvm_vcpu *vcpu;
- int err;
-
- vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
- if (!vcpu) {
- err = -ENOMEM;
- goto out;
- }
-
- err = kvm_vcpu_init(vcpu, kvm, id);
- if (err)
- goto free_vcpu;
-
- return vcpu;
-
-free_vcpu:
- kmem_cache_free(kvm_vcpu_cache, vcpu);
-out:
- return ERR_PTR(err);
+ return kvmppc_core_vcpu_create(kvm, id);
}
void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
{
- kvm_vcpu_uninit(vcpu);
- kmem_cache_free(kvm_vcpu_cache, vcpu);
+ kvmppc_core_vcpu_free(vcpu);
}
void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
This will ease ports to other cores.
Also remove unused "struct kvm_tlb" while we're at it.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/kvm_host.h | 6 +++---
arch/powerpc/include/asm/kvm_ppc.h | 5 -----
arch/powerpc/kernel/asm-offsets.c | 2 +-
arch/powerpc/kvm/44x_tlb.c | 22 ++++++++++++----------
arch/powerpc/kvm/44x_tlb.h | 24 +++++++++++++-----------
arch/powerpc/kvm/booke_guest.c | 8 ++++----
6 files changed, 33 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 34b52b7..df73351 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -64,7 +64,7 @@ struct kvm_vcpu_stat {
u32 halt_wakeup;
};
-struct tlbe {
+struct kvmppc_44x_tlbe {
u32 tid; /* Only the low 8 bits are used. */
u32 word0;
u32 word1;
@@ -76,9 +76,9 @@ struct kvm_arch {
struct kvm_vcpu_arch {
/* Unmodified copy of the guest's TLB. */
- struct tlbe guest_tlb[PPC44x_TLB_SIZE];
+ struct kvmppc_44x_tlbe guest_tlb[PPC44x_TLB_SIZE];
/* TLB that's actually used when the guest is running. */
- struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
+ struct kvmppc_44x_tlbe shadow_tlb[PPC44x_TLB_SIZE];
/* Pages which are referenced in the shadow TLB. */
struct page *shadow_pages[PPC44x_TLB_SIZE];
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4adb4a3..39daeaa 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -29,11 +29,6 @@
#include <linux/kvm_types.h>
#include <linux/kvm_host.h>
-struct kvm_tlb {
- struct tlbe guest_tlb[PPC44x_TLB_SIZE];
- struct tlbe shadow_tlb[PPC44x_TLB_SIZE];
-};
-
enum emulation_result {
EMULATE_DONE, /* no further processing */
EMULATE_DO_MMIO, /* kvm_run filled with MMIO request */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 75c5dd0..1d15420 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -355,7 +355,7 @@ int main(void)
DEFINE(PTE_SIZE, sizeof(pte_t));
#ifdef CONFIG_KVM
- DEFINE(TLBE_BYTES, sizeof(struct tlbe));
+ DEFINE(TLBE_BYTES, sizeof(struct kvmppc_44x_tlbe));
DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index dd75ab8..5152fe5 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -86,7 +86,7 @@ int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
/* XXX Replace loop with fancy data structures. */
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
- struct tlbe *tlbe = &vcpu->arch.guest_tlb[i];
+ struct kvmppc_44x_tlbe *tlbe = &vcpu->arch.guest_tlb[i];
unsigned int tid;
if (eaddr < get_tlb_eaddr(tlbe))
@@ -111,7 +111,8 @@ int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
return -1;
}
-struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
+struct kvmppc_44x_tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu,
+ gva_t eaddr)
{
unsigned int as = !!(vcpu->arch.msr & MSR_IS);
unsigned int index;
@@ -122,7 +123,8 @@ struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
return &vcpu->arch.guest_tlb[index];
}
-struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
+struct kvmppc_44x_tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu,
+ gva_t eaddr)
{
unsigned int as = !!(vcpu->arch.msr & MSR_DS);
unsigned int index;
@@ -133,7 +135,7 @@ struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
return &vcpu->arch.guest_tlb[index];
}
-static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe)
+static int kvmppc_44x_tlbe_is_writable(struct kvmppc_44x_tlbe *tlbe)
{
return tlbe->word2 & (PPC44x_TLB_SW|PPC44x_TLB_UW);
}
@@ -141,7 +143,7 @@ static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe)
static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu,
unsigned int index)
{
- struct tlbe *stlbe = &vcpu->arch.shadow_tlb[index];
+ struct kvmppc_44x_tlbe *stlbe = &vcpu->arch.shadow_tlb[index];
struct page *page = vcpu->arch.shadow_pages[index];
if (get_tlb_v(stlbe)) {
@@ -171,7 +173,7 @@ void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
u32 flags)
{
struct page *new_page;
- struct tlbe *stlbe;
+ struct kvmppc_44x_tlbe *stlbe;
hpa_t hpaddr;
unsigned int victim;
@@ -227,7 +229,7 @@ static void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, gva_t eaddr,
/* XXX Replace loop with fancy data structures. */
for (i = 0; i <= tlb_44x_hwater; i++) {
- struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
+ struct kvmppc_44x_tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
unsigned int tid;
if (!get_tlb_v(stlbe))
@@ -262,7 +264,7 @@ void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
if (vcpu->arch.swap_pid) {
/* XXX Replace loop with fancy data structures. */
for (i = 0; i <= tlb_44x_hwater; i++) {
- struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
+ struct kvmppc_44x_tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
/* Future optimization: clear only userspace mappings. */
kvmppc_44x_shadow_release(vcpu, i);
@@ -279,7 +281,7 @@ void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
}
static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
- const struct tlbe *tlbe)
+ const struct kvmppc_44x_tlbe *tlbe)
{
gpa_t gpa;
@@ -305,7 +307,7 @@ int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
u64 raddr;
u64 asid;
u32 flags;
- struct tlbe *tlbe;
+ struct kvmppc_44x_tlbe *tlbe;
unsigned int index;
index = vcpu->arch.gpr[ra];
diff --git a/arch/powerpc/kvm/44x_tlb.h b/arch/powerpc/kvm/44x_tlb.h
index 2ccd46b..e5b0a76 100644
--- a/arch/powerpc/kvm/44x_tlb.h
+++ b/arch/powerpc/kvm/44x_tlb.h
@@ -25,48 +25,50 @@
extern int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr,
unsigned int pid, unsigned int as);
-extern struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr);
-extern struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr);
+extern struct kvmppc_44x_tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu,
+ gva_t eaddr);
+extern struct kvmppc_44x_tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu,
+ gva_t eaddr);
/* TLB helper functions */
-static inline unsigned int get_tlb_size(const struct tlbe *tlbe)
+static inline unsigned int get_tlb_size(const struct kvmppc_44x_tlbe *tlbe)
{
return (tlbe->word0 >> 4) & 0xf;
}
-static inline gva_t get_tlb_eaddr(const struct tlbe *tlbe)
+static inline gva_t get_tlb_eaddr(const struct kvmppc_44x_tlbe *tlbe)
{
return tlbe->word0 & 0xfffffc00;
}
-static inline gva_t get_tlb_bytes(const struct tlbe *tlbe)
+static inline gva_t get_tlb_bytes(const struct kvmppc_44x_tlbe *tlbe)
{
unsigned int pgsize = get_tlb_size(tlbe);
return 1 << 10 << (pgsize << 1);
}
-static inline gva_t get_tlb_end(const struct tlbe *tlbe)
+static inline gva_t get_tlb_end(const struct kvmppc_44x_tlbe *tlbe)
{
return get_tlb_eaddr(tlbe) + get_tlb_bytes(tlbe) - 1;
}
-static inline u64 get_tlb_raddr(const struct tlbe *tlbe)
+static inline u64 get_tlb_raddr(const struct kvmppc_44x_tlbe *tlbe)
{
u64 word1 = tlbe->word1;
return ((word1 & 0xf) << 32) | (word1 & 0xfffffc00);
}
-static inline unsigned int get_tlb_tid(const struct tlbe *tlbe)
+static inline unsigned int get_tlb_tid(const struct kvmppc_44x_tlbe *tlbe)
{
return tlbe->tid & 0xff;
}
-static inline unsigned int get_tlb_ts(const struct tlbe *tlbe)
+static inline unsigned int get_tlb_ts(const struct kvmppc_44x_tlbe *tlbe)
{
return (tlbe->word0 >> 8) & 0x1;
}
-static inline unsigned int get_tlb_v(const struct tlbe *tlbe)
+static inline unsigned int get_tlb_v(const struct kvmppc_44x_tlbe *tlbe)
{
return (tlbe->word0 >> 9) & 0x1;
}
@@ -81,7 +83,7 @@ static inline unsigned int get_mmucr_sts(const struct kvm_vcpu *vcpu)
return (vcpu->arch.mmucr >> 16) & 0x1;
}
-static inline gpa_t tlb_xlate(struct tlbe *tlbe, gva_t eaddr)
+static inline gpa_t tlb_xlate(struct kvmppc_44x_tlbe *tlbe, gva_t eaddr)
{
unsigned int pgmask = get_tlb_bytes(tlbe) - 1;
diff --git a/arch/powerpc/kvm/booke_guest.c b/arch/powerpc/kvm/booke_guest.c
index c0f8532..41bbf4c 100644
--- a/arch/powerpc/kvm/booke_guest.c
+++ b/arch/powerpc/kvm/booke_guest.c
@@ -307,7 +307,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
break;
case BOOKE_INTERRUPT_DTLB_MISS: {
- struct tlbe *gtlbe;
+ struct kvmppc_44x_tlbe *gtlbe;
unsigned long eaddr = vcpu->arch.fault_dear;
gfn_t gfn;
@@ -347,7 +347,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
}
case BOOKE_INTERRUPT_ITLB_MISS: {
- struct tlbe *gtlbe;
+ struct kvmppc_44x_tlbe *gtlbe;
unsigned long eaddr = vcpu->arch.pc;
gfn_t gfn;
@@ -442,7 +442,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
- struct tlbe *tlbe = &vcpu->arch.guest_tlb[0];
+ struct kvmppc_44x_tlbe *tlbe = &vcpu->arch.guest_tlb[0];
tlbe->tid = 0;
tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
@@ -553,7 +553,7 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
- struct tlbe *gtlbe;
+ struct kvmppc_44x_tlbe *gtlbe;
int index;
gva_t eaddr;
u8 pid;
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
This is used in a couple places in KVM, but isn't KVM-specific.
However, this patch doesn't modify other in-kernel emulation code:
- xmon uses a direct copy of ppc_opc.c from binutils
- emulate_instruction() doesn't need it because it can use a series
of mask tests.
Signed-off-by: Hollis Blanchard <[email protected]>
Acked-by: Paul Mackerras <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/disassemble.h | 80 ++++++++++++++++++++++++++++++++
arch/powerpc/kvm/emulate.c | 57 +----------------------
2 files changed, 81 insertions(+), 56 deletions(-)
create mode 100644 arch/powerpc/include/asm/disassemble.h
diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
new file mode 100644
index 0000000..9b198d1
--- /dev/null
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -0,0 +1,80 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[email protected]>
+ */
+
+#ifndef __ASM_PPC_DISASSEMBLE_H__
+#define __ASM_PPC_DISASSEMBLE_H__
+
+#include <linux/types.h>
+
+static inline unsigned int get_op(u32 inst)
+{
+ return inst >> 26;
+}
+
+static inline unsigned int get_xop(u32 inst)
+{
+ return (inst >> 1) & 0x3ff;
+}
+
+static inline unsigned int get_sprn(u32 inst)
+{
+ return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+}
+
+static inline unsigned int get_dcrn(u32 inst)
+{
+ return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
+}
+
+static inline unsigned int get_rt(u32 inst)
+{
+ return (inst >> 21) & 0x1f;
+}
+
+static inline unsigned int get_rs(u32 inst)
+{
+ return (inst >> 21) & 0x1f;
+}
+
+static inline unsigned int get_ra(u32 inst)
+{
+ return (inst >> 16) & 0x1f;
+}
+
+static inline unsigned int get_rb(u32 inst)
+{
+ return (inst >> 11) & 0x1f;
+}
+
+static inline unsigned int get_rc(u32 inst)
+{
+ return inst & 0x1;
+}
+
+static inline unsigned int get_ws(u32 inst)
+{
+ return (inst >> 11) & 0x1f;
+}
+
+static inline unsigned int get_d(u32 inst)
+{
+ return inst & 0xffff;
+}
+
+#endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index c5d2bfc..5fd9cf7 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -28,62 +28,7 @@
#include <asm/time.h>
#include <asm/byteorder.h>
#include <asm/kvm_ppc.h>
-
-/* Instruction decoding */
-static inline unsigned int get_op(u32 inst)
-{
- return inst >> 26;
-}
-
-static inline unsigned int get_xop(u32 inst)
-{
- return (inst >> 1) & 0x3ff;
-}
-
-static inline unsigned int get_sprn(u32 inst)
-{
- return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
-}
-
-static inline unsigned int get_dcrn(u32 inst)
-{
- return ((inst >> 16) & 0x1f) | ((inst >> 6) & 0x3e0);
-}
-
-static inline unsigned int get_rt(u32 inst)
-{
- return (inst >> 21) & 0x1f;
-}
-
-static inline unsigned int get_rs(u32 inst)
-{
- return (inst >> 21) & 0x1f;
-}
-
-static inline unsigned int get_ra(u32 inst)
-{
- return (inst >> 16) & 0x1f;
-}
-
-static inline unsigned int get_rb(u32 inst)
-{
- return (inst >> 11) & 0x1f;
-}
-
-static inline unsigned int get_rc(u32 inst)
-{
- return inst & 0x1;
-}
-
-static inline unsigned int get_ws(u32 inst)
-{
- return (inst >> 11) & 0x1f;
-}
-
-static inline unsigned int get_d(u32 inst)
-{
- return inst & 0xffff;
-}
+#include <asm/disassemble.h>
static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
{
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
The division was somewhat artificial and cumbersome, and had no functional
benefit anyways: we can only guests built for the real host processor.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/kvm/Kconfig | 6 +-
arch/powerpc/kvm/Makefile | 6 +-
arch/powerpc/kvm/{booke_guest.c => booke.c} | 60 +++++++++++++++++++
arch/powerpc/kvm/booke_host.c | 83 ---------------------------
4 files changed, 66 insertions(+), 89 deletions(-)
rename arch/powerpc/kvm/{booke_guest.c => booke.c} (90%)
delete mode 100644 arch/powerpc/kvm/booke_host.c
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 53aaa66..ffed96f 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -20,7 +20,7 @@ config KVM
select PREEMPT_NOTIFIERS
select ANON_INODES
# We can only run on Book E hosts so far
- select KVM_BOOKE_HOST
+ select KVM_BOOKE
---help---
Support hosting virtualized guest machines. You will also
need to select one or more of the processor modules below.
@@ -30,8 +30,8 @@ config KVM
If unsure, say N.
-config KVM_BOOKE_HOST
- bool "KVM host support for Book E PowerPC processors"
+config KVM_BOOKE
+ bool "KVM support for Book E PowerPC processors"
depends on KVM && 44x
---help---
Provides host support for KVM on Book E PowerPC processors. Currently
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 2a5d439..a7f8574 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -8,10 +8,10 @@ common-objs-y = $(addprefix ../../../virt/kvm/, kvm_main.o coalesced_mmio.o)
common-objs-$(CONFIG_KVM_TRACE) += $(addprefix ../../../virt/kvm/, kvm_trace.o)
-kvm-objs := $(common-objs-y) powerpc.o emulate.o booke_guest.o
+kvm-objs := $(common-objs-y) powerpc.o emulate.o
obj-$(CONFIG_KVM) += kvm.o
AFLAGS_booke_interrupts.o := -I$(obj)
-kvm-booke-host-objs := booke_host.o booke_interrupts.o 44x_tlb.o
-obj-$(CONFIG_KVM_BOOKE_HOST) += kvm-booke-host.o
+kvm-booke-objs := booke.o booke_interrupts.o 44x_tlb.o
+obj-$(CONFIG_KVM_BOOKE) += kvm-booke.o
diff --git a/arch/powerpc/kvm/booke_guest.c b/arch/powerpc/kvm/booke.c
similarity index 90%
rename from arch/powerpc/kvm/booke_guest.c
rename to arch/powerpc/kvm/booke.c
index 41bbf4c..b1e90a1 100644
--- a/arch/powerpc/kvm/booke_guest.c
+++ b/arch/powerpc/kvm/booke.c
@@ -27,9 +27,12 @@
#include <asm/cputable.h>
#include <asm/uaccess.h>
#include <asm/kvm_ppc.h>
+#include <asm/cacheflush.h>
#include "44x_tlb.h"
+unsigned long kvmppc_booke_handlers;
+
#define VM_STAT(x) offsetof(struct kvm, stat.x), KVM_STAT_VM
#define VCPU_STAT(x) offsetof(struct kvm_vcpu, stat.x), KVM_STAT_VCPU
@@ -577,3 +580,60 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return 0;
}
+
+static int kvmppc_booke_init(void)
+{
+ unsigned long ivor[16];
+ unsigned long max_ivor = 0;
+ int i;
+
+ /* We install our own exception handlers by hijacking IVPR. IVPR must
+ * be 16-bit aligned, so we need a 64KB allocation. */
+ kvmppc_booke_handlers = __get_free_pages(GFP_KERNEL | __GFP_ZERO,
+ VCPU_SIZE_ORDER);
+ if (!kvmppc_booke_handlers)
+ return -ENOMEM;
+
+ /* XXX make sure our handlers are smaller than Linux's */
+
+ /* Copy our interrupt handlers to match host IVORs. That way we don't
+ * have to swap the IVORs on every guest/host transition. */
+ ivor[0] = mfspr(SPRN_IVOR0);
+ ivor[1] = mfspr(SPRN_IVOR1);
+ ivor[2] = mfspr(SPRN_IVOR2);
+ ivor[3] = mfspr(SPRN_IVOR3);
+ ivor[4] = mfspr(SPRN_IVOR4);
+ ivor[5] = mfspr(SPRN_IVOR5);
+ ivor[6] = mfspr(SPRN_IVOR6);
+ ivor[7] = mfspr(SPRN_IVOR7);
+ ivor[8] = mfspr(SPRN_IVOR8);
+ ivor[9] = mfspr(SPRN_IVOR9);
+ ivor[10] = mfspr(SPRN_IVOR10);
+ ivor[11] = mfspr(SPRN_IVOR11);
+ ivor[12] = mfspr(SPRN_IVOR12);
+ ivor[13] = mfspr(SPRN_IVOR13);
+ ivor[14] = mfspr(SPRN_IVOR14);
+ ivor[15] = mfspr(SPRN_IVOR15);
+
+ for (i = 0; i < 16; i++) {
+ if (ivor[i] > max_ivor)
+ max_ivor = ivor[i];
+
+ memcpy((void *)kvmppc_booke_handlers + ivor[i],
+ kvmppc_handlers_start + i * kvmppc_handler_len,
+ kvmppc_handler_len);
+ }
+ flush_icache_range(kvmppc_booke_handlers,
+ kvmppc_booke_handlers + max_ivor + kvmppc_handler_len);
+
+ return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
+}
+
+static void __exit kvmppc_booke_exit(void)
+{
+ free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER);
+ kvm_exit();
+}
+
+module_init(kvmppc_booke_init)
+module_exit(kvmppc_booke_exit)
diff --git a/arch/powerpc/kvm/booke_host.c b/arch/powerpc/kvm/booke_host.c
deleted file mode 100644
index b480341..0000000
--- a/arch/powerpc/kvm/booke_host.c
+++ /dev/null
@@ -1,83 +0,0 @@
-/*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License, version 2, as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
- *
- * Copyright IBM Corp. 2008
- *
- * Authors: Hollis Blanchard <[email protected]>
- */
-
-#include <linux/errno.h>
-#include <linux/kvm_host.h>
-#include <linux/module.h>
-#include <asm/cacheflush.h>
-#include <asm/kvm_ppc.h>
-
-unsigned long kvmppc_booke_handlers;
-
-static int kvmppc_booke_init(void)
-{
- unsigned long ivor[16];
- unsigned long max_ivor = 0;
- int i;
-
- /* We install our own exception handlers by hijacking IVPR. IVPR must
- * be 16-bit aligned, so we need a 64KB allocation. */
- kvmppc_booke_handlers = __get_free_pages(GFP_KERNEL | __GFP_ZERO,
- VCPU_SIZE_ORDER);
- if (!kvmppc_booke_handlers)
- return -ENOMEM;
-
- /* XXX make sure our handlers are smaller than Linux's */
-
- /* Copy our interrupt handlers to match host IVORs. That way we don't
- * have to swap the IVORs on every guest/host transition. */
- ivor[0] = mfspr(SPRN_IVOR0);
- ivor[1] = mfspr(SPRN_IVOR1);
- ivor[2] = mfspr(SPRN_IVOR2);
- ivor[3] = mfspr(SPRN_IVOR3);
- ivor[4] = mfspr(SPRN_IVOR4);
- ivor[5] = mfspr(SPRN_IVOR5);
- ivor[6] = mfspr(SPRN_IVOR6);
- ivor[7] = mfspr(SPRN_IVOR7);
- ivor[8] = mfspr(SPRN_IVOR8);
- ivor[9] = mfspr(SPRN_IVOR9);
- ivor[10] = mfspr(SPRN_IVOR10);
- ivor[11] = mfspr(SPRN_IVOR11);
- ivor[12] = mfspr(SPRN_IVOR12);
- ivor[13] = mfspr(SPRN_IVOR13);
- ivor[14] = mfspr(SPRN_IVOR14);
- ivor[15] = mfspr(SPRN_IVOR15);
-
- for (i = 0; i < 16; i++) {
- if (ivor[i] > max_ivor)
- max_ivor = ivor[i];
-
- memcpy((void *)kvmppc_booke_handlers + ivor[i],
- kvmppc_handlers_start + i * kvmppc_handler_len,
- kvmppc_handler_len);
- }
- flush_icache_range(kvmppc_booke_handlers,
- kvmppc_booke_handlers + max_ivor + kvmppc_handler_len);
-
- return kvm_init(NULL, sizeof(struct kvm_vcpu), THIS_MODULE);
-}
-
-static void __exit kvmppc_booke_exit(void)
-{
- free_pages(kvmppc_booke_handlers, VCPU_SIZE_ORDER);
- kvm_exit();
-}
-
-module_init(kvmppc_booke_init)
-module_exit(kvmppc_booke_exit)
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
Cores provide 3 emulation hooks, implemented for example in the new
4xx_emulate.c:
kvmppc_core_emulate_op
kvmppc_core_emulate_mtspr
kvmppc_core_emulate_mfspr
Strictly speaking the last two aren't necessary, but provide for more
informative error reporting ("unknown SPR").
Long term I'd like to have instruction decoding autogenerated from tables of
opcodes, and that way we could aggregate universal, Book E, and core-specific
instructions more easily and without redundant switch statements.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/kvm_ppc.h | 29 +---
arch/powerpc/kvm/44x_emulate.c | 335 ++++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/44x_tlb.c | 4 +-
arch/powerpc/kvm/44x_tlb.h | 4 +
arch/powerpc/kvm/Makefile | 7 +-
arch/powerpc/kvm/booke.c | 1 +
arch/powerpc/kvm/booke.h | 39 ++++
arch/powerpc/kvm/emulate.c | 272 +++---------------------------
8 files changed, 415 insertions(+), 276 deletions(-)
create mode 100644 arch/powerpc/kvm/44x_emulate.c
create mode 100644 arch/powerpc/kvm/booke.h
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 96d5de9..aecf95d 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -53,35 +53,13 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
extern int kvmppc_emulate_instruction(struct kvm_run *run,
struct kvm_vcpu *vcpu);
extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
-extern int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws);
-extern int kvmppc_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, u8 rc);
+extern void kvmppc_emulate_dec(struct kvm_vcpu *vcpu);
extern void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn,
u64 asid, u32 flags);
extern void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode);
extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
-/* Helper function for "full" MSR writes. No need to call this if only EE is
- * changing. */
-static inline void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
-{
- if ((new_msr & MSR_PR) != (vcpu->arch.msr & MSR_PR))
- kvmppc_mmu_priv_switch(vcpu, new_msr & MSR_PR);
-
- vcpu->arch.msr = new_msr;
-
- if (vcpu->arch.msr & MSR_WE)
- kvm_vcpu_block(vcpu);
-}
-
-static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 new_pid)
-{
- if (vcpu->arch.pid != new_pid) {
- vcpu->arch.pid = new_pid;
- vcpu->arch.swap_pid = 1;
- }
-}
-
/* Core-specific hooks */
extern int kvmppc_core_check_processor_compat(void);
@@ -99,6 +77,11 @@ extern void kvmppc_core_queue_dec(struct kvm_vcpu *vcpu);
extern void kvmppc_core_queue_external(struct kvm_vcpu *vcpu,
struct kvm_interrupt *irq);
+extern int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned int op, int *advance);
+extern int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs);
+extern int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt);
+
extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
#endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c
new file mode 100644
index 0000000..a634c0c
--- /dev/null
+++ b/arch/powerpc/kvm/44x_emulate.c
@@ -0,0 +1,335 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[email protected]>
+ */
+
+#include <asm/kvm_ppc.h>
+#include <asm/dcr.h>
+#include <asm/dcr-regs.h>
+#include <asm/disassemble.h>
+
+#include "booke.h"
+#include "44x_tlb.h"
+
+#define OP_RFI 19
+
+#define XOP_RFI 50
+#define XOP_MFMSR 83
+#define XOP_WRTEE 131
+#define XOP_MTMSR 146
+#define XOP_WRTEEI 163
+#define XOP_MFDCR 323
+#define XOP_MTDCR 451
+#define XOP_TLBSX 914
+#define XOP_ICCCI 966
+#define XOP_TLBWE 978
+
+static inline void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 new_pid)
+{
+ if (vcpu->arch.pid != new_pid) {
+ vcpu->arch.pid = new_pid;
+ vcpu->arch.swap_pid = 1;
+ }
+}
+
+static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
+{
+ vcpu->arch.pc = vcpu->arch.srr0;
+ kvmppc_set_msr(vcpu, vcpu->arch.srr1);
+}
+
+int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
+ unsigned int inst, int *advance)
+{
+ int emulated = EMULATE_DONE;
+ int dcrn;
+ int ra;
+ int rb;
+ int rc;
+ int rs;
+ int rt;
+ int ws;
+
+ switch (get_op(inst)) {
+
+ case OP_RFI:
+ switch (get_xop(inst)) {
+ case XOP_RFI:
+ kvmppc_emul_rfi(vcpu);
+ *advance = 0;
+ break;
+
+ default:
+ emulated = EMULATE_FAIL;
+ break;
+ }
+ break;
+
+ case 31:
+ switch (get_xop(inst)) {
+
+ case XOP_MFMSR:
+ rt = get_rt(inst);
+ vcpu->arch.gpr[rt] = vcpu->arch.msr;
+ break;
+
+ case XOP_MTMSR:
+ rs = get_rs(inst);
+ kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]);
+ break;
+
+ case XOP_WRTEE:
+ rs = get_rs(inst);
+ vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
+ | (vcpu->arch.gpr[rs] & MSR_EE);
+ break;
+
+ case XOP_WRTEEI:
+ vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
+ | (inst & MSR_EE);
+ break;
+
+ case XOP_MFDCR:
+ dcrn = get_dcrn(inst);
+ rt = get_rt(inst);
+
+ /* The guest may access CPR0 registers to determine the timebase
+ * frequency, and it must know the real host frequency because it
+ * can directly access the timebase registers.
+ *
+ * It would be possible to emulate those accesses in userspace,
+ * but userspace can really only figure out the end frequency.
+ * We could decompose that into the factors that compute it, but
+ * that's tricky math, and it's easier to just report the real
+ * CPR0 values.
+ */
+ switch (dcrn) {
+ case DCRN_CPR0_CONFIG_ADDR:
+ vcpu->arch.gpr[rt] = vcpu->arch.cpr0_cfgaddr;
+ break;
+ case DCRN_CPR0_CONFIG_DATA:
+ local_irq_disable();
+ mtdcr(DCRN_CPR0_CONFIG_ADDR,
+ vcpu->arch.cpr0_cfgaddr);
+ vcpu->arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA);
+ local_irq_enable();
+ break;
+ default:
+ run->dcr.dcrn = dcrn;
+ run->dcr.data = 0;
+ run->dcr.is_write = 0;
+ vcpu->arch.io_gpr = rt;
+ vcpu->arch.dcr_needed = 1;
+ emulated = EMULATE_DO_DCR;
+ }
+
+ break;
+
+ case XOP_MTDCR:
+ dcrn = get_dcrn(inst);
+ rs = get_rs(inst);
+
+ /* emulate some access in kernel */
+ switch (dcrn) {
+ case DCRN_CPR0_CONFIG_ADDR:
+ vcpu->arch.cpr0_cfgaddr = vcpu->arch.gpr[rs];
+ break;
+ default:
+ run->dcr.dcrn = dcrn;
+ run->dcr.data = vcpu->arch.gpr[rs];
+ run->dcr.is_write = 1;
+ vcpu->arch.dcr_needed = 1;
+ emulated = EMULATE_DO_DCR;
+ }
+
+ break;
+
+ case XOP_TLBWE:
+ ra = get_ra(inst);
+ rs = get_rs(inst);
+ ws = get_ws(inst);
+ emulated = kvmppc_44x_emul_tlbwe(vcpu, ra, rs, ws);
+ break;
+
+ case XOP_TLBSX:
+ rt = get_rt(inst);
+ ra = get_ra(inst);
+ rb = get_rb(inst);
+ rc = get_rc(inst);
+ emulated = kvmppc_44x_emul_tlbsx(vcpu, rt, ra, rb, rc);
+ break;
+
+ case XOP_ICCCI:
+ break;
+
+ default:
+ emulated = EMULATE_FAIL;
+ }
+
+ break;
+
+ default:
+ emulated = EMULATE_FAIL;
+ }
+
+ return emulated;
+}
+
+int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, int sprn, int rs)
+{
+ switch (sprn) {
+ case SPRN_MMUCR:
+ vcpu->arch.mmucr = vcpu->arch.gpr[rs]; break;
+ case SPRN_PID:
+ kvmppc_set_pid(vcpu, vcpu->arch.gpr[rs]); break;
+ case SPRN_CCR0:
+ vcpu->arch.ccr0 = vcpu->arch.gpr[rs]; break;
+ case SPRN_CCR1:
+ vcpu->arch.ccr1 = vcpu->arch.gpr[rs]; break;
+ case SPRN_DEAR:
+ vcpu->arch.dear = vcpu->arch.gpr[rs]; break;
+ case SPRN_ESR:
+ vcpu->arch.esr = vcpu->arch.gpr[rs]; break;
+ case SPRN_DBCR0:
+ vcpu->arch.dbcr0 = vcpu->arch.gpr[rs]; break;
+ case SPRN_DBCR1:
+ vcpu->arch.dbcr1 = vcpu->arch.gpr[rs]; break;
+ case SPRN_TSR:
+ vcpu->arch.tsr &= ~vcpu->arch.gpr[rs]; break;
+ case SPRN_TCR:
+ vcpu->arch.tcr = vcpu->arch.gpr[rs];
+ kvmppc_emulate_dec(vcpu);
+ break;
+
+ /* Note: SPRG4-7 are user-readable. These values are
+ * loaded into the real SPRGs when resuming the
+ * guest. */
+ case SPRN_SPRG4:
+ vcpu->arch.sprg4 = vcpu->arch.gpr[rs]; break;
+ case SPRN_SPRG5:
+ vcpu->arch.sprg5 = vcpu->arch.gpr[rs]; break;
+ case SPRN_SPRG6:
+ vcpu->arch.sprg6 = vcpu->arch.gpr[rs]; break;
+ case SPRN_SPRG7:
+ vcpu->arch.sprg7 = vcpu->arch.gpr[rs]; break;
+
+ case SPRN_IVPR:
+ vcpu->arch.ivpr = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR0:
+ vcpu->arch.ivor[0] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR1:
+ vcpu->arch.ivor[1] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR2:
+ vcpu->arch.ivor[2] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR3:
+ vcpu->arch.ivor[3] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR4:
+ vcpu->arch.ivor[4] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR5:
+ vcpu->arch.ivor[5] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR6:
+ vcpu->arch.ivor[6] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR7:
+ vcpu->arch.ivor[7] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR8:
+ vcpu->arch.ivor[8] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR9:
+ vcpu->arch.ivor[9] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR10:
+ vcpu->arch.ivor[10] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR11:
+ vcpu->arch.ivor[11] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR12:
+ vcpu->arch.ivor[12] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR13:
+ vcpu->arch.ivor[13] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR14:
+ vcpu->arch.ivor[14] = vcpu->arch.gpr[rs]; break;
+ case SPRN_IVOR15:
+ vcpu->arch.ivor[15] = vcpu->arch.gpr[rs]; break;
+
+ default:
+ return EMULATE_FAIL;
+ }
+
+ return EMULATE_DONE;
+}
+
+int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, int rt)
+{
+ switch (sprn) {
+ /* 440 */
+ case SPRN_MMUCR:
+ vcpu->arch.gpr[rt] = vcpu->arch.mmucr; break;
+ case SPRN_CCR0:
+ vcpu->arch.gpr[rt] = vcpu->arch.ccr0; break;
+ case SPRN_CCR1:
+ vcpu->arch.gpr[rt] = vcpu->arch.ccr1; break;
+
+ /* Book E */
+ case SPRN_PID:
+ vcpu->arch.gpr[rt] = vcpu->arch.pid; break;
+ case SPRN_IVPR:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivpr; break;
+ case SPRN_DEAR:
+ vcpu->arch.gpr[rt] = vcpu->arch.dear; break;
+ case SPRN_ESR:
+ vcpu->arch.gpr[rt] = vcpu->arch.esr; break;
+ case SPRN_DBCR0:
+ vcpu->arch.gpr[rt] = vcpu->arch.dbcr0; break;
+ case SPRN_DBCR1:
+ vcpu->arch.gpr[rt] = vcpu->arch.dbcr1; break;
+
+ case SPRN_IVOR0:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[0]; break;
+ case SPRN_IVOR1:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[1]; break;
+ case SPRN_IVOR2:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[2]; break;
+ case SPRN_IVOR3:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[3]; break;
+ case SPRN_IVOR4:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[4]; break;
+ case SPRN_IVOR5:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[5]; break;
+ case SPRN_IVOR6:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[6]; break;
+ case SPRN_IVOR7:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[7]; break;
+ case SPRN_IVOR8:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[8]; break;
+ case SPRN_IVOR9:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[9]; break;
+ case SPRN_IVOR10:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[10]; break;
+ case SPRN_IVOR11:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[11]; break;
+ case SPRN_IVOR12:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[12]; break;
+ case SPRN_IVOR13:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[13]; break;
+ case SPRN_IVOR14:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[14]; break;
+ case SPRN_IVOR15:
+ vcpu->arch.gpr[rt] = vcpu->arch.ivor[15]; break;
+ default:
+ return EMULATE_FAIL;
+ }
+
+ return EMULATE_DONE;
+}
+
diff --git a/arch/powerpc/kvm/44x_tlb.c b/arch/powerpc/kvm/44x_tlb.c
index 5152fe5..bb6da13 100644
--- a/arch/powerpc/kvm/44x_tlb.c
+++ b/arch/powerpc/kvm/44x_tlb.c
@@ -301,7 +301,7 @@ static int tlbe_is_host_safe(const struct kvm_vcpu *vcpu,
return 1;
}
-int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
+int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
{
u64 eaddr;
u64 raddr;
@@ -363,7 +363,7 @@ int kvmppc_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws)
return EMULATE_DONE;
}
-int kvmppc_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, u8 rc)
+int kvmppc_44x_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb, u8 rc)
{
u32 ea;
int index;
diff --git a/arch/powerpc/kvm/44x_tlb.h b/arch/powerpc/kvm/44x_tlb.h
index 357d79a..b1029af 100644
--- a/arch/powerpc/kvm/44x_tlb.h
+++ b/arch/powerpc/kvm/44x_tlb.h
@@ -31,6 +31,10 @@ extern struct kvmppc_44x_tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu,
gva_t eaddr);
extern void kvmppc_tlbe_set_modified(struct kvm_vcpu *vcpu, unsigned int i);
+extern int kvmppc_44x_emul_tlbsx(struct kvm_vcpu *vcpu, u8 rt, u8 ra, u8 rb,
+ u8 rc);
+extern int kvmppc_44x_emul_tlbwe(struct kvm_vcpu *vcpu, u8 ra, u8 rs, u8 ws);
+
/* TLB helper functions */
static inline unsigned int get_tlb_size(const struct kvmppc_44x_tlbe *tlbe)
{
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index f5e3375..f045fad 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -13,5 +13,10 @@ obj-$(CONFIG_KVM) += kvm.o
AFLAGS_booke_interrupts.o := -I$(obj)
-kvm-440-objs := booke.o booke_interrupts.o 44x.o 44x_tlb.o
+kvm-440-objs := \
+ booke.o \
+ booke_interrupts.o \
+ 44x.o \
+ 44x_tlb.o \
+ 44x_emulate.o
obj-$(CONFIG_KVM_440) += kvm-440.o
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 138014a..ea63009 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -29,6 +29,7 @@
#include <asm/kvm_ppc.h>
#include <asm/cacheflush.h>
+#include "booke.h"
#include "44x_tlb.h"
unsigned long kvmppc_booke_handlers;
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
new file mode 100644
index 0000000..f694a4b
--- /dev/null
+++ b/arch/powerpc/kvm/booke.h
@@ -0,0 +1,39 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ *
+ * Copyright IBM Corp. 2008
+ *
+ * Authors: Hollis Blanchard <[email protected]>
+ */
+
+#ifndef __KVM_BOOKE_H__
+#define __KVM_BOOKE_H__
+
+#include <linux/types.h>
+#include <linux/kvm_host.h>
+
+/* Helper function for "full" MSR writes. No need to call this if only EE is
+ * changing. */
+static inline void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
+{
+ if ((new_msr & MSR_PR) != (vcpu->arch.msr & MSR_PR))
+ kvmppc_mmu_priv_switch(vcpu, new_msr & MSR_PR);
+
+ vcpu->arch.msr = new_msr;
+
+ if (vcpu->arch.msr & MSR_WE)
+ kvm_vcpu_block(vcpu);
+}
+
+#endif /* __KVM_BOOKE_H__ */
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 5fd9cf7..30a49f8 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -23,14 +23,13 @@
#include <linux/string.h>
#include <linux/kvm_host.h>
-#include <asm/dcr.h>
-#include <asm/dcr-regs.h>
+#include <asm/reg.h>
#include <asm/time.h>
#include <asm/byteorder.h>
#include <asm/kvm_ppc.h>
#include <asm/disassemble.h>
-static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
+void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.tcr & TCR_DIE) {
/* The decrementer ticks at the same rate as the timebase, so
@@ -46,12 +45,6 @@ static void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
}
}
-static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
-{
- vcpu->arch.pc = vcpu->arch.srr0;
- kvmppc_set_msr(vcpu, vcpu->arch.srr1);
-}
-
/* XXX to do:
* lhax
* lhaux
@@ -66,18 +59,17 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
*
* XXX is_bigendian should depend on MMU mapping or MSR[LE]
*/
+/* XXX Should probably auto-generate instruction decoding for a particular core
+ * from opcode tables in the future. */
int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
u32 inst = vcpu->arch.last_inst;
u32 ea;
int ra;
int rb;
- int rc;
int rs;
int rt;
- int ws;
int sprn;
- int dcrn;
enum emulation_result emulated = EMULATE_DONE;
int advance = 1;
@@ -88,19 +80,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
advance = 0;
break;
- case 19:
- switch (get_xop(inst)) {
- case 50: /* rfi */
- kvmppc_emul_rfi(vcpu);
- advance = 0;
- break;
-
- default:
- emulated = EMULATE_FAIL;
- break;
- }
- break;
-
case 31:
switch (get_xop(inst)) {
@@ -109,27 +88,11 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
emulated = kvmppc_handle_load(run, vcpu, rt, 4, 1);
break;
- case 83: /* mfmsr */
- rt = get_rt(inst);
- vcpu->arch.gpr[rt] = vcpu->arch.msr;
- break;
-
case 87: /* lbzx */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 1, 1);
break;
- case 131: /* wrtee */
- rs = get_rs(inst);
- vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
- | (vcpu->arch.gpr[rs] & MSR_EE);
- break;
-
- case 146: /* mtmsr */
- rs = get_rs(inst);
- kvmppc_set_msr(vcpu, vcpu->arch.gpr[rs]);
- break;
-
case 151: /* stwx */
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu,
@@ -137,11 +100,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
4, 1);
break;
- case 163: /* wrteei */
- vcpu->arch.msr = (vcpu->arch.msr & ~MSR_EE)
- | (inst & MSR_EE);
- break;
-
case 215: /* stbx */
rs = get_rs(inst);
emulated = kvmppc_handle_store(run, vcpu,
@@ -182,42 +140,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
vcpu->arch.gpr[ra] = ea;
break;
- case 323: /* mfdcr */
- dcrn = get_dcrn(inst);
- rt = get_rt(inst);
-
- /* The guest may access CPR0 registers to determine the timebase
- * frequency, and it must know the real host frequency because it
- * can directly access the timebase registers.
- *
- * It would be possible to emulate those accesses in userspace,
- * but userspace can really only figure out the end frequency.
- * We could decompose that into the factors that compute it, but
- * that's tricky math, and it's easier to just report the real
- * CPR0 values.
- */
- switch (dcrn) {
- case DCRN_CPR0_CONFIG_ADDR:
- vcpu->arch.gpr[rt] = vcpu->arch.cpr0_cfgaddr;
- break;
- case DCRN_CPR0_CONFIG_DATA:
- local_irq_disable();
- mtdcr(DCRN_CPR0_CONFIG_ADDR,
- vcpu->arch.cpr0_cfgaddr);
- vcpu->arch.gpr[rt] = mfdcr(DCRN_CPR0_CONFIG_DATA);
- local_irq_enable();
- break;
- default:
- run->dcr.dcrn = dcrn;
- run->dcr.data = 0;
- run->dcr.is_write = 0;
- vcpu->arch.io_gpr = rt;
- vcpu->arch.dcr_needed = 1;
- emulated = EMULATE_DO_DCR;
- }
-
- break;
-
case 339: /* mfspr */
sprn = get_sprn(inst);
rt = get_rt(inst);
@@ -227,26 +149,8 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
vcpu->arch.gpr[rt] = vcpu->arch.srr0; break;
case SPRN_SRR1:
vcpu->arch.gpr[rt] = vcpu->arch.srr1; break;
- case SPRN_MMUCR:
- vcpu->arch.gpr[rt] = vcpu->arch.mmucr; break;
- case SPRN_PID:
- vcpu->arch.gpr[rt] = vcpu->arch.pid; break;
- case SPRN_IVPR:
- vcpu->arch.gpr[rt] = vcpu->arch.ivpr; break;
- case SPRN_CCR0:
- vcpu->arch.gpr[rt] = vcpu->arch.ccr0; break;
- case SPRN_CCR1:
- vcpu->arch.gpr[rt] = vcpu->arch.ccr1; break;
case SPRN_PVR:
vcpu->arch.gpr[rt] = vcpu->arch.pvr; break;
- case SPRN_DEAR:
- vcpu->arch.gpr[rt] = vcpu->arch.dear; break;
- case SPRN_ESR:
- vcpu->arch.gpr[rt] = vcpu->arch.esr; break;
- case SPRN_DBCR0:
- vcpu->arch.gpr[rt] = vcpu->arch.dbcr0; break;
- case SPRN_DBCR1:
- vcpu->arch.gpr[rt] = vcpu->arch.dbcr1; break;
/* Note: mftb and TBRL/TBWL are user-accessible, so
* the guest can always access the real TB anyways.
@@ -267,42 +171,12 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
/* Note: SPRG4-7 are user-readable, so we don't get
* a trap. */
- case SPRN_IVOR0:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[0]; break;
- case SPRN_IVOR1:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[1]; break;
- case SPRN_IVOR2:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[2]; break;
- case SPRN_IVOR3:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[3]; break;
- case SPRN_IVOR4:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[4]; break;
- case SPRN_IVOR5:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[5]; break;
- case SPRN_IVOR6:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[6]; break;
- case SPRN_IVOR7:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[7]; break;
- case SPRN_IVOR8:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[8]; break;
- case SPRN_IVOR9:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[9]; break;
- case SPRN_IVOR10:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[10]; break;
- case SPRN_IVOR11:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[11]; break;
- case SPRN_IVOR12:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[12]; break;
- case SPRN_IVOR13:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[13]; break;
- case SPRN_IVOR14:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[14]; break;
- case SPRN_IVOR15:
- vcpu->arch.gpr[rt] = vcpu->arch.ivor[15]; break;
-
default:
- printk("mfspr: unknown spr %x\n", sprn);
- vcpu->arch.gpr[rt] = 0;
+ emulated = kvmppc_core_emulate_mfspr(vcpu, sprn, rt);
+ if (emulated == EMULATE_FAIL) {
+ printk("mfspr: unknown spr %x\n", sprn);
+ vcpu->arch.gpr[rt] = 0;
+ }
break;
}
break;
@@ -332,25 +206,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
vcpu->arch.gpr[ra] = ea;
break;
- case 451: /* mtdcr */
- dcrn = get_dcrn(inst);
- rs = get_rs(inst);
-
- /* emulate some access in kernel */
- switch (dcrn) {
- case DCRN_CPR0_CONFIG_ADDR:
- vcpu->arch.cpr0_cfgaddr = vcpu->arch.gpr[rs];
- break;
- default:
- run->dcr.dcrn = dcrn;
- run->dcr.data = vcpu->arch.gpr[rs];
- run->dcr.is_write = 1;
- vcpu->arch.dcr_needed = 1;
- emulated = EMULATE_DO_DCR;
- }
-
- break;
-
case 467: /* mtspr */
sprn = get_sprn(inst);
rs = get_rs(inst);
@@ -359,22 +214,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
vcpu->arch.srr0 = vcpu->arch.gpr[rs]; break;
case SPRN_SRR1:
vcpu->arch.srr1 = vcpu->arch.gpr[rs]; break;
- case SPRN_MMUCR:
- vcpu->arch.mmucr = vcpu->arch.gpr[rs]; break;
- case SPRN_PID:
- kvmppc_set_pid(vcpu, vcpu->arch.gpr[rs]); break;
- case SPRN_CCR0:
- vcpu->arch.ccr0 = vcpu->arch.gpr[rs]; break;
- case SPRN_CCR1:
- vcpu->arch.ccr1 = vcpu->arch.gpr[rs]; break;
- case SPRN_DEAR:
- vcpu->arch.dear = vcpu->arch.gpr[rs]; break;
- case SPRN_ESR:
- vcpu->arch.esr = vcpu->arch.gpr[rs]; break;
- case SPRN_DBCR0:
- vcpu->arch.dbcr0 = vcpu->arch.gpr[rs]; break;
- case SPRN_DBCR1:
- vcpu->arch.dbcr1 = vcpu->arch.gpr[rs]; break;
/* XXX We need to context-switch the timebase for
* watchdog and FIT. */
@@ -386,14 +225,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
kvmppc_emulate_dec(vcpu);
break;
- case SPRN_TSR:
- vcpu->arch.tsr &= ~vcpu->arch.gpr[rs]; break;
-
- case SPRN_TCR:
- vcpu->arch.tcr = vcpu->arch.gpr[rs];
- kvmppc_emulate_dec(vcpu);
- break;
-
case SPRN_SPRG0:
vcpu->arch.sprg0 = vcpu->arch.gpr[rs]; break;
case SPRN_SPRG1:
@@ -403,56 +234,10 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
case SPRN_SPRG3:
vcpu->arch.sprg3 = vcpu->arch.gpr[rs]; break;
- /* Note: SPRG4-7 are user-readable. These values are
- * loaded into the real SPRGs when resuming the
- * guest. */
- case SPRN_SPRG4:
- vcpu->arch.sprg4 = vcpu->arch.gpr[rs]; break;
- case SPRN_SPRG5:
- vcpu->arch.sprg5 = vcpu->arch.gpr[rs]; break;
- case SPRN_SPRG6:
- vcpu->arch.sprg6 = vcpu->arch.gpr[rs]; break;
- case SPRN_SPRG7:
- vcpu->arch.sprg7 = vcpu->arch.gpr[rs]; break;
-
- case SPRN_IVPR:
- vcpu->arch.ivpr = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR0:
- vcpu->arch.ivor[0] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR1:
- vcpu->arch.ivor[1] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR2:
- vcpu->arch.ivor[2] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR3:
- vcpu->arch.ivor[3] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR4:
- vcpu->arch.ivor[4] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR5:
- vcpu->arch.ivor[5] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR6:
- vcpu->arch.ivor[6] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR7:
- vcpu->arch.ivor[7] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR8:
- vcpu->arch.ivor[8] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR9:
- vcpu->arch.ivor[9] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR10:
- vcpu->arch.ivor[10] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR11:
- vcpu->arch.ivor[11] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR12:
- vcpu->arch.ivor[12] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR13:
- vcpu->arch.ivor[13] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR14:
- vcpu->arch.ivor[14] = vcpu->arch.gpr[rs]; break;
- case SPRN_IVOR15:
- vcpu->arch.ivor[15] = vcpu->arch.gpr[rs]; break;
-
default:
- printk("mtspr: unknown spr %x\n", sprn);
- emulated = EMULATE_FAIL;
+ emulated = kvmppc_core_emulate_mtspr(vcpu, sprn, rs);
+ if (emulated == EMULATE_FAIL)
+ printk("mtspr: unknown spr %x\n", sprn);
break;
}
break;
@@ -483,21 +268,6 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
4, 0);
break;
- case 978: /* tlbwe */
- ra = get_ra(inst);
- rs = get_rs(inst);
- ws = get_ws(inst);
- emulated = kvmppc_emul_tlbwe(vcpu, ra, rs, ws);
- break;
-
- case 914: /* tlbsx */
- rt = get_rt(inst);
- ra = get_ra(inst);
- rb = get_rb(inst);
- rc = get_rc(inst);
- emulated = kvmppc_emul_tlbsx(vcpu, rt, ra, rb, rc);
- break;
-
case 790: /* lhbrx */
rt = get_rt(inst);
emulated = kvmppc_handle_load(run, vcpu, rt, 2, 0);
@@ -513,14 +283,9 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
2, 0);
break;
- case 966: /* iccci */
- break;
-
default:
- printk("unknown: op %d xop %d\n", get_op(inst),
- get_xop(inst));
+ /* Attempt core-specific emulation below. */
emulated = EMULATE_FAIL;
- break;
}
break;
@@ -603,9 +368,16 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
break;
default:
- printk("unknown op %d\n", get_op(inst));
emulated = EMULATE_FAIL;
- break;
+ }
+
+ if (emulated == EMULATE_FAIL) {
+ emulated = kvmppc_core_emulate_op(run, vcpu, inst, &advance);
+ if (emulated == EMULATE_FAIL) {
+ advance = 0;
+ printk(KERN_ERR "Couldn't emulate instruction 0x%08x "
+ "(op %d xop %d)\n", inst, get_op(inst), get_xop(inst));
+ }
}
KVMTRACE_3D(PPC_INSTR, vcpu, inst, vcpu->arch.pc, emulated, entryexit);
--
1.6.0.3
From: Hollis Blanchard <[email protected]>
Needed to port to other Book E processors.
Signed-off-by: Hollis Blanchard <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/powerpc/include/asm/kvm_ppc.h | 3 ++
arch/powerpc/kvm/44x.c | 53 ++++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/booke.c | 46 +-----------------------------
3 files changed, 58 insertions(+), 44 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index aecf95d..d593325 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -62,7 +62,10 @@ extern void kvmppc_mmu_switch_pid(struct kvm_vcpu *vcpu, u32 pid);
/* Core-specific hooks */
+extern int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu);
extern int kvmppc_core_check_processor_compat(void);
+extern int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
+ struct kvm_translation *tr);
extern void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
extern void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index fcf8c7d..f5d7028 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -121,3 +121,56 @@ int kvmppc_core_check_processor_compat(void)
return r;
}
+
+int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
+{
+ struct kvmppc_44x_tlbe *tlbe = &vcpu->arch.guest_tlb[0];
+
+ tlbe->tid = 0;
+ tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
+ tlbe->word1 = 0;
+ tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR;
+
+ tlbe++;
+ tlbe->tid = 0;
+ tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID;
+ tlbe->word1 = 0xef600000;
+ tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR
+ | PPC44x_TLB_I | PPC44x_TLB_G;
+
+ /* Since the guest can directly access the timebase, it must know the
+ * real timebase frequency. Accordingly, it must see the state of
+ * CCR1[TCS]. */
+ vcpu->arch.ccr1 = mfspr(SPRN_CCR1);
+
+ return 0;
+}
+
+/* 'linear_address' is actually an encoding of AS|PID|EADDR . */
+int kvmppc_core_vcpu_translate(struct kvm_vcpu *vcpu,
+ struct kvm_translation *tr)
+{
+ struct kvmppc_44x_tlbe *gtlbe;
+ int index;
+ gva_t eaddr;
+ u8 pid;
+ u8 as;
+
+ eaddr = tr->linear_address;
+ pid = (tr->linear_address >> 32) & 0xff;
+ as = (tr->linear_address >> 40) & 0x1;
+
+ index = kvmppc_44x_tlb_index(vcpu, eaddr, pid, as);
+ if (index == -1) {
+ tr->valid = 0;
+ return 0;
+ }
+
+ gtlbe = &vcpu->arch.guest_tlb[index];
+
+ tr->physical_address = tlb_xlate(gtlbe, eaddr);
+ /* XXX what does "writeable" and "usermode" even mean? */
+ tr->valid = 1;
+
+ return 0;
+}
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ea63009..c619d1b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -479,20 +479,6 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* Initial guest state: 16MB mapping 0 -> 0, PC = 0, MSR = 0, R1 = 16MB */
int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
{
- struct kvmppc_44x_tlbe *tlbe = &vcpu->arch.guest_tlb[0];
-
- tlbe->tid = 0;
- tlbe->word0 = PPC44x_TLB_16M | PPC44x_TLB_VALID;
- tlbe->word1 = 0;
- tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR;
-
- tlbe++;
- tlbe->tid = 0;
- tlbe->word0 = 0xef600000 | PPC44x_TLB_4K | PPC44x_TLB_VALID;
- tlbe->word1 = 0xef600000;
- tlbe->word2 = PPC44x_TLB_SX | PPC44x_TLB_SW | PPC44x_TLB_SR
- | PPC44x_TLB_I | PPC44x_TLB_G;
-
vcpu->arch.pc = 0;
vcpu->arch.msr = 0;
vcpu->arch.gpr[1] = (16<<20) - 8; /* -8 for the callee-save LR slot */
@@ -503,12 +489,7 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
* before it's programmed its own IVPR. */
vcpu->arch.ivpr = 0x55550000;
- /* Since the guest can directly access the timebase, it must know the
- * real timebase frequency. Accordingly, it must see the state of
- * CCR1[TCS]. */
- vcpu->arch.ccr1 = mfspr(SPRN_CCR1);
-
- return 0;
+ return kvmppc_core_vcpu_setup(vcpu);
}
int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
@@ -586,33 +567,10 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
return -ENOTSUPP;
}
-/* 'linear_address' is actually an encoding of AS|PID|EADDR . */
int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
struct kvm_translation *tr)
{
- struct kvmppc_44x_tlbe *gtlbe;
- int index;
- gva_t eaddr;
- u8 pid;
- u8 as;
-
- eaddr = tr->linear_address;
- pid = (tr->linear_address >> 32) & 0xff;
- as = (tr->linear_address >> 40) & 0x1;
-
- index = kvmppc_44x_tlb_index(vcpu, eaddr, pid, as);
- if (index == -1) {
- tr->valid = 0;
- return 0;
- }
-
- gtlbe = &vcpu->arch.guest_tlb[index];
-
- tr->physical_address = tlb_xlate(gtlbe, eaddr);
- /* XXX what does "writeable" and "usermode" even mean? */
- tr->valid = 1;
-
- return 0;
+ return kvmppc_core_vcpu_translate(vcpu, tr);
}
static int kvmppc_booke_init(void)
--
1.6.0.3
From: Guillaume Thouvenin <[email protected]>
This patch consolidate the emulation of push reg instruction.
Signed-off-by: Guillaume Thouvenin <[email protected]>
Signed-off-by: Laurent Vivier <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86_emulate.c | 8 +-------
1 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index ea05117..a391e21 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1415,13 +1415,7 @@ special_insn:
emulate_1op("dec", c->dst, ctxt->eflags);
break;
case 0x50 ... 0x57: /* push reg */
- c->dst.type = OP_MEM;
- c->dst.bytes = c->op_bytes;
- c->dst.val = c->src.val;
- register_address_increment(c, &c->regs[VCPU_REGS_RSP],
- -c->op_bytes);
- c->dst.ptr = (void *) register_address(
- c, ss_base(ctxt), c->regs[VCPU_REGS_RSP]);
+ emulate_push(ctxt);
break;
case 0x58 ... 0x5f: /* pop reg */
pop_instruction:
--
1.6.0.3
From: Guillaume Thouvenin <[email protected]>
If emulate_invalid_guest_state is enabled, the emulator is called
when guest state is invalid. Until now, we reported an mmio failure
when emulate_instruction() returned EMULATE_DO_MMIO. This patch adds
the case where emulate_instruction() failed and an MMIO emulation
is needed.
Signed-off-by: Guillaume Thouvenin <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 27 +++++++++++++++------------
1 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 816d231..427dbc1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3052,16 +3052,12 @@ static void handle_invalid_guest_state(struct kvm_vcpu *vcpu,
while (!guest_state_valid(vcpu)) {
err = emulate_instruction(vcpu, kvm_run, 0, 0, 0);
- switch (err) {
- case EMULATE_DONE:
- break;
- case EMULATE_DO_MMIO:
- kvm_report_emulation_failure(vcpu, "mmio");
- /* TODO: Handle MMIO */
- return;
- default:
- kvm_report_emulation_failure(vcpu, "emulation failure");
- return;
+ if (err == EMULATE_DO_MMIO)
+ break;
+
+ if (err != EMULATE_DONE) {
+ kvm_report_emulation_failure(vcpu, "emulation failure");
+ return;
}
if (signal_pending(current))
@@ -3073,8 +3069,10 @@ static void handle_invalid_guest_state(struct kvm_vcpu *vcpu,
local_irq_disable();
preempt_disable();
- /* Guest state should be valid now, no more emulation should be needed */
- vmx->emulation_required = 0;
+ /* Guest state should be valid now except if we need to
+ * emulate an MMIO */
+ if (guest_state_valid(vcpu))
+ vmx->emulation_required = 0;
}
/*
@@ -3121,6 +3119,11 @@ static int kvm_handle_exit(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
KVMTRACE_3D(VMEXIT, vcpu, exit_reason, (u32)kvm_rip_read(vcpu),
(u32)((u64)kvm_rip_read(vcpu) >> 32), entryexit);
+ /* If we need to emulate an MMIO from handle_invalid_guest_state
+ * we just return 0 */
+ if (vmx->emulation_required && emulate_invalid_guest_state)
+ return 0;
+
/* Access CR3 don't cause VMExit in paging mode, so we need
* to sync with guest real CR3. */
if (vm_need_ept() && is_paging(vcpu)) {
--
1.6.0.3
From: Guillaume Thouvenin <[email protected]>
Add decode entries for 0x04 and 0x05 (ADD) opcodes, execution is already
implemented.
Signed-off-by: Guillaume Thouvenin <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86_emulate.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index a391e21..57d7cc4 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -80,7 +80,7 @@ static u16 opcode_table[256] = {
/* 0x00 - 0x07 */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
- 0, 0, 0, 0,
+ ByteOp | DstAcc | SrcImm, DstAcc | SrcImm, 0, 0,
/* 0x08 - 0x0F */
ByteOp | DstMem | SrcReg | ModRM, DstMem | SrcReg | ModRM,
ByteOp | DstReg | SrcMem | ModRM, DstReg | SrcMem | ModRM,
--
1.6.0.3
From: Guillaume Thouvenin <[email protected]>
If we call the emulator we shouldn't call skip_emulated_instruction()
in the first place, since the emulator already computes the next rip
for us. Thus we move ->skip_emulated_instruction() out of
kvm_emulate_pio() and into handle_io() (and the svm equivalent). We
also replaced "return 0" by "break" in the "do_io:" case because now
the shadow register state needs to be committed. Otherwise eip will never
be updated.
Signed-off-by: Guillaume Thouvenin <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx.c | 1 +
arch/x86/kvm/x86.c | 2 --
arch/x86/kvm/x86_emulate.c | 2 +-
4 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 743aebd..f0ad4d4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1115,6 +1115,7 @@ static int io_interception(struct vcpu_svm *svm, struct kvm_run *kvm_run)
rep = (io_info & SVM_IOIO_REP_MASK) != 0;
down = (svm->vmcb->save.rflags & X86_EFLAGS_DF) != 0;
+ skip_emulated_instruction(&svm->vcpu);
return kvm_emulate_pio(&svm->vcpu, kvm_run, in, size, port);
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7623eb7..816d231 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2687,6 +2687,7 @@ static int handle_io(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
rep = (exit_qualification & 32) != 0;
port = exit_qualification >> 16;
+ skip_emulated_instruction(vcpu);
return kvm_emulate_pio(vcpu, kvm_run, in, size, port);
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ceeac88..38f79b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2478,8 +2478,6 @@ int kvm_emulate_pio(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
val = kvm_register_read(vcpu, VCPU_REGS_RAX);
memcpy(vcpu->arch.pio_data, &val, 4);
- kvm_x86_ops->skip_emulated_instruction(vcpu);
-
pio_dev = vcpu_find_pio_dev(vcpu, port, size, !in);
if (pio_dev) {
kernel_pio(pio_dev, vcpu, vcpu->arch.pio_data);
diff --git a/arch/x86/kvm/x86_emulate.c b/arch/x86/kvm/x86_emulate.c
index 57d7cc4..8f60ace 100644
--- a/arch/x86/kvm/x86_emulate.c
+++ b/arch/x86/kvm/x86_emulate.c
@@ -1772,7 +1772,7 @@ special_insn:
c->eip = saved_eip;
goto cannot_emulate;
}
- return 0;
+ break;
case 0xf4: /* hlt */
ctxt->vcpu->arch.halt_request = 1;
break;
--
1.6.0.3
From: Jan Kiszka <[email protected]>
This patch refactors the NMI watchdog delivery patch, consolidating
tests and providing a proper API for delivering watchdog events.
An included micro-optimization is to check only for apic_hw_enabled in
kvm_apic_local_deliver (the test for LVT mask is covering the
soft-disabled case already).
Signed-off-by: Jan Kiszka <[email protected]>
Acked-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/i8254.c | 15 +++++++++------
arch/x86/kvm/irq.h | 2 +-
arch/x86/kvm/lapic.c | 20 ++++++++++----------
3 files changed, 20 insertions(+), 17 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 580cc1d..b6fcf5a 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -612,15 +612,18 @@ static void __inject_pit_timer_intr(struct kvm *kvm)
mutex_unlock(&kvm->lock);
/*
- * Provides NMI watchdog support in IOAPIC mode.
- * The route is: PIT -> PIC -> LVT0 in NMI mode,
- * timer IRQs will continue to flow through the IOAPIC.
+ * Provides NMI watchdog support via Virtual Wire mode.
+ * The route is: PIT -> PIC -> LVT0 in NMI mode.
+ *
+ * Note: Our Virtual Wire implementation is simplified, only
+ * propagating PIT interrupts to all VCPUs when they have set
+ * LVT0 to NMI delivery. Other PIC interrupts are just sent to
+ * VCPU0, and only if its LVT0 is in EXTINT mode.
*/
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
vcpu = kvm->vcpus[i];
- if (!vcpu)
- continue;
- kvm_apic_local_deliver(vcpu, APIC_LVT0);
+ if (vcpu)
+ kvm_apic_nmi_wd_deliver(vcpu);
}
}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 71e37a5..b9e9051 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -87,7 +87,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
-int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type);
+void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu);
void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 304f9dd..0b0d413 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -973,14 +973,12 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu)
return 0;
}
-int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type)
+static int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
{
- struct kvm_lapic *apic = vcpu->arch.apic;
+ u32 reg = apic_get_reg(apic, lvt_type);
int vector, mode, trig_mode;
- u32 reg;
- if (apic && apic_enabled(apic)) {
- reg = apic_get_reg(apic, lvt_type);
+ if (apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
vector = reg & APIC_VECTOR_MASK;
mode = reg & APIC_MODE_MASK;
trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
@@ -989,9 +987,12 @@ int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type)
return 0;
}
-static inline int __inject_apic_timer_irq(struct kvm_lapic *apic)
+void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
{
- return kvm_apic_local_deliver(apic->vcpu, APIC_LVTT);
+ struct kvm_lapic *apic = vcpu->arch.apic;
+
+ if (apic)
+ kvm_apic_local_deliver(apic, APIC_LVT0);
}
static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
@@ -1086,9 +1087,8 @@ void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu)
{
struct kvm_lapic *apic = vcpu->arch.apic;
- if (apic && apic_lvt_enabled(apic, APIC_LVTT) &&
- atomic_read(&apic->timer.pending) > 0) {
- if (__inject_apic_timer_irq(apic))
+ if (apic && atomic_read(&apic->timer.pending) > 0) {
+ if (kvm_apic_local_deliver(apic, APIC_LVTT))
atomic_dec(&apic->timer.pending);
}
}
--
1.6.0.3
From: Izik Eidus <[email protected]>
Some areas of kvm x86 mmu are using gfn offset inside a slot without
unaliasing the gfn first. This patch makes sure that the gfn will be
unaliased and add gfn_to_memslot_unaliased() to save the calculating
of the gfn unaliasing in case we have it unaliased already.
Signed-off-by: Izik Eidus <[email protected]>
Acked-by: Marcelo Tosatti <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.c | 14 ++++++++++----
virt/kvm/kvm_main.c | 9 +++++----
3 files changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09e6c56..99e3cc1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -617,6 +617,8 @@ void kvm_disable_tdp(void);
int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
int complete_pio(struct kvm_vcpu *vcpu);
+struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn);
+
static inline struct kvm_mmu_page *page_header(hpa_t shadow_page)
{
struct page *page = pfn_to_page(shadow_page >> PAGE_SHIFT);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8687758..8904e8a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -386,7 +386,9 @@ static void account_shadowed(struct kvm *kvm, gfn_t gfn)
{
int *write_count;
- write_count = slot_largepage_idx(gfn, gfn_to_memslot(kvm, gfn));
+ gfn = unalias_gfn(kvm, gfn);
+ write_count = slot_largepage_idx(gfn,
+ gfn_to_memslot_unaliased(kvm, gfn));
*write_count += 1;
}
@@ -394,16 +396,20 @@ static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
{
int *write_count;
- write_count = slot_largepage_idx(gfn, gfn_to_memslot(kvm, gfn));
+ gfn = unalias_gfn(kvm, gfn);
+ write_count = slot_largepage_idx(gfn,
+ gfn_to_memslot_unaliased(kvm, gfn));
*write_count -= 1;
WARN_ON(*write_count < 0);
}
static int has_wrprotected_page(struct kvm *kvm, gfn_t gfn)
{
- struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
+ struct kvm_memory_slot *slot;
int *largepage_idx;
+ gfn = unalias_gfn(kvm, gfn);
+ slot = gfn_to_memslot_unaliased(kvm, gfn);
if (slot) {
largepage_idx = slot_largepage_idx(gfn, slot);
return *largepage_idx;
@@ -2973,8 +2979,8 @@ static void audit_write_protection(struct kvm_vcpu *vcpu)
if (sp->role.metaphysical)
continue;
- slot = gfn_to_memslot(vcpu->kvm, sp->gfn);
gfn = unalias_gfn(vcpu->kvm, sp->gfn);
+ slot = gfn_to_memslot_unaliased(vcpu->kvm, sp->gfn);
rmapp = &slot->rmap[gfn - slot->base_gfn];
if (*rmapp)
printk(KERN_ERR "%s: (%s) shadow page has writable"
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1838052..a65baa9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -923,7 +923,7 @@ int kvm_is_error_hva(unsigned long addr)
}
EXPORT_SYMBOL_GPL(kvm_is_error_hva);
-static struct kvm_memory_slot *__gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
+struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn)
{
int i;
@@ -936,11 +936,12 @@ static struct kvm_memory_slot *__gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
}
return NULL;
}
+EXPORT_SYMBOL_GPL(gfn_to_memslot_unaliased);
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
{
gfn = unalias_gfn(kvm, gfn);
- return __gfn_to_memslot(kvm, gfn);
+ return gfn_to_memslot_unaliased(kvm, gfn);
}
int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
@@ -964,7 +965,7 @@ unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn)
struct kvm_memory_slot *slot;
gfn = unalias_gfn(kvm, gfn);
- slot = __gfn_to_memslot(kvm, gfn);
+ slot = gfn_to_memslot_unaliased(kvm, gfn);
if (!slot)
return bad_hva();
return (slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE);
@@ -1215,7 +1216,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
struct kvm_memory_slot *memslot;
gfn = unalias_gfn(kvm, gfn);
- memslot = __gfn_to_memslot(kvm, gfn);
+ memslot = gfn_to_memslot_unaliased(kvm, gfn);
if (memslot && memslot->dirty_bitmap) {
unsigned long rel_gfn = gfn - memslot->base_gfn;
--
1.6.0.3
From: Jan Kiszka <[email protected]>
As suggested by Avi, this patch introduces a counter of VCPUs that have
LVT0 set to NMI mode. Only if the counter > 0, we push the PIT ticks via
all LAPIC LVT0 lines to enable NMI watchdog support.
Signed-off-by: Jan Kiszka <[email protected]>
Acked-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/i8254.c | 11 ++++++-----
arch/x86/kvm/lapic.c | 23 ++++++++++++++++++++---
3 files changed, 27 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 59c3ae1..09e6c56 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -361,6 +361,7 @@ struct kvm_arch{
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
struct hlist_head irq_ack_notifier_list;
+ int vapics_in_nmi_mode;
int round_robin_prev_vcpu;
unsigned int tss_addr;
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index b6fcf5a..e665d1c 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -620,11 +620,12 @@ static void __inject_pit_timer_intr(struct kvm *kvm)
* LVT0 to NMI delivery. Other PIC interrupts are just sent to
* VCPU0, and only if its LVT0 is in EXTINT mode.
*/
- for (i = 0; i < KVM_MAX_VCPUS; ++i) {
- vcpu = kvm->vcpus[i];
- if (vcpu)
- kvm_apic_nmi_wd_deliver(vcpu);
- }
+ if (kvm->arch.vapics_in_nmi_mode > 0)
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ vcpu = kvm->vcpus[i];
+ if (vcpu)
+ kvm_apic_nmi_wd_deliver(vcpu);
+ }
}
void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0b0d413..afac68c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -130,6 +130,11 @@ static inline int apic_lvtt_period(struct kvm_lapic *apic)
return apic_get_reg(apic, APIC_LVTT) & APIC_LVT_TIMER_PERIODIC;
}
+static inline int apic_lvt_nmi_mode(u32 lvt_val)
+{
+ return (lvt_val & (APIC_MODE_MASK | APIC_LVT_MASKED)) == APIC_DM_NMI;
+}
+
static unsigned int apic_lvt_mask[APIC_LVT_NUM] = {
LVT_MASK | APIC_LVT_TIMER_PERIODIC, /* LVTT */
LVT_MASK | APIC_MODE_MASK, /* LVTTHMR */
@@ -672,6 +677,20 @@ static void start_apic_timer(struct kvm_lapic *apic)
apic->timer.period)));
}
+static void apic_manage_nmi_watchdog(struct kvm_lapic *apic, u32 lvt0_val)
+{
+ int nmi_wd_enabled = apic_lvt_nmi_mode(apic_get_reg(apic, APIC_LVT0));
+
+ if (apic_lvt_nmi_mode(lvt0_val)) {
+ if (!nmi_wd_enabled) {
+ apic_debug("Receive NMI setting on APIC_LVT0 "
+ "for cpu %d\n", apic->vcpu->vcpu_id);
+ apic->vcpu->kvm->arch.vapics_in_nmi_mode++;
+ }
+ } else if (nmi_wd_enabled)
+ apic->vcpu->kvm->arch.vapics_in_nmi_mode--;
+}
+
static void apic_mmio_write(struct kvm_io_device *this,
gpa_t address, int len, const void *data)
{
@@ -753,9 +772,7 @@ static void apic_mmio_write(struct kvm_io_device *this,
break;
case APIC_LVT0:
- if (val == APIC_DM_NMI)
- apic_debug("Receive NMI setting on APIC_LVT0 "
- "for cpu %d\n", apic->vcpu->vcpu_id);
+ apic_manage_nmi_watchdog(apic, val);
case APIC_LVTT:
case APIC_LVTTHMR:
case APIC_LVTPC:
--
1.6.0.3
From: Xiantao Zhang <[email protected]>
1. Increase the size of data area to 64M
2. Support more vcpus and memory, 128 vcpus and 256G memory are supported
for guests.
3. Add the boundary check for memory and vcpu allocation.
With this patch, kvm guest's data area looks as follow:
*
* +----------------------+ ------- KVM_VM_DATA_SIZE
* | vcpu[n]'s data | | ___________________KVM_STK_OFFSET
* | | | / |
* | .......... | | /vcpu's struct&stack |
* | .......... | | /---------------------|---- 0
* | vcpu[5]'s data | | / vpd |
* | vcpu[4]'s data | |/-----------------------|
* | vcpu[3]'s data | / vtlb |
* | vcpu[2]'s data | /|------------------------|
* | vcpu[1]'s data |/ | vhpt |
* | vcpu[0]'s data |____________________________|
* +----------------------+ |
* | memory dirty log | |
* +----------------------+ |
* | vm's data struct | |
* +----------------------+ |
* | | |
* | | |
* | | |
* | | |
* | | |
* | | |
* | | |
* | vm's p2m table | |
* | | |
* | | |
* | | | |
* vm's data->| | | |
* +----------------------+ ------- 0
* To support large memory, needs to increase the size of p2m.
* To support more vcpus, needs to ensure it has enough space to
* hold vcpus' data.
*/
Signed-off-by: Xiantao Zhang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/ia64/include/asm/kvm_host.h | 192 ++++++++++++++++++++++++--------------
arch/ia64/kvm/kvm-ia64.c | 60 ++++++------
arch/ia64/kvm/kvm_minstate.h | 4 +-
arch/ia64/kvm/misc.h | 3 +-
arch/ia64/kvm/vcpu.c | 5 +-
arch/ia64/kvm/vtlb.c | 4 +-
6 files changed, 161 insertions(+), 107 deletions(-)
diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index c60d324..678e264 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -23,17 +23,6 @@
#ifndef __ASM_KVM_HOST_H
#define __ASM_KVM_HOST_H
-
-#include <linux/types.h>
-#include <linux/mm.h>
-#include <linux/kvm.h>
-#include <linux/kvm_para.h>
-#include <linux/kvm_types.h>
-
-#include <asm/pal.h>
-#include <asm/sal.h>
-
-#define KVM_MAX_VCPUS 4
#define KVM_MEMORY_SLOTS 32
/* memory slots that does not exposed to userspace */
#define KVM_PRIVATE_MEM_SLOTS 4
@@ -52,68 +41,127 @@
#define EXIT_REASON_PTC_G 8
/*Define vmm address space and vm data space.*/
-#define KVM_VMM_SIZE (16UL<<20)
+#define KVM_VMM_SIZE (__IA64_UL_CONST(16)<<20)
#define KVM_VMM_SHIFT 24
-#define KVM_VMM_BASE 0xD000000000000000UL
-#define VMM_SIZE (8UL<<20)
+#define KVM_VMM_BASE 0xD000000000000000
+#define VMM_SIZE (__IA64_UL_CONST(8)<<20)
/*
* Define vm_buffer, used by PAL Services, base address.
- * Note: vmbuffer is in the VMM-BLOCK, the size must be < 8M
+ * Note: vm_buffer is in the VMM-BLOCK, the size must be < 8M
*/
#define KVM_VM_BUFFER_BASE (KVM_VMM_BASE + VMM_SIZE)
-#define KVM_VM_BUFFER_SIZE (8UL<<20)
-
-/*Define Virtual machine data layout.*/
-#define KVM_VM_DATA_SHIFT 24
-#define KVM_VM_DATA_SIZE (1UL << KVM_VM_DATA_SHIFT)
-#define KVM_VM_DATA_BASE (KVM_VMM_BASE + KVM_VMM_SIZE)
-
-
-#define KVM_P2M_BASE KVM_VM_DATA_BASE
-#define KVM_P2M_OFS 0
-#define KVM_P2M_SIZE (8UL << 20)
-
-#define KVM_VHPT_BASE (KVM_P2M_BASE + KVM_P2M_SIZE)
-#define KVM_VHPT_OFS KVM_P2M_SIZE
-#define KVM_VHPT_BLOCK_SIZE (2UL << 20)
-#define VHPT_SHIFT 18
-#define VHPT_SIZE (1UL << VHPT_SHIFT)
-#define VHPT_NUM_ENTRIES (1<<(VHPT_SHIFT-5))
-
-#define KVM_VTLB_BASE (KVM_VHPT_BASE+KVM_VHPT_BLOCK_SIZE)
-#define KVM_VTLB_OFS (KVM_VHPT_OFS+KVM_VHPT_BLOCK_SIZE)
-#define KVM_VTLB_BLOCK_SIZE (1UL<<20)
-#define VTLB_SHIFT 17
-#define VTLB_SIZE (1UL<<VTLB_SHIFT)
-#define VTLB_NUM_ENTRIES (1<<(VTLB_SHIFT-5))
-
-#define KVM_VPD_BASE (KVM_VTLB_BASE+KVM_VTLB_BLOCK_SIZE)
-#define KVM_VPD_OFS (KVM_VTLB_OFS+KVM_VTLB_BLOCK_SIZE)
-#define KVM_VPD_BLOCK_SIZE (2UL<<20)
-#define VPD_SHIFT 16
-#define VPD_SIZE (1UL<<VPD_SHIFT)
-
-#define KVM_VCPU_BASE (KVM_VPD_BASE+KVM_VPD_BLOCK_SIZE)
-#define KVM_VCPU_OFS (KVM_VPD_OFS+KVM_VPD_BLOCK_SIZE)
-#define KVM_VCPU_BLOCK_SIZE (2UL<<20)
-#define VCPU_SHIFT 18
-#define VCPU_SIZE (1UL<<VCPU_SHIFT)
-#define MAX_VCPU_NUM KVM_VCPU_BLOCK_SIZE/VCPU_SIZE
-
-#define KVM_VM_BASE (KVM_VCPU_BASE+KVM_VCPU_BLOCK_SIZE)
-#define KVM_VM_OFS (KVM_VCPU_OFS+KVM_VCPU_BLOCK_SIZE)
-#define KVM_VM_BLOCK_SIZE (1UL<<19)
-
-#define KVM_MEM_DIRTY_LOG_BASE (KVM_VM_BASE+KVM_VM_BLOCK_SIZE)
-#define KVM_MEM_DIRTY_LOG_OFS (KVM_VM_OFS+KVM_VM_BLOCK_SIZE)
-#define KVM_MEM_DIRTY_LOG_SIZE (1UL<<19)
-
-/* Get vpd, vhpt, tlb, vcpu, base*/
-#define VPD_ADDR(n) (KVM_VPD_BASE+n*VPD_SIZE)
-#define VHPT_ADDR(n) (KVM_VHPT_BASE+n*VHPT_SIZE)
-#define VTLB_ADDR(n) (KVM_VTLB_BASE+n*VTLB_SIZE)
-#define VCPU_ADDR(n) (KVM_VCPU_BASE+n*VCPU_SIZE)
+#define KVM_VM_BUFFER_SIZE (__IA64_UL_CONST(8)<<20)
+
+/*
+ * kvm guest's data area looks as follow:
+ *
+ * +----------------------+ ------- KVM_VM_DATA_SIZE
+ * | vcpu[n]'s data | | ___________________KVM_STK_OFFSET
+ * | | | / |
+ * | .......... | | /vcpu's struct&stack |
+ * | .......... | | /---------------------|---- 0
+ * | vcpu[5]'s data | | / vpd |
+ * | vcpu[4]'s data | |/-----------------------|
+ * | vcpu[3]'s data | / vtlb |
+ * | vcpu[2]'s data | /|------------------------|
+ * | vcpu[1]'s data |/ | vhpt |
+ * | vcpu[0]'s data |____________________________|
+ * +----------------------+ |
+ * | memory dirty log | |
+ * +----------------------+ |
+ * | vm's data struct | |
+ * +----------------------+ |
+ * | | |
+ * | | |
+ * | | |
+ * | | |
+ * | | |
+ * | | |
+ * | | |
+ * | vm's p2m table | |
+ * | | |
+ * | | |
+ * | | | |
+ * vm's data->| | | |
+ * +----------------------+ ------- 0
+ * To support large memory, needs to increase the size of p2m.
+ * To support more vcpus, needs to ensure it has enough space to
+ * hold vcpus' data.
+ */
+
+#define KVM_VM_DATA_SHIFT 26
+#define KVM_VM_DATA_SIZE (__IA64_UL_CONST(1) << KVM_VM_DATA_SHIFT)
+#define KVM_VM_DATA_BASE (KVM_VMM_BASE + KVM_VM_DATA_SIZE)
+
+#define KVM_P2M_BASE KVM_VM_DATA_BASE
+#define KVM_P2M_SIZE (__IA64_UL_CONST(24) << 20)
+
+#define VHPT_SHIFT 16
+#define VHPT_SIZE (__IA64_UL_CONST(1) << VHPT_SHIFT)
+#define VHPT_NUM_ENTRIES (__IA64_UL_CONST(1) << (VHPT_SHIFT-5))
+
+#define VTLB_SHIFT 16
+#define VTLB_SIZE (__IA64_UL_CONST(1) << VTLB_SHIFT)
+#define VTLB_NUM_ENTRIES (1UL << (VHPT_SHIFT-5))
+
+#define VPD_SHIFT 16
+#define VPD_SIZE (__IA64_UL_CONST(1) << VPD_SHIFT)
+
+#define VCPU_STRUCT_SHIFT 16
+#define VCPU_STRUCT_SIZE (__IA64_UL_CONST(1) << VCPU_STRUCT_SHIFT)
+
+#define KVM_STK_OFFSET VCPU_STRUCT_SIZE
+
+#define KVM_VM_STRUCT_SHIFT 19
+#define KVM_VM_STRUCT_SIZE (__IA64_UL_CONST(1) << KVM_VM_STRUCT_SHIFT)
+
+#define KVM_MEM_DIRY_LOG_SHIFT 19
+#define KVM_MEM_DIRTY_LOG_SIZE (__IA64_UL_CONST(1) << KVM_MEM_DIRY_LOG_SHIFT)
+
+#ifndef __ASSEMBLY__
+
+/*Define the max vcpus and memory for Guests.*/
+#define KVM_MAX_VCPUS (KVM_VM_DATA_SIZE - KVM_P2M_SIZE - KVM_VM_STRUCT_SIZE -\
+ KVM_MEM_DIRTY_LOG_SIZE) / sizeof(struct kvm_vcpu_data)
+#define KVM_MAX_MEM_SIZE (KVM_P2M_SIZE >> 3 << PAGE_SHIFT)
+
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/kvm.h>
+#include <linux/kvm_para.h>
+#include <linux/kvm_types.h>
+
+#include <asm/pal.h>
+#include <asm/sal.h>
+#include <asm/page.h>
+
+struct kvm_vcpu_data {
+ char vcpu_vhpt[VHPT_SIZE];
+ char vcpu_vtlb[VTLB_SIZE];
+ char vcpu_vpd[VPD_SIZE];
+ char vcpu_struct[VCPU_STRUCT_SIZE];
+};
+
+struct kvm_vm_data {
+ char kvm_p2m[KVM_P2M_SIZE];
+ char kvm_vm_struct[KVM_VM_STRUCT_SIZE];
+ char kvm_mem_dirty_log[KVM_MEM_DIRTY_LOG_SIZE];
+ struct kvm_vcpu_data vcpu_data[KVM_MAX_VCPUS];
+};
+
+#define VCPU_BASE(n) KVM_VM_DATA_BASE + \
+ offsetof(struct kvm_vm_data, vcpu_data[n])
+#define VM_BASE KVM_VM_DATA_BASE + \
+ offsetof(struct kvm_vm_data, kvm_vm_struct)
+#define KVM_MEM_DIRTY_LOG_BASE KVM_VM_DATA_BASE + \
+ offsetof(struct kvm_vm_data, kvm_mem_dirty_log)
+
+#define VHPT_BASE(n) (VCPU_BASE(n) + offsetof(struct kvm_vcpu_data, vcpu_vhpt))
+#define VTLB_BASE(n) (VCPU_BASE(n) + offsetof(struct kvm_vcpu_data, vcpu_vtlb))
+#define VPD_BASE(n) (VCPU_BASE(n) + offsetof(struct kvm_vcpu_data, vcpu_vpd))
+#define VCPU_STRUCT_BASE(n) (VCPU_BASE(n) + \
+ offsetof(struct kvm_vcpu_data, vcpu_struct))
/*IO section definitions*/
#define IOREQ_READ 1
@@ -403,14 +451,13 @@ struct kvm_sal_data {
};
struct kvm_arch {
+ spinlock_t dirty_log_lock;
+
unsigned long vm_base;
unsigned long metaphysical_rr0;
unsigned long metaphysical_rr4;
unsigned long vmm_init_rr;
- unsigned long vhpt_base;
- unsigned long vtlb_base;
- unsigned long vpd_base;
- spinlock_t dirty_log_lock;
+
struct kvm_ioapic *vioapic;
struct kvm_vm_stat stat;
struct kvm_sal_data rdv_sal_data;
@@ -512,7 +559,7 @@ struct kvm_pt_regs {
static inline struct kvm_pt_regs *vcpu_regs(struct kvm_vcpu *v)
{
- return (struct kvm_pt_regs *) ((unsigned long) v + IA64_STK_OFFSET) - 1;
+ return (struct kvm_pt_regs *) ((unsigned long) v + KVM_STK_OFFSET) - 1;
}
typedef int kvm_vmm_entry(void);
@@ -531,5 +578,6 @@ int kvm_pal_emul(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run);
void kvm_sal_emul(struct kvm_vcpu *vcpu);
static inline void kvm_inject_nmi(struct kvm_vcpu *vcpu) {}
+#endif /* __ASSEMBLY__*/
#endif
diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
index af1464f..43e45f6 100644
--- a/arch/ia64/kvm/kvm-ia64.c
+++ b/arch/ia64/kvm/kvm-ia64.c
@@ -698,27 +698,24 @@ out:
return r;
}
-/*
- * Allocate 16M memory for every vm to hold its specific data.
- * Its memory map is defined in kvm_host.h.
- */
static struct kvm *kvm_alloc_kvm(void)
{
struct kvm *kvm;
uint64_t vm_base;
+ BUG_ON(sizeof(struct kvm) > KVM_VM_STRUCT_SIZE);
+
vm_base = __get_free_pages(GFP_KERNEL, get_order(KVM_VM_DATA_SIZE));
if (!vm_base)
return ERR_PTR(-ENOMEM);
- printk(KERN_DEBUG"kvm: VM data's base Address:0x%lx\n", vm_base);
- /* Zero all pages before use! */
memset((void *)vm_base, 0, KVM_VM_DATA_SIZE);
-
- kvm = (struct kvm *)(vm_base + KVM_VM_OFS);
+ kvm = (struct kvm *)(vm_base +
+ offsetof(struct kvm_vm_data, kvm_vm_struct));
kvm->arch.vm_base = vm_base;
+ printk(KERN_DEBUG"kvm: vm's data area:0x%lx\n", vm_base);
return kvm;
}
@@ -760,21 +757,12 @@ static void kvm_build_io_pmt(struct kvm *kvm)
static void kvm_init_vm(struct kvm *kvm)
{
- long vm_base;
-
BUG_ON(!kvm);
kvm->arch.metaphysical_rr0 = GUEST_PHYSICAL_RR0;
kvm->arch.metaphysical_rr4 = GUEST_PHYSICAL_RR4;
kvm->arch.vmm_init_rr = VMM_INIT_RR;
- vm_base = kvm->arch.vm_base;
- if (vm_base) {
- kvm->arch.vhpt_base = vm_base + KVM_VHPT_OFS;
- kvm->arch.vtlb_base = vm_base + KVM_VTLB_OFS;
- kvm->arch.vpd_base = vm_base + KVM_VPD_OFS;
- }
-
/*
*Fill P2M entries for MMIO/IO ranges
*/
@@ -864,7 +852,7 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
goto out;
r = copy_from_user(vcpu + 1, regs->saved_stack +
sizeof(struct kvm_vcpu),
- IA64_STK_OFFSET - sizeof(struct kvm_vcpu));
+ KVM_STK_OFFSET - sizeof(struct kvm_vcpu));
if (r)
goto out;
vcpu->arch.exit_data =
@@ -1166,10 +1154,11 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
/*Set entry address for first run.*/
regs->cr_iip = PALE_RESET_ENTRY;
- /*Initilize itc offset for vcpus*/
+ /*Initialize itc offset for vcpus*/
itc_offset = 0UL - ia64_getreg(_IA64_REG_AR_ITC);
- for (i = 0; i < MAX_VCPU_NUM; i++) {
- v = (struct kvm_vcpu *)((char *)vcpu + VCPU_SIZE * i);
+ for (i = 0; i < KVM_MAX_VCPUS; i++) {
+ v = (struct kvm_vcpu *)((char *)vcpu +
+ sizeof(struct kvm_vcpu_data) * i);
v->arch.itc_offset = itc_offset;
v->arch.last_itc = 0;
}
@@ -1183,7 +1172,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
vcpu->arch.apic->vcpu = vcpu;
p_ctx->gr[1] = 0;
- p_ctx->gr[12] = (unsigned long)((char *)vmm_vcpu + IA64_STK_OFFSET);
+ p_ctx->gr[12] = (unsigned long)((char *)vmm_vcpu + KVM_STK_OFFSET);
p_ctx->gr[13] = (unsigned long)vmm_vcpu;
p_ctx->psr = 0x1008522000UL;
p_ctx->ar[40] = FPSR_DEFAULT; /*fpsr*/
@@ -1218,12 +1207,12 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
vcpu->arch.hlt_timer.function = hlt_timer_fn;
vcpu->arch.last_run_cpu = -1;
- vcpu->arch.vpd = (struct vpd *)VPD_ADDR(vcpu->vcpu_id);
+ vcpu->arch.vpd = (struct vpd *)VPD_BASE(vcpu->vcpu_id);
vcpu->arch.vsa_base = kvm_vsa_base;
vcpu->arch.__gp = kvm_vmm_gp;
vcpu->arch.dirty_log_lock_pa = __pa(&kvm->arch.dirty_log_lock);
- vcpu->arch.vhpt.hash = (struct thash_data *)VHPT_ADDR(vcpu->vcpu_id);
- vcpu->arch.vtlb.hash = (struct thash_data *)VTLB_ADDR(vcpu->vcpu_id);
+ vcpu->arch.vhpt.hash = (struct thash_data *)VHPT_BASE(vcpu->vcpu_id);
+ vcpu->arch.vtlb.hash = (struct thash_data *)VTLB_BASE(vcpu->vcpu_id);
init_ptce_info(vcpu);
r = 0;
@@ -1273,12 +1262,22 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
int r;
int cpu;
+ BUG_ON(sizeof(struct kvm_vcpu) > VCPU_STRUCT_SIZE/2);
+
+ r = -EINVAL;
+ if (id >= KVM_MAX_VCPUS) {
+ printk(KERN_ERR"kvm: Can't configure vcpus > %ld",
+ KVM_MAX_VCPUS);
+ goto fail;
+ }
+
r = -ENOMEM;
if (!vm_base) {
printk(KERN_ERR"kvm: Create vcpu[%d] error!\n", id);
goto fail;
}
- vcpu = (struct kvm_vcpu *)(vm_base + KVM_VCPU_OFS + VCPU_SIZE * id);
+ vcpu = (struct kvm_vcpu *)(vm_base + offsetof(struct kvm_vm_data,
+ vcpu_data[id].vcpu_struct));
vcpu->kvm = kvm;
cpu = get_cpu();
@@ -1396,7 +1395,7 @@ int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
sizeof(union context));
if (r)
goto out;
- r = copy_to_user(regs->saved_stack, (void *)vcpu, IA64_STK_OFFSET);
+ r = copy_to_user(regs->saved_stack, (void *)vcpu, KVM_STK_OFFSET);
if (r)
goto out;
SAVE_REGS(mp_state);
@@ -1457,6 +1456,9 @@ int kvm_arch_set_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot = &kvm->memslots[mem->slot];
unsigned long base_gfn = memslot->base_gfn;
+ if (base_gfn + npages > (KVM_MAX_MEM_SIZE >> PAGE_SHIFT))
+ return -ENOMEM;
+
for (i = 0; i < npages; i++) {
pfn = gfn_to_pfn(kvm, base_gfn + i);
if (!kvm_is_mmio_pfn(pfn)) {
@@ -1631,8 +1633,8 @@ static int kvm_ia64_sync_dirty_log(struct kvm *kvm,
struct kvm_memory_slot *memslot;
int r, i;
long n, base;
- unsigned long *dirty_bitmap = (unsigned long *)((void *)kvm - KVM_VM_OFS
- + KVM_MEM_DIRTY_LOG_OFS);
+ unsigned long *dirty_bitmap = (unsigned long *)(kvm->arch.vm_base +
+ offsetof(struct kvm_vm_data, kvm_mem_dirty_log));
r = -EINVAL;
if (log->slot >= KVM_MEMORY_SLOTS)
diff --git a/arch/ia64/kvm/kvm_minstate.h b/arch/ia64/kvm/kvm_minstate.h
index 2cc41d1..b2bcaa2 100644
--- a/arch/ia64/kvm/kvm_minstate.h
+++ b/arch/ia64/kvm/kvm_minstate.h
@@ -24,6 +24,8 @@
#include <asm/asmmacro.h>
#include <asm/types.h>
#include <asm/kregs.h>
+#include <asm/kvm_host.h>
+
#include "asm-offsets.h"
#define KVM_MINSTATE_START_SAVE_MIN \
@@ -33,7 +35,7 @@
addl r22 = VMM_RBS_OFFSET,r1; /* compute base of RBS */ \
;; \
lfetch.fault.excl.nt1 [r22]; \
- addl r1 = IA64_STK_OFFSET-VMM_PT_REGS_SIZE,r1; /* compute base of memory stack */ \
+ addl r1 = KVM_STK_OFFSET-VMM_PT_REGS_SIZE, r1; \
mov r23 = ar.bspstore; /* save ar.bspstore */ \
;; \
mov ar.bspstore = r22; /* switch to kernel RBS */\
diff --git a/arch/ia64/kvm/misc.h b/arch/ia64/kvm/misc.h
index e585c46..dd979e0 100644
--- a/arch/ia64/kvm/misc.h
+++ b/arch/ia64/kvm/misc.h
@@ -27,7 +27,8 @@
*/
static inline uint64_t *kvm_host_get_pmt(struct kvm *kvm)
{
- return (uint64_t *)(kvm->arch.vm_base + KVM_P2M_OFS);
+ return (uint64_t *)(kvm->arch.vm_base +
+ offsetof(struct kvm_vm_data, kvm_p2m));
}
static inline void kvm_set_pmt_entry(struct kvm *kvm, gfn_t gfn,
diff --git a/arch/ia64/kvm/vcpu.c b/arch/ia64/kvm/vcpu.c
index e44027c..a528d70 100644
--- a/arch/ia64/kvm/vcpu.c
+++ b/arch/ia64/kvm/vcpu.c
@@ -816,8 +816,9 @@ static void vcpu_set_itc(struct kvm_vcpu *vcpu, u64 val)
unsigned long vitv = VCPU(vcpu, itv);
if (vcpu->vcpu_id == 0) {
- for (i = 0; i < MAX_VCPU_NUM; i++) {
- v = (struct kvm_vcpu *)((char *)vcpu + VCPU_SIZE * i);
+ for (i = 0; i < KVM_MAX_VCPUS; i++) {
+ v = (struct kvm_vcpu *)((char *)vcpu +
+ sizeof(struct kvm_vcpu_data) * i);
VMX(v, itc_offset) = itc_offset;
VMX(v, last_itc) = 0;
}
diff --git a/arch/ia64/kvm/vtlb.c b/arch/ia64/kvm/vtlb.c
index e22b933..6b6307a 100644
--- a/arch/ia64/kvm/vtlb.c
+++ b/arch/ia64/kvm/vtlb.c
@@ -183,8 +183,8 @@ void mark_pages_dirty(struct kvm_vcpu *v, u64 pte, u64 ps)
u64 i, dirty_pages = 1;
u64 base_gfn = (pte&_PAGE_PPN_MASK) >> PAGE_SHIFT;
spinlock_t *lock = __kvm_va(v->arch.dirty_log_lock_pa);
- void *dirty_bitmap = (void *)v - (KVM_VCPU_OFS + v->vcpu_id * VCPU_SIZE)
- + KVM_MEM_DIRTY_LOG_OFS;
+ void *dirty_bitmap = (void *)KVM_MEM_DIRTY_LOG_BASE;
+
dirty_pages <<= ps <= PAGE_SHIFT ? 0 : ps - PAGE_SHIFT;
vmm_spin_lock(lock);
--
1.6.0.3
From: Sheng Yang <[email protected]>
PCI device assignment would map guest MMIO spaces as separate slot, so it is
possible that the device has more than 2 MMIO spaces and overwrite current
private memslot.
The patch move private memory slot to the top of userspace visible memory slots.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/vmx.h | 5 +++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index dae134f..7623eb7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2513,7 +2513,7 @@ static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
{
int ret;
struct kvm_userspace_memory_region tss_mem = {
- .slot = 8,
+ .slot = TSS_PRIVATE_MEMSLOT,
.guest_phys_addr = addr,
.memory_size = PAGE_SIZE * 3,
.flags = 0,
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index 18598af..3db236c 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -338,8 +338,9 @@ enum vmcs_field {
#define AR_RESERVD_MASK 0xfffe0f00
-#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT 9
-#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT 10
+#define TSS_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 0)
+#define APIC_ACCESS_PAGE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 1)
+#define IDENTITY_PAGETABLE_PRIVATE_MEMSLOT (KVM_MEMORY_SLOTS + 2)
#define VMX_NR_VPIDS (1 << 16)
#define VMX_VPID_EXTENT_SINGLE_CONTEXT 1
--
1.6.0.3
From: Sheng Yang <[email protected]>
GUEST_PAT support is a new feature introduced by Intel Core i7 architecture.
With this, cpu would save/load guest and host PAT automatically, for EPT memory
type in guest depends on MSR_IA32_CR_PAT.
Also add save/restore for MSR_IA32_CR_PAT.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 29 ++++++++++++++++++++++++++---
arch/x86/kvm/vmx.h | 7 +++++++
arch/x86/kvm/x86.c | 2 +-
3 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2180109..b4c95a5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -962,6 +962,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
pr_unimpl(vcpu, "unimplemented perfctr wrmsr: 0x%x data 0x%llx\n", msr_index, data);
break;
+ case MSR_IA32_CR_PAT:
+ if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+ vmcs_write64(GUEST_IA32_PAT, data);
+ vcpu->arch.pat = data;
+ break;
+ }
+ /* Otherwise falls through to kvm_set_msr_common */
default:
vmx_load_host_state(vmx);
msr = find_msr_entry(vmx, msr_index);
@@ -1181,12 +1188,13 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
#ifdef CONFIG_X86_64
min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
#endif
- opt = 0;
+ opt = VM_EXIT_SAVE_IA32_PAT | VM_EXIT_LOAD_IA32_PAT;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
&_vmexit_control) < 0)
return -EIO;
- min = opt = 0;
+ min = 0;
+ opt = VM_ENTRY_LOAD_IA32_PAT;
if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
&_vmentry_control) < 0)
return -EIO;
@@ -2092,8 +2100,9 @@ static void vmx_disable_intercept_for_msr(struct page *msr_bitmap, u32 msr)
*/
static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
{
- u32 host_sysenter_cs;
+ u32 host_sysenter_cs, msr_low, msr_high;
u32 junk;
+ u64 host_pat;
unsigned long a;
struct descriptor_table dt;
int i;
@@ -2181,6 +2190,20 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
rdmsrl(MSR_IA32_SYSENTER_EIP, a);
vmcs_writel(HOST_IA32_SYSENTER_EIP, a); /* 22.2.3 */
+ if (vmcs_config.vmexit_ctrl & VM_EXIT_LOAD_IA32_PAT) {
+ rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
+ host_pat = msr_low | ((u64) msr_high << 32);
+ vmcs_write64(HOST_IA32_PAT, host_pat);
+ }
+ if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
+ rdmsr(MSR_IA32_CR_PAT, msr_low, msr_high);
+ host_pat = msr_low | ((u64) msr_high << 32);
+ /* Write the default value follow host pat */
+ vmcs_write64(GUEST_IA32_PAT, host_pat);
+ /* Keep arch.pat sync with GUEST_IA32_PAT */
+ vmx->vcpu.arch.pat = host_pat;
+ }
+
for (i = 0; i < NR_VMX_MSR; ++i) {
u32 index = vmx_msr_index[i];
u32 data_low, data_high;
diff --git a/arch/x86/kvm/vmx.h b/arch/x86/kvm/vmx.h
index ec5edc3..18598af 100644
--- a/arch/x86/kvm/vmx.h
+++ b/arch/x86/kvm/vmx.h
@@ -63,10 +63,13 @@
#define VM_EXIT_HOST_ADDR_SPACE_SIZE 0x00000200
#define VM_EXIT_ACK_INTR_ON_EXIT 0x00008000
+#define VM_EXIT_SAVE_IA32_PAT 0x00040000
+#define VM_EXIT_LOAD_IA32_PAT 0x00080000
#define VM_ENTRY_IA32E_MODE 0x00000200
#define VM_ENTRY_SMM 0x00000400
#define VM_ENTRY_DEACT_DUAL_MONITOR 0x00000800
+#define VM_ENTRY_LOAD_IA32_PAT 0x00004000
/* VMCS Encodings */
enum vmcs_field {
@@ -112,6 +115,8 @@ enum vmcs_field {
VMCS_LINK_POINTER_HIGH = 0x00002801,
GUEST_IA32_DEBUGCTL = 0x00002802,
GUEST_IA32_DEBUGCTL_HIGH = 0x00002803,
+ GUEST_IA32_PAT = 0x00002804,
+ GUEST_IA32_PAT_HIGH = 0x00002805,
GUEST_PDPTR0 = 0x0000280a,
GUEST_PDPTR0_HIGH = 0x0000280b,
GUEST_PDPTR1 = 0x0000280c,
@@ -120,6 +125,8 @@ enum vmcs_field {
GUEST_PDPTR2_HIGH = 0x0000280f,
GUEST_PDPTR3 = 0x00002810,
GUEST_PDPTR3_HIGH = 0x00002811,
+ HOST_IA32_PAT = 0x00002c00,
+ HOST_IA32_PAT_HIGH = 0x00002c01,
PIN_BASED_VM_EXEC_CONTROL = 0x00004000,
CPU_BASED_VM_EXEC_CONTROL = 0x00004002,
EXCEPTION_BITMAP = 0x00004004,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f5b2334..0edf753 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -452,7 +452,7 @@ static u32 msrs_to_save[] = {
MSR_CSTAR, MSR_KERNEL_GS_BASE, MSR_SYSCALL_MASK, MSR_LSTAR,
#endif
MSR_IA32_TIME_STAMP_COUNTER, MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
- MSR_IA32_PERF_STATUS,
+ MSR_IA32_PERF_STATUS, MSR_IA32_CR_PAT
};
static unsigned num_msrs_to_save;
--
1.6.0.3
From: Sheng Yang <[email protected]>
Otherwise set_bit() for private memory slot(above KVM_MEMORY_SLOTS) would
corrupted memory in 32bit host.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 +++++---
arch/x86/kvm/mmu.c | 6 +++---
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 93040b5..59c3ae1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -192,9 +192,11 @@ struct kvm_mmu_page {
u64 *spt;
/* hold the gfn of each spte inside spt */
gfn_t *gfns;
- unsigned long slot_bitmap; /* One bit set per slot which has memory
- * in this shadow page.
- */
+ /*
+ * One bit set per slot which has memory
+ * in this shadow page.
+ */
+ DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
int multimapped; /* More than one parent_pte? */
int root_count; /* Currently serving as active root */
bool unsync;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 09d05f5..8687758 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -789,7 +789,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
ASSERT(is_empty_shadow_page(sp->spt));
- sp->slot_bitmap = 0;
+ bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
sp->multimapped = 0;
sp->parent_pte = parent_pte;
--vcpu->kvm->arch.n_free_mmu_pages;
@@ -1364,7 +1364,7 @@ static void page_header_update_slot(struct kvm *kvm, void *pte, gfn_t gfn)
int slot = memslot_id(kvm, gfn_to_memslot(kvm, gfn));
struct kvm_mmu_page *sp = page_header(__pa(pte));
- __set_bit(slot, &sp->slot_bitmap);
+ __set_bit(slot, sp->slot_bitmap);
}
static void mmu_convert_notrap(struct kvm_mmu_page *sp)
@@ -2564,7 +2564,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
int i;
u64 *pt;
- if (!test_bit(slot, &sp->slot_bitmap))
+ if (!test_bit(slot, sp->slot_bitmap))
continue;
pt = sp->spt;
--
1.6.0.3
From: Amit Shah <[email protected]>
The hardware does not set the 'g' bit of the cs selector and this breaks
migration from amd hosts to intel hosts. Set this bit if the segment
limit is beyond 1 MB.
Signed-off-by: Amit Shah <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/svm.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 05efc4e..665008d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -772,6 +772,15 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
var->l = (s->attrib >> SVM_SELECTOR_L_SHIFT) & 1;
var->db = (s->attrib >> SVM_SELECTOR_DB_SHIFT) & 1;
var->g = (s->attrib >> SVM_SELECTOR_G_SHIFT) & 1;
+
+ /*
+ * SVM always stores 0 for the 'G' bit in the CS selector in
+ * the VMCB on a VMEXIT. This hurts cross-vendor migration:
+ * Intel's VMENTRY has a check on the 'G' bit.
+ */
+ if (seg == VCPU_SREG_CS)
+ var->g = s->limit > 0xfffff;
+
var->unusable = !var->present;
}
--
1.6.0.3
From: Jan Kiszka <[email protected]>
LINT0 of the LAPIC can be used to route PIT events as NMI watchdog ticks
into the guest. This patch aligns the in-kernel irqchip emulation with
the user space irqchip with already supports this feature. The trick is
to route PIT interrupts to all LAPIC's LVT0 lines.
Rebased and slightly polished patch originally posted by Sheng Yang.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/i8254.c | 15 +++++++++++++++
arch/x86/kvm/irq.h | 1 +
arch/x86/kvm/lapic.c | 34 +++++++++++++++++++++++++++++-----
3 files changed, 45 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 59ebd37..580cc1d 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -603,10 +603,25 @@ void kvm_free_pit(struct kvm *kvm)
static void __inject_pit_timer_intr(struct kvm *kvm)
{
+ struct kvm_vcpu *vcpu;
+ int i;
+
mutex_lock(&kvm->lock);
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
mutex_unlock(&kvm->lock);
+
+ /*
+ * Provides NMI watchdog support in IOAPIC mode.
+ * The route is: PIT -> PIC -> LVT0 in NMI mode,
+ * timer IRQs will continue to flow through the IOAPIC.
+ */
+ for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+ vcpu = kvm->vcpus[i];
+ if (!vcpu)
+ continue;
+ kvm_apic_local_deliver(vcpu, APIC_LVT0);
+ }
}
void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index f17c8f5..71e37a5 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -87,6 +87,7 @@ void kvm_pic_reset(struct kvm_kpic_state *s);
void kvm_timer_intr_post(struct kvm_vcpu *vcpu, int vec);
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu);
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type);
void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu);
void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu);
void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0fc3cab..206cc11 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -380,6 +380,14 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
}
break;
+ case APIC_DM_EXTINT:
+ /*
+ * Should only be called by kvm_apic_local_deliver() with LVT0,
+ * before NMI watchdog was enabled. Already handled by
+ * kvm_apic_accept_pic_intr().
+ */
+ break;
+
default:
printk(KERN_ERR "TODO: unsupported delivery mode %x\n",
delivery_mode);
@@ -743,10 +751,13 @@ static void apic_mmio_write(struct kvm_io_device *this,
apic_set_reg(apic, APIC_ICR2, val & 0xff000000);
break;
+ case APIC_LVT0:
+ if (val == APIC_DM_NMI)
+ apic_debug("Receive NMI setting on APIC_LVT0 "
+ "for cpu %d\n", apic->vcpu->vcpu_id);
case APIC_LVTT:
case APIC_LVTTHMR:
case APIC_LVTPC:
- case APIC_LVT0:
case APIC_LVT1:
case APIC_LVTERR:
/* TODO: Check vector */
@@ -961,12 +972,25 @@ int apic_has_pending_timer(struct kvm_vcpu *vcpu)
return 0;
}
-static int __inject_apic_timer_irq(struct kvm_lapic *apic)
+int kvm_apic_local_deliver(struct kvm_vcpu *vcpu, int lvt_type)
{
- int vector;
+ struct kvm_lapic *apic = vcpu->arch.apic;
+ int vector, mode, trig_mode;
+ u32 reg;
+
+ if (apic && apic_enabled(apic)) {
+ reg = apic_get_reg(apic, lvt_type);
+ vector = reg & APIC_VECTOR_MASK;
+ mode = reg & APIC_MODE_MASK;
+ trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
+ return __apic_accept_irq(apic, mode, vector, 1, trig_mode);
+ }
+ return 0;
+}
- vector = apic_lvt_vector(apic, APIC_LVTT);
- return __apic_accept_irq(apic, APIC_DM_FIXED, vector, 1, 0);
+static inline int __inject_apic_timer_irq(struct kvm_lapic *apic)
+{
+ return kvm_apic_local_deliver(apic->vcpu, APIC_LVTT);
}
static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
injecting NMIs from user space and also extends the statistic report
accordingly.
Based on the original patch by Sheng Yang.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++++-
include/linux/kvm.h | 11 +++++++-
3 files changed, 55 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index bfbbdea..a40fa84 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -398,6 +398,7 @@ struct kvm_vcpu_stat {
u32 halt_exits;
u32 halt_wakeup;
u32 request_irq_exits;
+ u32 request_nmi_exits;
u32 irq_exits;
u32 host_state_reload;
u32 efer_reload;
@@ -406,6 +407,7 @@ struct kvm_vcpu_stat {
u32 insn_emulation_fail;
u32 hypercalls;
u32 irq_injections;
+ u32 nmi_injections;
};
struct descriptor_table {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1fa9a6d..0797145 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -86,6 +86,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "halt_wakeup", VCPU_STAT(halt_wakeup) },
{ "hypercalls", VCPU_STAT(hypercalls) },
{ "request_irq", VCPU_STAT(request_irq_exits) },
+ { "request_nmi", VCPU_STAT(request_nmi_exits) },
{ "irq_exits", VCPU_STAT(irq_exits) },
{ "host_state_reload", VCPU_STAT(host_state_reload) },
{ "efer_reload", VCPU_STAT(efer_reload) },
@@ -93,6 +94,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ "insn_emulation", VCPU_STAT(insn_emulation) },
{ "insn_emulation_fail", VCPU_STAT(insn_emulation_fail) },
{ "irq_injections", VCPU_STAT(irq_injections) },
+ { "nmi_injections", VCPU_STAT(nmi_injections) },
{ "mmu_shadow_zapped", VM_STAT(mmu_shadow_zapped) },
{ "mmu_pte_write", VM_STAT(mmu_pte_write) },
{ "mmu_pte_updated", VM_STAT(mmu_pte_updated) },
@@ -1318,6 +1320,15 @@ static int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu,
return 0;
}
+static int kvm_vcpu_ioctl_nmi(struct kvm_vcpu *vcpu)
+{
+ vcpu_load(vcpu);
+ kvm_inject_nmi(vcpu);
+ vcpu_put(vcpu);
+
+ return 0;
+}
+
static int vcpu_ioctl_tpr_access_reporting(struct kvm_vcpu *vcpu,
struct kvm_tpr_access_ctl *tac)
{
@@ -1377,6 +1388,13 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_NMI: {
+ r = kvm_vcpu_ioctl_nmi(vcpu);
+ if (r)
+ goto out;
+ r = 0;
+ break;
+ }
case KVM_SET_CPUID: {
struct kvm_cpuid __user *cpuid_arg = argp;
struct kvm_cpuid cpuid;
@@ -2812,18 +2830,37 @@ static int dm_request_for_irq_injection(struct kvm_vcpu *vcpu,
(kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF));
}
+/*
+ * Check if userspace requested a NMI window, and that the NMI window
+ * is open.
+ *
+ * No need to exit to userspace if we already have a NMI queued.
+ */
+static int dm_request_for_nmi_injection(struct kvm_vcpu *vcpu,
+ struct kvm_run *kvm_run)
+{
+ return (!vcpu->arch.nmi_pending &&
+ kvm_run->request_nmi_window &&
+ vcpu->arch.nmi_window_open);
+}
+
static void post_kvm_run_save(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
kvm_run->if_flag = (kvm_x86_ops->get_rflags(vcpu) & X86_EFLAGS_IF) != 0;
kvm_run->cr8 = kvm_get_cr8(vcpu);
kvm_run->apic_base = kvm_get_apic_base(vcpu);
- if (irqchip_in_kernel(vcpu->kvm))
+ if (irqchip_in_kernel(vcpu->kvm)) {
kvm_run->ready_for_interrupt_injection = 1;
- else
+ kvm_run->ready_for_nmi_injection = 1;
+ } else {
kvm_run->ready_for_interrupt_injection =
(vcpu->arch.interrupt_window_open &&
vcpu->arch.irq_summary == 0);
+ kvm_run->ready_for_nmi_injection =
+ (vcpu->arch.nmi_window_open &&
+ vcpu->arch.nmi_pending == 0);
+ }
}
static void vapic_enter(struct kvm_vcpu *vcpu)
@@ -2999,6 +3036,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
if (r > 0) {
+ if (dm_request_for_nmi_injection(vcpu, kvm_run)) {
+ r = -EINTR;
+ kvm_run->exit_reason = KVM_EXIT_NMI;
+ ++vcpu->stat.request_nmi_exits;
+ }
if (dm_request_for_irq_injection(vcpu, kvm_run)) {
r = -EINTR;
kvm_run->exit_reason = KVM_EXIT_INTR;
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f18b86f..44fd7fa 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -83,18 +83,22 @@ struct kvm_irqchip {
#define KVM_EXIT_S390_SIEIC 13
#define KVM_EXIT_S390_RESET 14
#define KVM_EXIT_DCR 15
+#define KVM_EXIT_NMI 16
+#define KVM_EXIT_NMI_WINDOW_OPEN 17
/* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
struct kvm_run {
/* in */
__u8 request_interrupt_window;
- __u8 padding1[7];
+ __u8 request_nmi_window;
+ __u8 padding1[6];
/* out */
__u32 exit_reason;
__u8 ready_for_interrupt_injection;
__u8 if_flag;
- __u8 padding2[2];
+ __u8 ready_for_nmi_injection;
+ __u8 padding2;
/* in (pre_kvm_run), out (post_kvm_run) */
__u64 cr8;
@@ -387,6 +391,7 @@ struct kvm_trace_rec {
#define KVM_CAP_DEVICE_ASSIGNMENT 17
#endif
#define KVM_CAP_IOMMU 18
+#define KVM_CAP_NMI 19
/*
* ioctls for VM fds
@@ -458,6 +463,8 @@ struct kvm_trace_rec {
#define KVM_S390_INITIAL_RESET _IO(KVMIO, 0x97)
#define KVM_GET_MP_STATE _IOR(KVMIO, 0x98, struct kvm_mp_state)
#define KVM_SET_MP_STATE _IOW(KVMIO, 0x99, struct kvm_mp_state)
+/* Available with KVM_CAP_NMI */
+#define KVM_NMI _IO(KVMIO, 0x9a)
#define KVM_TRC_INJ_VIRQ (KVM_TRC_HANDLER + 0x02)
#define KVM_TRC_REDELIVER_EVT (KVM_TRC_HANDLER + 0x03)
--
1.6.0.3
From: Amit Shah <[email protected]>
get_segment_descritptor_dtable() contains an obvious type.
Signed-off-by: Amit Shah <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f175b79..ceeac88 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3373,9 +3373,9 @@ static void seg_desct_to_kvm_desct(struct desc_struct *seg_desc, u16 selector,
kvm_desct->padding = 0;
}
-static void get_segment_descritptor_dtable(struct kvm_vcpu *vcpu,
- u16 selector,
- struct descriptor_table *dtable)
+static void get_segment_descriptor_dtable(struct kvm_vcpu *vcpu,
+ u16 selector,
+ struct descriptor_table *dtable)
{
if (selector & 1 << 2) {
struct kvm_segment kvm_seg;
@@ -3400,7 +3400,7 @@ static int load_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
struct descriptor_table dtable;
u16 index = selector >> 3;
- get_segment_descritptor_dtable(vcpu, selector, &dtable);
+ get_segment_descriptor_dtable(vcpu, selector, &dtable);
if (dtable.limit < index * 8 + 7) {
kvm_queue_exception_e(vcpu, GP_VECTOR, selector & 0xfffc);
@@ -3419,7 +3419,7 @@ static int save_guest_segment_descriptor(struct kvm_vcpu *vcpu, u16 selector,
struct descriptor_table dtable;
u16 index = selector >> 3;
- get_segment_descritptor_dtable(vcpu, selector, &dtable);
+ get_segment_descriptor_dtable(vcpu, selector, &dtable);
if (dtable.limit < index * 8 + 7)
return 1;
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Ensure that a VCPU with pending NMIs is considered runnable.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1a71f67..1fa9a6d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4130,7 +4130,8 @@ void kvm_arch_flush_shadow(struct kvm *kvm)
int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
return vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE
- || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED;
+ || vcpu->arch.mp_state == KVM_MP_STATE_SIPI_RECEIVED
+ || vcpu->arch.nmi_pending;
}
static void vcpu_kick_intr(void *info)
--
1.6.0.3
From: Jan Kiszka <[email protected]>
do_interrupt_requests and vmx_intr_assist go different way for
achieving the same: enabling the nmi/irq window start notification.
Unify their code over enable_{irq|nmi}_window, get rid of a redundant
call to enable_intr_window instead of direct enable_nmi_window
invocation and unroll enable_intr_window for both in-kernel and user
space irq injection accordingly.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 78 +++++++++++++++++++++------------------------------
1 files changed, 32 insertions(+), 46 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f0866e1..440f56c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2389,30 +2389,42 @@ static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
kvm_queue_interrupt(vcpu, irq);
}
-static void do_interrupt_requests(struct kvm_vcpu *vcpu,
- struct kvm_run *kvm_run)
+static void enable_irq_window(struct kvm_vcpu *vcpu)
{
u32 cpu_based_vm_exec_control;
- vmx_update_window_states(vcpu);
+ cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
- if (vcpu->arch.interrupt_window_open &&
- vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
- kvm_do_inject_irq(vcpu);
+static void enable_nmi_window(struct kvm_vcpu *vcpu)
+{
+ u32 cpu_based_vm_exec_control;
- if (vcpu->arch.interrupt_window_open && vcpu->arch.interrupt.pending)
- vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+ if (!cpu_has_virtual_nmis())
+ return;
cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
+ cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
+ vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+}
+
+static void do_interrupt_requests(struct kvm_vcpu *vcpu,
+ struct kvm_run *kvm_run)
+{
+ vmx_update_window_states(vcpu);
+
+ if (vcpu->arch.interrupt_window_open) {
+ if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
+ kvm_do_inject_irq(vcpu);
+
+ if (vcpu->arch.interrupt.pending)
+ vmx_inject_irq(vcpu, vcpu->arch.interrupt.nr);
+ }
if (!vcpu->arch.interrupt_window_open &&
(vcpu->arch.irq_summary || kvm_run->request_interrupt_window))
- /*
- * Interrupts blocked. Wait for unblock.
- */
- cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
- else
- cpu_based_vm_exec_control &= ~CPU_BASED_VIRTUAL_INTR_PENDING;
- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
+ enable_irq_window(vcpu);
}
static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr)
@@ -3066,35 +3078,6 @@ static void update_tpr_threshold(struct kvm_vcpu *vcpu)
vmcs_write32(TPR_THRESHOLD, (max_irr > tpr) ? tpr >> 4 : max_irr >> 4);
}
-static void enable_irq_window(struct kvm_vcpu *vcpu)
-{
- u32 cpu_based_vm_exec_control;
-
- cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
- cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_INTR_PENDING;
- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
-}
-
-static void enable_nmi_window(struct kvm_vcpu *vcpu)
-{
- u32 cpu_based_vm_exec_control;
-
- if (!cpu_has_virtual_nmis())
- return;
-
- cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL);
- cpu_based_vm_exec_control |= CPU_BASED_VIRTUAL_NMI_PENDING;
- vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
-}
-
-static void enable_intr_window(struct kvm_vcpu *vcpu)
-{
- if (vcpu->arch.nmi_pending)
- enable_nmi_window(vcpu);
- else if (kvm_cpu_has_interrupt(vcpu))
- enable_irq_window(vcpu);
-}
-
static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
{
u32 exit_intr_info;
@@ -3165,13 +3148,16 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
vcpu->arch.nmi_pending = false;
vcpu->arch.nmi_injected = true;
} else {
- enable_intr_window(vcpu);
+ enable_nmi_window(vcpu);
return;
}
}
if (vcpu->arch.nmi_injected) {
vmx_inject_nmi(vcpu);
- enable_intr_window(vcpu);
+ if (vcpu->arch.nmi_pending)
+ enable_nmi_window(vcpu);
+ else if (kvm_cpu_has_interrupt(vcpu))
+ enable_irq_window(vcpu);
return;
}
}
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Fix NMI injection in real-mode with the same pattern we perform IRQ
injection.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 440f56c..38d1385 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2358,6 +2358,19 @@ static void vmx_inject_irq(struct kvm_vcpu *vcpu, int irq)
static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+ if (vcpu->arch.rmode.active) {
+ vmx->rmode.irq.pending = true;
+ vmx->rmode.irq.vector = NMI_VECTOR;
+ vmx->rmode.irq.rip = kvm_rip_read(vcpu);
+ vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
+ NMI_VECTOR | INTR_TYPE_SOFT_INTR |
+ INTR_INFO_VALID_MASK);
+ vmcs_write32(VM_ENTRY_INSTRUCTION_LEN, 1);
+ kvm_rip_write(vcpu, vmx->rmode.irq.rip - 1);
+ return;
+ }
vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
}
--
1.6.0.3
From: Jan Kiszka <[email protected]>
CPU reset invalidates pending or already injected NMIs, therefore reset
the related state variables.
Based on original patch by Gleb Natapov.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f1f8ff2..1a71f67 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3925,6 +3925,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
int kvm_arch_vcpu_reset(struct kvm_vcpu *vcpu)
{
+ vcpu->arch.nmi_pending = false;
+ vcpu->arch.nmi_injected = false;
+
return kvm_x86_ops->vcpu_reset(vcpu);
}
--
1.6.0.3
From: Gleb Natapov <[email protected]>
Call kvm_arch_vcpu_reset() instead of directly using arch callback.
The function does additional things.
Signed-off-by: Gleb Natapov <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0797145..a2c4b55 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3010,7 +3010,7 @@ static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
pr_debug("vcpu %d received sipi with vector # %x\n",
vcpu->vcpu_id, vcpu->arch.sipi_vector);
kvm_lapic_reset(vcpu);
- r = kvm_x86_ops->vcpu_reset(vcpu);
+ r = kvm_arch_vcpu_reset(vcpu);
if (r)
return r;
vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
--
1.6.0.3
From: Jan Kiszka <[email protected]>
Kick the NMI receiving VCPU in case the triggering caller runs in a
different context.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/lapic.c | 1 +
virt/kvm/ioapic.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 206cc11..304f9dd 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -354,6 +354,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
case APIC_DM_NMI:
kvm_inject_nmi(vcpu);
+ kvm_vcpu_kick(vcpu);
break;
case APIC_DM_INIT:
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 53772bb..c8f939c 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -150,6 +150,7 @@ static int ioapic_inj_irq(struct kvm_ioapic *ioapic,
static void ioapic_inj_nmi(struct kvm_vcpu *vcpu)
{
kvm_inject_nmi(vcpu);
+ kvm_vcpu_kick(vcpu);
}
static u32 ioapic_get_delivery_bitmask(struct kvm_ioapic *ioapic, u8 dest,
--
1.6.0.3
From: Jan Kiszka <[email protected]>
There are currently two ways in VMX to check if an IRQ or NMI can be
injected:
- vmx_{nmi|irq}_enabled and
- vcpu.arch.{nmi|interrupt}_window_open.
Even worse, one test (at the end of vmx_vcpu_run) uses an inconsistent,
likely incorrect logic.
This patch consolidates and unifies the tests over
{nmi|interrupt}_window_open as cache + vmx_update_window_states
for updating the cache content.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx.c | 46 +++++++++++++++++---------------------
2 files changed, 22 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8346be8..bfbbdea 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -327,6 +327,7 @@ struct kvm_vcpu_arch {
bool nmi_pending;
bool nmi_injected;
+ bool nmi_window_open;
u64 mtrr[0x100];
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8d0fc68..f0866e1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2362,6 +2362,21 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
}
+static void vmx_update_window_states(struct kvm_vcpu *vcpu)
+{
+ u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
+
+ vcpu->arch.nmi_window_open =
+ !(guest_intr & (GUEST_INTR_STATE_STI |
+ GUEST_INTR_STATE_MOV_SS |
+ GUEST_INTR_STATE_NMI));
+
+ vcpu->arch.interrupt_window_open =
+ ((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
+ !(guest_intr & (GUEST_INTR_STATE_STI |
+ GUEST_INTR_STATE_MOV_SS)));
+}
+
static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
{
int word_index = __ffs(vcpu->arch.irq_summary);
@@ -2374,15 +2389,12 @@ static void kvm_do_inject_irq(struct kvm_vcpu *vcpu)
kvm_queue_interrupt(vcpu, irq);
}
-
static void do_interrupt_requests(struct kvm_vcpu *vcpu,
struct kvm_run *kvm_run)
{
u32 cpu_based_vm_exec_control;
- vcpu->arch.interrupt_window_open =
- ((vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
- (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & 3) == 0);
+ vmx_update_window_states(vcpu);
if (vcpu->arch.interrupt_window_open &&
vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
@@ -3075,22 +3087,6 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
}
-static int vmx_nmi_enabled(struct kvm_vcpu *vcpu)
-{
- u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
- return !(guest_intr & (GUEST_INTR_STATE_NMI |
- GUEST_INTR_STATE_MOV_SS |
- GUEST_INTR_STATE_STI));
-}
-
-static int vmx_irq_enabled(struct kvm_vcpu *vcpu)
-{
- u32 guest_intr = vmcs_read32(GUEST_INTERRUPTIBILITY_INFO);
- return (!(guest_intr & (GUEST_INTR_STATE_MOV_SS |
- GUEST_INTR_STATE_STI)) &&
- (vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF));
-}
-
static void enable_intr_window(struct kvm_vcpu *vcpu)
{
if (vcpu->arch.nmi_pending)
@@ -3159,11 +3155,13 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
{
update_tpr_threshold(vcpu);
+ vmx_update_window_states(vcpu);
+
if (cpu_has_virtual_nmis()) {
if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
if (vcpu->arch.interrupt.pending) {
enable_nmi_window(vcpu);
- } else if (vmx_nmi_enabled(vcpu)) {
+ } else if (vcpu->arch.nmi_window_open) {
vcpu->arch.nmi_pending = false;
vcpu->arch.nmi_injected = true;
} else {
@@ -3178,7 +3176,7 @@ static void vmx_intr_assist(struct kvm_vcpu *vcpu)
}
}
if (!vcpu->arch.interrupt.pending && kvm_cpu_has_interrupt(vcpu)) {
- if (vmx_irq_enabled(vcpu))
+ if (vcpu->arch.interrupt_window_open)
kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu));
else
enable_irq_window(vcpu);
@@ -3339,9 +3337,7 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
if (vmx->rmode.irq.pending)
fixup_rmode_irq(vmx);
- vcpu->arch.interrupt_window_open =
- (vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
- (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)) == 0;
+ vmx_update_window_states(vcpu);
asm("mov %0, %%ds; mov %0, %%es" : : "r"(__USER_DS));
vmx->launched = 1;
--
1.6.0.3
From: Jan Kiszka <[email protected]>
This patch adds the required bits to the VMX side for user space
injected NMIs. As with the preexisting in-kernel irqchip support, the
CPU must provide the "virtual NMI" feature for proper tracking of the
NMI blocking state.
Based on the original patch by Sheng Yang.
Signed-off-by: Jan Kiszka <[email protected]>
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kvm/vmx.c | 33 +++++++++++++++++++++++++++++++++
1 files changed, 33 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 38d1385..f16a62c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2360,6 +2360,7 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ ++vcpu->stat.nmi_injections;
if (vcpu->arch.rmode.active) {
vmx->rmode.irq.pending = true;
vmx->rmode.irq.vector = NMI_VECTOR;
@@ -2428,6 +2429,30 @@ static void do_interrupt_requests(struct kvm_vcpu *vcpu,
{
vmx_update_window_states(vcpu);
+ if (cpu_has_virtual_nmis()) {
+ if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected) {
+ if (vcpu->arch.nmi_window_open) {
+ vcpu->arch.nmi_pending = false;
+ vcpu->arch.nmi_injected = true;
+ } else {
+ enable_nmi_window(vcpu);
+ return;
+ }
+ }
+ if (vcpu->arch.nmi_injected) {
+ vmx_inject_nmi(vcpu);
+ if (vcpu->arch.nmi_pending
+ || kvm_run->request_nmi_window)
+ enable_nmi_window(vcpu);
+ else if (vcpu->arch.irq_summary
+ || kvm_run->request_interrupt_window)
+ enable_irq_window(vcpu);
+ return;
+ }
+ if (!vcpu->arch.nmi_window_open || kvm_run->request_nmi_window)
+ enable_nmi_window(vcpu);
+ }
+
if (vcpu->arch.interrupt_window_open) {
if (vcpu->arch.irq_summary && !vcpu->arch.interrupt.pending)
kvm_do_inject_irq(vcpu);
@@ -2959,6 +2984,14 @@ static int handle_nmi_window(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control);
++vcpu->stat.nmi_window_exits;
+ /*
+ * If the user space waits to inject a NMI, exit as soon as possible
+ */
+ if (kvm_run->request_nmi_window && !vcpu->arch.nmi_pending) {
+ kvm_run->exit_reason = KVM_EXIT_NMI_WINDOW_OPEN;
+ return 0;
+ }
+
return 1;
}
--
1.6.0.3
From: Sheng Yang <[email protected]>
Remove one left improper comment of removed CR2.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/include/asm/kvm_x86_emulate.h | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/kvm_x86_emulate.h b/arch/x86/include/asm/kvm_x86_emulate.h
index 25179a2..16a0026 100644
--- a/arch/x86/include/asm/kvm_x86_emulate.h
+++ b/arch/x86/include/asm/kvm_x86_emulate.h
@@ -146,22 +146,18 @@ struct x86_emulate_ctxt {
/* Register state before/after emulation. */
struct kvm_vcpu *vcpu;
- /* Linear faulting address (if emulating a page-faulting instruction) */
unsigned long eflags;
-
/* Emulated execution mode, represented by an X86EMUL_MODE value. */
int mode;
-
u32 cs_base;
/* decode cache */
-
struct decode_cache decode;
};
/* Repeat String Operation Prefix */
-#define REPE_PREFIX 1
-#define REPNE_PREFIX 2
+#define REPE_PREFIX 1
+#define REPNE_PREFIX 2
/* Execution mode, passed to the emulator. */
#define X86EMUL_MODE_REAL 0 /* Real mode. */
@@ -170,7 +166,7 @@ struct x86_emulate_ctxt {
#define X86EMUL_MODE_PROT64 8 /* 64-bit (long) mode. */
/* Host execution mode. */
-#if defined(__i386__)
+#if defined(CONFIG_X86_32)
#define X86EMUL_MODE_HOST X86EMUL_MODE_PROT32
#elif defined(CONFIG_X86_64)
#define X86EMUL_MODE_HOST X86EMUL_MODE_PROT64
--
1.6.0.3
From: Sheng Yang <[email protected]>
Prepare for exporting them.
Signed-off-by: Sheng Yang <[email protected]>
Signed-off-by: Avi Kivity <[email protected]>
---
arch/x86/kernel/cpu/mtrr/generic.c | 8 ++++----
arch/x86/kernel/cpu/mtrr/main.c | 4 ++--
arch/x86/kernel/cpu/mtrr/mtrr.h | 7 ++++---
3 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/mtrr/generic.c b/arch/x86/kernel/cpu/mtrr/generic.c
index 4e8d77f..90db91e 100644
--- a/arch/x86/kernel/cpu/mtrr/generic.c
+++ b/arch/x86/kernel/cpu/mtrr/generic.c
@@ -14,9 +14,9 @@
#include <asm/pat.h>
#include "mtrr.h"
-struct mtrr_state {
- struct mtrr_var_range var_ranges[MAX_VAR_RANGES];
- mtrr_type fixed_ranges[NUM_FIXED_RANGES];
+struct mtrr_state_type {
+ struct mtrr_var_range var_ranges[MTRR_MAX_VAR_RANGES];
+ mtrr_type fixed_ranges[MTRR_NUM_FIXED_RANGES];
unsigned char enabled;
unsigned char have_fixed;
mtrr_type def_type;
@@ -35,7 +35,7 @@ static struct fixed_range_block fixed_range_blocks[] = {
};
static unsigned long smp_changes_mask;
-static struct mtrr_state mtrr_state = {};
+static struct mtrr_state_type mtrr_state = {};
static int mtrr_state_set;
u64 mtrr_tom2;
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index c78c048..c091d06 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -49,7 +49,7 @@
u32 num_var_ranges = 0;
-unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
static DEFINE_MUTEX(mtrr_mutex);
u64 size_or_mask, size_and_mask;
@@ -574,7 +574,7 @@ struct mtrr_value {
unsigned long lsize;
};
-static struct mtrr_value mtrr_state[MAX_VAR_RANGES];
+static struct mtrr_value mtrr_state[MTRR_MAX_VAR_RANGES];
static int mtrr_save(struct sys_device * sysdev, pm_message_t state)
{
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.h b/arch/x86/kernel/cpu/mtrr/mtrr.h
index 2dc4ec6..9885382 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.h
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.h
@@ -11,8 +11,9 @@
#define MTRRphysBase_MSR(reg) (0x200 + 2 * (reg))
#define MTRRphysMask_MSR(reg) (0x200 + 2 * (reg) + 1)
-#define NUM_FIXED_RANGES 88
-#define MAX_VAR_RANGES 256
+#define MTRR_NUM_FIXED_RANGES 88
+#define MTRR_MAX_VAR_RANGES 256
+
#define MTRRfix64K_00000_MSR 0x250
#define MTRRfix16K_80000_MSR 0x258
#define MTRRfix16K_A0000_MSR 0x259
@@ -33,7 +34,7 @@
an 8 bit field: */
typedef u8 mtrr_type;
-extern unsigned int mtrr_usage_table[MAX_VAR_RANGES];
+extern unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
struct mtrr_ops {
u32 vendor;
--
1.6.0.3
Avi Kivity wrote:
> From: Jan Kiszka <[email protected]>
>
> Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
> injecting NMIs from user space and also extends the statistic report
> accordingly.
>
> Based on the original patch by Sheng Yang.
>
> Signed-off-by: Jan Kiszka <[email protected]>
> Signed-off-by: Sheng Yang <[email protected]>
> Signed-off-by: Avi Kivity <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++++++-
> include/linux/kvm.h | 11 +++++++-
> 3 files changed, 55 insertions(+), 4 deletions(-)
>
...
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index f18b86f..44fd7fa 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -83,18 +83,22 @@ struct kvm_irqchip {
> #define KVM_EXIT_S390_SIEIC 13
> #define KVM_EXIT_S390_RESET 14
> #define KVM_EXIT_DCR 15
> +#define KVM_EXIT_NMI 16
> +#define KVM_EXIT_NMI_WINDOW_OPEN 17
>
> /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
> struct kvm_run {
> /* in */
> __u8 request_interrupt_window;
> - __u8 padding1[7];
> + __u8 request_nmi_window;
> + __u8 padding1[6];
>
> /* out */
> __u32 exit_reason;
> __u8 ready_for_interrupt_injection;
> __u8 if_flag;
> - __u8 padding2[2];
> + __u8 ready_for_nmi_injection;
> + __u8 padding2;
>
> /* in (pre_kvm_run), out (post_kvm_run) */
> __u64 cr8;
Avi, please consider [1] again. You've already merged the related user
space cleanup, and I was just waiting for your comment on how to deal
with the kernel part, ideally _before_ merging the superfluous
ready_for/request NMI interface into mainline.
Thanks,
Jan
[1] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/25016
--
Siemens AG, Corporate Technology, CT SE 26
Corporate Competence Center Embedded Linux
Jan Kiszka wrote:
> Avi Kivity wrote:
>
>> From: Jan Kiszka <[email protected]>
>>
>> Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
>> injecting NMIs from user space and also extends the statistic report
>> accordingly.
>>
> Avi, please consider [1] again. You've already merged the related user
> space cleanup, and I was just waiting for your comment on how to deal
> with the kernel part, ideally _before_ merging the superfluous
> ready_for/request NMI interface into mainline.
>
> Thanks,
> Jan
>
> [1] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/25016
>
>
Do you mean, drop these nmi window bits completely from 2.6.29, keeping
just injection? We can do that without breaking released userspace by
dropping KVM_CAP_NMI and adding KVM_CAP_NMI2 (better name?) that doesn't
have interrupt window support.
This means that userspace versions that supported KVM_CAP_NMI will not
work with 2.6.29, but I think we can live with that.
If this is acceptable, please send a patch that converts KVM_CAP_NMI to
KVM_CAP_NMI2, and a corresponding patch for userspace. I will fold the
kernel patch into the 2.6.29 submission, and apply both patches to the
master branch.
--
error compiling committee.c: too many arguments to function
Avi Kivity wrote:
> Jan Kiszka wrote:
>> Avi Kivity wrote:
>>
>>> From: Jan Kiszka <[email protected]>
>>>
>>> Introduces the KVM_NMI IOCTL to the generic x86 part of KVM for
>>> injecting NMIs from user space and also extends the statistic report
>>> accordingly.
>>>
>> Avi, please consider [1] again. You've already merged the related user
>> space cleanup, and I was just waiting for your comment on how to deal
>> with the kernel part, ideally _before_ merging the superfluous
>> ready_for/request NMI interface into mainline.
>>
>> Thanks,
>> Jan
>>
>> [1] http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/25016
>>
>>
>
> Do you mean, drop these nmi window bits completely from 2.6.29, keeping
> just injection? We can do that without breaking released userspace by
> dropping KVM_CAP_NMI and adding KVM_CAP_NMI2 (better name?) that doesn't
> have interrupt window support.
KVM_CAP_USER_NMI
>
> This means that userspace versions that supported KVM_CAP_NMI will not
> work with 2.6.29, but I think we can live with that.
>
> If this is acceptable, please send a patch that converts KVM_CAP_NMI to
> KVM_CAP_NMI2, and a corresponding patch for userspace. I will fold the
> kernel patch into the 2.6.29 submission, and apply both patches to the
> master branch.
OK, sounds reasonable. I will work out such patches.
Jan
--
Siemens AG, Corporate Technology, CT SE 26
Corporate Competence Center Embedded Linux