From: Borislav Petkov <[email protected]>
Alriiight,
here's another version of the patchset, hopefully addressing all review
feedback from last time. 6/6 is the respective qemu patch to handle
emulated features query, etc.
It is still a lot of fun to generate fast! Atom 32-bit SMP guests like
this:
[ 0.022876] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.033304] smpboot: CPU0: Intel(R) Atom(TM) CPU N270 @ 1.60GHz (fam: 06, model: 1c, stepping: 02)
[ 0.037000] APIC calibration not consistent with PM-Timer: 146ms instead of 100ms
[ 0.037000] APIC delta adjusted to PM-Timer: 6249937 (9125627)
[ 0.037066] Performance Events: unsupported p6 CPU model 28 no PMU driver, software events only.
[ 0.043605] SMP alternatives: lockdep: fixing up alternatives
[ 0.044030] CPU 1 irqstacks, hard=f450c000 soft=f450e000
[ 0.045004] smpboot: Booting Node 0, Processors #1[ 0.004000] Initializing CPU#1
[ 0.004000] Atom PSE erratum detected, BIOS microcode update recommended
[ 0.120290] SMP alternatives: lockdep: fixing up alternatives
[ 0.121007] CPU 2 irqstacks, hard=f451c000 soft=f451e000
[ 0.122003] #2[ 0.004000] Initializing CPU#2
[ 0.004000] Atom PSE erratum detected, BIOS microcode update recommended
...
[ 0.667192] SMP alternatives: lockdep: fixing up alternatives
[ 0.668007] CPU 7 irqstacks, hard=f45b0000 soft=f45b2000
[ 0.669010] #7 OK
[ 0.004000] Initializing CPU#7
[ 0.004000] Atom PSE erratum detected, BIOS microcode update recommended
[ 0.781052] Brought up 8 CPUs
[ 0.781917] smpboot: Total of 8 processors activated (57461.27 BogoMIPS)
LooL :-)
Comments and suggestions appreciated, as always!
Thanks.
Borislav Petkov (5):
kvm: Add KVM_GET_EMULATED_CPUID
kvm, emulator: Use opcode length
kvm, emulator: Rename VendorSpecific flag
kvm, emulator: Add initial three-byte insns support
kvm: Emulate MOVBE
Documentation/virtual/kvm/api.txt | 77 +++++++++++++++++++++++++++++--
arch/x86/include/asm/kvm_emulate.h | 10 ++--
arch/x86/include/uapi/asm/kvm.h | 6 +--
arch/x86/kvm/cpuid.c | 75 +++++++++++++++++++++++++++---
arch/x86/kvm/cpuid.h | 5 +-
arch/x86/kvm/emulate.c | 94 ++++++++++++++++++++++++++++++++++----
arch/x86/kvm/x86.c | 16 ++++---
include/uapi/linux/kvm.h | 2 +
8 files changed, 251 insertions(+), 34 deletions(-)
Borislav Petkov (1):
qemu: Add support for emulated CPU features
include/sysemu/kvm.h | 4 ++++
linux-headers/linux/kvm.h | 4 ++++
target-i386/cpu.c | 7 +++++++
target-i386/kvm.c | 38 ++++++++++++++++++++++++++++++++++----
4 files changed, 49 insertions(+), 4 deletions(-)
--
1.8.4
From: Borislav Petkov <[email protected]>
Add a field to the current emulation context which contains the
instruction opcode length. This will streamline handling of opcodes of
different length.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/kvm_emulate.h | 8 ++++++--
arch/x86/kvm/emulate.c | 5 +++--
arch/x86/kvm/x86.c | 4 ++--
3 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 15f960c06ff7..92a176ad456c 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -279,8 +279,12 @@ struct x86_emulate_ctxt {
bool have_exception;
struct x86_exception exception;
- /* decode cache */
- u8 twobyte;
+ /*
+ * decode cache
+ */
+
+ /* current opcode length in bytes */
+ u8 opcode_len;
u8 b;
u8 intercept;
u8 lock_prefix;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 2bc1e81045b0..bfae4a433091 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4114,6 +4114,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
ctxt->_eip = ctxt->eip;
ctxt->fetch.start = ctxt->_eip;
ctxt->fetch.end = ctxt->fetch.start + insn_len;
+ ctxt->opcode_len = 1;
if (insn_len > 0)
memcpy(ctxt->fetch.data, insn, insn_len);
@@ -4196,7 +4197,7 @@ done_prefixes:
opcode = opcode_table[ctxt->b];
/* Two-byte opcode? */
if (ctxt->b == 0x0f) {
- ctxt->twobyte = 1;
+ ctxt->opcode_len = 2;
ctxt->b = insn_fetch(u8, ctxt);
opcode = twobyte_table[ctxt->b];
}
@@ -4528,7 +4529,7 @@ special_insn:
goto writeback;
}
- if (ctxt->twobyte)
+ if (ctxt->opcode_len == 2)
goto twobyte_insn;
switch (ctxt->b) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8dfde7a52dab..6f2ea40b80cf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4778,8 +4778,8 @@ static void inject_emulated_exception(struct kvm_vcpu *vcpu)
static void init_decode_cache(struct x86_emulate_ctxt *ctxt)
{
- memset(&ctxt->twobyte, 0,
- (void *)&ctxt->_regs - (void *)&ctxt->twobyte);
+ memset(&ctxt->opcode_len, 0,
+ (void *)&ctxt->_regs - (void *)&ctxt->opcode_len);
ctxt->fetch.start = 0;
ctxt->fetch.end = 0;
--
1.8.4
From: Borislav Petkov <[email protected]>
Add initial support for handling three-byte instructions in the
emulator.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kvm/emulate.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 67277bcb377a..72093d76c769 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3880,6 +3880,25 @@ static const struct opcode twobyte_table[256] = {
N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N
};
+static const struct gprefix third_opcode_byte_0xf0 = {
+ N, N, N, N
+};
+
+static const struct gprefix third_opcode_byte_0xf1 = {
+ N, N, N, N
+};
+
+/*
+ * Insns below are selected by the prefix which indexed by the third opcode
+ * byte.
+ */
+static const struct opcode opcode_map_0f_38[256] = {
+ /* 0x00 - 0x7f */
+ X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
+ /* 0x80 - 0xff */
+ X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N)
+};
+
#undef D
#undef N
#undef G
@@ -4200,6 +4219,13 @@ done_prefixes:
ctxt->opcode_len = 2;
ctxt->b = insn_fetch(u8, ctxt);
opcode = twobyte_table[ctxt->b];
+
+ /* 0F_38 opcode map */
+ if (ctxt->b == 0x38) {
+ ctxt->opcode_len = 3;
+ ctxt->b = insn_fetch(u8, ctxt);
+ opcode = opcode_map_0f_38[ctxt->b];
+ }
}
ctxt->d = opcode.flags;
@@ -4531,6 +4557,8 @@ special_insn:
if (ctxt->opcode_len == 2)
goto twobyte_insn;
+ else if (ctxt->opcode_len == 3)
+ goto threebyte_insn;
switch (ctxt->b) {
case 0x63: /* movsxd */
@@ -4715,6 +4743,8 @@ twobyte_insn:
goto cannot_emulate;
}
+threebyte_insn:
+
if (rc != X86EMUL_CONTINUE)
goto done;
--
1.8.4
From: Borislav Petkov <[email protected]>
Add a kvm ioctl which states which system functionality kvm emulates.
The format used is that of CPUID and we return the corresponding CPUID
bits set for which we do emulate functionality.
Make sure ->padding is being passed on clean from userspace so that we
can use it for something in the future, after the ioctl gets cast in
stone.
s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
it.
Signed-off-by: Borislav Petkov <[email protected]>
---
Documentation/virtual/kvm/api.txt | 77 +++++++++++++++++++++++++++++++++++++--
arch/x86/include/uapi/asm/kvm.h | 6 +--
arch/x86/kvm/cpuid.c | 57 ++++++++++++++++++++++++++---
arch/x86/kvm/cpuid.h | 5 ++-
arch/x86/kvm/x86.c | 9 +++--
include/uapi/linux/kvm.h | 2 +
6 files changed, 139 insertions(+), 17 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 858aecf21db2..7b70d670bb28 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1122,9 +1122,9 @@ struct kvm_cpuid2 {
struct kvm_cpuid_entry2 entries[0];
};
-#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
-#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
-#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
+#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
+#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
struct kvm_cpuid_entry2 {
__u32 function;
@@ -2661,6 +2661,77 @@ and usually define the validity of a groups of registers. (e.g. one bit
};
+4.81 KVM_GET_EMULATED_CPUID
+
+Capability: KVM_CAP_EXT_EMUL_CPUID
+Architectures: x86
+Type: system ioctl
+Parameters: struct kvm_cpuid2 (in/out)
+Returns: 0 on success, -1 on error
+
+struct kvm_cpuid2 {
+ __u32 nent;
+ __u32 flags;
+ struct kvm_cpuid_entry2 entries[0];
+};
+
+The member 'flags' is used for passing flags from userspace.
+
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
+#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
+#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
+
+struct kvm_cpuid_entry2 {
+ __u32 function;
+ __u32 index;
+ __u32 flags;
+ __u32 eax;
+ __u32 ebx;
+ __u32 ecx;
+ __u32 edx;
+ __u32 padding[3];
+};
+
+This ioctl returns x86 cpuid features which are emulated by
+kvm.Userspace can use the information returned by this ioctl to query
+which features are emulated by kvm instead of being present natively.
+
+Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
+structure with the 'nent' field indicating the number of entries in
+the variable-size array 'entries'. If the number of entries is too low
+to describe the cpu capabilities, an error (E2BIG) is returned. If the
+number is too high, the 'nent' field is adjusted and an error (ENOMEM)
+is returned. If the number is just right, the 'nent' field is adjusted
+to the number of valid entries in the 'entries' array, which is then
+filled.
+
+The entries returned are the set CPUID bits of the respective features
+which kvm emulates, as returned by the CPUID instruction, with unknown
+or unsupported feature bits cleared.
+
+Features like x2apic, for example, may not be present in the host cpu
+but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
+emulated efficiently and thus not included here.
+
+The fields in each entry are defined as follows:
+
+ function: the eax value used to obtain the entry
+ index: the ecx value used to obtain the entry (for entries that are
+ affected by ecx)
+ flags: an OR of zero or more of the following:
+ KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
+ if the index field is valid
+ KVM_CPUID_FLAG_STATEFUL_FUNC:
+ if cpuid for this function returns different values for successive
+ invocations; there will be several entries with the same function,
+ all with this flag set
+ KVM_CPUID_FLAG_STATE_READ_NEXT:
+ for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
+ the first entry to be read by a cpu
+ eax, ebx, ecx, edx: the values returned by the cpuid instruction for
+ this function/index combination
+
+
6. Capabilities that can be enabled
-----------------------------------
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 5d9a3033b3d7..d3a87780c70b 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -211,9 +211,9 @@ struct kvm_cpuid_entry2 {
__u32 padding[3];
};
-#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
-#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
-#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
+#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
+#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
+#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
/* for KVM_SET_CPUID2 */
struct kvm_cpuid2 {
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b110fe6c03d4..eca357095a49 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -187,8 +187,14 @@ static bool supported_xcr0_bit(unsigned bit)
#define F(x) bit(X86_FEATURE_##x)
-static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
- u32 index, int *nent, int maxnent)
+static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
+ u32 func, u32 index, int *nent, int maxnent)
+{
+ return 0;
+}
+
+static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
+ u32 index, int *nent, int maxnent)
{
int r;
unsigned f_nx = is_efer_nx() ? F(NX) : 0;
@@ -481,6 +487,15 @@ out:
return r;
}
+static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 func,
+ u32 idx, int *nent, int maxnent, unsigned int type)
+{
+ if (type == KVM_GET_EMULATED_CPUID)
+ return __do_cpuid_ent_emulated(entry, func, idx, nent, maxnent);
+
+ return __do_cpuid_ent(entry, func, idx, nent, maxnent);
+}
+
#undef F
struct kvm_cpuid_param {
@@ -495,8 +510,34 @@ static bool is_centaur_cpu(const struct kvm_cpuid_param *param)
return boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR;
}
-int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
- struct kvm_cpuid_entry2 __user *entries)
+static bool sanity_check_entries(struct kvm_cpuid_entry2 __user *entries,
+ __u32 num_entries, unsigned int ioctl_type)
+{
+ int i;
+
+ if (ioctl_type != KVM_GET_EMULATED_CPUID)
+ return false;
+
+ /*
+ * We want to make sure that ->padding is being passed clean from
+ * userspace in case we want to use it for something in the future.
+ *
+ * Sadly, this wasn't enforced for KVM_GET_SUPPORTED_CPUID and so we
+ * have to give ourselves satisfied only with the emulated side. /me
+ * sheds a tear.
+ */
+ for (i = 0; i < num_entries; i++) {
+ if (entries[i].padding[0] ||
+ entries[i].padding[1] ||
+ entries[i].padding[2])
+ return true;
+ }
+ return false;
+}
+
+int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
+ struct kvm_cpuid_entry2 __user *entries,
+ unsigned int type)
{
struct kvm_cpuid_entry2 *cpuid_entries;
int limit, nent = 0, r = -E2BIG, i;
@@ -513,6 +554,10 @@ int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
goto out;
if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
cpuid->nent = KVM_MAX_CPUID_ENTRIES;
+
+ if (sanity_check_entries(entries, cpuid->nent, type))
+ return -EINVAL;
+
r = -ENOMEM;
cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
if (!cpuid_entries)
@@ -526,7 +571,7 @@ int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
continue;
r = do_cpuid_ent(&cpuid_entries[nent], ent->func, ent->idx,
- &nent, cpuid->nent);
+ &nent, cpuid->nent, type);
if (r)
goto out_free;
@@ -537,7 +582,7 @@ int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
limit = cpuid_entries[nent - 1].eax;
for (func = ent->func + 1; func <= limit && nent < cpuid->nent && r == 0; ++func)
r = do_cpuid_ent(&cpuid_entries[nent], func, ent->idx,
- &nent, cpuid->nent);
+ &nent, cpuid->nent, type);
if (r)
goto out_free;
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index b7fd07984888..f1e4895174b2 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -6,8 +6,9 @@
void kvm_update_cpuid(struct kvm_vcpu *vcpu);
struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
u32 function, u32 index);
-int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
- struct kvm_cpuid_entry2 __user *entries);
+int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
+ struct kvm_cpuid_entry2 __user *entries,
+ unsigned int type);
int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
struct kvm_cpuid *cpuid,
struct kvm_cpuid_entry __user *entries);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5ca72a5cdb6..8dfde7a52dab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2564,6 +2564,7 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
case KVM_CAP_SET_TSS_ADDR:
case KVM_CAP_EXT_CPUID:
+ case KVM_CAP_EXT_EMUL_CPUID:
case KVM_CAP_CLOCKSOURCE:
case KVM_CAP_PIT:
case KVM_CAP_NOP_IO_DELAY:
@@ -2673,15 +2674,17 @@ long kvm_arch_dev_ioctl(struct file *filp,
r = 0;
break;
}
- case KVM_GET_SUPPORTED_CPUID: {
+ case KVM_GET_SUPPORTED_CPUID:
+ case KVM_GET_EMULATED_CPUID: {
struct kvm_cpuid2 __user *cpuid_arg = argp;
struct kvm_cpuid2 cpuid;
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
goto out;
- r = kvm_dev_ioctl_get_supported_cpuid(&cpuid,
- cpuid_arg->entries);
+
+ r = kvm_dev_ioctl_get_cpuid(&cpuid, cpuid_arg->entries,
+ ioctl);
if (r)
goto out;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 99c25338ede8..bb986a1674a3 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -541,6 +541,7 @@ struct kvm_ppc_smmu_info {
#define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
#define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
#define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
+#define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
/*
* Extension capability list.
@@ -668,6 +669,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_IRQ_XICS 92
#define KVM_CAP_ARM_EL1_32BIT 93
#define KVM_CAP_SPAPR_MULTITCE 94
+#define KVM_CAP_EXT_EMUL_CPUID 95
#ifdef KVM_CAP_IRQ_ROUTING
--
1.8.4
From: Borislav Petkov <[email protected]>
Add support for the KVM_GET_EMULATED_CPUID ioctl and leave feature bits
enabled, when requested by userspace, if kvm emulates them.
Signed-off-by: Borislav Petkov <[email protected]>
---
include/sysemu/kvm.h | 4 ++++
linux-headers/linux/kvm.h | 4 ++++
target-i386/cpu.c | 7 +++++++
target-i386/kvm.c | 38 ++++++++++++++++++++++++++++++++++----
4 files changed, 49 insertions(+), 4 deletions(-)
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9bbe3db1464e..8eda1ada848a 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -265,6 +265,10 @@ int kvm_check_extension(KVMState *s, unsigned int extension);
uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
uint32_t index, int reg);
+
+uint32_t kvm_arch_get_emulated_cpuid(KVMState *env, uint32_t function,
+ uint32_t index, int reg);
+
void kvm_cpu_synchronize_state(CPUState *cpu);
/* generic hooks - to be moved/refactored once there are more users */
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index c614070662e1..edc8f2db1f8d 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -541,6 +541,7 @@ struct kvm_ppc_smmu_info {
#define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
#define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
#define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
+#define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
/*
* Extension capability list.
@@ -666,6 +667,9 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_IRQ_MPIC 90
#define KVM_CAP_PPC_RTAS 91
#define KVM_CAP_IRQ_XICS 92
+#define KVM_CAP_ARM_EL1_32BIT 93
+#define KVM_CAP_SPAPR_MULTITCE 94
+#define KVM_CAP_EXT_EMUL_CPUID 95
#ifdef KVM_CAP_IRQ_ROUTING
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index c36345e426d7..5406348ceb4a 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -1850,7 +1850,14 @@ static void filter_features_for_kvm(X86CPU *cpu)
wi->cpuid_ecx,
wi->cpuid_reg);
uint32_t requested_features = env->features[w];
+
+ uint32_t emul_features = kvm_arch_get_emulated_cpuid(s, wi->cpuid_eax,
+ wi->cpuid_ecx,
+ wi->cpuid_reg);
+
env->features[w] &= host_feat;
+ env->features[w] |= (requested_features & emul_features);
+
cpu->filtered_features[w] = requested_features & ~env->features[w];
}
}
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 749aa09a21a6..7f598f01bda5 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -80,7 +80,7 @@ bool kvm_allows_irq0_override(void)
return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
}
-static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
+static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int ioctl, int max)
{
struct kvm_cpuid2 *cpuid;
int r, size;
@@ -88,7 +88,7 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
cpuid = (struct kvm_cpuid2 *)g_malloc0(size);
cpuid->nent = max;
- r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
+ r = kvm_ioctl(s, ioctl, cpuid);
if (r == 0 && cpuid->nent >= max) {
r = -E2BIG;
}
@@ -97,7 +97,10 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
g_free(cpuid);
return NULL;
} else {
- fprintf(stderr, "KVM_GET_SUPPORTED_CPUID failed: %s\n",
+ fprintf(stderr, "%s failed: %s\n",
+ (ioctl == (int)KVM_GET_SUPPORTED_CPUID
+ ? "KVM_GET_SUPPORTED_CPUID"
+ : "KVM_GET_EMULATED_CPUID"),
strerror(-r));
exit(1);
}
@@ -112,7 +115,17 @@ static struct kvm_cpuid2 *get_supported_cpuid(KVMState *s)
{
struct kvm_cpuid2 *cpuid;
int max = 1;
- while ((cpuid = try_get_cpuid(s, max)) == NULL) {
+ while ((cpuid = try_get_cpuid(s, KVM_GET_SUPPORTED_CPUID, max)) == NULL) {
+ max *= 2;
+ }
+ return cpuid;
+}
+
+static struct kvm_cpuid2 *get_emulated_cpuid(KVMState *s)
+{
+ struct kvm_cpuid2 *cpuid;
+ int max = 1;
+ while ((cpuid = try_get_cpuid(s, KVM_GET_EMULATED_CPUID, max)) == NULL) {
max *= 2;
}
return cpuid;
@@ -241,6 +254,23 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
return ret;
}
+uint32_t kvm_arch_get_emulated_cpuid(KVMState *s, uint32_t function,
+ uint32_t index, int reg)
+{
+ struct kvm_cpuid2 *cpuid __attribute__((unused));
+
+ if (!kvm_check_extension(s, KVM_CAP_EXT_EMUL_CPUID))
+ return 0;
+
+ cpuid = get_emulated_cpuid(s);
+
+ struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
+ if (entry)
+ return cpuid_entry_get_reg(entry, reg);
+
+ return 0;
+}
+
typedef struct HWPoisonPage {
ram_addr_t ram_addr;
QLIST_ENTRY(HWPoisonPage) list;
--
1.8.4
From: Borislav Petkov <[email protected]>
This basically came from the need to be able to boot 32-bit Atom SMP
guests on an AMD host, i.e. a host which doesn't support MOVBE. As a
matter of fact, qemu has since recently received MOVBE support but we
cannot share that with kvm emulation and thus we have to do this in the
host. We're waay faster in kvm anyway. :-)
So, we piggyback on the #UD path and emulate the MOVBE functionality.
With it, an 8-core SMP guest boots in under 6 seconds.
Also, requesting MOVBE emulation needs to happen explicitly to work,
i.e. qemu -cpu n270,+movbe...
Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+
kernel in kvm executes MOVBE ~60K times.
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-
arch/x86/kvm/emulate.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 66 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index eca357095a49..986cf7cc74f3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -190,6 +190,22 @@ static bool supported_xcr0_bit(unsigned bit)
static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
u32 func, u32 index, int *nent, int maxnent)
{
+ switch (func) {
+ case 0:
+ entry->eax = 1; /* only one leaf currently */
+ ++*nent;
+ break;
+ case 1:
+ entry->ecx = F(MOVBE);
+ ++*nent;
+ break;
+ default:
+ break;
+ }
+
+ entry->function = func;
+ entry->index = index;
+
return 0;
}
@@ -559,7 +575,7 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
return -EINVAL;
r = -ENOMEM;
- cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
+ cpuid_entries = vzalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
if (!cpuid_entries)
goto out;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 72093d76c769..fba4a75d82cf 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2950,6 +2950,46 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}
+#define FFL(x) bit(X86_FEATURE_##x)
+
+static int em_movbe(struct x86_emulate_ctxt *ctxt)
+{
+ u32 ebx, ecx, edx, eax = 1;
+ u16 tmp;
+
+ /*
+ * Check MOVBE is set in the guest-visible CPUID leaf.
+ */
+ ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+ if (!(ecx & FFL(MOVBE)))
+ return emulate_ud(ctxt);
+
+ switch (ctxt->op_bytes) {
+ case 2:
+ /*
+ * From MOVBE definition: "...When the operand size is 16 bits,
+ * the upper word of the destination register remains unchanged
+ * ..."
+ *
+ * Both casting ->valptr and ->val to u16 breaks strict aliasing
+ * rules so we have to do the operation almost per hand.
+ */
+ tmp = (u16)ctxt->src.val;
+ ctxt->dst.val &= ~0xffffUL;
+ ctxt->dst.val |= (unsigned long)swab16(tmp);
+ break;
+ case 4:
+ ctxt->dst.val = swab32((u32)ctxt->src.val);
+ break;
+ case 8:
+ ctxt->dst.val = swab64(ctxt->src.val);
+ break;
+ default:
+ return X86EMUL_PROPAGATE_FAULT;
+ }
+ return X86EMUL_CONTINUE;
+}
+
static int em_cr_write(struct x86_emulate_ctxt *ctxt)
{
if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
@@ -3881,11 +3921,11 @@ static const struct opcode twobyte_table[256] = {
};
static const struct gprefix third_opcode_byte_0xf0 = {
- N, N, N, N
+ I(DstReg | SrcMem | Mov, em_movbe), N, N, N
};
static const struct gprefix third_opcode_byte_0xf1 = {
- N, N, N, N
+ I(DstMem | SrcReg | Mov, em_movbe), N, N, N
};
/*
@@ -3895,8 +3935,13 @@ static const struct gprefix third_opcode_byte_0xf1 = {
static const struct opcode opcode_map_0f_38[256] = {
/* 0x00 - 0x7f */
X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
- /* 0x80 - 0xff */
- X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N)
+ /* 0x80 - 0xef */
+ X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
+ /* 0xf0 - 0xf1 */
+ GP(EmulateOnUD | ModRM | Prefix, &third_opcode_byte_0xf0),
+ GP(EmulateOnUD | ModRM | Prefix, &third_opcode_byte_0xf1),
+ /* 0xf2 - 0xff */
+ N, N, X4(N), X8(N)
};
#undef D
--
1.8.4
From: Borislav Petkov <[email protected]>
Call it EmulateOnUD which is exactly what we're trying to do with
vendor-specific instructions.
Rename ->only_vendor_specific_insn to something shorter, while at it.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/include/asm/kvm_emulate.h | 2 +-
arch/x86/kvm/emulate.c | 14 +++++++-------
arch/x86/kvm/x86.c | 3 +--
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 92a176ad456c..24ec1216596e 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -274,7 +274,7 @@ struct x86_emulate_ctxt {
bool guest_mode; /* guest running a nested guest */
bool perm_ok; /* do not check permissions if true */
- bool only_vendor_specific_insn;
+ bool ud; /* inject an #UD if host doesn't support insn */
bool have_exception;
struct x86_exception exception;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index bfae4a433091..67277bcb377a 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -130,7 +130,7 @@
#define Mov (1<<20)
/* Misc flags */
#define Prot (1<<21) /* instruction generates #UD if not in prot-mode */
-#define VendorSpecific (1<<22) /* Vendor specific instruction */
+#define EmulateOnUD (1<<22) /* Emulate if unsupported by the host */
#define NoAccess (1<<23) /* Don't access memory (lea/invlpg/verr etc) */
#define Op3264 (1<<24) /* Operand is 64b in long mode, 32b otherwise */
#define Undefined (1<<25) /* No Such Instruction */
@@ -3491,7 +3491,7 @@ static const struct opcode group7_rm1[] = {
static const struct opcode group7_rm3[] = {
DIP(SrcNone | Prot | Priv, vmrun, check_svme_pa),
- II(SrcNone | Prot | VendorSpecific, em_vmmcall, vmmcall),
+ II(SrcNone | Prot | EmulateOnUD, em_vmmcall, vmmcall),
DIP(SrcNone | Prot | Priv, vmload, check_svme_pa),
DIP(SrcNone | Prot | Priv, vmsave, check_svme_pa),
DIP(SrcNone | Prot | Priv, stgi, check_svme),
@@ -3576,7 +3576,7 @@ static const struct group_dual group7 = { {
II(SrcMem16 | Mov | Priv, em_lmsw, lmsw),
II(SrcMem | ByteOp | Priv | NoAccess, em_invlpg, invlpg),
}, {
- I(SrcNone | Priv | VendorSpecific, em_vmcall),
+ I(SrcNone | Priv | EmulateOnUD, em_vmcall),
EXT(0, group7_rm1),
N, EXT(0, group7_rm3),
II(SrcNone | DstMem | Mov, em_smsw, smsw), N,
@@ -3798,7 +3798,7 @@ static const struct opcode opcode_table[256] = {
static const struct opcode twobyte_table[256] = {
/* 0x00 - 0x0F */
G(0, group6), GD(0, &group7), N, N,
- N, I(ImplicitOps | VendorSpecific, em_syscall),
+ N, I(ImplicitOps | EmulateOnUD, em_syscall),
II(ImplicitOps | Priv, em_clts, clts), N,
DI(ImplicitOps | Priv, invd), DI(ImplicitOps | Priv, wbinvd), N, N,
N, D(ImplicitOps | ModRM), N, N,
@@ -3818,8 +3818,8 @@ static const struct opcode twobyte_table[256] = {
IIP(ImplicitOps, em_rdtsc, rdtsc, check_rdtsc),
II(ImplicitOps | Priv, em_rdmsr, rdmsr),
IIP(ImplicitOps, em_rdpmc, rdpmc, check_rdpmc),
- I(ImplicitOps | VendorSpecific, em_sysenter),
- I(ImplicitOps | Priv | VendorSpecific, em_sysexit),
+ I(ImplicitOps | EmulateOnUD, em_sysenter),
+ I(ImplicitOps | Priv | EmulateOnUD, em_sysexit),
N, N,
N, N, N, N, N, N, N, N,
/* 0x40 - 0x4F */
@@ -4256,7 +4256,7 @@ done_prefixes:
if (ctxt->d == 0 || (ctxt->d & NotImpl))
return EMULATION_FAILED;
- if (!(ctxt->d & VendorSpecific) && ctxt->only_vendor_specific_insn)
+ if (!(ctxt->d & EmulateOnUD) && ctxt->ud)
return EMULATION_FAILED;
if (mode == X86EMUL_MODE_PROT64 && (ctxt->d & Stack))
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6f2ea40b80cf..0767d9f4c5bd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5097,8 +5097,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
ctxt->have_exception = false;
ctxt->perm_ok = false;
- ctxt->only_vendor_specific_insn
- = emulation_type & EMULTYPE_TRAP_UD;
+ ctxt->ud = emulation_type & EMULTYPE_TRAP_UD;
r = x86_decode_insn(ctxt, insn, insn_len);
--
1.8.4
On Sun, Sep 22, 2013 at 04:44:50PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Add a kvm ioctl which states which system functionality kvm emulates.
> The format used is that of CPUID and we return the corresponding CPUID
> bits set for which we do emulate functionality.
Let me check if I understood the purpose of the new ioctl correctly: the
only reason for GET_EMULATED_CPUID to exist is to allow userspace to
differentiate features that are native or that are emulated efficiently
(GET_SUPPORTED_CPUID) and features that are emulated not very
efficiently (GET_EMULATED_CPUID)?
If that's the case, how do we decide how efficient emulation should be,
to deserve inclusion in GET_SUPPORTED_CPUID? I am guessing that the
criterion will be: if enabling it doesn't risk making performance worse,
it can get in GET_SUPPORTED_CPUID.
>
> Make sure ->padding is being passed on clean from userspace so that we
> can use it for something in the future, after the ioctl gets cast in
> stone.
>
> s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
> it.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> Documentation/virtual/kvm/api.txt | 77 +++++++++++++++++++++++++++++++++++++--
> arch/x86/include/uapi/asm/kvm.h | 6 +--
> arch/x86/kvm/cpuid.c | 57 ++++++++++++++++++++++++++---
> arch/x86/kvm/cpuid.h | 5 ++-
> arch/x86/kvm/x86.c | 9 +++--
> include/uapi/linux/kvm.h | 2 +
> 6 files changed, 139 insertions(+), 17 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 858aecf21db2..7b70d670bb28 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1122,9 +1122,9 @@ struct kvm_cpuid2 {
> struct kvm_cpuid_entry2 entries[0];
> };
>
> -#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
> -#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
> -#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
> +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
> +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
> +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
>
> struct kvm_cpuid_entry2 {
> __u32 function;
> @@ -2661,6 +2661,77 @@ and usually define the validity of a groups of registers. (e.g. one bit
> };
>
>
> +4.81 KVM_GET_EMULATED_CPUID
> +
> +Capability: KVM_CAP_EXT_EMUL_CPUID
> +Architectures: x86
> +Type: system ioctl
> +Parameters: struct kvm_cpuid2 (in/out)
> +Returns: 0 on success, -1 on error
> +
> +struct kvm_cpuid2 {
> + __u32 nent;
> + __u32 flags;
> + struct kvm_cpuid_entry2 entries[0];
> +};
> +
> +The member 'flags' is used for passing flags from userspace.
> +
> +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
> +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
> +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
> +
> +struct kvm_cpuid_entry2 {
> + __u32 function;
> + __u32 index;
> + __u32 flags;
> + __u32 eax;
> + __u32 ebx;
> + __u32 ecx;
> + __u32 edx;
> + __u32 padding[3];
> +};
> +
> +This ioctl returns x86 cpuid features which are emulated by
> +kvm.Userspace can use the information returned by this ioctl to query
> +which features are emulated by kvm instead of being present natively.
> +
> +Userspace invokes KVM_GET_EMULATED_CPUID by passing a kvm_cpuid2
> +structure with the 'nent' field indicating the number of entries in
> +the variable-size array 'entries'. If the number of entries is too low
> +to describe the cpu capabilities, an error (E2BIG) is returned. If the
> +number is too high, the 'nent' field is adjusted and an error (ENOMEM)
> +is returned. If the number is just right, the 'nent' field is adjusted
> +to the number of valid entries in the 'entries' array, which is then
> +filled.
> +
> +The entries returned are the set CPUID bits of the respective features
> +which kvm emulates, as returned by the CPUID instruction, with unknown
> +or unsupported feature bits cleared.
> +
> +Features like x2apic, for example, may not be present in the host cpu
> +but are exposed by kvm in KVM_GET_SUPPORTED_CPUID because they can be
> +emulated efficiently and thus not included here.
> +
> +The fields in each entry are defined as follows:
> +
> + function: the eax value used to obtain the entry
> + index: the ecx value used to obtain the entry (for entries that are
> + affected by ecx)
> + flags: an OR of zero or more of the following:
> + KVM_CPUID_FLAG_SIGNIFCANT_INDEX:
> + if the index field is valid
> + KVM_CPUID_FLAG_STATEFUL_FUNC:
> + if cpuid for this function returns different values for successive
> + invocations; there will be several entries with the same function,
> + all with this flag set
> + KVM_CPUID_FLAG_STATE_READ_NEXT:
> + for KVM_CPUID_FLAG_STATEFUL_FUNC entries, set if this entry is
> + the first entry to be read by a cpu
> + eax, ebx, ecx, edx: the values returned by the cpuid instruction for
> + this function/index combination
> +
> +
> 6. Capabilities that can be enabled
> -----------------------------------
>
> diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
> index 5d9a3033b3d7..d3a87780c70b 100644
> --- a/arch/x86/include/uapi/asm/kvm.h
> +++ b/arch/x86/include/uapi/asm/kvm.h
> @@ -211,9 +211,9 @@ struct kvm_cpuid_entry2 {
> __u32 padding[3];
> };
>
> -#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX 1
> -#define KVM_CPUID_FLAG_STATEFUL_FUNC 2
> -#define KVM_CPUID_FLAG_STATE_READ_NEXT 4
> +#define KVM_CPUID_FLAG_SIGNIFCANT_INDEX BIT(0)
> +#define KVM_CPUID_FLAG_STATEFUL_FUNC BIT(1)
> +#define KVM_CPUID_FLAG_STATE_READ_NEXT BIT(2)
>
> /* for KVM_SET_CPUID2 */
> struct kvm_cpuid2 {
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b110fe6c03d4..eca357095a49 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -187,8 +187,14 @@ static bool supported_xcr0_bit(unsigned bit)
>
> #define F(x) bit(X86_FEATURE_##x)
>
> -static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> - u32 index, int *nent, int maxnent)
> +static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
> + u32 func, u32 index, int *nent, int maxnent)
> +{
> + return 0;
> +}
> +
> +static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> + u32 index, int *nent, int maxnent)
> {
> int r;
> unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> @@ -481,6 +487,15 @@ out:
> return r;
> }
>
> +static int do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 func,
> + u32 idx, int *nent, int maxnent, unsigned int type)
> +{
> + if (type == KVM_GET_EMULATED_CPUID)
> + return __do_cpuid_ent_emulated(entry, func, idx, nent, maxnent);
> +
> + return __do_cpuid_ent(entry, func, idx, nent, maxnent);
> +}
> +
> #undef F
>
> struct kvm_cpuid_param {
> @@ -495,8 +510,34 @@ static bool is_centaur_cpu(const struct kvm_cpuid_param *param)
> return boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR;
> }
>
> -int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
> - struct kvm_cpuid_entry2 __user *entries)
> +static bool sanity_check_entries(struct kvm_cpuid_entry2 __user *entries,
> + __u32 num_entries, unsigned int ioctl_type)
> +{
> + int i;
> +
> + if (ioctl_type != KVM_GET_EMULATED_CPUID)
> + return false;
> +
> + /*
> + * We want to make sure that ->padding is being passed clean from
> + * userspace in case we want to use it for something in the future.
> + *
> + * Sadly, this wasn't enforced for KVM_GET_SUPPORTED_CPUID and so we
> + * have to give ourselves satisfied only with the emulated side. /me
> + * sheds a tear.
> + */
> + for (i = 0; i < num_entries; i++) {
> + if (entries[i].padding[0] ||
> + entries[i].padding[1] ||
> + entries[i].padding[2])
> + return true;
> + }
> + return false;
> +}
> +
> +int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
> + struct kvm_cpuid_entry2 __user *entries,
> + unsigned int type)
> {
> struct kvm_cpuid_entry2 *cpuid_entries;
> int limit, nent = 0, r = -E2BIG, i;
> @@ -513,6 +554,10 @@ int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
> goto out;
> if (cpuid->nent > KVM_MAX_CPUID_ENTRIES)
> cpuid->nent = KVM_MAX_CPUID_ENTRIES;
> +
> + if (sanity_check_entries(entries, cpuid->nent, type))
> + return -EINVAL;
> +
> r = -ENOMEM;
> cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
> if (!cpuid_entries)
> @@ -526,7 +571,7 @@ int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
> continue;
>
> r = do_cpuid_ent(&cpuid_entries[nent], ent->func, ent->idx,
> - &nent, cpuid->nent);
> + &nent, cpuid->nent, type);
>
> if (r)
> goto out_free;
> @@ -537,7 +582,7 @@ int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
> limit = cpuid_entries[nent - 1].eax;
> for (func = ent->func + 1; func <= limit && nent < cpuid->nent && r == 0; ++func)
> r = do_cpuid_ent(&cpuid_entries[nent], func, ent->idx,
> - &nent, cpuid->nent);
> + &nent, cpuid->nent, type);
>
> if (r)
> goto out_free;
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index b7fd07984888..f1e4895174b2 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -6,8 +6,9 @@
> void kvm_update_cpuid(struct kvm_vcpu *vcpu);
> struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
> u32 function, u32 index);
> -int kvm_dev_ioctl_get_supported_cpuid(struct kvm_cpuid2 *cpuid,
> - struct kvm_cpuid_entry2 __user *entries);
> +int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
> + struct kvm_cpuid_entry2 __user *entries,
> + unsigned int type);
> int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu,
> struct kvm_cpuid *cpuid,
> struct kvm_cpuid_entry __user *entries);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e5ca72a5cdb6..8dfde7a52dab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2564,6 +2564,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
> case KVM_CAP_SET_TSS_ADDR:
> case KVM_CAP_EXT_CPUID:
> + case KVM_CAP_EXT_EMUL_CPUID:
> case KVM_CAP_CLOCKSOURCE:
> case KVM_CAP_PIT:
> case KVM_CAP_NOP_IO_DELAY:
> @@ -2673,15 +2674,17 @@ long kvm_arch_dev_ioctl(struct file *filp,
> r = 0;
> break;
> }
> - case KVM_GET_SUPPORTED_CPUID: {
> + case KVM_GET_SUPPORTED_CPUID:
> + case KVM_GET_EMULATED_CPUID: {
> struct kvm_cpuid2 __user *cpuid_arg = argp;
> struct kvm_cpuid2 cpuid;
>
> r = -EFAULT;
> if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
> goto out;
> - r = kvm_dev_ioctl_get_supported_cpuid(&cpuid,
> - cpuid_arg->entries);
> +
> + r = kvm_dev_ioctl_get_cpuid(&cpuid, cpuid_arg->entries,
> + ioctl);
> if (r)
> goto out;
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 99c25338ede8..bb986a1674a3 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -541,6 +541,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
> #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
> #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
> +#define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
>
> /*
> * Extension capability list.
> @@ -668,6 +669,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_IRQ_XICS 92
> #define KVM_CAP_ARM_EL1_32BIT 93
> #define KVM_CAP_SPAPR_MULTITCE 94
> +#define KVM_CAP_EXT_EMUL_CPUID 95
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> --
> 1.8.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Eduardo
(CCing qemu-devel and libvir-list)
On Sun, Sep 22, 2013 at 04:44:55PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Add support for the KVM_GET_EMULATED_CPUID ioctl and leave feature bits
> enabled, when requested by userspace, if kvm emulates them.
>
> Signed-off-by: Borislav Petkov <[email protected]>
This will break some assumptions about the semantics of "-cpu host",
that today means "enable every single feature that can be enabled in the
host". This, in turn, breaks the assumption that "-cpu host" can be used
by libvirt to get information about all CPU features supported by the
host.
So, I have two questions:
1) Should "-cpu host" enable GET_EMULATED_CPUID features? (I believe the
answer is "no");
2) If not, we need a new mechanism to let libvirt know which features
can be enabled on a host (in a way that would allow libvirt to
easily calculate the set of possible destination hosts to which a VM
can be migrated, before deciding to migrate it).
This interface is getting complex enough for me to consider simply
asking libvirt developers to call GET_EMULATED_CPUID and
GET_SUPPORTED_CPUID directly, instead of asking QEMU for information.
But if libvirt does that, it needs a way to know if it's running a QEMU
version that supports GET_EMULATED_CPUID, or an old version that
supports only GET_SUPPORTED_CPUID.
Another alternative is to simply let libvirt ignore GET_EMULATED_CPUID
by now, and treat GET_EMULATED_CPUID features as not supported by the
host. But then it will break the use cases that prompted the creation of
GET_EMULATED_CPUID in the first place.
> ---
> include/sysemu/kvm.h | 4 ++++
> linux-headers/linux/kvm.h | 4 ++++
> target-i386/cpu.c | 7 +++++++
> target-i386/kvm.c | 38 ++++++++++++++++++++++++++++++++++----
> 4 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9bbe3db1464e..8eda1ada848a 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -265,6 +265,10 @@ int kvm_check_extension(KVMState *s, unsigned int extension);
>
> uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
> uint32_t index, int reg);
> +
> +uint32_t kvm_arch_get_emulated_cpuid(KVMState *env, uint32_t function,
> + uint32_t index, int reg);
> +
> void kvm_cpu_synchronize_state(CPUState *cpu);
>
> /* generic hooks - to be moved/refactored once there are more users */
> diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
> index c614070662e1..edc8f2db1f8d 100644
> --- a/linux-headers/linux/kvm.h
> +++ b/linux-headers/linux/kvm.h
> @@ -541,6 +541,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_TRACE_ENABLE __KVM_DEPRECATED_MAIN_W_0x06
> #define KVM_TRACE_PAUSE __KVM_DEPRECATED_MAIN_0x07
> #define KVM_TRACE_DISABLE __KVM_DEPRECATED_MAIN_0x08
> +#define KVM_GET_EMULATED_CPUID _IOWR(KVMIO, 0x09, struct kvm_cpuid2)
>
> /*
> * Extension capability list.
> @@ -666,6 +667,9 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_IRQ_MPIC 90
> #define KVM_CAP_PPC_RTAS 91
> #define KVM_CAP_IRQ_XICS 92
> +#define KVM_CAP_ARM_EL1_32BIT 93
> +#define KVM_CAP_SPAPR_MULTITCE 94
> +#define KVM_CAP_EXT_EMUL_CPUID 95
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index c36345e426d7..5406348ceb4a 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -1850,7 +1850,14 @@ static void filter_features_for_kvm(X86CPU *cpu)
> wi->cpuid_ecx,
> wi->cpuid_reg);
> uint32_t requested_features = env->features[w];
> +
> + uint32_t emul_features = kvm_arch_get_emulated_cpuid(s, wi->cpuid_eax,
> + wi->cpuid_ecx,
> + wi->cpuid_reg);
> +
> env->features[w] &= host_feat;
> + env->features[w] |= (requested_features & emul_features);
> +
I find the above logic confusing: you remove a few bits just to re-add
them later. Isn't easier and simpler to simply do:
env->features[w] &= (host_feat | emul_features);
?
> cpu->filtered_features[w] = requested_features & ~env->features[w];
> }
> }
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 749aa09a21a6..7f598f01bda5 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -80,7 +80,7 @@ bool kvm_allows_irq0_override(void)
> return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
> }
>
> -static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
> +static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int ioctl, int max)
> {
> struct kvm_cpuid2 *cpuid;
> int r, size;
> @@ -88,7 +88,7 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
> size = sizeof(*cpuid) + max * sizeof(*cpuid->entries);
> cpuid = (struct kvm_cpuid2 *)g_malloc0(size);
> cpuid->nent = max;
> - r = kvm_ioctl(s, KVM_GET_SUPPORTED_CPUID, cpuid);
> + r = kvm_ioctl(s, ioctl, cpuid);
> if (r == 0 && cpuid->nent >= max) {
> r = -E2BIG;
> }
> @@ -97,7 +97,10 @@ static struct kvm_cpuid2 *try_get_cpuid(KVMState *s, int max)
> g_free(cpuid);
> return NULL;
> } else {
> - fprintf(stderr, "KVM_GET_SUPPORTED_CPUID failed: %s\n",
> + fprintf(stderr, "%s failed: %s\n",
> + (ioctl == (int)KVM_GET_SUPPORTED_CPUID
> + ? "KVM_GET_SUPPORTED_CPUID"
> + : "KVM_GET_EMULATED_CPUID"),
> strerror(-r));
> exit(1);
> }
> @@ -112,7 +115,17 @@ static struct kvm_cpuid2 *get_supported_cpuid(KVMState *s)
> {
> struct kvm_cpuid2 *cpuid;
> int max = 1;
> - while ((cpuid = try_get_cpuid(s, max)) == NULL) {
> + while ((cpuid = try_get_cpuid(s, KVM_GET_SUPPORTED_CPUID, max)) == NULL) {
> + max *= 2;
> + }
> + return cpuid;
> +}
> +
> +static struct kvm_cpuid2 *get_emulated_cpuid(KVMState *s)
> +{
> + struct kvm_cpuid2 *cpuid;
> + int max = 1;
> + while ((cpuid = try_get_cpuid(s, KVM_GET_EMULATED_CPUID, max)) == NULL) {
> max *= 2;
> }
> return cpuid;
> @@ -241,6 +254,23 @@ uint32_t kvm_arch_get_supported_cpuid(KVMState *s, uint32_t function,
> return ret;
> }
>
> +uint32_t kvm_arch_get_emulated_cpuid(KVMState *s, uint32_t function,
> + uint32_t index, int reg)
> +{
> + struct kvm_cpuid2 *cpuid __attribute__((unused));
> +
> + if (!kvm_check_extension(s, KVM_CAP_EXT_EMUL_CPUID))
> + return 0;
> +
> + cpuid = get_emulated_cpuid(s);
> +
> + struct kvm_cpuid_entry2 *entry = cpuid_find_entry(cpuid, function, index);
> + if (entry)
> + return cpuid_entry_get_reg(entry, reg);
> +
> + return 0;
> +}
> +
> typedef struct HWPoisonPage {
> ram_addr_t ram_addr;
> QLIST_ENTRY(HWPoisonPage) list;
> --
> 1.8.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Eduardo
On Mon, September 23, 2013 6:28 pm, Eduardo Habkost wrote:
> On Sun, Sep 22, 2013 at 04:44:50PM +0200, Borislav Petkov wrote:
>> From: Borislav Petkov <[email protected]>
>>
>> Add a kvm ioctl which states which system functionality kvm emulates.
>> The format used is that of CPUID and we return the corresponding CPUID
>> bits set for which we do emulate functionality.
>
> Let me check if I understood the purpose of the new ioctl correctly: the
> only reason for GET_EMULATED_CPUID to exist is to allow userspace to
> differentiate features that are native or that are emulated efficiently
> (GET_SUPPORTED_CPUID) and features that are emulated not very
> efficiently (GET_EMULATED_CPUID)?
Not only that - emulated features are not reported in CPUID so they
can be enabled only when specifically and explicitly requested, i.e.
"+movbe". Basically, you want to emulate that feature for the guest but
only for this specific guest - the others shouldn't see it.
> If that's the case, how do we decide how efficient emulation should be,
> to deserve inclusion in GET_SUPPORTED_CPUID? I am guessing that the
> criterion will be: if enabling it doesn't risk making performance worse,
> it can get in GET_SUPPORTED_CPUID.
Well, in the MOVBE case, supported means, the host can execute this
instruction natively. Now, you guys say you can emulate x2apic very
efficiently and I'm guessing emulating x2apic doesn't bring any
emulation overhead, thus SUPPORTED_CPUID.
But for single instructions or group of instructions, the distinction
should be very clear.
At least this is how I see it but Gleb probably can comment too.
Thanks.
--
Regards/Gruss,
Boris.
On Tue, Sep 24, 2013 at 11:57:00AM +0200, Borislav Petkov wrote:
> On Mon, September 23, 2013 6:28 pm, Eduardo Habkost wrote:
> > On Sun, Sep 22, 2013 at 04:44:50PM +0200, Borislav Petkov wrote:
> >> From: Borislav Petkov <[email protected]>
> >>
> >> Add a kvm ioctl which states which system functionality kvm emulates.
> >> The format used is that of CPUID and we return the corresponding CPUID
> >> bits set for which we do emulate functionality.
> >
> > Let me check if I understood the purpose of the new ioctl correctly: the
> > only reason for GET_EMULATED_CPUID to exist is to allow userspace to
> > differentiate features that are native or that are emulated efficiently
> > (GET_SUPPORTED_CPUID) and features that are emulated not very
> > efficiently (GET_EMULATED_CPUID)?
>
> Not only that - emulated features are not reported in CPUID so they
> can be enabled only when specifically and explicitly requested, i.e.
> "+movbe". Basically, you want to emulate that feature for the guest but
> only for this specific guest - the others shouldn't see it.
>
> > If that's the case, how do we decide how efficient emulation should be,
> > to deserve inclusion in GET_SUPPORTED_CPUID? I am guessing that the
> > criterion will be: if enabling it doesn't risk making performance worse,
> > it can get in GET_SUPPORTED_CPUID.
>
> Well, in the MOVBE case, supported means, the host can execute this
> instruction natively. Now, you guys say you can emulate x2apic very
> efficiently and I'm guessing emulating x2apic doesn't bring any
> emulation overhead, thus SUPPORTED_CPUID.
x2apic emulation has nothing to do with x2apic in a host. It is emulated
same way no matter if host has it or not. x2apic is not really cpu
feature, but apic one and apic is fully emulated by KVM anyway.
>
> But for single instructions or group of instructions, the distinction
> should be very clear.
>
> At least this is how I see it but Gleb probably can comment too.
>
That's how I see it two. Basically you want to use movbe emulation (as
opposite of virtualization) only if you have binary kernel that compiled
for CPU with movbe (Borislav's use case), or you want to migrate
temporarily from movbe enabled host to non movbe host because downtime
is not an option. We should avoid enabling it "by mistake".
--
Gleb.
On Tue, Sep 24, 2013 at 01:04:14PM +0300, Gleb Natapov wrote:
> On Tue, Sep 24, 2013 at 11:57:00AM +0200, Borislav Petkov wrote:
> > On Mon, September 23, 2013 6:28 pm, Eduardo Habkost wrote:
> > > On Sun, Sep 22, 2013 at 04:44:50PM +0200, Borislav Petkov wrote:
> > >> From: Borislav Petkov <[email protected]>
> > >>
> > >> Add a kvm ioctl which states which system functionality kvm emulates.
> > >> The format used is that of CPUID and we return the corresponding CPUID
> > >> bits set for which we do emulate functionality.
> > >
> > > Let me check if I understood the purpose of the new ioctl correctly: the
> > > only reason for GET_EMULATED_CPUID to exist is to allow userspace to
> > > differentiate features that are native or that are emulated efficiently
> > > (GET_SUPPORTED_CPUID) and features that are emulated not very
> > > efficiently (GET_EMULATED_CPUID)?
> >
> > Not only that - emulated features are not reported in CPUID so they
> > can be enabled only when specifically and explicitly requested, i.e.
> > "+movbe". Basically, you want to emulate that feature for the guest but
> > only for this specific guest - the others shouldn't see it.
Then we may have a problem: some CPU models already have "movbe"
included (e.g. Haswell), and patch 6/6 will make "-cpu Haswell" get
movbe enabled even if it is being emulated.
So if we really want to avoid enabling emulated features by mistake, we
may need a new CPU flag in addition to "enforce" to tell QEMU that it is
OK to enable emulated features (maybe "-cpu ...,emulate"?).
> >
> > > If that's the case, how do we decide how efficient emulation should be,
> > > to deserve inclusion in GET_SUPPORTED_CPUID? I am guessing that the
> > > criterion will be: if enabling it doesn't risk making performance worse,
> > > it can get in GET_SUPPORTED_CPUID.
> >
> > Well, in the MOVBE case, supported means, the host can execute this
> > instruction natively. Now, you guys say you can emulate x2apic very
> > efficiently and I'm guessing emulating x2apic doesn't bring any
> > emulation overhead, thus SUPPORTED_CPUID.
> x2apic emulation has nothing to do with x2apic in a host. It is emulated
> same way no matter if host has it or not. x2apic is not really cpu
> feature, but apic one and apic is fully emulated by KVM anyway.
But my question still stands: suppose we had x2apic emulation
implemented but for some reason it was painfully slow, we wouldn't want
to enable it by mistake. In this case, it would end up on EMULATED_CPUID
and not on SUPPORTED_CPUID, right?
>
> >
> > But for single instructions or group of instructions, the distinction
> > should be very clear.
> >
> > At least this is how I see it but Gleb probably can comment too.
> >
> That's how I see it two. Basically you want to use movbe emulation (as
> opposite of virtualization) only if you have binary kernel that compiled
> for CPU with movbe (Borislav's use case), or you want to migrate
> temporarily from movbe enabled host to non movbe host because downtime
> is not an option. We should avoid enabling it "by mistake".
"we should avoid enabling it 'by mistake'" sounds like a good criterion
for including something on GET_EMULATED_CPUID instead of
GET_SUPPORTED_CPUID.
In that case, I believe QEMU should use GET_EMULATED_CPUID only if
explicitly requested in the configuration/command-line (that's not what
patch 6/6 does).
--
Eduardo
On Thu, Sep 26, 2013 at 11:19:15AM -0300, Eduardo Habkost wrote:
> Then we may have a problem: some CPU models already have "movbe"
> included (e.g. Haswell), and patch 6/6 will make "-cpu Haswell" get
> movbe enabled even if it is being emulated.
Huh? HSW has MOVBE so we won't #UD on it and MOVBE will get executed in
hardware when executing the guest. IOW, we'll never get to the emulation
path of piggybacking on the #UD.
> So if we really want to avoid enabling emulated features by mistake,
> we may need a new CPU flag in addition to "enforce" to tell QEMU that
> it is OK to enable emulated features (maybe "-cpu ...,emulate"?).
EMULATED_CPUID are off by default and only if you request them
specifically, they get enabled. If you start with "-cpu Haswell", MOVBE
will be already set in the host CPUID.
Or am I missing something?
> But my question still stands: suppose we had x2apic emulation
> implemented but for some reason it was painfully slow, we wouldn't
> want to enable it by mistake. In this case, it would end up on
> EMULATED_CPUID and not on SUPPORTED_CPUID, right?
IMHO we want to enable emulation only when explicitly requested...
regardless of the emulation performance.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Thu, Sep 26, 2013 at 08:55:24PM +0200, Borislav Petkov wrote:
> On Thu, Sep 26, 2013 at 11:19:15AM -0300, Eduardo Habkost wrote:
> > Then we may have a problem: some CPU models already have "movbe"
> > included (e.g. Haswell), and patch 6/6 will make "-cpu Haswell" get
> > movbe enabled even if it is being emulated.
>
> Huh? HSW has MOVBE so we won't #UD on it and MOVBE will get executed in
> hardware when executing the guest. IOW, we'll never get to the emulation
> path of piggybacking on the #UD.
>
> > So if we really want to avoid enabling emulated features by mistake,
> > we may need a new CPU flag in addition to "enforce" to tell QEMU that
> > it is OK to enable emulated features (maybe "-cpu ...,emulate"?).
>
> EMULATED_CPUID are off by default and only if you request them
> specifically, they get enabled.
Please point me to the code that does this, because I don't see it on
patch 6/6.
> If you start with "-cpu Haswell", MOVBE
> will be already set in the host CPUID.
>
> Or am I missing something?
In the Haswell example, it is unlikely but possible in theory: you would
need a CPU that supported all features from Haswell except movbe. But
what will happen if you are using "-cpu n270,enforce" on a SandyBridge
host?
Also, we don't know anything about future CPUs or future features that
will end up on EMULATED_CPUID. The current code doesn't have anything to
differentiate features that were already included in the CPU definition
and ones explicitly enabled in the command-line (and I would like to
keep it that way).
And just because a feature was explicitly enabled in the command-line,
that doesn't mean the user believe it is acceptable to get it running in
emulated mode. That's why I propose a new "emulate" flag, to allow
features to be enabled in emulated mode.
>
> > But my question still stands: suppose we had x2apic emulation
> > implemented but for some reason it was painfully slow, we wouldn't
> > want to enable it by mistake. In this case, it would end up on
> > EMULATED_CPUID and not on SUPPORTED_CPUID, right?
>
> IMHO we want to enable emulation only when explicitly requested...
> regardless of the emulation performance.
Well, x2apic is emulated by KVM, and it is on SUPPORTED_CPUID. Ditto for
tsc-deadline. Or are you talking specifically about instruction
emulation?
--
Eduardo
On Thu, Sep 26, 2013 at 04:20:59PM -0300, Eduardo Habkost wrote:
> Please point me to the code that does this, because I don't see it on
> patch 6/6.
@@ -1850,7 +1850,14 @@ static void filter_features_for_kvm(X86CPU *cpu)
wi->cpuid_ecx,
wi->cpuid_reg);
uint32_t requested_features = env->features[w];
+
+ uint32_t emul_features = kvm_arch_get_emulated_cpuid(s, wi->cpuid_eax,
+ wi->cpuid_ecx,
+ wi->cpuid_reg);
+
env->features[w] &= host_feat;
+ env->features[w] |= (requested_features & emul_features);
Basically we give the requested_features a second chance here.
If we don't request an emulated feature, it won't get enabled.
> > If you start with "-cpu Haswell", MOVBE
> > will be already set in the host CPUID.
> >
> > Or am I missing something?
>
> In the Haswell example, it is unlikely but possible in theory: you would
> need a CPU that supported all features from Haswell except movbe. But
> what will happen if you are using "-cpu n270,enforce" on a SandyBridge
> host?
That's an interesting question: AFAICT, it will fail because MOVBE is
not available on the host, right?
And if so, then this is correct behavior IMHO, or how exactly is the
"enforce" thing supposed to work? Enforce host CPUID?
> Also, we don't know anything about future CPUs or future features
> that will end up on EMULATED_CPUID. The current code doesn't have
> anything to differentiate features that were already included in the
> CPU definition and ones explicitly enabled in the command-line (and I
> would like to keep it that way).
Ok.
> And just because a feature was explicitly enabled in the command-line,
> that doesn't mean the user believe it is acceptable to get it running
> in emulated mode. That's why I propose a new "emulate" flag, to allow
> features to be enabled in emulated mode.
And I think, saying "-cpu ...,+movbe" is an explicit statement enough to
say that yes, I am starting this guest and I want MOVBE emulation.
> Well, x2apic is emulated by KVM, and it is on SUPPORTED_CPUID. Ditto
> for tsc-deadline. Or are you talking specifically about instruction
> emulation?
Basically, I'm viewing this from a very practical standpoint - if I
build a kernel which requires MOVBE support but I cannot boot it in kvm
because it doesn't emulate MOVBE (TCG does now but it didn't before)
I'd like to be able to address that shortcoming by emulating that
instruction, if possible.
And the whole discussion grew out from the standpoint of being able to
emulate stuff so that you can do quick and dirty booting of kernels but
not show that emulation capability to the wide audience since it is slow
and it shouldn't be used and then migration has issues, etc, etc.
But hey, I don't really care all that much if I have to also say
-emulate in order to get my functionality.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Thu, Sep 26, 2013 at 10:32:06PM +0200, Borislav Petkov wrote:
> On Thu, Sep 26, 2013 at 04:20:59PM -0300, Eduardo Habkost wrote:
> > Please point me to the code that does this, because I don't see it on
> > patch 6/6.
>
> @@ -1850,7 +1850,14 @@ static void filter_features_for_kvm(X86CPU *cpu)
> wi->cpuid_ecx,
> wi->cpuid_reg);
> uint32_t requested_features = env->features[w];
> +
> + uint32_t emul_features = kvm_arch_get_emulated_cpuid(s, wi->cpuid_eax,
> + wi->cpuid_ecx,
> + wi->cpuid_reg);
> +
> env->features[w] &= host_feat;
> + env->features[w] |= (requested_features & emul_features);
>
> Basically we give the requested_features a second chance here.
>
> If we don't request an emulated feature, it won't get enabled.
The problem here is that "requested_features" doesn't include just the
explicit "+flag" flags, but any flag included in the CPU model
definition. See the "-cpu n270" example below.
>
> > > If you start with "-cpu Haswell", MOVBE
> > > will be already set in the host CPUID.
> > >
> > > Or am I missing something?
> >
> > In the Haswell example, it is unlikely but possible in theory: you would
> > need a CPU that supported all features from Haswell except movbe. But
> > what will happen if you are using "-cpu n270,enforce" on a SandyBridge
> > host?
>
> That's an interesting question: AFAICT, it will fail because MOVBE is
> not available on the host, right?
It should, but your patch will make it stop failing because of MOVBE, as
now it can be emulated[1].
>
> And if so, then this is correct behavior IMHO, or how exactly is the
> "enforce" thing supposed to work? Enforce host CPUID?
"enforce" makes sure all features are really being enabled. It makes
QEMU abort if there's any feature that can't be enabled on that host.
[1] Maybe one source of confusion is that the existing code have two
feature-filtering functions doing basically the same thing:
filter_features_for_kvm() and kvm_check_features_against_host(). That's
something we must clean up, and they should be unified. "enforce" should
become synonymous to "make sure filtered_features is all zeroes". This
way, libvirt can emulate what 'enforce" does while being able to collect
detailed error information (which is not easy to do if QEMU simply
aborts).
>
> > Also, we don't know anything about future CPUs or future features
> > that will end up on EMULATED_CPUID. The current code doesn't have
> > anything to differentiate features that were already included in the
> > CPU definition and ones explicitly enabled in the command-line (and I
> > would like to keep it that way).
>
> Ok.
>
> > And just because a feature was explicitly enabled in the command-line,
> > that doesn't mean the user believe it is acceptable to get it running
> > in emulated mode. That's why I propose a new "emulate" flag, to allow
> > features to be enabled in emulated mode.
>
> And I think, saying "-cpu ...,+movbe" is an explicit statement enough to
> say that yes, I am starting this guest and I want MOVBE emulation.
Not necessarily. libvirt has some code that will translate its own CPU
model definition to a "-cpu Model,+flag,+flag,+flag,-flag" command-line
when necessary. It is by design that there is no difference between
explicit "+flag" options and existing flags from the CPU model
definition.
>
> > Well, x2apic is emulated by KVM, and it is on SUPPORTED_CPUID. Ditto
> > for tsc-deadline. Or are you talking specifically about instruction
> > emulation?
>
> Basically, I'm viewing this from a very practical standpoint - if I
> build a kernel which requires MOVBE support but I cannot boot it in kvm
> because it doesn't emulate MOVBE (TCG does now but it didn't before)
> I'd like to be able to address that shortcoming by emulating that
> instruction, if possible.
>
> And the whole discussion grew out from the standpoint of being able to
> emulate stuff so that you can do quick and dirty booting of kernels but
> not show that emulation capability to the wide audience since it is slow
> and it shouldn't be used and then migration has issues, etc, etc.
>
> But hey, I don't really care all that much if I have to also say
> -emulate in order to get my functionality.
OK, I undestand your use case, now. Thanks for your explanation.
--
Eduardo
On Fri, Sep 27, 2013 at 11:21:34AM -0300, Eduardo Habkost wrote:
> The problem here is that "requested_features" doesn't include just
> the explicit "+flag" flags, but any flag included in the CPU model
> definition. See the "-cpu n270" example below.
Oh, you mean if requested_features would contain a flag included from
the CPU model definition - a flag which we haven't requested explicitly
- and if kvm emulates that flag, then it will get enabled?
Hmm.
> It should, but your patch will make it stop failing because of MOVBE, as
> now it can be emulated[1].
Right.
> "enforce" makes sure all features are really being enabled. It makes
> QEMU abort if there's any feature that can't be enabled on that host.
Ok.
> [1] Maybe one source of confusion is that the existing code have two
> feature-filtering functions doing basically the same thing:
> filter_features_for_kvm() and kvm_check_features_against_host(). That's
Yes, and the first gets executed unconditionally and does the feature
filtering, right after the second has run in the kvm_enabled() branch.
> something we must clean up, and they should be unified. "enforce" should
> become synonymous to "make sure filtered_features is all zeroes". This
> way, libvirt can emulate what 'enforce" does while being able to collect
> detailed error information (which is not easy to do if QEMU simply
> aborts).
Ok, maybe someone who's more knowledgeable with this code should do it -
not me :)
Also, there's another aspect, while we're here: now that QEMU emulates
MOVBE with TCG too, how do we specify on the command line, which
emulation should be used - kvm.ko or QEMU?
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Sat, Sep 28, 2013 at 12:49:04PM +0200, Borislav Petkov wrote:
> On Fri, Sep 27, 2013 at 11:21:34AM -0300, Eduardo Habkost wrote:
> > The problem here is that "requested_features" doesn't include just
> > the explicit "+flag" flags, but any flag included in the CPU model
> > definition. See the "-cpu n270" example below.
>
> Oh, you mean if requested_features would contain a flag included from
> the CPU model definition - a flag which we haven't requested explicitly
> - and if kvm emulates that flag, then it will get enabled?
Exactly. The code needs to filter/check all feature bits on the CPU, not
just the ones requested explicitly in the command-line.
[...]
> > [1] Maybe one source of confusion is that the existing code have two
> > feature-filtering functions doing basically the same thing:
> > filter_features_for_kvm() and kvm_check_features_against_host(). That's
>
> Yes, and the first gets executed unconditionally and does the feature
> filtering, right after the second has run in the kvm_enabled() branch.
This should be fixed, too: eventually "enforce" should work on TCG mode
as well.
>
> > something we must clean up, and they should be unified. "enforce" should
> > become synonymous to "make sure filtered_features is all zeroes". This
> > way, libvirt can emulate what 'enforce" does while being able to collect
> > detailed error information (which is not easy to do if QEMU simply
> > aborts).
>
> Ok, maybe someone who's more knowledgeable with this code should do it -
> not me :)
I have added it to my TODO-list. :-)
>
> Also, there's another aspect, while we're here: now that QEMU emulates
> MOVBE with TCG too, how do we specify on the command line, which
> emulation should be used - kvm.ko or QEMU?
You can use accel={tcg,kvm} option on the "-machine" argument, e.g.
"-machine pc,accel=kvm". Or the "-enable-kvm" option.
--
Eduardo
On Mon, Sep 30, 2013 at 01:13:34PM -0300, Eduardo Habkost wrote:
> I have added it to my TODO-list. :-)
Cool, thanks. Let me know if I can test stuff and help out somehow.
> >
> > Also, there's another aspect, while we're here: now that QEMU emulates
> > MOVBE with TCG too, how do we specify on the command line, which
> > emulation should be used - kvm.ko or QEMU?
>
> You can use accel={tcg,kvm} option on the "-machine" argument, e.g.
> "-machine pc,accel=kvm". Or the "-enable-kvm" option.
Ah, ok.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Sun, Sep 22, 2013 at 04:44:53PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Add initial support for handling three-byte instructions in the
> emulator.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 67277bcb377a..72093d76c769 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -3880,6 +3880,25 @@ static const struct opcode twobyte_table[256] = {
> N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N
> };
>
> +static const struct gprefix third_opcode_byte_0xf0 = {
> + N, N, N, N
> +};
> +
> +static const struct gprefix third_opcode_byte_0xf1 = {
> + N, N, N, N
> +};
There are two three opcode tables, so third_opcode_byte is ambiguous.
What about pfx_0f_38_f0 and pfx_0f_38_f1?
> +
> +/*
> + * Insns below are selected by the prefix which indexed by the third opcode
> + * byte.
> + */
> +static const struct opcode opcode_map_0f_38[256] = {
> + /* 0x00 - 0x7f */
> + X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> + /* 0x80 - 0xff */
> + X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N)
> +};
> +
> #undef D
> #undef N
> #undef G
> @@ -4200,6 +4219,13 @@ done_prefixes:
> ctxt->opcode_len = 2;
> ctxt->b = insn_fetch(u8, ctxt);
> opcode = twobyte_table[ctxt->b];
> +
> + /* 0F_38 opcode map */
> + if (ctxt->b == 0x38) {
> + ctxt->opcode_len = 3;
> + ctxt->b = insn_fetch(u8, ctxt);
> + opcode = opcode_map_0f_38[ctxt->b];
> + }
> }
> ctxt->d = opcode.flags;
>
> @@ -4531,6 +4557,8 @@ special_insn:
>
> if (ctxt->opcode_len == 2)
> goto twobyte_insn;
> + else if (ctxt->opcode_len == 3)
> + goto threebyte_insn;
>
> switch (ctxt->b) {
> case 0x63: /* movsxd */
> @@ -4715,6 +4743,8 @@ twobyte_insn:
> goto cannot_emulate;
> }
>
> +threebyte_insn:
> +
> if (rc != X86EMUL_CONTINUE)
> goto done;
>
> --
> 1.8.4
--
Gleb.
On Sun, Sep 22, 2013 at 04:44:49PM +0200, Borislav Petkov wrote:
> From: Borislav Petkov <[email protected]>
>
> Alriiight,
>
> here's another version of the patchset, hopefully addressing all review
> feedback from last time. 6/6 is the respective qemu patch to handle
> emulated features query, etc.
>
Except a small nitpick for patch 4 this looks good to me. No need to
wait for QEMU cpu parsing code to change to apply it.
--
Gleb.
On Tue, Oct 29, 2013 at 11:50:43AM +0200, Gleb Natapov wrote:
> There are two three opcode tables, so third_opcode_byte is ambiguous.
Actually there's also 0F_3A and there are also other prefixes besides f0
and f1. Oh, and those tables are not completely full so I can imagine
more stuff coming in later....
I know what you're thinking by now, btw :-)
> What about pfx_0f_38_f0 and pfx_0f_38_f1?
Yeah, those make it much more explicit.
I wanted to keep the "three_byte" in the name in there somewhere,
though, so that it is clear that we're dealing with three byte opcodes
instead of requiring the onlooking innocent person to know the opcodes.
How about:
three_byte_0f_38_f0
three_byte_0f_38_f1
three_byte_0f_3a_50
...
Last one is an example only.
Btw, we might want to reconsider that whole tabular representation when
more stuff needs to be added...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Tue, Oct 29, 2013 at 11:04:57AM +0100, Borislav Petkov wrote:
> On Tue, Oct 29, 2013 at 11:50:43AM +0200, Gleb Natapov wrote:
> > There are two three opcode tables, so third_opcode_byte is ambiguous.
>
> Actually there's also 0F_3A and there are also other prefixes besides f0
> and f1. Oh, and those tables are not completely full so I can imagine
> more stuff coming in later....
>
> I know what you're thinking by now, btw :-)
>
:)
> > What about pfx_0f_38_f0 and pfx_0f_38_f1?
>
> Yeah, those make it much more explicit.
>
> I wanted to keep the "three_byte" in the name in there somewhere,
> though, so that it is clear that we're dealing with three byte opcodes
> instead of requiring the onlooking innocent person to know the opcodes.
>
> How about:
>
> three_byte_0f_38_f0
> three_byte_0f_38_f1
> three_byte_0f_3a_50
> ...
>
> Last one is an example only.
>
Looks OK to me.
> Btw, we might want to reconsider that whole tabular representation when
> more stuff needs to be added...
>
Of course. When tables will start to show their limitation we can always
change to something else.
--
Gleb.
On Tue, Oct 29, 2013 at 11:53:00AM +0200, Gleb Natapov wrote:
> Except a small nitpick for patch 4 this looks good to me. No need to
> wait for QEMU cpu parsing code to change to apply it.
Ok, want a resend or rather a pull request or only patch 4 resend?
Btw, we kinda missed each other in Edinburgh to exchange fingerprints
for pull request :-\
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Tue, Oct 29, 2013 at 11:30:03AM +0100, Borislav Petkov wrote:
> On Tue, Oct 29, 2013 at 11:53:00AM +0200, Gleb Natapov wrote:
> > Except a small nitpick for patch 4 this looks good to me. No need to
> > wait for QEMU cpu parsing code to change to apply it.
>
> Ok, want a resend or rather a pull request or only patch 4 resend?
>
If everything still applies patch 4 resend should be fine.
> Btw, we kinda missed each other in Edinburgh to exchange fingerprints
> for pull request :-\
>
Yeah, kvm forum was not at the same building as LinuxCon :(
--
Gleb.
On Tue, Oct 29, 2013 at 12:35:24PM +0200, Gleb Natapov wrote:
> If everything still applies patch 4 resend should be fine.
Which tree should they still apply to?
> Yeah, kvm forum was not at the same building as LinuxCon :(
Yep, I know. We did go to the kvm forum to use the wifi there because
the wifi at the conf was awful, as it is always the case at such
high-count events :)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Tue, Oct 29, 2013 at 12:28:24PM +0100, Borislav Petkov wrote:
> On Tue, Oct 29, 2013 at 12:35:24PM +0200, Gleb Natapov wrote:
> > If everything still applies patch 4 resend should be fine.
>
> Which tree should they still apply to?
>
git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue branch.
> > Yeah, kvm forum was not at the same building as LinuxCon :(
>
> Yep, I know. We did go to the kvm forum to use the wifi there because
> the wifi at the conf was awful, as it is always the case at such
> high-count events :)
>
That's why we got "max users limit is reached" error! (only for the
first half a day though) :)
--
Gleb.
On Tue, Oct 29, 2013 at 01:36:03PM +0200, Gleb Natapov wrote:
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git queue branch.
Ok, they still apply.
I'm sending 4 and 5 (5 needed changes too) as a reply to this message.
> That's why we got "max users limit is reached" error! (only for the
> first half a day though) :)
Ha!
You could use that for the next kvm forum: "Come to the kvm forum - if
nothing else, we have better wifi!"
:-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
From: Borislav Petkov <[email protected]>
Add initial support for handling three-byte instructions in the
emulator.
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kvm/emulate.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 541801527225..6c5cfe962b28 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3892,6 +3892,25 @@ static const struct opcode twobyte_table[256] = {
N, N, N, N, N, N, N, N, N, N, N, N, N, N, N, N
};
+static const struct gprefix three_byte_0f_38_f0 = {
+ N, N, N, N
+};
+
+static const struct gprefix three_byte_0f_38_f1 = {
+ N, N, N, N
+};
+
+/*
+ * Insns below are selected by the prefix which indexed by the third opcode
+ * byte.
+ */
+static const struct opcode opcode_map_0f_38[256] = {
+ /* 0x00 - 0x7f */
+ X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
+ /* 0x80 - 0xff */
+ X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N)
+};
+
#undef D
#undef N
#undef G
@@ -4212,6 +4231,13 @@ done_prefixes:
ctxt->opcode_len = 2;
ctxt->b = insn_fetch(u8, ctxt);
opcode = twobyte_table[ctxt->b];
+
+ /* 0F_38 opcode map */
+ if (ctxt->b == 0x38) {
+ ctxt->opcode_len = 3;
+ ctxt->b = insn_fetch(u8, ctxt);
+ opcode = opcode_map_0f_38[ctxt->b];
+ }
}
ctxt->d = opcode.flags;
@@ -4543,6 +4569,8 @@ special_insn:
if (ctxt->opcode_len == 2)
goto twobyte_insn;
+ else if (ctxt->opcode_len == 3)
+ goto threebyte_insn;
switch (ctxt->b) {
case 0x63: /* movsxd */
@@ -4727,6 +4755,8 @@ twobyte_insn:
goto cannot_emulate;
}
+threebyte_insn:
+
if (rc != X86EMUL_CONTINUE)
goto done;
--
1.8.4
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
From: Borislav Petkov <[email protected]>
This basically came from the need to be able to boot 32-bit Atom SMP
guests on an AMD host, i.e. a host which doesn't support MOVBE. As a
matter of fact, qemu has since recently received MOVBE support but we
cannot share that with kvm emulation and thus we have to do this in the
host. We're waay faster in kvm anyway. :-)
So, we piggyback on the #UD path and emulate the MOVBE functionality.
With it, an 8-core SMP guest boots in under 6 seconds.
Also, requesting MOVBE emulation needs to happen explicitly to work,
i.e. qemu -cpu n270,+movbe...
Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+
kernel in kvm executes MOVBE ~60K times.
Signed-off-by: Andre Przywara <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
---
arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-
arch/x86/kvm/emulate.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
2 files changed, 66 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 78a4439bfdc5..86d5756dda07 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -222,6 +222,22 @@ static bool supported_xcr0_bit(unsigned bit)
static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
u32 func, u32 index, int *nent, int maxnent)
{
+ switch (func) {
+ case 0:
+ entry->eax = 1; /* only one leaf currently */
+ ++*nent;
+ break;
+ case 1:
+ entry->ecx = F(MOVBE);
+ ++*nent;
+ break;
+ default:
+ break;
+ }
+
+ entry->function = func;
+ entry->index = index;
+
return 0;
}
@@ -593,7 +609,7 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
return -EINVAL;
r = -ENOMEM;
- cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
+ cpuid_entries = vzalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
if (!cpuid_entries)
goto out;
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 6c5cfe962b28..8e2a07bd8eac 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2961,6 +2961,46 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}
+#define FFL(x) bit(X86_FEATURE_##x)
+
+static int em_movbe(struct x86_emulate_ctxt *ctxt)
+{
+ u32 ebx, ecx, edx, eax = 1;
+ u16 tmp;
+
+ /*
+ * Check MOVBE is set in the guest-visible CPUID leaf.
+ */
+ ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
+ if (!(ecx & FFL(MOVBE)))
+ return emulate_ud(ctxt);
+
+ switch (ctxt->op_bytes) {
+ case 2:
+ /*
+ * From MOVBE definition: "...When the operand size is 16 bits,
+ * the upper word of the destination register remains unchanged
+ * ..."
+ *
+ * Both casting ->valptr and ->val to u16 breaks strict aliasing
+ * rules so we have to do the operation almost per hand.
+ */
+ tmp = (u16)ctxt->src.val;
+ ctxt->dst.val &= ~0xffffUL;
+ ctxt->dst.val |= (unsigned long)swab16(tmp);
+ break;
+ case 4:
+ ctxt->dst.val = swab32((u32)ctxt->src.val);
+ break;
+ case 8:
+ ctxt->dst.val = swab64(ctxt->src.val);
+ break;
+ default:
+ return X86EMUL_PROPAGATE_FAULT;
+ }
+ return X86EMUL_CONTINUE;
+}
+
static int em_cr_write(struct x86_emulate_ctxt *ctxt)
{
if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
@@ -3893,11 +3933,11 @@ static const struct opcode twobyte_table[256] = {
};
static const struct gprefix three_byte_0f_38_f0 = {
- N, N, N, N
+ I(DstReg | SrcMem | Mov, em_movbe), N, N, N
};
static const struct gprefix three_byte_0f_38_f1 = {
- N, N, N, N
+ I(DstMem | SrcReg | Mov, em_movbe), N, N, N
};
/*
@@ -3907,8 +3947,13 @@ static const struct gprefix three_byte_0f_38_f1 = {
static const struct opcode opcode_map_0f_38[256] = {
/* 0x00 - 0x7f */
X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
- /* 0x80 - 0xff */
- X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N)
+ /* 0x80 - 0xef */
+ X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
+ /* 0xf0 - 0xf1 */
+ GP(EmulateOnUD | ModRM | Prefix, &three_byte_0f_38_f0),
+ GP(EmulateOnUD | ModRM | Prefix, &three_byte_0f_38_f1),
+ /* 0xf2 - 0xff */
+ N, N, X4(N), X8(N)
};
#undef D
--
1.8.4
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
Il 29/10/2013 12:54, Borislav Petkov ha scritto:
> From: Borislav Petkov <[email protected]>
>
> This basically came from the need to be able to boot 32-bit Atom SMP
> guests on an AMD host, i.e. a host which doesn't support MOVBE. As a
> matter of fact, qemu has since recently received MOVBE support but we
> cannot share that with kvm emulation and thus we have to do this in the
> host. We're waay faster in kvm anyway. :-)
>
> So, we piggyback on the #UD path and emulate the MOVBE functionality.
> With it, an 8-core SMP guest boots in under 6 seconds.
>
> Also, requesting MOVBE emulation needs to happen explicitly to work,
> i.e. qemu -cpu n270,+movbe...
>
> Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+
> kernel in kvm executes MOVBE ~60K times.
>
> Signed-off-by: Andre Przywara <[email protected]>
> Signed-off-by: Borislav Petkov <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 18 ++++++++++++++++-
> arch/x86/kvm/emulate.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++----
> 2 files changed, 66 insertions(+), 5 deletions(-)
Thanks, I've applied these patches to kvm/queue.
Paolo
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 78a4439bfdc5..86d5756dda07 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -222,6 +222,22 @@ static bool supported_xcr0_bit(unsigned bit)
> static int __do_cpuid_ent_emulated(struct kvm_cpuid_entry2 *entry,
> u32 func, u32 index, int *nent, int maxnent)
> {
> + switch (func) {
> + case 0:
> + entry->eax = 1; /* only one leaf currently */
> + ++*nent;
> + break;
> + case 1:
> + entry->ecx = F(MOVBE);
> + ++*nent;
> + break;
> + default:
> + break;
> + }
> +
> + entry->function = func;
> + entry->index = index;
> +
> return 0;
> }
>
> @@ -593,7 +609,7 @@ int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
> return -EINVAL;
>
> r = -ENOMEM;
> - cpuid_entries = vmalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
> + cpuid_entries = vzalloc(sizeof(struct kvm_cpuid_entry2) * cpuid->nent);
> if (!cpuid_entries)
> goto out;
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 6c5cfe962b28..8e2a07bd8eac 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2961,6 +2961,46 @@ static int em_mov(struct x86_emulate_ctxt *ctxt)
> return X86EMUL_CONTINUE;
> }
>
> +#define FFL(x) bit(X86_FEATURE_##x)
> +
> +static int em_movbe(struct x86_emulate_ctxt *ctxt)
> +{
> + u32 ebx, ecx, edx, eax = 1;
> + u16 tmp;
> +
> + /*
> + * Check MOVBE is set in the guest-visible CPUID leaf.
> + */
> + ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
> + if (!(ecx & FFL(MOVBE)))
> + return emulate_ud(ctxt);
> +
> + switch (ctxt->op_bytes) {
> + case 2:
> + /*
> + * From MOVBE definition: "...When the operand size is 16 bits,
> + * the upper word of the destination register remains unchanged
> + * ..."
> + *
> + * Both casting ->valptr and ->val to u16 breaks strict aliasing
> + * rules so we have to do the operation almost per hand.
> + */
> + tmp = (u16)ctxt->src.val;
> + ctxt->dst.val &= ~0xffffUL;
> + ctxt->dst.val |= (unsigned long)swab16(tmp);
> + break;
> + case 4:
> + ctxt->dst.val = swab32((u32)ctxt->src.val);
> + break;
> + case 8:
> + ctxt->dst.val = swab64(ctxt->src.val);
> + break;
> + default:
> + return X86EMUL_PROPAGATE_FAULT;
> + }
> + return X86EMUL_CONTINUE;
> +}
> +
> static int em_cr_write(struct x86_emulate_ctxt *ctxt)
> {
> if (ctxt->ops->set_cr(ctxt, ctxt->modrm_reg, ctxt->src.val))
> @@ -3893,11 +3933,11 @@ static const struct opcode twobyte_table[256] = {
> };
>
> static const struct gprefix three_byte_0f_38_f0 = {
> - N, N, N, N
> + I(DstReg | SrcMem | Mov, em_movbe), N, N, N
> };
>
> static const struct gprefix three_byte_0f_38_f1 = {
> - N, N, N, N
> + I(DstMem | SrcReg | Mov, em_movbe), N, N, N
> };
>
> /*
> @@ -3907,8 +3947,13 @@ static const struct gprefix three_byte_0f_38_f1 = {
> static const struct opcode opcode_map_0f_38[256] = {
> /* 0x00 - 0x7f */
> X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> - /* 0x80 - 0xff */
> - X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N)
> + /* 0x80 - 0xef */
> + X16(N), X16(N), X16(N), X16(N), X16(N), X16(N), X16(N),
> + /* 0xf0 - 0xf1 */
> + GP(EmulateOnUD | ModRM | Prefix, &three_byte_0f_38_f0),
> + GP(EmulateOnUD | ModRM | Prefix, &three_byte_0f_38_f1),
> + /* 0xf2 - 0xff */
> + N, N, X4(N), X8(N)
> };
>
> #undef D
>
On Wed, Oct 30, 2013 at 06:56:10PM +0100, Paolo Bonzini wrote:
> Thanks, I've applied these patches to kvm/queue.
Cool.
As I'm sure I'll miss the qemu development, please ping me when the qemu
side is ready for adding the KVM_GET_EMULATED_CPUID thing there.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
Il 22/09/2013 16:44, Borislav Petkov ha scritto:
> From: Borislav Petkov <[email protected]>
>
> Alriiight,
>
> here's another version of the patchset, hopefully addressing all review
> feedback from last time. 6/6 is the respective qemu patch to handle
> emulated features query, etc.
>
> It is still a lot of fun to generate fast! Atom 32-bit SMP guests like
> this:
>
> [ 0.022876] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
> [ 0.033304] smpboot: CPU0: Intel(R) Atom(TM) CPU N270 @ 1.60GHz (fam: 06, model: 1c, stepping: 02)
> [ 0.037000] APIC calibration not consistent with PM-Timer: 146ms instead of 100ms
> [ 0.037000] APIC delta adjusted to PM-Timer: 6249937 (9125627)
> [ 0.037066] Performance Events: unsupported p6 CPU model 28 no PMU driver, software events only.
> [ 0.043605] SMP alternatives: lockdep: fixing up alternatives
> [ 0.044030] CPU 1 irqstacks, hard=f450c000 soft=f450e000
> [ 0.045004] smpboot: Booting Node 0, Processors #1[ 0.004000] Initializing CPU#1
> [ 0.004000] Atom PSE erratum detected, BIOS microcode update recommended
>
> [ 0.120290] SMP alternatives: lockdep: fixing up alternatives
> [ 0.121007] CPU 2 irqstacks, hard=f451c000 soft=f451e000
> [ 0.122003] #2[ 0.004000] Initializing CPU#2
> [ 0.004000] Atom PSE erratum detected, BIOS microcode update recommended
>
> ...
>
> [ 0.667192] SMP alternatives: lockdep: fixing up alternatives
> [ 0.668007] CPU 7 irqstacks, hard=f45b0000 soft=f45b2000
> [ 0.669010] #7 OK
> [ 0.004000] Initializing CPU#7
> [ 0.004000] Atom PSE erratum detected, BIOS microcode update recommended
> [ 0.781052] Brought up 8 CPUs
> [ 0.781917] smpboot: Total of 8 processors activated (57461.27 BogoMIPS)
>
> LooL :-)
>
> Comments and suggestions appreciated, as always!
>
> Thanks.
>
> Borislav Petkov (5):
> kvm: Add KVM_GET_EMULATED_CPUID
> kvm, emulator: Use opcode length
> kvm, emulator: Rename VendorSpecific flag
> kvm, emulator: Add initial three-byte insns support
> kvm: Emulate MOVBE
>
> Documentation/virtual/kvm/api.txt | 77 +++++++++++++++++++++++++++++--
> arch/x86/include/asm/kvm_emulate.h | 10 ++--
> arch/x86/include/uapi/asm/kvm.h | 6 +--
> arch/x86/kvm/cpuid.c | 75 +++++++++++++++++++++++++++---
> arch/x86/kvm/cpuid.h | 5 +-
> arch/x86/kvm/emulate.c | 94 ++++++++++++++++++++++++++++++++++----
> arch/x86/kvm/x86.c | 16 ++++---
> include/uapi/linux/kvm.h | 2 +
> 8 files changed, 251 insertions(+), 34 deletions(-)
>
> Borislav Petkov (1):
> qemu: Add support for emulated CPU features
>
> include/sysemu/kvm.h | 4 ++++
> linux-headers/linux/kvm.h | 4 ++++
> target-i386/cpu.c | 7 +++++++
> target-i386/kvm.c | 38 ++++++++++++++++++++++++++++++++++----
> 4 files changed, 49 insertions(+), 4 deletions(-)
>
I tested this patch manually with a small C test program and asm.
Would you please prepare a patch to kvm-unit-tests that stresses the new
codepath?
Paolo
On Wed, Oct 30, 2013 at 07:10:56PM +0100, Paolo Bonzini wrote:
> I tested this patch manually with a small C test program and asm.
>
> Would you please prepare a patch to kvm-unit-tests that stresses the new
> codepath?
Ha, now that you bring that up, I think Andre already had something
ready. Andre?
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--