2014-06-18 14:19:56

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 0/3] Correct monitor-mwait emulation as nop

KVM handles monitor-mwait as nop, but does not check any of the preconditions
for the instructions. These instructions may generate all kind of exceptions
(#UD, #PF, #GP, #SS). They can also be executed in real-mode. This patch-set
moves the handling of monitor-mwait to the emulator, to allow their execution
in either real-mode or protected-mode. It tries to follow the SDM in checking
the preconditions and generating the necassary exceptions.

Thanks for reviewing the patch. Please try it with OS X to make sure it works
properly without generating unnecassary exception.

Nadav Amit (3):
KVM: x86: Emulator flag for instruction with no big real mode
KVM: x86: Emulator support for #UD on CPL>0
KVM: x86: correct mwait and monitor emulation

arch/x86/kvm/emulate.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++----
arch/x86/kvm/svm.c | 22 ++------------------
arch/x86/kvm/vmx.c | 27 ++++++++++---------------
3 files changed, 64 insertions(+), 40 deletions(-)

--
1.9.1


2014-06-18 14:19:54

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 1/3] KVM: x86: Emulator flag for instruction with no big real mode

Certain instructions, such as monitor and xsave do not support big real mode
and cause a #GP exception if any of the accessed bytes effective address are
not within [0, 0xffff]. This patch introduces a flag to mark these
instructions, including the necassary checks.

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/kvm/emulate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e4e833d..f90194d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -162,6 +162,7 @@
#define NoWrite ((u64)1 << 45) /* No writeback */
#define SrcWrite ((u64)1 << 46) /* Write back src operand */
#define NoMod ((u64)1 << 47) /* Mod field is ignored */
+#define NoBigReal ((u64)1 << 48) /* No big real mode */

#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)

@@ -651,7 +652,12 @@ static int __linearize(struct x86_emulate_ctxt *ctxt,
if (!fetch && (desc.type & 8) && !(desc.type & 2))
goto bad;
lim = desc_limit_scaled(&desc);
- if ((desc.type & 8) || !(desc.type & 4)) {
+ if ((ctxt->mode == X86EMUL_MODE_REAL) && !fetch &&
+ (ctxt->d & NoBigReal)) {
+ /* la is between zero and 0xffff */
+ if (la > 0xffff || (u32)(la + size - 1) > 0xffff)
+ goto bad;
+ } else if ((desc.type & 8) || !(desc.type & 4)) {
/* expand-up segment */
if (addr.ea > lim || (u32)(addr.ea + size - 1) > lim)
goto bad;
--
1.9.1

2014-06-18 14:19:58

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 2/3] KVM: x86: Emulator support for #UD on CPL>0

Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
are executed in privilaged mode. This is in contrast to the regular privilaged
instructions which cause #GP. In order not to mess with SVM interception of
mwait and monitor which assumes privilage level assertions take place before
interception, a flag has been added.

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/kvm/emulate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f90194d..ef7a5a0 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -163,6 +163,7 @@
#define SrcWrite ((u64)1 << 46) /* Write back src operand */
#define NoMod ((u64)1 << 47) /* Mod field is ignored */
#define NoBigReal ((u64)1 << 48) /* No big real mode */
+#define UDOnPriv ((u64)1 << 49) /* #UD instead of #GP on CPL > 0 */

#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)

@@ -4560,7 +4561,10 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)

/* Privileged instruction can be executed only in CPL=0 */
if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
- rc = emulate_gp(ctxt, 0);
+ if (ctxt->d & UDOnPriv)
+ rc = emulate_ud(ctxt);
+ else
+ rc = emulate_gp(ctxt, 0);
goto done;
}

--
1.9.1

2014-06-18 14:24:14

by Nadav Amit

[permalink] [raw]
Subject: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

mwait and monitor are currently handled as nop. Considering this behavior, they
should still be handled correctly, i.e., check execution conditions and generate
exceptions when required. mwait and monitor may also be executed in real-mode
and are not handled in that case. This patch performs the emulation of
monitor-mwait according to Intel SDM (other than checking whether interrupt can
be used as a break event).

Signed-off-by: Nadav Amit <[email protected]>
---
arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
arch/x86/kvm/svm.c | 22 ++--------------------
arch/x86/kvm/vmx.c | 27 +++++++++++----------------
3 files changed, 52 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index ef7a5a0..424b58d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
return X86EMUL_CONTINUE;
}

+static int em_monitor(struct x86_emulate_ctxt *ctxt)
+{
+ int rc;
+ struct segmented_address addr;
+ u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
+ u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
+ u8 byte;
+
+ if (ctxt->mode != X86EMUL_MODE_PROT64)
+ rcx = (u32)rcx;
+ if (rcx != 0)
+ return emulate_gp(ctxt, 0);
+
+ addr.seg = seg_override(ctxt);
+ addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
+ rc = segmented_read(ctxt, addr, &byte, 1);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+
+ printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
+ return X86EMUL_CONTINUE;
+}
+
+static int em_mwait(struct x86_emulate_ctxt *ctxt)
+{
+ u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
+
+ if (ctxt->mode != X86EMUL_MODE_PROT64)
+ rcx = (u32)rcx;
+ if ((rcx & ~1UL) != 0)
+ return emulate_gp(ctxt, 0);
+
+ /* Accepting interrupt as break event regardless to cpuid */
+ printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
+ return X86EMUL_CONTINUE;
+}
+
static bool valid_cr(int nr)
{
switch (nr) {
@@ -3557,8 +3594,8 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)

static const struct opcode group7_rm1[] = {
- DI(SrcNone | Priv, monitor),
- DI(SrcNone | Priv, mwait),
+ II(SrcNone | Priv | NoBigReal | UDOnPriv, em_monitor, monitor),
+ II(SrcNone | Priv | NoBigReal | UDOnPriv, em_mwait, mwait),
N, N, N, N, N, N,
};

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..a524e04 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3274,24 +3274,6 @@ static int pause_interception(struct vcpu_svm *svm)
return 1;
}

-static int nop_interception(struct vcpu_svm *svm)
-{
- skip_emulated_instruction(&(svm->vcpu));
- return 1;
-}
-
-static int monitor_interception(struct vcpu_svm *svm)
-{
- printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
- return nop_interception(svm);
-}
-
-static int mwait_interception(struct vcpu_svm *svm)
-{
- printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
- return nop_interception(svm);
-}
-
static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_READ_CR0] = cr_interception,
[SVM_EXIT_READ_CR3] = cr_interception,
@@ -3349,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
[SVM_EXIT_CLGI] = clgi_interception,
[SVM_EXIT_SKINIT] = skinit_interception,
[SVM_EXIT_WBINVD] = emulate_on_interception,
- [SVM_EXIT_MONITOR] = monitor_interception,
- [SVM_EXIT_MWAIT] = mwait_interception,
+ [SVM_EXIT_MONITOR] = emulate_on_interception,
+ [SVM_EXIT_MWAIT] = emulate_on_interception,
[SVM_EXIT_XSETBV] = xsetbv_interception,
[SVM_EXIT_NPF] = pf_interception,
};
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..7023e71 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5672,22 +5672,17 @@ static int handle_pause(struct kvm_vcpu *vcpu)
return 1;
}

-static int handle_nop(struct kvm_vcpu *vcpu)
+static int handle_emulate(struct kvm_vcpu *vcpu)
{
- skip_emulated_instruction(vcpu);
- return 1;
-}
+ int err = emulate_instruction(vcpu, 0);

-static int handle_mwait(struct kvm_vcpu *vcpu)
-{
- printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
- return handle_nop(vcpu);
-}
-
-static int handle_monitor(struct kvm_vcpu *vcpu)
-{
- printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
- return handle_nop(vcpu);
+ if (err != EMULATE_DONE) {
+ vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
+ vcpu->run->internal.ndata = 0;
+ return 0;
+ }
+ return 1;
}

/*
@@ -6651,8 +6646,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
[EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
[EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
[EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
- [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait,
- [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor,
+ [EXIT_REASON_MWAIT_INSTRUCTION] = handle_emulate,
+ [EXIT_REASON_MONITOR_INSTRUCTION] = handle_emulate,
[EXIT_REASON_INVEPT] = handle_invept,
};

--
1.9.1

2014-06-18 16:29:41

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 2/3] KVM: x86: Emulator support for #UD on CPL>0

Il 18/06/2014 16:19, Nadav Amit ha scritto:
> Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
> are executed in privilaged mode.

It's actually "non-privileged mode" (Priv means the instruction is
privileged, not the mode). So I've renamed the flag to PrivUD.

Paolo

This is in contrast to the regular privilaged
> instructions which cause #GP. In order not to mess with SVM interception of
> mwait and monitor which assumes privilage level assertions take place before
> interception, a flag has been added.
>
> Signed-off-by: Nadav Amit <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index f90194d..ef7a5a0 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -163,6 +163,7 @@
> #define SrcWrite ((u64)1 << 46) /* Write back src operand */
> #define NoMod ((u64)1 << 47) /* Mod field is ignored */
> #define NoBigReal ((u64)1 << 48) /* No big real mode */
> +#define UDOnPriv ((u64)1 << 49) /* #UD instead of #GP on CPL > 0 */
>
> #define DstXacc (DstAccLo | SrcAccHi | SrcWrite)
>
> @@ -4560,7 +4561,10 @@ int x86_emulate_insn(struct x86_emulate_ctxt *ctxt)
>
> /* Privileged instruction can be executed only in CPL=0 */
> if ((ctxt->d & Priv) && ops->cpl(ctxt)) {
> - rc = emulate_gp(ctxt, 0);
> + if (ctxt->d & UDOnPriv)
> + rc = emulate_ud(ctxt);
> + else
> + rc = emulate_gp(ctxt, 0);
> goto done;
> }
>
>

2014-06-18 16:32:24

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

Il 18/06/2014 16:19, Nadav Amit ha scritto:
> mwait and monitor are currently handled as nop. Considering this behavior, they
> should still be handled correctly, i.e., check execution conditions and generate
> exceptions when required. mwait and monitor may also be executed in real-mode
> and are not handled in that case. This patch performs the emulation of
> monitor-mwait according to Intel SDM (other than checking whether interrupt can
> be used as a break event).
>
> Signed-off-by: Nadav Amit <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm.c | 22 ++--------------------
> arch/x86/kvm/vmx.c | 27 +++++++++++----------------
> 3 files changed, 52 insertions(+), 38 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index ef7a5a0..424b58d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
> return X86EMUL_CONTINUE;
> }
>
> +static int em_monitor(struct x86_emulate_ctxt *ctxt)
> +{
> + int rc;
> + struct segmented_address addr;
> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> + u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
> + u8 byte;
> +
> + if (ctxt->mode != X86EMUL_MODE_PROT64)
> + rcx = (u32)rcx;
> + if (rcx != 0)
> + return emulate_gp(ctxt, 0);
> +
> + addr.seg = seg_override(ctxt);
> + addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
> + rc = segmented_read(ctxt, addr, &byte, 1);
> + if (rc != X86EMUL_CONTINUE)
> + return rc;
> +
> + printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> + return X86EMUL_CONTINUE;
> +}
> +
> +static int em_mwait(struct x86_emulate_ctxt *ctxt)
> +{
> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> +
> + if (ctxt->mode != X86EMUL_MODE_PROT64)
> + rcx = (u32)rcx;
> + if ((rcx & ~1UL) != 0)
> + return emulate_gp(ctxt, 0);
> +
> + /* Accepting interrupt as break event regardless to cpuid */
> + printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> + return X86EMUL_CONTINUE;
> +}
> +
> static bool valid_cr(int nr)
> {
> switch (nr) {
> @@ -3557,8 +3594,8 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
> F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
>
> static const struct opcode group7_rm1[] = {
> - DI(SrcNone | Priv, monitor),
> - DI(SrcNone | Priv, mwait),
> + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_monitor, monitor),
> + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_mwait, mwait),
> N, N, N, N, N, N,
> };
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..a524e04 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,24 +3274,6 @@ static int pause_interception(struct vcpu_svm *svm)
> return 1;
> }
>
> -static int nop_interception(struct vcpu_svm *svm)
> -{
> - skip_emulated_instruction(&(svm->vcpu));
> - return 1;
> -}
> -
> -static int monitor_interception(struct vcpu_svm *svm)
> -{
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return nop_interception(svm);
> -}
> -
> -static int mwait_interception(struct vcpu_svm *svm)
> -{
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return nop_interception(svm);
> -}
> -
> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> [SVM_EXIT_READ_CR0] = cr_interception,
> [SVM_EXIT_READ_CR3] = cr_interception,
> @@ -3349,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> [SVM_EXIT_CLGI] = clgi_interception,
> [SVM_EXIT_SKINIT] = skinit_interception,
> [SVM_EXIT_WBINVD] = emulate_on_interception,
> - [SVM_EXIT_MONITOR] = monitor_interception,
> - [SVM_EXIT_MWAIT] = mwait_interception,
> + [SVM_EXIT_MONITOR] = emulate_on_interception,
> + [SVM_EXIT_MWAIT] = emulate_on_interception,
> [SVM_EXIT_XSETBV] = xsetbv_interception,
> [SVM_EXIT_NPF] = pf_interception,
> };
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..7023e71 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,17 @@ static int handle_pause(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_emulate(struct kvm_vcpu *vcpu)
> {
> - skip_emulated_instruction(vcpu);
> - return 1;
> -}
> + int err = emulate_instruction(vcpu, 0);
>
> -static int handle_mwait(struct kvm_vcpu *vcpu)
> -{
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> -}
> -
> -static int handle_monitor(struct kvm_vcpu *vcpu)
> -{
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + if (err != EMULATE_DONE) {
> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> + vcpu->run->internal.ndata = 0;
> + return 0;
> + }
> + return 1;
> }
>
> /*
> @@ -6651,8 +6646,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
> [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
> [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
> - [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait,
> - [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor,
> + [EXIT_REASON_MWAIT_INSTRUCTION] = handle_emulate,
> + [EXIT_REASON_MONITOR_INSTRUCTION] = handle_emulate,
> [EXIT_REASON_INVEPT] = handle_invept,
> };
>
>

In VMX, handle_invd can be reused as handle_emulate.

Paolo

2014-06-18 16:43:33

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

Nadav Amit <[email protected]> writes:

> mwait and monitor are currently handled as nop. Considering this behavior, they
> should still be handled correctly, i.e., check execution conditions and generate
> exceptions when required. mwait and monitor may also be executed in real-mode

Is this necessary ? They are NOPs and kvm prints that out (correctly so) to dmesg.
Implementing them correctly is a different thing, but adding extra checks for NOPs
just seems like adding extra cycles.


> and are not handled in that case. This patch performs the emulation of
> monitor-mwait according to Intel SDM (other than checking whether interrupt can
> be used as a break event).
>
> Signed-off-by: Nadav Amit <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm.c | 22 ++--------------------
> arch/x86/kvm/vmx.c | 27 +++++++++++----------------
> 3 files changed, 52 insertions(+), 38 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index ef7a5a0..424b58d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
> return X86EMUL_CONTINUE;
> }
>
> +static int em_monitor(struct x86_emulate_ctxt *ctxt)
> +{
> + int rc;
> + struct segmented_address addr;
> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> + u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
> + u8 byte;
> +
> + if (ctxt->mode != X86EMUL_MODE_PROT64)
> + rcx = (u32)rcx;
> + if (rcx != 0)
> + return emulate_gp(ctxt, 0);
> +
> + addr.seg = seg_override(ctxt);
> + addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
> + rc = segmented_read(ctxt, addr, &byte, 1);
> + if (rc != X86EMUL_CONTINUE)
> + return rc;
> +
> + printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> + return X86EMUL_CONTINUE;
> +}
> +
> +static int em_mwait(struct x86_emulate_ctxt *ctxt)
> +{
> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> +
> + if (ctxt->mode != X86EMUL_MODE_PROT64)
> + rcx = (u32)rcx;
> + if ((rcx & ~1UL) != 0)
> + return emulate_gp(ctxt, 0);
> +
> + /* Accepting interrupt as break event regardless to cpuid */
> + printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> + return X86EMUL_CONTINUE;
> +}
> +
> static bool valid_cr(int nr)
> {
> switch (nr) {
> @@ -3557,8 +3594,8 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
> F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
>
> static const struct opcode group7_rm1[] = {
> - DI(SrcNone | Priv, monitor),
> - DI(SrcNone | Priv, mwait),
> + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_monitor, monitor),
> + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_mwait, mwait),
> N, N, N, N, N, N,
> };
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..a524e04 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,24 +3274,6 @@ static int pause_interception(struct vcpu_svm *svm)
> return 1;
> }
>
> -static int nop_interception(struct vcpu_svm *svm)
> -{
> - skip_emulated_instruction(&(svm->vcpu));
> - return 1;
> -}
> -
> -static int monitor_interception(struct vcpu_svm *svm)
> -{
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return nop_interception(svm);
> -}
> -
> -static int mwait_interception(struct vcpu_svm *svm)
> -{
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return nop_interception(svm);
> -}
> -
> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> [SVM_EXIT_READ_CR0] = cr_interception,
> [SVM_EXIT_READ_CR3] = cr_interception,
> @@ -3349,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> [SVM_EXIT_CLGI] = clgi_interception,
> [SVM_EXIT_SKINIT] = skinit_interception,
> [SVM_EXIT_WBINVD] = emulate_on_interception,
> - [SVM_EXIT_MONITOR] = monitor_interception,
> - [SVM_EXIT_MWAIT] = mwait_interception,
> + [SVM_EXIT_MONITOR] = emulate_on_interception,
> + [SVM_EXIT_MWAIT] = emulate_on_interception,
> [SVM_EXIT_XSETBV] = xsetbv_interception,
> [SVM_EXIT_NPF] = pf_interception,
> };
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..7023e71 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,17 @@ static int handle_pause(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_emulate(struct kvm_vcpu *vcpu)
> {
> - skip_emulated_instruction(vcpu);
> - return 1;
> -}
> + int err = emulate_instruction(vcpu, 0);
>
> -static int handle_mwait(struct kvm_vcpu *vcpu)
> -{
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> -}
> -
> -static int handle_monitor(struct kvm_vcpu *vcpu)
> -{
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + if (err != EMULATE_DONE) {
> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> + vcpu->run->internal.ndata = 0;
> + return 0;
> + }
> + return 1;
> }
>
> /*
> @@ -6651,8 +6646,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
> [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
> [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
> - [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait,
> - [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor,
> + [EXIT_REASON_MWAIT_INSTRUCTION] = handle_emulate,
> + [EXIT_REASON_MONITOR_INSTRUCTION] = handle_emulate,
> [EXIT_REASON_INVEPT] = handle_invept,
> };

2014-06-18 16:44:50

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

Il 18/06/2014 18:43, Bandan Das ha scritto:
>> > mwait and monitor are currently handled as nop. Considering this behavior, they
>> > should still be handled correctly, i.e., check execution conditions and generate
>> > exceptions when required. mwait and monitor may also be executed in real-mode
> Is this necessary ? They are NOPs and kvm prints that out (correctly so) to dmesg.
> Implementing them correctly is a different thing, but adding extra checks for NOPs
> just seems like adding extra cycles.

Raising the correct exception is a good thing, though. The guest is
going to busy wait anyway, it doesn't matter how fast it does that. :)

Paolo

2014-06-18 17:33:44

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

Paolo Bonzini <[email protected]> writes:

> Il 18/06/2014 18:43, Bandan Das ha scritto:
>>> > mwait and monitor are currently handled as nop. Considering this behavior, they
>>> > should still be handled correctly, i.e., check execution conditions and generate
>>> > exceptions when required. mwait and monitor may also be executed in real-mode
>> Is this necessary ? They are NOPs and kvm prints that out (correctly so) to dmesg.
>> Implementing them correctly is a different thing, but adding extra checks for NOPs
>> just seems like adding extra cycles.
>
> Raising the correct exception is a good thing, though. The guest is
> going to busy wait anyway, it doesn't matter how fast it does that. :)

Thanks, makes sense. I was also thinking why to advertise it at all, but it's
probably more common than I think it is.

> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-06-18 17:59:17

by Eric Northup

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> mwait and monitor are currently handled as nop. Considering this behavior, they
> should still be handled correctly, i.e., check execution conditions and generate
> exceptions when required. mwait and monitor may also be executed in real-mode
> and are not handled in that case. This patch performs the emulation of
> monitor-mwait according to Intel SDM (other than checking whether interrupt can
> be used as a break event).
>
> Signed-off-by: Nadav Amit <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
> arch/x86/kvm/svm.c | 22 ++--------------------
> arch/x86/kvm/vmx.c | 27 +++++++++++----------------
> 3 files changed, 52 insertions(+), 38 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index ef7a5a0..424b58d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
> return X86EMUL_CONTINUE;
> }
>
> +static int em_monitor(struct x86_emulate_ctxt *ctxt)
> +{
> + int rc;
> + struct segmented_address addr;
> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> + u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
> + u8 byte;

I'd request:

u32 ebx, ecx, edx, eax = 1;
ctxt->opt->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
if (!(ecx & FFL(MWAIT)))
return emulate_ud(ctxt);

and also in em_mwait.

> +
> + if (ctxt->mode != X86EMUL_MODE_PROT64)
> + rcx = (u32)rcx;
> + if (rcx != 0)
> + return emulate_gp(ctxt, 0);
> +
> + addr.seg = seg_override(ctxt);
> + addr.ea = ctxt->ad_bytes == 8 ? rax : (u32)rax;
> + rc = segmented_read(ctxt, addr, &byte, 1);
> + if (rc != X86EMUL_CONTINUE)
> + return rc;
> +
> + printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> + return X86EMUL_CONTINUE;
> +}
> +
> +static int em_mwait(struct x86_emulate_ctxt *ctxt)
> +{
> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
> +
> + if (ctxt->mode != X86EMUL_MODE_PROT64)
> + rcx = (u32)rcx;
> + if ((rcx & ~1UL) != 0)
> + return emulate_gp(ctxt, 0);
> +
> + /* Accepting interrupt as break event regardless to cpuid */
> + printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> + return X86EMUL_CONTINUE;
> +}
> +
> static bool valid_cr(int nr)
> {
> switch (nr) {
> @@ -3557,8 +3594,8 @@ static int check_perm_out(struct x86_emulate_ctxt *ctxt)
> F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
>
> static const struct opcode group7_rm1[] = {
> - DI(SrcNone | Priv, monitor),
> - DI(SrcNone | Priv, mwait),
> + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_monitor, monitor),
> + II(SrcNone | Priv | NoBigReal | UDOnPriv, em_mwait, mwait),
> N, N, N, N, N, N,
> };
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..a524e04 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,24 +3274,6 @@ static int pause_interception(struct vcpu_svm *svm)
> return 1;
> }
>
> -static int nop_interception(struct vcpu_svm *svm)
> -{
> - skip_emulated_instruction(&(svm->vcpu));
> - return 1;
> -}
> -
> -static int monitor_interception(struct vcpu_svm *svm)
> -{
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return nop_interception(svm);
> -}
> -
> -static int mwait_interception(struct vcpu_svm *svm)
> -{
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return nop_interception(svm);
> -}
> -
> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> [SVM_EXIT_READ_CR0] = cr_interception,
> [SVM_EXIT_READ_CR3] = cr_interception,
> @@ -3349,8 +3331,8 @@ static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> [SVM_EXIT_CLGI] = clgi_interception,
> [SVM_EXIT_SKINIT] = skinit_interception,
> [SVM_EXIT_WBINVD] = emulate_on_interception,
> - [SVM_EXIT_MONITOR] = monitor_interception,
> - [SVM_EXIT_MWAIT] = mwait_interception,
> + [SVM_EXIT_MONITOR] = emulate_on_interception,
> + [SVM_EXIT_MWAIT] = emulate_on_interception,
> [SVM_EXIT_XSETBV] = xsetbv_interception,
> [SVM_EXIT_NPF] = pf_interception,
> };
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..7023e71 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,17 @@ static int handle_pause(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_emulate(struct kvm_vcpu *vcpu)
> {
> - skip_emulated_instruction(vcpu);
> - return 1;
> -}
> + int err = emulate_instruction(vcpu, 0);
>
> -static int handle_mwait(struct kvm_vcpu *vcpu)
> -{
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> -}
> -
> -static int handle_monitor(struct kvm_vcpu *vcpu)
> -{
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + if (err != EMULATE_DONE) {
> + vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> + vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
> + vcpu->run->internal.ndata = 0;
> + return 0;
> + }
> + return 1;
> }
>
> /*
> @@ -6651,8 +6646,8 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
> [EXIT_REASON_EPT_VIOLATION] = handle_ept_violation,
> [EXIT_REASON_EPT_MISCONFIG] = handle_ept_misconfig,
> [EXIT_REASON_PAUSE_INSTRUCTION] = handle_pause,
> - [EXIT_REASON_MWAIT_INSTRUCTION] = handle_mwait,
> - [EXIT_REASON_MONITOR_INSTRUCTION] = handle_monitor,
> + [EXIT_REASON_MWAIT_INSTRUCTION] = handle_emulate,
> + [EXIT_REASON_MONITOR_INSTRUCTION] = handle_emulate,
> [EXIT_REASON_INVEPT] = handle_invept,
> };
>
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-06-18 18:23:35

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On 6/18/14, 8:59 PM, Eric Northup wrote:
> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
>> mwait and monitor are currently handled as nop. Considering this behavior, they
>> should still be handled correctly, i.e., check execution conditions and generate
>> exceptions when required. mwait and monitor may also be executed in real-mode
>> and are not handled in that case. This patch performs the emulation of
>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>> be used as a break event).
>>
>> Signed-off-by: Nadav Amit <[email protected]>
>> ---
>> arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
>> arch/x86/kvm/svm.c | 22 ++--------------------
>> arch/x86/kvm/vmx.c | 27 +++++++++++----------------
>> 3 files changed, 52 insertions(+), 38 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index ef7a5a0..424b58d 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
>> return X86EMUL_CONTINUE;
>> }
>>
>> +static int em_monitor(struct x86_emulate_ctxt *ctxt)
>> +{
>> + int rc;
>> + struct segmented_address addr;
>> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>> + u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
>> + u8 byte;
>
> I'd request:
>
> u32 ebx, ecx, edx, eax = 1;
> ctxt->opt->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
> if (!(ecx & FFL(MWAIT)))
> return emulate_ud(ctxt);
>
> and also in em_mwait.
>

I had similar implementation on previous version, which also checked on
mwait whether "interrupt as break event" matches ECX value. However, I
was under the impression that it was decided that MWAIT will always be
emulated as NOP to avoid misbehaving VMs that ignore CPUID (see the
discussion at http://www.spinics.net/lists/kvm/msg102766.html ).

Nadav

2014-06-18 18:30:12

by Eric Northup

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

Quoting Gabriel's post http://www.spinics.net/lists/kvm/msg103792.html :

[...]

> E.g., OS X 10.5 *does* check CPUID, and panics if it doesn't find it.
> It needs the MONITOR cpuid flag to be on, *and* the actual
> instructions to work.




On Wed, Jun 18, 2014 at 11:23 AM, Nadav Amit <[email protected]> wrote:
> On 6/18/14, 8:59 PM, Eric Northup wrote:
>>
>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]>
>> wrote:
>>>
>>> mwait and monitor are currently handled as nop. Considering this
>>> behavior, they
>>> should still be handled correctly, i.e., check execution conditions and
>>> generate
>>> exceptions when required. mwait and monitor may also be executed in
>>> real-mode
>>> and are not handled in that case. This patch performs the emulation of
>>> monitor-mwait according to Intel SDM (other than checking whether
>>> interrupt can
>>> be used as a break event).
>>>
>>> Signed-off-by: Nadav Amit <[email protected]>
>>> ---
>>> arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
>>> arch/x86/kvm/svm.c | 22 ++--------------------
>>> arch/x86/kvm/vmx.c | 27 +++++++++++----------------
>>> 3 files changed, 52 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>>> index ef7a5a0..424b58d 100644
>>> --- a/arch/x86/kvm/emulate.c
>>> +++ b/arch/x86/kvm/emulate.c
>>> @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
>>> return X86EMUL_CONTINUE;
>>> }
>>>
>>> +static int em_monitor(struct x86_emulate_ctxt *ctxt)
>>> +{
>>> + int rc;
>>> + struct segmented_address addr;
>>> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>>> + u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
>>> + u8 byte;
>>
>>
>> I'd request:
>>
>> u32 ebx, ecx, edx, eax = 1;
>> ctxt->opt->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
>> if (!(ecx & FFL(MWAIT)))
>> return emulate_ud(ctxt);
>>
>> and also in em_mwait.
>>
>
> I had similar implementation on previous version, which also checked on
> mwait whether "interrupt as break event" matches ECX value. However, I was
> under the impression that it was decided that MWAIT will always be emulated
> as NOP to avoid misbehaving VMs that ignore CPUID (see the discussion at
> http://www.spinics.net/lists/kvm/msg102766.html ).
>
> Nadav

2014-06-18 18:46:07

by Gabriel L. Somlo

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> > mwait and monitor are currently handled as nop. Considering this behavior, they
> > should still be handled correctly, i.e., check execution conditions and generate
> > exceptions when required. mwait and monitor may also be executed in real-mode
> > and are not handled in that case. This patch performs the emulation of
> > monitor-mwait according to Intel SDM (other than checking whether interrupt can
> > be used as a break event).
> >
> > Signed-off-by: Nadav Amit <[email protected]>

How about this instead (details in the commit log below) ? Please let
me know what you think, and if you'd prefer me to send it out as a
separate patch rather than a reply to this thread.

Thanks,
--Gabriel


>From 0375a0aceb54cdbc26a6c0e5b43c46324f830ec3 Mon Sep 17 00:00:00 2001
From: "Gabriel L. Somlo" <[email protected]>
Date: Wed, 18 Jun 2014 14:39:15 -0400
Subject: [PATCH] kvm: x86: revert "emulate monitor and mwait instructions as nop"

This reverts commit 87c00572ba05aa8c9db118da75c608f47eb10b9e.

OS X <= 10.7.* are the only known guests which realistically required
this functionality. As it turns out, OS X can be told to forego using
monitor/mwait by passing it "idlehalt=0" as a kernel argument, so we're
better off removing this hack from KVM altogether.

Signed-off-by: Gabriel L. Somlo <[email protected]>
---
arch/x86/kvm/cpuid.c | 2 --
arch/x86/kvm/svm.c | 8 +++-----
arch/x86/kvm/vmx.c | 10 ++++------
3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 38a0afe..17b42fa 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -283,8 +283,6 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
/* cpuid 1.ecx */
const u32 kvm_supported_word4_x86_features =
- /* NOTE: MONITOR (and MWAIT) are emulated as NOP,
- * but *not* advertised to guests via CPUID ! */
F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
0 /* DS-CPL, VMX, SMX, EST */ |
0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..0e8ef20 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3274,7 +3274,7 @@ static int pause_interception(struct vcpu_svm *svm)
return 1;
}

-static int nop_interception(struct vcpu_svm *svm)
+static int invalid_op_interception(struct vcpu_svm *svm)
{
skip_emulated_instruction(&(svm->vcpu));
return 1;
@@ -3282,14 +3282,12 @@ static int nop_interception(struct vcpu_svm *svm)

static int monitor_interception(struct vcpu_svm *svm)
{
- printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
- return nop_interception(svm);
+ return invalid_op_interception(svm);
}

static int mwait_interception(struct vcpu_svm *svm)
{
- printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
- return nop_interception(svm);
+ return invalid_op_interception(svm);
}

static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..577c7df 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5672,22 +5672,20 @@ static int handle_pause(struct kvm_vcpu *vcpu)
return 1;
}

-static int handle_nop(struct kvm_vcpu *vcpu)
+static int handle_invalid_op(struct kvm_vcpu *vcpu)
{
- skip_emulated_instruction(vcpu);
+ kvm_queue_exception(vcpu, UD_VECTOR);
return 1;
}

static int handle_mwait(struct kvm_vcpu *vcpu)
{
- printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
- return handle_nop(vcpu);
+ return handle_invalid_op(vcpu);
}

static int handle_monitor(struct kvm_vcpu *vcpu)
{
- printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
- return handle_nop(vcpu);
+ return handle_invalid_op(vcpu);
}

/*
--
1.9.3

2014-06-18 18:59:44

by Gabriel L. Somlo

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Wed, Jun 18, 2014 at 11:30:07AM -0700, Eric Northup wrote:
> Quoting Gabriel's post http://www.spinics.net/lists/kvm/msg103792.html :
>
> [...]
>
> > E.g., OS X 10.5 *does* check CPUID, and panics if it doesn't find it.
> > It needs the MONITOR cpuid flag to be on, *and* the actual
> > instructions to work.

That was an argument in favor of finding a mechanism to allow (qemu)
users to enable an otherwise default-off monitor cpuid flag.

We definitely don't want to advertise monitor/mwait availability to
guests which would otherwise sanely fail back to a hlt-based idle loop
when cpuid tells them monitor/mwait are not available :)

However, check my earlier proposal of backing out of monitor/mwait
entirely (as it turns out, there's a kernel command line to tell OS X
not to use monitor/mwait, which is IMHO vastly preferable to creating
"undocumented" KVM hacks :)

Thanks much,
--Gabriel

2014-06-18 19:09:22

by Bandan Das

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

"Gabriel L. Somlo" <[email protected]> writes:

> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
>> > mwait and monitor are currently handled as nop. Considering this behavior, they
>> > should still be handled correctly, i.e., check execution conditions and generate
>> > exceptions when required. mwait and monitor may also be executed in real-mode
>> > and are not handled in that case. This patch performs the emulation of
>> > monitor-mwait according to Intel SDM (other than checking whether interrupt can
>> > be used as a break event).
>> >
>> > Signed-off-by: Nadav Amit <[email protected]>
>
> How about this instead (details in the commit log below) ? Please let
> me know what you think, and if you'd prefer me to send it out as a
> separate patch rather than a reply to this thread.

I am not saying we should "punish" guests that don't do the right thing.
But if we can't emulate it the right way, we emulate it ?
And the fact that a workaround exists for the guest is all the more better :)

> Thanks,
> --Gabriel
>
>
> From 0375a0aceb54cdbc26a6c0e5b43c46324f830ec3 Mon Sep 17 00:00:00 2001
> From: "Gabriel L. Somlo" <[email protected]>
> Date: Wed, 18 Jun 2014 14:39:15 -0400
> Subject: [PATCH] kvm: x86: revert "emulate monitor and mwait instructions as nop"
>
> This reverts commit 87c00572ba05aa8c9db118da75c608f47eb10b9e.
>
> OS X <= 10.7.* are the only known guests which realistically required
> this functionality. As it turns out, OS X can be told to forego using
> monitor/mwait by passing it "idlehalt=0" as a kernel argument, so we're
> better off removing this hack from KVM altogether.
>
> Signed-off-by: Gabriel L. Somlo <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 2 --
> arch/x86/kvm/svm.c | 8 +++-----
> arch/x86/kvm/vmx.c | 10 ++++------
> 3 files changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..17b42fa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -283,8 +283,6 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
> /* cpuid 1.ecx */
> const u32 kvm_supported_word4_x86_features =
> - /* NOTE: MONITOR (and MWAIT) are emulated as NOP,
> - * but *not* advertised to guests via CPUID ! */
> F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> 0 /* DS-CPL, VMX, SMX, EST */ |
> 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..0e8ef20 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,7 +3274,7 @@ static int pause_interception(struct vcpu_svm *svm)
> return 1;
> }
>
> -static int nop_interception(struct vcpu_svm *svm)
> +static int invalid_op_interception(struct vcpu_svm *svm)
> {
> skip_emulated_instruction(&(svm->vcpu));
> return 1;
> @@ -3282,14 +3282,12 @@ static int nop_interception(struct vcpu_svm *svm)
>
> static int monitor_interception(struct vcpu_svm *svm)
> {
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return nop_interception(svm);
> + return invalid_op_interception(svm);
> }
>
> static int mwait_interception(struct vcpu_svm *svm)
> {
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return nop_interception(svm);
> + return invalid_op_interception(svm);
> }
>
> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..577c7df 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,20 @@ static int handle_pause(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_invalid_op(struct kvm_vcpu *vcpu)
> {
> - skip_emulated_instruction(vcpu);
> + kvm_queue_exception(vcpu, UD_VECTOR);
> return 1;
> }
>
> static int handle_mwait(struct kvm_vcpu *vcpu)
> {
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + return handle_invalid_op(vcpu);
> }
>
> static int handle_monitor(struct kvm_vcpu *vcpu)
> {
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + return handle_invalid_op(vcpu);
> }
>
> /*

2014-06-19 10:18:10

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> > On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> > > mwait and monitor are currently handled as nop. Considering this behavior, they
> > > should still be handled correctly, i.e., check execution conditions and generate
> > > exceptions when required. mwait and monitor may also be executed in real-mode
> > > and are not handled in that case. This patch performs the emulation of
> > > monitor-mwait according to Intel SDM (other than checking whether interrupt can
> > > be used as a break event).
> > >
> > > Signed-off-by: Nadav Amit <[email protected]>
>
> How about this instead (details in the commit log below) ? Please let
> me know what you think, and if you'd prefer me to send it out as a
> separate patch rather than a reply to this thread.
>
> Thanks,
> --Gabriel

If there's an easy workaround, I'm inclined to agree.
We can always go back to Gabriel's patch (and then we'll need
Nadav's one too) but if we release a kernel with this
support it becomes an ABI and we can't go back.

So let's be careful here, and revert the hack for 3.16.


Acked-by: Michael S. Tsirkin <[email protected]>



>
> >From 0375a0aceb54cdbc26a6c0e5b43c46324f830ec3 Mon Sep 17 00:00:00 2001
> From: "Gabriel L. Somlo" <[email protected]>
> Date: Wed, 18 Jun 2014 14:39:15 -0400
> Subject: [PATCH] kvm: x86: revert "emulate monitor and mwait instructions as nop"
>
> This reverts commit 87c00572ba05aa8c9db118da75c608f47eb10b9e.
>
> OS X <= 10.7.* are the only known guests which realistically required
> this functionality. As it turns out, OS X can be told to forego using
> monitor/mwait by passing it "idlehalt=0" as a kernel argument, so we're
> better off removing this hack from KVM altogether.
>
> Signed-off-by: Gabriel L. Somlo <[email protected]>
> ---
> arch/x86/kvm/cpuid.c | 2 --
> arch/x86/kvm/svm.c | 8 +++-----
> arch/x86/kvm/vmx.c | 10 ++++------
> 3 files changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 38a0afe..17b42fa 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -283,8 +283,6 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW);
> /* cpuid 1.ecx */
> const u32 kvm_supported_word4_x86_features =
> - /* NOTE: MONITOR (and MWAIT) are emulated as NOP,
> - * but *not* advertised to guests via CPUID ! */
> F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> 0 /* DS-CPL, VMX, SMX, EST */ |
> 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..0e8ef20 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3274,7 +3274,7 @@ static int pause_interception(struct vcpu_svm *svm)
> return 1;
> }
>
> -static int nop_interception(struct vcpu_svm *svm)
> +static int invalid_op_interception(struct vcpu_svm *svm)
> {
> skip_emulated_instruction(&(svm->vcpu));
> return 1;
> @@ -3282,14 +3282,12 @@ static int nop_interception(struct vcpu_svm *svm)
>
> static int monitor_interception(struct vcpu_svm *svm)
> {
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return nop_interception(svm);
> + return invalid_op_interception(svm);
> }
>
> static int mwait_interception(struct vcpu_svm *svm)
> {
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return nop_interception(svm);
> + return invalid_op_interception(svm);
> }
>
> static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 801332e..577c7df 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5672,22 +5672,20 @@ static int handle_pause(struct kvm_vcpu *vcpu)
> return 1;
> }
>
> -static int handle_nop(struct kvm_vcpu *vcpu)
> +static int handle_invalid_op(struct kvm_vcpu *vcpu)
> {
> - skip_emulated_instruction(vcpu);
> + kvm_queue_exception(vcpu, UD_VECTOR);
> return 1;
> }
>
> static int handle_mwait(struct kvm_vcpu *vcpu)
> {
> - printk_once(KERN_WARNING "kvm: MWAIT instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + return handle_invalid_op(vcpu);
> }
>
> static int handle_monitor(struct kvm_vcpu *vcpu)
> {
> - printk_once(KERN_WARNING "kvm: MONITOR instruction emulated as NOP!\n");
> - return handle_nop(vcpu);
> + return handle_invalid_op(vcpu);
> }
>
> /*
> --
> 1.9.3

2014-06-19 11:24:04

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>
> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
>
> > On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> >>>> mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>> should still be handled correctly, i.e., check execution conditions and generate
> >>>> exceptions when required. mwait and monitor may also be executed in real-mode
> >>>> and are not handled in that case. This patch performs the emulation of
> >>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>> be used as a break event).
> >>>>
> >>>> Signed-off-by: Nadav Amit <[email protected]>
> >>
> >> How about this instead (details in the commit log below) ? Please let
> >> me know what you think, and if you'd prefer me to send it out as a
> >> separate patch rather than a reply to this thread.
> >>
> >> Thanks,
> >> --Gabriel
> >
> > If there's an easy workaround, I'm inclined to agree.
> > We can always go back to Gabriel's patch (and then we'll need
> > Nadav's one too) but if we release a kernel with this
> > support it becomes an ABI and we can't go back.
> >
> > So let's be careful here, and revert the hack for 3.16.
> >
> >
> > Acked-by: Michael S. Tsirkin <[email protected]>
> >
> Personally, I got a custom guest which requires mwait for executing correctly.
Can you elaborate on this guest a little bit. With nop implementation
for mwait the guest will hog a host cpu. Do you consider this to be
"executing correctly?"

--
Gleb.

2014-06-19 11:35:06

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

Il 18/06/2014 19:59, Eric Northup ha scritto:
> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
>> mwait and monitor are currently handled as nop. Considering this behavior, they
>> should still be handled correctly, i.e., check execution conditions and generate
>> exceptions when required. mwait and monitor may also be executed in real-mode
>> and are not handled in that case. This patch performs the emulation of
>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>> be used as a break event).
>>
>> Signed-off-by: Nadav Amit <[email protected]>
>> ---
>> arch/x86/kvm/emulate.c | 41 +++++++++++++++++++++++++++++++++++++++--
>> arch/x86/kvm/svm.c | 22 ++--------------------
>> arch/x86/kvm/vmx.c | 27 +++++++++++----------------
>> 3 files changed, 52 insertions(+), 38 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index ef7a5a0..424b58d 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -3344,6 +3344,43 @@ static int em_bswap(struct x86_emulate_ctxt *ctxt)
>> return X86EMUL_CONTINUE;
>> }
>>
>> +static int em_monitor(struct x86_emulate_ctxt *ctxt)
>> +{
>> + int rc;
>> + struct segmented_address addr;
>> + u64 rcx = reg_read(ctxt, VCPU_REGS_RCX);
>> + u64 rax = reg_read(ctxt, VCPU_REGS_RAX);
>> + u8 byte;
>
> I'd request:
>
> u32 ebx, ecx, edx, eax = 1;
> ctxt->opt->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx);
> if (!(ecx & FFL(MWAIT)))
> return emulate_ud(ctxt);
>
> and also in em_mwait.

Ignoring the fact that this should never be true
(KVM_GET_SUPPORTED_CPUID never reports the MWAIT bit), why should
MONITOR and MWAIT be special? We do not do this kind of check for SSE
or AVX instructions.

An alternative is to record the address that was being waited on, and
invoke PLE (kvm_vcpu_on_spin) if the current address matches the last
one. A VMEXIT + emulation takes a couple thousand cycles, which is the
same order of magnitude as the PLE window.

Even if there is a workaround, I don't think reverting the patch is
necessary. The patch was there for a fringe case anyway (recent
versions of Mac OS X get CPUID right), so I don't think the availability
of a work around changes the assessment of how ugly/useful MONITOR/MWAIT is.

Paolo

2014-06-19 11:52:28

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>>
>> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
>>
>>> On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
>>>> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>>>>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
>>>>>> mwait and monitor are currently handled as nop. Considering this behavior, they
>>>>>> should still be handled correctly, i.e., check execution conditions and generate
>>>>>> exceptions when required. mwait and monitor may also be executed in real-mode
>>>>>> and are not handled in that case. This patch performs the emulation of
>>>>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>>>>>> be used as a break event).
>>>>>>
>>>>>> Signed-off-by: Nadav Amit <[email protected]>
>>>>
>>>> How about this instead (details in the commit log below) ? Please let
>>>> me know what you think, and if you'd prefer me to send it out as a
>>>> separate patch rather than a reply to this thread.
>>>>
>>>> Thanks,
>>>> --Gabriel
>>>
>>> If there's an easy workaround, I'm inclined to agree.
>>> We can always go back to Gabriel's patch (and then we'll need
>>> Nadav's one too) but if we release a kernel with this
>>> support it becomes an ABI and we can't go back.
>>>
>>> So let's be careful here, and revert the hack for 3.16.
>>>
>>>
>>> Acked-by: Michael S. Tsirkin <[email protected]>
>>>
>> Personally, I got a custom guest which requires mwait for executing correctly.
> Can you elaborate on this guest a little bit. With nop implementation
> for mwait the guest will hog a host cpu. Do you consider this to be
> "executing correctly?"
>
> --

mwait is not as "clean" as it may appear. It encounters false wake-ups
due to a variety of reasons, and any code need to recheck the wake-up
condition afterwards. Actually, some CPUs had bugs that caused excessive
wake-ups that degraded performance considerably (Nehalem, if I am not
mistaken).
Therefore, handling mwait as nop is logically correct (although it may
degrade performance).

For the reference, if you look at the SDM 8.10.4, you'll see:
"Multiple events other than a write to the triggering address range can
cause a processor that executed MWAIT to wake up. These include events
that would lead to voluntary or involuntary context switches, such as..."

Note the words "include" in the sentence "These include events".
Software has no way of controlling whether it gets false wake-ups and
cannot rely on the wake-up as indication to anything.

Nadav

2014-06-19 12:01:25

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>
> >>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
> >>
> >>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> >>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>and are not handled in that case. This patch performs the emulation of
> >>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>be used as a break event).
> >>>>>>
> >>>>>>Signed-off-by: Nadav Amit <[email protected]>
> >>>>
> >>>>How about this instead (details in the commit log below) ? Please let
> >>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>separate patch rather than a reply to this thread.
> >>>>
> >>>>Thanks,
> >>>>--Gabriel
> >>>
> >>>If there's an easy workaround, I'm inclined to agree.
> >>>We can always go back to Gabriel's patch (and then we'll need
> >>>Nadav's one too) but if we release a kernel with this
> >>>support it becomes an ABI and we can't go back.
> >>>
> >>>So let's be careful here, and revert the hack for 3.16.
> >>>
> >>>
> >>>Acked-by: Michael S. Tsirkin <[email protected]>
> >>>
> >>Personally, I got a custom guest which requires mwait for executing correctly.
> >Can you elaborate on this guest a little bit. With nop implementation
> >for mwait the guest will hog a host cpu. Do you consider this to be
> >"executing correctly?"
> >
> >--
>
> mwait is not as "clean" as it may appear. It encounters false wake-ups due
> to a variety of reasons, and any code need to recheck the wake-up condition
> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> degraded performance considerably (Nehalem, if I am not mistaken).
> Therefore, handling mwait as nop is logically correct (although it may
> degrade performance).
>
> For the reference, if you look at the SDM 8.10.4, you'll see:
> "Multiple events other than a write to the triggering address range can
> cause a processor that executed MWAIT to wake up. These include events that
> would lead to voluntary or involuntary context switches, such as..."
>
> Note the words "include" in the sentence "These include events". Software
> has no way of controlling whether it gets false wake-ups and cannot rely on
> the wake-up as indication to anything.
>
> Nadav

It's a quality of implementation question.
It is correct in the same sense that
a NIC dropping each second packet is correct.

If we ship this hack we have to maintain it forever,
so there needs to be a compelling reason beyond
just "because we can".


--
MST

2014-06-19 12:07:46

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>
> >>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
> >>
> >>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> >>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>and are not handled in that case. This patch performs the emulation of
> >>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>be used as a break event).
> >>>>>>
> >>>>>>Signed-off-by: Nadav Amit <[email protected]>
> >>>>
> >>>>How about this instead (details in the commit log below) ? Please let
> >>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>separate patch rather than a reply to this thread.
> >>>>
> >>>>Thanks,
> >>>>--Gabriel
> >>>
> >>>If there's an easy workaround, I'm inclined to agree.
> >>>We can always go back to Gabriel's patch (and then we'll need
> >>>Nadav's one too) but if we release a kernel with this
> >>>support it becomes an ABI and we can't go back.
> >>>
> >>>So let's be careful here, and revert the hack for 3.16.
> >>>
> >>>
> >>>Acked-by: Michael S. Tsirkin <[email protected]>
> >>>
> >>Personally, I got a custom guest which requires mwait for executing correctly.
> >Can you elaborate on this guest a little bit. With nop implementation
> >for mwait the guest will hog a host cpu. Do you consider this to be
> >"executing correctly?"
> >
> >--
>
> mwait is not as "clean" as it may appear. It encounters false wake-ups due
> to a variety of reasons, and any code need to recheck the wake-up condition
> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> degraded performance considerably (Nehalem, if I am not mistaken).
> Therefore, handling mwait as nop is logically correct (although it may
> degrade performance).
>
> For the reference, if you look at the SDM 8.10.4, you'll see:
> "Multiple events other than a write to the triggering address range can
> cause a processor that executed MWAIT to wake up. These include events that
> would lead to voluntary or involuntary context switches, such as..."
>
> Note the words "include" in the sentence "These include events". Software
> has no way of controlling whether it gets false wake-ups and cannot rely on
> the wake-up as indication to anything.
>
That's all well and good and I didn't say that nop is not a valid
mwait implementation, it is, though there is a big difference between
"encounters false wake-ups" and never sleeps. What I asked is do you
consider your guest hogging host cpu to be "executing correctly?". What
this guest is doing that such behaviour is tolerated and shouldn't it
be better to just poll for a condition you are waiting for instead of
executing expensive vmexits. This will also hog 100% host cpu, but will
be actually faster.

--
Gleb.

2014-06-19 12:10:28

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On 6/19/14, 3:07 PM, Gleb Natapov wrote:
> On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
>> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
>>> On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>>>>
>>>> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
>>>>
>>>>> On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
>>>>>> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>>>>>>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
>>>>>>>> mwait and monitor are currently handled as nop. Considering this behavior, they
>>>>>>>> should still be handled correctly, i.e., check execution conditions and generate
>>>>>>>> exceptions when required. mwait and monitor may also be executed in real-mode
>>>>>>>> and are not handled in that case. This patch performs the emulation of
>>>>>>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>>>>>>>> be used as a break event).
>>>>>>>>
>>>>>>>> Signed-off-by: Nadav Amit <[email protected]>
>>>>>>
>>>>>> How about this instead (details in the commit log below) ? Please let
>>>>>> me know what you think, and if you'd prefer me to send it out as a
>>>>>> separate patch rather than a reply to this thread.
>>>>>>
>>>>>> Thanks,
>>>>>> --Gabriel
>>>>>
>>>>> If there's an easy workaround, I'm inclined to agree.
>>>>> We can always go back to Gabriel's patch (and then we'll need
>>>>> Nadav's one too) but if we release a kernel with this
>>>>> support it becomes an ABI and we can't go back.
>>>>>
>>>>> So let's be careful here, and revert the hack for 3.16.
>>>>>
>>>>>
>>>>> Acked-by: Michael S. Tsirkin <[email protected]>
>>>>>
>>>> Personally, I got a custom guest which requires mwait for executing correctly.
>>> Can you elaborate on this guest a little bit. With nop implementation
>>> for mwait the guest will hog a host cpu. Do you consider this to be
>>> "executing correctly?"
>>>
>>> --
>>
>> mwait is not as "clean" as it may appear. It encounters false wake-ups due
>> to a variety of reasons, and any code need to recheck the wake-up condition
>> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
>> degraded performance considerably (Nehalem, if I am not mistaken).
>> Therefore, handling mwait as nop is logically correct (although it may
>> degrade performance).
>>
>> For the reference, if you look at the SDM 8.10.4, you'll see:
>> "Multiple events other than a write to the triggering address range can
>> cause a processor that executed MWAIT to wake up. These include events that
>> would lead to voluntary or involuntary context switches, such as..."
>>
>> Note the words "include" in the sentence "These include events". Software
>> has no way of controlling whether it gets false wake-ups and cannot rely on
>> the wake-up as indication to anything.
>>
> That's all well and good and I didn't say that nop is not a valid
> mwait implementation, it is, though there is a big difference between
> "encounters false wake-ups" and never sleeps. What I asked is do you
> consider your guest hogging host cpu to be "executing correctly?". What
> this guest is doing that such behaviour is tolerated and shouldn't it
> be better to just poll for a condition you are waiting for instead of
> executing expensive vmexits. This will also hog 100% host cpu, but will
> be actually faster.
>
You are correct, but unfortunately I have no control over the guest
workload. In this specific workload I do not care about performance but
only about correctness.

Nadav

2014-06-19 12:16:34

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:
> On 6/19/14, 3:07 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> >>On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >>>On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>>>
> >>>>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
> >>>>
> >>>>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> >>>>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>>>and are not handled in that case. This patch performs the emulation of
> >>>>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>>>be used as a break event).
> >>>>>>>>
> >>>>>>>>Signed-off-by: Nadav Amit <[email protected]>
> >>>>>>
> >>>>>>How about this instead (details in the commit log below) ? Please let
> >>>>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>>>separate patch rather than a reply to this thread.
> >>>>>>
> >>>>>>Thanks,
> >>>>>>--Gabriel
> >>>>>
> >>>>>If there's an easy workaround, I'm inclined to agree.
> >>>>>We can always go back to Gabriel's patch (and then we'll need
> >>>>>Nadav's one too) but if we release a kernel with this
> >>>>>support it becomes an ABI and we can't go back.
> >>>>>
> >>>>>So let's be careful here, and revert the hack for 3.16.
> >>>>>
> >>>>>
> >>>>>Acked-by: Michael S. Tsirkin <[email protected]>
> >>>>>
> >>>>Personally, I got a custom guest which requires mwait for executing correctly.
> >>>Can you elaborate on this guest a little bit. With nop implementation
> >>>for mwait the guest will hog a host cpu. Do you consider this to be
> >>>"executing correctly?"
> >>>
> >>>--
> >>
> >>mwait is not as "clean" as it may appear. It encounters false wake-ups due
> >>to a variety of reasons, and any code need to recheck the wake-up condition
> >>afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> >>degraded performance considerably (Nehalem, if I am not mistaken).
> >>Therefore, handling mwait as nop is logically correct (although it may
> >>degrade performance).
> >>
> >>For the reference, if you look at the SDM 8.10.4, you'll see:
> >>"Multiple events other than a write to the triggering address range can
> >>cause a processor that executed MWAIT to wake up. These include events that
> >>would lead to voluntary or involuntary context switches, such as..."
> >>
> >>Note the words "include" in the sentence "These include events". Software
> >>has no way of controlling whether it gets false wake-ups and cannot rely on
> >>the wake-up as indication to anything.
> >>
> >That's all well and good and I didn't say that nop is not a valid
> >mwait implementation, it is, though there is a big difference between
> >"encounters false wake-ups" and never sleeps. What I asked is do you
> >consider your guest hogging host cpu to be "executing correctly?". What
> >this guest is doing that such behaviour is tolerated and shouldn't it
> >be better to just poll for a condition you are waiting for instead of
> >executing expensive vmexits. This will also hog 100% host cpu, but will
> >be actually faster.
> >
> You are correct, but unfortunately I have no control over the guest
> workload. In this specific workload I do not care about performance but only
> about correctness.
>
Fair enough. But can you at least hint what is this mysterious guest?

--
Gleb.

2014-06-19 12:17:50

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:
> On 6/19/14, 3:07 PM, Gleb Natapov wrote:
> >On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
> >>On 6/19/14, 2:23 PM, Gleb Natapov wrote:
> >>>On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
> >>>>
> >>>>On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
> >>>>
> >>>>>On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
> >>>>>>On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
> >>>>>>>On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
> >>>>>>>>mwait and monitor are currently handled as nop. Considering this behavior, they
> >>>>>>>>should still be handled correctly, i.e., check execution conditions and generate
> >>>>>>>>exceptions when required. mwait and monitor may also be executed in real-mode
> >>>>>>>>and are not handled in that case. This patch performs the emulation of
> >>>>>>>>monitor-mwait according to Intel SDM (other than checking whether interrupt can
> >>>>>>>>be used as a break event).
> >>>>>>>>
> >>>>>>>>Signed-off-by: Nadav Amit <[email protected]>
> >>>>>>
> >>>>>>How about this instead (details in the commit log below) ? Please let
> >>>>>>me know what you think, and if you'd prefer me to send it out as a
> >>>>>>separate patch rather than a reply to this thread.
> >>>>>>
> >>>>>>Thanks,
> >>>>>>--Gabriel
> >>>>>
> >>>>>If there's an easy workaround, I'm inclined to agree.
> >>>>>We can always go back to Gabriel's patch (and then we'll need
> >>>>>Nadav's one too) but if we release a kernel with this
> >>>>>support it becomes an ABI and we can't go back.
> >>>>>
> >>>>>So let's be careful here, and revert the hack for 3.16.
> >>>>>
> >>>>>
> >>>>>Acked-by: Michael S. Tsirkin <[email protected]>
> >>>>>
> >>>>Personally, I got a custom guest which requires mwait for executing correctly.
> >>>Can you elaborate on this guest a little bit. With nop implementation
> >>>for mwait the guest will hog a host cpu. Do you consider this to be
> >>>"executing correctly?"
> >>>
> >>>--
> >>
> >>mwait is not as "clean" as it may appear. It encounters false wake-ups due
> >>to a variety of reasons, and any code need to recheck the wake-up condition
> >>afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
> >>degraded performance considerably (Nehalem, if I am not mistaken).
> >>Therefore, handling mwait as nop is logically correct (although it may
> >>degrade performance).
> >>
> >>For the reference, if you look at the SDM 8.10.4, you'll see:
> >>"Multiple events other than a write to the triggering address range can
> >>cause a processor that executed MWAIT to wake up. These include events that
> >>would lead to voluntary or involuntary context switches, such as..."
> >>
> >>Note the words "include" in the sentence "These include events". Software
> >>has no way of controlling whether it gets false wake-ups and cannot rely on
> >>the wake-up as indication to anything.
> >>
> >That's all well and good and I didn't say that nop is not a valid
> >mwait implementation, it is, though there is a big difference between
> >"encounters false wake-ups" and never sleeps. What I asked is do you
> >consider your guest hogging host cpu to be "executing correctly?". What
> >this guest is doing that such behaviour is tolerated and shouldn't it
> >be better to just poll for a condition you are waiting for instead of
> >executing expensive vmexits. This will also hog 100% host cpu, but will
> >be actually faster.
> >
> You are correct, but unfortunately I have no control over the guest
> workload. In this specific workload I do not care about performance but only
> about correctness.
>
> Nadav

No one prevents you from patching your kernel to run this workload. But
is this of use to anyone else? If yes why?

--
MST

2014-06-19 12:28:41

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH 3/3] KVM: x86: correct mwait and monitor emulation

On 6/19/14, 3:17 PM, Michael S. Tsirkin wrote:
> On Thu, Jun 19, 2014 at 03:10:21PM +0300, Nadav Amit wrote:
>> On 6/19/14, 3:07 PM, Gleb Natapov wrote:
>>> On Thu, Jun 19, 2014 at 02:52:20PM +0300, Nadav Amit wrote:
>>>> On 6/19/14, 2:23 PM, Gleb Natapov wrote:
>>>>> On Thu, Jun 19, 2014 at 01:53:36PM +0300, Nadav Amit wrote:
>>>>>>
>>>>>> On Jun 19, 2014, at 1:18 PM, Michael S. Tsirkin <[email protected]> wrote:
>>>>>>
>>>>>>> On Wed, Jun 18, 2014 at 02:46:01PM -0400, Gabriel L. Somlo wrote:
>>>>>>>> On Wed, Jun 18, 2014 at 10:59:14AM -0700, Eric Northup wrote:
>>>>>>>>> On Wed, Jun 18, 2014 at 7:19 AM, Nadav Amit <[email protected]> wrote:
>>>>>>>>>> mwait and monitor are currently handled as nop. Considering this behavior, they
>>>>>>>>>> should still be handled correctly, i.e., check execution conditions and generate
>>>>>>>>>> exceptions when required. mwait and monitor may also be executed in real-mode
>>>>>>>>>> and are not handled in that case. This patch performs the emulation of
>>>>>>>>>> monitor-mwait according to Intel SDM (other than checking whether interrupt can
>>>>>>>>>> be used as a break event).
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Nadav Amit <[email protected]>
>>>>>>>>
>>>>>>>> How about this instead (details in the commit log below) ? Please let
>>>>>>>> me know what you think, and if you'd prefer me to send it out as a
>>>>>>>> separate patch rather than a reply to this thread.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> --Gabriel
>>>>>>>
>>>>>>> If there's an easy workaround, I'm inclined to agree.
>>>>>>> We can always go back to Gabriel's patch (and then we'll need
>>>>>>> Nadav's one too) but if we release a kernel with this
>>>>>>> support it becomes an ABI and we can't go back.
>>>>>>>
>>>>>>> So let's be careful here, and revert the hack for 3.16.
>>>>>>>
>>>>>>>
>>>>>>> Acked-by: Michael S. Tsirkin <[email protected]>
>>>>>>>
>>>>>> Personally, I got a custom guest which requires mwait for executing correctly.
>>>>> Can you elaborate on this guest a little bit. With nop implementation
>>>>> for mwait the guest will hog a host cpu. Do you consider this to be
>>>>> "executing correctly?"
>>>>>
>>>>> --
>>>>
>>>> mwait is not as "clean" as it may appear. It encounters false wake-ups due
>>>> to a variety of reasons, and any code need to recheck the wake-up condition
>>>> afterwards. Actually, some CPUs had bugs that caused excessive wake-ups that
>>>> degraded performance considerably (Nehalem, if I am not mistaken).
>>>> Therefore, handling mwait as nop is logically correct (although it may
>>>> degrade performance).
>>>>
>>>> For the reference, if you look at the SDM 8.10.4, you'll see:
>>>> "Multiple events other than a write to the triggering address range can
>>>> cause a processor that executed MWAIT to wake up. These include events that
>>>> would lead to voluntary or involuntary context switches, such as..."
>>>>
>>>> Note the words "include" in the sentence "These include events". Software
>>>> has no way of controlling whether it gets false wake-ups and cannot rely on
>>>> the wake-up as indication to anything.
>>>>
>>> That's all well and good and I didn't say that nop is not a valid
>>> mwait implementation, it is, though there is a big difference between
>>> "encounters false wake-ups" and never sleeps. What I asked is do you
>>> consider your guest hogging host cpu to be "executing correctly?". What
>>> this guest is doing that such behaviour is tolerated and shouldn't it
>>> be better to just poll for a condition you are waiting for instead of
>>> executing expensive vmexits. This will also hog 100% host cpu, but will
>>> be actually faster.
>>>
>> You are correct, but unfortunately I have no control over the guest
>> workload. In this specific workload I do not care about performance but only
>> about correctness.
>>
>> Nadav
>
> No one prevents you from patching your kernel to run this workload. But
> is this of use to anyone else? If yes why?
>
I do not say it should be the default behavior, and I can try to push to
qemu some setting to turn it on by demand.

Anyhow, I believe there are cases you may want mwait support - either an
OS X guest which was not modified to run without mwait, or for debugging
the monitor-mwait flow of a guest OS.

I am not going to argue too much. Since I was under the impression there
are needs for mwait, other than mine, I thought it would make all of our
lives easier to have a better implementation.

Nadav