2016-11-22 19:21:22

by Radim Krčmář

[permalink] [raw]
Subject: [PATCH] KVM: x86: restore IP after all far jump failures

em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
bit mode, but syzkaller proved otherwise (and SDM agrees).
Code segment was restored upon failure, but it was left uninitialized
outside of long mode, which could lead to a leak of host kernel stack.

Found by syzkaller:

WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
Kernel panic - not syncing: panic_on_warn set ...

CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[...]
Call Trace:
[...] __dump_stack lib/dump_stack.c:15
[...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
[...] panic+0x1b7/0x3a3 kernel/panic.c:179
[...] __warn+0x1c4/0x1e0 kernel/panic.c:542
[...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
[...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
[...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
[...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
[...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
[...] complete_emulated_io arch/x86/kvm/x86.c:6870
[...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
[...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
[...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
[...] vfs_ioctl fs/ioctl.c:43
[...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
[...] SYSC_ioctl fs/ioctl.c:694
[...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
[...] entry_SYSCALL_64_fastpath+0x1f/0xc2

Reported-by: Dmitry Vyukov <[email protected]>
Cc: [email protected]
Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
Signed-off-by: Radim Krčmář <[email protected]>
---
Cc: Nadav Amit <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Steve Rutherford <[email protected]>
Cc: Andrew Honig <[email protected]>
Cc: Prasad Pandit <[email protected]>
---
arch/x86/kvm/emulate.c | 20 ++++++--------------
1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 7d4f9b7f06ee..d8a812b6204d 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
const struct x86_emulate_ops *ops = ctxt->ops;
u8 cpl = ctxt->ops->cpl(ctxt);

- /* Assignment of RIP may only fail in 64-bit mode */
- if (ctxt->mode == X86EMUL_MODE_PROT64)
- ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
- VCPU_SREG_CS);
+ ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);

@@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
return rc;

rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
- if (rc != X86EMUL_CONTINUE) {
- WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
+ if (rc != X86EMUL_CONTINUE)
/* assigning eip failed; restore the old cs */
ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
- return rc;
- }
+
return rc;
}

@@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
struct desc_struct old_desc, new_desc;
const struct x86_emulate_ops *ops = ctxt->ops;

- if (ctxt->mode == X86EMUL_MODE_PROT64)
- ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
- VCPU_SREG_CS);
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);

rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
@@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
if (rc != X86EMUL_CONTINUE)
return rc;
rc = assign_eip_far(ctxt, eip, &new_desc);
- if (rc != X86EMUL_CONTINUE) {
- WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
+ if (rc != X86EMUL_CONTINUE)
ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
- }
+
return rc;
}

--
2.10.2


2016-11-22 19:34:31

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures



On 22/11/2016 20:21, Radim Krčmář wrote:
> em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
> bit mode, but syzkaller proved otherwise (and SDM agrees).
> Code segment was restored upon failure, but it was left uninitialized
> outside of long mode, which could lead to a leak of host kernel stack.
>
> Found by syzkaller:
>
> WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [...]
> Call Trace:
> [...] __dump_stack lib/dump_stack.c:15
> [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
> [...] panic+0x1b7/0x3a3 kernel/panic.c:179
> [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
> [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
> [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
> [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
> [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
> [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
> [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
> [...] complete_emulated_io arch/x86/kvm/x86.c:6870
> [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
> [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
> [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
> [...] vfs_ioctl fs/ioctl.c:43
> [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
> [...] SYSC_ioctl fs/ioctl.c:694
> [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
> [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> Reported-by: Dmitry Vyukov <[email protected]>
> Cc: [email protected]
> Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> Cc: Nadav Amit <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Steve Rutherford <[email protected]>
> Cc: Andrew Honig <[email protected]>
> Cc: Prasad Pandit <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 20 ++++++--------------
> 1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 7d4f9b7f06ee..d8a812b6204d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
> const struct x86_emulate_ops *ops = ctxt->ops;
> u8 cpl = ctxt->ops->cpl(ctxt);
>
> - /* Assignment of RIP may only fail in 64-bit mode */
> - if (ctxt->mode == X86EMUL_MODE_PROT64)
> - ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
> - VCPU_SREG_CS);
> + ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
>
> memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
>
> @@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
> return rc;
>
> rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
> - if (rc != X86EMUL_CONTINUE) {
> - WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> + if (rc != X86EMUL_CONTINUE)
> /* assigning eip failed; restore the old cs */
> ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
> - return rc;
> - }
> +
> return rc;
> }
>
> @@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
> struct desc_struct old_desc, new_desc;
> const struct x86_emulate_ops *ops = ctxt->ops;
>
> - if (ctxt->mode == X86EMUL_MODE_PROT64)
> - ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
> - VCPU_SREG_CS);
> + ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
>
> rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
> if (rc != X86EMUL_CONTINUE)
> @@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
> if (rc != X86EMUL_CONTINUE)
> return rc;
> rc = assign_eip_far(ctxt, eip, &new_desc);
> - if (rc != X86EMUL_CONTINUE) {
> - WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> + if (rc != X86EMUL_CONTINUE)
> ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
> - }
> +
> return rc;
> }
>
>

Reviewed-by: Paolo Bonzini <[email protected]>

2016-11-22 19:44:55

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures

I admit my wrongdoings, but I still think the fix should have been to
remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
something goes wrong (exception). This will kill the misbehaving process
but keep the VM running.

Otherwise, a malicious VM process, which can somehow control descriptors
(LDT?) may modify the descriptor during the emulation and get the system
to inconsistent state and prevent the VM-entry.

No?

Nadav

> On Nov 22, 2016, at 11:34 AM, Paolo Bonzini <[email protected]> wrote:
>
>
>
> On 22/11/2016 20:21, Radim Krčmář wrote:
>> em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
>> bit mode, but syzkaller proved otherwise (and SDM agrees).
>> Code segment was restored upon failure, but it was left uninitialized
>> outside of long mode, which could lead to a leak of host kernel stack.
>>
>> Found by syzkaller:
>>
>> WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> [...]
>> Call Trace:
>> [...] __dump_stack lib/dump_stack.c:15
>> [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
>> [...] panic+0x1b7/0x3a3 kernel/panic.c:179
>> [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
>> [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
>> [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
>> [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
>> [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
>> [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
>> [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
>> [...] complete_emulated_io arch/x86/kvm/x86.c:6870
>> [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
>> [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
>> [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
>> [...] vfs_ioctl fs/ioctl.c:43
>> [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
>> [...] SYSC_ioctl fs/ioctl.c:694
>> [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
>> [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
>>
>> Reported-by: Dmitry Vyukov <[email protected]>
>> Cc: [email protected]
>> Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
>> Signed-off-by: Radim Krčmář <[email protected]>
>> ---
>> Cc: Nadav Amit <[email protected]>
>> Cc: Dmitry Vyukov <[email protected]>
>> Cc: Steve Rutherford <[email protected]>
>> Cc: Andrew Honig <[email protected]>
>> Cc: Prasad Pandit <[email protected]>
>> ---
>> arch/x86/kvm/emulate.c | 20 ++++++--------------
>> 1 file changed, 6 insertions(+), 14 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
>> index 7d4f9b7f06ee..d8a812b6204d 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>> const struct x86_emulate_ops *ops = ctxt->ops;
>> u8 cpl = ctxt->ops->cpl(ctxt);
>>
>> - /* Assignment of RIP may only fail in 64-bit mode */
>> - if (ctxt->mode == X86EMUL_MODE_PROT64)
>> - ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
>> - VCPU_SREG_CS);
>> + ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
>>
>> memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
>>
>> @@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
>> return rc;
>>
>> rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
>> - if (rc != X86EMUL_CONTINUE) {
>> - WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
>> + if (rc != X86EMUL_CONTINUE)
>> /* assigning eip failed; restore the old cs */
>> ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
>> - return rc;
>> - }
>> +
>> return rc;
>> }
>>
>> @@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>> struct desc_struct old_desc, new_desc;
>> const struct x86_emulate_ops *ops = ctxt->ops;
>>
>> - if (ctxt->mode == X86EMUL_MODE_PROT64)
>> - ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
>> - VCPU_SREG_CS);
>> + ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
>>
>> rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
>> if (rc != X86EMUL_CONTINUE)
>> @@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
>> if (rc != X86EMUL_CONTINUE)
>> return rc;
>> rc = assign_eip_far(ctxt, eip, &new_desc);
>> - if (rc != X86EMUL_CONTINUE) {
>> - WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
>> + if (rc != X86EMUL_CONTINUE)
>> ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
>> - }
>> +
>> return rc;
>> }
>>
>>
>
> Reviewed-by: Paolo Bonzini <[email protected]>


2016-11-22 20:03:07

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures

Should the subject read: "KVM: x86: restore CS after all far jump failures"?

On Tue, Nov 22, 2016 at 11:21 AM, Radim Krčmář <[email protected]> wrote:
> em_jmp_far and em_ret_far assumed that setting IP can only fail in 64
> bit mode, but syzkaller proved otherwise (and SDM agrees).
> Code segment was restored upon failure, but it was left uninitialized
> outside of long mode, which could lead to a leak of host kernel stack.
>
> Found by syzkaller:
>
> WARNING: CPU: 2 PID: 3668 at arch/x86/kvm/emulate.c:2217 em_ret_far+0x428/0x480
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 2 PID: 3668 Comm: syz-executor Not tainted 4.9.0-rc4+ #49
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [...]
> Call Trace:
> [...] __dump_stack lib/dump_stack.c:15
> [...] dump_stack+0xb3/0x118 lib/dump_stack.c:51
> [...] panic+0x1b7/0x3a3 kernel/panic.c:179
> [...] __warn+0x1c4/0x1e0 kernel/panic.c:542
> [...] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
> [...] em_ret_far+0x428/0x480 arch/x86/kvm/emulate.c:2217
> [...] em_ret_far_imm+0x17/0x70 arch/x86/kvm/emulate.c:2227
> [...] x86_emulate_insn+0x87a/0x3730 arch/x86/kvm/emulate.c:5294
> [...] x86_emulate_instruction+0x520/0x1ba0 arch/x86/kvm/x86.c:5545
> [...] emulate_instruction arch/x86/include/asm/kvm_host.h:1116
> [...] complete_emulated_io arch/x86/kvm/x86.c:6870
> [...] complete_emulated_mmio+0x4e9/0x710 arch/x86/kvm/x86.c:6934
> [...] kvm_arch_vcpu_ioctl_run+0x3b7a/0x5a90 arch/x86/kvm/x86.c:6978
> [...] kvm_vcpu_ioctl+0x61e/0xdd0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2557
> [...] vfs_ioctl fs/ioctl.c:43
> [...] do_vfs_ioctl+0x18c/0x1040 fs/ioctl.c:679
> [...] SYSC_ioctl fs/ioctl.c:694
> [...] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
> [...] entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> Reported-by: Dmitry Vyukov <[email protected]>
> Cc: [email protected]
> Fixes: d1442d85cc30 ("KVM: x86: Handle errors when RIP is set during far jumps")
> Signed-off-by: Radim Krčmář <[email protected]>
> ---
> Cc: Nadav Amit <[email protected]>
> Cc: Dmitry Vyukov <[email protected]>
> Cc: Steve Rutherford <[email protected]>
> Cc: Andrew Honig <[email protected]>
> Cc: Prasad Pandit <[email protected]>
> ---
> arch/x86/kvm/emulate.c | 20 ++++++--------------
> 1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 7d4f9b7f06ee..d8a812b6204d 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2137,10 +2137,7 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
> const struct x86_emulate_ops *ops = ctxt->ops;
> u8 cpl = ctxt->ops->cpl(ctxt);
>
> - /* Assignment of RIP may only fail in 64-bit mode */
> - if (ctxt->mode == X86EMUL_MODE_PROT64)
> - ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
> - VCPU_SREG_CS);
> + ops->get_segment(ctxt, &old_sel, &old_desc, NULL, VCPU_SREG_CS);
>
> memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
>
> @@ -2151,12 +2148,10 @@ static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
> return rc;
>
> rc = assign_eip_far(ctxt, ctxt->src.val, &new_desc);
> - if (rc != X86EMUL_CONTINUE) {
> - WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> + if (rc != X86EMUL_CONTINUE)
> /* assigning eip failed; restore the old cs */
> ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
> - return rc;
> - }
> +
> return rc;
> }
>
> @@ -2221,9 +2216,7 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
> struct desc_struct old_desc, new_desc;
> const struct x86_emulate_ops *ops = ctxt->ops;
>
> - if (ctxt->mode == X86EMUL_MODE_PROT64)
> - ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
> - VCPU_SREG_CS);
> + ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);
>
> rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
> if (rc != X86EMUL_CONTINUE)
> @@ -2240,10 +2233,9 @@ static int em_ret_far(struct x86_emulate_ctxt *ctxt)
> if (rc != X86EMUL_CONTINUE)
> return rc;
> rc = assign_eip_far(ctxt, eip, &new_desc);
> - if (rc != X86EMUL_CONTINUE) {
> - WARN_ON(ctxt->mode != X86EMUL_MODE_PROT64);
> + if (rc != X86EMUL_CONTINUE)
> ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
> - }
> +
> return rc;
> }
>
> --
> 2.10.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2016-11-22 20:56:55

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures

2016-11-22 11:43-0800, Nadav Amit:
> I admit my wrongdoings, but I still think the fix should have been to
> remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
> something goes wrong (exception). This will kill the misbehaving process
> but keep the VM running.

X86EMUL_UNHANDLEABLE will kill the whole VM (on QEMU, other userspaces
might handle the instruction and resume KVM).

The recovery path is in the spec, which means that nothing goes wrong.
I think we implement the spec quite well now, so keeping the #GP and CS
recovery is slightly better, although not safer.

> Otherwise, a malicious VM process, which can somehow control descriptors
> (LDT?) may modify the descriptor during the emulation and get the system
> to inconsistent state and prevent the VM-entry.

We restore the original CS -- malicious guest would get killed on a
failed VM entry anyway, so the difference is only in KVM internal error
code (assuming there are no other bugs).

Am I misunderstanding something?

2016-11-22 20:58:11

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures

2016-11-22 12:03-0800, Jim Mattson:
> Should the subject read: "KVM: x86: restore CS after all far jump failures"?

Yes, thanks. Maybe "preserve CS" would be even better.

I'll fix it when applying if there are no other problems.

2016-11-22 23:18:45

by Nadav Amit

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures


> On Nov 22, 2016, at 12:56 PM, Radim Krčmář <[email protected]> wrote:
>
> 2016-11-22 11:43-0800, Nadav Amit:
>> I admit my wrongdoings, but I still think the fix should have been to
>> remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
>> something goes wrong (exception). This will kill the misbehaving process
>> but keep the VM running.
>
> X86EMUL_UNHANDLEABLE will kill the whole VM (on QEMU, other userspaces
> might handle the instruction and resume KVM).

I don’t think so. If CPL is not 0, handle_emulation_failure() will be called
and will inject #UD.

>
> The recovery path is in the spec, which means that nothing goes wrong.
> I think we implement the spec quite well now, so keeping the #GP and CS
> recovery is slightly better, although not safer.
>
>> Otherwise, a malicious VM process, which can somehow control descriptors
>> (LDT?) may modify the descriptor during the emulation and get the system
>> to inconsistent state and prevent the VM-entry.
>
> We restore the original CS -- malicious guest would get killed on a
> failed VM entry anyway, so the difference is only in KVM internal error
> code (assuming there are no other bugs).
>
> Am I misunderstanding something?

Most likely you are right, but after doing one mistake, I don’t want
to be accountable for another.

Note there is another problematic case in em_ret_far(). If an exception
occurs when RIP is assigned, the RSP updates (of emulate_pop() ) are not
going to be reverted. Can it be used for anything malicious? I don’t know.

Nadav

2016-11-23 16:24:03

by Radim Krčmář

[permalink] [raw]
Subject: Re: [PATCH] KVM: x86: restore IP after all far jump failures

2016-11-22 15:18-0800, Nadav Amit:
>> On Nov 22, 2016, at 12:56 PM, Radim Krčmář <[email protected]> wrote:
>>
>> 2016-11-22 11:43-0800, Nadav Amit:
>>> I admit my wrongdoings, but I still think the fix should have been to
>>> remove the entire recovery logic and just return X86EMUL_UNHANDLEABLE if
>>> something goes wrong (exception). This will kill the misbehaving process
>>> but keep the VM running.
>>
>> X86EMUL_UNHANDLEABLE will kill the whole VM (on QEMU, other userspaces
>> might handle the instruction and resume KVM).
>
> I don’t think so. If CPL is not 0, handle_emulation_failure() will be called
> and will inject #UD.

This part of the emulator should only be used on Intels without
unrestricted guest, for real and unpaged protected mode.

It is unsafe as we don't make a clear distinction which instructions are
emulated for MMIO and which for old CPUs, but anyway, we should not be
emulating it only any OS that considers security.

(x86 can use pretty much any instruction for MMIO, but I think that KVM
should let a userspace emulator handle those exotic OSes.)

>> The recovery path is in the spec, which means that nothing goes wrong.
>> I think we implement the spec quite well now, so keeping the #GP and CS
>> recovery is slightly better, although not safer.
>>
>>> Otherwise, a malicious VM process, which can somehow control descriptors
>>> (LDT?) may modify the descriptor during the emulation and get the system
>>> to inconsistent state and prevent the VM-entry.
>>
>> We restore the original CS -- malicious guest would get killed on a
>> failed VM entry anyway, so the difference is only in KVM internal error
>> code (assuming there are no other bugs).
>>
>> Am I misunderstanding something?
>
> Most likely you are right, but after doing one mistake, I don’t want
> to be accountable for another.
>
> Note there is another problematic case in em_ret_far(). If an exception
> occurs when RIP is assigned, the RSP updates (of emulate_pop() ) are not
> going to be reverted. Can it be used for anything malicious? I don’t know.

True, we are not emulating it right ... security is just side-effect.
Benefits of emulation are not worth the work so I'll prepare a patch
that just returns X86EMUL_UNHANDLEABLE. It's not going to fix other
bugs with stack handling, but at least we won't be making it crazier.

Thanks.