Hi guys,
so I'm looking at this splat below when booting current linus+tip/master
in a kvm guest. Initially I thought this is something related to the
PARAVIRT gunk but it happens with and without it.
So, from what I can see, we first #DF and then lockdep fires a deadlock
warning. That I can understand but what I can't understand is why we #DF
with this RIP:
[ 2.744062] RIP: 0010:[<ffffffff816139df>] [<ffffffff816139df>] __schedule+0x28f/0xab0
disassembling this points to
/*
* Since the runqueue lock will be released by the next
* task (which is an invalid locking op but in the case
* of the scheduler it's an obvious special-case), so we
* do an early lockdep release here:
*/
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif
this call in context_switch() (provided this RIP is correct, of course).
(btw, various dumps at the end of this mail with the "<---- faulting"
marker).
And that's lock_release() in lockdep.c.
What's also interesting is that we have two __schedule calls on the stack
before #DF:
[ 2.744062] [<ffffffff816139ce>] ? __schedule+0x27e/0xab0
[ 2.744062] [<ffffffff816139df>] ? __schedule+0x28f/0xab0
The show_stack_log_lvl() oops I'm attributing to the userspace stack not
being mapped anymore while we're trying to walk it (we do have a %cr3
write shortly before the RIP we're faulting at), which is another snafu
that shouldn't happen, i.e., we should detect that and simply not walk
it at all...
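Just to make that unmapped-stack theory concrete, here is a tiny userspace model (an illustration only, not the kernel's dumpstack code) of the kind of "is this a user address?" check the stack walker could do before dereferencing anything. The cutoff is the top of the canonical user half of the x86-64 address space, and the sample value is the RSP from the #DF dump below:
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
/* top of the canonical user half on x86-64; kernel addresses live far above it */
#define USER_VA_MAX 0x00007fffffffffffULL
static bool is_user_address(uint64_t addr)
{
	return addr <= USER_VA_MAX;
}
int main(void)
{
	uint64_t rsp = 0x00007fffb47a8730ULL;	/* RSP from the #DF dump below */
	printf("%#llx is a %s address\n", (unsigned long long)rsp,
	       is_user_address(rsp) ? "user" : "kernel");
	return 0;
}
That RSP classifies as a user address, so after the %cr3 switch there is no guarantee it is mapped, which fits the walker faulting on it.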
Anyway, this is what I can see - any and all suggestions on how to debug
this further are appreciated. More info available upon request.
Thanks.
[ 1.932807] devtmpfs: mounted
[ 1.938324] Freeing unused kernel memory: 2872K (ffffffff819ad000 - ffffffff81c7b000)
[ 2.450824] udevd[814]: starting version 175
[ 2.743648] PANIC: double fault, error_code: 0x0
[ 2.743657]
[ 2.744062] ======================================================
[ 2.744062] [ INFO: possible circular locking dependency detected ]
[ 2.744062] 3.16.0-rc2+ #2 Not tainted
[ 2.744062] -------------------------------------------------------
[ 2.744062] vmmouse_detect/957 is trying to acquire lock:
[ 2.744062] ((console_sem).lock){-.....}, at: [<ffffffff81092dcd>] down_trylock+0x1d/0x50
[ 2.744062]
[ 2.744062] but task is already holding lock:
[ 2.744062] (&rq->lock){-.-.-.}, at: [<ffffffff8161382f>] __schedule+0xdf/0xab0
[ 2.744062]
[ 2.744062] which lock already depends on the new lock.
[ 2.744062]
[ 2.744062]
[ 2.744062] the existing dependency chain (in reverse order) is:
[ 2.744062]
-> #2 (&rq->lock){-.-.-.}:
[ 2.744062] [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[ 2.744062] [<ffffffff81619111>] _raw_spin_lock+0x41/0x80
[ 2.744062] [<ffffffff8108090b>] wake_up_new_task+0xbb/0x290
[ 2.744062] [<ffffffff8104e847>] do_fork+0x147/0x770
[ 2.744062] [<ffffffff8104ee96>] kernel_thread+0x26/0x30
[ 2.744062] [<ffffffff8160e282>] rest_init+0x22/0x140
[ 2.744062] [<ffffffff81b82e3e>] start_kernel+0x408/0x415
[ 2.744062] [<ffffffff81b82463>] x86_64_start_reservations+0x2a/0x2c
[ 2.744062] [<ffffffff81b8255b>] x86_64_start_kernel+0xf6/0xf9
[ 2.744062]
-> #1 (&p->pi_lock){-.-.-.}:
[ 2.744062] [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[ 2.744062] [<ffffffff81619333>] _raw_spin_lock_irqsave+0x53/0x90
[ 2.744062] [<ffffffff810803b1>] try_to_wake_up+0x31/0x450
[ 2.744062] [<ffffffff810807f3>] wake_up_process+0x23/0x40
[ 2.744062] [<ffffffff816177ff>] __up.isra.0+0x1f/0x30
[ 2.744062] [<ffffffff81092fc1>] up+0x41/0x50
[ 2.744062] [<ffffffff810ac7b8>] console_unlock+0x258/0x490
[ 2.744062] [<ffffffff810acc81>] vprintk_emit+0x291/0x610
[ 2.744062] [<ffffffff8161185c>] printk+0x4f/0x57
[ 2.744062] [<ffffffff81486ad1>] input_register_device+0x401/0x4d0
[ 2.744062] [<ffffffff814909b4>] atkbd_connect+0x2b4/0x2e0
[ 2.744062] [<ffffffff81481a3b>] serio_connect_driver+0x3b/0x60
[ 2.744062] [<ffffffff81481a80>] serio_driver_probe+0x20/0x30
[ 2.744062] [<ffffffff813cd8e5>] really_probe+0x75/0x230
[ 2.744062] [<ffffffff813cdbc1>] __driver_attach+0xb1/0xc0
[ 2.744062] [<ffffffff813cb97b>] bus_for_each_dev+0x6b/0xb0
[ 2.744062] [<ffffffff813cd43e>] driver_attach+0x1e/0x20
[ 2.744062] [<ffffffff81482ded>] serio_handle_event+0x14d/0x1f0
[ 2.744062] [<ffffffff8106c9d7>] process_one_work+0x1c7/0x680
[ 2.744062] [<ffffffff8106d77b>] worker_thread+0x6b/0x540
[ 2.744062] [<ffffffff81072ec8>] kthread+0x108/0x120
[ 2.744062] [<ffffffff8161a3ac>] ret_from_fork+0x7c/0xb0
[ 2.744062]
-> #0 ((console_sem).lock){-.....}:
[ 2.744062] [<ffffffff8109b564>] __lock_acquire+0x1f14/0x2290
[ 2.744062] [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[ 2.744062] [<ffffffff81619333>] _raw_spin_lock_irqsave+0x53/0x90
[ 2.744062] [<ffffffff81092dcd>] down_trylock+0x1d/0x50
[ 2.744062] [<ffffffff810ac2ae>] console_trylock+0x1e/0xb0
[ 2.744062] [<ffffffff810acc63>] vprintk_emit+0x273/0x610
[ 2.744062] [<ffffffff8161185c>] printk+0x4f/0x57
[ 2.744062] [<ffffffff8103d10b>] df_debug+0x1b/0x40
[ 2.744062] [<ffffffff81003d1d>] do_double_fault+0x5d/0x80
[ 2.744062] [<ffffffff8161bf87>] double_fault+0x27/0x30
[ 2.744062]
[ 2.744062] other info that might help us debug this:
[ 2.744062]
[ 2.744062] Chain exists of:
(console_sem).lock --> &p->pi_lock --> &rq->lock
[ 2.744062] Possible unsafe locking scenario:
[ 2.744062]
[ 2.744062] CPU0 CPU1
[ 2.744062] ---- ----
[ 2.744062] lock(&rq->lock);
[ 2.744062] lock(&p->pi_lock);
[ 2.744062] lock(&rq->lock);
[ 2.744062] lock((console_sem).lock);
[ 2.744062]
[ 2.744062] *** DEADLOCK ***
[ 2.744062]
[ 2.744062] 1 lock held by vmmouse_detect/957:
[ 2.744062] #0: (&rq->lock){-.-.-.}, at: [<ffffffff8161382f>] __schedule+0xdf/0xab0
[ 2.744062]
[ 2.744062] stack backtrace:
[ 2.744062] CPU: 0 PID: 957 Comm: vmmouse_detect Not tainted 3.16.0-rc2+ #2
[ 2.744062] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2.744062] ffffffff823f00a0 ffff88007c205c50 ffffffff8161206f ffffffff823f2d30
[ 2.744062] ffff88007c205c90 ffffffff81095b3b ffffffff827f4980 ffff88007aab9ad8
[ 2.744062] ffff88007aab93a8 ffff88007aab9370 0000000000000001 ffff88007aab9aa0
[ 2.744062] Call Trace:
[ 2.744062] <#DF> [<ffffffff8161206f>] dump_stack+0x4e/0x7a
[ 2.744062] [<ffffffff81095b3b>] print_circular_bug+0x1fb/0x330
[ 2.744062] [<ffffffff8109b564>] __lock_acquire+0x1f14/0x2290
[ 2.744062] [<ffffffff8109c0d9>] lock_acquire+0xb9/0x200
[ 2.744062] [<ffffffff81092dcd>] ? down_trylock+0x1d/0x50
[ 2.744062] [<ffffffff81619333>] _raw_spin_lock_irqsave+0x53/0x90
[ 2.744062] [<ffffffff81092dcd>] ? down_trylock+0x1d/0x50
[ 2.744062] [<ffffffff810acc63>] ? vprintk_emit+0x273/0x610
[ 2.744062] [<ffffffff81092dcd>] down_trylock+0x1d/0x50
[ 2.744062] [<ffffffff810acc63>] ? vprintk_emit+0x273/0x610
[ 2.744062] [<ffffffff810ac2ae>] console_trylock+0x1e/0xb0
[ 2.744062] [<ffffffff810acc63>] vprintk_emit+0x273/0x610
[ 2.744062] [<ffffffff8161185c>] printk+0x4f/0x57
[ 2.744062] [<ffffffff8103d10b>] df_debug+0x1b/0x40
[ 2.744062] [<ffffffff81003d1d>] do_double_fault+0x5d/0x80
[ 2.744062] [<ffffffff8161bf87>] double_fault+0x27/0x30
[ 2.744062] [<ffffffff816139ce>] ? __schedule+0x27e/0xab0
[ 2.744062] [<ffffffff816139df>] ? __schedule+0x28f/0xab0
[ 2.744062] <<EOE>> <UNK>
[ 2.744062] CPU: 0 PID: 957 Comm: vmmouse_detect Not tainted 3.16.0-rc2+ #2
[ 2.744062] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2.744062] task: ffff88007aab9370 ti: ffff88007abb8000 task.ti: ffff88007abb8000
[ 2.744062] RIP: 0010:[<ffffffff816139df>] [<ffffffff816139df>] __schedule+0x28f/0xab0
[ 2.744062] RSP: 002b:00007fffb47a8730 EFLAGS: 00013086
[ 2.744062] RAX: 000000007b4b2000 RBX: ffff88007b0cb200 RCX: 0000000000000028
[ 2.744062] RDX: ffffffff816139ce RSI: 0000000000000001 RDI: ffff88007c3d3a18
[ 2.744062] RBP: 00007fffb47a8820 R08: 0000000000000000 R09: 0000000000002dd4
[ 2.744062] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c3d3a00
[ 2.744062] R13: ffff880079c24a00 R14: 0000000000000000 R15: ffff88007aab9370
[ 2.744062] FS: 00007f48052ad700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[ 2.744062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.744062] CR2: 00007fffb47a8728 CR3: 000000007b4b2000 CR4: 00000000000006f0
[ 2.744062] Stack:
[ 2.744062] BUG: unable to handle kernel paging request at 00007fffb47a8730
[ 2.744062] IP: [<ffffffff81005a4c>] show_stack_log_lvl+0x11c/0x1d0
[ 2.744062] PGD 7b210067 PUD 0
[ 2.744062] Oops: 0000 [#1] PREEMPT SMP
[ 2.744062] Modules linked in:
[ 2.744062] CPU: 0 PID: 957 Comm: vmmouse_detect Not tainted 3.16.0-rc2+ #2
[ 2.744062] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2.744062] task: ffff88007aab9370 ti: ffff88007abb8000 task.ti: ffff88007abb8000
[ 2.744062] RIP: 0010:[<ffffffff81005a4c>] [<ffffffff81005a4c>] show_stack_log_lvl+0x11c/0x1d0
[ 2.744062] RSP: 002b:ffff88007c205e58 EFLAGS: 00013046
[ 2.744062] RAX: 00007fffb47a8738 RBX: 0000000000000000 RCX: ffff88007c203fc0
[ 2.744062] RDX: 00007fffb47a8730 RSI: ffff88007c200000 RDI: ffffffff8184e0ea
[ 2.744062] RBP: ffff88007c205ea8 R08: ffff88007c1fffc0 R09: 0000000000000000
[ 2.744062] R10: 000000007c200000 R11: 0000000000000000 R12: ffff88007c205f58
[ 2.744062] R13: 0000000000000000 R14: ffffffff8181b584 R15: 0000000000000000
[ 2.744062] FS: 00007f48052ad700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[ 2.744062] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.744062] CR2: 00007fffb47a8730 CR3: 000000007b4b2000 CR4: 00000000000006f0
[ 2.744062] Stack:
[ 2.744062] 0000000000000008 ffff88007c205eb8 ffff88007c205e70 000000007b4b2000
[ 2.744062] 00007fffb47a8730 ffff88007c205f58 00007fffb47a8730 0000000000000040
[ 2.744062] 0000000000000ac0 ffff88007aab9370 ffff88007c205f08 ffffffff81005ba0
[ 2.744062] Call Trace:
[ 2.744062] <#DF>
[ 2.744062] [<ffffffff81005ba0>] show_regs+0xa0/0x280
[ 2.744062] [<ffffffff8103d113>] df_debug+0x23/0x40
[ 2.744062] [<ffffffff81003d1d>] do_double_fault+0x5d/0x80
[ 2.744062] [<ffffffff8161bf87>] double_fault+0x27/0x30
[ 2.744062] [<ffffffff816139ce>] ? __schedule+0x27e/0xab0
[ 2.744062] [<ffffffff816139df>] ? __schedule+0x28f/0xab0
[ 2.744062] <<EOE>>
[ 2.744062] <UNK> Code: 7a ff ff ff 0f 1f 00 e8 d3 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 7c b5 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7
[ 2.744062] RIP [<ffffffff81005a4c>] show_stack_log_lvl+0x11c/0x1d0
[ 2.744062] RSP <ffff88007c205e58>
[ 2.744062] CR2: 00007fffb47a8730
[ 2.744062] ---[ end trace 5cdf016839902dea ]---
[ 2.744062] note: vmmouse_detect[957] exited with preempt_count 3
247: 48 8b 7b 40 mov 0x40(%rbx),%rdi
24b: e8 00 00 00 00 callq 250 <__schedule+0x250>
250: 0f 22 d8 mov %rax,%cr3
253: f0 4d 0f b3 b5 88 03 lock btr %r14,0x388(%r13)
25a: 00 00
25c: 4c 8b b3 90 03 00 00 mov 0x390(%rbx),%r14
263: 4d 39 b5 90 03 00 00 cmp %r14,0x390(%r13)
26a: 0f 85 38 06 00 00 jne 8a8 <__schedule+0x8a8>
270: 49 83 bf 88 02 00 00 cmpq $0x0,0x288(%r15)
277: 00
278: 0f 84 9a 03 00 00 je 618 <__schedule+0x618>
27e: 49 8d 7c 24 18 lea 0x18(%r12),%rdi
283: 48 c7 c2 00 00 00 00 mov $0x0,%rdx
28a: be 01 00 00 00 mov $0x1,%esi
28f: e8 00 00 00 00 callq 294 <__schedule+0x294> <--- faulting
294: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
299: 4c 89 ff mov %r15,%rdi
29c: 9c pushfq
29d: 55 push %rbp
29e: 48 89 f5 mov %rsi,%rbp
2a1: 48 89 a7 f0 04 00 00 mov %rsp,0x4f0(%rdi)
2a8: 48 8b a6 f0 04 00 00 mov 0x4f0(%rsi),%rsp
2af: e8 00 00 00 00 callq 2b4 <__schedule+0x2b4>
2b4: 65 48 8b 34 25 00 00 mov %gs:0x0,%rsi
ffffffff81613997: 48 8b 7b 40 mov 0x40(%rbx),%rdi
ffffffff8161399b: e8 50 28 a3 ff callq ffffffff810461f0 <__phys_addr>
ffffffff816139a0: 0f 22 d8 mov %rax,%cr3
ffffffff816139a3: f0 4d 0f b3 b5 88 03 lock btr %r14,0x388(%r13)
ffffffff816139aa: 00 00
ffffffff816139ac: 4c 8b b3 90 03 00 00 mov 0x390(%rbx),%r14
ffffffff816139b3: 4d 39 b5 90 03 00 00 cmp %r14,0x390(%r13)
ffffffff816139ba: 0f 85 38 06 00 00 jne ffffffff81613ff8 <__schedule+0x8a8>
ffffffff816139c0: 49 83 bf 88 02 00 00 cmpq $0x0,0x288(%r15)
ffffffff816139c7: 00
ffffffff816139c8: 0f 84 9a 03 00 00 je ffffffff81613d68 <__schedule+0x618>
ffffffff816139ce: 49 8d 7c 24 18 lea 0x18(%r12),%rdi
ffffffff816139d3: 48 c7 c2 ce 39 61 81 mov $0xffffffff816139ce,%rdx
ffffffff816139da: be 01 00 00 00 mov $0x1,%esi
ffffffff816139df: e8 bc 82 a8 ff callq ffffffff8109bca0 <lock_release> <--- faulting
ffffffff816139e4: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
ffffffff816139e9: 4c 89 ff mov %r15,%rdi
ffffffff816139ec: 9c pushfq
ffffffff816139ed: 55 push %rbp
ffffffff816139ee: 48 89 f5 mov %rsi,%rbp
ffffffff816139f1: 48 89 a7 f0 04 00 00 mov %rsp,0x4f0(%rdi)
ffffffff816139f8: 48 8b a6 f0 04 00 00 mov 0x4f0(%rsi),%rsp
ffffffff816139ff: e8 cc d9 9e ff callq ffffffff810013d0 <__switch_to>
ffffffff81613a04: 65 48 8b 34 25 00 b9 mov %gs:0xb900,%rsi
# 80 "./arch/x86/include/asm/bitops.h" 1
.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
lock; bts %r14,904(%rbx) # D.63059, MEM[(volatile long int *)_201]
# 0 "" 2
#NO_APP
movq 64(%rbx), %rdi # mm_193->pgd, mm_193->pgd
call __phys_addr #
#APP
# 54 "./arch/x86/include/asm/special_insns.h" 1
mov %rax,%cr3 # D.63056
# 0 "" 2
# 117 "./arch/x86/include/asm/bitops.h" 1
.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
lock; btr %r14,904(%r13) # D.63059, MEM[(volatile long int *)_215]
# 0 "" 2
#NO_APP
movq 912(%rbx), %r14 # mm_193->context.ldt, D.63062
cmpq %r14, 912(%r13) # D.63062, oldmm_194->context.ldt
jne .L2117 #,
.L2023:
cmpq $0, 648(%r15) #, prev_21->mm
je .L2118 #,
.L2029:
leaq 24(%r12), %rdi #, D.63079
movq $.L2029, %rdx #,
movl $1, %esi #,
call lock_release # <---faulting
movq 24(%rsp), %rsi # %sfp, D.63067
movq %r15, %rdi # prev, prev
#APP
# 2338 "kernel/sched/core.c" 1
pushf ; pushq %rbp ; movq %rsi,%rbp
movq %rsp,1264(%rdi) #, prev
movq 1264(%rsi),%rsp #, D.63067
call __switch_to
movq %gs:current_task,%rsi # current_task
movq 760(%rsi),%r8 #
movq %r8,%gs:irq_stack_union+40 # irq_stack_union.D.4635.stack_canary
movq 8(%rsi),%r8 #
movq %rax,%rdi
testl $262144,16(%r8) #,
jnz ret_from_fork
movq %rbp,%rsi ; popq %rbp ; popf
# 0 "" 2
#NO_APP
movq %rax, 24(%rsp) # prev, %sfp
call debug_smp_processor_id #
movl %eax, %eax # D.63055, D.63055
movq $runqueues, %rbx #, __ptr
movq 24(%rsp), %rsi # %sfp, prev
movq %rbx, %rdi # __ptr, D.63056
addq __per_cpu_offset(,%rax,8), %rdi # __per_cpu_offset, D.63056
movq $runqueues, %r12 #, __ptr
call finish_task_switch #
call debug_smp_processor_id #
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote:
> Hi guys,
>
> so I'm looking at this splat below when booting current linus+tip/master
> in a kvm guest. Initially I thought this is something related to the
> PARAVIRT gunk but it happens with and without it.
Ok, here's a cleaner splat. I went and rebuilt qemu to latest master
from today to rule out some breakage there but it still fires.
Paolo, any ideas why kvm+qemu would trigger a #DF in the guest? I guess
I should dust off my old kvm/qemu #DF debugging patch I had somewhere...
I did try to avoid the invalid stack issue by doing:
---
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 1abcb50b48ae..dd8e0eec071e 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -286,7 +286,7 @@ show_stack_log_lvl(struct task_struct *task, struct pt_regs *regs,
}
if (i && ((i % STACKSLOTS_PER_LINE) == 0))
pr_cont("\n");
- pr_cont(" %016lx", *stack++);
+ pr_cont(" %016lx", (((unsigned long)stack <= 0x00007fffffffffffUL) ? -1 : *stack++));
touch_nmi_watchdog();
}
preempt_enable();
---
but that didn't work either - see second splat at the end.
[ 2.704184] PANIC: double fault, error_code: 0x0
[ 2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[ 2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000
[ 2.708132] RIP: 0010:[<ffffffff8161130f>] [<ffffffff8161130f>] __schedule+0x28f/0xab0
[ 2.708132] RSP: 002b:00007fff99e51100 EFLAGS: 00013082
[ 2.708132] RAX: 000000007b206000 RBX: ffff88007b526f80 RCX: 0000000000000028
[ 2.708132] RDX: ffffffff816112fe RSI: 0000000000000001 RDI: ffff88007c5d3c58
[ 2.708132] RBP: 00007fff99e511f0 R08: 0000000000000000 R09: 0000000000000000
[ 2.708132] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c5d3c40
[ 2.708132] R13: ffff880079c84e40 R14: 0000000000000000 R15: ffff880079c78000
[ 2.708132] FS: 00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 2.708132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.708132] CR2: 00007fff99e510f8 CR3: 000000007b206000 CR4: 00000000000006e0
[ 2.708132] Stack:
[ 2.708132] BUG: unable to handle kernel paging request at 00007fff99e51100
[ 2.708132] IP: [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[ 2.708132] PGD 7b20d067 PUD 0
[ 2.708132] Oops: 0000 [#1] PREEMPT SMP
[ 2.708132] Modules linked in:
[ 2.708132] CPU: 1 PID: 959 Comm: vmmouse_detect Not tainted 3.15.0+ #7
[ 2.708132] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.708132] task: ffff880079c78000 ti: ffff880079c74000 task.ti: ffff880079c74000
[ 2.708132] RIP: 0010:[<ffffffff81005bbc>] [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[ 2.708132] RSP: 002b:ffff88007c405e58 EFLAGS: 00013046
[ 2.708132] RAX: 00007fff99e51108 RBX: 0000000000000000 RCX: ffff88007c403fc0
[ 2.708132] RDX: 00007fff99e51100 RSI: ffff88007c400000 RDI: ffffffff81846aba
[ 2.708132] RBP: ffff88007c405ea8 R08: ffff88007c3fffc0 R09: 0000000000000000
[ 2.708132] R10: 000000007c400000 R11: 0000000000000000 R12: ffff88007c405f58
[ 2.708132] R13: 0000000000000000 R14: ffffffff818136fc R15: 0000000000000000
[ 2.708132] FS: 00007ff252c6d700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 2.708132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.708132] CR2: 00007fff99e51100 CR3: 000000007b206000 CR4: 00000000000006e0
[ 2.708132] Stack:
[ 2.708132] 0000000000000008 ffff88007c405eb8 ffff88007c405e70 000000007b206000
[ 2.708132] 00007fff99e51100 ffff88007c405f58 00007fff99e51100 0000000000000040
[ 2.708132] 0000000000000ac0 ffff880079c78000 ffff88007c405f08 ffffffff81005d10
[ 2.708132] Call Trace:
[ 2.708132] <#DF>
[ 2.708132] [<ffffffff81005d10>] show_regs+0xa0/0x280
[ 2.708132] [<ffffffff8103d143>] df_debug+0x23/0x40
[ 2.708132] [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[ 2.708132] [<ffffffff816194c7>] double_fault+0x27/0x30
[ 2.708132] [<ffffffff816112fe>] ? __schedule+0x27e/0xab0
[ 2.708132] [<ffffffff8161130f>] ? __schedule+0x28f/0xab0
[ 2.708132] <<EOE>>
[ 2.708132] <UNK> Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7
[ 2.708132] RIP [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[ 2.708132] RSP <ffff88007c405e58>
[ 2.708132] CR2: 00007fff99e51100
[ 2.708132] ---[ end trace 749cd02c31c493a0 ]---
[ 2.708132] note: vmmouse_detect[959] exited with preempt_count 3
[ 1.730726] VFS: Mounted root (ext3 filesystem) readonly on device 8:1.
[ 1.737392] devtmpfs: mounted
[ 1.748817] Freeing unused kernel memory: 2872K (ffffffff819a9000 - ffffffff81c77000)
[ 2.249240] udevd[812]: starting version 175
[ 2.563876] PANIC: double fault, error_code: 0x0
[ 2.563885]
[ 2.564051] ======================================================
[ 2.564051] [ INFO: possible circular locking dependency detected ]
[ 2.575059] 3.15.0+ #8 Not tainted
[ 2.575059] -------------------------------------------------------
[ 2.575059] vmmouse_detect/960 is trying to acquire lock:
[ 2.575059] ((console_sem).lock){-.....}, at: [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[ 2.575059]
[ 2.575059] but task is already holding lock:
[ 2.575059] (&rq->lock){-.-.-.}, at: [<ffffffff8161118f>] __schedule+0xdf/0xab0
[ 2.575059]
[ 2.575059] which lock already depends on the new lock.
[ 2.575059]
[ 2.575059]
[ 2.575059] the existing dependency chain (in reverse order) is:
[ 2.575059]
-> #2 (&rq->lock){-.-.-.}:
[ 2.575059] [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[ 2.575059] [<ffffffff816160e1>] _raw_spin_lock+0x41/0x80
[ 2.575059] [<ffffffff8108ab3b>] wake_up_new_task+0xbb/0x290
[ 2.575059] [<ffffffff8104e887>] do_fork+0x147/0x770
[ 2.575059] [<ffffffff8104eed6>] kernel_thread+0x26/0x30
[ 2.575059] [<ffffffff8160b572>] rest_init+0x22/0x140
[ 2.575059] [<ffffffff81b7ee3e>] start_kernel+0x408/0x415
[ 2.575059] [<ffffffff81b7e463>] x86_64_start_reservations+0x2a/0x2c
[ 2.575059] [<ffffffff81b7e55b>] x86_64_start_kernel+0xf6/0xf9
[ 2.575059]
-> #1 (&p->pi_lock){-.-.-.}:
[ 2.575059] [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[ 2.575059] [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[ 2.575059] [<ffffffff8108a70c>] try_to_wake_up+0x3c/0x330
[ 2.575059] [<ffffffff8108aa23>] wake_up_process+0x23/0x40
[ 2.575059] [<ffffffff816151af>] __up.isra.0+0x1f/0x30
[ 2.575059] [<ffffffff8109d1d1>] up+0x41/0x50
[ 2.575059] [<ffffffff810b5608>] console_unlock+0x258/0x490
[ 2.575059] [<ffffffff810b5ad1>] vprintk_emit+0x291/0x610
[ 2.575059] [<ffffffff8160ebf7>] printk_emit+0x33/0x3b
[ 2.575059] [<ffffffff810b5fd4>] devkmsg_writev+0x154/0x1d0
[ 2.575059] [<ffffffff8116d77a>] do_sync_write+0x5a/0x90
[ 2.575059] [<ffffffff8116df25>] vfs_write+0x175/0x1c0
[ 2.575059] [<ffffffff8116e982>] SyS_write+0x52/0xc0
[ 2.575059] [<ffffffff81617ce6>] system_call_fastpath+0x1a/0x1f
[ 2.575059]
-> #0 ((console_sem).lock){-.....}:
[ 2.575059] [<ffffffff810a5754>] __lock_acquire+0x1f14/0x2290
[ 2.575059] [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[ 2.575059] [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[ 2.575059] [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[ 2.575059] [<ffffffff810b50fe>] console_trylock+0x1e/0xb0
[ 2.575059] [<ffffffff810b5ab3>] vprintk_emit+0x273/0x610
[ 2.575059] [<ffffffff8160ec4e>] printk+0x4f/0x57
[ 2.575059] [<ffffffff8103d16b>] df_debug+0x1b/0x40
[ 2.575059] [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[ 2.575059] [<ffffffff81619507>] double_fault+0x27/0x30
[ 2.575059]
[ 2.575059] other info that might help us debug this:
[ 2.575059]
[ 2.575059] Chain exists of:
(console_sem).lock --> &p->pi_lock --> &rq->lock
[ 2.575059] Possible unsafe locking scenario:
[ 2.575059]
[ 2.575059] CPU0 CPU1
[ 2.575059] ---- ----
[ 2.575059] lock(&rq->lock);
[ 2.575059] lock(&p->pi_lock);
[ 2.575059] lock(&rq->lock);
[ 2.575059] lock((console_sem).lock);
[ 2.575059]
[ 2.575059] *** DEADLOCK ***
[ 2.575059]
[ 2.575059] 1 lock held by vmmouse_detect/960:
[ 2.575059] #0: (&rq->lock){-.-.-.}, at: [<ffffffff8161118f>] __schedule+0xdf/0xab0
[ 2.575059]
[ 2.575059] stack backtrace:
[ 2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[ 2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.575059] ffffffff823ef810 ffff88007c205c50 ffffffff8160f461 ffffffff823f22b0
[ 2.575059] ffff88007c205c90 ffffffff8109fd2b ffffffff82802180 ffff880079ca2e48
[ 2.575059] ffff880079ca2718 ffff880079ca26e0 0000000000000001 ffff880079ca2e10
[ 2.575059] Call Trace:
[ 2.575059] <#DF> [<ffffffff8160f461>] dump_stack+0x4e/0x7a
[ 2.575059] [<ffffffff8109fd2b>] print_circular_bug+0x1fb/0x330
[ 2.575059] [<ffffffff810a5754>] __lock_acquire+0x1f14/0x2290
[ 2.575059] [<ffffffff810a62c9>] lock_acquire+0xb9/0x200
[ 2.575059] [<ffffffff8109cfdd>] ? down_trylock+0x1d/0x50
[ 2.575059] [<ffffffff81616303>] _raw_spin_lock_irqsave+0x53/0x90
[ 2.575059] [<ffffffff8109cfdd>] ? down_trylock+0x1d/0x50
[ 2.575059] [<ffffffff810b5ab3>] ? vprintk_emit+0x273/0x610
[ 2.575059] [<ffffffff8109cfdd>] down_trylock+0x1d/0x50
[ 2.575059] [<ffffffff810b5ab3>] ? vprintk_emit+0x273/0x610
[ 2.575059] [<ffffffff810b50fe>] console_trylock+0x1e/0xb0
[ 2.575059] [<ffffffff810b5ab3>] vprintk_emit+0x273/0x610
[ 2.575059] [<ffffffff8160ec4e>] printk+0x4f/0x57
[ 2.575059] [<ffffffff8103d16b>] df_debug+0x1b/0x40
[ 2.575059] [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[ 2.575059] [<ffffffff81619507>] double_fault+0x27/0x30
[ 2.575059] [<ffffffff8161132e>] ? __schedule+0x27e/0xab0
[ 2.575059] [<ffffffff8161133f>] ? __schedule+0x28f/0xab0
[ 2.575059] <<EOE>> <UNK>
[ 2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[ 2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.575059] task: ffff880079ca26e0 ti: ffff880079d04000 task.ti: ffff880079d04000
[ 2.575059] RIP: 0010:[<ffffffff8161133f>] [<ffffffff8161133f>] __schedule+0x28f/0xab0
[ 2.575059] RSP: 002b:00007fff70516420 EFLAGS: 00013086
[ 2.575059] RAX: 000000007ae81000 RBX: ffff88007be67900 RCX: 0000000000000028
[ 2.575059] RDX: ffffffff8161132e RSI: 0000000000000001 RDI: ffff88007c3d3c58
[ 2.575059] RBP: 00007fff70516510 R08: 0000000000000000 R09: 0000000000005c00
[ 2.575059] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c3d3c40
[ 2.575059] R13: ffff88007b634000 R14: 0000000000000000 R15: ffff880079ca26e0
[ 2.575059] FS: 00007f77c13d6700(0000) GS:ffff88007c200000(0000) knlGS:0000000000000000
[ 2.575059] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.575059] CR2: 00007fff70516418 CR3: 000000007ae81000 CR4: 00000000000006f0
[ 2.575059] Stack:
[ 2.575059] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 2.575059] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 2.575059] ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[ 2.575059] Call Trace:
[ 2.575059] <UNK>
[ 2.575059] Code: 39 b5 80 03 00 00 0f 85 38 06 00 00 49 83 bf 88 02 00 00 00 0f 84 9a 03 00 00 49 8d 7c 24 18 48 c7 c2 2e 13 61 81 be 01 00 00 00 <e8> 4c 4b a9 ff 48 8b 74 24 18 4c 89 ff 9c 55 48 89 f5 48 89 a7
[ 2.575059] Kernel panic - not syncing: Machine halted.
[ 2.575059] CPU: 0 PID: 960 Comm: vmmouse_detect Not tainted 3.15.0+ #8
[ 2.575059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 2.575059] ffff88007c205f18 ffff88007c205e88 ffffffff8160f461 ffffffff81817b42
[ 2.575059] ffff88007c205f08 ffffffff8160ded3 0000000000000008 ffff88007c205f18
[ 2.575059] ffff88007c205eb0 ffffffff81005cfb ffffffff81616531 0000000080000002
[ 2.575059] Call Trace:
[ 2.575059] <#DF> [<ffffffff8160f461>] dump_stack+0x4e/0x7a
[ 2.575059] [<ffffffff8160ded3>] panic+0xc5/0x1e1
[ 2.575059] [<ffffffff81005cfb>] ? show_regs+0x5b/0x280
[ 2.575059] [<ffffffff81616531>] ? _raw_spin_unlock_irqrestore+0x41/0x90
[ 2.575059] [<ffffffff8103d181>] df_debug+0x31/0x40
[ 2.575059] [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[ 2.575059] [<ffffffff81619507>] double_fault+0x27/0x30
[ 2.575059] [<ffffffff8161132e>] ? __schedule+0x27e/0xab0
[ 2.575059] [<ffffffff8161133f>] ? __schedule+0x28f/0xab0
[ 2.575059] <<EOE>> <UNK>
[ 2.575059] Shutting down cpus with NMI
[ 2.575059] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
[ 2.575059] ---[ end Kernel panic - not syncing: Machine halted.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Wed, Jun 25, 2014 at 10:26:50PM +0200, Borislav Petkov wrote:
> On Wed, Jun 25, 2014 at 05:32:28PM +0200, Borislav Petkov wrote:
> > Hi guys,
> >
> > so I'm looking at this splat below when booting current linus+tip/master
> > in a kvm guest. Initially I thought this is something related to the
> > PARAVIRT gunk but it happens with and without it.
>
> Ok, here's a cleaner splat. I went and rebuilt qemu to latest master
> from today to rule out some breakage there but it still fires.
Ok, another observation: I was using qemu from sources from the other day:
v2.0.0-1806-g2b5b7ae917e8
Switching back to the installed one:
$ qemu-system-x86_64 --version
QEMU emulator version 1.7.1 (Debian 1.7.0+dfsg-6), Copyright (c) 2003-2008 Fabrice Bellard
fixes the issue.
Joerg says I should bisect but I'm busy with other stuff. If people are
interested in chasing this further, I could free up some time to do
so...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
Il 27/06/2014 12:18, Borislav Petkov ha scritto:
> Joerg says I should bisect but I'm busy with other stuff. If people are
> interested in chasing this further, I could free up some time to do
> so...
Please first try "-M pc-1.7" on the 2.0 QEMU. If it fails, please do
bisect it. A QEMU bisection spanning a single release usually takes only
half an hour or so for me.
I use
../configure --target-list=x86_64-softmmu && make distclean &&
../configure --target-list=x86_64-softmmu &&
make -j8 subdir-x86_64-softmmu
I do that until the bisect range is below 50 commits. After that point
just "make -j8
subdir-x86_64-softmmu" should do. This ensures that build system
changes do not bite you as you move back and forth in time.
Thanks!
Paolo
On Fri, Jun 27, 2014 at 01:41:30PM +0200, Paolo Bonzini wrote:
> Il 27/06/2014 12:18, Borislav Petkov ha scritto:
> >Joerg says I should bisect but I'm busy with other stuff. If people are
> >interested in chasing this further, I could free up some time to do
> >so...
>
> Please first try "-M pc-1.7" on the 2.0 QEMU. If it fails, please do bisect
> it. A QEMU bisection between one release only usually takes only half an
> hour or so for me.
>
> I use
>
> ../configure --target-list=x86_64-softmmu && make distclean &&
> ../configure --target-list=x86_64-softmmu &&
> make -j8 subdir-x86_64-softmmu
>
> Until it's below 50 commits. After that point just "make -j8
> subdir-x86_64-softmmu" should do. This ensures that build system changes do
> not bite you as you move back and forth in time.
Ok, thanks for the help.
However, as it always happens, right after sending the mail, I triggered
it with the qemu installed on the system too :-( I.e., qemu 1.7.1.
So we will try to debug the #DF first to find out why kvm is injecting
it in the first place. I'll keep you posted.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
Il 27/06/2014 13:55, Borislav Petkov ha scritto:
> On Fri, Jun 27, 2014 at 01:41:30PM +0200, Paolo Bonzini wrote:
>> Il 27/06/2014 12:18, Borislav Petkov ha scritto:
>>> Joerg says I should bisect but I'm busy with other stuff. If people are
>>> interested in chasing this further, I could free up some time to do
>>> so...
>>
>> Please first try "-M pc-1.7" on the 2.0 QEMU. If it fails, please do bisect
>> it. A QEMU bisection between one release only usually takes only half an
>> hour or so for me.
>>
>> I use
>>
>> ../configure --target-list=x86_64-softmmu && make distclean &&
>> ../configure --target-list=x86_64-softmmu &&
>> make -j8 subdir-x86_64-softmmu
>>
>> Until it's below 50 commits. After that point just "make -j8
>> subdir-x86_64-softmmu" should do. This ensures that build system changes do
>> not bite you as you move back and forth in time.
>
> Ok, thanks for the help.
>
> However, as it always happens, right after sending the mail, I triggered
> it with the qemu installed on the system too :-( I.e., qemu 1.7.1.
:)
Can you try gathering a trace? (and since those things get huge, you can
send it to me offlist) Also try without ept and see what happens.
Also, perhaps you can bisect between Linus's tree and tip?
And finally, what is the host kernel?
Paolo
On Fri, Jun 27, 2014 at 02:01:43PM +0200, Paolo Bonzini wrote:
> Can you try gathering a trace? (and since those things get huge, you
> can send it to me offlist) Also try without ept and see what happens.
Yeah, Joerg just sent me a diff on how to intercept #DF. I'll add a
tracepoint so that it all goes into the trace together.
> Also, perhaps you can bisect between Linus's tree and tip?
Yep, that's next if we don't get smart from the #DF trace.
> And finally, what is the host kernel?
3.16-rc2+ - "+" is tip/master from the last weekend with a couple of
patches to arch/x86/ from me which should be unrelated (yeah, we've
heard that before :-)).
Thanks for the suggestions, I'm going for a run now but after I get
back, debugging session starts with host and guest rebuilt afresh. :-)
Stay tuned.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
Ok, I rebuilt the host kernel with latest linus+tip/master and my queue.
The guest kernel is v3.15-8992-g08f7cc749389 plus a bunch of RAS
patches. Before I start doing the coarse-grained bisection by testing
-rcs and major numbers, I wanted to catch a #DF and try to analyze at
least why it happens. And from what I'm seeing, it looks insane.
Ok, so kvm_amd.ko is loaded with npt=0 so that I can see the pagefaults
in the trace.
All TPs in events/kvm/ are enabled. The df tracepoint is
straightforward, attached.
However, with npt=0 this #DF TP doesn't get hit. I can still see the #DF
though and here's what it looks like. (qemu is latest from git):
So let's comment on what I'm seeing:
...
qemu-system-x86-20240 [006] d..2 9406.484041: kvm_entry: vcpu 1
qemu-system-x86-20240 [006] d..2 9406.484042: kvm_exit: reason PF excp rip 0xffffffff8103be46 info b ffffffffff5fd380
qemu-system-x86-20240 [006] ...1 9406.484042: kvm_page_fault: address ffffffffff5fd380 error_code b
qemu-system-x86-20240 [006] ...1 9406.484044: kvm_emulate_insn: 0:ffffffff8103be46:89 b7 00 d0 5f ff (prot64)
qemu-system-x86-20240 [006] ...1 9406.484044: vcpu_match_mmio: gva 0xffffffffff5fd380 gpa 0xfee00380 Write GVA
qemu-system-x86-20240 [006] ...1 9406.484044: kvm_mmio: mmio write len 4 gpa 0xfee00380 val 0x39884
qemu-system-x86-20240 [006] ...1 9406.484045: kvm_apic: apic_write APIC_TMICT = 0x39884
qemu-system-x86-20239 [004] d..2 9406.484046: kvm_entry: vcpu 0
qemu-system-x86-20240 [006] d..2 9406.484048: kvm_entry: vcpu 1
qemu-system-x86-20239 [004] d..2 9406.484052: kvm_exit: reason PF excp rip 0xffffffff812da4ff info 0 1188808
this rip is
ffffffff812da4e0 <__get_user_8>:
...
ffffffff812da4ff: 48 8b 50 f9 mov -0x7(%rax),%rdx
so we're basically pagefaulting when doing get_user and the user address is 1188808.
And that looks ok; this value is exitinfo2, where SVM puts the faulting
address on a #PF exception intercept.
qemu-system-x86-20239 [004] ...1 9406.484053: kvm_page_fault: address 1188808 error_code 0
qemu-system-x86-20240 [006] d..2 9406.484055: kvm_exit: reason write_cr3 rip 0xffffffff816112d0 info 8000000000000000 0
This is interesting, cpu1 switches address spaces, looks like we're
in context_switch(), i.e. consistent with the guest rip pointing to
__schedule+0x28f below.
I say "interesting" because this bug feels like we're trying to access
the user process' memory which is gone by the time we do so. Hmm, just a
gut feeling though.
qemu-system-x86-20239 [004] d..2 9406.484059: kvm_entry: vcpu 0
qemu-system-x86-20239 [004] d..2 9406.484060: kvm_exit: reason PF excp rip 0xffffffff812da4ff info 0 1188808
qemu-system-x86-20239 [004] ...1 9406.484061: kvm_page_fault: address 1188808 error_code 0
Now here's where it gets interesting:
qemu-system-x86-20239 [004] d..2 9406.484131: kvm_entry: vcpu 0
qemu-system-x86-20240 [006] d..2 9406.484132: kvm_entry: vcpu 1
qemu-system-x86-20240 [006] d..2 9406.484133: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
We're pagefaulting on a user address 7fffb62ba318 at guest rip
0xffffffff8161130f which is:
ffffffff816112da: 00 00
ffffffff816112dc: 4c 8b b3 80 03 00 00 mov 0x380(%rbx),%r14
ffffffff816112e3: 4d 39 b5 80 03 00 00 cmp %r14,0x380(%r13)
ffffffff816112ea: 0f 85 38 06 00 00 jne ffffffff81611928 <__schedule+0x8a8>
ffffffff816112f0: 49 83 bf 88 02 00 00 cmpq $0x0,0x288(%r15)
ffffffff816112f7: 00
ffffffff816112f8: 0f 84 9a 03 00 00 je ffffffff81611698 <__schedule+0x618>
ffffffff816112fe: 49 8d 7c 24 18 lea 0x18(%r12),%rdi
ffffffff81611303: 48 c7 c2 fe 12 61 81 mov $0xffffffff816112fe,%rdx
ffffffff8161130a: be 01 00 00 00 mov $0x1,%esi
ffffffff8161130f: e8 4c 4b a9 ff callq ffffffff810a5e60 <lock_release> <---
ffffffff81611314: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
ffffffff81611319: 4c 89 ff mov %r15,%rdi
ffffffff8161131c: 9c pushfq
ffffffff8161131d: 55 push %rbp
which, if I'm not mistaken, is this here in context_switch():
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif
Related annotated asm:
#APP
# 54 "/w/kernel/linux-2.6/arch/x86/include/asm/special_insns.h" 1
mov %rax,%cr3 # D.62668
# 0 "" 2
# 117 "/w/kernel/linux-2.6/arch/x86/include/asm/bitops.h" 1
.pushsection .smp_locks,"a"
.balign 4
.long 671f - .
.popsection
671:
lock; btr %r14,888(%r13) # D.62671, MEM[(volatile long int *)_215]
# 0 "" 2
#NO_APP
movq 896(%rbx), %r14 # mm_193->context.ldt, D.62674
cmpq %r14, 896(%r13) # D.62674, oldmm_194->context.ldt
jne .L2019 #,
.L1925:
cmpq $0, 648(%r15) #, prev_21->mm <--- that's the "if (!prev->mm)" test
je .L2020 #,
.L1931:
leaq 24(%r12), %rdi #, D.62691
movq $.L1931, %rdx #,
movl $1, %esi #,
call lock_release # <---- the call to spin_release
movq 24(%rsp), %rsi # %sfp, D.62679
movq %r15, %rdi # prev, prev
#APP
# 2307 "kernel/sched/core.c" 1
so it basically is the same as what we saw before.
qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
kvm injects the #PF into the guest.
qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
Second #PF at the same address and kvm injects the #DF.
BUT(!), why?
I probably am missing something but WTH are we pagefaulting at a
user address in context_switch() while doing a lockdep call, i.e.
spin_release? We're not touching any userspace gunk there AFAICT.
Is this an async pagefault or so which kvm is doing so that the guest
rip is actually pointing at the wrong place?
Or something else I'm missing, most probably...
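A side note on the #DF mechanics themselves, so that only the first #PF remains mysterious: a page fault raised while a previous page fault is still being delivered gets escalated to a double fault; that part is just the architectural rule, not something KVM invents. Below is a standalone toy model of that escalation logic, purely for illustration; the real decision is made in KVM's kvm_multiple_exception():
#include <stdbool.h>
#include <stdio.h>
#define DF_VECTOR  8
#define PF_VECTOR 14
/* #DE, #TS, #NP, #SS and #GP are the "contributory" exceptions */
static bool contributory(int vec)
{
	return vec == 0 || (vec >= 10 && vec <= 13);
}
/* which vector gets delivered when 'next' is raised while 'prev' is pending */
static int combine(int prev, int next)
{
	if (prev < 0)
		return next;		/* nothing pending, deliver as-is */
	if (prev == PF_VECTOR && (next == PF_VECTOR || contributory(next)))
		return DF_VECTOR;	/* #PF during #PF delivery -> #DF */
	if (contributory(prev) && contributory(next))
		return DF_VECTOR;	/* two contributory exceptions -> #DF */
	return next;			/* benign combination, deliver serially */
}
int main(void)
{
	/* the case from the trace above: a second #PF while injecting the first */
	printf("-> vector %d\n", combine(PF_VECTOR, PF_VECTOR));
	return 0;
}
So the #PF -> #DF promotion is expected once the second fault happens; the open question is only why the guest faults on a user address at that RIP at all.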
In any case I'll try to repro with the latest kernel in the guest too.
Here's the splat shown in the guest:
[ 3.130253] random: nonblocking pool is initialized
[ 3.700333] PANIC: double fault, error_code: 0x0
[ 3.704212] CPU: 1 PID: 911 Comm: vmmouse_detect Not tainted 3.15.0+ #1
[ 3.704212] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 3.704212] task: ffff88007b4e4dc0 ti: ffff88007aa08000 task.ti: ffff88007aa08000
[ 3.704212] RIP: 0010:[<ffffffff8161130f>] [<ffffffff8161130f>] __schedule+0x28f/0xab0
[ 3.704212] RSP: 002b:00007fffb62ba320 EFLAGS: 00013082
[ 3.704212] RAX: 000000007b75b000 RBX: ffff88007b5b8980 RCX: 0000000000000028
[ 3.704212] RDX: ffffffff816112fe RSI: 0000000000000001 RDI: ffff88007c5d3c58
[ 3.704212] RBP: 00007fffb62ba410 R08: ffff88007bdd3ac9 R09: 0000000000000000
[ 3.704212] R10: 0000000000000001 R11: 0000000000000019 R12: ffff88007c5d3c40
[ 3.704212] R13: ffff88007b5bb440 R14: 0000000000000000 R15: ffff88007b4e4dc0
[ 3.704212] FS: 00007fa1eec0f700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 3.704212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.704212] CR2: 00007fffb62ba318 CR3: 000000007b75b000 CR4: 00000000000006e0
[ 3.704212] Stack:
[ 3.704212] BUG: unable to handle kernel paging request at 00007fffb62ba320
[ 3.704212] IP: [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[ 3.704212] PGD 7b3ab067 PUD 0
[ 3.704212] Oops: 0000 [#1] PREEMPT SMP
[ 3.704212] Modules linked in:
[ 3.704212] CPU: 1 PID: 911 Comm: vmmouse_detect Not tainted 3.15.0+ #1
[ 3.704212] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[ 3.704212] task: ffff88007b4e4dc0 ti: ffff88007aa08000 task.ti: ffff88007aa08000
[ 3.704212] RIP: 0010:[<ffffffff81005bbc>] [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[ 3.704212] RSP: 002b:ffff88007c405e58 EFLAGS: 00013046
[ 3.704212] RAX: 00007fffb62ba328 RBX: 0000000000000000 RCX: ffff88007c403fc0
[ 3.704212] RDX: 00007fffb62ba320 RSI: ffff88007c400000 RDI: ffffffff81846aba
[ 3.704212] RBP: ffff88007c405ea8 R08: ffff88007c3fffc0 R09: 0000000000000000
[ 3.704212] R10: 000000007c400000 R11: 0000000000000000 R12: ffff88007c405f58
[ 3.704212] R13: 0000000000000000 R14: ffffffff818136fc R15: 0000000000000000
[ 3.704212] FS: 00007fa1eec0f700(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000
[ 3.704212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.704212] CR2: 00007fffb62ba320 CR3: 000000007b75b000 CR4: 00000000000006e0
[ 3.704212] Stack:
[ 3.704212] 0000000000000008 ffff88007c405eb8 ffff88007c405e70 000000007b75b000
[ 3.704212] 00007fffb62ba320 ffff88007c405f58 00007fffb62ba320 0000000000000040
[ 3.704212] 0000000000000ac0 ffff88007b4e4dc0 ffff88007c405f08 ffffffff81005d10
[ 3.704212] Call Trace:
[ 3.704212] <#DF>
[ 3.704212] [<ffffffff81005d10>] show_regs+0xa0/0x280
[ 3.704212] [<ffffffff8103d143>] df_debug+0x23/0x40
[ 3.704212] [<ffffffff81003b6d>] do_double_fault+0x5d/0x80
[ 3.704212] [<ffffffff816194c7>] double_fault+0x27/0x30
[ 3.704212] [<ffffffff816112fe>] ? __schedule+0x27e/0xab0
[ 3.704212] [<ffffffff8161130f>] ? __schedule+0x28f/0xab0
[ 3.704212] <<EOE>>
[ 3.704212] <UNK> Code: 7a ff ff ff 0f 1f 00 e8 93 80 00 00 eb a5 48 39 ca 0f 84 8d 00 00 00 45 85 ff 0f 1f 44 00 00 74 06 41 f6 c7 03 74 55 48 8d 42 08 <48> 8b 32 48 c7 c7 f4 36 81 81 4c 89 45 b8 48 89 4d c0 41 ff c7
[ 3.704212] RIP [<ffffffff81005bbc>] show_stack_log_lvl+0x11c/0x1d0
[ 3.704212] RSP <ffff88007c405e58>
[ 3.704212] CR2: 00007fffb62ba320
[ 3.704212] ---[ end trace 85735a6f8b08ee31 ]---
[ 3.704212] note: vmmouse_detect[911] exited with preempt_count 3
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
>
> kvm injects the #PF into the guest.
>
> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
>
> Second #PF at the same address and kvm injects the #DF.
>
> BUT(!), why?
>
> I probably am missing something but WTH are we pagefaulting at a
> user address in context_switch() while doing a lockdep call, i.e.
> spin_release? We're not touching any userspace gunk there AFAICT.
>
> Is this an async pagefault or so which kvm is doing so that the guest
> rip is actually pointing at the wrong place?
>
There is nothing in the trace that points to an async pagefault as far as I can see.
> Or something else I'm missing, most probably...
>
Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
kvm_multiple_exception() to see which two exceptions are combined into #DF.
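A rough sketch of the kind of throwaway instrumentation meant here (hypothetical, not an existing tracepoint; the parameter and field names are from the 3.16-era kvm_multiple_exception() as far as I recall). Since trace_printk() output is tagged with the calling function, it would show up as the "kvm_multiple_exception: ..." lines that appear in the trace later in the thread:
	/* somewhere at the top of kvm_multiple_exception() in arch/x86/kvm/x86.c */
	trace_printk("nr: %u, prev: %u, has_error: %d, error_code: 0x%x, reinj: %d\n",
		     nr,
		     vcpu->arch.exception.pending ? vcpu->arch.exception.nr : 255,
		     has_error, error_code, reinject);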
--
Gleb.
On 2014-06-29 08:46, Gleb Natapov wrote:
> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
>>
>> kvm injects the #PF into the guest.
>>
>> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
>> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
>>
>> Second #PF at the same address and kvm injects the #DF.
>>
>> BUT(!), why?
>>
>> I probably am missing something but WTH are we pagefaulting at a
>> user address in context_switch() while doing a lockdep call, i.e.
>> spin_release? We're not touching any userspace gunk there AFAICT.
>>
>> Is this an async pagefault or so which kvm is doing so that the guest
>> rip is actually pointing at the wrong place?
>>
> There is nothing in the trace that point to async pagefault as far as I see.
>
>> Or something else I'm missing, most probably...
>>
> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> kvm_multiple_exception() to see which two exception are combined into #DF.
>
FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
when patch-disabling the vmport in QEMU.
Let me know if I can help with the analysis.
Jan
On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> On 2014-06-29 08:46, Gleb Natapov wrote:
> > On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
> >>
> >> kvm injects the #PF into the guest.
> >>
> >> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
> >> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
> >>
> >> Second #PF at the same address and kvm injects the #DF.
> >>
> >> BUT(!), why?
> >>
> >> I probably am missing something but WTH are we pagefaulting at a
> >> user address in context_switch() while doing a lockdep call, i.e.
> >> spin_release? We're not touching any userspace gunk there AFAICT.
> >>
> >> Is this an async pagefault or so which kvm is doing so that the guest
> >> rip is actually pointing at the wrong place?
> >>
> > There is nothing in the trace that point to async pagefault as far as I see.
> >
> >> Or something else I'm missing, most probably...
> >>
> > Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> > kvm_multiple_exception() to see which two exception are combined into #DF.
> >
>
> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> when patch-disabling the vmport in QEMU.
>
> Let me know if I can help with the analysis.
>
Bisection would be great of course. One thing that is special about
vmport that comes to mind is that it reads vcpu registers out to
userspace and writes them back. IIRC "info registers" does the same. Can
you see if the
problem is reproducible with disabled vmport, but doing "info registers"
in the qemu console? Although the trace does not show any exits to
userspace near the failure...
--
Gleb.
On 2014-06-29 12:24, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
>> On 2014-06-29 08:46, Gleb Natapov wrote:
>>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>>> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
>>>>
>>>> kvm injects the #PF into the guest.
>>>>
>>>> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
>>>> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>>> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
>>>>
>>>> Second #PF at the same address and kvm injects the #DF.
>>>>
>>>> BUT(!), why?
>>>>
>>>> I probably am missing something but WTH are we pagefaulting at a
>>>> user address in context_switch() while doing a lockdep call, i.e.
>>>> spin_release? We're not touching any userspace gunk there AFAICT.
>>>>
>>>> Is this an async pagefault or so which kvm is doing so that the guest
>>>> rip is actually pointing at the wrong place?
>>>>
>>> There is nothing in the trace that point to async pagefault as far as I see.
>>>
>>>> Or something else I'm missing, most probably...
>>>>
>>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
>>> kvm_multiple_exception() to see which two exception are combined into #DF.
>>>
>>
>> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
>> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
>> when patch-disabling the vmport in QEMU.
>>
>> Let me know if I can help with the analysis.
>>
> Bisection would be great of course. Once thing that is special about
> vmport that comes to mind is that it reads vcpu registers to userspace and
> write them back. IIRC "info registers" does the same. Can you see if the
> problem is reproducible with disabled vmport, but doing "info registers"
> in qemu console? Although trace does not should any exists to userspace
> near the failure...
Yes, info registers crashes the guest after a while as well (with
different backtrace due to different context).
Jan
On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
> On 2014-06-29 12:24, Gleb Natapov wrote:
> > On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
> >> On 2014-06-29 08:46, Gleb Natapov wrote:
> >>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
> >>>> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
> >>>>
> >>>> kvm injects the #PF into the guest.
> >>>>
> >>>> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
> >>>> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
> >>>> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
> >>>> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
> >>>>
> >>>> Second #PF at the same address and kvm injects the #DF.
> >>>>
> >>>> BUT(!), why?
> >>>>
> >>>> I probably am missing something but WTH are we pagefaulting at a
> >>>> user address in context_switch() while doing a lockdep call, i.e.
> >>>> spin_release? We're not touching any userspace gunk there AFAICT.
> >>>>
> >>>> Is this an async pagefault or so which kvm is doing so that the guest
> >>>> rip is actually pointing at the wrong place?
> >>>>
> >>> There is nothing in the trace that point to async pagefault as far as I see.
> >>>
> >>>> Or something else I'm missing, most probably...
> >>>>
> >>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
> >>> kvm_multiple_exception() to see which two exception are combined into #DF.
> >>>
> >>
> >> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
> >> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
> >> when patch-disabling the vmport in QEMU.
> >>
> >> Let me know if I can help with the analysis.
> >>
> > Bisection would be great of course. Once thing that is special about
> > vmport that comes to mind is that it reads vcpu registers to userspace and
> > write them back. IIRC "info registers" does the same. Can you see if the
> > problem is reproducible with disabled vmport, but doing "info registers"
> > in qemu console? Although trace does not should any exists to userspace
> > near the failure...
>
> Yes, info registers crashes the guest after a while as well (with
> different backtrace due to different context).
>
Oh crap. Bisection would be most helpful. Just to be absolutely sure
that this is not a QEMU problem: does exactly the same QEMU version work with
older kernels?
--
Gleb.
On 2014-06-29 12:53, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 12:31:50PM +0200, Jan Kiszka wrote:
>> On 2014-06-29 12:24, Gleb Natapov wrote:
>>> On Sun, Jun 29, 2014 at 11:56:03AM +0200, Jan Kiszka wrote:
>>>> On 2014-06-29 08:46, Gleb Natapov wrote:
>>>>> On Sat, Jun 28, 2014 at 01:44:31PM +0200, Borislav Petkov wrote:
>>>>>> qemu-system-x86-20240 [006] ...1 9406.484134: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>>> qemu-system-x86-20240 [006] ...1 9406.484136: kvm_inj_exception: #PF (0x2)a
>>>>>>
>>>>>> kvm injects the #PF into the guest.
>>>>>>
>>>>>> qemu-system-x86-20240 [006] d..2 9406.484136: kvm_entry: vcpu 1
>>>>>> qemu-system-x86-20240 [006] d..2 9406.484137: kvm_exit: reason PF excp rip 0xffffffff8161130f info 2 7fffb62ba318
>>>>>> qemu-system-x86-20240 [006] ...1 9406.484138: kvm_page_fault: address 7fffb62ba318 error_code 2
>>>>>> qemu-system-x86-20240 [006] ...1 9406.484141: kvm_inj_exception: #DF (0x0)
>>>>>>
>>>>>> Second #PF at the same address and kvm injects the #DF.
>>>>>>
>>>>>> BUT(!), why?
>>>>>>
>>>>>> I probably am missing something but WTH are we pagefaulting at a
>>>>>> user address in context_switch() while doing a lockdep call, i.e.
>>>>>> spin_release? We're not touching any userspace gunk there AFAICT.
>>>>>>
>>>>>> Is this an async pagefault or so which kvm is doing so that the guest
>>>>>> rip is actually pointing at the wrong place?
>>>>>>
>>>>> There is nothing in the trace that point to async pagefault as far as I see.
>>>>>
>>>>>> Or something else I'm missing, most probably...
>>>>>>
>>>>> Strange indeed. Can you also enable kvmmmu tracing? You can also instrument
>>>>> kvm_multiple_exception() to see which two exception are combined into #DF.
>>>>>
>>>>
>>>> FWIW, I'm seeing the same issue here (likely) on an E-450 APU. It
>>>> disappears with older KVM (didn't bisect yet, some 3.11 is fine) and
>>>> when patch-disabling the vmport in QEMU.
>>>>
>>>> Let me know if I can help with the analysis.
>>>>
>>> Bisection would be great of course. Once thing that is special about
>>> vmport that comes to mind is that it reads vcpu registers to userspace and
>>> write them back. IIRC "info registers" does the same. Can you see if the
>>> problem is reproducible with disabled vmport, but doing "info registers"
>>> in qemu console? Although trace does not should any exists to userspace
>>> near the failure...
>>
>> Yes, info registers crashes the guest after a while as well (with
>> different backtrace due to different context).
>>
> Oh crap. Bisection would be most helpful. Just to be absolutely sure
> that this is not QEMU problem: does exactly same QEMU version work with
> older kernels?
Yes, that was the case last time I tried (I'm on today's git head with
QEMU right now).
Will see what I can do regarding bisecting. That host is a bit slow
(netbook), so it may take a while. Boris will probably beat me in this.
Jan
On Sun, Jun 29, 2014 at 12:59:30PM +0200, Jan Kiszka wrote:
> Will see what I can do regarding bisecting. That host is a bit slow
> (netbook), so it may take a while. Boris will probably beat me in
> this.
Nah, I was about to instrument kvm_multiple_exception() first and am
slow anyway so... :-)
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 2014-06-29 13:51, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 12:59:30PM +0200, Jan Kiszka wrote:
>> Will see what I can do regarding bisecting. That host is a bit slow
>> (netbook), so it may take a while. Boris will probably beat me in
>> this.
>
> Nah, I was about to instrument kvm_multiple_exception() first and am
> slow anyway so... :-)
OK, looks like I won ;): The issue was apparently introduced with "KVM:
x86: get CPL from SS.DPL" (ae9fedc793). Maybe we are not properly saving
or restoring this state on SVM since then.
Need a break, will look into details later.
Jan
On Sun, Jun 29, 2014 at 02:22:35PM +0200, Jan Kiszka wrote:
> OK, looks like I won ;):
I gladly let you win. :-P
> The issue was apparently introduced with "KVM: x86: get CPL from
> SS.DPL" (ae9fedc793). Maybe we are not properly saving or restoring
> this state on SVM since then.
I wonder if this change in the CPL saving would have anything to do with
the fact that we're doing a CR3 write right before we fail the pagetable
walk and end up walking a user page table. It could be unrelated though,
as in the previous dump I had a get_user right before the #DF. Hmmm.
I better go and revert that one and check whether it fixes things.
> Need a break, will look into details later.
Ok, some more info from my side, see relevant snippet below. We're
basically not finding the pte at level 3 during the page walk for
7fff0b0f8908.
However, why we're even page walking this userspace address at that
point I have no idea.
And there is a CR3 write right before this happens, so I'm pretty much
sure by now that the two are related...
qemu-system-x86-5007 [007] ...1 346.126204: vcpu_match_mmio: gva 0xffffffffff5fd0b0 gpa 0xfee000b0 Write GVA
qemu-system-x86-5007 [007] ...1 346.126204: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
qemu-system-x86-5007 [007] ...1 346.126205: kvm_apic: apic_write APIC_EOI = 0x0
qemu-system-x86-5007 [007] ...1 346.126205: kvm_eoi: apicid 0 vector 253
qemu-system-x86-5007 [007] d..2 346.126206: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126211: kvm_exit: reason write_cr3 rip 0xffffffff816113a0 info 8000000000000000 0
qemu-system-x86-5007 [007] ...2 346.126214: kvm_mmu_get_page: sp gen 25 gfn 7b2b1 4 pae q0 wux !nxe root 0 sync existing
qemu-system-x86-5007 [007] d..2 346.126215: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126216: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
qemu-system-x86-5007 [007] ...1 346.126217: kvm_page_fault: address 7fff0b0f8908 error_code 2
qemu-system-x86-5007 [007] ...1 346.126218: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126219: kvm_mmu_paging_element: pte 7b2b6067 level 4
qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_paging_element: pte 0 level 3
qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_walker_error: pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126221: kvm_multiple_exception: nr: 14, prev: 255, has_error: 1, error_code: 0x2, reinj: 0
qemu-system-x86-5007 [007] ...1 346.126221: kvm_inj_exception: #PF (0x2)
qemu-system-x86-5007 [007] d..2 346.126222: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126223: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
qemu-system-x86-5007 [007] ...1 346.126224: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 1
qemu-system-x86-5007 [007] ...1 346.126225: kvm_page_fault: address 7fff0b0f8908 error_code 2
qemu-system-x86-5007 [007] ...1 346.126225: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 0
qemu-system-x86-5007 [007] ...1 346.126226: kvm_mmu_paging_element: pte 7b2b6067 level 4
qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_paging_element: pte 0 level 3
qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_walker_error: pferr 0
qemu-system-x86-5007 [007] ...1 346.126228: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126229: kvm_mmu_paging_element: pte 7b2b6067 level 4
qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_paging_element: pte 0 level 3
qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_walker_error: pferr 2 W
qemu-system-x86-5007 [007] ...1 346.126231: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 0
qemu-system-x86-5007 [007] ...1 346.126231: kvm_inj_exception: #DF (0x0)
qemu-system-x86-5007 [007] d..2 346.126232: kvm_entry: vcpu 0
qemu-system-x86-5007 [007] d..2 346.126371: kvm_exit: reason io rip 0xffffffff8131e623 info 3d40220 ffffffff8131e625
qemu-system-x86-5007 [007] ...1 346.126372: kvm_pio: pio_write at 0x3d4 size 2 count 1 val 0x130e
qemu-system-x86-5007 [007] ...1 346.126374: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-5007 [007] d..2 346.126383: kvm_entry: vcpu 0
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
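A side note on reading the walk above: the indices the walker uses for
7fff0b0f8908 can be recovered with a few shifts. A small stand-alone
illustration of standard 4-level x86-64 paging (nothing KVM-specific; the
values in the comments are what the shifts evaluate to):

#include <stdio.h>

int main(void)
{
	unsigned long long addr = 0x7fff0b0f8908ULL;

	/* 9 index bits per level: PML4 = bits 47-39, PDPT = bits 38-30, ... */
	printf("level 4 (PML4) index: %llu\n", (addr >> 39) & 0x1ff);	/* 255 */
	printf("level 3 (PDPT) index: %llu\n", (addr >> 30) & 0x1ff);	/* 508 */
	printf("level 2 (PD)   index: %llu\n", (addr >> 21) & 0x1ff);
	printf("level 1 (PT)   index: %llu\n", (addr >> 12) & 0x1ff);
	return 0;
}

The level-4 entry (pte 7b2b6067) is present, but the level-3 entry it points
to is zero ("pte 0 level 3"), i.e. that user address simply has no mapping in
the page tables active at that point.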
On Sun, Jun 29, 2014 at 03:14:43PM +0200, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 02:22:35PM +0200, Jan Kiszka wrote:
> > OK, looks like I won ;):
>
> I gladly let you win. :-P
>
> > The issue was apparently introduced with "KVM: x86: get CPL from
> > SS.DPL" (ae9fedc793). Maybe we are not properly saving or restoring
> > this state on SVM since then.
>
> I wonder if this change in the CPL saving would have anything to do with
> the fact that we're doing a CR3 write right before we fail pagetable
> walk and end up walking a user page table. It could be unrelated though,
> as in the previous dump I had a get_user right before the #DF. Hmmm.
>
> I better go and revert that one and check whether it fixes things.
Please do so and let us know.
>
> > Need a break, will look into details later.
>
> Ok, some more info from my side, see relevant snippet below. We're
> basically not finding the pte at level 3 during the page walk for
> 7fff0b0f8908.
>
> However, why we're even page walking this userspace address at that
> point I have no idea.
>
> And the CR3 write right before this happens is there so I'm pretty much
> sure by now that this is related...
>
> qemu-system-x86-5007 [007] ...1 346.126204: vcpu_match_mmio: gva 0xffffffffff5fd0b0 gpa 0xfee000b0 Write GVA
> qemu-system-x86-5007 [007] ...1 346.126204: kvm_mmio: mmio write len 4 gpa 0xfee000b0 val 0x0
> qemu-system-x86-5007 [007] ...1 346.126205: kvm_apic: apic_write APIC_EOI = 0x0
> qemu-system-x86-5007 [007] ...1 346.126205: kvm_eoi: apicid 0 vector 253
> qemu-system-x86-5007 [007] d..2 346.126206: kvm_entry: vcpu 0
> qemu-system-x86-5007 [007] d..2 346.126211: kvm_exit: reason write_cr3 rip 0xffffffff816113a0 info 8000000000000000 0
> qemu-system-x86-5007 [007] ...2 346.126214: kvm_mmu_get_page: sp gen 25 gfn 7b2b1 4 pae q0 wux !nxe root 0 sync existing
> qemu-system-x86-5007 [007] d..2 346.126215: kvm_entry: vcpu 0
> qemu-system-x86-5007 [007] d..2 346.126216: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
> qemu-system-x86-5007 [007] ...1 346.126217: kvm_page_fault: address 7fff0b0f8908 error_code 2
VCPU faults on 7fff0b0f8908.
> qemu-system-x86-5007 [007] ...1 346.126218: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
> qemu-system-x86-5007 [007] ...1 346.126219: kvm_mmu_paging_element: pte 7b2b6067 level 4
> qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_paging_element: pte 0 level 3
> qemu-system-x86-5007 [007] ...1 346.126220: kvm_mmu_walker_error: pferr 2 W
Address is not mapped by the page tables.
> qemu-system-x86-5007 [007] ...1 346.126221: kvm_multiple_exception: nr: 14, prev: 255, has_error: 1, error_code: 0x2, reinj: 0
> qemu-system-x86-5007 [007] ...1 346.126221: kvm_inj_exception: #PF (0x2)
KVM injects #PF.
> qemu-system-x86-5007 [007] d..2 346.126222: kvm_entry: vcpu 0
> qemu-system-x86-5007 [007] d..2 346.126223: kvm_exit: reason PF excp rip 0xffffffff816113df info 2 7fff0b0f8908
> qemu-system-x86-5007 [007] ...1 346.126224: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 1
reinj:1 means that the previous injection failed due to another #PF that
happened during the event injection itself. This may happen if the GDT or the
first instruction of the fault handler is not mapped by shadow pages, but here
it says that the new page fault is at the same address as the previous one, as
if the GDT or the #PF handler were mapped there. Strange. Especially since the
#DF is injected successfully, so the GDT should be fine. Maybe a wrong CPL
makes SVM go crazy?
> qemu-system-x86-5007 [007] ...1 346.126225: kvm_page_fault: address 7fff0b0f8908 error_code 2
> qemu-system-x86-5007 [007] ...1 346.126225: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 0
> qemu-system-x86-5007 [007] ...1 346.126226: kvm_mmu_paging_element: pte 7b2b6067 level 4
> qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_paging_element: pte 0 level 3
> qemu-system-x86-5007 [007] ...1 346.126227: kvm_mmu_walker_error: pferr 0
> qemu-system-x86-5007 [007] ...1 346.126228: kvm_mmu_pagetable_walk: addr 7fff0b0f8908 pferr 2 W
> qemu-system-x86-5007 [007] ...1 346.126229: kvm_mmu_paging_element: pte 7b2b6067 level 4
> qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_paging_element: pte 0 level 3
> qemu-system-x86-5007 [007] ...1 346.126230: kvm_mmu_walker_error: pferr 2 W
> qemu-system-x86-5007 [007] ...1 346.126231: kvm_multiple_exception: nr: 14, prev: 14, has_error: 1, error_code: 0x2, reinj: 0
Here we get a #PF while delivering another #PF, which is rightfully transformed into a #DF.
> qemu-system-x86-5007 [007] ...1 346.126231: kvm_inj_exception: #DF (0x0)
> qemu-system-x86-5007 [007] d..2 346.126232: kvm_entry: vcpu 0
> qemu-system-x86-5007 [007] d..2 346.126371: kvm_exit: reason io rip 0xffffffff8131e623 info 3d40220 ffffffff8131e625
> qemu-system-x86-5007 [007] ...1 346.126372: kvm_pio: pio_write at 0x3d4 size 2 count 1 val 0x130e
> qemu-system-x86-5007 [007] ...1 346.126374: kvm_userspace_exit: reason KVM_EXIT_IO (2)
> qemu-system-x86-5007 [007] d..2 346.126383: kvm_entry: vcpu 0
>
--
Gleb.
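To make the #DF promotion concrete: kvm_multiple_exception() classifies the
pending and the new exception and merges a contributory/page-fault pair into a
double fault, roughly along these lines (a simplified paraphrase of the
3.16-era logic, not the verbatim code):

enum { EXCPT_BENIGN, EXCPT_CONTRIBUTORY, EXCPT_PF };

static int exception_class(int vector)
{
	switch (vector) {
	case 14:				/* #PF */
		return EXCPT_PF;
	case 0: case 10: case 11:		/* #DE, #TS, #NP */
	case 12: case 13:			/* #SS, #GP */
		return EXCPT_CONTRIBUTORY;
	default:
		return EXCPT_BENIGN;
	}
}

static bool promotes_to_double_fault(int prev_nr, int nr)
{
	int class1 = exception_class(prev_nr);	/* already pending */
	int class2 = exception_class(nr);	/* newly raised */

	/* contributory + contributory, or #PF followed by anything
	 * non-benign, becomes #DF */
	return (class1 == EXCPT_CONTRIBUTORY && class2 == EXCPT_CONTRIBUTORY) ||
	       (class1 == EXCPT_PF && class2 != EXCPT_BENIGN);
}

With prev_nr == nr == 14, as in the trace above, this yields the #DF that gets
injected.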
On Sun, Jun 29, 2014 at 03:14:43PM +0200, Borislav Petkov wrote:
> I better go and revert that one and check whether it fixes things.
Yahaaa, that was some good bisection work Jan! :-)
> 20 guest restart cycles and all is fine - it used to trigger after 5 max.
Phew, we have it right in time before the football game in 2 hrs. :-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
> Please do so and let us know.
Yep, just did. Reverting ae9fedc793 fixes the issue.
> reinj:1 means that previous injection failed due to another #PF that
> happened during the event injection itself This may happen if GDT or fist
> instruction of a fault handler is not mapped by shadow pages, but here
> it says that the new page fault is at the same address as the previous
> one as if GDT is or #PF handler is mapped there. Strange. Especially
> since #DF is injected successfully, so GDT should be fine. May be wrong
> cpl makes svm crazy?
Well, I'm not going to even pretend to know kvm to know *when* we're
saving VMCB state but if we're saving the wrong CPL and then doing the
pagetable walk, I can very well imagine if the walker gets confused. One
possible issue could be U/S bit (bit 2) in the PTE bits which allows
access to supervisor pages only when CPL < 3. I.e., CPL has effect on
pagetable walk and a wrong CPL level could break it.
All a conjecture though...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
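A minimal sketch of the permission check being alluded to here (an
illustration of the architectural rule, not KVM's actual walker code): with
the U/S bit clear in a paging-structure entry the translation is
supervisor-only, and whether an access counts as a user access is decided by
the CPL, so a bogus CPL directly changes the outcome of the walk.

#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT	(1ULL << 0)
#define PTE_WRITE	(1ULL << 1)
#define PTE_USER	(1ULL << 2)	/* U/S bit */

/* Simplified data-access check against one entry, ignoring CR0.WP,
 * SMAP/SMEP and the NX bit for brevity. */
static bool pte_allows_access(uint64_t pte, int cpl, bool write)
{
	if (!(pte & PTE_PRESENT))
		return false;
	if (cpl == 3 && !(pte & PTE_USER))
		return false;		/* user access to a supervisor page */
	if (cpl == 3 && write && !(pte & PTE_WRITE))
		return false;
	return true;
}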
On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
> > Please do so and let us know.
>
> Yep, just did. Reverting ae9fedc793 fixes the issue.
>
> > reinj:1 means that previous injection failed due to another #PF that
> > happened during the event injection itself This may happen if GDT or fist
> > instruction of a fault handler is not mapped by shadow pages, but here
> > it says that the new page fault is at the same address as the previous
> > one as if GDT is or #PF handler is mapped there. Strange. Especially
> > since #DF is injected successfully, so GDT should be fine. May be wrong
> > cpl makes svm crazy?
>
> Well, I'm not going to even pretend to know kvm to know *when* we're
> saving VMCB state but if we're saving the wrong CPL and then doing the
> pagetable walk, I can very well imagine if the walker gets confused. One
> possible issue could be U/S bit (bit 2) in the PTE bits which allows
> access to supervisor pages only when CPL < 3. I.e., CPL has effect on
> pagetable walk and a wrong CPL level could break it.
>
> All a conjecture though...
>
Looks plausible, still strange that the second #PF is at the same address as the first one though.
Anyway, now we have the commit to blame.
--
Gleb.
On 2014-06-29 16:27, Gleb Natapov wrote:
> On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
>> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
>>> Please do so and let us know.
>>
>> Yep, just did. Reverting ae9fedc793 fixes the issue.
>>
>>> reinj:1 means that previous injection failed due to another #PF that
>>> happened during the event injection itself This may happen if GDT or fist
>>> instruction of a fault handler is not mapped by shadow pages, but here
>>> it says that the new page fault is at the same address as the previous
>>> one as if GDT is or #PF handler is mapped there. Strange. Especially
>>> since #DF is injected successfully, so GDT should be fine. May be wrong
>>> cpl makes svm crazy?
>>
>> Well, I'm not going to even pretend to know kvm to know *when* we're
>> saving VMCB state but if we're saving the wrong CPL and then doing the
>> pagetable walk, I can very well imagine if the walker gets confused. One
>> possible issue could be U/S bit (bit 2) in the PTE bits which allows
>> access to supervisor pages only when CPL < 3. I.e., CPL has effect on
>> pagetable walk and a wrong CPL level could break it.
>>
>> All a conjecture though...
>>
> Looks plausible, still strange that second #PF is at the same address as the first one though.
> Anyway, not we have the commit to blame.
I suspect there is a gap between cause and effect. I'm tracing CPL
changes currently, and my first impression is that QEMU triggers an
unwanted switch from CPL 3 to 0 on vmport access:
qemu-system-x86-11883 [001] 7493.378630: kvm_entry: vcpu 0
qemu-system-x86-11883 [001] 7493.378631: bprint: svm_vcpu_run: entry cpl 0
qemu-system-x86-11883 [001] 7493.378636: bprint: svm_vcpu_run: exit cpl 3
qemu-system-x86-11883 [001] 7493.378637: kvm_exit: reason io rip 0x400854 info 56580241 400855
qemu-system-x86-11883 [001] 7493.378640: kvm_emulate_insn: 0:400854:ed (prot64)
qemu-system-x86-11883 [001] 7493.378642: kvm_userspace_exit: reason KVM_EXIT_IO (2)
qemu-system-x86-11883 [001] 7493.378655: bprint: kvm_arch_vcpu_ioctl_get_sregs: ss.dpl 0
qemu-system-x86-11883 [001] 7493.378684: bprint: kvm_arch_vcpu_ioctl_set_sregs: ss.dpl 0
qemu-system-x86-11883 [001] 7493.378685: bprint: svm_set_segment: cpl = 0
qemu-system-x86-11883 [001] 7493.378711: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x3442554a
Yeah... do we have to manually sync save.cpl into ss.dpl on get_sregs
on AMD?
Jan
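The "bprint" lines above come from ad-hoc trace_printk() instrumentation; a
sketch of what that might look like (a guess for illustration, the actual
debugging patch is not included in the thread):

/* arch/x86/kvm/svm.c */
static void svm_vcpu_run(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	trace_printk("entry cpl %d\n", svm->vmcb->save.cpl);
	/* ... existing entry, VMRUN and exit handling ... */
	trace_printk("exit cpl %d\n", svm->vmcb->save.cpl);
}

/* arch/x86/kvm/x86.c, in kvm_arch_vcpu_ioctl_get_sregs()/_set_sregs() */
	trace_printk("ss.dpl %d\n", sregs->ss.dpl);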
On 2014-06-29 16:32, Jan Kiszka wrote:
> On 2014-06-29 16:27, Gleb Natapov wrote:
>> On Sun, Jun 29, 2014 at 04:01:04PM +0200, Borislav Petkov wrote:
>>> On Sun, Jun 29, 2014 at 04:42:47PM +0300, Gleb Natapov wrote:
>>>> Please do so and let us know.
>>>
>>> Yep, just did. Reverting ae9fedc793 fixes the issue.
>>>
>>>> reinj:1 means that previous injection failed due to another #PF that
>>>> happened during the event injection itself This may happen if GDT or fist
>>>> instruction of a fault handler is not mapped by shadow pages, but here
>>>> it says that the new page fault is at the same address as the previous
>>>> one as if GDT is or #PF handler is mapped there. Strange. Especially
>>>> since #DF is injected successfully, so GDT should be fine. May be wrong
>>>> cpl makes svm crazy?
>>>
>>> Well, I'm not going to even pretend to know kvm to know *when* we're
>>> saving VMCB state but if we're saving the wrong CPL and then doing the
>>> pagetable walk, I can very well imagine if the walker gets confused. One
>>> possible issue could be U/S bit (bit 2) in the PTE bits which allows
>>> access to supervisor pages only when CPL < 3. I.e., CPL has effect on
>>> pagetable walk and a wrong CPL level could break it.
>>>
>>> All a conjecture though...
>>>
>> Looks plausible, still strange that second #PF is at the same address as the first one though.
>> Anyway, not we have the commit to blame.
>
> I suspect there is a gap between cause and effect. I'm tracing CPL
> changes currently, and my first impression is that QEMU triggers an
> unwanted switch from CPL 3 to 0 on vmport access:
>
> qemu-system-x86-11883 [001] 7493.378630: kvm_entry: vcpu 0
> qemu-system-x86-11883 [001] 7493.378631: bprint: svm_vcpu_run: entry cpl 0
> qemu-system-x86-11883 [001] 7493.378636: bprint: svm_vcpu_run: exit cpl 3
> qemu-system-x86-11883 [001] 7493.378637: kvm_exit: reason io rip 0x400854 info 56580241 400855
> qemu-system-x86-11883 [001] 7493.378640: kvm_emulate_insn: 0:400854:ed (prot64)
> qemu-system-x86-11883 [001] 7493.378642: kvm_userspace_exit: reason KVM_EXIT_IO (2)
> qemu-system-x86-11883 [001] 7493.378655: bprint: kvm_arch_vcpu_ioctl_get_sregs: ss.dpl 0
> qemu-system-x86-11883 [001] 7493.378684: bprint: kvm_arch_vcpu_ioctl_set_sregs: ss.dpl 0
> qemu-system-x86-11883 [001] 7493.378685: bprint: svm_set_segment: cpl = 0
> qemu-system-x86-11883 [001] 7493.378711: kvm_pio: pio_read at 0x5658 size 4 count 1 val 0x3442554a
>
> Yeah... do we have to manually sync save.cpl into ss.dpl on get_sregs
> on AMD?
>
Applying this logic:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..b5e994a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
*/
if (var->unusable)
var->db = 0;
+ var->dpl = to_svm(vcpu)->vmcb->save.cpl;
break;
}
}
...and my VM runs smoothly so far. Does it make sense in all scenarios?
Jan
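Spelled out as the raw ioctls that QEMU issues under the hood (a sketch for
illustration; vcpu_fd setup and error handling are omitted): before the fix,
the sregs read back after a vmport exit carry ss.dpl == 0 although the guest
was interrupted at CPL 3, and writing the unchanged structure back makes
svm_set_segment() derive CPL 0 from it.

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical helper: what the vmport/"info registers" path boils down
 * to from the kernel's point of view. */
static void sregs_roundtrip(int vcpu_fd)
{
	struct kvm_sregs sregs;

	ioctl(vcpu_fd, KVM_GET_SREGS, &sregs);	/* ss.dpl comes back as 0 */
	/* ... registers inspected/modified in userspace ... */
	ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);	/* VMCB CPL forced to 0 */
}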
From: Jan Kiszka <[email protected]>
We import the CPL via SS.DPL since ae9fedc793. However, we fail to
export it this way so far. This caused spurious guest crashes, e.g. of
Linux when accessing the vmport from guest user space which triggered
register saving/restoring to/from host user space.
Signed-off-by: Jan Kiszka <[email protected]>
---
Just in time for the next match :D
arch/x86/kvm/svm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..b5e994a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
*/
if (var->unusable)
var->db = 0;
+ var->dpl = to_svm(vcpu)->vmcb->save.cpl;
break;
}
}
--
1.8.4.5
On Sun, Jun 29, 2014 at 05:12:43PM +0200, Jan Kiszka wrote:
> From: Jan Kiszka <[email protected]>
>
> We import the CPL via SS.DPL since ae9fedc793. However, we fail to
> export it this way so far. This caused spurious guest crashes, e.g. of
> Linux when accessing the vmport from guest user space which triggered
> register saving/restoring to/from host user space.
>
> Signed-off-by: Jan Kiszka <[email protected]>
Yep, looks good.
Tested-by: Borislav Petkov <[email protected]>
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 29/06/2014 17:12, Jan Kiszka wrote:
> From: Jan Kiszka <[email protected]>
>
> We import the CPL via SS.DPL since ae9fedc793. However, we fail to
> export it this way so far. This caused spurious guest crashes, e.g. of
> Linux when accessing the vmport from guest user space which triggered
> register saving/restoring to/from host user space.
>
> Signed-off-by: Jan Kiszka <[email protected]>
> ---
>
> Just in time for the next match :D
>
> arch/x86/kvm/svm.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index ec8366c..b5e994a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
> */
> if (var->unusable)
> var->db = 0;
> + var->dpl = to_svm(vcpu)->vmcb->save.cpl;
> break;
> }
> }
>
Thanks. In theory this is not necessary: the SS.DPL should be the same
as the CPL according to the manuals (the manuals say that the SS.DPL
"should match" the CPL, and that's the only reason why I included the
import in ae9fedc793). But apparently this is not the case.
Paolo
On 2014-06-30 17:01, Paolo Bonzini wrote:
> Il 29/06/2014 17:12, Jan Kiszka ha scritto:
>> From: Jan Kiszka <[email protected]>
>>
>> We import the CPL via SS.DPL since ae9fedc793. However, we fail to
>> export it this way so far. This caused spurious guest crashes, e.g. of
>> Linux when accessing the vmport from guest user space which triggered
>> register saving/restoring to/from host user space.
>>
>> Signed-off-by: Jan Kiszka <[email protected]>
>> ---
>>
>> Just in time for the next match :D
>>
>> arch/x86/kvm/svm.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index ec8366c..b5e994a 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1462,6 +1462,7 @@ static void svm_get_segment(struct kvm_vcpu *vcpu,
>> */
>> if (var->unusable)
>> var->db = 0;
>> + var->dpl = to_svm(vcpu)->vmcb->save.cpl;
>> break;
>> }
>> }
>>
>
> Thanks. In theory this is not necessary, the SS.DPL should be the same
> as the CPL according to the manuals (the manual say that the SS.DPL
> "should match" the CPL, and that's the only reason why I included the
> import in ae9fedc793). But apparently this is not the case.
15.5.1:
"When examining segment attributes after a #VMEXIT:
[...]
• Retrieve the CPL from the CPL field in the VMCB, not from any segment
DPL."
Jan
On Mon, Jun 30, 2014 at 05:03:57PM +0200, Jan Kiszka wrote:
> 15.5.1:
>
> "When examining segment attributes after a #VMEXIT:
> [...]
> • Retrieve the CPL from the CPL field in the VMCB, not from any segment
> DPL."
Heey, it is even documented! :-P
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Mon, Jun 30, 2014 at 05:15:44PM +0200, Borislav Petkov wrote:
> On Mon, Jun 30, 2014 at 05:03:57PM +0200, Jan Kiszka wrote:
> > 15.5.1:
> >
> > "When examining segment attributes after a #VMEXIT:
> > [...]
> > • Retrieve the CPL from the CPL field in the VMCB, not from any segment
> > DPL."
>
> Heey, it is even documented! :-P
>
Yes, on SVM we should always respect this field. Unfortunately there
is no such field in VMX, so we have to do DPL gymnastics there.
--
Gleb.
On 30/06/2014 17:03, Jan Kiszka wrote:
> 15.5.1:
>
> "When examining segment attributes after a #VMEXIT:
> [...]
> • Retrieve the CPL from the CPL field in the VMCB, not from any segment
> DPL."
It's only the fourth paragraph below the one I did read...
Paolo