2016-12-13 19:51:22

Subject: kvm: WARNING in mmu_spte_clear_track_bits

Hello,

The following program:
https://gist.githubusercontent.com/dvyukov/23d8bd622fd526d7701ac2057bbbc9c2/raw/aacd20451e6f460232f5e1da262b653fb3155613/gistfile1.txt

triggers a WARNING in mmu_spte_clear_track_bits, followed by a burst of
"Bad page state" reports such as:
BUG: Bad page state in process a.out pfn:619b5

This is on commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba (Dec 12).

------------[ cut here ]------------
WARNING: CPU: 0 PID: 6907 at mmu_spte_clear_track_bits+0x326/0x3a0
arch/x86/kvm/mmu.c:614
Modules linked in:
CPU: 0 PID: 6907 Comm: a.out Not tainted 4.9.0+ #85
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[< none >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[< none >] __warn+0x1a4/0x1e0 kernel/panic.c:550
[< none >] warn_slowpath_null+0x31/0x40 kernel/panic.c:585
[< none >] mmu_spte_clear_track_bits+0x326/0x3a0
arch/x86/kvm/mmu.c:614
[< none >] drop_spte+0x29/0x220 arch/x86/kvm/mmu.c:1182
[< none >] mmu_page_zap_pte+0x209/0x300 arch/x86/kvm/mmu.c:2306
[< inline >] kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2328
[< none >] kvm_mmu_prepare_zap_page+0x1cd/0x1240
arch/x86/kvm/mmu.c:2372
[< inline >] kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:4915
[< none >] kvm_mmu_invalidate_zap_all_pages+0x4af/0x6f0
arch/x86/kvm/mmu.c:4956
[< none >] kvm_arch_flush_shadow_all+0x1a/0x20
arch/x86/kvm/x86.c:8177
[< none >] kvm_mmu_notifier_release+0x76/0xb0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:467
[< none >] __mmu_notifier_release+0x1fe/0x6c0 mm/mmu_notifier.c:74
[< inline >] mmu_notifier_release ./include/linux/mmu_notifier.h:235
[< none >] exit_mmap+0x3d1/0x4a0 mm/mmap.c:2918
[< inline >] __mmput kernel/fork.c:868
[< none >] mmput+0x1fd/0x690 kernel/fork.c:890
[< inline >] exit_mm kernel/exit.c:521
[< none >] do_exit+0x9e7/0x2930 kernel/exit.c:826
[< none >] do_group_exit+0x14e/0x420 kernel/exit.c:943
[< inline >] SYSC_exit_group kernel/exit.c:954
[< none >] SyS_exit_group+0x22/0x30 kernel/exit.c:952
[< none >] entry_SYSCALL_64_fastpath+0x23/0xc6
arch/x86/entry/entry_64.S:203
RIP: 0033:0x43f4d9
RSP: 002b:00007ffc7e83f548 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000006d6660 RCX: 000000000043f4d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000001 R08: 000000000000003c R09: 00000000000000e7
R10: ffffffffffffffd0 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fe58e3869c0 R15: 00007fe58e386700
---[ end trace 37ef4e3d7e4c81a9 ]---

BUG: Bad page state in process a.out pfn:61fb5
page:ffffea000187ed40 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x5fffc0000000014(referenced|dirty)
raw: 05fffc0000000014 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea000187ed60 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
bad because of flags: 0x14(referenced|dirty)
Modules linked in:
CPU: 2 PID: 7169 Comm: a.out Tainted: G W 4.9.0+ #85
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[< none >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[< none >] bad_page+0x29c/0x320 mm/page_alloc.c:550
[< none >] check_new_page_bad+0x203/0x2f0 mm/page_alloc.c:1682
[< inline >] check_new_page mm/page_alloc.c:1694
[< inline >] check_new_pages mm/page_alloc.c:1731
[< none >] buffered_rmqueue+0x1770/0x2900 mm/page_alloc.c:2668
[< none >] get_page_from_freelist+0x213/0x1180
mm/page_alloc.c:2985
[< none >] __alloc_pages_nodemask+0x3b2/0xc90 mm/page_alloc.c:3801
[< inline >] __alloc_pages ./include/linux/gfp.h:433
[< inline >] __alloc_pages_node ./include/linux/gfp.h:446
[< none >] alloc_pages_vma+0x723/0xa30 mm/mempolicy.c:2012
[< none >] do_huge_pmd_anonymous_page+0x35f/0x1b10
mm/huge_memory.c:704
[< inline >] create_huge_pmd mm/memory.c:3476
[< inline >] __handle_mm_fault mm/memory.c:3626
[< none >] handle_mm_fault+0x1975/0x2b90 mm/memory.c:3687
[< none >] __do_page_fault+0x4fb/0xb60 arch/x86/mm/fault.c:1396
[< none >] trace_do_page_fault+0x159/0x810
arch/x86/mm/fault.c:1489
[< none >] do_async_page_fault+0x77/0xd0 arch/x86/kernel/kvm.c:264
[< none >] async_page_fault+0x28/0x30
arch/x86/entry/entry_64.S:1011
RIP: 0033:0x401f5f
RSP: 002b:00007fe592b8ece0 EFLAGS: 00010246
RAX: 0000000020017fe0 RBX: 0000000000000000 RCX: 0000000000403894
RDX: b93bc4d4f06f7d0e RSI: 0000000000000000 RDI: 00007fe592b8f608
RBP: 00007fe592b8ed10 R08: 00007fe592b8f700 R09: 00007fe592b8f700
R10: 00007fe592b8f9d0 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fe592b8f9c0 R15: 00007fe592b8f700

BUG: Bad page state in process a.out pfn:619b5
page:ffffea0001866d40 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x5fffc0000000014(referenced|dirty)
raw: 05fffc0000000014 0000000000000000 0000000000000000 00000000ffffffff
raw: 0000000000000000 ffffea0001866d60 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag set
bad because of flags: 0x14(referenced|dirty)
Modules linked in:
CPU: 2 PID: 7169 Comm: a.out Tainted: G B W 4.9.0+ #85
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[< none >] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
[< none >] bad_page+0x29c/0x320 mm/page_alloc.c:550
[< none >] check_new_page_bad+0x203/0x2f0 mm/page_alloc.c:1682
[< inline >] check_new_page mm/page_alloc.c:1694
[< inline >] check_new_pages mm/page_alloc.c:1731
[< none >] buffered_rmqueue+0x1770/0x2900 mm/page_alloc.c:2668
[< none >] get_page_from_freelist+0x213/0x1180
mm/page_alloc.c:2985
[< none >] __alloc_pages_nodemask+0x3b2/0xc90 mm/page_alloc.c:3801
[< inline >] __alloc_pages ./include/linux/gfp.h:433
[< inline >] __alloc_pages_node ./include/linux/gfp.h:446
[< none >] alloc_pages_vma+0x723/0xa30 mm/mempolicy.c:2012
[< none >] do_huge_pmd_anonymous_page+0x35f/0x1b10
mm/huge_memory.c:704
[< inline >] create_huge_pmd mm/memory.c:3476
[< inline >] __handle_mm_fault mm/memory.c:3626
[< none >] handle_mm_fault+0x1975/0x2b90 mm/memory.c:3687
[< none >] __do_page_fault+0x4fb/0xb60 arch/x86/mm/fault.c:1396
[< none >] trace_do_page_fault+0x159/0x810
arch/x86/mm/fault.c:1489
[< none >] do_async_page_fault+0x77/0xd0 arch/x86/kernel/kvm.c:264
[< none >] async_page_fault+0x28/0x30
arch/x86/entry/entry_64.S:1011
RIP: 0033:0x401f5f
RSP: 002b:00007fe592b8ece0 EFLAGS: 00010246
RAX: 0000000020017fe0 RBX: 0000000000000000 RCX: 0000000000403894
RDX: b93bc4d4f06f7d0e RSI: 0000000000000000 RDI: 00007fe592b8f608
RBP: 00007fe592b8ed10 R08: 00007fe592b8f700 R09: 00007fe592b8f700
R10: 00007fe592b8f9d0 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fe592b8f9c0 R15: 00007fe592b8f700

2017-03-12 11:21:11

by Dmitry Vyukov

Subject: Re: kvm: WARNING in mmu_spte_clear_track_bits

On Tue, Jan 17, 2017 at 5:00 PM, Dmitry Vyukov wrote:
> On Tue, Jan 17, 2017 at 4:20 PM, Paolo Bonzini wrote:
>>
>>
>> On 13/01/2017 12:15, Dmitry Vyukov wrote:
>>>
>>> I've commented out the WARNING for now, but I am seeing lots of
>>> use-after-free's and rcu stalls involving mmu_spte_clear_track_bits:
>>>
>>>
>>> BUG: KASAN: use-after-free in mmu_spte_clear_track_bits+0x186/0x190
>>> arch/x86/kvm/mmu.c:597 at addr ffff880068ae2008
>>> Read of size 8 by task syz-executor2/16715
>>> page:ffffea00016e6170 count:0 mapcount:0 mapping: (null) index:0x0
>>> flags: 0x500000000000000()
>>> raw: 0500000000000000 0000000000000000 0000000000000000 00000000ffffffff
>>> raw: ffffea00017ec5a0 ffffea0001783d48 ffff88006aec5d98
>>> page dumped because: kasan: bad access detected
>>> CPU: 2 PID: 16715 Comm: syz-executor2 Not tainted 4.10.0-rc3+ #163
>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>> Call Trace:
>>> __dump_stack lib/dump_stack.c:15 [inline]
>>> dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>>> kasan_report_error mm/kasan/report.c:213 [inline]
>>> kasan_report+0x42d/0x460 mm/kasan/report.c:307
>>> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
>>> mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597
>>> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
>>> kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
>>> kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
>>> kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
>>> kvm_unmap_hva_range+0x2e/0x40 arch/x86/kvm/mmu.c:1591
>>> kvm_mmu_notifier_invalidate_range_start+0xae/0x140
>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:360
>>> __mmu_notifier_invalidate_range_start+0x1f8/0x300 mm/mmu_notifier.c:199
>>> mmu_notifier_invalidate_range_start include/linux/mmu_notifier.h:282 [inline]
>>> unmap_vmas+0x14b/0x1b0 mm/memory.c:1368
>>> unmap_region+0x2f8/0x560 mm/mmap.c:2460
>>> do_munmap+0x7b8/0xfa0 mm/mmap.c:2657
>>> mmap_region+0x68f/0x18e0 mm/mmap.c:1612
>>> do_mmap+0x6a2/0xd40 mm/mmap.c:1450
>>> do_mmap_pgoff include/linux/mm.h:2031 [inline]
>>> vm_mmap_pgoff+0x1a9/0x200 mm/util.c:305
>>> SYSC_mmap_pgoff mm/mmap.c:1500 [inline]
>>> SyS_mmap_pgoff+0x22c/0x5d0 mm/mmap.c:1458
>>> SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [inline]
>>> SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:86
>>> entry_SYSCALL_64_fastpath+0x1f/0xc2
>>> RIP: 0033:0x445329
>>> RSP: 002b:00007fb33933cb58 EFLAGS: 00000282 ORIG_RAX: 0000000000000009
>>> RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000445329
>>> RDX: 0000000000000003 RSI: 0000000000af1000 RDI: 0000000020000000
>>> RBP: 00000000006dfe90 R08: ffffffffffffffff R09: 0000000000000000
>>> R10: 0000000000000032 R11: 0000000000000282 R12: 0000000000700000
>>> R13: 0000000000000006 R14: ffffffffffffffff R15: 0000000020001000
>>> Memory state around the buggy address:
>>> ffff880068ae1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> ffff880068ae1f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> ffff880068ae2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ^
>>> ffff880068ae2080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ffff880068ae2100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>> ==================================================================
>>
>> This could be related to the gfn_to_rmap issues.
>
>
> Humm... That's possible. Potentially I am not seeing any more of
> spte-related crashes after I applied the following patch:
>
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -968,8 +968,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
> /* Check for overlaps */
> r = -EEXIST;
> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
> - if ((slot->id >= KVM_USER_MEM_SLOTS) ||
> - (slot->id == id))
> + if (slot->id == id)
> continue;
> if (!((base_gfn + npages <= slot->base_gfn) ||
> (base_gfn >= slot->base_gfn + slot->npages)))

Friendly ping. Just hit it on
mmotm/86292b33d4b79ee03e2f43ea0381ef85f077c760 (without the above
change):

------------[ cut here ]------------
WARNING: CPU: 1 PID: 31060 at arch/x86/kvm/mmu.c:682
mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
CPU: 1 PID: 31060 Comm: syz-executor0 Not tainted 4.11.0-rc1+ #328
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x1a7/0x26a lib/dump_stack.c:52
panic+0x1f8/0x40f kernel/panic.c:180
__warn+0x1c4/0x1e0 kernel/panic.c:541
warn_slowpath_null+0x2c/0x40 kernel/panic.c:584
mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1323
mmu_page_zap_pte+0x223/0x350 arch/x86/kvm/mmu.c:2438
kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2460 [inline]
kvm_mmu_prepare_zap_page+0x1ce/0x13d0 arch/x86/kvm/mmu.c:2504
kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:5134 [inline]
kvm_mmu_invalidate_zap_all_pages+0x4d4/0x6b0 arch/x86/kvm/mmu.c:5175
kvm_arch_flush_shadow_all+0x15/0x20 arch/x86/kvm/x86.c:8364
kvm_mmu_notifier_release+0x71/0xb0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:472
__mmu_notifier_release+0x1e5/0x6b0 mm/mmu_notifier.c:75
mmu_notifier_release include/linux/mmu_notifier.h:235 [inline]
exit_mmap+0x3a3/0x470 mm/mmap.c:2941
__mmput kernel/fork.c:890 [inline]
mmput+0x228/0x700 kernel/fork.c:912
exit_mm kernel/exit.c:558 [inline]
do_exit+0x9e8/0x1c20 kernel/exit.c:866
do_group_exit+0x149/0x400 kernel/exit.c:983
get_signal+0x6d9/0x1840 kernel/signal.c:2318
do_signal+0x94/0x1f30 arch/x86/kernel/signal.c:808
exit_to_usermode_loop+0x1e5/0x2d0 arch/x86/entry/common.c:157
prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
entry_SYSCALL_64_fastpath+0xc0/0xc2
RIP: 0033:0x4458d9
RSP: 002b:00007ffa472c3b58 EFLAGS: 00000286 ORIG_RAX: 00000000000000ce
RAX: fffffffffffffff4 RBX: 0000000000708000 RCX: 00000000004458d9
RDX: 0000000000000000 RSI: 000000002006bff8 RDI: 000000000000a05b
RBP: 0000000000000fe0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000286 R12: 00000000006df0a0
R13: 000000000000a05b R14: 000000002006bff8 R15: 0000000000000000

2017-03-14 15:17:49

by Radim Krčmář

Subject: Re: kvm: WARNING in mmu_spte_clear_track_bits

2017-03-12 12:20+0100, Dmitry Vyukov:
> On Tue, Jan 17, 2017 at 5:00 PM, Dmitry Vyukov wrote:
>> On Tue, Jan 17, 2017 at 4:20 PM, Paolo Bonzini wrote:
>>>
>>>
>>> On 13/01/2017 12:15, Dmitry Vyukov wrote:
>>>>
>>>> I've commented out the WARNING for now, but I am seeing lots of
>>>> use-after-free's and rcu stalls involving mmu_spte_clear_track_bits:
>>>>
>>>>
>>>> BUG: KASAN: use-after-free in mmu_spte_clear_track_bits+0x186/0x190
>>>> arch/x86/kvm/mmu.c:597 at addr ffff880068ae2008
>>>> Read of size 8 by task syz-executor2/16715
>>>> page:ffffea00016e6170 count:0 mapcount:0 mapping: (null) index:0x0
>>>> flags: 0x500000000000000()
>>>> raw: 0500000000000000 0000000000000000 0000000000000000 00000000ffffffff
>>>> raw: ffffea00017ec5a0 ffffea0001783d48 ffff88006aec5d98
>>>> page dumped because: kasan: bad access detected
>>>> CPU: 2 PID: 16715 Comm: syz-executor2 Not tainted 4.10.0-rc3+ #163
>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>>> Call Trace:
>>>> __dump_stack lib/dump_stack.c:15 [inline]
>>>> dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>>>> kasan_report_error mm/kasan/report.c:213 [inline]
>>>> kasan_report+0x42d/0x460 mm/kasan/report.c:307
>>>> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
>>>> mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597
>>>> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
>>>> kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
>>>> kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
>>>> kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
>>>> kvm_unmap_hva_range+0x2e/0x40 arch/x86/kvm/mmu.c:1591
>>>> kvm_mmu_notifier_invalidate_range_start+0xae/0x140
>>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:360
>>>> __mmu_notifier_invalidate_range_start+0x1f8/0x300 mm/mmu_notifier.c:199
>>>> mmu_notifier_invalidate_range_start include/linux/mmu_notifier.h:282 [inline]
>>>> unmap_vmas+0x14b/0x1b0 mm/memory.c:1368
>>>> unmap_region+0x2f8/0x560 mm/mmap.c:2460
>>>> do_munmap+0x7b8/0xfa0 mm/mmap.c:2657
>>>> mmap_region+0x68f/0x18e0 mm/mmap.c:1612
>>>> do_mmap+0x6a2/0xd40 mm/mmap.c:1450
>>>> do_mmap_pgoff include/linux/mm.h:2031 [inline]
>>>> vm_mmap_pgoff+0x1a9/0x200 mm/util.c:305
>>>> SYSC_mmap_pgoff mm/mmap.c:1500 [inline]
>>>> SyS_mmap_pgoff+0x22c/0x5d0 mm/mmap.c:1458
>>>> SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [inline]
>>>> SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:86
>>>> entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>> RIP: 0033:0x445329
>>>> RSP: 002b:00007fb33933cb58 EFLAGS: 00000282 ORIG_RAX: 0000000000000009
>>>> RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000445329
>>>> RDX: 0000000000000003 RSI: 0000000000af1000 RDI: 0000000020000000
>>>> RBP: 00000000006dfe90 R08: ffffffffffffffff R09: 0000000000000000
>>>> R10: 0000000000000032 R11: 0000000000000282 R12: 0000000000700000
>>>> R13: 0000000000000006 R14: ffffffffffffffff R15: 0000000020001000
>>>> Memory state around the buggy address:
>>>> ffff880068ae1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>> ffff880068ae1f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> ffff880068ae2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> ^
>>>> ffff880068ae2080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> ffff880068ae2100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>> ==================================================================
>>>
>>> This could be related to the gfn_to_rmap issues.
>>
>>
>> Humm... That's possible. Potentially I am not seeing any more of
>> spte-related crashes after I applied the following patch:
>>
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -968,8 +968,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>> /* Check for overlaps */
>> r = -EEXIST;
>> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
>> - if ((slot->id >= KVM_USER_MEM_SLOTS) ||
>> - (slot->id == id))
>> + if (slot->id == id)
>> continue;
>> if (!((base_gfn + npages <= slot->base_gfn) ||
>> (base_gfn >= slot->base_gfn + slot->npages)))

I don't understand how this fixes the test: the only memslot that the
test creates is at memory range 0x0-0x1000, which should not overlap
with any private memslots.
There should be just the IDENTITY_PAGETABLE_PRIVATE_MEMSLOT @
0xfffbc000ul.
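
To make the non-overlap argument concrete, the range test from __kvm_set_memory_region() can be sketched in standalone C. This is only an illustration: the struct and helper names are hypothetical, 4 KiB pages are assumed, and the one-page size used for the identity page table slot in the usage note is an assumption as well.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages, as on x86 */

/* Hypothetical stand-in for the two kvm_memory_slot fields the check uses. */
struct gfn_range {
	uint64_t base_gfn; /* first guest frame number */
	uint64_t npages;   /* length in pages */
};

/* Same predicate as the overlap test in __kvm_set_memory_region(). */
static bool gfn_ranges_overlap(struct gfn_range a, struct gfn_range b)
{
	return !((a.base_gfn + a.npages <= b.base_gfn) ||
		 (a.base_gfn >= b.base_gfn + b.npages));
}

/* Convert a page-aligned guest physical byte range into a gfn range. */
static struct gfn_range range_from_gpa(uint64_t gpa, uint64_t size)
{
	struct gfn_range r = { gpa >> PAGE_SHIFT, size >> PAGE_SHIFT };
	return r;
}
```

With these helpers, range_from_gpa(0x0, 0x1000) and range_from_gpa(0xfffbc000, 0x1000) do not overlap, which is why the test's single memslot should never trip the check against the identity page table slot.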

Do you get any output with this hunk?

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a17d78759727..7e1929432232 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -888,6 +888,14 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
 	return old_memslots;
 }
 
+void kvm_dump_slot(struct kvm_memory_slot *slot)
+{
+	printk("kvm_memory_slot %p { .id = %u, .base_gfn = %#llx, .npages = %lu, "
+	       ".userspace_addr = %#lx, .flags = %u, .dirty_bitmap = %p, .arch = ? }\n",
+	       slot, slot->id, slot->base_gfn, slot->npages,
+	       slot->userspace_addr, slot->flags, slot->dirty_bitmap);
+}
+
 /*
  * Allocate some memory and give it an address in the guest physical address
  * space.
@@ -978,12 +986,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
 		/* Check for overlaps */
 		r = -EEXIST;
 		kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
-			if ((slot->id >= KVM_USER_MEM_SLOTS) ||
-			    (slot->id == id))
+			if (slot->id == id)
 				continue;
 			if (!((base_gfn + npages <= slot->base_gfn) ||
-			      (base_gfn >= slot->base_gfn + slot->npages)))
+			      (base_gfn >= slot->base_gfn + slot->npages))) {
+				kvm_dump_slot(&new);
+				kvm_dump_slot(slot);
 				goto out;
+			}
 		}
 	}

> Friendly ping. Just hit it on

And the warning happens at mmap ... I can't reproduce, but does the bug
happen on the second mmap()? (Test line 210 when i = 0.)

The change above makes sense as memslots currently cannot overlap
anywhere. There are three private memslots that can cause this problem:
TSS, IDENTITY_MAP and APIC.

TSS and IDENTITY_MAP can be configured by userspace and must not
conflict by design, so we can safely enforce that.
The APIC memslot provides no such guarantee and should be overlaid on top
of any memory, but assuming that userspace does not configure memslots
there seems bearable.
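
The behavioural difference between the old and the new check can be sketched as a small userspace model. Everything here is illustrative rather than kernel code: the struct, the helper names, the USER_MEM_SLOTS value and the slot ids in the usage note are assumptions; only the two conditions mirror the hunk above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define USER_MEM_SLOTS 509 /* illustrative stand-in for KVM_USER_MEM_SLOTS */

/* Hypothetical, trimmed-down model of a memslot. */
struct slot {
	int id;            /* ids >= USER_MEM_SLOTS model private slots */
	uint64_t base_gfn; /* first guest frame number */
	uint64_t npages;   /* length in pages */
};

static bool overlaps(const struct slot *a, const struct slot *b)
{
	return !((a->base_gfn + a->npages <= b->base_gfn) ||
		 (a->base_gfn >= b->base_gfn + b->npages));
}

/*
 * Returns true if new_slot collides with an existing slot.
 * skip_private = true models the old check, which ignored private slots;
 * skip_private = false models the check after the proposed change.
 */
static bool has_conflict(const struct slot *slots, size_t n,
			 const struct slot *new_slot, bool skip_private)
{
	for (size_t i = 0; i < n; i++) {
		const struct slot *s = &slots[i];

		if (s->id == new_slot->id)
			continue;
		if (skip_private && s->id >= USER_MEM_SLOTS)
			continue;
		if (overlaps(new_slot, s))
			return true;
	}
	return false;
}
```

For example, with a private slot at gfn 0xfee00 (the default APIC base 0xfee00000 with 4 KiB pages), a user slot placed on the same gfn is accepted when skip_private is true and rejected when it is false. A user slot that touches no private range behaves the same either way.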

Still, I'd like to understand why that patch would fix this bug.

Thanks.

> mmotm/86292b33d4b79ee03e2f43ea0381ef85f077c760 (without the above
> change):
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 31060 at arch/x86/kvm/mmu.c:682
> mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
> CPU: 1 PID: 31060 Comm: syz-executor0 Not tainted 4.11.0-rc1+ #328
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:16 [inline]
> dump_stack+0x1a7/0x26a lib/dump_stack.c:52
> panic+0x1f8/0x40f kernel/panic.c:180
> __warn+0x1c4/0x1e0 kernel/panic.c:541
> warn_slowpath_null+0x2c/0x40 kernel/panic.c:584
> mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1323
> mmu_page_zap_pte+0x223/0x350 arch/x86/kvm/mmu.c:2438
> kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2460 [inline]
> kvm_mmu_prepare_zap_page+0x1ce/0x13d0 arch/x86/kvm/mmu.c:2504
> kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:5134 [inline]
> kvm_mmu_invalidate_zap_all_pages+0x4d4/0x6b0 arch/x86/kvm/mmu.c:5175
> kvm_arch_flush_shadow_all+0x15/0x20 arch/x86/kvm/x86.c:8364
> kvm_mmu_notifier_release+0x71/0xb0
> arch/x86/kvm/../../../virt/kvm/kvm_main.c:472
> __mmu_notifier_release+0x1e5/0x6b0 mm/mmu_notifier.c:75
> mmu_notifier_release include/linux/mmu_notifier.h:235 [inline]
> exit_mmap+0x3a3/0x470 mm/mmap.c:2941
> __mmput kernel/fork.c:890 [inline]
> mmput+0x228/0x700 kernel/fork.c:912
> exit_mm kernel/exit.c:558 [inline]
> do_exit+0x9e8/0x1c20 kernel/exit.c:866
> do_group_exit+0x149/0x400 kernel/exit.c:983
> get_signal+0x6d9/0x1840 kernel/signal.c:2318
> do_signal+0x94/0x1f30 arch/x86/kernel/signal.c:808
> exit_to_usermode_loop+0x1e5/0x2d0 arch/x86/entry/common.c:157
> prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
> syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
> entry_SYSCALL_64_fastpath+0xc0/0xc2
> RIP: 0033:0x4458d9
> RSP: 002b:00007ffa472c3b58 EFLAGS: 00000286 ORIG_RAX: 00000000000000ce
> RAX: fffffffffffffff4 RBX: 0000000000708000 RCX: 00000000004458d9
> RDX: 0000000000000000 RSI: 000000002006bff8 RDI: 000000000000a05b
> RBP: 0000000000000fe0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000286 R12: 00000000006df0a0
> R13: 000000000000a05b R14: 000000002006bff8 R15: 0000000000000000

2017-03-23 16:39:44

by Dmitry Vyukov

Subject: Re: kvm: WARNING in mmu_spte_clear_track_bits

On Tue, Mar 14, 2017 at 4:17 PM, Radim Krčmář wrote:
> 2017-03-12 12:20+0100, Dmitry Vyukov:
>> On Tue, Jan 17, 2017 at 5:00 PM, Dmitry Vyukov wrote:
>>> On Tue, Jan 17, 2017 at 4:20 PM, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 13/01/2017 12:15, Dmitry Vyukov wrote:
>>>>>
>>>>> I've commented out the WARNING for now, but I am seeing lots of
>>>>> use-after-free's and rcu stalls involving mmu_spte_clear_track_bits:
>>>>>
>>>>>
>>>>> BUG: KASAN: use-after-free in mmu_spte_clear_track_bits+0x186/0x190
>>>>> arch/x86/kvm/mmu.c:597 at addr ffff880068ae2008
>>>>> Read of size 8 by task syz-executor2/16715
>>>>> page:ffffea00016e6170 count:0 mapcount:0 mapping: (null) index:0x0
>>>>> flags: 0x500000000000000()
>>>>> raw: 0500000000000000 0000000000000000 0000000000000000 00000000ffffffff
>>>>> raw: ffffea00017ec5a0 ffffea0001783d48 ffff88006aec5d98
>>>>> page dumped because: kasan: bad access detected
>>>>> CPU: 2 PID: 16715 Comm: syz-executor2 Not tainted 4.10.0-rc3+ #163
>>>>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>>>>> Call Trace:
>>>>> __dump_stack lib/dump_stack.c:15 [inline]
>>>>> dump_stack+0x292/0x3a2 lib/dump_stack.c:51
>>>>> kasan_report_error mm/kasan/report.c:213 [inline]
>>>>> kasan_report+0x42d/0x460 mm/kasan/report.c:307
>>>>> __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:333
>>>>> mmu_spte_clear_track_bits+0x186/0x190 arch/x86/kvm/mmu.c:597
>>>>> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1182
>>>>> kvm_zap_rmapp+0x119/0x260 arch/x86/kvm/mmu.c:1401
>>>>> kvm_unmap_rmapp+0x1d/0x30 arch/x86/kvm/mmu.c:1412
>>>>> kvm_handle_hva_range+0x54a/0x7d0 arch/x86/kvm/mmu.c:1565
>>>>> kvm_unmap_hva_range+0x2e/0x40 arch/x86/kvm/mmu.c:1591
>>>>> kvm_mmu_notifier_invalidate_range_start+0xae/0x140
>>>>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:360
>>>>> __mmu_notifier_invalidate_range_start+0x1f8/0x300 mm/mmu_notifier.c:199
>>>>> mmu_notifier_invalidate_range_start include/linux/mmu_notifier.h:282 [inline]
>>>>> unmap_vmas+0x14b/0x1b0 mm/memory.c:1368
>>>>> unmap_region+0x2f8/0x560 mm/mmap.c:2460
>>>>> do_munmap+0x7b8/0xfa0 mm/mmap.c:2657
>>>>> mmap_region+0x68f/0x18e0 mm/mmap.c:1612
>>>>> do_mmap+0x6a2/0xd40 mm/mmap.c:1450
>>>>> do_mmap_pgoff include/linux/mm.h:2031 [inline]
>>>>> vm_mmap_pgoff+0x1a9/0x200 mm/util.c:305
>>>>> SYSC_mmap_pgoff mm/mmap.c:1500 [inline]
>>>>> SyS_mmap_pgoff+0x22c/0x5d0 mm/mmap.c:1458
>>>>> SYSC_mmap arch/x86/kernel/sys_x86_64.c:95 [inline]
>>>>> SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:86
>>>>> entry_SYSCALL_64_fastpath+0x1f/0xc2
>>>>> RIP: 0033:0x445329
>>>>> RSP: 002b:00007fb33933cb58 EFLAGS: 00000282 ORIG_RAX: 0000000000000009
>>>>> RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 0000000000445329
>>>>> RDX: 0000000000000003 RSI: 0000000000af1000 RDI: 0000000020000000
>>>>> RBP: 00000000006dfe90 R08: ffffffffffffffff R09: 0000000000000000
>>>>> R10: 0000000000000032 R11: 0000000000000282 R12: 0000000000700000
>>>>> R13: 0000000000000006 R14: ffffffffffffffff R15: 0000000020001000
>>>>> Memory state around the buggy address:
>>>>> ffff880068ae1f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>> ffff880068ae1f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>>>>> ffff880068ae2000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ^
>>>>> ffff880068ae2080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ffff880068ae2100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
>>>>> ==================================================================
>>>>
>>>> This could be related to the gfn_to_rmap issues.
>>>
>>>
>>> Humm... That's possible. Potentially I am not seeing any more of
>>> spte-related crashes after I applied the following patch:
>>>
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -968,8 +968,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>>> /* Check for overlaps */
>>> r = -EEXIST;
>>> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
>>> - if ((slot->id >= KVM_USER_MEM_SLOTS) ||
>>> - (slot->id == id))
>>> + if (slot->id == id)
>>> continue;
>>> if (!((base_gfn + npages <= slot->base_gfn) ||
>>> (base_gfn >= slot->base_gfn + slot->npages)))
>
> I don't understand how this fixes the test: the only memslot that the
> test creates is at memory range 0x0-0x1000, which should not overlap
> with any private memslots.
> There should be just the IDENTITY_PAGETABLE_PRIVATE_MEMSLOT @
> 0xfffbc000ul.
>
> Do you get any output with this hunk?
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a17d78759727..7e1929432232 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -888,6 +888,14 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
> return old_memslots;
> }
>
> +void kvm_dump_slot(struct kvm_memory_slot *slot)
> +{
> + printk("kvm_memory_slot %p { .id = %u, .base_gfn = %#llx, .npages = %lu, "
> + ".userspace_addr = %#lx, .flags = %u, .dirty_bitmap = %p, .arch = ? }\n",
> + slot, slot->id, slot->base_gfn, slot->npages,
> + slot->userspace_addr, slot->flags, slot->dirty_bitmap);
> +}
> +
> /*
> * Allocate some memory and give it an address in the guest physical address
> * space.
> @@ -978,12 +986,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
> /* Check for overlaps */
> r = -EEXIST;
> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
> - if ((slot->id >= KVM_USER_MEM_SLOTS) ||
> - (slot->id == id))
> + if (slot->id == id)
> continue;
> if (!((base_gfn + npages <= slot->base_gfn) ||
> - (base_gfn >= slot->base_gfn + slot->npages)))
> + (base_gfn >= slot->base_gfn + slot->npages))) {
> + kvm_dump_slot(&new);
> + kvm_dump_slot(slot);
> goto out;
> + }
> }
> }
>
>
>> Friendly ping. Just hit it on
>
> And the warning happens at mmap ... I can't reproduce, but does the bug
> happen on the second mmap()? (Test line 210 when i = 0.)
>
> The change above makes sense as memslots currently cannot overlap
> anywhere. There are three private memslots that can cause this problem:
> TSS, IDENTITY_MAP and APIC.
>
> TSS and IDENTITY_MAP can be configured by userspace and must not
> conflict by design, so we can safely enforce that.
> APIC memslot doesn't provide such guarantees and should be overlaid over
> any memory, but assuming that userspace doesn't configure memslots there
> seems bearable.
>
> Still, I'd like to understand why that patch would fix this bug.
>
> Thanks.

Hmm... I cannot reproduce it anymore. Maybe it was fixed by something else.
However, this report looks very similar and is still not fixed:
https://groups.google.com/d/msg/syzkaller/IqkesiRS-t0/aLcJuMXqBgAJ
Maybe it's another incarnation of the same problem.

>> mmotm/86292b33d4b79ee03e2f43ea0381ef85f077c760 (without the above
>> change):
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 31060 at arch/x86/kvm/mmu.c:682
>> mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
>> CPU: 1 PID: 31060 Comm: syz-executor0 Not tainted 4.11.0-rc1+ #328
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:16 [inline]
>> dump_stack+0x1a7/0x26a lib/dump_stack.c:52
>> panic+0x1f8/0x40f kernel/panic.c:180
>> __warn+0x1c4/0x1e0 kernel/panic.c:541
>> warn_slowpath_null+0x2c/0x40 kernel/panic.c:584
>> mmu_spte_clear_track_bits+0x3a1/0x420 arch/x86/kvm/mmu.c:682
>> drop_spte+0x24/0x280 arch/x86/kvm/mmu.c:1323
>> mmu_page_zap_pte+0x223/0x350 arch/x86/kvm/mmu.c:2438
>> kvm_mmu_page_unlink_children arch/x86/kvm/mmu.c:2460 [inline]
>> kvm_mmu_prepare_zap_page+0x1ce/0x13d0 arch/x86/kvm/mmu.c:2504
>> kvm_zap_obsolete_pages arch/x86/kvm/mmu.c:5134 [inline]
>> kvm_mmu_invalidate_zap_all_pages+0x4d4/0x6b0 arch/x86/kvm/mmu.c:5175
>> kvm_arch_flush_shadow_all+0x15/0x20 arch/x86/kvm/x86.c:8364
>> kvm_mmu_notifier_release+0x71/0xb0
>> arch/x86/kvm/../../../virt/kvm/kvm_main.c:472
>> __mmu_notifier_release+0x1e5/0x6b0 mm/mmu_notifier.c:75
>> mmu_notifier_release include/linux/mmu_notifier.h:235 [inline]
>> exit_mmap+0x3a3/0x470 mm/mmap.c:2941
>> __mmput kernel/fork.c:890 [inline]
>> mmput+0x228/0x700 kernel/fork.c:912
>> exit_mm kernel/exit.c:558 [inline]
>> do_exit+0x9e8/0x1c20 kernel/exit.c:866
>> do_group_exit+0x149/0x400 kernel/exit.c:983
>> get_signal+0x6d9/0x1840 kernel/signal.c:2318
>> do_signal+0x94/0x1f30 arch/x86/kernel/signal.c:808
>> exit_to_usermode_loop+0x1e5/0x2d0 arch/x86/entry/common.c:157
>> prepare_exit_to_usermode arch/x86/entry/common.c:191 [inline]
>> syscall_return_slowpath+0x3bd/0x460 arch/x86/entry/common.c:260
>> entry_SYSCALL_64_fastpath+0xc0/0xc2
>> RIP: 0033:0x4458d9
>> RSP: 002b:00007ffa472c3b58 EFLAGS: 00000286 ORIG_RAX: 00000000000000ce
>> RAX: fffffffffffffff4 RBX: 0000000000708000 RCX: 00000000004458d9
>> RDX: 0000000000000000 RSI: 000000002006bff8 RDI: 000000000000a05b
>> RBP: 0000000000000fe0 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000286 R12: 00000000006df0a0
>> R13: 000000000000a05b R14: 000000002006bff8 R15: 0000000000000000