2016-12-11 06:46:36

by Dmitry Vyukov

[permalink] [raw]
Subject: kvm: use-after-free in process_srcu

Hello,

I am getting the following use-after-free reports while running
syzkaller fuzzer.
On commit 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7).
Unfortunately it is not reproducible, but all reports look sane and
very similar, so I would assume that it is some hard to trigger race.
In all cases the use-after-free offset within struct kvm is 344 bytes.
This points to srcu field, which starts at 208 with size 360 (I have
some debug configs enabled).


BUG: KASAN: use-after-free in process_srcu+0x27a/0x280 at addr ffff88005e29a418
Read of size 8 by task kworker/3:1/1496
CPU: 3 PID: 1496 Comm: kworker/3:1 Not tainted 4.9.0-rc8+ #78
Hardware name: Google Google/Google, BIOS Google 01/01/2011
Workqueue: events_power_efficient process_srcu
ffff88006b1df3a0 ffffffff8348fb59 ffffffff00000003 1ffff1000d63be07
ffffed000d63bdff 0000000041b58ab3 ffffffff8957cf20 ffffffff8348f86b
ffff8800668dc440 ffffffff8816c000 1ffff1000d63be18 dffffc0000000000
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff8348fb59>] dump_stack+0x2ee/0x3f5 lib/dump_stack.c:51
[<ffffffff819f1c21>] kasan_object_err+0x21/0x70 mm/kasan/report.c:163
[< inline >] print_address_description mm/kasan/report.c:201
[< inline >] kasan_report_error mm/kasan/report.c:285
[<ffffffff819f1ec1>] kasan_report+0x1a1/0x440 mm/kasan/report.c:305
[<ffffffff819f21d9>] __asan_report_load8_noabort+0x19/0x20
mm/kasan/report.c:331
[< inline >] rcu_batch_empty kernel/rcu/srcu.c:64
[< inline >] rcu_batch_dequeue kernel/rcu/srcu.c:75
[< inline >] srcu_invoke_callbacks kernel/rcu/srcu.c:624
[<ffffffff815ba91a>] process_srcu+0x27a/0x280 kernel/rcu/srcu.c:672
[<ffffffff814731c0>] process_one_work+0xb40/0x1ba0 kernel/workqueue.c:2096
[<ffffffff81474434>] worker_thread+0x214/0x18a0 kernel/workqueue.c:2230
[<ffffffff8148a058>] kthread+0x328/0x3e0 kernel/kthread.c:209
[<ffffffff8816c61a>] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433
Object at ffff88005e29a2c0, in cache kmalloc-16384 size: 16384
Allocated:
PID = 13066
[ 376.024345] [<ffffffff81270fdb>] save_stack_trace+0x1b/0x20
arch/x86/kernel/stacktrace.c:57
[ 376.024345] [<ffffffff819f0ea3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:495
[ 376.024345] [< inline >] set_track mm/kasan/kasan.c:507
[ 376.024345] [<ffffffff819f116d>] kasan_kmalloc+0xad/0xe0
mm/kasan/kasan.c:598
[ 376.024345] [<ffffffff819ed4ec>]
kmem_cache_alloc_trace+0x12c/0x710 mm/slab.c:3635
[ 376.024345] [< inline >] kvm_arch_alloc_vm include/linux/slab.h:490
[ 376.024345] [< inline >] kvm_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:610
[ 376.024345] [< inline >] kvm_dev_ioctl_create_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:3164
[ 376.024345] [<ffffffff810789d5>] kvm_dev_ioctl+0x1b5/0x1100
arch/x86/kvm/../../../virt/kvm/kvm_main.c:3208
[ 376.024345] [< inline >] vfs_ioctl fs/ioctl.c:43
[ 376.024345] [<ffffffff81abdf24>] do_vfs_ioctl+0x1c4/0x1630 fs/ioctl.c:679
[ 376.024345] [< inline >] SYSC_ioctl fs/ioctl.c:694
[ 376.024345] [<ffffffff81abf424>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:685
[ 376.024345] [<ffffffff8816c385>] entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 13064
[ 376.024345] [<ffffffff81270fdb>] save_stack_trace+0x1b/0x20
arch/x86/kernel/stacktrace.c:57
[ 376.024345] [<ffffffff819f0ea3>] save_stack+0x43/0xd0 mm/kasan/kasan.c:495
[ 376.024345] [< inline >] set_track mm/kasan/kasan.c:507
[ 376.024345] [<ffffffff819f17c2>] kasan_slab_free+0x72/0xc0
mm/kasan/kasan.c:571
[ 376.024345] [< inline >] __cache_free mm/slab.c:3511
[ 376.024345] [<ffffffff819ef3e8>] kfree+0xc8/0x2a0 mm/slab.c:3828
[ 376.024345] [< inline >] kvm_arch_free_vm
include/linux/kvm_host.h:774
[ 376.024345] [< inline >] kvm_destroy_vm
arch/x86/kvm/../../../virt/kvm/kvm_main.c:739
[ 376.024345] [<ffffffff8106fcd9>] kvm_put_kvm+0x489/0x5f0
arch/x86/kvm/../../../virt/kvm/kvm_main.c:754
[ 376.024345] [<ffffffff8106ff97>] kvm_vm_release+0x47/0x60
arch/x86/kvm/../../../virt/kvm/kvm_main.c:765
[ 376.024345] [<ffffffff81a77f7e>] __fput+0x34e/0x910 fs/file_table.c:208
[ 376.024345] [<ffffffff81a785ca>] ____fput+0x1a/0x20 fs/file_table.c:244
[ 376.024345] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
kernel/task_work.c:116
[ 376.024345] [< inline >] exit_task_work include/linux/task_work.h:21
[ 376.024345] [<ffffffff814129e2>] do_exit+0x1842/0x2650 kernel/exit.c:828
[ 376.024345] [<ffffffff814139ae>] do_group_exit+0x14e/0x420 kernel/exit.c:932
[ 376.024345] [<ffffffff81442b43>] get_signal+0x663/0x1880
kernel/signal.c:2307
[ 376.024345] [<ffffffff81239b45>] do_signal+0xc5/0x2190
arch/x86/kernel/signal.c:807
[ 376.024345] [<ffffffff8100666a>] exit_to_usermode_loop+0x1ea/0x2d0
arch/x86/entry/common.c:156
[ 376.024345] [< inline >] prepare_exit_to_usermode
arch/x86/entry/common.c:190
[ 376.024345] [<ffffffff81009693>]
syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
[ 376.024345] [<ffffffff8816c426>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Memory state around the buggy address:
ffff88005e29a300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88005e29a380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88005e29a400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88005e29a480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88005e29a500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


2016-12-11 08:40:39

by Vegard Nossum

[permalink] [raw]
Subject: Re: kvm: use-after-free in process_srcu

On 11 December 2016 at 07:46, Dmitry Vyukov <[email protected]> wrote:
> Hello,
>
> I am getting the following use-after-free reports while running
> syzkaller fuzzer.
> On commit 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7).
> Unfortunately it is not reproducible, but all reports look sane and
> very similar, so I would assume that it is some hard to trigger race.
> In all cases the use-after-free offset within struct kvm is 344 bytes.
> This points to srcu field, which starts at 208 with size 360 (I have
> some debug configs enabled).
[...]
> [ 376.024345] [<ffffffff81a77f7e>] __fput+0x34e/0x910 fs/file_table.c:208
> [ 376.024345] [<ffffffff81a785ca>] ____fput+0x1a/0x20 fs/file_table.c:244

I've been hitting what I think is a struct file refcounting bug which
causes similar symptoms as you have here (the struct file is freed
while somebody still has an active reference to it).

> [ 376.024345] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
> kernel/task_work.c:116
> [ 376.024345] [< inline >] exit_task_work include/linux/task_work.h:21
> [ 376.024345] [<ffffffff814129e2>] do_exit+0x1842/0x2650 kernel/exit.c:828
> [ 376.024345] [<ffffffff814139ae>] do_group_exit+0x14e/0x420 kernel/exit.c:932
> [ 376.024345] [<ffffffff81442b43>] get_signal+0x663/0x1880
> kernel/signal.c:2307
> [ 376.024345] [<ffffffff81239b45>] do_signal+0xc5/0x2190
> arch/x86/kernel/signal.c:807

Was this or any other process by any chance killed by the OOM killer?
That seems to be a pattern in the crashes I've seen. If not, do you
know what killed this process?


Vegard

2016-12-11 08:50:15

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: kvm: use-after-free in process_srcu

On Sun, Dec 11, 2016 at 9:40 AM, Vegard Nossum <[email protected]> wrote:
> On 11 December 2016 at 07:46, Dmitry Vyukov <[email protected]> wrote:
>> Hello,
>>
>> I am getting the following use-after-free reports while running
>> syzkaller fuzzer.
>> On commit 318c8932ddec5c1c26a4af0f3c053784841c598e (Dec 7).
>> Unfortunately it is not reproducible, but all reports look sane and
>> very similar, so I would assume that it is some hard to trigger race.
>> In all cases the use-after-free offset within struct kvm is 344 bytes.
>> This points to srcu field, which starts at 208 with size 360 (I have
>> some debug configs enabled).
> [...]
>> [ 376.024345] [<ffffffff81a77f7e>] __fput+0x34e/0x910 fs/file_table.c:208
>> [ 376.024345] [<ffffffff81a785ca>] ____fput+0x1a/0x20 fs/file_table.c:244
>
> I've been hitting what I think is a struct file refcounting bug which
> causes similar symptoms as you have here (the struct file is freed
> while somebody still has an active reference to it).
>
>> [ 376.024345] [<ffffffff81483c20>] task_work_run+0x1a0/0x280
>> kernel/task_work.c:116
>> [ 376.024345] [< inline >] exit_task_work include/linux/task_work.h:21
>> [ 376.024345] [<ffffffff814129e2>] do_exit+0x1842/0x2650 kernel/exit.c:828
>> [ 376.024345] [<ffffffff814139ae>] do_group_exit+0x14e/0x420 kernel/exit.c:932
>> [ 376.024345] [<ffffffff81442b43>] get_signal+0x663/0x1880
>> kernel/signal.c:2307
>> [ 376.024345] [<ffffffff81239b45>] do_signal+0xc5/0x2190
>> arch/x86/kernel/signal.c:807
>
> Was this or any other process by any chance killed by the OOM killer?
> That seems to be a pattern in the crashes I've seen. If not, do you
> know what killed this process?


Difficult to say as I can't reproduce them.
I've looked at the logs I have and there are no OOM kills, only some
kvm-related messages:

[ 372.188708] kvm [12528]: vcpu0, guest rIP: 0xfff0
kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x2, nop
[ 372.321334] kvm [12528]: vcpu0, guest rIP: 0xfff0 unhandled wrmsr:
0x0 data 0x0
[ 372.426831] kvm [12593]: vcpu512, guest rIP: 0xfff0 unhandled
wrmsr: 0x5 data 0x200
[ 372.646417] irq bypass consumer (token ffff880052f74780)
registration fails: -16
[ 373.001273] pit: kvm: requested 1676 ns i8254 timer period limited
to 500000 ns
[ 375.541449] kvm [13011]: vcpu0, guest rIP: 0x110000 unhandled
wrmsr: 0x0 data 0x2
[ 376.005387] ==================================================================
[ 376.024345] BUG: KASAN: use-after-free in process_srcu+0x27a/0x280
at addr ffff88005e29a418

[ 720.214985] kvm: vcpu 0: requested 244148 ns lapic timer period
limited to 500000 ns
[ 720.271334] kvm: vcpu 0: requested 244148 ns lapic timer period
limited to 500000 ns
[ 720.567985] kvm_vm_ioctl_assign_device: host device not found
[ 721.094589] kvm [22114]: vcpu0, guest rIP: 0x2 unhandled wrmsr: 0x6 data 0x8
[ 723.829467] ==================================================================
[ 723.829467] BUG: KASAN: use-after-free in process_srcu+0x27a/0x280
at addr ffff88005a4d10d8

Logs capture ~3-4 seconds before the crash.
However, syzkaller test processes tend to consume lots of memory from
time to time and cause low memory conditions.

Kills are usually caused by my test driver that kills test processes
after short time.

However, I do see other assorted bugs caused by kvm that are induced
by OOM kills:
https://groups.google.com/d/msg/syzkaller/ytVPh93HLnI/KhZdengZBwAJ