LinuxLists.cc - KASAN: use-after-free Read in percpu_ref_switch_to_atomic

2020-02-07 08:16:20

Subject: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

Hello,

syzbot found the following crash on:

HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: use-after-free in percpu_ref_call_confirm_rcu lib/percpu-refcount.c:126 [inline]
BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0x3f7/0x400 lib/percpu-refcount.c:165
Read of size 1 at addr ffff8880a8d91830 by task swapper/0/0

CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1fb/0x318 lib/dump_stack.c:118
print_address_description+0x74/0x5c0 mm/kasan/report.c:374
__kasan_report+0x149/0x1c0 mm/kasan/report.c:506
kasan_report+0x26/0x50 mm/kasan/common.c:641
__asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132
percpu_ref_call_confirm_rcu lib/percpu-refcount.c:126 [inline]
percpu_ref_switch_to_atomic_rcu+0x3f7/0x400 lib/percpu-refcount.c:165
rcu_do_batch kernel/rcu/tree.c:2186 [inline]
rcu_core+0x81b/0x10c0 kernel/rcu/tree.c:2410
rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2419
__do_softirq+0x283/0x7bd kernel/softirq.c:292
invoke_softirq kernel/softirq.c:373 [inline]
irq_exit+0x227/0x230 kernel/softirq.c:413
exiting_irq arch/x86/include/asm/apic.h:536 [inline]
smp_apic_timer_interrupt+0x113/0x280 arch/x86/kernel/apic/apic.c:1137
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
</IRQ>
RIP: 0010:native_safe_halt+0x12/0x20 arch/x86/include/asm/irqflags.h:61
Code: 89 d9 80 e1 07 80 c1 03 38 c1 7c ba 48 89 df e8 e4 5f 9d f9 eb b0 cc cc 55 48 89 e5 e9 07 00 00 00 0f 00 2d 62 17 4c 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
RSP: 0018:ffffffff89207db8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
RAX: 1ffffffff1255a25 RBX: ffffffff89275b00 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff812c06aa RDI: ffffffff89276344
RBP: ffffffff89207db8 R08: ffffffff89276358 R09: fffffbfff124eb61
R10: fffffbfff124eb61 R11: 0000000000000000 R12: 1ffffffff124eb60
R13: dffffc0000000000 R14: dffffc0000000000 R15: 1ffffffff1255a23
arch_safe_halt arch/x86/include/asm/paravirt.h:144 [inline]
default_idle+0x50/0x70 arch/x86/kernel/process.c:695
arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:686
default_idle_call+0x59/0xa0 kernel/sched/idle.c:94
cpuidle_idle_call kernel/sched/idle.c:154 [inline]
do_idle+0x1ec/0x630 kernel/sched/idle.c:269
cpu_startup_entry+0x25/0x30 kernel/sched/idle.c:361
rest_init+0x29d/0x2b0 init/main.c:450
arch_call_rest_init+0xe/0x10
start_kernel+0x676/0x777 init/main.c:784
x86_64_start_reservations+0x18/0x2e arch/x86/kernel/head64.c:490
x86_64_start_kernel+0x7a/0x7d arch/x86/kernel/head64.c:471
secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242

Allocated by task 25166:
save_stack mm/kasan/common.c:72 [inline]
set_track mm/kasan/common.c:80 [inline]
__kasan_kmalloc+0x118/0x1c0 mm/kasan/common.c:515
kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
kmem_cache_alloc_trace+0x221/0x2f0 mm/slab.c:3551
kmalloc include/linux/slab.h:555 [inline]
kzalloc include/linux/slab.h:669 [inline]
io_sqe_files_register fs/io_uring.c:5528 [inline]
__io_uring_register fs/io_uring.c:6875 [inline]
__do_sys_io_uring_register fs/io_uring.c:6955 [inline]
__se_sys_io_uring_register+0x1df4/0x3260 fs/io_uring.c:6937
__x64_sys_io_uring_register+0x9b/0xb0 fs/io_uring.c:6937
do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 25160:
save_stack mm/kasan/common.c:72 [inline]
set_track mm/kasan/common.c:80 [inline]
kasan_set_free_info mm/kasan/common.c:337 [inline]
__kasan_slab_free+0x12e/0x1e0 mm/kasan/common.c:476
kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
__cache_free mm/slab.c:3426 [inline]
kfree+0x10d/0x220 mm/slab.c:3757
io_sqe_files_unregister+0x238/0x2b0 fs/io_uring.c:5250
io_ring_ctx_free fs/io_uring.c:6229 [inline]
io_ring_ctx_wait_and_kill+0x343d/0x3b00 fs/io_uring.c:6310
io_uring_release+0x5d/0x70 fs/io_uring.c:6318
__fput+0x2e4/0x740 fs/file_table.c:280
____fput+0x15/0x20 fs/file_table.c:313
task_work_run+0x176/0x1b0 kernel/task_work.c:113
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a8d91800
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 48 bytes inside of
256-byte region [ffff8880a8d91800, ffff8880a8d91900)
The buggy address belongs to the page:
page:ffffea0002a36440 refcount:1 mapcount:0 mapping:ffff8880aa4008c0 index:0x0
flags: 0xfffe0000000200(slab)
raw: 00fffe0000000200 ffffea000265d2c8 ffffea00022f7948 ffff8880aa4008c0
raw: 0000000000000000 ffff8880a8d91000 0000000100000008 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8880a8d91700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8880a8d91780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8880a8d91800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880a8d91880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880a8d91900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

2020-03-04 08:00:26

by Dmitry Vyukov

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On Fri, Feb 7, 2020 at 9:14 AM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]

+io_uring maintainers

Here is a repro:
https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt

> ==================================================================
> BUG: KASAN: use-after-free in percpu_ref_call_confirm_rcu lib/percpu-refcount.c:126 [inline]
> BUG: KASAN: use-after-free in percpu_ref_switch_to_atomic_rcu+0x3f7/0x400 lib/percpu-refcount.c:165
> Read of size 1 at addr ffff8880a8d91830 by task swapper/0/0
>
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.5.0-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1fb/0x318 lib/dump_stack.c:118
> print_address_description+0x74/0x5c0 mm/kasan/report.c:374
> __kasan_report+0x149/0x1c0 mm/kasan/report.c:506
> kasan_report+0x26/0x50 mm/kasan/common.c:641
> __asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132
> percpu_ref_call_confirm_rcu lib/percpu-refcount.c:126 [inline]
> percpu_ref_switch_to_atomic_rcu+0x3f7/0x400 lib/percpu-refcount.c:165
> rcu_do_batch kernel/rcu/tree.c:2186 [inline]
> rcu_core+0x81b/0x10c0 kernel/rcu/tree.c:2410
> rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2419
> __do_softirq+0x283/0x7bd kernel/softirq.c:292
> invoke_softirq kernel/softirq.c:373 [inline]
> irq_exit+0x227/0x230 kernel/softirq.c:413
> exiting_irq arch/x86/include/asm/apic.h:536 [inline]
> smp_apic_timer_interrupt+0x113/0x280 arch/x86/kernel/apic/apic.c:1137
> apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
> </IRQ>
> RIP: 0010:native_safe_halt+0x12/0x20 arch/x86/include/asm/irqflags.h:61
> Code: 89 d9 80 e1 07 80 c1 03 38 c1 7c ba 48 89 df e8 e4 5f 9d f9 eb b0 cc cc 55 48 89 e5 e9 07 00 00 00 0f 00 2d 62 17 4c 00 fb f4 <5d> c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 e9 07 00 00
> RSP: 0018:ffffffff89207db8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
> RAX: 1ffffffff1255a25 RBX: ffffffff89275b00 RCX: dffffc0000000000
> RDX: 0000000000000000 RSI: ffffffff812c06aa RDI: ffffffff89276344
> RBP: ffffffff89207db8 R08: ffffffff89276358 R09: fffffbfff124eb61
> R10: fffffbfff124eb61 R11: 0000000000000000 R12: 1ffffffff124eb60
> R13: dffffc0000000000 R14: dffffc0000000000 R15: 1ffffffff1255a23
> arch_safe_halt arch/x86/include/asm/paravirt.h:144 [inline]
> default_idle+0x50/0x70 arch/x86/kernel/process.c:695
> arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:686
> default_idle_call+0x59/0xa0 kernel/sched/idle.c:94
> cpuidle_idle_call kernel/sched/idle.c:154 [inline]
> do_idle+0x1ec/0x630 kernel/sched/idle.c:269
> cpu_startup_entry+0x25/0x30 kernel/sched/idle.c:361
> rest_init+0x29d/0x2b0 init/main.c:450
> arch_call_rest_init+0xe/0x10
> start_kernel+0x676/0x777 init/main.c:784
> x86_64_start_reservations+0x18/0x2e arch/x86/kernel/head64.c:490
> x86_64_start_kernel+0x7a/0x7d arch/x86/kernel/head64.c:471
> secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:242
>
> Allocated by task 25166:
> save_stack mm/kasan/common.c:72 [inline]
> set_track mm/kasan/common.c:80 [inline]
> __kasan_kmalloc+0x118/0x1c0 mm/kasan/common.c:515
> kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
> kmem_cache_alloc_trace+0x221/0x2f0 mm/slab.c:3551
> kmalloc include/linux/slab.h:555 [inline]
> kzalloc include/linux/slab.h:669 [inline]
> io_sqe_files_register fs/io_uring.c:5528 [inline]
> __io_uring_register fs/io_uring.c:6875 [inline]
> __do_sys_io_uring_register fs/io_uring.c:6955 [inline]
> __se_sys_io_uring_register+0x1df4/0x3260 fs/io_uring.c:6937
> __x64_sys_io_uring_register+0x9b/0xb0 fs/io_uring.c:6937
> do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 25160:
> save_stack mm/kasan/common.c:72 [inline]
> set_track mm/kasan/common.c:80 [inline]
> kasan_set_free_info mm/kasan/common.c:337 [inline]
> __kasan_slab_free+0x12e/0x1e0 mm/kasan/common.c:476
> kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
> __cache_free mm/slab.c:3426 [inline]
> kfree+0x10d/0x220 mm/slab.c:3757
> io_sqe_files_unregister+0x238/0x2b0 fs/io_uring.c:5250
> io_ring_ctx_free fs/io_uring.c:6229 [inline]
> io_ring_ctx_wait_and_kill+0x343d/0x3b00 fs/io_uring.c:6310
> io_uring_release+0x5d/0x70 fs/io_uring.c:6318
> __fput+0x2e4/0x740 fs/file_table.c:280
> ____fput+0x15/0x20 fs/file_table.c:313
> task_work_run+0x176/0x1b0 kernel/task_work.c:113
> tracehook_notify_resume include/linux/tracehook.h:188 [inline]
> exit_to_usermode_loop arch/x86/entry/common.c:164 [inline]
> prepare_exit_to_usermode+0x480/0x5b0 arch/x86/entry/common.c:195
> syscall_return_slowpath+0x113/0x4a0 arch/x86/entry/common.c:278
> do_syscall_64+0x11f/0x1c0 arch/x86/entry/common.c:304
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff8880a8d91800
> which belongs to the cache kmalloc-256 of size 256
> The buggy address is located 48 bytes inside of
> 256-byte region [ffff8880a8d91800, ffff8880a8d91900)
> The buggy address belongs to the page:
> page:ffffea0002a36440 refcount:1 mapcount:0 mapping:ffff8880aa4008c0 index:0x0
> flags: 0xfffe0000000200(slab)
> raw: 00fffe0000000200 ffffea000265d2c8 ffffea00022f7948 ffff8880aa4008c0
> raw: 0000000000000000 ffff8880a8d91000 0000000100000008 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff8880a8d91700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8880a8d91780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff8880a8d91800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff8880a8d91880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8880a8d91900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/00000000000067c6df059df7f9f5%40google.com.

2020-03-04 14:40:34

by Jens Axboe

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
> On Fri, Feb 7, 2020 at 9:14 AM syzbot
> <[email protected]> wrote:
>>
>> Hello,
>>
>> syzbot found the following crash on:
>>
>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
>> git tree: upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>>
>> Unfortunately, I don't have any reproducer for this crash yet.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: [email protected]
>
> +io_uring maintainers
>
> Here is a repro:
> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt

I've queued up a fix for this:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3

--
Jens Axboe

2020-03-06 14:38:45

by Dan Carpenter

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

There a bunch of similar bugs. It's seems a common anti-pattern.

block/blk-cgroup.c:85 blkg_free() warn: freeing 'blkg' which has percpu_ref_exit()
block/blk-core.c:558 blk_alloc_queue_node() warn: freeing 'q' which has percpu_ref_exit()
drivers/md/md.c:5528 md_free() warn: freeing 'mddev' which has percpu_ref_exit()
drivers/target/target_core_transport.c:583 transport_free_session() warn: freeing 'se_sess' which has percpu_ref_exit()
fs/aio.c:592 free_ioctx() warn: freeing 'ctx' which has percpu_ref_exit()
fs/aio.c:806 ioctx_alloc() warn: freeing 'ctx' which has percpu_ref_exit()
fs/io_uring.c:6115 io_sqe_files_unregister() warn: freeing 'data' which has percpu_ref_exit()
fs/io_uring.c:6431 io_sqe_files_register() warn: freeing 'ctx->file_data' which has percpu_ref_exit()
fs/io_uring.c:7134 io_ring_ctx_free() warn: freeing 'ctx' which has percpu_ref_exit()
kernel/cgroup/cgroup.c:4948 css_free_rwork_fn() warn: freeing 'css' which has percpu_ref_exit()
mm/backing-dev.c:615 cgwb_create() warn: freeing 'wb' which has percpu_ref_exit()

regards,
dan carpenter

2020-03-06 14:59:22

by Jens Axboe

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On 3/6/20 7:35 AM, Dan Carpenter wrote:
>
> There a bunch of similar bugs. It's seems a common anti-pattern.
>
> block/blk-cgroup.c:85 blkg_free() warn: freeing 'blkg' which has percpu_ref_exit()
> block/blk-core.c:558 blk_alloc_queue_node() warn: freeing 'q' which has percpu_ref_exit()
> drivers/md/md.c:5528 md_free() warn: freeing 'mddev' which has percpu_ref_exit()
> drivers/target/target_core_transport.c:583 transport_free_session() warn: freeing 'se_sess' which has percpu_ref_exit()
> fs/aio.c:592 free_ioctx() warn: freeing 'ctx' which has percpu_ref_exit()
> fs/aio.c:806 ioctx_alloc() warn: freeing 'ctx' which has percpu_ref_exit()
> fs/io_uring.c:6115 io_sqe_files_unregister() warn: freeing 'data' which has percpu_ref_exit()
> fs/io_uring.c:6431 io_sqe_files_register() warn: freeing 'ctx->file_data' which has percpu_ref_exit()
> fs/io_uring.c:7134 io_ring_ctx_free() warn: freeing 'ctx' which has percpu_ref_exit()
> kernel/cgroup/cgroup.c:4948 css_free_rwork_fn() warn: freeing 'css' which has percpu_ref_exit()
> mm/backing-dev.c:615 cgwb_create() warn: freeing 'wb' which has percpu_ref_exit()

The file table io_uring issue is using the ref in a funky way, switching
in and out of atomic if we need to quiesce it. That's different from
other use cases, that just use it as a "normal" reference. Hence for the
funky use case, you can potentially have a switch in progress when you
exit the ref. You really want to wait for that, the easiest solution is
to punt the exit + free to an RCU callback, if there's nothing else you
need to handle once the switch is done.

So I would not be so quick to assume that similar patterns (exit + free)
have similar issues.

--
Jens Axboe

2020-03-06 15:00:05

by Jann Horn

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

+paulmck

On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
> > On Fri, Feb 7, 2020 at 9:14 AM syzbot
> > <[email protected]> wrote:
> >>
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
> >> git tree: upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
> >> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
> >> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> >>
> >> Unfortunately, I don't have any reproducer for this crash yet.
> >>
> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> Reported-by: [email protected]
> >
> > +io_uring maintainers
> >
> > Here is a repro:
> > https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
>
> I've queued up a fix for this:
>
> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3

I believe that this fix relies on call_rcu() having FIFO ordering; but
<https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
says:

| call_rcu() normally acts only on CPU-local state[...] It simply
enqueues the rcu_head structure on a per-CPU list,

Is this fix really correct?

2020-03-06 15:35:00

by Jens Axboe

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On 3/6/20 7:57 AM, Jann Horn wrote:
> +paulmck
>
> On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
>> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
>>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
>>> <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> syzbot found the following crash on:
>>>>
>>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
>>>> git tree: upstream
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
>>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>>>>
>>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>>
>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>> Reported-by: [email protected]
>>>
>>> +io_uring maintainers
>>>
>>> Here is a repro:
>>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
>>
>> I've queued up a fix for this:
>>
>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
>
> I believe that this fix relies on call_rcu() having FIFO ordering; but
> <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
> says:
>
> | call_rcu() normally acts only on CPU-local state[...] It simply
> enqueues the rcu_head structure on a per-CPU list,
>
> Is this fix really correct?

That's a good point, there's a potentially stronger guarantee we need
here that isn't "nobody is inside an RCU critical section", but rather
that we're depending on a previous call_rcu() to have happened. Hence I
think you are right - it'll shrink the window drastically, since the
previous callback is already queued up, but it's not a full close.

Hmm...

--
Jens Axboe

2020-03-06 15:37:29

by Jann Horn

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On Fri, Mar 6, 2020 at 4:34 PM Jens Axboe <[email protected]> wrote:
> On 3/6/20 7:57 AM, Jann Horn wrote:
> > +paulmck
> >
> > On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
> >> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
> >>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
> >>> <[email protected]> wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> syzbot found the following crash on:
> >>>>
> >>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
> >>>> git tree: upstream
> >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
> >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
> >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
> >>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> >>>>
> >>>> Unfortunately, I don't have any reproducer for this crash yet.
> >>>>
> >>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >>>> Reported-by: [email protected]
> >>>
> >>> +io_uring maintainers
> >>>
> >>> Here is a repro:
> >>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
> >>
> >> I've queued up a fix for this:
> >>
> >> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
> >
> > I believe that this fix relies on call_rcu() having FIFO ordering; but
> > <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
> > says:
> >
> > | call_rcu() normally acts only on CPU-local state[...] It simply
> > enqueues the rcu_head structure on a per-CPU list,
> >
> > Is this fix really correct?
>
> That's a good point, there's a potentially stronger guarantee we need
> here that isn't "nobody is inside an RCU critical section", but rather
> that we're depending on a previous call_rcu() to have happened. Hence I
> think you are right - it'll shrink the window drastically, since the
> previous callback is already queued up, but it's not a full close.
>
> Hmm...

You could potentially hack up the semantics you want by doing a
call_rcu() whose callback does another call_rcu(), or something like
that - but I'd like to hear paulmck's opinion on this first.

2020-03-06 16:45:37

by Paul E. McKenney

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On Fri, Mar 06, 2020 at 04:36:20PM +0100, Jann Horn wrote:
> On Fri, Mar 6, 2020 at 4:34 PM Jens Axboe <[email protected]> wrote:
> > On 3/6/20 7:57 AM, Jann Horn wrote:
> > > +paulmck
> > >
> > > On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
> > >> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
> > >>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
> > >>> <[email protected]> wrote:
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> syzbot found the following crash on:
> > >>>>
> > >>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
> > >>>> git tree: upstream
> > >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
> > >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
> > >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
> > >>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> > >>>>
> > >>>> Unfortunately, I don't have any reproducer for this crash yet.
> > >>>>
> > >>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > >>>> Reported-by: [email protected]
> > >>>
> > >>> +io_uring maintainers
> > >>>
> > >>> Here is a repro:
> > >>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
> > >>
> > >> I've queued up a fix for this:
> > >>
> > >> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
> > >
> > > I believe that this fix relies on call_rcu() having FIFO ordering; but
> > > <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
> > > says:
> > >
> > > | call_rcu() normally acts only on CPU-local state[...] It simply
> > > enqueues the rcu_head structure on a per-CPU list,

Indeed. For but one example, if there was a CPU-to-CPU migration between
the two call_rcu() invocations, it would not be at all surprising for
the two callbacks to execute out of order.

> > > Is this fix really correct?
> >
> > That's a good point, there's a potentially stronger guarantee we need
> > here that isn't "nobody is inside an RCU critical section", but rather
> > that we're depending on a previous call_rcu() to have happened. Hence I
> > think you are right - it'll shrink the window drastically, since the
> > previous callback is already queued up, but it's not a full close.
> >
> > Hmm...
>
> You could potentially hack up the semantics you want by doing a
> call_rcu() whose callback does another call_rcu(), or something like
> that - but I'd like to hear paulmck's opinion on this first.

That would work!

Or, alternatively, do an rcu_barrier() between the two calls to
call_rcu(), assuming that the use case can tolerate rcu_barrier()
overhead and latency.

Thanx, Paul

2020-03-06 17:01:14

by Jens Axboe

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On 3/6/20 9:44 AM, Paul E. McKenney wrote:
> On Fri, Mar 06, 2020 at 04:36:20PM +0100, Jann Horn wrote:
>> On Fri, Mar 6, 2020 at 4:34 PM Jens Axboe <[email protected]> wrote:
>>> On 3/6/20 7:57 AM, Jann Horn wrote:
>>>> +paulmck
>>>>
>>>> On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
>>>>> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
>>>>>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> syzbot found the following crash on:
>>>>>>>
>>>>>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
>>>>>>> git tree: upstream
>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
>>>>>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>>>>>>>
>>>>>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>>>>>
>>>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>>>> Reported-by: [email protected]
>>>>>>
>>>>>> +io_uring maintainers
>>>>>>
>>>>>> Here is a repro:
>>>>>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
>>>>>
>>>>> I've queued up a fix for this:
>>>>>
>>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
>>>>
>>>> I believe that this fix relies on call_rcu() having FIFO ordering; but
>>>> <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
>>>> says:
>>>>
>>>> | call_rcu() normally acts only on CPU-local state[...] It simply
>>>> enqueues the rcu_head structure on a per-CPU list,
>
> Indeed. For but one example, if there was a CPU-to-CPU migration between
> the two call_rcu() invocations, it would not be at all surprising for
> the two callbacks to execute out of order.
>
>>>> Is this fix really correct?
>>>
>>> That's a good point, there's a potentially stronger guarantee we need
>>> here that isn't "nobody is inside an RCU critical section", but rather
>>> that we're depending on a previous call_rcu() to have happened. Hence I
>>> think you are right - it'll shrink the window drastically, since the
>>> previous callback is already queued up, but it's not a full close.
>>>
>>> Hmm...
>>
>> You could potentially hack up the semantics you want by doing a
>> call_rcu() whose callback does another call_rcu(), or something like
>> that - but I'd like to hear paulmck's opinion on this first.
>
> That would work!
>
> Or, alternatively, do an rcu_barrier() between the two calls to
> call_rcu(), assuming that the use case can tolerate rcu_barrier()
> overhead and latency.

If the nested call_rcu() works, that seems greatly preferable to needing
the rcu_barrier(), even if that would not be a showstopper for me. The
nested call_rcu() is just a bit odd, but with a comment it should be OK.

Incremental here I'm going to test, would just fold in of course.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f3218fc81943..95ba95b4d8ec 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5330,7 +5330,7 @@ static void io_file_ref_kill(struct percpu_ref *ref)
complete(&data->done);
}

-static void io_file_ref_exit_and_free(struct rcu_head *rcu)
+static void __io_file_ref_exit_and_free(struct rcu_head *rcu)
{
struct fixed_file_data *data = container_of(rcu, struct fixed_file_data,
rcu);
@@ -5338,6 +5338,18 @@ static void io_file_ref_exit_and_free(struct rcu_head *rcu)
kfree(data);
}

+static void io_file_ref_exit_and_free(struct rcu_head *rcu)
+{
+ /*
+ * We need to order our exit+free call again the potentially
+ * existing call_rcu() for switching to atomic. One way to do that
+ * is to have this rcu callback queue the final put and free, as we
+ * could otherwise a pre-existing atomic switch complete _after_
+ * the free callback we queued.
+ */
+ call_rcu(rcu, __io_file_ref_exit_and_free);
+}
+
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
struct fixed_file_data *data = ctx->file_data;

--
Jens Axboe

2020-03-06 17:09:14

by Jens Axboe

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On 3/6/20 10:00 AM, Jens Axboe wrote:
> On 3/6/20 9:44 AM, Paul E. McKenney wrote:
>> On Fri, Mar 06, 2020 at 04:36:20PM +0100, Jann Horn wrote:
>>> On Fri, Mar 6, 2020 at 4:34 PM Jens Axboe <[email protected]> wrote:
>>>> On 3/6/20 7:57 AM, Jann Horn wrote:
>>>>> +paulmck
>>>>>
>>>>> On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
>>>>>> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
>>>>>>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzbot found the following crash on:
>>>>>>>>
>>>>>>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
>>>>>>>> git tree: upstream
>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
>>>>>>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>>>>>>>>
>>>>>>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>>>>> Reported-by: [email protected]
>>>>>>>
>>>>>>> +io_uring maintainers
>>>>>>>
>>>>>>> Here is a repro:
>>>>>>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
>>>>>>
>>>>>> I've queued up a fix for this:
>>>>>>
>>>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
>>>>>
>>>>> I believe that this fix relies on call_rcu() having FIFO ordering; but
>>>>> <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
>>>>> says:
>>>>>
>>>>> | call_rcu() normally acts only on CPU-local state[...] It simply
>>>>> enqueues the rcu_head structure on a per-CPU list,
>>
>> Indeed. For but one example, if there was a CPU-to-CPU migration between
>> the two call_rcu() invocations, it would not be at all surprising for
>> the two callbacks to execute out of order.
>>
>>>>> Is this fix really correct?
>>>>
>>>> That's a good point, there's a potentially stronger guarantee we need
>>>> here that isn't "nobody is inside an RCU critical section", but rather
>>>> that we're depending on a previous call_rcu() to have happened. Hence I
>>>> think you are right - it'll shrink the window drastically, since the
>>>> previous callback is already queued up, but it's not a full close.
>>>>
>>>> Hmm...
>>>
>>> You could potentially hack up the semantics you want by doing a
>>> call_rcu() whose callback does another call_rcu(), or something like
>>> that - but I'd like to hear paulmck's opinion on this first.
>>
>> That would work!
>>
>> Or, alternatively, do an rcu_barrier() between the two calls to
>> call_rcu(), assuming that the use case can tolerate rcu_barrier()
>> overhead and latency.
>
> If the nested call_rcu() works, that seems greatly preferable to needing
> the rcu_barrier(), even if that would not be a showstopper for me. The
> nested call_rcu() is just a bit odd, but with a comment it should be OK.
>
> Incremental here I'm going to test, would just fold in of course.

Been running for a few minutes just fine, I'm going to leave the
reproducer beating on it for a few hours. But here's the folded in
final:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=fae702294a6a0774ceb3cf250be79e7fe207250a

--
Jens Axboe

2020-03-06 17:20:53

by Paul E. McKenney

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On Fri, Mar 06, 2020 at 10:00:19AM -0700, Jens Axboe wrote:
> On 3/6/20 9:44 AM, Paul E. McKenney wrote:
> > On Fri, Mar 06, 2020 at 04:36:20PM +0100, Jann Horn wrote:
> >> On Fri, Mar 6, 2020 at 4:34 PM Jens Axboe <[email protected]> wrote:
> >>> On 3/6/20 7:57 AM, Jann Horn wrote:
> >>>> +paulmck
> >>>>
> >>>> On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
> >>>>> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
> >>>>>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
> >>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> syzbot found the following crash on:
> >>>>>>>
> >>>>>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
> >>>>>>> git tree: upstream
> >>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
> >>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
> >>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
> >>>>>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
> >>>>>>>
> >>>>>>> Unfortunately, I don't have any reproducer for this crash yet.
> >>>>>>>
> >>>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >>>>>>> Reported-by: [email protected]
> >>>>>>
> >>>>>> +io_uring maintainers
> >>>>>>
> >>>>>> Here is a repro:
> >>>>>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
> >>>>>
> >>>>> I've queued up a fix for this:
> >>>>>
> >>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
> >>>>
> >>>> I believe that this fix relies on call_rcu() having FIFO ordering; but
> >>>> <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
> >>>> says:
> >>>>
> >>>> | call_rcu() normally acts only on CPU-local state[...] It simply
> >>>> enqueues the rcu_head structure on a per-CPU list,
> >
> > Indeed. For but one example, if there was a CPU-to-CPU migration between
> > the two call_rcu() invocations, it would not be at all surprising for
> > the two callbacks to execute out of order.
> >
> >>>> Is this fix really correct?
> >>>
> >>> That's a good point, there's a potentially stronger guarantee we need
> >>> here that isn't "nobody is inside an RCU critical section", but rather
> >>> that we're depending on a previous call_rcu() to have happened. Hence I
> >>> think you are right - it'll shrink the window drastically, since the
> >>> previous callback is already queued up, but it's not a full close.
> >>>
> >>> Hmm...
> >>
> >> You could potentially hack up the semantics you want by doing a
> >> call_rcu() whose callback does another call_rcu(), or something like
> >> that - but I'd like to hear paulmck's opinion on this first.
> >
> > That would work!
> >
> > Or, alternatively, do an rcu_barrier() between the two calls to
> > call_rcu(), assuming that the use case can tolerate rcu_barrier()
> > overhead and latency.
>
> If the nested call_rcu() works, that seems greatly preferable to needing
> the rcu_barrier(), even if that would not be a showstopper for me. The
> nested call_rcu() is just a bit odd, but with a comment it should be OK.
>
> Incremental here I'm going to test, would just fold in of course.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index f3218fc81943..95ba95b4d8ec 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5330,7 +5330,7 @@ static void io_file_ref_kill(struct percpu_ref *ref)
> complete(&data->done);
> }
>
> -static void io_file_ref_exit_and_free(struct rcu_head *rcu)
> +static void __io_file_ref_exit_and_free(struct rcu_head *rcu)
> {
> struct fixed_file_data *data = container_of(rcu, struct fixed_file_data,
> rcu);
> @@ -5338,6 +5338,18 @@ static void io_file_ref_exit_and_free(struct rcu_head *rcu)
> kfree(data);
> }
>
> +static void io_file_ref_exit_and_free(struct rcu_head *rcu)
> +{
> + /*
> + * We need to order our exit+free call again the potentially
> + * existing call_rcu() for switching to atomic. One way to do that
> + * is to have this rcu callback queue the final put and free, as we
> + * could otherwise a pre-existing atomic switch complete _after_
> + * the free callback we queued.
> + */
> + call_rcu(rcu, __io_file_ref_exit_and_free);
> +}
> +
> static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
> {
> struct fixed_file_data *data = ctx->file_data;

Looks good to me!

Thanx, Paul

2020-03-06 17:22:17

by Jens Axboe

[permalink] [raw]

Subject: Re: KASAN: use-after-free Read in percpu_ref_switch_to_atomic_rcu

On 3/6/20 10:19 AM, Paul E. McKenney wrote:
> On Fri, Mar 06, 2020 at 10:00:19AM -0700, Jens Axboe wrote:
>> On 3/6/20 9:44 AM, Paul E. McKenney wrote:
>>> On Fri, Mar 06, 2020 at 04:36:20PM +0100, Jann Horn wrote:
>>>> On Fri, Mar 6, 2020 at 4:34 PM Jens Axboe <[email protected]> wrote:
>>>>> On 3/6/20 7:57 AM, Jann Horn wrote:
>>>>>> +paulmck
>>>>>>
>>>>>> On Wed, Mar 4, 2020 at 3:40 PM Jens Axboe <[email protected]> wrote:
>>>>>>> On 3/4/20 12:59 AM, Dmitry Vyukov wrote:
>>>>>>>> On Fri, Feb 7, 2020 at 9:14 AM syzbot
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> syzbot found the following crash on:
>>>>>>>>>
>>>>>>>>> HEAD commit: 4c7d00cc Merge tag 'pwm/for-5.6-rc1' of git://git.kernel.o..
>>>>>>>>> git tree: upstream
>>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=12fec785e00000
>>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=e162021ddededa72
>>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e017e49c39ab484ac87a
>>>>>>>>> compiler: clang version 10.0.0 (https://github.com/llvm/llvm-project/ c2443155a0fb245c8f17f2c1c72b6ea391e86e81)
>>>>>>>>>
>>>>>>>>> Unfortunately, I don't have any reproducer for this crash yet.
>>>>>>>>>
>>>>>>>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>>>>>>>> Reported-by: [email protected]
>>>>>>>>
>>>>>>>> +io_uring maintainers
>>>>>>>>
>>>>>>>> Here is a repro:
>>>>>>>> https://gist.githubusercontent.com/dvyukov/6b340beab6483a036f4186e7378882ce/raw/cd1922185516453c201df8eded1d4b006a6d6a3a/gistfile1.txt
>>>>>>>
>>>>>>> I've queued up a fix for this:
>>>>>>>
>>>>>>> https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=9875fe3dc4b8cff1f1b440fb925054a5124403c3
>>>>>>
>>>>>> I believe that this fix relies on call_rcu() having FIFO ordering; but
>>>>>> <https://www.kernel.org/doc/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html#Callback%20Registry>
>>>>>> says:
>>>>>>
>>>>>> | call_rcu() normally acts only on CPU-local state[...] It simply
>>>>>> enqueues the rcu_head structure on a per-CPU list,
>>>
>>> Indeed. For but one example, if there was a CPU-to-CPU migration between
>>> the two call_rcu() invocations, it would not be at all surprising for
>>> the two callbacks to execute out of order.
>>>
>>>>>> Is this fix really correct?
>>>>>
>>>>> That's a good point, there's a potentially stronger guarantee we need
>>>>> here that isn't "nobody is inside an RCU critical section", but rather
>>>>> that we're depending on a previous call_rcu() to have happened. Hence I
>>>>> think you are right - it'll shrink the window drastically, since the
>>>>> previous callback is already queued up, but it's not a full close.
>>>>>
>>>>> Hmm...
>>>>
>>>> You could potentially hack up the semantics you want by doing a
>>>> call_rcu() whose callback does another call_rcu(), or something like
>>>> that - but I'd like to hear paulmck's opinion on this first.
>>>
>>> That would work!
>>>
>>> Or, alternatively, do an rcu_barrier() between the two calls to
>>> call_rcu(), assuming that the use case can tolerate rcu_barrier()
>>> overhead and latency.
>>
>> If the nested call_rcu() works, that seems greatly preferable to needing
>> the rcu_barrier(), even if that would not be a showstopper for me. The
>> nested call_rcu() is just a bit odd, but with a comment it should be OK.
>>
>> Incremental here I'm going to test, would just fold in of course.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index f3218fc81943..95ba95b4d8ec 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5330,7 +5330,7 @@ static void io_file_ref_kill(struct percpu_ref *ref)
>> complete(&data->done);
>> }
>>
>> -static void io_file_ref_exit_and_free(struct rcu_head *rcu)
>> +static void __io_file_ref_exit_and_free(struct rcu_head *rcu)
>> {
>> struct fixed_file_data *data = container_of(rcu, struct fixed_file_data,
>> rcu);
>> @@ -5338,6 +5338,18 @@ static void io_file_ref_exit_and_free(struct rcu_head *rcu)
>> kfree(data);
>> }
>>
>> +static void io_file_ref_exit_and_free(struct rcu_head *rcu)
>> +{
>> + /*
>> + * We need to order our exit+free call again the potentially
>> + * existing call_rcu() for switching to atomic. One way to do that
>> + * is to have this rcu callback queue the final put and free, as we
>> + * could otherwise a pre-existing atomic switch complete _after_
>> + * the free callback we queued.
>> + */
>> + call_rcu(rcu, __io_file_ref_exit_and_free);
>> +}
>> +
>> static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
>> {
>> struct fixed_file_data *data = ctx->file_data;
>
> Looks good to me!

Thanks Paul! I talked to Paul in private, but here's the final version
with updated comment and attributions.

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.6&id=c1e2148f8ecb26863b899d402a823dab8e26efd1

--
Jens Axboe