2021-05-21 20:24:02

by syzbot

Subject: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

Hello,

syzbot found the following issue on:

HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: use-after-free in check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
Read of size 1 at addr ffff88802767a05c by task rcu_tasks_trace/12

CPU: 0 PID: 12 Comm: rcu_tasks_trace Not tainted 5.12.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 8477:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:428 [inline]
__kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
kasan_slab_alloc include/linux/kasan.h:236 [inline]
slab_post_alloc_hook mm/slab.h:524 [inline]
slab_alloc_node mm/slub.c:2912 [inline]
kmem_cache_alloc_node+0x269/0x3e0 mm/slub.c:2948
alloc_task_struct_node kernel/fork.c:171 [inline]
dup_task_struct kernel/fork.c:865 [inline]
copy_process+0x5c8/0x7120 kernel/fork.c:1947
kernel_clone+0xe7/0xab0 kernel/fork.c:2503
__do_sys_clone+0xc8/0x110 kernel/fork.c:2620
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 12:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
kasan_slab_free include/linux/kasan.h:212 [inline]
slab_free_hook mm/slub.c:1581 [inline]
slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1606
slab_free mm/slub.c:3166 [inline]
kmem_cache_free+0x8a/0x740 mm/slub.c:3182
__put_task_struct+0x26f/0x400 kernel/fork.c:747
trc_wait_for_one_reader kernel/rcu/tasks.h:935 [inline]
check_all_holdout_tasks_trace+0x179/0x420 kernel/rcu/tasks.h:1081
rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3038 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
release_task+0xca1/0x1690 kernel/exit.c:226
wait_task_zombie kernel/exit.c:1108 [inline]
wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
do_wait_thread kernel/exit.c:1398 [inline]
do_wait+0x724/0xd40 kernel/exit.c:1515
kernel_wait4+0x14c/0x260 kernel/exit.c:1678
__do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3038 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
context_switch kernel/sched/core.c:4342 [inline]
__schedule+0x91e/0x23e0 kernel/sched/core.c:5147
preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:5307
preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
try_to_wake_up+0xa12/0x14b0 kernel/sched/core.c:3489
wake_up_process kernel/sched/core.c:3552 [inline]
wake_up_q+0x96/0x100 kernel/sched/core.c:597
futex_wake+0x3e9/0x490 kernel/futex.c:1634
do_futex+0x326/0x1780 kernel/futex.c:3738
__do_sys_futex+0x2a2/0x470 kernel/futex.c:3796
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff888027679c40
which belongs to the cache task_struct of size 6976
The buggy address is located 1052 bytes inside of
6976-byte region [ffff888027679c40, ffff88802767b780)
The buggy address belongs to the page:
page:ffffea00009d9e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802767b880 pfn:0x27678
head:ffffea00009d9e00 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffffea000071e208 ffffea0000950808 ffff888140005140
raw: ffff88802767b880 0000000000040003 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 243, ts 14372676818, free_ts 0
prep_new_page mm/page_alloc.c:2358 [inline]
get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
alloc_slab_page mm/slub.c:1644 [inline]
allocate_slab+0x2c5/0x4c0 mm/slub.c:1784
new_slab mm/slub.c:1847 [inline]
new_slab_objects mm/slub.c:2593 [inline]
___slab_alloc+0x44c/0x7a0 mm/slub.c:2756
__slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2796
slab_alloc_node mm/slub.c:2878 [inline]
kmem_cache_alloc_node+0x12f/0x3e0 mm/slub.c:2948
alloc_task_struct_node kernel/fork.c:171 [inline]
dup_task_struct kernel/fork.c:865 [inline]
copy_process+0x5c8/0x7120 kernel/fork.c:1947
kernel_clone+0xe7/0xab0 kernel/fork.c:2503
kernel_thread+0xb5/0xf0 kernel/fork.c:2555
call_usermodehelper_exec_work kernel/umh.c:174 [inline]
call_usermodehelper_exec_work+0xcc/0x180 kernel/umh.c:160
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
page_owner free stack trace missing

Memory state around the buggy address:
 ffff888027679f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888027679f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88802767a000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                    ^
 ffff88802767a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88802767a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
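
The object offset and the shadow-row position quoted in the report can be re-derived from the raw addresses. What follows is a minimal user-space sketch in C, not kernel code: the helper names are hypothetical, and the 8-byte shadow granule is the generic-KASAN value, assumed to apply to this build.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the report. */
#define OBJ_START 0xffff888027679c40ULL /* start of the task_struct object */
#define OBJ_SIZE  6976ULL               /* task_struct cache object size */
#define BAD_ADDR  0xffff88802767a05cULL /* address of the 1-byte read */
#define ROW_START 0xffff88802767a000ULL /* dump row marked with '>' */

/* Assumption: generic KASAN, one shadow byte per 8 bytes of memory. */
#define KASAN_GRANULE 8ULL

/* Hypothetical helper: offset of addr inside the object at obj_start. */
static uint64_t offset_in_object(uint64_t obj_start, uint64_t obj_size,
                                 uint64_t addr)
{
    assert(addr >= obj_start && addr < obj_start + obj_size);
    return addr - obj_start;
}

/* Hypothetical helper: index, within one dump row, of the shadow byte
 * that covers addr. */
static uint64_t shadow_index_in_row(uint64_t row_start, uint64_t addr)
{
    assert(addr >= row_start);
    return (addr - row_start) / KASAN_GRANULE;
}
```

With the report's values, the offset is 0x41c (1052 bytes), the region ends at 0xffff88802767b780, and the faulting byte is covered by shadow byte index 11 (the twelfth entry) of the row marked '>'. The fb shadow value marks a freed slab object, which matches the use-after-free classification.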


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2021-05-23 07:00:39

by Dmitry Vyukov

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Fri, May 21, 2021 at 7:29 PM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> git tree: bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

This looks rcu-related. +rcu mailing list


2021-05-24 04:16:25

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> On Fri, May 21, 2021 at 7:29 PM syzbot
> <[email protected]> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > git tree: bpf-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
>
> This looks rcu-related. +rcu mailing list

I think I see a possible cause for this, and will say more after some
testing and after becoming more awake Monday morning, Pacific time.

Thanx, Paul


2021-05-24 22:48:30

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
> On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> > On Fri, May 21, 2021 at 7:29 PM syzbot
> > <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > > git tree: bpf-next
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> >
> > This looks rcu-related. +rcu mailing list
>
> I think I see a possible cause for this, and will say more after some
> testing and after becoming more awake Monday morning, Pacific time.

No joy. From what I can see, within RCU Tasks Trace, the calls to
get_task_struct() are properly protected (either by RCU or by an earlier
get_task_struct()), and the calls to put_task_struct() are balanced by
those to get_task_struct().

I could of course have missed something, but at this point I am suspecting
an unbalanced put_task_struct() has been added elsewhere.

As always, extra eyes on this code would be a good thing.

If it were reproducible, I would of course suggest bisection. :-/
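
The balance rule described above can be illustrated with a small user-space model. This is only a sketch of the hypothesis (a stray put somewhere), not a diagnosis and not the kernel's implementation: struct demo_task, demo_get() and demo_put() are hypothetical stand-ins for task_struct, get_task_struct() and put_task_struct().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct; only the refcount is modeled. */
struct demo_task {
    int usage;   /* number of outstanding references */
    bool freed;  /* set when the last reference is dropped */
};

/* Analogue of get_task_struct(): the caller must already be protected,
 * here by holding a live reference. */
static void demo_get(struct demo_task *t)
{
    assert(!t->freed && t->usage > 0);
    t->usage++;
}

/* Analogue of put_task_struct(): returns true if this call dropped the
 * last reference and "freed" the object. */
static bool demo_put(struct demo_task *t)
{
    assert(!t->freed && t->usage > 0);
    if (--t->usage == 0) {
        t->freed = true;
        return true;
    }
    return false;
}

/* Balanced: each get is matched by one put, so the creator's reference
 * keeps the object alive. Returns true if the object survives. */
static bool balanced_pairs_keep_object(void)
{
    struct demo_task t = { .usage = 1, .freed = false };

    demo_get(&t);  /* temporary reference: usage == 2 */
    demo_put(&t);  /* matching put:        usage == 1 */
    return !t.freed && t.usage == 1;
}

/* Unbalanced: after a stray put, a correctly paired put drops the count
 * to zero while the creator still believes it holds a reference. */
static bool stray_put_frees_early(void)
{
    struct demo_task t = { .usage = 1, .freed = false };

    demo_get(&t);         /* temporary reference: usage == 2 */
    demo_put(&t);         /* stray put elsewhere: usage == 1 */
    return demo_put(&t);  /* paired put: usage == 0, object freed */
}
```

In this model the final, correctly paired put is the one that frees the object, so the free can be attributed to a caller whose own get/put calls are balanced. That is why balanced calls inside RCU Tasks Trace do not by themselves rule out a use-after-free.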

Thanx, Paul

> > > ---
> > > This report is generated by a bot. It may contain errors.
> > > See https://goo.gl/tpsmEJ for more information about syzbot.
> > > syzbot engineers can be reached at [email protected].
> > >
> > > syzbot will keep track of this issue. See:
> > > https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> > >

2021-05-25 02:48:52

by Xu, Yanfei

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace



On 5/25/21 6:46 AM, Paul E. McKenney wrote:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
>> On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
>>> On Fri, May 21, 2021 at 7:29 PM syzbot
>>> <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> syzbot found the following issue on:
>>>>
>>>> HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
>>>> git tree: bpf-next
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
>>>>
>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>
>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>> Reported-by: [email protected]
>>>
>>> This looks rcu-related. +rcu mailing list
>>
>> I think I see a possible cause for this, and will say more after some
>> testing and after becoming more awake Monday morning, Pacific time.
>
> No joy. From what I can see, within RCU Tasks Trace, the calls to
> get_task_struct() are properly protected (either by RCU or by an earlier
> get_task_struct()), and the calls to put_task_struct() are balanced by
> those to get_task_struct().
>
> I could of course have missed something, but at this point I am suspecting
> an unbalanced put_task_struct() has been added elsewhere.
>
> As always, extra eyes on this code would be a good thing.
>
> If it were reproducible, I would of course suggest bisection. :-/
>
> Thanx, Paul
>
Hi Paul,

Could it be?

        CPU1                            CPU2
trc_add_holdout(t, bhp)
//t->usage==2
                                        release_task
                                          put_task_struct_rcu_user
                                            delayed_put_task_struct
                                              ......
                                              put_task_struct(t)
                                              //t->usage==1

check_all_holdout_tasks_trace
  ->trc_wait_for_one_reader
    ->trc_del_holdout
      ->put_task_struct(t)
      //t->usage==0 and task_struct freed
  READ_ONCE(t->trc_reader_checked)
  //oops, t has been freed.

So, after executing trc_wait_for_one_reader(), the task might have been
removed from the holdout list and the corresponding task_struct freed.
In that case we shouldn't do READ_ONCE(t->trc_reader_checked).

I investigated trc_wait_for_one_reader() and found that
t->trc_reader_checked is always set to true before trc_del_holdout() is
executed. How about we just set the checked flag there, and do all of the
trc_del_holdout() calls in one place, in check_all_holdout_tasks_trace(),
after checking the flag?
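
The reference-count sequence above can be replayed in a small user-space model. This is only a sketch under stated assumptions: struct model_task, model_get(), model_put() and replay_holdout_race() are hypothetical names invented for illustration, the two CPUs are serialized into program order, and the "free" is recorded in a flag rather than performed, so the model itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct task_struct; not kernel code. */
struct model_task {
	int usage;            /* models t->usage */
	bool reader_checked;  /* models t->trc_reader_checked */
	bool freed;           /* set instead of really freeing, so the model stays inspectable */
};

static void model_get(struct model_task *t)
{
	t->usage++;
}

static void model_put(struct model_task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->freed = true;  /* the kernel would free the task_struct here */
}

/*
 * Replay the suspected interleaving in program order and report whether
 * the final read of reader_checked would land on freed memory.
 */
static bool replay_holdout_race(void)
{
	struct model_task t = { .usage = 1 };  /* reference later dropped by the exit path */

	model_get(&t);            /* CPU1: trc_add_holdout(), usage == 2 */
	model_put(&t);            /* CPU2: delayed_put_task_struct(), usage == 1 */
	t.reader_checked = true;  /* CPU1: inside trc_wait_for_one_reader() */
	model_put(&t);            /* CPU1: trc_del_holdout(), usage == 0 */

	return t.freed;           /* a READ_ONCE() after this point is a use-after-free */
}
```

Calling replay_holdout_race() returns true in this model, meaning the last reference is gone before the flag is read.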


Thanks,
Yanfei




>>>> ==================================================================
>>>> BUG: KASAN: use-after-free in check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
>>>> Read of size 1 at addr ffff88802767a05c by task rcu_tasks_trace/12
>>>>
>>>> CPU: 0 PID: 12 Comm: rcu_tasks_trace Not tainted 5.12.0-syzkaller #0
>>>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>>> Call Trace:
>>>> __dump_stack lib/dump_stack.c:79 [inline]
>>>> dump_stack+0x141/0x1d7 lib/dump_stack.c:120
>>>> print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
>>>> __kasan_report mm/kasan/report.c:419 [inline]
>>>> kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
>>>> check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
>>>> rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
>>>> rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
>>>> kthread+0x3b1/0x4a0 kernel/kthread.c:313
>>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>>>>
>>>> Allocated by task 8477:
>>>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>>>> kasan_set_track mm/kasan/common.c:46 [inline]
>>>> set_alloc_info mm/kasan/common.c:428 [inline]
>>>> __kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
>>>> kasan_slab_alloc include/linux/kasan.h:236 [inline]
>>>> slab_post_alloc_hook mm/slab.h:524 [inline]
>>>> slab_alloc_node mm/slub.c:2912 [inline]
>>>> kmem_cache_alloc_node+0x269/0x3e0 mm/slub.c:2948
>>>> alloc_task_struct_node kernel/fork.c:171 [inline]
>>>> dup_task_struct kernel/fork.c:865 [inline]
>>>> copy_process+0x5c8/0x7120 kernel/fork.c:1947
>>>> kernel_clone+0xe7/0xab0 kernel/fork.c:2503
>>>> __do_sys_clone+0xc8/0x110 kernel/fork.c:2620
>>>> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
>>>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>
>>>> Freed by task 12:
>>>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>>>> kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
>>>> kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
>>>> ____kasan_slab_free mm/kasan/common.c:360 [inline]
>>>> ____kasan_slab_free mm/kasan/common.c:325 [inline]
>>>> __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
>>>> kasan_slab_free include/linux/kasan.h:212 [inline]
>>>> slab_free_hook mm/slub.c:1581 [inline]
>>>> slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1606
>>>> slab_free mm/slub.c:3166 [inline]
>>>> kmem_cache_free+0x8a/0x740 mm/slub.c:3182
>>>> __put_task_struct+0x26f/0x400 kernel/fork.c:747
>>>> trc_wait_for_one_reader kernel/rcu/tasks.h:935 [inline]
>>>> check_all_holdout_tasks_trace+0x179/0x420 kernel/rcu/tasks.h:1081
>>>> rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
>>>> rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
>>>> kthread+0x3b1/0x4a0 kernel/kthread.c:313
>>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>>>>
>>>> Last potentially related work creation:
>>>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>>>> kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
>>>> __call_rcu kernel/rcu/tree.c:3038 [inline]
>>>> call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
>>>> put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
>>>> release_task+0xca1/0x1690 kernel/exit.c:226
>>>> wait_task_zombie kernel/exit.c:1108 [inline]
>>>> wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
>>>> do_wait_thread kernel/exit.c:1398 [inline]
>>>> do_wait+0x724/0xd40 kernel/exit.c:1515
>>>> kernel_wait4+0x14c/0x260 kernel/exit.c:1678
>>>> __do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
>>>> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
>>>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>
>>>> Second to last potentially related work creation:
>>>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>>>> kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
>>>> __call_rcu kernel/rcu/tree.c:3038 [inline]
>>>> call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
>>>> put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
>>>> context_switch kernel/sched/core.c:4342 [inline]
>>>> __schedule+0x91e/0x23e0 kernel/sched/core.c:5147
>>>> preempt_schedule_common+0x45/0xc0 kernel/sched/core.c:5307
>>>> preempt_schedule_thunk+0x16/0x18 arch/x86/entry/thunk_64.S:35
>>>> try_to_wake_up+0xa12/0x14b0 kernel/sched/core.c:3489
>>>> wake_up_process kernel/sched/core.c:3552 [inline]
>>>> wake_up_q+0x96/0x100 kernel/sched/core.c:597
>>>> futex_wake+0x3e9/0x490 kernel/futex.c:1634
>>>> do_futex+0x326/0x1780 kernel/futex.c:3738
>>>> __do_sys_futex+0x2a2/0x470 kernel/futex.c:3796
>>>> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
>>>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>
>>>> The buggy address belongs to the object at ffff888027679c40
>>>> which belongs to the cache task_struct of size 6976
>>>> The buggy address is located 1052 bytes inside of
>>>> 6976-byte region [ffff888027679c40, ffff88802767b780)
>>>> The buggy address belongs to the page:
>>>> page:ffffea00009d9e00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802767b880 pfn:0x27678
>>>> head:ffffea00009d9e00 order:3 compound_mapcount:0 compound_pincount:0
>>>> flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
>>>> raw: 00fff00000010200 ffffea000071e208 ffffea0000950808 ffff888140005140
>>>> raw: ffff88802767b880 0000000000040003 00000001ffffffff 0000000000000000
>>>> page dumped because: kasan: bad access detected
>>>> page_owner tracks the page as allocated
>>>> page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 243, ts 14372676818, free_ts 0
>>>> prep_new_page mm/page_alloc.c:2358 [inline]
>>>> get_page_from_freelist+0x1033/0x2b60 mm/page_alloc.c:3994
>>>> __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
>>>> alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
>>>> alloc_slab_page mm/slub.c:1644 [inline]
>>>> allocate_slab+0x2c5/0x4c0 mm/slub.c:1784
>>>> new_slab mm/slub.c:1847 [inline]
>>>> new_slab_objects mm/slub.c:2593 [inline]
>>>> ___slab_alloc+0x44c/0x7a0 mm/slub.c:2756
>>>> __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2796
>>>> slab_alloc_node mm/slub.c:2878 [inline]
>>>> kmem_cache_alloc_node+0x12f/0x3e0 mm/slub.c:2948
>>>> alloc_task_struct_node kernel/fork.c:171 [inline]
>>>> dup_task_struct kernel/fork.c:865 [inline]
>>>> copy_process+0x5c8/0x7120 kernel/fork.c:1947
>>>> kernel_clone+0xe7/0xab0 kernel/fork.c:2503
>>>> kernel_thread+0xb5/0xf0 kernel/fork.c:2555
>>>> call_usermodehelper_exec_work kernel/umh.c:174 [inline]
>>>> call_usermodehelper_exec_work+0xcc/0x180 kernel/umh.c:160
>>>> process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
>>>> worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
>>>> kthread+0x3b1/0x4a0 kernel/kthread.c:313
>>>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>>>> page_owner free stack trace missing
>>>>
>>>> Memory state around the buggy address:
>>>> ffff888027679f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> ffff888027679f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>>> ffff88802767a000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> ^
>>>> ffff88802767a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> ffff88802767a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>>> ==================================================================
>>>>
>>>>
>>>> ---
>>>> This report is generated by a bot. It may contain errors.
>>>> See https://goo.gl/tpsmEJ for more information about syzbot.
>>>> syzbot engineers can be reached at [email protected].
>>>>
>>>> syzbot will keep track of this issue. See:
>>>> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>>>>

2021-05-25 03:38:53

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
>
>
> On 5/25/21 6:46 AM, Paul E. McKenney wrote:
> > [Please note: This e-mail is from an EXTERNAL e-mail address]
> >
> > On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
> > > On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> > > > On Fri, May 21, 2021 at 7:29 PM syzbot
> > > > <[email protected]> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > syzbot found the following issue on:
> > > > >
> > > > > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > > > > git tree: bpf-next
> > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > > > >
> > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > >
> > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > Reported-by: [email protected]
> > > >
> > > > This looks rcu-related. +rcu mailing list
> > >
> > > I think I see a possible cause for this, and will say more after some
> > > testing and after becoming more awake Monday morning, Pacific time.
> >
> > No joy. From what I can see, within RCU Tasks Trace, the calls to
> > get_task_struct() are properly protected (either by RCU or by an earlier
> > get_task_struct()), and the calls to put_task_struct() are balanced by
> > those to get_task_struct().
> >
> > I could of course have missed something, but at this point I am suspecting
> > an unbalanced put_task_struct() has been added elsewhere.
> >
> > As always, extra eyes on this code would be a good thing.
> >
> > If it were reproducible, I would of course suggest bisection. :-/
> >
> > Thanx, Paul
> >
> Hi Paul,
>
> Could it be?
>
>         CPU1                            CPU2
> trc_add_holdout(t, bhp)
> //t->usage==2
>                                         release_task
>                                           put_task_struct_rcu_user
>                                             delayed_put_task_struct
>                                               ......
>                                               put_task_struct(t)
>                                               //t->usage==1
>
> check_all_holdout_tasks_trace
>   ->trc_wait_for_one_reader
>     ->trc_del_holdout
>       ->put_task_struct(t)
>       //t->usage==0 and task_struct freed
>   READ_ONCE(t->trc_reader_checked)
>   //oops, t has been freed.
>
> So, after executing trc_wait_for_one_reader(), the task might have been
> removed from the holdout list and the corresponding task_struct freed.
> In that case we shouldn't do READ_ONCE(t->trc_reader_checked).

I was suspicious of that call to trc_del_holdout() from within
trc_wait_for_one_reader(), but the only time it executes is in the
context of the current running task, which means that CPU 2 had better
not be invoking release_task() on it just yet.

Or am I missing your point?

Of course, if you can reproduce it, the following patch might be
an interesting thing to try, my doubts notwithstanding. But more
important, please check the patch to make sure that we are both
talking about the same call to trc_del_holdout()!

If we are talking about the same call to trc_del_holdout(), are you
actually seeing that code execute except when rcu_tasks_trace_pertask()
calls trc_wait_for_one_reader()?

> I investigated trc_wait_for_one_reader() and found that
> t->trc_reader_checked is always set to true before trc_del_holdout() is
> executed. How about we just set the checked flag there, and do all of the
> trc_del_holdout() calls in one place, in check_all_holdout_tasks_trace(),
> after checking the flag?

The problem is that we cannot execute trc_del_holdout() except in
the context of the RCU Tasks Trace grace-period kthread. So it is
necessary to manipulate ->trc_reader_checked separately from the list
in order to safely synchronize with IPIs and with the exit code path
for any reader tasks, see for example trc_read_check_handler() and
exit_tasks_rcu_finish_trace().

Or are you thinking of some other approach?
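
As an illustration of keeping the flag separate from the list, here is a minimal user-space sketch using C11 atomics. It is not the kernel implementation: struct model_reader, mark_checked() and prune_holdouts() are hypothetical names, and the single list-owning caller stands in for the grace-period kthread. The flag may be set from any context, while only the owner ever unlinks entries.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task on the holdout list; not kernel code. */
struct model_reader {
	atomic_bool checked;        /* models ->trc_reader_checked */
	struct model_reader *next;  /* singly linked holdout list */
};

/* Usable from any context (models an IPI handler or exit path): flag only. */
static void mark_checked(struct model_reader *r)
{
	atomic_store_explicit(&r->checked, true, memory_order_release);
}

/*
 * Called only by the single list owner. Unlinks every reader whose flag
 * is set and returns how many readers remain on the list.
 */
static size_t prune_holdouts(struct model_reader **head)
{
	size_t remaining = 0;

	assert(head != NULL);
	while (*head != NULL) {
		struct model_reader *r = *head;

		if (atomic_load_explicit(&r->checked, memory_order_acquire)) {
			*head = r->next;  /* unlink; no other context touches the list */
			r->next = NULL;
		} else {
			head = &r->next;
			remaining++;
		}
	}
	return remaining;
}
```

Because mark_checked() never touches the list, it needs no coordination with prune_holdouts() beyond the atomic flag itself.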

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index efb8127f3a36..2a0d4bdd619a 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -987,7 +987,6 @@ static void trc_wait_for_one_reader(struct task_struct *t,
 	// The current task had better be in a quiescent state.
 	if (t == current) {
 		t->trc_reader_checked = true;
-		trc_del_holdout(t);
 		WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
 		return;
 	}

2021-05-25 08:54:34

by Dmitry Vyukov

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Tue, May 25, 2021 at 5:33 AM Paul E. McKenney <[email protected]> wrote:
>
> On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
> >
> >
> > On 5/25/21 6:46 AM, Paul E. McKenney wrote:
> > > [Please note: This e-mail is from an EXTERNAL e-mail address]
> > >
> > > On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
> > > > On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> > > > > On Fri, May 21, 2021 at 7:29 PM syzbot
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > syzbot found the following issue on:
> > > > > >
> > > > > > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > > > > > git tree: bpf-next
> > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > > > > >
> > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > >
> > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > Reported-by: [email protected]
> > > > >
> > > > > This looks rcu-related. +rcu mailing list
> > > >
> > > > I think I see a possible cause for this, and will say more after some
> > > > testing and after becoming more awake Monday morning, Pacific time.
> > >
> > > No joy. From what I can see, within RCU Tasks Trace, the calls to
> > > get_task_struct() are properly protected (either by RCU or by an earlier
> > > get_task_struct()), and the calls to put_task_struct() are balanced by
> > > those to get_task_struct().
> > >
> > > I could of course have missed something, but at this point I am suspecting
> > > an unbalanced put_task_struct() has been added elsewhere.
> > >
> > > As always, extra eyes on this code would be a good thing.
> > >
> > > If it were reproducible, I would of course suggest bisection. :-/
> > >
> > > Thanx, Paul
> > >
> > Hi Paul,
> >
> > Could it be?
> >
> >         CPU1                            CPU2
> > trc_add_holdout(t, bhp)
> > //t->usage==2
> >                                         release_task
> >                                           put_task_struct_rcu_user
> >                                             delayed_put_task_struct
> >                                               ......
> >                                               put_task_struct(t)
> >                                               //t->usage==1
> >
> > check_all_holdout_tasks_trace
> >   ->trc_wait_for_one_reader
> >     ->trc_del_holdout
> >       ->put_task_struct(t)
> >       //t->usage==0 and task_struct freed
> >   READ_ONCE(t->trc_reader_checked)
> >   //oops, t has been freed.
> >
> > So, after executing trc_wait_for_one_reader(), the task might have been
> > removed from the holdout list and the corresponding task_struct freed.
> > In that case we shouldn't do READ_ONCE(t->trc_reader_checked).
>
> I was suspicious of that call to trc_del_holdout() from within
> trc_wait_for_one_reader(), but the only time it executes is in the
> context of the current running task, which means that CPU 2 had better
> not be invoking release_task() on it just yet.
>
> Or am I missing your point?
>
> Of course, if you can reproduce it, the following patch might be
> an interesting thing to try, my doubts notwithstanding. But more
> important, please check the patch to make sure that we are both
> talking about the same call to trc_del_holdout()!
>
> If we are talking about the same call to trc_del_holdout(), are you
> actually seeing that code execute except when rcu_tasks_trace_pertask()
> calls trc_wait_for_one_reader()?
>
> > I investigated trc_wait_for_one_reader() and found that
> > t->trc_reader_checked is always set to true before trc_del_holdout() is
> > executed. How about we just set the checked flag there, and do all of the
> > trc_del_holdout() calls in one place, in check_all_holdout_tasks_trace(),
> > after checking the flag?
>
> The problem is that we cannot execute trc_del_holdout() except in
> the context of the RCU Tasks Trace grace-period kthread. So it is
> necessary to manipulate ->trc_reader_checked separately from the list
> in order to safely synchronize with IPIs and with the exit code path
> for any reader tasks, see for example trc_read_check_handler() and
> exit_tasks_rcu_finish_trace().
>
> Or are you thinking of some other approach?

This could be caused by a buggy extra put_pid somewhere else, right?
If so, I suspect that's what may be happening. We have two very similar
use-after-free reports on an internal kernel, but it also has a number
of other use-after-free reports in pid-related functions
(pid_task/pid_nr_ns/attach_pid). One of them is happening relatively
frequently (150 crashes) and is caused by something in the tty
subsystem. Presumably it may be causing one-off use-after-frees in
other random places in the kernel as well. Unfortunately these crashes
don't happen on the upstream kernel (at least not yet).
So if you don't see any obvious smoking gun in RCU, I think we can
assume for now that it's due to tty.



> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index efb8127f3a36..2a0d4bdd619a 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -987,7 +987,6 @@ static void trc_wait_for_one_reader(struct task_struct *t,
>  	// The current task had better be in a quiescent state.
>  	if (t == current) {
>  		t->trc_reader_checked = true;
> -		trc_del_holdout(t);
>  		WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
>  		return;
>  	}
>

2021-05-25 10:41:56

by Xu, Yanfei

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace



On 5/25/21 11:33 AM, Paul E. McKenney wrote:
> [Please note: This e-mail is from an EXTERNAL e-mail address]
>
> On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
>>
>>
>> On 5/25/21 6:46 AM, Paul E. McKenney wrote:
>>> [Please note: This e-mail is from an EXTERNAL e-mail address]
>>>
>>> On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
>>>> On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
>>>>> On Fri, May 21, 2021 at 7:29 PM syzbot
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> syzbot found the following issue on:
>>>>>>
>>>>>> HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
>>>>>> git tree: bpf-next
>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
>>>>>>
>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>>
>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>> Reported-by: [email protected]
>>>>>
>>>>> This looks rcu-related. +rcu mailing list
>>>>
>>>> I think I see a possible cause for this, and will say more after some
>>>> testing and after becoming more awake Monday morning, Pacific time.
>>>
>>> No joy. From what I can see, within RCU Tasks Trace, the calls to
>>> get_task_struct() are properly protected (either by RCU or by an earlier
>>> get_task_struct()), and the calls to put_task_struct() are balanced by
>>> those to get_task_struct().
>>>
>>> I could of course have missed something, but at this point I am suspecting
>>> an unbalanced put_task_struct() has been added elsewhere.
>>>
>>> As always, extra eyes on this code would be a good thing.
>>>
>>> If it were reproducible, I would of course suggest bisection. :-/
>>>
>>> Thanx, Paul
>>>
>> Hi Paul,
>>
>> Could it be?
>>
>>         CPU1                            CPU2
>> trc_add_holdout(t, bhp)
>> //t->usage==2
>>                                         release_task
>>                                           put_task_struct_rcu_user
>>                                             delayed_put_task_struct
>>                                               ......
>>                                               put_task_struct(t)
>>                                               //t->usage==1
>>
>> check_all_holdout_tasks_trace
>>   ->trc_wait_for_one_reader
>>     ->trc_del_holdout
>>       ->put_task_struct(t)
>>       //t->usage==0 and task_struct freed
>>   READ_ONCE(t->trc_reader_checked)
>>   //oops, t has been freed.
>>
>> So, after executing trc_wait_for_one_reader(), the task might have been
>> removed from the holdout list and the corresponding task_struct freed.
>> In that case we shouldn't do READ_ONCE(t->trc_reader_checked).
>
> I was suspicious of that call to trc_del_holdout() from within
> trc_wait_for_one_reader(), but the only time it executes is in the
> context of the current running task, which means that CPU 2 had better
> not be invoking release_task() on it just yet.
>
> Or am I missing your point?

There are two cases.

1. The task is current.

trc_wait_for_one_reader
  ->trc_del_holdout

2. The task isn't current.

trc_wait_for_one_reader
  ->get_task_struct
  ->try_invoke_on_locked_down_task(trc_inspect_reader)
    ->trc_del_holdout
  ->put_task_struct


>
> Of course, if you can reproduce it, the following patch might be

Sorry... I can't reproduce it; I have only analysed syzbot's log. :(


Thanks,
Yanfei

> an interesting thing to try, my doubts notwithstanding. But more
> important, please check the patch to make sure that we are both
> talking about the same call to trc_del_holdout()!
>
> If we are talking about the same call to trc_del_holdout(), are you
> actually seeing that code execute except when rcu_tasks_trace_pertask()
> calls trc_wait_for_one_reader()?
>
>> I investigated trc_wait_for_one_reader() and found that
>> t->trc_reader_checked is always set to true before trc_del_holdout() is
>> executed. How about we just set the checked flag there, and do all of the
>> trc_del_holdout() calls in one place, in check_all_holdout_tasks_trace(),
>> after checking the flag?
>
> The problem is that we cannot execute trc_del_holdout() except in
> the context of the RCU Tasks Trace grace-period kthread. So it is
> necessary to manipulate ->trc_reader_checked separately from the list
> in order to safely synchronize with IPIs and with the exit code path
> for any reader tasks, see for example trc_read_check_handler() and
> exit_tasks_rcu_finish_trace().
>
> Or are you thinking of some other approach?
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index efb8127f3a36..2a0d4bdd619a 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -987,7 +987,6 @@ static void trc_wait_for_one_reader(struct task_struct *t,
>  	// The current task had better be in a quiescent state.
>  	if (t == current) {
>  		t->trc_reader_checked = true;
> -		trc_del_holdout(t);
>  		WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
>  		return;
>  	}
>

2021-05-25 17:07:51

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Tue, May 25, 2021 at 06:24:10PM +0800, Xu, Yanfei wrote:
>
>
> On 5/25/21 11:33 AM, Paul E. McKenney wrote:
> > [Please note: This e-mail is from an EXTERNAL e-mail address]
> >
> > On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
> > >
> > >
> > > On 5/25/21 6:46 AM, Paul E. McKenney wrote:
> > > > [Please note: This e-mail is from an EXTERNAL e-mail address]
> > > >
> > > > On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> > > > > > On Fri, May 21, 2021 at 7:29 PM syzbot
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > syzbot found the following issue on:
> > > > > > >
> > > > > > > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > > > > > > git tree: bpf-next
> > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > > > > > >
> > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > >
> > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > Reported-by: [email protected]
> > > > > >
> > > > > > This looks rcu-related. +rcu mailing list
> > > > >
> > > > > I think I see a possible cause for this, and will say more after some
> > > > > testing and after becoming more awake Monday morning, Pacific time.
> > > >
> > > > No joy. From what I can see, within RCU Tasks Trace, the calls to
> > > > get_task_struct() are properly protected (either by RCU or by an earlier
> > > > get_task_struct()), and the calls to put_task_struct() are balanced by
> > > > those to get_task_struct().
> > > >
> > > > I could of course have missed something, but at this point I am suspecting
> > > > an unbalanced put_task_struct() has been added elsewhere.
> > > >
> > > > As always, extra eyes on this code would be a good thing.
> > > >
> > > > If it were reproducible, I would of course suggest bisection. :-/
> > > >
> > > > Thanx, Paul
> > > >
> > > Hi Paul,
> > >
> > > Could it be?
> > >
> > >         CPU1                            CPU2
> > > trc_add_holdout(t, bhp)
> > > //t->usage==2
> > >                                         release_task
> > >                                           put_task_struct_rcu_user
> > >                                             delayed_put_task_struct
> > >                                               ......
> > >                                               put_task_struct(t)
> > >                                               //t->usage==1
> > >
> > > check_all_holdout_tasks_trace
> > >   ->trc_wait_for_one_reader
> > >     ->trc_del_holdout
> > >       ->put_task_struct(t)
> > >       //t->usage==0 and task_struct freed
> > >   READ_ONCE(t->trc_reader_checked)
> > >   //oops, t has been freed.
> > >
> > > So, after executing trc_wait_for_one_reader(), the task might have been
> > > removed from the holdout list and the corresponding task_struct freed.
> > > In that case we shouldn't do READ_ONCE(t->trc_reader_checked).
> >
> > I was suspicious of that call to trc_del_holdout() from within
> > trc_wait_for_one_reader(), but the only time it executes is in the
> > context of the current running task, which means that CPU 2 had better
> > not be invoking release_task() on it just yet.
> >
> > Or am I missing your point?
>
> There are two cases.
>
> 1. The task is current.
>
> trc_wait_for_one_reader
>   ->trc_del_holdout

This one should be fine because the task cannot be freed until it
actually exits, and the grace-period kthread never exits. But it
could also be removed without any problem that I see.

> 2. The task isn't current.
>
> trc_wait_for_one_reader
>   ->get_task_struct
>   ->try_invoke_on_locked_down_task(trc_inspect_reader)
>     ->trc_del_holdout
>   ->put_task_struct

Ah, this one is more interesting, thank you!

Yes, it is safe from the list's viewpoint to do the removal in the
trc_inspect_reader() callback, but you are right that the grace-period
kthread may touch the task structure after return, and there might not
be anything else holding that task structure in place.

> > Of course, if you can reproduce it, the following patch might be
>
> Sorry...I can't reproduce it, just analyse syzbot's log. :(

Well, if it could be reproduced, that would mean that it was too easy,
wouldn't it? ;-)

How about the (untested) patch below, just to make sure that we are
talking about the same thing? I have started testing, but then
again, I have not yet been able to reproduce this, either.

Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
index efb8127f3a36..8b25551d10db 100644
--- a/kernel/rcu/tasks.h
+++ b/kernel/rcu/tasks.h
@@ -957,10 +957,9 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
 		in_qs = likely(!t->trc_reader_nesting);
 	}

-	// Mark as checked. Because this is called from the grace-period
-	// kthread, also remove the task from the holdout list.
+	// Mark as checked so that the grace-period kthread will
+	// remove it from the holdout list.
 	t->trc_reader_checked = true;
-	trc_del_holdout(t);

 	if (in_qs)
 		return true; // Already in quiescent state, done!!!

2021-05-25 19:27:01

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Tue, May 25, 2021 at 10:33:27AM +0200, Dmitry Vyukov wrote:
> On Tue, May 25, 2021 at 5:33 AM Paul E. McKenney <[email protected]> wrote:
> >
> > On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
> > >
> > >
> > > On 5/25/21 6:46 AM, Paul E. McKenney wrote:
> > > > On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
> > > > > On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> > > > > > On Fri, May 21, 2021 at 7:29 PM syzbot
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > syzbot found the following issue on:
> > > > > > >
> > > > > > > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > > > > > > git tree: bpf-next
> > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > > > > > >
> > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > >
> > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > Reported-by: [email protected]
> > > > > >
> > > > > > This looks rcu-related. +rcu mailing list
> > > > >
> > > > > I think I see a possible cause for this, and will say more after some
> > > > > testing and after becoming more awake Monday morning, Pacific time.
> > > >
> > > > No joy. From what I can see, within RCU Tasks Trace, the calls to
> > > > get_task_struct() are properly protected (either by RCU or by an earlier
> > > > get_task_struct()), and the calls to put_task_struct() are balanced by
> > > > those to get_task_struct().
> > > >
> > > > I could of course have missed something, but at this point I am suspecting
> > > > an unbalanced put_task_struct() has been added elsewhere.
> > > >
> > > > As always, extra eyes on this code would be a good thing.
> > > >
> > > > If it were reproducible, I would of course suggest bisection. :-/
> > > >
> > > > Thanx, Paul
> > > >
> > > Hi Paul,
> > >
> > > Could it be?
> > >
> > > > CPU1                                CPU2
> > > > trc_add_holdout(t, bhp)
> > > > //t->usage==2
> > > >                                     release_task
> > > >                                     put_task_struct_rcu_user
> > > >                                     delayed_put_task_struct
> > > >                                     ......
> > > >                                     put_task_struct(t)
> > > >                                     //t->usage==1
> > > >
> > > > check_all_holdout_tasks_trace
> > > > ->trc_wait_for_one_reader
> > > > ->trc_del_holdout
> > > > ->put_task_struct(t)
> > > > //t->usage==0 and task_struct freed
> > > > READ_ONCE(t->trc_reader_checked)
> > > > //oops, t has been freed.
> > > >
> > > > So, after executing trc_wait_for_one_reader(), the task might have been
> > > > removed from the holdout list and the corresponding task_struct freed,
> > > > so we shouldn't do READ_ONCE(t->trc_reader_checked).
> >
> > I was suspicious of that call to trc_del_holdout() from within
> > trc_wait_for_one_reader(), but the only time it executes is in the
> > context of the current running task, which means that CPU 2 had better
> > not be invoking release_task() on it just yet.
> >
> > Or am I missing your point?
> >
> > Of course, if you can reproduce it, the following patch might be
> > an interesting thing to try, my doubts notwithstanding. But more
> > important, please check the patch to make sure that we are both
> > talking about the same call to trc_del_holdout()!
> >
> > If we are talking about the same call to trc_del_holdout(), are you
> > actually seeing that code execute except when rcu_tasks_trace_pertask()
> > calls trc_wait_for_one_reader()?
> >
> > > I investigated trc_wait_for_one_reader() and found that, before we
> > > execute trc_del_holdout(), t->trc_reader_checked is always set to true.
> > > How about we just set the checked flag and execute trc_del_holdout() in
> > > one place, in check_all_holdout_tasks_trace(), when checking the flag?
> >
> > The problem is that we cannot execute trc_del_holdout() except in
> > the context of the RCU Tasks Trace grace-period kthread. So it is
> > necessary to manipulate ->trc_reader_checked separately from the list
> > in order to safely synchronize with IPIs and with the exit code path
> > for any reader tasks, see for example trc_read_check_handler() and
> > exit_tasks_rcu_finish_trace().
> >
> > Or are you thinking of some other approach?
>
> This could be caused by a buggy extra put_pid somewhere else, right?
> If so, I suspect that's what may be happening. We've 2 very similar
> use-after-free reports on an internal kernel, but it also has a number
> of other use-after-free reports in pid-related functions
> (pid_task/pid_nr_ns/attach_pid). One of them is happening relatively
> frequently (150 crashes) and is caused by something in the tty
> subsystem. Presumably it may be causing one off use-after-free's in
> other random places of the kernel as well. Unfortunately these crashes
> don't happen on the upstream kernel (at least not yet).
> So if you don't see any obvious smoking gun in rcu, I think we can
> assume for now that it's due to tty.

Good to hear!

On the other hand, it looks like Yanfei might have found a real problem,
whether or not it was actually being triggered. Tough to prove either
way, of course!

Thanx, Paul

> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > index efb8127f3a36..2a0d4bdd619a 100644
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -987,7 +987,6 @@ static void trc_wait_for_one_reader(struct task_struct *t,
> >  	// The current task had better be in a quiescent state.
> >  	if (t == current) {
> >  		t->trc_reader_checked = true;
> > -		trc_del_holdout(t);
> >  		WARN_ON_ONCE(READ_ONCE(t->trc_reader_nesting));
> >  		return;
> >  	}

2021-05-26 02:56:52

by Xu, Yanfei

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace



On 5/25/21 10:28 PM, Paul E. McKenney wrote:
> On Tue, May 25, 2021 at 06:24:10PM +0800, Xu, Yanfei wrote:
>>
>>
>> On 5/25/21 11:33 AM, Paul E. McKenney wrote:
>>> On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
>>>>
>>>>
>>>> On 5/25/21 6:46 AM, Paul E. McKenney wrote:
>>>>> On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
>>>>>> On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
>>>>>>> On Fri, May 21, 2021 at 7:29 PM syzbot
>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> syzbot found the following issue on:
>>>>>>>>
>>>>>>>> HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
>>>>>>>> git tree: bpf-next
>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
>>>>>>>>
>>>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>>>>
>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>>>> Reported-by: [email protected]
>>>>>>>
>>>>>>> This looks rcu-related. +rcu mailing list
>>>>>>
>>>>>> I think I see a possible cause for this, and will say more after some
>>>>>> testing and after becoming more awake Monday morning, Pacific time.
>>>>>
>>>>> No joy. From what I can see, within RCU Tasks Trace, the calls to
>>>>> get_task_struct() are properly protected (either by RCU or by an earlier
>>>>> get_task_struct()), and the calls to put_task_struct() are balanced by
>>>>> those to get_task_struct().
>>>>>
>>>>> I could of course have missed something, but at this point I am suspecting
>>>>> an unbalanced put_task_struct() has been added elsewhere.
>>>>>
>>>>> As always, extra eyes on this code would be a good thing.
>>>>>
>>>>> If it were reproducible, I would of course suggest bisection. :-/
>>>>>
>>>>> Thanx, Paul
>>>>>
>>>> Hi Paul,
>>>>
>>>> Could it be?
>>>>
>>>> CPU1                                CPU2
>>>> trc_add_holdout(t, bhp)
>>>> //t->usage==2
>>>>                                     release_task
>>>>                                     put_task_struct_rcu_user
>>>>                                     delayed_put_task_struct
>>>>                                     ......
>>>>                                     put_task_struct(t)
>>>>                                     //t->usage==1
>>>>
>>>> check_all_holdout_tasks_trace
>>>> ->trc_wait_for_one_reader
>>>> ->trc_del_holdout
>>>> ->put_task_struct(t)
>>>> //t->usage==0 and task_struct freed
>>>> READ_ONCE(t->trc_reader_checked)
>>>> //oops, t has been freed.
>>>>
>>>> So, after executing trc_wait_for_one_reader(), the task might have been
>>>> removed from the holdout list and the corresponding task_struct freed,
>>>> so we shouldn't do READ_ONCE(t->trc_reader_checked).
>>>
>>> I was suspicious of that call to trc_del_holdout() from within
>>> trc_wait_for_one_reader(), but the only time it executes is in the
>>> context of the current running task, which means that CPU 2 had better
>>> not be invoking release_task() on it just yet.
>>>
>>> Or am I missing your point?
>>
>> There are two cases.
>> 1. the task is current.
>>
>> trc_wait_for_one_reader
>> ->trc_del_holdout
>
> This one should be fine because the task cannot be freed until it
> actually exits, and the grace-period kthread never exits. But it
> could also be removed without any problem that I see.

Agreed, the current task's task_struct is most probably safe. If you
think that call to trc_del_holdout() is safe to remove, I would prefer to
remove it, because that makes trc_wait_for_one_reader()'s handling of
holdout-list deletion more uniform. There might also be a very small race
in which the task is checked as current and then turns into an exiting
task before its task_struct is accessed in trc_wait_for_one_reader() or
check_all_holdout_tasks_trace() (or I misunderstand something about RCU
tasks).

>> 2. task isn't current.
>>
>> trc_wait_for_one_reader
>> ->get_task_struct
>> ->try_invoke_on_locked_down_task(trc_inspect_reader)
>> ->trc_del_holdout
>> ->put_task_struct
>
> Ah, this one is more interesting, thank you!
>
> Yes, it is safe from the list's viewpoint to do the removal in the
> trc_inspect_reader() callback, but you are right that the grace-period
> kthread may touch the task structure after return, and there might not
> be anything else holding that task structure in place.
>
>>> Of course, if you can reproduce it, the following patch might be
>>
>> Sorry...I can't reproduce it, just analyse syzbot's log. :(
>
> Well, if it could be reproduced, that would mean that it was too easy,
> wouldn't it? ;-)

Ha ;-)
>
> How about the (untested) patch below, just to make sure that we are
> talking about the same thing? I have started testing, but then
> again, I have not yet been able to reproduce this, either.
>
> Thanx, Paul
>

Yes! We are talking about the same thing. Should I send a new patch?

Thanks,
Yanfei

> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index efb8127f3a36..8b25551d10db 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -957,10 +957,9 @@ static bool trc_inspect_reader(struct task_struct *t, void *arg)
>  		in_qs = likely(!t->trc_reader_nesting);
>  	}
>
> -	// Mark as checked. Because this is called from the grace-period
> -	// kthread, also remove the task from the holdout list.
> +	// Mark as checked so that the grace-period kthread will
> +	// remove it from the holdout list.
>  	t->trc_reader_checked = true;
> -	trc_del_holdout(t);
>
>  	if (in_qs)
>  		return true; // Already in quiescent state, done!!!
>

2021-05-26 06:05:15

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Wed, May 26, 2021 at 10:22:59AM +0800, Xu, Yanfei wrote:
> On 5/25/21 10:28 PM, Paul E. McKenney wrote:
> > On Tue, May 25, 2021 at 06:24:10PM +0800, Xu, Yanfei wrote:
> > >
> > >
> > > On 5/25/21 11:33 AM, Paul E. McKenney wrote:
> > > > On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
> > > > >
> > > > >
> > > > > On 5/25/21 6:46 AM, Paul E. McKenney wrote:
> > > > > > On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
> > > > > > > On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
> > > > > > > > On Fri, May 21, 2021 at 7:29 PM syzbot
> > > > > > > > <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > >
> > > > > > > > > syzbot found the following issue on:
> > > > > > > > >
> > > > > > > > > HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
> > > > > > > > > git tree: bpf-next
> > > > > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
> > > > > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
> > > > > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > > > > > > > >
> > > > > > > > > Unfortunately, I don't have any reproducer for this issue yet.
> > > > > > > > >
> > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > > > > > > > Reported-by: [email protected]
> > > > > > > >
> > > > > > > > This looks rcu-related. +rcu mailing list
> > > > > > >
> > > > > > > I think I see a possible cause for this, and will say more after some
> > > > > > > testing and after becoming more awake Monday morning, Pacific time.
> > > > > >
> > > > > > No joy. From what I can see, within RCU Tasks Trace, the calls to
> > > > > > get_task_struct() are properly protected (either by RCU or by an earlier
> > > > > > get_task_struct()), and the calls to put_task_struct() are balanced by
> > > > > > those to get_task_struct().
> > > > > >
> > > > > > I could of course have missed something, but at this point I am suspecting
> > > > > > an unbalanced put_task_struct() has been added elsewhere.
> > > > > >
> > > > > > As always, extra eyes on this code would be a good thing.
> > > > > >
> > > > > > If it were reproducible, I would of course suggest bisection. :-/
> > > > > >
> > > > > > Thanx, Paul
> > > > > >
> > > > > Hi Paul,
> > > > >
> > > > > Could it be?
> > > > >
> > > > > > CPU1                                CPU2
> > > > > > trc_add_holdout(t, bhp)
> > > > > > //t->usage==2
> > > > > >                                     release_task
> > > > > >                                     put_task_struct_rcu_user
> > > > > >                                     delayed_put_task_struct
> > > > > >                                     ......
> > > > > >                                     put_task_struct(t)
> > > > > >                                     //t->usage==1
> > > > > >
> > > > > > check_all_holdout_tasks_trace
> > > > > > ->trc_wait_for_one_reader
> > > > > > ->trc_del_holdout
> > > > > > ->put_task_struct(t)
> > > > > > //t->usage==0 and task_struct freed
> > > > > > READ_ONCE(t->trc_reader_checked)
> > > > > > //oops, t has been freed.
> > > > > >
> > > > > > So, after executing trc_wait_for_one_reader(), the task might have been
> > > > > > removed from the holdout list and the corresponding task_struct freed,
> > > > > > so we shouldn't do READ_ONCE(t->trc_reader_checked).
> > > >
> > > > I was suspicious of that call to trc_del_holdout() from within
> > > > trc_wait_for_one_reader(), but the only time it executes is in the
> > > > context of the current running task, which means that CPU 2 had better
> > > > not be invoking release_task() on it just yet.
> > > >
> > > > Or am I missing your point?
> > >
> > > There are two cases.
> > > 1. the task is current.
> > >
> > > trc_wait_for_one_reader
> > > ->trc_del_holdout
> >
> > This one should be fine because the task cannot be freed until it
> > actually exits, and the grace-period kthread never exits. But it
> > could also be removed without any problem that I see.
>
> Agreed, the current task's task_struct is most probably safe. If you
> think that call to trc_del_holdout() is safe to remove, I would prefer to
> remove it, because that makes trc_wait_for_one_reader()'s handling of
> holdout-list deletion more uniform. There might also be a very small race
> in which the task is checked as current and then turns into an exiting
> task before its task_struct is accessed in trc_wait_for_one_reader() or
> check_all_holdout_tasks_trace() (or I misunderstand something about RCU
> tasks).
>
> > > 2. task isn't current.
> > >
> > > trc_wait_for_one_reader
> > > ->get_task_struct
> > > ->try_invoke_on_locked_down_task(trc_inspect_reader)
> > > ->trc_del_holdout
> > > ->put_task_struct
> >
> > Ah, this one is more interesting, thank you!
> >
> > Yes, it is safe from the list's viewpoint to do the removal in the
> > trc_inspect_reader() callback, but you are right that the grace-period
> > kthread may touch the task structure after return, and there might not
> > be anything else holding that task structure in place.
> >
> > > > Of course, if you can reproduce it, the following patch might be
> > >
> > > Sorry...I can't reproduce it, just analyse syzbot's log. :(
> >
> > Well, if it could be reproduced, that would mean that it was too easy,
> > wouldn't it? ;-)
>
> Ha ;-)

But it should be possible to make this happen... Is it possible to
add lots of short-lived tasks to the test that failed?

> > How about the (untested) patch below, just to make sure that we are
> > talking about the same thing? I have started testing, but then
> > again, I have not yet been able to reproduce this, either.
> >
> > Thanx, Paul
>
> Yes! We are talking about the same thing. Should I send a new patch?

Or look at these commits that I queued this past morning (Pacific Time)
on the "dev" branch of the -rcu tree:

aac385ea2494 rcu-tasks: Don't delete holdouts within trc_inspect_reader()
bf30dc63947c rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()

They pass initial testing, but then again, such tests passed before
these patches were queued. :-/

Thanx, Paul

2021-05-26 06:16:52

by Xu, Yanfei

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace



On 5/26/21 12:21 PM, Paul E. McKenney wrote:
> On Wed, May 26, 2021 at 10:22:59AM +0800, Xu, Yanfei wrote:
>> On 5/25/21 10:28 PM, Paul E. McKenney wrote:
>>> On Tue, May 25, 2021 at 06:24:10PM +0800, Xu, Yanfei wrote:
>>>>
>>>>
>>>> On 5/25/21 11:33 AM, Paul E. McKenney wrote:
>>>>> On Tue, May 25, 2021 at 10:31:55AM +0800, Xu, Yanfei wrote:
>>>>>>
>>>>>>
>>>>>> On 5/25/21 6:46 AM, Paul E. McKenney wrote:
>>>>>>> On Sun, May 23, 2021 at 09:13:50PM -0700, Paul E. McKenney wrote:
>>>>>>>> On Sun, May 23, 2021 at 08:51:56AM +0200, Dmitry Vyukov wrote:
>>>>>>>>> On Fri, May 21, 2021 at 7:29 PM syzbot
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> syzbot found the following issue on:
>>>>>>>>>>
>>>>>>>>>> HEAD commit: f18ba26d libbpf: Add selftests for TC-BPF management API
>>>>>>>>>> git tree: bpf-next
>>>>>>>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=17f50d1ed00000
>>>>>>>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=8ff54addde0afb5d
>>>>>>>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
>>>>>>>>>>
>>>>>>>>>> Unfortunately, I don't have any reproducer for this issue yet.
>>>>>>>>>>
>>>>>>>>>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>>>>>>>>>> Reported-by: [email protected]
>>>>>>>>>
>>>>>>>>> This looks rcu-related. +rcu mailing list
>>>>>>>>
>>>>>>>> I think I see a possible cause for this, and will say more after some
>>>>>>>> testing and after becoming more awake Monday morning, Pacific time.
>>>>>>>
>>>>>>> No joy. From what I can see, within RCU Tasks Trace, the calls to
>>>>>>> get_task_struct() are properly protected (either by RCU or by an earlier
>>>>>>> get_task_struct()), and the calls to put_task_struct() are balanced by
>>>>>>> those to get_task_struct().
>>>>>>>
>>>>>>> I could of course have missed something, but at this point I am suspecting
>>>>>>> an unbalanced put_task_struct() has been added elsewhere.
>>>>>>>
>>>>>>> As always, extra eyes on this code would be a good thing.
>>>>>>>
>>>>>>> If it were reproducible, I would of course suggest bisection. :-/
>>>>>>>
>>>>>>> Thanx, Paul
>>>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> Could it be?
>>>>>>
>>>>>> CPU1                                CPU2
>>>>>> trc_add_holdout(t, bhp)
>>>>>> //t->usage==2
>>>>>>                                     release_task
>>>>>>                                     put_task_struct_rcu_user
>>>>>>                                     delayed_put_task_struct
>>>>>>                                     ......
>>>>>>                                     put_task_struct(t)
>>>>>>                                     //t->usage==1
>>>>>>
>>>>>> check_all_holdout_tasks_trace
>>>>>> ->trc_wait_for_one_reader
>>>>>> ->trc_del_holdout
>>>>>> ->put_task_struct(t)
>>>>>> //t->usage==0 and task_struct freed
>>>>>> READ_ONCE(t->trc_reader_checked)
>>>>>> //oops, t has been freed.
>>>>>>
>>>>>> So, after executing trc_wait_for_one_reader(), the task might have been
>>>>>> removed from the holdout list and the corresponding task_struct freed,
>>>>>> so we shouldn't do READ_ONCE(t->trc_reader_checked).
>>>>>
>>>>> I was suspicious of that call to trc_del_holdout() from within
>>>>> trc_wait_for_one_reader(), but the only time it executes is in the
>>>>> context of the current running task, which means that CPU 2 had better
>>>>> not be invoking release_task() on it just yet.
>>>>>
>>>>> Or am I missing your point?
>>>>
>>>> There are two cases.
>>>> 1. the task is current.
>>>>
>>>> trc_wait_for_one_reader
>>>> ->trc_del_holdout
>>>
>>> This one should be fine because the task cannot be freed until it
>>> actually exits, and the grace-period kthread never exits. But it
>>> could also be removed without any problem that I see.
>>
>> Agreed, the current task's task_struct is most probably safe. If you
>> think that call to trc_del_holdout() is safe to remove, I would prefer to
>> remove it, because that makes trc_wait_for_one_reader()'s handling of
>> holdout-list deletion more uniform. There might also be a very small race
>> in which the task is checked as current and then turns into an exiting
>> task before its task_struct is accessed in trc_wait_for_one_reader() or
>> check_all_holdout_tasks_trace() (or I misunderstand something about RCU
>> tasks).
>>
>>>> 2. task isn't current.
>>>>
>>>> trc_wait_for_one_reader
>>>> ->get_task_struct
>>>> ->try_invoke_on_locked_down_task(trc_inspect_reader)
>>>> ->trc_del_holdout
>>>> ->put_task_struct
>>>
>>> Ah, this one is more interesting, thank you!
>>>
>>> Yes, it is safe from the list's viewpoint to do the removal in the
>>> trc_inspect_reader() callback, but you are right that the grace-period
>>> kthread may touch the task structure after return, and there might not
>>> be anything else holding that task structure in place.
>>>
>>>>> Of course, if you can reproduce it, the following patch might be
>>>>
>>>> Sorry...I can't reproduce it, just analyse syzbot's log. :(
>>>
>>> Well, if it could be reproduced, that would mean that it was too easy,
>>> wouldn't it? ;-)
>>
>> Ha ;-)
>
> But it should be possible to make this happen... Is it possible to
> add lots of short-lived tasks to the test that failed?
>

Agree.

>>> How about the (untested) patch below, just to make sure that we are
>>> talking about the same thing? I have started testing, but then
>>> again, I have not yet been able to reproduce this, either.
>>>
>>> Thanx, Paul
>>
>> Yes! We are talking about the same thing. Should I send a new patch?
>
> Or look at these commits that I queued this past morning (Pacific Time)
> on the "dev" branch of the -rcu tree:
>
> aac385ea2494 rcu-tasks: Don't delete holdouts within trc_inspect_reader()
> bf30dc63947c rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()

Got it, Thanks!

Regards,
Yanfei

>
> They pass initial testing, but then again, such tests passed before
> these patches were queued. :-/
>
> Thanx, Paul
>

2021-06-19 04:07:11

by syzbot

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

syzbot has found a reproducer for the following issue on:

HEAD commit: 0c38740c selftests/bpf: Fix ringbuf test fetching map FD
git tree: bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=128a7e34300000
kernel config: https://syzkaller.appspot.com/x/.config?x=a6380da8984033f1
dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1264c2d7d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

==================================================================
BUG: KASAN: use-after-free in check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
Read of size 1 at addr ffff8880294cbc9c by task rcu_tasks_trace/12

CPU: 0 PID: 12 Comm: rcu_tasks_trace Not tainted 5.13.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Allocated by task 8499:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:428 [inline]
__kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
kasan_slab_alloc include/linux/kasan.h:236 [inline]
slab_post_alloc_hook mm/slab.h:524 [inline]
slab_alloc_node mm/slub.c:2913 [inline]
kmem_cache_alloc_node+0x269/0x3e0 mm/slub.c:2949
alloc_task_struct_node kernel/fork.c:171 [inline]
dup_task_struct kernel/fork.c:865 [inline]
copy_process+0x5c8/0x7120 kernel/fork.c:1947
kernel_clone+0xe7/0xab0 kernel/fork.c:2503
__do_sys_clone+0xc8/0x110 kernel/fork.c:2620
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 12:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
____kasan_slab_free mm/kasan/common.c:360 [inline]
____kasan_slab_free mm/kasan/common.c:325 [inline]
__kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
kasan_slab_free include/linux/kasan.h:212 [inline]
slab_free_hook mm/slub.c:1582 [inline]
slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1607
slab_free mm/slub.c:3167 [inline]
kmem_cache_free+0x8a/0x740 mm/slub.c:3183
__put_task_struct+0x26f/0x400 kernel/fork.c:747
trc_wait_for_one_reader kernel/rcu/tasks.h:935 [inline]
check_all_holdout_tasks_trace+0x179/0x420 kernel/rcu/tasks.h:1081
rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
kthread+0x3b1/0x4a0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294

Last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3038 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
release_task+0xca1/0x1690 kernel/exit.c:226
wait_task_zombie kernel/exit.c:1108 [inline]
wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
do_wait_thread kernel/exit.c:1398 [inline]
do_wait+0x724/0xd40 kernel/exit.c:1515
kernel_wait4+0x14c/0x260 kernel/exit.c:1678
__do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
__call_rcu kernel/rcu/tree.c:3038 [inline]
call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
release_task+0xca1/0x1690 kernel/exit.c:226
wait_task_zombie kernel/exit.c:1108 [inline]
wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
do_wait_thread kernel/exit.c:1398 [inline]
do_wait+0x724/0xd40 kernel/exit.c:1515
kernel_wait4+0x14c/0x260 kernel/exit.c:1678
__do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff8880294cb880
which belongs to the cache task_struct of size 6976
The buggy address is located 1052 bytes inside of
6976-byte region [ffff8880294cb880, ffff8880294cd3c0)
The buggy address belongs to the page:
page:ffffea0000a53200 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x294c8
head:ffffea0000a53200 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 ffffea00008d6400 0000000200000002 ffff888140005140
raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, ts 15187628853, free_ts 0
prep_new_page mm/page_alloc.c:2358 [inline]
get_page_from_freelist+0x1034/0x2bf0 mm/page_alloc.c:3994
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
alloc_slab_page mm/slub.c:1645 [inline]
allocate_slab+0x32e/0x4c0 mm/slub.c:1785
new_slab mm/slub.c:1848 [inline]
new_slab_objects mm/slub.c:2594 [inline]
___slab_alloc+0x4a1/0x810 mm/slub.c:2757
__slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2797
slab_alloc_node mm/slub.c:2879 [inline]
kmem_cache_alloc_node+0x12f/0x3e0 mm/slub.c:2949
alloc_task_struct_node kernel/fork.c:171 [inline]
dup_task_struct kernel/fork.c:865 [inline]
copy_process+0x5c8/0x7120 kernel/fork.c:1947
kernel_clone+0xe7/0xab0 kernel/fork.c:2503
kernel_thread+0xb5/0xf0 kernel/fork.c:2555
create_kthread kernel/kthread.c:336 [inline]
kthreadd+0x52a/0x790 kernel/kthread.c:679
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
page_owner free stack trace missing

Memory state around the buggy address:
ffff8880294cbb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880294cbc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880294cbc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880294cbd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880294cbd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

2021-06-19 20:49:43

by syzbot

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

syzbot has bisected this issue to:

commit f9006acc8dfe59e25aa75729728ac57a8d84fc32
Author: Florian Westphal <[email protected]>
Date: Wed Apr 21 07:51:08 2021 +0000

netfilter: arp_tables: pass table pointer via nf_hook_ops

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10dceae8300000
start commit: 0c38740c selftests/bpf: Fix ringbuf test fetching map FD
git tree: bpf-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=12dceae8300000
console output: https://syzkaller.appspot.com/x/log.txt?x=14dceae8300000
kernel config: https://syzkaller.appspot.com/x/.config?x=a6380da8984033f1
dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1264c2d7d00000

Reported-by: [email protected]
Fixes: f9006acc8dfe ("netfilter: arp_tables: pass table pointer via nf_hook_ops")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

2021-06-21 22:38:50

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Fri, Jun 18, 2021 at 01:45:23PM -0700, syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 0c38740c selftests/bpf: Fix ringbuf test fetching map FD
> git tree: bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=128a7e34300000
> kernel config: https://syzkaller.appspot.com/x/.config?x=a6380da8984033f1
> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1264c2d7d00000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]

This looks like a bug that Xu Yanfei located a few weeks back.

Do these commits from -rcu help?

6a04a59eacbd ("rcu-tasks: Don't delete holdouts within trc_inspect_reader()")
dd5da0a9140e ("rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()")

On any interaction with the HEAD commit above I must defer to Andrii.

Thanx, Paul

> ==================================================================
> BUG: KASAN: use-after-free in check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
> Read of size 1 at addr ffff8880294cbc9c by task rcu_tasks_trace/12
>
> CPU: 0 PID: 12 Comm: rcu_tasks_trace Not tainted 5.13.0-rc3-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
> __kasan_report mm/kasan/report.c:419 [inline]
> kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
> check_all_holdout_tasks_trace+0x302/0x420 kernel/rcu/tasks.h:1084
> rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
> rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
> kthread+0x3b1/0x4a0 kernel/kthread.c:313
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>
> Allocated by task 8499:
> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
> kasan_set_track mm/kasan/common.c:46 [inline]
> set_alloc_info mm/kasan/common.c:428 [inline]
> __kasan_slab_alloc+0x84/0xa0 mm/kasan/common.c:461
> kasan_slab_alloc include/linux/kasan.h:236 [inline]
> slab_post_alloc_hook mm/slab.h:524 [inline]
> slab_alloc_node mm/slub.c:2913 [inline]
> kmem_cache_alloc_node+0x269/0x3e0 mm/slub.c:2949
> alloc_task_struct_node kernel/fork.c:171 [inline]
> dup_task_struct kernel/fork.c:865 [inline]
> copy_process+0x5c8/0x7120 kernel/fork.c:1947
> kernel_clone+0xe7/0xab0 kernel/fork.c:2503
> __do_sys_clone+0xc8/0x110 kernel/fork.c:2620
> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Freed by task 12:
> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
> kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
> kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:357
> ____kasan_slab_free mm/kasan/common.c:360 [inline]
> ____kasan_slab_free mm/kasan/common.c:325 [inline]
> __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:368
> kasan_slab_free include/linux/kasan.h:212 [inline]
> slab_free_hook mm/slub.c:1582 [inline]
> slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1607
> slab_free mm/slub.c:3167 [inline]
> kmem_cache_free+0x8a/0x740 mm/slub.c:3183
> __put_task_struct+0x26f/0x400 kernel/fork.c:747
> trc_wait_for_one_reader kernel/rcu/tasks.h:935 [inline]
> check_all_holdout_tasks_trace+0x179/0x420 kernel/rcu/tasks.h:1081
> rcu_tasks_wait_gp+0x594/0xa60 kernel/rcu/tasks.h:358
> rcu_tasks_kthread+0x31c/0x6a0 kernel/rcu/tasks.h:224
> kthread+0x3b1/0x4a0 kernel/kthread.c:313
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
>
> Last potentially related work creation:
> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
> kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
> __call_rcu kernel/rcu/tree.c:3038 [inline]
> call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
> put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
> release_task+0xca1/0x1690 kernel/exit.c:226
> wait_task_zombie kernel/exit.c:1108 [inline]
> wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
> do_wait_thread kernel/exit.c:1398 [inline]
> do_wait+0x724/0xd40 kernel/exit.c:1515
> kernel_wait4+0x14c/0x260 kernel/exit.c:1678
> __do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Second to last potentially related work creation:
> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
> kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:345
> __call_rcu kernel/rcu/tree.c:3038 [inline]
> call_rcu+0xb1/0x750 kernel/rcu/tree.c:3113
> put_task_struct_rcu_user+0x7f/0xb0 kernel/exit.c:180
> release_task+0xca1/0x1690 kernel/exit.c:226
> wait_task_zombie kernel/exit.c:1108 [inline]
> wait_consider_task+0x2fb5/0x3b40 kernel/exit.c:1335
> do_wait_thread kernel/exit.c:1398 [inline]
> do_wait+0x724/0xd40 kernel/exit.c:1515
> kernel_wait4+0x14c/0x260 kernel/exit.c:1678
> __do_sys_wait4+0x13f/0x150 kernel/exit.c:1706
> do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> The buggy address belongs to the object at ffff8880294cb880
> which belongs to the cache task_struct of size 6976
> The buggy address is located 1052 bytes inside of
> 6976-byte region [ffff8880294cb880, ffff8880294cd3c0)
> The buggy address belongs to the page:
> page:ffffea0000a53200 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x294c8
> head:ffffea0000a53200 order:3 compound_mapcount:0 compound_pincount:0
> flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
> raw: 00fff00000010200 ffffea00008d6400 0000000200000002 ffff888140005140
> raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
> page dumped because: kasan: bad access detected
> page_owner tracks the page as allocated
> page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, ts 15187628853, free_ts 0
> prep_new_page mm/page_alloc.c:2358 [inline]
> get_page_from_freelist+0x1034/0x2bf0 mm/page_alloc.c:3994
> __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5200
> alloc_pages+0x18c/0x2a0 mm/mempolicy.c:2272
> alloc_slab_page mm/slub.c:1645 [inline]
> allocate_slab+0x32e/0x4c0 mm/slub.c:1785
> new_slab mm/slub.c:1848 [inline]
> new_slab_objects mm/slub.c:2594 [inline]
> ___slab_alloc+0x4a1/0x810 mm/slub.c:2757
> __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2797
> slab_alloc_node mm/slub.c:2879 [inline]
> kmem_cache_alloc_node+0x12f/0x3e0 mm/slub.c:2949
> alloc_task_struct_node kernel/fork.c:171 [inline]
> dup_task_struct kernel/fork.c:865 [inline]
> copy_process+0x5c8/0x7120 kernel/fork.c:1947
> kernel_clone+0xe7/0xab0 kernel/fork.c:2503
> kernel_thread+0xb5/0xf0 kernel/fork.c:2555
> create_kthread kernel/kthread.c:336 [inline]
> kthreadd+0x52a/0x790 kernel/kthread.c:679
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
> page_owner free stack trace missing
>
> Memory state around the buggy address:
> ffff8880294cbb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8880294cbc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff8880294cbc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff8880294cbd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8880294cbd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>

2021-06-21 22:42:38

by Paul E. McKenney

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Sat, Jun 19, 2021 at 09:54:06AM -0700, syzbot wrote:
> syzbot has bisected this issue to:
>
> commit f9006acc8dfe59e25aa75729728ac57a8d84fc32
> Author: Florian Westphal <[email protected]>
> Date: Wed Apr 21 07:51:08 2021 +0000
>
> netfilter: arp_tables: pass table pointer via nf_hook_ops
>
> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10dceae8300000
> start commit: 0c38740c selftests/bpf: Fix ringbuf test fetching map FD
> git tree: bpf-next
> final oops: https://syzkaller.appspot.com/x/report.txt?x=12dceae8300000
> console output: https://syzkaller.appspot.com/x/log.txt?x=14dceae8300000
> kernel config: https://syzkaller.appspot.com/x/.config?x=a6380da8984033f1
> dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1264c2d7d00000
>
> Reported-by: [email protected]
> Fixes: f9006acc8dfe ("netfilter: arp_tables: pass table pointer via nf_hook_ops")
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection

I am not seeing any mention of check_all_holdout_tasks_trace() in
the console output, but I again suggest the following two patches:

6a04a59eacbd ("rcu-tasks: Don't delete holdouts within trc_inspect_reader()")
dd5da0a9140e ("rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()")

Thanx, Paul

2021-06-28 08:45:10

by Dmitry Vyukov

Subject: Re: [syzbot] KASAN: use-after-free Read in check_all_holdout_tasks_trace

On Tue, Jun 22, 2021 at 12:41 AM Paul E. McKenney <[email protected]> wrote:
>
> On Sat, Jun 19, 2021 at 09:54:06AM -0700, syzbot wrote:
> > syzbot has bisected this issue to:
> >
> > commit f9006acc8dfe59e25aa75729728ac57a8d84fc32
> > Author: Florian Westphal <[email protected]>
> > Date: Wed Apr 21 07:51:08 2021 +0000
> >
> > netfilter: arp_tables: pass table pointer via nf_hook_ops
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=10dceae8300000
> > start commit: 0c38740c selftests/bpf: Fix ringbuf test fetching map FD
> > git tree: bpf-next
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=12dceae8300000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14dceae8300000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=a6380da8984033f1
> > dashboard link: https://syzkaller.appspot.com/bug?extid=7b2b13f4943374609532
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1264c2d7d00000
> >
> > Reported-by: [email protected]
> > Fixes: f9006acc8dfe ("netfilter: arp_tables: pass table pointer via nf_hook_ops")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>
> I am not seeing any mention of check_all_holdout_tasks_trace() in
> the console output, but I again suggest the following two patches:
>
> 6a04a59eacbd ("rcu-tasks: Don't delete holdouts within trc_inspect_reader()")
> dd5da0a9140e ("rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()")

Let's tell syzbot about these fixes, then it will tell us if they help or not.

#syz fix: rcu-tasks: Don't delete holdouts within trc_inspect_reader()