2024-03-18 10:07:39

by syzbot

Subject: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult

Hello,

syzbot found the following issue on:

HEAD commit: ea80e3ed09ab net: ethernet: mtk_eth_soc: fix PPE hanging i..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=1249daa5180000
kernel config: https://syzkaller.appspot.com/x/.config?x=6fb1be60a193d440
dashboard link: https://syzkaller.appspot.com/bug?extid=c4f4d25859c2e5859988
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17fd8c81180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1795afc1180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/4c6c49a7ef5c/disk-ea80e3ed.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/242942b30f2d/vmlinux-ea80e3ed.xz
kernel image: https://storage.googleapis.com/syzbot-assets/74dcc2059655/bzImage-ea80e3ed.xz

The issue was bisected to:

commit ee042be16cb455116d0fe99b77c6bc8baf87c8c6
Author: Namhyung Kim <[email protected]>
Date: Tue Mar 22 18:57:09 2022 +0000

locking: Apply contention tracepoints in the slow path

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=1702c2a5180000
final oops: https://syzkaller.appspot.com/x/report.txt?x=1482c2a5180000
console output: https://syzkaller.appspot.com/x/log.txt?x=1082c2a5180000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
Fixes: ee042be16cb4 ("locking: Apply contention tracepoints in the slow path")
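
The bisected commit matters here because it emits the lock contention
tracepoints from inside the locking slow paths, so any BPF program attached
to them runs in whatever context the contended lock was taken in, including
with hardirqs disabled and other spinlocks held. Roughly sketched (a
paraphrase for illustration, not a verbatim quote of the commit):

/* Paraphrased sketch of the bisected change: the tracepoints fire in
 * the slow path, so BPF programs attached to contention_end execute
 * here, inheriting the lock acquirer's context.
 */
void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
{
	trace_contention_begin(lock, LCB_F_SPIN);

	/* ... queue and spin until the lock is acquired ... */

	trace_contention_end(lock, 0);	/* attached BPF programs run here */
}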

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
-----------------------------------------------------
rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939

and this task is already holding:
ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
which would create a new lock dependency:
(rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
(rcu_node_0){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
csd_do_func kernel/smp.c:133 [inline]
__flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
__sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
__sanitizer_cov_trace_switch+0x90/0x120
update_event_printk kernel/trace/trace_events.c:2750 [inline]
trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

to a HARDIRQ-irq-unsafe lock:
(&htab->buckets[i].lock){+...}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
0xffffffffa0001b0e
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
__mutex_lock_common kernel/locking/mutex.c:617 [inline]
__mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
futex_cleanup_begin kernel/futex/core.c:1091 [inline]
futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
exit_mm_release+0x1a/0x30 kernel/fork.c:1652
exit_mm+0xb0/0x310 kernel/exit.c:542
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
__do_sys_exit_group kernel/exit.c:1038 [inline]
__se_sys_exit_group kernel/exit.c:1036 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(rcu_node_0);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(rcu_node_0);

*** DEADLOCK ***

2 locks held by rcu_exp_gp_kthr/18:
#0: ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
#1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
#1: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (rcu_node_0){-.-.}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
csd_do_func kernel/smp.c:133 [inline]
__flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
__sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
__sanitizer_cov_trace_switch+0x90/0x120
update_event_printk kernel/trace/trace_events.c:2750 [inline]
trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
IN-SOFTIRQ-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
rcu_report_qs_rdp kernel/rcu/tree.c:2018 [inline]
rcu_check_quiescent_state kernel/rcu/tree.c:2100 [inline]
rcu_core+0x3ae/0x1830 kernel/rcu/tree.c:2455
__do_softirq+0x2bc/0x943 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
unwind_next_frame+0x1d8e/0x2a00 arch/x86/kernel/unwind_orc.c:665
arch_stack_walk+0x151/0x1b0 arch/x86/kernel/stacktrace.c:25
stack_trace_save+0x118/0x1d0 kernel/stacktrace.c:122
save_stack+0xfb/0x1f0 mm/page_owner.c:129
__set_page_owner+0x29/0x380 mm/page_owner.c:195
set_page_owner include/linux/page_owner.h:31 [inline]
post_alloc_hook+0x1ea/0x210 mm/page_alloc.c:1533
prep_new_page mm/page_alloc.c:1540 [inline]
get_page_from_freelist+0x33ea/0x3580 mm/page_alloc.c:3311
__alloc_pages+0x256/0x680 mm/page_alloc.c:4569
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page+0x5f/0x160 mm/slub.c:2190
allocate_slab mm/slub.c:2354 [inline]
new_slab+0x84/0x2f0 mm/slub.c:2407
___slab_alloc+0xd1b/0x13e0 mm/slub.c:3540
__slab_alloc mm/slub.c:3625 [inline]
__slab_alloc_node mm/slub.c:3678 [inline]
slab_alloc_node mm/slub.c:3850 [inline]
kmalloc_trace+0x267/0x360 mm/slub.c:4007
kmalloc include/linux/slab.h:590 [inline]
kzalloc include/linux/slab.h:711 [inline]
ddebug_add_module+0x88/0x800 lib/dynamic_debug.c:1240
dynamic_debug_init+0x205/0x5a0 lib/dynamic_debug.c:1446
do_one_initcall+0x238/0x830 init/main.c:1241
do_pre_smp_initcalls+0x57/0xa0 init/main.c:1347
kernel_init_freeable+0x40d/0x5d0 init/main.c:1546
kernel_init+0x1d/0x2a0 init/main.c:1446
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
INITIAL USE at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
rcutree_prepare_cpu+0x71/0x640 kernel/rcu/tree.c:4484
rcu_init+0x9b/0x140 kernel/rcu/tree.c:5224
start_kernel+0x1f7/0x500 init/main.c:969
x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
common_startup_64+0x13e/0x147
}
... key at: [<ffffffff945012e0>] rcu_init_one.rcu_node_class+0x0/0x20

the dependencies between the lock to be acquired
and HARDIRQ-irq-unsafe lock:
-> (&htab->buckets[i].lock){+...}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
0xffffffffa0001b0e
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
__mutex_lock_common kernel/locking/mutex.c:617 [inline]
__mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
futex_cleanup_begin kernel/futex/core.c:1091 [inline]
futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
exit_mm_release+0x1a/0x30 kernel/fork.c:1652
exit_mm+0xb0/0x310 kernel/exit.c:542
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
__do_sys_exit_group kernel/exit.c:1038 [inline]
__se_sys_exit_group kernel/exit.c:1036 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
INITIAL USE at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
0xffffffffa0001b0e
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
__mutex_lock_common kernel/locking/mutex.c:617 [inline]
__mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
futex_cleanup_begin kernel/futex/core.c:1091 [inline]
futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
exit_mm_release+0x1a/0x30 kernel/fork.c:1652
exit_mm+0xb0/0x310 kernel/exit.c:542
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
__do_sys_exit_group kernel/exit.c:1038 [inline]
__se_sys_exit_group kernel/exit.c:1036 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
}
... key at: [<ffffffff94882300>] sock_hash_alloc.__key+0x0/0x20
... acquired at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
bpf_prog_43221478a22f23b5+0x42/0x46
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
__pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:516 [inline]
synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:570 [inline]
rcu_exp_wait_wake kernel/rcu/tree_exp.h:641 [inline]
rcu_exp_sel_wait_wake+0x628/0x1df0 kernel/rcu/tree_exp.h:675
kthread_worker_fn+0x4bf/0xab0 kernel/kthread.c:841
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243


stack backtrace:
CPU: 1 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.8.0-syzkaller-05221-gea80e3ed09ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
print_bad_irq_dependency kernel/locking/lockdep.c:2626 [inline]
check_irq_usage kernel/locking/lockdep.c:2865 [inline]
check_prev_add kernel/locking/lockdep.c:3138 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x4dc7/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
bpf_prog_43221478a22f23b5+0x42/0x46
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
__pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:516 [inline]
synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:570 [inline]
rcu_exp_wait_wake kernel/rcu/tree_exp.h:641 [inline]
rcu_exp_sel_wait_wake+0x628/0x1df0 kernel/rcu/tree_exp.h:675
kthread_worker_fn+0x4bf/0xab0 kernel/kthread.c:841
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
------------[ cut here ]------------
raw_local_irq_restore() called with IRQs enabled
WARNING: CPU: 1 PID: 18 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x29/0x40 kernel/locking/irqflag-debug.c:10
Modules linked in:
CPU: 1 PID: 18 Comm: rcu_exp_gp_kthr Not tainted 6.8.0-syzkaller-05221-gea80e3ed09ab #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
RIP: 0010:warn_bogus_irq_restore+0x29/0x40 kernel/locking/irqflag-debug.c:10
Code: 90 f3 0f 1e fa 90 80 3d 9e 69 01 04 00 74 06 90 c3 cc cc cc cc c6 05 8f 69 01 04 01 90 48 c7 c7 20 ba aa 8b e8 f8 e5 e7 f5 90 <0f> 0b 90 90 90 c3 cc cc cc cc 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
RSP: 0018:ffffc90000177bb8 EFLAGS: 00010246
RAX: bd04dc17ab040900 RBX: 1ffff9200002ef7c RCX: ffff8880172c1e00
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90000177c48 R08: ffffffff8157cc12 R09: 1ffff9200002eecc
R10: dffffc0000000000 R11: fffff5200002eecd R12: dffffc0000000000
R13: 1ffff9200002ef78 R14: ffffc90000177be0 R15: 0000000000000246
FS: 0000000000000000(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6d95bcb0d0 CR3: 000000002098e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:151 [inline]
_raw_spin_unlock_irqrestore+0x120/0x140 kernel/locking/spinlock.c:194
sync_rcu_exp_done_unlocked+0xdb/0x140 kernel/rcu/tree_exp.h:171
synchronize_rcu_expedited_wait_once kernel/rcu/tree_exp.h:516 [inline]
synchronize_rcu_expedited_wait kernel/rcu/tree_exp.h:570 [inline]
rcu_exp_wait_wake kernel/rcu/tree_exp.h:641 [inline]
rcu_exp_sel_wait_wake+0x628/0x1df0 kernel/rcu/tree_exp.h:675
kthread_worker_fn+0x4bf/0xab0 kernel/kthread.c:841
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
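
To make the trigger concrete: the trace above shows a BPF program
(bpf_prog_43221478a22f23b5) attached to the contention_end tracepoint
deleting a sockhash element, so sock_hash_delete_elem()'s spin_lock_bh()
on bucket->lock runs while rcu_node is held with interrupts off. A minimal
sketch of such a program follows; the names, section, and map definition
are illustrative assumptions, not taken from the syzkaller reproducer:

// SPDX-License-Identifier: GPL-2.0
/* Illustrative sketch only, not the syzkaller reproducer: a program on
 * the contention_end tracepoint that deletes from a SOCKHASH map. When
 * the tracepoint fires from a lock slow path running with IRQs off
 * (e.g. under rcu_node), the delete takes bucket->lock via
 * spin_lock_bh(), producing the inversion reported above.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKHASH);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u64);
} sock_hash SEC(".maps");

SEC("tp_btf/contention_end")
int BPF_PROG(on_contention_end, void *lock, int ret)
{
	__u32 key = 0;

	bpf_map_delete_elem(&sock_hash, &key);	/* -> sock_hash_delete_elem() */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";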


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2024-03-21 00:33:53

by Edward Adam Davis

Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult

Please test this fix for the deadlock in rcu_report_exp_cpu_mult:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..8a21a59eb599 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }



2024-03-21 15:13:36

by syzbot

Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in add_timer_on

=====================================================
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.8.0-syzkaller-05231-ga51cd6bf8e10-dirty #0 Not tainted
-----------------------------------------------------
udevd/5417 [HC0[0]:SC1[1]:HE0:SE0] is trying to acquire:
ffff88806c3c9020 (&htab->buckets[i].lock){+.-.}-{2:2}, at: sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940

and this task is already holding:
ffffffff94697d58 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
which would create a new lock dependency:
(&obj_hash[i].lock){-.-.}-{2:2} -> (&htab->buckets[i].lock){+.-.}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
(&obj_hash[i].lock){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
debug_object_assert_init+0x164/0x440 lib/debugobjects.c:897
debug_timer_assert_init kernel/time/timer.c:846 [inline]
debug_assert_init kernel/time/timer.c:891 [inline]
add_timer_on+0xc3/0x5c0 kernel/time/timer.c:1351
handle_irq_event_percpu kernel/irq/handle.c:195 [inline]
handle_irq_event+0xad/0x1f0 kernel/irq/handle.c:210
handle_level_irq+0x3c5/0x6e0 kernel/irq/chip.c:648
generic_handle_irq_desc include/linux/irqdesc.h:161 [inline]
handle_irq arch/x86/kernel/irq.c:238 [inline]
__common_interrupt+0x13a/0x230 arch/x86/kernel/irq.c:257
common_interrupt+0xa5/0xd0 arch/x86/kernel/irq.c:247
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
_raw_spin_unlock_irqrestore+0xd8/0x140 kernel/locking/spinlock.c:194
__setup_irq+0x1277/0x1cf0 kernel/irq/manage.c:1818
request_threaded_irq+0x2ab/0x380 kernel/irq/manage.c:2202
request_irq include/linux/interrupt.h:168 [inline]
setup_default_timer_irq+0x25/0x60 arch/x86/kernel/time.c:70
x86_late_time_init+0x66/0xc0 arch/x86/kernel/time.c:94
start_kernel+0x3f3/0x500 init/main.c:1039
x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
common_startup_64+0x13e/0x147

to a HARDIRQ-irq-unsafe lock:
(&htab->buckets[i].lock){+.-.}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_free+0x164/0x820 net/core/sock_map.c:1155
bpf_map_free_deferred+0xe6/0x110 kernel/bpf/syscall.c:734
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(&obj_hash[i].lock);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(&obj_hash[i].lock);

*** DEADLOCK ***

5 locks held by udevd/5417:
#0: ffff88802a208420 (sb_writers#5){.+.+}-{0:0}, at: mnt_want_write+0x3f/0x90 fs/namespace.c:409
#1: ffff8880758002d0 (&type->i_mutex_dir_key#5){++++}-{3:3}, at: inode_lock include/linux/fs.h:793 [inline]
#1: ffff8880758002d0 (&type->i_mutex_dir_key#5){++++}-{3:3}, at: open_last_lookups fs/namei.c:3564 [inline]
#1: ffff8880758002d0 (&type->i_mutex_dir_key#5){++++}-{3:3}, at: path_openat+0x7d3/0x3240 fs/namei.c:3797
#2: ffffffff8e236790 (remove_cache_srcu){.+.+}-{0:0}, at: srcu_lock_acquire include/linux/srcu.h:116 [inline]
#2: ffffffff8e236790 (remove_cache_srcu){.+.+}-{0:0}, at: srcu_read_lock+0x24/0x50 include/linux/srcu.h:215
#3: ffffffff94697d58 (&obj_hash[i].lock){-.-.}-{2:2}, at: debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
#4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: __bpf_trace_run kernel/trace/bpf_trace.c:2380 [inline]
#4: ffffffff8e131920 (rcu_read_lock){....}-{1:2}, at: bpf_trace_run2+0x114/0x420 kernel/trace/bpf_trace.c:2420

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&obj_hash[i].lock){-.-.}-{2:2} {
IN-HARDIRQ-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
debug_object_assert_init+0x164/0x440 lib/debugobjects.c:897
debug_timer_assert_init kernel/time/timer.c:846 [inline]
debug_assert_init kernel/time/timer.c:891 [inline]
add_timer_on+0xc3/0x5c0 kernel/time/timer.c:1351
handle_irq_event_percpu kernel/irq/handle.c:195 [inline]
handle_irq_event+0xad/0x1f0 kernel/irq/handle.c:210
handle_level_irq+0x3c5/0x6e0 kernel/irq/chip.c:648
generic_handle_irq_desc include/linux/irqdesc.h:161 [inline]
handle_irq arch/x86/kernel/irq.c:238 [inline]
__common_interrupt+0x13a/0x230 arch/x86/kernel/irq.c:257
common_interrupt+0xa5/0xd0 arch/x86/kernel/irq.c:247
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:152 [inline]
_raw_spin_unlock_irqrestore+0xd8/0x140 kernel/locking/spinlock.c:194
__setup_irq+0x1277/0x1cf0 kernel/irq/manage.c:1818
request_threaded_irq+0x2ab/0x380 kernel/irq/manage.c:2202
request_irq include/linux/interrupt.h:168 [inline]
setup_default_timer_irq+0x25/0x60 arch/x86/kernel/time.c:70
x86_late_time_init+0x66/0xc0 arch/x86/kernel/time.c:94
start_kernel+0x3f3/0x500 init/main.c:1039
x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
common_startup_64+0x13e/0x147
IN-SOFTIRQ-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
debug_object_deactivate+0x158/0x390 lib/debugobjects.c:763
debug_timer_deactivate kernel/time/timer.c:841 [inline]
debug_deactivate kernel/time/timer.c:885 [inline]
detach_timer+0x24/0x300 kernel/time/timer.c:932
expire_timers kernel/time/timer.c:1826 [inline]
__run_timers kernel/time/timer.c:2408 [inline]
__run_timer_base+0x5ef/0x8e0 kernel/time/timer.c:2419
run_timer_base kernel/time/timer.c:2428 [inline]
run_timer_softirq+0x67/0x170 kernel/time/timer.c:2436
__do_softirq+0x2be/0x943 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
common_interrupt+0xaa/0xd0 arch/x86/kernel/irq.c:247
asm_common_interrupt+0x26/0x40 arch/x86/include/asm/idtentry.h:693
console_flush_all+0x9cd/0xec0
console_unlock+0x13b/0x4d0 kernel/printk/printk.c:3025
vprintk_emit+0x509/0x720 kernel/printk/printk.c:2292
_printk+0xd5/0x120 kernel/printk/printk.c:2317
calibrate_delay+0x1597/0x16b0 init/calibrate.c:308
start_kernel+0x3fd/0x500 init/main.c:1041
x86_64_start_reservations+0x2a/0x30 arch/x86/kernel/head64.c:509
x86_64_start_kernel+0x99/0xa0 arch/x86/kernel/head64.c:490
common_startup_64+0x13e/0x147
INITIAL USE at:
}
... key at: [<ffffffff9466d4c0>] debug_objects_early_init.__key+0x0/0x20

the dependencies between the lock to be acquired
and HARDIRQ-irq-unsafe lock:
-> (&htab->buckets[i].lock){+.-.}-{2:2} {
HARDIRQ-ON-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_free+0x164/0x820 net/core/sock_map.c:1155
bpf_map_free_deferred+0xe6/0x110 kernel/bpf/syscall.c:734
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
IN-SOFTIRQ-W at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
bpf_prog_43221478a22f23b5+0x42/0x46
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
__pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
debug_rcu_head_unqueue kernel/rcu/rcu.h:236 [inline]
rcu_do_batch kernel/rcu/tree.c:2188 [inline]
rcu_core+0xa70/0x1830 kernel/rcu/tree.c:2471
__do_softirq+0x2bc/0x943 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
_compound_head include/linux/page-flags.h:247 [inline]
virt_to_folio include/linux/mm.h:1294 [inline]
virt_to_slab mm/kasan/../slab.h:204 [inline]
qlink_to_cache+0x1c/0xb0 mm/kasan/quarantine.c:131
qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:176
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_lru+0x175/0x350 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3088 [inline]
shmem_alloc_inode+0x28/0x40 mm/shmem.c:4425
alloc_inode fs/inode.c:261 [inline]
new_inode_pseudo+0x69/0x1e0 fs/inode.c:1007
new_inode+0x22/0x1d0 fs/inode.c:1033
__shmem_get_inode mm/shmem.c:2477 [inline]
shmem_get_inode+0x34a/0xd40 mm/shmem.c:2548
shmem_mknod+0x5f/0x1d0 mm/shmem.c:3242
lookup_open fs/namei.c:3498 [inline]
open_last_lookups fs/namei.c:3567 [inline]
path_openat+0x1425/0x3240 fs/namei.c:3797
do_filp_open+0x235/0x490 fs/namei.c:3827
do_sys_openat2+0x13e/0x1d0 fs/open.c:1407
do_sys_open fs/open.c:1422 [inline]
__do_sys_openat fs/open.c:1438 [inline]
__se_sys_openat fs/open.c:1433 [inline]
__x64_sys_openat+0x247/0x2a0 fs/open.c:1433
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
INITIAL USE at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
0xffffffffa000556a
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
__mutex_lock_common kernel/locking/mutex.c:617 [inline]
__mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
tracepoint_probe_unregister+0x32/0x990 kernel/tracepoint.c:548
bpf_raw_tp_link_release+0x63/0x90 kernel/bpf/syscall.c:3482
bpf_link_free kernel/bpf/syscall.c:3033 [inline]
bpf_link_put_direct+0x123/0x1b0 kernel/bpf/syscall.c:3064
bpf_link_release+0x3b/0x50 kernel/bpf/syscall.c:3071
__fput+0x429/0x8a0 fs/file_table.c:423
__do_sys_close fs/open.c:1557 [inline]
__se_sys_close fs/open.c:1542 [inline]
__x64_sys_close+0x7f/0x110 fs/open.c:1542
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
}
... key at: [<ffffffff94882300>] sock_hash_alloc.__key+0x0/0x20
... acquired at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
bpf_prog_43221478a22f23b5+0x42/0x46
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
__pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
debug_rcu_head_unqueue kernel/rcu/rcu.h:236 [inline]
rcu_do_batch kernel/rcu/tree.c:2188 [inline]
rcu_core+0xa70/0x1830 kernel/rcu/tree.c:2471
__do_softirq+0x2bc/0x943 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
_compound_head include/linux/page-flags.h:247 [inline]
virt_to_folio include/linux/mm.h:1294 [inline]
virt_to_slab mm/kasan/../slab.h:204 [inline]
qlink_to_cache+0x1c/0xb0 mm/kasan/quarantine.c:131
qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:176
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_lru+0x175/0x350 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3088 [inline]
shmem_alloc_inode+0x28/0x40 mm/shmem.c:4425
alloc_inode fs/inode.c:261 [inline]
new_inode_pseudo+0x69/0x1e0 fs/inode.c:1007
new_inode+0x22/0x1d0 fs/inode.c:1033
__shmem_get_inode mm/shmem.c:2477 [inline]
shmem_get_inode+0x34a/0xd40 mm/shmem.c:2548
shmem_mknod+0x5f/0x1d0 mm/shmem.c:3242
lookup_open fs/namei.c:3498 [inline]
open_last_lookups fs/namei.c:3567 [inline]
path_openat+0x1425/0x3240 fs/namei.c:3797
do_filp_open+0x235/0x490 fs/namei.c:3827
do_sys_openat2+0x13e/0x1d0 fs/open.c:1407
do_sys_open fs/open.c:1422 [inline]
__do_sys_openat fs/open.c:1438 [inline]
__se_sys_openat fs/open.c:1433 [inline]
__x64_sys_openat+0x247/0x2a0 fs/open.c:1433
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75


stack backtrace:
CPU: 1 PID: 5417 Comm: udevd Not tainted 6.8.0-syzkaller-05231-ga51cd6bf8e10-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106
print_bad_irq_dependency kernel/locking/lockdep.c:2626 [inline]
check_irq_usage kernel/locking/lockdep.c:2865 [inline]
check_prev_add kernel/locking/lockdep.c:3138 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x4dc7/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
sock_hash_delete_elem+0xb1/0x2f0 net/core/sock_map.c:940
bpf_prog_43221478a22f23b5+0x42/0x46
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xf6/0x120 include/trace/events/lock.h:122
__pv_queued_spin_lock_slowpath+0x939/0xc60 kernel/locking/qspinlock.c:560
pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:584 [inline]
queued_spin_lock_slowpath+0x42/0x50 arch/x86/include/asm/qspinlock.h:51
queued_spin_lock include/asm-generic/qspinlock.h:114 [inline]
do_raw_spin_lock+0x272/0x370 kernel/locking/spinlock_debug.c:116
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:111 [inline]
_raw_spin_lock_irqsave+0xe1/0x120 kernel/locking/spinlock.c:162
debug_object_active_state+0x15d/0x360 lib/debugobjects.c:936
debug_rcu_head_unqueue kernel/rcu/rcu.h:236 [inline]
rcu_do_batch kernel/rcu/tree.c:2188 [inline]
rcu_core+0xa70/0x1830 kernel/rcu/tree.c:2471
__do_softirq+0x2bc/0x943 kernel/softirq.c:554
invoke_softirq kernel/softirq.c:428 [inline]
__irq_exit_rcu+0xf2/0x1c0 kernel/softirq.c:633
irq_exit_rcu+0x9/0x30 kernel/softirq.c:645
instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline]
sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
RIP: 0010:_compound_head include/linux/page-flags.h:249 [inline]
RIP: 0010:virt_to_folio include/linux/mm.h:1294 [inline]
RIP: 0010:virt_to_slab mm/kasan/../slab.h:204 [inline]
RIP: 0010:qlink_to_cache+0x1c/0xb0 mm/kasan/quarantine.c:131
Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 90 e8 ab a3 49 ff 48 c1 e8 06 48 83 e0 c0 48 ba 00 00 00 00 00 ea ff ff 48 8b 4c 10 08 <f6> c1 01 75 44 48 01 d0 66 90 48 8b 48 08 f6 c1 01 75 65 66 90 48
RSP: 0018:ffffc90004c1f6d0 EFLAGS: 00000206
RAX: 0000000000ba42c0 RBX: ffff88802e90b300 RCX: ffffea0000ba4201
RDX: ffffea0000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88801caa0780 R08: ffffffff8141ef1c R09: 1ffffffff2598ea5
R10: dffffc0000000000 R11: fffffbfff2598ea6 R12: 0000000000000000
R13: ffff88802e90b300 R14: ffffc90004c1f708 R15: 0000000000000000
qlist_free_all+0x2e/0xc0 mm/kasan/quarantine.c:176
kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:322
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slub.c:3813 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
kmem_cache_alloc_lru+0x175/0x350 mm/slub.c:3879
alloc_inode_sb include/linux/fs.h:3088 [inline]
shmem_alloc_inode+0x28/0x40 mm/shmem.c:4425
alloc_inode fs/inode.c:261 [inline]
new_inode_pseudo+0x69/0x1e0 fs/inode.c:1007
new_inode+0x22/0x1d0 fs/inode.c:1033
__shmem_get_inode mm/shmem.c:2477 [inline]
shmem_get_inode+0x34a/0xd40 mm/shmem.c:2548
shmem_mknod+0x5f/0x1d0 mm/shmem.c:3242
lookup_open fs/namei.c:3498 [inline]
open_last_lookups fs/namei.c:3567 [inline]
path_openat+0x1425/0x3240 fs/namei.c:3797
do_filp_open+0x235/0x490 fs/namei.c:3827
do_sys_openat2+0x13e/0x1d0 fs/open.c:1407
do_sys_open fs/open.c:1422 [inline]
__do_sys_openat fs/open.c:1438 [inline]
__se_sys_openat fs/open.c:1433 [inline]
__x64_sys_openat+0x247/0x2a0 fs/open.c:1433
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7f2bc89169a4
Code: 24 20 48 8d 44 24 30 48 89 44 24 28 64 8b 04 25 18 00 00 00 85 c0 75 2c 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 76 60 48 8b 15 55 a4 0d 00 f7 d8 64 89 02 48 83
RSP: 002b:00007ffcd2db6460 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f2bc89169a4
RDX: 0000000000080241 RSI: 00007ffcd2db69a8 RDI: 00000000ffffff9c
RBP: 00007ffcd2db69a8 R08: 0000000000000004 R09: 0000000000000001
R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000080241
R13: 000055cf24a9672e R14: 0000000000000001 R15: 000055cf24ab1160
</TASK>
----------------
Code disassembly (best guess):
0: 90 nop
1: 90 nop
2: 90 nop
3: 90 nop
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 90 nop
9: 90 nop
a: 90 nop
b: 90 nop
c: 90 nop
d: 90 nop
e: e8 ab a3 49 ff call 0xff49a3be
13: 48 c1 e8 06 shr $0x6,%rax
17: 48 83 e0 c0 and $0xffffffffffffffc0,%rax
1b: 48 ba 00 00 00 00 00 movabs $0xffffea0000000000,%rdx
22: ea ff ff
25: 48 8b 4c 10 08 mov 0x8(%rax,%rdx,1),%rcx
* 2a: f6 c1 01 test $0x1,%cl <-- trapping instruction
2d: 75 44 jne 0x73
2f: 48 01 d0 add %rdx,%rax
32: 66 90 xchg %ax,%ax
34: 48 8b 48 08 mov 0x8(%rax),%rcx
38: f6 c1 01 test $0x1,%cl
3b: 75 65 jne 0xa2
3d: 66 90 xchg %ax,%ax
3f: 48 rex.W


Tested on:

commit: a51cd6bf arm64: bpf: fix 32bit unconditional bswap
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1797ba81180000
kernel config: https://syzkaller.appspot.com/x/.config?x=6fb1be60a193d440
dashboard link: https://syzkaller.appspot.com/bug?extid=c4f4d25859c2e5859988
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=100f1d66180000


2024-03-22 00:22:59

by Edward Adam Davis

Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult

Please test this fix for the deadlock in rcu_report_exp_cpu_mult:

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master


diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..ae8f81b26e16 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }
 
@@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
 	struct bpf_shtab_elem *elem;
 	struct hlist_node *node;
 	int i;
+	unsigned long flags;
 
 	/* After the sync no updates or deletes will be in-flight so it
 	 * is safe to walk map and remove entries without risking a race
@@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
 		 * exists, psock exists and holds a ref to socket. That
 		 * lets us to grab a socket ref too.
 		 */
-		spin_lock_bh(&bucket->lock);
+		spin_lock_irqsave(&bucket->lock, flags);
 		hlist_for_each_entry(elem, &bucket->head, node)
 			sock_hold(elem->sk);
 		hlist_move_list(&bucket->head, &unlink_list);
-		spin_unlock_bh(&bucket->lock);
+		spin_unlock_irqrestore(&bucket->lock, flags);
 
 		/* Process removed entries out of atomic context to
 		 * block for socket lock before deleting the psock's


2024-03-22 10:56:11

by syzbot

Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: ddb2ffdc libbpf: Define MFD_CLOEXEC if not available
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=170141a5180000
kernel config: https://syzkaller.appspot.com/x/.config?x=6fb1be60a193d440
dashboard link: https://syzkaller.appspot.com/bug?extid=c4f4d25859c2e5859988
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=14b44711180000

Note: testing is done by a robot and is best-effort only.

2024-03-23 05:54:08

by Edward Adam Davis

Subject: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult

[Syzbot reported]
WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
6.8.0-syzkaller-05221-gea80e3ed09ab #0 Not tainted
-----------------------------------------------------
rcu_exp_gp_kthr/18 [HC0[0]:SC0[2]:HE0:SE0] is trying to acquire:
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff88802b5ab020 (&htab->buckets[i].lock){+...}-{2:2}, at: sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939

and this task is already holding:
ffffffff8e136558 (rcu_node_0){-.-.}-{2:2}, at: sync_rcu_exp_done_unlocked+0xe/0x140 kernel/rcu/tree_exp.h:169
which would create a new lock dependency:
(rcu_node_0){-.-.}-{2:2} -> (&htab->buckets[i].lock){+...}-{2:2}

but this new dependency connects a HARDIRQ-irq-safe lock:
(rcu_node_0){-.-.}-{2:2}

... which became HARDIRQ-irq-safe at:
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
rcu_report_exp_cpu_mult+0x27/0x2f0 kernel/rcu/tree_exp.h:238
csd_do_func kernel/smp.c:133 [inline]
__flush_smp_call_function_queue+0xb2e/0x15b0 kernel/smp.c:542
__sysvec_call_function_single+0xa8/0x3e0 arch/x86/kernel/smp.c:271
instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
sysvec_call_function_single+0x9e/0xc0 arch/x86/kernel/smp.c:266
asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:709
__sanitizer_cov_trace_switch+0x90/0x120
update_event_printk kernel/trace/trace_events.c:2750 [inline]
trace_event_eval_update+0x311/0xf90 kernel/trace/trace_events.c:2922
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243

to a HARDIRQ-irq-unsafe lock:
(&htab->buckets[i].lock){+...}-{2:2}

... which became HARDIRQ-irq-unsafe at:
...
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
sock_hash_delete_elem+0xb0/0x300 net/core/sock_map.c:939
0xffffffffa0001b0e
bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
__bpf_prog_run include/linux/filter.h:657 [inline]
bpf_prog_run include/linux/filter.h:664 [inline]
__bpf_trace_run kernel/trace/bpf_trace.c:2381 [inline]
bpf_trace_run2+0x204/0x420 kernel/trace/bpf_trace.c:2420
trace_contention_end+0xd7/0x100 include/trace/events/lock.h:122
__mutex_lock_common kernel/locking/mutex.c:617 [inline]
__mutex_lock+0x2e5/0xd70 kernel/locking/mutex.c:752
futex_cleanup_begin kernel/futex/core.c:1091 [inline]
futex_exit_release+0x34/0x1f0 kernel/futex/core.c:1143
exit_mm_release+0x1a/0x30 kernel/fork.c:1652
exit_mm+0xb0/0x310 kernel/exit.c:542
do_exit+0x99e/0x27e0 kernel/exit.c:865
do_group_exit+0x207/0x2c0 kernel/exit.c:1027
__do_sys_exit_group kernel/exit.c:1038 [inline]
__se_sys_exit_group kernel/exit.c:1036 [inline]
__x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1036
do_syscall_64+0xfb/0x240
entry_SYSCALL_64_after_hwframe+0x6d/0x75

other info that might help us debug this:

Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&htab->buckets[i].lock);
                               local_irq_disable();
                               lock(rcu_node_0);
                               lock(&htab->buckets[i].lock);
  <Interrupt>
    lock(rcu_node_0);

*** DEADLOCK ***

[Fix]
Take bucket->lock with spin_lock_irqsave()/spin_unlock_irqrestore() so that
interrupts are disabled while the lock is held and the interrupt state of
the calling context is restored afterwards.

Reported-and-tested-by: [email protected]
Signed-off-by: Edward Adam Davis <[email protected]>
---
net/core/sock_map.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 27d733c0f65e..ae8f81b26e16 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -932,11 +932,12 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 	struct bpf_shtab_bucket *bucket;
 	struct bpf_shtab_elem *elem;
 	int ret = -ENOENT;
+	unsigned long flags;
 
 	hash = sock_hash_bucket_hash(key, key_size);
 	bucket = sock_hash_select_bucket(htab, hash);
 
-	spin_lock_bh(&bucket->lock);
+	spin_lock_irqsave(&bucket->lock, flags);
 	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
 	if (elem) {
 		hlist_del_rcu(&elem->node);
@@ -944,7 +945,7 @@ static long sock_hash_delete_elem(struct bpf_map *map, void *key)
 		sock_hash_free_elem(htab, elem);
 		ret = 0;
 	}
-	spin_unlock_bh(&bucket->lock);
+	spin_unlock_irqrestore(&bucket->lock, flags);
 	return ret;
 }
 
@@ -1136,6 +1137,7 @@ static void sock_hash_free(struct bpf_map *map)
 	struct bpf_shtab_elem *elem;
 	struct hlist_node *node;
 	int i;
+	unsigned long flags;
 
 	/* After the sync no updates or deletes will be in-flight so it
 	 * is safe to walk map and remove entries without risking a race
@@ -1151,11 +1153,11 @@ static void sock_hash_free(struct bpf_map *map)
 		 * exists, psock exists and holds a ref to socket. That
 		 * lets us to grab a socket ref too.
 		 */
-		spin_lock_bh(&bucket->lock);
+		spin_lock_irqsave(&bucket->lock, flags);
 		hlist_for_each_entry(elem, &bucket->head, node)
 			sock_hold(elem->sk);
 		hlist_move_list(&bucket->head, &unlink_list);
-		spin_unlock_bh(&bucket->lock);
+		spin_unlock_irqrestore(&bucket->lock, flags);
 
 		/* Process removed entries out of atomic context to
 		 * block for socket lock before deleting the psock's
--
2.43.0


2024-03-25 16:16:44

by Jakub Sitnicki

Subject: Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult

On Mon, Mar 25, 2024 at 01:23 PM +01, Jakub Sitnicki wrote:

[...]

> But we also need to cover sock_map_unref->sock_sock_map_del_link called
> from sock_hash_delete_elem. It also grabs a spin lock.

On second look, there is no need to disable interrupts in
sock_map_unref->sock_sock_map_del_link. The call is enclosed in the
critical section in sock_hash_delete_elem that has already been updated.
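
For reference, the patched function looks roughly like this (a simplified
sketch reconstructed from the diff context above, not verbatim kernel
source); sock_map_unref(), and through it sock_sock_map_del_link(), already
sits inside the irqsave critical section:

static long sock_hash_delete_elem(struct bpf_map *map, void *key)
{
	struct bpf_shtab *htab = container_of(map, struct bpf_shtab, map);
	u32 hash, key_size = map->key_size;
	struct bpf_shtab_bucket *bucket;
	struct bpf_shtab_elem *elem;
	unsigned long flags;
	int ret = -ENOENT;

	hash = sock_hash_bucket_hash(key, key_size);
	bucket = sock_hash_select_bucket(htab, hash);

	spin_lock_irqsave(&bucket->lock, flags);
	elem = sock_hash_lookup_elem_raw(&bucket->head, hash, key, key_size);
	if (elem) {
		hlist_del_rcu(&elem->node);
		sock_map_unref(elem->sk, elem);	/* -> sock_sock_map_del_link() */
		sock_hash_free_elem(htab, elem);
		ret = 0;
	}
	spin_unlock_irqrestore(&bucket->lock, flags);
	return ret;
}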

I have a question, though: why are we patching sock_hash_free? It
doesn't get called unless there are no more existing users of the BPF
map, so nothing can mutate it from interrupt context.

[...]

2024-03-29 05:29:51

by John Fastabend

Subject: Re: [PATCH] bpf, sockmap: fix deadlock in rcu_report_exp_cpu_mult

Jakub Sitnicki wrote:
> On Mon, Mar 25, 2024 at 01:23 PM +01, Jakub Sitnicki wrote:
>
> [...]
>
> > But we also need to cover sock_map_unref->sock_sock_map_del_link called
> > from sock_hash_delete_elem. It also grabs a spin lock.
>
> On second look, there is no need to disable interrupts in
> sock_map_unref->sock_sock_map_del_link. The call is enclosed in the
> critical section in sock_hash_delete_elem that has already been updated.
>
> I have a question, though: why are we patching sock_hash_free? It
> doesn't get called unless there are no more existing users of the BPF
> map, so nothing can mutate it from interrupt context.
>
> [...]

Agreed, sock_hash_free should only run after all refs are dropped.

Edward, did you want to send a v2 for this? If you want, fixing the
sockmap case as well would be useful. I'm also happy to finish up the
patches if you would rather not.

Thanks,
John

2024-04-20 14:51:48

by Tetsuo Handa

Subject: Re: [syzbot] [bpf?] [net?] possible deadlock in rcu_report_exp_cpu_mult

#syz fix: bpf, sockmap: Prevent lock inversion deadlock in map delete elem