2024-03-28 14:20:59

by syzbot

Subject: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

Hello,

syzbot found the following issue on:

HEAD commit: fe46a7dd189e Merge tag 'sound-6.9-rc1' of git://git.kernel..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=11cc4c51180000
kernel config: https://syzkaller.appspot.com/x/.config?x=fe78468a74fdc3b7
dashboard link: https://syzkaller.appspot.com/bug?extid=fbf74291c3b7e753b481
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
userspace arch: i386

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/55a16212fbdf/disk-fe46a7dd.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/704972635ac7/vmlinux-fe46a7dd.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a04b0d8c481f/bzImage-fe46a7dd.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

============================================
WARNING: possible recursive locking detected
6.8.0-syzkaller-08951-gfe46a7dd189e #0 Not tainted
--------------------------------------------
kworker/u8:3/49 is trying to acquire lock:
ffff888050f26da0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff888050f26da0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: hsr_dev_xmit+0x13e/0x1d0 net/hsr/hsr_device.c:229

but task is already holding lock:
ffff88807cdaeda0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff88807cdaeda0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: send_hsr_supervision_frame+0x276/0xad0 net/hsr/hsr_device.c:310

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&hsr->seqnr_lock);
lock(&hsr->seqnr_lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

9 locks held by kworker/u8:3/49:
#0: ffff88802a81f948 ((wq_completion)bat_events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0: ffff88802a81f948 ((wq_completion)bat_events){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x1770 kernel/workqueue.c:3335
#1: ffffc90000b97d00 ((work_completion)(&(&bat_priv->nc.work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1: ffffc90000b97d00 ((work_completion)(&(&bat_priv->nc.work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x1770 kernel/workqueue.c:3335
#2: ffffc90000a08ca0 ((&hsr->announce_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x600 kernel/time/timer.c:1789
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hsr_announce+0xa3/0x370 net/hsr/hsr_device.c:387
#4: ffff88807cdaeda0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
#4: ffff88807cdaeda0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: send_hsr_supervision_frame+0x276/0xad0 net/hsr/hsr_device.c:310
#5: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#5: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#5: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hsr_forward_skb+0xae/0x2400 net/hsr/hsr_forward.c:614
#6: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#6: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#6: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260
#7: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#7: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#7: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: br_dev_xmit+0x1b9/0x1a10 net/bridge/br_device.c:44
#8: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#8: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#8: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260

stack backtrace:
CPU: 1 PID: 49 Comm: kworker/u8:3 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: bat_events batadv_nc_worker
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_deadlock kernel/locking/lockdep.c:3062 [inline]
validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
hsr_dev_xmit+0x13e/0x1d0 net/hsr/hsr_device.c:229
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
br_dev_queue_push_xmit+0x701/0x8d0 net/bridge/br_forward.c:53
NF_HOOK+0x3a7/0x460 include/linux/netfilter.h:314
br_forward_finish+0xe5/0x140 net/bridge/br_forward.c:66
NF_HOOK+0x3a7/0x460 include/linux/netfilter.h:314
__br_forward+0x489/0x660 net/bridge/br_forward.c:115
deliver_clone net/bridge/br_forward.c:131 [inline]
maybe_deliver+0xb3/0x150 net/bridge/br_forward.c:190
br_flood+0x2e4/0x660 net/bridge/br_forward.c:236
br_dev_xmit+0x118c/0x1a10
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
hsr_xmit net/hsr/hsr_forward.c:380 [inline]
hsr_forward_do net/hsr/hsr_forward.c:471 [inline]
hsr_forward_skb+0x183f/0x2400 net/hsr/hsr_forward.c:619
send_hsr_supervision_frame+0x548/0xad0 net/hsr/hsr_device.c:333
hsr_announce+0x1a9/0x370 net/hsr/hsr_device.c:389
call_timer_fn+0x17e/0x600 kernel/time/timer.c:1792
expire_timers kernel/time/timer.c:1843 [inline]
__run_timers kernel/time/timer.c:2408 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2419
run_timer_base kernel/time/timer.c:2428 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2438
__do_softirq+0x2bc/0x943 kernel/softirq.c:554
do_softirq+0x11b/0x1e0 kernel/softirq.c:455
</IRQ>
<TASK>
__local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382
spin_unlock_bh include/linux/spinlock.h:396 [inline]
batadv_nc_purge_paths+0x30f/0x3b0 net/batman-adv/network-coding.c:471
batadv_nc_worker+0x328/0x610 net/batman-adv/network-coding.c:720
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>
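
Editor's sketch (not part of the syzbot report): the two addresses in the report above, ffff888050f26da0 and ffff88807cdaeda0, are different seqnr_lock instances, one per HSR device. Lockdep tracks locks by class rather than by instance, so it flags the second acquisition even though the objects differ. The toy userspace model below illustrates that check under simplifying assumptions. The names toy_lock, toy_acquire and toy_release are hypothetical, and none of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one lock class per lock-init site, many lock instances. */
struct lock_class { const char *name; };
struct toy_lock { const struct lock_class *cls; };

/* Hypothetical stand-ins for two HSR devices sharing one lock class. */
static const struct lock_class seqnr_class = { "&hsr->seqnr_lock" };
static const struct toy_lock hsr_a = { &seqnr_class };
static const struct toy_lock hsr_b = { &seqnr_class };

#define MAX_HELD 16
struct held { const struct lock_class *cls; unsigned int subclass; };
static struct held held_stack[MAX_HELD];
static size_t nr_held;

/*
 * Record an acquisition. Returns false if a lock with the same class and
 * subclass is already held, which is the situation the report describes as
 * "possible recursive locking detected".
 */
static bool toy_acquire(const struct toy_lock *l, unsigned int subclass)
{
	for (size_t i = 0; i < nr_held; i++)
		if (held_stack[i].cls == l->cls &&
		    held_stack[i].subclass == subclass)
			return false;

	assert(nr_held < MAX_HELD);
	held_stack[nr_held++] = (struct held){ l->cls, subclass };
	return true;
}

/* Drop the most recently recorded acquisition, if any. */
static void toy_release(void)
{
	if (nr_held)
		nr_held--;
}
```

In this model, toy_acquire(&hsr_a, 0) followed by toy_acquire(&hsr_b, 0) is rejected, while toy_acquire(&hsr_b, 1) is accepted. That is roughly what a nesting annotation such as spin_lock_nested() expresses to lockdep.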


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup


2024-04-01 06:11:43

by syzbot

Subject: Re: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

syzbot has found a reproducer for the following issue on:

HEAD commit: 480e035fc4c7 Merge tag 'drm-next-2024-03-13' of https://gi..
git tree: upstream
console+strace: https://syzkaller.appspot.com/x/log.txt?x=166a86e5180000
kernel config: https://syzkaller.appspot.com/x/.config?x=1e5b814e91787669
dashboard link: https://syzkaller.appspot.com/bug?extid=fbf74291c3b7e753b481
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10526855180000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15e0f5c3180000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/5f73b6ef963d/disk-480e035f.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/46c949396aad/vmlinux-480e035f.xz
kernel image: https://storage.googleapis.com/syzbot-assets/e3b4d0f5a5f8/bzImage-480e035f.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

============================================
WARNING: possible recursive locking detected
6.8.0-syzkaller-08073-g480e035fc4c7 #0 Not tainted
--------------------------------------------
ksoftirqd/1/23 is trying to acquire lock:
ffff8880744d6da0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff8880744d6da0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: hsr_dev_xmit+0x13e/0x1d0 net/hsr/hsr_device.c:229

but task is already holding lock:
ffff88802383ada0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
ffff88802383ada0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: send_hsr_supervision_frame+0x276/0xad0 net/hsr/hsr_device.c:310

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&hsr->seqnr_lock);
lock(&hsr->seqnr_lock);

*** DEADLOCK ***

May be due to missing lock nesting notation

7 locks held by ksoftirqd/1/23:
#0: ffffc900001d7a40 ((&hsr->announce_timer)){+.-.}-{0:0}, at: call_timer_fn+0xc0/0x600 kernel/time/timer.c:1789
#1: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#1: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#1: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hsr_announce+0xa3/0x370 net/hsr/hsr_device.c:387
#2: ffff88802383ada0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: spin_lock_bh include/linux/spinlock.h:356 [inline]
#2: ffff88802383ada0 (&hsr->seqnr_lock){+.-.}-{2:2}, at: send_hsr_supervision_frame+0x276/0xad0 net/hsr/hsr_device.c:310
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hsr_forward_skb+0xae/0x2400 net/hsr/hsr_forward.c:614
#4: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#4: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#4: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260
#5: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#5: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#5: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: br_dev_xmit+0x1b9/0x1a10 net/bridge/br_device.c:44
#6: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#6: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#6: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260

stack backtrace:
CPU: 1 PID: 23 Comm: ksoftirqd/1 Not tainted 6.8.0-syzkaller-08073-g480e035fc4c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_deadlock kernel/locking/lockdep.c:3062 [inline]
validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
__raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
_raw_spin_lock_bh+0x35/0x50 kernel/locking/spinlock.c:178
spin_lock_bh include/linux/spinlock.h:356 [inline]
hsr_dev_xmit+0x13e/0x1d0 net/hsr/hsr_device.c:229
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
br_dev_queue_push_xmit+0x701/0x8d0 net/bridge/br_forward.c:53
NF_HOOK+0x3a7/0x460 include/linux/netfilter.h:314
br_forward_finish+0xe5/0x140 net/bridge/br_forward.c:66
NF_HOOK+0x3a7/0x460 include/linux/netfilter.h:314
__br_forward+0x489/0x660 net/bridge/br_forward.c:115
deliver_clone net/bridge/br_forward.c:131 [inline]
maybe_deliver+0xb3/0x150 net/bridge/br_forward.c:190
br_flood+0x2e4/0x660 net/bridge/br_forward.c:236
br_dev_xmit+0x118c/0x1a10
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
hsr_xmit net/hsr/hsr_forward.c:380 [inline]
hsr_forward_do net/hsr/hsr_forward.c:471 [inline]
hsr_forward_skb+0x183f/0x2400 net/hsr/hsr_forward.c:619
send_hsr_supervision_frame+0x548/0xad0 net/hsr/hsr_device.c:333
hsr_announce+0x1a9/0x370 net/hsr/hsr_device.c:389
call_timer_fn+0x17e/0x600 kernel/time/timer.c:1792
expire_timers kernel/time/timer.c:1843 [inline]
__run_timers kernel/time/timer.c:2408 [inline]
__run_timer_base+0x66a/0x8e0 kernel/time/timer.c:2419
run_timer_base kernel/time/timer.c:2428 [inline]
run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2438
__do_softirq+0x2bc/0x943 kernel/softirq.c:554
run_ksoftirqd+0xc5/0x130 kernel/softirq.c:924
smpboot_thread_fn+0x544/0xa30 kernel/smpboot.c:164
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>


---
If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

2024-04-01 13:04:38

by Hillf Danton

Subject: Re: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

On Sun, 31 Mar 2024 23:11:27 -0700
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 480e035fc4c7 Merge tag 'drm-next-2024-03-13' of https://gi..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15e0f5c3180000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 480e035fc4c7

--- x/net/hsr/hsr_device.c
+++ y/net/hsr/hsr_device.c
@@ -226,9 +226,11 @@ static netdev_tx_t hsr_dev_xmit(struct s
 		skb->dev = master->dev;
 		skb_reset_mac_header(skb);
 		skb_reset_mac_len(skb);
-		spin_lock_bh(&hsr->seqnr_lock);
+		local_bh_disable();
+		spin_lock_nested(&hsr->seqnr_lock, 1);
 		hsr_forward_skb(skb, master);
-		spin_unlock_bh(&hsr->seqnr_lock);
+		spin_unlock(&hsr->seqnr_lock);
+		local_bh_enable();
 	} else {
 		dev_core_stats_tx_dropped_inc(dev);
 		dev_kfree_skb_any(skb);
--

2024-04-01 13:36:12

by syzbot

Subject: Re: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
possible deadlock in hsr_dev_xmit

============================================
WARNING: possible recursive locking detected
6.8.0-syzkaller-08073-g480e035fc4c7-dirty #0 Not tainted
--------------------------------------------
kworker/0:1/8 is trying to acquire lock:
ffff88806df74da0 (&hsr->seqnr_lock/1){+.-.}-{2:2}, at: hsr_dev_xmit+0x157/0x200 net/hsr/hsr_device.c:230

but task is already holding lock:
ffff888069ca6da0 (&hsr->seqnr_lock/1){+.-.}-{2:2}, at: hsr_dev_xmit+0x157/0x200 net/hsr/hsr_device.c:230

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&hsr->seqnr_lock/1);
lock(&hsr->seqnr_lock/1);

*** DEADLOCK ***

May be due to missing lock nesting notation

11 locks held by kworker/0:1/8:
#0: ffff888029984d48 ((wq_completion)mld){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0: ffff888029984d48 ((wq_completion)mld){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x1770 kernel/workqueue.c:3335
#1: ffffc900000d7d00 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1: ffffc900000d7d00 ((work_completion)(&(&idev->mc_ifc_work)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x1770 kernel/workqueue.c:3335
#2: ffff88806a227538 (&idev->mc_lock){+.+.}-{3:3}, at: mld_ifc_work+0x2d/0xd90 net/ipv6/mcast.c:2649
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#3: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: mld_sendpack+0x1de/0xda0 net/ipv6/mcast.c:1790
#4: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#4: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#4: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: ip6_finish_output2+0x712/0x1670 net/ipv6/ip6_output.c:122
#5: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#5: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#5: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260
#6: ffff888069ca6da0 (&hsr->seqnr_lock/1){+.-.}-{2:2}, at: hsr_dev_xmit+0x157/0x200 net/hsr/hsr_device.c:230
#7: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#7: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#7: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: hsr_forward_skb+0xae/0x2400 net/hsr/hsr_forward.c:614
#8: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#8: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#8: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260
#9: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:298 [inline]
#9: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:750 [inline]
#9: ffffffff8e132020 (rcu_read_lock){....}-{1:2}, at: br_dev_xmit+0x1b9/0x1a10 net/bridge/br_device.c:44
#10: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: local_bh_disable include/linux/bottom_half.h:20 [inline]
#10: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: rcu_read_lock_bh include/linux/rcupdate.h:802 [inline]
#10: ffffffff8e132080 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0x2c4/0x3b10 net/core/dev.c:4260

stack backtrace:
CPU: 0 PID: 8 Comm: kworker/0:1 Not tainted 6.8.0-syzkaller-08073-g480e035fc4c7-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: mld mld_ifc_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_deadlock kernel/locking/lockdep.c:3062 [inline]
validate_chain+0x15c1/0x58e0 kernel/locking/lockdep.c:3856
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1e4/0x530 kernel/locking/lockdep.c:5754
_raw_spin_lock_nested+0x31/0x40 kernel/locking/spinlock.c:378
hsr_dev_xmit+0x157/0x200 net/hsr/hsr_device.c:230
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
br_dev_queue_push_xmit+0x701/0x8d0 net/bridge/br_forward.c:53
NF_HOOK+0x3a7/0x460 include/linux/netfilter.h:314
br_forward_finish+0xe5/0x140 net/bridge/br_forward.c:66
NF_HOOK+0x3a7/0x460 include/linux/netfilter.h:314
__br_forward+0x489/0x660 net/bridge/br_forward.c:115
deliver_clone net/bridge/br_forward.c:131 [inline]
maybe_deliver+0xb3/0x150 net/bridge/br_forward.c:190
br_flood+0x2e4/0x660 net/bridge/br_forward.c:236
br_dev_xmit+0x118c/0x1a10
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
hsr_xmit net/hsr/hsr_forward.c:380 [inline]
hsr_forward_do net/hsr/hsr_forward.c:471 [inline]
hsr_forward_skb+0x183f/0x2400 net/hsr/hsr_forward.c:619
hsr_dev_xmit+0x162/0x200 net/hsr/hsr_device.c:231
__netdev_start_xmit include/linux/netdevice.h:4903 [inline]
netdev_start_xmit include/linux/netdevice.h:4917 [inline]
xmit_one net/core/dev.c:3531 [inline]
dev_hard_start_xmit+0x26a/0x790 net/core/dev.c:3547
__dev_queue_xmit+0x19f4/0x3b10 net/core/dev.c:4335
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0xff8/0x1670 net/ipv6/ip6_output.c:137
ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
NF_HOOK+0x9e/0x430 include/linux/netfilter.h:314
mld_sendpack+0x838/0xda0 net/ipv6/mcast.c:1818
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x7d6/0xd90 net/ipv6/mcast.c:2650
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa00/0x1770 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
</TASK>


Tested on:

commit: 480e035f Merge tag 'drm-next-2024-03-13' of https://gi..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=1702f003180000
kernel config: https://syzkaller.appspot.com/x/.config?x=1e5b814e91787669
dashboard link: https://syzkaller.appspot.com/bug?extid=fbf74291c3b7e753b481
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=109cd855180000


2024-04-01 22:44:51

by Hillf Danton

Subject: Re: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

On Sun, 31 Mar 2024 23:11:27 -0700
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 480e035fc4c7 Merge tag 'drm-next-2024-03-13' of https://gi..
> git tree: upstream
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15e0f5c3180000

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 480e035fc4c7

--- x/net/hsr/hsr_device.c
+++ y/net/hsr/hsr_device.c
@@ -220,15 +220,19 @@ static netdev_tx_t hsr_dev_xmit(struct s
 {
 	struct hsr_priv *hsr = netdev_priv(dev);
 	struct hsr_port *master;
+	static int depth = 0;

 	master = hsr_port_get_hsr(hsr, HSR_PT_MASTER);
 	if (master) {
 		skb->dev = master->dev;
 		skb_reset_mac_header(skb);
 		skb_reset_mac_len(skb);
-		spin_lock_bh(&hsr->seqnr_lock);
+		local_bh_disable();
+		spin_lock_nested(&hsr->seqnr_lock, ++depth);
 		hsr_forward_skb(skb, master);
-		spin_unlock_bh(&hsr->seqnr_lock);
+		--depth;
+		spin_unlock(&hsr->seqnr_lock);
+		local_bh_enable();
 	} else {
 		dev_core_stats_tx_dropped_inc(dev);
 		dev_kfree_skb_any(skb);
--
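
Editor's sketch (not part of the thread): the first patch used a constant subclass of 1, so a second, nested hsr_dev_xmit() took a lock of the same class and subclass again, which matches the `&hsr->seqnr_lock/1` report above. The patch just above passes the nesting depth instead. The toy comparison below models only that difference. first_collision and the two policy functions are hypothetical names, and MAX_LOCKDEP_SUBCLASSES is assumed to be 8 as in current kernels.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed lockdep limit on subclasses per lock class. */
#define MAX_LOCKDEP_SUBCLASSES 8

/*
 * Subclass chosen for seqnr_lock at a given nesting level of
 * hsr_dev_xmit(); level 1 is the outermost call.
 */
static unsigned int subclass_constant(unsigned int level)
{
	(void)level;
	return 1;		/* first patch: always subclass 1 */
}

static unsigned int subclass_by_depth(unsigned int level)
{
	return level;		/* second patch: ++depth on each entry */
}

/*
 * Walk `levels` nested transmit calls. Returns the first level whose
 * subclass is already held or is outside the assumed lockdep range,
 * or 0 if every level gets a distinct, valid subclass.
 */
static unsigned int first_collision(unsigned int (*policy)(unsigned int),
				    unsigned int levels)
{
	bool held[MAX_LOCKDEP_SUBCLASSES] = { false };

	assert(policy);
	for (unsigned int level = 1; level <= levels; level++) {
		unsigned int sc = policy(level);

		if (sc >= MAX_LOCKDEP_SUBCLASSES || held[sc])
			return level;
		held[sc] = true;
	}
	return 0;
}
```

With two nested levels, as in the hsr -> bridge -> hsr chain from the trace, the constant policy collides at level 2 and the depth policy does not. The sketch ignores concurrency: the patch's `static int depth` is a single function-local variable shared by every caller, so it would also be changed by transmits running on other CPUs.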

2024-04-02 03:27:26

by syzbot

Subject: Re: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 480e035f Merge tag 'drm-next-2024-03-13' of https://gi..
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=12e1130d180000
kernel config: https://syzkaller.appspot.com/x/.config?x=1e5b814e91787669
dashboard link: https://syzkaller.appspot.com/bug?extid=fbf74291c3b7e753b481
compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=1321fac5180000

Note: testing is done by a robot and is best-effort only.

2024-05-05 09:08:15

by syzbot

Subject: Re: [syzbot] [net?] possible deadlock in hsr_dev_xmit (2)

syzbot has bisected this issue to:

commit 06afd2c31d338fa762548580c1bf088703dd1e03
Author: Sebastian Andrzej Siewior <[email protected]>
Date: Tue Nov 29 16:48:12 2022 +0000

hsr: Synchronize sending frames to have always incremented outgoing seq nr.

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=133c15f8980000
start commit: 5829614a7b3b Merge branch 'net-sysctl-sentinel'
git tree: net-next
final oops: https://syzkaller.appspot.com/x/report.txt?x=10bc15f8980000
console output: https://syzkaller.appspot.com/x/log.txt?x=173c15f8980000
kernel config: https://syzkaller.appspot.com/x/.config?x=7c70a227bc928e1b
dashboard link: https://syzkaller.appspot.com/bug?extid=fbf74291c3b7e753b481
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=144d20e4980000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1532ab38980000

Reported-by: [email protected]
Fixes: 06afd2c31d33 ("hsr: Synchronize sending frames to have always incremented outgoing seq nr.")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection