2022-02-17 16:56:44

by syzbot

Subject: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

Hello,

syzbot found the following issue on:

HEAD commit: c832962ac972 net: bridge: multicast: notify switchdev driv..
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=16b157bc700000
kernel config: https://syzkaller.appspot.com/x/.config?x=266de9da75c71a45
dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

infiniband syz1: set down
infiniband syz1: added lo
RDS/IB: syz1: added
smc: adding ib device syz1 with port count 1
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 17974, name: syz-executor.3
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
6 locks held by syz-executor.3/17974:
#0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
#1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
#2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
#3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
#4: ffff8880482c85c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
#5: ffff8880230a4118 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 1 PID: 17974 Comm: syz-executor.3 Not tainted 5.17.0-rc3-syzkaller-00170-gc832962ac972 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
__mutex_lock_common kernel/locking/mutex.c:577 [inline]
__mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
smc_ib_add_dev+0x4d7/0x900 net/smc/smc_ib.c:940
add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:720
enable_device_and_get+0x1cd/0x3b0 drivers/infiniband/core/device.c:1331
ib_register_device drivers/infiniband/core/device.c:1419 [inline]
ib_register_device+0x814/0xaf0 drivers/infiniband/core/device.c:1365
rxe_register_device+0x2fe/0x3b0 drivers/infiniband/sw/rxe/rxe_verbs.c:1146
rxe_add+0x1331/0x1710 drivers/infiniband/sw/rxe/rxe.c:246
rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:538
rxe_newlink drivers/infiniband/sw/rxe/rxe.c:268 [inline]
rxe_newlink+0xa9/0xd0 drivers/infiniband/sw/rxe/rxe.c:249
nldev_newlink+0x30a/0x560 drivers/infiniband/core/nldev.c:1717
rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f909305f059
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f90919d4168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f9093171f60 RCX: 00007f909305f059
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000004
RBP: 00007f90930b908d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff171c256f R14: 00007f90919d4300 R15: 0000000000022000
</TASK>

=============================
[ BUG: Invalid wait context ]
5.17.0-rc3-syzkaller-00170-gc832962ac972 #0 Tainted: G W
-----------------------------
syz-executor.3/17974 is trying to lock:
ffffffff8d710098 (smc_ib_devices.mutex){+.+.}-{3:3}, at: smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
other info that might help us debug this:
context-{4:4}
6 locks held by syz-executor.3/17974:
#0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
#1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
#2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
#3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
#4: ffff8880482c85c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
#5: ffff8880230a4118 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
stack backtrace:
CPU: 1 PID: 17974 Comm: syz-executor.3 Tainted: G W 5.17.0-rc3-syzkaller-00170-gc832962ac972 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_lock_invalid_wait_context kernel/locking/lockdep.c:4678 [inline]
check_wait_context kernel/locking/lockdep.c:4739 [inline]
__lock_acquire.cold+0x213/0x3ab kernel/locking/lockdep.c:4977
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
__mutex_lock_common kernel/locking/mutex.c:600 [inline]
__mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
smc_ib_add_dev+0x4d7/0x900 net/smc/smc_ib.c:940
add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:720
enable_device_and_get+0x1cd/0x3b0 drivers/infiniband/core/device.c:1331
ib_register_device drivers/infiniband/core/device.c:1419 [inline]
ib_register_device+0x814/0xaf0 drivers/infiniband/core/device.c:1365
rxe_register_device+0x2fe/0x3b0 drivers/infiniband/sw/rxe/rxe_verbs.c:1146
rxe_add+0x1331/0x1710 drivers/infiniband/sw/rxe/rxe.c:246
rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:538
rxe_newlink drivers/infiniband/sw/rxe/rxe.c:268 [inline]
rxe_newlink+0xa9/0xd0 drivers/infiniband/sw/rxe/rxe.c:249
nldev_newlink+0x30a/0x560 drivers/infiniband/core/nldev.c:1717
rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f909305f059
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f90919d4168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f9093171f60 RCX: 00007f909305f059
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000004
RBP: 00007f90930b908d R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fff171c256f R14: 00007f90919d4300 R15: 0000000000022000
</TASK>
smc: ib device syz1 port 1 has pnetid SYZ2 (user defined)
lo speed is unknown, defaulting to 1000
lo speed is unknown, defaulting to 1000
lo speed is unknown, defaulting to 1000
lo speed is unknown, defaulting to 1000
lo speed is unknown, defaulting to 1000
lo speed is unknown, defaulting to 1000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


2022-02-18 00:08:49

by Fabio M. De Francesco

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

On Thursday, 17 February 2022 17:41:22 CET syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: c832962ac972 net: bridge: multicast: notify switchdev driv..
> git tree: net
> console output: https://syzkaller.appspot.com/x/log.txt?x=16b157bc700000
> kernel config: https://syzkaller.appspot.com/x/.config?x=266de9da75c71a45
> dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> infiniband syz1: set down
> infiniband syz1: added lo
> RDS/IB: syz1: added
> smc: adding ib device syz1 with port count 1
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 17974, name: syz-executor.3
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> 6 locks held by syz-executor.3/17974:
> #0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> #1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
> #2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
> #3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
> #4: ffff8880482c85c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
> #5: ffff8880230a4118 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
> Preemption disabled at:
> [<0000000000000000>] 0x0
> CPU: 1 PID: 17974 Comm: syz-executor.3 Not tainted 5.17.0-rc3-syzkaller-00170-gc832962ac972 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> __might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
> __mutex_lock_common kernel/locking/mutex.c:577 [inline]
> __mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
> smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164

If I recall correctly, read_lock() disables preemption.

smc_pnetid_by_table_ib() takes read_lock() and then calls smc_pnet_apply_ib()
which, in turn, calls mutex_lock(&smc_ib_devices.mutex). Therefore the code
acquires a mutex while in atomic context and we get a sleeping-in-atomic-context
(SAC) bug.

Even if my argument is correct, I don't know whether the read_lock()
in smc_pnetid_by_table_ib() can be converted to a sleeping lock such as a mutex or
a semaphore.
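
To make the argument concrete, here is a minimal userspace sketch of the
locking pattern described above. It is only a model, not kernel code: every
model_* function, the preempt_count and violations variables, and the
buggy_lookup()/fixed_lookup() helpers are hypothetical names invented for this
illustration. The only assumption taken from the report is the ordering: a
spinning rwlock (pnettable->lock) is held while a mutex
(smc_ib_devices.mutex) is acquired.

```c
#include <assert.h>

/* Userspace model only: these are NOT the kernel's real primitives. */
static int preempt_count;   /* stands in for the task's preemption counter */
static int violations;      /* "sleeping function in atomic context" events */

/* A spinning rwlock disables preemption while it is held. */
static void model_read_lock(void)
{
    preempt_count++;
}

static void model_read_unlock(void)
{
    assert(preempt_count > 0);  /* unlock must pair with a prior lock */
    preempt_count--;
}

/* A mutex may sleep, so taking it with preemption disabled is a bug. */
static void model_mutex_lock(void)
{
    if (preempt_count != 0)
        violations++;
}

static void model_mutex_unlock(void)
{
}

/* Shape of the reported path: mutex taken while the rwlock is held. */
static int buggy_lookup(void)
{
    violations = 0;
    model_read_lock();      /* pnettable->lock in the report */
    model_mutex_lock();     /* smc_ib_devices.mutex in the report */
    model_mutex_unlock();
    model_read_unlock();
    return violations;
}

/* Shape of the discussed conversion: the outer lock is a sleeping lock too. */
static int fixed_lookup(void)
{
    violations = 0;
    model_mutex_lock();     /* outer lock, converted to a mutex */
    model_mutex_lock();     /* inner lock, unchanged */
    model_mutex_unlock();
    model_mutex_unlock();
    return violations;
}
```

In this model buggy_lookup() records one violation and fixed_lookup() records
none. Whether the conversion is safe in the real code depends on every other
path that takes pnettable->lock, which the model does not cover.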

Any comment?

Thanks,

Fabio M. De Francesco



2022-02-18 00:17:18

by syzbot

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

syzbot has found a reproducer for the following issue on:

HEAD commit: 5740d0689096 net: sched: limit TC_ACT_REPEAT loops
git tree: net
console output: https://syzkaller.appspot.com/x/log.txt?x=1474360e700000
kernel config: https://syzkaller.appspot.com/x/.config?x=88e226f0197aeba5
dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13dd93f2700000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16a497e2700000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

infiniband syz1: set active
infiniband syz1: added lo
RDS/IB: syz1: added
smc: adding ib device syz1 with port count 1
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3589, name: syz-executor180
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
6 locks held by syz-executor180/3589:
#0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
#1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
#2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
#3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
#4: ffff8880790445c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
#5: ffff88814a29c818 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
Preemption disabled at:
[<0000000000000000>] 0x0
CPU: 0 PID: 3589 Comm: syz-executor180 Not tainted 5.17.0-rc3-syzkaller-00174-g5740d0689096 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
__might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
__mutex_lock_common kernel/locking/mutex.c:577 [inline]
__mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
smc_ib_add_dev+0x4d7/0x900 net/smc/smc_ib.c:940
add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:720
enable_device_and_get+0x1cd/0x3b0 drivers/infiniband/core/device.c:1331
ib_register_device drivers/infiniband/core/device.c:1419 [inline]
ib_register_device+0x814/0xaf0 drivers/infiniband/core/device.c:1365
rxe_register_device+0x2fe/0x3b0 drivers/infiniband/sw/rxe/rxe_verbs.c:1146
rxe_add+0x1331/0x1710 drivers/infiniband/sw/rxe/rxe.c:246
rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:538
rxe_newlink drivers/infiniband/sw/rxe/rxe.c:268 [inline]
rxe_newlink+0xa9/0xd0 drivers/infiniband/sw/rxe/rxe.c:249
nldev_newlink+0x30a/0x560 drivers/infiniband/core/nldev.c:1717
rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7ef25bed59
Code: 28 c3 e8 5a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcd0ce91d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ef25bed59
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
RBP: 00007f7ef25827c0 R08: 0000000000000014 R09: 0000000000000000
R10: 0000000000000041 R11: 0000000000000246 R12: 00007f7ef2582850
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>

=============================
[ BUG: Invalid wait context ]
5.17.0-rc3-syzkaller-00174-g5740d0689096 #0 Tainted: G W
-----------------------------
syz-executor180/3589 is trying to lock:
ffffffff8d7100d8 (smc_ib_devices.mutex){+.+.}-{3:3}, at: smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
other info that might help us debug this:
context-{4:4}
6 locks held by syz-executor180/3589:
#0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
#1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
#2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
#3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
#4: ffff8880790445c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
#5: ffff88814a29c818 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
stack backtrace:
CPU: 0 PID: 3589 Comm: syz-executor180 Tainted: G W 5.17.0-rc3-syzkaller-00174-g5740d0689096 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_lock_invalid_wait_context kernel/locking/lockdep.c:4678 [inline]
check_wait_context kernel/locking/lockdep.c:4739 [inline]
__lock_acquire.cold+0x213/0x3ab kernel/locking/lockdep.c:4977
lock_acquire kernel/locking/lockdep.c:5639 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
__mutex_lock_common kernel/locking/mutex.c:600 [inline]
__mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
smc_ib_add_dev+0x4d7/0x900 net/smc/smc_ib.c:940
add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:720
enable_device_and_get+0x1cd/0x3b0 drivers/infiniband/core/device.c:1331
ib_register_device drivers/infiniband/core/device.c:1419 [inline]
ib_register_device+0x814/0xaf0 drivers/infiniband/core/device.c:1365
rxe_register_device+0x2fe/0x3b0 drivers/infiniband/sw/rxe/rxe_verbs.c:1146
rxe_add+0x1331/0x1710 drivers/infiniband/sw/rxe/rxe.c:246
rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:538
rxe_newlink drivers/infiniband/sw/rxe/rxe.c:268 [inline]
rxe_newlink+0xa9/0xd0 drivers/infiniband/sw/rxe/rxe.c:249
nldev_newlink+0x30a/0x560 drivers/infiniband/core/nldev.c:1717
rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:725
____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
___sys_sendmsg+0xf3/0x170 net/socket.c:2467
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7ef25bed59
Code: 28 c3 e8 5a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcd0ce91d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ef25bed59
RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
RBP: 00007f7ef25827c0 R08: 0000000000000014 R09: 0000000000000000
R10: 0000000000000041 R11: 0000000000000246 R12: 00007f7ef2582850
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
smc: ib device syz1 port 1 has pnetid SYZ2 (user defined)

2022-02-21 09:29:21

by Tony Lu

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

On Thu, Feb 17, 2022 at 07:05:31PM +0100, Fabio M. De Francesco wrote:
> On Thursday, 17 February 2022 17:41:22 CET syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: c832962ac972 net: bridge: multicast: notify switchdev driv..
> > git tree: net
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16b157bc700000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=266de9da75c71a45
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > infiniband syz1: set down
> > infiniband syz1: added lo
> > RDS/IB: syz1: added
> > smc: adding ib device syz1 with port count 1
> > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 17974, name: syz-executor.3
> > preempt_count: 1, expected: 0
> > RCU nest depth: 0, expected: 0
> > 6 locks held by syz-executor.3/17974:
> > #0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> > #1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
> > #2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
> > #3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
> > #4: ffff8880482c85c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
> > #5: ffff8880230a4118 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
> > Preemption disabled at:
> > [<0000000000000000>] 0x0
> > CPU: 1 PID: 17974 Comm: syz-executor.3 Not tainted 5.17.0-rc3-syzkaller-00170-gc832962ac972 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > __might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
> > __mutex_lock_common kernel/locking/mutex.c:577 [inline]
> > __mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
> > smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> > smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
>
> If I recall it well, read_lock() disables preemption.
>
> smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
> which, in turn, calls mutex_lock(&smc_ib_devices.mutex). Therefore the code
> acquires a mutex while in atomic and we get a SAC bug.
>
> Actually, even if my argument is correct(?), I don't know if the read_lock()
> in smc_pnetid_by_table_ib() can be converted to a sleeping lock like a mutex or
> a semaphore.

I think it is okay to use a mutex, because this path is not hot and nothing
here requires a spinlock. pnettable is accessed from the netlink, syscall,
and netdevice notifier paths.

2022-02-22 04:40:29

by Tony Lu

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

On Thu, Feb 17, 2022 at 07:05:31PM +0100, Fabio M. De Francesco wrote:
> On Thursday, 17 February 2022 17:41:22 CET syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: c832962ac972 net: bridge: multicast: notify switchdev driv..
> > git tree: net
> > console output: https://syzkaller.appspot.com/x/log.txt?x=16b157bc700000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=266de9da75c71a45
> > dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > infiniband syz1: set down
> > infiniband syz1: added lo
> > RDS/IB: syz1: added
> > smc: adding ib device syz1 with port count 1
> > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
> > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 17974, name: syz-executor.3
> > preempt_count: 1, expected: 0
> > RCU nest depth: 0, expected: 0
> > 6 locks held by syz-executor.3/17974:
> > #0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> > #1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
> > #2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
> > #3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
> > #4: ffff8880482c85c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
> > #5: ffff8880230a4118 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
> > Preemption disabled at:
> > [<0000000000000000>] 0x0
> > CPU: 1 PID: 17974 Comm: syz-executor.3 Not tainted 5.17.0-rc3-syzkaller-00170-gc832962ac972 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > Call Trace:
> > <TASK>
> > __dump_stack lib/dump_stack.c:88 [inline]
> > dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > __might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
> > __mutex_lock_common kernel/locking/mutex.c:577 [inline]
> > __mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
> > smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> > smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
>
> If I recall it well, read_lock() disables preemption.
>
> smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
> which, in turn, calls mutex_lock(&smc_ib_devices.mutex). Therefore the code
> acquires a mutex while in atomic and we get a SAC bug.
>
> Actually, even if my argument is correct(?), I don't know if the read_lock()
> in smc_pnetid_by_table_ib() can be converted to a sleeping lock like a mutex or
> a semaphore.

See my earlier email above. I think it is safe to convert the read_lock() to
a mutex, as is already done for smc_ib_devices.mutex.

Thank you,
Tony Lu

2022-02-22 05:04:55

by Fabio M. De Francesco

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

On Monday, 21 February 2022 10:18:59 CET Tony Lu wrote:
> On Thu, Feb 17, 2022 at 07:05:31PM +0100, Fabio M. De Francesco wrote:
> > On Thursday, 17 February 2022 17:41:22 CET syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> > >
> > > HEAD commit: c832962ac972 net: bridge: multicast: notify switchdev driv..
> > > git tree: net
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=16b157bc700000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=266de9da75c71a45
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
> > > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > >
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > >
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: [email protected]
> > >
> > > infiniband syz1: set down
> > > infiniband syz1: added lo
> > > RDS/IB: syz1: added
> > > smc: adding ib device syz1 with port count 1
> > > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
> > > in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 17974, name: syz-executor.3
> > > preempt_count: 1, expected: 0
> > > RCU nest depth: 0, expected: 0
> > > 6 locks held by syz-executor.3/17974:
> > > #0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> > > #1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
> > > #2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
> > > #3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
> > > #4: ffff8880482c85c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
> > > #5: ffff8880230a4118 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
> > > Preemption disabled at:
> > > [<0000000000000000>] 0x0
> > > CPU: 1 PID: 17974 Comm: syz-executor.3 Not tainted 5.17.0-rc3-syzkaller-00170-gc832962ac972 #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> > > Call Trace:
> > > <TASK>
> > > __dump_stack lib/dump_stack.c:88 [inline]
> > > dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> > > __might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
> > > __mutex_lock_common kernel/locking/mutex.c:577 [inline]
> > > __mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
> > > smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> > > smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
> >
> > If I recall it well, read_lock() disables preemption.
> >
> > smc_pnetid_by_table_ib() uses read_lock() and then it calls smc_pnet_apply_ib()
> > which, in turn, calls mutex_lock(&smc_ib_devices.mutex). Therefore the code
> > acquires a mutex while in atomic and we get a SAC bug.
> >
> > Actually, even if my argument is correct(?), I don't know if the read_lock()
> > in smc_pnetid_by_table_ib() can be converted to a sleeping lock like a mutex or
> > a semaphore.
>
> Take the email above. I think it is safe to convert read_lock() to
> mutex, which is already used by smc_ib_devices.mutex.

Thanks for your reply.

I have noticed that the "pnettable->lock" rwlock is acquired several times
in different functions of net/smc/smc_pnet.c. smc_pnetid_by_table_ib() is just one
of many functions that acquire that rwlock.

Therefore, my question is... are you _really_ sure that "pnettable->lock" can be
safely converted to a mutex everywhere in net/smc?

I haven't read _all_ the paths that lead to {write,read}_lock(&pnettable->lock) in
the net/smc code.

I think that before submitting that patch I should carefully read the code and check
_all_ the paths, unless you can confirm that the conversion is safe everywhere. If
you can answer my question, I can work on a patch by this evening (CET time zone)
and, obviously, give you proper credit.

Thank you,

Fabio M. De Francesco

>
> Thank you,
> Tony Lu



2022-02-23 14:17:37

by syzbot

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

Hello,

syzbot tried to test the proposed patch but the build/boot failed:

net/smc/smc_pnet.h:32:2: error: unknown type name 'mutex'
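
A C compiler reports this kind of error when a struct tag is used as if it
were a type name. The tested patch is not shown here, so the field
declaration below is an assumption about what line 32 might have looked like,
and struct pnettable_model with its helpers is hypothetical. The struct mutex
and mutex_*() definitions are user-space stand-ins so that the sketch builds
outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in; in the kernel, struct mutex comes from <linux/mutex.h>. */
struct mutex {
	bool locked;
};

static void mutex_init(struct mutex *m)
{
	m->locked = false;
}

static void mutex_lock(struct mutex *m)
{
	assert(!m->locked);	/* single-threaded toy: no contention possible */
	m->locked = true;
}

static void mutex_unlock(struct mutex *m)
{
	assert(m->locked);
	m->locked = false;
}

/* Hypothetical table, loosely modelled on the idea of a locked pnet table. */
struct pnettable_model {
	/* mutex lock;		would fail: unknown type name 'mutex' */
	struct mutex lock;	/* the tag has to be spelled with 'struct' */
	int entries;
};

static void pnettable_model_init(struct pnettable_model *t)
{
	mutex_init(&t->lock);
	t->entries = 0;
}

/* Adds one entry under the lock and returns the new entry count. */
static int pnettable_model_add(struct pnettable_model *t)
{
	int n;

	mutex_lock(&t->lock);
	n = ++t->entries;
	mutex_unlock(&t->lock);
	return n;
}
```

There is no typedef named mutex in this sketch, so "mutex lock;" cannot
compile, while "struct mutex lock;" does.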


Tested on:

commit: 5c1ee569 Merge branch 'for-5.17-fixes' of git://git.ke..
git tree: upstream
dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=116231fe700000

2022-02-23 15:41:36

by syzbot

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: [email protected]

Tested on:

commit: 5c1ee569 Merge branch 'for-5.17-fixes' of git://git.ke..
git tree: upstream
kernel config: https://syzkaller.appspot.com/x/.config?x=15187fc11a461d83
dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch: https://syzkaller.appspot.com/x/patch.diff?x=15bc3696700000

Note: testing is done by a robot and is best-effort only.

2022-02-23 16:35:13

by Fabio M. De Francesco

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

On Thursday, 17 February 2022 19:13:19 CET syzbot wrote:
> syzbot has found a reproducer for the following issue on:
>
> HEAD commit: 5740d0689096 net: sched: limit TC_ACT_REPEAT loops
> git tree: net
> console output: https://syzkaller.appspot.com/x/log.txt?x=1474360e700000
> kernel config: https://syzkaller.appspot.com/x/.config?x=88e226f0197aeba5
> dashboard link: https://syzkaller.appspot.com/bug?extid=4f322a6d84e991c38775
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=13dd93f2700000
> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=16a497e2700000
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> infiniband syz1: set active
> infiniband syz1: added lo
> RDS/IB: syz1: added
> smc: adding ib device syz1 with port count 1
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:577
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3589, name: syz-executor180
> preempt_count: 1, expected: 0
> RCU nest depth: 0, expected: 0
> 6 locks held by syz-executor180/3589:
> #0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> #1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
> #2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
> #3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
> #4: ffff8880790445c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
> #5: ffff88814a29c818 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
> Preemption disabled at:
> [<0000000000000000>] 0x0
> CPU: 0 PID: 3589 Comm: syz-executor180 Not tainted 5.17.0-rc3-syzkaller-00174-g5740d0689096 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> __might_resched.cold+0x222/0x26b kernel/sched/core.c:9576
> __mutex_lock_common kernel/locking/mutex.c:577 [inline]
> __mutex_lock+0x9f/0x12f0 kernel/locking/mutex.c:733
> smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
> smc_ib_add_dev+0x4d7/0x900 net/smc/smc_ib.c:940
> add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:720
> enable_device_and_get+0x1cd/0x3b0 drivers/infiniband/core/device.c:1331
> ib_register_device drivers/infiniband/core/device.c:1419 [inline]
> ib_register_device+0x814/0xaf0 drivers/infiniband/core/device.c:1365
> rxe_register_device+0x2fe/0x3b0 drivers/infiniband/sw/rxe/rxe_verbs.c:1146
> rxe_add+0x1331/0x1710 drivers/infiniband/sw/rxe/rxe.c:246
> rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:538
> rxe_newlink drivers/infiniband/sw/rxe/rxe.c:268 [inline]
> rxe_newlink+0xa9/0xd0 drivers/infiniband/sw/rxe/rxe.c:249
> nldev_newlink+0x30a/0x560 drivers/infiniband/core/nldev.c:1717
> rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
> rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
> rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
> netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
> netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
> netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
> sock_sendmsg_nosec net/socket.c:705 [inline]
> sock_sendmsg+0xcf/0x120 net/socket.c:725
> ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
> ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
> __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f7ef25bed59
> Code: 28 c3 e8 5a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffcd0ce91d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ef25bed59
> RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
> RBP: 00007f7ef25827c0 R08: 0000000000000014 R09: 0000000000000000
> R10: 0000000000000041 R11: 0000000000000246 R12: 00007f7ef2582850
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
>
> =============================
> [ BUG: Invalid wait context ]
> 5.17.0-rc3-syzkaller-00174-g5740d0689096 #0 Tainted: G W
> -----------------------------
> syz-executor180/3589 is trying to lock:
> ffffffff8d7100d8 (smc_ib_devices.mutex){+.+.}-{3:3}, at: smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> other info that might help us debug this:
> context-{4:4}
> 6 locks held by syz-executor180/3589:
> #0: ffffffff90865838 (&rdma_nl_types[idx].sem){.+.+}-{3:3}, at: rdma_nl_rcv_msg+0x161/0x690 drivers/infiniband/core/netlink.c:164
> #1: ffffffff8d04edf0 (link_ops_rwsem){++++}-{3:3}, at: nldev_newlink+0x25d/0x560 drivers/infiniband/core/nldev.c:1707
> #2: ffffffff8d03e650 (devices_rwsem){++++}-{3:3}, at: enable_device_and_get+0xfc/0x3b0 drivers/infiniband/core/device.c:1321
> #3: ffffffff8d03e510 (clients_rwsem){++++}-{3:3}, at: enable_device_and_get+0x15b/0x3b0 drivers/infiniband/core/device.c:1329
> #4: ffff8880790445c0 (&device->client_data_rwsem){++++}-{3:3}, at: add_client_context+0x3d0/0x5e0 drivers/infiniband/core/device.c:718
> #5: ffff88814a29c818 (&pnettable->lock){++++}-{2:2}, at: smc_pnetid_by_table_ib+0x18c/0x470 net/smc/smc_pnet.c:1159
> stack backtrace:
> CPU: 0 PID: 3589 Comm: syz-executor180 Tainted: G W 5.17.0-rc3-syzkaller-00174-g5740d0689096 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
> print_lock_invalid_wait_context kernel/locking/lockdep.c:4678 [inline]
> check_wait_context kernel/locking/lockdep.c:4739 [inline]
> __lock_acquire.cold+0x213/0x3ab kernel/locking/lockdep.c:4977
> lock_acquire kernel/locking/lockdep.c:5639 [inline]
> lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5604
> __mutex_lock_common kernel/locking/mutex.c:600 [inline]
> __mutex_lock+0x12f/0x12f0 kernel/locking/mutex.c:733
> smc_pnet_apply_ib+0x28/0x160 net/smc/smc_pnet.c:251
> smc_pnetid_by_table_ib+0x2ae/0x470 net/smc/smc_pnet.c:1164
> smc_ib_add_dev+0x4d7/0x900 net/smc/smc_ib.c:940
> add_client_context+0x405/0x5e0 drivers/infiniband/core/device.c:720
> enable_device_and_get+0x1cd/0x3b0 drivers/infiniband/core/device.c:1331
> ib_register_device drivers/infiniband/core/device.c:1419 [inline]
> ib_register_device+0x814/0xaf0 drivers/infiniband/core/device.c:1365
> rxe_register_device+0x2fe/0x3b0 drivers/infiniband/sw/rxe/rxe_verbs.c:1146
> rxe_add+0x1331/0x1710 drivers/infiniband/sw/rxe/rxe.c:246
> rxe_net_add+0x8c/0xe0 drivers/infiniband/sw/rxe/rxe_net.c:538
> rxe_newlink drivers/infiniband/sw/rxe/rxe.c:268 [inline]
> rxe_newlink+0xa9/0xd0 drivers/infiniband/sw/rxe/rxe.c:249
> nldev_newlink+0x30a/0x560 drivers/infiniband/core/nldev.c:1717
> rdma_nl_rcv_msg+0x36d/0x690 drivers/infiniband/core/netlink.c:195
> rdma_nl_rcv_skb drivers/infiniband/core/netlink.c:239 [inline]
> rdma_nl_rcv+0x2ee/0x430 drivers/infiniband/core/netlink.c:259
> netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
> netlink_unicast+0x539/0x7e0 net/netlink/af_netlink.c:1343
> netlink_sendmsg+0x904/0xe00 net/netlink/af_netlink.c:1919
> sock_sendmsg_nosec net/socket.c:705 [inline]
> sock_sendmsg+0xcf/0x120 net/socket.c:725
> ____sys_sendmsg+0x6e8/0x810 net/socket.c:2413
> ___sys_sendmsg+0xf3/0x170 net/socket.c:2467
> __sys_sendmsg+0xe5/0x1b0 net/socket.c:2496
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f7ef25bed59
> Code: 28 c3 e8 5a 14 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffcd0ce91d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ef25bed59
> RDX: 0000000000000000 RSI: 00000000200000c0 RDI: 0000000000000005
> RBP: 00007f7ef25827c0 R08: 0000000000000014 R09: 0000000000000000
> R10: 0000000000000041 R11: 0000000000000246 R12: 00007f7ef2582850
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> </TASK>
> smc: ib device syz1 port 1 has pnetid SYZ2 (user defined)
>
>
As confirmed by Tony Lu (thanks!), replace rwlocks with mutexes for locking
"struct smc_pnettable".

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Fabio M. De Francesco


Attachments:
diff (6.40 kB)

2022-02-23 17:06:20

by Fabio M. De Francesco

Subject: Re: [syzbot] BUG: sleeping function called from invalid context in smc_pnet_apply_ib

On Thursday, 17 February 2022 19:13:19 CET syzbot wrote:
> [...]
As confirmed by Tony Lu (thanks!), replace rwlocks with mutexes for locking
"struct smc_pnettable".

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

Fabio M. De Francesco

P.S.: I have just sent another diff, but it contains a silly mistake and does not compile.


Attachments:
diff (6.40 kB)