Hi,
I'm seeing the lockdep splat below in CI. I think this is because
nbd_open() is called with disk->open_mutex held and then acquires
nbd_index_mutex. However, nbd_put() takes nbd_index_mutex first and
then calls del_gendisk(), which locks disk->open_mutex, so the lock
order is reversed.
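
To illustrate the ordering problem, here is a toy userspace sketch. It
is not the nbd code and not lockdep's real algorithm: the enum, the
dep[] table and the helpers record_order(), open_path() and put_path()
are made-up names, and it only catches a direct two-lock inversion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel locks named in the report. */
enum lock_id { NBD_INDEX_MUTEX, DISK_OPEN_MUTEX, NR_LOCKS };

/* dep[a][b] is true once some path took lock b while holding lock a. */
static bool dep[NR_LOCKS][NR_LOCKS];

/*
 * Record that 'next' is acquired while 'held' is already held.
 * Returns false if the opposite order (next held, then 'held' taken)
 * was recorded earlier, i.e. a possible ABBA deadlock.
 */
static bool record_order(enum lock_id held, enum lock_id next)
{
	assert(held != next);	/* recursive locking is out of scope here */

	if (dep[next][held])
		return false;	/* inversion: reverse dependency already exists */

	dep[held][next] = true;
	return true;
}

/* Open path: disk->open_mutex is held, then nbd_index_mutex is taken. */
static bool open_path(void)
{
	return record_order(DISK_OPEN_MUTEX, NBD_INDEX_MUTEX);
}

/* Teardown path: nbd_index_mutex is held, then disk->open_mutex is taken. */
static bool put_path(void)
{
	return record_order(NBD_INDEX_MUTEX, DISK_OPEN_MUTEX);
}
```

Calling open_path() first records open_mutex -> nbd_index_mutex, so a
later put_path() returns false. This corresponds to dependency #1
followed by the attempted #0 in the splat below.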
WARNING: possible circular locking dependency detected
5.14.0-20210816.rc5.git0.04a03f7da6c2.300.fc34.s390x+debug #1 Not tainted
------------------------------------------------------
modprobe/17864 is trying to acquire lock:
00000001dea24d28 (&disk->open_mutex){+.+.}-{3:3}, at: del_gendisk+0x64/0x210
but task is already holding lock:
000003ff805fd6e8 (nbd_index_mutex){+.+.}-{3:3}, at: refcount_dec_and_mutex_lock+0x7e/0x110
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (nbd_index_mutex){+.+.}-{3:3}:
validate_chain+0x9ca/0xde8
__lock_acquire+0x64c/0xc40
lock_acquire.part.0+0xec/0x258
lock_acquire+0xb0/0x200
__mutex_lock+0xa2/0x8d8
mutex_lock_nested+0x32/0x40
nbd_open+0x30/0x248 [nbd]
blkdev_get_whole+0x38/0x128
blkdev_get_by_dev+0xcc/0x400
blkdev_open+0x7a/0xd8
do_dentry_open+0x19e/0x390
do_open+0x2e0/0x458
path_openat+0xec/0x2a8
do_filp_open+0x90/0x130
do_sys_openat2+0xa8/0x168
do_sys_open+0x62/0x90
__do_syscall+0x1c2/0x1f0
system_call+0x78/0xa0
-> #0 (&disk->open_mutex){+.+.}-{3:3}:
check_noncircular+0x168/0x188
check_prev_add+0xe0/0xed8
validate_chain+0x9ca/0xde8
__lock_acquire+0x64c/0xc40
lock_acquire.part.0+0xec/0x258
lock_acquire+0xb0/0x200
__mutex_lock+0xa2/0x8d8
mutex_lock_nested+0x32/0x40
del_gendisk+0x64/0x210
nbd_put.part.0+0x46/0x98 [nbd]
nbd_cleanup+0xde/0x118 [nbd]
__do_sys_delete_module+0x19a/0x2a8
__do_syscall+0x1c2/0x1f0
system_call+0x78/0xa0
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(nbd_index_mutex);
                               lock(&disk->open_mutex);
                               lock(nbd_index_mutex);
  lock(&disk->open_mutex);
*** DEADLOCK ***
1 lock held by modprobe/17864:
#0: 000003ff805fd6e8 (nbd_index_mutex){+.+.}-{3:3}, at: refcount_dec_and_mutex_lock+0x7e/0x110
stack backtrace:
CPU: 1 PID: 17864 Comm: modprobe Not tainted 5.14.0-20210816.rc5.git0.04a03f7da6c2.300.fc34.s390x+debug #1
Hardware name: IBM 8561 T01 703 (LPAR)
Call Trace:
[<000000008f735098>] show_stack+0x90/0xf8
[<000000008f746d56>] dump_stack_lvl+0x8e/0xc8
[<000000008eb3a3b0>] check_noncircular+0x168/0x188
[<000000008eb3b470>] check_prev_add+0xe0/0xed8
[<000000008eb3cc32>] validate_chain+0x9ca/0xde8
[<000000008eb3fc2c>] __lock_acquire+0x64c/0xc40
[<000000008eb3e834>] lock_acquire.part.0+0xec/0x258
[<000000008eb3ea50>] lock_acquire+0xb0/0x200
[<000000008f75591a>] __mutex_lock+0xa2/0x8d8
[<000000008f756182>] mutex_lock_nested+0x32/0x40
[<000000008f2538cc>] del_gendisk+0x64/0x210
[<000003ff805f4936>] nbd_put.part.0+0x46/0x98 [nbd]
[<000003ff805f965e>] nbd_cleanup+0xde/0x118 [nbd]
[<000000008eba7a52>] __do_sys_delete_module+0x19a/0x2a8
[<000000008f74a67a>] __do_syscall+0x1c2/0x1f0
[<000000008f75cf38>] system_call+0x78/0xa0
INFO: lockdep is turned off.
On Thu, Aug 19, 2021 at 03:29:38PM +0800, Hillf Danton wrote:
> On Wed, 18 Aug 2021 09:10:49 +0200 Sven Schnelle wrote:
> > Hi,
> >
> > i'm seeing the lockdep splat below in CI. I think this is because
>
> Thanks for reporting it.
>
> > nbd_open is called with disk->open_mutex held, and acquires
> > nbd_index_mutex. However, nbd_put() first takes the nbd_index_lock,
> > and calls del_gendisk, which locks disk->open_mutex, so the order is
> > reversed.
>
> Right. See diff attached.
This is already fixed in linux-next.