2023-02-04 13:33:38

by Tetsuo Handa

Subject: Converting dev->mutex into dev->spinlock ?

Hello.

There is a long-standing deadlock problem in driver core code caused by
"struct device"->mutex being marked as "do not apply lockdep checks".

We can make this deadlock visible by applying [1], and we can confirm that
there is a deadlock problem that I think needs to be addressed in core code [2].

Also, since driver developers take it for granted that driver callback
functions can behave as if dev->mutex is not held (because the possibility
of deadlock was never reported), it would solve many deadlocks in driver
code if you could update the driver core code to avoid calling driver
callback functions with dev->mutex held (e.g. by replacing dev->mutex with
dev->spinlock and dev->atomic_flags). But I'm not familiar enough with that
code to propose such a change...

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]


2023-02-04 13:47:33

by Greg Kroah-Hartman

Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sat, Feb 04, 2023 at 10:32:11PM +0900, Tetsuo Handa wrote:
> Hello.
>
> There is a long-standing deadlock problem in driver core code caused by
> "struct device"->mutex being marked as "do not apply lockdep checks".

The marking of a lock does not cause a deadlock problem, so what do you
mean exactly by this? Where is the actual deadlock?

> We can make this deadlock visible by applying [1], and we can confirm that
> there is a deadlock problem that I think needs to be addressed in core code [2].

Any reason why you didn't cc: us on these patches?

> Also, since driver developers are taking it for granted that driver callback
> functions can behave as if dev->mutex is not held (because possibility of deadlock
> was never reported), it would solve many deadlocks in driver code if you can update
> driver core code to avoid calling driver callback functions with dev->mutex held
> (by e.g. replacing dev->mutex with dev->spinlock and dev->atomic_flags).
> But I'm not familiar enough to propose such change...

A driver developer should never be messing with the mutex of a device,
that's not for them to touch, that's the driver core's lock to touch,
right?

So I don't understand the real problem here. What subsystem is having
issues and what issues are they exactly?

And using a spinlock shouldn't change any locking deadlocks that I can
tell, so I don't understand the proposal, sorry.

thanks,

greg k-h

2023-02-04 14:21:44

by Tetsuo Handa

Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/04 22:47, Greg Kroah-Hartman wrote:
> On Sat, Feb 04, 2023 at 10:32:11PM +0900, Tetsuo Handa wrote:
>> Hello.
>>
>> There is a long-standing deadlock problem in driver core code caused by
>> "struct device"->mutex being marked as "do not apply lockdep checks".
>
> The marking of a lock does not cause a deadlock problem, so what do you
> mean exactly by this? Where is the actual deadlock?

A few examples:

https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
https://syzkaller.appspot.com/bug?extid=2e39bc6569d281acbcfb
https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174

>
>> We can make this deadlock visible by applying [1], and we can confirm that
>> there is a deadlock problem that I think needs to be addressed in core code [2].
>
> Any reason why you didn't cc: us on these patches?

We can't apply this "drivers/core: Remove lockdep_set_novalidate_class() usage" patch
until we fix all lockdep warnings that happen during the boot stage; otherwise
syzbot testing would stop working, which would be more painful than applying
this patch now.

Therefore, I tested this patch locally (it is not meant to be applied now),
and I got a lockdep warning in the perf_event code. I then got the next
lockdep warning, in the driver core code, when I tried a fix for the
perf_event code suggested by Peter Zijlstra. Since Peter confirmed that this
is the problem that led to commit 1704f47b50b5 ("lockdep: Add novalidate
class for dev->mutex conversion"), this time I'm reporting the problem to you
(so that you can propose a fix for the driver core code).


2023-02-04 14:34:34

by Greg Kroah-Hartman

Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sat, Feb 04, 2023 at 11:21:27PM +0900, Tetsuo Handa wrote:
> On 2023/02/04 22:47, Greg Kroah-Hartman wrote:
> > On Sat, Feb 04, 2023 at 10:32:11PM +0900, Tetsuo Handa wrote:
> >> Hello.
> >>
> >> There is a long-standing deadlock problem in driver core code caused by
> >> "struct device"->mutex being marked as "do not apply lockdep checks".
> >
> > The marking of a lock does not cause a deadlock problem, so what do you
> > mean exactly by this? Where is the actual deadlock?
>
> A few examples:
>
> https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
> https://syzkaller.appspot.com/bug?extid=2e39bc6569d281acbcfb
> https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174

Random links to syzkaller reports that are huge and not descriptive do
not actually persuade me, as I don't have the inclination to dig through
them, sorry.

Specific examples, with code, please.

> >> We can make this deadlock visible by applying [1], and we can confirm that
> >> there is a deadlock problem that I think needs to be addressed in core code [2].
> >
> > Any reason why you didn't cc: us on these patches?
>
> We can't apply this "drivers/core: Remove lockdep_set_novalidate_class() usage" patch

What patch is that? I do not see that in my inbox anywhere. I don't
even see it in my lkml archive, so I do not know what you are talking
about.

> until we fix all lockdep warnings that happen during the boot stage;

What lockdep warnings?

> otherwise syzbot testing can't work which is more painful than
> applying this patch now.

Again, I'm totally confused. What is the real bug/problem/issue here?

Where is the deadlock?

> Therefore, I locally tested this patch (in order not to be applied now).

What patch? I'm totally confused.

> And I got a lockdep warning on the perf_event code.

What warning?

> I got next lockdep warning on the driver core code when I tried a fix
> for the perf_event code suggested by Peter Zijlstra.

Again, what warning?

> Since Peter confirmed that this is a problem that led to commit
> 1704f47b50b5 ("lockdep: Add novalidate class for dev->mutex
> conversion"), this time I'm reporting this problem to you (so that you
> can propose a fix for the driver core code).

Again, I have no idea what the real problem is!

Please show me in the driver core code, where the deadlock is that needs
to be resolved. Without that, I can't answer anything...

totally and thoroughly confused,

greg k-h

2023-02-04 15:12:23

by Alan Stern

Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sat, Feb 04, 2023 at 10:32:11PM +0900, Tetsuo Handa wrote:
> Hello.
>
> There is a long-standing deadlock problem in driver core code caused by
> "struct device"->mutex being marked as "do not apply lockdep checks".

What exactly is the deadlock problem? Furthermore, how can skipping
some lockdep checks _cause_ a problem?

> We can make this deadlock visible by applying [1], and we can confirm that
> there is a deadlock problem that I think needs to be addressed in core code [2].

I don't understand why you think there is a deadlock problem. The splat
in [2] says "WARNING: possible recursive locking detected". This is
only a warning; it doesn't mean there really is a problem.

> Also, since driver developers are taking it for granted that driver callback
> functions can behave as if dev->mutex is not held (because possibility of deadlock
> was never reported),

What? I have no idea what developers you're talking about. I have
never heard of anyone taking this for granted. Certainly developers
working on the USB subsystem's core are well aware of dev->mutex
locking.

> it would solve many deadlocks in driver code if you can update

What deadlocks? If there are so many deadlocks floating around in
driver code, why haven't we heard about them before now?

> driver core code to avoid calling driver callback functions with dev->mutex held

We most definitely cannot do that. The driver core relies on mutual
exclusion.

> (by e.g. replacing dev->mutex with dev->spinlock and dev->atomic_flags).
> But I'm not familiar enough to propose such change...

Such a change cannot be made. Consider this: Driver callbacks often
need to sleep. But when a thread holds a spinlock, it is not allowed to
sleep. Therefore driver callbacks must not be invoked while a spinlock
is held.

Besides, how will replacing a mutex with a spinlock prevent any deadlock
problems? If the new locks get held at the same time as the old mutexes
were held, won't the same deadlocks occur?

Alan Stern

> [1] https://lkml.kernel.org/r/[email protected]
> [2] https://lkml.kernel.org/r/[email protected]
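
Alan's point that dev->mutex cannot be turned into a spinlock because driver
callbacks sleep can be illustrated with a small userspace Python model. This
is illustrative only (not kernel code); in a real kernel the rule is enforced
by might_sleep() and the per-CPU preemption counter, which spin_lock()
increments and mutex_lock() does not:

```python
preempt_count = 0  # models the per-CPU preemption counter

def might_sleep():
    # Models the kernel's debug check: sleeping in atomic context is a bug.
    if preempt_count > 0:
        raise RuntimeError("BUG: sleeping function called from invalid context")

class SpinLock:
    def __enter__(self):
        global preempt_count
        preempt_count += 1      # spin_lock() disables preemption
    def __exit__(self, *exc):
        global preempt_count
        preempt_count -= 1      # spin_unlock() re-enables it

class Mutex:
    def __enter__(self):
        might_sleep()           # mutex_lock() may itself sleep
    def __exit__(self, *exc):
        pass

def driver_callback():
    might_sleep()               # e.g. a GFP_KERNEL allocation, firmware wait
    return "probed"

def call_callback_under(lock):
    with lock:
        return driver_callback()
```

Calling the callback under the Mutex model succeeds, while calling it under
the SpinLock model trips the might_sleep() check, which is exactly why the
driver core cannot hold a spinlock across driver callbacks.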

2023-02-04 15:16:46

by Tetsuo Handa

Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/04 23:34, Greg Kroah-Hartman wrote:
>>>> We can make this deadlock visible by applying [1], and we can confirm that
>>>> there is a deadlock problem that I think needs to be addressed in core code [2].
>>>
>>> Any reason why you didn't cc: us on these patches?
>>
>> We can't apply this "drivers/core: Remove lockdep_set_novalidate_class() usage" patch
>
> What patch is that? I do not see that in my inbox anywhere. I don't
> even see it in my lkml archive, so I do not know what you are talking
> about.

Here is a copy. Please don't apply it to any git tree, or syzbot will fail to test kernels.

From f7ff56455ae7813768c6ab85e8e3db374122f32b Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <[email protected]>
Date: Mon, 23 Jan 2023 19:32:26 +0900
Subject: [PATCH] drivers/core: Remove lockdep_set_novalidate_class() usage

This patch experimentally removes the lockdep_set_novalidate_class() call
from device_initialize() introduced by commit 1704f47b50b5 ("lockdep:
Add novalidate class for dev->mutex conversion"), because that commit made
it impossible to find real deadlocks unless timing-dependent testing
happens to trigger a hung task like [1] and [2]. Let's see whether we can
find the remaining drivers that need separate lock classes, without causing
so many crashes that testing cannot continue.

[1] https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
[2] https://syzkaller.appspot.com/bug?extid=2e39bc6569d281acbcfb

Signed-off-by: Tetsuo Handa <[email protected]>
---
drivers/base/core.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index a3e14143ec0c..68189722e343 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2941,7 +2941,6 @@ void device_initialize(struct device *dev)
kobject_init(&dev->kobj, &device_ktype);
INIT_LIST_HEAD(&dev->dma_pools);
mutex_init(&dev->mutex);
- lockdep_set_novalidate_class(&dev->mutex);
spin_lock_init(&dev->devres_lock);
INIT_LIST_HEAD(&dev->devres_head);
device_pm_init(dev);
--
2.18.4


>
>> until we fix all lockdep warnings that happen during the boot stage;
>
> What lockdep warnings?

Here is an example that you will be able to observe by applying the patch above.

----------
[ 2.276394][ T9] Trying to unpack rootfs image as initramfs...
[ 2.276394][ T1] software IO TLB: mapped [mem 0x00000000bbed0000-0x00000000bfed0000] (64MB)
[ 2.276394][ T1] workingset: timestamp_bits=46 max_order=21 bucket_order=0
[ 2.276394][ T1] SGI XFS with ACLs, security attributes, verbose warnings, quota, no debug enabled
[ 2.276394][ T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 2.837244][ T1]
[ 2.837244][ T1] ============================================
[ 2.837244][ T1] WARNING: possible recursive locking detected
[ 2.837244][ T1] 6.2.0-rc5+ #10 Not tainted
[ 2.837244][ T1] --------------------------------------------
[ 2.837244][ T1] swapper/0/1 is trying to acquire lock:
[ 2.837244][ T1] ffff984dc3d50108 (&dev->mutex){+.+.}-{3:3}, at: __device_attach+0x35/0x1a0
[ 2.837244][ T1]
[ 2.837244][ T1] but task is already holding lock:
[ 2.837244][ T1] ffff984dc1b5e1b8 (&dev->mutex){+.+.}-{3:3}, at: __device_driver_lock+0x28/0x40
[ 2.837244][ T1]
[ 2.837244][ T1] other info that might help us debug this:
[ 2.837244][ T1] Possible unsafe locking scenario:
[ 2.837244][ T1]
[ 2.837244][ T1] CPU0
[ 2.837244][ T1] ----
[ 2.837244][ T1] lock(&dev->mutex);
[ 2.837244][ T1] lock(&dev->mutex);
[ 2.837244][ T1]
[ 2.837244][ T1] *** DEADLOCK ***
[ 2.837244][ T1]
[ 2.837244][ T1] May be due to missing lock nesting notation
[ 2.837244][ T1]
[ 2.837244][ T1] 1 lock held by swapper/0/1:
[ 2.837244][ T1] #0: ffff984dc1b5e1b8 (&dev->mutex){+.+.}-{3:3}, at: __device_driver_lock+0x28/0x40
[ 2.837244][ T1]
[ 2.837244][ T1] stack backtrace:
[ 2.837244][ T1] CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc5+ #10
[ 2.837244][ T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 2.837244][ T1] Call Trace:
[ 2.837244][ T1] <TASK>
[ 2.837244][ T1] dump_stack_lvl+0x49/0x5e
[ 2.837244][ T1] dump_stack+0x10/0x12
[ 2.837244][ T1] __lock_acquire.cold.73+0x12e/0x2c7
[ 2.837244][ T1] lock_acquire+0xc7/0x2e0
[ 2.837244][ T1] ? __device_attach+0x35/0x1a0
[ 2.837244][ T1] __mutex_lock+0x99/0xf00
[ 2.837244][ T1] ? __device_attach+0x35/0x1a0
[ 2.837244][ T1] ? __this_cpu_preempt_check+0x13/0x20
[ 2.837244][ T1] ? __device_attach+0x35/0x1a0
[ 2.837244][ T1] ? kobject_uevent_env+0x12f/0x770
[ 2.837244][ T1] mutex_lock_nested+0x16/0x20
[ 2.837244][ T1] ? mutex_lock_nested+0x16/0x20
[ 2.837244][ T1] __device_attach+0x35/0x1a0
[ 2.837244][ T1] device_initial_probe+0xe/0x10
[ 2.837244][ T1] bus_probe_device+0x9b/0xb0
[ 2.837244][ T1] device_add+0x3e1/0x900
[ 2.837244][ T1] ? __init_waitqueue_head+0x4a/0x70
[ 2.837244][ T1] device_register+0x15/0x20
[ 2.837244][ T1] pcie_portdrv_probe+0x3e3/0x670
[ 2.837244][ T1] ? trace_hardirqs_on+0x3b/0x100
[ 2.837244][ T1] pci_device_probe+0xa8/0x150
[ 2.837244][ T1] really_probe+0xd9/0x340
[ 2.837244][ T1] ? pm_runtime_barrier+0x52/0xb0
[ 2.837244][ T1] __driver_probe_device+0x78/0x170
[ 2.837244][ T1] driver_probe_device+0x1f/0x90
[ 2.837244][ T1] __driver_attach+0xaa/0x160
[ 2.837244][ T1] ? __device_attach_driver+0x100/0x100
[ 2.837244][ T1] bus_for_each_dev+0x75/0xb0
[ 2.837244][ T1] driver_attach+0x19/0x20
[ 2.837244][ T1] bus_add_driver+0x1be/0x210
[ 2.837244][ T1] ? dmi_pcie_pme_disable_msi+0x1f/0x1f
[ 2.837244][ T1] ? dmi_pcie_pme_disable_msi+0x1f/0x1f
[ 2.837244][ T1] ? rdinit_setup+0x27/0x27
[ 2.837244][ T1] driver_register+0x6b/0xc0
[ 2.837244][ T1] ? dmi_pcie_pme_disable_msi+0x1f/0x1f
[ 2.837244][ T1] __pci_register_driver+0x7c/0x80
[ 2.837244][ T1] pcie_portdrv_init+0x3d/0x45
[ 2.837244][ T1] do_one_initcall+0x58/0x300
[ 2.837244][ T1] ? rdinit_setup+0x27/0x27
[ 2.837244][ T1] ? rcu_read_lock_sched_held+0x4a/0x70
[ 2.837244][ T1] kernel_init_freeable+0x181/0x1d2
[ 2.837244][ T1] ? rest_init+0x190/0x190
[ 2.837244][ T1] kernel_init+0x15/0x120
[ 2.837244][ T1] ret_from_fork+0x1f/0x30
[ 2.837244][ T1] </TASK>
[ 4.126397][ T1] pcieport 0000:00:15.0: PME: Signaling with IRQ 24
[ 4.126397][ T1] pcieport 0000:00:15.0: pciehp: Slot #160 AttnBtn+ PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise- Interlock- NoCompl+ IbPresDis- LLActRep+
[ 4.126397][ T1] pcieport 0000:00:15.1: PME: Signaling with IRQ 25
----------

# ./scripts/faddr2line --list vmlinux __device_attach+0x35/0x1a0 __device_driver_lock+0x28/0x40
__device_attach+0x35/0x1a0:

__device_attach at drivers/base/dd.c:984
979 {
980 int ret = 0;
981 bool async = false;
982
983 device_lock(dev);
>984< if (dev->p->dead) {
985 goto out_unlock;
986 } else if (dev->driver) {
987 if (device_is_bound(dev)) {
988 ret = 1;
989 goto out_unlock;

__device_driver_lock+0x28/0x40:

__device_driver_lock at drivers/base/dd.c:1074
1069 static void __device_driver_lock(struct device *dev, struct device *parent)
1070 {
1071 if (parent && dev->bus->need_parent_lock)
1072 device_lock(parent);
1073 device_lock(dev);
>1074< }
1075
1076 /*
1077 * __device_driver_unlock - release locks needed to manipulate dev->drv
1078 * @dev: Device we will update driver info for
1079 * @parent: Parent device. Needed if the bus requires parent lock

# ./scripts/faddr2line vmlinux __device_attach+0x35/0x1a0 device_initial_probe+0xe/0x10 bus_probe_device+0x9b/0xb0 device_add+0x3e1/0x900 device_register+0x15/0x20 pcie_portdrv_probe+0x3e3/0x670 pci_device_probe+0xa8/0x150 really_probe+0xd9/0x340 __driver_probe_device+0x78/0x170 driver_probe_device+0x1f/0x90 __driver_attach+0xaa/0x160 bus_for_each_dev+0x75/0xb0 driver_attach+0x19/0x20 bus_add_driver+0x1be/0x210 driver_register+0x6b/0xc0
__device_attach+0x35/0x1a0:
__device_attach at drivers/base/dd.c:984

device_initial_probe+0xe/0x10:
device_initial_probe at drivers/base/dd.c:1058

bus_probe_device+0x9b/0xb0:
bus_probe_device at drivers/base/bus.c:487

device_add+0x3e1/0x900:
device_add at drivers/base/core.c:3485

device_register+0x15/0x20:
device_register at drivers/base/core.c:3560

pcie_portdrv_probe+0x3e3/0x670:
pcie_device_init at drivers/pci/pcie/portdrv.c:310
(inlined by) pcie_port_device_register at drivers/pci/pcie/portdrv.c:363
(inlined by) pcie_portdrv_probe at drivers/pci/pcie/portdrv.c:696

pci_device_probe+0xa8/0x150:
local_pci_probe at drivers/pci/pci-driver.c:324
(inlined by) pci_call_probe at drivers/pci/pci-driver.c:392
(inlined by) __pci_device_probe at drivers/pci/pci-driver.c:417
(inlined by) pci_device_probe at drivers/pci/pci-driver.c:460

really_probe+0xd9/0x340:
call_driver_probe at drivers/base/dd.c:560
(inlined by) really_probe at drivers/base/dd.c:639

__driver_probe_device+0x78/0x170:
__driver_probe_device at drivers/base/dd.c:778

driver_probe_device+0x1f/0x90:
driver_probe_device at drivers/base/dd.c:808

__driver_attach+0xaa/0x160:
__driver_attach at drivers/base/dd.c:1195

bus_for_each_dev+0x75/0xb0:
bus_for_each_dev at drivers/base/bus.c:300

driver_attach+0x19/0x20:
driver_attach at drivers/base/dd.c:1212

bus_add_driver+0x1be/0x210:
bus_add_driver at drivers/base/bus.c:619

driver_register+0x6b/0xc0:
driver_register at drivers/base/driver.c:246
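
The faddr2line output above shows __device_driver_lock() holding the parent
device's mutex while __device_attach() takes the child's mutex. Both are
instances of the one &dev->mutex lock class, and lockdep validates classes,
not instances, which is why it prints "possible recursive locking detected"
and "May be due to missing lock nesting notation". A userspace Python sketch
of that class-based check (a model, not kernel code), including how a
subclass annotation, as mutex_lock_nested() provides in the kernel, would
distinguish the two nesting levels:

```python
# Toy model of lockdep's class-based recursive-locking check.
# Lockdep tracks lock *classes*: taking two locks of the same
# (class, subclass) pair looks like self-deadlock, even when they are
# different instances (e.g. a parent and a child device's mutex).

held = []  # (lock_class, subclass) pairs currently held by this task

def lock(lock_class, subclass=0):
    if (lock_class, subclass) in held:
        return "WARNING: possible recursive locking detected"
    held.append((lock_class, subclass))
    return "ok"

def unlock(lock_class, subclass=0):
    held.remove((lock_class, subclass))

DEV_MUTEX = "&dev->mutex"  # one class shared by every struct device
```

Taking DEV_MUTEX for the parent and then again for the child warns, while
annotating the inner acquisition with subclass=1 (the moral equivalent of
mutex_lock_nested(&dev->mutex, SINGLE_DEPTH_NESTING)) does not.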

>
>> otherwise syzbot testing can't work which is more painful than
>> applying this patch now.
>
> Again, I'm totally confused. What is the real bug/problem/issue here?

Since the possibility of deadlock is not reported by lockdep, we can't find
real deadlocks unless khungtaskd reports them as hung tasks.

>
> Where is the deadlock?

In the driver core code (an example is shown above) and in many drivers
(an example is shown below). Since dev->mutex is hidden from lockdep checks,
real deadlocks cannot be reported until khungtaskd reports them as hung tasks.

----------
INFO: task syz-executor145:4505 blocked for more than 143 seconds.
Not tainted 6.1.0-rc5-syzkaller-00008-ge01d50cbd6ee #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor145 state:D stack:21896 pid:4505 ppid:3645 flags:0x00004002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5191 [inline]
__schedule+0x8c9/0xd70 kernel/sched/core.c:6503
schedule+0xcb/0x190 kernel/sched/core.c:6579
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6638
__mutex_lock_common+0xe4f/0x26e0 kernel/locking/mutex.c:679
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x17/0x20 kernel/locking/mutex.c:799
rfkill_unregister+0xcb/0x220 net/rfkill/core.c:1130
nfc_unregister_device+0xba/0x290 net/nfc/core.c:1167
virtual_ncidev_close+0x55/0x90 drivers/nfc/virtual_ncidev.c:166
__fput+0x3ba/0x880 fs/file_table.c:320
task_work_run+0x243/0x300 kernel/task_work.c:179
exit_task_work include/linux/task_work.h:38 [inline]
do_exit+0x664/0x2070 kernel/exit.c:820
do_group_exit+0x1fd/0x2b0 kernel/exit.c:950
__do_sys_exit_group kernel/exit.c:961 [inline]
__se_sys_exit_group kernel/exit.c:959 [inline]
__x64_sys_exit_group+0x3b/0x40 kernel/exit.c:959
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc8e3d92af9
RSP: 002b:00007fff2cfab0b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007fc8e3e06330 RCX: 00007fc8e3d92af9
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffffffffffffffc0 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000246 R12: 00007fc8e3e06330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
</TASK>
INFO: task syz-executor145:4516 blocked for more than 144 seconds.
Not tainted 6.1.0-rc5-syzkaller-00008-ge01d50cbd6ee #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor145 state:D stack:23096 pid:4516 ppid:3647 flags:0x00004004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5191 [inline]
__schedule+0x8c9/0xd70 kernel/sched/core.c:6503
schedule+0xcb/0x190 kernel/sched/core.c:6579
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6638
__mutex_lock_common+0xe4f/0x26e0 kernel/locking/mutex.c:679
__mutex_lock kernel/locking/mutex.c:747 [inline]
mutex_lock_nested+0x17/0x20 kernel/locking/mutex.c:799
device_lock include/linux/device.h:835 [inline]
nfc_dev_down+0x33/0x260 net/nfc/core.c:143
nfc_rfkill_set_block+0x28/0xc0 net/nfc/core.c:179
rfkill_set_block+0x1e7/0x430 net/rfkill/core.c:345
rfkill_fop_write+0x5db/0x790 net/rfkill/core.c:1286
vfs_write+0x303/0xc50 fs/read_write.c:582
ksys_write+0x177/0x2a0 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fc8e3d93e69
RSP: 002b:00007fff2cfab108 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007fc8e3d93e69
RDX: 0000000000000008 RSI: 0000000020000000 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000246 R12: 000000000000d60b
R13: 00007fff2cfab11c R14: 00007fff2cfab130 R15: 00007fff2cfab120
</TASK>
----------

----------
2 locks held by syz-executor145/4505:
#0: ffff88807268e100 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#0: ffff88807268e100 (&dev->mutex){....}-{3:3}, at: nfc_unregister_device+0x87/0x290 net/nfc/core.c:1165
#1: ffffffff8e787b08 (rfkill_global_mutex){+.+.}-{3:3}, at: rfkill_unregister+0xcb/0x220 net/rfkill/core.c:1130
2 locks held by syz-executor145/4516:
#0: ffffffff8e787b08 (rfkill_global_mutex){+.+.}-{3:3}, at: rfkill_fop_write+0x1b3/0x790 net/rfkill/core.c:1278
#1: ffff88807268e100 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#1: ffff88807268e100 (&dev->mutex){....}-{3:3}, at: nfc_dev_down+0x33/0x260 net/nfc/core.c:143
----------
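
The two "locks held" lists above form a classic AB-BA inversion: task 4505
holds &dev->mutex and waits for rfkill_global_mutex, while task 4516 holds
rfkill_global_mutex and waits for &dev->mutex. The following userspace Python
sketch (a model, illustrative only) shows how recording each task's
acquisition order exposes the inversion that lockdep would have flagged in
advance, had dev->mutex not been excluded from validation:

```python
def order_edges(acquisitions):
    """Return the 'held -> wanted' ordering pairs one task generates."""
    edges, held = set(), []
    for lock in acquisitions:
        for h in held:
            edges.add((h, lock))
        held.append(lock)
    return edges

def abba_inversion(*tasks):
    """True if any two locks are taken in opposite orders across tasks."""
    edges = set().union(*(order_edges(t) for t in tasks))
    return any((b, a) in edges for (a, b) in edges)

# Lock orders taken from the hung-task report above.
TASK_4505 = ["&dev->mutex", "rfkill_global_mutex"]  # nfc_unregister_device path
TASK_4516 = ["rfkill_global_mutex", "&dev->mutex"]  # rfkill_fop_write path
```

The point of the thread is that lockdep performs essentially this bookkeeping
at every acquisition, but only for locks it is allowed to validate.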

>
>> Therefore, I locally tested this patch (in order not to be applied now).
>
> What patch? I'm totally confused.

The "drivers/core: Remove lockdep_set_novalidate_class() usage" patch shown above.

>
>> And I got a lockdep warning on the perf_event code.
>
> What warning?

Here is a copy.

----------
[ 2.241650][ T9] Trying to unpack rootfs image as initramfs...
[ 2.241630][ T1] software IO TLB: mapped [mem 0x00000000bbed0000-0x00000000bfed0000] (64MB)
[ 2.241670][ T1] workingset: timestamp_bits=46 max_order=21 bucket_order=0
[ 2.241670][ T1] SGI XFS with ACLs, security attributes, verbose warnings, quota, no debug enabled
[ 2.241670][ T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[ 2.798150][ T1]
[ 2.798660][ T1] ======================================================
[ 2.798660][ T1] WARNING: possible circular locking dependency detected
[ 2.798660][ T1] 6.2.0-rc5+ #9 Not tainted
[ 2.798660][ T1] ------------------------------------------------------
[ 2.798660][ T1] swapper/0/1 is trying to acquire lock:
[ 2.798660][ T1] ffffffffb002e888 (cpu_add_remove_lock){+.+.}-{3:3}, at: cpu_hotplug_disable+0x12/0x30
[ 2.798660][ T1]
[ 2.798660][ T1] but task is already holding lock:
[ 2.798660][ T1] ffff941940a161b8 (&dev->mutex){+.+.}-{3:3}, at: __device_driver_lock+0x28/0x40
[ 2.798660][ T1]
[ 2.798660][ T1] which lock already depends on the new lock.
[ 2.798660][ T1]
[ 2.798660][ T1]
[ 2.798660][ T1] the existing dependency chain (in reverse order) is:
[ 2.798660][ T1]
[ 2.798660][ T1] -> #3 (&dev->mutex){+.+.}-{3:3}:
[ 2.798660][ T1] lock_acquire+0xc7/0x2e0
[ 2.798660][ T1] __mutex_lock+0x99/0xf00
[ 2.798660][ T1] mutex_lock_nested+0x16/0x20
[ 2.798660][ T1] __device_attach+0x35/0x1a0
[ 2.798660][ T1] device_initial_probe+0xe/0x10
[ 2.798660][ T1] bus_probe_device+0x9b/0xb0
[ 2.798660][ T1] device_add+0x3e1/0x900
[ 2.798660][ T1] pmu_dev_alloc+0x98/0xf0
[ 2.798660][ T1] perf_event_sysfs_init+0x56/0x8f
[ 2.798660][ T1] do_one_initcall+0x58/0x300
[ 2.798660][ T1] kernel_init_freeable+0x181/0x1d2
[ 2.798660][ T1] kernel_init+0x15/0x120
[ 2.798660][ T1] ret_from_fork+0x1f/0x30
[ 2.798660][ T1]
[ 2.798660][ T1] -> #2 (pmus_lock){+.+.}-{3:3}:
[ 2.798660][ T1] lock_acquire+0xc7/0x2e0
[ 2.798660][ T1] __mutex_lock+0x99/0xf00
[ 2.798660][ T1] mutex_lock_nested+0x16/0x20
[ 2.798660][ T1] perf_event_init_cpu+0x4c/0x110
[ 2.798660][ T1] cpuhp_invoke_callback+0x17a/0x880
[ 2.798660][ T1] __cpuhp_invoke_callback_range+0x77/0xb0
[ 2.798660][ T1] _cpu_up+0xdc/0x240
[ 2.798660][ T1] cpu_up+0x8c/0xa0
[ 2.798660][ T1] bringup_nonboot_cpus+0x56/0x60
[ 2.798660][ T1] smp_init+0x25/0x5f
[ 2.798660][ T1] kernel_init_freeable+0xb4/0x1d2
[ 2.798660][ T1] kernel_init+0x15/0x120
[ 2.798660][ T1] ret_from_fork+0x1f/0x30
[ 2.798660][ T1]
[ 2.798660][ T1] -> #1 (cpu_hotplug_lock){++++}-{0:0}:
[ 2.798660][ T1] lock_acquire+0xc7/0x2e0
[ 2.798660][ T1] percpu_down_write+0x44/0x2c0
[ 2.798660][ T1] _cpu_up+0x35/0x240
[ 2.798660][ T1] cpu_up+0x8c/0xa0
[ 2.798660][ T1] bringup_nonboot_cpus+0x56/0x60
[ 2.798660][ T1] smp_init+0x25/0x5f
[ 2.798660][ T1] kernel_init_freeable+0xb4/0x1d2
[ 2.798660][ T1] kernel_init+0x15/0x120
[ 2.798660][ T1] ret_from_fork+0x1f/0x30
[ 2.798660][ T1]
[ 2.798660][ T1] -> #0 (cpu_add_remove_lock){+.+.}-{3:3}:
[ 2.798660][ T1] check_prevs_add+0x16a/0x1070
[ 2.798660][ T1] __lock_acquire+0x11bd/0x1670
[ 2.798660][ T1] lock_acquire+0xc7/0x2e0
[ 2.798660][ T1] __mutex_lock+0x99/0xf00
[ 2.798660][ T1] mutex_lock_nested+0x16/0x20
[ 2.798660][ T1] cpu_hotplug_disable+0x12/0x30
[ 2.798660][ T1] pci_device_probe+0x8c/0x150
[ 2.798660][ T1] really_probe+0xd9/0x340
[ 2.798660][ T1] __driver_probe_device+0x78/0x170
[ 2.798660][ T1] driver_probe_device+0x1f/0x90
[ 2.798660][ T1] __driver_attach+0xaa/0x160
[ 2.798660][ T1] bus_for_each_dev+0x75/0xb0
[ 2.798660][ T1] driver_attach+0x19/0x20
[ 2.798660][ T1] bus_add_driver+0x1be/0x210
[ 2.798660][ T1] driver_register+0x6b/0xc0
[ 2.798660][ T1] __pci_register_driver+0x7c/0x80
[ 2.798660][ T1] pcie_portdrv_init+0x3d/0x45
[ 2.798660][ T1] do_one_initcall+0x58/0x300
[ 2.798660][ T1] kernel_init_freeable+0x181/0x1d2
[ 2.798660][ T1] kernel_init+0x15/0x120
[ 2.798660][ T1] ret_from_fork+0x1f/0x30
[ 2.798660][ T1]
[ 2.798660][ T1] other info that might help us debug this:
[ 2.798660][ T1]
[ 2.798660][ T1] Chain exists of:
[ 2.798660][ T1] cpu_add_remove_lock --> pmus_lock --> &dev->mutex
[ 2.798660][ T1]
[ 2.798660][ T1] Possible unsafe locking scenario:
[ 2.798660][ T1]
[ 2.798660][ T1] CPU0 CPU1
[ 2.798660][ T1] ---- ----
[ 2.798660][ T1] lock(&dev->mutex);
[ 2.798660][ T1] lock(pmus_lock);
[ 2.798660][ T1] lock(&dev->mutex);
[ 2.798660][ T1] lock(cpu_add_remove_lock);
[ 2.798660][ T1]
[ 2.798660][ T1] *** DEADLOCK ***
[ 2.798660][ T1]
[ 2.798660][ T1] 1 lock held by swapper/0/1:
[ 2.798660][ T1] #0: ffff941940a161b8 (&dev->mutex){+.+.}-{3:3}, at: __device_driver_lock+0x28/0x40
[ 2.798660][ T1]
[ 2.798660][ T1] stack backtrace:
[ 2.798660][ T1] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc5+ #9
[ 2.798660][ T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 2.798660][ T1] Call Trace:
[ 2.798660][ T1] <TASK>
[ 2.798660][ T1] dump_stack_lvl+0x49/0x5e
[ 2.798660][ T1] dump_stack+0x10/0x12
[ 2.798660][ T1] print_circular_bug.isra.46.cold.66+0x13e/0x143
[ 2.798660][ T1] check_noncircular+0xfe/0x110
[ 2.798660][ T1] check_prevs_add+0x16a/0x1070
[ 2.798660][ T1] __lock_acquire+0x11bd/0x1670
[ 2.798660][ T1] lock_acquire+0xc7/0x2e0
[ 2.798660][ T1] ? cpu_hotplug_disable+0x12/0x30
[ 2.798660][ T1] __mutex_lock+0x99/0xf00
[ 2.798660][ T1] ? cpu_hotplug_disable+0x12/0x30
[ 2.798660][ T1] ? pci_match_device+0xd5/0x130
[ 2.798660][ T1] ? __this_cpu_preempt_check+0x13/0x20
[ 2.798660][ T1] ? cpu_hotplug_disable+0x12/0x30
[ 2.798660][ T1] ? kernfs_add_one+0xf1/0x130
[ 2.798660][ T1] mutex_lock_nested+0x16/0x20
[ 2.798660][ T1] ? mutex_lock_nested+0x16/0x20
[ 2.798660][ T1] cpu_hotplug_disable+0x12/0x30
[ 2.798660][ T1] pci_device_probe+0x8c/0x150
[ 2.798660][ T1] really_probe+0xd9/0x340
[ 2.798660][ T1] ? pm_runtime_barrier+0x52/0xb0
[ 2.798660][ T1] __driver_probe_device+0x78/0x170
[ 2.798660][ T1] driver_probe_device+0x1f/0x90
[ 2.798660][ T1] __driver_attach+0xaa/0x160
[ 2.798660][ T1] ? __device_attach_driver+0x100/0x100
[ 2.798660][ T1] bus_for_each_dev+0x75/0xb0
[ 2.798660][ T1] driver_attach+0x19/0x20
[ 2.798660][ T1] bus_add_driver+0x1be/0x210
[ 2.798660][ T1] ? dmi_pcie_pme_disable_msi+0x1f/0x1f
[ 2.798660][ T1] ? dmi_pcie_pme_disable_msi+0x1f/0x1f
[ 2.798660][ T1] ? rdinit_setup+0x27/0x27
[ 2.798660][ T1] driver_register+0x6b/0xc0
[ 2.798660][ T1] ? dmi_pcie_pme_disable_msi+0x1f/0x1f
[ 2.798660][ T1] __pci_register_driver+0x7c/0x80
[ 2.798660][ T1] pcie_portdrv_init+0x3d/0x45
[ 2.798660][ T1] do_one_initcall+0x58/0x300
[ 2.798660][ T1] ? rdinit_setup+0x27/0x27
[ 2.798660][ T1] ? rcu_read_lock_sched_held+0x4a/0x70
[ 2.798660][ T1] kernel_init_freeable+0x181/0x1d2
[ 2.798660][ T1] ? rest_init+0x190/0x190
[ 2.798660][ T1] kernel_init+0x15/0x120
[ 2.798660][ T1] ret_from_fork+0x1f/0x30
[ 2.798660][ T1] </TASK>
[ 3.991673][ T92] tsc: Refined TSC clocksource calibration: 2611.210 MHz
[ 3.991673][ T92] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x25a399d04c4, max_idle_ns: 440795206293 ns
[ 4.992946][ T92] clocksource: Switched to clocksource tsc
----------
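
The splat above is lockdep's dependency-graph check at work: boot has already
recorded the chain cpu_add_remove_lock --> pmus_lock --> &dev->mutex, so
taking cpu_add_remove_lock while holding &dev->mutex would close a cycle.
A compact userspace Python model of that check (illustrative only, not how
the kernel implements it internally):

```python
def reachable(edges, src, dst):
    # Depth-first search over the recorded "held -> acquired" graph.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(b for (a, b) in edges if a == node)
    return False

def acquire(edges, held, wanted):
    """Model lockdep: adding held -> wanted must not create a cycle."""
    if reachable(edges, wanted, held):
        return "WARNING: possible circular locking dependency detected"
    edges.add((held, wanted))
    return "ok"

# Dependencies recorded during boot, per the splat above (simplified).
EDGES = {("cpu_add_remove_lock", "pmus_lock"),
         ("pmus_lock", "&dev->mutex")}
```

Acquiring cpu_add_remove_lock while holding &dev->mutex is rejected because
&dev->mutex is already reachable from cpu_add_remove_lock; the opposite
order merely adds a new, acyclic edge.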


>
>> I got next lockdep warning on the driver core code when I tried a fix
>> for the perf_event code suggested by Peter Zijlstra.
>
> Again, what warning?

Shown above.

[ 2.837244][ T1] swapper/0/1 is trying to acquire lock:
[ 2.837244][ T1] ffff984dc3d50108 (&dev->mutex){+.+.}-{3:3}, at: __device_attach+0x35/0x1a0
[ 2.837244][ T1]
[ 2.837244][ T1] but task is already holding lock:
[ 2.837244][ T1] ffff984dc1b5e1b8 (&dev->mutex){+.+.}-{3:3}, at: __device_driver_lock+0x28/0x40
[ 2.837244][ T1]
[ 2.837244][ T1] other info that might help us debug this:
[ 2.837244][ T1] Possible unsafe locking scenario:
[ 2.837244][ T1]
[ 2.837244][ T1] CPU0
[ 2.837244][ T1] ----
[ 2.837244][ T1] lock(&dev->mutex);
[ 2.837244][ T1] lock(&dev->mutex);
[ 2.837244][ T1]
[ 2.837244][ T1] *** DEADLOCK ***

>
>> Since Peter confirmed that this is a problem that led to commit
>> 1704f47b50b5 ("lockdep: Add novalidate class for dev->mutex
>> conversion"), this time I'm reporting this problem to you (so that you
>> can propose a fix for the driver core code).
>
> Again, I have no idea what the real problem is!

Since dev->mutex is hidden from lockdep checks, real deadlocks cannot be
reported until khungtaskd reports them as hung tasks.

>
> Please show me in the driver core code, where the deadlock is that needs
> to be resolved. Without that, I can't answer anything...
>
> totally and thoroughly confused,
>
> greg k-h


2023-02-04 15:30:21

by Tetsuo Handa

Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/05 0:12, Alan Stern wrote:
>> it would solve many deadlocks in driver code if you can update
>
> What deadlocks? If there are so many deadlocks floating around in
> driver code, why haven't we heard about them before now?

Since dev->mutex is hidden from lockdep checks, nobody can see lockdep warnings.
syzbot is reporting real deadlocks without lockdep warnings because the fundamental
problem you mentioned in https://lkml.kernel.org/r/[email protected]
still remains. I'm suggesting that now is the time to address this fundamental problem.

>> (by e.g. replacing dev->mutex with dev->spinlock and dev->atomic_flags).
>> But I'm not familiar enough to propose such change...
>
> Such a change cannot be made. Consider this: Driver callbacks often
> need to sleep. But when a thread holds a spinlock, it is not allowed to
> sleep. Therefore driver callbacks must not be invoked while a spinlock
> is held.

What I'm suggesting is "Do not call driver callbacks with dev->mutex held,
by rewriting driver core code".


2023-02-04 15:35:03

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sat, Feb 04, 2023 at 11:21:27PM +0900, Tetsuo Handa wrote:
> On 2023/02/04 22:47, Greg Kroah-Hartman wrote:
> > On Sat, Feb 04, 2023 at 10:32:11PM +0900, Tetsuo Handa wrote:
> >> Hello.
> >>
> >> There is a long-standing deadlock problem in driver core code caused by
> >> "struct device"->mutex being marked as "do not apply lockdep checks".
> >
> > The marking of a lock does not cause a deadlock problem, so what do you
> > mean exactly by this? Where is the actual deadlock?
>
> A few of examples:
>
> https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101

It's hard to figure out what's wrong from looking at the syzbot report.
What makes you think it is connected with dev->mutex?

At first glance, it seems that the ath6kl driver is trying to flush a
workqueue while holding a lock or mutex that is needed by one of the
jobs in the workqueue. That's obviously never going to work, no matter
what sort of lockdep validation gets used.

> https://syzkaller.appspot.com/bug?extid=2e39bc6569d281acbcfb
> https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174
>
> >
> >> We can make this deadlock visible by applying [1], and we can confirm that
> >> there is a deadlock problem that I think needs to be addressed in core code [2].
> >
> > Any reason why you didn't cc: us on these patches?
>
> We can't apply this "drivers/core: Remove lockdep_set_novalidate_class() usage" patch
> until we fix all lockdep warnings that happen during the boot stage; otherwise syzbot
> testing can't work which is more painful than applying this patch now.
>
> Therefore, I locally tested this patch (in order not to be applied now). And I got
> a lockdep warning on the perf_event code. I got next lockdep warning on the driver core
> code when I tried a fix for the perf_event code suggested by Peter Zijlstra. Since Peter
> confirmed that this is a problem that led to commit 1704f47b50b5 ("lockdep: Add novalidate
> class for dev->mutex conversion"), this time I'm reporting this problem to you (so that
> you can propose a fix for the driver core code).

There is no fix. This problem is inherent to the way that lockdep
works.

Lockdep classifies locks into classes, and it reports a problem if there
is a locking cycle (for example, if someone tries to acquire a lock in
class B while holding a lock in class A, someone else tries to acquire a
lock in class C while holding a lock in class B, and someone else tries
to acquire a lock in class A while holding a lock in class C).

But this sort of approach doesn't work when you're dealing with a
recursive locking structure such as the device tree. Here all the
dev->mutex locks belong to the same class, so lockdep thinks there's a
problem whenever someone tries to lock a device while holding another
device's lock.

However, it is always safe to acquire a child device's lock while
holding the parent's lock. Lockdep isn't aware of this because it
doesn't understand the hierarchical device tree. That's why lockdep
checking has to be disabled for dev->mutex; if it weren't disabled then
it would constantly report false positives.

Alan Stern

2023-02-04 15:41:13

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sun, Feb 05, 2023 at 12:30:07AM +0900, Tetsuo Handa wrote:
> On 2023/02/05 0:12, Alan Stern wrote:
> >> it would solve many deadlocks in driver code if you can update
> >
> > What deadlocks? If there are so many deadlocks floating around in
> > driver code, why haven't we heard about them before now?
>
> Since dev->mutex is hidden from lockdep checks, nobody can see lockdep warnings.
> syzbot is reporting real deadlocks without lockdep warnings, for the fundamental
> problem you mentioned in https://lkml.kernel.org/r/[email protected]
> is remaining. I'm suggesting you that now is time to address this fundamental problem.

Maybe so. But the place to address it is inside lockdep, not in the
driver core.

> >> (by e.g. replacing dev->mutex with dev->spinlock and dev->atomic_flags).
> >> But I'm not familiar enough to propose such change...
> >
> > Such a change cannot be made. Consider this: Driver callbacks often
> > need to sleep. But when a thread holds a spinlock, it is not allowed to
> > sleep. Therefore driver callbacks must not be invoked while a spinlock
> > is held.
>
> What I'm suggesting is "Do not call driver callbacks with dev->mutex held,
> by rewriting driver core code".

That cannot be done. The only possible solution is to teach lockdep how
to handle recursive locking structures.

Alan Stern

2023-02-04 16:12:30

by Tetsuo Handa

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/05 0:34, Alan Stern wrote:
>> A few of examples:
>>
>> https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
>
> It's hard to figure out what's wrong from looking at the syzbot report.
> What makes you think it is connected with dev->mutex?
>
> At first glance, it seems that the ath6kl driver is trying to flush a
> workqueue while holding a lock or mutex that is needed by one of the
> jobs in the workqueue. That's obviously never going to work, no matter
> what sort of lockdep validation gets used.

That lock is exactly dev->mutex, for which lockdep validation is disabled.
If lockdep validation on dev->mutex were not disabled, we could catch the
possibility of deadlock before khungtaskd reports a real deadlock as a hung task.

Having lockdep validation disabled on dev->mutex is really annoying, and
I want to enable lockdep validation on dev->mutex; that is the
"drivers/core: Remove lockdep_set_novalidate_class() usage" patch.

-------- Forwarded Message --------
Message-ID: <[email protected]>
Date: Sun, 3 Jul 2022 23:29:16 +0900
From: Tetsuo Handa <[email protected]>
Subject: Re: INFO: task hung in ath6kl_usb_power_off

I finally found why lockdep was not able to report this deadlock.
device_initialize() was hiding dev->mutex from lockdep tests.
I hope we can get rid of this lockdep_set_novalidate_class()...

Below is a reproducer kernel module. If request_firmware_nowait() called

INIT_WORK(&fw_work->work, request_firmware_work_func);
schedule_work(&fw_work->work);

before hub_event() calls

usb_lock_device(hdev);

, flush_scheduled_work() from hub_event() becomes responsible for flushing
fw_work->work. But flush_scheduled_work() can't flush fw_work->work because
dev->mutex is held by hub_event().

----------------------------------------
#include <linux/module.h>
#include <linux/sched.h>

static DEFINE_MUTEX(mutex);

static void request_firmware_work_func(struct work_struct *work)
{
schedule_timeout_uninterruptible(HZ); // inject race window for allowing hub_event() to find this work
mutex_lock(&mutex); // device_lock(parent) from ath9k_hif_usb_firmware_fail() waits for ath6kl_hif_power_off() to release dev->mutex
mutex_unlock(&mutex); // device_unlock(parent) from ath9k_hif_usb_firmware_fail()
}
static void hub_event(struct work_struct *work)
{
mutex_lock(&mutex); // usb_lock_device(hdev)
flush_scheduled_work(); // ath6kl_usb_flush_all() from ath6kl_hif_power_off() waits for request_firmware_work_func() while holding dev->mutex
mutex_unlock(&mutex); // usb_unlock_device(hdev)
}

static DECLARE_WORK(fw_work, request_firmware_work_func);
static DECLARE_WORK(hub_events, hub_event);

static int __init test_init(void)
{
lockdep_set_novalidate_class(&mutex); // device_initialize() suppresses lockdep warning
schedule_work(&fw_work); // request_firmware_nowait() from ath9k driver queues into system_wq
queue_work(system_freezable_wq, &hub_events); // kick_hub_wq() from usb code queues into hub_wq
return 0;
}

static void test_exit(void)
{
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
----------------------------------------

----------------------------------------
[ 38.832553][ T2786] test: loading out-of-tree module taints kernel.
[ 187.116969][ T35] INFO: task kworker/0:2:33 blocked for more than 143 seconds.
[ 187.121366][ T35] Tainted: G O 5.19.0-rc4-next-20220701 #43
[ 187.124830][ T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 187.128724][ T35] task:kworker/0:2 state:D stack: 0 pid: 33 ppid: 2 flags:0x00004000
[ 187.133235][ T35] Workqueue: events_freezable hub_event [test]
[ 187.136000][ T35] Call Trace:
[ 187.137512][ T35] <TASK>
[ 187.138863][ T35] __schedule+0x304/0x8f0
[ 187.140923][ T35] schedule+0x40/0xa0
[ 187.142940][ T35] schedule_timeout+0x300/0x3a0
[ 187.145341][ T35] ? mark_held_locks+0x55/0x80
[ 187.147518][ T35] ? wait_for_completion+0x6b/0x130
[ 187.149257][ T35] ? _raw_spin_unlock_irq+0x22/0x30
[ 187.151187][ T35] ? wait_for_completion+0x6b/0x130
[ 187.152915][ T35] ? trace_hardirqs_on+0x3b/0xd0
[ 187.154388][ T35] wait_for_completion+0x73/0x130
[ 187.155547][ T35] __flush_workqueue+0x17b/0x480
[ 187.156710][ T35] ? __mutex_lock+0x12b/0xe10
[ 187.157703][ T35] ? wait_for_completion+0x2d/0x130
[ 187.158837][ T35] hub_event+0x1e/0x30 [test]
[ 187.160286][ T35] ? hub_event+0x1e/0x30 [test]
[ 187.161352][ T35] process_one_work+0x292/0x570
[ 187.162565][ T35] worker_thread+0x2f/0x3d0
[ 187.163590][ T35] ? process_one_work+0x570/0x570
[ 187.164779][ T35] kthread+0xd6/0x100
[ 187.165940][ T35] ? kthread_complete_and_exit+0x20/0x20
[ 187.167077][ T35] ret_from_fork+0x1f/0x30
[ 187.168319][ T35] </TASK>
[ 187.169190][ T35] INFO: task kworker/0:3:54 blocked for more than 143 seconds.
[ 187.171458][ T35] Tainted: G O 5.19.0-rc4-next-20220701 #43
[ 187.173380][ T35] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 187.175359][ T35] task:kworker/0:3 state:D stack: 0 pid: 54 ppid: 2 flags:0x00004000
[ 187.177660][ T35] Workqueue: events request_firmware_work_func [test]
[ 187.179221][ T35] Call Trace:
[ 187.180381][ T35] <TASK>
[ 187.181170][ T35] __schedule+0x304/0x8f0
[ 187.182618][ T35] schedule+0x40/0xa0
[ 187.184207][ T35] schedule_preempt_disabled+0x10/0x20
[ 187.186318][ T35] __mutex_lock+0x650/0xe10
[ 187.188527][ T35] ? request_firmware_work_func+0x1c/0x30 [test]
[ 187.190094][ T35] mutex_lock_nested+0x16/0x20
[ 187.191905][ T35] ? mutex_lock_nested+0x16/0x20
[ 187.193159][ T35] request_firmware_work_func+0x1c/0x30 [test]
[ 187.194643][ T35] process_one_work+0x292/0x570
[ 187.195899][ T35] worker_thread+0x2f/0x3d0
[ 187.197031][ T35] ? process_one_work+0x570/0x570
[ 187.198489][ T35] kthread+0xd6/0x100
[ 187.199756][ T35] ? kthread_complete_and_exit+0x20/0x20
[ 187.201224][ T35] ret_from_fork+0x1f/0x30
[ 187.202374][ T35] </TASK>
[ 187.203293][ T35]
[ 187.203293][ T35] Showing all locks held in the system:
[ 187.205200][ T35] 1 lock held by rcu_tasks_trace/10:
[ 187.206489][ T35] #0: ffffffffb25bdea0 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x28/0x3b0
[ 187.209383][ T35] 3 locks held by kworker/0:2/33:
[ 187.210930][ T35] #0: ffff9e2bc004d148 ((wq_completion)events_freezable){+.+.}-{0:0}, at: process_one_work+0x215/0x570
[ 187.214012][ T35] #1: ffffac448032fe68 (hub_events){+.+.}-{0:0}, at: process_one_work+0x215/0x570
[ 187.217162][ T35] #2: ffffffffc067b130 (&dev->mutex){....}-{3:3}, at: hub_event+0x12/0x30 [test]
[ 187.220277][ T35] 1 lock held by khungtaskd/35:
[ 187.221592][ T35] #0: ffffffffb25be5c0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire.constprop.56+0x0/0x30
[ 187.224310][ T35] 3 locks held by kworker/0:3/54:
[ 187.225596][ T35] #0: ffff9e2bc004cb48 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x215/0x570
[ 187.228147][ T35] #1: ffffac4480cebe68 (fw_work){+.+.}-{0:0}, at: process_one_work+0x215/0x570
[ 187.230646][ T35] #2: ffffffffc067b130 (&dev->mutex){....}-{3:3}, at: request_firmware_work_func+0x1c/0x30 [test]
[ 187.234203][ T35] 2 locks held by agetty/2729:
[ 187.235463][ T35] #0: ffff9e2bc71740a0 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0xe/0x10
[ 187.237864][ T35] #1: ffffac448229f2f8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x168/0x5f0
[ 187.240571][ T35] 2 locks held by agetty/2731:
[ 187.241876][ T35] #0: ffff9e2bc907b8a0 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0xe/0x10
[ 187.244312][ T35] #1: ffffac44822a72f8 (&ldata->atomic_read_lock){+.+.}-{3:3}, at: n_tty_read+0x168/0x5f0
[ 187.246938][ T35]
[ 187.247756][ T35] =============================================
[ 187.247756][ T35]
----------------------------------------

> However, it is always safe to acquire a child device's lock while
> holding the parent's lock. Lockdep isn't aware of this because it
> doesn't understand the hierarchical device tree. That's why lockdep
> checking has to be disabled for dev->mutex; if it weren't disabled then
> it would constantly report false positives.

Even if it is always safe to acquire a child device's lock while holding
the parent's lock, completely disabling lockdep checks on a device's lock is
not safe.


2023-02-04 16:27:59

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sun, Feb 05, 2023 at 01:12:12AM +0900, Tetsuo Handa wrote:
> On 2023/02/05 0:34, Alan Stern wrote:
> >> A few of examples:
> >>
> >> https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
> >
> > It's hard to figure out what's wrong from looking at the syzbot report.
> > What makes you think it is connected with dev->mutex?
> >
> > At first glance, it seems that the ath6kl driver is trying to flush a
> > workqueue while holding a lock or mutex that is needed by one of the
> > jobs in the workqueue. That's obviously never going to work, no matter
> > what sort of lockdep validation gets used.
>
> That lock is exactly dev->mutex where lockdep validation is disabled.
> If lockdep validation on dev->mutex were not disabled, we can catch
> possibility of deadlock before khungtaskd reports real deadlock as hung.
>
> Lockdep validation on dev->mutex being disabled is really annoying, and
> I want to make lockdep validation on dev->mutex enabled; that is the
> "drivers/core: Remove lockdep_set_novalidate_class() usage" patch.

> Even if it is always safe to acquire a child device's lock while holding
> the parent's lock, disabling lockdep checks completely on device's lock is
> not safe.

I understand the problem you want to solve, and I understand that it
can be frustrating. However, I do not believe you will be able to
solve this problem.

Alan Stern

2023-02-04 17:10:10

by Tetsuo Handa

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/05 1:27, Alan Stern wrote:
> On Sun, Feb 05, 2023 at 01:12:12AM +0900, Tetsuo Handa wrote:
>> On 2023/02/05 0:34, Alan Stern wrote:
>>>> A few of examples:
>>>>
>>>> https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101
>>>
>>> It's hard to figure out what's wrong from looking at the syzbot report.
>>> What makes you think it is connected with dev->mutex?
>>>
>>> At first glance, it seems that the ath6kl driver is trying to flush a
>>> workqueue while holding a lock or mutex that is needed by one of the
>>> jobs in the workqueue. That's obviously never going to work, no matter
>>> what sort of lockdep validation gets used.
>>
>> That lock is exactly dev->mutex where lockdep validation is disabled.
>> If lockdep validation on dev->mutex were not disabled, we can catch
>> possibility of deadlock before khungtaskd reports real deadlock as hung.
>>
>> Lockdep validation on dev->mutex being disabled is really annoying, and
>> I want to make lockdep validation on dev->mutex enabled; that is the
>> "drivers/core: Remove lockdep_set_novalidate_class() usage" patch.
>
>> Even if it is always safe to acquire a child device's lock while holding
>> the parent's lock, disabling lockdep checks completely on device's lock is
>> not safe.
>
> I understand the problem you want to solve, and I understand that it
> can be frustrating. However, I do not believe you will be able to
> solve this problem.

That is a declaration that driver developers are allowed to take it for granted
that driver callback functions can behave as if dev->mutex is not held.

Some developers test their changes with lockdep enabled, and believe that their
changes are correct because lockdep did not complain.
https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174 is an example.

We should somehow update driver core code to make it possible to keep lockdep
checks enabled on dev->mutex.


2023-02-04 20:01:49

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sun, Feb 05, 2023 at 02:09:40AM +0900, Tetsuo Handa wrote:
> On 2023/02/05 1:27, Alan Stern wrote:
> > On Sun, Feb 05, 2023 at 01:12:12AM +0900, Tetsuo Handa wrote:
> >> On 2023/02/05 0:34, Alan Stern wrote:
> >> Lockdep validation on dev->mutex being disabled is really annoying, and
> >> I want to make lockdep validation on dev->mutex enabled; that is the
> >> "drivers/core: Remove lockdep_set_novalidate_class() usage" patch.
> >
> >> Even if it is always safe to acquire a child device's lock while holding
> >> the parent's lock, disabling lockdep checks completely on device's lock is
> >> not safe.
> >
> > I understand the problem you want to solve, and I understand that it
> > can be frustrating. However, I do not believe you will be able to
> > solve this problem.
>
> That is a declaration that driver developers are allowed to take it for granted
> that driver callback functions can behave as if dev->mutex is not held.

No it isn't. It is a declaration that driver developers must be extra
careful because lockdep is unable to detect locking errors involving
dev->mutex.

> Some developers test their changes with lockdep enabled, and believe that their
> changes are correct because lockdep did not complain.
> https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174 is an example.

How do you know developers are making this mistake? That example
doesn't show anything of the sort; the commit which introduced the bug
says nothing about lockdep.

> We should somehow update driver core code to make it possible to keep lockdep
> checks enabled on dev->mutex.

I'm sorry, but that simply is not feasible. It doesn't matter how much
you want to do it or feel it is needed; there is no reasonable way to do
it. To take just one example, what you are saying implies that when a
driver is probed for a device, it would not be allowed to register a
child device. That's a ridiculous restriction.

(I might also mention that dev->mutex is used by drivers in places
outside of the driver core. So even if you did magically manage to fix
the driver core, the problem would still remain.)

Alan Stern

2023-02-04 20:15:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sat, Feb 4, 2023 at 12:01 PM Alan Stern <[email protected]> wrote:
>
> I'm sorry, but that simply is not feasible. It doesn't matter how much
> you want to do it or feel it is needed; there is no reasonable way to do
> it. To take just one example, what you are saying implies that when a
> driver is probed for a device, it would not be allowed to register a
> child device. That's a ridiculous restriction.

Well, we've worked around that in other places by making the lockdep
classes for different locks of the same type be different.

So this *could* possibly be solved by lockdep being smarter about
dev->mutex than just "disable checking entirely".

So maybe the lock_set_novalidate_class() could be something better. It
_is_ kind of disgusting.

That said, maybe people tried to subclass the locks and failed, and
that "no validation" is the best that can be done.

But other areas *do* end up spending extra effort to separate out the
locks (and the different uses of the locks), and I think the
dev->mutex is one of the few cases that just gives up and says "no
validation at all".

The other case seems to be the md bcache code.

Linus
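
For reference, the per-instance variant of this suggestion could look roughly like the following sketch against the in-kernel lockdep API (hedged: the `lockdep_key` member and the function names are assumptions for illustration only; struct device has no such field today, and this is not a tested patch):

```c
#include <linux/device.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>

/* Hypothetical variant of device_initialize(): instead of
 * lockdep_set_novalidate_class(&dev->mutex), register a separate
 * lock_class_key per device so that lockdep tracks each instance
 * independently rather than lumping them into one class. */
void device_initialize_keyed(struct device *dev)
{
	mutex_init(&dev->mutex);
	lockdep_register_key(&dev->lockdep_key);
	lockdep_set_class(&dev->mutex, &dev->lockdep_key);
}

/* The dynamically registered key must be unregistered before the
 * device's memory is freed. */
void device_release_keyed(struct device *dev)
{
	lockdep_unregister_key(&dev->lockdep_key);
}
```

With per-instance keys, parent-then-child acquisitions no longer self-collide within one class, at the cost of lockdep tracking many more classes and of losing cross-device pattern detection.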

2023-02-05 01:24:05

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sat, Feb 04, 2023 at 12:14:54PM -0800, Linus Torvalds wrote:
> On Sat, Feb 4, 2023 at 12:01 PM Alan Stern <[email protected]> wrote:
> >
> > I'm sorry, but that simply is not feasible. It doesn't matter how much
> > you want to do it or feel it is needed; there is no reasonable way to do
> > it. To take just one example, what you are saying implies that when a
> > driver is probed for a device, it would not be allowed to register a
> > child device. That's a ridiculous restriction.
>
> Well, we've worked around that in other places by making the lockdep
> classes for different locks of the same type be different.
>
> So this *could* possibly be solved by lockdep being smarter about
> dev->mutex than just "disable checking entirely".
>
> So maybe the lock_set_novalidate_class() could be something better. It
> _is_ kind of disgusting.
>
> That said, maybe people tried to subclass the locks and failed, and
> that "no validation" is the best that can be done.
>
> But other areas *do* end up spending extra effort to separate out the
> locks (and the different uses of the locks), and I think the
> dev->mutex is one of the few cases that just gives up and says "no
> validation at all".
>
> The other case seems to be the md bcache code.

I suppose we could create separate lockdep classes for every bus_type
and device_type combination, as well as for the different sorts of
devices -- treat things like class devices separately from normal
devices, and so on. But even then there would be trouble.

For example, consider PCI devices and PCI bridges (this sort of thing
happens on other buses too). I don't know the details of how the PCI
subsystem works, but presumably when a bridge is probed, the driver then
probes all the devices on the other side of the bridge while holding the
bridge's lock. Then if one of those devices is another bridge, the same
thing happens recursively, and so on. How would drivers cope with that?
How deep will this nesting go? I doubt that the driver core could take
care of these issues all by itself.

I don't know if the following situation ever happens anywhere, but it
could: Suppose a driver wants to lock several children of some device A.
Providing it already holds A's lock, this is perfectly safe. But how
can we tell lockdep? Even if A belongs to a different lockdep class
from its children, the children would all be in the same class.
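
For the narrow parent-then-one-child pattern, lockdep's existing subclass annotation can express the safe ordering (a hedged sketch using the real mutex_lock_nested() API; whether subclasses can cover arbitrary tree depth or several same-class children is exactly the difficulty in question):

```c
#include <linux/device.h>
#include <linux/mutex.h>

/* Lock a parent device, then one of its children.  mutex_lock_nested()
 * tells lockdep that the second acquisition is a deliberate nesting
 * level within the same lock class, suppressing the same-class
 * self-deadlock report.  Subclasses distinguish only a small, fixed
 * number of nesting levels, so this does not generalize to locking
 * multiple same-class children or to unbounded tree depth. */
static void lock_parent_then_child(struct device *parent,
				   struct device *child)
{
	mutex_lock(&parent->mutex);
	mutex_lock_nested(&child->mutex, SINGLE_DEPTH_NESTING);
}
```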

What happens when a driver wants to lock both a regular device and its
corresponding class device? Some drivers might acquire the locks in one
order and some drivers in another. Again it's safe, because separate
drivers will never try to lock the same devices, but how do you tell
lockdep about this?

No doubt there are other examples of potential problems. Somebody could
try to implement this kind of approach, but almost certainly it would
lead to tons of false positives.

Alan Stern

2023-02-05 01:32:24

by Tetsuo Handa

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/05 5:01, Alan Stern wrote:
> On Sun, Feb 05, 2023 at 02:09:40AM +0900, Tetsuo Handa wrote:
>> On 2023/02/05 1:27, Alan Stern wrote:
>>> On Sun, Feb 05, 2023 at 01:12:12AM +0900, Tetsuo Handa wrote:
>>>> On 2023/02/05 0:34, Alan Stern wrote:
>>>> Lockdep validation on dev->mutex being disabled is really annoying, and
>>>> I want to make lockdep validation on dev->mutex enabled; that is the
>>>> "drivers/core: Remove lockdep_set_novalidate_class() usage" patch.
>>>
>>>> Even if it is always safe to acquire a child device's lock while holding
>>>> the parent's lock, disabling lockdep checks completely on device's lock is
>>>> not safe.
>>>
>>> I understand the problem you want to solve, and I understand that it
>>> can be frustrating. However, I do not believe you will be able to
>>> solve this problem.
>>
>> That is a declaration that driver developers are allowed to take it for granted
>> that driver callback functions can behave as if dev->mutex is not held.
>
> No it isn't. It is a declaration that driver developers must be extra
> careful because lockdep is unable to detect locking errors involving
> dev->mutex.

Driver developers are not always familiar with the locks used by the driver
core. Your

It's hard to figure out what's wrong from looking at the syzbot report.
What makes you think it is connected with dev->mutex?

At first glance, it seems that the ath6kl driver is trying to flush a
workqueue while holding a lock or mutex that is needed by one of the
jobs in the workqueue. That's obviously never going to work, no matter
what sort of lockdep validation gets used.

comment indicates that you did not notice that dev->mutex was connected to
this problem, which involved ath6kl driver code, ath9k driver code, and
driver core code.

Core developers can't assume that driver developers are extra careful, just
as driver developers can't assume that core developers are familiar with the
locks used by individual drivers. We need to fill that gap.

>
>> Some developers test their changes with lockdep enabled, and believe that their
>> changes are correct because lockdep did not complain.
>> https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174 is an example.
>
> How do you know developers are making this mistake? That example
> doesn't show anything of the sort; the commit which introduced the bug
> says nothing about lockdep.

The commit which introduced the bug could not say anything about lockdep, because
lockdep validation is disabled and nobody noticed the possibility of deadlock
until syzbot reported it as a hang. Since the possibility of deadlock cannot be
noticed until syzbot reports it as a hang, I assume that there are many similar
deadlocks in the kernel involving dev->mutex. How do you teach developers
that they are making this mistake without keeping lockdep validation enabled?

By keeping lockdep validation disabled, you are declaring that driver developers
need not worry about dev->mutex, rather than declaring that driver developers
need to worry about dev->mutex.


2023-02-05 16:46:15

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Sun, Feb 05, 2023 at 10:31:56AM +0900, Tetsuo Handa wrote:
> On 2023/02/05 5:01, Alan Stern wrote:
> > On Sun, Feb 05, 2023 at 02:09:40AM +0900, Tetsuo Handa wrote:
> >> That is a declaration that driver developers are allowed to take it for granted
> >> that driver callback functions can behave as if dev->mutex is not held.
> >
> > No it isn't. It is a declaration that driver developers must be extra
> > careful because lockdep is unable to detect locking errors involving
> > dev->mutex.
>
> Driver developers are not always familiar with locks used by driver core,
> like your
>
> It's hard to figure out what's wrong from looking at the syzbot report.
> What makes you think it is connected with dev->mutex?

You didn't answer this question.

> At first glance, it seems that the ath6kl driver is trying to flush a
> workqueue while holding a lock or mutex that is needed by one of the
> jobs in the workqueue. That's obviously never going to work, no matter
> what sort of lockdep validation gets used.
>
> comment indicates that you did not notice that dev->mutex was connected to
> this problem which involved ath6kl driver code and ath9k driver code and
> driver core code.

Of course I didn't. There isn't enough information in the syzbot log
for someone to recognize the connection if they aren't already familiar
with the code in question.

> Core developers can't assume that driver developers are extra careful, as
> well as driver developers can't assume that core developers are familiar
> with locks used by individual drivers. We need to fill the gap.

Agreed.

> >> Some developers test their changes with lockdep enabled, and believe that their
> >> changes are correct because lockdep did not complain.
> >> https://syzkaller.appspot.com/bug?extid=9ef743bba3a17c756174 is an example.
> >
> > How do you know developers are making this mistake? That example
> > doesn't show anything of the sort; the commit which introduced the bug
> > says nothing about lockdep.
>
> The commit which introduced the bug cannot say something about lockdep, for
> lockdep validation is disabled and nobody noticed the possibility of deadlock
> until syzbot reports it as hung. Since the possibility of deadlock cannot be
> noticed until syzbot reports it as hung,

That isn't true at all. There are lots of occasions when people realize
that a deadlock might occur without seeing a report from lockdep or
syzbot. You just aren't aware of these occasions because the developer
then fixes the code before submitting it. But if you search through the
mailing list archives, I'm sure you'll find plenty of examples where
somebody criticizes a proposed patch on the grounds that it can cause a
deadlock.

> I assume that there are many similar
> deadlocks in the kernel that involves dev->mutex. How do you teach developers
> that they are making this mistake, without keeping lockdep validation enabled?

There probably are many similar deadlocks in the kernel. There probably
also are deadlocks (not involving dev->mutex) which lockdep could catch,
but hasn't because the right combination of conditions hasn't occurred.

You teach developers about this the same way you teach them about
anything else: Publishing information, talking to people, and putting
comments in the kernel source code.

> By keeping lockdep validation disabled, you are declaring that driver developers
> need not to worry about dev->mutex rather than declaring that driver developers
> need to worry about dev->mutex.

That is a very peculiar thing to say. How do you think people managed
to deal with deadlocks in the kernel before lockdep was developed? Do
you think they said: "My testing didn't reveal any deadlocks, so the
code must be perfect"?

Of course they didn't. And now people simply need to realize that
lockdep isn't perfect either.

And by the way, by disabling lockdep validation I am declaring that
enabling it would cause an overwhelming number of false positives,
rendering lockdep useless (as you found out when you tried). Not that
driver developers don't have to worry about dev->mutex.

Alan Stern

2023-02-06 04:44:52

by Matthew Wilcox

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Mon, Feb 06, 2023 at 10:56:29AM +0800, Hillf Danton wrote:
> On Sun, 5 Feb 2023 11:46:06 -0500 Alan Stern <[email protected]>
> >
> > And by the way, by disabling lockdep validation I am declaraing that
> > enabling it would cause an overwhelming number of false positives,
>
> Could you share 5 false positives you see upstream to help understand how
> it is useless?

I've asked you before to stop cc'ing linux-mm on things which aren't
about memory management. Now I'm asking you publicly.

2023-02-06 05:17:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Mon, Feb 06, 2023 at 10:56:29AM +0800, Hillf Danton wrote:
> On Sun, 5 Feb 2023 11:46:06 -0500 Alan Stern <[email protected]>
> >
> > And by the way, by disabling lockdep validation I am declaraing that
> > enabling it would cause an overwhelming number of false positives,
>
> Could you share 5 false positives you see upstream to help understand how
> it is useless?

Please see this other email in this thread:
https://lore.kernel.org/r/[email protected]

2023-02-06 06:48:12

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Mon, Feb 06, 2023 at 02:43:05PM +0800, Hillf Danton wrote:
> On Mon, 6 Feb 2023 06:17:03 +0100 Greg Kroah-Hartman <[email protected]>
> > On Mon, Feb 06, 2023 at 10:56:29AM +0800, Hillf Danton wrote:
> > > On Sun, 5 Feb 2023 11:46:06 -0500 Alan Stern <[email protected]>
> > > >
> > > > And by the way, by disabling lockdep validation I am declaring that
> > > > enabling it would cause an overwhelming number of false positives,
> > >
> > > Could you share 5 false positives you see upstream to help understand how
> > > it is useless?
> >
> > Please see this other email in this thread:
> > https://lore.kernel.org/r/[email protected]
>
> What lockdep warnings? Specific examples, please.

Remove the one line of code, as per the patch in this thread, and boot
with lockdep enabled and see what happens if you wish to see them
yourself.

thanks,

greg k-h

2023-02-06 14:14:35

by Tetsuo Handa

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/05 10:23, Alan Stern wrote:
> I suppose we could create separate lockdep classes for every bus_type
> and device_type combination, as well as for the different sorts of
> devices -- treat things like class devices separately from normal
> devices, and so on. But even then there would be trouble.

Sorry, since I'm not familiar with devices, I can't interpret what you
are talking about in this response. But why don't you try the test5()
approach from the example module shown below (i.e. treat all dev->mutex
instances as independent of each other)?

Sharing a single mutex_init() call site (as in the test2() approach)
causes false positives, but allocating a key for each dev->mutex (as in
the test5() approach) should avoid them.

----------
#include <linux/module.h>
#include <linux/slab.h>

static struct mutex *alloc_mutex(void)
{
return kzalloc(sizeof(struct mutex), GFP_KERNEL | __GFP_NOFAIL);
}

static void free_mutex(struct mutex *m)
{
kfree(m);
}

static void test1(void)
{
struct mutex *parent_mutex;
struct mutex *peer_mutex[4];
int i;

pr_info("Running %s\n", __func__);
parent_mutex = alloc_mutex();
mutex_init(parent_mutex);

peer_mutex[0] = alloc_mutex();
mutex_init(peer_mutex[0]);
peer_mutex[1] = alloc_mutex();
mutex_init(peer_mutex[1]);
peer_mutex[2] = alloc_mutex();
mutex_init(peer_mutex[2]);
peer_mutex[3] = alloc_mutex();
mutex_init(peer_mutex[3]);

mutex_lock(parent_mutex);
for (i = 0; i < 4; i++)
mutex_lock(peer_mutex[i]);
for (i = 0; i < 4; i++)
mutex_unlock(peer_mutex[i]);
mutex_unlock(parent_mutex);
for (i = 0; i < 4; i++)
free_mutex(peer_mutex[i]);
free_mutex(parent_mutex);
}

static void test2(void)
{
struct mutex *parent_mutex;
struct mutex *peer_mutex[4];
int i;

pr_info("Running %s\n", __func__);
parent_mutex = alloc_mutex();
mutex_init(parent_mutex);
for (i = 0; i < 4; i++) {
peer_mutex[i] = alloc_mutex();
mutex_init(peer_mutex[i]);
}
mutex_lock(parent_mutex);
for (i = 0; i < 4; i++)
mutex_lock(peer_mutex[i]);
for (i = 0; i < 4; i++)
mutex_unlock(peer_mutex[i]);
mutex_unlock(parent_mutex);
for (i = 0; i < 4; i++)
free_mutex(peer_mutex[i]);
free_mutex(parent_mutex);
}

static void test3(void)
{
struct mutex *parent_mutex;
struct mutex *peer_mutex[4];
int i;

pr_info("Running %s\n", __func__);
parent_mutex = alloc_mutex();
mutex_init(parent_mutex);
for (i = 0; i < 4; i++) {
peer_mutex[i] = alloc_mutex();
/*
 * Each of the four identical-looking mutex_init() calls below is a
 * separate call site, i.e. a separate static lock_class_key, so every
 * peer_mutex[i] gets its own lockdep class.
 */
switch (i) {
case 0:
mutex_init(peer_mutex[i]);
break;
case 1:
mutex_init(peer_mutex[i]);
break;
case 2:
mutex_init(peer_mutex[i]);
break;
case 3:
mutex_init(peer_mutex[i]);
break;
}
}
mutex_lock(parent_mutex);
for (i = 0; i < 4; i++)
mutex_lock(peer_mutex[i]);
for (i = 0; i < 4; i++)
mutex_unlock(peer_mutex[i]);
mutex_unlock(parent_mutex);
for (i = 0; i < 4; i++)
free_mutex(peer_mutex[i]);
free_mutex(parent_mutex);
}

static void test4(void)
{
struct mutex *parent_mutex;
struct mutex *peer_mutex[4];
int i;

pr_info("Running %s\n", __func__);
parent_mutex = alloc_mutex();
mutex_init(parent_mutex);
for (i = 0; i < 4; i++) {
peer_mutex[i] = alloc_mutex();
switch (i) {
case 0:
mutex_init(peer_mutex[i]);
break;
case 1:
mutex_init(peer_mutex[i]);
break;
case 2:
mutex_init(peer_mutex[i]);
break;
case 3:
mutex_init(peer_mutex[i]);
break;
}
}
mutex_lock(parent_mutex);
for (i = 0; i < 4; i++)
mutex_lock(peer_mutex[i]);
for (i = 0; i < 4; i++)
mutex_unlock(peer_mutex[i]);
/* re-acquire the peer mutexes in the reverse order: an AB-BA pattern */
for (i = 3; i >= 0; i--)
mutex_lock(peer_mutex[i]);
for (i = 3; i >= 0; i--)
mutex_unlock(peer_mutex[i]);
mutex_unlock(parent_mutex);
for (i = 0; i < 4; i++)
free_mutex(peer_mutex[i]);
free_mutex(parent_mutex);
}

struct mutex2 {
struct mutex mutex;
struct lock_class_key key;
};

static struct mutex2 *alloc_mutex2(void)
{
struct mutex2 *m = kzalloc(sizeof(struct mutex2), GFP_KERNEL | __GFP_NOFAIL);

lockdep_register_key(&m->key);
mutex_init(&m->mutex);
lockdep_set_class(&m->mutex, &m->key);
return m;
}

static void free_mutex2(struct mutex2 *m)
{
mutex_destroy(&m->mutex);
lockdep_unregister_key(&m->key);
kfree(m);
}

static void test5(void)
{
struct mutex2 *parent_mutex;
struct mutex2 *peer_mutex[4];
int i;

pr_info("Running %s\n", __func__);
parent_mutex = alloc_mutex2();
for (i = 0; i < 4; i++)
peer_mutex[i] = alloc_mutex2();
mutex_lock(&parent_mutex->mutex);
for (i = 0; i < 4; i++)
mutex_lock(&peer_mutex[i]->mutex);
for (i = 0; i < 4; i++)
mutex_unlock(&peer_mutex[i]->mutex);
mutex_unlock(&parent_mutex->mutex);
for (i = 0; i < 4; i++)
free_mutex2(peer_mutex[i]);
free_mutex2(parent_mutex);
}

static void test6(void)
{
struct mutex2 *parent_mutex;
struct mutex2 *peer_mutex[4];
int i;

pr_info("Running %s\n", __func__);
parent_mutex = alloc_mutex2();
for (i = 0; i < 4; i++)
peer_mutex[i] = alloc_mutex2();
mutex_lock(&parent_mutex->mutex);
for (i = 0; i < 4; i++)
mutex_lock(&peer_mutex[i]->mutex);
for (i = 0; i < 4; i++)
mutex_unlock(&peer_mutex[i]->mutex);
for (i = 3; i >= 0; i--)
mutex_lock(&peer_mutex[i]->mutex);
for (i = 3; i >= 0; i--)
mutex_unlock(&peer_mutex[i]->mutex);
mutex_unlock(&parent_mutex->mutex);
for (i = 0; i < 4; i++)
free_mutex2(peer_mutex[i]);
free_mutex2(parent_mutex);
}

static int __init lockdep_test_init(void)
{
if (1)
test1(); // safe and lockdep does not complain
if (0)
test2(); // safe but lockdep complains
if (1)
test3(); // safe and lockdep does not complain
if (0)
test4(); // unsafe and lockdep complains
if (1)
test5(); // safe and lockdep does not complain
if (0)
test6(); // unsafe and lockdep complains
return -EINVAL;
}

module_init(lockdep_test_init);
MODULE_LICENSE("GPL");
----------


2023-02-06 15:46:35

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Mon, Feb 06, 2023 at 11:13:38PM +0900, Tetsuo Handa wrote:
> On 2023/02/05 10:23, Alan Stern wrote:
> > I suppose we could create separate lockdep classes for every bus_type
> > and device_type combination, as well as for the different sorts of
> > devices -- treat things like class devices separately from normal
> > devices, and so on. But even then there would be trouble.
>
> Sorry, since I'm not familiar with devices, I can't interpret what you
> are talking about in this response. But why don't you try the test5()
> approach from the example module shown below (i.e. treat all dev->mutex
> instances as independent of each other)?
>
> Sharing a single mutex_init() call site (as in the test2() approach)
> causes false positives, but allocating a key for each dev->mutex (as in
> the test5() approach) should avoid them.

Interesting idea. I'm doubtful that it will accomplish all that you
want. After all, one of lockdep's biggest advantages is that it can
detect the potential for deadlocks without a deadlock actually
occurring. By putting each mutex into its own class, you lose much of
this ability.

But who knows? Maybe it will be a big help.

Anyway, below is a patch you can try, based on the code for your test5.
Let me know what happens.

Alan Stern


Index: usb-devel/drivers/base/core.c
===================================================================
--- usb-devel.orig/drivers/base/core.c
+++ usb-devel/drivers/base/core.c
@@ -2322,6 +2322,8 @@ static void device_release(struct kobjec
devres_release_all(dev);

kfree(dev->dma_range_map);
+ mutex_destroy(&dev->mutex);
+ lockdep_unregister_key(&dev->mutex_key);

if (dev->release)
dev->release(dev);
@@ -2941,7 +2943,8 @@ void device_initialize(struct device *de
kobject_init(&dev->kobj, &device_ktype);
INIT_LIST_HEAD(&dev->dma_pools);
mutex_init(&dev->mutex);
- lockdep_set_novalidate_class(&dev->mutex);
+ lockdep_register_key(&dev->mutex_key);
+ lockdep_set_class(&dev->mutex, &dev->mutex_key);
spin_lock_init(&dev->devres_lock);
INIT_LIST_HEAD(&dev->devres_head);
device_pm_init(dev);
Index: usb-devel/include/linux/device.h
===================================================================
--- usb-devel.orig/include/linux/device.h
+++ usb-devel/include/linux/device.h
@@ -570,6 +570,7 @@ struct device {
struct mutex mutex; /* mutex to synchronize calls to
* its driver.
*/
+ struct lock_class_key mutex_key; /* Unique key for each device */

struct dev_links_info links;
struct dev_pm_info power;

2023-02-07 13:08:44

by Tetsuo Handa

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/07 0:45, Alan Stern wrote:
> On Mon, Feb 06, 2023 at 11:13:38PM +0900, Tetsuo Handa wrote:
>> On 2023/02/05 10:23, Alan Stern wrote:
>>> I suppose we could create separate lockdep classes for every bus_type
>>> and device_type combination, as well as for the different sorts of
>>> devices -- treat things like class devices separately from normal
>>> devices, and so on. But even then there would be trouble.
>>
>> Sorry, since I'm not familiar with devices, I can't interpret what you
>> are talking about in this response. But why don't you try the test5()
>> approach from the example module shown below (i.e. treat all dev->mutex
>> instances as independent of each other)?
>>
>> Sharing a single mutex_init() call site (as in the test2() approach)
>> causes false positives, but allocating a key for each dev->mutex (as in
>> the test5() approach) should avoid them.
>
> Interesting idea. I'm doubtful that it will accomplish all that you
> want. After all, one of lockdep's biggest advantages is that it can
> detect the potential for deadlocks without a deadlock actually
> occurring. By putting each mutex into its own class, you lose much of
> this ability.
>
> But who knows? Maybe it will be a big help.
>
> Anyway, below is a patch you can try, based on the code for your test5.
> Let me know what happens.
>

It boots, except that lockdep_register_key() hit the WARN_ON_ONCE() at
device_register(&platform_bus) from platform_bus_init(), because
platform_bus is a static object.

struct device platform_bus = {
.init_name = "platform",
};

We need to skip lockdep_register_key()/lockdep_unregister_key() on
static "struct device" instances...

----------
[ 0.550046][ T1] smpboot: Total of 12 processors activated (74513.31 BogoMIPS)
[ 0.559082][ T1] devtmpfs: initialized
[ 0.560054][ T1] ------------[ cut here ]------------
[ 0.562046][ T1] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:1223 lockdep_register_key+0x1a2/0x230
[ 0.564046][ T1] Modules linked in:
[ 0.565050][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc7+ #16
[ 0.567046][ T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
[ 0.569077][ T1] RIP: 0010:lockdep_register_key+0x1a2/0x230
[ 0.571046][ T1] Code: 89 03 4a 89 1c e5 60 b1 e8 84 48 85 c0 0f 84 27 ff ff ff 8b 3d c7 68 f8 01 48 89 58 08 85 ff 0f 85 54 ff ff ff e9 1a ff ff ff <0f> 0b 5b 41 5c 41 5d 41 5e 5d c3 89 c6 48 c7 c7 70 41 05 85 e8 35
[ 0.573046][ T1] RSP: 0000:ffffadbb00017e80 EFLAGS: 00010202
[ 0.575046][ T1] RAX: 0000000000000001 RBX: ffffffff8443f5d0 RCX: 0000000000000000
[ 0.577054][ T1] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff8443f5d0
[ 0.579069][ T1] RBP: ffffadbb00017ea0 R08: 0000000000000003 R09: 0000000000000000
[ 0.581069][ T1] R10: d6bf87c490213bdc R11: 0000000000000001 R12: ffffffff8443f5d0
[ 0.583069][ T1] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.585058][ T1] FS: 0000000000000000(0000) GS:ffff9b7ef6e00000(0000) knlGS:0000000000000000
[ 0.587046][ T1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.589046][ T1] CR2: ffff9b7df0202000 CR3: 000000012f011001 CR4: 0000000000370ef0
[ 0.591046][ T1] Call Trace:
[ 0.592310][ T1] <TASK>
[ 0.594046][ T1] device_initialize+0x5f/0x170
[ 0.595046][ T1] device_register+0xd/0x20
[ 0.597046][ T1] platform_bus_init+0x16/0x4d
[ 0.598061][ T1] driver_init+0x2e/0x3a
[ 0.600054][ T1] kernel_init_freeable+0xc3/0x1d2
[ 0.601051][ T1] ? rest_init+0x190/0x190
[ 0.603051][ T1] kernel_init+0x15/0x120
[ 0.604273][ T1] ret_from_fork+0x1f/0x30
[ 0.606058][ T1] </TASK>
[ 0.607069][ T1] irq event stamp: 38345
[ 0.608284][ T1] hardirqs last enabled at (38357): [<ffffffff830d9953>] __up_console_sem+0x53/0x60
[ 0.610046][ T1] hardirqs last disabled at (38370): [<ffffffff830d9938>] __up_console_sem+0x38/0x60
[ 0.613054][ T1] softirqs last enabled at (38314): [<ffffffff838d54db>] __do_softirq+0x30b/0x46f
[ 0.615061][ T1] softirqs last disabled at (38309): [<ffffffff8306e859>] irq_exit_rcu+0xb9/0xf0
[ 0.617059][ T1] ---[ end trace 0000000000000000 ]---
[ 0.622102][ T1] ACPI: PM: Registering ACPI NVS region [mem 0xbfeff000-0xbfefffff] (4096 bytes)
[ 0.625130][ T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
----------



2023-02-07 17:46:38

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Tue, Feb 07, 2023 at 10:07:18PM +0900, Tetsuo Handa wrote:
> On 2023/02/07 0:45, Alan Stern wrote:
> > On Mon, Feb 06, 2023 at 11:13:38PM +0900, Tetsuo Handa wrote:
> >> On 2023/02/05 10:23, Alan Stern wrote:
> >>> I suppose we could create separate lockdep classes for every bus_type
> >>> and device_type combination, as well as for the different sorts of
> >>> devices -- treat things like class devices separately from normal
> >>> devices, and so on. But even then there would be trouble.
> >>
> >> Sorry, since I'm not familiar with devices, I can't interpret what you
> >> are talking about in this response. But why don't you try the test5()
> >> approach from the example module shown below (i.e. treat all dev->mutex
> >> instances as independent of each other)?
> >>
> >> Sharing a single mutex_init() call site (as in the test2() approach)
> >> causes false positives, but allocating a key for each dev->mutex (as in
> >> the test5() approach) should avoid them.
> >
> > Interesting idea. I'm doubtful that it will accomplish all that you
> > want. After all, one of lockdep's biggest advantages is that it can
> > detect the potential for deadlocks without a deadlock actually
> > occurring. By putting each mutex into its own class, you lose much of
> > this ability.
> >
> > But who knows? Maybe it will be a big help.
> >
> > Anyway, below is a patch you can try, based on the code for your test5.
> > Let me know what happens.
> >
>
> It boots, except that lockdep_register_key() hit the WARN_ON_ONCE() at
> device_register(&platform_bus) from platform_bus_init(), because
> platform_bus is a static object.
>
> struct device platform_bus = {
> .init_name = "platform",
> };
>
> We need to skip lockdep_register_key()/lockdep_unregister_key() on
> static "struct device" instances...

Okay, no doubt you can modify the patch to handle that.

The real question is what will happen in your syzbot test scenarios.
Lockdep certainly ought to be able to detect a real deadlock when one
occurs. It will be more interesting to find out if it can warn about
potential deadlocks _without_ them occurring.

Alan Stern

2023-02-07 22:17:32

by Tetsuo Handa

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On 2023/02/08 2:46, Alan Stern wrote:
> The real question is what will happen in your syzbot test scenarios.
> Lockdep certainly ought to be able to detect a real deadlock when one
> occurs. It will be more interesting to find out if it can warn about
> potential deadlocks _without_ them occurring.

For example, https://syzkaller.appspot.com/x/repro.c?x=15556074480000 generates
the warning below, but I don't have a syzbot environment. Please propose an
updated patch (one which won't hit the WARN_ON_ONCE()) so that people can try
it in the syzbot environment.

----------
[ 122.946483][ T3692]
[ 122.979855][ T3692] ======================================================
[ 122.984206][ T3692] WARNING: possible circular locking dependency detected
[ 122.986920][ T3692] 6.2.0-rc7-00011-g05ecb680708a-dirty #943 Tainted: G W
[ 122.989918][ T3692] ------------------------------------------------------
[ 122.993357][ T3692] a.out/3692 is trying to acquire lock:
[ 122.995732][ T3692] ffff888128168900 (&dev->mutex_key#165){+.+.}-{3:3}, at: nfc_dev_down+0x26/0x120
[ 123.008266][ T3692]
[ 123.008266][ T3692] but task is already holding lock:
[ 123.010995][ T3692] ffffffff85a09040 (rfkill_global_mutex){+.+.}-{3:3}, at: rfkill_fop_write+0x10e/0x360
[ 123.020389][ T3692]
[ 123.020389][ T3692] which lock already depends on the new lock.
[ 123.020389][ T3692]
[ 123.029714][ T3692]
[ 123.029714][ T3692] the existing dependency chain (in reverse order) is:
[ 123.033255][ T3692]
[ 123.033255][ T3692] -> #1 (rfkill_global_mutex){+.+.}-{3:3}:
[ 123.037794][ T3692] __mutex_lock_common+0xe6/0xea0
[ 123.040845][ T3692] mutex_lock_nested+0x1b/0x20
[ 123.043990][ T3692] rfkill_register+0x25/0x3d0
[ 123.046100][ T3692] nfc_register_device+0xd9/0x200
[ 123.048369][ T3692] nfcsim_device_new+0x146/0x2c0
[ 123.077246][ T3692] nfcsim_init+0x71/0x130
[ 123.079456][ T3692] do_one_initcall+0xab/0x200
[ 123.081584][ T3692] do_initcall_level+0xd7/0x1c0
[ 123.084613][ T3692] do_initcalls+0x3f/0x80
[ 123.087020][ T3692] kernel_init_freeable+0x230/0x2e0
[ 123.089388][ T3692] kernel_init+0x1b/0x290
[ 123.091567][ T3692] ret_from_fork+0x1f/0x30
[ 123.094393][ T3692]
[ 123.094393][ T3692] -> #0 (&dev->mutex_key#165){+.+.}-{3:3}:
[ 123.097522][ T3692] __lock_acquire+0x170d/0x33c0
[ 123.100739][ T3692] lock_acquire+0xd3/0x200
[ 123.103840][ T3692] __mutex_lock_common+0xe6/0xea0
[ 123.106366][ T3692] mutex_lock_nested+0x1b/0x20
[ 123.108942][ T3692] nfc_dev_down+0x26/0x120
[ 123.111542][ T3692] nfc_rfkill_set_block+0x26/0x80
[ 123.114113][ T3692] rfkill_set_block+0xa1/0x1e0
[ 123.116739][ T3692] rfkill_fop_write+0x2e9/0x360
[ 123.148674][ T3692] vfs_write+0x187/0x4d0
[ 123.151862][ T3692] ksys_write+0xc6/0x170
[ 123.156763][ T3692] do_syscall_64+0x41/0x90
[ 123.161519][ T3692] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 123.171373][ T3692]
[ 123.171373][ T3692] other info that might help us debug this:
[ 123.171373][ T3692]
[ 123.179398][ T3692] Possible unsafe locking scenario:
[ 123.179398][ T3692]
[ 123.183031][ T3692] CPU0 CPU1
[ 123.188115][ T3692] ---- ----
[ 123.190123][ T3692] lock(rfkill_global_mutex);
[ 123.192104][ T3692] lock(&dev->mutex_key#165);
[ 123.200840][ T3692] lock(rfkill_global_mutex);
[ 123.207386][ T3692] lock(&dev->mutex_key#165);
[ 123.241397][ T3692]
[ 123.241397][ T3692] *** DEADLOCK ***
[ 123.241397][ T3692]
[ 123.245893][ T3692] 1 lock held by a.out/3692:
[ 123.250422][ T3692] #0: ffffffff85a09040 (rfkill_global_mutex){+.+.}-{3:3}, at: rfkill_fop_write+0x10e/0x360
[ 123.256266][ T3692]
[ 123.256266][ T3692] stack backtrace:
[ 123.258802][ T3692] CPU: 0 PID: 3692 Comm: a.out Tainted: G W 6.2.0-rc7-00011-g05ecb680708a-dirty #943
[ 123.276931][ T3692] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 123.280352][ T3692] Call Trace:
[ 123.310882][ T3692] <TASK>
[ 123.312232][ T3692] dump_stack_lvl+0xfe/0x190
[ 123.314278][ T3692] check_noncircular+0x12e/0x140
[ 123.317038][ T3692] __lock_acquire+0x170d/0x33c0
[ 123.320079][ T3692] ? __lock_acquire+0x65f/0x33c0
[ 123.321887][ T3692] ? __lock_acquire+0x65f/0x33c0
[ 123.325643][ T3692] lock_acquire+0xd3/0x200
[ 123.327611][ T3692] ? nfc_dev_down+0x26/0x120
[ 123.331732][ T3692] ? nfc_dev_down+0x26/0x120
[ 123.335510][ T3692] ? nfc_dev_down+0x26/0x120
[ 123.337961][ T3692] __mutex_lock_common+0xe6/0xea0
[ 123.340214][ T3692] ? nfc_dev_down+0x26/0x120
[ 123.347330][ T3692] ? nfc_dev_down+0x26/0x120
[ 123.352194][ T3692] mutex_lock_nested+0x1b/0x20
[ 123.356877][ T3692] nfc_dev_down+0x26/0x120
[ 123.361856][ T3692] nfc_rfkill_set_block+0x26/0x80
[ 123.378236][ T3692] rfkill_set_block+0xa1/0x1e0
[ 123.379954][ T3692] rfkill_fop_write+0x2e9/0x360
[ 123.381693][ T3692] ? rfkill_fop_read+0x2a0/0x2a0
[ 123.406421][ T3692] vfs_write+0x187/0x4d0
[ 123.408119][ T3692] ? do_user_addr_fault+0x6e1/0x9c0
[ 123.410174][ T3692] ksys_write+0xc6/0x170
[ 123.411856][ T3692] do_syscall_64+0x41/0x90
[ 123.426593][ T3692] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 123.429725][ T3692] RIP: 0033:0x7fbea991ea3d
[ 123.431577][ T3692] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[ 123.461942][ T3692] RSP: 002b:00007ffc8e2f63a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 123.468363][ T3692] RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007fbea991ea3d
[ 123.480268][ T3692] RDX: 0000000000000008 RSI: 0000000020000080 RDI: 0000000000000004
[ 123.487754][ T3692] RBP: 0000000000000000 R08: 0000000000000060 R09: 0000000000000060
[ 123.491003][ T3692] R10: 0000000000000060 R11: 0000000000000246 R12: 00007ffc8e2f6538
[ 123.550555][ T3692] R13: 000055cea974c2e0 R14: 00007ffc8e2f63d0 R15: 00007ffc8e2f63c0
[ 123.555519][ T3692] </TASK>
----------


2023-02-08 00:34:41

by Alan Stern

[permalink] [raw]
Subject: Re: Converting dev->mutex into dev->spinlock ?

On Wed, Feb 08, 2023 at 07:17:20AM +0900, Tetsuo Handa wrote:
> On 2023/02/08 2:46, Alan Stern wrote:
> > The real question is what will happen in your syzbot test scenarios.
> > Lockdep certainly ought to be able to detect a real deadlock when one
> > occurs. It will be more interesting to find out if it can warn about
> > potential deadlocks _without_ them occurring.
>
> For example, https://syzkaller.appspot.com/x/repro.c?x=15556074480000 generates
> the warning below, but I don't have a syzbot environment. Please propose an
> updated patch (one which won't hit the WARN_ON_ONCE()) so that people can try
> it in the syzbot environment.

Here is a patch. I haven't tried to compile it.

Alan Stern



Index: usb-devel/drivers/base/core.c
===================================================================
--- usb-devel.orig/drivers/base/core.c
+++ usb-devel/drivers/base/core.c
@@ -2322,6 +2322,9 @@ static void device_release(struct kobjec
devres_release_all(dev);

kfree(dev->dma_range_map);
+ mutex_destroy(&dev->mutex);
+ if (!lockdep_static_obj(dev))
+ lockdep_unregister_key(&dev->mutex_key);

if (dev->release)
dev->release(dev);
@@ -2941,7 +2944,10 @@ void device_initialize(struct device *de
kobject_init(&dev->kobj, &device_ktype);
INIT_LIST_HEAD(&dev->dma_pools);
mutex_init(&dev->mutex);
- lockdep_set_novalidate_class(&dev->mutex);
+ if (!lockdep_static_obj(dev)) {
+ lockdep_register_key(&dev->mutex_key);
+ lockdep_set_class(&dev->mutex, &dev->mutex_key);
+ }
spin_lock_init(&dev->devres_lock);
INIT_LIST_HEAD(&dev->devres_head);
device_pm_init(dev);
Index: usb-devel/include/linux/device.h
===================================================================
--- usb-devel.orig/include/linux/device.h
+++ usb-devel/include/linux/device.h
@@ -570,6 +570,7 @@ struct device {
struct mutex mutex; /* mutex to synchronize calls to
* its driver.
*/
+ struct lock_class_key mutex_key; /* Unique key for each device */

struct dev_links_info links;
struct dev_pm_info power;
Index: usb-devel/include/linux/lockdep.h
===================================================================
--- usb-devel.orig/include/linux/lockdep.h
+++ usb-devel/include/linux/lockdep.h
@@ -172,6 +172,7 @@ do { \
current->lockdep_recursion -= LOCKDEP_OFF; \
} while (0)

+extern int lockdep_static_obj(const void *obj);
extern void lockdep_register_key(struct lock_class_key *key);
extern void lockdep_unregister_key(struct lock_class_key *key);

Index: usb-devel/kernel/locking/lockdep.c
===================================================================
--- usb-devel.orig/kernel/locking/lockdep.c
+++ usb-devel/kernel/locking/lockdep.c
@@ -831,7 +831,7 @@ static int arch_is_kernel_initmem_freed(
}
#endif

-static int static_obj(const void *obj)
+int lockdep_static_obj(const void *obj)
{
unsigned long start = (unsigned long) &_stext,
end = (unsigned long) &_end,
@@ -857,6 +857,7 @@ static int static_obj(const void *obj)
*/
return is_module_address(addr) || is_module_percpu_address(addr);
}
+EXPORT_SYMBOL_GPL(lockdep_static_obj);
#endif

/*
@@ -969,7 +970,7 @@ static bool assign_lock_key(struct lockd
lock->key = (void *)can_addr;
else if (__is_module_percpu_address(addr, &can_addr))
lock->key = (void *)can_addr;
- else if (static_obj(lock))
+ else if (lockdep_static_obj(lock))
lock->key = (void *)lock;
else {
/* Debug-check: all keys must be persistent! */
@@ -1220,7 +1221,7 @@ void lockdep_register_key(struct lock_cl
struct lock_class_key *k;
unsigned long flags;

- if (WARN_ON_ONCE(static_obj(key)))
+ if (WARN_ON_ONCE(lockdep_static_obj(key)))
return;
hash_head = keyhashentry(key);

@@ -1246,7 +1247,7 @@ static bool is_dynamic_key(const struct
struct lock_class_key *k;
bool found = false;

- if (WARN_ON_ONCE(static_obj(key)))
+ if (WARN_ON_ONCE(lockdep_static_obj(key)))
return false;

/*
@@ -1293,7 +1294,7 @@ register_lock_class(struct lockdep_map *
if (!lock->key) {
if (!assign_lock_key(lock))
return NULL;
- } else if (!static_obj(lock->key) && !is_dynamic_key(lock->key)) {
+ } else if (!lockdep_static_obj(lock->key) && !is_dynamic_key(lock->key)) {
return NULL;
}

@@ -4836,7 +4837,7 @@ void lockdep_init_map_type(struct lockde
* Sanity check, the lock-class key must either have been allocated
* statically or must have been registered as a dynamic key.
*/
- if (!static_obj(key) && !is_dynamic_key(key)) {
+ if (!lockdep_static_obj(key) && !is_dynamic_key(key)) {
if (debug_locks)
printk(KERN_ERR "BUG: key %px has not been registered!\n", key);
DEBUG_LOCKS_WARN_ON(1);
@@ -6335,7 +6336,7 @@ void lockdep_unregister_key(struct lock_

might_sleep();

- if (WARN_ON_ONCE(static_obj(key)))
+ if (WARN_ON_ONCE(lockdep_static_obj(key)))
return;

raw_local_irq_save(flags);