2021-02-13 12:40:19

by syzbot

[permalink] [raw]
Subject: INFO: task hung in misc_open (4)

Hello,

syzbot found the following issue on:

HEAD commit: e0756cfc Merge tag 'trace-v5.11-rc7' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=147e449cd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=1106b4b91e8dfab8
dashboard link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=163e1dacd00000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=13af25e4d00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

INFO: task syz-executor487:8574 blocked for more than 143 seconds.
Not tainted 5.11.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor487 state:D stack:28112 pid: 8574 ppid: 8479 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5216
__mutex_lock_common kernel/locking/mutex.c:1033 [inline]
__mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3254 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3371
do_filp_open+0x17e/0x3c0 fs/namei.c:3398
do_sys_openat2+0x16d/0x420 fs/open.c:1172
do_sys_open fs/open.c:1188 [inline]
__do_sys_openat fs/open.c:1204 [inline]
__se_sys_openat fs/open.c:1199 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1199
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4031a7
RSP: 002b:00007ffd77c5adc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00000000004031a7
RDX: 0000000000000002 RSI: 000000000048803b RDI: 00000000ffffff9c
RBP: 000000000048803b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffd77c5ceec R14: 0000000000000076 R15: 00007ffd77c5cef0
INFO: task syz-executor487:8631 blocked for more than 143 seconds.
Not tainted 5.11.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor487 state:D stack:27912 pid: 8631 ppid: 8480 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5216
__mutex_lock_common kernel/locking/mutex.c:1033 [inline]
__mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3254 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3371
do_filp_open+0x17e/0x3c0 fs/namei.c:3398
do_sys_openat2+0x16d/0x420 fs/open.c:1172
do_sys_open fs/open.c:1188 [inline]
__do_sys_openat fs/open.c:1204 [inline]
__se_sys_openat fs/open.c:1199 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1199
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4031a7
RSP: 002b:00007ffd77c5adc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00000000004031a7
RDX: 0000000000000002 RSI: 000000000048803b RDI: 00000000ffffff9c
RBP: 000000000048803b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffd77c5ceec R14: 0000000000000076 R15: 00007ffd77c5cef0
INFO: task syz-executor487:8634 blocked for more than 143 seconds.
Not tainted 5.11.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor487 state:D stack:28160 pid: 8634 ppid: 8475 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5216
__mutex_lock_common kernel/locking/mutex.c:1033 [inline]
__mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3254 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3371
do_filp_open+0x17e/0x3c0 fs/namei.c:3398
do_sys_openat2+0x16d/0x420 fs/open.c:1172
do_sys_open fs/open.c:1188 [inline]
__do_sys_openat fs/open.c:1204 [inline]
__se_sys_openat fs/open.c:1199 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1199
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4031a7
RSP: 002b:00007ffd77c5adc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00000000004031a7
RDX: 0000000000000002 RSI: 000000000048803b RDI: 00000000ffffff9c
RBP: 000000000048803b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffd77c5ceec R14: 0000000000000076 R15: 00007ffd77c5cef0
INFO: task syz-executor487:8668 blocked for more than 144 seconds.
Not tainted 5.11.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor487 state:D stack:28160 pid: 8668 ppid: 8477 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5216
__mutex_lock_common kernel/locking/mutex.c:1033 [inline]
__mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3254 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3371
do_filp_open+0x17e/0x3c0 fs/namei.c:3398
do_sys_openat2+0x16d/0x420 fs/open.c:1172
do_sys_open fs/open.c:1188 [inline]
__do_sys_openat fs/open.c:1204 [inline]
__se_sys_openat fs/open.c:1199 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1199
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4031a7
RSP: 002b:00007ffd77c5adc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00000000004031a7
RDX: 0000000000000002 RSI: 000000000048803b RDI: 00000000ffffff9c
RBP: 000000000048803b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffd77c5ceec R14: 0000000000000076 R15: 00007ffd77c5cef0
INFO: task syz-executor487:8679 blocked for more than 144 seconds.
Not tainted 5.11.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor487 state:D stack:28160 pid: 8679 ppid: 8478 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4327 [inline]
__schedule+0x90c/0x21a0 kernel/sched/core.c:5078
schedule+0xcf/0x270 kernel/sched/core.c:5157
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:5216
__mutex_lock_common kernel/locking/mutex.c:1033 [inline]
__mutex_lock+0x81a/0x1110 kernel/locking/mutex.c:1103
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4b9/0x11b0 fs/open.c:817
do_open fs/namei.c:3254 [inline]
path_openat+0x1b9a/0x2730 fs/namei.c:3371
do_filp_open+0x17e/0x3c0 fs/namei.c:3398
do_sys_openat2+0x16d/0x420 fs/open.c:1172
do_sys_open fs/open.c:1188 [inline]
__do_sys_openat fs/open.c:1204 [inline]
__se_sys_openat fs/open.c:1199 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1199
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x4031a7
RSP: 002b:00007ffd77c5adc0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00000000004031a7
RDX: 0000000000000002 RSI: 000000000048803b RDI: 00000000ffffff9c
RBP: 000000000048803b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffd77c5ceec R14: 0000000000000076 R15: 00007ffd77c5cef0

Showing all locks held in the system:
5 locks held by kworker/1:0/19:
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x871/0x15f0 kernel/workqueue.c:2246
#1: ffffc90000d97da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x15f0 kernel/workqueue.c:2250
#2: ffff88801ccb9218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#2: ffff88801ccb9218 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4320 drivers/usb/core/hub.c:5537
#3: ffff8881441ce218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#3: ffff8881441ce218 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
#4: ffff8881418921a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#4: ffff8881418921a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
1 lock held by khungtaskd/1657:
#0: ffffffff8bd73da0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6259
5 locks held by kworker/0:2/3000:
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x871/0x15f0 kernel/workqueue.c:2246
#1: ffffc90001887da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x15f0 kernel/workqueue.c:2250
#2: ffff88801cc79218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#2: ffff88801cc79218 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4320 drivers/usb/core/hub.c:5537
#3: ffff888147549218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#3: ffff888147549218 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
#4: ffff88814402f1a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#4: ffff88814402f1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
5 locks held by kworker/0:4/4909:
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x871/0x15f0 kernel/workqueue.c:2246
#1: ffffc900016d7da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x15f0 kernel/workqueue.c:2250
#2: ffff88801cd51218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#2: ffff88801cd51218 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4320 drivers/usb/core/hub.c:5537
#3: ffff88814745c218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#3: ffff88814745c218 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
#4: ffff888140e6d1a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#4: ffff888140e6d1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
1 lock held by in:imklog/8184:
#0: ffff888018fa34f0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:947
5 locks held by kworker/1:1/8486:
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x871/0x15f0 kernel/workqueue.c:2246
#1: ffffc9000175fda8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x15f0 kernel/workqueue.c:2250
#2: ffff888144669218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#2: ffff888144669218 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4320 drivers/usb/core/hub.c:5537
#3: ffff888019edc218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#3: ffff888019edc218 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
#4: ffff88814748c1a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#4: ffff88814748c1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
5 locks held by kworker/1:2/8499:
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:856 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:41 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:616 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:643 [inline]
#0: ffff888015a52538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x871/0x15f0 kernel/workqueue.c:2246
#1: ffffc90001a27da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8a5/0x15f0 kernel/workqueue.c:2250
#2: ffff888144611218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#2: ffff888144611218 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4320 drivers/usb/core/hub.c:5537
#3: ffff88814404a218 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#3: ffff88814404a218 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
#4: ffff8881438b01a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:737 [inline]
#4: ffff8881438b01a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:889
1 lock held by syz-executor487/8574:
#0: ffffffff8c752b48 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor487/8631:
#0: ffffffff8c752b48 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor487/8634:
#0: ffffffff8c752b48 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor487/8668:
#0: ffffffff8c752b48 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor487/8679:
#0: ffffffff8c752b48 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
2 locks held by syz-executor487/8683:
#0: ffffffff8c752b48 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
#1: ffffffff8bc45148 (system_transition_mutex){+.+.}-{3:3}, at: snapshot_open+0x3b/0x2a0 kernel/power/user.c:54

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1657 Comm: khungtaskd Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x107/0x163 lib/dump_stack.c:120
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:209 [inline]
watchdog+0xd43/0xfa0 kernel/hung_task.c:294
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0 skipped: idling at native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline]
NMI backtrace for cpu 0 skipped: idling at arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline]
NMI backtrace for cpu 0 skipped: idling at acpi_safe_halt drivers/acpi/processor_idle.c:110 [inline]
NMI backtrace for cpu 0 skipped: idling at acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:516


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches


2022-07-04 06:56:42

by Tetsuo Handa

[permalink] [raw]
Subject: [PATCH] char: misc: make misc_open() and misc_register() killable

syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
misc_open() might sleep for long with misc_mtx held whereas userspace can
flood with concurrent misc_open() requests. Mitigate this problem by making
misc_open() and misc_register() killable.

Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
Reported-by: syzbot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
---
drivers/char/misc.c | 57 +++++++++++++++++++--------------------------
1 file changed, 24 insertions(+), 33 deletions(-)

diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index cba19bfdc44d..b9a494bc4228 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -100,49 +100,39 @@ static const struct seq_operations misc_seq_ops = {
static int misc_open(struct inode *inode, struct file *file)
{
int minor = iminor(inode);
- struct miscdevice *c = NULL, *iter;
+ struct miscdevice *iter;
int err = -ENODEV;
const struct file_operations *new_fops = NULL;
+ bool retried = false;

- mutex_lock(&misc_mtx);
-
+ retry:
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;
list_for_each_entry(iter, &misc_list, list) {
if (iter->minor != minor)
continue;
- c = iter;
new_fops = fops_get(iter->fops);
+ if (!new_fops)
+ break;
+ /*
+ * Place the miscdevice in the file's
+ * private_data so it can be used by the
+ * file operations, including f_op->open below
+ */
+ file->private_data = iter;
+
+ err = 0;
+ replace_fops(file, new_fops);
+ if (file->f_op->open)
+ err = file->f_op->open(inode, file);
break;
}
-
- if (!new_fops) {
- mutex_unlock(&misc_mtx);
+ mutex_unlock(&misc_mtx);
+ if (!new_fops && !retried) {
request_module("char-major-%d-%d", MISC_MAJOR, minor);
- mutex_lock(&misc_mtx);
-
- list_for_each_entry(iter, &misc_list, list) {
- if (iter->minor != minor)
- continue;
- c = iter;
- new_fops = fops_get(iter->fops);
- break;
- }
- if (!new_fops)
- goto fail;
+ retried = true;
+ goto retry;
}
-
- /*
- * Place the miscdevice in the file's
- * private_data so it can be used by the
- * file operations, including f_op->open below
- */
- file->private_data = c;
-
- err = 0;
- replace_fops(file, new_fops);
- if (file->f_op->open)
- err = file->f_op->open(inode, file);
-fail:
- mutex_unlock(&misc_mtx);
return err;
}

@@ -180,7 +170,8 @@ int misc_register(struct miscdevice *misc)

INIT_LIST_HEAD(&misc->list);

- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;

if (is_dynamic) {
int i = find_first_zero_bit(misc_minors, DYNAMIC_MINORS);
--
2.18.4


2022-07-04 07:45:25

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Mon, Jul 04, 2022 at 03:44:07PM +0900, Tetsuo Handa wrote:
> syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
> misc_open() might sleep for long with misc_mtx held whereas userspace can
> flood with concurrent misc_open() requests. Mitigate this problem by making
> misc_open() and misc_register() killable.

I do not understand, why not just fix snapshot_open()? Why add this
complexity to the misc core for a foolish individual misc device? Why
not add the fix there where it is spinning instead?

> Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
> Reported-by: syzbot <[email protected]>
> Signed-off-by: Tetsuo Handa <[email protected]>
> ---
> drivers/char/misc.c | 57 +++++++++++++++++++--------------------------
> 1 file changed, 24 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/char/misc.c b/drivers/char/misc.c
> index cba19bfdc44d..b9a494bc4228 100644
> --- a/drivers/char/misc.c
> +++ b/drivers/char/misc.c
> @@ -100,49 +100,39 @@ static const struct seq_operations misc_seq_ops = {
> static int misc_open(struct inode *inode, struct file *file)
> {
> int minor = iminor(inode);
> - struct miscdevice *c = NULL, *iter;
> + struct miscdevice *iter;
> int err = -ENODEV;
> const struct file_operations *new_fops = NULL;
> + bool retried = false;
>
> - mutex_lock(&misc_mtx);
> -
> + retry:
> + if (mutex_lock_killable(&misc_mtx))
> + return -EINTR;

I really don't want to add this type of thing to the misc core if at all
possible. Again, please just fix up the misbehaving misc driver
instead, don't penalize the core with thie complexity.

thanks,

greg k-h

2022-07-04 11:19:30

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/04 16:29, Greg KH wrote:
> On Mon, Jul 04, 2022 at 03:44:07PM +0900, Tetsuo Handa wrote:
>> syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
>> misc_open() might sleep for long with misc_mtx held whereas userspace can
>> flood with concurrent misc_open() requests. Mitigate this problem by making
>> misc_open() and misc_register() killable.
>
> I do not understand, why not just fix snapshot_open()? Why add this
> complexity to the misc core for a foolish individual misc device? Why
> not add the fix there where it is spinning instead?

Quoting an example from [1]. Multiple processes are calling misc_open() and
all but one processes are blocked at mutex_lock(&misc_mtx). The one which is
not blocked at mutex_lock(&misc_mtx) is also holding system_transition_mutex.

If you don't like mutex_lock_killable(&misc_mtx), we will need to consider moving
file->f_op->open() from misc_open() to after mutex_unlock(&misc_mtx). However,
since there might be misc devices which count on serialization currently provided
by mutex_lock(&misc_mtx), I guess we will after all end up with adding
"struct mutex" to "struct miscdevice" and call
mutex_lock_killable(&"struct miscdevice"->mutex) before calling
file->f_op->open() from misc_open().

----------------------------------------
INFO: task syz-executor864:6907 blocked for more than 143 seconds.
Not tainted 5.15.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor864 state:D stack:27936 pid: 6907 ppid: 6545 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xa34/0x12f0 kernel/locking/mutex.c:729
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4c8/0x11d0 fs/open.c:822
do_open fs/namei.c:3428 [inline]
path_openat+0x1c9a/0x2740 fs/namei.c:3561
do_filp_open+0x1aa/0x400 fs/namei.c:3588
do_sys_openat2+0x16d/0x4d0 fs/open.c:1200
do_sys_open fs/open.c:1216 [inline]
__do_sys_openat fs/open.c:1232 [inline]
__se_sys_openat fs/open.c:1227 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1227
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff261e34347
RSP: 002b:00007ffe30328bb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00007ff261e34347
RDX: 0000000000000002 RSI: 00007ff261eba03b RDI: 00000000ffffff9c
RBP: 00007ff261eba03b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe3032acdc R14: 0000000000000076 R15: 00007ffe3032ace0
INFO: task syz-executor864:6922 blocked for more than 143 seconds.
Not tainted 5.15.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor864 state:D stack:28616 pid: 6922 ppid: 6546 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xa34/0x12f0 kernel/locking/mutex.c:729
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4c8/0x11d0 fs/open.c:822
do_open fs/namei.c:3428 [inline]
path_openat+0x1c9a/0x2740 fs/namei.c:3561
do_filp_open+0x1aa/0x400 fs/namei.c:3588
do_sys_openat2+0x16d/0x4d0 fs/open.c:1200
do_sys_open fs/open.c:1216 [inline]
__do_sys_openat fs/open.c:1232 [inline]
__se_sys_openat fs/open.c:1227 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1227
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff261e34347
RSP: 002b:00007ffe30328bb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00007ff261e34347
RDX: 0000000000000002 RSI: 00007ff261eba03b RDI: 00000000ffffff9c
RBP: 00007ff261eba03b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe3032acdc R14: 0000000000000076 R15: 00007ffe3032ace0
INFO: task syz-executor864:6936 blocked for more than 143 seconds.
Not tainted 5.15.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor864 state:D stack:27760 pid: 6936 ppid: 6548 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xa34/0x12f0 kernel/locking/mutex.c:729
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4c8/0x11d0 fs/open.c:822
do_open fs/namei.c:3428 [inline]
path_openat+0x1c9a/0x2740 fs/namei.c:3561
do_filp_open+0x1aa/0x400 fs/namei.c:3588
do_sys_openat2+0x16d/0x4d0 fs/open.c:1200
do_sys_open fs/open.c:1216 [inline]
__do_sys_openat fs/open.c:1232 [inline]
__se_sys_openat fs/open.c:1227 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1227
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff261e34347
RSP: 002b:00007ffe30328bb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00007ff261e34347
RDX: 0000000000000002 RSI: 00007ff261eba03b RDI: 00000000ffffff9c
RBP: 00007ff261eba03b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe3032acdc R14: 0000000000000076 R15: 00007ffe3032ace0
INFO: task syz-executor864:6954 blocked for more than 144 seconds.
Not tainted 5.15.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor864 state:D stack:28328 pid: 6954 ppid: 6549 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xa34/0x12f0 kernel/locking/mutex.c:729
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4c8/0x11d0 fs/open.c:822
do_open fs/namei.c:3428 [inline]
path_openat+0x1c9a/0x2740 fs/namei.c:3561
do_filp_open+0x1aa/0x400 fs/namei.c:3588
do_sys_openat2+0x16d/0x4d0 fs/open.c:1200
do_sys_open fs/open.c:1216 [inline]
__do_sys_openat fs/open.c:1232 [inline]
__se_sys_openat fs/open.c:1227 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1227
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff261e34347
RSP: 002b:00007ffe30328bb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00007ff261e34347
RDX: 0000000000000002 RSI: 00007ff261eba03b RDI: 00000000ffffff9c
RBP: 00007ff261eba03b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe3032acdc R14: 0000000000000076 R15: 00007ffe3032ace0
INFO: task syz-executor864:6972 blocked for more than 144 seconds.
Not tainted 5.15.0-rc7-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor864 state:D stack:28328 pid: 6972 ppid: 6547 flags:0x00000004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0xb44/0x5960 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
__mutex_lock_common kernel/locking/mutex.c:669 [inline]
__mutex_lock+0xa34/0x12f0 kernel/locking/mutex.c:729
misc_open+0x55/0x4a0 drivers/char/misc.c:107
chrdev_open+0x266/0x770 fs/char_dev.c:414
do_dentry_open+0x4c8/0x11d0 fs/open.c:822
do_open fs/namei.c:3428 [inline]
path_openat+0x1c9a/0x2740 fs/namei.c:3561
do_filp_open+0x1aa/0x400 fs/namei.c:3588
do_sys_openat2+0x16d/0x4d0 fs/open.c:1200
do_sys_open fs/open.c:1216 [inline]
__do_sys_openat fs/open.c:1232 [inline]
__se_sys_openat fs/open.c:1227 [inline]
__x64_sys_openat+0x13f/0x1f0 fs/open.c:1227
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff261e34347
RSP: 002b:00007ffe30328bb0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
RAX: ffffffffffffffda RBX: 0000000020000440 RCX: 00007ff261e34347
RDX: 0000000000000002 RSI: 00007ff261eba03b RDI: 00000000ffffff9c
RBP: 00007ff261eba03b R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007ffe3032acdc R14: 0000000000000076 R15: 00007ffe3032ace0

Showing all locks held in the system:
1 lock held by khungtaskd/27:
#0: ffffffff8b97f0e0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
5 locks held by kworker/1:2/1053:
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x8a3/0x16b0 kernel/workqueue.c:2268
#1: ffffc900049ffdb0 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8d7/0x16b0 kernel/workqueue.c:2272
#2: ffff88801fed4220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#2: ffff88801fed4220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4330 drivers/usb/core/hub.c:5662
#3: ffff888022e64220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#3: ffff888022e64220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
#4: ffff8880781991a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#4: ffff8880781991a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
5 locks held by kworker/0:2/1266:
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x8a3/0x16b0 kernel/workqueue.c:2268
#1: ffffc900053efdb0 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8d7/0x16b0 kernel/workqueue.c:2272
#2: ffff888146d2f220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#2: ffff888146d2f220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4330 drivers/usb/core/hub.c:5662
#3: ffff88806fe79220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#3: ffff88806fe79220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
#4: ffff88801fe231a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#4: ffff88801fe231a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
5 locks held by kworker/1:3/2969:
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x8a3/0x16b0 kernel/workqueue.c:2268
#1: ffffc90001177db0 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8d7/0x16b0 kernel/workqueue.c:2272
#2: ffff888146d3d220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#2: ffff888146d3d220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4330 drivers/usb/core/hub.c:5662
#3: ffff8880750ec220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#3: ffff8880750ec220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
#4: ffff888078ab61a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#4: ffff888078ab61a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
1 lock held by in:imklog/6253:
#0: ffff88801b806db0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:990
5 locks held by kworker/1:0/6557:
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x8a3/0x16b0 kernel/workqueue.c:2268
#1: ffffc90002c37db0 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8d7/0x16b0 kernel/workqueue.c:2272
#2: ffff888146d23220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#2: ffff888146d23220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4330 drivers/usb/core/hub.c:5662
#3: ffff888079db5220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#3: ffff888079db5220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
#4: ffff88801a88f1a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#4: ffff88801a88f1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
5 locks held by kworker/0:1/6563:
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1198 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:634 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
#0: ffff88801643cd38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x8a3/0x16b0 kernel/workqueue.c:2268
#1: ffffc90002cb7db0 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8d7/0x16b0 kernel/workqueue.c:2272
#2: ffff8880200b6220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#2: ffff8880200b6220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4330 drivers/usb/core/hub.c:5662
#3: ffff88806e962220 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#3: ffff88806e962220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
#4: ffff88806eba91a8 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:760 [inline]
#4: ffff88806eba91a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0 drivers/base/dd.c:944
1 lock held by syz-executor864/6907:
#0: ffffffff8c3c4488 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor864/6922:
#0: ffffffff8c3c4488 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor864/6936:
#0: ffffffff8c3c4488 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor864/6954:
#0: ffffffff8c3c4488 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
1 lock held by syz-executor864/6972:
#0: ffffffff8c3c4488 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
2 locks held by syz-executor864/6975:
#0: ffffffff8c3c4488 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x55/0x4a0 drivers/char/misc.c:107
#1: ffffffff8b84d408 (system_transition_mutex){+.+.}-{3:3}, at: snapshot_open+0x3b/0x2a0 kernel/power/user.c:54

----------------------------------------

2022-07-04 11:37:09

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Mon, Jul 04, 2022 at 07:25:44PM +0900, Tetsuo Handa wrote:
> On 2022/07/04 16:29, Greg KH wrote:
> > On Mon, Jul 04, 2022 at 03:44:07PM +0900, Tetsuo Handa wrote:
> >> syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
> >> misc_open() might sleep for long with misc_mtx held whereas userspace can
> >> flood with concurrent misc_open() requests. Mitigate this problem by making
> >> misc_open() and misc_register() killable.
> >
> > I do not understand, why not just fix snapshot_open()? Why add this
> > complexity to the misc core for a foolish individual misc device? Why
> > not add the fix there where it is spinning instead?
>
> Quoting an example from [1]. Multiple processes are calling misc_open() and
> all but one processes are blocked at mutex_lock(&misc_mtx). The one which is
> not blocked at mutex_lock(&misc_mtx) is also holding system_transition_mutex.

And that is because of that one misc device, right? Why not fix that
instead of papering over the issue in the misc core?

thanks,

greg k-h

2022-07-04 13:23:47

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/04 20:01, Greg KH wrote:
> On Mon, Jul 04, 2022 at 07:25:44PM +0900, Tetsuo Handa wrote:
>> On 2022/07/04 16:29, Greg KH wrote:
>>> On Mon, Jul 04, 2022 at 03:44:07PM +0900, Tetsuo Handa wrote:
>>>> syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
>>>> misc_open() might sleep for long with misc_mtx held whereas userspace can
>>>> flood with concurrent misc_open() requests. Mitigate this problem by making
>>>> misc_open() and misc_register() killable.
>>>
>>> I do not understand, why not just fix snapshot_open()? Why add this
>>> complexity to the misc core for a foolish individual misc device? Why
>>> not add the fix there where it is spinning instead?
>>
>> Quoting an example from [1]. Multiple processes are calling misc_open() and
>> all but one processes are blocked at mutex_lock(&misc_mtx). The one which is
>> not blocked at mutex_lock(&misc_mtx) is also holding system_transition_mutex.
>
> And that is because of that one misc device, right? Why not fix that
> instead of papering over the issue in the misc core?

Since "struct file_operations"->open() is allowed to sleep, calling
"struct file_operations"->open() via reassignment by "struct miscdevice"->fops
with locks held can cause problems.

Assuming that this is not a deadlock hidden by device_initialize(), current
mutex_lock(&misc_mtx) is as problematic as major_names_lock mentioned at
https://lkml.kernel.org/r/[email protected] .

>> If you don't like mutex_lock_killable(&misc_mtx), we will need to consider moving
>> file->f_op->open() from misc_open() to after mutex_unlock(&misc_mtx).

Below is minimal changes for avoid calling "struct file_operations"->open() with
misc_mtx held. This would be nothing but moving hung task warning from misc_open()
to snapshot_open() (and therefore we would need to introduce killable version of
lock_system_sleep()), but we can avoid making misc_mtx like major_names_lock above.

Greg, can you accept this minimal change?

drivers/char/misc.c | 4 ++++
include/linux/miscdevice.h | 1 +
kernel/power/user.c | 1 +
3 files changed, 6 insertions(+)

diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index cba19bfdc44d..292c86c090b9 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)

err = 0;
replace_fops(file, new_fops);
+ if (iter->unlocked_open && file->f_op->open) {
+ mutex_unlock(&misc_mtx);
+ return file->f_op->open(inode, file);
+ }
if (file->f_op->open)
err = file->f_op->open(inode, file);
fail:
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 0676f18093f9..e112ef9e3b7b 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -86,6 +86,7 @@ struct miscdevice {
const struct attribute_group **groups;
const char *nodename;
umode_t mode;
+ bool unlocked_open;
};

extern int misc_register(struct miscdevice *misc);
diff --git a/kernel/power/user.c b/kernel/power/user.c
index ad241b4ff64c..69a269c4fb46 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -441,6 +441,7 @@ static struct miscdevice snapshot_device = {
.minor = SNAPSHOT_MINOR,
.name = "snapshot",
.fops = &snapshot_fops,
+ .unlocked_open = true,
};

static int __init snapshot_device_init(void)
--
2.34.1

2022-07-04 13:25:26

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Mon, Jul 04, 2022 at 09:34:04PM +0900, Tetsuo Handa wrote:
> On 2022/07/04 20:01, Greg KH wrote:
> > On Mon, Jul 04, 2022 at 07:25:44PM +0900, Tetsuo Handa wrote:
> >> On 2022/07/04 16:29, Greg KH wrote:
> >>> On Mon, Jul 04, 2022 at 03:44:07PM +0900, Tetsuo Handa wrote:
> >>>> syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
> >>>> misc_open() might sleep for long with misc_mtx held whereas userspace can
> >>>> flood with concurrent misc_open() requests. Mitigate this problem by making
> >>>> misc_open() and misc_register() killable.
> >>>
> >>> I do not understand, why not just fix snapshot_open()? Why add this
> >>> complexity to the misc core for a foolish individual misc device? Why
> >>> not add the fix there where it is spinning instead?
> >>
> >> Quoting an example from [1]. Multiple processes are calling misc_open() and
> >> all but one processes are blocked at mutex_lock(&misc_mtx). The one which is
> >> not blocked at mutex_lock(&misc_mtx) is also holding system_transition_mutex.
> >
> > And that is because of that one misc device, right? Why not fix that
> > instead of papering over the issue in the misc core?
>
> Since "struct file_operations"->open() is allowed to sleep, calling
> "struct file_operations"->open() via reassignment by "struct miscdevice"->fops
> with locks held can cause problems.
>
> Assuming that this is not a deadlock hidden by device_initialize(), current
> mutex_lock(&misc_mtx) is as problematic as major_names_lock mentioned at
> https://lkml.kernel.org/r/[email protected] .
>
> >> If you don't like mutex_lock_killable(&misc_mtx), we will need to consider moving
> >> file->f_op->open() from misc_open() to after mutex_unlock(&misc_mtx).
>
> Below is minimal changes for avoid calling "struct file_operations"->open() with
> misc_mtx held. This would be nothing but moving hung task warning from misc_open()
> to snapshot_open() (and therefore we would need to introduce killable version of
> lock_system_sleep()), but we can avoid making misc_mtx like major_names_lock above.
>
> Greg, can you accept this minimal change?
>
> drivers/char/misc.c | 4 ++++
> include/linux/miscdevice.h | 1 +
> kernel/power/user.c | 1 +
> 3 files changed, 6 insertions(+)
>
> diff --git a/drivers/char/misc.c b/drivers/char/misc.c
> index cba19bfdc44d..292c86c090b9 100644
> --- a/drivers/char/misc.c
> +++ b/drivers/char/misc.c
> @@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)
>
> err = 0;
> replace_fops(file, new_fops);
> + if (iter->unlocked_open && file->f_op->open) {
> + mutex_unlock(&misc_mtx);
> + return file->f_op->open(inode, file);
> + }

One of the invariants of miscdev is that once misc_deregister() returns,
no new calls to f_op->open() are made. (Although, of course, you can
still have open files but that's a whole different problem.)

This change breaks this invariant which I think is problematic because
drivers then can't know when new calls to open() are guaranteed to stop
coming.

> if (file->f_op->open)
> err = file->f_op->open(inode, file);
> fail:
> diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
> index 0676f18093f9..e112ef9e3b7b 100644
> --- a/include/linux/miscdevice.h
> +++ b/include/linux/miscdevice.h
> @@ -86,6 +86,7 @@ struct miscdevice {
> const struct attribute_group **groups;
> const char *nodename;
> umode_t mode;
> + bool unlocked_open;
> };
>
> extern int misc_register(struct miscdevice *misc);
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index ad241b4ff64c..69a269c4fb46 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -441,6 +441,7 @@ static struct miscdevice snapshot_device = {
> .minor = SNAPSHOT_MINOR,
> .name = "snapshot",
> .fops = &snapshot_fops,
> + .unlocked_open = true,
> };
>
> static int __init snapshot_device_init(void)
> --
> 2.34.1
>

2022-07-04 14:00:44

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/04 21:59, Wedson Almeida Filho wrote:
>> @@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)
>>
>> err = 0;
>> replace_fops(file, new_fops);
>> + if (iter->unlocked_open && file->f_op->open) {
>> + mutex_unlock(&misc_mtx);
>> + return file->f_op->open(inode, file);
>> + }
>
> One of the invariants of miscdev is that once misc_deregister() returns,
> no new calls to f_op->open() are made. (Although, of course, you can
> still have open files but that's a whole different problem.)

The point of this change is that file->f_op after mutex_unlock(&misc_mtx) is
from new_fops which is guaranteed to hold a ref on "struct file_operations"
via new_fops = fops_get("struct miscdevice"->fops).
That is, a module ref remains valid even after mutex_unlock(&misc_mtx).

And as with major_names_lock case quoted below, this change assumes that
misc_deregister() is called from module's __exit function, and fops_get()
is preventing the module owning new_fops from calling __exit function.

+ if ((*n)->major != major || !(*n)->probe)
+ continue;
+ if (!try_module_get((*n)->owner))
+ break;
+ /*
+ * Calling probe function with major_names_lock held causes
+ * circular locking dependency problem. Thus, call it after
+ * releasing major_names_lock.
+ */
+ probe_fn = (*n)->probe;
+ mutex_unlock(&major_names_lock);
+ /*
+ * Assuming that unregister_blkdev() is called from module's
+ * __exit function, a module refcount taken above allows us
+ * to safely call probe function without major_names_lock held.
+ */
+ probe_fn(devt);
+ module_put((*n)->owner);
+ return;

2022-07-04 14:05:18

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Mon, Jul 04, 2022 at 10:48:32PM +0900, Tetsuo Handa wrote:
> On 2022/07/04 21:59, Wedson Almeida Filho wrote:
> >> @@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)
> >>
> >> err = 0;
> >> replace_fops(file, new_fops);
> >> + if (iter->unlocked_open && file->f_op->open) {
> >> + mutex_unlock(&misc_mtx);
> >> + return file->f_op->open(inode, file);
> >> + }
> >
> > One of the invariants of miscdev is that once misc_deregister() returns,
> > no new calls to f_op->open() are made. (Although, of course, you can
> > still have open files but that's a whole different problem.)
>
> The point of this change is that file->f_op after mutex_unlock(&misc_mtx) is
> from new_fops which is guaranteed to hold a ref on "struct file_operations"
> via new_fops = fops_get("struct miscdevice"->fops).
> That is, a module ref remains valid even after mutex_unlock(&misc_mtx).
>
> And as with major_names_lock case quoted below, this change assumes that
> misc_deregister() is called from module's __exit function, and fops_get()
> is preventing the module owning new_fops from calling __exit function.

Your assumption is not sound. misc_deregister() can be (and is)
legitimately called from other places, for example, a driver's remove()
callback. In fact, when I grep for misc_deregister(), the second
instance is such a case.

Anyway, I agree that something should be done about the issue you
report, I'm just arguing that this proposal isn't a good one because it
breaks an invariant that makes it really hard for drivers to properly
cleanup.

>
> + if ((*n)->major != major || !(*n)->probe)
> + continue;
> + if (!try_module_get((*n)->owner))
> + break;
> + /*
> + * Calling probe function with major_names_lock held causes
> + * circular locking dependency problem. Thus, call it after
> + * releasing major_names_lock.
> + */
> + probe_fn = (*n)->probe;
> + mutex_unlock(&major_names_lock);
> + /*
> + * Assuming that unregister_blkdev() is called from module's
> + * __exit function, a module refcount taken above allows us
> + * to safely call probe function without major_names_lock held.
> + */
> + probe_fn(devt);
> + module_put((*n)->owner);
> + return;
>

2022-07-04 14:21:17

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/04 22:57, Wedson Almeida Filho wrote:
> On Mon, Jul 04, 2022 at 10:48:32PM +0900, Tetsuo Handa wrote:
>> On 2022/07/04 21:59, Wedson Almeida Filho wrote:
>>>> @@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)
>>>>
>>>> err = 0;
>>>> replace_fops(file, new_fops);
>>>> + if (iter->unlocked_open && file->f_op->open) {
>>>> + mutex_unlock(&misc_mtx);
>>>> + return file->f_op->open(inode, file);
>>>> + }
>>>
>>> One of the invariants of miscdev is that once misc_deregister() returns,
>>> no new calls to f_op->open() are made. (Although, of course, you can
>>> still have open files but that's a whole different problem.)
>>
>> The point of this change is that file->f_op after mutex_unlock(&misc_mtx) is
>> from new_fops which is guaranteed to hold a ref on "struct file_operations"
>> via new_fops = fops_get("struct miscdevice"->fops).
>> That is, a module ref remains valid even after mutex_unlock(&misc_mtx).
>>
>> And as with major_names_lock case quoted below, this change assumes that
>> misc_deregister() is called from module's __exit function, and fops_get()
>> is preventing the module owning new_fops from calling __exit function.
>
> Your assumption is not sound. misc_deregister() can be (and is)
> legitimately called from other places, for example, a driver's remove()
> callback. In fact, when I grep for misc_deregister(), the second
> instance is such a case.

OK, the frequency of calling misc_deregister() can be much higher than
unregister_blkdev(), which means that misc_mtx is more prone to trigger
hung task warnings. I'm more inclined to avoid sleeping with misc_mtx held.

2022-07-04 15:08:47

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Mon, Jul 04, 2022 at 09:34:04PM +0900, Tetsuo Handa wrote:
> On 2022/07/04 20:01, Greg KH wrote:
> > On Mon, Jul 04, 2022 at 07:25:44PM +0900, Tetsuo Handa wrote:
> >> On 2022/07/04 16:29, Greg KH wrote:
> >>> On Mon, Jul 04, 2022 at 03:44:07PM +0900, Tetsuo Handa wrote:
> >>>> syzbot is reporting hung task at misc_open() [1], for snapshot_open() from
> >>>> misc_open() might sleep for long with misc_mtx held whereas userspace can
> >>>> flood with concurrent misc_open() requests. Mitigate this problem by making
> >>>> misc_open() and misc_register() killable.
> >>>
> >>> I do not understand, why not just fix snapshot_open()? Why add this
> >>> complexity to the misc core for a foolish individual misc device? Why
> >>> not add the fix there where it is spinning instead?
> >>
> >> Quoting an example from [1]. Multiple processes are calling misc_open() and
> >> all but one processes are blocked at mutex_lock(&misc_mtx). The one which is
> >> not blocked at mutex_lock(&misc_mtx) is also holding system_transition_mutex.
> >
> > And that is because of that one misc device, right? Why not fix that
> > instead of papering over the issue in the misc core?
>
> Since "struct file_operations"->open() is allowed to sleep, calling
> "struct file_operations"->open() via reassignment by "struct miscdevice"->fops
> with locks held can cause problems.
>
> Assuming that this is not a deadlock hidden by device_initialize(), current
> mutex_lock(&misc_mtx) is as problematic as major_names_lock mentioned at
> https://lkml.kernel.org/r/[email protected] .
>
> >> If you don't like mutex_lock_killable(&misc_mtx), we will need to consider moving
> >> file->f_op->open() from misc_open() to after mutex_unlock(&misc_mtx).
>
> Below is minimal changes for avoid calling "struct file_operations"->open() with
> misc_mtx held. This would be nothing but moving hung task warning from misc_open()
> to snapshot_open() (and therefore we would need to introduce killable version of
> lock_system_sleep()), but we can avoid making misc_mtx like major_names_lock above.
>
> Greg, can you accept this minimal change?
>
> drivers/char/misc.c | 4 ++++
> include/linux/miscdevice.h | 1 +
> kernel/power/user.c | 1 +
> 3 files changed, 6 insertions(+)

I don't understand what you are trying to "fix" here. What is userspace
doing (as a normal user) that is causing a problem, and what problem is
it causing and for what device/hardware/driver is this a problem?

Yes, you can sleep in open(), but you shouldn't sleep long, if at all
possible as it can be annoying. So why not fix up the offending driver
not to sleep to long?

thanks,

greg k-h

2022-07-04 15:10:55

by Wedson Almeida Filho

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Mon, Jul 04, 2022 at 11:07:54PM +0900, Tetsuo Handa wrote:
> On 2022/07/04 22:57, Wedson Almeida Filho wrote:
> > On Mon, Jul 04, 2022 at 10:48:32PM +0900, Tetsuo Handa wrote:
> >> On 2022/07/04 21:59, Wedson Almeida Filho wrote:
> >>>> @@ -139,6 +139,10 @@ static int misc_open(struct inode *inode, struct file *file)
> >>>>
> >>>> err = 0;
> >>>> replace_fops(file, new_fops);
> >>>> + if (iter->unlocked_open && file->f_op->open) {
> >>>> + mutex_unlock(&misc_mtx);
> >>>> + return file->f_op->open(inode, file);
> >>>> + }
> >>>
> >>> One of the invariants of miscdev is that once misc_deregister() returns,
> >>> no new calls to f_op->open() are made. (Although, of course, you can
> >>> still have open files but that's a whole different problem.)
> >>
> >> The point of this change is that file->f_op after mutex_unlock(&misc_mtx) is
> >> from new_fops which is guaranteed to hold a ref on "struct file_operations"
> >> via new_fops = fops_get("struct miscdevice"->fops).
> >> That is, a module ref remains valid even after mutex_unlock(&misc_mtx).
> >>
> >> And as with major_names_lock case quoted below, this change assumes that
> >> misc_deregister() is called from module's __exit function, and fops_get()
> >> is preventing the module owning new_fops from calling __exit function.
> >
> > Your assumption is not sound. misc_deregister() can be (and is)
> > legitimately called from other places, for example, a driver's remove()
> > callback. In fact, when I grep for misc_deregister(), the second
> > instance is such a case.
>
> OK, the frequency of calling misc_deregister() can be much higher than
> unregister_blkdev(), which means that misc_mtx is more prone to trigger
> hung task warnings. I'm more inclined to avoid sleeping with misc_mtx held.

Tetsuo, I'm sorry if I'm not making myself clear. I'm arguing that your
patch is buggy and therefore should not be accepted.

Here's one example of an issue that your patch would introduce. In
binder init, we have the following error path:

err_init_binder_device_failed:
hlist_for_each_entry_safe(device, tmp, &binder_devices, hlist) {
misc_deregister(&device->miscdev);
hlist_del(&device->hlist);
kfree(device);
}

Note that open() for binder touches the `device` pointer. If open() can
be called _after_ misc_deregister(), then there is a race condition
where open() is touching `device` while this error path is freeing it.

Right now it isn't a bug because misc_deregister() guarantees that after
it returns, open() won't be called anymore. Your patch breaks this
guarantee.

2022-07-05 05:38:18

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/04 23:31, Greg KH wrote:
> I don't understand what you are trying to "fix" here. What is userspace
> doing (as a normal user) that is causing a problem, and what problem is
> it causing and for what device/hardware/driver is this a problem?

Currently the root cause is unknown.
This might be another example of deadlock hidden by device_initialize().

We can see from https://syzkaller.appspot.com/text?tag=CrashReport&x=11feb7e0080000 that
when khungtaskd reports that a process is blocked waiting for misc_mtx at misc_open(),
there is a process which is holding system_transition_mutex from snapshot_open().

----------------------------------------
INFO: task syz-executor.4:21922 blocked for more than 143 seconds.
Not tainted 5.19.0-rc4-syzkaller-00187-g089866061428 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.4 state:D stack:28408 pid:21922 ppid: 3666 flags:0x00000004

2 locks held by syz-executor.5/21906:
#0: ffffffff8c82f708 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x5f/0x4a0 drivers/char/misc.c:107
#1: ffffffff8bc536e8 (system_transition_mutex){+.+.}-{3:3}, at: snapshot_open+0x3b/0x2a0 kernel/power/user.c:54
1 lock held by syz-executor.4/21922:
#0: ffffffff8c82f708 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x5f/0x4a0 drivers/char/misc.c:107
----------------------------------------

Possible locations where snapshot_open() might sleep with system_transition_mutex held are
pm_notifier_call_chain_robust()/wait_for_device_probe()/create_basic_memory_bitmaps().
But I think we can exclude pm_notifier_call_chain_robust() because lockdep does not report
that that process is holding "struct blocking_notifier_head"->rwsem. I suspect that
that process is sleeping at wait_for_device_probe(), for it waits for probe operations.

----------------------------------------
void wait_for_device_probe(void)
{
/* wait for the deferred probe workqueue to finish */
flush_work(&deferred_probe_work);

/* wait for the known devices to complete their probing */
wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
async_synchronize_full();
}
----------------------------------------

>
> Yes, you can sleep in open(), but you shouldn't sleep long, if at all
> possible as it can be annoying. So why not fix up the offending driver
> not to sleep to long?

We can't predict how long snapshot_open() sleeps inside wait_for_device_probe().

Looking at abovementioned report again, it seems to be common that one process is
inside input_register_handle() and another process is inside input_close_device(),
and these two processes are holding the same &dev->mutex#2 object. Guessing from
the code that input_register_handle() will not sleep with dev->mutex held,
input_close_device() is holding dev->mutex and input_register_handle() is
waiting for input_close_device() to release dev->mutex.

Therefore, there might be a race or deadlock between these two processes.
If &dev->mutex#2 were subjected to device_initialize() magic, lockdep won't be
able to catch the deadlock. But I'm not familiar with device management code...

Maybe input_close_device() is failing to release dev->mutex for some reason?
Maybe nothing but too slow to wait?

----------------------------------------
7 locks held by kworker/1:0/22:
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:636 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610 kernel/workqueue.c:2260
#1: ffffc900001c7da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610 kernel/workqueue.c:2264
#2: ffff8881479d4190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#2: ffff8881479d4190 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4690 drivers/usb/core/hub.c:5691
#3: ffff888044782190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#3: ffff888044782190 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#4: ffff8880447d2118 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#4: ffff8880447d2118 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#5: ffffffff8ceafca8 (input_mutex){+.+.}-{3:3}, at: input_register_device.cold+0x34/0x304 drivers/input/input.c:2378
#6: ffff8880447d52c0 (&dev->mutex#2){+.+.}-{3:3}, at: input_register_handle+0x6d/0x510 drivers/input/input.c:2544

2 locks held by acpid/2962:
#0: ffff888042a28158 (&joydev->mutex){+.+.}-{3:3}, at: joydev_close_device drivers/input/joydev.c:220 [inline]
#0: ffff888042a28158 (&joydev->mutex){+.+.}-{3:3}, at: joydev_release+0x187/0x290 drivers/input/joydev.c:252
#1: ffff8880447d52c0 (&dev->mutex#2){+.+.}-{3:3}, at: input_close_device+0x42/0x1f0 drivers/input/input.c:726

7 locks held by kworker/1:11/5743:
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: atomic_long_set include/linux/atomic/atomic-instrumented.h:1280 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:636 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:663 [inline]
#0: ffff888011a65d38 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610 kernel/workqueue.c:2260
#1: ffffc900153c7da8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610 kernel/workqueue.c:2264
#2: ffff888021384190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#2: ffff888021384190 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4690 drivers/usb/core/hub.c:5691
#3: ffff8880468a4190 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#3: ffff8880468a4190 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#4: ffff8880468a6118 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#4: ffff8880468a6118 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#5: ffff8880255f1a20 (&dev->mutex){....}-{3:3}, at: device_lock include/linux/device.h:835 [inline]
#5: ffff8880255f1a20 (&dev->mutex){....}-{3:3}, at: __device_attach+0x76/0x530 drivers/base/dd.c:964
#6: ffffffff8ceafca8 (input_mutex){+.+.}-{3:3}, at: input_register_device.cold+0x34/0x304 drivers/input/input.c:2378
----------------------------------------

2022-07-05 05:42:45

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Tue, Jul 05, 2022 at 02:21:17PM +0900, Tetsuo Handa wrote:
> On 2022/07/04 23:31, Greg KH wrote:
> > I don't understand what you are trying to "fix" here. What is userspace
> > doing (as a normal user) that is causing a problem, and what problem is
> > it causing and for what device/hardware/driver is this a problem?
>
> Currently the root cause is unknown.
> This might be another example of deadlock hidden by device_initialize().
>
> We can see from https://syzkaller.appspot.com/text?tag=CrashReport&x=11feb7e0080000 that
> when khungtaskd reports that a process is blocked waiting for misc_mtx at misc_open(),
> there is a process which is holding system_transition_mutex from snapshot_open().

/dev/snapshot is not read/writable by anyone but root for obvious
reasons.

And perhaps it's something that syzbot shouldn't be fuzzing unless it
wants to take the system down easily :)

thanks,

greg k-h

2022-07-05 07:25:14

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Tue, 5 Jul 2022 at 07:54, Tetsuo Handa
<[email protected]> wrote:
> On Tue, Jul 05, 2022 at 02:21:17PM +0900, Tetsuo Handa wrote:
> > On 2022/07/04 23:31, Greg KH wrote:
> > > I don't understand what you are trying to "fix" here. What is userspace
> > > doing (as a normal user) that is causing a problem, and what problem is
> > > it causing and for what device/hardware/driver is this a problem?
> >
> > Currently the root cause is unknown.
> > This might be another example of deadlock hidden by device_initialize().
> >
> > We can see from https://syzkaller.appspot.com/text?tag=CrashReport&x=11feb7e0080000 that
> > when khungtaskd reports that a process is blocked waiting for misc_mtx at misc_open(),
> > there is a process which is holding system_transition_mutex from snapshot_open().
>
> /dev/snapshot is not read/writable by anyone but root for obvious
> reasons.
>
> And perhaps it's something that syzbot shouldn't be fuzzing unless it
> wants to take the system down easily :)

We could turn CONFIG_HIBERNATION_SNAPSHOT_DEV off for syzbot, but it
will also mean this part of the kernel won't be tested at all.
I see it has 14 ioclt's (below) and not all of them look problematic
(like POWER_OFF).
Perhaps the kernel could restrict access only to reboot/restore
functionality? This way we could at least test everything related to
snapshot creation.

#define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
#define SNAPSHOT_UNFREEZE _IO(SNAPSHOT_IOC_MAGIC, 2)
#define SNAPSHOT_ATOMIC_RESTORE _IO(SNAPSHOT_IOC_MAGIC, 4)
#define SNAPSHOT_FREE _IO(SNAPSHOT_IOC_MAGIC, 5)
#define SNAPSHOT_FREE_SWAP_PAGES _IO(SNAPSHOT_IOC_MAGIC, 9)
#define SNAPSHOT_S2RAM _IO(SNAPSHOT_IOC_MAGIC, 11)
#define SNAPSHOT_SET_SWAP_AREA _IOW(SNAPSHOT_IOC_MAGIC, 13, struct
resume_swap_area)
#define SNAPSHOT_GET_IMAGE_SIZE _IOR(SNAPSHOT_IOC_MAGIC, 14, __kernel_loff_t)
#define SNAPSHOT_PLATFORM_SUPPORT _IO(SNAPSHOT_IOC_MAGIC, 15)
#define SNAPSHOT_POWER_OFF _IO(SNAPSHOT_IOC_MAGIC, 16)
#define SNAPSHOT_CREATE_IMAGE _IOW(SNAPSHOT_IOC_MAGIC, 17, int)
#define SNAPSHOT_PREF_IMAGE_SIZE _IO(SNAPSHOT_IOC_MAGIC, 18)
#define SNAPSHOT_AVAIL_SWAP_SIZE _IOR(SNAPSHOT_IOC_MAGIC, 19, __kernel_loff_t)
#define SNAPSHOT_ALLOC_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 20, __kernel_loff_t)

2022-07-05 10:46:42

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Tue, Jul 05, 2022 at 09:20:24AM +0200, Dmitry Vyukov wrote:
> On Tue, 5 Jul 2022 at 07:54, Tetsuo Handa
> <[email protected]> wrote:
> > On Tue, Jul 05, 2022 at 02:21:17PM +0900, Tetsuo Handa wrote:
> > > On 2022/07/04 23:31, Greg KH wrote:
> > > > I don't understand what you are trying to "fix" here. What is userspace
> > > > doing (as a normal user) that is causing a problem, and what problem is
> > > > it causing and for what device/hardware/driver is this a problem?
> > >
> > > Currently the root cause is unknown.
> > > This might be another example of deadlock hidden by device_initialize().
> > >
> > > We can see from https://syzkaller.appspot.com/text?tag=CrashReport&x=11feb7e0080000 that
> > > when khungtaskd reports that a process is blocked waiting for misc_mtx at misc_open(),
> > > there is a process which is holding system_transition_mutex from snapshot_open().
> >
> > /dev/snapshot is not read/writable by anyone but root for obvious
> > reasons.
> >
> > And perhaps it's something that syzbot shouldn't be fuzzing unless it
> > wants to take the system down easily :)
>
> We could turn CONFIG_HIBERNATION_SNAPSHOT_DEV off for syzbot, but it
> will also mean this part of the kernel won't be tested at all.
> I see it has 14 ioclt's (below) and not all of them look problematic
> (like POWER_OFF).
> Perhaps the kernel could restrict access only to reboot/restore
> functionality? This way we could at least test everything related to
> snapshot creation.

This is already restricted to root, why would you want to restrict it
anymore?

> #define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
> #define SNAPSHOT_UNFREEZE _IO(SNAPSHOT_IOC_MAGIC, 2)
> #define SNAPSHOT_ATOMIC_RESTORE _IO(SNAPSHOT_IOC_MAGIC, 4)
> #define SNAPSHOT_FREE _IO(SNAPSHOT_IOC_MAGIC, 5)
> #define SNAPSHOT_FREE_SWAP_PAGES _IO(SNAPSHOT_IOC_MAGIC, 9)
> #define SNAPSHOT_S2RAM _IO(SNAPSHOT_IOC_MAGIC, 11)
> #define SNAPSHOT_SET_SWAP_AREA _IOW(SNAPSHOT_IOC_MAGIC, 13, struct
> resume_swap_area)
> #define SNAPSHOT_GET_IMAGE_SIZE _IOR(SNAPSHOT_IOC_MAGIC, 14, __kernel_loff_t)
> #define SNAPSHOT_PLATFORM_SUPPORT _IO(SNAPSHOT_IOC_MAGIC, 15)
> #define SNAPSHOT_POWER_OFF _IO(SNAPSHOT_IOC_MAGIC, 16)
> #define SNAPSHOT_CREATE_IMAGE _IOW(SNAPSHOT_IOC_MAGIC, 17, int)
> #define SNAPSHOT_PREF_IMAGE_SIZE _IO(SNAPSHOT_IOC_MAGIC, 18)
> #define SNAPSHOT_AVAIL_SWAP_SIZE _IOR(SNAPSHOT_IOC_MAGIC, 19, __kernel_loff_t)
> #define SNAPSHOT_ALLOC_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 20, __kernel_loff_t)

Fuzzing this is always nice, but be very aware of the system state
changes that you are creating. Also know when you make these state
changes, the rest of the system's functionality also changes.

thanks,

greg k-h

2022-07-05 14:14:46

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/05 14:21, Tetsuo Handa wrote:
> Possible locations where snapshot_open() might sleep with system_transition_mutex held are
> pm_notifier_call_chain_robust()/wait_for_device_probe()/create_basic_memory_bitmaps().
> But I think we can exclude pm_notifier_call_chain_robust() because lockdep does not report
> that that process is holding "struct blocking_notifier_head"->rwsem. I suspect that
> that process is sleeping at wait_for_device_probe(), for it waits for probe operations.
>
> ----------------------------------------
> void wait_for_device_probe(void)
> {
> /* wait for the deferred probe workqueue to finish */
> flush_work(&deferred_probe_work);
>
> /* wait for the known devices to complete their probing */
> wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
> async_synchronize_full();
> }
> ----------------------------------------

syzbot confirmed that snapshot_open() is unable to proceed due to
atomic_read(&probe_count) == 2 for 145 seconds.

----------------------------------------
[ 86.794300][ T4209] Held system_transition_mutex.
[ 86.821486][ T4209] Calling wait_for_device_probe()
[ 86.841374][ T4209] Calling flush_work(&deferred_probe_work)
[ 86.867398][ T4209] Calling wait_event(probe_waitqueue)
[ 87.966188][ T4209] Calling probe_count=2
(...snipped...)
[ 233.554473][ T4209] Calling probe_count=2
[ 234.444800][ T28] INFO: task syz-executor.4:4146 blocked for more than 143 seconds.
----------------------------------------

Apart from whether we should fuzz snapshot code or not,
there seems to be a bug that causes wait_for_device_probe() to hung.

On 2022/07/05 22:44, syzbot wrote:
> Tested on:
>
> commit: c1084b6c Merge tag 'soc-fixes-5.19-2' of git://git.ker..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=156cc410080000
> kernel config: https://syzkaller.appspot.com/x/.config?x=66f70cacb0085db4
> dashboard link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c
> compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> patch: https://syzkaller.appspot.com/x/patch.diff?x=11ae4834080000

2022-07-05 14:39:01

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Tue, Jul 05, 2022 at 11:01:38PM +0900, Tetsuo Handa wrote:
> On 2022/07/05 14:21, Tetsuo Handa wrote:
> > Possible locations where snapshot_open() might sleep with system_transition_mutex held are
> > pm_notifier_call_chain_robust()/wait_for_device_probe()/create_basic_memory_bitmaps().
> > But I think we can exclude pm_notifier_call_chain_robust() because lockdep does not report
> > that that process is holding "struct blocking_notifier_head"->rwsem. I suspect that
> > that process is sleeping at wait_for_device_probe(), for it waits for probe operations.
> >
> > ----------------------------------------
> > void wait_for_device_probe(void)
> > {
> > /* wait for the deferred probe workqueue to finish */
> > flush_work(&deferred_probe_work);
> >
> > /* wait for the known devices to complete their probing */
> > wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
> > async_synchronize_full();
> > }
> > ----------------------------------------
>
> syzbot confirmed that snapshot_open() is unable to proceed due to
> atomic_read(&probe_count) == 2 for 145 seconds.
>
> ----------------------------------------
> [ 86.794300][ T4209] Held system_transition_mutex.
> [ 86.821486][ T4209] Calling wait_for_device_probe()
> [ 86.841374][ T4209] Calling flush_work(&deferred_probe_work)
> [ 86.867398][ T4209] Calling wait_event(probe_waitqueue)
> [ 87.966188][ T4209] Calling probe_count=2
> (...snipped...)
> [ 233.554473][ T4209] Calling probe_count=2
> [ 234.444800][ T28] INFO: task syz-executor.4:4146 blocked for more than 143 seconds.
> ----------------------------------------
>
> Apart from whether we should fuzz snapshot code or not,
> there seems to be a bug that causes wait_for_device_probe() to hung.

What else is going on in the system at this point in time? Are devices
still being added as part of boot init sequences? Or has boot finished
properly and these are devices being removed?

Some device is being probed at the moment, maybe we have a deadlock
somewhere here...

thanks,

greg k-h

2022-07-05 15:03:59

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/05 23:16, Greg KH wrote:
>> Apart from whether we should fuzz snapshot code or not,
>> there seems to be a bug that causes wait_for_device_probe() to hung.
>
> What else is going on in the system at this point in time? Are devices
> still being added as part of boot init sequences? Or has boot finished
> properly and these are devices being removed?

Whatever is going on. syzkaller starts after boot has finished properly.

syzkaller is opening /dev/snapshot as one of testcases among with
connecting to usb devices using /dev/raw-gadget .

An example C reproducer is
https://syzkaller.appspot.com/text?tag=ReproC&x=13ef54d2b00000 .

Console output is
https://syzkaller.appspot.com/x/log.txt?x=11589950080000 .

>
> Some device is being probed at the moment, maybe we have a deadlock
> somewhere here...

Lockdep says __device_attach() from hub_event() was in progress.

----------------------------------------
[ 237.376478][ T28] 5 locks held by kworker/1:1/26:
[ 237.381526][ T28] #0: ffff888016b92538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610
[ 237.392798][ T28] #1: ffffc90000c2fda8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610
[ 237.406354][ T28] #2: ffff88801f7ee220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4680
[ 237.415920][ T28] #3: ffff88801b6c6220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
[ 237.426682][ T28] #4: ffff8880216bc1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
----------------------------------------

2022-07-06 06:29:09

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/05 23:35, Tetsuo Handa wrote:
> On 2022/07/05 23:16, Greg KH wrote:
>> Some device is being probed at the moment, maybe we have a deadlock
>> somewhere here...
>
> Lockdep says __device_attach() from hub_event() was in progress.
>
> ----------------------------------------
> [ 237.376478][ T28] 5 locks held by kworker/1:1/26:
> [ 237.381526][ T28] #0: ffff888016b92538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610
> [ 237.392798][ T28] #1: ffffc90000c2fda8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610
> [ 237.406354][ T28] #2: ffff88801f7ee220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4680
> [ 237.415920][ T28] #3: ffff88801b6c6220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> [ 237.426682][ T28] #4: ffff8880216bc1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> ----------------------------------------
>

The number returned by atomic_read(&probe_count) matched the number of works for hub_event()
in usb_hub_wq workqueue. The probe function is called from hub_event(), and
usb_stor_msg_common() calls wait_for_completion_interruptible_timeout(MAX_SCHEDULE_TIMEOUT)
via driver's init function.

But if the usb device is unresponsive, wait_for_completion_interruptible_timeout() sleeps
forever. And in this testcase (which emulates usb devices using /dev/raw-gadget interface),
the usb device became unresponsive because the process who is responsible with reading/writing
/dev/raw-gadget interface is blocked at mutex_lock(&misc_mtx) at misc_open(), and results in
an AB-BA deadlock condition. Making misc_open() killable solved this problem, by allowing
the opener of /dev/raw-gadget interface to call fput() upon "send SIGKILL after 5 seconds from
fork()" behavior.

Anyway,

/*
* Resuming. We may need to wait for the image device to
* appear.
*/
wait_for_device_probe();

in snapshot_open() will sleep forever if some device became unresponsive.

How should we fix this problem?



SysRq-t output (location is based on Linux 5.18.0):
----------------------------------------
[ 113.988455][ T4455] Calling probe_count=2
[ 115.002350][ C2] sysrq: Show State

[ 124.602603][ C2] task:kworker/2:2 state:S stack:11608 pid: 980 ppid: 2 flags:0x00004000
[ 124.613409][ C2] Workqueue: usb_hub_wq hub_event
[ 124.618941][ C2] Call Trace:
[ 124.624046][ C2] <TASK>
[ 124.629122][ C2] __schedule+0x3b6/0x1690
[ 124.634575][ C2] ? _raw_spin_unlock_irq+0x1f/0x40
[ 124.640085][ C2] schedule+0x49/0xb0
[ 124.645280][ C2] schedule_timeout+0x134/0x180
[ 124.650574][ C2] ? _raw_spin_unlock_irq+0x1f/0x40
[ 124.655880][ C2] ? lockdep_hardirqs_on+0x79/0x100
[ 124.662294][ C2] __wait_for_common+0xac/0x1f0
[ 124.667740][ C2] ? usleep_range_state+0xe0/0xe0
[ 124.673449][ C2] usb_stor_msg_common+0x157/0x1f0 usb_stor_msg_common at drivers/usb/storage/transport.c:166 (discriminator 4)
[ 124.679074][ C2] ? verify_cpu+0xf0/0x100
[ 124.684774][ C2] usb_stor_ctrl_transfer+0xc3/0xf0 usb_stor_ctrl_transfer at drivers/usb/storage/transport.c:344
[ 124.690387][ C2] usb_stor_sddr09_dpcm_init+0x63/0x1a0 sddr09_send_command at drivers/usb/storage/sddr09.c:307
(inlined by) usb_stor_sddr09_dpcm_init at drivers/usb/storage/sddr09.c:1455
[ 124.696087][ C2] ? sddr09_get_cardinfo+0x530/0x530
[ 124.701654][ C2] usb_stor_probe2+0x24f/0x470 usb_stor_acquire_resources at drivers/usb/storage/usb.c:788
(inlined by) usb_stor_probe2 at drivers/usb/storage/usb.c:1050
[ 124.707024][ C2] ? sddr09_transport+0x1940/0x1940
[ 124.712690][ C2] sddr09_probe+0xd6/0x110
[ 124.717980][ C2] usb_probe_interface+0x19b/0x3e0
[ 124.723533][ C2] ? usb_match_dynamic_id+0xe0/0xe0
[ 124.728900][ C2] really_probe+0x138/0x4c0
[ 124.734058][ C2] __driver_probe_device+0x191/0x220
[ 124.739406][ C2] driver_probe_device+0x2a/0x120
[ 124.744649][ C2] __device_attach_driver+0x105/0x1a0
[ 124.749981][ C2] ? driver_allows_async_probing+0x90/0x90
[ 124.755600][ C2] bus_for_each_drv+0xba/0x100
[ 124.761136][ C2] __device_attach+0x130/0x290
[ 124.766408][ C2] bus_probe_device+0xdb/0xf0
[ 124.771735][ C2] device_add+0x635/0xdf0
[ 124.776770][ C2] ? __mutex_unlock_slowpath+0x37/0x280
[ 124.782189][ C2] usb_set_configuration+0x9ca/0xd10
[ 124.787516][ C2] usb_generic_driver_probe+0x8c/0xc0
[ 124.792906][ C2] usb_probe_device+0x6c/0x180
[ 124.798041][ C2] ? usb_driver_release_interface+0xc0/0xc0
[ 124.803416][ C2] really_probe+0x138/0x4c0
[ 124.808329][ C2] __driver_probe_device+0x191/0x220
[ 124.813373][ C2] driver_probe_device+0x2a/0x120
[ 124.818345][ C2] __device_attach_driver+0x105/0x1a0
[ 124.823436][ C2] ? driver_allows_async_probing+0x90/0x90
[ 124.828730][ C2] bus_for_each_drv+0xba/0x100
[ 124.834151][ C2] __device_attach+0x130/0x290
[ 124.840066][ C2] bus_probe_device+0xdb/0xf0
[ 124.845475][ C2] device_add+0x635/0xdf0
[ 124.850835][ C2] usb_new_device.cold+0x110/0x5e9
[ 124.856251][ C2] hub_event+0x177c/0x26b0
[ 124.861609][ C2] process_one_work+0x3d4/0x9c0
[ 124.866730][ C2] worker_thread+0x5b/0x5d0
[ 124.874165][ C2] ? process_one_work+0x9c0/0x9c0
[ 124.879966][ C2] kthread+0x135/0x170
[ 124.884870][ C2] ? kthread_complete_and_exit+0x30/0x30
[ 124.890628][ C2] ret_from_fork+0x1f/0x30
[ 124.896103][ C2] </TASK>

[ 143.975320][ C2] Showing all locks held in the system:
[ 143.985599][ C2] 5 locks held by kworker/2:2/980:
[ 143.991149][ C2] #0: ffff88800d615938 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x304/0x9c0
[ 144.002742][ C2] #1: ffffc90003ed3e70 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x304/0x9c0
[ 144.015128][ C2] #2: ffff88800fe66a20 (&dev->mutex){....}-{3:3}, at: hub_event+0xc4/0x26b0
[ 144.022646][ C2] #3: ffff88810e445a20 (&dev->mutex){....}-{3:3}, at: __device_attach+0x41/0x290
[ 144.032770][ C2] #4: ffff88810e4449a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x41/0x290
[ 144.045877][ C2] 1 lock held by in:imklog/3844:
[ 144.052399][ C2] #0: ffff888106694ee8 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0x8d/0xa0
[ 144.060131][ C2] 1 lock held by a.out/4425:
[ 144.066811][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.074587][ C2] 3 locks held by systemd-udevd/4439:
[ 144.081494][ C2] #0: ffff8880140bc688 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_read_iter+0x9d/0x2e0
[ 144.095133][ C2] #1: ffff88810ba50938 (kn->active#60){++++}-{0:0}, at: kernfs_fop_read_iter+0xa5/0x2e0
[ 144.109502][ C2] #2: ffff88810e445a20 (&dev->mutex){....}-{3:3}, at: read_descriptors+0x3c/0x1e0
[ 144.123733][ C2] 2 locks held by a.out/4455:
[ 144.131152][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.139727][ C2] #1: ffffffff8652d068 (system_transition_mutex){+.+.}-{3:3}, at: snapshot_open+0x40/0x160
[ 144.154965][ C2] 1 lock held by a.out/4456:
[ 144.162582][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.171374][ C2] 1 lock held by a.out/4457:
[ 144.179220][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.188231][ C2] 1 lock held by a.out/4458:
[ 144.196069][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0
[ 144.204946][ C2] 1 lock held by a.out/4459:
[ 144.214175][ C2] #0: ffffffff86863848 (misc_mtx){+.+.}-{3:3}, at: misc_open+0x25/0x1e0

[ 144.246256][ C2] Showing busy workqueues and worker pools:
[ 144.254448][ C2] workqueue usb_hub_wq: flags=0x4
[ 144.262458][ C2] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[ 144.262554][ C2] in-flight: 980:hub_event hub_event
[ 144.262743][ C2] pool 4: cpus=2 node=0 flags=0x0 nice=0 hung=29s workers=4 idle: 28 3084 53
[ 144.290063][ T4455] Calling probe_count=2
----------------------------------------

2022-07-06 06:42:00

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
> On 2022/07/05 23:35, Tetsuo Handa wrote:
> > On 2022/07/05 23:16, Greg KH wrote:
> >> Some device is being probed at the moment, maybe we have a deadlock
> >> somewhere here...
> >
> > Lockdep says __device_attach() from hub_event() was in progress.
> >
> > ----------------------------------------
> > [ 237.376478][ T28] 5 locks held by kworker/1:1/26:
> > [ 237.381526][ T28] #0: ffff888016b92538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610
> > [ 237.392798][ T28] #1: ffffc90000c2fda8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610
> > [ 237.406354][ T28] #2: ffff88801f7ee220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4680
> > [ 237.415920][ T28] #3: ffff88801b6c6220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> > [ 237.426682][ T28] #4: ffff8880216bc1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> > ----------------------------------------
> >
>
> The number returned by atomic_read(&probe_count) matched the number of works for hub_event()
> in usb_hub_wq workqueue. The probe function is called from hub_event(), and
> usb_stor_msg_common() calls wait_for_completion_interruptible_timeout(MAX_SCHEDULE_TIMEOUT)
> via driver's init function.
>
> But if the usb device is unresponsive, wait_for_completion_interruptible_timeout() sleeps
> forever. And in this testcase (which emulates usb devices using /dev/raw-gadget interface),
> the usb device became unresponsive because the process who is responsible with reading/writing
> /dev/raw-gadget interface is blocked at mutex_lock(&misc_mtx) at misc_open(), and results in
> an AB-BA deadlock condition. Making misc_open() killable solved this problem, by allowing
> the opener of /dev/raw-gadget interface to call fput() upon "send SIGKILL after 5 seconds from
> fork()" behavior.
>
> Anyway,
>
> /*
> * Resuming. We may need to wait for the image device to
> * appear.
> */
> wait_for_device_probe();
>
> in snapshot_open() will sleep forever if some device became unresponsive.
>
> How should we fix this problem?

We can decrease the timeout in usb_stor_msg_common(). I imagine that if
that timeout is ever hit in this sequence, then all will recover, right?
Try decreasing it to a sane number and see what happens.

thanks,

greg k-h

2022-07-06 11:22:56

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/06 15:34, Greg KH wrote:
> On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
>> How should we fix this problem?
>
> We can decrease the timeout in usb_stor_msg_common(). I imagine that if
> that timeout is ever hit in this sequence, then all will recover, right?
> Try decreasing it to a sane number and see what happens.

Yes, all recovers with below diff.

------------------------------------------------------------
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1928b3918242..d2a192306e0c 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)

/* wait for the completion of the URB */
timeleft = wait_for_completion_interruptible_timeout(
- &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
+ &urb_done, timeout ? : 5 * HZ);

clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);

------------------------------------------------------------

But

>> Anyway,
>>
>> /*
>> * Resuming. We may need to wait for the image device to
>> * appear.
>> */
>> wait_for_device_probe();
>>
>> in snapshot_open() will sleep forever if some device became unresponsive.
>>

wait_for_device_probe() in snapshot_open() was added by commit c751085943362143
("PM/Hibernate: Wait for SCSI devices scan to complete during resume"), and
that commit did not take into account possibility of unresponsive hardware.

"In addition, if the resume from hibernation is userland-driven, it's
better to wait for all device probes in the kernel to complete before
attempting to open the resume device."

It is trivial to make e.g. atomic_read(&probe_count) == 10, which means that
acceptable timeout for usb_stor_msg_common() may be no longer acceptable timeout
for wait_for_device_probe(). Unlike flush_workqueue(), wait_for_device_probe()
can wait forever if new probe requests keep coming in while waiting for existing
probe requests to complete. Therefore, I think we should introduce timeout on
wait_for_device_probe() side as well.

I would like to propose below changes in 3 patches as fixes for this problem.
Since there are 13 wait_for_device_probe() callers, maybe we want both killable
and uninterruptible versions and pass timeout as an argument...

------------------------------------------------------------
drivers/base/dd.c | 3 ++-
drivers/char/misc.c | 9 ++++++---
drivers/usb/storage/transport.c | 2 +-
3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 3fc3b5940bb3..67e08b381ee2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -723,7 +723,8 @@ void wait_for_device_probe(void)
flush_work(&deferred_probe_work);

/* wait for the known devices to complete their probing */
- wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
+ wait_event_killable_timeout(probe_waitqueue,
+ atomic_read(&probe_count) == 0, 60 * HZ);
async_synchronize_full();
}
EXPORT_SYMBOL_GPL(wait_for_device_probe);
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index ca5141ed5ef3..6430c534a1cb 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -104,7 +104,8 @@ static int misc_open(struct inode *inode, struct file *file)
int err = -ENODEV;
const struct file_operations *new_fops = NULL;

- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;

list_for_each_entry(c, &misc_list, list) {
if (c->minor == minor) {
@@ -116,7 +117,8 @@ static int misc_open(struct inode *inode, struct file *file)
if (!new_fops) {
mutex_unlock(&misc_mtx);
request_module("char-major-%d-%d", MISC_MAJOR, minor);
- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;

list_for_each_entry(c, &misc_list, list) {
if (c->minor == minor) {
@@ -178,7 +180,8 @@ int misc_register(struct miscdevice *misc)

INIT_LIST_HEAD(&misc->list);

- mutex_lock(&misc_mtx);
+ if (mutex_lock_killable(&misc_mtx))
+ return -EINTR;

if (is_dynamic) {
int i = find_first_zero_bit(misc_minors, DYNAMIC_MINORS);
diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index 1928b3918242..d2a192306e0c 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)

/* wait for the completion of the URB */
timeleft = wait_for_completion_interruptible_timeout(
- &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
+ &urb_done, timeout ? : 60 * HZ);

clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);

------------------------------------------------------------

2022-07-06 11:30:48

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable



On 06.07.22 12:26, Tetsuo Handa wrote:
> On 2022/07/06 15:34, Greg KH wrote:
>> On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
>>> How should we fix this problem?
>>
>> We can decrease the timeout in usb_stor_msg_common(). I imagine that if
>> that timeout is ever hit in this sequence, then all will recover, right?

Not really. The timeout there is supposed to come from the SCSI layer
in the general case.

>> Try decreasing it to a sane number and see what happens.
>
> Yes, all recovers with below diff.
>
> ------------------------------------------------------------
> diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
> index 1928b3918242..d2a192306e0c 100644
> --- a/drivers/usb/storage/transport.c
> +++ b/drivers/usb/storage/transport.c
> @@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
>
> /* wait for the completion of the URB */
> timeleft = wait_for_completion_interruptible_timeout(
> - &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
> + &urb_done, timeout ? : 5 * HZ);
>
> clear_bit(US_FLIDX_URB_ACTIVE, &us->dflags);
>
> ------------------------------------------------------------
>
> But
>
>>> Anyway,
>>>
>>> /*
>>> * Resuming. We may need to wait for the image device to
>>> * appear.
>>> */
>>> wait_for_device_probe();
>>>
>>> in snapshot_open() will sleep forever if some device became unresponsive.
>>>
>
> wait_for_device_probe() in snapshot_open() was added by commit c751085943362143
> ("PM/Hibernate: Wait for SCSI devices scan to complete during resume"), and
> that commit did not take into account possibility of unresponsive hardware.
>
> "In addition, if the resume from hibernation is userland-driven, it's
> better to wait for all device probes in the kernel to complete before
> attempting to open the resume device."
>
> It is trivial to make e.g. atomic_read(&probe_count) == 10, which means that
> acceptable timeout for usb_stor_msg_common() may be no longer acceptable timeout
> for wait_for_device_probe(). Unlike flush_workqueue(), wait_for_device_probe()
> can wait forever if new probe requests keep coming in while waiting for existing
> probe requests to complete. Therefore, I think we should introduce timeout on
> wait_for_device_probe() side as well.
>
> I would like to propose below changes in 3 patches as fixes for this problem.
> Since there are 13 wait_for_device_probe() callers, maybe we want both killable
> and uninterruptible versions and pass timeout as an argument...
>
> ------------------------------------------------------------
> drivers/base/dd.c | 3 ++-
> drivers/char/misc.c | 9 ++++++---
> drivers/usb/storage/transport.c | 2 +-
> 3 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 3fc3b5940bb3..67e08b381ee2 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -723,7 +723,8 @@ void wait_for_device_probe(void)
> flush_work(&deferred_probe_work);
>
> /* wait for the known devices to complete their probing */
> - wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
> + wait_event_killable_timeout(probe_waitqueue,
> + atomic_read(&probe_count) == 0, 60 * HZ);

You cannot just let a timeout go unreported.

> async_synchronize_full();
> }
> EXPORT_SYMBOL_GPL(wait_for_device_probe);
> diff --git a/drivers/char/misc.c b/drivers/char/misc.c
> index ca5141ed5ef3..6430c534a1cb 100644
> --- a/drivers/char/misc.c
> +++ b/drivers/char/misc.c
> @@ -104,7 +104,8 @@ static int misc_open(struct inode *inode, struct file *file)
> int err = -ENODEV;
> const struct file_operations *new_fops = NULL;
>
> - mutex_lock(&misc_mtx);
> + if (mutex_lock_killable(&misc_mtx))
> + return -EINTR;
>
> list_for_each_entry(c, &misc_list, list) {
> if (c->minor == minor) {
> @@ -116,7 +117,8 @@ static int misc_open(struct inode *inode, struct file *file)
> if (!new_fops) {
> mutex_unlock(&misc_mtx);
> request_module("char-major-%d-%d", MISC_MAJOR, minor);
> - mutex_lock(&misc_mtx);
> + if (mutex_lock_killable(&misc_mtx))
> + return -EINTR;
>
> list_for_each_entry(c, &misc_list, list) {
> if (c->minor == minor) {
> @@ -178,7 +180,8 @@ int misc_register(struct miscdevice *misc)
>
> INIT_LIST_HEAD(&misc->list);
>
> - mutex_lock(&misc_mtx);
> + if (mutex_lock_killable(&misc_mtx))
> + return -EINTR;

This usually runs in the context of a kernel thread, does it not?
What could kill that?

>
> if (is_dynamic) {
> int i = find_first_zero_bit(misc_minors, DYNAMIC_MINORS);
> diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
> index 1928b3918242..d2a192306e0c 100644
> --- a/drivers/usb/storage/transport.c
> +++ b/drivers/usb/storage/transport.c
> @@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
>
> /* wait for the completion of the URB */
> timeleft = wait_for_completion_interruptible_timeout(
> - &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
> + &urb_done, timeout ? : 60 * HZ);

No, I am sorry, but there are SCSI commands that can run for more than
60 seconds. Formats, rewinding tapes, ...It seems to me you need to look
at the SCSI scanning code.

Regards
Oliver

2022-07-06 12:35:34

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable



On 06.07.22 12:26, Tetsuo Handa wrote:

> wait_for_device_probe() in snapshot_open() was added by commit c751085943362143
> ("PM/Hibernate: Wait for SCSI devices scan to complete during resume"), and
> that commit did not take into account possibility of unresponsive hardware.
>
> "In addition, if the resume from hibernation is userland-driven, it's
> better to wait for all device probes in the kernel to complete before
> attempting to open the resume device."
>
>

Testsuo-san,

I am afraid my first reply was too court to be useful. Sorry for that.
First let me congratulate you for finding and analyzing an important
issue.
Yet, I am afraid while your analysis is good, your attempt at a fix
suffers from being too close to the analysis, instead of taking a step
back and looking at root causes.
Frankly I was afraid you'd look at UAS next and try to fix it in the
same way. And that is the core of the issue. IF the SCSI layer can be
made to hang a host controller by an unresponsive device, the issue
is in the SCSI layer. If you were to insist on your current approach
you'd have to go through every host controller driver. You are just
seeing this only with storage because you are fuzzing USB, not SCSI.
But the bug you found is more fundamental than a single bus system.

The SCSI layer is just designed in such a way that timeouts are handled
by the core. That is a fundamental design decision you cannot easily
deviate from. Hence I would like to ask you to take a closer look
at the scanning code in the SCSI layer, not a host controller driver.

Regards
Oliver

2022-07-07 05:34:13

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/06 20:04, Oliver Neukum wrote:
>
>
> On 06.07.22 12:26, Tetsuo Handa wrote:
>> On 2022/07/06 15:34, Greg KH wrote:
>>> On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
>>>> How should we fix this problem?
>>>
>>> We can decrease the timeout in usb_stor_msg_common(). I imagine that if
>>> that timeout is ever hit in this sequence, then all will recover, right?
>
> Not really. The timeout there is supposed to come from the SCSI layer
> in the general case.

I couldn't catch. usb_stor_msg_common() belongs to USB subsystem, doesn't it?

How does SCSI layer matter here?
Does USB storage device shows up as if SCSI disk (e.g. /dev/sdb) relevant?

>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>> index 3fc3b5940bb3..67e08b381ee2 100644
>> --- a/drivers/base/dd.c
>> +++ b/drivers/base/dd.c
>> @@ -723,7 +723,8 @@ void wait_for_device_probe(void)
>> flush_work(&deferred_probe_work);
>>
>> /* wait for the known devices to complete their probing */
>> - wait_event(probe_waitqueue, atomic_read(&probe_count) == 0);
>> + wait_event_killable_timeout(probe_waitqueue,
>> + atomic_read(&probe_count) == 0, 60 * HZ);
>
> You cannot just let a timeout go unreported.

I think regarding snapshot_open() case, being best-effort is acceptable, for
the reason of timeout might be new probe requests kept coming in, even after
the device which snapshot_open() needed to wait already became ready.

>
>> async_synchronize_full();
>> }
>> EXPORT_SYMBOL_GPL(wait_for_device_probe);
>> diff --git a/drivers/char/misc.c b/drivers/char/misc.c
>> index ca5141ed5ef3..6430c534a1cb 100644
>> --- a/drivers/char/misc.c
>> +++ b/drivers/char/misc.c
>> @@ -104,7 +104,8 @@ static int misc_open(struct inode *inode, struct file *file)
>> int err = -ENODEV;
>> const struct file_operations *new_fops = NULL;
>>
>> - mutex_lock(&misc_mtx);
>> + if (mutex_lock_killable(&misc_mtx))
>> + return -EINTR;
>>
>> list_for_each_entry(c, &misc_list, list) {
>> if (c->minor == minor) {
>> @@ -116,7 +117,8 @@ static int misc_open(struct inode *inode, struct file *file)
>> if (!new_fops) {
>> mutex_unlock(&misc_mtx);
>> request_module("char-major-%d-%d", MISC_MAJOR, minor);
>> - mutex_lock(&misc_mtx);
>> + if (mutex_lock_killable(&misc_mtx))
>> + return -EINTR;
>>
>> list_for_each_entry(c, &misc_list, list) {
>> if (c->minor == minor) {
>> @@ -178,7 +180,8 @@ int misc_register(struct miscdevice *misc)
>>
>> INIT_LIST_HEAD(&misc->list);
>>
>> - mutex_lock(&misc_mtx);
>> + if (mutex_lock_killable(&misc_mtx))
>> + return -EINTR;
>
> This usually runs in the context of a kernel thread, does it not?
> What could kill that?

I think that misc_register() is usually called from module's __init function
which runs in the user context. Though, it would be much better if we can avoid
sleeping operations with misc_mtx held.

>
>>
>> if (is_dynamic) {
>> int i = find_first_zero_bit(misc_minors, DYNAMIC_MINORS);
>> diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
>> index 1928b3918242..d2a192306e0c 100644
>> --- a/drivers/usb/storage/transport.c
>> +++ b/drivers/usb/storage/transport.c
>> @@ -164,7 +164,7 @@ static int usb_stor_msg_common(struct us_data *us, int timeout)
>>
>> /* wait for the completion of the URB */
>> timeleft = wait_for_completion_interruptible_timeout(
>> - &urb_done, timeout ? : MAX_SCHEDULE_TIMEOUT);
>> + &urb_done, timeout ? : 60 * HZ);
>
> No, I am sorry, but there are SCSI commands that can run for more than
> 60 seconds. Formats, rewinding tapes, ...It seems to me you need to look
> at the SCSI scanning code.

I wonder how SCSI layer matters here.

usb_stor_ctrl_transfer() is not using timeout for unknown reason

/*
* Transfer one control message, without timeouts, but allowing early
* termination. Return codes are USB_STOR_XFER_xxx.
*/

but I think we should supply timeout in order to handle unresponsive hardware.
Although usb_stor_ctrl_transfer() is called from kernel threads (which usually
do not receive signals), use of wait_for_completion_interruptible_timeout() in
usb_stor_msg_common() suggests that aborting upon timeout is also tolerable.





Let's summarize current location:

(1) Greg wants me to fix snapshot_open() not to sleep for too long, instead of
making misc_open() killable.

(2) I found snapshot_open() calls wait_for_device_probe() which might sleep
long enough to consider as hung up due to:

(a) One of existing probe request got stuck due to unresponsive hardware.

(b) New probe requests come in before existing probe requests complete.

(3) Because of (2), it is difficult to guarantee snapshot_open() not to sleep for
too long.

(4) Because of (3), calling file->f_op->open(inode, file) with misc_mtx held can
block mutex_lock(&misc_mtx) too long. This is the phenomenon syzbot is reporting.

Initial mitigation was to replace mutex_lock(&misc_mtx) with mutex_lock_killable(&misc_mtx)
so that /dev/raw-gadget users can terminate upon SIGKILL and khungtaskd will not complain
about misc_mtx.

Next mitigation was not to call file->f_op->open() with misc_mtx held.
Wedson worried that this approach breaks modules which call misc_deregister(), but
I think we can use this approach for modules which do not need to call misc_deregister()
given that this approach is opt-in basis.

I also think that we can bring wait_for_device_probe() in snapshot_open() to before
lock_system_sleep(), for system_transition_mutex will not be required for waiting for
the image device to appear. If we can accept the "not to call file->f_op->open() with
misc_mtx held" mitigation, wait_for_device_probe() in snapshot_open() can be called
without locks.

Finding universally safe timeout value is beyond what I can do for this report.
Regarding this report, I think we can lower the risk of regression if we apply
timeout for atomic_read(&probe_count) == 0 from only snapshot_open().
Can we make below changes?

------------------------------------------------------------

drivers/base/dd.c | 14 ++++++++++++++
drivers/char/misc.c | 4 ++++
include/linux/device/driver.h | 1 +
include/linux/miscdevice.h | 1 +
kernel/power/user.c | 31 ++++++++++++++++++-------------
5 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 3fc3b5940bb3..60fb22a50a4e 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -728,6 +728,20 @@ void wait_for_device_probe(void)
}
EXPORT_SYMBOL_GPL(wait_for_device_probe);

+void wait_for_device_probe_killable_timeout(unsigned long timeout)
+{
+ /* wait for probe timeout */
+ wait_event(probe_timeout_waitqueue, !driver_deferred_probe_timeout);
+
+ /* wait for the deferred probe workqueue to finish */
+ flush_work(&deferred_probe_work);
+
+ /* wait for the known devices to complete their probing */
+ wait_event_killable_timeout(probe_waitqueue,
+ atomic_read(&probe_count) == 0, timeout);
+ async_synchronize_full();
+}
+
static int __driver_probe_device(struct device_driver *drv, struct device *dev)
{
int ret = 0;
diff --git a/drivers/char/misc.c b/drivers/char/misc.c
index ca5141ed5ef3..16601c78fecb 100644
--- a/drivers/char/misc.c
+++ b/drivers/char/misc.c
@@ -137,6 +137,10 @@ static int misc_open(struct inode *inode, struct file *file)

err = 0;
replace_fops(file, new_fops);
+ if (c->unlocked_open && file->f_op->open) {
+ mutex_unlock(&misc_mtx);
+ return file->f_op->open(inode, file);
+ }
if (file->f_op->open)
err = file->f_op->open(inode, file);
fail:
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 15e7c5e15d62..324b2866d0f2 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -129,6 +129,7 @@ extern struct device_driver *driver_find(const char *name,
struct bus_type *bus);
extern int driver_probe_done(void);
extern void wait_for_device_probe(void);
+extern void wait_for_device_probe_killable_timeout(unsigned long timeout);

/* sysfs interface for exporting driver attributes */

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 0676f18093f9..e112ef9e3b7b 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -86,6 +86,7 @@ struct miscdevice {
const struct attribute_group **groups;
const char *nodename;
umode_t mode;
+ bool unlocked_open;
};

extern int misc_register(struct miscdevice *misc);
diff --git a/kernel/power/user.c b/kernel/power/user.c
index ad241b4ff64c..1960d05ee439 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -51,35 +51,39 @@ static int snapshot_open(struct inode *inode, struct file *filp)
if (!hibernation_available())
return -EPERM;

+ switch (filp->f_flags & O_ACCMODE) {
+ case O_RDWR: /* Can't do both at the same time. */
+ return -ENOSYS;
+ case O_RDONLY: /* Hibernating */
+ /* The image device should be already ready. */
+ break;
+ default: /* Resuming */
+ /*
+ * Since the image device might not be ready, try to wait up to
+ * 5 minutes. We should not wait forever, for we might get stuck
+ * due to unresponsive devices and/or new probe events which
+ * are irrelevant to the image device keep coming in.
+ */
+ wait_for_device_probe_killable_timeout(300 * HZ);
+ break;
+ }
+
lock_system_sleep();

if (!hibernate_acquire()) {
error = -EBUSY;
goto Unlock;
}
-
- if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
- hibernate_release();
- error = -ENOSYS;
- goto Unlock;
- }
nonseekable_open(inode, filp);
data = &snapshot_state;
filp->private_data = data;
memset(&data->handle, 0, sizeof(struct snapshot_handle));
if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
- /* Hibernating. The image device should be accessible. */
data->swap = swap_type_of(swsusp_resume_device, 0);
data->mode = O_RDONLY;
data->free_bitmaps = false;
error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
} else {
- /*
- * Resuming. We may need to wait for the image device to
- * appear.
- */
- wait_for_device_probe();
-
data->swap = -1;
data->mode = O_WRONLY;
error = pm_notifier_call_chain_robust(PM_RESTORE_PREPARE, PM_POST_RESTORE);
@@ -441,6 +445,7 @@ static struct miscdevice snapshot_device = {
.minor = SNAPSHOT_MINOR,
.name = "snapshot",
.fops = &snapshot_fops,
+ .unlocked_open = true, /* Call snapshot_open() with no locks held. */
};

static int __init snapshot_device_init(void)
------------------------------------------------------------

2022-07-07 08:07:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Thu, Jul 07, 2022 at 02:06:38PM +0900, Tetsuo Handa wrote:
> On 2022/07/06 20:04, Oliver Neukum wrote:
> >
> >
> > On 06.07.22 12:26, Tetsuo Handa wrote:
> >> On 2022/07/06 15:34, Greg KH wrote:
> >>> On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
> >>>> How should we fix this problem?
> >>>
> >>> We can decrease the timeout in usb_stor_msg_common(). I imagine that if
> >>> that timeout is ever hit in this sequence, then all will recover, right?
> >
> > Not really. The timeout there is supposed to come from the SCSI layer
> > in the general case.
>
> I couldn't catch. usb_stor_msg_common() belongs to USB subsystem, doesn't it?
>
> How does SCSI layer matter here?
> Does USB storage device shows up as if SCSI disk (e.g. /dev/sdb) relevant?

Yes. USB storage is a very tiny wrapper around the whole SCSI protocol,
it is a SCSI disk for all all intensive purposes.

thanks,

greg k-h

2022-07-08 06:35:35

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Tue, 5 Jul 2022 at 12:10, Greg Kroah-Hartman
<[email protected]> wrote:
>
> On Tue, Jul 05, 2022 at 09:20:24AM +0200, Dmitry Vyukov wrote:
> > On Tue, 5 Jul 2022 at 07:54, Tetsuo Handa
> > <[email protected]> wrote:
> > > On Tue, Jul 05, 2022 at 02:21:17PM +0900, Tetsuo Handa wrote:
> > > > On 2022/07/04 23:31, Greg KH wrote:
> > > > > I don't understand what you are trying to "fix" here. What is userspace
> > > > > doing (as a normal user) that is causing a problem, and what problem is
> > > > > it causing and for what device/hardware/driver is this a problem?
> > > >
> > > > Currently the root cause is unknown.
> > > > This might be another example of deadlock hidden by device_initialize().
> > > >
> > > > We can see from https://syzkaller.appspot.com/text?tag=CrashReport&x=11feb7e0080000 that
> > > > when khungtaskd reports that a process is blocked waiting for misc_mtx at misc_open(),
> > > > there is a process which is holding system_transition_mutex from snapshot_open().
> > >
> > > /dev/snapshot is not read/writable by anyone but root for obvious
> > > reasons.
> > >
> > > And perhaps it's something that syzbot shouldn't be fuzzing unless it
> > > wants to take the system down easily :)
> >
> > We could turn CONFIG_HIBERNATION_SNAPSHOT_DEV off for syzbot, but it
> > will also mean this part of the kernel won't be tested at all.
> > I see it has 14 ioclt's (below) and not all of them look problematic
> > (like POWER_OFF).
> > Perhaps the kernel could restrict access only to reboot/restore
> > functionality? This way we could at least test everything related to
> > snapshot creation.
>
> This is already restricted to root, why would you want to restrict it
> anymore?

Root like the wrong criteria here. Root protection is for global
machine state and in some cases closing unreliable code. It's not
about if this code should be randomly tested or not. In fact,
unreliable code (bpf, filesystems) is exactly the code that needs to
be tested as much as possible. But it is restricted with root as well.

Though, I noticed syzkaller already avoids SNAPSHOT_FREEZE and
SNAPSHOT_POWER_OFF:
https://github.com/google/syzkaller/blob/bff65f44b47bd73f56c3d6a5c3899de5f5775136/sys/linux/init.go#L310-L315
This should work fine (unless the IOCTL const values don't collide
with any other IOCTL const values, otherwise these duplicate values
won't be tested as well).



> > #define SNAPSHOT_FREEZE _IO(SNAPSHOT_IOC_MAGIC, 1)
> > #define SNAPSHOT_UNFREEZE _IO(SNAPSHOT_IOC_MAGIC, 2)
> > #define SNAPSHOT_ATOMIC_RESTORE _IO(SNAPSHOT_IOC_MAGIC, 4)
> > #define SNAPSHOT_FREE _IO(SNAPSHOT_IOC_MAGIC, 5)
> > #define SNAPSHOT_FREE_SWAP_PAGES _IO(SNAPSHOT_IOC_MAGIC, 9)
> > #define SNAPSHOT_S2RAM _IO(SNAPSHOT_IOC_MAGIC, 11)
> > #define SNAPSHOT_SET_SWAP_AREA _IOW(SNAPSHOT_IOC_MAGIC, 13, struct
> > resume_swap_area)
> > #define SNAPSHOT_GET_IMAGE_SIZE _IOR(SNAPSHOT_IOC_MAGIC, 14, __kernel_loff_t)
> > #define SNAPSHOT_PLATFORM_SUPPORT _IO(SNAPSHOT_IOC_MAGIC, 15)
> > #define SNAPSHOT_POWER_OFF _IO(SNAPSHOT_IOC_MAGIC, 16)
> > #define SNAPSHOT_CREATE_IMAGE _IOW(SNAPSHOT_IOC_MAGIC, 17, int)
> > #define SNAPSHOT_PREF_IMAGE_SIZE _IO(SNAPSHOT_IOC_MAGIC, 18)
> > #define SNAPSHOT_AVAIL_SWAP_SIZE _IOR(SNAPSHOT_IOC_MAGIC, 19, __kernel_loff_t)
> > #define SNAPSHOT_ALLOC_SWAP_PAGE _IOR(SNAPSHOT_IOC_MAGIC, 20, __kernel_loff_t)
>
> Fuzzing this is always nice, but be very aware of the system state
> changes that you are creating. Also know when you make these state
> changes, the rest of the system's functionality also changes.
>
> thanks,
>
> greg k-h

2022-07-08 13:50:07

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On Thu, Jul 07, 2022 at 02:06:38PM +0900, Tetsuo Handa wrote:
> Let's summarize current location:
>
> (1) Greg wants me to fix snapshot_open() not to sleep for too long, instead of
> making misc_open() killable.
>
> (2) I found snapshot_open() calls wait_for_device_probe() which might sleep
> long enough to consider as hung up due to:
>
> (a) One of existing probe request got stuck due to unresponsive hardware.
>
> (b) New probe requests come in before existing probe requests complete.
>
> (3) Because of (2), it is difficult to guarantee snapshot_open() not to sleep for
> too long.
>
> (4) Because of (3), calling file->f_op->open(inode, file) with misc_mtx held can
> block mutex_lock(&misc_mtx) too long. This is the phenomenon syzbot is reporting.
>
> Initial mitigation was to replace mutex_lock(&misc_mtx) with mutex_lock_killable(&misc_mtx)
> so that /dev/raw-gadget users can terminate upon SIGKILL and khungtaskd will not complain
> about misc_mtx.
>
> Next mitigation was not to call file->f_op->open() with misc_mtx held.
> Wedson worried that this approach breaks modules which call misc_deregister(), but
> I think we can use this approach for modules which do not need to call misc_deregister()
> given that this approach is opt-in basis.
>
> I also think that we can bring wait_for_device_probe() in snapshot_open() to before
> lock_system_sleep(), for system_transition_mutex will not be required for waiting for
> the image device to appear. If we can accept the "not to call file->f_op->open() with
> misc_mtx held" mitigation, wait_for_device_probe() in snapshot_open() can be called
> without locks.
>
> Finding universally safe timeout value is beyond what I can do for this report.
> Regarding this report, I think we can lower the risk of regression if we apply
> timeout for atomic_read(&probe_count) == 0 from only snapshot_open().
> Can we make below changes?
>
> ------------------------------------------------------------
>
> drivers/base/dd.c | 14 ++++++++++++++
> drivers/char/misc.c | 4 ++++
> include/linux/device/driver.h | 1 +
> include/linux/miscdevice.h | 1 +
> kernel/power/user.c | 31 ++++++++++++++++++-------------
> 5 files changed, 38 insertions(+), 13 deletions(-)

Can you make this a patch series, it's hard to tease out what the
different things are attempting to do here :(

thanks,

greg k-h

2022-07-10 03:15:02

by Tetsuo Handa

[permalink] [raw]
Subject: Re: [PATCH] char: misc: make misc_open() and misc_register() killable

On 2022/07/08 22:37, Greg KH wrote:
> Can you make this a patch series, it's hard to tease out what the
> different things are attempting to do here :(

I see. Posted v2 at https://lkml.kernel.org/r/[email protected] .