2019-08-12 12:19:36

by syzbot

Subject: INFO: task hung in wdm_flush

Hello,

syzbot found the following crash on:

HEAD commit: e96407b4 usb-fuzzer: main usb gadget fuzzer driver
git tree: https://github.com/google/kasan.git usb-fuzzer
console output: https://syzkaller.appspot.com/x/log.txt?x=1046c6ee600000
kernel config: https://syzkaller.appspot.com/x/.config?x=cfa2c18fb6a8068e
dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
compiler: gcc (GCC) 9.0.0 20181231 (experimental)
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=1299132c600000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=176e6d8c600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

INFO: task syz-executor121:1726 blocked for more than 143 seconds.
Not tainted 5.3.0-rc2+ #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor121 D28520 1726 1724 0x80004006
Call Trace:
schedule+0x9a/0x250 kernel/sched/core.c:3944
wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
filp_close+0xb4/0x160 fs/open.c:1166
close_files fs/file.c:388 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1d8/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0x8bc/0x2c50 kernel/exit.c:873
do_group_exit+0x125/0x340 kernel/exit.c:982
get_signal+0x466/0x23d0 kernel/signal.c:2728
do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x401520
Code: 6e 65 54 61 62 6c 65 00 67 65 74 63 6f 6e 00 5f 69 6e 69 74 00 69 73
5f 73 65 6c 69 6e 75 78 5f 65 6e 61 62 6c 65 64 00 73 65 <63> 75 72 69 74
79 5f 67 65 74 65 6e 66 6f 72 63 65 00 67 65 74 5f
RSP: 002b:00007ffd59c75df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000401520
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd59c75e10
RBP: 00000000006cc018 R08: 0000000000000000 R09: 000000000000000f
R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000402540
R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000
INFO: task syz-executor121:1731 blocked for more than 143 seconds.
Not tainted 5.3.0-rc2+ #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor121 D28520 1731 1730 0x80004006
Call Trace:
schedule+0x9a/0x250 kernel/sched/core.c:3944
wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
filp_close+0xb4/0x160 fs/open.c:1166
close_files fs/file.c:388 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1d8/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0x8bc/0x2c50 kernel/exit.c:873
do_group_exit+0x125/0x340 kernel/exit.c:982
get_signal+0x466/0x23d0 kernel/signal.c:2728
do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4417e9
Code: 65 64 2e 0a 44 69 64 20 79 6f 75 20 64 6f 20 61 20 22 6d 61 6b 65 20
69 6e 73 74 61 6c 6c 22 3f 0a 53 75 67 67 65 73 74 65 64 <20> 61 63 74 69
6f 6e 3a 20 72 75 6e 20 72 73 79 73 6c 6f 67 64 20
RSP: 002b:00007ffd59c75ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00000000004417e9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006cc018 R08: 000000000000000f R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402540
R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000
INFO: task syz-executor121:1732 blocked for more than 143 seconds.
Not tainted 5.3.0-rc2+ #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor121 D28520 1732 1728 0x80004006
Call Trace:
schedule+0x9a/0x250 kernel/sched/core.c:3944
wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
filp_close+0xb4/0x160 fs/open.c:1166
close_files fs/file.c:388 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1d8/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0x8bc/0x2c50 kernel/exit.c:873
do_group_exit+0x125/0x340 kernel/exit.c:982
get_signal+0x466/0x23d0 kernel/signal.c:2728
do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x401520
Code: 00 00 3d 02 00 00 46 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 10
01 00 00 2f 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 20 01 00 00 00 00
RSP: 002b:00007ffd59c75df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000401520
RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd59c75e10
RBP: 00000000006cc018 R08: 0000000000000000 R09: 000000000000000f
R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000402540
R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000
INFO: task syz-executor121:1733 blocked for more than 144 seconds.
Not tainted 5.3.0-rc2+ #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor121 D28376 1733 1725 0x80000002
Call Trace:
schedule+0x9a/0x250 kernel/sched/core.c:3944
wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
filp_close+0xb4/0x160 fs/open.c:1166
close_files fs/file.c:388 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1d8/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0x8bc/0x2c50 kernel/exit.c:873
do_group_exit+0x125/0x340 kernel/exit.c:982
__do_sys_exit_group kernel/exit.c:993 [inline]
__se_sys_exit_group kernel/exit.c:991 [inline]
__x64_sys_exit_group+0x3a/0x50 kernel/exit.c:991
do_syscall_64+0xb7/0x580 arch/x86/entry/common.c:296
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x440438
Code: 61 74 68 3e 5d 0a 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 5b
2d 75 3c 6e 75 6d 62 65 72 3e 5d 0a 54 6f 20 72 75 6e 20 <72> 73 79 73 6c
6f 67 64 20 69 6e 20 6e 61 74 69 76 65 20 6d 6f 64
RSP: 002b:00007ffd59c75e68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440438
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00000000004bff70 R08: 00000000000000e7 R09: ffffffffffffffd0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00000000006d2180 R14: 0000000000000000 R15: 0000000000000000
INFO: task syz-executor121:1734 blocked for more than 144 seconds.
Not tainted 5.3.0-rc2+ #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor121 D28248 1734 1729 0x80004006
Call Trace:
schedule+0x9a/0x250 kernel/sched/core.c:3944
wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
filp_close+0xb4/0x160 fs/open.c:1166
close_files fs/file.c:388 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1d8/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0x8bc/0x2c50 kernel/exit.c:873
do_group_exit+0x125/0x340 kernel/exit.c:982
get_signal+0x466/0x23d0 kernel/signal.c:2728
do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4417e9
Code: 65 64 2e 0a 44 69 64 20 79 6f 75 20 64 6f 20 61 20 22 6d 61 6b 65 20
69 6e 73 74 61 6c 6c 22 3f 0a 53 75 67 67 65 73 74 65 64 <20> 61 63 74 69
6f 6e 3a 20 72 75 6e 20 72 73 79 73 6c 6f 67 64 20
RSP: 002b:00007ffd59c75ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00000000004417e9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006cc018 R08: 000000000000000f R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402540
R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000
INFO: task syz-executor121:1736 blocked for more than 144 seconds.
Not tainted 5.3.0-rc2+ #25
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor121 D28520 1736 1727 0x80004006
Call Trace:
schedule+0x9a/0x250 kernel/sched/core.c:3944
wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
filp_close+0xb4/0x160 fs/open.c:1166
close_files fs/file.c:388 [inline]
put_files_struct fs/file.c:416 [inline]
put_files_struct+0x1d8/0x2e0 fs/file.c:413
exit_files+0x7e/0xa0 fs/file.c:445
do_exit+0x8bc/0x2c50 kernel/exit.c:873
do_group_exit+0x125/0x340 kernel/exit.c:982
get_signal+0x466/0x23d0 kernel/signal.c:2728
do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4417e9
Code: 65 64 2e 0a 44 69 64 20 79 6f 75 20 64 6f 20 61 20 22 6d 61 6b 65 20
69 6e 73 74 61 6c 6c 22 3f 0a 53 75 67 67 65 73 74 65 64 <20> 61 63 74 69
6f 6e 3a 20 72 75 6e 20 72 73 79 73 6c 6f 67 64 20
RSP: 002b:00007ffd59c75ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: fffffffffffffe00 RBX: 0000000000000000 RCX: 00000000004417e9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006cc018 R08: 000000000000000f R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402540
R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000

Showing all locks held in the system:
1 lock held by khungtaskd/23:
#0: 00000000743497a3 (rcu_read_lock){....}, at:
debug_show_all_locks+0x53/0x269 kernel/locking/lockdep.c:5254
1 lock held by rsyslogd/1602:
#0: 00000000988125b0 (&f->f_pos_lock){+.+.}, at: __fdget_pos+0xe3/0x100
fs/file.c:801
2 locks held by getty/1693:
#0: 0000000047c29258 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 00000000527dfb3a (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156
2 locks held by getty/1694:
#0: 000000003a351c46 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 00000000d8d75c5b (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156
2 locks held by getty/1695:
#0: 00000000e15b15bf (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 000000004d294c18 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156
2 locks held by getty/1696:
#0: 0000000051d028a3 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 0000000038c23150 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156
2 locks held by getty/1697:
#0: 000000001b33f7ab (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 00000000f5955915 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156
2 locks held by getty/1698:
#0: 000000007ef217e0 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 00000000bc876517 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156
2 locks held by getty/1699:
#0: 000000000ee3efd4 (&tty->ldisc_sem){++++}, at:
tty_ldisc_ref_wait+0x22/0x80 drivers/tty/tty_ldisc.c:272
#1: 000000006bc64f89 (&ldata->atomic_read_lock){+.+.}, at:
n_tty_read+0x223/0x1ae0 drivers/tty/n_tty.c:2156

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.3.0-rc2+ #25
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xca/0x13e lib/dump_stack.c:113
nmi_cpu_backtrace.cold+0x55/0x96 lib/nmi_backtrace.c:101
nmi_trigger_cpumask_backtrace+0x1b0/0x1c7 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
watchdog+0x9a4/0xe50 kernel/hung_task.c:289
kthread+0x318/0x420 kernel/kthread.c:255
ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1 skipped: idling at native_safe_halt
arch/x86/include/asm/irqflags.h:60 [inline]
NMI backtrace for cpu 1 skipped: idling at arch_safe_halt
arch/x86/include/asm/irqflags.h:103 [inline]
NMI backtrace for cpu 1 skipped: idling at default_idle+0x28/0x2e0
arch/x86/kernel/process.c:580


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


2019-11-19 09:16:23

by Bjørn Mork

Subject: Re: INFO: task hung in wdm_flush

syzbot <[email protected]> writes:

> INFO: task syz-executor121:1726 blocked for more than 143 seconds.
> Not tainted 5.3.0-rc2+ #25
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor121 D28520 1726 1724 0x80004006
> Call Trace:
> schedule+0x9a/0x250 kernel/sched/core.c:3944
> wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
> filp_close+0xb4/0x160 fs/open.c:1166
> close_files fs/file.c:388 [inline]
> put_files_struct fs/file.c:416 [inline]
> put_files_struct+0x1d8/0x2e0 fs/file.c:413
> exit_files+0x7e/0xa0 fs/file.c:445
> do_exit+0x8bc/0x2c50 kernel/exit.c:873
> do_group_exit+0x125/0x340 kernel/exit.c:982
> get_signal+0x466/0x23d0 kernel/signal.c:2728
> do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
> exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
> prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
> do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x401520
> Code: 6e 65 54 61 62 6c 65 00 67 65 74 63 6f 6e 00 5f 69 6e 69 74 00
> 69 73 5f 73 65 6c 69 6e 75 78 5f 65 6e 61 62 6c 65 64 00 73 65 <63> 75
> 72 69 74 79 5f 67 65 74 65 6e 66 6f 72 63 65 00 67 65 74 5f
> RSP: 002b:00007ffd59c75df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
> RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000401520
> RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd59c75e10
> RBP: 00000000006cc018 R08: 0000000000000000 R09: 000000000000000f
> R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000402540
> R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000


Thanks to Eric for reminding me of this one. I did look briefly at it
before, and meant to revisit it for a more thorough analysis. And
forgot, of course...

Anyway, I believe this is not a bug.

wdm_flush will wait forever for the IN_USE flag to be cleared or the
DISCONNECTING flag to be set. The only way you can avoid this is by
creating a device that works normally up to a point and then completely
ignores all messages, but without resetting or disconnecting. It is
obviously possible to create such a device. But I think the current
error handling is more than sufficient, unless you show me some way to
abuse this or reproduce the issue with a real device.

Just disconnect the malfunctioning device and throw it away.
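
To illustrate the wait condition described above, here is a minimal
userspace sketch in C. It is not the driver code: the MODEL_* bit values,
struct model_device and model_flush() are hypothetical names made up for
this example, and a bounded polling loop stands in for sleeping on a wait
queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model; the real driver defines its own. */
enum {
    MODEL_IN_USE        = 1u << 0,
    MODEL_DISCONNECTING = 1u << 1,
};

/* The flush path may return once IN_USE is clear or DISCONNECTING is set. */
bool flush_may_return(unsigned flags)
{
    return !(flags & MODEL_IN_USE) || (flags & MODEL_DISCONNECTING);
}

/* A modelled device: the pending write completes at poll number
 * completes_after, or never if completes_after is negative. */
struct model_device {
    unsigned flags;
    int completes_after;
};

/* Poll at most max_polls times. Returns the poll at which the flush could
 * return, or -1 if it would still be blocked after max_polls polls. */
int model_flush(struct model_device *dev, int max_polls)
{
    assert(max_polls >= 0);
    for (int poll = 0; poll < max_polls; poll++) {
        /* Stand-in for the completion callback clearing IN_USE. */
        if (dev->completes_after >= 0 && poll >= dev->completes_after)
            dev->flags &= ~MODEL_IN_USE;
        if (flush_may_return(dev->flags))
            return poll;
    }
    return -1;
}
```

In this model a device that completes its write lets the flush return, a
device flagged as disconnecting lets it return at once, and a device that
stays silent without disconnecting keeps it blocked for as long as one
cares to poll.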


Bjørn

2019-11-19 10:34:08

by Oliver Neukum

Subject: Re: INFO: task hung in wdm_flush

Am Dienstag, den 19.11.2019, 10:14 +0100 schrieb Bjørn Mork:
> syzbot <[email protected]> writes:
>
> > INFO: task syz-executor121:1726 blocked for more than 143 seconds.
> > Not tainted 5.3.0-rc2+ #25
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > syz-executor121 D28520 1726 1724 0x80004006
> > Call Trace:
> > schedule+0x9a/0x250 kernel/sched/core.c:3944
> > wdm_flush+0x20c/0x370 drivers/usb/class/cdc-wdm.c:590
> > filp_close+0xb4/0x160 fs/open.c:1166
> > close_files fs/file.c:388 [inline]
> > put_files_struct fs/file.c:416 [inline]
> > put_files_struct+0x1d8/0x2e0 fs/file.c:413
> > exit_files+0x7e/0xa0 fs/file.c:445
> > do_exit+0x8bc/0x2c50 kernel/exit.c:873
> > do_group_exit+0x125/0x340 kernel/exit.c:982
> > get_signal+0x466/0x23d0 kernel/signal.c:2728
> > do_signal+0x88/0x14e0 arch/x86/kernel/signal.c:815
> > exit_to_usermode_loop+0x1a2/0x200 arch/x86/entry/common.c:159
> > prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
> > syscall_return_slowpath arch/x86/entry/common.c:274 [inline]
> > do_syscall_64+0x45f/0x580 arch/x86/entry/common.c:299
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > RIP: 0033:0x401520
> > Code: 6e 65 54 61 62 6c 65 00 67 65 74 63 6f 6e 00 5f 69 6e 69 74 00
> > 69 73 5f 73 65 6c 69 6e 75 78 5f 65 6e 61 62 6c 65 64 00 73 65 <63> 75
> > 72 69 74 79 5f 67 65 74 65 6e 66 6f 72 63 65 00 67 65 74 5f
> > RSP: 002b:00007ffd59c75df8 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
> > RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000401520
> > RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd59c75e10
> > RBP: 00000000006cc018 R08: 0000000000000000 R09: 000000000000000f
> > R10: 0000000000000064 R11: 0000000000000246 R12: 0000000000402540
> > R13: 00000000004025d0 R14: 0000000000000000 R15: 0000000000000000
>
>
> Thanks to Eric for reminding me of this one. I did look briefly at it
> before, and meant to revisit it for a more thorough analysis. And
> forgot, of course...
>
> Anyway, I believe this is not a bug.
>
> wdm_flush will wait forever for the IN_USE flag to be cleared or the

Damn. Too obvious. So you think we simply have pending output that does
just not complete?

> DISCONNECTING flag to be set. The only way you can avoid this is by
> creating a device that works normally up to a point and then completely
> ignores all messages,

Devices may crash. I don't think we can ignore that case.

> but without resetting or disconnecting. It is
> obviously possible to create such a device. But I think the current
> error handling is more than sufficient, unless you show me some way to
> abuse this or reproduce the issue with a real device.

Malicious devices are real. Potentially at least.
But you are right, we need not bend over to handle them well, but we
ought to be able to handle them.

Regards
Oliver


2019-11-19 11:39:27

by Bjørn Mork

Subject: Re: INFO: task hung in wdm_flush

Oliver Neukum <[email protected]> writes:
> Am Dienstag, den 19.11.2019, 10:14 +0100 schrieb Bjørn Mork:
>
>> Anyway, I believe this is not a bug.
>>
>> wdm_flush will wait forever for the IN_USE flag to be cleared or the
>
> Damn. Too obvious. So you think we simply have pending output that does
> just not complete?

I do miss a lot of stuff so I might be wrong, but I can't see any other
way this can happen. The out_callback will unconditionally clear the
IN_USE flag and wake up the wait_queue.

>> DISCONNECTING flag to be set. The only way you can avoid this is by
>> creating a device that works normally up to a point and then completely
>> ignores all messages,
>
> Devices may crash. I don't think we can ignore that case.

Sure, but I've never seen that happen without the device falling off the
bus. Which is a disconnect.

But I am all for handling this *if* someone reproduces it with a real
device. I just don't think it's worth the effort if it's only a
theoretical problem.

>> but without resetting or disconnecting. It is
>> obviously possible to create such a device. But I think the current
>> error handling is more than sufficient, unless you show me some way to
>> abuse this or reproduce the issue with a real device.
>
> Malicious devices are real. Potentially at least.
> But you are right, we need not bend over to handle them well, but we
> ought to be able to handle them.

Sure, we need to handle malicious devices. But only if they can be used
for real harm.

This warning requires physical access and is only slightly annoying.
Like a USB device making loud farting sounds. You'd just disconnect the
device. No need for Linux to detect the sound and handle it
automatically, I think.


Bjørn

2019-11-23 06:54:00

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

On Tue, Nov 19, 2019 at 12:34 PM Bjørn Mork <[email protected]> wrote:
>
> Oliver Neukum <[email protected]> writes:
> > Am Dienstag, den 19.11.2019, 10:14 +0100 schrieb Bjørn Mork:
> >
> >> Anyway, I believe this is not a bug.
> >>
> >> wdm_flush will wait forever for the IN_USE flag to be cleared or the
> >
> > Damn. Too obvious. So you think we simply have pending output that does
> > just not complete?
>
> I do miss a lot of stuff so I might be wrong, but I can't see any other
> way this can happen. The out_callback will unconditionally clear the
> IN_USE flag and wake up the wait_queue.
>
> >> DISCONNECTING flag to be set. The only way you can avoid this is by
> >> creating a device that works normally up to a point and then completely
> >> ignores all messages,
> >
> > Devices may crash. I don't think we can ignore that case.
>
> Sure, but I've never seen that happen without the device falling off the
> bus. Which is a disconnect.
>
> But I am all for handling this *if* someone reproduces it with a real
> device. I just don't think it's worth the effort if it's only a
> theoretical problem.
>
> >> but without resetting or disconnecting. It is
> >> obviously possible to create such a device. But I think the current
> >> error handling is more than sufficient, unless you show me some way to
> >> abuse this or reproduce the issue with a real device.
> >
> > Malicious devices are real. Potentially at least.
> > But you are right, we need not bend over to handle them well, but we
> > ought to be able to handle them.
>
> Sure, we need to handle malicious devices. But only if they can be used
> for real harm.
>
> This warning requires physical access and is only slightly annoying.
> Like a USB device making loud farting sounds. You'd just disconnect the
> device. No need for Linux to detect the sound and handle it
> automatically, I think.

Hi Bjørn,

Besides the production use you are referring to, there are 2 cases we
should take into account as well:
1. Testing.
Any kernel testing system needs a binary criterion for detecting kernel
bugs. It seems right to detect unkillable hung tasks as kernel bugs.
Which means that we need to resolve this in some way regardless of the
production scenario.
2. Reliable killing of processes.
It's a very important property that an admin or script can reliably
kill whatever process/container they need to kill for whatever reason.
This case results in an unkillable process, which means scripts will
fail, automated systems will misbehave, admins will waste time (if
they are qualified to resolve this at all).
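
The second point can be sketched with a small userspace model in C. This
is only an illustration under stated assumptions, not kernel code:
struct model_task, model_wait() and the result codes are hypothetical
names, and the loop stands in for a task sleeping on a condition that
never becomes true.

```c
#include <assert.h>
#include <stdbool.h>

enum model_wait_result {
    MODEL_WAIT_DONE   = 0,   /* condition became true */
    MODEL_WAIT_KILLED = -1,  /* wait was abandoned because of a fatal signal */
    MODEL_WAIT_STUCK  = -2,  /* still waiting after max_polls polls */
};

/* A modelled task: a fatal signal arrives at poll kill_at_poll,
 * or never if kill_at_poll is negative. */
struct model_task {
    int kill_at_poll;
};

/* Wait for condition. A killable wait gives up once a fatal signal is
 * pending; a non-killable wait ignores the signal and keeps waiting. */
enum model_wait_result model_wait(bool condition,
                                  const struct model_task *task,
                                  bool killable, int max_polls)
{
    assert(max_polls >= 0);
    for (int poll = 0; poll < max_polls; poll++) {
        if (condition)
            return MODEL_WAIT_DONE;
        if (killable && task->kill_at_poll >= 0 &&
            poll >= task->kill_at_poll)
            return MODEL_WAIT_KILLED;
    }
    return MODEL_WAIT_STUCK;
}
```

With the condition stuck at false, the non-killable variant stays blocked
no matter when the signal arrives, which is the unkillable-process case
described above; the killable variant returns as soon as the signal is
pending.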

2020-02-10 10:06:45

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

On Sat, Nov 23, 2019 at 7:52 AM Dmitry Vyukov <[email protected]> wrote:
>
> On Tue, Nov 19, 2019 at 12:34 PM Bjørn Mork <[email protected]> wrote:
> >
> > Oliver Neukum <[email protected]> writes:
> > > Am Dienstag, den 19.11.2019, 10:14 +0100 schrieb Bjørn Mork:
> > >
> > >> Anyway, I believe this is not a bug.
> > >>
> > >> wdm_flush will wait forever for the IN_USE flag to be cleared or the
> > >
> > > Damn. Too obvious. So you think we simply have pending output that does
> > > just not complete?
> >
> > I do miss a lot of stuff so I might be wrong, but I can't see any other
> > way this can happen. The out_callback will unconditionally clear the
> > IN_USE flag and wake up the wait_queue.
> >
> > >> DISCONNECTING flag to be set. The only way you can avoid this is by
> > >> creating a device that works normally up to a point and then completely
> > >> ignores all messages,
> > >
> > > Devices may crash. I don't think we can ignore that case.
> >
> > Sure, but I've never seen that happen without the device falling off the
> > bus. Which is a disconnect.
> >
> > But I am all for handling this *if* someone reproduces it with a real
> > device. I just don't think it's worth the effort if it's only a
> > theoretical problem.
> >
> > >> but without resetting or disconnecting. It is
> > >> obviously possible to create such a device. But I think the current
> > >> error handling is more than sufficient, unless you show me some way to
> > >> abuse this or reproduce the issue with a real device.
> > >
> > > Malicious devices are real. Potentially at least.
> > > But you are right, we need not bend over to handle them well, but we
> > > ought to be able to handle them.
> >
> > Sure, we need to handle malicious devices. But only if they can be used
> > for real harm.
> >
> > This warning requires physical access and is only slightly annoying.
> > Like a USB device making loud farting sounds. You'd just disconnect the
> > device. No need for Linux to detect the sound and handle it
> > automatically, I think.
>
> Hi Bjørn,
>
> Besides the production use you are referring to, there are 2 cases we
> should take into account as well:
> 1. Testing.
> Any kernel testing system needs a binary criterion for detecting kernel
> bugs. It seems right to detect unkillable hung tasks as kernel bugs.
> Which means that we need to resolve this in some way regardless of the
> production scenario.
> 2. Reliable killing of processes.
> It's a very important property that an admin or script can reliably
> kill whatever process/container they need to kill for whatever reason.
> This case results in an unkillable process, which means scripts will
> fail, automated systems will misbehave, admins will waste time (if
> they are qualified to resolve this at all).

On Mon, Feb 10, 2020 at 11:00 AM Tetsuo Handa
<[email protected]> wrote:
>
> Hello.
>
> Will you check whether patch testing is working? I tried
>
> #syz test: https://github.com/google/kasan.git usb-fuzzer
>
> but the reproducer did not trigger a crash either "with a patch"
> or "without a patch", even though the dashboard is still adding crashes.
> I suspect something is wrong. Is it possible that the reproducer is
> trying to test a bug which was already fixed but a different new
> bug is still reported as the same bug?

2020-02-10 10:10:04

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

On Mon, Feb 10, 2020 at 11:06 AM Dmitry Vyukov <[email protected]> wrote:
> > > Oliver Neukum <[email protected]> writes:
> > > > Am Dienstag, den 19.11.2019, 10:14 +0100 schrieb Bjørn Mork:
> > > >
> > > >> Anyway, I believe this is not a bug.
> > > >>
> > > >> wdm_flush will wait forever for the IN_USE flag to be cleared or the
> > > >
> > > > Damn. Too obvious. So you think we simply have pending output that does
> > > > just not complete?
> > >
> > > I do miss a lot of stuff so I might be wrong, but I can't see any other
> > > way this can happen. The out_callback will unconditionally clear the
> > > IN_USE flag and wake up the wait_queue.
> > >
> > > >> DISCONNECTING flag to be set. The only way you can avoid this is by
> > > >> creating a device that works normally up to a point and then completely
> > > >> ignores all messages,
> > > >
> > > > Devices may crash. I don't think we can ignore that case.
> > >
> > > Sure, but I've never seen that happen without the device falling off the
> > > bus. Which is a disconnect.
> > >
> > > But I am all for handling this *if* someone reproduces it with a real
> > > device. I just don't think it's worth the effort if it's only a
> > > theoretical problem.
> > >
> > > >> but without resetting or disconnecting. It is
> > > >> obviously possible to create such a device. But I think the current
> > > >> error handling is more than sufficient, unless you show me some way to
> > > >> abuse this or reproduce the issue with a real device.
> > > >
> > > > Malicious devices are real. Potentially at least.
> > > > But you are right, we need not bend over to handle them well, but we
> > > > ought to be able to handle them.
> > >
> > > Sure, we need to handle malicious devices. But only if they can be used
> > > for real harm.
> > >
> > > This warning requires physical access and is only slightly annoying.
> > > Like a USB device making loud farting sounds. You'd just disconnect the
> > > device. No need for Linux to detect the sound and handle it
> > > automatically, I think.
> >
> > Hi Bjørn,
> >
> > Besides the production use you are referring to, there are 2 cases we
> > should take into account as well:
> > 1. Testing.
> > Any kernel testing system needs a binary criterion for detecting kernel
> > bugs. It seems right to detect unkillable hung tasks as kernel bugs.
> > Which means that we need to resolve this in some way regardless of the
> > production scenario.
> > 2. Reliable killing of processes.
> > It's a very important property that an admin or script can reliably
> > kill whatever process/container they need to kill for whatever reason.
> > This case results in an unkillable process, which means scripts will
> > fail, automated systems will misbehave, admins will waste time (if
> > they are qualified to resolve this at all).
>
> On Mon, Feb 10, 2020 at 11:00 AM Tetsuo Handa
> <[email protected]> wrote:
> >
> > Hello.
> >
> > Will you check whether patch testing is working? I tried
> >
> > #syz test: https://github.com/google/kasan.git usb-fuzzer
> >
> > but the reproducer did not trigger a crash either "with a patch"
> > or "without a patch", even though the dashboard is still adding crashes.
> > I suspect something is wrong. Is it possible that the reproducer is
> > trying to test a bug which was already fixed but a different new
> > bug is still reported as the same bug?

Hi Tetsuo,

The simplest and fastest thing you may try is to request testing on another,
simpler bug. I have not seen any other signals suggesting that patch
testing in general is somehow broken.

You may also try the exact commit the bug was reported on, because
usb-fuzzer is a tracking branch, and things may change there.

If the old bug was fixed, but syzbot is not aware, new bugs being
piled into the same bucket is exactly what will happen. So that's
definitely possible.

2020-02-10 12:49:04

by Tetsuo Handa

Subject: Re: INFO: task hung in wdm_flush

On 2020/02/10 19:09, Dmitry Vyukov wrote:
> You may also try the exact commit the bug was reported on, because
> usb-fuzzer is a tracking branch, and things may change there.

OK. I explicitly tried

#syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23

but syzbot still cannot reproduce this bug using the reproducer...

On 2020/02/10 21:02, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: [email protected]
>
> Tested on:
>
> commit: e5cd56e9 usb: gadget: add raw-gadget interface
> git tree: https://github.com/google/kasan.git
> kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>
> Note: testing is done by a robot and is best-effort only.
>

Anyway, I suspect that we are forgetting to wake up all waiters after
clearing the WDM_IN_USE bit, because sometimes multiple threads are reported
as hung.
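
To illustrate that suspicion, here is a minimal userspace sketch in Python. It is
only an analogy and not the cdc-wdm code: the class InUseFlag, its methods, and
the run() helper are hypothetical names, and a threading.Condition stands in for
the kernel wait queue. It shows that when a busy flag is cleared without a
wake-up, every sleeper stays blocked until its timeout, however many there are.

```python
import threading
import time


class InUseFlag:
    """Hypothetical stand-in for a busy bit (such as WDM_IN_USE) plus a wait queue."""

    def __init__(self):
        self._cond = threading.Condition()
        self.in_use = True
        self.sleepers = 0  # waiters currently blocked in wait()

    def wait_until_clear(self, timeout):
        """Sleep until woken and the flag is clear; False means we timed out."""
        with self._cond:
            while self.in_use:
                self.sleepers += 1
                woken = self._cond.wait(timeout)
                self.sleepers -= 1
                if not woken:
                    return False  # nobody woke us: the "hung task" case
            return True

    def sleeping(self):
        with self._cond:
            return self.sleepers

    def clear(self, wake):
        """Clear the flag; wake every sleeper only if wake is true."""
        with self._cond:
            self.in_use = False
            if wake:
                self._cond.notify_all()


def run(wake, n_waiters=3, timeout=0.5):
    """Return how many of n_waiters saw the flag clear before timing out."""
    flag = InUseFlag()
    results = []

    def waiter():
        results.append(flag.wait_until_clear(timeout))

    threads = [threading.Thread(target=waiter) for _ in range(n_waiters)]
    for t in threads:
        t.start()
    # Make sure every waiter is really asleep before clearing the flag.
    while flag.sleeping() < n_waiters and not results:
        time.sleep(0.001)
    flag.clear(wake)
    for t in threads:
        t.join()
    return sum(results)


print("woken with wake-up:   ", run(wake=True))
print("woken without wake-up:", run(wake=False))
```

With the wake-up all three waiters return; without it none do, which would match
several threads being reported as hung at once.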

On 2020/02/10 15:27, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: [email protected]
>
> Tested on:
>
> commit: e5cd56e9 usb: gadget: add raw-gadget interface
> git tree: https://github.com/google/kasan.git usb-fuzzer
> kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> patch: https://syzkaller.appspot.com/x/patch.diff?x=117c3ae9e00000
>
> Note: testing is done by a robot and is best-effort only.
>

On 2020/02/10 15:55, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: [email protected]
>
> Tested on:
>
> commit: e5cd56e9 usb: gadget: add raw-gadget interface
> git tree: https://github.com/google/kasan.git usb-fuzzer
> kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> patch: https://syzkaller.appspot.com/x/patch.diff?x=13b3f6e9e00000
>
> Note: testing is done by a robot and is best-effort only.
>

On 2020/02/10 16:21, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: [email protected]
>
> Tested on:
>
> commit: e5cd56e9 usb: gadget: add raw-gadget interface
> git tree: https://github.com/google/kasan.git usb-fuzzer
> kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> patch: https://syzkaller.appspot.com/x/patch.diff?x=115026b5e00000
>
> Note: testing is done by a robot and is best-effort only.
>

On 2020/02/10 16:44, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: [email protected]
>
> Tested on:
>
> commit: e5cd56e9 usb: gadget: add raw-gadget interface
> git tree: https://github.com/google/kasan.git usb-fuzzer
> kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> patch: https://syzkaller.appspot.com/x/patch.diff?x=17285431e00000
>
> Note: testing is done by a robot and is best-effort only.
>

On 2020/02/10 17:05, syzbot wrote:
> Hello,
>
> syzbot has tested the proposed patch and the reproducer did not trigger crash:
>
> Reported-and-tested-by: [email protected]
>
> Tested on:
>
> commit: e5cd56e9 usb: gadget: add raw-gadget interface
> git tree: https://github.com/google/kasan.git usb-fuzzer
> kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> compiler: gcc (GCC) 9.0.0 20181231 (experimental)
>
> Note: testing is done by a robot and is best-effort only.
>

2020-02-10 15:07:00

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

On Mon, Feb 10, 2020 at 1:46 PM Tetsuo Handa
<[email protected]> wrote:
>
> On 2020/02/10 19:09, Dmitry Vyukov wrote:
> > You may also try on the exact commit the bug was reported, because
> > usb-fuzzer is tracking branch, things may change there.
>
> OK. I explicitly tried
>
> #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
>
> but syzbot still cannot reproduce this bug using the reproducer...
>
> On 2020/02/10 21:02, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger crash:
> >
> > Reported-and-tested-by: [email protected]
> >
> > Tested on:
> >
> > commit: e5cd56e9 usb: gadget: add raw-gadget interface
> > git tree: https://github.com/google/kasan.git
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> > dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Note: testing is done by a robot and is best-effort only.
> >
>
> Anyway, I'm just suspecting that we are forgetting to wake up all waiters
> after clearing WDM_IN_USE bit because sometimes multiple threads are reported
> as hung.
>
> On 2020/02/10 15:27, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger crash:
> >
> > Reported-and-tested-by: [email protected]
> >
> > Tested on:
> >
> > commit: e5cd56e9 usb: gadget: add raw-gadget interface
> > git tree: https://github.com/google/kasan.git usb-fuzzer
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> > dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > patch: https://syzkaller.appspot.com/x/patch.diff?x=117c3ae9e00000
> >
> > Note: testing is done by a robot and is best-effort only.
> >
>
> On 2020/02/10 15:55, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger crash:
> >
> > Reported-and-tested-by: [email protected]
> >
> > Tested on:
> >
> > commit: e5cd56e9 usb: gadget: add raw-gadget interface
> > git tree: https://github.com/google/kasan.git usb-fuzzer
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> > dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > patch: https://syzkaller.appspot.com/x/patch.diff?x=13b3f6e9e00000
> >
> > Note: testing is done by a robot and is best-effort only.
> >
>
> On 2020/02/10 16:21, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger crash:
> >
> > Reported-and-tested-by: [email protected]
> >
> > Tested on:
> >
> > commit: e5cd56e9 usb: gadget: add raw-gadget interface
> > git tree: https://github.com/google/kasan.git usb-fuzzer
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> > dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > patch: https://syzkaller.appspot.com/x/patch.diff?x=115026b5e00000
> >
> > Note: testing is done by a robot and is best-effort only.
> >
>
> On 2020/02/10 16:44, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger crash:
> >
> > Reported-and-tested-by: [email protected]
> >
> > Tested on:
> >
> > commit: e5cd56e9 usb: gadget: add raw-gadget interface
> > git tree: https://github.com/google/kasan.git usb-fuzzer
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> > dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > patch: https://syzkaller.appspot.com/x/patch.diff?x=17285431e00000
> >
> > Note: testing is done by a robot and is best-effort only.
> >
>
> On 2020/02/10 17:05, syzbot wrote:
> > Hello,
> >
> > syzbot has tested the proposed patch and the reproducer did not trigger crash:
> >
> > Reported-and-tested-by: [email protected]
> >
> > Tested on:
> >
> > commit: e5cd56e9 usb: gadget: add raw-gadget interface
> > git tree: https://github.com/google/kasan.git usb-fuzzer
> > kernel config: https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162
> > dashboard link: https://syzkaller.appspot.com/bug?extid=854768b99f19e89d7f81
> > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> >
> > Note: testing is done by a robot and is best-effort only.



On Mon, Feb 10, 2020 at 4:03 PM Tetsuo Handa
<[email protected]> wrote:
>
> On 2020/02/10 21:46, Tetsuo Handa wrote:
> > On 2020/02/10 19:09, Dmitry Vyukov wrote:
> >> You may also try on the exact commit the bug was reported, because
> >> usb-fuzzer is tracking branch, things may change there.
> >
> > OK. I explicitly tried
> >
> > #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
> >
> > but syzbot still cannot reproduce this bug using the reproducer...
>
> It seems that there is non-trivial difference between kernel config in dashboard
> and kernel config in "syz test:" mails. Maybe that's the cause...

2020-02-10 15:09:14

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

> On Mon, Feb 10, 2020 at 4:03 PM Tetsuo Handa
> <[email protected]> wrote:
> >
> > On 2020/02/10 21:46, Tetsuo Handa wrote:
> > > On 2020/02/10 19:09, Dmitry Vyukov wrote:
> > >> You may also try on the exact commit the bug was reported, because
> > >> usb-fuzzer is tracking branch, things may change there.
> > >
> > > OK. I explicitly tried
> > >
> > > #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
> > >
> > > but syzbot still cannot reproduce this bug using the reproducer...
> >
> > It seems that there is non-trivial difference between kernel config in dashboard
> > and kernel config in "syz test:" mails. Maybe that's the cause...


syzkaller runs oldconfig when building any kernels:
https://github.com/google/syzkaller/blob/master/pkg/build/linux.go#L56
Is that difference what oldconfig produces?

2020-02-10 15:23:46

by Tetsuo Handa

Subject: Re: INFO: task hung in wdm_flush

On 2020/02/11 0:06, Dmitry Vyukov wrote:
>> On Mon, Feb 10, 2020 at 4:03 PM Tetsuo Handa
>> <[email protected]> wrote:
>>>
>>> On 2020/02/10 21:46, Tetsuo Handa wrote:
>>>> On 2020/02/10 19:09, Dmitry Vyukov wrote:
>>>>> You may also try on the exact commit the bug was reported, because
>>>>> usb-fuzzer is tracking branch, things may change there.
>>>>
>>>> OK. I explicitly tried
>>>>
>>>> #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
>>>>
>>>> but syzbot still cannot reproduce this bug using the reproducer...
>>>
>>> It seems that there is non-trivial difference between kernel config in dashboard
>>> and kernel config in "syz test:" mails. Maybe that's the cause...
>
>
> syzkaller runs oldconfig when building any kernels:
> https://github.com/google/syzkaller/blob/master/pkg/build/linux.go#L56
> Is that difference what oldconfig produces?
>

Here is the diff (with "#" lines excluded) between the kernel config in the dashboard
and the kernel config in the "syz test:" mails.
I feel this difference is bigger than what a simple oldconfig would cause.

$ curl 'https://syzkaller.appspot.com/text?tag=KernelConfig&x=8cff427cc8996115' | sort > dashboard
$ curl 'https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162' | sort > syz-test
$ diff -u dashboard syz-test | grep -vF '#' | grep '^[+-]'
--- dashboard 2020-02-11 00:19:14.793977153 +0900
+++ syz-test 2020-02-11 00:19:15.659977108 +0900
-CONFIG_BLK_DEV_LOOP_MIN_COUNT=16
+CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
-CONFIG_BUG_ON_DATA_CORRUPTION=y
-CONFIG_DEBUG_CREDENTIALS=y
-CONFIG_DEBUG_PER_CPU_MAPS=y
-CONFIG_DEBUG_PLIST=y
-CONFIG_DEBUG_SG=y
-CONFIG_DEBUG_VIRTUAL=y
+CONFIG_DEVMEM=y
+CONFIG_DEVPORT=y
+CONFIG_DMA_OF=y
-CONFIG_DYNAMIC_DEBUG=y
-CONFIG_DYNAMIC_MEMORY_LAYOUT=y
+CONFIG_HID_REDRAGON=y
+CONFIG_IRQCHIP=y
-CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
+CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
-CONFIG_MAC80211_HWSIM=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
+CONFIG_MAGIC_SYSRQ_SERIAL=y
+CONFIG_NET_TC_SKB_EXT=y
+CONFIG_OF=y
+CONFIG_OF_ADDRESS=y
+CONFIG_OF_GPIO=y
+CONFIG_OF_IOMMU=y
+CONFIG_OF_IRQ=y
+CONFIG_OF_KOBJ=y
+CONFIG_OF_MDIO=y
+CONFIG_OF_NET=y
-CONFIG_PGTABLE_LEVELS=5
+CONFIG_PGTABLE_LEVELS=4
+CONFIG_PWRSEQ_EMMC=y
+CONFIG_PWRSEQ_SIMPLE=y
+CONFIG_RTLWIFI_DEBUG=y
-CONFIG_SECURITYFS=y
+CONFIG_STRICT_DEVMEM=y
+CONFIG_THERMAL_OF=y
+CONFIG_USB_CHIPIDEA_OF=y
+CONFIG_USB_DWC3_OF_SIMPLE=y
-CONFIG_USB_RAW_GADGET=y
+CONFIG_USB_SNP_UDC_PLAT=y
-CONFIG_VIRTIO_BLK_SCSI=y
-CONFIG_VIRT_WIFI=y
-CONFIG_X86_5LEVEL=y
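
The same comparison can be sketched in Python, which also separates options that
changed value from options that were added or removed. This is only a sketch:
config_options() and compare_configs() are hypothetical helper names, and the two
inputs are a tiny excerpt modeled on the diff above, not the real config files.

```python
def config_options(text):
    """Map option name -> value for every CONFIG_*=... line, skipping comment lines."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.startswith("CONFIG_"):
            options[name] = value
    return options


def compare_configs(old_text, new_text):
    """Return (removed, added, changed) between two kernel config files."""
    old, new = config_options(old_text), config_options(new_text)
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return removed, added, changed


# Illustrative excerpt only; the real files would be fetched as in the curl commands.
dashboard = """\
# CONFIG_OF is not set
CONFIG_PGTABLE_LEVELS=5
CONFIG_USB_RAW_GADGET=y
CONFIG_X86_5LEVEL=y
"""
syz_test = """\
CONFIG_OF=y
CONFIG_PGTABLE_LEVELS=4
"""

removed, added, changed = compare_configs(dashboard, syz_test)
print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```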

2020-02-11 14:05:50

by Tetsuo Handa

Subject: Re: INFO: task hung in wdm_flush

On 2020/02/11 0:21, Tetsuo Handa wrote:
> On 2020/02/11 0:06, Dmitry Vyukov wrote:
>>> On Mon, Feb 10, 2020 at 4:03 PM Tetsuo Handa
>>> <[email protected]> wrote:
>>>>
>>>> On 2020/02/10 21:46, Tetsuo Handa wrote:
>>>>> On 2020/02/10 19:09, Dmitry Vyukov wrote:
>>>>>> You may also try on the exact commit the bug was reported, because
>>>>>> usb-fuzzer is tracking branch, things may change there.
>>>>>
>>>>> OK. I explicitly tried
>>>>>
>>>>> #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
>>>>>
>>>>> but syzbot still cannot reproduce this bug using the reproducer...
>>>>
>>>> It seems that there is non-trivial difference between kernel config in dashboard
>>>> and kernel config in "syz test:" mails. Maybe that's the cause...
>>
>>
>> syzkaller runs oldconfig when building any kernels:
>> https://github.com/google/syzkaller/blob/master/pkg/build/linux.go#L56
>> Is that difference what oldconfig produces?
>>
>
> Here is the diff (with "#" lines excluded) between dashboard and "syz test:" mails.
> I feel this difference is bigger than what simple oldconfig would cause.
>

I explicitly tried the commit from the first report (instead of the one from the latest report)

#syz test: https://github.com/google/kasan.git e96407b497622d03f088bcf17d2c8c5a1ab066c8

and syzbot reproduced this bug using the reproducer. Therefore, it seems that the kernel
config used for "syz test:" was inappropriate, but "syz test:" failed to detect that.
Since there might be changes which fixed different bugs (and in order to confirm that the
proposed patch applies cleanly to the current kernel without causing other problems), I guess
that people tend to test using the latest commit (instead of the commit from the first report).

I suggest that "syz test:" retest without the proposed patch when a test with the proposed
patch did not reproduce the bug. If retesting without the proposed patch does not reproduce
the bug either, we can tell that something is wrong (maybe the bug is difficult to reproduce,
maybe the bug was already fixed, maybe the kernel config was inappropriate, maybe something else).
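
A minimal sketch of how the two results could be combined, in Python. This is not
an existing syzbot feature: interpret() is a hypothetical helper, and it assumes
one run with the patch and one run without it, both on the same commit and config.

```python
def interpret(crashed_with_patch, crashed_without_patch):
    """Classify one with-patch run plus one without-patch run on the same commit."""
    if crashed_with_patch:
        return "patch does not fix the bug"
    if crashed_without_patch:
        return "patch likely fixes the bug"
    return "inconclusive: the bug did not reproduce even without the patch"


for with_patch in (True, False):
    for without_patch in (True, False):
        print(with_patch, without_patch, "->", interpret(with_patch, without_patch))
```

Only the last case is the one where "did not trigger crash" tells us nothing
about the patch itself.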



Regarding the bug for this report, debug printk() reported that WDM_IN_USE was not cleared
for some reason. While we need to investigate why WDM_IN_USE was not cleared, I guess that
wdm_write() should clear WDM_IN_USE upon error
( https://syzkaller.appspot.com/x/patch.diff?x=17ec7ee9e00000 ) so that we will surely
wake up somebody potentially waiting on WDM_IN_USE.

[ 38.587596][ T2807] wdm_flush: file=ffff8881d488bb80 flags=2
[ 40.214039][ T2807] wdm_flush: file=ffff8881d63fb400 flags=2
[ 40.304390][ T2842] wdm_flush: file=ffff8881d5e22500 flags=0
[ 40.371742][ T2869] wdm_flush: file=ffff8881d4964c80 flags=0
[ 40.429954][ T2844] wdm_flush: file=ffff8881d5937b80 flags=0
[ 40.461538][ T2858] wdm_flush: file=ffff8881d488b400 flags=0
[ 40.464909][ T2863] wdm_flush: file=ffff8881d488ea00 flags=0
[ 41.576761][ T2896] wdm_flush: file=ffff8881d43dea00 flags=2
[ 41.949941][ T2909] wdm_flush: file=ffff8881d63c3b80 flags=2
[ 43.760828][ T2899] wdm_flush: file=ffff8881d3d7a000 flags=2
[ 43.857364][ T2911] wdm_flush: file=ffff8881d63c2000 flags=2
[ 43.857501][ T2904] wdm_flush: file=ffff8881d3d7a280 flags=2
[ 43.866560][ T2906] wdm_flush: file=ffff8881d5ce4780 flags=2
[ 43.876210][ T2897] wdm_flush: file=ffff8881d385db80 flags=2
[ 72.308895][ T2909] INFO: task syz-executor.0:2909 blocked for more than 30 seconds.
[ 72.316860][ T2909] wdm_flush: file=ffff8881d63c3b80 flags=2
[ 74.228916][ T2906] INFO: task syz-executor.1:2906 blocked for more than 30 seconds.
[ 74.228921][ T2911] INFO: task syz-executor.3:2911 blocked for more than 30 seconds.
[ 74.228935][ T2911] wdm_flush: file=ffff8881d63c2000 flags=2
[ 74.236949][ T2906] wdm_flush: file=ffff8881d5ce4780 flags=2
[ 74.236991][ T2904] INFO: task syz-executor.4:2904 blocked for more than 30 seconds.
[ 74.245459][ T2897] INFO: task syz-executor.2:2897 blocked for more than 30 seconds.
[ 74.251305][ T2904] wdm_flush: file=ffff8881d3d7a280 flags=2
[ 74.257129][ T2897] wdm_flush: file=ffff8881d385db80 flags=2
[ 74.257951][ T2899] INFO: task syz-executor.5:2899 blocked for more than 30 seconds.
[ 74.294465][ T2899] wdm_flush: file=ffff8881d3d7a000 flags=2
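
For a longer log, the pairing of hung tasks with the flags they printed can be
extracted mechanically. A rough Python sketch follows; hung_flags() is a
hypothetical helper, the regular expressions assume exactly the line format shown
above, and the sample input is four lines copied from that log.

```python
import re

# Debug printk line: "[ T<tid>] wdm_flush: file=<ptr> flags=<n>"
FLUSH = re.compile(r"\[\s*T(\d+)\] wdm_flush: file=([0-9a-f]+) flags=(\d+)")
# Hung task report: "INFO: task <name>:<pid> blocked ..."
HUNG = re.compile(r"INFO: task (\S+):(\d+) blocked")


def hung_flags(log):
    """Map each task reported as blocked to the flags values it printed."""
    flags = {}
    hung = set()
    for line in log.splitlines():
        m = FLUSH.search(line)
        if m:
            flags.setdefault(int(m.group(1)), set()).add(int(m.group(3)))
            continue
        m = HUNG.search(line)
        if m:
            hung.add(int(m.group(2)))
    return {tid: sorted(flags.get(tid, ())) for tid in sorted(hung)}


sample = """\
[ 40.304390][ T2842] wdm_flush: file=ffff8881d5e22500 flags=0
[ 41.949941][ T2909] wdm_flush: file=ffff8881d63c3b80 flags=2
[ 72.308895][ T2909] INFO: task syz-executor.0:2909 blocked for more than 30 seconds.
[ 72.316860][ T2909] wdm_flush: file=ffff8881d63c3b80 flags=2
"""
print(hung_flags(sample))
```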

2020-02-11 14:28:52

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

On Mon, Feb 10, 2020 at 4:22 PM Tetsuo Handa
<[email protected]> wrote:
>
> On 2020/02/11 0:06, Dmitry Vyukov wrote:
> >> On Mon, Feb 10, 2020 at 4:03 PM Tetsuo Handa
> >> <[email protected]> wrote:
> >>>
> >>> On 2020/02/10 21:46, Tetsuo Handa wrote:
> >>>> On 2020/02/10 19:09, Dmitry Vyukov wrote:
> >>>>> You may also try on the exact commit the bug was reported, because
> >>>>> usb-fuzzer is tracking branch, things may change there.
> >>>>
> >>>> OK. I explicitly tried
> >>>>
> >>>> #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
> >>>>
> >>>> but syzbot still cannot reproduce this bug using the reproducer...
> >>>
> >>> It seems that there is non-trivial difference between kernel config in dashboard
> >>> and kernel config in "syz test:" mails. Maybe that's the cause...
> >
> >
> > syzkaller runs oldconfig when building any kernels:
> > https://github.com/google/syzkaller/blob/master/pkg/build/linux.go#L56
> > Is that difference what oldconfig produces?
> >
>
> Here is the diff (with "#" lines excluded) between dashboard and "syz test:" mails.
> I feel this difference is bigger than what simple oldconfig would cause.
>
> $ curl 'https://syzkaller.appspot.com/text?tag=KernelConfig&x=8cff427cc8996115' | sort > dashboard

I think you took the wrong config as a base.
This 8cff427cc8996115 was only used for crashes without reproducers as
far as I see, so it can't be used for patch testing.
I would expect the one used for the last patch testing to be this one:
https://syzkaller.appspot.com/text?tag=KernelConfig&x=8847e5384a16f66a
associated with this crash:
ci2-upstream-usb 2019/09/23 13:26 https://github.com/google/kasan.git
usb-fuzzer e0bd8d79d96e88f3

I checked at least CONFIG_DYNAMIC_DEBUG, and it matches what was used
for patch testing.
So everything seems right to me as far as I see.



> $ curl 'https://syzkaller.appspot.com/x/.config?x=c372cdb7140fc162' | sort > syz-test
> $ diff -u dashboard syz-test | grep -vF '#' | grep '^[+-]'
> --- dashboard 2020-02-11 00:19:14.793977153 +0900
> +++ syz-test 2020-02-11 00:19:15.659977108 +0900
> -CONFIG_BLK_DEV_LOOP_MIN_COUNT=16
> +CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
> -CONFIG_BUG_ON_DATA_CORRUPTION=y
> -CONFIG_DEBUG_CREDENTIALS=y
> -CONFIG_DEBUG_PER_CPU_MAPS=y
> -CONFIG_DEBUG_PLIST=y
> -CONFIG_DEBUG_SG=y
> -CONFIG_DEBUG_VIRTUAL=y
> +CONFIG_DEVMEM=y
> +CONFIG_DEVPORT=y
> +CONFIG_DMA_OF=y
> -CONFIG_DYNAMIC_DEBUG=y
> -CONFIG_DYNAMIC_MEMORY_LAYOUT=y
> +CONFIG_HID_REDRAGON=y
> +CONFIG_IRQCHIP=y
> -CONFIG_LSM="lockdown,yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
> +CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
> -CONFIG_MAC80211_HWSIM=y
> +CONFIG_MAGIC_SYSRQ=y
> +CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1
> +CONFIG_MAGIC_SYSRQ_SERIAL=y
> +CONFIG_NET_TC_SKB_EXT=y
> +CONFIG_OF=y
> +CONFIG_OF_ADDRESS=y
> +CONFIG_OF_GPIO=y
> +CONFIG_OF_IOMMU=y
> +CONFIG_OF_IRQ=y
> +CONFIG_OF_KOBJ=y
> +CONFIG_OF_MDIO=y
> +CONFIG_OF_NET=y
> -CONFIG_PGTABLE_LEVELS=5
> +CONFIG_PGTABLE_LEVELS=4
> +CONFIG_PWRSEQ_EMMC=y
> +CONFIG_PWRSEQ_SIMPLE=y
> +CONFIG_RTLWIFI_DEBUG=y
> -CONFIG_SECURITYFS=y
> +CONFIG_STRICT_DEVMEM=y
> +CONFIG_THERMAL_OF=y
> +CONFIG_USB_CHIPIDEA_OF=y
> +CONFIG_USB_DWC3_OF_SIMPLE=y
> -CONFIG_USB_RAW_GADGET=y
> +CONFIG_USB_SNP_UDC_PLAT=y
> -CONFIG_VIRTIO_BLK_SCSI=y
> -CONFIG_VIRT_WIFI=y
> -CONFIG_X86_5LEVEL=y

2020-02-11 16:33:34

by Dmitry Vyukov

Subject: Re: INFO: task hung in wdm_flush

On Tue, Feb 11, 2020 at 2:55 PM Tetsuo Handa
<[email protected]> wrote:
>
> On 2020/02/11 0:21, Tetsuo Handa wrote:
> > On 2020/02/11 0:06, Dmitry Vyukov wrote:
> >>> On Mon, Feb 10, 2020 at 4:03 PM Tetsuo Handa
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 2020/02/10 21:46, Tetsuo Handa wrote:
> >>>>> On 2020/02/10 19:09, Dmitry Vyukov wrote:
> >>>>>> You may also try on the exact commit the bug was reported, because
> >>>>>> usb-fuzzer is tracking branch, things may change there.
> >>>>>
> >>>>> OK. I explicitly tried
> >>>>>
> >>>>> #syz test: https://github.com/google/kasan.git e5cd56e94edde38ca4dafae5a450c5a16b8a5f23
> >>>>>
> >>>>> but syzbot still cannot reproduce this bug using the reproducer...
> >>>>
> >>>> It seems that there is non-trivial difference between kernel config in dashboard
> >>>> and kernel config in "syz test:" mails. Maybe that's the cause...
> >>
> >>
> >> syzkaller runs oldconfig when building any kernels:
> >> https://github.com/google/syzkaller/blob/master/pkg/build/linux.go#L56
> >> Is that difference what oldconfig produces?
> >>
> >
> > Here is the diff (with "#" lines excluded) between dashboard and "syz test:" mails.
> > I feel this difference is bigger than what simple oldconfig would cause.
> >
>
> I explicitly tried a commit as of the first report (instead of the latest report)
>
> #syz test: https://github.com/google/kasan.git e96407b497622d03f088bcf17d2c8c5a1ab066c8
>
> and syzbot reproduced this bug using the reproducer. Therefore, it seems that differences
> in the kernel config used for "syz test:" was inappropriate but "syz test:" failed to detect
> it. Since there might be changes which fixed different bugs (and in order to confirm that
> proposed patch cleanly applies to the current kernel without causing other problems), I guess
> that people tend to test using the latest commit (instead of a commit as of the first report).
>
> I suggest "syz test:" to retest without proposed patch when proposed patch did not reproduce
> the bug. If retesting without proposed patch did not reproduce the bug, we can figure out that
> something is wrong (maybe the bug is difficult to reproduce, maybe the bug was already fixed,
> maybe kernel config was inappropriate, maybe something else).

This is already possible, right? One can request any single test as
they see fit.
Chaining tests into complex workflows won't necessarily make things
simpler. It will be hard to explain what exactly happened and why.
Also, consider the case where a reproducer is flaky: it did not crash
with the patch, but crashed without the patch (just because it's flaky).
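
To put a rough number on the flakiness concern, here is a small Python sketch.
The per-run crash probabilities are purely illustrative assumptions (nothing in
this thread measures them), runs are assumed independent, and miss_probability()
is a hypothetical helper name.

```python
def miss_probability(p_crash, runs):
    """Chance that `runs` independent runs all miss a bug that fires with
    probability p_crash per run."""
    return (1.0 - p_crash) ** runs


# With an assumed 50% per-run hit rate, a single clean run proves little.
for runs in (1, 3, 5):
    print(runs, "run(s):", miss_probability(0.5, runs))
```

So a single "did not trigger crash" result, with or without the patch, is weak
evidence when the reproducer is flaky.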


> Regarding the bug for this report, debug printk() reported that WDM_IN_USE was not cleared
> for some reason. While we need to investigate why WDM_IN_USE was not cleared, I guess that
> wdm_write() should clear WDM_IN_USE upon error
> ( https://syzkaller.appspot.com/x/patch.diff?x=17ec7ee9e00000 ) so that we will surely
> wake up somebody potentially waiting on WDM_IN_USE.
>
> [ 38.587596][ T2807] wdm_flush: file=ffff8881d488bb80 flags=2
> [ 40.214039][ T2807] wdm_flush: file=ffff8881d63fb400 flags=2
> [ 40.304390][ T2842] wdm_flush: file=ffff8881d5e22500 flags=0
> [ 40.371742][ T2869] wdm_flush: file=ffff8881d4964c80 flags=0
> [ 40.429954][ T2844] wdm_flush: file=ffff8881d5937b80 flags=0
> [ 40.461538][ T2858] wdm_flush: file=ffff8881d488b400 flags=0
> [ 40.464909][ T2863] wdm_flush: file=ffff8881d488ea00 flags=0
> [ 41.576761][ T2896] wdm_flush: file=ffff8881d43dea00 flags=2
> [ 41.949941][ T2909] wdm_flush: file=ffff8881d63c3b80 flags=2
> [ 43.760828][ T2899] wdm_flush: file=ffff8881d3d7a000 flags=2
> [ 43.857364][ T2911] wdm_flush: file=ffff8881d63c2000 flags=2
> [ 43.857501][ T2904] wdm_flush: file=ffff8881d3d7a280 flags=2
> [ 43.866560][ T2906] wdm_flush: file=ffff8881d5ce4780 flags=2
> [ 43.876210][ T2897] wdm_flush: file=ffff8881d385db80 flags=2
> [ 72.308895][ T2909] INFO: task syz-executor.0:2909 blocked for more than 30 seconds.
> [ 72.316860][ T2909] wdm_flush: file=ffff8881d63c3b80 flags=2
> [ 74.228916][ T2906] INFO: task syz-executor.1:2906 blocked for more than 30 seconds.
> [ 74.228921][ T2911] INFO: task syz-executor.3:2911 blocked for more than 30 seconds.
> [ 74.228935][ T2911] wdm_flush: file=ffff8881d63c2000 flags=2
> [ 74.236949][ T2906] wdm_flush: file=ffff8881d5ce4780 flags=2
> [ 74.236991][ T2904] INFO: task syz-executor.4:2904 blocked for more than 30 seconds.
> [ 74.245459][ T2897] INFO: task syz-executor.2:2897 blocked for more than 30 seconds.
> [ 74.251305][ T2904] wdm_flush: file=ffff8881d3d7a280 flags=2
> [ 74.257129][ T2897] wdm_flush: file=ffff8881d385db80 flags=2
> [ 74.257951][ T2899] INFO: task syz-executor.5:2899 blocked for more than 30 seconds.
> [ 74.294465][ T2899] wdm_flush: file=ffff8881d3d7a000 flags=2
>