Hello,
syzbot found the following issue on:
HEAD commit: 50987bec Merge tag 'trace-v5.12-rc7' of git://git.kernel.o..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1065c5fcd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=398c4d0fe6f66e68
dashboard link: https://syzkaller.appspot.com/bug?extid=e2eae5639e7203360018
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
usbtmc 5-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 5-1:0.0: unknown status received: -71
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (8580 ticks this GP) idle=72e/1/0x4000000000000000 softirq=20679/20679 fqs=0
(t=10500 jiffies g=27129 q=416)
rcu: rcu_preempt kthread starved for 10500 jiffies! g27129 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:29168 pid: 14 ppid: 2 flags:0x00004000
Call Trace:
context_switch kernel/sched/core.c:4322 [inline]
__schedule+0x911/0x21b0 kernel/sched/core.c:5073
schedule+0xcf/0x270 kernel/sched/core.c:5152
schedule_timeout+0x14a/0x250 kernel/time/timer.c:1892
rcu_gp_fqs_loop kernel/rcu/tree.c:2005 [inline]
rcu_gp_kthread+0xd07/0x2250 kernel/rcu/tree.c:2178
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3232 Comm: aoe_tx0 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:native_apic_mem_write+0x8/0x10 arch/x86/include/asm/apic.h:110
Code: c7 40 d9 36 8f e8 c8 11 86 00 eb b0 66 0f 1f 44 00 00 be 01 00 00 00 e9 36 c7 2c 00 cc cc cc cc cc cc 89 ff 89 b7 00 c0 5f ff <c3> 0f 1f 80 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 53 89 fb 48
RSP: 0018:ffffc90000007ea8 EFLAGS: 00000046
RAX: dffffc0000000000 RBX: ffffffff8b0a78c0 RCX: 0000000000000020
RDX: 1ffffffff1614f1a RSI: 000000000001c285 RDI: 0000000000000380
RBP: ffff8880b9c1f2c0 R08: 000000000000003f R09: 0000000000000000
R10: ffffffff8166ecf7 R11: 0000000000000000 R12: 000000000001c285
R13: 0000000000000020 R14: ffff8880b9c26340 R15: 0000006120792e26
FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb9e6cdb380 CR3: 0000000018792000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
apic_write arch/x86/include/asm/apic.h:393 [inline]
lapic_next_event+0x4d/0x80 arch/x86/kernel/apic/apic.c:472
clockevents_program_event+0x254/0x370 kernel/time/clockevents.c:334
tick_program_event+0xac/0x140 kernel/time/tick-oneshot.c:44
hrtimer_interrupt+0x414/0xa00 kernel/time/hrtimer.c:1676
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:27 [inline]
RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x60 kernel/kcov.c:197
Code: f0 4d 89 03 e9 f2 fc ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 40 00 <65> 8b 05 39 fe 8d 7e 89 c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b
RSP: 0018:ffffc900030cf6f8 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88801aff1c40 RSI: ffffffff815c2e4f RDI: 0000000000000003
RBP: ffffc900030cf738 R08: 0000000000000000 R09: ffffffff8fa9a96f
R10: ffffffff815c2e45 R11: 0000000000000000 R12: 000000000000002d
R13: ffff8880113db880 R14: 0000000000000000 R15: 0000000000000200
console_trylock_spinning kernel/printk/printk.c:1818 [inline]
vprintk_emit+0x3a5/0x560 kernel/printk/printk.c:2097
dev_vprintk_emit+0x36e/0x3b2 drivers/base/core.c:4434
dev_printk_emit+0xba/0xf1 drivers/base/core.c:4445
__netdev_printk+0x1c6/0x27a net/core/dev.c:11292
netdev_warn+0xd7/0x109 net/core/dev.c:11345
ieee802154_subif_start_xmit.cold+0x17/0x27 net/mac802154/tx.c:125
__netdev_start_xmit include/linux/netdevice.h:4825 [inline]
netdev_start_xmit include/linux/netdevice.h:4839 [inline]
xmit_one net/core/dev.c:3605 [inline]
dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3621
sch_direct_xmit+0x2e1/0xbd0 net/sched/sch_generic.c:313
qdisc_restart net/sched/sch_generic.c:376 [inline]
__qdisc_run+0x4ba/0x15f0 net/sched/sch_generic.c:384
qdisc_run include/net/pkt_sched.h:136 [inline]
qdisc_run include/net/pkt_sched.h:128 [inline]
__dev_xmit_skb net/core/dev.c:3807 [inline]
__dev_queue_xmit+0x14b9/0x2e00 net/core/dev.c:4162
tx+0x68/0xb0 drivers/block/aoe/aoenet.c:63
kthread+0x1e7/0x3a0 drivers/block/aoe/aoecmd.c:1230
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
NMI backtrace for cpu 1
CPU: 1 PID: 37 Comm: kworker/1:1 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_dev_trap_report_work
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x222/0x2a7 kernel/rcu/tree_stall.h:341
print_cpu_stall kernel/rcu/tree_stall.h:622 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:697 [inline]
rcu_pending kernel/rcu/tree.c:3830 [inline]
rcu_sched_clock_irq.cold+0x4f7/0x11dd kernel/rcu/tree.c:2650
update_process_times+0x16d/0x200 kernel/time/timer.c:1796
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1369
__run_hrtimer kernel/time/hrtimer.c:1537 [inline]
__hrtimer_run_queues+0x1c0/0xe40 kernel/time/hrtimer.c:1601
hrtimer_interrupt+0x330/0xa00 kernel/time/hrtimer.c:1663
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 kernel/locking/spinlock.c:191
Code: 74 24 10 e8 ba 19 54 f8 48 89 ef e8 f2 cf 54 f8 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> d3 9d 48 f8 65 8b 05 7c 68 fc 76 85 c0 74 0a 5b 5d c3 e8 40 59
RSP: 0018:ffffc90000dc0b28 EFLAGS: 00000206
RAX: 0000000000000002 RBX: 0000000000000200 RCX: 1ffffffff1f5f34a
RDX: 0000000000000000 RSI: 0000000000000103 RDI: 0000000000000001
RBP: ffff888144fa8000 R08: 0000000000000001 R09: ffffffff8fa9a99f
R10: 0000000000000001 R11: ffffc90013880000 R12: ffff888145047440
R13: ffff88801ee8e500 R14: dffffc0000000000 R15: ffff888011f69c00
spin_unlock_irqrestore include/linux/spinlock.h:409 [inline]
dummy_timer+0x12f1/0x32a0 drivers/usb/gadget/udc/dummy_hcd.c:1985
call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431
expire_timers kernel/time/timer.c:1476 [inline]
__run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745
__run_timers kernel/time/timer.c:1726 [inline]
run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758
__do_softirq+0x29b/0x9f6 kernel/softirq.c:345
do_softirq.part.0+0xd9/0x130 kernel/softirq.c:248
</IRQ>
do_softirq kernel/softirq.c:240 [inline]
__local_bh_enable_ip+0x102/0x120 kernel/softirq.c:198
spin_unlock_bh include/linux/spinlock.h:399 [inline]
nsim_dev_trap_report drivers/net/netdevsim/dev.c:585 [inline]
nsim_dev_trap_report_work+0x867/0xbd0 drivers/net/netdevsim/dev.c:611
process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
kthread+0x3b1/0x4a0 kernel/kthread.c:292
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 5-1:0.0: unknown status received: -71
usbtmc 5-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 5-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 5-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 2-1:0.0: unknown status received: -71
usbtmc 4-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: unknown status received: -71
usbtmc 3-1:0.0: usb_submit_urb failed: -19
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: unknown status received: -71
usbtmc 6-1:0.0: usb_submit_urb failed: -19
---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].
syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
On Mon, Apr 19, 2021 at 9:19 AM syzbot
<[email protected]> wrote:
> usbtmc 5-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 5-1:0.0: unknown status received: -71
The log shows an infinite stream of these before the stall, so I
assume it's an infinite loop in usbtmc.
+usbtmc maintainers
[ 370.171634][ C0] usbtmc 6-1:0.0: unknown status received: -71
[ 370.177799][ C1] usbtmc 3-1:0.0: unknown status received: -71
[ 370.183912][ C0] usbtmc 4-1:0.0: unknown status received: -71
[ 370.190076][ C1] usbtmc 5-1:0.0: unknown status received: -71
[ 370.196194][ C0] usbtmc 2-1:0.0: unknown status received: -71
[ 370.202387][ C1] usbtmc 3-1:0.0: unknown status received: -71
[ 370.208460][ C0] usbtmc 6-1:0.0: unknown status received: -71
[ 370.214615][ C1] usbtmc 5-1:0.0: unknown status received: -71
[ 370.220736][ C0] usbtmc 4-1:0.0: unknown status received: -71
[ 370.226902][ C1] usbtmc 3-1:0.0: unknown status received: -71
[ 370.233005][ C0] usbtmc 2-1:0.0: unknown status received: -71
[ 370.239168][ C1] usbtmc 5-1:0.0: unknown status received: -71
[ 370.245271][ C0] usbtmc 6-1:0.0: unknown status received: -71
[ 370.251426][ C1] usbtmc 3-1:0.0: unknown status received: -71
[ 370.257552][ C0] usbtmc 4-1:0.0: unknown status received: -71
[ 370.263715][ C1] usbtmc 5-1:0.0: unknown status received: -71
[ 370.269819][ C0] usbtmc 2-1:0.0: unknown status received: -71
[ 370.275974][ C1] usbtmc 3-1:0.0: unknown status received: -71
[ 370.282100][ C0] usbtmc 6-1:0.0: unknown status received: -71
[ 370.288262][ C1] usbtmc 5-1:0.0: unknown status received: -71
[ 370.294399][ C0] usbtmc 4-1:0.0: unknown status received: -71
Hi all,
The error has been in usbtmc_interrupt(struct urb *urb) for five years: the status code EPROTO is not handled correctly.
It's not a showstopper, but we should fix it and check the status code the same way usbtmc_read_bulk_cb() or
usb_skeleton.c does.
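To illustrate the kind of status check meant here, below is a minimal userspace sketch, not the driver code and not a proposed patch. The helper names intr_should_resubmit() and count_resubmits() are hypothetical, and the exact list of terminal status codes is an assumption. The idea is that a completion handler which logs an unrecognized status and then resubmits the URB will loop for as long as the device keeps answering with -EPROTO (-71), whereas treating -EPROTO as terminal stops the resubmission.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (illustration only): decide whether an interrupt-URB
 * completion handler should resubmit the URB for a given completion status.
 * The set of terminal codes below is an assumption for this sketch.
 */
static bool intr_should_resubmit(int status)
{
	switch (status) {
	case 0:			/* success: consume the data, then resubmit */
		return true;
	case -EOVERFLOW:
	case -ECONNRESET:
	case -ENOENT:
	case -ESHUTDOWN:
	case -EILSEQ:
	case -ETIME:
	case -EPIPE:
	case -EPROTO:		/* -71: protocol error, treated as terminal here */
		return false;
	default:		/* unrecognized status: log it and resubmit */
		return true;
	}
}

/*
 * Simulate a device that reports the same status on every completion and
 * count how often the handler would resubmit, capped at max_completions.
 */
static int count_resubmits(int status, int max_completions)
{
	int n = 0;

	assert(max_completions >= 0);
	while (n < max_completions && intr_should_resubmit(status))
		n++;
	return n;
}
```

With -EPROTO in the terminal list, count_resubmits(-EPROTO, 1000) yields 0. If that case label is removed, the status falls through to the default branch and the count hits the cap, which corresponds to the endless "unknown status received: -71" messages in the log.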
@Dave: Do you have time? Otherwise I can do it.
@Greg: Is it urgent?
- Guido
-----Original Message-----
From: Dmitry
Sent: Monday, April 19, 2021 9:27 AM
Subject: Re: [syzbot] INFO: rcu detected stall in tx
On Mon, Apr 19, 2021 at 08:56:19PM +0000, Guido Kiener wrote:
> Hi all,
>
> The error is in usbtmc_interrupt(struct urb *urb) since five years. The status code EPROTO is not handled correctly.
> It's not a showstopper, but we should fix it and check the status code according to usbtmc_read_bulk_cb() or
> usb_skeleton.c.
> @Dave: Do you have time? Otherwise I can do it.
> @Greg: Is it urgent?
No idea, but patches for known problems are always good to get completed
as soon as possible :)
thanks,
greg k-h
Hi all,
Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc driver.
What happened?
The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe receives an erroneous urb with status -EPROTO (-71).
See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
-EPROTO does not abort/shutdown the pipe and the urb is resubmitted to receive the next packet. However the callback handler usbtmc_interrupt is called again with the same erroneous status -EPROTO and this seems to result in an endless loop.
According to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
the error -EPROTO indicates a hardware problem or a bad cable.
Most USB drivers do not react to these hardware problems in any specific way and simply resubmit the urb. We assume these drivers will run into the same endless loop. Some other driver examples are:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
Possible solutions:
Hardware defects or bad cables seem to be a common problem for most USB drivers, and I assume we do not want to fix this problem in all class-specific drivers but rather in the lower-level host drivers, e.g.:
1. Use a counter and close the pipe after some number of detected errors.
2. Delay the resubmission of the urb to avoid high CPU usage.
3. Do nothing, since it is just a rare problem.
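Options 1 and 2 could be combined roughly as in the following userspace sketch. It is only an illustration: struct pipe_error_state, next_resubmit_delay_ms() and the threshold of 8 are hypothetical names and values, not existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSECUTIVE_ERRORS 8	/* assumed threshold, not from the thread */

/* Hypothetical per-pipe bookkeeping; no such struct exists in the kernel. */
struct pipe_error_state {
	unsigned int consecutive_errors;
	bool closed;
};

/*
 * Called once per completed URB. Returns the delay in milliseconds to
 * wait before resubmitting, or -1 if the pipe is closed and the URB
 * must not be resubmitted.
 */
static int next_resubmit_delay_ms(struct pipe_error_state *st, int status)
{
	assert(st != NULL);

	if (st->closed)
		return -1;

	if (status == 0) {
		st->consecutive_errors = 0;	/* a good transfer resets the count */
		return 0;
	}

	st->consecutive_errors++;
	if (st->consecutive_errors >= MAX_CONSECUTIVE_ERRORS) {
		st->closed = true;		/* option 1: give up on the pipe */
		return -1;
	}

	/* option 2: back off 1, 2, 4, ... ms between retries */
	return 1 << (st->consecutive_errors - 1);
}
```

In a real completion handler the delay could not be a sleep, because the handler runs in atomic context; the resubmission would presumably have to be deferred to a timer or delayed work item.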
We've never seen this problem in our products and we do not dare to change anything.
- Guido
On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> Hi all,
>
> Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc driver.
>
> What happened?
> The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe receives an erroneous urb with status -EPROTO (-71).
> See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> -EPROTO does not abort/shutdown the pipe and the urb is resubmitted to receive the next packet. However the callback handler usbtmc_interrupt is called again with the same erroneous status -EPROTO and this seems to result in an endless loop.
> According to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> the error -EPROTO indicates a hardware problem or a bad cable.
>
> Most usb drivers do not react in a specific way on this hardware problems and resubmit the urb. We assume these drivers will run into the same endless loop. Some other driver samples are:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
>
> Possible solutions:
> Hardware defects or bad cables seems to be a common problem for most usb drivers and I assume we do not want to fix this problem in all class specific drivers, but in lower level host drivers, e.g:
> 1. Using a counter and close the pipe after some detected errors
> 2. Delay the resubmission of the urb to avoid high cpu usage
> 3. Do nothing, since it is just a rare problem.
>
> We've never seen this problem in our products and we do not dare to change anything.
Drivers are not consistent in the way they handle these errors, as you
have seen.  A few try to take active measures, such as retries with
increasing timeouts. Many drivers just ignore them, which is not a very
good idea.
The general feeling among kernel USB developers is that a -EPROTO,
-EILSEQ, or -ETIME error should be regarded as fatal, much the same as
an unplug event. The driver should avoid resubmitting URBs and just
wait to be unbound from the device.
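
As a rough userspace illustration of that policy (not the actual usbtmc code), the sketch below models a completion handler's resubmit decision. The helpers may_resubmit() and count_completions() are hypothetical; the eproto_is_fatal switch exists only to contrast the behaviour seen in the syzbot log with the suggested one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper (not a kernel API): returns true if the
 * completion handler may resubmit the URB after seeing `status`.
 * `eproto_is_fatal` selects between the behaviour reported in this
 * thread (false) and the suggested one (true).
 */
static bool may_resubmit(int status, bool eproto_is_fatal)
{
	switch (status) {
	case -EPROTO:
		return !eproto_is_fatal;
	case -EILSEQ:
	case -ETIME:
	case -EPIPE:
		return false;	/* fatal: stop and wait to be unbound */
	default:
		return true;	/* success or unknown status: resubmit */
	}
}

/*
 * Simulates a device whose every transfer completes with `status`.
 * Returns how many times the completion handler ran before it stopped
 * resubmitting, capped at `limit`.
 */
static int count_completions(int status, int limit, bool eproto_is_fatal)
{
	int calls = 0;

	assert(limit > 0);
	while (calls < limit) {
		calls++;	/* the completion handler runs once */
		if (!may_resubmit(status, eproto_is_fatal))
			break;	/* handler returns without resubmitting */
	}
	return calls;
}
```

With eproto_is_fatal set to false, a persistent -EPROTO runs the handler until the cap is reached, which corresponds to the stream of "unknown status received: -71" messages; with it set to true, the handler runs once and stops.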
If you would like to audit drivers and fix them up to behave this way,
that would be great.
(FYI, by far the most common causes of these errors are: The user has
unplugged the USB cable, or the device's firmware has crashed. It is
quite rare for the cause to be intermittent, although not entirely
unheard of -- for example, someone once reported errors resulting from
EM or power-line interference caused by flickering fluorescent lights or
something of that sort. It's pretty safe to ignore this possibility.)
Alan Stern
> -----Original Message-----
> From: Alan Stern <[email protected]>
> Sent: Tuesday, May 4, 2021 5:14 PM
> To: Kiener Guido 14DS1
> Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
>
> On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> > Hi all,
> >
> > Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc
> driver.
> >
> > What happened?
> > The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe receives
> an erroneous urb with status -EPROTO (-71).
> > See
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > e/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> > -EPROTO does not abort/shutdown the pipe and the urb is resubmitted to receive
> the next packet. However the callback handler usbtmc_interrupt is called again with
> the same erroneous status -EPROTO and this seems to result in an endless loop.
> > According to
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > e/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> > the error -EPROTO indicates a hardware problem or a bad cable.
> >
> > Most usb drivers do not react in a specific way on this hardware problems and
> resubmit the urb. We assume these drivers will run into the same endless loop.
> Some other driver samples are:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > e/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > e/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
> >
> > Possible solutions:
> > Hardware defects or bad cables seems to be a common problem for most usb
> drivers and I assume we do not want to fix this problem in all class specific drivers,
> but in lower level host drivers, e.g:
> > 1. Using a counter and close the pipe after some detected errors 2.
> > Delay the resubmission of the urb to avoid high cpu usage 3. Do
> > nothing, since it is just a rare problem.
> >
> > We've never seen this problem in our products and we do not dare to change
> anything.
>
> Drivers are not consistent in the way they handle these errors, as you have seen. A
> few try to take active measures, such as retrys with increasing timeouts. Many
> drivers just ignore them, which is not a very good idea.
>
> The general feeling among kernel USB developers is that a -EPROTO, -EILSEQ, or
> -ETIME error should be regarded as fatal, much the same as an unplug event. The
> driver should avoid resubmitting URBs and just wait to be unbound from the device.
Thanks for your assessment. I agree with the general feeling. I counted about a hundred
specific USB drivers, so wouldn't it be better to fix the problem in some of the host drivers (e.g. urb.c)?
We could return an error when calling usb_submit_urb() on an erroneous pipe.
I cannot estimate the side effects, and we would need to check all drivers again to see how they deal with the
error situation. Maybe there are some special drivers that need specialized error handling.
In that case these drivers could reset the (new?) error flag to allow calling usb_submit_urb()
again without error. This could work, couldn't it?
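
The error-flag idea could look roughly like the following userspace model. Everything here is hypothetical: struct sim_endpoint and the sim_*() functions merely stand in for core/host-driver state, and returning -EPIPE from the refused submission is an assumption, since the error code is left open above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-endpoint state; not a kernel struct. */
struct sim_endpoint {
	bool error_flag;	/* set after a transfer fails at the link level */
};

/* Models the core recording the outcome of a completed transfer. */
static void sim_complete(struct sim_endpoint *ep, int status)
{
	assert(ep != NULL);
	if (status == -EPROTO || status == -EILSEQ || status == -ETIME)
		ep->error_flag = true;
}

/* Models usb_submit_urb() refusing work on an erroneous pipe. */
static int sim_submit(const struct sim_endpoint *ep)
{
	assert(ep != NULL);
	return ep->error_flag ? -EPIPE : 0;	/* error code is an assumption */
}

/* Models a driver with special needs explicitly re-enabling the pipe. */
static void sim_clear_error(struct sim_endpoint *ep)
{
	assert(ep != NULL);
	ep->error_flag = false;
}
```

A driver that blindly resubmits would then get an error back from the submit call instead of another completion, which ends the loop without changing the driver itself.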
> If you would like to audit drivers and fix them up to behave this way, that would be
> great.
Currently not. I cannot pull the USB cable in home office :-), but I will keep an eye on it.
When I'm more involved in the next USB driver issue, then I will test bad cables and
maybe get more ideas about how we could test and fix this rare error.
> (FYI, by far the most common causes of these errors are: The user has unplugged
> the USB cable, or the device's firmware has crashed. It is quite rare for the cause to
> be intermittent, although not entirely unheard of -- for example, someone once
> reported errors resulting from EM or power-line interference caused by flickering
> fluorescent lights or something of that sort. It's pretty safe to ignore this possibility.)
I fear I may not use the 75 kW TV transmitter to interfere with the USB cable :-)
-Guido
On Wed, May 05, 2021 at 10:22:24PM +0000, Guido Kiener wrote:
> > -----Original Message-----
> > From: Alan Stern <[email protected]>
> > Sent: Tuesday, May 4, 2021 5:14 PM
> > To: Kiener Guido 14DS1
> > Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
> >
> > On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> > > Hi all,
> > >
> > > Dave and I discussed the "self-detected stall on CPU" caused by the usbtmc
> > driver.
> > >
> > > What happened?
> > > The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe receives
> > an erroneous urb with status -EPROTO (-71).
> > > See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > e/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> > > -EPROTO does not abort/shutdown the pipe and the urb is resubmitted to receive
> > the next packet. However the callback handler usbtmc_interrupt is called again with
> > the same erroneous status -EPROTO and this seems to result in an endless loop.
> > > According to
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > e/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> > > the error -EPROTO indicates a hardware problem or a bad cable.
> > >
> > > Most usb drivers do not react in a specific way on this hardware problems and
> > resubmit the urb. We assume these drivers will run into the same endless loop.
> > Some other driver samples are:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > e/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> > > e/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
> > >
> > > Possible solutions:
> > > Hardware defects or bad cables seems to be a common problem for most usb
> > drivers and I assume we do not want to fix this problem in all class specific drivers,
> > but in lower level host drivers, e.g:
> > > 1. Using a counter and close the pipe after some detected errors 2.
> > > Delay the resubmission of the urb to avoid high cpu usage 3. Do
> > > nothing, since it is just a rare problem.
> > >
> > > We've never seen this problem in our products and we do not dare to change
> > anything.
> >
> > Drivers are not consistent in the way they handle these errors, as you have seen. A
> > few try to take active measures, such as retrys with increasing timeouts. Many
> > drivers just ignore them, which is not a very good idea.
> >
> > The general feeling among kernel USB developers is that a -EPROTO, -EILSEQ, or
> > -ETIME error should be regarded as fatal, much the same as an unplug event. The
> > driver should avoid resubmitting URBs and just wait to be unbound from the device.
>
> Thanks for your assessment. I agree with the general feeling. I counted about hundred
> specific usb drivers, so wouldn't it be better to fix the problem in some of the host drivers (e.g. urb.c)?
> We could return an error when calling usb_submit_urb() on an erroneous pipe.
> I cannot estimate the side effects and we need to check all drivers again how they deal with the
> error situation. Maybe there are some special driver that need a specialized error handling.
> In this case these drivers could reset the (new?) error flag to allow calling usb_submit_urb()
> again without error. This could work, isn't it?
That is feasible, although it would be an awkward approach. As you
said, the side effects aren't clear. But it might work.
> > If you would like to audit drivers and fix them up to behave this way, that would be
> > great.
>
> Currently not. I cannot pull the USB cable in home office :-), but I will keep an eye on it.
> When I'm more involved in the next USB driver issue than I will test bad cables and
> maybe get more ideas how we could test and fix this rare error.
Will you be able to test patches?
Alan Stern
> -----Original Message-----
> From: Alan Stern
> Sent: Thursday, May 6, 2021 3:49 PM
> To: Kiener Guido 14DS1 <[email protected]>
>
> On Wed, May 05, 2021 at 10:22:24PM +0000, Guido Kiener wrote:
> > > Drivers are not consistent in the way they handle these errors, as
> > > you have seen. A few try to take active measures, such as retrys
> > > with increasing timeouts. Many drivers just ignore them, which is not a very
> good idea.
> > >
> > > The general feeling among kernel USB developers is that a -EPROTO,
> > > -EILSEQ, or -ETIME error should be regarded as fatal, much the same
> > > as an unplug event. The driver should avoid resubmitting URBs and just wait to
> be unbound from the device.
> >
> > Thanks for your assessment. I agree with the general feeling. I
> > counted about hundred specific usb drivers, so wouldn't it be better to fix the
> problem in some of the host drivers (e.g. urb.c)?
> > We could return an error when calling usb_submit_urb() on an erroneous pipe.
> > I cannot estimate the side effects and we need to check all drivers
> > again how they deal with the error situation. Maybe there are some special driver
> that need a specialized error handling.
> > In this case these drivers could reset the (new?) error flag to allow
> > calling usb_submit_urb() again without error. This could work, isn't it?
>
> That is feasible, although it would be an awkward approach. As you said, the side
> effects aren't clear. But it might work.
Otherwise I see only the other approach: change a hundred drivers and add the
cases EPROTO, EILSEQ and ETIME in each callback handler. The usbtmc driver
already respects EILSEQ and ETIME, and only EPROTO is missing.
The rest should be more of a management task.
BTW do you assume it is only a problem for INT pipes or is it also a problem
for isochronous and bulk transfers?
> > > If you would like to audit drivers and fix them up to behave this
> > > way, that would be great.
> >
> > Currently not. I cannot pull the USB cable in home office :-), but I will keep an eye
> on it.
> > When I'm more involved in the next USB driver issue than I will test
> > bad cables and maybe get more ideas how we could test and fix this rare error.
>
> Will you be able to test patches?
I can only test the USBTMC function on a few different PCs. I do not have automated
regression tests for USB drivers or Linux kernels.
Maybe there is a company that could do that.
-Guido
On Thu, May 06, 2021 at 05:44:55PM +0000, Guido Kiener wrote:
> > -----Original Message-----
> > From: Alan Stern
> > Sent: Thursday, May 6, 2021 3:49 PM
> > To: Kiener Guido 14DS1 <[email protected]>
> > >
> > > Thanks for your assessment. I agree with the general feeling. I
> > > counted about hundred specific usb drivers, so wouldn't it be better to fix the
> > problem in some of the host drivers (e.g. urb.c)?
> > > We could return an error when calling usb_submit_urb() on an erroneous pipe.
> > > I cannot estimate the side effects and we need to check all drivers
> > > again how they deal with the error situation. Maybe there are some special driver
> > that need a specialized error handling.
> > > In this case these drivers could reset the (new?) error flag to allow
> > > calling usb_submit_urb() again without error. This could work, isn't it?
> >
> > That is feasible, although it would be an awkward approach. As you said, the side
> > effects aren't clear. But it might work.
>
> Otherwise I see only the other approach to change hundred drivers and add the
> cases EPROTO, EILSEQ and ETIME in each callback handler. The usbtmc driver
> already respects the EILSEQ and ETIME, and only EPROTO is missing.
> The rest should be more a management task.
> BTW do you assume it is only a problem for INT pipes or is it also a problem
> for isochronous and bulk transfers?
All of them. Control too.
> > Will you be able to test patches?
>
> I only can test the USBTMC function in some different PCs. I do not have automated
> regression tests for USB drivers or Linux kernels.
> Maybe there is company who could do that.
Well then, if I do find time to write a patch, I'll ask you to try it
out with the usbtmc driver.
Alan Stern
> -----Original Message-----
> From: Alan Stern
> Sent: Thursday, May 6, 2021 8:32 PM
> To: Kiener Guido 14DS1
>
> On Thu, May 06, 2021 at 05:44:55PM +0000, Guido Kiener wrote:
> > > -----Original Message-----
> > > From: Alan Stern
> > > Sent: Thursday, May 6, 2021 3:49 PM
> > > To: Kiener Guido 14DS1 <[email protected]>
> > > >
> > > > Thanks for your assessment. I agree with the general feeling. I
> > > > counted about hundred specific usb drivers, so wouldn't it be
> > > > better to fix the
> > > problem in some of the host drivers (e.g. urb.c)?
> > > > We could return an error when calling usb_submit_urb() on an erroneous
> pipe.
> > > > I cannot estimate the side effects and we need to check all
> > > > drivers again how they deal with the error situation. Maybe there
> > > > are some special driver
> > > that need a specialized error handling.
> > > > In this case these drivers could reset the (new?) error flag to
> > > > allow calling usb_submit_urb() again without error. This could work, isn't it?
> > >
> > > That is feasible, although it would be an awkward approach. As you
> > > said, the side effects aren't clear. But it might work.
> >
> > Otherwise I see only the other approach to change hundred drivers and
> > add the cases EPROTO, EILSEQ and ETIME in each callback handler. The
> > usbtmc driver already respects the EILSEQ and ETIME, and only EPROTO is
> missing.
> > The rest should be more a management task.
> > BTW do you assume it is only a problem for INT pipes or is it also a
> > problem for isochronous and bulk transfers?
>
> All of them. Control too.
>
> > > Will you be able to test patches?
> >
> > I only can test the USBTMC function in some different PCs. I do not
> > have automated regression tests for USB drivers or Linux kernels.
> > Maybe there is company who could do that.
>
> Well then, if I do find time to write a patch, I'll ask you to try it out with the usbtmc
> driver.
You mean that you will do a patch in urb.c or a host driver? Or just add a line in usbtmc.c?
Anyhow there is no hurry. On May 20 I will send you a mail if I'm able to
provoke one of these hardware errors EPROTO, EILSEQ, or ETIME. Otherwise
it doesn't make sense to test it.
-Guido
On Thu, 6 May 2021 at 22:31, Guido Kiener
<[email protected]> wrote:
>
> > -----Original Message-----
> > From: Alan Stern
> > Sent: Thursday, May 6, 2021 8:32 PM
> > To: Kiener Guido 14DS1
> >
> > On Thu, May 06, 2021 at 05:44:55PM +0000, Guido Kiener wrote:
> > > > -----Original Message-----
> > > > From: Alan Stern
> > > > Sent: Thursday, May 6, 2021 3:49 PM
> > > > To: Kiener Guido 14DS1 <[email protected]>
> > > > >
> > > > > Thanks for your assessment. I agree with the general feeling. I
> > > > > counted about hundred specific usb drivers, so wouldn't it be
> > > > > better to fix the
> > > > problem in some of the host drivers (e.g. urb.c)?
> > > > > We could return an error when calling usb_submit_urb() on an erroneous
> > pipe.
> > > > > I cannot estimate the side effects and we need to check all
> > > > > drivers again how they deal with the error situation. Maybe there
> > > > > are some special driver
> > > > that need a specialized error handling.
> > > > > In this case these drivers could reset the (new?) error flag to
> > > > > allow calling usb_submit_urb() again without error. This could work, isn't it?
> > > >
> > > > That is feasible, although it would be an awkward approach. As you
> > > > said, the side effects aren't clear. But it might work.
> > >
> > > Otherwise I see only the other approach to change hundred drivers and
> > > add the cases EPROTO, EILSEQ and ETIME in each callback handler. The
> > > usbtmc driver already respects the EILSEQ and ETIME, and only EPROTO is
> > missing.
> > > The rest should be more a management task.
> > > BTW do you assume it is only a problem for INT pipes or is it also a
> > > problem for isochronous and bulk transfers?
> >
> > All of them. Control too.
> >
> > > > Will you be able to test patches?
> > >
> > > I only can test the USBTMC function in some different PCs. I do not
> > > have automated regression tests for USB drivers or Linux kernels.
> > > Maybe there is company who could do that.
> >
> > Well then, if I do find time to write a patch, I'll ask you to try it out with the usbtmc
> > driver.
>
> You mean that you will do a patch in urb.c or a host driver? Or just add a line in usbtmc.c?
> Anyhow there is no hurry. On May 20 I will send you a mail if I'm able to
> provoke one of these hardware errors EPROTO, EILSQ, or ETIME. Otherwise
> it doesn't make sense to test it.
>
> -Guido
EPROTO is a link level issue and needs to be handled by the host driver.
When the host driver detects a protocol error while processing an URB
it completes the URB with EPROTO status and marks the endpoint as
halted.
When the class driver resubmits the URB and the host driver
finds the endpoint still marked as halted, it should return EPIPE
status on the resubmitted URB.
When the class driver, and usbtmc in particular, receives an URB with
EPIPE status, it cleans up and does not resubmit.
Can someone from syzbot land please confirm whether usbtmc running on
the xhci host driver causes an RCU stall to be detected?
-Dave
On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> On Thu, 6 May 2021 at 22:31, Guido Kiener
> <[email protected]> wrote:
> >
> > > -----Original Message-----
> > > From: Alan Stern
> > > Sent: Thursday, May 6, 2021 8:32 PM
> > > To: Kiener Guido 14DS1
> > >
> > > On Thu, May 06, 2021 at 05:44:55PM +0000, Guido Kiener wrote:
> > > > > -----Original Message-----
> > > > > From: Alan Stern
> > > > > Sent: Thursday, May 6, 2021 3:49 PM
> > > > > To: Kiener Guido 14DS1 <[email protected]>
> > > > > >
> > > > > > Thanks for your assessment. I agree with the general feeling. I
> > > > > > counted about hundred specific usb drivers, so wouldn't it be
> > > > > > better to fix the
> > > > > problem in some of the host drivers (e.g. urb.c)?
> > > > > > We could return an error when calling usb_submit_urb() on an erroneous
> > > pipe.
> > > > > > I cannot estimate the side effects and we need to check all
> > > > > > drivers again how they deal with the error situation. Maybe there
> > > > > > are some special driver
> > > > > that need a specialized error handling.
> > > > > > In this case these drivers could reset the (new?) error flag to
> > > > > > allow calling usb_submit_urb() again without error. This could work, isn't it?
> > > > >
> > > > > That is feasible, although it would be an awkward approach. As you
> > > > > said, the side effects aren't clear. But it might work.
> > > >
> > > > Otherwise I see only the other approach to change hundred drivers and
> > > > add the cases EPROTO, EILSEQ and ETIME in each callback handler. The
> > > > usbtmc driver already respects the EILSEQ and ETIME, and only EPROTO is
> > > missing.
> > > > The rest should be more a management task.
> > > > BTW do you assume it is only a problem for INT pipes or is it also a
> > > > problem for isochronous and bulk transfers?
> > >
> > > All of them. Control too.
> > >
> > > > > Will you be able to test patches?
> > > >
> > > > I only can test the USBTMC function in some different PCs. I do not
> > > > have automated regression tests for USB drivers or Linux kernels.
> > > > Maybe there is company who could do that.
> > >
> > > Well then, if I do find time to write a patch, I'll ask you to try it out with the usbtmc
> > > driver.
> >
> > You mean that you will do a patch in urb.c or a host driver? Or just add a line in usbtmc.c?
> > Anyhow there is no hurry. On May 20 I will send you a mail if I'm able to
> > provoke one of these hardware errors EPROTO, EILSQ, or ETIME. Otherwise
> > it doesn't make sense to test it.
> >
> > -Guido
>
> EPROTO is a link level issue and needs to be handled by the host driver.
Are you referring to the host controller driver, or to the class device
driver running on the host? The host controller driver is responsible
for creating the -EPROTO error code in the first place. The class
device driver is responsible for taking an appropriate action in
response.
> When the host driver detects a protocol error while processing an URB
> it completes the URB with EPROTO status and marks the endpoint as
> halted.
Not true. It does not mark the endpoint as halted, not unless it
receives a STALL handshake from the device. A STALL is not a protocol
error.
> When the class driver resubmits the URB and the if the host driver
> finds the endpoint still marked as halted it should return EPIPE
> status on the resubmitted URB
Irrelevant.
> When the class driver and usbtmc in particular receives an URB with
> EPIPE status it cleans up and does not resubmit.
> Can someone from syzbot land please confirm whether usbtmc running on
> the xhci host driver causes an RCU stall to be detected ?
That is not an easy thing to test, and syzbot is not capable of testing
it. You would need a USB device which could deliberately be set to
create a protocol error; I don't know of any devices like that.
Alan Stern
On Sat, 8 May 2021 at 16:29, Alan Stern <[email protected]> wrote:
>
> On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > On Thu, 6 May 2021 at 22:31, Guido Kiener
> > <[email protected]> wrote:
> > >
> > > > -----Original Message-----
> > > > From: Alan Stern
> > > > Sent: Thursday, May 6, 2021 8:32 PM
> > > > To: Kiener Guido 14DS1
> > > >
> > > > On Thu, May 06, 2021 at 05:44:55PM +0000, Guido Kiener wrote:
> > > > > > -----Original Message-----
> > > > > > From: Alan Stern
> > > > > > Sent: Thursday, May 6, 2021 3:49 PM
> > > > > > To: Kiener Guido 14DS1 <[email protected]>
> > > > > > >
> > > > > > > Thanks for your assessment. I agree with the general feeling. I
> > > > > > > counted about hundred specific usb drivers, so wouldn't it be
> > > > > > > better to fix the
> > > > > > problem in some of the host drivers (e.g. urb.c)?
> > > > > > > We could return an error when calling usb_submit_urb() on an erroneous
> > > > pipe.
> > > > > > > I cannot estimate the side effects and we need to check all
> > > > > > > drivers again how they deal with the error situation. Maybe there
> > > > > > > are some special driver
> > > > > > that need a specialized error handling.
> > > > > > > In this case these drivers could reset the (new?) error flag to
> > > > > > > allow calling usb_submit_urb() again without error. This could work, isn't it?
> > > > > >
> > > > > > That is feasible, although it would be an awkward approach. As you
> > > > > > said, the side effects aren't clear. But it might work.
> > > > >
> > > > > Otherwise I see only the other approach to change hundred drivers and
> > > > > add the cases EPROTO, EILSEQ and ETIME in each callback handler. The
> > > > > usbtmc driver already respects the EILSEQ and ETIME, and only EPROTO is
> > > > missing.
> > > > > The rest should be more a management task.
> > > > > BTW do you assume it is only a problem for INT pipes or is it also a
> > > > > problem for isochronous and bulk transfers?
> > > >
> > > > All of them. Control too.
> > > >
> > > > > > Will you be able to test patches?
> > > > >
> > > > > I only can test the USBTMC function in some different PCs. I do not
> > > > > have automated regression tests for USB drivers or Linux kernels.
> > > > > Maybe there is company who could do that.
> > > >
> > > > Well then, if I do find time to write a patch, I'll ask you to try it out with the usbtmc
> > > > driver.
> > >
> > > You mean that you will do a patch in urb.c or a host driver? Or just add a line in usbtmc.c?
> > > Anyhow there is no hurry. On May 20 I will send you a mail if I'm able to
> > > provoke one of these hardware errors EPROTO, EILSQ, or ETIME. Otherwise
> > > it doesn't make sense to test it.
> > >
> > > -Guido
> >
> > EPROTO is a link level issue and needs to be handled by the host driver.
>
> Are you referring to the host controller driver, or to the class device
> driver running on the host? The host controller driver is responsible
> for creating the -EPROTO error code in the first place. The class
> device driver is responsible for taking an appropriate action in
> response.
host controller driver
>
> > When the host driver detects a protocol error while processing an URB
> > it completes the URB with EPROTO status and marks the endpoint as
> > halted.
>
> Not true. It does not mark the endpoint as halted, not unless it
> receives a STALL handshake from the device. A STALL is not a protocol
> error.
>
> > When the class driver resubmits the URB and the if the host driver
> > finds the endpoint still marked as halted it should return EPIPE
> > status on the resubmitted URB
>
> Irrelevant.
Not at all. The point is that when an application is talking to an
instrument over the usbtmc driver, the underlying host controller and
its driver will detect and silence a babbling endpoint.
Hence no EPROTO loop will ensue in this case and therefore no changes
are needed in usbtmc.
>
> > When the class driver and usbtmc in particular receives an URB with
> > EPIPE status it cleans up and does not resubmit.
> > Can someone from syzbot land please confirm whether usbtmc running on
> > the xhci host driver causes an RCU stall to be detected ?
>
> That is not an easy thing to test, and syzbot is not capable of testing
> it. You would need a USB device which could deliberately be set to
> create a protocol error; I don't know of any devices like that.
>
> Alan Stern
On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> On Sat, 8 May 2021 at 16:29, Alan Stern <[email protected]> wrote:
> >
> > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > When the host driver detects a protocol error while processing an URB
> > > it completes the URB with EPROTO status and marks the endpoint as
> > > halted.
> >
> > Not true. It does not mark the endpoint as halted, not unless it
> > receives a STALL handshake from the device. A STALL is not a protocol
> > error.
> >
> > > When the class driver resubmits the URB and the if the host driver
> > > finds the endpoint still marked as halted it should return EPIPE
> > > status on the resubmitted URB
> >
> > Irrelevant.
> Not at all. The point is that when an application is talking to an
> instrument over the usbtmc driver, the underlying host controller and
> its driver will detect and silence a babbling endpoint.
No, they won't. That is, they will detect a babble error and return an
error status, but they won't silence the endpoint. What makes you think
they will?
> Hence no EPROTO loop will ensue in this case and therefore no changes
> are needed in usbtmc.
Since this conclusion relies on the incorrect assumption above, it also
is incorrect.
Alan Stern
> On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> > On Sat, 8 May 2021 at 16:29, Alan Stern <[email protected]> wrote:
> > >
> > > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > > When the host driver detects a protocol error while processing an
> > > > URB it completes the URB with EPROTO status and marks the endpoint
> > > > as halted.
> > >
> > > Not true. It does not mark the endpoint as halted, not unless it
> > > receives a STALL handshake from the device. A STALL is not a
> > > protocol error.
> > >
> > > > When the class driver resubmits the URB and the if the host driver
> > > > finds the endpoint still marked as halted it should return EPIPE
> > > > status on the resubmitted URB
> > >
> > > Irrelevant.
> > Not at all. The point is that when an application is talking to an
> > instrument over the usbtmc driver, the underlying host controller and
> > its driver will detect and silence a babbling endpoint.
>
> No, they won't. That is, they will detect a babble error and return an error status, but
> they won't silence the endpoint. What makes you think they will?
Maybe there is a misunderstanding. I guess that Dave wanted to propose:
"EPROTO is a link level issue and needs to be handled by the host driver.
When the host driver detects a protocol error while processing an
URB it SHOULD complete the URB with EPROTO status and SHOULD mark the endpoint
as halted."
Is this a realistic fix for all host drivers?
-Guido
On Wed, 19 May 2021, Guido Kiener wrote:
> > On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> > > On Sat, 8 May 2021 at 16:29, Alan Stern <[email protected]> wrote:
> > > >
> > > > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > > > When the host driver detects a protocol error while processing an
> > > > > URB it completes the URB with EPROTO status and marks the endpoint
> > > > > as halted.
> > > >
> > > > Not true. It does not mark the endpoint as halted, not unless it
> > > > receives a STALL handshake from the device. A STALL is not a
> > > > protocol error.
> > > >
> > > > > When the class driver resubmits the URB and the if the host driver
> > > > > finds the endpoint still marked as halted it should return EPIPE
> > > > > status on the resubmitted URB
> > > >
> > > > Irrelevant.
> > > Not at all. The point is that when an application is talking to an
> > > instrument over the usbtmc driver, the underlying host controller and
> > > its driver will detect and silence a babbling endpoint.
> >
> > No, they won't. That is, they will detect a babble error and return an error status, but
> > they won't silence the endpoint. What makes you think they will?
>
> Maybe there is a misunderstanding. I guess that Dave wanted to propose:
> "EPROTO is a link level issue and needs to be handled by the host driver.
> When the host driver detects a protocol error while processing an
> URB it SHOULD complete the URB with EPROTO status and SHOULD mark the endpoint
> as halted."
> Is this a realistic fix for all host drivers?
>
> -Guido
Guido, would you mind taking a look at your mailer settings please? I
now have >=7 threads running through my inbox with the same subject.
For some reason your mailer is insisting on creating a new one for
each of your replies.
It's also adding odd "re: re: re: ..." prefixes.
TIA
--
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
On Wed, May 19, 2021 at 04:14:20PM +0000, Guido Kiener wrote:
> > On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
> > > On Sat, 8 May 2021 at 16:29, Alan Stern <[email protected]> wrote:
> > > >
> > > > On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
> > > > > When the host driver detects a protocol error while processing an
> > > > > URB it completes the URB with EPROTO status and marks the endpoint
> > > > > as halted.
> > > >
> > > > Not true. It does not mark the endpoint as halted, not unless it
> > > > receives a STALL handshake from the device. A STALL is not a
> > > > protocol error.
> > > >
> > > > > When the class driver resubmits the URB and the if the host driver
> > > > > finds the endpoint still marked as halted it should return EPIPE
> > > > > status on the resubmitted URB
> > > >
> > > > Irrelevant.
> > > Not at all. The point is that when an application is talking to an
> > > instrument over the usbtmc driver, the underlying host controller and
> > > its driver will detect and silence a babbling endpoint.
> >
> > No, they won't. That is, they will detect a babble error and return an error status, but
> > they won't silence the endpoint. What makes you think they will?
>
> Maybe there is a misunderstanding. I guess that Dave wanted to propose:
> "EPROTO is a link level issue and needs to be handled by the host driver.
> When the host driver detects a protocol error while processing an
> URB it SHOULD complete the URB with EPROTO status
The host controller drivers _do_ complete URBs with -EPROTO (or similar)
status when a link-level error occurs...
> and SHOULD mark the endpoint
> as halted."
but they don't mark the endpoint as halted. Even if they did, it
wouldn't fix anything because the kernel allows URBs to be submitted to
halted endpoints. In fact, it doesn't even keep track of which
endpoints are or are not halted.
> Is this a realistic fix for all host drivers?
No, it isn't.
An endpoint shouldn't be marked as halted unless it really is halted.
Otherwise a driver might be tempted to clear the Halt feature, and
some devices do not like to receive a Clear-Halt request for an endpoint
that isn't halted.
What we could do is what you suggested earlier: Note the fact that the
endpoint is in some sort of fault condition and disallow further
communication with the endpoint until the fault condition has been
cleared. (It isn't entirely obvious exactly what actions should clear
such a fault... I guess resetting or re-enabling the endpoint, or
resetting the entire device.)
Alan Stern
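The fault-tracking idea above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct ep_state and the ep_*() helpers are hypothetical names, the kernel keeps no such per-endpoint flag today, and returning -EPROTO from the submit path is an arbitrary choice made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical host-side bookkeeping; the kernel has no such field. */
struct ep_state {
	bool faulted;	/* set after a link-level error, cleared by a reset */
};

/* Record the outcome of a completed transfer on this endpoint. */
void ep_note_completion(struct ep_state *ep, int status)
{
	assert(ep != NULL);
	if (status == -EPROTO || status == -EILSEQ || status == -ETIME)
		ep->faulted = true;
}

/* Refuse new submissions while the fault condition is outstanding. */
int ep_submit(const struct ep_state *ep)
{
	assert(ep != NULL);
	return ep->faulted ? -EPROTO : 0;	/* error code is an assumption */
}

/* Models an endpoint reset or re-enable, or a reset of the whole device. */
void ep_reset(struct ep_state *ep)
{
	assert(ep != NULL);
	ep->faulted = false;
}
```

With a flag like this, a class driver that blindly resubmits would get an immediate error from the submit path rather than another completion callback, so the loop would end without touching any of the hundred or so class drivers.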
Alan Stern wrote:
> On Wed, May 19, 2021 at 04:14:20PM +0000, Guido Kiener wrote:
>>> On Wed, May 19, 2021 at 10:48:29AM +0200, dave penkler wrote:
>>>> On Sat, 8 May 2021 at 16:29, Alan Stern <[email protected]> wrote:
>>>>>
>>>>> On Sat, May 08, 2021 at 10:14:41AM +0200, dave penkler wrote:
>>>>>> When the host driver detects a protocol error while processing an
>>>>>> URB it completes the URB with EPROTO status and marks the endpoint
>>>>>> as halted.
>>>>>
>>>>> Not true. It does not mark the endpoint as halted, not unless it
>>>>> receives a STALL handshake from the device. A STALL is not a
>>>>> protocol error.
>>>>>
>>>>>> When the class driver resubmits the URB and the if the host driver
>>>>>> finds the endpoint still marked as halted it should return EPIPE
>>>>>> status on the resubmitted URB
>>>>>
>>>>> Irrelevant.
>>>> Not at all. The point is that when an application is talking to an
>>>> instrument over the usbtmc driver, the underlying host controller and
>>>> its driver will detect and silence a babbling endpoint.
>>>
>>> No, they won't. That is, they will detect a babble error and return an error status, but
>>> they won't silence the endpoint. What makes you think they will?
>>
>> Maybe there is a misunderstanding. I guess that Dave wanted to propose:
>> "EPROTO is a link level issue and needs to be handled by the host driver.
>> When the host driver detects a protocol error while processing an
>> URB it SHOULD complete the URB with EPROTO status
>
> The host controller drivers _do_ complete URBs with -EPROTO (or similar)
> status when a link-level error occurs...
>
>> and SHOULD mark the endpoint
>> as halted."
>
> but they don't mark the endpoint as halted. Even if they did, it
> wouldn't fix anything because the kernel allows URBs to be submitted to
> halted endpoints. In fact, it doesn't even keep track of which
> endpoints are or are not halted.
>
>> Is this a realistic fix for all host drivers?
>
> No, it isn't.
>
> An endpoint shouldn't be marked as halted unless it really is halted.
> Otherwise a driver might be tempted to clear the Halt feature, and
> some devices do not like to receive a Clear-Halt request for an endpoint
> that isn't halted.
>
> What we could do is what you suggested earlier: Note the fact that the
> endpoint is in some sort of fault condition and disallow further
> communication with the endpoint until the fault condition has been
> cleared. (It isn't entirely obvious exactly what actions should clear
> such a fault... I guess resetting or re-enabling the endpoint, or
> resetting the entire device.)
>
> Alan Stern
>
Hi Alan,
Sorry if this diverges from the thread, but I've been wondering whether
to add a change for this also.
For xHCI hosts, after transaction errors, the endpoint will enter the
halted state. The driver will attempt a few soft retries before giving
up. According to the xHCI spec (section 4.6.8), a host may send a
ClearFeature(endpoint_halt) to recover and restart the transfer (see
"reset a pipe" in xhci spec), and the class driver can handle this after
receiving something like -EPROTO from xhci.
However, as you've pointed out, some devices don't like
ClearFeature(ep_halt) and may not properly synchronize with the host on
where it should restart.
Some OSes (such as Windows) do this. Not sure if we also want this?
Currently the recovery is just a timeout and a port reset from the class
driver, but the timeout usually defaults to a long time (e.g. 30
seconds for the storage class driver).
Thanks,
Thinh
On Wed, May 19, 2021 at 07:38:52PM +0000, Thinh Nguyen wrote:
> Hi Alan,
>
> Sorry if this diverges from the thread, but I've been wondering whether
> to add a change for this also.
>
> For xHCI hosts, after transactions errors, the endpoint will enter
> halted state.
No. You are misreading the xHCI spec. Section 4.6.8 says:
... the state of the associated Endpoint Context is set to
Halted...
Note this carefully. It says "Endpoint Context", not "endpoint".
The endpoint is part of the device, whereas the endpoint context is part
of the host controller. The device doesn't know when a transaction
error has occurred; consequently such errors do not affect the endpoint.
The host controller does know, and consequently such errors do affect
the endpoint context.
> The driver will attempt a few soft-retries before giving
> up. According to the xHCI spec (section 4.6.8), a host may send a
> ClearFeature(endpoint_halt) to recover and restart the transfer (see
Not quite. The section of the spec you're talking about says:
Software shall execute the following sequence to “reset a
pipe”.... Issue a ClearFeature(ENDPOINT_HALT) request to
device.
It does not say the host controller will do this; it says that software
will do it.
> "reset a pipe" in xhci spec), and the class driver can handle this after
> receiving something like -EPROTO from xhci.
>
> However, as you've pointed out, some devices don't like
> ClearFeature(ep_halt) and may not properly synchronize with the host on
> where it should restart.
>
> Some OS (such as Windows) do this. Not sure if we also want this?
In general we should do the same thing as Windows does, because most
hardware designers test their equipment on Windows systems but
relatively few test on Linux systems.
> Currently the recovery is just a timeout and a port reset from the class
This depends on the driver. Some perform no recovery at all.
> driver, but the timeout is usually defaulted to a long time (e.g. 30
> seconds for storage class driver).
That 30-second timeout in the mass-storage driver applies in situations
where a command fails to complete, not in situations where it completes
quickly but with a -EPROTO or -EPIPE error.
The fact is that only a small percentage of -EPROTO errors are
recoverable. Some of them can be handled by a port reset, which can be
pretty awkward to perform but does occasionally work. A lot of them
occur because the USB cable has been unplugged; obviously there's no way
to recover from that. With only a few exceptions, the best and simplest
approach is not to try to recover at all.
For the case in question (the syzbot bug report that started this
thread), the class driver doesn't try to perform any recovery. It just
resubmits the URB, getting into a tight retry loop which consumes too
much CPU time. Simply giving up would be preferable.
Alan Stern
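A minimal userspace sketch of the "give up instead of resubmitting" policy described above. The helper names are hypothetical and this is not the usbtmc code; the set of statuses treated as fatal is an assumption based on the codes named in this thread (-EPROTO is the -71 seen in the usbtmc log lines).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper, not a kernel API: should a completion handler
 * resubmit its URB after it completed with this status? */
bool should_resubmit(int status)
{
	switch (status) {
	case 0:			/* success: keep the transfer going */
		return true;
	case -EPROTO:		/* link-level errors named in the thread */
	case -EILSEQ:
	case -ETIME:
	case -EPIPE:		/* endpoint stalled */
	case -ENOENT:		/* URB was unlinked */
	case -ECONNRESET:
	case -ESHUTDOWN:	/* device or host controller going away */
		return false;
	default:		/* assumption: give up on anything unknown */
		return false;
	}
}

/* Count how many times an URB would be submitted if every completion
 * returned the same status. 'limit' only bounds the simulation. */
int count_submissions(int status, int limit)
{
	int submitted = 0;

	assert(limit > 0);
	do {
		submitted++;
	} while (should_resubmit(status) && submitted < limit);
	return submitted;
}
```

With this policy a persistent -EPROTO costs one submission. A handler that resubmits on every unknown status would instead run until something outside the driver stops it, which is the tight loop behind the RCU stall report.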
+Mathias
Alan Stern wrote:
> On Wed, May 19, 2021 at 07:38:52PM +0000, Thinh Nguyen wrote:
>> Hi Alan,
>>
>> Sorry if this diverges from the thread, but I've been wondering whether
>> to add a change for this also.
>>
>> For xHCI hosts, after transactions errors, the endpoint will enter
>> halted state.
>
> No. You are misreading the xHCI spec. Section 4.6.8 says:
>
> ... the state of the associated Endpoint Context is set to
> Halted...
>
> Note this carefully. It says "Endpoint Context", not "endpoint".
>
> The endpoint is part of the device, whereas the endpoint context is part
> of the host controller. The device doesn't know when a transaction
> error has occurred; consequently such errors do not affect the endpoint.
> The host controller does know, and consequently such errors do affect
> the endpoint context.
>
You're right, my mistake here.
>> The driver will attempt a few soft-retries before giving
>> up. According to the xHCI spec (section 4.6.8), a host may send a
>> ClearFeature(endpoint_halt) to recover and restart the transfer (see
>
> Not quite. The section of the spec you're talking about says:
>
> Software shall execute the following sequence to “reset a
> pipe”.... Issue a ClearFeature(ENDPOINT_HALT) request to
> device.
>
> It does not say the host controller will do this; it says that software
> will do it.
Sorry for being unclear. I meant from the class driver, see my next
sentence.
>
>> "reset a pipe" in xhci spec), and the class driver can handle this after
>> receiving something like -EPROTO from xhci.
>>
>> However, as you've pointed out, some devices don't like
>> ClearFeature(ep_halt) and may not properly synchronize with the host on
>> where it should restart.
>>
>> Some OS (such as Windows) do this. Not sure if we also want this?
>
> In general we should do the same thing as Windows does, because most
> hardware designers test their equipment on Windows systems but
> relatively few test on Linux systems.
>
>> Currently the recovery is just a timeout and a port reset from the class
>
> This depends on the driver. Some perform no recovery at all.
>
>> driver, but the timeout is usually defaulted to a long time (e.g. 30
>> seconds for storage class driver).
>
> That 30-second timeout in the mass-storage driver applies in situations
> where a command fails to complete, not in situations where it completes
> quickly but with a -EPROTO or -EPIPE error.
Hm... looks like we have a couple of issues in the uas storage class
driver and the xhci driver.
We may need to fix that in the uas storage driver because it doesn't
seem to handle it. (check uas_data_cmplt() in uas.c).
As for the xhci driver, there may be a case where the stream URB never
gets to complete because the transaction err_count is not properly
updated. The err_count for transaction errors is stored in ep_ring, but
the xhci driver may not be able to look up the correct ep_ring based on
the TRB address for streams. There are cases for streams where the event
TRBs have their TRB pointer field cleared to '0' (xhci spec section
4.12.2). If the xhci driver doesn't find an ep_ring for a transaction
error, it automatically does a soft retry. We saw this in one of our
tests: the driver kept doing soft retries until the class driver timed
out.
Hi Mathias, maybe you have some comment on this? Thanks.
>
> The fact is that only a small percentage of -EPROTO errors are
> recoverable. Some of them can be handled by a port reset, which can be
> pretty awkward to perform but does occasionally work. A lot of them
> occur because the USB cable has been unplugged; obviously there's no way
> to recover from that. With only a few exceptions, the best and simplest
> approach is not to try to recover at all.
If the cable is unplugged, then we should get a connection change event
and the driver can handle it properly.
Yes, it's probably simplest to do a port reset and let the transfer be
incomplete/corrupted. However, I think we should give
ClearFeature(ep_halt) some more thought, as it could be a recovery
mechanism for the storage class driver, even though it may not be
foolproof.
>
> For the case in question (the syzbot bug report that started this
> thread), the class driver doesn't try to perform any recovery. It just
> resubmits the URB, getting into a tight retry loop which consumes too
> much CPU time. Simply giving up would be preferable.
>
> Alan Stern
>
I see. By giving up, you mean doing a port reset, right? Otherwise it needs
some other mechanism to synchronize with the device side.
Thanks,
Thinh
On 20.5.2021 23.30, Thinh Nguyen wrote:
> +Mathias
>
...
> Hm... looks like we have a couple of issues in the uas storage class
> driver and the xhci driver.
>
> We may need to fix that in the uas storage driver because it doesn't
> seem to handle it. (check uas_data_cmplt() in uas.c).
>
> As for the xhci driver, there maybe a case where the stream URB never
> gets to complete because the transaction err_count is not properly
> updated. The err_count for transaction error is stored in ep_ring, but
> the xhci driver may not be able to lookup the correct ep_ring based on
> TRB address for streams. There are cases for streams where the event
> TRBs have their TRB pointer field cleared to '0' (xhci spec section
> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> it automatically does a soft-retry. This is seen from one of our
> testings that the driver was repeatedly doing soft-retry until the class
> driver timed out.
>
> Hi Mathias, maybe you have some comment on this? Thanks.
This is true: if the TRB pointer is 0, there is no retry limit for soft retry.
We should add one to prevent a loop. After a few soft resets we can end with a
hard reset to clear the host-side endpoint halt.
We don't know which URB was being transferred during the error, so we can't
give it back with a proper error code.
In that sense we still end up waiting for a timeout and for someone to cancel
the URB.
-Mathias
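A rough userspace sketch of the retry limit proposed above, not the actual xhci code. It assumes the error counter is kept per endpoint rather than per ring, so it can still be updated when the event's TRB pointer is 0 and no ring can be looked up. struct ep_err, on_transaction_error() and the limit of 3 are illustrative names and values.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOFT_RETRY 3	/* illustrative limit, an assumption */

enum recovery { SOFT_RETRY, HARD_RESET };

/* Hypothetical per-endpoint error counter. */
struct ep_err {
	unsigned int err_count;
};

/* Pick the recovery step for one transaction error on this endpoint:
 * soft retry until the limit is reached, then fall back to a hard
 * reset and start counting again. */
enum recovery on_transaction_error(struct ep_err *ep)
{
	assert(ep != NULL);
	if (ep->err_count < MAX_SOFT_RETRY) {
		ep->err_count++;
		return SOFT_RETRY;
	}
	ep->err_count = 0;
	return HARD_RESET;
}
```

The sketch leaves out when the counter is cleared after a successful transfer, and it does not address which URB to give back once the hard reset has been issued.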
On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
> On 20.5.2021 23.30, Thinh Nguyen wrote:
> > As for the xhci driver, there maybe a case where the stream URB never
> > gets to complete because the transaction err_count is not properly
> > updated. The err_count for transaction error is stored in ep_ring, but
> > the xhci driver may not be able to lookup the correct ep_ring based on
> > TRB address for streams. There are cases for streams where the event
> > TRBs have their TRB pointer field cleared to '0' (xhci spec section
> > 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> > it automatically does a soft-retry. This is seen from one of our
> > testings that the driver was repeatedly doing soft-retry until the class
> > driver timed out.
> >
> > Hi Mathias, maybe you have some comment on this? Thanks.
>
> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
> We should add one and prevent a loop. after e few soft resets we can end with a
> hard reset to clear the host side endpoint halt.
>
> We don't know the URB that was being tansferred during the error, and can't
> give it back with a proper error code.
> In that sense we still end up waiting for a timeout and someone to cancel
> the urb.
That's not good. There may not be a timeout; drivers expect transfers
to complete with a failure, not to be retried indefinitely.
However, if you do know which endpoint/stream the error is connected to,
you should be able to get the URB. It will be the first one queued for
that endpoint/stream.
Alan Stern
Alan Stern wrote:
> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>> As for the xhci driver, there maybe a case where the stream URB never
>>> gets to complete because the transaction err_count is not properly
>>> updated. The err_count for transaction error is stored in ep_ring, but
>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>> TRB address for streams. There are cases for streams where the event
>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>> it automatically does a soft-retry. This is seen from one of our
>>> testings that the driver was repeatedly doing soft-retry until the class
>>> driver timed out.
>>>
>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>
>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>> We should add one and prevent a loop. after e few soft resets we can end with a
>> hard reset to clear the host side endpoint halt.
>>
>> We don't know the URB that was being tansferred during the error, and can't
>> give it back with a proper error code.
>> In that sense we still end up waiting for a timeout and someone to cancel
>> the urb.
>
> That's not good. There may not be a timeout; drivers expect transfers
> to complete with a failure, not to be retried indefinitely.
>
> However, if you do know which endpoint/stream the error is connected to,
> you should be able to get the URB. It will be the first one queued for
> that endpoint/stream.
>
When the xhci can't recover a transfer with soft-retry, no outstanding
transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
we just don't know which stream or endpoint ring it's for, but we know
all the outstanding URBs of an endpoint. We may as well return an
error status for all of them after a limited number of soft-retries.
BR,
Thinh
On 24.5.2021 22.23, Thinh Nguyen wrote:
> Alan Stern wrote:
>> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>>> As for the xhci driver, there maybe a case where the stream URB never
>>>> gets to complete because the transaction err_count is not properly
>>>> updated. The err_count for transaction error is stored in ep_ring, but
>>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>>> TRB address for streams. There are cases for streams where the event
>>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>>> it automatically does a soft-retry. This is seen from one of our
>>>> testings that the driver was repeatedly doing soft-retry until the class
>>>> driver timed out.
>>>>
>>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>>
>>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>>> We should add one and prevent a loop. after e few soft resets we can end with a
>>> hard reset to clear the host side endpoint halt.
>>>
>>> We don't know the URB that was being tansferred during the error, and can't
>>> give it back with a proper error code.
>>> In that sense we still end up waiting for a timeout and someone to cancel
>>> the urb.
>>
>> That's not good. There may not be a timeout; drivers expect transfers
>> to complete with a failure, not to be retried indefinitely.
>>
>> However, if you do know which endpoint/stream the error is connected to,
>> you should be able to get the URB. It will be the first one queued for
>> that endpoint/stream.
>>
>
> When the xhci can't recover a transfer with soft-retry, no outstanding
> transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
> we just don't know which stream or endpoint ring it's for, but we know
> all the outstanding URBs of an endpoint. Let's may as well return an
> error status for all of them after a limited number of soft-retries.
We get the endpoint, but not the stream.
I guess we could walk through each stream of this endpoint, and return the
first URB of every stream that has a pending URB.
The xHCI spec claims to support 65533 streams per endpoint, but in real life
UAS probably only uses a few per endpoint?
-Mathias
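The stream walk suggested above might look roughly like this in userspace C. The types are stand-ins invented for the sketch (struct fake_urb, struct fake_stream), not the xhci driver's data structures, and each stream's pending URBs are modelled as a singly linked queue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in types for the sketch, not the xhci driver's structures. */
struct fake_urb {
	int status;
	struct fake_urb *next;
};

struct fake_stream {
	struct fake_urb *head;	/* first pending URB, or NULL if idle */
};

/* For every stream that has a pending URB, dequeue the first one and
 * complete it with 'status'. Returns how many URBs were given back. */
size_t fail_first_urb_per_stream(struct fake_stream *streams, size_t n,
				 int status)
{
	size_t given_back = 0;

	assert(streams != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct fake_urb *urb = streams[i].head;

		if (!urb)
			continue;	/* nothing queued on this stream */
		streams[i].head = urb->next;
		urb->next = NULL;
		urb->status = status;
		given_back++;
	}
	return given_back;
}
```

The cost is linear in the number of streams configured for the endpoint, so it stays small if UAS really only uses a few.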
Mathias Nyman wrote:
> On 24.5.2021 22.23, Thinh Nguyen wrote:
>> Alan Stern wrote:
>>> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>>>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>>>> As for the xhci driver, there maybe a case where the stream URB never
>>>>> gets to complete because the transaction err_count is not properly
>>>>> updated. The err_count for transaction error is stored in ep_ring, but
>>>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>>>> TRB address for streams. There are cases for streams where the event
>>>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>>>> it automatically does a soft-retry. This is seen from one of our
>>>>> testings that the driver was repeatedly doing soft-retry until the class
>>>>> driver timed out.
>>>>>
>>>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>>>
>>>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>>>> We should add one and prevent a loop. after e few soft resets we can end with a
>>>> hard reset to clear the host side endpoint halt.
>>>>
>>>> We don't know the URB that was being tansferred during the error, and can't
>>>> give it back with a proper error code.
>>>> In that sense we still end up waiting for a timeout and someone to cancel
>>>> the urb.
>>>
>>> That's not good. There may not be a timeout; drivers expect transfers
>>> to complete with a failure, not to be retried indefinitely.
>>>
>>> However, if you do know which endpoint/stream the error is connected to,
>>> you should be able to get the URB. It will be the first one queued for
>>> that endpoint/stream.
>>>
>>
>> When the xhci can't recover a transfer with soft-retry, no outstanding
>> transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
>> we just don't know which stream or endpoint ring it's for, but we know
>> all the outstanding URBs of an endpoint. Let's may as well return an
>> error status for all of them after a limited number of soft-retries.
>
> We get the endpoint, but not the stream.
Right.
>
> I guess we could walk through each stream of this endpoint, and return the
> first URB of every stream that has a pending URB.
> xHCI spec claims to supports 65533 streams per endpoint, but in real life
> UAS probably only uses a few per endpoint?
>
> -Mathias
>
Typically, UASP devices advertise support for up to 32 streams. We noticed
that some newer builds of Windows reject any device whose descriptor
advertises more or fewer than 32 streams (probably a bug, though it may be
intentional).
I think we only need to do this if we don't know which stream the event
belongs to. Otherwise, we can keep the old logic.
BR,
Thinh
syzbot has found a reproducer for the following issue on:
HEAD commit: 625acffd Merge tag 's390-5.13-5' of git://git.kernel.org/p..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=101a20fbd00000
kernel config: https://syzkaller.appspot.com/x/.config?x=279de9012e194ee1
dashboard link: https://syzkaller.appspot.com/bug?extid=e2eae5639e7203360018
compiler: Debian clang version 11.0.1-2
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=17215fb8300000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
#3: ffffffff8d5c8818 (kbd_event_lock){..-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
#3: ffffffff8d5c8818 (kbd_event_lock){..-.}-{2:2}, at: kbd_event+0x97/0x3c00 drivers/tty/vt/keyboard.c:1525
#4: ffffffff8cf15d00 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x30 arch/x86/pci/mmconfig_64.c:151
=============================================
keytouch 0003:0926:3333.00B5: can't resubmit intr, dummy_hcd.4-1/input0, status -19
keytouch 0003:0926:3333.00B5: usb_submit_urb(ctrl) failed: -19
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-...!: (2 ticks this GP) idle=d92/1/0x4000000000000000 softirq=25390/25392 fqs=3
(t=12164 jiffies g=31645 q=43226)
rcu: rcu_preempt kthread starved for 12162 jiffies! g31645 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack:26384 pid: 14 ppid: 2 flags:0x00004000
Call Trace:
context_switch kernel/sched/core.c:4339 [inline]
__schedule+0xb98/0x1120 kernel/sched/core.c:5147
schedule+0x14b/0x200 kernel/sched/core.c:5226
schedule_timeout+0x1aa/0x2c0 kernel/time/timer.c:1892
rcu_gp_fqs_loop kernel/rcu/tree.c:2004 [inline]
rcu_gp_kthread+0x112d/0x2190 kernel/rcu/tree.c:2177
kthread+0x39a/0x3c0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 3234 Comm: aoe_tx0 Not tainted 5.13.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:mark_lock+0x208/0x1eb0 kernel/locking/lockdep.c:4501
Code: ff 8f 49 83 c6 50 4c 89 f0 48 c1 e8 03 42 80 3c 38 00 74 08 4c 89 f7 e8 56 5d 65 00 bb 01 00 00 00 45 85 2e 0f 84 b0 00 00 00 <48> c7 44 24 60 0e 36 e0 45 43 c7 04 27 00 00 00 00 4b c7 44 27 14
RSP: 0018:ffffc90000007580 EFLAGS: 00000002
RAX: 1ffffffff1fff3c2 RBX: 0000000000000001 RCX: ffffffff8161dad9
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff9026fd88
RBP: ffffc90000007810 R08: dffffc0000000000 R09: fffffbfff204dfb2
R10: fffffbfff204dfb2 R11: 0000000000000000 R12: 1ffff92000000ebc
R13: 0000000000000002 R14: ffffffff8fff9e10 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f00730d2b00 CR3: 00000000141fd000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
mark_usage kernel/locking/lockdep.c:4365 [inline]
__lock_acquire+0xb66/0x6040 kernel/locking/lockdep.c:4858
lock_acquire+0x182/0x4a0 kernel/locking/lockdep.c:5514
seqcount_lockdep_reader_access+0xe5/0x200 include/linux/seqlock.h:103
ktime_get+0x35/0x2b0 kernel/time/timekeeping.c:827
clockevents_program_event+0xe4/0x320 kernel/time/clockevents.c:326
hrtimer_interrupt+0xbaa/0x1040 kernel/time/hrtimer.c:1676
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x8c/0xb0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:647
RIP: 0010:console_trylock_spinning+0x31b/0x3a0 kernel/printk/printk.c:1894
Code: 08 4d 85 ed 74 91 e8 94 c2 19 00 fb 31 db eb 41 e8 8a c2 19 00 e8 95 74 5b 08 4d 85 ed 74 d1 e8 7b c2 19 00 fb bb 01 00 00 00 <48> c7 c7 00 22 df 8c 31 f6 ba 01 00 00 00 31 c9 41 b8 01 00 00 00
RSP: 0018:ffffc9000288f360 EFLAGS: 00000293
RAX: ffffffff81655005 RBX: 0000000000000001 RCX: ffff888021699c40
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc9000288f430 R08: ffffffff81654fc2 R09: fffffbfff204dfcb
R10: fffffbfff204dfcb R11: 0000000000000000 R12: 1ffff92000511e6c
R13: 0000000000000200 R14: 0000000000000086 R15: dffffc0000000000
vprintk_emit+0x201/0x2f0 kernel/printk/printk.c:2173
dev_vprintk_emit+0x2e1/0x355 drivers/base/core.c:4525
dev_printk_emit+0xca/0x109 drivers/base/core.c:4536
__netdev_printk+0x339/0x419 net/core/dev.c:11392
netdev_warn+0x110/0x158 net/core/dev.c:11445
ieee802154_subif_start_xmit+0xbd/0x100 net/mac802154/tx.c:125
__netdev_start_xmit include/linux/netdevice.h:4944 [inline]
netdev_start_xmit include/linux/netdevice.h:4958 [inline]
xmit_one net/core/dev.c:3654 [inline]
dev_hard_start_xmit+0x20b/0x450 net/core/dev.c:3670
sch_direct_xmit+0x2be/0xec0 net/sched/sch_generic.c:336
qdisc_restart net/sched/sch_generic.c:401 [inline]
__qdisc_run+0xa43/0x1c00 net/sched/sch_generic.c:409
qdisc_run include/net/pkt_sched.h:131 [inline]
__dev_xmit_skb net/core/dev.c:3857 [inline]
__dev_queue_xmit+0xedd/0x2fe0 net/core/dev.c:4214
tx+0x6f/0x110 drivers/block/aoe/aoenet.c:63
kthread+0x22d/0x440 drivers/block/aoe/aoecmd.c:1230
kthread+0x39a/0x3c0 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
NMI backtrace for cpu 1
CPU: 1 PID: 17756 Comm: systemd-udevd Not tainted 5.13.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x202/0x31e lib/dump_stack.c:120
nmi_cpu_backtrace+0x16c/0x190 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x191/0x2f0 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x22d/0x390 kernel/rcu/tree_stall.h:341
print_cpu_stall kernel/rcu/tree_stall.h:624 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:699 [inline]
rcu_pending kernel/rcu/tree.c:3911 [inline]
rcu_sched_clock_irq+0x1d0d/0x2a30 kernel/rcu/tree.c:2649
update_process_times+0x197/0x200 kernel/time/timer.c:1796
tick_sched_handle kernel/time/tick-sched.c:226 [inline]
tick_sched_timer+0x27d/0x420 kernel/time/tick-sched.c:1374
__run_hrtimer kernel/time/hrtimer.c:1537 [inline]
__hrtimer_run_queues+0x4cb/0xa60 kernel/time/hrtimer.c:1601
hrtimer_interrupt+0x3b3/0x1040 kernel/time/hrtimer.c:1663
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
__sysvec_apic_timer_interrupt+0xf9/0x270 arch/x86/kernel/apic/apic.c:1106
sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1100
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:647
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xbc/0x120 kernel/locking/spinlock.c:191
Code: f0 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 f7 e8 ba ad 03 f8 f6 44 24 21 02 75 4e 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> ff 62 93 f7 65 8b 05 90 64 3e 76 85 c0 74 3f 48 c7 04 24 0e 36
RSP: 0018:ffffc90000dc0800 EFLAGS: 00000206
RAX: 1ffff920001b8104 RBX: ffff888022268000 RCX: ffffffff8161dad9
RDX: dffffc0000000000 RSI: 0000000000000102 RDI: 0000000000000001
RBP: ffffc90000dc0890 R08: dffffc0000000000 R09: fffffbfff204dfce
R10: fffffbfff204dfce R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff920001b8100 R14: ffffc90000dc0820 R15: 0000000000000a06
dummy_timer+0x3002/0x3100 drivers/usb/gadget/udc/dummy_hcd.c:1987
call_timer_fn+0xf6/0x210 kernel/time/timer.c:1431
expire_timers kernel/time/timer.c:1476 [inline]
__run_timers+0x6ff/0x910 kernel/time/timer.c:1745
run_timer_softirq+0x63/0xf0 kernel/time/timer.c:1758
__do_softirq+0x372/0x7a6 kernel/softirq.c:559
invoke_softirq kernel/softirq.c:433 [inline]
__irq_exit_rcu+0x245/0x280 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
sysvec_apic_timer_interrupt+0x91/0xb0 arch/x86/kernel/apic/apic.c:1100
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:647
RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
RIP: 0010:_raw_spin_unlock_irqrestore+0xbc/0x120 kernel/locking/spinlock.c:191
Code: f0 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 f7 e8 ba ad 03 f8 f6 44 24 21 02 75 4e 41 f7 c7 00 02 00 00 74 01 fb bf 01 00 00 00 <e8> ff 62 93 f7 65 8b 05 90 64 3e 76 85 c0 74 3f 48 c7 04 24 0e 36
RSP: 0018:ffffc9000945f7e0 EFLAGS: 00000206
RAX: 1ffff9200128bf00 RBX: ffffffff911be368 RCX: ffffffff90e87703
RDX: dffffc0000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffffc9000945f870 R08: ffffffff81856800 R09: fffffbfff2237c6e
R10: fffffbfff2237c6e R11: 0000000000000000 R12: dffffc0000000000
R13: 1ffff9200128befc R14: ffffc9000945f800 R15: 0000000000000a06
__debug_check_no_obj_freed lib/debugobjects.c:997 [inline]
debug_check_no_obj_freed+0x5a2/0x650 lib/debugobjects.c:1018
slab_free_hook mm/slub.c:1558 [inline]
slab_free_freelist_hook+0x161/0x290 mm/slub.c:1608
slab_free mm/slub.c:3168 [inline]
kmem_cache_free+0x85/0x170 mm/slub.c:3184
anon_vma_chain_free mm/rmap.c:141 [inline]
unlink_anon_vmas+0x58b/0x600 mm/rmap.c:439
free_pgtables+0x7f/0x2d0 mm/memory.c:413
exit_mmap+0x2be/0x5f0 mm/mmap.c:3209
__mmput+0x111/0x370 kernel/fork.c:1096
exit_mm+0x67e/0x7d0 kernel/exit.c:502
do_exit+0x6b9/0x23d0 kernel/exit.c:813
do_group_exit+0x168/0x2d0 kernel/exit.c:923
__do_sys_exit_group+0x13/0x20 kernel/exit.c:934
__se_sys_exit_group+0x10/0x10 kernel/exit.c:932
__x64_sys_exit_group+0x37/0x40 kernel/exit.c:932
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f0072df1618
Code: Unable to access opcode bytes at RIP 0x7f0072df15ee.
RSP: 002b:00007ffc0fb77be8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007ffc0fb77cb0 RCX: 00007f0072df1618
RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
RBP: 00007ffc0fb77d60 R08: 00000000000000e7 R09: fffffffffffffe50
R10: 00000000ffffffff R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000003 R15: 000000000000000e
keytouch 0003:0926:3333.00BB: usb_submit_urb(ctrl) failed: -19
keytouch 0003:0926:3333.00BC: usb_submit_urb(ctrl) failed: -19
________________________________________
From: Dmitry Vyukov <[email protected]>
Sent: Monday, 19 April 2021 15:27
To: syzbot; Greg Kroah-Hartman; [email protected]; [email protected]; [email protected]; USB list
Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [syzbot] INFO: rcu detected stall in tx
On Mon, Apr 19, 2021 at 9:19 AM syzbot
<[email protected]> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: 50987bec Merge tag 'trace-v5.12-rc7' of git://git.kernel.o..
> git tree: upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1065c5fcd00000
> kernel config: https://syzkaller.appspot.com/x/.config?x=398c4d0fe6f66e68
> dashboard link: https://syzkaller.appspot.com/bug?extid=e2eae5639e7203360018
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
>
> usbtmc 5-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 5-1:0.0: unknown status received: -71
>The log shows an infinite stream of these before the stall, so I
>assume it's an infinite loop in usbtmc.
>+usbtmc maintainers
>
>[ 370.171634][ C0] usbtmc 6-1:0.0: unknown status received: -71
>[ 370.177799][ C1] usbtmc 3-1:0.0: unknown status received: -71
>[ 370.183912][ C0] usbtmc 4-1:0.0: unknown status received: -71
>[ 370.190076][ C1] usbtmc 5-1:0.0: unknown status received: -71
>[ 370.196194][ C0] usbtmc 2-1:0.0: unknown status received: -71
>[ 370.202387][ C1] usbtmc 3-1:0.0: unknown status received: -71
>[ 370.208460][ C0] usbtmc 6-1:0.0: unknown status received: -71
>[ 370.214615][ C1] usbtmc 5-1:0.0: unknown status received: -71
>[ 370.220736][ C0] usbtmc 4-1:0.0: unknown status received: -71
>[ 370.226902][ C1] usbtmc 3-1:0.0: unknown status received: -71
>[ 370.233005][ C0] usbtmc 2-1:0.0: unknown status received: -71
>[ 370.239168][ C1] usbtmc 5-1:0.0: unknown status received: -71
>[ 370.245271][ C0] usbtmc 6-1:0.0: unknown status received: -71
>[ 370.251426][ C1] usbtmc 3-1:0.0: unknown status received: -71
>[ 370.257552][ C0] usbtmc 4-1:0.0: unknown status received: -71
>[ 370.263715][ C1] usbtmc 5-1:0.0: unknown status received: -71
>[ 370.269819][ C0] usbtmc 2-1:0.0: unknown status received: -71
>[ 370.275974][ C1] usbtmc 3-1:0.0: unknown status received: -71
>[ 370.282100][ C0] usbtmc 6-1:0.0: unknown status received: -71
>[ 370.288262][ C1] usbtmc 5-1:0.0: unknown status received: -71
>[ 370.294399][ C0] usbtmc 4-1:0.0: unknown status received: -71
It looks like a long time is spent in the following cycle: when the completion callback usbtmc_interrupt() sees an unknown status error, it submits the URB again, and the URB may be inserted back into urbp_list.
dummy_timer() is called with bottom halves disabled.
As a result, the RCU read-side critical section does not exit for a long time (note: bh_disable/enable and preempt_disable/enable regions are treated as RCU read-side critical sections), which prevents the rcu_preempt kthread from being scheduled and run.
dummy_timer()
{
restart:
	list_for_each_entry_safe(urbp, tmp, &dum_hcd->urbp_list, urbp_list) {
		.........
		ep = find_endpoint(dum, address);
		if (!ep) {
			status = -EPROTO;
			goto return_urb;
		}
		............
return_urb:
		usb_hcd_giveback_urb();
		goto restart;
	}
}
Should we return directly when the URB status is an unknown error?
diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c
index 74d5a9c5238a..39d44339c03f 100644
--- a/drivers/usb/class/usbtmc.c
+++ b/drivers/usb/class/usbtmc.c
@@ -2335,6 +2335,7 @@ static void usbtmc_interrupt(struct urb *urb)
return;
default:
dev_err(dev, "unknown status received: %d\n", status);
+ return;
}
exit:
rv = usb_submit_urb(urb, GFP_ATOMIC);
Thanks
Qiang
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu: 1-...!: (8580 ticks this GP) idle=72e/1/0x4000000000000000 softirq=20679/20679 fqs=0
> (t=10500 jiffies g=27129 q=416)
> rcu: rcu_preempt kthread starved for 10500 jiffies! g27129 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
> rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
> rcu: RCU grace-period kthread stack dump:
> task:rcu_preempt state:R running task stack:29168 pid: 14 ppid: 2 flags:0x00004000
> Call Trace:
> context_switch kernel/sched/core.c:4322 [inline]
> __schedule+0x911/0x21b0 kernel/sched/core.c:5073
> schedule+0xcf/0x270 kernel/sched/core.c:5152
> schedule_timeout+0x14a/0x250 kernel/time/timer.c:1892
> rcu_gp_fqs_loop kernel/rcu/tree.c:2005 [inline]
> rcu_gp_kthread+0xd07/0x2250 kernel/rcu/tree.c:2178
> kthread+0x3b1/0x4a0 kernel/kthread.c:292
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
> rcu: Stack dump where RCU GP kthread last ran:
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 3232 Comm: aoe_tx0 Not tainted 5.12.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:native_apic_mem_write+0x8/0x10 arch/x86/include/asm/apic.h:110
> Code: c7 40 d9 36 8f e8 c8 11 86 00 eb b0 66 0f 1f 44 00 00 be 01 00 00 00 e9 36 c7 2c 00 cc cc cc cc cc cc 89 ff 89 b7 00 c0 5f ff <c3> 0f 1f 80 00 00 00 00 48 b8 00 00 00 00 00 fc ff df 53 89 fb 48
> RSP: 0018:ffffc90000007ea8 EFLAGS: 00000046
> RAX: dffffc0000000000 RBX: ffffffff8b0a78c0 RCX: 0000000000000020
> RDX: 1ffffffff1614f1a RSI: 000000000001c285 RDI: 0000000000000380
> RBP: ffff8880b9c1f2c0 R08: 000000000000003f R09: 0000000000000000
> R10: ffffffff8166ecf7 R11: 0000000000000000 R12: 000000000001c285
> R13: 0000000000000020 R14: ffff8880b9c26340 R15: 0000006120792e26
> FS: 0000000000000000(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fb9e6cdb380 CR3: 0000000018792000 CR4: 00000000001506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <IRQ>
> apic_write arch/x86/include/asm/apic.h:393 [inline]
> lapic_next_event+0x4d/0x80 arch/x86/kernel/apic/apic.c:472
> clockevents_program_event+0x254/0x370 kernel/time/clockevents.c:334
> tick_program_event+0xac/0x140 kernel/time/tick-oneshot.c:44
> hrtimer_interrupt+0x414/0xa00 kernel/time/hrtimer.c:1676
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
> __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1106
> sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1100
> </IRQ>
> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
> RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:27 [inline]
> RIP: 0010:check_kcov_mode kernel/kcov.c:163 [inline]
> RIP: 0010:__sanitizer_cov_trace_pc+0x0/0x60 kernel/kcov.c:197
> Code: f0 4d 89 03 e9 f2 fc ff ff b9 ff ff ff ff ba 08 00 00 00 4d 8b 03 48 0f bd ca 49 8b 45 00 48 63 c9 e9 64 ff ff ff 0f 1f 40 00 <65> 8b 05 39 fe 8d 7e 89 c1 48 8b 34 24 81 e1 00 01 00 00 65 48 8b
> RSP: 0018:ffffc900030cf6f8 EFLAGS: 00000293
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: ffff88801aff1c40 RSI: ffffffff815c2e4f RDI: 0000000000000003
> RBP: ffffc900030cf738 R08: 0000000000000000 R09: ffffffff8fa9a96f
> R10: ffffffff815c2e45 R11: 0000000000000000 R12: 000000000000002d
> R13: ffff8880113db880 R14: 0000000000000000 R15: 0000000000000200
> console_trylock_spinning kernel/printk/printk.c:1818 [inline]
> vprintk_emit+0x3a5/0x560 kernel/printk/printk.c:2097
> dev_vprintk_emit+0x36e/0x3b2 drivers/base/core.c:4434
> dev_printk_emit+0xba/0xf1 drivers/base/core.c:4445
> __netdev_printk+0x1c6/0x27a net/core/dev.c:11292
> netdev_warn+0xd7/0x109 net/core/dev.c:11345
> ieee802154_subif_start_xmit.cold+0x17/0x27 net/mac802154/tx.c:125
> __netdev_start_xmit include/linux/netdevice.h:4825 [inline]
> netdev_start_xmit include/linux/netdevice.h:4839 [inline]
> xmit_one net/core/dev.c:3605 [inline]
> dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3621
> sch_direct_xmit+0x2e1/0xbd0 net/sched/sch_generic.c:313
> qdisc_restart net/sched/sch_generic.c:376 [inline]
> __qdisc_run+0x4ba/0x15f0 net/sched/sch_generic.c:384
> qdisc_run include/net/pkt_sched.h:136 [inline]
> qdisc_run include/net/pkt_sched.h:128 [inline]
> __dev_xmit_skb net/core/dev.c:3807 [inline]
> __dev_queue_xmit+0x14b9/0x2e00 net/core/dev.c:4162
> tx+0x68/0xb0 drivers/block/aoe/aoenet.c:63
> kthread+0x1e7/0x3a0 drivers/block/aoe/aoecmd.c:1230
> kthread+0x3b1/0x4a0 kernel/kthread.c:292
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
> NMI backtrace for cpu 1
> CPU: 1 PID: 37 Comm: kworker/1:1 Not tainted 5.12.0-rc7-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: events nsim_dev_trap_report_work
> Call Trace:
> <IRQ>
> __dump_stack lib/dump_stack.c:79 [inline]
> dump_stack+0x141/0x1d7 lib/dump_stack.c:120
> nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
> nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
> trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
> rcu_dump_cpu_stacks+0x222/0x2a7 kernel/rcu/tree_stall.h:341
> print_cpu_stall kernel/rcu/tree_stall.h:622 [inline]
> check_cpu_stall kernel/rcu/tree_stall.h:697 [inline]
> rcu_pending kernel/rcu/tree.c:3830 [inline]
> rcu_sched_clock_irq.cold+0x4f7/0x11dd kernel/rcu/tree.c:2650
> update_process_times+0x16d/0x200 kernel/time/timer.c:1796
> tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
> tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1369
> __run_hrtimer kernel/time/hrtimer.c:1537 [inline]
> __hrtimer_run_queues+0x1c0/0xe40 kernel/time/hrtimer.c:1601
> hrtimer_interrupt+0x330/0xa00 kernel/time/hrtimer.c:1663
> local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1089 [inline]
> __sysvec_apic_timer_interrupt+0x146/0x540 arch/x86/kernel/apic/apic.c:1106
> sysvec_apic_timer_interrupt+0x40/0xc0 arch/x86/kernel/apic/apic.c:1100
> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
> RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:161 [inline]
> RIP: 0010:_raw_spin_unlock_irqrestore+0x38/0x70 kernel/locking/spinlock.c:191
> Code: 74 24 10 e8 ba 19 54 f8 48 89 ef e8 f2 cf 54 f8 81 e3 00 02 00 00 75 25 9c 58 f6 c4 02 75 2d 48 85 db 74 01 fb bf 01 00 00 00 <e8> d3 9d 48 f8 65 8b 05 7c 68 fc 76 85 c0 74 0a 5b 5d c3 e8 40 59
> RSP: 0018:ffffc90000dc0b28 EFLAGS: 00000206
> RAX: 0000000000000002 RBX: 0000000000000200 RCX: 1ffffffff1f5f34a
> RDX: 0000000000000000 RSI: 0000000000000103 RDI: 0000000000000001
> RBP: ffff888144fa8000 R08: 0000000000000001 R09: ffffffff8fa9a99f
> R10: 0000000000000001 R11: ffffc90013880000 R12: ffff888145047440
> R13: ffff88801ee8e500 R14: dffffc0000000000 R15: ffff888011f69c00
> spin_unlock_irqrestore include/linux/spinlock.h:409 [inline]
> dummy_timer+0x12f1/0x32a0 drivers/usb/gadget/udc/dummy_hcd.c:1985
> call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431
> expire_timers kernel/time/timer.c:1476 [inline]
> __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745
> __run_timers kernel/time/timer.c:1726 [inline]
> run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758
> __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
> do_softirq.part.0+0xd9/0x130 kernel/softirq.c:248
> </IRQ>
> do_softirq kernel/softirq.c:240 [inline]
> __local_bh_enable_ip+0x102/0x120 kernel/softirq.c:198
> spin_unlock_bh include/linux/spinlock.h:399 [inline]
> nsim_dev_trap_report drivers/net/netdevsim/dev.c:585 [inline]
> nsim_dev_trap_report_work+0x867/0xbd0 drivers/net/netdevsim/dev.c:611
> process_one_work+0x98d/0x1600 kernel/workqueue.c:2275
> worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
> kthread+0x3b1/0x4a0 kernel/kthread.c:292
> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 5-1:0.0: unknown status received: -71
> usbtmc 5-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 5-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 5-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 2-1:0.0: unknown status received: -71
> usbtmc 4-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: unknown status received: -71
> usbtmc 3-1:0.0: usb_submit_urb failed: -19
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: unknown status received: -71
> usbtmc 6-1:0.0: usb_submit_urb failed: -19
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
On Mon, Jun 28, 2021 at 06:38:37AM +0000, Zhang, Qiang wrote:
>
>
> ________________________________________
> From: Dmitry Vyukov <[email protected]>
> Sent: Monday, 19 April 2021 15:27
> To: syzbot; Greg Kroah-Hartman; [email protected]; [email protected]; [email protected]; USB list
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [syzbot] INFO: rcu detected stall in tx
>
> On Mon, Apr 19, 2021 at 9:19 AM syzbot
> <[email protected]> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 50987bec Merge tag 'trace-v5.12-rc7' of git://git.kernel.o..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1065c5fcd00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=398c4d0fe6f66e68
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e2eae5639e7203360018
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: [email protected]
> >
> > usbtmc 5-1:0.0: unknown status received: -71
> > usbtmc 3-1:0.0: unknown status received: -71
> > usbtmc 5-1:0.0: unknown status received: -71
>
> >The log shows an infinite stream of these before the stall, so I
> >assume it's an infinite loop in usbtmc.
> >+usbtmc maintainers
> >
> >[ 370.171634][ C0] usbtmc 6-1:0.0: unknown status received: -71
> >[ 370.177799][ C1] usbtmc 3-1:0.0: unknown status received: -71
> It looks like a long time is spent in the following cycle: when the completion callback usbtmc_interrupt() sees an unknown status error, it submits the URB again, and the URB may be inserted back into urbp_list.
> dummy_timer() is called with bottom halves disabled.
> As a result, the RCU read-side critical section does not exit for a long time (note: bh_disable/enable and preempt_disable/enable regions are treated as RCU read-side critical sections), which prevents the rcu_preempt kthread from being scheduled and run.
> Should we return directly when the URB status is an unknown error?
Yes.
> diff --git a/drivers/usb/class/usbtmc.c b/drivers/usb/class/usbtmc.c
> index 74d5a9c5238a..39d44339c03f 100644
> --- a/drivers/usb/class/usbtmc.c
> +++ b/drivers/usb/class/usbtmc.c
> @@ -2335,6 +2335,7 @@ static void usbtmc_interrupt(struct urb *urb)
> return;
> default:
> dev_err(dev, "unknown status received: %d\n", status);
> + return;
> }
> exit:
> rv = usb_submit_urb(urb, GFP_ATOMIC);
This is the right thing to do. In fact, you should also change the code
above this. There's no real need for special handling of the
-ECONNRESET, -ENOENT, ..., -EPIPE codes, since the driver will do the
same thing no matter what the code is.
Alan Stern
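The simplification suggested above might look roughly like the sketch below. It is a stand-alone model, not the usbtmc driver: model_interrupt_action() and the enum are hypothetical names, and the sketch assumes that a successful completion (status 0) is the only case in which the interrupt URB should be submitted again. Every error code, whether one of the previously listed ones or an unknown one, takes the same path.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result type, not part of the usbtmc driver. */
enum model_action {
	MODEL_RESUBMIT,  /* process the notification and re-arm the URB */
	MODEL_STOP       /* log the status and return without resubmitting */
};

/*
 * Decide what the completion handler does with a given URB status.
 * No per-error special cases are needed: all errors end the cycle.
 */
static enum model_action model_interrupt_action(int status)
{
	assert(status <= 0);  /* URB statuses are 0 or a negative errno */
	return status == 0 ? MODEL_RESUBMIT : MODEL_STOP;
}
```

For example, -EPROTO (the -71 seen in the log), -ECONNRESET, -ENOENT and -EPIPE all map to MODEL_STOP, so a disconnected or misbehaving device can no longer keep the handler resubmitting indefinitely.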
syzbot has found a reproducer for the following issue on:
HEAD commit: 7cca308cfdc0 Merge tag 'powerpc-5.15-1' of git://git.kerne..
git tree: upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=10535915300000
kernel config: https://syzkaller.appspot.com/x/.config?x=9c582b69de20dde2
dashboard link: https://syzkaller.appspot.com/bug?extid=e2eae5639e7203360018
compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=10e2e533300000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1294ff33300000
IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]
rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 1-....: (288 ticks this GP) idle=bdd/1/0x4000000000000000 softirq=20305/20305 fqs=5241
(t=10500 jiffies g=18249 q=67)
NMI backtrace for cpu 1
CPU: 1 PID: 3254 Comm: aoe_tx0 Not tainted 5.14.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace+0x1ae/0x220 lib/nmi_backtrace.c:62
trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
rcu_dump_cpu_stacks+0x25e/0x3f0 kernel/rcu/tree_stall.h:343
print_cpu_stall kernel/rcu/tree_stall.h:627 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:711 [inline]
rcu_pending kernel/rcu/tree.c:3880 [inline]
rcu_sched_clock_irq.cold+0x9d/0x746 kernel/rcu/tree.c:2599
update_process_times+0x16d/0x200 kernel/time/timer.c:1785
tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226
tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1421
__run_hrtimer kernel/time/hrtimer.c:1685 [inline]
__hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749
hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811
local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
__sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103
sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097
</IRQ>
asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638
RIP: 0010:__sanitizer_cov_trace_pc+0x5c/0x60 kernel/kcov.c:207
Code: 82 18 15 00 00 83 f8 02 75 20 48 8b 8a 20 15 00 00 8b 92 1c 15 00 00 48 8b 01 48 83 c0 01 48 39 c2 76 07 48 89 34 c1 48 89 01 <c3> 0f 1f 00 41 55 41 54 49 89 fc 55 48 bd eb 83 b5 80 46 86 c8 61
RSP: 0018:ffffc90002ccfad8 EFLAGS: 00000293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880206e3900 RSI: ffffffff874e536f RDI: 0000000000000003
RBP: ffff88807df1b340 R08: 0000000000000000 R09: 0000000000000001
R10: ffffffff874e5366 R11: 0000000000000000 R12: ffff88807df1b000
R13: dffffc0000000000 R14: ffff8880709ff490 R15: ffff88807df1b338
__list_del_entry include/linux/list.h:132 [inline]
list_move_tail include/linux/list.h:227 [inline]
fq_codel_dequeue+0x7cf/0x1f50 net/sched/sch_fq_codel.c:299
dequeue_skb net/sched/sch_generic.c:292 [inline]
qdisc_restart net/sched/sch_generic.c:397 [inline]
__qdisc_run+0x1ae/0x1700 net/sched/sch_generic.c:415
__dev_xmit_skb net/core/dev.c:3861 [inline]
__dev_queue_xmit+0x1f6e/0x3710 net/core/dev.c:4170
tx+0x68/0xb0 drivers/block/aoe/aoenet.c:63
kthread+0x1e7/0x3b0 drivers/block/aoe/aoecmd.c:1230
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
----------------
Code disassembly (best guess), 1 bytes skipped:
0: 18 15 00 00 83 f8 sbb %dl,-0x77d0000(%rip) # 0xf8830006
6: 02 75 20 add 0x20(%rbp),%dh
9: 48 8b 8a 20 15 00 00 mov 0x1520(%rdx),%rcx
10: 8b 92 1c 15 00 00 mov 0x151c(%rdx),%edx
16: 48 8b 01 mov (%rcx),%rax
19: 48 83 c0 01 add $0x1,%rax
1d: 48 39 c2 cmp %rax,%rdx
20: 76 07 jbe 0x29
22: 48 89 34 c1 mov %rsi,(%rcx,%rax,8)
26: 48 89 01 mov %rax,(%rcx)
* 29: c3 retq <-- trapping instruction
2a: 0f 1f 00 nopl (%rax)
2d: 41 55 push %r13
2f: 41 54 push %r12
31: 49 89 fc mov %rdi,%r12
34: 55 push %rbp
35: 48 bd eb 83 b5 80 46 movabs $0x61c8864680b583eb,%rbp
3c: 86 c8 61