2018-08-08 20:19:25

by Randy Oostdyk

[permalink] [raw]
Subject: wireless dongle causing entire machine to hang

Good day all,

I'm writing to report an issue with the linux kernel, and I'm hoping
this is the right place to report it. (short aside: I tried to ask in
the #linux-wireless IRC channel, but wasn't permitted to speak!)

I'm aware that this is a USB protocol issue, and so this may be the
wrong place to report the bug, but the warning seems to be generated
by the wireless driver, and that appears to be the key issue here.

My USB wifi dongle is on the end of a very long USB cable, and was
connected to a USB hub. On two different occasions (after hours or
days of use), the machine it was attached to (Raspberry Pi 3) stopped
responding. I was unable to SSH in, even over ethernet. After a hard
reboot, I found that the following error was repeated **many thousands
of times per second** in three different log files:

Rpi kernel: [857011.857581] ieee80211 phy1:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71

As the machine continued in that state for hours, those log files had
grown to several gigabytes in size each! (/var/log/syslog,
/var/log/kern.log, and /var/log/messages)

It appears to be a very similar (if not same) bug referenced here:
https://www.spinics.net/lists/linux-wireless/msg150555.html

He resolved the "soft lockup" issue with some changes to the driver
(diff included in that thread), so I'm hoping this is the right place
to bring this issue up.

Output of uname -a:
Linux RCBLpi 4.14.52-v7+ #1123 SMP Wed Jun 27 17:35:49 BST 2018 armv7l GNU/Linux

If any further details are needed, feel free to email me directly, as
I haven't subscribed to this list.


2018-08-09 13:35:15

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On Wed, Aug 08, 2018 at 11:58:36AM -0600, Randy Oostdyk wrote:
> Good day all,
>
> I'm writing to report an issue with the linux kernel, and I'm hoping
> this is the right place to report it. (short aside: I tried to ask in
> the #linux-wireless IRC channel, but wasn't permitted to speak!)
>
> I'm aware that this is a USB protocol issue, and so this may be the
> wrong place to report the bug, but the warning seems to be generated
> by the wireless driver, and that appears to be the key issue here.
>
> My USB wifi dongle is on the end of a very long USB cable, and was
> connected to a USB hub. On two different occasions (after hours or
> days of use), the machine it was attached to (Raspberry Pi 3) stopped
> responding. I was unable to SSH in, even over ethernet. After a hard
> reboot, I found that the following error was repeated **many thousands
> of times per second** in three different log files:
>
> Rpi kernel: [857011.857581] ieee80211 phy1:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71

We have this change that make device not presend when we get -EPROTO
error from USB driver.

commit 2ad69ac5976191e9bb7dc4044204a504653ad1bb
Author: Richard Genoud <[email protected]>
Date: Thu Apr 3 16:48:56 2014 +0200

rt2x00: Endless loop on hub port power down

However I reverted it in

commit dc50a52a36f459e0e3bed492d72a78bf64772c25
Author: Stanislaw Gruszka <[email protected]>
Date: Wed Nov 26 15:29:16 2014 +0100

Revert "rt2x00: Endless loop on hub port power down"

Because it caused that if we get this error only one time,
it prevent devices to work, which worked otherwise.

rt2800usb driver expect thet we get -ENODEV or -ENOENT from
usb host driver when we can not talk to device anymore. On other usb
errors rt2800usb just try to work. I think this should be fixed in
usb host driver to return proper error code.

> As the machine continued in that state for hours, those log files had
> grown to several gigabytes in size each! (/var/log/syslog,
> /var/log/kern.log, and /var/log/messages)
>
> It appears to be a very similar (if not same) bug referenced here:
> https://www.spinics.net/lists/linux-wireless/msg150555.html
>
> He resolved the "soft lockup" issue with some changes to the driver
> (diff included in that thread), so I'm hoping this is the right place
> to bring this issue up.

I'm reluctant to replace _warn by _dbg messages as if somethings
will go wrong we will not notice that. We can use printk_ratelimited()
variant instead to do not spam log in speed that it will hung
the machine. But the correct fix should be in USB host drivers,
which seems to be not in perfect shape on those embedded platforms.

Regards
Stanislaw

2018-08-10 14:17:15

by Randy Oostdyk

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

It's indeed possible that the cause was faulty or out-of-spec HW. I'm
confident the machine hung (soft lock?) due to log spam, because upon
reboot, log files were several GB each, covering the time period I was
unable to connect to it.

Thanks for the patch. I'll try to test it with my previous HW setup,
and let you know how it goes. It might take a few weeks, due to the
intermittent nature of the USB problem that triggered it.

On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <[email protected]> wrote:
> On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
>> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <[email protected]> wrote:
>> > I'm reluctant to replace _warn by _dbg messages as if somethings
>> > will go wrong we will not notice that. We can use printk_ratelimited()
>> > variant instead to do not spam log in speed that it will hung
>> > the machine. But the correct fix should be in USB host drivers,
>> > which seems to be not in perfect shape on those embedded platforms.
>>
>> Agreed about the "correct" fix likely being on the USB side, but at
>> least using printk_ratelimited() will avoid the log spamming, and
>> avoid locking the machine, as you said.
>>
>> Will someone else take that approach and come up with a patch, or is
>> this something I should try to take on myself? I could certainly test
>> such a patch, if desired.
>>
>> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
>
> I'm attaching the ratelimit patch for test.
>
> However after reading email from Larry I withdraw blaming USB host
> driver. The issue could be faulty HW and USB host driver can not do
> much much about this.
>
> Another question is if machine hung due to log spam or because there
> was some crash. Those error conditions trigger code paths that are
> not usually used, so there could be some oops that hung the system.
>
> Anyway you can test the patch and report back.
>
> Cheers
> Stanislaw

2018-08-09 21:16:08

by Randy Oostdyk

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <[email protected]> wrote:
> I'm reluctant to replace _warn by _dbg messages as if somethings
> will go wrong we will not notice that. We can use printk_ratelimited()
> variant instead to do not spam log in speed that it will hung
> the machine. But the correct fix should be in USB host drivers,
> which seems to be not in perfect shape on those embedded platforms.

Agreed about the "correct" fix likely being on the USB side, but at
least using printk_ratelimited() will avoid the log spamming, and
avoid locking the machine, as you said.

Will someone else take that approach and come up with a patch, or is
this something I should try to take on myself? I could certainly test
such a patch, if desired.

Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.

2018-08-28 20:02:08

by Randy Oostdyk

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

Hi Stanislaw,

I had considerable difficulty getting the module compiled and
installed. I had ignored your first patch, and only applied the second
patch (that you said compiles), but it turns out to be missing the
variable declarations portion of the first patch. After taking that
portion of the first patch, I managed to get the module to compile and
installed.

After one week of testing, I'm not able to cause the USB protocol
error (-71) to happen, except by disconnecting the device from the USB
bus. So I'm not able to fully ascertain if your patch really solved
the issue or not. Sorry... At least I can say that it doesn't break
anything!

I will try to continue testing, and report the results if I can manage
to get the error to surface again.

Cheers,
Randy

On Thu, Aug 16, 2018 at 5:02 AM, Stanislaw Gruszka <[email protected]> wrote:
> On Thu, Aug 16, 2018 at 12:31:07PM +0200, Stanislaw Gruszka wrote:
>> I prepared a patch that count EPROTO errors and if it's bigger
>> than 10 mark device as gone. It should not make a problem
>> when we will get one time random error. I'm attaching it,
>> please test.
>
> Forgot to compile the patch, here is one that compiles.
>
> Cheers
> Stanislaw

2018-08-09 00:05:01

by Jonas Gorski

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On 8 August 2018 at 21:14, Arend van Spriel
<[email protected]> wrote:
> On 8/8/2018 7:58 PM, Randy Oostdyk wrote:
>>
>> Good day all,
>
>
> Hi Randy
>
>> I'm writing to report an issue with the linux kernel, and I'm hoping
>> this is the right place to report it. (short aside: I tried to ask in
>> the #linux-wireless IRC channel, but wasn't permitted to speak!)
>
>
> uhm. why?

There was (and still is) a lot of link spamming recently on freenode
in #linux-wireless (and several other open [source] channels) from
unregistered clients; to prevent further disruptions unregistered
clients were prevented from writing to the channel for now.

Quite likely this was/is Randys issue. Registering helps, not sure how
easy it is to give individual unregistered users the right to talk.


Regards
Jonas

2018-08-08 21:35:54

by Arend Van Spriel

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On 8/8/2018 7:58 PM, Randy Oostdyk wrote:
> Good day all,

Hi Randy

> I'm writing to report an issue with the linux kernel, and I'm hoping
> this is the right place to report it. (short aside: I tried to ask in
> the #linux-wireless IRC channel, but wasn't permitted to speak!)

uhm. why?

> I'm aware that this is a USB protocol issue, and so this may be the
> wrong place to report the bug, but the warning seems to be generated
> by the wireless driver, and that appears to be the key issue here.
>
> My USB wifi dongle is on the end of a very long USB cable, and was
> connected to a USB hub. On two different occasions (after hours or
> days of use), the machine it was attached to (Raspberry Pi 3) stopped
> responding. I was unable to SSH in, even over ethernet. After a hard
> reboot, I found that the following error was repeated **many thousands
> of times per second** in three different log files:

I assume that USB is powered, right? Now probably asking the obvious,
but you did not write it down so here it is: did you try with a very
short USB cable?

I guess you read this already [1]

> Rpi kernel: [857011.857581] ieee80211 phy1:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71

So this -EPROTO. The usb host controller of the RPi3 can be found here:

https://github.com/raspberrypi/linux/tree/rpi-4.18.y/drivers/usb/host/dwc_otg

Not sure it that tells us anything though.

Regards,
Arend

[1]
https://www.raspberrypi.org/documentation/hardware/raspberrypi/usb/README.md

2018-08-16 07:21:27

by Randy Oostdyk

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

Sorry about the mangled log file! Here's the txt file as an
attachment, hope it's useful.

Cheers.

On Wed, Aug 15, 2018 at 2:07 AM, Stanislaw Gruszka <[email protected]> wrote:
> Please attach logs as txt file. Your mail client break lines
> what makes logs unreadable for me.
>
> Thanks
> Stanislaw
>
> On Tue, Aug 14, 2018 at 11:19:09PM -0600, Randy Oostdyk wrote:
>> Finally got an opportunity to implement the patch and test it out.
>> Looks like it suppressed the log spam (excellent!), but the machine
>> unfortunately remains unresponsive. Attached is a relevant section of
>> the log: I'm not able to interpret the CPU stall backtrace info, but
>> perhaps someone here can determine whether it's USB or the wireless
>> driver hanging the machine? In the log, it repeats with further
>> warnings and CPU stall backtraces. I can supply more, if needed.
>>
>> As mentioned earlier, it seems clear I have some kind of USB hardware
>> issue going on, and I'll sort that out myself, I just wanted to report
>> the hang, in case it might help contribute toward improving the
>> reliability of the kernel.
>>
>> Cheers!
>>
>>
>> Aug 10 14:44:09 RCBLpi kernel: [ 284.697778] wlan1: AP
>> fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1
>> (2437/0 MHz)
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.680872]
>> rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.680887] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.680986] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681059] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681126] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681193] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681282] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681348] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681415] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681548] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 335.681623] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [ 336.171569] ieee80211 phy4:
>> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for
>> offset 0x1700 with error -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681558]
>> rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681573] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681659] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681754] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681829] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681903] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.681984] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.682079] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.682149] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.682236] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [ 340.682302] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691516]
>> rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691530] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691603] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691704] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691810] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691877] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.691958] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.692024] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.692091] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.692173] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [ 345.692241] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.701580]
>> rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.701596] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.701722] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.701789] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.701870] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.701935] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.702001] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.702084] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.702150] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.702241] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [ 350.702307] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711589]
>> rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711603] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711674] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711763] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711829] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711917] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.711983] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.712048] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.712135] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.712199] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 355.712265] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171492] INFO: rcu_sched
>> self-detected stall on CPU
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171522] 0-...: (2099 ticks
>> this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171525] (t=2100 jiffies
>> g=5626 c=5625 q=17568)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171541] NMI backtrace for cpu 0
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171553] CPU: 0 PID: 1007 Comm:
>> kworker/u8:17 Tainted: G C 4.14.61-v7+ #1
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171558] Hardware name: BCM2835
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171588] Workqueue: phy4
>> rt2x00usb_work_rxdone [rt2x00usb]
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171622] [<8010ffc0>]
>> (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171640] [<8010c1fc>]
>> (show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171659] [<807894fc>]
>> (dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171676] [<8078f43c>]
>> (nmi_cpu_backtrace) from [<8078f520>]
>> (nmi_trigger_cpumask_backtrace+0xe0/0x12c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171692] [<8078f520>]
>> (nmi_trigger_cpumask_backtrace) from [<8010e648>]
>> (arch_trigger_cpumask_backtrace+0x20/0x24)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171710] [<8010e648>]
>> (arch_trigger_cpumask_backtrace) from [<80185710>]
>> (rcu_dump_cpu_stacks+0xb0/0xdc)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171728] [<80185710>]
>> (rcu_dump_cpu_stacks) from [<801850b0>]
>> (rcu_check_callbacks+0x7e8/0x960)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171745] [<801850b0>]
>> (rcu_check_callbacks) from [<8018b314>]
>> (update_process_times+0x44/0x70)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171762] [<8018b314>]
>> (update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171779] [<8019d32c>]
>> (tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171795] [<8019d590>]
>> (tick_sched_timer) from [<8018c2cc>]
>> (__hrtimer_run_queues+0x158/0x2ec)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171810] [<8018c2cc>]
>> (__hrtimer_run_queues) from [<8018c704>]
>> (hrtimer_interrupt+0xb8/0x20c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171827] [<8018c704>]
>> (hrtimer_interrupt) from [<8063de74>]
>> (arch_timer_handler_phys+0x40/0x48)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171846] [<8063de74>]
>> (arch_timer_handler_phys) from [<8017a150>]
>> (handle_percpu_devid_irq+0x88/0x23c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171863] [<8017a150>]
>> (handle_percpu_devid_irq) from [<801749fc>]
>> (generic_handle_irq+0x34/0x44)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171879] [<801749fc>]
>> (generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171894] [<80175050>]
>> (__handle_domain_irq) from [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171911] [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
>> (__irq_svc+0x5c/0x7c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171918] Exception
>> stack(0xadbffab0 to 0xadbffaf8)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171927] faa0:
>> 00000000 00000000 a980bd00 00000000
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171939] fac0: b7716e80 60000113
>> b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171949] fae0: adbffb00 adbffb00
>> 805b3388 805b339c 60000113 ffffffff
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171969] [<807a4ffc>] (__irq_svc)
>> from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.171988] [<805b339c>]
>> (__usb_hcd_giveback_urb) from [<805b3530>]
>> (usb_hcd_giveback_urb+0x4c/0xfc)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172006] [<805b3530>]
>> (usb_hcd_giveback_urb) from [<805dde40>]
>> (completion_tasklet_func+0x80/0xac)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172024] [<805dde40>]
>> (completion_tasklet_func) from [<805ecc60>]
>> (tasklet_callback+0x20/0x24)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172040] [<805ecc60>]
>> (tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172055] [<80123ac4>]
>> (tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172069] [<8010169c>]
>> (__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172083] [<801236a8>] (irq_exit)
>> from [<80175054>] (__handle_domain_irq+0x70/0xc4)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172097] [<80175054>]
>> (__handle_domain_irq) from [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172111] [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
>> (__irq_svc+0x5c/0x7c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172117] Exception
>> stack(0xadbffc58 to 0xadbffca0)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172125] fc40:
>> b9d83a40 00000000
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172136] fc60: 00000001 00000000
>> 60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172148] fc80: 00000000 adbffcbc
>> adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172164] [<807a4ffc>] (__irq_svc)
>> from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172180] [<807a4d40>]
>> (_raw_spin_unlock_irqrestore) from [<805ecbf8>]
>> (DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172196] [<805ecbf8>]
>> (DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>]
>> (dwc_otg_urb_enqueue+0x1dc/0x31c)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172212] [<805e27f4>]
>> (dwc_otg_urb_enqueue) from [<805b4f34>]
>> (usb_hcd_submit_urb+0xc8/0x934)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172226] [<805b4f34>]
>> (usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172250] [<805b687c>]
>> (usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc
>> [rt2x00usb])
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172296] [<7f4a76e0>]
>> (rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>]
>> (rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172354] [<7f4a7758>]
>> (rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>]
>> (rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172412] [<7f492a44>]
>> (rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>]
>> (rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172440] [<7f4a8088>]
>> (rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>]
>> (process_one_work+0x224/0x518)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172459] [<80137384>]
>> (process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172476] [<8013835c>]
>> (worker_thread) from [<8013db68>] (kthread+0x144/0x174)
>> Aug 10 14:45:20 RCBLpi kernel: [ 356.172492] [<8013db68>] (kthread)
>> from [<801080ac>] (ret_from_fork+0x14/0x28)
>> Aug 10 14:45:22 RCBLpi kernel: [ 357.591518] INFO: rcu_sched detected
>> expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root:
>> 0x1/.
>> Aug 10 14:45:22 RCBLpi kernel: [ 357.591545] blocking rcu_node structures:
>> Aug 10 14:45:22 RCBLpi kernel: [ 357.591554] Task dump for CPU 0:
>> Aug 10 14:45:22 RCBLpi kernel: [ 357.591560] kworker/u8:17 R
>> running task 0 1007 2 0x00000002
>> Aug 10 14:45:22 RCBLpi kernel: [ 357.591603] Workqueue: phy4
>> rt2x00usb_work_rxdone [rt2x00usb]
>> Aug 10 14:45:25 RCBLpi kernel: [ 360.721537]
>> rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
>> Aug 10 14:45:25 RCBLpi kernel: [ 360.721552] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:25 RCBLpi kernel: [ 360.721651] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:25 RCBLpi kernel: [ 360.721720] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>>
>> On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <[email protected]> wrote:
>> > On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
>> >> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <[email protected]> wrote:
>> >> > I'm reluctant to replace _warn by _dbg messages as if somethings
>> >> > will go wrong we will not notice that. We can use printk_ratelimited()
>> >> > variant instead to do not spam log in speed that it will hung
>> >> > the machine. But the correct fix should be in USB host drivers,
>> >> > which seems to be not in perfect shape on those embedded platforms.
>> >>
>> >> Agreed about the "correct" fix likely being on the USB side, but at
>> >> least using printk_ratelimited() will avoid the log spamming, and
>> >> avoid locking the machine, as you said.
>> >>
>> >> Will someone else take that approach and come up with a patch, or is
>> >> this something I should try to take on myself? I could certainly test
>> >> such a patch, if desired.
>> >>
>> >> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
>> >
>> > I'm attaching the ratelimit patch for test.
>> >
>> > However after reading email from Larry I withdraw blaming USB host
>> > driver. The issue could be faulty HW and USB host driver can not do
>> > much much about this.
>> >
>> > Another question is if machine hung due to log spam or because there
>> > was some crash. Those error conditions trigger code paths that are
>> > not usually used, so there could be some oops that hung the system.
>> >
>> > Anyway you can test the patch and report back.
>> >
>> > Cheers
>> > Stanislaw


Attachments:
rt2800usb_cpu_stall.txt (15.32 kB)

2018-08-15 10:59:03

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

Please attach logs as txt file. Your mail client break lines
what makes logs unreadable for me.

Thanks
Stanislaw

On Tue, Aug 14, 2018 at 11:19:09PM -0600, Randy Oostdyk wrote:
> Finally got an opportunity to implement the patch and test it out.
> Looks like it suppressed the log spam (excellent!), but the machine
> unfortunately remains unresponsive. Attached is a relevant section of
> the log: I'm not able to interpret the CPU stall backtrace info, but
> perhaps someone here can determine whether it's USB or the wireless
> driver hanging the machine? In the log, it repeats with further
> warnings and CPU stall backtraces. I can supply more, if needed.
>
> As mentioned earlier, it seems clear I have some kind of USB hardware
> issue going on, and I'll sort that out myself, I just wanted to report
> the hang, in case it might help contribute toward improving the
> reliability of the kernel.
>
> Cheers!
>
>
> Aug 10 14:44:09 RCBLpi kernel: [ 284.697778] wlan1: AP
> fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1
> (2437/0 MHz)
> Aug 10 14:45:00 RCBLpi kernel: [ 335.680872]
> rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
> Aug 10 14:45:00 RCBLpi kernel: [ 335.680887] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.680986] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681059] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681126] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681193] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681282] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681348] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681415] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681548] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 335.681623] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [ 336.171569] ieee80211 phy4:
> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for
> offset 0x1700 with error -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681558]
> rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681573] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681659] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681754] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681829] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681903] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.681984] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.682079] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.682149] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.682236] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [ 340.682302] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691516]
> rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691530] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691603] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691704] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691810] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691877] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.691958] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.692024] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.692091] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.692173] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [ 345.692241] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.701580]
> rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
> Aug 10 14:45:15 RCBLpi kernel: [ 350.701596] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.701722] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.701789] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.701870] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.701935] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.702001] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.702084] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.702150] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.702241] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [ 350.702307] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711589]
> rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711603] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711674] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711763] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711829] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711917] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.711983] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.712048] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.712135] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.712199] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 355.712265] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171492] INFO: rcu_sched
> self-detected stall on CPU
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171522] 0-...: (2099 ticks
> this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171525] (t=2100 jiffies
> g=5626 c=5625 q=17568)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171541] NMI backtrace for cpu 0
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171553] CPU: 0 PID: 1007 Comm:
> kworker/u8:17 Tainted: G C 4.14.61-v7+ #1
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171558] Hardware name: BCM2835
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171588] Workqueue: phy4
> rt2x00usb_work_rxdone [rt2x00usb]
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171622] [<8010ffc0>]
> (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171640] [<8010c1fc>]
> (show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171659] [<807894fc>]
> (dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171676] [<8078f43c>]
> (nmi_cpu_backtrace) from [<8078f520>]
> (nmi_trigger_cpumask_backtrace+0xe0/0x12c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171692] [<8078f520>]
> (nmi_trigger_cpumask_backtrace) from [<8010e648>]
> (arch_trigger_cpumask_backtrace+0x20/0x24)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171710] [<8010e648>]
> (arch_trigger_cpumask_backtrace) from [<80185710>]
> (rcu_dump_cpu_stacks+0xb0/0xdc)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171728] [<80185710>]
> (rcu_dump_cpu_stacks) from [<801850b0>]
> (rcu_check_callbacks+0x7e8/0x960)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171745] [<801850b0>]
> (rcu_check_callbacks) from [<8018b314>]
> (update_process_times+0x44/0x70)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171762] [<8018b314>]
> (update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171779] [<8019d32c>]
> (tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171795] [<8019d590>]
> (tick_sched_timer) from [<8018c2cc>]
> (__hrtimer_run_queues+0x158/0x2ec)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171810] [<8018c2cc>]
> (__hrtimer_run_queues) from [<8018c704>]
> (hrtimer_interrupt+0xb8/0x20c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171827] [<8018c704>]
> (hrtimer_interrupt) from [<8063de74>]
> (arch_timer_handler_phys+0x40/0x48)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171846] [<8063de74>]
> (arch_timer_handler_phys) from [<8017a150>]
> (handle_percpu_devid_irq+0x88/0x23c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171863] [<8017a150>]
> (handle_percpu_devid_irq) from [<801749fc>]
> (generic_handle_irq+0x34/0x44)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171879] [<801749fc>]
> (generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171894] [<80175050>]
> (__handle_domain_irq) from [<80101520>]
> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171911] [<80101520>]
> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
> (__irq_svc+0x5c/0x7c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171918] Exception
> stack(0xadbffab0 to 0xadbffaf8)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171927] faa0:
> 00000000 00000000 a980bd00 00000000
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171939] fac0: b7716e80 60000113
> b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171949] fae0: adbffb00 adbffb00
> 805b3388 805b339c 60000113 ffffffff
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171969] [<807a4ffc>] (__irq_svc)
> from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.171988] [<805b339c>]
> (__usb_hcd_giveback_urb) from [<805b3530>]
> (usb_hcd_giveback_urb+0x4c/0xfc)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172006] [<805b3530>]
> (usb_hcd_giveback_urb) from [<805dde40>]
> (completion_tasklet_func+0x80/0xac)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172024] [<805dde40>]
> (completion_tasklet_func) from [<805ecc60>]
> (tasklet_callback+0x20/0x24)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172040] [<805ecc60>]
> (tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172055] [<80123ac4>]
> (tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172069] [<8010169c>]
> (__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172083] [<801236a8>] (irq_exit)
> from [<80175054>] (__handle_domain_irq+0x70/0xc4)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172097] [<80175054>]
> (__handle_domain_irq) from [<80101520>]
> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172111] [<80101520>]
> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
> (__irq_svc+0x5c/0x7c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172117] Exception
> stack(0xadbffc58 to 0xadbffca0)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172125] fc40:
> b9d83a40 00000000
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172136] fc60: 00000001 00000000
> 60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172148] fc80: 00000000 adbffcbc
> adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172164] [<807a4ffc>] (__irq_svc)
> from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172180] [<807a4d40>]
> (_raw_spin_unlock_irqrestore) from [<805ecbf8>]
> (DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172196] [<805ecbf8>]
> (DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>]
> (dwc_otg_urb_enqueue+0x1dc/0x31c)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172212] [<805e27f4>]
> (dwc_otg_urb_enqueue) from [<805b4f34>]
> (usb_hcd_submit_urb+0xc8/0x934)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172226] [<805b4f34>]
> (usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172250] [<805b687c>]
> (usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc
> [rt2x00usb])
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172296] [<7f4a76e0>]
> (rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>]
> (rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172354] [<7f4a7758>]
> (rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>]
> (rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172412] [<7f492a44>]
> (rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>]
> (rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172440] [<7f4a8088>]
> (rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>]
> (process_one_work+0x224/0x518)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172459] [<80137384>]
> (process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172476] [<8013835c>]
> (worker_thread) from [<8013db68>] (kthread+0x144/0x174)
> Aug 10 14:45:20 RCBLpi kernel: [ 356.172492] [<8013db68>] (kthread)
> from [<801080ac>] (ret_from_fork+0x14/0x28)
> Aug 10 14:45:22 RCBLpi kernel: [ 357.591518] INFO: rcu_sched detected
> expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root:
> 0x1/.
> Aug 10 14:45:22 RCBLpi kernel: [ 357.591545] blocking rcu_node structures:
> Aug 10 14:45:22 RCBLpi kernel: [ 357.591554] Task dump for CPU 0:
> Aug 10 14:45:22 RCBLpi kernel: [ 357.591560] kworker/u8:17 R
> running task 0 1007 2 0x00000002
> Aug 10 14:45:22 RCBLpi kernel: [ 357.591603] Workqueue: phy4
> rt2x00usb_work_rxdone [rt2x00usb]
> Aug 10 14:45:25 RCBLpi kernel: [ 360.721537]
> rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
> Aug 10 14:45:25 RCBLpi kernel: [ 360.721552] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:25 RCBLpi kernel: [ 360.721651] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:25 RCBLpi kernel: [ 360.721720] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
>
> On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <[email protected]> wrote:
> > On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
> >> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <[email protected]> wrote:
> >> > I'm reluctant to replace _warn by _dbg messages as if somethings
> >> > will go wrong we will not notice that. We can use printk_ratelimited()
> >> > variant instead to do not spam log in speed that it will hung
> >> > the machine. But the correct fix should be in USB host drivers,
> >> > which seems to be not in perfect shape on those embedded platforms.
> >>
> >> Agreed about the "correct" fix likely being on the USB side, but at
> >> least using printk_ratelimited() will avoid the log spamming, and
> >> avoid locking the machine, as you said.
> >>
> >> Will someone else take that approach and come up with a patch, or is
> >> this something I should try to take on myself? I could certainly test
> >> such a patch, if desired.
> >>
> >> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
> >
> > I'm attaching the ratelimit patch for test.
> >
> > However after reading email from Larry I withdraw blaming USB host
> > driver. The issue could be faulty HW and USB host driver can not do
> > much much about this.
> >
> > Another question is if machine hung due to log spam or because there
> > was some crash. Those error conditions trigger code paths that are
> > not usually used, so there could be some oops that hung the system.
> >
> > Anyway you can test the patch and report back.
> >
> > Cheers
> > Stanislaw

2018-08-16 13:31:53

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On Wed, Aug 15, 2018 at 10:25:48PM -0600, Randy Oostdyk wrote:
> Sorry about the mangled log file! Here's the txt file as an
> attachment, hope it's useful.

We loop infinitely in rt2x00_rxdone because we can not submit
urb. We always get EPROTO error. I think usb host driver should
give ENODEV at some point, but this is out of tree dwc_otg driver,
so I don't expect we can get fix for it.

I prepared a patch that count EPROTO errors and if it's bigger
than 10 mark device as gone. It should not make a problem
when we will get one time random error. I'm attaching it,
please test.

Thanks
Stanislaw


Attachments:
(No filename) (605.00 B)
rt2x00_check_proto_err.patch (2.18 kB)
Download all attachments

2018-08-10 13:47:01

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <[email protected]> wrote:
> > I'm reluctant to replace _warn by _dbg messages as if somethings
> > will go wrong we will not notice that. We can use printk_ratelimited()
> > variant instead to do not spam log in speed that it will hung
> > the machine. But the correct fix should be in USB host drivers,
> > which seems to be not in perfect shape on those embedded platforms.
>
> Agreed about the "correct" fix likely being on the USB side, but at
> least using printk_ratelimited() will avoid the log spamming, and
> avoid locking the machine, as you said.
>
> Will someone else take that approach and come up with a patch, or is
> this something I should try to take on myself? I could certainly test
> such a patch, if desired.
>
> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.

I'm attaching the ratelimit patch for test.

However after reading email from Larry I withdraw blaming USB host
driver. The issue could be faulty HW and USB host driver can not do
much much about this.

Another question is if machine hung due to log spam or because there
was some crash. Those error conditions trigger code paths that are
not usually used, so there could be some oops that hung the system.

Anyway you can test the patch and report back.

Cheers
Stanislaw


Attachments:
(No filename) (1.38 kB)
rt2x00_ratelimited_err.patch (1.17 kB)
Download all attachments

2018-08-08 21:25:12

by Larry Finger

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On 08/08/2018 12:58 PM, Randy Oostdyk wrote:
> Good day all,
>
> I'm writing to report an issue with the linux kernel, and I'm hoping
> this is the right place to report it. (short aside: I tried to ask in
> the #linux-wireless IRC channel, but wasn't permitted to speak!)
>
> I'm aware that this is a USB protocol issue, and so this may be the
> wrong place to report the bug, but the warning seems to be generated
> by the wireless driver, and that appears to be the key issue here.
>
> My USB wifi dongle is on the end of a very long USB cable, and was
> connected to a USB hub. On two different occasions (after hours or
> days of use), the machine it was attached to (Raspberry Pi 3) stopped
> responding. I was unable to SSH in, even over ethernet. After a hard
> reboot, I found that the following error was repeated **many thousands
> of times per second** in three different log files:
>
> Rpi kernel: [857011.857581] ieee80211 phy1:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
>
> As the machine continued in that state for hours, those log files had
> grown to several gigabytes in size each! (/var/log/syslog,
> /var/log/kern.log, and /var/log/messages)
>
> It appears to be a very similar (if not same) bug referenced here:
> https://www.spinics.net/lists/linux-wireless/msg150555.html
>
> He resolved the "soft lockup" issue with some changes to the driver
> (diff included in that thread), so I'm hoping this is the right place
> to bring this issue up.
>
> Output of uname -a:
> Linux RCBLpi 4.14.52-v7+ #1123 SMP Wed Jun 27 17:35:49 BST 2018 armv7l GNU/Linux

My browsing shows that error 71 (EPROTO) is some kind of protocol error. Was the
one you noted the first, or had others been logged? If that was the first, that
infrequent an error will be hard to find.

I suspect some subtle USB timing issue. Is that very long cable within specs of
5 meters for USB2 or 3 for USB3? Was the hub powered? If your setup permits, you
might try with a shorter cable.

Larry

2018-08-16 14:03:42

by Stanislaw Gruszka

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

On Thu, Aug 16, 2018 at 12:31:07PM +0200, Stanislaw Gruszka wrote:
> I prepared a patch that count EPROTO errors and if it's bigger
> than 10 mark device as gone. It should not make a problem
> when we will get one time random error. I'm attaching it,
> please test.

Forgot to compile the patch, here is one that compiles.

Cheers
Stanislaw


Attachments:
(No filename) (343.00 B)
rt2x00_check_proto_err_v2.patch (1.76 kB)
Download all attachments

2018-08-15 08:09:49

by Randy Oostdyk

[permalink] [raw]
Subject: Re: wireless dongle causing entire machine to hang

Finally got an opportunity to implement the patch and test it out.
Looks like it suppressed the log spam (excellent!), but the machine
unfortunately remains unresponsive. Attached is a relevant section of
the log: I'm not able to interpret the CPU stall backtrace info, but
perhaps someone here can determine whether it's USB or the wireless
driver hanging the machine? In the log, it repeats with further
warnings and CPU stall backtraces. I can supply more, if needed.

As mentioned earlier, it seems clear I have some kind of USB hardware
issue going on, and I'll sort that out myself, I just wanted to report
the hang, in case it might help contribute toward improving the
reliability of the kernel.

Cheers!


Aug 10 14:44:09 RCBLpi kernel: [ 284.697778] wlan1: AP
fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1
(2437/0 MHz)
Aug 10 14:45:00 RCBLpi kernel: [ 335.680872]
rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
Aug 10 14:45:00 RCBLpi kernel: [ 335.680887] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.680986] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681059] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681126] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681193] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681282] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681348] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681415] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681548] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 335.681623] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [ 336.171569] ieee80211 phy4:
rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for
offset 0x1700 with error -71
Aug 10 14:45:05 RCBLpi kernel: [ 340.681558]
rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
Aug 10 14:45:05 RCBLpi kernel: [ 340.681573] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.681659] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.681754] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.681829] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.681903] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.681984] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.682079] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.682149] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.682236] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [ 340.682302] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.691516]
rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
Aug 10 14:45:10 RCBLpi kernel: [ 345.691530] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.691603] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.691704] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.691810] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.691877] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.691958] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.692024] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.692091] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.692173] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [ 345.692241] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.701580]
rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
Aug 10 14:45:15 RCBLpi kernel: [ 350.701596] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.701722] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.701789] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.701870] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.701935] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.702001] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.702084] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.702150] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.702241] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [ 350.702307] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.711589]
rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
Aug 10 14:45:20 RCBLpi kernel: [ 355.711603] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.711674] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.711763] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.711829] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.711917] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.711983] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.712048] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.712135] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.712199] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 355.712265] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [ 356.171492] INFO: rcu_sched
self-detected stall on CPU
Aug 10 14:45:20 RCBLpi kernel: [ 356.171522] 0-...: (2099 ticks
this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
Aug 10 14:45:20 RCBLpi kernel: [ 356.171525] (t=2100 jiffies
g=5626 c=5625 q=17568)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171541] NMI backtrace for cpu 0
Aug 10 14:45:20 RCBLpi kernel: [ 356.171553] CPU: 0 PID: 1007 Comm:
kworker/u8:17 Tainted: G C 4.14.61-v7+ #1
Aug 10 14:45:20 RCBLpi kernel: [ 356.171558] Hardware name: BCM2835
Aug 10 14:45:20 RCBLpi kernel: [ 356.171588] Workqueue: phy4
rt2x00usb_work_rxdone [rt2x00usb]
Aug 10 14:45:20 RCBLpi kernel: [ 356.171622] [<8010ffc0>]
(unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171640] [<8010c1fc>]
(show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171659] [<807894fc>]
(dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171676] [<8078f43c>]
(nmi_cpu_backtrace) from [<8078f520>]
(nmi_trigger_cpumask_backtrace+0xe0/0x12c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171692] [<8078f520>]
(nmi_trigger_cpumask_backtrace) from [<8010e648>]
(arch_trigger_cpumask_backtrace+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171710] [<8010e648>]
(arch_trigger_cpumask_backtrace) from [<80185710>]
(rcu_dump_cpu_stacks+0xb0/0xdc)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171728] [<80185710>]
(rcu_dump_cpu_stacks) from [<801850b0>]
(rcu_check_callbacks+0x7e8/0x960)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171745] [<801850b0>]
(rcu_check_callbacks) from [<8018b314>]
(update_process_times+0x44/0x70)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171762] [<8018b314>]
(update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171779] [<8019d32c>]
(tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171795] [<8019d590>]
(tick_sched_timer) from [<8018c2cc>]
(__hrtimer_run_queues+0x158/0x2ec)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171810] [<8018c2cc>]
(__hrtimer_run_queues) from [<8018c704>]
(hrtimer_interrupt+0xb8/0x20c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171827] [<8018c704>]
(hrtimer_interrupt) from [<8063de74>]
(arch_timer_handler_phys+0x40/0x48)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171846] [<8063de74>]
(arch_timer_handler_phys) from [<8017a150>]
(handle_percpu_devid_irq+0x88/0x23c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171863] [<8017a150>]
(handle_percpu_devid_irq) from [<801749fc>]
(generic_handle_irq+0x34/0x44)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171879] [<801749fc>]
(generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171894] [<80175050>]
(__handle_domain_irq) from [<80101520>]
(bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171911] [<80101520>]
(bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
(__irq_svc+0x5c/0x7c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171918] Exception
stack(0xadbffab0 to 0xadbffaf8)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171927] faa0:
00000000 00000000 a980bd00 00000000
Aug 10 14:45:20 RCBLpi kernel: [ 356.171939] fac0: b7716e80 60000113
b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
Aug 10 14:45:20 RCBLpi kernel: [ 356.171949] fae0: adbffb00 adbffb00
805b3388 805b339c 60000113 ffffffff
Aug 10 14:45:20 RCBLpi kernel: [ 356.171969] [<807a4ffc>] (__irq_svc)
from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.171988] [<805b339c>]
(__usb_hcd_giveback_urb) from [<805b3530>]
(usb_hcd_giveback_urb+0x4c/0xfc)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172006] [<805b3530>]
(usb_hcd_giveback_urb) from [<805dde40>]
(completion_tasklet_func+0x80/0xac)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172024] [<805dde40>]
(completion_tasklet_func) from [<805ecc60>]
(tasklet_callback+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172040] [<805ecc60>]
(tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172055] [<80123ac4>]
(tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172069] [<8010169c>]
(__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172083] [<801236a8>] (irq_exit)
from [<80175054>] (__handle_domain_irq+0x70/0xc4)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172097] [<80175054>]
(__handle_domain_irq) from [<80101520>]
(bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172111] [<80101520>]
(bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
(__irq_svc+0x5c/0x7c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172117] Exception
stack(0xadbffc58 to 0xadbffca0)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172125] fc40:
b9d83a40 00000000
Aug 10 14:45:20 RCBLpi kernel: [ 356.172136] fc60: 00000001 00000000
60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
Aug 10 14:45:20 RCBLpi kernel: [ 356.172148] fc80: 00000000 adbffcbc
adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
Aug 10 14:45:20 RCBLpi kernel: [ 356.172164] [<807a4ffc>] (__irq_svc)
from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172180] [<807a4d40>]
(_raw_spin_unlock_irqrestore) from [<805ecbf8>]
(DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172196] [<805ecbf8>]
(DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>]
(dwc_otg_urb_enqueue+0x1dc/0x31c)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172212] [<805e27f4>]
(dwc_otg_urb_enqueue) from [<805b4f34>]
(usb_hcd_submit_urb+0xc8/0x934)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172226] [<805b4f34>]
(usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172250] [<805b687c>]
(usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc
[rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [ 356.172296] [<7f4a76e0>]
(rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>]
(rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [ 356.172354] [<7f4a7758>]
(rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>]
(rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
Aug 10 14:45:20 RCBLpi kernel: [ 356.172412] [<7f492a44>]
(rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>]
(rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [ 356.172440] [<7f4a8088>]
(rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>]
(process_one_work+0x224/0x518)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172459] [<80137384>]
(process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172476] [<8013835c>]
(worker_thread) from [<8013db68>] (kthread+0x144/0x174)
Aug 10 14:45:20 RCBLpi kernel: [ 356.172492] [<8013db68>] (kthread)
from [<801080ac>] (ret_from_fork+0x14/0x28)
Aug 10 14:45:22 RCBLpi kernel: [ 357.591518] INFO: rcu_sched detected
expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root:
0x1/.
Aug 10 14:45:22 RCBLpi kernel: [ 357.591545] blocking rcu_node structures:
Aug 10 14:45:22 RCBLpi kernel: [ 357.591554] Task dump for CPU 0:
Aug 10 14:45:22 RCBLpi kernel: [ 357.591560] kworker/u8:17 R
running task 0 1007 2 0x00000002
Aug 10 14:45:22 RCBLpi kernel: [ 357.591603] Workqueue: phy4
rt2x00usb_work_rxdone [rt2x00usb]
Aug 10 14:45:25 RCBLpi kernel: [ 360.721537]
rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
Aug 10 14:45:25 RCBLpi kernel: [ 360.721552] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:25 RCBLpi kernel: [ 360.721651] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:25 RCBLpi kernel: [ 360.721720] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71

On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <[email protected]> wrote:
> On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
>> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <[email protected]> wrote:
>> > I'm reluctant to replace _warn by _dbg messages as if somethings
>> > will go wrong we will not notice that. We can use printk_ratelimited()
>> > variant instead to do not spam log in speed that it will hung
>> > the machine. But the correct fix should be in USB host drivers,
>> > which seems to be not in perfect shape on those embedded platforms.
>>
>> Agreed about the "correct" fix likely being on the USB side, but at
>> least using printk_ratelimited() will avoid the log spamming, and
>> avoid locking the machine, as you said.
>>
>> Will someone else take that approach and come up with a patch, or is
>> this something I should try to take on myself? I could certainly test
>> such a patch, if desired.
>>
>> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
>
> I'm attaching the ratelimit patch for test.
>
> However after reading email from Larry I withdraw blaming USB host
> driver. The issue could be faulty HW and USB host driver can not do
> much much about this.
>
> Another question is if machine hung due to log spam or because there
> was some crash. Those error conditions trigger code paths that are
> not usually used, so there could be some oops that hung the system.
>
> Anyway you can test the patch and report back.
>
> Cheers
> Stanislaw