2022-08-04 17:25:38

by Kostas Peletidis

Subject: mt7921e: Network device not responding following chip reset

Hello,

Takashi (in cc) and I have been looking at a strange mt7921e issue I encountered, and any help getting to the bottom of it would be much appreciated. During normal use of my machine the network sometimes becomes unreachable, and any network-related command, such as ping or ss, hangs indefinitely when run in a terminal. This is what typical dmesg output looks like (see the URL at the end of this message for full details):

[11249.676616] r8169 0000:02:00.0 enp2s0f0: Link is Down
[11453.812782] mt7921e 0000:03:00.0: driver own failed
[11454.986117] mt7921e 0000:03:00.0: driver own failed
[11454.986134] mt7921e 0000:03:00.0: chip reset
[11456.170894] mt7921e 0000:03:00.0: driver own failed
[11456.278532] pcieport 0000:00:02.3: pciehp: Slot(0): Link Down
[11456.278536] pcieport 0000:00:02.3: pciehp: Slot(0): Card not present
[11456.313973] wlp3s0: deauthenticating from f8:5b:3b:0f:2b:9f by local choice (Reason: 3=DEAUTH_LEAVING)
[11457.286206] mt7921e 0000:03:00.0: Timeout for driver own
[11458.400420] mt7921e 0000:03:00.0: driver own failed
[11458.400442] ------------[ cut here ]------------
[11458.400443] WARNING: CPU: 2 PID: 8597 at kernel/kthread.c:659 kthread_park+0x81/0x90
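
To get a feel for the timing in the excerpt above, the gaps between successive "driver own failed" messages can be pulled out of the dmesg timestamps. This is only a minimal sketch: the function name and the regular expression are illustrative, and it assumes the usual "[seconds.microseconds]" dmesg prefix shown above.

```python
import re

# Matches the bracketed seconds-since-boot prefix that dmesg prints.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def driver_own_gaps(lines):
    """Return the seconds between consecutive 'driver own failed' messages."""
    stamps = []
    for line in lines:
        match = DMESG_LINE.match(line.strip())
        if match and "driver own failed" in match.group(2):
            stamps.append(float(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [round(later - earlier, 6) for earlier, later in zip(stamps, stamps[1:])]


# Lines copied from the excerpt above.
sample = [
    "[11453.812782] mt7921e 0000:03:00.0: driver own failed",
    "[11454.986117] mt7921e 0000:03:00.0: driver own failed",
    "[11454.986134] mt7921e 0000:03:00.0: chip reset",
    "[11456.170894] mt7921e 0000:03:00.0: driver own failed",
    "[11458.400420] mt7921e 0000:03:00.0: driver own failed",
]
print(driver_own_gaps(sample))
```

For the excerpt above the gaps come out at roughly 1.17 s, 1.18 s and 2.23 s. What those intervals correspond to inside the driver is not something the log alone establishes.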

I have noticed this issue with both tainted and untainted kernels. To me it looks like some kind of hardware reset timed out (or the hardware was probed too quickly). This is what a successful chip reset looks like in my logs:

Chip reset OK, no warning
-------------------------
Jul 04 13:06:33 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: driver own failed
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: chip reset
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220311230842a
Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: WM Firmware Version: ____010000, Build Time: 20220311230931
Jul 04 13:06:36 savra kernel: wlp3s0: Driver requested disconnection from AP f8:5b:3b:0f:2b:9f

And this is what the chip reset looks like when the issue occurs:

Chip reset timeout, warning
---------------------------
Aug 03 12:18:00 savra kernel: mt7921e 0000:03:00.0: driver own failed
Aug 03 12:18:02 savra kernel: mt7921e 0000:03:00.0: driver own failed
Aug 03 12:18:02 savra kernel: mt7921e 0000:03:00.0: chip reset
Aug 03 12:18:03 savra kernel: mt7921e 0000:03:00.0: driver own failed
Aug 03 12:18:03 savra kernel: pcieport 0000:00:02.3: pciehp: Slot(0): Link Down
Aug 03 12:18:03 savra kernel: pcieport 0000:00:02.3: pciehp: Slot(0): Card not present
Aug 03 12:18:03 savra kernel: wlp3s0: deauthenticating from f8:5b:3b:0f:2b:9f by local choice (Reason: 3=DEAUTH_LEAVING)
Aug 03 12:18:04 savra kernel: mt7921e 0000:03:00.0: Timeout for driver own
Aug 03 12:18:05 savra kernel: mt7921e 0000:03:00.0: driver own failed
Aug 03 12:18:05 savra kernel: ------------[ cut here ]------------
Aug 03 12:18:05 savra kernel: WARNING: CPU: 6 PID: 26340 at kernel/kthread.c:659 kthread_park+0x81/0x90
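
As a rough aid for scanning longer journals, the two excerpts above can be told apart mechanically by what follows the "chip reset" line. The sketch below is hypothetical: the function name and return values are made up for illustration, the marker strings are copied from the two excerpts, and it is an assumption that they reliably separate the two cases on other machines or kernel versions.

```python
def classify_reset(lines):
    """Classify what followed the first mt7921e 'chip reset' message.

    Returns 'no-reset', 'recovered', 'failed', or 'unknown'.
    """
    seen_reset = False
    for line in lines:
        if not seen_reset:
            # Only the driver's own message counts, not the section headings.
            seen_reset = "mt7921e" in line and "chip reset" in line
            continue
        if "WM Firmware Version" in line:
            # Firmware was reloaded, as in the "Chip reset OK" excerpt.
            return "recovered"
        if "Timeout for driver own" in line or "Card not present" in line:
            # The device dropped off the bus or never handed over ownership.
            return "failed"
    return "unknown" if seen_reset else "no-reset"


ok_excerpt = [
    "Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: chip reset",
    "Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220311230842a",
    "Jul 04 13:06:35 savra kernel: mt7921e 0000:03:00.0: WM Firmware Version: ____010000, Build Time: 20220311230931",
]
bad_excerpt = [
    "Aug 03 12:18:02 savra kernel: mt7921e 0000:03:00.0: chip reset",
    "Aug 03 12:18:03 savra kernel: mt7921e 0000:03:00.0: driver own failed",
    "Aug 03 12:18:03 savra kernel: pcieport 0000:00:02.3: pciehp: Slot(0): Link Down",
    "Aug 03 12:18:03 savra kernel: pcieport 0000:00:02.3: pciehp: Slot(0): Card not present",
]
print(classify_reset(ok_excerpt), classify_reset(bad_excerpt))
```

Feeding it the kernel lines of a saved journal, one boot at a time, would give a quick count of how often a reset recovers versus how often the card disappears from the slot.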

I have added dmesg logs and all pertinent information in the ticket below:
https://bugzilla.opensuse.org/show_bug.cgi?id=1201845

Would someone please have a look and help us figure out what would cause the "driver own failed" message to be logged? Thank you.


Regards,
Kostas


2022-08-08 18:07:34

by Sean Wang

Subject: Re: mt7921e: Network device not responding following chip reset

Hi Kostas,

Applying the patch in [1] should fix the following kernel panic and
keep the system running even if something goes wrong in the driver.

Jul 08 08:47:21 savra kernel: WARNING: CPU: 7 PID: 113 at kernel/kthread.c:659 kthread_park+0x7b/0x90
<...>
Jul 08 08:47:21 savra kernel: Call Trace:
Jul 08 08:47:21 savra kernel: <TASK>
Jul 08 08:47:21 savra kernel: mt7921e_mac_reset+0x9e/0x2d0 [mt7921e 1df6344b7ec017c6819314bafbaefbc4739af58d]
Jul 08 08:47:21 savra kernel: mt7921_mac_reset_work+0x9f/0x14a [mt7921_common a3df60fd5ed501d6ce3c322675b791e633aa28b5]
Jul 08 08:47:21 savra kernel: process_one_work+0x208/0x3c0
Jul 08 08:47:21 savra kernel: worker_thread+0x4a/0x3b0
Jul 08 08:47:21 savra kernel: ? process_one_work+0x3c0/0x3c0
Jul 08 08:47:21 savra kernel: kthread+0xda/0x100
Jul 08 08:47:21 savra kernel: ? kthread_complete_and_exit+0x20/0x20
Jul 08 08:47:21 savra kernel: ret_from_fork+0x22/0x30
Jul 08 08:47:21 savra kernel: </TASK>

Separately, we need some time to figure out why "mt7921e
0000:03:00.0: driver own failed" appears in the log you provided.
In the meantime, if possible, please try the latest firmware in [2]
to see whether it helps.

[1] https://patchwork.kernel.org/project/linux-wireless/patch/7[email protected]mediatek.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek
Grab and update the following three files.
1.) BT_RAM_CODE_MT7961_1_2_hdr.bin
2.) WIFI_MT7961_patch_mcu_1_2_hdr.bin
3.) WIFI_RAM_CODE_MT7961_1.bin
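
Before and after updating, it may help to record which copies of these three files are actually installed. The following is a small sketch, not part of any official tooling: it assumes the files live under /lib/firmware/mediatek (matching the "mediatek" directory in [2]) and that a distribution may ship them compressed with an .xz or .zst suffix; the function name is illustrative.

```python
import hashlib
from pathlib import Path

# The three file names listed above.
FIRMWARE_FILES = [
    "BT_RAM_CODE_MT7961_1_2_hdr.bin",
    "WIFI_MT7961_patch_mcu_1_2_hdr.bin",
    "WIFI_RAM_CODE_MT7961_1.bin",
]


def firmware_report(fw_dir):
    """Map each firmware name to (path, sha256 hex digest), or None if absent."""
    report = {}
    for name in FIRMWARE_FILES:
        # Assumption: compressed variants only differ by a suffix.
        candidates = [Path(fw_dir) / (name + suffix) for suffix in ("", ".xz", ".zst")]
        found = next((path for path in candidates if path.is_file()), None)
        if found is None:
            report[name] = None
        else:
            digest = hashlib.sha256(found.read_bytes()).hexdigest()
            report[name] = (str(found), digest)
    return report


if __name__ == "__main__":
    # Assumed install location; adjust if your distribution differs.
    for name, info in firmware_report("/lib/firmware/mediatek").items():
        print(name, "->", info if info else "missing")
```

Note that the digest of a compressed file will not match the digest of the uncompressed file in the linux-firmware tree, so comparisons are only meaningful between files in the same form.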

Sean


2022-08-09 06:25:33

by Takashi Iwai

Subject: Re: mt7921e: Network device not responding following chip reset

On Mon, 08 Aug 2022 19:59:58 +0200,
Sean Wang wrote:
>
> Hi Kostas,
>
> Applying the patch in [1] should be able to fix the following kernel
> panic to make the system run even if something goes wrong in the
> driver.
>
> Jul 08 08:47:21 savra kernel: WARNING: CPU: 7 PID: 113 at
> kernel/kthread.c:659 kthread_park+0x7b/0x90
> <...>
> Jul 08 08:47:21 savra kernel: Call Trace:
> Jul 08 08:47:21 savra kernel: <TASK>
> Jul 08 08:47:21 savra kernel: mt7921e_mac_reset+0x9e/0x2d0 [mt7921e
> 1df6344b7ec017c6819314bafbaefbc4739af58d]
> Jul 08 08:47:21 savra kernel: mt7921_mac_reset_work+0x9f/0x14a
> [mt7921_common a3df60fd5ed501d6ce3c322675b791e633aa28b5]
> Jul 08 08:47:21 savra kernel: process_one_work+0x208/0x3c0
> Jul 08 08:47:21 savra kernel: worker_thread+0x4a/0x3b0
> Jul 08 08:47:21 savra kernel: ? process_one_work+0x3c0/0x3c0
> Jul 08 08:47:21 savra kernel: kthread+0xda/0x100
> Jul 08 08:47:21 savra kernel: ? kthread_complete_and_exit+0x20/0x20
> Jul 08 08:47:21 savra kernel: ret_from_fork+0x22/0x30
> Jul 08 08:47:21 savra kernel: </TASK>
>
> On the other hand, we need time to figure out why "mt7921e
> 0000:03:00.0: driver own failed" happened in the log you provided
> here.
> But if it is possible for you, you can try out the latest firmware in
> [2] first to see if it would be helpful for you.
>
> [1] https://patchwork.kernel.org/project/linux-wireless/patch/7[email protected]mediatek.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/mediatek
> Grab and update the following three files.
> 1.) BT_RAM_CODE_MT7961_1_2_hdr.bin
> 2.) WIFI_MT7961_patch_mcu_1_2_hdr.bin
> 3.) WIFI_RAM_CODE_MT7961_1.bin

Thanks!

Kostas, I'm building a test kernel package with the patch above in the
OBS home:tiwai:bsc1201845 repo. Once the build finishes (it takes an
hour or so), it'll appear at
http://download.opensuse.org/repositories/home:/tiwai:/bsc1201845/standard/

Please give it a try later.

The firmware files should already be included in the latest Tumbleweed
(TW) kernel-firmware-* packages. But, to be sure, please install the
kernel-firmware-* packages from the OBS Kernel:HEAD repo, which is built
from the latest linux-firmware git tree.
http://download.opensuse.org/repositories/Kernel:/HEAD/standard/

(Note that OBS Kernel:HEAD also contains the kernel package itself, so
it is better to download the kernel-firmware-*.rpm files from there and
install them manually instead of adding the repo URL to zypper.)


Takashi
