2021-11-25 22:42:58

by Mikhail Gavrilov

Subject: Re: [Bug] Driver mt7921e cause computer reboot.

On Tue, 5 Oct 2021 at 01:40, Mikhail Gavrilov
<[email protected]> wrote:
>

With the recent kernel 5.16.0-0.rc2 (commit 5d9f4cf36721), the behavior
has changed for the better, but the WiFi adapter is still buggy.

The spontaneous endless reboots no longer occur. But if I restart the
laptop instead of shutting it down, the WiFi adapter disappears on the
next boot. To get the WiFi adapter back, I have to power the laptop off
and then on again. Can this be fixed somehow?
The lspci output after a reboot differs from the output after a cold
boot following a shutdown:

After reboot:
Subsystem: AzureWave Device 4680
Flags: fast devsel, IRQ 84, IOMMU group 14
Memory at fc30300000 (64-bit, prefetchable) [size=1M]
Memory at fc30400000 (64-bit, prefetchable) [size=16K]
Memory at fc30404000 (64-bit, prefetchable) [size=4K]
Capabilities: [80] Express Endpoint, MSI 00
Capabilities: [e0] MSI: Enable- Count=1/32 Maskable+ 64bit+
Capabilities: [f8] Power Management version 3
Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108] Latency Tolerance Reporting
Capabilities: [110] L1 PM Substates
Capabilities: [200] Advanced Error Reporting
Kernel modules: mt7921e

After shutdown:
05:00.0 Network controller: MEDIATEK Corp. Device 7961
Subsystem: AzureWave Device 4680
Flags: bus master, fast devsel, latency 0, IRQ 85, IOMMU group 14
Memory at fc30300000 (64-bit, prefetchable) [size=1M]
Memory at fc30400000 (64-bit, prefetchable) [size=16K]
Memory at fc30404000 (64-bit, prefetchable) [size=4K]
Capabilities: [80] Express Endpoint, MSI 00
Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
Capabilities: [f8] Power Management version 3
Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
Capabilities: [108] Latency Tolerance Reporting
Capabilities: [110] L1 PM Substates
Capabilities: [200] Advanced Error Reporting
Kernel driver in use: mt7921e
Kernel modules: mt7921e

Screenshot of a visual comparison of the lspci output in meld: https://postimg.cc/642NKJ5Y
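
For readers who cannot open the screenshot, here is a minimal Python
sketch that reproduces the comparison as text. The two listings are
copied from the lspci output above, with lines that are identical in
both mostly left out; the helper name changed_lines() is made up for
this illustration.

```python
import difflib

# Excerpts from the two lspci listings in this message. Identical lines
# (memory regions, most capabilities) are left out to keep the sketch short.
AFTER_REBOOT = """\
Subsystem: AzureWave Device 4680
Flags: fast devsel, IRQ 84, IOMMU group 14
Capabilities: [e0] MSI: Enable- Count=1/32 Maskable+ 64bit+
Kernel modules: mt7921e
""".splitlines()

AFTER_SHUTDOWN = """\
Subsystem: AzureWave Device 4680
Flags: bus master, fast devsel, latency 0, IRQ 85, IOMMU group 14
Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
Kernel driver in use: mt7921e
Kernel modules: mt7921e
""".splitlines()


def changed_lines(before, after):
    """Return only the removed ("- ") and added ("+ ") lines of an ndiff."""
    return [line for line in difflib.ndiff(before, after)
            if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    for line in changed_lines(AFTER_REBOOT, AFTER_SHUTDOWN):
        print(line)
```

The lines that differ are the Flags line (no "bus master" after a
reboot), the MSI enable bit, and the missing "Kernel driver in use"
line.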

--
Best Regards,
Mike Gavrilov.


2021-12-27 10:11:33

by Íñigo Huguet

Subject: Re: [Bug] Driver mt7921e cause computer reboot.

Hi,

On Thu, Nov 25, 2021 at 11:43 PM Mikhail Gavrilov
<[email protected]> wrote:
> With recent kernel 5.16.0-0.rc2 commit 5d9f4cf36721 the behavior has
> been changed for the better, but the WiFi adapter still works with
> bugs.
>
> Now spontaneous endless reboots do not occur. But if I restart the
> laptop instead of shutting down, then the next time I boot, the WiFi
> adapter disappears. In order for the WiFi adapter to appear again, it
> needs to turn off the laptop and then turn on. Can this be somehow
> fixed?
> lspci output after reboot and boot after shutdown is different:

I've experienced similar problems, but they were solved as of v5.15,
at least for me.

How are you installing the kernel? Is it a custom build? Have you also
updated the firmware to the latest version?

> After reboot:
> Subsystem: AzureWave Device 4680
> Flags: fast devsel, IRQ 84, IOMMU group 14
> Memory at fc30300000 (64-bit, prefetchable) [size=1M]
> Memory at fc30400000 (64-bit, prefetchable) [size=16K]
> Memory at fc30404000 (64-bit, prefetchable) [size=4K]
> Capabilities: [80] Express Endpoint, MSI 00
> Capabilities: [e0] MSI: Enable- Count=1/32 Maskable+ 64bit+
> Capabilities: [f8] Power Management version 3
> Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
> Capabilities: [108] Latency Tolerance Reporting
> Capabilities: [110] L1 PM Substates
> Capabilities: [200] Advanced Error Reporting
> Kernel modules: mt7921e
>
> After shutdown:
> 05:00.0 Network controller: MEDIATEK Corp. Device 7961
> Subsystem: AzureWave Device 4680
> Flags: bus master, fast devsel, latency 0, IRQ 85, IOMMU group 14
> Memory at fc30300000 (64-bit, prefetchable) [size=1M]
> Memory at fc30400000 (64-bit, prefetchable) [size=16K]
> Memory at fc30404000 (64-bit, prefetchable) [size=4K]
> Capabilities: [80] Express Endpoint, MSI 00
> Capabilities: [e0] MSI: Enable+ Count=1/32 Maskable+ 64bit+
> Capabilities: [f8] Power Management version 3
> Capabilities: [100] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
> Capabilities: [108] Latency Tolerance Reporting
> Capabilities: [110] L1 PM Substates
> Capabilities: [200] Advanced Error Reporting
> Kernel driver in use: mt7921e
> Kernel modules: mt7921e
>
> Screen of a visual comparison of lspci in meld: https://postimg.cc/642NKJ5Y

To me, these differences look like the normal effect of the driver
not recognizing the device.

Regards
--
Íñigo Huguet


2021-12-27 11:30:27

by Mikhail Gavrilov

Subject: Re: [Bug] Driver mt7921e cause computer reboot.

On Mon, 27 Dec 2021 at 15:11, Íñigo Huguet <[email protected]> wrote:
> I've been experiencing similar problems, but they're solved at v5.15
> version, at least for me.
>
> How are you installing the kernel? Custom build? Have you updated the
> firmware to latest versions, as well?

I use Fedora Rawhide with default kernel and firmware packages.

$ uname -r
5.16.0-0.rc6.20211223gitbc491fb12513.44.fc36.x86_64
$ rpm -q linux-firmware
linux-firmware-20211027-126.fc36.noarch

>
> For me, these differences seem to be the normal effect of the driver
> not recognizing the device.

According to the kernel logs, it looks like this:
After reboot:
$ dmesg | grep mt7921e
[ 8.629358] mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
[ 8.630229] mt7921e 0000:05:00.0: ASIC revision: 79610010
[ 9.687652] mt7921e: probe of 0000:05:00.0 failed with error -110

# rmmod mt7921e
# modprobe mt7921e

[ 215.514503] mt7921e 0000:05:00.0: ASIC revision: feed0000
[ 216.604741] mt7921e: probe of 0000:05:00.0 failed with error -110

After a cold boot following a shutdown:
$ dmesg | grep mt7921e
[ 8.545171] mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
[ 8.545757] mt7921e 0000:05:00.0: ASIC revision: 79610010
[ 8.631156] mt7921e 0000:05:00.0: HW/SW Version: 0x8a108a10, Build Time: 20211014150838a
[ 8.912687] mt7921e 0000:05:00.0: WM Firmware Version: ____010000, Build Time: 20211014150922
[ 8.938756] mt7921e 0000:05:00.0: Firmware init done
[ 9.753257] mt7921e 0000:05:00.0 wlp5s0: renamed from wlan0

It looks like something is not re-initialized after a reboot.
The laptop BIOS is the latest, version 316:
https://dlcdnets.asus.com/pub/ASUS/GamingNB/G513QY/G513QYAS316.zip

Maybe someone from the PCI mailing list can shed some light on why the
PCI device is not re-initialized after a reboot?
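
One detail in the logs above is easy to miss: after rmmod/modprobe the
ASIC revision reads feed0000 instead of 79610010. Below is a small
Python sketch that flags such lines. It assumes (this is only an
observation from these logs, not something confirmed by the driver
documentation) that the upper 16 bits of the revision should equal the
PCI device ID 7961; the function name is made up for this example.

```python
import re

# Matches lines such as:
# "[ 8.630229] mt7921e 0000:05:00.0: ASIC revision: 79610010"
ASIC_RE = re.compile(r"mt7921e (\S+): ASIC revision: ([0-9a-fA-F]+)")

# Assumption: the upper 16 bits of the revision match the PCI device ID.
EXPECTED_CHIP_ID = 0x7961

# Log lines copied from the "after reboot" section of this message.
SAMPLE_LOG = """\
[ 8.629358] mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
[ 8.630229] mt7921e 0000:05:00.0: ASIC revision: 79610010
[ 9.687652] mt7921e: probe of 0000:05:00.0 failed with error -110
[ 215.514503] mt7921e 0000:05:00.0: ASIC revision: feed0000
[ 216.604741] mt7921e: probe of 0000:05:00.0 failed with error -110
"""


def suspicious_revisions(dmesg_text):
    """Return (device, revision) pairs whose chip ID part looks wrong."""
    return [(device, rev)
            for device, rev in ASIC_RE.findall(dmesg_text)
            if int(rev, 16) >> 16 != EXPECTED_CHIP_ID]


if __name__ == "__main__":
    for device, rev in suspicious_revisions(SAMPLE_LOG):
        print(f"{device}: unexpected ASIC revision {rev}")
```

On the sample above, only the second read (feed0000) is flagged, so the
register contents already differ before the probe times out.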

--
Best Regards,
Mike Gavrilov.

2021-12-30 00:21:23

by Bjorn Helgaas

Subject: Re: [Bug] Driver mt7921e cause computer reboot.

[+cc Lorenzo, Ryder (the rest of the mt7921 maintainers)]

Thread: https://lore.kernel.org/all/CABXGCsODP8ze_mvzfJKcRYxuS-esVgHXAvDXS5KN3xFUN6bWgA@mail.gmail.com/T/#u

On Mon, Dec 27, 2021 at 04:30:11PM +0500, Mikhail Gavrilov wrote:
> On Mon, 27 Dec 2021 at 15:11, Íñigo Huguet <[email protected]> wrote:
> > I've been experiencing similar problems, but they're solved at v5.15
> > version, at least for me.
> >
> > How are you installing the kernel? Custom build? Have you updated the
> > firmware to latest versions, as well?
>
> I use Fedora Rawhide with default kernel and firmware packages.
>
> $ uname -r
> 5.16.0-0.rc6.20211223gitbc491fb12513.44.fc36.x86_64
> $ rpm -q linux-firmware
> linux-firmware-20211027-126.fc36.noarch
>
> >
> > For me, these differences seem to be the normal effect of the driver
> > not recognizing the device.
>
> By the kernel logs, it looks like this:
> After reboot:
> $ dmesg | grep mt7921e
> [ 8.629358] mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
> [ 8.630229] mt7921e 0000:05:00.0: ASIC revision: 79610010
> [ 9.687652] mt7921e: probe of 0000:05:00.0 failed with error -110
>
> # rmmod mt7921e
> # modprobe mt7921e
>
> [ 215.514503] mt7921e 0000:05:00.0: ASIC revision: feed0000
> [ 216.604741] mt7921e: probe of 0000:05:00.0 failed with error -110
>
> After cold boot after shutdown:
> $ dmesg | grep mt7921e
> [ 8.545171] mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
> [ 8.545757] mt7921e 0000:05:00.0: ASIC revision: 79610010
> [ 8.631156] mt7921e 0000:05:00.0: HW/SW Version: 0x8a108a10, Build
> Time: 20211014150838a
> [ 8.912687] mt7921e 0000:05:00.0: WM Firmware Version: ____010000,
> Build Time: 20211014150922
> [ 8.938756] mt7921e 0000:05:00.0: Firmware init done
> [ 9.753257] mt7921e 0000:05:00.0 wlp5s0: renamed from wlan0
>
> It looks like something is not re-initialized after a reboot.
> Laptop BIOS is latest: Version 316
> https://dlcdnets.asus.com/pub/ASUS/GamingNB/G513QY/G513QYAS316.zip
>
> Maybe anyone from the pci mailing list can lid some light why pci
> device not re-initialized after a reboot?

Sorry for the inconvenience and thank you very much for the report!

If I understand correctly, when you do a cold boot, the mt7921e device
works properly.

But when you simply reboot, without a power off, the device does not
work, and the dmesg log contains:

pci 0000:05:00.0: [14c3:7961] type 00 class 0x028000
pci 0000:05:00.0: reg 0x10: [mem 0xfc30300000-0xfc303fffff 64bit pref]
pci 0000:05:00.0: reg 0x18: [mem 0xfc30400000-0xfc30403fff 64bit pref]
pci 0000:05:00.0: reg 0x20: [mem 0xfc30404000-0xfc30404fff 64bit pref]
...
mt7921e 0000:05:00.0: enabling device (0000 -> 0002)
mt7921e 0000:05:00.0: ASIC revision: 79610010
mt7921e: probe of 0000:05:00.0 failed with error -110

That means the device responds to PCI config reads and writes, but the
probe failed with -ETIMEDOUT after printing the ASIC revision [1].
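
As an aside, the mapping from "error -110" to -ETIMEDOUT can be sketched
in a few lines of Python. The table lists only a handful of Linux errno
values (as defined in the generic Linux errno headers); the function
name decode_probe_failures() is hypothetical and not part of any kernel
tool.

```python
import re

# A few Linux errno values; intentionally incomplete.
LINUX_ERRNO = {
    5: "EIO",
    12: "ENOMEM",
    16: "EBUSY",
    19: "ENODEV",
    22: "EINVAL",
    110: "ETIMEDOUT",
}

# Matches lines such as:
# "[ 9.687652] mt7921e: probe of 0000:05:00.0 failed with error -110"
PROBE_FAIL_RE = re.compile(r"(\S+): probe of (\S+) failed with error (-\d+)")


def decode_probe_failures(dmesg_text):
    """Return (driver, device, errno name) for each failed probe line."""
    failures = []
    for driver, device, err in PROBE_FAIL_RE.findall(dmesg_text):
        code = -int(err)  # the kernel logs the negated errno value
        failures.append((driver, device, LINUX_ERRNO.get(code, f"errno {code}")))
    return failures


if __name__ == "__main__":
    line = "[ 9.687652] mt7921e: probe of 0000:05:00.0 failed with error -110"
    for driver, device, name in decode_probe_failures(line):
        print(f"{driver} failed to probe {device}: -{name}")
```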

devm_request_irq() should not return -ETIMEDOUT, but it looks like
mt7921_dma_init() can (via mt7921_dma_disable()). Maybe the mt7921e
driver can't tolerate some state the device was left in by reboot?

I don't see anything obviously wrong from a PCI core perspective. The
PCI core does not reset devices either when going down for a reboot or
when coming up at boot-time.

Bjorn

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/wireless/mediatek/mt76/mt7921/pci.c?id=v5.16-rc6#n187