2023-12-19 13:59:56

by James Prestwood

Subject: Ath11k warnings, and eventual phy going away requiring reboot

Hi,

I noticed this after one of our devices dropped offline. The device had
roamed 7 minutes prior, so I doubt that had anything to do with it. But
then we get this, and then tons of warnings. I'm happy to provide full
stack traces, but there are quite a few and I'm not sure which ones are
relevant. After all the warnings, IWD got an RTNL del link event and was
unable to recover from that. It seems ath11k then tried to power back
on but failed.

This is a stock 6.2 Ubuntu kernel, WCN6855:

fw_version 0x1106996e fw_build_timestamp 2023-10-13 07:30 fw_build_id WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30

I see there is a new FW as of 5 days ago, so I could try that if you
think this is a FW problem.

Dec 19 11:53:33 kernel: ieee80211 phy0: Hardware restart was requested
Dec 19 11:53:33 kernel: mhi mhi0: Requested to power ON
Dec 19 11:53:33 kernel: mhi mhi0: Power on setup success
Dec 19 11:53:34 kernel: mhi mhi0: Wait for device to enter SBL or Mission mode
Dec 19 11:53:34 kernel: ath11k_pci 0000:01:00.0: already resetting count 2
Dec 19 11:53:43 kernel: ath11k_pci 0000:01:00.0: failed to send WMI_PDEV_SET_PARAM cmd
Dec 19 11:53:43 kernel: ath11k_pci 0000:01:00.0: failed to enable PMF QOS: (-108
Dec 19 11:53:43 kernel: Hardware became unavailable during restart.
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/util.c:2555 ieee80211_reconfig+0x505/0x1100 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.h:627 drv_flush+0x16f/0x180 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.h:839 drv_mgd_complete_tx+0x169/0x190 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:399 drv_ampdu_action+0x176/0x1a0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.h:508 drv_sta_pre_rcu_remove+0x15f/0x180 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:120 drv_sta_state+0x226/0x230 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/sta_info.c:1291 __sta_info_destroy_part2+0x19b/0x1b0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:462 drv_set_key+0x1d5/0x1e0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/sta_info.c:1308 __sta_info_destroy_part2+0x15a/0x1b0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/sta_info.c:1316 __sta_info_destroy_part2+0x17f/0x1b0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.h:555 drv_sta_statistics+0x160/0x180 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/sta_info.c:417 sta_info_free+0xf4/0x170 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/sta_info.c:420 sta_info_free+0x162/0x170 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/main.c:235 ieee80211_bss_info_change_notify+0x2dd/0x2f0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:193 drv_conf_tx+0x1e5/0x250 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:316 drv_unassign_vif_chanctx+0x19d/0x1d0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.h:156 ieee80211_vif_cfg_change_notify+0x19c/0x1b0 [mac80211]
Dec 19 11:53:43 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.h:888 ieee80211_del_chanctx+0x1d6/0x1e0 [mac80211]
Dec 19 11:53:48 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:99 drv_remove_interface+0x137/0x150 [mac80211]
Dec 19 11:53:48 kernel: WARNING: CPU: 1 PID: 1328948 at net/mac80211/driver-ops.c:38 drv_stop+0x10f/0x120 [mac80211]
Dec 19 11:53:48 iwd[490]: src/station.c:station_enter_state() Old State: autoconnect_quick, new state: autoconnect_full
Dec 19 11:53:48 iwd[490]: src/scan.c:scan_periodic_start() Starting periodic scan for wdev 2
Dec 19 11:53:48 iwd[490]: src/wiphy.c:wiphy_radio_work_insert() Inserting work item 1466
Dec 19 11:53:48 iwd[490]: src/wiphy.c:wiphy_radio_work_done() Work item 1465 done
Dec 19 11:53:48 iwd[490]: src/wiphy.c:wiphy_radio_work_next() Starting work item 1466
Dec 19 11:53:48 iwd[490]: Received error during CMD_TRIGGER_SCAN: Network is down (100)
Dec 19 11:53:48 iwd[490]: src/netdev.c:netdev_link_notify() event 16 on ifindex 5
Dec 19 11:53:48 iwd[490]: src/station.c:station_free()
Dec 19 11:53:48 iwd[490]: src/netconfig.c:netconfig_destroy()
Dec 19 11:53:48 iwd[490]: src/scan.c:scan_periodic_stop() Stopping periodic scan for wdev 2
Dec 19 11:53:48 iwd[490]: src/scan.c:scan_cancel() Trying to cancel scan id 1466 for wdev 2
Dec 19 11:53:48 iwd[490]: src/scan.c:scan_cancel() Scan is already started
Dec 19 11:53:48 iwd[490]: src/wiphy.c:wiphy_radio_work_done() Work item 1466 done
Dec 19 11:53:48 iwd[490]: src/station.c:station_roam_state_clear() 5
Dec 19 11:53:55 kernel: qcom_mhi_qrtr mhi0_IPCR: 20: Failed to receive START channel command completion
Dec 19 11:53:55 kernel: qcom_mhi_qrtr: probe of mhi0_IPCR failed with error -5
Dec 19 11:54:16 kernel: mhi mhi0: Requested to power ON
Dec 19 11:54:16 kernel: mhi mhi0: Power on setup success
Dec 19 11:54:45 kernel: mhi mhi0: Device failed to enter MHI Ready
Dec 19 11:54:45 kernel: mhi mhi0: MHI did not enter READY state
Dec 19 11:54:45 kernel: ath11k_pci 0000:01:00.0: failed to power up mhi: -110
Dec 19 11:54:45 kernel: ath11k_pci 0000:01:00.0: failed to start mhi: -110
Dec 19 11:54:45 kernel: ath11k_pci 0000:01:00.0: already resetting count 3
Dec 19 11:55:27 kernel: mhi mhi0: Requested to power ON
Dec 19 11:55:27 kernel: mhi mhi0: Power on setup success
Dec 19 11:55:57 kernel: mhi mhi0: Device failed to enter MHI Ready
Dec 19 11:55:57 kernel: mhi mhi0: MHI did not enter READY state
Dec 19 11:55:57 kernel: ath11k_pci 0000:01:00.0: failed to power up mhi: -110
Dec 19 11:55:57 kernel: ath11k_pci 0000:01:00.0: failed to start mhi: -110
Dec 19 12:15:25 kernel: usb 4-1: USB disconnect, device number 3

Thanks,

James



2023-12-19 18:24:36

by Kalle Valo

Subject: Re: Ath11k warnings, and eventual phy going away requiring reboot

James Prestwood <[email protected]> writes:

> I noticed this after one of our devices dropped offline. The device
> had roamed 7 minutes prior, so I doubt that had anything to do with it.
> But then we get this, and then tons of warnings. I'm happy to provide
> full stack traces, but there are quite a few and I'm not sure which
> ones are relevant. After all the warnings, IWD got an RTNL del link
> event and was unable to recover from that. It seems ath11k then tried
> to power back on but failed.
>
> This is a stock 6.2 Ubuntu kernel, WCN6855:

BTW, I don't know how it is nowadays, but back in the day Ubuntu heavily
modified ath11k. And we can't support distro kernels anyway, as we don't
know what they have changed in the kernel.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-12-20 12:43:18

by James Prestwood

Subject: Re: Ath11k warnings, and eventual phy going away requiring reboot

Hi Kalle,

On 12/19/23 10:24 AM, Kalle Valo wrote:
> James Prestwood <[email protected]> writes:
>
>> I noticed this after one of our devices dropped offline. The device
>> had roamed 7 minutes prior, so I doubt that had anything to do with it.
>> But then we get this, and then tons of warnings. I'm happy to provide
>> full stack traces, but there are quite a few and I'm not sure which
>> ones are relevant. After all the warnings, IWD got an RTNL del link
>> event and was unable to recover from that. It seems ath11k then tried
>> to power back on but failed.
>>
>> This is a stock 6.2 Ubuntu kernel, WCN6855:
> BTW, I don't know how it is nowadays, but back in the day Ubuntu heavily
> modified ath11k. And we can't support distro kernels anyway, as we don't
> know what they have changed in the kernel.

Ok. I understand where you're coming from, but at the same time Ubuntu is
the largest Linux distribution, so it seems like these types of reports
wouldn't be uncommon. OTOH, maybe users just go directly to Ubuntu.
Anyway, the list of patches isn't huge according to the changelog
(for 6.2):

     - wifi: ath11k: fix registration of 6Ghz-only phy without the full channel
     - wifi: ath11k: add support default regdb while searching board-2.bin for
     - wifi: ath11k: fix memory leak in WMI firmware stats
     - wifi: ath11k: Add missing check for ioremap
     - wifi: ath11k: Ignore frags from uninitialized peer in dp.
     - wifi: ath11k: Fix SKB corruption in REO destination ring
     - wifi: ath11k: reduce the MHI timeout to 20s
     - wifi: ath11k: Use platform_get_irq() to get the interrupt
     - wifi: ath11k: fix SAC bug on peer addition with sta band migration
     - wifi: ath11k: fix deinitialization of firmware resources
     - wifi: ath11k: fix writing to unintended memory region
     - wifi: ath11k: Fix memory leak in ath11k_peer_rx_frag_setup
     - wifi: ath11k: fix monitor mode bringup crash
     - wifi: ath11k: debugfs: fix to work with multiple PCI devices
     - wifi: ath11k: allow system suspend to survive ath11k

I see the dilemma of not wanting to waste time debugging when there are
unknown changes applied. I was hoping someone would recognize the
behavior and could suggest a patch/kernel/firmware to try. I do see some
recent patches related to firmware powering down, which seem related. I
will also try the latest firmware that was just released.

Thanks,
James


2023-12-20 14:31:01

by Kalle Valo

Subject: Re: Ath11k warnings, and eventual phy going away requiring reboot

James Prestwood <[email protected]> writes:

> Hi Kalle,
>
> On 12/19/23 10:24 AM, Kalle Valo wrote:
>> James Prestwood <[email protected]> writes:
>>
>>> I noticed this after one of our devices dropped offline. The device
>>> had roamed 7 minutes prior, so I doubt that had anything to do with it.
>>> But then we get this, and then tons of warnings. I'm happy to provide
>>> full stack traces, but there are quite a few and I'm not sure which
>>> ones are relevant. After all the warnings, IWD got an RTNL del link
>>> event and was unable to recover from that. It seems ath11k then tried
>>> to power back on but failed.
>>>
>>> This is a stock 6.2 Ubuntu kernel, WCN6855:
>> BTW, I don't know how it is nowadays, but back in the day Ubuntu heavily
>> modified ath11k. And we can't support distro kernels anyway, as we don't
>> know what they have changed in the kernel.
>
> Ok. I understand where you're coming from, but at the same time Ubuntu
> is the largest Linux distribution, so it seems like these types of
> reports wouldn't be uncommon. OTOH, maybe users just go directly to
> Ubuntu.

Isn't the recommendation that distro kernel bugs should be reported to
distro bug trackers? At least that's my understanding, and our bugzilla
says so too:

https://bugzilla.kernel.org/

> Anyway, the list of patches isn't huge according to the changelog
> (for 6.2):
>
>      - wifi: ath11k: fix registration of 6Ghz-only phy without the full channel
>      - wifi: ath11k: add support default regdb while searching board-2.bin for
>      - wifi: ath11k: fix memory leak in WMI firmware stats
>      - wifi: ath11k: Add missing check for ioremap
>      - wifi: ath11k: Ignore frags from uninitialized peer in dp.
>      - wifi: ath11k: Fix SKB corruption in REO destination ring
>      - wifi: ath11k: reduce the MHI timeout to 20s
>      - wifi: ath11k: Use platform_get_irq() to get the interrupt
>      - wifi: ath11k: fix SAC bug on peer addition with sta band migration
>      - wifi: ath11k: fix deinitialization of firmware resources
>      - wifi: ath11k: fix writing to unintended memory region
>      - wifi: ath11k: Fix memory leak in ath11k_peer_rx_frag_setup
>      - wifi: ath11k: fix monitor mode bringup crash
>      - wifi: ath11k: debugfs: fix to work with multiple PCI devices
>      - wifi: ath11k: allow system suspend to survive ath11k
>
> I see the dilemma of not wanting to waste time debugging when there
> are unknown changes applied. I was hoping someone would recognize the
> behavior and could suggest a patch/kernel/firmware to try

Sure, I also get where you are coming from :) Just wanted to make sure
you know the problem with distro kernels.
