2023-06-15 07:20:36

by Bagas Sanjaya

Subject: Fwd: ath11k: QCN9074: ce desc not available for wmi command

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> Hello,
>
> We are trying to connect 2x QCN9074 together (one as AP, the other as client).
>
> Using Ubuntu 22.04 hwe 5.19 generic kernel allows to pair both units in 802.11ac 80MHz only. Any other combination of 802.11ax or ac/ax with 160MHz bandwidth does not work. The client kernel freezes when associating to QCN9074 AP without specific logs and requires reboot. I'll post another bug once I can get more logs.
>
> Since quite some patches came through since 5.19 - some of them related to 160MHz bandwidth. I tried multiple newer mainline kernels without success and usually the same error.
>
> Building kernel from latest ath master branch: 6.4.0-rc4-wt-ath+ gives the following dmesg output:
>
> [ 353.587072] ath11k_pci 0000:04:00.0: BAR 0: assigned [mem 0xa4200000-0xa43fffff 64bit]
> [ 353.587180] ath11k_pci 0000:04:00.0: MSI vectors: 1
> [ 353.587186] ath11k_pci 0000:04:00.0: qcn9074 hw1.0
> [ 353.741799] mhi mhi0: Requested to power ON
> [ 353.741806] mhi mhi0: Power on setup success
> [ 353.912479] mhi mhi0: Wait for device to enter SBL or Mission mode
> [ 354.007221] ath11k_pci 0000:04:00.0: chip_id 0x0 chip_family 0x0 board_id 0xff soc_id 0xffffffff
> [ 354.007225] ath11k_pci 0000:04:00.0: fw_version 0x2403072e fw_build_timestamp 2021-06-06 23:27 fw_build_id
> [ 355.333791] ath11k_pci 0000:04:00.0: leaving PCI ASPM disabled to avoid MHI M2 problems
> [ 355.729786] ath11k_pci 0000:04:00.0 wlp4s0: renamed from wlan0
> [ 358.960477] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 36866
> [ 358.960481] ath11k_pci 0000:04:00.0: failed to send WMI_STA_POWERSAVE_PARAM_CMDID
> [ 358.960484] ath11k_pci 0000:04:00.0: could not set uapsd params -105
> [ 358.960485] ath11k_pci 0000:04:00.0: failed to set sta uapsd: -105
> [ 362.032472] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 90113
> [ 362.032477] ath11k_pci 0000:04:00.0: failed to send WMI_REQUEST_STATS cmd
> [ 362.032479] ath11k_pci 0000:04:00.0: could not request fw stats (-105)
> [ 362.032480] ath11k_pci 0000:04:00.0: failed to request fw pdev stats: -105
> [ 365.104479] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 20482
> [ 365.104483] ath11k_pci 0000:04:00.0: failed to submit WMI_VDEV_DELETE_CMDID
> [ 365.104485] ath11k_pci 0000:04:00.0: failed to delete WMI vdev 0: -105
> [ 365.104487] ath11k_pci 0000:04:00.0: failed to delete vdev 0: -105
> [ 368.176472] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 16387
> [ 368.176476] ath11k_pci 0000:04:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 368.176479] ath11k_pci 0000:04:00.0: failed to enable PMF QOS: (-105
> [ 371.248474] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 16387
> [ 371.248478] ath11k_pci 0000:04:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 371.248480] ath11k_pci 0000:04:00.0: failed to enable PMF QOS: (-105
> [ 374.320393] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 16387
> [ 374.320397] ath11k_pci 0000:04:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> [ 374.320400] ath11k_pci 0000:04:00.0: failed to enable PMF QOS: (-105
>
> Both PCs are Intel x86 (same bug for AMD). We have multiple references of QCN9074 that we tested:
> - Sparklan WPEQ-405AX (our preferred one, as they are the only vendor I know that went through FCC certification) - This unit can associate to APs only with the firmware Sparklan provided. Otherwise, link strength is reported low and barely no APs are listed after scanning.
> - Emwicon WMX7406 - has better performances with Sparklan's vendor FW. Works with ath11k-firmware 2.7.0.1 but shows lower TX mostly.
>
> Tested FW (non exhaustive):
> # ath11k-firmware 2.5.0.1
> 823915206101779f8cab6b89066e1040 /lib/firmware/ath11k/QCN9074/hw1.0/amss.bin
> 668f53050a92db5b4281ae5f26c7e35d /lib/firmware/ath11k/QCN9074/hw1.0/board-2.bin
> fcca36959c5f56f9f0fb7015083dc806 /lib/firmware/ath11k/QCN9074/hw1.0/m3.bin
>
> # ath11k-firmware 2.7.0.1
> 465d0a063d049f7e4b79d267a035c6c7 /lib/firmware/ath11k/QCN9074/hw1.0/amss.bin
> 668f53050a92db5b4281ae5f26c7e35d /lib/firmware/ath11k/QCN9074/hw1.0/board-2.bin
> ad8fafb9c1deab744c972469be916e72 /lib/firmware/ath11k/QCN9074/hw1.0/m3.bin
>
> # Vendor firmware
> 1e88ff2e2b5bcf7f130397cb5b21ef39 /lib/firmware/ath11k/QCN9074/hw1.0/amss.bin
> 7b3ce8686713a724946466ec1fefc2f4 /lib/firmware/ath11k/QCN9074/hw1.0/board.bin
> d0a6f7ccd52f9e3886f0bc96309f7b9a /lib/firmware/ath11k/QCN9074/hw1.0/m3.bin
>
>
> Attached dmesg log with ath11k debug_mask=0xFFFF and lspci.
>
> Thank you

See Bugzilla for the full thread and attached dmesg.

Manikanta: This regression is apparently caused by a commit of yours.
Would you like to take a look at it?

Anyway, I'm adding it to regzbot:

#regzbot introduced: 13aa2fb692d371 https://bugzilla.kernel.org/show_bug.cgi?id=217536
#regzbot title: Threaded NAPI causes ce desc unavailable error on ath11k

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217536

--
An old man doll... just what I always wanted! - Clara


Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Hmmm, afaics there was no real progress and not even a single reply from
a developer (neither here nor in bugzilla) since the issue was reported
~10 days ago. :-/

Manikanta, did you maybe just miss that this is caused by a change of
yours (and thus is something you should look into)?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 15.06.23 09:07, Bagas Sanjaya wrote:
> [...]

2023-07-11 07:32:53

by Kalle Valo

Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command

"Linux regression tracking (Thorsten Leemhuis)"
<[email protected]> writes:

> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Hmmm, there afaics was no real progress and not even a single reply from
> a developer (neither here or in bugzilla) since the issue was reported
> ~10 days ago. :-/
>
> Manikanta, did you maybe just miss that this is caused by change of
> yours (and thus is something you should look into)?

No reply from Manikanta so I think I'll just revert the commit. I have
assigned bug #217536 to me now.

The wireless trees are closed for July but my plan is that I submit the
revert directly to net tree.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-07-25 10:55:10

by Manikanta Pubbisetty

Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command

On 6/26/2023 6:19 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Hmmm, there afaics was no real progress and not even a single reply from
> a developer (neither here or in bugzilla) since the issue was reported
> ~10 days ago. :-/
>
> Manikanta, did you maybe just miss that this is caused by change of
> yours (and thus is something you should look into)?
>

Extremely sorry for having missed this due to incorrect mail filters on
my machine. I have looked at the logs attached to the Bugzilla report.

From the logs, the issue appears to happen during boot.
Generally, errors like "ce desc not available for wmi command"
occur when there is no room in the copy engine pipe for the driver to
enqueue the command to the firmware, and in many cases this happens
when the firmware is reaping the ring slowly.
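
As a rough illustration of the failure mode described above (not the ath11k
implementation): a fixed-size ring rejects a new command once every
descriptor is in use and the consumer has not reaped any. `ToyRing`, its
methods, and the ring size of 2 are made up for this sketch; the only
details taken from the thread are the command IDs and the -105 return
value, which on x86 Linux corresponds to -ENOBUFS.

```python
import errno
from collections import deque


class ToyRing:
    """Toy fixed-size command ring: the host enqueues, the firmware reaps."""

    def __init__(self, size):
        self.size = size
        self.slots = deque()

    def send(self, cmd_id):
        # No free descriptor left: refuse the command with -ENOBUFS.
        if len(self.slots) >= self.size:
            return -errno.ENOBUFS
        self.slots.append(cmd_id)
        return 0

    def reap(self, count=1):
        # The consumer side frees up to `count` descriptors, oldest first.
        reaped = []
        for _ in range(min(count, len(self.slots))):
            reaped.append(self.slots.popleft())
        return reaped


ring = ToyRing(size=2)

# Nothing is reaped between these sends, so the third one finds the ring full.
results = [ring.send(cmd) for cmd in (36866, 90113, 20482)]

# Once the consumer reaps a single descriptor, the same command goes through.
ring.reap()
retry = ring.send(20482)
```

In this model the sender only recovers when the consumer makes progress,
which is why a slow or stalled reaper shows up as repeated send failures.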

It is puzzling that threaded NAPI is causing this and that reverting
it fixed the issue. NAPI generally acts on the RX rings and has
nothing to do with TX.

Hi Sanjay,

Is this issue seen with the kernel upgrade alone, or has the firmware
also been upgraded?

Meanwhile, I'll try to reproduce the issue on my local setup and
root-cause the problem. Please let me know the firmware version that was
used for testing.

Although I'm okay with reverting the threaded NAPI patch for now, in the
long run we want it back, as threaded NAPI brings a significant
throughput improvement.

Thanks,
Manikanta

Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command

On 25.07.23 11:17, Manikanta Pubbisetty wrote:
> On 6/26/2023 6:19 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
>>
>> Hmmm, there afaics was no real progress and not even a single reply from
>> a developer (neither here or in bugzilla) since the issue was reported
>> ~10 days ago. :-/

BTW: Kalle, many thx for picking this up and posting & applying the revert!

>> Manikanta, did you maybe just miss that this is caused by change of
>> yours (and thus is something you should look into)?
>
> Extremely sorry for having this missed [...]
>
> Hi Sanjay, [...]

FWIW, Bagas Sanjaya just forwarded the report and the reporter is not
CCed afaics (bugzilla privacy policy does not allow this, which
complicates things a lot :-/ ). You have to use bugzilla to reach the
reporter: https://bugzilla.kernel.org/show_bug.cgi?id=217536

Bagas Sanjaya: I wonder if you should make that "I'm just forwarding"
aspect more obvious in your mails. And afaics it would also be good to
mention the author of the culprit quite early in your mails, as there
is a risk that people will miss that aspect otherwise.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

2023-07-26 11:19:50

by Manikanta Pubbisetty

Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command


On 7/26/2023 2:51 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 25.07.23 11:17, Manikanta Pubbisetty wrote:
>> On 6/26/2023 6:19 PM, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>
>>> Hmmm, there afaics was no real progress and not even a single reply from
>>> a developer (neither here or in bugzilla) since the issue was reported
>>> ~10 days ago. :-/
>
> BTW: Kalle, many thx for picking this up and posting & applying the revert!
>
>>> Manikanta, did you maybe just miss that this is caused by change of
>>> yours (and thus is something you should look into)?
>>
>> Extremely sorry for having this missed [...]
>>
>> Hi Sanjay, [...]
>
> FWIW, Bagas Sanjaya just forwarded the report and the reporter is not
> CCed afaics (bugzilla privacy policy does not allow this, which
> complicates things a lot :-/ ). You have to use bugzilla to reach the
> reporter: https://bugzilla.kernel.org/show_bug.cgi?id=217536
>

Sure, thanks Thorsten.

2023-09-05 16:24:49

by Tyler Stachecki

Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command

> On 25.07.23 11:17, Manikanta Pubbisetty wrote:
>
> FWIW, Bagas Sanjaya just forwarded the report and the reporter is not
> CCed afaics (bugzilla privacy policy does not allow this, which
> complicates things a lot :-/ ). You have to use bugzilla to reach the
> reporter: https://bugzilla.kernel.org/show_bug.cgi?id=217536

Hi Manikanta,

I just wanted to report that this is likely related to QCN9074 when the host
system only has 1 MSI-X vector available for the modem and/or related to a
product named "WPEQ-405AX".

I have two different hosts running the exact same kernel, same QCN9074
firmware (WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1), etc. The only
differences are that the one which does not work is running on a slightly
older Intel SBC, with the older one leveraging mPCIe instead of PCIe and
only having one MSI-X vector.

I tried backing out the threaded NAPI commit and, as mentioned, everything
begins working again on the host with 1 MSI-X vector. I have also seen some
other oddities on the system with only 1 MSI-X vector, such as
the modem not working when I boot with hpet=disable. I am guessing it is
not related, but mentioning it just in case.

The only other thing I'll mention is that the CE desc errors are *only* seen
after upping the link (via `ip link set wlp1s0 up`). After this point, doing
something as simple as reading the temperature of the modem fails and the
kernel log starts printing the errors described above. Prior to that, however,
no error messages are seen.
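
For anyone comparing their own logs against the report, here is a small
sketch that pulls the MSI vector count and the failing WMI command IDs out
of dmesg text, plus the spacing between failures. The sample lines are
copied from the Bugzilla report quoted earlier in this thread; the function
name `summarize` and the regexes are illustrative and assume the message
wording stays exactly as shown.

```python
import re

# Sample lines copied from the dmesg output in the quoted report.
LOG = """\
[ 353.587180] ath11k_pci 0000:04:00.0: MSI vectors: 1
[ 358.960477] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 36866
[ 362.032472] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 90113
[ 365.104479] ath11k_pci 0000:04:00.0: ce desc not available for wmi command 20482
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
MSI = re.compile(r"MSI vectors: (\d+)")
CE = re.compile(r"ce desc not available for wmi command (\d+)")


def summarize(text):
    """Return (msi_vectors, [(timestamp, cmd_id), ...], [gap_seconds, ...])."""
    msi_vectors = None
    failures = []
    for raw in text.splitlines():
        line = LINE.match(raw)
        if line is None:
            continue  # not a timestamped kernel log line
        stamp, message = float(line.group(1)), line.group(2)
        msi = MSI.search(message)
        ce = CE.search(message)
        if msi is not None:
            msi_vectors = int(msi.group(1))
        elif ce is not None:
            failures.append((stamp, int(ce.group(1))))
    # Time between consecutive failures, rounded to milliseconds.
    gaps = [round(b[0] - a[0], 3) for a, b in zip(failures, failures[1:])]
    return msi_vectors, failures, gaps


msi_vectors, failures, gaps = summarize(LOG)
```

On the quoted log this reports a single MSI vector and failures spaced
about 3.07 seconds apart, which makes it easy to check whether another
system shows the same pattern.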

I'm happy to be of service to test any changes you might suggest. Thanks
for the threaded NAPI work, by the way - it definitely provides a boost!

Regards,
Tyler

2023-10-26 23:06:56

by Tyler Stachecki

Subject: Re: Fwd: ath11k: QCN9074: ce desc not available for wmi command

On Mon, Sep 4, 2023 at 5:47 AM Manikanta Pubbisetty
<[email protected]> wrote:
> > Hi Manikanta,
> >
> > I just wanted to report that this is likely related to QCN9074 when the host
> > system only has 1 MSI-X vector available for the modem and/or related to a
> > product named "WPEQ-405AX".
> >
> > I have two different hosts running the exact same kernel, same QCN9074
> > firmware (WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1), etc. The only
> > differences are that the one which does not work is running on a slightly
> > older Intel SBC, with the older one leveraging mPCIe instead of PCIe and
> > only having one MSI-X vector.
>
> Yes, you are right. This seems to be a problem with some hardware having
> QCN9074. We have tried to reproduce this problem in QC on different
> hardware but could not reproduce it even once. Not even with one MSI vector.

Just as a heads up, this "Sparklan WPEQ-405AX" version of QCN9074 may
be adding to some of the confusion here and so there may be two
problems. As mentioned previously, CE desc errors stopped after
reverting the threaded NAPI patch. However, there's something odd
about this modem - it does not work with the board-2.bin that Kalle
provides, as the OP noted.

Upon request, the vendor of this modem provides a board.bin for
WPEQ-405AX compatible with a copy of amss/m3 which appears to be
WLAN.HK.2.4.0.1-01838-QCAHKSWPL_SILICONZ-1 (based on checksums). I
found out the hard way that the ABI of the BDFs changed for
WLAN.HK.2.5.0 firmwares and beyond, making them incompatible
with the board.bin that the vendor provides, which is
unfortunate as the vendor would not supply a BDF built against 2.5.0+
when requested.
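
A sketch of the kind of checksum comparison mentioned above, assuming the
32-hex-digit values in the report are MD5 digests (the report does not name
the algorithm). The table holds only the amss.bin digests listed in the
report; `md5_of` and `identify_amss` are hypothetical helper names.

```python
import hashlib

# amss.bin digests as listed in the Bugzilla report quoted in this thread.
# Assumption: they are MD5 digests, judging by their length.
KNOWN_AMSS = {
    "823915206101779f8cab6b89066e1040": "ath11k-firmware 2.5.0.1",
    "465d0a063d049f7e4b79d267a035c6c7": "ath11k-firmware 2.7.0.1",
    "1e88ff2e2b5bcf7f130397cb5b21ef39": "vendor firmware",
}


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large firmware images are not read at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identify_amss(path):
    """Return the label from the report for this file, or None if unknown."""
    return KNOWN_AMSS.get(md5_of(path))


# Example call (path as given in the report):
# print(identify_amss("/lib/firmware/ath11k/QCN9074/hw1.0/amss.bin"))
```

A `None` result only means the file is not one of the three images listed
in the report; it says nothing about which release it actually is.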

The board-2.bin that Kalle provides allows the modem to start, but it
fails to associate or really do anything useful beyond announcing an SSID
in AP mode.

Anyways: I think it's mostly just an issue with the IRQ affinity --
maybe the threaded NAPI patch is changing it somehow...

Cheers,
Tyler