Hello,
After updating to Ubuntu 21.04 I found two problems related to the BRCMF_C_GET_ASSOCLIST using an older BCM4329 SDIO WiFi.
1. The kernel is spammed with:
ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST unsupported, err=-52
ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST unsupported, err=-52
ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST unsupported, err=-52
This apparently happens because a newer NetworkManager version polls dump_station() periodically. I sent [1], which fixes this noise.
[1] https://patchwork.kernel.org/project/linux-wireless/list/?series=480715
2. The other, much worse problem is that WiFi now eventually dies with these errors:
...
ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST unsupported, err=-52
brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST unsupported, err=-110
ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
From this point on, all firmware calls fail with err=-110 and WiFi no longer works. This problem is reproducible with 5.13-rc and current -next; I haven't checked older kernel versions. Somehow it's worse with a recent -next, where WiFi dies quicker.
Interestingly, there is always a pending signal in brcmf_sdio_dcmd_resp_wait() when the timeout happens. The timeout seems to coincide with access to a swap partition, which stalls the system for a second or two, but I'm not 100% sure of this. Increasing DCMD_RESP_TIMEOUT doesn't help.
Please let me know if you have any ideas on how to fix this trouble properly or if you need any more info.
Removing BRCMF_C_GET_ASSOCLIST firmware call entirely from the driver fixes the problem.
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index f4405d7861b6..6327cb38d6ec 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -2886,22 +2886,6 @@ brcmf_cfg80211_dump_station(struct wiphy *wiphy, struct net_device *ndev,
 	brcmf_dbg(TRACE, "Enter, idx %d\n", idx);
-	if (idx == 0) {
-		cfg->assoclist.count = cpu_to_le32(BRCMF_MAX_ASSOCLIST);
-		err = brcmf_fil_cmd_data_get(ifp, BRCMF_C_GET_ASSOCLIST,
-					     &cfg->assoclist,
-					     sizeof(cfg->assoclist));
-		if (err) {
-			bphy_err(drvr, "BRCMF_C_GET_ASSOCLIST unsupported, err=%d\n",
-				 err);
-			cfg->assoclist.count = 0;
-			return -EOPNOTSUPP;
-		}
-	}
-	if (idx < le32_to_cpu(cfg->assoclist.count)) {
-		memcpy(mac, cfg->assoclist.mac[idx], ETH_ALEN);
-		return brcmf_cfg80211_get_station(wiphy, ndev, mac, sinfo);
-	}
 	return -ENOENT;
 }
27.05.2021 19:42, Arend van Spriel wrote:
> On 5/26/2021 5:10 PM, Dmitry Osipenko wrote:
>> Hello,
>>
>> After updating to Ubuntu 21.04 I found two problems related to the
>> BRCMF_C_GET_ASSOCLIST using an older BCM4329 SDIO WiFi.
>>
>> 1. The kernel is spammed with:
>>
>> ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST
>> unsupported, err=-52
>> ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST
>> unsupported, err=-52
>> ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST
>> unsupported, err=-52
>>
>> This apparently happens because a newer NetworkManager version
>> polls dump_station() periodically. I sent [1], which fixes this noise.
>>
>> [1]
>> https://patchwork.kernel.org/project/linux-wireless/list/?series=480715
>
> Right. I noticed this one and did not have anything to add to the
> review/suggestion.
Please feel free to add your r-b to the patches if they look good to you.
>> 2. The other, much worse problem is that WiFi now eventually dies
>> with these errors:
>>
>> ...
>> ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST
>> unsupported, err=-52
>> brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
>> ieee80211 phy0: brcmf_cfg80211_dump_station: BRCMF_C_GET_ASSOCLIST
>> unsupported, err=-110
>> ieee80211 phy0: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
>> failed w/status -110
>>
>> From this point on, all firmware calls fail with err=-110 and WiFi
>> no longer works. This problem is reproducible with 5.13-rc and
>> current -next; I haven't checked older kernel versions. Somehow
>> it's worse with a recent -next, where WiFi dies quicker.
>>
>> Interestingly, there is always a pending signal in
>> brcmf_sdio_dcmd_resp_wait() when the timeout happens. The timeout
>> seems to coincide with access to a swap partition, which stalls the
>> system for a second or two, but I'm not 100% sure of this. Increasing
>> DCMD_RESP_TIMEOUT doesn't help.
>
> The timeout error (-110) can have two root causes that I am aware of.
> Either the firmware died or the SDIO layer has gone haywire. Not sure if
> that swap partition is on an eMMC device, but if so it could be related.
> You could try generating a device coredump. If that also gives -110
> errors, we know it is the SDIO layer.
Coredump is a good idea, thank you. The swap partition is on an external
SD card; everything else is on eMMC.
>> Please let me know if you have any ideas on how to fix this trouble
>> properly or if you need any more info.
>>
>> Removing BRCMF_C_GET_ASSOCLIST firmware call entirely from the driver
>> fixes the problem.
>
> My guess is that reducing interaction with the firmware is what avoids
> the issue, rather than this specific firmware command. As always, it
> is good to know the conditions in which the issue occurs. What is the
> hardware platform you are running Ubuntu on? Stuff like that.
That's an older Acer A500 NVIDIA Tegra20 tablet device [1]. I may also
try to reproduce the problem on a Tegra30 Nexus 7 with BCM4330.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/tegra20-acer-a500-picasso.dts
Thank you very much for the suggestions. I will try to collect more info
and come back with the report.
28.05.2021 01:47, Dmitry Osipenko wrote:
> Thank you very much for the suggestions. I will try to collect more info
> and come back with the report.
>
I have been testing this for the past few weeks, and the problem is no
longer reproducible. Apparently something got fixed in linux-next. I
haven't tried to bisect the fix since it's a bit too painful to do.
There are still occasional -110 errors when the system stalls on memory
swapping, but WiFi keeps working now.