2022-07-20 21:43:35

by rtl8821cerfe2

Subject: rtw88: Doesn't work for 60-90 seconds several times a day

Hello.

I am unable to open any sites in Firefox for 60-90 seconds at a time.
On one occasion it was 156 seconds. Firefox gives up after 20 seconds
or so. NetworkManager reports "limited connectivity". The router doesn't
reply to pings. The journal shows that the laptop remains connected to
the router. This happens several times a day.

However, my IRC client seems to be unaffected. It never detected any
abnormally high lag during these events, not even the one that lasted
156 seconds. It checks the lag every 30 seconds. Also, the bot named
"phrik" from the #archlinux-offtopic channel reacts immediately when
I send it "!ping" during one of these events. (It sends back "pong".)
So I guess existing connections are not affected.

I have had this problem ever since support for RTL8821CE with RFE 2
was added. (The wifi card's RFE type is 2.)

Other devices connected to the same router don't have this problem.

The laptop and the router are in the same room. The distance
between them is about 3 meters.


These are the things I tried that did not help:

- The rtw88_core option disable_lps_deep=1

- `iw wlo1 set power_save off`

- Installing wireless-regdb and uncommenting my country in
/etc/conf.d/wireless-regdom

- Switching the router to "n only" mode. Previously it was in "b/g/n"
mode.

- Making the router use channel 9 instead of "auto". Left on "auto", it
selected channel 1 or 11 the few times I checked. Channel 9
seemed less crowded than those.

- Making the router use 40 MHz channel width instead of the "20/40"
setting. This doubled the speed but didn't help with my problem.

- Using the firmware from the rtl8821ce driver [0] (version 20.1.0)
instead of the one from linux-firmware (version 24.11.0). I used the
one that is 137616 bytes long.

The problem doesn't happen with the rtl8821ce driver, which is why I
extracted its firmware: to see whether it's a firmware issue.
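Two of the attempts above depend on module options, so it is worth confirming they actually took effect after a reboot or module reload. Bool module parameters are normally readable as "Y" or "N" under /sys/module/<module>/parameters/<name>, assuming the driver exposes them there; the ones of interest here are rtw88_core's disable_lps_deep and rtw88_pci's disable_aspm. A minimal sketch in C, where read_bool_param and parse_bool_param are hypothetical helper names, not part of rtw88:

```c
#include <stdio.h>

/* Hypothetical helper: parse a bool module parameter as printed by sysfs.
 * The kernel prints bool parameters as "Y\n" or "N\n".
 * Returns 1 for Y, 0 for N, -1 for anything unexpected. */
int parse_bool_param(const char *text)
{
	if (text == NULL || text[0] == '\0')
		return -1;
	if (text[1] != '\n' && text[1] != '\0')
		return -1;
	if (text[0] == 'Y')
		return 1;
	if (text[0] == 'N')
		return 0;
	return -1;
}

/* Hypothetical helper: read /sys/module/<module>/parameters/<param>.
 * Returns -1 if the module is not loaded, the parameter is not exposed
 * in sysfs (assumption: it is), or the path does not fit the buffer. */
int read_bool_param(const char *module, const char *param)
{
	char path[256];
	char buf[16] = {0};
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/module/%s/parameters/%s",
		     module, param);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_bool_param(buf);
}
```

For example, read_bool_param("rtw88_pci", "disable_aspm") should return 1 while the option is active and -1 if the module is not loaded; `cat` on the same path gives the same answer from a shell.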


Pinging the router all day seems to prevent this problem. Enabling all
the debug flags for rtw88_core also *may* prevent it. I'm not sure about
that.


Most of the time I don't have any bluetooth devices connected.
When I do, they don't cause problems.


I captured a bit of wifi traffic using another laptop, including two of
these events, and noticed something strange:

- rtw88 sends "Null function" telling the router it's going to sleep
- router immediately sends ack (after less than 1 ms)
- rtw88 resends "Null function" (same SN, Retry flag set)
- router immediately sends ack
- rtw88 resends
- router immediately sends ack
- rtw88 resends
- ...
- ...

rtw88 resends the "Null function" 3-4 times, even though the router
promptly acks each one. Then it sends a new "Null function" with a
different SN, and the process repeats. This seems to happen all the time,
not just when I can't open any pages in Firefox. The rtl8821ce driver
doesn't do this, but rtw88 with the old 20.1.0 firmware does. My phone
doesn't do this either.
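To make the pattern above easier to check in a capture: a Null function frame is an 802.11 data frame (type 2) with subtype 4, the "going to sleep" indication is the Power Management bit in the Frame Control flags, and a retransmission sets the Retry bit while keeping the same sequence number. A minimal C sketch that pulls those fields out of a raw MAC header, assuming the capture has already had any radiotap header stripped and the frame uses the standard 24-byte three-address header; decode_header and struct frame_info are illustrative names:

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from an 802.11 MAC header. */
struct frame_info {
	int is_null_func; /* data frame (type 2) with subtype 4 */
	int pwr_mgt;      /* Power Management bit: 1 = station is going to sleep */
	int retry;        /* Retry bit: 1 = retransmission of an earlier frame */
	uint16_t sn;      /* 12-bit sequence number */
};

/* Decode Frame Control and Sequence Control.
 * hdr must point at the MAC header itself (no radiotap header).
 * Returns 0 on success, -1 if the buffer is too short. */
int decode_header(const uint8_t *hdr, size_t len, struct frame_info *out)
{
	uint8_t fc0, fc1;
	uint16_t seq_ctrl;

	/* FC(2) + Duration(2) + Addr1/2/3 (3 x 6) + SeqCtrl(2) = 24 bytes */
	if (hdr == NULL || out == NULL || len < 24)
		return -1;

	fc0 = hdr[0];
	fc1 = hdr[1];

	/* Byte 0: bits 0-1 protocol version, bits 2-3 type, bits 4-7 subtype */
	out->is_null_func = ((fc0 >> 2) & 0x3) == 2 && ((fc0 >> 4) & 0xf) == 4;

	/* Byte 1 (flags): bit 3 Retry, bit 4 Power Management */
	out->retry = (fc1 >> 3) & 1;
	out->pwr_mgt = (fc1 >> 4) & 1;

	/* Sequence Control is little-endian at offset 22; SN is the upper 12 bits */
	seq_ctrl = (uint16_t)(hdr[22] | (hdr[23] << 8));
	out->sn = seq_ctrl >> 4;
	return 0;
}
```

In the sequence described above, every frame after the first in a group would decode with retry set and the same sn as the first one.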

I can provide the captures in private.


Currently I'm using the rtw88_pci option disable_aspm=1, because kernel
5.18 brought the freezes back. [1]


My laptop is an HP 250 G7 with a Core i3 7020U CPU.

The RTL8821CE wifi card is in an M.2 slot, not soldered to the
motherboard, even though the interface is named wlo1. It has one antenna,
in case that matters.

The router is a Fiberhome HG6544C.

The network is secured with WPA2 Personal.

The kernel version is 5.18.5-arch1-1.

The wifi firmware version is 24.11.0.

NetworkManager version is 1.38.2-1.

wpa_supplicant version is 2.10-4.

The operating system is Arch Linux.



Just out of curiosity, what is the C2H with ID 0x15? rtw88 doesn't
handle it, but the firmware sends it often.


[0] https://raw.githubusercontent.com/tomaspinho/rtl8821ce/be733dc86781c68571650b395dd0fa6b53c0a039/hal/rtl8821c/hal8821c_fw.c
[1] https://lore.kernel.org/linux-wireless/Te_PJvJjKCi-lK28Zu0d8VQG0AGdwTl6cJydYEETLbc3gN0l8liXH1DSOZnKxUHYGxavLBCs1sqos2e6jeiRzzO0RLRSISdWvTiiPp0v9kM=@protonmail.com/


2022-08-18 07:44:10

by rtl8821cerfe2

Subject: Re: rtw88: Doesn't work for 60-90 seconds several times a day

On Thursday, July 21st, 2022 at 12:35 AM, rtl8821cerfe2 <[email protected]> wrote:

Frank's recent message [0] got me thinking and digging again. I added some rtw_warn() calls and found something interesting:

2022-08-18T10:20:44.585943+0300 home wpa_supplicant[441]: wlo1: CTRL-EVENT-BEACON-LOSS
2022-08-18T10:20:44.592099+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_enqueue: sn: 128
2022-08-18T10:20:44.592997+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: src=f, sn=128, st=0
2022-08-18T10:20:44.593569+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: tx_report->queue: 128

2022-08-18T10:20:45.599099+0300 home wpa_supplicant[441]: wlo1: CTRL-EVENT-BEACON-LOSS
2022-08-18T10:20:45.602156+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_enqueue: sn: 132
2022-08-18T10:20:45.602924+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: src=f, sn=132, st=0
2022-08-18T10:20:45.603495+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: tx_report->queue: 132

2022-08-18T10:20:46.585960+0300 home wpa_supplicant[441]: wlo1: CTRL-EVENT-BEACON-LOSS
2022-08-18T10:20:46.592130+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_enqueue: sn: 136
2022-08-18T10:20:46.593381+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: src=f, sn=136, st=0
2022-08-18T10:20:46.594114+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: tx_report->queue: 136

2022-08-18T10:20:47.572620+0300 home wpa_supplicant[441]: wlo1: CTRL-EVENT-BEACON-LOSS
2022-08-18T10:20:47.575483+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_enqueue: sn: 140
2022-08-18T10:20:47.576486+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: src=f, sn=140, st=0
2022-08-18T10:20:47.577287+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: tx_report->queue: 140

2022-08-18T10:20:48.558864+0300 home wpa_supplicant[441]: wlo1: CTRL-EVENT-BEACON-LOSS
2022-08-18T10:20:48.562131+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_enqueue: sn: 144
2022-08-18T10:20:48.562394+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: src=f, sn=144, st=0
2022-08-18T10:20:48.562565+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: tx_report->queue: 144

2022-08-18T10:20:56.559084+0300 home wpa_supplicant[441]: wlo1: CTRL-EVENT-BEACON-LOSS
2022-08-18T10:20:56.562157+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_enqueue: sn: 148
2022-08-18T10:20:56.565498+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: src=f, sn=148, st=0
2022-08-18T10:20:56.566860+0300 home kernel: rtw_8821ce 0000:02:00.0: rtw_tx_report_handle: tx_report->queue: 148

It looks like the TX reports are coming in just a little too late. In my captures the router sends the ACK very quickly, and the card sends the TX reports pretty quickly after the requests are enqueued, so could it be that rtw88 isn't transmitting the frames that require a TX report quickly enough?
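To put numbers on how quickly the reports arrive, here are the gaps between the journal timestamps above, computed with a small C helper. The timestamps are copied from the log as microseconds past 10:20:00, using the first rtw_tx_report_handle line of each event. Journal timestamps include logging latency (and the BEACON-LOSS line comes from wpa_supplicant, not the kernel), so this only bounds the delays roughly. The struct and function names are illustrative:

```c
#include <stddef.h>

/* One BEACON-LOSS event from the log, as microseconds past 10:20:00. */
struct event {
	long beacon_loss_us; /* wpa_supplicant: CTRL-EVENT-BEACON-LOSS */
	long enqueue_us;     /* kernel: rtw_tx_report_enqueue */
	long report_us;      /* kernel: rtw_tx_report_handle (src=f line) */
};

static const struct event events[] = {
	{ 44585943, 44592099, 44592997 },
	{ 45599099, 45602156, 45602924 },
	{ 46585960, 46592130, 46593381 },
	{ 47572620, 47575483, 47576486 },
	{ 48558864, 48562131, 48562394 },
	{ 56559084, 56562157, 56565498 },
};

#define N_EVENTS (sizeof(events) / sizeof(events[0]))

/* Microseconds from the BEACON-LOSS log line to the enqueue log line. */
long loss_to_enqueue_us(size_t i)
{
	return events[i].enqueue_us - events[i].beacon_loss_us;
}

/* Microseconds from the enqueue log line to the report log line. */
long enqueue_to_report_us(size_t i)
{
	return events[i].report_us - events[i].enqueue_us;
}

/* Largest enqueue-to-report gap across the six events. */
long max_enqueue_to_report_us(void)
{
	long max = 0;
	size_t i;

	for (i = 0; i < N_EVENTS; i++) {
		long d = enqueue_to_report_us(i);

		if (d > max)
			max = d;
	}
	return max;
}
```

For these six events the report arrives 0.26 ms to 3.3 ms after the enqueue, and the enqueue comes 2.9 ms to 6.2 ms after CTRL-EVENT-BEACON-LOSS is logged.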

I forgot to mention in my previous message that I'm on 2.4 GHz.


[0] https://lore.kernel.org/linux-wireless/[email protected]/

2022-12-28 23:06:52

by Bitterblue Smith

Subject: Re: rtw88: Doesn't work for 60-90 seconds several times a day

On 21/07/2022 00:35, rtl8821cerfe2 wrote:

A symptom I forgot to mention: with rtw88, wpa_supplicant fills the
journal with lots and lots of CTRL-EVENT-BEACON-LOSS messages.

I had another look at this problem recently and discovered that disabling
DIG (dynamic initial gain) and CCK PD (CCK packet detection) in the
dynamic mechanism "fixes" it. Making them write the default values
instead of what they calculate also works:

diff --git a/phy.c b/phy.c
index 5753462..a4e2cd2 100644
--- a/phy.c
+++ b/phy.c
@@ -241,6 +241,8 @@ void rtw_phy_dig_write(struct rtw_dev *rtwdev, u8 igi)
 	for (path = 0; path < hal->rf_path_num; path++) {
 		addr = chip->dig[path].addr;
 		mask = chip->dig[path].mask;
+		if (chip->id == RTW_CHIP_TYPE_8821C)
+			igi = 0x20;
 		rtw_write32_mask(rtwdev, addr, mask, igi);
 	}
 }
@@ -746,6 +748,9 @@ static void rtw_phy_cck_pd(struct rtw_dev *rtwdev)
 	if (level >= CCK_PD_LV_MAX)
 		return;
 
+	if (chip->id == RTW_CHIP_TYPE_8821C)
+		level = CCK_PD_LV0;
+
 	if (chip->ops->cck_pd_set)
 		chip->ops->cck_pd_set(rtwdev, level);
 }

I don't understand why this helps. rtw88 calculates more or less the
same initial gain and cckpd level as the vendor driver.
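For context on what the patch above switches off: DIG periodically adjusts the initial gain index (IGI) based on the false alarm count, and rtw_phy_dig_write() writes the result, so the patch effectively replaces that result with a constant 0x20. As far as I can tell, rtw88 takes the false alarm count from dm_info->total_fa_cnt. The sketch below is a heavily simplified, hypothetical DIG step, not the rtw88 algorithm: struct dig_params, dig_step, the thresholds, the step size and the clamp limits are all placeholders.

```c
#include <stdint.h>

/* Hypothetical DIG parameters; real drivers use per-chip tables. */
struct dig_params {
	uint32_t fa_high; /* false alarms per period above which IGI is raised */
	uint32_t fa_low;  /* false alarms per period below which IGI is lowered */
	uint8_t step;     /* IGI change per period */
	uint8_t igi_min;  /* lower clamp */
	uint8_t igi_max;  /* upper clamp (placeholder; a real driver may derive it from RSSI) */
};

/* One simplified DIG period: move IGI by one step depending on the
 * false alarm count, then clamp it to [igi_min, igi_max]. */
uint8_t dig_step(uint8_t igi, uint32_t fa_cnt, const struct dig_params *p)
{
	int next = igi;

	if (fa_cnt > p->fa_high)
		next += p->step; /* noisy: raise IGI to reject more noise */
	else if (fa_cnt < p->fa_low)
		next -= p->step; /* quiet: lower IGI to regain sensitivity */

	if (next < p->igi_min)
		next = p->igi_min;
	if (next > p->igi_max)
		next = p->igi_max;
	return (uint8_t)next;
}
```

With placeholder parameters, IGI climbs by one step per period while false alarms stay above fa_high and stops at igi_max; between the two thresholds it stays where it is.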

I implemented the DC cancellation in rtw88, hoping it would help. It
didn't.

I found this mistake in the false alarm code. Fixing it didn't help:

diff --git a/rtw8821c.c b/rtw8821c.c
index 7b624ec..e9c6d46 100644
--- a/rtw8821c.c
+++ b/rtw8821c.c
@@ -683,9 +685,9 @@ static void rtw8821c_false_alarm_statistics(struct rtw_dev *rtwdev)
 
 	dm_info->cck_fa_cnt = cck_fa_cnt;
 	dm_info->ofdm_fa_cnt = ofdm_fa_cnt;
+	dm_info->total_fa_cnt = ofdm_fa_cnt;
 	if (cck_enable)
 		dm_info->total_fa_cnt += cck_fa_cnt;
-	dm_info->total_fa_cnt = ofdm_fa_cnt;
 
 	crc32_cnt = rtw_read32(rtwdev, REG_CRC_CCK);
 	dm_info->cck_ok_cnt = FIELD_GET(GENMASK(15, 0), crc32_cnt);
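The effect of the ordering bug, reduced to plain C outside the driver. The function names are made up and the variables only mirror the dm_info fields in the diff. Before the fix, the CCK false alarms are added and then immediately overwritten, so the total only ever reflects OFDM false alarms:

```c
#include <stdbool.h>
#include <stdint.h>

/* Order before the fix: the += is overwritten by the assignment after it,
 * so CCK false alarms never reach the total. */
uint32_t total_fa_before_fix(uint32_t ofdm_fa_cnt, uint32_t cck_fa_cnt,
			     bool cck_enable)
{
	uint32_t total_fa_cnt = 0;

	if (cck_enable)
		total_fa_cnt += cck_fa_cnt;
	total_fa_cnt = ofdm_fa_cnt;
	return total_fa_cnt;
}

/* Order after the fix: start from the OFDM count, then add the CCK count
 * only when CCK is enabled. */
uint32_t total_fa_after_fix(uint32_t ofdm_fa_cnt, uint32_t cck_fa_cnt,
			    bool cck_enable)
{
	uint32_t total_fa_cnt = ofdm_fa_cnt;

	if (cck_enable)
		total_fa_cnt += cck_fa_cnt;
	return total_fa_cnt;
}
```

On 2.4 GHz, where CCK is normally enabled, the two orders give different totals whenever there are CCK false alarms.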

What else should I check?