Sascha an Ping-Ke,
I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
code matches the code found in wireless-next. I created 9 files of 5.8 GiB each
and used a for loop to copy them from the test computer to/from my server. The
wireless connection is on the 5 GHz band (channel 153) connected to an ax1500
Wifi 6 router, which in turn is connected to the server via a 1G ethernet cable.
The connection has not crashed, but I see strange behavior.
With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s for TX
for about 1/3 of the time, but then the driver reports "timed out to flush queue
3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX. These low rates
are in effect for 2/3 of the time. The 5G bands are relatively unused in my
house, thus I do not suspect interference.
Any idea why the timeout errors are occurring? They definitely kill the
throughput. If you are interested, I have a photo of the TX throughput vs time.
Larry
Hi Larry,
On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> Sascha an Ping-Ke,
>
> I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
> code matches the code found in wireless-next. I created 9 files of 5.8 GiB
> each and used a for loop to copy them from the test computer to/from my
> server. The wireless connection is on the 5 GHz band (channel 153) connected
> to an ax1500 Wifi 6 router, which in turn is connected to the server via a
> 1G ethernet cable. The connection has not crashed, but I see strange
> behavior.
What chipset are you using? Is it a RTL8822bu or some other chipset
reported by the driver?
>
> With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
> for TX for about 1/3 of the time, but then the driver reports "timed out to
> flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
> These low rates are in effect for 2/3 of the time. The 5G bands are
> relatively unused in my house, thus I do not suspect interference.
I've received a very similar report this weekend. About 3-4 messages per
second, "timed out to flush queue 3", but driver continues to work.
I've also seen it this morning by accident and once again while writing
this mail. This was on a RTL8821CU.
So far I have no idea what the problem might be.
Sascha
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
> Hi Larry,
>
> On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> > Sascha an Ping-Ke,
> >
> > I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
> > code matches the code found in wireless-next. I created 9 files of 5.8 GiB
> > each and used a for loop to copy them from the test computer to/from my
> > server. The wireless connection is on the 5 GHz band (channel 153) connected
> > to an ax1500 Wifi 6 router, which in turn is connected to the server via a
> > 1G ethernet cable. The connection has not crashed, but I see strange
> > behavior.
>
> What chipset are you using? Is it a RTL8822bu or some other chipset
> reported by the driver?
>
> >
> > With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
> > for TX for about 1/3 of the time, but then the driver reports "timed out to
> > flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
> > These low rates are in effect for 2/3 of the time. The 5G bands are
> > relatively unused in my house, thus I do not suspect interference.
>
> I've received a very similar report this weekend. About 3-4 messages per
> second, "timed out to flush queue 3", but driver continues to work.
> I've also seen it this morning by accident and once again while writing
> this mail. This was on a RTL8821CU.
>
> So far I have no idea what the problem might be.
The "timed out to flush queue %d\n" message comes from
__rtw_mac_flush_prio_queue(). Here some registers are read which show
the number of reserved pages for a queue and the number of available
pages of a queue. I used the debugfs interface to observe these
registers from time to time:
f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo "$i 4" > $f; cat $f; done
This is what they show:
reg 0x230: 0x00230040
reg 0x234: 0x00400040
reg 0x238: 0x00400040
reg 0x23c: 0x00000000
The upper 16bit contain the number of available pages and the lower
16bit contain the number of reserved pages (Note these are the registers
on a RTL8822CU, on other chipsets the number of available pages is
lower, like 0x10 on RTL8821CU). Register 0x230 is the interesting one
for us, it has the values for queue 3.
What I can see is that for the other queues the number of reserved pages
usually matches the number of available pages. It happens sometimes that
the number of available pages goes down to 0x3f, but with the next
register read it goes back to 0x40. For 0x230 this is different though.
Here the number of available pages continuously decreases over time and
never goes back up.
I don't know what this is trying to tell me. It seems that things queued
to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
Unfortunately I also don't know how the different priority queues relate
to the different USB endpoints and how these in turn go together with
the qsel settings. Maybe Ping-Ke can shed some light on this.
Sascha
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> -----Original Message-----
> From: Sascha Hauer <[email protected]>
> Sent: Monday, March 6, 2023 9:00 PM
> To: Larry Finger <[email protected]>
> Cc: Ping-Ke Shih <[email protected]>; linux-wireless <[email protected]>
> Subject: Re: Performance of rtw88_8822bu
>
> On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
> > Hi Larry,
> >
> > On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> > > Sascha an Ping-Ke,
> > >
> > > I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
> > > code matches the code found in wireless-next. I created 9 files of 5.8 GiB
> > > each and used a for loop to copy them from the test computer to/from my
> > > server. The wireless connection is on the 5 GHz band (channel 153) connected
> > > to an ax1500 Wifi 6 router, which in turn is connected to the server via a
> > > 1G ethernet cable. The connection has not crashed, but I see strange
> > > behavior.
> >
> > What chipset are you using? Is it a RTL8822bu or some other chipset
> > reported by the driver?
> >
> > >
> > > With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
> > > for TX for about 1/3 of the time, but then the driver reports "timed out to
> > > flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
> > > These low rates are in effect for 2/3 of the time. The 5G bands are
> > > relatively unused in my house, thus I do not suspect interference.
> >
> > I've received a very similar report this weekend. About 3-4 messages per
> > second, "timed out to flush queue 3", but driver continues to work.
> > I've also seen it this morning by accident and once again while writing
> > this mail. This was on a RTL8821CU.
> >
> > So far I have no idea what the problem might be.
>
> The "timed out to flush queue %d\n" message comes from
> __rtw_mac_flush_prio_queue(). Here some registers are read which show
> the number of reserved pages for a queue and the number of available
> pages of a queue. I used the debugfs interface to observe these
> registers from time to time:
>
> f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo "$i
> 4" > $f; cat $f; done
>
> This is what they show:
>
> reg 0x230: 0x00230040
> reg 0x234: 0x00400040
> reg 0x238: 0x00400040
> reg 0x23c: 0x00000000
>
> The upper 16bit contain the number of available pages and the lower
> 16bit contain the number of reserved pages (Note these are the registers
> on a RTL8822CU, on other chipsets the number of available pages is
> lower, like 0x10 on RTL8821CU). Register 0x230 is the interesting one
> for us, it has the values for queue 3.
>
> What I can see is that for the other queues the number of reserved pages
> usually matches the number of available pages. It happens sometimes that
> the number of available pages goes down to 0x3f, but with the next
> register read it goes back to 0x40. For 0x230 this is different though.
> Here the number of available pages continuously decreases over time and
> never goes back up.
>
> I don't know what this is trying to tell me. It seems that things queued
> to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
> Unfortunately I also don't know how the different priority queues relate
> to the different USB endpoints and how these in turn go together with
> the qsel settings. Maybe Ping-Ke can shed some light on this.
>
To quickly check if RTW_DMA_MAPPING_HIGH get stuck, changing qsel_to_ep[]
to different priority queue would be helpful to identify the problem.
If only this queue works not well, we may dig MAC settings. Otherwise,
it may be a RF performance problem.
0x240 is another queue called public queue. If 0x230/0x234/0x238/0x23c
become full, packets are queued into this queue. From view of MAC circuit,
it fetches these queues in specific order (from high to low conceptually;
I'm 100% sure.), and apply EDCA contention parameters for internal and
external contention.
I don't have much useful ideas to this problem for now.
Ping-Ke
On 3/6/23 19:39, Ping-Ke Shih wrote:
>
>
>> -----Original Message-----
>> From: Sascha Hauer <[email protected]>
>> Sent: Monday, March 6, 2023 9:00 PM
>> To: Larry Finger <[email protected]>
>> Cc: Ping-Ke Shih <[email protected]>; linux-wireless <[email protected]>
>> Subject: Re: Performance of rtw88_8822bu
>>
>> On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
>>> Hi Larry,
>>>
>>> On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
>>>> Sascha an Ping-Ke,
>>>>
>>>> I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
>>>> code matches the code found in wireless-next. I created 9 files of 5.8 GiB
>>>> each and used a for loop to copy them from the test computer to/from my
>>>> server. The wireless connection is on the 5 GHz band (channel 153) connected
>>>> to an ax1500 Wifi 6 router, which in turn is connected to the server via a
>>>> 1G ethernet cable. The connection has not crashed, but I see strange
>>>> behavior.
>>>
>>> What chipset are you using? Is it a RTL8822bu or some other chipset
>>> reported by the driver?
>>>
>>>>
>>>> With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
>>>> for TX for about 1/3 of the time, but then the driver reports "timed out to
>>>> flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
>>>> These low rates are in effect for 2/3 of the time. The 5G bands are
>>>> relatively unused in my house, thus I do not suspect interference.
>>>
>>> I've received a very similar report this weekend. About 3-4 messages per
>>> second, "timed out to flush queue 3", but driver continues to work.
>>> I've also seen it this morning by accident and once again while writing
>>> this mail. This was on a RTL8821CU.
>>>
>>> So far I have no idea what the problem might be.
>>
>> The "timed out to flush queue %d\n" message comes from
>> __rtw_mac_flush_prio_queue(). Here some registers are read which show
>> the number of reserved pages for a queue and the number of available
>> pages of a queue. I used the debugfs interface to observe these
>> registers from time to time:
>>
>> f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo "$i
>> 4" > $f; cat $f; done
>>
>> This is what they show:
>>
>> reg 0x230: 0x00230040
>> reg 0x234: 0x00400040
>> reg 0x238: 0x00400040
>> reg 0x23c: 0x00000000
>>
>> The upper 16bit contain the number of available pages and the lower
>> 16bit contain the number of reserved pages (Note these are the registers
>> on a RTL8822CU, on other chipsets the number of available pages is
>> lower, like 0x10 on RTL8821CU). Register 0x230 is the interesting one
>> for us, it has the values for queue 3.
>>
>> What I can see is that for the other queues the number of reserved pages
>> usually matches the number of available pages. It happens sometimes that
>> the number of available pages goes down to 0x3f, but with the next
>> register read it goes back to 0x40. For 0x230 this is different though.
>> Here the number of available pages continuously decreases over time and
>> never goes back up.
>>
>> I don't know what this is trying to tell me. It seems that things queued
>> to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
>> Unfortunately I also don't know how the different priority queues relate
>> to the different USB endpoints and how these in turn go together with
>> the qsel settings. Maybe Ping-Ke can shed some light on this.
>>
>
> To quickly check if RTW_DMA_MAPPING_HIGH get stuck, changing qsel_to_ep[]
> to different priority queue would be helpful to identify the problem.
> If only this queue works not well, we may dig MAC settings. Otherwise,
> it may be a RF performance problem.
>
> 0x240 is another queue called public queue. If 0x230/0x234/0x238/0x23c
> become full, packets are queued into this queue. From view of MAC circuit,
> it fetches these queues in specific order (from high to low conceptually;
> I'm 100% sure.), and apply EDCA contention parameters for internal and
> external contention.
>
> I don't have much useful ideas to this problem for now.
Ping-Ke and Sasha,
I made a discovery this morning. I set up a transfer from my NFS server to the
computer over an rtw8822bu link using rsync with the --progress option. In a
second window, I ran Sasha's register dump in a loop using a 5 second delay
between readouts. A third window showed was running 'dmesg -w'.
The transfer ran to completion on a 5.8 GiB file with all incremental speeds
reported as 11-12 MB/s. No timeouts on flushing the queue were logged, until I
opened the NetworkManager applet! At that point, I got many queue timeouts
logged, and the instantaneous throughput dropped to 2-3 MB/s as I reported
earlier. Surprisingly, there were no changes in the registers when the errors
happened.
The NM applet is going to be reading the transfer rate from the device, which
apparently messes up the data flow to/from the device.
As long as I do not cause the NM applet to display the connections, I get
nothing logged.
Larry
> -----Original Message-----
> From: Larry Finger <[email protected]> On Behalf Of Larry Finger
> Sent: Tuesday, March 7, 2023 10:44 PM
> To: Ping-Ke Shih <[email protected]>; Sascha Hauer <[email protected]>
> Cc: linux-wireless <[email protected]>
> Subject: Re: Performance of rtw88_8822bu
>
> On 3/6/23 19:39, Ping-Ke Shih wrote:
> >
> >
> >> -----Original Message-----
> >> From: Sascha Hauer <[email protected]>
> >> Sent: Monday, March 6, 2023 9:00 PM
> >> To: Larry Finger <[email protected]>
> >> Cc: Ping-Ke Shih <[email protected]>; linux-wireless <[email protected]>
> >> Subject: Re: Performance of rtw88_8822bu
> >>
> >> On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
> >>> Hi Larry,
> >>>
> >>> On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> >>>> Sascha an Ping-Ke,
> >>>>
> >>>> I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
> >>>> code matches the code found in wireless-next. I created 9 files of 5.8 GiB
> >>>> each and used a for loop to copy them from the test computer to/from my
> >>>> server. The wireless connection is on the 5 GHz band (channel 153) connected
> >>>> to an ax1500 Wifi 6 router, which in turn is connected to the server via a
> >>>> 1G ethernet cable. The connection has not crashed, but I see strange
> >>>> behavior.
> >>>
> >>> What chipset are you using? Is it a RTL8822bu or some other chipset
> >>> reported by the driver?
> >>>
> >>>>
> >>>> With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
> >>>> for TX for about 1/3 of the time, but then the driver reports "timed out to
> >>>> flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
> >>>> These low rates are in effect for 2/3 of the time. The 5G bands are
> >>>> relatively unused in my house, thus I do not suspect interference.
> >>>
> >>> I've received a very similar report this weekend. About 3-4 messages per
> >>> second, "timed out to flush queue 3", but driver continues to work.
> >>> I've also seen it this morning by accident and once again while writing
> >>> this mail. This was on a RTL8821CU.
> >>>
> >>> So far I have no idea what the problem might be.
> >>
> >> The "timed out to flush queue %d\n" message comes from
> >> __rtw_mac_flush_prio_queue(). Here some registers are read which show
> >> the number of reserved pages for a queue and the number of available
> >> pages of a queue. I used the debugfs interface to observe these
> >> registers from time to time:
> >>
> >> f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo
> "$i
> >> 4" > $f; cat $f; done
> >>
> >> This is what they show:
> >>
> >> reg 0x230: 0x00230040
> >> reg 0x234: 0x00400040
> >> reg 0x238: 0x00400040
> >> reg 0x23c: 0x00000000
> >>
> >> The upper 16bit contain the number of available pages and the lower
> >> 16bit contain the number of reserved pages (Note these are the registers
> >> on a RTL8822CU, on other chipsets the number of available pages is
> >> lower, like 0x10 on RTL8821CU). Register 0x230 is the interesting one
> >> for us, it has the values for queue 3.
> >>
> >> What I can see is that for the other queues the number of reserved pages
> >> usually matches the number of available pages. It happens sometimes that
> >> the number of available pages goes down to 0x3f, but with the next
> >> register read it goes back to 0x40. For 0x230 this is different though.
> >> Here the number of available pages continuously decreases over time and
> >> never goes back up.
> >>
> >> I don't know what this is trying to tell me. It seems that things queued
> >> to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
> >> Unfortunately I also don't know how the different priority queues relate
> >> to the different USB endpoints and how these in turn go together with
> >> the qsel settings. Maybe Ping-Ke can shed some light on this.
> >>
> >
> > To quickly check if RTW_DMA_MAPPING_HIGH get stuck, changing qsel_to_ep[]
> > to different priority queue would be helpful to identify the problem.
> > If only this queue works not well, we may dig MAC settings. Otherwise,
> > it may be a RF performance problem.
> >
> > 0x240 is another queue called public queue. If 0x230/0x234/0x238/0x23c
> > become full, packets are queued into this queue. From view of MAC circuit,
> > it fetches these queues in specific order (from high to low conceptually;
> > I'm 100% sure.), and apply EDCA contention parameters for internal and
> > external contention.
> >
> > I don't have much useful ideas to this problem for now.
>
> Ping-Ke and Sasha,
>
> I made a discovery this morning. I set up a transfer from my NFS server to the
> computer over an rtw8822bu link using rsync with the --progress option. In a
> second window, I ran Sasha's register dump in a loop using a 5 second delay
> between readouts. A third window showed was running 'dmesg -w'.
>
> The transfer ran to completion on a 5.8 GiB file with all incremental speeds
> reported as 11-12 MB/s. No timeouts on flushing the queue were logged, until I
> opened the NetworkManager applet! At that point, I got many queue timeouts
> logged, and the instantaneous throughput dropped to 2-3 MB/s as I reported
> earlier. Surprisingly, there were no changes in the registers when the errors
> happened.
>
> The NM applet is going to be reading the transfer rate from the device, which
> apparently messes up the data flow to/from the device.
>
> As long as I do not cause the NM applet to display the connections, I get
> nothing logged.
>
I think NM triggers scan operation when turning it on. Then, driver switches channels
between AP and scan channels with flushing queue that causes timeout. The cause is
still hard to transmit packets out, so TX buffer gets jammed.
If you enlarge the retry count or timeout value of __rtw_mac_flush_prio_queue(),
the timeout flushing could be disappear. Also, if we can implement
rtwdev->hci.ops->flush_queues for USB, the flushing log can be reduced.
Ping-Ke
On Wed, 2023-03-08 at 00:36 +0000, Ping-Ke Shih wrote:
>
>
> > -----Original Message-----
> > From: Larry Finger <[email protected]> On Behalf Of Larry
> > Finger
> > Sent: Tuesday, March 7, 2023 10:44 PM
> > To: Ping-Ke Shih <[email protected]>; Sascha Hauer
> > <[email protected]>
> > Cc: linux-wireless <[email protected]>
> > Subject: Re: Performance of rtw88_8822bu
> >
> > On 3/6/23 19:39, Ping-Ke Shih wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Sascha Hauer <[email protected]>
> > > > Sent: Monday, March 6, 2023 9:00 PM
> > > > To: Larry Finger <[email protected]>
> > > > Cc: Ping-Ke Shih <[email protected]>; linux-wireless
> > > > <[email protected]>
> > > > Subject: Re: Performance of rtw88_8822bu
> > > >
> > > > On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
> > > > > Hi Larry,
> > > > >
> > > > > On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> > > > > > Sascha an Ping-Ke,
> > > > > >
> > > > > > I have been testing the RTW8822BU driver found in my rtw88
> > > > > > GitHub repo. This
> > > > > > code matches the code found in wireless-next. I created 9
> > > > > > files of 5.8 GiB
> > > > > > each and used a for loop to copy them from the test
> > > > > > computer to/from my
> > > > > > server. The wireless connection is on the 5 GHz band
> > > > > > (channel 153) connected
> > > > > > to an ax1500 Wifi 6 router, which in turn is connected to
> > > > > > the server via a
> > > > > > 1G ethernet cable. The connection has not crashed, but I
> > > > > > see strange
> > > > > > behavior.
> > > > >
> > > > > What chipset are you using? Is it a RTL8822bu or some other
> > > > > chipset
> > > > > reported by the driver?
> > > > >
> > > > > >
> > > > > > With both TX and RX, the rate is high at 13.5 MiB/s for RX
> > > > > > and 11.1 MiB/s
> > > > > > for TX for about 1/3 of the time, but then the driver
> > > > > > reports "timed out to
> > > > > > flush queue 3" and the rate drops to 3-5 MiB/s for RX and
> > > > > > 2-3 MiB/s for TX.
> > > > > > These low rates are in effect for 2/3 of the time. The 5G
> > > > > > bands are
> > > > > > relatively unused in my house, thus I do not suspect
> > > > > > interference.
> > > > >
> > > > > I've received a very similar report this weekend. About 3-4
> > > > > messages per
> > > > > second, "timed out to flush queue 3", but driver continues to
> > > > > work.
> > > > > I've also seen it this morning by accident and once again
> > > > > while writing
> > > > > this mail. This was on a RTL8821CU.
> > > > >
> > > > > So far I have no idea what the problem might be.
> > > >
> > > > The "timed out to flush queue %d\n" message comes from
> > > > __rtw_mac_flush_prio_queue(). Here some registers are read
> > > > which show
> > > > the number of reserved pages for a queue and the number of
> > > > available
> > > > pages of a queue. I used the debugfs interface to observe these
> > > > registers from time to time:
> > > >
> > > > f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for
> > > > i in 0x230 0x234 0x238 0x23c; do echo
> > "$i
> > > > 4" > $f; cat $f; done
> > > >
> > > > This is what they show:
> > > >
> > > > reg 0x230: 0x00230040
> > > > reg 0x234: 0x00400040
> > > > reg 0x238: 0x00400040
> > > > reg 0x23c: 0x00000000
> > > >
> > > > The upper 16bit contain the number of available pages and the
> > > > lower
> > > > 16bit contain the number of reserved pages (Note these are the
> > > > registers
> > > > on a RTL8822CU, on other chipsets the number of available pages
> > > > is
> > > > lower, like 0x10 on RTL8821CU). Register 0x230 is the
> > > > interesting one
> > > > for us, it has the values for queue 3.
> > > >
> > > > What I can see is that for the other queues the number of
> > > > reserved pages
> > > > usually matches the number of available pages. It happens
> > > > sometimes that
> > > > the number of available pages goes down to 0x3f, but with the
> > > > next
> > > > register read it goes back to 0x40. For 0x230 this is different
> > > > though.
> > > > Here the number of available pages continuously decreases over
> > > > time and
> > > > never goes back up.
> > > >
> > > > I don't know what this is trying to tell me. It seems that
> > > > things queued
> > > > to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
> > > > Unfortunately I also don't know how the different priority
> > > > queues relate
> > > > to the different USB endpoints and how these in turn go
> > > > together with
> > > > the qsel settings. Maybe Ping-Ke can shed some light on this.
> > > >
> > >
> > > To quickly check if RTW_DMA_MAPPING_HIGH get stuck, changing
> > > qsel_to_ep[]
> > > to different priority queue would be helpful to identify the
> > > problem.
> > > If only this queue works not well, we may dig MAC settings.
> > > Otherwise,
> > > it may be a RF performance problem.
> > >
> > > 0x240 is another queue called public queue. If
> > > 0x230/0x234/0x238/0x23c
> > > become full, packets are queued into this queue. From view of MAC
> > > circuit,
> > > it fetches these queues in specific order (from high to low
> > > conceptually;
> > > I'm 100% sure.), and apply EDCA contention parameters for
> > > internal and
> > > external contention.
> > >
> > > I don't have much useful ideas to this problem for now.
> >
> > Ping-Ke and Sasha,
> >
> > I made a discovery this morning. I set up a transfer from my NFS
> > server to the
> > computer over an rtw8822bu link using rsync with the --progress
> > option. In a
> > second window, I ran Sasha's register dump in a loop using a 5
> > second delay
> > between readouts. A third window showed was running 'dmesg -w'.
> >
> > The transfer ran to completion on a 5.8 GiB file with all
> > incremental speeds
> > reported as 11-12 MB/s. No timeouts on flushing the queue were
> > logged, until I
> > opened the NetworkManager applet! At that point, I got many queue
> > timeouts
> > logged, and the instantaneous throughput dropped to 2-3 MB/s as I
> > reported
> > earlier. Surprisingly, there were no changes in the registers when
> > the errors
> > happened.
> >
> > The NM applet is going to be reading the transfer rate from the
> > device, which
> > apparently messes up the data flow to/from the device.
> >
> > As long as I do not cause the NM applet to display the connections,
> > I get
> > nothing logged.
> >
>
> I think NM triggers scan operation when turning it on. Then, driver
> switches channels
> between AP and scan channels with flushing queue that causes timeout.
> The cause is
> still hard to transmit packets out, so TX buffer gets jammed.
Yes, (at least historically) nm-applet requests that NM perform a scan
when you interact with it, on the theory that when you open the WiFi
network menu you probably want to see recent scan results. Similar to
MacOS's AirPort menu.
Most drivers handle that OK with intermittent traffic, but it will
cause disruption if for high throughput and/or latency-sensitive
traffic.
Dan
>
> If you enlarge the retry count or timeout value of
> __rtw_mac_flush_prio_queue(),
> the timeout flushing could be disappear. Also, if we can implement
> rtwdev->hci.ops->flush_queues for USB, the flushing log can be
> reduced.
>
> Ping-Ke
>
On Wed, Mar 08, 2023 at 12:36:15AM +0000, Ping-Ke Shih wrote:
>
>
> > -----Original Message-----
> > From: Larry Finger <[email protected]> On Behalf Of Larry Finger
> > Sent: Tuesday, March 7, 2023 10:44 PM
> > To: Ping-Ke Shih <[email protected]>; Sascha Hauer <[email protected]>
> > Cc: linux-wireless <[email protected]>
> > Subject: Re: Performance of rtw88_8822bu
> >
> > On 3/6/23 19:39, Ping-Ke Shih wrote:
> > >
> > >
> > >> -----Original Message-----
> > >> From: Sascha Hauer <[email protected]>
> > >> Sent: Monday, March 6, 2023 9:00 PM
> > >> To: Larry Finger <[email protected]>
> > >> Cc: Ping-Ke Shih <[email protected]>; linux-wireless <[email protected]>
> > >> Subject: Re: Performance of rtw88_8822bu
> > >>
> > >> On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
> > >>> Hi Larry,
> > >>>
> > >>> On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> > >>>> Sascha an Ping-Ke,
> > >>>>
> > >>>> I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
> > >>>> code matches the code found in wireless-next. I created 9 files of 5.8 GiB
> > >>>> each and used a for loop to copy them from the test computer to/from my
> > >>>> server. The wireless connection is on the 5 GHz band (channel 153) connected
> > >>>> to an ax1500 Wifi 6 router, which in turn is connected to the server via a
> > >>>> 1G ethernet cable. The connection has not crashed, but I see strange
> > >>>> behavior.
> > >>>
> > >>> What chipset are you using? Is it a RTL8822bu or some other chipset
> > >>> reported by the driver?
> > >>>
> > >>>>
> > >>>> With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
> > >>>> for TX for about 1/3 of the time, but then the driver reports "timed out to
> > >>>> flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
> > >>>> These low rates are in effect for 2/3 of the time. The 5G bands are
> > >>>> relatively unused in my house, thus I do not suspect interference.
> > >>>
> > >>> I've received a very similar report this weekend. About 3-4 messages per
> > >>> second, "timed out to flush queue 3", but driver continues to work.
> > >>> I've also seen it this morning by accident and once again while writing
> > >>> this mail. This was on a RTL8821CU.
> > >>>
> > >>> So far I have no idea what the problem might be.
> > >>
> > >> The "timed out to flush queue %d\n" message comes from
> > >> __rtw_mac_flush_prio_queue(). Here some registers are read which show
> > >> the number of reserved pages for a queue and the number of available
> > >> pages of a queue. I used the debugfs interface to observe these
> > >> registers from time to time:
> > >>
> > >> f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo
> > "$i
> > >> 4" > $f; cat $f; done
> > >>
> > >> This is what they show:
> > >>
> > >> reg 0x230: 0x00230040
> > >> reg 0x234: 0x00400040
> > >> reg 0x238: 0x00400040
> > >> reg 0x23c: 0x00000000
> > >>
> > >> The upper 16bit contain the number of available pages and the lower
> > >> 16bit contain the number of reserved pages (Note these are the registers
> > >> on a RTL8822CU, on other chipsets the number of available pages is
> > >> lower, like 0x10 on RTL8821CU). Register 0x230 is the interesting one
> > >> for us, it has the values for queue 3.
> > >>
> > >> What I can see is that for the other queues the number of reserved pages
> > >> usually matches the number of available pages. It happens sometimes that
> > >> the number of available pages goes down to 0x3f, but with the next
> > >> register read it goes back to 0x40. For 0x230 this is different though.
> > >> Here the number of available pages continuously decreases over time and
> > >> never goes back up.
> > >>
> > >> I don't know what this is trying to tell me. It seems that things queued
> > >> to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
> > >> Unfortunately I also don't know how the different priority queues relate
> > >> to the different USB endpoints and how these in turn go together with
> > >> the qsel settings. Maybe Ping-Ke can shed some light on this.
> > >>
> > >
> > > To quickly check if RTW_DMA_MAPPING_HIGH get stuck, changing qsel_to_ep[]
> > > to different priority queue would be helpful to identify the problem.
> > > If only this queue works not well, we may dig MAC settings. Otherwise,
> > > it may be a RF performance problem.
> > >
> > > 0x240 is another queue called public queue. If 0x230/0x234/0x238/0x23c
> > > become full, packets are queued into this queue. From view of MAC circuit,
> > > it fetches these queues in specific order (from high to low conceptually;
> > > I'm 100% sure.), and apply EDCA contention parameters for internal and
> > > external contention.
> > >
> > > I don't have much useful ideas to this problem for now.
> >
> > Ping-Ke and Sasha,
> >
> > I made a discovery this morning. I set up a transfer from my NFS server to the
> > computer over an rtw8822bu link using rsync with the --progress option. In a
> > second window, I ran Sasha's register dump in a loop using a 5 second delay
> > between readouts. A third window showed was running 'dmesg -w'.
> >
> > The transfer ran to completion on a 5.8 GiB file with all incremental speeds
> > reported as 11-12 MB/s. No timeouts on flushing the queue were logged, until I
> > opened the NetworkManager applet! At that point, I got many queue timeouts
> > logged, and the instantaneous throughput dropped to 2-3 MB/s as I reported
> > earlier. Surprisingly, there were no changes in the registers when the errors
> > happened.
> >
> > The NM applet is going to be reading the transfer rate from the device, which
> > apparently messes up the data flow to/from the device.
> >
> > As long as I do not cause the NM applet to display the connections, I get
> > nothing logged.
> >
>
> I think NM triggers scan operation when turning it on. Then, driver switches channels
> between AP and scan channels with flushing queue that causes timeout. The cause is
> still hard to transmit packets out, so TX buffer gets jammed.
>
> If you enlarge the retry count or timeout value of __rtw_mac_flush_prio_queue(),
> the timeout flushing could be disappear. Also, if we can implement
> rtwdev->hci.ops->flush_queues for USB, the flushing log can be reduced.
I don't have nm-applet available on my box, but with a 'nmcli dev wifi
list --rescan yes' I run into problems quite fast. That also happens on
an otherwise idle wifi link.
Sascha
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On Tue, Mar 07, 2023 at 01:39:19AM +0000, Ping-Ke Shih wrote:
>
>
> > -----Original Message-----
> > From: Sascha Hauer <[email protected]>
> > Sent: Monday, March 6, 2023 9:00 PM
> > To: Larry Finger <[email protected]>
> > Cc: Ping-Ke Shih <[email protected]>; linux-wireless <[email protected]>
> > Subject: Re: Performance of rtw88_8822bu
> >
> > On Mon, Mar 06, 2023 at 10:18:45AM +0100, Sascha Hauer wrote:
> > > Hi Larry,
> > >
> > > On Sat, Mar 04, 2023 at 08:52:26PM -0600, Larry Finger wrote:
> > > > Sascha an Ping-Ke,
> > > >
> > > > I have been testing the RTW8822BU driver found in my rtw88 GitHub repo. This
> > > > code matches the code found in wireless-next. I created 9 files of 5.8 GiB
> > > > each and used a for loop to copy them from the test computer to/from my
> > > > server. The wireless connection is on the 5 GHz band (channel 153) connected
> > > > to an ax1500 Wifi 6 router, which in turn is connected to the server via a
> > > > 1G ethernet cable. The connection has not crashed, but I see strange
> > > > behavior.
> > >
> > > What chipset are you using? Is it a RTL8822bu or some other chipset
> > > reported by the driver?
> > >
> > > >
> > > > With both TX and RX, the rate is high at 13.5 MiB/s for RX and 11.1 MiB/s
> > > > for TX for about 1/3 of the time, but then the driver reports "timed out to
> > > > flush queue 3" and the rate drops to 3-5 MiB/s for RX and 2-3 MiB/s for TX.
> > > > These low rates are in effect for 2/3 of the time. The 5G bands are
> > > > relatively unused in my house, thus I do not suspect interference.
> > >
> > > I've received a very similar report this weekend. About 3-4 messages per
> > > second, "timed out to flush queue 3", but driver continues to work.
> > > I've also seen it this morning by accident and once again while writing
> > > this mail. This was on a RTL8821CU.
> > >
> > > So far I have no idea what the problem might be.
> >
> > The "timed out to flush queue %d\n" message comes from
> > __rtw_mac_flush_prio_queue(). Here some registers are read which show
> > the number of reserved pages for a queue and the number of available
> > pages of a queue. I used the debugfs interface to observe these
> > registers from time to time:
> >
> > f=$(echo /sys/kernel/debug/ieee80211/phy*/rtw88/read_reg); for i in 0x230 0x234 0x238 0x23c; do echo "$i
> > 4" > $f; cat $f; done
> >
> > This is what they show:
> >
> > reg 0x230: 0x00230040
> > reg 0x234: 0x00400040
> > reg 0x238: 0x00400040
> > reg 0x23c: 0x00000000
> >
> > The upper 16bit contain the number of available pages and the lower
> > 16bit contain the number of reserved pages (Note these are the registers
> > on a RTL8822CU, on other chipsets the number of available pages is
> > lower, like 0x10 on RTL8821CU). Register 0x230 is the interesting one
> > for us, it has the values for queue 3.
> >
> > What I can see is that for the other queues the number of reserved pages
> > usually matches the number of available pages. It happens sometimes that
> > the number of available pages goes down to 0x3f, but with the next
> > register read it goes back to 0x40. For 0x230 this is different though.
> > Here the number of available pages continuously decreases over time and
> > never goes back up.
> >
> > I don't know what this is trying to tell me. It seems that things queued
> > to queue RTW_DMA_MAPPING_HIGH are sometimes (always?) stuck.
> > Unfortunately I also don't know how the different priority queues relate
> > to the different USB endpoints and how these in turn go together with
> > the qsel settings. Maybe Ping-Ke can shed some light on this.
> >
>
> To quickly check if RTW_DMA_MAPPING_HIGH get stuck, changing qsel_to_ep[]
> to different priority queue would be helpful to identify the problem.
> If only this queue works not well, we may dig MAC settings. Otherwise,
> it may be a RF performance problem.
By changing qsel_to_ep[] I can shift the problem to other priority
queues.
I changed qsel_to_ep[] back to default then and tried this instead:
- txdma_pq_map |= BIT_TXDMA_HIQ_MAP(rqpn->dma_map_hi);
- txdma_pq_map |= BIT_TXDMA_MGQ_MAP(rqpn->dma_map_mg);
+ txdma_pq_map |= BIT_TXDMA_HIQ_MAP(RTW_DMA_MAPPING_NORMAL);
+ txdma_pq_map |= BIT_TXDMA_MGQ_MAP(RTW_DMA_MAPPING_NORMAL);
From my understanding RTW_DMA_MAPPING_HIGH shouldn't be used anymore and
I assumed that the number of available pages shown at 0x230
(REG_FIFOPAGE_INFO_1) shouldn't shrink anymore. To my surprise the value
in 0x230 is still shrinking.
I still wonder how the different endpoints relate to the different
priority queues. Does a packet destined for a certain priority queue
have to go through a specific endpoint, or are they completly unrelated?
Sascha
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On 3/8/23 10:43, Dan Williams wrote:
> On Wed, 2023-03-08 at 00:36 +0000, Ping-Ke Shih wrote:
>>
>> I think NM triggers scan operation when turning it on. Then, driver
>> switches channels
>> between AP and scan channels with flushing queue that causes timeout.
>> The cause is
>> still hard to transmit packets out, so TX buffer gets jammed.
>
> Yes, (at least historically) nm-applet requests that NM perform a scan
> when you interact with it, on the theory that when you open the WiFi
> network menu you probably want to see recent scan results. Similar to
> MacOS's AirPort menu.
>
> Most drivers handle that OK with intermittent traffic, but it will
> cause disruption if for high throughput and/or latency-sensitive
> traffic.
>
> Dan
>
>>
>> If you enlarge the retry count or timeout value of
>> __rtw_mac_flush_prio_queue(),
>> the timeout flushing could be disappear. Also, if we can implement
>> rtwdev->hci.ops->flush_queues for USB, the flushing log can be
>> reduced.
Dan: Thanks for confirming that a scan trigger was the likely cause of the
problem. I verified that by adding a WARN_ONCE() after the warning was printed.
The first part of the traceback was:
rtw_ops_flush
__ieee89211_flush_queues
ieee80211_offchannel_stop_vifs
__ieee80211_start_scan
__ieee80211_request_scan
rdev_scan
nl80211_trigger_scan
Ping-Ke: Increasing the time waiting for the queue to flush did not help. I
increased the loop count from 5 to 50, and the sleep time from 20 to 200 msec. I
also tested exiting as soon as "avail" was greater than zero. I also tested
unlocking the mutex before the msleep() and relocking it afterward. Nothing
changed the behavior. It appears that something else is locking the queue
clearing, and that no amount of delay will help.
I will be trying to write a USB version of the flush queues routine based on the
PCI code.
Sasha: I agree that no load is required to trigger the problem.
There are no simple problems!
Larry