2023-06-11 20:33:04

by Martin Blumenstingl

[permalink] [raw]
Subject: wifi: rtw88: question about SDIO RX aggregation limiting

Hello Ping-Ke,

certain Amlogic SDIO host controllers have a limit of
receiving/transmitting at most 1536 bytes at a time.
It turns out that rtw_sdio_enable_rx_aggregation() from rtw88/sdio.c
is not taking this into account currently.
For any RX buffer that is bigger than 1536 bytes (which can happen due
to RX aggregation) we're unable to do any processing on the host side
because all bytes beyond the 1536 bytes mark are lost.

Lukas found that limiting BIT_RXDMA_AGG_PG_TH to 0x6 makes his
RTL8822CS work on the affected Amlogic SoCs.

My question now is: how can we properly limit BIT_RXDMA_AGG_PG_TH
without hard-coding a one-fits-all value (which may reduce
performance)?

Initially I thought that we could just calculate it:
host_max_pages = mmc_host->max_req_size / rtwdev->chip->page_size
max_req_size for the affected controller is 1536 and chip->page_size
is 128, so the result would be 12 (I thought it would be close to this
number, maybe +/-1).
Unfortunately this doesn't fix the issue and for his board
BIT_RXDMA_AGG_PG_TH the limit is 6 or 7.

If you could describe how BIT_RXDMA_AGG_PG_TH generally works I can
come up with the algorithm to calculate the limit on my own (at least
I hope so).
Lukas has been very patient with testing so far and I understood that
he's willing to test further patches if we think that it fixes the
rtw88 driver issue he's seeing.


Thank you and best regards,
Martin


2023-06-13 02:29:47

by Ping-Ke Shih

[permalink] [raw]
Subject: RE: wifi: rtw88: question about SDIO RX aggregation limiting



> -----Original Message-----
> From: Martin Blumenstingl <[email protected]>
> Sent: Monday, June 12, 2023 4:23 AM
> To: [email protected]; Ping-Ke Shih <[email protected]>
> Cc: Lukas F. Hartmann <[email protected]>; [email protected]; [email protected];
> [email protected]
> Subject: wifi: rtw88: question about SDIO RX aggregation limiting
>
> Hello Ping-Ke,
>
> certain Amlogic SDIO host controllers have a limit of
> receiving/transmitting at most 1536 bytes at a time.
> It turns out that rtw_sdio_enable_rx_aggregation() from rtw88/sdio.c
> is not taking this into account currently.
> For any RX buffer that is bigger than 1536 bytes (which can happen due
> to RX aggregation) we're unable to do any processing on the host side
> because all bytes beyond the 1536 bytes mark are lost.
>
> Lukas found that limiting BIT_RXDMA_AGG_PG_TH to 0x6 makes his
> RTL8822CS work on the affected Amlogic SoCs.
>
> My question now is: how can we properly limit BIT_RXDMA_AGG_PG_TH
> without hard-coding a one-fits-all value (which may reduce
> performance)?
>
> Initially I thought that we could just calculate it:
> host_max_pages = mmc_host->max_req_size / rtwdev->chip->page_size
> max_req_size for the affected controller is 1536 and chip->page_size
> is 128, so the result would be 12 (I thought it would be close to this
> number, maybe +/-1).
> Unfortunately this doesn't fix the issue and for his board
> BIT_RXDMA_AGG_PG_TH the limit is 6 or 7.
>
> If you could describe how BIT_RXDMA_AGG_PG_TH generally works I can
> come up with the algorithm to calculate the limit on my own (at least
> I hope so).
> Lukas has been very patient with testing so far and I understood that
> he's willing to test further patches if we think that it fixes the
> rtw88 driver issue he's seeing.
>

The unit of BIT_RXDMA_AGG_PG_TH is 1k bytes, so I think you can
set mmc_host->max_req_size/1024.

I wonder why 0x6 works on Amlogic SoCs. Could you or Lukas compare performance
between the settings of 0x1 and 0x6?

Ping-Ke

2023-06-19 20:47:46

by Martin Blumenstingl

[permalink] [raw]
Subject: Re: wifi: rtw88: question about SDIO RX aggregation limiting

Hello Ping-Ke,

apologies for the long delay.

On Tue, Jun 13, 2023 at 4:20 AM Ping-Ke Shih <[email protected]> wrote:
[...]
> The unit of BIT_RXDMA_AGG_PG_TH is 1k bytes, so I think you can
> set mmc_host->max_req_size/1024.
I tried this but I got a result that I don't understand.
I've been testing with three BIT_RXDMA_AGG_PG_TH values on a SoC that
can handle 255 * 1024 bytes. Each time I connected to the same AP and
downloaded a bigger file over http(s).
BIT_RXDMA_AGG_PG_TH: biggest observed rx_len in rtw_sdio_rxfifo_recv()
255: 20968
6: 5122
1: 1602

The biggest rx_len I have observed for BIT_RXDMA_AGG_PG_TH 1 looks suspicious:
My understanding is that I shouldn't be seeing rx_len larger than
BIT_RXDMA_AGG_PG_TH * 1024.
BIT_RXDMA_AGG_PG_TH = 6 is within this limit but BIT_RXDMA_AGG_PG_TH =
1 isn't (I'm seeing 578 extra bytes in addition to the 1024 bytes that
I was expecting).
Do you have any idea where this is coming from? I'm worried that we
can still end up with the problem that Lukas described but seems to
not have hit in his testing with BIT_RXDMA_AGG_PG_TH = 6

> I wonder why 0x6 works on Amlogic SoCs. Could you or Lukas compare performance
> between the settings of 0x1 and 0x6?
I can do this later this week but I'd like to understand the above
results first.


Best regards,
Martin

2023-06-20 05:41:59

by Ping-Ke Shih

[permalink] [raw]
Subject: RE: wifi: rtw88: question about SDIO RX aggregation limiting



> -----Original Message-----
> From: Martin Blumenstingl <[email protected]>
> Sent: Tuesday, June 20, 2023 4:38 AM
> To: Ping-Ke Shih <[email protected]>
> Cc: [email protected]; Lukas F. Hartmann <[email protected]>; [email protected];
> [email protected]; [email protected]
> Subject: Re: wifi: rtw88: question about SDIO RX aggregation limiting
>
> Hello Ping-Ke,
>
> apologies for the long delay.
>
> On Tue, Jun 13, 2023 at 4:20 AM Ping-Ke Shih <[email protected]> wrote:
> [...]
> > The unit of BIT_RXDMA_AGG_PG_TH is 1k bytes, so I think you can
> > set mmc_host->max_req_size/1024.
> I tried this but I got a result that I don't understand.
> I've been testing with three BIT_RXDMA_AGG_PG_TH values on a SoC that
> can handle 255 * 1024 bytes. Each time I connected to the same AP and
> downloaded a bigger file over http(s).
> BIT_RXDMA_AGG_PG_TH: biggest observed rx_len in rtw_sdio_rxfifo_recv()
> 255: 20968
> 6: 5122
> 1: 1602

Please also print out number of packets you receive, and then we can see how
many packets aggregate.

>
> The biggest rx_len I have observed for BIT_RXDMA_AGG_PG_TH 1 looks suspicious:
> My understanding is that I shouldn't be seeing rx_len larger than
> BIT_RXDMA_AGG_PG_TH * 1024.
> BIT_RXDMA_AGG_PG_TH = 6 is within this limit but BIT_RXDMA_AGG_PG_TH =
> 1 isn't (I'm seeing 578 extra bytes in addition to the 1024 bytes that
> I was expecting).

Assume threshold is 1k, and single one packet is larger than 1k. Hardware
will not split it into two. Also, please make sure 0x280[29] BIT_EN_PRE_CALC
is 1. Otherwise, it will possibly aggregate additional one packet to over
the threshold.

0x280[15:8] is timeout time in unit of 1us for SDIO interface. When set
threshold to 255, you can enlarge this to see if it can aggregate more as
expected.

> Do you have any idea where this is coming from? I'm worried that we
> can still end up with the problem that Lukas described but seems to
> not have hit in his testing with BIT_RXDMA_AGG_PG_TH = 6
>
> > I wonder why 0x6 works on Amlogic SoCs. Could you or Lukas compare performance
> > between the settings of 0x1 and 0x6?
> I can do this later this week but I'd like to understand the above
> results first.
>

Ping-Ke

2023-07-03 21:28:02

by Martin Blumenstingl

[permalink] [raw]
Subject: Re: wifi: rtw88: question about SDIO RX aggregation limiting

Hello Ping-Ke,

sorry again for the long waiting time. I'll be quicker next time.

On Tue, Jun 20, 2023 at 7:26 AM Ping-Ke Shih <[email protected]> wrote:
[...]
> > > The unit of BIT_RXDMA_AGG_PG_TH is 1k bytes, so I think you can
> > > set mmc_host->max_req_size/1024.
> > I tried this but I got a result that I don't understand.
> > I've been testing with three BIT_RXDMA_AGG_PG_TH values on a SoC that
> > can handle 255 * 1024 bytes. Each time I connected to the same AP and
> > downloaded a bigger file over http(s).
> > BIT_RXDMA_AGG_PG_TH: biggest observed rx_len in rtw_sdio_rxfifo_recv()
> > 255: 20968
> > 6: 5122
> > 1: 1602
>
> Please also print out number of packets you receive, and then we can see how
> many packets aggregate.
sure - here are the results:
BIT_RXDMA_AGG_PG_TH: biggest observed rx_len in rtw_sdio_rxfifo_recv()
/ number of (aggregated) packets
255: 20824 / 12
6: 5128 / 4
1: 3132 / 1 (these were a few exceptions and I'm not able to reliably
reproduce it, 1602 / 1 is what I can easily reproduce)

> > The biggest rx_len I have observed for BIT_RXDMA_AGG_PG_TH 1 looks suspicious:
> > My understanding is that I shouldn't be seeing rx_len larger than
> > BIT_RXDMA_AGG_PG_TH * 1024.
> > BIT_RXDMA_AGG_PG_TH = 6 is within this limit but BIT_RXDMA_AGG_PG_TH =
> > 1 isn't (I'm seeing 578 extra bytes in addition to the 1024 bytes that
> > I was expecting).
>
> Assume threshold is 1k, and single one packet is larger than 1k. Hardware
> will not split it into two. Also, please make sure 0x280[29] BIT_EN_PRE_CALC
> is 1. Otherwise, it will possibly aggregate additional one packet to over
> the threshold.
From the numbers above it seems most likely that we're hitting the
"one packet is larger than 1k" case.

Also I'm seeing:
wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UP group default qlen 1000
My interface's MTU is 1500 bytes. Seeing 1602 bytes rx_len with one
packet is already odd (that would mean 102 bytes for overhead like RX
descriptor and other headers/metadata). But 3132 bytes rx_len is very
odd.

BIT_EN_PRE_CALC is set, see also the attached diff (it's not meant to
be applied anywhere - it's just so you understand what I've been
testing with).

> 0x280[15:8] is timeout time in unit of 1us for SDIO interface. When set
> threshold to 255, you can enlarge this to see if it can aggregate more as
> expected.
I did not experiment with this yet as I'd like to understand the above
findings first.


Best regards,
Martin


Attachments:
rtw88-rx-agg-test.diff (2.77 kB)

2023-07-04 09:22:32

by Ping-Ke Shih

[permalink] [raw]
Subject: RE: wifi: rtw88: question about SDIO RX aggregation limiting



> -----Original Message-----
> From: Martin Blumenstingl <[email protected]>
> Sent: Tuesday, July 4, 2023 5:25 AM
> To: Ping-Ke Shih <[email protected]>
> Cc: [email protected]; Lukas F. Hartmann <[email protected]>; [email protected];
> [email protected]; [email protected]
> Subject: Re: wifi: rtw88: question about SDIO RX aggregation limiting
>
> Hello Ping-Ke,
>
> sorry again for the long waiting time. I'll be quicker next time.
>
> On Tue, Jun 20, 2023 at 7:26 AM Ping-Ke Shih <[email protected]> wrote:
> [...]
> > > > The unit of BIT_RXDMA_AGG_PG_TH is 1k bytes, so I think you can
> > > > set mmc_host->max_req_size/1024.
> > > I tried this but I got a result that I don't understand.
> > > I've been testing with three BIT_RXDMA_AGG_PG_TH values on a SoC that
> > > can handle 255 * 1024 bytes. Each time I connected to the same AP and
> > > downloaded a bigger file over http(s).
> > > BIT_RXDMA_AGG_PG_TH: biggest observed rx_len in rtw_sdio_rxfifo_recv()
> > > 255: 20968
> > > 6: 5122
> > > 1: 1602
> >
> > Please also print out number of packets you receive, and then we can see how
> > many packets aggregate.
> sure - here are the results:
> BIT_RXDMA_AGG_PG_TH: biggest observed rx_len in rtw_sdio_rxfifo_recv()
> / number of (aggregated) packets
> 255: 20824 / 12
> 6: 5128 / 4
> 1: 3132 / 1 (these were a few exceptions and I'm not able to reliably
> reproduce it, 1602 / 1 is what I can easily reproduce)

These results are expected.

The minimum number of received packet is one, so it is possible that packet size
is larger than 1k.

>
> > > The biggest rx_len I have observed for BIT_RXDMA_AGG_PG_TH 1 looks suspicious:
> > > My understanding is that I shouldn't be seeing rx_len larger than
> > > BIT_RXDMA_AGG_PG_TH * 1024.
> > > BIT_RXDMA_AGG_PG_TH = 6 is within this limit but BIT_RXDMA_AGG_PG_TH =
> > > 1 isn't (I'm seeing 578 extra bytes in addition to the 1024 bytes that
> > > I was expecting).
> >
> > Assume threshold is 1k, and single one packet is larger than 1k. Hardware
> > will not split it into two. Also, please make sure 0x280[29] BIT_EN_PRE_CALC
> > is 1. Otherwise, it will possibly aggregate additional one packet to over
> > the threshold.
> From the numbers above it seems most likely that we're hitting the
> "one packet is larger than 1k" case.
>
> Also I'm seeing:
> wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
> state UP group default qlen 1000
> My interface's MTU is 1500 bytes. Seeing 1602 bytes rx_len with one
> packet is already odd (that would mean 102 bytes for overhead like RX
> descriptor and other headers/metadata). But 3132 bytes rx_len is very
> odd.

I think MTU only affects transmitting packets. Would you mind to try to
set peer's MTU to 1000 or lower?

For received packets, driver is the first entry, so maybe net stack can
shrink packet size to fit MTU, not sure though.

The large 3132-byte packet is also very odd to me. Try to dump raw data
and check the content that packet size is larger than 1500.


An idea from my colleague. Is it possible to multiple read_port for a certain
receiving? Like below pseudo code: (I don't define '* host')

@@ -502,17 +502,26 @@ static int rtw_sdio_read_port(struct rtw_dev *rtwdev, u8 *buf, size_t count)
struct rtw_sdio *rtwsdio = (struct rtw_sdio *)rtwdev->priv;
bool bus_claim = rtw_sdio_bus_claim_needed(rtwsdio);
u32 rxaddr = rtwsdio->rx_addr++;
+ size_t n;
int ret;

if (bus_claim)
sdio_claim_host(rtwsdio->sdio_func);

- ret = sdio_memcpy_fromio(rtwsdio->sdio_func, buf,
- RTW_SDIO_ADDR_RX_RX0FF_GEN(rxaddr), count);
- if (ret)
- rtw_warn(rtwdev,
- "Failed to read %zu byte(s) from SDIO port 0x%08x",
- count, rxaddr);
+ while (count > 0) {
+ n = min(host->max_req_size, count);
+
+ ret = sdio_memcpy_fromio(rtwsdio->sdio_func, buf,
+ RTW_SDIO_ADDR_RX_RX0FF_GEN(rxaddr), n);
+ if (ret)
+ rtw_warn(rtwdev,
+ "Failed to read %zu byte(s) from SDIO port 0x%08x",
+ n, rxaddr);
+
+ count -= n;
+ buf += n;
+ }
+

if (bus_claim)
sdio_release_host(rtwsdio->sdio_func);


Ping-Ke