2023-07-04 09:08:01

by Wei Fang

Subject: [PATCH net 0/3] net: fec: fix some issues of ndo_xdp_xmit()

From: Wei Fang <[email protected]>

We encountered some issues when testing the ndo_xdp_xmit() interface
of the fec driver on i.MX8MP and i.MX93 platforms. These issues are
easy to reproduce, and the specific reproduction steps are as follows.

step1: The ethernet port of a board (board A) is connected to the EQOS
port of i.MX8MP/i.MX93, and the FEC port of i.MX8MP/i.MX93 is connected
to another ethernet port, such as a switch port.

step2: Board A uses pktgen_sample03_burst_single_flow.sh to generate
and send packets to the i.MX8MP/i.MX93. The command is shown below.
./pktgen_sample03_burst_single_flow.sh -i eth0 -d 192.168.6.8 -m \
56:bf:0d:68:b0:9e -s 1500

step3: i.MX8MP/i.MX93 uses the xdp_redirect bpf program to redirect the
XDP frames from the EQOS port to the FEC port. The command is shown below.
./xdp_redirect eth1 eth0

After a few moments, warning or error logs are printed to the console;
for more details, please refer to the commit message of each patch.

Wei Fang (3):
net: fec: dynamically set the NETDEV_XDP_ACT_NDO_XMIT feature of XDP
net: fec: recycle pages for transmitted XDP frames
net: fec: increase the size of tx ring and update thresholds of tx
ring

drivers/net/ethernet/freescale/fec.h | 17 ++-
drivers/net/ethernet/freescale/fec_main.c | 168 +++++++++++++++-------
2 files changed, 128 insertions(+), 57 deletions(-)

--
2.25.1



2023-07-04 09:34:30

by Wei Fang

Subject: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

From: Wei Fang <[email protected]>

When the XDP feature is enabled and there is heavy XDP traffic to be
transmitted, there is a considerable probability that the available
tx BDs are insufficient. This causes some XDP frames to be discarded,
and the "NOT enough BD for SG!" error log appears in the console
(as shown below).

[ 160.013112] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.023116] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.028926] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.038946] fec 30be0000.ethernet eth0: NOT enough BD for SG!
[ 160.044758] fec 30be0000.ethernet eth0: NOT enough BD for SG!

In the case of heavy XDP traffic, the speed of recycling tx BDs may
sometimes be slower than the speed of sending XDP frames. There may be
several specific reasons, such as the interrupt not being serviced in
time, or the NAPI callback being inefficient because all the queues
(tx queues and rx queues) share the same NAPI, and so on.

After trying various methods, I think that increasing the size of the
tx BD ring is simple and effective. Perhaps the best resolution is to
allocate a NAPI for each queue to improve the efficiency of the NAPI
callback, but this change is a bit big and I didn't try it. Perhaps it
will be implemented in a future patch.

In addition, this patch also updates the tx_stop_threshold and the
tx_wake_threshold of the tx ring. In the previous logic, the value of
tx_stop_threshold is 217 while the value of tx_wake_threshold is 147;
it does not make sense for tx_wake_threshold to be less than
tx_stop_threshold. Besides, both the XDP path and the 'slow path' share
the tx BD rings. So if tx_stop_threshold is 217, in the case of heavy
XDP traffic the slow path is easily stopped, which seriously impacts
the slow path.

Fixes: 6d6b39f180b8 ("net: fec: add initial XDP support")
Signed-off-by: Wei Fang <[email protected]>
---
drivers/net/ethernet/freescale/fec.h | 2 +-
drivers/net/ethernet/freescale/fec_main.c | 7 +++----
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec.h b/drivers/net/ethernet/freescale/fec.h
index 8c0226d061fe..63a053dea819 100644
--- a/drivers/net/ethernet/freescale/fec.h
+++ b/drivers/net/ethernet/freescale/fec.h
@@ -355,7 +355,7 @@ struct bufdesc_ex {
#define RX_RING_SIZE (FEC_ENET_RX_FRPPG * FEC_ENET_RX_PAGES)
#define FEC_ENET_TX_FRSIZE 2048
#define FEC_ENET_TX_FRPPG (PAGE_SIZE / FEC_ENET_TX_FRSIZE)
-#define TX_RING_SIZE 512 /* Must be power of two */
+#define TX_RING_SIZE 1024 /* Must be power of two */
#define TX_RING_MOD_MASK 511 /* for this to work */

#define BD_ENET_RX_INT 0x00800000
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 940d3afe1d24..6c255c0d6f41 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3348,9 +3348,8 @@ static int fec_enet_alloc_queue(struct net_device *ndev)
txq->bd.ring_size = TX_RING_SIZE;
fep->total_tx_ring_size += fep->tx_queue[i]->bd.ring_size;

- txq->tx_stop_threshold = FEC_MAX_SKB_DESCS;
- txq->tx_wake_threshold =
- (txq->bd.ring_size - txq->tx_stop_threshold) / 2;
+ txq->tx_stop_threshold = MAX_SKB_FRAGS;
+ txq->tx_wake_threshold = MAX_SKB_FRAGS + 1;

txq->tso_hdrs = dma_alloc_coherent(&fep->pdev->dev,
txq->bd.ring_size * TSO_HEADER_SIZE,
@@ -3837,7 +3836,7 @@ static int fec_enet_txq_xmit_frame(struct fec_enet_private *fep,

entries_free = fec_enet_get_free_txdesc_num(txq);
if (entries_free < MAX_SKB_FRAGS + 1) {
- netdev_err(fep->netdev, "NOT enough BD for SG!\n");
+ netdev_err_once(fep->netdev, "NOT enough BD for SG!\n");
return -EBUSY;
}

--
2.25.1


2023-07-05 00:50:53

by Andrew Lunn

Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> After trying various methods, I think that increasing the size of the
> tx BD ring is simple and effective. Perhaps the best resolution is to
> allocate a NAPI for each queue to improve the efficiency of the NAPI
> callback, but this change is a bit big and I didn't try it. Perhaps it
> will be implemented in a future patch.

How does this affect platforms like Vybrid with its fast Ethernet?
Does the burst latency go up?

> In addition, this patch also updates the tx_stop_threshold and the
> tx_wake_threshold of the tx ring. In the previous logic, the value of
> tx_stop_threshold is 217 while the value of tx_wake_threshold
> is 147; it does not make sense for tx_wake_threshold to be less than
> tx_stop_threshold.

What do these actually mean? I could imagine that as the ring fills
you don't want to stop until it is 217/512 full. There is then some
hysteresis, such that it has to drop below 147/512 before more can be
added?

> Besides, both XDP path and 'slow path' share the
> tx BD rings. So if tx_stop_threshold is 217, in the case of heavy
> XDP traffic, the slow path is easily to be stopped, this will have
> a serious impact on the slow path.

Please post your iperf results for various platforms, so we can see
the effects of this. We generally don't accept tuning patches without
benchmarks which prove the improvements, and also show there is no
regression. And given the wide variety of SoCs using the FEC, I expect
testing on a number of SoCs, both Fast and 1G.

Andrew

2023-07-05 06:37:38

by Wei Fang

Subject: RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> -----Original Message-----
> From: Andrew Lunn <[email protected]>
> Sent: July 5, 2023 8:25
> To: Wei Fang <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Shenwei Wang
> <[email protected]>; Clark Wang <[email protected]>;
> [email protected]; dl-linux-imx <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
>
> > After trying various methods, I think that increasing the size of the tx BD
> > ring is simple and effective. Perhaps the best resolution is to
> > allocate a NAPI for each queue to improve the efficiency of the NAPI
> > callback, but this change is a bit big and I didn't try it.
> > Perhaps it will be implemented in a future patch.
>
> How does this affect platforms like Vybrid with its fast Ethernet?
Sorry, I don't have a Vybrid platform, but I don't think it has much
impact; at most it just takes up some more memory.

> Does the burst latency go up?
No. For the FEC, when a packet is attached to the BDs, the software
immediately triggers the hardware to send the packet. In addition, I think
it may improve the latency, because the size of the tx ring becomes larger
and more packets can be attached to the BD ring for burst traffic.

>
> > In addition, this patch also updates the tx_stop_threshold and the
> > tx_wake_threshold of the tx ring. In the previous logic, the value of
> > tx_stop_threshold is 217 while the value of tx_wake_threshold is
> > 147; it does not make sense for tx_wake_threshold to be less than
> > tx_stop_threshold.
>
> What do these actually mean? I could imagine that as the ring fills you don't
> want to stop until it is 217/512 full. There is then some hysteresis, such that it
> has to drop below 147/512 before more can be added?
>
You must have misunderstood; let me explain more clearly. The queue is
stopped when the available BDs are less than tx_stop_threshold (217 BDs),
and the queue is woken when the available BDs are greater than
tx_wake_threshold (147 BDs). So in most cases the available BDs are already
greater than tx_wake_threshold when the queue is stopped, and the only
effect is to delay packet sending.
In my opinion, tx_wake_threshold should be greater than tx_stop_threshold:
we should stop the queue when the available BDs are not enough for a skb
to be attached, and wake the queue when the available BDs are sufficient
for a skb.

> > Besides, both XDP path and 'slow path' share the tx BD rings. So if
> > tx_stop_threshold is 217, in the case of heavy XDP traffic, the slow
> > path is easily to be stopped, this will have a serious impact on the
> > slow path.
>
> Please post your iperf results for various platforms, so we can see the effects of
> this. We generally don't accept tuning patches without benchmarks which
> prove the improvements, and also show there is no regression. And given the
> wide variety of SoCs using the FEC, i expect testing on a number of SoCs, but
> Fast and 1G.
>
Below are the results on the i.MX6UL/8MM/8MP/8ULP/93 platforms. i.MX6UL
and i.MX8ULP only support Fast Ethernet; the others support 1G.
1.1 i.MX6UL with tx ring size 512
root@imx6ul7d:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 47148 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 10.1 MBytes 84.9 Mbits/sec 0 103 KBytes
[ 5] 1.00-2.00 sec 9.88 MBytes 82.6 Mbits/sec 0 103 KBytes
[ 5] 2.00-3.00 sec 9.82 MBytes 82.7 Mbits/sec 0 103 KBytes
[ 5] 3.00-4.00 sec 9.82 MBytes 82.4 Mbits/sec 0 103 KBytes
[ 5] 4.00-5.00 sec 9.88 MBytes 82.9 Mbits/sec 0 103 KBytes
[ 5] 5.00-6.00 sec 9.94 MBytes 83.4 Mbits/sec 0 103 KBytes
[ 5] 6.00-7.00 sec 10.1 MBytes 84.3 Mbits/sec 0 103 KBytes
[ 5] 7.00-8.00 sec 9.82 MBytes 82.4 Mbits/sec 0 103 KBytes
[ 5] 8.00-9.00 sec 9.82 MBytes 82.4 Mbits/sec 0 103 KBytes
[ 5] 9.00-10.00 sec 9.88 MBytes 82.9 Mbits/sec 0 103 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 99.0 MBytes 83.1 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 98.8 MBytes 82.8 Mbits/sec receiver

1.2 i.MX6UL with tx ring size 1024
root@imx6ul7d:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 55670 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 10.2 MBytes 85.4 Mbits/sec 0 236 KBytes
[ 5] 1.00-2.00 sec 10.1 MBytes 84.6 Mbits/sec 0 236 KBytes
[ 5] 2.00-3.00 sec 10.2 MBytes 85.5 Mbits/sec 0 249 KBytes
[ 5] 3.00-4.00 sec 10.1 MBytes 85.1 Mbits/sec 0 249 KBytes
[ 5] 4.00-5.00 sec 10.1 MBytes 84.7 Mbits/sec 0 249 KBytes
[ 5] 5.00-6.00 sec 10.0 MBytes 84.1 Mbits/sec 0 249 KBytes
[ 5] 6.00-7.00 sec 10.1 MBytes 85.1 Mbits/sec 0 249 KBytes
[ 5] 7.00-8.00 sec 10.1 MBytes 84.9 Mbits/sec 0 249 KBytes
[ 5] 8.00-9.00 sec 10.3 MBytes 85.9 Mbits/sec 0 249 KBytes
[ 5] 9.00-10.00 sec 10.2 MBytes 85.6 Mbits/sec 0 249 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 101 MBytes 85.1 Mbits/sec 0 sender
[ 5] 0.00-10.01 sec 101 MBytes 84.5 Mbits/sec receiver

2.1 i.MX8ULP with tx ring size 512
root@imx8ulpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 54342 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 10.8 MBytes 90.3 Mbits/sec 0 99.0 KBytes
[ 5] 1.00-2.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes
[ 5] 2.00-3.00 sec 10.2 MBytes 85.5 Mbits/sec 0 99.0 KBytes
[ 5] 3.00-4.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes
[ 5] 4.00-5.00 sec 10.2 MBytes 85.5 Mbits/sec 0 99.0 KBytes
[ 5] 5.00-6.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes
[ 5] 6.00-7.00 sec 9.69 MBytes 81.3 Mbits/sec 0 99.0 KBytes
[ 5] 7.00-8.00 sec 9.94 MBytes 83.4 Mbits/sec 0 99.0 KBytes
[ 5] 8.00-9.00 sec 9.69 MBytes 81.3 Mbits/sec 0 99.0 KBytes
[ 5] 9.00-10.00 sec 10.2 MBytes 85.5 Mbits/sec 0 99.0 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 100 MBytes 84.3 Mbits/sec 0 sender
[ 5] 0.00-9.90 sec 100 MBytes 84.7 Mbits/sec receiver

2.2 i.MX8ULP with tx ring size 1024
root@imx8ulpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 44770 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 10.7 MBytes 90.1 Mbits/sec 0 93.3 KBytes
[ 5] 1.00-2.00 sec 9.94 MBytes 83.4 Mbits/sec 0 93.3 KBytes
[ 5] 2.00-3.00 sec 10.2 MBytes 85.5 Mbits/sec 0 93.3 KBytes
[ 5] 3.00-4.00 sec 10.1 MBytes 85.0 Mbits/sec 0 93.3 KBytes
[ 5] 4.00-5.00 sec 9.94 MBytes 83.4 Mbits/sec 0 100 KBytes
[ 5] 5.00-6.00 sec 10.2 MBytes 85.5 Mbits/sec 0 100 KBytes
[ 5] 6.00-7.00 sec 9.69 MBytes 81.3 Mbits/sec 0 100 KBytes
[ 5] 7.00-8.00 sec 9.94 MBytes 83.4 Mbits/sec 0 100 KBytes
[ 5] 8.00-9.00 sec 10.2 MBytes 85.5 Mbits/sec 0 100 KBytes
[ 5] 9.00-10.00 sec 9.69 MBytes 81.3 Mbits/sec 0 100 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 101 MBytes 84.4 Mbits/sec 0 sender
[ 5] 0.00-9.92 sec 100 MBytes 84.8 Mbits/sec receiver

3.1 i.MX8MM with tx ring size 512
root@imx8mmevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 55734 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 111 MBytes 934 Mbits/sec 0 577 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 937 Mbits/sec 0 577 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 942 Mbits/sec 0 609 KBytes
[ 5] 3.00-4.00 sec 113 MBytes 945 Mbits/sec 0 638 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 941 Mbits/sec 0 638 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 942 Mbits/sec 0 638 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 942 Mbits/sec 0 638 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 943 Mbits/sec 0 638 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 943 Mbits/sec 0 638 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 0 638 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 938 Mbits/sec receiver

3.2 i.MX8MM with tx ring size 1024
root@imx8mmevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 53350 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 952 Mbits/sec 0 585 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 942 Mbits/sec 0 585 KBytes
[ 5] 2.00-3.00 sec 113 MBytes 947 Mbits/sec 0 585 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 940 Mbits/sec 0 648 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 648 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 648 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver

4.1 i.MX8MP with tx ring size 512
root@imx8mpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 51892 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 594 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 940 Mbits/sec 0 626 KBytes
[ 5] 2.00-3.00 sec 113 MBytes 946 Mbits/sec 0 626 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 937 Mbits/sec 0 626 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 940 Mbits/sec 0 626 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 940 Mbits/sec 0 626 KBytes
[ 5] 6.00-7.00 sec 113 MBytes 946 Mbits/sec 0 626 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 939 Mbits/sec 0 626 KBytes
[ 5] 8.00-9.00 sec 111 MBytes 935 Mbits/sec 0 626 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 943 Mbits/sec 0 626 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 940 Mbits/sec receiver

4.2 i.MX8MP with tx ring size 1024
root@imx8mpevk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 37922 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 113 MBytes 951 Mbits/sec 0 608 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 937 Mbits/sec 0 608 KBytes
[ 5] 2.00-3.00 sec 113 MBytes 947 Mbits/sec 0 608 KBytes
[ 5] 3.00-4.00 sec 111 MBytes 934 Mbits/sec 0 608 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 942 Mbits/sec 0 608 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 939 Mbits/sec 0 608 KBytes
[ 5] 6.00-7.00 sec 113 MBytes 949 Mbits/sec 0 608 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 942 Mbits/sec 0 608 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 936 Mbits/sec 0 608 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 942 Mbits/sec 0 608 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 942 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.09 GBytes 939 Mbits/sec receiver

5.1 i.MX93 with tx ring size 512
root@imx93evk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 44216 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 115 MBytes 965 Mbits/sec 0 656 KBytes
[ 5] 1.00-2.00 sec 111 MBytes 934 Mbits/sec 0 656 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
[ 5] 5.00-6.00 sec 111 MBytes 933 Mbits/sec 0 656 KBytes
[ 5] 6.00-7.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 656 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 944 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.10 GBytes 941 Mbits/sec receiver

5.2 i.MX93 with tx ring size 1024
root@imx93evk:~# iperf3 -c 192.168.3.6
Connecting to host 192.168.3.6, port 5201
[ 5] local 192.168.3.9 port 51058 connected to 192.168.3.6 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 114 MBytes 959 Mbits/sec 0 588 KBytes
[ 5] 1.00-2.00 sec 112 MBytes 935 Mbits/sec 0 649 KBytes
[ 5] 2.00-3.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
[ 5] 3.00-4.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
[ 5] 4.00-5.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
[ 5] 5.00-6.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
[ 5] 6.00-7.00 sec 111 MBytes 933 Mbits/sec 0 649 KBytes
[ 5] 7.00-8.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
[ 5] 8.00-9.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
[ 5] 9.00-10.00 sec 112 MBytes 944 Mbits/sec 0 649 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.10 GBytes 943 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.10 GBytes 940 Mbits/sec receiver

2023-07-05 18:18:47

by Jakub Kicinski

Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

On Wed, 5 Jul 2023 06:20:26 +0000 Wei Fang wrote:
> > > In addition, this patch also updates the tx_stop_threshold and the
> > > tx_wake_threshold of the tx ring. In the previous logic, the value of
> > > tx_stop_threshold is 217 while the value of tx_wake_threshold is
> > > 147; it does not make sense for tx_wake_threshold to be less than
> > > tx_stop_threshold.
> >
> > What do these actually mean? I could imagine that as the ring fills you don't
> > want to stop until it is 217/512 full. There is then some hysteresis, such that it
> > has to drop below 147/512 before more can be added?
> >
> You must have misunderstood, let me explain more clearly, the queue will be
> stopped when the available BDs are less than tx_stop_threshold (217 BDs). And
> the queue will be woken when the available BDs are greater than tx_wake_threshold
> (147 BDs). So in most cases, the available BDs are greater than tx_wake_threshold
> when the queue is stopped, the only effect is to delay packet sending.
> In my opinion, tx_wake_threshold should be greater than tx_stop_threshold, we
> should stop queue when the available BDs are not enough for a skb to be attached.
> And wake the queue when the available BDs are sufficient for a skb.

But you shouldn't restart the queue for a single packet either.
Restarting for a single packet wastes CPU cycles as there will
be many more stop / start operations. Two large packets seem
like the absolute minimum reasonable wake threshold.

Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either,
as you won't be able to accept a full TSO frame.

Please split the change, the netdev_err_once() should be one patch
and then the change to wake thresh a separate one.
--
pw-bot: cr

2023-07-06 02:05:08

by Wei Fang

Subject: RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: July 6, 2023 2:11
> To: Wei Fang <[email protected]>
> Cc: Andrew Lunn <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Shenwei Wang <[email protected]>; Clark Wang
> <[email protected]>; [email protected]; dl-linux-imx
> <[email protected]>; [email protected]; [email protected]
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
>
> On Wed, 5 Jul 2023 06:20:26 +0000 Wei Fang wrote:
> > > > In addition, this patch also updates the tx_stop_threshold and the
> > > > tx_wake_threshold of the tx ring. In previous logic, the value of
> > > > tx_stop_threshold is 217, however, the value of tx_wake_threshold
> > > > is 147, it does not make sense that tx_wake_threshold is less than
> > > > tx_stop_threshold.
> > >
> > > What do these actually mean? I could imagine that as the ring fills
> > > you don't want to stop until it is 217/512 full. There is then some
> > > hysteresis, such that it has to drop below 147/512 before more can be
> added?
> > >
> > You must have misunderstood, let me explain more clearly, the queue
> > will be stopped when the available BDs are less than tx_stop_threshold
> > (217 BDs). And the queue will be woken when the available BDs are
> > greater than tx_wake_threshold
> > (147 BDs). So in most cases, the available BDs are greater than
> > tx_wake_threshold when the queue is stopped, the only effect is to delay
> packet sending.
> > In my opinion, tx_wake_threshold should be greater than
> > tx_stop_threshold, we should stop queue when the available BDs are not
> enough for a skb to be attached.
> > And wake the queue when the available BDs are sufficient for a skb.
>
> But you shouldn't restart the queue for a single packet either.
> Restarting for a single packet wastes CPU cycles as there will be much more
> stop / start operations. Two large packets seem like the absolute minimum
> reasonable wake threshold.
>
> Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as
> you won't be able to accept a full TSO frame.
>
Maybe I should keep the tx_stop_threshold unchanged, so that the queue is
stopped if the available BDs are not enough for a full TSO frame to be
attached, and then just change tx_wake_threshold to tx_stop_threshold + 1,
which I think is more reasonable.

> Please split the change, the netdev_err_once() should be one patch and then
> the change to wake thresh a separate one.
Okay, thanks!

2023-07-06 03:31:57

by Jakub Kicinski

Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

On Thu, 6 Jul 2023 01:44:49 +0000 Wei Fang wrote:
> > But you shouldn't restart the queue for a single packet either.
> > Restarting for a single packet wastes CPU cycles as there will be much more
> > stop / start operations. Two large packets seem like the absolute minimum
> > reasonable wake threshold.
> >
> > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right either, as
> > you won't be able to accept a full TSO frame.
> >
> Maybe I should keep the tx_stop_threshold unchanged, so that the queue is
> stopped if the available BDs are not enough for a full TSO frame to be
> attached, and then just change tx_wake_threshold to tx_stop_threshold + 1,
> which I think is more reasonable.

How about at least tx_stop_threshold + 2 * MAX_SKB_FRAGS?
If a queue of hundreds of entries is overflowing, we should be able
to apply a hysteresis of a few tens of entries. Do you see a
difference in drops? The packets from the stack should preferably stay
in the qdiscs, where AQM and scheduling can be applied, rather than in
the driver queue.

2023-07-06 06:28:08

by Wei Fang

Subject: RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: July 6, 2023 11:12
> To: Wei Fang <[email protected]>
> Cc: Andrew Lunn <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Shenwei Wang <[email protected]>; Clark Wang
> <[email protected]>; [email protected]; dl-linux-imx
> <[email protected]>; [email protected]; [email protected]
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
>
> On Thu, 6 Jul 2023 01:44:49 +0000 Wei Fang wrote:
> > > But you shouldn't restart the queue for a single packet either.
> > > Restarting for a single packet wastes CPU cycles as there will be
> > > much more stop / start operations. Two large packets seem like the
> > > absolute minimum reasonable wake threshold.
> > >
> > > Setting tx_stop_threshold to MAX_SKB_FRAGS doesn't seem right
> > > either, as you won't be able to accept a full TSO frame.
> > >
> > Maybe I should keep the tx_stop_threshold unchanged, so that the queue
> > is stopped if the available BDs are not enough for a full TSO frame to be
> > attached, and then just change tx_wake_threshold to tx_stop_threshold + 1,
> > which I think is more reasonable.
>
> How about at least tx_stop_threshold + 2 * MAX_SKB_FRAGS ?
It's okay. The iperf performance is as good as before.

> If a queue of hundreds of entries is overflowing, we should be able to apply a
> hysteresis of a few tens of entries. Do you see a difference in drops? The
I didn't see any packet loss.

> packets from the stack should preferably stay in the qdiscs instead of the driver
> queue, where AQM and scheduling can be applied.

2023-07-08 20:08:55

by Andrew Lunn

Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> > How does this affect platforms like Vybrid with its fast Ethernet?
> Sorry, I don't have a Vybrid platform, but I don't think it has much
> impact; at most it just takes up some more memory.

It has been 6 months since the page pool patches were posted and I
asked about benchmark results for other platforms like Vybrid. Is it
so hard to get hold of reference platforms? Fugang Duan used to have a
test farm of all sorts of boards and reported to me the regressions I
introduced with MDIO changes and PM changes. As somebody who seems to
be an NXP FEC maintainer, I would expect you to have access to a range
of hardware, especially since XDP and eBPF are a bit of a niche for the
embedded processors which NXP produces. You want to be sure your changes
don't regress the main use cases, which I guess are plain networking.

> > Does the burst latency go up?
> No. For the FEC, when a packet is attached to the BDs, the software
> immediately triggers the hardware to send the packet. In addition, I think
> it may improve the latency, because the size of the tx ring becomes larger
> and more packets can be attached to the BD ring for burst traffic.

And a bigger burst means more latency. Read about bufferbloat.

While you have iperf running saturating the link, try a ping as
well. How does ping latency change with more TX buffers?

Ideally you want enough transmit buffers to keep the link full, but
not more. If the driver is using BQL, the network stack will help with
this.

> Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL and
> 8ULP only support Fast ethernet. Others support 1G.

Thanks for the benchmark numbers. Please get into the habit of
including them. We like to see justification for any sort of
performance tweaks.

Andrew

2023-07-10 06:30:28

by Wei Fang

Subject: RE: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> -----Original Message-----
> From: Andrew Lunn <[email protected]>
> Sent: July 9, 2023 3:04
> To: Wei Fang <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; Shenwei Wang
> <[email protected]>; Clark Wang <[email protected]>;
> [email protected]; dl-linux-imx <[email protected]>;
> [email protected]; [email protected]
> Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update
> thresholds of tx ring
>
> > > How does this affect platforms like Vybrid with its fast Ethernet?
> > Sorry, I don't have a Vybrid platform, but I don't think it has much
> > impact; at most it just takes up some more memory.
>
> It has been 6 months since the page pool patches were posted and I asked
> about benchmark results for other platforms like Vybrid. Is it so hard to get
> hold of reference platforms? Fugang Duan used to have a test farm of all sorts
> of boards and reported to me the regressions I introduced with MDIO changes
> and PM changes. As somebody who seems to be an NXP FEC maintainer, I
> would expect you to have access to a range of hardware, especially since XDP
> and eBPF are a bit of a niche for the embedded processors which NXP produces.
> You want to be sure your changes don't regress the main use cases, which I
> guess are plain networking.
>
Sorry, the Vybrid platform is not currently maintained by us, and Vybrid is also
not included in our usual Yocto SDK RC version. Even if I find a Vybrid board, it
probably can't run the latest kernel image, because the latest kernel image does
not match the old Yocto SDK, and the new Yocto SDK does not support the Vybrid
platform. I also asked my colleague on the test team who is in charge of Ethernet
testing; she hadn't even heard of the Vybrid platform.

> > > Does the burst latency go up?
> > No, for the FEC, when a packet is attached to the BDs, the software
> > immediately triggers the hardware to send the packet. In addition, I
> > think it may improve latency, because the tx ring becomes larger and
> > more packets can be attached to the BD ring for burst traffic.
>
> And a bigger burst means more latency. Read about Buffer bloat.
>
Perhaps, but not necessarily. For example, in some cases burst packets
may stay in the Qdisc instead of the hardware queue if the ring size is small.

> While you have iperf running saturating the link, try a ping as well. How does
> ping latency change with more TX buffers?
>
Per your suggestions, I tested on i.MX8ULP, i.MX8MM and i.MX93 platforms. The
results do not appear to be very different.
i.MX93 with ring size 1024:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.114 -c 16
PING 10.193.102.114 (10.193.102.114) 56(84) bytes of data.
64 bytes from 10.193.102.114: icmp_seq=1 ttl=63 time=3.78 ms
64 bytes from 10.193.102.114: icmp_seq=2 ttl=63 time=2.16 ms
64 bytes from 10.193.102.114: icmp_seq=3 ttl=63 time=3.31 ms
64 bytes from 10.193.102.114: icmp_seq=4 ttl=63 time=2.11 ms
64 bytes from 10.193.102.114: icmp_seq=5 ttl=63 time=3.43 ms
64 bytes from 10.193.102.114: icmp_seq=6 ttl=63 time=3.20 ms
64 bytes from 10.193.102.114: icmp_seq=7 ttl=63 time=3.20 ms
64 bytes from 10.193.102.114: icmp_seq=8 ttl=63 time=3.75 ms
64 bytes from 10.193.102.114: icmp_seq=9 ttl=63 time=3.21 ms
64 bytes from 10.193.102.114: icmp_seq=10 ttl=63 time=3.76 ms
64 bytes from 10.193.102.114: icmp_seq=11 ttl=63 time=2.16 ms
64 bytes from 10.193.102.114: icmp_seq=12 ttl=63 time=2.67 ms
64 bytes from 10.193.102.114: icmp_seq=13 ttl=63 time=3.59 ms
64 bytes from 10.193.102.114: icmp_seq=14 ttl=63 time=2.55 ms
64 bytes from 10.193.102.114: icmp_seq=15 ttl=63 time=2.54 ms
64 bytes from 10.193.102.114: icmp_seq=16 ttl=63 time=3.88 ms

--- 10.193.102.114 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15027ms
rtt min/avg/max/mdev = 2.112/3.082/3.875/0.606 ms

i.MX93 with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.114 -c 16
PING 10.193.102.114 (10.193.102.114) 56(84) bytes of data.
64 bytes from 10.193.102.114: icmp_seq=1 ttl=63 time=2.74 ms
64 bytes from 10.193.102.114: icmp_seq=2 ttl=63 time=3.32 ms
64 bytes from 10.193.102.114: icmp_seq=3 ttl=63 time=2.72 ms
64 bytes from 10.193.102.114: icmp_seq=4 ttl=63 time=3.36 ms
64 bytes from 10.193.102.114: icmp_seq=5 ttl=63 time=3.41 ms
64 bytes from 10.193.102.114: icmp_seq=6 ttl=63 time=2.67 ms
64 bytes from 10.193.102.114: icmp_seq=7 ttl=63 time=2.77 ms
64 bytes from 10.193.102.114: icmp_seq=8 ttl=63 time=3.38 ms
64 bytes from 10.193.102.114: icmp_seq=9 ttl=63 time=2.54 ms
64 bytes from 10.193.102.114: icmp_seq=10 ttl=63 time=3.36 ms
64 bytes from 10.193.102.114: icmp_seq=11 ttl=63 time=3.44 ms
64 bytes from 10.193.102.114: icmp_seq=12 ttl=63 time=2.80 ms
64 bytes from 10.193.102.114: icmp_seq=13 ttl=63 time=2.86 ms
64 bytes from 10.193.102.114: icmp_seq=14 ttl=63 time=3.90 ms
64 bytes from 10.193.102.114: icmp_seq=15 ttl=63 time=2.50 ms
64 bytes from 10.193.102.114: icmp_seq=16 ttl=63 time=2.89 ms

--- 10.193.102.114 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15028ms
rtt min/avg/max/mdev = 2.496/3.040/3.898/0.394 ms

i.MX8MM with ring size 1024:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.126 -c 16
PING 10.193.102.126 (10.193.102.126) 56(84) bytes of data.
64 bytes from 10.193.102.126: icmp_seq=1 ttl=127 time=1.34 ms
64 bytes from 10.193.102.126: icmp_seq=2 ttl=127 time=2.07 ms
64 bytes from 10.193.102.126: icmp_seq=3 ttl=127 time=2.40 ms
64 bytes from 10.193.102.126: icmp_seq=4 ttl=127 time=1.48 ms
64 bytes from 10.193.102.126: icmp_seq=5 ttl=127 time=1.69 ms
64 bytes from 10.193.102.126: icmp_seq=6 ttl=127 time=1.54 ms
64 bytes from 10.193.102.126: icmp_seq=7 ttl=127 time=2.30 ms
64 bytes from 10.193.102.126: icmp_seq=8 ttl=127 time=1.94 ms
64 bytes from 10.193.102.126: icmp_seq=9 ttl=127 time=4.25 ms
64 bytes from 10.193.102.126: icmp_seq=10 ttl=127 time=1.75 ms
64 bytes from 10.193.102.126: icmp_seq=11 ttl=127 time=1.25 ms
64 bytes from 10.193.102.126: icmp_seq=12 ttl=127 time=2.04 ms
64 bytes from 10.193.102.126: icmp_seq=13 ttl=127 time=2.31 ms
64 bytes from 10.193.102.126: icmp_seq=14 ttl=127 time=2.18 ms
64 bytes from 10.193.102.126: icmp_seq=15 ttl=127 time=2.25 ms
64 bytes from 10.193.102.126: icmp_seq=16 ttl=127 time=1.37 ms

--- 10.193.102.126 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 1.248/2.011/4.250/0.686 ms

i.MX8MM with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.126 -c 16
PING 10.193.102.126 (10.193.102.126) 56(84) bytes of data.
64 bytes from 10.193.102.126: icmp_seq=1 ttl=63 time=4.82 ms
64 bytes from 10.193.102.126: icmp_seq=2 ttl=63 time=4.67 ms
64 bytes from 10.193.102.126: icmp_seq=3 ttl=63 time=3.74 ms
64 bytes from 10.193.102.126: icmp_seq=4 ttl=63 time=3.87 ms
64 bytes from 10.193.102.126: icmp_seq=5 ttl=63 time=3.30 ms
64 bytes from 10.193.102.126: icmp_seq=6 ttl=63 time=3.79 ms
64 bytes from 10.193.102.126: icmp_seq=7 ttl=127 time=2.12 ms
64 bytes from 10.193.102.126: icmp_seq=8 ttl=127 time=1.99 ms
64 bytes from 10.193.102.126: icmp_seq=9 ttl=127 time=2.15 ms
64 bytes from 10.193.102.126: icmp_seq=10 ttl=127 time=1.82 ms
64 bytes from 10.193.102.126: icmp_seq=11 ttl=127 time=1.92 ms
64 bytes from 10.193.102.126: icmp_seq=12 ttl=127 time=1.23 ms
64 bytes from 10.193.102.126: icmp_seq=13 ttl=127 time=2.00 ms
64 bytes from 10.193.102.126: icmp_seq=14 ttl=127 time=1.66 ms
64 bytes from 10.193.102.126: icmp_seq=15 ttl=127 time=1.75 ms
64 bytes from 10.193.102.126: icmp_seq=16 ttl=127 time=2.24 ms

--- 10.193.102.126 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 1.226/2.691/4.820/1.111 ms

i.MX8ULP with ring size 1024:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.216 -c 16
PING 10.193.102.216 (10.193.102.216) 56(84) bytes of data.
64 bytes from 10.193.102.216: icmp_seq=1 ttl=63 time=3.40 ms
64 bytes from 10.193.102.216: icmp_seq=2 ttl=63 time=5.46 ms
64 bytes from 10.193.102.216: icmp_seq=3 ttl=63 time=5.55 ms
64 bytes from 10.193.102.216: icmp_seq=4 ttl=63 time=5.97 ms
64 bytes from 10.193.102.216: icmp_seq=5 ttl=63 time=6.26 ms
64 bytes from 10.193.102.216: icmp_seq=6 ttl=63 time=0.963 ms
64 bytes from 10.193.102.216: icmp_seq=7 ttl=63 time=4.10 ms
64 bytes from 10.193.102.216: icmp_seq=8 ttl=63 time=4.55 ms
64 bytes from 10.193.102.216: icmp_seq=9 ttl=63 time=1.24 ms
64 bytes from 10.193.102.216: icmp_seq=10 ttl=63 time=6.96 ms
64 bytes from 10.193.102.216: icmp_seq=11 ttl=63 time=3.27 ms
64 bytes from 10.193.102.216: icmp_seq=12 ttl=63 time=6.57 ms
64 bytes from 10.193.102.216: icmp_seq=13 ttl=63 time=2.99 ms
64 bytes from 10.193.102.216: icmp_seq=14 ttl=63 time=1.70 ms
64 bytes from 10.193.102.216: icmp_seq=15 ttl=63 time=1.79 ms
64 bytes from 10.193.102.216: icmp_seq=16 ttl=63 time=1.42 ms

--- 10.193.102.216 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15026ms
rtt min/avg/max/mdev = 0.963/3.886/6.955/2.009 ms

i.MX8ULP with ring size 512:
fangwei@fangwei-OptiPlex-7050:~$ ping 10.193.102.216 -c 16
PING 10.193.102.216 (10.193.102.216) 56(84) bytes of data.
64 bytes from 10.193.102.216: icmp_seq=1 ttl=63 time=5.70 ms
64 bytes from 10.193.102.216: icmp_seq=2 ttl=63 time=5.89 ms
64 bytes from 10.193.102.216: icmp_seq=3 ttl=63 time=3.37 ms
64 bytes from 10.193.102.216: icmp_seq=4 ttl=63 time=5.07 ms
64 bytes from 10.193.102.216: icmp_seq=5 ttl=63 time=1.47 ms
64 bytes from 10.193.102.216: icmp_seq=6 ttl=63 time=3.45 ms
64 bytes from 10.193.102.216: icmp_seq=7 ttl=63 time=1.35 ms
64 bytes from 10.193.102.216: icmp_seq=8 ttl=63 time=6.62 ms
64 bytes from 10.193.102.216: icmp_seq=9 ttl=63 time=1.41 ms
64 bytes from 10.193.102.216: icmp_seq=10 ttl=63 time=6.43 ms
64 bytes from 10.193.102.216: icmp_seq=11 ttl=63 time=1.41 ms
64 bytes from 10.193.102.216: icmp_seq=12 ttl=63 time=6.75 ms
64 bytes from 10.193.102.216: icmp_seq=13 ttl=63 time=4.76 ms
64 bytes from 10.193.102.216: icmp_seq=14 ttl=63 time=3.85 ms
64 bytes from 10.193.102.216: icmp_seq=15 ttl=63 time=3.50 ms
64 bytes from 10.193.102.216: icmp_seq=16 ttl=63 time=1.37 ms

--- 10.193.102.216 ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 15027ms
rtt min/avg/max/mdev = 1.349/3.900/6.749/1.985 ms

> Ideally you want enough transmit buffers to keep the link full, but not more. If
> the driver is using BQL, the network stack will help with this.
>
> > Below are the results on i.MX6UL/8MM/8MP/8ULP/93 platforms, i.MX6UL
> > and 8ULP only support Fast ethernet. Others support 1G.
>
> Thanks for the benchmark numbers. Please get into the habit of including
> them. We like to see justification for any sort of performance tweaks.
>
> Andrew

2023-07-12 01:35:16

by Andrew Lunn

Subject: Re: [PATCH net 3/3] net: fec: increase the size of tx ring and update thresholds of tx ring

> Sorry, the Vybrid platform is not currently maintained by us, and Vybrid is also not
> included in our usual Yocto SDK RC version.

Your Yocto SDK version does not matter. We are talking about mainline
here... You are maintaining the mainline driver, and submitting
patches to extend the mainline driver.

> Even if I find a Vybrid board, I think it
> probably can't run the latest kernel image, because the latest kernel image
> does not match the old Yocto SDK, and the new Yocto SDK does not support the Vybrid
> platform. I also asked my colleague in the test team who is in charge of ethernet testing;
> she hadn't even heard of the Vybrid platform.

Well, 6.5-rc1 does run on Vybrid; I have a number of boards with
it. You will also find that many inflight entertainment systems have a
box under each row of seats in an aeroplane with a Vybrid controlling a
Marvell switch. Take a look at arch/arm/boot/dts/vf610-zii-*. You will
also find a number of DTS files for imx51, imx6 and imx7 used for ZII
products, which are back-of-the-seat displays and make heavy use of
networking with the FEC.

So I have an interest in the FEC for all these platforms.

> > And a bigger burst means more latency. Read about Buffer bloat.
> >
> Perhaps, but not necessarily. For example, in some cases that some burst packets
> maybe stay in Qdisc instead of hardware queue if ring size is small.

Which is what you want, so high priority packets can jump to the head
of the queue.

> > While you have iperf running saturating the link, try a ping as well. How does
> > ping latency change with more TX buffers?
> >
> Per your suggestions, I tested on i.MX8ULP, i.MX8MM and i.MX93 platforms. The
> results do not appear to be very different.

Thanks for the benchmark numbers. They look O.K.

Andrew