2024-02-06 09:19:38

by Paul Barker

Subject: [RFC PATCH net-next v2 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs

This series aims to improve peformance of the GbEth IP in the Renesas
RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
the way, we do some refactoring and ensure that napi_complete_done() is
used in accordance with the NAPI documentation for both GbEth and R-Car
code paths.

Performance improvment mainly comes from enabling SW IRQ Coalescing for
all SoCs using the GbEth IP, and NAPI Threaded mode for single core SoCs
using the GbEth IP. These can be enabled/disabled at runtime via sysfs,
but our goal is to set sensible defaults which get good performance on
the affected SoCs.

The performance impact of this series on iperf3 testing is as follows:
* RZ/G2L Ethernet throughput is unchanged, but CPU usage drops:
    * Bidirectional and TCP RX: 6.5% less CPU usage
    * UDP RX: 10% less CPU usage

* RZ/G2UL and RZ/G3S Ethernet throughput is increased for all test
  cases except UDP TX, which suffers a slight loss:
    * TCP TX: 32% more throughput
    * TCP RX: 11% more throughput
    * UDP TX: 10% less throughput
    * UDP RX: 10183% more throughput - the previous throughput of
      1.06Mbps is what prompted this work.

* RZ/G2N CPU usage and Ethernet throughput is unchanged (tested as a
  representative of the SoCs which use the R-Car based RAVB IP).

This series depends on:
* "net: ravb: Let IP-specific receive function to interrogate descriptors" v6
  https://lore.kernel.org/all/[email protected]/

To get the results shown above, you'll also need:
* "topology: Set capacity_freq_ref in all cases"
  https://lore.kernel.org/all/[email protected]/

* "ravb: Add Rx checksum offload support" v4
  https://lore.kernel.org/all/[email protected]/

* "ravb: Add Tx checksum offload support" v4
  https://lore.kernel.org/all/[email protected]/

Work in this area will continue, in particular we expect to improve
TCP/UDP RX performance further with future changes to RX buffer
handling.

Changes v1->v2:
* Marked as RFC as the series depends on unmerged patches.
* Refactored R-Car code paths as well as GbEth code paths.
* Updated references to the patches this series depends on.

Paul Barker (7):
net: ravb: Simplify poll & receive functions
net: ravb: Count packets instead of descriptors in RX path
net: ravb: Always process TX descriptor ring
net: ravb: Always update error counters
net: ravb: Align poll function with NAPI docs
net: ravb: Enable SW IRQ Coalescing for GbEth
net: ravb: Use NAPI threaded mode on 1-core CPUs with GbEth IP

drivers/net/ethernet/renesas/ravb.h | 3 +-
drivers/net/ethernet/renesas/ravb_main.c | 92 ++++++++++++------------
2 files changed, 46 insertions(+), 49 deletions(-)

--
2.39.2


2024-02-06 09:19:58

by Paul Barker

Subject: [RFC PATCH net-next v2 1/7] net: ravb: Simplify poll & receive functions

We don't need to pass the work budget to ravb_rx() by reference, it's
cleaner to pass this by value and return the amount of work done. This
allows us to simplify the ravb_poll() function and use the common
`work_done` variable name seen in other network drivers for consistency
and ease of understanding.

In ravb_rx_gbeth() & ravb_rx_rcar(), we can also drop the confusingly
named `boguscnt` variable and use a for loop to iterate through
descriptors.

This is a pure refactor and should not affect behaviour.
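
The calling convention described above can be illustrated with a small
user-space model. This is only a sketch, not the driver code: `fake_ring`,
`fake_rx()` and `fake_poll()` are hypothetical names, and the "ring" is
reduced to a count of ready descriptors.

```c
#include <assert.h>

/* Hypothetical stand-in for an RX ring: only tracks how many
 * descriptors are ready to be processed. */
struct fake_ring {
	int ready;
};

/* Take the budget by value and return the amount of work done,
 * instead of decrementing a caller-owned quota through a pointer. */
static int fake_rx(struct fake_ring *ring, int budget)
{
	int i;

	assert(budget >= 0);
	for (i = 0; i < budget && ring->ready > 0; i++)
		ring->ready--;

	return i;
}

/* The poll function keeps a single work_done value and can compare it
 * with the budget to see whether the budget was exhausted. */
static int fake_poll(struct fake_ring *ring, int budget)
{
	int work_done = fake_rx(ring, budget);

	return work_done;
}
```

With five ready descriptors and a budget of three, successive calls to
`fake_poll()` in this model report 3, then 2, then 0.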

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb.h | 2 +-
drivers/net/ethernet/renesas/ravb_main.c | 47 +++++++++---------------
2 files changed, 19 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
index 3cf869fb9a68..55a7a08aabef 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -1050,7 +1050,7 @@ struct ravb_hw_info {
void (*rx_ring_free)(struct net_device *ndev, int q);
void (*rx_ring_format)(struct net_device *ndev, int q);
void *(*alloc_rx_desc)(struct net_device *ndev, int q);
- bool (*receive)(struct net_device *ndev, int *quota, int q);
+ int (*receive)(struct net_device *ndev, int budget, int q);
void (*set_rate)(struct net_device *ndev);
int (*set_feature)(struct net_device *ndev, netdev_features_t features);
int (*dmac_init)(struct net_device *ndev);
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 4976ecc91cde..b18026575a2d 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -822,7 +822,7 @@ static struct sk_buff *ravb_get_skb_gbeth(struct net_device *ndev, int entry,
}

/* Packet receive function for Gigabit Ethernet */
-static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
+static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
{
struct ravb_private *priv = netdev_priv(ndev);
const struct ravb_hw_info *info = priv->info;
@@ -831,28 +831,24 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
struct sk_buff *skb;
dma_addr_t dma_addr;
u8 desc_status;
- int boguscnt;
u16 pkt_len;
u8 die_dt;
int entry;
int limit;
+ int i;

entry = priv->cur_rx[q] % priv->num_rx_ring[q];
- boguscnt = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
+ limit = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
stats = &priv->stats[q];

- boguscnt = min(boguscnt, *quota);
- limit = boguscnt;
+ limit = min(limit, budget);
desc = &priv->gbeth_rx_ring[entry];
- while (desc->die_dt != DT_FEMPTY) {
+ for (i = 0; i < limit && desc->die_dt != DT_FEMPTY; i++) {
/* Descriptor type must be checked before all other reads */
dma_rmb();
desc_status = desc->msc;
pkt_len = le16_to_cpu(desc->ds_cc) & RX_DS;

- if (--boguscnt < 0)
- break;
-
/* We use 0-byte descriptors to mark the DMA mapping errors */
if (!pkt_len)
continue;
@@ -949,19 +945,16 @@ static bool ravb_rx_gbeth(struct net_device *ndev, int *quota, int q)
desc->die_dt = DT_FEMPTY;
}

- *quota -= limit - (++boguscnt);
-
- return boguscnt <= 0;
+ return i;
}

/* Packet receive function for Ethernet AVB */
-static bool ravb_rx_rcar(struct net_device *ndev, int *quota, int q)
+static int ravb_rx_rcar(struct net_device *ndev, int budget, int q)
{
struct ravb_private *priv = netdev_priv(ndev);
const struct ravb_hw_info *info = priv->info;
int entry = priv->cur_rx[q] % priv->num_rx_ring[q];
- int boguscnt = (priv->dirty_rx[q] + priv->num_rx_ring[q]) -
- priv->cur_rx[q];
+ int limit = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
struct net_device_stats *stats = &priv->stats[q];
struct ravb_ex_rx_desc *desc;
struct sk_buff *skb;
@@ -970,19 +963,16 @@ static bool ravb_rx_rcar(struct net_device *ndev, int *quota, int q)
u8 desc_status;
u16 pkt_len;
int limit;
+ int i;

- boguscnt = min(boguscnt, *quota);
- limit = boguscnt;
+ limit = min(limit, budget);
desc = &priv->rx_ring[q][entry];
- while (desc->die_dt != DT_FEMPTY) {
+ for (i = 0; i < limit && desc->die_dt != DT_FEMPTY; i++) {
/* Descriptor type must be checked before all other reads */
dma_rmb();
desc_status = desc->msc;
pkt_len = le16_to_cpu(desc->ds_cc) & RX_DS;

- if (--boguscnt < 0)
- break;
-
/* We use 0-byte descriptors to mark the DMA mapping errors */
if (!pkt_len)
continue;
@@ -1064,18 +1054,16 @@ static bool ravb_rx_rcar(struct net_device *ndev, int *quota, int q)
desc->die_dt = DT_FEMPTY;
}

- *quota -= limit - (++boguscnt);
-
- return boguscnt <= 0;
+ return i;
}

/* Packet receive function for Ethernet AVB */
-static bool ravb_rx(struct net_device *ndev, int *quota, int q)
+static int ravb_rx(struct net_device *ndev, int budget, int q)
{
struct ravb_private *priv = netdev_priv(ndev);
const struct ravb_hw_info *info = priv->info;

- return info->receive(ndev, quota, q);
+ return info->receive(ndev, budget, q);
}

static void ravb_rcv_snd_disable(struct net_device *ndev)
@@ -1353,12 +1341,13 @@ static int ravb_poll(struct napi_struct *napi, int budget)
unsigned long flags;
int q = napi - priv->napi;
int mask = BIT(q);
- int quota = budget;
+ int work_done;

/* Processing RX Descriptor Ring */
/* Clear RX interrupt */
ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);
- if (ravb_rx(ndev, &quota, q))
+ work_done = ravb_rx(ndev, budget, q);
+ if (work_done == budget)
goto out;

/* Processing TX Descriptor Ring */
@@ -1391,7 +1380,7 @@ static int ravb_poll(struct napi_struct *napi, int budget)
if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors)
ndev->stats.rx_fifo_errors = priv->rx_fifo_errors;
out:
- return budget - quota;
+ return work_done;
}

static void ravb_set_duplex_gbeth(struct net_device *ndev)
--
2.39.2


2024-02-06 09:20:16

by Paul Barker

Subject: [RFC PATCH net-next v2 3/7] net: ravb: Always process TX descriptor ring

The TX queue should be serviced each time the poll function is called,
even if the full RX work budget has been consumed. This prevents
starvation of the TX queue when RX bandwidth usage is high.
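
As a rough user-space illustration of the ordering this patch aims for
(a simplified model, not the driver code; `fake_dev` and
`fake_poll_tx_always()` are hypothetical names):

```c
#include <assert.h>

/* Hypothetical device state: pending RX packets, pending TX
 * completions and a running total of reclaimed TX completions. */
struct fake_dev {
	int rx_pending;
	int tx_pending;
	int tx_cleaned;
};

static int fake_poll_tx_always(struct fake_dev *dev, int budget)
{
	int work_done = 0;

	assert(budget >= 0);

	/* RX processing is limited by the budget. */
	while (work_done < budget && dev->rx_pending > 0) {
		dev->rx_pending--;
		work_done++;
	}

	/* TX completions are reclaimed on every call, even when the RX
	 * loop above has used the whole budget. */
	dev->tx_cleaned += dev->tx_pending;
	dev->tx_pending = 0;

	return work_done;
}
```

In this model a poll that exhausts its RX budget still leaves
`tx_pending` at zero afterwards, which is the property the patch is
after.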

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 20193944c143..10f11141569f 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1349,8 +1349,6 @@ static int ravb_poll(struct napi_struct *napi, int budget)
/* Clear RX interrupt */
ravb_write(ndev, ~(mask | RIS0_RESERVED), RIS0);
work_done = ravb_rx(ndev, budget, q);
- if (work_done == budget)
- goto out;

/* Processing TX Descriptor Ring */
spin_lock_irqsave(&priv->lock, flags);
@@ -1360,6 +1358,9 @@ static int ravb_poll(struct napi_struct *napi, int budget)
netif_wake_subqueue(ndev, q);
spin_unlock_irqrestore(&priv->lock, flags);

+ if (work_done == budget)
+ goto out;
+
napi_complete(napi);

/* Re-enable RX/TX interrupts */
--
2.39.2


2024-02-06 09:20:34

by Paul Barker

Subject: [RFC PATCH net-next v2 2/7] net: ravb: Count packets instead of descriptors in RX path

The units of "work done" in the RX path should be packets instead of
descriptors, as large packets can be spread over multiple descriptors.
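
The difference between the two units can be shown with a small
user-space model. This is a sketch under simplifying assumptions: the
`FAKE_*` descriptor types only loosely mirror the DT_* values used by the
driver, and `count_rx_packets()` is a hypothetical helper, not a driver
function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified descriptor types: a packet is either a single descriptor
 * or a START .. MID .. END sequence. */
enum fake_dt { FAKE_FSINGLE, FAKE_FSTART, FAKE_FMID, FAKE_FEND };

/* Walk the descriptors and stop once `budget` complete packets have
 * been seen. Returns the packet count; *used receives the number of
 * descriptors consumed. */
static int count_rx_packets(const enum fake_dt *ring, size_t n,
			    int budget, size_t *used)
{
	int rx_packets = 0;
	size_t i;

	assert(budget >= 0);
	for (i = 0; i < n && rx_packets < budget; i++) {
		/* Only the descriptor that completes a packet counts. */
		if (ring[i] == FAKE_FSINGLE || ring[i] == FAKE_FEND)
			rx_packets++;
	}

	*used = i;
	return rx_packets;
}
```

For a ring holding START, MID, END, SINGLE, START, END, a budget of two
packets consumes four descriptors in this model, whereas a budget
counted in descriptors would have stopped after two descriptors without
completing any packet.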

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb_main.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index b18026575a2d..20193944c143 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -830,6 +830,7 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
struct ravb_rx_desc *desc;
struct sk_buff *skb;
dma_addr_t dma_addr;
+ int rx_packets = 0;
u8 desc_status;
u16 pkt_len;
u8 die_dt;
@@ -841,9 +842,8 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
limit = priv->dirty_rx[q] + priv->num_rx_ring[q] - priv->cur_rx[q];
stats = &priv->stats[q];

- limit = min(limit, budget);
desc = &priv->gbeth_rx_ring[entry];
- for (i = 0; i < limit && desc->die_dt != DT_FEMPTY; i++) {
+ for (i = 0; i < limit && rx_packets < budget && desc->die_dt != DT_FEMPTY; i++) {
/* Descriptor type must be checked before all other reads */
dma_rmb();
desc_status = desc->msc;
@@ -876,7 +876,7 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
if (ndev->features & NETIF_F_RXCSUM)
ravb_rx_csum_gbeth(skb);
napi_gro_receive(&priv->napi[q], skb);
- stats->rx_packets++;
+ rx_packets++;
stats->rx_bytes += pkt_len;
break;
case DT_FSTART:
@@ -906,7 +906,7 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
ravb_rx_csum_gbeth(skb);
napi_gro_receive(&priv->napi[q],
priv->rx_1st_skb);
- stats->rx_packets++;
+ rx_packets++;
stats->rx_bytes += pkt_len;
break;
}
@@ -945,7 +945,8 @@ static int ravb_rx_gbeth(struct net_device *ndev, int budget, int q)
desc->die_dt = DT_FEMPTY;
}

- return i;
+ stats->rx_packets += rx_packets;
+ return rx_packets;
}

/* Packet receive function for Ethernet AVB */
@@ -960,14 +961,14 @@ static int ravb_rx_rcar(struct net_device *ndev, int budget, int q)
struct sk_buff *skb;
dma_addr_t dma_addr;
struct timespec64 ts;
+ int rx_packets = 0;
u8 desc_status;
u16 pkt_len;
int limit;
int i;

- limit = min(limit, budget);
desc = &priv->rx_ring[q][entry];
- for (i = 0; i < limit && desc->die_dt != DT_FEMPTY; i++) {
+ for (i = 0; i < limit && rx_packets < budget && desc->die_dt != DT_FEMPTY; i++) {
/* Descriptor type must be checked before all other reads */
dma_rmb();
desc_status = desc->msc;
@@ -1018,7 +1019,7 @@ static int ravb_rx_rcar(struct net_device *ndev, int budget, int q)
if (ndev->features & NETIF_F_RXCSUM)
ravb_rx_csum(skb);
napi_gro_receive(&priv->napi[q], skb);
- stats->rx_packets++;
+ rx_packets++;
stats->rx_bytes += pkt_len;
}

@@ -1054,7 +1055,8 @@ static int ravb_rx_rcar(struct net_device *ndev, int budget, int q)
desc->die_dt = DT_FEMPTY;
}

- return i;
+ stats->rx_packets += rx_packets;
+ return rx_packets;
}

/* Packet receive function for Ethernet AVB */
--
2.39.2


2024-02-06 09:20:53

by Paul Barker

Subject: [RFC PATCH net-next v2 4/7] net: ravb: Always update error counters

The error statistics should be updated each time the poll function is
called, even if the full RX work budget has been consumed. This prevents
the counts from becoming stuck when RX bandwidth usage is high.

This also ensures that error counters are not updated after we've
re-enabled interrupts as that could result in a race condition.

Also drop an unnecessary space.

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb_main.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 10f11141569f..2136600d60dd 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1358,6 +1358,15 @@ static int ravb_poll(struct napi_struct *napi, int budget)
netif_wake_subqueue(ndev, q);
spin_unlock_irqrestore(&priv->lock, flags);

+ /* Receive error message handling */
+ priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors;
+ if (info->nc_queues)
+ priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors;
+ if (priv->rx_over_errors != ndev->stats.rx_over_errors)
+ ndev->stats.rx_over_errors = priv->rx_over_errors;
+ if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors)
+ ndev->stats.rx_fifo_errors = priv->rx_fifo_errors;
+
if (work_done == budget)
goto out;

@@ -1374,14 +1383,6 @@ static int ravb_poll(struct napi_struct *napi, int budget)
}
spin_unlock_irqrestore(&priv->lock, flags);

- /* Receive error message handling */
- priv->rx_over_errors = priv->stats[RAVB_BE].rx_over_errors;
- if (info->nc_queues)
- priv->rx_over_errors += priv->stats[RAVB_NC].rx_over_errors;
- if (priv->rx_over_errors != ndev->stats.rx_over_errors)
- ndev->stats.rx_over_errors = priv->rx_over_errors;
- if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors)
- ndev->stats.rx_fifo_errors = priv->rx_fifo_errors;
out:
return work_done;
}
--
2.39.2


2024-02-06 09:21:12

by Paul Barker

Subject: [RFC PATCH net-next v2 5/7] net: ravb: Align poll function with NAPI docs

Call napi_complete_done() in accordance with the documentation in
`Documentation/networking/napi.rst`.
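
The pattern the patch follows can be modelled in user space as below.
This is a sketch only: `fake_napi`, `fake_complete_done()` and
`fake_poll_finish()` are hypothetical stand-ins, and the `keep_polling`
flag is an assumption used to represent any case in which the real
napi_complete_done() returns false (for example when IRQ deferral is
active).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical NAPI context for illustration. */
struct fake_napi {
	bool scheduled;    /* polling is still scheduled */
	bool irq_enabled;  /* device interrupts are unmasked */
	bool keep_polling; /* core wants to stay in polling mode */
};

/* Stand-in for napi_complete_done(): returns true only when polling
 * has really been completed and interrupts may be re-armed. */
static bool fake_complete_done(struct fake_napi *n, int work_done)
{
	(void)work_done;
	if (n->keep_polling)
		return false;

	n->scheduled = false;
	return true;
}

/* Tail of a poll function: re-enable interrupts only if the budget
 * was not exhausted and completion actually succeeded. */
static int fake_poll_finish(struct fake_napi *n, int work_done, int budget)
{
	assert(work_done >= 0 && work_done <= budget);
	if (work_done < budget && fake_complete_done(n, work_done))
		n->irq_enabled = true;

	return work_done;
}
```

The point of checking the return value is that interrupts stay masked
whenever the core decides to keep polling, which is what allows software
IRQ coalescing to take effect.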

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb_main.c | 26 ++++++++++--------------
1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 2136600d60dd..661fd86899ac 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1367,23 +1367,19 @@ static int ravb_poll(struct napi_struct *napi, int budget)
if (priv->rx_fifo_errors != ndev->stats.rx_fifo_errors)
ndev->stats.rx_fifo_errors = priv->rx_fifo_errors;

- if (work_done == budget)
- goto out;
-
- napi_complete(napi);
-
- /* Re-enable RX/TX interrupts */
- spin_lock_irqsave(&priv->lock, flags);
- if (!info->irq_en_dis) {
- ravb_modify(ndev, RIC0, mask, mask);
- ravb_modify(ndev, TIC, mask, mask);
- } else {
- ravb_write(ndev, mask, RIE0);
- ravb_write(ndev, mask, TIE);
+ if (work_done < budget && napi_complete_done(napi, work_done)) {
+ /* Re-enable RX/TX interrupts */
+ spin_lock_irqsave(&priv->lock, flags);
+ if (!info->irq_en_dis) {
+ ravb_modify(ndev, RIC0, mask, mask);
+ ravb_modify(ndev, TIC, mask, mask);
+ } else {
+ ravb_write(ndev, mask, RIE0);
+ ravb_write(ndev, mask, TIE);
+ }
+ spin_unlock_irqrestore(&priv->lock, flags);
}
- spin_unlock_irqrestore(&priv->lock, flags);

-out:
return work_done;
}

--
2.39.2


2024-02-06 09:21:48

by Paul Barker

Subject: [RFC PATCH net-next v2 7/7] net: ravb: Use NAPI threaded mode on 1-core CPUs with GbEth IP

NAPI Threaded mode (along with the previously enabled SW IRQ Coalescing)
is required to improve network stack performance for single core SoCs
using the GbEth IP (currently the RZ/G2L SoC family and the RZ/G3S SoC).

For the RZ/G2UL, network throughput is greatly increased by this change
(results obtained with iperf3) for all test cases except UDP TX:
* TCP TX: 30% more throughput
* TCP RX: 9.8% more throughput
* UDP TX: 9.7% less throughput
* UDP RX: 89% more throughput

For the RZ/G3S we see improvements in network throughput similar to the
RZ/G2UL.

The improvement of UDP RX bandwidth for the single core SoCs (RZ/G2UL &
RZ/G3S) is particularly critical. NAPI Threaded mode can be disabled at
runtime via sysfs for applications where UDP TX performance is a
priority.
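
The resulting choice of defaults can be summarised with a small
user-space model of the probe-time decision. This is a sketch:
`fake_hw_info`, `fake_defaults` and `pick_defaults()` are hypothetical
names, and `present_cpus` stands in for the value the driver obtains from
num_present_cpus().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down version of the per-IP hardware info. */
struct fake_hw_info {
	unsigned int needs_irq_coalesce:1;
};

/* Defaults that would be applied before the device is registered. */
struct fake_defaults {
	bool sw_irq_coalesce;
	bool napi_threaded;
};

static struct fake_defaults pick_defaults(const struct fake_hw_info *info,
					  unsigned int present_cpus)
{
	struct fake_defaults d = { false, false };

	assert(present_cpus >= 1);
	if (info->needs_irq_coalesce) {
		d.sw_irq_coalesce = true;
		/* Threaded NAPI is only defaulted on for single core SoCs. */
		if (present_cpus == 1)
			d.napi_threaded = true;
	}

	return d;
}
```

In this model a GbEth device on a single core SoC gets both defaults, a
GbEth device on a multi-core SoC gets only SW IRQ coalescing, and an
R-Car style device gets neither.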

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb_main.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 7bb80608f260..522df82524ff 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2984,8 +2984,11 @@ static int ravb_probe(struct platform_device *pdev)
if (info->nc_queues)
netif_napi_add(ndev, &priv->napi[RAVB_NC], ravb_poll);

- if (info->needs_irq_coalesce)
+ if (info->needs_irq_coalesce) {
netdev_sw_irq_coalesce_default_on(ndev);
+ if (num_present_cpus() == 1)
+ dev_set_threaded(ndev, true);
+ }

/* Network device register */
error = register_netdev(ndev);
--
2.39.2


2024-02-06 09:22:12

by Paul Barker

Subject: [RFC PATCH net-next v2 6/7] net: ravb: Enable SW IRQ Coalescing for GbEth

Software IRQ Coalescing is required to improve network stack performance
in the RZ/G2L SoC family and the RZ/G3S SoC, i.e. the SoCs which use the
GbEth IP.

For the RZ/G2L, network throughput is comparable before and after this
change. CPU usage during TCP RX testing dropped by 6.5% and during UDP
RX testing dropped by 10%.

For the RZ/G2UL, network throughput is greatly increased by this change
(results obtained with iperf3):
* TCP TX: 2.9% more throughput
* TCP RX: 1.1% more throughput
* UDP TX: similar throughput
* UDP RX: 41500% more throughput

For the RZ/G3S we see improvements in network throughput similar to the
RZ/G2UL.

The improvement of UDP RX bandwidth for the single core SoCs (RZ/G2UL &
RZ/G3S) is particularly critical.

Signed-off-by: Paul Barker <[email protected]>
---
drivers/net/ethernet/renesas/ravb.h | 1 +
drivers/net/ethernet/renesas/ravb_main.c | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
index 55a7a08aabef..ca7a66759e35 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -1078,6 +1078,7 @@ struct ravb_hw_info {
unsigned nc_queues:1; /* AVB-DMAC has RX and TX NC queues */
unsigned magic_pkt:1; /* E-MAC supports magic packet detection */
unsigned half_duplex:1; /* E-MAC supports half duplex mode */
+ unsigned needs_irq_coalesce:1; /* Requires SW IRQ Coalescing to achieve best performance */
};

struct ravb_private {
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 661fd86899ac..7bb80608f260 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2639,6 +2639,7 @@ static const struct ravb_hw_info gbeth_hw_info = {
.tx_counters = 1,
.carrier_counters = 1,
.half_duplex = 1,
+ .needs_irq_coalesce = 1,
};

static const struct of_device_id ravb_match_table[] = {
@@ -2983,6 +2984,9 @@ static int ravb_probe(struct platform_device *pdev)
if (info->nc_queues)
netif_napi_add(ndev, &priv->napi[RAVB_NC], ravb_poll);

+ if (info->needs_irq_coalesce)
+ netdev_sw_irq_coalesce_default_on(ndev);
+
/* Network device register */
error = register_netdev(ndev);
if (error)
--
2.39.2


2024-02-10 16:14:10

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 1/7] net: ravb: Simplify poll & receive functions

On 2/6/24 12:19 PM, Paul Barker wrote:

> We don't need to pass the work budget to ravb_rx() by reference, it's
> cleaner to pass this by value and return the amount of work done. This
> allows us to simplify the ravb_poll() function and use the common
> `work_done` variable name seen in other network drivers for consistency
> and ease of understanding.
>
> In ravb_rx_gbeth() & ravb_rx_rcar(), we can also drop the confusingly
> named `boguscnt` variable and use a for loop to iterate through
> descriptors.
>
> This is a pure refactor and should not affect behaviour.
>
> Signed-off-by: Paul Barker <[email protected]>

Reviewed-by: Sergey Shtylyov <[email protected]>

[...]

MBR, Sergey

2024-02-10 16:34:01

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 2/7] net: ravb: Count packets instead of descriptors in RX path

On 2/6/24 12:19 PM, Paul Barker wrote:

> The units of "work done" in the RX path should be packets instead of
> descriptors, as large packets can be spread over multiple descriptors.

Only for GbEth, right?
This does look like a bug fix...

> Signed-off-by: Paul Barker <[email protected]>

[...]

MBR, Sergey

2024-02-10 16:42:29

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 3/7] net: ravb: Always process TX descriptor ring

On 2/6/24 12:19 PM, Paul Barker wrote:

> The TX queue should be serviced each time the poll function is called,
> even if the full RX work budget has been consumed. This prevents
> starvation of the TX queue when RX bandwidth usage is high.
>
> Signed-off-by: Paul Barker <[email protected]>

Also does look like a bug fix...

[...]

MBR, Sergey

2024-02-10 16:48:25

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 4/7] net: ravb: Always update error counters

On 2/6/24 12:19 PM, Paul Barker wrote:

> The error statistics should be updated each time the poll function is
> called, even if the full RX work budget has been consumed. This prevents
> the counts from becoming stuck when RX bandwidth usage is high.
>
> This also ensures that error counters are not updated after we've
> re-enabled interrupts as that could result in a race condition.
>
> Also drop an unnecessary space.
>
> Signed-off-by: Paul Barker <[email protected]>

Definitely looks like a bug fix...

[...]

MBR, Sergey

2024-02-10 17:11:36

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 5/7] net: ravb: Align poll function with NAPI docs

On 2/6/24 12:19 PM, Paul Barker wrote:

> Call napi_complete_done() in accordance with the documentation in
> `Documentation/networking/napi.rst`.
>
> Signed-off-by: Paul Barker <[email protected]>

Reviewed-by: Sergey Shtylyov <[email protected]>

[...]

MBR, Sergey

2024-02-10 18:42:55

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 6/7] net: ravb: Enable SW IRQ Coalescing for GbEth

On 2/6/24 12:19 PM, Paul Barker wrote:

> Software IRQ Coalescing is required to improve network stack performance
> in the RZ/G2L SoC family and the RZ/G3S SoC, i.e. the SoCs which use the
> GbEth IP.
>
> For the RZ/G2L, network throughput is comparable before and after this
> change. CPU usage during TCP RX testing dropped by 6.5% and during UDP
> RX testing dropped by 10%.
>
> For the RZ/G2UL, network throughput is greatly increased by this change
> (results obtained with iperf3):
> * TCP TX: 2.9% more throughput
> * TCP RX: 1.1% more throughput
> * UDP TX: similar throughput
> * UDP RX: 41500% more throughput

Wow! 8-)

> For the RZ/G3S we see improvements in network throughput similar to the
> RZ/G2UL.
>
> The improvement of UDP RX bandwidth for the single core SoCs (RZ/G2UL &
> RZ/G3S) is particularly critical.
>
> Signed-off-by: Paul Barker <[email protected]>
[...]

> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
> index 55a7a08aabef..ca7a66759e35 100644
> --- a/drivers/net/ethernet/renesas/ravb.h
> +++ b/drivers/net/ethernet/renesas/ravb.h
> @@ -1078,6 +1078,7 @@ struct ravb_hw_info {
> unsigned nc_queues:1; /* AVB-DMAC has RX and TX NC queues */
> unsigned magic_pkt:1; /* E-MAC supports magic packet detection */
> unsigned half_duplex:1; /* E-MAC supports half duplex mode */
> + unsigned needs_irq_coalesce:1; /* Requires SW IRQ Coalescing to achieve best performance */

Is this really a hardware feature?
Also, s/Requires SW/Needs software/ and s/to achieve best performance//,
please...

[...]

MBR, Sergey

2024-02-10 19:00:27

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 7/7] net: ravb: Use NAPI threaded mode on 1-core CPUs with GbEth IP

On 2/6/24 12:19 PM, Paul Barker wrote:

> NAPI Threaded mode (along with the previously enabled SW IRQ Coalescing)
> is required to improve network stack performance for single core SoCs
> using the GbEth IP (currently the RZ/G2L SoC family and the RZ/G3S SoC).
>
> For the RZ/G2UL, network throughput is greatly increased by this change
> (results obtained with iperf3) for all test cases except UDP TX:
> * TCP TX: 30% more throughput
> * TCP RX: 9.8% more throughput
> * UDP TX: 9.7% less throughput
> * UDP RX: 89% more throughput
>
> For the RZ/G3S we see improvements in network throughput similar to the
> RZ/G2UL.
>
> The improvement of UDP RX bandwidth for the single core SoCs (RZ/G2UL &
> RZ/G3S) is particularly critical. NAPI Threaded mode can be disabled at
> runtime via sysfs for applications where UDP TX performance is a
> priority.
>
> Signed-off-by: Paul Barker <[email protected]>

Reviewed-by: Sergey Shtylyov <[email protected]>

[...]

MBR, Sergey

2024-02-10 19:36:39

by Sergey Shtylyov

Subject: Re: [RFC PATCH net-next v2 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs

On 2/6/24 12:19 PM, Paul Barker wrote:

> This series aims to improve peformance of the GbEth IP in the Renesas

You didn't fix the typo in "peformance"... :-/

> RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
> the way, we do some refactoring and ensure that napi_complete_done() is
> used in accordance with the NAPI documentation for both GbEth and R-Car
> code paths.
>
> Performance improvment mainly comes from enabling SW IRQ Coalescing for

And in "improvment" too... :-/

> all SoCs using the GbEth IP, and NAPI Threaded mode for single core SoCs
> using the GbEth IP. These can be enabled/disabled at runtime via sysfs,
> but our goal is to set sensible defaults which get good performance on
> the affected SoCs.
>
> The performance impact of this series on iperf3 testing is as follows:
> * RZ/G2L Ethernet throughput is unchanged, but CPU usage drops:
> * Bidirectional and TCP RX: 6.5% less CPU usage
> * UDP RX: 10% less CPU usage
>
> * RZ/G2UL and RZ/G3S Ethernet throughput is increased for all test
> cases except UDP TX, which suffers a slight loss:
> * TCP TX: 32% more throughput
> * TCP RX: 11% more throughput
> * UDP TX: 10% less throughput
> * UDP RX: 10183% more throughput - the previous throughput of

So this is a real figure? I thought you forgot to erase 10... :-)

> 1.06Mbps is what prompted this work.
>
> * RZ/G2N CPU usage and Ethernet throughput is unchanged (tested as a
> representative of the SoCs which use the R-Car based RAVB IP).
>
> This series depends on:
> * "net: ravb: Let IP-specific receive function to interrogate descriptors" v6
> https://lore.kernel.org/all/[email protected]/

This one has been merged now, so you can drop RFC...

> To get the results shown above, you'll also need:
> * "topology: Set capacity_freq_ref in all cases"
> https://lore.kernel.org/all/[email protected]/
>
> * "ravb: Add Rx checksum offload support" v4
> https://lore.kernel.org/all/[email protected]/
>
> * "ravb: Add Tx checksum offload support" v4
> https://lore.kernel.org/all/[email protected]/

These two have been merged too...

> Work in this area will continue, in particular we expect to improve
> TCP/UDP RX performance further with future changes to RX buffer
> handling.
>
> Changes v1->v2:
> * Marked as RFC as the series depends on unmerged patches.
> * Refactored R-Car code paths as well as GbEth code paths.
> * Updated references to the patches this series depends on.
>
> Paul Barker (7):
> net: ravb: Simplify poll & receive functions

The below 3 commits fix issues in the GbEth code, so should
be redone against net.git and posted separately from this series...

> net: ravb: Count packets instead of descriptors in RX path
> net: ravb: Always process TX descriptor ring
> net: ravb: Always update error counters

[...]

MBR, Sergey

2024-02-12 11:45:40

by Paul Barker

Subject: Re: [RFC PATCH net-next v2 6/7] net: ravb: Enable SW IRQ Coalescing for GbEth

On 10/02/2024 18:42, Sergey Shtylyov wrote:
> On 2/6/24 12:19 PM, Paul Barker wrote:
>> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
>> index 55a7a08aabef..ca7a66759e35 100644
>> --- a/drivers/net/ethernet/renesas/ravb.h
>> +++ b/drivers/net/ethernet/renesas/ravb.h
>> @@ -1078,6 +1078,7 @@ struct ravb_hw_info {
>> unsigned nc_queues:1; /* AVB-DMAC has RX and TX NC queues */
>> unsigned magic_pkt:1; /* E-MAC supports magic packet detection */
>> unsigned half_duplex:1; /* E-MAC supports half duplex mode */
>> + unsigned needs_irq_coalesce:1; /* Requires SW IRQ Coalescing to achieve best performance */
>
> Is this really a hardware feature?

It's more like a requirement to get the best out of this hardware and the Linux networking stack.

I considered checking the compatible string in the probe function but I decided that storing a configuration bit in the HW info struct was cleaner.

> Also, s/Requires SW/Needs software/ and s/to achieve best performance//,
> please...

Will do.

>
> [...]
>
> MBR, Sergey

Thanks for the review,
Paul


2024-02-12 11:53:00

by Paul Barker

Subject: Re: [RFC PATCH net-next v2 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs

On 10/02/2024 19:36, Sergey Shtylyov wrote:
> On 2/6/24 12:19 PM, Paul Barker wrote:
>
>> This series aims to improve peformance of the GbEth IP in the Renesas
>
> You didn't fix the typo in "peformance"... :-/
>
>> RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
>> the way, we do some refactoring and ensure that napi_complete_done() is
>> used in accordance with the NAPI documentation for both GbEth and R-Car
>> code paths.
>>
>> Performance improvment mainly comes from enabling SW IRQ Coalescing for
>
> And in "improvment" too... :-/

I'll fix this and the above typo in v3.

>
>> all SoCs using the GbEth IP, and NAPI Threaded mode for single core SoCs
>> using the GbEth IP. These can be enabled/disabled at runtime via sysfs,
>> but our goal is to set sensible defaults which get good performance on
>> the affected SoCs.
>>
>> The performance impact of this series on iperf3 testing is as follows:
>> * RZ/G2L Ethernet throughput is unchanged, but CPU usage drops:
>> * Bidirectional and TCP RX: 6.5% less CPU usage
>> * UDP RX: 10% less CPU usage
>>
>> * RZ/G2UL and RZ/G3S Ethernet throughput is increased for all test
>> cases except UDP TX, which suffers a slight loss:
>> * TCP TX: 32% more throughput
>> * TCP RX: 11% more throughput
>> * UDP TX: 10% less throughput
>> * UDP RX: 10183% more throughput - the previous throughput of
>
> So this is a real figure? I thought you forgot to erase 10... :-)

Yes, throughput went from 1.06Mbps to 109Mbps for the RZ/G2UL with these
changes.

Initial testing shows that goes up again to 485Mbps with the next patch
series I'm working on to reduce RX buffer sizes.

Biju's work on checksum offload also helps a lot with these numbers, I
can't take all the credit.

>
>> 1.06Mbps is what prompted this work.
>>
>> * RZ/G2N CPU usage and Ethernet throughput is unchanged (tested as a
>> representative of the SoCs which use the R-Car based RAVB IP).
>>
>> This series depends on:
>> * "net: ravb: Let IP-specific receive function to interrogate descriptors" v6
>> https://lore.kernel.org/all/[email protected]/
>
> This one has been merged now, so you can drop RFC...
>
>> To get the results shown above, you'll also need:
>> * "topology: Set capacity_freq_ref in all cases"
>> https://lore.kernel.org/all/[email protected]/
>>
>> * "ravb: Add Rx checksum offload support" v4
>> https://lore.kernel.org/all/[email protected]/
>>
>> * "ravb: Add Tx checksum offload support" v4
>> https://lore.kernel.org/all/[email protected]/
>
> These two have been merged too...
>
>> Work in this area will continue, in particular we expect to improve
>> TCP/UDP RX performance further with future changes to RX buffer
>> handling.
>>
>> Changes v1->v2:
>> * Marked as RFC as the series depends on unmerged patches.
>> * Refactored R-Car code paths as well as GbEth code paths.
>> * Updated references to the patches this series depends on.
>>
>> Paul Barker (7):
>> net: ravb: Simplify poll & receive functions
>
> The below 3 commits fix issues in the GbEth code, so should
> be redone against net.git and posted separately from this series...
>
>> net: ravb: Count packets instead of descriptors in RX path
>> net: ravb: Always process TX descriptor ring
>> net: ravb: Always update error counters

I'll split out and re-submit these as bug fixes. "net: ravb: Count
packets instead of descriptors in RX path" will require a bit of rework
so it doesn't depend on the first patch of the series ("net: ravb:
Simplify poll & receive functions") so you'll probably want to re-review
when I send it.

Then I'll re-send the rest as a non-RFC series.

>
> [...]
>
> MBR, Sergey

Thanks for the review!
Paul


2024-02-12 20:40:28

by Sergey Shtylyov

[permalink] [raw]
Subject: Re: [RFC PATCH net-next v2 6/7] net: ravb: Enable SW IRQ Coalescing for GbEth

On 2/12/24 2:45 PM, Paul Barker wrote:
[...]
>>> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
>>> index 55a7a08aabef..ca7a66759e35 100644
>>> --- a/drivers/net/ethernet/renesas/ravb.h
>>> +++ b/drivers/net/ethernet/renesas/ravb.h
>>> @@ -1078,6 +1078,7 @@ struct ravb_hw_info {
>>> unsigned nc_queues:1; /* AVB-DMAC has RX and TX NC queues */
>>> unsigned magic_pkt:1; /* E-MAC supports magic packet detection */
>>> unsigned half_duplex:1; /* E-MAC supports half duplex mode */
>>> + unsigned needs_irq_coalesce:1; /* Requires SW IRQ Coalescing to achieve best performance */
>>
>> Is this really a hardware feature?
>
> It's more like a requirement to get the best out of this hardware and the Linux networking stack.
>
> I considered checking the compatible string in the probe function but I decided that storing a configuration bit in the HW info struct was cleaner.

Yes, but you added the new bit under the "hardware features" comment. :-)

>> Also, s/Requires SW/Needs software/ and s/to achieve best performance//,
>> please...
>
> Will do.

The comment is too long, I think. :-)

[...]

> Thanks for the review,
> Paul

MBR, Sergey

2024-02-12 20:53:30

by Sergey Shtylyov

[permalink] [raw]
Subject: Re: [RFC PATCH net-next v2 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs

On 2/12/24 2:52 PM, Paul Barker wrote:
[...]

>>> This series aims to improve peformance of the GbEth IP in the Renesas
>>
>> You didn't fix the typo in "peformance"... :-/
>>
>>> RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
>>> the way, we do some refactoring and ensure that napi_complete_done() is
>>> used in accordance with the NAPI documentation for both GbEth and R-Car
>>> code paths.
>>>
>>> Performance improvment mainly comes from enabling SW IRQ Coalescing for
>>
>> And in "improvment" too... :-/
>
> I'll fix this and the above typo in v3.

TIA! Chances are this will end up in the merge commit...

>>> all SoCs using the GbEth IP, and NAPI Threaded mode for single core SoCs
>>> using the GbEth IP. These can be enabled/disabled at runtime via sysfs,
>>> but our goal is to set sensible defaults which get good performance on
>>> the affected SoCs.
>>>
>>> The performance impact of this series on iperf3 testing is as follows:
>>> * RZ/G2L Ethernet throughput is unchanged, but CPU usage drops:
>>> * Bidirectional and TCP RX: 6.5% less CPU usage
>>> * UDP RX: 10% less CPU usage
>>>
>>> * RZ/G2UL and RZ/G3S Ethernet throughput is increased for all test
>>> cases except UDP TX, which suffers a slight loss:
>>> * TCP TX: 32% more throughput
>>> * TCP RX: 11% more throughput
>>> * UDP TX: 10% less throughput
>>> * UDP RX: 10183% more throughput - the previous throughput of
>>
>> So this is a real figure? I thought you forgot to erase 10... :-)
>
> Yes, throughput went from 1.06Mbps to 109Mbps for the RZ/G2UL with these
> changes.

Hm, that gives me even 10283%! :-)

> Initial testing shows that goes up again to 485Mbps with the next patch
> series I'm working on to reduce RX buffer sizes.

Oh, wow! :-)

> Biju's work on checksum offload also helps a lot with these numbers, I
> can't take all the credit.

Took 5 versions to merge, unfortunately... :-/

[...]

>>> Work in this area will continue, in particular we expect to improve
>>> TCP/UDP RX performance further with future changes to RX buffer
>>> handling.
>>>
>>> Changes v1->v2:
>>> * Marked as RFC as the series depends on unmerged patches.
>>> * Refactored R-Car code paths as well as GbEth code paths.
>>> * Updated references to the patches this series depends on.
>>>
>>> Paul Barker (7):
>>> net: ravb: Simplify poll & receive functions
>>
>> The below 3 commits fix issues in the GbEth code, so should
>> be redone against net.git and posted separately from this series...
>>
>>> net: ravb: Count packets instead of descriptors in RX path
>>> net: ravb: Always process TX descriptor ring
>>> net: ravb: Always update error counters
>
> I'll split out and re-submit these as bug fixes. "net: ravb: Count
> packets instead of descriptors in RX path" will require a bit of rework
> so it doesn't depend on the first patch of the series ("net: ravb:
> Simplify poll & receive functions") so you'll probably want to re-review
> when I send it.

Yes, I figured that at least the 1st patch would need to be reworked...

> Then I'll re-send the rest as a non-RFC series.

Won't they need to be rebased against 3 fixes?

[...]

> Thanks for the review!
> Paul

MBR, Sergey

2024-02-13 09:45:17

by Sergey Shtylyov

[permalink] [raw]
Subject: Re: [RFC PATCH net-next v2 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs

On 2/12/24 11:53 PM, Sergey Shtylyov wrote:
[...]

>>>> This series aims to improve peformance of the GbEth IP in the Renesas
>>>
>>> You didn't fix the typo in "peformance"... :-/
>>>
>>>> RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
>>>> the way, we do some refactoring and ensure that napi_complete_done() is
>>>> used in accordance with the NAPI documentation for both GbEth and R-Car
>>>> code paths.
>>>>
>>>> Performance improvment mainly comes from enabling SW IRQ Coalescing for
>>>
>>> And in "improvment" too... :-/
>>
>> I'll fix this and the above typo in v3.
>
> TIA! Chances are this will end up in the merge commit...
>
>>>> all SoCs using the GbEth IP, and NAPI Threaded mode for single core SoCs
>>>> using the GbEth IP. These can be enabled/disabled at runtime via sysfs,
>>>> but our goal is to set sensible defaults which get good performance on
>>>> the affected SoCs.
>>>>
>>>> The performance impact of this series on iperf3 testing is as follows:
>>>> * RZ/G2L Ethernet throughput is unchanged, but CPU usage drops:
>>>> * Bidirectional and TCP RX: 6.5% less CPU usage
>>>> * UDP RX: 10% less CPU usage
>>>>
>>>> * RZ/G2UL and RZ/G3S Ethernet throughput is increased for all test
>>>> cases except UDP TX, which suffers a slight loss:
>>>> * TCP TX: 32% more throughput
>>>> * TCP RX: 11% more throughput
>>>> * UDP TX: 10% less throughput
>>>> * UDP RX: 10183% more throughput - the previous throughput of
>>>
>>> So this is a real figure? I thought you forgot to erase 10... :-)
>>
>> Yes, throughput went from 1.06Mbps to 109Mbps for the RZ/G2UL with these
>> changes.
>
> Hm, that gives me even 10283%! :-)

Stupid me, forgot to subtract 100%... :-)
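
For reference, both figures in this exchange follow from the throughputs quoted above (1.06Mbps before, 109Mbps after). A minimal sketch of the arithmetic in C; the function names are hypothetical and exist only for illustration:

```c
#include <assert.h>

/* Relative increase in percent: (new - old) / old * 100. */
double percent_increase(double old_val, double new_val)
{
	assert(old_val > 0.0);
	return (new_val - old_val) / old_val * 100.0;
}

/* Plain ratio in percent: new / old * 100, baseline not subtracted. */
double percent_ratio(double old_val, double new_val)
{
	assert(old_val > 0.0);
	return new_val / old_val * 100.0;
}
```

With the numbers from this thread, percent_increase(1.06, 109.0) comes to roughly 10183 and percent_ratio(1.06, 109.0) to roughly 10283. The two always differ by exactly 100 percentage points, which is the baseline that has to be subtracted to get "more throughput".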

[...]

MBR, Sergey

2024-02-14 09:17:43

by Paul Barker

[permalink] [raw]
Subject: Re: [RFC PATCH net-next v2 0/7] Improve GbEth performance on Renesas RZ/G2L and related SoCs

On 12/02/2024 20:53, Sergey Shtylyov wrote:
> On 2/12/24 2:52 PM, Paul Barker wrote:
> [...]
>
>>>> This series aims to improve peformance of the GbEth IP in the Renesas
>>>
>>> You didn't fix the typo in "peformance"... :-/
>>>
>>>> RZ/G2L SoC family and the RZ/G3S SoC, which use the ravb driver. Along
>>>> the way, we do some refactoring and ensure that napi_complete_done() is
>>>> used in accordance with the NAPI documentation for both GbEth and R-Car
>>>> code paths.
>>>>
>>>> Performance improvment mainly comes from enabling SW IRQ Coalescing for
>>>
>>> And in "improvment" too... :-/
>>
>> I'll fix this and the above typo in v3.
>
> TIA! Chances are this will end up in the merge commit...
>
>>>> all SoCs using the GbEth IP, and NAPI Threaded mode for single core SoCs
>>>> using the GbEth IP. These can be enabled/disabled at runtime via sysfs,
>>>> but our goal is to set sensible defaults which get good performance on
>>>> the affected SoCs.
>>>>
>>>> The performance impact of this series on iperf3 testing is as follows:
>>>> * RZ/G2L Ethernet throughput is unchanged, but CPU usage drops:
>>>> * Bidirectional and TCP RX: 6.5% less CPU usage
>>>> * UDP RX: 10% less CPU usage
>>>>
>>>> * RZ/G2UL and RZ/G3S Ethernet throughput is increased for all test
>>>> cases except UDP TX, which suffers a slight loss:
>>>> * TCP TX: 32% more throughput
>>>> * TCP RX: 11% more throughput
>>>> * UDP TX: 10% less throughput
>>>> * UDP RX: 10183% more throughput - the previous throughput of
>>>
>>> So this is a real figure? I thought you forgot to erase 10... :-)
>>
>> Yes, throughput went from 1.06Mbps to 109Mbps for the RZ/G2UL with these
>> changes.
>
> Hm, that gives me even 10283%! :-)
>
>> Initial testing shows that goes up again to 485Mbps with the next patch
>> series I'm working on to reduce RX buffer sizes.
>
> Oh, wow! :-)
>
>> Biju's work on checksum offload also helps a lot with these numbers, I
>> can't take all the credit.
>
> Took 5 versions to merge, unfortunately... :-/
>
> [...]
>
>>>> Work in this area will continue, in particular we expect to improve
>>>> TCP/UDP RX performance further with future changes to RX buffer
>>>> handling.
>>>>
>>>> Changes v1->v2:
>>>> * Marked as RFC as the series depends on unmerged patches.
>>>> * Refactored R-Car code paths as well as GbEth code paths.
>>>> * Updated references to the patches this series depends on.
>>>>
>>>> Paul Barker (7):
>>>> net: ravb: Simplify poll & receive functions
>>>
>>> The below 3 commits fix issues in the GbEth code, so should
>>> be redone against net.git and posted separately from this series...
>>>
>>>> net: ravb: Count packets instead of descriptors in RX path
>>>> net: ravb: Always process TX descriptor ring
>>>> net: ravb: Always update error counters
>>
>> I'll split out and re-submit these as bug fixes. "net: ravb: Count
>> packets instead of descriptors in RX path" will require a bit of rework
>> so it doesn't depend on the first patch of the series ("net: ravb:
>> Simplify poll & receive functions") so you'll probably want to re-review
>> when I send it.
>
> Yes, I figured that at least the 1st patch would need to be reworked...
>
>> Then I'll re-send the rest as a non-RFC series.
>
> Won't they need to be rebased against 3 fixes?

Yes, the rest will need rebasing.

We need to test gPTP on an RZ/G2N board with these changes first. We're
working on it and I'll let you know the status soon. I should be able to
send at least one bugfix in a way that doesn't affect RZ/G2N & R-Car
boards though...

Thanks,

--
Paul Barker


2024-02-14 09:36:58

by Paul Barker

[permalink] [raw]
Subject: Re: [RFC PATCH net-next v2 6/7] net: ravb: Enable SW IRQ Coalescing for GbEth

On 12/02/2024 20:40, Sergey Shtylyov wrote:
> On 2/12/24 2:45 PM, Paul Barker wrote:
> [...]
>>>> diff --git a/drivers/net/ethernet/renesas/ravb.h b/drivers/net/ethernet/renesas/ravb.h
>>>> index 55a7a08aabef..ca7a66759e35 100644
>>>> --- a/drivers/net/ethernet/renesas/ravb.h
>>>> +++ b/drivers/net/ethernet/renesas/ravb.h
>>>> @@ -1078,6 +1078,7 @@ struct ravb_hw_info {
>>>> unsigned nc_queues:1; /* AVB-DMAC has RX and TX NC queues */
>>>> unsigned magic_pkt:1; /* E-MAC supports magic packet detection */
>>>> unsigned half_duplex:1; /* E-MAC supports half duplex mode */
>>>> + unsigned needs_irq_coalesce:1; /* Requires SW IRQ Coalescing to achieve best performance */
>>>
>>> Is this really a hardware feature?
>>
>> It's more like a requirement to get the best out of this hardware and the Linux networking stack.
>>
>> I considered checking the compatible string in the probe function but I decided that storing a configuration bit in the HW info struct was cleaner.
>
> Yes, but you added the new bit under the "hardware features" comment. :-)
>
>>> Also, s/Requires SW/Needs software/ and s/to achieve best performance//,
>>> please...
>>
>> Will do.
>
> The comment is too long, I think. :-)

I'll fix both in the next revision.

--
Paul Barker

