2019-11-13 15:14:27

by Jose Abreu

Subject: [PATCH net-next 0/7] net: stmmac: CPU Performance Improvements

CPU Performance improvements for stmmac. Please check below for results
before and after the series.

Patch 1/7 allows the RX Interrupt on Completion to be disabled so that only
the RX HW Watchdog is used.

Patch 2/7 sets up a default RX coalesce value instead of using the
minimum value.

Patch 3/7 enables the Transmit Buffer Unavailable interrupt on GMAC4+ cores
so that we don't miss any packet that could have been coalesced.

Patches 4/7 and 5/7 remove the unneeded computations for RX Flow Control
activation/de-activation in some cases.

Patch 6/7 tunes the default coalesce settings.

Patch 7/7 corrects the interpretation of TX Coalesce.


NetPerf UDP Results:
--------------------

Socket  Message  Elapsed  Messages  Messages  Throughput    CPU     Service
Size    Size     Time     Okay      Errors                  Util    Demand
bytes   bytes    secs     #         #         10^6bits/sec  % SS    us/KB
--- XGMAC@2.5G: Before
212992  1400     10.00    2100620   0         2351.7        36.69   5.112
212992           10.00    2100539             2351.6        26.18   3.648
--- XGMAC@2.5G: After
212992  1400     10.00    2116860   0         2370.4        27.61   3.816
212992           10.00    2111552             2364.5        17.41   2.407

--- GMAC5@1G: Before
212992  1400     10.00    786000    0         880.2         34.71   12.923
212992           10.00    786000              880.2         23.42   8.719
--- GMAC5@1G: After
212992  1400     10.00    847702    0         949.3         15.07   5.201
212992           10.00    847702              949.3         12.91   4.456
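
The CPU savings in the table above can be expressed as relative reductions.
What follows is only a minimal C sketch of that arithmetic, fed with the CPU
Util values from the first row of each Before/After pair. The helper name
relative_reduction_pct is an assumption made for illustration and is not part
of the series.

```c
#include <assert.h>

/*
 * Relative reduction, in percent, when a metric drops from `before`
 * to `after`. Hypothetical helper, used only to read the table.
 */
static double relative_reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

For the first configuration in the table, relative_reduction_pct(36.69, 27.61)
comes out at roughly 25%; for GMAC5@1G, relative_reduction_pct(34.71, 15.07)
comes out at roughly 57%.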


Perf TCP Results on RX Path:
----------------------------
--- XGMAC@2.5G: Before
22.51% swapper [stmmac] [k] dwxgmac2_dma_interrupt
10.82% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status
5.21% swapper [stmmac] [k] dwxgmac2_host_irq_status
4.67% swapper [stmmac] [k] dwxgmac3_safety_feat_irq_status
3.63% swapper [kernel.kallsyms] [k] stack_trace_consume_entry
2.74% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string
2.52% swapper [kernel.kallsyms] [k] update_stack_state
1.94% ksoftirqd/0 [stmmac] [k] dwxgmac2_dma_interrupt
1.45% iperf3 [kernel.kallsyms] [k] queued_spin_lock_slowpath
1.26% swapper [kernel.kallsyms] [k] create_object
--- XGMAC@2.5G: After
12.00% swapper [stmmac] [k] dwxgmac2_dma_interrupt
5.96% swapper [kernel.kallsyms] [k] stack_trace_consume_entry
5.65% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status
4.36% swapper [kernel.kallsyms] [k] update_stack_state
3.91% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string
2.82% swapper [stmmac] [k] dwxgmac2_host_irq_status
2.62% swapper [stmmac] [k] dwxgmac3_safety_feat_irq_status
2.25% swapper [kernel.kallsyms] [k] create_object
2.03% swapper [stmmac] [k] stmmac_napi_poll_rx
1.97% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4

--- GMAC5@1G: Before
31.29% swapper [stmmac] [k] dwmac4_dma_interrupt
14.57% swapper [stmmac] [k] dwmac4_irq_mtl_status
10.66% swapper [stmmac] [k] dwmac4_irq_status
1.97% swapper [kernel.kallsyms] [k] stack_trace_consume_entry
1.73% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string
1.59% swapper [kernel.kallsyms] [k] update_stack_state
1.15% iperf3 [kernel.kallsyms] [k] do_syscall_64
1.01% ksoftirqd/0 [stmmac] [k] dwmac4_dma_interrupt
0.89% swapper [kernel.kallsyms] [k] __default_send_IPI_dest_field
0.75% swapper [stmmac] [k] stmmac_napi_poll_rx
--- GMAC5@1G: After
9.27% swapper [stmmac] [k] dwmac4_dma_interrupt
6.35% swapper [kernel.kallsyms] [k] stack_trace_consume_entry
4.94% swapper [kernel.kallsyms] [k] update_stack_state
4.70% swapper [stmmac] [k] dwmac4_irq_mtl_status
3.58% swapper [stmmac] [k] dwmac4_irq_status
3.42% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string
2.18% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4
2.17% swapper [stmmac] [k] stmmac_napi_poll_rx
2.15% swapper [kernel.kallsyms] [k] create_object
1.26% swapper [kernel.kallsyms] [k] unwind_get_return_address

---
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Jose Abreu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---

Jose Abreu (7):
  net: stmmac: Do not set RX IC bit if RX Coalesce is zero
  net: stmmac: Setup a default RX Coalesce value instead of the minimum
  net: stmmac: gmac4+: Enable the TBU Interrupt
  net: stmmac: gmac4+: Remove uneeded computation for RFA/RFD
  net: stmmac: xgmac: Remove uneeded computation for RFA/RFD
  net: stmmac: Tune-up default coalesce settings
  net: stmmac: TX Coalesce should be per-packet

 drivers/net/ethernet/stmicro/stmmac/common.h       |  5 +++--
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.c   | 14 ++------------
 drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h   |  2 ++
 drivers/net/ethernet/stmicro/stmmac/dwxgmac2_dma.c | 14 ++------------
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c  | 14 ++++++++------
 5 files changed, 17 insertions(+), 32 deletions(-)

--
2.7.4


2019-11-13 15:14:52

by Jose Abreu

Subject: [PATCH net-next 1/7] net: stmmac: Do not set RX IC bit if RX Coalesce is zero

We may want to use only the RX Watchdog, so let's check whether the RX
Coalesce setting is zero and only set the RX Interrupt on Completion bit
if it is not.

Signed-off-by: Jose Abreu <[email protected]>

---
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Jose Abreu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 39b4efd521f9..e3677883ea30 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3440,7 +3440,8 @@ static inline void stmmac_rx_refill(struct stmmac_priv *priv, u32 queue)
 		rx_q->rx_count_frames += priv->rx_coal_frames;
 		if (rx_q->rx_count_frames > priv->rx_coal_frames)
 			rx_q->rx_count_frames = 0;
-		use_rx_wd = priv->use_riwt && rx_q->rx_count_frames;
+		use_rx_wd = !priv->rx_coal_frames;
+		use_rx_wd |= priv->use_riwt && rx_q->rx_count_frames;

 		dma_wmb();
 		stmmac_set_rx_owner(priv, p, use_rx_wd);
--
2.7.4
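
To illustrate the hunk above, here is a minimal standalone C model of the
decision before and after the change. The struct and function names are
hypothetical stand-ins for the driver state, not driver code. The sketch
assumes, as the commit message suggests, that a true use_rx_wd means the
Interrupt on Completion bit is left clear and only the RX Watchdog signals
completion.

```c
#include <stdbool.h>

/* Simplified stand-ins for the fields used by the hunk (assumed model). */
struct rx_coal_model {
	unsigned int rx_coal_frames;  /* models priv->rx_coal_frames */
	bool use_riwt;                /* models priv->use_riwt */
	unsigned int rx_count_frames; /* models rx_q->rx_count_frames */
};

/* Decision as computed by the removed line. */
static bool use_rx_wd_before(const struct rx_coal_model *m)
{
	return m->use_riwt && m->rx_count_frames;
}

/* Decision as computed by the two added lines. */
static bool use_rx_wd_after(const struct rx_coal_model *m)
{
	bool use_rx_wd = !m->rx_coal_frames;

	use_rx_wd |= m->use_riwt && m->rx_count_frames;
	return use_rx_wd;
}
```

In this model, a zero rx_coal_frames makes use_rx_wd_after() return true
regardless of use_riwt, whereas use_rx_wd_before() returns false whenever the
frame counter is zero. For a non-zero rx_coal_frames both functions agree.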

2019-11-13 15:17:59

by Jose Abreu

Subject: [PATCH net-next 7/7] net: stmmac: TX Coalesce should be per-packet

TX Coalesce settings are per packet and not per fragment; otherwise
coalescing would behave differently for TSO and non-TSO packets.

Signed-off-by: Jose Abreu <[email protected]>

---
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Jose Abreu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6136ada20c8e..140abfcb54c6 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3033,7 +3033,7 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
 	tx_q->tx_skbuff[tx_q->cur_tx] = skb;

 	/* Manage tx mitigation */
-	tx_q->tx_count_frames += nfrags + 1;
+	tx_q->tx_count_frames++;
 	if (likely(priv->tx_coal_frames > tx_q->tx_count_frames) &&
 	    !((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
 	      priv->hwts_tx_en)) {
@@ -3241,7 +3241,7 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * This approach takes care about the fragments: desc is the first
 	 * element in case of no SG.
 	 */
-	tx_q->tx_count_frames += nfrags + 1;
+	tx_q->tx_count_frames++;
 	if (likely(priv->tx_coal_frames > tx_q->tx_count_frames) &&
 	    !((skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) &&
 	      priv->hwts_tx_en)) {
--
2.7.4
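
The effect of the counter change can be seen with a small standalone C
simulation. This is only a sketch: packets_until_tx_irq is a hypothetical
helper, the hardware timestamp clause is ignored, and it assumes that an
interrupt is requested for the first packet for which
tx_coal_frames > tx_count_frames no longer holds (that branch is not shown in
the hunks above).

```c
#include <stdbool.h>

/*
 * Count how many packets are queued before the coalesce condition stops
 * holding. `nfrags` is the number of fragments of every packet;
 * `per_packet` selects the new accounting (counter++) instead of the old
 * one (counter += nfrags + 1).
 */
static unsigned int packets_until_tx_irq(unsigned int tx_coal_frames,
					 unsigned int nfrags, bool per_packet)
{
	unsigned int tx_count_frames = 0;
	unsigned int packets = 0;

	for (;;) {
		packets++;
		tx_count_frames += per_packet ? 1 : nfrags + 1;
		if (!(tx_coal_frames > tx_count_frames))
			return packets;
	}
}
```

With a threshold of 25 and 4 fragments per packet, the old per-fragment
accounting reaches the threshold after 5 packets and the per-packet accounting
after 25. For packets without fragments both give 25, which is the mismatch
the commit message refers to.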

2019-11-13 16:40:32

by Jose Abreu

Subject: [PATCH net-next 3/7] net: stmmac: gmac4+: Enable the TBU Interrupt

Enable the Transmit Buffer Unavailable interrupt so that no coalesced
packet is missed on transmission.

Signed-off-by: Jose Abreu <[email protected]>

---
Cc: Giuseppe Cavallaro <[email protected]>
Cc: Alexandre Torgue <[email protected]>
Cc: Jose Abreu <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Maxime Coquelin <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
index 589931795847..1be1df5f65de 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_dma.h
@@ -161,6 +161,7 @@

 #define DMA_CHAN_INTR_NORMAL		(DMA_CHAN_INTR_ENA_NIE | \
 					 DMA_CHAN_INTR_ENA_RIE | \
+					 DMA_CHAN_INTR_ENA_TBUE | \
 					 DMA_CHAN_INTR_ENA_TIE)

 #define DMA_CHAN_INTR_ABNORMAL		(DMA_CHAN_INTR_ENA_AIE | \
@@ -171,6 +172,7 @@

 #define DMA_CHAN_INTR_NORMAL_4_10	(DMA_CHAN_INTR_ENA_NIE_4_10 | \
 					 DMA_CHAN_INTR_ENA_RIE | \
+					 DMA_CHAN_INTR_ENA_TBUE | \
 					 DMA_CHAN_INTR_ENA_TIE)

 #define DMA_CHAN_INTR_ABNORMAL_4_10	(DMA_CHAN_INTR_ENA_AIE_4_10 | \
--
2.7.4
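
The change above only ORs one more enable bit into the "normal" interrupt
mask. A minimal standalone C sketch of the same pattern follows. The EX_
macros and their bit positions are placeholders chosen for illustration and
are assumptions; the real definitions live in dwmac4_dma.h.

```c
#include <stdint.h>

/* Placeholder bit positions, for illustration only (assumed values). */
#define EX_INTR_ENA_TIE   (UINT32_C(1) << 0)
#define EX_INTR_ENA_TBUE  (UINT32_C(1) << 2)
#define EX_INTR_ENA_RIE   (UINT32_C(1) << 6)
#define EX_INTR_ENA_NIE   (UINT32_C(1) << 16)

/* Mask shape before the patch: no TBU enable bit. */
#define EX_INTR_NORMAL_OLD (EX_INTR_ENA_NIE | \
			    EX_INTR_ENA_RIE | \
			    EX_INTR_ENA_TIE)

/* Mask shape after the patch: TBU enable bit added. */
#define EX_INTR_NORMAL_NEW (EX_INTR_ENA_NIE | \
			    EX_INTR_ENA_RIE | \
			    EX_INTR_ENA_TBUE | \
			    EX_INTR_ENA_TIE)

/* Returns non-zero if the TBU interrupt is enabled in `intr_mask`. */
static int tbu_enabled(uint32_t intr_mask)
{
	return (intr_mask & EX_INTR_ENA_TBUE) != 0;
}
```

Here tbu_enabled(EX_INTR_NORMAL_OLD) is false and
tbu_enabled(EX_INTR_NORMAL_NEW) is true, and the two masks differ only in that
one bit.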

2019-11-14 11:02:09

by Jose Abreu

Subject: RE: [PATCH net-next 0/7] net: stmmac: CPU Performance Improvements

From: Jose Abreu <[email protected]>
Date: Nov/13/2019, 15:12:01 (UTC+00:00)

> CPU Performance improvements for stmmac. Please check below for results
> before and after the series.

Please do not apply this. I found an issue with patch 1/7 and I have
some more changes that reduce the CPU usage even further.

---
Thanks,
Jose Miguel Abreu

2019-11-14 21:37:27

by David Miller

Subject: Re: [PATCH net-next 0/7] net: stmmac: CPU Performance Improvements

From: Jose Abreu <[email protected]>
Date: Thu, 14 Nov 2019 10:59:14 +0000

> From: Jose Abreu <[email protected]>
> Date: Nov/13/2019, 15:12:01 (UTC+00:00)
>
>> CPU Performance improvements for stmmac. Please check below for results
>> before and after the series.
>
> Please do not apply this. I found an issue with patch 1/7 and I have
> some more changes that reduce the CPU usage even further.

Ok.