Most of these changes come from OpenWrt where they have been present and
tested for months.
First three patches are bug fixes. The rest are performance
improvements. The last patch is a cleanup to use the iopoll.h macro for
busy-waiting instead of a custom loop.
Source: 770-*.patch at https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=target/linux/generic/pending-5.10;hb=HEAD
Felix Fietkau (12):
net: ethernet: mtk_eth_soc: fix RX VLAN offload
net: ethernet: mtk_eth_soc: unmap RX data before calling build_skb
net: ethernet: mtk_eth_soc: use napi_consume_skb
net: ethernet: mtk_eth_soc: reduce MDIO bus access latency
net: ethernet: mtk_eth_soc: remove unnecessary TX queue stops
net: ethernet: mtk_eth_soc: use larger burst size for QDMA TX
net: ethernet: mtk_eth_soc: increase DMA ring sizes
net: ethernet: mtk_eth_soc: implement dynamic interrupt moderation
net: ethernet: mtk_eth_soc: cache HW pointer of last freed TX
descriptor
net: ethernet: mtk_eth_soc: only read the full RX descriptor if DMA is
done
net: ethernet: mtk_eth_soc: reduce unnecessary interrupts
net: ethernet: mtk_eth_soc: set PPE flow hash as skb hash if present
Ilya Lipnitskiy (2):
net: ethernet: mtk_eth_soc: fix build_skb cleanup
net: ethernet: mtk_eth_soc: use iopoll.h macro for DMA init
drivers/net/ethernet/mediatek/Kconfig | 1 +
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 228 ++++++++++++++------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 52 ++++-
3 files changed, 199 insertions(+), 82 deletions(-)
--
2.31.1
From: Felix Fietkau <[email protected]>
The VLAN ID in the rx descriptor is only valid if the RX_DMA_VTAG bit is
set. Fixes frames wrongly marked with VLAN tags.
Signed-off-by: Felix Fietkau <[email protected]>
[Ilya: fix commit message]
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 6b00c12c6c43..b2175ec451ab 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1319,7 +1319,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
skb->protocol = eth_type_trans(skb, netdev);
if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX &&
- RX_DMA_VID(trxd.rxd3))
+ (trxd.rxd2 & RX_DMA_VTAG))
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
RX_DMA_VID(trxd.rxd3));
skb_record_rx_queue(skb, 0);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 1a6750c08bb9..875e67b41561 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -301,6 +301,7 @@
#define RX_DMA_LSO BIT(30)
#define RX_DMA_PLEN0(_x) (((_x) & 0x3fff) << 16)
#define RX_DMA_GET_PLEN0(_x) (((_x) >> 16) & 0x3fff)
+#define RX_DMA_VTAG BIT(15)
/* QDMA descriptor rxd3 */
#define RX_DMA_VID(_x) ((_x) & 0xfff)
--
2.31.1
In case build_skb fails, call skb_free_frag on the correct pointer. Also,
update the DMA structures with the new mapping before exiting, because
the mapping was successful.
Suggested-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 540003f3fcb8..07daa5de8bec 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1304,9 +1304,9 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
/* receive data */
skb = build_skb(data, ring->frag_size);
if (unlikely(!skb)) {
- skb_free_frag(new_data);
+ skb_free_frag(data);
netdev->stats.rx_dropped++;
- goto release_desc;
+ goto skip_rx;
}
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
@@ -1326,6 +1326,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
skb_record_rx_queue(skb, 0);
napi_gro_receive(napi, skb);
+skip_rx:
ring->data[idx] = new_data;
rxd->rxd1 = (unsigned int)dma_addr;
--
2.31.1
From: Felix Fietkau <[email protected]>
Since build_skb accesses the data area (for initializing shinfo), the DMA
unmap needs to happen before that call.
Signed-off-by: Felix Fietkau <[email protected]>
[Ilya: split build_skb cleanup fix into a separate commit]
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index b2175ec451ab..540003f3fcb8 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1298,6 +1298,9 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
goto release_desc;
}
+ dma_unmap_single(eth->dev, trxd.rxd1,
+ ring->buf_size, DMA_FROM_DEVICE);
+
/* receive data */
skb = build_skb(data, ring->frag_size);
if (unlikely(!skb)) {
@@ -1307,8 +1310,6 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
}
skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
- dma_unmap_single(eth->dev, trxd.rxd1,
- ring->buf_size, DMA_FROM_DEVICE);
pktlen = RX_DMA_GET_PLEN0(trxd.rxd2);
skb->dev = netdev;
skb_put(skb, pktlen);
--
2.31.1
From: Felix Fietkau <[email protected]>
Improves TX performance.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 223131645a37..5a67bbe9bd90 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2191,7 +2191,7 @@ static int mtk_start_dma(struct mtk_eth *eth)
if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
mtk_w32(eth,
MTK_TX_WB_DDONE | MTK_TX_DMA_EN |
- MTK_DMA_SIZE_16DWORDS | MTK_NDP_CO_PRO |
+ MTK_TX_BT_32DWORDS | MTK_NDP_CO_PRO |
MTK_RX_DMA_EN | MTK_RX_2B_OFFSET |
MTK_RX_BT_32DWORDS,
MTK_QDMA_GLO_CFG);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 989342a7ae4a..039c39d750e0 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -203,7 +203,7 @@
#define MTK_RX_BT_32DWORDS (3 << 11)
#define MTK_NDP_CO_PRO BIT(10)
#define MTK_TX_WB_DDONE BIT(6)
-#define MTK_DMA_SIZE_16DWORDS (2 << 4)
+#define MTK_TX_BT_32DWORDS (3 << 4)
#define MTK_RX_DMA_BUSY BIT(3)
#define MTK_TX_DMA_BUSY BIT(1)
#define MTK_RX_DMA_EN BIT(2)
--
2.31.1
From: Felix Fietkau <[email protected]>
usleep_range often ends up sleeping much longer than the 10-20us provided
as a range here. This causes significant latency in mdio bus accesses,
which easily adds multiple seconds to the boot time on MT7621 when polling
DSA slave ports.
Use udelay via readx_poll_timeout_atomic, since the MDIO access does not
take much time.
Signed-off-by: Felix Fietkau <[email protected]>
[Ilya: use readx_poll_timeout_atomic instead of cond_resched]
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 18 ++++++++----------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
2 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 5cf64de3ddf8..a3958e99a29f 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -79,18 +79,16 @@ static u32 mtk_m32(struct mtk_eth *eth, u32 mask, u32 set, unsigned reg)
static int mtk_mdio_busy_wait(struct mtk_eth *eth)
{
- unsigned long t_start = jiffies;
+ int ret;
+ u32 val;
- while (1) {
- if (!(mtk_r32(eth, MTK_PHY_IAC) & PHY_IAC_ACCESS))
- return 0;
- if (time_after(jiffies, t_start + PHY_IAC_TIMEOUT))
- break;
- usleep_range(10, 20);
- }
+ ret = readx_poll_timeout_atomic(__raw_readl, eth->base + MTK_PHY_IAC,
+ val, !(val & PHY_IAC_ACCESS),
+ 5, PHY_IAC_TIMEOUT_US);
+ if (ret)
+ dev_err(eth->dev, "mdio: MDIO timeout\n");
- dev_err(eth->dev, "mdio: MDIO timeout\n");
- return -1;
+ return ret;
}
static u32 _mtk_mdio_write(struct mtk_eth *eth, u32 phy_addr,
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 875e67b41561..989342a7ae4a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -327,7 +327,7 @@
#define PHY_IAC_START BIT(16)
#define PHY_IAC_ADDR_SHIFT 20
#define PHY_IAC_REG_SHIFT 25
-#define PHY_IAC_TIMEOUT HZ
+#define PHY_IAC_TIMEOUT_US 1000000
#define MTK_MAC_MISC 0x1000c
#define MTK_MUX_TO_ESW BIT(0)
--
2.31.1
From: Felix Fietkau <[email protected]>
Reduces the number of interrupts under load.
Signed-off-by: Felix Fietkau <[email protected]>
[Ilya: add documentation for new struct fields]
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/Kconfig | 1 +
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 96 +++++++++++++++++++--
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 41 +++++++--
3 files changed, 124 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 08c2e446d3d5..c357c193378e 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -11,6 +11,7 @@ config NET_MEDIATEK_SOC
tristate "MediaTek SoC Gigabit Ethernet support"
depends on NET_DSA || !NET_DSA
select PHYLINK
+ select DIMLIB
help
This driver supports the gigabit ethernet MACs in the
MediaTek SoC family.
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 5a67bbe9bd90..043ab5446524 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1231,12 +1231,13 @@ static void mtk_update_rx_cpu_idx(struct mtk_eth *eth)
static int mtk_poll_rx(struct napi_struct *napi, int budget,
struct mtk_eth *eth)
{
+ struct dim_sample dim_sample = {};
struct mtk_rx_ring *ring;
int idx;
struct sk_buff *skb;
u8 *data, *new_data;
struct mtk_rx_dma *rxd, trxd;
- int done = 0;
+ int done = 0, bytes = 0;
while (done < budget) {
struct net_device *netdev;
@@ -1310,6 +1311,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
else
skb_checksum_none_assert(skb);
skb->protocol = eth_type_trans(skb, netdev);
+ bytes += pktlen;
if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX &&
(trxd.rxd2 & RX_DMA_VTAG))
@@ -1342,6 +1344,12 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
mtk_update_rx_cpu_idx(eth);
}
+ eth->rx_packets += done;
+ eth->rx_bytes += bytes;
+ dim_update_sample(eth->rx_events, eth->rx_packets, eth->rx_bytes,
+ &dim_sample);
+ net_dim(&eth->rx_dim, dim_sample);
+
return done;
}
@@ -1434,6 +1442,7 @@ static int mtk_poll_tx_pdma(struct mtk_eth *eth, int budget,
static int mtk_poll_tx(struct mtk_eth *eth, int budget)
{
struct mtk_tx_ring *ring = &eth->tx_ring;
+ struct dim_sample dim_sample = {};
unsigned int done[MTK_MAX_DEVS];
unsigned int bytes[MTK_MAX_DEVS];
int total = 0, i;
@@ -1451,8 +1460,14 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget)
continue;
netdev_completed_queue(eth->netdev[i], done[i], bytes[i]);
total += done[i];
+ eth->tx_packets += done[i];
+ eth->tx_bytes += bytes[i];
}
+ dim_update_sample(eth->tx_events, eth->tx_packets, eth->tx_bytes,
+ &dim_sample);
+ net_dim(&eth->tx_dim, dim_sample);
+
if (mtk_queue_stopped(eth) &&
(atomic_read(&ring->free_count) > ring->thresh))
mtk_wake_queue(eth);
@@ -2127,6 +2142,7 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
{
struct mtk_eth *eth = _eth;
+ eth->rx_events++;
if (likely(napi_schedule_prep(&eth->rx_napi))) {
__napi_schedule(&eth->rx_napi);
mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
@@ -2139,6 +2155,7 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
{
struct mtk_eth *eth = _eth;
+ eth->tx_events++;
if (likely(napi_schedule_prep(&eth->tx_napi))) {
__napi_schedule(&eth->tx_napi);
mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
@@ -2323,6 +2340,9 @@ static int mtk_stop(struct net_device *dev)
napi_disable(&eth->tx_napi);
napi_disable(&eth->rx_napi);
+ cancel_work_sync(&eth->rx_dim.work);
+ cancel_work_sync(&eth->tx_dim.work);
+
if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
mtk_stop_dma(eth, MTK_QDMA_GLO_CFG);
mtk_stop_dma(eth, MTK_PDMA_GLO_CFG);
@@ -2375,6 +2395,64 @@ static int mtk_clk_enable(struct mtk_eth *eth)
return ret;
}
+static void mtk_dim_rx(struct work_struct *work)
+{
+ struct dim *dim = container_of(work, struct dim, work);
+ struct mtk_eth *eth = container_of(dim, struct mtk_eth, rx_dim);
+ struct dim_cq_moder cur_profile;
+ u32 val, cur;
+
+ cur_profile = net_dim_get_rx_moderation(eth->rx_dim.mode,
+ dim->profile_ix);
+ spin_lock_bh(&eth->dim_lock);
+
+ val = mtk_r32(eth, MTK_PDMA_DELAY_INT);
+ val &= MTK_PDMA_DELAY_TX_MASK;
+ val |= MTK_PDMA_DELAY_RX_EN;
+
+ cur = min_t(u32, DIV_ROUND_UP(cur_profile.usec, 20), MTK_PDMA_DELAY_PTIME_MASK);
+ val |= cur << MTK_PDMA_DELAY_RX_PTIME_SHIFT;
+
+ cur = min_t(u32, cur_profile.pkts, MTK_PDMA_DELAY_PINT_MASK);
+ val |= cur << MTK_PDMA_DELAY_RX_PINT_SHIFT;
+
+ mtk_w32(eth, val, MTK_PDMA_DELAY_INT);
+ mtk_w32(eth, val, MTK_QDMA_DELAY_INT);
+
+ spin_unlock_bh(&eth->dim_lock);
+
+ dim->state = DIM_START_MEASURE;
+}
+
+static void mtk_dim_tx(struct work_struct *work)
+{
+ struct dim *dim = container_of(work, struct dim, work);
+ struct mtk_eth *eth = container_of(dim, struct mtk_eth, tx_dim);
+ struct dim_cq_moder cur_profile;
+ u32 val, cur;
+
+ cur_profile = net_dim_get_tx_moderation(eth->tx_dim.mode,
+ dim->profile_ix);
+ spin_lock_bh(&eth->dim_lock);
+
+ val = mtk_r32(eth, MTK_PDMA_DELAY_INT);
+ val &= MTK_PDMA_DELAY_RX_MASK;
+ val |= MTK_PDMA_DELAY_TX_EN;
+
+ cur = min_t(u32, DIV_ROUND_UP(cur_profile.usec, 20), MTK_PDMA_DELAY_PTIME_MASK);
+ val |= cur << MTK_PDMA_DELAY_TX_PTIME_SHIFT;
+
+ cur = min_t(u32, cur_profile.pkts, MTK_PDMA_DELAY_PINT_MASK);
+ val |= cur << MTK_PDMA_DELAY_TX_PINT_SHIFT;
+
+ mtk_w32(eth, val, MTK_PDMA_DELAY_INT);
+ mtk_w32(eth, val, MTK_QDMA_DELAY_INT);
+
+ spin_unlock_bh(&eth->dim_lock);
+
+ dim->state = DIM_START_MEASURE;
+}
+
static int mtk_hw_init(struct mtk_eth *eth)
{
int i, val, ret;
@@ -2396,9 +2474,6 @@ static int mtk_hw_init(struct mtk_eth *eth)
goto err_disable_pm;
}
- /* enable interrupt delay for RX */
- mtk_w32(eth, MTK_PDMA_DELAY_RX_DELAY, MTK_PDMA_DELAY_INT);
-
/* disable delay and normal interrupt */
mtk_tx_irq_disable(eth, ~0);
mtk_rx_irq_disable(eth, ~0);
@@ -2437,11 +2512,11 @@ static int mtk_hw_init(struct mtk_eth *eth)
/* Enable RX VLan Offloading */
mtk_w32(eth, 1, MTK_CDMP_EG_CTRL);
- /* enable interrupt delay for RX */
- mtk_w32(eth, MTK_PDMA_DELAY_RX_DELAY, MTK_PDMA_DELAY_INT);
+ /* set interrupt delays based on current Net DIM sample */
+ mtk_dim_rx(&eth->rx_dim.work);
+ mtk_dim_tx(&eth->tx_dim.work);
/* disable delay and normal interrupt */
- mtk_w32(eth, 0, MTK_QDMA_DELAY_INT);
mtk_tx_irq_disable(eth, ~0);
mtk_rx_irq_disable(eth, ~0);
@@ -2976,6 +3051,13 @@ static int mtk_probe(struct platform_device *pdev)
spin_lock_init(&eth->page_lock);
spin_lock_init(&eth->tx_irq_lock);
spin_lock_init(&eth->rx_irq_lock);
+ spin_lock_init(&eth->dim_lock);
+
+ eth->rx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+ INIT_WORK(&eth->rx_dim.work, mtk_dim_rx);
+
+ eth->tx_dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+ INIT_WORK(&eth->tx_dim.work, mtk_dim_tx);
if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
eth->ethsys = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 4999f8123180..a8d388b02558 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -16,6 +16,7 @@
#include <linux/refcount.h>
#include <linux/phylink.h>
#include <linux/rhashtable.h>
+#include <linux/dim.h>
#include "mtk_ppe.h"
#define MTK_QDMA_PAGE_SIZE 2048
@@ -137,13 +138,18 @@
/* PDMA Delay Interrupt Register */
#define MTK_PDMA_DELAY_INT 0xa0c
+#define MTK_PDMA_DELAY_RX_MASK GENMASK(15, 0)
#define MTK_PDMA_DELAY_RX_EN BIT(15)
-#define MTK_PDMA_DELAY_RX_PINT 4
#define MTK_PDMA_DELAY_RX_PINT_SHIFT 8
-#define MTK_PDMA_DELAY_RX_PTIME 4
-#define MTK_PDMA_DELAY_RX_DELAY \
- (MTK_PDMA_DELAY_RX_EN | MTK_PDMA_DELAY_RX_PTIME | \
- (MTK_PDMA_DELAY_RX_PINT << MTK_PDMA_DELAY_RX_PINT_SHIFT))
+#define MTK_PDMA_DELAY_RX_PTIME_SHIFT 0
+
+#define MTK_PDMA_DELAY_TX_MASK GENMASK(31, 16)
+#define MTK_PDMA_DELAY_TX_EN BIT(31)
+#define MTK_PDMA_DELAY_TX_PINT_SHIFT 24
+#define MTK_PDMA_DELAY_TX_PTIME_SHIFT 16
+
+#define MTK_PDMA_DELAY_PINT_MASK 0x7f
+#define MTK_PDMA_DELAY_PTIME_MASK 0xff
/* PDMA Interrupt Status Register */
#define MTK_PDMA_INT_STATUS 0xa20
@@ -225,6 +231,7 @@
/* QDMA Interrupt Status Register */
#define MTK_QDMA_INT_STATUS 0x1A18
#define MTK_RX_DONE_DLY BIT(30)
+#define MTK_TX_DONE_DLY BIT(28)
#define MTK_RX_DONE_INT3 BIT(19)
#define MTK_RX_DONE_INT2 BIT(18)
#define MTK_RX_DONE_INT1 BIT(17)
@@ -234,8 +241,7 @@
#define MTK_TX_DONE_INT1 BIT(1)
#define MTK_TX_DONE_INT0 BIT(0)
#define MTK_RX_DONE_INT MTK_RX_DONE_DLY
-#define MTK_TX_DONE_INT (MTK_TX_DONE_INT0 | MTK_TX_DONE_INT1 | \
- MTK_TX_DONE_INT2 | MTK_TX_DONE_INT3)
+#define MTK_TX_DONE_INT MTK_TX_DONE_DLY
/* QDMA Interrupt grouping registers */
#define MTK_QDMA_INT_GRP1 0x1a20
@@ -849,6 +855,7 @@ struct mtk_sgmii {
* @page_lock: Make sure that register operations are atomic
* @tx_irq__lock: Make sure that IRQ register operations are atomic
* @rx_irq__lock: Make sure that IRQ register operations are atomic
+ * @dim_lock: Make sure that Net DIM operations are atomic
* @dummy_dev: we run 2 netdevs on 1 physical DMA ring and need a
* dummy for NAPI to work
* @netdev: The netdev instances
@@ -867,6 +874,14 @@ struct mtk_sgmii {
* @rx_ring_qdma: Pointer to the memory holding info about the QDMA RX ring
* @tx_napi: The TX NAPI struct
* @rx_napi: The RX NAPI struct
+ * @rx_events: Net DIM RX event counter
+ * @rx_packets: Net DIM RX packet counter
+ * @rx_bytes: Net DIM RX byte counter
+ * @rx_dim: Net DIM RX context
+ * @tx_events: Net DIM TX event counter
+ * @tx_packets: Net DIM TX packet counter
+ * @tx_bytes: Net DIM TX byte counter
+ * @tx_dim: Net DIM TX context
* @scratch_ring: Newer SoCs need memory for a second HW managed TX ring
* @phy_scratch_ring: physical address of scratch_ring
* @scratch_head: The scratch memory that scratch_ring points to.
@@ -911,6 +926,18 @@ struct mtk_eth {
const struct mtk_soc_data *soc;
+ spinlock_t dim_lock;
+
+ u32 rx_events;
+ u32 rx_packets;
+ u32 rx_bytes;
+ struct dim rx_dim;
+
+ u32 tx_events;
+ u32 tx_packets;
+ u32 tx_bytes;
+ struct dim tx_dim;
+
u32 tx_int_mask_reg;
u32 tx_int_status_reg;
u32 rx_dma_l4_valid;
--
2.31.1
From: Felix Fietkau <[email protected]>
Avoid re-arming the interrupt if napi_complete returns false.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 5a531bb83348..88a437f478fd 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1517,8 +1517,8 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
if (status & MTK_TX_DONE_INT)
return budget;
- napi_complete(napi);
- mtk_tx_irq_enable(eth, MTK_TX_DONE_INT);
+ if (napi_complete(napi))
+ mtk_tx_irq_enable(eth, MTK_TX_DONE_INT);
return tx_done;
}
@@ -1551,8 +1551,9 @@ static int mtk_napi_rx(struct napi_struct *napi, int budget)
remain_budget -= rx_done;
goto poll_again;
}
- napi_complete(napi);
- mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);
+
+ if (napi_complete(napi))
+ mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);
return rx_done + budget - remain_budget;
}
--
2.31.1
From: Felix Fietkau <[email protected]>
256 descriptors is not enough for multi-gigabit traffic under load on
MT7622. Bump it to 512 to improve performance.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 039c39d750e0..4999f8123180 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -22,7 +22,7 @@
#define MTK_MAX_RX_LENGTH 1536
#define MTK_MAX_RX_LENGTH_2K 2048
#define MTK_TX_DMA_BUF_LEN 0x3fff
-#define MTK_DMA_SIZE 256
+#define MTK_DMA_SIZE 512
#define MTK_NAPI_WEIGHT 64
#define MTK_MAC_COUNT 2
#define MTK_RX_ETH_HLEN (ETH_HLEN + ETH_FCS_LEN)
--
2.31.1
From: Felix Fietkau <[email protected]>
When running short on descriptors, only stop the queue for the netdev that
TX was attempted on. By the time something tries to send on the other
netdev, the ring might have some more room already.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 15 ++-------------
1 file changed, 2 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index a3958e99a29f..223131645a37 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1129,17 +1129,6 @@ static void mtk_wake_queue(struct mtk_eth *eth)
}
}
-static void mtk_stop_queue(struct mtk_eth *eth)
-{
- int i;
-
- for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->netdev[i])
- continue;
- netif_stop_queue(eth->netdev[i]);
- }
-}
-
static netdev_tx_t mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct mtk_mac *mac = netdev_priv(dev);
@@ -1160,7 +1149,7 @@ static netdev_tx_t mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
tx_num = mtk_cal_txd_req(skb);
if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
- mtk_stop_queue(eth);
+ netif_stop_queue(dev);
netif_err(eth, tx_queued, dev,
"Tx Ring full when queue awake!\n");
spin_unlock(&eth->page_lock);
@@ -1186,7 +1175,7 @@ static netdev_tx_t mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
goto drop;
if (unlikely(atomic_read(&ring->free_count) <= ring->thresh))
- mtk_stop_queue(eth);
+ netif_stop_queue(dev);
spin_unlock(&eth->page_lock);
--
2.31.1
From: Felix Fietkau <[email protected]>
Should improve performance, since it can use bulk free.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 07daa5de8bec..5cf64de3ddf8 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -858,7 +858,8 @@ static int txd_to_idx(struct mtk_tx_ring *ring, struct mtk_tx_dma *dma)
return ((void *)dma - (void *)ring->dma) / sizeof(*dma);
}
-static void mtk_tx_unmap(struct mtk_eth *eth, struct mtk_tx_buf *tx_buf)
+static void mtk_tx_unmap(struct mtk_eth *eth, struct mtk_tx_buf *tx_buf,
+ bool napi)
{
if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
if (tx_buf->flags & MTK_TX_FLAGS_SINGLE0) {
@@ -890,8 +891,12 @@ static void mtk_tx_unmap(struct mtk_eth *eth, struct mtk_tx_buf *tx_buf)
tx_buf->flags = 0;
if (tx_buf->skb &&
- (tx_buf->skb != (struct sk_buff *)MTK_DMA_DUMMY_DESC))
- dev_kfree_skb_any(tx_buf->skb);
+ (tx_buf->skb != (struct sk_buff *)MTK_DMA_DUMMY_DESC)) {
+ if (napi)
+ napi_consume_skb(tx_buf->skb, napi);
+ else
+ dev_kfree_skb_any(tx_buf->skb);
+ }
tx_buf->skb = NULL;
}
@@ -1069,7 +1074,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
tx_buf = mtk_desc_to_tx_buf(ring, itxd);
/* unmap dma */
- mtk_tx_unmap(eth, tx_buf);
+ mtk_tx_unmap(eth, tx_buf, false);
itxd->txd3 = TX_DMA_LS0 | TX_DMA_OWNER_CPU;
if (!MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
@@ -1388,7 +1393,7 @@ static int mtk_poll_tx_qdma(struct mtk_eth *eth, int budget,
done[mac]++;
budget--;
}
- mtk_tx_unmap(eth, tx_buf);
+ mtk_tx_unmap(eth, tx_buf, true);
ring->last_free = desc;
atomic_inc(&ring->free_count);
@@ -1425,7 +1430,7 @@ static int mtk_poll_tx_pdma(struct mtk_eth *eth, int budget,
budget--;
}
- mtk_tx_unmap(eth, tx_buf);
+ mtk_tx_unmap(eth, tx_buf, true);
desc = &ring->dma[cpu];
ring->last_free = desc;
@@ -1627,7 +1632,7 @@ static void mtk_tx_clean(struct mtk_eth *eth)
if (ring->buf) {
for (i = 0; i < MTK_DMA_SIZE; i++)
- mtk_tx_unmap(eth, &ring->buf[i]);
+ mtk_tx_unmap(eth, &ring->buf[i], false);
kfree(ring->buf);
ring->buf = NULL;
}
--
2.31.1
From: Felix Fietkau <[email protected]>
This improves GRO performance.
Signed-off-by: Felix Fietkau <[email protected]>
[Ilya: Use MTK_RXD4_FOE_ENTRY instead of GENMASK(13, 0)]
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 88a437f478fd..8c863322587e 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -19,6 +19,7 @@
#include <linux/interrupt.h>
#include <linux/pinctrl/devinfo.h>
#include <linux/phylink.h>
+#include <linux/jhash.h>
#include <net/dsa.h>
#include "mtk_eth_soc.h"
@@ -1248,6 +1249,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
struct net_device *netdev;
unsigned int pktlen;
dma_addr_t dma_addr;
+ u32 hash;
int mac;
ring = mtk_get_rx_ring(eth);
@@ -1317,6 +1319,12 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
skb->protocol = eth_type_trans(skb, netdev);
bytes += pktlen;
+ hash = trxd.rxd4 & MTK_RXD4_FOE_ENTRY;
+ if (hash != MTK_RXD4_FOE_ENTRY) {
+ hash = jhash_1word(hash, 0);
+ skb_set_hash(skb, hash, PKT_HASH_TYPE_L4);
+ }
+
if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX &&
(trxd.rxd2 & RX_DMA_VTAG))
__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q),
--
2.31.1
From: Felix Fietkau <[email protected]>
The value is only updated by the CPU, so it is cheaper to access from the
ring data structure than from a hardware register.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 8 ++++----
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 ++
2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 043ab5446524..01ad10c76d53 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1362,7 +1362,7 @@ static int mtk_poll_tx_qdma(struct mtk_eth *eth, int budget,
struct mtk_tx_buf *tx_buf;
u32 cpu, dma;
- cpu = mtk_r32(eth, MTK_QTX_CRX_PTR);
+ cpu = ring->last_free_ptr;
dma = mtk_r32(eth, MTK_QTX_DRX_PTR);
desc = mtk_qdma_phys_to_virt(ring, cpu);
@@ -1396,6 +1396,7 @@ static int mtk_poll_tx_qdma(struct mtk_eth *eth, int budget,
cpu = next_cpu;
}
+ ring->last_free_ptr = cpu;
mtk_w32(eth, cpu, MTK_QTX_CRX_PTR);
return budget;
@@ -1596,6 +1597,7 @@ static int mtk_tx_alloc(struct mtk_eth *eth)
atomic_set(&ring->free_count, MTK_DMA_SIZE - 2);
ring->next_free = &ring->dma[0];
ring->last_free = &ring->dma[MTK_DMA_SIZE - 1];
+ ring->last_free_ptr = (u32)(ring->phys + ((MTK_DMA_SIZE - 1) * sz));
ring->thresh = MAX_SKB_FRAGS;
/* make sure that all changes to the dma ring are flushed before we
@@ -1609,9 +1611,7 @@ static int mtk_tx_alloc(struct mtk_eth *eth)
mtk_w32(eth,
ring->phys + ((MTK_DMA_SIZE - 1) * sz),
MTK_QTX_CRX_PTR);
- mtk_w32(eth,
- ring->phys + ((MTK_DMA_SIZE - 1) * sz),
- MTK_QTX_DRX_PTR);
+ mtk_w32(eth, ring->last_free_ptr, MTK_QTX_DRX_PTR);
mtk_w32(eth, (QDMA_RES_THRES << 8) | QDMA_RES_THRES,
MTK_QTX_CFG(0));
} else {
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index a8d388b02558..214da569e869 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -642,6 +642,7 @@ struct mtk_tx_buf {
* @phys: The physical addr of tx_buf
* @next_free: Pointer to the next free descriptor
* @last_free: Pointer to the last free descriptor
+ * @last_free_ptr: Hardware pointer value of the last free descriptor
* @thresh: The threshold of minimum amount of free descriptors
* @free_count: QDMA uses a linked list. Track how many free descriptors
* are present
@@ -652,6 +653,7 @@ struct mtk_tx_ring {
dma_addr_t phys;
struct mtk_tx_dma *next_free;
struct mtk_tx_dma *last_free;
+ u32 last_free_ptr;
u16 thresh;
atomic_t free_count;
int dma_size;
--
2.31.1
From: Felix Fietkau <[email protected]>
Uncached memory access is expensive, and there is no need to access all
descriptor words if we can't process them anyway.
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 01ad10c76d53..5a531bb83348 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -775,13 +775,18 @@ static inline int mtk_max_buf_size(int frag_size)
return buf_size;
}
-static inline void mtk_rx_get_desc(struct mtk_rx_dma *rxd,
+static inline bool mtk_rx_get_desc(struct mtk_rx_dma *rxd,
struct mtk_rx_dma *dma_rxd)
{
- rxd->rxd1 = READ_ONCE(dma_rxd->rxd1);
rxd->rxd2 = READ_ONCE(dma_rxd->rxd2);
+ if (!(rxd->rxd2 & RX_DMA_DONE))
+ return false;
+
+ rxd->rxd1 = READ_ONCE(dma_rxd->rxd1);
rxd->rxd3 = READ_ONCE(dma_rxd->rxd3);
rxd->rxd4 = READ_ONCE(dma_rxd->rxd4);
+
+ return true;
}
/* the qdma core needs scratch memory to be setup */
@@ -1253,8 +1258,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
rxd = &ring->dma[idx];
data = ring->data[idx];
- mtk_rx_get_desc(&trxd, rxd);
- if (!(trxd.rxd2 & RX_DMA_DONE))
+ if (!mtk_rx_get_desc(&trxd, rxd))
break;
/* find out which mac the packet come from. values start at 1 */
--
2.31.1
Replace a tight busy-wait loop without a pause with a standard
readx_poll_timeout_atomic routine with a 5 us poll period.
Tested by booting an MT7621 device to ensure the driver initializes
properly.
Signed-off-by: Ilya Lipnitskiy <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 29 +++++++++------------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
2 files changed, 14 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 8c863322587e..720d73d0c007 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2037,25 +2037,22 @@ static int mtk_set_features(struct net_device *dev, netdev_features_t features)
/* wait for DMA to finish whatever it is doing before we start using it again */
static int mtk_dma_busy_wait(struct mtk_eth *eth)
{
- unsigned long t_start = jiffies;
+ u32 val;
+ int ret;
+ unsigned int reg;
- while (1) {
- if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
- if (!(mtk_r32(eth, MTK_QDMA_GLO_CFG) &
- (MTK_RX_DMA_BUSY | MTK_TX_DMA_BUSY)))
- return 0;
- } else {
- if (!(mtk_r32(eth, MTK_PDMA_GLO_CFG) &
- (MTK_RX_DMA_BUSY | MTK_TX_DMA_BUSY)))
- return 0;
- }
+ if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+ reg = MTK_QDMA_GLO_CFG;
+ else
+ reg = MTK_PDMA_GLO_CFG;
- if (time_after(jiffies, t_start + MTK_DMA_BUSY_TIMEOUT))
- break;
- }
+ ret = readx_poll_timeout_atomic(__raw_readl, eth->base + reg, val,
+ !(val & (MTK_RX_DMA_BUSY | MTK_TX_DMA_BUSY)),
+ 5, MTK_DMA_BUSY_TIMEOUT_US);
+ if (ret)
+ dev_err(eth->dev, "DMA init timeout\n");
- dev_err(eth->dev, "DMA init timeout\n");
- return -1;
+ return ret;
}
static int mtk_dma_init(struct mtk_eth *eth)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 214da569e869..2e4356ccf778 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -214,7 +214,7 @@
#define MTK_TX_DMA_BUSY BIT(1)
#define MTK_RX_DMA_EN BIT(2)
#define MTK_TX_DMA_EN BIT(0)
-#define MTK_DMA_BUSY_TIMEOUT HZ
+#define MTK_DMA_BUSY_TIMEOUT_US 1000000
/* QDMA Reset Index Register */
#define MTK_QDMA_RST_IDX 0x1A08
--
2.31.1
On Wed, Apr 21, 2021 at 09:09:05PM -0700, Ilya Lipnitskiy wrote:
> From: Felix Fietkau <[email protected]>
>
> usleep_range often ends up sleeping much longer than the 10-20us provided
> as a range here. This causes significant latency in mdio bus accesses,
I found the same with the FEC driver, and made the same change.
> Signed-off-by: Felix Fietkau <[email protected]>
> [Ilya: use readx_poll_timeout_atomic instead of cond_resched]
> Signed-off-by: Ilya Lipnitskiy <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Andrew
On Wed, Apr 21, 2021 at 09:09:14PM -0700, Ilya Lipnitskiy wrote:
> Replace a tight busy-wait loop without a pause with a standard
> readx_poll_timeout_atomic routine with a 5 us poll period.
>
> Tested by booting an MT7621 device to ensure the driver initializes
> properly.
>
> Signed-off-by: Ilya Lipnitskiy <[email protected]>
> ---
> drivers/net/ethernet/mediatek/mtk_eth_soc.c | 29 +++++++++------------
> drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
> 2 files changed, 14 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> index 8c863322587e..720d73d0c007 100644
> --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -2037,25 +2037,22 @@ static int mtk_set_features(struct net_device *dev, netdev_features_t features)
> /* wait for DMA to finish whatever it is doing before we start using it again */
> static int mtk_dma_busy_wait(struct mtk_eth *eth)
> {
> - unsigned long t_start = jiffies;
> + u32 val;
> + int ret;
> + unsigned int reg;
Nit:
Reverse christmas tree.
Reviewed-by: Andrew Lunn <[email protected]>
Andrew
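[Note: "reverse christmas tree" refers to the kernel convention of ordering
local variable declarations from the longest line to the shortest. Purely as
an illustration, the three declarations quoted above would then read:

    unsigned int reg;	/* longest declaration first */
    int ret;
    u32 val;
]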
On Wed, Apr 21, 2021 at 09:09:00PM -0700, Ilya Lipnitskiy wrote:
> Most of these changes come from OpenWrt where they have been present and
> tested for months.
>
> First three patches are bug fixes. The rest are performance
> improvements. The last patch is a cleanup to use the iopoll.h macro for
> busy-waiting instead of a custom loop.
Do you have any benchmark numbers you can share?
Andrew
On 2021-04-22 06:09, Ilya Lipnitskiy wrote:
> From: Felix Fietkau <[email protected]>
>
> usleep_range often ends up sleeping much longer than the 10-20us provided
> as a range here. This causes significant latency in mdio bus accesses,
> which easily adds multiple seconds to the boot time on MT7621 when polling
> DSA slave ports.
>
> Use udelay via readx_poll_timeout_atomic, since the MDIO access does not
> take much time
>
> Signed-off-by: Felix Fietkau <[email protected]>
> [Ilya: use readx_poll_timeout_atomic instead of cond_resched]
I still prefer the cond_resched() variant. On a fully loaded system, I'd
prefer to let the MDIO access take longer instead of wasting cycles on
udelay.
- Felix
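[For reference, a rough sketch of the cond_resched()-based variant Felix is
referring to, assuming the original jiffies-based PHY_IAC_TIMEOUT; this is
shown for illustration only and is not code posted in this thread:

    static int mtk_mdio_busy_wait(struct mtk_eth *eth)
    {
        unsigned long t_start = jiffies;

        while (1) {
            /* done once the PHY indirect-access register goes idle */
            if (!(mtk_r32(eth, MTK_PHY_IAC) & PHY_IAC_ACCESS))
                return 0;
            if (time_after(jiffies, t_start + PHY_IAC_TIMEOUT))
                break;
            /* yield to the scheduler instead of udelay()/usleep_range() */
            cond_resched();
        }

        dev_err(eth->dev, "mdio: MDIO timeout\n");
        return -ETIMEDOUT;
    }
]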
On Wed, 21 Apr 2021 21:09:12 -0700 Ilya Lipnitskiy wrote:
> @@ -1551,8 +1551,9 @@ static int mtk_napi_rx(struct napi_struct *napi, int budget)
> remain_budget -= rx_done;
> goto poll_again;
> }
> - napi_complete(napi);
> - mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);
> +
> + if (napi_complete(napi))
> + mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);
Why not napi_complete_done(napi, rx_done + budget - remain_budget)?
(Modulo possible elimination of rx_done in this function.)
On Thu, Apr 22, 2021 at 5:33 AM Felix Fietkau <[email protected]> wrote:
>
>
> On 2021-04-22 06:09, Ilya Lipnitskiy wrote:
> > From: Felix Fietkau <[email protected]>
> >
> > usleep_range often ends up sleeping much longer than the 10-20us provided
> > as a range here. This causes significant latency in mdio bus accesses,
> > which easily adds multiple seconds to the boot time on MT7621 when polling
> > DSA slave ports.
> >
> > Use udelay via readx_poll_timeout_atomic, since the MDIO access does not
> > take much time
> >
> > Signed-off-by: Felix Fietkau <[email protected]>
> > [Ilya: use readx_poll_timeout_atomic instead of cond_resched]
> I still prefer the cond_resched() variant.
No problem, I will respin with your original change. Looks like we
can't take advantage of the iopoll.h routine in this case, but that's not
the end of the world!
> On a fully loaded system, I'd
> prefer to let the MDIO access take longer instead of wasting cycles on
> udelay.
>
> - Felix
Ilya
On Thu, Apr 22, 2021 at 9:26 AM Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 21 Apr 2021 21:09:12 -0700 Ilya Lipnitskiy wrote:
> > @@ -1551,8 +1551,9 @@ static int mtk_napi_rx(struct napi_struct *napi, int budget)
> > remain_budget -= rx_done;
> > goto poll_again;
> > }
> > - napi_complete(napi);
> > - mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);
> > +
> > + if (napi_complete(napi))
> > + mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);
>
> Why not napi_complete_done(napi, rx_done + budget - remain_budget)?
> (Modulo possible elimination of rx_done in this function.)
No reason, I think. Thanks for pointing it out. I will clean up both
TX and RX NAPI callbacks to use napi_complete_done and to get rid of
that ugly goto...
Ilya
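[For illustration, the end of the RX poll handler might then look roughly
like this; a sketch of the suggested cleanup, not a patch from this series:

    /* report the work actually done; only re-arm the interrupt if
     * NAPI really completed
     */
    if (napi_complete_done(napi, rx_done + budget - remain_budget))
        mtk_rx_irq_enable(eth, MTK_RX_DONE_INT);

    return rx_done + budget - remain_budget;
]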
On Thu, Apr 22, 2021 at 5:23 AM Andrew Lunn <[email protected]> wrote:
>
> On Wed, Apr 21, 2021 at 09:09:00PM -0700, Ilya Lipnitskiy wrote:
> > Most of these changes come from OpenWrt where they have been present and
> > tested for months.
> >
> > First three patches are bug fixes. The rest are performance
> > improvements. The last patch is a cleanup to use the iopoll.h macro for
> > busy-waiting instead of a custom loop.
>
> Do you have any benchmark numbers you can share?
No. Felix, do you have anything handy?
If needed, I'll run some tests before and after the patch series on
OpenWrt against v5.10, but I only have an MT7621 device - should be
better than nothing though...
Ilya