2016-04-08 07:52:48

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 0/8] net: mediatek: make the driver pass stress tests

While testing the driver we managed to get the TX path to stall and fail
to recover. When dual MAC support was added to the driver, the whole queue
stop/wake code was not properly adapted. There was also a regression in the
locking of the xmit function. The fact that watchdog_timeo was not set and
that the tx_timeout code failed to properly reset the dma, irq and queue
just made the mess complete.

This series make the driver pass stress testing. With this series applied
the testbed has been running for several days and still has not locked up.
We have a second setup that has a small hack patch applied to randomly stop
irqs and/or one of the queues and successfully manages to recover from these
simulated tx stalls.

John Crispin (8):
net: mediatek: watchdog_timeo was not set
net: mediatek: mtk_cal_txd_req() returns bad value
net: mediatek: remove superfluous reset call
net: mediatek: fix stop and wakeup of queue
net: mediatek: fix TX locking
net: mediatek: fix mtk_pending_work
net: mediatek: move the pending_work struct to the device generic
struct
net: mediatek: do not set the QID field in the TX DMA descriptors

drivers/net/ethernet/mediatek/mtk_eth_soc.c | 106 ++++++++++++++++-----------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 4 +-
2 files changed, 66 insertions(+), 44 deletions(-)

--
1.7.10.4


2016-04-08 07:50:55

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 5/8] net: mediatek: fix TX locking

Inside the TX path there is a lock inside the tx_map function. This is
however too late. The patch moves the lock to the start of the xmit
function right before the free count check of the DMA ring happens.
If we do not do this, the code becomes racy leading to TX stalls and
dropped packets. This happens as there are 2 netdevs running on the
same physical DMA ring.

Signed-off-by: John Crispin <[email protected]>

---
Changes in V3
* fix a typo in the comment

drivers/net/ethernet/mediatek/mtk_eth_soc.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 4ebc42e..7b76075 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -536,7 +536,6 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
struct mtk_eth *eth = mac->hw;
struct mtk_tx_dma *itxd, *txd;
struct mtk_tx_buf *tx_buf;
- unsigned long flags;
dma_addr_t mapped_addr;
unsigned int nr_frags;
int i, n_desc = 1;
@@ -568,11 +567,6 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
if (unlikely(dma_mapping_error(&dev->dev, mapped_addr)))
return -ENOMEM;

- /* normally we can rely on the stack not calling this more than once,
- * however we have 2 queues running ont he same ring so we need to lock
- * the ring access
- */
- spin_lock_irqsave(&eth->page_lock, flags);
WRITE_ONCE(itxd->txd1, mapped_addr);
tx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
@@ -632,8 +626,6 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
(!nr_frags * TX_DMA_LS0)));

- spin_unlock_irqrestore(&eth->page_lock, flags);
-
netdev_sent_queue(dev, skb->len);
skb_tx_timestamp(skb);

@@ -661,8 +653,6 @@ err_dma:
itxd = mtk_qdma_phys_to_virt(ring, itxd->txd2);
} while (itxd != txd);

- spin_unlock_irqrestore(&eth->page_lock, flags);
-
return -ENOMEM;
}

@@ -712,14 +702,22 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
struct mtk_eth *eth = mac->hw;
struct mtk_tx_ring *ring = &eth->tx_ring;
struct net_device_stats *stats = &dev->stats;
+ unsigned long flags;
bool gso = false;
int tx_num;

+ /* normally we can rely on the stack not calling this more than once,
+ * however we have 2 queues running on the same ring so we need to lock
+ * the ring access
+ */
+ spin_lock_irqsave(&eth->page_lock, flags);
+
tx_num = mtk_cal_txd_req(skb);
if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
mtk_stop_queue(eth);
netif_err(eth, tx_queued, dev,
"Tx Ring full when queue awake!\n");
+ spin_unlock_irqrestore(&eth->page_lock, flags);
return NETDEV_TX_BUSY;
}

@@ -747,10 +745,12 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
ring->thresh))
mtk_wake_queue(eth);
}
+ spin_unlock_irqrestore(&eth->page_lock, flags);

return NETDEV_TX_OK;

drop:
+ spin_unlock_irqrestore(&eth->page_lock, flags);
stats->tx_dropped++;
dev_kfree_skb(skb);
return NETDEV_TX_OK;
--
1.7.10.4

2016-04-08 07:50:56

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 7/8] net: mediatek: move the pending_work struct to the device generic struct

The worker always touches both netdevs. It is ethernet core and not MAC
specific. We only need one worker, which belongs into the ethernets core
struct.

Signed-off-by: John Crispin <[email protected]>

---
Changes in V3
* make the patch bisectable

drivers/net/ethernet/mediatek/mtk_eth_soc.c | 13 +++++--------
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 4 ++--
2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index cd5d0c9..eb0d554 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1193,7 +1193,7 @@ static void mtk_tx_timeout(struct net_device *dev)
eth->netdev[mac->id]->stats.tx_errors++;
netif_err(eth, tx_err, dev,
"transmit timed out\n");
- schedule_work(&mac->pending_work);
+ schedule_work(&eth->pending_work);
}

static irqreturn_t mtk_handle_irq(int irq, void *_eth)
@@ -1430,8 +1430,7 @@ static int mtk_do_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)

static void mtk_pending_work(struct work_struct *work)
{
- struct mtk_mac *mac = container_of(work, struct mtk_mac, pending_work);
- struct mtk_eth *eth = mac->hw;
+ struct mtk_eth *eth = container_of(work, struct mtk_eth, pending_work);
int err, i;
unsigned long restart = 0;

@@ -1439,7 +1438,7 @@ static void mtk_pending_work(struct work_struct *work)

/* stop all devices to make sure that dma is properly shut down */
for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!netif_oper_up(eth->netdev[i]))
+ if (!eth->netdev[i])
continue;
mtk_stop(eth->netdev[i]);
__set_bit(i, &restart);
@@ -1464,15 +1463,13 @@ static int mtk_cleanup(struct mtk_eth *eth)
int i;

for (i = 0; i < MTK_MAC_COUNT; i++) {
- struct mtk_mac *mac = netdev_priv(eth->netdev[i]);
-
if (!eth->netdev[i])
continue;

unregister_netdev(eth->netdev[i]);
free_netdev(eth->netdev[i]);
- cancel_work_sync(&mac->pending_work);
}
+ cancel_work_sync(&eth->pending_work);

return 0;
}
@@ -1660,7 +1657,6 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
mac->id = id;
mac->hw = eth;
mac->of_node = np;
- INIT_WORK(&mac->pending_work, mtk_pending_work);

mac->hw_stats = devm_kzalloc(eth->dev,
sizeof(*mac->hw_stats),
@@ -1762,6 +1758,7 @@ static int mtk_probe(struct platform_device *pdev)

eth->dev = &pdev->dev;
eth->msg_enable = netif_msg_init(mtk_msg_level, MTK_DEFAULT_MSG_ENABLE);
+ INIT_WORK(&eth->pending_work, mtk_pending_work);

err = mtk_hw_init(eth);
if (err)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 48a5292..eed626d 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -363,6 +363,7 @@ struct mtk_rx_ring {
* @clk_gp1: The gmac1 clock
* @clk_gp2: The gmac2 clock
* @mii_bus: If there is a bus we need to create an instance for it
+ * @pending_work: The workqueue used to reset the dma ring
*/

struct mtk_eth {
@@ -389,6 +390,7 @@ struct mtk_eth {
struct clk *clk_gp1;
struct clk *clk_gp2;
struct mii_bus *mii_bus;
+ struct work_struct pending_work;
};

/* struct mtk_mac - the structure that holds the info about the MACs of the
@@ -398,7 +400,6 @@ struct mtk_eth {
* @hw: Backpointer to our main datastruture
* @hw_stats: Packet statistics counter
* @phy_dev: The attached PHY if available
- * @pending_work: The workqueue used to reset the dma ring
*/
struct mtk_mac {
int id;
@@ -406,7 +407,6 @@ struct mtk_mac {
struct mtk_eth *hw;
struct mtk_hw_stats *hw_stats;
struct phy_device *phy_dev;
- struct work_struct pending_work;
};

/* the struct describing the SoC. these are declared in the soc_xyz.c files */
--
1.7.10.4

2016-04-08 07:50:53

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 3/8] net: mediatek: remove superfluous reset call

HW reset is triggered in the mtk_hw_init() function. There is no need to
also reset the core during probe.

Signed-off-by: John Crispin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 94cceb8..a4982e4 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1679,10 +1679,6 @@ static int mtk_probe(struct platform_device *pdev)
struct mtk_eth *eth;
int err;

- err = device_reset(&pdev->dev);
- if (err)
- return err;
-
match = of_match_device(of_mtk_match, &pdev->dev);
soc = (struct mtk_soc_data *)match->data;

--
1.7.10.4

2016-04-08 07:51:37

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 8/8] net: mediatek: do not set the QID field in the TX DMA descriptors

The QID field gets set to the mac id. This made the DMA linked list queue
the traffic of each MAC on a different internal queue. However during long
term testing we found that this will cause traffic stalls as the multi
queue setup requires a more complete initialisation which is not part of
the upstream driver yet.

This patch removes the code setting the QID field, resulting in all
traffic ending up in queue 0 which works without any special setup.

Signed-off-by: John Crispin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index eb0d554..c984462 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -603,8 +603,7 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
WRITE_ONCE(txd->txd1, mapped_addr);
WRITE_ONCE(txd->txd3, (TX_DMA_SWC |
TX_DMA_PLEN0(frag_map_size) |
- last_frag * TX_DMA_LS0) |
- mac->id);
+ last_frag * TX_DMA_LS0));
WRITE_ONCE(txd->txd4, 0);

tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
--
1.7.10.4

2016-04-08 07:51:36

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 6/8] net: mediatek: fix mtk_pending_work

The driver supports 2 MACs. Both run on the same DMA ring. If we hit a TX
timeout we need to stop both netdevs before restarting them again. If we
don't do this, mtk_stop() wont shutdown DMA and the consecutive call to
mtk_open() wont restart DMA and enable IRQs.

Signed-off-by: John Crispin <[email protected]>

---
Changes in V3
* make the patch bisectable

drivers/net/ethernet/mediatek/mtk_eth_soc.c | 28 +++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 7b76075..cd5d0c9 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1432,17 +1432,29 @@ static void mtk_pending_work(struct work_struct *work)
{
struct mtk_mac *mac = container_of(work, struct mtk_mac, pending_work);
struct mtk_eth *eth = mac->hw;
- struct net_device *dev = eth->netdev[mac->id];
- int err;
+ int err, i;
+ unsigned long restart = 0;

rtnl_lock();
- mtk_stop(dev);

- err = mtk_open(dev);
- if (err) {
- netif_alert(eth, ifup, dev,
- "Driver up/down cycle failed, closing device.\n");
- dev_close(dev);
+ /* stop all devices to make sure that dma is properly shut down */
+ for (i = 0; i < MTK_MAC_COUNT; i++) {
+ if (!netif_oper_up(eth->netdev[i]))
+ continue;
+ mtk_stop(eth->netdev[i]);
+ __set_bit(i, &restart);
+ }
+
+ /* restart DMA and enable IRQs */
+ for (i = 0; i < MTK_MAC_COUNT; i++) {
+ if (!test_bit(i, &restart))
+ continue;
+ err = mtk_open(eth->netdev[i]);
+ if (err) {
+ netif_alert(eth, ifup, eth->netdev[i],
+ "Driver up/down cycle failed, closing device.\n");
+ dev_close(eth->netdev[i]);
+ }
}
rtnl_unlock();
}
--
1.7.10.4

2016-04-08 07:52:09

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 1/8] net: mediatek: watchdog_timeo was not set

The original commit failed to set watchdog_timeo. This patch sets
watchdog_timeo to HZ.

Signed-off-by: John Crispin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index e0b68af..bb10d57 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1645,6 +1645,7 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
mac->hw_stats->reg_offset = id * MTK_STAT_OFFSET;

SET_NETDEV_DEV(eth->netdev[id], eth->dev);
+ eth->netdev[id]->watchdog_timeo = HZ;
eth->netdev[id]->netdev_ops = &mtk_netdev_ops;
eth->netdev[id]->base_addr = (unsigned long)eth->base;
eth->netdev[id]->vlan_features = MTK_HW_FEATURES &
--
1.7.10.4

2016-04-08 07:50:50

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 2/8] net: mediatek: mtk_cal_txd_req() returns bad value

The code used to also support the PDMA engine, which had 2 packet pointers
per descriptor. Because of this we had to divide the result by 2 and round
it up. This is no longer needed as the code only supports QDMA.

Signed-off-by: John Crispin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index bb10d57..94cceb8 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -681,7 +681,7 @@ static inline int mtk_cal_txd_req(struct sk_buff *skb)
nfrags += skb_shinfo(skb)->nr_frags;
}

- return DIV_ROUND_UP(nfrags, 2);
+ return nfrags;
}

static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
--
1.7.10.4

2016-04-08 07:53:20

by John Crispin

[permalink] [raw]
Subject: [PATCH V3 4/8] net: mediatek: fix stop and wakeup of queue

The driver supports 2 MACs. Both run on the same DMA ring. If we go
above/below the TX rings threshold value, we always need to wake/stop
the queue of both devices. Not doing to can cause TX stalls and packet
drops on one of the devices.

Signed-off-by: John Crispin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 37 +++++++++++++++++++--------
1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index a4982e4..4ebc42e 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -684,6 +684,28 @@ static inline int mtk_cal_txd_req(struct sk_buff *skb)
return nfrags;
}

+static void mtk_wake_queue(struct mtk_eth *eth)
+{
+ int i;
+
+ for (i = 0; i < MTK_MAC_COUNT; i++) {
+ if (!eth->netdev[i])
+ continue;
+ netif_wake_queue(eth->netdev[i]);
+ }
+}
+
+static void mtk_stop_queue(struct mtk_eth *eth)
+{
+ int i;
+
+ for (i = 0; i < MTK_MAC_COUNT; i++) {
+ if (!eth->netdev[i])
+ continue;
+ netif_stop_queue(eth->netdev[i]);
+ }
+}
+
static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct mtk_mac *mac = netdev_priv(dev);
@@ -695,7 +717,7 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)

tx_num = mtk_cal_txd_req(skb);
if (unlikely(atomic_read(&ring->free_count) <= tx_num)) {
- netif_stop_queue(dev);
+ mtk_stop_queue(eth);
netif_err(eth, tx_queued, dev,
"Tx Ring full when queue awake!\n");
return NETDEV_TX_BUSY;
@@ -720,10 +742,10 @@ static int mtk_start_xmit(struct sk_buff *skb, struct net_device *dev)
goto drop;

if (unlikely(atomic_read(&ring->free_count) <= ring->thresh)) {
- netif_stop_queue(dev);
+ mtk_stop_queue(eth);
if (unlikely(atomic_read(&ring->free_count) >
ring->thresh))
- netif_wake_queue(dev);
+ mtk_wake_queue(eth);
}

return NETDEV_TX_OK;
@@ -897,13 +919,8 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget, bool *tx_again)
if (!total)
return 0;

- for (i = 0; i < MTK_MAC_COUNT; i++) {
- if (!eth->netdev[i] ||
- unlikely(!netif_queue_stopped(eth->netdev[i])))
- continue;
- if (atomic_read(&ring->free_count) > ring->thresh)
- netif_wake_queue(eth->netdev[i]);
- }
+ if (atomic_read(&ring->free_count) > ring->thresh)
+ mtk_wake_queue(eth);

return total;
}
--
1.7.10.4

2016-04-13 02:42:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH V3 0/8] net: mediatek: make the driver pass stress tests

From: John Crispin <[email protected]>
Date: Fri, 8 Apr 2016 00:54:03 +0200

> While testing the driver we managed to get the TX path to stall and fail
> to recover. When dual MAC support was added to the driver, the whole queue
> stop/wake code was not properly adapted. There was also a regression in the
> locking of the xmit function. The fact that watchdog_timeo was not set and
> that the tx_timeout code failed to properly reset the dma, irq and queue
> just made the mess complete.
>
> This series make the driver pass stress testing. With this series applied
> the testbed has been running for several days and still has not locked up.
> We have a second setup that has a small hack patch applied to randomly stop
> irqs and/or one of the queues and successfully manages to recover from these
> simulated tx stalls.

Series applied, thanks.