arc_emac_rx() has some issues found by code review.
In case netdev_alloc_skb_ip_align() or dma_map_single() failure
rx fifo entry will not be returned to EMAC.
In case dma_map_single() failure previously allocated skb became
lost to driver. At the same time address of newly allocated skb
will not be provided to EMAC.
Signed-off-by: Alexander Kochetkov <[email protected]>
---
drivers/net/ethernet/arc/emac_main.c | 53 ++++++++++++++++++++--------------
1 file changed, 31 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index b2e0051..0ea57fe 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -212,39 +212,48 @@ static int arc_emac_rx(struct net_device *ndev, int budget)
continue;
}
- pktlen = info & LEN_MASK;
- stats->rx_packets++;
- stats->rx_bytes += pktlen;
- skb = rx_buff->skb;
- skb_put(skb, pktlen);
- skb->dev = ndev;
- skb->protocol = eth_type_trans(skb, ndev);
-
- dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
- dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
-
- /* Prepare the BD for next cycle */
- rx_buff->skb = netdev_alloc_skb_ip_align(ndev,
- EMAC_BUFFER_SIZE);
- if (unlikely(!rx_buff->skb)) {
+ /* Prepare the BD for next cycle. netif_receive_skb()
+ * only if new skb was allocated and mapped to avoid holes
+ * in the RX fifo.
+ */
+ skb = netdev_alloc_skb_ip_align(ndev, EMAC_BUFFER_SIZE);
+ if (unlikely(!skb)) {
+ if (net_ratelimit())
+ netdev_err(ndev, "cannot allocate skb\n");
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
stats->rx_errors++;
- /* Because receive_skb is below, increment rx_dropped */
stats->rx_dropped++;
continue;
}
- /* receive_skb only if new skb was allocated to avoid holes */
- netif_receive_skb(skb);
-
- addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
+ addr = dma_map_single(&ndev->dev, (void *)skb->data,
EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(&ndev->dev, addr)) {
if (net_ratelimit())
- netdev_err(ndev, "cannot dma map\n");
- dev_kfree_skb(rx_buff->skb);
+ netdev_err(ndev, "cannot map dma buffer\n");
+ dev_kfree_skb(skb);
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
stats->rx_errors++;
+ stats->rx_dropped++;
continue;
}
+
+ /* unmap previosly mapped skb */
+ dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
+ dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
+
+ pktlen = info & LEN_MASK;
+ stats->rx_packets++;
+ stats->rx_bytes += pktlen;
+ skb_put(rx_buff->skb, pktlen);
+ rx_buff->skb->dev = ndev;
+ rx_buff->skb->protocol = eth_type_trans(rx_buff->skb, ndev);
+
+ netif_receive_skb(rx_buff->skb);
+
+ rx_buff->skb = skb;
dma_unmap_addr_set(rx_buff, addr, addr);
dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);
--
1.7.9.5
Under certain conditions EMAC stop reception of incoming packets and
continuously increment R_MISS register instead of saving data into
provided buffer. The commit implement workaround for such situation.
Then the stall detected EMAC will be restarted.
On device the stall looks like the device lost it's dynamic IP address.
ifconfig shows that interface error counter rapidly increments.
At the same time on the DHCP server we can see continues DHCP-requests
from device.
In real network stalls happen really rarely. To make them frequent the
broadcast storm[1] should be simulated. For simulation it is necessary
to make following connections:
1. connect radxarock to 1st port of switch
2. connect some PC to 2nd port of switch
3. connect two other free ports together using standard ethernet cable,
in order to make a switching loop.
After that, is necessary to make a broadcast storm. For example, running on
PC 'ping' to some IP address triggers ARP-request storm. After some
time (~10sec), EMAC on rk3188 will stall.
Observed and tested on rk3188 radxarock.
[1] https://en.wikipedia.org/wiki/Broadcast_radiation
Signed-off-by: Alexander Kochetkov <[email protected]>
---
Changes in v2:
- Rebased against stable linux-4.14.y branch
drivers/net/ethernet/arc/emac.h | 2 +
drivers/net/ethernet/arc/emac_main.c | 111 ++++++++++++++++++++++++++++++++++
2 files changed, 113 insertions(+)
diff --git a/drivers/net/ethernet/arc/emac.h b/drivers/net/ethernet/arc/emac.h
index 3c63b16..d9efbc8 100644
--- a/drivers/net/ethernet/arc/emac.h
+++ b/drivers/net/ethernet/arc/emac.h
@@ -159,6 +159,8 @@ struct arc_emac_priv {
unsigned int link;
unsigned int duplex;
unsigned int speed;
+
+ unsigned int rx_missed_errors;
};
/**
diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index 3241af1..363d909 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -26,6 +26,8 @@
#include "emac.h"
+static void arc_emac_restart(struct net_device *ndev);
+
/**
* arc_emac_tx_avail - Return the number of available slots in the tx ring.
* @priv: Pointer to ARC EMAC private data structure.
@@ -259,6 +261,53 @@ static int arc_emac_rx(struct net_device *ndev, int budget)
}
/**
+ * arc_emac_rx_miss_handle - handle R_MISS register
+ * @ndev: Pointer to the net_device structure.
+ */
+static void arc_emac_rx_miss_handle(struct net_device *ndev)
+{
+ struct arc_emac_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
+ unsigned int miss;
+
+ miss = arc_reg_get(priv, R_MISS);
+ if (miss) {
+ stats->rx_errors += miss;
+ stats->rx_missed_errors += miss;
+ priv->rx_missed_errors += miss;
+ }
+}
+
+/**
+ * arc_emac_rx_stall_check - check RX stall
+ * @ndev: Pointer to the net_device structure.
+ * @budget: How many BDs requested to process on 1 call.
+ * @work_done: How many BDs processed
+ *
+ * Under certain conditions EMAC stop reception of incoming packets and
+ * continuously increment R_MISS register instead of saving data into
+ * provided buffer. This function detect that condition and restart
+ * EMAC.
+ */
+static void arc_emac_rx_stall_check(struct net_device *ndev,
+ int budget, unsigned int work_done)
+{
+ struct arc_emac_priv *priv = netdev_priv(ndev);
+ struct arc_emac_bd *rxbd;
+
+ if (work_done)
+ priv->rx_missed_errors = 0;
+
+ if (priv->rx_missed_errors && budget) {
+ rxbd = &priv->rxbd[priv->last_rx_bd];
+ if (le32_to_cpu(rxbd->info) & FOR_EMAC) {
+ arc_emac_restart(ndev);
+ priv->rx_missed_errors = 0;
+ }
+ }
+}
+
+/**
* arc_emac_poll - NAPI poll handler.
* @napi: Pointer to napi_struct structure.
* @budget: How many BDs to process on 1 call.
@@ -272,6 +321,7 @@ static int arc_emac_poll(struct napi_struct *napi, int budget)
unsigned int work_done;
arc_emac_tx_clean(ndev);
+ arc_emac_rx_miss_handle(ndev);
work_done = arc_emac_rx(ndev, budget);
if (work_done < budget) {
@@ -279,6 +329,8 @@ static int arc_emac_poll(struct napi_struct *napi, int budget)
arc_reg_or(priv, R_ENABLE, RXINT_MASK | TXINT_MASK);
}
+ arc_emac_rx_stall_check(ndev, budget, work_done);
+
return work_done;
}
@@ -320,6 +372,8 @@ static irqreturn_t arc_emac_intr(int irq, void *dev_instance)
if (status & MSER_MASK) {
stats->rx_missed_errors += 0x100;
stats->rx_errors += 0x100;
+ priv->rx_missed_errors += 0x100;
+ napi_schedule(&priv->napi);
}
if (status & RXCR_MASK) {
@@ -732,6 +786,63 @@ static int arc_emac_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
}
+/**
+ * arc_emac_restart - Restart EMAC
+ * @ndev: Pointer to net_device structure.
+ *
+ * This function do hardware reset of EMAC in order to restore
+ * network packets reception.
+ */
+static void arc_emac_restart(struct net_device *ndev)
+{
+ struct arc_emac_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
+ int i;
+
+ if (net_ratelimit())
+ netdev_warn(ndev, "restarting stalled EMAC\n");
+
+ netif_stop_queue(ndev);
+
+ /* Disable interrupts */
+ arc_reg_clr(priv, R_ENABLE, RXINT_MASK | TXINT_MASK | ERR_MASK);
+
+ /* Disable EMAC */
+ arc_reg_clr(priv, R_CTRL, EN_MASK);
+
+ /* Return the sk_buff to system */
+ arc_free_tx_queue(ndev);
+
+ /* Clean Tx BD's */
+ priv->txbd_curr = 0;
+ priv->txbd_dirty = 0;
+ memset(priv->txbd, 0, TX_RING_SZ);
+
+ for (i = 0; i < RX_BD_NUM; i++) {
+ struct arc_emac_bd *rxbd = &priv->rxbd[i];
+ unsigned int info = le32_to_cpu(rxbd->info);
+
+ if (!(info & FOR_EMAC)) {
+ stats->rx_errors++;
+ stats->rx_dropped++;
+ }
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
+ }
+ priv->last_rx_bd = 0;
+
+ /* Make sure info is visible to EMAC before enable */
+ wmb();
+
+ /* Enable interrupts */
+ arc_reg_set(priv, R_ENABLE, RXINT_MASK | TXINT_MASK | ERR_MASK);
+
+ /* Enable EMAC */
+ arc_reg_or(priv, R_CTRL, EN_MASK);
+
+ netif_start_queue(ndev);
+}
+
static const struct net_device_ops arc_emac_netdev_ops = {
.ndo_open = arc_emac_open,
.ndo_stop = arc_emac_stop,
--
1.7.9.5
arc_emac_rx() has some issues found by code review.
In case netdev_alloc_skb_ip_align() or dma_map_single() failure
rx fifo entry will not be returned to EMAC.
In case dma_map_single() failure previously allocated skb became
lost to driver. At the same time address of newly allocated skb
will not be provided to EMAC.
Signed-off-by: Alexander Kochetkov <[email protected]>
---
Changes in v2:
- Rebased against stable linux-4.14.y branch
drivers/net/ethernet/arc/emac_main.c | 53 ++++++++++++++++++++--------------
1 file changed, 31 insertions(+), 22 deletions(-)
diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index 363d909..bd277b0 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -212,39 +212,48 @@ static int arc_emac_rx(struct net_device *ndev, int budget)
continue;
}
- pktlen = info & LEN_MASK;
- stats->rx_packets++;
- stats->rx_bytes += pktlen;
- skb = rx_buff->skb;
- skb_put(skb, pktlen);
- skb->dev = ndev;
- skb->protocol = eth_type_trans(skb, ndev);
-
- dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
- dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
-
- /* Prepare the BD for next cycle */
- rx_buff->skb = netdev_alloc_skb_ip_align(ndev,
- EMAC_BUFFER_SIZE);
- if (unlikely(!rx_buff->skb)) {
+ /* Prepare the BD for next cycle. netif_receive_skb()
+ * only if new skb was allocated and mapped to avoid holes
+ * in the RX fifo.
+ */
+ skb = netdev_alloc_skb_ip_align(ndev, EMAC_BUFFER_SIZE);
+ if (unlikely(!skb)) {
+ if (net_ratelimit())
+ netdev_err(ndev, "cannot allocate skb\n");
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
stats->rx_errors++;
- /* Because receive_skb is below, increment rx_dropped */
stats->rx_dropped++;
continue;
}
- /* receive_skb only if new skb was allocated to avoid holes */
- netif_receive_skb(skb);
-
- addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
+ addr = dma_map_single(&ndev->dev, (void *)skb->data,
EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(&ndev->dev, addr)) {
if (net_ratelimit())
- netdev_err(ndev, "cannot dma map\n");
- dev_kfree_skb(rx_buff->skb);
+ netdev_err(ndev, "cannot map dma buffer\n");
+ dev_kfree_skb(skb);
+ /* Return ownership to EMAC */
+ rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
stats->rx_errors++;
+ stats->rx_dropped++;
continue;
}
+
+ /* unmap previosly mapped skb */
+ dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
+ dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
+
+ pktlen = info & LEN_MASK;
+ stats->rx_packets++;
+ stats->rx_bytes += pktlen;
+ skb_put(rx_buff->skb, pktlen);
+ rx_buff->skb->dev = ndev;
+ rx_buff->skb->protocol = eth_type_trans(rx_buff->skb, ndev);
+
+ netif_receive_skb(rx_buff->skb);
+
+ rx_buff->skb = skb;
dma_unmap_addr_set(rx_buff, addr, addr);
dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);
--
1.7.9.5
From: Alexander Kochetkov <[email protected]>
Date: Fri, 15 Dec 2017 20:20:06 +0300
> arc_emac_rx() has some issues found by code review.
>
> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
> rx fifo entry will not be returned to EMAC.
>
> In case dma_map_single() failure previously allocated skb became
> lost to driver. At the same time address of newly allocated skb
> will not be provided to EMAC.
>
> Signed-off-by: Alexander Kochetkov <[email protected]>
This patch adds quite a few bugs, here is one which shows this is not
functionally tested:
> - /* receive_skb only if new skb was allocated to avoid holes */
> - netif_receive_skb(skb);
> -
> - addr = dma_map_single(&ndev->dev, (void *)rx_buff->skb->data,
> + addr = dma_map_single(&ndev->dev, (void *)skb->data,
> EMAC_BUFFER_SIZE, DMA_FROM_DEVICE);
Map the new SKB.
> if (dma_mapping_error(&ndev->dev, addr)) {
> if (net_ratelimit())
> - netdev_err(ndev, "cannot dma map\n");
> - dev_kfree_skb(rx_buff->skb);
> + netdev_err(ndev, "cannot map dma buffer\n");
> + dev_kfree_skb(skb);
> + /* Return ownership to EMAC */
> + rxbd->info = cpu_to_le32(FOR_EMAC | EMAC_BUFFER_SIZE);
> stats->rx_errors++;
> + stats->rx_dropped++;
> continue;
> }
> +
> + /* unmap previosly mapped skb */
> + dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
> + dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
And then you unmap it. "addr" is the new DMA mapping, not the old one.
> 19 дек. 2017 г., в 18:22, David Miller <[email protected]> написал(а):
>
> From: Alexander Kochetkov <[email protected]>
> Date: Fri, 15 Dec 2017 20:20:06 +0300
>
>> arc_emac_rx() has some issues found by code review.
>>
>> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
>> rx fifo entry will not be returned to EMAC.
>>
>> In case dma_map_single() failure previously allocated skb became
>> lost to driver. At the same time address of newly allocated skb
>> will not be provided to EMAC.
>>
>> Signed-off-by: Alexander Kochetkov <[email protected]>
>
> This patch adds quite a few bugs, here is one which shows this is not
> functionally tested:
May be I don’t understand correctly, but I don’t see bug here.
Wrong dma mapping usage should immediately manifest itself (kernel
instability, koops and so on). The patch was tested on rk3188 and work fine for me.
Also I did simulations of netdev_alloc_skb_ip_align() and dma_map_single()
faults and can confirm that new error handling work.
Could someone else with ARC EMAC test the patch? I would be very grateful for the help.
Florian or Eric, can you test it on your hardware?
>> +
>> + /* unmap previosly mapped skb */
>> + dma_unmap_single(&ndev->dev, dma_unmap_addr(rx_buff, addr),
>> + dma_unmap_len(rx_buff, len), DMA_FROM_DEVICE);
>
> And then you unmap it. "addr" is the new DMA mapping, not the old one.
The old mapping should be unmapped here. It refer to old skb what contains already
received packet. Because buffer doesn’t belong to EMAC anymore.
That old mapping point to old skb buffer. And netif_receive_skb() will be
called for old skb after that.
The new mapping «addr" will be unmapped then EMAC receive
new packet. Later by the code (after netif_receive_skb()) there are lines for
saving «addr» value:
rx_buff->skb = skb;
dma_unmap_addr_set(rx_buff, addr, addr);
dma_unmap_len_set(rx_buff, len, EMAC_BUFFER_SIZE);
Regards,
Alexander.
From: Alexander Kochetkov <[email protected]>
Date: Tue, 19 Dec 2017 18:49:48 +0300
>> And then you unmap it. "addr" is the new DMA mapping, not the old one.
>
> The old mapping should be unmapped here. It refer to old skb what contains already
> received packet. Because buffer doesn?t belong to EMAC anymore.
> That old mapping point to old skb buffer. And netif_receive_skb() will be
> called for old skb after that.
I misread the code, sorry. I saw the dma_unmap_addr(xxx, addr) and just
considered 'addr' as the local variable. It's not, it's in fact the
TX struct member.
Let me look at this some more. ;-)
From: Alexander Kochetkov <[email protected]>
Date: Tue, 19 Dec 2017 14:03:57 +0300
> Under certain conditions EMAC stop reception of incoming packets and
> continuously increment R_MISS register instead of saving data into
> provided buffer. The commit implement workaround for such situation.
> Then the stall detected EMAC will be restarted.
>
> On device the stall looks like the device lost it's dynamic IP address.
> ifconfig shows that interface error counter rapidly increments.
> At the same time on the DHCP server we can see continues DHCP-requests
> from device.
>
> In real network stalls happen really rarely. To make them frequent the
> broadcast storm[1] should be simulated. For simulation it is necessary
> to make following connections:
> 1. connect radxarock to 1st port of switch
> 2. connect some PC to 2nd port of switch
> 3. connect two other free ports together using standard ethernet cable,
> in order to make a switching loop.
>
> After that, is necessary to make a broadcast storm. For example, running on
> PC 'ping' to some IP address triggers ARP-request storm. After some
> time (~10sec), EMAC on rk3188 will stall.
>
> Observed and tested on rk3188 radxarock.
>
> [1] https://en.wikipedia.org/wiki/Broadcast_radiation
>
> Signed-off-by: Alexander Kochetkov <[email protected]>
Applied.
From: Alexander Kochetkov <[email protected]>
Date: Tue, 19 Dec 2017 14:07:04 +0300
> arc_emac_rx() has some issues found by code review.
>
> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
> rx fifo entry will not be returned to EMAC.
>
> In case dma_map_single() failure previously allocated skb became
> lost to driver. At the same time address of newly allocated skb
> will not be provided to EMAC.
>
> Signed-off-by: Alexander Kochetkov <[email protected]>
Applied.
On 12/19/2017 07:49 AM, Alexander Kochetkov wrote:
>
>> 19 дек. 2017 г., в 18:22, David Miller <[email protected]> написал(а):
>>
>> From: Alexander Kochetkov <[email protected]>
>> Date: Fri, 15 Dec 2017 20:20:06 +0300
>>
>>> arc_emac_rx() has some issues found by code review.
>>>
>>> In case netdev_alloc_skb_ip_align() or dma_map_single() failure
>>> rx fifo entry will not be returned to EMAC.
>>>
>>> In case dma_map_single() failure previously allocated skb became
>>> lost to driver. At the same time address of newly allocated skb
>>> will not be provided to EMAC.
>>>
>>> Signed-off-by: Alexander Kochetkov <[email protected]>
>>
>> This patch adds quite a few bugs, here is one which shows this is not
>> functionally tested:
>
> May be I don’t understand correctly, but I don’t see bug here.
>
> Wrong dma mapping usage should immediately manifest itself (kernel
> instability, koops and so on). The patch was tested on rk3188 and work fine for me.
> Also I did simulations of netdev_alloc_skb_ip_align() and dma_map_single()
> faults and can confirm that new error handling work.
>
> Could someone else with ARC EMAC test the patch? I would be very grateful for the help.
> Florian or Eric, can you test it on your hardware?
I don't actually have access to this hardware, the only change I did to
this driver was to use a standard Device Tree property :)
--
Florian