patch1 fixes the rx_offset_correction setting and usage. The
rx_offset_correction is the RX packet offset correction for the
platform; it's not related to SW BM, but only to the platform's
NET_SKB_PAD.
patch2 fixes the use of the wrong function to unmap the rx buffer.
patch3 removes our own NETIF_F_GRO check, because the net subsystem
handles it for us.
patch4 enables NETIF_F_RXCSUM by default, since the driver and HW
support the feature.
patch5 is a trivial optimization that reduces the smp_processor_id()
calls in mvneta_tx_done_gbe().
Jisheng Zhang (5):
net: mvneta: fix rx_offset_correction set and usage
net: mvneta: fix the wrong function to unmap rx buf
net: mvneta: Don't check NETIF_F_GRO ourself
net: mvneta: enable NETIF_F_RXCSUM by default
net: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe
drivers/net/ethernet/marvell/mvneta.c | 49 ++++++++++++---------------
1 file changed, 22 insertions(+), 27 deletions(-)
--
2.18.0
The rx_offset_correction is the RX packet offset correction for the
platform; it's not related to SW BM, but only to the platform's
NET_SKB_PAD.
Fix the issue by reverting to the original behavior.
Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index bc80a678abc3..0ce94f6587a5 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2899,21 +2899,18 @@ static void mvneta_rxq_hw_init(struct mvneta_port *pp,
mvreg_write(pp, MVNETA_RXQ_BASE_ADDR_REG(rxq->id), rxq->descs_phys);
mvreg_write(pp, MVNETA_RXQ_SIZE_REG(rxq->id), rxq->size);
+ /* Set Offset */
+ mvneta_rxq_offset_set(pp, rxq, NET_SKB_PAD - pp->rx_offset_correction);
+
/* Set coalescing pkts and time */
mvneta_rx_pkts_coal_set(pp, rxq, rxq->pkts_coal);
mvneta_rx_time_coal_set(pp, rxq, rxq->time_coal);
if (!pp->bm_priv) {
- /* Set Offset */
- mvneta_rxq_offset_set(pp, rxq, 0);
mvneta_rxq_buf_size_set(pp, rxq, pp->frag_size);
mvneta_rxq_bm_disable(pp, rxq);
mvneta_rxq_fill(pp, rxq, rxq->size);
} else {
- /* Set Offset */
- mvneta_rxq_offset_set(pp, rxq,
- NET_SKB_PAD - pp->rx_offset_correction);
-
mvneta_rxq_bm_enable(pp, rxq);
/* Fill RXQ with buffers from RX pool */
mvneta_rxq_long_pool_set(pp, rxq);
@@ -4547,7 +4544,13 @@ static int mvneta_probe(struct platform_device *pdev)
SET_NETDEV_DEV(dev, &pdev->dev);
pp->id = global_port_id++;
- pp->rx_offset_correction = 0; /* not relevant for SW BM */
+
+ /* Set RX packet offset correction for platforms, whose
+ * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
+ * platforms and 0B for 32-bit ones.
+ */
+ pp->rx_offset_correction =
+ max(0, NET_SKB_PAD - MVNETA_RX_PKT_OFFSET_CORRECTION);
/* Obtain access to BM resources if enabled and already initialized */
bm_node = of_parse_phandle(dn, "buffer-manager", 0);
@@ -4562,13 +4565,6 @@ static int mvneta_probe(struct platform_device *pdev)
pp->bm_priv = NULL;
}
}
- /* Set RX packet offset correction for platforms, whose
- * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
- * platforms and 0B for 32-bit ones.
- */
- pp->rx_offset_correction = max(0,
- NET_SKB_PAD -
- MVNETA_RX_PKT_OFFSET_CORRECTION);
}
of_node_put(bm_node);
--
2.18.0
Commit 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")
always allocates one page for each rx descriptor, so the rx buffer is
now mapped with dma_map_page(), but the unmap routine wasn't updated
at the same time.
Fix this by using dma_unmap_page() in corresponding places.
Fixes: 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")
Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 0ce94f6587a5..d9206094fce3 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1890,8 +1890,9 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
if (!data || !(rx_desc->buf_phys_addr))
continue;
- dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
- MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
+ dma_unmap_page(pp->dev->dev.parent, rx_desc->buf_phys_addr,
+ MVNETA_RX_BUF_SIZE(pp->pkt_size),
+ DMA_FROM_DEVICE);
__free_page(data);
}
}
@@ -2008,8 +2009,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
skb_add_rx_frag(rxq->skb, frag_num, page,
frag_offset, frag_size,
PAGE_SIZE);
- dma_unmap_single(dev->dev.parent, phys_addr,
- PAGE_SIZE, DMA_FROM_DEVICE);
+ dma_unmap_page(dev->dev.parent, phys_addr,
+ PAGE_SIZE, DMA_FROM_DEVICE);
rxq->left_size -= frag_size;
}
} else {
@@ -2039,9 +2040,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
frag_offset, frag_size,
PAGE_SIZE);
- dma_unmap_single(dev->dev.parent, phys_addr,
- PAGE_SIZE,
- DMA_FROM_DEVICE);
+ dma_unmap_page(dev->dev.parent, phys_addr,
+ PAGE_SIZE, DMA_FROM_DEVICE);
rxq->left_size -= frag_size;
}
--
2.18.0
napi_gro_receive() checks the NETIF_F_GRO bit as well: if the bit is
not set, we go through GRO_NORMAL in napi_skb_finish() and fall back
to netif_receive_skb_internal(), so we don't need to check NETIF_F_GRO
ourselves.
Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index d9206094fce3..06634d4f9b94 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2065,10 +2065,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
/* Linux processing */
rxq->skb->protocol = eth_type_trans(rxq->skb, dev);
- if (dev->features & NETIF_F_GRO)
- napi_gro_receive(napi, rxq->skb);
- else
- netif_receive_skb(rxq->skb);
+ napi_gro_receive(napi, rxq->skb);
/* clean uncomplete skb pointer in queue */
rxq->skb = NULL;
--
2.18.0
The code and HW support NETIF_F_RXCSUM, so let's enable it by default.
Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 06634d4f9b94..7d98f7828a30 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -4591,7 +4591,8 @@ static int mvneta_probe(struct platform_device *pdev)
}
}
- dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_TSO;
+ dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+ NETIF_F_TSO | NETIF_F_RXCSUM;
dev->hw_features |= dev->features;
dev->vlan_features |= dev->features;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
--
2.18.0
In the loop of mvneta_tx_done_gbe(), smp_processor_id() is called on
every iteration; move the call out of the loop to optimize the code a bit.
Before the patch, the loop looks like this (on arm64):
ldr x1, [x29,#120]
...
ldr w24, [x1,#36]
...
bl 0 <_raw_spin_lock>
str w24, [x27,#132]
...
After the patch, the loop looks like this (on arm64):
...
bl 0 <_raw_spin_lock>
str w23, [x28,#132]
...
where w23 is loaded before the loop, so it is already available.
On the other hand, mvneta_tx_done_gbe() is called from mvneta_poll(),
which runs in non-preemptible context, so it's safe to call
smp_processor_id() just once.
Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/net/ethernet/marvell/mvneta.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 7d98f7828a30..62e81e267e13 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2507,12 +2507,13 @@ static void mvneta_tx_done_gbe(struct mvneta_port *pp, u32 cause_tx_done)
{
struct mvneta_tx_queue *txq;
struct netdev_queue *nq;
+ int cpu = smp_processor_id();
while (cause_tx_done) {
txq = mvneta_tx_done_policy(pp, cause_tx_done);
nq = netdev_get_tx_queue(pp->dev, txq->id);
- __netif_tx_lock(nq, smp_processor_id());
+ __netif_tx_lock(nq, cpu);
if (txq->count)
mvneta_txq_done(pp, txq);
--
2.18.0
On Wed, 29 Aug 2018 16:25:57 +0800
Jisheng Zhang <[email protected]> wrote:
> patch1 fixes rx_offset_correction set and usage. Because the
> rx_offset_correction is RX packet offset correction for platforms,
> it's not related with SW BM, instead, it's only related with the
> platform's NET_SKB_PAD.
>
> patch2 fixes the wrong function to unmap rx buf
I have a question about the following two commits:
7e47fd84b56b ("net: mvneta: Allocate page for the descriptor") causes
a waste of memory: for a normal 1500 MTU, before this patch we allocated
about 1920 bytes per rx buffer; after it, we always allocate PAGE_SIZE
bytes. If PAGE_SIZE=4096, we waste 53% of the memory for each rx buffer.
I'm not sure the performance improvement is worth that cost.
562e2f467e71 ("net: mvneta: Improve the buffer allocation method for
SWBM") mentions that "With system having a small memory (around 256MB),
the state "cannot allocate memory to refill with new buffer" is reach
pretty quickly". Is that due to the memory waste described above? Since
that commit wants to improve the situation on small-memory systems,
should we first revert commit 7e47fd84b56b ("net: mvneta: Allocate page
for the descriptor")?
Any comments are welcome!
Thanks
>
> patch3 removes the NETIF_F_GRO check ourself, because the net subsystem
> will handle it for us.
>
> patch4 enables NETIF_F_RXCSUM by default, since the driver and HW
> supports the feature.
>
> patch5 is a trivial optimization, to reduce smp_processor_id() calling
> in mvneta_tx_done_gbe.
>
> Jisheng Zhang (5):
> net: mvneta: fix rx_offset_correction set and usage
> net: mvneta: fix the wrong function to unmap rx buf
> net: mvneta: Don't check NETIF_F_GRO ourself
> net: mvneta: enable NETIF_F_RXCSUM by default
> net: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe
>
> drivers/net/ethernet/marvell/mvneta.c | 49 ++++++++++++---------------
> 1 file changed, 22 insertions(+), 27 deletions(-)
>
On Wed, 29 Aug 2018 16:40:24 +0800 Jisheng Zhang wrote:
> On Wed, 29 Aug 2018 16:25:57 +0800
> Jisheng Zhang wrote:
>
> > patch1 fixes rx_offset_correction set and usage. Because the
> > rx_offset_correction is RX packet offset correction for platforms,
> > it's not related with SW BM, instead, it's only related with the
> > platform's NET_SKB_PAD.
> >
> > patch2 fixes the wrong function to unmap rx buf
>
> I have question about the following two commits:
>
> 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor"), it cause
> a waste, for normal 1500 MTU, before this patch we allocate 1920Bytes for rx
> after this patch, we always allocate PAGE_SIZE bytes, if PAGE_SIZE=4096, we
> waste 53% memory for each rx buf. I'm not sure whether the performance
> improvement deserve the pay.
>
> 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
> mentions that "With system having a small memory (around 256MB), the state
> "cannot allocate memory to refill with new buffer" is reach pretty quickly"
> is it due to the memory waste as said above? Anyway, by this commit, we
> want to improve the situation on a small memory system, so should we firstly
> revert commit 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")?
>
If the maintainers decide to revert the two commits 7e47fd84b56b and
562e2f467e71, then patches 1, 2 and 3 are useless and can be dropped.
Only patches 4 and 5 remain useful.
Thanks
> Any comments are welcome!
>
> Thanks
>
>
> >
> > patch3 removes the NETIF_F_GRO check ourself, because the net subsystem
> > will handle it for us.
> >
> > patch4 enables NETIF_F_RXCSUM by default, since the driver and HW
> > supports the feature.
> >
> > patch5 is a trivial optimization, to reduce smp_processor_id() calling
> > in mvneta_tx_done_gbe.
> >
> > Jisheng Zhang (5):
> > net: mvneta: fix rx_offset_correction set and usage
> > net: mvneta: fix the wrong function to unmap rx buf
> > net: mvneta: Don't check NETIF_F_GRO ourself
> > net: mvneta: enable NETIF_F_RXCSUM by default
> > net: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe
> >
> > drivers/net/ethernet/marvell/mvneta.c | 49 ++++++++++++---------------
> > 1 file changed, 22 insertions(+), 27 deletions(-)
> >
>
Hi Jisheng,
On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
> The rx_offset_correction is RX packet offset correction for platforms,
> it's not related with SW BM, instead, it's only related with the
> platform's NET_SKB_PAD.
>
But if I understood well, the value of rx_offset_correction has an
influence only when we use HW BM.
However since d93277b9839b ("Revert "arm64: Increase the max granular
size""), NET_SKB_PAD is 64 for arm64, so in the end rx_offset_correction
is always 0 for recent kernels.
Gregory
> Fix the issue by reverting to the original behavior.
>
> Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> drivers/net/ethernet/marvell/mvneta.c | 24 ++++++++++--------------
> 1 file changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index bc80a678abc3..0ce94f6587a5 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2899,21 +2899,18 @@ static void mvneta_rxq_hw_init(struct mvneta_port *pp,
> mvreg_write(pp, MVNETA_RXQ_BASE_ADDR_REG(rxq->id), rxq->descs_phys);
> mvreg_write(pp, MVNETA_RXQ_SIZE_REG(rxq->id), rxq->size);
>
> + /* Set Offset */
> + mvneta_rxq_offset_set(pp, rxq, NET_SKB_PAD - pp->rx_offset_correction);
> +
> /* Set coalescing pkts and time */
> mvneta_rx_pkts_coal_set(pp, rxq, rxq->pkts_coal);
> mvneta_rx_time_coal_set(pp, rxq, rxq->time_coal);
>
> if (!pp->bm_priv) {
> - /* Set Offset */
> - mvneta_rxq_offset_set(pp, rxq, 0);
> mvneta_rxq_buf_size_set(pp, rxq, pp->frag_size);
> mvneta_rxq_bm_disable(pp, rxq);
> mvneta_rxq_fill(pp, rxq, rxq->size);
> } else {
> - /* Set Offset */
> - mvneta_rxq_offset_set(pp, rxq,
> - NET_SKB_PAD - pp->rx_offset_correction);
> -
> mvneta_rxq_bm_enable(pp, rxq);
> /* Fill RXQ with buffers from RX pool */
> mvneta_rxq_long_pool_set(pp, rxq);
> @@ -4547,7 +4544,13 @@ static int mvneta_probe(struct platform_device *pdev)
> SET_NETDEV_DEV(dev, &pdev->dev);
>
> pp->id = global_port_id++;
> - pp->rx_offset_correction = 0; /* not relevant for SW BM */
> +
> + /* Set RX packet offset correction for platforms, whose
> + * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
> + * platforms and 0B for 32-bit ones.
> + */
> + pp->rx_offset_correction =
> + max(0, NET_SKB_PAD - MVNETA_RX_PKT_OFFSET_CORRECTION);
>
> /* Obtain access to BM resources if enabled and already initialized */
> bm_node = of_parse_phandle(dn, "buffer-manager", 0);
> @@ -4562,13 +4565,6 @@ static int mvneta_probe(struct platform_device *pdev)
> pp->bm_priv = NULL;
> }
> }
> - /* Set RX packet offset correction for platforms, whose
> - * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
> - * platforms and 0B for 32-bit ones.
> - */
> - pp->rx_offset_correction = max(0,
> - NET_SKB_PAD -
> - MVNETA_RX_PKT_OFFSET_CORRECTION);
> }
> of_node_put(bm_node);
>
> --
> 2.18.0
>
--
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com
Hi,
On Wed, 29 Aug 2018 11:05:45 +0200 Gregory CLEMENT wrote:
> Hi Jisheng,
>
> On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
>
> > The rx_offset_correction is RX packet offset correction for platforms,
> > it's not related with SW BM, instead, it's only related with the
> > platform's NET_SKB_PAD.
> >
>
> But if I undrestood well, the value of rx_offset_correction has an
> influence only when we use HW BM.
The rx_offset_correction was introduced by commit 8d5047cf9ca2 ("net:
mvneta: Convert to be 64 bits compatible") to support mvneta on 64-bit
platforms such as Armada 3700. It's not related to HW BM.
>
> However since d93277b9839b ("Revert "arm64: Increase the max granular
> size""), NET_SKB_PAD is 64 for arm64, so in the end rx_offset_correction
> is always 0 for recent kernels.
Yes, I mentioned this in the email "[query] about recent mvneta patches".
IMHO, we'd better not rely on the platform's L1_CACHE_BYTES value; we
don't know whether the max granular size will be increased again in the
future.
Thanks
>
> Gregory
>
>
> > Fix the issue by reverting to the original behavior.
> >
> > Fixes: 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
> > Signed-off-by: Jisheng Zhang <[email protected]>
> > ---
> > drivers/net/ethernet/marvell/mvneta.c | 24 ++++++++++--------------
> > 1 file changed, 10 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index bc80a678abc3..0ce94f6587a5 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -2899,21 +2899,18 @@ static void mvneta_rxq_hw_init(struct mvneta_port *pp,
> > mvreg_write(pp, MVNETA_RXQ_BASE_ADDR_REG(rxq->id), rxq->descs_phys);
> > mvreg_write(pp, MVNETA_RXQ_SIZE_REG(rxq->id), rxq->size);
> >
> > + /* Set Offset */
> > + mvneta_rxq_offset_set(pp, rxq, NET_SKB_PAD - pp->rx_offset_correction);
> > +
> > /* Set coalescing pkts and time */
> > mvneta_rx_pkts_coal_set(pp, rxq, rxq->pkts_coal);
> > mvneta_rx_time_coal_set(pp, rxq, rxq->time_coal);
> >
> > if (!pp->bm_priv) {
> > - /* Set Offset */
> > - mvneta_rxq_offset_set(pp, rxq, 0);
> > mvneta_rxq_buf_size_set(pp, rxq, pp->frag_size);
> > mvneta_rxq_bm_disable(pp, rxq);
> > mvneta_rxq_fill(pp, rxq, rxq->size);
> > } else {
> > - /* Set Offset */
> > - mvneta_rxq_offset_set(pp, rxq,
> > - NET_SKB_PAD - pp->rx_offset_correction);
> > -
> > mvneta_rxq_bm_enable(pp, rxq);
> > /* Fill RXQ with buffers from RX pool */
> > mvneta_rxq_long_pool_set(pp, rxq);
> > @@ -4547,7 +4544,13 @@ static int mvneta_probe(struct platform_device *pdev)
> > SET_NETDEV_DEV(dev, &pdev->dev);
> >
> > pp->id = global_port_id++;
> > - pp->rx_offset_correction = 0; /* not relevant for SW BM */
> > +
> > + /* Set RX packet offset correction for platforms, whose
> > + * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
> > + * platforms and 0B for 32-bit ones.
> > + */
> > + pp->rx_offset_correction =
> > + max(0, NET_SKB_PAD - MVNETA_RX_PKT_OFFSET_CORRECTION);
> >
> > /* Obtain access to BM resources if enabled and already initialized */
> > bm_node = of_parse_phandle(dn, "buffer-manager", 0);
> > @@ -4562,13 +4565,6 @@ static int mvneta_probe(struct platform_device *pdev)
> > pp->bm_priv = NULL;
> > }
> > }
> > - /* Set RX packet offset correction for platforms, whose
> > - * NET_SKB_PAD, exceeds 64B. It should be 64B for 64-bit
> > - * platforms and 0B for 32-bit ones.
> > - */
> > - pp->rx_offset_correction = max(0,
> > - NET_SKB_PAD -
> > - MVNETA_RX_PKT_OFFSET_CORRECTION);
> > }
> > of_node_put(bm_node);
> >
> > --
> > 2.18.0
> >
>
Hi Jisheng,
On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
> Commit 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")
> always allocate one page for each rx descriptor, so the rx is mapped
> with dmap_map_page() now, but the unmap routine isn't updated at the
> same time.
>
> Fix this by using dma_unmap_page() in corresponding places.
>
> Fixes: 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> drivers/net/ethernet/marvell/mvneta.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 0ce94f6587a5..d9206094fce3 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -1890,8 +1890,9 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
> if (!data || !(rx_desc->buf_phys_addr))
> continue;
>
> - dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
> - MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
> + dma_unmap_page(pp->dev->dev.parent, rx_desc->buf_phys_addr,
> + MVNETA_RX_BUF_SIZE(pp->pkt_size),
> + DMA_FROM_DEVICE);
> __free_page(data);
> }
> }
This one can also be called when the allocation was done with HWBM, in
which case dma_map_single() was used.
Gregory
> @@ -2008,8 +2009,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
> skb_add_rx_frag(rxq->skb, frag_num, page,
> frag_offset, frag_size,
> PAGE_SIZE);
> - dma_unmap_single(dev->dev.parent, phys_addr,
> - PAGE_SIZE, DMA_FROM_DEVICE);
> + dma_unmap_page(dev->dev.parent, phys_addr,
> + PAGE_SIZE, DMA_FROM_DEVICE);
> rxq->left_size -= frag_size;
> }
> } else {
> @@ -2039,9 +2040,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
> frag_offset, frag_size,
> PAGE_SIZE);
>
> - dma_unmap_single(dev->dev.parent, phys_addr,
> - PAGE_SIZE,
> - DMA_FROM_DEVICE);
> + dma_unmap_page(dev->dev.parent, phys_addr,
> + PAGE_SIZE, DMA_FROM_DEVICE);
>
> rxq->left_size -= frag_size;
> }
> --
> 2.18.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
--
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com
Hi Jisheng,
On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
> napi_gro_receive() checks NETIF_F_GRO bit as well, if the bit is not
> set, we will go through GRO_NORMAL in napi_skb_finish(), so fall back
> to netif_receive_skb_internal(), so we don't need to check NETIF_F_GRO
> ourself.
This one is not a fix, so it should go to net-next.
And for the patch it looks OK:
Reviewed-by: Gregory CLEMENT <[email protected]>
Thanks,
Gregory
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> drivers/net/ethernet/marvell/mvneta.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index d9206094fce3..06634d4f9b94 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2065,10 +2065,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
> /* Linux processing */
> rxq->skb->protocol = eth_type_trans(rxq->skb, dev);
>
> - if (dev->features & NETIF_F_GRO)
> - napi_gro_receive(napi, rxq->skb);
> - else
> - netif_receive_skb(rxq->skb);
> + napi_gro_receive(napi, rxq->skb);
>
> /* clean uncomplete skb pointer in queue */
> rxq->skb = NULL;
> --
> 2.18.0
>
>
--
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com
Hi Jisheng,
On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
> The code and HW supports NETIF_F_RXCSUM, so let's enable it by
> default.
Same as the previous one, it should go to net-next and
Reviewed-by: Gregory CLEMENT <[email protected]>
Thanks,
Gregory
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> drivers/net/ethernet/marvell/mvneta.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 06634d4f9b94..7d98f7828a30 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4591,7 +4591,8 @@ static int mvneta_probe(struct platform_device *pdev)
> }
> }
>
> - dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_TSO;
> + dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> + NETIF_F_TSO | NETIF_F_RXCSUM;
> dev->hw_features |= dev->features;
> dev->vlan_features |= dev->features;
> dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
> --
> 2.18.0
>
>
--
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com
Hi Jisheng,
On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
> In the loop of mvneta_tx_done_gbe(), we call the smp_processor_id()
> each time, move the call out of the loop to optimize the code a bit.
>
> Before the patch, the loop looks like(under arm64):
>
> ldr x1, [x29,#120]
> ...
> ldr w24, [x1,#36]
> ...
> bl 0 <_raw_spin_lock>
> str w24, [x27,#132]
> ...
>
> After the patch, the loop looks like(under arm64):
>
> ...
> bl 0 <_raw_spin_lock>
> str w23, [x28,#132]
> ...
> where w23 is loaded so be ready before the loop.
>
> From another side, mvneta_tx_done_gbe() is called from mvneta_poll()
> which is in non-preemptible context, so it's safe to call the
> smp_processor_id() function once.
This improvement should go to net-next. Besides this patch looks nice:
Reviewed-by: Gregory CLEMENT <[email protected]>
Thanks,
Gregory
>
> Signed-off-by: Jisheng Zhang <[email protected]>
> ---
> drivers/net/ethernet/marvell/mvneta.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 7d98f7828a30..62e81e267e13 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2507,12 +2507,13 @@ static void mvneta_tx_done_gbe(struct mvneta_port *pp, u32 cause_tx_done)
> {
> struct mvneta_tx_queue *txq;
> struct netdev_queue *nq;
> + int cpu = smp_processor_id();
>
> while (cause_tx_done) {
> txq = mvneta_tx_done_policy(pp, cause_tx_done);
>
> nq = netdev_get_tx_queue(pp->dev, txq->id);
> - __netif_tx_lock(nq, smp_processor_id());
> + __netif_tx_lock(nq, cpu);
>
> if (txq->count)
> mvneta_txq_done(pp, txq);
> --
> 2.18.0
>
--
Gregory Clement, Bootlin
Embedded Linux and Kernel engineering
http://bootlin.com
On Wed, Aug 29, 2018 at 04:29:32PM +0800, Jisheng Zhang wrote:
> The code and HW supports NETIF_F_RXCSUM, so let's enable it by default.
Hi Jisheng
I've never studied what all these different flags mean. Does
NETIF_F_RXCSUM mean Ethernet FCS? Or does it also include IPv4, IPv6,
UDP, TCP... checksums?
I've seen network interfaces get checksumming wrong when used with an
Ethernet switch and DSA. The extra header DSA uses means the hardware
cannot parse the packet correctly, and so cannot find these headers.
If this is just for FCS, then it is not a problem.
Thanks
Andrew
Hi Jisheng
Please separate fixes from new features.
Fixes should be based on DaveM net branch, and use the subject line
[PATCH net]...
New features should be based on DaveM net-next branch, and use the
subject line [PATCH net-next]...
Thanks
Andrew
Hi Andrew,
On Wed, 29 Aug 2018 15:08:36 +0200 Andrew Lunn wrote:
> On Wed, Aug 29, 2018 at 04:29:32PM +0800, Jisheng Zhang wrote:
> > The code and HW supports NETIF_F_RXCSUM, so let's enable it by default.
>
> Hi Jisheng
>
> I've never studied what all these different flags mean. Does
> NETIF_F_RXCSUM mean Ethernet FCS? Or does it also include IPv4, IPv6,
> UDP, TCP... checksums?
Per my understanding, it means RX checksumming, and the HW only
supports the IPv4 RX checksum.
>
> I've seen network interfaces get checksum'ing wrong when used with an
> Ethernet switch with DSA. The extra header DSA uses means the hardware
> cannot parse the packet correctly, and so cannot find these headers.
The network interface is mvneta? Do you mean that after this patch we
would see errors like the following?
"bad rx status 0xabcdefgh (crc error)"
So for DSA we should disable RXCSUM? I'm not sure how to handle this
case. I believe other drivers (with RXCSUM enabled by default) also
have this problem with DSA.
Thanks
>
> If this is just for FCS, then it is not a problem.
>
> Thanks
> Andrew
>
> > I've seen network interfaces get checksum'ing wrong when used with an
> > Ethernet switch with DSA. The extra header DSA uses means the hardware
> > cannot parse the packet correctly, and so cannot find these headers.
>
> The network interface is mvneta?
I've not tested mvneta yet. It could be that, since it was designed to
be used with Marvell switches in things like WiFi boxes, it knows how
to find the IP header when there is a DSA tag.
> Do you mean after this patch, we would see errors as the following?
>
> "bad rx status 0xabcdefgh (crc error)"
I've not tried it yet. That is why I'm trying to understand what
NETIF_F_RXCSUM actually means.
Andrew
On Wed, 29 Aug 2018 11:21:12 +0200 Gregory CLEMENT wrote:
> Hi Jisheng,
>
> On mer., août 29 2018, Jisheng Zhang <[email protected]> wrote:
>
> > Commit 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")
> > always allocate one page for each rx descriptor, so the rx is mapped
> > with dmap_map_page() now, but the unmap routine isn't updated at the
> > same time.
> >
> > Fix this by using dma_unmap_page() in corresponding places.
> >
> > Fixes: 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")
> > Signed-off-by: Jisheng Zhang <[email protected]>
> > ---
> > drivers/net/ethernet/marvell/mvneta.c | 14 +++++++-------
> > 1 file changed, 7 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index 0ce94f6587a5..d9206094fce3 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -1890,8 +1890,9 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
> > if (!data || !(rx_desc->buf_phys_addr))
> > continue;
> >
> > - dma_unmap_single(pp->dev->dev.parent, rx_desc->buf_phys_addr,
> > - MVNETA_RX_BUF_SIZE(pp->pkt_size), DMA_FROM_DEVICE);
> > + dma_unmap_page(pp->dev->dev.parent, rx_desc->buf_phys_addr,
> > + MVNETA_RX_BUF_SIZE(pp->pkt_size),
> > + DMA_FROM_DEVICE);
> > __free_page(data);
> > }
> > }
> This one can be called when the allocation is done in with HWBM in this
> case which use a dma_map_single.
Oops, thanks for the catch. Will fix it in v2.
>
> Gregory
>
>
>
> > @@ -2008,8 +2009,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
> > skb_add_rx_frag(rxq->skb, frag_num, page,
> > frag_offset, frag_size,
> > PAGE_SIZE);
> > - dma_unmap_single(dev->dev.parent, phys_addr,
> > - PAGE_SIZE, DMA_FROM_DEVICE);
> > + dma_unmap_page(dev->dev.parent, phys_addr,
> > + PAGE_SIZE, DMA_FROM_DEVICE);
> > rxq->left_size -= frag_size;
> > }
> > } else {
> > @@ -2039,9 +2040,8 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
> > frag_offset, frag_size,
> > PAGE_SIZE);
> >
> > - dma_unmap_single(dev->dev.parent, phys_addr,
> > - PAGE_SIZE,
> > - DMA_FROM_DEVICE);
> > + dma_unmap_page(dev->dev.parent, phys_addr,
> > + PAGE_SIZE, DMA_FROM_DEVICE);
> >
> > rxq->left_size -= frag_size;
> > }
> > --
> > 2.18.0
> >
> >
>
On Wed, 29 Aug 2018 15:12:32 +0200 Andrew Lunn wrote:
> Hi Jisheng
>
> Please separate fixes from new features.
>
> Fixes should be based on DaveM net branch, and use the subject line
> [PATCH net]...
>
> New features should be based on DaveM net-next branch, and use the
> subject line [PATCH net-next]...
Got it. Thanks for the information.
On Wed, 29 Aug 2018 16:51:31 +0800 Jisheng Zhang wrote:
> On Wed, 29 Aug 2018 16:40:24 +0800 Jisheng Zhang wrote:
>
> > On Wed, 29 Aug 2018 16:25:57 +0800
> > Jisheng Zhang wrote:
> >
> > > patch1 fixes rx_offset_correction set and usage. Because the
> > > rx_offset_correction is RX packet offset correction for platforms,
> > > it's not related with SW BM, instead, it's only related with the
> > > platform's NET_SKB_PAD.
> > >
> > > patch2 fixes the wrong function to unmap rx buf
> >
> > I have question about the following two commits:
> >
> > 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor"), it cause
> > a waste, for normal 1500 MTU, before this patch we allocate 1920Bytes for rx
> > after this patch, we always allocate PAGE_SIZE bytes, if PAGE_SIZE=4096, we
> > waste 53% memory for each rx buf. I'm not sure whether the performance
> > improvement deserve the pay.
> >
> > 562e2f467e71 ("net: mvneta: Improve the buffer allocation method for SWBM")
> > mentions that "With system having a small memory (around 256MB), the state
> > "cannot allocate memory to refill with new buffer" is reach pretty quickly"
> > is it due to the memory waste as said above? Anyway, by this commit, we
> > want to improve the situation on a small memory system, so should we firstly
> > revert commit 7e47fd84b56b ("net: mvneta: Allocate page for the descriptor")?
Any comments?
Now I believe the situation is due to the memory waste introduced by
7e47fd84b56b. With Linux 4.18, I limited the available memory on Berlin
platforms to 256MB and didn't see "cannot allocate memory to refill
with new buffer".
Thanks
> >
>
> If maintainers decide to revert the two commits: 7e47fd84b56b and 562e2f467e71
> then, patch1,2,3 are useless, we can drop them. Only patch4 and patch5 are
> still useful.
>
> Thanks
>
> > Any comments are welcome!
> >
> > Thanks
> >
> >
> > >
> > > patch3 removes the NETIF_F_GRO check ourself, because the net subsystem
> > > will handle it for us.
> > >
> > > patch4 enables NETIF_F_RXCSUM by default, since the driver and HW
> > > supports the feature.
> > >
> > > patch5 is a trivial optimization, to reduce smp_processor_id() calling
> > > in mvneta_tx_done_gbe.
> > >
> > > Jisheng Zhang (5):
> > > net: mvneta: fix rx_offset_correction set and usage
> > > net: mvneta: fix the wrong function to unmap rx buf
> > > net: mvneta: Don't check NETIF_F_GRO ourself
> > > net: mvneta: enable NETIF_F_RXCSUM by default
> > > net: mvneta: reduce smp_processor_id() calling in mvneta_tx_done_gbe
> > >
> > > drivers/net/ethernet/marvell/mvneta.c | 49 ++++++++++++---------------
> > > 1 file changed, 22 insertions(+), 27 deletions(-)
> > >
> >
>