2019-12-24 01:02:20

by Matteo Croce

Subject: [RFC net-next 0/2] mvpp2: page_pool support

These patches change the memory allocator of mvpp2 from the frag allocator to
the page_pool API. This change is needed to later add XDP support to mvpp2.

The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
much slower. This is the tc drop rate measured with a single flow:

stock net-next with frag allocator:
rx: 900.7 Mbps 1877 Kpps

this patchset with page_pool:
rx: 423.5 Mbps 882.3 Kpps

This is the perf top when receiving traffic:

27.68% [kernel] [k] __page_pool_clean_page
9.79% [kernel] [k] get_page_from_freelist
7.18% [kernel] [k] free_unref_page
4.64% [kernel] [k] build_skb
4.63% [kernel] [k] __netif_receive_skb_core
3.83% [mvpp2] [k] mvpp2_poll
3.64% [kernel] [k] eth_type_trans
3.61% [kernel] [k] kmem_cache_free
3.03% [kernel] [k] kmem_cache_alloc
2.76% [kernel] [k] dev_gro_receive
2.69% [mvpp2] [k] mvpp2_bm_pool_put
2.68% [kernel] [k] page_frag_free
1.83% [kernel] [k] inet_gro_receive
1.74% [kernel] [k] page_pool_alloc_pages
1.70% [kernel] [k] __build_skb
1.47% [kernel] [k] __alloc_pages_nodemask
1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
1.29% [kernel] [k] tcf_action_exec

I tried Ilias' patches for page_pool recycling and I get an improvement
to ~1100 Kpps, but I'm still far from the original allocator.

Any idea why I get such bad numbers?

Another reason to send it as an RFC is that I'm not fully convinced about how to
use the page_pool given the HW limitations of the BM.

The driver currently uses, for every CPU, a page_pool for short packets and
another for long ones. The driver also has 4 RX queues per port, so every
RXQ #1 will share the short and long page pools of CPU #1.

This means that for every RX queue I call xdp_rxq_info_reg_mem_model() twice,
on two different page_pools. Can this be a problem?

As usual, ideas are welcome.

Matteo Croce (2):
mvpp2: use page_pool allocator
mvpp2: memory accounting

drivers/net/ethernet/marvell/Kconfig | 1 +
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 7 +
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 142 +++++++++++++++---
3 files changed, 125 insertions(+), 25 deletions(-)

--
2.24.1


2019-12-24 01:02:35

by Matteo Croce

Subject: [RFC net-next 2/2] mvpp2: memory accounting

Use the XDP API for memory accounting.

Signed-off-by: Matteo Croce <[email protected]>
---
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 3 ++
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 31 ++++++++++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 67b3bf0d3c8b..ffd633e0a3be 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -1165,6 +1165,9 @@ struct mvpp2_rx_queue {

/* Port's logic RXQ number to which physical RXQ is mapped */
int logic_rxq;
+
+ /* XDP memory accounting */
+ struct xdp_rxq_info xdp_rxq;
};

struct mvpp2_bm_pool {
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 4edb81c8941f..3b0aac66ac52 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -2405,10 +2405,11 @@ static int mvpp2_aggr_txq_init(struct platform_device *pdev,
/* Create a specified Rx queue */
static int mvpp2_rxq_init(struct mvpp2_port *port,
struct mvpp2_rx_queue *rxq)
-
{
+ struct mvpp2 *priv = port->priv;
unsigned int thread;
u32 rxq_dma;
+ int err;

rxq->size = port->rx_ring_size;

@@ -2446,7 +2447,35 @@ static int mvpp2_rxq_init(struct mvpp2_port *port,
/* Add number of descriptors ready for receiving packets */
mvpp2_rxq_status_update(port, rxq->id, 0, rxq->size);

+ err = xdp_rxq_info_reg(&rxq->xdp_rxq, port->dev, rxq->id);
+ if (err < 0)
+ goto err_free_dma;
+
+ if (priv->percpu_pools) {
+ /* Every RXQ has a pool for short and another for long packets */
+ err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq,
+ MEM_TYPE_PAGE_POOL,
+ priv->page_pool[rxq->logic_rxq]);
+ if (err < 0)
+ goto err_unregister_rxq;
+
+ err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq,
+ MEM_TYPE_PAGE_POOL,
+ priv->page_pool[rxq->logic_rxq +
+ port->nrxqs]);
+ if (err < 0)
+ goto err_unregister_rxq;
+ }
+
return 0;
+
+err_unregister_rxq:
+ xdp_rxq_info_unreg(&rxq->xdp_rxq);
+err_free_dma:
+ dma_free_coherent(port->dev->dev.parent,
+ rxq->size * MVPP2_DESC_ALIGNED_SIZE,
+ rxq->descs, rxq->descs_dma);
+ return err;
}

/* Push packets received by the RXQ to BM pool */
--
2.24.1

2019-12-24 01:03:17

by Matteo Croce

Subject: [RFC net-next 1/2] mvpp2: use page_pool allocator

Use the page_pool API for memory management. This is a prerequisite for
native XDP support.

Signed-off-by: Matteo Croce <[email protected]>
---
drivers/net/ethernet/marvell/Kconfig | 1 +
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 4 +
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 111 ++++++++++++++----
3 files changed, 92 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 3d5caea096fb..e2612cc4920d 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -87,6 +87,7 @@ config MVPP2
depends on ARCH_MVEBU || COMPILE_TEST
select MVMDIO
select PHYLINK
+ select PAGE_POOL
---help---
This driver supports the network interface units in the
Marvell ARMADA 375, 7K and 8K SoCs.
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
index 543a310ec102..67b3bf0d3c8b 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2.h
@@ -15,6 +15,7 @@
#include <linux/phy.h>
#include <linux/phylink.h>
#include <net/flow_offload.h>
+#include <net/page_pool.h>

/* Fifo Registers */
#define MVPP2_RX_DATA_FIFO_SIZE_REG(port) (0x00 + 4 * (port))
@@ -820,6 +821,9 @@ struct mvpp2 {

/* RSS Indirection tables */
struct mvpp2_rss_table *rss_tables[MVPP22_N_RSS_TABLES];
+
+ /* page_pool allocator */
+ struct page_pool *page_pool[MVPP2_PORT_MAX_RXQ];
};

struct mvpp2_pcpu_stats {
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
index 14e372cda7f4..4edb81c8941f 100644
--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -92,6 +92,21 @@ static inline u32 mvpp2_cpu_to_thread(struct mvpp2 *priv, int cpu)
return cpu % priv->nthreads;
}

+static struct page_pool *
+mvpp2_create_page_pool(struct device *dev, int num)
+{
+ struct page_pool_params pp_params = {
+ /* internal DMA mapping in page_pool */
+ .flags = PP_FLAG_DMA_MAP,
+ .pool_size = num,
+ .nid = NUMA_NO_NODE,
+ .dev = dev,
+ .dma_dir = DMA_FROM_DEVICE,
+ };
+
+ return page_pool_create(&pp_params);
+}
+
/* These accessors should be used to access:
*
* - per-thread registers, where each thread has its own copy of the
@@ -324,17 +339,26 @@ static inline int mvpp2_txq_phys(int port, int txq)
return (MVPP2_MAX_TCONT + port) * MVPP2_MAX_TXQ + txq;
}

-static void *mvpp2_frag_alloc(const struct mvpp2_bm_pool *pool)
+/* Returns a struct page if page_pool is set, otherwise a buffer */
+static void *mvpp2_frag_alloc(const struct mvpp2_bm_pool *pool,
+ struct page_pool *page_pool)
{
+ if (page_pool)
+ return page_pool_alloc_pages(page_pool,
+ GFP_ATOMIC | __GFP_NOWARN);
+
if (likely(pool->frag_size <= PAGE_SIZE))
return netdev_alloc_frag(pool->frag_size);
- else
- return kmalloc(pool->frag_size, GFP_ATOMIC);
+
+ return kmalloc(pool->frag_size, GFP_ATOMIC);
}

-static void mvpp2_frag_free(const struct mvpp2_bm_pool *pool, void *data)
+static void mvpp2_frag_free(const struct mvpp2_bm_pool *pool,
+ struct page_pool *page_pool, void *data)
{
- if (likely(pool->frag_size <= PAGE_SIZE))
+ if (page_pool)
+ page_pool_put_page(page_pool, virt_to_page(data), false);
+ else if (likely(pool->frag_size <= PAGE_SIZE))
skb_free_frag(data);
else
kfree(data);
@@ -439,6 +463,7 @@ static void mvpp2_bm_bufs_get_addrs(struct device *dev, struct mvpp2 *priv,
static void mvpp2_bm_bufs_free(struct device *dev, struct mvpp2 *priv,
struct mvpp2_bm_pool *bm_pool, int buf_num)
{
+ struct page_pool *pp = NULL;
int i;

if (buf_num > bm_pool->buf_num) {
@@ -447,6 +472,9 @@ static void mvpp2_bm_bufs_free(struct device *dev, struct mvpp2 *priv,
buf_num = bm_pool->buf_num;
}

+ if (priv->percpu_pools)
+ pp = priv->page_pool[bm_pool->id];
+
for (i = 0; i < buf_num; i++) {
dma_addr_t buf_dma_addr;
phys_addr_t buf_phys_addr;
@@ -455,14 +483,15 @@ static void mvpp2_bm_bufs_free(struct device *dev, struct mvpp2 *priv,
mvpp2_bm_bufs_get_addrs(dev, priv, bm_pool,
&buf_dma_addr, &buf_phys_addr);

- dma_unmap_single(dev, buf_dma_addr,
- bm_pool->buf_size, DMA_FROM_DEVICE);
+ if (!pp)
+ dma_unmap_single(dev, buf_dma_addr,
+ bm_pool->buf_size, DMA_FROM_DEVICE);

data = (void *)phys_to_virt(buf_phys_addr);
if (!data)
break;

- mvpp2_frag_free(bm_pool, data);
+ mvpp2_frag_free(bm_pool, pp, data);
}

/* Update BM driver with number of buffers removed from pool */
@@ -493,6 +522,9 @@ static int mvpp2_bm_pool_destroy(struct device *dev, struct mvpp2 *priv,
int buf_num;
u32 val;

+ if (priv->percpu_pools)
+ page_pool_destroy(priv->page_pool[bm_pool->id]);
+
buf_num = mvpp2_check_hw_buf_num(priv, bm_pool);
mvpp2_bm_bufs_free(dev, priv, bm_pool, buf_num);

@@ -545,8 +577,16 @@ static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv)
{
int i, err, poolnum = MVPP2_BM_POOLS_NUM;

- if (priv->percpu_pools)
+ if (priv->percpu_pools) {
poolnum = mvpp2_get_nrxqs(priv) * 2;
+ for (i = 0; i < poolnum; i++) {
+ priv->page_pool[i] =
+ mvpp2_create_page_pool(dev,
+ mvpp2_pools[i / (poolnum / 2)].buf_num);
+ if (IS_ERR(priv->page_pool[i]))
+ return PTR_ERR(priv->page_pool[i]);
+ }
+ }

dev_info(dev, "using %d %s buffers\n", poolnum,
priv->percpu_pools ? "per-cpu" : "shared");
@@ -629,23 +669,35 @@ static void mvpp2_rxq_short_pool_set(struct mvpp2_port *port,

static void *mvpp2_buf_alloc(struct mvpp2_port *port,
struct mvpp2_bm_pool *bm_pool,
+ struct page_pool *page_pool,
dma_addr_t *buf_dma_addr,
phys_addr_t *buf_phys_addr,
gfp_t gfp_mask)
{
dma_addr_t dma_addr;
+ struct page *page;
void *data;

- data = mvpp2_frag_alloc(bm_pool);
+ data = mvpp2_frag_alloc(bm_pool, page_pool);
if (!data)
return NULL;

- dma_addr = dma_map_single(port->dev->dev.parent, data,
- MVPP2_RX_BUF_SIZE(bm_pool->pkt_size),
- DMA_FROM_DEVICE);
- if (unlikely(dma_mapping_error(port->dev->dev.parent, dma_addr))) {
- mvpp2_frag_free(bm_pool, data);
- return NULL;
+ if (page_pool) {
+ page = (struct page *)data;
+ dma_addr = page_pool_get_dma_addr(page);
+ data = page_to_virt(page);
+ dma_sync_single_for_device(port->dev->dev.parent,
+ virt_to_phys(data),
+ bm_pool->buf_size,
+ DMA_FROM_DEVICE);
+ } else {
+ dma_addr = dma_map_single(port->dev->dev.parent, data,
+ MVPP2_RX_BUF_SIZE(bm_pool->pkt_size),
+ DMA_FROM_DEVICE);
+ if (unlikely(dma_mapping_error(port->dev->dev.parent, dma_addr))) {
+ mvpp2_frag_free(bm_pool, NULL, data);
+ return NULL;
+ }
}
*buf_dma_addr = dma_addr;
*buf_phys_addr = virt_to_phys(data);
@@ -703,6 +755,7 @@ static int mvpp2_bm_bufs_add(struct mvpp2_port *port,
int i, buf_size, total_size;
dma_addr_t dma_addr;
phys_addr_t phys_addr;
+ struct page_pool *pp = NULL;
void *buf;

if (port->priv->percpu_pools &&
@@ -723,8 +776,10 @@ static int mvpp2_bm_bufs_add(struct mvpp2_port *port,
return 0;
}

+ if (port->priv->percpu_pools)
+ pp = port->priv->page_pool[bm_pool->id];
for (i = 0; i < buf_num; i++) {
- buf = mvpp2_buf_alloc(port, bm_pool, &dma_addr,
+ buf = mvpp2_buf_alloc(port, bm_pool, pp, &dma_addr,
&phys_addr, GFP_KERNEL);
if (!buf)
break;
@@ -2865,14 +2920,15 @@ static void mvpp2_rx_csum(struct mvpp2_port *port, u32 status,

/* Allocate a new skb and add it to BM pool */
static int mvpp2_rx_refill(struct mvpp2_port *port,
- struct mvpp2_bm_pool *bm_pool, int pool)
+ struct mvpp2_bm_pool *bm_pool,
+ struct page_pool *page_pool, int pool)
{
dma_addr_t dma_addr;
phys_addr_t phys_addr;
void *buf;

- buf = mvpp2_buf_alloc(port, bm_pool, &dma_addr, &phys_addr,
- GFP_ATOMIC);
+ buf = mvpp2_buf_alloc(port, bm_pool, page_pool,
+ &dma_addr, &phys_addr, GFP_ATOMIC);
if (!buf)
return -ENOMEM;

@@ -2931,6 +2987,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
while (rx_done < rx_todo) {
struct mvpp2_rx_desc *rx_desc = mvpp2_rxq_next_desc_get(rxq);
struct mvpp2_bm_pool *bm_pool;
+ struct page_pool *pp = NULL;
struct sk_buff *skb;
unsigned int frag_size;
dma_addr_t dma_addr;
@@ -2975,15 +3032,21 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
goto err_drop_frame;
}

- err = mvpp2_rx_refill(port, bm_pool, pool);
+ if (port->priv->percpu_pools)
+ pp = port->priv->page_pool[pool];
+
+ err = mvpp2_rx_refill(port, bm_pool, pp, pool);
if (err) {
netdev_err(port->dev, "failed to refill BM pools\n");
goto err_drop_frame;
}

- dma_unmap_single_attrs(dev->dev.parent, dma_addr,
- bm_pool->buf_size, DMA_FROM_DEVICE,
- DMA_ATTR_SKIP_CPU_SYNC);
+ if (pp)
+ page_pool_release_page(pp, virt_to_page(data));
+ else
+ dma_unmap_single_attrs(dev->dev.parent, dma_addr,
+ bm_pool->buf_size, DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);

rcvd_pkts++;
rcvd_bytes += rx_bytes;
--
2.24.1

2019-12-24 09:53:22

by Ilias Apalodimas

Subject: Re: [RFC net-next 0/2] mvpp2: page_pool support

On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> These patches change the memory allocator of mvpp2 from the frag allocator to
> the page_pool API. This change is needed to later add XDP support to mvpp2.
>
> The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
> much slower. This is the tc drop rate measured with a single flow:
>
> stock net-next with frag allocator:
> rx: 900.7 Mbps 1877 Kpps
>
> this patchset with page_pool:
> rx: 423.5 Mbps 882.3 Kpps
>
> This is the perf top when receiving traffic:
>
> 27.68% [kernel] [k] __page_pool_clean_page

This seems extremely high on the list.

> 9.79% [kernel] [k] get_page_from_freelist
> 7.18% [kernel] [k] free_unref_page
> 4.64% [kernel] [k] build_skb
> 4.63% [kernel] [k] __netif_receive_skb_core
> 3.83% [mvpp2] [k] mvpp2_poll
> 3.64% [kernel] [k] eth_type_trans
> 3.61% [kernel] [k] kmem_cache_free
> 3.03% [kernel] [k] kmem_cache_alloc
> 2.76% [kernel] [k] dev_gro_receive
> 2.69% [mvpp2] [k] mvpp2_bm_pool_put
> 2.68% [kernel] [k] page_frag_free
> 1.83% [kernel] [k] inet_gro_receive
> 1.74% [kernel] [k] page_pool_alloc_pages
> 1.70% [kernel] [k] __build_skb
> 1.47% [kernel] [k] __alloc_pages_nodemask
> 1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> 1.29% [kernel] [k] tcf_action_exec
>
> I tried Ilias' patches for page_pool recycling and I get an improvement
> to ~1100 Kpps, but I'm still far from the original allocator.

Can you post the recycling perf for comparison?

>
> Any idea why I get such bad numbers?

Nope, but it's indeed strange.

>
> Another reason to send it as an RFC is that I'm not fully convinced about how to
> use the page_pool given the HW limitations of the BM.

I'll have a look right after the holidays.

>
> The driver currently uses, for every CPU, a page_pool for short packets and
> another for long ones. The driver also has 4 RX queues per port, so every
> RXQ #1 will share the short and long page pools of CPU #1.
>

I am not sure I am following the hardware config here.

> This means that for every RX queue I call xdp_rxq_info_reg_mem_model() twice,
> on two different page_pools. Can this be a problem?
>
> As usual, ideas are welcome.
>
> Matteo Croce (2):
> mvpp2: use page_pool allocator
> mvpp2: memory accounting
>
> drivers/net/ethernet/marvell/Kconfig | 1 +
> drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 7 +
> .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 142 +++++++++++++++---
> 3 files changed, 125 insertions(+), 25 deletions(-)
>
> --
> 2.24.1
>
Cheers
/Ilias

2019-12-24 13:35:56

by Matteo Croce

Subject: Re: [RFC net-next 0/2] mvpp2: page_pool support

On Tue, Dec 24, 2019 at 10:52 AM Ilias Apalodimas
<[email protected]> wrote:
>
> On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > These patches change the memory allocator of mvpp2 from the frag allocator to
> > the page_pool API. This change is needed to later add XDP support to mvpp2.
> >
> > The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
> > much slower. This is the tc drop rate measured with a single flow:
> >
> > stock net-next with frag allocator:
> > rx: 900.7 Mbps 1877 Kpps
> >
> > this patchset with page_pool:
> > rx: 423.5 Mbps 882.3 Kpps
> >
> > This is the perf top when receiving traffic:
> >
> > 27.68% [kernel] [k] __page_pool_clean_page
>
> This seems extremely high on the list.
>
> > 9.79% [kernel] [k] get_page_from_freelist
> > 7.18% [kernel] [k] free_unref_page
> > 4.64% [kernel] [k] build_skb
> > 4.63% [kernel] [k] __netif_receive_skb_core
> > 3.83% [mvpp2] [k] mvpp2_poll
> > 3.64% [kernel] [k] eth_type_trans
> > 3.61% [kernel] [k] kmem_cache_free
> > 3.03% [kernel] [k] kmem_cache_alloc
> > 2.76% [kernel] [k] dev_gro_receive
> > 2.69% [mvpp2] [k] mvpp2_bm_pool_put
> > 2.68% [kernel] [k] page_frag_free
> > 1.83% [kernel] [k] inet_gro_receive
> > 1.74% [kernel] [k] page_pool_alloc_pages
> > 1.70% [kernel] [k] __build_skb
> > 1.47% [kernel] [k] __alloc_pages_nodemask
> > 1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> > 1.29% [kernel] [k] tcf_action_exec
> >
> > I tried Ilias' patches for page_pool recycling and I get an improvement
> > to ~1100 Kpps, but I'm still far from the original allocator.
>
> Can you post the recycling perf for comparison?
>

12.00% [kernel] [k] get_page_from_freelist
9.25% [kernel] [k] free_unref_page
6.83% [kernel] [k] eth_type_trans
5.33% [kernel] [k] __netif_receive_skb_core
4.96% [mvpp2] [k] mvpp2_poll
4.64% [kernel] [k] kmem_cache_free
4.06% [kernel] [k] __xdp_return
3.60% [kernel] [k] kmem_cache_alloc
3.31% [kernel] [k] dev_gro_receive
3.29% [kernel] [k] __page_pool_clean_page
3.25% [mvpp2] [k] mvpp2_bm_pool_put
2.73% [kernel] [k] __page_pool_put_page
2.33% [kernel] [k] __alloc_pages_nodemask
2.33% [kernel] [k] inet_gro_receive
2.05% [kernel] [k] __build_skb
1.95% [kernel] [k] build_skb
1.89% [cls_matchall] [k] mall_classify
1.83% [kernel] [k] page_pool_alloc_pages
1.80% [kernel] [k] tcf_action_exec
1.70% [mvpp2] [k] mvpp2_buf_alloc.isra.0
1.63% [kernel] [k] free_unref_page_prepare.part.0
1.45% [kernel] [k] page_pool_return_skb_page
1.42% [act_gact] [k] tcf_gact_act
1.16% [kernel] [k] netif_receive_skb_list_internal
1.08% [kernel] [k] kfree_skb
1.07% [kernel] [k] skb_release_data


> >
> > Any idea why I get such bad numbers?
>
> Nop but it's indeed strange
>
> >
> > Another reason to send it as an RFC is that I'm not fully convinced about how to
> > use the page_pool given the HW limitations of the BM.
>
> I'll have a look right after holidays
>

Thanks

> >
> > The driver currently uses, for every CPU, a page_pool for short packets and
> > another for long ones. The driver also has 4 RX queues per port, so every
> > RXQ #1 will share the short and long page pools of CPU #1.
> >
>
> I am not sure I am following the hardware config here.
>

Never mind, it's quite a mess; I needed a lot of time to get it :)

The HW puts the packets in different buffer pools depending on their size:
short: 64..128 bytes
long: 128..1664 bytes
jumbo: 1664..9856 bytes

Let's skip the jumbo buffers for now and assume we have 4 CPUs: the
driver allocates 4 short and 4 long buffer pools.
Each port has 4 RX queues, and each one uses a short and a long pool.
With the page_pool API, we have 8 struct page_pool instances, 4 for the
short and 4 for the long buffers.
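
In code terms (just a sketch, using the names from the patches), every
RXQ ends up with two pools:

    /* With percpu_pools, pool i in [0, nrxqs) holds short buffers and
     * pool i + nrxqs holds long ones, so each RXQ maps to exactly one
     * short and one long page_pool.
     */
    struct page_pool *short_pp = priv->page_pool[rxq->logic_rxq];
    struct page_pool *long_pp  = priv->page_pool[rxq->logic_rxq + port->nrxqs];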


> > This means that for every RX queue I call xdp_rxq_info_reg_mem_model() twice,
> > on two different page_pool, can this be a problem?
> >
> > As usual, ideas are welcome.
> >
> > Matteo Croce (2):
> > mvpp2: use page_pool allocator
> > mvpp2: memory accounting
> >
> > drivers/net/ethernet/marvell/Kconfig | 1 +
> > drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 7 +
> > .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 142 +++++++++++++++---
> > 3 files changed, 125 insertions(+), 25 deletions(-)
> >
> > --
> > 2.24.1
> >
> Cheers
> /Ilias
>

Bye,
--
Matteo Croce
per aspera ad upstream

2019-12-24 14:02:06

by Jesper Dangaard Brouer

Subject: Re: [RFC net-next 0/2] mvpp2: page_pool support

On Tue, 24 Dec 2019 11:52:29 +0200
Ilias Apalodimas <[email protected]> wrote:

> On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > These patches change the memory allocator of mvpp2 from the frag allocator to
> > the page_pool API. This change is needed to later add XDP support to mvpp2.
> >
> > The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
> > much slower. This is the tc drop rate measured with a single flow:
> >
> > stock net-next with frag allocator:
> > rx: 900.7 Mbps 1877 Kpps
> >
> > this patchset with page_pool:
> > rx: 423.5 Mbps 882.3 Kpps
> >
> > This is the perf top when receiving traffic:
> >
> > 27.68% [kernel] [k] __page_pool_clean_page
>
> This seems extremely high on the list.

This looks related to the cost of the DMA unmap, as page_pool has
PP_FLAG_DMA_MAP set. (It is a little strange, as page_pool uses
DMA_ATTR_SKIP_CPU_SYNC, which should make it less expensive.)
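
For reference, a simplified sketch of what __page_pool_clean_page() ends
up doing when PP_FLAG_DMA_MAP is set (paraphrased, details elided):

    /* Paraphrased sketch: a mapped page must be DMA-unmapped before it
     * can leave the pool; the CPU sync is skipped, so the unmap itself
     * is the remaining cost at the top of the profile.
     */
    static void __page_pool_clean_page(struct page_pool *pool,
                                       struct page *page)
    {
            if (!(pool->p.flags & PP_FLAG_DMA_MAP))
                    return;

            dma_unmap_page_attrs(pool->p.dev, page->dma_addr,
                                 PAGE_SIZE << pool->p.order, pool->p.dma_dir,
                                 DMA_ATTR_SKIP_CPU_SYNC);
            page->dma_addr = 0;
    }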


> > 9.79% [kernel] [k] get_page_from_freelist

You are clearly hitting the page allocator every time, because you are not
using the page_pool recycle facility.
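
The two paths, sketched side by side (not exact code; allow_direct is
only safe from the pool's NAPI context):

    /* Sketch: releasing unmaps the page and hands it back to the page
     * allocator (hence free_unref_page/get_page_from_freelist above),
     * while putting it back recycles it into the pool's cache.
     */
    page_pool_release_page(pp, page);       /* unmap; put_page() then frees it */

    page_pool_put_page(pp, page, false);    /* recycle into the pool instead */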


> > 7.18% [kernel] [k] free_unref_page
> > 4.64% [kernel] [k] build_skb
> > 4.63% [kernel] [k] __netif_receive_skb_core
> > 3.83% [mvpp2] [k] mvpp2_poll
> > 3.64% [kernel] [k] eth_type_trans
> > 3.61% [kernel] [k] kmem_cache_free
> > 3.03% [kernel] [k] kmem_cache_alloc
> > 2.76% [kernel] [k] dev_gro_receive
> > 2.69% [mvpp2] [k] mvpp2_bm_pool_put
> > 2.68% [kernel] [k] page_frag_free
> > 1.83% [kernel] [k] inet_gro_receive
> > 1.74% [kernel] [k] page_pool_alloc_pages
> > 1.70% [kernel] [k] __build_skb
> > 1.47% [kernel] [k] __alloc_pages_nodemask
> > 1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> > 1.29% [kernel] [k] tcf_action_exec
> >
> > I tried Ilias' patches for page_pool recycling and I get an improvement
> > to ~1100 Kpps, but I'm still far from the original allocator.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2019-12-24 14:06:14

by Jesper Dangaard Brouer

Subject: Re: [RFC net-next 0/2] mvpp2: page_pool support

On Tue, 24 Dec 2019 14:34:07 +0100
Matteo Croce <[email protected]> wrote:

> On Tue, Dec 24, 2019 at 10:52 AM Ilias Apalodimas
> <[email protected]> wrote:
> >
> > On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > > These patches change the memory allocator of mvpp2 from the frag allocator to
> > > the page_pool API. This change is needed to later add XDP support to mvpp2.
> > >
> > > The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
> > > much slower. This is the tc drop rate measured with a single flow:
> > >
> > > stock net-next with frag allocator:
> > > rx: 900.7 Mbps 1877 Kpps
> > >
> > > this patchset with page_pool:
> > > rx: 423.5 Mbps 882.3 Kpps
> > >
> > > This is the perf top when receiving traffic:
> > >
> > > 27.68% [kernel] [k] __page_pool_clean_page
> >
> > This seems extremely high on the list.
> >
> > > 9.79% [kernel] [k] get_page_from_freelist
> > > 7.18% [kernel] [k] free_unref_page
> > > 4.64% [kernel] [k] build_skb
> > > 4.63% [kernel] [k] __netif_receive_skb_core
> > > 3.83% [mvpp2] [k] mvpp2_poll
> > > 3.64% [kernel] [k] eth_type_trans
> > > 3.61% [kernel] [k] kmem_cache_free
> > > 3.03% [kernel] [k] kmem_cache_alloc
> > > 2.76% [kernel] [k] dev_gro_receive
> > > 2.69% [mvpp2] [k] mvpp2_bm_pool_put
> > > 2.68% [kernel] [k] page_frag_free
> > > 1.83% [kernel] [k] inet_gro_receive
> > > 1.74% [kernel] [k] page_pool_alloc_pages
> > > 1.70% [kernel] [k] __build_skb
> > > 1.47% [kernel] [k] __alloc_pages_nodemask
> > > 1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> > > 1.29% [kernel] [k] tcf_action_exec
> > >
> > > I tried Ilias' patches for page_pool recycling and I get an improvement
> > > to ~1100 Kpps, but I'm still far from the original allocator.
> >
> > Can you post the recycling perf for comparison?
> >
>
> 12.00% [kernel] [k] get_page_from_freelist
> 9.25% [kernel] [k] free_unref_page

Hmm, this indicates pages are not getting recycled.

> 6.83% [kernel] [k] eth_type_trans
> 5.33% [kernel] [k] __netif_receive_skb_core
> 4.96% [mvpp2] [k] mvpp2_poll
> 4.64% [kernel] [k] kmem_cache_free
> 4.06% [kernel] [k] __xdp_return

You do invoke the __xdp_return() code, but it might find that the page
cannot be recycled...
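
Roughly this check, sketched (not the exact code):

    /* Sketch: a page is only recycled while the pool still owns it
     * exclusively; otherwise it is cleaned (unmapped) and freed to the
     * page allocator, which matches the profile above.
     */
    if (page_ref_count(page) == 1) {
            page_pool_recycle_direct(pool, page);   /* fast path */
    } else {
            __page_pool_clean_page(pool, page);     /* slow path */
            put_page(page);
    }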

> 3.60% [kernel] [k] kmem_cache_alloc
> 3.31% [kernel] [k] dev_gro_receive
> 3.29% [kernel] [k] __page_pool_clean_page
> 3.25% [mvpp2] [k] mvpp2_bm_pool_put
> 2.73% [kernel] [k] __page_pool_put_page
> 2.33% [kernel] [k] __alloc_pages_nodemask
> 2.33% [kernel] [k] inet_gro_receive
> 2.05% [kernel] [k] __build_skb
> 1.95% [kernel] [k] build_skb
> 1.89% [cls_matchall] [k] mall_classify
> 1.83% [kernel] [k] page_pool_alloc_pages
> 1.80% [kernel] [k] tcf_action_exec
> 1.70% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> 1.63% [kernel] [k] free_unref_page_prepare.part.0
> 1.45% [kernel] [k] page_pool_return_skb_page
> 1.42% [act_gact] [k] tcf_gact_act
> 1.16% [kernel] [k] netif_receive_skb_list_internal
> 1.08% [kernel] [k] kfree_skb
> 1.07% [kernel] [k] skb_release_data

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2019-12-24 14:39:33

by Matteo Croce

Subject: Re: [RFC net-next 0/2] mvpp2: page_pool support

On Tue, Dec 24, 2019 at 3:01 PM Jesper Dangaard Brouer
<[email protected]> wrote:
>
> On Tue, 24 Dec 2019 11:52:29 +0200
> Ilias Apalodimas <[email protected]> wrote:
>
> > On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > > These patches change the memory allocator of mvpp2 from the frag allocator to
> > > the page_pool API. This change is needed to later add XDP support to mvpp2.
> > >
> > > The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
> > > much slower. This is the tc drop rate measured with a single flow:
> > >
> > > stock net-next with frag allocator:
> > > rx: 900.7 Mbps 1877 Kpps
> > >
> > > this patchset with page_pool:
> > > rx: 423.5 Mbps 882.3 Kpps
> > >
> > > This is the perf top when receiving traffic:
> > >
> > > 27.68% [kernel] [k] __page_pool_clean_page
> >
> > This seems extremely high on the list.
>
> This looks related to the cost of the DMA unmap, as page_pool has
> PP_FLAG_DMA_MAP set. (It is a little strange, as page_pool uses
> DMA_ATTR_SKIP_CPU_SYNC, which should make it less expensive.)
>
>
> > > 9.79% [kernel] [k] get_page_from_freelist
>
> You are clearly hitting the page allocator every time, because you are not
> using the page_pool recycle facility.
>
>
> > > 7.18% [kernel] [k] free_unref_page
> > > 4.64% [kernel] [k] build_skb
> > > 4.63% [kernel] [k] __netif_receive_skb_core
> > > 3.83% [mvpp2] [k] mvpp2_poll
> > > 3.64% [kernel] [k] eth_type_trans
> > > 3.61% [kernel] [k] kmem_cache_free
> > > 3.03% [kernel] [k] kmem_cache_alloc
> > > 2.76% [kernel] [k] dev_gro_receive
> > > 2.69% [mvpp2] [k] mvpp2_bm_pool_put
> > > 2.68% [kernel] [k] page_frag_free
> > > 1.83% [kernel] [k] inet_gro_receive
> > > 1.74% [kernel] [k] page_pool_alloc_pages
> > > 1.70% [kernel] [k] __build_skb
> > > 1.47% [kernel] [k] __alloc_pages_nodemask
> > > 1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> > > 1.29% [kernel] [k] tcf_action_exec
> > >
> > > I tried Ilias' patches for page_pool recycling and I get an improvement
> > > to ~1100 Kpps, but I'm still far from the original allocator.
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>

The change I made to use the recycling is the following:

--- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
+++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
@@ -3071,7 +3071,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
if (pp)
- page_pool_release_page(pp, virt_to_page(data));
+ skb_mark_for_recycle(skb, virt_to_page(data), &rxq->xdp_rxq.mem);
else
dma_unmap_single_attrs(dev->dev.parent, dma_addr,




--
Matteo Croce
per aspera ad upstream

2019-12-27 11:52:54

by Ilias Apalodimas

Subject: Re: [RFC net-next 0/2] mvpp2: page_pool support

On Tue, Dec 24, 2019 at 03:37:49PM +0100, Matteo Croce wrote:
> On Tue, Dec 24, 2019 at 3:01 PM Jesper Dangaard Brouer
> <[email protected]> wrote:
> >
> > On Tue, 24 Dec 2019 11:52:29 +0200
> > Ilias Apalodimas <[email protected]> wrote:
> >
> > > On Tue, Dec 24, 2019 at 02:01:01AM +0100, Matteo Croce wrote:
> > > > These patches change the memory allocator of mvpp2 from the frag allocator to
> > > > the page_pool API. This change is needed to later add XDP support to mvpp2.
> > > >
> > > > The reason I am sending this as an RFC is that with this changeset, mvpp2 performs
> > > > much slower. This is the tc drop rate measured with a single flow:
> > > >
> > > > stock net-next with frag allocator:
> > > > rx: 900.7 Mbps 1877 Kpps
> > > >
> > > > this patchset with page_pool:
> > > > rx: 423.5 Mbps 882.3 Kpps
> > > >
> > > > This is the perf top when receiving traffic:
> > > >
> > > > 27.68% [kernel] [k] __page_pool_clean_page
> > >
> > > This seems extremely high on the list.
> >
> > This looks related to the cost of the DMA unmap, as page_pool has
> > PP_FLAG_DMA_MAP set. (It is a little strange, as page_pool uses
> > DMA_ATTR_SKIP_CPU_SYNC, which should make it less expensive.)
> >
> >
> > > > 9.79% [kernel] [k] get_page_from_freelist
> >
> > You are clearly hitting the page allocator every time, because you are not
> > using the page_pool recycle facility.
> >
> >
> > > > 7.18% [kernel] [k] free_unref_page
> > > > 4.64% [kernel] [k] build_skb
> > > > 4.63% [kernel] [k] __netif_receive_skb_core
> > > > 3.83% [mvpp2] [k] mvpp2_poll
> > > > 3.64% [kernel] [k] eth_type_trans
> > > > 3.61% [kernel] [k] kmem_cache_free
> > > > 3.03% [kernel] [k] kmem_cache_alloc
> > > > 2.76% [kernel] [k] dev_gro_receive
> > > > 2.69% [mvpp2] [k] mvpp2_bm_pool_put
> > > > 2.68% [kernel] [k] page_frag_free
> > > > 1.83% [kernel] [k] inet_gro_receive
> > > > 1.74% [kernel] [k] page_pool_alloc_pages
> > > > 1.70% [kernel] [k] __build_skb
> > > > 1.47% [kernel] [k] __alloc_pages_nodemask
> > > > 1.36% [mvpp2] [k] mvpp2_buf_alloc.isra.0
> > > > 1.29% [kernel] [k] tcf_action_exec
> > > >
> > > > I tried Ilias' patches for page_pool recycling and I get an improvement
> > > > to ~1100 Kpps, but I'm still far from the original allocator.
> > --
> > Best regards,
> > Jesper Dangaard Brouer
> > MSc.CS, Principal Kernel Engineer at Red Hat
> > LinkedIn: http://www.linkedin.com/in/brouer
> >
>
> The change I did to use the recycling is the following:
>
> --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c
> @@ -3071,7 +3071,7 @@ static int mvpp2_rx(struct mvpp2_port *port, struct napi_struct *napi,
> if (pp)
> - page_pool_release_page(pp, virt_to_page(data));
> + skb_mark_for_recycle(skb, virt_to_page(data), &rxq->xdp_rxq.mem);
> else
> dma_unmap_single_attrs(dev->dev.parent, dma_addr,
>
>
Jesper is right, you aren't recycling anything.

The skb_mark_for_recycle() usage seems correct.
There are a few more cases where we refuse to recycle (for example, coalescing
page_pool and SLUB-allocated pages is forbidden). I wonder if you hit any of
those cases and recycling doesn't take place.
We'll hopefully release updated code shortly. I'll ping you and we can test on
that.
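
For illustration, the kind of guard I mean, as a sketch (pp_recycle stands
in for whatever marker the recycling patches end up using):

    /* Sketch: coalescing must not mix page_pool-backed skbs with
     * SLUB-backed ones, or the recycling marker would be applied to
     * memory the pool never owned.
     */
    static bool skbs_can_coalesce(const struct sk_buff *to,
                                  const struct sk_buff *from)
    {
            if (to->pp_recycle != from->pp_recycle)
                    return false;   /* mixed page_pool / SLUB memory */
            return true;
    }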


Thanks
/Ilias
>
>
> --
> Matteo Croce
> per aspera ad upstream
>