2018-01-05 11:04:37

by Ioana Ciocoi Radulescu

[permalink] [raw]
Subject: [PATCH 0/2] staging: fsl-mc/dpio, fsl-dpaa2/eth: Use affine DPIO services

The first patch introduces a new DPIO API function allowing
the selection of a cpu-affine service. In the second patch,
the Ethernet driver uses this new API to map cpu-affine DPIO
services to channels.

This adds a significant performance boost on the frame dequeue
side (see numbers below), but, more importantly, helps us avoid
a scenario leading to the kernel running out of memory:

In an unidirectional IP forwarding test running on 8cores, with
ingress traffic @10Gbps, the rate with which Tx frames are
enqueued by the cpu on Tx queues is higher than the rate with
which cores manage to dequeue Tx confirmation frames. So these
confirmation frames increasingly accumulate on hardware queues,
eventually gobbling up all system memory.

Enforcing DPIO service affinity takes advantage of the way
our hardware was designed to be used, eliminating the bottleneck
that caused the imbalance between Rx and Tx processing paths.

Performance numbers measured on a LS2088A board with 8cores @2GHz:

Bidirectional IP forwarding, 512 flows with 64B frames:
1.68Mpps (before) vs 3.77Mpps (after)
Unidirectional IP forwarding, 512 flows with 64B frames:
1.33Mpps (and OOM after ~5-8min) vs 3.72Mpps
TCP netperf (client), 256 flows:
17.27Gbps with 100% cpu load vs 18.78Gbps with 77.6% cpu load

Ioana Radulescu (2):
staging: fsl-mc/dpio: Add dpaa2_io_service_select() API
staging: fsl-dpaa2/eth: Use affine DPIO services

drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c | 24 ++++++++++++++----------
drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h | 1 +
drivers/staging/fsl-mc/bus/dpio/dpio-service.c | 17 +++++++++++++++++
drivers/staging/fsl-mc/include/dpaa2-io.h | 2 ++
4 files changed, 34 insertions(+), 10 deletions(-)

--
2.7.4


2018-01-05 11:04:39

by Ioana Ciocoi Radulescu

[permalink] [raw]
Subject: [PATCH 1/2] staging: fsl-mc/dpio: Add dpaa2_io_service_select() API

All DPIO service API functions receive a dpaa2_io service pointer
as parameter (NULL meaning any service will do) which indicates
the hardware resource to be used to execute the specified command.

There isn't however any available API for obtaining such a service
reference that could be used further, effectively forcing the users
to always request a random service for DPIO operations.
(The DPIO driver holds internally an array mapping services to cpus,
and affine services can be indirectly requested by a couple of API
functions: dpaa2_io_service_register and dpaa2_io_service_rearm
use the cpu id provided by the user to select the corresponding
service)

This patch adds a function for selecting a DPIO service based on
the specified cpu id. If the user provides a "don't care" value
for the cpu, we revert to the default behavior and return the next
DPIO, taken in a round-robin fashion from a list of available
services.

Signed-off-by: Ioana Radulescu <[email protected]>
---
drivers/staging/fsl-mc/bus/dpio/dpio-service.c | 17 +++++++++++++++++
drivers/staging/fsl-mc/include/dpaa2-io.h | 2 ++
2 files changed, 19 insertions(+)

diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
index a8a8e15..6e8994c 100644
--- a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
+++ b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
@@ -104,6 +104,23 @@ static inline struct dpaa2_io *service_select(struct dpaa2_io *d)
}

/**
+ * dpaa2_io_service_select() - return a dpaa2_io service affined to this cpu
+ * @cpu: the cpu id
+ *
+ * Return the affine dpaa2_io service, or NULL if there is no service affined
+ * to the specified cpu. If DPAA2_IO_ANY_CPU is used, return the next available
+ * service.
+ */
+struct dpaa2_io *dpaa2_io_service_select(int cpu)
+{
+ if (cpu == DPAA2_IO_ANY_CPU)
+ return service_select(NULL);
+
+ return service_select_by_cpu(NULL, cpu);
+}
+EXPORT_SYMBOL_GPL(dpaa2_io_service_select);
+
+/**
* dpaa2_io_create() - create a dpaa2_io object.
* @desc: the dpaa2_io descriptor
*
diff --git a/drivers/staging/fsl-mc/include/dpaa2-io.h b/drivers/staging/fsl-mc/include/dpaa2-io.h
index 07ad15a..9d70251 100644
--- a/drivers/staging/fsl-mc/include/dpaa2-io.h
+++ b/drivers/staging/fsl-mc/include/dpaa2-io.h
@@ -88,6 +88,8 @@ void dpaa2_io_down(struct dpaa2_io *d);

irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj);

+struct dpaa2_io *dpaa2_io_service_select(int cpu);
+
/**
* struct dpaa2_io_notification_ctx - The DPIO notification context structure
* @cb: The callback to be invoked when the notification arrives
--
2.7.4

2018-01-05 11:05:00

by Ioana Ciocoi Radulescu

[permalink] [raw]
Subject: [PATCH 2/2] staging: fsl-dpaa2/eth: Use affine DPIO services

Use the newly added DPIO service API to map cpu-affine DPIO services
to channels.

The DPAA2 Ethernet driver already had mappings of frame queues and
channels to cpus, but had no control over the DPIOs used. We can
now ensure full affinity of hotpath hardware resources to cores,
which improves performance and almost eliminates some resource
contentions (e.g. enqueue/dequeue busy counters should be close to
zero from now on).

Making the pull channel operation core affine brings the most
significant benefits. This ensures the same DPIO service will be
used for all dequeue commands issued for a certain frame queue,
which is in line with the way hardware is optimized.

Additionally, we also use affine DPIOs for the frame enqueue and
buffer release operations in order to avoid resource contention.
dpaa2_io_service_register() and dpaa2_io_service_rearm()
functions receive an affine DPIO as argument mostly for uniformity,
but this doesn't change the previous functionality.

Signed-off-by: Ioana Radulescu <[email protected]>
---
drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c | 24 ++++++++++++++----------
drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h | 1 +
2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
index 824c4ad..2817e67 100644
--- a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
+++ b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c
@@ -597,7 +597,8 @@ static netdev_tx_t dpaa2_eth_tx(struct sk_buff *skb, struct net_device *net_dev)
queue_mapping = skb_get_queue_mapping(skb);
fq = &priv->fq[queue_mapping];
for (i = 0; i < DPAA2_ETH_ENQUEUE_RETRIES; i++) {
- err = dpaa2_io_service_enqueue_qd(NULL, priv->tx_qdid, 0,
+ err = dpaa2_io_service_enqueue_qd(fq->channel->dpio,
+ priv->tx_qdid, 0,
fq->tx_qdbin, &fd);
if (err != -EBUSY)
break;
@@ -719,7 +720,8 @@ static void free_bufs(struct dpaa2_eth_priv *priv, u64 *buf_array, int count)
/* Perform a single release command to add buffers
* to the specified buffer pool
*/
-static int add_bufs(struct dpaa2_eth_priv *priv, u16 bpid)
+static int add_bufs(struct dpaa2_eth_priv *priv,
+ struct dpaa2_eth_channel *ch, u16 bpid)
{
struct device *dev = priv->net_dev->dev.parent;
u64 buf_array[DPAA2_ETH_BUFS_PER_CMD];
@@ -753,7 +755,7 @@ static int add_bufs(struct dpaa2_eth_priv *priv, u16 bpid)

release_bufs:
/* In case the portal is busy, retry until successful */
- while ((err = dpaa2_io_service_release(NULL, bpid,
+ while ((err = dpaa2_io_service_release(ch->dpio, bpid,
buf_array, i)) == -EBUSY)
cpu_relax();

@@ -794,7 +796,7 @@ static int seed_pool(struct dpaa2_eth_priv *priv, u16 bpid)
for (j = 0; j < priv->num_channels; j++) {
for (i = 0; i < DPAA2_ETH_NUM_BUFS;
i += DPAA2_ETH_BUFS_PER_CMD) {
- new_count = add_bufs(priv, bpid);
+ new_count = add_bufs(priv, priv->channel[j], bpid);
priv->channel[j]->buf_count += new_count;

if (new_count < DPAA2_ETH_BUFS_PER_CMD) {
@@ -852,7 +854,7 @@ static int refill_pool(struct dpaa2_eth_priv *priv,
return 0;

do {
- new_count = add_bufs(priv, bpid);
+ new_count = add_bufs(priv, ch, bpid);
if (unlikely(!new_count)) {
/* Out of memory; abort for now, we'll try later on */
break;
@@ -873,7 +875,8 @@ static int pull_channel(struct dpaa2_eth_channel *ch)

/* Retry while portal is busy */
do {
- err = dpaa2_io_service_pull_channel(NULL, ch->ch_id, ch->store);
+ err = dpaa2_io_service_pull_channel(ch->dpio, ch->ch_id,
+ ch->store);
dequeues++;
cpu_relax();
} while (err == -EBUSY);
@@ -923,7 +926,7 @@ static int dpaa2_eth_poll(struct napi_struct *napi, int budget)
if (cleaned < budget && napi_complete_done(napi, cleaned)) {
/* Re-enable data available notifications */
do {
- err = dpaa2_io_service_rearm(NULL, &ch->nctx);
+ err = dpaa2_io_service_rearm(ch->dpio, &ch->nctx);
cpu_relax();
} while (err == -EBUSY);
WARN_ONCE(err, "CDAN notifications rearm failed on core %d",
@@ -1531,7 +1534,8 @@ static int setup_dpio(struct dpaa2_eth_priv *priv)
nctx->desired_cpu = i;

/* Register the new context */
- err = dpaa2_io_service_register(NULL, nctx);
+ channel->dpio = dpaa2_io_service_select(i);
+ err = dpaa2_io_service_register(channel->dpio, nctx);
if (err) {
dev_dbg(dev, "No affine DPIO for cpu %d\n", i);
/* If no affine DPIO for this core, there's probably
@@ -1571,7 +1575,7 @@ static int setup_dpio(struct dpaa2_eth_priv *priv)
return 0;

err_set_cdan:
- dpaa2_io_service_deregister(NULL, nctx);
+ dpaa2_io_service_deregister(channel->dpio, nctx);
err_service_reg:
free_channel(priv, channel);
err_alloc_ch:
@@ -1594,7 +1598,7 @@ static void free_dpio(struct dpaa2_eth_priv *priv)
/* deregister CDAN notifications and free channels */
for (i = 0; i < priv->num_channels; i++) {
ch = priv->channel[i];
- dpaa2_io_service_deregister(NULL, &ch->nctx);
+ dpaa2_io_service_deregister(ch->dpio, &ch->nctx);
free_channel(priv, ch);
}
}
diff --git a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h
index fb8fb5c..e577410 100644
--- a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h
+++ b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.h
@@ -288,6 +288,7 @@ struct dpaa2_eth_channel {
int ch_id;
int dpio_id;
struct napi_struct napi;
+ struct dpaa2_io *dpio;
struct dpaa2_io_store *store;
struct dpaa2_eth_priv *priv;
int buf_count;
--
2.7.4

2018-01-05 18:11:07

by Roy Pledge

[permalink] [raw]
Subject: Re: [PATCH 1/2] staging: fsl-mc/dpio: Add dpaa2_io_service_select() API

On 1/5/2018 6:04 AM, Ioana Radulescu wrote:
> All DPIO service API functions receive a dpaa2_io service pointer
> as parameter (NULL meaning any service will do) which indicates
> the hardware resource to be used to execute the specified command.
>
> There isn't however any available API for obtaining such a service
> reference that could be used further, effectively forcing the users
> to always request a random service for DPIO operations.
> (The DPIO driver holds internally an array mapping services to cpus,
> and affine services can be indirectly requested by a couple of API
> functions: dpaa2_io_service_register and dpaa2_io_service_rearm
> use the cpu id provided by the user to select the corresponding
> service)
>
> This patch adds a function for selecting a DPIO service based on
> the specified cpu id. If the user provides a "don't care" value
> for the cpu, we revert to the default behavior and return the next
> DPIO, taken in a round-robin fashion from a list of available
> services.
>
> Signed-off-by: Ioana Radulescu <[email protected]>
> ---
> drivers/staging/fsl-mc/bus/dpio/dpio-service.c | 17 +++++++++++++++++
> drivers/staging/fsl-mc/include/dpaa2-io.h | 2 ++
> 2 files changed, 19 insertions(+)
>
> diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
> index a8a8e15..6e8994c 100644
> --- a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
> +++ b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c
> @@ -104,6 +104,23 @@ static inline struct dpaa2_io *service_select(struct dpaa2_io *d)
> }
>
> /**
> + * dpaa2_io_service_select() - return a dpaa2_io service affined to this cpu
> + * @cpu: the cpu id
> + *
> + * Return the affine dpaa2_io service, or NULL if there is no service affined
> + * to the specified cpu. If DPAA2_IO_ANY_CPU is used, return the next available
> + * service.
> + */
> +struct dpaa2_io *dpaa2_io_service_select(int cpu)
> +{
> + if (cpu == DPAA2_IO_ANY_CPU)
> + return service_select(NULL);
> +
> + return service_select_by_cpu(NULL, cpu);
> +}
> +EXPORT_SYMBOL_GPL(dpaa2_io_service_select);
> +
> +/**
> * dpaa2_io_create() - create a dpaa2_io object.
> * @desc: the dpaa2_io descriptor
> *
> diff --git a/drivers/staging/fsl-mc/include/dpaa2-io.h b/drivers/staging/fsl-mc/include/dpaa2-io.h
> index 07ad15a..9d70251 100644
> --- a/drivers/staging/fsl-mc/include/dpaa2-io.h
> +++ b/drivers/staging/fsl-mc/include/dpaa2-io.h
> @@ -88,6 +88,8 @@ void dpaa2_io_down(struct dpaa2_io *d);
>
> irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj);
>
> +struct dpaa2_io *dpaa2_io_service_select(int cpu);
> +
> /**
> * struct dpaa2_io_notification_ctx - The DPIO notification context structure
> * @cb: The callback to be invoked when the notification arrives
>

Acked-by: Roy Pledge <[email protected]>