2023-02-06 10:09:07

by Vladimir Oltean

Subject: [RFC PATCH net-next 00/11] NXP ENETC AF_XDP zero-copy sockets

This is RFC because I have a few things I'm not 100% certain about.
I've tested this with the xdpsock test application, but I don't have
very detailed knowledge about the internals of AF_XDP sockets.

Patches where I'd appreciate it if people took a look are 02/11, 05/11,
10/11 and 11/11.

Vladimir Oltean (11):
net: enetc: optimize struct enetc_rx_swbd layout
net: enetc: perform XDP RX queue registration at enetc_setup_bpf()
time
net: enetc: rename "cleaned_cnt" to "buffs_missing"
net: enetc: continue NAPI processing on frames with RX errors
net: enetc: add support for ethtool --show-channels
net: enetc: consolidate rx_swbd freeing
net: enetc: rename enetc_free_tx_frame() to enetc_free_tx_swbd()
net: enetc: increment rx_byte_cnt for XDP data path
net: enetc: move setting of ENETC_TXBD_FLAGS_F flag to
enetc_xdp_map_tx_buff()
net: enetc: add RX support for zero-copy XDP sockets
net: enetc: add TX support for zero-copy XDP sockets

drivers/net/ethernet/freescale/enetc/enetc.c | 676 +++++++++++++++---
drivers/net/ethernet/freescale/enetc/enetc.h | 9 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 10 +
.../net/ethernet/freescale/enetc/enetc_pf.c | 1 +
4 files changed, 606 insertions(+), 90 deletions(-)

--
2.34.1



2023-02-06 10:09:12

by Vladimir Oltean

Subject: [RFC PATCH net-next 01/11] net: enetc: optimize struct enetc_rx_swbd layout

Eliminate a 4 byte hole on arm64, to be able to introduce a new member
to this structure in a future patch without increasing the overall
structure size.

Before:

struct enetc_rx_swbd {
struct page * page; /* 0 8 */
enum dma_data_direction dir; /* 8 4 */

/* XXX 4 bytes hole, try to pack */

dma_addr_t dma; /* 16 8 */
u16 page_offset; /* 24 2 */
u16 len; /* 26 2 */

/* size: 32, cachelines: 1, members: 5 */
/* sum members: 24, holes: 1, sum holes: 4 */
/* padding: 4 */
/* last cacheline: 32 bytes */
};

After:

struct enetc_rx_swbd {
struct page * page; /* 0 8 */
dma_addr_t dma; /* 8 8 */
enum dma_data_direction dir; /* 16 4 */
u16 page_offset; /* 20 2 */
u16 len; /* 22 2 */

/* size: 24, cachelines: 1, members: 5 */
/* last cacheline: 24 bytes */
};

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index e21d096c5a90..704aa1f9dfa3 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -48,10 +48,10 @@ struct enetc_tx_swbd {
(SKB_WITH_OVERHEAD(ENETC_RXB_TRUESIZE) - XDP_PACKET_HEADROOM)

struct enetc_rx_swbd {
- dma_addr_t dma;
struct page *page;
- u16 page_offset;
+ dma_addr_t dma;
enum dma_data_direction dir;
+ u16 page_offset;
u16 len;
};

--
2.34.1


2023-02-06 10:09:18

by Vladimir Oltean

Subject: [RFC PATCH net-next 02/11] net: enetc: perform XDP RX queue registration at enetc_setup_bpf() time

In a future patch, the XDP RX queues will have to switch between
exposing a shared page memory model and an XSK pool memory model.

If we keep the RXQ registration where it currently is (enetc_pf_probe()
-> enetc_alloc_msix()), we'll end up needing to unregister the existing
RXQs and register new ones. But surprise, registering them can fail, and
that can leave us in the unpleasant situation that we can't recover from
two consecutive errors.

Taking a quick look at net/core/xdp.c, I see that this information seems
to only be used for xdp_buff :: rxq (and :: mem) and xdp_frame :: mem,
essentially between xdp_init_buff() and xdp_release_frame(). While these
2 might not be under the same NAPI poll cycle, the enetc_reconfigure()
procedure does make sure that any XDP buffers in flight are returned to
the respective memory "allocator" prior to calling
enetc_reconfigure_xdp_cb().

So it seems that the most logical approach is to register this no
earlier than when it is needed, and to unregister it no later than when
it stops being needed. This also saves us from the impossible condition
of two consecutive registration failures, because now there isn't
anything to roll back on failure; we can just propagate the error to
user space and we're in the same state as before. I don't really
understand why more drivers don't do this.

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 89 +++++++++++++++-----
1 file changed, 68 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 1c0aeaa13cde..2d8f79ddb78f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2450,6 +2450,59 @@ void enetc_start(struct net_device *ndev)
}
EXPORT_SYMBOL_GPL(enetc_start);

+static int enetc_xdp_rxq_mem_model_register(struct enetc_ndev_priv *priv,
+ int rxq)
+{
+ struct enetc_bdr *rx_ring = priv->rx_ring[rxq];
+ int err;
+
+ err = xdp_rxq_info_reg(&rx_ring->xdp.rxq, priv->ndev, rxq, 0);
+ if (err)
+ return err;
+
+ err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp.rxq,
+ MEM_TYPE_PAGE_SHARED, NULL);
+ if (err)
+ xdp_rxq_info_unreg(&rx_ring->xdp.rxq);
+
+ return err;
+}
+
+static void enetc_xdp_rxq_mem_model_unregister(struct enetc_ndev_priv *priv,
+ int rxq)
+{
+ struct enetc_bdr *rx_ring = priv->rx_ring[rxq];
+
+ xdp_rxq_info_unreg_mem_model(&rx_ring->xdp.rxq);
+ xdp_rxq_info_unreg(&rx_ring->xdp.rxq);
+}
+
+static int enetc_xdp_mem_model_register(struct enetc_ndev_priv *priv)
+{
+ int i, err;
+
+ for (i = 0; i < priv->num_rx_rings; i++) {
+ err = enetc_xdp_rxq_mem_model_register(priv, i);
+ if (err)
+ goto rollback;
+ }
+
+ return 0;
+
+rollback:
+ for (; i >= 0; i--)
+ enetc_xdp_rxq_mem_model_unregister(priv, i);
+ return err;
+}
+
+static void enetc_xdp_mem_model_unregister(struct enetc_ndev_priv *priv)
+{
+ int i;
+
+ for (i = 0; i < priv->num_rx_rings; i++)
+ enetc_xdp_rxq_mem_model_unregister(priv, i);
+}
+
int enetc_open(struct net_device *ndev)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
@@ -2675,13 +2728,19 @@ static int enetc_reconfigure_xdp_cb(struct enetc_ndev_priv *priv, void *ctx)
int num_stack_tx_queues;
int err, i;

+ if (prog) {
+ err = enetc_xdp_mem_model_register(priv);
+ if (err)
+ return err;
+ }
+
old_prog = xchg(&priv->xdp_prog, prog);

num_stack_tx_queues = enetc_num_stack_tx_queues(priv);
err = netif_set_real_num_tx_queues(priv->ndev, num_stack_tx_queues);
if (err) {
xchg(&priv->xdp_prog, old_prog);
- return err;
+ goto err_xdp_mem_model_unreg;
}

if (old_prog)
@@ -2698,7 +2757,15 @@ static int enetc_reconfigure_xdp_cb(struct enetc_ndev_priv *priv, void *ctx)
rx_ring->buffer_offset = ENETC_RXB_PAD;
}

+ if (!prog)
+ enetc_xdp_mem_model_unregister(priv);
+
return 0;
+
+err_xdp_mem_model_unreg:
+ if (prog)
+ enetc_xdp_mem_model_unregister(priv);
+ return err;
}

static int enetc_setup_xdp_prog(struct net_device *ndev, struct bpf_prog *prog,
@@ -2954,20 +3021,6 @@ int enetc_alloc_msix(struct enetc_ndev_priv *priv)
bdr->buffer_offset = ENETC_RXB_PAD;
priv->rx_ring[i] = bdr;

- err = xdp_rxq_info_reg(&bdr->xdp.rxq, priv->ndev, i, 0);
- if (err) {
- kfree(v);
- goto fail;
- }
-
- err = xdp_rxq_info_reg_mem_model(&bdr->xdp.rxq,
- MEM_TYPE_PAGE_SHARED, NULL);
- if (err) {
- xdp_rxq_info_unreg(&bdr->xdp.rxq);
- kfree(v);
- goto fail;
- }
-
/* init defaults for adaptive IC */
if (priv->ic_mode & ENETC_IC_RX_ADAPTIVE) {
v->rx_ictt = 0x1;
@@ -3011,10 +3064,7 @@ int enetc_alloc_msix(struct enetc_ndev_priv *priv)
fail:
while (i--) {
struct enetc_int_vector *v = priv->int_vector[i];
- struct enetc_bdr *rx_ring = &v->rx_ring;

- xdp_rxq_info_unreg_mem_model(&rx_ring->xdp.rxq);
- xdp_rxq_info_unreg(&rx_ring->xdp.rxq);
netif_napi_del(&v->napi);
cancel_work_sync(&v->rx_dim.work);
kfree(v);
@@ -3032,10 +3082,7 @@ void enetc_free_msix(struct enetc_ndev_priv *priv)

for (i = 0; i < priv->bdr_int_num; i++) {
struct enetc_int_vector *v = priv->int_vector[i];
- struct enetc_bdr *rx_ring = &v->rx_ring;

- xdp_rxq_info_unreg_mem_model(&rx_ring->xdp.rxq);
- xdp_rxq_info_unreg(&rx_ring->xdp.rxq);
netif_napi_del(&v->napi);
cancel_work_sync(&v->rx_dim.work);
}
--
2.34.1


2023-02-06 10:09:31

by Vladimir Oltean

Subject: [RFC PATCH net-next 04/11] net: enetc: continue NAPI processing on frames with RX errors

If we see frames with RX errors, consume them, mark their buffers for
refill, and go through the rest of the ring until the NAPI budget is
exhausted. Right now we exit early and ask for the softirq to be
rescheduled.

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 4a81a23539fb..37d6ad0576e5 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1121,12 +1121,14 @@ static void enetc_add_rx_buff_to_skb(struct enetc_bdr *rx_ring, int i,

static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
u32 bd_status,
- union enetc_rx_bd **rxbd, int *i)
+ union enetc_rx_bd **rxbd, int *i,
+ int *buffs_missing)
{
if (likely(!(bd_status & ENETC_RXBD_LSTATUS(ENETC_RXBD_ERR_MASK))))
return false;

enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
+ (*buffs_missing)++;
enetc_rxbd_next(rx_ring, rxbd, i);

while (!(bd_status & ENETC_RXBD_LSTATUS_F)) {
@@ -1134,6 +1136,7 @@ static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
bd_status = le32_to_cpu((*rxbd)->r.lstatus);

enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
+ (*buffs_missing)++;
enetc_rxbd_next(rx_ring, rxbd, i);
}

@@ -1215,8 +1218,9 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
dma_rmb(); /* for reading other rxbd fields */

if (enetc_check_bd_errors_and_consume(rx_ring, bd_status,
- &rxbd, &i))
- break;
+ &rxbd, &i,
+ &buffs_missing))
+ continue;

skb = enetc_build_skb(rx_ring, bd_status, &rxbd, &i,
&buffs_missing, ENETC_RXB_DMA_SIZE);
@@ -1549,8 +1553,9 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
dma_rmb(); /* for reading other rxbd fields */

if (enetc_check_bd_errors_and_consume(rx_ring, bd_status,
- &rxbd, &i))
- break;
+ &rxbd, &i,
+ &buffs_missing))
+ continue;

orig_rxbd = rxbd;
orig_buffs_missing = buffs_missing;
--
2.34.1


2023-02-06 10:09:37

by Vladimir Oltean

Subject: [RFC PATCH net-next 03/11] net: enetc: rename "cleaned_cnt" to "buffs_missing"

Calling enetc_bd_unused() on an RX ring returns the number of
descriptors necessary for the ring to be full with descriptors owned by
hardware (for it to put packets in).

Putting this value in a variable named "cleaned_cnt" is misleading to me,
especially since we may start the NAPI poll routine (enetc_clean_rx_ring)
with a non-zero cleaned_cnt.
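
For reference, enetc_bd_unused() boils down to the usual producer/consumer
ring arithmetic, something along these lines (a sketch, not necessarily the
exact helper from enetc.h):

/* Number of buffer descriptors the driver would have to refill for the
 * ring to be entirely owned by hardware again.
 */
static int enetc_bd_unused(struct enetc_bdr *bdr)
{
	if (bdr->next_to_clean > bdr->next_to_use)
		return bdr->next_to_clean - bdr->next_to_use - 1;

	return bdr->bd_count + bdr->next_to_clean - bdr->next_to_use - 1;
}

So the value really counts buffers which are missing from the ring, hence
the new name.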

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 41 ++++++++++----------
1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 2d8f79ddb78f..4a81a23539fb 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1145,7 +1145,8 @@ static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,

static struct sk_buff *enetc_build_skb(struct enetc_bdr *rx_ring,
u32 bd_status, union enetc_rx_bd **rxbd,
- int *i, int *cleaned_cnt, int buffer_size)
+ int *i, int *buffs_missing,
+ int buffer_size)
{
struct sk_buff *skb;
u16 size;
@@ -1157,7 +1158,7 @@ static struct sk_buff *enetc_build_skb(struct enetc_bdr *rx_ring,

enetc_get_offloads(rx_ring, *rxbd, skb);

- (*cleaned_cnt)++;
+ (*buffs_missing)++;

enetc_rxbd_next(rx_ring, rxbd, i);

@@ -1173,7 +1174,7 @@ static struct sk_buff *enetc_build_skb(struct enetc_bdr *rx_ring,

enetc_add_rx_buff_to_skb(rx_ring, *i, size, skb);

- (*cleaned_cnt)++;
+ (*buffs_missing)++;

enetc_rxbd_next(rx_ring, rxbd, i);
}
@@ -1190,9 +1191,9 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
struct napi_struct *napi, int work_limit)
{
int rx_frm_cnt = 0, rx_byte_cnt = 0;
- int cleaned_cnt, i;
+ int buffs_missing, i;

- cleaned_cnt = enetc_bd_unused(rx_ring);
+ buffs_missing = enetc_bd_unused(rx_ring);
/* next descriptor to process */
i = rx_ring->next_to_clean;

@@ -1201,9 +1202,9 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
struct sk_buff *skb;
u32 bd_status;

- if (cleaned_cnt >= ENETC_RXBD_BUNDLE)
- cleaned_cnt -= enetc_refill_rx_ring(rx_ring,
- cleaned_cnt);
+ if (buffs_missing >= ENETC_RXBD_BUNDLE)
+ buffs_missing -= enetc_refill_rx_ring(rx_ring,
+ buffs_missing);

rxbd = enetc_rxbd(rx_ring, i);
bd_status = le32_to_cpu(rxbd->r.lstatus);
@@ -1218,7 +1219,7 @@ static int enetc_clean_rx_ring(struct enetc_bdr *rx_ring,
break;

skb = enetc_build_skb(rx_ring, bd_status, &rxbd, &i,
- &cleaned_cnt, ENETC_RXB_DMA_SIZE);
+ &buffs_missing, ENETC_RXB_DMA_SIZE);
if (!skb)
break;

@@ -1447,14 +1448,14 @@ static void enetc_add_rx_buff_to_xdp(struct enetc_bdr *rx_ring, int i,

static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,
union enetc_rx_bd **rxbd, int *i,
- int *cleaned_cnt, struct xdp_buff *xdp_buff)
+ int *buffs_missing, struct xdp_buff *xdp_buff)
{
u16 size = le16_to_cpu((*rxbd)->r.buf_len);

xdp_init_buff(xdp_buff, ENETC_RXB_TRUESIZE, &rx_ring->xdp.rxq);

enetc_map_rx_buff_to_xdp(rx_ring, *i, xdp_buff, size);
- (*cleaned_cnt)++;
+ (*buffs_missing)++;
enetc_rxbd_next(rx_ring, rxbd, i);

/* not last BD in frame? */
@@ -1468,7 +1469,7 @@ static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,
}

enetc_add_rx_buff_to_xdp(rx_ring, *i, size, xdp_buff);
- (*cleaned_cnt)++;
+ (*buffs_missing)++;
enetc_rxbd_next(rx_ring, rxbd, i);
}
}
@@ -1524,16 +1525,16 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
struct enetc_ndev_priv *priv = netdev_priv(rx_ring->ndev);
int rx_frm_cnt = 0, rx_byte_cnt = 0;
struct enetc_bdr *tx_ring;
- int cleaned_cnt, i;
+ int buffs_missing, i;
u32 xdp_act;

- cleaned_cnt = enetc_bd_unused(rx_ring);
+ buffs_missing = enetc_bd_unused(rx_ring);
/* next descriptor to process */
i = rx_ring->next_to_clean;

while (likely(rx_frm_cnt < work_limit)) {
union enetc_rx_bd *rxbd, *orig_rxbd;
- int orig_i, orig_cleaned_cnt;
+ int orig_i, orig_buffs_missing;
struct xdp_buff xdp_buff;
struct sk_buff *skb;
u32 bd_status;
@@ -1552,11 +1553,11 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
break;

orig_rxbd = rxbd;
- orig_cleaned_cnt = cleaned_cnt;
+ orig_buffs_missing = buffs_missing;
orig_i = i;

enetc_build_xdp_buff(rx_ring, bd_status, &rxbd, &i,
- &cleaned_cnt, &xdp_buff);
+ &buffs_missing, &xdp_buff);

xdp_act = bpf_prog_run_xdp(prog, &xdp_buff);

@@ -1572,11 +1573,11 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
break;
case XDP_PASS:
rxbd = orig_rxbd;
- cleaned_cnt = orig_cleaned_cnt;
+ buffs_missing = orig_buffs_missing;
i = orig_i;

skb = enetc_build_skb(rx_ring, bd_status, &rxbd,
- &i, &cleaned_cnt,
+ &i, &buffs_missing,
ENETC_RXB_DMA_SIZE_XDP);
if (unlikely(!skb))
goto out;
@@ -1640,7 +1641,7 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
if (xdp_tx_frm_cnt)
enetc_update_tx_ring_tail(tx_ring);

- if (cleaned_cnt > rx_ring->xdp.xdp_tx_in_flight)
+ if (buffs_missing > rx_ring->xdp.xdp_tx_in_flight)
enetc_refill_rx_ring(rx_ring, enetc_bd_unused(rx_ring) -
rx_ring->xdp.xdp_tx_in_flight);

--
2.34.1


2023-02-06 10:09:40

by Vladimir Oltean

Subject: [RFC PATCH net-next 05/11] net: enetc: add support for ethtool --show-channels

In a strange twist of events, some libraries such as libbpf perform an
ETHTOOL_GCHANNELS ioctl to find out the max number of queues owned by a
device, a number which in turn is used for attaching XDP sockets to
queues.

To add compatibility with libbpf, it is therefore desirable to report
something to this ethtool callback.

According to the ethtool man page, "A channel is an IRQ and the set of
queues that can trigger that IRQ".

In enetc (embedded in NXP LS1028A, a dual core SoC, and LS1018A, a
single core SoC), the enetc_alloc_msix() function allocates a number of
MSI-X interrupt vectors equal to priv->bdr_int_num (which in turn is
equal to the number of CPUs, 2 or 1). Each interrupt vector has 1 RX
ring to process (there are more than 2 RX rings available on an ENETC
port, but the driver only uses up to 2). In addition, the up to 8 TX
rings are distributed in a round-robin manner between the up to 2
available interrupt vectors.

Therefore, even if we have more than 2 RX rings available, given the
definition above, we can only report 2 combined channels. So do that.
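
For reference, the probe that libbpf does is roughly equivalent to the
following user space snippet (a simplified sketch, not libbpf's actual
code; error handling trimmed):

#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

/* Ask the driver how many combined channels it has, the same way a
 * library would before deciding which queue to bind an XSK to.
 */
static int get_max_combined_channels(const char *ifname)
{
	struct ethtool_channels ch = { .cmd = ETHTOOL_GCHANNELS };
	struct ifreq ifr = {};
	int fd, err;

	fd = socket(AF_LOCAL, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	ifr.ifr_data = (void *)&ch;
	strncpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name) - 1);
	err = ioctl(fd, SIOCETHTOOL, &ifr);
	close(fd);

	return err ? -1 : (int)ch.max_combined;
}

With this patch, the snippet above reports 2 on LS1028A and 1 on LS1018A.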

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc_ethtool.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
index 05e2bad609c6..dda02f8d102a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_ethtool.c
@@ -863,6 +863,15 @@ static int enetc_set_link_ksettings(struct net_device *dev,
return phylink_ethtool_ksettings_set(priv->phylink, cmd);
}

+static void enetc_get_channels(struct net_device *ndev,
+ struct ethtool_channels *chan)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ chan->max_combined = priv->bdr_int_num;
+ chan->combined_count = priv->bdr_int_num;
+}
+
static const struct ethtool_ops enetc_pf_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_USECS |
ETHTOOL_COALESCE_MAX_FRAMES |
@@ -893,6 +902,7 @@ static const struct ethtool_ops enetc_pf_ethtool_ops = {
.set_wol = enetc_set_wol,
.get_pauseparam = enetc_get_pauseparam,
.set_pauseparam = enetc_set_pauseparam,
+ .get_channels = enetc_get_channels,
};

static const struct ethtool_ops enetc_vf_ethtool_ops = {
--
2.34.1


2023-02-06 10:09:43

by Vladimir Oltean

Subject: [RFC PATCH net-next 06/11] net: enetc: consolidate rx_swbd freeing

There are 2 code paths in the driver which DMA unmap and give back to
the allocator the page held by the shadow copy of an RX buffer
descriptor. One is on RX ring teardown, and the other on XDP_TX
recycling failure.

Refactor them to call the same helper function, which will make it
easier to add support for one more RX software BD type in the future
(XSK buffer).

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 28 ++++++++++----------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 37d6ad0576e5..33950c81e53c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -95,6 +95,17 @@ static void enetc_free_tx_frame(struct enetc_bdr *tx_ring,
}
}

+static void enetc_free_rx_swbd(struct enetc_bdr *rx_ring,
+ struct enetc_rx_swbd *rx_swbd)
+{
+ if (rx_swbd->page) {
+ dma_unmap_page(rx_ring->dev, rx_swbd->dma, PAGE_SIZE,
+ rx_swbd->dir);
+ __free_page(rx_swbd->page);
+ rx_swbd->page = NULL;
+ }
+}
+
/* Let H/W know BD ring has been updated */
static void enetc_update_tx_ring_tail(struct enetc_bdr *tx_ring)
{
@@ -796,9 +807,7 @@ static void enetc_recycle_xdp_tx_buff(struct enetc_bdr *tx_ring,
*/
rx_ring->stats.recycle_failures++;

- dma_unmap_page(rx_ring->dev, rx_swbd.dma, PAGE_SIZE,
- rx_swbd.dir);
- __free_page(rx_swbd.page);
+ enetc_free_rx_swbd(rx_ring, &rx_swbd);
}

rx_ring->xdp.xdp_tx_in_flight--;
@@ -1988,17 +1997,8 @@ static void enetc_free_rx_ring(struct enetc_bdr *rx_ring)
{
int i;

- for (i = 0; i < rx_ring->bd_count; i++) {
- struct enetc_rx_swbd *rx_swbd = &rx_ring->rx_swbd[i];
-
- if (!rx_swbd->page)
- continue;
-
- dma_unmap_page(rx_ring->dev, rx_swbd->dma, PAGE_SIZE,
- rx_swbd->dir);
- __free_page(rx_swbd->page);
- rx_swbd->page = NULL;
- }
+ for (i = 0; i < rx_ring->bd_count; i++)
+ enetc_free_rx_swbd(rx_ring, &rx_ring->rx_swbd[i]);
}

static void enetc_free_rxtx_rings(struct enetc_ndev_priv *priv)
--
2.34.1


2023-02-06 10:09:46

by Vladimir Oltean

Subject: [RFC PATCH net-next 07/11] net: enetc: rename enetc_free_tx_frame() to enetc_free_tx_swbd()

Create naming consistency between the free procedures for a TX and an RX
software BD.

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 33950c81e53c..fe6a3a531a0a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -77,8 +77,8 @@ static void enetc_unmap_tx_buff(struct enetc_bdr *tx_ring,
tx_swbd->dma = 0;
}

-static void enetc_free_tx_frame(struct enetc_bdr *tx_ring,
- struct enetc_tx_swbd *tx_swbd)
+static void enetc_free_tx_swbd(struct enetc_bdr *tx_ring,
+ struct enetc_tx_swbd *tx_swbd)
{
struct xdp_frame *xdp_frame = enetc_tx_swbd_get_xdp_frame(tx_swbd);
struct sk_buff *skb = enetc_tx_swbd_get_skb(tx_swbd);
@@ -331,7 +331,7 @@ static int enetc_map_tx_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb)

do {
tx_swbd = &tx_ring->tx_swbd[i];
- enetc_free_tx_frame(tx_ring, tx_swbd);
+ enetc_free_tx_swbd(tx_ring, tx_swbd);
if (i == 0)
i = tx_ring->bd_count;
i--;
@@ -580,7 +580,7 @@ static int enetc_map_tx_tso_buffs(struct enetc_bdr *tx_ring, struct sk_buff *skb
err_chained_bd:
do {
tx_swbd = &tx_ring->tx_swbd[i];
- enetc_free_tx_frame(tx_ring, tx_swbd);
+ enetc_free_tx_swbd(tx_ring, tx_swbd);
if (i == 0)
i = tx_ring->bd_count;
i--;
@@ -1986,11 +1986,8 @@ static void enetc_free_tx_ring(struct enetc_bdr *tx_ring)
{
int i;

- for (i = 0; i < tx_ring->bd_count; i++) {
- struct enetc_tx_swbd *tx_swbd = &tx_ring->tx_swbd[i];
-
- enetc_free_tx_frame(tx_ring, tx_swbd);
- }
+ for (i = 0; i < tx_ring->bd_count; i++)
+ enetc_free_tx_swbd(tx_ring, &tx_ring->tx_swbd[i]);
}

static void enetc_free_rx_ring(struct enetc_bdr *rx_ring)
--
2.34.1


2023-02-06 10:10:13

by Vladimir Oltean

Subject: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

Schedule NAPI by hand from enetc_xsk_wakeup(), and send frames from the
XSK TX queue from NAPI context. Add them to the completion queue from
the enetc_clean_tx_ring() procedure which is common for all kinds of
traffic.

We reuse one of the TX rings for XDP (XDP_TX/XDP_REDIRECT) for XSK as
well. These are already carved out of the TX rings that the network stack
can use when XDP is enabled (with or without AF_XDP).

As for XDP_REDIRECT calling enetc's ndo_xdp_xmit, I'm not sure if that
can run simultaneously with enetc_poll() (on different CPUs, but towards
the same TXQ). I guess it probably can, but I don't know what to do
about it.
The problem is that enetc_xdp_xmit() sends to
priv->xdp_tx_ring[smp_processor_id()], while enetc_xsk_xmit() and XDP_TX
send to priv->xdp_tx_ring[NAPI instance]. So when the NAPI instance runs
on a different CPU than the one it is numerically equal to, we should
have a lock that serializes XDP_REDIRECT with the others.
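
If that serialization turns out to be necessary, I imagine something as
simple as a per-ring spinlock taken by both producers would do. Purely as
a hypothetical sketch (the xdp_xmit_lock member does not exist in this
series):

/* Hypothetical: serialize ndo_xdp_xmit (XDP_REDIRECT, possibly running
 * on another CPU) with the NAPI-driven XDP_TX/XSK users of the same TX
 * ring.
 */
static bool enetc_xdp_tx_locked(struct enetc_bdr *tx_ring,
				struct enetc_tx_swbd *xdp_tx_arr,
				int num_tx_swbd)
{
	bool ok;

	spin_lock(&tx_ring->xdp_xmit_lock);
	ok = enetc_xdp_tx(tx_ring, xdp_tx_arr, num_tx_swbd);
	spin_unlock(&tx_ring->xdp_xmit_lock);

	return ok;
}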

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 102 ++++++++++++++++++-
drivers/net/ethernet/freescale/enetc/enetc.h | 2 +
2 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 3990c006c011..bc0db788afc7 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -84,7 +84,7 @@ static void enetc_free_tx_swbd(struct enetc_bdr *tx_ring,
struct xdp_frame *xdp_frame = enetc_tx_swbd_get_xdp_frame(tx_swbd);
struct sk_buff *skb = enetc_tx_swbd_get_skb(tx_swbd);

- if (tx_swbd->dma)
+ if (!tx_swbd->is_xsk && tx_swbd->dma)
enetc_unmap_tx_buff(tx_ring, tx_swbd);

if (xdp_frame) {
@@ -817,7 +817,8 @@ static void enetc_recycle_xdp_tx_buff(struct enetc_bdr *tx_ring,
rx_ring->xdp.xdp_tx_in_flight--;
}

-static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
+static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget,
+ int *xsk_confirmed)
{
int tx_frm_cnt = 0, tx_byte_cnt = 0, tx_win_drop = 0;
struct net_device *ndev = tx_ring->ndev;
@@ -854,7 +855,9 @@ static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
tx_win_drop++;
}

- if (tx_swbd->is_xdp_tx)
+ if (tx_swbd->is_xsk)
+ (*xsk_confirmed)++;
+ else if (tx_swbd->is_xdp_tx)
enetc_recycle_xdp_tx_buff(tx_ring, tx_swbd);
else if (likely(tx_swbd->dma))
enetc_unmap_tx_buff(tx_ring, tx_swbd);
@@ -1465,6 +1468,58 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
}
EXPORT_SYMBOL_GPL(enetc_xdp_xmit);

+static void enetc_xsk_map_tx_desc(struct enetc_tx_swbd *tx_swbd,
+ const struct xdp_desc *xsk_desc,
+ struct xsk_buff_pool *pool)
+{
+ dma_addr_t dma;
+
+ dma = xsk_buff_raw_get_dma(pool, xsk_desc->addr);
+ xsk_buff_raw_dma_sync_for_device(pool, dma, xsk_desc->len);
+
+ tx_swbd->dma = dma;
+ tx_swbd->len = xsk_desc->len;
+ tx_swbd->is_xsk = true;
+ tx_swbd->is_eof = true;
+}
+
+static bool enetc_xsk_xmit(struct net_device *ndev, struct xsk_buff_pool *pool,
+ u32 queue_id)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct xdp_desc *xsk_descs = pool->tx_descs;
+ struct enetc_tx_swbd tx_swbd = {0};
+ struct enetc_bdr *tx_ring;
+ u32 budget, batch;
+ int i, k;
+
+ tx_ring = priv->xdp_tx_ring[queue_id];
+
+ /* Shouldn't race with anyone because we are running in the softirq
+ * of the only CPU that sends packets to this TX ring
+ */
+ budget = min(enetc_bd_unused(tx_ring) - 1, ENETC_XSK_TX_BATCH);
+
+ batch = xsk_tx_peek_release_desc_batch(pool, budget);
+ if (!batch)
+ return true;
+
+ i = tx_ring->next_to_use;
+
+ for (k = 0; k < batch; k++) {
+ enetc_xsk_map_tx_desc(&tx_swbd, &xsk_descs[k], pool);
+ enetc_xdp_map_tx_buff(tx_ring, i, &tx_swbd, tx_swbd.len);
+ enetc_bdr_idx_inc(tx_ring, &i);
+ }
+
+ tx_ring->next_to_use = i;
+
+ xsk_tx_release(pool);
+ enetc_update_tx_ring_tail(tx_ring);
+
+ return budget != batch;
+}
+
static void enetc_map_rx_buff_to_xdp(struct enetc_bdr *rx_ring, int i,
struct xdp_buff *xdp_buff, u16 size)
{
@@ -1881,6 +1936,7 @@ static int enetc_poll(struct napi_struct *napi, int budget)
struct enetc_bdr *rx_ring = &v->rx_ring;
struct xsk_buff_pool *pool;
struct bpf_prog *prog;
+ int xsk_confirmed = 0;
bool complete = true;
int work_done;
int i;
@@ -1888,7 +1944,8 @@ static int enetc_poll(struct napi_struct *napi, int budget)
enetc_lock_mdio();

for (i = 0; i < v->count_tx_rings; i++)
- if (!enetc_clean_tx_ring(&v->tx_ring[i], budget))
+ if (!enetc_clean_tx_ring(&v->tx_ring[i], budget,
+ &xsk_confirmed))
complete = false;

prog = rx_ring->xdp.prog;
@@ -1901,6 +1958,17 @@ static int enetc_poll(struct napi_struct *napi, int budget)
else
work_done = enetc_clean_rx_ring(rx_ring, napi, budget);

+ if (pool) {
+ if (xsk_confirmed)
+ xsk_tx_completed(pool, xsk_confirmed);
+
+ if (xsk_uses_need_wakeup(pool))
+ xsk_set_tx_need_wakeup(pool);
+
+ if (!enetc_xsk_xmit(rx_ring->ndev, pool, rx_ring->index))
+ complete = false;
+ }
+
if (work_done == budget)
complete = false;
if (work_done)
@@ -3122,7 +3190,31 @@ static int enetc_setup_xsk_pool(struct net_device *ndev,

int enetc_xsk_wakeup(struct net_device *ndev, u32 queue_id, u32 flags)
{
- /* xp_assign_dev() wants this; nothing needed for RX */
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_int_vector *v;
+ struct enetc_bdr *rx_ring;
+
+ if (!netif_running(ndev) || !netif_carrier_ok(ndev))
+ return -ENETDOWN;
+
+ if (queue_id >= priv->bdr_int_num)
+ return -ERANGE;
+
+ v = priv->int_vector[queue_id];
+ rx_ring = &v->rx_ring;
+
+ if (!rx_ring->xdp.xsk_pool || !rx_ring->xdp.prog)
+ return -EINVAL;
+
+ /* No way to kick TX by triggering a hardirq right away =>
+ * raise softirq. This might schedule NAPI on a CPU different than the
+ * smp_affinity of its IRQ would suggest, but that would happen anyway
+ * if, say, we change that affinity under heavy traffic.
+ * So enetc_poll() has to be prepared for it anyway.
+ */
+ if (!napi_if_scheduled_mark_missed(&v->napi))
+ napi_schedule(&v->napi);
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index e1a746e37c9a..403f40473b52 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -36,6 +36,7 @@ struct enetc_tx_swbd {
u8 is_eof:1;
u8 is_xdp_tx:1;
u8 is_xdp_redirect:1;
+ u8 is_xsk:1;
u8 qbv_en:1;
};

@@ -86,6 +87,7 @@ struct enetc_xdp_data {
#define ENETC_RX_RING_DEFAULT_SIZE 2048
#define ENETC_TX_RING_DEFAULT_SIZE 2048
#define ENETC_DEFAULT_TX_WORK (ENETC_TX_RING_DEFAULT_SIZE / 2)
+#define ENETC_XSK_TX_BATCH ENETC_DEFAULT_TX_WORK

struct enetc_bdr_resource {
/* Input arguments saved for teardown */
--
2.34.1


2023-02-06 10:10:21

by Vladimir Oltean

Subject: [RFC PATCH net-next 08/11] net: enetc: increment rx_byte_cnt for XDP data path

v->rx_ring.stats.bytes is apparently only used for interrupt coalescing,
not for printing to ethtool -S. I am unable to find a functional problem
caused by not updating this counter, but it is updated from the
stack NAPI poll routine, so update it from the XDP one too.
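
For context, the counter feeds the DIM sample taken for adaptive RX
interrupt coalescing, roughly like this (paraphrasing the existing
enetc_rx_net_dim(), not new code):

static void enetc_rx_net_dim(struct enetc_int_vector *v)
{
	struct dim_sample dim_sample = {};

	v->comp_cnt++;

	if (!v->rx_napi_work)
		return;

	dim_update_sample(v->comp_cnt,
			  v->rx_ring.stats.packets,
			  v->rx_ring.stats.bytes,
			  &dim_sample);
	net_dim(&v->rx_dim, dim_sample);
}

So this is where a stale byte count would end up, even if I couldn't
point to a visible effect.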

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index fe6a3a531a0a..e4552acf762c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1461,7 +1461,8 @@ static void enetc_add_rx_buff_to_xdp(struct enetc_bdr *rx_ring, int i,

static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,
union enetc_rx_bd **rxbd, int *i,
- int *buffs_missing, struct xdp_buff *xdp_buff)
+ int *buffs_missing, struct xdp_buff *xdp_buff,
+ int *rx_byte_cnt)
{
u16 size = le16_to_cpu((*rxbd)->r.buf_len);

@@ -1469,6 +1470,7 @@ static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,

enetc_map_rx_buff_to_xdp(rx_ring, *i, xdp_buff, size);
(*buffs_missing)++;
+ (*rx_byte_cnt) += size;
enetc_rxbd_next(rx_ring, rxbd, i);

/* not last BD in frame? */
@@ -1483,6 +1485,7 @@ static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,

enetc_add_rx_buff_to_xdp(rx_ring, *i, size, xdp_buff);
(*buffs_missing)++;
+ (*rx_byte_cnt) += size;
enetc_rxbd_next(rx_ring, rxbd, i);
}
}
@@ -1571,7 +1574,7 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
orig_i = i;

enetc_build_xdp_buff(rx_ring, bd_status, &rxbd, &i,
- &buffs_missing, &xdp_buff);
+ &buffs_missing, &xdp_buff, &rx_byte_cnt);

xdp_act = bpf_prog_run_xdp(prog, &xdp_buff);

--
2.34.1


2023-02-06 10:10:25

by Vladimir Oltean

Subject: [RFC PATCH net-next 10/11] net: enetc: add RX support for zero-copy XDP sockets

Add support for filling an RX ring with buffers coming from an XSK umem.
Although enetc has up to 8 RX rings, we still use one of the 2 per-CPU
RX rings for XSK.

To set up an XSK pool on one of the RX queues, we use the
reconfiguration procedure which temporarily stops the rings.

Since the RX procedure in the NAPI poll function is completely different
(both the API for creating an xdp_buff, as well as refilling the ring
with memory from user space), create a separate enetc_clean_rx_ring_xsk()
function which gets called when we have both an XSK pool and an XDP
program on this RX queue.
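
For testing, the user space side which ends up exercising the new
XDP_SETUP_XSK_POOL command looks roughly like this with the xsk helpers
from libbpf/libxdp (a simplified sketch, not the exact xdpsock code;
error handling trimmed):

#include <net/if.h>
#include <sys/mman.h>
#include <xdp/xsk.h>	/* <bpf/xsk.h> with older libbpf */

#define NUM_FRAMES	4096
#define FRAME_SIZE	XSK_UMEM__DEFAULT_FRAME_SIZE

/* Bind an AF_XDP socket to one RX queue of the given interface. The
 * xsk_socket__create() call is what reaches enetc_setup_bpf() with the
 * XDP_SETUP_XSK_POOL command and triggers the ring reconfiguration.
 */
static int attach_xsk(const char *ifname, int queue_id)
{
	struct xsk_ring_prod fill, tx;
	struct xsk_ring_cons comp, rx;
	size_t size = (size_t)NUM_FRAMES * FRAME_SIZE;
	struct xsk_socket *xsk;
	struct xsk_umem *umem;
	void *bufs;
	int err;

	bufs = mmap(NULL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (bufs == MAP_FAILED)
		return -1;

	err = xsk_umem__create(&umem, bufs, size, &fill, &comp, NULL);
	if (err)
		return err;

	return xsk_socket__create(&xsk, ifname, queue_id, umem,
				  &rx, &tx, NULL);
}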

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 377 +++++++++++++++++-
drivers/net/ethernet/freescale/enetc/enetc.h | 3 +
.../net/ethernet/freescale/enetc/enetc_pf.c | 1 +
3 files changed, 373 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index dee432cacf85..3990c006c011 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -10,6 +10,7 @@
#include <net/ip6_checksum.h>
#include <net/pkt_sched.h>
#include <net/tso.h>
+#include <net/xdp_sock_drv.h>

u32 enetc_port_mac_rd(struct enetc_si *si, u32 reg)
{
@@ -103,6 +104,9 @@ static void enetc_free_rx_swbd(struct enetc_bdr *rx_ring,
rx_swbd->dir);
__free_page(rx_swbd->page);
rx_swbd->page = NULL;
+ } else if (rx_swbd->xsk_buff) {
+ xsk_buff_free(rx_swbd->xsk_buff);
+ rx_swbd->xsk_buff = NULL;
}
}

@@ -979,6 +983,44 @@ static int enetc_refill_rx_ring(struct enetc_bdr *rx_ring, const int buff_cnt)
return j;
}

+static int enetc_refill_rx_ring_xsk(struct enetc_bdr *rx_ring, int buff_cnt)
+{
+ struct xsk_buff_pool *pool = rx_ring->xdp.xsk_pool;
+ struct enetc_rx_swbd *rx_swbd;
+ struct xdp_buff *xsk_buff;
+ union enetc_rx_bd *rxbd;
+ int i, j;
+
+ i = rx_ring->next_to_use;
+ rxbd = enetc_rxbd(rx_ring, i);
+
+ for (j = 0; j < buff_cnt; j++) {
+ xsk_buff = xsk_buff_alloc(pool); // TODO use _batch?
+ if (!xsk_buff)
+ break;
+
+ rx_swbd = &rx_ring->rx_swbd[i];
+ rx_swbd->xsk_buff = xsk_buff;
+ rx_swbd->dma = xsk_buff_xdp_get_dma(xsk_buff);
+
+ /* update RxBD */
+ rxbd->w.addr = cpu_to_le64(rx_swbd->dma);
+ /* clear 'R" as well */
+ rxbd->r.lstatus = 0;
+
+ enetc_rxbd_next(rx_ring, &rxbd, &i);
+ }
+
+ if (likely(j)) {
+ rx_ring->next_to_use = i;
+
+ /* update ENETC's consumer index */
+ enetc_wr_reg_hot(rx_ring->rcir, rx_ring->next_to_use);
+ }
+
+ return j;
+}
+
#ifdef CONFIG_FSL_ENETC_PTP_CLOCK
static void enetc_get_rx_tstamp(struct net_device *ndev,
union enetc_rx_bd *rxbd,
@@ -1128,6 +1170,18 @@ static void enetc_add_rx_buff_to_skb(struct enetc_bdr *rx_ring, int i,
enetc_flip_rx_buff(rx_ring, rx_swbd);
}

+static void enetc_put_rx_swbd(struct enetc_bdr *rx_ring, int i)
+{
+ struct enetc_rx_swbd *rx_swbd = &rx_ring->rx_swbd[i];
+
+ if (rx_swbd->xsk_buff) {
+ xsk_buff_free(rx_swbd->xsk_buff);
+ rx_swbd->xsk_buff = NULL;
+ } else {
+ enetc_put_rx_buff(rx_ring, rx_swbd);
+ }
+}
+
static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
u32 bd_status,
union enetc_rx_bd **rxbd, int *i,
@@ -1136,7 +1190,7 @@ static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
if (likely(!(bd_status & ENETC_RXBD_LSTATUS(ENETC_RXBD_ERR_MASK))))
return false;

- enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
+ enetc_put_rx_swbd(rx_ring, *i);
(*buffs_missing)++;
enetc_rxbd_next(rx_ring, rxbd, i);

@@ -1144,7 +1198,7 @@ static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
dma_rmb();
bd_status = le32_to_cpu((*rxbd)->r.lstatus);

- enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
+ enetc_put_rx_swbd(rx_ring, *i);
(*buffs_missing)++;
enetc_rxbd_next(rx_ring, rxbd, i);
}
@@ -1484,6 +1538,43 @@ static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,
}
}

+static struct xdp_buff *enetc_build_xsk_buff(struct xsk_buff_pool *pool,
+ struct enetc_bdr *rx_ring,
+ u32 bd_status,
+ union enetc_rx_bd **rxbd, int *i,
+ int *buffs_missing, int *rx_byte_cnt)
+{
+ struct enetc_rx_swbd *rx_swbd = &rx_ring->rx_swbd[*i];
+ u16 size = le16_to_cpu((*rxbd)->r.buf_len);
+ struct xdp_buff *xsk_buff;
+
+ /* Multi-buffer frames are not supported in XSK mode */
+ if (unlikely(!(bd_status & ENETC_RXBD_LSTATUS_F))) {
+ while (!(bd_status & ENETC_RXBD_LSTATUS_F)) {
+ enetc_put_rx_swbd(rx_ring, *i);
+
+ (*buffs_missing)++;
+ enetc_rxbd_next(rx_ring, rxbd, i);
+ dma_rmb();
+ bd_status = le32_to_cpu((*rxbd)->r.lstatus);
+ }
+
+ return NULL;
+ }
+
+ xsk_buff = rx_swbd->xsk_buff;
+ xsk_buff_set_size(xsk_buff, size);
+ xsk_buff_dma_sync_for_cpu(xsk_buff, pool);
+
+ rx_swbd->xsk_buff = NULL;
+
+ (*buffs_missing)++;
+ (*rx_byte_cnt) += size;
+ enetc_rxbd_next(rx_ring, rxbd, i);
+
+ return xsk_buff;
+}
+
/* Convert RX buffer descriptors to TX buffer descriptors. These will be
* recycled back into the RX ring in enetc_clean_tx_ring.
*/
@@ -1659,11 +1750,136 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
return rx_frm_cnt;
}

+static void enetc_xsk_buff_to_skb(struct xdp_buff *xsk_buff,
+ struct enetc_bdr *rx_ring,
+ union enetc_rx_bd *rxbd,
+ struct napi_struct *napi)
+{
+ size_t len = xdp_get_buff_len(xsk_buff);
+ struct sk_buff *skb;
+
+ skb = napi_alloc_skb(napi, len);
+ if (unlikely(!skb)) {
+ rx_ring->stats.rx_alloc_errs++;
+ goto out;
+ }
+
+ skb_put_data(skb, xsk_buff->data, len);
+
+ enetc_get_offloads(rx_ring, rxbd, skb);
+
+ skb_record_rx_queue(skb, rx_ring->index);
+ skb->protocol = eth_type_trans(skb, rx_ring->ndev);
+
+ rx_ring->stats.packets += skb->len;
+ rx_ring->stats.bytes++;
+
+ napi_gro_receive(napi, skb);
+out:
+ xsk_buff_free(xsk_buff);
+}
+
+static int enetc_clean_rx_ring_xsk(struct enetc_bdr *rx_ring,
+ struct napi_struct *napi, int work_limit,
+ struct bpf_prog *prog,
+ struct xsk_buff_pool *pool)
+{
+ struct net_device *ndev = rx_ring->ndev;
+ union enetc_rx_bd *rxbd, *orig_rxbd;
+ int rx_frm_cnt = 0, rx_byte_cnt = 0;
+ int xdp_redirect_frm_cnt = 0;
+ struct xdp_buff *xsk_buff;
+ int buffs_missing, err, i;
+ bool wakeup_xsk = false;
+ u32 bd_status, xdp_act;
+
+ buffs_missing = enetc_bd_unused(rx_ring);
+ /* next descriptor to process */
+ i = rx_ring->next_to_clean;
+
+ while (likely(rx_frm_cnt < work_limit)) {
+ if (buffs_missing >= ENETC_RXBD_BUNDLE) {
+ buffs_missing -= enetc_refill_rx_ring_xsk(rx_ring,
+ buffs_missing);
+ wakeup_xsk |= (buffs_missing != 0);
+ }
+
+ rxbd = enetc_rxbd(rx_ring, i);
+ bd_status = le32_to_cpu(rxbd->r.lstatus);
+ if (!bd_status)
+ break;
+
+ enetc_wr_reg_hot(rx_ring->idr, BIT(rx_ring->index));
+ dma_rmb(); /* for reading other rxbd fields */
+
+ if (enetc_check_bd_errors_and_consume(rx_ring, bd_status,
+ &rxbd, &i,
+ &buffs_missing))
+ continue;
+
+ orig_rxbd = rxbd;
+
+ xsk_buff = enetc_build_xsk_buff(pool, rx_ring, bd_status,
+ &rxbd, &i, &buffs_missing,
+ &rx_byte_cnt);
+ if (!xsk_buff)
+ continue;
+
+ xdp_act = bpf_prog_run_xdp(prog, xsk_buff);
+ switch (xdp_act) {
+ default:
+ bpf_warn_invalid_xdp_action(ndev, prog, xdp_act);
+ fallthrough;
+ case XDP_ABORTED:
+ trace_xdp_exception(ndev, prog, xdp_act);
+ fallthrough;
+ case XDP_DROP:
+ xsk_buff_free(xsk_buff);
+ break;
+ case XDP_PASS:
+ enetc_xsk_buff_to_skb(xsk_buff, rx_ring, orig_rxbd,
+ napi);
+ break;
+ case XDP_REDIRECT:
+ err = xdp_do_redirect(ndev, xsk_buff, prog);
+ if (unlikely(err)) {
+ if (err == -ENOBUFS)
+ wakeup_xsk = true;
+ xsk_buff_free(xsk_buff);
+ rx_ring->stats.xdp_redirect_failures++;
+ } else {
+ xdp_redirect_frm_cnt++;
+ rx_ring->stats.xdp_redirect++;
+ }
+ }
+
+ rx_frm_cnt++;
+ }
+
+ rx_ring->next_to_clean = i;
+
+ rx_ring->stats.packets += rx_frm_cnt;
+ rx_ring->stats.bytes += rx_byte_cnt;
+
+ if (xdp_redirect_frm_cnt)
+ xdp_do_flush_map();
+
+ if (xsk_uses_need_wakeup(pool)) {
+ if (wakeup_xsk)
+ xsk_set_rx_need_wakeup(pool);
+ else
+ xsk_clear_rx_need_wakeup(pool);
+ }
+
+ return rx_frm_cnt;
+}
+
static int enetc_poll(struct napi_struct *napi, int budget)
{
struct enetc_int_vector
*v = container_of(napi, struct enetc_int_vector, napi);
struct enetc_bdr *rx_ring = &v->rx_ring;
+ struct xsk_buff_pool *pool;
struct bpf_prog *prog;
bool complete = true;
int work_done;
@@ -1676,10 +1892,15 @@ static int enetc_poll(struct napi_struct *napi, int budget)
complete = false;

prog = rx_ring->xdp.prog;
- if (prog)
+ pool = rx_ring->xdp.xsk_pool;
+ if (prog && pool)
+ work_done = enetc_clean_rx_ring_xsk(rx_ring, napi, budget, prog,
+ pool);
+ else if (prog)
work_done = enetc_clean_rx_ring_xdp(rx_ring, napi, budget, prog);
else
work_done = enetc_clean_rx_ring(rx_ring, napi, budget);
+
if (work_done == budget)
complete = false;
if (work_done)
@@ -2168,7 +2389,16 @@ static void enetc_setup_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring,
rx_ring->next_to_alloc = 0;

enetc_lock_mdio();
- enetc_refill_rx_ring(rx_ring, enetc_bd_unused(rx_ring));
+ /* The XSK buffer pool and the BPF program are set up through different
+ * syscalls. From the moment the pool has been DMA mapped and until the
+ * XDP program is attached, we still need to use normal RX buffers,
+ * because we still use the normal NAPI poll routine. Only use buffers
+ * from the XSK pool when both conditions are fulfilled.
+ */
+ if (rx_ring->xdp.prog && rx_ring->xdp.xsk_pool)
+ enetc_refill_rx_ring_xsk(rx_ring, enetc_bd_unused(rx_ring));
+ else
+ enetc_refill_rx_ring(rx_ring, enetc_bd_unused(rx_ring));
enetc_unlock_mdio();

enetc_rxbdr_wr(hw, idx, ENETC_RBMR, rbmr);
@@ -2454,18 +2684,27 @@ static int enetc_xdp_rxq_mem_model_register(struct enetc_ndev_priv *priv,
int rxq)
{
struct enetc_bdr *rx_ring = priv->rx_ring[rxq];
+ struct xsk_buff_pool *pool;
+ enum xdp_mem_type type;
int err;

err = xdp_rxq_info_reg(&rx_ring->xdp.rxq, priv->ndev, rxq, 0);
if (err)
return err;

- err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp.rxq,
- MEM_TYPE_PAGE_SHARED, NULL);
- if (err)
+ pool = rx_ring->xdp.xsk_pool;
+ type = !!pool ? MEM_TYPE_XSK_BUFF_POOL : MEM_TYPE_PAGE_SHARED;
+
+ err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp.rxq, type, NULL);
+ if (err) {
xdp_rxq_info_unreg(&rx_ring->xdp.rxq);
+ return err;
+ }

- return err;
+ if (pool)
+ xsk_pool_set_rxq_info(pool, &rx_ring->xdp.rxq);
+
+ return 0;
}

static void enetc_xdp_rxq_mem_model_unregister(struct enetc_ndev_priv *priv,
@@ -2768,6 +3007,125 @@ static int enetc_reconfigure_xdp_cb(struct enetc_ndev_priv *priv, void *ctx)
return err;
}

+struct enetc_xsk_reconfig_ctx {
+ struct enetc_bdr *rx_ring;
+ struct xsk_buff_pool *pool;
+};
+
+static int enetc_enable_xsk_cb(struct enetc_ndev_priv *priv, void *ctx)
+{
+ struct enetc_xsk_reconfig_ctx *data = ctx;
+ struct enetc_bdr *rx_ring = data->rx_ring;
+ struct xsk_buff_pool *pool = data->pool;
+ int err;
+
+ err = xsk_pool_dma_map(pool, priv->dev, 0);
+ if (err)
+ return err;
+
+ rx_ring->xdp.xsk_pool = pool;
+
+ return 0;
+}
+
+static int enetc_disable_xsk_cb(struct enetc_ndev_priv *priv, void *ctx)
+{
+ struct enetc_xsk_reconfig_ctx *data = ctx;
+ struct enetc_bdr *rx_ring = data->rx_ring;
+ struct xsk_buff_pool *pool = data->pool;
+
+ rx_ring->xdp.xsk_pool = NULL;
+ xsk_pool_dma_unmap(pool, 0);
+
+ return 0;
+}
+
+static int enetc_enable_xsk_pool(struct net_device *ndev,
+ struct xsk_buff_pool *pool, u16 queue_id)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_xsk_reconfig_ctx ctx;
+ struct enetc_int_vector *v;
+ struct enetc_bdr *rx_ring;
+ bool extended;
+
+ if (queue_id >= priv->bdr_int_num) {
+ netdev_err(ndev, "QID %d exceeds the %d channels available\n",
+ queue_id, priv->bdr_int_num);
+ return -ERANGE;
+ }
+
+ v = priv->int_vector[queue_id];
+ rx_ring = &v->rx_ring;
+ if (rx_ring->xdp.xsk_pool) {
+ netdev_err(ndev, "QID %d already has an XSK pool attached\n",
+ rx_ring->index);
+ return -EBUSY;
+ }
+
+ /* Ensure enetc_setup_xdp_prog() won't be called before
+ * enetc_setup_xsk_pool(), because enetc_xdp_rxq_mem_model_register()
+ * depends on call ordering.
+ */
+ if (rx_ring->xdp.prog) {
+ netdev_err(ndev,
+ "Cannot use XSK if there is an XDP program already attached\n");
+ return -EINVAL;
+ }
+
+ extended = !!(priv->active_offloads & ENETC_F_RX_TSTAMP);
+ ctx.rx_ring = rx_ring;
+ ctx.pool = pool;
+
+ return enetc_reconfigure(priv, extended, enetc_enable_xsk_cb, &ctx);
+}
+
+static int enetc_disable_xsk_pool(struct net_device *ndev, u16 queue_id)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_xsk_reconfig_ctx ctx;
+ struct enetc_int_vector *v;
+ struct xsk_buff_pool *pool;
+ struct enetc_bdr *rx_ring;
+ bool extended;
+
+ if (queue_id >= priv->bdr_int_num) {
+ netdev_err(ndev, "QID %d exceeds the %d channels available\n",
+ queue_id, priv->bdr_int_num);
+ return -ERANGE;
+ }
+
+ v = priv->int_vector[queue_id];
+ rx_ring = &v->rx_ring;
+
+ pool = rx_ring->xdp.xsk_pool;
+ if (!pool) {
+ netdev_err(ndev, "QID %d does not have an XSK pool attached\n",
+ rx_ring->index);
+ return -ENOENT;
+ }
+
+ extended = !!(priv->active_offloads & ENETC_F_RX_TSTAMP);
+ ctx.rx_ring = rx_ring;
+ ctx.pool = pool;
+
+ return enetc_reconfigure(priv, extended, enetc_disable_xsk_cb, &ctx);
+}
+
+static int enetc_setup_xsk_pool(struct net_device *ndev,
+ struct xsk_buff_pool *pool,
+ u16 queue_id)
+{
+ return pool ? enetc_enable_xsk_pool(ndev, pool, queue_id) :
+ enetc_disable_xsk_pool(ndev, queue_id);
+}
+
+int enetc_xsk_wakeup(struct net_device *ndev, u32 queue_id, u32 flags)
+{
+ /* xp_assign_dev() wants this; nothing needed for RX */
+ return 0;
+}
+
static int enetc_setup_xdp_prog(struct net_device *ndev, struct bpf_prog *prog,
struct netlink_ext_ack *extack)
{
@@ -2798,6 +3156,9 @@ int enetc_setup_bpf(struct net_device *ndev, struct netdev_bpf *bpf)
switch (bpf->command) {
case XDP_SETUP_PROG:
return enetc_setup_xdp_prog(ndev, bpf->prog, bpf->extack);
+ case XDP_SETUP_XSK_POOL:
+ return enetc_setup_xsk_pool(ndev, bpf->xsk.pool,
+ bpf->xsk.queue_id);
default:
return -EINVAL;
}
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 704aa1f9dfa3..e1a746e37c9a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -48,6 +48,7 @@ struct enetc_tx_swbd {
(SKB_WITH_OVERHEAD(ENETC_RXB_TRUESIZE) - XDP_PACKET_HEADROOM)

struct enetc_rx_swbd {
+ struct xdp_buff *xsk_buff;
struct page *page;
dma_addr_t dma;
enum dma_data_direction dir;
@@ -77,6 +78,7 @@ struct enetc_ring_stats {

struct enetc_xdp_data {
struct xdp_rxq_info rxq;
+ struct xsk_buff_pool *xsk_pool;
struct bpf_prog *prog;
int xdp_tx_in_flight;
};
@@ -424,6 +426,7 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data);
int enetc_setup_bpf(struct net_device *ndev, struct netdev_bpf *bpf);
int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
struct xdp_frame **frames, u32 flags);
+int enetc_xsk_wakeup(struct net_device *dev, u32 queue, u32 flags);

/* ethtool */
void enetc_set_ethtool_ops(struct net_device *ndev);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 7facc7d5261e..1a95d213683b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -773,6 +773,7 @@ static const struct net_device_ops enetc_ndev_ops = {
.ndo_setup_tc = enetc_pf_setup_tc,
.ndo_bpf = enetc_setup_bpf,
.ndo_xdp_xmit = enetc_xdp_xmit,
+ .ndo_xsk_wakeup = enetc_xsk_wakeup,
};

static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
--
2.34.1


2023-02-06 10:10:29

by Vladimir Oltean

Subject: [RFC PATCH net-next 09/11] net: enetc: move setting of ENETC_TXBD_FLAGS_F flag to enetc_xdp_map_tx_buff()

XSK transmission will reuse this procedure, and will also need the logic
for setting the "final" bit of a TX BD, based on tx_swbd->is_eof.
I'm not sure why this was left to be done by the caller of
enetc_xdp_map_tx_buff(), so move it inside.

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index e4552acf762c..dee432cacf85 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1263,6 +1263,10 @@ static void enetc_xdp_map_tx_buff(struct enetc_bdr *tx_ring, int i,
txbd->buf_len = cpu_to_le16(tx_swbd->len);
txbd->frm_len = cpu_to_le16(frm_len);

+ /* last BD needs 'F' bit set */
+ if (tx_swbd->is_eof)
+ txbd->flags = ENETC_TXBD_FLAGS_F;
+
memcpy(&tx_ring->tx_swbd[i], tx_swbd, sizeof(*tx_swbd));
}

@@ -1286,17 +1290,7 @@ static bool enetc_xdp_tx(struct enetc_bdr *tx_ring,
i = tx_ring->next_to_use;

for (k = 0; k < num_tx_swbd; k++) {
- struct enetc_tx_swbd *xdp_tx_swbd = &xdp_tx_arr[k];
-
- enetc_xdp_map_tx_buff(tx_ring, i, xdp_tx_swbd, frm_len);
-
- /* last BD needs 'F' bit set */
- if (xdp_tx_swbd->is_eof) {
- union enetc_tx_bd *txbd = ENETC_TXBD(*tx_ring, i);
-
- txbd->flags = ENETC_TXBD_FLAGS_F;
- }
-
+ enetc_xdp_map_tx_buff(tx_ring, i, &xdp_tx_arr[k], frm_len);
enetc_bdr_idx_inc(tx_ring, &i);
}

--
2.34.1


2023-02-06 10:19:48

by Vladimir Oltean

Subject: Re: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

Yikes, ran git format-patch one patch too soon. There's also a hidden,
bonus patch "12/11" below. It doesn't affect the first 11 patches in any
way, though. Here it is.

From f7f10232622309d66a7a1ae1932d0c081936d546 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <[email protected]>
Date: Tue, 25 Oct 2022 02:32:24 +0300
Subject: [RFC PATCH net-next 12/12] net: enetc: add support for XDP_TX
with zero-copy XDP sockets

Add support for the case when the BPF program attached to a ring with an
XSK pool returns the XDP_TX verdict. The frame needs to go back on the
interface it came from.

No dma_map or dma_sync_for_device is necessary, just some small
impedance matching logic to adapt to the XDP_TX procedure we have in
place for non-XSK, since the data structures are different (xdp_buff vs
xdp_frame; we cannot have multi-buffer frames with XSK).

In the TX confirmation routine, just release the RX buffer (as opposed
to non-XSK XDP_TX). Recycling might be possible, but I haven't
experimented with it.

Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 35 ++++++++++++++++++--
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index bc0db788afc7..06aaf0334dc3 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -855,7 +855,9 @@ static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget,
tx_win_drop++;
}

- if (tx_swbd->is_xsk)
+ if (tx_swbd->is_xsk && tx_swbd->is_xdp_tx)
+ xsk_buff_free(tx_swbd->xsk_buff);
+ else if (tx_swbd->is_xsk)
(*xsk_confirmed)++;
else if (tx_swbd->is_xdp_tx)
enetc_recycle_xdp_tx_buff(tx_ring, tx_swbd);
@@ -1661,6 +1663,21 @@ static int enetc_rx_swbd_to_xdp_tx_swbd(struct enetc_tx_swbd *xdp_tx_arr,
return n;
}

+static bool enetc_xsk_xdp_tx(struct enetc_bdr *tx_ring,
+ struct xdp_buff *xsk_buff)
+{
+ struct enetc_tx_swbd tx_swbd = {
+ .dma = xsk_buff_xdp_get_dma(xsk_buff),
+ .len = xdp_get_buff_len(xsk_buff),
+ .is_xdp_tx = true,
+ .is_xsk = true,
+ .is_eof = true,
+ .xsk_buff = xsk_buff,
+ };
+
+ return enetc_xdp_tx(tx_ring, &tx_swbd, 1);
+}
+
static void enetc_xdp_drop(struct enetc_bdr *rx_ring, int rx_ring_first,
int rx_ring_last)
{
@@ -1839,10 +1856,12 @@ static int enetc_clean_rx_ring_xsk(struct enetc_bdr *rx_ring,
struct bpf_prog *prog,
struct xsk_buff_pool *pool)
{
+ struct enetc_ndev_priv *priv = netdev_priv(rx_ring->ndev);
+ struct enetc_bdr *tx_ring = priv->xdp_tx_ring[rx_ring->index];
+ int xdp_redirect_frm_cnt = 0, xdp_tx_frm_cnt = 0;
struct net_device *ndev = rx_ring->ndev;
union enetc_rx_bd *rxbd, *orig_rxbd;
int rx_frm_cnt = 0, rx_byte_cnt = 0;
- int xdp_redirect_frm_cnt = 0;
struct xdp_buff *xsk_buff;
int buffs_missing, err, i;
bool wakeup_xsk = false;
@@ -1895,6 +1914,15 @@ static int enetc_clean_rx_ring_xsk(struct enetc_bdr *rx_ring,
enetc_xsk_buff_to_skb(xsk_buff, rx_ring, orig_rxbd,
napi);
break;
+ case XDP_TX:
+ if (enetc_xsk_xdp_tx(tx_ring, xsk_buff)) {
+ xdp_tx_frm_cnt++;
+ tx_ring->stats.xdp_tx++;
+ } else {
+ xsk_buff_free(xsk_buff);
+ tx_ring->stats.xdp_tx_drops++;
+ }
+ break;
case XDP_REDIRECT:
err = xdp_do_redirect(ndev, xsk_buff, prog);
if (unlikely(err)) {
@@ -1919,6 +1947,9 @@ static int enetc_clean_rx_ring_xsk(struct enetc_bdr *rx_ring,
if (xdp_redirect_frm_cnt)
xdp_do_flush_map();

+ if (xdp_tx_frm_cnt)
+ enetc_update_tx_ring_tail(tx_ring);
+
if (xsk_uses_need_wakeup(pool)) {
if (wakeup_xsk)
xsk_set_rx_need_wakeup(pool);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 403f40473b52..58df6c32cfb3 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -24,6 +24,7 @@ struct enetc_tx_swbd {
union {
struct sk_buff *skb;
struct xdp_frame *xdp_frame;
+ struct xdp_buff *xsk_buff;
};
dma_addr_t dma;
struct page *page; /* valid only if is_xdp_tx */
--
2.34.1


2023-02-08 16:37:03

by Maciej Fijalkowski

Subject: Re: [RFC PATCH net-next 10/11] net: enetc: add RX support for zero-copy XDP sockets

On Mon, Feb 06, 2023 at 12:08:36PM +0200, Vladimir Oltean wrote:
> Add support for filling an RX ring with buffers coming from an XSK umem.
> Although enetc has up to 8 RX rings, we still use one of the 2 per-CPU
> RX rings for XSK.
>
> To set up an XSK pool on one of the RX queues, we use the
> reconfiguration procedure which temporarily stops the rings.
>
> Since the RX procedure in the NAPI poll function is completely different
> (both the API for creating an xdp_buff, as well as refilling the ring
> with memory from user space), create a separate enetc_clean_rx_ring_xsk()
> function which gets called when we have both an XSK pool and an XDP
> program on this RX queue.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---
> drivers/net/ethernet/freescale/enetc/enetc.c | 377 +++++++++++++++++-
> drivers/net/ethernet/freescale/enetc/enetc.h | 3 +
> .../net/ethernet/freescale/enetc/enetc_pf.c | 1 +
> 3 files changed, 373 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index dee432cacf85..3990c006c011 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -10,6 +10,7 @@
> #include <net/ip6_checksum.h>
> #include <net/pkt_sched.h>
> #include <net/tso.h>
> +#include <net/xdp_sock_drv.h>
>
> u32 enetc_port_mac_rd(struct enetc_si *si, u32 reg)
> {
> @@ -103,6 +104,9 @@ static void enetc_free_rx_swbd(struct enetc_bdr *rx_ring,
> rx_swbd->dir);
> __free_page(rx_swbd->page);
> rx_swbd->page = NULL;
> + } else if (rx_swbd->xsk_buff) {
> + xsk_buff_free(rx_swbd->xsk_buff);
> + rx_swbd->xsk_buff = NULL;
> }
> }
>
> @@ -979,6 +983,44 @@ static int enetc_refill_rx_ring(struct enetc_bdr *rx_ring, const int buff_cnt)
> return j;
> }
>
> +static int enetc_refill_rx_ring_xsk(struct enetc_bdr *rx_ring, int buff_cnt)
> +{
> + struct xsk_buff_pool *pool = rx_ring->xdp.xsk_pool;
> + struct enetc_rx_swbd *rx_swbd;
> + struct xdp_buff *xsk_buff;
> + union enetc_rx_bd *rxbd;
> + int i, j;
> +
> + i = rx_ring->next_to_use;
> + rxbd = enetc_rxbd(rx_ring, i);
> +
> + for (j = 0; j < buff_cnt; j++) {
> + xsk_buff = xsk_buff_alloc(pool); // TODO use _batch?

yes, use batch:P
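
Something along these lines, I suppose (untested sketch based on
xsk_buff_alloc_batch() from net/xdp_sock_drv.h):

	struct xdp_buff *xsk_buffs[ENETC_RXBD_BUNDLE];
	u32 alloced, k;

	/* cap the batch at ENETC_RXBD_BUNDLE; any remainder is picked up
	 * on the next refill pass
	 */
	alloced = xsk_buff_alloc_batch(pool, xsk_buffs,
				       min_t(u32, buff_cnt,
					     ENETC_RXBD_BUNDLE));
	for (k = 0; k < alloced; k++) {
		rx_swbd = &rx_ring->rx_swbd[i];
		rx_swbd->xsk_buff = xsk_buffs[k];
		rx_swbd->dma = xsk_buff_xdp_get_dma(xsk_buffs[k]);

		/* update RxBD */
		rxbd->w.addr = cpu_to_le64(rx_swbd->dma);
		rxbd->r.lstatus = 0;

		enetc_rxbd_next(rx_ring, &rxbd, &i);
	}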

> + if (!xsk_buff)
> + break;
> +
> + rx_swbd = &rx_ring->rx_swbd[i];
> + rx_swbd->xsk_buff = xsk_buff;
> + rx_swbd->dma = xsk_buff_xdp_get_dma(xsk_buff);
> +
> + /* update RxBD */
> + rxbd->w.addr = cpu_to_le64(rx_swbd->dma);
> + /* clear 'R" as well */
> + rxbd->r.lstatus = 0;
> +
> + enetc_rxbd_next(rx_ring, &rxbd, &i);
> + }
> +
> + if (likely(j)) {
> + rx_ring->next_to_use = i;
> +
> + /* update ENETC's consumer index */
> + enetc_wr_reg_hot(rx_ring->rcir, rx_ring->next_to_use);
> + }
> +
> + return j;
> +}
> +
> #ifdef CONFIG_FSL_ENETC_PTP_CLOCK
> static void enetc_get_rx_tstamp(struct net_device *ndev,
> union enetc_rx_bd *rxbd,
> @@ -1128,6 +1170,18 @@ static void enetc_add_rx_buff_to_skb(struct enetc_bdr *rx_ring, int i,
> enetc_flip_rx_buff(rx_ring, rx_swbd);
> }
>
> +static void enetc_put_rx_swbd(struct enetc_bdr *rx_ring, int i)
> +{
> + struct enetc_rx_swbd *rx_swbd = &rx_ring->rx_swbd[i];
> +
> + if (rx_swbd->xsk_buff) {
> + xsk_buff_free(rx_swbd->xsk_buff);
> + rx_swbd->xsk_buff = NULL;
> + } else {
> + enetc_put_rx_buff(rx_ring, rx_swbd);
> + }
> +}
> +
> static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
> u32 bd_status,
> union enetc_rx_bd **rxbd, int *i,
> @@ -1136,7 +1190,7 @@ static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
> if (likely(!(bd_status & ENETC_RXBD_LSTATUS(ENETC_RXBD_ERR_MASK))))
> return false;
>
> - enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
> + enetc_put_rx_swbd(rx_ring, *i);
> (*buffs_missing)++;
> enetc_rxbd_next(rx_ring, rxbd, i);
>
> @@ -1144,7 +1198,7 @@ static bool enetc_check_bd_errors_and_consume(struct enetc_bdr *rx_ring,
> dma_rmb();
> bd_status = le32_to_cpu((*rxbd)->r.lstatus);
>
> - enetc_put_rx_buff(rx_ring, &rx_ring->rx_swbd[*i]);
> + enetc_put_rx_swbd(rx_ring, *i);
> (*buffs_missing)++;
> enetc_rxbd_next(rx_ring, rxbd, i);
> }
> @@ -1484,6 +1538,43 @@ static void enetc_build_xdp_buff(struct enetc_bdr *rx_ring, u32 bd_status,
> }
> }
>
> +static struct xdp_buff *enetc_build_xsk_buff(struct xsk_buff_pool *pool,
> + struct enetc_bdr *rx_ring,
> + u32 bd_status,
> + union enetc_rx_bd **rxbd, int *i,
> + int *buffs_missing, int *rx_byte_cnt)
> +{
> + struct enetc_rx_swbd *rx_swbd = &rx_ring->rx_swbd[*i];
> + u16 size = le16_to_cpu((*rxbd)->r.buf_len);
> + struct xdp_buff *xsk_buff;
> +
> + /* Multi-buffer frames are not supported in XSK mode */

Nice! I realized we need to forbid that on ice now.
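
One hypothetical way to forbid it up front, instead of dropping multi-buffer
frames at runtime, is to refuse an XSK pool whose RX frame size cannot hold a
full MTU frame, checked for example at pool bind time. A rough sketch (the
helper name is made up and not part of this series):

static int enetc_xsk_check_frame_size(struct net_device *ndev,
				      struct xsk_buff_pool *pool)
{
	int max_frame = ndev->mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN;

	/* single-buffer only: the whole frame must fit in one XSK buffer */
	if (max_frame > xsk_pool_get_rx_frame_size(pool))
		return -EINVAL;

	return 0;
}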

> + if (unlikely(!(bd_status & ENETC_RXBD_LSTATUS_F))) {
> + while (!(bd_status & ENETC_RXBD_LSTATUS_F)) {
> + enetc_put_rx_swbd(rx_ring, *i);
> +
> + (*buffs_missing)++;
> + enetc_rxbd_next(rx_ring, rxbd, i);
> + dma_rmb();
> + bd_status = le32_to_cpu((*rxbd)->r.lstatus);
> + }
> +
> + return NULL;
> + }
> +
> + xsk_buff = rx_swbd->xsk_buff;
> + xsk_buff_set_size(xsk_buff, size);
> + xsk_buff_dma_sync_for_cpu(xsk_buff, pool);
> +
> + rx_swbd->xsk_buff = NULL;
> +
> + (*buffs_missing)++;
> + (*rx_byte_cnt) += size;
> + enetc_rxbd_next(rx_ring, rxbd, i);
> +
> + return xsk_buff;
> +}
> +
> /* Convert RX buffer descriptors to TX buffer descriptors. These will be
> * recycled back into the RX ring in enetc_clean_tx_ring.
> */
> @@ -1659,11 +1750,136 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
> return rx_frm_cnt;
> }
>
> +static void enetc_xsk_buff_to_skb(struct xdp_buff *xsk_buff,
> + struct enetc_bdr *rx_ring,
> + union enetc_rx_bd *rxbd,
> + struct napi_struct *napi)
> +{
> + size_t len = xdp_get_buff_len(xsk_buff);
> + struct sk_buff *skb;
> +
> + skb = napi_alloc_skb(napi, len);
> + if (unlikely(!skb)) {
> + rx_ring->stats.rx_alloc_errs++;
> + goto out;
> + }
> +
> + skb_put_data(skb, xsk_buff->data, len);
> +
> + enetc_get_offloads(rx_ring, rxbd, skb);
> +
> + skb_record_rx_queue(skb, rx_ring->index);
> + skb->protocol = eth_type_trans(skb, rx_ring->ndev);
> +
> +	rx_ring->stats.packets++;
> +	rx_ring->stats.bytes += skb->len;
> +
> + napi_gro_receive(napi, skb);
> +out:
> + xsk_buff_free(xsk_buff);
> +}
> +
> +static int enetc_clean_rx_ring_xsk(struct enetc_bdr *rx_ring,
> + struct napi_struct *napi, int work_limit,
> + struct bpf_prog *prog,
> + struct xsk_buff_pool *pool)
> +{
> + struct net_device *ndev = rx_ring->ndev;
> + union enetc_rx_bd *rxbd, *orig_rxbd;
> + int rx_frm_cnt = 0, rx_byte_cnt = 0;
> + int xdp_redirect_frm_cnt = 0;
> + struct xdp_buff *xsk_buff;
> + int buffs_missing, err, i;
> + bool wakeup_xsk = false;
> + u32 bd_status, xdp_act;
> +
> + buffs_missing = enetc_bd_unused(rx_ring);
> + /* next descriptor to process */
> + i = rx_ring->next_to_clean;
> +
> + while (likely(rx_frm_cnt < work_limit)) {
> + if (buffs_missing >= ENETC_RXBD_BUNDLE) {
> + buffs_missing -= enetc_refill_rx_ring_xsk(rx_ring,
> + buffs_missing);
> + wakeup_xsk |= (buffs_missing != 0);
> + }
> +
> + rxbd = enetc_rxbd(rx_ring, i);
> + bd_status = le32_to_cpu(rxbd->r.lstatus);
> + if (!bd_status)
> + break;
> +
> + enetc_wr_reg_hot(rx_ring->idr, BIT(rx_ring->index));
> + dma_rmb(); /* for reading other rxbd fields */
> +
> + if (enetc_check_bd_errors_and_consume(rx_ring, bd_status,
> + &rxbd, &i,
> + &buffs_missing))
> + continue;
> +
> + orig_rxbd = rxbd;
> +
> + xsk_buff = enetc_build_xsk_buff(pool, rx_ring, bd_status,
> + &rxbd, &i, &buffs_missing,
> + &rx_byte_cnt);
> + if (!xsk_buff)
> + continue;
> +
> + xdp_act = bpf_prog_run_xdp(prog, xsk_buff);
> + switch (xdp_act) {
> + default:
> + bpf_warn_invalid_xdp_action(ndev, prog, xdp_act);
> + fallthrough;
> + case XDP_ABORTED:
> + trace_xdp_exception(ndev, prog, xdp_act);
> + fallthrough;
> + case XDP_DROP:
> + xsk_buff_free(xsk_buff);
> + break;
> + case XDP_PASS:
> + enetc_xsk_buff_to_skb(xsk_buff, rx_ring, orig_rxbd,
> + napi);
> + break;
> + case XDP_REDIRECT:
> + err = xdp_do_redirect(ndev, xsk_buff, prog);
> + if (unlikely(err)) {
> + if (err == -ENOBUFS)
> + wakeup_xsk = true;
> + xsk_buff_free(xsk_buff);
> + rx_ring->stats.xdp_redirect_failures++;
> + } else {
> + xdp_redirect_frm_cnt++;
> + rx_ring->stats.xdp_redirect++;
> + }

no XDP_TX support? I don't see it being added in the next patch.
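
For reference, a rough and untested sketch of the shape an XDP_TX helper for
XSK buffers could take. The helper name is made up and this is not part of
the series; a real version would also need a dma sync for device, plus a way
to remember the xsk_buff in the swbd so the TX confirmation path frees it
with xsk_buff_free() instead of recycling a page:

static bool enetc_xsk_xdp_tx(struct enetc_bdr *tx_ring,
			     struct xdp_buff *xsk_buff)
{
	/* No dma_map needed: the buffer already lives in the pool's
	 * DMA-mapped umem.
	 */
	struct enetc_tx_swbd tx_swbd = {
		.dma = xsk_buff_xdp_get_dma(xsk_buff),
		.len = xdp_get_buff_len(xsk_buff),
		.is_eof = true,
	};
	int i = tx_ring->next_to_use;

	/* mirror the "budget - 1" convention used for the XSK TX batch */
	if (enetc_bd_unused(tx_ring) < 2)
		return false;

	enetc_xdp_map_tx_buff(tx_ring, i, &tx_swbd, tx_swbd.len);
	enetc_bdr_idx_inc(tx_ring, &i);
	tx_ring->next_to_use = i;

	/* tail update would normally be deferred and batched */
	enetc_update_tx_ring_tail(tx_ring);

	return true;
}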

> + }
> +
> + rx_frm_cnt++;
> + }
> +
> + rx_ring->next_to_clean = i;
> +
> + rx_ring->stats.packets += rx_frm_cnt;
> + rx_ring->stats.bytes += rx_byte_cnt;
> +
> + if (xdp_redirect_frm_cnt)
> + xdp_do_flush_map();
> +
> + if (xsk_uses_need_wakeup(pool)) {
> + if (wakeup_xsk)
> + xsk_set_rx_need_wakeup(pool);
> + else
> + xsk_clear_rx_need_wakeup(pool);
> + }
> +
> + return rx_frm_cnt;
> +}
> +
> static int enetc_poll(struct napi_struct *napi, int budget)
> {
> struct enetc_int_vector
> *v = container_of(napi, struct enetc_int_vector, napi);
> struct enetc_bdr *rx_ring = &v->rx_ring;
> + struct xsk_buff_pool *pool;
> struct bpf_prog *prog;
> bool complete = true;
> int work_done;
> @@ -1676,10 +1892,15 @@ static int enetc_poll(struct napi_struct *napi, int budget)
> complete = false;
>
> prog = rx_ring->xdp.prog;
> - if (prog)

(...)

2023-02-08 16:38:00

by Maciej Fijalkowski

[permalink] [raw]
Subject: Re: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

On Mon, Feb 06, 2023 at 12:08:37PM +0200, Vladimir Oltean wrote:

Hey Vladimir,

> Schedule NAPI by hand from enetc_xsk_wakeup(), and send frames from the
> XSK TX queue from NAPI context. Add them to the completion queue from
> the enetc_clean_tx_ring() procedure which is common for all kinds of
> traffic.
>
> We reuse one of the TX rings for XDP (XDP_TX/XDP_REDIRECT) for XSK as
> well. They are already cropped from the TX rings that the network stack
> can use when XDP is enabled (with or without AF_XDP).
>
> As for XDP_REDIRECT calling enetc's ndo_xdp_xmit, I'm not sure if that
> can run simultaneously with enetc_poll() (on different CPUs, but towards
> the same TXQ). I guess it probably can, but idk what to do about it.
> The problem is that enetc_xdp_xmit() sends to
> priv->xdp_tx_ring[smp_processor_id()], while enetc_xsk_xmit() and XDP_TX
> send to priv->xdp_tx_ring[NAPI instance]. So when the NAPI instance runs

Why not use cpu id on the latter then?

> on a different CPU than the one it is numerically equal to, we should
> have a lock that serializes XDP_REDIRECT with the others.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---
> drivers/net/ethernet/freescale/enetc/enetc.c | 102 ++++++++++++++++++-
> drivers/net/ethernet/freescale/enetc/enetc.h | 2 +
> 2 files changed, 99 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 3990c006c011..bc0db788afc7 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -84,7 +84,7 @@ static void enetc_free_tx_swbd(struct enetc_bdr *tx_ring,
> struct xdp_frame *xdp_frame = enetc_tx_swbd_get_xdp_frame(tx_swbd);
> struct sk_buff *skb = enetc_tx_swbd_get_skb(tx_swbd);
>
> - if (tx_swbd->dma)
> + if (!tx_swbd->is_xsk && tx_swbd->dma)
> enetc_unmap_tx_buff(tx_ring, tx_swbd);
>
> if (xdp_frame) {
> @@ -817,7 +817,8 @@ static void enetc_recycle_xdp_tx_buff(struct enetc_bdr *tx_ring,
> rx_ring->xdp.xdp_tx_in_flight--;
> }
>
> -static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
> +static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget,
> + int *xsk_confirmed)
> {
> int tx_frm_cnt = 0, tx_byte_cnt = 0, tx_win_drop = 0;
> struct net_device *ndev = tx_ring->ndev;
> @@ -854,7 +855,9 @@ static bool enetc_clean_tx_ring(struct enetc_bdr *tx_ring, int napi_budget)
> tx_win_drop++;
> }
>
> - if (tx_swbd->is_xdp_tx)
> + if (tx_swbd->is_xsk)
> + (*xsk_confirmed)++;
> + else if (tx_swbd->is_xdp_tx)
> enetc_recycle_xdp_tx_buff(tx_ring, tx_swbd);
> else if (likely(tx_swbd->dma))
> enetc_unmap_tx_buff(tx_ring, tx_swbd);
> @@ -1465,6 +1468,58 @@ int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
> }
> EXPORT_SYMBOL_GPL(enetc_xdp_xmit);
>
> +static void enetc_xsk_map_tx_desc(struct enetc_tx_swbd *tx_swbd,
> + const struct xdp_desc *xsk_desc,
> + struct xsk_buff_pool *pool)
> +{
> + dma_addr_t dma;
> +
> + dma = xsk_buff_raw_get_dma(pool, xsk_desc->addr);
> + xsk_buff_raw_dma_sync_for_device(pool, dma, xsk_desc->len);
> +
> + tx_swbd->dma = dma;
> + tx_swbd->len = xsk_desc->len;
> + tx_swbd->is_xsk = true;
> + tx_swbd->is_eof = true;
> +}
> +
> +static bool enetc_xsk_xmit(struct net_device *ndev, struct xsk_buff_pool *pool,
> + u32 queue_id)
> +{
> + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> + struct xdp_desc *xsk_descs = pool->tx_descs;
> + struct enetc_tx_swbd tx_swbd = {0};
> + struct enetc_bdr *tx_ring;
> + u32 budget, batch;
> + int i, k;
> +
> + tx_ring = priv->xdp_tx_ring[queue_id];
> +
> + /* Shouldn't race with anyone because we are running in the softirq
> + * of the only CPU that sends packets to this TX ring
> + */
> + budget = min(enetc_bd_unused(tx_ring) - 1, ENETC_XSK_TX_BATCH);
> +
> + batch = xsk_tx_peek_release_desc_batch(pool, budget);
> + if (!batch)
> + return true;
> +
> + i = tx_ring->next_to_use;
> +
> + for (k = 0; k < batch; k++) {
> + enetc_xsk_map_tx_desc(&tx_swbd, &xsk_descs[k], pool);
> + enetc_xdp_map_tx_buff(tx_ring, i, &tx_swbd, tx_swbd.len);
> + enetc_bdr_idx_inc(tx_ring, &i);
> + }
> +
> + tx_ring->next_to_use = i;
> +
> + xsk_tx_release(pool);

xsk_tx_release() is not needed if you're using
xsk_tx_peek_release_desc_batch() above, it will do this for you at the end
of its job.

> + enetc_update_tx_ring_tail(tx_ring);
> +
> + return budget != batch;
> +}
> +
> static void enetc_map_rx_buff_to_xdp(struct enetc_bdr *rx_ring, int i,
> struct xdp_buff *xdp_buff, u16 size)
> {
> @@ -1881,6 +1936,7 @@ static int enetc_poll(struct napi_struct *napi, int budget)
> struct enetc_bdr *rx_ring = &v->rx_ring;
> struct xsk_buff_pool *pool;
> struct bpf_prog *prog;
> + int xsk_confirmed = 0;
> bool complete = true;
> int work_done;
> int i;
> @@ -1888,7 +1944,8 @@ static int enetc_poll(struct napi_struct *napi, int budget)
> enetc_lock_mdio();
>
> for (i = 0; i < v->count_tx_rings; i++)
> - if (!enetc_clean_tx_ring(&v->tx_ring[i], budget))
> + if (!enetc_clean_tx_ring(&v->tx_ring[i], budget,
> + &xsk_confirmed))
> complete = false;
>
> prog = rx_ring->xdp.prog;
> @@ -1901,6 +1958,17 @@ static int enetc_poll(struct napi_struct *napi, int budget)
> else
> work_done = enetc_clean_rx_ring(rx_ring, napi, budget);
>
> + if (pool) {
> + if (xsk_confirmed)
> + xsk_tx_completed(pool, xsk_confirmed);
> +
> + if (xsk_uses_need_wakeup(pool))
> + xsk_set_tx_need_wakeup(pool);
> +
> + if (!enetc_xsk_xmit(rx_ring->ndev, pool, rx_ring->index))
> + complete = false;
> + }
> +
> if (work_done == budget)
> complete = false;
> if (work_done)
> @@ -3122,7 +3190,31 @@ static int enetc_setup_xsk_pool(struct net_device *ndev,
>
> int enetc_xsk_wakeup(struct net_device *ndev, u32 queue_id, u32 flags)
> {
> - /* xp_assign_dev() wants this; nothing needed for RX */
> + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> + struct enetc_int_vector *v;
> + struct enetc_bdr *rx_ring;
> +
> + if (!netif_running(ndev) || !netif_carrier_ok(ndev))
> + return -ENETDOWN;
> +
> + if (queue_id >= priv->bdr_int_num)
> + return -ERANGE;
> +
> + v = priv->int_vector[queue_id];
> + rx_ring = &v->rx_ring;
> +
> + if (!rx_ring->xdp.xsk_pool || !rx_ring->xdp.prog)
> + return -EINVAL;
> +
> + /* No way to kick TX by triggering a hardirq right away =>
> + * raise softirq. This might schedule NAPI on a CPU different than the
> + * smp_affinity of its IRQ would suggest, but that would happen anyway
> + * if, say, we change that affinity under heavy traffic.
> + * So enetc_poll() has to be prepared for it anyway.
> + */
> + if (!napi_if_scheduled_mark_missed(&v->napi))
> + napi_schedule(&v->napi);
> +
> return 0;
> }
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> index e1a746e37c9a..403f40473b52 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> @@ -36,6 +36,7 @@ struct enetc_tx_swbd {
> u8 is_eof:1;
> u8 is_xdp_tx:1;
> u8 is_xdp_redirect:1;
> + u8 is_xsk:1;
> u8 qbv_en:1;
> };
>
> @@ -86,6 +87,7 @@ struct enetc_xdp_data {
> #define ENETC_RX_RING_DEFAULT_SIZE 2048
> #define ENETC_TX_RING_DEFAULT_SIZE 2048
> #define ENETC_DEFAULT_TX_WORK (ENETC_TX_RING_DEFAULT_SIZE / 2)
> +#define ENETC_XSK_TX_BATCH ENETC_DEFAULT_TX_WORK
>
> struct enetc_bdr_resource {
> /* Input arguments saved for teardown */
> --
> 2.34.1
>

2023-02-08 16:39:11

by Maciej Fijalkowski

[permalink] [raw]
Subject: Re: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

On Mon, Feb 06, 2023 at 12:19:21PM +0200, Vladimir Oltean wrote:
> Yikes, ran git format-patch one patch too soon. There's also a hidden,
> bonus patch "12/11" below. Doesn't affect first 11 patches in any way,
> though. Here it is.
>
> From f7f10232622309d66a7a1ae1932d0c081936d546 Mon Sep 17 00:00:00 2001
> From: Vladimir Oltean <[email protected]>
> Date: Tue, 25 Oct 2022 02:32:24 +0300
> Subject: [RFC PATCH net-next 12/12] net: enetc: add support for XDP_TX
> with zero-copy XDP sockets
>
> Add support for the case when the BPF program attached to a ring with an
> XSK pool returns the XDP_TX verdict. The frame needs to go back on the
> interface it came from.
>
> No dma_map or dma_sync_for_device is necessary, just a small impedance
> matching logic with the XDP_TX procedure we have in place for non-XSK,
> since the data structures are different (xdp_buff vs xdp_frame; cannot
> have multi-buffer with XSK).
>
> In the TX confirmation routine, just release the RX buffer (as opposed
> to non-XSK XDP_TX). Recycling might be possible, but I haven't
> experimented with it.

ugh premature response on 10/11.

>
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---
> drivers/net/ethernet/freescale/enetc/enetc.c | 35 ++++++++++++++++++--
> drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
> 2 files changed, 34 insertions(+), 2 deletions(-)

2023-02-08 16:44:49

by Maciej Fijalkowski

[permalink] [raw]
Subject: Re: [RFC PATCH net-next 00/11] NXP ENETC AF_XDP zero-copy sockets

On Mon, Feb 06, 2023 at 12:08:26PM +0200, Vladimir Oltean wrote:
> This is RFC because I have a few things I'm not 100% certain about.
> I've tested this with the xdpsock test application, I don't have very
> detailed knowledge about the internals of AF_XDP sockets.
>
> Patches where I'd appreciate if people took a look are 02/11, 05/11,
> 10/11 and 11/11.

All of that looks rather sane to me. Can you point out explicitly your
issues here? Is it around concurrent access to XDP Tx rings or something
more?

I commented on 10 and 11. For 2 it seems fine and 5 has always been
controversial.

>
> Vladimir Oltean (11):
> net: enetc: optimize struct enetc_rx_swbd layout
> net: enetc: perform XDP RX queue registration at enetc_setup_bpf()
> time
> net: enetc: rename "cleaned_cnt" to "buffs_missing"
> net: enetc: continue NAPI processing on frames with RX errors
> net: enetc: add support for ethtool --show-channels
> net: enetc: consolidate rx_swbd freeing
> net: enetc: rename enetc_free_tx_frame() to enetc_free_tx_swbd()
> net: enetc: increment rx_byte_cnt for XDP data path
> net: enetc: move setting of ENETC_TXBD_FLAGS_F flag to
> enetc_xdp_map_tx_buff()
> net: enetc: add RX support for zero-copy XDP sockets
> net: enetc: add TX support for zero-copy XDP sockets
>
> drivers/net/ethernet/freescale/enetc/enetc.c | 676 +++++++++++++++---
> drivers/net/ethernet/freescale/enetc/enetc.h | 9 +-
> .../ethernet/freescale/enetc/enetc_ethtool.c | 10 +
> .../net/ethernet/freescale/enetc/enetc_pf.c | 1 +
> 4 files changed, 606 insertions(+), 90 deletions(-)
>
> --
> 2.34.1
>

2023-02-08 17:08:29

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

Hi Maciej,

On Wed, Feb 08, 2023 at 05:37:35PM +0100, Maciej Fijalkowski wrote:
> On Mon, Feb 06, 2023 at 12:08:37PM +0200, Vladimir Oltean wrote:
>
> Hey Vladimir,
>
> > Schedule NAPI by hand from enetc_xsk_wakeup(), and send frames from the
> > XSK TX queue from NAPI context. Add them to the completion queue from
> > the enetc_clean_tx_ring() procedure which is common for all kinds of
> > traffic.
> >
> > We reuse one of the TX rings for XDP (XDP_TX/XDP_REDIRECT) for XSK as
> > well. They are already cropped from the TX rings that the network stack
> > can use when XDP is enabled (with or without AF_XDP).
> >
> > As for XDP_REDIRECT calling enetc's ndo_xdp_xmit, I'm not sure if that
> > can run simultaneously with enetc_poll() (on different CPUs, but towards
> > the same TXQ). I guess it probably can, but idk what to do about it.
> > The problem is that enetc_xdp_xmit() sends to
> > priv->xdp_tx_ring[smp_processor_id()], while enetc_xsk_xmit() and XDP_TX
> > send to priv->xdp_tx_ring[NAPI instance]. So when the NAPI instance runs
>
> Why not use cpu id on the latter then?

Hmm, because I want the sendto() syscall to trigger wakeup of the NAPI
that sends traffic to the proper queue_id, rather than to the queue_id
affine to the CPU that the sendto() syscall was made on?

2023-02-08 17:19:41

by Maciej Fijalkowski

[permalink] [raw]
Subject: Re: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

On Wed, Feb 08, 2023 at 07:08:15PM +0200, Vladimir Oltean wrote:
> Hi Maciej,
>
> On Wed, Feb 08, 2023 at 05:37:35PM +0100, Maciej Fijalkowski wrote:
> > On Mon, Feb 06, 2023 at 12:08:37PM +0200, Vladimir Oltean wrote:
> >
> > Hey Vladimir,
> >
> > > Schedule NAPI by hand from enetc_xsk_wakeup(), and send frames from the
> > > XSK TX queue from NAPI context. Add them to the completion queue from
> > > the enetc_clean_tx_ring() procedure which is common for all kinds of
> > > traffic.
> > >
> > > We reuse one of the TX rings for XDP (XDP_TX/XDP_REDIRECT) for XSK as
> > > well. They are already cropped from the TX rings that the network stack
> > > can use when XDP is enabled (with or without AF_XDP).
> > >
> > > As for XDP_REDIRECT calling enetc's ndo_xdp_xmit, I'm not sure if that
> > > can run simultaneously with enetc_poll() (on different CPUs, but towards
> > > the same TXQ). I guess it probably can, but idk what to do about it.
> > > The problem is that enetc_xdp_xmit() sends to
> > > priv->xdp_tx_ring[smp_processor_id()], while enetc_xsk_xmit() and XDP_TX
> > > send to priv->xdp_tx_ring[NAPI instance]. So when the NAPI instance runs
> >
> > Why not use cpu id on the latter then?
>
> Hmm, because I want the sendto() syscall to trigger wakeup of the NAPI
> that sends traffic to the proper queue_id, rather than to the queue_id
> affine to the CPU that the sendto() syscall was made on?

Ok, I was referring to the fact that using the cpu id on both sides would
address the concurrency issue. Regarding your need, what I did for ice was
to assign the xdp_ring to the rx_ring, and within ndo_xsk_wakeup() I pick
the rx_ring based on the queue_id that comes as an arg:

https://lore.kernel.org/bpf/[email protected]/

2023-03-20 16:37:21

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC PATCH net-next 11/11] net: enetc: add TX support for zero-copy XDP sockets

Hi Maciej,

On Wed, Feb 08, 2023 at 06:17:52PM +0100, Maciej Fijalkowski wrote:
> Ok, I was referring to the fact that using the cpu id on both sides would
> address the concurrency issue. Regarding your need, what I did for ice was
> to assign the xdp_ring to the rx_ring, and within ndo_xsk_wakeup() I pick
> the rx_ring based on the queue_id that comes as an arg:
>
> https://lore.kernel.org/bpf/[email protected]/

I looked at your response a few times in the past month, and I'm sorry
to say, but I don't have enough background in the ice driver to really
understand what you're saying and how that helps in the given situation.
If you could rephrase or explain some more, that would be great.