2023-12-23 02:58:28

by Alexander Lobakin

Subject: [PATCH RFC net-next 00/34] Christmas 3-serie XDP for idpf (+generic stuff)

I was repeatedly asked to send this WIP before the holidays to trigger
some discussion, at least for the generic parts.

This all depends on libie[0] and the WB-on-ITR fix[1]. The RFC is not
guaranteed to work perfectly, but at least regular XDP seems to work
for me...

In fact, here are 3 separate series:
* 01-08: convert idpf to libie and make it more sane;
* 09-25: add XDP to idpf;
* 26-34: add XSk to idpf.

Most people will likely be interested only in the following generic
changes:
* 11: allow attaching already registered memory models to XDP RxQ info;
* 12-13: generic helpers for adding a frag to &xdp_buff and converting
it to an skb;
* 14: get rid of xdp_frame::mem.id, allow mixing pages from different
page_pools within one &xdp_buff/&xdp_frame;
* 15: a Page Pool helper;
* 18: it's for libie, but I wanted to talk about XDP_TX bulking;
* 26: same as 13, but for converting XSK &xdp_buff to skb.

The rest is up to you; driver-specific stuff is pretty boring sometimes.

I'll be polishing and finishing all of this starting January 3rd and then
preparing and sending sane series; some early feedback never hurts, though.

Merry Yule!

[0] https://lore.kernel.org/netdev/[email protected]
[1] https://lore.kernel.org/netdev/[email protected]

Alexander Lobakin (23):
idpf: reuse libie's definitions of parsed ptype structures
idpf: pack &idpf_queue way more efficiently
idpf: remove legacy Page Pool Ethtool stats
libie: support different types of buffers for Rx
idpf: convert header split mode to libie + napi_build_skb()
idpf: use libie Rx buffer management for payload buffer
libie: add Tx buffer completion helpers
idpf: convert to libie Tx buffer completion
bpf, xdp: constify some bpf_prog * function arguments
xdp: constify read-only arguments of some static inline helpers
xdp: allow attaching already registered memory model to xdp_rxq_info
xdp: add generic xdp_buff_add_frag()
xdp: add generic xdp_build_skb_from_buff()
xdp: get rid of xdp_frame::mem.id
page_pool: add inline helper to sync VA for device (for XDP_TX)
jump_label: export static_key_slow_{inc,dec}_cpuslocked()
libie: support native XDP and register memory model
libie: add a couple of XDP helpers
idpf: stop using macros for accessing queue descriptors
idpf: use generic functions to build xdp_buff and skb
idpf: add support for XDP on Rx
idpf: add support for .ndo_xdp_xmit()
xdp: add generic XSk xdp_buff -> skb conversion

Michal Kubiak (11):
idpf: make complq cleaning dependent on scheduling mode
idpf: prepare structures to support xdp
idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq
idpf: add support for sw interrupt
idpf: add relative queue id member to idpf_queue
idpf: add vc functions to manage selected queues
idpf: move search rx and tx queues to header
idpf: add XSk pool initialization
idpf: implement Tx path for AF_XDP
idpf: implement Rx path for AF_XDP
idpf: enable XSk features and ndo_xsk_wakeup

.../net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
drivers/net/ethernet/intel/Kconfig | 3 +-
drivers/net/ethernet/intel/idpf/Makefile | 3 +
drivers/net/ethernet/intel/idpf/idpf.h | 91 +-
drivers/net/ethernet/intel/idpf/idpf_dev.c | 3 +
.../net/ethernet/intel/idpf/idpf_ethtool.c | 74 +-
.../net/ethernet/intel/idpf/idpf_lan_txrx.h | 6 +-
drivers/net/ethernet/intel/idpf/idpf_lib.c | 40 +-
drivers/net/ethernet/intel/idpf/idpf_main.c | 1 +
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 221 ++-
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 1142 ++++++++--------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 451 +++----
drivers/net/ethernet/intel/idpf/idpf_vf_dev.c | 3 +
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 1132 ++++++++++------
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 522 ++++++++
drivers/net/ethernet/intel/idpf/idpf_xdp.h | 38 +
drivers/net/ethernet/intel/idpf/idpf_xsk.c | 1181 +++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xsk.h | 30 +
drivers/net/ethernet/intel/libie/Makefile | 3 +
drivers/net/ethernet/intel/libie/rx.c | 135 +-
drivers/net/ethernet/intel/libie/tx.c | 16 +
drivers/net/ethernet/intel/libie/xdp.c | 50 +
drivers/net/ethernet/intel/libie/xsk.c | 49 +
drivers/net/veth.c | 4 +-
include/linux/bpf.h | 12 +-
include/linux/filter.h | 9 +-
include/linux/net/intel/libie/rx.h | 25 +-
include/linux/net/intel/libie/tx.h | 94 ++
include/linux/net/intel/libie/xdp.h | 586 ++++++++
include/linux/net/intel/libie/xsk.h | 172 +++
include/linux/netdevice.h | 6 +-
include/linux/skbuff.h | 14 +-
include/net/page_pool/helpers.h | 32 +
include/net/page_pool/types.h | 6 +-
include/net/xdp.h | 109 +-
kernel/bpf/cpumap.c | 2 +-
kernel/bpf/devmap.c | 8 +-
kernel/jump_label.c | 2 +
net/bpf/test_run.c | 6 +-
net/core/dev.c | 8 +-
net/core/filter.c | 27 +-
net/core/page_pool.c | 31 +-
net/core/xdp.c | 189 ++-
43 files changed, 4971 insertions(+), 1567 deletions(-)
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xdp.c
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xdp.h
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xsk.c
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xsk.h
create mode 100644 drivers/net/ethernet/intel/libie/tx.c
create mode 100644 drivers/net/ethernet/intel/libie/xdp.c
create mode 100644 drivers/net/ethernet/intel/libie/xsk.c
create mode 100644 include/linux/net/intel/libie/tx.h
create mode 100644 include/linux/net/intel/libie/xdp.h
create mode 100644 include/linux/net/intel/libie/xsk.h

--
2.43.0



2023-12-23 02:58:53

by Alexander Lobakin

Subject: [PATCH RFC net-next 01/34] idpf: reuse libie's definitions of parsed ptype structures

idpf's in-kernel parsed ptype structure is almost identical to the one
used in the previous Intel drivers, which means it can be converted to
use libie's definitions and even helpers. The only difference is that
the lookup table is not constant, but obtained from the device instead.
Remove the driver counterpart and use libie's helpers for hashes and
checksums. This slightly optimizes skb field processing thanks to faster
checks.
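
Purely as an illustration, the hash handling after the conversion boils
down to the sketch below (the helper names come from the diff; the wrapper
function is hypothetical and @hash stands for the RSS hash value already
extracted from the descriptor):

    static void example_rx_hash(struct idpf_queue *rxq, struct sk_buff *skb,
                                u32 hash, u16 rx_ptype)
    {
            struct libie_rx_ptype_parsed parsed =
                    rxq->vport->rx_ptype_lkup[rx_ptype];

            /* checks NETIF_F_RXHASH and the parsed ptype internally */
            if (libie_has_rx_hash(rxq->vport->netdev, parsed))
                    libie_skb_set_hash(skb, hash, parsed);
    }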

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/Kconfig | 1 +
drivers/net/ethernet/intel/idpf/idpf.h | 2 +-
drivers/net/ethernet/intel/idpf/idpf_main.c | 1 +
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 87 +++++++--------
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 101 ++++++------------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 88 +--------------
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 54 ++++++----
7 files changed, 110 insertions(+), 224 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index c7da7d05d93e..0db1aa36866e 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -378,6 +378,7 @@ config IDPF
tristate "Intel(R) Infrastructure Data Path Function Support"
depends on PCI_MSI
select DIMLIB
+ select LIBIE
select PAGE_POOL
select PAGE_POOL_STATS
help
diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 0acc125decb3..8342df0f4f3d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -385,7 +385,7 @@ struct idpf_vport {
u16 num_rxq_grp;
struct idpf_rxq_group *rxq_grps;
u32 rxq_model;
- struct idpf_rx_ptype_decoded rx_ptype_lkup[IDPF_RX_MAX_PTYPE];
+ struct libie_rx_ptype_parsed rx_ptype_lkup[IDPF_RX_MAX_PTYPE];

struct idpf_adapter *adapter;
struct net_device *netdev;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
index e1febc74cefd..6471158e6f6b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_main.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
@@ -7,6 +7,7 @@
#define DRV_SUMMARY "Intel(R) Infrastructure Data Path Function Linux Driver"

MODULE_DESCRIPTION(DRV_SUMMARY);
+MODULE_IMPORT_NS(LIBIE);
MODULE_LICENSE("GPL");

/**
diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
index 8122a0cc97de..e58e08c9997d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -636,75 +636,64 @@ static bool idpf_rx_singleq_is_non_eop(struct idpf_queue *rxq,
* @rxq: Rx ring being processed
* @skb: skb currently being received and modified
* @csum_bits: checksum bits from descriptor
- * @ptype: the packet type decoded by hardware
+ * @parsed: the packet type parsed by hardware
*
* skb->protocol must be set before this function is called
*/
static void idpf_rx_singleq_csum(struct idpf_queue *rxq, struct sk_buff *skb,
- struct idpf_rx_csum_decoded *csum_bits,
- u16 ptype)
+ struct idpf_rx_csum_decoded csum_bits,
+ struct libie_rx_ptype_parsed parsed)
{
- struct idpf_rx_ptype_decoded decoded;
bool ipv4, ipv6;

/* check if Rx checksum is enabled */
- if (unlikely(!(rxq->vport->netdev->features & NETIF_F_RXCSUM)))
+ if (!libie_has_rx_checksum(rxq->vport->netdev, parsed))
return;

/* check if HW has decoded the packet and checksum */
- if (unlikely(!(csum_bits->l3l4p)))
+ if (unlikely(!csum_bits.l3l4p))
return;

- decoded = rxq->vport->rx_ptype_lkup[ptype];
- if (unlikely(!(decoded.known && decoded.outer_ip)))
+ if (unlikely(parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_L2))
return;

- ipv4 = IDPF_RX_PTYPE_TO_IPV(&decoded, IDPF_RX_PTYPE_OUTER_IPV4);
- ipv6 = IDPF_RX_PTYPE_TO_IPV(&decoded, IDPF_RX_PTYPE_OUTER_IPV6);
+ ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
+ ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;

/* Check if there were any checksum errors */
- if (unlikely(ipv4 && (csum_bits->ipe || csum_bits->eipe)))
+ if (unlikely(ipv4 && (csum_bits.ipe || csum_bits.eipe)))
goto checksum_fail;

/* Device could not do any checksum offload for certain extension
* headers as indicated by setting IPV6EXADD bit
*/
- if (unlikely(ipv6 && csum_bits->ipv6exadd))
+ if (unlikely(ipv6 && csum_bits.ipv6exadd))
return;

/* check for L4 errors and handle packets that were not able to be
* checksummed due to arrival speed
*/
- if (unlikely(csum_bits->l4e))
+ if (unlikely(csum_bits.l4e))
goto checksum_fail;

- if (unlikely(csum_bits->nat && csum_bits->eudpe))
+ if (unlikely(csum_bits.nat && csum_bits.eudpe))
goto checksum_fail;

/* Handle packets that were not able to be checksummed due to arrival
* speed, in this case the stack can compute the csum.
*/
- if (unlikely(csum_bits->pprs))
+ if (unlikely(csum_bits.pprs))
return;

/* If there is an outer header present that might contain a checksum
* we need to bump the checksum level by 1 to reflect the fact that
* we are indicating we validated the inner checksum.
*/
- if (decoded.tunnel_type >= IDPF_RX_PTYPE_TUNNEL_IP_GRENAT)
+ if (parsed.tunnel_type >= LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT)
skb->csum_level = 1;

- /* Only report checksum unnecessary for ICMP, TCP, UDP, or SCTP */
- switch (decoded.inner_prot) {
- case IDPF_RX_PTYPE_INNER_PROT_ICMP:
- case IDPF_RX_PTYPE_INNER_PROT_TCP:
- case IDPF_RX_PTYPE_INNER_PROT_UDP:
- case IDPF_RX_PTYPE_INNER_PROT_SCTP:
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- return;
- default:
- return;
- }
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
+ return;

checksum_fail:
u64_stats_update_begin(&rxq->stats_sync);
@@ -717,7 +706,7 @@ static void idpf_rx_singleq_csum(struct idpf_queue *rxq, struct sk_buff *skb,
* @rx_q: Rx completion queue
* @skb: skb currently being received and modified
* @rx_desc: the receive descriptor
- * @ptype: Rx packet type
+ * @parsed: Rx packet type parsed by hardware
*
* This function only operates on the VIRTCHNL2_RXDID_1_32B_BASE_M base 32byte
* descriptor writeback format.
@@ -725,7 +714,7 @@ static void idpf_rx_singleq_csum(struct idpf_queue *rxq, struct sk_buff *skb,
static void idpf_rx_singleq_base_csum(struct idpf_queue *rx_q,
struct sk_buff *skb,
union virtchnl2_rx_desc *rx_desc,
- u16 ptype)
+ struct libie_rx_ptype_parsed parsed)
{
struct idpf_rx_csum_decoded csum_bits;
u32 rx_error, rx_status;
@@ -749,7 +738,7 @@ static void idpf_rx_singleq_base_csum(struct idpf_queue *rx_q,
csum_bits.nat = 0;
csum_bits.eudpe = 0;

- idpf_rx_singleq_csum(rx_q, skb, &csum_bits, ptype);
+ idpf_rx_singleq_csum(rx_q, skb, csum_bits, parsed);
}

/**
@@ -757,7 +746,7 @@ static void idpf_rx_singleq_base_csum(struct idpf_queue *rx_q,
* @rx_q: Rx completion queue
* @skb: skb currently being received and modified
* @rx_desc: the receive descriptor
- * @ptype: Rx packet type
+ * @parsed: Rx packet type parsed by hardware
*
* This function only operates on the VIRTCHNL2_RXDID_2_FLEX_SQ_NIC flexible
* descriptor writeback format.
@@ -765,7 +754,7 @@ static void idpf_rx_singleq_base_csum(struct idpf_queue *rx_q,
static void idpf_rx_singleq_flex_csum(struct idpf_queue *rx_q,
struct sk_buff *skb,
union virtchnl2_rx_desc *rx_desc,
- u16 ptype)
+ struct libie_rx_ptype_parsed parsed)
{
struct idpf_rx_csum_decoded csum_bits;
u16 rx_status0, rx_status1;
@@ -789,7 +778,7 @@ static void idpf_rx_singleq_flex_csum(struct idpf_queue *rx_q,
rx_status1);
csum_bits.pprs = 0;

- idpf_rx_singleq_csum(rx_q, skb, &csum_bits, ptype);
+ idpf_rx_singleq_csum(rx_q, skb, csum_bits, parsed);
}

/**
@@ -797,7 +786,7 @@ static void idpf_rx_singleq_flex_csum(struct idpf_queue *rx_q,
* @rx_q: Rx completion queue
* @skb: skb currently being received and modified
* @rx_desc: specific descriptor
- * @decoded: Decoded Rx packet type related fields
+ * @parsed: parsed Rx packet type related fields
*
* This function only operates on the VIRTCHNL2_RXDID_1_32B_BASE_M base 32byte
* descriptor writeback format.
@@ -805,11 +794,11 @@ static void idpf_rx_singleq_flex_csum(struct idpf_queue *rx_q,
static void idpf_rx_singleq_base_hash(struct idpf_queue *rx_q,
struct sk_buff *skb,
union virtchnl2_rx_desc *rx_desc,
- struct idpf_rx_ptype_decoded *decoded)
+ struct libie_rx_ptype_parsed parsed)
{
u64 mask, qw1;

- if (unlikely(!(rx_q->vport->netdev->features & NETIF_F_RXHASH)))
+ if (!libie_has_rx_hash(rx_q->vport->netdev, parsed))
return;

mask = VIRTCHNL2_RX_BASE_DESC_FLTSTAT_RSS_HASH_M;
@@ -818,7 +807,7 @@ static void idpf_rx_singleq_base_hash(struct idpf_queue *rx_q,
if (FIELD_GET(mask, qw1) == mask) {
u32 hash = le32_to_cpu(rx_desc->base_wb.qword0.hi_dword.rss);

- skb_set_hash(skb, hash, idpf_ptype_to_htype(decoded));
+ libie_skb_set_hash(skb, hash, parsed);
}
}

@@ -827,7 +816,7 @@ static void idpf_rx_singleq_base_hash(struct idpf_queue *rx_q,
* @rx_q: Rx completion queue
* @skb: skb currently being received and modified
* @rx_desc: specific descriptor
- * @decoded: Decoded Rx packet type related fields
+ * @parsed: parsed Rx packet type related fields
*
* This function only operates on the VIRTCHNL2_RXDID_2_FLEX_SQ_NIC flexible
* descriptor writeback format.
@@ -835,15 +824,17 @@ static void idpf_rx_singleq_base_hash(struct idpf_queue *rx_q,
static void idpf_rx_singleq_flex_hash(struct idpf_queue *rx_q,
struct sk_buff *skb,
union virtchnl2_rx_desc *rx_desc,
- struct idpf_rx_ptype_decoded *decoded)
+ struct libie_rx_ptype_parsed parsed)
{
- if (unlikely(!(rx_q->vport->netdev->features & NETIF_F_RXHASH)))
+ if (!libie_has_rx_hash(rx_q->vport->netdev, parsed))
return;

if (FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_STATUS0_RSS_VALID_M,
- le16_to_cpu(rx_desc->flex_nic_wb.status_error0)))
- skb_set_hash(skb, le32_to_cpu(rx_desc->flex_nic_wb.rss_hash),
- idpf_ptype_to_htype(decoded));
+ le16_to_cpu(rx_desc->flex_nic_wb.status_error0))) {
+ u32 hash = le32_to_cpu(rx_desc->flex_nic_wb.rss_hash);
+
+ libie_skb_set_hash(skb, hash, parsed);
+ }
}

/**
@@ -863,7 +854,7 @@ static void idpf_rx_singleq_process_skb_fields(struct idpf_queue *rx_q,
union virtchnl2_rx_desc *rx_desc,
u16 ptype)
{
- struct idpf_rx_ptype_decoded decoded =
+ struct libie_rx_ptype_parsed parsed =
rx_q->vport->rx_ptype_lkup[ptype];

/* modifies the skb - consumes the enet header */
@@ -871,11 +862,11 @@ static void idpf_rx_singleq_process_skb_fields(struct idpf_queue *rx_q,

/* Check if we're using base mode descriptor IDs */
if (rx_q->rxdids == VIRTCHNL2_RXDID_1_32B_BASE_M) {
- idpf_rx_singleq_base_hash(rx_q, skb, rx_desc, &decoded);
- idpf_rx_singleq_base_csum(rx_q, skb, rx_desc, ptype);
+ idpf_rx_singleq_base_hash(rx_q, skb, rx_desc, parsed);
+ idpf_rx_singleq_base_csum(rx_q, skb, rx_desc, parsed);
} else {
- idpf_rx_singleq_flex_hash(rx_q, skb, rx_desc, &decoded);
- idpf_rx_singleq_flex_csum(rx_q, skb, rx_desc, ptype);
+ idpf_rx_singleq_flex_hash(rx_q, skb, rx_desc, parsed);
+ idpf_rx_singleq_flex_csum(rx_q, skb, rx_desc, parsed);
}
}

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 610841dc4512..70785f9afadd 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2705,51 +2705,27 @@ netdev_tx_t idpf_tx_splitq_start(struct sk_buff *skb,
return idpf_tx_splitq_frame(skb, tx_q);
}

-/**
- * idpf_ptype_to_htype - get a hash type
- * @decoded: Decoded Rx packet type related fields
- *
- * Returns appropriate hash type (such as PKT_HASH_TYPE_L2/L3/L4) to be used by
- * skb_set_hash based on PTYPE as parsed by HW Rx pipeline and is part of
- * Rx desc.
- */
-enum pkt_hash_types idpf_ptype_to_htype(const struct idpf_rx_ptype_decoded *decoded)
-{
- if (!decoded->known)
- return PKT_HASH_TYPE_NONE;
- if (decoded->payload_layer == IDPF_RX_PTYPE_PAYLOAD_LAYER_PAY2 &&
- decoded->inner_prot)
- return PKT_HASH_TYPE_L4;
- if (decoded->payload_layer == IDPF_RX_PTYPE_PAYLOAD_LAYER_PAY2 &&
- decoded->outer_ip)
- return PKT_HASH_TYPE_L3;
- if (decoded->outer_ip == IDPF_RX_PTYPE_OUTER_L2)
- return PKT_HASH_TYPE_L2;
-
- return PKT_HASH_TYPE_NONE;
-}
-
/**
* idpf_rx_hash - set the hash value in the skb
* @rxq: Rx descriptor ring packet is being transacted on
* @skb: pointer to current skb being populated
* @rx_desc: Receive descriptor
- * @decoded: Decoded Rx packet type related fields
+ * @parsed: parsed Rx packet type related fields
*/
static void idpf_rx_hash(struct idpf_queue *rxq, struct sk_buff *skb,
struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
- struct idpf_rx_ptype_decoded *decoded)
+ struct libie_rx_ptype_parsed parsed)
{
u32 hash;

- if (unlikely(!idpf_is_feature_ena(rxq->vport, NETIF_F_RXHASH)))
+ if (!libie_has_rx_hash(rxq->vport->netdev, parsed))
return;

hash = le16_to_cpu(rx_desc->hash1) |
(rx_desc->ff2_mirrid_hash2.hash2 << 16) |
(rx_desc->hash3 << 24);

- skb_set_hash(skb, hash, idpf_ptype_to_htype(decoded));
+ libie_skb_set_hash(skb, hash, parsed);
}

/**
@@ -2757,60 +2733,48 @@ static void idpf_rx_hash(struct idpf_queue *rxq, struct sk_buff *skb,
* @rxq: Rx descriptor ring packet is being transacted on
* @skb: pointer to current skb being populated
* @csum_bits: checksum fields extracted from the descriptor
- * @decoded: Decoded Rx packet type related fields
+ * @parsed: parsed Rx packet type related fields
*
* skb->protocol must be set before this function is called
*/
static void idpf_rx_csum(struct idpf_queue *rxq, struct sk_buff *skb,
- struct idpf_rx_csum_decoded *csum_bits,
- struct idpf_rx_ptype_decoded *decoded)
+ struct idpf_rx_csum_decoded csum_bits,
+ struct libie_rx_ptype_parsed parsed)
{
bool ipv4, ipv6;

/* check if Rx checksum is enabled */
- if (unlikely(!idpf_is_feature_ena(rxq->vport, NETIF_F_RXCSUM)))
+ if (!libie_has_rx_checksum(rxq->vport->netdev, parsed))
return;

/* check if HW has decoded the packet and checksum */
- if (!(csum_bits->l3l4p))
+ if (!csum_bits.l3l4p)
return;

- ipv4 = IDPF_RX_PTYPE_TO_IPV(decoded, IDPF_RX_PTYPE_OUTER_IPV4);
- ipv6 = IDPF_RX_PTYPE_TO_IPV(decoded, IDPF_RX_PTYPE_OUTER_IPV6);
+ ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
+ ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;

- if (ipv4 && (csum_bits->ipe || csum_bits->eipe))
+ if (ipv4 && (csum_bits.ipe || csum_bits.eipe))
goto checksum_fail;

- if (ipv6 && csum_bits->ipv6exadd)
+ if (ipv6 && csum_bits.ipv6exadd)
return;

/* check for L4 errors and handle packets that were not able to be
* checksummed
*/
- if (csum_bits->l4e)
+ if (csum_bits.l4e)
goto checksum_fail;

- /* Only report checksum unnecessary for ICMP, TCP, UDP, or SCTP */
- switch (decoded->inner_prot) {
- case IDPF_RX_PTYPE_INNER_PROT_ICMP:
- case IDPF_RX_PTYPE_INNER_PROT_TCP:
- case IDPF_RX_PTYPE_INNER_PROT_UDP:
- if (!csum_bits->raw_csum_inv) {
- u16 csum = csum_bits->raw_csum;
-
- skb->csum = csum_unfold((__force __sum16)~swab16(csum));
- skb->ip_summed = CHECKSUM_COMPLETE;
- } else {
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- }
- break;
- case IDPF_RX_PTYPE_INNER_PROT_SCTP:
+ if (csum_bits.raw_csum_inv ||
+ parsed.inner_prot == LIBIE_RX_PTYPE_INNER_SCTP) {
skb->ip_summed = CHECKSUM_UNNECESSARY;
- break;
- default:
- break;
+ return;
}

+ skb->csum = csum_unfold((__force __sum16)~swab16(csum_bits.raw_csum));
+ skb->ip_summed = CHECKSUM_COMPLETE;
+
return;

checksum_fail:
@@ -2853,7 +2817,7 @@ static void idpf_rx_splitq_extract_csum_bits(struct virtchnl2_rx_flex_desc_adv_n
* @rxq : Rx descriptor ring packet is being transacted on
* @skb : pointer to current skb being populated
* @rx_desc: Receive descriptor
- * @decoded: Decoded Rx packet type related fields
+ * @parsed: parsed Rx packet type related fields
*
* Return 0 on success and error code on failure
*
@@ -2862,21 +2826,21 @@ static void idpf_rx_splitq_extract_csum_bits(struct virtchnl2_rx_flex_desc_adv_n
*/
static int idpf_rx_rsc(struct idpf_queue *rxq, struct sk_buff *skb,
struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
- struct idpf_rx_ptype_decoded *decoded)
+ struct libie_rx_ptype_parsed parsed)
{
u16 rsc_segments, rsc_seg_len;
bool ipv4, ipv6;
int len;

- if (unlikely(!decoded->outer_ip))
+ if (unlikely(parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_L2))
return -EINVAL;

rsc_seg_len = le16_to_cpu(rx_desc->misc.rscseglen);
if (unlikely(!rsc_seg_len))
return -EINVAL;

- ipv4 = IDPF_RX_PTYPE_TO_IPV(decoded, IDPF_RX_PTYPE_OUTER_IPV4);
- ipv6 = IDPF_RX_PTYPE_TO_IPV(decoded, IDPF_RX_PTYPE_OUTER_IPV6);
+ ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
+ ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;

if (unlikely(!(ipv4 ^ ipv6)))
return -EINVAL;
@@ -2935,30 +2899,25 @@ static int idpf_rx_process_skb_fields(struct idpf_queue *rxq,
struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc)
{
struct idpf_rx_csum_decoded csum_bits = { };
- struct idpf_rx_ptype_decoded decoded;
+ struct libie_rx_ptype_parsed parsed;
u16 rx_ptype;

rx_ptype = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_PTYPE_M,
le16_to_cpu(rx_desc->ptype_err_fflags0));

- decoded = rxq->vport->rx_ptype_lkup[rx_ptype];
- /* If we don't know the ptype we can't do anything else with it. Just
- * pass it up the stack as-is.
- */
- if (!decoded.known)
- return 0;
+ parsed = rxq->vport->rx_ptype_lkup[rx_ptype];

/* process RSS/hash */
- idpf_rx_hash(rxq, skb, rx_desc, &decoded);
+ idpf_rx_hash(rxq, skb, rx_desc, parsed);

skb->protocol = eth_type_trans(skb, rxq->vport->netdev);

if (FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_RSC_M,
le16_to_cpu(rx_desc->hdrlen_flags)))
- return idpf_rx_rsc(rxq, skb, rx_desc, &decoded);
+ return idpf_rx_rsc(rxq, skb, rx_desc, parsed);

idpf_rx_splitq_extract_csum_bits(rx_desc, &csum_bits);
- idpf_rx_csum(rxq, skb, &csum_bits, &decoded);
+ idpf_rx_csum(rxq, skb, csum_bits, parsed);

return 0;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index e0660ede58ff..f082d3edeb9c 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -4,6 +4,8 @@
#ifndef _IDPF_TXRX_H_
#define _IDPF_TXRX_H_

+#include <linux/net/intel/libie/rx.h>
+
#include <net/page_pool/helpers.h>
#include <net/tcp.h>
#include <net/netdev_queues.h>
@@ -346,72 +348,6 @@ struct idpf_rx_buf {
#define IDPF_RX_MAX_BASE_PTYPE 256
#define IDPF_INVALID_PTYPE_ID 0xFFFF

-/* Packet type non-ip values */
-enum idpf_rx_ptype_l2 {
- IDPF_RX_PTYPE_L2_RESERVED = 0,
- IDPF_RX_PTYPE_L2_MAC_PAY2 = 1,
- IDPF_RX_PTYPE_L2_TIMESYNC_PAY2 = 2,
- IDPF_RX_PTYPE_L2_FIP_PAY2 = 3,
- IDPF_RX_PTYPE_L2_OUI_PAY2 = 4,
- IDPF_RX_PTYPE_L2_MACCNTRL_PAY2 = 5,
- IDPF_RX_PTYPE_L2_LLDP_PAY2 = 6,
- IDPF_RX_PTYPE_L2_ECP_PAY2 = 7,
- IDPF_RX_PTYPE_L2_EVB_PAY2 = 8,
- IDPF_RX_PTYPE_L2_QCN_PAY2 = 9,
- IDPF_RX_PTYPE_L2_EAPOL_PAY2 = 10,
- IDPF_RX_PTYPE_L2_ARP = 11,
-};
-
-enum idpf_rx_ptype_outer_ip {
- IDPF_RX_PTYPE_OUTER_L2 = 0,
- IDPF_RX_PTYPE_OUTER_IP = 1,
-};
-
-#define IDPF_RX_PTYPE_TO_IPV(ptype, ipv) \
- (((ptype)->outer_ip == IDPF_RX_PTYPE_OUTER_IP) && \
- ((ptype)->outer_ip_ver == (ipv)))
-
-enum idpf_rx_ptype_outer_ip_ver {
- IDPF_RX_PTYPE_OUTER_NONE = 0,
- IDPF_RX_PTYPE_OUTER_IPV4 = 1,
- IDPF_RX_PTYPE_OUTER_IPV6 = 2,
-};
-
-enum idpf_rx_ptype_outer_fragmented {
- IDPF_RX_PTYPE_NOT_FRAG = 0,
- IDPF_RX_PTYPE_FRAG = 1,
-};
-
-enum idpf_rx_ptype_tunnel_type {
- IDPF_RX_PTYPE_TUNNEL_NONE = 0,
- IDPF_RX_PTYPE_TUNNEL_IP_IP = 1,
- IDPF_RX_PTYPE_TUNNEL_IP_GRENAT = 2,
- IDPF_RX_PTYPE_TUNNEL_IP_GRENAT_MAC = 3,
- IDPF_RX_PTYPE_TUNNEL_IP_GRENAT_MAC_VLAN = 4,
-};
-
-enum idpf_rx_ptype_tunnel_end_prot {
- IDPF_RX_PTYPE_TUNNEL_END_NONE = 0,
- IDPF_RX_PTYPE_TUNNEL_END_IPV4 = 1,
- IDPF_RX_PTYPE_TUNNEL_END_IPV6 = 2,
-};
-
-enum idpf_rx_ptype_inner_prot {
- IDPF_RX_PTYPE_INNER_PROT_NONE = 0,
- IDPF_RX_PTYPE_INNER_PROT_UDP = 1,
- IDPF_RX_PTYPE_INNER_PROT_TCP = 2,
- IDPF_RX_PTYPE_INNER_PROT_SCTP = 3,
- IDPF_RX_PTYPE_INNER_PROT_ICMP = 4,
- IDPF_RX_PTYPE_INNER_PROT_TIMESYNC = 5,
-};
-
-enum idpf_rx_ptype_payload_layer {
- IDPF_RX_PTYPE_PAYLOAD_LAYER_NONE = 0,
- IDPF_RX_PTYPE_PAYLOAD_LAYER_PAY2 = 1,
- IDPF_RX_PTYPE_PAYLOAD_LAYER_PAY3 = 2,
- IDPF_RX_PTYPE_PAYLOAD_LAYER_PAY4 = 3,
-};
-
enum idpf_tunnel_state {
IDPF_PTYPE_TUNNEL_IP = BIT(0),
IDPF_PTYPE_TUNNEL_IP_GRENAT = BIT(1),
@@ -419,22 +355,9 @@ enum idpf_tunnel_state {
};

struct idpf_ptype_state {
- bool outer_ip;
- bool outer_frag;
- u8 tunnel_state;
-};
-
-struct idpf_rx_ptype_decoded {
- u32 ptype:10;
- u32 known:1;
- u32 outer_ip:1;
- u32 outer_ip_ver:2;
- u32 outer_frag:1;
- u32 tunnel_type:3;
- u32 tunnel_end_prot:2;
- u32 tunnel_end_frag:1;
- u32 inner_prot:4;
- u32 payload_layer:3;
+ bool outer_ip:1;
+ bool outer_frag:1;
+ u8 tunnel_state:6;
};

/**
@@ -1014,7 +937,6 @@ int idpf_vport_intr_alloc(struct idpf_vport *vport);
void idpf_vport_intr_update_itr_ena_irq(struct idpf_q_vector *q_vector);
void idpf_vport_intr_deinit(struct idpf_vport *vport);
int idpf_vport_intr_init(struct idpf_vport *vport);
-enum pkt_hash_types idpf_ptype_to_htype(const struct idpf_rx_ptype_decoded *decoded);
int idpf_config_rss(struct idpf_vport *vport);
int idpf_init_rss(struct idpf_vport *vport);
void idpf_deinit_rss(struct idpf_vport *vport);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index d0cdd63b3d5b..98c904f4dcf5 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -2614,39 +2614,52 @@ int idpf_send_get_set_rss_key_msg(struct idpf_vport *vport, bool get)
* @frag: fragmentation allowed
*
*/
-static void idpf_fill_ptype_lookup(struct idpf_rx_ptype_decoded *ptype,
+static void idpf_fill_ptype_lookup(struct libie_rx_ptype_parsed *ptype,
struct idpf_ptype_state *pstate,
bool ipv4, bool frag)
{
if (!pstate->outer_ip || !pstate->outer_frag) {
- ptype->outer_ip = IDPF_RX_PTYPE_OUTER_IP;
pstate->outer_ip = true;

if (ipv4)
- ptype->outer_ip_ver = IDPF_RX_PTYPE_OUTER_IPV4;
+ ptype->outer_ip = LIBIE_RX_PTYPE_OUTER_IPV4;
else
- ptype->outer_ip_ver = IDPF_RX_PTYPE_OUTER_IPV6;
+ ptype->outer_ip = LIBIE_RX_PTYPE_OUTER_IPV6;

if (frag) {
- ptype->outer_frag = IDPF_RX_PTYPE_FRAG;
+ ptype->outer_frag = LIBIE_RX_PTYPE_FRAG;
pstate->outer_frag = true;
}
} else {
- ptype->tunnel_type = IDPF_RX_PTYPE_TUNNEL_IP_IP;
+ ptype->tunnel_type = LIBIE_RX_PTYPE_TUNNEL_IP_IP;
pstate->tunnel_state = IDPF_PTYPE_TUNNEL_IP;

if (ipv4)
ptype->tunnel_end_prot =
- IDPF_RX_PTYPE_TUNNEL_END_IPV4;
+ LIBIE_RX_PTYPE_TUNNEL_END_IPV4;
else
ptype->tunnel_end_prot =
- IDPF_RX_PTYPE_TUNNEL_END_IPV6;
+ LIBIE_RX_PTYPE_TUNNEL_END_IPV6;

if (frag)
- ptype->tunnel_end_frag = IDPF_RX_PTYPE_FRAG;
+ ptype->tunnel_end_frag = LIBIE_RX_PTYPE_FRAG;
}
}

+static void idpf_finalize_ptype_lookup(struct libie_rx_ptype_parsed *ptype)
+{
+ if (ptype->payload_layer == LIBIE_RX_PTYPE_PAYLOAD_L2 &&
+ ptype->inner_prot)
+ ptype->payload_layer = LIBIE_RX_PTYPE_PAYLOAD_L4;
+ else if (ptype->payload_layer == LIBIE_RX_PTYPE_PAYLOAD_L2 &&
+ ptype->outer_ip)
+ ptype->payload_layer = LIBIE_RX_PTYPE_PAYLOAD_L3;
+ else if (ptype->outer_ip == LIBIE_RX_PTYPE_OUTER_L2)
+ ptype->payload_layer = LIBIE_RX_PTYPE_PAYLOAD_L2;
+ else
+ ptype->payload_layer = LIBIE_RX_PTYPE_PAYLOAD_NONE;
+}
+
/**
* idpf_send_get_rx_ptype_msg - Send virtchnl for ptype info
* @vport: virtual port data structure
@@ -2655,7 +2668,7 @@ static void idpf_fill_ptype_lookup(struct idpf_rx_ptype_decoded *ptype,
*/
int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
{
- struct idpf_rx_ptype_decoded *ptype_lkup = vport->rx_ptype_lkup;
+ struct libie_rx_ptype_parsed *ptype_lkup = vport->rx_ptype_lkup;
struct virtchnl2_get_ptype_info get_ptype_info;
int max_ptype, ptypes_recvd = 0, ptype_offset;
struct idpf_adapter *adapter = vport->adapter;
@@ -2736,9 +2749,6 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
else
k = ptype->ptype_id_8;

- if (ptype->proto_id_count)
- ptype_lkup[k].known = 1;
-
for (j = 0; j < ptype->proto_id_count; j++) {
id = le16_to_cpu(ptype->proto_id[j]);
switch (id) {
@@ -2746,18 +2756,18 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
if (pstate.tunnel_state ==
IDPF_PTYPE_TUNNEL_IP) {
ptype_lkup[k].tunnel_type =
- IDPF_RX_PTYPE_TUNNEL_IP_GRENAT;
+ LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT;
pstate.tunnel_state |=
IDPF_PTYPE_TUNNEL_IP_GRENAT;
}
break;
case VIRTCHNL2_PROTO_HDR_MAC:
ptype_lkup[k].outer_ip =
- IDPF_RX_PTYPE_OUTER_L2;
+ LIBIE_RX_PTYPE_OUTER_L2;
if (pstate.tunnel_state ==
IDPF_TUN_IP_GRE) {
ptype_lkup[k].tunnel_type =
- IDPF_RX_PTYPE_TUNNEL_IP_GRENAT_MAC;
+ LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT_MAC;
pstate.tunnel_state |=
IDPF_PTYPE_TUNNEL_IP_GRENAT_MAC;
}
@@ -2784,23 +2794,23 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
break;
case VIRTCHNL2_PROTO_HDR_UDP:
ptype_lkup[k].inner_prot =
- IDPF_RX_PTYPE_INNER_PROT_UDP;
+ LIBIE_RX_PTYPE_INNER_UDP;
break;
case VIRTCHNL2_PROTO_HDR_TCP:
ptype_lkup[k].inner_prot =
- IDPF_RX_PTYPE_INNER_PROT_TCP;
+ LIBIE_RX_PTYPE_INNER_TCP;
break;
case VIRTCHNL2_PROTO_HDR_SCTP:
ptype_lkup[k].inner_prot =
- IDPF_RX_PTYPE_INNER_PROT_SCTP;
+ LIBIE_RX_PTYPE_INNER_SCTP;
break;
case VIRTCHNL2_PROTO_HDR_ICMP:
ptype_lkup[k].inner_prot =
- IDPF_RX_PTYPE_INNER_PROT_ICMP;
+ LIBIE_RX_PTYPE_INNER_ICMP;
break;
case VIRTCHNL2_PROTO_HDR_PAY:
ptype_lkup[k].payload_layer =
- IDPF_RX_PTYPE_PAYLOAD_LAYER_PAY2;
+ LIBIE_RX_PTYPE_PAYLOAD_L2;
break;
case VIRTCHNL2_PROTO_HDR_ICMPV6:
case VIRTCHNL2_PROTO_HDR_IPV6_EH:
@@ -2854,6 +2864,8 @@ int idpf_send_get_rx_ptype_msg(struct idpf_vport *vport)
break;
}
}
+
+ idpf_finalize_ptype_lookup(&ptype_lkup[k]);
}
}

--
2.43.0


2023-12-23 02:59:21

by Alexander Lobakin

Subject: [PATCH RFC net-next 02/34] idpf: pack &idpf_queue way more efficiently

Currently, sizeof(struct idpf_queue) is 32 KB.
This is due to the 12-bit hashtable declared at the end of the queue.
This hashtable is needed only for Tx queues, and only when the flow
scheduling mode is enabled. But &idpf_queue is shared by all of the
queue types, which leads to excessive memory usage.
Instead, allocate those hashtables dynamically, only when needed, at Tx
queue initialization time.
Next, reshuffle the queue fields to reduce holes and ensure better
cacheline locality. Ideally, &idpf_queue should be split into 4, as lots
of fields are used by only 1 or 2 queue types, but for now, just
unionize as much as we can.
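
A minimal sketch of the idea, using the struct and field names from the
diff below (an illustration, not the exact final code):

    /* the 2^12-entry hashtable moves out of &idpf_queue ... */
    struct idpf_txq_hash {
            DECLARE_HASHTABLE(hash, 12);
    };

    /* ... and the queue keeps only a pointer, assigned at init time when
     * flow scheduling is enabled in splitq mode
     */
    q->sched_buf_hash = &tx_qgrp->hashes[j];
    hash_init(q->sched_buf_hash->hash);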

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 58 ++++++++++++-----
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 72 ++++++++++++++-------
2 files changed, 91 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 70785f9afadd..d81eff39a632 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -179,6 +179,9 @@ static int idpf_tx_buf_alloc_all(struct idpf_queue *tx_q)
for (i = 0; i < tx_q->desc_count; i++)
tx_q->tx_buf[i].compl_tag = IDPF_SPLITQ_TX_INVAL_COMPL_TAG;

+ if (!test_bit(__IDPF_Q_FLOW_SCH_EN, tx_q->flags))
+ return 0;
+
/* Initialize tx buf stack for out-of-order completions if
* flow scheduling offload is enabled
*/
@@ -801,11 +804,16 @@ static int idpf_rx_desc_alloc_all(struct idpf_vport *vport)
*/
static void idpf_txq_group_rel(struct idpf_vport *vport)
{
+ bool split, flow_sch_en;
int i, j;

if (!vport->txq_grps)
return;

+ split = idpf_is_queue_model_split(vport->txq_model);
+ flow_sch_en = !idpf_is_cap_ena(vport->adapter, IDPF_OTHER_CAPS,
+ VIRTCHNL2_CAP_SPLITQ_QSCHED);
+
for (i = 0; i < vport->num_txq_grp; i++) {
struct idpf_txq_group *txq_grp = &vport->txq_grps[i];

@@ -813,8 +821,15 @@ static void idpf_txq_group_rel(struct idpf_vport *vport)
kfree(txq_grp->txqs[j]);
txq_grp->txqs[j] = NULL;
}
+
+ if (!split)
+ continue;
+
kfree(txq_grp->complq);
txq_grp->complq = NULL;
+
+ if (flow_sch_en)
+ kfree(txq_grp->hashes);
}
kfree(vport->txq_grps);
vport->txq_grps = NULL;
@@ -1157,20 +1172,22 @@ static void idpf_rxq_set_descids(struct idpf_vport *vport, struct idpf_queue *q)
*/
static int idpf_txq_group_alloc(struct idpf_vport *vport, u16 num_txq)
{
- bool flow_sch_en;
- int err, i;
+ bool split, flow_sch_en;
+ int i;

vport->txq_grps = kcalloc(vport->num_txq_grp,
sizeof(*vport->txq_grps), GFP_KERNEL);
if (!vport->txq_grps)
return -ENOMEM;

+ split = idpf_is_queue_model_split(vport->txq_model);
flow_sch_en = !idpf_is_cap_ena(vport->adapter, IDPF_OTHER_CAPS,
VIRTCHNL2_CAP_SPLITQ_QSCHED);

for (i = 0; i < vport->num_txq_grp; i++) {
struct idpf_txq_group *tx_qgrp = &vport->txq_grps[i];
struct idpf_adapter *adapter = vport->adapter;
+ struct idpf_txq_hash *hashes;
int j;

tx_qgrp->vport = vport;
@@ -1179,10 +1196,16 @@ static int idpf_txq_group_alloc(struct idpf_vport *vport, u16 num_txq)
for (j = 0; j < tx_qgrp->num_txq; j++) {
tx_qgrp->txqs[j] = kzalloc(sizeof(*tx_qgrp->txqs[j]),
GFP_KERNEL);
- if (!tx_qgrp->txqs[j]) {
- err = -ENOMEM;
+ if (!tx_qgrp->txqs[j])
goto err_alloc;
- }
+ }
+
+ if (split && flow_sch_en) {
+ hashes = kcalloc(num_txq, sizeof(*hashes), GFP_KERNEL);
+ if (!hashes)
+ goto err_alloc;
+
+ tx_qgrp->hashes = hashes;
}

for (j = 0; j < tx_qgrp->num_txq; j++) {
@@ -1194,22 +1217,26 @@ static int idpf_txq_group_alloc(struct idpf_vport *vport, u16 num_txq)
q->tx_min_pkt_len = idpf_get_min_tx_pkt_len(adapter);
q->vport = vport;
q->txq_grp = tx_qgrp;
- hash_init(q->sched_buf_hash);

- if (flow_sch_en)
- set_bit(__IDPF_Q_FLOW_SCH_EN, q->flags);
+ if (!flow_sch_en)
+ continue;
+
+ if (split) {
+ q->sched_buf_hash = &hashes[j];
+ hash_init(q->sched_buf_hash->hash);
+ }
+
+ set_bit(__IDPF_Q_FLOW_SCH_EN, q->flags);
}

- if (!idpf_is_queue_model_split(vport->txq_model))
+ if (!split)
continue;

tx_qgrp->complq = kcalloc(IDPF_COMPLQ_PER_GROUP,
sizeof(*tx_qgrp->complq),
GFP_KERNEL);
- if (!tx_qgrp->complq) {
- err = -ENOMEM;
+ if (!tx_qgrp->complq)
goto err_alloc;
- }

tx_qgrp->complq->dev = &adapter->pdev->dev;
tx_qgrp->complq->desc_count = vport->complq_desc_count;
@@ -1225,7 +1252,7 @@ static int idpf_txq_group_alloc(struct idpf_vport *vport, u16 num_txq)
err_alloc:
idpf_txq_group_rel(vport);

- return err;
+ return -ENOMEM;
}

/**
@@ -1512,7 +1539,7 @@ static void idpf_tx_clean_stashed_bufs(struct idpf_queue *txq, u16 compl_tag,
struct hlist_node *tmp_buf;

/* Buffer completion */
- hash_for_each_possible_safe(txq->sched_buf_hash, stash, tmp_buf,
+ hash_for_each_possible_safe(txq->sched_buf_hash->hash, stash, tmp_buf,
hlist, compl_tag) {
if (unlikely(stash->buf.compl_tag != (int)compl_tag))
continue;
@@ -1567,7 +1594,8 @@ static int idpf_stash_flow_sch_buffers(struct idpf_queue *txq,
stash->buf.compl_tag = tx_buf->compl_tag;

/* Add buffer to buf_hash table to be freed later */
- hash_add(txq->sched_buf_hash, &stash->hlist, stash->buf.compl_tag);
+ hash_add(txq->sched_buf_hash->hash, &stash->hlist,
+ stash->buf.compl_tag);

memset(tx_buf, 0, sizeof(struct idpf_tx_buf));

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index f082d3edeb9c..4a97790cbf68 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -623,7 +623,6 @@ struct idpf_queue {
struct idpf_txq_group *txq_grp;
struct idpf_rxq_group *rxq_grp;
};
- u16 idx;
void __iomem *tail;
union {
struct idpf_tx_buf *tx_buf;
@@ -634,7 +633,8 @@ struct idpf_queue {
} rx_buf;
};
struct page_pool *pp;
- struct sk_buff *skb;
+ void *desc_ring;
+ u16 idx;
u16 q_type;
u32 q_id;
u16 desc_count;
@@ -644,38 +644,57 @@ struct idpf_queue {
u16 next_to_alloc;
DECLARE_BITMAP(flags, __IDPF_Q_FLAGS_NBITS);

+ struct idpf_q_vector *q_vector;
+
union idpf_queue_stats q_stats;
struct u64_stats_sync stats_sync;

- u32 cleaned_bytes;
- u16 cleaned_pkts;
+ union {
+ /* Rx */
+ struct {
+ u64 rxdids;
+ u8 rx_buffer_low_watermark;
+ bool rx_hsplit_en:1;
+ u16 rx_hbuf_size;
+ u16 rx_buf_size;
+ u16 rx_max_pkt_size;
+ u16 rx_buf_stride;
+ };
+ /* Tx */
+ struct {
+ u32 cleaned_bytes;
+ u16 cleaned_pkts;

- bool rx_hsplit_en;
- u16 rx_hbuf_size;
- u16 rx_buf_size;
- u16 rx_max_pkt_size;
- u16 rx_buf_stride;
- u8 rx_buffer_low_watermark;
- u64 rxdids;
- struct idpf_q_vector *q_vector;
- unsigned int size;
- dma_addr_t dma;
- void *desc_ring;
+ u16 tx_max_bufs;
+ u8 tx_min_pkt_len;

- u16 tx_max_bufs;
- u8 tx_min_pkt_len;
+ u32 num_completions;

- u32 num_completions;
+ struct idpf_buf_lifo buf_stack;
+ };
+ };

- struct idpf_buf_lifo buf_stack;
+ union {
+ /* Rx */
+ struct {
+ struct sk_buff *skb;
+ };
+ /* Tx */
+ struct {
+ u16 compl_tag_bufid_m;
+ u16 compl_tag_gen_s;

- u16 compl_tag_bufid_m;
- u16 compl_tag_gen_s;
+ u16 compl_tag_cur_gen;
+ u16 compl_tag_gen_max;

- u16 compl_tag_cur_gen;
- u16 compl_tag_gen_max;
+ struct idpf_txq_hash *sched_buf_hash;
+ };
+ };

- DECLARE_HASHTABLE(sched_buf_hash, 12);
+ /* Slowpath */
+
+ dma_addr_t dma;
+ unsigned int size;
} ____cacheline_internodealigned_in_smp;

/**
@@ -768,6 +787,10 @@ struct idpf_rxq_group {
};
};

+struct idpf_txq_hash {
+ DECLARE_HASHTABLE(hash, 12);
+};
+
/**
* struct idpf_txq_group
* @vport: Vport back pointer
@@ -787,6 +810,7 @@ struct idpf_txq_group {

u16 num_txq;
struct idpf_queue *txqs[IDPF_LARGE_MAX_Q];
+ struct idpf_txq_hash *hashes;

struct idpf_queue *complq;

--
2.43.0


2023-12-23 02:59:44

by Alexander Lobakin

Subject: [PATCH RFC net-next 03/34] idpf: remove legacy Page Pool Ethtool stats

Page Pool Ethtool stats have been deprecated since the introduction of
the Netlink Page Pool interface.
idpf receives big changes to its Rx buffer management in this series,
including the &page_pool layout, so keeping these deprecated stats does
more harm than good, not to mention that CONFIG_IDPF selects
CONFIG_PAGE_POOL_STATS unconditionally, while the latter is often turned
off for better performance.
Remove all the references to PP stats from the Ethtool code. The stats
are still available in full via the generic Netlink interface.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/Kconfig | 1 -
.../net/ethernet/intel/idpf/idpf_ethtool.c | 29 +------------------
2 files changed, 1 insertion(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index 0db1aa36866e..d2e9bef2e0cb 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -380,7 +380,6 @@ config IDPF
select DIMLIB
select LIBIE
select PAGE_POOL
- select PAGE_POOL_STATS
help
This driver supports Intel(R) Infrastructure Data Path Function
devices.
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index 986d429d1175..da7963f27bd8 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -571,8 +571,6 @@ static void idpf_get_stat_strings(struct net_device *netdev, u8 *data)
for (i = 0; i < vport_config->max_q.max_rxq; i++)
idpf_add_qstat_strings(&data, idpf_gstrings_rx_queue_stats,
"rx", i);
-
- page_pool_ethtool_stats_get_strings(data);
}

/**
@@ -606,7 +604,6 @@ static int idpf_get_sset_count(struct net_device *netdev, int sset)
struct idpf_netdev_priv *np = netdev_priv(netdev);
struct idpf_vport_config *vport_config;
u16 max_txq, max_rxq;
- unsigned int size;

if (sset != ETH_SS_STATS)
return -EINVAL;
@@ -625,11 +622,8 @@ static int idpf_get_sset_count(struct net_device *netdev, int sset)
max_txq = vport_config->max_q.max_txq;
max_rxq = vport_config->max_q.max_rxq;

- size = IDPF_PORT_STATS_LEN + (IDPF_TX_QUEUE_STATS_LEN * max_txq) +
+ return IDPF_PORT_STATS_LEN + (IDPF_TX_QUEUE_STATS_LEN * max_txq) +
(IDPF_RX_QUEUE_STATS_LEN * max_rxq);
- size += page_pool_ethtool_stats_get_count();
-
- return size;
}

/**
@@ -877,7 +871,6 @@ static void idpf_get_ethtool_stats(struct net_device *netdev,
{
struct idpf_netdev_priv *np = netdev_priv(netdev);
struct idpf_vport_config *vport_config;
- struct page_pool_stats pp_stats = { };
struct idpf_vport *vport;
unsigned int total = 0;
unsigned int i, j;
@@ -947,32 +940,12 @@ static void idpf_get_ethtool_stats(struct net_device *netdev,
idpf_add_empty_queue_stats(&data, qtype);
else
idpf_add_queue_stats(&data, rxq);
-
- /* In splitq mode, don't get page pool stats here since
- * the pools are attached to the buffer queues
- */
- if (is_splitq)
- continue;
-
- if (rxq)
- page_pool_get_stats(rxq->pp, &pp_stats);
- }
- }
-
- for (i = 0; i < vport->num_rxq_grp; i++) {
- for (j = 0; j < vport->num_bufqs_per_qgrp; j++) {
- struct idpf_queue *rxbufq =
- &vport->rxq_grps[i].splitq.bufq_sets[j].bufq;
-
- page_pool_get_stats(rxbufq->pp, &pp_stats);
}
}

for (; total < vport_config->max_q.max_rxq; total++)
idpf_add_empty_queue_stats(&data, VIRTCHNL2_QUEUE_TYPE_RX);

- page_pool_ethtool_stats_get(data, &pp_stats);
-
rcu_read_unlock();

idpf_vport_ctrl_unlock(netdev);
--
2.43.0


2023-12-23 03:00:02

by Alexander Lobakin

Subject: [PATCH RFC net-next 04/34] libie: support different types of buffers for Rx

Unlike previous generations, idpf requires more buffer types for optimal
performance. These include: header buffers, short buffers, and
no-overhead buffers (w/o headroom and tailroom, for TCP zerocopy when
header split is enabled).
Introduce a libie Rx buffer type enum and calculate the page_pool params
accordingly. All the HW-related details like buffer alignment are still
accounted for. For the header buffers, pick 256 bytes, as in most places
in the kernel (have you ever seen frames with bigger headers?).

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/libie/rx.c | 107 +++++++++++++++++++++++---
include/linux/net/intel/libie/rx.h | 19 +++++
2 files changed, 115 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/libie/rx.c b/drivers/net/ethernet/intel/libie/rx.c
index 610f16043bcf..3d3b19d2b40d 100644
--- a/drivers/net/ethernet/intel/libie/rx.c
+++ b/drivers/net/ethernet/intel/libie/rx.c
@@ -6,14 +6,14 @@
/* Rx buffer management */

/**
- * libie_rx_hw_len - get the actual buffer size to be passed to HW
+ * libie_rx_hw_len_mtu - get the actual buffer size to be passed to HW
* @pp: &page_pool_params of the netdev to calculate the size for
*
* Return: HW-writeable length per one buffer to pass it to the HW accounting:
* MTU the @dev has, HW required alignment, minimum and maximum allowed values,
* and system's page size.
*/
-static u32 libie_rx_hw_len(const struct page_pool_params *pp)
+static u32 libie_rx_hw_len_mtu(const struct page_pool_params *pp)
{
u32 len;

@@ -24,6 +24,96 @@ static u32 libie_rx_hw_len(const struct page_pool_params *pp)
return len;
}

+/**
+ * libie_rx_hw_len_truesize - get the short buffer size to be passed to HW
+ * @pp: &page_pool_params of the netdev to calculate the size for
+ * @truesize: desired truesize for the buffers
+ *
+ * Return: HW-writeable length per one buffer to pass it to the HW ignoring the
+ * MTU and closest to the passed truesize. Can be used for "short" buffer
+ * queues to fragment pages more efficiently.
+ */
+static u32 libie_rx_hw_len_truesize(const struct page_pool_params *pp,
+ u32 truesize)
+{
+ u32 min, len;
+
+ min = SKB_HEAD_ALIGN(pp->offset + LIBIE_RX_BUF_LEN_ALIGN);
+ truesize = clamp(roundup_pow_of_two(truesize), roundup_pow_of_two(min),
+ PAGE_SIZE << LIBIE_RX_PAGE_ORDER);
+
+ len = SKB_WITH_OVERHEAD(truesize - pp->offset);
+ len = ALIGN_DOWN(len, LIBIE_RX_BUF_LEN_ALIGN);
+ len = clamp(len, LIBIE_MIN_RX_BUF_LEN, pp->max_len);
+
+ return len;
+}
+
+static void libie_rx_page_pool_params(struct libie_buf_queue *bq,
+ struct page_pool_params *pp)
+{
+ pp->offset = LIBIE_SKB_HEADROOM;
+ /* HW-writeable / syncable length per one page */
+ pp->max_len = LIBIE_RX_BUF_LEN(pp->offset);
+
+ /* HW-writeable length per buffer */
+ switch (bq->type) {
+ case LIBIE_RX_BUF_MTU:
+ bq->rx_buf_len = libie_rx_hw_len_mtu(pp);
+ break;
+ case LIBIE_RX_BUF_SHORT:
+ bq->rx_buf_len = libie_rx_hw_len_truesize(pp, bq->truesize);
+ break;
+ case LIBIE_RX_BUF_HDR:
+ bq->rx_buf_len = ALIGN(LIBIE_MAX_HEAD, LIBIE_RX_BUF_LEN_ALIGN);
+ break;
+ default:
+ break;
+ }
+
+ /* Buffer size to allocate */
+ bq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp->offset +
+ bq->rx_buf_len));
+}
+
+/**
+ * libie_rx_page_pool_params_zc - calculate params without the stack overhead
+ * @bq: buffer queue to calculate the size for
+ * @pp: &page_pool_params of the netdev
+ *
+ * Adjusts the PP params to exclude the stack overhead and sets both the buffer
+ * length and the truesize, which are equal for the data buffers. Note that this
+ * requires separate header buffers to be always active and account the
+ * overhead.
+ * With the MTU == ``PAGE_SIZE``, this allows the kernel to enable the zerocopy
+ * mode.
+ */
+static bool libie_rx_page_pool_params_zc(struct libie_buf_queue *bq,
+ struct page_pool_params *pp)
+{
+ u32 mtu;
+
+ pp->offset = 0;
+ pp->max_len = PAGE_SIZE << LIBIE_RX_PAGE_ORDER;
+
+ switch (bq->type) {
+ case LIBIE_RX_BUF_MTU:
+ mtu = READ_ONCE(pp->netdev->mtu);
+ break;
+ case LIBIE_RX_BUF_SHORT:
+ mtu = bq->truesize;
+ break;
+ default:
+ return false;
+ }
+
+ bq->rx_buf_len = clamp(roundup_pow_of_two(mtu), LIBIE_RX_BUF_LEN_ALIGN,
+ pp->max_len);
+ bq->truesize = bq->rx_buf_len;
+
+ return true;
+}
+
/**
* libie_rx_page_pool_create - create a PP with the default libie settings
* @bq: buffer queue struct to fill
@@ -43,17 +133,12 @@ int libie_rx_page_pool_create(struct libie_buf_queue *bq,
.netdev = napi->dev,
.napi = napi,
.dma_dir = DMA_FROM_DEVICE,
- .offset = LIBIE_SKB_HEADROOM,
};

- /* HW-writeable / syncable length per one page */
- pp.max_len = LIBIE_RX_BUF_LEN(pp.offset);
-
- /* HW-writeable length per buffer */
- bq->rx_buf_len = libie_rx_hw_len(&pp);
- /* Buffer size to allocate */
- bq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset +
- bq->rx_buf_len));
+ if (!bq->hsplit)
+ libie_rx_page_pool_params(bq, &pp);
+ else if (!libie_rx_page_pool_params_zc(bq, &pp))
+ return -EINVAL;

bq->pp = page_pool_create(&pp);

diff --git a/include/linux/net/intel/libie/rx.h b/include/linux/net/intel/libie/rx.h
index 0d6bce19ad6b..87ad8f9e89c7 100644
--- a/include/linux/net/intel/libie/rx.h
+++ b/include/linux/net/intel/libie/rx.h
@@ -19,6 +19,8 @@
#define LIBIE_MAX_HEADROOM LIBIE_SKB_HEADROOM
/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
#define LIBIE_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
+/* Maximum supported L2-L4 header length */
+#define LIBIE_MAX_HEAD 256

/* Always use order-0 pages */
#define LIBIE_RX_PAGE_ORDER 0
@@ -64,6 +66,18 @@ struct libie_rx_buffer {
u32 truesize;
};

+/**
+ * enum libie_rx_buf_type - enum representing types of Rx buffers
+ * @LIBIE_RX_BUF_MTU: buffer size is determined by MTU
+ * @LIBIE_RX_BUF_SHORT: buffer size is smaller than MTU, for short frames
+ * @LIBIE_RX_BUF_HDR: buffer size is ``LIBIE_MAX_HEAD``-sized, for headers
+ */
+enum libie_rx_buf_type {
+ LIBIE_RX_BUF_MTU = 0U,
+ LIBIE_RX_BUF_SHORT,
+ LIBIE_RX_BUF_HDR,
+};
+
/**
* struct libie_buf_queue - structure representing a buffer queue
* @pp: &page_pool for buffer management
@@ -71,6 +85,8 @@ struct libie_rx_buffer {
* @truesize: size to allocate per buffer, w/overhead
* @count: number of descriptors/buffers the queue has
* @rx_buf_len: HW-writeable length per each buffer
+ * @type: type of the buffers this queue has
+ * @hsplit: flag whether header split is enabled
*/
struct libie_buf_queue {
struct page_pool *pp;
@@ -81,6 +97,9 @@ struct libie_buf_queue {

/* Cold fields */
u32 rx_buf_len;
+ enum libie_rx_buf_type type:2;
+
+ bool hsplit:1;
};

int libie_rx_page_pool_create(struct libie_buf_queue *bq,
--
2.43.0


2023-12-23 03:00:22

by Alexander Lobakin

Subject: [PATCH RFC net-next 05/34] idpf: convert header split mode to libie + napi_build_skb()

Currently, idpf uses the following model for the header buffers:

* buffers are allocated via dma_alloc_coherent();
* when receiving, napi_alloc_skb() is called and then the header is
copied to the newly allocated linear part.

This is far from optimal, as the DMA coherent zone is slow on many
systems and the memcpy() negates the idea and benefits of header split.
Instead, use libie to create page_pools for the header buffers, allocate
them dynamically and then build an skb around them via napi_build_skb()
with no memory copy. With one exception...
When you enable header split, you expect to always have a separate
header buffer, so that you can reserve headroom and tailroom only there
and then use full buffers for the data. For example, this is how TCP
zerocopy works -- you have to have the payload aligned to PAGE_SIZE.
The current hardware running idpf does *not* guarantee that you'll
always have headers placed separately. For example, on my setup, even
ICMP packets are written as one piece to the data buffers. You can't
build a valid skb around a data buffer in this case.
To not complicate things and not lose TCP zerocopy etc., when this
happens, use the empty header buffer, pull either the full frame (if
it's short) or just the Ethernet header into it, and build an skb around
it. The GRO layer will pull more from the data buffer later. This
workaround will hopefully be removed one day.
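
The resulting header path is roughly the sketch below (it follows
idpf_rx_build_skb() from the diff; @buf is the libie header buffer and
@size the HW-written length):

    u32 hr = buf->page->pp->p.offset;       /* headroom reserved by libie */
    void *va = page_address(buf->page) + buf->offset;
    struct sk_buff *skb;

    skb = napi_build_skb(va, buf->truesize);
    if (unlikely(!skb))
            return NULL;

    skb_mark_for_recycle(skb);
    skb_reserve(skb, hr);
    __skb_put(skb, size);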

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 2 +
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 219 +++++++++++-------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 23 +-
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 14 +-
4 files changed, 159 insertions(+), 99 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
index e58e08c9997d..53ff572ce252 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -868,6 +868,8 @@ static void idpf_rx_singleq_process_skb_fields(struct idpf_queue *rx_q,
idpf_rx_singleq_flex_hash(rx_q, skb, rx_desc, parsed);
idpf_rx_singleq_flex_csum(rx_q, skb, rx_desc, parsed);
}
+
+ skb_record_rx_queue(skb, rx_q->idx);
}

/**
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index d81eff39a632..f696fd9839fc 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -331,18 +331,17 @@ static int idpf_tx_desc_alloc_all(struct idpf_vport *vport)

/**
* idpf_rx_page_rel - Release an rx buffer page
- * @rxq: the queue that owns the buffer
* @rx_buf: the buffer to free
*/
-static void idpf_rx_page_rel(struct idpf_queue *rxq, struct idpf_rx_buf *rx_buf)
+static void idpf_rx_page_rel(struct libie_rx_buffer *rx_buf)
{
if (unlikely(!rx_buf->page))
return;

- page_pool_put_full_page(rxq->pp, rx_buf->page, false);
+ page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);

rx_buf->page = NULL;
- rx_buf->page_offset = 0;
+ rx_buf->offset = 0;
}

/**
@@ -351,13 +350,17 @@ static void idpf_rx_page_rel(struct idpf_queue *rxq, struct idpf_rx_buf *rx_buf)
*/
static void idpf_rx_hdr_buf_rel_all(struct idpf_queue *rxq)
{
- struct idpf_adapter *adapter = rxq->vport->adapter;
+ struct libie_buf_queue bq = {
+ .pp = rxq->hdr_pp,
+ };
+
+ for (u32 i = 0; i < rxq->desc_count; i++)
+ idpf_rx_page_rel(&rxq->rx_buf.hdr_buf[i]);
+
+ libie_rx_page_pool_destroy(&bq);
+ rxq->hdr_pp = bq.pp;

- dma_free_coherent(&adapter->pdev->dev,
- rxq->desc_count * IDPF_HDR_BUF_SIZE,
- rxq->rx_buf.hdr_buf_va,
- rxq->rx_buf.hdr_buf_pa);
- rxq->rx_buf.hdr_buf_va = NULL;
+ kfree(rxq->rx_buf.hdr_buf);
}

/**
@@ -374,7 +377,7 @@ static void idpf_rx_buf_rel_all(struct idpf_queue *rxq)

/* Free all the bufs allocated and given to hw on Rx queue */
for (i = 0; i < rxq->desc_count; i++)
- idpf_rx_page_rel(rxq, &rxq->rx_buf.buf[i]);
+ idpf_rx_page_rel(&rxq->rx_buf.buf[i]);

if (rxq->rx_hsplit_en)
idpf_rx_hdr_buf_rel_all(rxq);
@@ -484,17 +487,33 @@ void idpf_rx_buf_hw_update(struct idpf_queue *rxq, u32 val)
*/
static int idpf_rx_hdr_buf_alloc_all(struct idpf_queue *rxq)
{
- struct idpf_adapter *adapter = rxq->vport->adapter;
-
- rxq->rx_buf.hdr_buf_va =
- dma_alloc_coherent(&adapter->pdev->dev,
- IDPF_HDR_BUF_SIZE * rxq->desc_count,
- &rxq->rx_buf.hdr_buf_pa,
- GFP_KERNEL);
- if (!rxq->rx_buf.hdr_buf_va)
+ struct libie_buf_queue bq = {
+ .count = rxq->desc_count,
+ .type = LIBIE_RX_BUF_HDR,
+ };
+ struct libie_rx_buffer *hdr_buf;
+ int ret;
+
+ hdr_buf = kcalloc(bq.count, sizeof(*hdr_buf), GFP_KERNEL);
+ if (!hdr_buf)
return -ENOMEM;

+ rxq->rx_buf.hdr_buf = hdr_buf;
+
+ ret = libie_rx_page_pool_create(&bq, &rxq->q_vector->napi);
+ if (ret)
+ goto free_hdr;
+
+ rxq->hdr_pp = bq.pp;
+ rxq->hdr_truesize = bq.truesize;
+ rxq->rx_hbuf_size = bq.rx_buf_len;
+
return 0;
+
+free_hdr:
+ kfree(hdr_buf);
+
+ return ret;
}

/**
@@ -529,6 +548,9 @@ static void idpf_rx_post_buf_refill(struct idpf_sw_queue *refillq, u16 buf_id)
static bool idpf_rx_post_buf_desc(struct idpf_queue *bufq, u16 buf_id)
{
struct virtchnl2_splitq_rx_buf_desc *splitq_rx_desc = NULL;
+ struct libie_buf_queue bq = {
+ .count = bufq->desc_count,
+ };
u16 nta = bufq->next_to_alloc;
struct idpf_rx_buf *buf;
dma_addr_t addr;
@@ -537,9 +559,15 @@ static bool idpf_rx_post_buf_desc(struct idpf_queue *bufq, u16 buf_id)
buf = &bufq->rx_buf.buf[buf_id];

if (bufq->rx_hsplit_en) {
- splitq_rx_desc->hdr_addr =
- cpu_to_le64(bufq->rx_buf.hdr_buf_pa +
- (u32)buf_id * IDPF_HDR_BUF_SIZE);
+ bq.pp = bufq->hdr_pp;
+ bq.rx_bi = bufq->rx_buf.hdr_buf;
+ bq.truesize = bufq->hdr_truesize;
+
+ addr = libie_rx_alloc(&bq, buf_id);
+ if (addr == DMA_MAPPING_ERROR)
+ return false;
+
+ splitq_rx_desc->hdr_addr = cpu_to_le64(addr);
}

addr = idpf_alloc_page(bufq->pp, buf, bufq->rx_buf_size);
@@ -1328,11 +1356,7 @@ static int idpf_rxq_group_alloc(struct idpf_vport *vport, u16 num_rxq)
q->rx_buf_size = vport->bufq_size[j];
q->rx_buffer_low_watermark = IDPF_LOW_WATERMARK;
q->rx_buf_stride = IDPF_RX_BUF_STRIDE;
-
- if (hs) {
- q->rx_hsplit_en = true;
- q->rx_hbuf_size = IDPF_HDR_BUF_SIZE;
- }
+ q->rx_hsplit_en = hs;

bufq_set->num_refillqs = num_rxq;
bufq_set->refillqs = kcalloc(num_rxq, swq_size,
@@ -1373,10 +1397,7 @@ static int idpf_rxq_group_alloc(struct idpf_vport *vport, u16 num_rxq)
rx_qgrp->splitq.rxq_sets[j]->refillq1 =
&rx_qgrp->splitq.bufq_sets[1].refillqs[j];

- if (hs) {
- q->rx_hsplit_en = true;
- q->rx_hbuf_size = IDPF_HDR_BUF_SIZE;
- }
+ q->rx_hsplit_en = hs;

setup_rxq:
q->dev = &adapter->pdev->dev;
@@ -2947,6 +2968,8 @@ static int idpf_rx_process_skb_fields(struct idpf_queue *rxq,
idpf_rx_splitq_extract_csum_bits(rx_desc, &csum_bits);
idpf_rx_csum(rxq, skb, csum_bits, parsed);

+ skb_record_rx_queue(skb, rxq->idx);
+
return 0;
}

@@ -2964,7 +2987,7 @@ void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
unsigned int size)
{
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
- rx_buf->page_offset, size, rx_buf->truesize);
+ rx_buf->offset, size, rx_buf->truesize);

rx_buf->page = NULL;
}
@@ -2987,7 +3010,7 @@ struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
struct sk_buff *skb;
void *va;

- va = page_address(rx_buf->page) + rx_buf->page_offset;
+ va = page_address(rx_buf->page) + rx_buf->offset;

/* prefetch first cache line of first page */
net_prefetch(va);
@@ -3000,7 +3023,6 @@ struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
return NULL;
}

- skb_record_rx_queue(skb, rxq->idx);
skb_mark_for_recycle(skb);

/* Determine available headroom for copy */
@@ -3019,7 +3041,7 @@ struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
return skb;
}

- skb_add_rx_frag(skb, 0, rx_buf->page, rx_buf->page_offset + headlen,
+ skb_add_rx_frag(skb, 0, rx_buf->page, rx_buf->offset + headlen,
size, rx_buf->truesize);

/* Since we're giving the page to the stack, clear our reference to it.
@@ -3031,36 +3053,31 @@ struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
}

/**
- * idpf_rx_hdr_construct_skb - Allocate skb and populate it from header buffer
- * @rxq: Rx descriptor queue
- * @va: Rx buffer to pull data from
+ * idpf_rx_build_skb - Allocate skb and populate it from header buffer
+ * @buf: Rx buffer to pull data from
* @size: the length of the packet
*
* This function allocates an skb. It then populates it with the page data from
* the current receive descriptor, taking care to set up the skb correctly.
- * This specifically uses a header buffer to start building the skb.
*/
-static struct sk_buff *idpf_rx_hdr_construct_skb(struct idpf_queue *rxq,
- const void *va,
- unsigned int size)
+struct sk_buff *idpf_rx_build_skb(const struct libie_rx_buffer *buf, u32 size)
{
+ u32 hr = buf->page->pp->p.offset;
struct sk_buff *skb;
+ void *va;

- /* allocate a skb to store the frags */
- skb = __napi_alloc_skb(&rxq->q_vector->napi, size, GFP_ATOMIC);
+ va = page_address(buf->page) + buf->offset;
+ net_prefetch(va + hr);
+
+ skb = napi_build_skb(va, buf->truesize);
if (unlikely(!skb))
return NULL;

- skb_record_rx_queue(skb, rxq->idx);
-
- memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
-
- /* More than likely, a payload fragment, which will use a page from
- * page_pool will be added to the SKB so mark it for recycle
- * preemptively. And if not, it's inconsequential.
- */
skb_mark_for_recycle(skb);

+ skb_reserve(skb, hr);
+ __skb_put(skb, size);
+
return skb;
}

@@ -3091,6 +3108,26 @@ static bool idpf_rx_splitq_is_eop(struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_de
IDPF_RXD_EOF_SPLITQ));
}

+static u32 idpf_rx_hsplit_wa(struct libie_rx_buffer *hdr,
+ struct libie_rx_buffer *buf,
+ u32 data_len)
+{
+ u32 copy = data_len <= SMP_CACHE_BYTES ? data_len : ETH_HLEN;
+ const void *src;
+ void *dst;
+
+ if (!libie_rx_sync_for_cpu(buf, copy))
+ return 0;
+
+ dst = page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
+ src = page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
+ memcpy(dst, src, ALIGN(copy, sizeof(long)));
+
+ buf->offset += copy;
+
+ return copy;
+}
+
/**
* idpf_rx_splitq_clean - Clean completed descriptors from Rx queue
* @rxq: Rx descriptor queue to retrieve receive buffer queue
@@ -3113,16 +3150,16 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
/* Process Rx packets bounded by budget */
while (likely(total_rx_pkts < budget)) {
struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc;
+ struct idpf_rx_buf *hdr, *rx_buf = NULL;
struct idpf_sw_queue *refillq = NULL;
struct idpf_rxq_set *rxq_set = NULL;
- struct idpf_rx_buf *rx_buf = NULL;
union virtchnl2_rx_desc *desc;
unsigned int pkt_len = 0;
unsigned int hdr_len = 0;
u16 gen_id, buf_id = 0;
- /* Header buffer overflow only valid for header split */
- bool hbo = false;
int bufq_id;
+ /* Header buffer overflow only valid for header split */
+ bool hbo;
u8 rxdid;

/* get the Rx desc from Rx queue based on 'next_to_clean' */
@@ -3155,26 +3192,6 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
pkt_len = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_LEN_PBUF_M,
pkt_len);

- hbo = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_HBO_M,
- rx_desc->status_err0_qw1);
-
- if (unlikely(hbo)) {
- /* If a header buffer overflow, occurs, i.e. header is
- * too large to fit in the header split buffer, HW will
- * put the entire packet, including headers, in the
- * data/payload buffer.
- */
- u64_stats_update_begin(&rxq->stats_sync);
- u64_stats_inc(&rxq->q_stats.rx.hsplit_buf_ovf);
- u64_stats_update_end(&rxq->stats_sync);
- goto bypass_hsplit;
- }
-
- hdr_len = le16_to_cpu(rx_desc->hdrlen_flags);
- hdr_len = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_LEN_HDR_M,
- hdr_len);
-
-bypass_hsplit:
bufq_id = le16_to_cpu(rx_desc->pktlen_gen_bufq_id);
bufq_id = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_BUFQ_ID_M,
bufq_id);
@@ -3192,16 +3209,46 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)

rx_buf = &rx_bufq->rx_buf.buf[buf_id];

- if (hdr_len) {
- const void *va = (u8 *)rx_bufq->rx_buf.hdr_buf_va +
- (u32)buf_id * IDPF_HDR_BUF_SIZE;
+ if (!rx_bufq->hdr_pp)
+ goto payload;
+
+ hbo = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_HBO_M,
+ rx_desc->status_err0_qw1);
+ if (likely(!hbo))
+			/* If a header buffer overflow occurs, i.e. the header is
+ * too large to fit in the header split buffer, HW will
+ * put the entire packet, including headers, in the
+ * data/payload buffer.
+ */
+#define __HDR_LEN_MASK VIRTCHNL2_RX_FLEX_DESC_ADV_LEN_HDR_M
+ hdr_len = le16_get_bits(rx_desc->hdrlen_flags,
+ __HDR_LEN_MASK);
+#undef __HDR_LEN_MASK
+
+ hdr = &rx_bufq->rx_buf.hdr_buf[buf_id];
+
+ if (unlikely(!hdr_len && !skb)) {
+ hdr_len = idpf_rx_hsplit_wa(hdr, rx_buf, pkt_len);
+ pkt_len -= hdr_len;
+
+ u64_stats_update_begin(&rxq->stats_sync);
+ u64_stats_inc(&rxq->q_stats.rx.hsplit_buf_ovf);
+ u64_stats_update_end(&rxq->stats_sync);
+ }
+
+ if (libie_rx_sync_for_cpu(hdr, hdr_len)) {
+ skb = idpf_rx_build_skb(hdr, hdr_len);
+ if (!skb)
+ break;

- skb = idpf_rx_hdr_construct_skb(rxq, va, hdr_len);
u64_stats_update_begin(&rxq->stats_sync);
u64_stats_inc(&rxq->q_stats.rx.hsplit_pkts);
u64_stats_update_end(&rxq->stats_sync);
}

+ hdr->page = NULL;
+
+payload:
if (pkt_len) {
idpf_rx_sync_for_cpu(rx_buf, pkt_len);
if (skb)
@@ -3271,6 +3318,9 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
static int idpf_rx_update_bufq_desc(struct idpf_queue *bufq, u16 refill_desc,
struct virtchnl2_splitq_rx_buf_desc *buf_desc)
{
+ struct libie_buf_queue bq = {
+ .count = bufq->desc_count,
+ };
struct idpf_rx_buf *buf;
dma_addr_t addr;
u16 buf_id;
@@ -3289,8 +3339,15 @@ static int idpf_rx_update_bufq_desc(struct idpf_queue *bufq, u16 refill_desc,
if (!bufq->rx_hsplit_en)
return 0;

- buf_desc->hdr_addr = cpu_to_le64(bufq->rx_buf.hdr_buf_pa +
- (u32)buf_id * IDPF_HDR_BUF_SIZE);
+ bq.pp = bufq->hdr_pp;
+ bq.rx_bi = bufq->rx_buf.hdr_buf;
+ bq.truesize = bufq->hdr_truesize;
+
+ addr = libie_rx_alloc(&bq, buf_id);
+ if (addr == DMA_MAPPING_ERROR)
+ return -ENOMEM;
+
+ buf_desc->hdr_addr = cpu_to_le64(addr);

return 0;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 4a97790cbf68..357683559b57 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -99,8 +99,6 @@ do { \
#define IDPF_RX_BUF_STRIDE 32
#define IDPF_RX_BUF_POST_STRIDE 16
#define IDPF_LOW_WATERMARK 64
-/* Size of header buffer specifically for header split */
-#define IDPF_HDR_BUF_SIZE 256
#define IDPF_PACKET_HDR_PAD \
(ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN * 2)
#define IDPF_TX_TSO_MIN_MSS 88
@@ -315,16 +313,10 @@ struct idpf_rx_extracted {
#define IDPF_TX_MAX_DESC_DATA_ALIGNED \
ALIGN_DOWN(IDPF_TX_MAX_DESC_DATA, IDPF_TX_MAX_READ_REQ_SIZE)

-#define IDPF_RX_DMA_ATTR \
- (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
#define IDPF_RX_DESC(rxq, i) \
(&(((union virtchnl2_rx_desc *)((rxq)->desc_ring))[i]))

-struct idpf_rx_buf {
- struct page *page;
- unsigned int page_offset;
- u16 truesize;
-};
+#define idpf_rx_buf libie_rx_buffer

#define IDPF_RX_MAX_PTYPE_PROTO_IDS 32
#define IDPF_RX_MAX_PTYPE_SZ (sizeof(struct virtchnl2_ptype) + \
@@ -627,13 +619,15 @@ struct idpf_queue {
union {
struct idpf_tx_buf *tx_buf;
struct {
+ struct libie_rx_buffer *hdr_buf;
struct idpf_rx_buf *buf;
- dma_addr_t hdr_buf_pa;
- void *hdr_buf_va;
} rx_buf;
};
+ struct page_pool *hdr_pp;
struct page_pool *pp;
void *desc_ring;
+
+ u32 hdr_truesize;
u16 idx;
u16 q_type;
u32 q_id;
@@ -885,7 +879,7 @@ static inline dma_addr_t idpf_alloc_page(struct page_pool *pool,
unsigned int buf_size)
{
if (buf_size == IDPF_RX_BUF_2048)
- buf->page = page_pool_dev_alloc_frag(pool, &buf->page_offset,
+ buf->page = page_pool_dev_alloc_frag(pool, &buf->offset,
buf_size);
else
buf->page = page_pool_dev_alloc_pages(pool);
@@ -895,7 +889,7 @@ static inline dma_addr_t idpf_alloc_page(struct page_pool *pool,

buf->truesize = buf_size;

- return page_pool_get_dma_addr(buf->page) + buf->page_offset +
+ return page_pool_get_dma_addr(buf->page) + buf->offset +
pool->p.offset;
}

@@ -922,7 +916,7 @@ static inline void idpf_rx_sync_for_cpu(struct idpf_rx_buf *rx_buf, u32 len)

dma_sync_single_range_for_cpu(pp->p.dev,
page_pool_get_dma_addr(page),
- rx_buf->page_offset + pp->p.offset, len,
+ rx_buf->offset + pp->p.offset, len,
page_pool_get_dma_dir(pp));
}

@@ -970,6 +964,7 @@ void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
struct idpf_rx_buf *rx_buf,
unsigned int size);
+struct sk_buff *idpf_rx_build_skb(const struct libie_rx_buffer *buf, u32 size);
bool idpf_init_rx_buf_hw_alloc(struct idpf_queue *rxq, struct idpf_rx_buf *buf);
void idpf_rx_buf_hw_update(struct idpf_queue *rxq, u32 val);
void idpf_tx_buf_hw_update(struct idpf_queue *tx_q, u32 val,
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 98c904f4dcf5..d599c0199e22 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -1636,32 +1636,38 @@ static int idpf_send_config_rx_queues_msg(struct idpf_vport *vport)
num_rxq = rx_qgrp->singleq.num_rxq;

for (j = 0; j < num_rxq; j++, k++) {
+ const struct idpf_bufq_set *sets;
struct idpf_queue *rxq;

if (!idpf_is_queue_model_split(vport->rxq_model)) {
rxq = rx_qgrp->singleq.rxqs[j];
goto common_qi_fields;
}
+
rxq = &rx_qgrp->splitq.rxq_sets[j]->rxq;
- qi[k].rx_bufq1_id =
- cpu_to_le16(rxq->rxq_grp->splitq.bufq_sets[0].bufq.q_id);
+ sets = rxq->rxq_grp->splitq.bufq_sets;
+
+ qi[k].rx_bufq1_id = cpu_to_le16(sets[0].bufq.q_id);
if (vport->num_bufqs_per_qgrp > IDPF_SINGLE_BUFQ_PER_RXQ_GRP) {
qi[k].bufq2_ena = IDPF_BUFQ2_ENA;
qi[k].rx_bufq2_id =
- cpu_to_le16(rxq->rxq_grp->splitq.bufq_sets[1].bufq.q_id);
+ cpu_to_le16(sets[1].bufq.q_id);
}
qi[k].rx_buffer_low_watermark =
cpu_to_le16(rxq->rx_buffer_low_watermark);
if (idpf_is_feature_ena(vport, NETIF_F_GRO_HW))
qi[k].qflags |= cpu_to_le16(VIRTCHNL2_RXQ_RSC);

-common_qi_fields:
+ rxq->rx_hbuf_size = sets[0].bufq.rx_hbuf_size;
+
if (rxq->rx_hsplit_en) {
qi[k].qflags |=
cpu_to_le16(VIRTCHNL2_RXQ_HDR_SPLIT);
qi[k].hdr_buffer_size =
cpu_to_le16(rxq->rx_hbuf_size);
}
+
+common_qi_fields:
qi[k].queue_id = cpu_to_le32(rxq->q_id);
qi[k].model = cpu_to_le16(vport->rxq_model);
qi[k].type = cpu_to_le32(rxq->q_type);
--
2.43.0


2023-12-23 03:00:43

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 06/34] idpf: use libie Rx buffer management for payload buffer

idpf uses Page Pool for data buffers with hardcoded buffer lengths of
4k for "classic" buffers and 2k for "short" ones. This is not flexible
and does not ensure optimal memory usage. Why would you need 4k buffers
when the MTU is 1500?
Use libie for the data buffers and don't hardcode any buffer sizes. Let
them be calculated from the MTU for "classics" and then divide the
truesize by 2 for "short" ones. The memory usage is now greatly reduced,
and having 2 buffer queues starts to make sense: for frames <= 1024,
you'll recycle (and resync) a page only after 4 HW writes rather than 2.
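To put rough numbers on that last point (the exact buffer length libie
picks for a given MTU is an assumption here, not taken from this patch):

	/* Illustrative only: assuming a 1500-byte MTU ends up with a
	 * 2048-byte buffer on a 4k page:
	 *
	 *   bufq0 ("classic"): truesize = 2048     -> 2 buffers per page
	 *   bufq1 ("short"):   truesize = 2048 / 2 -> 4 buffers per page
	 *
	 * so a page feeding the "short" queue gets recycled (and synced
	 * for device) only once per 4 HW writes for frames <= 1024.
	 */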

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/Kconfig | 1 -
drivers/net/ethernet/intel/idpf/idpf.h | 1 -
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 24 +--
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 194 ++++++------------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 70 +------
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 8 +-
6 files changed, 88 insertions(+), 210 deletions(-)

diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index d2e9bef2e0cb..c96d244a1d54 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -379,7 +379,6 @@ config IDPF
depends on PCI_MSI
select DIMLIB
select LIBIE
- select PAGE_POOL
help
This driver supports Intel(R) Infrastructure Data Path Function
devices.
diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 8342df0f4f3d..596ece7df26a 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -381,7 +381,6 @@ struct idpf_vport {
u32 rxq_desc_count;
u8 num_bufqs_per_qgrp;
u32 bufq_desc_count[IDPF_MAX_BUFQS_PER_RXQ_GRP];
- u32 bufq_size[IDPF_MAX_BUFQS_PER_RXQ_GRP];
u16 num_rxq_grp;
struct idpf_rxq_group *rxq_grps;
u32 rxq_model;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
index 53ff572ce252..63a709743037 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -883,20 +883,24 @@ bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rx_q,
u16 cleaned_count)
{
struct virtchnl2_singleq_rx_buf_desc *desc;
+ const struct libie_buf_queue bq = {
+ .pp = rx_q->pp,
+ .rx_bi = rx_q->rx_buf.buf,
+ .truesize = rx_q->truesize,
+ .count = rx_q->desc_count,
+ };
u16 nta = rx_q->next_to_alloc;
- struct idpf_rx_buf *buf;

if (!cleaned_count)
return false;

desc = IDPF_SINGLEQ_RX_BUF_DESC(rx_q, nta);
- buf = &rx_q->rx_buf.buf[nta];

do {
dma_addr_t addr;

- addr = idpf_alloc_page(rx_q->pp, buf, rx_q->rx_buf_size);
- if (unlikely(addr == DMA_MAPPING_ERROR))
+ addr = libie_rx_alloc(&bq, nta);
+ if (addr == DMA_MAPPING_ERROR)
break;

/* Refresh the desc even if buffer_addrs didn't change
@@ -906,11 +910,9 @@ bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rx_q,
desc->hdr_addr = 0;
desc++;

- buf++;
nta++;
if (unlikely(nta == rx_q->desc_count)) {
desc = IDPF_SINGLEQ_RX_BUF_DESC(rx_q, 0);
- buf = rx_q->rx_buf.buf;
nta = 0;
}

@@ -1031,24 +1033,22 @@ static int idpf_rx_singleq_clean(struct idpf_queue *rx_q, int budget)
idpf_rx_singleq_extract_fields(rx_q, rx_desc, &fields);

rx_buf = &rx_q->rx_buf.buf[ntc];
- if (!fields.size) {
- idpf_rx_put_page(rx_buf);
+ if (!libie_rx_sync_for_cpu(rx_buf, fields.size))
goto skip_data;
- }

- idpf_rx_sync_for_cpu(rx_buf, fields.size);
if (skb)
idpf_rx_add_frag(rx_buf, skb, fields.size);
else
- skb = idpf_rx_construct_skb(rx_q, rx_buf, fields.size);
+ skb = idpf_rx_build_skb(rx_buf, fields.size);

/* exit if we failed to retrieve a buffer */
if (!skb)
break;

skip_data:
- IDPF_SINGLEQ_BUMP_RING_IDX(rx_q, ntc);
+ rx_buf->page = NULL;

+ IDPF_SINGLEQ_BUMP_RING_IDX(rx_q, ntc);
cleaned_count++;

/* skip if it is non EOP desc */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index f696fd9839fc..c44737e243b0 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -369,6 +369,10 @@ static void idpf_rx_hdr_buf_rel_all(struct idpf_queue *rxq)
*/
static void idpf_rx_buf_rel_all(struct idpf_queue *rxq)
{
+ struct libie_buf_queue bq = {
+ .pp = rxq->pp,
+ };
+ struct device *dev;
u16 i;

/* queue already cleared, nothing to do */
@@ -382,8 +386,9 @@ static void idpf_rx_buf_rel_all(struct idpf_queue *rxq)
if (rxq->rx_hsplit_en)
idpf_rx_hdr_buf_rel_all(rxq);

- page_pool_destroy(rxq->pp);
- rxq->pp = NULL;
+ dev = bq.pp->p.dev;
+ libie_rx_page_pool_destroy(&bq);
+ rxq->dev = dev;

kfree(rxq->rx_buf.buf);
rxq->rx_buf.buf = NULL;
@@ -552,11 +557,9 @@ static bool idpf_rx_post_buf_desc(struct idpf_queue *bufq, u16 buf_id)
.count = bufq->desc_count,
};
u16 nta = bufq->next_to_alloc;
- struct idpf_rx_buf *buf;
dma_addr_t addr;

splitq_rx_desc = IDPF_SPLITQ_RX_BUF_DESC(bufq, nta);
- buf = &bufq->rx_buf.buf[buf_id];

if (bufq->rx_hsplit_en) {
bq.pp = bufq->hdr_pp;
@@ -570,8 +573,12 @@ static bool idpf_rx_post_buf_desc(struct idpf_queue *bufq, u16 buf_id)
splitq_rx_desc->hdr_addr = cpu_to_le64(addr);
}

- addr = idpf_alloc_page(bufq->pp, buf, bufq->rx_buf_size);
- if (unlikely(addr == DMA_MAPPING_ERROR))
+ bq.pp = bufq->pp;
+ bq.rx_bi = bufq->rx_buf.buf;
+ bq.truesize = bufq->truesize;
+
+ addr = libie_rx_alloc(&bq, buf_id);
+ if (addr == DMA_MAPPING_ERROR)
return false;

splitq_rx_desc->pkt_addr = cpu_to_le64(addr);
@@ -607,28 +614,6 @@ static bool idpf_rx_post_init_bufs(struct idpf_queue *bufq, u16 working_set)
return true;
}

-/**
- * idpf_rx_create_page_pool - Create a page pool
- * @rxbufq: RX queue to create page pool for
- *
- * Returns &page_pool on success, casted -errno on failure
- */
-static struct page_pool *idpf_rx_create_page_pool(struct idpf_queue *rxbufq)
-{
- struct page_pool_params pp = {
- .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
- .order = 0,
- .pool_size = rxbufq->desc_count,
- .nid = NUMA_NO_NODE,
- .dev = rxbufq->vport->netdev->dev.parent,
- .max_len = PAGE_SIZE,
- .dma_dir = DMA_FROM_DEVICE,
- .offset = 0,
- };
-
- return page_pool_create(&pp);
-}
-
/**
* idpf_rx_buf_alloc_all - Allocate memory for all buffer resources
* @rxbufq: queue for which the buffers are allocated; equivalent to
@@ -676,18 +661,28 @@ static int idpf_rx_buf_alloc_all(struct idpf_queue *rxbufq)
/**
* idpf_rx_bufs_init - Initialize page pool, allocate rx bufs, and post to HW
* @rxbufq: RX queue to create page pool for
+ * @type: type of Rx buffers to allocate
*
* Returns 0 on success, negative on failure
*/
-static int idpf_rx_bufs_init(struct idpf_queue *rxbufq)
+static int idpf_rx_bufs_init(struct idpf_queue *rxbufq,
+ enum libie_rx_buf_type type)
{
- struct page_pool *pool;
+ struct libie_buf_queue bq = {
+ .truesize = rxbufq->truesize,
+ .count = rxbufq->desc_count,
+ .type = type,
+ .hsplit = rxbufq->rx_hsplit_en,
+ };
+ int ret;

- pool = idpf_rx_create_page_pool(rxbufq);
- if (IS_ERR(pool))
- return PTR_ERR(pool);
+ ret = libie_rx_page_pool_create(&bq, &rxbufq->q_vector->napi);
+ if (ret)
+ return ret;

- rxbufq->pp = pool;
+ rxbufq->pp = bq.pp;
+ rxbufq->truesize = bq.truesize;
+ rxbufq->rx_buf_size = bq.rx_buf_len;

return idpf_rx_buf_alloc_all(rxbufq);
}
@@ -700,20 +695,21 @@ static int idpf_rx_bufs_init(struct idpf_queue *rxbufq)
*/
int idpf_rx_bufs_init_all(struct idpf_vport *vport)
{
- struct idpf_rxq_group *rx_qgrp;
+ bool split = idpf_is_queue_model_split(vport->rxq_model);
struct idpf_queue *q;
int i, j, err;

for (i = 0; i < vport->num_rxq_grp; i++) {
- rx_qgrp = &vport->rxq_grps[i];
+ struct idpf_rxq_group *rx_qgrp = &vport->rxq_grps[i];
+ u32 truesize = 0;

/* Allocate bufs for the rxq itself in singleq */
- if (!idpf_is_queue_model_split(vport->rxq_model)) {
+ if (!split) {
int num_rxq = rx_qgrp->singleq.num_rxq;

for (j = 0; j < num_rxq; j++) {
q = rx_qgrp->singleq.rxqs[j];
- err = idpf_rx_bufs_init(q);
+ err = idpf_rx_bufs_init(q, LIBIE_RX_BUF_MTU);
if (err)
return err;
}
@@ -723,10 +719,18 @@ int idpf_rx_bufs_init_all(struct idpf_vport *vport)

/* Otherwise, allocate bufs for the buffer queues */
for (j = 0; j < vport->num_bufqs_per_qgrp; j++) {
+ enum libie_rx_buf_type qt;
+
q = &rx_qgrp->splitq.bufq_sets[j].bufq;
- err = idpf_rx_bufs_init(q);
+ q->truesize = truesize;
+
+ qt = truesize ? LIBIE_RX_BUF_SHORT : LIBIE_RX_BUF_MTU;
+
+ err = idpf_rx_bufs_init(q, qt);
if (err)
return err;
+
+ truesize = q->truesize >> 1;
}
}

@@ -1009,17 +1013,11 @@ void idpf_vport_init_num_qs(struct idpf_vport *vport,
/* Adjust number of buffer queues per Rx queue group. */
if (!idpf_is_queue_model_split(vport->rxq_model)) {
vport->num_bufqs_per_qgrp = 0;
- vport->bufq_size[0] = IDPF_RX_BUF_2048;

return;
}

vport->num_bufqs_per_qgrp = IDPF_MAX_BUFQS_PER_RXQ_GRP;
- /* Bufq[0] default buffer size is 4K
- * Bufq[1] default buffer size is 2K
- */
- vport->bufq_size[0] = IDPF_RX_BUF_4096;
- vport->bufq_size[1] = IDPF_RX_BUF_2048;
}

/**
@@ -1353,7 +1351,6 @@ static int idpf_rxq_group_alloc(struct idpf_vport *vport, u16 num_rxq)
q->vport = vport;
q->rxq_grp = rx_qgrp;
q->idx = j;
- q->rx_buf_size = vport->bufq_size[j];
q->rx_buffer_low_watermark = IDPF_LOW_WATERMARK;
q->rx_buf_stride = IDPF_RX_BUF_STRIDE;
q->rx_hsplit_en = hs;
@@ -1405,14 +1402,9 @@ static int idpf_rxq_group_alloc(struct idpf_vport *vport, u16 num_rxq)
q->vport = vport;
q->rxq_grp = rx_qgrp;
q->idx = (i * num_rxq) + j;
- /* In splitq mode, RXQ buffer size should be
- * set to that of the first buffer queue
- * associated with this RXQ
- */
- q->rx_buf_size = vport->bufq_size[0];
q->rx_buffer_low_watermark = IDPF_LOW_WATERMARK;
q->rx_max_pkt_size = vport->netdev->mtu +
- IDPF_PACKET_HDR_PAD;
+ LIBIE_RX_LL_LEN;
idpf_rxq_set_descids(vport, q);
}
}
@@ -2986,70 +2978,10 @@ static int idpf_rx_process_skb_fields(struct idpf_queue *rxq,
void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
unsigned int size)
{
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
- rx_buf->offset, size, rx_buf->truesize);
-
- rx_buf->page = NULL;
-}
-
-/**
- * idpf_rx_construct_skb - Allocate skb and populate it
- * @rxq: Rx descriptor queue
- * @rx_buf: Rx buffer to pull data from
- * @size: the length of the packet
- *
- * This function allocates an skb. It then populates it with the page
- * data from the current receive descriptor, taking care to set up the
- * skb correctly.
- */
-struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
- struct idpf_rx_buf *rx_buf,
- unsigned int size)
-{
- unsigned int headlen;
- struct sk_buff *skb;
- void *va;
-
- va = page_address(rx_buf->page) + rx_buf->offset;
-
- /* prefetch first cache line of first page */
- net_prefetch(va);
- /* allocate a skb to store the frags */
- skb = __napi_alloc_skb(&rxq->q_vector->napi, IDPF_RX_HDR_SIZE,
- GFP_ATOMIC);
- if (unlikely(!skb)) {
- idpf_rx_put_page(rx_buf);
-
- return NULL;
- }
-
- skb_mark_for_recycle(skb);
-
- /* Determine available headroom for copy */
- headlen = size;
- if (headlen > IDPF_RX_HDR_SIZE)
- headlen = eth_get_headlen(skb->dev, va, IDPF_RX_HDR_SIZE);
-
- /* align pull length to size of long to optimize memcpy performance */
- memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
-
- /* if we exhaust the linear part then add what is left as a frag */
- size -= headlen;
- if (!size) {
- idpf_rx_put_page(rx_buf);
-
- return skb;
- }
+ u32 hr = rx_buf->page->pp->p.offset;

- skb_add_rx_frag(skb, 0, rx_buf->page, rx_buf->offset + headlen,
- size, rx_buf->truesize);
-
- /* Since we're giving the page to the stack, clear our reference to it.
- * We'll get a new one during buffer posting.
- */
- rx_buf->page = NULL;
-
- return skb;
+ skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
+ rx_buf->offset + hr, size, rx_buf->truesize);
}

/**
@@ -3249,24 +3181,24 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
hdr->page = NULL;

payload:
- if (pkt_len) {
- idpf_rx_sync_for_cpu(rx_buf, pkt_len);
- if (skb)
- idpf_rx_add_frag(rx_buf, skb, pkt_len);
- else
- skb = idpf_rx_construct_skb(rxq, rx_buf,
- pkt_len);
- } else {
- idpf_rx_put_page(rx_buf);
- }
+ if (!libie_rx_sync_for_cpu(rx_buf, pkt_len))
+ goto skip_data;
+
+ if (skb)
+ idpf_rx_add_frag(rx_buf, skb, pkt_len);
+ else
+ skb = idpf_rx_build_skb(rx_buf, pkt_len);

/* exit if we failed to retrieve a buffer */
if (!skb)
break;

- idpf_rx_post_buf_refill(refillq, buf_id);
+skip_data:
+ rx_buf->page = NULL;

+ idpf_rx_post_buf_refill(refillq, buf_id);
IDPF_RX_BUMP_NTC(rxq, ntc);
+
/* skip if it is non EOP desc */
if (!idpf_rx_splitq_is_eop(rx_desc))
continue;
@@ -3319,18 +3251,18 @@ static int idpf_rx_update_bufq_desc(struct idpf_queue *bufq, u16 refill_desc,
struct virtchnl2_splitq_rx_buf_desc *buf_desc)
{
struct libie_buf_queue bq = {
+ .pp = bufq->pp,
+ .rx_bi = bufq->rx_buf.buf,
+ .truesize = bufq->truesize,
.count = bufq->desc_count,
};
- struct idpf_rx_buf *buf;
dma_addr_t addr;
u16 buf_id;

buf_id = FIELD_GET(IDPF_RX_BI_BUFID_M, refill_desc);

- buf = &bufq->rx_buf.buf[buf_id];
-
- addr = idpf_alloc_page(bufq->pp, buf, bufq->rx_buf_size);
- if (unlikely(addr == DMA_MAPPING_ERROR))
+ addr = libie_rx_alloc(&bq, buf_id);
+ if (addr == DMA_MAPPING_ERROR)
return -ENOMEM;

buf_desc->pkt_addr = cpu_to_le64(addr);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 357683559b57..0bbc654a24b9 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -93,14 +93,10 @@ do { \
idx = 0; \
} while (0)

-#define IDPF_RX_HDR_SIZE 256
-#define IDPF_RX_BUF_2048 2048
-#define IDPF_RX_BUF_4096 4096
#define IDPF_RX_BUF_STRIDE 32
#define IDPF_RX_BUF_POST_STRIDE 16
#define IDPF_LOW_WATERMARK 64
-#define IDPF_PACKET_HDR_PAD \
- (ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN * 2)
+
#define IDPF_TX_TSO_MIN_MSS 88

/* Minimum number of descriptors between 2 descriptors with the RE bit set;
@@ -609,7 +605,6 @@ union idpf_queue_stats {
* @sched_buf_hash: Hash table to stores buffers
*/
struct idpf_queue {
- struct device *dev;
struct idpf_vport *vport;
union {
struct idpf_txq_group *txq_grp;
@@ -624,10 +619,14 @@ struct idpf_queue {
} rx_buf;
};
struct page_pool *hdr_pp;
- struct page_pool *pp;
+ union {
+ struct page_pool *pp;
+ struct device *dev;
+ };
void *desc_ring;

u32 hdr_truesize;
+ u32 truesize;
u16 idx;
u16 q_type;
u32 q_id;
@@ -866,60 +865,6 @@ static inline void idpf_tx_splitq_build_desc(union idpf_tx_flex_desc *desc,
idpf_tx_splitq_build_flow_desc(desc, params, td_cmd, size);
}

-/**
- * idpf_alloc_page - Allocate a new RX buffer from the page pool
- * @pool: page_pool to allocate from
- * @buf: metadata struct to populate with page info
- * @buf_size: 2K or 4K
- *
- * Returns &dma_addr_t to be passed to HW for Rx, %DMA_MAPPING_ERROR otherwise.
- */
-static inline dma_addr_t idpf_alloc_page(struct page_pool *pool,
- struct idpf_rx_buf *buf,
- unsigned int buf_size)
-{
- if (buf_size == IDPF_RX_BUF_2048)
- buf->page = page_pool_dev_alloc_frag(pool, &buf->offset,
- buf_size);
- else
- buf->page = page_pool_dev_alloc_pages(pool);
-
- if (!buf->page)
- return DMA_MAPPING_ERROR;
-
- buf->truesize = buf_size;
-
- return page_pool_get_dma_addr(buf->page) + buf->offset +
- pool->p.offset;
-}
-
-/**
- * idpf_rx_put_page - Return RX buffer page to pool
- * @rx_buf: RX buffer metadata struct
- */
-static inline void idpf_rx_put_page(struct idpf_rx_buf *rx_buf)
-{
- page_pool_put_page(rx_buf->page->pp, rx_buf->page,
- rx_buf->truesize, true);
- rx_buf->page = NULL;
-}
-
-/**
- * idpf_rx_sync_for_cpu - Synchronize DMA buffer
- * @rx_buf: RX buffer metadata struct
- * @len: frame length from descriptor
- */
-static inline void idpf_rx_sync_for_cpu(struct idpf_rx_buf *rx_buf, u32 len)
-{
- struct page *page = rx_buf->page;
- struct page_pool *pp = page->pp;
-
- dma_sync_single_range_for_cpu(pp->p.dev,
- page_pool_get_dma_addr(page),
- rx_buf->offset + pp->p.offset, len,
- page_pool_get_dma_dir(pp));
-}
-
/**
* idpf_vport_intr_set_wb_on_itr - enable descriptor writeback on disabled interrupts
* @q_vector: pointer to queue vector struct
@@ -961,9 +906,6 @@ void idpf_deinit_rss(struct idpf_vport *vport);
int idpf_rx_bufs_init_all(struct idpf_vport *vport);
void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
unsigned int size);
-struct sk_buff *idpf_rx_construct_skb(struct idpf_queue *rxq,
- struct idpf_rx_buf *rx_buf,
- unsigned int size);
struct sk_buff *idpf_rx_build_skb(const struct libie_rx_buffer *buf, u32 size);
bool idpf_init_rx_buf_hw_alloc(struct idpf_queue *rxq, struct idpf_rx_buf *buf);
void idpf_rx_buf_hw_update(struct idpf_queue *rxq, u32 val);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index d599c0199e22..5c3d7c3534af 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -1647,6 +1647,12 @@ static int idpf_send_config_rx_queues_msg(struct idpf_vport *vport)
rxq = &rx_qgrp->splitq.rxq_sets[j]->rxq;
sets = rxq->rxq_grp->splitq.bufq_sets;

+ /* In splitq mode, RXQ buffer size should be
+ * set to that of the first buffer queue
+ * associated with this RXQ.
+ */
+ rxq->rx_buf_size = sets[0].bufq.rx_buf_size;
+
qi[k].rx_bufq1_id = cpu_to_le16(sets[0].bufq.q_id);
if (vport->num_bufqs_per_qgrp > IDPF_SINGLE_BUFQ_PER_RXQ_GRP) {
qi[k].bufq2_ena = IDPF_BUFQ2_ENA;
@@ -3297,7 +3303,7 @@ void idpf_vport_init(struct idpf_vport *vport, struct idpf_vport_max_q *max_q)
rss_data->rss_lut_size = le16_to_cpu(vport_msg->rss_lut_size);

ether_addr_copy(vport->default_mac_addr, vport_msg->default_mac_addr);
- vport->max_mtu = le16_to_cpu(vport_msg->max_mtu) - IDPF_PACKET_HDR_PAD;
+ vport->max_mtu = le16_to_cpu(vport_msg->max_mtu) - LIBIE_RX_LL_LEN;

/* Initialize Tx and Rx profiles for Dynamic Interrupt Moderation */
memcpy(vport->rx_itr_profile, rx_itr, IDPF_DIM_PROFILE_SLOTS);
--
2.43.0


2023-12-23 03:01:00

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 07/34] libie: add Tx buffer completion helpers

Software-side Tx buffers for storing DMA, frame size, skb pointers etc.
are pretty much generic and every driver defines them the same way. The
same can be said for software Tx completions -- same napi_consume_skb()s
and all that...
Add a couple of simple wrappers for doing that, to stop repeating the
same old tale at least within the Intel code.
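Below is a minimal, hypothetical sketch (not part of the patch) of how a
driver's completion loop is expected to consume the helper; the ring
structure and function name are made up for illustration:

#include <linux/net/intel/libie/tx.h>

/* Illustrative ring, not a real idpf/libie structure */
struct my_tx_ring {
	struct libie_tx_buffer	*tx_buf;	/* desc_count entries */
	struct device		*dev;		/* device DMA was mapped for */
	u32			desc_count;
};

static void my_tx_ring_clean(struct my_tx_ring *ring, u32 ntc, u32 done,
			     bool napi)
{
	struct libie_sq_onstack_stats ss = { };
	u32 i;

	for (i = 0; i < done; i++) {
		/* Unmaps and/or frees the buffer according to its ->type
		 * and resets it to LIBIE_TX_BUF_EMPTY for reuse.
		 */
		libie_tx_complete_buf(&ring->tx_buf[ntc], ring->dev, napi,
				      &ss);

		if (unlikely(++ntc == ring->desc_count))
			ntc = 0;
	}

	/* ss.packets / ss.bytes now hold the totals for this batch */
}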

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/linux/net/intel/libie/tx.h | 88 ++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
create mode 100644 include/linux/net/intel/libie/tx.h

diff --git a/include/linux/net/intel/libie/tx.h b/include/linux/net/intel/libie/tx.h
new file mode 100644
index 000000000000..07a19abb72fd
--- /dev/null
+++ b/include/linux/net/intel/libie/tx.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2023 Intel Corporation. */
+
+#ifndef __LIBIE_TX_H
+#define __LIBIE_TX_H
+
+#include <linux/net/intel/libie/stats.h>
+#include <linux/skbuff.h>
+
+/**
+ * enum libie_tx_buf_type - type of &libie_tx_buf to act on Tx completion
+ * @LIBIE_TX_BUF_EMPTY: unused OR XSk frame, no action required
+ * @LIBIE_TX_BUF_SLAB: kmalloc-allocated buffer, unmap and kfree()
+ * @LIBIE_TX_BUF_FRAG: mapped skb OR &xdp_buff frag, only unmap DMA
+ * @LIBIE_TX_BUF_SKB: &sk_buff, unmap and consume_skb(), update stats
+ * @LIBIE_TX_BUF_XDP_TX: &skb_shared_info, page_pool_put_full_page(), stats
+ * @LIBIE_TX_BUF_XDP_XMIT: &xdp_frame, unmap and xdp_return_frame(), stats
+ * @LIBIE_TX_BUF_XSK_TX: &xdp_buff on XSk queue, xsk_buff_free(), stats
+ */
+enum libie_tx_buf_type {
+ LIBIE_TX_BUF_EMPTY = 0U,
+ LIBIE_TX_BUF_SLAB,
+ LIBIE_TX_BUF_FRAG,
+ LIBIE_TX_BUF_SKB,
+ LIBIE_TX_BUF_XDP_TX,
+ LIBIE_TX_BUF_XDP_XMIT,
+ LIBIE_TX_BUF_XSK_TX,
+};
+
+struct libie_tx_buffer {
+ void *next_to_watch;
+ union {
+ void *raw;
+ struct sk_buff *skb;
+ struct skb_shared_info *sinfo;
+ struct xdp_frame *xdpf;
+ struct xdp_buff *xdp;
+ };
+
+ DEFINE_DMA_UNMAP_ADDR(dma);
+ DEFINE_DMA_UNMAP_LEN(len);
+
+ u32 bytecount;
+ u16 gso_segs;
+ enum libie_tx_buf_type type:16;
+
+ union {
+ int compl_tag;
+ bool ctx_entry;
+ };
+};
+
+static inline void libie_tx_complete_buf(struct libie_tx_buffer *buf,
+ struct device *dev, bool napi,
+ struct libie_sq_onstack_stats *ss)
+{
+ switch (buf->type) {
+ case LIBIE_TX_BUF_EMPTY:
+ return;
+ case LIBIE_TX_BUF_SLAB:
+ case LIBIE_TX_BUF_FRAG:
+ case LIBIE_TX_BUF_SKB:
+ dma_unmap_page(dev, dma_unmap_addr(buf, dma),
+ dma_unmap_len(buf, len),
+ DMA_TO_DEVICE);
+ break;
+ default:
+ break;
+ }
+
+ switch (buf->type) {
+ case LIBIE_TX_BUF_SLAB:
+ kfree(buf->raw);
+ break;
+ case LIBIE_TX_BUF_SKB:
+ ss->packets += buf->gso_segs;
+ ss->bytes += buf->bytecount;
+
+ napi_consume_skb(buf->skb, napi);
+ break;
+ default:
+ break;
+ }
+
+ buf->type = LIBIE_TX_BUF_EMPTY;
+}
+
+#endif /* __LIBIE_TX_H */
--
2.43.0


2023-12-23 03:01:20

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 08/34] idpf: convert to libie Tx buffer completion

&idpf_tx_buf is almost identical to its counterparts from the previous
generations, as is the way it's handled. Moreover, relying on
dma_unmap_addr() and !!buf->skb instead of explicitly defining the
buffer's type was never good.
Use the newly added libie helpers to do it properly and reduce the
copy-paste around the Tx code.
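For illustration only (the names below are hypothetical and assume
<linux/net/intel/libie/tx.h>; the real mapping code lives in
idpf_tx_splitq_map()), the send side now tags each buffer explicitly
instead of leaving the cleanup path to guess:

/* Illustrative sketch, not the actual driver code */
static void my_tag_mapped_bufs(struct libie_tx_buffer *bufs, u32 nr_frags)
{
	u32 i;

	/* buffer 0 holds the skb head and owns the skb itself */
	bufs[0].type = LIBIE_TX_BUF_SKB;

	/* the rest are paged frags: only their DMA mapping is undone on
	 * completion, the skb is freed via the head buffer
	 */
	for (i = 1; i <= nr_frags; i++)
		bufs[i].type = LIBIE_TX_BUF_FRAG;
}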

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 38 ++-----
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 107 +++---------------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 43 +------
3 files changed, 31 insertions(+), 157 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
index 63a709743037..23dcc02e6976 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -190,6 +190,7 @@ static void idpf_tx_singleq_map(struct idpf_queue *tx_q,
struct idpf_tx_buf *first,
struct idpf_tx_offload_params *offloads)
{
+ enum libie_tx_buf_type type = LIBIE_TX_BUF_SKB;
u32 offsets = offloads->hdr_offsets;
struct idpf_tx_buf *tx_buf = first;
struct idpf_base_tx_desc *tx_desc;
@@ -219,6 +220,8 @@ static void idpf_tx_singleq_map(struct idpf_queue *tx_q,
if (dma_mapping_error(tx_q->dev, dma))
return idpf_tx_dma_map_error(tx_q, skb, first, i);

+ tx_buf->type = type;
+
/* record length, and DMA address */
dma_unmap_len_set(tx_buf, len, size);
dma_unmap_addr_set(tx_buf, dma, dma);
@@ -270,6 +273,7 @@ static void idpf_tx_singleq_map(struct idpf_queue *tx_q,
DMA_TO_DEVICE);

tx_buf = &tx_q->tx_buf[i];
+ type = LIBIE_TX_BUF_FRAG;
}

skb_tx_timestamp(first->skb);
@@ -447,7 +451,7 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
int *cleaned)
{
unsigned int budget = tx_q->vport->compln_clean_budget;
- unsigned int total_bytes = 0, total_pkts = 0;
+ struct libie_sq_onstack_stats ss = { };
struct idpf_base_tx_desc *tx_desc;
s16 ntc = tx_q->next_to_clean;
struct idpf_netdev_priv *np;
@@ -495,20 +499,7 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
tx_buf->next_to_watch = NULL;

/* update the statistics for this packet */
- total_bytes += tx_buf->bytecount;
- total_pkts += tx_buf->gso_segs;
-
- napi_consume_skb(tx_buf->skb, napi_budget);
-
- /* unmap skb header data */
- dma_unmap_single(tx_q->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
-
- /* clear tx_buf data */
- tx_buf->skb = NULL;
- dma_unmap_len_set(tx_buf, len, 0);
+ libie_tx_complete_buf(tx_buf, tx_q->dev, !!napi_budget, &ss);

/* unmap remaining buffers */
while (tx_desc != eop_desc) {
@@ -522,13 +513,8 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
}

/* unmap any remaining paged data */
- if (dma_unmap_len(tx_buf, len)) {
- dma_unmap_page(tx_q->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
- dma_unmap_len_set(tx_buf, len, 0);
- }
+ libie_tx_complete_buf(tx_buf, tx_q->dev, !!napi_budget,
+ &ss);
}

/* update budget only if we did something */
@@ -548,11 +534,11 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
ntc += tx_q->desc_count;
tx_q->next_to_clean = ntc;

- *cleaned += total_pkts;
+ *cleaned += ss.packets;

u64_stats_update_begin(&tx_q->stats_sync);
- u64_stats_add(&tx_q->q_stats.tx.packets, total_pkts);
- u64_stats_add(&tx_q->q_stats.tx.bytes, total_bytes);
+ u64_stats_add(&tx_q->q_stats.tx.packets, ss.packets);
+ u64_stats_add(&tx_q->q_stats.tx.bytes, ss.bytes);
u64_stats_update_end(&tx_q->stats_sync);

vport = tx_q->vport;
@@ -561,7 +547,7 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,

dont_wake = np->state != __IDPF_VPORT_UP ||
!netif_carrier_ok(vport->netdev);
- __netif_txq_completed_wake(nq, total_pkts, total_bytes,
+ __netif_txq_completed_wake(nq, ss.packets, ss.bytes,
IDPF_DESC_UNUSED(tx_q), IDPF_TX_WAKE_THRESH,
dont_wake);

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index c44737e243b0..6fd9128e61d8 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -54,39 +54,13 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
}
}

-/**
- * idpf_tx_buf_rel - Release a Tx buffer
- * @tx_q: the queue that owns the buffer
- * @tx_buf: the buffer to free
- */
-static void idpf_tx_buf_rel(struct idpf_queue *tx_q, struct idpf_tx_buf *tx_buf)
-{
- if (tx_buf->skb) {
- if (dma_unmap_len(tx_buf, len))
- dma_unmap_single(tx_q->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
- dev_kfree_skb_any(tx_buf->skb);
- } else if (dma_unmap_len(tx_buf, len)) {
- dma_unmap_page(tx_q->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
- }
-
- tx_buf->next_to_watch = NULL;
- tx_buf->skb = NULL;
- tx_buf->compl_tag = IDPF_SPLITQ_TX_INVAL_COMPL_TAG;
- dma_unmap_len_set(tx_buf, len, 0);
-}
-
/**
* idpf_tx_buf_rel_all - Free any empty Tx buffers
* @txq: queue to be cleaned
*/
static void idpf_tx_buf_rel_all(struct idpf_queue *txq)
{
+ struct libie_sq_onstack_stats ss = { };
u16 i;

/* Buffers already cleared, nothing to do */
@@ -95,7 +69,7 @@ static void idpf_tx_buf_rel_all(struct idpf_queue *txq)

/* Free all the Tx buffer sk_buffs */
for (i = 0; i < txq->desc_count; i++)
- idpf_tx_buf_rel(txq, &txq->tx_buf[i]);
+ libie_tx_complete_buf(&txq->tx_buf[i], txq->dev, false, &ss);

kfree(txq->tx_buf);
txq->tx_buf = NULL;
@@ -1505,37 +1479,6 @@ static void idpf_tx_handle_sw_marker(struct idpf_queue *tx_q)
wake_up(&vport->sw_marker_wq);
}

-/**
- * idpf_tx_splitq_clean_hdr - Clean TX buffer resources for header portion of
- * packet
- * @tx_q: tx queue to clean buffer from
- * @tx_buf: buffer to be cleaned
- * @cleaned: pointer to stats struct to track cleaned packets/bytes
- * @napi_budget: Used to determine if we are in netpoll
- */
-static void idpf_tx_splitq_clean_hdr(struct idpf_queue *tx_q,
- struct idpf_tx_buf *tx_buf,
- struct idpf_cleaned_stats *cleaned,
- int napi_budget)
-{
- napi_consume_skb(tx_buf->skb, napi_budget);
-
- if (dma_unmap_len(tx_buf, len)) {
- dma_unmap_single(tx_q->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
-
- dma_unmap_len_set(tx_buf, len, 0);
- }
-
- /* clear tx_buf data */
- tx_buf->skb = NULL;
-
- cleaned->bytes += tx_buf->bytecount;
- cleaned->packets += tx_buf->gso_segs;
-}
-
/**
* idpf_tx_clean_stashed_bufs - clean bufs that were stored for
* out of order completions
@@ -1557,16 +1500,8 @@ static void idpf_tx_clean_stashed_bufs(struct idpf_queue *txq, u16 compl_tag,
if (unlikely(stash->buf.compl_tag != (int)compl_tag))
continue;

- if (stash->buf.skb) {
- idpf_tx_splitq_clean_hdr(txq, &stash->buf, cleaned,
- budget);
- } else if (dma_unmap_len(&stash->buf, len)) {
- dma_unmap_page(txq->dev,
- dma_unmap_addr(&stash->buf, dma),
- dma_unmap_len(&stash->buf, len),
- DMA_TO_DEVICE);
- dma_unmap_len_set(&stash->buf, len, 0);
- }
+ libie_tx_complete_buf(&stash->buf, txq->dev, !!budget,
+ cleaned);

/* Push shadow buf back onto stack */
idpf_buf_lifo_push(&txq->buf_stack, stash);
@@ -1599,6 +1534,7 @@ static int idpf_stash_flow_sch_buffers(struct idpf_queue *txq,
}

/* Store buffer params in shadow buffer */
+ stash->buf.type = tx_buf->type;
stash->buf.skb = tx_buf->skb;
stash->buf.bytecount = tx_buf->bytecount;
stash->buf.gso_segs = tx_buf->gso_segs;
@@ -1693,8 +1629,8 @@ static void idpf_tx_splitq_clean(struct idpf_queue *tx_q, u16 end,
}
}
} else {
- idpf_tx_splitq_clean_hdr(tx_q, tx_buf, cleaned,
- napi_budget);
+ libie_tx_complete_buf(tx_buf, tx_q->dev, !!napi_budget,
+ cleaned);

/* unmap remaining buffers */
while (tx_desc != eop_desc) {
@@ -1702,13 +1638,8 @@ static void idpf_tx_splitq_clean(struct idpf_queue *tx_q, u16 end,
tx_desc, tx_buf);

/* unmap any remaining paged data */
- if (dma_unmap_len(tx_buf, len)) {
- dma_unmap_page(tx_q->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
- dma_unmap_len_set(tx_buf, len, 0);
- }
+ libie_tx_complete_buf(tx_buf, tx_q->dev,
+ !!napi_budget, cleaned);
}
}

@@ -1756,18 +1687,7 @@ static bool idpf_tx_clean_buf_ring(struct idpf_queue *txq, u16 compl_tag,
tx_buf = &txq->tx_buf[idx];

while (tx_buf->compl_tag == (int)compl_tag) {
- if (tx_buf->skb) {
- idpf_tx_splitq_clean_hdr(txq, tx_buf, cleaned, budget);
- } else if (dma_unmap_len(tx_buf, len)) {
- dma_unmap_page(txq->dev,
- dma_unmap_addr(tx_buf, dma),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
- dma_unmap_len_set(tx_buf, len, 0);
- }
-
- memset(tx_buf, 0, sizeof(struct idpf_tx_buf));
- tx_buf->compl_tag = IDPF_SPLITQ_TX_INVAL_COMPL_TAG;
+ libie_tx_complete_buf(tx_buf, txq->dev, !!budget, cleaned);

num_descs_cleaned++;
idpf_tx_clean_buf_ring_bump_ntc(txq, idx, tx_buf);
@@ -2161,6 +2081,8 @@ unsigned int idpf_tx_desc_count_required(struct idpf_queue *txq,
void idpf_tx_dma_map_error(struct idpf_queue *txq, struct sk_buff *skb,
struct idpf_tx_buf *first, u16 idx)
{
+ struct libie_sq_onstack_stats ss = { };
+
u64_stats_update_begin(&txq->stats_sync);
u64_stats_inc(&txq->q_stats.tx.dma_map_errs);
u64_stats_update_end(&txq->stats_sync);
@@ -2170,7 +2092,7 @@ void idpf_tx_dma_map_error(struct idpf_queue *txq, struct sk_buff *skb,
struct idpf_tx_buf *tx_buf;

tx_buf = &txq->tx_buf[idx];
- idpf_tx_buf_rel(txq, tx_buf);
+ libie_tx_complete_buf(tx_buf, txq->dev, false, &ss);
if (tx_buf == first)
break;
if (idx == 0)
@@ -2227,6 +2149,7 @@ static void idpf_tx_splitq_map(struct idpf_queue *tx_q,
struct idpf_tx_splitq_params *params,
struct idpf_tx_buf *first)
{
+ enum libie_tx_buf_type type = LIBIE_TX_BUF_SKB;
union idpf_tx_flex_desc *tx_desc;
unsigned int data_len, size;
struct idpf_tx_buf *tx_buf;
@@ -2259,6 +2182,7 @@ static void idpf_tx_splitq_map(struct idpf_queue *tx_q,
if (dma_mapping_error(tx_q->dev, dma))
return idpf_tx_dma_map_error(tx_q, skb, first, i);

+ tx_buf->type = type;
tx_buf->compl_tag = params->compl_tag;

/* record length, and DMA address */
@@ -2374,6 +2298,7 @@ static void idpf_tx_splitq_map(struct idpf_queue *tx_q,
DMA_TO_DEVICE);

tx_buf = &tx_q->tx_buf[i];
+ type = LIBIE_TX_BUF_FRAG;
}

/* record SW timestamp if HW timestamp is not available */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 0bbc654a24b9..5975c6d029d7 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -5,6 +5,7 @@
#define _IDPF_TXRX_H_

#include <linux/net/intel/libie/rx.h>
+#include <linux/net/intel/libie/tx.h>

#include <net/page_pool/helpers.h>
#include <net/tcp.h>
@@ -165,42 +166,7 @@ union idpf_tx_flex_desc {
struct idpf_flex_tx_sched_desc flow; /* flow based scheduling */
};

-/**
- * struct idpf_tx_buf
- * @next_to_watch: Next descriptor to clean
- * @skb: Pointer to the skb
- * @dma: DMA address
- * @len: DMA length
- * @bytecount: Number of bytes
- * @gso_segs: Number of GSO segments
- * @compl_tag: Splitq only, unique identifier for a buffer. Used to compare
- * with completion tag returned in buffer completion event.
- * Because the completion tag is expected to be the same in all
- * data descriptors for a given packet, and a single packet can
- * span multiple buffers, we need this field to track all
- * buffers associated with this completion tag independently of
- * the buf_id. The tag consists of a N bit buf_id and M upper
- * order "generation bits". See compl_tag_bufid_m and
- * compl_tag_gen_s in struct idpf_queue. We'll use a value of -1
- * to indicate the tag is not valid.
- * @ctx_entry: Singleq only. Used to indicate the corresponding entry
- * in the descriptor ring was used for a context descriptor and
- * this buffer entry should be skipped.
- */
-struct idpf_tx_buf {
- void *next_to_watch;
- struct sk_buff *skb;
- DEFINE_DMA_UNMAP_ADDR(dma);
- DEFINE_DMA_UNMAP_LEN(len);
- unsigned int bytecount;
- unsigned short gso_segs;
-
- union {
- int compl_tag;
-
- bool ctx_entry;
- };
-};
+#define idpf_tx_buf libie_tx_buffer

struct idpf_tx_stash {
struct hlist_node hlist;
@@ -493,10 +459,7 @@ struct idpf_tx_queue_stats {
u64_stats_t dma_map_errs;
};

-struct idpf_cleaned_stats {
- u32 packets;
- u32 bytes;
-};
+#define idpf_cleaned_stats libie_sq_onstack_stats

union idpf_queue_stats {
struct idpf_rx_queue_stats rx;
--
2.43.0


2023-12-23 03:01:39

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 09/34] bpf, xdp: constify some bpf_prog * function arguments

In lots of places, the bpf_prog pointer is used only for tracing or
other stuff that doesn't modify the structure itself. Same for
net_device. Address at least some of them and add `const` qualifiers
there. The object code didn't change, but this may prevent unwanted data
modifications and also allows more helpers to take const arguments.

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/linux/bpf.h | 12 ++++++------
include/linux/filter.h | 9 +++++----
include/linux/netdevice.h | 6 +++---
kernel/bpf/devmap.c | 8 ++++----
net/core/dev.c | 8 ++++----
net/core/filter.c | 25 ++++++++++++++-----------
6 files changed, 36 insertions(+), 32 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 7a8d4c81a39a..53ccac0f0d64 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2384,10 +2384,10 @@ int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
struct bpf_map *map, bool exclude_ingress);
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
- struct bpf_prog *xdp_prog);
+ const struct bpf_prog *xdp_prog);
int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- bool exclude_ingress);
+ const struct bpf_prog *xdp_prog,
+ struct bpf_map *map, bool exclude_ingress);

void __cpu_map_flush(void);
int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf,
@@ -2632,15 +2632,15 @@ struct sk_buff;

static inline int dev_map_generic_redirect(struct bpf_dtab_netdev *dst,
struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
return 0;
}

static inline
int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- bool exclude_ingress)
+ const struct bpf_prog *xdp_prog,
+ struct bpf_map *map, bool exclude_ingress)
{
return 0;
}
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 68fb6c8142fe..34c9e2a4cc01 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1015,17 +1015,18 @@ static inline int xdp_ok_fwd_dev(const struct net_device *fwd,
* This does not appear to be a real limitation for existing software.
*/
int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
- struct xdp_buff *xdp, struct bpf_prog *prog);
+ struct xdp_buff *xdp, const struct bpf_prog *prog);
int xdp_do_redirect(struct net_device *dev,
struct xdp_buff *xdp,
- struct bpf_prog *prog);
+ const struct bpf_prog *prog);
int xdp_do_redirect_frame(struct net_device *dev,
struct xdp_buff *xdp,
struct xdp_frame *xdpf,
- struct bpf_prog *prog);
+ const struct bpf_prog *prog);
void xdp_do_flush(void);

-void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act);
+void bpf_warn_invalid_xdp_action(const struct net_device *dev,
+ const struct bpf_prog *prog, u32 act);

#ifdef CONFIG_INET
struct sock *bpf_run_sk_reuseport(struct sock_reuseport *reuse, struct sock *sk,
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 75c7725e5e4f..7bb8324a4ebe 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3956,9 +3956,9 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
}

u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog);
-void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+ const struct bpf_prog *xdp_prog);
+void generic_xdp_tx(struct sk_buff *skb, const struct bpf_prog *xdp_prog);
+int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff *skb);
int netif_rx(struct sk_buff *skb);
int __netif_rx(struct sk_buff *skb);

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index a936c704d4e7..5ad73ef21da2 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -673,7 +673,7 @@ int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
}

int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
int err;

@@ -696,7 +696,7 @@ int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,

static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst,
struct sk_buff *skb,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
struct sk_buff *nskb;
int err;
@@ -715,8 +715,8 @@ static int dev_map_redirect_clone(struct bpf_dtab_netdev *dst,
}

int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
- struct bpf_prog *xdp_prog, struct bpf_map *map,
- bool exclude_ingress)
+ const struct bpf_prog *xdp_prog,
+ struct bpf_map *map, bool exclude_ingress)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *dst, *last_dst = NULL;
diff --git a/net/core/dev.c b/net/core/dev.c
index f9d4b550ef4b..b2a9839dd18e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4833,7 +4833,7 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
}

u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
void *orig_data, *orig_data_end, *hard_start;
struct netdev_rx_queue *rxqueue;
@@ -4922,7 +4922,7 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp,

static u32 netif_receive_generic_xdp(struct sk_buff *skb,
struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
u32 act = XDP_DROP;

@@ -4979,7 +4979,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
* and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
* queues, so they do not have this starvation issue.
*/
-void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
+void generic_xdp_tx(struct sk_buff *skb, const struct bpf_prog *xdp_prog)
{
struct net_device *dev = skb->dev;
struct netdev_queue *txq;
@@ -5004,7 +5004,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)

static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);

-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
+int do_xdp_generic(const struct bpf_prog *xdp_prog, struct sk_buff *skb)
{
if (xdp_prog) {
struct xdp_buff xdp;
diff --git a/net/core/filter.c b/net/core/filter.c
index 24061f29c9dd..4ace1edc4de1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4297,9 +4297,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
EXPORT_SYMBOL_GPL(xdp_master_redirect);

static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri,
- struct net_device *dev,
+ const struct net_device *dev,
struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
enum bpf_map_type map_type = ri->map_type;
void *fwd = ri->tgt_value;
@@ -4320,10 +4320,10 @@ static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri,
return err;
}

-static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
- struct net_device *dev,
- struct xdp_frame *xdpf,
- struct bpf_prog *xdp_prog)
+static __always_inline int
+__xdp_do_redirect_frame(struct bpf_redirect_info *ri, struct net_device *dev,
+ struct xdp_frame *xdpf,
+ const struct bpf_prog *xdp_prog)
{
enum bpf_map_type map_type = ri->map_type;
void *fwd = ri->tgt_value;
@@ -4381,7 +4381,7 @@ static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
}

int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+ const struct bpf_prog *xdp_prog)
{
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
enum bpf_map_type map_type = ri->map_type;
@@ -4395,7 +4395,8 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
EXPORT_SYMBOL_GPL(xdp_do_redirect);

int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp,
- struct xdp_frame *xdpf, struct bpf_prog *xdp_prog)
+ struct xdp_frame *xdpf,
+ const struct bpf_prog *xdp_prog)
{
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
enum bpf_map_type map_type = ri->map_type;
@@ -4410,7 +4411,7 @@ EXPORT_SYMBOL_GPL(xdp_do_redirect_frame);
static int xdp_do_generic_redirect_map(struct net_device *dev,
struct sk_buff *skb,
struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog,
+ const struct bpf_prog *xdp_prog,
void *fwd,
enum bpf_map_type map_type, u32 map_id)
{
@@ -4457,7 +4458,8 @@ static int xdp_do_generic_redirect_map(struct net_device *dev,
}

int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
- struct xdp_buff *xdp, struct bpf_prog *xdp_prog)
+ struct xdp_buff *xdp,
+ const struct bpf_prog *xdp_prog)
{
struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
enum bpf_map_type map_type = ri->map_type;
@@ -8961,7 +8963,8 @@ static bool xdp_is_valid_access(int off, int size,
return __is_valid_xdp_access(off, size);
}

-void bpf_warn_invalid_xdp_action(struct net_device *dev, struct bpf_prog *prog, u32 act)
+void bpf_warn_invalid_xdp_action(const struct net_device *dev,
+ const struct bpf_prog *prog, u32 act)
{
const u32 act_max = XDP_REDIRECT;

--
2.43.0


2023-12-23 03:01:58

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 10/34] xdp: constify read-only arguments of some static inline helpers

Lots of read-only helpers for &xdp_buff and &xdp_frame, such as the ones
getting the frame length, the skb_shared_info etc., don't have their
arguments marked as `const` for no good reason. Add the missing
annotations to leave less room for mistakes and more room for
optimization.

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/xdp.h | 29 +++++++++++++++++------------
1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index e6770dd40c91..197808df1ee1 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -88,7 +88,7 @@ struct xdp_buff {
u32 flags; /* supported values defined in xdp_buff_flags */
};

-static __always_inline bool xdp_buff_has_frags(struct xdp_buff *xdp)
+static __always_inline bool xdp_buff_has_frags(const struct xdp_buff *xdp)
{
return !!(xdp->flags & XDP_FLAGS_HAS_FRAGS);
}
@@ -103,7 +103,8 @@ static __always_inline void xdp_buff_clear_frags_flag(struct xdp_buff *xdp)
xdp->flags &= ~XDP_FLAGS_HAS_FRAGS;
}

-static __always_inline bool xdp_buff_is_frag_pfmemalloc(struct xdp_buff *xdp)
+static __always_inline bool
+xdp_buff_is_frag_pfmemalloc(const struct xdp_buff *xdp)
{
return !!(xdp->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
}
@@ -144,15 +145,16 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

static inline struct skb_shared_info *
-xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
+xdp_get_shared_info_from_buff(const struct xdp_buff *xdp)
{
return (struct skb_shared_info *)xdp_data_hard_end(xdp);
}

-static __always_inline unsigned int xdp_get_buff_len(struct xdp_buff *xdp)
+static __always_inline unsigned int
+xdp_get_buff_len(const struct xdp_buff *xdp)
{
unsigned int len = xdp->data_end - xdp->data;
- struct skb_shared_info *sinfo;
+ const struct skb_shared_info *sinfo;

if (likely(!xdp_buff_has_frags(xdp)))
goto out;
@@ -177,12 +179,13 @@ struct xdp_frame {
u32 flags; /* supported values defined in xdp_buff_flags */
};

-static __always_inline bool xdp_frame_has_frags(struct xdp_frame *frame)
+static __always_inline bool xdp_frame_has_frags(const struct xdp_frame *frame)
{
return !!(frame->flags & XDP_FLAGS_HAS_FRAGS);
}

-static __always_inline bool xdp_frame_is_frag_pfmemalloc(struct xdp_frame *frame)
+static __always_inline bool
+xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame)
{
return !!(frame->flags & XDP_FLAGS_FRAGS_PF_MEMALLOC);
}
@@ -201,7 +204,7 @@ static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
}

static inline struct skb_shared_info *
-xdp_get_shared_info_from_frame(struct xdp_frame *frame)
+xdp_get_shared_info_from_frame(const struct xdp_frame *frame)
{
void *data_hard_start = frame->data - frame->headroom - sizeof(*frame);

@@ -249,7 +252,8 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp);
struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf);

static inline
-void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
+void xdp_convert_frame_to_buff(const struct xdp_frame *frame,
+ struct xdp_buff *xdp)
{
xdp->data_hard_start = frame->data - frame->headroom - sizeof(*frame);
xdp->data = frame->data;
@@ -260,7 +264,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
}

static inline
-int xdp_update_frame_from_buff(struct xdp_buff *xdp,
+int xdp_update_frame_from_buff(const struct xdp_buff *xdp,
struct xdp_frame *xdp_frame)
{
int metasize, headroom;
@@ -317,9 +321,10 @@ void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq);

-static __always_inline unsigned int xdp_get_frame_len(struct xdp_frame *xdpf)
+static __always_inline unsigned int
+xdp_get_frame_len(const struct xdp_frame *xdpf)
{
- struct skb_shared_info *sinfo;
+ const struct skb_shared_info *sinfo;
unsigned int len = xdpf->len;

if (likely(!xdp_frame_has_frags(xdpf)))
--
2.43.0


2023-12-23 03:02:18

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 11/34] xdp: allow attaching already registered memory model to xdp_rxq_info

One may need to register a memory model separately from xdp_rxq_info.
One simple example is the XDP test run code, but in general it might be
useful when the memory model registration is managed by one layer and
the XDP RxQ info by a different one.
Allow such scenarios by adding a simple helper which "attaches" an
already registered memory model to the desired xdp_rxq_info. As this is
mostly needed for Page Pool, add a special function to do that for a
&page_pool pointer.
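
A minimal usage sketch, assuming the &page_pool below was already
created and registered as a MEM_TYPE_PAGE_POOL memory model by a lower
layer; the example_* functions are hypothetical:

#include <net/page_pool/types.h>
#include <net/xdp.h>

static int example_setup_rxq_info(struct net_device *dev,
				  struct xdp_rxq_info *rxq,
				  struct page_pool *pool, u32 idx)
{
	int err;

	err = xdp_rxq_info_reg(rxq, dev, idx, 0);
	if (err)
		return err;

	/* No xdp_rxq_info_reg_mem_model() here, only attach the already
	 * registered model to this RxQ info.
	 */
	xdp_rxq_info_attach_page_pool(rxq, pool);

	return 0;
}

static void example_teardown_rxq_info(struct xdp_rxq_info *rxq)
{
	xdp_rxq_info_detach_mem_model(rxq);
	xdp_rxq_info_unreg(rxq);
}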

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/xdp.h | 14 ++++++++++++++
net/bpf/test_run.c | 4 ++--
net/core/xdp.c | 12 ++++++++++++
3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 197808df1ee1..909c0bc50517 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -356,6 +356,20 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq);
int xdp_reg_mem_model(struct xdp_mem_info *mem,
enum xdp_mem_type type, void *allocator);
void xdp_unreg_mem_model(struct xdp_mem_info *mem);
+void xdp_rxq_info_attach_page_pool(struct xdp_rxq_info *xdp_rxq,
+ const struct page_pool *pool);
+
+static inline void
+xdp_rxq_info_attach_mem_model(struct xdp_rxq_info *xdp_rxq,
+ const struct xdp_mem_info *mem)
+{
+ xdp_rxq->mem = *mem;
+}
+
+static inline void xdp_rxq_info_detach_mem_model(struct xdp_rxq_info *xdp_rxq)
+{
+ xdp_rxq->mem = (struct xdp_mem_info){ };
+}

/* Drivers not supporting XDP metadata can use this helper, which
* rejects any room expansion for metadata as a result.
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index dfd919374017..b612b28ebeac 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -194,8 +194,7 @@ static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_c
* xdp_mem_info pointing to our page_pool
*/
xdp_rxq_info_reg(&xdp->rxq, orig_ctx->rxq->dev, 0, 0);
- xdp->rxq.mem.type = MEM_TYPE_PAGE_POOL;
- xdp->rxq.mem.id = pp->xdp_mem_id;
+ xdp_rxq_info_attach_page_pool(&xdp->rxq, xdp->pp);
xdp->dev = orig_ctx->rxq->dev;
xdp->orig_ctx = orig_ctx;

@@ -212,6 +211,7 @@ static int xdp_test_run_setup(struct xdp_test_data *xdp, struct xdp_buff *orig_c

static void xdp_test_run_teardown(struct xdp_test_data *xdp)
{
+ xdp_rxq_info_detach_mem_model(&xdp->rxq);
xdp_unreg_mem_model(&xdp->mem);
page_pool_destroy(xdp->pp);
kfree(xdp->frames);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4869c1c2d8f3..03ebdb21ea62 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -368,6 +368,18 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,

EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);

+void xdp_rxq_info_attach_page_pool(struct xdp_rxq_info *xdp_rxq,
+ const struct page_pool *pool)
+{
+ struct xdp_mem_info mem = {
+ .type = MEM_TYPE_PAGE_POOL,
+ .id = pool->xdp_mem_id,
+ };
+
+ xdp_rxq_info_attach_mem_model(xdp_rxq, &mem);
+}
+EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool);
+
/* XDP RX runs under NAPI protection, and in different delivery error
* scenarios (e.g. queue full), it is possible to return the xdp_frame
* while still leveraging this protection. The @napi_direct boolean
--
2.43.0


2023-12-23 03:02:35

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 12/34] xdp: add generic xdp_buff_add_frag()

The code piece which attaches a frag to &xdp_buff is almost identical
across the drivers supporting XDP multi-buffer on Rx. Make it a
generic, elegant one-liner.
Also, I see lots of drivers calculating frags_truesize as
`xdp->frame_sz * nr_frags`. I can't say this is fully correct, since
frags might be backed by chunks of different sizes, especially with
stuff like header split. Even page_pool_alloc() can give you two
different truesizes on two subsequent requests to allocate the same
buffer size. Add a field to &skb_shared_info (unionized, as there's
currently no free slot on x86_64) to track the "true" truesize. It can
be used later when updating the skb.
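
A sketch of the intended driver-side usage on the Rx multi-buffer path;
only xdp_buff_add_frag() comes from this patch, the wrapper is
hypothetical:

#include <net/xdp.h>

/* Attach one Rx fragment to an already initialized &xdp_buff.  The
 * helper initializes nr_frags / xdp_frags_* on the first frag and
 * propagates pfmemalloc; on overflow, drop the whole frame.
 */
static bool example_add_rx_frag(struct xdp_buff *xdp, struct page *page,
				u32 offset, u32 len, u32 truesize)
{
	if (xdp_buff_add_frag(xdp, page, offset, len, truesize))
		return true;

	xdp_return_buff(xdp);
	xdp->data = NULL;

	return false;
}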

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/linux/skbuff.h | 14 ++++++++++----
include/net/xdp.h | 36 +++++++++++++++++++++++++++++++++++-
2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ea5c8ab3ed00..e350efa04070 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -598,11 +598,17 @@ struct skb_shared_info {
* Warning : all fields before dataref are cleared in __alloc_skb()
*/
atomic_t dataref;
- unsigned int xdp_frags_size;

- /* Intermediate layers must ensure that destructor_arg
- * remains valid until skb destructor */
- void * destructor_arg;
+ union {
+ struct {
+ unsigned int xdp_frags_size;
+ u32 xdp_frags_truesize;
+ };
+
+ /* Intermediate layers must ensure that destructor_arg
+ * remains valid until skb destructor */
+ void * destructor_arg;
+ };

/* must be last field, see pskb_expand_head() */
skb_frag_t frags[MAX_SKB_FRAGS];
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 909c0bc50517..a3dc0f39b437 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -165,6 +165,34 @@ xdp_get_buff_len(const struct xdp_buff *xdp)
return len;
}

+static inline bool xdp_buff_add_frag(struct xdp_buff *xdp, struct page *page,
+ u32 offset, u32 size, u32 truesize)
+{
+ struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+
+ if (!xdp_buff_has_frags(xdp)) {
+ sinfo->nr_frags = 0;
+
+ sinfo->xdp_frags_size = 0;
+ sinfo->xdp_frags_truesize = 0;
+
+ xdp_buff_set_frags_flag(xdp);
+ }
+
+ if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS))
+ return false;
+
+ __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, page, offset,
+ size);
+ sinfo->xdp_frags_size += size;
+ sinfo->xdp_frags_truesize += truesize;
+
+ if (unlikely(page_is_pfmemalloc(page)))
+ xdp_buff_set_frag_pfmemalloc(xdp);
+
+ return true;
+}
+
struct xdp_frame {
void *data;
u16 len;
@@ -230,7 +258,13 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
unsigned int size, unsigned int truesize,
bool pfmemalloc)
{
- skb_shinfo(skb)->nr_frags = nr_frags;
+ struct skb_shared_info *sinfo = skb_shinfo(skb);
+
+ sinfo->nr_frags = nr_frags;
+ /* ``destructor_arg`` is unionized with ``xdp_frags_{,true}size``,
+ * reset it now that these fields aren't used anymore.
+ */
+ sinfo->destructor_arg = NULL;

skb->len += size;
skb->data_len += size;
--
2.43.0


2023-12-23 03:02:54

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 13/34] xdp: add generic xdp_build_skb_from_buff()

The code which builds an skb from an &xdp_buff keeps multiplying itself
across the drivers with almost no changes. Let's try to stop that by
adding a generic function.
There's __xdp_build_skb_from_frame() already, so just convert it to take
an &xdp_buff instead and make the original a wrapper around the new
function. The original always took an already allocated skb; allow both
variants here -- if no skb is passed, which is expected when calling
from a driver, allocate one via napi_build_skb().
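
A rough sketch of how a driver Rx path could consume the new helper
once the verdict is XDP_PASS (the surrounding function is
hypothetical):

#include <linux/netdevice.h>
#include <net/xdp.h>

static void example_pass_up(struct napi_struct *napi, struct xdp_buff *xdp)
{
	struct sk_buff *skb;

	/* Replaces the open-coded napi_build_skb() + skb_reserve() +
	 * __skb_put() + metadata / frags handling.
	 */
	skb = xdp_build_skb_from_buff(xdp);
	if (unlikely(!skb)) {
		xdp_return_buff(xdp);
		return;
	}

	napi_gro_receive(napi, skb);
}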

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/xdp.h | 4 +++
net/core/xdp.c | 89 +++++++++++++++++++++++++++++++----------------
2 files changed, 63 insertions(+), 30 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index a3dc0f39b437..4fcf0ac48345 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -276,6 +276,10 @@ xdp_update_skb_shared_info(struct sk_buff *skb, u8 nr_frags,
void xdp_warn(const char *msg, const char *func, const int line);
#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)

+struct sk_buff *__xdp_build_skb_from_buff(struct sk_buff *skb,
+ const struct xdp_buff *xdp);
+#define xdp_build_skb_from_buff(xdp) __xdp_build_skb_from_buff(NULL, xdp)
+
struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 03ebdb21ea62..ed73b97472b4 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -609,45 +609,77 @@ int xdp_alloc_skb_bulk(void **skbs, int n_skb, gfp_t gfp)
}
EXPORT_SYMBOL_GPL(xdp_alloc_skb_bulk);

-struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
- struct sk_buff *skb,
- struct net_device *dev)
+struct sk_buff *__xdp_build_skb_from_buff(struct sk_buff *skb,
+ const struct xdp_buff *xdp)
{
- struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
- unsigned int headroom, frame_size;
- void *hard_start;
- u8 nr_frags;
+ const struct xdp_rxq_info *rxq = xdp->rxq;
+ const struct skb_shared_info *sinfo;
+ u32 nr_frags = 0;

/* xdp frags frame */
- if (unlikely(xdp_frame_has_frags(xdpf)))
+ if (unlikely(xdp_buff_has_frags(xdp))) {
+ sinfo = xdp_get_shared_info_from_buff(xdp);
nr_frags = sinfo->nr_frags;
+ }

- /* Part of headroom was reserved to xdpf */
- headroom = sizeof(*xdpf) + xdpf->headroom;
+ net_prefetch(xdp->data_meta);

- /* Memory size backing xdp_frame data already have reserved
- * room for build_skb to place skb_shared_info in tailroom.
- */
- frame_size = xdpf->frame_sz;
+ if (!skb) {
+ skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz);
+ if (unlikely(!skb))
+ return NULL;
+ } else {
+ /* build_skb_around() can return NULL only when !skb, which
+ * is impossible here.
+ */
+ build_skb_around(skb, xdp->data_hard_start, xdp->frame_sz);
+ }

- hard_start = xdpf->data - headroom;
- skb = build_skb_around(skb, hard_start, frame_size);
- if (unlikely(!skb))
- return NULL;
+ skb_reserve(skb, xdp->data - xdp->data_hard_start);
+ __skb_put(skb, xdp->data_end - xdp->data);
+ if (xdp->data > xdp->data_meta)
+ skb_metadata_set(skb, xdp->data - xdp->data_meta);
+
+ if (rxq->mem.type == MEM_TYPE_PAGE_POOL)
+ skb_mark_for_recycle(skb);

- skb_reserve(skb, headroom);
- __skb_put(skb, xdpf->len);
- if (xdpf->metasize)
- skb_metadata_set(skb, xdpf->metasize);
+ /* __xdp_rxq_info_reg() sets these two together */
+ if (rxq->reg_state == REG_STATE_REGISTERED)
+ skb_record_rx_queue(skb, rxq->queue_index);
+
+ if (unlikely(nr_frags)) {
+ u32 truesize = sinfo->xdp_frags_truesize ? :
+ nr_frags * xdp->frame_sz;

- if (unlikely(xdp_frame_has_frags(xdpf)))
xdp_update_skb_shared_info(skb, nr_frags,
- sinfo->xdp_frags_size,
- nr_frags * xdpf->frame_sz,
- xdp_frame_is_frag_pfmemalloc(xdpf));
+ sinfo->xdp_frags_size, truesize,
+ xdp_buff_is_frag_pfmemalloc(xdp));
+ }

/* Essential SKB info: protocol and skb->dev */
- skb->protocol = eth_type_trans(skb, dev);
+ skb->protocol = eth_type_trans(skb, rxq->dev);
+
+ return skb;
+}
+EXPORT_SYMBOL_GPL(__xdp_build_skb_from_buff);
+
+struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
+ struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct xdp_rxq_info rxq = {
+ .dev = dev,
+ .mem = xdpf->mem,
+ };
+ struct xdp_buff xdp;
+
+ /* Check early instead of delegating it to build_skb_around() */
+ if (unlikely(!skb))
+ return NULL;
+
+ xdp.rxq = &rxq;
+ xdp_convert_frame_to_buff(xdpf, &xdp);
+ __xdp_build_skb_from_buff(skb, &xdp);

/* Optional SKB info, currently missing:
* - HW checksum info (skb->ip_summed)
@@ -655,9 +687,6 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
* - RX ring dev queue index (skb_record_rx_queue)
*/

- if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
- skb_mark_for_recycle(skb);
-
/* Allow SKB to reuse area used by xdp_frame */
xdp_scrub_frame(xdpf);

--
2.43.0


2023-12-23 03:03:15

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 14/34] xdp: get rid of xdp_frame::mem.id

Initially, xdp_frame::mem.id was used to search for the corresponding
&page_pool to return the page correctly.
However, now that struct page contains a direct pointer to its PP,
keeping this field makes no sense anymore. xdp_return_frame_bulk()
still uses it to do a lookup, but this is rather a leftover.
Remove xdp_frame::mem and replace it with ::mem_type, as only the
memory type still matters and we need to know it to be able to free the
frame correctly.
But the main reason for this change is to allow mixing pages from
different &page_pools within one &xdp_buff/&xdp_frame. Why not?
Adjust xdp_return_frame_bulk() and page_pool_put_page_bulk() so that
they are no longer tied to a particular pool. Let the latter splice the
bulk when it encounters a page whose PP is different and flush it
recursively.
As a cute side effect, sizeof(struct xdp_frame) on x86_64 is reduced
from 40 to 32 bytes, giving a bit more free headroom in front of the
frame.
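
A short sketch of the Tx completion pattern this unlocks, with frames
backed by different page_pools freed through one bulk (the function
itself is hypothetical):

#include <net/xdp.h>

static void example_clean_xdp_frames(struct xdp_frame **frames, u32 count)
{
	struct xdp_frame_bulk bq;

	xdp_frame_bulk_init(&bq);

	rcu_read_lock();

	/* No per-pool grouping needed anymore: the bulk is spliced
	 * internally when the backing page_pool changes.
	 */
	for (u32 i = 0; i < count; i++)
		xdp_return_frame_bulk(frames[i], &bq);

	xdp_flush_frame_bulk(&bq);

	rcu_read_unlock();
}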

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 2 +-
drivers/net/veth.c | 4 +-
include/net/page_pool/types.h | 6 +--
include/net/xdp.h | 17 +++----
kernel/bpf/cpumap.c | 2 +-
net/bpf/test_run.c | 2 +-
net/core/filter.c | 2 +-
net/core/page_pool.c | 31 ++++++++++--
net/core/xdp.c | 49 ++++++-------------
9 files changed, 58 insertions(+), 57 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index dcbc598b11c6..4f1fb8181131 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2263,7 +2263,7 @@ static int dpaa_a050385_wa_xdpf(struct dpaa_priv *priv,
new_xdpf->len = xdpf->len;
new_xdpf->headroom = priv->tx_headroom;
new_xdpf->frame_sz = DPAA_BP_RAW_SIZE;
- new_xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
+ new_xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;

/* Release the initial buffer */
xdp_return_frame_rx_napi(xdpf);
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 578e36ea1589..8223c5c68704 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -633,7 +633,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
break;
case XDP_TX:
orig_frame = *frame;
- xdp->rxq->mem = frame->mem;
+ xdp->rxq->mem.type = frame->mem_type;
if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
trace_xdp_exception(rq->dev, xdp_prog, act);
frame = &orig_frame;
@@ -645,7 +645,7 @@ static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
goto xdp_xmit;
case XDP_REDIRECT:
orig_frame = *frame;
- xdp->rxq->mem = frame->mem;
+ xdp->rxq->mem.type = frame->mem_type;
if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
frame = &orig_frame;
stats->rx_drops++;
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 24733bef19bc..adc6630ace9c 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -222,8 +222,7 @@ void page_pool_unlink_napi(struct page_pool *pool);
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
struct xdp_mem_info *mem);
-void page_pool_put_page_bulk(struct page_pool *pool, void **data,
- int count);
+void page_pool_put_page_bulk(void **data, int count);
#else
static inline void page_pool_unlink_napi(struct page_pool *pool)
{
@@ -239,8 +238,7 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
{
}

-static inline void page_pool_put_page_bulk(struct page_pool *pool, void **data,
- int count)
+static inline void page_pool_put_page_bulk(void **data, int count)
{
}
#endif
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 4fcf0ac48345..66854b755b58 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -197,11 +197,8 @@ struct xdp_frame {
void *data;
u16 len;
u16 headroom;
- u32 metasize; /* uses lower 8-bits */
- /* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
- * while mem info is valid on remote CPU.
- */
- struct xdp_mem_info mem;
+ u16 metasize;
+ enum xdp_mem_type mem_type:16;
struct net_device *dev_rx; /* used by cpumap */
u32 frame_sz;
u32 flags; /* supported values defined in xdp_buff_flags */
@@ -221,14 +218,12 @@ xdp_frame_is_frag_pfmemalloc(const struct xdp_frame *frame)
#define XDP_BULK_QUEUE_SIZE 16
struct xdp_frame_bulk {
int count;
- void *xa;
void *q[XDP_BULK_QUEUE_SIZE];
};

static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
{
- /* bq->count will be zero'ed when bq->xa gets updated */
- bq->xa = NULL;
+ bq->count = 0;
}

static inline struct skb_shared_info *
@@ -344,13 +339,13 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
if (unlikely(xdp_update_frame_from_buff(xdp, xdp_frame) < 0))
return NULL;

- /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
- xdp_frame->mem = xdp->rxq->mem;
+ /* rxq only valid until napi_schedule ends, convert to xdp_mem_type */
+ xdp_frame->mem_type = xdp->rxq->mem.type;

return xdp_frame;
}

-void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
struct xdp_buff *xdp);
void xdp_return_frame(struct xdp_frame *xdpf);
void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8a0bb80fe48a..3d557d458284 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -191,7 +191,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
int err;

rxq.dev = xdpf->dev_rx;
- rxq.mem = xdpf->mem;
+ rxq.mem.type = xdpf->mem_type;
/* TODO: report queue_index to xdp_rxq_info */

xdp_convert_frame_to_buff(xdpf, &xdp);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index b612b28ebeac..a35940a41589 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -151,7 +151,7 @@ static void xdp_test_run_init_page(struct page *page, void *arg)
new_ctx->data = new_ctx->data_meta + meta_len;

xdp_update_frame_from_buff(new_ctx, frm);
- frm->mem = new_ctx->rxq->mem;
+ frm->mem_type = new_ctx->rxq->mem.type;

memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 4ace1edc4de1..e79e74edbae4 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4114,7 +4114,7 @@ static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset)
if (skb_frag_size(frag) == shrink) {
struct page *page = skb_frag_page(frag);

- __xdp_return(page_address(page), &xdp->rxq->mem,
+ __xdp_return(page_address(page), xdp->rxq->mem.type,
false, NULL);
n_frags_free++;
} else {
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index bdebc9028da3..742289cd86cd 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -698,9 +698,18 @@ void page_pool_put_unrefed_page(struct page_pool *pool, struct page *page,
}
EXPORT_SYMBOL(page_pool_put_unrefed_page);

+static void page_pool_bulk_splice(struct xdp_frame_bulk *bulk, void *data)
+{
+ if (unlikely(bulk->count == ARRAY_SIZE(bulk->q))) {
+ page_pool_put_page_bulk(bulk->q, bulk->count);
+ bulk->count = 0;
+ }
+
+ bulk->q[bulk->count++] = data;
+}
+
/**
* page_pool_put_page_bulk() - release references on multiple pages
- * @pool: pool from which pages were allocated
* @data: array holding page pointers
* @count: number of pages in @data
*
@@ -713,12 +722,15 @@ EXPORT_SYMBOL(page_pool_put_unrefed_page);
* Please note the caller must not use data area after running
* page_pool_put_page_bulk(), as this function overwrites it.
*/
-void page_pool_put_page_bulk(struct page_pool *pool, void **data,
- int count)
+void page_pool_put_page_bulk(void **data, int count)
{
+ struct page_pool *pool = NULL;
+ struct xdp_frame_bulk sub;
int i, bulk_len = 0;
bool in_softirq;

+ xdp_frame_bulk_init(&sub);
+
for (i = 0; i < count; i++) {
struct page *page = virt_to_head_page(data[i]);

@@ -726,12 +738,25 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
if (!page_pool_is_last_ref(page))
continue;

+ if (unlikely(!pool)) {
+ pool = page->pp;
+ } else if (page->pp != pool) {
+ /* If the page belongs to a different page_pool,
+ * splice the array and handle it recursively.
+ */
+ page_pool_bulk_splice(&sub, data[i]);
+ continue;
+ }
+
page = __page_pool_put_page(pool, page, -1, false);
/* Approved for bulk recycling in ptr_ring cache */
if (page)
data[bulk_len++] = page;
}

+ if (sub.count)
+ page_pool_put_page_bulk(sub.q, sub.count);
+
if (unlikely(!bulk_len))
return;

diff --git a/net/core/xdp.c b/net/core/xdp.c
index ed73b97472b4..8ef1d735a7eb 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -386,12 +386,12 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_attach_page_pool);
* is used for those calls sites. Thus, allowing for faster recycling
* of xdp_frames/pages in those cases.
*/
-void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+void __xdp_return(void *data, enum xdp_mem_type mem_type, bool napi_direct,
struct xdp_buff *xdp)
{
struct page *page;

- switch (mem->type) {
+ switch (mem_type) {
case MEM_TYPE_PAGE_POOL:
page = virt_to_head_page(data);
if (napi_direct && xdp_return_frame_no_direct())
@@ -414,7 +414,7 @@ void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
break;
default:
/* Not possible, checked in xdp_rxq_info_reg_mem_model() */
- WARN(1, "Incorrect XDP memory type (%d) usage", mem->type);
+ WARN(1, "Incorrect XDP memory type (%d) usage", mem_type);
break;
}
}
@@ -431,10 +431,10 @@ void xdp_return_frame(struct xdp_frame *xdpf)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);

- __xdp_return(page_address(page), &xdpf->mem, false, NULL);
+ __xdp_return(page_address(page), xdpf->mem_type, false, NULL);
}
out:
- __xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+ __xdp_return(xdpf->data, xdpf->mem_type, false, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame);

@@ -450,10 +450,10 @@ void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);

- __xdp_return(page_address(page), &xdpf->mem, true, NULL);
+ __xdp_return(page_address(page), xdpf->mem_type, true, NULL);
}
out:
- __xdp_return(xdpf->data, &xdpf->mem, true, NULL);
+ __xdp_return(xdpf->data, xdpf->mem_type, true, NULL);
}
EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);

@@ -469,12 +469,10 @@ EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
*/
void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq)
{
- struct xdp_mem_allocator *xa = bq->xa;
-
- if (unlikely(!xa || !bq->count))
+ if (unlikely(!bq->count))
return;

- page_pool_put_page_bulk(xa->page_pool, bq->q, bq->count);
+ page_pool_put_page_bulk(bq->q, bq->count);
/* bq->xa is not cleared to save lookup, if mem.id same in next bulk */
bq->count = 0;
}
@@ -484,29 +482,14 @@ EXPORT_SYMBOL_GPL(xdp_flush_frame_bulk);
void xdp_return_frame_bulk(struct xdp_frame *xdpf,
struct xdp_frame_bulk *bq)
{
- struct xdp_mem_info *mem = &xdpf->mem;
- struct xdp_mem_allocator *xa;
-
- if (mem->type != MEM_TYPE_PAGE_POOL) {
+ if (xdpf->mem_type != MEM_TYPE_PAGE_POOL) {
xdp_return_frame(xdpf);
return;
}

- xa = bq->xa;
- if (unlikely(!xa)) {
- xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
- bq->count = 0;
- bq->xa = xa;
- }
-
if (bq->count == XDP_BULK_QUEUE_SIZE)
xdp_flush_frame_bulk(bq);

- if (unlikely(mem->id != xa->mem.id)) {
- xdp_flush_frame_bulk(bq);
- bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
- }
-
if (unlikely(xdp_frame_has_frags(xdpf))) {
struct skb_shared_info *sinfo;
int i;
@@ -536,10 +519,11 @@ void xdp_return_buff(struct xdp_buff *xdp)
for (i = 0; i < sinfo->nr_frags; i++) {
struct page *page = skb_frag_page(&sinfo->frags[i]);

- __xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+ __xdp_return(page_address(page), xdp->rxq->mem.type, true,
+ xdp);
}
out:
- __xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
+ __xdp_return(xdp->data, xdp->rxq->mem.type, true, xdp);
}
EXPORT_SYMBOL_GPL(xdp_return_buff);

@@ -585,7 +569,7 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
xdpf->headroom = 0;
xdpf->metasize = metasize;
xdpf->frame_sz = PAGE_SIZE;
- xdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
+ xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;

xsk_buff_free(xdp);
return xdpf;
@@ -669,7 +653,7 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
{
struct xdp_rxq_info rxq = {
.dev = dev,
- .mem = xdpf->mem,
+ .mem.type = xdpf->mem_type,
};
struct xdp_buff xdp;

@@ -731,8 +715,7 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf)
nxdpf = addr;
nxdpf->data = addr + headroom;
nxdpf->frame_sz = PAGE_SIZE;
- nxdpf->mem.type = MEM_TYPE_PAGE_ORDER0;
- nxdpf->mem.id = 0;
+ nxdpf->mem_type = MEM_TYPE_PAGE_ORDER0;

return nxdpf;
}
--
2.43.0


2023-12-23 03:03:35

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 15/34] page_pool: add inline helper to sync VA for device (for XDP_TX)

Drivers using Page Pool for Rx buffers follow the same pattern on
XDP_TX: sync the buffer DMA for the device and obtain the DMA address
of the &xdp_buff they are sending.
Add a helper so that drivers can do both in one call.
I explicitly added a `bool compound` argument and set it to false by
default: only a few drivers, if any, use high-order pages with Page
Pool, so losing cycles on compound_head() looks suboptimal. Drivers can
always call the underscored version if needed (for example, pass
pool->p.order as the last argument -- it will always work).
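
A minimal sketch of the intended XDP_TX usage; the descriptor layout
here is purely illustrative:

#include <net/page_pool/helpers.h>
#include <net/xdp.h>

struct example_tx_desc {
	dma_addr_t addr;
	u32 len;
};

static void example_xdp_tx_fill(struct example_tx_desc *desc,
				const struct xdp_buff *xdp)
{
	u32 len = xdp->data_end - xdp->data;

	/* One call: sync the payload for the device and get the DMA
	 * address to put into the Tx descriptor.
	 */
	desc->addr = page_pool_dma_sync_va_for_device(xdp->data, len);
	desc->len = len;
}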

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/helpers.h | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 873631c79ab1..99dc825b03a5 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -397,6 +397,38 @@ static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
return false;
}

+static inline dma_addr_t __page_pool_dma_sync_va_for_device(const void *va,
+ u32 dma_sync_size,
+ bool compound)
+{
+ const struct page_pool *pool;
+ const struct page *page;
+ dma_addr_t addr;
+ u32 offset;
+
+ if (unlikely(compound)) {
+ page = virt_to_head_page(va);
+ offset = va - page_address(page);
+ } else {
+ page = virt_to_page(va);
+ offset = offset_in_page(va);
+ }
+
+ addr = page_pool_get_dma_addr(page) + offset;
+ pool = page->pp;
+
+ dma_sync_single_for_device(pool->p.dev, addr, dma_sync_size,
+ page_pool_get_dma_dir(pool));
+
+ return addr;
+}
+
+static inline dma_addr_t page_pool_dma_sync_va_for_device(const void *va,
+ u32 dma_sync_size)
+{
+ return __page_pool_dma_sync_va_for_device(va, dma_sync_size, false);
+}
+
/**
* page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
* @pool: &page_pool the @page belongs to
--
2.43.0


2023-12-23 03:03:54

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 16/34] jump_label: export static_key_slow_{inc,dec}_cpuslocked()

Sometimes, there's a need to modify a lot of static keys or to modify
the same key multiple times in a loop. In that case, it's more optimal
to take cpus_read_lock() once and then call the _cpuslocked() variants.
The enable/disable functions are already exported; the refcounted
counterparts, however, are not. Fix that to allow modules to save some
cycles.
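
An illustrative module-side pattern this enables (the function below is
hypothetical):

#include <linux/cpu.h>
#include <linux/jump_label.h>

/* Flip a bunch of keys while taking the CPU hotplug lock only once
 * instead of once per static_key_slow_inc()/dec() call.
 */
static void example_enable_keys(struct static_key *keys, u32 count)
{
	cpus_read_lock();

	for (u32 i = 0; i < count; i++)
		static_key_slow_inc_cpuslocked(&keys[i]);

	cpus_read_unlock();
}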

Signed-off-by: Alexander Lobakin <[email protected]>
---
kernel/jump_label.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index d9c822bbffb8..f0375372b484 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -177,6 +177,7 @@ bool static_key_slow_inc_cpuslocked(struct static_key *key)
jump_label_unlock();
return true;
}
+EXPORT_SYMBOL_GPL(static_key_slow_inc_cpuslocked);

bool static_key_slow_inc(struct static_key *key)
{
@@ -304,6 +305,7 @@ void static_key_slow_dec_cpuslocked(struct static_key *key)
STATIC_KEY_CHECK_USE(key);
__static_key_slow_dec_cpuslocked(key);
}
+EXPORT_SYMBOL_GPL(static_key_slow_dec_cpuslocked);

void __static_key_slow_dec_deferred(struct static_key *key,
struct delayed_work *work,
--
2.43.0


2023-12-23 03:04:13

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 17/34] libie: support native XDP and register memory model

Expand libie's Page Pool functionality by adding native XDP support.
This means picking the appropriate headroom and DMA direction.
Also, register all the created &page_pools as XDP memory models.
A driver can then call xdp_rxq_info_attach_page_pool() when registering
its RxQ info.
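
Roughly how a libie-based driver is expected to consume this -- a
sketch with an invented wrapper, assuming libie_rx_page_pool_create()
takes the buffer queue and the NAPI instance as in the libie series:

#include <linux/net/intel/libie/rx.h>
#include <net/xdp.h>

static int example_create_xdp_pool(struct libie_buf_queue *bq,
				   struct napi_struct *napi,
				   struct xdp_rxq_info *rxq)
{
	int err;

	/* Selects LIBIE_XDP_HEADROOM and DMA_BIDIRECTIONAL inside */
	bq->xdp = true;

	err = libie_rx_page_pool_create(bq, napi);
	if (err)
		return err;

	/* The pool is already registered as a memory model by libie */
	xdp_rxq_info_attach_page_pool(rxq, bq->pp);

	return 0;
}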

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/libie/rx.c | 32 ++++++++++++++++++++++-----
include/linux/net/intel/libie/rx.h | 6 ++++-
2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/libie/rx.c b/drivers/net/ethernet/intel/libie/rx.c
index 3d3b19d2b40d..b4c404958f25 100644
--- a/drivers/net/ethernet/intel/libie/rx.c
+++ b/drivers/net/ethernet/intel/libie/rx.c
@@ -52,7 +52,7 @@ static u32 libie_rx_hw_len_truesize(const struct page_pool_params *pp,
static void libie_rx_page_pool_params(struct libie_buf_queue *bq,
struct page_pool_params *pp)
{
- pp->offset = LIBIE_SKB_HEADROOM;
+ pp->offset = bq->xdp ? LIBIE_XDP_HEADROOM : LIBIE_SKB_HEADROOM;
/* HW-writeable / syncable length per one page */
pp->max_len = LIBIE_RX_BUF_LEN(pp->offset);

@@ -132,17 +132,34 @@ int libie_rx_page_pool_create(struct libie_buf_queue *bq,
.dev = napi->dev->dev.parent,
.netdev = napi->dev,
.napi = napi,
- .dma_dir = DMA_FROM_DEVICE,
};
+ struct xdp_mem_info mem;
+ struct page_pool *pool;
+ int ret;
+
+ pp.dma_dir = bq->xdp ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE;

if (!bq->hsplit)
libie_rx_page_pool_params(bq, &pp);
else if (!libie_rx_page_pool_params_zc(bq, &pp))
return -EINVAL;

- bq->pp = page_pool_create(&pp);
+ pool = page_pool_create(&pp);
+ if (IS_ERR(pool))
+ return PTR_ERR(pool);
+
+ ret = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pool);
+ if (ret)
+ goto err_mem;
+
+ bq->pp = pool;
+
+ return 0;

- return PTR_ERR_OR_ZERO(bq->pp);
+err_mem:
+ page_pool_destroy(pool);
+
+ return ret;
}
EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_create, LIBIE);

@@ -152,7 +169,12 @@ EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_create, LIBIE);
*/
void libie_rx_page_pool_destroy(struct libie_buf_queue *bq)
{
- page_pool_destroy(bq->pp);
+ struct xdp_mem_info mem = {
+ .type = MEM_TYPE_PAGE_POOL,
+ .id = bq->pp->xdp_mem_id,
+ };
+
+ xdp_unreg_mem_model(&mem);
bq->pp = NULL;
}
EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_destroy, LIBIE);
diff --git a/include/linux/net/intel/libie/rx.h b/include/linux/net/intel/libie/rx.h
index 87ad8f9e89c7..8eda4ac8028c 100644
--- a/include/linux/net/intel/libie/rx.h
+++ b/include/linux/net/intel/libie/rx.h
@@ -15,8 +15,10 @@

/* Space reserved in front of each frame */
#define LIBIE_SKB_HEADROOM (NET_SKB_PAD + NET_IP_ALIGN)
+#define LIBIE_XDP_HEADROOM (ALIGN(XDP_PACKET_HEADROOM, NET_SKB_PAD) + \
+ NET_IP_ALIGN)
/* Maximum headroom to calculate max MTU below */
-#define LIBIE_MAX_HEADROOM LIBIE_SKB_HEADROOM
+#define LIBIE_MAX_HEADROOM LIBIE_XDP_HEADROOM
/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
#define LIBIE_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
/* Maximum supported L2-L4 header length */
@@ -87,6 +89,7 @@ enum libie_rx_buf_type {
* @rx_buf_len: HW-writeable length per each buffer
* @type: type of the buffers this queue has
* @hsplit: flag whether header split is enabled
+ * @xdp: flag indicating whether XDP is enabled
*/
struct libie_buf_queue {
struct page_pool *pp;
@@ -100,6 +103,7 @@ struct libie_buf_queue {
enum libie_rx_buf_type type:2;

bool hsplit:1;
+ bool xdp:1;
};

int libie_rx_page_pool_create(struct libie_buf_queue *bq,
--
2.43.0


2023-12-23 03:04:33

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 18/34] libie: add a couple of XDP helpers

"Couple" is a bit humbly... Add the following functionality to libie:

* XDP shared queues managing
* XDP_TX bulk sending infra
* .ndo_xdp_xmit() infra
* adding buffers to &xdp_buff
* running XDP prog and managing its verdict
* completing XDP Tx buffers
* ^ repeat everything for XSk
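
As a taste of the driver-facing API, here is a sketch of the XDP SQ
sharing setup; the example_* structures are hypothetical, and
cpus_read_lock() is taken once since the _cpuslocked() static-key
helpers are used underneath:

#include <linux/cpu.h>
#include <linux/net/intel/libie/xdp.h>

struct example_xdpq {
	struct libie_xdp_sq_lock lock;
	/* HW ring fields, tx_buf array etc. */
};

struct example_priv {
	struct net_device *netdev;
	struct example_xdpq *xdpq;
	u32 num_rxq, num_txq, max_txq;
	u32 num_xdpq;
};

static void example_cfg_xdp_sqs(struct example_priv *priv)
{
	bool shared;

	/* One XDP SQ per CPU when possible, shared otherwise */
	priv->num_xdpq = libie_xdp_get_sq_num(priv->num_rxq, priv->num_txq,
					      priv->max_txq);
	shared = libie_xdp_sq_shared(priv->num_xdpq);

	cpus_read_lock();

	for (u32 i = 0; i < priv->num_xdpq; i++)
		libie_xdp_sq_get(&priv->xdpq[i].lock, priv->netdev, shared);

	cpus_read_unlock();
}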

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/libie/Makefile | 3 +
drivers/net/ethernet/intel/libie/tx.c | 16 +
drivers/net/ethernet/intel/libie/xdp.c | 50 ++
drivers/net/ethernet/intel/libie/xsk.c | 49 ++
include/linux/net/intel/libie/tx.h | 6 +
include/linux/net/intel/libie/xdp.h | 586 ++++++++++++++++++++++
include/linux/net/intel/libie/xsk.h | 172 +++++++
7 files changed, 882 insertions(+)
create mode 100644 drivers/net/ethernet/intel/libie/tx.c
create mode 100644 drivers/net/ethernet/intel/libie/xdp.c
create mode 100644 drivers/net/ethernet/intel/libie/xsk.c
create mode 100644 include/linux/net/intel/libie/xdp.h
create mode 100644 include/linux/net/intel/libie/xsk.h

diff --git a/drivers/net/ethernet/intel/libie/Makefile b/drivers/net/ethernet/intel/libie/Makefile
index 76f32253481b..7ca353e4ecdf 100644
--- a/drivers/net/ethernet/intel/libie/Makefile
+++ b/drivers/net/ethernet/intel/libie/Makefile
@@ -5,3 +5,6 @@ obj-$(CONFIG_LIBIE) += libie.o

libie-objs += rx.o
libie-objs += stats.o
+libie-objs += tx.o
+libie-objs += xdp.o
+libie-objs += xsk.o
diff --git a/drivers/net/ethernet/intel/libie/tx.c b/drivers/net/ethernet/intel/libie/tx.c
new file mode 100644
index 000000000000..9e3b6e3c3c25
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/tx.c
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Intel Corporation. */
+
+#include <linux/net/intel/libie/xdp.h>
+
+void libie_tx_complete_any(struct libie_tx_buffer *buf, struct device *dev,
+ struct xdp_frame_bulk *bq, u32 *xdp_tx_active,
+ struct libie_sq_onstack_stats *ss)
+{
+ if (buf->type > LIBIE_TX_BUF_SKB)
+ libie_xdp_complete_tx_buf(buf, dev, false, bq, xdp_tx_active,
+ ss);
+ else
+ libie_tx_complete_buf(buf, dev, false, ss);
+}
+EXPORT_SYMBOL_NS_GPL(libie_tx_complete_any, LIBIE);
diff --git a/drivers/net/ethernet/intel/libie/xdp.c b/drivers/net/ethernet/intel/libie/xdp.c
new file mode 100644
index 000000000000..f47a17ca6e66
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/xdp.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Intel Corporation. */
+
+#include <linux/net/intel/libie/xdp.h>
+
+/* XDP SQ sharing */
+
+DEFINE_STATIC_KEY_FALSE(libie_xdp_sq_share);
+EXPORT_SYMBOL_NS_GPL(libie_xdp_sq_share, LIBIE);
+
+void __libie_xdp_sq_get(struct libie_xdp_sq_lock *lock,
+ const struct net_device *dev)
+{
+ bool warn;
+
+ spin_lock_init(&lock->lock);
+ lock->share = true;
+
+ warn = !static_key_enabled(&libie_xdp_sq_share);
+ static_branch_inc_cpuslocked(&libie_xdp_sq_share);
+
+ if (warn)
+ netdev_warn(dev, "XDP SQ sharing enabled, possible XDP_TX/XDP_REDIRECT slowdown\n");
+
+}
+EXPORT_SYMBOL_NS_GPL(__libie_xdp_sq_get, LIBIE);
+
+void __libie_xdp_sq_put(struct libie_xdp_sq_lock *lock,
+ const struct net_device *dev)
+{
+ static_branch_dec_cpuslocked(&libie_xdp_sq_share);
+
+ if (!static_key_enabled(&libie_xdp_sq_share))
+ netdev_notice(dev, "XDP SQ sharing disabled\n");
+
+ lock->share = false;
+}
+EXPORT_SYMBOL_NS_GPL(__libie_xdp_sq_put, LIBIE);
+
+/* ``XDP_TX`` bulking */
+
+void libie_xdp_tx_return_bulk(const struct libie_xdp_tx_frame *bq, u32 count)
+{
+ for (u32 i = 0; i < count; i++) {
+ struct page *page = virt_to_page(bq[i].data);
+
+ page_pool_recycle_direct(page->pp, page);
+ }
+}
+EXPORT_SYMBOL_NS_GPL(libie_xdp_tx_return_bulk, LIBIE);
diff --git a/drivers/net/ethernet/intel/libie/xsk.c b/drivers/net/ethernet/intel/libie/xsk.c
new file mode 100644
index 000000000000..ffbdb85586f1
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/xsk.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Intel Corporation. */
+
+#include <linux/net/intel/libie/xsk.h>
+
+#define LIBIE_XSK_DMA_ATTR (DMA_ATTR_WEAK_ORDERING | \
+ DMA_ATTR_SKIP_CPU_SYNC)
+
+int libie_xsk_enable_pool(struct net_device *dev, u32 qid, unsigned long *map)
+{
+ struct xsk_buff_pool *pool;
+ int ret;
+
+ if (qid >= min(dev->real_num_rx_queues, dev->real_num_tx_queues))
+ return -EINVAL;
+
+ pool = xsk_get_pool_from_qid(dev, qid);
+ if (!pool)
+ return -EINVAL;
+
+ ret = xsk_pool_dma_map(pool, dev->dev.parent, LIBIE_XSK_DMA_ATTR);
+ if (ret)
+ return ret;
+
+ set_bit(qid, map);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(libie_xsk_enable_pool, LIBIE);
+
+int libie_xsk_disable_pool(struct net_device *dev, u32 qid,
+ unsigned long *map)
+{
+ struct xsk_buff_pool *pool;
+
+ if (qid >= min(dev->real_num_rx_queues, dev->real_num_tx_queues))
+ return -EINVAL;
+
+ pool = xsk_get_pool_from_qid(dev, qid);
+ if (!pool)
+ return -EINVAL;
+
+ xsk_pool_dma_unmap(pool, LIBIE_XSK_DMA_ATTR);
+
+ clear_bit(qid, map);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(libie_xsk_disable_pool, LIBIE);
diff --git a/include/linux/net/intel/libie/tx.h b/include/linux/net/intel/libie/tx.h
index 07a19abb72fd..4d4d85af6f7e 100644
--- a/include/linux/net/intel/libie/tx.h
+++ b/include/linux/net/intel/libie/tx.h
@@ -85,4 +85,10 @@ static inline void libie_tx_complete_buf(struct libie_tx_buffer *buf,
buf->type = LIBIE_TX_BUF_EMPTY;
}

+struct xdp_frame_bulk;
+
+void libie_tx_complete_any(struct libie_tx_buffer *buf, struct device *dev,
+ struct xdp_frame_bulk *bq, u32 *xdp_tx_active,
+ struct libie_sq_onstack_stats *ss);
+
#endif /* __LIBIE_TX_H */
diff --git a/include/linux/net/intel/libie/xdp.h b/include/linux/net/intel/libie/xdp.h
new file mode 100644
index 000000000000..087fc075078f
--- /dev/null
+++ b/include/linux/net/intel/libie/xdp.h
@@ -0,0 +1,586 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2023 Intel Corporation. */
+
+#ifndef __LIBIE_XDP_H
+#define __LIBIE_XDP_H
+
+#include <linux/bpf_trace.h>
+#include <linux/net/intel/libie/rx.h>
+#include <linux/net/intel/libie/tx.h>
+
+#include <net/xdp_sock_drv.h>
+
+/* Defined as bits to be able to use them as a mask */
+enum {
+ LIBIE_XDP_PASS = 0U,
+ LIBIE_XDP_DROP = BIT(0),
+ LIBIE_XDP_ABORTED = BIT(1),
+ LIBIE_XDP_TX = BIT(2),
+ LIBIE_XDP_REDIRECT = BIT(3),
+};
+
+/* XDP SQ sharing */
+
+struct libie_xdp_sq_lock {
+ spinlock_t lock;
+ bool share;
+};
+
+DECLARE_STATIC_KEY_FALSE(libie_xdp_sq_share);
+
+static inline u32 libie_xdp_get_sq_num(u32 rxq, u32 txq, u32 max)
+{
+ return min(max(nr_cpu_ids, rxq), max - txq);
+}
+
+static inline bool libie_xdp_sq_shared(u32 qid)
+{
+ return qid < nr_cpu_ids;
+}
+
+static inline u32 libie_xdp_sq_id(u32 num)
+{
+ u32 ret = smp_processor_id();
+
+ if (static_branch_unlikely(&libie_xdp_sq_share) &&
+ libie_xdp_sq_shared(num))
+ ret %= num;
+
+ return ret;
+}
+
+void __libie_xdp_sq_get(struct libie_xdp_sq_lock *lock,
+ const struct net_device *dev);
+void __libie_xdp_sq_put(struct libie_xdp_sq_lock *lock,
+ const struct net_device *dev);
+
+static inline void libie_xdp_sq_get(struct libie_xdp_sq_lock *lock,
+ const struct net_device *dev,
+ bool share)
+{
+ if (unlikely(share))
+ __libie_xdp_sq_get(lock, dev);
+}
+
+static inline void libie_xdp_sq_put(struct libie_xdp_sq_lock *lock,
+ const struct net_device *dev)
+{
+ if (static_branch_unlikely(&libie_xdp_sq_share) && lock->share)
+ __libie_xdp_sq_put(lock, dev);
+}
+
+static inline void __acquires(&lock->lock)
+libie_xdp_sq_lock(struct libie_xdp_sq_lock *lock)
+{
+ if (static_branch_unlikely(&libie_xdp_sq_share) && lock->share)
+ spin_lock(&lock->lock);
+}
+
+static inline void __releases(&lock->lock)
+libie_xdp_sq_unlock(struct libie_xdp_sq_lock *lock)
+{
+ if (static_branch_unlikely(&libie_xdp_sq_share) && lock->share)
+ spin_unlock(&lock->lock);
+}
+
+/* ``XDP_TX`` bulking */
+
+#define LIBIE_XDP_TX_BULK XDP_BULK_QUEUE_SIZE
+#define LIBIE_XDP_TX_BATCH 8
+
+#ifdef __clang__
+#define libie_xdp_tx_for _Pragma("clang loop unroll_count(8)") for
+#elif __GNUC__ >= 8
+#define libie_xdp_tx_for _Pragma("GCC unroll (8)") for
+#else
+#define libie_xdp_tx_for for
+#endif
+
+struct libie_xdp_tx_frame {
+ union {
+ struct {
+ void *data;
+ u16 len;
+
+ enum xdp_buff_flags flags:16;
+ u32 soff;
+ };
+ struct {
+ struct xdp_frame *xdpf;
+ dma_addr_t dma;
+ };
+
+ struct {
+ struct xdp_buff *xsk;
+ /* u32 len */
+ };
+ struct xdp_desc desc;
+ };
+};
+static_assert(sizeof(struct libie_xdp_tx_frame) == sizeof(struct xdp_desc));
+
+struct libie_xdp_tx_bulk {
+ const struct bpf_prog *prog;
+ struct net_device *dev;
+ void *xdpq;
+
+ u32 act_mask;
+ u32 count;
+ struct libie_xdp_tx_frame bulk[LIBIE_XDP_TX_BULK];
+};
+
+
+struct libie_xdp_tx_queue {
+ union {
+ struct device *dev;
+ struct xsk_buff_pool *pool;
+ };
+ struct libie_tx_buffer *tx_buf;
+ void *desc_ring;
+
+ struct libie_xdp_sq_lock *xdp_lock;
+ u16 *next_to_use;
+ u32 desc_count;
+
+ u32 *xdp_tx_active;
+};
+
+struct libie_xdp_tx_desc {
+ dma_addr_t addr;
+ u32 len;
+};
+
+static inline void __libie_xdp_tx_init_bulk(struct libie_xdp_tx_bulk *bq,
+ const struct bpf_prog *prog,
+ struct net_device *dev, void *xdpq)
+{
+ bq->prog = prog;
+ bq->dev = dev;
+ bq->xdpq = xdpq;
+
+ bq->act_mask = 0;
+ bq->count = 0;
+}
+
+#define _libie_xdp_tx_init_bulk(bq, prog, dev, xdpqs, num, uniq) ({ \
+ const struct bpf_prog *uniq = rcu_dereference(prog); \
+ \
+ if (uniq) \
+ __libie_xdp_tx_init_bulk(bq, uniq, dev, \
+ (xdpqs)[libie_xdp_sq_id(num)]); \
+})
+
+#define libie_xdp_tx_init_bulk(bq, prog, dev, xdpqs, num) \
+ _libie_xdp_tx_init_bulk(bq, prog, dev, xdpqs, num, \
+ __UNIQUE_ID(prog_))
+
+static inline void libie_xdp_tx_queue_bulk(struct libie_xdp_tx_bulk *bq,
+ const struct xdp_buff *xdp)
+{
+ bq->bulk[bq->count++] = (typeof(*bq->bulk)){
+ .data = xdp->data,
+ .len = xdp->data_end - xdp->data,
+ .soff = xdp_data_hard_end(xdp) - xdp->data,
+ .flags = xdp->flags,
+ };
+}
+
+static inline struct libie_xdp_tx_desc
+libie_xdp_tx_fill_buf(const struct libie_xdp_tx_frame *frm,
+ const struct libie_xdp_tx_queue *sq)
+{
+ struct libie_xdp_tx_desc desc = {
+ .len = frm->len,
+ };
+ struct libie_tx_buffer *tx_buf;
+
+ desc.addr = page_pool_dma_sync_va_for_device(frm->data, desc.len);
+
+ tx_buf = &sq->tx_buf[*sq->next_to_use];
+ tx_buf->type = LIBIE_TX_BUF_XDP_TX;
+ tx_buf->gso_segs = 1;
+ tx_buf->bytecount = desc.len;
+ tx_buf->sinfo = frm->data + frm->soff;
+
+ return desc;
+}
+
+static __always_inline u32
+libie_xdp_tx_xmit_bulk(const struct libie_xdp_tx_bulk *bq,
+ u32 (*prep)(void *xdpq, struct libie_xdp_tx_queue *sq),
+ struct libie_xdp_tx_desc
+ (*fill)(const struct libie_xdp_tx_frame *frm,
+ const struct libie_xdp_tx_queue *sq),
+ void (*xmit)(struct libie_xdp_tx_desc desc,
+ const struct libie_xdp_tx_queue *sq))
+{
+ u32 this, batched, leftover, off = 0;
+ struct libie_xdp_tx_queue sq;
+ u32 free, count, ntu, i = 0;
+
+ free = prep(bq->xdpq, &sq);
+ count = min3(bq->count, free, LIBIE_XDP_TX_BULK);
+ ntu = *sq.next_to_use;
+
+again:
+ this = sq.desc_count - ntu;
+ if (likely(this > count))
+ this = count;
+
+ batched = ALIGN_DOWN(this, LIBIE_XDP_TX_BATCH);
+ leftover = this - batched;
+
+ for ( ; i < off + batched; i += LIBIE_XDP_TX_BATCH) {
+ libie_xdp_tx_for (u32 j = 0; j < LIBIE_XDP_TX_BATCH; j++) {
+ struct libie_xdp_tx_desc desc;
+
+ desc = fill(&bq->bulk[i + j], &sq);
+ xmit(desc, &sq);
+
+ ntu++;
+ }
+ }
+
+ for ( ; i < off + batched + leftover; i++) {
+ struct libie_xdp_tx_desc desc;
+
+ desc = fill(&bq->bulk[i], &sq);
+ xmit(desc, &sq);
+
+ ntu++;
+ }
+
+ if (likely(ntu < sq.desc_count))
+ goto out;
+
+ ntu = 0;
+
+ count -= this;
+ if (count) {
+ off = i;
+ goto again;
+ }
+
+out:
+ *sq.next_to_use = ntu;
+ if (sq.xdp_tx_active)
+ *sq.xdp_tx_active += i;
+
+ libie_xdp_sq_unlock(sq.xdp_lock);
+
+ return i;
+}
+
+void libie_xdp_tx_return_bulk(const struct libie_xdp_tx_frame *bq, u32 count);
+
+static __always_inline bool
+__libie_xdp_tx_flush_bulk(struct libie_xdp_tx_bulk *bq,
+ u32 (*prep)(void *xdpq,
+ struct libie_xdp_tx_queue *sq),
+ struct libie_xdp_tx_desc
+ (*fill)(const struct libie_xdp_tx_frame *frm,
+ const struct libie_xdp_tx_queue *sq),
+ void (*xmit)(struct libie_xdp_tx_desc desc,
+ const struct libie_xdp_tx_queue *sq))
+{
+ u32 sent, drops;
+ int err = 0;
+
+ sent = libie_xdp_tx_xmit_bulk(bq, prep, fill, xmit);
+ drops = bq->count - sent;
+ bq->count = 0;
+
+ if (unlikely(drops)) {
+ trace_xdp_exception(bq->dev, bq->prog, XDP_TX);
+ err = -ENXIO;
+
+ libie_xdp_tx_return_bulk(&bq->bulk[sent], drops);
+ }
+
+ trace_xdp_bulk_tx(bq->dev, sent, drops, err);
+
+ return likely(sent);
+}
+
+#define libie_xdp_tx_flush_bulk(bq, prep, xmit) \
+ __libie_xdp_tx_flush_bulk(bq, prep, libie_xdp_tx_fill_buf, xmit)
+
+/* .ndo_xdp_xmit() implementation */
+
+static inline bool libie_xdp_xmit_queue_bulk(struct libie_xdp_tx_bulk *bq,
+ struct xdp_frame *xdpf)
+{
+ struct device *dev = bq->dev->dev.parent;
+ dma_addr_t dma;
+
+ dma = dma_map_single(dev, xdpf->data, xdpf->len, DMA_TO_DEVICE);
+ if (dma_mapping_error(dev, dma))
+ return false;
+
+ bq->bulk[bq->count++] = (typeof(*bq->bulk)){
+ .xdpf = xdpf,
+ .dma = dma,
+ };
+
+ return true;
+}
+
+static inline struct libie_xdp_tx_desc
+libie_xdp_xmit_fill_buf(const struct libie_xdp_tx_frame *frm,
+ const struct libie_xdp_tx_queue *sq)
+{
+ struct xdp_frame *xdpf = frm->xdpf;
+ struct libie_xdp_tx_desc desc = {
+ .addr = frm->dma,
+ .len = xdpf->len,
+ };
+ struct libie_tx_buffer *tx_buf;
+
+ tx_buf = &sq->tx_buf[*sq->next_to_use];
+ tx_buf->type = LIBIE_TX_BUF_XDP_XMIT;
+ tx_buf->gso_segs = 1;
+ tx_buf->bytecount = desc.len;
+ tx_buf->xdpf = xdpf;
+
+ dma_unmap_addr_set(tx_buf, dma, frm->dma);
+ dma_unmap_len_set(tx_buf, len, desc.len);
+
+ return desc;
+}
+
+static __always_inline int
+__libie_xdp_xmit_do_bulk(struct libie_xdp_tx_bulk *bq,
+ struct xdp_frame **frames, u32 n, u32 flags,
+ u32 (*prep)(void *xdpq,
+ struct libie_xdp_tx_queue *sq),
+ void (*xmit)(struct libie_xdp_tx_desc desc,
+ const struct libie_xdp_tx_queue *sq),
+ void (*finalize)(void *xdpq, bool tail))
+{
+ int err = -ENXIO;
+ u32 nxmit = 0;
+
+ if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+ return -EINVAL;
+
+ for (u32 i = 0; i < n; i++) {
+ if (!libie_xdp_xmit_queue_bulk(bq, frames[i]))
+ break;
+ }
+
+ if (unlikely(!bq->count))
+ goto out;
+
+ nxmit = libie_xdp_tx_xmit_bulk(bq, prep, libie_xdp_xmit_fill_buf,
+ xmit);
+ if (unlikely(!nxmit))
+ goto out;
+
+ finalize(bq->xdpq, flags & XDP_XMIT_FLUSH);
+
+ if (likely(nxmit == n))
+ err = 0;
+
+out:
+ trace_xdp_bulk_tx(bq->dev, nxmit, n - nxmit, err);
+
+ return nxmit;
+}
+
+#define libie_xdp_xmit_init_bulk(bq, dev, xdpqs, num) \
+ __libie_xdp_tx_init_bulk(bq, NULL, dev, \
+ (xdpqs)[libie_xdp_sq_id(num)])
+
+#define _libie_xdp_xmit_do_bulk(dev, n, fr, fl, xqs, nqs, pr, xm, fin, un) ({ \
+ struct libie_xdp_tx_bulk un; \
+ \
+ libie_xdp_xmit_init_bulk(&un, dev, xqs, nqs); \
+ __libie_xdp_xmit_do_bulk(&un, fr, n, fl, pr, xm, fin); \
+})
+#define libie_xdp_xmit_do_bulk(dev, n, fr, fl, xqs, nqs, pr, xm, fin) \
+ _libie_xdp_xmit_do_bulk(dev, n, fr, fl, xqs, nqs, pr, xm, fin, \
+ __UNIQUE_ID(bq_))
+
+/* Rx polling path */
+
+static inline void libie_xdp_init_buff(struct xdp_buff *dst,
+ const struct xdp_buff *src,
+ struct xdp_rxq_info *rxq)
+{
+ if (!src->data) {
+ dst->data = NULL;
+ dst->rxq = rxq;
+ } else {
+ *dst = *src;
+ }
+}
+
+#define libie_xdp_save_buff(dst, src) libie_xdp_init_buff(dst, src, NULL)
+
+/**
+ * libie_xdp_process_buff - process an Rx buffer
+ * @xdp: XDP buffer to attach the buffer to
+ * @buf: Rx buffer to process
+ * @len: received data length from the descriptor
+ *
+ * Return: false if the descriptor must be skipped, true otherwise.
+ */
+static inline bool libie_xdp_process_buff(struct xdp_buff *xdp,
+ const struct libie_rx_buffer *buf,
+ u32 len)
+{
+ if (!libie_rx_sync_for_cpu(buf, len))
+ return false;
+
+ if (!xdp->data) {
+ xdp->flags = 0;
+ xdp->frame_sz = buf->truesize;
+
+ xdp_prepare_buff(xdp, page_address(buf->page) + buf->offset,
+ buf->page->pp->p.offset, len, true);
+ } else if (!xdp_buff_add_frag(xdp, buf->page,
+ buf->offset + buf->page->pp->p.offset,
+ len, buf->truesize)) {
+ xdp_return_buff(xdp);
+ xdp->data = NULL;
+
+ return false;
+ }
+
+ return true;
+}
+
+/**
+ * __libie_xdp_run_prog - run XDP program on an XDP buffer
+ * @xdp: XDP buffer to run the prog on
+ * @bq: buffer bulk for ``XDP_TX`` queueing
+ *
+ * Return: LIBIE_XDP_{PASS,DROP,TX,REDIRECT} depending on the prog's verdict.
+ */
+static inline u32 __libie_xdp_run_prog(struct xdp_buff *xdp,
+ struct libie_xdp_tx_bulk *bq)
+{
+ const struct bpf_prog *prog = bq->prog;
+ u32 act;
+
+ act = bpf_prog_run_xdp(prog, xdp);
+ switch (act) {
+ case XDP_ABORTED:
+err:
+ trace_xdp_exception(bq->dev, prog, act);
+ fallthrough;
+ case XDP_DROP:
+ xdp_return_buff(xdp);
+ xdp->data = NULL;
+
+ return LIBIE_XDP_DROP;
+ case XDP_PASS:
+ return LIBIE_XDP_PASS;
+ case XDP_TX:
+ libie_xdp_tx_queue_bulk(bq, xdp);
+ xdp->data = NULL;
+
+ return LIBIE_XDP_TX;
+ case XDP_REDIRECT:
+ if (unlikely(xdp_do_redirect(bq->dev, xdp, prog)))
+ goto err;
+
+ xdp->data = NULL;
+
+ return LIBIE_XDP_REDIRECT;
+ default:
+ bpf_warn_invalid_xdp_action(bq->dev, prog, act);
+ goto err;
+ }
+}
+
+static __always_inline u32
+__libie_xdp_run_flush(struct xdp_buff *xdp, struct libie_xdp_tx_bulk *bq,
+ u32 (*run)(struct xdp_buff *xdp,
+ struct libie_xdp_tx_bulk *bq),
+ bool (*flush_bulk)(struct libie_xdp_tx_bulk *))
+{
+ u32 act;
+
+ act = run(xdp, bq);
+ if (act == LIBIE_XDP_TX &&
+ unlikely(bq->count == LIBIE_XDP_TX_BULK && !flush_bulk(bq)))
+ act = LIBIE_XDP_DROP;
+
+ bq->act_mask |= act;
+
+ return act;
+}
+
+#define libie_xdp_run_prog(xdp, bq, fl) \
+ (__libie_xdp_run_flush(xdp, bq, __libie_xdp_run_prog, fl) == \
+ LIBIE_XDP_PASS)
+
+static __always_inline void
+libie_xdp_finalize_rx(struct libie_xdp_tx_bulk *bq,
+ bool (*flush_bulk)(struct libie_xdp_tx_bulk *),
+ void (*finalize)(void *xdpq, bool tail))
+{
+ if (bq->act_mask & LIBIE_XDP_TX) {
+ if (bq->count)
+ flush_bulk(bq);
+ finalize(bq->xdpq, true);
+ }
+ if (bq->act_mask & LIBIE_XDP_REDIRECT)
+ xdp_do_flush();
+}
+
+/* Tx buffer completion */
+
+static inline void libie_xdp_return_sinfo(const struct libie_tx_buffer *buf,
+ bool napi)
+{
+ const struct skb_shared_info *sinfo = buf->sinfo;
+ struct page *page;
+
+ if (likely(buf->gso_segs == 1))
+ goto return_head;
+
+ for (u32 i = 0; i < sinfo->nr_frags; i++) {
+ page = skb_frag_page(&sinfo->frags[i]);
+ page_pool_put_full_page(page->pp, page, napi);
+ }
+
+return_head:
+ page = virt_to_page(sinfo);
+ page_pool_put_full_page(page->pp, page, napi);
+}
+
+static inline void libie_xdp_complete_tx_buf(struct libie_tx_buffer *buf,
+ struct device *dev, bool napi,
+ struct xdp_frame_bulk *bq,
+ u32 *xdp_tx_active,
+ struct libie_sq_onstack_stats *ss)
+{
+ switch (buf->type) {
+ case LIBIE_TX_BUF_EMPTY:
+ return;
+ case LIBIE_TX_BUF_XDP_TX:
+ libie_xdp_return_sinfo(buf, napi);
+ break;
+ case LIBIE_TX_BUF_XDP_XMIT:
+ dma_unmap_page(dev, dma_unmap_addr(buf, dma),
+ dma_unmap_len(buf, len), DMA_TO_DEVICE);
+ xdp_return_frame_bulk(buf->xdpf, bq);
+ break;
+ case LIBIE_TX_BUF_XSK_TX:
+ xsk_buff_free(buf->xdp);
+ break;
+ default:
+ break;
+ }
+
+ (*xdp_tx_active)--;
+
+ ss->packets += buf->gso_segs;
+ ss->bytes += buf->bytecount;
+
+ buf->type = LIBIE_TX_BUF_EMPTY;
+}
+
+#endif /* __LIBIE_XDP_H */
diff --git a/include/linux/net/intel/libie/xsk.h b/include/linux/net/intel/libie/xsk.h
new file mode 100644
index 000000000000..d21fdb69a5e0
--- /dev/null
+++ b/include/linux/net/intel/libie/xsk.h
@@ -0,0 +1,172 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2023 Intel Corporation. */
+
+#ifndef __LIBIE_XSK_H
+#define __LIBIE_XSK_H
+
+#include <linux/net/intel/libie/xdp.h>
+
+/* ``XDP_TX`` bulking */
+
+#define libie_xsk_tx_init_bulk(bq, prog, dev, xdpqs, num) \
+ __libie_xdp_tx_init_bulk(bq, rcu_dereference(prog), dev, \
+ (xdpqs)[libie_xdp_sq_id(num)])
+
+static inline void libie_xsk_tx_queue_bulk(struct libie_xdp_tx_bulk *bq,
+ struct xdp_buff *xdp)
+{
+ bq->bulk[bq->count++] = (typeof(*bq->bulk)){
+ .xsk = xdp,
+ .len = xdp->data_end - xdp->data,
+ };
+}
+
+static inline struct libie_xdp_tx_desc
+libie_xsk_tx_fill_buf(const struct libie_xdp_tx_frame *frm,
+ const struct libie_xdp_tx_queue *sq)
+{
+ struct libie_xdp_tx_desc desc = {
+ .len = frm->len,
+ };
+ struct xdp_buff *xdp = frm->xsk;
+ struct libie_tx_buffer *tx_buf;
+
+ desc.addr = xsk_buff_xdp_get_dma(xdp);
+ xsk_buff_raw_dma_sync_for_device(sq->pool, desc.addr, desc.len);
+
+ tx_buf = &sq->tx_buf[*sq->next_to_use];
+ tx_buf->type = LIBIE_TX_BUF_XSK_TX;
+ tx_buf->gso_segs = 1;
+ tx_buf->bytecount = desc.len;
+ tx_buf->xdp = xdp;
+
+ return desc;
+}
+
+#define libie_xsk_tx_flush_bulk(bq, prep, xmit) \
+ __libie_xdp_tx_flush_bulk(bq, prep, libie_xsk_tx_fill_buf, xmit)
+
+/* XSk xmit implementation */
+
+#define libie_xsk_xmit_init_bulk(bq, xdpq) \
+ __libie_xdp_tx_init_bulk(bq, NULL, NULL, xdpq)
+
+static inline struct libie_xdp_tx_desc
+libie_xsk_xmit_fill_buf(const struct libie_xdp_tx_frame *frm,
+ const struct libie_xdp_tx_queue *sq)
+{
+ struct libie_xdp_tx_desc desc = {
+ .len = frm->desc.len,
+ };
+
+ desc.addr = xsk_buff_raw_get_dma(sq->pool, frm->desc.addr);
+ xsk_buff_raw_dma_sync_for_device(sq->pool, desc.addr, desc.len);
+
+ return desc;
+}
+
+static __always_inline bool
+libie_xsk_xmit_do_bulk(void *xdpq, struct xsk_buff_pool *pool, u32 budget,
+ u32 (*prep)(void *xdpq, struct libie_xdp_tx_queue *sq),
+ void (*xmit)(struct libie_xdp_tx_desc desc,
+ const struct libie_xdp_tx_queue *sq),
+ void (*finalize)(void *xdpq, bool tail))
+{
+ struct libie_xdp_tx_bulk bq;
+ u32 n, batched;
+
+ n = xsk_tx_peek_release_desc_batch(pool, budget);
+ if (unlikely(!n))
+ return true;
+
+ batched = ALIGN_DOWN(n, LIBIE_XDP_TX_BULK);
+
+ libie_xsk_xmit_init_bulk(&bq, xdpq);
+ bq.count = LIBIE_XDP_TX_BULK;
+
+ for (u32 i = 0; i < batched; i += LIBIE_XDP_TX_BULK) {
+ memcpy(bq.bulk, &pool->tx_descs[i], sizeof(bq.bulk));
+ libie_xdp_tx_xmit_bulk(&bq, prep, libie_xsk_xmit_fill_buf,
+ xmit);
+ }
+
+ bq.count = n - batched;
+
+ memcpy(bq.bulk, &pool->tx_descs[batched], bq.count * sizeof(*bq.bulk));
+ libie_xdp_tx_xmit_bulk(&bq, prep, libie_xsk_xmit_fill_buf, xmit);
+
+ finalize(bq.xdpq, true);
+
+ if (xsk_uses_need_wakeup(pool))
+ xsk_set_tx_need_wakeup(pool);
+
+ return n < budget;
+}
+
+/* Rx polling path */
+
+/**
+ * __libie_xsk_run_prog - run XDP program on an XDP buffer
+ * @xdp: XDP buffer to run the prog on
+ * @bq: buffer bulk for ``XDP_TX`` queueing
+ *
+ * Return: LIBIE_XDP_{PASS,DROP,ABORTED,TX,REDIRECT} depending on the prog's
+ * verdict.
+ */
+static inline u32 __libie_xsk_run_prog(struct xdp_buff *xdp,
+ struct libie_xdp_tx_bulk *bq)
+{
+ const struct bpf_prog *prog = bq->prog;
+ u32 act, drop = LIBIE_XDP_DROP;
+ struct xdp_buff_xsk *xsk;
+ int ret;
+
+ act = bpf_prog_run_xdp(prog, xdp);
+ if (unlikely(act != XDP_REDIRECT))
+ goto rest;
+
+ ret = xdp_do_redirect(bq->dev, xdp, prog);
+ if (unlikely(ret))
+ goto check_err;
+
+ return LIBIE_XDP_REDIRECT;
+
+rest:
+ switch (act) {
+ case XDP_ABORTED:
+err:
+ trace_xdp_exception(bq->dev, prog, act);
+ fallthrough;
+ case XDP_DROP:
+ xsk_buff_free(xdp);
+
+ return drop;
+ case XDP_PASS:
+ return LIBIE_XDP_PASS;
+ case XDP_TX:
+ libie_xsk_tx_queue_bulk(bq, xdp);
+
+ return LIBIE_XDP_TX;
+ default:
+ bpf_warn_invalid_xdp_action(bq->dev, prog, act);
+ goto err;
+ }
+
+check_err:
+ xsk = container_of(xdp, typeof(*xsk), xdp);
+ if (xsk_uses_need_wakeup(xsk->pool) && ret == -ENOBUFS)
+ drop = LIBIE_XDP_ABORTED;
+
+ goto err;
+}
+
+#define libie_xsk_run_prog(xdp, bq, fl) \
+ __libie_xdp_run_flush(xdp, bq, __libie_xsk_run_prog, fl)
+
+/* Externals */
+
+int libie_xsk_enable_pool(struct net_device *dev, u32 qid, unsigned long *map);
+int libie_xsk_disable_pool(struct net_device *dev, u32 qid,
+ unsigned long *map);
+
+#endif /* __LIBIE_XSK_H */
--
2.43.0


2023-12-23 03:04:54

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 19/34] idpf: stop using macros for accessing queue descriptors

In C, we have structures and unions.
Casting `void *` via macros is not only error-prone, but also confusing
and awful to look at in general.
Replace it with a union and direct array dereferences. Had idpf had
separate queue structures, it would look way more elegant -- that will
be done one day.
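
A schematic of the idea; the descriptor types below are placeholders,
not the real idpf ones:

#include <linux/types.h>

struct example_base_tx_desc { __le64 qw[2]; };
struct example_rx_desc { __le64 qw[4]; };

struct example_queue {
	union {
		struct example_base_tx_desc *base_tx;
		struct example_rx_desc *rx;
		void *desc_ring;	/* common alloc/free pointer */
	};
	u32 desc_count;
};

/* Before: a macro casting (q)->desc_ring to the needed type.
 * After: a typed array dereference the compiler can check.
 */
static struct example_base_tx_desc *
example_next_tx_desc(const struct example_queue *q, u32 ntu)
{
	return &q->base_tx[ntu];
}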

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 20 +++++-----
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 30 +++++++--------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 37 ++++++++-----------
3 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
index 23dcc02e6976..7072d45f007b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -206,7 +206,7 @@ static void idpf_tx_singleq_map(struct idpf_queue *tx_q,
data_len = skb->data_len;
size = skb_headlen(skb);

- tx_desc = IDPF_BASE_TX_DESC(tx_q, i);
+ tx_desc = &tx_q->base_tx[i];

dma = dma_map_single(tx_q->dev, skb->data, size, DMA_TO_DEVICE);

@@ -242,7 +242,7 @@ static void idpf_tx_singleq_map(struct idpf_queue *tx_q,
i++;

if (i == tx_q->desc_count) {
- tx_desc = IDPF_BASE_TX_DESC(tx_q, 0);
+ tx_desc = &tx_q->base_tx[0];
i = 0;
}

@@ -262,7 +262,7 @@ static void idpf_tx_singleq_map(struct idpf_queue *tx_q,
i++;

if (i == tx_q->desc_count) {
- tx_desc = IDPF_BASE_TX_DESC(tx_q, 0);
+ tx_desc = &tx_q->base_tx[0];
i = 0;
}

@@ -311,7 +311,7 @@ idpf_tx_singleq_get_ctx_desc(struct idpf_queue *txq)
memset(&txq->tx_buf[ntu], 0, sizeof(struct idpf_tx_buf));
txq->tx_buf[ntu].ctx_entry = true;

- ctx_desc = IDPF_BASE_TX_CTX_DESC(txq, ntu);
+ ctx_desc = &txq->base_ctx[ntu];

IDPF_SINGLEQ_BUMP_RING_IDX(txq, ntu);
txq->next_to_use = ntu;
@@ -460,7 +460,7 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
struct netdev_queue *nq;
bool dont_wake;

- tx_desc = IDPF_BASE_TX_DESC(tx_q, ntc);
+ tx_desc = &tx_q->base_tx[ntc];
tx_buf = &tx_q->tx_buf[ntc];
ntc -= tx_q->desc_count;

@@ -509,7 +509,7 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
if (unlikely(!ntc)) {
ntc -= tx_q->desc_count;
tx_buf = tx_q->tx_buf;
- tx_desc = IDPF_BASE_TX_DESC(tx_q, 0);
+ tx_desc = &tx_q->base_tx[0];
}

/* unmap any remaining paged data */
@@ -527,7 +527,7 @@ static bool idpf_tx_singleq_clean(struct idpf_queue *tx_q, int napi_budget,
if (unlikely(!ntc)) {
ntc -= tx_q->desc_count;
tx_buf = tx_q->tx_buf;
- tx_desc = IDPF_BASE_TX_DESC(tx_q, 0);
+ tx_desc = &tx_q->base_tx[0];
}
} while (likely(budget));

@@ -880,7 +880,7 @@ bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rx_q,
if (!cleaned_count)
return false;

- desc = IDPF_SINGLEQ_RX_BUF_DESC(rx_q, nta);
+ desc = &rx_q->single_buf[nta];

do {
dma_addr_t addr;
@@ -898,7 +898,7 @@ bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rx_q,

nta++;
if (unlikely(nta == rx_q->desc_count)) {
- desc = IDPF_SINGLEQ_RX_BUF_DESC(rx_q, 0);
+ desc = &rx_q->single_buf[0];
nta = 0;
}

@@ -998,7 +998,7 @@ static int idpf_rx_singleq_clean(struct idpf_queue *rx_q, int budget)
struct idpf_rx_buf *rx_buf;

/* get the Rx desc from Rx queue based on 'next_to_clean' */
- rx_desc = IDPF_RX_DESC(rx_q, ntc);
+ rx_desc = &rx_q->rx[ntc];

/* status_error_ptype_len will always be zero for unused
* descriptors because it's cleared in cleanup, and overlaps
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 6fd9128e61d8..40b8d8b17827 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -533,7 +533,7 @@ static bool idpf_rx_post_buf_desc(struct idpf_queue *bufq, u16 buf_id)
u16 nta = bufq->next_to_alloc;
dma_addr_t addr;

- splitq_rx_desc = IDPF_SPLITQ_RX_BUF_DESC(bufq, nta);
+ splitq_rx_desc = &bufq->split_buf[nta];

if (bufq->rx_hsplit_en) {
bq.pp = bufq->hdr_pp;
@@ -1560,7 +1560,7 @@ do { \
if (unlikely(!(ntc))) { \
ntc -= (txq)->desc_count; \
buf = (txq)->tx_buf; \
- desc = IDPF_FLEX_TX_DESC(txq, 0); \
+ desc = &(txq)->flex_tx[0]; \
} else { \
(buf)++; \
(desc)++; \
@@ -1593,8 +1593,8 @@ static void idpf_tx_splitq_clean(struct idpf_queue *tx_q, u16 end,
s16 ntc = tx_q->next_to_clean;
struct idpf_tx_buf *tx_buf;

- tx_desc = IDPF_FLEX_TX_DESC(tx_q, ntc);
- next_pending_desc = IDPF_FLEX_TX_DESC(tx_q, end);
+ tx_desc = &tx_q->flex_tx[ntc];
+ next_pending_desc = &tx_q->flex_tx[end];
tx_buf = &tx_q->tx_buf[ntc];
ntc -= tx_q->desc_count;

@@ -1774,7 +1774,7 @@ static bool idpf_tx_clean_complq(struct idpf_queue *complq, int budget,
int i;

complq_budget = vport->compln_clean_budget;
- tx_desc = IDPF_SPLITQ_TX_COMPLQ_DESC(complq, ntc);
+ tx_desc = &complq->comp[ntc];
ntc -= complq->desc_count;

do {
@@ -1840,7 +1840,7 @@ static bool idpf_tx_clean_complq(struct idpf_queue *complq, int budget,
ntc++;
if (unlikely(!ntc)) {
ntc -= complq->desc_count;
- tx_desc = IDPF_SPLITQ_TX_COMPLQ_DESC(complq, 0);
+ tx_desc = &complq->comp[0];
change_bit(__IDPF_Q_GEN_CHK, complq->flags);
}

@@ -2107,7 +2107,7 @@ void idpf_tx_dma_map_error(struct idpf_queue *txq, struct sk_buff *skb,
* used one additional descriptor for a context
* descriptor. Reset that here.
*/
- tx_desc = IDPF_FLEX_TX_DESC(txq, idx);
+ tx_desc = &txq->flex_tx[idx];
memset(tx_desc, 0, sizeof(struct idpf_flex_tx_ctx_desc));
if (idx == 0)
idx = txq->desc_count;
@@ -2167,7 +2167,7 @@ static void idpf_tx_splitq_map(struct idpf_queue *tx_q,
data_len = skb->data_len;
size = skb_headlen(skb);

- tx_desc = IDPF_FLEX_TX_DESC(tx_q, i);
+ tx_desc = &tx_q->flex_tx[i];

dma = dma_map_single(tx_q->dev, skb->data, size, DMA_TO_DEVICE);

@@ -2241,7 +2241,7 @@ static void idpf_tx_splitq_map(struct idpf_queue *tx_q,
i++;

if (i == tx_q->desc_count) {
- tx_desc = IDPF_FLEX_TX_DESC(tx_q, 0);
+ tx_desc = &tx_q->flex_tx[0];
i = 0;
tx_q->compl_tag_cur_gen =
IDPF_TX_ADJ_COMPL_TAG_GEN(tx_q);
@@ -2286,7 +2286,7 @@ static void idpf_tx_splitq_map(struct idpf_queue *tx_q,
i++;

if (i == tx_q->desc_count) {
- tx_desc = IDPF_FLEX_TX_DESC(tx_q, 0);
+ tx_desc = &tx_q->flex_tx[0];
i = 0;
tx_q->compl_tag_cur_gen = IDPF_TX_ADJ_COMPL_TAG_GEN(tx_q);
}
@@ -2520,7 +2520,7 @@ idpf_tx_splitq_get_ctx_desc(struct idpf_queue *txq)
txq->tx_buf[i].compl_tag = IDPF_SPLITQ_TX_INVAL_COMPL_TAG;

/* grab the next descriptor */
- desc = IDPF_FLEX_TX_CTX_DESC(txq, i);
+ desc = &txq->flex_ctx[i];
txq->next_to_use = idpf_tx_splitq_bump_ntu(txq, i);

return desc;
@@ -3020,7 +3020,7 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
u8 rxdid;

/* get the Rx desc from Rx queue based on 'next_to_clean' */
- desc = IDPF_RX_DESC(rxq, ntc);
+ desc = &rxq->rx[ntc];
rx_desc = (struct virtchnl2_rx_flex_desc_adv_nic_3 *)desc;

/* This memory barrier is needed to keep us from reading
@@ -3225,11 +3225,11 @@ static void idpf_rx_clean_refillq(struct idpf_queue *bufq,
int cleaned = 0;
u16 gen;

- buf_desc = IDPF_SPLITQ_RX_BUF_DESC(bufq, bufq_nta);
+ buf_desc = &bufq->split_buf[bufq_nta];

/* make sure we stop at ring wrap in the unlikely case ring is full */
while (likely(cleaned < refillq->desc_count)) {
- u16 refill_desc = IDPF_SPLITQ_RX_BI_DESC(refillq, ntc);
+ u16 refill_desc = refillq->ring[ntc];
bool failure;

gen = FIELD_GET(IDPF_RX_BI_GEN_M, refill_desc);
@@ -3247,7 +3247,7 @@ static void idpf_rx_clean_refillq(struct idpf_queue *bufq,
}

if (unlikely(++bufq_nta == bufq->desc_count)) {
- buf_desc = IDPF_SPLITQ_RX_BUF_DESC(bufq, 0);
+ buf_desc = &bufq->split_buf[0];
bufq_nta = 0;
} else {
buf_desc++;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 5975c6d029d7..2584bd94363f 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -112,24 +112,6 @@ do { \
#define IDPF_RXD_EOF_SPLITQ VIRTCHNL2_RX_FLEX_DESC_ADV_STATUS0_EOF_M
#define IDPF_RXD_EOF_SINGLEQ VIRTCHNL2_RX_BASE_DESC_STATUS_EOF_M

-#define IDPF_SINGLEQ_RX_BUF_DESC(rxq, i) \
- (&(((struct virtchnl2_singleq_rx_buf_desc *)((rxq)->desc_ring))[i]))
-#define IDPF_SPLITQ_RX_BUF_DESC(rxq, i) \
- (&(((struct virtchnl2_splitq_rx_buf_desc *)((rxq)->desc_ring))[i]))
-#define IDPF_SPLITQ_RX_BI_DESC(rxq, i) ((((rxq)->ring))[i])
-
-#define IDPF_BASE_TX_DESC(txq, i) \
- (&(((struct idpf_base_tx_desc *)((txq)->desc_ring))[i]))
-#define IDPF_BASE_TX_CTX_DESC(txq, i) \
- (&(((struct idpf_base_tx_ctx_desc *)((txq)->desc_ring))[i]))
-#define IDPF_SPLITQ_TX_COMPLQ_DESC(txcq, i) \
- (&(((struct idpf_splitq_tx_compl_desc *)((txcq)->desc_ring))[i]))
-
-#define IDPF_FLEX_TX_DESC(txq, i) \
- (&(((union idpf_tx_flex_desc *)((txq)->desc_ring))[i]))
-#define IDPF_FLEX_TX_CTX_DESC(txq, i) \
- (&(((struct idpf_flex_tx_ctx_desc *)((txq)->desc_ring))[i]))
-
#define IDPF_DESC_UNUSED(txq) \
((((txq)->next_to_clean > (txq)->next_to_use) ? 0 : (txq)->desc_count) + \
(txq)->next_to_clean - (txq)->next_to_use - 1)
@@ -275,9 +257,6 @@ struct idpf_rx_extracted {
#define IDPF_TX_MAX_DESC_DATA_ALIGNED \
ALIGN_DOWN(IDPF_TX_MAX_DESC_DATA, IDPF_TX_MAX_READ_REQ_SIZE)

-#define IDPF_RX_DESC(rxq, i) \
- (&(((union virtchnl2_rx_desc *)((rxq)->desc_ring))[i]))
-
#define idpf_rx_buf libie_rx_buffer

#define IDPF_RX_MAX_PTYPE_PROTO_IDS 32
@@ -586,7 +565,21 @@ struct idpf_queue {
struct page_pool *pp;
struct device *dev;
};
- void *desc_ring;
+ union {
+ union virtchnl2_rx_desc *rx;
+
+ struct virtchnl2_singleq_rx_buf_desc *single_buf;
+ struct virtchnl2_splitq_rx_buf_desc *split_buf;
+
+ struct idpf_base_tx_desc *base_tx;
+ struct idpf_base_tx_ctx_desc *base_ctx;
+ union idpf_tx_flex_desc *flex_tx;
+ struct idpf_flex_tx_ctx_desc *flex_ctx;
+
+ struct idpf_splitq_tx_compl_desc *comp;
+
+ void *desc_ring;
+ };

u32 hdr_truesize;
u32 truesize;
--
2.43.0


2023-12-23 03:05:16

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 20/34] idpf: make complq cleaning dependent on scheduling mode

From: Michal Kubiak <[email protected]>

Extend the completion queue cleaning function to support the queue-based
scheduling mode needed for XDP queues.
Add a 4-byte completion descriptor for the queue-based scheduling mode and
refactor the code to extract the parts common to both scheduling modes.
Make the completion descriptor parsing function a static inline, as
we'll need it in a different file soon.
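
As a rough standalone sketch of the idea (simplified bit layout and toy
names, not the actual descriptor format), the parse helper returns either
the completion type or a negative error, so a single cleaning loop can
serve both descriptor flavors:

  #include <errno.h>
  #include <stdbool.h>

  #define TOY_GEN	0x8000
  #define TOY_CTYPE	0x3800

  /* 4-byte completion: qid/ctype/gen word + head-or-tag word */
  struct toy_4b_compl {
  	unsigned short qid_ctype_gen;
  	unsigned short head_or_tag;
  };

  /* 8-byte completion used by flow scheduling embeds the 4-byte one */
  struct toy_8b_compl {
  	struct toy_4b_compl common;
  	unsigned char ts[3], rsvd;
  };

  static int toy_parse_compl(const struct toy_4b_compl *desc, bool gen_flag)
  {
  	if (!!(desc->qid_ctype_gen & TOY_GEN) != gen_flag)
  		return -ENODATA;	/* not written back yet */

  	return (desc->qid_ctype_gen & TOY_CTYPE) >> 11;
  }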

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
.../net/ethernet/intel/idpf/idpf_lan_txrx.h | 6 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 203 ++++++++++--------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 44 ++++
3 files changed, 166 insertions(+), 87 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_lan_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_lan_txrx.h
index a5752dcab888..7f8fc9b61e90 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lan_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_lan_txrx.h
@@ -184,13 +184,17 @@ struct idpf_base_tx_desc {
__le64 qw1; /* type_cmd_offset_bsz_l2tag1 */
}; /* read used with buffer queues */

-struct idpf_splitq_tx_compl_desc {
+struct idpf_splitq_4b_tx_compl_desc {
/* qid=[10:0] comptype=[13:11] rsvd=[14] gen=[15] */
__le16 qid_comptype_gen;
union {
__le16 q_head; /* Queue head */
__le16 compl_tag; /* Completion tag */
} q_head_compl_tag;
+}; /* writeback used with completion queues */
+
+struct idpf_splitq_tx_compl_desc {
+ struct idpf_splitq_4b_tx_compl_desc common;
u8 ts[3];
u8 rsvd; /* Reserved */
}; /* writeback used with completion queues */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 40b8d8b17827..1a79ec1fb838 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1722,8 +1722,8 @@ static bool idpf_tx_clean_buf_ring(struct idpf_queue *txq, u16 compl_tag,
}

/**
- * idpf_tx_handle_rs_completion - clean a single packet and all of its buffers
- * whether on the buffer ring or in the hash table
+ * idpf_tx_handle_rs_cmpl_qb - clean a single packet and all of its buffers
+ * when the Tx queue is working in queue-based scheduling mode
* @txq: Tx ring to clean
* @desc: pointer to completion queue descriptor to extract completion
* information from
@@ -1732,20 +1732,33 @@ static bool idpf_tx_clean_buf_ring(struct idpf_queue *txq, u16 compl_tag,
*
* Returns bytes/packets cleaned
*/
-static void idpf_tx_handle_rs_completion(struct idpf_queue *txq,
- struct idpf_splitq_tx_compl_desc *desc,
- struct idpf_cleaned_stats *cleaned,
- int budget)
+static void
+idpf_tx_handle_rs_cmpl_qb(struct idpf_queue *txq,
+ const struct idpf_splitq_4b_tx_compl_desc *desc,
+ struct idpf_cleaned_stats *cleaned, int budget)
{
- u16 compl_tag;
+ u16 head = le16_to_cpu(desc->q_head_compl_tag.q_head);

- if (!test_bit(__IDPF_Q_FLOW_SCH_EN, txq->flags)) {
- u16 head = le16_to_cpu(desc->q_head_compl_tag.q_head);
-
- return idpf_tx_splitq_clean(txq, head, budget, cleaned, false);
- }
+ return idpf_tx_splitq_clean(txq, head, budget, cleaned, false);
+}

- compl_tag = le16_to_cpu(desc->q_head_compl_tag.compl_tag);
+/**
+ * idpf_tx_handle_rs_cmpl_fb - clean a single packet and all of its buffers
+ * whether on the buffer ring or in the hash table (flow-based scheduling only)
+ * @txq: Tx ring to clean
+ * @desc: pointer to completion queue descriptor to extract completion
+ * information from
+ * @cleaned: pointer to stats struct to track cleaned packets/bytes
+ * @budget: Used to determine if we are in netpoll
+ *
+ * Returns bytes/packets cleaned
+ */
+static void
+idpf_tx_handle_rs_cmpl_fb(struct idpf_queue *txq,
+ const struct idpf_splitq_4b_tx_compl_desc *desc,
+			  struct idpf_cleaned_stats *cleaned, int budget)
+{
+ u16 compl_tag = le16_to_cpu(desc->q_head_compl_tag.compl_tag);

/* If we didn't clean anything on the ring, this packet must be
* in the hash table. Go clean it there.
@@ -1754,6 +1767,65 @@ static void idpf_tx_handle_rs_completion(struct idpf_queue *txq,
idpf_tx_clean_stashed_bufs(txq, compl_tag, cleaned, budget);
}

+/**
+ * idpf_tx_finalize_complq - Finalize completion queue cleaning
+ * @complq: completion queue to finalize
+ * @ntc: next to complete index
+ * @gen_flag: current state of generation flag
+ * @cleaned: returns number of packets cleaned
+ */
+static void idpf_tx_finalize_complq(struct idpf_queue *complq, int ntc,
+ bool gen_flag, int *cleaned)
+{
+ struct idpf_netdev_priv *np;
+ bool complq_ok = true;
+ int i;
+
+ /* Store the state of the complq to be used later in deciding if a
+ * TXQ can be started again
+ */
+ if (unlikely(IDPF_TX_COMPLQ_PENDING(complq->txq_grp) >
+ IDPF_TX_COMPLQ_OVERFLOW_THRESH(complq)))
+ complq_ok = false;
+
+ np = netdev_priv(complq->vport->netdev);
+ for (i = 0; i < complq->txq_grp->num_txq; ++i) {
+ struct idpf_queue *tx_q = complq->txq_grp->txqs[i];
+ struct netdev_queue *nq;
+ bool dont_wake;
+
+ /* We didn't clean anything on this queue, move along */
+ if (!tx_q->cleaned_bytes)
+ continue;
+
+ *cleaned += tx_q->cleaned_pkts;
+
+ /* Update BQL */
+ nq = netdev_get_tx_queue(tx_q->vport->netdev, tx_q->idx);
+
+ dont_wake = !complq_ok || IDPF_TX_BUF_RSV_LOW(tx_q) ||
+ np->state != __IDPF_VPORT_UP ||
+ !netif_carrier_ok(tx_q->vport->netdev);
+ /* Check if the TXQ needs to and can be restarted */
+ __netif_txq_completed_wake(nq, tx_q->cleaned_pkts, tx_q->cleaned_bytes,
+ IDPF_DESC_UNUSED(tx_q), IDPF_TX_WAKE_THRESH,
+ dont_wake);
+
+ /* Reset cleaned stats for the next time this queue is
+ * cleaned
+ */
+ tx_q->cleaned_bytes = 0;
+ tx_q->cleaned_pkts = 0;
+ }
+
+ ntc += complq->desc_count;
+ complq->next_to_clean = ntc;
+ if (gen_flag)
+ set_bit(__IDPF_Q_GEN_CHK, complq->flags);
+ else
+ clear_bit(__IDPF_Q_GEN_CHK, complq->flags);
+}
+
/**
* idpf_tx_clean_complq - Reclaim resources on completion queue
* @complq: Tx ring to clean
@@ -1765,61 +1837,55 @@ static void idpf_tx_handle_rs_completion(struct idpf_queue *txq,
static bool idpf_tx_clean_complq(struct idpf_queue *complq, int budget,
int *cleaned)
{
- struct idpf_splitq_tx_compl_desc *tx_desc;
+ struct idpf_splitq_4b_tx_compl_desc *tx_desc;
struct idpf_vport *vport = complq->vport;
s16 ntc = complq->next_to_clean;
- struct idpf_netdev_priv *np;
unsigned int complq_budget;
- bool complq_ok = true;
- int i;
+ bool flow, gen_flag;
+ u32 pos = ntc;
+
+ flow = test_bit(__IDPF_Q_FLOW_SCH_EN, complq->flags);
+ gen_flag = test_bit(__IDPF_Q_GEN_CHK, complq->flags);

complq_budget = vport->compln_clean_budget;
- tx_desc = &complq->comp[ntc];
+ tx_desc = flow ? &complq->comp[pos].common : &complq->comp_4b[pos];
ntc -= complq->desc_count;

do {
struct idpf_cleaned_stats cleaned_stats = { };
struct idpf_queue *tx_q;
- int rel_tx_qid;
u16 hw_head;
- u8 ctype; /* completion type */
- u16 gen;
-
- /* if the descriptor isn't done, no work yet to do */
- gen = (le16_to_cpu(tx_desc->qid_comptype_gen) &
- IDPF_TXD_COMPLQ_GEN_M) >> IDPF_TXD_COMPLQ_GEN_S;
- if (test_bit(__IDPF_Q_GEN_CHK, complq->flags) != gen)
- break;
-
- /* Find necessary info of TX queue to clean buffers */
- rel_tx_qid = (le16_to_cpu(tx_desc->qid_comptype_gen) &
- IDPF_TXD_COMPLQ_QID_M) >> IDPF_TXD_COMPLQ_QID_S;
- if (rel_tx_qid >= complq->txq_grp->num_txq ||
- !complq->txq_grp->txqs[rel_tx_qid]) {
- dev_err(&complq->vport->adapter->pdev->dev,
- "TxQ not found\n");
- goto fetch_next_desc;
- }
- tx_q = complq->txq_grp->txqs[rel_tx_qid];
+ int ctype;

- /* Determine completion type */
- ctype = (le16_to_cpu(tx_desc->qid_comptype_gen) &
- IDPF_TXD_COMPLQ_COMPL_TYPE_M) >>
- IDPF_TXD_COMPLQ_COMPL_TYPE_S;
+ ctype = idpf_parse_compl_desc(tx_desc, complq, &tx_q,
+ gen_flag);
switch (ctype) {
case IDPF_TXD_COMPLT_RE:
+ if (unlikely(!flow))
+ goto fetch_next_desc;
+
hw_head = le16_to_cpu(tx_desc->q_head_compl_tag.q_head);

idpf_tx_splitq_clean(tx_q, hw_head, budget,
&cleaned_stats, true);
break;
case IDPF_TXD_COMPLT_RS:
- idpf_tx_handle_rs_completion(tx_q, tx_desc,
- &cleaned_stats, budget);
+ if (flow)
+ idpf_tx_handle_rs_cmpl_fb(tx_q, tx_desc,
+ &cleaned_stats,
+ budget);
+ else
+ idpf_tx_handle_rs_cmpl_qb(tx_q, tx_desc,
+ &cleaned_stats,
+ budget);
break;
case IDPF_TXD_COMPLT_SW_MARKER:
idpf_tx_handle_sw_marker(tx_q);
break;
+ case -ENODATA:
+ goto exit_clean_complq;
+ case -EINVAL:
+ goto fetch_next_desc;
default:
dev_err(&tx_q->vport->adapter->pdev->dev,
"Unknown TX completion type: %d\n",
@@ -1836,59 +1902,24 @@ static bool idpf_tx_clean_complq(struct idpf_queue *complq, int budget,
u64_stats_update_end(&tx_q->stats_sync);

fetch_next_desc:
- tx_desc++;
+ pos++;
ntc++;
if (unlikely(!ntc)) {
ntc -= complq->desc_count;
- tx_desc = &complq->comp[0];
- change_bit(__IDPF_Q_GEN_CHK, complq->flags);
+ pos = 0;
+ gen_flag = !gen_flag;
}

+ tx_desc = flow ? &complq->comp[pos].common :
+ &complq->comp_4b[pos];
prefetch(tx_desc);

/* update budget accounting */
complq_budget--;
} while (likely(complq_budget));

- /* Store the state of the complq to be used later in deciding if a
- * TXQ can be started again
- */
- if (unlikely(IDPF_TX_COMPLQ_PENDING(complq->txq_grp) >
- IDPF_TX_COMPLQ_OVERFLOW_THRESH(complq)))
- complq_ok = false;
-
- np = netdev_priv(complq->vport->netdev);
- for (i = 0; i < complq->txq_grp->num_txq; ++i) {
- struct idpf_queue *tx_q = complq->txq_grp->txqs[i];
- struct netdev_queue *nq;
- bool dont_wake;
-
- /* We didn't clean anything on this queue, move along */
- if (!tx_q->cleaned_bytes)
- continue;
-
- *cleaned += tx_q->cleaned_pkts;
-
- /* Update BQL */
- nq = netdev_get_tx_queue(tx_q->vport->netdev, tx_q->idx);
-
- dont_wake = !complq_ok || IDPF_TX_BUF_RSV_LOW(tx_q) ||
- np->state != __IDPF_VPORT_UP ||
- !netif_carrier_ok(tx_q->vport->netdev);
- /* Check if the TXQ needs to and can be restarted */
- __netif_txq_completed_wake(nq, tx_q->cleaned_pkts, tx_q->cleaned_bytes,
- IDPF_DESC_UNUSED(tx_q), IDPF_TX_WAKE_THRESH,
- dont_wake);
-
- /* Reset cleaned stats for the next time this queue is
- * cleaned
- */
- tx_q->cleaned_bytes = 0;
- tx_q->cleaned_pkts = 0;
- }
-
- ntc += complq->desc_count;
- complq->next_to_clean = ntc;
+exit_clean_complq:
+ idpf_tx_finalize_complq(complq, ntc, gen_flag, cleaned);

return !!complq_budget;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 2584bd94363f..3e15ed779860 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -577,6 +577,7 @@ struct idpf_queue {
struct idpf_flex_tx_ctx_desc *flex_ctx;

struct idpf_splitq_tx_compl_desc *comp;
+ struct idpf_splitq_4b_tx_compl_desc *comp_4b;

void *desc_ring;
};
@@ -821,6 +822,49 @@ static inline void idpf_tx_splitq_build_desc(union idpf_tx_flex_desc *desc,
idpf_tx_splitq_build_flow_desc(desc, params, td_cmd, size);
}

+/**
+ * idpf_parse_compl_desc - Parse the completion descriptor
+ * @desc: completion descriptor to be parsed
+ * @complq: completion queue containing the descriptor
+ * @txq: returns corresponding Tx queue for a given descriptor
+ * @gen_flag: current generation flag in the completion queue
+ *
+ * Returns completion type from descriptor or negative value in case of error:
+ * -ENODATA if there is no completion descriptor to be cleaned
+ * -EINVAL if no Tx queue has been found for the completion queue
+ */
+static inline int
+idpf_parse_compl_desc(struct idpf_splitq_4b_tx_compl_desc *desc,
+ struct idpf_queue *complq, struct idpf_queue **txq,
+ bool gen_flag)
+{
+ int rel_tx_qid;
+ u8 ctype; /* completion type */
+ u16 gen;
+
+ /* if the descriptor isn't done, no work yet to do */
+ gen = (le16_to_cpu(desc->qid_comptype_gen) &
+ IDPF_TXD_COMPLQ_GEN_M) >> IDPF_TXD_COMPLQ_GEN_S;
+ if (gen_flag != gen)
+ return -ENODATA;
+
+ /* Find necessary info of TX queue to clean buffers */
+ rel_tx_qid = (le16_to_cpu(desc->qid_comptype_gen) &
+ IDPF_TXD_COMPLQ_QID_M) >> IDPF_TXD_COMPLQ_QID_S;
+ if (rel_tx_qid >= complq->txq_grp->num_txq ||
+ !complq->txq_grp->txqs[rel_tx_qid])
+ return -EINVAL;
+
+ *txq = complq->txq_grp->txqs[rel_tx_qid];
+
+ /* Determine completion type */
+ ctype = (le16_to_cpu(desc->qid_comptype_gen) &
+ IDPF_TXD_COMPLQ_COMPL_TYPE_M) >>
+ IDPF_TXD_COMPLQ_COMPL_TYPE_S;
+
+ return ctype;
+}
+
/**
* idpf_vport_intr_set_wb_on_itr - enable descriptor writeback on disabled interrupts
* @q_vector: pointer to queue vector struct
--
2.43.0


2023-12-23 03:05:59

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 22/34] idpf: implement XDP_SETUP_PROG in ndo_bpf for splitq

From: Michal Kubiak <[email protected]>

Implement loading an XDP program via the ndo_bpf callback
(the XDP_SETUP_PROG command) for splitq.

Add functions for stopping, reconfiguring and restarting
all queues when needed.
Also, implement XDP hot swap, so that replacing an existing
XDP program with another one does not require reconfiguring
anything.
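
In other words, a full queue reconfiguration is needed only when XDP goes
from off to on or vice versa; otherwise the new program is swapped in
place. A minimal sketch with hypothetical names (not the driver's API):

  #include <stdbool.h>

  struct toy_prog;

  /* queues must be torn down and rebuilt only when XDP goes on<->off */
  static bool toy_xdp_needs_reconfig(const struct toy_prog *old_prog,
  				   const struct toy_prog *new_prog)
  {
  	return !!old_prog != !!new_prog;
  }

  /* hot swap path: just exchange the pointers, caller drops the old ref */
  static const struct toy_prog *
  toy_xdp_swap_prog(const struct toy_prog **slot, const struct toy_prog *prog)
  {
  	const struct toy_prog *old = *slot;

  	*slot = prog;
  	return old;
  }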

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf.h | 2 +
drivers/net/ethernet/intel/idpf/idpf_lib.c | 5 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 5 +
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 160 ++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xdp.h | 7 +
5 files changed, 177 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 76df52b797d9..91f61060f500 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -975,6 +975,8 @@ int idpf_set_promiscuous(struct idpf_adapter *adapter,
u32 vport_id);
int idpf_send_disable_queues_msg(struct idpf_vport *vport);
void idpf_vport_init(struct idpf_vport *vport, struct idpf_vport_max_q *max_q);
+int idpf_vport_open(struct idpf_vport *vport, bool alloc_res);
+void idpf_vport_stop(struct idpf_vport *vport);
u32 idpf_get_vport_id(struct idpf_vport *vport);
int idpf_vport_queue_ids_init(struct idpf_vport *vport);
int idpf_queue_reg_init(struct idpf_vport *vport);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index c3fb20197725..01130e7c4d2e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -889,7 +889,7 @@ static void idpf_remove_features(struct idpf_vport *vport)
* idpf_vport_stop - Disable a vport
* @vport: vport to disable
*/
-static void idpf_vport_stop(struct idpf_vport *vport)
+void idpf_vport_stop(struct idpf_vport *vport)
{
struct idpf_netdev_priv *np = netdev_priv(vport->netdev);

@@ -1369,7 +1369,7 @@ static void idpf_rx_init_buf_tail(struct idpf_vport *vport)
* @vport: vport to bring up
* @alloc_res: allocate queue resources
*/
-static int idpf_vport_open(struct idpf_vport *vport, bool alloc_res)
+int idpf_vport_open(struct idpf_vport *vport, bool alloc_res)
{
struct idpf_netdev_priv *np = netdev_priv(vport->netdev);
struct idpf_adapter *adapter = vport->adapter;
@@ -2444,6 +2444,7 @@ static const struct net_device_ops idpf_netdev_ops_splitq = {
.ndo_get_stats64 = idpf_get_stats64,
.ndo_set_features = idpf_set_features,
.ndo_tx_timeout = idpf_tx_timeout,
+ .ndo_bpf = idpf_xdp,
};

static const struct net_device_ops idpf_netdev_ops_singleq = {
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index d4a9f4c36b63..e7081b68bc7d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -929,6 +929,8 @@ static void idpf_vport_queue_grp_rel_all(struct idpf_vport *vport)
void idpf_vport_queues_rel(struct idpf_vport *vport)
{
idpf_vport_xdpq_put(vport);
+ idpf_copy_xdp_prog_to_qs(vport, NULL);
+
idpf_tx_desc_rel_all(vport);
idpf_rx_desc_rel_all(vport);
idpf_vport_queue_grp_rel_all(vport);
@@ -1485,6 +1487,7 @@ static int idpf_vport_queue_grp_alloc_all(struct idpf_vport *vport)
*/
int idpf_vport_queues_alloc(struct idpf_vport *vport)
{
+ struct bpf_prog *prog;
int err;

err = idpf_vport_queue_grp_alloc_all(vport);
@@ -1503,6 +1506,8 @@ int idpf_vport_queues_alloc(struct idpf_vport *vport)
if (err)
goto err_out;

+ prog = vport->adapter->vport_config[vport->idx]->user_config.xdp_prog;
+ idpf_copy_xdp_prog_to_qs(vport, prog);
idpf_vport_xdpq_get(vport);

return 0;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.c b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
index 29b2fe68c7eb..87d147e80047 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
@@ -104,6 +104,33 @@ void idpf_xdp_rxq_info_deinit_all(const struct idpf_vport *vport)
idpf_rxq_for_each(vport, idpf_xdp_rxq_info_deinit, NULL);
}

+static int idpf_xdp_rxq_assign_prog(struct idpf_queue *rxq, void *arg)
+{
+ struct mutex *lock = &rxq->vport->adapter->vport_ctrl_lock;
+ struct bpf_prog *prog = arg;
+ struct bpf_prog *old;
+
+ if (prog)
+ bpf_prog_inc(prog);
+
+ old = rcu_replace_pointer(rxq->xdp_prog, prog, lockdep_is_held(lock));
+ if (old)
+ bpf_prog_put(old);
+
+ return 0;
+}
+
+/**
+ * idpf_copy_xdp_prog_to_qs - set pointers to xdp program for each Rx queue
+ * @vport: vport to setup XDP for
+ * @xdp_prog: XDP program that should be copied to all Rx queues
+ */
+void idpf_copy_xdp_prog_to_qs(const struct idpf_vport *vport,
+ struct bpf_prog *xdp_prog)
+{
+ idpf_rxq_for_each(vport, idpf_xdp_rxq_assign_prog, xdp_prog);
+}
+
void idpf_vport_xdpq_get(const struct idpf_vport *vport)
{
if (!idpf_xdp_is_prog_ena(vport))
@@ -145,3 +172,136 @@ void idpf_vport_xdpq_put(const struct idpf_vport *vport)

cpus_read_unlock();
}
+
+/**
+ * idpf_xdp_reconfig_queues - reconfigure queues after the XDP setup
+ * @vport: vport to load or unload XDP for
+ */
+static int idpf_xdp_reconfig_queues(struct idpf_vport *vport)
+{
+ int err;
+
+ err = idpf_vport_adjust_qs(vport);
+ if (err) {
+ netdev_err(vport->netdev,
+ "Could not adjust queue number for XDP\n");
+ return err;
+ }
+ idpf_vport_calc_num_q_desc(vport);
+
+ err = idpf_vport_queues_alloc(vport);
+ if (err) {
+ netdev_err(vport->netdev,
+ "Could not allocate queues for XDP\n");
+ return err;
+ }
+
+ err = idpf_send_add_queues_msg(vport, vport->num_txq,
+ vport->num_complq,
+ vport->num_rxq, vport->num_bufq);
+ if (err) {
+ netdev_err(vport->netdev,
+ "Could not add queues for XDP, VC message sent failed\n");
+ return err;
+ }
+
+ idpf_vport_alloc_vec_indexes(vport);
+
+ return 0;
+}
+
+/**
+ * idpf_assign_bpf_prog - Assign a given BPF program to vport
+ * @current_prog: pointer to XDP program in user config data
+ * @prog: BPF program to be assigned to vport
+ */
+static void idpf_assign_bpf_prog(struct bpf_prog **current_prog,
+ struct bpf_prog *prog)
+{
+ struct bpf_prog *old_prog = *current_prog;
+
+ *current_prog = prog;
+ if (old_prog)
+ bpf_prog_put(old_prog);
+}
+
+/**
+ * idpf_xdp_setup_prog - Add or remove XDP eBPF program
+ * @vport: vport to setup XDP for
+ * @prog: XDP program
+ * @extack: netlink extended ack
+ */
+static int
+idpf_xdp_setup_prog(struct idpf_vport *vport, struct bpf_prog *prog,
+ struct netlink_ext_ack *extack)
+{
+ struct idpf_netdev_priv *np = netdev_priv(vport->netdev);
+ bool needs_reconfig, vport_is_up;
+ struct bpf_prog **current_prog;
+ u16 idx = vport->idx;
+ int err;
+
+ vport_is_up = np->state == __IDPF_VPORT_UP;
+
+ current_prog = &vport->adapter->vport_config[idx]->user_config.xdp_prog;
+ needs_reconfig = !!(*current_prog) != !!prog;
+
+ if (!needs_reconfig) {
+ idpf_copy_xdp_prog_to_qs(vport, prog);
+ idpf_assign_bpf_prog(current_prog, prog);
+
+ return 0;
+ }
+
+ if (!vport_is_up) {
+ idpf_send_delete_queues_msg(vport);
+ } else {
+ set_bit(IDPF_VPORT_DEL_QUEUES, vport->flags);
+ idpf_vport_stop(vport);
+ }
+
+ idpf_deinit_rss(vport);
+
+ idpf_assign_bpf_prog(current_prog, prog);
+
+ err = idpf_xdp_reconfig_queues(vport);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Could not reconfigure the queues after XDP setup\n");
+ return err;
+ }
+
+ if (vport_is_up) {
+ err = idpf_vport_open(vport, false);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Could not re-open the vport after XDP setup\n");
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * idpf_xdp - implements XDP handler
+ * @netdev: netdevice
+ * @xdp: XDP command
+ */
+int idpf_xdp(struct net_device *netdev, struct netdev_bpf *xdp)
+{
+ struct idpf_vport *vport;
+ int err;
+
+ idpf_vport_ctrl_lock(netdev);
+ vport = idpf_netdev_to_vport(netdev);
+
+ switch (xdp->command) {
+ case XDP_SETUP_PROG:
+ err = idpf_xdp_setup_prog(vport, xdp->prog, xdp->extack);
+ break;
+ default:
+ err = -EINVAL;
+ }
+
+ idpf_vport_ctrl_unlock(netdev);
+ return err;
+}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.h b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
index 16b30caaac3f..1d102b1fd2ac 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
@@ -4,12 +4,19 @@
#ifndef _IDPF_XDP_H_
#define _IDPF_XDP_H_

+struct bpf_prog;
struct idpf_vport;
+struct net_device;
+struct netdev_bpf;

int idpf_xdp_rxq_info_init_all(const struct idpf_vport *vport);
void idpf_xdp_rxq_info_deinit_all(const struct idpf_vport *vport);
+void idpf_copy_xdp_prog_to_qs(const struct idpf_vport *vport,
+ struct bpf_prog *xdp_prog);

void idpf_vport_xdpq_get(const struct idpf_vport *vport);
void idpf_vport_xdpq_put(const struct idpf_vport *vport);

+int idpf_xdp(struct net_device *netdev, struct netdev_bpf *xdp);
+
#endif /* _IDPF_XDP_H_ */
--
2.43.0


2023-12-23 03:06:32

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 21/34] idpf: prepare structures to support xdp

From: Michal Kubiak <[email protected]>

Extend the basic structures of the driver (e.g. 'idpf_vport', 'idpf_queue',
'idpf_vport_user_config_data') with the members necessary to support XDP.
Add extra XDP Tx queues needed to support XDP_TX and XDP_REDIRECT actions
without interfering with regular Tx traffic.
Also add functions dedicated to XDP initialization for Rx and Tx queues
and call them from the existing queue configuration paths.
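
For orientation, a minimal standalone sketch (toy numbers and names, not
the real layout code) of the resulting Tx queue array: regular Tx queues
come first and the per-Rx-queue XDP Tx queues are appended starting at
xdp_txq_offset:

  #include <stdio.h>

  int main(void)
  {
  	unsigned int num_req_tx_qs = 4;	/* regular Tx queues */
  	unsigned int num_rxq = 4;	/* one XDP Tx queue per Rx queue */
  	unsigned int xdp_txq_offset = num_req_tx_qs;
  	unsigned int num_txq = num_req_tx_qs + num_rxq;

  	for (unsigned int i = 0; i < num_txq; i++)
  		printf("txq %u: %s\n", i,
  		       i >= xdp_txq_offset ? "XDP" : "regular");

  	return 0;
  }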

Signed-off-by: Michal Kubiak <[email protected]>
Co-developed-by: Alexander Lobakin <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/Makefile | 2 +
drivers/net/ethernet/intel/idpf/idpf.h | 23 +++
.../net/ethernet/intel/idpf/idpf_ethtool.c | 6 +-
drivers/net/ethernet/intel/idpf/idpf_lib.c | 25 ++-
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 122 ++++++++++++++-
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 24 ++-
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 36 +++--
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 147 ++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xdp.h | 15 ++
9 files changed, 375 insertions(+), 25 deletions(-)
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xdp.c
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xdp.h

diff --git a/drivers/net/ethernet/intel/idpf/Makefile b/drivers/net/ethernet/intel/idpf/Makefile
index 6844ead2f3ac..4024781ff02b 100644
--- a/drivers/net/ethernet/intel/idpf/Makefile
+++ b/drivers/net/ethernet/intel/idpf/Makefile
@@ -16,3 +16,5 @@ idpf-y := \
idpf_txrx.o \
idpf_virtchnl.o \
idpf_vf_dev.o
+
+idpf-objs += idpf_xdp.o
diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 596ece7df26a..76df52b797d9 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -376,6 +376,14 @@ struct idpf_vport {
struct idpf_queue **txqs;
bool crc_enable;

+ bool xdpq_share;
+ u16 num_xdp_txq;
+ u16 num_xdp_rxq;
+ u16 num_xdp_complq;
+ u16 xdp_txq_offset;
+ u16 xdp_rxq_offset;
+ u16 xdp_complq_offset;
+
u16 num_rxq;
u16 num_bufq;
u32 rxq_desc_count;
@@ -465,8 +473,11 @@ struct idpf_vport_user_config_data {
struct idpf_rss_data rss_data;
u16 num_req_tx_qs;
u16 num_req_rx_qs;
+ u16 num_req_xdp_qs;
u32 num_req_txq_desc;
u32 num_req_rxq_desc;
+ /* Duplicated in queue structure for performance reasons */
+ struct bpf_prog *xdp_prog;
DECLARE_BITMAP(user_flags, __IDPF_USER_FLAGS_NBITS);
struct list_head mac_filter_list;
};
@@ -685,6 +696,18 @@ static inline int idpf_is_queue_model_split(u16 q_model)
return q_model == VIRTCHNL2_QUEUE_MODEL_SPLIT;
}

+/**
+ * idpf_xdp_is_prog_ena - check if there is an XDP program on adapter
+ * @vport: vport to check
+ */
+static inline bool idpf_xdp_is_prog_ena(const struct idpf_vport *vport)
+{
+ if (!vport->adapter)
+ return false;
+
+ return !!vport->adapter->vport_config[vport->idx]->user_config.xdp_prog;
+}
+
#define idpf_is_cap_ena(adapter, field, flag) \
idpf_is_capability_ena(adapter, false, field, flag)
#define idpf_is_cap_ena_all(adapter, field, flag) \
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index da7963f27bd8..0d192417205d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -186,9 +186,11 @@ static void idpf_get_channels(struct net_device *netdev,
{
struct idpf_netdev_priv *np = netdev_priv(netdev);
struct idpf_vport_config *vport_config;
+ const struct idpf_vport *vport;
u16 num_txq, num_rxq;
u16 combined;

+ vport = idpf_netdev_to_vport(netdev);
vport_config = np->adapter->vport_config[np->vport_idx];

num_txq = vport_config->user_config.num_req_tx_qs;
@@ -202,8 +204,8 @@ static void idpf_get_channels(struct net_device *netdev,
ch->max_rx = vport_config->max_q.max_rxq;
ch->max_tx = vport_config->max_q.max_txq;

- ch->max_other = IDPF_MAX_MBXQ;
- ch->other_count = IDPF_MAX_MBXQ;
+ ch->max_other = IDPF_MAX_MBXQ + vport->num_xdp_txq;
+ ch->other_count = IDPF_MAX_MBXQ + vport->num_xdp_txq;

ch->combined_count = combined;
ch->rx_count = num_rxq - combined;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 5fea2fd957eb..c3fb20197725 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -2,6 +2,7 @@
/* Copyright (C) 2023 Intel Corporation */

#include "idpf.h"
+#include "idpf_xdp.h"

static const struct net_device_ops idpf_netdev_ops_splitq;
static const struct net_device_ops idpf_netdev_ops_singleq;
@@ -912,6 +913,7 @@ static void idpf_vport_stop(struct idpf_vport *vport)
idpf_remove_features(vport);

vport->link_up = false;
+ idpf_xdp_rxq_info_deinit_all(vport);
idpf_vport_intr_deinit(vport);
idpf_vport_intr_rel(vport);
idpf_vport_queues_rel(vport);
@@ -1299,13 +1301,18 @@ static void idpf_restore_features(struct idpf_vport *vport)
*/
static int idpf_set_real_num_queues(struct idpf_vport *vport)
{
- int err;
+ int num_txq, err;

err = netif_set_real_num_rx_queues(vport->netdev, vport->num_rxq);
if (err)
return err;

- return netif_set_real_num_tx_queues(vport->netdev, vport->num_txq);
+ if (idpf_xdp_is_prog_ena(vport))
+ num_txq = vport->num_txq - vport->num_xdp_txq;
+ else
+ num_txq = vport->num_txq;
+
+ return netif_set_real_num_tx_queues(vport->netdev, num_txq);
}

/**
@@ -1418,18 +1425,26 @@ static int idpf_vport_open(struct idpf_vport *vport, bool alloc_res)

idpf_rx_init_buf_tail(vport);

+ err = idpf_xdp_rxq_info_init_all(vport);
+ if (err) {
+ netdev_err(vport->netdev,
+ "Failed to initialize XDP RxQ info for vport %u: %pe\n",
+ vport->vport_id, ERR_PTR(err));
+ goto intr_deinit;
+ }
+
err = idpf_send_config_queues_msg(vport);
if (err) {
dev_err(&adapter->pdev->dev, "Failed to configure queues for vport %u, %d\n",
vport->vport_id, err);
- goto intr_deinit;
+ goto rxq_deinit;
}

err = idpf_send_map_unmap_queue_vector_msg(vport, true);
if (err) {
dev_err(&adapter->pdev->dev, "Failed to map queue vectors for vport %u: %d\n",
vport->vport_id, err);
- goto intr_deinit;
+ goto rxq_deinit;
}

err = idpf_send_enable_queues_msg(vport);
@@ -1477,6 +1492,8 @@ static int idpf_vport_open(struct idpf_vport *vport, bool alloc_res)
idpf_send_disable_queues_msg(vport);
unmap_queue_vectors:
idpf_send_map_unmap_queue_vector_msg(vport, false);
+rxq_deinit:
+ idpf_xdp_rxq_info_deinit_all(vport);
intr_deinit:
idpf_vport_intr_deinit(vport);
intr_rel:
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 1a79ec1fb838..d4a9f4c36b63 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -2,6 +2,7 @@
/* Copyright (C) 2023 Intel Corporation */

#include "idpf.h"
+#include "idpf_xdp.h"

/**
* idpf_buf_lifo_push - push a buffer pointer onto stack
@@ -61,15 +62,23 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
static void idpf_tx_buf_rel_all(struct idpf_queue *txq)
{
struct libie_sq_onstack_stats ss = { };
+ struct xdp_frame_bulk bq;
u16 i;

/* Buffers already cleared, nothing to do */
if (!txq->tx_buf)
return;

+ xdp_frame_bulk_init(&bq);
+ rcu_read_lock();
+
/* Free all the Tx buffer sk_buffs */
for (i = 0; i < txq->desc_count; i++)
- libie_tx_complete_buf(&txq->tx_buf[i], txq->dev, false, &ss);
+ libie_tx_complete_any(&txq->tx_buf[i], txq->dev, &bq,
+ &txq->xdp_tx_active, &ss);
+
+ xdp_flush_frame_bulk(&bq);
+ rcu_read_unlock();

kfree(txq->tx_buf);
txq->tx_buf = NULL;
@@ -469,6 +478,7 @@ static int idpf_rx_hdr_buf_alloc_all(struct idpf_queue *rxq)
struct libie_buf_queue bq = {
.count = rxq->desc_count,
.type = LIBIE_RX_BUF_HDR,
+ .xdp = idpf_xdp_is_prog_ena(rxq->vport),
};
struct libie_rx_buffer *hdr_buf;
int ret;
@@ -647,6 +657,7 @@ static int idpf_rx_bufs_init(struct idpf_queue *rxbufq,
.count = rxbufq->desc_count,
.type = type,
.hsplit = rxbufq->rx_hsplit_en,
+ .xdp = idpf_xdp_is_prog_ena(rxbufq->vport),
};
int ret;

@@ -917,6 +928,7 @@ static void idpf_vport_queue_grp_rel_all(struct idpf_vport *vport)
*/
void idpf_vport_queues_rel(struct idpf_vport *vport)
{
+ idpf_vport_xdpq_put(vport);
idpf_tx_desc_rel_all(vport);
idpf_rx_desc_rel_all(vport);
idpf_vport_queue_grp_rel_all(vport);
@@ -984,6 +996,27 @@ void idpf_vport_init_num_qs(struct idpf_vport *vport,
if (idpf_is_queue_model_split(vport->rxq_model))
vport->num_bufq = le16_to_cpu(vport_msg->num_rx_bufq);

+ vport->num_xdp_rxq = 0;
+ vport->xdp_rxq_offset = 0;
+
+ if (idpf_xdp_is_prog_ena(vport)) {
+ vport->xdp_txq_offset = config_data->num_req_tx_qs;
+ vport->num_xdp_txq = le16_to_cpu(vport_msg->num_tx_q) -
+ vport->xdp_txq_offset;
+ vport->xdpq_share = libie_xdp_sq_shared(vport->num_xdp_txq);
+ } else {
+ vport->xdp_txq_offset = 0;
+ vport->num_xdp_txq = 0;
+ vport->xdpq_share = 0;
+ }
+
+ if (idpf_is_queue_model_split(vport->txq_model)) {
+ vport->num_xdp_complq = vport->num_xdp_txq;
+ vport->xdp_complq_offset = vport->xdp_txq_offset;
+ }
+
+ config_data->num_req_xdp_qs = vport->num_xdp_txq;
+
/* Adjust number of buffer queues per Rx queue group. */
if (!idpf_is_queue_model_split(vport->rxq_model)) {
vport->num_bufqs_per_qgrp = 0;
@@ -1055,9 +1088,10 @@ int idpf_vport_calc_total_qs(struct idpf_adapter *adapter, u16 vport_idx,
int dflt_splitq_txq_grps = 0, dflt_singleq_txqs = 0;
int dflt_splitq_rxq_grps = 0, dflt_singleq_rxqs = 0;
u16 num_req_tx_qs = 0, num_req_rx_qs = 0;
+ struct idpf_vport_user_config_data *user;
struct idpf_vport_config *vport_config;
u16 num_txq_grps, num_rxq_grps;
- u32 num_qs;
+ u32 num_qs, num_xdpq;

vport_config = adapter->vport_config[vport_idx];
if (vport_config) {
@@ -1105,6 +1139,29 @@ int idpf_vport_calc_total_qs(struct idpf_adapter *adapter, u16 vport_idx,
vport_msg->num_rx_bufq = 0;
}

+ if (!vport_config)
+ return 0;
+
+ user = &vport_config->user_config;
+ user->num_req_rx_qs = le16_to_cpu(vport_msg->num_rx_q);
+ user->num_req_tx_qs = le16_to_cpu(vport_msg->num_tx_q);
+
+ if (vport_config->user_config.xdp_prog)
+		/* As we now know the new number of Rx and Tx queues, we can
+ * request additional Tx queues for XDP.
+ */
+ num_xdpq = libie_xdp_get_sq_num(user->num_req_rx_qs,
+ user->num_req_tx_qs,
+ IDPF_LARGE_MAX_Q);
+ else
+ num_xdpq = 0;
+
+ user->num_req_xdp_qs = num_xdpq;
+
+ vport_msg->num_tx_q = cpu_to_le16(user->num_req_tx_qs + num_xdpq);
+ if (idpf_is_queue_model_split(le16_to_cpu(vport_msg->txq_model)))
+ vport_msg->num_tx_complq = vport_msg->num_tx_q;
+
return 0;
}

@@ -1446,6 +1503,8 @@ int idpf_vport_queues_alloc(struct idpf_vport *vport)
if (err)
goto err_out;

+ idpf_vport_xdpq_get(vport);
+
return 0;

err_out:
@@ -3791,9 +3850,15 @@ static bool idpf_tx_splitq_clean_all(struct idpf_q_vector *q_vec,
return true;

budget_per_q = DIV_ROUND_UP(budget, num_txq);
- for (i = 0; i < num_txq; i++)
- clean_complete &= idpf_tx_clean_complq(q_vec->tx[i],
- budget_per_q, cleaned);
+
+ for (i = 0; i < num_txq; i++) {
+ struct idpf_queue *cq = q_vec->tx[i];
+
+ if (!test_bit(__IDPF_Q_XDP, cq->flags))
+ clean_complete &= idpf_tx_clean_complq(cq,
+ budget_per_q,
+ cleaned);
+ }

return clean_complete;
}
@@ -3893,13 +3958,22 @@ static int idpf_vport_splitq_napi_poll(struct napi_struct *napi, int budget)
*/
static void idpf_vport_intr_map_vector_to_qs(struct idpf_vport *vport)
{
+ bool is_xdp_prog_ena = idpf_xdp_is_prog_ena(vport);
u16 num_txq_grp = vport->num_txq_grp;
int i, j, qv_idx, bufq_vidx = 0;
struct idpf_rxq_group *rx_qgrp;
struct idpf_txq_group *tx_qgrp;
struct idpf_queue *q, *bufq;
+ int num_active_rxq;
u16 q_index;

+	/* XDP Tx queues are handled within the Rx loop, so correct num_txq_grp
+	 * to hold only the number of regular Tx queue groups. This way, when we
+	 * later assign Tx queues to q_vectors, we only go through regular ones.
+ */
+ if (is_xdp_prog_ena && idpf_is_queue_model_split(vport->txq_model))
+ num_txq_grp = vport->xdp_txq_offset;
+
for (i = 0, qv_idx = 0; i < vport->num_rxq_grp; i++) {
u16 num_rxq;

@@ -3909,6 +3983,8 @@ static void idpf_vport_intr_map_vector_to_qs(struct idpf_vport *vport)
else
num_rxq = rx_qgrp->singleq.num_rxq;

+ num_active_rxq = num_rxq - vport->num_xdp_rxq;
+
for (j = 0; j < num_rxq; j++) {
if (qv_idx >= vport->num_q_vectors)
qv_idx = 0;
@@ -3921,6 +3997,30 @@ static void idpf_vport_intr_map_vector_to_qs(struct idpf_vport *vport)
q_index = q->q_vector->num_rxq;
q->q_vector->rx[q_index] = q;
q->q_vector->num_rxq++;
+
+ /* Do not setup XDP Tx queues for dummy Rx queues. */
+ if (j >= num_active_rxq)
+ goto skip_xdp_txq_config;
+
+ if (is_xdp_prog_ena) {
+ if (idpf_is_queue_model_split(vport->txq_model)) {
+ tx_qgrp = &vport->txq_grps[i + vport->xdp_txq_offset];
+ q = tx_qgrp->complq;
+ q->q_vector = &vport->q_vectors[qv_idx];
+ q_index = q->q_vector->num_txq;
+ q->q_vector->tx[q_index] = q;
+ q->q_vector->num_txq++;
+ } else {
+ tx_qgrp = &vport->txq_grps[i];
+ q = tx_qgrp->txqs[j + vport->xdp_txq_offset];
+ q->q_vector = &vport->q_vectors[qv_idx];
+ q_index = q->q_vector->num_txq;
+ q->q_vector->tx[q_index] = q;
+ q->q_vector->num_txq++;
+ }
+ }
+
+skip_xdp_txq_config:
qv_idx++;
}

@@ -3954,6 +4054,9 @@ static void idpf_vport_intr_map_vector_to_qs(struct idpf_vport *vport)
q->q_vector->num_txq++;
qv_idx++;
} else {
+ num_txq = is_xdp_prog_ena ? tx_qgrp->num_txq - vport->xdp_txq_offset
+ : tx_qgrp->num_txq;
+
for (j = 0; j < num_txq; j++) {
if (qv_idx >= vport->num_q_vectors)
qv_idx = 0;
@@ -4175,6 +4278,15 @@ static void idpf_fill_dflt_rss_lut(struct idpf_vport *vport)

rss_data = &adapter->vport_config[vport->idx]->user_config.rss_data;

+ /* When we use this code for legacy devices (e.g. in AVF driver), some
+ * Rx queues may not be used because we would not be able to create XDP
+	 * Tx queues for them. In such a case, do not add their queue IDs to
+	 * the RSS LUT: reduce the number of active Rx queues by the number
+	 * of such XDP-only Rx queues.
+ */
+ if (idpf_xdp_is_prog_ena(vport))
+ num_active_rxq -= vport->num_xdp_rxq;
+
for (i = 0; i < rss_data->rss_lut_size; i++) {
rss_data->rss_lut[i] = i % num_active_rxq;
rss_data->cached_lut[i] = rss_data->rss_lut[i];
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 3e15ed779860..b1c30795f376 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -4,8 +4,7 @@
#ifndef _IDPF_TXRX_H_
#define _IDPF_TXRX_H_

-#include <linux/net/intel/libie/rx.h>
-#include <linux/net/intel/libie/tx.h>
+#include <linux/net/intel/libie/xdp.h>

#include <net/page_pool/helpers.h>
#include <net/tcp.h>
@@ -319,6 +318,7 @@ enum idpf_queue_flags_t {
__IDPF_Q_FLOW_SCH_EN,
__IDPF_Q_SW_MARKER,
__IDPF_Q_POLL_MODE,
+ __IDPF_Q_XDP,

__IDPF_Q_FLAGS_NBITS,
};
@@ -554,13 +554,20 @@ struct idpf_queue {
};
void __iomem *tail;
union {
- struct idpf_tx_buf *tx_buf;
+ struct {
+ struct idpf_tx_buf *tx_buf;
+ struct libie_xdp_sq_lock xdp_lock;
+ };
+ u32 num_xdp_txq;
struct {
struct libie_rx_buffer *hdr_buf;
struct idpf_rx_buf *buf;
} rx_buf;
};
- struct page_pool *hdr_pp;
+ union {
+ struct page_pool *hdr_pp;
+ struct idpf_queue **xdpqs;
+ };
union {
struct page_pool *pp;
struct device *dev;
@@ -582,7 +589,10 @@ struct idpf_queue {
void *desc_ring;
};

- u32 hdr_truesize;
+ union {
+ u32 hdr_truesize;
+ u32 xdp_tx_active;
+ };
u32 truesize;
u16 idx;
u16 q_type;
@@ -627,8 +637,12 @@ struct idpf_queue {
union {
/* Rx */
struct {
+ struct xdp_rxq_info xdp_rxq;
+
+ struct bpf_prog __rcu *xdp_prog;
struct sk_buff *skb;
};
+
/* Tx */
struct {
u16 compl_tag_bufid_m;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 5c3d7c3534af..59b8bbebead7 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -1947,20 +1947,27 @@ int idpf_send_map_unmap_queue_vector_msg(struct idpf_vport *vport, bool map)
struct idpf_txq_group *tx_qgrp = &vport->txq_grps[i];

for (j = 0; j < tx_qgrp->num_txq; j++, k++) {
+ const struct idpf_q_vector *vec;
+ u32 v_idx, tx_itr_idx;
+
vqv[k].queue_type = cpu_to_le32(tx_qgrp->txqs[j]->q_type);
vqv[k].queue_id = cpu_to_le32(tx_qgrp->txqs[j]->q_id);

- if (idpf_is_queue_model_split(vport->txq_model)) {
- vqv[k].vector_id =
- cpu_to_le16(tx_qgrp->complq->q_vector->v_idx);
- vqv[k].itr_idx =
- cpu_to_le32(tx_qgrp->complq->q_vector->tx_itr_idx);
+ if (idpf_is_queue_model_split(vport->txq_model))
+ vec = tx_qgrp->complq->q_vector;
+ else
+ vec = tx_qgrp->txqs[j]->q_vector;
+
+ if (vec) {
+ v_idx = vec->v_idx;
+ tx_itr_idx = vec->tx_itr_idx;
} else {
- vqv[k].vector_id =
- cpu_to_le16(tx_qgrp->txqs[j]->q_vector->v_idx);
- vqv[k].itr_idx =
- cpu_to_le32(tx_qgrp->txqs[j]->q_vector->tx_itr_idx);
+ v_idx = 0;
+ tx_itr_idx = VIRTCHNL2_ITR_IDX_1;
}
+
+ vqv[k].vector_id = cpu_to_le16(v_idx);
+ vqv[k].itr_idx = cpu_to_le32(tx_itr_idx);
}
}

@@ -3253,6 +3260,17 @@ int idpf_vport_alloc_vec_indexes(struct idpf_vport *vport)
vec_info.default_vport = vport->default_vport;
vec_info.index = vport->idx;

+ /* Additional XDP Tx queues share the q_vector with regular Tx and Rx
+ * queues to which they are assigned. Also, XDP shall request additional
+	 * Tx queues via VIRTCHNL. Therefore, to avoid overflowing the
+	 * "vport->q_vector_idxs" array, do not request empty q_vectors
+ * for XDP Tx queues.
+ */
+ if (idpf_xdp_is_prog_ena(vport))
+ vec_info.num_req_vecs = max_t(u16,
+ vport->num_txq - vport->num_xdp_txq,
+ vport->num_rxq);
+
num_alloc_vecs = idpf_req_rel_vector_indexes(vport->adapter,
vport->q_vector_idxs,
&vec_info);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.c b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
new file mode 100644
index 000000000000..29b2fe68c7eb
--- /dev/null
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2023 Intel Corporation */
+
+#include "idpf.h"
+#include "idpf_xdp.h"
+
+static int idpf_rxq_for_each(const struct idpf_vport *vport,
+ int (*fn)(struct idpf_queue *rxq, void *arg),
+ void *arg)
+{
+ bool splitq = idpf_is_queue_model_split(vport->rxq_model);
+
+ for (u32 i = 0; i < vport->num_rxq_grp; i++) {
+ const struct idpf_rxq_group *rx_qgrp = &vport->rxq_grps[i];
+ u32 num_rxq;
+
+ if (splitq)
+ num_rxq = rx_qgrp->splitq.num_rxq_sets;
+ else
+ num_rxq = rx_qgrp->singleq.num_rxq;
+
+ for (u32 j = 0; j < num_rxq; j++) {
+ struct idpf_queue *q;
+ int err;
+
+ if (splitq)
+ q = &rx_qgrp->splitq.rxq_sets[j]->rxq;
+ else
+ q = rx_qgrp->singleq.rxqs[j];
+
+ err = fn(q, arg);
+ if (err)
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+/**
+ * idpf_xdp_rxq_info_init - Setup XDP RxQ info for a given Rx queue
+ * @rxq: Rx queue for which the resources are setup
+ * @arg: pointer-encoded flag: true if the HW works in split queue mode
+ *
+ * Return: 0 on success, negative on failure.
+ */
+static int idpf_xdp_rxq_info_init(struct idpf_queue *rxq, void *arg)
+{
+ const struct idpf_vport *vport = rxq->vport;
+ const struct page_pool *pp;
+ int err;
+
+ err = __xdp_rxq_info_reg(&rxq->xdp_rxq, vport->netdev, rxq->idx,
+ rxq->q_vector->napi.napi_id,
+ rxq->rx_buf_size);
+ if (err)
+ return err;
+
+ pp = arg ? rxq->rxq_grp->splitq.bufq_sets[0].bufq.pp : rxq->pp;
+ xdp_rxq_info_attach_page_pool(&rxq->xdp_rxq, pp);
+
+ rxq->xdpqs = &vport->txqs[vport->xdp_txq_offset];
+ rxq->num_xdp_txq = vport->num_xdp_txq;
+
+ return 0;
+}
+
+/**
+ * idpf_xdp_rxq_info_init_all - initialize RxQ info for all Rx queues in vport
+ * @vport: vport to setup the info
+ *
+ * Return: 0 on success, negative on failure.
+ */
+int idpf_xdp_rxq_info_init_all(const struct idpf_vport *vport)
+{
+ void *arg;
+
+ arg = (void *)(size_t)idpf_is_queue_model_split(vport->rxq_model);
+
+ return idpf_rxq_for_each(vport, idpf_xdp_rxq_info_init, arg);
+}
+
+/**
+ * idpf_xdp_rxq_info_deinit - Deinit XDP RxQ info for a given Rx queue
+ * @rxq: Rx queue for which the resources are destroyed
+ * @arg: unused
+ */
+static int idpf_xdp_rxq_info_deinit(struct idpf_queue *rxq, void *arg)
+{
+ rxq->xdpqs = NULL;
+ rxq->num_xdp_txq = 0;
+
+ xdp_rxq_info_detach_mem_model(&rxq->xdp_rxq);
+ xdp_rxq_info_unreg(&rxq->xdp_rxq);
+
+ return 0;
+}
+
+/**
+ * idpf_xdp_rxq_info_deinit_all - deinit RxQ info for all Rx queues in vport
+ * @vport: vport to setup the info
+ */
+void idpf_xdp_rxq_info_deinit_all(const struct idpf_vport *vport)
+{
+ idpf_rxq_for_each(vport, idpf_xdp_rxq_info_deinit, NULL);
+}
+
+void idpf_vport_xdpq_get(const struct idpf_vport *vport)
+{
+ if (!idpf_xdp_is_prog_ena(vport))
+ return;
+
+ cpus_read_lock();
+
+ for (u32 j = vport->xdp_txq_offset; j < vport->num_txq; j++) {
+ struct idpf_queue *xdpq = vport->txqs[j];
+
+ __clear_bit(__IDPF_Q_FLOW_SCH_EN, xdpq->flags);
+ __clear_bit(__IDPF_Q_FLOW_SCH_EN,
+ xdpq->txq_grp->complq->flags);
+ __set_bit(__IDPF_Q_XDP, xdpq->flags);
+ __set_bit(__IDPF_Q_XDP, xdpq->txq_grp->complq->flags);
+
+ libie_xdp_sq_get(&xdpq->xdp_lock, vport->netdev,
+ vport->xdpq_share);
+ }
+
+ cpus_read_unlock();
+}
+
+void idpf_vport_xdpq_put(const struct idpf_vport *vport)
+{
+ if (!idpf_xdp_is_prog_ena(vport))
+ return;
+
+ cpus_read_lock();
+
+ for (u32 j = vport->xdp_txq_offset; j < vport->num_txq; j++) {
+ struct idpf_queue *xdpq = vport->txqs[j];
+
+ if (!__test_and_clear_bit(__IDPF_Q_XDP, xdpq->flags))
+ continue;
+
+ libie_xdp_sq_put(&xdpq->xdp_lock, vport->netdev);
+ }
+
+ cpus_read_unlock();
+}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.h b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
new file mode 100644
index 000000000000..16b30caaac3f
--- /dev/null
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IDPF_XDP_H_
+#define _IDPF_XDP_H_
+
+struct idpf_vport;
+
+int idpf_xdp_rxq_info_init_all(const struct idpf_vport *vport);
+void idpf_xdp_rxq_info_deinit_all(const struct idpf_vport *vport);
+
+void idpf_vport_xdpq_get(const struct idpf_vport *vport);
+void idpf_vport_xdpq_put(const struct idpf_vport *vport);
+
+#endif /* _IDPF_XDP_H_ */
--
2.43.0


2023-12-23 03:06:43

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 24/34] idpf: add support for XDP on Rx

Use libie XDP infra to support running XDP program on Rx polling.
This includes all of the possible verdicts/actions.
XDP Tx queues are cleaned only in "lazy" mode when there are less than
1/4 free descriptors left on the ring. Some functions are oneliners
around libie's __always_inlines, so that the compiler could uninline
them when needed.

Co-developed-by: Michal Kubiak <[email protected]>
Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_lib.c | 6 +
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 10 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 55 ++++++++
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 140 ++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xdp.h | 20 ++-
5 files changed, 227 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 01130e7c4d2e..a19704c4c421 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -840,6 +840,12 @@ static int idpf_cfg_netdev(struct idpf_vport *vport)
netdev->features |= dflt_features;
netdev->hw_features |= dflt_features | offloads;
netdev->hw_enc_features |= dflt_features | offloads;
+
+ if (idpf_is_queue_model_split(vport->rxq_model))
+ xdp_set_features_flag(netdev, NETDEV_XDP_ACT_BASIC |
+ NETDEV_XDP_ACT_REDIRECT |
+ NETDEV_XDP_ACT_RX_SG);
+
idpf_set_ethtool_ops(netdev);
SET_NETDEV_DEV(netdev, &adapter->pdev->dev);

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index cbbb6bf85b19..99c9b889507b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1522,7 +1522,7 @@ int idpf_vport_queues_alloc(struct idpf_vport *vport)
* idpf_tx_handle_sw_marker - Handle queue marker packet
* @tx_q: tx queue to handle software marker
*/
-static void idpf_tx_handle_sw_marker(struct idpf_queue *tx_q)
+void idpf_tx_handle_sw_marker(struct idpf_queue *tx_q)
{
struct idpf_vport *vport = tx_q->vport;
int i;
@@ -3045,8 +3045,11 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
int total_rx_bytes = 0, total_rx_pkts = 0;
struct idpf_queue *rx_bufq = NULL;
u16 ntc = rxq->next_to_clean;
+ struct libie_xdp_tx_bulk bq;
struct xdp_buff xdp;

+ libie_xdp_tx_init_bulk(&bq, rxq->xdp_prog, rxq->xdp_rxq.dev,
+ rxq->xdpqs, rxq->num_xdp_txq);
libie_xdp_init_buff(&xdp, &rxq->xdp, &rxq->xdp_rxq);

/* Process Rx packets bounded by budget */
@@ -3161,6 +3164,9 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
total_rx_bytes += xdp_get_buff_len(&xdp);
total_rx_pkts++;

+ if (!idpf_xdp_run_prog(&xdp, &bq))
+ continue;
+
skb = xdp_build_skb_from_buff(&xdp);
if (unlikely(!skb)) {
xdp_return_buff(&xdp);
@@ -3182,7 +3188,9 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
}

rxq->next_to_clean = ntc;
+
libie_xdp_save_buff(&rxq->xdp, &xdp);
+ idpf_xdp_finalize_rx(&bq);

u64_stats_update_begin(&rxq->stats_sync);
u64_stats_add(&rxq->q_stats.rx.packets, total_rx_pkts);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 318241020347..20f484712ac2 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -135,6 +135,8 @@ do { \
((++(txq)->compl_tag_cur_gen) >= (txq)->compl_tag_gen_max ? \
0 : (txq)->compl_tag_cur_gen)

+#define IDPF_QUEUE_QUARTER(Q) ((Q)->desc_count >> 2)
+
#define IDPF_TXD_LAST_DESC_CMD (IDPF_TX_DESC_CMD_EOP | IDPF_TX_DESC_CMD_RS)

#define IDPF_TX_FLAGS_TSO BIT(0)
@@ -939,5 +941,58 @@ netdev_tx_t idpf_tx_singleq_start(struct sk_buff *skb,
bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rxq,
u16 cleaned_count);
int idpf_tso(struct sk_buff *skb, struct idpf_tx_offload_params *off);
+void idpf_tx_handle_sw_marker(struct idpf_queue *tx_q);
+
+/**
+ * idpf_xdpq_update_tail - Updates the XDP Tx queue tail register
+ * @xdpq: XDP Tx queue
+ *
+ * This function updates the XDP Tx queue tail register.
+ */
+static inline void idpf_xdpq_update_tail(const struct idpf_queue *xdpq)
+{
+ /* Force memory writes to complete before letting h/w
+ * know there are new descriptors to fetch.
+ */
+ wmb();
+ writel_relaxed(xdpq->next_to_use, xdpq->tail);
+}
+
+/**
+ * idpf_set_rs_bit - set RS bit on last produced descriptor.
+ * @xdpq: XDP queue to produce the HW Tx descriptors on
+ *
+ * Sets the RS bit on the last produced descriptor (one behind the current NTU).
+ */
+static inline void idpf_set_rs_bit(const struct idpf_queue *xdpq)
+{
+ int rs_idx = xdpq->next_to_use ? xdpq->next_to_use - 1 :
+ xdpq->desc_count - 1;
+ union idpf_tx_flex_desc *tx_desc;
+
+ tx_desc = &xdpq->flex_tx[rs_idx];
+ tx_desc->q.qw1.cmd_dtype |= le16_encode_bits(IDPF_TXD_LAST_DESC_CMD,
+ IDPF_FLEX_TXD_QW1_CMD_M);
+}
+
+/**
+ * idpf_xdp_tx_finalize - Bump XDP Tx tail and/or flush redirect map
+ * @_xdpq: XDP Tx queue (type-erased to match the libie callback prototype)
+ * @tail: whether to bump the HW tail register
+ *
+ * This function bumps XDP Tx tail and should be called when a batch of packets
+ * has been processed in the napi loop.
+ */
+static inline void idpf_xdp_tx_finalize(void *_xdpq, bool tail)
+{
+ struct idpf_queue *xdpq = _xdpq;
+
+ libie_xdp_sq_lock(&xdpq->xdp_lock);
+
+ idpf_set_rs_bit(xdpq);
+ if (tail)
+ idpf_xdpq_update_tail(xdpq);
+
+ libie_xdp_sq_unlock(&xdpq->xdp_lock);
+}

#endif /* !_IDPF_TXRX_H_ */
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.c b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
index 87d147e80047..b9952ebda4fb 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
@@ -173,6 +173,146 @@ void idpf_vport_xdpq_put(const struct idpf_vport *vport)
cpus_read_unlock();
}

+/**
+ * idpf_clean_xdp_irq - Reclaim a batch of TX resources from completed XDP_TX
+ * @xdpq: XDP Tx queue
+ *
+ * Returns number of cleaned descriptors.
+ */
+static u32 idpf_clean_xdp_irq(struct idpf_queue *xdpq)
+{
+ struct idpf_queue *complq = xdpq->txq_grp->complq, *txq;
+ struct idpf_splitq_4b_tx_compl_desc *last_rs_desc;
+ struct libie_sq_onstack_stats ss = { };
+ int complq_budget = complq->desc_count;
+ u32 tx_ntc = xdpq->next_to_clean;
+ u32 ntc = complq->next_to_clean;
+ u32 cnt = xdpq->desc_count;
+ u32 done_frames = 0, i = 0;
+ struct xdp_frame_bulk bq;
+ int head = tx_ntc;
+ bool gen_flag;
+
+ last_rs_desc = &complq->comp_4b[ntc];
+ gen_flag = test_bit(__IDPF_Q_GEN_CHK, complq->flags);
+
+ do {
+ int ctype = idpf_parse_compl_desc(last_rs_desc, complq,
+ &txq, gen_flag);
+ if (likely(ctype == IDPF_TXD_COMPLT_RS)) {
+ head = le16_to_cpu(last_rs_desc->q_head_compl_tag.q_head);
+ goto fetch_next_desc;
+ }
+
+ switch (ctype) {
+ case IDPF_TXD_COMPLT_SW_MARKER:
+ idpf_tx_handle_sw_marker(xdpq);
+ break;
+ case -ENODATA:
+ goto exit_xdp_irq;
+ case -EINVAL:
+ break;
+ default:
+ dev_err(&xdpq->vport->adapter->pdev->dev,
+ "Unsupported completion type for XDP\n");
+ break;
+ }
+
+fetch_next_desc:
+ last_rs_desc++;
+ ntc++;
+ if (unlikely(ntc == complq->desc_count)) {
+ ntc = 0;
+ last_rs_desc = &complq->comp_4b[0];
+ gen_flag = !gen_flag;
+ change_bit(__IDPF_Q_GEN_CHK, complq->flags);
+ }
+ prefetch(last_rs_desc);
+ complq_budget--;
+ } while (likely(complq_budget));
+
+exit_xdp_irq:
+ complq->next_to_clean = ntc;
+ done_frames = head >= tx_ntc ? head - tx_ntc :
+ head + cnt - tx_ntc;
+
+ xdp_frame_bulk_init(&bq);
+
+ for (i = 0; i < done_frames; i++) {
+ libie_xdp_complete_tx_buf(&xdpq->tx_buf[tx_ntc], xdpq->dev,
+ true, &bq, &xdpq->xdp_tx_active,
+ &ss);
+
+ if (unlikely(++tx_ntc == cnt))
+ tx_ntc = 0;
+ }
+
+ xdpq->next_to_clean = tx_ntc;
+
+ xdp_flush_frame_bulk(&bq);
+ libie_sq_napi_stats_add((struct libie_sq_stats *)&xdpq->q_stats.tx,
+ &ss);
+
+ return i;
+}
+
+static u32 idpf_xdp_tx_prep(void *_xdpq, struct libie_xdp_tx_queue *sq)
+{
+ struct idpf_queue *xdpq = _xdpq;
+ u32 free;
+
+ libie_xdp_sq_lock(&xdpq->xdp_lock);
+
+ free = IDPF_DESC_UNUSED(xdpq);
+ if (unlikely(free < IDPF_QUEUE_QUARTER(xdpq)))
+ free += idpf_clean_xdp_irq(xdpq);
+
+ *sq = (struct libie_xdp_tx_queue){
+ .dev = xdpq->dev,
+ .tx_buf = xdpq->tx_buf,
+ .desc_ring = xdpq->desc_ring,
+ .xdp_lock = &xdpq->xdp_lock,
+ .next_to_use = &xdpq->next_to_use,
+ .desc_count = xdpq->desc_count,
+ .xdp_tx_active = &xdpq->xdp_tx_active,
+ };
+
+ return free;
+}
+
+static void idpf_xdp_tx_xmit(struct libie_xdp_tx_desc desc,
+ const struct libie_xdp_tx_queue *sq)
+{
+ union idpf_tx_flex_desc *tx_desc = sq->desc_ring;
+ struct idpf_tx_splitq_params tx_params = {
+ .dtype = IDPF_TX_DESC_DTYPE_FLEX_L2TAG1_L2TAG2,
+ .eop_cmd = IDPF_TX_DESC_CMD_EOP,
+ };
+
+ tx_desc = &tx_desc[*sq->next_to_use];
+ tx_desc->q.buf_addr = cpu_to_le64(desc.addr);
+
+ idpf_tx_splitq_build_desc(tx_desc, &tx_params,
+ tx_params.eop_cmd | tx_params.offload.td_cmd,
+ desc.len);
+}
+
+static bool idpf_xdp_tx_flush_bulk(struct libie_xdp_tx_bulk *bq)
+{
+ return libie_xdp_tx_flush_bulk(bq, idpf_xdp_tx_prep, idpf_xdp_tx_xmit);
+}
+
+void __idpf_xdp_finalize_rx(struct libie_xdp_tx_bulk *bq)
+{
+ libie_xdp_finalize_rx(bq, idpf_xdp_tx_flush_bulk,
+ idpf_xdp_tx_finalize);
+}
+
+bool __idpf_xdp_run_prog(struct xdp_buff *xdp, struct libie_xdp_tx_bulk *bq)
+{
+ return libie_xdp_run_prog(xdp, bq, idpf_xdp_tx_flush_bulk);
+}
+
/**
* idpf_xdp_reconfig_queues - reconfigure queues after the XDP setup
* @vport: vport to load or unload XDP for
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.h b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
index 1d102b1fd2ac..1f299c268ca5 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
@@ -4,10 +4,9 @@
#ifndef _IDPF_XDP_H_
#define _IDPF_XDP_H_

-struct bpf_prog;
+#include <linux/net/intel/libie/xdp.h>
+
struct idpf_vport;
-struct net_device;
-struct netdev_bpf;

int idpf_xdp_rxq_info_init_all(const struct idpf_vport *vport);
void idpf_xdp_rxq_info_deinit_all(const struct idpf_vport *vport);
@@ -17,6 +16,21 @@ void idpf_copy_xdp_prog_to_qs(const struct idpf_vport *vport,
void idpf_vport_xdpq_get(const struct idpf_vport *vport);
void idpf_vport_xdpq_put(const struct idpf_vport *vport);

+bool __idpf_xdp_run_prog(struct xdp_buff *xdp, struct libie_xdp_tx_bulk *bq);
+void __idpf_xdp_finalize_rx(struct libie_xdp_tx_bulk *bq);
+
+static inline bool idpf_xdp_run_prog(struct xdp_buff *xdp,
+ struct libie_xdp_tx_bulk *bq)
+{
+ return bq->prog ? __idpf_xdp_run_prog(xdp, bq) : true;
+}
+
+static inline void idpf_xdp_finalize_rx(struct libie_xdp_tx_bulk *bq)
+{
+ if (bq->act_mask >= LIBIE_XDP_TX)
+ __idpf_xdp_finalize_rx(bq);
+}
+
int idpf_xdp(struct net_device *netdev, struct netdev_bpf *xdp);

#endif /* _IDPF_XDP_H_ */
--
2.43.0


2023-12-23 03:07:05

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 25/34] idpf: add support for .ndo_xdp_xmit()

Use the libie XDP infra to implement .ndo_xdp_xmit() in idpf.
The Tx callbacks are reused from the XDP_TX code. The XDP redirect target
feature is set/cleared depending on the XDP prog presence, since for now
we still don't allocate XDP Tx queues when there's no program.
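
For illustration only, a hypothetical redirecting XDP program on some other
interface (IDPF_IFINDEX is a made-up placeholder for the target ifindex);
frames redirected this way end up in the idpf_xdp_xmit() added below:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Assumption: ifindex of the idpf netdev, known at load time */
#define IDPF_IFINDEX	4

SEC("xdp")
int redirect_to_idpf(struct xdp_md *ctx)
{
	/* every frame goes to the idpf interface; its .ndo_xdp_xmit()
	 * then transmits the resulting xdp_frames
	 */
	return bpf_redirect(IDPF_IFINDEX, 0);
}

char _license[] SEC("license") = "GPL";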

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 +
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 34 ++++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xdp.h | 2 ++
3 files changed, 37 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index a19704c4c421..7c3d45f84e1b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -2451,6 +2451,7 @@ static const struct net_device_ops idpf_netdev_ops_splitq = {
.ndo_set_features = idpf_set_features,
.ndo_tx_timeout = idpf_tx_timeout,
.ndo_bpf = idpf_xdp,
+ .ndo_xdp_xmit = idpf_xdp_xmit,
};

static const struct net_device_ops idpf_netdev_ops_singleq = {
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.c b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
index b9952ebda4fb..b4f096186302 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
@@ -313,6 +313,35 @@ bool __idpf_xdp_run_prog(struct xdp_buff *xdp, struct libie_xdp_tx_bulk *bq)
return libie_xdp_run_prog(xdp, bq, idpf_xdp_tx_flush_bulk);
}

+/**
+ * idpf_xdp_xmit - submit packets to xdp ring for transmission
+ * @dev: netdev
+ * @n: number of xdp frames to be transmitted
+ * @frames: xdp frames to be transmitted
+ * @flags: transmit flags
+ *
+ * Returns the number of frames successfully sent. Frames that fail are
+ * freed via the XDP return API.
+ * For error cases, a negative errno code is returned and no frames
+ * are transmitted (the caller must handle freeing the frames).
+ */
+int idpf_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
+ u32 flags)
+{
+ struct idpf_netdev_priv *np = netdev_priv(dev);
+ struct idpf_vport *vport = np->vport;
+
+ if (unlikely(!netif_carrier_ok(dev) || !vport->link_up))
+ return -ENETDOWN;
+ if (unlikely(!idpf_xdp_is_prog_ena(vport)))
+ return -ENXIO;
+
+ return libie_xdp_xmit_do_bulk(dev, n, frames, flags,
+ &vport->txqs[vport->xdp_txq_offset],
+ vport->num_xdp_txq, idpf_xdp_tx_prep,
+ idpf_xdp_tx_xmit, idpf_xdp_tx_finalize);
+}
+
/**
* idpf_xdp_reconfig_queues - reconfigure queues after the XDP setup
* @vport: vport to load or unload XDP for
@@ -410,6 +439,11 @@ idpf_xdp_setup_prog(struct idpf_vport *vport, struct bpf_prog *prog,
return err;
}

+ if (prog)
+ xdp_features_set_redirect_target(vport->netdev, false);
+ else
+ xdp_features_clear_redirect_target(vport->netdev);
+
if (vport_is_up) {
err = idpf_vport_open(vport, false);
if (err) {
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.h b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
index 1f299c268ca5..f1444482f69d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.h
@@ -31,6 +31,8 @@ static inline void idpf_xdp_finalize_rx(struct libie_xdp_tx_bulk *bq)
__idpf_xdp_finalize_rx(bq);
}

+int idpf_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
+ u32 flags);
int idpf_xdp(struct net_device *netdev, struct netdev_bpf *xdp);

#endif /* _IDPF_XDP_H_ */
--
2.43.0


2023-12-23 03:07:17

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 23/34] idpf: use generic functions to build xdp_buff and skb

In preparation for XDP support, switch from using an skb as the main frame
container during Rx polling to &xdp_buff.
This allows using the generic and libie helpers for building an XDP buffer
and changes the logic: an skb is now allocated only once all descriptors
belonging to the frame have been processed.
Sure, &xdp_buff is "a bit" bigger than an skb pointer to store on the
ring, but I already reserved a cacheline-aligned slot for it earlier.
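
For reference, a condensed sketch of the resulting Rx polling flow (helper
names as used in this series; descriptor parsing, stats and buffer refill
are folded into a made-up example_parse_desc()):

static int example_rx_poll(struct idpf_queue *rxq, int budget)
{
	struct xdp_buff xdp;
	int pkts = 0;

	/* restore a partially built frame saved at the end of the last poll */
	libie_xdp_init_buff(&xdp, &rxq->xdp, &rxq->xdp_rxq);

	while (pkts < budget) {
		struct idpf_rx_buf *rx_buf;
		struct sk_buff *skb;
		u32 pkt_len;
		bool eop;

		if (!example_parse_desc(rxq, &rx_buf, &pkt_len, &eop))
			break;

		/* attach the buffer as the head or a frag of the xdp_buff */
		libie_xdp_process_buff(&xdp, rx_buf, pkt_len);

		if (!xdp.data || !eop)
			continue;

		pkts++;

		/* the whole frame is in the xdp_buff now, build an skb */
		skb = xdp_build_skb_from_buff(&xdp);
		if (unlikely(!skb)) {
			xdp_return_buff(&xdp);
			xdp.data = NULL;
			continue;
		}
		xdp.data = NULL;

		napi_gro_receive(&rxq->q_vector->napi, skb);
	}

	/* stash an unfinished multi-buffer frame for the next poll */
	libie_xdp_save_buff(&rxq->xdp, &xdp);

	return pkts;
}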

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../ethernet/intel/idpf/idpf_singleq_txrx.c | 60 +++-------
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 108 ++++--------------
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 5 +-
3 files changed, 41 insertions(+), 132 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
index 7072d45f007b..fa1b66595024 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
@@ -601,14 +601,9 @@ static bool idpf_rx_singleq_test_staterr(const union virtchnl2_rx_desc *rx_desc,

/**
* idpf_rx_singleq_is_non_eop - process handling of non-EOP buffers
- * @rxq: Rx ring being processed
* @rx_desc: Rx descriptor for current buffer
- * @skb: Current socket buffer containing buffer in progress
- * @ntc: next to clean
*/
-static bool idpf_rx_singleq_is_non_eop(struct idpf_queue *rxq,
- union virtchnl2_rx_desc *rx_desc,
- struct sk_buff *skb, u16 ntc)
+static bool idpf_rx_singleq_is_non_eop(const union virtchnl2_rx_desc *rx_desc)
{
/* if we are the last buffer then there is nothing else to do */
if (likely(idpf_rx_singleq_test_staterr(rx_desc, IDPF_RXD_EOF_SINGLEQ)))
@@ -843,9 +838,6 @@ static void idpf_rx_singleq_process_skb_fields(struct idpf_queue *rx_q,
struct libie_rx_ptype_parsed parsed =
rx_q->vport->rx_ptype_lkup[ptype];

- /* modifies the skb - consumes the enet header */
- skb->protocol = eth_type_trans(skb, rx_q->vport->netdev);
-
/* Check if we're using base mode descriptor IDs */
if (rx_q->rxdids == VIRTCHNL2_RXDID_1_32B_BASE_M) {
idpf_rx_singleq_base_hash(rx_q, skb, rx_desc, parsed);
@@ -854,8 +846,6 @@ static void idpf_rx_singleq_process_skb_fields(struct idpf_queue *rx_q,
idpf_rx_singleq_flex_hash(rx_q, skb, rx_desc, parsed);
idpf_rx_singleq_flex_csum(rx_q, skb, rx_desc, parsed);
}
-
- skb_record_rx_queue(skb, rx_q->idx);
}

/**
@@ -986,16 +976,19 @@ static void idpf_rx_singleq_extract_fields(struct idpf_queue *rx_q,
static int idpf_rx_singleq_clean(struct idpf_queue *rx_q, int budget)
{
unsigned int total_rx_bytes = 0, total_rx_pkts = 0;
- struct sk_buff *skb = rx_q->skb;
u16 ntc = rx_q->next_to_clean;
u16 cleaned_count = 0;
bool failure = false;
+ struct xdp_buff xdp;
+
+ libie_xdp_init_buff(&xdp, &rx_q->xdp, &rx_q->xdp_rxq);

/* Process Rx packets bounded by budget */
while (likely(total_rx_pkts < (unsigned int)budget)) {
struct idpf_rx_extracted fields = { };
union virtchnl2_rx_desc *rx_desc;
struct idpf_rx_buf *rx_buf;
+ struct sk_buff *skb;

/* get the Rx desc from Rx queue based on 'next_to_clean' */
rx_desc = &rx_q->rx[ntc];
@@ -1019,45 +1012,35 @@ static int idpf_rx_singleq_clean(struct idpf_queue *rx_q, int budget)
idpf_rx_singleq_extract_fields(rx_q, rx_desc, &fields);

rx_buf = &rx_q->rx_buf.buf[ntc];
- if (!libie_rx_sync_for_cpu(rx_buf, fields.size))
- goto skip_data;
-
- if (skb)
- idpf_rx_add_frag(rx_buf, skb, fields.size);
- else
- skb = idpf_rx_build_skb(rx_buf, fields.size);
-
- /* exit if we failed to retrieve a buffer */
- if (!skb)
- break;
-
-skip_data:
+ libie_xdp_process_buff(&xdp, rx_buf, fields.size);
rx_buf->page = NULL;

IDPF_SINGLEQ_BUMP_RING_IDX(rx_q, ntc);
cleaned_count++;

/* skip if it is non EOP desc */
- if (idpf_rx_singleq_is_non_eop(rx_q, rx_desc, skb, ntc))
+ if (!xdp.data || idpf_rx_singleq_is_non_eop(rx_desc))
continue;

#define IDPF_RXD_ERR_S FIELD_PREP(VIRTCHNL2_RX_BASE_DESC_QW1_ERROR_M, \
VIRTCHNL2_RX_BASE_DESC_ERROR_RXE_M)
if (unlikely(idpf_rx_singleq_test_staterr(rx_desc,
IDPF_RXD_ERR_S))) {
- dev_kfree_skb_any(skb);
- skb = NULL;
- continue;
- }
+drop_cont:
+ xdp_return_buff(&xdp);
+ xdp.data = NULL;

- /* pad skb if needed (to make valid ethernet frame) */
- if (eth_skb_pad(skb)) {
- skb = NULL;
continue;
}

- /* probably a little skewed due to removing CRC */
- total_rx_bytes += skb->len;
+ total_rx_bytes += xdp_get_buff_len(&xdp);
+ total_rx_pkts++;
+
+ skb = xdp_build_skb_from_buff(&xdp);
+ if (unlikely(!skb))
+ goto drop_cont;
+
+ xdp.data = NULL;

/* protocol */
idpf_rx_singleq_process_skb_fields(rx_q, skb,
@@ -1065,15 +1048,10 @@ static int idpf_rx_singleq_clean(struct idpf_queue *rx_q, int budget)

/* send completed skb up the stack */
napi_gro_receive(&rx_q->q_vector->napi, skb);
- skb = NULL;
-
- /* update budget accounting */
- total_rx_pkts++;
}

- rx_q->skb = skb;
-
rx_q->next_to_clean = ntc;
+ libie_xdp_save_buff(&rx_q->xdp, &xdp);

if (cleaned_count)
failure = idpf_rx_singleq_buf_hw_alloc_all(rx_q, cleaned_count);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index e7081b68bc7d..cbbb6bf85b19 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -390,9 +390,9 @@ static void idpf_rx_desc_rel(struct idpf_queue *rxq, bool bufq, s32 q_model)
if (!rxq)
return;

- if (rxq->skb) {
- dev_kfree_skb_any(rxq->skb);
- rxq->skb = NULL;
+ if (rxq->xdp.data) {
+ xdp_return_buff(&rxq->xdp);
+ rxq->xdp.data = NULL;
}

if (bufq || !idpf_is_queue_model_split(q_model))
@@ -2971,8 +2971,6 @@ static int idpf_rx_process_skb_fields(struct idpf_queue *rxq,
/* process RSS/hash */
idpf_rx_hash(rxq, skb, rx_desc, parsed);

- skb->protocol = eth_type_trans(skb, rxq->vport->netdev);
-
if (FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_RSC_M,
le16_to_cpu(rx_desc->hdrlen_flags)))
return idpf_rx_rsc(rxq, skb, rx_desc, parsed);
@@ -2980,59 +2978,9 @@ static int idpf_rx_process_skb_fields(struct idpf_queue *rxq,
idpf_rx_splitq_extract_csum_bits(rx_desc, &csum_bits);
idpf_rx_csum(rxq, skb, csum_bits, parsed);

- skb_record_rx_queue(skb, rxq->idx);
-
return 0;
}

-/**
- * idpf_rx_add_frag - Add contents of Rx buffer to sk_buff as a frag
- * @rx_buf: buffer containing page to add
- * @skb: sk_buff to place the data into
- * @size: packet length from rx_desc
- *
- * This function will add the data contained in rx_buf->page to the skb.
- * It will just attach the page as a frag to the skb.
- * The function will then update the page offset.
- */
-void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
- unsigned int size)
-{
- u32 hr = rx_buf->page->pp->p.offset;
-
- skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
- rx_buf->offset + hr, size, rx_buf->truesize);
-}
-
-/**
- * idpf_rx_build_skb - Allocate skb and populate it from header buffer
- * @buf: Rx buffer to pull data from
- * @size: the length of the packet
- *
- * This function allocates an skb. It then populates it with the page data from
- * the current receive descriptor, taking care to set up the skb correctly.
- */
-struct sk_buff *idpf_rx_build_skb(const struct libie_rx_buffer *buf, u32 size)
-{
- u32 hr = buf->page->pp->p.offset;
- struct sk_buff *skb;
- void *va;
-
- va = page_address(buf->page) + buf->offset;
- net_prefetch(va + hr);
-
- skb = napi_build_skb(va, buf->truesize);
- if (unlikely(!skb))
- return NULL;
-
- skb_mark_for_recycle(skb);
-
- skb_reserve(skb, hr);
- __skb_put(skb, size);
-
- return skb;
-}
-
/**
* idpf_rx_splitq_test_staterr - tests bits in Rx descriptor
* status and error fields
@@ -3096,8 +3044,10 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
{
int total_rx_bytes = 0, total_rx_pkts = 0;
struct idpf_queue *rx_bufq = NULL;
- struct sk_buff *skb = rxq->skb;
u16 ntc = rxq->next_to_clean;
+ struct xdp_buff xdp;
+
+ libie_xdp_init_buff(&xdp, &rxq->xdp, &rxq->xdp_rxq);

/* Process Rx packets bounded by budget */
while (likely(total_rx_pkts < budget)) {
@@ -3109,6 +3059,7 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
unsigned int pkt_len = 0;
unsigned int hdr_len = 0;
u16 gen_id, buf_id = 0;
+ struct sk_buff *skb;
int bufq_id;
/* Header buffer overflow only valid for header split */
bool hbo;
@@ -3179,7 +3130,7 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)

hdr = &rx_bufq->rx_buf.hdr_buf[buf_id];

- if (unlikely(!hdr_len && !skb)) {
+ if (unlikely(!hdr_len && !xdp.data)) {
hdr_len = idpf_rx_hsplit_wa(hdr, rx_buf, pkt_len);
pkt_len -= hdr_len;

@@ -3188,11 +3139,7 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
u64_stats_update_end(&rxq->stats_sync);
}

- if (libie_rx_sync_for_cpu(hdr, hdr_len)) {
- skb = idpf_rx_build_skb(hdr, hdr_len);
- if (!skb)
- break;
-
+ if (libie_xdp_process_buff(&xdp, hdr, hdr_len)) {
u64_stats_update_begin(&rxq->stats_sync);
u64_stats_inc(&rxq->q_stats.rx.hsplit_pkts);
u64_stats_update_end(&rxq->stats_sync);
@@ -3201,55 +3148,42 @@ static int idpf_rx_splitq_clean(struct idpf_queue *rxq, int budget)
hdr->page = NULL;

payload:
- if (!libie_rx_sync_for_cpu(rx_buf, pkt_len))
- goto skip_data;
-
- if (skb)
- idpf_rx_add_frag(rx_buf, skb, pkt_len);
- else
- skb = idpf_rx_build_skb(rx_buf, pkt_len);
-
- /* exit if we failed to retrieve a buffer */
- if (!skb)
- break;
-
-skip_data:
+ libie_xdp_process_buff(&xdp, rx_buf, pkt_len);
rx_buf->page = NULL;

idpf_rx_post_buf_refill(refillq, buf_id);
IDPF_RX_BUMP_NTC(rxq, ntc);

/* skip if it is non EOP desc */
- if (!idpf_rx_splitq_is_eop(rx_desc))
+ if (!xdp.data || !idpf_rx_splitq_is_eop(rx_desc))
continue;

- /* pad skb if needed (to make valid ethernet frame) */
- if (eth_skb_pad(skb)) {
- skb = NULL;
+ total_rx_bytes += xdp_get_buff_len(&xdp);
+ total_rx_pkts++;
+
+ skb = xdp_build_skb_from_buff(&xdp);
+ if (unlikely(!skb)) {
+ xdp_return_buff(&xdp);
+ xdp.data = NULL;
+
continue;
}

- /* probably a little skewed due to removing CRC */
- total_rx_bytes += skb->len;
+ xdp.data = NULL;

/* protocol */
if (unlikely(idpf_rx_process_skb_fields(rxq, skb, rx_desc))) {
dev_kfree_skb_any(skb);
- skb = NULL;
continue;
}

/* send completed skb up the stack */
napi_gro_receive(&rxq->q_vector->napi, skb);
- skb = NULL;
-
- /* update budget accounting */
- total_rx_pkts++;
}

rxq->next_to_clean = ntc;
+ libie_xdp_save_buff(&rxq->xdp, &xdp);

- rxq->skb = skb;
u64_stats_update_begin(&rxq->stats_sync);
u64_stats_add(&rxq->q_stats.rx.packets, total_rx_pkts);
u64_stats_add(&rxq->q_stats.rx.bytes, total_rx_bytes);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index b1c30795f376..318241020347 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -640,7 +640,7 @@ struct idpf_queue {
struct xdp_rxq_info xdp_rxq;

struct bpf_prog __rcu *xdp_prog;
- struct sk_buff *skb;
+ struct xdp_buff xdp;
};

/* Tx */
@@ -918,9 +918,6 @@ int idpf_config_rss(struct idpf_vport *vport);
int idpf_init_rss(struct idpf_vport *vport);
void idpf_deinit_rss(struct idpf_vport *vport);
int idpf_rx_bufs_init_all(struct idpf_vport *vport);
-void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
- unsigned int size);
-struct sk_buff *idpf_rx_build_skb(const struct libie_rx_buffer *buf, u32 size);
bool idpf_init_rx_buf_hw_alloc(struct idpf_queue *rxq, struct idpf_rx_buf *buf);
void idpf_rx_buf_hw_update(struct idpf_queue *rxq, u32 val);
void idpf_tx_buf_hw_update(struct idpf_queue *tx_q, u32 val,
--
2.43.0


2023-12-23 03:07:23

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 26/34] xdp: add generic XSk xdp_buff -> skb conversion

Same as with converting a &xdp_buff to an skb on regular Rx, the code which
allocates a new skb and copies the XSk frame into it is identical across
the drivers, so make it generic.
Note that this time skb_record_rx_queue() is called unconditionally, as
this function is not meant to be called with an unregistered RxQ info.
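
A minimal driver-side usage sketch (the XSk case is detected via
rxq->mem.type, so a ZC Rx path can keep calling the common wrapper):

/* Sketch: pass one XSk Rx buffer up the stack; 'napi' and 'xdp' come
 * from the driver's ZC polling loop.
 */
static void example_xsk_rx_pass(struct napi_struct *napi, struct xdp_buff *xdp)
{
	struct sk_buff *skb;

	/* dispatches to xdp_build_skb_from_zc() for MEM_TYPE_XSK_BUFF_POOL */
	skb = xdp_build_skb_from_buff(xdp);
	if (unlikely(!skb)) {
		/* on failure the XSk buffer is not consumed, recycle it */
		xsk_buff_free(xdp);
		return;
	}

	/* on success the helper has already freed the XSk buffer */
	napi_gro_receive(napi, skb);
}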

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/xdp.h | 11 ++++++++++-
net/core/xdp.c | 41 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 66854b755b58..23ada4bb0e69 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -273,7 +273,16 @@ void xdp_warn(const char *msg, const char *func, const int line);

struct sk_buff *__xdp_build_skb_from_buff(struct sk_buff *skb,
const struct xdp_buff *xdp);
-#define xdp_build_skb_from_buff(xdp) __xdp_build_skb_from_buff(NULL, xdp)
+struct sk_buff *xdp_build_skb_from_zc(struct napi_struct *napi,
+ struct xdp_buff *xdp);
+
+static inline struct sk_buff *xdp_build_skb_from_buff(struct xdp_buff *xdp)
+{
+ if (xdp->rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL)
+ return xdp_build_skb_from_zc(NULL, xdp);
+
+ return __xdp_build_skb_from_buff(NULL, xdp);
+}

struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 8ef1d735a7eb..2bdb1fb8a9b8 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -21,6 +21,8 @@
#include <trace/events/xdp.h>
#include <net/xdp_sock_drv.h>

+#include "dev.h"
+
#define REG_STATE_NEW 0x0
#define REG_STATE_REGISTERED 0x1
#define REG_STATE_UNREGISTERED 0x2
@@ -647,6 +649,45 @@ struct sk_buff *__xdp_build_skb_from_buff(struct sk_buff *skb,
}
EXPORT_SYMBOL_GPL(__xdp_build_skb_from_buff);

+struct sk_buff *xdp_build_skb_from_zc(struct napi_struct *napi,
+ struct xdp_buff *xdp)
+{
+ const struct xdp_rxq_info *rxq = xdp->rxq;
+ u32 totalsize, metasize;
+ struct sk_buff *skb;
+
+ if (!napi) {
+ napi = napi_by_id(rxq->napi_id);
+ if (unlikely(!napi))
+ return NULL;
+ }
+
+ totalsize = xdp->data_end - xdp->data_meta;
+
+ skb = __napi_alloc_skb(napi, totalsize, GFP_ATOMIC | __GFP_NOWARN);
+ if (unlikely(!skb))
+ return NULL;
+
+ net_prefetch(xdp->data_meta);
+
+ memcpy(__skb_put(skb, totalsize), xdp->data_meta,
+ ALIGN(totalsize, sizeof(long)));
+
+ metasize = xdp->data - xdp->data_meta;
+ if (metasize) {
+ skb_metadata_set(skb, metasize);
+ __skb_pull(skb, metasize);
+ }
+
+ skb_record_rx_queue(skb, rxq->queue_index);
+ skb->protocol = eth_type_trans(skb, rxq->dev);
+
+ xsk_buff_free(xdp);
+
+ return skb;
+}
+EXPORT_SYMBOL_GPL(xdp_build_skb_from_zc);
+
struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
struct sk_buff *skb,
struct net_device *dev)
--
2.43.0


2023-12-23 03:07:44

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 27/34] idpf: add support for sw interrupt

From: Michal Kubiak <[email protected]>

Sometimes it is necessary to trigger an interrupt from software, without
a "real" interrupt event coming from the hardware.
Add the corresponding SW interrupt trigger fields to the register table.
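
The new fields are only plumbing here; a consumer (presumably the XSk
wakeup path later in the series) would trigger the interrupt roughly like
the usual Intel-driver pattern. A sketch, assuming the vector's dyn_ctl
register and the INTENA mask are already set up in &idpf_intr_reg:

/* Sketch only: fire a SW-triggered interrupt on a given vector */
static void example_trigger_sw_intr(const struct idpf_q_vector *q_vector)
{
	const struct idpf_intr_reg *reg = &q_vector->intr_reg;
	u32 val;

	val = reg->dyn_ctl_intena_m |		/* keep the interrupt enabled */
	      reg->dyn_ctl_swint_trig_m |	/* request a SW-initiated interrupt */
	      reg->dyn_ctl_sw_itridx_ena_m;	/* use the SW ITR index */

	writel(val, reg->dyn_ctl);
}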

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_dev.c | 3 +++
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 4 ++++
drivers/net/ethernet/intel/idpf/idpf_vf_dev.c | 3 +++
3 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_dev.c
index 2c6776086130..335bf789d908 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_dev.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_dev.c
@@ -100,6 +100,9 @@ static int idpf_intr_reg_init(struct idpf_vport *vport)
intr->dyn_ctl_itridx_s = PF_GLINT_DYN_CTL_ITR_INDX_S;
intr->dyn_ctl_intrvl_s = PF_GLINT_DYN_CTL_INTERVAL_S;
intr->dyn_ctl_wb_on_itr_m = PF_GLINT_DYN_CTL_WB_ON_ITR_M;
+ intr->dyn_ctl_swint_trig_m = PF_GLINT_DYN_CTL_SWINT_TRIG_M;
+ intr->dyn_ctl_sw_itridx_ena_m =
+ PF_GLINT_DYN_CTL_SW_ITR_INDX_ENA_M;

spacing = IDPF_ITR_IDX_SPACING(reg_vals[vec_id].itrn_index_spacing,
IDPF_PF_ITR_IDX_SPACING);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index 20f484712ac2..fa21feddd204 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -347,6 +347,8 @@ struct idpf_vec_regs {
* @dyn_ctl_itridx_m: Mask for ITR index
* @dyn_ctl_intrvl_s: Register bit offset for ITR interval
* @dyn_ctl_wb_on_itr_m: Mask for WB on ITR feature
+ * @dyn_ctl_swint_trig_m: Mask for SW ITR trigger register
+ * @dyn_ctl_sw_itridx_ena_m: Mask for SW ITR enable index
* @rx_itr: RX ITR register
* @tx_itr: TX ITR register
* @icr_ena: Interrupt cause register offset
@@ -360,6 +362,8 @@ struct idpf_intr_reg {
u32 dyn_ctl_itridx_m;
u32 dyn_ctl_intrvl_s;
u32 dyn_ctl_wb_on_itr_m;
+ u32 dyn_ctl_swint_trig_m;
+ u32 dyn_ctl_sw_itridx_ena_m;
void __iomem *rx_itr;
void __iomem *tx_itr;
void __iomem *icr_ena;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
index f5b0a0666636..a78ae0e618ca 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_vf_dev.c
@@ -99,6 +99,9 @@ static int idpf_vf_intr_reg_init(struct idpf_vport *vport)
intr->dyn_ctl_intena_msk_m = VF_INT_DYN_CTLN_INTENA_MSK_M;
intr->dyn_ctl_itridx_s = VF_INT_DYN_CTLN_ITR_INDX_S;
intr->dyn_ctl_wb_on_itr_m = VF_INT_DYN_CTLN_WB_ON_ITR_M;
+ intr->dyn_ctl_itridx_m = VF_INT_DYN_CTLN_ITR_INDX_M;
+ intr->dyn_ctl_sw_itridx_ena_m =
+ VF_INT_DYN_CTLN_SW_ITR_INDX_ENA_M;

spacing = IDPF_ITR_IDX_SPACING(reg_vals[vec_id].itrn_index_spacing,
IDPF_VF_ITR_IDX_SPACING);
--
2.43.0


2023-12-23 03:08:04

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 28/34] idpf: add relative queue id member to idpf_queue

From: Michal Kubiak <[email protected]>

Relative queue id is one of the required fields of the Tx queue
description in VC 2.0 for splitq mode.
In the current VC implementation all Tx queues are configured
together, so the relative queue id (the index of the Tx queue
in the queue group) can be computed on the fly.

However, such a solution is not flexible because it is not easy to
configure a single Tx queue. So, instead, introduce a new structure
member in 'idpf_queue' dedicated to storing the relative queue id.
Then send that value over the VC.

This patch is the first step in making the existing VC API more flexible
to allow configuration of single queues.

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 1 +
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 2 ++
drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 3 ++-
3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 99c9b889507b..3dc21731df2f 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -1276,6 +1276,7 @@ static int idpf_txq_group_alloc(struct idpf_vport *vport, u16 num_txq)
q->tx_min_pkt_len = idpf_get_min_tx_pkt_len(adapter);
q->vport = vport;
q->txq_grp = tx_qgrp;
+ q->relative_q_id = j;

if (!flow_sch_en)
continue;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index fa21feddd204..f32d854fe850 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -665,6 +665,8 @@ struct idpf_queue {

dma_addr_t dma;
unsigned int size;
+
+ u32 relative_q_id;
} ____cacheline_internodealigned_in_smp;

/**
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 59b8bbebead7..49a96af52343 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -1491,7 +1491,8 @@ static int idpf_send_config_tx_queues_msg(struct idpf_vport *vport)

qi[k].tx_compl_queue_id =
cpu_to_le16(tx_qgrp->complq->q_id);
- qi[k].relative_queue_id = cpu_to_le16(j);
+ qi[k].relative_queue_id =
+ cpu_to_le16(q->relative_q_id);

if (test_bit(__IDPF_Q_FLOW_SCH_EN, q->flags))
qi[k].sched_mode =
--
2.43.0


2023-12-23 03:08:26

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 29/34] idpf: add vc functions to manage selected queues

From: Michal Kubiak <[email protected]>

Implement VC functions dedicated to enabling, disabling and configuring
arbitrarily selected queues.

Also, refactor the existing implementation to make the code more
modular. Introduce new generic functions for sending VC messages
consisting of chunks, in order to isolate the chunked-sending algorithm
from the preparation of each specific VC message.

Finally, rewrite the function for mapping queues to q_vectors using the
new modular approach to avoid copying the code that implements the VC
message sending algorithm.
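
With these helpers in place, a subset of queues can be cycled without
touching the rest of the vport; a hypothetical caller (error handling
trimmed, names made up for illustration) could look like:

/* Sketch: reconfigure one splitq Rx queue plus its first buffer queue. */
static int example_restart_rx_pair(struct idpf_vport *vport,
				   struct idpf_rxq_group *rx_qgrp, int idx)
{
	struct idpf_queue *qs[] = {
		&rx_qgrp->splitq.rxq_sets[idx]->rxq,
		&rx_qgrp->splitq.bufq_sets[0].bufq,
	};
	int err;

	err = idpf_send_disable_selected_queues_msg(vport, qs, ARRAY_SIZE(qs));
	if (err)
		return err;

	err = idpf_send_config_selected_queues_msg(vport, qs, ARRAY_SIZE(qs));
	if (err)
		return err;

	return idpf_send_enable_selected_queues_msg(vport, qs, ARRAY_SIZE(qs));
}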

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf.h | 9 +
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 1085 +++++++++++------
2 files changed, 693 insertions(+), 401 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index 91f61060f500..a12c56f9f2ef 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -982,6 +982,15 @@ int idpf_vport_queue_ids_init(struct idpf_vport *vport);
int idpf_queue_reg_init(struct idpf_vport *vport);
int idpf_send_config_queues_msg(struct idpf_vport *vport);
int idpf_send_enable_queues_msg(struct idpf_vport *vport);
+int idpf_send_enable_selected_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q);
+int idpf_send_disable_selected_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q);
+int idpf_send_config_selected_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q);
int idpf_send_create_vport_msg(struct idpf_adapter *adapter,
struct idpf_vport_max_q *max_q);
int idpf_check_supported_desc_ids(struct idpf_vport *vport);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index 49a96af52343..24df268ad49e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -748,26 +748,31 @@ static int idpf_wait_for_event(struct idpf_adapter *adapter,
}

/**
- * idpf_wait_for_marker_event - wait for software marker response
+ * idpf_wait_for_selected_marker_events - wait for software marker response
+ * for selected tx queues
* @vport: virtual port data structure
+ * @qs: array of tx queues on which the function should wait for marker events
+ * @num_qs: number of queues contained in the 'qs' array
*
* Returns 0 success, negative on failure.
- **/
-static int idpf_wait_for_marker_event(struct idpf_vport *vport)
+ */
+static int idpf_wait_for_selected_marker_events(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_qs)
{
int event;
int i;

- for (i = 0; i < vport->num_txq; i++)
- set_bit(__IDPF_Q_SW_MARKER, vport->txqs[i]->flags);
+ for (i = 0; i < num_qs; i++)
+ set_bit(__IDPF_Q_SW_MARKER, qs[i]->flags);

event = wait_event_timeout(vport->sw_marker_wq,
test_and_clear_bit(IDPF_VPORT_SW_MARKER,
vport->flags),
msecs_to_jiffies(500));

- for (i = 0; i < vport->num_txq; i++)
- clear_bit(__IDPF_Q_POLL_MODE, vport->txqs[i]->flags);
+ for (i = 0; i < num_qs; i++)
+ clear_bit(__IDPF_Q_POLL_MODE, qs[i]->flags);

if (event)
return 0;
@@ -777,6 +782,19 @@ static int idpf_wait_for_marker_event(struct idpf_vport *vport)
return -ETIMEDOUT;
}

+/**
+ * idpf_wait_for_marker_event - wait for software marker response
+ * @vport: virtual port data structure
+ *
+ * Returns 0 on success, negative on failure.
+ **/
+static int idpf_wait_for_marker_event(struct idpf_vport *vport)
+{
+ return idpf_wait_for_selected_marker_events(vport,
+ vport->txqs,
+ vport->num_txq);
+}
+
/**
* idpf_send_ver_msg - send virtchnl version message
* @adapter: Driver specific private structure
@@ -1450,6 +1468,195 @@ int idpf_send_disable_vport_msg(struct idpf_vport *vport)
return err;
}

+struct idpf_chunked_msg_params {
+ u32 op;
+ enum idpf_vport_vc_state state;
+ enum idpf_vport_vc_state err_check;
+ int timeout;
+ int config_sz;
+ int chunk_sz;
+ int (*prepare_msg)(struct idpf_vport *, u8 *, u8 *, int);
+ u32 num_chunks;
+ u8 *chunks;
+};
+
+/**
+ * idpf_send_chunked_msg - Send a VC message consisting of chunks.
+ * @vport: virtual port data structure
+ * @params: parameters of this particular transaction: the virtchnl op, the
+ *          VC states to wait for and check, the timeout, the sizes of the
+ *          fixed config part and of a single chunk, the helper preparing a
+ *          single message, and the buffer containing all chunks to be sent
+ *
+ * Helper function for splitting a VC request into several mailbox messages
+ * when all the chunks do not fit into a single one, sending them and waiting
+ * for each to complete.
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_send_chunked_msg(struct idpf_vport *vport,
+ struct idpf_chunked_msg_params *params)
+{
+ u32 num_msgs, max_chunks, left_chunks, max_buf_sz;
+ u32 num_chunks = params->num_chunks;
+ int err = 0, i;
+ u8 *msg_buf;
+
+ max_chunks = min_t(u32, IDPF_NUM_CHUNKS_PER_MSG(params->config_sz,
+ params->chunk_sz),
+ params->num_chunks);
+ max_buf_sz = params->config_sz + max_chunks * params->chunk_sz;
+ num_msgs = DIV_ROUND_UP(params->num_chunks, max_chunks);
+
+ msg_buf = kzalloc(max_buf_sz, GFP_KERNEL);
+ if (!msg_buf) {
+ err = -ENOMEM;
+ goto error;
+ }
+
+ mutex_lock(&vport->vc_buf_lock);
+
+ left_chunks = num_chunks;
+ for (i = 0; i < num_msgs; i++) {
+ u8 *chunks = params->chunks;
+ int buf_sz, msg_size;
+ u8 *first_chunk;
+
+ num_chunks = min(max_chunks, left_chunks);
+ first_chunk = chunks + (i * params->chunk_sz * max_chunks);
+ msg_size = params->chunk_sz * num_chunks + params->config_sz;
+
+ buf_sz = params->prepare_msg(vport, msg_buf, first_chunk,
+ num_chunks);
+ if (buf_sz != msg_size) {
+ err = -EINVAL;
+ goto msg_error;
+ }
+
+ err = idpf_send_mb_msg(vport->adapter, params->op,
+ buf_sz, msg_buf);
+ if (err)
+ goto msg_error;
+
+ err = __idpf_wait_for_event(vport->adapter, vport,
+ params->state, params->err_check,
+ params->timeout);
+ if (err)
+ goto msg_error;
+
+ left_chunks -= num_chunks;
+ }
+
+ if (left_chunks != 0)
+ err = -EINVAL;
+msg_error:
+ mutex_unlock(&vport->vc_buf_lock);
+ kfree(msg_buf);
+error:
+ return err;
+}
+
+/**
+ * idpf_fill_txcomplq_config_chunk - Fill the chunk describing the tx queue.
+ * @vport: virtual port data structure
+ * @q: tx queue to be inserted into VC chunk
+ * @qi: pointer to the buffer containing the VC chunk
+ */
+static void idpf_fill_txcomplq_config_chunk(struct idpf_vport *vport,
+ struct idpf_queue *q,
+ struct virtchnl2_txq_info *qi)
+{
+ qi->queue_id = cpu_to_le32(q->q_id);
+ qi->model = cpu_to_le16(vport->txq_model);
+ qi->type = cpu_to_le32(q->q_type);
+ qi->ring_len = cpu_to_le16(q->desc_count);
+ qi->dma_ring_addr = cpu_to_le64(q->dma);
+
+ if (!idpf_is_queue_model_split(vport->txq_model))
+ return;
+
+ if (test_bit(__IDPF_Q_FLOW_SCH_EN, q->flags))
+ qi->sched_mode = cpu_to_le16(VIRTCHNL2_TXQ_SCHED_MODE_FLOW);
+ else
+ qi->sched_mode = cpu_to_le16(VIRTCHNL2_TXQ_SCHED_MODE_QUEUE);
+
+ if (q->q_type != VIRTCHNL2_QUEUE_TYPE_TX)
+ return;
+
+ qi->tx_compl_queue_id = cpu_to_le16(q->txq_grp->complq->q_id);
+ qi->relative_queue_id = cpu_to_le16(q->relative_q_id);
+}
+
+/**
+ * idpf_prepare_cfg_txqs_msg - Prepare message to configure selected tx queues.
+ * @vport: virtual port data structure
+ * @msg_buf: buffer containing the message
+ * @first_chunk: pointer to the first chunk describing the tx queue
+ * @num_chunks: number of chunks in the message
+ *
+ * Helper function for preparing the message describing configuration of
+ * tx queues.
+ * Returns the total size of the prepared message.
+ */
+static int idpf_prepare_cfg_txqs_msg(struct idpf_vport *vport,
+ u8 *msg_buf, u8 *first_chunk,
+ int num_chunks)
+{
+ int chunk_size = sizeof(struct virtchnl2_txq_info);
+ struct virtchnl2_config_tx_queues *ctq;
+
+ ctq = (struct virtchnl2_config_tx_queues *)msg_buf;
+ ctq->vport_id = cpu_to_le32(vport->vport_id);
+ ctq->num_qinfo = cpu_to_le16(num_chunks);
+
+ memcpy(ctq->qinfo, first_chunk, num_chunks * chunk_size);
+
+ return (chunk_size * num_chunks + sizeof(*ctq));
+}
+
+/**
+ * idpf_send_config_selected_tx_queues_msg - Send virtchnl config tx queues
+ * message for selected tx queues only.
+ * @vport: virtual port data structure
+ * @qs: array of tx queues to be configured
+ * @num_q: number of tx queues contained in 'qs' array
+ *
+ * Send config queues virtchnl message for queues contained in 'qs' array.
+ * The 'qs' array can contain tx queues (or completion queues) only.
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_send_config_selected_tx_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q)
+{
+ struct idpf_chunked_msg_params params = { };
+ struct virtchnl2_txq_info *qi;
+ int err, i;
+
+ qi = (struct virtchnl2_txq_info *)
+ kcalloc(num_q, sizeof(struct virtchnl2_txq_info), GFP_KERNEL);
+ if (!qi)
+ return -ENOMEM;
+
+ params.op = VIRTCHNL2_OP_CONFIG_TX_QUEUES;
+ params.state = IDPF_VC_CONFIG_TXQ;
+ params.err_check = IDPF_VC_CONFIG_TXQ_ERR;
+ params.timeout = IDPF_WAIT_FOR_EVENT_TIMEO;
+ params.config_sz = sizeof(struct virtchnl2_config_tx_queues);
+ params.chunk_sz = sizeof(struct virtchnl2_txq_info);
+ params.prepare_msg = &idpf_prepare_cfg_txqs_msg;
+ params.num_chunks = num_q;
+ params.chunks = (u8 *)qi;
+
+ for (i = 0; i < num_q; i++)
+ idpf_fill_txcomplq_config_chunk(vport, qs[i], &qi[i]);
+
+ err = idpf_send_chunked_msg(vport, &params);
+
+ kfree(qi);
+ return err;
+}
+
/**
* idpf_send_config_tx_queues_msg - Send virtchnl config tx queues message
* @vport: virtual port data structure
@@ -1459,69 +1666,24 @@ int idpf_send_disable_vport_msg(struct idpf_vport *vport)
*/
static int idpf_send_config_tx_queues_msg(struct idpf_vport *vport)
{
- struct virtchnl2_config_tx_queues *ctq;
- u32 config_sz, chunk_sz, buf_sz;
- int totqs, num_msgs, num_chunks;
- struct virtchnl2_txq_info *qi;
- int err = 0, i, k = 0;
+ int totqs, err = 0, i, k = 0;
+ struct idpf_queue **qs;

totqs = vport->num_txq + vport->num_complq;
- qi = kcalloc(totqs, sizeof(struct virtchnl2_txq_info), GFP_KERNEL);
- if (!qi)
+ qs = (struct idpf_queue **)kzalloc(totqs * sizeof(*qs), GFP_KERNEL);
+ if (!qs)
return -ENOMEM;

/* Populate the queue info buffer with all queue context info */
for (i = 0; i < vport->num_txq_grp; i++) {
struct idpf_txq_group *tx_qgrp = &vport->txq_grps[i];
- int j, sched_mode;
-
- for (j = 0; j < tx_qgrp->num_txq; j++, k++) {
- qi[k].queue_id =
- cpu_to_le32(tx_qgrp->txqs[j]->q_id);
- qi[k].model =
- cpu_to_le16(vport->txq_model);
- qi[k].type =
- cpu_to_le32(tx_qgrp->txqs[j]->q_type);
- qi[k].ring_len =
- cpu_to_le16(tx_qgrp->txqs[j]->desc_count);
- qi[k].dma_ring_addr =
- cpu_to_le64(tx_qgrp->txqs[j]->dma);
- if (idpf_is_queue_model_split(vport->txq_model)) {
- struct idpf_queue *q = tx_qgrp->txqs[j];
-
- qi[k].tx_compl_queue_id =
- cpu_to_le16(tx_qgrp->complq->q_id);
- qi[k].relative_queue_id =
- cpu_to_le16(q->relative_q_id);
-
- if (test_bit(__IDPF_Q_FLOW_SCH_EN, q->flags))
- qi[k].sched_mode =
- cpu_to_le16(VIRTCHNL2_TXQ_SCHED_MODE_FLOW);
- else
- qi[k].sched_mode =
- cpu_to_le16(VIRTCHNL2_TXQ_SCHED_MODE_QUEUE);
- } else {
- qi[k].sched_mode =
- cpu_to_le16(VIRTCHNL2_TXQ_SCHED_MODE_QUEUE);
- }
- }
-
- if (!idpf_is_queue_model_split(vport->txq_model))
- continue;
-
- qi[k].queue_id = cpu_to_le32(tx_qgrp->complq->q_id);
- qi[k].model = cpu_to_le16(vport->txq_model);
- qi[k].type = cpu_to_le32(tx_qgrp->complq->q_type);
- qi[k].ring_len = cpu_to_le16(tx_qgrp->complq->desc_count);
- qi[k].dma_ring_addr = cpu_to_le64(tx_qgrp->complq->dma);
+ int j;

- if (test_bit(__IDPF_Q_FLOW_SCH_EN, tx_qgrp->complq->flags))
- sched_mode = VIRTCHNL2_TXQ_SCHED_MODE_FLOW;
- else
- sched_mode = VIRTCHNL2_TXQ_SCHED_MODE_QUEUE;
- qi[k].sched_mode = cpu_to_le16(sched_mode);
+ for (j = 0; j < tx_qgrp->num_txq; j++, k++)
+ qs[k] = tx_qgrp->txqs[j];

- k++;
+ if (idpf_is_queue_model_split(vport->txq_model))
+ qs[k++] = tx_qgrp->complq;
}

/* Make sure accounting agrees */
@@ -1530,56 +1692,142 @@ static int idpf_send_config_tx_queues_msg(struct idpf_vport *vport)
goto error;
}

- /* Chunk up the queue contexts into multiple messages to avoid
- * sending a control queue message buffer that is too large
+ err = idpf_send_config_selected_tx_queues_msg(vport, qs, totqs);
+error:
+ kfree(qs);
+ return err;
+}
+
+/**
+ * idpf_fill_rxbufq_config_chunk - Fill the chunk describing the rx or buf queue.
+ * @vport: virtual port data structure
+ * @q: rx or buffer queue to be inserted into the VC chunk
+ * @qi: pointer to the buffer containing the VC chunk
+ */
+static void idpf_fill_rxbufq_config_chunk(struct idpf_vport *vport,
+ struct idpf_queue *q,
+ struct virtchnl2_rxq_info *qi)
+{
+ const struct idpf_bufq_set *sets;
+
+ qi->queue_id = cpu_to_le32(q->q_id);
+ qi->model = cpu_to_le16(vport->rxq_model);
+ qi->type = cpu_to_le32(q->q_type);
+ qi->ring_len = cpu_to_le16(q->desc_count);
+ qi->dma_ring_addr = cpu_to_le64(q->dma);
+ qi->data_buffer_size = cpu_to_le32(q->rx_buf_size);
+ qi->rx_buffer_low_watermark =
+ cpu_to_le16(q->rx_buffer_low_watermark);
+ if (idpf_is_feature_ena(vport, NETIF_F_GRO_HW))
+ qi->qflags |= cpu_to_le16(VIRTCHNL2_RXQ_RSC);
+
+ if (q->q_type == VIRTCHNL2_QUEUE_TYPE_RX_BUFFER) {
+ qi->desc_ids = cpu_to_le64(VIRTCHNL2_RXDID_2_FLEX_SPLITQ_M);
+ qi->buffer_notif_stride = q->rx_buf_stride;
+ }
+
+ if (q->q_type != VIRTCHNL2_QUEUE_TYPE_RX)
+ return;
+
+ qi->max_pkt_size = cpu_to_le32(q->rx_max_pkt_size);
+ qi->qflags |= cpu_to_le16(VIRTCHNL2_RX_DESC_SIZE_32BYTE);
+ qi->desc_ids = cpu_to_le64(q->rxdids);
+
+ if (!idpf_is_queue_model_split(vport->rxq_model))
+ return;
+
+ sets = q->rxq_grp->splitq.bufq_sets;
+
+ /* In splitq mode, RXQ buffer size should be set to that of the first
+ * buffer queue associated with this RXQ.
*/
- config_sz = sizeof(struct virtchnl2_config_tx_queues);
- chunk_sz = sizeof(struct virtchnl2_txq_info);
+ q->rx_buf_size = sets[0].bufq.rx_buf_size;
+ qi->data_buffer_size = cpu_to_le32(q->rx_buf_size);
+
+ qi->rx_bufq1_id = cpu_to_le16(sets[0].bufq.q_id);
+ if (vport->num_bufqs_per_qgrp > IDPF_SINGLE_BUFQ_PER_RXQ_GRP) {
+ qi->bufq2_ena = IDPF_BUFQ2_ENA;
+ qi->rx_bufq2_id = cpu_to_le16(sets[1].bufq.q_id);
+ }

- num_chunks = min_t(u32, IDPF_NUM_CHUNKS_PER_MSG(config_sz, chunk_sz),
- totqs);
- num_msgs = DIV_ROUND_UP(totqs, num_chunks);
+ q->rx_hbuf_size = sets[0].bufq.rx_hbuf_size;

- buf_sz = struct_size(ctq, qinfo, num_chunks);
- ctq = kzalloc(buf_sz, GFP_KERNEL);
- if (!ctq) {
- err = -ENOMEM;
- goto error;
+ if (q->rx_hsplit_en) {
+ qi->qflags |= cpu_to_le16(VIRTCHNL2_RXQ_HDR_SPLIT);
+ qi->hdr_buffer_size = cpu_to_le16(q->rx_hbuf_size);
}
+}

- mutex_lock(&vport->vc_buf_lock);
+/**
+ * idpf_prepare_cfg_rxqs_msg - Prepare message to configure selected rx queues.
+ * @vport: virtual port data structure
+ * @msg_buf: buffer containing the message
+ * @first_chunk: pointer to the first chunk describing the rx queue
+ * @num_chunks: number of chunks in the message
+ *
+ * Helper function for preparing the message describing configuration of
+ * rx queues.
+ * Returns the total size of the prepared message.
+ */
+static int idpf_prepare_cfg_rxqs_msg(struct idpf_vport *vport,
+ u8 *msg_buf, u8 *first_chunk,
+ int num_chunks)
+{
+ int chunk_size = sizeof(struct virtchnl2_rxq_info);
+ struct virtchnl2_config_rx_queues *crq;

- for (i = 0, k = 0; i < num_msgs; i++) {
- memset(ctq, 0, buf_sz);
- ctq->vport_id = cpu_to_le32(vport->vport_id);
- ctq->num_qinfo = cpu_to_le16(num_chunks);
- memcpy(ctq->qinfo, &qi[k], chunk_sz * num_chunks);
-
- err = idpf_send_mb_msg(vport->adapter,
- VIRTCHNL2_OP_CONFIG_TX_QUEUES,
- buf_sz, (u8 *)ctq);
- if (err)
- goto mbx_error;
+ crq = (struct virtchnl2_config_rx_queues *)msg_buf;

- err = idpf_wait_for_event(vport->adapter, vport,
- IDPF_VC_CONFIG_TXQ,
- IDPF_VC_CONFIG_TXQ_ERR);
- if (err)
- goto mbx_error;
+ crq->vport_id = cpu_to_le32(vport->vport_id);
+ crq->num_qinfo = cpu_to_le16(num_chunks);
+
+ memcpy(crq->qinfo, first_chunk, num_chunks * chunk_size);
+
+ return (chunk_size * num_chunks + sizeof(*crq));
+}

- k += num_chunks;
- totqs -= num_chunks;
- num_chunks = min(num_chunks, totqs);
- /* Recalculate buffer size */
- buf_sz = struct_size(ctq, qinfo, num_chunks);
+/**
+ * idpf_send_config_selected_rx_queues_msg - Send virtchnl config rx queues
+ * message for selected rx queues only.
+ * @vport: virtual port data structure
+ * @qs: array of rx queues to be configured
+ * @num_q: number of rx queues contained in 'qs' array
+ *
+ * Send config queues virtchnl message for queues contained in 'qs' array.
+ * The 'qs' array can contain rx queues (or buffer queues) only.
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_send_config_selected_rx_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q)
+{
+ struct idpf_chunked_msg_params params = { };
+ struct virtchnl2_rxq_info *qi;
+ int err, i;
+
+ qi = (struct virtchnl2_rxq_info *)
+ kcalloc(num_q, sizeof(struct virtchnl2_rxq_info), GFP_KERNEL);
+ if (!qi) {
+ err = -ENOMEM;
+ goto alloc_error;
}

-mbx_error:
- mutex_unlock(&vport->vc_buf_lock);
- kfree(ctq);
-error:
- kfree(qi);
+ params.op = VIRTCHNL2_OP_CONFIG_RX_QUEUES;
+ params.state = IDPF_VC_CONFIG_RXQ;
+ params.err_check = IDPF_VC_CONFIG_RXQ_ERR;
+ params.timeout = IDPF_WAIT_FOR_EVENT_TIMEO;
+ params.config_sz = sizeof(struct virtchnl2_config_rx_queues);
+ params.chunk_sz = sizeof(struct virtchnl2_rxq_info);
+ params.prepare_msg = &idpf_prepare_cfg_rxqs_msg;
+ params.num_chunks = num_q;
+ params.chunks = (u8 *)qi;

+ for (i = 0; i < num_q; i++)
+ idpf_fill_rxbufq_config_chunk(vport, qs[i], &qi[i]);
+
+ err = idpf_send_chunked_msg(vport, &params);
+ kfree(qi);
+alloc_error:
return err;
}

@@ -1592,43 +1840,25 @@ static int idpf_send_config_tx_queues_msg(struct idpf_vport *vport)
*/
static int idpf_send_config_rx_queues_msg(struct idpf_vport *vport)
{
- struct virtchnl2_config_rx_queues *crq;
- u32 config_sz, chunk_sz, buf_sz;
- int totqs, num_msgs, num_chunks;
- struct virtchnl2_rxq_info *qi;
- int err = 0, i, k = 0;
+ int totqs, err = 0, i, k = 0;
+ struct idpf_queue **qs;

totqs = vport->num_rxq + vport->num_bufq;
- qi = kcalloc(totqs, sizeof(struct virtchnl2_rxq_info), GFP_KERNEL);
- if (!qi)
+ qs = (struct idpf_queue **)kzalloc(totqs * sizeof(*qs), GFP_KERNEL);
+ if (!qs)
return -ENOMEM;

/* Populate the queue info buffer with all queue context info */
for (i = 0; i < vport->num_rxq_grp; i++) {
struct idpf_rxq_group *rx_qgrp = &vport->rxq_grps[i];
- u16 num_rxq;
+ int num_rxq;
int j;

if (!idpf_is_queue_model_split(vport->rxq_model))
goto setup_rxqs;

- for (j = 0; j < vport->num_bufqs_per_qgrp; j++, k++) {
- struct idpf_queue *bufq =
- &rx_qgrp->splitq.bufq_sets[j].bufq;
-
- qi[k].queue_id = cpu_to_le32(bufq->q_id);
- qi[k].model = cpu_to_le16(vport->rxq_model);
- qi[k].type = cpu_to_le32(bufq->q_type);
- qi[k].desc_ids = cpu_to_le64(VIRTCHNL2_RXDID_2_FLEX_SPLITQ_M);
- qi[k].ring_len = cpu_to_le16(bufq->desc_count);
- qi[k].dma_ring_addr = cpu_to_le64(bufq->dma);
- qi[k].data_buffer_size = cpu_to_le32(bufq->rx_buf_size);
- qi[k].buffer_notif_stride = bufq->rx_buf_stride;
- qi[k].rx_buffer_low_watermark =
- cpu_to_le16(bufq->rx_buffer_low_watermark);
- if (idpf_is_feature_ena(vport, NETIF_F_GRO_HW))
- qi[k].qflags |= cpu_to_le16(VIRTCHNL2_RXQ_RSC);
- }
+ for (j = 0; j < vport->num_bufqs_per_qgrp; j++, k++)
+ qs[k] = &rx_qgrp->splitq.bufq_sets[j].bufq;

setup_rxqs:
if (idpf_is_queue_model_split(vport->rxq_model))
@@ -1636,56 +1866,11 @@ static int idpf_send_config_rx_queues_msg(struct idpf_vport *vport)
else
num_rxq = rx_qgrp->singleq.num_rxq;

- for (j = 0; j < num_rxq; j++, k++) {
- const struct idpf_bufq_set *sets;
- struct idpf_queue *rxq;
-
- if (!idpf_is_queue_model_split(vport->rxq_model)) {
- rxq = rx_qgrp->singleq.rxqs[j];
- goto common_qi_fields;
- }
-
- rxq = &rx_qgrp->splitq.rxq_sets[j]->rxq;
- sets = rxq->rxq_grp->splitq.bufq_sets;
-
- /* In splitq mode, RXQ buffer size should be
- * set to that of the first buffer queue
- * associated with this RXQ.
- */
- rxq->rx_buf_size = sets[0].bufq.rx_buf_size;
-
- qi[k].rx_bufq1_id = cpu_to_le16(sets[0].bufq.q_id);
- if (vport->num_bufqs_per_qgrp > IDPF_SINGLE_BUFQ_PER_RXQ_GRP) {
- qi[k].bufq2_ena = IDPF_BUFQ2_ENA;
- qi[k].rx_bufq2_id =
- cpu_to_le16(sets[1].bufq.q_id);
- }
- qi[k].rx_buffer_low_watermark =
- cpu_to_le16(rxq->rx_buffer_low_watermark);
- if (idpf_is_feature_ena(vport, NETIF_F_GRO_HW))
- qi[k].qflags |= cpu_to_le16(VIRTCHNL2_RXQ_RSC);
-
- rxq->rx_hbuf_size = sets[0].bufq.rx_hbuf_size;
-
- if (rxq->rx_hsplit_en) {
- qi[k].qflags |=
- cpu_to_le16(VIRTCHNL2_RXQ_HDR_SPLIT);
- qi[k].hdr_buffer_size =
- cpu_to_le16(rxq->rx_hbuf_size);
- }
-
-common_qi_fields:
- qi[k].queue_id = cpu_to_le32(rxq->q_id);
- qi[k].model = cpu_to_le16(vport->rxq_model);
- qi[k].type = cpu_to_le32(rxq->q_type);
- qi[k].ring_len = cpu_to_le16(rxq->desc_count);
- qi[k].dma_ring_addr = cpu_to_le64(rxq->dma);
- qi[k].max_pkt_size = cpu_to_le32(rxq->rx_max_pkt_size);
- qi[k].data_buffer_size = cpu_to_le32(rxq->rx_buf_size);
- qi[k].qflags |=
- cpu_to_le16(VIRTCHNL2_RX_DESC_SIZE_32BYTE);
- qi[k].desc_ids = cpu_to_le64(rxq->rxdids);
- }
+ for (j = 0; j < num_rxq; j++, k++)
+ if (!idpf_is_queue_model_split(vport->rxq_model))
+ qs[k] = rx_qgrp->singleq.rxqs[j];
+ else
+ qs[k] = &rx_qgrp->splitq.rxq_sets[j]->rxq;
}

/* Make sure accounting agrees */
@@ -1694,56 +1879,94 @@ static int idpf_send_config_rx_queues_msg(struct idpf_vport *vport)
goto error;
}

- /* Chunk up the queue contexts into multiple messages to avoid
- * sending a control queue message buffer that is too large
- */
- config_sz = sizeof(struct virtchnl2_config_rx_queues);
- chunk_sz = sizeof(struct virtchnl2_rxq_info);
+ err = idpf_send_config_selected_rx_queues_msg(vport, qs, totqs);

- num_chunks = min_t(u32, IDPF_NUM_CHUNKS_PER_MSG(config_sz, chunk_sz),
- totqs);
- num_msgs = DIV_ROUND_UP(totqs, num_chunks);
+error:
+ kfree(qs);
+ return err;
+}

- buf_sz = struct_size(crq, qinfo, num_chunks);
- crq = kzalloc(buf_sz, GFP_KERNEL);
- if (!crq) {
- err = -ENOMEM;
- goto error;
- }
+/**
+ * idpf_prepare_ena_dis_qs_msg - Prepare message to enable/disable selected
+ * queues.
+ * @vport: virtual port data structure
+ * @msg_buf: buffer containing the message
+ * @first_chunk: pointer to the first chunk describing the queue
+ * @num_chunks: number of chunks in the message
+ *
+ * Helper function for preparing the message describing queues to be enabled
+ * or disabled.
+ * Returns the total size of the prepared message.
+ */
+static int idpf_prepare_ena_dis_qs_msg(struct idpf_vport *vport,
+ u8 *msg_buf, u8 *first_chunk,
+ int num_chunks)
+{
+ int chunk_size = sizeof(struct virtchnl2_queue_chunk);
+ struct virtchnl2_del_ena_dis_queues *eq;

- mutex_lock(&vport->vc_buf_lock);
+ eq = (struct virtchnl2_del_ena_dis_queues *)msg_buf;

- for (i = 0, k = 0; i < num_msgs; i++) {
- memset(crq, 0, buf_sz);
- crq->vport_id = cpu_to_le32(vport->vport_id);
- crq->num_qinfo = cpu_to_le16(num_chunks);
- memcpy(crq->qinfo, &qi[k], chunk_sz * num_chunks);
-
- err = idpf_send_mb_msg(vport->adapter,
- VIRTCHNL2_OP_CONFIG_RX_QUEUES,
- buf_sz, (u8 *)crq);
- if (err)
- goto mbx_error;
+ eq->vport_id = cpu_to_le32(vport->vport_id);
+ eq->chunks.num_chunks = cpu_to_le16(num_chunks);

- err = idpf_wait_for_event(vport->adapter, vport,
- IDPF_VC_CONFIG_RXQ,
- IDPF_VC_CONFIG_RXQ_ERR);
- if (err)
- goto mbx_error;
+ memcpy(eq->chunks.chunks, first_chunk, num_chunks * chunk_size);
+
+ return (chunk_size * num_chunks + sizeof(*eq));
+}
+
+/**
+ * idpf_send_ena_dis_selected_qs_msg - Send virtchnl enable or disable
+ * queues message for selected queues only
+ * @vport: virtual port data structure
+ * @qs: array containing pointers to queues to be enabled/disabled
+ * @num_q: number of queues contained in 'qs' array.
+ * @vc_op: VIRTCHNL2_OP_ENABLE_QUEUES or VIRTCHNL2_OP_DISABLE_QUEUES
+ *
+ * Send enable or disable queues virtchnl message for queues contained
+ * in 'qs' array.
+ * The 'qs' array can contain pointers to both rx and tx queues.
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_send_ena_dis_selected_qs_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q,
+ u32 vc_op)
+{
+ bool en = vc_op == VIRTCHNL2_OP_ENABLE_QUEUES;
+ struct idpf_chunked_msg_params params = { };
+ struct virtchnl2_queue_chunk *qc;
+ int err, i;

- k += num_chunks;
- totqs -= num_chunks;
- num_chunks = min(num_chunks, totqs);
- /* Recalculate buffer size */
- buf_sz = struct_size(crq, qinfo, num_chunks);
+ qc = (struct virtchnl2_queue_chunk *)kzalloc(sizeof(*qc) * num_q,
+ GFP_KERNEL);
+ if (!qc) {
+ err = -ENOMEM;
+ goto alloc_error;
}

-mbx_error:
- mutex_unlock(&vport->vc_buf_lock);
- kfree(crq);
-error:
- kfree(qi);
+ params.op = vc_op;
+ params.state = en ? IDPF_VC_ENA_QUEUES : IDPF_VC_DIS_QUEUES;
+ params.err_check = en ? IDPF_VC_ENA_QUEUES_ERR : IDPF_VC_DIS_QUEUES_ERR;
+ params.timeout = en ? IDPF_WAIT_FOR_EVENT_TIMEO :
+ IDPF_WAIT_FOR_EVENT_TIMEO_MIN;
+ params.config_sz = sizeof(struct virtchnl2_del_ena_dis_queues);
+ params.chunk_sz = sizeof(struct virtchnl2_queue_chunk);
+ params.prepare_msg = &idpf_prepare_ena_dis_qs_msg;
+ params.num_chunks = num_q;
+ params.chunks = (u8 *)qc;
+
+ for (i = 0; i < num_q; i++) {
+ struct idpf_queue *q = qs[i];
+
+ qc[i].start_queue_id = cpu_to_le32(q->q_id);
+ qc[i].type = cpu_to_le32(q->q_type);
+ qc[i].num_queues = cpu_to_le32(IDPF_NUMQ_PER_CHUNK);
+ }

+ err = idpf_send_chunked_msg(vport, &params);
+ kfree(qc);
+alloc_error:
return err;
}

@@ -1758,39 +1981,23 @@ static int idpf_send_config_rx_queues_msg(struct idpf_vport *vport)
*/
static int idpf_send_ena_dis_queues_msg(struct idpf_vport *vport, u32 vc_op)
{
- u32 num_msgs, num_chunks, num_txq, num_rxq, num_q;
- struct idpf_adapter *adapter = vport->adapter;
- struct virtchnl2_del_ena_dis_queues *eq;
- struct virtchnl2_queue_chunks *qcs;
- struct virtchnl2_queue_chunk *qc;
- u32 config_sz, chunk_sz, buf_sz;
- int i, j, k = 0, err = 0;
-
- /* validate virtchnl op */
- switch (vc_op) {
- case VIRTCHNL2_OP_ENABLE_QUEUES:
- case VIRTCHNL2_OP_DISABLE_QUEUES:
- break;
- default:
- return -EINVAL;
- }
+ int num_txq, num_rxq, num_q, err = 0;
+ struct idpf_queue **qs;
+ int i, j, k = 0;

num_txq = vport->num_txq + vport->num_complq;
num_rxq = vport->num_rxq + vport->num_bufq;
num_q = num_txq + num_rxq;
- buf_sz = sizeof(struct virtchnl2_queue_chunk) * num_q;
- qc = kzalloc(buf_sz, GFP_KERNEL);
- if (!qc)
+
+ qs = kcalloc(num_q, sizeof(*qs), GFP_KERNEL);
+ if (!qs)
return -ENOMEM;

for (i = 0; i < vport->num_txq_grp; i++) {
struct idpf_txq_group *tx_qgrp = &vport->txq_grps[i];

- for (j = 0; j < tx_qgrp->num_txq; j++, k++) {
- qc[k].type = cpu_to_le32(tx_qgrp->txqs[j]->q_type);
- qc[k].start_queue_id = cpu_to_le32(tx_qgrp->txqs[j]->q_id);
- qc[k].num_queues = cpu_to_le32(IDPF_NUMQ_PER_CHUNK);
- }
+ for (j = 0; j < tx_qgrp->num_txq; j++, k++)
+ qs[k] = tx_qgrp->txqs[j];
}
if (vport->num_txq != k) {
err = -EINVAL;
@@ -1800,13 +2007,9 @@ static int idpf_send_ena_dis_queues_msg(struct idpf_vport *vport, u32 vc_op)
if (!idpf_is_queue_model_split(vport->txq_model))
goto setup_rx;

- for (i = 0; i < vport->num_txq_grp; i++, k++) {
- struct idpf_txq_group *tx_qgrp = &vport->txq_grps[i];
+ for (i = 0; i < vport->num_txq_grp; i++, k++)
+ qs[k] = vport->txq_grps[i].complq;

- qc[k].type = cpu_to_le32(tx_qgrp->complq->q_type);
- qc[k].start_queue_id = cpu_to_le32(tx_qgrp->complq->q_id);
- qc[k].num_queues = cpu_to_le32(IDPF_NUMQ_PER_CHUNK);
- }
if (vport->num_complq != (k - vport->num_txq)) {
err = -EINVAL;
goto error;
@@ -1822,18 +2025,10 @@ static int idpf_send_ena_dis_queues_msg(struct idpf_vport *vport, u32 vc_op)
num_rxq = rx_qgrp->singleq.num_rxq;

for (j = 0; j < num_rxq; j++, k++) {
- if (idpf_is_queue_model_split(vport->rxq_model)) {
- qc[k].start_queue_id =
- cpu_to_le32(rx_qgrp->splitq.rxq_sets[j]->rxq.q_id);
- qc[k].type =
- cpu_to_le32(rx_qgrp->splitq.rxq_sets[j]->rxq.q_type);
- } else {
- qc[k].start_queue_id =
- cpu_to_le32(rx_qgrp->singleq.rxqs[j]->q_id);
- qc[k].type =
- cpu_to_le32(rx_qgrp->singleq.rxqs[j]->q_type);
- }
- qc[k].num_queues = cpu_to_le32(IDPF_NUMQ_PER_CHUNK);
+ if (idpf_is_queue_model_split(vport->rxq_model))
+ qs[k] = &rx_qgrp->splitq.rxq_sets[j]->rxq;
+ else
+ qs[k] = rx_qgrp->singleq.rxqs[j];
}
}
if (vport->num_rxq != k - (vport->num_txq + vport->num_complq)) {
@@ -1847,14 +2042,8 @@ static int idpf_send_ena_dis_queues_msg(struct idpf_vport *vport, u32 vc_op)
for (i = 0; i < vport->num_rxq_grp; i++) {
struct idpf_rxq_group *rx_qgrp = &vport->rxq_grps[i];

- for (j = 0; j < vport->num_bufqs_per_qgrp; j++, k++) {
- struct idpf_queue *q;
-
- q = &rx_qgrp->splitq.bufq_sets[j].bufq;
- qc[k].type = cpu_to_le32(q->q_type);
- qc[k].start_queue_id = cpu_to_le32(q->q_id);
- qc[k].num_queues = cpu_to_le32(IDPF_NUMQ_PER_CHUNK);
- }
+ for (j = 0; j < vport->num_bufqs_per_qgrp; j++, k++)
+ qs[k] = &rx_qgrp->splitq.bufq_sets[j].bufq;
}
if (vport->num_bufq != k - (vport->num_txq +
vport->num_complq +
@@ -1864,58 +2053,117 @@ static int idpf_send_ena_dis_queues_msg(struct idpf_vport *vport, u32 vc_op)
}

send_msg:
- /* Chunk up the queue info into multiple messages */
- config_sz = sizeof(struct virtchnl2_del_ena_dis_queues);
- chunk_sz = sizeof(struct virtchnl2_queue_chunk);
+ err = idpf_send_ena_dis_selected_qs_msg(vport, qs, num_q, vc_op);
+error:
+ kfree(qs);
+ return err;
+}

- num_chunks = min_t(u32, IDPF_NUM_CHUNKS_PER_MSG(config_sz, chunk_sz),
- num_q);
- num_msgs = DIV_ROUND_UP(num_q, num_chunks);
+/**
+ * idpf_prep_map_unmap_sel_queue_vector_msg - Prepare message to map or unmap
+ * selected queues to/from interrupt vectors.
+ * @vport: virtual port data structure
+ * @msg_buf: buffer containing the message
+ * @first_chunk: pointer to the first chunk describing the vector mapping
+ * @num_chunks: number of chunks in the message
+ *
+ * Helper function for preparing the message describing mapping queues to
+ * q_vectors.
+ * Returns the total size of the prepared message.
+ */
+static int
+idpf_prep_map_unmap_sel_queue_vector_msg(struct idpf_vport *vport,
+ u8 *msg_buf, u8 *first_chunk,
+ int num_chunks)
+{
+ int chunk_size = sizeof(struct virtchnl2_queue_vector);
+ struct virtchnl2_queue_vector_maps *vqvm;

- buf_sz = struct_size(eq, chunks.chunks, num_chunks);
- eq = kzalloc(buf_sz, GFP_KERNEL);
- if (!eq) {
- err = -ENOMEM;
- goto error;
- }
+ vqvm = (struct virtchnl2_queue_vector_maps *)msg_buf;

- mutex_lock(&vport->vc_buf_lock);
+ vqvm->vport_id = cpu_to_le32(vport->vport_id);
+ vqvm->num_qv_maps = cpu_to_le16(num_chunks);

- for (i = 0, k = 0; i < num_msgs; i++) {
- memset(eq, 0, buf_sz);
- eq->vport_id = cpu_to_le32(vport->vport_id);
- eq->chunks.num_chunks = cpu_to_le16(num_chunks);
- qcs = &eq->chunks;
- memcpy(qcs->chunks, &qc[k], chunk_sz * num_chunks);
+ memcpy(vqvm->qv_maps, first_chunk, num_chunks * chunk_size);

- err = idpf_send_mb_msg(adapter, vc_op, buf_sz, (u8 *)eq);
- if (err)
- goto mbx_error;
+ return chunk_size * num_chunks + sizeof(*vqvm);
+}

- if (vc_op == VIRTCHNL2_OP_ENABLE_QUEUES)
- err = idpf_wait_for_event(adapter, vport,
- IDPF_VC_ENA_QUEUES,
- IDPF_VC_ENA_QUEUES_ERR);
- else
- err = idpf_min_wait_for_event(adapter, vport,
- IDPF_VC_DIS_QUEUES,
- IDPF_VC_DIS_QUEUES_ERR);
- if (err)
- goto mbx_error;
+/**
+ * idpf_send_map_unmap_sel_queue_vector_msg - Send virtchnl map or unmap
+ * selected queue vector message
+ * @vport: virtual port data structure
+ * @qs: array of queues to be mapped/unmapped
+ * @num_q: number of queues in 'qs' array
+ * @map: true for map and false for unmap
+ *
+ * Send map or unmap queue vector virtchnl message for selected queues only.
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_send_map_unmap_sel_queue_vector_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q,
+ bool map)
+{
+ struct idpf_chunked_msg_params params = { };
+ struct virtchnl2_queue_vector *vqv;
+ int err, i;

- k += num_chunks;
- num_q -= num_chunks;
- num_chunks = min(num_chunks, num_q);
- /* Recalculate buffer size */
- buf_sz = struct_size(eq, chunks.chunks, num_chunks);
+ vqv = kcalloc(num_q, sizeof(*vqv), GFP_KERNEL);
+ if (!vqv) {
+ err = -ENOMEM;
+ goto alloc_error;
}

-mbx_error:
- mutex_unlock(&vport->vc_buf_lock);
- kfree(eq);
-error:
- kfree(qc);
+ params.op = map ? VIRTCHNL2_OP_MAP_QUEUE_VECTOR :
+ VIRTCHNL2_OP_UNMAP_QUEUE_VECTOR;
+ params.state = map ? IDPF_VC_MAP_IRQ : IDPF_VC_UNMAP_IRQ;
+ params.err_check = map ? IDPF_VC_MAP_IRQ_ERR : IDPF_VC_UNMAP_IRQ_ERR;
+ params.timeout = map ? IDPF_WAIT_FOR_EVENT_TIMEO :
+ IDPF_WAIT_FOR_EVENT_TIMEO_MIN;
+ params.config_sz = sizeof(struct virtchnl2_queue_vector_maps);
+ params.chunk_sz = sizeof(struct virtchnl2_queue_vector);
+ params.prepare_msg = &idpf_prep_map_unmap_sel_queue_vector_msg;
+ params.num_chunks = num_q;
+ params.chunks = (u8 *)vqv;
+
+ for (i = 0; i < num_q; i++) {
+ const struct idpf_q_vector *vec;
+ struct idpf_queue *q = qs[i];
+ u32 vector_id, itr_idx;
+
+ vqv[i].queue_type = cpu_to_le32(q->q_type);
+ vqv[i].queue_id = cpu_to_le32(q->q_id);
+
+ if (q->q_type != VIRTCHNL2_QUEUE_TYPE_TX) {
+ vector_id = q->q_vector->v_idx;
+ itr_idx = q->q_vector->rx_itr_idx;
+
+ goto fill;
+ }

+ if (idpf_is_queue_model_split(vport->txq_model))
+ vec = q->txq_grp->complq->q_vector;
+ else
+ vec = q->q_vector;
+
+ if (vec) {
+ vector_id = vec->v_idx;
+ itr_idx = vec->tx_itr_idx;
+ } else {
+ vector_id = 0;
+ itr_idx = VIRTCHNL2_ITR_IDX_1;
+ }
+
+fill:
+ vqv[i].vector_id = cpu_to_le16(vector_id);
+ vqv[i].itr_idx = cpu_to_le32(itr_idx);
+ }
+
+ err = idpf_send_chunked_msg(vport, &params);
+ kfree(vqv);
+alloc_error:
return err;
}

@@ -1930,46 +2178,21 @@ static int idpf_send_ena_dis_queues_msg(struct idpf_vport *vport, u32 vc_op)
*/
int idpf_send_map_unmap_queue_vector_msg(struct idpf_vport *vport, bool map)
{
- struct idpf_adapter *adapter = vport->adapter;
- struct virtchnl2_queue_vector_maps *vqvm;
- struct virtchnl2_queue_vector *vqv;
- u32 config_sz, chunk_sz, buf_sz;
- u32 num_msgs, num_chunks, num_q;
- int i, j, k = 0, err = 0;
+ struct idpf_queue **qs;
+ int num_q, err = 0;
+ int i, j, k = 0;

num_q = vport->num_txq + vport->num_rxq;

- buf_sz = sizeof(struct virtchnl2_queue_vector) * num_q;
- vqv = kzalloc(buf_sz, GFP_KERNEL);
- if (!vqv)
+ qs = kcalloc(num_q, sizeof(*qs), GFP_KERNEL);
+ if (!qs)
return -ENOMEM;

for (i = 0; i < vport->num_txq_grp; i++) {
struct idpf_txq_group *tx_qgrp = &vport->txq_grps[i];

- for (j = 0; j < tx_qgrp->num_txq; j++, k++) {
- const struct idpf_q_vector *vec;
- u32 v_idx, tx_itr_idx;
-
- vqv[k].queue_type = cpu_to_le32(tx_qgrp->txqs[j]->q_type);
- vqv[k].queue_id = cpu_to_le32(tx_qgrp->txqs[j]->q_id);
-
- if (idpf_is_queue_model_split(vport->txq_model))
- vec = tx_qgrp->complq->q_vector;
- else
- vec = tx_qgrp->txqs[j]->q_vector;
-
- if (vec) {
- v_idx = vec->v_idx;
- tx_itr_idx = vec->tx_itr_idx;
- } else {
- v_idx = 0;
- tx_itr_idx = VIRTCHNL2_ITR_IDX_1;
- }
-
- vqv[k].vector_id = cpu_to_le16(v_idx);
- vqv[k].itr_idx = cpu_to_le32(tx_itr_idx);
- }
+ for (j = 0; j < tx_qgrp->num_txq; j++, k++)
+ qs[k] = tx_qgrp->txqs[j];
}

if (vport->num_txq != k) {
@@ -1979,7 +2202,7 @@ int idpf_send_map_unmap_queue_vector_msg(struct idpf_vport *vport, bool map)

for (i = 0; i < vport->num_rxq_grp; i++) {
struct idpf_rxq_group *rx_qgrp = &vport->rxq_grps[i];
- u16 num_rxq;
+ int num_rxq;

if (idpf_is_queue_model_split(vport->rxq_model))
num_rxq = rx_qgrp->splitq.num_rxq_sets;
@@ -1994,10 +2217,7 @@ int idpf_send_map_unmap_queue_vector_msg(struct idpf_vport *vport, bool map)
else
rxq = rx_qgrp->singleq.rxqs[j];

- vqv[k].queue_type = cpu_to_le32(rxq->q_type);
- vqv[k].queue_id = cpu_to_le32(rxq->q_id);
- vqv[k].vector_id = cpu_to_le16(rxq->q_vector->v_idx);
- vqv[k].itr_idx = cpu_to_le32(rxq->q_vector->rx_itr_idx);
+ qs[k] = rxq;
}
}

@@ -2013,63 +2233,9 @@ int idpf_send_map_unmap_queue_vector_msg(struct idpf_vport *vport, bool map)
}
}

- /* Chunk up the vector info into multiple messages */
- config_sz = sizeof(struct virtchnl2_queue_vector_maps);
- chunk_sz = sizeof(struct virtchnl2_queue_vector);
-
- num_chunks = min_t(u32, IDPF_NUM_CHUNKS_PER_MSG(config_sz, chunk_sz),
- num_q);
- num_msgs = DIV_ROUND_UP(num_q, num_chunks);
-
- buf_sz = struct_size(vqvm, qv_maps, num_chunks);
- vqvm = kzalloc(buf_sz, GFP_KERNEL);
- if (!vqvm) {
- err = -ENOMEM;
- goto error;
- }
-
- mutex_lock(&vport->vc_buf_lock);
-
- for (i = 0, k = 0; i < num_msgs; i++) {
- memset(vqvm, 0, buf_sz);
- vqvm->vport_id = cpu_to_le32(vport->vport_id);
- vqvm->num_qv_maps = cpu_to_le16(num_chunks);
- memcpy(vqvm->qv_maps, &vqv[k], chunk_sz * num_chunks);
-
- if (map) {
- err = idpf_send_mb_msg(adapter,
- VIRTCHNL2_OP_MAP_QUEUE_VECTOR,
- buf_sz, (u8 *)vqvm);
- if (!err)
- err = idpf_wait_for_event(adapter, vport,
- IDPF_VC_MAP_IRQ,
- IDPF_VC_MAP_IRQ_ERR);
- } else {
- err = idpf_send_mb_msg(adapter,
- VIRTCHNL2_OP_UNMAP_QUEUE_VECTOR,
- buf_sz, (u8 *)vqvm);
- if (!err)
- err =
- idpf_min_wait_for_event(adapter, vport,
- IDPF_VC_UNMAP_IRQ,
- IDPF_VC_UNMAP_IRQ_ERR);
- }
- if (err)
- goto mbx_error;
-
- k += num_chunks;
- num_q -= num_chunks;
- num_chunks = min(num_chunks, num_q);
- /* Recalculate buffer size */
- buf_sz = struct_size(vqvm, qv_maps, num_chunks);
- }
-
-mbx_error:
- mutex_unlock(&vport->vc_buf_lock);
- kfree(vqvm);
+ err = idpf_send_map_unmap_sel_queue_vector_msg(vport, qs, num_q, map);
error:
- kfree(vqv);
-
+ kfree(qs);
return err;
}

@@ -2113,6 +2279,123 @@ int idpf_send_disable_queues_msg(struct idpf_vport *vport)
return idpf_wait_for_marker_event(vport);
}

+/**
+ * idpf_send_enable_selected_queues_msg - send enable queues virtchnl message
+ * for selected queues only
+ * @vport: Virtual port private data structure
+ * @qs: array containing queues to be enabled
+ * @num_q: number of queues in the 'qs' array
+ *
+ * Send enable queues virtchnl message for queues contained in the 'qs' array.
+ * Returns 0 on success, negative on failure.
+ */
+int idpf_send_enable_selected_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q)
+{
+ return idpf_send_ena_dis_selected_qs_msg(vport, qs, num_q,
+ VIRTCHNL2_OP_ENABLE_QUEUES);
+}
+
+/**
+ * idpf_send_disable_selected_queues_msg - send disable queues virtchnl message
+ * for selected queues only
+ * @vport: Virtual port private data structure
+ * @qs: array containing queues to be disabled
+ * @num_q: number of queues in the 'qs' array
+ *
+ * Send disable queues virtchnl message for queues contained in the 'qs' array.
+ * Returns 0 on success, negative on failure.
+ */
+int idpf_send_disable_selected_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q)
+{
+ struct idpf_queue **tx_qs;
+ int err, i, tx_idx = 0;
+
+ err = idpf_send_ena_dis_selected_qs_msg(vport, qs, num_q,
+ VIRTCHNL2_OP_DISABLE_QUEUES);
+ if (err)
+ return err;
+
+ tx_qs = kcalloc(num_q, sizeof(*tx_qs), GFP_KERNEL);
+ if (!tx_qs)
+ return -ENOMEM;
+
+ for (i = 0; i < num_q; i++) {
+ if (qs[i]->q_type == VIRTCHNL2_QUEUE_TYPE_TX) {
+ set_bit(__IDPF_Q_POLL_MODE, qs[i]->flags);
+ tx_qs[tx_idx++] = qs[i];
+ }
+
+ if (qs[i]->q_type == VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION)
+ napi_schedule(&qs[i]->q_vector->napi);
+ }
+
+ err = idpf_wait_for_selected_marker_events(vport, tx_qs, tx_idx);
+
+ kfree(tx_qs);
+ return err;
+}
+
+/**
+ * idpf_send_config_selected_queues_msg - Send virtchnl config queues message
+ * for selected queues only.
+ * @vport: virtual port data structure
+ * @qs: array of queues to be configured
+ * @num_q: number of queues contained in 'qs' array
+ *
+ * Send config queues virtchnl message for queues contained in 'qs' array.
+ * The 'qs' array can contain both Rx and Tx queues.
+ * Returns 0 on success, negative on failure.
+ */
+int idpf_send_config_selected_queues_msg(struct idpf_vport *vport,
+ struct idpf_queue **qs,
+ int num_q)
+{
+ int num_rxq = 0, num_txq = 0, i, err;
+ struct idpf_queue **txqs, **rxqs;
+
+ txqs = kcalloc(num_q, sizeof(*txqs), GFP_KERNEL);
+ if (!txqs)
+ return -ENOMEM;
+
+ rxqs = kcalloc(num_q, sizeof(*rxqs), GFP_KERNEL);
+ if (!rxqs) {
+ err = -ENOMEM;
+ goto rxq_alloc_err;
+ }
+
+ for (i = 0; i < num_q; i++) {
+ struct idpf_queue *q = qs[i];
+
+ if (q->q_type == VIRTCHNL2_QUEUE_TYPE_TX ||
+ q->q_type == VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION) {
+ txqs[num_txq] = q;
+ num_txq++;
+ } else if (q->q_type == VIRTCHNL2_QUEUE_TYPE_RX ||
+ q->q_type == VIRTCHNL2_QUEUE_TYPE_RX_BUFFER) {
+ rxqs[num_rxq] = q;
+ num_rxq++;
+ }
+ }
+
+ err = idpf_send_config_selected_tx_queues_msg(vport, txqs, num_txq);
+ if (err)
+ goto send_txq_err;
+
+ err = idpf_send_config_selected_rx_queues_msg(vport, rxqs, num_rxq);
+
+send_txq_err:
+ kfree(rxqs);
+rxq_alloc_err:
+ kfree(txqs);
+
+ return err;
+}
+
/**
* idpf_convert_reg_to_queue_chunks - Copy queue chunk information to the right
* structure
--
2.43.0
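
Side note on the generic helpers above: with them, sending any chunked
virtchnl message reduces to filling an idpf_chunked_msg_params block and
calling idpf_send_chunked_msg(), e.g. (condensed from
idpf_send_ena_dis_selected_qs_msg()):

  struct idpf_chunked_msg_params params = { };

  params.op = VIRTCHNL2_OP_ENABLE_QUEUES;
  params.state = IDPF_VC_ENA_QUEUES;
  params.err_check = IDPF_VC_ENA_QUEUES_ERR;
  params.timeout = IDPF_WAIT_FOR_EVENT_TIMEO;
  params.config_sz = sizeof(struct virtchnl2_del_ena_dis_queues);
  params.chunk_sz = sizeof(struct virtchnl2_queue_chunk);
  params.prepare_msg = &idpf_prepare_ena_dis_qs_msg;
  params.num_chunks = num_q;
  params.chunks = (u8 *)qc;

  /* Splits 'chunks' into as many mailbox messages as fit; the state,
   * err_check and timeout fields drive the wait for each reply.
   */
  err = idpf_send_chunked_msg(vport, &params);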


2023-12-23 03:08:47

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 30/34] idpf: move search rx and tx queues to header

From: Michal Kubiak <[email protected]>

Move the Rx and Tx queue lookup functions from the ethtool implementation
to the idpf header.
Now those functions can be used driver-wide, including by the XDP and XSk
configuration code, as sketched below.
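
A rough usage sketch (mirroring how idpf_create_queue_list() in the later
XSk pool patch looks up the queue pair for a given index; splitq behaviour
as implemented by the helpers below):

  struct idpf_queue *rxq, *txq;

  /* Rx queue backing the given index (an rxq from the set in splitq mode) */
  rxq = idpf_find_rxq(vport, q_idx);
  /* Tx queue in singleq mode, the group's completion queue in splitq mode */
  txq = idpf_find_txq(vport, q_idx);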

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf.h | 41 +++++++++++++++++++
.../net/ethernet/intel/idpf/idpf_ethtool.c | 39 ------------------
2 files changed, 41 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index a12c56f9f2ef..d99ebd045c4e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -696,6 +696,47 @@ static inline int idpf_is_queue_model_split(u16 q_model)
return q_model == VIRTCHNL2_QUEUE_MODEL_SPLIT;
}

+/**
+ * idpf_find_rxq - find rxq from q index
+ * @vport: virtual port associated to queue
+ * @q_num: q index used to find queue
+ *
+ * returns pointer to rx queue
+ */
+static inline struct idpf_queue *
+idpf_find_rxq(struct idpf_vport *vport, int q_num)
+{
+ int q_grp, q_idx;
+
+ if (!idpf_is_queue_model_split(vport->rxq_model))
+ return vport->rxq_grps->singleq.rxqs[q_num];
+
+ q_grp = q_num / IDPF_DFLT_SPLITQ_RXQ_PER_GROUP;
+ q_idx = q_num % IDPF_DFLT_SPLITQ_RXQ_PER_GROUP;
+
+ return &vport->rxq_grps[q_grp].splitq.rxq_sets[q_idx]->rxq;
+}
+
+/**
+ * idpf_find_txq - find txq from q index
+ * @vport: virtual port associated to queue
+ * @q_num: q index used to find queue
+ *
+ * returns pointer to tx queue
+ */
+static inline struct idpf_queue *
+idpf_find_txq(struct idpf_vport *vport, int q_num)
+{
+ int q_grp;
+
+ if (!idpf_is_queue_model_split(vport->txq_model))
+ return vport->txqs[q_num];
+
+ q_grp = q_num / IDPF_DFLT_SPLITQ_TXQ_PER_GROUP;
+
+ return vport->txq_grps[q_grp].complq;
+}
+
/**
* idpf_xdp_is_prog_ena - check if there is an XDP program on adapter
* @vport: vport to check
diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
index 0d192417205d..f7ec679c9b16 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c
@@ -953,45 +953,6 @@ static void idpf_get_ethtool_stats(struct net_device *netdev,
idpf_vport_ctrl_unlock(netdev);
}

-/**
- * idpf_find_rxq - find rxq from q index
- * @vport: virtual port associated to queue
- * @q_num: q index used to find queue
- *
- * returns pointer to rx queue
- */
-static struct idpf_queue *idpf_find_rxq(struct idpf_vport *vport, int q_num)
-{
- int q_grp, q_idx;
-
- if (!idpf_is_queue_model_split(vport->rxq_model))
- return vport->rxq_grps->singleq.rxqs[q_num];
-
- q_grp = q_num / IDPF_DFLT_SPLITQ_RXQ_PER_GROUP;
- q_idx = q_num % IDPF_DFLT_SPLITQ_RXQ_PER_GROUP;
-
- return &vport->rxq_grps[q_grp].splitq.rxq_sets[q_idx]->rxq;
-}
-
-/**
- * idpf_find_txq - find txq from q index
- * @vport: virtual port associated to queue
- * @q_num: q index used to find queue
- *
- * returns pointer to tx queue
- */
-static struct idpf_queue *idpf_find_txq(struct idpf_vport *vport, int q_num)
-{
- int q_grp;
-
- if (!idpf_is_queue_model_split(vport->txq_model))
- return vport->txqs[q_num];
-
- q_grp = q_num / IDPF_DFLT_SPLITQ_TXQ_PER_GROUP;
-
- return vport->txq_grps[q_grp].complq;
-}
-
/**
* __idpf_get_q_coalesce - get ITR values for specific queue
* @ec: ethtool structure to fill with driver's coalesce settings
--
2.43.0


2023-12-23 03:09:10

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 31/34] idpf: add XSk pool initialization

From: Michal Kubiak <[email protected]>

Add functionality to set up an XSk buffer pool, including the ability to
stop, reconfigure and restart only the selected queues, not the whole
device.
Pool DMA mapping is managed by libie.
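
The resulting reconfiguration flow for a single queue pair is roughly the
following (condensed from idpf_xsk_pool_setup() below; error handling and
the not-running case omitted):

  qs = idpf_create_queue_list(vport, qid, &num_qs); /* rxq + txq + xdpq grp */

  idpf_qp_dis(vport, q_vector, qs, num_qs, qid);    /* stop only this pair */

  err = pool ? idpf_xsk_pool_enable(vport, qid)     /* (un)register the pool */
             : idpf_xsk_pool_disable(vport, qid);

  idpf_qp_ena(vport, q_vector, qs, num_qs, qid);    /* bring the pair back up */
  kfree(qs);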

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/Makefile | 1 +
drivers/net/ethernet/intel/idpf/idpf.h | 13 +
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 6 +
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 5 +
drivers/net/ethernet/intel/idpf/idpf_xsk.c | 474 ++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xsk.h | 15 +
7 files changed, 521 insertions(+), 7 deletions(-)
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xsk.c
create mode 100644 drivers/net/ethernet/intel/idpf/idpf_xsk.h

diff --git a/drivers/net/ethernet/intel/idpf/Makefile b/drivers/net/ethernet/intel/idpf/Makefile
index 4024781ff02b..f4bc1fd96092 100644
--- a/drivers/net/ethernet/intel/idpf/Makefile
+++ b/drivers/net/ethernet/intel/idpf/Makefile
@@ -18,3 +18,4 @@ idpf-y := \
idpf_vf_dev.o

idpf-objs += idpf_xdp.o
+idpf-objs += idpf_xsk.o
diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
index d99ebd045c4e..f05ed84600fd 100644
--- a/drivers/net/ethernet/intel/idpf/idpf.h
+++ b/drivers/net/ethernet/intel/idpf/idpf.h
@@ -464,6 +464,7 @@ struct idpf_rss_data {
* ethtool
* @num_req_rxq_desc: Number of user requested RX queue descriptors through
* ethtool
+ * @af_xdp_zc_qps: Mask of queue pairs where the AF_XDP socket is established
* @user_flags: User toggled config flags
* @mac_filter_list: List of MAC filters
*
@@ -480,6 +481,7 @@ struct idpf_vport_user_config_data {
struct bpf_prog *xdp_prog;
DECLARE_BITMAP(user_flags, __IDPF_USER_FLAGS_NBITS);
struct list_head mac_filter_list;
+ DECLARE_BITMAP(af_xdp_zc_qps, IDPF_LARGE_MAX_Q);
};

/**
@@ -959,6 +961,17 @@ static inline void idpf_vport_ctrl_unlock(struct net_device *netdev)
mutex_unlock(&np->adapter->vport_ctrl_lock);
}

+/**
+ * idpf_vport_ctrl_is_locked - Check if vport control lock is taken
+ * @netdev: Network interface device structure
+ */
+static inline bool idpf_vport_ctrl_is_locked(struct net_device *netdev)
+{
+ struct idpf_netdev_priv *np = netdev_priv(netdev);
+
+ return mutex_is_locked(&np->adapter->vport_ctrl_lock);
+}
+
void idpf_statistics_task(struct work_struct *work);
void idpf_init_task(struct work_struct *work);
void idpf_service_task(struct work_struct *work);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 3dc21731df2f..e3f59bbe7c90 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -100,7 +100,7 @@ static void idpf_tx_buf_rel_all(struct idpf_queue *txq)
*
* Free all transmit software resources
*/
-static void idpf_tx_desc_rel(struct idpf_queue *txq, bool bufq)
+void idpf_tx_desc_rel(struct idpf_queue *txq, bool bufq)
{
if (bufq)
idpf_tx_buf_rel_all(txq);
@@ -194,7 +194,7 @@ static int idpf_tx_buf_alloc_all(struct idpf_queue *tx_q)
*
* Returns 0 on success, negative on failure
*/
-static int idpf_tx_desc_alloc(struct idpf_queue *tx_q, bool bufq)
+int idpf_tx_desc_alloc(struct idpf_queue *tx_q, bool bufq)
{
struct device *dev = tx_q->dev;
u32 desc_sz;
@@ -385,7 +385,7 @@ static void idpf_rx_buf_rel_all(struct idpf_queue *rxq)
*
* Free a specific rx queue resources
*/
-static void idpf_rx_desc_rel(struct idpf_queue *rxq, bool bufq, s32 q_model)
+void idpf_rx_desc_rel(struct idpf_queue *rxq, bool bufq, s32 q_model)
{
if (!rxq)
return;
@@ -649,8 +649,7 @@ static int idpf_rx_buf_alloc_all(struct idpf_queue *rxbufq)
*
* Returns 0 on success, negative on failure
*/
-static int idpf_rx_bufs_init(struct idpf_queue *rxbufq,
- enum libie_rx_buf_type type)
+int idpf_rx_bufs_init(struct idpf_queue *rxbufq, enum libie_rx_buf_type type)
{
struct libie_buf_queue bq = {
.truesize = rxbufq->truesize,
@@ -730,7 +729,7 @@ int idpf_rx_bufs_init_all(struct idpf_vport *vport)
*
* Returns 0 on success, negative on failure
*/
-static int idpf_rx_desc_alloc(struct idpf_queue *rxq, bool bufq, s32 q_model)
+int idpf_rx_desc_alloc(struct idpf_queue *rxq, bool bufq, s32 q_model)
{
struct device *dev = rxq->dev;

@@ -1870,7 +1869,8 @@ static void idpf_tx_finalize_complq(struct idpf_queue *complq, int ntc,

dont_wake = !complq_ok || IDPF_TX_BUF_RSV_LOW(tx_q) ||
np->state != __IDPF_VPORT_UP ||
- !netif_carrier_ok(tx_q->vport->netdev);
+ !netif_carrier_ok(tx_q->vport->netdev) ||
+ idpf_vport_ctrl_is_locked(tx_q->vport->netdev);
/* Check if the TXQ needs to and can be restarted */
__netif_txq_completed_wake(nq, tx_q->cleaned_pkts, tx_q->cleaned_bytes,
IDPF_DESC_UNUSED(tx_q), IDPF_TX_WAKE_THRESH,
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index f32d854fe850..be396f1e346a 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -519,6 +519,7 @@ union idpf_queue_stats {
* @size: Length of descriptor ring in bytes
* @dma: Physical address of ring
* @desc_ring: Descriptor ring memory
+ * @xsk_pool: Pointer to a description of a buffer pool for AF_XDP socket
* @tx_max_bufs: Max buffers that can be transmitted with scatter-gather
* @tx_min_pkt_len: Min supported packet length
* @num_completions: Only relevant for TX completion queue. It tracks the
@@ -948,6 +949,11 @@ bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rxq,
u16 cleaned_count);
int idpf_tso(struct sk_buff *skb, struct idpf_tx_offload_params *off);
void idpf_tx_handle_sw_marker(struct idpf_queue *tx_q);
+int idpf_rx_desc_alloc(struct idpf_queue *rxq, bool bufq, s32 q_model);
+void idpf_rx_desc_rel(struct idpf_queue *rxq, bool bufq, s32 q_model);
+int idpf_tx_desc_alloc(struct idpf_queue *tx_q, bool bufq);
+void idpf_tx_desc_rel(struct idpf_queue *txq, bool bufq);
+int idpf_rx_bufs_init(struct idpf_queue *rxbufq, enum libie_rx_buf_type type);

/**
* idpf_xdpq_update_tail - Updates the XDP Tx queue tail register
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.c b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
index b4f096186302..c20c805583be 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
@@ -3,6 +3,7 @@

#include "idpf.h"
#include "idpf_xdp.h"
+#include "idpf_xsk.h"

static int idpf_rxq_for_each(const struct idpf_vport *vport,
int (*fn)(struct idpf_queue *rxq, void *arg),
@@ -472,6 +473,10 @@ int idpf_xdp(struct net_device *netdev, struct netdev_bpf *xdp)
case XDP_SETUP_PROG:
err = idpf_xdp_setup_prog(vport, xdp->prog, xdp->extack);
break;
+ case XDP_SETUP_XSK_POOL:
+ err = idpf_xsk_pool_setup(vport, xdp->xsk.pool,
+ xdp->xsk.queue_id);
+ break;
default:
err = -EINVAL;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.c b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
new file mode 100644
index 000000000000..3017680fedb3
--- /dev/null
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
@@ -0,0 +1,474 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2023 Intel Corporation */
+
+#include <linux/net/intel/libie/xsk.h>
+
+#include "idpf.h"
+#include "idpf_xsk.h"
+
+/**
+ * idpf_qp_cfg_qs - Configure all queues contained in a given array.
+ * @vport: vport structure
+ * @qs: an array of queues to configure
+ * @num_qs: number of queues in the 'qs' array
+ *
+ * Returns 0 on success, negative on failure.
+ */
+static int
+idpf_qp_cfg_qs(struct idpf_vport *vport, struct idpf_queue **qs, int num_qs)
+{
+ bool splitq = idpf_is_queue_model_split(vport->rxq_model);
+ int i, err;
+
+ for (i = 0; i < num_qs; i++) {
+ const struct idpf_bufq_set *sets;
+ struct idpf_queue *q = qs[i];
+ enum libie_rx_buf_type qt;
+ u32 ts;
+
+ switch (q->q_type) {
+ case VIRTCHNL2_QUEUE_TYPE_RX:
+ err = idpf_rx_desc_alloc(q, false, vport->rxq_model);
+ if (err) {
+ netdev_err(vport->netdev, "Could not allocate buffer for RX queue.\n");
+ break;
+ }
+ if (!splitq)
+ err = idpf_rx_bufs_init(q, LIBIE_RX_BUF_MTU);
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_RX_BUFFER:
+ err = idpf_rx_desc_alloc(q, true, vport->rxq_model);
+ if (err)
+ break;
+
+ sets = q->rxq_grp->splitq.bufq_sets;
+ qt = q->idx ? LIBIE_RX_BUF_SHORT : LIBIE_RX_BUF_MTU;
+ ts = q->idx ? sets[q->idx - 1].bufq.truesize >> 1 : 0;
+ q->truesize = ts;
+
+ err = idpf_rx_bufs_init(q, qt);
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX:
+ err = idpf_tx_desc_alloc(q, true);
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION:
+ err = idpf_tx_desc_alloc(q, false);
+ break;
+ }
+
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+/**
+ * idpf_qp_clean_qs - Clean all queues contained in a given array.
+ * @vport: vport structure
+ * @qs: an array of queues to clean
+ * @num_qs: number of queues in the 'qs' array
+ */
+static void
+idpf_qp_clean_qs(struct idpf_vport *vport, struct idpf_queue **qs, int num_qs)
+{
+ for (u32 i = 0; i < num_qs; i++) {
+ struct idpf_queue *q = qs[i];
+
+ switch (q->q_type) {
+ case VIRTCHNL2_QUEUE_TYPE_RX:
+ idpf_rx_desc_rel(q, false, vport->rxq_model);
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_RX_BUFFER:
+ idpf_rx_desc_rel(q, true, vport->rxq_model);
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX:
+ idpf_tx_desc_rel(q, true);
+ q->txq_grp->num_completions_pending = 0;
+ writel(q->next_to_use, q->tail);
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION:
+ idpf_tx_desc_rel(q, false);
+ q->num_completions = 0;
+ break;
+ }
+ }
+}
+
+/**
+ * idpf_qvec_ena_irq - Enable IRQ for given queue vector
+ * @q_vector: queue vector
+ */
+static void
+idpf_qvec_ena_irq(struct idpf_q_vector *q_vector)
+{
+ /* Write the default ITR values */
+ if (q_vector->num_rxq)
+ idpf_vport_intr_write_itr(q_vector, q_vector->rx_itr_value,
+ false);
+ if (q_vector->num_txq)
+ idpf_vport_intr_write_itr(q_vector, q_vector->tx_itr_value,
+ true);
+ if (q_vector->num_rxq || q_vector->num_txq)
+ idpf_vport_intr_update_itr_ena_irq(q_vector);
+}
+
+/**
+ * idpf_insert_txqs_from_grp - Insert all Tx and completion queues from a txq
+ * group into a given array.
+ * @vport: vport structure
+ * @txq: pointer to a tx queue
+ * @qs: pointer to an element of array where tx queues should be inserted
+ *
+ * Returns the number of queues that have been inserted into the output 'qs'
+ * array.
+ * Note that the caller of this function must ensure that there is enough space
+ * in the 'qs' array to insert all the queues from the tx queue group.
+ */
+static int
+idpf_insert_txqs_from_grp(struct idpf_vport *vport,
+ struct idpf_queue *txq,
+ struct idpf_queue **qs)
+{
+ int qs_idx = 0;
+
+ if (!idpf_is_queue_model_split(vport->txq_model)) {
+ qs[qs_idx++] = txq;
+ } else {
+ struct idpf_txq_group *txq_grp = txq->txq_grp;
+ int i;
+
+ for (i = 0; i < txq_grp->num_txq; i++)
+ qs[qs_idx++] = txq_grp->txqs[i];
+
+ for (i = 0; i < IDPF_COMPLQ_PER_GROUP; i++)
+ qs[qs_idx++] = &txq_grp->complq[i];
+ }
+
+ return qs_idx;
+}
+
+/**
+ * idpf_insert_rxqs_from_grp - Insert all Rx and buffer queues from an rxq
+ * group into a given array.
+ * @vport: vport structure
+ * @rxq: pointer to a rx queue
+ * @qs: pointer to an element of array where rx queues should be inserted
+ *
+ * Returns the number of queues that have been inserted into the output 'qs'
+ * array.
+ * Note that the caller of this function must ensure that there is enough space
+ * in the 'qs' array to insert all the queues from the rx queue group.
+ */
+static int
+idpf_insert_rxqs_from_grp(struct idpf_vport *vport,
+ struct idpf_queue *rxq,
+ struct idpf_queue **qs)
+{
+ int qs_idx = 0;
+
+ if (!idpf_is_queue_model_split(vport->rxq_model)) {
+ qs[qs_idx++] = rxq;
+ } else {
+ struct idpf_rxq_group *rxq_grp = rxq->rxq_grp;
+ int i;
+
+ for (i = 0; i < rxq_grp->splitq.num_rxq_sets; i++)
+ qs[qs_idx++] = &rxq_grp->splitq.rxq_sets[i]->rxq;
+
+ for (i = 0; i < vport->num_bufqs_per_qgrp; i++)
+ qs[qs_idx++] = &rxq_grp->splitq.bufq_sets[i].bufq;
+ }
+
+ return qs_idx;
+}
+
+/**
+ * idpf_count_rxqs_in_grp - Returns the number of rx queues in rx queue group
+ * containing a given rx queue.
+ * @vport: vport structure
+ * @rxq: pointer to a rx queue
+ *
+ * Returns the number of rx queues in the rx queue group associated with
+ * a given rx queue, or 1 in singleq mode, where rx queues are not grouped.
+ */
+static int
+idpf_count_rxqs_in_grp(struct idpf_vport *vport, struct idpf_queue *rxq)
+{
+ if (!idpf_is_queue_model_split(vport->rxq_model))
+ return 1;
+
+ return rxq->rxq_grp->splitq.num_rxq_sets + vport->num_bufqs_per_qgrp;
+}
+
+/**
+ * idpf_count_txqs_in_grp - Returns the number of tx queues in tx queue group
+ * containing a given tx queue.
+ * @vport: vport structure
+ * @txq: pointer to a tx queue
+ *
+ * Returns the number of tx queues in the tx queue group associated with
+ * a given tx queue, or 1 in singleq mode, where tx queues are not grouped.
+ */
+static int
+idpf_count_txqs_in_grp(struct idpf_vport *vport, struct idpf_queue *txq)
+{
+ if (!idpf_is_queue_model_split(vport->txq_model))
+ return 1;
+
+ return txq->txq_grp->num_txq + IDPF_COMPLQ_PER_GROUP;
+}
+
+/**
+ * idpf_create_queue_list - Creates a list of queues associated with a given
+ * queue index.
+ * @vport: vport structure
+ * @q_idx: index of queue pair to establish XSK socket
+ * @num_qs: number of queues in returned array.
+ *
+ * Returns a pointer to a dynamically allocated array of pointers to all
+ * queues associated with a given queue index (q_idx).
+ * Note that the caller is responsible for freeing the memory allocated by
+ * this function with kfree().
+ * Returns NULL on error.
+ */
+static struct idpf_queue **
+idpf_create_queue_list(struct idpf_vport *vport, u16 q_idx, int *num_qs)
+{
+ struct idpf_queue *rxq, *txq, *xdpq = NULL;
+ struct idpf_queue **qs;
+ int qs_idx;
+
+ *num_qs = 0;
+
+ if (q_idx >= vport->num_rxq || q_idx >= vport->num_txq)
+ return NULL;
+
+ rxq = idpf_find_rxq(vport, q_idx);
+ txq = idpf_find_txq(vport, q_idx);
+
+ *num_qs += idpf_count_rxqs_in_grp(vport, rxq);
+ *num_qs += idpf_count_txqs_in_grp(vport, txq);
+
+ if (idpf_xdp_is_prog_ena(vport)) {
+ xdpq = vport->txqs[q_idx + vport->xdp_txq_offset];
+ *num_qs += idpf_count_txqs_in_grp(vport, xdpq);
+ }
+
+ qs = kcalloc(*num_qs, sizeof(*qs), GFP_KERNEL);
+ if (!qs)
+ return NULL;
+
+ qs_idx = 0;
+ qs_idx += idpf_insert_txqs_from_grp(vport, txq, &qs[qs_idx]);
+
+ if (xdpq)
+ qs_idx += idpf_insert_txqs_from_grp(vport, xdpq, &qs[qs_idx]);
+
+ qs_idx += idpf_insert_rxqs_from_grp(vport, rxq, &qs[qs_idx]);
+
+ if (*num_qs != qs_idx) {
+ kfree(qs);
+ *num_qs = 0;
+ qs = NULL;
+ }
+
+ return qs;
+}
+
+/**
+ * idpf_qp_dis - Disables queues associated with a queue pair
+ * @vport: vport structure
+ * @q_vector: interrupt vector mapped to a given queue pair
+ * @qs: array of pointers to queues to disable
+ * @num_qs: number of queues in 'qs' array
+ * @q_idx: index of queue pair to disable
+ *
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_qp_dis(struct idpf_vport *vport, struct idpf_q_vector *q_vector,
+ struct idpf_queue **qs, int num_qs, u16 q_idx)
+{
+ int err = 0;
+
+ netif_stop_subqueue(vport->netdev, q_idx);
+
+ err = idpf_send_disable_vport_msg(vport);
+ if (err) {
+ netdev_err(vport->netdev, "Could not disable vport, error = %d\n",
+ err);
+ goto err_send_msg;
+ }
+ err = idpf_send_disable_selected_queues_msg(vport, qs, num_qs);
+ if (err) {
+ netdev_err(vport->netdev, "Could not disable queues for index %d, error = %d\n",
+ q_idx, err);
+ goto err_send_msg;
+ }
+
+ napi_disable(&q_vector->napi);
+ writel(0, q_vector->intr_reg.dyn_ctl);
+ idpf_qp_clean_qs(vport, qs, num_qs);
+
+ return 0;
+
+err_send_msg:
+ netif_start_subqueue(vport->netdev, q_idx);
+
+ return err;
+}
+
+/**
+ * idpf_qp_ena - Enables queues associated with a queue pair
+ * @vport: vport structure
+ * @q_vector: interrupt vector mapped to a given queue pair
+ * @qs: array of pointers to queues to enable
+ * @num_qs: number of queues in 'qs' array
+ * @q_idx: index of queue pair to enable
+ *
+ * Returns 0 on success, negative on failure.
+ */
+static int idpf_qp_ena(struct idpf_vport *vport, struct idpf_q_vector *q_vector,
+ struct idpf_queue **qs, int num_qs, u16 q_idx)
+{
+ int err;
+
+ err = idpf_qp_cfg_qs(vport, qs, num_qs);
+ if (err) {
+ netdev_err(vport->netdev, "Could not initialize queues for index %d, error = %d\n",
+ q_idx, err);
+ return err;
+ }
+
+ napi_enable(&q_vector->napi);
+ idpf_qvec_ena_irq(q_vector);
+
+ err = idpf_send_config_selected_queues_msg(vport, qs, num_qs);
+ if (err) {
+ netdev_err(vport->netdev, "Could not configure queues for index %d, error = %d\n",
+ q_idx, err);
+ return err;
+ }
+
+ err = idpf_send_enable_selected_queues_msg(vport, qs, num_qs);
+ if (err) {
+ netdev_err(vport->netdev, "Could not enable queues for index %d, error = %d\n",
+ q_idx, err);
+ return err;
+ }
+
+ err = idpf_send_enable_vport_msg(vport);
+ if (err) {
+ netdev_err(vport->netdev, "Could not enable vport, error = %d\n",
+ err);
+ return err;
+ }
+
+ netif_start_subqueue(vport->netdev, q_idx);
+
+ return 0;
+}
+
+/**
+ * idpf_xsk_pool_disable - disables a BUFF POOL region
+ * @vport: vport to remove the buffer pool from
+ * @qid: queue id
+ *
+ * Returns 0 on success, negative on error
+ */
+static int idpf_xsk_pool_disable(struct idpf_vport *vport, u16 qid)
+{
+ struct idpf_vport_user_config_data *cfg_data;
+
+ if (!vport->rxq_grps)
+ return -EINVAL;
+
+ cfg_data = &vport->adapter->vport_config[vport->idx]->user_config;
+
+ return libie_xsk_disable_pool(vport->netdev, qid,
+ cfg_data->af_xdp_zc_qps);
+}
+
+/**
+ * idpf_xsk_pool_enable - enables a BUFF POOL region
+ * @vport: vport to allocate the buffer pool on
+ * @pool: pointer to a requested BUFF POOL region
+ * @qid: queue id
+ *
+ * Returns 0 on success, negative on error
+ */
+static int idpf_xsk_pool_enable(struct idpf_vport *vport, u16 qid)
+{
+ struct idpf_vport_user_config_data *cfg_data;
+
+ cfg_data = &vport->adapter->vport_config[vport->idx]->user_config;
+
+ return libie_xsk_enable_pool(vport->netdev, qid,
+ cfg_data->af_xdp_zc_qps);
+}
+
+/**
+ * idpf_xsk_pool_setup - enable/disable a BUFF POOL region
+ * @vport: current vport of interest
+ * @pool: pointer to a requested BUFF POOL region
+ * @qid: queue id
+ *
+ * Returns 0 on success, negative on failure
+ */
+int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
+ u32 qid)
+{
+ bool if_running, pool_present = !!pool;
+ int err = 0, pool_failure = 0, num_qs;
+ struct idpf_q_vector *q_vector;
+ struct idpf_queue *rxq, **qs;
+
+ if_running = netif_running(vport->netdev) &&
+ idpf_xdp_is_prog_ena(vport);
+
+ if (if_running) {
+ rxq = idpf_find_rxq(vport, qid);
+ q_vector = rxq->q_vector;
+
+ qs = idpf_create_queue_list(vport, qid, &num_qs);
+ if (!qs) {
+ err = -ENOMEM;
+ goto xsk_exit;
+ }
+
+ err = idpf_qp_dis(vport, q_vector, qs, num_qs, qid);
+ if (err) {
+ netdev_err(vport->netdev, "Cannot disable queues for XSK setup, error = %d\n",
+ err);
+ goto xsk_pool_if_up;
+ }
+ }
+
+ pool_failure = pool_present ? idpf_xsk_pool_enable(vport, qid) :
+ idpf_xsk_pool_disable(vport, qid);
+
+ if (!idpf_xdp_is_prog_ena(vport))
+ netdev_warn(vport->netdev, "RSS may schedule pkts to q occupied by AF XDP\n");
+
+xsk_pool_if_up:
+ if (if_running) {
+ err = idpf_qp_ena(vport, q_vector, qs, num_qs, qid);
+ if (!err && pool_present)
+ napi_schedule(&rxq->q_vector->napi);
+ else if (err)
+ netdev_err(vport->netdev,
+ "Could not enable queues after XSK setup, error = %d\n",
+ err);
+ kfree(qs);
+ }
+
+ if (pool_failure) {
+ netdev_err(vport->netdev, "Could not %sable BUFF POOL, error = %d\n",
+ pool_present ? "en" : "dis", pool_failure);
+ err = pool_failure;
+ }
+
+xsk_exit:
+ return err;
+}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.h b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
new file mode 100644
index 000000000000..93705900f592
--- /dev/null
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (C) 2023 Intel Corporation */
+
+#ifndef _IDPF_XSK_H_
+#define _IDPF_XSK_H_
+
+#include <linux/types.h>
+
+struct idpf_vport;
+struct xsk_buff_pool;
+
+int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
+ u32 qid);
+
+#endif /* !_IDPF_XSK_H_ */
--
2.43.0


2023-12-23 03:09:31

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 32/34] idpf: implement Tx path for AF_XDP

From: Michal Kubiak <[email protected]>

Implement Tx handling for the AF_XDP feature in zero-copy mode using
the libie XSk infra.
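
In short, the completion queue NAPI path branches to the zero-copy handler,
which first reclaims finished descriptors via the completion queue and then
produces new ones from the XSK Tx ring (condensed from the code below):

  /* in idpf_tx_splitq_clean_all() */
  if (test_bit(__IDPF_Q_XSK, cq->flags))
          clean_complete &= idpf_xmit_zc(cq);

  /* idpf_xmit_zc() per completion queue */
  idpf_clean_xdp_irq_zc(complq);
  for (i = 0; i < xdpq_grp->num_txq; i++)
          result &= idpf_xmit_xdpq_zc(xdpq_grp->txqs[i]);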

Signed-off-by: Michal Kubiak <[email protected]>
Co-developed-by: Alexander Lobakin <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 44 ++-
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 4 +
drivers/net/ethernet/intel/idpf/idpf_xsk.c | 318 ++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xsk.h | 9 +
4 files changed, 361 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index e3f59bbe7c90..5ba880c2bedc 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -3,6 +3,7 @@

#include "idpf.h"
#include "idpf_xdp.h"
+#include "idpf_xsk.h"

/**
* idpf_buf_lifo_push - push a buffer pointer onto stack
@@ -55,30 +56,36 @@ void idpf_tx_timeout(struct net_device *netdev, unsigned int txqueue)
}
}

-/**
- * idpf_tx_buf_rel_all - Free any empty Tx buffers
- * @txq: queue to be cleaned
- */
-static void idpf_tx_buf_rel_all(struct idpf_queue *txq)
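+/**
+ * idpf_tx_buf_clean - Free any remaining Tx buffers on a queue
+ * @txq: queue to be cleaned
+ */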
+static void idpf_tx_buf_clean(struct idpf_queue *txq)
{
struct libie_sq_onstack_stats ss = { };
struct xdp_frame_bulk bq;
- u16 i;
-
- /* Buffers already cleared, nothing to do */
- if (!txq->tx_buf)
- return;

xdp_frame_bulk_init(&bq);
rcu_read_lock();

- /* Free all the Tx buffer sk_buffs */
- for (i = 0; i < txq->desc_count; i++)
+ for (u32 i = 0; i < txq->desc_count; i++)
libie_tx_complete_any(&txq->tx_buf[i], txq->dev, &bq,
&txq->xdp_tx_active, &ss);

xdp_flush_frame_bulk(&bq);
rcu_read_unlock();
+}
+
+/**
+ * idpf_tx_buf_rel_all - Free any empty Tx buffers
+ * @txq: queue to be cleaned
+ */
+static void idpf_tx_buf_rel_all(struct idpf_queue *txq)
+{
+ /* Buffers already cleared, nothing to do */
+ if (!txq->tx_buf)
+ return;
+
+ if (test_bit(__IDPF_Q_XSK, txq->flags))
+ idpf_xsk_clean_xdpq(txq);
+ else
+ idpf_tx_buf_clean(txq);

kfree(txq->tx_buf);
txq->tx_buf = NULL;
@@ -86,7 +93,7 @@ static void idpf_tx_buf_rel_all(struct idpf_queue *txq)
if (!txq->buf_stack.bufs)
return;

- for (i = 0; i < txq->buf_stack.size; i++)
+ for (u32 i = 0; i < txq->buf_stack.size; i++)
kfree(txq->buf_stack.bufs[i]);

kfree(txq->buf_stack.bufs);
@@ -105,6 +112,8 @@ void idpf_tx_desc_rel(struct idpf_queue *txq, bool bufq)
if (bufq)
idpf_tx_buf_rel_all(txq);

+ idpf_xsk_clear_queue(txq);
+
if (!txq->desc_ring)
return;

@@ -196,6 +205,7 @@ static int idpf_tx_buf_alloc_all(struct idpf_queue *tx_q)
*/
int idpf_tx_desc_alloc(struct idpf_queue *tx_q, bool bufq)
{
+ enum virtchnl2_queue_type type;
struct device *dev = tx_q->dev;
u32 desc_sz;
int err;
@@ -228,6 +238,10 @@ int idpf_tx_desc_alloc(struct idpf_queue *tx_q, bool bufq)
tx_q->next_to_clean = 0;
set_bit(__IDPF_Q_GEN_CHK, tx_q->flags);

+ type = bufq ? VIRTCHNL2_QUEUE_TYPE_TX :
+ VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION;
+ idpf_xsk_setup_queue(tx_q, type);
+
return 0;

err_alloc:
@@ -3802,7 +3816,9 @@ static bool idpf_tx_splitq_clean_all(struct idpf_q_vector *q_vec,
for (i = 0; i < num_txq; i++) {
struct idpf_queue *cq = q_vec->tx[i];

- if (!test_bit(__IDPF_Q_XDP, cq->flags))
+ if (test_bit(__IDPF_Q_XSK, cq->flags))
+ clean_complete &= idpf_xmit_zc(cq);
+ else if (!test_bit(__IDPF_Q_XDP, cq->flags))
clean_complete &= idpf_tx_clean_complq(cq,
budget_per_q,
cleaned);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index be396f1e346a..d55ff6aaae2b 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -313,6 +313,7 @@ struct idpf_ptype_state {
* @__IDPF_Q_SW_MARKER: Used to indicate TX queue marker completions
* @__IDPF_Q_POLL_MODE: Enable poll mode
* @__IDPF_Q_FLAGS_NBITS: Must be last
+ * @__IDPF_Q_XSK: Queue used to handle the AF_XDP socket
*/
enum idpf_queue_flags_t {
__IDPF_Q_GEN_CHK,
@@ -321,6 +322,7 @@ enum idpf_queue_flags_t {
__IDPF_Q_SW_MARKER,
__IDPF_Q_POLL_MODE,
__IDPF_Q_XDP,
+ __IDPF_Q_XSK,

__IDPF_Q_FLAGS_NBITS,
};
@@ -574,10 +576,12 @@ struct idpf_queue {
union {
struct page_pool *hdr_pp;
struct idpf_queue **xdpqs;
+ struct xsk_buff_pool *xsk_tx;
};
union {
struct page_pool *pp;
struct device *dev;
+ struct xsk_buff_pool *xsk_rx;
};
union {
union virtchnl2_rx_desc *rx;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.c b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
index 3017680fedb3..6f1870c05948 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xsk.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
@@ -6,6 +6,89 @@
#include "idpf.h"
#include "idpf_xsk.h"

+/**
+ * idpf_xsk_setup_queue - set xsk_pool pointer from netdev to the queue structure
+ * @q: queue to use
+ * @t: type of the queue
+ *
+ * Assigns the XSk pool pointer to the queue structure and sets the
+ * __IDPF_Q_XSK flag if an AF_XDP socket is established for the corresponding
+ * queue id, leaves the queue untouched otherwise.
+ */
+void idpf_xsk_setup_queue(struct idpf_queue *q, enum virtchnl2_queue_type t)
+{
+ struct idpf_vport_user_config_data *cfg_data;
+ struct idpf_vport *vport = q->vport;
+ struct xsk_buff_pool *pool;
+ bool is_rx = false;
+ int qid;
+
+ __clear_bit(__IDPF_Q_XSK, q->flags);
+
+ if (!idpf_xdp_is_prog_ena(q->vport))
+ return;
+
+ switch (t) {
+ case VIRTCHNL2_QUEUE_TYPE_RX:
+ is_rx = true;
+ qid = q->idx;
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_RX_BUFFER:
+ is_rx = true;
+ qid = q->rxq_grp->splitq.rxq_sets[0]->rxq.idx;
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX:
+ qid = q->idx - q->vport->xdp_txq_offset;
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION:
+ qid = q->txq_grp->txqs[0]->idx - q->vport->xdp_txq_offset;
+ break;
+ default:
+ return;
+ }
+
+ if (!is_rx && !test_bit(__IDPF_Q_XDP, q->flags))
+ return;
+
+ cfg_data = &vport->adapter->vport_config[vport->idx]->user_config;
+
+ if (!test_bit(qid, cfg_data->af_xdp_zc_qps))
+ return;
+
+ pool = xsk_get_pool_from_qid(q->vport->netdev, qid);
+
+ if (pool && is_rx && !xsk_buff_can_alloc(pool, 1))
+ return;
+
+ if (is_rx)
+ q->xsk_rx = pool;
+ else
+ q->xsk_tx = pool;
+
+ __set_bit(__IDPF_Q_XSK, q->flags);
+}
+
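+/**
+ * idpf_xsk_clear_queue - remove the XSk pool pointer from the queue structure
+ * @q: queue to clear
+ *
+ * Restores the plain DMA device pointer in place of the XSk pool and clears
+ * the __IDPF_Q_XSK flag, if it was set.
+ */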
+void idpf_xsk_clear_queue(struct idpf_queue *q)
+{
+ struct device *dev;
+
+ if (!__test_and_clear_bit(__IDPF_Q_XSK, q->flags))
+ return;
+
+ switch (q->q_type) {
+ case VIRTCHNL2_QUEUE_TYPE_RX:
+ case VIRTCHNL2_QUEUE_TYPE_RX_BUFFER:
+ dev = q->xsk_rx->dev;
+ q->xsk_rx = NULL;
+ q->dev = dev;
+ break;
+ case VIRTCHNL2_QUEUE_TYPE_TX:
+ case VIRTCHNL2_QUEUE_TYPE_TX_COMPLETION:
+ dev = q->xsk_tx->dev;
+ q->xsk_tx = NULL;
+ q->dev = dev;
+ break;
+ }
+}
+
/**
* idpf_qp_cfg_qs - Configure all queues contained from a given array.
* @vport: vport structure
@@ -95,6 +178,23 @@ idpf_qp_clean_qs(struct idpf_vport *vport, struct idpf_queue **qs, int num_qs)
}
}

+/**
+ * idpf_trigger_sw_intr - trigger a software interrupt
+ * @hw: pointer to the HW structure
+ * @q_vector: interrupt vector to trigger the software interrupt for
+ */
+static void
+idpf_trigger_sw_intr(struct idpf_hw *hw, struct idpf_q_vector *q_vector)
+{
+ struct idpf_intr_reg *intr = &q_vector->intr_reg;
+ u32 val;
+
+ val = intr->dyn_ctl_intena_m | intr->dyn_ctl_itridx_m | /* set no ITR */
+ intr->dyn_ctl_swint_trig_m | intr->dyn_ctl_sw_itridx_ena_m;
+
+ writel(val, intr->dyn_ctl);
+}
+
/**
* idpf_qvec_ena_irq - Enable IRQ for given queue vector
* @q_vector: queue vector
@@ -472,3 +572,221 @@ int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
xsk_exit:
return err;
}
+
+/**
+ * idpf_xsk_clean_xdpq - Clean the XDP Tx queue and complete its XSK frames
+ * @xdpq: XDP_Tx queue
+ */
+void idpf_xsk_clean_xdpq(struct idpf_queue *xdpq)
+{
+ u32 ntc = xdpq->next_to_clean, ntu = xdpq->next_to_use;
+ struct device *dev = xdpq->xsk_tx->dev;
+ struct libie_sq_onstack_stats ss = { };
+ struct xdp_frame_bulk bq;
+ u32 xsk_frames = 0;
+
+ xdp_frame_bulk_init(&bq);
+ rcu_read_lock();
+
+ while (ntc != ntu) {
+ struct libie_tx_buffer *tx_buf = &xdpq->tx_buf[ntc];
+
+ if (tx_buf->type)
+ libie_xdp_complete_tx_buf(tx_buf, dev, false, &bq,
+ &xdpq->xdp_tx_active, &ss);
+ else
+ xsk_frames++;
+
+ if (unlikely(++ntc >= xdpq->desc_count))
+ ntc = 0;
+ }
+
+ xdp_flush_frame_bulk(&bq);
+ rcu_read_unlock();
+
+ if (xsk_frames)
+ xsk_tx_completed(xdpq->xsk_tx, xsk_frames);
+}
+
+/**
+ * idpf_clean_xdp_irq_zc - reclaim completed XSK Tx descriptors via the CQ
+ * @complq: completion queue associated with zero-copy Tx queues
+ *
+ * Returns the number of frames completed on the associated XDP Tx queue.
+ */
+static u32 idpf_clean_xdp_irq_zc(struct idpf_queue *complq)
+{
+ struct idpf_splitq_4b_tx_compl_desc *last_rs_desc;
+ struct device *dev = complq->xsk_tx->dev;
+ struct libie_sq_onstack_stats ss = { };
+ int complq_budget = complq->desc_count;
+ u32 ntc = complq->next_to_clean;
+ struct idpf_queue *xdpq = NULL;
+ struct xdp_frame_bulk bq;
+ u32 done_frames = 0;
+ u32 xsk_frames = 0;
+ u32 tx_ntc, cnt;
+ bool gen_flag;
+ int head, i;
+
+ last_rs_desc = &complq->comp_4b[ntc];
+ gen_flag = test_bit(__IDPF_Q_GEN_CHK, complq->flags);
+
+ do {
+ int ctype = idpf_parse_compl_desc(last_rs_desc, complq,
+ &xdpq, gen_flag);
+
+ if (likely(ctype == IDPF_TXD_COMPLT_RS)) {
+ head = le16_to_cpu(last_rs_desc->q_head_compl_tag.q_head);
+ goto fetch_next_desc;
+ }
+
+ switch (ctype) {
+ case IDPF_TXD_COMPLT_SW_MARKER:
+ idpf_tx_handle_sw_marker(xdpq);
+ break;
+ case -ENODATA:
+ goto clean_xdpq;
+ case -EINVAL:
+ goto fetch_next_desc;
+ default:
+ dev_err(&xdpq->vport->adapter->pdev->dev,
+ "Unsupported completion type for XSK\n");
+ goto fetch_next_desc;
+ }
+
+fetch_next_desc:
+ last_rs_desc++;
+ ntc++;
+ if (unlikely(ntc == complq->desc_count)) {
+ ntc = 0;
+ last_rs_desc = &complq->comp_4b[0];
+ gen_flag = !gen_flag;
+ change_bit(__IDPF_Q_GEN_CHK, complq->flags);
+ }
+ prefetch(last_rs_desc);
+ complq_budget--;
+ } while (likely(complq_budget));
+
+clean_xdpq:
+ complq->next_to_clean = ntc;
+
+ if (!xdpq)
+ return 0;
+
+ cnt = xdpq->desc_count;
+ tx_ntc = xdpq->next_to_clean;
+ done_frames = head >= tx_ntc ? head - tx_ntc :
+ head + cnt - tx_ntc;
+ if (!done_frames)
+ return 0;
+
+ if (likely(!complq->xdp_tx_active))
+ goto xsk;
+
+ xdp_frame_bulk_init(&bq);
+
+ for (i = 0; i < done_frames; i++) {
+ struct libie_tx_buffer *tx_buf = &xdpq->tx_buf[tx_ntc];
+
+ if (tx_buf->type)
+ libie_xdp_complete_tx_buf(tx_buf, dev, true, &bq,
+ &xdpq->xdp_tx_active,
+ &ss);
+ else
+ xsk_frames++;
+
+ if (unlikely(++tx_ntc == cnt))
+ tx_ntc = 0;
+ }
+
+ xdp_flush_frame_bulk(&bq);
+
+xsk:
+ xdpq->next_to_clean += done_frames;
+ if (xdpq->next_to_clean >= cnt)
+ xdpq->next_to_clean -= cnt;
+
+ if (xsk_frames)
+ xsk_tx_completed(xdpq->xsk_tx, xsk_frames);
+
+ return done_frames;
+}
+
+/**
+ * idpf_xsk_xmit_pkt - produce a single HW Tx descriptor out of an AF_XDP one
+ * @desc: AF_XDP descriptor to pull the DMA address and length from
+ * @sq: libie Tx queue to place the HW Tx descriptor on
+ */
+static void idpf_xsk_xmit_pkt(struct libie_xdp_tx_desc desc,
+ const struct libie_xdp_tx_queue *sq)
+{
+ union idpf_tx_flex_desc *tx_desc = sq->desc_ring;
+ struct idpf_tx_splitq_params tx_params = {
+ .dtype = IDPF_TX_DESC_DTYPE_FLEX_L2TAG1_L2TAG2,
+ .eop_cmd = IDPF_TX_DESC_CMD_EOP,
+ };
+
+ tx_desc = &tx_desc[*sq->next_to_use];
+ tx_desc->q.buf_addr = cpu_to_le64(desc.addr);
+
+ idpf_tx_splitq_build_desc(tx_desc, &tx_params,
+ tx_params.eop_cmd | tx_params.offload.td_cmd,
+ desc.len);
+}
+
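+/**
+ * idpf_xsk_xmit_prep - prepare an XDP Tx queue for a libie xmit bulk
+ * @_xdpq: opaque pointer to the XDP Tx queue
+ * @sq: libie Tx queue structure to fill in
+ *
+ * Locks the queue and returns the number of free descriptors on it.
+ */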
+static u32 idpf_xsk_xmit_prep(void *_xdpq, struct libie_xdp_tx_queue *sq)
+{
+ struct idpf_queue *xdpq = _xdpq;
+
+ libie_xdp_sq_lock(&xdpq->xdp_lock);
+
+ *sq = (struct libie_xdp_tx_queue){
+ .dev = xdpq->dev,
+ .tx_buf = xdpq->tx_buf,
+ .desc_ring = xdpq->desc_ring,
+ .xdp_lock = &xdpq->xdp_lock,
+ .next_to_use = &xdpq->next_to_use,
+ .desc_count = xdpq->desc_count,
+ .xdp_tx_active = &xdpq->xdp_tx_active,
+ };
+
+ return IDPF_DESC_UNUSED(xdpq);
+}
+
+/**
+ * idpf_xmit_xdpq_zc - take entries from XSK Tx queue and place them onto HW Tx queue
+ * @xdpq: XDP queue to produce the HW Tx descriptors on
+ *
+ * Returns true if there is no more work that needs to be done, false otherwise
+ */
+static bool idpf_xmit_xdpq_zc(struct idpf_queue *xdpq)
+{
+ u32 budget;
+
+ budget = IDPF_DESC_UNUSED(xdpq);
+ budget = min_t(u32, budget, IDPF_QUEUE_QUARTER(xdpq));
+
+ return libie_xsk_xmit_do_bulk(xdpq, xdpq->xsk_tx, budget,
+ idpf_xsk_xmit_prep, idpf_xsk_xmit_pkt,
+ idpf_xdp_tx_finalize);
+}
+
+/**
+ * idpf_xmit_zc - perform xmit from all XDP queues assigned to the completion queue
+ * @complq: Completion queue associated with one or more XDP queues
+ *
+ * Returns true if there is no more work that needs to be done, false otherwise
+ */
+bool idpf_xmit_zc(struct idpf_queue *complq)
+{
+ struct idpf_txq_group *xdpq_grp = complq->txq_grp;
+ bool result = true;
+ int i;
+
+ idpf_clean_xdp_irq_zc(complq);
+
+ for (i = 0; i < xdpq_grp->num_txq; i++)
+ result &= idpf_xmit_xdpq_zc(xdpq_grp->txqs[i]);
+
+ return result;
+}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.h b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
index 93705900f592..777d6ab7891d 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xsk.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
@@ -6,9 +6,18 @@

#include <linux/types.h>

+enum virtchnl2_queue_type;
+
+struct idpf_queue;
struct idpf_vport;
struct xsk_buff_pool;

+void idpf_xsk_setup_queue(struct idpf_queue *q, enum virtchnl2_queue_type t);
+void idpf_xsk_clear_queue(struct idpf_queue *q);
+
+void idpf_xsk_clean_xdpq(struct idpf_queue *xdpq);
+bool idpf_xmit_zc(struct idpf_queue *complq);
+
int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
u32 qid);

--
2.43.0


2023-12-23 03:09:52

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 33/34] idpf: implement Rx path for AF_XDP

From: Michal Kubiak <[email protected]>

Implement Rx packet processing specific to AF_XDP ZC using the libie
XSk infra. The XDP_PASS case is handled by the generic ZC-to-skb
conversion function.
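
The Rx NAPI dispatch then becomes a per-queue switch (condensed from
idpf_rx_splitq_clean_all() below), with the rest handled by the libie XSk
helpers:

  pkts_cleaned_per_q = test_bit(__IDPF_Q_XSK, rxq->flags) ?
                       idpf_clean_rx_irq_zc(rxq, budget_per_q) :
                       idpf_rx_splitq_clean(rxq, budget_per_q);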

Signed-off-by: Michal Kubiak <[email protected]>
Co-developed-by: Alexander Lobakin <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 36 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.h | 6 +
drivers/net/ethernet/intel/idpf/idpf_xdp.c | 44 ++-
drivers/net/ethernet/intel/idpf/idpf_xsk.c | 347 ++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xsk.h | 4 +
5 files changed, 422 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 5ba880c2bedc..0c78811d65e5 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -409,9 +409,13 @@ void idpf_rx_desc_rel(struct idpf_queue *rxq, bool bufq, s32 q_model)
rxq->xdp.data = NULL;
}

- if (bufq || !idpf_is_queue_model_split(q_model))
+ if (bufq && test_bit(__IDPF_Q_XSK, rxq->flags))
+ idpf_xsk_buf_rel(rxq);
+ else if (bufq || !idpf_is_queue_model_split(q_model))
idpf_rx_buf_rel_all(rxq);

+ idpf_xsk_clear_queue(rxq);
+
rxq->next_to_alloc = 0;
rxq->next_to_clean = 0;
rxq->next_to_use = 0;
@@ -674,6 +678,9 @@ int idpf_rx_bufs_init(struct idpf_queue *rxbufq, enum libie_rx_buf_type type)
};
int ret;

+ if (test_bit(__IDPF_Q_XSK, rxbufq->flags))
+ return idpf_check_alloc_rx_buffers_zc(rxbufq);
+
ret = libie_rx_page_pool_create(&bq, &rxbufq->q_vector->napi);
if (ret)
return ret;
@@ -745,6 +752,7 @@ int idpf_rx_bufs_init_all(struct idpf_vport *vport)
*/
int idpf_rx_desc_alloc(struct idpf_queue *rxq, bool bufq, s32 q_model)
{
+ enum virtchnl2_queue_type type;
struct device *dev = rxq->dev;

if (bufq)
@@ -769,6 +777,9 @@ int idpf_rx_desc_alloc(struct idpf_queue *rxq, bool bufq, s32 q_model)
rxq->next_to_use = 0;
set_bit(__IDPF_Q_GEN_CHK, rxq->flags);

+ type = bufq ? VIRTCHNL2_QUEUE_TYPE_RX_BUFFER : VIRTCHNL2_QUEUE_TYPE_RX;
+ idpf_xsk_setup_queue(rxq, type);
+
return 0;
}

@@ -2788,8 +2799,8 @@ netdev_tx_t idpf_tx_splitq_start(struct sk_buff *skb,
* @rx_desc: Receive descriptor
* @parsed: parsed Rx packet type related fields
*/
-static void idpf_rx_hash(struct idpf_queue *rxq, struct sk_buff *skb,
- struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
+static void idpf_rx_hash(const struct idpf_queue *rxq, struct sk_buff *skb,
+ const struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
struct libie_rx_ptype_parsed parsed)
{
u32 hash;
@@ -2865,7 +2876,7 @@ static void idpf_rx_csum(struct idpf_queue *rxq, struct sk_buff *skb,
* @csum: structure to extract checksum fields
*
**/
-static void idpf_rx_splitq_extract_csum_bits(struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
+static void idpf_rx_splitq_extract_csum_bits(const struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
struct idpf_rx_csum_decoded *csum)
{
u8 qword0, qword1;
@@ -2901,7 +2912,7 @@ static void idpf_rx_splitq_extract_csum_bits(struct virtchnl2_rx_flex_desc_adv_n
* length and packet type.
*/
static int idpf_rx_rsc(struct idpf_queue *rxq, struct sk_buff *skb,
- struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
+ const struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
struct libie_rx_ptype_parsed parsed)
{
u16 rsc_segments, rsc_seg_len;
@@ -2970,9 +2981,8 @@ static int idpf_rx_rsc(struct idpf_queue *rxq, struct sk_buff *skb,
* order to populate the hash, checksum, protocol, and
* other fields within the skb.
*/
-static int idpf_rx_process_skb_fields(struct idpf_queue *rxq,
- struct sk_buff *skb,
- struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc)
+int idpf_rx_process_skb_fields(struct idpf_queue *rxq, struct sk_buff *skb,
+ const struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc)
{
struct idpf_rx_csum_decoded csum_bits = { };
struct libie_rx_ptype_parsed parsed;
@@ -3851,7 +3861,9 @@ static bool idpf_rx_splitq_clean_all(struct idpf_q_vector *q_vec, int budget,
struct idpf_queue *rxq = q_vec->rx[i];
int pkts_cleaned_per_q;

- pkts_cleaned_per_q = idpf_rx_splitq_clean(rxq, budget_per_q);
+ pkts_cleaned_per_q = test_bit(__IDPF_Q_XSK, rxq->flags) ?
+ idpf_clean_rx_irq_zc(rxq, budget_per_q) :
+ idpf_rx_splitq_clean(rxq, budget_per_q);
/* if we clean as many as budgeted, we must not be done */
if (pkts_cleaned_per_q >= budget_per_q)
clean_complete = false;
@@ -3859,8 +3871,10 @@ static bool idpf_rx_splitq_clean_all(struct idpf_q_vector *q_vec, int budget,
}
*cleaned = pkts_cleaned;

- for (i = 0; i < q_vec->num_bufq; i++)
- idpf_rx_clean_refillq_all(q_vec->bufq[i]);
+ for (i = 0; i < q_vec->num_bufq; i++) {
+ if (!test_bit(__IDPF_Q_XSK, q_vec->bufq[i]->flags))
+ idpf_rx_clean_refillq_all(q_vec->bufq[i]);
+ }

return clean_complete;
}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.h b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
index d55ff6aaae2b..bfb867256ad2 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.h
@@ -572,6 +572,7 @@ struct idpf_queue {
struct libie_rx_buffer *hdr_buf;
struct idpf_rx_buf *buf;
} rx_buf;
+ struct xdp_buff **xsk;
};
union {
struct page_pool *hdr_pp;
@@ -951,6 +952,11 @@ netdev_tx_t idpf_tx_singleq_start(struct sk_buff *skb,
struct net_device *netdev);
bool idpf_rx_singleq_buf_hw_alloc_all(struct idpf_queue *rxq,
u16 cleaned_count);
+
+struct virtchnl2_rx_flex_desc_adv_nic_3;
+
+int idpf_rx_process_skb_fields(struct idpf_queue *rxq, struct sk_buff *skb,
+ const struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc);
int idpf_tso(struct sk_buff *skb, struct idpf_tx_offload_params *off);
void idpf_tx_handle_sw_marker(struct idpf_queue *tx_q);
int idpf_rx_desc_alloc(struct idpf_queue *rxq, bool bufq, s32 q_model);
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xdp.c b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
index c20c805583be..de5187192c58 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xdp.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xdp.c
@@ -48,7 +48,6 @@ static int idpf_rxq_for_each(const struct idpf_vport *vport,
static int idpf_xdp_rxq_info_init(struct idpf_queue *rxq, void *arg)
{
const struct idpf_vport *vport = rxq->vport;
- const struct page_pool *pp;
int err;

err = __xdp_rxq_info_reg(&rxq->xdp_rxq, vport->netdev, rxq->idx,
@@ -57,13 +56,28 @@ static int idpf_xdp_rxq_info_init(struct idpf_queue *rxq, void *arg)
if (err)
return err;

- pp = arg ? rxq->rxq_grp->splitq.bufq_sets[0].bufq.pp : rxq->pp;
- xdp_rxq_info_attach_page_pool(&rxq->xdp_rxq, pp);
+ if (test_bit(__IDPF_Q_XSK, rxq->flags)) {
+ err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq,
+ MEM_TYPE_XSK_BUFF_POOL,
+ NULL);
+ } else {
+ const struct page_pool *pp;
+
+ pp = arg ? rxq->rxq_grp->splitq.bufq_sets[0].bufq.pp : rxq->pp;
+ xdp_rxq_info_attach_page_pool(&rxq->xdp_rxq, pp);
+ }
+ if (err)
+ goto unreg;

rxq->xdpqs = &vport->txqs[vport->xdp_txq_offset];
rxq->num_xdp_txq = vport->num_xdp_txq;

return 0;
+
+unreg:
+ xdp_rxq_info_unreg(&rxq->xdp_rxq);
+
+ return err;
}

/**
@@ -90,7 +104,9 @@ static int idpf_xdp_rxq_info_deinit(struct idpf_queue *rxq, void *arg)
rxq->xdpqs = NULL;
rxq->num_xdp_txq = 0;

- xdp_rxq_info_detach_mem_model(&rxq->xdp_rxq);
+ if (!test_bit(__IDPF_Q_XSK, rxq->flags))
+ xdp_rxq_info_detach_mem_model(&rxq->xdp_rxq);
+
xdp_rxq_info_unreg(&rxq->xdp_rxq);

return 0;
@@ -132,6 +148,23 @@ void idpf_copy_xdp_prog_to_qs(const struct idpf_vport *vport,
idpf_rxq_for_each(vport, idpf_xdp_rxq_assign_prog, xdp_prog);
}

+static int idpf_rx_napi_schedule(struct idpf_queue *rxq, void *arg)
+{
+ if (test_bit(__IDPF_Q_XSK, rxq->flags))
+ napi_schedule(&rxq->q_vector->napi);
+
+ return 0;
+}
+
+/**
+ * idpf_vport_rx_napi_schedule - Schedule napi on RX queues from vport
+ * @vport: vport to schedule napi on
+ */
+static void idpf_vport_rx_napi_schedule(const struct idpf_vport *vport)
+{
+ idpf_rxq_for_each(vport, idpf_rx_napi_schedule, NULL);
+}
+
void idpf_vport_xdpq_get(const struct idpf_vport *vport)
{
if (!idpf_xdp_is_prog_ena(vport))
@@ -451,6 +484,9 @@ idpf_xdp_setup_prog(struct idpf_vport *vport, struct bpf_prog *prog,
NL_SET_ERR_MSG_MOD(extack, "Could not re-open the vport after XDP setup\n");
return err;
}
+
+ if (prog)
+ idpf_vport_rx_napi_schedule(vport);
}

return 0;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.c b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
index 6f1870c05948..01231e828f6a 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xsk.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
@@ -573,6 +573,171 @@ int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
return err;
}

+/**
+ * idpf_init_rx_descs_zc - pick buffers from XSK buffer pool and use it
+ * @pool: XSK Buffer pool to pull the buffers from
+ * @xdp: SW ring of xdp_buff that will hold the buffers
+ * @buf_desc: Pointer to buffer descriptors that will be filled
+ * @first_buf_id: ID of the first buffer to be filled
+ * @count: The number of buffers to allocate
+ *
+ * This function allocates a number of Rx buffers from the fill queue
+ * or the internal recycle mechanism and places them on the buffer queue.
+ *
+ * Note that queue wrap should be handled by caller of this function.
+ *
+ * Returns the amount of allocated Rx descriptors
+ */
+static u32 idpf_init_rx_descs_zc(struct xsk_buff_pool *pool,
+ struct xdp_buff **xdp,
+ struct virtchnl2_splitq_rx_buf_desc *buf_desc,
+ u32 first_buf_id,
+ u32 count)
+{
+ dma_addr_t dma;
+ u32 num_buffs;
+ u32 i;
+
+ num_buffs = xsk_buff_alloc_batch(pool, xdp, count);
+ for (i = 0; i < num_buffs; i++) {
+ dma = xsk_buff_xdp_get_dma(*xdp);
+ buf_desc->pkt_addr = cpu_to_le64(dma);
+ buf_desc->qword0.buf_id = cpu_to_le16(i + first_buf_id);
+
+ buf_desc++;
+ xdp++;
+ }
+
+ return num_buffs;
+}
+
+static struct xdp_buff **idpf_get_xdp_buff(const struct idpf_queue *q, u32 idx)
+{
+ return &q->xsk[idx];
+}
+
+/**
+ * __idpf_alloc_rx_buffers_zc - allocate a number of Rx buffers
+ * @rxbufq: buffer queue
+ * @count: The number of buffers to allocate
+ *
+ * Place the @count of descriptors onto buffer queue. Handle the queue wrap
+ * for case where space from next_to_use up to the end of ring is less
+ * than @count. Finally do a tail bump.
+ *
+ * Returns true if all allocations were successful, false if any fail.
+ */
+static bool __idpf_alloc_rx_buffers_zc(struct idpf_queue *rxbufq, u32 count)
+{
+ struct virtchnl2_splitq_rx_buf_desc *buf_desc;
+ u32 nb_buffs_extra = 0, nb_buffs = 0;
+ u32 ntu = rxbufq->next_to_use;
+ u32 total_count = count;
+ struct xdp_buff **xdp;
+
+ buf_desc = &rxbufq->split_buf[ntu];
+ xdp = idpf_get_xdp_buff(rxbufq, ntu);
+
+ if (ntu + count >= rxbufq->desc_count) {
+ nb_buffs_extra = idpf_init_rx_descs_zc(rxbufq->xsk_rx, xdp,
+ buf_desc,
+ ntu,
+ rxbufq->desc_count - ntu);
+ if (nb_buffs_extra != rxbufq->desc_count - ntu) {
+ ntu += nb_buffs_extra;
+ goto exit;
+ }
+ buf_desc = &rxbufq->split_buf[0];
+ xdp = idpf_get_xdp_buff(rxbufq, 0);
+ ntu = 0;
+ count -= nb_buffs_extra;
+ idpf_rx_buf_hw_update(rxbufq, 0);
+
+ if (!count)
+ goto exit;
+ }
+
+ nb_buffs = idpf_init_rx_descs_zc(rxbufq->xsk_rx, xdp,
+ buf_desc, ntu, count);
+
+ ntu += nb_buffs;
+ if (ntu == rxbufq->desc_count)
+ ntu = 0;
+
+exit:
+ if (rxbufq->next_to_use != ntu)
+ idpf_rx_buf_hw_update(rxbufq, ntu);
+
+ rxbufq->next_to_alloc = ntu;
+
+ return total_count == (nb_buffs_extra + nb_buffs);
+}
+
+/**
+ * idpf_alloc_rx_buffers_zc - allocate a number of Rx buffers
+ * @rxbufq: buffer queue
+ * @count: The number of buffers to allocate
+ *
+ * Wrapper for internal allocation routine; figure out how many tail
+ * bumps should take place based on the given threshold
+ *
+ * Returns true if all calls to internal alloc routine succeeded
+ */
+static bool idpf_alloc_rx_buffers_zc(struct idpf_queue *rxbufq, u32 count)
+{
+ u32 rx_thresh = IDPF_QUEUE_QUARTER(rxbufq);
+ u32 leftover, i, tail_bumps;
+
+ tail_bumps = count / rx_thresh;
+ leftover = count - (tail_bumps * rx_thresh);
+
+ for (i = 0; i < tail_bumps; i++)
+ if (!__idpf_alloc_rx_buffers_zc(rxbufq, rx_thresh))
+ return false;
+ return __idpf_alloc_rx_buffers_zc(rxbufq, leftover);
+}
+
+/**
+ * idpf_check_alloc_rx_buffers_zc - allocate a number of Rx buffers with logs
+ * @rxbufq: buffer queue
+ *
+ * Wrapper for internal allocation routine; prints out logs if the
+ * allocation did not go as expected
+ */
+int idpf_check_alloc_rx_buffers_zc(struct idpf_queue *rxbufq)
+{
+ struct net_device *netdev = rxbufq->vport->netdev;
+ struct xsk_buff_pool *pool = rxbufq->xsk_rx;
+ u32 count = IDPF_DESC_UNUSED(rxbufq);
+
+ rxbufq->xsk = kcalloc(rxbufq->desc_count, sizeof(*rxbufq->xsk),
+ GFP_KERNEL);
+ if (!rxbufq->xsk)
+ return -ENOMEM;
+
+ if (!xsk_buff_can_alloc(pool, count)) {
+ netdev_warn(netdev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx queue %d\n",
+ count, rxbufq->idx);
+ netdev_warn(netdev, "Change Rx queue/fill queue size to avoid performance issues\n");
+ }
+
+ if (!idpf_alloc_rx_buffers_zc(rxbufq, count))
+ netdev_warn(netdev, "Failed to allocate some buffers on XSK buffer pool enabled Rx queue %d\n",
+ rxbufq->idx);
+
+ rxbufq->rx_buf_size = xsk_pool_get_rx_frame_size(pool);
+
+ return 0;
+}
+
+void idpf_xsk_buf_rel(struct idpf_queue *rxbufq)
+{
+ rxbufq->rx_buf_size = 0;
+
+ kfree(rxbufq->xsk);
+}
+
/**
* idpf_xsk_clean_xdpq - Clean the XDP Tx queue and its buffer pool queues
* @xdpq: XDP_Tx queue
@@ -711,6 +876,30 @@ static u32 idpf_clean_xdp_irq_zc(struct idpf_queue *complq)
return done_frames;
}

+static u32 idpf_xsk_tx_prep(void *_xdpq, struct libie_xdp_tx_queue *sq)
+{
+ struct idpf_queue *xdpq = _xdpq;
+ u32 free;
+
+ libie_xdp_sq_lock(&xdpq->xdp_lock);
+
+ free = IDPF_DESC_UNUSED(xdpq);
+ if (unlikely(free < IDPF_QUEUE_QUARTER(xdpq)))
+ free += idpf_clean_xdp_irq_zc(xdpq->txq_grp->complq);
+
+ *sq = (struct libie_xdp_tx_queue){
+ .dev = xdpq->dev,
+ .tx_buf = xdpq->tx_buf,
+ .desc_ring = xdpq->desc_ring,
+ .xdp_lock = &xdpq->xdp_lock,
+ .next_to_use = &xdpq->next_to_use,
+ .desc_count = xdpq->desc_count,
+ .xdp_tx_active = &xdpq->xdp_tx_active,
+ };
+
+ return free;
+}
+
/**
* idpf_xmit_pkt - produce a single HW Tx descriptor out of AF_XDP descriptor
* @xdpq: XDP queue to produce the HW Tx descriptor on
@@ -734,6 +923,24 @@ static void idpf_xsk_xmit_pkt(struct libie_xdp_tx_desc desc,
desc.len);
}

+static bool idpf_xsk_tx_flush_bulk(struct libie_xdp_tx_bulk *bq)
+{
+ return libie_xsk_tx_flush_bulk(bq, idpf_xsk_tx_prep,
+ idpf_xsk_xmit_pkt);
+}
+
+static bool idpf_xsk_run_prog(struct xdp_buff *xdp,
+ struct libie_xdp_tx_bulk *bq)
+{
+ return libie_xdp_run_prog(xdp, bq, idpf_xsk_tx_flush_bulk);
+}
+
+static void idpf_xsk_finalize_rx(struct libie_xdp_tx_bulk *bq)
+{
+ if (bq->act_mask >= LIBIE_XDP_TX)
+ libie_xdp_finalize_rx(bq, idpf_xsk_tx_flush_bulk,
+ idpf_xdp_tx_finalize);
+}
static u32 idpf_xsk_xmit_prep(void *_xdpq, struct libie_xdp_tx_queue *sq)
{
struct idpf_queue *xdpq = _xdpq;
@@ -753,6 +960,146 @@ static u32 idpf_xsk_xmit_prep(void *_xdpq, struct libie_xdp_tx_queue *sq)
return IDPF_DESC_UNUSED(xdpq);
}

+static bool
+idpf_xsk_rx_skb(struct xdp_buff *xdp,
+ const struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc,
+ struct idpf_queue *rxq)
+{
+ struct napi_struct *napi = &rxq->q_vector->napi;
+ struct sk_buff *skb;
+
+ skb = xdp_build_skb_from_zc(napi, xdp);
+ if (unlikely(!skb))
+ return false;
+
+ if (unlikely(!idpf_rx_process_skb_fields(rxq, skb, rx_desc))) {
+ kfree_skb(skb);
+ return false;
+ }
+
+ napi_gro_receive(napi, skb);
+
+ return true;
+}
+
+/**
+ * idpf_clean_rx_irq_zc - consumes packets from the hardware queue
+ * @rxq: AF_XDP Rx queue
+ * @budget: NAPI budget
+ *
+ * Returns number of processed packets on success, remaining budget on failure.
+ */
+int idpf_clean_rx_irq_zc(struct idpf_queue *rxq, int budget)
+{
+ int total_rx_bytes = 0, total_rx_pkts = 0;
+ struct idpf_queue *rx_bufq = NULL;
+ u32 ntc = rxq->next_to_clean;
+ struct libie_xdp_tx_bulk bq;
+ bool failure = false;
+ u32 to_refill;
+ u16 buf_id;
+
+ libie_xsk_tx_init_bulk(&bq, rxq->xdp_prog, rxq->xdp_rxq.dev,
+ rxq->xdpqs, rxq->num_xdp_txq);
+
+ while (likely(total_rx_pkts < budget)) {
+ struct virtchnl2_rx_flex_desc_adv_nic_3 *rx_desc;
+ union virtchnl2_rx_desc *desc;
+ unsigned int pkt_len = 0;
+ struct xdp_buff *xdp;
+ u32 bufq_id, xdp_act;
+ u16 gen_id;
+ u8 rxdid;
+
+ desc = &rxq->rx[ntc];
+ rx_desc = (struct virtchnl2_rx_flex_desc_adv_nic_3 *)desc;
+
+ dma_rmb();
+
+ /* if the descriptor isn't done, no work yet to do */
+ gen_id = le16_to_cpu(rx_desc->pktlen_gen_bufq_id);
+ gen_id = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_GEN_M, gen_id);
+
+ if (test_bit(__IDPF_Q_GEN_CHK, rxq->flags) != gen_id)
+ break;
+
+ rxdid = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_RXDID_M,
+ rx_desc->rxdid_ucast);
+ if (rxdid != VIRTCHNL2_RXDID_2_FLEX_SPLITQ) {
+ IDPF_RX_BUMP_NTC(rxq, ntc);
+ u64_stats_update_begin(&rxq->stats_sync);
+ u64_stats_inc(&rxq->q_stats.rx.bad_descs);
+ u64_stats_update_end(&rxq->stats_sync);
+ continue;
+ }
+
+ pkt_len = le16_to_cpu(rx_desc->pktlen_gen_bufq_id);
+ pkt_len = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_LEN_PBUF_M,
+ pkt_len);
+
+ bufq_id = le16_to_cpu(rx_desc->pktlen_gen_bufq_id);
+ bufq_id = FIELD_GET(VIRTCHNL2_RX_FLEX_DESC_ADV_BUFQ_ID_M,
+ bufq_id);
+
+ rx_bufq = &rxq->rxq_grp->splitq.bufq_sets[bufq_id].bufq;
+ buf_id = le16_to_cpu(rx_desc->buf_id);
+
+ xdp = *idpf_get_xdp_buff(rx_bufq, buf_id);
+
+ if (unlikely(!pkt_len)) {
+ xsk_buff_free(xdp);
+ goto next;
+ }
+
+ xsk_buff_set_size(xdp, pkt_len);
+ xsk_buff_dma_sync_for_cpu(xdp, rxq->xsk_rx);
+
+ xdp_act = idpf_xsk_run_prog(xdp, &bq);
+ if ((xdp_act == LIBIE_XDP_PASS &&
+ unlikely(!idpf_xsk_rx_skb(xdp, rx_desc, rxq))) ||
+ unlikely(xdp_act == LIBIE_XDP_ABORTED)) {
+ failure = true;
+ break;
+ }
+
+ total_rx_bytes += pkt_len;
+ total_rx_pkts++;
+
+next:
+ IDPF_RX_BUMP_NTC(rxq, ntc);
+ }
+
+ rxq->next_to_clean = ntc;
+ idpf_xsk_finalize_rx(&bq);
+
+ u64_stats_update_begin(&rxq->stats_sync);
+ u64_stats_add(&rxq->q_stats.rx.packets, total_rx_pkts);
+ u64_stats_add(&rxq->q_stats.rx.bytes, total_rx_bytes);
+ u64_stats_update_end(&rxq->stats_sync);
+
+ if (!rx_bufq)
+ goto skip_refill;
+
+ IDPF_RX_BUMP_NTC(rx_bufq, buf_id);
+ rx_bufq->next_to_clean = buf_id;
+
+ to_refill = IDPF_DESC_UNUSED(rx_bufq);
+ if (to_refill > IDPF_QUEUE_QUARTER(rx_bufq))
+ failure |= !idpf_alloc_rx_buffers_zc(rx_bufq, to_refill);
+
+skip_refill:
+ if (xsk_uses_need_wakeup(rxq->xsk_rx)) {
+ if (failure || rxq->next_to_clean == rxq->next_to_use)
+ xsk_set_rx_need_wakeup(rxq->xsk_rx);
+ else
+ xsk_clear_rx_need_wakeup(rxq->xsk_rx);
+
+ return total_rx_pkts;
+ }
+
+ return unlikely(failure) ? budget : total_rx_pkts;
+}
+
/**
* idpf_xmit_xdpq_zc - take entries from XSK Tx queue and place them onto HW Tx queue
* @xdpq: XDP queue to produce the HW Tx descriptors on
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.h b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
index 777d6ab7891d..51ddf2e36577 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xsk.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
@@ -15,7 +15,11 @@ struct xsk_buff_pool;
void idpf_xsk_setup_queue(struct idpf_queue *q, enum virtchnl2_queue_type t);
void idpf_xsk_clear_queue(struct idpf_queue *q);

+int idpf_check_alloc_rx_buffers_zc(struct idpf_queue *rxbufq);
+void idpf_xsk_buf_rel(struct idpf_queue *rxbufq);
void idpf_xsk_clean_xdpq(struct idpf_queue *xdpq);
+
+int idpf_clean_rx_irq_zc(struct idpf_queue *rxq, int budget);
bool idpf_xmit_zc(struct idpf_queue *complq);

int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
--
2.43.0


2023-12-23 03:11:40

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next 34/34] idpf: enable XSk features and ndo_xsk_wakeup

From: Michal Kubiak <[email protected]>

Now that the AF_XDP functionality is fully implemented, advertise the XSk
zero-copy XDP feature and add the .ndo_xsk_wakeup() callback so that
AF_XDP sockets can be used with this driver.

Signed-off-by: Michal Kubiak <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/idpf/idpf_lib.c | 5 ++-
drivers/net/ethernet/intel/idpf/idpf_xsk.c | 42 ++++++++++++++++++++++
drivers/net/ethernet/intel/idpf/idpf_xsk.h | 2 ++
3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c
index 7c3d45f84e1b..af4f708b82f3 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_lib.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c
@@ -3,6 +3,7 @@

#include "idpf.h"
#include "idpf_xdp.h"
+#include "idpf_xsk.h"

static const struct net_device_ops idpf_netdev_ops_splitq;
static const struct net_device_ops idpf_netdev_ops_singleq;
@@ -844,7 +845,8 @@ static int idpf_cfg_netdev(struct idpf_vport *vport)
if (idpf_is_queue_model_split(vport->rxq_model))
xdp_set_features_flag(netdev, NETDEV_XDP_ACT_BASIC |
NETDEV_XDP_ACT_REDIRECT |
- NETDEV_XDP_ACT_RX_SG);
+ NETDEV_XDP_ACT_RX_SG |
+ NETDEV_XDP_ACT_XSK_ZEROCOPY);

idpf_set_ethtool_ops(netdev);
SET_NETDEV_DEV(netdev, &adapter->pdev->dev);
@@ -2452,6 +2454,7 @@ static const struct net_device_ops idpf_netdev_ops_splitq = {
.ndo_tx_timeout = idpf_tx_timeout,
.ndo_bpf = idpf_xdp,
.ndo_xdp_xmit = idpf_xdp_xmit,
+ .ndo_xsk_wakeup = idpf_xsk_wakeup,
};

static const struct net_device_ops idpf_netdev_ops_singleq = {
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.c b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
index 01231e828f6a..aff37c6a5adb 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xsk.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.c
@@ -1137,3 +1137,45 @@ bool idpf_xmit_zc(struct idpf_queue *complq)

return result;
}
+
+/**
+ * idpf_xsk_wakeup - Implements ndo_xsk_wakeup
+ * @netdev: net_device
+ * @qid: queue to wake up
+ * @flags: ignored in our case, since we have Rx and Tx in the same NAPI
+ *
+ * Returns negative on error, zero otherwise.
+ */
+int idpf_xsk_wakeup(struct net_device *netdev, u32 qid, u32 flags)
+{
+ struct idpf_netdev_priv *np = netdev_priv(netdev);
+ struct idpf_vport *vport = np->vport;
+ struct idpf_q_vector *q_vector;
+ struct idpf_queue *q;
+ int idx;
+
+ if (idpf_vport_ctrl_is_locked(netdev))
+ return -EBUSY;
+
+ if (unlikely(!vport->link_up))
+ return -ENETDOWN;
+
+ if (unlikely(!idpf_xdp_is_prog_ena(vport)))
+ return -ENXIO;
+
+ idx = qid + vport->xdp_txq_offset;
+
+ if (unlikely(idx >= vport->num_txq))
+ return -ENXIO;
+
+ if (unlikely(!test_bit(__IDPF_Q_XSK, vport->txqs[idx]->flags)))
+ return -ENXIO;
+
+ q = vport->txqs[idx];
+ q_vector = q->txq_grp->complq->q_vector;
+
+ if (!napi_if_scheduled_mark_missed(&q_vector->napi))
+ idpf_trigger_sw_intr(&vport->adapter->hw, q_vector);
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/intel/idpf/idpf_xsk.h b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
index 51ddf2e36577..446ca971f37e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_xsk.h
+++ b/drivers/net/ethernet/intel/idpf/idpf_xsk.h
@@ -10,6 +10,7 @@ enum virtchnl2_queue_type;

struct idpf_queue;
struct idpf_vport;
+struct net_device;
struct xsk_buff_pool;

void idpf_xsk_setup_queue(struct idpf_queue *q, enum virtchnl2_queue_type t);
@@ -24,5 +25,6 @@ bool idpf_xmit_zc(struct idpf_queue *complq);

int idpf_xsk_pool_setup(struct idpf_vport *vport, struct xsk_buff_pool *pool,
u32 qid);
+int idpf_xsk_wakeup(struct net_device *netdev, u32 qid, u32 flags);

#endif /* !_IDPF_XSK_H_ */
--
2.43.0


2023-12-26 20:23:55

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 00/34] Christmas 3-serie XDP for idpf (+generic stuff)

Alexander Lobakin wrote:
> I was highly asked to send this WIP before the holidays to trigger
> some discussions at least for the generic parts.
>
> This all depends on libie[0] and WB-on-ITR fix[1]. The RFC does not
> guarantee to work perfectly, but at least regular XDP seems to work
> for me...
>
> In fact, here are 3 separate series:
> * 01-08: convert idpf to libie and make it more sane;
> * 09-25: add XDP to idpf;
> * 26-34: add XSk to idpf.
>
> Most people may want to be interested only in the following generic
> changes:
> * 11: allow attaching already registered memory models to XDP RxQ info;
> * 12-13: generic helpers for adding a frag to &xdp_buff and converting
> it to an skb;
> * 14: get rid of xdp_frame::mem.id, allow mixing pages from different
> page_pools within one &xdp_buff/&xdp_frame;
> * 15: some Page Pool helper;
> * 18: it's for libie, but I wanted to talk about XDP_TX bulking;
> * 26: same as 13, but for converting XSK &xdp_buff to skb.
>
> The rest is up to you, driver-specific stuff is pretty boring sometimes.
>
> I'll be polishing and finishing this all starting January 3rd and then
> preparing and sending sane series, some early feedback never hurts tho.
>
> Merry Yule!
>
> [0] https://lore.kernel.org/netdev/[email protected]
> [1] https://lore.kernel.org/netdev/[email protected]

This is great. Thanks for sharing the entire series.

Which SHA1 should we apply this to? I'm having a hard time applying
cleanly.

The libie v7 series applied cleanly on bc044ae9d64b. Which I chose
only based on the follow-on page pool patch.

But that base commit causes too many conflicts when applying this.
Patch 6 had a trivial one in idpf_rx_singleq_clean (`skb = rx_q->skb`).
But patch 14 has so many conflicts in page_pool.c that I'm clearly
on the wrong track trying to fix up manually.




2023-12-27 15:31:04

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 05/34] idpf: convert header split mode to libie + napi_build_skb()

Alexander Lobakin wrote:
> Currently, idpf uses the following model for the header buffers:
>
> * buffers are allocated via dma_alloc_coherent();
> * when receiving, napi_alloc_skb() is called and then the header is
> copied to the newly allocated linear part.
>
> This is far from optimal as DMA coherent zone is slow on many systems
> and memcpy() neutralizes the idea and benefits of the header split.

Do you have data showing this?

The assumption for the current model is that the headers will be
touched shortly after, so the copy just primes the cache.

The single coherently allocated region for all headers reduces
IOTLB pressure.

It is possible that the alternative model is faster. But that is not
trivially obvious.

I think patches like this can stand on their own. Probably best to
leave them out of the dependency series to enable XDP and AF_XDP.

> Instead, use libie to create page_pools for the header buffers, allocate
> them dynamically and then build an skb via napi_build_skb() around them
> with no memory copy. With one exception...
> When you enable header split, you expect you'll always have a separate
> header buffer, so that you could reserve headroom and tailroom only
> there and then use full buffers for the data. For example, this is how
> TCP zerocopy works -- you have to have the payload aligned to PAGE_SIZE.
> The current hardware running idpf does *not* guarantee that you'll
> always have headers placed separately. For example, on my setup, even
> ICMP packets are written as one piece to the data buffers. You can't
> build a valid skb around a data buffer in this case.
> To not complicate things and not lose TCP zerocopy etc., when such thing
> happens, use the empty header buffer and pull either full frame (if it's
> short) or the Ethernet header there and build an skb around it. GRO
> layer will pull more from the data buffer later. This W/A will hopefully
> be removed one day.
>
> Signed-off-by: Alexander Lobakin <[email protected]>

2023-12-27 15:43:47

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 01/34] idpf: reuse libie's definitions of parsed ptype structures

Alexander Lobakin wrote:
> idpf's in-kernel parsed ptype structure is almost identical to the one
> used in the previous Intel drivers, which means it can be converted to
> use libie's definitions and even helpers. The only difference is that
> it doesn't use a constant table, but rather one obtained from the
> device.
> Remove the driver counterpart and use libie's helpers for hashes and
> checksums. This slightly optimizes skb fields processing due to faster
> checks.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
> drivers/net/ethernet/intel/Kconfig | 1 +
> drivers/net/ethernet/intel/idpf/idpf.h | 2 +-
> drivers/net/ethernet/intel/idpf/idpf_main.c | 1 +
> .../ethernet/intel/idpf/idpf_singleq_txrx.c | 87 +++++++--------
> drivers/net/ethernet/intel/idpf/idpf_txrx.c | 101 ++++++------------
> drivers/net/ethernet/intel/idpf/idpf_txrx.h | 88 +--------------
> .../net/ethernet/intel/idpf/idpf_virtchnl.c | 54 ++++++----
> 7 files changed, 110 insertions(+), 224 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
> index c7da7d05d93e..0db1aa36866e 100644
> --- a/drivers/net/ethernet/intel/Kconfig
> +++ b/drivers/net/ethernet/intel/Kconfig
> @@ -378,6 +378,7 @@ config IDPF
> tristate "Intel(R) Infrastructure Data Path Function Support"
> depends on PCI_MSI
> select DIMLIB
> + select LIBIE
> select PAGE_POOL
> select PAGE_POOL_STATS
> help
> diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h
> index 0acc125decb3..8342df0f4f3d 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf.h
> +++ b/drivers/net/ethernet/intel/idpf/idpf.h
> @@ -385,7 +385,7 @@ struct idpf_vport {
> u16 num_rxq_grp;
> struct idpf_rxq_group *rxq_grps;
> u32 rxq_model;
> - struct idpf_rx_ptype_decoded rx_ptype_lkup[IDPF_RX_MAX_PTYPE];
> + struct libie_rx_ptype_parsed rx_ptype_lkup[IDPF_RX_MAX_PTYPE];
>
> struct idpf_adapter *adapter;
> struct net_device *netdev;
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_main.c b/drivers/net/ethernet/intel/idpf/idpf_main.c
> index e1febc74cefd..6471158e6f6b 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_main.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_main.c
> @@ -7,6 +7,7 @@
> #define DRV_SUMMARY "Intel(R) Infrastructure Data Path Function Linux Driver"
>
> MODULE_DESCRIPTION(DRV_SUMMARY);
> +MODULE_IMPORT_NS(LIBIE);
> MODULE_LICENSE("GPL");
>
> /**
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
> index 8122a0cc97de..e58e08c9997d 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_singleq_txrx.c
> @@ -636,75 +636,64 @@ static bool idpf_rx_singleq_is_non_eop(struct idpf_queue *rxq,
> * @rxq: Rx ring being processed
> * @skb: skb currently being received and modified
> * @csum_bits: checksum bits from descriptor
> - * @ptype: the packet type decoded by hardware
> + * @parsed: the packet type parsed by hardware
> *
> * skb->protocol must be set before this function is called
> */
> static void idpf_rx_singleq_csum(struct idpf_queue *rxq, struct sk_buff *skb,
> - struct idpf_rx_csum_decoded *csum_bits,
> - u16 ptype)
> + struct idpf_rx_csum_decoded csum_bits,
> + struct libie_rx_ptype_parsed parsed)
> {
> - struct idpf_rx_ptype_decoded decoded;
> bool ipv4, ipv6;
>
> /* check if Rx checksum is enabled */
> - if (unlikely(!(rxq->vport->netdev->features & NETIF_F_RXCSUM)))
> + if (!libie_has_rx_checksum(rxq->vport->netdev, parsed))
> return;
>
> /* check if HW has decoded the packet and checksum */
> - if (unlikely(!(csum_bits->l3l4p)))
> + if (unlikely(!csum_bits.l3l4p))
> return;
>
> - decoded = rxq->vport->rx_ptype_lkup[ptype];
> - if (unlikely(!(decoded.known && decoded.outer_ip)))
> + if (unlikely(parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_L2))
> return;
>
> - ipv4 = IDPF_RX_PTYPE_TO_IPV(&decoded, IDPF_RX_PTYPE_OUTER_IPV4);
> - ipv6 = IDPF_RX_PTYPE_TO_IPV(&decoded, IDPF_RX_PTYPE_OUTER_IPV6);
> + ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
> + ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;
>
> /* Check if there were any checksum errors */
> - if (unlikely(ipv4 && (csum_bits->ipe || csum_bits->eipe)))
> + if (unlikely(ipv4 && (csum_bits.ipe || csum_bits.eipe)))
> goto checksum_fail;
>
> /* Device could not do any checksum offload for certain extension
> * headers as indicated by setting IPV6EXADD bit
> */
> - if (unlikely(ipv6 && csum_bits->ipv6exadd))
> + if (unlikely(ipv6 && csum_bits.ipv6exadd))
> return;
>
> /* check for L4 errors and handle packets that were not able to be
> * checksummed due to arrival speed
> */
> - if (unlikely(csum_bits->l4e))
> + if (unlikely(csum_bits.l4e))
> goto checksum_fail;
>
> - if (unlikely(csum_bits->nat && csum_bits->eudpe))
> + if (unlikely(csum_bits.nat && csum_bits.eudpe))
> goto checksum_fail;
>
> /* Handle packets that were not able to be checksummed due to arrival
> * speed, in this case the stack can compute the csum.
> */
> - if (unlikely(csum_bits->pprs))
> + if (unlikely(csum_bits.pprs))
> return;
>
> /* If there is an outer header present that might contain a checksum
> * we need to bump the checksum level by 1 to reflect the fact that
> * we are indicating we validated the inner checksum.
> */
> - if (decoded.tunnel_type >= IDPF_RX_PTYPE_TUNNEL_IP_GRENAT)
> + if (parsed.tunnel_type >= LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT)
> skb->csum_level = 1;
>
> - /* Only report checksum unnecessary for ICMP, TCP, UDP, or SCTP */
> - switch (decoded.inner_prot) {
> - case IDPF_RX_PTYPE_INNER_PROT_ICMP:
> - case IDPF_RX_PTYPE_INNER_PROT_TCP:
> - case IDPF_RX_PTYPE_INNER_PROT_UDP:
> - case IDPF_RX_PTYPE_INNER_PROT_SCTP:
> - skb->ip_summed = CHECKSUM_UNNECESSARY;
> - return;
> - default:
> - return;
> - }
> + skb->ip_summed = CHECKSUM_UNNECESSARY;
> + return;

Is it intentional to change from CHECKSUM_NONE to CHECKSUM_UNNECESSARY
in the default case?

I suppose so, as idpf_rx_csum (the splitq equivalent) does the same
(bar CHECKSUM_COMPLETE depending on descriptor bit).

2024-01-08 16:01:39

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 00/34] Christmas 3-serie XDP for idpf (+generic stuff)

From: Willem De Bruijn <[email protected]>
Date: Tue, 26 Dec 2023 15:23:41 -0500

> Alexander Lobakin wrote:
>> I was highly asked to send this WIP before the holidays to trigger
>> some discussions at least for the generic parts.
>>
>> This all depends on libie[0] and WB-on-ITR fix[1]. The RFC does not
>> guarantee to work perfectly, but at least regular XDP seems to work
>> for me...
>>
>> In fact, here are 3 separate series:
>> * 01-08: convert idpf to libie and make it more sane;
>> * 09-25: add XDP to idpf;
>> * 26-34: add XSk to idpf.
>>
>> Most people may want to be interested only in the following generic
>> changes:
>> * 11: allow attaching already registered memory models to XDP RxQ info;
>> * 12-13: generic helpers for adding a frag to &xdp_buff and converting
>> it to an skb;
>> * 14: get rid of xdp_frame::mem.id, allow mixing pages from different
>> page_pools within one &xdp_buff/&xdp_frame;
>> * 15: some Page Pool helper;
>> * 18: it's for libie, but I wanted to talk about XDP_TX bulking;
>> * 26: same as 13, but for converting XSK &xdp_buff to skb.
>>
>> The rest is up to you, driver-specific stuff is pretty boring sometimes.
>>
>> I'll be polishing and finishing this all starting January 3rd and then
>> preparing and sending sane series, some early feedback never hurts tho.
>>
>> Merry Yule!
>>
>> [0] https://lore.kernel.org/netdev/[email protected]
>> [1] https://lore.kernel.org/netdev/[email protected]
>
> This is great. Thanks for sharing the entire series.
>
> Which SHA1 should we apply this to? I'm having a hard time applying
> cleanly.
>
> The libie v7 series applied cleanly on bc044ae9d64b. Which I chose
> only based on the follow-on page pool patch.
>
> But that base commit causes too many conflicts when applying this.
> Patch 6 had a trivial one in idpf_rx_singleq_clean (`skb = rx_q->skb`).
> But patch 14 has so many conflicts in page_pool.c that I'm clearly
> on the wrong track trying to fix up manually.

net-next was updated while I was preparing the series. I also did a
couple changes in the basic libie code, but a new rev wasn't sent.
Please just use my open GH[0].

[0] https://github.com/alobakin/linux/tree/idpf-libie

Thanks,
Olek

2024-01-08 16:04:42

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 01/34] idpf: reuse libie's definitions of parsed ptype structures

From: Willem De Bruijn <[email protected]>
Date: Wed, 27 Dec 2023 10:43:34 -0500

> Alexander Lobakin wrote:
>> idpf's in-kernel parsed ptype structure is almost identical to the one
>> used in the previous Intel drivers, which means it can be converted to
>> use libie's definitions and even helpers. The only difference is that
>> it doesn't use a constant table, but rather one obtained from the
>> device.

[...]

>> static void idpf_rx_singleq_csum(struct idpf_queue *rxq, struct sk_buff *skb,
>> - struct idpf_rx_csum_decoded *csum_bits,
>> - u16 ptype)
>> + struct idpf_rx_csum_decoded csum_bits,
>> + struct libie_rx_ptype_parsed parsed)
>> {
>> - struct idpf_rx_ptype_decoded decoded;
>> bool ipv4, ipv6;
>>
>> /* check if Rx checksum is enabled */
>> - if (unlikely(!(rxq->vport->netdev->features & NETIF_F_RXCSUM)))
>> + if (!libie_has_rx_checksum(rxq->vport->netdev, parsed))
>> return;
>>
>> /* check if HW has decoded the packet and checksum */
>> - if (unlikely(!(csum_bits->l3l4p)))
>> + if (unlikely(!csum_bits.l3l4p))
>> return;
>>
>> - decoded = rxq->vport->rx_ptype_lkup[ptype];
>> - if (unlikely(!(decoded.known && decoded.outer_ip)))
>> + if (unlikely(parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_L2))
>> return;
>>
>> - ipv4 = IDPF_RX_PTYPE_TO_IPV(&decoded, IDPF_RX_PTYPE_OUTER_IPV4);
>> - ipv6 = IDPF_RX_PTYPE_TO_IPV(&decoded, IDPF_RX_PTYPE_OUTER_IPV6);
>> + ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
>> + ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;
>>
>> /* Check if there were any checksum errors */
>> - if (unlikely(ipv4 && (csum_bits->ipe || csum_bits->eipe)))
>> + if (unlikely(ipv4 && (csum_bits.ipe || csum_bits.eipe)))
>> goto checksum_fail;
>>
>> /* Device could not do any checksum offload for certain extension
>> * headers as indicated by setting IPV6EXADD bit
>> */
>> - if (unlikely(ipv6 && csum_bits->ipv6exadd))
>> + if (unlikely(ipv6 && csum_bits.ipv6exadd))
>> return;
>>
>> /* check for L4 errors and handle packets that were not able to be
>> * checksummed due to arrival speed
>> */
>> - if (unlikely(csum_bits->l4e))
>> + if (unlikely(csum_bits.l4e))
>> goto checksum_fail;
>>
>> - if (unlikely(csum_bits->nat && csum_bits->eudpe))
>> + if (unlikely(csum_bits.nat && csum_bits.eudpe))
>> goto checksum_fail;
>>
>> /* Handle packets that were not able to be checksummed due to arrival
>> * speed, in this case the stack can compute the csum.
>> */
>> - if (unlikely(csum_bits->pprs))
>> + if (unlikely(csum_bits.pprs))
>> return;
>>
>> /* If there is an outer header present that might contain a checksum
>> * we need to bump the checksum level by 1 to reflect the fact that
>> * we are indicating we validated the inner checksum.
>> */
>> - if (decoded.tunnel_type >= IDPF_RX_PTYPE_TUNNEL_IP_GRENAT)
>> + if (parsed.tunnel_type >= LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT)
>> skb->csum_level = 1;
>>
>> - /* Only report checksum unnecessary for ICMP, TCP, UDP, or SCTP */
>> - switch (decoded.inner_prot) {
>> - case IDPF_RX_PTYPE_INNER_PROT_ICMP:
>> - case IDPF_RX_PTYPE_INNER_PROT_TCP:
>> - case IDPF_RX_PTYPE_INNER_PROT_UDP:
>> - case IDPF_RX_PTYPE_INNER_PROT_SCTP:
>> - skb->ip_summed = CHECKSUM_UNNECESSARY;
>> - return;
>> - default:
>> - return;
>> - }
>> + skb->ip_summed = CHECKSUM_UNNECESSARY;
>> + return;
>
> Is it intentional to change from CHECKSUM_NONE to CHECKSUM_UNNECESSARY
> in the default case?

The basic logic wasn't changed. libie_has_rx_checksum() checks if the
protocol can be checksummed by HW at the beginning of the function
instead of the end (why calculate and check all this if the proto is not
supported?).
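
To make it concrete, here is roughly the shape of that early gate (this is
not the actual libie code -- only the libie_has_rx_checksum(netdev, parsed)
signature comes from the patch; the field and enum names below are
placeholders):

/* Illustrative sketch only, not the libie implementation. The netdev
 * feature and the parsed inner protocol are checked up front, so the
 * old "default: return" case never reaches the code that sets
 * skb->ip_summed in the first place.
 */
static inline bool libie_has_rx_checksum(const struct net_device *dev,
                                         struct libie_rx_ptype_parsed parsed)
{
        if (!(dev->features & NETIF_F_RXCSUM))
                return false;

        /* placeholder names for the parsed inner protocol */
        switch (parsed.inner_prot) {
        case LIBIE_RX_PTYPE_INNER_ICMP:
        case LIBIE_RX_PTYPE_INNER_TCP:
        case LIBIE_RX_PTYPE_INNER_UDP:
        case LIBIE_RX_PTYPE_INNER_SCTP:
                return true;
        default:
                return false;
        }
}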

>
> I suppose so, as idpf_rx_csum (the splitq equivalent) does the same
> (bar CHECKSUM_COMPLETE depending on descriptor bit).

Thanks,
Olek

2024-01-08 16:10:10

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 00/34] Christmas 3-serie XDP for idpf (+generic stuff)

Alexander Lobakin wrote:
> From: Willem De Bruijn <[email protected]>
> Date: Tue, 26 Dec 2023 15:23:41 -0500
>
> > Alexander Lobakin wrote:
> >> I was highly asked to send this WIP before the holidays to trigger
> >> some discussions at least for the generic parts.
> >>
> >> This all depends on libie[0] and WB-on-ITR fix[1]. The RFC does not
> >> guarantee to work perfectly, but at least regular XDP seems to work
> >> for me...
> >>
> >> In fact, here are 3 separate series:
> >> * 01-08: convert idpf to libie and make it more sane;
> >> * 09-25: add XDP to idpf;
> >> * 26-34: add XSk to idpf.
> >>
> >> Most people may want to be interested only in the following generic
> >> changes:
> >> * 11: allow attaching already registered memory models to XDP RxQ info;
> >> * 12-13: generic helpers for adding a frag to &xdp_buff and converting
> >> it to an skb;
> >> * 14: get rid of xdp_frame::mem.id, allow mixing pages from different
> >> page_pools within one &xdp_buff/&xdp_frame;
> >> * 15: some Page Pool helper;
> >> * 18: it's for libie, but I wanted to talk about XDP_TX bulking;
> >> * 26: same as 13, but for converting XSK &xdp_buff to skb.
> >>
> >> The rest is up to you, driver-specific stuff is pretty boring sometimes.
> >>
> >> I'll be polishing and finishing this all starting January 3rd and then
> >> preparing and sending sane series, some early feedback never hurts tho.
> >>
> >> Merry Yule!
> >>
> >> [0] https://lore.kernel.org/netdev/[email protected]
> >> [1] https://lore.kernel.org/netdev/[email protected]
> >
> > This is great. Thanks for sharing the entire series.
> >
> > Which SHA1 should we apply this to? I'm having a hard time applying
> > cleanly.
> >
> > The libie v7 series applied cleanly on bc044ae9d64b. Which I chose
> > only based on the follow-on page pool patch.
> >
> > But that base commit causes too many conflicts when applying this.
> > Patch 6 had a trivial one in idpf_rx_singleq_clean (`skb = rx_q->skb`).
> > But patch 14 has so many conflicts in page_pool.c that I'm clearly
> > on the wrong track trying to fix up manually.
>
> net-next was updated while I was preparing the series. I also did a
> couple changes in the basic libie code, but a new rev wasn't sent.
> Please just use my open GH[0].
>
> [0] https://github.com/alobakin/linux/tree/idpf-libie

Even better, thanks. I'll use that to run my basic XSK tests.

2024-01-08 16:18:19

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 05/34] idpf: convert header split mode to libie + napi_build_skb()

From: Willem De Bruijn <[email protected]>
Date: Wed, 27 Dec 2023 10:30:48 -0500

> Alexander Lobakin wrote:
>> Currently, idpf uses the following model for the header buffers:
>>
>> * buffers are allocated via dma_alloc_coherent();
>> * when receiving, napi_alloc_skb() is called and then the header is
>> copied to the newly allocated linear part.
>>
>> This is far from optimal as DMA coherent zone is slow on many systems
>> and memcpy() neutralizes the idea and benefits of the header split.
>
> Do you have data showing this?

Showing slow coherent DMA or memcpy()?
Try MIPS for the first one.
For the second -- try comparing performance on ice with the "legacy-rx"
private flag disabled and enabled.

>
> The assumption for the current model is that the headers will be
> touched shortly after, so the copy just primes the cache.

They won't be touched in many cases. E.g. XDP_DROP.
Or headers can be long. memcpy(32) != memcpy(128).
The current model allocates a new skb with a linear part, which is a
real memory allocation. napi_build_skb() doesn't allocate anything
except struct sk_buff, which is usually available in the NAPI percpu cache.
If build_skb() wasn't more effective, it wouldn't be introduced.
The current model just assumes default socket traffic with ~40-byte
headers and no XDP etc.
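
To illustrate the difference, a simplified sketch (not the driver code;
the buffer and size variables are made up, only napi_alloc_skb(),
skb_put_data(), napi_build_skb(), skb_reserve(), __skb_put() and
skb_mark_for_recycle() are the real kernel APIs):

#include <linux/skbuff.h>

/* current model: allocate a linear skb and copy the header into it */
static struct sk_buff *hdr_copy_model(struct napi_struct *napi,
                                      const void *hdr, u32 hdr_len)
{
        struct sk_buff *skb = napi_alloc_skb(napi, hdr_len);

        if (!skb)
                return NULL;

        skb_put_data(skb, hdr, hdr_len);        /* the memcpy() in question */

        return skb;
}

/* proposed model: wrap the page_pool header buffer, no copy at all */
static struct sk_buff *hdr_build_model(void *va, u32 offset, u32 hdr_len,
                                       u32 truesize)
{
        struct sk_buff *skb = napi_build_skb(va, truesize);

        if (!skb)
                return NULL;

        skb_reserve(skb, offset);       /* headroom inside the PP buffer */
        __skb_put(skb, hdr_len);
        skb_mark_for_recycle(skb);      /* recycle the page via page_pool */

        return skb;
}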

>
> The single coherently allocated region for all headers reduces
> IOTLB pressure.

page_pool pages are mapped once at allocation.
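
For reference, the pattern looks like this (simplified; the parameter
values are illustrative, not what libie actually uses):

#include <net/page_pool/helpers.h>

/* Illustrative sketch. With PP_FLAG_DMA_MAP, each page is DMA-mapped
 * once when it enters the pool and stays mapped while it's recycled,
 * so refilling the Rx ring doesn't create new mappings; with
 * PP_FLAG_DMA_SYNC_DEV, the pool also syncs recycled buffers for
 * device up to max_len.
 */
static struct page_pool *hdr_pool_create(struct device *dev, u32 size)
{
        struct page_pool_params pp = {
                .flags          = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                .order          = 0,
                .pool_size      = size,
                .nid            = NUMA_NO_NODE,
                .dev            = dev,
                .dma_dir        = DMA_FROM_DEVICE,
                .max_len        = PAGE_SIZE,
                .offset         = 0,
        };

        return page_pool_create(&pp);
}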

>
> It is possible that the alternative model is faster. But that is not
> trivially obvious.
>
> I think patches like this can stand on their own. Probably best to
> leave them out of the dependency series to enable XDP and AF_XDP.

You can't do XDP on a DMA coherent zone. To do this memcpy(), you need to
allocate a new skb with a linear part, which is usually done after XDP;
otherwise it's too much overhead and little-to-no benefit compared to
generic skb XDP.
The current idpf code is just not compatible with the XDP code in this
series; it's pointless to do double work.

Disabling header split when XDP is enabled (the alternative option) means
disabling TCP zerocopy and worse performance in general, so I don't
consider this.

>
>> Instead, use libie to create page_pools for the header buffers, allocate
>> them dynamically and then build an skb via napi_build_skb() around them
>> with no memory copy. With one exception...
>> When you enable header split, you expect you'll always have a separate
>> header buffer, so that you could reserve headroom and tailroom only
>> there and then use full buffers for the data. For example, this is how
>> TCP zerocopy works -- you have to have the payload aligned to PAGE_SIZE.
>> The current hardware running idpf does *not* guarantee that you'll
>> always have headers placed separately. For example, on my setup, even
>> ICMP packets are written as one piece to the data buffers. You can't
>> build a valid skb around a data buffer in this case.
>> To not complicate things and not lose TCP zerocopy etc., when such thing
>> happens, use the empty header buffer and pull either full frame (if it's
>> short) or the Ethernet header there and build an skb around it. GRO
>> layer will pull more from the data buffer later. This W/A will hopefully
>> be removed one day.
>>
>> Signed-off-by: Alexander Lobakin <[email protected]>

Thanks,
Olek

2024-01-09 13:59:42

by Willem de Bruijn

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 05/34] idpf: convert header split mode to libie + napi_build_skb()

Alexander Lobakin wrote:
> From: Willem De Bruijn <[email protected]>
> Date: Wed, 27 Dec 2023 10:30:48 -0500
>
> > Alexander Lobakin wrote:
> >> Currently, idpf uses the following model for the header buffers:
> >>
> >> * buffers are allocated via dma_alloc_coherent();
> >> * when receiving, napi_alloc_skb() is called and then the header is
> >> copied to the newly allocated linear part.
> >>
> >> This is far from optimal as DMA coherent zone is slow on many systems
> >> and memcpy() neutralizes the idea and benefits of the header split.
> >
> > Do you have data showing this?
>
> Showing slow coherent DMA or memcpy()?
> Try MIPS for the first one.
> For the second -- try comparing performance on ice with the "legacy-rx"
> private flag disabled and enabled.
>
> >
> > The assumption for the current model is that the headers will be
> > touched shortly after, so the copy just primes the cache.
>
> They won't be touched in many cases. E.g. XDP_DROP.
> Or headers can be long. memcpy(32) != memcpy(128).
> The current model allocates a new skb with a linear part, which is a
> real memory allocation. napi_build_skb() doesn't allocate anything
> except struct sk_buff, which is usually available in the NAPI percpu cache.
> If build_skb() wasn't more effective, it wouldn't be introduced.
> The current model just assumes default socket traffic with ~40-byte
> headers and no XDP etc.
>
> >
> > The single coherently allocated region for all headers reduces
> > IOTLB pressure.
>
> page_pool pages are mapped once at allocation.
>
> >
> > It is possible that the alternative model is faster. But that is not
> > trivially obvious.
> >
> > I think patches like this can stand on their own. Probably best to
> > leave them out of the dependency series to enable XDP and AF_XDP.
>
> You can't do XDP on DMA coherent zone. To do this memcpy(), you need
> allocate a new skb with a linear part, which is usually done after XDP,
> otherwise it's too much overhead and little-to-no benefits comparing to
> generic skb XDP.
> The current idpf code is just not compatible with the XDP code in this
> series, it's pointless to do double work.
>
> Disabling header split when XDP is enabled (alternative option) means
> disabling TCP zerocopy and worse performance in general, I don't
> consider this.

My concern is if optimizations for XDP might degrade the TCP/IP common
path. XDP_DROP and all of XDP even is a niche feature by comparison.

The current driver behavior was not the first for IDPF, but arrived
at based on extensive performance debugging. An earlier iteration used
separate header buffers. Switching to a single coherent allocated
buffer region significantly increased throughput / narrowed the gap
between header-split and non-header-split mode.

I follow your argument and the heuristics are reasonable. My request
is only that this decision is based on real data for this driver and
modern platforms. We cannot regress TCP/IP hot path performance.




2024-01-09 14:44:16

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 05/34] idpf: convert header split mode to libie + napi_build_skb()

On Sat, Dec 23, 2023 at 3:58 AM Alexander Lobakin
<[email protected]> wrote:
>
> Currently, idpf uses the following model for the header buffers:
>
> * buffers are allocated via dma_alloc_coherent();
> * when receiving, napi_alloc_skb() is called and then the header is
> copied to the newly allocated linear part.
>
> This is far from optimal as DMA coherent zone is slow on many systems
> and memcpy() neutralizes the idea and benefits of the header split.
> Instead, use libie to create page_pools for the header buffers, allocate
> them dynamically and then build an skb via napi_build_skb() around them
> with no memory copy. With one exception...
> When you enable header split, you expect you'll always have a separate
> header buffer, so that you could reserve headroom and tailroom only
> there and then use full buffers for the data. For example, this is how
> TCP zerocopy works -- you have to have the payload aligned to PAGE_SIZE.
> The current hardware running idpf does *not* guarantee that you'll
> always have headers placed separately. For example, on my setup, even
> ICMP packets are written as one piece to the data buffers. You can't
> build a valid skb around a data buffer in this case.
> To not complicate things and not lose TCP zerocopy etc., when such thing
> happens, use the empty header buffer and pull either full frame (if it's
> short) or the Ethernet header there and build an skb around it. GRO
> layer will pull more from the data buffer later. This W/A will hopefully
> be removed one day.

We definitely want performance numbers here, for systems that truly matter.

We spent a lot of time trying to make idpf slightly better than it
was; we do not want regressions.

Thank you.

2024-01-11 13:09:56

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH RFC net-next 05/34] idpf: convert header split mode to libie + napi_build_skb()

From: Willem De Bruijn <[email protected]>
Date: Tue, 09 Jan 2024 08:59:27 -0500

> Alexander Lobakin wrote:
>> From: Willem De Bruijn <[email protected]>
>> Date: Wed, 27 Dec 2023 10:30:48 -0500
>>
>>> Alexander Lobakin wrote:
>>>> Currently, idpf uses the following model for the header buffers:
>>>>
>>>> * buffers are allocated via dma_alloc_coherent();
>>>> * when receiving, napi_alloc_skb() is called and then the header is
>>>> copied to the newly allocated linear part.
>>>>
>>>> This is far from optimal as DMA coherent zone is slow on many systems
>>>> and memcpy() neutralizes the idea and benefits of the header split.
>>>
>>> Do you have data showing this?
>>
>> Showing slow coherent DMA or memcpy()?
>> Try MIPS for the first one.
>> For the second -- try comparing performance on ice with the "legacy-rx"
>> private flag disabled and enabled.
>>
>>>
>>> The assumption for the current model is that the headers will be
>>> touched shortly after, so the copy just primes the cache.
>>
>> They won't be touched in many cases. E.g. XDP_DROP.
>> Or headers can be long. memcpy(32) != memcpy(128).
>> The current model allocates a new skb with a linear part, which is a
>> real memory allocation. napi_build_skb() doesn't allocate anything
>> except struct sk_buff, which is usually available in the NAPI percpu cache.
>> If build_skb() wasn't more effective, it wouldn't be introduced.
>> The current model just assumes default socket traffic with ~40-byte
>> headers and no XDP etc.
>>
>>>
>>> The single coherently allocated region for all headers reduces
>>> IOTLB pressure.
>>
>> page_pool pages are mapped once at allocation.
>>
>>>
>>> It is possible that the alternative model is faster. But that is not
>>> trivially obvious.
>>>
>>> I think patches like this can stand on their own. Probably best to
>>> leave them out of the dependency series to enable XDP and AF_XDP.
>>
>> You can't do XDP on a DMA coherent zone. To do this memcpy(), you need to
>> allocate a new skb with a linear part, which is usually done after XDP;
>> otherwise it's too much overhead and little-to-no benefit compared to
>> generic skb XDP.
>> The current idpf code is just not compatible with the XDP code in this
>> series; it's pointless to do double work.
>>
>> Disabling header split when XDP is enabled (the alternative option) means
>> disabling TCP zerocopy and worse performance in general, so I don't
>> consider this.
>
> My concern is if optimizations for XDP might degrade the TCP/IP common

We take care of this. Please don't think that my team allows perf
degradation when developing stuff; it's not true.

> path. XDP_DROP and all of XDP even is a niche feature by comparison.
>
> The current driver behavior was not the first for IDPF, but arrived
> at based on extensive performance debugging. An earlier iteration used
> separate header buffers. Switching to a single coherent allocated
> buffer region significantly increased throughput / narrowed the gap
> between header-split and non-header-split mode.
>
> I follow your argument and the heuristics are reasonable. My request
> is only that this decision is based on real data for this driver and
> modern platforms. We cannot regress TCP/IP hot path performance.

Sure, I'll provide numbers in the next iteration. Please go ahead with
further review (if you're interested).

Thanks,
Olek