2023-07-05 16:41:20

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next v4 0/9] net: intel: start The Great Code Dedup + Page Pool for iavf

Here's a two-shot: introduce Intel Ethernet common library (libie) and
switch iavf to Page Pool. Details in the commit messages; here's
summary:

Not a secret there's a ton of code duplication between two and more Intel
ethernet modules. Before introducing new changes, which would need to be
copied over again, start decoupling the already existing duplicate
functionality into a new module, which will be shared between several
Intel Ethernet drivers.
The first thing that came to my mind was "libie" -- "Intel Ethernet
common library". Also this sounds like "lovelie" and can be expanded as
"lib Internet Explorer" :P I'm open for anything else (but justified).
The series is only the beginning. From now on, adding every new feature
or doing any good driver refactoring will remove much more lines than add
for quite some time. There's a basic roadmap with some deduplications
planned already, not speaking of that touching every line now asks: "can
I share this?".
PP conversion for iavf lands within the same series as these two are tied
closely. libie will support Page Pool model only, so a driver can't use
much of the lib until it's converted. iavf is only the example, the rest
will eventually be converted soon on a per-driver basis. That is when it
gets really interesting. Stay tech.

Alexander Lobakin (9):
net: intel: introduce Intel Ethernet common library
iavf: kill "legacy-rx" for good
iavf: drop page splitting and recycling
net: page_pool: add DMA-sync-for-CPU inline helpers
libie: add Rx buffer management (via Page Pool)
iavf: switch to Page Pool
libie: add common queue stats
libie: add per-queue Page Pool stats
iavf: switch queue stats to libie

MAINTAINERS | 3 +-
drivers/net/ethernet/intel/Kconfig | 11 +
drivers/net/ethernet/intel/Makefile | 1 +
drivers/net/ethernet/intel/i40e/i40e_common.c | 253 --------
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
.../net/ethernet/intel/i40e/i40e_prototype.h | 7 -
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 74 +--
drivers/net/ethernet/intel/i40e/i40e_type.h | 88 ---
drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
drivers/net/ethernet/intel/iavf/iavf_common.c | 253 --------
.../net/ethernet/intel/iavf/iavf_ethtool.c | 226 +------
drivers/net/ethernet/intel/iavf/iavf_main.c | 44 +-
.../net/ethernet/intel/iavf/iavf_prototype.h | 7 -
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 608 ++++--------------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 176 +----
drivers/net/ethernet/intel/iavf/iavf_type.h | 90 ---
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 20 +-
.../net/ethernet/intel/ice/ice_lan_tx_rx.h | 316 ---------
drivers/net/ethernet/intel/ice/ice_main.c | 1 +
drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 74 +--
drivers/net/ethernet/intel/libie/Makefile | 7 +
drivers/net/ethernet/intel/libie/internal.h | 23 +
drivers/net/ethernet/intel/libie/rx.c | 183 ++++++
drivers/net/ethernet/intel/libie/stats.c | 190 ++++++
include/linux/net/intel/libie/rx.h | 241 +++++++
include/linux/net/intel/libie/stats.h | 214 ++++++
include/net/page_pool.h | 49 +-
27 files changed, 1137 insertions(+), 2025 deletions(-)
create mode 100644 drivers/net/ethernet/intel/libie/Makefile
create mode 100644 drivers/net/ethernet/intel/libie/internal.h
create mode 100644 drivers/net/ethernet/intel/libie/rx.c
create mode 100644 drivers/net/ethernet/intel/libie/stats.c
create mode 100644 include/linux/net/intel/libie/rx.h
create mode 100644 include/linux/net/intel/libie/stats.h

---
Directly to net-next, has non-Intel code change (#4) :p

Based on the PP "hybrid" allocation series[0] and requires it to work.

From v3[1]:
* base on the latest net-next, update bloat-o-meter and perf stats;
* split generic PP optimizations into a separate series;
* drop "optimize hotpath a bunch" commit: a lot of [controversial]
changes in one place, worth own series (Alex);
* 02: pick Rev-by (Alex);
* 03: move in-place recycling removal here from the dropped patch;
* 05: new, add libie Rx buffer API separatelly from IAVF changes;
* 05-06: use new "hybrid" allocation API from[0] to reduce memory usage
when a page can fit more than 1 truesize (also asked by David);
* 06: merge with "always use order-0 page" commit to reduce diffs and
simplify things (Alex);
* 09: fix page_alloc_fail counter.

From v2[2]:
* 0006: fix page_pool.h include in OcteonTX2 files (Jakub, Patchwork);
* no functional changes.

From v1[3]:
* 0006: new (me, Jakub);
* 0008: give the helpers more intuitive names (Jakub, Ilias);
* -^-: also expand their kdoc a bit for the same reason;
* -^-: fix kdoc copy-paste issue (Patchwork, Jakub);
* 0011: drop `inline` from C file (Patchwork, Jakub).

[0] https://lore.kernel.org/netdev/[email protected]
[1] https://lore.kernel.org/netdev/[email protected]
[2] https://lore.kernel.org/netdev/[email protected]
[3] https://lore.kernel.org/netdev/[email protected]

--
2.41.0



2023-07-05 16:41:51

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next v4 2/9] iavf: kill "legacy-rx" for good

Ever since build_skb() became stable, the old way with allocating an skb
for storing the headers separately, which will be then copied manually,
was slower, less flexible and thus obsolete.

* it had higher pressure on MM since it actually allocates new pages,
which then get split and refcount-biased (NAPI page cache);
* it implies memcpy() of packet headers (40+ bytes per each frame);
* the actual header length was calculated via eth_get_headlen(), which
invokes Flow Dissector and thus wastes a bunch of CPU cycles;
* XDP makes it even more weird since it requires headroom for long and
also tailroom for some time (since mbuf landed). Take a look at the
ice driver, which is built around work-arounds to make XDP work with
it.

Even on some quite low-end hardware (not a common case for 100G NICs) it
was performing worse.
The only advantage "legacy-rx" had is that it didn't require any
reserved headroom and tailroom. But iavf didn't use this, as it always
splits pages into two halves of 2k, while that save would only be useful
when striding. And again, XDP effectively removes that sole pro.

There's a train of features to land in IAVF soon: Page Pool, XDP, XSk,
multi-buffer etc. Each new would require adding more and more Danse
Macabre for absolutely no reason, besides making hotpath less and less
effective.
Remove the "feature" with all the related code. This includes at least
one very hot branch (typically hit on each new frame), which was either
always-true or always-false at least for a complete NAPI bulk of 64
frames, the whole private flags cruft, and so on. Some stats:

Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-757 (-757)
RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)

Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
.../net/ethernet/intel/iavf/iavf_ethtool.c | 140 ------------------
drivers/net/ethernet/intel/iavf/iavf_main.c | 10 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 84 +----------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 27 +---
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 3 +-
6 files changed, 8 insertions(+), 258 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index f80f2735e688..71c5d9b18692 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -298,7 +298,7 @@ struct iavf_adapter {
#define IAVF_FLAG_CLIENT_NEEDS_L2_PARAMS BIT(12)
#define IAVF_FLAG_PROMISC_ON BIT(13)
#define IAVF_FLAG_ALLMULTI_ON BIT(14)
-#define IAVF_FLAG_LEGACY_RX BIT(15)
+/* BIT(15) is free, was IAVF_FLAG_LEGACY_RX */
#define IAVF_FLAG_REINIT_ITR_NEEDED BIT(16)
#define IAVF_FLAG_QUEUES_DISABLED BIT(17)
#define IAVF_FLAG_SETUP_NETDEV_FEATURES BIT(18)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 6f171d1d85b7..de3050c02b6f 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -239,29 +239,6 @@ static const struct iavf_stats iavf_gstrings_stats[] = {

#define IAVF_QUEUE_STATS_LEN ARRAY_SIZE(iavf_gstrings_queue_stats)

-/* For now we have one and only one private flag and it is only defined
- * when we have support for the SKIP_CPU_SYNC DMA attribute. Instead
- * of leaving all this code sitting around empty we will strip it unless
- * our one private flag is actually available.
- */
-struct iavf_priv_flags {
- char flag_string[ETH_GSTRING_LEN];
- u32 flag;
- bool read_only;
-};
-
-#define IAVF_PRIV_FLAG(_name, _flag, _read_only) { \
- .flag_string = _name, \
- .flag = _flag, \
- .read_only = _read_only, \
-}
-
-static const struct iavf_priv_flags iavf_gstrings_priv_flags[] = {
- IAVF_PRIV_FLAG("legacy-rx", IAVF_FLAG_LEGACY_RX, 0),
-};
-
-#define IAVF_PRIV_FLAGS_STR_LEN ARRAY_SIZE(iavf_gstrings_priv_flags)
-
/**
* iavf_get_link_ksettings - Get Link Speed and Duplex settings
* @netdev: network interface device structure
@@ -341,8 +318,6 @@ static int iavf_get_sset_count(struct net_device *netdev, int sset)
return IAVF_STATS_LEN +
(IAVF_QUEUE_STATS_LEN * 2 *
netdev->real_num_tx_queues);
- else if (sset == ETH_SS_PRIV_FLAGS)
- return IAVF_PRIV_FLAGS_STR_LEN;
else
return -EINVAL;
}
@@ -384,24 +359,6 @@ static void iavf_get_ethtool_stats(struct net_device *netdev,
rcu_read_unlock();
}

-/**
- * iavf_get_priv_flag_strings - Get private flag strings
- * @netdev: network interface device structure
- * @data: buffer for string data
- *
- * Builds the private flags string table
- **/
-static void iavf_get_priv_flag_strings(struct net_device *netdev, u8 *data)
-{
- unsigned int i;
-
- for (i = 0; i < IAVF_PRIV_FLAGS_STR_LEN; i++) {
- snprintf(data, ETH_GSTRING_LEN, "%s",
- iavf_gstrings_priv_flags[i].flag_string);
- data += ETH_GSTRING_LEN;
- }
-}
-
/**
* iavf_get_stat_strings - Get stat strings
* @netdev: network interface device structure
@@ -440,105 +397,11 @@ static void iavf_get_strings(struct net_device *netdev, u32 sset, u8 *data)
case ETH_SS_STATS:
iavf_get_stat_strings(netdev, data);
break;
- case ETH_SS_PRIV_FLAGS:
- iavf_get_priv_flag_strings(netdev, data);
- break;
default:
break;
}
}

-/**
- * iavf_get_priv_flags - report device private flags
- * @netdev: network interface device structure
- *
- * The get string set count and the string set should be matched for each
- * flag returned. Add new strings for each flag to the iavf_gstrings_priv_flags
- * array.
- *
- * Returns a u32 bitmap of flags.
- **/
-static u32 iavf_get_priv_flags(struct net_device *netdev)
-{
- struct iavf_adapter *adapter = netdev_priv(netdev);
- u32 i, ret_flags = 0;
-
- for (i = 0; i < IAVF_PRIV_FLAGS_STR_LEN; i++) {
- const struct iavf_priv_flags *priv_flags;
-
- priv_flags = &iavf_gstrings_priv_flags[i];
-
- if (priv_flags->flag & adapter->flags)
- ret_flags |= BIT(i);
- }
-
- return ret_flags;
-}
-
-/**
- * iavf_set_priv_flags - set private flags
- * @netdev: network interface device structure
- * @flags: bit flags to be set
- **/
-static int iavf_set_priv_flags(struct net_device *netdev, u32 flags)
-{
- struct iavf_adapter *adapter = netdev_priv(netdev);
- u32 orig_flags, new_flags, changed_flags;
- u32 i;
-
- orig_flags = READ_ONCE(adapter->flags);
- new_flags = orig_flags;
-
- for (i = 0; i < IAVF_PRIV_FLAGS_STR_LEN; i++) {
- const struct iavf_priv_flags *priv_flags;
-
- priv_flags = &iavf_gstrings_priv_flags[i];
-
- if (flags & BIT(i))
- new_flags |= priv_flags->flag;
- else
- new_flags &= ~(priv_flags->flag);
-
- if (priv_flags->read_only &&
- ((orig_flags ^ new_flags) & ~BIT(i)))
- return -EOPNOTSUPP;
- }
-
- /* Before we finalize any flag changes, any checks which we need to
- * perform to determine if the new flags will be supported should go
- * here...
- */
-
- /* Compare and exchange the new flags into place. If we failed, that
- * is if cmpxchg returns anything but the old value, this means
- * something else must have modified the flags variable since we
- * copied it. We'll just punt with an error and log something in the
- * message buffer.
- */
- if (cmpxchg(&adapter->flags, orig_flags, new_flags) != orig_flags) {
- dev_warn(&adapter->pdev->dev,
- "Unable to update adapter->flags as it was modified by another thread...\n");
- return -EAGAIN;
- }
-
- changed_flags = orig_flags ^ new_flags;
-
- /* Process any additional changes needed as a result of flag changes.
- * The changed_flags value reflects the list of bits that were changed
- * in the code above.
- */
-
- /* issue a reset to force legacy-rx change to take effect */
- if (changed_flags & IAVF_FLAG_LEGACY_RX) {
- if (netif_running(netdev)) {
- adapter->flags |= IAVF_FLAG_RESET_NEEDED;
- queue_work(adapter->wq, &adapter->reset_task);
- }
- }
-
- return 0;
-}
-
/**
* iavf_get_msglevel - Get debug message level
* @netdev: network interface device structure
@@ -584,7 +447,6 @@ static void iavf_get_drvinfo(struct net_device *netdev,
strscpy(drvinfo->driver, iavf_driver_name, 32);
strscpy(drvinfo->fw_version, "N/A", 4);
strscpy(drvinfo->bus_info, pci_name(adapter->pdev), 32);
- drvinfo->n_priv_flags = IAVF_PRIV_FLAGS_STR_LEN;
}

/**
@@ -1969,8 +1831,6 @@ static const struct ethtool_ops iavf_ethtool_ops = {
.get_strings = iavf_get_strings,
.get_ethtool_stats = iavf_get_ethtool_stats,
.get_sset_count = iavf_get_sset_count,
- .get_priv_flags = iavf_get_priv_flags,
- .set_priv_flags = iavf_set_priv_flags,
.get_msglevel = iavf_get_msglevel,
.set_msglevel = iavf_set_msglevel,
.get_coalesce = iavf_get_coalesce,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 23fbd45dd986..db1ed13f11bb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -702,9 +702,7 @@ static void iavf_configure_rx(struct iavf_adapter *adapter)
struct iavf_hw *hw = &adapter->hw;
int i;

- /* Legacy Rx will always default to a 2048 buffer size. */
-#if (PAGE_SIZE < 8192)
- if (!(adapter->flags & IAVF_FLAG_LEGACY_RX)) {
+ if (PAGE_SIZE < 8192) {
struct net_device *netdev = adapter->netdev;

/* For jumbo frames on systems with 4K pages we have to use
@@ -721,16 +719,10 @@ static void iavf_configure_rx(struct iavf_adapter *adapter)
(netdev->mtu <= ETH_DATA_LEN))
rx_buf_len = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;
}
-#endif

for (i = 0; i < adapter->num_active_queues; i++) {
adapter->rx_rings[i].tail = hw->hw_addr + IAVF_QRX_TAIL1(i);
adapter->rx_rings[i].rx_buf_len = rx_buf_len;
-
- if (adapter->flags & IAVF_FLAG_LEGACY_RX)
- clear_ring_build_skb_enabled(&adapter->rx_rings[i]);
- else
- set_ring_build_skb_enabled(&adapter->rx_rings[i]);
}
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index aae357af072c..a85b270fc769 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -823,17 +823,6 @@ static inline void iavf_release_rx_desc(struct iavf_ring *rx_ring, u32 val)
writel(val, rx_ring->tail);
}

-/**
- * iavf_rx_offset - Return expected offset into page to access data
- * @rx_ring: Ring we are requesting offset of
- *
- * Returns the offset value for ring into the data buffer.
- */
-static inline unsigned int iavf_rx_offset(struct iavf_ring *rx_ring)
-{
- return ring_uses_build_skb(rx_ring) ? IAVF_SKB_PAD : 0;
-}
-
/**
* iavf_alloc_mapped_page - recycle or make a new page
* @rx_ring: ring to use
@@ -878,7 +867,7 @@ static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,

bi->dma = dma;
bi->page = page;
- bi->page_offset = iavf_rx_offset(rx_ring);
+ bi->page_offset = IAVF_SKB_PAD;

/* initialize pagecnt_bias to 1 representing we fully own page */
bi->pagecnt_bias = 1;
@@ -1219,7 +1208,7 @@ static void iavf_add_rx_frag(struct iavf_ring *rx_ring,
#if (PAGE_SIZE < 8192)
unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = SKB_DATA_ALIGN(size + iavf_rx_offset(rx_ring));
+ unsigned int truesize = SKB_DATA_ALIGN(size + IAVF_SKB_PAD);
#endif

if (!size)
@@ -1267,71 +1256,6 @@ static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
return rx_buffer;
}

-/**
- * iavf_construct_skb - Allocate skb and populate it
- * @rx_ring: rx descriptor ring to transact packets on
- * @rx_buffer: rx buffer to pull data from
- * @size: size of buffer to add to skb
- *
- * This function allocates an skb. It then populates it with the page
- * data from the current receive descriptor, taking care to set up the
- * skb correctly.
- */
-static struct sk_buff *iavf_construct_skb(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer,
- unsigned int size)
-{
- void *va;
-#if (PAGE_SIZE < 8192)
- unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
-#else
- unsigned int truesize = SKB_DATA_ALIGN(size);
-#endif
- unsigned int headlen;
- struct sk_buff *skb;
-
- if (!rx_buffer)
- return NULL;
- /* prefetch first cache line of first page */
- va = page_address(rx_buffer->page) + rx_buffer->page_offset;
- net_prefetch(va);
-
- /* allocate a skb to store the frags */
- skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
- IAVF_RX_HDR_SIZE,
- GFP_ATOMIC | __GFP_NOWARN);
- if (unlikely(!skb))
- return NULL;
-
- /* Determine available headroom for copy */
- headlen = size;
- if (headlen > IAVF_RX_HDR_SIZE)
- headlen = eth_get_headlen(skb->dev, va, IAVF_RX_HDR_SIZE);
-
- /* align pull length to size of long to optimize memcpy performance */
- memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
-
- /* update all of the pointers */
- size -= headlen;
- if (size) {
- skb_add_rx_frag(skb, 0, rx_buffer->page,
- rx_buffer->page_offset + headlen,
- size, truesize);
-
- /* buffer is used by skb, update page_offset */
-#if (PAGE_SIZE < 8192)
- rx_buffer->page_offset ^= truesize;
-#else
- rx_buffer->page_offset += truesize;
-#endif
- } else {
- /* buffer is unused, reset bias back to rx_buffer */
- rx_buffer->pagecnt_bias++;
- }
-
- return skb;
-}
-
/**
* iavf_build_skb - Build skb around an existing buffer
* @rx_ring: Rx descriptor ring to transact packets on
@@ -1504,10 +1428,8 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
/* retrieve a buffer from the ring */
if (skb)
iavf_add_rx_frag(rx_ring, rx_buffer, skb, size);
- else if (ring_uses_build_skb(rx_ring))
- skb = iavf_build_skb(rx_ring, rx_buffer, size);
else
- skb = iavf_construct_skb(rx_ring, rx_buffer, size);
+ skb = iavf_build_skb(rx_ring, rx_buffer, size);

/* exit if we failed to retrieve a buffer */
if (!skb) {
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 7e6ee32d19b6..4b412f7662e4 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -82,20 +82,11 @@ enum iavf_dyn_idx_t {
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))

/* Supported Rx Buffer Sizes (a multiple of 128) */
-#define IAVF_RXBUFFER_256 256
#define IAVF_RXBUFFER_1536 1536 /* 128B aligned standard Ethernet frame */
#define IAVF_RXBUFFER_2048 2048
#define IAVF_RXBUFFER_3072 3072 /* Used for large frames w/ padding */
#define IAVF_MAX_RXBUFFER 9728 /* largest size for single descriptor */

-/* NOTE: netdev_alloc_skb reserves up to 64 bytes, NET_IP_ALIGN means we
- * reserve 2 more, and skb_shared_info adds an additional 384 bytes more,
- * this adds up to 512 bytes of extra data meaning the smallest allocation
- * we could have is 1K.
- * i.e. RXBUFFER_256 --> 960 byte skb (size-1024 slab)
- * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
- */
-#define IAVF_RX_HDR_SIZE IAVF_RXBUFFER_256
#define IAVF_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
#define iavf_rx_desc iavf_32byte_rx_desc

@@ -362,7 +353,8 @@ struct iavf_ring {

u16 flags;
#define IAVF_TXR_FLAGS_WB_ON_ITR BIT(0)
-#define IAVF_RXR_FLAGS_BUILD_SKB_ENABLED BIT(1)
+/* BIT(1) is free, was IAVF_RXR_FLAGS_BUILD_SKB_ENABLED */
+/* BIT(2) is free */
#define IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1 BIT(3)
#define IAVF_TXR_FLAGS_VLAN_TAG_LOC_L2TAG2 BIT(4)
#define IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2 BIT(5)
@@ -393,21 +385,6 @@ struct iavf_ring {
*/
} ____cacheline_internodealigned_in_smp;

-static inline bool ring_uses_build_skb(struct iavf_ring *ring)
-{
- return !!(ring->flags & IAVF_RXR_FLAGS_BUILD_SKB_ENABLED);
-}
-
-static inline void set_ring_build_skb_enabled(struct iavf_ring *ring)
-{
- ring->flags |= IAVF_RXR_FLAGS_BUILD_SKB_ENABLED;
-}
-
-static inline void clear_ring_build_skb_enabled(struct iavf_ring *ring)
-{
- ring->flags &= ~IAVF_RXR_FLAGS_BUILD_SKB_ENABLED;
-}
-
#define IAVF_ITR_ADAPTIVE_MIN_INC 0x0002
#define IAVF_ITR_ADAPTIVE_MIN_USECS 0x0002
#define IAVF_ITR_ADAPTIVE_MAX_USECS 0x007e
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 7c0578b5457b..fdddc3588487 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -290,8 +290,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
return;

/* Limit maximum frame size when jumbo frames is not enabled */
- if (!(adapter->flags & IAVF_FLAG_LEGACY_RX) &&
- (adapter->netdev->mtu <= ETH_DATA_LEN))
+ if (adapter->netdev->mtu <= ETH_DATA_LEN)
max_frame = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;

vqci->vsi_id = adapter->vsi_res->vsi_id;
--
2.41.0


2023-07-05 16:42:11

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH RFC net-next v4 9/9] iavf: switch queue stats to libie

iavf is pretty much ready for using the generic libie stats, so drop all
the custom code and just use generic definitions. The only thing is that
it previously lacked the counter of Tx queue stops. It's present in the
other drivers, so add it here as well.
The rest is straightforward. There were two fields in the Tx stats
struct, which didn't belong there. The first one has never been used,
wipe it; and move the other to the queue structure. Plus move around
a couple fields in &iavf_ring to account stats structs' alignment.

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../net/ethernet/intel/iavf/iavf_ethtool.c | 86 ++++-------------
drivers/net/ethernet/intel/iavf/iavf_main.c | 2 +
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 92 ++++++++++---------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 54 +++--------
4 files changed, 83 insertions(+), 151 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index de3050c02b6f..cdf09c5412a0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -46,16 +46,6 @@ struct iavf_stats {
.stat_offset = offsetof(_type, _stat) \
}

-/* Helper macro for defining some statistics related to queues */
-#define IAVF_QUEUE_STAT(_name, _stat) \
- IAVF_STAT(struct iavf_ring, _name, _stat)
-
-/* Stats associated with a Tx or Rx ring */
-static const struct iavf_stats iavf_gstrings_queue_stats[] = {
- IAVF_QUEUE_STAT("%s-%u.packets", stats.packets),
- IAVF_QUEUE_STAT("%s-%u.bytes", stats.bytes),
-};
-
/**
* iavf_add_one_ethtool_stat - copy the stat into the supplied buffer
* @data: location to store the stat value
@@ -141,43 +131,6 @@ __iavf_add_ethtool_stats(u64 **data, void *pointer,
#define iavf_add_ethtool_stats(data, pointer, stats) \
__iavf_add_ethtool_stats(data, pointer, stats, ARRAY_SIZE(stats))

-/**
- * iavf_add_queue_stats - copy queue statistics into supplied buffer
- * @data: ethtool stats buffer
- * @ring: the ring to copy
- *
- * Queue statistics must be copied while protected by
- * u64_stats_fetch_begin, so we can't directly use iavf_add_ethtool_stats.
- * Assumes that queue stats are defined in iavf_gstrings_queue_stats. If the
- * ring pointer is null, zero out the queue stat values and update the data
- * pointer. Otherwise safely copy the stats from the ring into the supplied
- * buffer and update the data pointer when finished.
- *
- * This function expects to be called while under rcu_read_lock().
- **/
-static void
-iavf_add_queue_stats(u64 **data, struct iavf_ring *ring)
-{
- const unsigned int size = ARRAY_SIZE(iavf_gstrings_queue_stats);
- const struct iavf_stats *stats = iavf_gstrings_queue_stats;
- unsigned int start;
- unsigned int i;
-
- /* To avoid invalid statistics values, ensure that we keep retrying
- * the copy until we get a consistent value according to
- * u64_stats_fetch_retry. But first, make sure our ring is
- * non-null before attempting to access its syncp.
- */
- do {
- start = !ring ? 0 : u64_stats_fetch_begin(&ring->syncp);
- for (i = 0; i < size; i++)
- iavf_add_one_ethtool_stat(&(*data)[i], ring, &stats[i]);
- } while (ring && u64_stats_fetch_retry(&ring->syncp, start));
-
- /* Once we successfully copy the stats in, update the data pointer */
- *data += size;
-}
-
/**
* __iavf_add_stat_strings - copy stat strings into ethtool buffer
* @p: ethtool supplied buffer
@@ -237,8 +190,6 @@ static const struct iavf_stats iavf_gstrings_stats[] = {

#define IAVF_STATS_LEN ARRAY_SIZE(iavf_gstrings_stats)

-#define IAVF_QUEUE_STATS_LEN ARRAY_SIZE(iavf_gstrings_queue_stats)
-
/**
* iavf_get_link_ksettings - Get Link Speed and Duplex settings
* @netdev: network interface device structure
@@ -308,18 +259,22 @@ static int iavf_get_link_ksettings(struct net_device *netdev,
**/
static int iavf_get_sset_count(struct net_device *netdev, int sset)
{
- /* Report the maximum number queues, even if not every queue is
- * currently configured. Since allocation of queues is in pairs,
- * use netdev->real_num_tx_queues * 2. The real_num_tx_queues is set
- * at device creation and never changes.
- */
+ u32 num;

- if (sset == ETH_SS_STATS)
- return IAVF_STATS_LEN +
- (IAVF_QUEUE_STATS_LEN * 2 *
- netdev->real_num_tx_queues);
- else
+ switch (sset) {
+ case ETH_SS_STATS:
+ /* Per-queue */
+ num = libie_rq_stats_get_sset_count();
+ num += libie_sq_stats_get_sset_count();
+ num *= netdev->real_num_tx_queues;
+
+ /* Global */
+ num += IAVF_STATS_LEN;
+
+ return num;
+ default:
return -EINVAL;
+ }
}

/**
@@ -346,15 +301,14 @@ static void iavf_get_ethtool_stats(struct net_device *netdev,
* it to iterate over rings' stats.
*/
for (i = 0; i < adapter->num_active_queues; i++) {
- struct iavf_ring *ring;
+ const struct iavf_ring *ring;

/* Tx rings stats */
- ring = &adapter->tx_rings[i];
- iavf_add_queue_stats(&data, ring);
+ libie_sq_stats_get_data(&data, &adapter->tx_rings[i].sq_stats);

/* Rx rings stats */
ring = &adapter->rx_rings[i];
- iavf_add_queue_stats(&data, ring);
+ libie_rq_stats_get_data(&data, &ring->rq_stats, ring->pp);
}
rcu_read_unlock();
}
@@ -376,10 +330,8 @@ static void iavf_get_stat_strings(struct net_device *netdev, u8 *data)
* real_num_tx_queues for both Tx and Rx queues.
*/
for (i = 0; i < netdev->real_num_tx_queues; i++) {
- iavf_add_stat_strings(&data, iavf_gstrings_queue_stats,
- "tx", i);
- iavf_add_stat_strings(&data, iavf_gstrings_queue_stats,
- "rx", i);
+ libie_sq_stats_get_strings(&data, i);
+ libie_rq_stats_get_strings(&data, i);
}
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 39c6d83e80a1..659bdfc33e0b 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1566,12 +1566,14 @@ static int iavf_alloc_queues(struct iavf_adapter *adapter)
tx_ring->itr_setting = IAVF_ITR_TX_DEF;
if (adapter->flags & IAVF_FLAG_WB_ON_ITR_CAPABLE)
tx_ring->flags |= IAVF_TXR_FLAGS_WB_ON_ITR;
+ u64_stats_init(&tx_ring->sq_stats.syncp);

rx_ring = &adapter->rx_rings[i];
rx_ring->queue_index = i;
rx_ring->netdev = adapter->netdev;
rx_ring->count = adapter->rx_desc_count;
rx_ring->itr_setting = IAVF_ITR_RX_DEF;
+ u64_stats_init(&rx_ring->rq_stats.syncp);
}

adapter->num_active_queues = num_active_queues;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index d1491b481eac..1ccc80c3f732 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -176,6 +176,9 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
for (i = 0; i < vsi->back->num_active_queues; i++) {
tx_ring = &vsi->back->tx_rings[i];
if (tx_ring && tx_ring->desc) {
+ const struct libie_sq_stats *st = &tx_ring->sq_stats;
+ u32 start;
+
/* If packet counter has not changed the queue is
* likely stalled, so force an interrupt for this
* queue.
@@ -183,8 +186,13 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* prev_pkt_ctr would be negative if there was no
* pending work.
*/
- packets = tx_ring->stats.packets & INT_MAX;
- if (tx_ring->tx_stats.prev_pkt_ctr == packets) {
+ do {
+ start = u64_stats_fetch_begin(&st->syncp);
+ packets = u64_stats_read(&st->packets) &
+ INT_MAX;
+ } while (u64_stats_fetch_retry(&st->syncp, start));
+
+ if (tx_ring->prev_pkt_ctr == packets) {
iavf_force_wb(vsi, tx_ring->q_vector);
continue;
}
@@ -193,7 +201,7 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* to iavf_get_tx_pending()
*/
smp_rmb();
- tx_ring->tx_stats.prev_pkt_ctr =
+ tx_ring->prev_pkt_ctr =
iavf_get_tx_pending(tx_ring, true) ? packets : -1;
}
}
@@ -212,11 +220,11 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
struct iavf_ring *tx_ring, int napi_budget)
{
+ unsigned int budget = IAVF_DEFAULT_IRQ_WORK;
+ struct libie_sq_onstack_stats stats = { };
int i = tx_ring->next_to_clean;
struct iavf_tx_buffer *tx_buf;
struct iavf_tx_desc *tx_desc;
- unsigned int total_bytes = 0, total_packets = 0;
- unsigned int budget = IAVF_DEFAULT_IRQ_WORK;

tx_buf = &tx_ring->tx_bi[i];
tx_desc = IAVF_TX_DESC(tx_ring, i);
@@ -242,8 +250,8 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
tx_buf->next_to_watch = NULL;

/* update the statistics for this packet */
- total_bytes += tx_buf->bytecount;
- total_packets += tx_buf->gso_segs;
+ stats.bytes += tx_buf->bytecount;
+ stats.packets += tx_buf->gso_segs;

/* free the skb */
napi_consume_skb(tx_buf->skb, napi_budget);
@@ -300,12 +308,9 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,

i += tx_ring->count;
tx_ring->next_to_clean = i;
- u64_stats_update_begin(&tx_ring->syncp);
- tx_ring->stats.bytes += total_bytes;
- tx_ring->stats.packets += total_packets;
- u64_stats_update_end(&tx_ring->syncp);
- tx_ring->q_vector->tx.total_bytes += total_bytes;
- tx_ring->q_vector->tx.total_packets += total_packets;
+ libie_sq_napi_stats_add(&tx_ring->sq_stats, &stats);
+ tx_ring->q_vector->tx.total_bytes += stats.bytes;
+ tx_ring->q_vector->tx.total_packets += stats.packets;

if (tx_ring->flags & IAVF_TXR_FLAGS_WB_ON_ITR) {
/* check to see if there are < 4 descriptors
@@ -324,10 +329,10 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,

/* notify netdev of completed buffers */
netdev_tx_completed_queue(txring_txq(tx_ring),
- total_packets, total_bytes);
+ stats.packets, stats.bytes);

#define TX_WAKE_THRESHOLD ((s16)(DESC_NEEDED * 2))
- if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+ if (unlikely(stats.packets && netif_carrier_ok(tx_ring->netdev) &&
(IAVF_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD))) {
/* Make sure that anybody stopping the queue after this
* sees the new next_to_clean.
@@ -338,7 +343,7 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
!test_bit(__IAVF_VSI_DOWN, vsi->state)) {
netif_wake_subqueue(tx_ring->netdev,
tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
+ libie_stats_inc_one(&tx_ring->sq_stats, restarts);
}
}

@@ -674,7 +679,7 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)

tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
- tx_ring->tx_stats.prev_pkt_ctr = -1;
+ tx_ring->prev_pkt_ctr = -1;
return 0;

err:
@@ -730,7 +735,7 @@ void iavf_free_rx_resources(struct iavf_ring *rx_ring)
rx_ring->desc = NULL;
}

- page_pool_destroy(rx_ring->pp);
+ libie_rx_page_pool_destroy(rx_ring->pp, &rx_ring->rq_stats);
rx_ring->pp = NULL;
}

@@ -758,8 +763,6 @@ int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
if (!rx_ring->rx_bi)
goto err;

- u64_stats_init(&rx_ring->syncp);
-
/* Round up to nearest 4K */
rx_ring->size = rx_ring->count * sizeof(union iavf_32byte_rx_desc);
rx_ring->size = ALIGN(rx_ring->size, 4096);
@@ -878,7 +881,7 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
if (rx_ring->next_to_use != ntu)
iavf_release_rx_desc(rx_ring, ntu);

- rx_ring->rx_stats.alloc_page_failed++;
+ libie_stats_inc_one(&rx_ring->rq_stats, alloc_page_fail);

/* make sure to come back via polling to try again after
* allocation failure
@@ -1089,7 +1092,7 @@ static struct sk_buff *iavf_build_skb(const struct libie_rx_buffer *rx_buffer,
* iavf_is_non_eop - process handling of non-EOP buffers
* @rx_ring: Rx ring being processed
* @rx_desc: Rx descriptor for current buffer
- * @skb: Current socket buffer containing buffer in progress
+ * @stats: NAPI poll local stats to update
*
* This function updates next to clean. If the buffer is an EOP buffer
* this function exits returning false, otherwise it will place the
@@ -1098,7 +1101,7 @@ static struct sk_buff *iavf_build_skb(const struct libie_rx_buffer *rx_buffer,
**/
static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
union iavf_rx_desc *rx_desc,
- struct sk_buff *skb)
+ struct libie_rq_onstack_stats *stats)
{
u32 ntc = rx_ring->next_to_clean + 1;

@@ -1113,7 +1116,7 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
if (likely(iavf_test_staterr(rx_desc, IAVF_RXD_EOF)))
return false;

- rx_ring->rx_stats.non_eop_descs++;
+ stats->fragments++;

return true;
}
@@ -1132,12 +1135,12 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
**/
static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
{
- unsigned int total_rx_bytes = 0, total_rx_packets = 0;
- struct sk_buff *skb = rx_ring->skb;
u16 cleaned_count = IAVF_DESC_UNUSED(rx_ring);
+ struct libie_rq_onstack_stats stats = { };
+ struct sk_buff *skb = rx_ring->skb;
bool failure = false;

- while (likely(total_rx_packets < (unsigned int)budget)) {
+ while (likely(stats.packets < budget)) {
struct libie_rx_buffer *rx_buffer;
union iavf_rx_desc *rx_desc;
unsigned int size;
@@ -1187,14 +1190,15 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)

/* exit if we failed to retrieve a buffer */
if (!skb) {
- rx_ring->rx_stats.alloc_buff_failed++;
+ libie_stats_inc_one(&rx_ring->rq_stats,
+ build_skb_fail);
break;
}

skip_data:
cleaned_count++;

- if (iavf_is_non_eop(rx_ring, rx_desc, skb))
+ if (iavf_is_non_eop(rx_ring, rx_desc, &stats))
continue;

/* ERR_MASK will only have valid bits if EOP set, and
@@ -1214,7 +1218,7 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
}

/* probably a little skewed due to removing CRC */
- total_rx_bytes += skb->len;
+ stats.bytes += skb->len;

qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
rx_ptype = (qword & IAVF_RXD_QW1_PTYPE_MASK) >>
@@ -1236,20 +1240,20 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
skb = NULL;

/* update budget accounting */
- total_rx_packets++;
+ stats.packets++;
}

rx_ring->skb = skb;

- u64_stats_update_begin(&rx_ring->syncp);
- rx_ring->stats.packets += total_rx_packets;
- rx_ring->stats.bytes += total_rx_bytes;
- u64_stats_update_end(&rx_ring->syncp);
- rx_ring->q_vector->rx.total_packets += total_rx_packets;
- rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
+ libie_rq_napi_stats_add(&rx_ring->rq_stats, &stats);
+ rx_ring->q_vector->rx.total_packets += stats.packets;
+ rx_ring->q_vector->rx.total_bytes += stats.bytes;

/* guarantee a trip back through this routine if there was a failure */
- return failure ? budget : (int)total_rx_packets;
+ if (unlikely(failure))
+ return budget;
+
+ return stats.packets;
}

static inline u32 iavf_buildreg_itr(const int type, u16 itr)
@@ -1426,10 +1430,8 @@ int iavf_napi_poll(struct napi_struct *napi, int budget)
return budget - 1;
}
tx_only:
- if (arm_wb) {
- q_vector->tx.ring[0].tx_stats.tx_force_wb++;
+ if (arm_wb)
iavf_enable_wb_on_itr(vsi, q_vector);
- }
return budget;
}

@@ -1888,6 +1890,7 @@ bool __iavf_chk_linearize(struct sk_buff *skb)
int __iavf_maybe_stop_tx(struct iavf_ring *tx_ring, int size)
{
netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+ libie_stats_inc_one(&tx_ring->sq_stats, stops);
/* Memory barrier before checking head and tail */
smp_mb();

@@ -1897,7 +1900,8 @@ int __iavf_maybe_stop_tx(struct iavf_ring *tx_ring, int size)

/* A reprieve! - use start_queue because it doesn't call schedule */
netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
+ libie_stats_inc_one(&tx_ring->sq_stats, restarts);
+
return 0;
}

@@ -2078,7 +2082,7 @@ static netdev_tx_t iavf_xmit_frame_ring(struct sk_buff *skb,
return NETDEV_TX_OK;
}
count = iavf_txd_use_count(skb->len);
- tx_ring->tx_stats.tx_linearize++;
+ libie_stats_inc_one(&tx_ring->sq_stats, linearized);
}

/* need: 1 descriptor per page * PAGE_SIZE/IAVF_MAX_DATA_PER_TXD,
@@ -2088,7 +2092,7 @@ static netdev_tx_t iavf_xmit_frame_ring(struct sk_buff *skb,
* otherwise try next time
*/
if (iavf_maybe_stop_tx(tx_ring, count + 4 + 1)) {
- tx_ring->tx_stats.tx_busy++;
+ libie_stats_inc_one(&tx_ring->sq_stats, busy);
return NETDEV_TX_BUSY;
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index b13d878c74c6..81b430e8d8ff 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -4,6 +4,8 @@
#ifndef _IAVF_TXRX_H_
#define _IAVF_TXRX_H_

+#include <linux/net/intel/libie/stats.h>
+
/* Interrupt Throttling and Rate Limiting Goodies */
#define IAVF_DEFAULT_IRQ_WORK 256

@@ -201,33 +203,6 @@ struct iavf_tx_buffer {
u32 tx_flags;
};

-struct iavf_queue_stats {
- u64 packets;
- u64 bytes;
-};
-
-struct iavf_tx_queue_stats {
- u64 restart_queue;
- u64 tx_busy;
- u64 tx_done_old;
- u64 tx_linearize;
- u64 tx_force_wb;
- int prev_pkt_ctr;
- u64 tx_lost_interrupt;
-};
-
-struct iavf_rx_queue_stats {
- u64 non_eop_descs;
- u64 alloc_page_failed;
- u64 alloc_buff_failed;
-};
-
-enum iavf_ring_state_t {
- __IAVF_TX_FDIR_INIT_DONE,
- __IAVF_TX_XPS_INIT_DONE,
- __IAVF_RING_STATE_NBITS /* must be last */
-};
-
/* some useful defines for virtchannel interface, which
* is the only remaining user of header split
*/
@@ -252,7 +227,6 @@ struct iavf_ring {
struct libie_rx_buffer *rx_bi;
struct iavf_tx_buffer *tx_bi;
};
- DECLARE_BITMAP(state, __IAVF_RING_STATE_NBITS);
u8 __iomem *tail;
u16 queue_index; /* Queue number of ring */
u8 dcb_tc; /* Traffic class of ring */
@@ -286,21 +260,9 @@ struct iavf_ring {
#define IAVF_TXR_FLAGS_VLAN_TAG_LOC_L2TAG2 BIT(4)
#define IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2 BIT(5)

- /* stats structs */
- struct iavf_queue_stats stats;
- struct u64_stats_sync syncp;
- union {
- struct iavf_tx_queue_stats tx_stats;
- struct iavf_rx_queue_stats rx_stats;
- };
-
- unsigned int size; /* length of descriptor ring in bytes */
- dma_addr_t dma; /* physical address of ring */
-
struct iavf_vsi *vsi; /* Backreference to associated VSI */
struct iavf_q_vector *q_vector; /* Backreference to associated vector */

- struct rcu_head rcu; /* to avoid race on free */
struct sk_buff *skb; /* When iavf_clean_rx_ring_irq() must
* return before it sees the EOP for
* the current packet, we save that skb
@@ -309,6 +271,18 @@ struct iavf_ring {
* iavf_clean_rx_ring_irq() is called
* for this ring.
*/
+
+ /* stats structs */
+ union {
+ struct libie_rq_stats rq_stats;
+ struct libie_sq_stats sq_stats;
+ };
+
+ int prev_pkt_ctr; /* For stall detection */
+ unsigned int size; /* length of descriptor ring in bytes */
+ dma_addr_t dma; /* physical address of ring */
+
+ struct rcu_head rcu; /* to avoid race on free */
} ____cacheline_internodealigned_in_smp;

#define IAVF_ITR_ADAPTIVE_MIN_INC 0x0002
--
2.41.0


2023-07-14 14:33:37

by Przemek Kitszel

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH RFC net-next v4 2/9] iavf: kill "legacy-rx" for good

On 7/5/23 17:55, Alexander Lobakin wrote:
> Ever since build_skb() became stable, the old way with allocating an skb
> for storing the headers separately, which will be then copied manually,
> was slower, less flexible and thus obsolete.
>
> * it had higher pressure on MM since it actually allocates new pages,
> which then get split and refcount-biased (NAPI page cache);
> * it implies memcpy() of packet headers (40+ bytes per each frame);
> * the actual header length was calculated via eth_get_headlen(), which
> invokes Flow Dissector and thus wastes a bunch of CPU cycles;
> * XDP makes it even more weird since it requires headroom for long and
> also tailroom for some time (since mbuf landed). Take a look at the
> ice driver, which is built around work-arounds to make XDP work with
> it.
>
> Even on some quite low-end hardware (not a common case for 100G NICs) it
> was performing worse.
> The only advantage "legacy-rx" had is that it didn't require any
> reserved headroom and tailroom. But iavf didn't use this, as it always
> splits pages into two halves of 2k, while that save would only be useful
> when striding. And again, XDP effectively removes that sole pro.
>
> There's a train of features to land in IAVF soon: Page Pool, XDP, XSk,
> multi-buffer etc. Each new would require adding more and more Danse
> Macabre for absolutely no reason, besides making hotpath less and less
> effective.
> Remove the "feature" with all the related code. This includes at least
> one very hot branch (typically hit on each new frame), which was either
> always-true or always-false at least for a complete NAPI bulk of 64
> frames, the whole private flags cruft, and so on. Some stats:
>
> Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-757 (-757)
> RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)
>
> Reviewed-by: Alexander Duyck <[email protected]>
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
> drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 140 ------------------
> drivers/net/ethernet/intel/iavf/iavf_main.c | 10 +-
> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 84 +----------
> drivers/net/ethernet/intel/iavf/iavf_txrx.h | 27 +---
> .../net/ethernet/intel/iavf/iavf_virtchnl.c | 3 +-
> 6 files changed, 8 insertions(+), 258 deletions(-)

Good one, there were some random questions in my mind during the read,
but all are resolved by subsequent patches.
(It's a pity I have not found yet time to fully read them though)

Reviewed-by: Przemek Kitszel <[email protected]>