2024-04-18 11:38:19

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 00/10] net: intel: start The Great Code Dedup + Page Pool for iavf

Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and
libie) and switch iavf to Page Pool. Details are in the commit messages;
here's a summary:

Not a secret there's a ton of code duplication between two and more Intel
ethernet modules. Before introducing new changes, which would need to be
copied over again, start decoupling the already existing duplicate
functionality into a new module, which will be shared between several
Intel Ethernet drivers. The first name that came to my mind was
"libie" -- "Intel Ethernet common library". Also this sounds like
"lovelie" (-> one word, no "lib I E" pls) and can be expanded as
"lib Internet Explorer" :P
The "generic", pure-software part is placed separately, so that it can be
easily reused in any driver by any vendor without linking to the Intel
pre-200G guts. In a few words, it's something any modern driver does the
same way, but nobody moved it level up (yet).
The series is only the beginning. From now on, adding every new feature
or doing any good driver refactoring will remove much more lines than add
for quite some time. There's a basic roadmap with some deduplications
planned already, not speaking of that touching every line now asks:
"can I share this?". The final destination is very ambitious: have only
one unified driver for at least i40e, ice, iavf, and idpf with a struct
ops for each generation. That's never gonna happen, right? But you still
can at least try.
PP conversion for iavf lands within the same series as these two are tied
closely. libie will support Page Pool model only, so that a driver can't
use much of the lib until it's converted. iavf is only the example, the
rest will eventually be converted soon on a per-driver basis. That is
when it gets really interesting. Stay tech.

Alexander Lobakin (10):
net: intel: introduce {,Intel} Ethernet common library
iavf: kill "legacy-rx" for good
iavf: drop page splitting and recycling
slab: introduce kvmalloc_array_node() and kvcalloc_node()
page_pool: constify some read-only function arguments
page_pool: add DMA-sync-for-CPU inline helper
libeth: add Rx buffer management
iavf: pack iavf_ring more efficiently
iavf: switch to Page Pool
MAINTAINERS: add entry for libeth and libie

MAINTAINERS | 20 +
drivers/net/ethernet/intel/Kconfig | 7 +
drivers/net/ethernet/intel/libeth/Kconfig | 9 +
drivers/net/ethernet/intel/libie/Kconfig | 10 +
drivers/net/ethernet/intel/Makefile | 3 +
drivers/net/ethernet/intel/libeth/Makefile | 6 +
drivers/net/ethernet/intel/libie/Makefile | 6 +
include/net/page_pool/types.h | 4 +-
.../net/ethernet/intel/i40e/i40e_prototype.h | 7 -
drivers/net/ethernet/intel/i40e/i40e_type.h | 88 ---
drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
.../net/ethernet/intel/iavf/iavf_prototype.h | 7 -
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 146 +----
drivers/net/ethernet/intel/iavf/iavf_type.h | 90 ---
.../net/ethernet/intel/ice/ice_lan_tx_rx.h | 320 ----------
include/linux/net/intel/libie/rx.h | 50 ++
include/linux/slab.h | 17 +-
include/net/libeth/rx.h | 242 ++++++++
include/net/page_pool/helpers.h | 34 +-
drivers/net/ethernet/intel/i40e/i40e_common.c | 253 --------
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 72 +--
drivers/net/ethernet/intel/iavf/iavf_common.c | 253 --------
.../net/ethernet/intel/iavf/iavf_ethtool.c | 140 -----
drivers/net/ethernet/intel/iavf/iavf_main.c | 40 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 551 +++---------------
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 17 +-
drivers/net/ethernet/intel/ice/ice_main.c | 1 +
drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 111 +---
drivers/net/ethernet/intel/libeth/rx.c | 150 +++++
drivers/net/ethernet/intel/libie/rx.c | 124 ++++
net/core/page_pool.c | 10 +-
32 files changed, 836 insertions(+), 1955 deletions(-)
create mode 100644 drivers/net/ethernet/intel/libeth/Kconfig
create mode 100644 drivers/net/ethernet/intel/libie/Kconfig
create mode 100644 drivers/net/ethernet/intel/libeth/Makefile
create mode 100644 drivers/net/ethernet/intel/libie/Makefile
create mode 100644 include/linux/net/intel/libie/rx.h
create mode 100644 include/net/libeth/rx.h
create mode 100644 drivers/net/ethernet/intel/libeth/rx.c
create mode 100644 drivers/net/ethernet/intel/libie/rx.c

---
libeth has way more generic functionality and code in the idpf XDP
tree[0], take a look if you want to have more complete picture of
what this really is about.

From v9[1]:
* pick Acked-by from Vlastimil and a couple Reviewed-by from Przemek;
* mention that the libeth_fq::fp kernel-doc generates a warning and the
fix for that is pending on the linux-doc ML (Jakub);
* no functional changes.

From v8[2]:
* rebase on top of net-next (6.9-rc1+);
* introduce kvmalloc_array_node() and kvcalloc_node();
* make Rx buffer management NUMA-aware;
* resolve kdoc issues (Jakub, me).

From v7[3]:
* drop Page Pool optimization prereqs;
* drop generic stats part: will redo to the new per-queue stats later;
* split libie into "generic" and "fnic" (i40e, ice, iavf) parts;
* use shorter and modern struct names;
* #1: allow to compile-out hotpath IPv6 code when !CONFIG_IPV6;
* #1: generate XDP RSS hash directly in the lookup table;
* #8: fix rare skb nullptr deref bug.

From v6[4]:
* #04: resolve ethtool_puts() Git conflict (Jakub);
* #06: pick RB from Ilias;
* no functional changes.

From v5[5]:
* drop Page Pool DMA shortcut: will pick up Eric's more global DMA sync
optimization[6] and expand it to cover both IOMMU and direct DMA a bit
later (Yunsheng);
* drop per-queue Page Pool Ethtool stats: they are now exported via
generic Netlink interface (Jakub);
* #01: leave a comment why exactly this alignment (Jakub, Yunsheng);
* #08: make use of page_pool_params::netdev when calculating PP params;
* #08: rename ``libie_rx_queue`` -> ``libie_buf_queue``.

From v4[7]:
* make use of Jakub's &page_pool_params split;
* #01: prevent frag fields from spanning into 2 cachelines after
splitting &page_pool_params into fast and slow;
* #02-03: bring back the DMA sync shortcut, now as a per-page flag
(me, Yunsheng);
* #04: let libie have its own Kconfig to stop further bloating of poor
intel/Kconfig;
* #06: merge page split-reuse-recycle drop into one commit (Alex);
* #07: decouple constifying of several Page Pool function arguments
into a separate commit, constify some more;
* #09: stop abusing internal PP fields in the driver code (Yunsheng);
* #09: calculate DMA sync size (::max_len) correctly: within one page,
not one buffer (Yunsheng);
* #10: decouple rearranging &iavf_ring into separate commit, optimize
it even more;
* #11: let the driver get back to the last descriptor to process after
an skb allocation fail, don't drop it (Alex);
* #11: stop touching unrelated stuff like watchdog timeout etc. (Alex);
* fix "Return:" in the kdoc (now `W=12 C=1` is clean), misc typos.

From v3[8]:
* base on the latest net-next, update bloat-o-meter and perf stats;
* split generic PP optimizations into a separate series;
* drop "optimize hotpath a bunch" commit: a lot of [controversial]
changes in one place, worth own series (Alex);
* 02: pick Rev-by (Alex);
* 03: move in-place recycling removal here from the dropped patch;
* 05: new, add libie Rx buffer API separatelly from IAVF changes;
* 05-06: use new "hybrid" allocation API from[8] to reduce memory usage
when a page can fit more than 1 truesize (also asked by David);
* 06: merge with "always use order-0 page" commit to reduce diffs and
simplify things (Alex);
* 09: fix page_alloc_fail counter.

From v2[10]:
* 0006: fix page_pool.h include in OcteonTX2 files (Jakub, Patchwork);
* no functional changes.

From v1[11]:
* 0006: new (me, Jakub);
* 0008: give the helpers more intuitive names (Jakub, Ilias);
* -^-: also expand their kdoc a bit for the same reason;
* -^-: fix kdoc copy-paste issue (Patchwork, Jakub);
* 0011: drop `inline` from C file (Patchwork, Jakub).

[0] https://github.com/alobakin/linux/commits/idpf-libie-new
[1] https://lore.kernel.org/netdev/[email protected]
[2] https://lore.kernel.org/netdev/[email protected]
[3] https://lore.kernel.org/netdev/[email protected]
[4] https://lore.kernel.org/netdev/[email protected]
[5] https://lore.kernel.org/netdev/[email protected]
[6] https://lore.kernel.org/netdev/[email protected]
[7] https://lore.kernel.org/netdev/[email protected]
[8] https://lore.kernel.org/netdev/[email protected]
[9] https://lore.kernel.org/netdev/[email protected]
[10] https://lore.kernel.org/netdev/[email protected]
[11] https://lore.kernel.org/netdev/[email protected]

--
2.44.0



2024-04-18 11:38:35

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 06/10] page_pool: add DMA-sync-for-CPU inline helper

Each driver is responsible for syncing buffers written by HW for CPU
before accessing them. Almost each PP-enabled driver uses the same
pattern, which could be shorthanded into a static inline to make driver
code a little bit more compact.
Introduce a simple helper which performs DMA synchronization for the
size passed from the driver. It can be used even when the pool doesn't
manage DMA-syncs-for-device, just make sure the page has a correct DMA
address set via page_pool_set_dma_addr().

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/helpers.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index c7bb06750e85..873631c79ab1 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -52,6 +52,8 @@
#ifndef _NET_PAGE_POOL_HELPERS_H
#define _NET_PAGE_POOL_HELPERS_H

+#include <linux/dma-mapping.h>
+
#include <net/page_pool/types.h>

#ifdef CONFIG_PAGE_POOL_STATS
@@ -395,6 +397,28 @@ static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
return false;
}

+/**
+ * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
+ * @pool: &page_pool the @page belongs to
+ * @page: page to sync
+ * @offset: offset from page start to "hard" start if using PP frags
+ * @dma_sync_size: size of the data written to the page
+ *
+ * Can be used as a shorthand to sync Rx pages before accessing them in the
+ * driver. Caller must ensure the pool was created with ``PP_FLAG_DMA_MAP``.
+ * Note that this version performs DMA sync unconditionally, even if the
+ * associated PP doesn't perform sync-for-device.
+ */
+static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
+ const struct page *page,
+ u32 offset, u32 dma_sync_size)
+{
+ dma_sync_single_range_for_cpu(pool->p.dev,
+ page_pool_get_dma_addr(page),
+ offset + pool->p.offset, dma_sync_size,
+ page_pool_get_dma_dir(pool));
+}
+
static inline bool page_pool_put(struct page_pool *pool)
{
return refcount_dec_and_test(&pool->user_cnt);
--
2.44.0


2024-04-18 11:39:01

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 07/10] libeth: add Rx buffer management

Add a couple intuitive helpers to hide Rx buffer implementation details
in the library and not multiplicate it between drivers. The settings are
sorta optimized for 100G+ NICs, but nothing really HW-specific here.
Use the new page_pool_dev_alloc() to dynamically switch between
split-page and full-page modes depending on MTU, page size, required
headroom etc. For example, on x86_64 with the default driver settings
each page is shared between 2 buffers. Turning on XDP (not in this
series) -> increasing headroom requirement pushes truesize out of 2048
boundary, leading to that each buffer starts getting a full page.
The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to
avoid compound overhead. For the above architecture, this means maximum
linear frame size of 3712 w/o XDP.
Not that &libeth_buf_queue is not a complete queue/ring structure for
now, rather a shim, but eventually the libeth-enabled drivers will move
to it, with iavf being the first one.

Signed-off-by: Alexander Lobakin <[email protected]>
---
kernel-doc for libeth_fq::fp generates an "Excess struct member" warning
currently, here's patch which fixes the script: [0]

[0] https://lore.kernel.org/linux-doc/[email protected]
---
drivers/net/ethernet/intel/libeth/Kconfig | 1 +
include/net/libeth/rx.h | 117 ++++++++++++++++++++++
drivers/net/ethernet/intel/libeth/rx.c | 98 ++++++++++++++++++
3 files changed, 216 insertions(+)

diff --git a/drivers/net/ethernet/intel/libeth/Kconfig b/drivers/net/ethernet/intel/libeth/Kconfig
index af970a63c227..480293b71dbc 100644
--- a/drivers/net/ethernet/intel/libeth/Kconfig
+++ b/drivers/net/ethernet/intel/libeth/Kconfig
@@ -3,6 +3,7 @@

config LIBETH
tristate
+ select PAGE_POOL
help
libeth is a common library containing routines shared between several
drivers, but not yet promoted to the generic kernel API.
diff --git a/include/net/libeth/rx.h b/include/net/libeth/rx.h
index 0807e19f44b3..f29ea3e34c6c 100644
--- a/include/net/libeth/rx.h
+++ b/include/net/libeth/rx.h
@@ -4,8 +4,125 @@
#ifndef __LIBETH_RX_H
#define __LIBETH_RX_H

+#include <linux/if_vlan.h>
+
+#include <net/page_pool/helpers.h>
#include <net/xdp.h>

+/* Rx buffer management */
+
+/* Space reserved in front of each frame */
+#define LIBETH_SKB_HEADROOM (NET_SKB_PAD + NET_IP_ALIGN)
+/* Maximum headroom for worst-case calculations */
+#define LIBETH_MAX_HEADROOM LIBETH_SKB_HEADROOM
+/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
+#define LIBETH_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
+
+/* Always use order-0 pages */
+#define LIBETH_RX_PAGE_ORDER 0
+/* Pick a sane buffer stride and align to a cacheline boundary */
+#define LIBETH_RX_BUF_STRIDE SKB_DATA_ALIGN(128)
+/* HW-writeable space in one buffer: truesize - headroom/tailroom, aligned */
+#define LIBETH_RX_PAGE_LEN(hr) \
+ ALIGN_DOWN(SKB_MAX_ORDER(hr, LIBETH_RX_PAGE_ORDER), \
+ LIBETH_RX_BUF_STRIDE)
+
+/**
+ * struct libeth_fqe - structure representing an Rx buffer (fill queue element)
+ * @page: page holding the buffer
+ * @offset: offset from the page start (to the headroom)
+ * @truesize: total space occupied by the buffer (w/ headroom and tailroom)
+ *
+ * Depending on the MTU, API switches between one-page-per-frame and shared
+ * page model (to conserve memory on bigger-page platforms). In case of the
+ * former, @offset is always 0 and @truesize is always ```PAGE_SIZE```.
+ */
+struct libeth_fqe {
+ struct page *page;
+ u32 offset;
+ u32 truesize;
+} __aligned_largest;
+
+/**
+ * struct libeth_fq - structure representing a buffer (fill) queue
+ * @fp: hotpath part of the structure
+ * @pp: &page_pool for buffer management
+ * @fqes: array of Rx buffers
+ * @truesize: size to allocate per buffer, w/overhead
+ * @count: number of descriptors/buffers the queue has
+ * @buf_len: HW-writeable length per each buffer
+ * @nid: ID of the closest NUMA node with memory
+ */
+struct libeth_fq {
+ struct_group_tagged(libeth_fq_fp, fp,
+ struct page_pool *pp;
+ struct libeth_fqe *fqes;
+
+ u32 truesize;
+ u32 count;
+ );
+
+ /* Cold fields */
+ u32 buf_len;
+ int nid;
+};
+
+int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi);
+void libeth_rx_fq_destroy(struct libeth_fq *fq);
+
+/**
+ * libeth_rx_alloc - allocate a new Rx buffer
+ * @fq: fill queue to allocate for
+ * @i: index of the buffer within the queue
+ *
+ * Return: DMA address to be passed to HW for Rx on successful allocation,
+ * ```DMA_MAPPING_ERROR``` otherwise.
+ */
+static inline dma_addr_t libeth_rx_alloc(const struct libeth_fq_fp *fq, u32 i)
+{
+ struct libeth_fqe *buf = &fq->fqes[i];
+
+ buf->truesize = fq->truesize;
+ buf->page = page_pool_dev_alloc(fq->pp, &buf->offset, &buf->truesize);
+ if (unlikely(!buf->page))
+ return DMA_MAPPING_ERROR;
+
+ return page_pool_get_dma_addr(buf->page) + buf->offset +
+ fq->pp->p.offset;
+}
+
+void libeth_rx_recycle_slow(struct page *page);
+
+/**
+ * libeth_rx_sync_for_cpu - synchronize or recycle buffer post DMA
+ * @fqe: buffer to process
+ * @len: frame length from the descriptor
+ *
+ * Process the buffer after it's written by HW. The regular path is to
+ * synchronize DMA for CPU, but in case of no data it will be immediately
+ * recycled back to its PP.
+ *
+ * Return: true when there's data to process, false otherwise.
+ */
+static inline bool libeth_rx_sync_for_cpu(const struct libeth_fqe *fqe,
+ u32 len)
+{
+ struct page *page = fqe->page;
+
+ /* Very rare, but possible case. The most common reason:
+ * the last fragment contained FCS only, which was then
+ * stripped by the HW.
+ */
+ if (unlikely(!len)) {
+ libeth_rx_recycle_slow(page);
+ return false;
+ }
+
+ page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
+
+ return true;
+}
+
/* Converting abstract packet type numbers into a software structure with
* the packet parameters to do O(1) lookup on Rx.
*/
diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/intel/libeth/rx.c
index 879c4dfd6a4e..6221b88c34ac 100644
--- a/drivers/net/ethernet/intel/libeth/rx.c
+++ b/drivers/net/ethernet/intel/libeth/rx.c
@@ -3,6 +3,104 @@

#include <net/libeth/rx.h>

+/* Rx buffer management */
+
+/**
+ * libeth_rx_hw_len - get the actual buffer size to be passed to HW
+ * @pp: &page_pool_params of the netdev to calculate the size for
+ * @max_len: maximum buffer size for a single descriptor
+ *
+ * Return: HW-writeable length per one buffer to pass it to the HW accounting:
+ * MTU the @dev has, HW required alignment, minimum and maximum allowed values,
+ * and system's page size.
+ */
+static u32 libeth_rx_hw_len(const struct page_pool_params *pp, u32 max_len)
+{
+ u32 len;
+
+ len = READ_ONCE(pp->netdev->mtu) + LIBETH_RX_LL_LEN;
+ len = ALIGN(len, LIBETH_RX_BUF_STRIDE);
+ len = min3(len, ALIGN_DOWN(max_len ? : U32_MAX, LIBETH_RX_BUF_STRIDE),
+ pp->max_len);
+
+ return len;
+}
+
+/**
+ * libeth_rx_fq_create - create a PP with the default libeth settings
+ * @fq: buffer queue struct to fill
+ * @napi: &napi_struct covering this PP (no usage outside its poll loops)
+ *
+ * Return: %0 on success, -%errno on failure.
+ */
+int libeth_rx_fq_create(struct libeth_fq *fq, struct napi_struct *napi)
+{
+ struct page_pool_params pp = {
+ .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+ .order = LIBETH_RX_PAGE_ORDER,
+ .pool_size = fq->count,
+ .nid = fq->nid,
+ .dev = napi->dev->dev.parent,
+ .netdev = napi->dev,
+ .napi = napi,
+ .dma_dir = DMA_FROM_DEVICE,
+ .offset = LIBETH_SKB_HEADROOM,
+ };
+ struct libeth_fqe *fqes;
+ struct page_pool *pool;
+
+ /* HW-writeable / syncable length per one page */
+ pp.max_len = LIBETH_RX_PAGE_LEN(pp.offset);
+
+ /* HW-writeable length per buffer */
+ fq->buf_len = libeth_rx_hw_len(&pp, fq->buf_len);
+ /* Buffer size to allocate */
+ fq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset +
+ fq->buf_len));
+
+ pool = page_pool_create(&pp);
+ if (IS_ERR(pool))
+ return PTR_ERR(pool);
+
+ fqes = kvcalloc_node(fq->count, sizeof(*fqes), GFP_KERNEL, fq->nid);
+ if (!fqes)
+ goto err_buf;
+
+ fq->fqes = fqes;
+ fq->pp = pool;
+
+ return 0;
+
+err_buf:
+ page_pool_destroy(pool);
+
+ return -ENOMEM;
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_create, LIBETH);
+
+/**
+ * libeth_rx_fq_destroy - destroy a &page_pool created by libeth
+ * @fq: buffer queue to process
+ */
+void libeth_rx_fq_destroy(struct libeth_fq *fq)
+{
+ kvfree(fq->fqes);
+ page_pool_destroy(fq->pp);
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, LIBETH);
+
+/**
+ * libeth_rx_recycle_slow - recycle a libeth page from the NAPI context
+ * @page: page to recycle
+ *
+ * To be used on exceptions or rare cases not requiring fast inline recycling.
+ */
+void libeth_rx_recycle_slow(struct page *page)
+{
+ page_pool_recycle_direct(page->pp, page);
+}
+EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, LIBETH);
+
/* Converting abstract packet type numbers into a software structure with
* the packet parameters to do O(1) lookup on Rx.
*/
--
2.44.0


2024-04-18 11:39:17

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 03/10] iavf: drop page splitting and recycling

As an intermediate step, remove all page splitting/recycling code. Just
always allocate a new page and don't touch its refcount, so that it gets
freed by the core stack later.
Same for the "in-place" recycling, i.e. when an unused buffer gets
assigned to a first needs-refilling descriptor. In some cases, this
was leading to moving up to 63 &iavf_rx_buf structures around the ring
on a per-field basis -- not something wanted on hotpath.
The change allows to greatly simplify certain parts of the code:

Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-744 (-744)

Although the array of &iavf_rx_buf is barely used now and could be
replaced with just page pointer array, don't touch it now to not
complicate replacing it with libie Rx buffer struct later on.
No surprise perf loses up to 30% here, but that regression will
go away once PP lands.
Note that iavf_rx_pg_*() definitions are left to reduce diffstat.
They will be removed with the conversion to Page Pool.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 65 --------
drivers/net/ethernet/intel/iavf/iavf_type.h | 2 -
drivers/net/ethernet/intel/iavf/iavf_main.c | 24 +--
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 152 +-----------------
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 8 +-
5 files changed, 10 insertions(+), 241 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 68543efdd29b..e01777531635 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -81,8 +81,6 @@ enum iavf_dyn_idx_t {
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))

/* Supported Rx Buffer Sizes (a multiple of 128) */
-#define IAVF_RXBUFFER_1536 1536 /* 128B aligned standard Ethernet frame */
-#define IAVF_RXBUFFER_2048 2048
#define IAVF_RXBUFFER_3072 3072 /* Used for large frames w/ padding */
#define IAVF_MAX_RXBUFFER 9728 /* largest size for single descriptor */

@@ -92,57 +90,7 @@ enum iavf_dyn_idx_t {
#define IAVF_RX_DMA_ATTR \
(DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)

-/* Attempt to maximize the headroom available for incoming frames. We
- * use a 2K buffer for receives and need 1536/1534 to store the data for
- * the frame. This leaves us with 512 bytes of room. From that we need
- * to deduct the space needed for the shared info and the padding needed
- * to IP align the frame.
- *
- * Note: For cache line sizes 256 or larger this value is going to end
- * up negative. In these cases we should fall back to the legacy
- * receive path.
- */
-#if (PAGE_SIZE < 8192)
-#define IAVF_2K_TOO_SMALL_WITH_PADDING \
-((NET_SKB_PAD + IAVF_RXBUFFER_1536) > SKB_WITH_OVERHEAD(IAVF_RXBUFFER_2048))
-
-static inline int iavf_compute_pad(int rx_buf_len)
-{
- int page_size, pad_size;
-
- page_size = ALIGN(rx_buf_len, PAGE_SIZE / 2);
- pad_size = SKB_WITH_OVERHEAD(page_size) - rx_buf_len;
-
- return pad_size;
-}
-
-static inline int iavf_skb_pad(void)
-{
- int rx_buf_len;
-
- /* If a 2K buffer cannot handle a standard Ethernet frame then
- * optimize padding for a 3K buffer instead of a 1.5K buffer.
- *
- * For a 3K buffer we need to add enough padding to allow for
- * tailroom due to NET_IP_ALIGN possibly shifting us out of
- * cache-line alignment.
- */
- if (IAVF_2K_TOO_SMALL_WITH_PADDING)
- rx_buf_len = IAVF_RXBUFFER_3072 + SKB_DATA_ALIGN(NET_IP_ALIGN);
- else
- rx_buf_len = IAVF_RXBUFFER_1536;
-
- /* if needed make room for NET_IP_ALIGN */
- rx_buf_len -= NET_IP_ALIGN;
-
- return iavf_compute_pad(rx_buf_len);
-}
-
-#define IAVF_SKB_PAD iavf_skb_pad()
-#else
-#define IAVF_2K_TOO_SMALL_WITH_PADDING false
#define IAVF_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
-#endif

/**
* iavf_test_staterr - tests bits in Rx descriptor status and error fields
@@ -265,12 +213,7 @@ struct iavf_tx_buffer {
struct iavf_rx_buffer {
dma_addr_t dma;
struct page *page;
-#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
__u32 page_offset;
-#else
- __u16 page_offset;
-#endif
- __u16 pagecnt_bias;
};

struct iavf_queue_stats {
@@ -292,8 +235,6 @@ struct iavf_rx_queue_stats {
u64 non_eop_descs;
u64 alloc_page_failed;
u64 alloc_buff_failed;
- u64 page_reuse_count;
- u64 realloc_count;
};

enum iavf_ring_state_t {
@@ -337,7 +278,6 @@ struct iavf_ring {

u16 count; /* Number of descriptors */
u16 reg_idx; /* HW register index of the ring */
- u16 rx_buf_len;

/* used in interrupt processing */
u16 next_to_use;
@@ -373,7 +313,6 @@ struct iavf_ring {
struct iavf_q_vector *q_vector; /* Backreference to associated vector */

struct rcu_head rcu; /* to avoid race on free */
- u16 next_to_alloc;
struct sk_buff *skb; /* When iavf_clean_rx_ring_irq() must
* return before it sees the EOP for
* the current packet, we save that skb
@@ -407,10 +346,6 @@ struct iavf_ring_container {

static inline unsigned int iavf_rx_pg_order(struct iavf_ring *ring)
{
-#if (PAGE_SIZE < 8192)
- if (ring->rx_buf_len > (PAGE_SIZE / 2))
- return 1;
-#endif
return 0;
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index 23ded4fcd94f..f6b09e57abce 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -10,8 +10,6 @@
#include "iavf_adminq.h"
#include "iavf_devids.h"

-#define IAVF_RXQ_CTX_DBUFF_SHIFT 7
-
/* IAVF_MASK is a macro used on 32 bit registers */
#define IAVF_MASK(mask, shift) ((u32)(mask) << (shift))

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 5eb7379956e4..ffb71a62b105 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -715,32 +715,10 @@ static void iavf_configure_tx(struct iavf_adapter *adapter)
**/
static void iavf_configure_rx(struct iavf_adapter *adapter)
{
- unsigned int rx_buf_len = IAVF_RXBUFFER_2048;
struct iavf_hw *hw = &adapter->hw;
- int i;
-
- if (PAGE_SIZE < 8192) {
- struct net_device *netdev = adapter->netdev;

- /* For jumbo frames on systems with 4K pages we have to use
- * an order 1 page, so we might as well increase the size
- * of our Rx buffer to make better use of the available space
- */
- rx_buf_len = IAVF_RXBUFFER_3072;
-
- /* We use a 1536 buffer size for configurations with
- * standard Ethernet mtu. On x86 this gives us enough room
- * for shared info and 192 bytes of padding.
- */
- if (!IAVF_2K_TOO_SMALL_WITH_PADDING &&
- (netdev->mtu <= ETH_DATA_LEN))
- rx_buf_len = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;
- }
-
- for (i = 0; i < adapter->num_active_queues; i++) {
+ for (u32 i = 0; i < adapter->num_active_queues; i++)
adapter->rx_rings[i].tail = hw->hw_addr + IAVF_QRX_TAIL1(i);
- adapter->rx_rings[i].rx_buf_len = rx_buf_len;
- }
}

/**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index a908b394409d..c7602c173420 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -715,7 +715,7 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
dma_sync_single_range_for_cpu(rx_ring->dev,
rx_bi->dma,
rx_bi->page_offset,
- rx_ring->rx_buf_len,
+ IAVF_RXBUFFER_3072,
DMA_FROM_DEVICE);

/* free resources associated with mapping */
@@ -724,7 +724,7 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
DMA_FROM_DEVICE,
IAVF_RX_DMA_ATTR);

- __page_frag_cache_drain(rx_bi->page, rx_bi->pagecnt_bias);
+ __free_page(rx_bi->page);

rx_bi->page = NULL;
rx_bi->page_offset = 0;
@@ -736,7 +736,6 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
/* Zero out the descriptor ring */
memset(rx_ring->desc, 0, rx_ring->size);

- rx_ring->next_to_alloc = 0;
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
}
@@ -792,7 +791,6 @@ int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
goto err;
}

- rx_ring->next_to_alloc = 0;
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

@@ -812,9 +810,6 @@ static void iavf_release_rx_desc(struct iavf_ring *rx_ring, u32 val)
{
rx_ring->next_to_use = val;

- /* update next to alloc since we have filled the ring */
- rx_ring->next_to_alloc = val;
-
/* Force memory writes to complete before letting h/w
* know there are new descriptors to fetch. (Only
* applicable for weak-ordered memory model archs,
@@ -838,12 +833,6 @@ static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,
struct page *page = bi->page;
dma_addr_t dma;

- /* since we are recycling buffers we should seldom need to alloc */
- if (likely(page)) {
- rx_ring->rx_stats.page_reuse_count++;
- return true;
- }
-
/* alloc new page for storage */
page = dev_alloc_pages(iavf_rx_pg_order(rx_ring));
if (unlikely(!page)) {
@@ -870,9 +859,6 @@ static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,
bi->page = page;
bi->page_offset = IAVF_SKB_PAD;

- /* initialize pagecnt_bias to 1 representing we fully own page */
- bi->pagecnt_bias = 1;
-
return true;
}

@@ -924,7 +910,7 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
/* sync the buffer for use by the device */
dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
bi->page_offset,
- rx_ring->rx_buf_len,
+ IAVF_RXBUFFER_3072,
DMA_FROM_DEVICE);

/* Refresh the desc even if buffer_addrs didn't change
@@ -1102,91 +1088,6 @@ static bool iavf_cleanup_headers(struct iavf_ring *rx_ring, struct sk_buff *skb)
return false;
}

-/**
- * iavf_reuse_rx_page - page flip buffer and store it back on the ring
- * @rx_ring: rx descriptor ring to store buffers on
- * @old_buff: donor buffer to have page reused
- *
- * Synchronizes page for reuse by the adapter
- **/
-static void iavf_reuse_rx_page(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *old_buff)
-{
- struct iavf_rx_buffer *new_buff;
- u16 nta = rx_ring->next_to_alloc;
-
- new_buff = &rx_ring->rx_bi[nta];
-
- /* update, and store next to alloc */
- nta++;
- rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
-
- /* transfer page from old buffer to new buffer */
- new_buff->dma = old_buff->dma;
- new_buff->page = old_buff->page;
- new_buff->page_offset = old_buff->page_offset;
- new_buff->pagecnt_bias = old_buff->pagecnt_bias;
-}
-
-/**
- * iavf_can_reuse_rx_page - Determine if this page can be reused by
- * the adapter for another receive
- *
- * @rx_buffer: buffer containing the page
- *
- * If page is reusable, rx_buffer->page_offset is adjusted to point to
- * an unused region in the page.
- *
- * For small pages, @truesize will be a constant value, half the size
- * of the memory at page. We'll attempt to alternate between high and
- * low halves of the page, with one half ready for use by the hardware
- * and the other half being consumed by the stack. We use the page
- * ref count to determine whether the stack has finished consuming the
- * portion of this page that was passed up with a previous packet. If
- * the page ref count is >1, we'll assume the "other" half page is
- * still busy, and this page cannot be reused.
- *
- * For larger pages, @truesize will be the actual space used by the
- * received packet (adjusted upward to an even multiple of the cache
- * line size). This will advance through the page by the amount
- * actually consumed by the received packets while there is still
- * space for a buffer. Each region of larger pages will be used at
- * most once, after which the page will not be reused.
- *
- * In either case, if the page is reusable its refcount is increased.
- **/
-static bool iavf_can_reuse_rx_page(struct iavf_rx_buffer *rx_buffer)
-{
- unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
- struct page *page = rx_buffer->page;
-
- /* Is any reuse possible? */
- if (!dev_page_is_reusable(page))
- return false;
-
-#if (PAGE_SIZE < 8192)
- /* if we are only owner of page we can reuse it */
- if (unlikely((page_count(page) - pagecnt_bias) > 1))
- return false;
-#else
-#define IAVF_LAST_OFFSET \
- (SKB_WITH_OVERHEAD(PAGE_SIZE) - IAVF_RXBUFFER_2048)
- if (rx_buffer->page_offset > IAVF_LAST_OFFSET)
- return false;
-#endif
-
- /* If we have drained the page fragment pool we need to update
- * the pagecnt_bias and page count so that we fully restock the
- * number of references the driver holds.
- */
- if (unlikely(!pagecnt_bias)) {
- page_ref_add(page, USHRT_MAX);
- rx_buffer->pagecnt_bias = USHRT_MAX;
- }
-
- return true;
-}
-
/**
* iavf_add_rx_frag - Add contents of Rx buffer to sk_buff
* @rx_ring: rx descriptor ring to transact packets on
@@ -1204,24 +1105,13 @@ static void iavf_add_rx_frag(struct iavf_ring *rx_ring,
struct sk_buff *skb,
unsigned int size)
{
-#if (PAGE_SIZE < 8192)
- unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
-#else
unsigned int truesize = SKB_DATA_ALIGN(size + IAVF_SKB_PAD);
-#endif

if (!size)
return;

skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
rx_buffer->page_offset, size, truesize);
-
- /* page is being used so we must update the page offset */
-#if (PAGE_SIZE < 8192)
- rx_buffer->page_offset ^= truesize;
-#else
- rx_buffer->page_offset += truesize;
-#endif
}

/**
@@ -1249,9 +1139,6 @@ static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
size,
DMA_FROM_DEVICE);

- /* We have pulled a buffer for use, so decrement pagecnt_bias */
- rx_buffer->pagecnt_bias--;
-
return rx_buffer;
}

@@ -1269,12 +1156,8 @@ static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
unsigned int size)
{
void *va;
-#if (PAGE_SIZE < 8192)
- unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
-#else
unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
SKB_DATA_ALIGN(IAVF_SKB_PAD + size);
-#endif
struct sk_buff *skb;

if (!rx_buffer || !size)
@@ -1292,23 +1175,15 @@ static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
skb_reserve(skb, IAVF_SKB_PAD);
__skb_put(skb, size);

- /* buffer is used by skb, update page_offset */
-#if (PAGE_SIZE < 8192)
- rx_buffer->page_offset ^= truesize;
-#else
- rx_buffer->page_offset += truesize;
-#endif
-
return skb;
}

/**
- * iavf_put_rx_buffer - Clean up used buffer and either recycle or free
+ * iavf_put_rx_buffer - Unmap used buffer
* @rx_ring: rx descriptor ring to transact packets on
* @rx_buffer: rx buffer to pull data from
*
- * This function will clean up the contents of the rx_buffer. It will
- * either recycle the buffer or unmap it and free the associated resources.
+ * This function will unmap the buffer after it's written by HW.
*/
static void iavf_put_rx_buffer(struct iavf_ring *rx_ring,
struct iavf_rx_buffer *rx_buffer)
@@ -1316,18 +1191,9 @@ static void iavf_put_rx_buffer(struct iavf_ring *rx_ring,
if (!rx_buffer)
return;

- if (iavf_can_reuse_rx_page(rx_buffer)) {
- /* hand second half of page back to the ring */
- iavf_reuse_rx_page(rx_ring, rx_buffer);
- rx_ring->rx_stats.page_reuse_count++;
- } else {
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
- iavf_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);
- __page_frag_cache_drain(rx_buffer->page,
- rx_buffer->pagecnt_bias);
- }
+ /* we are not reusing the buffer so unmap it */
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
+ DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);

/* clear contents of buffer_info */
rx_buffer->page = NULL;
@@ -1432,8 +1298,6 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
/* exit if we failed to retrieve a buffer */
if (!skb) {
rx_ring->rx_stats.alloc_buff_failed++;
- if (rx_buffer && size)
- rx_buffer->pagecnt_bias++;
break;
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index a31df5af0473..f8e9f859a4f1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -288,10 +288,6 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
if (!vqci)
return;

- /* Limit maximum frame size when jumbo frames is not enabled */
- if (adapter->netdev->mtu <= ETH_DATA_LEN)
- max_frame = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;
-
vqci->vsi_id = adapter->vsi_res->vsi_id;
vqci->num_queue_pairs = pairs;
vqpi = vqci->qpair;
@@ -308,9 +304,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
vqpi->rxq.ring_len = adapter->rx_rings[i].count;
vqpi->rxq.dma_ring_addr = adapter->rx_rings[i].dma;
vqpi->rxq.max_pkt_size = max_frame;
- vqpi->rxq.databuffer_size =
- ALIGN(adapter->rx_rings[i].rx_buf_len,
- BIT_ULL(IAVF_RXQ_CTX_DBUFF_SHIFT));
+ vqpi->rxq.databuffer_size = IAVF_RXBUFFER_3072;
if (CRC_OFFLOAD_ALLOWED(adapter))
vqpi->rxq.crc_disable = !!(adapter->netdev->features &
NETIF_F_RXFCS);
--
2.44.0


2024-04-18 11:39:40

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 09/10] iavf: switch to Page Pool

Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
no custom recycling logics, it can easily be switched to using Page
Pool / libeth API instead.
This allows to removing the whole dancing around headroom, HW buffer
size, and page order. All DMA-for-device is now done in the PP core,
for-CPU -- in the libeth helper.
Use skb_mark_for_recycle() to bring back the recycling and restore the
performance. Speaking of performance: on par with the baseline and
faster with the PP optimization series applied. But the memory usage for
1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page
every second descriptor.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 34 +--
include/linux/net/intel/libie/rx.h | 17 ++
drivers/net/ethernet/intel/iavf/iavf_main.c | 7 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 257 +++++-------------
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 10 +-
5 files changed, 111 insertions(+), 214 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index ed559fa6f214..d7b5587aeb8e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -80,18 +80,8 @@ enum iavf_dyn_idx_t {
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) | \
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))

-/* Supported Rx Buffer Sizes (a multiple of 128) */
-#define IAVF_RXBUFFER_3072 3072 /* Used for large frames w/ padding */
-#define IAVF_MAX_RXBUFFER 9728 /* largest size for single descriptor */
-
-#define IAVF_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
#define iavf_rx_desc iavf_32byte_rx_desc

-#define IAVF_RX_DMA_ATTR \
- (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
-
-#define IAVF_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
-
/**
* iavf_test_staterr - tests bits in Rx descriptor status and error fields
* @rx_desc: pointer to receive descriptor (in le64 format)
@@ -210,12 +200,6 @@ struct iavf_tx_buffer {
u32 tx_flags;
};

-struct iavf_rx_buffer {
- dma_addr_t dma;
- struct page *page;
- __u32 page_offset;
-};
-
struct iavf_queue_stats {
u64 packets;
u64 bytes;
@@ -251,13 +235,18 @@ struct iavf_rx_queue_stats {
struct iavf_ring {
struct iavf_ring *next; /* pointer to next ring in q_vector */
void *desc; /* Descriptor ring memory */
- struct device *dev; /* Used for DMA mapping */
+ union {
+ struct page_pool *pp; /* Used on Rx for buffer management */
+ struct device *dev; /* Used on Tx for DMA mapping */
+ };
struct net_device *netdev; /* netdev ring maps to */
union {
+ struct libeth_fqe *rx_fqes;
struct iavf_tx_buffer *tx_bi;
- struct iavf_rx_buffer *rx_bi;
};
u8 __iomem *tail;
+ u32 truesize;
+
u16 queue_index; /* Queue number of ring */

/* high bit set means dynamic, use accessors routines to read/write.
@@ -305,6 +294,8 @@ struct iavf_ring {
* iavf_clean_rx_ring_irq() is called
* for this ring.
*/
+
+ u32 rx_buf_len;
} ____cacheline_internodealigned_in_smp;

#define IAVF_ITR_ADAPTIVE_MIN_INC 0x0002
@@ -328,13 +319,6 @@ struct iavf_ring_container {
#define iavf_for_each_ring(pos, head) \
for (pos = (head).ring; pos != NULL; pos = pos->next)

-static inline unsigned int iavf_rx_pg_order(struct iavf_ring *ring)
-{
- return 0;
-}
-
-#define iavf_rx_pg_size(_ring) (PAGE_SIZE << iavf_rx_pg_order(_ring))
-
bool iavf_alloc_rx_buffers(struct iavf_ring *rxr, u16 cleaned_count);
netdev_tx_t iavf_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring);
diff --git a/include/linux/net/intel/libie/rx.h b/include/linux/net/intel/libie/rx.h
index 37c5c8f26fb9..8e97775f1d66 100644
--- a/include/linux/net/intel/libie/rx.h
+++ b/include/linux/net/intel/libie/rx.h
@@ -6,6 +6,23 @@

#include <net/libeth/rx.h>

+/* Rx buffer management */
+
+/* The largest size for a single descriptor as per HW */
+#define LIBIE_MAX_RX_BUF_LEN 9728U
+/* "True" HW-writeable space: minimum from SW and HW values */
+#define LIBIE_RX_BUF_LEN(hr) min_t(u32, LIBETH_RX_PAGE_LEN(hr), \
+ LIBIE_MAX_RX_BUF_LEN)
+
+/* The maximum frame size as per HW (S/G) */
+#define __LIBIE_MAX_RX_FRM_LEN 16382U
+/* ATST, HW can chain up to 5 Rx descriptors */
+#define LIBIE_MAX_RX_FRM_LEN(hr) \
+ min_t(u32, __LIBIE_MAX_RX_FRM_LEN, LIBIE_RX_BUF_LEN(hr) * 5)
+/* Maximum frame size minus LL overhead */
+#define LIBIE_MAX_MTU \
+ (LIBIE_MAX_RX_FRM_LEN(LIBETH_MAX_HEADROOM) - LIBETH_RX_LL_LEN)
+
/* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
* bitfield struct.
*/
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index ffb71a62b105..7e0de0a9b883 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */

+#include <linux/net/intel/libie/rx.h>
+
#include "iavf.h"
#include "iavf_prototype.h"
/* All iavf tracepoints are defined by the include below, which must
@@ -45,6 +47,7 @@ MODULE_DEVICE_TABLE(pci, iavf_pci_tbl);
MODULE_ALIAS("i40evf");
MODULE_AUTHOR("Intel Corporation, <[email protected]>");
MODULE_DESCRIPTION("Intel(R) Ethernet Adaptive Virtual Function Network Driver");
+MODULE_IMPORT_NS(LIBETH);
MODULE_IMPORT_NS(LIBIE);
MODULE_LICENSE("GPL v2");

@@ -1586,7 +1589,6 @@ static int iavf_alloc_queues(struct iavf_adapter *adapter)
rx_ring = &adapter->rx_rings[i];
rx_ring->queue_index = i;
rx_ring->netdev = adapter->netdev;
- rx_ring->dev = &adapter->pdev->dev;
rx_ring->count = adapter->rx_desc_count;
rx_ring->itr_setting = IAVF_ITR_RX_DEF;
}
@@ -2613,9 +2615,8 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
iavf_set_ethtool_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;

- /* MTU range: 68 - 9710 */
netdev->min_mtu = ETH_MIN_MTU;
- netdev->max_mtu = IAVF_MAX_RXBUFFER - IAVF_PACKET_HDR_PAD;
+ netdev->max_mtu = LIBIE_MAX_MTU;

if (!is_valid_ether_addr(adapter->hw.mac.addr)) {
dev_info(&pdev->dev, "Invalid MAC address %pM, using random\n",
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 2ec68b51bebe..26b424fd6718 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -690,11 +690,8 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)
**/
static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
{
- unsigned long bi_size;
- u16 i;
-
/* ring already cleared, nothing to do */
- if (!rx_ring->rx_bi)
+ if (!rx_ring->rx_fqes)
return;

if (rx_ring->skb) {
@@ -702,40 +699,16 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
rx_ring->skb = NULL;
}

- /* Free all the Rx ring sk_buffs */
- for (i = 0; i < rx_ring->count; i++) {
- struct iavf_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
+ /* Free all the Rx ring buffers */
+ for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) {
+ const struct libeth_fqe *rx_fqes = &rx_ring->rx_fqes[i];

- if (!rx_bi->page)
- continue;
+ page_pool_put_full_page(rx_ring->pp, rx_fqes->page, false);

- /* Invalidate cache lines that may have been written to by
- * device so that we avoid corrupting memory.
- */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_bi->dma,
- rx_bi->page_offset,
- IAVF_RXBUFFER_3072,
- DMA_FROM_DEVICE);
-
- /* free resources associated with mapping */
- dma_unmap_page_attrs(rx_ring->dev, rx_bi->dma,
- iavf_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IAVF_RX_DMA_ATTR);
-
- __free_page(rx_bi->page);
-
- rx_bi->page = NULL;
- rx_bi->page_offset = 0;
+ if (unlikely(++i == rx_ring->count))
+ i = 0;
}

- bi_size = sizeof(struct iavf_rx_buffer) * rx_ring->count;
- memset(rx_ring->rx_bi, 0, bi_size);
-
- /* Zero out the descriptor ring */
- memset(rx_ring->desc, 0, rx_ring->size);
-
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
}
@@ -748,15 +721,22 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
**/
void iavf_free_rx_resources(struct iavf_ring *rx_ring)
{
+ struct libeth_fq fq = {
+ .fqes = rx_ring->rx_fqes,
+ .pp = rx_ring->pp,
+ };
+
iavf_clean_rx_ring(rx_ring);
- kfree(rx_ring->rx_bi);
- rx_ring->rx_bi = NULL;

if (rx_ring->desc) {
- dma_free_coherent(rx_ring->dev, rx_ring->size,
+ dma_free_coherent(rx_ring->pp->p.dev, rx_ring->size,
rx_ring->desc, rx_ring->dma);
rx_ring->desc = NULL;
}
+
+ libeth_rx_fq_destroy(&fq);
+ rx_ring->rx_fqes = NULL;
+ rx_ring->pp = NULL;
}

/**
@@ -767,26 +747,32 @@ void iavf_free_rx_resources(struct iavf_ring *rx_ring)
**/
int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
{
- struct device *dev = rx_ring->dev;
- int bi_size;
-
- /* warn if we are about to overwrite the pointer */
- WARN_ON(rx_ring->rx_bi);
- bi_size = sizeof(struct iavf_rx_buffer) * rx_ring->count;
- rx_ring->rx_bi = kzalloc(bi_size, GFP_KERNEL);
- if (!rx_ring->rx_bi)
- goto err;
+ struct libeth_fq fq = {
+ .count = rx_ring->count,
+ .buf_len = LIBIE_MAX_RX_BUF_LEN,
+ .nid = NUMA_NO_NODE,
+ };
+ int ret;
+
+ ret = libeth_rx_fq_create(&fq, &rx_ring->q_vector->napi);
+ if (ret)
+ return ret;
+
+ rx_ring->pp = fq.pp;
+ rx_ring->rx_fqes = fq.fqes;
+ rx_ring->truesize = fq.truesize;
+ rx_ring->rx_buf_len = fq.buf_len;

u64_stats_init(&rx_ring->syncp);

/* Round up to nearest 4K */
rx_ring->size = rx_ring->count * sizeof(union iavf_32byte_rx_desc);
rx_ring->size = ALIGN(rx_ring->size, 4096);
- rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
+ rx_ring->desc = dma_alloc_coherent(fq.pp->p.dev, rx_ring->size,
&rx_ring->dma, GFP_KERNEL);

if (!rx_ring->desc) {
- dev_info(dev, "Unable to allocate memory for the Rx descriptor ring, size=%d\n",
+ dev_info(fq.pp->p.dev, "Unable to allocate memory for the Rx descriptor ring, size=%d\n",
rx_ring->size);
goto err;
}
@@ -795,9 +781,12 @@ int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
rx_ring->next_to_use = 0;

return 0;
+
err:
- kfree(rx_ring->rx_bi);
- rx_ring->rx_bi = NULL;
+ libeth_rx_fq_destroy(&fq);
+ rx_ring->rx_fqes = NULL;
+ rx_ring->pp = NULL;
+
return -ENOMEM;
}

@@ -819,49 +808,6 @@ static void iavf_release_rx_desc(struct iavf_ring *rx_ring, u32 val)
writel(val, rx_ring->tail);
}

-/**
- * iavf_alloc_mapped_page - recycle or make a new page
- * @rx_ring: ring to use
- * @bi: rx_buffer struct to modify
- *
- * Returns true if the page was successfully allocated or
- * reused.
- **/
-static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *bi)
-{
- struct page *page = bi->page;
- dma_addr_t dma;
-
- /* alloc new page for storage */
- page = dev_alloc_pages(iavf_rx_pg_order(rx_ring));
- if (unlikely(!page)) {
- rx_ring->rx_stats.alloc_page_failed++;
- return false;
- }
-
- /* map page for use */
- dma = dma_map_page_attrs(rx_ring->dev, page, 0,
- iavf_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IAVF_RX_DMA_ATTR);
-
- /* if mapping failed free memory back to system since
- * there isn't much point in holding memory we can't use
- */
- if (dma_mapping_error(rx_ring->dev, dma)) {
- __free_pages(page, iavf_rx_pg_order(rx_ring));
- rx_ring->rx_stats.alloc_page_failed++;
- return false;
- }
-
- bi->dma = dma;
- bi->page = page;
- bi->page_offset = IAVF_SKB_PAD;
-
- return true;
-}
-
/**
* iavf_receive_skb - Send a completed packet up the stack
* @rx_ring: rx ring in play
@@ -892,38 +838,37 @@ static void iavf_receive_skb(struct iavf_ring *rx_ring,
**/
bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
{
+ const struct libeth_fq_fp fq = {
+ .pp = rx_ring->pp,
+ .fqes = rx_ring->rx_fqes,
+ .truesize = rx_ring->truesize,
+ .count = rx_ring->count,
+ };
u16 ntu = rx_ring->next_to_use;
union iavf_rx_desc *rx_desc;
- struct iavf_rx_buffer *bi;

/* do nothing if no valid netdev defined */
if (!rx_ring->netdev || !cleaned_count)
return false;

rx_desc = IAVF_RX_DESC(rx_ring, ntu);
- bi = &rx_ring->rx_bi[ntu];

do {
- if (!iavf_alloc_mapped_page(rx_ring, bi))
- goto no_buffers;
+ dma_addr_t addr;

- /* sync the buffer for use by the device */
- dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
- bi->page_offset,
- IAVF_RXBUFFER_3072,
- DMA_FROM_DEVICE);
+ addr = libeth_rx_alloc(&fq, ntu);
+ if (addr == DMA_MAPPING_ERROR)
+ goto no_buffers;

/* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
*/
- rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
+ rx_desc->read.pkt_addr = cpu_to_le64(addr);

rx_desc++;
- bi++;
ntu++;
if (unlikely(ntu == rx_ring->count)) {
rx_desc = IAVF_RX_DESC(rx_ring, 0);
- bi = rx_ring->rx_bi;
ntu = 0;
}

@@ -942,6 +887,8 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
if (rx_ring->next_to_use != ntu)
iavf_release_rx_desc(rx_ring, ntu);

+ rx_ring->rx_stats.alloc_page_failed++;
+
/* make sure to come back via polling to try again after
* allocation failure
*/
@@ -1090,9 +1037,8 @@ static bool iavf_cleanup_headers(struct iavf_ring *rx_ring, struct sk_buff *skb)

/**
* iavf_add_rx_frag - Add contents of Rx buffer to sk_buff
- * @rx_ring: rx descriptor ring to transact packets on
- * @rx_buffer: buffer containing page to add
* @skb: sk_buff to place the data into
+ * @rx_buffer: buffer containing page to add
* @size: packet length from rx_desc
*
* This function will add the data contained in rx_buffer->page to the skb.
@@ -1100,105 +1046,49 @@ static bool iavf_cleanup_headers(struct iavf_ring *rx_ring, struct sk_buff *skb)
*
* The function will then update the page offset.
**/
-static void iavf_add_rx_frag(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer,
- struct sk_buff *skb,
+static void iavf_add_rx_frag(struct sk_buff *skb,
+ const struct libeth_fqe *rx_buffer,
unsigned int size)
{
- unsigned int truesize = SKB_DATA_ALIGN(size + IAVF_SKB_PAD);
-
- if (!size)
- return;
+ u32 hr = rx_buffer->page->pp->p.offset;

skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
- rx_buffer->page_offset, size, truesize);
-}
-
-/**
- * iavf_get_rx_buffer - Fetch Rx buffer and synchronize data for use
- * @rx_ring: rx descriptor ring to transact packets on
- * @size: size of buffer to add to skb
- *
- * This function will pull an Rx buffer from the ring and synchronize it
- * for use by the CPU.
- */
-static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
- const unsigned int size)
-{
- struct iavf_rx_buffer *rx_buffer;
-
- rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
- prefetchw(rx_buffer->page);
- if (!size)
- return rx_buffer;
-
- /* we are reusing so sync this buffer for CPU use */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_buffer->dma,
- rx_buffer->page_offset,
- size,
- DMA_FROM_DEVICE);
-
- return rx_buffer;
+ rx_buffer->offset + hr, size, rx_buffer->truesize);
}

/**
* iavf_build_skb - Build skb around an existing buffer
- * @rx_ring: Rx descriptor ring to transact packets on
* @rx_buffer: Rx buffer to pull data from
* @size: size of buffer to add to skb
*
* This function builds an skb around an existing Rx buffer, taking care
* to set up the skb correctly and avoid any memcpy overhead.
*/
-static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer,
+static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
unsigned int size)
{
- void *va;
- unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
- SKB_DATA_ALIGN(IAVF_SKB_PAD + size);
+ u32 hr = rx_buffer->page->pp->p.offset;
struct sk_buff *skb;
+ void *va;

- if (!rx_buffer || !size)
- return NULL;
/* prefetch first cache line of first page */
- va = page_address(rx_buffer->page) + rx_buffer->page_offset;
- net_prefetch(va);
+ va = page_address(rx_buffer->page) + rx_buffer->offset;
+ net_prefetch(va + hr);

/* build an skb around the page buffer */
- skb = napi_build_skb(va - IAVF_SKB_PAD, truesize);
+ skb = napi_build_skb(va, rx_buffer->truesize);
if (unlikely(!skb))
return NULL;

+ skb_mark_for_recycle(skb);
+
/* update pointers within the skb to store the data */
- skb_reserve(skb, IAVF_SKB_PAD);
+ skb_reserve(skb, hr);
__skb_put(skb, size);

return skb;
}

-/**
- * iavf_put_rx_buffer - Unmap used buffer
- * @rx_ring: rx descriptor ring to transact packets on
- * @rx_buffer: rx buffer to pull data from
- *
- * This function will unmap the buffer after it's written by HW.
- */
-static void iavf_put_rx_buffer(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer)
-{
- if (!rx_buffer)
- return;
-
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
- DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);
-
- /* clear contents of buffer_info */
- rx_buffer->page = NULL;
-}
-
/**
* iavf_is_non_eop - process handling of non-EOP buffers
* @rx_ring: Rx ring being processed
@@ -1252,7 +1142,7 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
bool failure = false;

while (likely(total_rx_packets < (unsigned int)budget)) {
- struct iavf_rx_buffer *rx_buffer;
+ struct libeth_fqe *rx_buffer;
union iavf_rx_desc *rx_desc;
unsigned int size;
u16 vlan_tag = 0;
@@ -1287,13 +1177,16 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
size = FIELD_GET(IAVF_RXD_QW1_LENGTH_PBUF_MASK, qword);

iavf_trace(clean_rx_irq, rx_ring, rx_desc, skb);
- rx_buffer = iavf_get_rx_buffer(rx_ring, size);
+
+ rx_buffer = &rx_ring->rx_fqes[rx_ring->next_to_clean];
+ if (!libeth_rx_sync_for_cpu(rx_buffer, size))
+ goto skip_data;

/* retrieve a buffer from the ring */
if (skb)
- iavf_add_rx_frag(rx_ring, rx_buffer, skb, size);
+ iavf_add_rx_frag(skb, rx_buffer, size);
else
- skb = iavf_build_skb(rx_ring, rx_buffer, size);
+ skb = iavf_build_skb(rx_buffer, size);

/* exit if we failed to retrieve a buffer */
if (!skb) {
@@ -1301,10 +1194,10 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
break;
}

- iavf_put_rx_buffer(rx_ring, rx_buffer);
+skip_data:
cleaned_count++;

- if (iavf_is_non_eop(rx_ring, rx_desc, skb))
+ if (iavf_is_non_eop(rx_ring, rx_desc, skb) || unlikely(!skb))
continue;

/* ERR_MASK will only have valid bits if EOP set, and
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index f8e9f859a4f1..1e543f6a7c30 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */

+#include <linux/net/intel/libie/rx.h>
+
#include "iavf.h"
#include "iavf_prototype.h"

@@ -268,13 +270,13 @@ int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter)
void iavf_configure_queues(struct iavf_adapter *adapter)
{
struct virtchnl_vsi_queue_config_info *vqci;
- int i, max_frame = adapter->vf_res->max_mtu;
int pairs = adapter->num_active_queues;
struct virtchnl_queue_pair_info *vqpi;
+ u32 i, max_frame;
size_t len;

- if (max_frame > IAVF_MAX_RXBUFFER || !max_frame)
- max_frame = IAVF_MAX_RXBUFFER;
+ max_frame = LIBIE_MAX_RX_FRM_LEN(adapter->rx_rings->pp->p.offset);
+ max_frame = min_not_zero(adapter->vf_res->max_mtu, max_frame);

if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
/* bail because we already have a command pending */
@@ -304,7 +306,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
vqpi->rxq.ring_len = adapter->rx_rings[i].count;
vqpi->rxq.dma_ring_addr = adapter->rx_rings[i].dma;
vqpi->rxq.max_pkt_size = max_frame;
- vqpi->rxq.databuffer_size = IAVF_RXBUFFER_3072;
+ vqpi->rxq.databuffer_size = adapter->rx_rings[i].rx_buf_len;
if (CRC_OFFLOAD_ALLOWED(adapter))
vqpi->rxq.crc_disable = !!(adapter->netdev->features &
NETIF_F_RXFCS);
--
2.44.0


2024-04-18 11:39:50

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 10/10] MAINTAINERS: add entry for libeth and libie

Add myself as a maintainer/supporter for libeth and libie. Let they have
separate entries from the Intel ethernet code as it's a bit different
case and all patches will go through me rather than Tony.

Signed-off-by: Alexander Lobakin <[email protected]>
---
MAINTAINERS | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f22698a7859f..6c42ce3ab266 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12392,6 +12392,26 @@ F: drivers/ata/
F: include/linux/ata.h
F: include/linux/libata.h

+LIBETH COMMON ETHERNET LIBRARY
+M: Alexander Lobakin <[email protected]>
+L: [email protected]
+L: [email protected] (moderated for non-subscribers)
+S: Supported
+T: git https://github.com/alobakin/linux.git
+F: drivers/net/ethernet/intel/libeth/
+F: include/net/libeth/
+K: libeth
+
+LIBIE COMMON INTEL ETHERNET LIBRARY
+M: Alexander Lobakin <[email protected]>
+L: [email protected] (moderated for non-subscribers)
+L: [email protected]
+S: Supported
+T: git https://github.com/alobakin/linux.git
+F: drivers/net/ethernet/intel/libie/
+F: include/linux/net/intel/libie/
+K: libie
+
LIBNVDIMM BTT: BLOCK TRANSLATION TABLE
M: Vishal Verma <[email protected]>
M: Dan Williams <[email protected]>
--
2.44.0


2024-04-18 11:40:18

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 05/10] page_pool: constify some read-only function arguments

There are several functions taking pointers to data they don't modify.
This includes statistics fetching, page and page_pool parameters, etc.
Constify the pointers, so that call sites will be able to pass const
pointers as well.
No functional changes, no visible changes in functions sizes.

Reviewed-by: Ilias Apalodimas <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/types.h | 4 ++--
include/net/page_pool/helpers.h | 10 +++++-----
net/core/page_pool.c | 10 +++++-----
3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 5e43a08d3231..a6ebed002216 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -213,7 +213,7 @@ struct xdp_mem_info;
#ifdef CONFIG_PAGE_POOL
void page_pool_destroy(struct page_pool *pool);
void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
- struct xdp_mem_info *mem);
+ const struct xdp_mem_info *mem);
void page_pool_put_page_bulk(struct page_pool *pool, void **data,
int count);
#else
@@ -223,7 +223,7 @@ static inline void page_pool_destroy(struct page_pool *pool)

static inline void page_pool_use_xdp_mem(struct page_pool *pool,
void (*disconnect)(void *),
- struct xdp_mem_info *mem)
+ const struct xdp_mem_info *mem)
{
}

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 1d397c1a0043..c7bb06750e85 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -58,7 +58,7 @@
/* Deprecated driver-facing API, use netlink instead */
int page_pool_ethtool_stats_get_count(void);
u8 *page_pool_ethtool_stats_get_strings(u8 *data);
-u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
+u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats);

bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats);
@@ -73,7 +73,7 @@ static inline u8 *page_pool_ethtool_stats_get_strings(u8 *data)
return data;
}

-static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
+static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
return data;
}
@@ -204,8 +204,8 @@ static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
* Get the stored dma direction. A driver might decide to store this locally
* and avoid the extra cache line from page_pool to determine the direction.
*/
-static
-inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
+static inline enum dma_data_direction
+page_pool_get_dma_dir(const struct page_pool *pool)
{
return pool->p.dma_dir;
}
@@ -370,7 +370,7 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va,
* Fetch the DMA address of the page. The page pool to which the page belongs
* must had been created with PP_FLAG_DMA_MAP.
*/
-static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
+static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
{
dma_addr_t ret = page->dma_addr;

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 4c175091fc0a..273c24429bce 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -123,9 +123,9 @@ int page_pool_ethtool_stats_get_count(void)
}
EXPORT_SYMBOL(page_pool_ethtool_stats_get_count);

-u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
+u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
- struct page_pool_stats *pool_stats = stats;
+ const struct page_pool_stats *pool_stats = stats;

*data++ = pool_stats->alloc_stats.fast;
*data++ = pool_stats->alloc_stats.slow;
@@ -383,8 +383,8 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
return page;
}

-static void page_pool_dma_sync_for_device(struct page_pool *pool,
- struct page *page,
+static void page_pool_dma_sync_for_device(const struct page_pool *pool,
+ const struct page *page,
unsigned int dma_sync_size)
{
dma_addr_t dma_addr = page_pool_get_dma_addr(page);
@@ -987,7 +987,7 @@ static void page_pool_release_retry(struct work_struct *wq)
}

void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
- struct xdp_mem_info *mem)
+ const struct xdp_mem_info *mem)
{
refcount_inc(&pool->user_cnt);
pool->disconnect = disconnect;
--
2.44.0


2024-04-18 11:42:00

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v10 08/10] iavf: pack iavf_ring more efficiently

Before replacing the Rx buffer management with libie, clean up
&iavf_ring a bit.
There are several fields not used anywhere in the code -- simply remove
them. Move ::tail up to remove a hole. Replace ::arm_wb boolean with
1-bit flag in ::flags to free 1 more byte. Finally, move ::prev_pkt_ctr
out of &iavf_tx_queue_stats -- it doesn't belong there (used for Tx
stall detection). Place it next to the stats on the ring itself to fill
the 4-byte slot.
The result: no holes and all the hot fields fit into the first 64-byte
cacheline.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 22 +++------------------
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 12 +++++------
2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index e01777531635..ed559fa6f214 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -227,7 +227,6 @@ struct iavf_tx_queue_stats {
u64 tx_done_old;
u64 tx_linearize;
u64 tx_force_wb;
- int prev_pkt_ctr;
u64 tx_lost_interrupt;
};

@@ -237,12 +236,6 @@ struct iavf_rx_queue_stats {
u64 alloc_buff_failed;
};

-enum iavf_ring_state_t {
- __IAVF_TX_FDIR_INIT_DONE,
- __IAVF_TX_XPS_INIT_DONE,
- __IAVF_RING_STATE_NBITS /* must be last */
-};
-
/* some useful defines for virtchannel interface, which
* is the only remaining user of header split
*/
@@ -264,10 +257,8 @@ struct iavf_ring {
struct iavf_tx_buffer *tx_bi;
struct iavf_rx_buffer *rx_bi;
};
- DECLARE_BITMAP(state, __IAVF_RING_STATE_NBITS);
- u16 queue_index; /* Queue number of ring */
- u8 dcb_tc; /* Traffic class of ring */
u8 __iomem *tail;
+ u16 queue_index; /* Queue number of ring */

/* high bit set means dynamic, use accessors routines to read/write.
* hardware only supports 2us resolution for the ITR registers.
@@ -277,22 +268,14 @@ struct iavf_ring {
u16 itr_setting;

u16 count; /* Number of descriptors */
- u16 reg_idx; /* HW register index of the ring */

/* used in interrupt processing */
u16 next_to_use;
u16 next_to_clean;

- u8 atr_sample_rate;
- u8 atr_count;
-
- bool ring_active; /* is ring online or not */
- bool arm_wb; /* do something to arm write back */
- u8 packet_stride;
-
u16 flags;
#define IAVF_TXR_FLAGS_WB_ON_ITR BIT(0)
-/* BIT(1) is free, was IAVF_RXR_FLAGS_BUILD_SKB_ENABLED */
+#define IAVF_TXR_FLAGS_ARM_WB BIT(1)
/* BIT(2) is free */
#define IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1 BIT(3)
#define IAVF_TXR_FLAGS_VLAN_TAG_LOC_L2TAG2 BIT(4)
@@ -306,6 +289,7 @@ struct iavf_ring {
struct iavf_rx_queue_stats rx_stats;
};

+ int prev_pkt_ctr; /* For Tx stall detection */
unsigned int size; /* length of descriptor ring in bytes */
dma_addr_t dma; /* physical address of ring */

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index c7602c173420..2ec68b51bebe 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -185,7 +185,7 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* pending work.
*/
packets = tx_ring->stats.packets & INT_MAX;
- if (tx_ring->tx_stats.prev_pkt_ctr == packets) {
+ if (tx_ring->prev_pkt_ctr == packets) {
iavf_force_wb(vsi, tx_ring->q_vector);
continue;
}
@@ -194,7 +194,7 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* to iavf_get_tx_pending()
*/
smp_rmb();
- tx_ring->tx_stats.prev_pkt_ctr =
+ tx_ring->prev_pkt_ctr =
iavf_get_tx_pending(tx_ring, true) ? packets : -1;
}
}
@@ -320,7 +320,7 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
((j / WB_STRIDE) == 0) && (j > 0) &&
!test_bit(__IAVF_VSI_DOWN, vsi->state) &&
(IAVF_DESC_UNUSED(tx_ring) != tx_ring->count))
- tx_ring->arm_wb = true;
+ tx_ring->flags |= IAVF_TXR_FLAGS_ARM_WB;
}

/* notify netdev of completed buffers */
@@ -675,7 +675,7 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)

tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
- tx_ring->tx_stats.prev_pkt_ctr = -1;
+ tx_ring->prev_pkt_ctr = -1;
return 0;

err:
@@ -1491,8 +1491,8 @@ int iavf_napi_poll(struct napi_struct *napi, int budget)
clean_complete = false;
continue;
}
- arm_wb |= ring->arm_wb;
- ring->arm_wb = false;
+ arm_wb |= !!(ring->flags & IAVF_TXR_FLAGS_ARM_WB);
+ ring->flags &= ~IAVF_TXR_FLAGS_ARM_WB;
}

/* Handle case where we are called by netpoll with a budget of 0 */
--
2.44.0


2024-04-18 14:57:31

by Ilias Apalodimas

[permalink] [raw]
Subject: Re: [PATCH net-next v10 06/10] page_pool: add DMA-sync-for-CPU inline helper

On Thu, 18 Apr 2024 at 14:37, Alexander Lobakin
<[email protected]> wrote:
>
> Each driver is responsible for syncing buffers written by HW for CPU
> before accessing them. Almost each PP-enabled driver uses the same
> pattern, which could be shorthanded into a static inline to make driver
> code a little bit more compact.
> Introduce a simple helper which performs DMA synchronization for the
> size passed from the driver. It can be used even when the pool doesn't
> manage DMA-syncs-for-device, just make sure the page has a correct DMA
> address set via page_pool_set_dma_addr().
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
> include/net/page_pool/helpers.h | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index c7bb06750e85..873631c79ab1 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -52,6 +52,8 @@
> #ifndef _NET_PAGE_POOL_HELPERS_H
> #define _NET_PAGE_POOL_HELPERS_H
>
> +#include <linux/dma-mapping.h>
> +
> #include <net/page_pool/types.h>
>
> #ifdef CONFIG_PAGE_POOL_STATS
> @@ -395,6 +397,28 @@ static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> return false;
> }
>
> +/**
> + * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
> + * @pool: &page_pool the @page belongs to
> + * @page: page to sync
> + * @offset: offset from page start to "hard" start if using PP frags
> + * @dma_sync_size: size of the data written to the page
> + *
> + * Can be used as a shorthand to sync Rx pages before accessing them in the
> + * driver. Caller must ensure the pool was created with ``PP_FLAG_DMA_MAP``.
> + * Note that this version performs DMA sync unconditionally, even if the
> + * associated PP doesn't perform sync-for-device.
> + */
> +static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
> + const struct page *page,
> + u32 offset, u32 dma_sync_size)
> +{
> + dma_sync_single_range_for_cpu(pool->p.dev,
> + page_pool_get_dma_addr(page),
> + offset + pool->p.offset, dma_sync_size,
> + page_pool_get_dma_dir(pool));
> +}
> +
> static inline bool page_pool_put(struct page_pool *pool)
> {
> return refcount_dec_and_test(&pool->user_cnt);
> --
> 2.44.0
>

Reviewed-by: Ilias Apalodimas <[email protected]>

2024-04-18 15:02:17

by Przemek Kitszel

[permalink] [raw]
Subject: Re: [PATCH net-next v10 00/10] net: intel: start The Great Code Dedup + Page Pool for iavf

On 4/18/24 13:36, Alexander Lobakin wrote:
> Here's a two-shot: introduce {,Intel} Ethernet common library (libeth and
> libie) and switch iavf to Page Pool. Details are in the commit messages;
> here's a summary:
>
> Not a secret there's a ton of code duplication between two and more Intel
> ethernet modules. Before introducing new changes, which would need to be
> copied over again, start decoupling the already existing duplicate
> functionality into a new module, which will be shared between several
> Intel Ethernet drivers. The first name that came to my mind was
> "libie" -- "Intel Ethernet common library". Also this sounds like
> "lovelie" (-> one word, no "lib I E" pls) and can be expanded as
> "lib Internet Explorer" :P
> The "generic", pure-software part is placed separately, so that it can be
> easily reused in any driver by any vendor without linking to the Intel
> pre-200G guts. In a few words, it's something any modern driver does the
> same way, but nobody moved it level up (yet).
> The series is only the beginning. From now on, adding every new feature
> or doing any good driver refactoring will remove much more lines than add
> for quite some time. There's a basic roadmap with some deduplications
> planned already, not speaking of that touching every line now asks:
> "can I share this?". The final destination is very ambitious: have only
> one unified driver for at least i40e, ice, iavf, and idpf with a struct
> ops for each generation. That's never gonna happen, right? But you still
> can at least try.
> PP conversion for iavf lands within the same series as these two are tied
> closely. libie will support Page Pool model only, so that a driver can't
> use much of the lib until it's converted. iavf is only the example, the
> rest will eventually be converted soon on a per-driver basis. That is
> when it gets really interesting. Stay tech.
>
> Alexander Lobakin (10):
> net: intel: introduce {,Intel} Ethernet common library
> iavf: kill "legacy-rx" for good
> iavf: drop page splitting and recycling
> slab: introduce kvmalloc_array_node() and kvcalloc_node()
> page_pool: constify some read-only function arguments
> page_pool: add DMA-sync-for-CPU inline helper
> libeth: add Rx buffer management
> iavf: pack iavf_ring more efficiently
> iavf: switch to Page Pool
> MAINTAINERS: add entry for libeth and libie
>
> MAINTAINERS | 20 +
> drivers/net/ethernet/intel/Kconfig | 7 +
> drivers/net/ethernet/intel/libeth/Kconfig | 9 +
> drivers/net/ethernet/intel/libie/Kconfig | 10 +
> drivers/net/ethernet/intel/Makefile | 3 +
> drivers/net/ethernet/intel/libeth/Makefile | 6 +
> drivers/net/ethernet/intel/libie/Makefile | 6 +
> include/net/page_pool/types.h | 4 +-
> .../net/ethernet/intel/i40e/i40e_prototype.h | 7 -
> drivers/net/ethernet/intel/i40e/i40e_type.h | 88 ---
> drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
> .../net/ethernet/intel/iavf/iavf_prototype.h | 7 -
> drivers/net/ethernet/intel/iavf/iavf_txrx.h | 146 +----
> drivers/net/ethernet/intel/iavf/iavf_type.h | 90 ---
> .../net/ethernet/intel/ice/ice_lan_tx_rx.h | 320 ----------
> include/linux/net/intel/libie/rx.h | 50 ++
> include/linux/slab.h | 17 +-
> include/net/libeth/rx.h | 242 ++++++++
> include/net/page_pool/helpers.h | 34 +-
> drivers/net/ethernet/intel/i40e/i40e_common.c | 253 --------
> drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 72 +--
> drivers/net/ethernet/intel/iavf/iavf_common.c | 253 --------
> .../net/ethernet/intel/iavf/iavf_ethtool.c | 140 -----
> drivers/net/ethernet/intel/iavf/iavf_main.c | 40 +-
> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 551 +++---------------
> .../net/ethernet/intel/iavf/iavf_virtchnl.c | 17 +-
> drivers/net/ethernet/intel/ice/ice_main.c | 1 +
> drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 111 +---
> drivers/net/ethernet/intel/libeth/rx.c | 150 +++++
> drivers/net/ethernet/intel/libie/rx.c | 124 ++++
> net/core/page_pool.c | 10 +-
> 32 files changed, 836 insertions(+), 1955 deletions(-)
> create mode 100644 drivers/net/ethernet/intel/libeth/Kconfig
> create mode 100644 drivers/net/ethernet/intel/libie/Kconfig
> create mode 100644 drivers/net/ethernet/intel/libeth/Makefile
> create mode 100644 drivers/net/ethernet/intel/libie/Makefile
> create mode 100644 include/linux/net/intel/libie/rx.h
> create mode 100644 include/net/libeth/rx.h
> create mode 100644 drivers/net/ethernet/intel/libeth/rx.c
> create mode 100644 drivers/net/ethernet/intel/libie/rx.c
>
> ---
> libeth has way more generic functionality and code in the idpf XDP
> tree[0], take a look if you want to have more complete picture of
> what this really is about.
>
> From v9[1]:
> * pick Acked-by from Vlastimil and a couple Reviewed-by from Przemek;

thanks for the updates too!
I've read the code ~two times across the life of this series, nothing
bad spot, so for the series:
Reviewed-by: Przemek Kitszel <[email protected]>

> * mention that the libeth_fq::fp kernel-doc generates a warning and the
> fix for that is pending on the linux-doc ML (Jakub);
> * no functional changes.

// ...