2023-12-07 17:22:22

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 00/12] net: intel: start The Great Code Dedup + Page Pool for iavf

Here's a two-shot: introduce Intel Ethernet common library (libie) and
switch iavf to Page Pool. Details are in the commit messages; here's
a summary:

Not a secret there's a ton of code duplication between two and more Intel
ethernet modules. Before introducing new changes, which would need to be
copied over again, start decoupling the already existing duplicate
functionality into a new module, which will be shared between several
Intel Ethernet drivers. The first name that came to my mind was
"libie" -- "Intel Ethernet common library". Also this sounds like
"lovelie" (-> one word, no "lib I E" pls) and can be expanded as
"lib Internet Explorer" :P
The series is only the beginning. From now on, adding every new feature
or doing any good driver refactoring will remove much more lines than add
for quite some time. There's a basic roadmap with some deduplications
planned already, not speaking of that touching every line now asks:
"can I share this?". The final destination is very ambitious: have only
one unified driver for at least i40e, ice, iavf, and idpf with a struct
ops for each generation. That's never gonna happen, right? But you still
can at least try.
PP conversion for iavf lands within the same series as these two are tied
closely. libie will support Page Pool model only, so that a driver can't
use much of the lib until it's converted. iavf is only the example, the
rest will eventually be converted soon on a per-driver basis. That is
when it gets really interesting. Stay tech.

Alexander Lobakin (12):
page_pool: make sure frag API fields don't span between cachelines
page_pool: don't use driver-set flags field directly
net: intel: introduce Intel Ethernet common library
iavf: kill "legacy-rx" for good
iavf: drop page splitting and recycling
page_pool: constify some read-only function arguments
page_pool: add DMA-sync-for-CPU inline helper
libie: add Rx buffer management (via Page Pool)
iavf: pack iavf_ring more efficiently
iavf: switch to Page Pool
libie: add common queue stats
iavf: switch queue stats to libie

MAINTAINERS | 3 +-
drivers/net/ethernet/intel/Kconfig | 6 +
drivers/net/ethernet/intel/Makefile | 1 +
drivers/net/ethernet/intel/i40e/i40e_common.c | 253 -------
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
.../net/ethernet/intel/i40e/i40e_prototype.h | 7 -
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 72 +-
drivers/net/ethernet/intel/i40e/i40e_type.h | 88 ---
drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
drivers/net/ethernet/intel/iavf/iavf_common.c | 253 -------
.../net/ethernet/intel/iavf/iavf_ethtool.c | 235 +------
drivers/net/ethernet/intel/iavf/iavf_main.c | 42 +-
.../net/ethernet/intel/iavf/iavf_prototype.h | 7 -
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 624 ++++--------------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 174 +----
drivers/net/ethernet/intel/iavf/iavf_type.h | 90 ---
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 17 +-
.../net/ethernet/intel/ice/ice_lan_tx_rx.h | 316 ---------
drivers/net/ethernet/intel/ice/ice_main.c | 1 +
drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 74 +--
drivers/net/ethernet/intel/libie/Kconfig | 9 +
drivers/net/ethernet/intel/libie/Makefile | 7 +
drivers/net/ethernet/intel/libie/rx.c | 179 +++++
drivers/net/ethernet/intel/libie/stats.c | 121 ++++
include/linux/net/intel/libie/rx.h | 259 ++++++++
include/linux/net/intel/libie/stats.h | 179 +++++
include/net/page_pool/helpers.h | 34 +-
include/net/page_pool/types.h | 19 +-
net/core/page_pool.c | 42 +-
29 files changed, 1058 insertions(+), 2057 deletions(-)
create mode 100644 drivers/net/ethernet/intel/libie/Kconfig
create mode 100644 drivers/net/ethernet/intel/libie/Makefile
create mode 100644 drivers/net/ethernet/intel/libie/rx.c
create mode 100644 drivers/net/ethernet/intel/libie/stats.c
create mode 100644 include/linux/net/intel/libie/rx.h
create mode 100644 include/linux/net/intel/libie/stats.h

---
Directly to net-next, has non-Intel code changes :p

From v5[0]:
* drop Page Pool DMA shortcut: will pick up Eric's more global DMA sync
optimization[1] and expand it to cover both IOMMU and direct DMA a bit
later (Yunsheng);
* drop per-queue Page Pool Ethtool stats: they are now exported via
generic Netlink interface (Jakub);
* #01: leave a comment why exactly this alignment (Jakub, Yunsheng);
* #08: make use of page_pool_params::netdev when calculating PP params;
* #08: rename ``libie_rx_queue`` -> ``libie_buf_queue``.

From v4[2]:
* make use of Jakub's &page_pool_params split;
* #01: prevent frag fields from spanning into 2 cachelines after
splitting &page_pool_params into fast and slow;
* #02-03: bring back the DMA sync shortcut, now as a per-page flag
(me, Yunsheng);
* #04: let libie have its own Kconfig to stop further bloating of poor
intel/Kconfig;
* #06: merge page split-reuse-recycle drop into one commit (Alex);
* #07: decouple constifying of several Page Pool function arguments
into a separate commit, constify some more;
* #09: stop abusing internal PP fields in the driver code (Yunsheng);
* #09: calculate DMA sync size (::max_len) correctly: within one page,
not one buffer (Yunsheng);
* #10: decouple rearranging &iavf_ring into separate commit, optimize
it even more;
* #11: let the driver get back to the last descriptor to process after
an skb allocation fail, don't drop it (Alex);
* #11: stop touching unrelated stuff like watchdog timeout etc. (Alex);
* fix "Return:" in the kdoc (now `W=12 C=1` is clean), misc typos.

From v3[3]:
* base on the latest net-next, update bloat-o-meter and perf stats;
* split generic PP optimizations into a separate series;
* drop "optimize hotpath a bunch" commit: a lot of [controversial]
changes in one place, worth own series (Alex);
* 02: pick Rev-by (Alex);
* 03: move in-place recycling removal here from the dropped patch;
* 05: new, add libie Rx buffer API separatelly from IAVF changes;
* 05-06: use new "hybrid" allocation API from[4] to reduce memory usage
when a page can fit more than 1 truesize (also asked by David);
* 06: merge with "always use order-0 page" commit to reduce diffs and
simplify things (Alex);
* 09: fix page_alloc_fail counter.

From v2[5]:
* 0006: fix page_pool.h include in OcteonTX2 files (Jakub, Patchwork);
* no functional changes.

From v1[6]:
* 0006: new (me, Jakub);
* 0008: give the helpers more intuitive names (Jakub, Ilias);
* -^-: also expand their kdoc a bit for the same reason;
* -^-: fix kdoc copy-paste issue (Patchwork, Jakub);
* 0011: drop `inline` from C file (Patchwork, Jakub).

[0] https://lore.kernel.org/netdev/[email protected]
[1] https://lore.kernel.org/netdev/[email protected]
[2] https://lore.kernel.org/netdev/[email protected]
[3] https://lore.kernel.org/netdev/[email protected]
[4] https://lore.kernel.org/netdev/[email protected]
[5] https://lore.kernel.org/netdev/[email protected]
[6] https://lore.kernel.org/netdev/[email protected]

--
2.43.0


2023-12-07 17:22:33

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 01/12] page_pool: make sure frag API fields don't span between cachelines

After commit 5027ec19f104 ("net: page_pool: split the page_pool_params
into fast and slow") that made &page_pool contain only "hot" params at
the start, cacheline boundary chops frag API fields group in the middle
again.
To not bother with this each time fast params get expanded or shrunk,
let's just align them to `4 * sizeof(long)`, the closest upper pow-2 to
their actual size (2 longs + 2 ints). This ensures 16-byte alignment for
the 32-bit architectures and 32-byte alignment for the 64-bit ones,
excluding unnecessary false-sharing.

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/types.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ac286ea8ce2d..35ab82da7f2a 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -130,7 +130,16 @@ struct page_pool {

bool has_init_callback;

- long frag_users;
+ /* The following block must stay within one cacheline. On 32-bit
+ * systems, sizeof(long) == sizeof(int), so that the block size is
+ * precisely ``4 * sizeof(long)``. On 64-bit systems, the actual size
+ * is ``2 * sizeof(long) + 2 * sizeof(int)``, i.e. 24 bytes, but the
+ * closest pow-2 to that is 32 bytes, which also equals to
+ * ``4 * sizeof(long)``, so just use that one for simplicity.
+ * Having it aligned to a cacheline boundary may be excessive and
+ * doesn't bring any good.
+ */
+ long frag_users __aligned(4 * sizeof(long));
struct page *frag_page;
unsigned int frag_offset;
u32 pages_state_hold_cnt;
--
2.43.0

2023-12-07 17:22:55

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 02/12] page_pool: don't use driver-set flags field directly

page_pool::p is driver-defined params, copied directly from the
structure passed to page_pool_create(). The structure isn't meant
to be modified by the Page Pool core code and this even might look
confusing[0][1].
In order to be able to alter some flags, let's define our own, internal
fields the same way as the already existing one (::has_init_callback).
They are defined as bits in the driver-set params, leave them so here
as well, to not waste byte-per-bit or so. Almost 30 bits are still free
for future extensions.
We could've defined only new flags here or only the ones we may need
to alter, but checking some flags in one place while others in another
doesn't sound convenient or intuitive. ::flags passed by the driver can
now go to the "slow" PP params.

Suggested-by: Jakub Kicinski <[email protected]>
Link[0]: https://lore.kernel.org/netdev/[email protected]
Suggested-by: Alexander Duyck <[email protected]>
Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com
Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/types.h | 8 +++++---
net/core/page_pool.c | 34 ++++++++++++++++++----------------
2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 35ab82da7f2a..51f6cdd60757 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -44,7 +44,6 @@ struct pp_alloc_cache {

/**
* struct page_pool_params - page pool parameters
- * @flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
* @order: 2^order pages on allocation
* @pool_size: size of the ptr_ring
* @nid: NUMA node id to allocate from pages from
@@ -54,10 +53,10 @@ struct pp_alloc_cache {
* @dma_dir: DMA mapping direction
* @max_len: max DMA sync memory size for PP_FLAG_DMA_SYNC_DEV
* @offset: DMA sync address offset for PP_FLAG_DMA_SYNC_DEV
+ * @flags: PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV
*/
struct page_pool_params {
struct_group_tagged(page_pool_params_fast, fast,
- unsigned int flags;
unsigned int order;
unsigned int pool_size;
int nid;
@@ -68,6 +67,7 @@ struct page_pool_params {
unsigned int offset;
);
struct_group_tagged(page_pool_params_slow, slow,
+ unsigned int flags;
struct net_device *netdev;
/* private: used by test code only */
void (*init_callback)(struct page *page, void *arg);
@@ -128,7 +128,9 @@ struct page_pool_stats {
struct page_pool {
struct page_pool_params_fast p;

- bool has_init_callback;
+ bool dma_map:1; /* Perform DMA mapping */
+ bool dma_sync:1; /* Perform DMA sync */
+ bool has_init_callback:1; /* slow.init_callback is set */

/* The following block must stay within one cacheline. On 32-bit
* systems, sizeof(long) == sizeof(int), so that the block size is
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index c2e7c9a6efbe..59aca3339222 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -179,7 +179,7 @@ static int page_pool_init(struct page_pool *pool,
memcpy(&pool->slow, &params->slow, sizeof(pool->slow));

/* Validate only known flags were used */
- if (pool->p.flags & ~(PP_FLAG_ALL))
+ if (pool->slow.flags & ~(PP_FLAG_ALL))
return -EINVAL;

if (pool->p.pool_size)
@@ -193,22 +193,26 @@ static int page_pool_init(struct page_pool *pool,
* DMA_BIDIRECTIONAL is for allowing page used for DMA sending,
* which is the XDP_TX use-case.
*/
- if (pool->p.flags & PP_FLAG_DMA_MAP) {
+ if (pool->slow.flags & PP_FLAG_DMA_MAP) {
if ((pool->p.dma_dir != DMA_FROM_DEVICE) &&
(pool->p.dma_dir != DMA_BIDIRECTIONAL))
return -EINVAL;
+
+ pool->dma_map = true;
}

- if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) {
+ if (pool->slow.flags & PP_FLAG_DMA_SYNC_DEV) {
/* In order to request DMA-sync-for-device the page
* needs to be mapped
*/
- if (!(pool->p.flags & PP_FLAG_DMA_MAP))
+ if (!(pool->slow.flags & PP_FLAG_DMA_MAP))
return -EINVAL;

if (!pool->p.max_len)
return -EINVAL;

+ pool->dma_sync = true;
+
/* pool->p.offset has to be set according to the address
* offset used by the DMA engine to start copying rx data
*/
@@ -234,7 +238,7 @@ static int page_pool_init(struct page_pool *pool,
/* Driver calling page_pool_create() also call page_pool_destroy() */
refcount_set(&pool->user_cnt, 1);

- if (pool->p.flags & PP_FLAG_DMA_MAP)
+ if (pool->dma_map)
get_device(pool->p.dev);

return 0;
@@ -244,7 +248,7 @@ static void page_pool_uninit(struct page_pool *pool)
{
ptr_ring_cleanup(&pool->ring, NULL);

- if (pool->p.flags & PP_FLAG_DMA_MAP)
+ if (pool->dma_map)
put_device(pool->p.dev);

#ifdef CONFIG_PAGE_POOL_STATS
@@ -387,7 +391,7 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
if (page_pool_set_dma_addr(page, dma))
goto unmap_failed;

- if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+ if (pool->dma_sync)
page_pool_dma_sync_for_device(pool, page, pool->p.max_len);

return true;
@@ -433,8 +437,7 @@ static struct page *__page_pool_alloc_page_order(struct page_pool *pool,
if (unlikely(!page))
return NULL;

- if ((pool->p.flags & PP_FLAG_DMA_MAP) &&
- unlikely(!page_pool_dma_map(pool, page))) {
+ if (pool->dma_map && unlikely(!page_pool_dma_map(pool, page))) {
put_page(page);
return NULL;
}
@@ -454,8 +457,8 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
gfp_t gfp)
{
const int bulk = PP_ALLOC_CACHE_REFILL;
- unsigned int pp_flags = pool->p.flags;
unsigned int pp_order = pool->p.order;
+ bool dma_map = pool->dma_map;
struct page *page;
int i, nr_pages;

@@ -480,8 +483,7 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
*/
for (i = 0; i < nr_pages; i++) {
page = pool->alloc.cache[i];
- if ((pp_flags & PP_FLAG_DMA_MAP) &&
- unlikely(!page_pool_dma_map(pool, page))) {
+ if (dma_map && unlikely(!page_pool_dma_map(pool, page))) {
put_page(page);
continue;
}
@@ -558,7 +560,7 @@ static void page_pool_return_page(struct page_pool *pool, struct page *page)
dma_addr_t dma;
int count;

- if (!(pool->p.flags & PP_FLAG_DMA_MAP))
+ if (!pool->dma_map)
/* Always account for inflight pages, even if we didn't
* map them
*/
@@ -624,7 +626,7 @@ static bool page_pool_recycle_in_cache(struct page *page,
}

/* If the page refcnt == 1, this will try to recycle the page.
- * if PP_FLAG_DMA_SYNC_DEV is set, we'll try to sync the DMA area for
+ * If pool->dma_sync is set, we'll try to sync the DMA area for
* the configured size min(dma_sync_size, pool->max_len).
* If the page refcnt != 1, then the page will be returned to memory
* subsystem.
@@ -647,7 +649,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page,
if (likely(page_ref_count(page) == 1 && !page_is_pfmemalloc(page))) {
/* Read barrier done in page_ref_count / READ_ONCE */

- if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+ if (pool->dma_sync)
page_pool_dma_sync_for_device(pool, page,
dma_sync_size);

@@ -760,7 +762,7 @@ static struct page *page_pool_drain_frag(struct page_pool *pool,
return NULL;

if (page_ref_count(page) == 1 && !page_is_pfmemalloc(page)) {
- if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+ if (pool->dma_sync)
page_pool_dma_sync_for_device(pool, page, -1);

return page;
--
2.43.0

2023-12-07 17:23:05

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 07/12] page_pool: add DMA-sync-for-CPU inline helper

Each driver is responsible for syncing buffers written by HW for CPU
before accessing them. Almost each PP-enabled driver uses the same
pattern, which could be shorthanded into a static inline to make driver
code a little bit more compact.
Introduce a simple helper which performs DMA synchronization for the
size passed from the driver. It can be used even when the pool doesn't
manage DMA-syncs-for-device, just make sure the page has a correct DMA
address set via page_pool_set_dma_addr().

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/helpers.h | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index c860fad50d00..73ef4d1e12d4 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -52,6 +52,8 @@
#ifndef _NET_PAGE_POOL_HELPERS_H
#define _NET_PAGE_POOL_HELPERS_H

+#include <linux/dma-mapping.h>
+
#include <net/page_pool/types.h>

#ifdef CONFIG_PAGE_POOL_STATS
@@ -382,6 +384,28 @@ static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
return false;
}

+/**
+ * page_pool_dma_sync_for_cpu - sync Rx page for CPU after it's written by HW
+ * @pool: &page_pool the @page belongs to
+ * @page: page to sync
+ * @offset: offset from page start to "hard" start if using PP frags
+ * @dma_sync_size: size of the data written to the page
+ *
+ * Can be used as a shorthand to sync Rx pages before accessing them in the
+ * driver. Caller must ensure the pool was created with ``PP_FLAG_DMA_MAP``.
+ * Note that this version performs DMA sync unconditionally, even if the
+ * associated PP doesn't perform sync-for-device.
+ */
+static inline void page_pool_dma_sync_for_cpu(const struct page_pool *pool,
+ const struct page *page,
+ u32 offset, u32 dma_sync_size)
+{
+ dma_sync_single_range_for_cpu(pool->p.dev,
+ page_pool_get_dma_addr(page),
+ offset + pool->p.offset, dma_sync_size,
+ page_pool_get_dma_dir(pool));
+}
+
static inline bool page_pool_put(struct page_pool *pool)
{
return refcount_dec_and_test(&pool->user_cnt);
--
2.43.0

2023-12-07 17:23:13

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 03/12] net: intel: introduce Intel Ethernet common library

Not a secret there's a ton of code duplication between two and more Intel
ethernet modules.
Before introducing new changes, which would need to be copied over again,
start decoupling the already existing duplicate functionality into a new
module, which will be shared between several Intel Ethernet drivers.
Add the lookup table which converts 8/10-bit hardware packet type into
a parsed bitfield structure for easy checking packet format parameters,
such as payload level, IP version, etc. This is currently used by i40e,
ice and iavf and it's all the same in all three drivers.
The only difference introduced in this implementation is that instead of
defining a 256 (or 1024 in case of ice) element array, add unlikely()
condition to limit the input to 154 (current maximum non-reserved packet
type). There's no reason to waste 600 (or even 3600) bytes only to not
hurt very unlikely exception packets.
The hash computation function now takes payload level directly as a
pkt_hash_type. There's a couple cases when non-IP ptypes are marked as
L3 payload and in the previous versions their hash level would be 2, not
3. But skb_set_hash() only sees difference between L4 and non-L4, thus
this won't change anything at all.
The module is behind the hidden Kconfig symbol, which the drivers will
select when needed. The exports are behind 'LIBIE' namespace to limit
the scope of the functions.

Signed-off-by: Alexander Lobakin <[email protected]>
---
MAINTAINERS | 3 +-
drivers/net/ethernet/intel/Kconfig | 6 +
drivers/net/ethernet/intel/Makefile | 1 +
drivers/net/ethernet/intel/i40e/i40e_common.c | 253 --------------
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
.../net/ethernet/intel/i40e/i40e_prototype.h | 7 -
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 72 +---
drivers/net/ethernet/intel/i40e/i40e_type.h | 88 -----
drivers/net/ethernet/intel/iavf/iavf_common.c | 253 --------------
drivers/net/ethernet/intel/iavf/iavf_main.c | 1 +
.../net/ethernet/intel/iavf/iavf_prototype.h | 7 -
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 70 +---
drivers/net/ethernet/intel/iavf/iavf_type.h | 88 -----
.../net/ethernet/intel/ice/ice_lan_tx_rx.h | 316 ------------------
drivers/net/ethernet/intel/ice/ice_main.c | 1 +
drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 74 +---
drivers/net/ethernet/intel/libie/Kconfig | 8 +
drivers/net/ethernet/intel/libie/Makefile | 6 +
drivers/net/ethernet/intel/libie/rx.c | 110 ++++++
include/linux/net/intel/libie/rx.h | 128 +++++++
20 files changed, 315 insertions(+), 1178 deletions(-)
create mode 100644 drivers/net/ethernet/intel/libie/Kconfig
create mode 100644 drivers/net/ethernet/intel/libie/Makefile
create mode 100644 drivers/net/ethernet/intel/libie/rx.c
create mode 100644 include/linux/net/intel/libie/rx.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 2d2865c1e479..6e7ddb1689e0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10651,7 +10651,8 @@ F: Documentation/networking/device_drivers/ethernet/intel/
F: drivers/net/ethernet/intel/
F: drivers/net/ethernet/intel/*/
F: include/linux/avf/virtchnl.h
-F: include/linux/net/intel/iidc.h
+F: include/linux/net/intel/
+F: include/linux/net/intel/*/

INTEL ETHERNET PROTOCOL DRIVER FOR RDMA
M: Mustafa Ismail <[email protected]>
diff --git a/drivers/net/ethernet/intel/Kconfig b/drivers/net/ethernet/intel/Kconfig
index d55638ad8704..c7da7d05d93e 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -16,6 +16,8 @@ config NET_VENDOR_INTEL

if NET_VENDOR_INTEL

+source "drivers/net/ethernet/intel/libie/Kconfig"
+
config E100
tristate "Intel(R) PRO/100+ support"
depends on PCI
@@ -225,6 +227,7 @@ config I40E
depends on PTP_1588_CLOCK_OPTIONAL
depends on PCI
select AUXILIARY_BUS
+ select LIBIE
select NET_DEVLINK
help
This driver supports Intel(R) Ethernet Controller XL710 Family of
@@ -253,6 +256,8 @@ config I40E_DCB
# so that CONFIG_IAVF symbol will always mirror the state of CONFIG_I40EVF
config IAVF
tristate
+ select LIBIE
+
config I40EVF
tristate "Intel(R) Ethernet Adaptive Virtual Function support"
select IAVF
@@ -283,6 +288,7 @@ config ICE
depends on GNSS || GNSS = n
select AUXILIARY_BUS
select DIMLIB
+ select LIBIE
select NET_DEVLINK
select PLDMFW
select DPLL
diff --git a/drivers/net/ethernet/intel/Makefile b/drivers/net/ethernet/intel/Makefile
index dacb481ee5b1..60ca00a01de4 100644
--- a/drivers/net/ethernet/intel/Makefile
+++ b/drivers/net/ethernet/intel/Makefile
@@ -16,3 +16,4 @@ obj-$(CONFIG_IAVF) += iavf/
obj-$(CONFIG_FM10K) += fm10k/
obj-$(CONFIG_ICE) += ice/
obj-$(CONFIG_IDPF) += idpf/
+obj-$(CONFIG_LIBIE) += libie/
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index bd52b73cf61f..a57c89af214b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -386,259 +386,6 @@ int i40e_aq_set_rss_key(struct i40e_hw *hw,
return i40e_aq_get_set_rss_key(hw, vsi_id, key, true);
}

-/* The i40e_ptype_lookup table is used to convert from the 8-bit ptype in the
- * hardware to a bit-field that can be used by SW to more easily determine the
- * packet type.
- *
- * Macros are used to shorten the table lines and make this table human
- * readable.
- *
- * We store the PTYPE in the top byte of the bit field - this is just so that
- * we can check that the table doesn't have a row missing, as the index into
- * the table should be the PTYPE.
- *
- * Typical work flow:
- *
- * IF NOT i40e_ptype_lookup[ptype].known
- * THEN
- * Packet is unknown
- * ELSE IF i40e_ptype_lookup[ptype].outer_ip == I40E_RX_PTYPE_OUTER_IP
- * Use the rest of the fields to look at the tunnels, inner protocols, etc
- * ELSE
- * Use the enum i40e_rx_l2_ptype to decode the packet type
- * ENDIF
- */
-
-/* macro to make the table lines short, use explicit indexing with [PTYPE] */
-#define I40E_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
- [PTYPE] = { \
- 1, \
- I40E_RX_PTYPE_OUTER_##OUTER_IP, \
- I40E_RX_PTYPE_OUTER_##OUTER_IP_VER, \
- I40E_RX_PTYPE_##OUTER_FRAG, \
- I40E_RX_PTYPE_TUNNEL_##T, \
- I40E_RX_PTYPE_TUNNEL_END_##TE, \
- I40E_RX_PTYPE_##TEF, \
- I40E_RX_PTYPE_INNER_PROT_##I, \
- I40E_RX_PTYPE_PAYLOAD_LAYER_##PL }
-
-#define I40E_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
-
-/* shorter macros makes the table fit but are terse */
-#define I40E_RX_PTYPE_NOF I40E_RX_PTYPE_NOT_FRAG
-#define I40E_RX_PTYPE_FRG I40E_RX_PTYPE_FRAG
-#define I40E_RX_PTYPE_INNER_PROT_TS I40E_RX_PTYPE_INNER_PROT_TIMESYNC
-
-/* Lookup table mapping in the 8-bit HW PTYPE to the bit field for decoding */
-struct i40e_rx_ptype_decoded i40e_ptype_lookup[BIT(8)] = {
- /* L2 Packet types */
- I40E_PTT_UNUSED_ENTRY(0),
- I40E_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- I40E_PTT(2, L2, NONE, NOF, NONE, NONE, NOF, TS, PAY2),
- I40E_PTT(3, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- I40E_PTT_UNUSED_ENTRY(4),
- I40E_PTT_UNUSED_ENTRY(5),
- I40E_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- I40E_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- I40E_PTT_UNUSED_ENTRY(8),
- I40E_PTT_UNUSED_ENTRY(9),
- I40E_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- I40E_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
- I40E_PTT(12, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(13, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(14, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(15, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(16, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(17, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(18, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(19, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(20, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(21, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
-
- /* Non Tunneled IPv4 */
- I40E_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(25),
- I40E_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP, PAY4),
- I40E_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),
- I40E_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
- /* IPv4 --> IPv4 */
- I40E_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
- I40E_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
- I40E_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(32),
- I40E_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP, PAY4),
- I40E_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> IPv6 */
- I40E_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
- I40E_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
- I40E_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(39),
- I40E_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP, PAY4),
- I40E_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT */
- I40E_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
- /* IPv4 --> GRE/NAT --> IPv4 */
- I40E_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
- I40E_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
- I40E_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(47),
- I40E_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4),
- I40E_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> IPv6 */
- I40E_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
- I40E_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
- I40E_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(54),
- I40E_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4),
- I40E_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> MAC */
- I40E_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
- /* IPv4 --> GRE/NAT --> MAC --> IPv4 */
- I40E_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
- I40E_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
- I40E_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(62),
- I40E_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4),
- I40E_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT -> MAC --> IPv6 */
- I40E_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
- I40E_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
- I40E_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(69),
- I40E_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4),
- I40E_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> MAC/VLAN */
- I40E_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
- /* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */
- I40E_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
- I40E_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
- I40E_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(77),
- I40E_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4),
- I40E_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */
- I40E_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
- I40E_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
- I40E_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(84),
- I40E_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4),
- I40E_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
- /* Non Tunneled IPv6 */
- I40E_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),
- I40E_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(91),
- I40E_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP, PAY4),
- I40E_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),
- I40E_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
- /* IPv6 --> IPv4 */
- I40E_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
- I40E_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
- I40E_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(98),
- I40E_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP, PAY4),
- I40E_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> IPv6 */
- I40E_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
- I40E_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
- I40E_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(105),
- I40E_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP, PAY4),
- I40E_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT */
- I40E_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> IPv4 */
- I40E_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
- I40E_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
- I40E_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(113),
- I40E_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4),
- I40E_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> IPv6 */
- I40E_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
- I40E_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
- I40E_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(120),
- I40E_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4),
- I40E_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC */
- I40E_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> MAC -> IPv4 */
- I40E_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
- I40E_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
- I40E_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(128),
- I40E_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4),
- I40E_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC -> IPv6 */
- I40E_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
- I40E_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
- I40E_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(135),
- I40E_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4),
- I40E_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN */
- I40E_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */
- I40E_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
- I40E_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
- I40E_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(143),
- I40E_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4),
- I40E_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
- I40E_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */
- I40E_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
- I40E_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
- I40E_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4),
- I40E_PTT_UNUSED_ENTRY(150),
- I40E_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4),
- I40E_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
- I40E_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
- /* unused entries */
- [154 ... 255] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
-};
-
/**
* i40e_init_shared_code - Initialize the shared code
* @hw: pointer to hardware structure
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 9eeea8d9ab67..a85d9ed271a2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -100,6 +100,7 @@ MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX

MODULE_AUTHOR("Intel Corporation, <[email protected]>");
MODULE_DESCRIPTION("Intel(R) Ethernet Connection XL710 Network Driver");
+MODULE_IMPORT_NS(LIBIE);
MODULE_LICENSE("GPL v2");

static struct workqueue_struct *i40e_wq;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
index af4269330581..f3dbf6dd2bc6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
@@ -371,13 +371,6 @@ void i40e_set_pci_config_data(struct i40e_hw *hw, u16 link_status);

int i40e_set_mac_type(struct i40e_hw *hw);

-extern struct i40e_rx_ptype_decoded i40e_ptype_lookup[];
-
-static inline struct i40e_rx_ptype_decoded decode_rx_desc_ptype(u8 ptype)
-{
- return i40e_ptype_lookup[ptype];
-}
-
/**
* i40e_virtchnl_link_speed - Convert AdminQ link_speed to virtchnl definition
* @link_speed: the speed to convert
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b82df5bdfac0..9b3088f73de9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2,6 +2,7 @@
/* Copyright(c) 2013 - 2018 Intel Corporation. */

#include <linux/bpf_trace.h>
+#include <linux/net/intel/libie/rx.h>
#include <linux/prefetch.h>
#include <linux/sctp.h>
#include <net/mpls.h>
@@ -1757,40 +1758,32 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
struct sk_buff *skb,
union i40e_rx_desc *rx_desc)
{
- struct i40e_rx_ptype_decoded decoded;
+ struct libie_rx_ptype_parsed parsed;
u32 rx_error, rx_status;
bool ipv4, ipv6;
u8 ptype;
u64 qword;

+ skb->ip_summed = CHECKSUM_NONE;
+
qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
ptype = (qword & I40E_RXD_QW1_PTYPE_MASK) >> I40E_RXD_QW1_PTYPE_SHIFT;
+
+ parsed = libie_parse_rx_ptype(ptype);
+ if (!libie_has_rx_checksum(vsi->netdev, parsed))
+ return;
+
rx_error = (qword & I40E_RXD_QW1_ERROR_MASK) >>
I40E_RXD_QW1_ERROR_SHIFT;
rx_status = (qword & I40E_RXD_QW1_STATUS_MASK) >>
I40E_RXD_QW1_STATUS_SHIFT;
- decoded = decode_rx_desc_ptype(ptype);
-
- skb->ip_summed = CHECKSUM_NONE;
-
- skb_checksum_none_assert(skb);
-
- /* Rx csum enabled and ip headers found? */
- if (!(vsi->netdev->features & NETIF_F_RXCSUM))
- return;

/* did the hardware decode the packet and checksum? */
if (!(rx_status & BIT(I40E_RX_DESC_STATUS_L3L4P_SHIFT)))
return;

- /* both known and outer_ip must be set for the below code to work */
- if (!(decoded.known && decoded.outer_ip))
- return;
-
- ipv4 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
- (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV4);
- ipv6 = (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP) &&
- (decoded.outer_ip_ver == I40E_RX_PTYPE_OUTER_IPV6);
+ ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
+ ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;

if (ipv4 &&
(rx_error & (BIT(I40E_RX_DESC_ERROR_IPE_SHIFT) |
@@ -1818,49 +1811,16 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
* we need to bump the checksum level by 1 to reflect the fact that
* we are indicating we validated the inner checksum.
*/
- if (decoded.tunnel_type >= I40E_RX_PTYPE_TUNNEL_IP_GRENAT)
+ if (parsed.tunnel_type >= LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT)
skb->csum_level = 1;

- /* Only report checksum unnecessary for TCP, UDP, or SCTP */
- switch (decoded.inner_prot) {
- case I40E_RX_PTYPE_INNER_PROT_TCP:
- case I40E_RX_PTYPE_INNER_PROT_UDP:
- case I40E_RX_PTYPE_INNER_PROT_SCTP:
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- fallthrough;
- default:
- break;
- }
-
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
return;

checksum_fail:
vsi->back->hw_csum_rx_error++;
}

-/**
- * i40e_ptype_to_htype - get a hash type
- * @ptype: the ptype value from the descriptor
- *
- * Returns a hash type to be used by skb_set_hash
- **/
-static inline int i40e_ptype_to_htype(u8 ptype)
-{
- struct i40e_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
-
- if (!decoded.known)
- return PKT_HASH_TYPE_NONE;
-
- if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
- decoded.payload_layer == I40E_RX_PTYPE_PAYLOAD_LAYER_PAY4)
- return PKT_HASH_TYPE_L4;
- else if (decoded.outer_ip == I40E_RX_PTYPE_OUTER_IP &&
- decoded.payload_layer == I40E_RX_PTYPE_PAYLOAD_LAYER_PAY3)
- return PKT_HASH_TYPE_L3;
- else
- return PKT_HASH_TYPE_L2;
-}
-
/**
* i40e_rx_hash - set the hash value in the skb
* @ring: descriptor ring
@@ -1873,17 +1833,19 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
struct sk_buff *skb,
u8 rx_ptype)
{
+ struct libie_rx_ptype_parsed parsed;
u32 hash;
const __le64 rss_mask =
cpu_to_le64((u64)I40E_RX_DESC_FLTSTAT_RSS_HASH <<
I40E_RX_DESC_STATUS_FLTSTAT_SHIFT);

- if (!(ring->netdev->features & NETIF_F_RXHASH))
+ parsed = libie_parse_rx_ptype(rx_ptype);
+ if (!libie_has_rx_hash(ring->netdev, parsed))
return;

if ((rx_desc->wb.qword1.status_error_len & rss_mask) == rss_mask) {
hash = le32_to_cpu(rx_desc->wb.qword0.hi_dword.rss);
- skb_set_hash(skb, hash, i40e_ptype_to_htype(rx_ptype));
+ libie_skb_set_hash(skb, hash, parsed);
}
}

diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index de69c2e22448..8812aa84a536 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -745,94 +745,6 @@ enum i40e_rx_desc_error_l3l4e_fcoe_masks {
#define I40E_RXD_QW1_PTYPE_SHIFT 30
#define I40E_RXD_QW1_PTYPE_MASK (0xFFULL << I40E_RXD_QW1_PTYPE_SHIFT)

-/* Packet type non-ip values */
-enum i40e_rx_l2_ptype {
- I40E_RX_PTYPE_L2_RESERVED = 0,
- I40E_RX_PTYPE_L2_MAC_PAY2 = 1,
- I40E_RX_PTYPE_L2_TIMESYNC_PAY2 = 2,
- I40E_RX_PTYPE_L2_FIP_PAY2 = 3,
- I40E_RX_PTYPE_L2_OUI_PAY2 = 4,
- I40E_RX_PTYPE_L2_MACCNTRL_PAY2 = 5,
- I40E_RX_PTYPE_L2_LLDP_PAY2 = 6,
- I40E_RX_PTYPE_L2_ECP_PAY2 = 7,
- I40E_RX_PTYPE_L2_EVB_PAY2 = 8,
- I40E_RX_PTYPE_L2_QCN_PAY2 = 9,
- I40E_RX_PTYPE_L2_EAPOL_PAY2 = 10,
- I40E_RX_PTYPE_L2_ARP = 11,
- I40E_RX_PTYPE_L2_FCOE_PAY3 = 12,
- I40E_RX_PTYPE_L2_FCOE_FCDATA_PAY3 = 13,
- I40E_RX_PTYPE_L2_FCOE_FCRDY_PAY3 = 14,
- I40E_RX_PTYPE_L2_FCOE_FCRSP_PAY3 = 15,
- I40E_RX_PTYPE_L2_FCOE_FCOTHER_PA = 16,
- I40E_RX_PTYPE_L2_FCOE_VFT_PAY3 = 17,
- I40E_RX_PTYPE_L2_FCOE_VFT_FCDATA = 18,
- I40E_RX_PTYPE_L2_FCOE_VFT_FCRDY = 19,
- I40E_RX_PTYPE_L2_FCOE_VFT_FCRSP = 20,
- I40E_RX_PTYPE_L2_FCOE_VFT_FCOTHER = 21,
- I40E_RX_PTYPE_GRENAT4_MAC_PAY3 = 58,
- I40E_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4 = 87,
- I40E_RX_PTYPE_GRENAT6_MAC_PAY3 = 124,
- I40E_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4 = 153
-};
-
-struct i40e_rx_ptype_decoded {
- u32 known:1;
- u32 outer_ip:1;
- u32 outer_ip_ver:1;
- u32 outer_frag:1;
- u32 tunnel_type:3;
- u32 tunnel_end_prot:2;
- u32 tunnel_end_frag:1;
- u32 inner_prot:4;
- u32 payload_layer:3;
-};
-
-enum i40e_rx_ptype_outer_ip {
- I40E_RX_PTYPE_OUTER_L2 = 0,
- I40E_RX_PTYPE_OUTER_IP = 1
-};
-
-enum i40e_rx_ptype_outer_ip_ver {
- I40E_RX_PTYPE_OUTER_NONE = 0,
- I40E_RX_PTYPE_OUTER_IPV4 = 0,
- I40E_RX_PTYPE_OUTER_IPV6 = 1
-};
-
-enum i40e_rx_ptype_outer_fragmented {
- I40E_RX_PTYPE_NOT_FRAG = 0,
- I40E_RX_PTYPE_FRAG = 1
-};
-
-enum i40e_rx_ptype_tunnel_type {
- I40E_RX_PTYPE_TUNNEL_NONE = 0,
- I40E_RX_PTYPE_TUNNEL_IP_IP = 1,
- I40E_RX_PTYPE_TUNNEL_IP_GRENAT = 2,
- I40E_RX_PTYPE_TUNNEL_IP_GRENAT_MAC = 3,
- I40E_RX_PTYPE_TUNNEL_IP_GRENAT_MAC_VLAN = 4,
-};
-
-enum i40e_rx_ptype_tunnel_end_prot {
- I40E_RX_PTYPE_TUNNEL_END_NONE = 0,
- I40E_RX_PTYPE_TUNNEL_END_IPV4 = 1,
- I40E_RX_PTYPE_TUNNEL_END_IPV6 = 2,
-};
-
-enum i40e_rx_ptype_inner_prot {
- I40E_RX_PTYPE_INNER_PROT_NONE = 0,
- I40E_RX_PTYPE_INNER_PROT_UDP = 1,
- I40E_RX_PTYPE_INNER_PROT_TCP = 2,
- I40E_RX_PTYPE_INNER_PROT_SCTP = 3,
- I40E_RX_PTYPE_INNER_PROT_ICMP = 4,
- I40E_RX_PTYPE_INNER_PROT_TIMESYNC = 5
-};
-
-enum i40e_rx_ptype_payload_layer {
- I40E_RX_PTYPE_PAYLOAD_LAYER_NONE = 0,
- I40E_RX_PTYPE_PAYLOAD_LAYER_PAY2 = 1,
- I40E_RX_PTYPE_PAYLOAD_LAYER_PAY3 = 2,
- I40E_RX_PTYPE_PAYLOAD_LAYER_PAY4 = 3,
-};
-
#define I40E_RXD_QW1_LENGTH_PBUF_SHIFT 38
#define I40E_RXD_QW1_LENGTH_PBUF_MASK (0x3FFFULL << \
I40E_RXD_QW1_LENGTH_PBUF_SHIFT)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_common.c b/drivers/net/ethernet/intel/iavf/iavf_common.c
index 89d2bce529ae..a3b4aaf953c1 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_common.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_common.c
@@ -436,259 +436,6 @@ enum iavf_status iavf_aq_set_rss_key(struct iavf_hw *hw, u16 vsi_id,
return iavf_aq_get_set_rss_key(hw, vsi_id, key, true);
}

-/* The iavf_ptype_lookup table is used to convert from the 8-bit ptype in the
- * hardware to a bit-field that can be used by SW to more easily determine the
- * packet type.
- *
- * Macros are used to shorten the table lines and make this table human
- * readable.
- *
- * We store the PTYPE in the top byte of the bit field - this is just so that
- * we can check that the table doesn't have a row missing, as the index into
- * the table should be the PTYPE.
- *
- * Typical work flow:
- *
- * IF NOT iavf_ptype_lookup[ptype].known
- * THEN
- * Packet is unknown
- * ELSE IF iavf_ptype_lookup[ptype].outer_ip == IAVF_RX_PTYPE_OUTER_IP
- * Use the rest of the fields to look at the tunnels, inner protocols, etc
- * ELSE
- * Use the enum iavf_rx_l2_ptype to decode the packet type
- * ENDIF
- */
-
-/* macro to make the table lines short, use explicit indexing with [PTYPE] */
-#define IAVF_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
- [PTYPE] = { \
- 1, \
- IAVF_RX_PTYPE_OUTER_##OUTER_IP, \
- IAVF_RX_PTYPE_OUTER_##OUTER_IP_VER, \
- IAVF_RX_PTYPE_##OUTER_FRAG, \
- IAVF_RX_PTYPE_TUNNEL_##T, \
- IAVF_RX_PTYPE_TUNNEL_END_##TE, \
- IAVF_RX_PTYPE_##TEF, \
- IAVF_RX_PTYPE_INNER_PROT_##I, \
- IAVF_RX_PTYPE_PAYLOAD_LAYER_##PL }
-
-#define IAVF_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
-
-/* shorter macros makes the table fit but are terse */
-#define IAVF_RX_PTYPE_NOF IAVF_RX_PTYPE_NOT_FRAG
-#define IAVF_RX_PTYPE_FRG IAVF_RX_PTYPE_FRAG
-#define IAVF_RX_PTYPE_INNER_PROT_TS IAVF_RX_PTYPE_INNER_PROT_TIMESYNC
-
-/* Lookup table mapping the 8-bit HW PTYPE to the bit field for decoding */
-struct iavf_rx_ptype_decoded iavf_ptype_lookup[BIT(8)] = {
- /* L2 Packet types */
- IAVF_PTT_UNUSED_ENTRY(0),
- IAVF_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- IAVF_PTT(2, L2, NONE, NOF, NONE, NONE, NOF, TS, PAY2),
- IAVF_PTT(3, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- IAVF_PTT_UNUSED_ENTRY(4),
- IAVF_PTT_UNUSED_ENTRY(5),
- IAVF_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- IAVF_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- IAVF_PTT_UNUSED_ENTRY(8),
- IAVF_PTT_UNUSED_ENTRY(9),
- IAVF_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- IAVF_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
- IAVF_PTT(12, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(13, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(14, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(15, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(16, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(17, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(18, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(19, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(20, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(21, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY3),
-
- /* Non Tunneled IPv4 */
- IAVF_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(25),
- IAVF_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP, PAY4),
- IAVF_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),
- IAVF_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
- /* IPv4 --> IPv4 */
- IAVF_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(32),
- IAVF_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> IPv6 */
- IAVF_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(39),
- IAVF_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT */
- IAVF_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
- /* IPv4 --> GRE/NAT --> IPv4 */
- IAVF_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(47),
- IAVF_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> IPv6 */
- IAVF_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(54),
- IAVF_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> MAC */
- IAVF_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
- /* IPv4 --> GRE/NAT --> MAC --> IPv4 */
- IAVF_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(62),
- IAVF_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT -> MAC --> IPv6 */
- IAVF_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(69),
- IAVF_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> MAC/VLAN */
- IAVF_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
- /* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */
- IAVF_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(77),
- IAVF_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */
- IAVF_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(84),
- IAVF_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
- /* Non Tunneled IPv6 */
- IAVF_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),
- IAVF_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(91),
- IAVF_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP, PAY4),
- IAVF_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),
- IAVF_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
- /* IPv6 --> IPv4 */
- IAVF_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(98),
- IAVF_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> IPv6 */
- IAVF_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(105),
- IAVF_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT */
- IAVF_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> IPv4 */
- IAVF_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(113),
- IAVF_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> IPv6 */
- IAVF_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(120),
- IAVF_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC */
- IAVF_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> MAC -> IPv4 */
- IAVF_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(128),
- IAVF_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC -> IPv6 */
- IAVF_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(135),
- IAVF_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN */
- IAVF_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */
- IAVF_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
- IAVF_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
- IAVF_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(143),
- IAVF_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4),
- IAVF_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
- IAVF_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */
- IAVF_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
- IAVF_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
- IAVF_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4),
- IAVF_PTT_UNUSED_ENTRY(150),
- IAVF_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4),
- IAVF_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
- IAVF_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
- /* unused entries */
- [154 ... 255] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
-};
-
/**
* iavf_aq_send_msg_to_pf
* @hw: pointer to the hardware structure
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 06a87030c163..091ad35d181d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -45,6 +45,7 @@ MODULE_DEVICE_TABLE(pci, iavf_pci_tbl);
MODULE_ALIAS("i40evf");
MODULE_AUTHOR("Intel Corporation, <[email protected]>");
MODULE_DESCRIPTION("Intel(R) Ethernet Adaptive Virtual Function Network Driver");
+MODULE_IMPORT_NS(LIBIE);
MODULE_LICENSE("GPL v2");

static const struct net_device_ops iavf_netdev_ops;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_prototype.h b/drivers/net/ethernet/intel/iavf/iavf_prototype.h
index 4a48e6171405..48c3901381b4 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_prototype.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_prototype.h
@@ -45,13 +45,6 @@ enum iavf_status iavf_aq_set_rss_lut(struct iavf_hw *hw, u16 seid,
enum iavf_status iavf_aq_set_rss_key(struct iavf_hw *hw, u16 seid,
struct iavf_aqc_get_set_rss_key_data *key);

-extern struct iavf_rx_ptype_decoded iavf_ptype_lookup[];
-
-static inline struct iavf_rx_ptype_decoded decode_rx_desc_ptype(u8 ptype)
-{
- return iavf_ptype_lookup[ptype];
-}
-
void iavf_vf_parse_hw_config(struct iavf_hw *hw,
struct virtchnl_vf_resource *msg);
enum iavf_status iavf_aq_send_msg_to_pf(struct iavf_hw *hw,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index d64c4997136b..c610013bdcb0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */

+#include <linux/net/intel/libie/rx.h>
#include <linux/prefetch.h>

#include "iavf.h"
@@ -981,40 +982,32 @@ static void iavf_rx_checksum(struct iavf_vsi *vsi,
struct sk_buff *skb,
union iavf_rx_desc *rx_desc)
{
- struct iavf_rx_ptype_decoded decoded;
+ struct libie_rx_ptype_parsed parsed;
u32 rx_error, rx_status;
bool ipv4, ipv6;
u8 ptype;
u64 qword;

+ skb->ip_summed = CHECKSUM_NONE;
+
qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
ptype = (qword & IAVF_RXD_QW1_PTYPE_MASK) >> IAVF_RXD_QW1_PTYPE_SHIFT;
+
+ parsed = libie_parse_rx_ptype(ptype);
+ if (!libie_has_rx_checksum(vsi->netdev, parsed))
+ return;
+
rx_error = (qword & IAVF_RXD_QW1_ERROR_MASK) >>
IAVF_RXD_QW1_ERROR_SHIFT;
rx_status = (qword & IAVF_RXD_QW1_STATUS_MASK) >>
IAVF_RXD_QW1_STATUS_SHIFT;
- decoded = decode_rx_desc_ptype(ptype);
-
- skb->ip_summed = CHECKSUM_NONE;
-
- skb_checksum_none_assert(skb);
-
- /* Rx csum enabled and ip headers found? */
- if (!(vsi->netdev->features & NETIF_F_RXCSUM))
- return;

/* did the hardware decode the packet and checksum? */
if (!(rx_status & BIT(IAVF_RX_DESC_STATUS_L3L4P_SHIFT)))
return;

- /* both known and outer_ip must be set for the below code to work */
- if (!(decoded.known && decoded.outer_ip))
- return;
-
- ipv4 = (decoded.outer_ip == IAVF_RX_PTYPE_OUTER_IP) &&
- (decoded.outer_ip_ver == IAVF_RX_PTYPE_OUTER_IPV4);
- ipv6 = (decoded.outer_ip == IAVF_RX_PTYPE_OUTER_IP) &&
- (decoded.outer_ip_ver == IAVF_RX_PTYPE_OUTER_IPV6);
+ ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
+ ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;

if (ipv4 &&
(rx_error & (BIT(IAVF_RX_DESC_ERROR_IPE_SHIFT) |
@@ -1038,46 +1031,13 @@ static void iavf_rx_checksum(struct iavf_vsi *vsi,
if (rx_error & BIT(IAVF_RX_DESC_ERROR_PPRS_SHIFT))
return;

- /* Only report checksum unnecessary for TCP, UDP, or SCTP */
- switch (decoded.inner_prot) {
- case IAVF_RX_PTYPE_INNER_PROT_TCP:
- case IAVF_RX_PTYPE_INNER_PROT_UDP:
- case IAVF_RX_PTYPE_INNER_PROT_SCTP:
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- fallthrough;
- default:
- break;
- }
-
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
return;

checksum_fail:
vsi->back->hw_csum_rx_error++;
}

-/**
- * iavf_ptype_to_htype - get a hash type
- * @ptype: the ptype value from the descriptor
- *
- * Returns a hash type to be used by skb_set_hash
- **/
-static int iavf_ptype_to_htype(u8 ptype)
-{
- struct iavf_rx_ptype_decoded decoded = decode_rx_desc_ptype(ptype);
-
- if (!decoded.known)
- return PKT_HASH_TYPE_NONE;
-
- if (decoded.outer_ip == IAVF_RX_PTYPE_OUTER_IP &&
- decoded.payload_layer == IAVF_RX_PTYPE_PAYLOAD_LAYER_PAY4)
- return PKT_HASH_TYPE_L4;
- else if (decoded.outer_ip == IAVF_RX_PTYPE_OUTER_IP &&
- decoded.payload_layer == IAVF_RX_PTYPE_PAYLOAD_LAYER_PAY3)
- return PKT_HASH_TYPE_L3;
- else
- return PKT_HASH_TYPE_L2;
-}
-
/**
* iavf_rx_hash - set the hash value in the skb
* @ring: descriptor ring
@@ -1090,17 +1050,19 @@ static void iavf_rx_hash(struct iavf_ring *ring,
struct sk_buff *skb,
u8 rx_ptype)
{
+ struct libie_rx_ptype_parsed parsed;
u32 hash;
const __le64 rss_mask =
cpu_to_le64((u64)IAVF_RX_DESC_FLTSTAT_RSS_HASH <<
IAVF_RX_DESC_STATUS_FLTSTAT_SHIFT);

- if (!(ring->netdev->features & NETIF_F_RXHASH))
+ parsed = libie_parse_rx_ptype(rx_ptype);
+ if (!libie_has_rx_hash(ring->netdev, parsed))
return;

if ((rx_desc->wb.qword1.status_error_len & rss_mask) == rss_mask) {
hash = le32_to_cpu(rx_desc->wb.qword0.hi_dword.rss);
- skb_set_hash(skb, hash, iavf_ptype_to_htype(rx_ptype));
+ libie_skb_set_hash(skb, hash, parsed);
}
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index 2b6a207fa441..23ded4fcd94f 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -327,94 +327,6 @@ enum iavf_rx_desc_error_l3l4e_fcoe_masks {
#define IAVF_RXD_QW1_PTYPE_SHIFT 30
#define IAVF_RXD_QW1_PTYPE_MASK (0xFFULL << IAVF_RXD_QW1_PTYPE_SHIFT)

-/* Packet type non-ip values */
-enum iavf_rx_l2_ptype {
- IAVF_RX_PTYPE_L2_RESERVED = 0,
- IAVF_RX_PTYPE_L2_MAC_PAY2 = 1,
- IAVF_RX_PTYPE_L2_TIMESYNC_PAY2 = 2,
- IAVF_RX_PTYPE_L2_FIP_PAY2 = 3,
- IAVF_RX_PTYPE_L2_OUI_PAY2 = 4,
- IAVF_RX_PTYPE_L2_MACCNTRL_PAY2 = 5,
- IAVF_RX_PTYPE_L2_LLDP_PAY2 = 6,
- IAVF_RX_PTYPE_L2_ECP_PAY2 = 7,
- IAVF_RX_PTYPE_L2_EVB_PAY2 = 8,
- IAVF_RX_PTYPE_L2_QCN_PAY2 = 9,
- IAVF_RX_PTYPE_L2_EAPOL_PAY2 = 10,
- IAVF_RX_PTYPE_L2_ARP = 11,
- IAVF_RX_PTYPE_L2_FCOE_PAY3 = 12,
- IAVF_RX_PTYPE_L2_FCOE_FCDATA_PAY3 = 13,
- IAVF_RX_PTYPE_L2_FCOE_FCRDY_PAY3 = 14,
- IAVF_RX_PTYPE_L2_FCOE_FCRSP_PAY3 = 15,
- IAVF_RX_PTYPE_L2_FCOE_FCOTHER_PA = 16,
- IAVF_RX_PTYPE_L2_FCOE_VFT_PAY3 = 17,
- IAVF_RX_PTYPE_L2_FCOE_VFT_FCDATA = 18,
- IAVF_RX_PTYPE_L2_FCOE_VFT_FCRDY = 19,
- IAVF_RX_PTYPE_L2_FCOE_VFT_FCRSP = 20,
- IAVF_RX_PTYPE_L2_FCOE_VFT_FCOTHER = 21,
- IAVF_RX_PTYPE_GRENAT4_MAC_PAY3 = 58,
- IAVF_RX_PTYPE_GRENAT4_MACVLAN_IPV6_ICMP_PAY4 = 87,
- IAVF_RX_PTYPE_GRENAT6_MAC_PAY3 = 124,
- IAVF_RX_PTYPE_GRENAT6_MACVLAN_IPV6_ICMP_PAY4 = 153
-};
-
-struct iavf_rx_ptype_decoded {
- u32 known:1;
- u32 outer_ip:1;
- u32 outer_ip_ver:1;
- u32 outer_frag:1;
- u32 tunnel_type:3;
- u32 tunnel_end_prot:2;
- u32 tunnel_end_frag:1;
- u32 inner_prot:4;
- u32 payload_layer:3;
-};
-
-enum iavf_rx_ptype_outer_ip {
- IAVF_RX_PTYPE_OUTER_L2 = 0,
- IAVF_RX_PTYPE_OUTER_IP = 1
-};
-
-enum iavf_rx_ptype_outer_ip_ver {
- IAVF_RX_PTYPE_OUTER_NONE = 0,
- IAVF_RX_PTYPE_OUTER_IPV4 = 0,
- IAVF_RX_PTYPE_OUTER_IPV6 = 1
-};
-
-enum iavf_rx_ptype_outer_fragmented {
- IAVF_RX_PTYPE_NOT_FRAG = 0,
- IAVF_RX_PTYPE_FRAG = 1
-};
-
-enum iavf_rx_ptype_tunnel_type {
- IAVF_RX_PTYPE_TUNNEL_NONE = 0,
- IAVF_RX_PTYPE_TUNNEL_IP_IP = 1,
- IAVF_RX_PTYPE_TUNNEL_IP_GRENAT = 2,
- IAVF_RX_PTYPE_TUNNEL_IP_GRENAT_MAC = 3,
- IAVF_RX_PTYPE_TUNNEL_IP_GRENAT_MAC_VLAN = 4,
-};
-
-enum iavf_rx_ptype_tunnel_end_prot {
- IAVF_RX_PTYPE_TUNNEL_END_NONE = 0,
- IAVF_RX_PTYPE_TUNNEL_END_IPV4 = 1,
- IAVF_RX_PTYPE_TUNNEL_END_IPV6 = 2,
-};
-
-enum iavf_rx_ptype_inner_prot {
- IAVF_RX_PTYPE_INNER_PROT_NONE = 0,
- IAVF_RX_PTYPE_INNER_PROT_UDP = 1,
- IAVF_RX_PTYPE_INNER_PROT_TCP = 2,
- IAVF_RX_PTYPE_INNER_PROT_SCTP = 3,
- IAVF_RX_PTYPE_INNER_PROT_ICMP = 4,
- IAVF_RX_PTYPE_INNER_PROT_TIMESYNC = 5
-};
-
-enum iavf_rx_ptype_payload_layer {
- IAVF_RX_PTYPE_PAYLOAD_LAYER_NONE = 0,
- IAVF_RX_PTYPE_PAYLOAD_LAYER_PAY2 = 1,
- IAVF_RX_PTYPE_PAYLOAD_LAYER_PAY3 = 2,
- IAVF_RX_PTYPE_PAYLOAD_LAYER_PAY4 = 3,
-};
-
#define IAVF_RXD_QW1_LENGTH_PBUF_SHIFT 38
#define IAVF_RXD_QW1_LENGTH_PBUF_MASK (0x3FFFULL << \
IAVF_RXD_QW1_LENGTH_PBUF_SHIFT)
diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 89f986a75cc8..611577ebc29d 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -160,64 +160,6 @@ struct ice_fltr_desc {
(0x1ULL << ICE_FXD_FLTR_WB_QW1_FAIL_PROF_S)
#define ICE_FXD_FLTR_WB_QW1_FAIL_PROF_YES 0x1ULL

-struct ice_rx_ptype_decoded {
- u32 known:1;
- u32 outer_ip:1;
- u32 outer_ip_ver:2;
- u32 outer_frag:1;
- u32 tunnel_type:3;
- u32 tunnel_end_prot:2;
- u32 tunnel_end_frag:1;
- u32 inner_prot:4;
- u32 payload_layer:3;
-};
-
-enum ice_rx_ptype_outer_ip {
- ICE_RX_PTYPE_OUTER_L2 = 0,
- ICE_RX_PTYPE_OUTER_IP = 1,
-};
-
-enum ice_rx_ptype_outer_ip_ver {
- ICE_RX_PTYPE_OUTER_NONE = 0,
- ICE_RX_PTYPE_OUTER_IPV4 = 1,
- ICE_RX_PTYPE_OUTER_IPV6 = 2,
-};
-
-enum ice_rx_ptype_outer_fragmented {
- ICE_RX_PTYPE_NOT_FRAG = 0,
- ICE_RX_PTYPE_FRAG = 1,
-};
-
-enum ice_rx_ptype_tunnel_type {
- ICE_RX_PTYPE_TUNNEL_NONE = 0,
- ICE_RX_PTYPE_TUNNEL_IP_IP = 1,
- ICE_RX_PTYPE_TUNNEL_IP_GRENAT = 2,
- ICE_RX_PTYPE_TUNNEL_IP_GRENAT_MAC = 3,
- ICE_RX_PTYPE_TUNNEL_IP_GRENAT_MAC_VLAN = 4,
-};
-
-enum ice_rx_ptype_tunnel_end_prot {
- ICE_RX_PTYPE_TUNNEL_END_NONE = 0,
- ICE_RX_PTYPE_TUNNEL_END_IPV4 = 1,
- ICE_RX_PTYPE_TUNNEL_END_IPV6 = 2,
-};
-
-enum ice_rx_ptype_inner_prot {
- ICE_RX_PTYPE_INNER_PROT_NONE = 0,
- ICE_RX_PTYPE_INNER_PROT_UDP = 1,
- ICE_RX_PTYPE_INNER_PROT_TCP = 2,
- ICE_RX_PTYPE_INNER_PROT_SCTP = 3,
- ICE_RX_PTYPE_INNER_PROT_ICMP = 4,
- ICE_RX_PTYPE_INNER_PROT_TIMESYNC = 5,
-};
-
-enum ice_rx_ptype_payload_layer {
- ICE_RX_PTYPE_PAYLOAD_LAYER_NONE = 0,
- ICE_RX_PTYPE_PAYLOAD_LAYER_PAY2 = 1,
- ICE_RX_PTYPE_PAYLOAD_LAYER_PAY3 = 2,
- ICE_RX_PTYPE_PAYLOAD_LAYER_PAY4 = 3,
-};
-
/* Rx Flex Descriptor
* This descriptor is used instead of the legacy version descriptor when
* ice_rlan_ctx.adv_desc is set
@@ -651,262 +593,4 @@ struct ice_tlan_ctx {
u8 int_q_state; /* width not needed - internal - DO NOT WRITE!!! */
};

-/* The ice_ptype_lkup table is used to convert from the 10-bit ptype in the
- * hardware to a bit-field that can be used by SW to more easily determine the
- * packet type.
- *
- * Macros are used to shorten the table lines and make this table human
- * readable.
- *
- * We store the PTYPE in the top byte of the bit field - this is just so that
- * we can check that the table doesn't have a row missing, as the index into
- * the table should be the PTYPE.
- *
- * Typical work flow:
- *
- * IF NOT ice_ptype_lkup[ptype].known
- * THEN
- * Packet is unknown
- * ELSE IF ice_ptype_lkup[ptype].outer_ip == ICE_RX_PTYPE_OUTER_IP
- * Use the rest of the fields to look at the tunnels, inner protocols, etc
- * ELSE
- * Use the enum ice_rx_l2_ptype to decode the packet type
- * ENDIF
- */
-
-/* macro to make the table lines short, use explicit indexing with [PTYPE] */
-#define ICE_PTT(PTYPE, OUTER_IP, OUTER_IP_VER, OUTER_FRAG, T, TE, TEF, I, PL)\
- [PTYPE] = { \
- 1, \
- ICE_RX_PTYPE_OUTER_##OUTER_IP, \
- ICE_RX_PTYPE_OUTER_##OUTER_IP_VER, \
- ICE_RX_PTYPE_##OUTER_FRAG, \
- ICE_RX_PTYPE_TUNNEL_##T, \
- ICE_RX_PTYPE_TUNNEL_END_##TE, \
- ICE_RX_PTYPE_##TEF, \
- ICE_RX_PTYPE_INNER_PROT_##I, \
- ICE_RX_PTYPE_PAYLOAD_LAYER_##PL }
-
-#define ICE_PTT_UNUSED_ENTRY(PTYPE) [PTYPE] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
-
-/* shorter macros makes the table fit but are terse */
-#define ICE_RX_PTYPE_NOF ICE_RX_PTYPE_NOT_FRAG
-#define ICE_RX_PTYPE_FRG ICE_RX_PTYPE_FRAG
-
-/* Lookup table mapping in the 10-bit HW PTYPE to the bit field for decoding */
-static const struct ice_rx_ptype_decoded ice_ptype_lkup[BIT(10)] = {
- /* L2 Packet types */
- ICE_PTT_UNUSED_ENTRY(0),
- ICE_PTT(1, L2, NONE, NOF, NONE, NONE, NOF, NONE, PAY2),
- ICE_PTT_UNUSED_ENTRY(2),
- ICE_PTT_UNUSED_ENTRY(3),
- ICE_PTT_UNUSED_ENTRY(4),
- ICE_PTT_UNUSED_ENTRY(5),
- ICE_PTT(6, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
- ICE_PTT(7, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
- ICE_PTT_UNUSED_ENTRY(8),
- ICE_PTT_UNUSED_ENTRY(9),
- ICE_PTT(10, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
- ICE_PTT(11, L2, NONE, NOF, NONE, NONE, NOF, NONE, NONE),
- ICE_PTT_UNUSED_ENTRY(12),
- ICE_PTT_UNUSED_ENTRY(13),
- ICE_PTT_UNUSED_ENTRY(14),
- ICE_PTT_UNUSED_ENTRY(15),
- ICE_PTT_UNUSED_ENTRY(16),
- ICE_PTT_UNUSED_ENTRY(17),
- ICE_PTT_UNUSED_ENTRY(18),
- ICE_PTT_UNUSED_ENTRY(19),
- ICE_PTT_UNUSED_ENTRY(20),
- ICE_PTT_UNUSED_ENTRY(21),
-
- /* Non Tunneled IPv4 */
- ICE_PTT(22, IP, IPV4, FRG, NONE, NONE, NOF, NONE, PAY3),
- ICE_PTT(23, IP, IPV4, NOF, NONE, NONE, NOF, NONE, PAY3),
- ICE_PTT(24, IP, IPV4, NOF, NONE, NONE, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(25),
- ICE_PTT(26, IP, IPV4, NOF, NONE, NONE, NOF, TCP, PAY4),
- ICE_PTT(27, IP, IPV4, NOF, NONE, NONE, NOF, SCTP, PAY4),
- ICE_PTT(28, IP, IPV4, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
- /* IPv4 --> IPv4 */
- ICE_PTT(29, IP, IPV4, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
- ICE_PTT(30, IP, IPV4, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
- ICE_PTT(31, IP, IPV4, NOF, IP_IP, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(32),
- ICE_PTT(33, IP, IPV4, NOF, IP_IP, IPV4, NOF, TCP, PAY4),
- ICE_PTT(34, IP, IPV4, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(35, IP, IPV4, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> IPv6 */
- ICE_PTT(36, IP, IPV4, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
- ICE_PTT(37, IP, IPV4, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
- ICE_PTT(38, IP, IPV4, NOF, IP_IP, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(39),
- ICE_PTT(40, IP, IPV4, NOF, IP_IP, IPV6, NOF, TCP, PAY4),
- ICE_PTT(41, IP, IPV4, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(42, IP, IPV4, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT */
- ICE_PTT(43, IP, IPV4, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
- /* IPv4 --> GRE/NAT --> IPv4 */
- ICE_PTT(44, IP, IPV4, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
- ICE_PTT(45, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
- ICE_PTT(46, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(47),
- ICE_PTT(48, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4),
- ICE_PTT(49, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(50, IP, IPV4, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> IPv6 */
- ICE_PTT(51, IP, IPV4, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
- ICE_PTT(52, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
- ICE_PTT(53, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(54),
- ICE_PTT(55, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4),
- ICE_PTT(56, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(57, IP, IPV4, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> MAC */
- ICE_PTT(58, IP, IPV4, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
- /* IPv4 --> GRE/NAT --> MAC --> IPv4 */
- ICE_PTT(59, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
- ICE_PTT(60, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
- ICE_PTT(61, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(62),
- ICE_PTT(63, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4),
- ICE_PTT(64, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(65, IP, IPV4, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT -> MAC --> IPv6 */
- ICE_PTT(66, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
- ICE_PTT(67, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
- ICE_PTT(68, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(69),
- ICE_PTT(70, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4),
- ICE_PTT(71, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(72, IP, IPV4, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
- /* IPv4 --> GRE/NAT --> MAC/VLAN */
- ICE_PTT(73, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
- /* IPv4 ---> GRE/NAT -> MAC/VLAN --> IPv4 */
- ICE_PTT(74, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
- ICE_PTT(75, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
- ICE_PTT(76, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(77),
- ICE_PTT(78, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4),
- ICE_PTT(79, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(80, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
- /* IPv4 -> GRE/NAT -> MAC/VLAN --> IPv6 */
- ICE_PTT(81, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
- ICE_PTT(82, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
- ICE_PTT(83, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(84),
- ICE_PTT(85, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4),
- ICE_PTT(86, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(87, IP, IPV4, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
- /* Non Tunneled IPv6 */
- ICE_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),
- ICE_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),
- ICE_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(91),
- ICE_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP, PAY4),
- ICE_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),
- ICE_PTT(94, IP, IPV6, NOF, NONE, NONE, NOF, ICMP, PAY4),
-
- /* IPv6 --> IPv4 */
- ICE_PTT(95, IP, IPV6, NOF, IP_IP, IPV4, FRG, NONE, PAY3),
- ICE_PTT(96, IP, IPV6, NOF, IP_IP, IPV4, NOF, NONE, PAY3),
- ICE_PTT(97, IP, IPV6, NOF, IP_IP, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(98),
- ICE_PTT(99, IP, IPV6, NOF, IP_IP, IPV4, NOF, TCP, PAY4),
- ICE_PTT(100, IP, IPV6, NOF, IP_IP, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(101, IP, IPV6, NOF, IP_IP, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> IPv6 */
- ICE_PTT(102, IP, IPV6, NOF, IP_IP, IPV6, FRG, NONE, PAY3),
- ICE_PTT(103, IP, IPV6, NOF, IP_IP, IPV6, NOF, NONE, PAY3),
- ICE_PTT(104, IP, IPV6, NOF, IP_IP, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(105),
- ICE_PTT(106, IP, IPV6, NOF, IP_IP, IPV6, NOF, TCP, PAY4),
- ICE_PTT(107, IP, IPV6, NOF, IP_IP, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(108, IP, IPV6, NOF, IP_IP, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT */
- ICE_PTT(109, IP, IPV6, NOF, IP_GRENAT, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> IPv4 */
- ICE_PTT(110, IP, IPV6, NOF, IP_GRENAT, IPV4, FRG, NONE, PAY3),
- ICE_PTT(111, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, NONE, PAY3),
- ICE_PTT(112, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(113),
- ICE_PTT(114, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, TCP, PAY4),
- ICE_PTT(115, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(116, IP, IPV6, NOF, IP_GRENAT, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> IPv6 */
- ICE_PTT(117, IP, IPV6, NOF, IP_GRENAT, IPV6, FRG, NONE, PAY3),
- ICE_PTT(118, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, NONE, PAY3),
- ICE_PTT(119, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(120),
- ICE_PTT(121, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, TCP, PAY4),
- ICE_PTT(122, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(123, IP, IPV6, NOF, IP_GRENAT, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC */
- ICE_PTT(124, IP, IPV6, NOF, IP_GRENAT_MAC, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> MAC -> IPv4 */
- ICE_PTT(125, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, FRG, NONE, PAY3),
- ICE_PTT(126, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, NONE, PAY3),
- ICE_PTT(127, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(128),
- ICE_PTT(129, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, TCP, PAY4),
- ICE_PTT(130, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(131, IP, IPV6, NOF, IP_GRENAT_MAC, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC -> IPv6 */
- ICE_PTT(132, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, FRG, NONE, PAY3),
- ICE_PTT(133, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, NONE, PAY3),
- ICE_PTT(134, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(135),
- ICE_PTT(136, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, TCP, PAY4),
- ICE_PTT(137, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(138, IP, IPV6, NOF, IP_GRENAT_MAC, IPV6, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN */
- ICE_PTT(139, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, NONE, NOF, NONE, PAY3),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv4 */
- ICE_PTT(140, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, FRG, NONE, PAY3),
- ICE_PTT(141, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, NONE, PAY3),
- ICE_PTT(142, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(143),
- ICE_PTT(144, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, TCP, PAY4),
- ICE_PTT(145, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, SCTP, PAY4),
- ICE_PTT(146, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV4, NOF, ICMP, PAY4),
-
- /* IPv6 --> GRE/NAT -> MAC/VLAN --> IPv6 */
- ICE_PTT(147, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, FRG, NONE, PAY3),
- ICE_PTT(148, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, NONE, PAY3),
- ICE_PTT(149, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, UDP, PAY4),
- ICE_PTT_UNUSED_ENTRY(150),
- ICE_PTT(151, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, TCP, PAY4),
- ICE_PTT(152, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, SCTP, PAY4),
- ICE_PTT(153, IP, IPV6, NOF, IP_GRENAT_MAC_VLAN, IPV6, NOF, ICMP, PAY4),
-
- /* unused entries */
- [154 ... 1023] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 }
-};
-
-static inline struct ice_rx_ptype_decoded ice_decode_rx_desc_ptype(u16 ptype)
-{
- return ice_ptype_lkup[ptype];
-}
-
-
#endif /* _ICE_LAN_TX_RX_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index a39c2d9bdafe..4ae1972bc162 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -36,6 +36,7 @@ static const char ice_copyright[] = "Copyright (c) 2018, Intel Corporation.";

MODULE_AUTHOR("Intel Corporation, <[email protected]>");
MODULE_DESCRIPTION(DRV_SUMMARY);
+MODULE_IMPORT_NS(LIBIE);
MODULE_LICENSE("GPL v2");
MODULE_FIRMWARE(ICE_DDP_PKG_FILE);

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 7e06373e14d9..fab81fb93e1b 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -2,6 +2,7 @@
/* Copyright (c) 2019, Intel Corporation. */

#include <linux/filter.h>
+#include <linux/net/intel/libie/rx.h>

#include "ice_txrx_lib.h"
#include "ice_eswitch.h"
@@ -38,30 +39,6 @@ void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val)
}
}

-/**
- * ice_ptype_to_htype - get a hash type
- * @ptype: the ptype value from the descriptor
- *
- * Returns appropriate hash type (such as PKT_HASH_TYPE_L2/L3/L4) to be used by
- * skb_set_hash based on PTYPE as parsed by HW Rx pipeline and is part of
- * Rx desc.
- */
-static enum pkt_hash_types ice_ptype_to_htype(u16 ptype)
-{
- struct ice_rx_ptype_decoded decoded = ice_decode_rx_desc_ptype(ptype);
-
- if (!decoded.known)
- return PKT_HASH_TYPE_NONE;
- if (decoded.payload_layer == ICE_RX_PTYPE_PAYLOAD_LAYER_PAY4)
- return PKT_HASH_TYPE_L4;
- if (decoded.payload_layer == ICE_RX_PTYPE_PAYLOAD_LAYER_PAY3)
- return PKT_HASH_TYPE_L3;
- if (decoded.outer_ip == ICE_RX_PTYPE_OUTER_L2)
- return PKT_HASH_TYPE_L2;
-
- return PKT_HASH_TYPE_NONE;
-}
-
/**
* ice_rx_hash - set the hash value in the skb
* @rx_ring: descriptor ring
@@ -74,9 +51,11 @@ ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
struct sk_buff *skb, u16 rx_ptype)
{
struct ice_32b_rx_flex_desc_nic *nic_mdid;
+ struct libie_rx_ptype_parsed parsed;
u32 hash;

- if (!(rx_ring->netdev->features & NETIF_F_RXHASH))
+ parsed = libie_parse_rx_ptype(rx_ptype);
+ if (!libie_has_rx_hash(rx_ring->netdev, parsed))
return;

if (rx_desc->wb.rxdid != ICE_RXDID_FLEX_NIC)
@@ -84,7 +63,7 @@ ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,

nic_mdid = (struct ice_32b_rx_flex_desc_nic *)rx_desc;
hash = le32_to_cpu(nic_mdid->rss_hash);
- skb_set_hash(skb, hash, ice_ptype_to_htype(rx_ptype));
+ libie_skb_set_hash(skb, hash, parsed);
}

/**
@@ -92,7 +71,7 @@ ice_rx_hash(struct ice_rx_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
* @ring: the ring we care about
* @skb: skb currently being received and modified
* @rx_desc: the receive descriptor
- * @ptype: the packet type decoded by hardware
+ * @ptype: the packet type parsed by hardware
*
* skb->protocol must be set before this function is called
*/
@@ -100,34 +79,26 @@ static void
ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
union ice_32b_rx_flex_desc *rx_desc, u16 ptype)
{
- struct ice_rx_ptype_decoded decoded;
+ struct libie_rx_ptype_parsed parsed;
u16 rx_status0, rx_status1;
bool ipv4, ipv6;

- rx_status0 = le16_to_cpu(rx_desc->wb.status_error0);
- rx_status1 = le16_to_cpu(rx_desc->wb.status_error1);
-
- decoded = ice_decode_rx_desc_ptype(ptype);
-
/* Start with CHECKSUM_NONE and by default csum_level = 0 */
skb->ip_summed = CHECKSUM_NONE;
- skb_checksum_none_assert(skb);

- /* check if Rx checksum is enabled */
- if (!(ring->netdev->features & NETIF_F_RXCSUM))
+ parsed = libie_parse_rx_ptype(ptype);
+ if (!libie_has_rx_checksum(ring->netdev, parsed))
return;

- /* check if HW has decoded the packet and checksum */
- if (!(rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_L3L4P_S)))
- return;
+ rx_status0 = le16_to_cpu(rx_desc->wb.status_error0);
+ rx_status1 = le16_to_cpu(rx_desc->wb.status_error1);

- if (!(decoded.known && decoded.outer_ip))
+ /* check if HW has parsed the packet and checksum */
+ if (!(rx_status0 & BIT(ICE_RX_FLEX_DESC_STATUS0_L3L4P_S)))
return;

- ipv4 = (decoded.outer_ip == ICE_RX_PTYPE_OUTER_IP) &&
- (decoded.outer_ip_ver == ICE_RX_PTYPE_OUTER_IPV4);
- ipv6 = (decoded.outer_ip == ICE_RX_PTYPE_OUTER_IP) &&
- (decoded.outer_ip_ver == ICE_RX_PTYPE_OUTER_IPV6);
+ ipv4 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV4;
+ ipv6 = parsed.outer_ip == LIBIE_RX_PTYPE_OUTER_IPV6;

if (ipv4 && (rx_status0 & (BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_IPE_S) |
BIT(ICE_RX_FLEX_DESC_STATUS0_XSUM_EIPE_S))))
@@ -151,19 +122,10 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
* we need to bump the checksum level by 1 to reflect the fact that
* we are indicating we validated the inner checksum.
*/
- if (decoded.tunnel_type >= ICE_RX_PTYPE_TUNNEL_IP_GRENAT)
+ if (parsed.tunnel_type >= LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT)
skb->csum_level = 1;

- /* Only report checksum unnecessary for TCP, UDP, or SCTP */
- switch (decoded.inner_prot) {
- case ICE_RX_PTYPE_INNER_PROT_TCP:
- case ICE_RX_PTYPE_INNER_PROT_UDP:
- case ICE_RX_PTYPE_INNER_PROT_SCTP:
- skb->ip_summed = CHECKSUM_UNNECESSARY;
- break;
- default:
- break;
- }
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
return;

checksum_fail:
@@ -175,7 +137,7 @@ ice_rx_csum(struct ice_rx_ring *ring, struct sk_buff *skb,
* @rx_ring: Rx descriptor ring packet is being transacted on
* @rx_desc: pointer to the EOP Rx descriptor
* @skb: pointer to current skb being populated
- * @ptype: the packet type decoded by hardware
+ * @ptype: the packet type parsed by hardware
*
* This function checks the ring, descriptor, and packet information in
* order to populate the hash, checksum, VLAN, protocol, and
diff --git a/drivers/net/ethernet/intel/libie/Kconfig b/drivers/net/ethernet/intel/libie/Kconfig
new file mode 100644
index 000000000000..1eda4a5faa5a
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright(c) 2023 Intel Corporation.
+
+config LIBIE
+ tristate
+ help
+ libie (Intel Ethernet library) is a common library containing
+ routines shared between several Intel Ethernet drivers.
diff --git a/drivers/net/ethernet/intel/libie/Makefile b/drivers/net/ethernet/intel/libie/Makefile
new file mode 100644
index 000000000000..95e81d09b474
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+# Copyright(c) 2023 Intel Corporation.
+
+obj-$(CONFIG_LIBIE) += libie.o
+
+libie-objs += rx.o
diff --git a/drivers/net/ethernet/intel/libie/rx.c b/drivers/net/ethernet/intel/libie/rx.c
new file mode 100644
index 000000000000..f503476d8eef
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/rx.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Intel Corporation. */
+
+#include <linux/net/intel/libie/rx.h>
+
+/* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
+ * bitfield struct.
+ */
+
+#define LIBIE_RX_PTYPE(oip, ofrag, tun, tp, tefr, iprot, pl) { \
+ .outer_ip = LIBIE_RX_PTYPE_OUTER_##oip, \
+ .outer_frag = LIBIE_RX_PTYPE_##ofrag, \
+ .tunnel_type = LIBIE_RX_PTYPE_TUNNEL_IP_##tun, \
+ .tunnel_end_prot = LIBIE_RX_PTYPE_TUNNEL_END_##tp, \
+ .tunnel_end_frag = LIBIE_RX_PTYPE_##tefr, \
+ .inner_prot = LIBIE_RX_PTYPE_INNER_##iprot, \
+ .payload_layer = LIBIE_RX_PTYPE_PAYLOAD_##pl, \
+ }
+
+#define LIBIE_RX_PTYPE_UNUSED { }
+
+#define __LIBIE_RX_PTYPE_L2(iprot, pl) \
+ LIBIE_RX_PTYPE(L2, NOT_FRAG, NONE, NONE, NOT_FRAG, iprot, pl)
+#define LIBIE_RX_PTYPE_L2 __LIBIE_RX_PTYPE_L2(NONE, L2)
+#define LIBIE_RX_PTYPE_TS __LIBIE_RX_PTYPE_L2(TIMESYNC, L2)
+#define LIBIE_RX_PTYPE_L3 __LIBIE_RX_PTYPE_L2(NONE, L3)
+
+#define LIBIE_RX_PTYPE_IP_FRAG(oip) \
+ LIBIE_RX_PTYPE(IPV##oip, FRAG, NONE, NONE, NOT_FRAG, NONE, L3)
+#define LIBIE_RX_PTYPE_IP_L3(oip, tun, teprot, tefr) \
+ LIBIE_RX_PTYPE(IPV##oip, NOT_FRAG, tun, teprot, tefr, NONE, L3)
+#define LIBIE_RX_PTYPE_IP_L4(oip, tun, teprot, iprot) \
+ LIBIE_RX_PTYPE(IPV##oip, NOT_FRAG, tun, teprot, NOT_FRAG, iprot, L4)
+
+#define LIBIE_RX_PTYPE_IP_NOF(oip, tun, ver) \
+ LIBIE_RX_PTYPE_IP_L3(oip, tun, ver, NOT_FRAG), \
+ LIBIE_RX_PTYPE_IP_L4(oip, tun, ver, UDP), \
+ LIBIE_RX_PTYPE_UNUSED, \
+ LIBIE_RX_PTYPE_IP_L4(oip, tun, ver, TCP), \
+ LIBIE_RX_PTYPE_IP_L4(oip, tun, ver, SCTP), \
+ LIBIE_RX_PTYPE_IP_L4(oip, tun, ver, ICMP)
+
+/* IPv oip --> tun --> IPv ver */
+#define LIBIE_RX_PTYPE_IP_TUN_VER(oip, tun, ver) \
+ LIBIE_RX_PTYPE_IP_L3(oip, tun, ver, FRAG), \
+ LIBIE_RX_PTYPE_IP_NOF(oip, tun, ver)
+
+/* Non Tunneled IPv oip */
+#define LIBIE_RX_PTYPE_IP_RAW(oip) \
+ LIBIE_RX_PTYPE_IP_FRAG(oip), \
+ LIBIE_RX_PTYPE_IP_NOF(oip, NONE, NONE)
+
+/* IPv oip --> tun --> { IPv4, IPv6 } */
+#define LIBIE_RX_PTYPE_IP_TUN(oip, tun) \
+ LIBIE_RX_PTYPE_IP_TUN_VER(oip, tun, IPV4), \
+ LIBIE_RX_PTYPE_IP_TUN_VER(oip, tun, IPV6)
+
+/* IPv oip --> GRE/NAT tun --> { x, IPv4, IPv6 } */
+#define LIBIE_RX_PTYPE_IP_GRE(oip, tun) \
+ LIBIE_RX_PTYPE_IP_L3(oip, tun, NONE, NOT_FRAG), \
+ LIBIE_RX_PTYPE_IP_TUN(oip, tun)
+
+/* Non Tunneled IPv oip
+ * IPv oip --> { IPv4, IPv6 }
+ * IPv oip --> GRE/NAT --> { x, IPv4, IPv6 }
+ * IPv oip --> GRE/NAT --> MAC --> { x, IPv4, IPv6 }
+ * IPv oip --> GRE/NAT --> MAC/VLAN --> { x, IPv4, IPv6 }
+ */
+#define LIBIE_RX_PTYPE_IP(oip) \
+ LIBIE_RX_PTYPE_IP_RAW(oip), \
+ LIBIE_RX_PTYPE_IP_TUN(oip, IP), \
+ LIBIE_RX_PTYPE_IP_GRE(oip, GRENAT), \
+ LIBIE_RX_PTYPE_IP_GRE(oip, GRENAT_MAC), \
+ LIBIE_RX_PTYPE_IP_GRE(oip, GRENAT_MAC_VLAN)
+
+/* Lookup table mapping for O(1) parsing */
+const struct libie_rx_ptype_parsed libie_rx_ptype_lut[LIBIE_RX_PTYPE_NUM] = {
+ /* L2 packet types */
+ LIBIE_RX_PTYPE_UNUSED,
+ LIBIE_RX_PTYPE_L2,
+ LIBIE_RX_PTYPE_TS,
+ LIBIE_RX_PTYPE_L2,
+ LIBIE_RX_PTYPE_UNUSED,
+ LIBIE_RX_PTYPE_UNUSED,
+ LIBIE_RX_PTYPE_L2,
+ LIBIE_RX_PTYPE_L2,
+ LIBIE_RX_PTYPE_UNUSED,
+ LIBIE_RX_PTYPE_UNUSED,
+ LIBIE_RX_PTYPE_L2,
+ LIBIE_RX_PTYPE_UNUSED,
+
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+ LIBIE_RX_PTYPE_L3,
+
+ LIBIE_RX_PTYPE_IP(4),
+ LIBIE_RX_PTYPE_IP(6),
+};
+EXPORT_SYMBOL_NS_GPL(libie_rx_ptype_lut, LIBIE);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_DESCRIPTION("Intel(R) Ethernet common library");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/net/intel/libie/rx.h b/include/linux/net/intel/libie/rx.h
new file mode 100644
index 000000000000..55263930aa99
--- /dev/null
+++ b/include/linux/net/intel/libie/rx.h
@@ -0,0 +1,128 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2023 Intel Corporation. */
+
+#ifndef __LIBIE_RX_H
+#define __LIBIE_RX_H
+
+#include <linux/netdevice.h>
+
+/* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
+ * bitfield struct.
+ */
+
+struct libie_rx_ptype_parsed {
+ u16 outer_ip:2;
+ u16 outer_frag:1;
+ u16 tunnel_type:3;
+ u16 tunnel_end_prot:2;
+ u16 tunnel_end_frag:1;
+ u16 inner_prot:3;
+ u16 payload_layer:2;
+};
+
+enum libie_rx_ptype_outer_ip {
+ LIBIE_RX_PTYPE_OUTER_L2 = 0U,
+ LIBIE_RX_PTYPE_OUTER_IPV4,
+ LIBIE_RX_PTYPE_OUTER_IPV6,
+};
+
+enum libie_rx_ptype_outer_fragmented {
+ LIBIE_RX_PTYPE_NOT_FRAG = 0U,
+ LIBIE_RX_PTYPE_FRAG,
+};
+
+enum libie_rx_ptype_tunnel_type {
+ LIBIE_RX_PTYPE_TUNNEL_IP_NONE = 0U,
+ LIBIE_RX_PTYPE_TUNNEL_IP_IP,
+ LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT,
+ LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT_MAC,
+ LIBIE_RX_PTYPE_TUNNEL_IP_GRENAT_MAC_VLAN,
+};
+
+enum libie_rx_ptype_tunnel_end_prot {
+ LIBIE_RX_PTYPE_TUNNEL_END_NONE = 0U,
+ LIBIE_RX_PTYPE_TUNNEL_END_IPV4,
+ LIBIE_RX_PTYPE_TUNNEL_END_IPV6,
+};
+
+enum libie_rx_ptype_inner_prot {
+ LIBIE_RX_PTYPE_INNER_NONE = 0U,
+ LIBIE_RX_PTYPE_INNER_UDP,
+ LIBIE_RX_PTYPE_INNER_TCP,
+ LIBIE_RX_PTYPE_INNER_SCTP,
+ LIBIE_RX_PTYPE_INNER_ICMP,
+ LIBIE_RX_PTYPE_INNER_TIMESYNC,
+};
+
+enum libie_rx_ptype_payload_layer {
+ LIBIE_RX_PTYPE_PAYLOAD_NONE = PKT_HASH_TYPE_NONE,
+ LIBIE_RX_PTYPE_PAYLOAD_L2 = PKT_HASH_TYPE_L2,
+ LIBIE_RX_PTYPE_PAYLOAD_L3 = PKT_HASH_TYPE_L3,
+ LIBIE_RX_PTYPE_PAYLOAD_L4 = PKT_HASH_TYPE_L4,
+};
+
+#define LIBIE_RX_PTYPE_NUM 154
+
+extern const struct libie_rx_ptype_parsed
+libie_rx_ptype_lut[LIBIE_RX_PTYPE_NUM];
+
+/**
+ * libie_parse_rx_ptype - convert HW packet type to software bitfield structure
+ * @ptype: 10-bit hardware packet type value from the descriptor
+ *
+ * ```libie_rx_ptype_lut``` must be accessed only using this wrapper.
+ *
+ * Return: parsed bitfield struct corresponding to the provided ptype.
+ */
+static inline struct libie_rx_ptype_parsed libie_parse_rx_ptype(u32 ptype)
+{
+ if (unlikely(ptype >= LIBIE_RX_PTYPE_NUM))
+ ptype = 0;
+
+ return libie_rx_ptype_lut[ptype];
+}
+
+/* libie_has_*() can be used to quickly check whether the HW metadata is
+ * available to avoid further expensive processing such as descriptor reads.
+ * They already check for the corresponding netdev feature to be enabled,
+ * thus can be used as drop-in replacements.
+ */
+
+static inline bool libie_has_rx_checksum(const struct net_device *dev,
+ struct libie_rx_ptype_parsed parsed)
+{
+ /* _INNER_{SCTP,TCP,UDP} are possible only when _OUTER_IPV* is set,
+ * it is enough to check only for the L4 type.
+ */
+ switch (parsed.inner_prot) {
+ case LIBIE_RX_PTYPE_INNER_TCP:
+ case LIBIE_RX_PTYPE_INNER_UDP:
+ case LIBIE_RX_PTYPE_INNER_SCTP:
+ return dev->features & NETIF_F_RXCSUM;
+ default:
+ return false;
+ }
+}
+
+static inline bool libie_has_rx_hash(const struct net_device *dev,
+ struct libie_rx_ptype_parsed parsed)
+{
+ if (parsed.payload_layer < LIBIE_RX_PTYPE_PAYLOAD_L2)
+ return false;
+
+ return dev->features & NETIF_F_RXHASH;
+}
+
+/**
+ * libie_skb_set_hash - fill in skb hash value basing on the parsed ptype
+ * @skb: skb to fill the hash in
+ * @hash: 32-bit hash value from the descriptor
+ * @parsed: parsed packet type
+ */
+static inline void libie_skb_set_hash(struct sk_buff *skb, u32 hash,
+ struct libie_rx_ptype_parsed parsed)
+{
+ skb_set_hash(skb, hash, parsed.payload_layer);
+}
+
+#endif /* __LIBIE_RX_H */
--
2.43.0

2023-12-07 17:23:14

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 04/12] iavf: kill "legacy-rx" for good

Ever since build_skb() became stable, the old way with allocating an skb
for storing the headers separately, which will be then copied manually,
was slower, less flexible, and thus obsolete.

* It had higher pressure on MM since it actually allocates new pages,
which then get split and refcount-biased (NAPI page cache);
* It implies memcpy() of packet headers (40+ bytes per each frame);
* the actual header length was calculated via eth_get_headlen(), which
invokes Flow Dissector and thus wastes a bunch of CPU cycles;
* XDP makes it even more weird since it requires headroom for long and
also tailroom for some time (since mbuf landed). Take a look at the
ice driver, which is built around work-arounds to make XDP work with
it.

Even on some quite low-end hardware (not a common case for 100G NICs) it
was performing worse.
The only advantage "legacy-rx" had is that it didn't require any
reserved headroom and tailroom. But iavf didn't use this, as it always
splits pages into two halves of 2k, while that save would only be useful
when striding. And again, XDP effectively removes that sole pro.

There's a train of features to land in IAVF soon: Page Pool, XDP, XSk,
multi-buffer etc. Each new would require adding more and more Danse
Macabre for absolutely no reason, besides making hotpath less and less
effective.
Remove the "feature" with all the related code. This includes at least
one very hot branch (typically hit on each new frame), which was either
always-true or always-false at least for a complete NAPI bulk of 64
frames, the whole private flags cruft, and so on. Some stats:

Function: add/remove: 0/4 grow/shrink: 0/7 up/down: 0/-721 (-721)
RO Data: add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-40 (-40)

Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf.h | 2 +-
.../net/ethernet/intel/iavf/iavf_ethtool.c | 141 ------------------
drivers/net/ethernet/intel/iavf/iavf_main.c | 10 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 84 +----------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 27 +---
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 3 +-
6 files changed, 8 insertions(+), 259 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf.h b/drivers/net/ethernet/intel/iavf/iavf.h
index e7ab89dc883a..64c94e9a6b2d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf.h
+++ b/drivers/net/ethernet/intel/iavf/iavf.h
@@ -287,7 +287,7 @@ struct iavf_adapter {
#define IAVF_FLAG_RESET_PENDING BIT(4)
#define IAVF_FLAG_RESET_NEEDED BIT(5)
#define IAVF_FLAG_WB_ON_ITR_CAPABLE BIT(6)
-#define IAVF_FLAG_LEGACY_RX BIT(15)
+/* BIT(15) is free, was IAVF_FLAG_LEGACY_RX */
#define IAVF_FLAG_REINIT_ITR_NEEDED BIT(16)
#define IAVF_FLAG_QUEUES_DISABLED BIT(17)
#define IAVF_FLAG_SETUP_NETDEV_FEATURES BIT(18)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 6b3d3e54b8b7..964e66b4869c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -239,29 +239,6 @@ static const struct iavf_stats iavf_gstrings_stats[] = {

#define IAVF_QUEUE_STATS_LEN ARRAY_SIZE(iavf_gstrings_queue_stats)

-/* For now we have one and only one private flag and it is only defined
- * when we have support for the SKIP_CPU_SYNC DMA attribute. Instead
- * of leaving all this code sitting around empty we will strip it unless
- * our one private flag is actually available.
- */
-struct iavf_priv_flags {
- char flag_string[ETH_GSTRING_LEN];
- u32 flag;
- bool read_only;
-};
-
-#define IAVF_PRIV_FLAG(_name, _flag, _read_only) { \
- .flag_string = _name, \
- .flag = _flag, \
- .read_only = _read_only, \
-}
-
-static const struct iavf_priv_flags iavf_gstrings_priv_flags[] = {
- IAVF_PRIV_FLAG("legacy-rx", IAVF_FLAG_LEGACY_RX, 0),
-};
-
-#define IAVF_PRIV_FLAGS_STR_LEN ARRAY_SIZE(iavf_gstrings_priv_flags)
-
/**
* iavf_get_link_ksettings - Get Link Speed and Duplex settings
* @netdev: network interface device structure
@@ -341,8 +318,6 @@ static int iavf_get_sset_count(struct net_device *netdev, int sset)
return IAVF_STATS_LEN +
(IAVF_QUEUE_STATS_LEN * 2 *
netdev->real_num_tx_queues);
- else if (sset == ETH_SS_PRIV_FLAGS)
- return IAVF_PRIV_FLAGS_STR_LEN;
else
return -EINVAL;
}
@@ -384,22 +359,6 @@ static void iavf_get_ethtool_stats(struct net_device *netdev,
rcu_read_unlock();
}

-/**
- * iavf_get_priv_flag_strings - Get private flag strings
- * @netdev: network interface device structure
- * @data: buffer for string data
- *
- * Builds the private flags string table
- **/
-static void iavf_get_priv_flag_strings(struct net_device *netdev, u8 *data)
-{
- unsigned int i;
-
- for (i = 0; i < IAVF_PRIV_FLAGS_STR_LEN; i++)
- ethtool_sprintf(&data, "%s",
- iavf_gstrings_priv_flags[i].flag_string);
-}
-
/**
* iavf_get_stat_strings - Get stat strings
* @netdev: network interface device structure
@@ -438,108 +397,11 @@ static void iavf_get_strings(struct net_device *netdev, u32 sset, u8 *data)
case ETH_SS_STATS:
iavf_get_stat_strings(netdev, data);
break;
- case ETH_SS_PRIV_FLAGS:
- iavf_get_priv_flag_strings(netdev, data);
- break;
default:
break;
}
}

-/**
- * iavf_get_priv_flags - report device private flags
- * @netdev: network interface device structure
- *
- * The get string set count and the string set should be matched for each
- * flag returned. Add new strings for each flag to the iavf_gstrings_priv_flags
- * array.
- *
- * Returns a u32 bitmap of flags.
- **/
-static u32 iavf_get_priv_flags(struct net_device *netdev)
-{
- struct iavf_adapter *adapter = netdev_priv(netdev);
- u32 i, ret_flags = 0;
-
- for (i = 0; i < IAVF_PRIV_FLAGS_STR_LEN; i++) {
- const struct iavf_priv_flags *priv_flags;
-
- priv_flags = &iavf_gstrings_priv_flags[i];
-
- if (priv_flags->flag & adapter->flags)
- ret_flags |= BIT(i);
- }
-
- return ret_flags;
-}
-
-/**
- * iavf_set_priv_flags - set private flags
- * @netdev: network interface device structure
- * @flags: bit flags to be set
- **/
-static int iavf_set_priv_flags(struct net_device *netdev, u32 flags)
-{
- struct iavf_adapter *adapter = netdev_priv(netdev);
- u32 orig_flags, new_flags, changed_flags;
- int ret = 0;
- u32 i;
-
- orig_flags = READ_ONCE(adapter->flags);
- new_flags = orig_flags;
-
- for (i = 0; i < IAVF_PRIV_FLAGS_STR_LEN; i++) {
- const struct iavf_priv_flags *priv_flags;
-
- priv_flags = &iavf_gstrings_priv_flags[i];
-
- if (flags & BIT(i))
- new_flags |= priv_flags->flag;
- else
- new_flags &= ~(priv_flags->flag);
-
- if (priv_flags->read_only &&
- ((orig_flags ^ new_flags) & ~BIT(i)))
- return -EOPNOTSUPP;
- }
-
- /* Before we finalize any flag changes, any checks which we need to
- * perform to determine if the new flags will be supported should go
- * here...
- */
-
- /* Compare and exchange the new flags into place. If we failed, that
- * is if cmpxchg returns anything but the old value, this means
- * something else must have modified the flags variable since we
- * copied it. We'll just punt with an error and log something in the
- * message buffer.
- */
- if (cmpxchg(&adapter->flags, orig_flags, new_flags) != orig_flags) {
- dev_warn(&adapter->pdev->dev,
- "Unable to update adapter->flags as it was modified by another thread...\n");
- return -EAGAIN;
- }
-
- changed_flags = orig_flags ^ new_flags;
-
- /* Process any additional changes needed as a result of flag changes.
- * The changed_flags value reflects the list of bits that were changed
- * in the code above.
- */
-
- /* issue a reset to force legacy-rx change to take effect */
- if (changed_flags & IAVF_FLAG_LEGACY_RX) {
- if (netif_running(netdev)) {
- iavf_schedule_reset(adapter, IAVF_FLAG_RESET_NEEDED);
- ret = iavf_wait_for_reset(adapter);
- if (ret)
- netdev_warn(netdev, "Changing private flags timeout or interrupted waiting for reset");
- }
- }
-
- return ret;
-}
-
/**
* iavf_get_msglevel - Get debug message level
* @netdev: network interface device structure
@@ -585,7 +447,6 @@ static void iavf_get_drvinfo(struct net_device *netdev,
strscpy(drvinfo->driver, iavf_driver_name, 32);
strscpy(drvinfo->fw_version, "N/A", 4);
strscpy(drvinfo->bus_info, pci_name(adapter->pdev), 32);
- drvinfo->n_priv_flags = IAVF_PRIV_FLAGS_STR_LEN;
}

/**
@@ -1973,8 +1834,6 @@ static const struct ethtool_ops iavf_ethtool_ops = {
.get_strings = iavf_get_strings,
.get_ethtool_stats = iavf_get_ethtool_stats,
.get_sset_count = iavf_get_sset_count,
- .get_priv_flags = iavf_get_priv_flags,
- .set_priv_flags = iavf_set_priv_flags,
.get_msglevel = iavf_get_msglevel,
.set_msglevel = iavf_set_msglevel,
.get_coalesce = iavf_get_coalesce,
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 091ad35d181d..f81144466496 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -740,9 +740,7 @@ static void iavf_configure_rx(struct iavf_adapter *adapter)
struct iavf_hw *hw = &adapter->hw;
int i;

- /* Legacy Rx will always default to a 2048 buffer size. */
-#if (PAGE_SIZE < 8192)
- if (!(adapter->flags & IAVF_FLAG_LEGACY_RX)) {
+ if (PAGE_SIZE < 8192) {
struct net_device *netdev = adapter->netdev;

/* For jumbo frames on systems with 4K pages we have to use
@@ -759,16 +757,10 @@ static void iavf_configure_rx(struct iavf_adapter *adapter)
(netdev->mtu <= ETH_DATA_LEN))
rx_buf_len = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;
}
-#endif

for (i = 0; i < adapter->num_active_queues; i++) {
adapter->rx_rings[i].tail = hw->hw_addr + IAVF_QRX_TAIL1(i);
adapter->rx_rings[i].rx_buf_len = rx_buf_len;
-
- if (adapter->flags & IAVF_FLAG_LEGACY_RX)
- clear_ring_build_skb_enabled(&adapter->rx_rings[i]);
- else
- set_ring_build_skb_enabled(&adapter->rx_rings[i]);
}
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index c610013bdcb0..27cea26cc53e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -823,17 +823,6 @@ static void iavf_release_rx_desc(struct iavf_ring *rx_ring, u32 val)
writel(val, rx_ring->tail);
}

-/**
- * iavf_rx_offset - Return expected offset into page to access data
- * @rx_ring: Ring we are requesting offset of
- *
- * Returns the offset value for ring into the data buffer.
- */
-static unsigned int iavf_rx_offset(struct iavf_ring *rx_ring)
-{
- return ring_uses_build_skb(rx_ring) ? IAVF_SKB_PAD : 0;
-}
-
/**
* iavf_alloc_mapped_page - recycle or make a new page
* @rx_ring: ring to use
@@ -878,7 +867,7 @@ static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,

bi->dma = dma;
bi->page = page;
- bi->page_offset = iavf_rx_offset(rx_ring);
+ bi->page_offset = IAVF_SKB_PAD;

/* initialize pagecnt_bias to 1 representing we fully own page */
bi->pagecnt_bias = 1;
@@ -1219,7 +1208,7 @@ static void iavf_add_rx_frag(struct iavf_ring *rx_ring,
#if (PAGE_SIZE < 8192)
unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
#else
- unsigned int truesize = SKB_DATA_ALIGN(size + iavf_rx_offset(rx_ring));
+ unsigned int truesize = SKB_DATA_ALIGN(size + IAVF_SKB_PAD);
#endif

if (!size)
@@ -1267,71 +1256,6 @@ static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
return rx_buffer;
}

-/**
- * iavf_construct_skb - Allocate skb and populate it
- * @rx_ring: rx descriptor ring to transact packets on
- * @rx_buffer: rx buffer to pull data from
- * @size: size of buffer to add to skb
- *
- * This function allocates an skb. It then populates it with the page
- * data from the current receive descriptor, taking care to set up the
- * skb correctly.
- */
-static struct sk_buff *iavf_construct_skb(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer,
- unsigned int size)
-{
- void *va;
-#if (PAGE_SIZE < 8192)
- unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
-#else
- unsigned int truesize = SKB_DATA_ALIGN(size);
-#endif
- unsigned int headlen;
- struct sk_buff *skb;
-
- if (!rx_buffer)
- return NULL;
- /* prefetch first cache line of first page */
- va = page_address(rx_buffer->page) + rx_buffer->page_offset;
- net_prefetch(va);
-
- /* allocate a skb to store the frags */
- skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
- IAVF_RX_HDR_SIZE,
- GFP_ATOMIC | __GFP_NOWARN);
- if (unlikely(!skb))
- return NULL;
-
- /* Determine available headroom for copy */
- headlen = size;
- if (headlen > IAVF_RX_HDR_SIZE)
- headlen = eth_get_headlen(skb->dev, va, IAVF_RX_HDR_SIZE);
-
- /* align pull length to size of long to optimize memcpy performance */
- memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
-
- /* update all of the pointers */
- size -= headlen;
- if (size) {
- skb_add_rx_frag(skb, 0, rx_buffer->page,
- rx_buffer->page_offset + headlen,
- size, truesize);
-
- /* buffer is used by skb, update page_offset */
-#if (PAGE_SIZE < 8192)
- rx_buffer->page_offset ^= truesize;
-#else
- rx_buffer->page_offset += truesize;
-#endif
- } else {
- /* buffer is unused, reset bias back to rx_buffer */
- rx_buffer->pagecnt_bias++;
- }
-
- return skb;
-}
-
/**
* iavf_build_skb - Build skb around an existing buffer
* @rx_ring: Rx descriptor ring to transact packets on
@@ -1504,10 +1428,8 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
/* retrieve a buffer from the ring */
if (skb)
iavf_add_rx_frag(rx_ring, rx_buffer, skb, size);
- else if (ring_uses_build_skb(rx_ring))
- skb = iavf_build_skb(rx_ring, rx_buffer, size);
else
- skb = iavf_construct_skb(rx_ring, rx_buffer, size);
+ skb = iavf_build_skb(rx_ring, rx_buffer, size);

/* exit if we failed to retrieve a buffer */
if (!skb) {
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 7e6ee32d19b6..4b412f7662e4 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -82,20 +82,11 @@ enum iavf_dyn_idx_t {
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))

/* Supported Rx Buffer Sizes (a multiple of 128) */
-#define IAVF_RXBUFFER_256 256
#define IAVF_RXBUFFER_1536 1536 /* 128B aligned standard Ethernet frame */
#define IAVF_RXBUFFER_2048 2048
#define IAVF_RXBUFFER_3072 3072 /* Used for large frames w/ padding */
#define IAVF_MAX_RXBUFFER 9728 /* largest size for single descriptor */

-/* NOTE: netdev_alloc_skb reserves up to 64 bytes, NET_IP_ALIGN means we
- * reserve 2 more, and skb_shared_info adds an additional 384 bytes more,
- * this adds up to 512 bytes of extra data meaning the smallest allocation
- * we could have is 1K.
- * i.e. RXBUFFER_256 --> 960 byte skb (size-1024 slab)
- * i.e. RXBUFFER_512 --> 1216 byte skb (size-2048 slab)
- */
-#define IAVF_RX_HDR_SIZE IAVF_RXBUFFER_256
#define IAVF_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
#define iavf_rx_desc iavf_32byte_rx_desc

@@ -362,7 +353,8 @@ struct iavf_ring {

u16 flags;
#define IAVF_TXR_FLAGS_WB_ON_ITR BIT(0)
-#define IAVF_RXR_FLAGS_BUILD_SKB_ENABLED BIT(1)
+/* BIT(1) is free, was IAVF_RXR_FLAGS_BUILD_SKB_ENABLED */
+/* BIT(2) is free */
#define IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1 BIT(3)
#define IAVF_TXR_FLAGS_VLAN_TAG_LOC_L2TAG2 BIT(4)
#define IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2 BIT(5)
@@ -393,21 +385,6 @@ struct iavf_ring {
*/
} ____cacheline_internodealigned_in_smp;

-static inline bool ring_uses_build_skb(struct iavf_ring *ring)
-{
- return !!(ring->flags & IAVF_RXR_FLAGS_BUILD_SKB_ENABLED);
-}
-
-static inline void set_ring_build_skb_enabled(struct iavf_ring *ring)
-{
- ring->flags |= IAVF_RXR_FLAGS_BUILD_SKB_ENABLED;
-}
-
-static inline void clear_ring_build_skb_enabled(struct iavf_ring *ring)
-{
- ring->flags &= ~IAVF_RXR_FLAGS_BUILD_SKB_ENABLED;
-}
-
#define IAVF_ITR_ADAPTIVE_MIN_INC 0x0002
#define IAVF_ITR_ADAPTIVE_MIN_USECS 0x0002
#define IAVF_ITR_ADAPTIVE_MAX_USECS 0x007e
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 64c4443dbef9..37d0e4313130 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -289,8 +289,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
return;

/* Limit maximum frame size when jumbo frames is not enabled */
- if (!(adapter->flags & IAVF_FLAG_LEGACY_RX) &&
- (adapter->netdev->mtu <= ETH_DATA_LEN))
+ if (adapter->netdev->mtu <= ETH_DATA_LEN)
max_frame = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;

vqci->vsi_id = adapter->vsi_res->vsi_id;
--
2.43.0

2023-12-07 17:23:16

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 08/12] libie: add Rx buffer management (via Page Pool)

Add a couple intuitive helpers to hide Rx buffer implementation details
in the library and not multiplicate it between drivers. The settings are
optimized for Intel hardware, but nothing really HW-specific here.
Use the new page_pool_dev_alloc() to dynamically switch between
split-page and full-page modes depending on MTU, page size, required
headroom etc. For example, on x86_64 with the default driver settings
each page is shared between 2 buffers. Turning on XDP (not in this
series) -> increasing headroom requirement pushes truesize out of 2048
boundary, leading to that each buffer starts getting a full page.
The "ceiling" limit is %PAGE_SIZE, as only order-0 pages are used to
avoid compound overhead. For the above architecture, this means maximum
linear frame size of 3712 w/o XDP.
Not that &libie_buf_queue is not a complete queue/ring structure for
now, rather a shim, but eventually the libie-enabled drivers will move
to it, with iavf being the first one.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/libie/Kconfig | 1 +
drivers/net/ethernet/intel/libie/rx.c | 69 ++++++++++++
include/linux/net/intel/libie/rx.h | 133 ++++++++++++++++++++++-
3 files changed, 202 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/libie/Kconfig b/drivers/net/ethernet/intel/libie/Kconfig
index 1eda4a5faa5a..6e0162fb94d2 100644
--- a/drivers/net/ethernet/intel/libie/Kconfig
+++ b/drivers/net/ethernet/intel/libie/Kconfig
@@ -3,6 +3,7 @@

config LIBIE
tristate
+ select PAGE_POOL
help
libie (Intel Ethernet library) is a common library containing
routines shared between several Intel Ethernet drivers.
diff --git a/drivers/net/ethernet/intel/libie/rx.c b/drivers/net/ethernet/intel/libie/rx.c
index f503476d8eef..359867714a1b 100644
--- a/drivers/net/ethernet/intel/libie/rx.c
+++ b/drivers/net/ethernet/intel/libie/rx.c
@@ -3,6 +3,75 @@

#include <linux/net/intel/libie/rx.h>

+/* Rx buffer management */
+
+/**
+ * libie_rx_hw_len - get the actual buffer size to be passed to HW
+ * @pp: &page_pool_params of the netdev to calculate the size for
+ *
+ * Return: HW-writeable length per one buffer to pass it to the HW accounting:
+ * MTU the @dev has, HW required alignment, minimum and maximum allowed values,
+ * and system's page size.
+ */
+static u32 libie_rx_hw_len(const struct page_pool_params *pp)
+{
+ u32 len;
+
+ len = READ_ONCE(pp->netdev->mtu) + LIBIE_RX_LL_LEN;
+ len = ALIGN(len, LIBIE_RX_BUF_LEN_ALIGN);
+ len = clamp(len, LIBIE_MIN_RX_BUF_LEN, pp->max_len);
+
+ return len;
+}
+
+/**
+ * libie_rx_page_pool_create - create a PP with the default libie settings
+ * @bq: buffer queue struct to fill
+ * @napi: &napi_struct covering this PP (no usage outside its poll loops)
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+int libie_rx_page_pool_create(struct libie_buf_queue *bq,
+ struct napi_struct *napi)
+{
+ struct page_pool_params pp = {
+ .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
+ .order = LIBIE_RX_PAGE_ORDER,
+ .pool_size = bq->count,
+ .nid = NUMA_NO_NODE,
+ .dev = napi->dev->dev.parent,
+ .netdev = napi->dev,
+ .napi = napi,
+ .dma_dir = DMA_FROM_DEVICE,
+ .offset = LIBIE_SKB_HEADROOM,
+ };
+
+ /* HW-writeable / syncable length per one page */
+ pp.max_len = LIBIE_RX_BUF_LEN(pp.offset);
+
+ /* HW-writeable length per buffer */
+ bq->rx_buf_len = libie_rx_hw_len(&pp);
+ /* Buffer size to allocate */
+ bq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset +
+ bq->rx_buf_len));
+
+ bq->pp = page_pool_create(&pp);
+
+ return PTR_ERR_OR_ZERO(bq->pp);
+}
+EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_create, LIBIE);
+
+/**
+ * libie_rx_page_pool_destroy - destroy a &page_pool created by libie
+ * @bq: buffer queue to process
+ */
+void libie_rx_page_pool_destroy(struct libie_buf_queue *bq)
+{
+ page_pool_destroy(bq->pp);
+ bq->pp = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_destroy, LIBIE);
+
/* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
* bitfield struct.
*/
diff --git a/include/linux/net/intel/libie/rx.h b/include/linux/net/intel/libie/rx.h
index 55263930aa99..71bc9a1a9856 100644
--- a/include/linux/net/intel/libie/rx.h
+++ b/include/linux/net/intel/libie/rx.h
@@ -4,7 +4,138 @@
#ifndef __LIBIE_RX_H
#define __LIBIE_RX_H

-#include <linux/netdevice.h>
+#include <linux/if_vlan.h>
+#include <net/page_pool/helpers.h>
+
+/* Rx MTU/buffer/truesize helpers. Mostly pure software-side; HW-defined values
+ * are valid for all Intel HW.
+ */
+
+/* Space reserved in front of each frame */
+#define LIBIE_SKB_HEADROOM (NET_SKB_PAD + NET_IP_ALIGN)
+/* Maximum headroom to calculate max MTU below */
+#define LIBIE_MAX_HEADROOM LIBIE_SKB_HEADROOM
+/* Link layer / L2 overhead: Ethernet, 2 VLAN tags (C + S), FCS */
+#define LIBIE_RX_LL_LEN (ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN)
+
+/* Always use order-0 pages */
+#define LIBIE_RX_PAGE_ORDER 0
+/* Rx buffer size config is a multiple of 128 */
+#define LIBIE_RX_BUF_LEN_ALIGN 128
+/* HW-writeable space in one buffer: truesize - headroom/tailroom,
+ * HW-aligned
+ */
+#define __LIBIE_RX_BUF_LEN(hr) \
+ ALIGN_DOWN(SKB_MAX_ORDER(hr, LIBIE_RX_PAGE_ORDER), \
+ LIBIE_RX_BUF_LEN_ALIGN)
+/* The smallest and largest size for a single descriptor as per HW */
+#define LIBIE_MIN_RX_BUF_LEN 1024U
+#define LIBIE_MAX_RX_BUF_LEN 9728U
+/* "True" HW-writeable space: minimum from SW and HW values */
+#define LIBIE_RX_BUF_LEN(hr) min_t(u32, __LIBIE_RX_BUF_LEN(hr), \
+ LIBIE_MAX_RX_BUF_LEN)
+
+/* The maximum frame size as per HW (S/G) */
+#define __LIBIE_MAX_RX_FRM_LEN 16382U
+/* ATST, HW can chain up to 5 Rx descriptors */
+#define LIBIE_MAX_RX_FRM_LEN(hr) \
+ min_t(u32, __LIBIE_MAX_RX_FRM_LEN, LIBIE_RX_BUF_LEN(hr) * 5)
+/* Maximum frame size minus LL overhead */
+#define LIBIE_MAX_MTU \
+ (LIBIE_MAX_RX_FRM_LEN(LIBIE_MAX_HEADROOM) - LIBIE_RX_LL_LEN)
+
+/* Rx buffer management */
+
+/**
+ * struct libie_rx_buffer - structure representing an Rx buffer
+ * @page: page holding the buffer
+ * @offset: offset from the page start (to the headroom)
+ * @truesize: total space occupied by the buffer (w/ headroom and tailroom)
+ *
+ * Depending on the MTU, API switches between one-page-per-frame and shared
+ * page model (to conserve memory on bigger-page platforms). In case of the
+ * former, @offset is always 0 and @truesize is always ```PAGE_SIZE```.
+ */
+struct libie_rx_buffer {
+ struct page *page;
+ u32 offset;
+ u32 truesize;
+};
+
+/**
+ * struct libie_buf_queue - structure representing a buffer queue
+ * @pp: &page_pool for buffer management
+ * @rx_bi: array of Rx buffers
+ * @truesize: size to allocate per buffer, w/overhead
+ * @count: number of descriptors/buffers the queue has
+ * @rx_buf_len: HW-writeable length per each buffer
+ */
+struct libie_buf_queue {
+ struct page_pool *pp;
+ struct libie_rx_buffer *rx_bi;
+
+ u32 truesize;
+ u32 count;
+
+ /* Cold fields */
+ u32 rx_buf_len;
+};
+
+int libie_rx_page_pool_create(struct libie_buf_queue *bq,
+ struct napi_struct *napi);
+void libie_rx_page_pool_destroy(struct libie_buf_queue *bq);
+
+/**
+ * libie_rx_alloc - allocate a new Rx buffer
+ * @bq: buffer queue to allocate for
+ * @i: index of the buffer within the queue
+ *
+ * Return: DMA address to be passed to HW for Rx on successful allocation,
+ * ```DMA_MAPPING_ERROR``` otherwise.
+ */
+static inline dma_addr_t libie_rx_alloc(const struct libie_buf_queue *bq,
+ u32 i)
+{
+ struct libie_rx_buffer *buf = &bq->rx_bi[i];
+
+ buf->truesize = bq->truesize;
+ buf->page = page_pool_dev_alloc(bq->pp, &buf->offset, &buf->truesize);
+ if (unlikely(!buf->page))
+ return DMA_MAPPING_ERROR;
+
+ return page_pool_get_dma_addr(buf->page) + buf->offset +
+ bq->pp->p.offset;
+}
+
+/**
+ * libie_rx_sync_for_cpu - synchronize or recycle buffer post DMA
+ * @buf: buffer to process
+ * @len: frame length from the descriptor
+ *
+ * Process the buffer after it's written by HW. The regular path is to
+ * synchronize DMA for CPU, but in case of no data it will be immediately
+ * recycled back to its PP.
+ *
+ * Return: true when there's data to process, false otherwise.
+ */
+static inline bool libie_rx_sync_for_cpu(const struct libie_rx_buffer *buf,
+ u32 len)
+{
+ struct page *page = buf->page;
+
+ /* Very rare, but possible case. The most common reason:
+ * the last fragment contained FCS only, which was then
+ * stripped by the HW.
+ */
+ if (unlikely(!len)) {
+ page_pool_recycle_direct(page->pp, page);
+ return false;
+ }
+
+ page_pool_dma_sync_for_cpu(page->pp, page, buf->offset, len);
+
+ return true;
+}

/* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
* bitfield struct.
--
2.43.0

2023-12-07 17:23:18

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 09/12] iavf: pack iavf_ring more efficiently

Before replacing the Rx buffer management with libie, clean up
&iavf_ring a bit.
There are several fields not used anywhere in the code -- simply remove
them. Move ::tail up to remove a hole. Replace ::arm_wb boolean with
1-bit flag in ::flags to free 1 more byte. Finally, move ::prev_pkt_ctr
out of &iavf_tx_queue_stats -- it doesn't belong there (used for Tx
stall detection). Place it next to the stats on the ring itself to fill
the 4-byte slot.
The result: no holes and all the hot fields fit into the first 64-byte
cacheline.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 12 +++++------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 22 +++------------------
2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 665ee1feb877..62f976d322ab 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -184,7 +184,7 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* pending work.
*/
packets = tx_ring->stats.packets & INT_MAX;
- if (tx_ring->tx_stats.prev_pkt_ctr == packets) {
+ if (tx_ring->prev_pkt_ctr == packets) {
iavf_force_wb(vsi, tx_ring->q_vector);
continue;
}
@@ -193,7 +193,7 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* to iavf_get_tx_pending()
*/
smp_rmb();
- tx_ring->tx_stats.prev_pkt_ctr =
+ tx_ring->prev_pkt_ctr =
iavf_get_tx_pending(tx_ring, true) ? packets : -1;
}
}
@@ -319,7 +319,7 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
((j / WB_STRIDE) == 0) && (j > 0) &&
!test_bit(__IAVF_VSI_DOWN, vsi->state) &&
(IAVF_DESC_UNUSED(tx_ring) != tx_ring->count))
- tx_ring->arm_wb = true;
+ tx_ring->flags |= IAVF_TXR_FLAGS_ARM_WB;
}

/* notify netdev of completed buffers */
@@ -674,7 +674,7 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)

tx_ring->next_to_use = 0;
tx_ring->next_to_clean = 0;
- tx_ring->tx_stats.prev_pkt_ctr = -1;
+ tx_ring->prev_pkt_ctr = -1;
return 0;

err:
@@ -1494,8 +1494,8 @@ int iavf_napi_poll(struct napi_struct *napi, int budget)
clean_complete = false;
continue;
}
- arm_wb |= ring->arm_wb;
- ring->arm_wb = false;
+ arm_wb |= !!(ring->flags & IAVF_TXR_FLAGS_ARM_WB);
+ ring->flags &= ~IAVF_TXR_FLAGS_ARM_WB;
}

/* Handle case where we are called by netpoll with a budget of 0 */
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 720bca0e6716..9b8154c5f4fb 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -228,7 +228,6 @@ struct iavf_tx_queue_stats {
u64 tx_done_old;
u64 tx_linearize;
u64 tx_force_wb;
- int prev_pkt_ctr;
u64 tx_lost_interrupt;
};

@@ -238,12 +237,6 @@ struct iavf_rx_queue_stats {
u64 alloc_buff_failed;
};

-enum iavf_ring_state_t {
- __IAVF_TX_FDIR_INIT_DONE,
- __IAVF_TX_XPS_INIT_DONE,
- __IAVF_RING_STATE_NBITS /* must be last */
-};
-
/* some useful defines for virtchannel interface, which
* is the only remaining user of header split
*/
@@ -265,10 +258,8 @@ struct iavf_ring {
struct iavf_tx_buffer *tx_bi;
struct iavf_rx_buffer *rx_bi;
};
- DECLARE_BITMAP(state, __IAVF_RING_STATE_NBITS);
- u16 queue_index; /* Queue number of ring */
- u8 dcb_tc; /* Traffic class of ring */
u8 __iomem *tail;
+ u16 queue_index; /* Queue number of ring */

/* high bit set means dynamic, use accessors routines to read/write.
* hardware only supports 2us resolution for the ITR registers.
@@ -278,22 +269,14 @@ struct iavf_ring {
u16 itr_setting;

u16 count; /* Number of descriptors */
- u16 reg_idx; /* HW register index of the ring */

/* used in interrupt processing */
u16 next_to_use;
u16 next_to_clean;

- u8 atr_sample_rate;
- u8 atr_count;
-
- bool ring_active; /* is ring online or not */
- bool arm_wb; /* do something to arm write back */
- u8 packet_stride;
-
u16 flags;
#define IAVF_TXR_FLAGS_WB_ON_ITR BIT(0)
-/* BIT(1) is free, was IAVF_RXR_FLAGS_BUILD_SKB_ENABLED */
+#define IAVF_TXR_FLAGS_ARM_WB BIT(1)
/* BIT(2) is free */
#define IAVF_TXRX_FLAGS_VLAN_TAG_LOC_L2TAG1 BIT(3)
#define IAVF_TXR_FLAGS_VLAN_TAG_LOC_L2TAG2 BIT(4)
@@ -307,6 +290,7 @@ struct iavf_ring {
struct iavf_rx_queue_stats rx_stats;
};

+ int prev_pkt_ctr; /* For Tx stall detection */
unsigned int size; /* length of descriptor ring in bytes */
dma_addr_t dma; /* physical address of ring */

--
2.43.0

2023-12-07 17:24:00

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 11/12] libie: add common queue stats

Next stop, per-queue private stats. They have only subtle differences
from driver to driver and can easily be resolved.
Define common structures, inline helpers and Ethtool helpers to collect,
update and export the statistics. Use u64_stats_t right from the start,
as well as the corresponding helpers to ensure tear-free operations.
For the NAPI parts of both Rx and Tx, also define small onstack
containers to update them in polling loops and then sync the actual
containers once a loop ends.
The drivers will be switched to use this API later on a per-driver
basis, along with conversion to PP.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/libie/Makefile | 1 +
drivers/net/ethernet/intel/libie/stats.c | 121 +++++++++++++++
include/linux/net/intel/libie/stats.h | 179 ++++++++++++++++++++++
3 files changed, 301 insertions(+)
create mode 100644 drivers/net/ethernet/intel/libie/stats.c
create mode 100644 include/linux/net/intel/libie/stats.h

diff --git a/drivers/net/ethernet/intel/libie/Makefile b/drivers/net/ethernet/intel/libie/Makefile
index 95e81d09b474..76f32253481b 100644
--- a/drivers/net/ethernet/intel/libie/Makefile
+++ b/drivers/net/ethernet/intel/libie/Makefile
@@ -4,3 +4,4 @@
obj-$(CONFIG_LIBIE) += libie.o

libie-objs += rx.o
+libie-objs += stats.o
diff --git a/drivers/net/ethernet/intel/libie/stats.c b/drivers/net/ethernet/intel/libie/stats.c
new file mode 100644
index 000000000000..85f5d279406f
--- /dev/null
+++ b/drivers/net/ethernet/intel/libie/stats.c
@@ -0,0 +1,121 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright(c) 2023 Intel Corporation. */
+
+#include <linux/ethtool.h>
+
+#include <linux/net/intel/libie/rx.h>
+#include <linux/net/intel/libie/stats.h>
+
+/* Rx per-queue stats */
+
+static const char * const libie_rq_stats_str[] = {
+#define act(s) __stringify(s),
+ DECLARE_LIBIE_RQ_STATS(act)
+#undef act
+};
+
+#define LIBIE_RQ_STATS_NUM ARRAY_SIZE(libie_rq_stats_str)
+
+/**
+ * libie_rq_stats_get_sset_count - get the number of Ethtool RQ stats provided
+ *
+ * Return: number of per-queue Rx stats supported by the library.
+ */
+u32 libie_rq_stats_get_sset_count(void)
+{
+ return LIBIE_RQ_STATS_NUM;
+}
+EXPORT_SYMBOL_NS_GPL(libie_rq_stats_get_sset_count, LIBIE);
+
+/**
+ * libie_rq_stats_get_strings - get the name strings of Ethtool RQ stats
+ * @data: reference to the cursor pointing to the output buffer
+ * @qid: RQ number to print in the prefix
+ */
+void libie_rq_stats_get_strings(u8 **data, u32 qid)
+{
+ for (u32 i = 0; i < LIBIE_RQ_STATS_NUM; i++)
+ ethtool_sprintf(data, "rq%u_%s", qid, libie_rq_stats_str[i]);
+}
+EXPORT_SYMBOL_NS_GPL(libie_rq_stats_get_strings, LIBIE);
+
+/**
+ * libie_rq_stats_get_data - get the RQ stats in Ethtool format
+ * @data: reference to the cursor pointing to the output array
+ * @stats: RQ stats container from the queue
+ */
+void libie_rq_stats_get_data(u64 **data, const struct libie_rq_stats *stats)
+{
+ u64 sarr[LIBIE_RQ_STATS_NUM];
+ u32 start;
+
+ do {
+ start = u64_stats_fetch_begin(&stats->syncp);
+
+ for (u32 i = 0; i < LIBIE_RQ_STATS_NUM; i++)
+ sarr[i] = u64_stats_read(&stats->raw[i]);
+ } while (u64_stats_fetch_retry(&stats->syncp, start));
+
+ for (u32 i = 0; i < LIBIE_RQ_STATS_NUM; i++)
+ (*data)[i] += sarr[i];
+
+ *data += LIBIE_RQ_STATS_NUM;
+}
+EXPORT_SYMBOL_NS_GPL(libie_rq_stats_get_data, LIBIE);
+
+/* Tx per-queue stats */
+
+static const char * const libie_sq_stats_str[] = {
+#define act(s) __stringify(s),
+ DECLARE_LIBIE_SQ_STATS(act)
+#undef act
+};
+
+#define LIBIE_SQ_STATS_NUM ARRAY_SIZE(libie_sq_stats_str)
+
+/**
+ * libie_sq_stats_get_sset_count - get the number of Ethtool SQ stats provided
+ *
+ * Return: number of per-queue Tx stats supported by the library.
+ */
+u32 libie_sq_stats_get_sset_count(void)
+{
+ return LIBIE_SQ_STATS_NUM;
+}
+EXPORT_SYMBOL_NS_GPL(libie_sq_stats_get_sset_count, LIBIE);
+
+/**
+ * libie_sq_stats_get_strings - get the name strings of Ethtool SQ stats
+ * @data: reference to the cursor pointing to the output buffer
+ * @qid: SQ number to print in the prefix
+ */
+void libie_sq_stats_get_strings(u8 **data, u32 qid)
+{
+ for (u32 i = 0; i < LIBIE_SQ_STATS_NUM; i++)
+ ethtool_sprintf(data, "sq%u_%s", qid, libie_sq_stats_str[i]);
+}
+EXPORT_SYMBOL_NS_GPL(libie_sq_stats_get_strings, LIBIE);
+
+/**
+ * libie_sq_stats_get_data - get the SQ stats in Ethtool format
+ * @data: reference to the cursor pointing to the output array
+ * @stats: SQ stats container from the queue
+ */
+void libie_sq_stats_get_data(u64 **data, const struct libie_sq_stats *stats)
+{
+ u64 sarr[LIBIE_SQ_STATS_NUM];
+ u32 start;
+
+ do {
+ start = u64_stats_fetch_begin(&stats->syncp);
+
+ for (u32 i = 0; i < LIBIE_SQ_STATS_NUM; i++)
+ sarr[i] = u64_stats_read(&stats->raw[i]);
+ } while (u64_stats_fetch_retry(&stats->syncp, start));
+
+ for (u32 i = 0; i < LIBIE_SQ_STATS_NUM; i++)
+ (*data)[i] += sarr[i];
+
+ *data += LIBIE_SQ_STATS_NUM;
+}
+EXPORT_SYMBOL_NS_GPL(libie_sq_stats_get_data, LIBIE);
diff --git a/include/linux/net/intel/libie/stats.h b/include/linux/net/intel/libie/stats.h
new file mode 100644
index 000000000000..dbbc98bbd3a7
--- /dev/null
+++ b/include/linux/net/intel/libie/stats.h
@@ -0,0 +1,179 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2023 Intel Corporation. */
+
+#ifndef __LIBIE_STATS_H
+#define __LIBIE_STATS_H
+
+#include <linux/u64_stats_sync.h>
+
+/* Common */
+
+/* Use 32-byte alignment to reduce false sharing */
+#define __libie_stats_aligned __aligned(4 * sizeof(u64_stats_t))
+
+/**
+ * libie_stats_add - update one structure counter from a local struct
+ * @qs: queue stats structure to update (&libie_rq_stats or &libie_sq_stats)
+ * @ss: local/onstack stats structure
+ * @f: name of the field to update
+ *
+ * If a local/onstack stats structure is used to collect statistics during
+ * hotpath loops, this macro can be used to shorthand updates, given that
+ * the fields have the same name.
+ * Must be guarded with u64_stats_update_{begin,end}().
+ */
+#define libie_stats_add(qs, ss, f) \
+ u64_stats_add(&(qs)->f, (ss)->f)
+
+/**
+ * __libie_stats_inc_one - safely increment one stats structure counter
+ * @s: queue stats structure to update (&libie_rq_stats or &libie_sq_stats)
+ * @f: name of the field to increment
+ * @n: name of the temporary variable, result of __UNIQUE_ID()
+ *
+ * To be used on exception or slow paths -- allocation fails, queue stops etc.
+ */
+#define __libie_stats_inc_one(s, f, n) ({ \
+ typeof(*(s)) *n = (s); \
+ \
+ u64_stats_update_begin(&n->syncp); \
+ u64_stats_inc(&n->f); \
+ u64_stats_update_end(&n->syncp); \
+})
+#define libie_stats_inc_one(s, f) \
+ __libie_stats_inc_one(s, f, __UNIQUE_ID(qs_))
+
+/* Rx per-queue stats:
+ * packets: packets received on this queue
+ * bytes: bytes received on this queue
+ * fragments: number of processed descriptors carrying only a fragment
+ * alloc_page_fail: number of Rx page allocation fails
+ * build_skb_fail: number of build_skb() fails
+ */
+
+#define DECLARE_LIBIE_RQ_NAPI_STATS(act) \
+ act(packets) \
+ act(bytes) \
+ act(fragments)
+
+#define DECLARE_LIBIE_RQ_FAIL_STATS(act) \
+ act(alloc_page_fail) \
+ act(build_skb_fail)
+
+#define DECLARE_LIBIE_RQ_STATS(act) \
+ DECLARE_LIBIE_RQ_NAPI_STATS(act) \
+ DECLARE_LIBIE_RQ_FAIL_STATS(act)
+
+struct libie_rq_stats {
+ struct u64_stats_sync syncp;
+
+ union {
+ struct {
+#define act(s) u64_stats_t s;
+ DECLARE_LIBIE_RQ_NAPI_STATS(act);
+ DECLARE_LIBIE_RQ_FAIL_STATS(act);
+#undef act
+ };
+ DECLARE_FLEX_ARRAY(u64_stats_t, raw);
+ };
+} __libie_stats_aligned;
+
+/* Rx stats being modified frequently during the NAPI polling, to sync them
+ * with the queue stats once after the loop is finished.
+ */
+struct libie_rq_onstack_stats {
+ union {
+ struct {
+#define act(s) u32 s;
+ DECLARE_LIBIE_RQ_NAPI_STATS(act);
+#undef act
+ };
+ DECLARE_FLEX_ARRAY(u32, raw);
+ };
+};
+
+/**
+ * libie_rq_napi_stats_add - add onstack Rx stats to the queue container
+ * @qs: Rx queue stats structure to update
+ * @ss: onstack structure to get the values from, updated during the NAPI loop
+ */
+static inline void
+libie_rq_napi_stats_add(struct libie_rq_stats *qs,
+ const struct libie_rq_onstack_stats *ss)
+{
+ u64_stats_update_begin(&qs->syncp);
+ libie_stats_add(qs, ss, packets);
+ libie_stats_add(qs, ss, bytes);
+ libie_stats_add(qs, ss, fragments);
+ u64_stats_update_end(&qs->syncp);
+}
+
+u32 libie_rq_stats_get_sset_count(void);
+void libie_rq_stats_get_strings(u8 **data, u32 qid);
+void libie_rq_stats_get_data(u64 **data, const struct libie_rq_stats *stats);
+
+/* Tx per-queue stats:
+ * packets: packets sent from this queue
+ * bytes: bytes sent from this queue
+ * busy: number of xmit failures due to the ring being full
+ * stops: number times the ring was stopped from the driver
+ * restarts: number times it was started after being stopped
+ * linearized: number of skbs linearized due to HW limits
+ */
+
+#define DECLARE_LIBIE_SQ_NAPI_STATS(act) \
+ act(packets) \
+ act(bytes)
+
+#define DECLARE_LIBIE_SQ_XMIT_STATS(act) \
+ act(busy) \
+ act(stops) \
+ act(restarts) \
+ act(linearized)
+
+#define DECLARE_LIBIE_SQ_STATS(act) \
+ DECLARE_LIBIE_SQ_NAPI_STATS(act) \
+ DECLARE_LIBIE_SQ_XMIT_STATS(act)
+
+struct libie_sq_stats {
+ struct u64_stats_sync syncp;
+
+ union {
+ struct {
+#define act(s) u64_stats_t s;
+ DECLARE_LIBIE_SQ_STATS(act);
+#undef act
+ };
+ DECLARE_FLEX_ARRAY(u64_stats_t, raw);
+ };
+} __libie_stats_aligned;
+
+struct libie_sq_onstack_stats {
+#define act(s) u32 s;
+ DECLARE_LIBIE_SQ_NAPI_STATS(act);
+#undef act
+};
+
+/**
+ * libie_sq_napi_stats_add - add onstack Tx stats to the queue container
+ * @qs: Tx queue stats structure to update
+ * @ss: onstack structure to get the values from, updated during the NAPI loop
+ */
+static inline void
+libie_sq_napi_stats_add(struct libie_sq_stats *qs,
+ const struct libie_sq_onstack_stats *ss)
+{
+ if (unlikely(!ss->packets))
+ return;
+
+ u64_stats_update_begin(&qs->syncp);
+ libie_stats_add(qs, ss, packets);
+ libie_stats_add(qs, ss, bytes);
+ u64_stats_update_end(&qs->syncp);
+}
+
+u32 libie_sq_stats_get_sset_count(void);
+void libie_sq_stats_get_strings(u8 **data, u32 qid);
+void libie_sq_stats_get_data(u64 **data, const struct libie_sq_stats *stats);
+
+#endif /* __LIBIE_STATS_H */
--
2.43.0

2023-12-07 17:24:12

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 12/12] iavf: switch queue stats to libie

iavf is pretty much ready for using the generic libie stats, so drop all
the custom code and just use generic definitions. The only thing is that
it previously lacked the counter of Tx queue stops. It's present in the
other drivers, so add it here as well.
The rest is straightforward. Note that it makes the ring structure 1 CL
bigger, but no layout changes since the stats structures are 32-byte
aligned.

Signed-off-by: Alexander Lobakin <[email protected]>
---
.../net/ethernet/intel/iavf/iavf_ethtool.c | 94 +++++--------------
drivers/net/ethernet/intel/iavf/iavf_main.c | 2 +
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 84 +++++++++--------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 28 +-----
4 files changed, 72 insertions(+), 136 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 964e66b4869c..898528abb03c 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -2,9 +2,10 @@
/* Copyright(c) 2013 - 2018 Intel Corporation. */

/* ethtool support for iavf */
-#include "iavf.h"

-#include <linux/uaccess.h>
+#include <linux/net/intel/libie/rx.h>
+
+#include "iavf.h"

/* ethtool statistics helpers */

@@ -46,16 +47,6 @@ struct iavf_stats {
.stat_offset = offsetof(_type, _stat) \
}

-/* Helper macro for defining some statistics related to queues */
-#define IAVF_QUEUE_STAT(_name, _stat) \
- IAVF_STAT(struct iavf_ring, _name, _stat)
-
-/* Stats associated with a Tx or Rx ring */
-static const struct iavf_stats iavf_gstrings_queue_stats[] = {
- IAVF_QUEUE_STAT("%s-%u.packets", stats.packets),
- IAVF_QUEUE_STAT("%s-%u.bytes", stats.bytes),
-};
-
/**
* iavf_add_one_ethtool_stat - copy the stat into the supplied buffer
* @data: location to store the stat value
@@ -141,43 +132,6 @@ __iavf_add_ethtool_stats(u64 **data, void *pointer,
#define iavf_add_ethtool_stats(data, pointer, stats) \
__iavf_add_ethtool_stats(data, pointer, stats, ARRAY_SIZE(stats))

-/**
- * iavf_add_queue_stats - copy queue statistics into supplied buffer
- * @data: ethtool stats buffer
- * @ring: the ring to copy
- *
- * Queue statistics must be copied while protected by
- * u64_stats_fetch_begin, so we can't directly use iavf_add_ethtool_stats.
- * Assumes that queue stats are defined in iavf_gstrings_queue_stats. If the
- * ring pointer is null, zero out the queue stat values and update the data
- * pointer. Otherwise safely copy the stats from the ring into the supplied
- * buffer and update the data pointer when finished.
- *
- * This function expects to be called while under rcu_read_lock().
- **/
-static void
-iavf_add_queue_stats(u64 **data, struct iavf_ring *ring)
-{
- const unsigned int size = ARRAY_SIZE(iavf_gstrings_queue_stats);
- const struct iavf_stats *stats = iavf_gstrings_queue_stats;
- unsigned int start;
- unsigned int i;
-
- /* To avoid invalid statistics values, ensure that we keep retrying
- * the copy until we get a consistent value according to
- * u64_stats_fetch_retry. But first, make sure our ring is
- * non-null before attempting to access its syncp.
- */
- do {
- start = !ring ? 0 : u64_stats_fetch_begin(&ring->syncp);
- for (i = 0; i < size; i++)
- iavf_add_one_ethtool_stat(&(*data)[i], ring, &stats[i]);
- } while (ring && u64_stats_fetch_retry(&ring->syncp, start));
-
- /* Once we successfully copy the stats in, update the data pointer */
- *data += size;
-}
-
/**
* __iavf_add_stat_strings - copy stat strings into ethtool buffer
* @p: ethtool supplied buffer
@@ -237,8 +191,6 @@ static const struct iavf_stats iavf_gstrings_stats[] = {

#define IAVF_STATS_LEN ARRAY_SIZE(iavf_gstrings_stats)

-#define IAVF_QUEUE_STATS_LEN ARRAY_SIZE(iavf_gstrings_queue_stats)
-
/**
* iavf_get_link_ksettings - Get Link Speed and Duplex settings
* @netdev: network interface device structure
@@ -308,18 +260,22 @@ static int iavf_get_link_ksettings(struct net_device *netdev,
**/
static int iavf_get_sset_count(struct net_device *netdev, int sset)
{
- /* Report the maximum number queues, even if not every queue is
- * currently configured. Since allocation of queues is in pairs,
- * use netdev->real_num_tx_queues * 2. The real_num_tx_queues is set
- * at device creation and never changes.
- */
+ u32 num;

- if (sset == ETH_SS_STATS)
- return IAVF_STATS_LEN +
- (IAVF_QUEUE_STATS_LEN * 2 *
- netdev->real_num_tx_queues);
- else
+ switch (sset) {
+ case ETH_SS_STATS:
+ /* Per-queue */
+ num = libie_rq_stats_get_sset_count();
+ num += libie_sq_stats_get_sset_count();
+ num *= netdev->real_num_tx_queues;
+
+ /* Global */
+ num += IAVF_STATS_LEN;
+
+ return num;
+ default:
return -EINVAL;
+ }
}

/**
@@ -346,15 +302,11 @@ static void iavf_get_ethtool_stats(struct net_device *netdev,
* it to iterate over rings' stats.
*/
for (i = 0; i < adapter->num_active_queues; i++) {
- struct iavf_ring *ring;
+ /* Rx rings stats */
+ libie_rq_stats_get_data(&data, &adapter->rx_rings[i].rq_stats);

/* Tx rings stats */
- ring = &adapter->tx_rings[i];
- iavf_add_queue_stats(&data, ring);
-
- /* Rx rings stats */
- ring = &adapter->rx_rings[i];
- iavf_add_queue_stats(&data, ring);
+ libie_sq_stats_get_data(&data, &adapter->tx_rings[i].sq_stats);
}
rcu_read_unlock();
}
@@ -376,10 +328,8 @@ static void iavf_get_stat_strings(struct net_device *netdev, u8 *data)
* real_num_tx_queues for both Tx and Rx queues.
*/
for (i = 0; i < netdev->real_num_tx_queues; i++) {
- iavf_add_stat_strings(&data, iavf_gstrings_queue_stats,
- "tx", i);
- iavf_add_stat_strings(&data, iavf_gstrings_queue_stats,
- "rx", i);
+ libie_rq_stats_get_strings(&data, i);
+ libie_sq_stats_get_strings(&data, i);
}
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 0becfdf2ed99..73563ae2ea03 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1603,12 +1603,14 @@ static int iavf_alloc_queues(struct iavf_adapter *adapter)
tx_ring->itr_setting = IAVF_ITR_TX_DEF;
if (adapter->flags & IAVF_FLAG_WB_ON_ITR_CAPABLE)
tx_ring->flags |= IAVF_TXR_FLAGS_WB_ON_ITR;
+ u64_stats_init(&tx_ring->sq_stats.syncp);

rx_ring = &adapter->rx_rings[i];
rx_ring->queue_index = i;
rx_ring->netdev = adapter->netdev;
rx_ring->count = adapter->rx_desc_count;
rx_ring->itr_setting = IAVF_ITR_RX_DEF;
+ u64_stats_init(&rx_ring->rq_stats.syncp);
}

adapter->num_active_queues = num_active_queues;
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index aa86a0f926c6..97f8241ba56f 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -176,6 +176,9 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
for (i = 0; i < vsi->back->num_active_queues; i++) {
tx_ring = &vsi->back->tx_rings[i];
if (tx_ring && tx_ring->desc) {
+ const struct libie_sq_stats *st = &tx_ring->sq_stats;
+ u32 start;
+
/* If packet counter has not changed the queue is
* likely stalled, so force an interrupt for this
* queue.
@@ -183,7 +186,12 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
* prev_pkt_ctr would be negative if there was no
* pending work.
*/
- packets = tx_ring->stats.packets & INT_MAX;
+ do {
+ start = u64_stats_fetch_begin(&st->syncp);
+ packets = u64_stats_read(&st->packets) &
+ INT_MAX;
+ } while (u64_stats_fetch_retry(&st->syncp, start));
+
if (tx_ring->prev_pkt_ctr == packets) {
iavf_force_wb(vsi, tx_ring->q_vector);
continue;
@@ -212,11 +220,11 @@ void iavf_detect_recover_hung(struct iavf_vsi *vsi)
static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
struct iavf_ring *tx_ring, int napi_budget)
{
+ unsigned int budget = IAVF_DEFAULT_IRQ_WORK;
+ struct libie_sq_onstack_stats stats = { };
int i = tx_ring->next_to_clean;
struct iavf_tx_buffer *tx_buf;
struct iavf_tx_desc *tx_desc;
- unsigned int total_bytes = 0, total_packets = 0;
- unsigned int budget = IAVF_DEFAULT_IRQ_WORK;

tx_buf = &tx_ring->tx_bi[i];
tx_desc = IAVF_TX_DESC(tx_ring, i);
@@ -242,8 +250,8 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
tx_buf->next_to_watch = NULL;

/* update the statistics for this packet */
- total_bytes += tx_buf->bytecount;
- total_packets += tx_buf->gso_segs;
+ stats.bytes += tx_buf->bytecount;
+ stats.packets += tx_buf->gso_segs;

/* free the skb */
napi_consume_skb(tx_buf->skb, napi_budget);
@@ -300,12 +308,9 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,

i += tx_ring->count;
tx_ring->next_to_clean = i;
- u64_stats_update_begin(&tx_ring->syncp);
- tx_ring->stats.bytes += total_bytes;
- tx_ring->stats.packets += total_packets;
- u64_stats_update_end(&tx_ring->syncp);
- tx_ring->q_vector->tx.total_bytes += total_bytes;
- tx_ring->q_vector->tx.total_packets += total_packets;
+ libie_sq_napi_stats_add(&tx_ring->sq_stats, &stats);
+ tx_ring->q_vector->tx.total_bytes += stats.bytes;
+ tx_ring->q_vector->tx.total_packets += stats.packets;

if (tx_ring->flags & IAVF_TXR_FLAGS_WB_ON_ITR) {
/* check to see if there are < 4 descriptors
@@ -324,10 +329,10 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,

/* notify netdev of completed buffers */
netdev_tx_completed_queue(txring_txq(tx_ring),
- total_packets, total_bytes);
+ stats.packets, stats.bytes);

#define TX_WAKE_THRESHOLD ((s16)(DESC_NEEDED * 2))
- if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
+ if (unlikely(stats.packets && netif_carrier_ok(tx_ring->netdev) &&
(IAVF_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD))) {
/* Make sure that anybody stopping the queue after this
* sees the new next_to_clean.
@@ -338,7 +343,7 @@ static bool iavf_clean_tx_irq(struct iavf_vsi *vsi,
!test_bit(__IAVF_VSI_DOWN, vsi->state)) {
netif_wake_subqueue(tx_ring->netdev,
tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
+ libie_stats_inc_one(&tx_ring->sq_stats, restarts);
}
}

@@ -766,8 +771,6 @@ int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
if (!rx_ring->rx_bi)
goto err;

- u64_stats_init(&rx_ring->syncp);
-
/* Round up to nearest 4K */
rx_ring->size = rx_ring->count * sizeof(union iavf_32byte_rx_desc);
rx_ring->size = ALIGN(rx_ring->size, 4096);
@@ -893,7 +896,7 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
if (rx_ring->next_to_use != ntu)
iavf_release_rx_desc(rx_ring, ntu);

- rx_ring->rx_stats.alloc_page_failed++;
+ libie_stats_inc_one(&rx_ring->rq_stats, alloc_page_fail);

/* make sure to come back via polling to try again after
* allocation failure
@@ -1101,7 +1104,7 @@ static struct sk_buff *iavf_build_skb(const struct libie_rx_buffer *rx_buffer,
* iavf_is_non_eop - process handling of non-EOP buffers
* @rx_ring: Rx ring being processed
* @rx_desc: Rx descriptor for current buffer
- * @skb: Current socket buffer containing buffer in progress
+ * @stats: NAPI poll local stats to update
*
* This function updates next to clean. If the buffer is an EOP buffer
* this function exits returning false, otherwise it will place the
@@ -1110,7 +1113,7 @@ static struct sk_buff *iavf_build_skb(const struct libie_rx_buffer *rx_buffer,
**/
static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
union iavf_rx_desc *rx_desc,
- struct sk_buff *skb)
+ struct libie_rq_onstack_stats *stats)
{
u32 ntc = rx_ring->next_to_clean + 1;

@@ -1125,7 +1128,7 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
if (likely(iavf_test_staterr(rx_desc, IAVF_RXD_EOF)))
return false;

- rx_ring->rx_stats.non_eop_descs++;
+ stats->fragments++;

return true;
}
@@ -1144,12 +1147,12 @@ static bool iavf_is_non_eop(struct iavf_ring *rx_ring,
**/
static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
{
- unsigned int total_rx_bytes = 0, total_rx_packets = 0;
- struct sk_buff *skb = rx_ring->skb;
u16 cleaned_count = IAVF_DESC_UNUSED(rx_ring);
+ struct libie_rq_onstack_stats stats = { };
+ struct sk_buff *skb = rx_ring->skb;
bool failure = false;

- while (likely(total_rx_packets < (unsigned int)budget)) {
+ while (likely(stats.packets < budget)) {
struct libie_rx_buffer *rx_buffer;
union iavf_rx_desc *rx_desc;
unsigned int size;
@@ -1199,14 +1202,15 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)

/* exit if we failed to retrieve a buffer */
if (!skb) {
- rx_ring->rx_stats.alloc_buff_failed++;
+ libie_stats_inc_one(&rx_ring->rq_stats,
+ build_skb_fail);
break;
}

skip_data:
cleaned_count++;

- if (iavf_is_non_eop(rx_ring, rx_desc, skb))
+ if (iavf_is_non_eop(rx_ring, rx_desc, &stats))
continue;

/* ERR_MASK will only have valid bits if EOP set, and
@@ -1226,7 +1230,7 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
}

/* probably a little skewed due to removing CRC */
- total_rx_bytes += skb->len;
+ stats.bytes += skb->len;

qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
rx_ptype = (qword & IAVF_RXD_QW1_PTYPE_MASK) >>
@@ -1248,20 +1252,20 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
skb = NULL;

/* update budget accounting */
- total_rx_packets++;
+ stats.packets++;
}

rx_ring->skb = skb;

- u64_stats_update_begin(&rx_ring->syncp);
- rx_ring->stats.packets += total_rx_packets;
- rx_ring->stats.bytes += total_rx_bytes;
- u64_stats_update_end(&rx_ring->syncp);
- rx_ring->q_vector->rx.total_packets += total_rx_packets;
- rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
+ libie_rq_napi_stats_add(&rx_ring->rq_stats, &stats);
+ rx_ring->q_vector->rx.total_packets += stats.packets;
+ rx_ring->q_vector->rx.total_bytes += stats.bytes;

/* guarantee a trip back through this routine if there was a failure */
- return failure ? budget : (int)total_rx_packets;
+ if (unlikely(failure))
+ return budget;
+
+ return stats.packets;
}

static inline u32 iavf_buildreg_itr(const int type, u16 itr)
@@ -1438,10 +1442,8 @@ int iavf_napi_poll(struct napi_struct *napi, int budget)
return budget - 1;
}
tx_only:
- if (arm_wb) {
- q_vector->tx.ring[0].tx_stats.tx_force_wb++;
+ if (arm_wb)
iavf_enable_wb_on_itr(vsi, q_vector);
- }
return budget;
}

@@ -1900,6 +1902,7 @@ bool __iavf_chk_linearize(struct sk_buff *skb)
int __iavf_maybe_stop_tx(struct iavf_ring *tx_ring, int size)
{
netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+ libie_stats_inc_one(&tx_ring->sq_stats, stops);
/* Memory barrier before checking head and tail */
smp_mb();

@@ -1909,7 +1912,8 @@ int __iavf_maybe_stop_tx(struct iavf_ring *tx_ring, int size)

/* A reprieve! - use start_queue because it doesn't call schedule */
netif_start_subqueue(tx_ring->netdev, tx_ring->queue_index);
- ++tx_ring->tx_stats.restart_queue;
+ libie_stats_inc_one(&tx_ring->sq_stats, restarts);
+
return 0;
}

@@ -2090,7 +2094,7 @@ static netdev_tx_t iavf_xmit_frame_ring(struct sk_buff *skb,
return NETDEV_TX_OK;
}
count = iavf_txd_use_count(skb->len);
- tx_ring->tx_stats.tx_linearize++;
+ libie_stats_inc_one(&tx_ring->sq_stats, linearized);
}

/* need: 1 descriptor per page * PAGE_SIZE/IAVF_MAX_DATA_PER_TXD,
@@ -2100,7 +2104,7 @@ static netdev_tx_t iavf_xmit_frame_ring(struct sk_buff *skb,
* otherwise try next time
*/
if (iavf_maybe_stop_tx(tx_ring, count + 4 + 1)) {
- tx_ring->tx_stats.tx_busy++;
+ libie_stats_inc_one(&tx_ring->sq_stats, busy);
return NETDEV_TX_BUSY;
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 72c6bc64d94e..016c8b6e6160 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -4,6 +4,8 @@
#ifndef _IAVF_TXRX_H_
#define _IAVF_TXRX_H_

+#include <linux/net/intel/libie/stats.h>
+
/* Interrupt Throttling and Rate Limiting Goodies */
#define IAVF_DEFAULT_IRQ_WORK 256

@@ -201,26 +203,6 @@ struct iavf_tx_buffer {
u32 tx_flags;
};

-struct iavf_queue_stats {
- u64 packets;
- u64 bytes;
-};
-
-struct iavf_tx_queue_stats {
- u64 restart_queue;
- u64 tx_busy;
- u64 tx_done_old;
- u64 tx_linearize;
- u64 tx_force_wb;
- u64 tx_lost_interrupt;
-};
-
-struct iavf_rx_queue_stats {
- u64 non_eop_descs;
- u64 alloc_page_failed;
- u64 alloc_buff_failed;
-};
-
/* some useful defines for virtchannel interface, which
* is the only remaining user of header split
*/
@@ -272,11 +254,9 @@ struct iavf_ring {
#define IAVF_RXR_FLAGS_VLAN_TAG_LOC_L2TAG2_2 BIT(5)

/* stats structs */
- struct iavf_queue_stats stats;
- struct u64_stats_sync syncp;
union {
- struct iavf_tx_queue_stats tx_stats;
- struct iavf_rx_queue_stats rx_stats;
+ struct libie_rq_stats rq_stats;
+ struct libie_sq_stats sq_stats;
};

int prev_pkt_ctr; /* For Tx stall detection */
--
2.43.0

2023-12-07 17:24:22

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 05/12] iavf: drop page splitting and recycling

As an intermediate step, remove all page splitting/recycling code. Just
always allocate a new page and don't touch its refcount, so that it gets
freed by the core stack later.
Same for the "in-place" recycling, i.e. when an unused buffer gets
assigned to a first needs-refilling descriptor. In some cases, this
was leading to moving up to 63 &iavf_rx_buf structures around the ring
on a per-field basis -- not something wanted on hotpath.
The change allows to greatly simplify certain parts of the code:

Function: add/remove: 0/2 grow/shrink: 0/7 up/down: 0/-744 (-744)

Although the array of &iavf_rx_buf is barely used now and could be
replaced with just page pointer array, don't touch it now to not
complicate replacing it with libie Rx buffer struct later on.
No surprise perf loses up to 30% here, but that regression will
go away once PP lands.
Note that iavf_rx_pg_*() definitions are left to reduce diffstat.
They will be removed with the conversion to Page Pool.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 24 +--
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 152 +-----------------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 65 --------
drivers/net/ethernet/intel/iavf/iavf_type.h | 2 -
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 8 +-
5 files changed, 10 insertions(+), 241 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index f81144466496..060002d1ebf0 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -736,32 +736,10 @@ static void iavf_configure_tx(struct iavf_adapter *adapter)
**/
static void iavf_configure_rx(struct iavf_adapter *adapter)
{
- unsigned int rx_buf_len = IAVF_RXBUFFER_2048;
struct iavf_hw *hw = &adapter->hw;
- int i;
-
- if (PAGE_SIZE < 8192) {
- struct net_device *netdev = adapter->netdev;

- /* For jumbo frames on systems with 4K pages we have to use
- * an order 1 page, so we might as well increase the size
- * of our Rx buffer to make better use of the available space
- */
- rx_buf_len = IAVF_RXBUFFER_3072;
-
- /* We use a 1536 buffer size for configurations with
- * standard Ethernet mtu. On x86 this gives us enough room
- * for shared info and 192 bytes of padding.
- */
- if (!IAVF_2K_TOO_SMALL_WITH_PADDING &&
- (netdev->mtu <= ETH_DATA_LEN))
- rx_buf_len = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;
- }
-
- for (i = 0; i < adapter->num_active_queues; i++) {
+ for (u32 i = 0; i < adapter->num_active_queues; i++)
adapter->rx_rings[i].tail = hw->hw_addr + IAVF_QRX_TAIL1(i);
- adapter->rx_rings[i].rx_buf_len = rx_buf_len;
- }
}

/**
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 27cea26cc53e..665ee1feb877 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -714,7 +714,7 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
dma_sync_single_range_for_cpu(rx_ring->dev,
rx_bi->dma,
rx_bi->page_offset,
- rx_ring->rx_buf_len,
+ IAVF_RXBUFFER_3072,
DMA_FROM_DEVICE);

/* free resources associated with mapping */
@@ -723,7 +723,7 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
DMA_FROM_DEVICE,
IAVF_RX_DMA_ATTR);

- __page_frag_cache_drain(rx_bi->page, rx_bi->pagecnt_bias);
+ __free_page(rx_bi->page);

rx_bi->page = NULL;
rx_bi->page_offset = 0;
@@ -735,7 +735,6 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
/* Zero out the descriptor ring */
memset(rx_ring->desc, 0, rx_ring->size);

- rx_ring->next_to_alloc = 0;
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
}
@@ -791,7 +790,6 @@ int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
goto err;
}

- rx_ring->next_to_alloc = 0;
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

@@ -811,9 +809,6 @@ static void iavf_release_rx_desc(struct iavf_ring *rx_ring, u32 val)
{
rx_ring->next_to_use = val;

- /* update next to alloc since we have filled the ring */
- rx_ring->next_to_alloc = val;
-
/* Force memory writes to complete before letting h/w
* know there are new descriptors to fetch. (Only
* applicable for weak-ordered memory model archs,
@@ -837,12 +832,6 @@ static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,
struct page *page = bi->page;
dma_addr_t dma;

- /* since we are recycling buffers we should seldom need to alloc */
- if (likely(page)) {
- rx_ring->rx_stats.page_reuse_count++;
- return true;
- }
-
/* alloc new page for storage */
page = dev_alloc_pages(iavf_rx_pg_order(rx_ring));
if (unlikely(!page)) {
@@ -869,9 +858,6 @@ static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,
bi->page = page;
bi->page_offset = IAVF_SKB_PAD;

- /* initialize pagecnt_bias to 1 representing we fully own page */
- bi->pagecnt_bias = 1;
-
return true;
}

@@ -923,7 +909,7 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
/* sync the buffer for use by the device */
dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
bi->page_offset,
- rx_ring->rx_buf_len,
+ IAVF_RXBUFFER_3072,
DMA_FROM_DEVICE);

/* Refresh the desc even if buffer_addrs didn't change
@@ -1103,91 +1089,6 @@ static bool iavf_cleanup_headers(struct iavf_ring *rx_ring, struct sk_buff *skb)
return false;
}

-/**
- * iavf_reuse_rx_page - page flip buffer and store it back on the ring
- * @rx_ring: rx descriptor ring to store buffers on
- * @old_buff: donor buffer to have page reused
- *
- * Synchronizes page for reuse by the adapter
- **/
-static void iavf_reuse_rx_page(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *old_buff)
-{
- struct iavf_rx_buffer *new_buff;
- u16 nta = rx_ring->next_to_alloc;
-
- new_buff = &rx_ring->rx_bi[nta];
-
- /* update, and store next to alloc */
- nta++;
- rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
-
- /* transfer page from old buffer to new buffer */
- new_buff->dma = old_buff->dma;
- new_buff->page = old_buff->page;
- new_buff->page_offset = old_buff->page_offset;
- new_buff->pagecnt_bias = old_buff->pagecnt_bias;
-}
-
-/**
- * iavf_can_reuse_rx_page - Determine if this page can be reused by
- * the adapter for another receive
- *
- * @rx_buffer: buffer containing the page
- *
- * If page is reusable, rx_buffer->page_offset is adjusted to point to
- * an unused region in the page.
- *
- * For small pages, @truesize will be a constant value, half the size
- * of the memory at page. We'll attempt to alternate between high and
- * low halves of the page, with one half ready for use by the hardware
- * and the other half being consumed by the stack. We use the page
- * ref count to determine whether the stack has finished consuming the
- * portion of this page that was passed up with a previous packet. If
- * the page ref count is >1, we'll assume the "other" half page is
- * still busy, and this page cannot be reused.
- *
- * For larger pages, @truesize will be the actual space used by the
- * received packet (adjusted upward to an even multiple of the cache
- * line size). This will advance through the page by the amount
- * actually consumed by the received packets while there is still
- * space for a buffer. Each region of larger pages will be used at
- * most once, after which the page will not be reused.
- *
- * In either case, if the page is reusable its refcount is increased.
- **/
-static bool iavf_can_reuse_rx_page(struct iavf_rx_buffer *rx_buffer)
-{
- unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
- struct page *page = rx_buffer->page;
-
- /* Is any reuse possible? */
- if (!dev_page_is_reusable(page))
- return false;
-
-#if (PAGE_SIZE < 8192)
- /* if we are only owner of page we can reuse it */
- if (unlikely((page_count(page) - pagecnt_bias) > 1))
- return false;
-#else
-#define IAVF_LAST_OFFSET \
- (SKB_WITH_OVERHEAD(PAGE_SIZE) - IAVF_RXBUFFER_2048)
- if (rx_buffer->page_offset > IAVF_LAST_OFFSET)
- return false;
-#endif
-
- /* If we have drained the page fragment pool we need to update
- * the pagecnt_bias and page count so that we fully restock the
- * number of references the driver holds.
- */
- if (unlikely(!pagecnt_bias)) {
- page_ref_add(page, USHRT_MAX);
- rx_buffer->pagecnt_bias = USHRT_MAX;
- }
-
- return true;
-}
-
/**
* iavf_add_rx_frag - Add contents of Rx buffer to sk_buff
* @rx_ring: rx descriptor ring to transact packets on
@@ -1205,24 +1106,13 @@ static void iavf_add_rx_frag(struct iavf_ring *rx_ring,
struct sk_buff *skb,
unsigned int size)
{
-#if (PAGE_SIZE < 8192)
- unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
-#else
unsigned int truesize = SKB_DATA_ALIGN(size + IAVF_SKB_PAD);
-#endif

if (!size)
return;

skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
rx_buffer->page_offset, size, truesize);
-
- /* page is being used so we must update the page offset */
-#if (PAGE_SIZE < 8192)
- rx_buffer->page_offset ^= truesize;
-#else
- rx_buffer->page_offset += truesize;
-#endif
}

/**
@@ -1250,9 +1140,6 @@ static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
size,
DMA_FROM_DEVICE);

- /* We have pulled a buffer for use, so decrement pagecnt_bias */
- rx_buffer->pagecnt_bias--;
-
return rx_buffer;
}

@@ -1270,12 +1157,8 @@ static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
unsigned int size)
{
void *va;
-#if (PAGE_SIZE < 8192)
- unsigned int truesize = iavf_rx_pg_size(rx_ring) / 2;
-#else
unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
SKB_DATA_ALIGN(IAVF_SKB_PAD + size);
-#endif
struct sk_buff *skb;

if (!rx_buffer || !size)
@@ -1293,23 +1176,15 @@ static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
skb_reserve(skb, IAVF_SKB_PAD);
__skb_put(skb, size);

- /* buffer is used by skb, update page_offset */
-#if (PAGE_SIZE < 8192)
- rx_buffer->page_offset ^= truesize;
-#else
- rx_buffer->page_offset += truesize;
-#endif
-
return skb;
}

/**
- * iavf_put_rx_buffer - Clean up used buffer and either recycle or free
+ * iavf_put_rx_buffer - Unmap used buffer
* @rx_ring: rx descriptor ring to transact packets on
* @rx_buffer: rx buffer to pull data from
*
- * This function will clean up the contents of the rx_buffer. It will
- * either recycle the buffer or unmap it and free the associated resources.
+ * This function will unmap the buffer after it's written by HW.
*/
static void iavf_put_rx_buffer(struct iavf_ring *rx_ring,
struct iavf_rx_buffer *rx_buffer)
@@ -1317,18 +1192,9 @@ static void iavf_put_rx_buffer(struct iavf_ring *rx_ring,
if (!rx_buffer)
return;

- if (iavf_can_reuse_rx_page(rx_buffer)) {
- /* hand second half of page back to the ring */
- iavf_reuse_rx_page(rx_ring, rx_buffer);
- rx_ring->rx_stats.page_reuse_count++;
- } else {
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma,
- iavf_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);
- __page_frag_cache_drain(rx_buffer->page,
- rx_buffer->pagecnt_bias);
- }
+ /* we are not reusing the buffer so unmap it */
+ dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
+ DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);

/* clear contents of buffer_info */
rx_buffer->page = NULL;
@@ -1434,8 +1300,6 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
/* exit if we failed to retrieve a buffer */
if (!skb) {
rx_ring->rx_stats.alloc_buff_failed++;
- if (rx_buffer && size)
- rx_buffer->pagecnt_bias++;
break;
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 4b412f7662e4..720bca0e6716 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -82,8 +82,6 @@ enum iavf_dyn_idx_t {
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))

/* Supported Rx Buffer Sizes (a multiple of 128) */
-#define IAVF_RXBUFFER_1536 1536 /* 128B aligned standard Ethernet frame */
-#define IAVF_RXBUFFER_2048 2048
#define IAVF_RXBUFFER_3072 3072 /* Used for large frames w/ padding */
#define IAVF_MAX_RXBUFFER 9728 /* largest size for single descriptor */

@@ -93,57 +91,7 @@ enum iavf_dyn_idx_t {
#define IAVF_RX_DMA_ATTR \
(DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)

-/* Attempt to maximize the headroom available for incoming frames. We
- * use a 2K buffer for receives and need 1536/1534 to store the data for
- * the frame. This leaves us with 512 bytes of room. From that we need
- * to deduct the space needed for the shared info and the padding needed
- * to IP align the frame.
- *
- * Note: For cache line sizes 256 or larger this value is going to end
- * up negative. In these cases we should fall back to the legacy
- * receive path.
- */
-#if (PAGE_SIZE < 8192)
-#define IAVF_2K_TOO_SMALL_WITH_PADDING \
-((NET_SKB_PAD + IAVF_RXBUFFER_1536) > SKB_WITH_OVERHEAD(IAVF_RXBUFFER_2048))
-
-static inline int iavf_compute_pad(int rx_buf_len)
-{
- int page_size, pad_size;
-
- page_size = ALIGN(rx_buf_len, PAGE_SIZE / 2);
- pad_size = SKB_WITH_OVERHEAD(page_size) - rx_buf_len;
-
- return pad_size;
-}
-
-static inline int iavf_skb_pad(void)
-{
- int rx_buf_len;
-
- /* If a 2K buffer cannot handle a standard Ethernet frame then
- * optimize padding for a 3K buffer instead of a 1.5K buffer.
- *
- * For a 3K buffer we need to add enough padding to allow for
- * tailroom due to NET_IP_ALIGN possibly shifting us out of
- * cache-line alignment.
- */
- if (IAVF_2K_TOO_SMALL_WITH_PADDING)
- rx_buf_len = IAVF_RXBUFFER_3072 + SKB_DATA_ALIGN(NET_IP_ALIGN);
- else
- rx_buf_len = IAVF_RXBUFFER_1536;
-
- /* if needed make room for NET_IP_ALIGN */
- rx_buf_len -= NET_IP_ALIGN;
-
- return iavf_compute_pad(rx_buf_len);
-}
-
-#define IAVF_SKB_PAD iavf_skb_pad()
-#else
-#define IAVF_2K_TOO_SMALL_WITH_PADDING false
#define IAVF_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
-#endif

/**
* iavf_test_staterr - tests bits in Rx descriptor status and error fields
@@ -266,12 +214,7 @@ struct iavf_tx_buffer {
struct iavf_rx_buffer {
dma_addr_t dma;
struct page *page;
-#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
__u32 page_offset;
-#else
- __u16 page_offset;
-#endif
- __u16 pagecnt_bias;
};

struct iavf_queue_stats {
@@ -293,8 +236,6 @@ struct iavf_rx_queue_stats {
u64 non_eop_descs;
u64 alloc_page_failed;
u64 alloc_buff_failed;
- u64 page_reuse_count;
- u64 realloc_count;
};

enum iavf_ring_state_t {
@@ -338,7 +279,6 @@ struct iavf_ring {

u16 count; /* Number of descriptors */
u16 reg_idx; /* HW register index of the ring */
- u16 rx_buf_len;

/* used in interrupt processing */
u16 next_to_use;
@@ -374,7 +314,6 @@ struct iavf_ring {
struct iavf_q_vector *q_vector; /* Backreference to associated vector */

struct rcu_head rcu; /* to avoid race on free */
- u16 next_to_alloc;
struct sk_buff *skb; /* When iavf_clean_rx_ring_irq() must
* return before it sees the EOP for
* the current packet, we save that skb
@@ -408,10 +347,6 @@ struct iavf_ring_container {

static inline unsigned int iavf_rx_pg_order(struct iavf_ring *ring)
{
-#if (PAGE_SIZE < 8192)
- if (ring->rx_buf_len > (PAGE_SIZE / 2))
- return 1;
-#endif
return 0;
}

diff --git a/drivers/net/ethernet/intel/iavf/iavf_type.h b/drivers/net/ethernet/intel/iavf/iavf_type.h
index 23ded4fcd94f..f6b09e57abce 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_type.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_type.h
@@ -10,8 +10,6 @@
#include "iavf_adminq.h"
#include "iavf_devids.h"

-#define IAVF_RXQ_CTX_DBUFF_SHIFT 7
-
/* IAVF_MASK is a macro used on 32 bit registers */
#define IAVF_MASK(mask, shift) ((u32)(mask) << (shift))

diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 37d0e4313130..12da9401c46d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -288,10 +288,6 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
if (!vqci)
return;

- /* Limit maximum frame size when jumbo frames is not enabled */
- if (adapter->netdev->mtu <= ETH_DATA_LEN)
- max_frame = IAVF_RXBUFFER_1536 - NET_IP_ALIGN;
-
vqci->vsi_id = adapter->vsi_res->vsi_id;
vqci->num_queue_pairs = pairs;
vqpi = vqci->qpair;
@@ -308,9 +304,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
vqpi->rxq.ring_len = adapter->rx_rings[i].count;
vqpi->rxq.dma_ring_addr = adapter->rx_rings[i].dma;
vqpi->rxq.max_pkt_size = max_frame;
- vqpi->rxq.databuffer_size =
- ALIGN(adapter->rx_rings[i].rx_buf_len,
- BIT_ULL(IAVF_RXQ_CTX_DBUFF_SHIFT));
+ vqpi->rxq.databuffer_size = IAVF_RXBUFFER_3072;
if (CRC_OFFLOAD_ALLOWED(adapter))
vqpi->rxq.crc_disable = !!(adapter->netdev->features &
NETIF_F_RXFCS);
--
2.43.0

2023-12-07 17:24:29

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 10/12] iavf: switch to Page Pool

Now that the IAVF driver simply uses dev_alloc_page() + free_page() with
no custom recycling logics, it can easily be switched to using Page
Pool / libie API instead.
This allows to removing the whole dancing around headroom, HW buffer
size, and page order. All DMA-for-device is now done in the PP core,
for-CPU -- in the libie helper.
Use skb_mark_for_recycle() to bring back the recycling and restore the
performance. Speaking of performance: on par with the baseline and
faster with the PP optimization series applied. But the memory usage for
1500b MTU is now almost 2x lower (x86_64) thanks to allocating a page
every second descriptor.

Signed-off-by: Alexander Lobakin <[email protected]>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 7 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 246 ++++++------------
drivers/net/ethernet/intel/iavf/iavf_txrx.h | 34 +--
.../net/ethernet/intel/iavf/iavf_virtchnl.c | 10 +-
4 files changed, 92 insertions(+), 205 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 060002d1ebf0..0becfdf2ed99 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */

+#include <linux/net/intel/libie/rx.h>
+
#include "iavf.h"
#include "iavf_prototype.h"
/* All iavf tracepoints are defined by the include below, which must
@@ -1605,7 +1607,6 @@ static int iavf_alloc_queues(struct iavf_adapter *adapter)
rx_ring = &adapter->rx_rings[i];
rx_ring->queue_index = i;
rx_ring->netdev = adapter->netdev;
- rx_ring->dev = &adapter->pdev->dev;
rx_ring->count = adapter->rx_desc_count;
rx_ring->itr_setting = IAVF_ITR_RX_DEF;
}
@@ -2637,9 +2638,9 @@ static void iavf_init_config_adapter(struct iavf_adapter *adapter)
iavf_set_ethtool_ops(netdev);
netdev->watchdog_timeo = 5 * HZ;

- /* MTU range: 68 - 9710 */
+ /* MTU range: 68 - 16356 */
netdev->min_mtu = ETH_MIN_MTU;
- netdev->max_mtu = IAVF_MAX_RXBUFFER - IAVF_PACKET_HDR_PAD;
+ netdev->max_mtu = LIBIE_MAX_MTU;

if (!is_valid_ether_addr(adapter->hw.mac.addr)) {
dev_info(&pdev->dev, "Invalid MAC address %pM, using random\n",
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 62f976d322ab..aa86a0f926c6 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -689,9 +689,6 @@ int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring)
**/
static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
{
- unsigned long bi_size;
- u16 i;
-
/* ring already cleared, nothing to do */
if (!rx_ring->rx_bi)
return;
@@ -701,40 +698,16 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
rx_ring->skb = NULL;
}

- /* Free all the Rx ring sk_buffs */
- for (i = 0; i < rx_ring->count; i++) {
- struct iavf_rx_buffer *rx_bi = &rx_ring->rx_bi[i];
+ /* Free all the Rx ring buffers */
+ for (u32 i = rx_ring->next_to_clean; i != rx_ring->next_to_use; ) {
+ const struct libie_rx_buffer *rx_bi = &rx_ring->rx_bi[i];

- if (!rx_bi->page)
- continue;
+ page_pool_put_full_page(rx_ring->pp, rx_bi->page, false);

- /* Invalidate cache lines that may have been written to by
- * device so that we avoid corrupting memory.
- */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_bi->dma,
- rx_bi->page_offset,
- IAVF_RXBUFFER_3072,
- DMA_FROM_DEVICE);
-
- /* free resources associated with mapping */
- dma_unmap_page_attrs(rx_ring->dev, rx_bi->dma,
- iavf_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IAVF_RX_DMA_ATTR);
-
- __free_page(rx_bi->page);
-
- rx_bi->page = NULL;
- rx_bi->page_offset = 0;
+ if (unlikely(++i == rx_ring->count))
+ i = 0;
}

- bi_size = sizeof(struct iavf_rx_buffer) * rx_ring->count;
- memset(rx_ring->rx_bi, 0, bi_size);
-
- /* Zero out the descriptor ring */
- memset(rx_ring->desc, 0, rx_ring->size);
-
rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;
}
@@ -747,15 +720,22 @@ static void iavf_clean_rx_ring(struct iavf_ring *rx_ring)
**/
void iavf_free_rx_resources(struct iavf_ring *rx_ring)
{
+ struct libie_buf_queue bq = {
+ .pp = rx_ring->pp,
+ };
+
iavf_clean_rx_ring(rx_ring);
kfree(rx_ring->rx_bi);
rx_ring->rx_bi = NULL;

if (rx_ring->desc) {
- dma_free_coherent(rx_ring->dev, rx_ring->size,
+ dma_free_coherent(rx_ring->pp->p.dev, rx_ring->size,
rx_ring->desc, rx_ring->dma);
rx_ring->desc = NULL;
}
+
+ libie_rx_page_pool_destroy(&bq);
+ rx_ring->pp = bq.pp;
}

/**
@@ -766,13 +746,23 @@ void iavf_free_rx_resources(struct iavf_ring *rx_ring)
**/
int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
{
- struct device *dev = rx_ring->dev;
- int bi_size;
+ struct libie_buf_queue bq = {
+ .count = rx_ring->count,
+ };
+ int ret;
+
+ ret = libie_rx_page_pool_create(&bq, &rx_ring->q_vector->napi);
+ if (ret)
+ return ret;
+
+ rx_ring->pp = bq.pp;
+ rx_ring->truesize = bq.truesize;
+ rx_ring->rx_buf_len = bq.rx_buf_len;

/* warn if we are about to overwrite the pointer */
WARN_ON(rx_ring->rx_bi);
- bi_size = sizeof(struct iavf_rx_buffer) * rx_ring->count;
- rx_ring->rx_bi = kzalloc(bi_size, GFP_KERNEL);
+ rx_ring->rx_bi = kcalloc(rx_ring->count, sizeof(*rx_ring->rx_bi),
+ GFP_KERNEL);
if (!rx_ring->rx_bi)
goto err;

@@ -781,22 +771,28 @@ int iavf_setup_rx_descriptors(struct iavf_ring *rx_ring)
/* Round up to nearest 4K */
rx_ring->size = rx_ring->count * sizeof(union iavf_32byte_rx_desc);
rx_ring->size = ALIGN(rx_ring->size, 4096);
- rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
+ rx_ring->desc = dma_alloc_coherent(bq.pp->p.dev, rx_ring->size,
&rx_ring->dma, GFP_KERNEL);

if (!rx_ring->desc) {
- dev_info(dev, "Unable to allocate memory for the Rx descriptor ring, size=%d\n",
+ dev_info(bq.pp->p.dev, "Unable to allocate memory for the Rx descriptor ring, size=%d\n",
rx_ring->size);
- goto err;
+ goto err_free_buf;
}

rx_ring->next_to_clean = 0;
rx_ring->next_to_use = 0;

return 0;
-err:
+
+err_free_buf:
kfree(rx_ring->rx_bi);
rx_ring->rx_bi = NULL;
+
+err:
+ libie_rx_page_pool_destroy(&bq);
+ rx_ring->pp = bq.pp;
+
return -ENOMEM;
}

@@ -818,49 +814,6 @@ static void iavf_release_rx_desc(struct iavf_ring *rx_ring, u32 val)
writel(val, rx_ring->tail);
}

-/**
- * iavf_alloc_mapped_page - recycle or make a new page
- * @rx_ring: ring to use
- * @bi: rx_buffer struct to modify
- *
- * Returns true if the page was successfully allocated or
- * reused.
- **/
-static bool iavf_alloc_mapped_page(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *bi)
-{
- struct page *page = bi->page;
- dma_addr_t dma;
-
- /* alloc new page for storage */
- page = dev_alloc_pages(iavf_rx_pg_order(rx_ring));
- if (unlikely(!page)) {
- rx_ring->rx_stats.alloc_page_failed++;
- return false;
- }
-
- /* map page for use */
- dma = dma_map_page_attrs(rx_ring->dev, page, 0,
- iavf_rx_pg_size(rx_ring),
- DMA_FROM_DEVICE,
- IAVF_RX_DMA_ATTR);
-
- /* if mapping failed free memory back to system since
- * there isn't much point in holding memory we can't use
- */
- if (dma_mapping_error(rx_ring->dev, dma)) {
- __free_pages(page, iavf_rx_pg_order(rx_ring));
- rx_ring->rx_stats.alloc_page_failed++;
- return false;
- }
-
- bi->dma = dma;
- bi->page = page;
- bi->page_offset = IAVF_SKB_PAD;
-
- return true;
-}
-
/**
* iavf_receive_skb - Send a completed packet up the stack
* @rx_ring: rx ring in play
@@ -891,38 +844,37 @@ static void iavf_receive_skb(struct iavf_ring *rx_ring,
**/
bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
{
+ const struct libie_buf_queue bq = {
+ .pp = rx_ring->pp,
+ .rx_bi = rx_ring->rx_bi,
+ .truesize = rx_ring->truesize,
+ .count = rx_ring->count,
+ };
u16 ntu = rx_ring->next_to_use;
union iavf_rx_desc *rx_desc;
- struct iavf_rx_buffer *bi;

/* do nothing if no valid netdev defined */
if (!rx_ring->netdev || !cleaned_count)
return false;

rx_desc = IAVF_RX_DESC(rx_ring, ntu);
- bi = &rx_ring->rx_bi[ntu];

do {
- if (!iavf_alloc_mapped_page(rx_ring, bi))
- goto no_buffers;
+ dma_addr_t addr;

- /* sync the buffer for use by the device */
- dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
- bi->page_offset,
- IAVF_RXBUFFER_3072,
- DMA_FROM_DEVICE);
+ addr = libie_rx_alloc(&bq, ntu);
+ if (addr == DMA_MAPPING_ERROR)
+ goto no_buffers;

/* Refresh the desc even if buffer_addrs didn't change
* because each write-back erases this info.
*/
- rx_desc->read.pkt_addr = cpu_to_le64(bi->dma + bi->page_offset);
+ rx_desc->read.pkt_addr = cpu_to_le64(addr);

rx_desc++;
- bi++;
ntu++;
if (unlikely(ntu == rx_ring->count)) {
rx_desc = IAVF_RX_DESC(rx_ring, 0);
- bi = rx_ring->rx_bi;
ntu = 0;
}

@@ -941,6 +893,8 @@ bool iavf_alloc_rx_buffers(struct iavf_ring *rx_ring, u16 cleaned_count)
if (rx_ring->next_to_use != ntu)
iavf_release_rx_desc(rx_ring, ntu);

+ rx_ring->rx_stats.alloc_page_failed++;
+
/* make sure to come back via polling to try again after
* allocation failure
*/
@@ -1091,9 +1045,8 @@ static bool iavf_cleanup_headers(struct iavf_ring *rx_ring, struct sk_buff *skb)

/**
* iavf_add_rx_frag - Add contents of Rx buffer to sk_buff
- * @rx_ring: rx descriptor ring to transact packets on
- * @rx_buffer: buffer containing page to add
* @skb: sk_buff to place the data into
+ * @rx_buffer: buffer containing page to add
* @size: packet length from rx_desc
*
* This function will add the data contained in rx_buffer->page to the skb.
@@ -1101,105 +1054,49 @@ static bool iavf_cleanup_headers(struct iavf_ring *rx_ring, struct sk_buff *skb)
*
* The function will then update the page offset.
**/
-static void iavf_add_rx_frag(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer,
- struct sk_buff *skb,
+static void iavf_add_rx_frag(struct sk_buff *skb,
+ const struct libie_rx_buffer *rx_buffer,
unsigned int size)
{
- unsigned int truesize = SKB_DATA_ALIGN(size + IAVF_SKB_PAD);
-
- if (!size)
- return;
+ u32 hr = rx_buffer->page->pp->p.offset;

skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
- rx_buffer->page_offset, size, truesize);
-}
-
-/**
- * iavf_get_rx_buffer - Fetch Rx buffer and synchronize data for use
- * @rx_ring: rx descriptor ring to transact packets on
- * @size: size of buffer to add to skb
- *
- * This function will pull an Rx buffer from the ring and synchronize it
- * for use by the CPU.
- */
-static struct iavf_rx_buffer *iavf_get_rx_buffer(struct iavf_ring *rx_ring,
- const unsigned int size)
-{
- struct iavf_rx_buffer *rx_buffer;
-
- rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
- prefetchw(rx_buffer->page);
- if (!size)
- return rx_buffer;
-
- /* we are reusing so sync this buffer for CPU use */
- dma_sync_single_range_for_cpu(rx_ring->dev,
- rx_buffer->dma,
- rx_buffer->page_offset,
- size,
- DMA_FROM_DEVICE);
-
- return rx_buffer;
+ rx_buffer->offset + hr, size, rx_buffer->truesize);
}

/**
* iavf_build_skb - Build skb around an existing buffer
- * @rx_ring: Rx descriptor ring to transact packets on
* @rx_buffer: Rx buffer to pull data from
* @size: size of buffer to add to skb
*
* This function builds an skb around an existing Rx buffer, taking care
* to set up the skb correctly and avoid any memcpy overhead.
*/
-static struct sk_buff *iavf_build_skb(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer,
+static struct sk_buff *iavf_build_skb(const struct libie_rx_buffer *rx_buffer,
unsigned int size)
{
- void *va;
- unsigned int truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) +
- SKB_DATA_ALIGN(IAVF_SKB_PAD + size);
+ u32 hr = rx_buffer->page->pp->p.offset;
struct sk_buff *skb;
+ void *va;

- if (!rx_buffer || !size)
- return NULL;
/* prefetch first cache line of first page */
- va = page_address(rx_buffer->page) + rx_buffer->page_offset;
- net_prefetch(va);
+ va = page_address(rx_buffer->page) + rx_buffer->offset;
+ net_prefetch(va + hr);

/* build an skb around the page buffer */
- skb = napi_build_skb(va - IAVF_SKB_PAD, truesize);
+ skb = napi_build_skb(va, rx_buffer->truesize);
if (unlikely(!skb))
return NULL;

+ skb_mark_for_recycle(skb);
+
/* update pointers within the skb to store the data */
- skb_reserve(skb, IAVF_SKB_PAD);
+ skb_reserve(skb, hr);
__skb_put(skb, size);

return skb;
}

-/**
- * iavf_put_rx_buffer - Unmap used buffer
- * @rx_ring: rx descriptor ring to transact packets on
- * @rx_buffer: rx buffer to pull data from
- *
- * This function will unmap the buffer after it's written by HW.
- */
-static void iavf_put_rx_buffer(struct iavf_ring *rx_ring,
- struct iavf_rx_buffer *rx_buffer)
-{
- if (!rx_buffer)
- return;
-
- /* we are not reusing the buffer so unmap it */
- dma_unmap_page_attrs(rx_ring->dev, rx_buffer->dma, PAGE_SIZE,
- DMA_FROM_DEVICE, IAVF_RX_DMA_ATTR);
-
- /* clear contents of buffer_info */
- rx_buffer->page = NULL;
-}
-
/**
* iavf_is_non_eop - process handling of non-EOP buffers
* @rx_ring: Rx ring being processed
@@ -1253,7 +1150,7 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
bool failure = false;

while (likely(total_rx_packets < (unsigned int)budget)) {
- struct iavf_rx_buffer *rx_buffer;
+ struct libie_rx_buffer *rx_buffer;
union iavf_rx_desc *rx_desc;
unsigned int size;
u16 vlan_tag = 0;
@@ -1289,13 +1186,16 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
IAVF_RXD_QW1_LENGTH_PBUF_SHIFT;

iavf_trace(clean_rx_irq, rx_ring, rx_desc, skb);
- rx_buffer = iavf_get_rx_buffer(rx_ring, size);
+
+ rx_buffer = &rx_ring->rx_bi[rx_ring->next_to_clean];
+ if (!libie_rx_sync_for_cpu(rx_buffer, size))
+ goto skip_data;

/* retrieve a buffer from the ring */
if (skb)
- iavf_add_rx_frag(rx_ring, rx_buffer, skb, size);
+ iavf_add_rx_frag(skb, rx_buffer, size);
else
- skb = iavf_build_skb(rx_ring, rx_buffer, size);
+ skb = iavf_build_skb(rx_buffer, size);

/* exit if we failed to retrieve a buffer */
if (!skb) {
@@ -1303,7 +1203,7 @@ static int iavf_clean_rx_irq(struct iavf_ring *rx_ring, int budget)
break;
}

- iavf_put_rx_buffer(rx_ring, rx_buffer);
+skip_data:
cleaned_count++;

if (iavf_is_non_eop(rx_ring, rx_desc, skb))
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.h b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
index 9b8154c5f4fb..72c6bc64d94e 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.h
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.h
@@ -81,18 +81,8 @@ enum iavf_dyn_idx_t {
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_UNICAST_IPV6_UDP) | \
BIT_ULL(IAVF_FILTER_PCTYPE_NONF_MULTICAST_IPV6_UDP))

-/* Supported Rx Buffer Sizes (a multiple of 128) */
-#define IAVF_RXBUFFER_3072 3072 /* Used for large frames w/ padding */
-#define IAVF_MAX_RXBUFFER 9728 /* largest size for single descriptor */
-
-#define IAVF_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
#define iavf_rx_desc iavf_32byte_rx_desc

-#define IAVF_RX_DMA_ATTR \
- (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
-
-#define IAVF_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
-
/**
* iavf_test_staterr - tests bits in Rx descriptor status and error fields
* @rx_desc: pointer to receive descriptor (in le64 format)
@@ -211,12 +201,6 @@ struct iavf_tx_buffer {
u32 tx_flags;
};

-struct iavf_rx_buffer {
- dma_addr_t dma;
- struct page *page;
- __u32 page_offset;
-};
-
struct iavf_queue_stats {
u64 packets;
u64 bytes;
@@ -252,13 +236,18 @@ struct iavf_rx_queue_stats {
struct iavf_ring {
struct iavf_ring *next; /* pointer to next ring in q_vector */
void *desc; /* Descriptor ring memory */
- struct device *dev; /* Used for DMA mapping */
+ union {
+ struct page_pool *pp; /* Used on Rx for buffer management */
+ struct device *dev; /* Used on Tx for DMA mapping */
+ };
struct net_device *netdev; /* netdev ring maps to */
union {
+ struct libie_rx_buffer *rx_bi;
struct iavf_tx_buffer *tx_bi;
- struct iavf_rx_buffer *rx_bi;
};
u8 __iomem *tail;
+ u32 truesize;
+
u16 queue_index; /* Queue number of ring */

/* high bit set means dynamic, use accessors routines to read/write.
@@ -306,6 +295,8 @@ struct iavf_ring {
* iavf_clean_rx_ring_irq() is called
* for this ring.
*/
+
+ u32 rx_buf_len;
} ____cacheline_internodealigned_in_smp;

#define IAVF_ITR_ADAPTIVE_MIN_INC 0x0002
@@ -329,13 +320,6 @@ struct iavf_ring_container {
#define iavf_for_each_ring(pos, head) \
for (pos = (head).ring; pos != NULL; pos = pos->next)

-static inline unsigned int iavf_rx_pg_order(struct iavf_ring *ring)
-{
- return 0;
-}
-
-#define iavf_rx_pg_size(_ring) (PAGE_SIZE << iavf_rx_pg_order(_ring))
-
bool iavf_alloc_rx_buffers(struct iavf_ring *rxr, u16 cleaned_count);
netdev_tx_t iavf_xmit_frame(struct sk_buff *skb, struct net_device *netdev);
int iavf_setup_tx_descriptors(struct iavf_ring *tx_ring);
diff --git a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
index 12da9401c46d..58dad471004d 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_virtchnl.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2013 - 2018 Intel Corporation. */

+#include <linux/net/intel/libie/rx.h>
+
#include "iavf.h"
#include "iavf_prototype.h"

@@ -268,13 +270,13 @@ int iavf_get_vf_vlan_v2_caps(struct iavf_adapter *adapter)
void iavf_configure_queues(struct iavf_adapter *adapter)
{
struct virtchnl_vsi_queue_config_info *vqci;
- int i, max_frame = adapter->vf_res->max_mtu;
int pairs = adapter->num_active_queues;
struct virtchnl_queue_pair_info *vqpi;
+ u32 i, max_frame;
size_t len;

- if (max_frame > IAVF_MAX_RXBUFFER || !max_frame)
- max_frame = IAVF_MAX_RXBUFFER;
+ max_frame = LIBIE_MAX_RX_FRM_LEN(adapter->rx_rings->pp->p.offset);
+ max_frame = min_not_zero(adapter->vf_res->max_mtu, max_frame);

if (adapter->current_op != VIRTCHNL_OP_UNKNOWN) {
/* bail because we already have a command pending */
@@ -304,7 +306,7 @@ void iavf_configure_queues(struct iavf_adapter *adapter)
vqpi->rxq.ring_len = adapter->rx_rings[i].count;
vqpi->rxq.dma_ring_addr = adapter->rx_rings[i].dma;
vqpi->rxq.max_pkt_size = max_frame;
- vqpi->rxq.databuffer_size = IAVF_RXBUFFER_3072;
+ vqpi->rxq.databuffer_size = adapter->rx_rings[i].rx_buf_len;
if (CRC_OFFLOAD_ALLOWED(adapter))
vqpi->rxq.crc_disable = !!(adapter->netdev->features &
NETIF_F_RXFCS);
--
2.43.0

2023-12-07 17:24:45

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v6 06/12] page_pool: constify some read-only function arguments

There are several functions taking pointers to data they don't modify.
This includes statistics fetching, page and page_pool parameters, etc.
Constify the pointers, so that call sites will be able to pass const
pointers as well.
No functional changes, no visible changes in functions sizes.

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/helpers.h | 10 +++++-----
net/core/page_pool.c | 8 ++++----
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 7dc65774cde5..c860fad50d00 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -58,7 +58,7 @@
/* Deprecated driver-facing API, use netlink instead */
int page_pool_ethtool_stats_get_count(void);
u8 *page_pool_ethtool_stats_get_strings(u8 *data);
-u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
+u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats);

bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats);
@@ -73,7 +73,7 @@ static inline u8 *page_pool_ethtool_stats_get_strings(u8 *data)
return data;
}

-static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
+static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
return data;
}
@@ -204,8 +204,8 @@ static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
* Get the stored dma direction. A driver might decide to store this locally
* and avoid the extra cache line from page_pool to determine the direction.
*/
-static
-inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
+static inline enum dma_data_direction
+page_pool_get_dma_dir(const struct page_pool *pool)
{
return pool->p.dma_dir;
}
@@ -357,7 +357,7 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va,
* Fetch the DMA address of the page. The page pool to which the page belongs
* must had been created with PP_FLAG_DMA_MAP.
*/
-static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
+static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
{
dma_addr_t ret = page->dma_addr;

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 59aca3339222..4295aec0be40 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -121,9 +121,9 @@ int page_pool_ethtool_stats_get_count(void)
}
EXPORT_SYMBOL(page_pool_ethtool_stats_get_count);

-u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
+u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
- struct page_pool_stats *pool_stats = stats;
+ const struct page_pool_stats *pool_stats = stats;

*data++ = pool_stats->alloc_stats.fast;
*data++ = pool_stats->alloc_stats.slow;
@@ -360,8 +360,8 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
return page;
}

-static void page_pool_dma_sync_for_device(struct page_pool *pool,
- struct page *page,
+static void page_pool_dma_sync_for_device(const struct page_pool *pool,
+ const struct page *page,
unsigned int dma_sync_size)
{
dma_addr_t dma_addr = page_pool_get_dma_addr(page);
--
2.43.0

2023-12-08 09:28:37

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH net-next v6 08/12] libie: add Rx buffer management (via Page Pool)

On 2023/12/8 1:20, Alexander Lobakin wrote:
...

> +
> +/**
> + * libie_rx_page_pool_create - create a PP with the default libie settings
> + * @bq: buffer queue struct to fill
> + * @napi: &napi_struct covering this PP (no usage outside its poll loops)
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +int libie_rx_page_pool_create(struct libie_buf_queue *bq,
> + struct napi_struct *napi)
> +{
> + struct page_pool_params pp = {
> + .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
> + .order = LIBIE_RX_PAGE_ORDER,
> + .pool_size = bq->count,
> + .nid = NUMA_NO_NODE,

Is there a reason the NUMA_NO_NODE is used here instead of
dev_to_node(napi->dev->dev.parent)?

> + .dev = napi->dev->dev.parent,
> + .netdev = napi->dev,
> + .napi = napi,
> + .dma_dir = DMA_FROM_DEVICE,
> + .offset = LIBIE_SKB_HEADROOM,
> + };
> +
> + /* HW-writeable / syncable length per one page */
> + pp.max_len = LIBIE_RX_BUF_LEN(pp.offset);
> +
> + /* HW-writeable length per buffer */
> + bq->rx_buf_len = libie_rx_hw_len(&pp);
> + /* Buffer size to allocate */
> + bq->truesize = roundup_pow_of_two(SKB_HEAD_ALIGN(pp.offset +
> + bq->rx_buf_len));
> +
> + bq->pp = page_pool_create(&pp);
> +
> + return PTR_ERR_OR_ZERO(bq->pp);
> +}
> +EXPORT_SYMBOL_NS_GPL(libie_rx_page_pool_create, LIBIE);
> +

...

> +/**
> + * libie_rx_sync_for_cpu - synchronize or recycle buffer post DMA
> + * @buf: buffer to process
> + * @len: frame length from the descriptor
> + *
> + * Process the buffer after it's written by HW. The regular path is to
> + * synchronize DMA for CPU, but in case of no data it will be immediately
> + * recycled back to its PP.
> + *
> + * Return: true when there's data to process, false otherwise.
> + */
> +static inline bool libie_rx_sync_for_cpu(const struct libie_rx_buffer *buf,
> + u32 len)
> +{
> + struct page *page = buf->page;
> +
> + /* Very rare, but possible case. The most common reason:
> + * the last fragment contained FCS only, which was then
> + * stripped by the HW.
> + */
> + if (unlikely(!len)) {
> + page_pool_recycle_direct(page->pp, page);
> + return false;
> + }
> +
> + page_pool_dma_sync_for_cpu(page->pp, page, buf->offset, len);

Is there a reason why page_pool_dma_sync_for_cpu() is still used when
page_pool_create() is called with PP_FLAG_DMA_SYNC_DEV flag? Isn't syncing
already handled in page_pool core when when PP_FLAG_DMA_SYNC_DEV flag is
set?

> +
> + return true;
> +}
>
> /* O(1) converting i40e/ice/iavf's 8/10-bit hardware packet type to a parsed
> * bitfield struct.
>

2023-12-08 11:29:04

by Yunsheng Lin

[permalink] [raw]
Subject: Re: [PATCH net-next v6 08/12] libie: add Rx buffer management (via Page Pool)

On 2023/12/8 17:28, Yunsheng Lin wrote:

>> +
>> + page_pool_dma_sync_for_cpu(page->pp, page, buf->offset, len);
>
> Is there a reason why page_pool_dma_sync_for_cpu() is still used when
> page_pool_create() is called with PP_FLAG_DMA_SYNC_DEV flag? Isn't syncing
> already handled in page_pool core when when PP_FLAG_DMA_SYNC_DEV flag is
> set?

Ah, it is a sync_for_cpu.
Ignore this one.

2023-12-09 01:15:56

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v6 04/12] iavf: kill "legacy-rx" for good

On Thu, 7 Dec 2023 18:20:02 +0100 Alexander Lobakin wrote:
> Ever since build_skb() became stable, the old way with allocating an skb
> for storing the headers separately, which will be then copied manually,
> was slower, less flexible, and thus obsolete.

This one has a conflict on net-next :(
--
pw-bot: cr

2023-12-11 10:18:14

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH net-next v6 08/12] libie: add Rx buffer management (via Page Pool)

From: Yunsheng Lin <[email protected]>
Date: Fri, 8 Dec 2023 17:28:21 +0800

> On 2023/12/8 1:20, Alexander Lobakin wrote:
> ...
>
>> +
>> +/**
>> + * libie_rx_page_pool_create - create a PP with the default libie settings
>> + * @bq: buffer queue struct to fill
>> + * @napi: &napi_struct covering this PP (no usage outside its poll loops)
>> + *
>> + * Return: 0 on success, -errno on failure.
>> + */
>> +int libie_rx_page_pool_create(struct libie_buf_queue *bq,
>> + struct napi_struct *napi)
>> +{
>> + struct page_pool_params pp = {
>> + .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
>> + .order = LIBIE_RX_PAGE_ORDER,
>> + .pool_size = bq->count,
>> + .nid = NUMA_NO_NODE,
>
> Is there a reason the NUMA_NO_NODE is used here instead of
> dev_to_node(napi->dev->dev.parent)?

NUMA_NO_NODE creates a "dynamic" page_pool and makes sure the pages are
local to the CPU where PP allocation functions are called. Setting ::nid
to a "static" value pins the PP to a particular node.
But the main reason is that Rx queues can be distributed across several
nodes and in that case NUMA_NO_NODE will make sure each page_pool is
local to the queue it's running on. dev_to_node() will return the same
value, thus forcing some PPs to allocate remote pages.

Ideally, I'd like to pass a CPU ID this queue will be run on and use
cpu_to_node(), but currently there's no NUMA-aware allocations in the
Intel drivers and Rx queues don't get the corresponding CPU ID when
configuring. I may revisit this later, but for now NUMA_NO_NODE is the
most optimal solution here.

[...]

Thanks,
Olek

2023-12-11 19:23:45

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v6 08/12] libie: add Rx buffer management (via Page Pool)

On Mon, 11 Dec 2023 11:16:20 +0100 Alexander Lobakin wrote:
> Ideally, I'd like to pass a CPU ID this queue will be run on and use
> cpu_to_node(), but currently there's no NUMA-aware allocations in the
> Intel drivers and Rx queues don't get the corresponding CPU ID when
> configuring. I may revisit this later, but for now NUMA_NO_NODE is the
> most optimal solution here.

Hm, I've been wondering about persistent IRQ mappings. Drivers
resetting IRQ mapping on reconfiguration is a major PITA in production
clusters. You change the RSS hash and some NICs suddenly forget
affinitization ????️

The connection with memory allocations changes the math on that a bit.

The question is really whether we add CPU <> NAPI config as a netdev
Netlink API or build around the generic IRQ affinity API. The latter
is definitely better from "don't duplicate uAPI" perspective.
But we need to reset the queues and reallocate their state when
the mapping is changed. And shutting down queues on

echo $cpu > /../smp_affinity_list

seems moderately insane. Perhaps some middle-ground exists.

Anyway, if you do find cycles to tackle this - pls try to do it
generically not just for Intel? :)

2023-12-12 16:15:31

by Ilias Apalodimas

[permalink] [raw]
Subject: Re: [PATCH net-next v6 06/12] page_pool: constify some read-only function arguments

Apologies for the noise,
Resending without HTML


On Thu, 7 Dec 2023 at 19:22, Alexander Lobakin
<[email protected]> wrote:
>
> There are several functions taking pointers to data they don't modify.
> This includes statistics fetching, page and page_pool parameters, etc.
> Constify the pointers, so that call sites will be able to pass const
> pointers as well.
> No functional changes, no visible changes in functions sizes.
>
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
> include/net/page_pool/helpers.h | 10 +++++-----
> net/core/page_pool.c | 8 ++++----
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 7dc65774cde5..c860fad50d00 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -58,7 +58,7 @@
> /* Deprecated driver-facing API, use netlink instead */
> int page_pool_ethtool_stats_get_count(void);
> u8 *page_pool_ethtool_stats_get_strings(u8 *data);
> -u64 *page_pool_ethtool_stats_get(u64 *data, void *stats);
> +u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats);
>
> bool page_pool_get_stats(const struct page_pool *pool,
> struct page_pool_stats *stats);
> @@ -73,7 +73,7 @@ static inline u8 *page_pool_ethtool_stats_get_strings(u8 *data)
> return data;
> }
>
> -static inline u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
> +static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
> {
> return data;
> }
> @@ -204,8 +204,8 @@ static inline void *page_pool_dev_alloc_va(struct page_pool *pool,
> * Get the stored dma direction. A driver might decide to store this locally
> * and avoid the extra cache line from page_pool to determine the direction.
> */
> -static
> -inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
> +static inline enum dma_data_direction
> +page_pool_get_dma_dir(const struct page_pool *pool)
> {
> return pool->p.dma_dir;
> }
> @@ -357,7 +357,7 @@ static inline void page_pool_free_va(struct page_pool *pool, void *va,
> * Fetch the DMA address of the page. The page pool to which the page belongs
> * must had been created with PP_FLAG_DMA_MAP.
> */
> -static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
> +static inline dma_addr_t page_pool_get_dma_addr(const struct page *page)
> {
> dma_addr_t ret = page->dma_addr;
>
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 59aca3339222..4295aec0be40 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -121,9 +121,9 @@ int page_pool_ethtool_stats_get_count(void)
> }
> EXPORT_SYMBOL(page_pool_ethtool_stats_get_count);
>
> -u64 *page_pool_ethtool_stats_get(u64 *data, void *stats)
> +u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
> {
> - struct page_pool_stats *pool_stats = stats;
> + const struct page_pool_stats *pool_stats = stats;
>
> *data++ = pool_stats->alloc_stats.fast;
> *data++ = pool_stats->alloc_stats.slow;
> @@ -360,8 +360,8 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
> return page;
> }
>
> -static void page_pool_dma_sync_for_device(struct page_pool *pool,
> - struct page *page,
> +static void page_pool_dma_sync_for_device(const struct page_pool *pool,
> + const struct page *page,
> unsigned int dma_sync_size)
> {
> dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> --
> 2.43.0
>

Reviewed-by: Ilias Apalodimas <[email protected]>

2023-12-13 11:25:33

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [Intel-wired-lan] [PATCH net-next v6 08/12] libie: add Rx buffer management (via Page Pool)

From: Jakub Kicinski <[email protected]>
Date: Mon, 11 Dec 2023 11:23:32 -0800

> On Mon, 11 Dec 2023 11:16:20 +0100 Alexander Lobakin wrote:
>> Ideally, I'd like to pass a CPU ID this queue will be run on and use
>> cpu_to_node(), but currently there's no NUMA-aware allocations in the
>> Intel drivers and Rx queues don't get the corresponding CPU ID when
>> configuring. I may revisit this later, but for now NUMA_NO_NODE is the
>> most optimal solution here.
>
> Hm, I've been wondering about persistent IRQ mappings. Drivers
> resetting IRQ mapping on reconfiguration is a major PITA in production
> clusters. You change the RSS hash and some NICs suddenly forget
> affinitization ????️
>
> The connection with memory allocations changes the math on that a bit.
>
> The question is really whether we add CPU <> NAPI config as a netdev
> Netlink API or build around the generic IRQ affinity API. The latter
> is definitely better from "don't duplicate uAPI" perspective.
> But we need to reset the queues and reallocate their state when
> the mapping is changed. And shutting down queues on
>
> echo $cpu > /../smp_affinity_list
>
> seems moderately insane. Perhaps some middle-ground exists.
>
> Anyway, if you do find cycles to tackle this - pls try to do it
> generically not just for Intel? :)

Sounds good, adding to my fathomless backlog :>

Thanks,
Olek