2023-08-03 19:21:48

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v3 0/6] page_pool: a couple of assorted optimizations

That initially was a spin-off of the IAVF PP series[0], but has grown
(and shrunk) since then a bunch. In fact, it consists of three
semi-independent blocks:

* #1-2: Compile-time optimization. Split page_pool.h into 2 headers to
not overbloat the consumers not needing complex inline helpers and
then stop including it in skbuff.h at all. The first patch is also
prereq for the whole series.
* #3: Improve cacheline locality for users of the Page Pool frag API.
* #4-6: Use direct cache recycling more aggressively, when it is safe
obviously. In addition, make sure nobody wants to use Page Pool API
with disabled interrupts.

Patches #1 and #5 are authored by Yunsheng and Jakub respectively, with
small modifications from my side as per ML discussions.
For the perf numbers for #3-6, please see individual commit messages.

Also available on my GH with many more Page Pool goodies[1].

[0] https://lore.kernel.org/netdev/[email protected]
[1] https://github.com/alobakin/linux/commits/iavf-pp-frag

Alexander Lobakin (4):
net: skbuff: don't include <net/page_pool/types.h> to <linux/skbuff.h>
page_pool: place frag_* fields in one cacheline
net: skbuff: avoid accessing page_pool if !napi_safe when returning
page
net: skbuff: always try to recycle PP pages directly when in softirq

Jakub Kicinski (1):
page_pool: add a lockdep check for recycling in hardirq

Yunsheng Lin (1):
page_pool: split types and declarations from page_pool.h

MAINTAINERS | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 2 +-
drivers/net/ethernet/engleder/tsnep_main.c | 1 +
drivers/net/ethernet/freescale/fec_main.c | 1 +
.../net/ethernet/hisilicon/hns3/hns3_enet.c | 1 +
.../net/ethernet/hisilicon/hns3/hns3_enet.h | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +-
.../net/ethernet/marvell/mvpp2/mvpp2_main.c | 1 +
.../marvell/octeontx2/nic/otx2_common.c | 1 +
.../ethernet/marvell/octeontx2/nic/otx2_pf.c | 1 +
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 1 +
drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
.../ethernet/mellanox/mlx5/core/en/params.c | 1 +
.../net/ethernet/mellanox/mlx5/core/en/trap.c | 1 -
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 1 +
.../net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 2 +-
.../ethernet/mellanox/mlx5/core/en_stats.c | 2 +-
.../ethernet/microchip/lan966x/lan966x_fdma.c | 1 +
.../ethernet/microchip/lan966x/lan966x_main.h | 2 +-
drivers/net/ethernet/socionext/netsec.c | 2 +-
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +-
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 1 +
drivers/net/ethernet/ti/cpsw.c | 2 +-
drivers/net/ethernet/ti/cpsw_new.c | 2 +-
drivers/net/ethernet/ti/cpsw_priv.c | 2 +-
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 2 +-
drivers/net/veth.c | 2 +-
drivers/net/wireless/mediatek/mt76/mac80211.c | 1 -
drivers/net/wireless/mediatek/mt76/mt76.h | 1 +
drivers/net/xen-netfront.c | 2 +-
include/linux/lockdep.h | 7 +
include/linux/skbuff.h | 5 +-
.../net/{page_pool.h => page_pool/helpers.h} | 246 +-----------------
include/net/page_pool/types.h | 236 +++++++++++++++++
include/trace/events/page_pool.h | 2 +-
net/bpf/test_run.c | 2 +-
net/core/page_pool.c | 43 +--
net/core/skbuff.c | 49 +++-
net/core/xdp.c | 2 +-
42 files changed, 337 insertions(+), 307 deletions(-)
rename include/net/{page_pool.h => page_pool/helpers.h} (50%)
create mode 100644 include/net/page_pool/types.h

---
From v2[2]:
* just rebase on top of Jakub's kdoc changes landed last minute.

From v1[3]:
* move the "avoid calling no-op DMA sync ops" piece out of the series --
will join some other or transform into something else (Jakub et al.);
* #1: restore accidentally removed path in MAINTAINERS (Yunsheng);
* #1: prefer `include/net/page_pool/` over `include/net/page_pool/*` in
MAINTAINERS (Yunsheng);
* #2: rename page_pool_return_skb_page() to napi_pp_put_page() -- the old
name seems to not make any sense for some time already (Yunsheng);
* #2: guard napi_pp_put_page() with CONFIG_PAGE_POOL in skbuff.c (to not
compile it when there are no users) (Yunsheng).

From RFC v2[4]:
* drop the dependency on the hybrid allocation series (and thus the
"RFC" prefix) -- it wasn't a strict dep and it's not in the trees yet;
* add [slightly reworked] Yunsheng's patch which splits page_pool.h into
2 headers -- merge conflict hell otherwise.
Also fix a typo while nobody looks (Simon);
* #3 (former #2): word the commitmsg a bit better, mention the main
reason for the change more clearly (Ilias);
* add Jakub's hardirq assertion as a prereq for the last patch;
* #9 (former #7): add comment mentioning that the hardirq case is not
checked due to the assertion checking it later (yes, it is illegal to
use Page Pool with the interrupts disabled or when in TH) (Jakub).

From RFC v1[5]:
* #1: move the entire function to skbuff.c, don't try to split it (Alex);
* #2-4: new;
* #5: use internal flags field added in #4 and don't modify driver-defined
structure (Alex, Jakub);
* #6: new;
* drop "add new NAPI state" as a redundant complication;
* #7: replace the check for the new NAPI state to just in_softirq(), should
be fine (Jakub).

[2] https://lore.kernel.org/netdev/[email protected]
[3] https://lore.kernel.org/netdev/[email protected]
[4] https://lore.kernel.org/netdev/[email protected]
[5] https://lore.kernel.org/netdev/[email protected]
--
2.41.0



2023-08-03 19:36:32

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v3 6/6] net: skbuff: always try to recycle PP pages directly when in softirq

Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") allowed direct recycling of skb pages to their PP for some cases,
but unfortunately missed a couple of other majors.
For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
which unconditionally passes `false` as @napi_safe. Thus, all pages go
through ptr_ring and locks, although most of time we're actually inside
the NAPI polling this PP is linked with, so that it would be perfectly
safe to recycle pages directly.
Let's address such. If @napi_safe is true, we're fine, don't change
anything for this path. But if it's false, check whether we are in the
softirq context. It will most likely be so and then if ->list_owner
is our current CPU, we're good to use direct recycling, even though
@napi_safe is false -- concurrent access is excluded. in_softirq()
protection is needed mostly due to we can hit this place in the
process context (not the hardirq though).
For the mentioned xdp-drop-skb-mode case, the improvement I got is
3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and
alloc_slow counter doesn't change most of time, which means the
MM layer is not even called to allocate any new pages.

Suggested-by: Jakub Kicinski <[email protected]> # in_softirq()
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/core/skbuff.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 85f82a6a08dc..33fdf04d4334 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -902,8 +902,10 @@ bool napi_pp_put_page(struct page *page, bool napi_safe)
/* Allow direct recycle if we have reasons to believe that we are
* in the same context as the consumer would run, so there's
* no possible race.
+ * __page_pool_put_page() makes sure we're not in hardirq context
+ * and interrupts are enabled prior to accessing the cache.
*/
- if (napi_safe) {
+ if (napi_safe || in_softirq()) {
const struct napi_struct *napi = READ_ONCE(pp->p.napi);

allow_direct = napi &&
--
2.41.0


2023-08-03 20:57:41

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH net-next v3 3/6] page_pool: place frag_* fields in one cacheline

On x86_64, frag_* fields of struct page_pool are scattered across two
cachelines despite the summary size of 24 bytes. All three fields are
used in pretty much the same places, but the last field, ::frag_users,
is pushed out to the next CL, provoking unwanted false-sharing on
hotpath (frags allocation code).
There are some holes and cold members to move around. Move frag_* one
block up, placing them right after &page_pool_params perfectly at the
beginning of CL2. This doesn't do any meaningful to the second block, as
those are some destroy-path cold structures, and doesn't do anything to
::alloc_stats, which still starts at 200-byte offset, 8 bytes after CL3
(still fitting into 1 cacheline).
On my setup, this yields 1-2% of Mpps when using PP frags actively.
When it comes to 32-bit architectures with 32-byte CL: &page_pool_params
plus ::pad is 44 bytes, the block taken care of is 16 bytes within one
CL, so there should be at least no regressions from the actual change.
::pages_state_hold_cnt is not related directly to that triple, but is
paired currently with ::frags_offset and decoupling them would mean
either two 4-byte holes or more invasive layout changes.

Signed-off-by: Alexander Lobakin <[email protected]>
---
include/net/page_pool/types.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index fcb846523398..887e7946a597 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -123,16 +123,16 @@ struct page_pool_stats {
struct page_pool {
struct page_pool_params p;

+ long frag_users;
+ struct page *frag_page;
+ unsigned int frag_offset;
+ u32 pages_state_hold_cnt;
+
struct delayed_work release_dw;
void (*disconnect)(void *pool);
unsigned long defer_start;
unsigned long defer_warn;

- u32 pages_state_hold_cnt;
- unsigned int frag_offset;
- struct page *frag_page;
- long frag_users;
-
#ifdef CONFIG_PAGE_POOL_STATS
/* these stats are incremented while in softirq context */
struct page_pool_alloc_stats alloc_stats;
--
2.41.0


2023-08-04 01:43:37

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v3 0/6] page_pool: a couple of assorted optimizations

On Thu, 3 Aug 2023 20:20:32 +0200 Alexander Lobakin wrote:
> That initially was a spin-off of the IAVF PP series[0], but has grown
> (and shrunk) since then a bunch. In fact, it consists of three
> semi-independent blocks:
>
> * #1-2: Compile-time optimization. Split page_pool.h into 2 headers to
> not overbloat the consumers not needing complex inline helpers and
> then stop including it in skbuff.h at all. The first patch is also
> prereq for the whole series.
> * #3: Improve cacheline locality for users of the Page Pool frag API.
> * #4-6: Use direct cache recycling more aggressively, when it is safe
> obviously. In addition, make sure nobody wants to use Page Pool API
> with disabled interrupts.
>
> Patches #1 and #5 are authored by Yunsheng and Jakub respectively, with
> small modifications from my side as per ML discussions.
> For the perf numbers for #3-6, please see individual commit messages.

Our scheming didn't help much, the series also conflicts
with the net/xdp.h includes which came in via bpf-next :(
--
pw-bot: cr

2023-08-04 16:33:09

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH net-next v3 0/6] page_pool: a couple of assorted optimizations

From: Jakub Kicinski <[email protected]>
Date: Thu, 3 Aug 2023 18:21:21 -0700

> On Thu, 3 Aug 2023 20:20:32 +0200 Alexander Lobakin wrote:
>> That initially was a spin-off of the IAVF PP series[0], but has grown
>> (and shrunk) since then a bunch. In fact, it consists of three
>> semi-independent blocks:
>>
>> * #1-2: Compile-time optimization. Split page_pool.h into 2 headers to
>> not overbloat the consumers not needing complex inline helpers and
>> then stop including it in skbuff.h at all. The first patch is also
>> prereq for the whole series.
>> * #3: Improve cacheline locality for users of the Page Pool frag API.
>> * #4-6: Use direct cache recycling more aggressively, when it is safe
>> obviously. In addition, make sure nobody wants to use Page Pool API
>> with disabled interrupts.
>>
>> Patches #1 and #5 are authored by Yunsheng and Jakub respectively, with
>> small modifications from my side as per ML discussions.
>> For the perf numbers for #3-6, please see individual commit messages.
>
> Our scheming didn't help much, the series also conflicts
> with the net/xdp.h includes which came in via bpf-next :(

Haha that happens. I rechecked everything before sending, but well, we
need a spinlock or what :D
I'll send v4 in a couple hours, so that ~22 hrs will pass since v3,
sounds okay?

Thanks,
Olek