2024-04-01 21:50:58

by Mina Almasry

[permalink] [raw]
Subject: [PATCH net-next v3 0/3] Minor cleanups to skb frag ref/unref

v3:
- Fixed patchwork build errors/warnings from patch-by-patch modallconfig
build

v2:
- Removed RFC tag.
- Rebased on net-next after the merge window opening.
- Added 1 patch at the beginning, "net: make napi_frag_unref reuse
skb_page_unref" because a recent patch introduced some code
duplication that can also be improved.
- Addressed feedback from Dragos & Yunsheng.
- Added Dragos's Reviewed-by.

This series is largely motivated by a recent discussion where there was
some confusion on how to properly ref/unref pp pages vs non pp pages:

https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t

There is some subtely there because pp uses page->pp_ref_count for
refcounting, while non-pp uses get_page()/put_page() for ref counting.
Getting the refcounting pairs wrong can lead to kernel crash.

Additionally currently it may not be obvious to skb users unaware of
page pool internals how to properly acquire a ref on a pp frag. It
requires checking of skb->pp_recycle & is_pp_page() to make the correct
calls and may require some handling at the call site aware of arguable pp
internals.

This series is a minor refactor with a couple of goals:

1. skb users should be able to ref/unref a frag using
[__]skb_frag_[un]ref() functions without needing to understand pp
concepts and pp_ref_count vs get/put_page() differences.

2. reference counting functions should have a mirror opposite. I.e. there
should be a foo_unref() to every foo_ref() with a mirror opposite
implementation (as much as possible).

This is RFC to collect feedback if this change is desirable, but also so
that I don't race with the fix for the issue Dragos is seeing for his
crash.

https://lore.kernel.org/lkml/CAHS8izN436pn3SndrzsCyhmqvJHLyxgCeDpWXA4r1ANt3RCDLQ@mail.gmail.com/T/

Cc: Dragos Tatulea <[email protected]>

Mina Almasry (3):
net: make napi_frag_unref reuse skb_page_unref
net: mirror skb frag ref/unref helpers
net: remove napi_frag_unref

.../chelsio/inline_crypto/ch_ktls/chcr_ktls.c | 2 +-
drivers/net/ethernet/marvell/sky2.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
drivers/net/ethernet/sun/cassini.c | 4 +-
drivers/net/veth.c | 2 +-
include/linux/skbuff.h | 44 +++++++-------
net/core/skbuff.c | 58 ++++++-------------
net/ipv4/esp4.c | 2 +-
net/ipv6/esp6.c | 2 +-
net/tls/tls_device.c | 2 +-
net/tls/tls_device_fallback.c | 2 +-
net/tls/tls_strp.c | 2 +-
12 files changed, 54 insertions(+), 70 deletions(-)

--
2.44.0.478.gd926399ef9-goog



2024-04-01 21:59:05

by Mina Almasry

[permalink] [raw]
Subject: [PATCH net-next v3 2/3] net: mirror skb frag ref/unref helpers

Refactor some of the skb frag ref/unref helpers for improved clarity.

Implement napi_pp_get_page() to be the mirror counterpart of
napi_pp_put_page().

Implement skb_page_ref() to be the mirror of skb_page_unref().

Improve __skb_frag_ref() to become a mirror counterpart of
__skb_frag_unref(). Previously unref could handle pp & non-pp pages,
while the ref could only handle non-pp pages. Now both the ref & unref
helpers can correctly handle both pp & non-pp pages.

Now that __skb_frag_ref() can handle both pp & non-pp pages, remove
skb_pp_frag_ref(), and use __skb_frag_ref() instead. This lets us
remove pp specific handling from skb_try_coalesce.

Signed-off-by: Mina Almasry <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>

---

v3:
- Fix build errors reported by patchwork.
- Fix drivers/net/veth.c & tls_device_fallback.c callsite I missed to update.
- Fix page_pool_ref_page(head_page) -> page_pool_ref_page(page)

---
.../chelsio/inline_crypto/ch_ktls/chcr_ktls.c | 2 +-
drivers/net/ethernet/sun/cassini.c | 4 +-
drivers/net/veth.c | 2 +-
include/linux/skbuff.h | 22 ++++++--
net/core/skbuff.c | 54 ++++++-------------
net/tls/tls_device_fallback.c | 2 +-
6 files changed, 40 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
index 6482728794dd..f9b0a9533985 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
@@ -1658,7 +1658,7 @@ static void chcr_ktls_copy_record_in_skb(struct sk_buff *nskb,
for (i = 0; i < record->num_frags; i++) {
skb_shinfo(nskb)->frags[i] = record->frags[i];
/* increase the frag ref count */
- __skb_frag_ref(&skb_shinfo(nskb)->frags[i]);
+ __skb_frag_ref(&skb_shinfo(nskb)->frags[i], false);
}

skb_shinfo(nskb)->nr_frags = record->num_frags;
diff --git a/drivers/net/ethernet/sun/cassini.c b/drivers/net/ethernet/sun/cassini.c
index bfb903506367..fabba729e1b8 100644
--- a/drivers/net/ethernet/sun/cassini.c
+++ b/drivers/net/ethernet/sun/cassini.c
@@ -1999,7 +1999,7 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
skb->len += hlen - swivel;

skb_frag_fill_page_desc(frag, page->buffer, off, hlen - swivel);
- __skb_frag_ref(frag);
+ __skb_frag_ref(frag, false);

/* any more data? */
if ((words[0] & RX_COMP1_SPLIT_PKT) && ((dlen -= hlen) > 0)) {
@@ -2023,7 +2023,7 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
frag++;

skb_frag_fill_page_desc(frag, page->buffer, 0, hlen);
- __skb_frag_ref(frag);
+ __skb_frag_ref(frag, false);
RX_USED_ADD(page, hlen + cp->crc_size);
}

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index bcdfbf61eb66..6160a3e8d341 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -716,7 +716,7 @@ static void veth_xdp_get(struct xdp_buff *xdp)
return;

for (i = 0; i < sinfo->nr_frags; i++)
- __skb_frag_ref(&sinfo->frags[i]);
+ __skb_frag_ref(&sinfo->frags[i], false);
}

static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a6b5596dc0cb..849d78554b50 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3483,15 +3483,29 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
return netmem_to_page(frag->netmem);
}

+bool napi_pp_get_page(struct page *page);
+
+static inline void skb_page_ref(struct page *page, bool recycle)
+{
+#ifdef CONFIG_PAGE_POOL
+ if (recycle && napi_pp_get_page(page))
+ return;
+#endif
+ get_page(page);
+}
+
/**
* __skb_frag_ref - take an addition reference on a paged fragment.
* @frag: the paged fragment
+ * @recycle: skb->pp_recycle param of the parent skb. False if no parent skb.
*
- * Takes an additional reference on the paged fragment @frag.
+ * Takes an additional reference on the paged fragment @frag. Obtains the
+ * correct reference count depending on whether skb->pp_recycle is set and
+ * whether the frag is a page pool frag.
*/
-static inline void __skb_frag_ref(skb_frag_t *frag)
+static inline void __skb_frag_ref(skb_frag_t *frag, bool recycle)
{
- get_page(skb_frag_page(frag));
+ skb_page_ref(skb_frag_page(frag), recycle);
}

/**
@@ -3503,7 +3517,7 @@ static inline void __skb_frag_ref(skb_frag_t *frag)
*/
static inline void skb_frag_ref(struct sk_buff *skb, int f)
{
- __skb_frag_ref(&skb_shinfo(skb)->frags[f]);
+ __skb_frag_ref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
}

int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a1be84be5d35..d878f2e67567 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1004,6 +1004,19 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb,
EXPORT_SYMBOL(skb_cow_data_for_xdp);

#if IS_ENABLED(CONFIG_PAGE_POOL)
+bool napi_pp_get_page(struct page *page)
+{
+
+ page = compound_head(page);
+
+ if (!is_pp_page(page))
+ return false;
+
+ page_pool_ref_page(page);
+ return true;
+}
+EXPORT_SYMBOL(napi_pp_get_page);
+
bool napi_pp_put_page(struct page *page, bool napi_safe)
{
bool allow_direct = false;
@@ -1056,37 +1069,6 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
return napi_pp_put_page(virt_to_page(data), napi_safe);
}

-/**
- * skb_pp_frag_ref() - Increase fragment references of a page pool aware skb
- * @skb: page pool aware skb
- *
- * Increase the fragment reference count (pp_ref_count) of a skb. This is
- * intended to gain fragment references only for page pool aware skbs,
- * i.e. when skb->pp_recycle is true, and not for fragments in a
- * non-pp-recycling skb. It has a fallback to increase references on normal
- * pages, as page pool aware skbs may also have normal page fragments.
- */
-static int skb_pp_frag_ref(struct sk_buff *skb)
-{
- struct skb_shared_info *shinfo;
- struct page *head_page;
- int i;
-
- if (!skb->pp_recycle)
- return -EINVAL;
-
- shinfo = skb_shinfo(skb);
-
- for (i = 0; i < shinfo->nr_frags; i++) {
- head_page = compound_head(skb_frag_page(&shinfo->frags[i]));
- if (likely(is_pp_page(head_page)))
- page_pool_ref_page(head_page);
- else
- page_ref_inc(head_page);
- }
- return 0;
-}
-
static void skb_kfree_head(void *head, unsigned int end_offset)
{
if (end_offset == SKB_SMALL_HEAD_HEADROOM)
@@ -4195,7 +4177,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
to++;

} else {
- __skb_frag_ref(fragfrom);
+ __skb_frag_ref(fragfrom, skb->pp_recycle);
skb_frag_page_copy(fragto, fragfrom);
skb_frag_off_copy(fragto, fragfrom);
skb_frag_size_set(fragto, todo);
@@ -4845,7 +4827,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
}

*nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag;
- __skb_frag_ref(nskb_frag);
+ __skb_frag_ref(nskb_frag, nskb->pp_recycle);
size = skb_frag_size(nskb_frag);

if (pos < offset) {
@@ -5976,10 +5958,8 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
/* if the skb is not cloned this does nothing
* since we set nr_frags to 0.
*/
- if (skb_pp_frag_ref(from)) {
- for (i = 0; i < from_shinfo->nr_frags; i++)
- __skb_frag_ref(&from_shinfo->frags[i]);
- }
+ for (i = 0; i < from_shinfo->nr_frags; i++)
+ __skb_frag_ref(&from_shinfo->frags[i], from->pp_recycle);

to->truesize += delta;
to->len += len;
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index 4e7228f275fa..d4000b4a1f7d 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -277,7 +277,7 @@ static int fill_sg_in(struct scatterlist *sg_in,
for (i = 0; remaining > 0; i++) {
skb_frag_t *frag = &record->frags[i];

- __skb_frag_ref(frag);
+ __skb_frag_ref(frag, false);
sg_set_page(sg_in + i, skb_frag_page(frag),
skb_frag_size(frag), skb_frag_off(frag));

--
2.44.0.478.gd926399ef9-goog


2024-04-01 21:59:17

by Mina Almasry

[permalink] [raw]
Subject: [PATCH net-next v3 1/3] net: make napi_frag_unref reuse skb_page_unref

The implementations of these 2 functions are almost identical. Remove
the implementation of napi_frag_unref, and make it a call into
skb_page_unref so we don't duplicate the implementation.

Signed-off-by: Mina Almasry <[email protected]>

---
include/linux/skbuff.h | 12 +++---------
net/ipv4/esp4.c | 2 +-
net/ipv6/esp6.c | 2 +-
3 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b7f1ecdaec38..a6b5596dc0cb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3513,10 +3513,10 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb,
bool napi_pp_put_page(struct page *page, bool napi_safe);

static inline void
-skb_page_unref(const struct sk_buff *skb, struct page *page, bool napi_safe)
+skb_page_unref(struct page *page, bool recycle, bool napi_safe)
{
#ifdef CONFIG_PAGE_POOL
- if (skb->pp_recycle && napi_pp_put_page(page, napi_safe))
+ if (recycle && napi_pp_put_page(page, napi_safe))
return;
#endif
put_page(page);
@@ -3525,13 +3525,7 @@ skb_page_unref(const struct sk_buff *skb, struct page *page, bool napi_safe)
static inline void
napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
{
- struct page *page = skb_frag_page(frag);
-
-#ifdef CONFIG_PAGE_POOL
- if (recycle && napi_pp_put_page(page, napi_safe))
- return;
-#endif
- put_page(page);
+ skb_page_unref(skb_frag_page(frag), recycle, napi_safe);
}

/**
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index d33d12421814..3d2c252c5570 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -114,7 +114,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
*/
if (req->src != req->dst)
for (sg = sg_next(req->src); sg; sg = sg_next(sg))
- skb_page_unref(skb, sg_page(sg), false);
+ skb_page_unref(sg_page(sg), skb->pp_recycle, false);
}

#ifdef CONFIG_INET_ESPINTCP
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 7371886d4f9f..4fe4f97f5420 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -131,7 +131,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
*/
if (req->src != req->dst)
for (sg = sg_next(req->src); sg; sg = sg_next(sg))
- skb_page_unref(skb, sg_page(sg), false);
+ skb_page_unref(sg_page(sg), skb->pp_recycle, false);
}

#ifdef CONFIG_INET6_ESPINTCP
--
2.44.0.478.gd926399ef9-goog


2024-04-01 21:59:57

by Mina Almasry

[permalink] [raw]
Subject: [PATCH net-next v3 3/3] net: remove napi_frag_unref

With the changes in the last patches, napi_frag_unref() is now
reduandant. Remove it and use skb_page_unref directly.

Signed-off-by: Mina Almasry <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>

---
drivers/net/ethernet/marvell/sky2.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 2 +-
include/linux/skbuff.h | 14 +++++---------
net/core/skbuff.c | 4 ++--
net/tls/tls_device.c | 2 +-
net/tls/tls_strp.c | 2 +-
6 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c b/drivers/net/ethernet/marvell/sky2.c
index 07720841a8d7..8e00a5856856 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -2501,7 +2501,7 @@ static void skb_put_frags(struct sk_buff *skb, unsigned int hdr_space,

if (length == 0) {
/* don't need this page */
- __skb_frag_unref(frag, false);
+ __skb_frag_unref(frag, false, false);
--skb_shinfo(skb)->nr_frags;
} else {
size = min(length, (unsigned) PAGE_SIZE);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index eac49657bd07..4dbf29b46979 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -526,7 +526,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
fail:
while (nr > 0) {
nr--;
- __skb_frag_unref(skb_shinfo(skb)->frags + nr, false);
+ __skb_frag_unref(skb_shinfo(skb)->frags + nr, false, false);
}
return 0;
}
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 849d78554b50..e3f40f89e15a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3536,23 +3536,19 @@ skb_page_unref(struct page *page, bool recycle, bool napi_safe)
put_page(page);
}

-static inline void
-napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
-{
- skb_page_unref(skb_frag_page(frag), recycle, napi_safe);
-}
-
/**
* __skb_frag_unref - release a reference on a paged fragment.
* @frag: the paged fragment
* @recycle: recycle the page if allocated via page_pool
+ * @napi_safe: set to true if running in the same napi context as where the
+ * consumer would run.
*
* Releases a reference on the paged fragment @frag
* or recycles the page via the page_pool API.
*/
-static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
+static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
{
- napi_frag_unref(frag, recycle, false);
+ skb_page_unref(skb_frag_page(frag), recycle, napi_safe);
}

/**
@@ -3567,7 +3563,7 @@ static inline void skb_frag_unref(struct sk_buff *skb, int f)
struct skb_shared_info *shinfo = skb_shinfo(skb);

if (!skb_zcopy_managed(skb))
- __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle);
+ __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle, false);
}

/**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d878f2e67567..c4460408467e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1108,7 +1108,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason,
}

for (i = 0; i < shinfo->nr_frags; i++)
- napi_frag_unref(&shinfo->frags[i], skb->pp_recycle, napi_safe);
+ __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle, napi_safe);

free_head:
if (shinfo->frag_list)
@@ -4199,7 +4199,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
fragto = &skb_shinfo(tgt)->frags[merge];

skb_frag_size_add(fragto, skb_frag_size(fragfrom));
- __skb_frag_unref(fragfrom, skb->pp_recycle);
+ __skb_frag_unref(fragfrom, skb->pp_recycle, false);
}

/* Reposition in the original skb */
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index bf8ed36b1ad6..5dc6381f34fb 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -140,7 +140,7 @@ static void destroy_record(struct tls_record_info *record)
int i;

for (i = 0; i < record->num_frags; i++)
- __skb_frag_unref(&record->frags[i], false);
+ __skb_frag_unref(&record->frags[i], false, false);
kfree(record);
}

diff --git a/net/tls/tls_strp.c b/net/tls/tls_strp.c
index ca1e0e198ceb..85b41f226978 100644
--- a/net/tls/tls_strp.c
+++ b/net/tls/tls_strp.c
@@ -196,7 +196,7 @@ static void tls_strp_flush_anchor_copy(struct tls_strparser *strp)
DEBUG_NET_WARN_ON_ONCE(atomic_read(&shinfo->dataref) != 1);

for (i = 0; i < shinfo->nr_frags; i++)
- __skb_frag_unref(&shinfo->frags[i], false);
+ __skb_frag_unref(&shinfo->frags[i], false, false);
shinfo->nr_frags = 0;
if (strp->copy_mode) {
kfree_skb_list(shinfo->frag_list);
--
2.44.0.478.gd926399ef9-goog


2024-04-03 02:10:45

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v3 0/3] Minor cleanups to skb frag ref/unref

On Mon, 1 Apr 2024 14:50:36 -0700 Mina Almasry wrote:
> This series is largely motivated by a recent discussion where there was
> some confusion on how to properly ref/unref pp pages vs non pp pages:
>
> https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t
>
> There is some subtely there because pp uses page->pp_ref_count for
> refcounting, while non-pp uses get_page()/put_page() for ref counting.
> Getting the refcounting pairs wrong can lead to kernel crash.
>
> Additionally currently it may not be obvious to skb users unaware of
> page pool internals how to properly acquire a ref on a pp frag. It
> requires checking of skb->pp_recycle & is_pp_page() to make the correct
> calls and may require some handling at the call site aware of arguable pp
> internals.

I concluded that Olek's series as good to go in, so you gotta rebase.
--
pw-bot: cr