2024-04-03 15:29:04

by Mina Almasry

Subject: [PATCH net-next v4 0/3] Minor cleanups to skb frag ref/unref

v4:
- Rebased to net-next.
- Clarified skb_shift() code change in commit message.
- Use skb->pp_recycle in a couple of places where I previously hardcoded
'false'.

v3:
- Fixed patchwork build errors/warnings from the patch-by-patch allmodconfig
build.

v2:
- Removed RFC tag.
- Rebased on net-next after the merge window opening.
- Added 1 patch at the beginning, "net: make napi_frag_unref reuse
skb_page_unref" because a recent patch introduced some code
duplication that can also be improved.
- Addressed feedback from Dragos & Yunsheng.
- Added Dragos's Reviewed-by.

This series is largely motivated by a recent discussion where there was
some confusion on how to properly ref/unref pp pages vs non pp pages:

https://lore.kernel.org/netdev/CAHS8izOoO-EovwMwAm9tLYetwikNPxC0FKyVGu1TPJWSz4bGoA@mail.gmail.com/T/#t

There is some subtlety there because pp uses page->pp_ref_count for
refcounting, while non-pp uses get_page()/put_page() for refcounting.
Getting the refcounting pairs wrong can lead to a kernel crash.

Additionally, it may not currently be obvious to skb users unaware of
page pool internals how to properly acquire a ref on a pp frag. It
requires checking skb->pp_recycle & is_pp_page() to make the correct
calls, and may require handling at the call site that is aware of what
are arguably pp internals.

This series is a minor refactor with a couple of goals:

1. skb users should be able to ref/unref a frag using
[__]skb_frag_[un]ref() functions without needing to understand pp
concepts and pp_ref_count vs get/put_page() differences.

2. Reference counting functions should have a mirror opposite, i.e. there
should be a foo_unref() for every foo_ref(), with a mirror opposite
implementation (as much as possible).
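The two goals above can be sketched in a small userspace model. This is not kernel code: struct mock_page, its fields, and the mock_* helpers are hypothetical stand-ins for struct page, page->pp_ref_count, get_page()/put_page(), and the napi_pp_get_page()/napi_pp_put_page() pair. It only illustrates the intended symmetry, assuming a ref taken with a given recycle flag is dropped with the same flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct page; not the kernel layout. */
struct mock_page {
	bool is_pp;        /* models the result of is_pp_page() */
	long pp_ref_count; /* models page->pp_ref_count */
	long refcount;     /* models the count moved by get_page()/put_page() */
};

/* Models napi_pp_get_page(): takes a pp ref only on a pp page. */
static inline bool mock_pp_get_page(struct mock_page *page)
{
	if (!page->is_pp)
		return false;
	page->pp_ref_count++;
	return true;
}

/* Models napi_pp_put_page(): drops a pp ref only on a pp page. */
static inline bool mock_pp_put_page(struct mock_page *page)
{
	if (!page->is_pp)
		return false;
	page->pp_ref_count--;
	return true;
}

/* Models skb_page_ref(): pp path if possible, else the plain page ref. */
static inline void mock_skb_page_ref(struct mock_page *page, bool recycle)
{
	if (recycle && mock_pp_get_page(page))
		return;
	page->refcount++;
}

/* Models skb_page_unref(): the exact mirror of mock_skb_page_ref(). */
static inline void mock_skb_page_unref(struct mock_page *page, bool recycle)
{
	if (recycle && mock_pp_put_page(page))
		return;
	page->refcount--;
}
```

In this model a caller only passes the recycle flag through; which counter gets touched is decided inside the helpers, and a ref followed by an unref with the same flag always restores both counters.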

This is RFC to collect feedback if this change is desirable, but also so
that I don't race with the fix for the issue Dragos is seeing for his
crash.

https://lore.kernel.org/lkml/CAHS8izN436pn3SndrzsCyhmqvJHLyxgCeDpWXA4r1ANt3RCDLQ@mail.gmail.com/T/

Cc: Dragos Tatulea <[email protected]>


Mina Almasry (3):
net: make napi_frag_unref reuse skb_page_unref
net: mirror skb frag ref/unref helpers
net: remove napi_frag_unref

.../chelsio/inline_crypto/ch_ktls/chcr_ktls.c | 2 +-
drivers/net/ethernet/sun/cassini.c | 4 +-
drivers/net/veth.c | 2 +-
include/linux/skbuff.h | 38 +++++++------
net/core/skbuff.c | 55 ++++++-------------
net/ipv4/esp4.c | 2 +-
net/ipv6/esp6.c | 2 +-
net/tls/tls_device_fallback.c | 2 +-
8 files changed, 44 insertions(+), 63 deletions(-)

--
2.44.0.478.gd926399ef9-goog



2024-04-03 15:29:39

by Mina Almasry

Subject: [PATCH net-next v4 1/3] net: make napi_frag_unref reuse skb_page_unref

The implementations of these 2 functions are almost identical. Remove
the implementation of napi_frag_unref, and make it a call into
skb_page_unref so we don't duplicate the implementation.

Signed-off-by: Mina Almasry <[email protected]>

---
include/linux/skbuff.h | 12 +++---------
net/ipv4/esp4.c | 2 +-
net/ipv6/esp6.c | 2 +-
3 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 03ea36a82cdd..7dcbd27e1497 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3513,10 +3513,10 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb,
bool napi_pp_put_page(struct page *page);

static inline void
-skb_page_unref(const struct sk_buff *skb, struct page *page)
+skb_page_unref(struct page *page, bool recycle)
{
#ifdef CONFIG_PAGE_POOL
- if (skb->pp_recycle && napi_pp_put_page(page))
+ if (recycle && napi_pp_put_page(page))
return;
#endif
put_page(page);
@@ -3525,13 +3525,7 @@ skb_page_unref(const struct sk_buff *skb, struct page *page)
static inline void
napi_frag_unref(skb_frag_t *frag, bool recycle)
{
- struct page *page = skb_frag_page(frag);
-
-#ifdef CONFIG_PAGE_POOL
- if (recycle && napi_pp_put_page(page))
- return;
-#endif
- put_page(page);
+ skb_page_unref(skb_frag_page(frag), recycle);
}

/**
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 3d647c9a7a21..40330253f076 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -114,7 +114,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
*/
if (req->src != req->dst)
for (sg = sg_next(req->src); sg; sg = sg_next(sg))
- skb_page_unref(skb, sg_page(sg));
+ skb_page_unref(sg_page(sg), skb->pp_recycle);
}

#ifdef CONFIG_INET_ESPINTCP
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index fe8d53f5a5ee..fb431d0a3475 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -131,7 +131,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
*/
if (req->src != req->dst)
for (sg = sg_next(req->src); sg; sg = sg_next(sg))
- skb_page_unref(skb, sg_page(sg));
+ skb_page_unref(sg_page(sg), skb->pp_recycle);
}

#ifdef CONFIG_INET6_ESPINTCP
--
2.44.0.478.gd926399ef9-goog


2024-04-03 15:29:51

by Mina Almasry

Subject: [PATCH net-next v4 2/3] net: mirror skb frag ref/unref helpers

Refactor some of the skb frag ref/unref helpers for improved clarity.

Implement napi_pp_get_page() to be the mirror counterpart of
napi_pp_put_page().

Implement skb_page_ref() to be the mirror of skb_page_unref().

Improve __skb_frag_ref() to become a mirror counterpart of
__skb_frag_unref(). Previously unref could handle pp & non-pp pages,
while the ref could only handle non-pp pages. Now both the ref & unref
helpers can correctly handle both pp & non-pp pages.

Now that __skb_frag_ref() can handle both pp & non-pp pages, remove
skb_pp_frag_ref(), and use __skb_frag_ref() instead. This lets us
remove pp specific handling from skb_try_coalesce.

Additionally, since __skb_frag_ref() can now handle both pp & non-pp
pages, a latent issue in skb_shift() should now be fixed. Previously
this function would do a non-pp ref & pp unref on potential pp frags
(fragfrom). After this patch, skb_shift() should correctly do a pp
ref/unref on pp frags.
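
The skb_shift() mismatch described above can be illustrated with a userspace model. Everything here is an assumption made for illustration: struct model_page and the old_frag_ref()/new_frag_ref()/frag_unref() helpers are hypothetical and only mimic the before/after behaviour of __skb_frag_ref() and __skb_frag_unref() on a page pool frag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page; field names are illustrative only. */
struct model_page {
	bool is_pp;        /* true for a page pool page */
	long pp_ref_count; /* models page->pp_ref_count */
	long refcount;     /* models the get_page()/put_page() count */
};

/* Models the old __skb_frag_ref(): always the non-pp path. */
static inline void old_frag_ref(struct model_page *page)
{
	page->refcount++;
}

/* Models the new __skb_frag_ref(): pp path when recycle is set on a pp page. */
static inline void new_frag_ref(struct model_page *page, bool recycle)
{
	if (recycle && page->is_pp) {
		page->pp_ref_count++;
		return;
	}
	page->refcount++;
}

/* Models __skb_frag_unref(), which already handled both page types. */
static inline void frag_unref(struct model_page *page, bool recycle)
{
	if (recycle && page->is_pp) {
		page->pp_ref_count--;
		return;
	}
	page->refcount--;
}
```

With a pp page and recycle set, old_frag_ref() followed by frag_unref() bumps one counter and drops the other, so neither ends where it started. new_frag_ref() followed by frag_unref() leaves both counters unchanged.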

Signed-off-by: Mina Almasry <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>

---

v4:
- pass skb->pp_recycle instead of 'false' in __skb_frag_ref in
chcr_ktls.c & cassini.c.
- Add some details on the changes to skb_shift() in this commit in the
commit message.

v3:
- Fix build errors reported by patchwork.
- Fix drivers/net/veth.c & tls_device_fallback.c callsite I missed to update.
- Fix page_pool_ref_page(head_page) -> page_pool_ref_page(page)

---
.../chelsio/inline_crypto/ch_ktls/chcr_ktls.c | 2 +-
drivers/net/ethernet/sun/cassini.c | 4 +-
drivers/net/veth.c | 2 +-
include/linux/skbuff.h | 22 ++++++--
net/core/skbuff.c | 53 ++++++-------------
net/tls/tls_device_fallback.c | 2 +-
6 files changed, 39 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
index 6482728794dd..d7e8deafddf1 100644
--- a/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
+++ b/drivers/net/ethernet/chelsio/inline_crypto/ch_ktls/chcr_ktls.c
@@ -1658,7 +1658,7 @@ static void chcr_ktls_copy_record_in_skb(struct sk_buff *nskb,
for (i = 0; i < record->num_frags; i++) {
skb_shinfo(nskb)->frags[i] = record->frags[i];
/* increase the frag ref count */
- __skb_frag_ref(&skb_shinfo(nskb)->frags[i]);
+ __skb_frag_ref(&skb_shinfo(nskb)->frags[i], nskb->pp_recycle);
}

skb_shinfo(nskb)->nr_frags = record->num_frags;
diff --git a/drivers/net/ethernet/sun/cassini.c b/drivers/net/ethernet/sun/cassini.c
index bfb903506367..31878256feee 100644
--- a/drivers/net/ethernet/sun/cassini.c
+++ b/drivers/net/ethernet/sun/cassini.c
@@ -1999,7 +1999,7 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
skb->len += hlen - swivel;

skb_frag_fill_page_desc(frag, page->buffer, off, hlen - swivel);
- __skb_frag_ref(frag);
+ __skb_frag_ref(frag, skb->pp_recycle);

/* any more data? */
if ((words[0] & RX_COMP1_SPLIT_PKT) && ((dlen -= hlen) > 0)) {
@@ -2023,7 +2023,7 @@ static int cas_rx_process_pkt(struct cas *cp, struct cas_rx_comp *rxc,
frag++;

skb_frag_fill_page_desc(frag, page->buffer, 0, hlen);
- __skb_frag_ref(frag);
+ __skb_frag_ref(frag, skb->pp_recycle);
RX_USED_ADD(page, hlen + cp->crc_size);
}

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index bcdfbf61eb66..6160a3e8d341 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -716,7 +716,7 @@ static void veth_xdp_get(struct xdp_buff *xdp)
return;

for (i = 0; i < sinfo->nr_frags; i++)
- __skb_frag_ref(&sinfo->frags[i]);
+ __skb_frag_ref(&sinfo->frags[i], false);
}

static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7dcbd27e1497..71caeee061ca 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3483,15 +3483,29 @@ static inline struct page *skb_frag_page(const skb_frag_t *frag)
return netmem_to_page(frag->netmem);
}

+bool napi_pp_get_page(struct page *page);
+
+static inline void skb_page_ref(struct page *page, bool recycle)
+{
+#ifdef CONFIG_PAGE_POOL
+ if (recycle && napi_pp_get_page(page))
+ return;
+#endif
+ get_page(page);
+}
+
/**
* __skb_frag_ref - take an addition reference on a paged fragment.
* @frag: the paged fragment
+ * @recycle: skb->pp_recycle param of the parent skb. False if no parent skb.
*
- * Takes an additional reference on the paged fragment @frag.
+ * Takes an additional reference on the paged fragment @frag. Obtains the
+ * correct reference count depending on whether skb->pp_recycle is set and
+ * whether the frag is a page pool frag.
*/
-static inline void __skb_frag_ref(skb_frag_t *frag)
+static inline void __skb_frag_ref(skb_frag_t *frag, bool recycle)
{
- get_page(skb_frag_page(frag));
+ skb_page_ref(skb_frag_page(frag), recycle);
}

/**
@@ -3503,7 +3517,7 @@ static inline void __skb_frag_ref(skb_frag_t *frag)
*/
static inline void skb_frag_ref(struct sk_buff *skb, int f)
{
- __skb_frag_ref(&skb_shinfo(skb)->frags[f]);
+ __skb_frag_ref(&skb_shinfo(skb)->frags[f], skb->pp_recycle);
}

int skb_pp_cow_data(struct page_pool *pool, struct sk_buff **pskb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2a5ce6667bbb..ff7e450ec5ea 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1004,6 +1004,18 @@ int skb_cow_data_for_xdp(struct page_pool *pool, struct sk_buff **pskb,
EXPORT_SYMBOL(skb_cow_data_for_xdp);

#if IS_ENABLED(CONFIG_PAGE_POOL)
+bool napi_pp_get_page(struct page *page)
+{
+ page = compound_head(page);
+
+ if (!is_pp_page(page))
+ return false;
+
+ page_pool_ref_page(page);
+ return true;
+}
+EXPORT_SYMBOL(napi_pp_get_page);
+
bool napi_pp_put_page(struct page *page)
{
page = compound_head(page);
@@ -1032,37 +1044,6 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data)
return napi_pp_put_page(virt_to_page(data));
}

-/**
- * skb_pp_frag_ref() - Increase fragment references of a page pool aware skb
- * @skb: page pool aware skb
- *
- * Increase the fragment reference count (pp_ref_count) of a skb. This is
- * intended to gain fragment references only for page pool aware skbs,
- * i.e. when skb->pp_recycle is true, and not for fragments in a
- * non-pp-recycling skb. It has a fallback to increase references on normal
- * pages, as page pool aware skbs may also have normal page fragments.
- */
-static int skb_pp_frag_ref(struct sk_buff *skb)
-{
- struct skb_shared_info *shinfo;
- struct page *head_page;
- int i;
-
- if (!skb->pp_recycle)
- return -EINVAL;
-
- shinfo = skb_shinfo(skb);
-
- for (i = 0; i < shinfo->nr_frags; i++) {
- head_page = compound_head(skb_frag_page(&shinfo->frags[i]));
- if (likely(is_pp_page(head_page)))
- page_pool_ref_page(head_page);
- else
- page_ref_inc(head_page);
- }
- return 0;
-}
-
static void skb_kfree_head(void *head, unsigned int end_offset)
{
if (end_offset == SKB_SMALL_HEAD_HEADROOM)
@@ -4169,7 +4150,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
to++;

} else {
- __skb_frag_ref(fragfrom);
+ __skb_frag_ref(fragfrom, skb->pp_recycle);
skb_frag_page_copy(fragto, fragfrom);
skb_frag_off_copy(fragto, fragfrom);
skb_frag_size_set(fragto, todo);
@@ -4819,7 +4800,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
}

*nskb_frag = (i < 0) ? skb_head_frag_to_page_desc(frag_skb) : *frag;
- __skb_frag_ref(nskb_frag);
+ __skb_frag_ref(nskb_frag, nskb->pp_recycle);
size = skb_frag_size(nskb_frag);

if (pos < offset) {
@@ -5950,10 +5931,8 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
/* if the skb is not cloned this does nothing
* since we set nr_frags to 0.
*/
- if (skb_pp_frag_ref(from)) {
- for (i = 0; i < from_shinfo->nr_frags; i++)
- __skb_frag_ref(&from_shinfo->frags[i]);
- }
+ for (i = 0; i < from_shinfo->nr_frags; i++)
+ __skb_frag_ref(&from_shinfo->frags[i], from->pp_recycle);

to->truesize += delta;
to->len += len;
diff --git a/net/tls/tls_device_fallback.c b/net/tls/tls_device_fallback.c
index 4e7228f275fa..d4000b4a1f7d 100644
--- a/net/tls/tls_device_fallback.c
+++ b/net/tls/tls_device_fallback.c
@@ -277,7 +277,7 @@ static int fill_sg_in(struct scatterlist *sg_in,
for (i = 0; remaining > 0; i++) {
skb_frag_t *frag = &record->frags[i];

- __skb_frag_ref(frag);
+ __skb_frag_ref(frag, false);
sg_set_page(sg_in + i, skb_frag_page(frag),
skb_frag_size(frag), skb_frag_off(frag));

--
2.44.0.478.gd926399ef9-goog


2024-04-03 16:01:35

by Eric Dumazet

Subject: Re: [PATCH net-next v4 1/3] net: make napi_frag_unref reuse skb_page_unref

On Wed, Apr 3, 2024 at 5:28 PM Mina Almasry <[email protected]> wrote:
>
> The implementations of these 2 functions are almost identical. Remove
> the implementation of napi_frag_unref, and make it a call into
> skb_page_unref so we don't duplicate the implementation.
>
> Signed-off-by: Mina Almasry <[email protected]>

Reviewed-by: Eric Dumazet <[email protected]>

2024-04-03 16:05:17

by Mina Almasry

Subject: [PATCH net-next v4 3/3] net: remove napi_frag_unref

With the changes in the previous patches, napi_frag_unref() is now
redundant. Remove it and use skb_page_unref() directly.

Signed-off-by: Mina Almasry <[email protected]>
Reviewed-by: Dragos Tatulea <[email protected]>

---
include/linux/skbuff.h | 8 +-------
net/core/skbuff.c | 2 +-
2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 71caeee061ca..eb3d70e57166 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3536,12 +3536,6 @@ skb_page_unref(struct page *page, bool recycle)
put_page(page);
}

-static inline void
-napi_frag_unref(skb_frag_t *frag, bool recycle)
-{
- skb_page_unref(skb_frag_page(frag), recycle);
-}
-
/**
* __skb_frag_unref - release a reference on a paged fragment.
* @frag: the paged fragment
@@ -3552,7 +3546,7 @@ napi_frag_unref(skb_frag_t *frag, bool recycle)
*/
static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle)
{
- napi_frag_unref(frag, recycle);
+ skb_page_unref(skb_frag_page(frag), recycle);
}

/**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ff7e450ec5ea..9aa1b40d1693 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1082,7 +1082,7 @@ static void skb_release_data(struct sk_buff *skb, enum skb_drop_reason reason)
}

for (i = 0; i < shinfo->nr_frags; i++)
- napi_frag_unref(&shinfo->frags[i], skb->pp_recycle);
+ __skb_frag_unref(&shinfo->frags[i], skb->pp_recycle);

free_head:
if (shinfo->frag_list)
--
2.44.0.478.gd926399ef9-goog


2024-04-03 16:30:14

by Eric Dumazet

Subject: Re: [PATCH net-next v4 3/3] net: remove napi_frag_unref

On Wed, Apr 3, 2024 at 5:28 PM Mina Almasry <[email protected]> wrote:
>
> With the changes in the previous patches, napi_frag_unref() is now
> redundant. Remove it and use skb_page_unref() directly.
>
> Signed-off-by: Mina Almasry <[email protected]>
> Reviewed-by: Dragos Tatulea <[email protected]>
>

Reviewed-by: Eric Dumazet <[email protected]>

2024-04-03 16:21:47

by Eric Dumazet

Subject: Re: [PATCH net-next v4 2/3] net: mirror skb frag ref/unref helpers

On Wed, Apr 3, 2024 at 5:28 PM Mina Almasry <[email protected]> wrote:
>
> Refactor some of the skb frag ref/unref helpers for improved clarity.
>
> Implement napi_pp_get_page() to be the mirror counterpart of
> napi_pp_put_page().
>
> Implement skb_page_ref() to be the mirror of skb_page_unref().
>
> Improve __skb_frag_ref() to become a mirror counterpart of
> __skb_frag_unref(). Previously unref could handle pp & non-pp pages,
> while the ref could only handle non-pp pages. Now both the ref & unref
> helpers can correctly handle both pp & non-pp pages.
>
> Now that __skb_frag_ref() can handle both pp & non-pp pages, remove
> skb_pp_frag_ref(), and use __skb_frag_ref() instead. This lets us
> remove pp specific handling from skb_try_coalesce.
>
> Additionally, since __skb_frag_ref() can now handle both pp & non-pp
> pages, a latent issue in skb_shift() should now be fixed. Previously
> this function would do a non-pp ref & pp unref on potential pp frags
> (fragfrom). After this patch, skb_shift() should correctly do a pp
> ref/unref on pp frags.
>
> Signed-off-by: Mina Almasry <[email protected]>
> Reviewed-by: Dragos Tatulea <[email protected]>
>

..

> #if IS_ENABLED(CONFIG_PAGE_POOL)
> +bool napi_pp_get_page(struct page *page)
> +{
> + page = compound_head(page);
> +
> + if (!is_pp_page(page))
> + return false;
> +
> + page_pool_ref_page(page);
> + return true;
> +}
> +EXPORT_SYMBOL(napi_pp_get_page);

It seems this could be inlined (along with is_pp_page())