When the skb is reorganized during esp_output (!esp->inline), the pages
coming from the original skb fragments are supposed to be released back
to the system through put_page. But if the skb fragment pages are
originating from a page_pool, calling put_page on them will trigger a
page_pool leak which will eventually result in a crash.
This leak can be easily observed when using CONFIG_DEBUG_VM and doing
ipsec + gre (non offloaded) forwarding:
BUG: Bad page state in process ksoftirqd/16 pfn:1451b6
page:00000000de2b8d32 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1451b6000 pfn:0x1451b6
flags: 0x200000000000000(node=0|zone=2)
page_type: 0xffffffff()
raw: 0200000000000000 dead000000000040 ffff88810d23c000 0000000000000000
raw: 00000001451b6000 0000000000000001 00000000ffffffff 0000000000000000
page dumped because: page_pool leak
Modules linked in: ip_gre gre mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat nf_nat xt_addrtype br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay zram zsmalloc fuse [last unloaded: mlx5_core]
CPU: 16 PID: 96 Comm: ksoftirqd/16 Not tainted 6.8.0-rc4+ #22
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x36/0x50
bad_page+0x70/0xf0
free_unref_page_prepare+0x27a/0x460
free_unref_page+0x38/0x120
esp_ssg_unref.isra.0+0x15f/0x200
esp_output_tail+0x66d/0x780
esp_xmit+0x2c5/0x360
validate_xmit_xfrm+0x313/0x370
? validate_xmit_skb+0x1d/0x330
validate_xmit_skb_list+0x4c/0x70
sch_direct_xmit+0x23e/0x350
__dev_queue_xmit+0x337/0xba0
? nf_hook_slow+0x3f/0xd0
ip_finish_output2+0x25e/0x580
iptunnel_xmit+0x19b/0x240
ip_tunnel_xmit+0x5fb/0xb60
ipgre_xmit+0x14d/0x280 [ip_gre]
dev_hard_start_xmit+0xc3/0x1c0
__dev_queue_xmit+0x208/0xba0
? nf_hook_slow+0x3f/0xd0
ip_finish_output2+0x1ca/0x580
ip_sublist_rcv_finish+0x32/0x40
ip_sublist_rcv+0x1b2/0x1f0
? ip_rcv_finish_core.constprop.0+0x460/0x460
ip_list_rcv+0x103/0x130
__netif_receive_skb_list_core+0x181/0x1e0
netif_receive_skb_list_internal+0x1b3/0x2c0
napi_gro_receive+0xc8/0x200
gro_cell_poll+0x52/0x90
__napi_poll+0x25/0x1a0
net_rx_action+0x28e/0x300
__do_softirq+0xc3/0x276
? sort_range+0x20/0x20
run_ksoftirqd+0x1e/0x30
smpboot_thread_fn+0xa6/0x130
kthread+0xcd/0x100
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x31/0x50
? kthread_complete_and_exit+0x20/0x20
ret_from_fork_asm+0x11/0x20
</TASK>
The suggested fix is to use the page_pool release API first and then fallback
to put_page.
Signed-off-by: Dragos Tatulea <[email protected]>
Reported-by: Anatoli N.Chechelnickiy <[email protected]>
Reported-by: Ian Kumlien <[email protected]>
Change-Id: I263cf91c1d13c2736a58927e8e0fc51296759450
---
net/ipv4/esp4.c | 11 ++++++++---
net/ipv6/esp6.c | 11 ++++++++---
2 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 4dd9e5040672..3e07d78c887d 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -112,9 +112,14 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp)
/* Unref skb_frag_pages in the src scatterlist if necessary.
* Skip the first sg which comes from skb->data.
*/
- if (req->src != req->dst)
- for (sg = sg_next(req->src); sg; sg = sg_next(sg))
- put_page(sg_page(sg));
+ if (req->src != req->dst) {
+ for (sg = sg_next(req->src); sg; sg = sg_next(sg)) {
+ struct page *page = sg_page(sg);
+
+ if (!napi_pp_put_page(page, false))
+ put_page(page);
+ }
+ }
}
#ifdef CONFIG_INET_ESPINTCP
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 6e6efe026cdc..b73f5773139d 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -129,9 +129,14 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp)
/* Unref skb_frag_pages in the src scatterlist if necessary.
* Skip the first sg which comes from skb->data.
*/
- if (req->src != req->dst)
- for (sg = sg_next(req->src); sg; sg = sg_next(sg))
- put_page(sg_page(sg));
+ if (req->src != req->dst) {
+ for (sg = sg_next(req->src); sg; sg = sg_next(sg)) {
+ struct page *page = sg_page(sg);
+
+ if (!napi_pp_put_page(page, false))
+ put_page(page);
+ }
+ }
}
#ifdef CONFIG_INET6_ESPINTCP
--
2.42.0
On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote:
> When the skb is reorganized during esp_output (!esp->inline), the pages
> coming from the original skb fragments are supposed to be released back
> to the system through put_page. But if the skb fragment pages are
> originating from a page_pool, calling put_page on them will trigger a
> page_pool leak which will eventually result in a crash.
So it just does: skb_shinfo(skb)->nr_frags = 1;
and assumes that's equivalent to owning a page ref on all the frags?
Fix looks more or less good, we would need a new wrapper to avoid
build issues without PAGE_POOL, but I wonder if we wouldn't be better
off changing the other side. Instead of "cutting off" the frags -
walking them and dealing with various page types. Because Mina and co.
will step onto this landmine as well.
On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote:
> On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote:
> > When the skb is reorganized during esp_output (!esp->inline), the pages
> > coming from the original skb fragments are supposed to be released back
> > to the system through put_page. But if the skb fragment pages are
> > originating from a page_pool, calling put_page on them will trigger a
> > page_pool leak which will eventually result in a crash.
>
> So it just does: skb_shinfo(skb)->nr_frags = 1;
> and assumes that's equivalent to owning a page ref on all the frags?
>
My understanding is different: it sets nr_frags to 1 because it's swapping out
the old page frag in fragment 0 with the new xfrag page frag and will use this
"new" skb from here. It does take a page reference for the xfrag page frag.
> Fix looks more or less good, we would need a new wrapper to avoid
> build issues without PAGE_POOL,
>
Ack. Which component would be best location for this wrapper: page_pool?
> but I wonder if we wouldn't be better
> off changing the other side. Instead of "cutting off" the frags -
> walking them and dealing with various page types. Because Mina and co.
> will step onto this landmine as well.
The page frags are still stored and used in the sg scatterlist. If we release
them at the moment when the skb is "cut off", the pages in the sg will be
invalid. At least that's my understanding.
Thanks,
Dragos
On Wed, 6 Mar 2024 13:05:14 +0000 Dragos Tatulea wrote:
> On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote:
> > On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote:
> > > When the skb is reorganized during esp_output (!esp->inline), the pages
> > > coming from the original skb fragments are supposed to be released back
> > > to the system through put_page. But if the skb fragment pages are
> > > originating from a page_pool, calling put_page on them will trigger a
> > > page_pool leak which will eventually result in a crash.
> >
> > So it just does: skb_shinfo(skb)->nr_frags = 1;
> > and assumes that's equivalent to owning a page ref on all the frags?
> >
> My understanding is different: it sets nr_frags to 1 because it's swapping out
> the old page frag in fragment 0 with the new xfrag page frag and will use this
> "new" skb from here. It does take a page reference for the xfrag page frag.
Same understanding, I'm just bad at explaining :)
> > Fix looks more or less good, we would need a new wrapper to avoid
> > build issues without PAGE_POOL,
> >
> Ack. Which component would be best location for this wrapper: page_pool?
Hm, that's a judgment call.
Part of me wants to put it next to napi_frag_unref(), since we
basically need to factor out the insides of this function.
When you post the patch the page pool crowd will give us
their opinions.
> > but I wonder if we wouldn't be better
> > off changing the other side. Instead of "cutting off" the frags -
> > walking them and dealing with various page types. Because Mina and co.
> > will step onto this landmine as well.
> The page frags are still stored and used in the sg scatterlist. If we release
> them at the moment when the skb is "cut off", the pages in the sg will be
> invalid. At least that's my understanding.
I was thinking something along the lines of:
for each frag()
if (is_pp_page()) {
get_page();
page_pool_unref_page(1);
}
so that it's trivial to insert another check for "is this a zero-copy"
page in there, and error our. But on reflection the zero copy check may
be better placed in __skb_to_sgvec(), so ignore this. Just respin
what you got with a new helper.
On Wed, 2024-03-06 at 07:22 -0800, Jakub Kicinski wrote:
> On Wed, 6 Mar 2024 13:05:14 +0000 Dragos Tatulea wrote:
> > On Tue, 2024-03-05 at 19:04 -0800, Jakub Kicinski wrote:
> > > On Mon, 4 Mar 2024 11:48:52 +0200 Dragos Tatulea wrote:
> > > > When the skb is reorganized during esp_output (!esp->inline), the pages
> > > > coming from the original skb fragments are supposed to be released back
> > > > to the system through put_page. But if the skb fragment pages are
> > > > originating from a page_pool, calling put_page on them will trigger a
> > > > page_pool leak which will eventually result in a crash.
> > >
> > > So it just does: skb_shinfo(skb)->nr_frags = 1;
> > > and assumes that's equivalent to owning a page ref on all the frags?
> > >
> > My understanding is different: it sets nr_frags to 1 because it's swapping out
> > the old page frag in fragment 0 with the new xfrag page frag and will use this
> > "new" skb from here. It does take a page reference for the xfrag page frag.
>
> Same understanding, I'm just bad at explaining :)
>
> > > Fix looks more or less good, we would need a new wrapper to avoid
> > > build issues without PAGE_POOL,
> > >
> > Ack. Which component would be best location for this wrapper: page_pool?
>
> Hm, that's a judgment call.
> Part of me wants to put it next to napi_frag_unref(), since we
> basically need to factor out the insides of this function.
> When you post the patch the page pool crowd will give us
> their opinions.
>
Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not
set?
Regarding stable would I need to send a separate fix that does the raw pp page
check without the API?
> > > but I wonder if we wouldn't be better
> > > off changing the other side. Instead of "cutting off" the frags -
> > > walking them and dealing with various page types. Because Mina and co.
> > > will step onto this landmine as well.
> > The page frags are still stored and used in the sg scatterlist. If we release
> > them at the moment when the skb is "cut off", the pages in the sg will be
> > invalid. At least that's my understanding.
>
> I was thinking something along the lines of:
>
> for each frag()
> if (is_pp_page()) {
> get_page();
> page_pool_unref_page(1);
> }
>
> so that it's trivial to insert another check for "is this a zero-copy"
> page in there, and error our. But on reflection the zero copy check may
> be better placed in __skb_to_sgvec(), so ignore this. Just respin
> what you got with a new helper.
>
Ignored. I was hoping we wouldn't go in that direction :).
Thanks,
Dragos
On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote:
> > Hm, that's a judgment call.
> > Part of me wants to put it next to napi_frag_unref(), since we
> > basically need to factor out the insides of this function.
> > When you post the patch the page pool crowd will give us
> > their opinions.
>
> Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not
> set?
Without LTO it may still be a function call.
Plus, subjectively, I think that it's a bit too much logic to encode in
the caller (you must also check skb->pp_recycle, AFAIU)
Maybe we should make skb_pp_recycle() take struct page and move it to
skbuff.h ? Rename it to skb_page_unref() ?
> Regarding stable would I need to send a separate fix that does the raw pp page
> check without the API?
You can put them in one patch, I reckon.
On Wed, Mar 6, 2024 at 8:09 AM Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote:
> > > Hm, that's a judgment call.
> > > Part of me wants to put it next to napi_frag_unref(), since we
> > > basically need to factor out the insides of this function.
> > > When you post the patch the page pool crowd will give us
> > > their opinions.
> >
> > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not
> > set?
>
> Without LTO it may still be a function call.
> Plus, subjectively, I think that it's a bit too much logic to encode in
> the caller (you must also check skb->pp_recycle, AFAIU)
> Maybe we should make skb_pp_recycle() take struct page and move it to
> skbuff.h ? Rename it to skb_page_unref() ?
>
Does the caller need to check skb->pp_recycle? pp_recycle seems like a
redundant bit. We can tell whether the page is pp by checking
is_pp_page(page). the pages in the frag must be pp pages when
skb->pp_recycle is set and must be non pp pages when the
skb->pp_recycle is not set, so it all seems redundant to me.
My fix would be something like:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d577e0bee18d..cc737b7b9860 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3507,17 +3507,25 @@ int skb_cow_data_for_xdp(struct page_pool
*pool, struct sk_buff **pskb,
bool napi_pp_put_page(struct page *page, bool napi_safe);
static inline void
-napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
+napi_page_unref(struct page *page, bool napi_safe)
{
- struct page *page = skb_frag_page(frag);
-
#ifdef CONFIG_PAGE_POOL
- if (recycle && napi_pp_put_page(page, napi_safe))
+ if (napi_pp_put_page(page, napi_safe))
return;
#endif
put_page(page);
}
+static inline void
+napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
+{
+ struct page *page = skb_frag_page(frag);
+
+ DEBUG_NET_WARN_ON(recycle != is_pp_page(page));
+
+ napi_page_unref(page);
+}
+
And then use napi_page_unref() in the callers to handle page pool &
non-page pool gracefully without leaking page pool internals to the
callers.
> > Regarding stable would I need to send a separate fix that does the raw pp page
> > check without the API?
>
> You can put them in one patch, I reckon.
--
Thanks,
Mina
On Wed, 2024-03-06 at 08:40 -0800, Mina Almasry wrote:
> On Wed, Mar 6, 2024 at 8:09 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote:
> > > > Hm, that's a judgment call.
> > > > Part of me wants to put it next to napi_frag_unref(), since we
> > > > basically need to factor out the insides of this function.
> > > > When you post the patch the page pool crowd will give us
> > > > their opinions.
> > >
> > > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not
> > > set?
> >
> > Without LTO it may still be a function call.
> > Plus, subjectively, I think that it's a bit too much logic to encode in
> > the caller (you must also check skb->pp_recycle, AFAIU)
> > Maybe we should make skb_pp_recycle() take struct page and move it to
> > skbuff.h ? Rename it to skb_page_unref() ?
> >
>
> Does the caller need to check skb->pp_recycle? pp_recycle seems like a
> redundant bit. We can tell whether the page is pp by checking
> is_pp_page(page). the pages in the frag must be pp pages when
> skb->pp_recycle is set and must be non pp pages when the
> skb->pp_recycle is not set, so it all seems redundant to me.
>
AFAIU we don't have to check for pp_recycle, at least not in this specific case.
> My fix would be something like:
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index d577e0bee18d..cc737b7b9860 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -3507,17 +3507,25 @@ int skb_cow_data_for_xdp(struct page_pool
> *pool, struct sk_buff **pskb,
> bool napi_pp_put_page(struct page *page, bool napi_safe);
>
> static inline void
> -napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
> +napi_page_unref(struct page *page, bool napi_safe)
> {
> - struct page *page = skb_frag_page(frag);
> -
> #ifdef CONFIG_PAGE_POOL
> - if (recycle && napi_pp_put_page(page, napi_safe))
> + if (napi_pp_put_page(page, napi_safe))
> return;
> #endif
> put_page(page);
> }
>
> +static inline void
> +napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
> +{
> + struct page *page = skb_frag_page(frag);
> +
> + DEBUG_NET_WARN_ON(recycle != is_pp_page(page));
> +
> + napi_page_unref(page);
> +}
> +
>
> And then use napi_page_unref() in the callers to handle page pool &
> non-page pool gracefully without leaking page pool internals to the
> callers.
>
We'd also need to add is_pp_page() in the header with the changes above...
On that line of thought, unless these new APIs are useful for other use-cases,
why not keep it simple:
- Move is_pp_page() to skbuff.h.
- Do a simple is_pp_page(page) ? page_pool_put_full_page(page):put_page(page) in
the caller? Checking skb->pp_recycle would not be needed.
Thanks,
Dragos
> > > Regarding stable would I need to send a separate fix that does the raw pp page
> > > check without the API?
> >
> > You can put them in one patch, I reckon.
>
On Wed, Mar 6, 2024 at 9:10 AM Dragos Tatulea <[email protected]> wrote:
>
> On Wed, 2024-03-06 at 08:40 -0800, Mina Almasry wrote:
> > On Wed, Mar 6, 2024 at 8:09 AM Jakub Kicinski <[email protected]> wrote:
> > >
> > > On Wed, 6 Mar 2024 16:00:46 +0000 Dragos Tatulea wrote:
> > > > > Hm, that's a judgment call.
> > > > > Part of me wants to put it next to napi_frag_unref(), since we
> > > > > basically need to factor out the insides of this function.
> > > > > When you post the patch the page pool crowd will give us
> > > > > their opinions.
> > > >
> > > > Why not have napi_pp_put_page simply return false if CONFIG_PAGE_POOL is not
> > > > set?
> > >
> > > Without LTO it may still be a function call.
> > > Plus, subjectively, I think that it's a bit too much logic to encode in
> > > the caller (you must also check skb->pp_recycle, AFAIU)
> > > Maybe we should make skb_pp_recycle() take struct page and move it to
> > > skbuff.h ? Rename it to skb_page_unref() ?
> > >
> >
> > Does the caller need to check skb->pp_recycle? pp_recycle seems like a
> > redundant bit. We can tell whether the page is pp by checking
> > is_pp_page(page). the pages in the frag must be pp pages when
> > skb->pp_recycle is set and must be non pp pages when the
> > skb->pp_recycle is not set, so it all seems redundant to me.
> >
> AFAIU we don't have to check for pp_recycle, at least not in this specific case.
>
> > My fix would be something like:
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index d577e0bee18d..cc737b7b9860 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -3507,17 +3507,25 @@ int skb_cow_data_for_xdp(struct page_pool
> > *pool, struct sk_buff **pskb,
> > bool napi_pp_put_page(struct page *page, bool napi_safe);
> >
> > static inline void
> > -napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
> > +napi_page_unref(struct page *page, bool napi_safe)
> > {
> > - struct page *page = skb_frag_page(frag);
> > -
> > #ifdef CONFIG_PAGE_POOL
> > - if (recycle && napi_pp_put_page(page, napi_safe))
> > + if (napi_pp_put_page(page, napi_safe))
> > return;
> > #endif
> > put_page(page);
> > }
> >
> > +static inline void
> > +napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
> > +{
> > + struct page *page = skb_frag_page(frag);
> > +
> > + DEBUG_NET_WARN_ON(recycle != is_pp_page(page));
> > +
> > + napi_page_unref(page);
> > +}
> > +
> >
> > And then use napi_page_unref() in the callers to handle page pool &
> > non-page pool gracefully without leaking page pool internals to the
> > callers.
> >
> We'd also need to add is_pp_page() in the header with the changes above...
>
Gah, I did not realize skbuff.h doesn't have is_pp_page(). Sorry!
> On that line of thought, unless these new APIs are useful for other use-cases,
> why not keep it simple:
> - Move is_pp_page() to skbuff.h.
I prefer this. is_pp_page() is a one-liner anyway.
> - Do a simple is_pp_page(page) ? page_pool_put_full_page(page):put_page(page) in
> the caller? Checking skb->pp_recycle would not be needed.
>
IMHO page pool internals should not leak to the callers like this.
There should be a helper that does the right thing for pp & non-pp
pages so that the caller can use it without worrying about subtle pp
concepts that they may miss at some callsites, but I'm fine either way
if there is strong disagreement from others.
> Thanks,
> Dragos
>
> > > > Regarding stable would I need to send a separate fix that does the raw pp page
> > > > check without the API?
> > >
> > > You can put them in one patch, I reckon.
> >
>
--
Thanks,
Mina
On Wed, 6 Mar 2024 17:09:57 +0000 Dragos Tatulea wrote:
> > Does the caller need to check skb->pp_recycle? pp_recycle seems like a
> > redundant bit. We can tell whether the page is pp by checking
> > is_pp_page(page). the pages in the frag must be pp pages when
> > skb->pp_recycle is set and must be non pp pages when the
> > skb->pp_recycle is not set, so it all seems redundant to me.
> >
> AFAIU we don't have to check for pp_recycle, at least not in this specific case.
Definitely not something we assuming in a fix that needs to go
to stable.
So far, AFAIU, it's legal to have an skb without skb->pp_recycle
set, which holds full page refs to a PP page.
On Wed, Mar 6, 2024 at 9:41 AM Jakub Kicinski <[email protected]> wrote:
>
> On Wed, 6 Mar 2024 17:09:57 +0000 Dragos Tatulea wrote:
> > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a
> > > redundant bit. We can tell whether the page is pp by checking
> > > is_pp_page(page). the pages in the frag must be pp pages when
> > > skb->pp_recycle is set and must be non pp pages when the
> > > skb->pp_recycle is not set, so it all seems redundant to me.
> > >
> > AFAIU we don't have to check for pp_recycle, at least not in this specific case.
>
> Definitely not something we assuming in a fix that needs to go
> to stable.
>
> So far, AFAIU, it's legal to have an skb without skb->pp_recycle
> set, which holds full page refs to a PP page.
Interesting. I apologize then I did not realize this combination is
legal. But I have a follow up question: what is the ref code supposed
to do in this combination? AFAIU:
- skb->pp_recycle && is_pp_page()
ref via page_pool_ref_page()
unref via page_pool_unref_page()
- !skb->pp_recycle && !is_pp_page()
ref via get_page()
unref via put_page()
- !skb->pp_recycle && is_pp_page()
ref via ?
unref via?
Also is this combination also legal you think? and if so what to do?
- skb->pp_recycle && !is_pp_page()
ref via ?
unref via?
I'm asking because I'm starting to wonder if this patch has some issue in it:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Because in this patch, if we have a !skb->pp_recycle & is_pp_page()
combination we end up doing in skb_try_coalesce():
ref via page_pool_ref_page()
unref via put_page() via eventual napi_frag_unref()
which seems like an issue, no?
--
Thanks,
Mina
On Wed, Mar 6, 2024 at 10:41 AM Mina Almasry <[email protected]> wrote:
>
> On Wed, Mar 6, 2024 at 9:41 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Wed, 6 Mar 2024 17:09:57 +0000 Dragos Tatulea wrote:
> > > > Does the caller need to check skb->pp_recycle? pp_recycle seems like a
> > > > redundant bit. We can tell whether the page is pp by checking
> > > > is_pp_page(page). the pages in the frag must be pp pages when
> > > > skb->pp_recycle is set and must be non pp pages when the
> > > > skb->pp_recycle is not set, so it all seems redundant to me.
> > > >
> > > AFAIU we don't have to check for pp_recycle, at least not in this specific case.
> >
> > Definitely not something we assuming in a fix that needs to go
> > to stable.
> >
> > So far, AFAIU, it's legal to have an skb without skb->pp_recycle
> > set, which holds full page refs to a PP page.
>
> Interesting. I apologize then I did not realize this combination is
> legal. But I have a follow up question: what is the ref code supposed
> to do in this combination? AFAIU:
>
> - skb->pp_recycle && is_pp_page()
> ref via page_pool_ref_page()
> unref via page_pool_unref_page()
>
> - !skb->pp_recycle && !is_pp_page()
> ref via get_page()
> unref via put_page()
>
> - !skb->pp_recycle && is_pp_page()
> ref via ?
> unref via?
>
> Also is this combination also legal you think? and if so what to do?
> - skb->pp_recycle && !is_pp_page()
> ref via ?
> unref via?
>
> I'm asking because I'm starting to wonder if this patch has some issue in it:
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
>
> Because in this patch, if we have a !skb->pp_recycle & is_pp_page()
> combination we end up doing in skb_try_coalesce():
> ref via page_pool_ref_page()
> unref via put_page() via eventual napi_frag_unref()
>
> which seems like an issue, no?
>
Gah, nevermind, skb_pp_frag_ref() actually returns -EINVAL if
!skb->pp_recycle, and in the call site we do a skb_frag_ref() on this
error, so all in all we end up doing a get_page/put_page pair. Sorry
for the noise.
So we're supposed to:
- !skb->pp_recycle && is_pp_page()
ref via get_page
unref via put_page
Very subtle stuff (for me at least). I'll try to propose some cleanup
to make this a bit simpler using helpers that handle all these subtle
details internally so that the call sites don't have to do this
special handling.
--
Thanks,
Mina
On Wed, 6 Mar 2024 10:46:45 -0800 Mina Almasry wrote:
> Gah, nevermind, skb_pp_frag_ref() actually returns -EINVAL if
> !skb->pp_recycle, and in the call site we do a skb_frag_ref() on this
> error, so all in all we end up doing a get_page/put_page pair. Sorry
> for the noise.
>
> So we're supposed to:
> - !skb->pp_recycle && is_pp_page()
> ref via get_page
> unref via put_page
>
> Very subtle stuff (for me at least). I'll try to propose some cleanup
> to make this a bit simpler using helpers that handle all these subtle
> details internally so that the call sites don't have to do this
> special handling.
Sure, although complexity is complexity, we can only do so much to hide
it.
For pp_recycle - the problem is when we added page pool pages, hardly
anything in the upper layers of the stack was made pp aware. So we can
end up with someone doing
get_page(page);
skb_fill_page_desc(skb, page);
on a PP page.
You probably inspected a lot of those cases for the ZC work, and they
can be fixed up to do a "pp-aware get()", but until then we need
skb->pp_recycle. pp_recycle kinda denotes whether whoever constructed
the skb was PP aware.
So, yes, definitely a good long term goal, but not in a fix :)