2023-03-16 17:52:58

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH bpf-next 0/2] double-fix bpf_test_run + XDP_PASS recycling

Enabling skb PP recycling revealed a couple issues in the bpf_test_run
code. Recycling broke the assumption that the headroom won't ever be
touched during the test_run execution: xdp_scrub_frame() invalidates the
XDP frame at the headroom start, while neigh xmit code overwrites 2 bytes
to the left of the Ethernet header. The first makes the kernel panic in
certain cases, while the second breaks xdp_do_redirect selftest on BE.
test_run is a limited-scope entity, so let's hope no more corner cases
will happen here or at least they will be as easy and pleasant to fix
as those two.

Alexander Lobakin (2):
bpf, test_run: fix crashes due to XDP frame overwriting/corruption
selftests/bpf: fix "metadata marker" getting overwritten by the
netstack

net/bpf/test_run.c | 12 +++++++++++-
.../selftests/bpf/prog_tests/xdp_do_redirect.c | 7 ++++---
.../selftests/bpf/progs/test_xdp_do_redirect.c | 2 +-
3 files changed, 16 insertions(+), 5 deletions(-)

--
2.39.2



2023-03-16 17:53:07

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH bpf-next 1/2] bpf, test_run: fix crashes due to XDP frame overwriting/corruption

syzbot and Ilya faced the splats when %XDP_PASS happens for bpf_test_run
after skb PP recycling was enabled for {__,}xdp_build_skb_from_frame():

BUG: kernel NULL pointer dereference, address: 0000000000000d28
RIP: 0010:memset_erms+0xd/0x20 arch/x86/lib/memset_64.S:66
[...]
Call Trace:
<TASK>
__finalize_skb_around net/core/skbuff.c:321 [inline]
__build_skb_around+0x232/0x3a0 net/core/skbuff.c:379
build_skb_around+0x32/0x290 net/core/skbuff.c:444
__xdp_build_skb_from_frame+0x121/0x760 net/core/xdp.c:622
xdp_recv_frames net/bpf/test_run.c:248 [inline]
xdp_test_run_batch net/bpf/test_run.c:334 [inline]
bpf_test_run_xdp_live+0x1289/0x1930 net/bpf/test_run.c:362
bpf_prog_test_run_xdp+0xa05/0x14e0 net/bpf/test_run.c:1418
[...]

This happens due to that it calls xdp_scrub_frame(), which nullifies
xdpf->data. bpf_test_run code doesn't reinit the frame when the XDP
program doesn't adjust head or tail. Previously, %XDP_PASS meant the
page will be released from the pool and returned to the MM layer, but
now it does return to the Pool with the nullified xdpf->data, which
doesn't get reinitialized then.
So, in addition to checking whether the head and/or tail have been
adjusted, check also for a potential XDP frame corruption. xdpf->data
is 100% affected and also xdpf->flags is the field closest to the
metadata / frame start. Checking for these two should be enough for
non-extreme cases.

Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from XDP frames")
Reported-by: [email protected]
Link: https://lore.kernel.org/bpf/[email protected]
Reported-by: Ilya Leoshkevich <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/bpf/test_run.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 71226f68270d..8d6b31209bd6 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -208,6 +208,16 @@ static void xdp_test_run_teardown(struct xdp_test_data *xdp)
kfree(xdp->skbs);
}

+static bool frame_was_changed(const struct xdp_page_head *head)
+{
+ /* xdp_scrub_frame() zeroes the data pointer, flags is the last field,
+ * i.e. has the highest chances to be overwritten. If those two are
+ * untouched, it's most likely safe to skip the context reset.
+ */
+ return head->frm.data != head->orig_ctx.data ||
+ head->frm.flags != head->orig_ctx.flags;
+}
+
static bool ctx_was_changed(struct xdp_page_head *head)
{
return head->orig_ctx.data != head->ctx.data ||
@@ -217,7 +227,7 @@ static bool ctx_was_changed(struct xdp_page_head *head)

static void reset_ctx(struct xdp_page_head *head)
{
- if (likely(!ctx_was_changed(head)))
+ if (likely(!frame_was_changed(head) && !ctx_was_changed(head)))
return;

head->ctx.data = head->orig_ctx.data;
--
2.39.2


2023-03-16 17:53:11

by Alexander Lobakin

[permalink] [raw]
Subject: [PATCH bpf-next 2/2] selftests/bpf: fix "metadata marker" getting overwritten by the netstack

Alexei noticed xdp_do_redirect test on BPF CI started failing on
BE systems after skb PP recycling was enabled:

test_xdp_do_redirect:PASS:prog_run 0 nsec
test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
220 != expected 9998
test_max_pkt_size:PASS:prog_run_max_size 0 nsec
test_max_pkt_size:PASS:prog_run_too_big 0 nsec
close_netns:PASS:setns 0 nsec
#289 xdp_do_redirect:FAIL
Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED

and it doesn't happen on LE systems.
Ilya then hunted it down to:

#0 0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0,
skb=0x88142200) at linux/include/net/neighbour.h:503
#1 0x0000000000ab2cda in neigh_output (skip_cache=false,
skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
#2 ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
#3 0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0,
net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
#4 ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at
linux/net/ipv6/ip6_output.c:206

xdp_do_redirect test places a u32 marker (0x42) right before the Ethernet
header to check it then in the XDP program and return %XDP_ABORTED if it's
not there. Neigh xmit code likes to round up hard header length to speed
up copying the header, so it overwrites two bytes in front of the Eth
header. On LE systems, 0x42 is one byte at `data - 4`, while on BE it's
`data - 1`, what explains why it happens only there.
It didn't happen previously due to that %XDP_PASS meant the page will be
discarded and replaced by a new one, but now it can be recycled as well,
while bpf_test_run code doesn't reinitialize the content of recycled
pages. This mark is limited to this particular test and its setup though,
so there's no need to predict 1000 different possible cases. Just move
it 4 bytes to the left, still keeping it 32 bit to match on more bytes.

Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from XDP frames")
Reported-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/CAADnVQ+B_JOU+EpP=DKhbY9yXdN6GiRPnpTTXfEZ9sNkUeb-yQ@mail.gmail.com
Reported-by: Ilya Leoshkevich <[email protected]> # + debugging
Link: https://lore.kernel.org/bpf/[email protected]
Signed-off-by: Alexander Lobakin <[email protected]>
---
tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c | 7 ++++---
tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
index 856cbc29e6a1..4eaa3dcaebc8 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -86,12 +86,12 @@ static void test_max_pkt_size(int fd)
void test_xdp_do_redirect(void)
{
int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst;
- char data[sizeof(pkt_udp) + sizeof(__u32)];
+ char data[sizeof(pkt_udp) + sizeof(__u64)];
struct test_xdp_do_redirect *skel = NULL;
struct nstoken *nstoken = NULL;
struct bpf_link *link;
LIBBPF_OPTS(bpf_xdp_query_opts, query_opts);
- struct xdp_md ctx_in = { .data = sizeof(__u32),
+ struct xdp_md ctx_in = { .data = sizeof(__u64),
.data_end = sizeof(data) };
DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
.data_in = &data,
@@ -105,8 +105,9 @@ void test_xdp_do_redirect(void)
DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
.attach_point = BPF_TC_INGRESS);

- memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
+ memcpy(&data[sizeof(__u64)], &pkt_udp, sizeof(pkt_udp));
*((__u32 *)data) = 0x42; /* metadata test value */
+ *((__u32 *)data + 4) = 0;

skel = test_xdp_do_redirect__open();
if (!ASSERT_OK_PTR(skel, "skel"))
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
index cd2d4e3258b8..5baaafed0d2d 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)

*payload = MARK_IN;

- if (bpf_xdp_adjust_meta(xdp, 4))
+ if (bpf_xdp_adjust_meta(xdp, sizeof(__u64)))
return XDP_ABORTED;

if (retcode > XDP_PASS)
--
2.39.2


2023-03-16 20:10:28

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH bpf-next 1/2] bpf, test_run: fix crashes due to XDP frame overwriting/corruption

Alexander Lobakin <[email protected]> writes:

> syzbot and Ilya faced the splats when %XDP_PASS happens for bpf_test_run
> after skb PP recycling was enabled for {__,}xdp_build_skb_from_frame():
>
> BUG: kernel NULL pointer dereference, address: 0000000000000d28
> RIP: 0010:memset_erms+0xd/0x20 arch/x86/lib/memset_64.S:66
> [...]
> Call Trace:
> <TASK>
> __finalize_skb_around net/core/skbuff.c:321 [inline]
> __build_skb_around+0x232/0x3a0 net/core/skbuff.c:379
> build_skb_around+0x32/0x290 net/core/skbuff.c:444
> __xdp_build_skb_from_frame+0x121/0x760 net/core/xdp.c:622
> xdp_recv_frames net/bpf/test_run.c:248 [inline]
> xdp_test_run_batch net/bpf/test_run.c:334 [inline]
> bpf_test_run_xdp_live+0x1289/0x1930 net/bpf/test_run.c:362
> bpf_prog_test_run_xdp+0xa05/0x14e0 net/bpf/test_run.c:1418
> [...]
>
> This happens due to that it calls xdp_scrub_frame(), which nullifies
> xdpf->data. bpf_test_run code doesn't reinit the frame when the XDP
> program doesn't adjust head or tail. Previously, %XDP_PASS meant the
> page will be released from the pool and returned to the MM layer, but
> now it does return to the Pool with the nullified xdpf->data, which
> doesn't get reinitialized then.
> So, in addition to checking whether the head and/or tail have been
> adjusted, check also for a potential XDP frame corruption. xdpf->data
> is 100% affected and also xdpf->flags is the field closest to the
> metadata / frame start. Checking for these two should be enough for
> non-extreme cases.
>
> Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from XDP frames")
> Reported-by: [email protected]
> Link: https://lore.kernel.org/bpf/[email protected]
> Reported-by: Ilya Leoshkevich <[email protected]>
> Link: https://lore.kernel.org/bpf/[email protected]
> Signed-off-by: Alexander Lobakin <[email protected]>

Acked-by: Toke Høiland-Jørgensen <[email protected]>


2023-03-16 20:11:01

by Toke Høiland-Jørgensen

[permalink] [raw]
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: fix "metadata marker" getting overwritten by the netstack

Alexander Lobakin <[email protected]> writes:

> Alexei noticed xdp_do_redirect test on BPF CI started failing on
> BE systems after skb PP recycling was enabled:
>
> test_xdp_do_redirect:PASS:prog_run 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
> 220 != expected 9998
> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> close_netns:PASS:setns 0 nsec
> #289 xdp_do_redirect:FAIL
> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>
> and it doesn't happen on LE systems.
> Ilya then hunted it down to:
>
> #0 0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0,
> skb=0x88142200) at linux/include/net/neighbour.h:503
> #1 0x0000000000ab2cda in neigh_output (skip_cache=false,
> skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
> #2 ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
> skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
> #3 0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0,
> net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
> #4 ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at
> linux/net/ipv6/ip6_output.c:206
>
> xdp_do_redirect test places a u32 marker (0x42) right before the Ethernet
> header to check it then in the XDP program and return %XDP_ABORTED if it's
> not there. Neigh xmit code likes to round up hard header length to speed
> up copying the header, so it overwrites two bytes in front of the Eth
> header. On LE systems, 0x42 is one byte at `data - 4`, while on BE it's
> `data - 1`, what explains why it happens only there.
> It didn't happen previously due to that %XDP_PASS meant the page will be
> discarded and replaced by a new one, but now it can be recycled as well,
> while bpf_test_run code doesn't reinitialize the content of recycled
> pages. This mark is limited to this particular test and its setup though,
> so there's no need to predict 1000 different possible cases. Just move
> it 4 bytes to the left, still keeping it 32 bit to match on more
> bytes.

Wow, this must have been annoying to track down - nice work :)

> Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from XDP frames")
> Reported-by: Alexei Starovoitov <[email protected]>
> Link: https://lore.kernel.org/bpf/CAADnVQ+B_JOU+EpP=DKhbY9yXdN6GiRPnpTTXfEZ9sNkUeb-yQ@mail.gmail.com
> Reported-by: Ilya Leoshkevich <[email protected]> # + debugging
> Link: https://lore.kernel.org/bpf/[email protected]
> Signed-off-by: Alexander Lobakin <[email protected]>

Acked-by: Toke Høiland-Jørgensen <[email protected]>


2023-03-16 21:23:37

by Ilya Leoshkevich

[permalink] [raw]
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: fix "metadata marker" getting overwritten by the netstack

On Thu, 2023-03-16 at 18:50 +0100, Alexander Lobakin wrote:
> Alexei noticed xdp_do_redirect test on BPF CI started failing on
> BE systems after skb PP recycling was enabled:
>
> test_xdp_do_redirect:PASS:prog_run 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc:
> actual
> 220 != expected 9998
> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
> close_netns:PASS:setns 0 nsec
>  #289 xdp_do_redirect:FAIL
> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>
> and it doesn't happen on LE systems.
> Ilya then hunted it down to:
>
>  #0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0,
> skb=0x88142200) at linux/include/net/neighbour.h:503
>  #1  0x0000000000ab2cda in neigh_output (skip_cache=false,
> skb=0x88142200, n=<optimized out>) at
> linux/include/net/neighbour.h:544
>  #2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
> skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
>  #3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200,
> sk=0x0,
> net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
>  #4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at
> linux/net/ipv6/ip6_output.c:206
>
> xdp_do_redirect test places a u32 marker (0x42) right before the
> Ethernet
> header to check it then in the XDP program and return %XDP_ABORTED if
> it's
> not there. Neigh xmit code likes to round up hard header length to
> speed
> up copying the header, so it overwrites two bytes in front of the Eth
> header. On LE systems, 0x42 is one byte at `data - 4`, while on BE
> it's
> `data - 1`, what explains why it happens only there.
> It didn't happen previously due to that %XDP_PASS meant the page will
> be
> discarded and replaced by a new one, but now it can be recycled as
> well,
> while bpf_test_run code doesn't reinitialize the content of recycled
> pages. This mark is limited to this particular test and its setup
> though,
> so there's no need to predict 1000 different possible cases. Just
> move
> it 4 bytes to the left, still keeping it 32 bit to match on more
> bytes.
>
> Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from
> XDP frames")
> Reported-by: Alexei Starovoitov <[email protected]>
> Link:
> https://lore.kernel.org/bpf/CAADnVQ+B_JOU+EpP=DKhbY9yXdN6GiRPnpTTXfEZ9sNkUeb-yQ@mail.gmail.com
> Reported-by: Ilya Leoshkevich <[email protected]> # + debugging
> Link:
> https://lore.kernel.org/bpf/[email protected]
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
>  tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c | 7 ++++---
>  tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c | 2 +-
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> index 856cbc29e6a1..4eaa3dcaebc8 100644
> --- a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
> @@ -86,12 +86,12 @@ static void test_max_pkt_size(int fd)
>  void test_xdp_do_redirect(void)
>  {
>         int err, xdp_prog_fd, tc_prog_fd, ifindex_src, ifindex_dst;
> -       char data[sizeof(pkt_udp) + sizeof(__u32)];
> +       char data[sizeof(pkt_udp) + sizeof(__u64)];
>         struct test_xdp_do_redirect *skel = NULL;
>         struct nstoken *nstoken = NULL;
>         struct bpf_link *link;
>         LIBBPF_OPTS(bpf_xdp_query_opts, query_opts);
> -       struct xdp_md ctx_in = { .data = sizeof(__u32),
> +       struct xdp_md ctx_in = { .data = sizeof(__u64),
>                                  .data_end = sizeof(data) };
>         DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
>                             .data_in = &data,
> @@ -105,8 +105,9 @@ void test_xdp_do_redirect(void)
>         DECLARE_LIBBPF_OPTS(bpf_tc_hook, tc_hook,
>                             .attach_point = BPF_TC_INGRESS);
>  
> -       memcpy(&data[sizeof(__u32)], &pkt_udp, sizeof(pkt_udp));
> +       memcpy(&data[sizeof(__u64)], &pkt_udp, sizeof(pkt_udp));
>         *((__u32 *)data) = 0x42; /* metadata test value */
> +       *((__u32 *)data + 4) = 0;
>  
>         skel = test_xdp_do_redirect__open();
>         if (!ASSERT_OK_PTR(skel, "skel"))
> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> index cd2d4e3258b8..5baaafed0d2d 100644
> --- a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
> @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
>  
>         *payload = MARK_IN;
>  
> -       if (bpf_xdp_adjust_meta(xdp, 4))
> +       if (bpf_xdp_adjust_meta(xdp, sizeof(__u64)))
>                 return XDP_ABORTED;
>  
>         if (retcode > XDP_PASS)

Tested-by: Ilya Leoshkevich <[email protected]>

2023-03-16 21:23:53

by Ilya Leoshkevich

[permalink] [raw]
Subject: Re: [PATCH bpf-next 1/2] bpf, test_run: fix crashes due to XDP frame overwriting/corruption

On Thu, 2023-03-16 at 18:50 +0100, Alexander Lobakin wrote:
> syzbot and Ilya faced the splats when %XDP_PASS happens for
> bpf_test_run
> after skb PP recycling was enabled for
> {__,}xdp_build_skb_from_frame():
>
> BUG: kernel NULL pointer dereference, address: 0000000000000d28
> RIP: 0010:memset_erms+0xd/0x20 arch/x86/lib/memset_64.S:66
> [...]
> Call Trace:
>  <TASK>
>  __finalize_skb_around net/core/skbuff.c:321 [inline]
>  __build_skb_around+0x232/0x3a0 net/core/skbuff.c:379
>  build_skb_around+0x32/0x290 net/core/skbuff.c:444
>  __xdp_build_skb_from_frame+0x121/0x760 net/core/xdp.c:622
>  xdp_recv_frames net/bpf/test_run.c:248 [inline]
>  xdp_test_run_batch net/bpf/test_run.c:334 [inline]
>  bpf_test_run_xdp_live+0x1289/0x1930 net/bpf/test_run.c:362
>  bpf_prog_test_run_xdp+0xa05/0x14e0 net/bpf/test_run.c:1418
> [...]
>
> This happens due to that it calls xdp_scrub_frame(), which nullifies
> xdpf->data. bpf_test_run code doesn't reinit the frame when the XDP
> program doesn't adjust head or tail. Previously, %XDP_PASS meant the
> page will be released from the pool and returned to the MM layer, but
> now it does return to the Pool with the nullified xdpf->data, which
> doesn't get reinitialized then.
> So, in addition to checking whether the head and/or tail have been
> adjusted, check also for a potential XDP frame corruption. xdpf->data
> is 100% affected and also xdpf->flags is the field closest to the
> metadata / frame start. Checking for these two should be enough for
> non-extreme cases.
>
> Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from
> XDP frames")
> Reported-by: [email protected]
> Link:
> https://lore.kernel.org/bpf/[email protected]
> Reported-by: Ilya Leoshkevich <[email protected]>
> Link:
> https://lore.kernel.org/bpf/[email protected]
> Signed-off-by: Alexander Lobakin <[email protected]>
> ---
>  net/bpf/test_run.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 71226f68270d..8d6b31209bd6 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -208,6 +208,16 @@ static void xdp_test_run_teardown(struct
> xdp_test_data *xdp)
>         kfree(xdp->skbs);
>  }
>  
> +static bool frame_was_changed(const struct xdp_page_head *head)
> +{
> +       /* xdp_scrub_frame() zeroes the data pointer, flags is the
> last field,
> +        * i.e. has the highest chances to be overwritten. If those
> two are
> +        * untouched, it's most likely safe to skip the context
> reset.
> +        */
> +       return head->frm.data != head->orig_ctx.data ||
> +              head->frm.flags != head->orig_ctx.flags;
> +}
> +
>  static bool ctx_was_changed(struct xdp_page_head *head)
>  {
>         return head->orig_ctx.data != head->ctx.data ||
> @@ -217,7 +227,7 @@ static bool ctx_was_changed(struct xdp_page_head
> *head)
>  
>  static void reset_ctx(struct xdp_page_head *head)
>  {
> -       if (likely(!ctx_was_changed(head)))
> +       if (likely(!frame_was_changed(head) &&
> !ctx_was_changed(head)))
>                 return;
>  
>         head->ctx.data = head->orig_ctx.data;

With this test begins to work on s390x:

# ./test_progs -t xdp_do_redirect
IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready
IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
#290 xdp_do_redirect:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Thanks!

Tested-by: Ilya Leoshkevich <[email protected]>

2023-03-17 05:30:26

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH bpf-next 0/2] double-fix bpf_test_run + XDP_PASS recycling

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <[email protected]>:

On Thu, 16 Mar 2023 18:50:49 +0100 you wrote:
> Enabling skb PP recycling revealed a couple issues in the bpf_test_run
> code. Recycling broke the assumption that the headroom won't ever be
> touched during the test_run execution: xdp_scrub_frame() invalidates the
> XDP frame at the headroom start, while neigh xmit code overwrites 2 bytes
> to the left of the Ethernet header. The first makes the kernel panic in
> certain cases, while the second breaks xdp_do_redirect selftest on BE.
> test_run is a limited-scope entity, so let's hope no more corner cases
> will happen here or at least they will be as easy and pleasant to fix
> as those two.
>
> [...]

Here is the summary with links:
- [bpf-next,1/2] bpf, test_run: fix crashes due to XDP frame overwriting/corruption
https://git.kernel.org/bpf/bpf-next/c/e5995bc7e2ba
- [bpf-next,2/2] selftests/bpf: fix "metadata marker" getting overwritten by the netstack
https://git.kernel.org/bpf/bpf-next/c/5640b6d89434

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



2023-03-17 13:40:21

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: fix "metadata marker" getting overwritten by the netstack

From: Toke Høiland-Jørgensen <[email protected]>
Date: Thu, 16 Mar 2023 21:10:02 +0100

> Alexander Lobakin <[email protected]> writes:
>
>> Alexei noticed xdp_do_redirect test on BPF CI started failing on
>> BE systems after skb PP recycling was enabled:
>>
>> test_xdp_do_redirect:PASS:prog_run 0 nsec
>> test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
>> test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
>> test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
>> 220 != expected 9998
>> test_max_pkt_size:PASS:prog_run_max_size 0 nsec
>> test_max_pkt_size:PASS:prog_run_too_big 0 nsec
>> close_netns:PASS:setns 0 nsec
>> #289 xdp_do_redirect:FAIL
>> Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED
>>
>> and it doesn't happen on LE systems.
>> Ilya then hunted it down to:
>>
>> #0 0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0,
>> skb=0x88142200) at linux/include/net/neighbour.h:503
>> #1 0x0000000000ab2cda in neigh_output (skip_cache=false,
>> skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
>> #2 ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
>> skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
>> #3 0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0,
>> net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
>> #4 ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at
>> linux/net/ipv6/ip6_output.c:206
>>
>> xdp_do_redirect test places a u32 marker (0x42) right before the Ethernet
>> header to check it then in the XDP program and return %XDP_ABORTED if it's
>> not there. Neigh xmit code likes to round up hard header length to speed
>> up copying the header, so it overwrites two bytes in front of the Eth
>> header. On LE systems, 0x42 is one byte at `data - 4`, while on BE it's
>> `data - 1`, what explains why it happens only there.
>> It didn't happen previously due to that %XDP_PASS meant the page will be
>> discarded and replaced by a new one, but now it can be recycled as well,
>> while bpf_test_run code doesn't reinitialize the content of recycled
>> pages. This mark is limited to this particular test and its setup though,
>> so there's no need to predict 1000 different possible cases. Just move
>> it 4 bytes to the left, still keeping it 32 bit to match on more
>> bytes.
>
> Wow, this must have been annoying to track down - nice work :)

I just blinked my eyes once and Ilya already came back with the detailed
stacktrace, so it's almost entirely his work <O

>
>> Fixes: 9c94bbf9a87b ("xdp: recycle Page Pool backed skbs built from XDP frames")
>> Reported-by: Alexei Starovoitov <[email protected]>
>> Link: https://lore.kernel.org/bpf/CAADnVQ+B_JOU+EpP=DKhbY9yXdN6GiRPnpTTXfEZ9sNkUeb-yQ@mail.gmail.com
>> Reported-by: Ilya Leoshkevich <[email protected]> # + debugging
>> Link: https://lore.kernel.org/bpf/[email protected]
>> Signed-off-by: Alexander Lobakin <[email protected]>
>
> Acked-by: Toke Høiland-Jørgensen <[email protected]>
>

Thanks! That was quick :D

Olek

2023-03-17 13:42:08

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [PATCH bpf-next 2/2] selftests/bpf: fix "metadata marker" getting overwritten by the netstack

From: Ilya Leoshkevich <[email protected]>
Date: Thu, 16 Mar 2023 22:22:26 +0100

> On Thu, 2023-03-16 at 18:50 +0100, Alexander Lobakin wrote:
>> Alexei noticed xdp_do_redirect test on BPF CI started failing on
>> BE systems after skb PP recycling was enabled:

[...]

>> @@ -52,7 +52,7 @@ int xdp_redirect(struct xdp_md *xdp)
>>  
>>         *payload = MARK_IN;
>>  
>> -       if (bpf_xdp_adjust_meta(xdp, 4))
>> +       if (bpf_xdp_adjust_meta(xdp, sizeof(__u64)))
>>                 return XDP_ABORTED;
>>  
>>         if (retcode > XDP_PASS)
>
> Tested-by: Ilya Leoshkevich <[email protected]>

Much thanks again for the debugging and testing! It definitely wouldn't
have been quick without you :D

Olek