2021-02-03 00:48:48

by Brendan Jackman

Subject: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

When BPF_FETCH is set, atomic instructions load a value from memory
into a register. The current verifier code first checks via
check_mem_access whether we can access the memory, and then checks
via check_reg_arg whether we can write into the register.

For loads, check_reg_arg has the side effect of marking the
register's value as unknown, and check_mem_access has the side effect
of propagating bounds from memory to the register. This currently only
takes effect for stack memory.

Therefore with the current order, bounds information is thrown away,
but by simply reversing the order of check_reg_arg
vs. check_mem_access, we can instead propagate bounds smartly.
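
In outline, the fix reorders two calls in check_atomic() (a simplified
sketch; the diff below has the real code):

    /* Old order: memory is checked with no destination register
     * (value_regno == -1), then check_reg_arg() marks the destination
     * as unknown, discarding any bounds just propagated.
     */
    err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
                           BPF_SIZE(insn->code), BPF_READ, -1, true);
    err = check_reg_arg(env, load_reg, DST_OP);

    /* New order: pick and mark the destination first, then let
     * check_mem_access() propagate stack bounds into load_reg.
     */
    err = check_reg_arg(env, load_reg, DST_OP);
    err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
                           BPF_SIZE(insn->code), BPF_READ, load_reg, true);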

A simple test is added with an infinite loop that can only be proved
unreachable if this propagation is present. This is implemented both
with C and directly in test_verifier using assembly.

Suggested-by: John Fastabend <[email protected]>
Signed-off-by: Brendan Jackman <[email protected]>
---

Difference from v2->v3 [1]:

* Fixed missing ENABLE_ATOMICS_TESTS check.

Difference from v1->v2:

* Reworked commit message to clarify this only affects stack memory
* Added the Suggested-by
* Added a C-based test.

[1]: https://lore.kernel.org/bpf/CA+i-1C2ZWUbGxWJ8kAxbri9rBboyuMaVj_BBhg+2Zf_Su9BOJA@mail.gmail.com/T/#t

kernel/bpf/verifier.c | 32 +++++++++++--------
.../selftests/bpf/prog_tests/atomic_bounds.c | 15 +++++++++
.../selftests/bpf/progs/atomic_bounds.c | 24 ++++++++++++++
.../selftests/bpf/verifier/atomic_bounds.c | 27 ++++++++++++++++
4 files changed, 84 insertions(+), 14 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
create mode 100644 tools/testing/selftests/bpf/progs/atomic_bounds.c
create mode 100644 tools/testing/selftests/bpf/verifier/atomic_bounds.c

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 972fc38eb62d..5e09632efddb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3665,9 +3665,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
return -EACCES;
}

+ if (insn->imm & BPF_FETCH) {
+ if (insn->imm == BPF_CMPXCHG)
+ load_reg = BPF_REG_0;
+ else
+ load_reg = insn->src_reg;
+
+ /* check and record load of old value */
+ err = check_reg_arg(env, load_reg, DST_OP);
+ if (err)
+ return err;
+ } else {
+ /* This instruction accesses a memory location but doesn't
+ * actually load it into a register.
+ */
+ load_reg = -1;
+ }
+
/* check whether we can read the memory */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
- BPF_SIZE(insn->code), BPF_READ, -1, true);
+ BPF_SIZE(insn->code), BPF_READ, load_reg, true);
if (err)
return err;

@@ -3677,19 +3694,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
if (err)
return err;

- if (!(insn->imm & BPF_FETCH))
- return 0;
-
- if (insn->imm == BPF_CMPXCHG)
- load_reg = BPF_REG_0;
- else
- load_reg = insn->src_reg;
-
- /* check and record load of old value */
- err = check_reg_arg(env, load_reg, DST_OP);
- if (err)
- return err;
-
return 0;
}

diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
new file mode 100644
index 000000000000..addf127068e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include "atomic_bounds.skel.h"
+
+void test_atomic_bounds(void)
+{
+ struct atomic_bounds *skel;
+ __u32 duration = 0;
+
+ skel = atomic_bounds__open_and_load();
+ if (CHECK(!skel, "skel_load", "couldn't load program\n"))
+ return;
+}
diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c
new file mode 100644
index 000000000000..e5fff7fc7f8f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+
+#ifdef ENABLE_ATOMICS_TESTS
+bool skip_tests __attribute((__section__(".data"))) = false;
+#else
+bool skip_tests = true;
+#endif
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(sub, int x)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+ int a = 0;
+ int b = __sync_fetch_and_add(&a, 1);
+ /* b is certainly 0 here. Can the verifier tell? */
+ while (b)
+ continue;
+#endif
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
new file mode 100644
index 000000000000..e82183e4914f
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
@@ -0,0 +1,27 @@
+{
+ "BPF_ATOMIC bounds propagation, mem->reg",
+ .insns = {
+ /* a = 0; */
+ /*
+ * Note this is implemented with two separate instructions,
+ * where you might think one would suffice:
+ *
+ * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ *
+ * This is because BPF_ST_MEM doesn't seem to set the stack slot
+ * type to 0 when storing an immediate.
+ */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+ /* b = atomic_fetch_add(&a, 1); */
+ BPF_MOV64_IMM(BPF_REG_1, 1),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
+ /* Verifier should be able to tell that this infinite loop isn't reachable. */
+ /* if (b) while (true) continue; */
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .result_unpriv = REJECT,
+ .errstr_unpriv = "back-edge",
+},

base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
--
2.30.0.365.g02bc693789-goog


2021-02-03 17:12:04

by Yonghong Song

Subject: Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH



On 2/2/21 5:50 AM, Brendan Jackman wrote:
> When BPF_FETCH is set, atomic instructions load a value from memory
> into a register. The current verifier code first checks via
> check_mem_access whether we can access the memory, and then checks
> via check_reg_arg whether we can write into the register.
>
> For loads, check_reg_arg has the side effect of marking the
> register's value as unknown, and check_mem_access has the side effect
> of propagating bounds from memory to the register. This currently only
> takes effect for stack memory.
>
> Therefore with the current order, bounds information is thrown away,
> but by simply reversing the order of check_reg_arg
> vs. check_mem_access, we can instead propagate bounds smartly.
>
> A simple test is added with an infinite loop that can only be proved
> unreachable if this propagation is present. This is implemented both
> with C and directly in test_verifier using assembly.
>
> Suggested-by: John Fastabend <[email protected]>
> Signed-off-by: Brendan Jackman <[email protected]>

Ack with a nit below.

Acked-by: Yonghong Song <[email protected]>

> ---
>
> Difference from v2->v3 [1]:
>
> * Fixed missing ENABLE_ATOMICS_TESTS check.
>
> Difference from v1->v2:
>
> * Reworked commit message to clarify this only affects stack memory
> * Added the Suggested-by
> * Added a C-based test.
>
> [1]: https://lore.kernel.org/bpf/CA+i-1C2ZWUbGxWJ8kAxbri9rBboyuMaVj_BBhg+2Zf_Su9BOJA@mail.gmail.com/T/#t
>
> kernel/bpf/verifier.c | 32 +++++++++++--------
> .../selftests/bpf/prog_tests/atomic_bounds.c | 15 +++++++++
> .../selftests/bpf/progs/atomic_bounds.c | 24 ++++++++++++++
> .../selftests/bpf/verifier/atomic_bounds.c | 27 ++++++++++++++++
> 4 files changed, 84 insertions(+), 14 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
> create mode 100644 tools/testing/selftests/bpf/progs/atomic_bounds.c
> create mode 100644 tools/testing/selftests/bpf/verifier/atomic_bounds.c
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 972fc38eb62d..5e09632efddb 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3665,9 +3665,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> return -EACCES;
> }
>
> + if (insn->imm & BPF_FETCH) {
> + if (insn->imm == BPF_CMPXCHG)
> + load_reg = BPF_REG_0;
> + else
> + load_reg = insn->src_reg;
> +
> + /* check and record load of old value */
> + err = check_reg_arg(env, load_reg, DST_OP);
> + if (err)
> + return err;
> + } else {
> + /* This instruction accesses a memory location but doesn't
> + * actually load it into a register.
> + */
> + load_reg = -1;
> + }
> +
> /* check whether we can read the memory */
> err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> - BPF_SIZE(insn->code), BPF_READ, -1, true);
> + BPF_SIZE(insn->code), BPF_READ, load_reg, true);
> if (err)
> return err;
>
> @@ -3677,19 +3694,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
> if (err)
> return err;
>
> - if (!(insn->imm & BPF_FETCH))
> - return 0;
> -
> - if (insn->imm == BPF_CMPXCHG)
> - load_reg = BPF_REG_0;
> - else
> - load_reg = insn->src_reg;
> -
> - /* check and record load of old value */
> - err = check_reg_arg(env, load_reg, DST_OP);
> - if (err)
> - return err;
> -
> return 0;
> }
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
> new file mode 100644
> index 000000000000..addf127068e4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +
> +#include "atomic_bounds.skel.h"
> +
> +void test_atomic_bounds(void)
> +{
> + struct atomic_bounds *skel;
> + __u32 duration = 0;
> +
> + skel = atomic_bounds__open_and_load();
> + if (CHECK(!skel, "skel_load", "couldn't load program\n"))
> + return;

You are missing
atomic_bounds__destroy(skel);
here.

> +}
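
i.e. a minimal sketch of the fixed-up test body (per the nit above),
destroying the skeleton once it has loaded:

    skel = atomic_bounds__open_and_load();
    if (CHECK(!skel, "skel_load", "couldn't load program\n"))
        return;
    atomic_bounds__destroy(skel);
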
> diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c
> new file mode 100644
> index 000000000000..e5fff7fc7f8f
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c
> @@ -0,0 +1,24 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <stdbool.h>
> +
> +#ifdef ENABLE_ATOMICS_TESTS
> +bool skip_tests __attribute((__section__(".data"))) = false;
> +#else
> +bool skip_tests = true;
> +#endif
> +
> +SEC("fentry/bpf_fentry_test1")
> +int BPF_PROG(sub, int x)
> +{
> +#ifdef ENABLE_ATOMICS_TESTS
> + int a = 0;
> + int b = __sync_fetch_and_add(&a, 1);
> + /* b is certainly 0 here. Can the verifier tell? */
> + while (b)
> + continue;
> +#endif
> + return 0;
> +}
[...]

2021-02-03 17:42:28

by Alexei Starovoitov

Subject: Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Wed, Feb 3, 2021 at 9:07 AM Yonghong Song <[email protected]> wrote:
>
>
>
> On 2/2/21 5:50 AM, Brendan Jackman wrote:
> > When BPF_FETCH is set, atomic instructions load a value from memory
> > into a register. The current verifier code first checks via
> > check_mem_access whether we can access the memory, and then checks
> > via check_reg_arg whether we can write into the register.
> >
> > For loads, check_reg_arg has the side effect of marking the
> > register's value as unknown, and check_mem_access has the side effect
> > of propagating bounds from memory to the register. This currently only
> > takes effect for stack memory.
> >
> > Therefore with the current order, bounds information is thrown away,
> > but by simply reversing the order of check_reg_arg
> > vs. check_mem_access, we can instead propagate bounds smartly.
> >
> > A simple test is added with an infinite loop that can only be proved
> > unreachable if this propagation is present. This is implemented both
> > with C and directly in test_verifier using assembly.
> >
> > Suggested-by: John Fastabend <[email protected]>
> > Signed-off-by: Brendan Jackman <[email protected]>
>
> Ack with a nit below.

Sorry it was already applied yesterday.
patchbot just didn't send auto-reply.

2021-06-27 15:38:19

by Jiri Olsa

Subject: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:

SNIP

> diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> new file mode 100644
> index 000000000000..e82183e4914f
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> @@ -0,0 +1,27 @@
> +{
> + "BPF_ATOMIC bounds propagation, mem->reg",
> + .insns = {
> + /* a = 0; */
> + /*
> + * Note this is implemented with two separate instructions,
> + * where you might think one would suffice:
> + *
> + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> + *
> + * This is because BPF_ST_MEM doesn't seem to set the stack slot
> + * type to 0 when storing an immediate.
> + */
> + BPF_MOV64_IMM(BPF_REG_0, 0),
> + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> + /* b = atomic_fetch_add(&a, 1); */
> + BPF_MOV64_IMM(BPF_REG_1, 1),
> + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> + /* Verifier should be able to tell that this infinite loop isn't reachable. */
> + /* if (b) while (true) continue; */
> + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
> + BPF_EXIT_INSN(),
> + },
> + .result = ACCEPT,
> + .result_unpriv = REJECT,
> + .errstr_unpriv = "back-edge",
> +},
>
> base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> --
> 2.30.0.365.g02bc693789-goog
>

hi,
I tracked soft lock up on powerpc to this test:

[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
#25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
#25/p BPF_ATOMIC bounds propagation, mem->reg

Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]

Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]

please check the console output below.. it looks like the verifier
allowed the loop to happen for some reason on powerpc.. any idea?

I'm on latest bpf-next/master, I can send the config if needed

thanks,
jirka


---
ibm-p9z-07-lp1 login: [ 184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
[ 184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[ 184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G E 5.13.0-rc3+ #3
[ 184.108728] NIP: c00800000131314c LR: c000000000c56918 CTR: c008000001313118
[ 184.108733] REGS: c0000000119ef820 TRAP: 0900 Tainted: G E (5.13.0-rc3+)
[ 184.108739] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040003
[ 184.108752] CFAR: c008000001313150 IRQMASK: 0
[ 184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001
[ 184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008
[ 184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937
[ 184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000
[ 184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8
[ 184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048
[ 184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638
[ 184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0
[ 184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8
[ 184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470
[ 184.108826] Call Trace:
[ 184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable)
[ 184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470
[ 184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660
[ 184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60
[ 184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40
[ 184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280
[ 184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268
[ 184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24
[ 184.108878] NIP: 00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000
[ 184.108883] REGS: c0000000119efe80 TRAP: 3000 Tainted: G E (5.13.0-rc3+)
[ 184.108887] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000848 XER: 00000000
[ 184.108903] IRQMASK: 0
[ 184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a
[ 184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008
[ 184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000
[ 184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000
[ 184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0
[ 184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710
[ 184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24
[ 184.108964] LR [0000000000000000] 0x0
[ 184.108967] --- interrupt: 3000
[ 184.108970] Instruction dump:
[ 184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8
[ 184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068

2021-06-28 09:23:23

by Brendan Jackman

Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <[email protected]> wrote:
>
> On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
>
> SNIP
>
> > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > new file mode 100644
> > index 000000000000..e82183e4914f
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > @@ -0,0 +1,27 @@
> > +{
> > + "BPF_ATOMIC bounds propagation, mem->reg",
> > + .insns = {
> > + /* a = 0; */
> > + /*
> > + * Note this is implemented with two separate instructions,
> > + * where you might think one would suffice:
> > + *
> > + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > + *
> > + * This is because BPF_ST_MEM doesn't seem to set the stack slot
> > + * type to 0 when storing an immediate.
> > + */
> > + BPF_MOV64_IMM(BPF_REG_0, 0),
> > + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> > + /* b = atomic_fetch_add(&a, 1); */
> > + BPF_MOV64_IMM(BPF_REG_1, 1),
> > + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> > + /* Verifier should be able to tell that this infinite loop isn't reachable. */
> > + /* if (b) while (true) continue; */
> > + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
> > + BPF_EXIT_INSN(),
> > + },
> > + .result = ACCEPT,
> > + .result_unpriv = REJECT,
> > + .errstr_unpriv = "back-edge",
> > +},
> >
> > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> > --
> > 2.30.0.365.g02bc693789-goog
> >
>
> hi,
> I tracked soft lock up on powerpc to this test:
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
> #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
> #25/p BPF_ATOMIC bounds propagation, mem->reg
>
> Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
> kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
>
> Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
> kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]
>
> please check the console output below.. it looks like the verifier
> allowed the loop to happen for some reason on powerpc.. any idea?
>
> I'm on latest bpf-next/master, I can send the config if needed
>
> thanks,
> jirka
>
>
> SNIP

Hmm, is the test prog from atomic_bounds.c getting JITed there (my
dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
unsupported\n' in dmesg instead. I wonder if I missed something in
commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
atomics in .imm). Any idea if this test was ever passing on PowerPC?
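
For reference, the rejection path referred to here sits in the JIT's
per-instruction loop. A rough sketch, reconstructed from the console
message and the 'Unknown error 524' (-ENOTSUPP) quoted in Jiri's
follow-ups below, not the exact powerpc source:

    case BPF_STX | BPF_ATOMIC | BPF_W:
    case BPF_STX | BPF_ATOMIC | BPF_DW:
            /* Only plain BPF_ADD (the old BPF_XADD) is implemented.
             * Anything else in the insn's imm field, e.g.
             * BPF_ADD | BPF_FETCH, must be refused so the program
             * fails to load rather than being JITed with the fetch
             * silently dropped.
             */
            if (insn[i].imm != BPF_ADD) {
                    pr_err_ratelimited(
                            "eBPF filter atomic op code %02x (@%d) unsupported\n",
                            code, i);
                    return -ENOTSUPP;
            }
            /* ...otherwise emit the atomic add sequence... */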

2021-06-29 14:14:28

by Jiri Olsa

Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <[email protected]> wrote:
> >
> > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> >
> > SNIP
> >
> > > SNIP
> > >
> > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> > > --
> > > 2.30.0.365.g02bc693789-goog
> > >
> >
> > hi,
> > I tracked soft lock up on powerpc to this test:
> >
> > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
> > #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
> > #25/p BPF_ATOMIC bounds propagation, mem->reg
> >
> > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
> > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> >
> > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
> > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]
> >
> > please check the console output below.. it looks like the verifier
> > allowed the loop to happen for some reason on powerpc.. any idea?
> >
> > I'm on latest bpf-next/master, I can send the config if needed
> >
> > thanks,
> > jirka
> >
> >
> > SNIP
>
> Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> unsupported\n' in dmesg instead. I wonder if I missed something in
> commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
> atomics in .imm). Any idea if this test was ever passing on PowerPC?
>

hum, I guess not.. will check

jirka

2021-06-29 16:11:50

by Jiri Olsa

Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <[email protected]> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> > >
> > > SNIP
> > >
> > > > SNIP
> > > >
> > > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> > > > --
> > > > 2.30.0.365.g02bc693789-goog
> > > >
> > >
> > > hi,
> > > I tracked soft lock up on powerpc to this test:
> > >
> > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
> > > #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
> > > #25/p BPF_ATOMIC bounds propagation, mem->reg
> > >
> > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
> > > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> > >
> > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
> > > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]
> > >
> > > please check the console output below.. it looks like the verifier
> > > allowed the loop to happen for some reason on powerpc.. any idea?
> > >
> > > I'm on latest bpf-next/master, I can send the config if needed
> > >
> > > thanks,
> > > jirka
> > >
> > >
> > > SNIP
> >
> > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > unsupported\n' in dmesg instead. I wonder if I missed something in
> > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other

I see that for all the other atomics tests:

[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
#21/p BPF_ATOMIC_AND without fetch FAIL
Failed to load prog 'Unknown error 524'!
verification time 32 usec
stack depth 8
processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Summary: 0 PASSED, 0 SKIPPED, 2 FAILED

console:

[ 51.850952] eBPF filter atomic op code db (@2) unsupported
[ 51.851134] eBPF filter atomic op code db (@2) unsupported


[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22
#22/u BPF_ATOMIC_AND with fetch FAIL
Failed to load prog 'Unknown error 524'!
verification time 38 usec
stack depth 8
processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
#22/p BPF_ATOMIC_AND with fetch FAIL
Failed to load prog 'Unknown error 524'!
verification time 26 usec
stack depth 8
processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

console:
[ 223.231420] eBPF filter atomic op code db (@3) unsupported
[ 223.231596] eBPF filter atomic op code db (@3) unsupported

...


but no such console output for:

[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24
#24/u BPF_ATOMIC bounds propagation, mem->reg OK


> > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> >
>
> hum, I guess not.. will check

nope, it locks up the same:

ibm-p9z-07-lp1 login: [ 88.070644] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [test_verifier:950]
[ 88.070674] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[ 88.070714] CPU: 4 PID: 950 Comm: test_verifier Tainted: G E 5.11.0-rc4+ #4
[ 88.070721] NIP: c0080000017daad8 LR: c000000000c26ea8 CTR: c0080000017daaa0
[ 88.070726] REGS: c000000011407870 TRAP: 0900 Tainted: G E (5.11.0-rc4+)
[ 88.070732] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040169
[ 88.070745] CFAR: c0080000017daad8 IRQMASK: 0
[ 88.070745] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001
[ 88.070745] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37
[ 88.070745] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08
[ 88.070745] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000
[ 88.070745] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000
[ 88.070745] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238
[ 88.070745] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300
[ 88.070745] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40
[ 88.070807] NIP [c0080000017daad8] bpf_prog_a2eb9104e5e8a5bf+0x38/0x5560
[ 88.070815] LR [c000000000c26ea8] bpf_test_run+0x288/0x470
[ 88.070823] Call Trace:
[ 88.070825] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable)
[ 88.070832] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470
[ 88.070840] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660
[ 88.070846] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30
[ 88.070853] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230
[ 88.070861] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c
[ 88.070869] Instruction dump:
[ 88.070873] f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 7d4048a8
[ 88.070886] 7d4a1a14 7d4049ad 4082fff4 28230000 <4082fffc> 60000000 ebe10068 38210070

ibm-p9z-07-lp1 login: [ 121.762511] rcu: INFO: rcu_sched self-detected stall on CPU
[ 121.762536] rcu: 4-....: (5999 ticks this GP) idle=9ee/1/0x4000000000000002 softirq=2299/2299 fqs=2998
[ 121.762546] (t=6000 jiffies g=3681 q=798)
[ 121.762551] NMI backtrace for cpu 4
[ 121.762555] CPU: 4 PID: 950 Comm: test_verifier Tainted: G EL 5.11.0-rc4+ #4
[ 121.762562] Call Trace:
[ 121.762565] [c0000000114072b0] [c0000000007cfe9c] dump_stack+0xc0/0x104 (unreliable)
[ 121.762575] [c000000011407300] [c0000000007dd028] nmi_cpu_backtrace+0xe8/0x130
[ 121.762582] [c000000011407370] [c0000000007dd21c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
[ 121.762589] [c000000011407410] [c0000000000718a8] arch_trigger_cpumask_backtrace+0x28/0x40
[ 121.762596] [c000000011407430] [c000000000215740] rcu_dump_cpu_stacks+0x10c/0x168
[ 121.762604] [c000000011407480] [c00000000020b080] print_cpu_stall+0x1d0/0x2a0
[ 121.762611] [c000000011407530] [c00000000020e4d4] check_cpu_stall+0x174/0x340
[ 121.762618] [c000000011407560] [c000000000213b28] rcu_sched_clock_irq+0x98/0x3f0
[ 121.762625] [c0000000114075a0] [c00000000022b974] update_process_times+0xc4/0x150
[ 121.762631] [c0000000114075e0] [c000000000245cec] tick_sched_handle+0x3c/0xd0
[ 121.762638] [c000000011407610] [c000000000246408] tick_sched_timer+0x68/0xe0
[ 121.762645] [c000000011407650] [c00000000022ca18] __hrtimer_run_queues+0x1c8/0x430
[ 121.762652] [c0000000114076f0] [c00000000022ddd4] hrtimer_interrupt+0x124/0x300
[ 121.762658] [c0000000114077a0] [c0000000000286a4] timer_interrupt+0x104/0x290
[ 121.762665] [c000000011407800] [c000000000009978] decrementer_common_virt+0x1c8/0x1d0
[ 121.762673] --- interrupt: 900 at bpf_prog_a2eb9104e5e8a5bf+0x34/0x5560
[ 121.762680] NIP: c0080000017daad4 LR: c000000000c26ea8 CTR: c0080000017daaa0
[ 121.762684] REGS: c000000011407870 TRAP: 0900 Tainted: G EL (5.11.0-rc4+)
[ 121.762689] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040169
[ 121.762702] CFAR: c0080000017daad8 IRQMASK: 0
[ 121.762702] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001
[ 121.762702] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37
[ 121.762702] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08
[ 121.762702] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000
[ 121.762702] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000
[ 121.762702] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238
[ 121.762702] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300
[ 121.762702] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40
[ 121.762765] NIP [c0080000017daad4] bpf_prog_a2eb9104e5e8a5bf+0x34/0x5560
[ 121.762770] LR [c000000000c26ea8] bpf_test_run+0x288/0x470
[ 121.762777] --- interrupt: 900
[ 121.762779] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable)
[ 121.762786] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470
[ 121.762793] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660
[ 121.762800] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30
[ 121.762807] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230
[ 121.762814] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c
[ 148.073962] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [test_verifier:950]
[ 148.073990] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[ 148.074029] CPU: 4 PID: 950 Comm: test_verifier Tainted: G EL 5.11.0-rc4+ #4
[ 148.074036] NIP: c0080000017daad8 LR: c000000000c26ea8 CTR: c0080000017daaa0
[ 148.074041] REGS: c000000011407870 TRAP: 0900 Tainted: G EL (5.11.0-rc4+)
[ 148.074047] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040169
[ 148.074060] CFAR: c0080000017daad8 IRQMASK: 0
[ 148.074060] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001
[ 148.074060] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37
[ 148.074060] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08
[ 148.074060] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000
[ 148.074060] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000
[ 148.074060] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238
[ 148.074060] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300
[ 148.074060] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40
[ 148.074122] NIP [c0080000017daad8] bpf_prog_a2eb9104e5e8a5bf+0x38/0x5560
[ 148.074129] LR [c000000000c26ea8] bpf_test_run+0x288/0x470
[ 148.074137] Call Trace:
[ 148.074139] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable)
[ 148.074147] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470
[ 148.074154] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660
[ 148.074160] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30
[ 148.074168] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230
[ 148.074175] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c
[ 148.074183] Instruction dump:
[ 148.074188] f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 7d4048a8
[ 148.074199] 7d4a1a14 7d4049ad 4082fff4 28230000 <4082fffc> 60000000 ebe10068 38210070

jirka

2021-06-29 17:04:04

by Jiri Olsa

Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> >
> > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <[email protected]> wrote:
> > > > >
> > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> [snip]
> > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > > > unsupported\n' in dmesg instead. I wonder if I missed something in
> > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
> >
> > I see that for all the other atomics tests:
> >
> > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
> > #21/p BPF_ATOMIC_AND without fetch FAIL
> > Failed to load prog 'Unknown error 524'!
> > verification time 32 usec
> > stack depth 8
> > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
>
> Hm that's also not good - failure to JIT shouldn't mean failure to
> load. Are there other test_verifier failures or is it just the atomics
> ones?

I have CONFIG_BPF_JIT_ALWAYS_ON=y, so there's no interpreter fallback
and a program the JIT rejects fails to load - I think that's fine

>
> > SNIP
> >
> >
> > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > >
> > >
> > > hum, I guess not.. will check
> >
> > nope, it locks up the same:
>
> Do you mean it locks up at commit 91c960b0056 too?
>

I tried this one:
37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

I will check also 91c960b0056, but I think it's the new test issue

jirka

2021-06-29 17:46:19

by Brendan Jackman

Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
>
> On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <[email protected]> wrote:
> > > >
> > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
[snip]
> > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > > unsupported\n' in dmesg instead. I wonder if I missed something in
> > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
>
> I see that for all the other atomics tests:
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
> #21/p BPF_ATOMIC_AND without fetch FAIL
> Failed to load prog 'Unknown error 524'!
> verification time 32 usec
> stack depth 8
> processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> Summary: 0 PASSED, 0 SKIPPED, 2 FAILED

Hm that's also not good - failure to JIT shouldn't mean failure to
load. Are there other test_verifier failures or is it just the atomics
ones?

> console:
>
> [ 51.850952] eBPF filter atomic op code db (@2) unsupported
> [ 51.851134] eBPF filter atomic op code db (@2) unsupported
>
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22
> #22/u BPF_ATOMIC_AND with fetch FAIL
> Failed to load prog 'Unknown error 524'!
> verification time 38 usec
> stack depth 8
> processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> #22/p BPF_ATOMIC_AND with fetch FAIL
> Failed to load prog 'Unknown error 524'!
> verification time 26 usec
> stack depth 8
> processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
>
> console:
> [ 223.231420] eBPF filter atomic op code db (@3) unsupported
> [ 223.231596] eBPF filter atomic op code db (@3) unsupported
>
> ...
>
>
> but no such console output for:
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24
> #24/u BPF_ATOMIC bounds propagation, mem->reg OK
>
>
> > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > >
> >
> > hum, I guess not.. will check
>
> nope, it locks up the same:

Do you mean it locks up at commit 91c960b0056 too?

2021-06-29 21:28:08

by Jiri Olsa

Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> > [snip]
> > > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > > > > unsupported\n' in dmesg instead. I wonder if I missed something in
> > > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
> > >
> > > I see that for all the other atomics tests:
> > >
> > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
> > > #21/p BPF_ATOMIC_AND without fetch FAIL
> > > Failed to load prog 'Unknown error 524'!
> > > verification time 32 usec
> > > stack depth 8
> > > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
> >
> > Hm that's also not good - failure to JIT shouldn't mean failure to
> > load. Are there other test_verifier failures or is it just the atomics
> > ones?
>
> I have CONFIG_BPF_JIT_ALWAYS_ON=y so I think that's fine
>
> >
> > > SNIP
> > >
> > >
> > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > >
> > > >
> > > > hum, I guess not.. will check
> > >
> > > nope, it locks up the same:
> >
> > Do you mean it locks up at commit 91c960b0056 too?
> >
>
> I tried this one:
> 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
>
> I will check also 91c960b0056, but I think it's the new test issue

for 91c960b0056 in HEAD I'm getting just 2 fails:

#1097/p xadd/w check whether src/dst got mangled, 1 FAIL
Failed to load prog 'Unknown error 524'!
verification time 25 usec
stack depth 8
processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
#1098/p xadd/w check whether src/dst got mangled, 2 FAIL
Failed to load prog 'Unknown error 524'!
verification time 30 usec
stack depth 8
processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0

with console output:

[ 289.499341] eBPF filter atomic op code db (@4) unsupported
[ 289.499510] eBPF filter atomic op code c3 (@4) unsupported

no lockup

jirka

2021-06-30 10:41:03

by Brendan Jackman

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <[email protected]> wrote:
>
> On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:

> > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > >
> > > > >
> > > > > hum, I guess not.. will check
> > > >
> > > > nope, it locks up the same:
> > >
> > > Do you mean it locks up at commit 91c960b0056 too?

Sorry I was being stupid here - the test didn't exist at this commit

> > I tried this one:
> > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> >
> > I will check also 91c960b0056, but I think it's the new test issue

So yeah, hard to say whether this was broken on PowerPC all along. How
hard is it for me to get set up to reproduce the failure? Is there a
rootfs I can download, and some instructions for running a PowerPC
QEMU VM? If so, please also share your config and I'll take a look.

If it's not as simple as that, I'll stare at the code for a while and
see if anything jumps out.

2021-06-30 12:44:32

by Jiri Olsa

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <[email protected]> wrote:
> >
> > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
>
> > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > >
> > > > > >
> > > > > > hum, I guess not.. will check
> > > > >
> > > > > nope, it locks up the same:
> > > >
> > > > Do you mean it locks up at commit 91c960b0056 too?
>
> Sorry I was being stupid here - the test didn't exist at this commit
>
> > > I tried this one:
> > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > >
> > > I will check also 91c960b0056, but I think it's the new test issue
>
> So yeah hard to say whether this was broken on PowerPC all along. How
> hard is it for me to get set up to reproduce the failure? Is there a
> rootfs I can download, and some instructions for running a PowerPC
> QEMU VM? If so if you can also share your config and I'll take a look.
>
> If it's not as simple as that, I'll stare at the code for a while and
> see if anything jumps out.
>

I have latest fedora ppc server and compile/install latest bpf-next tree
I think it will be reproduced also on vm, I attached my config

jirka


Attachments:
config (125.12 kB)

2021-07-01 08:20:16

by Brendan Jackman

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <[email protected]> wrote:
>
> On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <[email protected]> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> >
> > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > > >
> > > > > > >
> > > > > > > hum, I guess not.. will check
> > > > > >
> > > > > > nope, it locks up the same:
> > > > >
> > > > > Do you mean it locks up at commit 91c960b0056 too?
> >
> > Sorry I was being stupid here - the test didn't exist at this commit
> >
> > > > I tried this one:
> > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > > >
> > > > I will check also 91c960b0056, but I think it's the new test issue
> >
> > So yeah hard to say whether this was broken on PowerPC all along. How
> > hard is it for me to get set up to reproduce the failure? Is there a
> > rootfs I can download, and some instructions for running a PowerPC
> > QEMU VM? If so if you can also share your config and I'll take a look.
> >
> > If it's not as simple as that, I'll stare at the code for a while and
> > see if anything jumps out.
> >
>
> I have latest fedora ppc server and compile/install latest bpf-next tree
> I think it will be reproduced also on vm, I attached my config

OK, getting set up to boot a PowerPC QEMU isn't practical here unless
someone's got commands I can copy-paste (suspect it will need .config
hacking too). Looks like you need to build a proper bootloader, and
boot an installer disk.

Looked at the code for a bit but nothing jumped out. It seems like the
verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
between do_check in verifier.c and bpf_jit_build_body in
bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
atomic op code %02x (@%d) unsupported", and would also explain the
lockup because a normal atomic add without fetch would leave BPF R1
unchanged.
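
For reference, here's the loop at the heart of the test - my sketch of
the xlated form of tools/testing/selftests/bpf/verifier/atomic_bounds.c,
annotations mine:
```
r0 = 0
*(u64 *)(r10 - 8) = r0                         ; stack slot = 0
r1 = 1
r1 = atomic64_fetch_add((u64 *)(r10 - 8), r1)  ; with FETCH: r1 = old value = 0
if r1 != 0x0 goto pc-1                         ; ...so this falls through
exit                                           ; if FETCH is dropped, r1 keeps 1
                                               ; and the branch spins forever
```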

We should be able to confirm that theory by disassembling the JITted
code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
2... at least for PowerPC 32-bit... maybe you could paste those lines
into the 64-bit version too? Here's some notes I made for
disassembling the hexdump on x86, I guess you'd just need to change
the objdump flags:

--

- Enable console JIT output:
```shell
echo 2 > /proc/sys/net/core/bpf_jit_enable
```
- Load & run the program of interest.
- Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
short program looks like. This includes a line of context - don't paste the
`flen=` line.
```
[ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 from=test_verifier pid=258
[ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
[ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
[ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
[ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
```
- This incantation will split out and decode the hex, then disassemble the
result:
```shell
cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && \
    objdump -D -b binary -m i386:x86-64 /tmp/obj
```
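
For ppc64le I'd guess the last step becomes something like this (untested;
powerpc:common64 is binutils' name for the 64-bit PowerPC arch and -EL
selects little-endian):
```shell
cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && \
    objdump -D -b binary -m powerpc:common64 -EL /tmp/obj
```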

--

Sandipan, Naveen, do you know of anything in the PowerPC code that
might be leading us to drop the BPF_FETCH flag from the atomic
instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?

2021-07-01 10:19:51

by Jiri Olsa

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Thu, Jul 01, 2021 at 10:18:39AM +0200, Brendan Jackman wrote:
> On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <[email protected]> wrote:
> >
> > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <[email protected]> wrote:
> > > >
> > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > >
> > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > > > >
> > > > > > > >
> > > > > > > > hum, I guess not.. will check
> > > > > > >
> > > > > > > nope, it locks up the same:
> > > > > >
> > > > > > Do you mean it locks up at commit 91c960b0056 too?
> > >
> > > Sorry I was being stupid here - the test didn't exist at this commit
> > >
> > > > > I tried this one:
> > > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > > > >
> > > > > I will check also 91c960b0056, but I think it's the new test issue
> > >
> > > So yeah hard to say whether this was broken on PowerPC all along. How
> > > hard is it for me to get set up to reproduce the failure? Is there a
> > > rootfs I can download, and some instructions for running a PowerPC
> > > QEMU VM? If so if you can also share your config and I'll take a look.
> > >
> > > If it's not as simple as that, I'll stare at the code for a while and
> > > see if anything jumps out.
> > >
> >
> > I have latest fedora ppc server and compile/install latest bpf-next tree
> > I think it will be reproduced also on vm, I attached my config
>
> OK, getting set up to boot a PowerPC QEMU isn't practical here unless
> someone's got commands I can copy-paste (suspect it will need .config
> hacking too). Looks like you need to build a proper bootloader, and
> boot an installer disk.
>
> Looked at the code for a bit but nothing jumped out. It seems like the
> verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
> detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
> between do_check in verifier.c and bpf_jit_build_body in
> bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
> atomic op code %02x (@%d) unsupported", and would also explain the
> lockup because a normal atomic add without fetch would leave BPF R1
> unchanged.
>
> We should be able to confirm that theory by disassembling the JITted
> code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
> 2... at least for PowerPC 32-bit... maybe you could paste those lines
> into the 64-bit version too? Here's some notes I made for
> disassembling the hexdump on x86, I guess you'd just need to change
> the objdump flags:
>
> --
>
> - Enable console JIT output:
> ```shell
> echo 2 > /proc/sys/net/core/bpf_jit_enable
> ```
> - Load & run the program of interest.
> - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
> short program looks like. This includes a line of context - don't paste the
> `flen=` line.
> ```
> [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 from=test_verifier pid=258
> [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
> [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
> [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
> [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
> ```
> - This incantation will split out and decode the hex, then disassemble the
> result:
> ```shell
> cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && \
>     objdump -D -b binary -m i386:x86-64 /tmp/obj
> ```

that's where I decided to write to the list and ask for help before
googling ppc assembly ;-)

I changed the test_verifier to stop before executing the test
so I can dump the program via bpftool:

[root@ibm-p9z-07-lp1 bpf-next]# bpftool prog dump xlated id 48
0: (b7) r0 = 0
1: (7b) *(u64 *)(r10 -8) = r0
2: (b7) r1 = 1
3: (db) r1 = atomic64_fetch_add((u64 *)(r10 -8), r1)
4: (55) if r1 != 0x0 goto pc-1
5: (95) exit

[root@ibm-p9z-07-lp1 bpf-next]# bpftool prog dump jited id 48
bpf_prog_a2eb9104e5e8a5bf:
0: nop
4: nop
8: stdu r1,-112(r1)
c: std r31,104(r1)
10: addi r31,r1,48
14: li r8,0
18: std r8,-8(r31)
1c: li r3,1
20: addi r9,r31,-8
24: ldarx r10,0,r9
28: add r10,r10,r3
2c: stdcx. r10,0,r9
30: bne 0x0000000000000024
34: cmpldi r3,0
38: bne 0x0000000000000034
3c: nop
40: ld r31,104(r1)
44: addi r1,r1,112
48: mr r3,r8
4c: blr

I wanted to also do it through bpf_jit_enable and bpf_jit_dump, but I need to check
the setup, because I can't set bpf_jit_enable to 2 at the moment (possibly because
CONFIG_BPF_JIT_ALWAYS_ON=y pins it to 1).. might take some time

[root@ibm-p9z-07-lp1 bpf-next]# echo 2 > /proc/sys/net/core/bpf_jit_enable
-bash: echo: write error: Invalid argument

jirka

>
> --
>
> Sandipan, Naveen, do you know of anything in the PowerPC code that
> might be leading us to drop the BPF_FETCH flag from the atomic
> instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?
>

2021-07-01 11:09:14

by Naveen N. Rao

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

Hi Brendan, Hi Jiri,


Brendan Jackman wrote:
> On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <[email protected]> wrote:
>>
>> On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
>> > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <[email protected]> wrote:
>> > >
>> > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
>> > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
>> > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
>> > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
>> > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
>> >
>> > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
>> > > > > > > >
>> > > > > > >
>> > > > > > > hum, I guess not.. will check
>> > > > > >
>> > > > > > nope, it locks up the same:
>> > > > >
>> > > > > Do you mean it locks up at commit 91c960b0056 too?
>> >
>> > Sorry I was being stupid here - the test didn't exist at this commit
>> >
>> > > > I tried this one:
>> > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
>> > > >
>> > > > I will check also 91c960b0056, but I think it's the new test issue
>> >
>> > So yeah hard to say whether this was broken on PowerPC all along. How
>> > hard is it for me to get set up to reproduce the failure? Is there a
>> > rootfs I can download, and some instructions for running a PowerPC
>> > QEMU VM? If so if you can also share your config and I'll take a look.
>> >
>> > If it's not as simple as that, I'll stare at the code for a while and
>> > see if anything jumps out.
>> >
>>
>> I have latest fedora ppc server and compile/install latest bpf-next tree
>> I think it will be reproduced also on vm, I attached my config
>
> OK, getting set up to boot a PowerPC QEMU isn't practical here unless
> someone's got commands I can copy-paste (suspect it will need .config
> hacking too). Looks like you need to build a proper bootloader, and
> boot an installer disk.

There are some notes put up here, though we can do better:
https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu

If you are familiar with ubuntu/fedora cloud images (and cloud-init),
you should be able to grab one of the ppc64le images and boot it in
qemu:
https://cloud-images.ubuntu.com/releases/hirsute/release/
https://alt.fedoraproject.org/alt/
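
Something along these lines ought to boot the Ubuntu image - an untested
sketch; the pseries machine's SLOF firmware boots straight off the virtio
disk, and the image/seed filenames are placeholders for whatever you
fetch or build:
```shell
# Untested sketch: assumes the ppc64le cloud image plus a cloud-init
# NoCloud seed (seed.img) carrying your ssh key.
qemu-system-ppc64 -M pseries -cpu POWER9 -smp 4 -m 4G -nographic \
    -drive file=hirsute-server-cloudimg-ppc64el.img,if=virtio \
    -drive file=seed.img,if=virtio
```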

>
> Looked at the code for a bit but nothing jumped out. It seems like the
> verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
> detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
> between do_check in verifier.c and bpf_jit_build_body in
> bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
> atomic op code %02x (@%d) unsupported", and would also explain the
> lockup because a normal atomic add without fetch would leave BPF R1
> unchanged.
>
> We should be able to confirm that theory by disassembling the JITted
> code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
> 2... at least for PowerPC 32-bit... maybe you could paste those lines
> into the 64-bit version too? Here's some notes I made for
> disassembling the hexdump on x86, I guess you'd just need to change
> the objdump flags:
>
> --
>
> - Enable console JIT output:
> ```shell
> echo 2 > /proc/sys/net/core/bpf_jit_enable
> ```
> - Load & run the program of interest.
> - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
> short program looks like. This includes a line of context - don't paste the
> `flen=` line.
> ```
> [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 from=test_verifier pid=258
> [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
> [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
> [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
> [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
> ```
> - This incantation will split out and decode the hex, then disassemble the
> result:
> ```shell
> cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && \
>     objdump -D -b binary -m i386:x86-64 /tmp/obj
> ```
>
> --
>
> Sandipan, Naveen, do you know of anything in the PowerPC code that
> might be leading us to drop the BPF_FETCH flag from the atomic
> instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?

Yes, I think I just found the issue. We aren't looking at the correct
BPF instruction when checking the IMM value.
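
That also matches the JIT dump above - annotating its tail (my reading):
the ll/sc loop does a plain add and the old value never makes it into r3
(BPF r1), so the compare/branch pair spins:

  24: ldarx r10,0,r9    <- load-reserve the old value into r10
  28: add r10,r10,r3
  2c: stdcx. r10,0,r9   <- store-conditional the sum
  30: bne 0x24          <- retry if the reservation was lost
  34: cmpldi r3,0       <- r3 still holds 1; with BPF_FETCH the old
  38: bne 0x34             value (0) should have landed in r3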


--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -673,7 +673,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
* BPF_STX ATOMIC (atomic ops)
*/
case BPF_STX | BPF_ATOMIC | BPF_W:
- if (insn->imm != BPF_ADD) {
+ if (insn[i].imm != BPF_ADD) {
pr_err_ratelimited(
"eBPF filter atomic op code %02x (@%d) unsupported\n",
code, i);
@@ -695,7 +695,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
PPC_BCC_SHORT(COND_NE, tmp_idx);
break;
case BPF_STX | BPF_ATOMIC | BPF_DW:
- if (insn->imm != BPF_ADD) {
+ if (insn[i].imm != BPF_ADD) {
pr_err_ratelimited(
"eBPF filter atomic op code %02x (@%d) unsupported\n",
code, i);
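
For context, why insn->imm reads the wrong field - a rough sketch of the
loop shape in bpf_jit_build_body() (from memory, not the exact kernel
code): insn points at the start of the program and i is the loop index,
so insn->imm is always instruction 0's imm rather than that of the atomic
op being JITed:

	const struct bpf_insn *insn = fp->insnsi;	/* base of insn array */
	int i;

	for (i = 0; i < flen; i++) {
		u32 code = insn[i].code;
		...
		/* insn->imm == insn[0].imm: the first instruction's imm */
		/* insn[i].imm: the imm of the instruction being JITed   */
	}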



Thanks,
Naveen

2021-07-01 14:20:43

by Jiri Olsa

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Thu, Jul 01, 2021 at 04:32:03PM +0530, Naveen N. Rao wrote:
> Hi Brendan, Hi Jiri,
>
>
> Brendan Jackman wrote:
> > On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <[email protected]> wrote:
> > >
> > > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> > > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <[email protected]> wrote:
> > > > >
> > > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <[email protected]> wrote:
> > > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > >
> > > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > hum, I guess not.. will check
> > > > > > > >
> > > > > > > > nope, it locks up the same:
> > > > > > >
> > > > > > > Do you mean it locks up at commit 91c960b0056 too?
> > > >
> > > > Sorry I was being stupid here - the test didn't exist at this commit
> > > >
> > > > > > I tried this one:
> > > > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > > > > >
> > > > > > I will check also 91c960b0056, but I think it's the new test issue
> > > >
> > > > So yeah hard to say whether this was broken on PowerPC all along. How
> > > > hard is it for me to get set up to reproduce the failure? Is there a
> > > > rootfs I can download, and some instructions for running a PowerPC
> > > > QEMU VM? If so if you can also share your config and I'll take a look.
> > > >
> > > > If it's not as simple as that, I'll stare at the code for a while and
> > > > see if anything jumps out.
> > > >
> > >
> > > I have latest fedora ppc server and compile/install latest bpf-next tree
> > > I think it will be reproduced also on vm, I attached my config
> >
> > OK, getting set up to boot a PowerPC QEMU isn't practical here unless
> > someone's got commands I can copy-paste (suspect it will need .config
> > hacking too). Looks like you need to build a proper bootloader, and
> > boot an installer disk.
>
> There are some notes put up here, though we can do better:
> https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu
>
> If you are familiar with ubuntu/fedora cloud images (and cloud-init), you
> should be able to grab one of the ppc64le images and boot it in qemu:
> https://cloud-images.ubuntu.com/releases/hirsute/release/
> https://alt.fedoraproject.org/alt/
>
> >
> > Looked at the code for a bit but nothing jumped out. It seems like the
> > verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
> > detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
> > between do_check in verifier.c and bpf_jit_build_body in
> > bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
> > atomic op code %02x (@%d) unsupported", and would also explain the
> > lockup because a normal atomic add without fetch would leave BPF R1
> > unchanged.
> >
> > We should be able to confirm that theory by disassembling the JITted
> > code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
> > 2... at least for PowerPC 32-bit... maybe you could paste those lines
> > into the 64-bit version too? Here's some notes I made for
> > disassembling the hexdump on x86, I guess you'd just need to change
> > the objdump flags:
> >
> > --
> >
> > - Enable console JIT output:
> > ```shell
> > echo 2 > /proc/sys/net/core/bpf_jit_enable
> > ```
> > - Load & run the program of interest.
> > - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
> > short program looks like. This includes a line of context - don't paste the
> > `flen=` line.
> > ```
> > [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 from=test_verifier pid=258
> > [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
> > [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
> > [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
> > [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
> > ```
> > - This incantation will split out and decode the hex, then disassemble the
> > result:
> > ```shell
> > cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && \
> >     objdump -D -b binary -m i386:x86-64 /tmp/obj
> > ```
> >
> > --
> >
> > Sandipan, Naveen, do you know of anything in the PowerPC code that
> > might be leading us to drop the BPF_FETCH flag from the atomic
> > instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?
>
> Yes, I think I just found the issue. We aren't looking at the correct BPF
> instruction when checking the IMM value.

great, nice catch! :-) that fixes it for me..

Tested-by: Jiri Olsa <[email protected]>

thanks,
jirka

>
>
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -673,7 +673,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
> * BPF_STX ATOMIC (atomic ops)
> */
> case BPF_STX | BPF_ATOMIC | BPF_W:
> - if (insn->imm != BPF_ADD) {
> + if (insn[i].imm != BPF_ADD) {
> pr_err_ratelimited(
> "eBPF filter atomic op code %02x (@%d) unsupported\n",
> code, i);
> @@ -695,7 +695,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
> PPC_BCC_SHORT(COND_NE, tmp_idx);
> break;
> case BPF_STX | BPF_ATOMIC | BPF_DW:
> - if (insn->imm != BPF_ADD) {
> + if (insn[i].imm != BPF_ADD) {
> pr_err_ratelimited(
> "eBPF filter atomic op code %02x (@%d) unsupported\n",
> code, i);
>
>
>
> Thanks,
> Naveen
>

2021-07-02 09:45:51

by Brendan Jackman

[permalink] [raw]
Subject: Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

On Thu, 1 Jul 2021 at 16:15, Jiri Olsa <[email protected]> wrote:
> On Thu, Jul 01, 2021 at 04:32:03PM +0530, Naveen N. Rao wrote:
> > Yes, I think I just found the issue. We aren't looking at the correct BPF
> > instruction when checking the IMM value.
>
> great, nice catch! :-) that fixes it for me..

Ahh, nice. +1 thanks for taking a look!