Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile guided optimizations. perf has had
branch record support for a while but the data collection can be a bit
coarse grained.
We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.
Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.
Changes in v8:
- Use globals instead of perf buffer
- Call test_perf_branches__detach() before destroying skeleton
- Fix typo in docs
Changes in v7:
- Const-ify and static-ify local var
- Documentation formatting
Changes in v6:
- Move #ifdef a little to avoid unused variable warnings on !x86
- Test negative condition in selftest (-EINVAL on improperly configured
perf event)
- Skip positive condition selftest on setups that don't support branch
records
Changes in v5:
- Rename bpf_perf_prog_read_branches() -> bpf_read_branch_records()
- Rename BPF_F_GET_BR_SIZE -> BPF_F_GET_BRANCH_RECORDS_SIZE
- Squash tools/ bpf.h sync into selftest commit
Changes in v4:
- Add BPF_F_GET_BR_SIZE flag
- Return -ENOENT on unsupported architectures
- Only accept initialized memory in helper
- Check buffer size is multiple of sizeof(struct perf_branch_entry)
- Use bpf skeleton in selftest
- Add commit messages
- Spelling and formatting
Changes in v3:
- Document filling unused buffer with zero
- Formatting fixes
- Rebase
Changes in v2:
- Change to a bpf helper instead of context access
- Avoid mentioning Intel specific things
Daniel Xu (2):
bpf: Add bpf_read_branch_records() helper
selftests/bpf: add bpf_read_branch_records() selftest
include/uapi/linux/bpf.h | 25 ++-
kernel/trace/bpf_trace.c | 41 +++++
tools/include/uapi/linux/bpf.h | 25 ++-
.../selftests/bpf/prog_tests/perf_branches.c | 169 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_branches.c | 50 ++++++
5 files changed, 308 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c
--
2.21.1
Add a selftest to test:
* default bpf_read_branch_records() behavior
* BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior
* error path on non branch record perf events
* using helper to write to stack
* using helper to write to global
On host with hardware counter support:
# ./test_progs -t perf_branches
#27/1 perf_branches_hw:OK
#27/2 perf_branches_no_hw:OK
#27 perf_branches:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
On host without hardware counter support (VM):
# ./test_progs -t perf_branches
#27/1 perf_branches_hw:OK
#27/2 perf_branches_no_hw:OK
#27 perf_branches:OK
Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED
Also sync tools/include/uapi/linux/bpf.h.
Signed-off-by: Daniel Xu <[email protected]>
---
tools/include/uapi/linux/bpf.h | 25 ++-
.../selftests/bpf/prog_tests/perf_branches.c | 169 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_branches.c | 50 ++++++
3 files changed, 243 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f1d74a2bd234..a7e59756853f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2892,6 +2892,25 @@ union bpf_attr {
* Obtain the 64bit jiffies
* Return
* The 64 bit jiffies
+ *
+ * int bpf_read_branch_records(struct bpf_perf_event_data *ctx, void *buf, u32 size, u64 flags)
+ * Description
+ * For an eBPF program attached to a perf event, retrieve the
+ * branch records (struct perf_branch_entry) associated to *ctx*
+ * and store it in the buffer pointed by *buf* up to size
+ * *size* bytes.
+ * Return
+ * On success, number of bytes written to *buf*. On error, a
+ * negative value.
+ *
+ * The *flags* can be set to **BPF_F_GET_BRANCH_RECORDS_SIZE** to
+ * instead return the number of bytes required to store all the
+ * branch entries. If this flag is set, *buf* may be NULL.
+ *
+ * **-EINVAL** if arguments invalid or **size** not a multiple
+ * of sizeof(struct perf_branch_entry).
+ *
+ * **-ENOENT** if architecture does not support branch records.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3012,7 +3031,8 @@ union bpf_attr {
FN(probe_read_kernel_str), \
FN(tcp_send_ack), \
FN(send_signal_thread), \
- FN(jiffies64),
+ FN(jiffies64), \
+ FN(read_branch_records),
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
@@ -3091,6 +3111,9 @@ enum bpf_func_id {
/* BPF_FUNC_sk_storage_get flags */
#define BPF_SK_STORAGE_GET_F_CREATE (1ULL << 0)
+/* BPF_FUNC_read_branch_records flags. */
+#define BPF_F_GET_BRANCH_RECORDS_SIZE (1ULL << 0)
+
/* Mode for BPF_FUNC_skb_adjust_room helper. */
enum bpf_adj_room_mode {
BPF_ADJ_ROOM_NET,
diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
new file mode 100644
index 000000000000..424f8be431ea
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+#include "bpf/libbpf_internal.h"
+#include "test_perf_branches.skel.h"
+
+static void check_good_sample(struct test_perf_branches *skel)
+{
+ int written_global = skel->bss->written_global_out;
+ int required_size = skel->bss->required_size_out;
+ int written_stack = skel->bss->written_stack_out;
+ int pbe_size = sizeof(struct perf_branch_entry);
+ int duration = 0;
+
+ if (CHECK(!skel->bss->valid, "output not valid",
+ "no valid sample from prog"))
+ return;
+
+ /*
+ * It's hard to validate the contents of the branch entries b/c it
+ * would require some kind of disassembler and also encoding the
+ * valid jump instructions for supported architectures. So just check
+ * the easy stuff for now.
+ */
+ CHECK(required_size <= 0, "read_branches_size", "err %d\n", required_size);
+ CHECK(written_stack < 0, "read_branches_stack", "err %d\n", written_stack);
+ CHECK(written_stack % pbe_size != 0, "read_branches_stack",
+ "stack bytes written=%d not multiple of struct size=%d\n",
+ written_stack, pbe_size);
+ CHECK(written_global < 0, "read_branches_global", "err %d\n", written_global);
+ CHECK(written_global % pbe_size != 0, "read_branches_global",
+ "global bytes written=%d not multiple of struct size=%d\n",
+ written_global, pbe_size);
+ CHECK(written_global < written_stack, "read_branches_size",
+ "written_global=%d < written_stack=%d\n", written_global, written_stack);
+}
+
+static void check_bad_sample(struct test_perf_branches *skel)
+{
+ int written_global = skel->bss->written_global_out;
+ int required_size = skel->bss->required_size_out;
+ int written_stack = skel->bss->written_stack_out;
+ int duration = 0;
+
+ if (CHECK(!skel->bss->valid, "output not valid",
+ "no valid sample from prog"))
+ return;
+
+ CHECK((required_size != -EINVAL && required_size != -ENOENT),
+ "read_branches_size", "err %d\n", required_size);
+ CHECK((written_stack != -EINVAL && written_stack != -ENOENT),
+ "read_branches_stack", "written %d\n", written_stack);
+ CHECK((written_global != -EINVAL && written_global != -ENOENT),
+ "read_branches_global", "written %d\n", written_global);
+}
+
+static void test_perf_branches_common(int perf_fd,
+ void (*cb)(struct test_perf_branches *))
+{
+ struct test_perf_branches *skel;
+ int err, i, duration = 0;
+ bool detached = false;
+ struct bpf_link *link;
+ volatile int j = 0;
+ cpu_set_t cpu_set;
+
+ skel = test_perf_branches__open_and_load();
+ if (CHECK(!skel, "test_perf_branches_load",
+ "perf_branches skeleton failed\n"))
+ return;
+
+ /* attach perf_event */
+ link = bpf_program__attach_perf_event(skel->progs.perf_branches, perf_fd);
+ if (CHECK(IS_ERR(link), "attach_perf_event", "err %ld\n", PTR_ERR(link)))
+ goto out_destroy_skel;
+
+ /* generate some branches on cpu 0 */
+ CPU_ZERO(&cpu_set);
+ CPU_SET(0, &cpu_set);
+ err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+ if (CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
+ goto out_destroy;
+ /* spin the loop for a while (random high number) */
+ for (i = 0; i < 1000000; ++i)
+ ++j;
+
+ test_perf_branches__detach(skel);
+ detached = true;
+
+ cb(skel);
+out_destroy:
+ bpf_link__destroy(link);
+out_destroy_skel:
+ if (!detached)
+ test_perf_branches__detach(skel);
+ test_perf_branches__destroy(skel);
+}
+
+static void test_perf_branches_hw(void)
+{
+ struct perf_event_attr attr = {0};
+ int duration = 0;
+ int pfd;
+
+ /* create perf event */
+ attr.size = sizeof(attr);
+ attr.type = PERF_TYPE_HARDWARE;
+ attr.config = PERF_COUNT_HW_CPU_CYCLES;
+ attr.freq = 1;
+ attr.sample_freq = 4000;
+ attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
+ attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
+ pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+
+ /*
+ * Some setups don't support branch records (virtual machines, !x86),
+ * so skip test in this case.
+ */
+ if (pfd == -1) {
+ if (errno == ENOENT) {
+ printf("%s:SKIP:no PERF_SAMPLE_BRANCH_STACK\n",
+ __func__);
+ test__skip();
+ return;
+ }
+ if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
+ return;
+ }
+
+ test_perf_branches_common(pfd, check_good_sample);
+
+ close(pfd);
+}
+
+/*
+ * Tests negative case -- run bpf_read_branch_records() on improperly configured
+ * perf event.
+ */
+static void test_perf_branches_no_hw(void)
+{
+ struct perf_event_attr attr = {0};
+ int duration = 0;
+ int pfd;
+
+ /* create perf event */
+ attr.size = sizeof(attr);
+ attr.type = PERF_TYPE_SOFTWARE;
+ attr.config = PERF_COUNT_SW_CPU_CLOCK;
+ attr.freq = 1;
+ attr.sample_freq = 4000;
+ pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+ if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
+ return;
+
+ test_perf_branches_common(pfd, check_bad_sample);
+
+ close(pfd);
+}
+
+void test_perf_branches(void)
+{
+ if (test__start_subtest("perf_branches_hw"))
+ test_perf_branches_hw();
+ if (test__start_subtest("perf_branches_no_hw"))
+ test_perf_branches_no_hw();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
new file mode 100644
index 000000000000..0f7e27d97567
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <stddef.h>
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_trace_helpers.h"
+
+int valid = 0;
+int required_size_out = 0;
+int written_stack_out = 0;
+int written_global_out = 0;
+
+struct {
+ __u64 _a;
+ __u64 _b;
+ __u64 _c;
+} fpbe[30] = {0};
+
+SEC("perf_event")
+int perf_branches(void *ctx)
+{
+ __u64 entries[4 * 3] = {0};
+ int required_size, written_stack, written_global;
+
+ /* write to stack */
+ written_stack = bpf_read_branch_records(ctx, entries, sizeof(entries), 0);
+ /* ignore spurious events */
+ if (!written_stack)
+ return 1;
+
+ /* get required size */
+ required_size = bpf_read_branch_records(ctx, NULL, 0,
+ BPF_F_GET_BRANCH_RECORDS_SIZE);
+
+ written_global = bpf_read_branch_records(ctx, fpbe, sizeof(fpbe), 0);
+ /* ignore spurious events */
+ if (!written_global)
+ return 1;
+
+ required_size_out = required_size;
+ written_stack_out = written_stack;
+ written_global_out = written_global;
+ valid = 1;
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.21.1
On Mon, Feb 17, 2020 at 7:06 PM Daniel Xu <[email protected]> wrote:
>
> Add a selftest to test:
>
> * default bpf_read_branch_records() behavior
> * BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior
> * error path on non branch record perf events
> * using helper to write to stack
> * using helper to write to global
>
> On host with hardware counter support:
>
> # ./test_progs -t perf_branches
> #27/1 perf_branches_hw:OK
> #27/2 perf_branches_no_hw:OK
> #27 perf_branches:OK
> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
>
> On host without hardware counter support (VM):
>
> # ./test_progs -t perf_branches
> #27/1 perf_branches_hw:OK
> #27/2 perf_branches_no_hw:OK
> #27 perf_branches:OK
> Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED
>
> Also sync tools/include/uapi/linux/bpf.h.
>
> Signed-off-by: Daniel Xu <[email protected]>
> ---
LGTM.
Acked-by: Andrii Nakryiko <[email protected]>
> tools/include/uapi/linux/bpf.h | 25 ++-
> .../selftests/bpf/prog_tests/perf_branches.c | 169 ++++++++++++++++++
> .../selftests/bpf/progs/test_perf_branches.c | 50 ++++++
> 3 files changed, 243 insertions(+), 1 deletion(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c
>
[...]
On Mon, Feb 17, 2020 at 7:04 PM Daniel Xu <[email protected]> wrote:
>
> Add a selftest to test:
>
> * default bpf_read_branch_records() behavior
> * BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior
> * error path on non branch record perf events
> * using helper to write to stack
> * using helper to write to global
>
> On host with hardware counter support:
>
> # ./test_progs -t perf_branches
> #27/1 perf_branches_hw:OK
> #27/2 perf_branches_no_hw:OK
> #27 perf_branches:OK
> Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
>
> On host without hardware counter support (VM):
>
> # ./test_progs -t perf_branches
> #27/1 perf_branches_hw:OK
> #27/2 perf_branches_no_hw:OK
> #27 perf_branches:OK
> Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED
That's not what I see:
./test_progs -t perf_branches
test_perf_branches_hw:FAIL:perf_event_open err -1
#27/1 perf_branches_hw:FAIL
#27/2 perf_branches_no_hw:OK
#27 perf_branches:FAIL
Summary: 0/1 PASSED, 0 SKIPPED, 2 FAILED
I remember previous version used to work, but something changed.
On Wed, Feb 19, 2020 at 2:47 PM Alexei Starovoitov
<[email protected]> wrote:
>
> On Mon, Feb 17, 2020 at 7:04 PM Daniel Xu <[email protected]> wrote:
> >
> > Add a selftest to test:
> >
> > * default bpf_read_branch_records() behavior
> > * BPF_F_GET_BRANCH_RECORDS_SIZE flag behavior
> > * error path on non branch record perf events
> > * using helper to write to stack
> > * using helper to write to global
> >
> > On host with hardware counter support:
> >
> > # ./test_progs -t perf_branches
> > #27/1 perf_branches_hw:OK
> > #27/2 perf_branches_no_hw:OK
> > #27 perf_branches:OK
> > Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
> >
> > On host without hardware counter support (VM):
> >
> > # ./test_progs -t perf_branches
> > #27/1 perf_branches_hw:OK
> > #27/2 perf_branches_no_hw:OK
> > #27 perf_branches:OK
> > Summary: 1/2 PASSED, 1 SKIPPED, 0 FAILED
>
> That's not what I see:
> ./test_progs -t perf_branches
> test_perf_branches_hw:FAIL:perf_event_open err -1
> #27/1 perf_branches_hw:FAIL
> #27/2 perf_branches_no_hw:OK
> #27 perf_branches:FAIL
> Summary: 0/1 PASSED, 0 SKIPPED, 2 FAILED
>
> I remember previous version used to work, but something changed.
Looks like the error code has changed.
I've added
if (errno == ENOENT || errno == EOPNOTSUPP)
and applied.
Please test your patches more carefully next time.