2020-01-22 20:24:23

by Daniel Xu

Subject: [PATCH v2 bpf-next 0/3] Add bpf_perf_prog_read_branches() helper

Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile-guided optimizations. perf has had
branch record support for a while, but the data collection can be a bit
coarse-grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Changes in v2:
- Change to a bpf helper instead of context access
- Avoid mentioning Intel specific things

Daniel Xu (3):
bpf: Add bpf_perf_prog_read_branches() helper
tools/bpf: Sync uapi header bpf.h
selftests/bpf: add bpf_perf_prog_read_branches() selftest

include/uapi/linux/bpf.h | 13 ++-
kernel/trace/bpf_trace.c | 31 +++++
tools/include/uapi/linux/bpf.h | 13 ++-
.../selftests/bpf/prog_tests/perf_branches.c | 106 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_branches.c | 39 +++++++
5 files changed, 200 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c

--
2.21.1


2020-01-22 20:24:35

by Daniel Xu

Subject: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

Branch records are a CPU feature that can be configured to record
certain branches that are taken during code execution. This data is
particularly interesting for profile-guided optimizations. perf has had
branch record support for a while, but the data collection can be a bit
coarse-grained.

We (Facebook) have seen in experiments that associating metadata with
branch records can improve results (after postprocessing). We generally
use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
support for branch records is useful.

Aside from this particular use case, having branch data available to bpf
progs can be useful to get stack traces out of userspace applications
that omit frame pointers.

Signed-off-by: Daniel Xu <[email protected]>
---
include/uapi/linux/bpf.h | 13 ++++++++++++-
kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 033d90a2282d..7350c5be6158 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2885,6 +2885,16 @@ union bpf_attr {
* **-EPERM** if no permission to send the *sig*.
*
* **-EAGAIN** if bpf program can try again.
+ *
+ * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
+ * Description
+ * For an eBPF program attached to a perf event, retrieve the
+ * branch records (struct perf_branch_entry) associated with *ctx*
+ * and store them in the buffer pointed to by *buf*, up to
+ * *buf_size* bytes.
+ * Return
+ * On success, number of bytes written to *buf*. On error, a
+ * negative value.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3004,7 +3014,8 @@ union bpf_attr {
FN(probe_read_user_str), \
FN(probe_read_kernel_str), \
FN(tcp_send_ack), \
- FN(send_signal_thread),
+ FN(send_signal_thread), \
+ FN(perf_prog_read_branches),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 19e793aa441a..24c51272a1f7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
.arg3_type = ARG_CONST_SIZE,
};

+BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
+ void *, buf, u32, size)
+{
+ struct perf_branch_stack *br_stack = ctx->data->br_stack;
+ u32 to_copy = 0, to_clear = size;
+ int err = -EINVAL;
+
+ if (unlikely(!br_stack))
+ goto clear;
+
+ to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
+ to_clear -= to_copy;
+
+ memcpy(buf, br_stack->entries, to_copy);
+ err = to_copy;
+clear:
+ memset(buf + to_copy, 0, to_clear);
+ return err;
+}
+
+static const struct bpf_func_proto bpf_perf_prog_read_branches_proto = {
+ .func = bpf_perf_prog_read_branches,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+};
+
static const struct bpf_func_proto *
pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@@ -1040,6 +1069,8 @@ pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_get_stack_proto_tp;
case BPF_FUNC_perf_prog_read_value:
return &bpf_perf_prog_read_value_proto;
+ case BPF_FUNC_perf_prog_read_branches:
+ return &bpf_perf_prog_read_branches_proto;
default:
return tracing_func_proto(func_id, prog);
}
--
2.21.1

2020-01-22 20:24:41

by Daniel Xu

Subject: [PATCH v2 bpf-next 2/3] tools/bpf: Sync uapi header bpf.h

Signed-off-by: Daniel Xu <[email protected]>
---
tools/include/uapi/linux/bpf.h | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 033d90a2282d..7350c5be6158 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2885,6 +2885,16 @@ union bpf_attr {
* **-EPERM** if no permission to send the *sig*.
*
* **-EAGAIN** if bpf program can try again.
+ *
+ * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
+ * Description
+ * For an eBPF program attached to a perf event, retrieve the
+ * branch records (struct perf_branch_entry) associated with *ctx*
+ * and store them in the buffer pointed to by *buf*, up to
+ * *buf_size* bytes.
+ * Return
+ * On success, number of bytes written to *buf*. On error, a
+ * negative value.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3004,7 +3014,8 @@ union bpf_attr {
FN(probe_read_user_str), \
FN(probe_read_kernel_str), \
FN(tcp_send_ack), \
- FN(send_signal_thread),
+ FN(send_signal_thread), \
+ FN(perf_prog_read_branches),

/* integer value in 'imm' field of BPF_CALL instruction selects which helper
* function eBPF program intends to call
--
2.21.1

2020-01-22 20:26:02

by Daniel Xu

Subject: [PATCH v2 bpf-next 3/3] selftests/bpf: add bpf_perf_prog_read_branches() selftest

Signed-off-by: Daniel Xu <[email protected]>
---
.../selftests/bpf/prog_tests/perf_branches.c | 106 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_branches.c | 39 +++++++
2 files changed, 145 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_branches.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_branches.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_branches.c b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
new file mode 100644
index 000000000000..1d8c3bf3ab39
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_branches.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <sched.h>
+#include <sys/socket.h>
+#include <test_progs.h>
+#include "libbpf_internal.h"
+
+static void on_sample(void *ctx, int cpu, void *data, __u32 size)
+{
+ int pbe_size = sizeof(struct perf_branch_entry);
+ int ret = *(int *)data, duration = 0;
+
+ // It's hard to validate the contents of the branch entries b/c it
+ // would require some kind of disassembler and also encoding the
+ // valid jump instructions for supported architectures. So just check
+ // the easy stuff for now.
+ CHECK(ret < 0, "read_branches", "err %d\n", ret);
+ CHECK(ret % pbe_size != 0, "read_branches",
+ "bytes written=%d not multiple of struct size=%d\n",
+ ret, pbe_size);
+
+ *(int *)ctx = 1;
+}
+
+void test_perf_branches(void)
+{
+ int err, prog_fd, i, pfd = -1, duration = 0, ok = 0;
+ const char *file = "./test_perf_branches.o";
+ const char *prog_name = "perf_event";
+ struct perf_buffer_opts pb_opts = {};
+ struct perf_event_attr attr = {};
+ struct bpf_map *perf_buf_map;
+ struct bpf_program *prog;
+ struct bpf_object *obj;
+ struct perf_buffer *pb;
+ struct bpf_link *link;
+ volatile int j = 0;
+ cpu_set_t cpu_set;
+
+ /* load program */
+ err = bpf_prog_load(file, BPF_PROG_TYPE_PERF_EVENT, &obj, &prog_fd);
+ if (CHECK(err, "obj_load", "err %d errno %d\n", err, errno)) {
+ obj = NULL;
+ goto out_close;
+ }
+
+ prog = bpf_object__find_program_by_title(obj, prog_name);
+ if (CHECK(!prog, "find_probe", "prog '%s' not found\n", prog_name))
+ goto out_close;
+
+ /* load map */
+ perf_buf_map = bpf_object__find_map_by_name(obj, "perf_buf_map");
+ if (CHECK(!perf_buf_map, "find_perf_buf_map", "not found\n"))
+ goto out_close;
+
+ /* create perf event */
+ attr.size = sizeof(attr);
+ attr.type = PERF_TYPE_HARDWARE;
+ attr.config = PERF_COUNT_HW_CPU_CYCLES;
+ attr.freq = 1;
+ attr.sample_freq = 4000;
+ attr.sample_type = PERF_SAMPLE_BRANCH_STACK;
+ attr.branch_sample_type = PERF_SAMPLE_BRANCH_USER | PERF_SAMPLE_BRANCH_ANY;
+ pfd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, PERF_FLAG_FD_CLOEXEC);
+ if (CHECK(pfd < 0, "perf_event_open", "err %d\n", pfd))
+ goto out_close;
+
+ /* attach perf_event */
+ link = bpf_program__attach_perf_event(prog, pfd);
+ if (CHECK(IS_ERR(link), "attach_perf_event", "err %ld\n", PTR_ERR(link)))
+ goto out_close_perf;
+
+ /* set up perf buffer */
+ pb_opts.sample_cb = on_sample;
+ pb_opts.ctx = &ok;
+ pb = perf_buffer__new(bpf_map__fd(perf_buf_map), 1, &pb_opts);
+ if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
+ goto out_detach;
+
+ /* generate some branches on cpu 0 */
+ CPU_ZERO(&cpu_set);
+ CPU_SET(0, &cpu_set);
+ err = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set), &cpu_set);
+ if (CHECK(err, "set_affinity", "cpu #0, err %d\n", err))
+ goto out_free_pb;
+ for (i = 0; i < 1000000; ++i)
+ ++j;
+
+ /* read perf buffer */
+ err = perf_buffer__poll(pb, 500);
+ if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
+ goto out_free_pb;
+
+ if (CHECK(!ok, "ok", "not ok\n"))
+ goto out_free_pb;
+
+out_free_pb:
+ perf_buffer__free(pb);
+out_detach:
+ bpf_link__destroy(link);
+out_close_perf:
+ close(pfd);
+out_close:
+ bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_branches.c b/tools/testing/selftests/bpf/progs/test_perf_branches.c
new file mode 100644
index 000000000000..c210065e21c8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_branches.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Facebook
+
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include "bpf_helpers.h"
+#include "bpf_trace_helpers.h"
+
+struct {
+ __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+ __uint(key_size, sizeof(int));
+ __uint(value_size, sizeof(int));
+} perf_buf_map SEC(".maps");
+
+struct fake_perf_branch_entry {
+ __u64 _a;
+ __u64 _b;
+ __u64 _c;
+};
+
+SEC("perf_event")
+int perf_branches(void *ctx)
+{
+ int ret;
+ struct fake_perf_branch_entry entries[4];
+
+ ret = bpf_perf_prog_read_branches(ctx,
+ entries,
+ sizeof(entries));
+ /* ignore spurious events */
+ if (!ret)
+ return 1;
+
+ bpf_perf_event_output(ctx, &perf_buf_map, BPF_F_CURRENT_CPU,
+ &ret, sizeof(ret));
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.21.1

2020-01-23 05:42:22

by John Fastabend

Subject: RE: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

Daniel Xu wrote:
> Branch records are a CPU feature that can be configured to record
> certain branches that are taken during code execution. This data is
> particularly interesting for profile guided optimizations. perf has had
> branch record support for a while but the data collection can be a bit
> coarse grained.
>
> We (Facebook) have seen in experiments that associating metadata with
> branch records can improve results (after postprocessing). We generally
> use bpf_probe_read_*() to get metadata out of userspace. That's why bpf
> support for branch records is useful.
>
> Aside from this particular use case, having branch data available to bpf
> progs can be useful to get stack traces out of userspace applications
> that omit frame pointers.
>
> Signed-off-by: Daniel Xu <[email protected]>
> ---
> include/uapi/linux/bpf.h | 13 ++++++++++++-
> kernel/trace/bpf_trace.c | 31 +++++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 033d90a2282d..7350c5be6158 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -2885,6 +2885,16 @@ union bpf_attr {
> * **-EPERM** if no permission to send the *sig*.
> *
> * **-EAGAIN** if bpf program can try again.
> + *
> + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
> + * Description
> + * For an eBPF program attached to a perf event, retrieve the
> + * branch records (struct perf_branch_entry) associated with *ctx*
> + * and store them in the buffer pointed to by *buf*, up to
> + * *buf_size* bytes.

It seems extra bytes in buf will be cleared. The number of bytes copied is
returned, so I don't see any reason to clear the extra bytes; I would just
let the BPF program do this if it cares. But it should be noted in the
description at least.

> + * Return
> + * On success, number of bytes written to *buf*. On error, a
> + * negative value.
> */
> #define __BPF_FUNC_MAPPER(FN) \
> FN(unspec), \
> @@ -3004,7 +3014,8 @@ union bpf_attr {
> FN(probe_read_user_str), \
> FN(probe_read_kernel_str), \
> FN(tcp_send_ack), \
> - FN(send_signal_thread),
> + FN(send_signal_thread), \
> + FN(perf_prog_read_branches),
>
> /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> * function eBPF program intends to call
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 19e793aa441a..24c51272a1f7 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1028,6 +1028,35 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
> .arg3_type = ARG_CONST_SIZE,
> };
>
> +BPF_CALL_3(bpf_perf_prog_read_branches, struct bpf_perf_event_data_kern *, ctx,
> + void *, buf, u32, size)
> +{
> + struct perf_branch_stack *br_stack = ctx->data->br_stack;
> + u32 to_copy = 0, to_clear = size;
> + int err = -EINVAL;
> +
> + if (unlikely(!br_stack))
> + goto clear;
> +
> + to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> + to_clear -= to_copy;
> +
> + memcpy(buf, br_stack->entries, to_copy);
> + err = to_copy;
> +clear:
> + memset(buf + to_copy, 0, to_clear);

Here, why do this at all? If the user cares, they can clear the bytes
directly from the BPF program. I suspect it's probably going to be
wasted work in most cases. If it's needed for some reason, provide
a comment with it.

> + return err;
> +}

[...]

2020-01-23 20:36:54

by Daniel Xu

Subject: RE: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

Hi John, thanks for looking.

On Wed Jan 22, 2020 at 9:39 PM, John Fastabend wrote:
[...]
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 033d90a2282d..7350c5be6158 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2885,6 +2885,16 @@ union bpf_attr {
> > * **-EPERM** if no permission to send the *sig*.
> > *
> > * **-EAGAIN** if bpf program can try again.
> > + *
> > + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
> > + * Description
> > + * For an eBPF program attached to a perf event, retrieve the
> > + * branch records (struct perf_branch_entry) associated with *ctx*
> > + * and store them in the buffer pointed to by *buf*, up to
> > + * *buf_size* bytes.
>
>
> It seems extra bytes in buf will be cleared. The number of bytes copied
> is returned, so I don't see any reason to clear the extra bytes; I would
> just let the BPF program do this if it cares. But it should be noted in
> the description at least.

In include/linux/bpf.h:

/* the following constraints used to prototype bpf_memcmp() and other
* functions that access data on eBPF program stack
*/
ARG_PTR_TO_UNINIT_MEM, /* pointer to memory does not need to be initialized,
* helper function must fill all bytes or clear
* them in error case.
*/

I figured it would be good to clear out the stack because this helper
writes data on the program stack.

Also bpf_perf_prog_read_value() does something similar (fill zeros on
failure).

[...]
> > + to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
> > + to_clear -= to_copy;
> > +
> > + memcpy(buf, br_stack->entries, to_copy);
> > + err = to_copy;
> > +clear:
> > + memset(buf + to_copy, 0, to_clear);
>
>
> Here, why do this at all? If the user cares, they can clear the bytes
> directly from the BPF program. I suspect it's probably going to be
> wasted work in most cases. If it's needed for some reason, provide
> a comment with it.

Same concern as above, right?

I can send a V3 with updated uapi/linux/bpf.h description (and a rebase).

Thanks,
Daniel

2020-01-23 22:26:42

by Daniel Borkmann

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On 1/23/20 9:09 PM, Daniel Xu wrote:
> Hi John, thanks for looking.
>
> On Wed Jan 22, 2020 at 9:39 PM, John Fastabend wrote:
> [...]
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 033d90a2282d..7350c5be6158 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -2885,6 +2885,16 @@ union bpf_attr {
>>> * **-EPERM** if no permission to send the *sig*.
>>> *
>>> * **-EAGAIN** if bpf program can try again.
>>> + *
>>> + * int bpf_perf_prog_read_branches(struct bpf_perf_event_data *ctx, void *buf, u32 buf_size)
>>> + * Description
>>> + * For an eBPF program attached to a perf event, retrieve the
>>> + * branch records (struct perf_branch_entry) associated with *ctx*
>>> + * and store them in the buffer pointed to by *buf*, up to
>>> + * *buf_size* bytes.
>>
>> It seems extra bytes in buf will be cleared. The number of bytes copied
>> is returned, so I don't see any reason to clear the extra bytes; I would
>> just let the BPF program do this if it cares. But it should be noted in
>> the description at least.
>
> In include/linux/bpf.h:
>
> /* the following constraints used to prototype bpf_memcmp() and other
> * functions that access data on eBPF program stack
> */
> ARG_PTR_TO_UNINIT_MEM, /* pointer to memory does not need to be initialized,
> * helper function must fill all bytes or clear
> * them in error case.
> */
>
> I figured it would be good to clear out the stack because this helper
> writes data on the program stack.
>
> Also bpf_perf_prog_read_value() does something similar (fill zeros on
> failure).
>
> [...]
>>> + to_copy = min_t(u32, br_stack->nr * sizeof(struct perf_branch_entry), size);
>>> + to_clear -= to_copy;
>>> +
>>> + memcpy(buf, br_stack->entries, to_copy);
>>> + err = to_copy;
>>> +clear:
>>> + memset(buf + to_copy, 0, to_clear);
>>
>>
>> Here, why do this at all? If the user cares, they can clear the bytes
>> directly from the BPF program. I suspect it's probably going to be
>> wasted work in most cases. If it's needed for some reason, provide
>> a comment with it.
>
> Same concern as above, right?

Yes, so we've been following this practice for all the BPF helpers, no
matter which program type. Though for tracing it may be up for debate
whether it still makes sense, given there's nothing to be leaked here since
you can read this data anyway via probe read if you wanted to. So we might
as well get rid of the clearing for all tracing helpers.

Different question related to your set. It looks like br_stack is only available
on x86, is that correct? For other archs this will always bail out on !br_stack
test. Perhaps we should document this fact so users are not surprised why their
prog using this helper is not working on !x86. Wdyt?

Thanks,
Daniel

2020-01-23 22:32:05

by Daniel Xu

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
[...]
>
> Yes, so we've been following this practice for all the BPF helpers, no
> matter which program type. Though for tracing it may be up for debate
> whether it still makes sense, given there's nothing to be leaked here
> since you can read this data anyway via probe read if you wanted to. So
> we might as well get rid of the clearing for all tracing helpers.

Right, that makes sense. Do you want me to leave it in for this patchset
and then remove all of them in a followup patchset?

>
> Different question related to your set. It looks like br_stack is only
> available on x86, is that correct? For other archs this will always bail
> out on !br_stack test. Perhaps we should document this fact so users are
> not surprised why their prog using this helper is not working on !x86.
> Wdyt?

I think perf_event_open() should fail on !x86 if a user tries to configure
it with branch stack collection. So there would not be the opportunity for
the bpf prog to be attached and run. I haven't tested this, though. I'll
look through the code / install a VM and test it.

[...]

Thanks,
Daniel

2020-01-23 22:49:09

by Daniel Borkmann

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On 1/23/20 11:30 PM, Daniel Xu wrote:
> On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
> [...]
>>
>> Yes, so we've been following this practice for all the BPF helpers, no
>> matter which program type. Though for tracing it may be up for debate
>> whether it still makes sense, given there's nothing to be leaked here
>> since you can read this data anyway via probe read if you wanted to. So
>> we might as well get rid of the clearing for all tracing helpers.
>
> Right, that makes sense. Do you want me to leave it in for this patchset
> and then remove all of them in a followup patchset?

Let's leave it in, and in a different set we can clean this up for all
tracing-related helpers at once.

>> Different question related to your set. It looks like br_stack is only
>> available on x86, is that correct? For other archs this will always bail
>> out on !br_stack test. Perhaps we should document this fact so users are
>> not surprised why their prog using this helper is not working on !x86.
>> Wdyt?
>
> I think perf_event_open() should fail on !x86 if a user tries to configure
> it with branch stack collection. So there would not be the opportunity for
> the bpf prog to be attached and run. I haven't tested this, though. I'll
> look through the code / install a VM and test it.

As far as I can see, the prog would still be attachable and runnable, just
that the helper will always return -EINVAL on these archs. Maybe the error
code should be changed to -ENOENT to avoid confusion wrt whether the user
provided some invalid input args. Should this actually bail out with -EINVAL
if size is not a multiple of sizeof(struct perf_branch_entry), as otherwise
we'd end up copying half-broken branch entry information?

Thanks,
Daniel

2020-01-23 23:11:48

by Martin KaFai Lau

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On Thu, Jan 23, 2020 at 11:44:53PM +0100, Daniel Borkmann wrote:
> On 1/23/20 11:30 PM, Daniel Xu wrote:
> > On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
> > [...]
> > >
> > > Yes, so we've been following this practice for all the BPF helpers,
> > > no matter which program type. Though for tracing it may be up for
> > > debate whether it still makes sense, given there's nothing to be
> > > leaked here since you can read this data anyway via probe read if you
> > > wanted to. So we might as well get rid of the clearing for all
> > > tracing helpers.
> >
> > Right, that makes sense. Do you want me to leave it in for this patchset
> > and then remove all of them in a followup patchset?
>
> Let's leave it in, and in a different set we can clean this up for all
> tracing-related helpers at once.
>
> > > Different question related to your set. It looks like br_stack is only
> > > available on x86, is that correct? For other archs this will always
> > > bail out on !br_stack test. Perhaps we should document this fact so
> > > users are not surprised why their prog using this helper is not
> > > working on !x86. Wdyt?
> >
> > I think perf_event_open() should fail on !x86 if a user tries to configure
> > it with branch stack collection. So there would not be the opportunity for
> > the bpf prog to be attached and run. I haven't tested this, though. I'll
> > look through the code / install a VM and test it.
>
> As far as I can see, the prog would still be attachable and runnable, just
> that the helper will always return -EINVAL on these archs. Maybe the error
> code should be changed to -ENOENT to avoid confusion wrt whether the user
> provided some invalid input args.

+1 on -ENOENT.

> Should this actually bail out with -EINVAL if size is not a multiple of
> sizeof(struct perf_branch_entry), as otherwise we'd end up copying
> half-broken branch entry information?

2020-01-23 23:13:26

by Daniel Borkmann

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On 1/23/20 11:41 PM, Andrii Nakryiko wrote:
> On 1/23/20 2:30 PM, Daniel Xu wrote:
>> On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
>> [...]
>>>
>>> Yes, so we've been following this practice for all the BPF helpers, no
>>> matter which program type. Though for tracing it may be up for debate
>>> whether it still makes sense, given there's nothing to be leaked here
>>> since you can read this data anyway via probe read if you wanted to. So
>>> we might as well get rid of the clearing for all tracing helpers.
>>
>> Right, that makes sense. Do you want me to leave it in for this patchset
>> and then remove all of them in a followup patchset?
>
> I don't think we can remove that for existing tracing helpers (e.g.,
> bpf_probe_read). There are applications that explicitly expect
> destination memory to be zeroed out on failure. It's a BPF world's
> memset(0).

Due to avoiding error checks that way when the expected outcome of the buf
is non-zero anyway? Agreed that those would break, so yeah, they cannot be
removed then.

> I also wonder if BPF verifier has any extra assumptions for
> ARG_PTR_TO_UNINIT_MEM w.r.t. it being initialized after helper call
> (e.g., for liveness tracking).

There are no extra assumptions other than memory being written after the
helper call (whether success or failure of the helper itself doesn't matter,
so there are no assumptions about the content); the data that has been
written to the buffer is marked as initialized but unknown (e.g. in
check_stack_write() the case where reg remains NULL since value_regno is
negative).

Thanks,
Daniel

2020-01-23 23:45:52

by Andrii Nakryiko

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On 1/23/20 2:30 PM, Daniel Xu wrote:
> On Thu Jan 23, 2020 at 11:23 PM, Daniel Borkmann wrote:
> [...]
>>
>> Yes, so we've been following this practice for all the BPF helpers, no
>> matter which program type. Though for tracing it may be up for debate
>> whether it still makes sense, given there's nothing to be leaked here
>> since you can read this data anyway via probe read if you wanted to. So
>> we might as well get rid of the clearing for all tracing helpers.
>
> Right, that makes sense. Do you want me to leave it in for this patchset
> and then remove all of them in a followup patchset?
>

I don't think we can remove that for existing tracing helpers (e.g.,
bpf_probe_read). There are applications that explicitly expect
destination memory to be zeroed out on failure. It's a BPF world's
memset(0).

I also wonder if BPF verifier has any extra assumptions for
ARG_PTR_TO_UNINIT_MEM w.r.t. it being initialized after helper call
(e.g., for liveness tracking).

>>
>> Different question related to your set. It looks like br_stack is only
>> available on x86, is that correct? For other archs this will always bail
>> out on !br_stack test. Perhaps we should document this fact so users are
>> not surprised why their prog using this helper is not working on !x86.
>> Wdyt?
>
> I think perf_event_open() should fail on !x86 if a user tries to configure
> it with branch stack collection. So there would not be the opportunity for
> the bpf prog to be attached and run. I haven't tested this, though. I'll
> look through the code / install a VM and test it.
>
> [...]
>
> Thanks,
> Daniel
>

2020-01-24 01:06:22

by Daniel Xu

Subject: Re: [PATCH v2 bpf-next 1/3] bpf: Add bpf_perf_prog_read_branches() helper

On Thu Jan 23, 2020 at 11:44 PM, Daniel Borkmann wrote:
[...]
> >> Different question related to your set. It looks like br_stack is only
> >> available on x86, is that correct? For other archs this will always
> >> bail out on !br_stack test. Perhaps we should document this fact so
> >> users are not surprised why their prog using this helper is not working
> >> on !x86. Wdyt?
> >
> > I think perf_event_open() should fail on !x86 if a user tries to configure
> > it with branch stack collection. So there would not be the opportunity for
> > the bpf prog to be attached and run. I haven't tested this, though. I'll
> > look through the code / install a VM and test it.
>
>
> As far as I can see, the prog would still be attachable and runnable,
> just that the helper will always return -EINVAL on these archs. Maybe the
> error code should be changed to -ENOENT to avoid confusion wrt whether
> the user provided some invalid input args.

Ok, will add.

> Should this actually bail out with -EINVAL if size is not a multiple of
> sizeof(struct perf_branch_entry), as otherwise we'd end up copying
> half-broken branch entry information?

Sure, makes sense.