Hi.
First, I hope you are fine and the same for your relatives.
Normally, when BPF ring buffer are full, producers cannot write anymore and
need to wait for consumer to get some data.
As a consequence, calling bpf_ringbuf_reserve() from eBPF code returns NULL.
This contribution adds a new flag to make BPF ring buffer overwritable.
Perf ring buffers already implement an option to be overwritable. In order to
avoid data corruption, the data is written backward, see
commit 9ecda41acb97 ("perf/core: Add ::write_backward attribute to perf event").
This patch series re-uses the same idea from perf ring buffers but in BPF ring
buffers.
So, calling bpf_ringbuf_reserve() on an overwritable BPF ring buffer never
returns NULL.
As a consequence, oldest data will be overwritten by the newest so consumer will
loose data.
Overwritable ring buffers are useful in BPF programs that are permanently
enabled but rarely read, only on-demand, for example in case of a user request
to investigate problems. We would like to use this in the Traceloop project [1].
The self test added in this series was tested and validated in a VM:
you@vm# ./share/linux/tools/testing/selftests/bpf/test_progs -t ringbuf_over
Can't find bpf_testmod.ko kernel module: -2
WARNING! Selftests relying on bpf_testmod.ko will be skipped.
#135 ringbuf_over_writable:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
You can also test the libbpf implementation by using the last patch of this
series which should be applied to iovisor/bcc:
you@home$ cd /path/to/iovisor/bcc
you@home$ git am -3 v2-0005-for-test-purpose-only-Add-toy-to-play-with-BPF-ri.patch
you@home$ cd /path/to/linux/tools/lib/bpf
you@home$ make -j$(nproc)
you@home$ cp libbpf.a /path/to/iovisor/bcc/libbpf-tools/.output
you@home$ cd /path/to/iovisor/bcc/libbpf-tools/
you@home$ make -j toy
# Start your VM and copy toy executable inside it.
root@vm-amd64:~# ./share/toy &
[1] 287
root@vm-amd64:~# for i in {1..16}; do ls > /dev/null; done
16
15
14
13
12
11
10
9
root@vm-amd64:~# ls > /dev/null && ls > /dev/null
18
17
As you can see, the first eight events are overwritten.
If you see any way to improve this contribution, feel free to share.
Changes since:
v1:
* Made producers write backward like perf ring buffer, so it permits avoiding
memory corruption.
* Added libbpf implementation to consume all events available.
* Added selftest.
* Added documentation.
Francis Laniel (5):
bpf: Make ring buffer overwritable.
selftests: Add BPF overwritable ring buffer self tests.
docs/bpf: Add documentation for overwritable ring buffer.
libbpf: Add implementation to consume overwritable BPF ring buffer.
for test purpose only: Add toy to play with BPF ring.
...-only-Add-toy-to-play-with-BPF-ring-.patch | 147 ++++++++++++++++
Documentation/bpf/ringbuf.rst | 18 +-
include/uapi/linux/bpf.h | 3 +
kernel/bpf/ringbuf.c | 43 +++--
tools/include/uapi/linux/bpf.h | 3 +
tools/lib/bpf/ringbuf.c | 106 ++++++++++++
tools/testing/selftests/bpf/Makefile | 5 +-
.../bpf/prog_tests/ringbuf_overwritable.c | 158 ++++++++++++++++++
.../bpf/progs/test_ringbuf_overwritable.c | 61 +++++++
9 files changed, 531 insertions(+), 13 deletions(-)
create mode 100644 0001-for-test-purpose-only-Add-toy-to-play-with-BPF-ring-.patch
create mode 100644 tools/testing/selftests/bpf/prog_tests/ringbuf_overwritable.c
create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwritable.c
Best regards and thank you in advance.
---
[1] https://github.com/kinvolk/traceloop
Traceloop was presented at LPC 2020 (https://lpc.events/event/7/contributions/667/)
--
2.25.1
This patch should be applied on iovisor/bcc.
Signed-off-by: Francis Laniel <[email protected]>
---
...-only-Add-toy-to-play-with-BPF-ring-.patch | 147 ++++++++++++++++++
1 file changed, 147 insertions(+)
create mode 100644 0001-for-test-purpose-only-Add-toy-to-play-with-BPF-ring-.patch
diff --git a/0001-for-test-purpose-only-Add-toy-to-play-with-BPF-ring-.patch b/0001-for-test-purpose-only-Add-toy-to-play-with-BPF-ring-.patch
new file mode 100644
index 000000000000..37d08cc08a88
--- /dev/null
+++ b/0001-for-test-purpose-only-Add-toy-to-play-with-BPF-ring-.patch
@@ -0,0 +1,147 @@
+From e4b95b1f9625f62d0978173973070dce38bd7210 Mon Sep 17 00:00:00 2001
+From: Francis Laniel <[email protected]>
+Date: Tue, 9 Aug 2022 18:18:53 +0200
+Subject: [PATCH] for test purpose only: Add toy to play with BPF ring buffer.
+
+Signed-off-by: Francis Laniel <[email protected]>
+---
+ libbpf-tools/Makefile | 1 +
+ libbpf-tools/toy.bpf.c | 29 +++++++++++++++++++
+ libbpf-tools/toy.c | 65 ++++++++++++++++++++++++++++++++++++++++++
+ libbpf-tools/toy.h | 4 +++
+ 4 files changed, 99 insertions(+)
+ create mode 100644 libbpf-tools/toy.bpf.c
+ create mode 100644 libbpf-tools/toy.c
+ create mode 100644 libbpf-tools/toy.h
+
+diff --git a/libbpf-tools/Makefile b/libbpf-tools/Makefile
+index 3e40f6e5..0d81d3b7 100644
+--- a/libbpf-tools/Makefile
++++ b/libbpf-tools/Makefile
+@@ -68,6 +68,7 @@ APPS = \
+ tcplife \
+ tcprtt \
+ tcpsynbl \
++ toy \
+ vfsstat \
+ #
+
+diff --git a/libbpf-tools/toy.bpf.c b/libbpf-tools/toy.bpf.c
+new file mode 100644
+index 00000000..3c28a20b
+--- /dev/null
++++ b/libbpf-tools/toy.bpf.c
+@@ -0,0 +1,29 @@
++#include <linux/types.h>
++#include <bpf/bpf_helpers.h>
++#include <linux/bpf.h>
++#include "toy.h"
++
++
++struct {
++ __uint(type, BPF_MAP_TYPE_RINGBUF);
++ __uint(max_entries, 4096);
++ __uint(map_flags, 1U << 13);
++} buffer SEC(".maps");
++
++static __u32 count = 0;
++
++SEC("tracepoint/syscalls/sys_enter_execve")
++int sys_enter_execve(void) {
++ count++;
++ struct event *event = bpf_ringbuf_reserve(&buffer, sizeof(struct event), 0);
++ if (!event) {
++ return 1;
++ }
++
++ event->count = count;
++ bpf_ringbuf_submit(event, 0);
++
++ return 0;
++}
++
++char _license[] SEC("license") = "GPL";
+diff --git a/libbpf-tools/toy.c b/libbpf-tools/toy.c
+new file mode 100644
+index 00000000..4cd8b588
+--- /dev/null
++++ b/libbpf-tools/toy.c
+@@ -0,0 +1,65 @@
++#include <bpf/libbpf.h>
++#include <stdio.h>
++#include <unistd.h>
++#include "toy.h"
++#include "toy.skel.h"
++#include "btf_helpers.h"
++
++
++static int buf_process_sample(void *ctx, void *data, size_t len) {
++ struct event *evt = (struct event *)data;
++
++ printf("%d\n", evt->count);
++
++ return 0;
++}
++
++int main(void) {
++ LIBBPF_OPTS(bpf_object_open_opts, open_opts);
++ int buffer_map_fd = -1;
++ struct toy_bpf *obj;
++ int err;
++
++ libbpf_set_strict_mode(LIBBPF_STRICT_ALL);
++
++ err = ensure_core_btf(&open_opts);
++ if (err) {
++ fprintf(stderr, "failed to fetch necessary BTF for CO-RE: %s\n", strerror(-err));
++ return 1;
++ }
++
++ obj = toy_bpf__open_opts(&open_opts);
++ if (!obj) {
++ fprintf(stderr, "failed to open BPF object\n");
++ return 1;
++ }
++
++ err = toy_bpf__load(obj);
++ if (err) {
++ fprintf(stderr, "failed to load BPF object: %d\n", err);
++ return 1;
++ }
++
++ struct ring_buffer *ring_buffer;
++
++ buffer_map_fd = bpf_object__find_map_fd_by_name(obj->obj, "buffer");
++ ring_buffer = ring_buffer__new(buffer_map_fd, buf_process_sample, NULL, NULL);
++
++ if(!ring_buffer) {
++ fprintf(stderr, "failed to create ring buffer\n");
++ return 1;
++ }
++
++ err = toy_bpf__attach(obj);
++ if (err) {
++ fprintf(stderr, "failed to attach BPF programs\n");
++ return 1;
++ }
++
++ for (;;) {
++ ring_buffer__consume(ring_buffer);
++ sleep(1);
++ }
++
++ return 0;
++}
+diff --git a/libbpf-tools/toy.h b/libbpf-tools/toy.h
+new file mode 100644
+index 00000000..ebfedf06
+--- /dev/null
++++ b/libbpf-tools/toy.h
+@@ -0,0 +1,4 @@
++struct event {
++ __u32 count;
++ char filler[4096 / 8 - sizeof(__u32) - 8];
++};
+--
+2.25.1
+
--
2.25.1
Add documentation to precise behavior of overwritable BPF ring buffer compared
to conventionnal ones.
Signed-off-by: Francis Laniel <[email protected]>
---
Documentation/bpf/ringbuf.rst | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/Documentation/bpf/ringbuf.rst b/Documentation/bpf/ringbuf.rst
index 6a615cd62bda..e062381ff604 100644
--- a/Documentation/bpf/ringbuf.rst
+++ b/Documentation/bpf/ringbuf.rst
@@ -124,7 +124,7 @@ buffer. Currently 4 are supported:
- ``BPF_RB_AVAIL_DATA`` returns amount of unconsumed data in ring buffer;
- ``BPF_RB_RING_SIZE`` returns the size of ring buffer;
-- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` returns current logical possition
+- ``BPF_RB_CONS_POS``/``BPF_RB_PROD_POS`` returns current logical position
of consumer/producer, respectively.
Returned values are momentarily snapshots of ring buffer state and could be
@@ -204,3 +204,19 @@ buffer. For extreme cases, when BPF program wants more manual control of
notifications, commit/discard/output helpers accept ``BPF_RB_NO_WAKEUP`` and
``BPF_RB_FORCE_WAKEUP`` flags, which give full control over notifications of
data availability, but require extra caution and diligence in using this API.
+
+Specific case of overwritable ring buffer
+-----------------------------------------
+
+Using ``BFP_F_RB_OVERWRITABLE`` when creating the ring buffer will make it
+overwritable.
+As a consequence, the producers will never be stopped from writing data, *i.e.*
+in this mode ``bpf_ringbuf_reserve()`` never blocks and returns NULL, but oldest
+events will be replaced by newest ones.
+
+In terms of implementation, this feature uses the same logic than overwritable
+perf ring buffer.
+The ring buffer is written backward, while it should be read forward from the
+producer position.
+As a consequence, in this mode, the consumer position has no meaning and can be
+used freely by userspace implementation.
--
2.25.1
On Tue, Sep 6, 2022 at 12:58 PM Francis Laniel
<[email protected]> wrote:
>
> Hi.
>
>
> First, I hope you are fine and the same for your relatives.
>
> Normally, when BPF ring buffer are full, producers cannot write anymore and
> need to wait for consumer to get some data.
> As a consequence, calling bpf_ringbuf_reserve() from eBPF code returns NULL.
>
> This contribution adds a new flag to make BPF ring buffer overwritable.
> Perf ring buffers already implement an option to be overwritable. In order to
> avoid data corruption, the data is written backward, see
No, you are not avoiding data corruption. This patch set doesn't apply
cleanly, so I can't try this locally, but try the following example:
1. Allocate very small ringbuf (4KB).
2. Write BPF program that does multiple reservations within single
run. Enough reservations to overfill entire ringbuf and wrap over. For
each reserved sample fill it completely with garbage.
3. Try to submit all (successful) reservations. My theory is you'll
observe a crash.
Make sure that sample size isn't an exact multiple of ringbuf size so
that you don't have a nice neat overlap.
Look at bpf_ringbuf_restore_from_rec(). Each successful reservation
contains a very delicate and precise offset that allows
bpf_ringbuf_commit() to find the BPF ringbuf map in memory. By
allowing to overwrite active not-yet-submitted reservation you allow
to corrupt this information in the record header. This will make
kernel code to dereference garbage addresses.
This whole backwards approach won't work with BPF ringbuf. It works
with BPF perfbuf only under some non-enforceable assumptions, from
what I understand. We need something else for an overwritable ringbuf.
> commit 9ecda41acb97 ("perf/core: Add ::write_backward attribute to perf event").
> This patch series re-uses the same idea from perf ring buffers but in BPF ring
> buffers.
> So, calling bpf_ringbuf_reserve() on an overwritable BPF ring buffer never
> returns NULL.
> As a consequence, oldest data will be overwritten by the newest so consumer will
> loose data.
>
> Overwritable ring buffers are useful in BPF programs that are permanently
> enabled but rarely read, only on-demand, for example in case of a user request
> to investigate problems. We would like to use this in the Traceloop project [1].
>
> The self test added in this series was tested and validated in a VM:
> you@vm# ./share/linux/tools/testing/selftests/bpf/test_progs -t ringbuf_over
> Can't find bpf_testmod.ko kernel module: -2
> WARNING! Selftests relying on bpf_testmod.ko will be skipped.
> #135 ringbuf_over_writable:OK
> Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
>
> You can also test the libbpf implementation by using the last patch of this
> series which should be applied to iovisor/bcc:
> you@home$ cd /path/to/iovisor/bcc
> you@home$ git am -3 v2-0005-for-test-purpose-only-Add-toy-to-play-with-BPF-ri.patch
> you@home$ cd /path/to/linux/tools/lib/bpf
> you@home$ make -j$(nproc)
> you@home$ cp libbpf.a /path/to/iovisor/bcc/libbpf-tools/.output
> you@home$ cd /path/to/iovisor/bcc/libbpf-tools/
> you@home$ make -j toy
> # Start your VM and copy toy executable inside it.
> root@vm-amd64:~# ./share/toy &
> [1] 287
> root@vm-amd64:~# for i in {1..16}; do ls > /dev/null; done
> 16
> 15
> 14
> 13
> 12
> 11
> 10
> 9
> root@vm-amd64:~# ls > /dev/null && ls > /dev/null
> 18
> 17
>
> As you can see, the first eight events are overwritten.
>
> If you see any way to improve this contribution, feel free to share.
>
> Changes since:
> v1:
> * Made producers write backward like perf ring buffer, so it permits avoiding
> memory corruption.
> * Added libbpf implementation to consume all events available.
> * Added selftest.
> * Added documentation.
>
> Francis Laniel (5):
> bpf: Make ring buffer overwritable.
> selftests: Add BPF overwritable ring buffer self tests.
> docs/bpf: Add documentation for overwritable ring buffer.
> libbpf: Add implementation to consume overwritable BPF ring buffer.
> for test purpose only: Add toy to play with BPF ring.
>
> ...-only-Add-toy-to-play-with-BPF-ring-.patch | 147 ++++++++++++++++
> Documentation/bpf/ringbuf.rst | 18 +-
> include/uapi/linux/bpf.h | 3 +
> kernel/bpf/ringbuf.c | 43 +++--
> tools/include/uapi/linux/bpf.h | 3 +
> tools/lib/bpf/ringbuf.c | 106 ++++++++++++
> tools/testing/selftests/bpf/Makefile | 5 +-
> .../bpf/prog_tests/ringbuf_overwritable.c | 158 ++++++++++++++++++
> .../bpf/progs/test_ringbuf_overwritable.c | 61 +++++++
> 9 files changed, 531 insertions(+), 13 deletions(-)
> create mode 100644 0001-for-test-purpose-only-Add-toy-to-play-with-BPF-ring-.patch
> create mode 100644 tools/testing/selftests/bpf/prog_tests/ringbuf_overwritable.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_ringbuf_overwritable.c
>
>
> Best regards and thank you in advance.
> ---
> [1] https://github.com/kinvolk/traceloop
> Traceloop was presented at LPC 2020 (https://lpc.events/event/7/contributions/667/)
> --
> 2.25.1
>