2023-12-11 04:56:02

by Kyle Huey

Subject: [PATCH v3 0/4] Combine perf and bpf for fast eval of hw breakpoint conditions

rr, a userspace record and replay debugger[0], replays asynchronous events
such as signals and context switches by essentially[1] setting a breakpoint
at the address where the asynchronous event was delivered during recording
with a condition that the program state matches the state when the event
was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the
supervisor, and evaluates the condition from the supervisor. If the
asynchronous event is delivered in a tight loop (thus requiring the
breakpoint condition to be repeatedly evaluated) the overhead can be
immense. A patch to rr that uses hardware breakpoints via perf events with
an attached BPF program to reject breakpoint hits where the condition is
not satisfied reduces rr's replay overhead by 94% on a pathological (but a
real customer-provided, not contrived) rr trace.

The only obstacle to this approach is that while the kernel allows a BPF
program to suppress sample output when a perf event overflows it does not
suppress signalling the perf event fd or sending the perf event's SIGTRAP.
This patch set redesigns __perf_event_overflow() and
bpf_overflow_handler() so that the former invokes the latter directly when
appropriate, rather than through the generic overflow handler machinery;
passes the BPF program's return code back to __perf_event_overflow() so it
can decide whether to execute the regular overflow handler; reorders
bpf_overflow_handler() relative to the side effects of perf event
overflow; changes __perf_event_overflow() to suppress those side effects
when the BPF program returns zero; and adds a selftest.

The previous version of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v2:

Patches 1 and 2 were added at Namhyung Kim's suggestion to refactor this
code so the feature can be implemented in a cleaner way. Patch 2 is
separated out for the benefit of the ARM arch maintainers.

Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
implementation thanks to the earlier refactoring.

Patch 4 is v2's patch 3, and addresses review comments about C++ style
comments, getting a TRAP_PERF definition into the test, and unnecessary
NULL checks.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible
before setting a breakpoint, and to determine a set of program state that
is practical to check and verify.



2023-12-11 04:56:10

by Kyle Huey

Subject: [PATCH v3 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

To ultimately allow bpf programs attached to perf events to completely
suppress all of the effects of a perf event overflow (rather than just the
sample output, as they do today), call bpf_overflow_handler() from
__perf_event_overflow() directly rather than modifying struct perf_event's
overflow_handler. Return the bpf program's return value from
bpf_overflow_handler() so that __perf_event_overflow() knows how to
proceed. Remove the now unnecessary orig_overflow_handler from struct
perf_event.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey <[email protected]>
Suggested-by: Namhyung Kim <[email protected]>
---
include/linux/perf_event.h | 6 +-----
kernel/events/core.c | 28 +++++++++++++++-------------
2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 5547ba68e6e4..312b9f31442c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -810,7 +810,6 @@ struct perf_event {
perf_overflow_handler_t overflow_handler;
void *overflow_handler_context;
#ifdef CONFIG_BPF_SYSCALL
- perf_overflow_handler_t orig_overflow_handler;
struct bpf_prog *prog;
u64 bpf_cookie;
#endif
@@ -1337,10 +1336,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
#ifdef CONFIG_BPF_SYSCALL
static inline bool uses_default_overflow_handler(struct perf_event *event)
{
- if (likely(is_default_overflow_handler(event)))
- return true;
-
- return __is_default_overflow_handler(event->orig_overflow_handler);
+ return is_default_overflow_handler(event);
}
#else
#define uses_default_overflow_handler(event) \
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b704d83a28b2..54f6372d2634 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9515,6 +9515,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
return true;
}

+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs);
+#endif
+
/*
* Generic event overflow handling, sampling.
*/
@@ -9584,7 +9590,10 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending_irq);
}

- READ_ONCE(event->overflow_handler)(event, data, regs);
+#ifdef CONFIG_BPF_SYSCALL
+ if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
+#endif
+ READ_ONCE(event->overflow_handler)(event, data, regs);

if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
@@ -10394,9 +10403,9 @@ static void perf_event_free_filter(struct perf_event *event)
}

#ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+static int bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
{
struct bpf_perf_event_data_kern ctx = {
.data = data,
@@ -10417,10 +10426,8 @@ static void bpf_overflow_handler(struct perf_event *event,
rcu_read_unlock();
out:
__this_cpu_dec(bpf_prog_active);
- if (!ret)
- return;

- event->orig_overflow_handler(event, data, regs);
+ return ret;
}

static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -10456,8 +10463,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,

event->prog = prog;
event->bpf_cookie = bpf_cookie;
- event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
- WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
return 0;
}

@@ -10468,7 +10473,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
if (!prog)
return;

- WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
event->prog = NULL;
bpf_prog_put(prog);
}
@@ -11928,13 +11932,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
overflow_handler = parent_event->overflow_handler;
context = parent_event->overflow_handler_context;
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
- if (overflow_handler == bpf_overflow_handler) {
+ if (parent_event->prog) {
struct bpf_prog *prog = parent_event->prog;

bpf_prog_inc(prog);
event->prog = prog;
- event->orig_overflow_handler =
- parent_event->orig_overflow_handler;
}
#endif
}
--
2.34.1

2023-12-11 04:56:20

by Kyle Huey

Subject: [PATCH v3 2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.

Now that struct perf_event's orig_overflow_handler is gone, there's no need
for the functions and macros to support looking past overflow_handler to
orig_overflow_handler.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey <[email protected]>
---
arch/arm/kernel/hw_breakpoint.c | 8 ++++----
arch/arm64/kernel/hw_breakpoint.c | 4 ++--
include/linux/perf_event.h | 16 ++--------------
3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index dc0fb7a81371..054e9199f30d 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -626,7 +626,7 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
hw->address &= ~alignment_mask;
hw->ctrl.len <<= offset;

- if (uses_default_overflow_handler(bp)) {
+ if (is_default_overflow_handler(bp)) {
/*
* Mismatch breakpoints are required for single-stepping
* breakpoints.
@@ -798,7 +798,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
* Otherwise, insert a temporary mismatch breakpoint so that
* we can single-step over the watchpoint trigger.
*/
- if (!uses_default_overflow_handler(wp))
+ if (!is_default_overflow_handler(wp))
continue;
step:
enable_single_step(wp, instruction_pointer(regs));
@@ -811,7 +811,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
info->trigger = addr;
pr_debug("watchpoint fired: address = 0x%x\n", info->trigger);
perf_bp_event(wp, regs);
- if (uses_default_overflow_handler(wp))
+ if (is_default_overflow_handler(wp))
enable_single_step(wp, instruction_pointer(regs));
}

@@ -886,7 +886,7 @@ static void breakpoint_handler(unsigned long unknown, struct pt_regs *regs)
info->trigger = addr;
pr_debug("breakpoint fired: address = 0x%x\n", addr);
perf_bp_event(bp, regs);
- if (uses_default_overflow_handler(bp))
+ if (is_default_overflow_handler(bp))
enable_single_step(bp, addr);
goto unlock;
}
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 35225632d70a..db2a1861bb97 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -654,7 +654,7 @@ static int breakpoint_handler(unsigned long unused, unsigned long esr,
perf_bp_event(bp, regs);

/* Do we need to handle the stepping? */
- if (uses_default_overflow_handler(bp))
+ if (is_default_overflow_handler(bp))
step = 1;
unlock:
rcu_read_unlock();
@@ -733,7 +733,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
static int watchpoint_report(struct perf_event *wp, unsigned long addr,
struct pt_regs *regs)
{
- int step = uses_default_overflow_handler(wp);
+ int step = is_default_overflow_handler(wp);
struct arch_hw_breakpoint *info = counter_arch_bp(wp);

info->trigger = addr;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 312b9f31442c..7fef6299151b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1321,8 +1321,9 @@ extern int perf_event_output(struct perf_event *event,
struct pt_regs *regs);

static inline bool
-__is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
+is_default_overflow_handler(struct perf_event *event)
{
+ perf_overflow_handler_t overflow_handler = event->overflow_handler;
if (likely(overflow_handler == perf_event_output_forward))
return true;
if (unlikely(overflow_handler == perf_event_output_backward))
@@ -1330,19 +1331,6 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
return false;
}

-#define is_default_overflow_handler(event) \
- __is_default_overflow_handler((event)->overflow_handler)
-
-#ifdef CONFIG_BPF_SYSCALL
-static inline bool uses_default_overflow_handler(struct perf_event *event)
-{
- return is_default_overflow_handler(event);
-}
-#else
-#define uses_default_overflow_handler(event) \
- is_default_overflow_handler(event)
-#endif
-
extern void
perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
--
2.34.1

2023-12-11 04:56:35

by Kyle Huey

Subject: [PATCH v3 3/4] perf/bpf: Allow a bpf program to suppress all sample side effects

Returning zero from a bpf program attached to a perf event already
suppresses any data output. Return early from __perf_event_overflow() in
this case so it will also suppress event_limit accounting, SIGTRAP
generation, and F_ASYNC signalling.

Signed-off-by: Kyle Huey <[email protected]>
---
kernel/events/core.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 54f6372d2634..d6093fe893c8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9541,6 +9541,11 @@ static int __perf_event_overflow(struct perf_event *event,

ret = __perf_event_account_interrupt(event, throttle);

+#ifdef CONFIG_BPF_SYSCALL
+ if (event->prog && !bpf_overflow_handler(event, data, regs))
+ return ret;
+#endif
+
/*
* XXX event_limit might not quite work as expected on inherited
* events
@@ -9590,10 +9595,7 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending_irq);
}

-#ifdef CONFIG_BPF_SYSCALL
- if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
-#endif
- READ_ONCE(event->overflow_handler)(event, data, regs);
+ READ_ONCE(event->overflow_handler)(event, data, regs);

if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
--
2.34.1

2023-12-11 04:57:19

by Kyle Huey

Subject: [PATCH v3 4/4] selftest/bpf: Test a perf bpf program that suppresses side effects.

The test sets a hardware breakpoint and uses a bpf program to suppress the
side effects of a perf event sample, including I/O availability signals,
SIGTRAPs, and decrementing the event counter limit, if the ip matches the
expected value. Then the function with the breakpoint is executed multiple
times to test that all effects behave as expected.

Signed-off-by: Kyle Huey <[email protected]>
---
.../selftests/bpf/prog_tests/perf_skip.c | 140 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_skip.c | 15 ++
2 files changed, 155 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
new file mode 100644
index 000000000000..0200736a8baf
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <test_progs.h>
+#include "test_perf_skip.skel.h"
+#include <linux/compiler.h>
+#include <linux/hw_breakpoint.h>
+#include <sys/mman.h>
+
+#ifndef TRAP_PERF
+#define TRAP_PERF 6
+#endif
+
+int signals_unexpected = 1;
+int sigio_count, sigtrap_count;
+
+static void handle_sigio(int sig __always_unused)
+{
+ ASSERT_OK(signals_unexpected, "perf event not skipped");
+ ++sigio_count;
+}
+
+static void handle_sigtrap(int signum __always_unused,
+ siginfo_t *info,
+ void *ucontext __always_unused)
+{
+ ASSERT_OK(signals_unexpected, "perf event not skipped");
+ ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code");
+ ++sigtrap_count;
+}
+
+static noinline int test_function(void)
+{
+ asm volatile ("");
+ return 0;
+}
+
+void serial_test_perf_skip(void)
+{
+ struct sigaction action = {};
+ struct sigaction previous_sigtrap;
+ sighandler_t previous_sigio;
+ struct test_perf_skip *skel = NULL;
+ struct perf_event_attr attr = {};
+ int perf_fd = -1;
+ int err;
+ struct f_owner_ex owner;
+ struct bpf_link *prog_link = NULL;
+
+ action.sa_flags = SA_SIGINFO | SA_NODEFER;
+ action.sa_sigaction = handle_sigtrap;
+ sigemptyset(&action.sa_mask);
+ if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
+ return;
+
+ previous_sigio = signal(SIGIO, handle_sigio);
+
+ skel = test_perf_skip__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ attr.type = PERF_TYPE_BREAKPOINT;
+ attr.size = sizeof(attr);
+ attr.bp_type = HW_BREAKPOINT_X;
+ attr.bp_addr = (uintptr_t)test_function;
+ attr.bp_len = sizeof(long);
+ attr.sample_period = 1;
+ attr.sample_type = PERF_SAMPLE_IP;
+ attr.pinned = 1;
+ attr.exclude_kernel = 1;
+ attr.exclude_hv = 1;
+ attr.precise_ip = 3;
+ attr.sigtrap = 1;
+ attr.remove_on_exec = 1;
+
+ perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+ if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
+ printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
+ test__skip();
+ goto cleanup;
+ }
+ if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
+ goto cleanup;
+
+ /* Configure the perf event to signal on sample. */
+ err = fcntl(perf_fd, F_SETFL, O_ASYNC);
+ if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
+ goto cleanup;
+
+ owner.type = F_OWNER_TID;
+ owner.pid = syscall(__NR_gettid);
+ err = fcntl(perf_fd, F_SETOWN_EX, &owner);
+ if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
+ goto cleanup;
+
+ /*
+ * Allow at most one sample. A sample rejected by bpf should
+ * not count against this.
+ */
+ err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
+ if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
+ goto cleanup;
+
+ prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
+ if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
+ goto cleanup;
+
+ /* Configure the bpf program to suppress the sample. */
+ skel->bss->ip = (uintptr_t)test_function;
+ test_function();
+
+ ASSERT_EQ(sigio_count, 0, "sigio_count");
+ ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
+
+ /* Configure the bpf program to allow the sample. */
+ skel->bss->ip = 0;
+ signals_unexpected = 0;
+ test_function();
+
+ ASSERT_EQ(sigio_count, 1, "sigio_count");
+ ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+ /*
+ * Test that the sample above is the only one allowed (by perf, not
+ * by bpf)
+ */
+ test_function();
+
+ ASSERT_EQ(sigio_count, 1, "sigio_count");
+ ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+cleanup:
+ bpf_link__destroy(prog_link);
+ if (perf_fd >= 0)
+ close(perf_fd);
+ test_perf_skip__destroy(skel);
+
+ signal(SIGIO, previous_sigio);
+ sigaction(SIGTRAP, &previous_sigtrap, NULL);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
new file mode 100644
index 000000000000..7eb8b6de7a57
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+uintptr_t ip;
+
+SEC("perf_event")
+int handler(struct bpf_perf_event_data *data)
+{
+ /* Skip events that have the correct ip. */
+ return ip != PT_REGS_IP(&data->regs);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1

2023-12-11 14:22:36

by Marco Elver

Subject: Re: [PATCH v3 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Mon, 11 Dec 2023 at 05:55, Kyle Huey <[email protected]> wrote:
>
> To ultimately allow bpf programs attached to perf events to completely
> suppress all of the effects of a perf event overflow (rather than just the
> sample output, as they do today), call bpf_overflow_handler() from
> __perf_event_overflow() directly rather than modifying struct perf_event's
> overflow_handler. Return the bpf program's return value from
> bpf_overflow_handler() so that __perf_event_overflow() knows how to
> proceed. Remove the now unnecessary orig_overflow_handler from struct
> perf_event.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Suggested-by: Namhyung Kim <[email protected]>
> ---
> include/linux/perf_event.h | 6 +-----
> kernel/events/core.c | 28 +++++++++++++++-------------
> 2 files changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 5547ba68e6e4..312b9f31442c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -810,7 +810,6 @@ struct perf_event {
> perf_overflow_handler_t overflow_handler;
> void *overflow_handler_context;
> #ifdef CONFIG_BPF_SYSCALL
> - perf_overflow_handler_t orig_overflow_handler;
> struct bpf_prog *prog;
> u64 bpf_cookie;
> #endif
> @@ -1337,10 +1336,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
> #ifdef CONFIG_BPF_SYSCALL
> static inline bool uses_default_overflow_handler(struct perf_event *event)
> {
> - if (likely(is_default_overflow_handler(event)))
> - return true;
> -
> - return __is_default_overflow_handler(event->orig_overflow_handler);
> + return is_default_overflow_handler(event);
> }
> #else
> #define uses_default_overflow_handler(event) \
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b704d83a28b2..54f6372d2634 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9515,6 +9515,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
> return true;
> }
>
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs);
> +#endif

To avoid more #ifdefs we usually add a stub, something like:

#ifdef ...
static int bpf_overflow_handler(...);
#else
static inline int bpf_overflow_handler(...) { return 0; }
#endif

Then you can avoid more #ifdefs below, esp. when it surrounds an
if-statement it easily leads to confusion or subtle bugs in future
changes. The compiler will optimize out the constants and the
generated code will be the same.

> /*
> * Generic event overflow handling, sampling.
> */
> @@ -9584,7 +9590,10 @@ static int __perf_event_overflow(struct perf_event *event,
> irq_work_queue(&event->pending_irq);
> }
>
> - READ_ONCE(event->overflow_handler)(event, data, regs);
> +#ifdef CONFIG_BPF_SYSCALL
> + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> +#endif
> + READ_ONCE(event->overflow_handler)(event, data, regs);
>
> if (*perf_event_fasync(event) && event->pending_kill) {
> event->pending_wakeup = 1;
> @@ -10394,9 +10403,9 @@ static void perf_event_free_filter(struct perf_event *event)
> }
>
> #ifdef CONFIG_BPF_SYSCALL
> -static void bpf_overflow_handler(struct perf_event *event,
> - struct perf_sample_data *data,
> - struct pt_regs *regs)
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs)
> {
> struct bpf_perf_event_data_kern ctx = {
> .data = data,
> @@ -10417,10 +10426,8 @@ static void bpf_overflow_handler(struct perf_event *event,
> rcu_read_unlock();
> out:
> __this_cpu_dec(bpf_prog_active);
> - if (!ret)
> - return;
>
> - event->orig_overflow_handler(event, data, regs);
> + return ret;
> }
>
> static int perf_event_set_bpf_handler(struct perf_event *event,
> @@ -10456,8 +10463,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
>
> event->prog = prog;
> event->bpf_cookie = bpf_cookie;
> - event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> - WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
> return 0;
> }
>
> @@ -10468,7 +10473,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
> if (!prog)
> return;
>
> - WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
> event->prog = NULL;
> bpf_prog_put(prog);
> }
> @@ -11928,13 +11932,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> overflow_handler = parent_event->overflow_handler;
> context = parent_event->overflow_handler_context;
> #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
> - if (overflow_handler == bpf_overflow_handler) {
> + if (parent_event->prog) {
> struct bpf_prog *prog = parent_event->prog;
>
> bpf_prog_inc(prog);
> event->prog = prog;
> - event->orig_overflow_handler =
> - parent_event->orig_overflow_handler;
> }
> #endif
> }
> --
> 2.34.1
>

2023-12-11 15:21:46

by Kyle Huey

Subject: Re: [PATCH v3 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Mon, Dec 11, 2023 at 6:20 AM Marco Elver <[email protected]> wrote:
>
> On Mon, 11 Dec 2023 at 05:55, Kyle Huey <[email protected]> wrote:
> >
> > To ultimately allow bpf programs attached to perf events to completely
> > suppress all of the effects of a perf event overflow (rather than just the
> > sample output, as they do today), call bpf_overflow_handler() from
> > __perf_event_overflow() directly rather than modifying struct perf_event's
> > overflow_handler. Return the bpf program's return value from
> > bpf_overflow_handler() so that __perf_event_overflow() knows how to
> > proceed. Remove the now unnecessary orig_overflow_handler from struct
> > perf_event.
> >
> > This patch is solely a refactoring and results in no behavior change.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > Suggested-by: Namhyung Kim <[email protected]>
> > ---
> > include/linux/perf_event.h | 6 +-----
> > kernel/events/core.c | 28 +++++++++++++++-------------
> > 2 files changed, 16 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 5547ba68e6e4..312b9f31442c 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -810,7 +810,6 @@ struct perf_event {
> > perf_overflow_handler_t overflow_handler;
> > void *overflow_handler_context;
> > #ifdef CONFIG_BPF_SYSCALL
> > - perf_overflow_handler_t orig_overflow_handler;
> > struct bpf_prog *prog;
> > u64 bpf_cookie;
> > #endif
> > @@ -1337,10 +1336,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
> > #ifdef CONFIG_BPF_SYSCALL
> > static inline bool uses_default_overflow_handler(struct perf_event *event)
> > {
> > - if (likely(is_default_overflow_handler(event)))
> > - return true;
> > -
> > - return __is_default_overflow_handler(event->orig_overflow_handler);
> > + return is_default_overflow_handler(event);
> > }
> > #else
> > #define uses_default_overflow_handler(event) \
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index b704d83a28b2..54f6372d2634 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9515,6 +9515,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
> > return true;
> > }
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int bpf_overflow_handler(struct perf_event *event,
> > + struct perf_sample_data *data,
> > + struct pt_regs *regs);
> > +#endif
>
> To avoid more #ifdefs we usually add a stub, something like:
>
> #ifdef ...
> static int bpf_overflow_handler(...);
> #else
> static inline int bpf_overflow_handler(...) { return 0; }
> #endif
>
> Then you can avoid more #ifdefs below, esp. when it surrounds an
> if-statement it easily leads to confusion or subtle bugs in future
> changes. The compiler will optimize out the constants and the
> generated code will be the same.

This would not allow removing any #ifdefs because event->prog is only
present if CONFIG_BPF_SYSCALL is defined.

- Kyle

> > /*
> > * Generic event overflow handling, sampling.
> > */
> > @@ -9584,7 +9590,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > irq_work_queue(&event->pending_irq);
> > }
> >
> > - READ_ONCE(event->overflow_handler)(event, data, regs);
> > +#ifdef CONFIG_BPF_SYSCALL
> > + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> > +#endif
> > + READ_ONCE(event->overflow_handler)(event, data, regs);
> >
> > if (*perf_event_fasync(event) && event->pending_kill) {
> > event->pending_wakeup = 1;
> > @@ -10394,9 +10403,9 @@ static void perf_event_free_filter(struct perf_event *event)
> > }
> >
> > #ifdef CONFIG_BPF_SYSCALL
> > -static void bpf_overflow_handler(struct perf_event *event,
> > - struct perf_sample_data *data,
> > - struct pt_regs *regs)
> > +static int bpf_overflow_handler(struct perf_event *event,
> > + struct perf_sample_data *data,
> > + struct pt_regs *regs)
> > {
> > struct bpf_perf_event_data_kern ctx = {
> > .data = data,
> > @@ -10417,10 +10426,8 @@ static void bpf_overflow_handler(struct perf_event *event,
> > rcu_read_unlock();
> > out:
> > __this_cpu_dec(bpf_prog_active);
> > - if (!ret)
> > - return;
> >
> > - event->orig_overflow_handler(event, data, regs);
> > + return ret;
> > }
> >
> > static int perf_event_set_bpf_handler(struct perf_event *event,
> > @@ -10456,8 +10463,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
> >
> > event->prog = prog;
> > event->bpf_cookie = bpf_cookie;
> > - event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> > - WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
> > return 0;
> > }
> >
> > @@ -10468,7 +10473,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
> > if (!prog)
> > return;
> >
> > - WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
> > event->prog = NULL;
> > bpf_prog_put(prog);
> > }
> > @@ -11928,13 +11932,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> > overflow_handler = parent_event->overflow_handler;
> > context = parent_event->overflow_handler_context;
> > #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
> > - if (overflow_handler == bpf_overflow_handler) {
> > + if (parent_event->prog) {
> > struct bpf_prog *prog = parent_event->prog;
> >
> > bpf_prog_inc(prog);
> > event->prog = prog;
> > - event->orig_overflow_handler =
> > - parent_event->orig_overflow_handler;
> > }
> > #endif
> > }
> > --
> > 2.34.1
> >

2023-12-12 09:22:56

by Will Deacon

Subject: Re: [PATCH v3 2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.

On Sun, Dec 10, 2023 at 08:55:41PM -0800, Kyle Huey wrote:
> Now that struct perf_event's orig_overflow_handler is gone, there's no need
> for the functions and macros to support looking past overflow_handler to
> orig_overflow_handler.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> ---
> arch/arm/kernel/hw_breakpoint.c | 8 ++++----
> arch/arm64/kernel/hw_breakpoint.c | 4 ++--
> include/linux/perf_event.h | 16 ++--------------
> 3 files changed, 8 insertions(+), 20 deletions(-)

Acked-by: Will Deacon <[email protected]>

Will

2023-12-13 01:39:05

by Namhyung Kim

Subject: Re: [PATCH v3 0/4] Combine perf and bpf for fast eval of hw breakpoint conditions

Hello,

On Sun, Dec 10, 2023 at 8:55 PM Kyle Huey <[email protected]> wrote:
>
> rr, a userspace record and replay debugger[0], replays asynchronous events
> such as signals and context switches by essentially[1] setting a breakpoint
> at the address where the asynchronous event was delivered during recording
> with a condition that the program state matches the state when the event
> was delivered.
>
> Currently, rr uses software breakpoints that trap (via ptrace) to the
> supervisor, and evaluates the condition from the supervisor. If the
> asynchronous event is delivered in a tight loop (thus requiring the
> breakpoint condition to be repeatedly evaluated) the overhead can be
> immense. A patch to rr that uses hardware breakpoints via perf events with
> an attached BPF program to reject breakpoint hits where the condition is
> not satisfied reduces rr's replay overhead by 94% on a pathological (but a
> real customer-provided, not contrived) rr trace.
>
> The only obstacle to this approach is that while the kernel allows a BPF
> program to suppress sample output when a perf event overflows it does not
> suppress signalling the perf event fd or sending the perf event's SIGTRAP.
> This patch set redesigns __perf_event_overflow() and
> bpf_overflow_handler() so that the former invokes the latter directly when
> appropriate rather than through the generic overflow handler machinery,
> passes the return code of the BPF program back to __perf_event_overflow()
> to allow it to decide whether to execute the regular overflow handler,
> reorders bpf_overflow_handler() and the side effects of perf event
> overflow, changes __perf_event_overflow() to suppress those side effects
> if the BPF program returns zero, and adds a selftest.
>
> The previous version of this patchset can be found at
> https://lore.kernel.org/linux-kernel/[email protected]/
>
> Changes since v2:
>
> Patches 1 and 2 were added from a suggestion by Namhyung Kim to refactor
> this code to implement this feature in a cleaner way. Patch 2 is separated
> for the benefit of the ARM arch maintainers.
>
> Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
> implementation thanks to the earlier refactoring.
>
> Patch 4 is v2's patch 3, and addresses review comments about C++ style
> comments, getting a TRAP_PERF definition into the test, and unnecessary
> NULL checks.

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung

>
> [0] https://rr-project.org/
> [1] Various optimizations exist to skip as much execution as possible
> before setting a breakpoint, and to determine a set of program state that
> is practical to check and verify.
>
>

2024-01-02 22:49:41

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] selftest/bpf: Test a perf bpf program that suppresses side effects.

On Sun, Dec 10, 2023 at 8:56 PM Kyle Huey <[email protected]> wrote:
>
> The test sets a hardware breakpoint and uses a bpf program to suppress the
> side effects of a perf event sample, including I/O availability signals,
> SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> expected value. Then the function with the breakpoint is executed multiple
> times to test that all effects behave as expected.
>
> Signed-off-by: Kyle Huey <[email protected]>
> ---
> .../selftests/bpf/prog_tests/perf_skip.c | 140 ++++++++++++++++++
> .../selftests/bpf/progs/test_perf_skip.c | 15 ++
> 2 files changed, 155 insertions(+)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> new file mode 100644
> index 000000000000..0200736a8baf
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> @@ -0,0 +1,140 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE
> +
> +#include <test_progs.h>
> +#include "test_perf_skip.skel.h"
> +#include <linux/compiler.h>
> +#include <linux/hw_breakpoint.h>
> +#include <sys/mman.h>
> +
> +#ifndef TRAP_PERF
> +#define TRAP_PERF 6
> +#endif
> +
> +int signals_unexpected = 1;
> +int sigio_count, sigtrap_count;
> +
> +static void handle_sigio(int sig __always_unused)
> +{
> + ASSERT_OK(signals_unexpected, "perf event not skipped");

ASSERT_OK is a little confusing. Maybe do something like:

static int signals_expected;

static void handle_sigio(int sig __always_unused)
{
	ASSERT_EQ(signals_expected, 1, "expected sig_io");
}

serial_test_perf_skip()
{
	...
	signals_expected = 1;
}

> + ++sigio_count;
> +}
> +
> +static void handle_sigtrap(int signum __always_unused,
> + siginfo_t *info,
> + void *ucontext __always_unused)
> +{
> + ASSERT_OK(signals_unexpected, "perf event not skipped");
ditto

> + ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code");
> + ++sigtrap_count;
> +}
> +
> +static noinline int test_function(void)
> +{
> + asm volatile ("");
> + return 0;
> +}
> +
> +void serial_test_perf_skip(void)
> +{
> + struct sigaction action = {};
> + struct sigaction previous_sigtrap;
> + sighandler_t previous_sigio;
> + struct test_perf_skip *skel = NULL;
> + struct perf_event_attr attr = {};
> + int perf_fd = -1;
> + int err;
> + struct f_owner_ex owner;
> + struct bpf_link *prog_link = NULL;
> +
> + action.sa_flags = SA_SIGINFO | SA_NODEFER;
> + action.sa_sigaction = handle_sigtrap;
> + sigemptyset(&action.sa_mask);
> + if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
> + return;
> +
> + previous_sigio = signal(SIGIO, handle_sigio);

handle signal() errors here?

> +
> + skel = test_perf_skip__open_and_load();
> + if (!ASSERT_OK_PTR(skel, "skel_load"))
> + goto cleanup;
> +
> + attr.type = PERF_TYPE_BREAKPOINT;
> + attr.size = sizeof(attr);
> + attr.bp_type = HW_BREAKPOINT_X;
> + attr.bp_addr = (uintptr_t)test_function;
> + attr.bp_len = sizeof(long);
> + attr.sample_period = 1;
> + attr.sample_type = PERF_SAMPLE_IP;
> + attr.pinned = 1;
> + attr.exclude_kernel = 1;
> + attr.exclude_hv = 1;
> + attr.precise_ip = 3;
> + attr.sigtrap = 1;
> + attr.remove_on_exec = 1;
> +
> + perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> + if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
> + printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
> + test__skip();
> + goto cleanup;
> + }
> + if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
> + goto cleanup;
> +
> + /* Configure the perf event to signal on sample. */
> + err = fcntl(perf_fd, F_SETFL, O_ASYNC);
> + if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> + goto cleanup;
> +
> + owner.type = F_OWNER_TID;
> + owner.pid = syscall(__NR_gettid);
> + err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> + if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> + goto cleanup;
> +
> + /*
> + * Allow at most one sample. A sample rejected by bpf should
> + * not count against this.
> + */

Multi-line comment style should be like

/* Allow at most one sample. A sample rejected by bpf should
* not count against this.
*/

> + err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
> + if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
> + goto cleanup;
> +
> + prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
> + if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
> + goto cleanup;
> +
> + /* Configure the bpf program to suppress the sample. */
> + skel->bss->ip = (uintptr_t)test_function;
> + test_function();
> +
> + ASSERT_EQ(sigio_count, 0, "sigio_count");
> + ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
> +
> + /* Configure the bpf program to allow the sample. */
> + skel->bss->ip = 0;
> + signals_unexpected = 0;
> + test_function();
> +
> + ASSERT_EQ(sigio_count, 1, "sigio_count");
> + ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
> +
> + /*
> + * Test that the sample above is the only one allowed (by perf, not
> + * by bpf)
> + */

ditto.

> + test_function();
> +
> + ASSERT_EQ(sigio_count, 1, "sigio_count");
> + ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
> +
> +cleanup:
> + bpf_link__destroy(prog_link);
> + if (perf_fd >= 0)
> + close(perf_fd);
> + test_perf_skip__destroy(skel);
> +
> + signal(SIGIO, previous_sigio);
> + sigaction(SIGTRAP, &previous_sigtrap, NULL);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> new file mode 100644
> index 000000000000..7eb8b6de7a57
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +uintptr_t ip;
> +
> +SEC("perf_event")
> +int handler(struct bpf_perf_event_data *data)
> +{
> + /* Skip events that have the correct ip. */
> + return ip != PT_REGS_IP(&data->regs);
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.34.1
>

2024-01-02 22:56:27

by Song Liu

Subject: Re: [PATCH v3 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Sun, Dec 10, 2023 at 8:55 PM Kyle Huey <[email protected]> wrote:
>
> To ultimately allow bpf programs attached to perf events to completely
> suppress all of the effects of a perf event overflow (rather than just the
> sample output, as they do today), call bpf_overflow_handler() from
> __perf_event_overflow() directly rather than modifying struct perf_event's
> overflow_handler. Return the bpf program's return value from
> bpf_overflow_handler() so that __perf_event_overflow() knows how to
> proceed. Remove the now unnecessary orig_overflow_handler from struct
> perf_event.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Suggested-by: Namhyung Kim <[email protected]>

Acked-by: Song Liu <[email protected]>

2024-01-02 22:56:52

by Song Liu

Subject: Re: [PATCH v3 2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.

On Tue, Dec 12, 2023 at 1:22 AM Will Deacon <[email protected]> wrote:
>
> On Sun, Dec 10, 2023 at 08:55:41PM -0800, Kyle Huey wrote:
> > Now that struct perf_event's orig_overflow_handler is gone, there's no need
> > for the functions and macros to support looking past overflow_handler to
> > orig_overflow_handler.
> >
> > This patch is solely a refactoring and results in no behavior change.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > ---
> > arch/arm/kernel/hw_breakpoint.c | 8 ++++----
> > arch/arm64/kernel/hw_breakpoint.c | 4 ++--
> > include/linux/perf_event.h | 16 ++--------------
> > 3 files changed, 8 insertions(+), 20 deletions(-)
>
> Acked-by: Will Deacon <[email protected]>

Acked-by: Song Liu <[email protected]>

2024-01-02 23:05:58

by Song Liu

Subject: Re: [PATCH v3 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Sun, Dec 10, 2023 at 8:55 PM Kyle Huey <[email protected]> wrote:
>
> To ultimately allow bpf programs attached to perf events to completely
> suppress all of the effects of a perf event overflow (rather than just the
> sample output, as they do today), call bpf_overflow_handler() from
> __perf_event_overflow() directly rather than modifying struct perf_event's
> overflow_handler. Return the bpf program's return value from
> bpf_overflow_handler() so that __perf_event_overflow() knows how to
> proceed. Remove the now unnecessary orig_overflow_handler from struct
> perf_event.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Suggested-by: Namhyung Kim <[email protected]>
> ---
> include/linux/perf_event.h | 6 +-----
> kernel/events/core.c | 28 +++++++++++++++-------------
> 2 files changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 5547ba68e6e4..312b9f31442c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -810,7 +810,6 @@ struct perf_event {
> perf_overflow_handler_t overflow_handler;
> void *overflow_handler_context;
> #ifdef CONFIG_BPF_SYSCALL
> - perf_overflow_handler_t orig_overflow_handler;
> struct bpf_prog *prog;
> u64 bpf_cookie;
> #endif
> @@ -1337,10 +1336,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
> #ifdef CONFIG_BPF_SYSCALL
> static inline bool uses_default_overflow_handler(struct perf_event *event)
> {
> - if (likely(is_default_overflow_handler(event)))
> - return true;
> -
> - return __is_default_overflow_handler(event->orig_overflow_handler);
> + return is_default_overflow_handler(event);
> }
> #else
> #define uses_default_overflow_handler(event) \
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index b704d83a28b2..54f6372d2634 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9515,6 +9515,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
> return true;
> }
>
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs);
> +#endif
> +
> /*
> * Generic event overflow handling, sampling.
> */
> @@ -9584,7 +9590,10 @@ static int __perf_event_overflow(struct perf_event *event,
> irq_work_queue(&event->pending_irq);
> }
>
> - READ_ONCE(event->overflow_handler)(event, data, regs);
> +#ifdef CONFIG_BPF_SYSCALL
> + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))

This condition is hard to follow. Please consider simplifying it.

Thanks,
Song

> +#endif
> + READ_ONCE(event->overflow_handler)(event, data, regs);
>
> if (*perf_event_fasync(event) && event->pending_kill) {
> event->pending_wakeup = 1;
> @@ -10394,9 +10403,9 @@ static void perf_event_free_filter(struct perf_event *event)
> }
>
> #ifdef CONFIG_BPF_SYSCALL
> -static void bpf_overflow_handler(struct perf_event *event,
> - struct perf_sample_data *data,
> - struct pt_regs *regs)
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs)
> {
> struct bpf_perf_event_data_kern ctx = {
> .data = data,
> @@ -10417,10 +10426,8 @@ static void bpf_overflow_handler(struct perf_event *event,
> rcu_read_unlock();
> out:
> __this_cpu_dec(bpf_prog_active);
> - if (!ret)
> - return;
>
> - event->orig_overflow_handler(event, data, regs);
> + return ret;
> }
>
> static int perf_event_set_bpf_handler(struct perf_event *event,
> @@ -10456,8 +10463,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
>
> event->prog = prog;
> event->bpf_cookie = bpf_cookie;
> - event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> - WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
> return 0;
> }
>
> @@ -10468,7 +10473,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
> if (!prog)
> return;
>
> - WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
> event->prog = NULL;
> bpf_prog_put(prog);
> }
> @@ -11928,13 +11932,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> overflow_handler = parent_event->overflow_handler;
> context = parent_event->overflow_handler_context;
> #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
> - if (overflow_handler == bpf_overflow_handler) {
> + if (parent_event->prog) {
> struct bpf_prog *prog = parent_event->prog;
>
> bpf_prog_inc(prog);
> event->prog = prog;
> - event->orig_overflow_handler =
> - parent_event->orig_overflow_handler;
> }
> #endif
> }
> --
> 2.34.1
>
>

2024-01-02 23:10:40

by Song Liu

Subject: Re: [PATCH v3 3/4] perf/bpf: Allow a bpf program to suppress all sample side effects

On Sun, Dec 10, 2023 at 8:56 PM Kyle Huey <[email protected]> wrote:
>
> Returning zero from a bpf program attached to a perf event already
> suppresses any data output. Return early from __perf_event_overflow() in
> this case so it will also suppress event_limit accounting, SIGTRAP
> generation, and F_ASYNC signalling.
>
> Signed-off-by: Kyle Huey <[email protected]>

Acked-by: Song Liu <[email protected]>

> ---
> kernel/events/core.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 54f6372d2634..d6093fe893c8 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9541,6 +9541,11 @@ static int __perf_event_overflow(struct perf_event *event,
>
> ret = __perf_event_account_interrupt(event, throttle);
>
> +#ifdef CONFIG_BPF_SYSCALL
> + if (event->prog && !bpf_overflow_handler(event, data, regs))
> + return ret;
> +#endif
> +
> /*
> * XXX event_limit might not quite work as expected on inherited
> * events
> @@ -9590,10 +9595,7 @@ static int __perf_event_overflow(struct perf_event *event,
> irq_work_queue(&event->pending_irq);
> }
>
> -#ifdef CONFIG_BPF_SYSCALL
> - if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> -#endif
> - READ_ONCE(event->overflow_handler)(event, data, regs);
> + READ_ONCE(event->overflow_handler)(event, data, regs);
>
> if (*perf_event_fasync(event) && event->pending_kill) {
> event->pending_wakeup = 1;
> --
> 2.34.1
>
>

2024-01-19 00:07:57

by Kyle Huey

Subject: Re: [PATCH v3 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Tue, Jan 2, 2024 at 3:05 PM Song Liu <[email protected]> wrote:
>
> On Sun, Dec 10, 2023 at 8:55 PM Kyle Huey <[email protected]> wrote:
> >
> > To ultimately allow bpf programs attached to perf events to completely
> > suppress all of the effects of a perf event overflow (rather than just the
> > sample output, as they do today), call bpf_overflow_handler() from
> > __perf_event_overflow() directly rather than modifying struct perf_event's
> > overflow_handler. Return the bpf program's return value from
> > bpf_overflow_handler() so that __perf_event_overflow() knows how to
> > proceed. Remove the now unnecessary orig_overflow_handler from struct
> > perf_event.
> >
> > This patch is solely a refactoring and results in no behavior change.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > Suggested-by: Namhyung Kim <[email protected]>
> > ---
> > include/linux/perf_event.h | 6 +-----
> > kernel/events/core.c | 28 +++++++++++++++-------------
> > 2 files changed, 16 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index 5547ba68e6e4..312b9f31442c 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -810,7 +810,6 @@ struct perf_event {
> > perf_overflow_handler_t overflow_handler;
> > void *overflow_handler_context;
> > #ifdef CONFIG_BPF_SYSCALL
> > - perf_overflow_handler_t orig_overflow_handler;
> > struct bpf_prog *prog;
> > u64 bpf_cookie;
> > #endif
> > @@ -1337,10 +1336,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
> > #ifdef CONFIG_BPF_SYSCALL
> > static inline bool uses_default_overflow_handler(struct perf_event *event)
> > {
> > - if (likely(is_default_overflow_handler(event)))
> > - return true;
> > -
> > - return __is_default_overflow_handler(event->orig_overflow_handler);
> > + return is_default_overflow_handler(event);
> > }
> > #else
> > #define uses_default_overflow_handler(event) \
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index b704d83a28b2..54f6372d2634 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9515,6 +9515,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
> > return true;
> > }
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int bpf_overflow_handler(struct perf_event *event,
> > + struct perf_sample_data *data,
> > + struct pt_regs *regs);
> > +#endif
> > +
> > /*
> > * Generic event overflow handling, sampling.
> > */
> > @@ -9584,7 +9590,10 @@ static int __perf_event_overflow(struct perf_event *event,
> > irq_work_queue(&event->pending_irq);
> > }
> >
> > - READ_ONCE(event->overflow_handler)(event, data, regs);
> > +#ifdef CONFIG_BPF_SYSCALL
> > + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
>
> This condition is hard to follow. Please consider simplifying it.
>
> Thanks,
> Song

It gets simplified later in patch 3/4.

- Kyle

> > +#endif
> > + READ_ONCE(event->overflow_handler)(event, data, regs);
> >
> > if (*perf_event_fasync(event) && event->pending_kill) {
> > event->pending_wakeup = 1;
> > @@ -10394,9 +10403,9 @@ static void perf_event_free_filter(struct perf_event *event)
> > }
> >
> > #ifdef CONFIG_BPF_SYSCALL
> > -static void bpf_overflow_handler(struct perf_event *event,
> > - struct perf_sample_data *data,
> > - struct pt_regs *regs)
> > +static int bpf_overflow_handler(struct perf_event *event,
> > + struct perf_sample_data *data,
> > + struct pt_regs *regs)
> > {
> > struct bpf_perf_event_data_kern ctx = {
> > .data = data,
> > @@ -10417,10 +10426,8 @@ static void bpf_overflow_handler(struct perf_event *event,
> > rcu_read_unlock();
> > out:
> > __this_cpu_dec(bpf_prog_active);
> > - if (!ret)
> > - return;
> >
> > - event->orig_overflow_handler(event, data, regs);
> > + return ret;
> > }
> >
> > static int perf_event_set_bpf_handler(struct perf_event *event,
> > @@ -10456,8 +10463,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,
> >
> > event->prog = prog;
> > event->bpf_cookie = bpf_cookie;
> > - event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
> > - WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
> > return 0;
> > }
> >
> > @@ -10468,7 +10473,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
> > if (!prog)
> > return;
> >
> > - WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
> > event->prog = NULL;
> > bpf_prog_put(prog);
> > }
> > @@ -11928,13 +11932,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
> > overflow_handler = parent_event->overflow_handler;
> > context = parent_event->overflow_handler_context;
> > #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
> > - if (overflow_handler == bpf_overflow_handler) {
> > + if (parent_event->prog) {
> > struct bpf_prog *prog = parent_event->prog;
> >
> > bpf_prog_inc(prog);
> > event->prog = prog;
> > - event->orig_overflow_handler =
> > - parent_event->orig_overflow_handler;
> > }
> > #endif
> > }
> > --
> > 2.34.1
> >
> >

2024-01-19 00:08:59

by Kyle Huey

Subject: Re: [PATCH v3 4/4] selftest/bpf: Test a perf bpf program that suppresses side effects.

On Tue, Jan 2, 2024 at 2:49 PM Song Liu <[email protected]> wrote:
>
> On Sun, Dec 10, 2023 at 8:56 PM Kyle Huey <[email protected]> wrote:
> >
> > The test sets a hardware breakpoint and uses a bpf program to suppress the
> > side effects of a perf event sample, including I/O availability signals,
> > SIGTRAPs, and decrementing the event counter limit, if the ip matches the
> > expected value. Then the function with the breakpoint is executed multiple
> > times to test that all effects behave as expected.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > ---
> > .../selftests/bpf/prog_tests/perf_skip.c | 140 ++++++++++++++++++
> > .../selftests/bpf/progs/test_perf_skip.c | 15 ++
> > 2 files changed, 155 insertions(+)
> > create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c
> >
> > diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > new file mode 100644
> > index 000000000000..0200736a8baf
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
> > @@ -0,0 +1,140 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#define _GNU_SOURCE
> > +
> > +#include <test_progs.h>
> > +#include "test_perf_skip.skel.h"
> > +#include <linux/compiler.h>
> > +#include <linux/hw_breakpoint.h>
> > +#include <sys/mman.h>
> > +
> > +#ifndef TRAP_PERF
> > +#define TRAP_PERF 6
> > +#endif
> > +
> > +int signals_unexpected = 1;
> > +int sigio_count, sigtrap_count;
> > +
> > +static void handle_sigio(int sig __always_unused)
> > +{
> > + ASSERT_OK(signals_unexpected, "perf event not skipped");
>
> ASSERT_OK is a little confusing. Maybe do something like:
>
> static int signals_expected;
> static void handle_sigio(int sig __always_unused)
> {
> ASSERT_EQ(signals_expected, 1, "expected sig_io");
> }
> serial_test_perf_skip()
> {
> ...
> signals_expected = 1;
> }
>

I'll just drop signals_expected. Now that I'm counting the exact
number of signals it's redundant.

> > + ++sigio_count;
> > +}
> > +
> > +static void handle_sigtrap(int signum __always_unused,
> > + siginfo_t *info,
> > + void *ucontext __always_unused)
> > +{
> > + ASSERT_OK(signals_unexpected, "perf event not skipped");
> ditto
>
> > + ASSERT_EQ(info->si_code, TRAP_PERF, "wrong si_code");
> > + ++sigtrap_count;
> > +}
> > +
> > +static noinline int test_function(void)
> > +{
> > + asm volatile ("");
> > + return 0;
> > +}
> > +
> > +void serial_test_perf_skip(void)
> > +{
> > + struct sigaction action = {};
> > + struct sigaction previous_sigtrap;
> > + sighandler_t previous_sigio;
> > + struct test_perf_skip *skel = NULL;
> > + struct perf_event_attr attr = {};
> > + int perf_fd = -1;
> > + int err;
> > + struct f_owner_ex owner;
> > + struct bpf_link *prog_link = NULL;
> > +
> > + action.sa_flags = SA_SIGINFO | SA_NODEFER;
> > + action.sa_sigaction = handle_sigtrap;
> > + sigemptyset(&action.sa_mask);
> > + if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
> > + return;
> > +
> > + previous_sigio = signal(SIGIO, handle_sigio);
>
> handle signal() errors here?

Addressed in v4.

> > +
> > + skel = test_perf_skip__open_and_load();
> > + if (!ASSERT_OK_PTR(skel, "skel_load"))
> > + goto cleanup;
> > +
> > + attr.type = PERF_TYPE_BREAKPOINT;
> > + attr.size = sizeof(attr);
> > + attr.bp_type = HW_BREAKPOINT_X;
> > + attr.bp_addr = (uintptr_t)test_function;
> > + attr.bp_len = sizeof(long);
> > + attr.sample_period = 1;
> > + attr.sample_type = PERF_SAMPLE_IP;
> > + attr.pinned = 1;
> > + attr.exclude_kernel = 1;
> > + attr.exclude_hv = 1;
> > + attr.precise_ip = 3;
> > + attr.sigtrap = 1;
> > + attr.remove_on_exec = 1;
> > +
> > + perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
> > + if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
> > + printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
> > + test__skip();
> > + goto cleanup;
> > + }
> > + if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
> > + goto cleanup;
> > +
> > + /* Configure the perf event to signal on sample. */
> > + err = fcntl(perf_fd, F_SETFL, O_ASYNC);
> > + if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
> > + goto cleanup;
> > +
> > + owner.type = F_OWNER_TID;
> > + owner.pid = syscall(__NR_gettid);
> > + err = fcntl(perf_fd, F_SETOWN_EX, &owner);
> > + if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
> > + goto cleanup;
> > +
> > + /*
> > + * Allow at most one sample. A sample rejected by bpf should
> > + * not count against this.
> > + */
>
> Multi-line comment style should be like

Addressed in v4.

> /* Allow at most one sample. A sample rejected by bpf should
> * not count against this.
> */
>
> > + err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
> > + if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
> > + goto cleanup;
> > +
> > + prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
> > + if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
> > + goto cleanup;
> > +
> > + /* Configure the bpf program to suppress the sample. */
> > + skel->bss->ip = (uintptr_t)test_function;
> > + test_function();
> > +
> > + ASSERT_EQ(sigio_count, 0, "sigio_count");
> > + ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
> > +
> > + /* Configure the bpf program to allow the sample. */
> > + skel->bss->ip = 0;
> > + signals_unexpected = 0;
> > + test_function();
> > +
> > + ASSERT_EQ(sigio_count, 1, "sigio_count");
> > + ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
> > +
> > + /*
> > + * Test that the sample above is the only one allowed (by perf, not
> > + * by bpf)
> > + */
>
> ditto.
>
> > + test_function();
> > +
> > + ASSERT_EQ(sigio_count, 1, "sigio_count");
> > + ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
> > +
> > +cleanup:
> > + bpf_link__destroy(prog_link);
> > + if (perf_fd >= 0)
> > + close(perf_fd);
> > + test_perf_skip__destroy(skel);
> > +
> > + signal(SIGIO, previous_sigio);
> > + sigaction(SIGTRAP, &previous_sigtrap, NULL);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> > new file mode 100644
> > index 000000000000..7eb8b6de7a57
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
> > @@ -0,0 +1,15 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +uintptr_t ip;
> > +
> > +SEC("perf_event")
> > +int handler(struct bpf_perf_event_data *data)
> > +{
> > + /* Skip events that have the correct ip. */
> > + return ip != PT_REGS_IP(&data->regs);
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > --
> > 2.34.1
> >

- Kyle