2024-04-12 01:50:50

by Kyle Huey

Subject: [PATCH v6 0/7] Combine perf and bpf for fast eval of hw breakpoint conditions

rr, a userspace record and replay debugger[0], replays asynchronous events
such as signals and context switches by essentially[1] setting a breakpoint
at the address where the asynchronous event was delivered during recording,
with a condition that the program state matches the state at the time the
event was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the
supervisor, which then evaluates the condition. If the asynchronous event
is delivered in a tight loop (thus requiring the breakpoint condition to be
repeatedly evaluated) the overhead can be immense. A patch to rr that uses
hardware breakpoints via perf events, with an attached BPF program
rejecting breakpoint hits where the condition is not satisfied, reduces
rr's replay overhead by 94% on a pathological (but real, customer-provided,
not contrived) rr trace.
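
As a rough illustration of that arrangement (not rr's actual code; the
function and parameter names here are placeholders and error handling is
minimal), a debugger can arm such a filtered breakpoint by opening an
execute hardware breakpoint as a perf event on the tracee and attaching
the condition-evaluating BPF program with PERF_EVENT_IOC_SET_BPF:

#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int arm_filtered_breakpoint(pid_t tracee, unsigned long bp_addr,
				   int bpf_prog_fd)
{
	struct perf_event_attr attr;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_BREAKPOINT;
	attr.size = sizeof(attr);
	attr.bp_type = HW_BREAKPOINT_X;		/* break on execution */
	attr.bp_addr = bp_addr;
	attr.bp_len = sizeof(long);
	attr.sample_period = 1;			/* overflow on every hit */
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;

	fd = syscall(__NR_perf_event_open, &attr, tracee, -1, -1, 0);
	if (fd < 0)
		return -1;

	/* Attach the condition-evaluating BPF program. With this series,
	 * the program returning 0 suppresses all overflow side effects,
	 * so a rejected hit never reaches the supervisor. */
	if (ioctl(fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}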

The only obstacle to this approach is that while the kernel allows a BPF
program to suppress sample output when a perf event overflows, it does not
suppress signalling the perf event fd or sending the perf event's SIGTRAP.
This patch set redesigns __perf_event_overflow() and bpf_overflow_handler()
so that the former invokes the latter directly when appropriate, rather
than through the generic overflow handler machinery; passes the BPF
program's return code back to __perf_event_overflow() so that it can decide
whether to execute the regular overflow handler; reorders
bpf_overflow_handler() ahead of the side effects of perf event overflow;
changes __perf_event_overflow() to suppress those side effects if the BPF
program returns zero; and adds a selftest.
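
With the series applied, the resulting control flow is, in rough sketch
(simplified from the final __perf_event_overflow(), not a verbatim copy):

static int __perf_event_overflow(struct perf_event *event, int throttle,
				 struct perf_sample_data *data,
				 struct pt_regs *regs)
{
	int ret = __perf_event_account_interrupt(event, throttle);

	/* A 0 return from the attached BPF program now suppresses the
	 * event_limit accounting, SIGTRAP generation, fd signalling,
	 * and sample output below. */
	if (event->prog && !bpf_overflow_handler(event, data, regs))
		return ret;

	/* ... event_limit accounting, pending SIGTRAP work ... */

	READ_ONCE(event->overflow_handler)(event, data, regs);

	/* ... F_ASYNC signalling of the event fd ... */

	return ret;
}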

The previous version of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v5:

Patches 1, 2, and 3 are added to address Ingo's review comments.

Patches 4 through 7 are the previous patches 1 through 4.

Patches 4 through 7 now carry Andrii's Acked-by.

Patch 5 addresses Ingo's comments about punctuation and newlines.

v4 of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v4:

Patches 1 through 4 collected various Acked-bys.

Patch 4 addresses additional nits from Song.

v3 of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v3:

Patches 1, 2, and 3 collected various Acked-bys.

Patch 4 addresses Song's review comments by dropping signals_expected and the
corresponding ASSERT_OKs, handling errors from signal(), and fixing multiline
comment formatting.

v2 of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v2:

Patches 1 and 2 were added following a suggestion by Namhyung Kim to
refactor this code so that this feature can be implemented more cleanly.
Patch 2 is split out for the benefit of the ARM arch maintainers.

Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
implementation thanks to the earlier refactoring.

Patch 4 is v2's patch 3, and addresses review comments about C++-style
comments, getting a TRAP_PERF definition into the test, and unnecessary
NULL checks.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible
before setting a breakpoint, and to determine a subset of program state
that is practical to check and verify.




2024-04-12 01:51:03

by Kyle Huey

Subject: [PATCH v6 1/7] perf/bpf: Reorder bpf_overflow_handler() ahead of __perf_event_overflow()

This will allow __perf_event_overflow() to call bpf_overflow_handler().

Signed-off-by: Kyle Huey <[email protected]>
---
kernel/events/core.c | 183 ++++++++++++++++++++++---------------------
1 file changed, 92 insertions(+), 91 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 724e6d7e128f..ee025125a681 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9544,6 +9544,98 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
return true;
}

+#ifdef CONFIG_BPF_SYSCALL
+static void bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+ struct bpf_perf_event_data_kern ctx = {
+ .data = data,
+ .event = event,
+ };
+ struct bpf_prog *prog;
+ int ret = 0;
+
+ ctx.regs = perf_arch_bpf_user_pt_regs(regs);
+ if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
+ goto out;
+ rcu_read_lock();
+ prog = READ_ONCE(event->prog);
+ if (prog) {
+ perf_prepare_sample(data, event, regs);
+ ret = bpf_prog_run(prog, &ctx);
+ }
+ rcu_read_unlock();
+out:
+ __this_cpu_dec(bpf_prog_active);
+ if (!ret)
+ return;
+
+ event->orig_overflow_handler(event, data, regs);
+}
+
+static int perf_event_set_bpf_handler(struct perf_event *event,
+ struct bpf_prog *prog,
+ u64 bpf_cookie)
+{
+ if (event->overflow_handler_context)
+ /* hw breakpoint or kernel counter */
+ return -EINVAL;
+
+ if (event->prog)
+ return -EEXIST;
+
+ if (prog->type != BPF_PROG_TYPE_PERF_EVENT)
+ return -EINVAL;
+
+ if (event->attr.precise_ip &&
+ prog->call_get_stack &&
+ (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) ||
+ event->attr.exclude_callchain_kernel ||
+ event->attr.exclude_callchain_user)) {
+ /*
+ * On perf_event with precise_ip, calling bpf_get_stack()
+ * may trigger unwinder warnings and occasional crashes.
+ * bpf_get_[stack|stackid] works around this issue by using
+ * callchain attached to perf_sample_data. If the
+ * perf_event does not full (kernel and user) callchain
+ * attached to perf_sample_data, do not allow attaching BPF
+ * program that calls bpf_get_[stack|stackid].
+ */
+ return -EPROTO;
+ }
+
+ event->prog = prog;
+ event->bpf_cookie = bpf_cookie;
+ event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
+ WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
+ return 0;
+}
+
+static void perf_event_free_bpf_handler(struct perf_event *event)
+{
+ struct bpf_prog *prog = event->prog;
+
+ if (!prog)
+ return;
+
+ WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
+ event->prog = NULL;
+ bpf_prog_put(prog);
+}
+#else
+static int perf_event_set_bpf_handler(struct perf_event *event,
+ struct bpf_prog *prog,
+ u64 bpf_cookie)
+{
+ return -EOPNOTSUPP;
+}
+
+static void perf_event_free_bpf_handler(struct perf_event *event)
+{
+}
+#endif
+
/*
* Generic event overflow handling, sampling.
*/
@@ -10422,97 +10514,6 @@ static void perf_event_free_filter(struct perf_event *event)
ftrace_profile_free_filter(event);
}

-#ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
-{
- struct bpf_perf_event_data_kern ctx = {
- .data = data,
- .event = event,
- };
- struct bpf_prog *prog;
- int ret = 0;
-
- ctx.regs = perf_arch_bpf_user_pt_regs(regs);
- if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
- goto out;
- rcu_read_lock();
- prog = READ_ONCE(event->prog);
- if (prog) {
- perf_prepare_sample(data, event, regs);
- ret = bpf_prog_run(prog, &ctx);
- }
- rcu_read_unlock();
-out:
- __this_cpu_dec(bpf_prog_active);
- if (!ret)
- return;
-
- event->orig_overflow_handler(event, data, regs);
-}
-
-static int perf_event_set_bpf_handler(struct perf_event *event,
- struct bpf_prog *prog,
- u64 bpf_cookie)
-{
- if (event->overflow_handler_context)
- /* hw breakpoint or kernel counter */
- return -EINVAL;
-
- if (event->prog)
- return -EEXIST;
-
- if (prog->type != BPF_PROG_TYPE_PERF_EVENT)
- return -EINVAL;
-
- if (event->attr.precise_ip &&
- prog->call_get_stack &&
- (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) ||
- event->attr.exclude_callchain_kernel ||
- event->attr.exclude_callchain_user)) {
- /*
- * On perf_event with precise_ip, calling bpf_get_stack()
- * may trigger unwinder warnings and occasional crashes.
- * bpf_get_[stack|stackid] works around this issue by using
- * callchain attached to perf_sample_data. If the
- * perf_event does not full (kernel and user) callchain
- * attached to perf_sample_data, do not allow attaching BPF
- * program that calls bpf_get_[stack|stackid].
- */
- return -EPROTO;
- }
-
- event->prog = prog;
- event->bpf_cookie = bpf_cookie;
- event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
- WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
- return 0;
-}
-
-static void perf_event_free_bpf_handler(struct perf_event *event)
-{
- struct bpf_prog *prog = event->prog;
-
- if (!prog)
- return;
-
- WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
- event->prog = NULL;
- bpf_prog_put(prog);
-}
-#else
-static int perf_event_set_bpf_handler(struct perf_event *event,
- struct bpf_prog *prog,
- u64 bpf_cookie)
-{
- return -EOPNOTSUPP;
-}
-static void perf_event_free_bpf_handler(struct perf_event *event)
-{
-}
-#endif
-
/*
* returns true if the event is a tracepoint, or a kprobe/upprobe created
* with perf_event_open()
--
2.34.1


2024-04-12 01:51:28

by Kyle Huey

Subject: [PATCH v6 3/7] perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members

This will allow __perf_event_overflow() (which is independent of
CONFIG_BPF_SYSCALL) to use struct perf_event's prog to decide whether to
call bpf_overflow_handler().

Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Kyle Huey <[email protected]>
---
include/linux/perf_event.h | 2 --
1 file changed, 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a15c0c6f8a..07cd4722dedb 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,11 +809,9 @@ struct perf_event {
u64 (*clock)(void);
perf_overflow_handler_t overflow_handler;
void *overflow_handler_context;
-#ifdef CONFIG_BPF_SYSCALL
perf_overflow_handler_t orig_overflow_handler;
struct bpf_prog *prog;
u64 bpf_cookie;
-#endif

#ifdef CONFIG_EVENT_TRACING
struct trace_event_call *tp_event;
--
2.34.1


2024-04-12 01:52:09

by Kyle Huey

Subject: [PATCH v6 5/7] perf/bpf: Remove unneeded uses_default_overflow_handler()

Now that struct perf_event's orig_overflow_handler is gone, there's no need
for the functions and macros to support looking past overflow_handler to
orig_overflow_handler.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
---
arch/arm/kernel/hw_breakpoint.c | 8 ++++----
arch/arm64/kernel/hw_breakpoint.c | 4 ++--
include/linux/perf_event.h | 17 +++--------------
3 files changed, 9 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index dc0fb7a81371..054e9199f30d 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -626,7 +626,7 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
hw->address &= ~alignment_mask;
hw->ctrl.len <<= offset;

- if (uses_default_overflow_handler(bp)) {
+ if (is_default_overflow_handler(bp)) {
/*
* Mismatch breakpoints are required for single-stepping
* breakpoints.
@@ -798,7 +798,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
* Otherwise, insert a temporary mismatch breakpoint so that
* we can single-step over the watchpoint trigger.
*/
- if (!uses_default_overflow_handler(wp))
+ if (!is_default_overflow_handler(wp))
continue;
step:
enable_single_step(wp, instruction_pointer(regs));
@@ -811,7 +811,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
info->trigger = addr;
pr_debug("watchpoint fired: address = 0x%x\n", info->trigger);
perf_bp_event(wp, regs);
- if (uses_default_overflow_handler(wp))
+ if (is_default_overflow_handler(wp))
enable_single_step(wp, instruction_pointer(regs));
}

@@ -886,7 +886,7 @@ static void breakpoint_handler(unsigned long unknown, struct pt_regs *regs)
info->trigger = addr;
pr_debug("breakpoint fired: address = 0x%x\n", addr);
perf_bp_event(bp, regs);
- if (uses_default_overflow_handler(bp))
+ if (is_default_overflow_handler(bp))
enable_single_step(bp, addr);
goto unlock;
}
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 2f5755192c2b..722ac45f9f7b 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -655,7 +655,7 @@ static int breakpoint_handler(unsigned long unused, unsigned long esr,
perf_bp_event(bp, regs);

/* Do we need to handle the stepping? */
- if (uses_default_overflow_handler(bp))
+ if (is_default_overflow_handler(bp))
step = 1;
unlock:
rcu_read_unlock();
@@ -734,7 +734,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
static int watchpoint_report(struct perf_event *wp, unsigned long addr,
struct pt_regs *regs)
{
- int step = uses_default_overflow_handler(wp);
+ int step = is_default_overflow_handler(wp);
struct arch_hw_breakpoint *info = counter_arch_bp(wp);

info->trigger = addr;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 65ad1294218f..13a2b05cc431 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1339,8 +1339,10 @@ extern int perf_event_output(struct perf_event *event,
struct pt_regs *regs);

static inline bool
-__is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
+is_default_overflow_handler(struct perf_event *event)
{
+ perf_overflow_handler_t overflow_handler = event->overflow_handler;
+
if (likely(overflow_handler == perf_event_output_forward))
return true;
if (unlikely(overflow_handler == perf_event_output_backward))
@@ -1348,19 +1350,6 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
return false;
}

-#define is_default_overflow_handler(event) \
- __is_default_overflow_handler((event)->overflow_handler)
-
-#ifdef CONFIG_BPF_SYSCALL
-static inline bool uses_default_overflow_handler(struct perf_event *event)
-{
- return is_default_overflow_handler(event);
-}
-#else
-#define uses_default_overflow_handler(event) \
- is_default_overflow_handler(event)
-#endif
-
extern void
perf_event_header__init_id(struct perf_event_header *header,
struct perf_sample_data *data,
--
2.34.1


2024-04-12 01:52:16

by Kyle Huey

Subject: [PATCH v6 6/7] perf/bpf: Allow a bpf program to suppress all sample side effects

Returning zero from a bpf program attached to a perf event already
suppresses any data output. Return early from __perf_event_overflow() in
this case so it will also suppress event_limit accounting, SIGTRAP
generation, and F_ASYNC signalling.
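
For illustration, a minimal perf_event program in rr's style could
evaluate a breakpoint condition against the interrupted register state and
return zero to reject the hit outright. This is a sketch, not code from
this series; the globals expected_ip and expected_sp are hypothetical and
would be filled in from userspace:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

uintptr_t expected_ip;
uintptr_t expected_sp;

SEC("perf_event")
int check_condition(struct bpf_perf_event_data *data)
{
	/* Nonzero (deliver) only if the program state matches the
	 * recorded state; zero suppresses all side effects. */
	return PT_REGS_IP(&data->regs) == expected_ip &&
	       PT_REGS_SP(&data->regs) == expected_sp;
}

char _license[] SEC("license") = "GPL";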

Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
---
kernel/events/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd601d509cea..cd88d1e89eb8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9658,6 +9658,9 @@ static int __perf_event_overflow(struct perf_event *event,

ret = __perf_event_account_interrupt(event, throttle);

+ if (event->prog && !bpf_overflow_handler(event, data, regs))
+ return ret;
+
/*
* XXX event_limit might not quite work as expected on inherited
* events
@@ -9707,8 +9710,7 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending_irq);
}

- if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
- READ_ONCE(event->overflow_handler)(event, data, regs);
+ READ_ONCE(event->overflow_handler)(event, data, regs);

if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
--
2.34.1


2024-04-12 01:52:48

by Kyle Huey

Subject: [PATCH v6 7/7] selftest/bpf: Test a perf bpf program that suppresses side effects.

The test sets a hardware breakpoint and uses a bpf program to suppress the
side effects of a perf event sample, including I/O availability signals,
SIGTRAPs, and decrementing the event counter limit, if the ip matches the
expected value. Then the function with the breakpoint is executed multiple
times to test that all effects behave as expected.
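
Assuming the usual BPF selftest workflow, the test can then be run
serially via the test_progs runner, e.g.:

	cd tools/testing/selftests/bpf
	make
	sudo ./test_progs -t perf_skip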

Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
---
.../selftests/bpf/prog_tests/perf_skip.c | 137 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_skip.c | 15 ++
2 files changed, 152 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
new file mode 100644
index 000000000000..37d8618800e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <test_progs.h>
+#include "test_perf_skip.skel.h"
+#include <linux/compiler.h>
+#include <linux/hw_breakpoint.h>
+#include <sys/mman.h>
+
+#ifndef TRAP_PERF
+#define TRAP_PERF 6
+#endif
+
+int sigio_count, sigtrap_count;
+
+static void handle_sigio(int sig __always_unused)
+{
+ ++sigio_count;
+}
+
+static void handle_sigtrap(int signum __always_unused,
+ siginfo_t *info,
+ void *ucontext __always_unused)
+{
+ ASSERT_EQ(info->si_code, TRAP_PERF, "si_code");
+ ++sigtrap_count;
+}
+
+static noinline int test_function(void)
+{
+ asm volatile ("");
+ return 0;
+}
+
+void serial_test_perf_skip(void)
+{
+ struct sigaction action = {};
+ struct sigaction previous_sigtrap;
+ sighandler_t previous_sigio = SIG_ERR;
+ struct test_perf_skip *skel = NULL;
+ struct perf_event_attr attr = {};
+ int perf_fd = -1;
+ int err;
+ struct f_owner_ex owner;
+ struct bpf_link *prog_link = NULL;
+
+ action.sa_flags = SA_SIGINFO | SA_NODEFER;
+ action.sa_sigaction = handle_sigtrap;
+ sigemptyset(&action.sa_mask);
+ if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
+ return;
+
+ previous_sigio = signal(SIGIO, handle_sigio);
+ if (!ASSERT_NEQ(previous_sigio, SIG_ERR, "signal"))
+ goto cleanup;
+
+ skel = test_perf_skip__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_load"))
+ goto cleanup;
+
+ attr.type = PERF_TYPE_BREAKPOINT;
+ attr.size = sizeof(attr);
+ attr.bp_type = HW_BREAKPOINT_X;
+ attr.bp_addr = (uintptr_t)test_function;
+ attr.bp_len = sizeof(long);
+ attr.sample_period = 1;
+ attr.sample_type = PERF_SAMPLE_IP;
+ attr.pinned = 1;
+ attr.exclude_kernel = 1;
+ attr.exclude_hv = 1;
+ attr.precise_ip = 3;
+ attr.sigtrap = 1;
+ attr.remove_on_exec = 1;
+
+ perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+ if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
+ printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
+ test__skip();
+ goto cleanup;
+ }
+ if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
+ goto cleanup;
+
+ /* Configure the perf event to signal on sample. */
+ err = fcntl(perf_fd, F_SETFL, O_ASYNC);
+ if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
+ goto cleanup;
+
+ owner.type = F_OWNER_TID;
+ owner.pid = syscall(__NR_gettid);
+ err = fcntl(perf_fd, F_SETOWN_EX, &owner);
+ if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
+ goto cleanup;
+
+ /* Allow at most one sample. A sample rejected by bpf should
+ * not count against this.
+ */
+ err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
+ if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
+ goto cleanup;
+
+ prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
+ if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
+ goto cleanup;
+
+ /* Configure the bpf program to suppress the sample. */
+ skel->bss->ip = (uintptr_t)test_function;
+ test_function();
+
+ ASSERT_EQ(sigio_count, 0, "sigio_count");
+ ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
+
+ /* Configure the bpf program to allow the sample. */
+ skel->bss->ip = 0;
+ test_function();
+
+ ASSERT_EQ(sigio_count, 1, "sigio_count");
+ ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+ /* Test that the sample above is the only one allowed (by perf, not
+ * by bpf)
+ */
+ test_function();
+
+ ASSERT_EQ(sigio_count, 1, "sigio_count");
+ ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+cleanup:
+ bpf_link__destroy(prog_link);
+ if (perf_fd >= 0)
+ close(perf_fd);
+ test_perf_skip__destroy(skel);
+
+ if (previous_sigio != SIG_ERR)
+ signal(SIGIO, previous_sigio);
+ sigaction(SIGTRAP, &previous_sigtrap, NULL);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
new file mode 100644
index 000000000000..7eb8b6de7a57
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+uintptr_t ip;
+
+SEC("perf_event")
+int handler(struct bpf_perf_event_data *data)
+{
+ /* Skip events that have the correct ip. */
+ return ip != PT_REGS_IP(&data->regs);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1


2024-04-12 02:02:28

by Kyle Huey

Subject: [PATCH v6 4/7] perf/bpf: Call bpf handler directly, not through overflow machinery

To ultimately allow bpf programs attached to perf events to completely
suppress all of the effects of a perf event overflow (rather than just the
sample output, as they do today), call bpf_overflow_handler() from
__perf_event_overflow() directly rather than modifying struct perf_event's
overflow_handler. Return the bpf program's return value from
bpf_overflow_handler() so that __perf_event_overflow() knows how to
proceed. Remove the now unnecessary orig_overflow_handler from struct
perf_event.

This patch is solely a refactoring and results in no behavior change.

Suggested-by: Namhyung Kim <[email protected]>
Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
---
include/linux/perf_event.h | 6 +-----
kernel/events/core.c | 27 +++++++++++----------------
2 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 07cd4722dedb..65ad1294218f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -809,7 +809,6 @@ struct perf_event {
u64 (*clock)(void);
perf_overflow_handler_t overflow_handler;
void *overflow_handler_context;
- perf_overflow_handler_t orig_overflow_handler;
struct bpf_prog *prog;
u64 bpf_cookie;

@@ -1355,10 +1354,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
#ifdef CONFIG_BPF_SYSCALL
static inline bool uses_default_overflow_handler(struct perf_event *event)
{
- if (likely(is_default_overflow_handler(event)))
- return true;
-
- return __is_default_overflow_handler(event->orig_overflow_handler);
+ return is_default_overflow_handler(event);
}
#else
#define uses_default_overflow_handler(event) \
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a7c2a739a27c..fd601d509cea 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9545,9 +9545,9 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
}

#ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+static int bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
{
struct bpf_perf_event_data_kern ctx = {
.data = data,
@@ -9568,10 +9568,8 @@ static void bpf_overflow_handler(struct perf_event *event,
rcu_read_unlock();
out:
__this_cpu_dec(bpf_prog_active);
- if (!ret)
- return;

- event->orig_overflow_handler(event, data, regs);
+ return ret;
}

static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -9607,8 +9605,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,

event->prog = prog;
event->bpf_cookie = bpf_cookie;
- event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
- WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
return 0;
}

@@ -9619,15 +9615,15 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
if (!prog)
return;

- WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
event->prog = NULL;
bpf_prog_put(prog);
}
#else
-static void bpf_overflow_handler(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+static int bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
{
+ return 1;
}

static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -9711,7 +9707,8 @@ static int __perf_event_overflow(struct perf_event *event,
irq_work_queue(&event->pending_irq);
}

- READ_ONCE(event->overflow_handler)(event, data, regs);
+ if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
+ READ_ONCE(event->overflow_handler)(event, data, regs);

if (*perf_event_fasync(event) && event->pending_kill) {
event->pending_wakeup = 1;
@@ -11978,13 +11975,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
overflow_handler = parent_event->overflow_handler;
context = parent_event->overflow_handler_context;
#if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
- if (overflow_handler == bpf_overflow_handler) {
+ if (parent_event->prog) {
struct bpf_prog *prog = parent_event->prog;

bpf_prog_inc(prog);
event->prog = prog;
- event->orig_overflow_handler =
- parent_event->orig_overflow_handler;
}
#endif
}
--
2.34.1


2024-04-12 02:04:43

by Kyle Huey

Subject: [PATCH v6 2/7] perf/bpf: Create bpf_overflow_handler() stub for !CONFIG_BPF_SYSCALL

This will allow __perf_event_overflow() (which is independent of
CONFIG_BPF_SYSCALL) to call bpf_overflow_handler().

Signed-off-by: Kyle Huey <[email protected]>
---
kernel/events/core.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ee025125a681..a7c2a739a27c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9624,6 +9624,12 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
bpf_prog_put(prog);
}
#else
+static void bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
+{
+}
+
static int perf_event_set_bpf_handler(struct perf_event *event,
struct bpf_prog *prog,
u64 bpf_cookie)
--
2.34.1


2024-04-12 09:55:15

by Ingo Molnar

Subject: [PATCH 8/7] perf/bpf: Change the !CONFIG_BPF_SYSCALL stubs to static inlines

Otherwise the compiler will be unhappy if they go unused,
which they do on allnoconfigs.
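
Concretely, GCC and Clang warn (-Wunused-function) about a plain static
function that is defined but never referenced, while a static inline
definition is exempt. A minimal illustration (hypothetical file, not part
of this patch):

/* cc -Wunused-function -c stubs.c */
static int unused_stub(void) { return -1; }		/* warning: defined but not used */
static inline int inline_stub(void) { return -1; }	/* no warning */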

Cc: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/events/core.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2212670cbe9b..6708c1121b9f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9638,21 +9638,21 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
bpf_prog_put(prog);
}
#else
-static int bpf_overflow_handler(struct perf_event *event,
- struct perf_sample_data *data,
- struct pt_regs *regs)
+static inline int bpf_overflow_handler(struct perf_event *event,
+ struct perf_sample_data *data,
+ struct pt_regs *regs)
{
return 1;
}

-static int perf_event_set_bpf_handler(struct perf_event *event,
- struct bpf_prog *prog,
- u64 bpf_cookie)
+static inline int perf_event_set_bpf_handler(struct perf_event *event,
+ struct bpf_prog *prog,
+ u64 bpf_cookie)
{
return -EOPNOTSUPP;
}

-static void perf_event_free_bpf_handler(struct perf_event *event)
+static inline void perf_event_free_bpf_handler(struct perf_event *event)
{
}
#endif

Subject: [tip: perf/core] perf/bpf: Change the !CONFIG_BPF_SYSCALL stubs to static inlines

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 93d3fde7fd19c2e2cde7220e7986f9a75e9c5680
Gitweb: https://git.kernel.org/tip/93d3fde7fd19c2e2cde7220e7986f9a75e9c5680
Author: Ingo Molnar <[email protected]>
AuthorDate: Fri, 12 Apr 2024 11:55:00 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:56:00 +02:00

perf/bpf: Change the !CONFIG_BPF_SYSCALL stubs to static inlines

Otherwise the compiler will be unhappy if they go unused,
which they do on allnoconfigs.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Kyle Huey <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/events/core.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)


Subject: [tip: perf/core] selftest/bpf: Test a perf BPF program that suppresses side effects

The following commit has been merged into the perf/core branch of tip:

Commit-ID: a265c9f6d52ac760e6e572bac73a11b60b998779
Gitweb: https://git.kernel.org/tip/a265c9f6d52ac760e6e572bac73a11b60b998779
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:19 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:51 +02:00

selftest/bpf: Test a perf BPF program that suppresses side effects

The test sets a hardware breakpoint and uses a BPF program to suppress the
side effects of a perf event sample, including I/O availability signals,
SIGTRAPs, and decrementing the event counter limit, if the IP matches the
expected value. Then the function with the breakpoint is executed multiple
times to test that all effects behave as expected.

Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
tools/testing/selftests/bpf/prog_tests/perf_skip.c | 137 ++++++++++++-
tools/testing/selftests/bpf/progs/test_perf_skip.c | 15 +-
2 files changed, 152 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c


Subject: [tip: perf/core] perf/bpf: Remove unneeded uses_default_overflow_handler()

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 76f6d58845829e5d6ef55532e67a323e7d30c26e
Gitweb: https://git.kernel.org/tip/76f6d58845829e5d6ef55532e67a323e7d30c26e
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:17 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:50 +02:00

perf/bpf: Remove unneeded uses_default_overflow_handler()

Now that struct perf_event's orig_overflow_handler is gone, there's no need
for the functions and macros to support looking past overflow_handler to
orig_overflow_handler.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/arm/kernel/hw_breakpoint.c | 8 ++++----
arch/arm64/kernel/hw_breakpoint.c | 4 ++--
include/linux/perf_event.h | 17 +++--------------
3 files changed, 9 insertions(+), 20 deletions(-)


Subject: [tip: perf/core] perf/bpf: Call BPF handler directly, not through overflow machinery

The following commit has been merged into the perf/core branch of tip:

Commit-ID: f11f10bfa1ca23b32020b2073aa13131a27978fe
Gitweb: https://git.kernel.org/tip/f11f10bfa1ca23b32020b2073aa13131a27978fe
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:16 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:49 +02:00

perf/bpf: Call BPF handler directly, not through overflow machinery

To ultimately allow BPF programs attached to perf events to completely
suppress all of the effects of a perf event overflow (rather than just the
sample output, as they do today), call bpf_overflow_handler() from
__perf_event_overflow() directly rather than modifying struct perf_event's
overflow_handler. Return the BPF program's return value from
bpf_overflow_handler() so that __perf_event_overflow() knows how to
proceed. Remove the now unnecessary orig_overflow_handler from struct
perf_event.

This patch is solely a refactoring and results in no behavior change.

Suggested-by: Namhyung Kim <[email protected]>
Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/perf_event.h | 6 +-----
kernel/events/core.c | 27 +++++++++++----------------
2 files changed, 12 insertions(+), 21 deletions(-)


Subject: [tip: perf/core] perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 14e40a9578b70cc5323e55f61292a7e021f6037c
Gitweb: https://git.kernel.org/tip/14e40a9578b70cc5323e55f61292a7e021f6037c
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:15 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:49 +02:00

perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members

This will allow __perf_event_overflow() (which is independent of
CONFIG_BPF_SYSCALL) to use struct perf_event's prog to decide whether to
call bpf_overflow_handler().

Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/perf_event.h | 2 --
1 file changed, 2 deletions(-)


Subject: [tip: perf/core] perf/bpf: Create bpf_overflow_handler() stub for !CONFIG_BPF_SYSCALL

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 924d934393f98fa6a41d6ea27352faf79c2bbaf6
Gitweb: https://git.kernel.org/tip/924d934393f98fa6a41d6ea27352faf79c2bbaf6
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:14 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:48 +02:00

perf/bpf: Create bpf_overflow_handler() stub for !CONFIG_BPF_SYSCALL

This will allow __perf_event_overflow() (which is independent of
CONFIG_BPF_SYSCALL) to call bpf_overflow_handler().

Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/events/core.c | 6 ++++++
1 file changed, 6 insertions(+)


Subject: [tip: perf/core] perf/bpf: Reorder bpf_overflow_handler() ahead of __perf_event_overflow()

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 4c03fe11b96bda60610aca77002e83f37b4a2242
Gitweb: https://git.kernel.org/tip/4c03fe11b96bda60610aca77002e83f37b4a2242
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:13 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:48 +02:00

perf/bpf: Reorder bpf_overflow_handler() ahead of __perf_event_overflow()

This will allow __perf_event_overflow() to call bpf_overflow_handler().

Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/events/core.c | 183 +++++++++++++++++++++---------------------
1 file changed, 92 insertions(+), 91 deletions(-)

Subject: [tip: perf/core] perf/bpf: Allow a BPF program to suppress all sample side effects

The following commit has been merged into the perf/core branch of tip:

Commit-ID: c4fcc7d1f41532e878087c7c43f4cf247604d68b
Gitweb: https://git.kernel.org/tip/c4fcc7d1f41532e878087c7c43f4cf247604d68b
Author: Kyle Huey <[email protected]>
AuthorDate: Thu, 11 Apr 2024 18:50:18 -07:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Fri, 12 Apr 2024 11:49:50 +02:00

perf/bpf: Allow a BPF program to suppress all sample side effects

Returning zero from a BPF program attached to a perf event already
suppresses any data output. Return early from __perf_event_overflow() in
this case so it will also suppress event_limit accounting, SIGTRAP
generation, and F_ASYNC signalling.

Signed-off-by: Kyle Huey <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Acked-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/events/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
