2024-02-14 17:42:18

by Kyle Huey

Subject: [RESEND PATCH v5 0/4] Combine perf and bpf for fast eval of hw breakpoint conditions]

Peter, Ingo, could you take a look at this?

----

rr, a userspace record and replay debugger[0], replays asynchronous events
such as signals and context switches by essentially[1] setting a breakpoint
at the address where the asynchronous event was delivered during recording
with a condition that the program state matches the state when the event
was delivered.

Currently, rr uses software breakpoints that trap (via ptrace) to the
supervisor, and evaluates the condition from the supervisor. If the
asynchronous event is delivered in a tight loop (thus requiring the
breakpoint condition to be repeatedly evaluated) the overhead can be
immense. A patch to rr that uses hardware breakpoints via perf events with
an attached BPF program to reject breakpoint hits where the condition is
not satisfied reduces rr's replay overhead by 94% on a pathological (but
real, customer-provided, not contrived) rr trace.
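The kind of check such a BPF program performs can be sketched in plain C, for illustration only. The struct regs_snapshot, its three fields and breakpoint_condition_met() are hypothetical. rr's real notion of "program state" is richer (see [1]). In the actual setup the comparison would run inside the BPF program attached to the hardware breakpoint's perf event, not in userspace. This series relies on a perf/BPF convention: a nonzero result lets the breakpoint hit through, and zero rejects it. The sketch follows that convention.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the register state seen at the breakpoint. */
struct regs_snapshot {
	uint64_t ip;
	uint64_t sp;
	uint64_t ax;
};

/*
 * Nonzero: the state matches what was recorded when the asynchronous event
 * was delivered, so the hit should reach the supervisor.
 * Zero: some other pass through the same instruction (e.g. another loop
 * iteration); reject the hit and keep running.
 */
static int breakpoint_condition_met(const struct regs_snapshot *now,
				    const struct regs_snapshot *recorded)
{
	return now->ip == recorded->ip &&
	       now->sp == recorded->sp &&
	       now->ax == recorded->ax;
}
```

In a tight loop the ip matches on every iteration. The other fields (here ax, acting as a loop counter) are what let all but one hit be rejected without a round trip to the supervisor.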

The only obstacle to this approach is that, while the kernel allows a BPF
program to suppress sample output when a perf event overflows, it does not
suppress signalling the perf event fd or sending the perf event's SIGTRAP.
This patch set makes the following changes:
- __perf_event_overflow() now invokes bpf_overflow_handler() directly when
  appropriate, rather than through the generic overflow handler machinery.
- The BPF program's return code is passed back to __perf_event_overflow()
  so it can decide whether to run the regular overflow handler.
- bpf_overflow_handler() is reordered relative to the side effects of a
  perf event overflow.
- __perf_event_overflow() suppresses those side effects if the BPF program
  returns zero.
- A selftest is added.

The previous version of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v4:

Patches 1, 2, 3, and 4 picked up various Acked-by tags.

Patch 4 addresses additional nits from Song.

v3 of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v3:

Patches 1, 2, and 3 picked up various Acked-by tags.

Patch 4 addresses Song's review comments by dropping signals_expected and the
corresponding ASSERT_OKs, handling errors from signal(), and fixing multiline
comment formatting.

v2 of this patchset can be found at
https://lore.kernel.org/linux-kernel/[email protected]/

Changes since v2:

Patches 1 and 2 were added from a suggestion by Namhyung Kim to refactor
this code to implement this feature in a cleaner way. Patch 2 is separated
for the benefit of the ARM arch maintainers.

Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
implementation thanks to the earlier refactoring.

Patch 4 is v2's patch 3, and addresses review comments about C++-style
comments, getting a TRAP_PERF definition into the test, and unnecessary
NULL checks.

[0] https://rr-project.org/
[1] Various optimizations exist to skip as much execution as possible
before setting a breakpoint, and to determine a set of program state that
is practical to check and verify.




2024-02-14 17:43:04

by Kyle Huey

Subject: [RESEND PATCH v5 4/4] selftest/bpf: Test a perf bpf program that suppresses side effects.

The test sets a hardware breakpoint and uses a bpf program to suppress the
side effects of a perf event sample, including I/O availability signals,
SIGTRAPs, and decrementing the event counter limit, if the ip matches the
expected value. Then the function with the breakpoint is executed multiple
times to test that all effects behave as expected.

Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
.../selftests/bpf/prog_tests/perf_skip.c | 137 ++++++++++++++++++
.../selftests/bpf/progs/test_perf_skip.c | 15 ++
2 files changed, 152 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/perf_skip.c
create mode 100644 tools/testing/selftests/bpf/progs/test_perf_skip.c

diff --git a/tools/testing/selftests/bpf/prog_tests/perf_skip.c b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
new file mode 100644
index 000000000000..37d8618800e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/perf_skip.c
@@ -0,0 +1,137 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE
+
+#include <test_progs.h>
+#include "test_perf_skip.skel.h"
+#include <linux/compiler.h>
+#include <linux/hw_breakpoint.h>
+#include <sys/mman.h>
+
+#ifndef TRAP_PERF
+#define TRAP_PERF 6
+#endif
+
+int sigio_count, sigtrap_count;
+
+static void handle_sigio(int sig __always_unused)
+{
+	++sigio_count;
+}
+
+static void handle_sigtrap(int signum __always_unused,
+			   siginfo_t *info,
+			   void *ucontext __always_unused)
+{
+	ASSERT_EQ(info->si_code, TRAP_PERF, "si_code");
+	++sigtrap_count;
+}
+
+static noinline int test_function(void)
+{
+	asm volatile ("");
+	return 0;
+}
+
+void serial_test_perf_skip(void)
+{
+	struct sigaction action = {};
+	struct sigaction previous_sigtrap;
+	sighandler_t previous_sigio = SIG_ERR;
+	struct test_perf_skip *skel = NULL;
+	struct perf_event_attr attr = {};
+	int perf_fd = -1;
+	int err;
+	struct f_owner_ex owner;
+	struct bpf_link *prog_link = NULL;
+
+	action.sa_flags = SA_SIGINFO | SA_NODEFER;
+	action.sa_sigaction = handle_sigtrap;
+	sigemptyset(&action.sa_mask);
+	if (!ASSERT_OK(sigaction(SIGTRAP, &action, &previous_sigtrap), "sigaction"))
+		return;
+
+	previous_sigio = signal(SIGIO, handle_sigio);
+	if (!ASSERT_NEQ(previous_sigio, SIG_ERR, "signal"))
+		goto cleanup;
+
+	skel = test_perf_skip__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "skel_load"))
+		goto cleanup;
+
+	attr.type = PERF_TYPE_BREAKPOINT;
+	attr.size = sizeof(attr);
+	attr.bp_type = HW_BREAKPOINT_X;
+	attr.bp_addr = (uintptr_t)test_function;
+	attr.bp_len = sizeof(long);
+	attr.sample_period = 1;
+	attr.sample_type = PERF_SAMPLE_IP;
+	attr.pinned = 1;
+	attr.exclude_kernel = 1;
+	attr.exclude_hv = 1;
+	attr.precise_ip = 3;
+	attr.sigtrap = 1;
+	attr.remove_on_exec = 1;
+
+	perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
+	if (perf_fd < 0 && (errno == ENOENT || errno == EOPNOTSUPP)) {
+		printf("SKIP:no PERF_TYPE_BREAKPOINT/HW_BREAKPOINT_X\n");
+		test__skip();
+		goto cleanup;
+	}
+	if (!ASSERT_OK(perf_fd < 0, "perf_event_open"))
+		goto cleanup;
+
+	/* Configure the perf event to signal on sample. */
+	err = fcntl(perf_fd, F_SETFL, O_ASYNC);
+	if (!ASSERT_OK(err, "fcntl(F_SETFL, O_ASYNC)"))
+		goto cleanup;
+
+	owner.type = F_OWNER_TID;
+	owner.pid = syscall(__NR_gettid);
+	err = fcntl(perf_fd, F_SETOWN_EX, &owner);
+	if (!ASSERT_OK(err, "fcntl(F_SETOWN_EX)"))
+		goto cleanup;
+
+	/* Allow at most one sample. A sample rejected by bpf should
+	 * not count against this.
+	 */
+	err = ioctl(perf_fd, PERF_EVENT_IOC_REFRESH, 1);
+	if (!ASSERT_OK(err, "ioctl(PERF_EVENT_IOC_REFRESH)"))
+		goto cleanup;
+
+	prog_link = bpf_program__attach_perf_event(skel->progs.handler, perf_fd);
+	if (!ASSERT_OK_PTR(prog_link, "bpf_program__attach_perf_event"))
+		goto cleanup;
+
+	/* Configure the bpf program to suppress the sample. */
+	skel->bss->ip = (uintptr_t)test_function;
+	test_function();
+
+	ASSERT_EQ(sigio_count, 0, "sigio_count");
+	ASSERT_EQ(sigtrap_count, 0, "sigtrap_count");
+
+	/* Configure the bpf program to allow the sample. */
+	skel->bss->ip = 0;
+	test_function();
+
+	ASSERT_EQ(sigio_count, 1, "sigio_count");
+	ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+	/* Test that the sample above is the only one allowed (by perf, not
+	 * by bpf)
+	 */
+	test_function();
+
+	ASSERT_EQ(sigio_count, 1, "sigio_count");
+	ASSERT_EQ(sigtrap_count, 1, "sigtrap_count");
+
+cleanup:
+	bpf_link__destroy(prog_link);
+	if (perf_fd >= 0)
+		close(perf_fd);
+	test_perf_skip__destroy(skel);
+
+	if (previous_sigio != SIG_ERR)
+		signal(SIGIO, previous_sigio);
+	sigaction(SIGTRAP, &previous_sigtrap, NULL);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_perf_skip.c b/tools/testing/selftests/bpf/progs/test_perf_skip.c
new file mode 100644
index 000000000000..7eb8b6de7a57
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_perf_skip.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+uintptr_t ip;
+
+SEC("perf_event")
+int handler(struct bpf_perf_event_data *data)
+{
+	/* Skip events that have the correct ip. */
+	return ip != PT_REGS_IP(&data->regs);
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.1
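The sequence this test checks can be made concrete with a plain-C userspace model. It is a simplified, hypothetical sketch, not kernel code. struct model_event, model_filter() and model_breakpoint_hit() are invented for illustration. model_filter() mirrors the "ip != PT_REGS_IP(&data->regs)" return value of handler(). event_limit stands in for the single overflow armed by PERF_EVENT_IOC_REFRESH. Reaching the limit is modelled as simply disabling further overflows; the kernel's POLL_HUP and throttling details are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state the selftest observes. */
struct model_event {
	int event_limit;	/* overflows left, armed by PERF_EVENT_IOC_REFRESH */
	bool disabled;		/* set once event_limit reaches zero */
	int sigio_count;
	int sigtrap_count;
};

/* Mirrors the BPF handler: return 0 (suppress) when the ip matches. */
static int model_filter(uintptr_t skip_ip, uintptr_t hit_ip)
{
	return skip_ip != hit_ip;
}

/* One breakpoint hit at hit_ip, with the filter configured via skip_ip. */
static void model_breakpoint_hit(struct model_event *ev,
				 uintptr_t skip_ip, uintptr_t hit_ip)
{
	if (ev->disabled)
		return;		/* no more overflows once the limit is used up */
	if (!model_filter(skip_ip, hit_ip))
		return;		/* rejected: no limit use, SIGTRAP or SIGIO */
	if (--ev->event_limit == 0)
		ev->disabled = true;
	ev->sigtrap_count++;
	ev->sigio_count++;
}
```

Calling it once with skip_ip equal to hit_ip and then twice with skip_ip set to 0 gives the SIGIO/SIGTRAP counts the selftest asserts: 0/0, then 1/1, then 1/1.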


2024-02-14 18:06:49

by Kyle Huey

Subject: [RESEND PATCH v5 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

To ultimately allow bpf programs attached to perf events to completely
suppress all of the effects of a perf event overflow (rather than just the
sample output, as they do today), call bpf_overflow_handler() from
__perf_event_overflow() directly rather than modifying struct perf_event's
overflow_handler. Return the bpf program's return value from
bpf_overflow_handler() so that __perf_event_overflow() knows how to
proceed. Remove the now unnecessary orig_overflow_handler from struct
perf_event.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey <[email protected]>
Suggested-by: Namhyung Kim <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
include/linux/perf_event.h | 6 +-----
kernel/events/core.c | 28 +++++++++++++++-------------
2 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d2a15c0c6f8a..c7f54fd74d89 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -810,7 +810,6 @@ struct perf_event {
 	perf_overflow_handler_t overflow_handler;
 	void *overflow_handler_context;
 #ifdef CONFIG_BPF_SYSCALL
-	perf_overflow_handler_t orig_overflow_handler;
 	struct bpf_prog *prog;
 	u64 bpf_cookie;
 #endif
@@ -1357,10 +1356,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
 #ifdef CONFIG_BPF_SYSCALL
 static inline bool uses_default_overflow_handler(struct perf_event *event)
 {
-	if (likely(is_default_overflow_handler(event)))
-		return true;
-
-	return __is_default_overflow_handler(event->orig_overflow_handler);
+	return is_default_overflow_handler(event);
 }
 #else
 #define uses_default_overflow_handler(event) \
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0f0f71213a1..24a718e7eb98 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9548,6 +9548,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
 	return true;
 }

+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_overflow_handler(struct perf_event *event,
+				struct perf_sample_data *data,
+				struct pt_regs *regs);
+#endif
+
 /*
  * Generic event overflow handling, sampling.
  */
@@ -9617,7 +9623,10 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending_irq);
 	}

-	READ_ONCE(event->overflow_handler)(event, data, regs);
+#ifdef CONFIG_BPF_SYSCALL
+	if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
+#endif
+		READ_ONCE(event->overflow_handler)(event, data, regs);

 	if (*perf_event_fasync(event) && event->pending_kill) {
 		event->pending_wakeup = 1;
@@ -10427,9 +10436,9 @@ static void perf_event_free_filter(struct perf_event *event)
 }

 #ifdef CONFIG_BPF_SYSCALL
-static void bpf_overflow_handler(struct perf_event *event,
-				 struct perf_sample_data *data,
-				 struct pt_regs *regs)
+static int bpf_overflow_handler(struct perf_event *event,
+				struct perf_sample_data *data,
+				struct pt_regs *regs)
 {
 	struct bpf_perf_event_data_kern ctx = {
 		.data = data,
@@ -10450,10 +10459,8 @@ static void bpf_overflow_handler(struct perf_event *event,
 	rcu_read_unlock();
 out:
 	__this_cpu_dec(bpf_prog_active);
-	if (!ret)
-		return;

-	event->orig_overflow_handler(event, data, regs);
+	return ret;
 }

 static int perf_event_set_bpf_handler(struct perf_event *event,
@@ -10489,8 +10496,6 @@ static int perf_event_set_bpf_handler(struct perf_event *event,

 	event->prog = prog;
 	event->bpf_cookie = bpf_cookie;
-	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
-	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
 	return 0;
 }

@@ -10501,7 +10506,6 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	if (!prog)
 		return;

-	WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler);
 	event->prog = NULL;
 	bpf_prog_put(prog);
 }
@@ -11975,13 +11979,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		overflow_handler = parent_event->overflow_handler;
 		context = parent_event->overflow_handler_context;
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
-		if (overflow_handler == bpf_overflow_handler) {
+		if (parent_event->prog) {
 			struct bpf_prog *prog = parent_event->prog;

 			bpf_prog_inc(prog);
 			event->prog = prog;
-			event->orig_overflow_handler =
-				parent_event->orig_overflow_handler;
 		}
 #endif
 	}
--
2.34.1
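A small userspace model can be used to check the "no behavior change" claim. This is a hypothetical sketch, not kernel code: handler_fn, default_output(), old_dispatch() and new_dispatch() are invented names, and default_output() just counts calls in place of perf_event_output_forward(). old_dispatch() mimics the previous scheme, where attaching a program swapped overflow_handler for a wrapper that called orig_overflow_handler only on a nonzero return. new_dispatch() uses the same condition this patch adds to __perf_event_overflow().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down model of the overflow dispatch. */
typedef void (*handler_fn)(int *calls);

/* Stands in for the default perf_event_output_forward() handler. */
static void default_output(int *calls)
{
	++*calls;
}

static int prog_ret;		/* what the attached BPF program returns */
static handler_fn orig_handler;	/* old scheme: orig_overflow_handler */

/* Old scheme: this wrapper replaced event->overflow_handler. */
static void old_bpf_wrapper(int *calls)
{
	if (!prog_ret)
		return;
	orig_handler(calls);
}

static int old_dispatch(bool has_prog, int ret)
{
	handler_fn handler = default_output;
	int calls = 0;

	prog_ret = ret;
	if (has_prog) {
		orig_handler = handler;
		handler = old_bpf_wrapper;
	}
	handler(&calls);
	return calls;
}

/* New scheme: call the program directly and gate the default handler. */
static int new_dispatch(bool has_prog, int ret)
{
	int calls = 0;

	if (!(has_prog && !ret))
		default_output(&calls);
	return calls;
}
```

Consider every combination of "program attached" and "program returned zero". In each case, both paths invoke the default handler the same number of times.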


2024-02-14 18:09:15

by Kyle Huey

Subject: [RESEND PATCH v5 3/4] perf/bpf: Allow a bpf program to suppress all sample side effects

Returning zero from a bpf program attached to a perf event already
suppresses any data output. Return early from __perf_event_overflow() in
this case so it will also suppress event_limit accounting, SIGTRAP
generation, and O_ASYNC signalling.

Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
---
kernel/events/core.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 24a718e7eb98..a329bec42c4d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9574,6 +9574,11 @@ static int __perf_event_overflow(struct perf_event *event,

 	ret = __perf_event_account_interrupt(event, throttle);

+#ifdef CONFIG_BPF_SYSCALL
+	if (event->prog && !bpf_overflow_handler(event, data, regs))
+		return ret;
+#endif
+
 	/*
 	 * XXX event_limit might not quite work as expected on inherited
 	 * events
@@ -9623,10 +9628,7 @@ static int __perf_event_overflow(struct perf_event *event,
 		irq_work_queue(&event->pending_irq);
 	}

-#ifdef CONFIG_BPF_SYSCALL
-	if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
-#endif
-		READ_ONCE(event->overflow_handler)(event, data, regs);
+	READ_ONCE(event->overflow_handler)(event, data, regs);

 	if (*perf_event_fasync(event) && event->pending_kill) {
 		event->pending_wakeup = 1;
--
2.34.1
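Here is a minimal before/after model of this patch. It assumes a simplified overflow path that always has all four side effects. The real code behaves more narrowly:
- event_limit is decremented only when it is set.
- SIGTRAP is sent only with attr.sigtrap.
- Signals are raised only when pending_kill is set.

struct effects, overflow_before() and overflow_after() are hypothetical names. Throttling via __perf_event_account_interrupt() runs before the BPF check in both versions and is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tally of the observable effects of one overflow. */
struct effects {
	int limit_decrements;	/* event_limit accounting */
	int sigtraps;		/* SIGTRAP for attr.sigtrap events */
	int outputs;		/* regular overflow_handler (sample output) */
	int fasync_signals;	/* O_ASYNC signal to the fd owner */
};

/* Before this patch: a zero return only skips the regular handler. */
static void overflow_before(struct effects *e, bool has_prog, int bpf_ret)
{
	e->limit_decrements++;
	e->sigtraps++;
	if (!(has_prog && !bpf_ret))
		e->outputs++;
	e->fasync_signals++;
}

/* After this patch: a zero return bails out before any side effect. */
static void overflow_after(struct effects *e, bool has_prog, int bpf_ret)
{
	if (has_prog && !bpf_ret)
		return;
	e->limit_decrements++;
	e->sigtraps++;
	e->outputs++;
	e->fasync_signals++;
}
```

When the program returns zero, the old path still consumes the limit, raises SIGTRAP and signals the fd. The new path does none of these. The selftest in patch 4 checks exactly this difference.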


2024-02-14 18:09:36

by Kyle Huey

Subject: [RESEND PATCH v5 2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.

Now that struct perf_event's orig_overflow_handler is gone, there's no need
for the functions and macros to support looking past overflow_handler to
orig_overflow_handler.

This patch is solely a refactoring and results in no behavior change.

Signed-off-by: Kyle Huey <[email protected]>
Acked-by: Will Deacon <[email protected]>
Acked-by: Song Liu <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
---
arch/arm/kernel/hw_breakpoint.c | 8 ++++----
arch/arm64/kernel/hw_breakpoint.c | 4 ++--
include/linux/perf_event.h | 16 ++--------------
3 files changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/arm/kernel/hw_breakpoint.c b/arch/arm/kernel/hw_breakpoint.c
index dc0fb7a81371..054e9199f30d 100644
--- a/arch/arm/kernel/hw_breakpoint.c
+++ b/arch/arm/kernel/hw_breakpoint.c
@@ -626,7 +626,7 @@ int hw_breakpoint_arch_parse(struct perf_event *bp,
hw->address &= ~alignment_mask;
hw->ctrl.len <<= offset;

- if (uses_default_overflow_handler(bp)) {
+ if (is_default_overflow_handler(bp)) {
/*
* Mismatch breakpoints are required for single-stepping
* breakpoints.
@@ -798,7 +798,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
* Otherwise, insert a temporary mismatch breakpoint so that
* we can single-step over the watchpoint trigger.
*/
- if (!uses_default_overflow_handler(wp))
+ if (!is_default_overflow_handler(wp))
continue;
step:
enable_single_step(wp, instruction_pointer(regs));
@@ -811,7 +811,7 @@ static void watchpoint_handler(unsigned long addr, unsigned int fsr,
info->trigger = addr;
pr_debug("watchpoint fired: address = 0x%x\n", info->trigger);
perf_bp_event(wp, regs);
- if (uses_default_overflow_handler(wp))
+ if (is_default_overflow_handler(wp))
enable_single_step(wp, instruction_pointer(regs));
}

@@ -886,7 +886,7 @@ static void breakpoint_handler(unsigned long unknown, struct pt_regs *regs)
info->trigger = addr;
pr_debug("breakpoint fired: address = 0x%x\n", addr);
perf_bp_event(bp, regs);
- if (uses_default_overflow_handler(bp))
+ if (is_default_overflow_handler(bp))
enable_single_step(bp, addr);
goto unlock;
}
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 35225632d70a..db2a1861bb97 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -654,7 +654,7 @@ static int breakpoint_handler(unsigned long unused, unsigned long esr,
perf_bp_event(bp, regs);

/* Do we need to handle the stepping? */
- if (uses_default_overflow_handler(bp))
+ if (is_default_overflow_handler(bp))
step = 1;
unlock:
rcu_read_unlock();
@@ -733,7 +733,7 @@ static u64 get_distance_from_watchpoint(unsigned long addr, u64 val,
static int watchpoint_report(struct perf_event *wp, unsigned long addr,
struct pt_regs *regs)
{
- int step = uses_default_overflow_handler(wp);
+ int step = is_default_overflow_handler(wp);
struct arch_hw_breakpoint *info = counter_arch_bp(wp);

info->trigger = addr;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c7f54fd74d89..c8bd5bb6610c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1341,8 +1341,9 @@ extern int perf_event_output(struct perf_event *event,
 			     struct pt_regs *regs);

 static inline bool
-__is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
+is_default_overflow_handler(struct perf_event *event)
 {
+	perf_overflow_handler_t overflow_handler = event->overflow_handler;
 	if (likely(overflow_handler == perf_event_output_forward))
 		return true;
 	if (unlikely(overflow_handler == perf_event_output_backward))
@@ -1350,19 +1351,6 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
 	return false;
 }

-#define is_default_overflow_handler(event) \
-	__is_default_overflow_handler((event)->overflow_handler)
-
-#ifdef CONFIG_BPF_SYSCALL
-static inline bool uses_default_overflow_handler(struct perf_event *event)
-{
-	return is_default_overflow_handler(event);
-}
-#else
-#define uses_default_overflow_handler(event) \
-	is_default_overflow_handler(event)
-#endif
-
 extern void
 perf_event_header__init_id(struct perf_event_header *header,
 			   struct perf_sample_data *data,
--
2.34.1
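Here is a hypothetical userspace sketch of the check that remains after this patch. With no orig_overflow_handler to look past, deciding whether an event uses a default handler is a single comparison of its overflow_handler against the two output functions. struct model_event, the stand-in handlers and model_is_default_overflow_handler() are invented for illustration. The has_bpf_prog field shows that, since patch 1, attaching a BPF program leaves overflow_handler alone. Such an event therefore still counts as using the default handler, which is the answer the arm and arm64 callers previously got by looking at orig_overflow_handler.

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*overflow_handler_t)(void);

static int last_handler;	/* gives the stand-ins distinct bodies */

/* Hypothetical stand-ins for the default and a custom overflow handler. */
static void output_forward(void)  { last_handler = 1; }
static void output_backward(void) { last_handler = 2; }
static void custom_handler(void)  { last_handler = 3; }

/* Hypothetical slice of struct perf_event. */
struct model_event {
	overflow_handler_t overflow_handler;
	bool has_bpf_prog;	/* attaching BPF no longer touches the handler */
};

/* Same shape as is_default_overflow_handler() after this patch. */
static inline bool
model_is_default_overflow_handler(const struct model_event *event)
{
	overflow_handler_t overflow_handler = event->overflow_handler;

	return overflow_handler == output_forward ||
	       overflow_handler == output_backward;
}
```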


2024-02-16 00:12:06

by Andrii Nakryiko

Subject: Re: [RESEND PATCH v5 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Wed, Feb 14, 2024 at 9:40 AM Kyle Huey <[email protected]> wrote:
>
> To ultimately allow bpf programs attached to perf events to completely
> suppress all of the effects of a perf event overflow (rather than just the
> sample output, as they do today), call bpf_overflow_handler() from
> __perf_event_overflow() directly rather than modifying struct perf_event's
> overflow_handler. Return the bpf program's return value from
> bpf_overflow_handler() so that __perf_event_overflow() knows how to
> proceed. Remove the now unnecessary orig_overflow_handler from struct
> perf_event.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Suggested-by: Namhyung Kim <[email protected]>
> Acked-by: Song Liu <[email protected]>
> Acked-by: Jiri Olsa <[email protected]>
> ---
> include/linux/perf_event.h | 6 +-----
> kernel/events/core.c | 28 +++++++++++++++-------------
> 2 files changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index d2a15c0c6f8a..c7f54fd74d89 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -810,7 +810,6 @@ struct perf_event {
> perf_overflow_handler_t overflow_handler;
> void *overflow_handler_context;
> #ifdef CONFIG_BPF_SYSCALL
> - perf_overflow_handler_t orig_overflow_handler;
> struct bpf_prog *prog;
> u64 bpf_cookie;
> #endif
> @@ -1357,10 +1356,7 @@ __is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
> #ifdef CONFIG_BPF_SYSCALL
> static inline bool uses_default_overflow_handler(struct perf_event *event)
> {
> - if (likely(is_default_overflow_handler(event)))
> - return true;
> -
> - return __is_default_overflow_handler(event->orig_overflow_handler);
> + return is_default_overflow_handler(event);
> }
> #else
> #define uses_default_overflow_handler(event) \

and so in both cases uses_default_overflow_handler() is now just
is_default_overflow_handler(), right? So we can clean all this up
quite a bit?

> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f0f0f71213a1..24a718e7eb98 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9548,6 +9548,12 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
> return true;
> }
>
> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs);
> +#endif
> +
> /*
> * Generic event overflow handling, sampling.
> */
> @@ -9617,7 +9623,10 @@ static int __perf_event_overflow(struct perf_event *event,
> irq_work_queue(&event->pending_irq);
> }
>
> - READ_ONCE(event->overflow_handler)(event, data, regs);
> +#ifdef CONFIG_BPF_SYSCALL
> + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> +#endif
> + READ_ONCE(event->overflow_handler)(event, data, regs);

This is quite hard to follow... And that CONFIG_BPF_SYSCALL check
breaking apart that if statement is not great. Maybe something like:


	bool skip_def_handler = false;

#ifdef CONFIG_BPF_SYSCALL
	if (event->prog)
		skip_def_handler = bpf_overflow_handler(event, data, regs) == 0;
#endif
	if (!skip_def_handler)
		READ_ONCE(event->overflow_handler)(event, data, regs);

we can of course invert "skip" to be "run" and invert conditions, if
that's easier to follow
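One way to check that this restructuring preserves behavior is to compare both forms for every input in plain C. runs_default_orig() and runs_default_flag() are hypothetical helpers that return whether the default overflow handler would run. has_prog stands for event->prog being set, and bpf_ret for the program's return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition as written in the patch: does the default handler run? */
static bool runs_default_orig(bool has_prog, int bpf_ret)
{
	return !(has_prog && !bpf_ret);
}

/* Flag-based form: decide "skip" once, then test it. */
static bool runs_default_flag(bool has_prog, int bpf_ret)
{
	bool skip_def_handler = false;

	if (has_prog)
		skip_def_handler = bpf_ret == 0;
	return !skip_def_handler;
}
```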

>
> if (*perf_event_fasync(event) && event->pending_kill) {
> event->pending_wakeup = 1;
> @@ -10427,9 +10436,9 @@ static void perf_event_free_filter(struct perf_event *event)
> }
>
> #ifdef CONFIG_BPF_SYSCALL
> -static void bpf_overflow_handler(struct perf_event *event,
> - struct perf_sample_data *data,
> - struct pt_regs *regs)
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs)
> {
> struct bpf_perf_event_data_kern ctx = {
> .data = data,

[...]

2024-02-16 00:12:51

by Andrii Nakryiko

Subject: Re: [RESEND PATCH v5 2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.

On Wed, Feb 14, 2024 at 9:40 AM Kyle Huey <[email protected]> wrote:
>
> Now that struct perf_event's orig_overflow_handler is gone, there's no need
> for the functions and macros to support looking past overflow_handler to
> orig_overflow_handler.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Acked-by: Will Deacon <[email protected]>
> Acked-by: Song Liu <[email protected]>
> Acked-by: Jiri Olsa <[email protected]>
> ---

oh, never mind what I said in the first patch about this :)

Acked-by: Andrii Nakryiko <[email protected]>

[...]

2024-02-16 00:14:23

by Andrii Nakryiko

Subject: Re: [RESEND PATCH v5 3/4] perf/bpf: Allow a bpf program to suppress all sample side effects

On Wed, Feb 14, 2024 at 9:40 AM Kyle Huey <[email protected]> wrote:
>
> Returning zero from a bpf program attached to a perf event already
> suppresses any data output. Return early from __perf_event_overflow() in
> this case so it will also suppress event_limit accounting, SIGTRAP
> generation, and F_ASYNC signalling.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Acked-by: Song Liu <[email protected]>
> Acked-by: Jiri Olsa <[email protected]>
> Acked-by: Namhyung Kim <[email protected]>
> ---
> kernel/events/core.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 24a718e7eb98..a329bec42c4d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9574,6 +9574,11 @@ static int __perf_event_overflow(struct perf_event *event,
>
> ret = __perf_event_account_interrupt(event, throttle);
>
> +#ifdef CONFIG_BPF_SYSCALL
> + if (event->prog && !bpf_overflow_handler(event, data, regs))
> + return ret;
> +#endif
> +
> /*
> * XXX event_limit might not quite work as expected on inherited
> * events
> @@ -9623,10 +9628,7 @@ static int __perf_event_overflow(struct perf_event *event,
> irq_work_queue(&event->pending_irq);
> }
>
> -#ifdef CONFIG_BPF_SYSCALL
> - if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> -#endif
> - READ_ONCE(event->overflow_handler)(event, data, regs);
> + READ_ONCE(event->overflow_handler)(event, data, regs);
>

Sorry, I haven't followed previous discussions, but why can't this
change be done as part of patch 1?

> if (*perf_event_fasync(event) && event->pending_kill) {
> event->pending_wakeup = 1;
> --
> 2.34.1
>

2024-02-16 00:25:10

by Andrii Nakryiko

Subject: Re: [RESEND PATCH v5 0/4] Combine perf and bpf for fast eval of hw breakpoint conditions]

On Wed, Feb 14, 2024 at 9:40 AM Kyle Huey <[email protected]> wrote:
>
> Peter, Ingo, could you take a look at this?
>
> ----
>
> rr, a userspace record and replay debugger[0], replays asynchronous events
> such as signals and context switches by essentially[1] setting a breakpoint
> at the address where the asynchronous event was delivered during recording
> with a condition that the program state matches the state when the event
> was delivered.
>
> Currently, rr uses software breakpoints that trap (via ptrace) to the
> supervisor, and evaluates the condition from the supervisor. If the
> asynchronous event is delivered in a tight loop (thus requiring the
> breakpoint condition to be repeatedly evaluated) the overhead can be
> immense. A patch to rr that uses hardware breakpoints via perf events with
> an attached BPF program to reject breakpoint hits where the condition is
> not satisfied reduces rr's replay overhead by 94% on a pathological (but a
> real customer-provided, not contrived) rr trace.
>
> The only obstacle to this approach is that while the kernel allows a BPF
> program to suppress sample output when a perf event overflows it does not
> suppress signalling the perf event fd or sending the perf event's SIGTRAP.
> This patch set redesigns __perf_overflow_handler() and
> bpf_overflow_handler() so that the former invokes the latter directly when
> appropriate rather than through the generic overflow handler machinery,
> passes the return code of the BPF program back to __perf_overflow_handler()
> to allow it to decide whether to execute the regular overflow handler,
> reorders bpf_overflow_handler() and the side effects of perf event
> overflow, changes __perf_overflow_handler() to suppress those side effects
> if the BPF program returns zero, and adds a selftest.
>
> The previous version of this patchset can be found at
> https://lore.kernel.org/linux-kernel/20240119001352.9396-1-khuey@kylehueycom/
>
> Changes since v4:
>
> Patches 1, 2, 3, 4 added various Acked-by.
>
> Patch 4 addresses additional nits from Song.
>
> v3 of this patchset can be found at
> https://lore.kernel.org/linux-kernel/[email protected]/
>
> Changes since v3:
>
> Patches 1, 2, 3 added various Acked-by.
>
> Patch 4 addresses Song's review comments by dropping signals_expected and the
> corresponding ASSERT_OKs, handling errors from signal(), and fixing multiline
> comment formatting.
>
> v2 of this patchset can be found at
> https://lore.kernel.org/linux-kernel/20231207163458.5554-1-khuey@kylehueycom/
>
> Changes since v2:
>
> Patches 1 and 2 were added from a suggestion by Namhyung Kim to refactor
> this code to implement this feature in a cleaner way. Patch 2 is separated
> for the benefit of the ARM arch maintainers.
>
> Patch 3 conceptually supersedes v2's patches 1 and 2, now with a cleaner
> implementation thanks to the earlier refactoring.
>
> Patch 4 is v2's patch 3, and addresses review comments about C++ style
> comments, getting a TRAP_PERF definition into the test, and unnecessary
> NULL checks.
>
> [0] https://rr-project.org/
> [1] Various optimizations exist to skip as much execution as possible
> before setting a breakpoint, and to determine a set of program state that
> is practical to check and verify.
>
>

The series LGTM; I'm just confused about why patch 1 and patch 3 are
separated. But regardless, for the series:

Acked-by: Andrii Nakryiko <[email protected]>

2024-02-16 02:01:02

by Kyle Huey

Subject: Re: [RESEND PATCH v5 3/4] perf/bpf: Allow a bpf program to suppress all sample side effects

On Thu, Feb 15, 2024 at 4:14 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Wed, Feb 14, 2024 at 9:40 AM Kyle Huey <[email protected]> wrote:
> >
> > Returning zero from a bpf program attached to a perf event already
> > suppresses any data output. Return early from __perf_event_overflow() in
> > this case so it will also suppress event_limit accounting, SIGTRAP
> > generation, and F_ASYNC signalling.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > Acked-by: Song Liu <[email protected]>
> > Acked-by: Jiri Olsa <[email protected]>
> > Acked-by: Namhyung Kim <[email protected]>
> > ---
> > kernel/events/core.c | 10 ++++++----
> > 1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 24a718e7eb98..a329bec42c4d 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -9574,6 +9574,11 @@ static int __perf_event_overflow(struct perf_event *event,
> >
> > ret = __perf_event_account_interrupt(event, throttle);
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > + if (event->prog && !bpf_overflow_handler(event, data, regs))
> > + return ret;
> > +#endif
> > +
> > /*
> > * XXX event_limit might not quite work as expected on inherited
> > * events
> > @@ -9623,10 +9628,7 @@ static int __perf_event_overflow(struct perf_event *event,
> > irq_work_queue(&event->pending_irq);
> > }
> >
> > -#ifdef CONFIG_BPF_SYSCALL
> > - if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> > -#endif
> > - READ_ONCE(event->overflow_handler)(event, data, regs);
> > + READ_ONCE(event->overflow_handler)(event, data, regs);
> >
>
> Sorry, I haven't followed previous discussions, but why can't this
> change be done as part of patch 1?

The idea was to refactor the code without making any behavior changes
(patches 1 and 2) and then to change the behavior (patch 3).
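A minimal, heavily simplified C model of that split may help. Everything below (`struct mock_event`, `mock_bpf_overflow_handler()`, `overflow_before()`, `overflow_after()`, and the counters) is a hypothetical stand-in, not kernel code, and the ordering of side effects is simplified. It contrasts the flow after patches 1 and 2, where a zero BPF verdict only skips the regular overflow handler, with the early return from patch 3, where a zero verdict also skips event_limit accounting, SIGTRAP generation, and fasync signalling (per the patch 3 description). Interrupt accounting (`__perf_event_account_interrupt()` in the diff) runs before the check in both versions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model; not the real struct perf_event. */
struct mock_event {
	int (*prog)(void);  /* models event->prog; NULL when no BPF program */
	int interrupts;     /* models __perf_event_account_interrupt() */
	int side_effects;   /* models event_limit, SIGTRAP, fasync signalling */
	int handler_calls;  /* models the regular overflow_handler */
};

/* Models bpf_overflow_handler(): returns the BPF program's verdict. */
static int mock_bpf_overflow_handler(struct mock_event *event)
{
	return event->prog();
}

/* After patches 1 and 2: a zero verdict skips only the regular handler. */
static void overflow_before(struct mock_event *event)
{
	event->interrupts++;
	event->side_effects++;
	if (!(event->prog && !mock_bpf_overflow_handler(event)))
		event->handler_calls++;
}

/* After patch 3: a zero verdict returns early, skipping the side effects too. */
static void overflow_after(struct mock_event *event)
{
	event->interrupts++;
	if (event->prog && !mock_bpf_overflow_handler(event))
		return;
	event->side_effects++;
	event->handler_calls++;
}

/* Hypothetical verdicts: 0 rejects the breakpoint hit, 1 accepts it. */
static int reject_hit(void) { return 0; }
static int accept_hit(void) { return 1; }
```

With `reject_hit` attached, `overflow_before()` still records a side effect while `overflow_after()` records none; with no program or with `accept_hit`, both versions behave the same, so only the rejecting case changes in patch 3.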

- Kyle

> > if (*perf_event_fasync(event) && event->pending_kill) {
> > event->pending_wakeup = 1;
> > --
> > 2.34.1
> >

2024-04-10 04:32:17

by Ingo Molnar

Subject: Re: [RESEND PATCH v5 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery


* Kyle Huey <[email protected]> wrote:

> To ultimately allow bpf programs attached to perf events to completely
> suppress all of the effects of a perf event overflow (rather than just the
> sample output, as they do today), call bpf_overflow_handler() from
> __perf_event_overflow() directly rather than modifying struct perf_event's
> overflow_handler. Return the bpf program's return value from
> bpf_overflow_handler() so that __perf_event_overflow() knows how to
> proceed. Remove the now unnecessary orig_overflow_handler from struct
> perf_event.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Suggested-by: Namhyung Kim <[email protected]>
> Acked-by: Song Liu <[email protected]>
> Acked-by: Jiri Olsa <[email protected]>
> ---
> include/linux/perf_event.h | 6 +-----
> kernel/events/core.c | 28 +++++++++++++++-------------
> 2 files changed, 16 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index d2a15c0c6f8a..c7f54fd74d89 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -810,7 +810,6 @@ struct perf_event {
> perf_overflow_handler_t overflow_handler;
> void *overflow_handler_context;
> #ifdef CONFIG_BPF_SYSCALL
> - perf_overflow_handler_t orig_overflow_handler;
> struct bpf_prog *prog;
> u64 bpf_cookie;
> #endif

Could we reduce the #ifdeffery please?

On distros CONFIG_BPF_SYSCALL is almost always enabled, so it's not like
this truly saves anything on real systems.

I'd suggest making the perf_event::prog and perf_event::bpf_cookie fields
unconditional.

> +#ifdef CONFIG_BPF_SYSCALL
> +static int bpf_overflow_handler(struct perf_event *event,
> + struct perf_sample_data *data,
> + struct pt_regs *regs);
> +#endif

If the function definitions are misordered then first do a patch that moves
the function earlier in the file, instead of slapping a random prototype
into a random place.

> - READ_ONCE(event->overflow_handler)(event, data, regs);
> +#ifdef CONFIG_BPF_SYSCALL
> + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> +#endif
> + READ_ONCE(event->overflow_handler)(event, data, regs);

This #ifdef would go away too - on !CONFIG_BPF_SYSCALL event->prog should
always be NULL.

Please keep the #ifdeffery reduction and function-moving patches separate
from these other changes.

Thanks,

Ingo

2024-04-10 04:36:35

by Ingo Molnar

Subject: Re: [RESEND PATCH v5 2/4] perf/bpf: Remove unneeded uses_default_overflow_handler.


* Kyle Huey <[email protected]> wrote:

> Now that struct perf_event's orig_overflow_handler is gone, there's no need
> for the functions and macros to support looking past overflow_handler to
> orig_overflow_handler.
>
> This patch is solely a refactoring and results in no behavior change.
>
> Signed-off-by: Kyle Huey <[email protected]>
> Acked-by: Will Deacon <[email protected]>
> Acked-by: Song Liu <[email protected]>
> Acked-by: Jiri Olsa <[email protected]>
> ---
> arch/arm/kernel/hw_breakpoint.c | 8 ++++----
> arch/arm64/kernel/hw_breakpoint.c | 4 ++--
> include/linux/perf_event.h | 16 ++--------------
> 3 files changed, 8 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index c7f54fd74d89..c8bd5bb6610c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1341,8 +1341,9 @@ extern int perf_event_output(struct perf_event *event,
> struct pt_regs *regs);
>
> static inline bool
> -__is_default_overflow_handler(perf_overflow_handler_t overflow_handler)
> +is_default_overflow_handler(struct perf_event *event)
> {
> + perf_overflow_handler_t overflow_handler = event->overflow_handler;
> if (likely(overflow_handler == perf_event_output_forward))

Please read the CodingStyle section about variable definition blocks and
newlines...
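For reference, a sketch of the function with the blank line the kernel coding style asks for between the local variable declaration and the first statement. The kernel types, the two default handlers, and the `likely()`/`unlikely()` macros are replaced by minimal userspace stand-ins (assumptions, so this compiles with GCC or Clang outside the kernel). The `perf_event_output_backward` check is not visible in the quoted hunk and is included as an assumption about the rest of the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel macros (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

struct perf_event;
struct perf_sample_data;
struct pt_regs;

typedef void (*perf_overflow_handler_t)(struct perf_event *,
					struct perf_sample_data *,
					struct pt_regs *);

/* Only the field this sketch needs. */
struct perf_event {
	perf_overflow_handler_t overflow_handler;
};

/* Stub default handlers; only their addresses matter here. */
static void perf_event_output_forward(struct perf_event *event,
				      struct perf_sample_data *data,
				      struct pt_regs *regs)
{
	(void)event; (void)data; (void)regs;
}

static void perf_event_output_backward(struct perf_event *event,
				       struct perf_sample_data *data,
				       struct pt_regs *regs)
{
	(void)event; (void)data; (void)regs;
}

static inline bool
is_default_overflow_handler(struct perf_event *event)
{
	perf_overflow_handler_t overflow_handler = event->overflow_handler;

	/* Blank line above separates declarations from statements. */
	if (likely(overflow_handler == perf_event_output_forward))
		return true;
	if (unlikely(overflow_handler == perf_event_output_backward))
		return true;
	return false;
}
```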

Also note the stray period in the title ...

How did this patch get to v5 and get acked by 3 people with such trivial
problems still present? ...

Thanks,

Ingo

2024-04-10 04:39:01

by Ingo Molnar

Subject: Re: [RESEND PATCH v5 0/4] Combine perf and bpf for fast eval of hw breakpoint conditions]


* Kyle Huey <[email protected]> wrote:

> Peter, Ingo, could you take a look at this?
>
> ----
>
> rr, a userspace record and replay debugger[0], replays asynchronous
> events such as signals and context switches by essentially[1] setting a
> breakpoint at the address where the asynchronous event was delivered
> during recording with a condition that the program state matches the
> state when the event was delivered.
>
> Currently, rr uses software breakpoints that trap (via ptrace) to the
> supervisor, and evaluates the condition from the supervisor. If the
> asynchronous event is delivered in a tight loop (thus requiring the
> breakpoint condition to be repeatedly evaluated) the overhead can be
> immense. A patch to rr that uses hardware breakpoints via perf events
> with an attached BPF program to reject breakpoint hits where the
> condition is not satisfied reduces rr's replay overhead by 94% on a
> pathological (but a real customer-provided, not contrived) rr trace.
>
> The only obstacle to this approach is that while the kernel allows a BPF
> program to suppress sample output when a perf event overflows, it does not
> suppress signalling the perf event fd or sending the perf event's
> SIGTRAP. This patch set redesigns __perf_event_overflow() and
> bpf_overflow_handler() so that the former invokes the latter directly
> when appropriate rather than through the generic overflow handler
> machinery, passes the return code of the BPF program back to
> __perf_event_overflow() to allow it to decide whether to execute the
> regular overflow handler, reorders bpf_overflow_handler() and the side
> effects of perf event overflow, changes __perf_event_overflow() to
> suppress those side effects if the BPF program returns zero, and adds a
> selftest.

I suppose this optimization makes sense.

Patch quality still needs to be improved though - see my review comments.

Thanks,

Ingo

2024-04-11 12:21:38

by Kyle Huey

Subject: Re: [RESEND PATCH v5 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Wed, Apr 10, 2024 at 12:32 AM Ingo Molnar <[email protected]> wrote:
>
>
> * Kyle Huey <[email protected]> wrote:
>
> > To ultimately allow bpf programs attached to perf events to completely
> > suppress all of the effects of a perf event overflow (rather than just the
> > sample output, as they do today), call bpf_overflow_handler() from
> > __perf_event_overflow() directly rather than modifying struct perf_event's
> > overflow_handler. Return the bpf program's return value from
> > bpf_overflow_handler() so that __perf_event_overflow() knows how to
> > proceed. Remove the now unnecessary orig_overflow_handler from struct
> > perf_event.
> >
> > This patch is solely a refactoring and results in no behavior change.
> >
> > Signed-off-by: Kyle Huey <[email protected]>
> > Suggested-by: Namhyung Kim <[email protected]>
> > Acked-by: Song Liu <[email protected]>
> > Acked-by: Jiri Olsa <[email protected]>
> > ---
> > include/linux/perf_event.h | 6 +-----
> > kernel/events/core.c | 28 +++++++++++++++-------------
> > 2 files changed, 16 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > index d2a15c0c6f8a..c7f54fd74d89 100644
> > --- a/include/linux/perf_event.h
> > +++ b/include/linux/perf_event.h
> > @@ -810,7 +810,6 @@ struct perf_event {
> > perf_overflow_handler_t overflow_handler;
> > void *overflow_handler_context;
> > #ifdef CONFIG_BPF_SYSCALL
> > - perf_overflow_handler_t orig_overflow_handler;
> > struct bpf_prog *prog;
> > u64 bpf_cookie;
> > #endif
>
> Could we reduce the #ifdeffery please?

Not easily.

> On distros CONFIG_BPF_SYSCALL is almost always enabled, so it's not like
> this truly saves anything on real systems.
>
> I'd suggest making the perf_event::prog and perf_event::bpf_cookie fields
> unconditional.

That's not sufficient. See below.

> > +#ifdef CONFIG_BPF_SYSCALL
> > +static int bpf_overflow_handler(struct perf_event *event,
> > + struct perf_sample_data *data,
> > + struct pt_regs *regs);
> > +#endif
>
> If the function definitions are misordered then first do a patch that moves
> the function earlier in the file, instead of slapping a random prototype
> into a random place.

Ok.

> > - READ_ONCE(event->overflow_handler)(event, data, regs);
> > +#ifdef CONFIG_BPF_SYSCALL
> > + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> > +#endif
> > + READ_ONCE(event->overflow_handler)(event, data, regs);
>
> This #ifdef would go away too - on !CONFIG_BPF_SYSCALL event->prog should
> always be NULL.

bpf_overflow_handler() is also #ifdef CONFIG_BPF_SYSCALL. It uses
bpf_prog_active, so that would need to be moved out of the ifdef,
which would require moving the DEFINE_PER_CPU out of bpf/syscall.c ...
or I'd have to add a !CONFIG_BPF_SYSCALL definition of
bpf_overflow_handler() that only returns 1 and never actually gets
called because the condition short-circuits on event->prog. Neither
seems like it makes my patch or the code simpler, especially since
this weird ifdef-that-applies-only-to-the-condition goes away in patch
3, where I actually change the behavior.

It feels like the root of your objection is that CONFIG_BPF_SYSCALL
exists at all. I could remove it in a separate patch if there's
consensus about that.

> Please keep the #ifdeffery reduction and function-moving patches separate
> from these other changes.
>
> Thanks,
>
> Ingo

- Kyle

2024-04-12 01:47:32

by Kyle Huey

Subject: Re: [RESEND PATCH v5 1/4] perf/bpf: Call bpf handler directly, not through overflow machinery

On Thu, Apr 11, 2024 at 8:11 AM Kyle Huey <[email protected]> wrote:
>
> On Wed, Apr 10, 2024 at 12:32 AM Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Kyle Huey <[email protected]> wrote:
> >
> > > To ultimately allow bpf programs attached to perf events to completely
> > > suppress all of the effects of a perf event overflow (rather than just the
> > > sample output, as they do today), call bpf_overflow_handler() from
> > > __perf_event_overflow() directly rather than modifying struct perf_event's
> > > overflow_handler. Return the bpf program's return value from
> > > bpf_overflow_handler() so that __perf_event_overflow() knows how to
> > > proceed. Remove the now unnecessary orig_overflow_handler from struct
> > > perf_event.
> > >
> > > This patch is solely a refactoring and results in no behavior change.
> > >
> > > Signed-off-by: Kyle Huey <[email protected]>
> > > Suggested-by: Namhyung Kim <[email protected]>
> > > Acked-by: Song Liu <[email protected]>
> > > Acked-by: Jiri Olsa <[email protected]>
> > > ---
> > > include/linux/perf_event.h | 6 +-----
> > > kernel/events/core.c | 28 +++++++++++++++-------------
> > > 2 files changed, 16 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> > > index d2a15c0c6f8a..c7f54fd74d89 100644
> > > --- a/include/linux/perf_event.h
> > > +++ b/include/linux/perf_event.h
> > > @@ -810,7 +810,6 @@ struct perf_event {
> > > perf_overflow_handler_t overflow_handler;
> > > void *overflow_handler_context;
> > > #ifdef CONFIG_BPF_SYSCALL
> > > - perf_overflow_handler_t orig_overflow_handler;
> > > struct bpf_prog *prog;
> > > u64 bpf_cookie;
> > > #endif
> >
> > Could we reduce the #ifdeffery please?
>
> Not easily.
>
> > On distros CONFIG_BPF_SYSCALL is almost always enabled, so it's not like
> > this truly saves anything on real systems.
> >
> > I'd suggest making the perf_event::prog and perf_event::bpf_cookie fields
> > unconditional.
>
> That's not sufficient. See below.
>
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > +static int bpf_overflow_handler(struct perf_event *event,
> > > + struct perf_sample_data *data,
> > > + struct pt_regs *regs);
> > > +#endif
> >
> > If the function definitions are misordered then first do a patch that moves
> > the function earlier in the file, instead of slapping a random prototype
> > into a random place.
>
> Ok.
>
> > > - READ_ONCE(event->overflow_handler)(event, data, regs);
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > + if (!(event->prog && !bpf_overflow_handler(event, data, regs)))
> > > +#endif
> > > + READ_ONCE(event->overflow_handler)(event, data, regs);
> >
> > This #ifdef would go away too - on !CONFIG_BPF_SYSCALL event->prog should
> > always be NULL.
>
> bpf_overflow_handler() is also #ifdef CONFIG_BPF_SYSCALL. It uses
> bpf_prog_active, so that would need to be moved out of the ifdef,
> which would require moving the DEFINE_PER_CPU out of bpf/syscall.c ...
> or I'd have to add a !CONFIG_BPF_SYSCALL definition of
> bpf_overflow_handler() that only returns 1 and never actually gets
> called because the condition short-circuits on event->prog. Neither
> seems like it makes my patch or the code simpler, especially since
> this weird ifdef-that-applies-only-to-the-condition goes away in patch
> 3, where I actually change the behavior.

After fiddling with this I think the stub definition of
bpf_overflow_handler() is fine. The other CONFIG_BPF_SYSCALL functions
in this file already have similar stubs. I'll send a new patch set.
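A minimal sketch of the stub approach, under simplifying assumptions: `struct perf_event` and `struct bpf_prog` are reduced to hypothetical fields (`verdict`, `handler_calls`), and `bpf_overflow_handler()` takes only the event rather than the real `(event, data, regs)` arguments. With the `prog` field unconditional, as Ingo suggested, the call site needs no `#ifdef`; without `CONFIG_BPF_SYSCALL`, `prog` stays NULL, so the `&&` short-circuits and the stub is never actually called.

```c
#include <assert.h>
#include <stddef.h>

/* Build with -DCONFIG_BPF_SYSCALL to model a kernel with BPF support. */

struct bpf_prog {
	int verdict; /* hypothetical: what the attached program returns */
};

struct perf_event {
	struct bpf_prog *prog; /* unconditional; stays NULL without BPF */
	int handler_calls;     /* hypothetical counter for overflow_handler */
};

#ifdef CONFIG_BPF_SYSCALL
static int bpf_overflow_handler(struct perf_event *event)
{
	/* Per the thread, the real function also uses bpf_prog_active. */
	return event->prog->verdict;
}
#else
/* Stub: unreachable in practice, since event->prog is always NULL here. */
static inline int bpf_overflow_handler(struct perf_event *event)
{
	(void)event;
	return 1;
}
#endif

static void overflow(struct perf_event *event)
{
	/* No #ifdef needed at the call site. */
	if (!(event->prog && !bpf_overflow_handler(event)))
		event->handler_calls++;
}
```

Building once with and once without `-DCONFIG_BPF_SYSCALL` exercises both branches; in this model, an event without a program runs the regular handler either way.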

- Kyle

> It feels like the root of your objection is that CONFIG_BPF_SYSCALL
> exists at all. I could remove it in a separate patch if there's
> consensus about that.
>
> > Please keep the #ifdeffery reduction and function-moving patches separate
> > from these other changes.
> >
> > Thanks,
> >
> > Ingo
>
> - Kyle