Hello,
There have been requests for more sophisticated perf event sample
filtering based on the sample data. The kernel recently gained support
for BPF programs to access perf sample data, and this is the userspace
part that enables such filtering.
This still has some rough edges and needs more work, but I'd like to
share the current state and get feedback on the direction and ideas
for further improvements.
v5 changes)
* rebased to the current acme/tmp.perf-tools-next
* update the documentation
v4 changes)
* add __maybe_unused for !BUILD_BPF_SKEL (Adrian)
* warn user if it misses sample flags (Adrian)
* improve error message for invalid input
* add Acked-by from Jiri
v3 changes)
* fix build error on old kernels/vmlinux (Arnaldo)
* move the logic to evlist__apply_filters (Jiri)
* improve error message for bad input
v2 changes)
* fix build error with the misc field (Jiri)
* add a destructor for filter expr (Ian)
* remove 'bpf:' prefix (Arnaldo)
* add '||' operator
The required kernel changes are now in the mainline tree (for v6.3).
perf record has a --filter option to set filters on the last specified
event on the command line. So far it has worked only for tracepoints
and Intel PT events. This patchset extends it to use BPF in order to
enable general sample filters for any event.
A new filter expression parser was added (using flex/bison) to process
the filter string. Right now, it only accepts very simple expressions
separated by commas. I'd like to keep the filter expression as simple
as possible.
A sample must satisfy all the filter expressions; otherwise it is
dropped. IOW filter expressions are connected with logical AND
operations unless "||" is used explicitly. So if the user has something
like 'A, B || C, D', then BOTH A and D must be true AND either B or C
must also be true.
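To make the AND/OR semantics concrete, here is a minimal userspace
sketch in C. It is not the code in this series: struct group,
filter_match() and example_filter() are hypothetical names, and each
term is reduced to a pre-evaluated boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one comma-separated group of terms joined by "||". */
struct group {
	const bool *terms;	/* pre-evaluated result of each term */
	size_t nr_terms;
};

/* A group passes if any of its "||" terms is true. */
static bool group_match(const struct group *g)
{
	for (size_t i = 0; i < g->nr_terms; i++) {
		if (g->terms[i])
			return true;
	}
	return false;
}

/* A sample is kept only if every comma-separated group passes. */
bool filter_match(const struct group *groups, size_t nr_groups)
{
	for (size_t i = 0; i < nr_groups; i++) {
		if (!group_match(&groups[i]))
			return false;
	}
	return true;
}

/* Evaluate 'A, B || C, D' for the given truth values. */
bool example_filter(bool a, bool b, bool c, bool d)
{
	const bool g0[] = { a }, g1[] = { b, c }, g2[] = { d };
	const struct group groups[] = {
		{ g0, 1 }, { g1, 2 }, { g2, 1 },
	};

	return filter_match(groups, 3);
}
```

With A and D true, the sample is kept when either B or C is true; it is
dropped as soon as A or D is false, whatever B and C are.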
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_lock)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
I plan to improve it with range expressions, e.g. for ip or addr, and
it should support symbols like the existing addr-filters. Also a cgroup
term should understand cgroup names and convert them to IDs.
Let's take a look at some examples. The following profiles a user
program on the command line. When the frequency mode is used, it starts
with a very small period (i.e. 1) and adjusts it on every interrupt (NMI)
to catch up with the given frequency.
$ ./perf record -- ./perf test -w noploop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.263 MB perf.data (4006 samples) ]
$ ./perf script -F pid,period,event,ip,sym | head
36695 1 cycles: ffffffffbab12ddd perf_event_exec
36695 1 cycles: ffffffffbab12ddd perf_event_exec
36695 5 cycles: ffffffffbab12ddd perf_event_exec
36695 46 cycles: ffffffffbab12de5 perf_event_exec
36695 1163 cycles: ffffffffba80a0eb x86_pmu_disable_all
36695 1304 cycles: ffffffffbaa19507 __hrtimer_get_next_event
36695 8143 cycles: ffffffffbaa186f9 __run_timers
36695 69040 cycles: ffffffffbaa0c393 rcu_segcblist_ready_cbs
36695 355117 cycles: 4b0da4 noploop
36695 321861 cycles: 4b0da4 noploop
If you want to skip the first few samples that have small periods, you
can do it like this (note that it requires root due to BPF).
$ sudo ./perf record -e cycles --filter 'period > 10000' -- ./perf test -w noploop
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.262 MB perf.data (3990 samples) ]
$ sudo ./perf script -F pid,period,event,ip,sym | head
39524 58253 cycles: ffffffffba97dac0 update_rq_clock
39524 232657 cycles: 4b0da2 noploop
39524 210981 cycles: 4b0da2 noploop
39524 282882 cycles: 4b0da4 noploop
39524 392180 cycles: 4b0da4 noploop
39524 456058 cycles: 4b0da4 noploop
39524 415196 cycles: 4b0da2 noploop
39524 462721 cycles: 4b0da4 noploop
39524 526272 cycles: 4b0da2 noploop
39524 565569 cycles: 4b0da4 noploop
A more useful example might be one that deals with precise memory
events. On AMD processors with IBS, you can keep only memory loads that
missed the L1 dTLB like below.
$ sudo ./perf record -ad -e ibs_op//p \
> --filter 'mem_op == load, mem_dtlb > l1_hit' sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.338 MB perf.data (15 samples) ]
$ sudo ./perf script -F data_src | head
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
49080142 |OP LOAD|LVL L1 hit|SNP N/A|TLB L2 hit|LCK N/A|BLK N/A
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
51088842 |OP LOAD|LVL L3 or Remote Cache (1 hop) hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
49080442 |OP LOAD|LVL L2 hit|SNP N/A|TLB L2 hit|LCK N/A|BLK N/A
51080242 |OP LOAD|LVL LFB/MAB hit|SNP N/A|TLB L2 miss|LCK N/A|BLK N/A
You can also check the number of dropped samples in LOST_SAMPLES events
using the perf report --stat command.
$ sudo ./perf report --stat
Aggregated stats:
TOTAL events: 16066
MMAP events: 22 ( 0.1%)
COMM events: 4166 (25.9%)
EXIT events: 1 ( 0.0%)
THROTTLE events: 816 ( 5.1%)
UNTHROTTLE events: 613 ( 3.8%)
FORK events: 4165 (25.9%)
SAMPLE events: 15 ( 0.1%)
MMAP2 events: 6133 (38.2%)
LOST_SAMPLES events: 1 ( 0.0%)
KSYMBOL events: 69 ( 0.4%)
BPF_EVENT events: 57 ( 0.4%)
FINISHED_ROUND events: 3 ( 0.0%)
ID_INDEX events: 1 ( 0.0%)
THREAD_MAP events: 1 ( 0.0%)
CPU_MAP events: 1 ( 0.0%)
TIME_CONV events: 1 ( 0.0%)
FINISHED_INIT events: 1 ( 0.0%)
ibs_op//p stats:
SAMPLE events: 15
LOST_SAMPLES events: 3991
Note that the total aggregated stats show 1 LOST_SAMPLES event but the
per-event stats show 3991 because that is the actual number of dropped
samples, while the aggregated stats count the number of records.
Maybe we need to change the per-event stats to 'LOST_SAMPLES count'
to avoid the confusion.
The code is available in the 'perf/bpf-filter-v5' branch at
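The difference between the two numbers can be sketched as follows.
This is only an illustration: struct lost_samples_record is a
hypothetical, simplified view of a LOST_SAMPLES record, and the two
helpers are not perf functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one LOST_SAMPLES record. */
struct lost_samples_record {
	uint64_t lost;	/* number of dropped samples carried by the record */
};

/* What the aggregated stats report: how many records were seen. */
size_t count_records(const struct lost_samples_record *recs, size_t nr)
{
	(void)recs;
	return nr;
}

/* What the per-event stats report: the sum of dropped samples. */
uint64_t count_dropped(const struct lost_samples_record *recs, size_t nr)
{
	uint64_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += recs[i].lost;
	return total;
}
```

For the run above, a single record carrying lost = 3991 gives 1 from
count_records() and 3991 from count_dropped().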
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Any feedback is welcome.
Thanks,
Namhyung
Namhyung Kim (10):
perf bpf filter: Introduce basic BPF filter expression
perf bpf filter: Implement event sample filtering
perf record: Add BPF event filter support
perf record: Record dropped sample count
perf bpf filter: Add 'pid' sample data support
perf bpf filter: Add more weight sample data support
perf bpf filter: Add data_src sample data support
perf bpf filter: Add logical OR operator
perf bpf filter: Show warning for missing sample flags
perf record: Update documentation for BPF filters
tools/lib/perf/include/perf/event.h | 2 +
tools/perf/Documentation/perf-record.txt | 60 +++++-
tools/perf/Makefile.perf | 2 +-
tools/perf/builtin-record.c | 40 ++--
tools/perf/util/Build | 16 ++
tools/perf/util/bpf-filter.c | 197 +++++++++++++++++++
tools/perf/util/bpf-filter.h | 49 +++++
tools/perf/util/bpf-filter.l | 159 +++++++++++++++
tools/perf/util/bpf-filter.y | 78 ++++++++
tools/perf/util/bpf_counter.c | 3 +-
tools/perf/util/bpf_skel/sample-filter.h | 27 +++
tools/perf/util/bpf_skel/sample_filter.bpf.c | 172 ++++++++++++++++
tools/perf/util/evlist.c | 25 ++-
tools/perf/util/evsel.c | 2 +
tools/perf/util/evsel.h | 7 +-
tools/perf/util/parse-events.c | 8 +-
tools/perf/util/session.c | 3 +-
17 files changed, 814 insertions(+), 36 deletions(-)
create mode 100644 tools/perf/util/bpf-filter.c
create mode 100644 tools/perf/util/bpf-filter.h
create mode 100644 tools/perf/util/bpf-filter.l
create mode 100644 tools/perf/util/bpf-filter.y
create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
base-commit: 4c290d4fa3aeed74e37637acaa1a787f194fe43d
--
2.40.0.rc1.284.g88254d51c5-goog
This implements a tiny parser for the filter expressions used for BPF.
Each expression will be converted to struct perf_bpf_filter_expr and
be passed to a BPF map.
For now, I'd like to start with the very basic comparisons like EQ or
GT. The LHS should be a term for sample data and the RHS is a number.
The expressions are connected by a comma. For example,
period > 10000
ip < 0x1000000000000, cpu == 3
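As a rough illustration of what one expression turns into, below is a
standalone C sketch. It uses sscanf() rather than the flex/bison parser
in this patch, handles a single term only, and knows just three terms.
struct filter_expr, parse_term() and the SAMPLE_* and OP_* names are
hypothetical stand-ins; the bit values mirror PERF_SAMPLE_IP,
PERF_SAMPLE_CPU and PERF_SAMPLE_PERIOD. Unlike the lexer here, base 0
in strtoull() also accepts octal.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bit values mirror PERF_SAMPLE_* in <linux/perf_event.h>. */
#define SAMPLE_IP	(1ULL << 0)
#define SAMPLE_CPU	(1ULL << 7)
#define SAMPLE_PERIOD	(1ULL << 8)

enum filter_op { OP_EQ, OP_NEQ, OP_GT, OP_GE, OP_LT, OP_LE, OP_AND };

/* Hypothetical counterpart of struct perf_bpf_filter_expr. */
struct filter_expr {
	enum filter_op op;
	uint64_t sample_flags;
	uint64_t val;
};

static const struct { const char *name; uint64_t flag; } terms[] = {
	{ "ip", SAMPLE_IP }, { "cpu", SAMPLE_CPU }, { "period", SAMPLE_PERIOD },
};

static const struct { const char *sym; enum filter_op op; } ops[] = {
	{ "==", OP_EQ }, { "!=", OP_NEQ }, { ">", OP_GT }, { ">=", OP_GE },
	{ "<", OP_LT }, { "<=", OP_LE }, { "&", OP_AND },
};

/* Parse one "<term> <operator> <number>" string; returns 0 on success. */
int parse_term(const char *str, struct filter_expr *expr)
{
	char name[16], sym[3], num[32], *end;
	size_t i;

	if (sscanf(str, " %15[a-z0-9_] %2[=!<>&] %31s", name, sym, num) != 3)
		return -1;

	/* Map the term name to its sample flag. */
	for (i = 0; i < sizeof(terms) / sizeof(terms[0]); i++)
		if (!strcmp(name, terms[i].name))
			break;
	if (i == sizeof(terms) / sizeof(terms[0]))
		return -1;
	expr->sample_flags = terms[i].flag;

	/* Map the operator symbol to its enum value. */
	for (i = 0; i < sizeof(ops) / sizeof(ops[0]); i++)
		if (!strcmp(sym, ops[i].sym))
			break;
	if (i == sizeof(ops) / sizeof(ops[0]))
		return -1;
	expr->op = ops[i].op;

	/* Base 0 accepts decimal and 0x-prefixed hex (and octal). */
	expr->val = strtoull(num, &end, 0);
	return *end == '\0' ? 0 : -1;
}
```

Under these assumptions, "period > 10000" yields OP_GT with the period
flag and value 10000, and an unknown term is rejected with -1.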
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/Build | 16 +++++++
tools/perf/util/bpf-filter.c | 37 ++++++++++++++++
tools/perf/util/bpf-filter.h | 36 ++++++++++++++++
tools/perf/util/bpf-filter.l | 82 ++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-filter.y | 54 ++++++++++++++++++++++++
5 files changed, 225 insertions(+)
create mode 100644 tools/perf/util/bpf-filter.c
create mode 100644 tools/perf/util/bpf-filter.h
create mode 100644 tools/perf/util/bpf-filter.l
create mode 100644 tools/perf/util/bpf-filter.y
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 8607575183a9..853ce987eb4f 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -154,6 +154,9 @@ perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter_cgroup.o
perf-$(CONFIG_PERF_BPF_SKEL) += bpf_ftrace.o
perf-$(CONFIG_PERF_BPF_SKEL) += bpf_off_cpu.o
+perf-$(CONFIG_PERF_BPF_SKEL) += bpf-filter.o
+perf-$(CONFIG_PERF_BPF_SKEL) += bpf-filter-flex.o
+perf-$(CONFIG_PERF_BPF_SKEL) += bpf-filter-bison.o
ifeq ($(CONFIG_LIBTRACEEVENT),y)
perf-$(CONFIG_PERF_BPF_SKEL) += bpf_lock_contention.o
@@ -267,6 +270,16 @@ $(OUTPUT)util/pmu-bison.c $(OUTPUT)util/pmu-bison.h: util/pmu.y
$(Q)$(call echo-cmd,bison)$(BISON) -v $< -d $(PARSER_DEBUG_BISON) $(BISON_FILE_PREFIX_MAP) \
-o $(OUTPUT)util/pmu-bison.c -p perf_pmu_
+$(OUTPUT)util/bpf-filter-flex.c $(OUTPUT)util/bpf-filter-flex.h: util/bpf-filter.l $(OUTPUT)util/bpf-filter-bison.c
+ $(call rule_mkdir)
+ $(Q)$(call echo-cmd,flex)$(FLEX) -o $(OUTPUT)util/bpf-filter-flex.c \
+ --header-file=$(OUTPUT)util/bpf-filter-flex.h $(PARSER_DEBUG_FLEX) $<
+
+$(OUTPUT)util/bpf-filter-bison.c $(OUTPUT)util/bpf-filter-bison.h: util/bpf-filter.y
+ $(call rule_mkdir)
+ $(Q)$(call echo-cmd,bison)$(BISON) -v $< -d $(PARSER_DEBUG_BISON) $(BISON_FILE_PREFIX_MAP) \
+ -o $(OUTPUT)util/bpf-filter-bison.c -p perf_bpf_filter_
+
FLEX_GE_26 := $(shell expr $(shell $(FLEX) --version | sed -e 's/flex \([0-9]\+\).\([0-9]\+\)/\1\2/g') \>\= 26)
ifeq ($(FLEX_GE_26),1)
flex_flags := -Wno-switch-enum -Wno-switch-default -Wno-unused-function -Wno-redundant-decls -Wno-sign-compare -Wno-unused-parameter -Wno-missing-prototypes -Wno-missing-declarations
@@ -280,6 +293,7 @@ endif
CFLAGS_parse-events-flex.o += $(flex_flags)
CFLAGS_pmu-flex.o += $(flex_flags)
CFLAGS_expr-flex.o += $(flex_flags)
+CFLAGS_bpf-filter-flex.o += $(flex_flags)
bison_flags := -DYYENABLE_NLS=0
BISON_GE_35 := $(shell expr $(shell $(BISON) --version | grep bison | sed -e 's/.\+ \([0-9]\+\).\([0-9]\+\)/\1\2/g') \>\= 35)
@@ -291,10 +305,12 @@ endif
CFLAGS_parse-events-bison.o += $(bison_flags)
CFLAGS_pmu-bison.o += -DYYLTYPE_IS_TRIVIAL=0 $(bison_flags)
CFLAGS_expr-bison.o += -DYYLTYPE_IS_TRIVIAL=0 $(bison_flags)
+CFLAGS_bpf-filter-bison.o += -DYYLTYPE_IS_TRIVIAL=0 $(bison_flags)
$(OUTPUT)util/parse-events.o: $(OUTPUT)util/parse-events-flex.c $(OUTPUT)util/parse-events-bison.c
$(OUTPUT)util/pmu.o: $(OUTPUT)util/pmu-flex.c $(OUTPUT)util/pmu-bison.c
$(OUTPUT)util/expr.o: $(OUTPUT)util/expr-flex.c $(OUTPUT)util/expr-bison.c
+$(OUTPUT)util/bpf-filter.o: $(OUTPUT)util/bpf-filter-flex.c $(OUTPUT)util/bpf-filter-bison.c
CFLAGS_bitmap.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
CFLAGS_find_bit.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
new file mode 100644
index 000000000000..c72e35d51240
--- /dev/null
+++ b/tools/perf/util/bpf-filter.c
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <stdlib.h>
+
+#include "util/bpf-filter.h"
+#include "util/bpf-filter-flex.h"
+#include "util/bpf-filter-bison.h"
+
+struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
+ enum perf_bpf_filter_op op,
+ unsigned long val)
+{
+ struct perf_bpf_filter_expr *expr;
+
+ expr = malloc(sizeof(*expr));
+ if (expr != NULL) {
+ expr->sample_flags = sample_flags;
+ expr->op = op;
+ expr->val = val;
+ }
+ return expr;
+}
+
+int perf_bpf_filter__parse(struct list_head *expr_head, const char *str)
+{
+ YY_BUFFER_STATE buffer;
+ int ret;
+
+ buffer = perf_bpf_filter__scan_string(str);
+
+ ret = perf_bpf_filter_parse(expr_head);
+
+ perf_bpf_filter__flush_buffer(buffer);
+ perf_bpf_filter__delete_buffer(buffer);
+ perf_bpf_filter_lex_destroy();
+
+ return ret;
+}
diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
new file mode 100644
index 000000000000..93a0d3de038c
--- /dev/null
+++ b/tools/perf/util/bpf-filter.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef PERF_UTIL_BPF_FILTER_H
+#define PERF_UTIL_BPF_FILTER_H
+
+#include <linux/list.h>
+
+enum perf_bpf_filter_op {
+ PBF_OP_EQ,
+ PBF_OP_NEQ,
+ PBF_OP_GT,
+ PBF_OP_GE,
+ PBF_OP_LT,
+ PBF_OP_LE,
+ PBF_OP_AND,
+};
+
+struct perf_bpf_filter_expr {
+ struct list_head list;
+ enum perf_bpf_filter_op op;
+ unsigned long sample_flags;
+ unsigned long val;
+};
+
+#ifdef HAVE_BPF_SKEL
+struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
+ enum perf_bpf_filter_op op,
+ unsigned long val);
+int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
+#else /* !HAVE_BPF_SKEL */
+static inline int perf_bpf_filter__parse(struct list_head *expr_head __maybe_unused,
+ const char *str __maybe_unused)
+{
+ return -ENOSYS;
+}
+#endif /* HAVE_BPF_SKEL*/
+#endif /* PERF_UTIL_BPF_FILTER_H */
diff --git a/tools/perf/util/bpf-filter.l b/tools/perf/util/bpf-filter.l
new file mode 100644
index 000000000000..f6c0b74ea285
--- /dev/null
+++ b/tools/perf/util/bpf-filter.l
@@ -0,0 +1,82 @@
+%option prefix="perf_bpf_filter_"
+%option noyywrap
+
+%{
+#include <stdio.h>
+#include <stdlib.h>
+#include <linux/perf_event.h>
+
+#include "bpf-filter.h"
+#include "bpf-filter-bison.h"
+
+static int sample(unsigned long sample_flag)
+{
+ perf_bpf_filter_lval.sample = sample_flag;
+ return BFT_SAMPLE;
+}
+
+static int operator(enum perf_bpf_filter_op op)
+{
+ perf_bpf_filter_lval.op = op;
+ return BFT_OP;
+}
+
+static int value(int base)
+{
+ long num;
+
+ errno = 0;
+ num = strtoul(perf_bpf_filter_text, NULL, base);
+ if (errno)
+ return BFT_ERROR;
+
+ perf_bpf_filter_lval.num = num;
+ return BFT_NUM;
+}
+
+static int error(const char *str)
+{
+ printf("perf_bpf_filter: Unexpected filter %s: %s\n", str, perf_bpf_filter_text);
+ return BFT_ERROR;
+}
+
+%}
+
+num_dec [0-9]+
+num_hex 0[Xx][0-9a-fA-F]+
+space [ \t]+
+ident [_a-zA-Z][_a-zA-Z0-9]+
+
+%%
+
+{num_dec} { return value(10); }
+{num_hex} { return value(16); }
+{space} { }
+
+ip { return sample(PERF_SAMPLE_IP); }
+id { return sample(PERF_SAMPLE_ID); }
+tid { return sample(PERF_SAMPLE_TID); }
+cpu { return sample(PERF_SAMPLE_CPU); }
+time { return sample(PERF_SAMPLE_TIME); }
+addr { return sample(PERF_SAMPLE_ADDR); }
+period { return sample(PERF_SAMPLE_PERIOD); }
+txn { return sample(PERF_SAMPLE_TRANSACTION); }
+weight { return sample(PERF_SAMPLE_WEIGHT); }
+phys_addr { return sample(PERF_SAMPLE_PHYS_ADDR); }
+code_pgsz { return sample(PERF_SAMPLE_CODE_PAGE_SIZE); }
+data_pgsz { return sample(PERF_SAMPLE_DATA_PAGE_SIZE); }
+
+"==" { return operator(PBF_OP_EQ); }
+"!=" { return operator(PBF_OP_NEQ); }
+">" { return operator(PBF_OP_GT); }
+"<" { return operator(PBF_OP_LT); }
+">=" { return operator(PBF_OP_GE); }
+"<=" { return operator(PBF_OP_LE); }
+"&" { return operator(PBF_OP_AND); }
+
+"," { return ','; }
+
+{ident} { return error("ident"); }
+. { return error("input"); }
+
+%%
diff --git a/tools/perf/util/bpf-filter.y b/tools/perf/util/bpf-filter.y
new file mode 100644
index 000000000000..13eca612ecca
--- /dev/null
+++ b/tools/perf/util/bpf-filter.y
@@ -0,0 +1,54 @@
+%parse-param {struct list_head *expr_head}
+%define parse.error verbose
+
+%{
+
+#include <stdio.h>
+#include <string.h>
+#include <linux/compiler.h>
+#include <linux/list.h>
+#include "bpf-filter.h"
+
+static void perf_bpf_filter_error(struct list_head *expr __maybe_unused,
+ char const *msg)
+{
+ printf("perf_bpf_filter: %s\n", msg);
+}
+
+%}
+
+%union
+{
+ unsigned long num;
+ unsigned long sample;
+ enum perf_bpf_filter_op op;
+ struct perf_bpf_filter_expr *expr;
+}
+
+%token BFT_SAMPLE BFT_OP BFT_ERROR BFT_NUM
+%type <expr> filter_term
+%destructor { free ($$); } <expr>
+%type <sample> BFT_SAMPLE
+%type <op> BFT_OP
+%type <num> BFT_NUM
+
+%%
+
+filter:
+filter ',' filter_term
+{
+ list_add_tail(&$3->list, expr_head);
+}
+|
+filter_term
+{
+ list_add_tail(&$1->list, expr_head);
+}
+
+filter_term:
+BFT_SAMPLE BFT_OP BFT_NUM
+{
+ $$ = perf_bpf_filter_expr__new($1, $2, $3);
+}
+
+%%
--
2.40.0.rc1.284.g88254d51c5-goog
The BPF program will be attached to a perf_event and be triggered when
it overflows. It iterates over the filters map and compares the sample
value according to each expression. If any of them fails, the sample
is dropped.
Also the sample needs to carry the data that an expression refers to,
so the program checks data->sample_flags against the flags given in the
expression. To access the sample data, it uses the
bpf_cast_to_kern_ctx() kfunc which was added in the v6.2 kernel.
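The matching logic, including the sample_flags check, can be modelled
in plain userspace C as below. This is a sketch and not the BPF program
itself: struct sample is a hypothetical stand-in for the kernel's sample
data, and the SAMPLE_* and OP_* names mirror PERF_SAMPLE_CPU,
PERF_SAMPLE_PERIOD and the PBF_OP_* values only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror PERF_SAMPLE_CPU and PERF_SAMPLE_PERIOD. */
#define SAMPLE_CPU	(1ULL << 7)
#define SAMPLE_PERIOD	(1ULL << 8)

enum filter_op { OP_EQ, OP_NEQ, OP_GT, OP_GE, OP_LT, OP_LE, OP_AND };

struct filter_entry {
	enum filter_op op;
	uint64_t flags;
	uint64_t value;
};

/* Hypothetical stand-in for the kernel's perf sample data. */
struct sample {
	uint64_t sample_flags;	/* which fields below are valid */
	uint64_t cpu;
	uint64_t period;
};

/* Return the field selected by the entry, or 0 if the sample lacks it. */
static uint64_t get_sample(const struct sample *s, const struct filter_entry *e)
{
	if ((s->sample_flags & e->flags) == 0)
		return 0;

	switch (e->flags) {
	case SAMPLE_CPU:
		return s->cpu;
	case SAMPLE_PERIOD:
		return s->period;
	default:
		return 0;
	}
}

static bool entry_match(uint64_t data, const struct filter_entry *e)
{
	switch (e->op) {
	case OP_EQ:  return data == e->value;
	case OP_NEQ: return data != e->value;
	case OP_GT:  return data > e->value;
	case OP_GE:  return data >= e->value;
	case OP_LT:  return data < e->value;
	case OP_LE:  return data <= e->value;
	case OP_AND: return (data & e->value) != 0;
	}
	return false;
}

/* Keep the sample only if every entry matches (logical AND). */
bool keep_sample(const struct sample *s, const struct filter_entry *entries,
		 size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (!entry_match(get_sample(s, &entries[i]), &entries[i]))
			return false;
	}
	return true;
}
```

Note that in this model a sample without the requested flag is compared
as if its value were 0, so 'period > 10000' drops it while a filter
like 'period < 100' would still let it through.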
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Makefile.perf | 2 +-
tools/perf/util/bpf-filter.c | 64 ++++++++++
tools/perf/util/bpf-filter.h | 26 ++--
tools/perf/util/bpf_skel/sample-filter.h | 24 ++++
tools/perf/util/bpf_skel/sample_filter.bpf.c | 126 +++++++++++++++++++
tools/perf/util/evsel.h | 7 +-
6 files changed, 236 insertions(+), 13 deletions(-)
create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index dc9dda09b076..ed6b6a070f79 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -1050,7 +1050,7 @@ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h
SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_OUT)/lock_contention.skel.h
-SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h
+SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h $(SKEL_OUT)/sample_filter.skel.h
$(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_OUTPUT) $(LIBSYMBOL_OUTPUT):
$(Q)$(MKDIR) -p $@
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index c72e35d51240..f20e1bc03778 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -1,10 +1,74 @@
/* SPDX-License-Identifier: GPL-2.0 */
#include <stdlib.h>
+#include <bpf/bpf.h>
+#include <linux/err.h>
+#include <internal/xyarray.h>
+
+#include "util/debug.h"
+#include "util/evsel.h"
+
#include "util/bpf-filter.h"
#include "util/bpf-filter-flex.h"
#include "util/bpf-filter-bison.h"
+#include "bpf_skel/sample-filter.h"
+#include "bpf_skel/sample_filter.skel.h"
+
+#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
+
+int perf_bpf_filter__prepare(struct evsel *evsel)
+{
+ int i, x, y, fd;
+ struct sample_filter_bpf *skel;
+ struct bpf_program *prog;
+ struct bpf_link *link;
+ struct perf_bpf_filter_expr *expr;
+
+ skel = sample_filter_bpf__open_and_load();
+ if (!skel) {
+ pr_err("Failed to load perf sample-filter BPF skeleton\n");
+ return -1;
+ }
+
+ i = 0;
+ fd = bpf_map__fd(skel->maps.filters);
+ list_for_each_entry(expr, &evsel->bpf_filters, list) {
+ struct perf_bpf_filter_entry entry = {
+ .op = expr->op,
+ .flags = expr->sample_flags,
+ .value = expr->val,
+ };
+ bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
+ i++;
+ }
+
+ prog = skel->progs.perf_sample_filter;
+ for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
+ for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
+ link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
+ if (IS_ERR(link)) {
+ pr_err("Failed to attach perf sample-filter program\n");
+ return PTR_ERR(link);
+ }
+ }
+ }
+ evsel->bpf_skel = skel;
+ return 0;
+}
+
+int perf_bpf_filter__destroy(struct evsel *evsel)
+{
+ struct perf_bpf_filter_expr *expr, *tmp;
+
+ list_for_each_entry_safe(expr, tmp, &evsel->bpf_filters, list) {
+ list_del(&expr->list);
+ free(expr);
+ }
+ sample_filter_bpf__destroy(evsel->bpf_skel);
+ return 0;
+}
+
struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
enum perf_bpf_filter_op op,
unsigned long val)
diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
index 93a0d3de038c..eb8e1ac43cdf 100644
--- a/tools/perf/util/bpf-filter.h
+++ b/tools/perf/util/bpf-filter.h
@@ -4,15 +4,7 @@
#include <linux/list.h>
-enum perf_bpf_filter_op {
- PBF_OP_EQ,
- PBF_OP_NEQ,
- PBF_OP_GT,
- PBF_OP_GE,
- PBF_OP_LT,
- PBF_OP_LE,
- PBF_OP_AND,
-};
+#include "bpf_skel/sample-filter.h"
struct perf_bpf_filter_expr {
struct list_head list;
@@ -21,16 +13,30 @@ struct perf_bpf_filter_expr {
unsigned long val;
};
+struct evsel;
+
#ifdef HAVE_BPF_SKEL
struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
enum perf_bpf_filter_op op,
unsigned long val);
int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
+int perf_bpf_filter__prepare(struct evsel *evsel);
+int perf_bpf_filter__destroy(struct evsel *evsel);
+
#else /* !HAVE_BPF_SKEL */
+
static inline int perf_bpf_filter__parse(struct list_head *expr_head __maybe_unused,
const char *str __maybe_unused)
{
- return -ENOSYS;
+ return -EOPNOTSUPP;
+}
+static inline int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
+{
+ return -EOPNOTSUPP;
+}
+static inline int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
+{
+ return -EOPNOTSUPP;
}
#endif /* HAVE_BPF_SKEL*/
#endif /* PERF_UTIL_BPF_FILTER_H */
diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
new file mode 100644
index 000000000000..862060bfda14
--- /dev/null
+++ b/tools/perf/util/bpf_skel/sample-filter.h
@@ -0,0 +1,24 @@
+#ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
+#define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
+
+#define MAX_FILTERS 32
+
+/* supported filter operations */
+enum perf_bpf_filter_op {
+ PBF_OP_EQ,
+ PBF_OP_NEQ,
+ PBF_OP_GT,
+ PBF_OP_GE,
+ PBF_OP_LT,
+ PBF_OP_LE,
+ PBF_OP_AND
+};
+
+/* BPF map entry for filtering */
+struct perf_bpf_filter_entry {
+ enum perf_bpf_filter_op op;
+ __u64 flags;
+ __u64 value;
+};
+
+#endif /* PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H */
\ No newline at end of file
diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
new file mode 100644
index 000000000000..c07256279c3e
--- /dev/null
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (c) 2023 Google
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <bpf/bpf_core_read.h>
+
+#include "sample-filter.h"
+
+/* BPF map that will be filled by user space */
+struct filters {
+ __uint(type, BPF_MAP_TYPE_ARRAY);
+ __type(key, int);
+ __type(value, struct perf_bpf_filter_entry);
+ __uint(max_entries, MAX_FILTERS);
+} filters SEC(".maps");
+
+int dropped;
+
+void *bpf_cast_to_kern_ctx(void *) __ksym;
+
+/* new kernel perf_sample_data definition */
+struct perf_sample_data___new {
+ __u64 sample_flags;
+} __attribute__((preserve_access_index));
+
+/* helper function to return the given perf sample data */
+static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
+ struct perf_bpf_filter_entry *entry)
+{
+ struct perf_sample_data___new *data = (void *)kctx->data;
+
+ if (!bpf_core_field_exists(data->sample_flags) ||
+ (data->sample_flags & entry->flags) == 0)
+ return 0;
+
+ switch (entry->flags) {
+ case PERF_SAMPLE_IP:
+ return kctx->data->ip;
+ case PERF_SAMPLE_ID:
+ return kctx->data->id;
+ case PERF_SAMPLE_TID:
+ return kctx->data->tid_entry.tid;
+ case PERF_SAMPLE_CPU:
+ return kctx->data->cpu_entry.cpu;
+ case PERF_SAMPLE_TIME:
+ return kctx->data->time;
+ case PERF_SAMPLE_ADDR:
+ return kctx->data->addr;
+ case PERF_SAMPLE_PERIOD:
+ return kctx->data->period;
+ case PERF_SAMPLE_TRANSACTION:
+ return kctx->data->txn;
+ case PERF_SAMPLE_WEIGHT:
+ return kctx->data->weight.full;
+ case PERF_SAMPLE_PHYS_ADDR:
+ return kctx->data->phys_addr;
+ case PERF_SAMPLE_CODE_PAGE_SIZE:
+ return kctx->data->code_page_size;
+ case PERF_SAMPLE_DATA_PAGE_SIZE:
+ return kctx->data->data_page_size;
+ default:
+ break;
+ }
+ return 0;
+}
+
+/* BPF program to be called from perf event overflow handler */
+SEC("perf_event")
+int perf_sample_filter(void *ctx)
+{
+ struct bpf_perf_event_data_kern *kctx;
+ struct perf_bpf_filter_entry *entry;
+ __u64 sample_data;
+ int i;
+
+ kctx = bpf_cast_to_kern_ctx(ctx);
+
+ for (i = 0; i < MAX_FILTERS; i++) {
+ int key = i; /* needed for verifier :( */
+
+ entry = bpf_map_lookup_elem(&filters, &key);
+ if (entry == NULL)
+ break;
+ sample_data = perf_get_sample(kctx, entry);
+
+ switch (entry->op) {
+ case PBF_OP_EQ:
+ if (!(sample_data == entry->value))
+ goto drop;
+ break;
+ case PBF_OP_NEQ:
+ if (!(sample_data != entry->value))
+ goto drop;
+ break;
+ case PBF_OP_GT:
+ if (!(sample_data > entry->value))
+ goto drop;
+ break;
+ case PBF_OP_GE:
+ if (!(sample_data >= entry->value))
+ goto drop;
+ break;
+ case PBF_OP_LT:
+ if (!(sample_data < entry->value))
+ goto drop;
+ break;
+ case PBF_OP_LE:
+ if (!(sample_data <= entry->value))
+ goto drop;
+ break;
+ case PBF_OP_AND:
+ if (!(sample_data & entry->value))
+ goto drop;
+ break;
+ }
+ }
+ /* generate sample data */
+ return 1;
+
+drop:
+ __sync_fetch_and_add(&dropped, 1);
+ return 0;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index c272c06565c0..68072ec655ce 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -150,8 +150,10 @@ struct evsel {
*/
struct bpf_counter_ops *bpf_counter_ops;
- /* for perf-stat -b */
- struct list_head bpf_counter_list;
+ union {
+ struct list_head bpf_counter_list; /* for perf-stat -b */
+ struct list_head bpf_filters; /* for perf-record --filter */
+ };
/* for perf-stat --use-bpf */
int bperf_leader_prog_fd;
@@ -159,6 +161,7 @@ struct evsel {
union {
struct bperf_leader_bpf *leader_skel;
struct bperf_follower_bpf *follower_skel;
+ void *bpf_skel;
};
unsigned long open_flags;
int precise_ip_original;
--
2.40.0.rc1.284.g88254d51c5-goog
Use the --filter option to set a BPF filter for generic events other
than tracepoints or Intel PT. The BPF program will check the sample data
and filter according to the expression.
For example, the below is a typical perf record run in frequency mode.
The sample period starts at 1 and increases gradually.
$ sudo ./perf record -e cycles true
$ sudo ./perf script
perf-exec 2272336 546683.916875: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916892: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916899: 3 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916905: 17 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916911: 100 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916917: 589 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916924: 3470 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
perf-exec 2272336 546683.916930: 20465 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
true 2272336 546683.916940: 119873 cycles: ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
true 2272336 546683.917003: 461349 cycles: ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
true 2272336 546683.917237: 635778 cycles: ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
When you add a BPF filter to get samples having periods greater than 1000,
the output would look like below:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 15 +++++++++++---
tools/perf/util/bpf_counter.c | 3 +--
tools/perf/util/evlist.c | 25 +++++++++++++++++-------
tools/perf/util/evsel.c | 2 ++
tools/perf/util/parse-events.c | 8 +++-----
5 files changed, 36 insertions(+), 17 deletions(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index ff815c2f67e8..122f71726eaa 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -119,9 +119,12 @@ OPTIONS
"perf report" to view group events together.
--filter=<filter>::
- Event filter. This option should follow an event selector (-e) which
- selects either tracepoint event(s) or a hardware trace PMU
- (e.g. Intel PT or CoreSight).
+ Event filter. This option should follow an event selector (-e).
+ If the event is a tracepoint, the filter string will be parsed by
+ the kernel. If the event is a hardware trace PMU (e.g. Intel PT
+ or CoreSight), it'll be processed as an address filter. Otherwise
+ it means a general filter using BPF which can be applied for any
+ kind of event.
- tracepoint filters
@@ -176,6 +179,12 @@ OPTIONS
Multiple filters can be separated with space or comma.
+ - bpf filters
+
+ A BPF filter can access the sample data and make a decision based on the
+ data. Users need to set an appropriate sample type to use the BPF
+ filter.
+
--exclude-perf::
Don't record events issued by perf itself. This option should follow
an event selector (-e) which selects tracepoint event(s). It adds a
diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
index aa78a15a6f0a..1b77436e067e 100644
--- a/tools/perf/util/bpf_counter.c
+++ b/tools/perf/util/bpf_counter.c
@@ -763,8 +763,7 @@ extern struct bpf_counter_ops bperf_cgrp_ops;
static inline bool bpf_counter_skip(struct evsel *evsel)
{
- return list_empty(&evsel->bpf_counter_list) &&
- evsel->follower_skel == NULL;
+ return evsel->bpf_counter_ops == NULL;
}
int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index b74e12239aec..cc491a037836 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -31,6 +31,7 @@
#include "util/evlist-hybrid.h"
#include "util/pmu.h"
#include "util/sample.h"
+#include "util/bpf-filter.h"
#include <signal.h>
#include <unistd.h>
#include <sched.h>
@@ -1086,17 +1087,27 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel)
int err = 0;
evlist__for_each_entry(evlist, evsel) {
- if (evsel->filter == NULL)
- continue;
-
/*
* filters only work for tracepoint event, which doesn't have cpu limit.
* So evlist and evsel should always be same.
*/
- err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
- if (err) {
- *err_evsel = evsel;
- break;
+ if (evsel->filter) {
+ err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
+ if (err) {
+ *err_evsel = evsel;
+ break;
+ }
+ }
+
+ /*
+ * non-tracepoint events can have BPF filters.
+ */
+ if (!list_empty(&evsel->bpf_filters)) {
+ err = perf_bpf_filter__prepare(evsel);
+ if (err) {
+ *err_evsel = evsel;
+ break;
+ }
}
}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a83d8cd5eb51..dc3faf005c3b 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -50,6 +50,7 @@
#include "off_cpu.h"
#include "../perf-sys.h"
#include "util/parse-branch-options.h"
+#include "util/bpf-filter.h"
#include <internal/xyarray.h>
#include <internal/lib.h>
#include <internal/threadmap.h>
@@ -1517,6 +1518,7 @@ void evsel__exit(struct evsel *evsel)
assert(list_empty(&evsel->core.node));
assert(evsel->evlist == NULL);
bpf_counter__destroy(evsel);
+ perf_bpf_filter__destroy(evsel);
evsel__free_counts(evsel);
perf_evsel__free_fd(&evsel->core);
perf_evsel__free_id(&evsel->core);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3b2e5bb3e852..6c5cf5244486 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -28,6 +28,7 @@
#include "perf.h"
#include "util/parse-events-hybrid.h"
#include "util/pmu-hybrid.h"
+#include "util/bpf-filter.h"
#include "tracepoint.h"
#include "thread_map.h"
@@ -2542,11 +2543,8 @@ static int set_filter(struct evsel *evsel, const void *arg)
perf_pmu__scan_file(pmu, "nr_addr_filters",
"%d", &nr_addr_filters);
- if (!nr_addr_filters) {
- fprintf(stderr,
- "This CPU does not support address filtering\n");
- return -1;
- }
+ if (!nr_addr_filters)
+ return perf_bpf_filter__parse(&evsel->bpf_filters, str);
if (evsel__append_addr_filter(evsel, str) < 0) {
fprintf(stderr,
--
2.40.0.rc1.284.g88254d51c5-goog
Add more description and examples.
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/Documentation/perf-record.txt | 47 +++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 122f71726eaa..680396c56bd1 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -183,7 +183,52 @@ OPTIONS
A BPF filter can access the sample data and make a decision based on the
data. Users need to set an appropriate sample type to use the BPF
- filter.
+ filter. BPF filters need root privilege.
+
+ The sample data field can be specified in lower case letter. Multiple
+ filters can be separated with comma. For example,
+
+ --filter 'period > 1000, cpu == 1'
+ or
+ --filter 'mem_op == load || mem_op == store, mem_lvl > l1'
+
+ The former filter only accepts samples with period greater than 1000 AND
+ CPU number 1. The latter one accepts either load or store memory
+ operations, but only with a memory level above L1. Since the
+ mem_op and mem_lvl fields come from the (memory) data_source, it'd only
+ work with some events which set the data_source field.
+
+ Also, the user should request collection of that information (with the -d option in
+ the above case). Otherwise, the following message will be shown.
+
+ $ sudo perf record -e cycles --filter 'mem_op == load'
+ Error: cycles event does not have PERF_SAMPLE_DATA_SRC
+ Hint: please add -d option to perf record.
+ failed to set filter "BPF" on event cycles with 22 (Invalid argument)
+
+ Essentially the BPF filter expression is:
+
+ <term> <operator> <value> (("," | "||") <term> <operator> <value>)*
+
+ The <term> can be one of:
+ ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
+ code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
+ p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
+ mem_dtlb, mem_blk, mem_hops
+
+ The <operator> can be one of:
+ ==, !=, >, >=, <, <=, &
+
+ The <value> can be one of:
+ <number> (for any term)
+ na, load, store, pfetch, exec (for mem_op)
+ l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
+ na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
+ remote (for mem_remote)
+ na, locked (for mem_lock)
+ na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
+ na, by_data, by_addr (for mem_blk)
+ hops0, hops1, hops2, hops3 (for mem_hops)
--exclude-perf::
Don't record events issued by perf itself. This option should follow
--
2.40.0.rc1.284.g88254d51c5-goog
For a BPF filter to work properly, users need to provide appropriate
options to enable the sample types. Otherwise the BPF program would
see an invalid value (i.e. always 0) and the filter won't work well.
Show an error message like below if sample types are missing.
$ sudo ./perf record -e cycles --filter 'addr < 100' true
Error: cycles event does not have PERF_SAMPLE_ADDR
Hint: please add -d option to perf record.
failed to set filter "BPF" on event cycles with 22 (Invalid argument)
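The check described above can be sketched in plain C as below. This is only
an illustration, not the code in the patch: the ex_* names are hypothetical,
the table is reduced to two entries, and the bit values are assumed to match
the corresponding PERF_SAMPLE_* flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror PERF_SAMPLE_ADDR and PERF_SAMPLE_DATA_SRC. */
#define EX_SAMPLE_ADDR		(1ULL << 3)
#define EX_SAMPLE_DATA_SRC	(1ULL << 15)

struct ex_sample_info {
	uint64_t type;
	const char *name;
	const char *option;	/* perf record option enabling it, or NULL */
};

static const struct ex_sample_info ex_table[] = {
	{ EX_SAMPLE_ADDR,     "PERF_SAMPLE_ADDR",     "-d" },
	{ EX_SAMPLE_DATA_SRC, "PERF_SAMPLE_DATA_SRC", "-d" },
};

/* Return the table entry whose type equals 'flag' exactly, or NULL. */
static const struct ex_sample_info *ex_find(uint64_t flag)
{
	size_t i;

	for (i = 0; i < sizeof(ex_table) / sizeof(ex_table[0]); i++) {
		if (ex_table[i].type == flag)
			return &ex_table[i];
	}
	return NULL;
}

/* Return 0 if 'sample_type' has the needed flag, else print a hint and return -1. */
static int ex_check(uint64_t sample_type, uint64_t needed)
{
	const struct ex_sample_info *info;

	if (sample_type & needed)
		return 0;

	info = ex_find(needed);
	if (info == NULL) {
		fprintf(stderr, "Error: event does not have sample flags %llx\n",
			(unsigned long long)needed);
		return -1;
	}

	fprintf(stderr, "Error: event does not have %s\n", info->name);
	if (info->option)
		fprintf(stderr, " Hint: please add %s option to perf record\n",
			info->option);
	return -1;
}
```

The point of the table is that the hint can name the exact perf record
option instead of only printing a raw flag value.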
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/builtin-record.c | 2 +-
tools/perf/util/bpf-filter.c | 62 ++++++++++++++++++++++++++++++++++++
2 files changed, 63 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 6df8b823859d..7b7e74a56346 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1353,7 +1353,7 @@ static int record__open(struct record *rec)
if (evlist__apply_filters(evlist, &pos)) {
pr_err("failed to set filter \"%s\" on event %s with %d (%s)\n",
- pos->filter, evsel__name(pos), errno,
+ pos->filter ?: "BPF", evsel__name(pos), errno,
str_error_r(errno, msg, sizeof(msg)));
rc = -1;
goto out;
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index bd638737e12f..0b30688d78a7 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -17,6 +17,64 @@
#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
+#define __PERF_SAMPLE_TYPE(st, opt) { st, #st, opt }
+#define PERF_SAMPLE_TYPE(_st, opt) __PERF_SAMPLE_TYPE(PERF_SAMPLE_##_st, opt)
+
+static const struct perf_sample_info {
+ u64 type;
+ const char *name;
+ const char *option;
+} sample_table[] = {
+ /* default sample flags */
+ PERF_SAMPLE_TYPE(IP, NULL),
+ PERF_SAMPLE_TYPE(TID, NULL),
+ PERF_SAMPLE_TYPE(PERIOD, NULL),
+ /* flags mostly set by default, but still have options */
+ PERF_SAMPLE_TYPE(ID, "--sample-identifier"),
+ PERF_SAMPLE_TYPE(CPU, "--sample-cpu"),
+ PERF_SAMPLE_TYPE(TIME, "-T"),
+ /* optional sample flags */
+ PERF_SAMPLE_TYPE(ADDR, "-d"),
+ PERF_SAMPLE_TYPE(DATA_SRC, "-d"),
+ PERF_SAMPLE_TYPE(PHYS_ADDR, "--phys-data"),
+ PERF_SAMPLE_TYPE(WEIGHT, "-W"),
+ PERF_SAMPLE_TYPE(WEIGHT_STRUCT, "-W"),
+ PERF_SAMPLE_TYPE(TRANSACTION, "--transaction"),
+ PERF_SAMPLE_TYPE(CODE_PAGE_SIZE, "--code-page-size"),
+ PERF_SAMPLE_TYPE(DATA_PAGE_SIZE, "--data-page-size"),
+};
+
+static const struct perf_sample_info *get_sample_info(u64 flags)
+{
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(sample_table); i++) {
+ if (sample_table[i].type == flags)
+ return &sample_table[i];
+ }
+ return NULL;
+}
+
+static int check_sample_flags(struct evsel *evsel, struct perf_bpf_filter_expr *expr)
+{
+ const struct perf_sample_info *info;
+
+ if (evsel->core.attr.sample_type & expr->sample_flags)
+ return 0;
+
+ info = get_sample_info(expr->sample_flags);
+ if (info == NULL) {
+ pr_err("Error: %s event does not have sample flags %lx\n",
+ evsel__name(evsel), expr->sample_flags);
+ return -1;
+ }
+
+ pr_err("Error: %s event does not have %s\n", evsel__name(evsel), info->name);
+ if (info->option)
+ pr_err(" Hint: please add %s option to perf record\n", info->option);
+ return -1;
+}
+
int perf_bpf_filter__prepare(struct evsel *evsel)
{
int i, x, y, fd;
@@ -40,6 +98,10 @@ int perf_bpf_filter__prepare(struct evsel *evsel)
.flags = expr->sample_flags,
.value = expr->val,
};
+
+ if (check_sample_flags(evsel, expr) < 0)
+ return -1;
+
bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
i++;
--
2.40.0.rc1.284.g88254d51c5-goog
Support two or more expressions connected with '||' as a group; the
group result is true when at least one of them is true. The new group
operators (GROUP_BEGIN and GROUP_END) are added to set up and check the
condition. As nested groups are not allowed, the group state is kept
in local variables.
For example, the following is to get samples only if the data source
memory level is L2 cache or the weight value is greater than 30.
$ sudo ./perf record -adW -e cpu/mem-loads/pp \
> --filter 'mem_lvl == l2 || weight > 30' -- sleep 1
$ sudo ./perf script -F data_src,weight
10668100842 |OP LOAD|LVL L3 or L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 47
11868100242 |OP LOAD|LVL LFB/MAB or LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 57
10668100842 |OP LOAD|LVL L3 or L3 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 56
10650100842 |OP LOAD|LVL L3 or L3 hit|SNP None|TLB L2 miss|LCK No|BLK N/A 144
10468100442 |OP LOAD|LVL L2 or L2 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 16
10468100442 |OP LOAD|LVL L2 or L2 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 20
11868100242 |OP LOAD|LVL LFB/MAB or LFB/MAB hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 189
1026a100142 |OP LOAD|LVL L1 or L1 hit|SNP None|TLB L1 or L2 hit|LCK Yes|BLK N/A 193
10468100442 |OP LOAD|LVL L2 or L2 hit|SNP None|TLB L1 or L2 hit|LCK No|BLK N/A 18
...
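To make the group semantics concrete, here is a rough userspace model of how
a flat entry list with GROUP_BEGIN and GROUP_END markers could be evaluated.
It is a sketch under assumptions, not the BPF program itself: the ex_* names,
the reduced operator set and the flat sample array are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of operators for illustration. */
enum ex_op { EX_OP_EQ, EX_OP_GT, EX_OP_GROUP_BEGIN, EX_OP_GROUP_END };

/* Hypothetical field indexes into a flat sample array. */
enum ex_field { EX_F_MEM_LVL, EX_F_WEIGHT, EX_F_MAX };

struct ex_entry {
	enum ex_op op;
	enum ex_field field;
	uint64_t value;
};

/* Return 1 to keep the sample, 0 to drop it. */
static int ex_filter(const struct ex_entry *e, size_t n, const uint64_t *sample)
{
	int in_group = 0, group_result = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int ok;

		switch (e[i].op) {
		case EX_OP_GROUP_BEGIN:
			in_group = 1;
			group_result = 0;
			continue;
		case EX_OP_GROUP_END:
			if (!group_result)
				return 0;	/* no member of the group matched */
			in_group = 0;
			continue;
		case EX_OP_EQ:
			ok = sample[e[i].field] == e[i].value;
			break;
		case EX_OP_GT:
			ok = sample[e[i].field] > e[i].value;
			break;
		default:
			ok = 0;
			break;
		}

		if (in_group)
			group_result |= ok;	/* OR within a group */
		else if (!ok)
			return 0;		/* AND outside a group */
	}
	return 1;
}

/* Model of 'mem_lvl == l2 || weight > 30', assuming l2 maps to 2. */
static int ex_demo(uint64_t mem_lvl, uint64_t weight)
{
	const struct ex_entry entries[] = {
		{ EX_OP_GROUP_BEGIN, EX_F_MEM_LVL, 0 },
		{ EX_OP_EQ,          EX_F_MEM_LVL, 2 },
		{ EX_OP_GT,          EX_F_WEIGHT,  30 },
		{ EX_OP_GROUP_END,   EX_F_MEM_LVL, 0 },
	};
	uint64_t sample[EX_F_MAX];

	sample[EX_F_MEM_LVL] = mem_lvl;
	sample[EX_F_WEIGHT] = weight;
	return ex_filter(entries, sizeof(entries) / sizeof(entries[0]), sample);
}
```

With this model, an L2 sample with weight 16 and an L3 sample with weight 47
are both kept, while an L3 sample with weight 10 would be dropped.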
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/bpf-filter.c | 25 +++++++++++++
tools/perf/util/bpf-filter.h | 1 +
tools/perf/util/bpf-filter.l | 1 +
tools/perf/util/bpf-filter.y | 25 +++++++++++--
tools/perf/util/bpf_skel/sample-filter.h | 6 ++--
tools/perf/util/bpf_skel/sample_filter.bpf.c | 38 +++++++++++++-------
6 files changed, 79 insertions(+), 17 deletions(-)
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index 743c69fd6cd4..bd638737e12f 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -42,8 +42,32 @@ int perf_bpf_filter__prepare(struct evsel *evsel)
};
bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
i++;
+
+ if (expr->op == PBF_OP_GROUP_BEGIN) {
+ struct perf_bpf_filter_expr *group;
+
+ list_for_each_entry(group, &expr->groups, list) {
+ struct perf_bpf_filter_entry group_entry = {
+ .op = group->op,
+ .part = group->part,
+ .flags = group->sample_flags,
+ .value = group->val,
+ };
+ bpf_map_update_elem(fd, &i, &group_entry, BPF_ANY);
+ i++;
+ }
+
+ memset(&entry, 0, sizeof(entry));
+ entry.op = PBF_OP_GROUP_END;
+ bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
+ i++;
+ }
}
+ if (i > MAX_FILTERS) {
+ pr_err("Too many filters: %d (max = %d)\n", i, MAX_FILTERS);
+ return -1;
+ }
prog = skel->progs.perf_sample_filter;
for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
@@ -89,6 +113,7 @@ struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flag
expr->part = part;
expr->op = op;
expr->val = val;
+ INIT_LIST_HEAD(&expr->groups);
}
return expr;
}
diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
index 3f8827bd965f..7afd159411b8 100644
--- a/tools/perf/util/bpf-filter.h
+++ b/tools/perf/util/bpf-filter.h
@@ -8,6 +8,7 @@
struct perf_bpf_filter_expr {
struct list_head list;
+ struct list_head groups;
enum perf_bpf_filter_op op;
int part;
unsigned long sample_flags;
diff --git a/tools/perf/util/bpf-filter.l b/tools/perf/util/bpf-filter.l
index 3e66b7a0215e..d4ff0f1345cd 100644
--- a/tools/perf/util/bpf-filter.l
+++ b/tools/perf/util/bpf-filter.l
@@ -151,6 +151,7 @@ hops2 { return constant(PERF_MEM_HOPS_2); }
hops3 { return constant(PERF_MEM_HOPS_3); }
"," { return ','; }
+"||" { return BFT_LOGICAL_OR; }
{ident} { return error("ident"); }
. { return error("input"); }
diff --git a/tools/perf/util/bpf-filter.y b/tools/perf/util/bpf-filter.y
index 0ca6532afd8d..07d6c7926c13 100644
--- a/tools/perf/util/bpf-filter.y
+++ b/tools/perf/util/bpf-filter.y
@@ -28,8 +28,8 @@ static void perf_bpf_filter_error(struct list_head *expr __maybe_unused,
struct perf_bpf_filter_expr *expr;
}
-%token BFT_SAMPLE BFT_OP BFT_ERROR BFT_NUM
-%type <expr> filter_term
+%token BFT_SAMPLE BFT_OP BFT_ERROR BFT_NUM BFT_LOGICAL_OR
+%type <expr> filter_term filter_expr
%destructor { free ($$); } <expr>
%type <sample> BFT_SAMPLE
%type <op> BFT_OP
@@ -49,6 +49,27 @@ filter_term
}
filter_term:
+filter_term BFT_LOGICAL_OR filter_expr
+{
+ struct perf_bpf_filter_expr *expr;
+
+ if ($1->op == PBF_OP_GROUP_BEGIN) {
+ expr = $1;
+ } else {
+ expr = perf_bpf_filter_expr__new(0, 0, PBF_OP_GROUP_BEGIN, 1);
+ list_add_tail(&$1->list, &expr->groups);
+ }
+ expr->val++;
+ list_add_tail(&$3->list, &expr->groups);
+ $$ = expr;
+}
+|
+filter_expr
+{
+ $$ = $1;
+}
+
+filter_expr:
BFT_SAMPLE BFT_OP BFT_NUM
{
$$ = perf_bpf_filter_expr__new($1.type, $1.part, $2, $3);
diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
index 6b9fd554ad7b..2e96e1ab084a 100644
--- a/tools/perf/util/bpf_skel/sample-filter.h
+++ b/tools/perf/util/bpf_skel/sample-filter.h
@@ -1,7 +1,7 @@
#ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
#define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
-#define MAX_FILTERS 32
+#define MAX_FILTERS 64
/* supported filter operations */
enum perf_bpf_filter_op {
@@ -11,7 +11,9 @@ enum perf_bpf_filter_op {
PBF_OP_GE,
PBF_OP_LT,
PBF_OP_LE,
- PBF_OP_AND
+ PBF_OP_AND,
+ PBF_OP_GROUP_BEGIN,
+ PBF_OP_GROUP_END,
};
/* BPF map entry for filtering */
diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index 88dbc788d257..57e3c67d6d37 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -99,6 +99,14 @@ static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
return 0;
}
+#define CHECK_RESULT(data, op, val) \
+ if (!(data op val)) { \
+ if (!in_group) \
+ goto drop; \
+ } else if (in_group) { \
+ group_result = 1; \
+ }
+
/* BPF program to be called from perf event overflow handler */
SEC("perf_event")
int perf_sample_filter(void *ctx)
@@ -106,6 +114,8 @@ int perf_sample_filter(void *ctx)
struct bpf_perf_event_data_kern *kctx;
struct perf_bpf_filter_entry *entry;
__u64 sample_data;
+ int in_group = 0;
+ int group_result = 0;
int i;
kctx = bpf_cast_to_kern_ctx(ctx);
@@ -120,32 +130,34 @@ int perf_sample_filter(void *ctx)
switch (entry->op) {
case PBF_OP_EQ:
- if (!(sample_data == entry->value))
- goto drop;
+ CHECK_RESULT(sample_data, ==, entry->value)
break;
case PBF_OP_NEQ:
- if (!(sample_data != entry->value))
- goto drop;
+ CHECK_RESULT(sample_data, !=, entry->value)
break;
case PBF_OP_GT:
- if (!(sample_data > entry->value))
- goto drop;
+ CHECK_RESULT(sample_data, >, entry->value)
break;
case PBF_OP_GE:
- if (!(sample_data >= entry->value))
- goto drop;
+ CHECK_RESULT(sample_data, >=, entry->value)
break;
case PBF_OP_LT:
- if (!(sample_data < entry->value))
- goto drop;
+ CHECK_RESULT(sample_data, <, entry->value)
break;
case PBF_OP_LE:
- if (!(sample_data <= entry->value))
- goto drop;
+ CHECK_RESULT(sample_data, <=, entry->value)
break;
case PBF_OP_AND:
- if (!(sample_data & entry->value))
+ CHECK_RESULT(sample_data, &, entry->value)
+ break;
+ case PBF_OP_GROUP_BEGIN:
+ in_group = 1;
+ group_result = 0;
+ break;
+ case PBF_OP_GROUP_END:
+ if (group_result == 0)
goto drop;
+ in_group = 0;
break;
}
}
--
2.40.0.rc1.284.g88254d51c5-goog
The data_src has many entries to express memory behaviors. Add each
term separately so that users can combine them for their purpose.
I didn't add a prefix to the constants, for simplicity, as they are mostly
distinguishable, but I had to use names like l1_miss and l2_hit for mem_dtlb
since mem_lvl has different values for the same names. Note that I made
mem_lvl an alias of mem_lvlnum, as the original mem_lvl field is deprecated now.
According to the comment in the UAPI header, users should use a mix
of mem_lvlnum, mem_remote and mem_snoop. Also, the SNOOPX bits are
concatenated to mem_snoop for simplicity.
The following terms are used for data_src and the corresponding perf
sample data fields:
* mem_op : { load, store, pfetch, exec }
* mem_lvl: { l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem }
* mem_snoop: { none, hit, miss, hitm, fwd, peer }
* mem_remote: { remote }
* mem_lock: { locked }
* mem_dtlb: { l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault }
* mem_blk: { by_data, by_addr }
* mem_hops: { hops0, hops1, hops2, hops3 }
We can now use a filter expression like below:
'mem_op == load, mem_lvl <= l2, mem_dtlb == l1_hit'
'mem_dtlb == l2_miss, mem_hops > hops1'
'mem_lvl == ram, mem_remote == 1'
Note that 'na' is shared among the terms as it has the same value except
for mem_lvl. I don't have a good idea for handling that yet.
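One thing worth illustrating is that the mem_dtlb constants are bit
combinations, so '==' and '&' behave differently. The sketch below uses
hypothetical EX_TLB_* macros whose values are assumed to mirror the
PERF_MEM_TLB_* definitions in the UAPI header; the helper names are made up
for the example.

```c
#include <stdint.h>

/* Assumed to mirror PERF_MEM_TLB_{HIT,MISS,L1,L2}. */
#define EX_TLB_HIT	0x02
#define EX_TLB_MISS	0x04
#define EX_TLB_L1	0x08
#define EX_TLB_L2	0x10

/* What the 'l1_hit' and 'any_hit' constants would expand to. */
#define EX_L1_HIT	(EX_TLB_L1 | EX_TLB_HIT)
#define EX_ANY_HIT	(EX_TLB_HIT)

/* Model of 'mem_dtlb == <val>': exact match of all bits. */
static int ex_dtlb_eq(uint64_t dtlb, uint64_t val)
{
	return dtlb == val;
}

/* Model of 'mem_dtlb & <val>': true if any of the bits is set. */
static int ex_dtlb_and(uint64_t dtlb, uint64_t val)
{
	return (dtlb & val) != 0;
}
```

A sample reported as "TLB L1 or L2 hit" would carry the L1, L2 and HIT bits
together, so under these assumptions 'mem_dtlb == l1_hit' would not match it
while 'mem_dtlb & any_hit' would.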
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/bpf-filter.l | 61 ++++++++++++++++++++
tools/perf/util/bpf_skel/sample_filter.bpf.c | 23 ++++++++
2 files changed, 84 insertions(+)
diff --git a/tools/perf/util/bpf-filter.l b/tools/perf/util/bpf-filter.l
index 419f923b35c0..3e66b7a0215e 100644
--- a/tools/perf/util/bpf-filter.l
+++ b/tools/perf/util/bpf-filter.l
@@ -42,6 +42,12 @@ static int value(int base)
return BFT_NUM;
}
+static int constant(int val)
+{
+ perf_bpf_filter_lval.num = val;
+ return BFT_NUM;
+}
+
static int error(const char *str)
{
printf("perf_bpf_filter: Unexpected filter %s: %s\n", str, perf_bpf_filter_text);
@@ -80,6 +86,15 @@ retire_lat { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 3); } /* alias for we
phys_addr { return sample(PERF_SAMPLE_PHYS_ADDR); }
code_pgsz { return sample(PERF_SAMPLE_CODE_PAGE_SIZE); }
data_pgsz { return sample(PERF_SAMPLE_DATA_PAGE_SIZE); }
+mem_op { return sample_part(PERF_SAMPLE_DATA_SRC, 1); }
+mem_lvlnum { return sample_part(PERF_SAMPLE_DATA_SRC, 2); }
+mem_lvl { return sample_part(PERF_SAMPLE_DATA_SRC, 2); } /* alias for mem_lvlnum */
+mem_snoop { return sample_part(PERF_SAMPLE_DATA_SRC, 3); } /* include snoopx */
+mem_remote { return sample_part(PERF_SAMPLE_DATA_SRC, 4); }
+mem_lock { return sample_part(PERF_SAMPLE_DATA_SRC, 5); }
+mem_dtlb { return sample_part(PERF_SAMPLE_DATA_SRC, 6); }
+mem_blk { return sample_part(PERF_SAMPLE_DATA_SRC, 7); }
+mem_hops { return sample_part(PERF_SAMPLE_DATA_SRC, 8); }
"==" { return operator(PBF_OP_EQ); }
"!=" { return operator(PBF_OP_NEQ); }
@@ -89,6 +104,52 @@ data_pgsz { return sample(PERF_SAMPLE_DATA_PAGE_SIZE); }
"<=" { return operator(PBF_OP_LE); }
"&" { return operator(PBF_OP_AND); }
+na { return constant(PERF_MEM_OP_NA); }
+load { return constant(PERF_MEM_OP_LOAD); }
+store { return constant(PERF_MEM_OP_STORE); }
+pfetch { return constant(PERF_MEM_OP_PFETCH); }
+exec { return constant(PERF_MEM_OP_EXEC); }
+
+l1 { return constant(PERF_MEM_LVLNUM_L1); }
+l2 { return constant(PERF_MEM_LVLNUM_L2); }
+l3 { return constant(PERF_MEM_LVLNUM_L3); }
+l4 { return constant(PERF_MEM_LVLNUM_L4); }
+cxl { return constant(PERF_MEM_LVLNUM_CXL); }
+io { return constant(PERF_MEM_LVLNUM_IO); }
+any_cache { return constant(PERF_MEM_LVLNUM_ANY_CACHE); }
+lfb { return constant(PERF_MEM_LVLNUM_LFB); }
+ram { return constant(PERF_MEM_LVLNUM_RAM); }
+pmem { return constant(PERF_MEM_LVLNUM_PMEM); }
+
+none { return constant(PERF_MEM_SNOOP_NONE); }
+hit { return constant(PERF_MEM_SNOOP_HIT); }
+miss { return constant(PERF_MEM_SNOOP_MISS); }
+hitm { return constant(PERF_MEM_SNOOP_HITM); }
+fwd { return constant(PERF_MEM_SNOOPX_FWD); }
+peer { return constant(PERF_MEM_SNOOPX_PEER); }
+
+remote { return constant(PERF_MEM_REMOTE_REMOTE); }
+
+locked { return constant(PERF_MEM_LOCK_LOCKED); }
+
+l1_hit { return constant(PERF_MEM_TLB_L1 | PERF_MEM_TLB_HIT); }
+l1_miss { return constant(PERF_MEM_TLB_L1 | PERF_MEM_TLB_MISS); }
+l2_hit { return constant(PERF_MEM_TLB_L2 | PERF_MEM_TLB_HIT); }
+l2_miss { return constant(PERF_MEM_TLB_L2 | PERF_MEM_TLB_MISS); }
+any_hit { return constant(PERF_MEM_TLB_HIT); }
+any_miss { return constant(PERF_MEM_TLB_MISS); }
+walk { return constant(PERF_MEM_TLB_WK); }
+os { return constant(PERF_MEM_TLB_OS); }
+fault { return constant(PERF_MEM_TLB_OS); } /* alias for os */
+
+by_data { return constant(PERF_MEM_BLK_DATA); }
+by_addr { return constant(PERF_MEM_BLK_ADDR); }
+
+hops0 { return constant(PERF_MEM_HOPS_0); }
+hops1 { return constant(PERF_MEM_HOPS_1); }
+hops2 { return constant(PERF_MEM_HOPS_2); }
+hops3 { return constant(PERF_MEM_HOPS_3); }
+
"," { return ','; }
{ident} { return error("ident"); }
diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index d930401c5bfc..88dbc788d257 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -70,6 +70,29 @@ static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
return kctx->data->code_page_size;
case PERF_SAMPLE_DATA_PAGE_SIZE:
return kctx->data->data_page_size;
+ case PERF_SAMPLE_DATA_SRC:
+ if (entry->part == 1)
+ return kctx->data->data_src.mem_op;
+ if (entry->part == 2)
+ return kctx->data->data_src.mem_lvl_num;
+ if (entry->part == 3) {
+ __u32 snoop = kctx->data->data_src.mem_snoop;
+ __u32 snoopx = kctx->data->data_src.mem_snoopx;
+
+ return (snoopx << 5) | snoop;
+ }
+ if (entry->part == 4)
+ return kctx->data->data_src.mem_remote;
+ if (entry->part == 5)
+ return kctx->data->data_src.mem_lock;
+ if (entry->part == 6)
+ return kctx->data->data_src.mem_dtlb;
+ if (entry->part == 7)
+ return kctx->data->data_src.mem_blk;
+ if (entry->part == 8)
+ return kctx->data->data_src.mem_hops;
+ /* return the whole word */
+ return kctx->data->data_src.val;
default:
break;
}
--
2.40.0.rc1.284.g88254d51c5-goog
When BPF filters are used, an event might drop some samples. It'd be nice
if perf could report how many samples were dropped. As the LOST_SAMPLES
event can carry similar information, let's use it for BPF filters.
To indicate that the count comes from BPF filters, add a new misc flag
and do not display the CPU load warning for such records.
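The accounting rule can be modelled as below. This is a simplified sketch
with hypothetical ex_* names; only the flag value (1 << 15) is taken from
the patch, and the real code operates on the perf event header rather than
on loose arguments.

```c
#include <stdint.h>

/* Same bit as PERF_RECORD_MISC_LOST_SAMPLES_BPF in the patch. */
#define EX_MISC_LOST_SAMPLES_BPF	(1 << 15)

/*
 * Return 1 if a LOST_SAMPLES record with this misc field should count
 * toward the total that triggers the "check CPU load" style warning.
 */
static int ex_counts_as_overload(uint16_t misc)
{
	return !(misc & EX_MISC_LOST_SAMPLES_BPF);
}

/* Return the updated total after seeing one LOST_SAMPLES record. */
static uint64_t ex_add_lost(uint64_t total, uint16_t misc, uint64_t lost)
{
	if (ex_counts_as_overload(misc))
		total += lost;
	return total;
}
```

Samples dropped by the filter are expected, so keeping them out of the
total avoids a misleading warning while the record itself is still
delivered to the tool callback.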
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/lib/perf/include/perf/event.h | 2 ++
tools/perf/builtin-record.c | 38 ++++++++++++++++++-----------
tools/perf/util/bpf-filter.c | 7 ++++++
tools/perf/util/bpf-filter.h | 5 ++++
tools/perf/util/session.c | 3 ++-
5 files changed, 40 insertions(+), 15 deletions(-)
diff --git a/tools/lib/perf/include/perf/event.h b/tools/lib/perf/include/perf/event.h
index ad47d7b31046..51b9338f4c11 100644
--- a/tools/lib/perf/include/perf/event.h
+++ b/tools/lib/perf/include/perf/event.h
@@ -70,6 +70,8 @@ struct perf_record_lost {
__u64 lost;
};
+#define PERF_RECORD_MISC_LOST_SAMPLES_BPF (1 << 15)
+
struct perf_record_lost_samples {
struct perf_event_header header;
__u64 lost;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 33ebe42b025e..6df8b823859d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -52,6 +52,7 @@
#include "util/pmu-hybrid.h"
#include "util/evlist-hybrid.h"
#include "util/off_cpu.h"
+#include "util/bpf-filter.h"
#include "asm/bug.h"
#include "perf.h"
#include "cputopo.h"
@@ -1856,24 +1857,16 @@ record__switch_output(struct record *rec, bool at_exit)
return fd;
}
-static void __record__read_lost_samples(struct record *rec, struct evsel *evsel,
+static void __record__save_lost_samples(struct record *rec, struct evsel *evsel,
struct perf_record_lost_samples *lost,
- int cpu_idx, int thread_idx)
+ int cpu_idx, int thread_idx, u64 lost_count,
+ u16 misc_flag)
{
- struct perf_counts_values count;
struct perf_sample_id *sid;
struct perf_sample sample = {};
int id_hdr_size;
- if (perf_evsel__read(&evsel->core, cpu_idx, thread_idx, &count) < 0) {
- pr_debug("read LOST count failed\n");
- return;
- }
-
- if (count.lost == 0)
- return;
-
- lost->lost = count.lost;
+ lost->lost = lost_count;
if (evsel->core.ids) {
sid = xyarray__entry(evsel->core.sample_id, cpu_idx, thread_idx);
sample.id = sid->id;
@@ -1882,6 +1875,7 @@ static void __record__read_lost_samples(struct record *rec, struct evsel *evsel,
id_hdr_size = perf_event__synthesize_id_sample((void *)(lost + 1),
evsel->core.attr.sample_type, &sample);
lost->header.size = sizeof(*lost) + id_hdr_size;
+ lost->header.misc = misc_flag;
record__write(rec, NULL, lost, lost->header.size);
}
@@ -1905,6 +1899,7 @@ static void record__read_lost_samples(struct record *rec)
evlist__for_each_entry(session->evlist, evsel) {
struct xyarray *xy = evsel->core.sample_id;
+ u64 lost_count;
if (xy == NULL || evsel->core.fd == NULL)
continue;
@@ -1916,12 +1911,27 @@ static void record__read_lost_samples(struct record *rec)
for (int x = 0; x < xyarray__max_x(xy); x++) {
for (int y = 0; y < xyarray__max_y(xy); y++) {
- __record__read_lost_samples(rec, evsel, lost, x, y);
+ struct perf_counts_values count;
+
+ if (perf_evsel__read(&evsel->core, x, y, &count) < 0) {
+ pr_debug("read LOST count failed\n");
+ goto out;
+ }
+
+ if (count.lost) {
+ __record__save_lost_samples(rec, evsel, lost,
+ x, y, count.lost, 0);
+ }
}
}
+
+ lost_count = perf_bpf_filter__lost_count(evsel);
+ if (lost_count)
+ __record__save_lost_samples(rec, evsel, lost, 0, 0, lost_count,
+ PERF_RECORD_MISC_LOST_SAMPLES_BPF);
}
+out:
free(lost);
-
}
static volatile sig_atomic_t workload_exec_errno;
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index f20e1bc03778..7bd6f2e41513 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -69,6 +69,13 @@ int perf_bpf_filter__destroy(struct evsel *evsel)
return 0;
}
+u64 perf_bpf_filter__lost_count(struct evsel *evsel)
+{
+ struct sample_filter_bpf *skel = evsel->bpf_skel;
+
+ return skel ? skel->bss->dropped : 0;
+}
+
struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
enum perf_bpf_filter_op op,
unsigned long val)
diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
index eb8e1ac43cdf..f0c66764c6d0 100644
--- a/tools/perf/util/bpf-filter.h
+++ b/tools/perf/util/bpf-filter.h
@@ -22,6 +22,7 @@ struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flag
int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
int perf_bpf_filter__prepare(struct evsel *evsel);
int perf_bpf_filter__destroy(struct evsel *evsel);
+u64 perf_bpf_filter__lost_count(struct evsel *evsel);
#else /* !HAVE_BPF_SKEL */
@@ -38,5 +39,9 @@ static inline int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
{
return -EOPNOTSUPP;
}
+static inline u64 perf_bpf_filter__lost_count(struct evsel *evsel __maybe_unused)
+{
+ return 0;
+}
#endif /* HAVE_BPF_SKEL*/
#endif /* PERF_UTIL_BPF_FILTER_H */
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 749d5b5c135b..7d8d057d1772 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1582,7 +1582,8 @@ static int machines__deliver_event(struct machines *machines,
evlist->stats.total_lost += event->lost.lost;
return tool->lost(tool, event, sample, machine);
case PERF_RECORD_LOST_SAMPLES:
- if (tool->lost_samples == perf_event__process_lost_samples)
+ if (tool->lost_samples == perf_event__process_lost_samples &&
+ !(event->header.misc & PERF_RECORD_MISC_LOST_SAMPLES_BPF))
evlist->stats.total_lost_samples += event->lost_samples.lost;
return tool->lost_samples(tool, event, sample, machine);
case PERF_RECORD_READ:
--
2.40.0.rc1.284.g88254d51c5-goog
With PERF_SAMPLE_WEIGHT_STRUCT, the weight data consists of several
fields. Add weight{1,2,3} terms to select them separately. Also add
aliases for them: 'ins_lat', 'p_stage_cyc' and 'retire_lat'.
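As a rough model of the part selection, consider the sketch below. The
union is a hypothetical local copy that follows the var1_dw/var2_w/var3_w
naming used in the patch; the real kernel union also handles big-endian
layouts, which this example ignores.

```c
#include <stdint.h>

/* Hypothetical stand-in for the sample weight union. */
union ex_weight {
	uint64_t full;
	struct {
		uint32_t var1_dw;
		uint16_t var2_w;
		uint16_t var3_w;
	} s;
};

/* Build a weight value from its three parts. */
static union ex_weight ex_make_weight(uint32_t v1, uint16_t v2, uint16_t v3)
{
	union ex_weight w;

	w.full = 0;
	w.s.var1_dw = v1;
	w.s.var2_w = v2;
	w.s.var3_w = v3;
	return w;
}

/* part 1..3 selects weight1..weight3; anything else returns the full value. */
static uint64_t ex_weight_part(union ex_weight w, int part)
{
	switch (part) {
	case 1:
		return w.s.var1_dw;
	case 2:
		return w.s.var2_w;	/* ins_lat */
	case 3:
		return w.s.var3_w;	/* p_stage_cyc, retire_lat */
	default:
		return w.full;
	}
}
```

Falling back to the full value for part 0 matches the idea that the plain
'weight' term keeps working unchanged.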
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/bpf-filter.l | 6 ++++++
tools/perf/util/bpf_skel/sample_filter.bpf.c | 8 ++++++++
2 files changed, 14 insertions(+)
diff --git a/tools/perf/util/bpf-filter.l b/tools/perf/util/bpf-filter.l
index ec12fc4d2ab8..419f923b35c0 100644
--- a/tools/perf/util/bpf-filter.l
+++ b/tools/perf/util/bpf-filter.l
@@ -71,6 +71,12 @@ addr { return sample(PERF_SAMPLE_ADDR); }
period { return sample(PERF_SAMPLE_PERIOD); }
txn { return sample(PERF_SAMPLE_TRANSACTION); }
weight { return sample(PERF_SAMPLE_WEIGHT); }
+weight1 { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 1); }
+weight2 { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 2); }
+weight3 { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 3); }
+ins_lat { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 2); } /* alias for weight2 */
+p_stage_cyc { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 3); } /* alias for weight3 */
+retire_lat { return sample_part(PERF_SAMPLE_WEIGHT_STRUCT, 3); } /* alias for weight3 */
phys_addr { return sample(PERF_SAMPLE_PHYS_ADDR); }
code_pgsz { return sample(PERF_SAMPLE_CODE_PAGE_SIZE); }
data_pgsz { return sample(PERF_SAMPLE_DATA_PAGE_SIZE); }
diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index dddf38c27bb7..d930401c5bfc 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -54,6 +54,14 @@ static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
return kctx->data->period;
case PERF_SAMPLE_TRANSACTION:
return kctx->data->txn;
+ case PERF_SAMPLE_WEIGHT_STRUCT:
+ if (entry->part == 1)
+ return kctx->data->weight.var1_dw;
+ if (entry->part == 2)
+ return kctx->data->weight.var2_w;
+ if (entry->part == 3)
+ return kctx->data->weight.var3_w;
+ /* fall through */
case PERF_SAMPLE_WEIGHT:
return kctx->data->weight.full;
case PERF_SAMPLE_PHYS_ADDR:
--
2.40.0.rc1.284.g88254d51c5-goog
The pid is special because it's saved together with the tid in
PERF_SAMPLE_TID. So tid and pid need to be differentiated using the
'part' field in the perf bpf filter entry struct.
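A minimal sketch of the idea, assuming part 0 means tid and part 1 means
pid as in the lexer rules of this patch. The struct and function names are
hypothetical and only model the selection, not the BPF side.

```c
#include <stdint.h>

/* Hypothetical stand-in for the pid/tid pair stored for PERF_SAMPLE_TID. */
struct ex_tid_entry {
	uint32_t pid;
	uint32_t tid;
};

/* Select pid when 'part' is non-zero, tid otherwise. */
static uint64_t ex_tid_part(const struct ex_tid_entry *t, int part)
{
	return part ? t->pid : t->tid;
}

/* Convenience wrapper taking scalars, for quick experiments. */
static uint64_t ex_lookup(uint32_t pid, uint32_t tid, int part)
{
	struct ex_tid_entry t = { .pid = pid, .tid = tid };

	return ex_tid_part(&t, part);
}
```

Both terms share the same sample flag, so the sample type check stays the
same and only the 'part' value differs.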
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
---
tools/perf/util/bpf-filter.c | 4 +++-
tools/perf/util/bpf-filter.h | 3 ++-
tools/perf/util/bpf-filter.l | 11 ++++++++++-
tools/perf/util/bpf-filter.y | 7 +++++--
tools/perf/util/bpf_skel/sample-filter.h | 3 ++-
tools/perf/util/bpf_skel/sample_filter.bpf.c | 5 ++++-
6 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
index 7bd6f2e41513..743c69fd6cd4 100644
--- a/tools/perf/util/bpf-filter.c
+++ b/tools/perf/util/bpf-filter.c
@@ -36,6 +36,7 @@ int perf_bpf_filter__prepare(struct evsel *evsel)
list_for_each_entry(expr, &evsel->bpf_filters, list) {
struct perf_bpf_filter_entry entry = {
.op = expr->op,
+ .part = expr->part,
.flags = expr->sample_flags,
.value = expr->val,
};
@@ -76,7 +77,7 @@ u64 perf_bpf_filter__lost_count(struct evsel *evsel)
return skel ? skel->bss->dropped : 0;
}
-struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
+struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags, int part,
enum perf_bpf_filter_op op,
unsigned long val)
{
@@ -85,6 +86,7 @@ struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flag
expr = malloc(sizeof(*expr));
if (expr != NULL) {
expr->sample_flags = sample_flags;
+ expr->part = part;
expr->op = op;
expr->val = val;
}
diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
index f0c66764c6d0..3f8827bd965f 100644
--- a/tools/perf/util/bpf-filter.h
+++ b/tools/perf/util/bpf-filter.h
@@ -9,6 +9,7 @@
struct perf_bpf_filter_expr {
struct list_head list;
enum perf_bpf_filter_op op;
+ int part;
unsigned long sample_flags;
unsigned long val;
};
@@ -16,7 +17,7 @@ struct perf_bpf_filter_expr {
struct evsel;
#ifdef HAVE_BPF_SKEL
-struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
+struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags, int part,
enum perf_bpf_filter_op op,
unsigned long val);
int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
diff --git a/tools/perf/util/bpf-filter.l b/tools/perf/util/bpf-filter.l
index f6c0b74ea285..ec12fc4d2ab8 100644
--- a/tools/perf/util/bpf-filter.l
+++ b/tools/perf/util/bpf-filter.l
@@ -11,7 +11,15 @@
static int sample(unsigned long sample_flag)
{
- perf_bpf_filter_lval.sample = sample_flag;
+ perf_bpf_filter_lval.sample.type = sample_flag;
+ perf_bpf_filter_lval.sample.part = 0;
+ return BFT_SAMPLE;
+}
+
+static int sample_part(unsigned long sample_flag, int part)
+{
+ perf_bpf_filter_lval.sample.type = sample_flag;
+ perf_bpf_filter_lval.sample.part = part;
return BFT_SAMPLE;
}
@@ -56,6 +64,7 @@ ident [_a-zA-Z][_a-zA-Z0-9]+
ip { return sample(PERF_SAMPLE_IP); }
id { return sample(PERF_SAMPLE_ID); }
tid { return sample(PERF_SAMPLE_TID); }
+pid { return sample_part(PERF_SAMPLE_TID, 1); }
cpu { return sample(PERF_SAMPLE_CPU); }
time { return sample(PERF_SAMPLE_TIME); }
addr { return sample(PERF_SAMPLE_ADDR); }
diff --git a/tools/perf/util/bpf-filter.y b/tools/perf/util/bpf-filter.y
index 13eca612ecca..0ca6532afd8d 100644
--- a/tools/perf/util/bpf-filter.y
+++ b/tools/perf/util/bpf-filter.y
@@ -20,7 +20,10 @@ static void perf_bpf_filter_error(struct list_head *expr __maybe_unused,
%union
{
unsigned long num;
- unsigned long sample;
+ struct {
+ unsigned long type;
+ int part;
+ } sample;
enum perf_bpf_filter_op op;
struct perf_bpf_filter_expr *expr;
}
@@ -48,7 +51,7 @@ filter_term
filter_term:
BFT_SAMPLE BFT_OP BFT_NUM
{
- $$ = perf_bpf_filter_expr__new($1, $2, $3);
+ $$ = perf_bpf_filter_expr__new($1.type, $1.part, $2, $3);
}
%%
diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
index 862060bfda14..6b9fd554ad7b 100644
--- a/tools/perf/util/bpf_skel/sample-filter.h
+++ b/tools/perf/util/bpf_skel/sample-filter.h
@@ -17,7 +17,8 @@ enum perf_bpf_filter_op {
/* BPF map entry for filtering */
struct perf_bpf_filter_entry {
enum perf_bpf_filter_op op;
- __u64 flags;
+ __u32 part; /* sub-sample type info when it has multiple values */
+ __u64 flags; /* perf sample type flags */
__u64 value;
};
diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index c07256279c3e..dddf38c27bb7 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -40,7 +40,10 @@ static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
case PERF_SAMPLE_ID:
return kctx->data->id;
case PERF_SAMPLE_TID:
- return kctx->data->tid_entry.tid;
+ if (entry->part)
+ return kctx->data->tid_entry.pid;
+ else
+ return kctx->data->tid_entry.tid;
case PERF_SAMPLE_CPU:
return kctx->data->cpu_entry.cpu;
case PERF_SAMPLE_TIME:
--
2.40.0.rc1.284.g88254d51c5-goog
On Tue, Mar 14, 2023 at 04:42:30PM -0700, Namhyung Kim wrote:
> Use --filter option to set BPF filter for generic events other than the
> tracepoints or Intel PT. The BPF program will check the sample data and
> filter according to the expression.
>
> For example, the below is the typical perf record for frequency mode.
> The sample period started from 1 and increased gradually.
>
> $ sudo ./perf record -e cycles true
> $ sudo ./perf script
> perf-exec 2272336 546683.916875: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916892: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916899: 3 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916905: 17 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916911: 100 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916917: 589 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916924: 3470 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> perf-exec 2272336 546683.916930: 20465 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> true 2272336 546683.916940: 119873 cycles: ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
> true 2272336 546683.917003: 461349 cycles: ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
> true 2272336 546683.917237: 635778 cycles: ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
>
> When you add a BPF filter to get samples having periods greater than 1000,
> the output would look like below:
Had to add:
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index be336f1b2b689602..153a13cdca9df1ea 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -19,6 +19,7 @@
#include "mmap.h"
#include "stat.h"
#include "metricgroup.h"
+#include "util/bpf-filter.h"
#include "util/env.h"
#include "util/pmu.h"
#include <internal/lib.h>
@@ -135,6 +136,18 @@ int bpf_counter__disable(struct evsel *evsel __maybe_unused)
return 0;
}
+// not to drag util/bpf-filter.c
+
+int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
+{
+ return 0;
+}
+
+int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
+{
+ return 0;
+}
+
/*
* Support debug printing even though util/debug.c is not linked. That means
* implementing 'verbose' and 'eprintf'.
Please run 'perf test' before submitting patches,
- Arnaldo
> $ sudo ./perf record -e cycles --filter 'period > 1000' true
> $ sudo ./perf script
> perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
> perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
> perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
> perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
> perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
> true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
> true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>
> Acked-by: Jiri Olsa <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
> ---
> tools/perf/Documentation/perf-record.txt | 15 +++++++++++---
> tools/perf/util/bpf_counter.c | 3 +--
> tools/perf/util/evlist.c | 25 +++++++++++++++++-------
> tools/perf/util/evsel.c | 2 ++
> tools/perf/util/parse-events.c | 8 +++-----
> 5 files changed, 36 insertions(+), 17 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> index ff815c2f67e8..122f71726eaa 100644
> --- a/tools/perf/Documentation/perf-record.txt
> +++ b/tools/perf/Documentation/perf-record.txt
> @@ -119,9 +119,12 @@ OPTIONS
> "perf report" to view group events together.
>
> --filter=<filter>::
> - Event filter. This option should follow an event selector (-e) which
> - selects either tracepoint event(s) or a hardware trace PMU
> - (e.g. Intel PT or CoreSight).
> + Event filter. This option should follow an event selector (-e).
> + If the event is a tracepoint, the filter string will be parsed by
> + the kernel. If the event is a hardware trace PMU (e.g. Intel PT
> + or CoreSight), it'll be processed as an address filter. Otherwise
> + it means a general filter using BPF which can be applied for any
> + kind of event.
>
> - tracepoint filters
>
> @@ -176,6 +179,12 @@ OPTIONS
>
> Multiple filters can be separated with space or comma.
>
> + - bpf filters
> +
> + A BPF filter can access the sample data and make a decision based on the
> + data. Users need to set an appropriate sample type to use the BPF
> + filter.
> +
> --exclude-perf::
> Don't record events issued by perf itself. This option should follow
> an event selector (-e) which selects tracepoint event(s). It adds a
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index aa78a15a6f0a..1b77436e067e 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -763,8 +763,7 @@ extern struct bpf_counter_ops bperf_cgrp_ops;
>
> static inline bool bpf_counter_skip(struct evsel *evsel)
> {
> - return list_empty(&evsel->bpf_counter_list) &&
> - evsel->follower_skel == NULL;
> + return evsel->bpf_counter_ops == NULL;
> }
>
> int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd)
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index b74e12239aec..cc491a037836 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -31,6 +31,7 @@
> #include "util/evlist-hybrid.h"
> #include "util/pmu.h"
> #include "util/sample.h"
> +#include "util/bpf-filter.h"
> #include <signal.h>
> #include <unistd.h>
> #include <sched.h>
> @@ -1086,17 +1087,27 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel)
> int err = 0;
>
> evlist__for_each_entry(evlist, evsel) {
> - if (evsel->filter == NULL)
> - continue;
> -
> /*
> * filters only work for tracepoint event, which doesn't have cpu limit.
> * So evlist and evsel should always be same.
> */
> - err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> - if (err) {
> - *err_evsel = evsel;
> - break;
> + if (evsel->filter) {
> + err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> + if (err) {
> + *err_evsel = evsel;
> + break;
> + }
> + }
> +
> + /*
> + * non-tracepoint events can have BPF filters.
> + */
> + if (!list_empty(&evsel->bpf_filters)) {
> + err = perf_bpf_filter__prepare(evsel);
> + if (err) {
> + *err_evsel = evsel;
> + break;
> + }
> }
> }
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index a83d8cd5eb51..dc3faf005c3b 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -50,6 +50,7 @@
> #include "off_cpu.h"
> #include "../perf-sys.h"
> #include "util/parse-branch-options.h"
> +#include "util/bpf-filter.h"
> #include <internal/xyarray.h>
> #include <internal/lib.h>
> #include <internal/threadmap.h>
> @@ -1517,6 +1518,7 @@ void evsel__exit(struct evsel *evsel)
> assert(list_empty(&evsel->core.node));
> assert(evsel->evlist == NULL);
> bpf_counter__destroy(evsel);
> + perf_bpf_filter__destroy(evsel);
> evsel__free_counts(evsel);
> perf_evsel__free_fd(&evsel->core);
> perf_evsel__free_id(&evsel->core);
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 3b2e5bb3e852..6c5cf5244486 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -28,6 +28,7 @@
> #include "perf.h"
> #include "util/parse-events-hybrid.h"
> #include "util/pmu-hybrid.h"
> +#include "util/bpf-filter.h"
> #include "tracepoint.h"
> #include "thread_map.h"
>
> @@ -2542,11 +2543,8 @@ static int set_filter(struct evsel *evsel, const void *arg)
> perf_pmu__scan_file(pmu, "nr_addr_filters",
> "%d", &nr_addr_filters);
>
> - if (!nr_addr_filters) {
> - fprintf(stderr,
> - "This CPU does not support address filtering\n");
> - return -1;
> - }
> + if (!nr_addr_filters)
> + return perf_bpf_filter__parse(&evsel->bpf_filters, str);
>
> if (evsel__append_addr_filter(evsel, str) < 0) {
> fprintf(stderr,
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
--
- Arnaldo
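The --filter dispatch described in the documentation hunk and in set_filter() can be summarized with a small sketch. This is not the perf code: pick_filter_kind() and the enum are hypothetical names, and the two inputs stand in for the tracepoint check and the PMU's nr_addr_filters value read in set_filter().

```c
#include <assert.h>
#include <stdbool.h>

enum filter_kind {
	FILTER_TRACEPOINT,  /* string is handed to the kernel */
	FILTER_ADDRESS,     /* hardware trace PMU address filter */
	FILTER_BPF,         /* general sample filter using BPF */
};

/* Decide how a --filter string would be handled for one event. */
static enum filter_kind pick_filter_kind(bool is_tracepoint, int nr_addr_filters)
{
	assert(nr_addr_filters >= 0);
	if (is_tracepoint)
		return FILTER_TRACEPOINT;
	if (nr_addr_filters > 0)
		return FILTER_ADDRESS;
	/* no address filter support: fall back to the BPF filter */
	return FILTER_BPF;
}
```

The point of the last branch is that an event without address filter support no longer produces an error; it is treated as a BPF filter instead.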
On Tue, Mar 14, 2023 at 04:42:29PM -0700, Namhyung Kim wrote:
> The BPF program will be attached to a perf_event and be triggered when
> it overflows. It'd iterate the filters map and compare the sample
> value according to the expression. If any of them fails, the sample
> would be dropped.
>
> Also it needs to have the corresponding sample data for the expression
> so it compares data->sample_flags with the given value. To access the
> sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
> in v6.2 kernel.
>
> Acked-by: Jiri Olsa <[email protected]>
> Signed-off-by: Namhyung Kim <[email protected]>
I'm noticing this while building on a debian:11 container:
GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
Error: failed to open BPF object file: No such file or directory
make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile.perf:236: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
+ exit 1
[perfbuilder@five 11]$
- Arnaldo
> ---
> tools/perf/Makefile.perf | 2 +-
> tools/perf/util/bpf-filter.c | 64 ++++++++++
> tools/perf/util/bpf-filter.h | 26 ++--
> tools/perf/util/bpf_skel/sample-filter.h | 24 ++++
> tools/perf/util/bpf_skel/sample_filter.bpf.c | 126 +++++++++++++++++++
> tools/perf/util/evsel.h | 7 +-
> 6 files changed, 236 insertions(+), 13 deletions(-)
> create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
> create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
>
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index dc9dda09b076..ed6b6a070f79 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -1050,7 +1050,7 @@ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
> SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
> SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h
> SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_OUT)/lock_contention.skel.h
> -SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h
> +SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h $(SKEL_OUT)/sample_filter.skel.h
>
> $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_OUTPUT) $(LIBSYMBOL_OUTPUT):
> $(Q)$(MKDIR) -p $@
> diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
> index c72e35d51240..f20e1bc03778 100644
> --- a/tools/perf/util/bpf-filter.c
> +++ b/tools/perf/util/bpf-filter.c
> @@ -1,10 +1,74 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> #include <stdlib.h>
>
> +#include <bpf/bpf.h>
> +#include <linux/err.h>
> +#include <internal/xyarray.h>
> +
> +#include "util/debug.h"
> +#include "util/evsel.h"
> +
> #include "util/bpf-filter.h"
> #include "util/bpf-filter-flex.h"
> #include "util/bpf-filter-bison.h"
>
> +#include "bpf_skel/sample-filter.h"
> +#include "bpf_skel/sample_filter.skel.h"
> +
> +#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
> +
> +int perf_bpf_filter__prepare(struct evsel *evsel)
> +{
> + int i, x, y, fd;
> + struct sample_filter_bpf *skel;
> + struct bpf_program *prog;
> + struct bpf_link *link;
> + struct perf_bpf_filter_expr *expr;
> +
> + skel = sample_filter_bpf__open_and_load();
> + if (!skel) {
> + pr_err("Failed to load perf sample-filter BPF skeleton\n");
> + return -1;
> + }
> +
> + i = 0;
> + fd = bpf_map__fd(skel->maps.filters);
> + list_for_each_entry(expr, &evsel->bpf_filters, list) {
> + struct perf_bpf_filter_entry entry = {
> + .op = expr->op,
> + .flags = expr->sample_flags,
> + .value = expr->val,
> + };
> + bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
> + i++;
> + }
> +
> + prog = skel->progs.perf_sample_filter;
> + for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
> + for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
> + link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
> + if (IS_ERR(link)) {
> + pr_err("Failed to attach perf sample-filter program\n");
> + return PTR_ERR(link);
> + }
> + }
> + }
> + evsel->bpf_skel = skel;
> + return 0;
> +}
> +
> +int perf_bpf_filter__destroy(struct evsel *evsel)
> +{
> + struct perf_bpf_filter_expr *expr, *tmp;
> +
> + list_for_each_entry_safe(expr, tmp, &evsel->bpf_filters, list) {
> + list_del(&expr->list);
> + free(expr);
> + }
> + sample_filter_bpf__destroy(evsel->bpf_skel);
> + return 0;
> +}
> +
> struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> enum perf_bpf_filter_op op,
> unsigned long val)
> diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
> index 93a0d3de038c..eb8e1ac43cdf 100644
> --- a/tools/perf/util/bpf-filter.h
> +++ b/tools/perf/util/bpf-filter.h
> @@ -4,15 +4,7 @@
>
> #include <linux/list.h>
>
> -enum perf_bpf_filter_op {
> - PBF_OP_EQ,
> - PBF_OP_NEQ,
> - PBF_OP_GT,
> - PBF_OP_GE,
> - PBF_OP_LT,
> - PBF_OP_LE,
> - PBF_OP_AND,
> -};
> +#include "bpf_skel/sample-filter.h"
>
> struct perf_bpf_filter_expr {
> struct list_head list;
> @@ -21,16 +13,30 @@ struct perf_bpf_filter_expr {
> unsigned long val;
> };
>
> +struct evsel;
> +
> #ifdef HAVE_BPF_SKEL
> struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> enum perf_bpf_filter_op op,
> unsigned long val);
> int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
> +int perf_bpf_filter__prepare(struct evsel *evsel);
> +int perf_bpf_filter__destroy(struct evsel *evsel);
> +
> #else /* !HAVE_BPF_SKEL */
> +
> static inline int perf_bpf_filter__parse(struct list_head *expr_head __maybe_unused,
> const char *str __maybe_unused)
> {
> - return -ENOSYS;
> + return -EOPNOTSUPP;
> +}
> +static inline int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
> +{
> + return -EOPNOTSUPP;
> +}
> +static inline int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
> +{
> + return -EOPNOTSUPP;
> }
> #endif /* HAVE_BPF_SKEL*/
> #endif /* PERF_UTIL_BPF_FILTER_H */
> diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
> new file mode 100644
> index 000000000000..862060bfda14
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/sample-filter.h
> @@ -0,0 +1,24 @@
> +#ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> +#define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> +
> +#define MAX_FILTERS 32
> +
> +/* supported filter operations */
> +enum perf_bpf_filter_op {
> + PBF_OP_EQ,
> + PBF_OP_NEQ,
> + PBF_OP_GT,
> + PBF_OP_GE,
> + PBF_OP_LT,
> + PBF_OP_LE,
> + PBF_OP_AND
> +};
> +
> +/* BPF map entry for filtering */
> +struct perf_bpf_filter_entry {
> + enum perf_bpf_filter_op op;
> + __u64 flags;
> + __u64 value;
> +};
> +
> +#endif /* PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H */
> \ No newline at end of file
> diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> new file mode 100644
> index 000000000000..c07256279c3e
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> @@ -0,0 +1,126 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +// Copyright (c) 2023 Google
> +#include "vmlinux.h"
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <bpf/bpf_core_read.h>
> +
> +#include "sample-filter.h"
> +
> +/* BPF map that will be filled by user space */
> +struct filters {
> + __uint(type, BPF_MAP_TYPE_ARRAY);
> + __type(key, int);
> + __type(value, struct perf_bpf_filter_entry);
> + __uint(max_entries, MAX_FILTERS);
> +} filters SEC(".maps");
> +
> +int dropped;
> +
> +void *bpf_cast_to_kern_ctx(void *) __ksym;
> +
> +/* new kernel perf_sample_data definition */
> +struct perf_sample_data___new {
> + __u64 sample_flags;
> +} __attribute__((preserve_access_index));
> +
> +/* helper function to return the given perf sample data */
> +static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
> + struct perf_bpf_filter_entry *entry)
> +{
> + struct perf_sample_data___new *data = (void *)kctx->data;
> +
> + if (!bpf_core_field_exists(data->sample_flags) ||
> + (data->sample_flags & entry->flags) == 0)
> + return 0;
> +
> + switch (entry->flags) {
> + case PERF_SAMPLE_IP:
> + return kctx->data->ip;
> + case PERF_SAMPLE_ID:
> + return kctx->data->id;
> + case PERF_SAMPLE_TID:
> + return kctx->data->tid_entry.tid;
> + case PERF_SAMPLE_CPU:
> + return kctx->data->cpu_entry.cpu;
> + case PERF_SAMPLE_TIME:
> + return kctx->data->time;
> + case PERF_SAMPLE_ADDR:
> + return kctx->data->addr;
> + case PERF_SAMPLE_PERIOD:
> + return kctx->data->period;
> + case PERF_SAMPLE_TRANSACTION:
> + return kctx->data->txn;
> + case PERF_SAMPLE_WEIGHT:
> + return kctx->data->weight.full;
> + case PERF_SAMPLE_PHYS_ADDR:
> + return kctx->data->phys_addr;
> + case PERF_SAMPLE_CODE_PAGE_SIZE:
> + return kctx->data->code_page_size;
> + case PERF_SAMPLE_DATA_PAGE_SIZE:
> + return kctx->data->data_page_size;
> + default:
> + break;
> + }
> + return 0;
> +}
> +
> +/* BPF program to be called from perf event overflow handler */
> +SEC("perf_event")
> +int perf_sample_filter(void *ctx)
> +{
> + struct bpf_perf_event_data_kern *kctx;
> + struct perf_bpf_filter_entry *entry;
> + __u64 sample_data;
> + int i;
> +
> + kctx = bpf_cast_to_kern_ctx(ctx);
> +
> + for (i = 0; i < MAX_FILTERS; i++) {
> + int key = i; /* needed for verifier :( */
> +
> + entry = bpf_map_lookup_elem(&filters, &key);
> + if (entry == NULL)
> + break;
> + sample_data = perf_get_sample(kctx, entry);
> +
> + switch (entry->op) {
> + case PBF_OP_EQ:
> + if (!(sample_data == entry->value))
> + goto drop;
> + break;
> + case PBF_OP_NEQ:
> + if (!(sample_data != entry->value))
> + goto drop;
> + break;
> + case PBF_OP_GT:
> + if (!(sample_data > entry->value))
> + goto drop;
> + break;
> + case PBF_OP_GE:
> + if (!(sample_data >= entry->value))
> + goto drop;
> + break;
> + case PBF_OP_LT:
> + if (!(sample_data < entry->value))
> + goto drop;
> + break;
> + case PBF_OP_LE:
> + if (!(sample_data <= entry->value))
> + goto drop;
> + break;
> + case PBF_OP_AND:
> + if (!(sample_data & entry->value))
> + goto drop;
> + break;
> + }
> + }
> + /* generate sample data */
> + return 1;
> +
> +drop:
> + __sync_fetch_and_add(&dropped, 1);
> + return 0;
> +}
> +
> +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index c272c06565c0..68072ec655ce 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -150,8 +150,10 @@ struct evsel {
> */
> struct bpf_counter_ops *bpf_counter_ops;
>
> - /* for perf-stat -b */
> - struct list_head bpf_counter_list;
> + union {
> + struct list_head bpf_counter_list; /* for perf-stat -b */
> + struct list_head bpf_filters; /* for perf-record --filter */
> + };
>
> /* for perf-stat --use-bpf */
> int bperf_leader_prog_fd;
> @@ -159,6 +161,7 @@ struct evsel {
> union {
> struct bperf_leader_bpf *leader_skel;
> struct bperf_follower_bpf *follower_skel;
> + void *bpf_skel;
> };
> unsigned long open_flags;
> int precise_ip_original;
> --
> 2.40.0.rc1.284.g88254d51c5-goog
>
--
- Arnaldo
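The drop logic in perf_sample_filter() can be modelled outside BPF as follows. This is a simplified sketch under stated assumptions: every term is compared against a single sample value (the real program fetches a different field per term), and the names filter_term, term_matches and keep_sample are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum filter_op { OP_EQ, OP_NEQ, OP_GT, OP_GE, OP_LT, OP_LE, OP_AND };

struct filter_term {
	enum filter_op op;
	uint64_t value;
};

/* Evaluate one "<sample> <op> <value>" term. */
static bool term_matches(const struct filter_term *t, uint64_t sample)
{
	switch (t->op) {
	case OP_EQ:  return sample == t->value;
	case OP_NEQ: return sample != t->value;
	case OP_GT:  return sample > t->value;
	case OP_GE:  return sample >= t->value;
	case OP_LT:  return sample < t->value;
	case OP_LE:  return sample <= t->value;
	case OP_AND: return (sample & t->value) != 0;
	}
	return false;
}

/* Returns true to keep the sample, false to drop it (all terms must pass). */
static bool keep_sample(const struct filter_term *terms, size_t n, uint64_t sample)
{
	assert(terms != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!term_matches(&terms[i], sample))
			return false;
	}
	return true;
}
```

For the 'period > 1000' example from the cover text, a single { OP_GT, 1000 } term would drop the sample with period 589 and keep the one with period 3470.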
Hi Arnaldo,
On Wed, Mar 15, 2023 at 6:47 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Tue, Mar 14, 2023 at 04:42:30PM -0700, Namhyung Kim wrote:
> > Use --filter option to set BPF filter for generic events other than the
> > tracepoints or Intel PT. The BPF program will check the sample data and
> > filter according to the expression.
> >
> > For example, the below is the typical perf record for frequency mode.
> > The sample period started from 1 and increased gradually.
> >
> > $ sudo ./perf record -e cycles true
> > $ sudo ./perf script
> > perf-exec 2272336 546683.916875: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916892: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916899: 3 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916905: 17 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916911: 100 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916917: 589 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916924: 3470 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > perf-exec 2272336 546683.916930: 20465 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
> > true 2272336 546683.916940: 119873 cycles: ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
> > true 2272336 546683.917003: 461349 cycles: ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
> > true 2272336 546683.917237: 635778 cycles: ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
> >
> > When you add a BPF filter to get samples having periods greater than 1000,
> > the output would look like below:
>
> Had to add:
>
> diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
> index be336f1b2b689602..153a13cdca9df1ea 100644
> --- a/tools/perf/util/python.c
> +++ b/tools/perf/util/python.c
> @@ -19,6 +19,7 @@
> #include "mmap.h"
> #include "stat.h"
> #include "metricgroup.h"
> +#include "util/bpf-filter.h"
> #include "util/env.h"
> #include "util/pmu.h"
> #include <internal/lib.h>
> @@ -135,6 +136,18 @@ int bpf_counter__disable(struct evsel *evsel __maybe_unused)
> return 0;
> }
>
> +// not to drag util/bpf-filter.c
> +
> +int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
> +{
> + return 0;
> +}
> +
> +int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
> +{
> + return 0;
> +}
> +
> /*
> * Support debug printing even though util/debug.c is not linked. That means
> * implementing 'verbose' and 'eprintf'.
>
>
> Please run 'perf test' before submitting patches,
Ugh, sorry. I think I ran it at some point but missed the python test :-p
Anyway, I'm afraid you need to enclose them with #ifndef HAVE_BPF_SKEL.
Thanks,
Namhyung
>
> - Arnaldo
>
> > $ sudo ./perf record -e cycles --filter 'period > 1000' true
> > $ sudo ./perf script
> > perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
> > perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
> > perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
> > perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
> > perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
> > true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
> > true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
> >
> > Acked-by: Jiri Olsa <[email protected]>
> > Signed-off-by: Namhyung Kim <[email protected]>
> > ---
> > tools/perf/Documentation/perf-record.txt | 15 +++++++++++---
> > tools/perf/util/bpf_counter.c | 3 +--
> > tools/perf/util/evlist.c | 25 +++++++++++++++++-------
> > tools/perf/util/evsel.c | 2 ++
> > tools/perf/util/parse-events.c | 8 +++-----
> > 5 files changed, 36 insertions(+), 17 deletions(-)
> >
> > diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
> > index ff815c2f67e8..122f71726eaa 100644
> > --- a/tools/perf/Documentation/perf-record.txt
> > +++ b/tools/perf/Documentation/perf-record.txt
> > @@ -119,9 +119,12 @@ OPTIONS
> > "perf report" to view group events together.
> >
> > --filter=<filter>::
> > - Event filter. This option should follow an event selector (-e) which
> > - selects either tracepoint event(s) or a hardware trace PMU
> > - (e.g. Intel PT or CoreSight).
> > + Event filter. This option should follow an event selector (-e).
> > + If the event is a tracepoint, the filter string will be parsed by
> > + the kernel. If the event is a hardware trace PMU (e.g. Intel PT
> > + or CoreSight), it'll be processed as an address filter. Otherwise
> > + it means a general filter using BPF which can be applied for any
> > + kind of event.
> >
> > - tracepoint filters
> >
> > @@ -176,6 +179,12 @@ OPTIONS
> >
> > Multiple filters can be separated with space or comma.
> >
> > + - bpf filters
> > +
> > + A BPF filter can access the sample data and make a decision based on the
> > + data. Users need to set an appropriate sample type to use the BPF
> > + filter.
> > +
> > --exclude-perf::
> > Don't record events issued by perf itself. This option should follow
> > an event selector (-e) which selects tracepoint event(s). It adds a
> > diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> > index aa78a15a6f0a..1b77436e067e 100644
> > --- a/tools/perf/util/bpf_counter.c
> > +++ b/tools/perf/util/bpf_counter.c
> > @@ -763,8 +763,7 @@ extern struct bpf_counter_ops bperf_cgrp_ops;
> >
> > static inline bool bpf_counter_skip(struct evsel *evsel)
> > {
> > - return list_empty(&evsel->bpf_counter_list) &&
> > - evsel->follower_skel == NULL;
> > + return evsel->bpf_counter_ops == NULL;
> > }
> >
> > int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd)
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index b74e12239aec..cc491a037836 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -31,6 +31,7 @@
> > #include "util/evlist-hybrid.h"
> > #include "util/pmu.h"
> > #include "util/sample.h"
> > +#include "util/bpf-filter.h"
> > #include <signal.h>
> > #include <unistd.h>
> > #include <sched.h>
> > @@ -1086,17 +1087,27 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel)
> > int err = 0;
> >
> > evlist__for_each_entry(evlist, evsel) {
> > - if (evsel->filter == NULL)
> > - continue;
> > -
> > /*
> > * filters only work for tracepoint event, which doesn't have cpu limit.
> > * So evlist and evsel should always be same.
> > */
> > - err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> > - if (err) {
> > - *err_evsel = evsel;
> > - break;
> > + if (evsel->filter) {
> > + err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
> > + if (err) {
> > + *err_evsel = evsel;
> > + break;
> > + }
> > + }
> > +
> > + /*
> > + * non-tracepoint events can have BPF filters.
> > + */
> > + if (!list_empty(&evsel->bpf_filters)) {
> > + err = perf_bpf_filter__prepare(evsel);
> > + if (err) {
> > + *err_evsel = evsel;
> > + break;
> > + }
> > }
> > }
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index a83d8cd5eb51..dc3faf005c3b 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -50,6 +50,7 @@
> > #include "off_cpu.h"
> > #include "../perf-sys.h"
> > #include "util/parse-branch-options.h"
> > +#include "util/bpf-filter.h"
> > #include <internal/xyarray.h>
> > #include <internal/lib.h>
> > #include <internal/threadmap.h>
> > @@ -1517,6 +1518,7 @@ void evsel__exit(struct evsel *evsel)
> > assert(list_empty(&evsel->core.node));
> > assert(evsel->evlist == NULL);
> > bpf_counter__destroy(evsel);
> > + perf_bpf_filter__destroy(evsel);
> > evsel__free_counts(evsel);
> > perf_evsel__free_fd(&evsel->core);
> > perf_evsel__free_id(&evsel->core);
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 3b2e5bb3e852..6c5cf5244486 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -28,6 +28,7 @@
> > #include "perf.h"
> > #include "util/parse-events-hybrid.h"
> > #include "util/pmu-hybrid.h"
> > +#include "util/bpf-filter.h"
> > #include "tracepoint.h"
> > #include "thread_map.h"
> >
> > @@ -2542,11 +2543,8 @@ static int set_filter(struct evsel *evsel, const void *arg)
> > perf_pmu__scan_file(pmu, "nr_addr_filters",
> > "%d", &nr_addr_filters);
> >
> > - if (!nr_addr_filters) {
> > - fprintf(stderr,
> > - "This CPU does not support address filtering\n");
> > - return -1;
> > - }
> > + if (!nr_addr_filters)
> > + return perf_bpf_filter__parse(&evsel->bpf_filters, str);
> >
> > if (evsel__append_addr_filter(evsel, str) < 0) {
> > fprintf(stderr,
> > --
> > 2.40.0.rc1.284.g88254d51c5-goog
> >
>
> --
>
> - Arnaldo
On Wed, Mar 15, 2023 at 01:24:37PM -0300, Arnaldo Carvalho de Melo wrote:
> On Tue, Mar 14, 2023 at 04:42:29PM -0700, Namhyung Kim wrote:
> > The BPF program will be attached to a perf_event and be triggered when
> > it overflows. It'd iterate the filters map and compare the sample
> > value according to the expression. If any of them fails, the sample
> > would be dropped.
> >
> > Also it needs to have the corresponding sample data for the expression
> > so it compares data->sample_flags with the given value. To access the
> > sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
> > in v6.2 kernel.
> >
> > Acked-by: Jiri Olsa <[email protected]>
> > Signed-off-by: Namhyung Kim <[email protected]>
>
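The matching rule described in the commit message above can be sketched in plain user-space C. This is only an illustration, not the BPF program from the patch: the FLAG_* bits, struct sample, and the function names are hypothetical stand-ins, and only two sample fields are modelled. It shows that every entry must match for a sample to be kept, and that a field missing from sample_flags is read as 0.

```c
#include <stdint.h>

/* Same operator set as the quoted sample-filter.h, with hypothetical names. */
enum filter_op { OP_EQ, OP_NEQ, OP_GT, OP_GE, OP_LT, OP_LE, OP_AND };

/* Hypothetical stand-ins for two PERF_SAMPLE_* bits (not the kernel values). */
#define FLAG_CPU    (1ULL << 0)
#define FLAG_PERIOD (1ULL << 1)

struct filter_entry {
	enum filter_op op;
	uint64_t flags;   /* which sample field this entry looks at */
	uint64_t value;   /* right-hand side of the comparison */
};

/* Hypothetical, heavily reduced model of the kernel's sample data. */
struct sample {
	uint64_t sample_flags;  /* which fields are actually present */
	uint64_t cpu;
	uint64_t period;
};

/* Return the requested field, or 0 when the sample does not carry it. */
static uint64_t get_sample(const struct sample *s, const struct filter_entry *e)
{
	if ((s->sample_flags & e->flags) == 0)
		return 0;

	switch (e->flags) {
	case FLAG_CPU:
		return s->cpu;
	case FLAG_PERIOD:
		return s->period;
	default:
		return 0;
	}
}

/* Evaluate a single "<term> <operator> <value>" entry. */
static int entry_matches(uint64_t data, const struct filter_entry *e)
{
	switch (e->op) {
	case OP_EQ:  return data == e->value;
	case OP_NEQ: return data != e->value;
	case OP_GT:  return data >  e->value;
	case OP_GE:  return data >= e->value;
	case OP_LT:  return data <  e->value;
	case OP_LE:  return data <= e->value;
	case OP_AND: return (data & e->value) != 0;
	}
	/* An unknown operator does not drop the sample, as in the quoted switch. */
	return 1;
}

/* Return 1 to keep the sample, 0 to drop it: all entries must match. */
int keep_sample(const struct sample *s, const struct filter_entry *filters, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (!entry_matches(get_sample(s, &filters[i]), &filters[i]))
			return 0;
	}
	return 1;
}
```

With the two entries "period > 1000" and "cpu == 2", keep_sample() keeps a sample taken on CPU 2 with period 3470 and drops one with period 589. A sample whose sample_flags lacks the period bit is dropped as well, because the missing field compares as 0.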
>
> I'm noticing this while building on a debian:11 container:
>
> GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
> GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
> GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
> GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
> GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
> GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
> libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> Error: failed to open BPF object file: No such file or directory
> make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [Makefile.perf:236: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
> make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> + exit 1
> [perfbuilder@five 11]$
Same thing on debian:10
libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
Error: failed to open BPF object file: No such file or directory
make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile.perf:236: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
+ exit 1
[perfbuilder@five 10]$
Works with debian:experimental:
[perfbuilder@five experimental]$ export BUILD_TARBALL=http://192.168.86.10/perf/perf-6.3.0-rc1.tar.xz
[perfbuilder@five experimental]$ time dm .
1 147.54 debian:experimental : Ok gcc (Debian 12.2.0-14) 12.2.0 , Debian clang version 14.0.6
BUILD_TARBALL_HEAD=d34a77f6cd75d2a75c64e78f3d949a12903a7cf0
Both with:
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
+ rm -rf /tmp/build/perf
+ mkdir /tmp/build/perf
+ make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf CC=clang
and:
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Debian 12.2.0-14)
+ make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf
make: Entering directory '/git/perf-6.3.0-rc1/tools/perf'
>
> > ---
> > tools/perf/Makefile.perf | 2 +-
> > tools/perf/util/bpf-filter.c | 64 ++++++++++
> > tools/perf/util/bpf-filter.h | 26 ++--
> > tools/perf/util/bpf_skel/sample-filter.h | 24 ++++
> > tools/perf/util/bpf_skel/sample_filter.bpf.c | 126 +++++++++++++++++++
> > tools/perf/util/evsel.h | 7 +-
> > 6 files changed, 236 insertions(+), 13 deletions(-)
> > create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
> > create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
> >
> > diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> > index dc9dda09b076..ed6b6a070f79 100644
> > --- a/tools/perf/Makefile.perf
> > +++ b/tools/perf/Makefile.perf
> > @@ -1050,7 +1050,7 @@ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
> > SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
> > SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h
> > SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_OUT)/lock_contention.skel.h
> > -SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h
> > +SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h $(SKEL_OUT)/sample_filter.skel.h
> >
> > $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_OUTPUT) $(LIBSYMBOL_OUTPUT):
> > $(Q)$(MKDIR) -p $@
> > diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
> > index c72e35d51240..f20e1bc03778 100644
> > --- a/tools/perf/util/bpf-filter.c
> > +++ b/tools/perf/util/bpf-filter.c
> > @@ -1,10 +1,74 @@
> > /* SPDX-License-Identifier: GPL-2.0 */
> > #include <stdlib.h>
> >
> > +#include <bpf/bpf.h>
> > +#include <linux/err.h>
> > +#include <internal/xyarray.h>
> > +
> > +#include "util/debug.h"
> > +#include "util/evsel.h"
> > +
> > #include "util/bpf-filter.h"
> > #include "util/bpf-filter-flex.h"
> > #include "util/bpf-filter-bison.h"
> >
> > +#include "bpf_skel/sample-filter.h"
> > +#include "bpf_skel/sample_filter.skel.h"
> > +
> > +#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
> > +
> > +int perf_bpf_filter__prepare(struct evsel *evsel)
> > +{
> > + int i, x, y, fd;
> > + struct sample_filter_bpf *skel;
> > + struct bpf_program *prog;
> > + struct bpf_link *link;
> > + struct perf_bpf_filter_expr *expr;
> > +
> > + skel = sample_filter_bpf__open_and_load();
> > + if (!skel) {
> > + pr_err("Failed to load perf sample-filter BPF skeleton\n");
> > + return -1;
> > + }
> > +
> > + i = 0;
> > + fd = bpf_map__fd(skel->maps.filters);
> > + list_for_each_entry(expr, &evsel->bpf_filters, list) {
> > + struct perf_bpf_filter_entry entry = {
> > + .op = expr->op,
> > + .flags = expr->sample_flags,
> > + .value = expr->val,
> > + };
> > + bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
> > + i++;
> > + }
> > +
> > + prog = skel->progs.perf_sample_filter;
> > + for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
> > + for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
> > + link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
> > + if (IS_ERR(link)) {
> > + pr_err("Failed to attach perf sample-filter program\n");
> > + return PTR_ERR(link);
> > + }
> > + }
> > + }
> > + evsel->bpf_skel = skel;
> > + return 0;
> > +}
> > +
> > +int perf_bpf_filter__destroy(struct evsel *evsel)
> > +{
> > + struct perf_bpf_filter_expr *expr, *tmp;
> > +
> > + list_for_each_entry_safe(expr, tmp, &evsel->bpf_filters, list) {
> > + list_del(&expr->list);
> > + free(expr);
> > + }
> > + sample_filter_bpf__destroy(evsel->bpf_skel);
> > + return 0;
> > +}
> > +
> > struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> > enum perf_bpf_filter_op op,
> > unsigned long val)
> > diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
> > index 93a0d3de038c..eb8e1ac43cdf 100644
> > --- a/tools/perf/util/bpf-filter.h
> > +++ b/tools/perf/util/bpf-filter.h
> > @@ -4,15 +4,7 @@
> >
> > #include <linux/list.h>
> >
> > -enum perf_bpf_filter_op {
> > - PBF_OP_EQ,
> > - PBF_OP_NEQ,
> > - PBF_OP_GT,
> > - PBF_OP_GE,
> > - PBF_OP_LT,
> > - PBF_OP_LE,
> > - PBF_OP_AND,
> > -};
> > +#include "bpf_skel/sample-filter.h"
> >
> > struct perf_bpf_filter_expr {
> > struct list_head list;
> > @@ -21,16 +13,30 @@ struct perf_bpf_filter_expr {
> > unsigned long val;
> > };
> >
> > +struct evsel;
> > +
> > #ifdef HAVE_BPF_SKEL
> > struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> > enum perf_bpf_filter_op op,
> > unsigned long val);
> > int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
> > +int perf_bpf_filter__prepare(struct evsel *evsel);
> > +int perf_bpf_filter__destroy(struct evsel *evsel);
> > +
> > #else /* !HAVE_BPF_SKEL */
> > +
> > static inline int perf_bpf_filter__parse(struct list_head *expr_head __maybe_unused,
> > const char *str __maybe_unused)
> > {
> > - return -ENOSYS;
> > + return -EOPNOTSUPP;
> > +}
> > +static inline int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +static inline int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
> > +{
> > + return -EOPNOTSUPP;
> > }
> > #endif /* HAVE_BPF_SKEL*/
> > #endif /* PERF_UTIL_BPF_FILTER_H */
> > diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
> > new file mode 100644
> > index 000000000000..862060bfda14
> > --- /dev/null
> > +++ b/tools/perf/util/bpf_skel/sample-filter.h
> > @@ -0,0 +1,24 @@
> > +#ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> > +#define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> > +
> > +#define MAX_FILTERS 32
> > +
> > +/* supported filter operations */
> > +enum perf_bpf_filter_op {
> > + PBF_OP_EQ,
> > + PBF_OP_NEQ,
> > + PBF_OP_GT,
> > + PBF_OP_GE,
> > + PBF_OP_LT,
> > + PBF_OP_LE,
> > + PBF_OP_AND
> > +};
> > +
> > +/* BPF map entry for filtering */
> > +struct perf_bpf_filter_entry {
> > + enum perf_bpf_filter_op op;
> > + __u64 flags;
> > + __u64 value;
> > +};
> > +
> > +#endif /* PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H */
> > \ No newline at end of file
> > diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> > new file mode 100644
> > index 000000000000..c07256279c3e
> > --- /dev/null
> > +++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> > @@ -0,0 +1,126 @@
> > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +// Copyright (c) 2023 Google
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +#include <bpf/bpf_core_read.h>
> > +
> > +#include "sample-filter.h"
> > +
> > +/* BPF map that will be filled by user space */
> > +struct filters {
> > + __uint(type, BPF_MAP_TYPE_ARRAY);
> > + __type(key, int);
> > + __type(value, struct perf_bpf_filter_entry);
> > + __uint(max_entries, MAX_FILTERS);
> > +} filters SEC(".maps");
> > +
> > +int dropped;
> > +
> > +void *bpf_cast_to_kern_ctx(void *) __ksym;
> > +
> > +/* new kernel perf_sample_data definition */
> > +struct perf_sample_data___new {
> > + __u64 sample_flags;
> > +} __attribute__((preserve_access_index));
> > +
> > +/* helper function to return the given perf sample data */
> > +static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
> > + struct perf_bpf_filter_entry *entry)
> > +{
> > + struct perf_sample_data___new *data = (void *)kctx->data;
> > +
> > + if (!bpf_core_field_exists(data->sample_flags) ||
> > + (data->sample_flags & entry->flags) == 0)
> > + return 0;
> > +
> > + switch (entry->flags) {
> > + case PERF_SAMPLE_IP:
> > + return kctx->data->ip;
> > + case PERF_SAMPLE_ID:
> > + return kctx->data->id;
> > + case PERF_SAMPLE_TID:
> > + return kctx->data->tid_entry.tid;
> > + case PERF_SAMPLE_CPU:
> > + return kctx->data->cpu_entry.cpu;
> > + case PERF_SAMPLE_TIME:
> > + return kctx->data->time;
> > + case PERF_SAMPLE_ADDR:
> > + return kctx->data->addr;
> > + case PERF_SAMPLE_PERIOD:
> > + return kctx->data->period;
> > + case PERF_SAMPLE_TRANSACTION:
> > + return kctx->data->txn;
> > + case PERF_SAMPLE_WEIGHT:
> > + return kctx->data->weight.full;
> > + case PERF_SAMPLE_PHYS_ADDR:
> > + return kctx->data->phys_addr;
> > + case PERF_SAMPLE_CODE_PAGE_SIZE:
> > + return kctx->data->code_page_size;
> > + case PERF_SAMPLE_DATA_PAGE_SIZE:
> > + return kctx->data->data_page_size;
> > + default:
> > + break;
> > + }
> > + return 0;
> > +}
> > +
> > +/* BPF program to be called from perf event overflow handler */
> > +SEC("perf_event")
> > +int perf_sample_filter(void *ctx)
> > +{
> > + struct bpf_perf_event_data_kern *kctx;
> > + struct perf_bpf_filter_entry *entry;
> > + __u64 sample_data;
> > + int i;
> > +
> > + kctx = bpf_cast_to_kern_ctx(ctx);
> > +
> > + for (i = 0; i < MAX_FILTERS; i++) {
> > + int key = i; /* needed for verifier :( */
> > +
> > + entry = bpf_map_lookup_elem(&filters, &key);
> > + if (entry == NULL)
> > + break;
> > + sample_data = perf_get_sample(kctx, entry);
> > +
> > + switch (entry->op) {
> > + case PBF_OP_EQ:
> > + if (!(sample_data == entry->value))
> > + goto drop;
> > + break;
> > + case PBF_OP_NEQ:
> > + if (!(sample_data != entry->value))
> > + goto drop;
> > + break;
> > + case PBF_OP_GT:
> > + if (!(sample_data > entry->value))
> > + goto drop;
> > + break;
> > + case PBF_OP_GE:
> > + if (!(sample_data >= entry->value))
> > + goto drop;
> > + break;
> > + case PBF_OP_LT:
> > + if (!(sample_data < entry->value))
> > + goto drop;
> > + break;
> > + case PBF_OP_LE:
> > + if (!(sample_data <= entry->value))
> > + goto drop;
> > + break;
> > + case PBF_OP_AND:
> > + if (!(sample_data & entry->value))
> > + goto drop;
> > + break;
> > + }
> > + }
> > + /* generate sample data */
> > + return 1;
> > +
> > +drop:
> > + __sync_fetch_and_add(&dropped, 1);
> > + return 0;
> > +}
> > +
> > +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > index c272c06565c0..68072ec655ce 100644
> > --- a/tools/perf/util/evsel.h
> > +++ b/tools/perf/util/evsel.h
> > @@ -150,8 +150,10 @@ struct evsel {
> > */
> > struct bpf_counter_ops *bpf_counter_ops;
> >
> > - /* for perf-stat -b */
> > - struct list_head bpf_counter_list;
> > + union {
> > + struct list_head bpf_counter_list; /* for perf-stat -b */
> > + struct list_head bpf_filters; /* for perf-record --filter */
> > + };
> >
> > /* for perf-stat --use-bpf */
> > int bperf_leader_prog_fd;
> > @@ -159,6 +161,7 @@ struct evsel {
> > union {
> > struct bperf_leader_bpf *leader_skel;
> > struct bperf_follower_bpf *follower_skel;
> > + void *bpf_skel;
> > };
> > unsigned long open_flags;
> > int precise_ip_original;
> > --
> > 2.40.0.rc1.284.g88254d51c5-goog
> >
>
> --
>
> - Arnaldo
--
- Arnaldo
On Wed, Mar 15, 2023 at 9:39 AM Arnaldo Carvalho de Melo
<[email protected]> wrote:
>
> On Wed, Mar 15, 2023 at 01:24:37PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Mar 14, 2023 at 04:42:29PM -0700, Namhyung Kim wrote:
> > > The BPF program will be attached to a perf_event and be triggered when
> > > it overflows. It'd iterate the filters map and compare the sample
> > > value according to the expression. If any of them fails, the sample
> > > would be dropped.
> > >
> > > Also it needs to have the corresponding sample data for the expression
> > > so it compares data->sample_flags with the given value. To access the
> > > sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
> > > in v6.2 kernel.
> > >
> > > Acked-by: Jiri Olsa <[email protected]>
> > > Signed-off-by: Namhyung Kim <[email protected]>
> >
> >
> > I'm noticing this while building on a debian:11 container:
> >
> > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
> > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
> > GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
> > GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
> > GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
> > GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
> > libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> > Error: failed to open BPF object file: No such file or directory
> > make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> > make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> > make[2]: *** Waiting for unfinished jobs....
> > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > make: *** [Makefile:70: all] Error 2
> > make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> > + exit 1
> > [perfbuilder@five 11]$
>
> Same thing on debian:10
Hmm, I thought extern symbols marked with __ksym were runtime
dependencies, so it should build on old kernels too.
BPF folks, any suggestions?
Thanks,
Namhyung
>
> libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> Error: failed to open BPF object file: No such file or directory
> make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> make[2]: *** Waiting for unfinished jobs....
> make[1]: *** [Makefile.perf:236: sub-make] Error 2
> make: *** [Makefile:70: all] Error 2
> make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> + exit 1
> [perfbuilder@five 10]$
>
> Works with debian:experimental:
>
>
> [perfbuilder@five experimental]$ export BUILD_TARBALL=http://192.168.86.10/perf/perf-6.3.0-rc1.tar.xz
> [perfbuilder@five experimental]$ time dm .
> 1 147.54 debian:experimental : Ok gcc (Debian 12.2.0-14) 12.2.0 , Debian clang version 14.0.6
> BUILD_TARBALL_HEAD=d34a77f6cd75d2a75c64e78f3d949a12903a7cf0
>
> Both with:
>
> Debian clang version 14.0.6
> Target: x86_64-pc-linux-gnu
> Thread model: posix
> InstalledDir: /usr/bin
> Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
> Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
> Candidate multilib: .;@m64
> Selected multilib: .;@m64
> + rm -rf /tmp/build/perf
> + mkdir /tmp/build/perf
> + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf CC=clang
>
> and:
>
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
> OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
> OFFLOAD_TARGET_DEFAULT=1
> Target: x86_64-linux-gnu
> Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> Thread model: posix
> Supported LTO compression algorithms: zlib zstd
> gcc version 12.2.0 (Debian 12.2.0-14)
> + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf
> make: Entering directory '/git/perf-6.3.0-rc1/tools/perf'
>
>
> >
> > > ---
> > > tools/perf/Makefile.perf | 2 +-
> > > tools/perf/util/bpf-filter.c | 64 ++++++++++
> > > tools/perf/util/bpf-filter.h | 26 ++--
> > > tools/perf/util/bpf_skel/sample-filter.h | 24 ++++
> > > tools/perf/util/bpf_skel/sample_filter.bpf.c | 126 +++++++++++++++++++
> > > tools/perf/util/evsel.h | 7 +-
> > > 6 files changed, 236 insertions(+), 13 deletions(-)
> > > create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
> > > create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
> > >
> > > diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> > > index dc9dda09b076..ed6b6a070f79 100644
> > > --- a/tools/perf/Makefile.perf
> > > +++ b/tools/perf/Makefile.perf
> > > @@ -1050,7 +1050,7 @@ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
> > > SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
> > > SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h
> > > SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_OUT)/lock_contention.skel.h
> > > -SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h
> > > +SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h $(SKEL_OUT)/sample_filter.skel.h
> > >
> > > $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_OUTPUT) $(LIBSYMBOL_OUTPUT):
> > > $(Q)$(MKDIR) -p $@
> > > diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
> > > index c72e35d51240..f20e1bc03778 100644
> > > --- a/tools/perf/util/bpf-filter.c
> > > +++ b/tools/perf/util/bpf-filter.c
> > > @@ -1,10 +1,74 @@
> > > /* SPDX-License-Identifier: GPL-2.0 */
> > > #include <stdlib.h>
> > >
> > > +#include <bpf/bpf.h>
> > > +#include <linux/err.h>
> > > +#include <internal/xyarray.h>
> > > +
> > > +#include "util/debug.h"
> > > +#include "util/evsel.h"
> > > +
> > > #include "util/bpf-filter.h"
> > > #include "util/bpf-filter-flex.h"
> > > #include "util/bpf-filter-bison.h"
> > >
> > > +#include "bpf_skel/sample-filter.h"
> > > +#include "bpf_skel/sample_filter.skel.h"
> > > +
> > > +#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
> > > +
> > > +int perf_bpf_filter__prepare(struct evsel *evsel)
> > > +{
> > > + int i, x, y, fd;
> > > + struct sample_filter_bpf *skel;
> > > + struct bpf_program *prog;
> > > + struct bpf_link *link;
> > > + struct perf_bpf_filter_expr *expr;
> > > +
> > > + skel = sample_filter_bpf__open_and_load();
> > > + if (!skel) {
> > > + pr_err("Failed to load perf sample-filter BPF skeleton\n");
> > > + return -1;
> > > + }
> > > +
> > > + i = 0;
> > > + fd = bpf_map__fd(skel->maps.filters);
> > > + list_for_each_entry(expr, &evsel->bpf_filters, list) {
> > > + struct perf_bpf_filter_entry entry = {
> > > + .op = expr->op,
> > > + .flags = expr->sample_flags,
> > > + .value = expr->val,
> > > + };
> > > + bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
> > > + i++;
> > > + }
> > > +
> > > + prog = skel->progs.perf_sample_filter;
> > > + for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
> > > + for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
> > > + link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
> > > + if (IS_ERR(link)) {
> > > + pr_err("Failed to attach perf sample-filter program\n");
> > > + return PTR_ERR(link);
> > > + }
> > > + }
> > > + }
> > > + evsel->bpf_skel = skel;
> > > + return 0;
> > > +}
> > > +
> > > +int perf_bpf_filter__destroy(struct evsel *evsel)
> > > +{
> > > + struct perf_bpf_filter_expr *expr, *tmp;
> > > +
> > > + list_for_each_entry_safe(expr, tmp, &evsel->bpf_filters, list) {
> > > + list_del(&expr->list);
> > > + free(expr);
> > > + }
> > > + sample_filter_bpf__destroy(evsel->bpf_skel);
> > > + return 0;
> > > +}
> > > +
> > > struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> > > enum perf_bpf_filter_op op,
> > > unsigned long val)
> > > diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
> > > index 93a0d3de038c..eb8e1ac43cdf 100644
> > > --- a/tools/perf/util/bpf-filter.h
> > > +++ b/tools/perf/util/bpf-filter.h
> > > @@ -4,15 +4,7 @@
> > >
> > > #include <linux/list.h>
> > >
> > > -enum perf_bpf_filter_op {
> > > - PBF_OP_EQ,
> > > - PBF_OP_NEQ,
> > > - PBF_OP_GT,
> > > - PBF_OP_GE,
> > > - PBF_OP_LT,
> > > - PBF_OP_LE,
> > > - PBF_OP_AND,
> > > -};
> > > +#include "bpf_skel/sample-filter.h"
> > >
> > > struct perf_bpf_filter_expr {
> > > struct list_head list;
> > > @@ -21,16 +13,30 @@ struct perf_bpf_filter_expr {
> > > unsigned long val;
> > > };
> > >
> > > +struct evsel;
> > > +
> > > #ifdef HAVE_BPF_SKEL
> > > struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> > > enum perf_bpf_filter_op op,
> > > unsigned long val);
> > > int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
> > > +int perf_bpf_filter__prepare(struct evsel *evsel);
> > > +int perf_bpf_filter__destroy(struct evsel *evsel);
> > > +
> > > #else /* !HAVE_BPF_SKEL */
> > > +
> > > static inline int perf_bpf_filter__parse(struct list_head *expr_head __maybe_unused,
> > > const char *str __maybe_unused)
> > > {
> > > - return -ENOSYS;
> > > + return -EOPNOTSUPP;
> > > +}
> > > +static inline int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
> > > +{
> > > + return -EOPNOTSUPP;
> > > +}
> > > +static inline int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
> > > +{
> > > + return -EOPNOTSUPP;
> > > }
> > > #endif /* HAVE_BPF_SKEL*/
> > > #endif /* PERF_UTIL_BPF_FILTER_H */
> > > diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
> > > new file mode 100644
> > > index 000000000000..862060bfda14
> > > --- /dev/null
> > > +++ b/tools/perf/util/bpf_skel/sample-filter.h
> > > @@ -0,0 +1,24 @@
> > > +#ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> > > +#define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> > > +
> > > +#define MAX_FILTERS 32
> > > +
> > > +/* supported filter operations */
> > > +enum perf_bpf_filter_op {
> > > + PBF_OP_EQ,
> > > + PBF_OP_NEQ,
> > > + PBF_OP_GT,
> > > + PBF_OP_GE,
> > > + PBF_OP_LT,
> > > + PBF_OP_LE,
> > > + PBF_OP_AND
> > > +};
> > > +
> > > +/* BPF map entry for filtering */
> > > +struct perf_bpf_filter_entry {
> > > + enum perf_bpf_filter_op op;
> > > + __u64 flags;
> > > + __u64 value;
> > > +};
> > > +
> > > +#endif /* PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H */
> > > \ No newline at end of file
> > > diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> > > new file mode 100644
> > > index 000000000000..c07256279c3e
> > > --- /dev/null
> > > +++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> > > @@ -0,0 +1,126 @@
> > > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > +// Copyright (c) 2023 Google
> > > +#include "vmlinux.h"
> > > +#include <bpf/bpf_helpers.h>
> > > +#include <bpf/bpf_tracing.h>
> > > +#include <bpf/bpf_core_read.h>
> > > +
> > > +#include "sample-filter.h"
> > > +
> > > +/* BPF map that will be filled by user space */
> > > +struct filters {
> > > + __uint(type, BPF_MAP_TYPE_ARRAY);
> > > + __type(key, int);
> > > + __type(value, struct perf_bpf_filter_entry);
> > > + __uint(max_entries, MAX_FILTERS);
> > > +} filters SEC(".maps");
> > > +
> > > +int dropped;
> > > +
> > > +void *bpf_cast_to_kern_ctx(void *) __ksym;
> > > +
> > > +/* new kernel perf_sample_data definition */
> > > +struct perf_sample_data___new {
> > > + __u64 sample_flags;
> > > +} __attribute__((preserve_access_index));
> > > +
> > > +/* helper function to return the given perf sample data */
> > > +static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
> > > + struct perf_bpf_filter_entry *entry)
> > > +{
> > > + struct perf_sample_data___new *data = (void *)kctx->data;
> > > +
> > > + if (!bpf_core_field_exists(data->sample_flags) ||
> > > + (data->sample_flags & entry->flags) == 0)
> > > + return 0;
> > > +
> > > + switch (entry->flags) {
> > > + case PERF_SAMPLE_IP:
> > > + return kctx->data->ip;
> > > + case PERF_SAMPLE_ID:
> > > + return kctx->data->id;
> > > + case PERF_SAMPLE_TID:
> > > + return kctx->data->tid_entry.tid;
> > > + case PERF_SAMPLE_CPU:
> > > + return kctx->data->cpu_entry.cpu;
> > > + case PERF_SAMPLE_TIME:
> > > + return kctx->data->time;
> > > + case PERF_SAMPLE_ADDR:
> > > + return kctx->data->addr;
> > > + case PERF_SAMPLE_PERIOD:
> > > + return kctx->data->period;
> > > + case PERF_SAMPLE_TRANSACTION:
> > > + return kctx->data->txn;
> > > + case PERF_SAMPLE_WEIGHT:
> > > + return kctx->data->weight.full;
> > > + case PERF_SAMPLE_PHYS_ADDR:
> > > + return kctx->data->phys_addr;
> > > + case PERF_SAMPLE_CODE_PAGE_SIZE:
> > > + return kctx->data->code_page_size;
> > > + case PERF_SAMPLE_DATA_PAGE_SIZE:
> > > + return kctx->data->data_page_size;
> > > + default:
> > > + break;
> > > + }
> > > + return 0;
> > > +}
> > > +
> > > +/* BPF program to be called from perf event overflow handler */
> > > +SEC("perf_event")
> > > +int perf_sample_filter(void *ctx)
> > > +{
> > > + struct bpf_perf_event_data_kern *kctx;
> > > + struct perf_bpf_filter_entry *entry;
> > > + __u64 sample_data;
> > > + int i;
> > > +
> > > + kctx = bpf_cast_to_kern_ctx(ctx);
> > > +
> > > + for (i = 0; i < MAX_FILTERS; i++) {
> > > + int key = i; /* needed for verifier :( */
> > > +
> > > + entry = bpf_map_lookup_elem(&filters, &key);
> > > + if (entry == NULL)
> > > + break;
> > > + sample_data = perf_get_sample(kctx, entry);
> > > +
> > > + switch (entry->op) {
> > > + case PBF_OP_EQ:
> > > + if (!(sample_data == entry->value))
> > > + goto drop;
> > > + break;
> > > + case PBF_OP_NEQ:
> > > + if (!(sample_data != entry->value))
> > > + goto drop;
> > > + break;
> > > + case PBF_OP_GT:
> > > + if (!(sample_data > entry->value))
> > > + goto drop;
> > > + break;
> > > + case PBF_OP_GE:
> > > + if (!(sample_data >= entry->value))
> > > + goto drop;
> > > + break;
> > > + case PBF_OP_LT:
> > > + if (!(sample_data < entry->value))
> > > + goto drop;
> > > + break;
> > > + case PBF_OP_LE:
> > > + if (!(sample_data <= entry->value))
> > > + goto drop;
> > > + break;
> > > + case PBF_OP_AND:
> > > + if (!(sample_data & entry->value))
> > > + goto drop;
> > > + break;
> > > + }
> > > + }
> > > + /* generate sample data */
> > > + return 1;
> > > +
> > > +drop:
> > > + __sync_fetch_and_add(&dropped, 1);
> > > + return 0;
> > > +}
> > > +
> > > +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> > > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > > index c272c06565c0..68072ec655ce 100644
> > > --- a/tools/perf/util/evsel.h
> > > +++ b/tools/perf/util/evsel.h
> > > @@ -150,8 +150,10 @@ struct evsel {
> > > */
> > > struct bpf_counter_ops *bpf_counter_ops;
> > >
> > > - /* for perf-stat -b */
> > > - struct list_head bpf_counter_list;
> > > + union {
> > > + struct list_head bpf_counter_list; /* for perf-stat -b */
> > > + struct list_head bpf_filters; /* for perf-record --filter */
> > > + };
> > >
> > > /* for perf-stat --use-bpf */
> > > int bperf_leader_prog_fd;
> > > @@ -159,6 +161,7 @@ struct evsel {
> > > union {
> > > struct bperf_leader_bpf *leader_skel;
> > > struct bperf_follower_bpf *follower_skel;
> > > + void *bpf_skel;
> > > };
> > > unsigned long open_flags;
> > > int precise_ip_original;
> > > --
> > > 2.40.0.rc1.284.g88254d51c5-goog
> > >
> >
> > --
> >
> > - Arnaldo
>
> --
>
> - Arnaldo
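
A minimal userspace sketch of the loop in the quoted perf_sample_filter(), for anyone who wants to reason about it without loading BPF. The names model_op, model_entry and model_filter are hypothetical and not part of the patch. The model also assumes every entry tests the same sample value, whereas the real program fetches a per-entry value through perf_get_sample(), and it only covers entries joined by AND as in the quoted loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the PBF_OP_* operators in the patch. */
enum model_op { OP_EQ, OP_NEQ, OP_GT, OP_GE, OP_LT, OP_LE, OP_AND };

struct model_entry {
	enum model_op op;
	uint64_t value;
};

/*
 * Return 1 to keep the sample and 0 to drop it.  Every entry must hold,
 * so the first failing comparison drops the sample.
 */
static int model_filter(const struct model_entry *entries, size_t nr,
			uint64_t sample_data)
{
	size_t i;

	assert(entries != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		uint64_t v = entries[i].value;
		int ok;

		switch (entries[i].op) {
		case OP_EQ:  ok = sample_data == v; break;
		case OP_NEQ: ok = sample_data != v; break;
		case OP_GT:  ok = sample_data > v; break;
		case OP_GE:  ok = sample_data >= v; break;
		case OP_LT:  ok = sample_data < v; break;
		case OP_LE:  ok = sample_data <= v; break;
		case OP_AND: ok = (sample_data & v) != 0; break;
		/* Unknown operators are ignored, as in the quoted switch. */
		default:     ok = 1; break;
		}
		if (!ok)
			return 0;
	}
	return 1;
}
```

With a single entry { OP_GT, 1000 }, a period of 5029 is kept and a period of 589 is dropped, which matches the 'period > 1000' example in this thread.
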
On March 15, 2023 1:41:29 PM GMT-03:00, Namhyung Kim <[email protected]> wrote:
>Hi Arnaldo,
>
>On Wed, Mar 15, 2023 at 6:47 AM Arnaldo Carvalho de Melo
><[email protected]> wrote:
>>
>> Em Tue, Mar 14, 2023 at 04:42:30PM -0700, Namhyung Kim escreveu:
>> > Use the --filter option to set a BPF filter for generic events other than
>> > tracepoints or Intel PT. The BPF program will check the sample data and
>> > filter according to the expression.
>> >
>> > For example, below is typical perf record output in frequency mode.
>> > The sample period starts from 1 and increases gradually.
>> >
>> > $ sudo ./perf record -e cycles true
>> > $ sudo ./perf script
>> > perf-exec 2272336 546683.916875: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916892: 1 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916899: 3 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916905: 17 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916911: 100 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916917: 589 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916924: 3470 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > perf-exec 2272336 546683.916930: 20465 cycles: ffffffff828499b8 perf_event_exec+0x298 ([kernel.kallsyms])
>> > true 2272336 546683.916940: 119873 cycles: ffffffff8283afdd perf_iterate_ctx+0x2d ([kernel.kallsyms])
>> > true 2272336 546683.917003: 461349 cycles: ffffffff82892517 vma_interval_tree_insert+0x37 ([kernel.kallsyms])
>> > true 2272336 546683.917237: 635778 cycles: ffffffff82a11400 security_mmap_file+0x20 ([kernel.kallsyms])
>> >
>> > When you add a BPF filter to get samples having periods greater than 1000,
>> > the output would look like below:
>>
>> Had to add:
>>
>> diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
>> index be336f1b2b689602..153a13cdca9df1ea 100644
>> --- a/tools/perf/util/python.c
>> +++ b/tools/perf/util/python.c
>> @@ -19,6 +19,7 @@
>> #include "mmap.h"
>> #include "stat.h"
>> #include "metricgroup.h"
>> +#include "util/bpf-filter.h"
>> #include "util/env.h"
>> #include "util/pmu.h"
>> #include <internal/lib.h>
>> @@ -135,6 +136,18 @@ int bpf_counter__disable(struct evsel *evsel __maybe_unused)
>> return 0;
>> }
>>
>> +// not to drag util/bpf-filter.c
>> +
>> +int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
>> +{
>> + return 0;
>> +}
>> +
>> +int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
>> +{
>> + return 0;
>> +}
>> +
>> /*
>> * Support debug printing even though util/debug.c is not linked. That means
>> * implementing 'verbose' and 'eprintf'.
>>
>>
>> Please run 'perf test' before submitting patches.
>
>Ugh, sorry. I think I ran it at some point but missed the python test :-p
>
>Anyway, I'm afraid you need to enclose with #ifndef HAVE_BPF_SKEL.
Right, I noticed that.
>
>Thanks,
>Namhyung
>
>
>>
>> - Arnaldo
>>
>> > $ sudo ./perf record -e cycles --filter 'period > 1000' true
>> > $ sudo ./perf script
>> > perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
>> > perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
>> > perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
>> > perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
>> > perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
>> > true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
>> > true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
>> >
>> > Acked-by: Jiri Olsa <[email protected]>
>> > Signed-off-by: Namhyung Kim <[email protected]>
>> > ---
>> > tools/perf/Documentation/perf-record.txt | 15 +++++++++++---
>> > tools/perf/util/bpf_counter.c | 3 +--
>> > tools/perf/util/evlist.c | 25 +++++++++++++++++-------
>> > tools/perf/util/evsel.c | 2 ++
>> > tools/perf/util/parse-events.c | 8 +++-----
>> > 5 files changed, 36 insertions(+), 17 deletions(-)
>> >
>> > diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
>> > index ff815c2f67e8..122f71726eaa 100644
>> > --- a/tools/perf/Documentation/perf-record.txt
>> > +++ b/tools/perf/Documentation/perf-record.txt
>> > @@ -119,9 +119,12 @@ OPTIONS
>> > "perf report" to view group events together.
>> >
>> > --filter=<filter>::
>> > - Event filter. This option should follow an event selector (-e) which
>> > - selects either tracepoint event(s) or a hardware trace PMU
>> > - (e.g. Intel PT or CoreSight).
>> > + Event filter. This option should follow an event selector (-e).
>> > + If the event is a tracepoint, the filter string will be parsed by
>> > + the kernel. If the event is a hardware trace PMU (e.g. Intel PT
>> > + or CoreSight), it'll be processed as an address filter. Otherwise
>> > + it means a general filter using BPF which can be applied for any
>> > + kind of event.
>> >
>> > - tracepoint filters
>> >
>> > @@ -176,6 +179,12 @@ OPTIONS
>> >
>> > Multiple filters can be separated with space or comma.
>> >
>> > + - bpf filters
>> > +
>> > + A BPF filter can access the sample data and make a decision based on the
>> > + data. Users need to set an appropriate sample type to use the BPF
>> > + filter.
>> > +
>> > --exclude-perf::
>> > Don't record events issued by perf itself. This option should follow
>> > an event selector (-e) which selects tracepoint event(s). It adds a
>> > diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
>> > index aa78a15a6f0a..1b77436e067e 100644
>> > --- a/tools/perf/util/bpf_counter.c
>> > +++ b/tools/perf/util/bpf_counter.c
>> > @@ -763,8 +763,7 @@ extern struct bpf_counter_ops bperf_cgrp_ops;
>> >
>> > static inline bool bpf_counter_skip(struct evsel *evsel)
>> > {
>> > - return list_empty(&evsel->bpf_counter_list) &&
>> > - evsel->follower_skel == NULL;
>> > + return evsel->bpf_counter_ops == NULL;
>> > }
>> >
>> > int bpf_counter__install_pe(struct evsel *evsel, int cpu_map_idx, int fd)
>> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> > index b74e12239aec..cc491a037836 100644
>> > --- a/tools/perf/util/evlist.c
>> > +++ b/tools/perf/util/evlist.c
>> > @@ -31,6 +31,7 @@
>> > #include "util/evlist-hybrid.h"
>> > #include "util/pmu.h"
>> > #include "util/sample.h"
>> > +#include "util/bpf-filter.h"
>> > #include <signal.h>
>> > #include <unistd.h>
>> > #include <sched.h>
>> > @@ -1086,17 +1087,27 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel)
>> > int err = 0;
>> >
>> > evlist__for_each_entry(evlist, evsel) {
>> > - if (evsel->filter == NULL)
>> > - continue;
>> > -
>> > /*
>> > * filters only work for tracepoint event, which doesn't have cpu limit.
>> > * So evlist and evsel should always be same.
>> > */
>> > - err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
>> > - if (err) {
>> > - *err_evsel = evsel;
>> > - break;
>> > + if (evsel->filter) {
>> > + err = perf_evsel__apply_filter(&evsel->core, evsel->filter);
>> > + if (err) {
>> > + *err_evsel = evsel;
>> > + break;
>> > + }
>> > + }
>> > +
>> > + /*
>> > + * non-tracepoint events can have BPF filters.
>> > + */
>> > + if (!list_empty(&evsel->bpf_filters)) {
>> > + err = perf_bpf_filter__prepare(evsel);
>> > + if (err) {
>> > + *err_evsel = evsel;
>> > + break;
>> > + }
>> > }
>> > }
>> >
>> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> > index a83d8cd5eb51..dc3faf005c3b 100644
>> > --- a/tools/perf/util/evsel.c
>> > +++ b/tools/perf/util/evsel.c
>> > @@ -50,6 +50,7 @@
>> > #include "off_cpu.h"
>> > #include "../perf-sys.h"
>> > #include "util/parse-branch-options.h"
>> > +#include "util/bpf-filter.h"
>> > #include <internal/xyarray.h>
>> > #include <internal/lib.h>
>> > #include <internal/threadmap.h>
>> > @@ -1517,6 +1518,7 @@ void evsel__exit(struct evsel *evsel)
>> > assert(list_empty(&evsel->core.node));
>> > assert(evsel->evlist == NULL);
>> > bpf_counter__destroy(evsel);
>> > + perf_bpf_filter__destroy(evsel);
>> > evsel__free_counts(evsel);
>> > perf_evsel__free_fd(&evsel->core);
>> > perf_evsel__free_id(&evsel->core);
>> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> > index 3b2e5bb3e852..6c5cf5244486 100644
>> > --- a/tools/perf/util/parse-events.c
>> > +++ b/tools/perf/util/parse-events.c
>> > @@ -28,6 +28,7 @@
>> > #include "perf.h"
>> > #include "util/parse-events-hybrid.h"
>> > #include "util/pmu-hybrid.h"
>> > +#include "util/bpf-filter.h"
>> > #include "tracepoint.h"
>> > #include "thread_map.h"
>> >
>> > @@ -2542,11 +2543,8 @@ static int set_filter(struct evsel *evsel, const void *arg)
>> > perf_pmu__scan_file(pmu, "nr_addr_filters",
>> > "%d", &nr_addr_filters);
>> >
>> > - if (!nr_addr_filters) {
>> > - fprintf(stderr,
>> > - "This CPU does not support address filtering\n");
>> > - return -1;
>> > - }
>> > + if (!nr_addr_filters)
>> > + return perf_bpf_filter__parse(&evsel->bpf_filters, str);
>> >
>> > if (evsel__append_addr_filter(evsel, str) < 0) {
>> > fprintf(stderr,
>> > --
>> > 2.40.0.rc1.284.g88254d51c5-goog
>> >
>>
>> --
>>
>> - Arnaldo
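
The dispatch described in the perf-record.txt hunk and in the set_filter() hunk above can be summarised as a small decision function. This is a sketch only: filter_kind and classify_filter are hypothetical names, and the real code derives its inputs from the evsel type and from the PMU's nr_addr_filters file rather than taking them as arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three ways a --filter string is handled. */
enum filter_kind {
	FILTER_TRACEPOINT,	/* string is parsed by the kernel */
	FILTER_ADDRESS,		/* hardware trace PMU address filter */
	FILTER_BPF,		/* general sample filter using BPF */
};

/*
 * Tracepoints keep the kernel filter, PMUs that report address filters
 * keep the address filter, and everything else falls back to BPF.
 */
static enum filter_kind classify_filter(bool is_tracepoint, int nr_addr_filters)
{
	assert(nr_addr_filters >= 0);

	if (is_tracepoint)
		return FILTER_TRACEPOINT;
	if (nr_addr_filters > 0)
		return FILTER_ADDRESS;
	return FILTER_BPF;
}
```

An event such as cycles is not a tracepoint and its PMU reports no address filters, so it ends up on the BPF path, where the patch calls perf_bpf_filter__parse() instead of printing the old "does not support address filtering" error.
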
Em Wed, Mar 15, 2023 at 09:51:03AM -0700, Namhyung Kim escreveu:
> On Wed, Mar 15, 2023 at 9:39 AM Arnaldo Carvalho de Melo
> <[email protected]> wrote:
> >
> > Em Wed, Mar 15, 2023 at 01:24:37PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > Em Tue, Mar 14, 2023 at 04:42:29PM -0700, Namhyung Kim escreveu:
> > > > The BPF program will be attached to a perf_event and be triggered when
> > > > it overflows. It'd iterate the filters map and compare the sample
> > > > value according to the expression. If any of them fails, the sample
> > > > would be dropped.
> > > >
> > > > Also it needs to have the corresponding sample data for the expression
> > > > so it compares data->sample_flags with the given value. To access the
> > > > sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
> > > > in v6.2 kernel.
> > > >
> > > > Acked-by: Jiri Olsa <[email protected]>
> > > > Signed-off-by: Namhyung Kim <[email protected]>
> > >
> > >
> > > I'm noticing this while building on a debian:11 container:
> > >
> > > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
> > > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
> > > GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
> > > GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
> > > GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
> > > GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
> > > libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> > > Error: failed to open BPF object file: No such file or directory
> > > make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> > > make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> > > make[2]: *** Waiting for unfinished jobs....
> > > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > > make: *** [Makefile:70: all] Error 2
> > > make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> > > + exit 1
> > > [perfbuilder@five 11]$
> >
> > Same thing on debian:10
>
> Hmm.. I thought extern symbols with __ksym are runtime
> dependencies and it should build on old kernels too.
>
> BPF folks, any suggestions?
Fedora 33 also fails, see below, but these work:
[perfbuilder@five ~]$ grep Ok dm.log/summary
1 131.48 almalinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-16) , clang version 14.0.6 (Red Hat 14.0.6-1.module_el8.7.0+3277+b822483f)
2 132.99 almalinux:9 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2) , clang version 14.0.6 (Red Hat 14.0.6-4.el9_1)
3 162.36 alpine:3.15 : Ok gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027 , Alpine clang version 12.0.1
4 155.25 alpine:3.16 : Ok gcc (Alpine 11.2.1_git20220219) 11.2.1 20220219 , Alpine clang version 13.0.1
5 136.69 alpine:3.17 : Ok gcc (Alpine 12.2.1_git20220924-r4) 12.2.1 20220924 , Alpine clang version 15.0.7
6 158.08 alpine:edge : Ok gcc (Alpine 12.2.1_git20220924-r9) 12.2.1 20220924 , Alpine clang version 15.0.7
12 137.19 archlinux:base : Ok gcc (GCC) 12.2.0 , clang version 14.0.6
13 117.85 centos:stream : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-18) , clang version 15.0.7 (Red Hat 15.0.7-1.module_el8.8.0+1258+af79b238)
17 122.65 debian:experimental : Ok gcc (Debian 12.2.0-14) 12.2.0 , Debian clang version 14.0.6
30 165.29 fedora:34 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-2) , clang version 12.0.1 (Fedora 12.0.1-1.fc34)
33 153.26 fedora:35 : Ok gcc (GCC) 11.3.1 20220421 (Red Hat 11.3.1-3) , clang version 13.0.1 (Fedora 13.0.1-1.fc35)
34 152.66 fedora:36 : Ok gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4) , clang version 14.0.5 (Fedora 14.0.5-2.fc36)
35 154.36 fedora:37 : Ok gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4) , clang version 15.0.7 (Fedora 15.0.7-1.fc37)
36 145.45 fedora:38 : Ok gcc (GCC) 13.0.1 20230208 (Red Hat 13.0.1-0) , clang version 15.0.7 (Fedora 15.0.7-2.fc38)
37 166.89 fedora:rawhide : Ok gcc (GCC) 13.0.1 20230127 (Red Hat 13.0.1-0) , clang version 15.0.7 (Fedora 15.0.7-2.fc38)
44 146.10 opensuse:15.4 : Ok gcc (SUSE Linux) 7.5.0 , clang version 13.0.1
45 165.87 opensuse:15.5 : Ok gcc (SUSE Linux) 7.5.0 , clang version 15.0.2
46 167.90 opensuse:tumbleweed : Ok gcc (SUSE Linux) 12.2.1 20221020 [revision 0aaef83351473e8f4eb774f8f999bbe87a4866d7] , clang version 15.0.6
47 130.58 oraclelinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-16.0.2) , clang version 14.0.6 (Red Hat 14.0.6-1.0.1.module+el8.7.0+20823+214a699d)
49 132.09 rockylinux:8 : Ok gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-16) , clang version 14.0.6 (Red Hat 14.0.6-1.module+el8.7.0+1080+d88dc670)
64 161.49 ubuntu:22.04 : Ok gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 , Ubuntu clang version 14.0.0-1ubuntu1
[perfbuilder@five ~]$
[perfbuilder@five ~]$ for a in `grep Ok dm.log/summary | cut -c15- | cut -d: -f1,2`; do grep -q GENSKEL.*sample_filter dm.log/$a && echo $a ; done
almalinux:8
almalinux:9
alpine:3.15
alpine:3.16
alpine:3.17
alpine:edge
archlinux:base
centos:stream
debian:experimental
fedora:34
fedora:35
fedora:36
fedora:37
fedora:38
fedora:rawhide
opensuse:15.4
opensuse:15.5
opensuse:tumbleweed
oraclelinux:8
rockylinux:8
ubuntu:22.04
[perfbuilder@five ~]$
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.1 20210422 (Red Hat 10.3.1-1) (GCC)
+ make PYTHON=python3 ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf
make: Entering directory '/git/perf-6.3.0-rc1/tools/perf'
BUILD: Doing 'make -j32' parallel build
HOSTCC /tmp/build/perf/fixdep.o
HOSTLD /tmp/build/perf/fixdep-in.o
LINK /tmp/build/perf/fixdep
Makefile.config:1046: No libbabeltrace found, disables 'perf data' CTF format support, please install libbabeltrace-dev[el]/libbabeltrace-ctf-dev
Makefile.config:1137: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
... glibc: [ on ]
... libbfd: [ on ]
... libbfd-buildid: [ on ]
... libcap: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... numa_num_possible_cpus: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libcrypto: [ on ]
... libunwind: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... lzma: [ on ]
... get_cpuid: [ on ]
... bpf: [ on ]
... libaio: [ on ]
... libzstd: [ on ]
GEN /tmp/build/perf/common-cmds.h
PERF_VERSION = 6.3.rc1.gd34a77f6cd75
GEN perf-archive
GEN perf-iostat
CC /tmp/build/perf/dlfilters/dlfilter-test-api-v0.o
CC /tmp/build/perf/dlfilters/dlfilter-show-cycles.o
MKDIR /tmp/build/perf/jvmti/
MKDIR /tmp/build/perf/jvmti/
MKDIR /tmp/build/perf/jvmti/
MKDIR /tmp/build/perf/jvmti/
CC /tmp/build/perf/jvmti/libjvmti.o
CC /tmp/build/perf/jvmti/jvmti_agent.o
CC /tmp/build/perf/jvmti/libstring.o
CC /tmp/build/perf/jvmti/libctype.o
INSTALL /tmp/build/perf/libsubcmd/include/subcmd/exec-cmd.h
INSTALL /tmp/build/perf/libsubcmd/include/subcmd/help.h
INSTALL /tmp/build/perf/libsubcmd/include/subcmd/pager.h
INSTALL /tmp/build/perf/libsubcmd/include/subcmd/parse-options.h
INSTALL /tmp/build/perf/libsubcmd/include/subcmd/run-command.h
CC /tmp/build/perf/libsubcmd/exec-cmd.o
CC /tmp/build/perf/libsubcmd/help.o
CC /tmp/build/perf/libsubcmd/pager.o
CC /tmp/build/perf/libsubcmd/parse-options.o
CC /tmp/build/perf/libsubcmd/run-command.o
CC /tmp/build/perf/libsubcmd/sigchain.o
CC /tmp/build/perf/libsubcmd/subcmd-config.o
LINK /tmp/build/perf/dlfilters/dlfilter-show-cycles.so
INSTALL libsubcmd_headers
LINK /tmp/build/perf/dlfilters/dlfilter-test-api-v0.so
INSTALL /tmp/build/perf/libperf/include/perf/bpf_perf.h
INSTALL /tmp/build/perf/libperf/include/perf/core.h
INSTALL /tmp/build/perf/libperf/include/perf/cpumap.h
INSTALL /tmp/build/perf/libperf/include/perf/threadmap.h
INSTALL /tmp/build/perf/libsymbol/include/symbol/kallsyms.h
INSTALL /tmp/build/perf/libapi/include/api/cpu.h
INSTALL /tmp/build/perf/libperf/include/perf/evlist.h
INSTALL /tmp/build/perf/libapi/include/api/debug.h
INSTALL /tmp/build/perf/libperf/include/perf/evsel.h
CC /tmp/build/perf/libperf/core.o
CC /tmp/build/perf/libsymbol/kallsyms.o
CC /tmp/build/perf/libperf/cpumap.o
INSTALL /tmp/build/perf/libapi/include/api/io.h
INSTALL /tmp/build/perf/libperf/include/perf/event.h
INSTALL /tmp/build/perf/libapi/include/api/fd/array.h
MKDIR /tmp/build/perf/libapi/fd/
CC /tmp/build/perf/libperf/threadmap.o
INSTALL libsymbol_headers
INSTALL /tmp/build/perf/libperf/include/perf/mmap.h
INSTALL /tmp/build/perf/libapi/include/api/fs/fs.h
CC /tmp/build/perf/libperf/evsel.o
GEN /tmp/build/perf/libbpf/bpf_helper_defs.h
CC /tmp/build/perf/libperf/evlist.o
MKDIR /tmp/build/perf/libapi/fs/
INSTALL /tmp/build/perf/libapi/include/api/fs/tracing_path.h
MKDIR /tmp/build/perf/libapi/fs/
CC /tmp/build/perf/libapi/fd/array.o
CC /tmp/build/perf/libperf/mmap.o
MKDIR /tmp/build/perf/libapi/fs/
CC /tmp/build/perf/libperf/zalloc.o
CC /tmp/build/perf/libapi/fs/fs.o
CC /tmp/build/perf/libperf/xyarray.o
CC /tmp/build/perf/libperf/lib.o
CC /tmp/build/perf/libapi/fs/tracing_path.o
CC /tmp/build/perf/libapi/fs/cgroup.o
INSTALL /tmp/build/perf/libperf/include/internal/cpumap.h
INSTALL /tmp/build/perf/libperf/include/internal/evlist.h
CC /tmp/build/perf/libapi/cpu.o
CC /tmp/build/perf/libapi/debug.o
CC /tmp/build/perf/libapi/str_error_r.o
INSTALL /tmp/build/perf/libperf/include/internal/evsel.h
INSTALL /tmp/build/perf/libbpf/include/bpf/bpf.h
INSTALL /tmp/build/perf/libbpf/include/bpf/libbpf.h
INSTALL /tmp/build/perf/libperf/include/internal/lib.h
INSTALL libapi_headers
INSTALL /tmp/build/perf/libperf/include/internal/mmap.h
INSTALL /tmp/build/perf/libperf/include/internal/threadmap.h
INSTALL /tmp/build/perf/libperf/include/internal/xyarray.h
INSTALL /tmp/build/perf/libbpf/include/bpf/btf.h
INSTALL /tmp/build/perf/libbpf/include/bpf/libbpf_common.h
INSTALL /tmp/build/perf/libbpf/include/bpf/libbpf_legacy.h
INSTALL /tmp/build/perf/libbpf/include/bpf/bpf_helpers.h
INSTALL /tmp/build/perf/libbpf/include/bpf/bpf_tracing.h
INSTALL /tmp/build/perf/libbpf/include/bpf/bpf_endian.h
INSTALL /tmp/build/perf/libbpf/include/bpf/bpf_core_read.h
INSTALL libperf_headers
INSTALL /tmp/build/perf/libbpf/include/bpf/skel_internal.h
INSTALL /tmp/build/perf/libbpf/include/bpf/libbpf_version.h
INSTALL /tmp/build/perf/libbpf/include/bpf/usdt.bpf.h
INSTALL /tmp/build/perf/libbpf/include/bpf/bpf_helper_defs.h
MKDIR /tmp/build/perf/libbpf/staticobjs/
MKDIR /tmp/build/perf/libbpf/staticobjs/
INSTALL libbpf_headers
MKDIR /tmp/build/perf/libbpf/staticobjs/
MKDIR /tmp/build/perf/libbpf/staticobjs/
MKDIR /tmp/build/perf/libbpf/staticobjs/
LD /tmp/build/perf/libsymbol/libsymbol-in.o
MKDIR /tmp/build/perf/libbpf/staticobjs/
LD /tmp/build/perf/libapi/fd/libapi-in.o
MKDIR /tmp/build/perf/libbpf/staticobjs/
CC /tmp/build/perf/libbpf/staticobjs/bpf_prog_linfo.o
CC /tmp/build/perf/libbpf/staticobjs/libbpf.o
CC /tmp/build/perf/libbpf/staticobjs/bpf.o
CC /tmp/build/perf/libbpf/staticobjs/nlattr.o
CC /tmp/build/perf/libbpf/staticobjs/btf.o
CC /tmp/build/perf/libbpf/staticobjs/libbpf_errno.o
CC /tmp/build/perf/libbpf/staticobjs/str_error.o
CC /tmp/build/perf/libbpf/staticobjs/netlink.o
CC /tmp/build/perf/libbpf/staticobjs/libbpf_probes.o
CC /tmp/build/perf/libbpf/staticobjs/hashmap.o
CC /tmp/build/perf/libbpf/staticobjs/btf_dump.o
AR /tmp/build/perf/libsymbol/libsymbol.a
CC /tmp/build/perf/libbpf/staticobjs/ringbuf.o
LD /tmp/build/perf/jvmti/jvmti-in.o
CC /tmp/build/perf/libbpf/staticobjs/strset.o
CC /tmp/build/perf/libbpf/staticobjs/linker.o
CC /tmp/build/perf/libbpf/staticobjs/gen_loader.o
CC /tmp/build/perf/libbpf/staticobjs/relo_core.o
CC /tmp/build/perf/libbpf/staticobjs/usdt.o
LD /tmp/build/perf/libapi/fs/libapi-in.o
LINK /tmp/build/perf/libperf-jvmti.so
LD /tmp/build/perf/libapi/libapi-in.o
LD /tmp/build/perf/libperf/libperf-in.o
AR /tmp/build/perf/libapi/libapi.a
AR /tmp/build/perf/libperf/libperf.a
LD /tmp/build/perf/libsubcmd/libsubcmd-in.o
AR /tmp/build/perf/libsubcmd/libsubcmd.a
GEN /tmp/build/perf/python/perf.cpython-39-x86_64-linux-gnu.so
Auto-detecting system features:
... clang-bpf-co-re: [ on ]
... llvm: [ on ]
... libcap: [ on ]
... libbfd: [ on ]
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/hashmap.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/relo_core.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/libbpf_internal.h
GEN /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/bpf_helper_defs.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/libbpf.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/bpf.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/btf.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/libbpf_common.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/libbpf_legacy.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/bpf_helpers.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/bpf_tracing.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/bpf_endian.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/bpf_core_read.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/skel_internal.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/libbpf_version.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/usdt.bpf.h
INSTALL /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/include/bpf/bpf_helper_defs.h
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
INSTALL libbpf_headers
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
MKDIR /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/libbpf_probes.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/libbpf.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/bpf.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/nlattr.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/btf.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/libbpf_errno.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/hashmap.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/str_error.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/netlink.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/bpf_prog_linfo.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/btf_dump.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/ringbuf.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/strset.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/linker.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/gen_loader.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/relo_core.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/usdt.o
LD /tmp/build/perf/libbpf/staticobjs/libbpf-in.o
LINK /tmp/build/perf/libbpf/libbpf.a
LD /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/staticobjs/libbpf-in.o
LINK /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/libbpf/libbpf.a
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/main.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/common.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/json_writer.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/gen.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/btf.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/xlated_dumper.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/btf_dumper.o
CC /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/disasm.o
LINK /tmp/build/perf/util/bpf_skel/.tmp/bootstrap/bpftool
GEN /tmp/build/perf/util/bpf_skel/vmlinux.h
CLANG /tmp/build/perf/util/bpf_skel/.tmp/bpf_prog_profiler.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/bperf_leader.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/bperf_follower.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/bperf_cgroup.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/func_latency.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/off_cpu.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/lock_contention.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/kwork_trace.bpf.o
CLANG /tmp/build/perf/util/bpf_skel/.tmp/sample_filter.bpf.o
GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
libbpf: elf: skipping unrecognized data section(7) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(12) .rel.eh_frame for section(7) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping relo section(14) .rel.eh_frame for section(8) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
libbpf: elf: skipping unrecognized data section(17) .eh_frame
libbpf: elf: skipping relo section(29) .rel.eh_frame for section(17) .eh_frame
GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
libbpf: elf: skipping unrecognized data section(8) .eh_frame
libbpf: elf: skipping relo section(13) .rel.eh_frame for section(8) .eh_frame
libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
Error: failed to open BPF object file: No such file or directory
make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [Makefile.perf:236: sub-make] Error 2
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
+ exit 1
[perfbuilder@five 33]$ fg
> Thanks,
> Namhyung
>
>
> >
> > libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> > Error: failed to open BPF object file: No such file or directory
> > make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> > make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> > make[2]: *** Waiting for unfinished jobs....
> > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > make: *** [Makefile:70: all] Error 2
> > make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> > + exit 1
> > [perfbuilder@five 10]$
> >
> > Works with debian:experimental:
> >
> >
> > [perfbuilder@five experimental]$ export BUILD_TARBALL=http://192.168.86.10/perf/perf-6.3.0-rc1.tar.xz
> > [perfbuilder@five experimental]$ time dm .
> > 1 147.54 debian:experimental : Ok gcc (Debian 12.2.0-14) 12.2.0 , Debian clang version 14.0.6
> > BUILD_TARBALL_HEAD=d34a77f6cd75d2a75c64e78f3d949a12903a7cf0
> >
> > Both with:
> >
> > Debian clang version 14.0.6
> > Target: x86_64-pc-linux-gnu
> > Thread model: posix
> > InstalledDir: /usr/bin
> > Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
> > Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
> > Candidate multilib: .;@m64
> > Selected multilib: .;@m64
> > + rm -rf /tmp/build/perf
> > + mkdir /tmp/build/perf
> > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf CC=clang
> >
> > and:
> >
> > COLLECT_GCC=gcc
> > COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/12/lto-wrapper
> > OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
> > OFFLOAD_TARGET_DEFAULT=1
> > Target: x86_64-linux-gnu
> > Configured with: ../src/configure -v --with-pkgversion='Debian 12.2.0-14' --with-bugurl=file:///usr/share/doc/gcc-12/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-12 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-12-bTRWOB/gcc-12-12.2.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
> > Thread model: posix
> > Supported LTO compression algorithms: zlib zstd
> > gcc version 12.2.0 (Debian 12.2.0-14)
> > + make ARCH= CROSS_COMPILE= EXTRA_CFLAGS= -C tools/perf O=/tmp/build/perf
> > make: Entering directory '/git/perf-6.3.0-rc1/tools/perf'
> >
> >
> > >
> > > > ---
> > > > tools/perf/Makefile.perf | 2 +-
> > > > tools/perf/util/bpf-filter.c | 64 ++++++++++
> > > > tools/perf/util/bpf-filter.h | 26 ++--
> > > > tools/perf/util/bpf_skel/sample-filter.h | 24 ++++
> > > > tools/perf/util/bpf_skel/sample_filter.bpf.c | 126 +++++++++++++++++++
> > > > tools/perf/util/evsel.h | 7 +-
> > > > 6 files changed, 236 insertions(+), 13 deletions(-)
> > > > create mode 100644 tools/perf/util/bpf_skel/sample-filter.h
> > > > create mode 100644 tools/perf/util/bpf_skel/sample_filter.bpf.c
> > > >
> > > > diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> > > > index dc9dda09b076..ed6b6a070f79 100644
> > > > --- a/tools/perf/Makefile.perf
> > > > +++ b/tools/perf/Makefile.perf
> > > > @@ -1050,7 +1050,7 @@ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
> > > > SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h
> > > > SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h
> > > > SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_OUT)/lock_contention.skel.h
> > > > -SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h
> > > > +SKELETONS += $(SKEL_OUT)/kwork_trace.skel.h $(SKEL_OUT)/sample_filter.skel.h
> > > >
> > > > $(SKEL_TMP_OUT) $(LIBAPI_OUTPUT) $(LIBBPF_OUTPUT) $(LIBPERF_OUTPUT) $(LIBSUBCMD_OUTPUT) $(LIBSYMBOL_OUTPUT):
> > > > $(Q)$(MKDIR) -p $@
> > > > diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c
> > > > index c72e35d51240..f20e1bc03778 100644
> > > > --- a/tools/perf/util/bpf-filter.c
> > > > +++ b/tools/perf/util/bpf-filter.c
> > > > @@ -1,10 +1,74 @@
> > > > /* SPDX-License-Identifier: GPL-2.0 */
> > > > #include <stdlib.h>
> > > >
> > > > +#include <bpf/bpf.h>
> > > > +#include <linux/err.h>
> > > > +#include <internal/xyarray.h>
> > > > +
> > > > +#include "util/debug.h"
> > > > +#include "util/evsel.h"
> > > > +
> > > > #include "util/bpf-filter.h"
> > > > #include "util/bpf-filter-flex.h"
> > > > #include "util/bpf-filter-bison.h"
> > > >
> > > > +#include "bpf_skel/sample-filter.h"
> > > > +#include "bpf_skel/sample_filter.skel.h"
> > > > +
> > > > +#define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
> > > > +
> > > > +int perf_bpf_filter__prepare(struct evsel *evsel)
> > > > +{
> > > > + int i, x, y, fd;
> > > > + struct sample_filter_bpf *skel;
> > > > + struct bpf_program *prog;
> > > > + struct bpf_link *link;
> > > > + struct perf_bpf_filter_expr *expr;
> > > > +
> > > > + skel = sample_filter_bpf__open_and_load();
> > > > + if (!skel) {
> > > > + pr_err("Failed to load perf sample-filter BPF skeleton\n");
> > > > + return -1;
> > > > + }
> > > > +
> > > > + i = 0;
> > > > + fd = bpf_map__fd(skel->maps.filters);
> > > > + list_for_each_entry(expr, &evsel->bpf_filters, list) {
> > > > + struct perf_bpf_filter_entry entry = {
> > > > + .op = expr->op,
> > > > + .flags = expr->sample_flags,
> > > > + .value = expr->val,
> > > > + };
> > > > + bpf_map_update_elem(fd, &i, &entry, BPF_ANY);
> > > > + i++;
> > > > + }
> > > > +
> > > > + prog = skel->progs.perf_sample_filter;
> > > > + for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) {
> > > > + for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) {
> > > > + link = bpf_program__attach_perf_event(prog, FD(evsel, x, y));
> > > > + if (IS_ERR(link)) {
> > > > + pr_err("Failed to attach perf sample-filter program\n");
> > > > + return PTR_ERR(link);
> > > > + }
> > > > + }
> > > > + }
> > > > + evsel->bpf_skel = skel;
> > > > + return 0;
> > > > +}
> > > > +
> > > > +int perf_bpf_filter__destroy(struct evsel *evsel)
> > > > +{
> > > > + struct perf_bpf_filter_expr *expr, *tmp;
> > > > +
> > > > + list_for_each_entry_safe(expr, tmp, &evsel->bpf_filters, list) {
> > > > + list_del(&expr->list);
> > > > + free(expr);
> > > > + }
> > > > + sample_filter_bpf__destroy(evsel->bpf_skel);
> > > > + return 0;
> > > > +}
> > > > +
> > > > struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> > > > enum perf_bpf_filter_op op,
> > > > unsigned long val)
> > > > diff --git a/tools/perf/util/bpf-filter.h b/tools/perf/util/bpf-filter.h
> > > > index 93a0d3de038c..eb8e1ac43cdf 100644
> > > > --- a/tools/perf/util/bpf-filter.h
> > > > +++ b/tools/perf/util/bpf-filter.h
> > > > @@ -4,15 +4,7 @@
> > > >
> > > > #include <linux/list.h>
> > > >
> > > > -enum perf_bpf_filter_op {
> > > > - PBF_OP_EQ,
> > > > - PBF_OP_NEQ,
> > > > - PBF_OP_GT,
> > > > - PBF_OP_GE,
> > > > - PBF_OP_LT,
> > > > - PBF_OP_LE,
> > > > - PBF_OP_AND,
> > > > -};
> > > > +#include "bpf_skel/sample-filter.h"
> > > >
> > > > struct perf_bpf_filter_expr {
> > > > struct list_head list;
> > > > @@ -21,16 +13,30 @@ struct perf_bpf_filter_expr {
> > > > unsigned long val;
> > > > };
> > > >
> > > > +struct evsel;
> > > > +
> > > > #ifdef HAVE_BPF_SKEL
> > > > struct perf_bpf_filter_expr *perf_bpf_filter_expr__new(unsigned long sample_flags,
> > > > enum perf_bpf_filter_op op,
> > > > unsigned long val);
> > > > int perf_bpf_filter__parse(struct list_head *expr_head, const char *str);
> > > > +int perf_bpf_filter__prepare(struct evsel *evsel);
> > > > +int perf_bpf_filter__destroy(struct evsel *evsel);
> > > > +
> > > > #else /* !HAVE_BPF_SKEL */
> > > > +
> > > > static inline int perf_bpf_filter__parse(struct list_head *expr_head __maybe_unused,
> > > > const char *str __maybe_unused)
> > > > {
> > > > - return -ENOSYS;
> > > > + return -EOPNOTSUPP;
> > > > +}
> > > > +static inline int perf_bpf_filter__prepare(struct evsel *evsel __maybe_unused)
> > > > +{
> > > > + return -EOPNOTSUPP;
> > > > +}
> > > > +static inline int perf_bpf_filter__destroy(struct evsel *evsel __maybe_unused)
> > > > +{
> > > > + return -EOPNOTSUPP;
> > > > }
> > > > #endif /* HAVE_BPF_SKEL*/
> > > > #endif /* PERF_UTIL_BPF_FILTER_H */
> > > > diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h
> > > > new file mode 100644
> > > > index 000000000000..862060bfda14
> > > > --- /dev/null
> > > > +++ b/tools/perf/util/bpf_skel/sample-filter.h
> > > > @@ -0,0 +1,24 @@
> > > > +#ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> > > > +#define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H
> > > > +
> > > > +#define MAX_FILTERS 32
> > > > +
> > > > +/* supported filter operations */
> > > > +enum perf_bpf_filter_op {
> > > > + PBF_OP_EQ,
> > > > + PBF_OP_NEQ,
> > > > + PBF_OP_GT,
> > > > + PBF_OP_GE,
> > > > + PBF_OP_LT,
> > > > + PBF_OP_LE,
> > > > + PBF_OP_AND
> > > > +};
> > > > +
> > > > +/* BPF map entry for filtering */
> > > > +struct perf_bpf_filter_entry {
> > > > + enum perf_bpf_filter_op op;
> > > > + __u64 flags;
> > > > + __u64 value;
> > > > +};
> > > > +
> > > > +#endif /* PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H */
> > > > \ No newline at end of file
> > > > diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> > > > new file mode 100644
> > > > index 000000000000..c07256279c3e
> > > > --- /dev/null
> > > > +++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> > > > @@ -0,0 +1,126 @@
> > > > +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > > > +// Copyright (c) 2023 Google
> > > > +#include "vmlinux.h"
> > > > +#include <bpf/bpf_helpers.h>
> > > > +#include <bpf/bpf_tracing.h>
> > > > +#include <bpf/bpf_core_read.h>
> > > > +
> > > > +#include "sample-filter.h"
> > > > +
> > > > +/* BPF map that will be filled by user space */
> > > > +struct filters {
> > > > + __uint(type, BPF_MAP_TYPE_ARRAY);
> > > > + __type(key, int);
> > > > + __type(value, struct perf_bpf_filter_entry);
> > > > + __uint(max_entries, MAX_FILTERS);
> > > > +} filters SEC(".maps");
> > > > +
> > > > +int dropped;
> > > > +
> > > > +void *bpf_cast_to_kern_ctx(void *) __ksym;
> > > > +
> > > > +/* new kernel perf_sample_data definition */
> > > > +struct perf_sample_data___new {
> > > > + __u64 sample_flags;
> > > > +} __attribute__((preserve_access_index));
> > > > +
> > > > +/* helper function to return the given perf sample data */
> > > > +static inline __u64 perf_get_sample(struct bpf_perf_event_data_kern *kctx,
> > > > + struct perf_bpf_filter_entry *entry)
> > > > +{
> > > > + struct perf_sample_data___new *data = (void *)kctx->data;
> > > > +
> > > > + if (!bpf_core_field_exists(data->sample_flags) ||
> > > > + (data->sample_flags & entry->flags) == 0)
> > > > + return 0;
> > > > +
> > > > + switch (entry->flags) {
> > > > + case PERF_SAMPLE_IP:
> > > > + return kctx->data->ip;
> > > > + case PERF_SAMPLE_ID:
> > > > + return kctx->data->id;
> > > > + case PERF_SAMPLE_TID:
> > > > + return kctx->data->tid_entry.tid;
> > > > + case PERF_SAMPLE_CPU:
> > > > + return kctx->data->cpu_entry.cpu;
> > > > + case PERF_SAMPLE_TIME:
> > > > + return kctx->data->time;
> > > > + case PERF_SAMPLE_ADDR:
> > > > + return kctx->data->addr;
> > > > + case PERF_SAMPLE_PERIOD:
> > > > + return kctx->data->period;
> > > > + case PERF_SAMPLE_TRANSACTION:
> > > > + return kctx->data->txn;
> > > > + case PERF_SAMPLE_WEIGHT:
> > > > + return kctx->data->weight.full;
> > > > + case PERF_SAMPLE_PHYS_ADDR:
> > > > + return kctx->data->phys_addr;
> > > > + case PERF_SAMPLE_CODE_PAGE_SIZE:
> > > > + return kctx->data->code_page_size;
> > > > + case PERF_SAMPLE_DATA_PAGE_SIZE:
> > > > + return kctx->data->data_page_size;
> > > > + default:
> > > > + break;
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* BPF program to be called from perf event overflow handler */
> > > > +SEC("perf_event")
> > > > +int perf_sample_filter(void *ctx)
> > > > +{
> > > > + struct bpf_perf_event_data_kern *kctx;
> > > > + struct perf_bpf_filter_entry *entry;
> > > > + __u64 sample_data;
> > > > + int i;
> > > > +
> > > > + kctx = bpf_cast_to_kern_ctx(ctx);
> > > > +
> > > > + for (i = 0; i < MAX_FILTERS; i++) {
> > > > + int key = i; /* needed for verifier :( */
> > > > +
> > > > + entry = bpf_map_lookup_elem(&filters, &key);
> > > > + if (entry == NULL)
> > > > + break;
> > > > + sample_data = perf_get_sample(kctx, entry);
> > > > +
> > > > + switch (entry->op) {
> > > > + case PBF_OP_EQ:
> > > > + if (!(sample_data == entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + case PBF_OP_NEQ:
> > > > + if (!(sample_data != entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + case PBF_OP_GT:
> > > > + if (!(sample_data > entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + case PBF_OP_GE:
> > > > + if (!(sample_data >= entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + case PBF_OP_LT:
> > > > + if (!(sample_data < entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + case PBF_OP_LE:
> > > > + if (!(sample_data <= entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + case PBF_OP_AND:
> > > > + if (!(sample_data & entry->value))
> > > > + goto drop;
> > > > + break;
> > > > + }
> > > > + }
> > > > + /* generate sample data */
> > > > + return 1;
> > > > +
> > > > +drop:
> > > > + __sync_fetch_and_add(&dropped, 1);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> > > > diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> > > > index c272c06565c0..68072ec655ce 100644
> > > > --- a/tools/perf/util/evsel.h
> > > > +++ b/tools/perf/util/evsel.h
> > > > @@ -150,8 +150,10 @@ struct evsel {
> > > > */
> > > > struct bpf_counter_ops *bpf_counter_ops;
> > > >
> > > > - /* for perf-stat -b */
> > > > - struct list_head bpf_counter_list;
> > > > + union {
> > > > + struct list_head bpf_counter_list; /* for perf-stat -b */
> > > > + struct list_head bpf_filters; /* for perf-record --filter */
> > > > + };
> > > >
> > > > /* for perf-stat --use-bpf */
> > > > int bperf_leader_prog_fd;
> > > > @@ -159,6 +161,7 @@ struct evsel {
> > > > union {
> > > > struct bperf_leader_bpf *leader_skel;
> > > > struct bperf_follower_bpf *follower_skel;
> > > > + void *bpf_skel;
> > > > };
> > > > unsigned long open_flags;
> > > > int precise_ip_original;
> > > > --
> > > > 2.40.0.rc1.284.g88254d51c5-goog
> > > >
> > >
> > > --
> > >
> > > - Arnaldo
> >
> > --
> >
> > - Arnaldo
--
- Arnaldo
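
The filter program in the patch quoted above walks the filters map and keeps a sample only when every entry matches. A minimal userspace sketch of that AND-only evaluation follows. The names sample_field, filter_entry, filter_match and filter_apply are hypothetical and not part of the patch, and the real program reads its values from perf_sample_data rather than from a plain array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the sample fields a filter can name. */
enum sample_field { F_IP, F_CPU, F_PERIOD, F_MAX };

/* Same comparison set as enum perf_bpf_filter_op in the patch. */
enum filter_op { OP_EQ, OP_NEQ, OP_GT, OP_GE, OP_LT, OP_LE, OP_AND };

struct filter_entry {
	enum sample_field field;	/* which sample value to compare */
	enum filter_op op;		/* how to compare it */
	uint64_t value;			/* right-hand side of the comparison */
};

/* Returns 1 if the single comparison holds, 0 otherwise. */
static int filter_match(const struct filter_entry *e, uint64_t v)
{
	switch (e->op) {
	case OP_EQ:  return v == e->value;
	case OP_NEQ: return v != e->value;
	case OP_GT:  return v > e->value;
	case OP_GE:  return v >= e->value;
	case OP_LT:  return v < e->value;
	case OP_LE:  return v <= e->value;
	case OP_AND: return (v & e->value) != 0;
	}
	return 0;
}

/* Returns 1 (keep) only if every entry matches, 0 (drop) otherwise. */
static int filter_apply(const struct filter_entry *entries, size_t n,
			const uint64_t sample[F_MAX])
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!filter_match(&entries[i], sample[entries[i].field]))
			return 0;
	}
	return 1;
}
```

With cpu == 2 and period == 5000, the entries {F_CPU, OP_EQ, 2} and {F_PERIOD, OP_GT, 1000} keep the sample, while raising the second threshold to 10000 drops it.
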
On Wed, Mar 15, 2023 at 05:12:44PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, Mar 15, 2023 at 09:51:03AM -0700, Namhyung Kim escreveu:
> > On Wed, Mar 15, 2023 at 9:39 AM Arnaldo Carvalho de Melo
> > <[email protected]> wrote:
> > >
> > > Em Wed, Mar 15, 2023 at 01:24:37PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > Em Tue, Mar 14, 2023 at 04:42:29PM -0700, Namhyung Kim escreveu:
> > > > > The BPF program will be attached to a perf_event and be triggered when
> > > > > it overflows. It'd iterate the filters map and compare the sample
> > > > > value according to the expression. If any of them fails, the sample
> > > > > would be dropped.
> > > > >
> > > > > Also it needs to have the corresponding sample data for the expression
> > > > > so it compares data->sample_flags with the given value. To access the
> > > > > sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
> > > > > in v6.2 kernel.
> > > > >
> > > > > Acked-by: Jiri Olsa <[email protected]>
> > > > > Signed-off-by: Namhyung Kim <[email protected]>
> > > >
> > > >
> > > > I'm noticing this while building on a debian:11 container:
> > > >
> > > > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
> > > > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
> > > > GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
> > > > GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
> > > > GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
> > > > GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
> > > > libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> > > > Error: failed to open BPF object file: No such file or directory
> > > > make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> > > > make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> > > > make[2]: *** Waiting for unfinished jobs....
> > > > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > > > make: *** [Makefile:70: all] Error 2
> > > > make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> > > > + exit 1
> > > > [perfbuilder@five 11]$
> > >
> > > Same thing on debian:10
> >
> > Hmm.. I thought extern symbols with __ksym are runtime
> > dependencies and it should build on old kernels too.
> >
> > BPF folks, any suggestions?
>
> Fedora 33 also fails, see below, but these work:

Maybe I can declare it as a weak symbol. How about this?

Thanks,
Namhyung

---8<---

diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
index 57e3c67d6d37..52cbdd1765cd 100644
--- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
+++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
@@ -17,7 +17,7 @@ struct filters {

int dropped;

-void *bpf_cast_to_kern_ctx(void *) __ksym;
+void *bpf_cast_to_kern_ctx(void *) __ksym __weak;

/* new kernel perf_sample_data definition */
struct perf_sample_data___new {
@@ -118,6 +118,10 @@ int perf_sample_filter(void *ctx)
int group_result = 0;
int i;

+ /* no kernel context support, no filtering */
+ if (!bpf_cast_to_kern_ctx)
+ return 1;
+
kctx = bpf_cast_to_kern_ctx(ctx);

for (i = 0; i < MAX_FILTERS; i++) {
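
The diff above relies on weak-symbol semantics: an unresolved weak reference evaluates to NULL, so the code can test for the function before calling it. Below is a userspace analogue of that idiom, assuming GCC or Clang on an ELF target. The names optional_cast and filter_or_pass are hypothetical, and the sketch says nothing about whether the change fixes the GENSKEL failure reported earlier in the thread.

```c
#include <stddef.h>

/*
 * Hypothetical optional function: declared weak and defined nowhere,
 * so the linker resolves its address to NULL instead of failing.
 */
extern void *optional_cast(void *ctx) __attribute__((weak));

/*
 * Returns 1 (keep the sample) when the optional function is missing,
 * mirroring the early "return 1" in the diff. Otherwise keeps the
 * sample only if the cast yields a non-NULL context.
 */
static int filter_or_pass(void *ctx)
{
	if (!optional_cast)
		return 1;

	return optional_cast(ctx) != NULL;
}
```

Since nothing defines optional_cast here, filter_or_pass() takes the early-return path for any argument.
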
Em Wed, Mar 15, 2023 at 10:18:25PM -0700, Namhyung Kim escreveu:
> On Wed, Mar 15, 2023 at 05:12:44PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Mar 15, 2023 at 09:51:03AM -0700, Namhyung Kim escreveu:
> > > On Wed, Mar 15, 2023 at 9:39 AM Arnaldo Carvalho de Melo
> > > <[email protected]> wrote:
> > > >
> > > > Em Wed, Mar 15, 2023 at 01:24:37PM -0300, Arnaldo Carvalho de Melo escreveu:
> > > > > Em Tue, Mar 14, 2023 at 04:42:29PM -0700, Namhyung Kim escreveu:
> > > > > > The BPF program will be attached to a perf_event and be triggered when
> > > > > > it overflows. It'd iterate the filters map and compare the sample
> > > > > > value according to the expression. If any of them fails, the sample
> > > > > > would be dropped.
> > > > > >
> > > > > > Also it needs to have the corresponding sample data for the expression
> > > > > > so it compares data->sample_flags with the given value. To access the
> > > > > > sample data, it uses the bpf_cast_to_kern_ctx() kfunc which was added
> > > > > > in v6.2 kernel.
> > > > > >
> > > > > > Acked-by: Jiri Olsa <[email protected]>
> > > > > > Signed-off-by: Namhyung Kim <[email protected]>
> > > > >
> > > > >
> > > > > I'm noticing this while building on a debian:11 container:
> > > > >
> > > > > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_leader.skel.h
> > > > > GENSKEL /tmp/build/perf/util/bpf_skel/bperf_follower.skel.h
> > > > > GENSKEL /tmp/build/perf/util/bpf_skel/func_latency.skel.h
> > > > > GENSKEL /tmp/build/perf/util/bpf_skel/bpf_prog_profiler.skel.h
> > > > > GENSKEL /tmp/build/perf/util/bpf_skel/kwork_trace.skel.h
> > > > > GENSKEL /tmp/build/perf/util/bpf_skel/sample_filter.skel.h
> > > > > libbpf: failed to find BTF for extern 'bpf_cast_to_kern_ctx' [21] section: -2
> > > > > Error: failed to open BPF object file: No such file or directory
> > > > > make[2]: *** [Makefile.perf:1085: /tmp/build/perf/util/bpf_skel/sample_filter.skel.h] Error 254
> > > > > make[2]: *** Deleting file '/tmp/build/perf/util/bpf_skel/sample_filter.skel.h'
> > > > > make[2]: *** Waiting for unfinished jobs....
> > > > > make[1]: *** [Makefile.perf:236: sub-make] Error 2
> > > > > make: *** [Makefile:70: all] Error 2
> > > > > make: Leaving directory '/git/perf-6.3.0-rc1/tools/perf'
> > > > > + exit 1
> > > > > [perfbuilder@five 11]$
> > > >
> > > > Same thing on debian:10
> > >
> > > > Hmm.. I thought extern symbols with __ksym are runtime
> > > dependencies and it should build on old kernels too.
> > >
> > > BPF folks, any suggestions?
> >
> > Fedora 33 also fails, see below, but these work:
>
> Maybe I can declare it as a weak symbol. How about this?

I'm not sure this is the right fix. For now I'm building with NO_BPF_SKEL=1
on those older distros, and I'll check again early next week when I'm back
from a short trip.

- Arnaldo

> Thanks,
> Namhyung
>
> ---8<---
>
> diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> index 57e3c67d6d37..52cbdd1765cd 100644
> --- a/tools/perf/util/bpf_skel/sample_filter.bpf.c
> +++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c
> @@ -17,7 +17,7 @@ struct filters {
>
> int dropped;
>
> -void *bpf_cast_to_kern_ctx(void *) __ksym;
> +void *bpf_cast_to_kern_ctx(void *) __ksym __weak;
>
> /* new kernel perf_sample_data definition */
> struct perf_sample_data___new {
> @@ -118,6 +118,10 @@ int perf_sample_filter(void *ctx)
> int group_result = 0;
> int i;
>
> + /* no kernel context support, no filtering */
> + if (!bpf_cast_to_kern_ctx)
> + return 1;
> +
> kctx = bpf_cast_to_kern_ctx(ctx);
>
> for (i = 0; i < MAX_FILTERS; i++) {
>
--
- Arnaldo