2020-12-29 21:44:51

by Song Liu

[permalink] [raw]
Subject: [PATCH v7 0/3] Introduce perf-stat -b for BPF programs

This set introduces perf-stat -b option to count events for BPF programs.
This is similar to bpftool-prog-profile. But perf-stat makes it much more
flexible.

Changes v6 => v7
1. Merge documentation change with feature change. (Arnaldo)
2. Fix python import perf. (Arnaldo)
3. Other small fixes. (Namhyung)
4. Improve allocation/free of bpf skeleton. Also added assert()'s to
debug a segfault.

Changes v5 => v6
1. Update the name for bootstrap bpftool. (Jiri)

Changes v4 => v5:
1. Add documentation. (Jiri)
2. Silent make output for removing .bpf.o file. (Jiri)

Changes v3 => v4:
1. Split changes in bpftool/Makefile to a separate patch
2. Various small fixes. (Jiri)

Changes v2 => v3:
1. Small fixes in Makefile.perf and bpf_counter.c (Jiri)
2. Rebased on top of bpf-next. This is because 1/2 conflicts with some
patches in bpftool/Makefile.

Changes PATCH v1 => PATCH v2:
1. Various fixes in Makefiles. (Jiri)
2. Fix an build warning/error with gcc-10. (Jiri)

Changes RFC v2 => PATCH v1:
1. Support counting on multiple BPF programs.
2. Add BPF handling to target__validate().
3. Improve Makefile. (Jiri)

Changes RFC v1 => RFC v2:
1. Use bootstrap version of bpftool. (Jiri)
2. Set default to not building bpf skeletons. (Jiri)
3. Remove util/bpf_skel/Makefile, keep all the logic in Makefile.perf.
(Jiri)
4. Remove dependency to vmlinux.h in the two skeletons. The goal here is
to enable building perf without building kernel (vmlinux) first.
Note: I also removed the logic that build vmlinux.h. We can add that
back when we have to use it (to access big kernel structures).

Song Liu (3):
bpftool: add Makefile target bootstrap
perf: support build BPF skeletons with perf
perf-stat: enable counting events for BPF programs

tools/bpf/bpftool/Makefile | 2 +
tools/build/Makefile.feature | 4 +-
tools/perf/Documentation/perf-stat.txt | 18 ++
tools/perf/Makefile.config | 9 +
tools/perf/Makefile.perf | 49 ++-
tools/perf/builtin-stat.c | 82 ++++-
tools/perf/util/Build | 1 +
tools/perf/util/bpf_counter.c | 297 ++++++++++++++++++
tools/perf/util/bpf_counter.h | 72 +++++
tools/perf/util/bpf_skel/.gitignore | 3 +
.../util/bpf_skel/bpf_prog_profiler.bpf.c | 93 ++++++
tools/perf/util/evsel.c | 9 +
tools/perf/util/evsel.h | 5 +
tools/perf/util/python.c | 21 ++
tools/perf/util/stat-display.c | 4 +-
tools/perf/util/stat.c | 2 +-
tools/perf/util/target.c | 34 +-
tools/perf/util/target.h | 10 +
tools/scripts/Makefile.include | 1 +
19 files changed, 696 insertions(+), 20 deletions(-)
create mode 100644 tools/perf/util/bpf_counter.c
create mode 100644 tools/perf/util/bpf_counter.h
create mode 100644 tools/perf/util/bpf_skel/.gitignore
create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c

--
2.24.1


2020-12-29 21:45:06

by Song Liu

[permalink] [raw]
Subject: [PATCH v7 2/3] perf: support build BPF skeletons with perf

BPF programs are useful in perf to profile BPF programs. BPF skeleton is
by far the easiest way to write BPF tools. Enable building BPF skeletons
in util/bpf_skel. A dummy bpf skeleton is added. More bpf skeletons will
be added for different use cases.

Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Song Liu <[email protected]>
---
tools/build/Makefile.feature | 4 ++-
tools/perf/Makefile.config | 9 ++++++
tools/perf/Makefile.perf | 49 +++++++++++++++++++++++++++--
tools/perf/util/bpf_skel/.gitignore | 3 ++
tools/scripts/Makefile.include | 1 +
5 files changed, 63 insertions(+), 3 deletions(-)
create mode 100644 tools/perf/util/bpf_skel/.gitignore

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 97cbfb31b7625..74e255d58d8d0 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -99,7 +99,9 @@ FEATURE_TESTS_EXTRA := \
clang \
libbpf \
libpfm4 \
- libdebuginfod
+ libdebuginfod \
+ clang-bpf-co-re
+

FEATURE_TESTS ?= $(FEATURE_TESTS_BASIC)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index ce8516e4de34f..d8e59d31399a5 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -621,6 +621,15 @@ ifndef NO_LIBBPF
endif
endif

+ifdef BUILD_BPF_SKEL
+ $(call feature_check,clang-bpf-co-re)
+ ifeq ($(feature-clang-bpf-co-re), 0)
+ dummy := $(error Error: clang too old. Please install recent clang)
+ endif
+ $(call detected,CONFIG_PERF_BPF_SKEL)
+ CFLAGS += -DHAVE_BPF_SKEL
+endif
+
dwarf-post-unwind := 1
dwarf-post-unwind-text := BUG

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 62f3deb1d3a8b..d182a2dbb9bbd 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -126,6 +126,8 @@ include ../scripts/utilities.mak
#
# Define NO_LIBDEBUGINFOD if you do not want support debuginfod
#
+# Define BUILD_BPF_SKEL to enable BPF skeletons
+#

# As per kernel Makefile, avoid funny character set dependencies
unexport LC_ALL
@@ -175,6 +177,12 @@ endef

LD += $(EXTRA_LDFLAGS)

+HOSTCC ?= gcc
+HOSTLD ?= ld
+HOSTAR ?= ar
+CLANG ?= clang
+LLVM_STRIP ?= llvm-strip
+
PKG_CONFIG = $(CROSS_COMPILE)pkg-config
LLVM_CONFIG ?= llvm-config

@@ -731,7 +739,8 @@ prepare: $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h archheaders $(drm_ioc
$(x86_arch_prctl_code_array) \
$(rename_flags_array) \
$(arch_errno_name_array) \
- $(sync_file_range_arrays)
+ $(sync_file_range_arrays) \
+ bpf-skel

$(OUTPUT)%.o: %.c prepare FORCE
$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
@@ -1004,7 +1013,43 @@ config-clean:
python-clean:
$(python-clean)

-clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean
+SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
+SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
+SKELETONS :=
+
+ifdef BUILD_BPF_SKEL
+BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
+LIBBPF_SRC := $(abspath ../lib/bpf)
+BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(BPF_PATH) -I$(LIBBPF_SRC)/..
+
+$(SKEL_TMP_OUT):
+ $(Q)$(MKDIR) -p $@
+
+$(BPFTOOL): | $(SKEL_TMP_OUT)
+ CFLAGS= $(MAKE) -C ../bpf/bpftool \
+ OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
+
+$(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
+ $(QUIET_CLANG)$(CLANG) -g -O2 -target bpf $(BPF_INCLUDE) \
+ -c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
+
+$(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
+ $(QUIET_GENSKEL)$(BPFTOOL) gen skeleton $< > $@
+
+bpf-skel: $(SKELETONS)
+
+.PRECIOUS: $(SKEL_TMP_OUT)/%.bpf.o
+
+else # BUILD_BPF_SKEL
+
+bpf-skel:
+
+endif # BUILD_BPF_SKEL
+
+bpf-skel-clean:
+ $(call QUIET_CLEAN, bpf-skel) $(RM) -r $(SKEL_TMP_OUT) $(SKELETONS)
+
+clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean bpf-skel-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
diff --git a/tools/perf/util/bpf_skel/.gitignore b/tools/perf/util/bpf_skel/.gitignore
new file mode 100644
index 0000000000000..5263e9e6c5d83
--- /dev/null
+++ b/tools/perf/util/bpf_skel/.gitignore
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+.tmp
+*.skel.h
\ No newline at end of file
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index 1358e89cdf7d6..62119ce69ad9a 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -127,6 +127,7 @@ ifneq ($(silent),1)
$(MAKE) $(PRINT_DIR) -C $$subdir
QUIET_FLEX = @echo ' FLEX '$@;
QUIET_BISON = @echo ' BISON '$@;
+ QUIET_GENSKEL = @echo ' GEN-SKEL '$@;

descend = \
+@echo ' DESCEND '$(1); \
--
2.24.1

2020-12-29 21:45:54

by Song Liu

[permalink] [raw]
Subject: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Introduce perf-stat -b option, which counts events for BPF programs, like:

[root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
1.487903822 115,200 ref-cycles
1.487903822 86,012 cycles
2.489147029 80,560 ref-cycles
2.489147029 73,784 cycles
3.490341825 60,720 ref-cycles
3.490341825 37,797 cycles
4.491540887 37,120 ref-cycles
4.491540887 31,963 cycles

The example above counts cycles and ref-cycles of BPF program of id 254.
This is similar to bpftool-prog-profile command, but more flexible.

perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
programs (monitor-progs) to the target BPF program (target-prog). The
monitor-progs read perf_event before and after the target-prog, and
aggregate the difference in a BPF map. Then the user space reads data
from these maps.

A new struct bpf_counter is introduced to provide common interface that
uses BPF programs/maps to count perf events.

Signed-off-by: Song Liu <[email protected]>
---
tools/perf/Documentation/perf-stat.txt | 18 +
tools/perf/Makefile.perf | 2 +-
tools/perf/builtin-stat.c | 82 ++++-
tools/perf/util/Build | 1 +
tools/perf/util/bpf_counter.c | 309 ++++++++++++++++++
tools/perf/util/bpf_counter.h | 72 ++++
.../util/bpf_skel/bpf_prog_profiler.bpf.c | 93 ++++++
tools/perf/util/evsel.c | 9 +
tools/perf/util/evsel.h | 5 +
tools/perf/util/python.c | 21 ++
tools/perf/util/stat-display.c | 4 +-
tools/perf/util/stat.c | 2 +-
tools/perf/util/target.c | 34 +-
tools/perf/util/target.h | 10 +
14 files changed, 644 insertions(+), 18 deletions(-)
create mode 100644 tools/perf/util/bpf_counter.c
create mode 100644 tools/perf/util/bpf_counter.h
create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5d4a673d7621a..98a4dfd3b2dc9 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -75,6 +75,24 @@ report::
--tid=<tid>::
stat events on existing thread id (comma separated list)

+-b::
+--bpf-prog::
+ stat events on existing bpf program id (comma separated list),
+ requiring root rights. bpftool-prog could be used to find program
+ id all bpf programs in the system. For example:
+
+ # bpftool prog | head -n 1
+ 17247: tracepoint name sys_enter tag 192d548b9d754067 gpl
+
+ # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
+
+ Performance counter stats for 'BPF program(s) 17247':
+
+ 85,967 cycles
+ 28,982 instructions # 0.34 insn per cycle
+
+ 1.102235068 seconds time elapsed
+
ifdef::HAVE_LIBPFM[]
--pfm-events events::
Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index d182a2dbb9bbd..8c4e039c3b813 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -1015,7 +1015,7 @@ python-clean:

SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
-SKELETONS :=
+SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h

ifdef BUILD_BPF_SKEL
BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 8cc24967bc273..3c054b8d4677c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -67,6 +67,7 @@
#include "util/top.h"
#include "util/affinity.h"
#include "util/pfm.h"
+#include "util/bpf_counter.h"
#include "asm/bug.h"

#include <linux/time64.h>
@@ -409,12 +410,32 @@ static int read_affinity_counters(struct timespec *rs)
return 0;
}

+static int read_bpf_map_counters(void)
+{
+ struct evsel *counter;
+ int err;
+
+ evlist__for_each_entry(evsel_list, counter) {
+ err = bpf_counter__read(counter);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
static void read_counters(struct timespec *rs)
{
struct evsel *counter;
+ int err;

- if (!stat_config.stop_read_counter && (read_affinity_counters(rs) < 0))
- return;
+ if (!stat_config.stop_read_counter) {
+ if (target__has_bpf(&target))
+ err = read_bpf_map_counters();
+ else
+ err = read_affinity_counters(rs);
+ if (err < 0)
+ return;
+ }

evlist__for_each_entry(evsel_list, counter) {
if (counter->err)
@@ -496,11 +517,22 @@ static bool handle_interval(unsigned int interval, int *times)
return false;
}

-static void enable_counters(void)
+static int enable_counters(void)
{
+ struct evsel *evsel;
+ int err;
+
+ if (target__has_bpf(&target)) {
+ evlist__for_each_entry(evsel_list, evsel) {
+ err = bpf_counter__enable(evsel);
+ if (err)
+ return err;
+ }
+ }
+
if (stat_config.initial_delay < 0) {
pr_info(EVLIST_DISABLED_MSG);
- return;
+ return 0;
}

if (stat_config.initial_delay > 0) {
@@ -518,6 +550,7 @@ static void enable_counters(void)
if (stat_config.initial_delay > 0)
pr_info(EVLIST_ENABLED_MSG);
}
+ return 0;
}

static void disable_counters(void)
@@ -720,7 +753,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
const bool forks = (argc > 0);
bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
struct affinity affinity;
- int i, cpu;
+ int i, cpu, err;
bool second_pass = false;

if (forks) {
@@ -737,6 +770,13 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (affinity__setup(&affinity) < 0)
return -1;

+ if (target__has_bpf(&target)) {
+ evlist__for_each_entry(evsel_list, counter) {
+ if (bpf_counter__load(counter, &target))
+ return -1;
+ }
+ }
+
evlist__for_each_cpu (evsel_list, i, cpu) {
affinity__set(&affinity, cpu);

@@ -850,7 +890,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
}

if (STAT_RECORD) {
- int err, fd = perf_data__fd(&perf_stat.data);
+ int fd = perf_data__fd(&perf_stat.data);

if (is_pipe) {
err = perf_header__write_pipe(perf_data__fd(&perf_stat.data));
@@ -876,7 +916,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)

if (forks) {
evlist__start_workload(evsel_list);
- enable_counters();
+ err = enable_counters();
+ if (err)
+ return -1;

if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
status = dispatch_events(forks, timeout, interval, &times);
@@ -895,7 +937,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
if (WIFSIGNALED(status))
psignal(WTERMSIG(status), argv[0]);
} else {
- enable_counters();
+ err = enable_counters();
+ if (err)
+ return -1;
status = dispatch_events(forks, timeout, interval, &times);
}

@@ -1085,6 +1129,10 @@ static struct option stat_options[] = {
"stat events on existing process id"),
OPT_STRING('t', "tid", &target.tid, "tid",
"stat events on existing thread id"),
+#ifdef HAVE_BPF_SKEL
+ OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
+ "stat events on existing bpf program id"),
+#endif
OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
"system-wide collection from all CPUs"),
OPT_BOOLEAN('g', "group", &group,
@@ -2064,11 +2112,12 @@ int cmd_stat(int argc, const char **argv)
"perf stat [<options>] [<command>]",
NULL
};
- int status = -EINVAL, run_idx;
+ int status = -EINVAL, run_idx, err;
const char *mode;
FILE *output = stderr;
unsigned int interval, timeout;
const char * const stat_subcommands[] = { "record", "report" };
+ char errbuf[BUFSIZ];

setlocale(LC_ALL, "");

@@ -2179,6 +2228,12 @@ int cmd_stat(int argc, const char **argv)
} else if (big_num_opt == 0) /* User passed --no-big-num */
stat_config.big_num = false;

+ err = target__validate(&target);
+ if (err) {
+ target__strerror(&target, err, errbuf, BUFSIZ);
+ pr_warning("%s\n", errbuf);
+ }
+
setup_system_wide(argc);

/*
@@ -2252,8 +2307,6 @@ int cmd_stat(int argc, const char **argv)
}
}

- target__validate(&target);
-
if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
target.per_thread = true;

@@ -2384,9 +2437,10 @@ int cmd_stat(int argc, const char **argv)
* tools remain -acme
*/
int fd = perf_data__fd(&perf_stat.data);
- int err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
- process_synthesized_event,
- &perf_stat.session->machines.host);
+
+ err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
+ process_synthesized_event,
+ &perf_stat.session->machines.host);
if (err) {
pr_warning("Couldn't synthesize the kernel mmap record, harmless, "
"older tools may produce warnings about this file\n.");
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e2563d0154eb6..188521f343470 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -135,6 +135,7 @@ perf-y += clockid.o

perf-$(CONFIG_LIBBPF) += bpf-loader.o
perf-$(CONFIG_LIBBPF) += bpf_map.o
+perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
perf-$(CONFIG_LIBELF) += symbol-elf.o
perf-$(CONFIG_LIBELF) += probe-file.o
diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
new file mode 100644
index 0000000000000..8c977f038f497
--- /dev/null
+++ b/tools/perf/util/bpf_counter.c
@@ -0,0 +1,309 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* Copyright (c) 2019 Facebook */
+
+#include <assert.h>
+#include <limits.h>
+#include <unistd.h>
+#include <sys/time.h>
+#include <sys/resource.h>
+#include <linux/err.h>
+#include <linux/zalloc.h>
+#include <bpf/bpf.h>
+#include <bpf/btf.h>
+#include <bpf/libbpf.h>
+
+#include "bpf_counter.h"
+#include "counts.h"
+#include "debug.h"
+#include "evsel.h"
+#include "target.h"
+
+#include "bpf_skel/bpf_prog_profiler.skel.h"
+
+static inline void *u64_to_ptr(__u64 ptr)
+{
+ return (void *)(unsigned long)ptr;
+}
+
+static void set_max_rlimit(void)
+{
+ struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
+
+ setrlimit(RLIMIT_MEMLOCK, &rinf);
+}
+
+static struct bpf_counter *bpf_counter_alloc(void)
+{
+ struct bpf_counter *counter;
+
+ counter = zalloc(sizeof(*counter));
+ if (counter)
+ INIT_LIST_HEAD(&counter->list);
+ return counter;
+}
+
+static int bpf_program_profiler__destroy(struct evsel *evsel)
+{
+ struct bpf_counter *counter, *tmp;
+
+ list_for_each_entry_safe(counter, tmp,
+ &evsel->bpf_counter_list, list) {
+ list_del_init(&counter->list);
+ bpf_prog_profiler_bpf__destroy(counter->skel);
+ free(counter);
+ }
+ assert(list_empty(&evsel->bpf_counter_list));
+
+ return 0;
+}
+
+static char *bpf_target_prog_name(int tgt_fd)
+{
+ struct bpf_prog_info_linear *info_linear;
+ struct bpf_func_info *func_info;
+ const struct btf_type *t;
+ char *name = NULL;
+ struct btf *btf;
+
+ info_linear = bpf_program__get_prog_info_linear(
+ tgt_fd, 1UL << BPF_PROG_INFO_FUNC_INFO);
+ if (IS_ERR_OR_NULL(info_linear)) {
+ pr_debug("failed to get info_linear for prog FD %d\n", tgt_fd);
+ return NULL;
+ }
+
+ if (info_linear->info.btf_id == 0 ||
+ btf__get_from_id(info_linear->info.btf_id, &btf)) {
+ pr_debug("prog FD %d doesn't have valid btf\n", tgt_fd);
+ goto out;
+ }
+
+ func_info = u64_to_ptr(info_linear->info.func_info);
+ t = btf__type_by_id(btf, func_info[0].type_id);
+ if (!t) {
+ pr_debug("btf %d doesn't have type %d\n",
+ info_linear->info.btf_id, func_info[0].type_id);
+ goto out;
+ }
+ name = strdup(btf__name_by_offset(btf, t->name_off));
+out:
+ free(info_linear);
+ return name;
+}
+
+static int bpf_program_profiler_load_one(struct evsel *evsel, u32 prog_id)
+{
+ struct bpf_prog_profiler_bpf *skel;
+ struct bpf_counter *counter;
+ struct bpf_program *prog;
+ char *prog_name;
+ int prog_fd;
+ int err;
+
+ prog_fd = bpf_prog_get_fd_by_id(prog_id);
+ if (prog_fd < 0) {
+ pr_err("Failed to open fd for bpf prog %u\n", prog_id);
+ return -1;
+ }
+ counter = bpf_counter_alloc();
+ if (!counter) {
+ close(prog_fd);
+ return -1;
+ }
+
+ skel = bpf_prog_profiler_bpf__open();
+ if (!skel) {
+ pr_err("Failed to open bpf skeleton\n");
+ goto err_out;
+ }
+
+ skel->rodata->num_cpu = evsel__nr_cpus(evsel);
+
+ bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
+ bpf_map__resize(skel->maps.fentry_readings, 1);
+ bpf_map__resize(skel->maps.accum_readings, 1);
+
+ prog_name = bpf_target_prog_name(prog_fd);
+ if (!prog_name) {
+ pr_err("Failed to get program name for bpf prog %u. Does it have BTF?\n", prog_id);
+ goto err_out;
+ }
+
+ bpf_object__for_each_program(prog, skel->obj) {
+ err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
+ if (err) {
+ pr_err("bpf_program__set_attach_target failed.\n"
+ "Does bpf prog %u have BTF?\n", prog_id);
+ goto err_out;
+ }
+ }
+ set_max_rlimit();
+ err = bpf_prog_profiler_bpf__load(skel);
+ if (err) {
+ pr_err("bpf_prog_profiler_bpf__load failed\n");
+ goto err_out;
+ }
+
+ assert(skel != NULL);
+ counter->skel = skel;
+ list_add(&counter->list, &evsel->bpf_counter_list);
+ close(prog_fd);
+ return 0;
+err_out:
+ bpf_prog_profiler_bpf__destroy(skel);
+ free(counter);
+ close(prog_fd);
+ return -1;
+}
+
+static int bpf_program_profiler__load(struct evsel *evsel, struct target *target)
+{
+ char *bpf_str, *bpf_str_, *tok, *saveptr = NULL, *p;
+ u32 prog_id;
+ int ret;
+
+ bpf_str_ = bpf_str = strdup(target->bpf_str);
+ if (!bpf_str)
+ return -1;
+
+ while ((tok = strtok_r(bpf_str, ",", &saveptr)) != NULL) {
+ prog_id = strtoul(tok, &p, 10);
+ if (prog_id == 0 || prog_id == UINT_MAX ||
+ (*p != '\0' && *p != ',')) {
+ pr_err("Failed to parse bpf prog ids %s\n",
+ target->bpf_str);
+ return -1;
+ }
+
+ ret = bpf_program_profiler_load_one(evsel, prog_id);
+ if (ret) {
+ bpf_program_profiler__destroy(evsel);
+ free(bpf_str_);
+ return -1;
+ }
+ bpf_str = NULL;
+ }
+ free(bpf_str_);
+ return 0;
+}
+
+static int bpf_program_profiler__enable(struct evsel *evsel)
+{
+ struct bpf_counter *counter;
+ int ret;
+
+ list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+ assert(counter->skel != NULL);
+ ret = bpf_prog_profiler_bpf__attach(counter->skel);
+ if (ret) {
+ bpf_program_profiler__destroy(evsel);
+ return ret;
+ }
+ }
+ return 0;
+}
+
+static int bpf_program_profiler__read(struct evsel *evsel)
+{
+ int num_cpu = evsel__nr_cpus(evsel);
+ struct bpf_perf_event_value values[num_cpu];
+ struct bpf_counter *counter;
+ int reading_map_fd;
+ __u32 key = 0;
+ int err, cpu;
+
+ if (list_empty(&evsel->bpf_counter_list))
+ return -EAGAIN;
+
+ for (cpu = 0; cpu < num_cpu; cpu++) {
+ perf_counts(evsel->counts, cpu, 0)->val = 0;
+ perf_counts(evsel->counts, cpu, 0)->ena = 0;
+ perf_counts(evsel->counts, cpu, 0)->run = 0;
+ }
+ list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+ struct bpf_prog_profiler_bpf *skel = counter->skel;
+
+ assert(skel != NULL);
+ reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
+
+ err = bpf_map_lookup_elem(reading_map_fd, &key, values);
+ if (err) {
+ pr_err("failed to read value\n");
+ return err;
+ }
+
+ for (cpu = 0; cpu < num_cpu; cpu++) {
+ perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
+ perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
+ perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
+ }
+ }
+ return 0;
+}
+
+static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu,
+ int fd)
+{
+ struct bpf_prog_profiler_bpf *skel;
+ struct bpf_counter *counter;
+ int ret;
+
+ list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
+ skel = counter->skel;
+ assert(skel != NULL);
+
+ ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events),
+ &cpu, &fd, BPF_ANY);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+struct bpf_counter_ops bpf_program_profiler_ops = {
+ .load = bpf_program_profiler__load,
+ .enable = bpf_program_profiler__enable,
+ .read = bpf_program_profiler__read,
+ .destroy = bpf_program_profiler__destroy,
+ .install_pe = bpf_program_profiler__install_pe,
+};
+
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd)
+{
+ if (list_empty(&evsel->bpf_counter_list))
+ return 0;
+ return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd);
+}
+
+int bpf_counter__load(struct evsel *evsel, struct target *target)
+{
+ if (target__has_bpf(target))
+ evsel->bpf_counter_ops = &bpf_program_profiler_ops;
+
+ if (evsel->bpf_counter_ops)
+ return evsel->bpf_counter_ops->load(evsel, target);
+ return 0;
+}
+
+int bpf_counter__enable(struct evsel *evsel)
+{
+ if (list_empty(&evsel->bpf_counter_list))
+ return 0;
+ return evsel->bpf_counter_ops->enable(evsel);
+}
+
+int bpf_counter__read(struct evsel *evsel)
+{
+ if (list_empty(&evsel->bpf_counter_list))
+ return -EAGAIN;
+ return evsel->bpf_counter_ops->read(evsel);
+}
+
+void bpf_counter__destroy(struct evsel *evsel)
+{
+ if (list_empty(&evsel->bpf_counter_list))
+ return;
+ evsel->bpf_counter_ops->destroy(evsel);
+ evsel->bpf_counter_ops = NULL;
+}
diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h
new file mode 100644
index 0000000000000..2eca210e5dc16
--- /dev/null
+++ b/tools/perf/util/bpf_counter.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BPF_COUNTER_H
+#define __PERF_BPF_COUNTER_H 1
+
+#include <linux/list.h>
+
+struct evsel;
+struct target;
+struct bpf_counter;
+
+typedef int (*bpf_counter_evsel_op)(struct evsel *evsel);
+typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel,
+ struct target *target);
+typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel,
+ int cpu,
+ int fd);
+
+struct bpf_counter_ops {
+ bpf_counter_evsel_target_op load;
+ bpf_counter_evsel_op enable;
+ bpf_counter_evsel_op read;
+ bpf_counter_evsel_op destroy;
+ bpf_counter_evsel_install_pe_op install_pe;
+};
+
+struct bpf_counter {
+ void *skel;
+ struct list_head list;
+};
+
+#ifdef HAVE_BPF_SKEL
+
+int bpf_counter__load(struct evsel *evsel, struct target *target);
+int bpf_counter__enable(struct evsel *evsel);
+int bpf_counter__read(struct evsel *evsel);
+void bpf_counter__destroy(struct evsel *evsel);
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
+
+#else /* HAVE_BPF_SKEL */
+
+#include<linux/err.h>
+
+static inline int bpf_counter__load(struct evsel *evsel __maybe_unused,
+ struct target *target __maybe_unused)
+{
+ return 0;
+}
+
+static inline int bpf_counter__enable(struct evsel *evsel __maybe_unused)
+{
+ return 0;
+}
+
+static inline int bpf_counter__read(struct evsel *evsel __maybe_unused)
+{
+ return -EAGAIN;
+}
+
+static inline void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
+{
+}
+
+static inline int bpf_counter__install_pe(struct evsel *evsel __maybe_unused,
+ int cpu __maybe_unused,
+ int fd __maybe_unused)
+{
+ return 0;
+}
+
+#endif /* HAVE_BPF_SKEL */
+
+#endif /* __PERF_BPF_COUNTER_H */
diff --git a/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
new file mode 100644
index 0000000000000..c7cec92d02360
--- /dev/null
+++ b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+// Copyright (c) 2020 Facebook
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+/* map of perf event fds, num_cpu * num_metric entries */
+struct {
+ __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(int));
+} events SEC(".maps");
+
+/* readings at fentry */
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct bpf_perf_event_value));
+ __uint(max_entries, 1);
+} fentry_readings SEC(".maps");
+
+/* accumulated readings */
+struct {
+ __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+ __uint(key_size, sizeof(__u32));
+ __uint(value_size, sizeof(struct bpf_perf_event_value));
+ __uint(max_entries, 1);
+} accum_readings SEC(".maps");
+
+const volatile __u32 num_cpu = 1;
+
+SEC("fentry/XXX")
+int BPF_PROG(fentry_XXX)
+{
+ __u32 key = bpf_get_smp_processor_id();
+ struct bpf_perf_event_value *ptr;
+ __u32 zero = 0;
+ long err;
+
+ /* look up before reading, to reduce error */
+ ptr = bpf_map_lookup_elem(&fentry_readings, &zero);
+ if (!ptr)
+ return 0;
+
+ err = bpf_perf_event_read_value(&events, key, ptr, sizeof(*ptr));
+ if (err)
+ return 0;
+
+ return 0;
+}
+
+static inline void
+fexit_update_maps(struct bpf_perf_event_value *after)
+{
+ struct bpf_perf_event_value *before, diff, *accum;
+ __u32 zero = 0;
+
+ before = bpf_map_lookup_elem(&fentry_readings, &zero);
+ /* only account samples with a valid fentry_reading */
+ if (before && before->counter) {
+ struct bpf_perf_event_value *accum;
+
+ diff.counter = after->counter - before->counter;
+ diff.enabled = after->enabled - before->enabled;
+ diff.running = after->running - before->running;
+
+ accum = bpf_map_lookup_elem(&accum_readings, &zero);
+ if (accum) {
+ accum->counter += diff.counter;
+ accum->enabled += diff.enabled;
+ accum->running += diff.running;
+ }
+ }
+}
+
+SEC("fexit/XXX")
+int BPF_PROG(fexit_XXX)
+{
+ struct bpf_perf_event_value reading;
+ __u32 cpu = bpf_get_smp_processor_id();
+ __u32 one = 1, zero = 0;
+ int err;
+
+ /* read all events before updating the maps, to reduce error */
+ err = bpf_perf_event_read_value(&events, cpu, &reading, sizeof(reading));
+ if (err)
+ return 0;
+
+ fexit_update_maps(&reading);
+ return 0;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dc0cfa5f26101..7a5ca00496000 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -25,6 +25,7 @@
#include <stdlib.h>
#include <perf/evsel.h>
#include "asm/bug.h"
+#include "bpf_counter.h"
#include "callchain.h"
#include "cgroup.h"
#include "counts.h"
@@ -51,6 +52,10 @@
#include <internal/lib.h>

#include <linux/ctype.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+#include <bpf/btf.h>
+#include "rlimit.h"

struct perf_missing_features perf_missing_features;

@@ -247,6 +252,7 @@ void evsel__init(struct evsel *evsel,
evsel->bpf_obj = NULL;
evsel->bpf_fd = -1;
INIT_LIST_HEAD(&evsel->config_terms);
+ INIT_LIST_HEAD(&evsel->bpf_counter_list);
perf_evsel__object.init(evsel);
evsel->sample_size = __evsel__sample_size(attr->sample_type);
evsel__calc_id_pos(evsel);
@@ -1368,6 +1374,7 @@ void evsel__exit(struct evsel *evsel)
{
assert(list_empty(&evsel->core.node));
assert(evsel->evlist == NULL);
+ bpf_counter__destroy(evsel);
evsel__free_counts(evsel);
perf_evsel__free_fd(&evsel->core);
perf_evsel__free_id(&evsel->core);
@@ -1783,6 +1790,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,

FD(evsel, cpu, thread) = fd;

+ bpf_counter__install_pe(evsel, cpu, fd);
+
if (unlikely(test_attr__enabled)) {
test_attr__open(&evsel->core.attr, pid, cpus->map[cpu],
fd, group_fd, flags);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cd1d8dd431997..8226b1fefa8cf 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -17,6 +17,8 @@ struct cgroup;
struct perf_counts;
struct perf_stat_evsel;
union perf_event;
+struct bpf_counter_ops;
+struct target;

typedef int (evsel__sb_cb_t)(union perf_event *event, void *data);

@@ -127,6 +129,8 @@ struct evsel {
* See also evsel__has_callchain().
*/
__u64 synth_sample_type;
+ struct list_head bpf_counter_list;
+ struct bpf_counter_ops *bpf_counter_ops;
};

struct perf_missing_features {
@@ -424,4 +428,5 @@ static inline bool evsel__is_dummy_event(struct evsel *evsel)
struct perf_env *evsel__env(struct evsel *evsel);

int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
+
#endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
index cc5ade85a33fc..278abecb5bdfc 100644
--- a/tools/perf/util/python.c
+++ b/tools/perf/util/python.c
@@ -79,6 +79,27 @@ int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
return 0;
}

+/*
+ * XXX: All these evsel destructors need some better mechanism, like a linked
+ * list of destructors registered when the relevant code indeed is used instead
+ * of having more and more calls in perf_evsel__delete(). -- acme
+ *
+ * For now, add some more:
+ *
+ * Not to drag the BPF bandwagon...
+ */
+void bpf_counter__destroy(struct evsel *evsel);
+int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
+
+void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
+{
+}
+
+int bpf_counter__install_pe(struct evsel *evsel __maybe_unused, int cpu __maybe_unused, int fd __maybe_unused)
+{
+ return 0;
+}
+
/*
* Support debug printing even though util/debug.c is not linked. That means
* implementing 'verbose' and 'eprintf'.
diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 583ae4f09c5d1..cce7a76d6473c 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1045,7 +1045,9 @@ static void print_header(struct perf_stat_config *config,
if (!config->csv_output) {
fprintf(output, "\n");
fprintf(output, " Performance counter stats for ");
- if (_target->system_wide)
+ if (_target->bpf_str)
+ fprintf(output, "\'BPF program(s) %s", _target->bpf_str);
+ else if (_target->system_wide)
fprintf(output, "\'system wide");
else if (_target->cpu_list)
fprintf(output, "\'CPU(s) %s", _target->cpu_list);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 8ce1479c98f03..0b3957323f668 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -527,7 +527,7 @@ int create_perf_stat_counter(struct evsel *evsel,
if (leader->core.nr_members > 1)
attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;

- attr->inherit = !config->no_inherit;
+ attr->inherit = !config->no_inherit && list_empty(&evsel->bpf_counter_list);

/*
* Some events get initialized with sample_(period/type) set,
diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
index a3db13dea937c..0f383418e3df5 100644
--- a/tools/perf/util/target.c
+++ b/tools/perf/util/target.c
@@ -56,6 +56,34 @@ enum target_errno target__validate(struct target *target)
ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
}

+ /* BPF and CPU are mutually exclusive */
+ if (target->bpf_str && target->cpu_list) {
+ target->cpu_list = NULL;
+ if (ret == TARGET_ERRNO__SUCCESS)
+ ret = TARGET_ERRNO__BPF_OVERRIDE_CPU;
+ }
+
+ /* BPF and PID/TID are mutually exclusive */
+ if (target->bpf_str && target->tid) {
+ target->tid = NULL;
+ if (ret == TARGET_ERRNO__SUCCESS)
+ ret = TARGET_ERRNO__BPF_OVERRIDE_PID;
+ }
+
+ /* BPF and UID are mutually exclusive */
+ if (target->bpf_str && target->uid_str) {
+ target->uid_str = NULL;
+ if (ret == TARGET_ERRNO__SUCCESS)
+ ret = TARGET_ERRNO__BPF_OVERRIDE_UID;
+ }
+
+ /* BPF and THREADS are mutually exclusive */
+ if (target->bpf_str && target->per_thread) {
+ target->per_thread = false;
+ if (ret == TARGET_ERRNO__SUCCESS)
+ ret = TARGET_ERRNO__BPF_OVERRIDE_THREAD;
+ }
+
/* THREAD and SYSTEM/CPU are mutually exclusive */
if (target->per_thread && (target->system_wide || target->cpu_list)) {
target->per_thread = false;
@@ -109,6 +137,10 @@ static const char *target__error_str[] = {
"PID/TID switch overriding SYSTEM",
"UID switch overriding SYSTEM",
"SYSTEM/CPU switch overriding PER-THREAD",
+ "BPF switch overriding CPU",
+ "BPF switch overriding PID/TID",
+ "BPF switch overriding UID",
+ "BPF switch overriding THREAD",
"Invalid User: %s",
"Problems obtaining information for user %s",
};
@@ -134,7 +166,7 @@ int target__strerror(struct target *target, int errnum,

switch (errnum) {
case TARGET_ERRNO__PID_OVERRIDE_CPU ...
- TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD:
+ TARGET_ERRNO__BPF_OVERRIDE_THREAD:
snprintf(buf, buflen, "%s", msg);
break;

diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 6ef01a83b24e9..f132c6c2eef81 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -10,6 +10,7 @@ struct target {
const char *tid;
const char *cpu_list;
const char *uid_str;
+ const char *bpf_str;
uid_t uid;
bool system_wide;
bool uses_mmap;
@@ -36,6 +37,10 @@ enum target_errno {
TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD,
+ TARGET_ERRNO__BPF_OVERRIDE_CPU,
+ TARGET_ERRNO__BPF_OVERRIDE_PID,
+ TARGET_ERRNO__BPF_OVERRIDE_UID,
+ TARGET_ERRNO__BPF_OVERRIDE_THREAD,

/* for target__parse_uid() */
TARGET_ERRNO__INVALID_UID,
@@ -59,6 +64,11 @@ static inline bool target__has_cpu(struct target *target)
return target->system_wide || target->cpu_list;
}

+static inline bool target__has_bpf(struct target *target)
+{
+ return target->bpf_str;
+}
+
static inline bool target__none(struct target *target)
{
return !target__has_task(target) && !target__has_cpu(target);
--
2.24.1

2021-01-12 11:38:54

by Namhyung Kim

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Hi,

On Wed, Dec 30, 2020 at 6:42 AM Song Liu <[email protected]> wrote:
>
> Introduce perf-stat -b option, which counts events for BPF programs, like:
>
> [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
> 1.487903822 115,200 ref-cycles
> 1.487903822 86,012 cycles
> 2.489147029 80,560 ref-cycles
> 2.489147029 73,784 cycles
> 3.490341825 60,720 ref-cycles
> 3.490341825 37,797 cycles
> 4.491540887 37,120 ref-cycles
> 4.491540887 31,963 cycles
>
> The example above counts cycles and ref-cycles of BPF program of id 254.
> This is similar to bpftool-prog-profile command, but more flexible.
>
> perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
> programs (monitor-progs) to the target BPF program (target-prog). The
> monitor-progs read perf_event before and after the target-prog, and
> aggregate the difference in a BPF map. Then the user space reads data
> from these maps.
>
> A new struct bpf_counter is introduced to provide common interface that
> uses BPF programs/maps to count perf events.

Acked-by: Namhyung Kim <[email protected]>

Thanks,
Namhyung

2021-01-15 18:54:51

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Jan 12, 2021 at 04:35:14PM +0900, Namhyung Kim escreveu:
> Hi,
>
> On Wed, Dec 30, 2020 at 6:42 AM Song Liu <[email protected]> wrote:
> >
> > Introduce perf-stat -b option, which counts events for BPF programs, like:
> >
> > [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
> > 1.487903822 115,200 ref-cycles
> > 1.487903822 86,012 cycles
> > 2.489147029 80,560 ref-cycles
> > 2.489147029 73,784 cycles
> > 3.490341825 60,720 ref-cycles
> > 3.490341825 37,797 cycles
> > 4.491540887 37,120 ref-cycles
> > 4.491540887 31,963 cycles
> >
> > The example above counts cycles and ref-cycles of BPF program of id 254.
> > This is similar to bpftool-prog-profile command, but more flexible.
> >
> > perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
> > programs (monitor-progs) to the target BPF program (target-prog). The
> > monitor-progs read perf_event before and after the target-prog, and
> > aggregate the difference in a BPF map. Then the user space reads data
> > from these maps.
> >
> > A new struct bpf_counter is introduced to provide common interface that
> > uses BPF programs/maps to count perf events.
>
> Acked-by: Namhyung Kim <[email protected]>

Applied locally, testing now.

- Arnaldo

2021-01-19 05:10:49

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Dec 29, 2020 at 01:42:14PM -0800, Song Liu escreveu:
> Introduce perf-stat -b option, which counts events for BPF programs, like:
>
> [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
> 1.487903822 115,200 ref-cycles
> 1.487903822 86,012 cycles
> 2.489147029 80,560 ref-cycles
> 2.489147029 73,784 cycles
> 3.490341825 60,720 ref-cycles
> 3.490341825 37,797 cycles
> 4.491540887 37,120 ref-cycles
> 4.491540887 31,963 cycles
>
> The example above counts cycles and ref-cycles of BPF program of id 254.
> This is similar to bpftool-prog-profile command, but more flexible.

So trying to get back to this after vacation:

[root@five ~]# perf stat -vvv -e cycles -b 103
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]#

[root@five perf]# tools/bpf/runqslower/.output/
bpftool/ libbpf/ runqslower
[root@five perf]# tools/bpf/runqslower/.output/runqslower
libbpf: elf: skipping unrecognized data section(10) .eh_frame
libbpf: elf: skipping relo section(17) .rel.eh_frame for section(10) .eh_frame
Tracing run queue latency higher than 10000 us
TIME COMM PID LAT(us)

[root@five ~]# bpftool prog
6: tracing name dump_bpf_map tag e54c922dfa54f65f gpl
loaded_at 2021-01-15T11:14:38-0300 uid 0
xlated 256B jited 154B memlock 4096B map_ids 3
btf_id 6
8: tracing name dump_bpf_prog tag 4cb5558b902ee92e gpl
loaded_at 2021-01-15T11:14:38-0300 uid 0
xlated 520B jited 301B memlock 4096B map_ids 3
btf_id 6
80: cgroup_device tag 8b9c33f36f812014 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 744B jited 447B memlock 4096B
81: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
82: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
83: cgroup_device tag ee0e253c78993a24 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 416B jited 255B memlock 4096B
84: cgroup_device tag e3dbd137be8d6168 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 504B jited 309B memlock 4096B
85: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
86: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
87: cgroup_device tag c8b47a902f1cc68b gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 464B jited 288B memlock 4096B
88: cgroup_device tag 0ecd07b7b633809f gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 496B jited 307B memlock 4096B
89: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
90: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
91: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
92: cgroup_skb tag 6deef7357e7b4530 gpl
loaded_at 2021-01-17T18:35:22-0300 uid 0
xlated 64B jited 54B memlock 4096B
100: tracing name handle__sched_w tag 00ba1f1d332f5a57 gpl
loaded_at 2021-01-18T16:34:00-0300 uid 0
xlated 152B jited 93B memlock 4096B map_ids 11,9
btf_id 248
102: tracing name handle__sched_w tag 00ba1f1d332f5a57 gpl
loaded_at 2021-01-18T16:34:00-0300 uid 0
xlated 152B jited 93B memlock 4096B map_ids 11,9
btf_id 248
103: tracing name handle__sched_s tag a4ac42153fd49be2 gpl
loaded_at 2021-01-18T16:34:00-0300 uid 0
xlated 560B jited 332B memlock 4096B map_ids 11,9,10
btf_id 248
[root@five ~]# perf stat -vvv -e cycles -b 92
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
prog FD 3 doesn't have valid btf
Failed to get program name for bpf prog 92. Does it have BTF?
[root@five ~]# perf stat -vvv -e cycles -b 103
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]# perf stat -vvv -e cycles -b 102
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]#
[root@five ~]# perf stat -vvv -e cycles -b 6
Control descriptor is not initialized
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: load bpf program failed: Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Cannot recursively attach
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

libbpf: -- END LOG --
libbpf: failed to load program 'fentry_XXX'
libbpf: failed to load object 'bpf_prog_profiler_bpf'
libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
bpf_prog_profiler_bpf__load failed
[root@five ~]#

So no more segfaults, but still something to look at, I'm pushing my
local perf/core to tmp.perf/core, please take a look.

- Arnaldo

> perf-stat -b creates per-cpu perf_event and loads fentry/fexit BPF
> programs (monitor-progs) to the target BPF program (target-prog). The
> monitor-progs read perf_event before and after the target-prog, and
> aggregate the difference in a BPF map. Then the user space reads data
> from these maps.
>
> A new struct bpf_counter is introduced to provide common interface that
> uses BPF programs/maps to count perf events.
>
> Signed-off-by: Song Liu <[email protected]>
> ---
> tools/perf/Documentation/perf-stat.txt | 18 +
> tools/perf/Makefile.perf | 2 +-
> tools/perf/builtin-stat.c | 82 ++++-
> tools/perf/util/Build | 1 +
> tools/perf/util/bpf_counter.c | 309 ++++++++++++++++++
> tools/perf/util/bpf_counter.h | 72 ++++
> .../util/bpf_skel/bpf_prog_profiler.bpf.c | 93 ++++++
> tools/perf/util/evsel.c | 9 +
> tools/perf/util/evsel.h | 5 +
> tools/perf/util/python.c | 21 ++
> tools/perf/util/stat-display.c | 4 +-
> tools/perf/util/stat.c | 2 +-
> tools/perf/util/target.c | 34 +-
> tools/perf/util/target.h | 10 +
> 14 files changed, 644 insertions(+), 18 deletions(-)
> create mode 100644 tools/perf/util/bpf_counter.c
> create mode 100644 tools/perf/util/bpf_counter.h
> create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
>
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 5d4a673d7621a..98a4dfd3b2dc9 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -75,6 +75,24 @@ report::
> --tid=<tid>::
> stat events on existing thread id (comma separated list)
>
> +-b::
> +--bpf-prog::
> + stat events on existing bpf program id (comma separated list),
> + requiring root rights. bpftool-prog could be used to find program
> + id all bpf programs in the system. For example:
> +
> + # bpftool prog | head -n 1
> + 17247: tracepoint name sys_enter tag 192d548b9d754067 gpl
> +
> + # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
> +
> + Performance counter stats for 'BPF program(s) 17247':
> +
> + 85,967 cycles
> + 28,982 instructions # 0.34 insn per cycle
> +
> + 1.102235068 seconds time elapsed
> +
> ifdef::HAVE_LIBPFM[]
> --pfm-events events::
> Select a PMU event using libpfm4 syntax (see http://perfmon2.sf.net)
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index d182a2dbb9bbd..8c4e039c3b813 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -1015,7 +1015,7 @@ python-clean:
>
> SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
> SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
> -SKELETONS :=
> +SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
>
> ifdef BUILD_BPF_SKEL
> BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 8cc24967bc273..3c054b8d4677c 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -67,6 +67,7 @@
> #include "util/top.h"
> #include "util/affinity.h"
> #include "util/pfm.h"
> +#include "util/bpf_counter.h"
> #include "asm/bug.h"
>
> #include <linux/time64.h>
> @@ -409,12 +410,32 @@ static int read_affinity_counters(struct timespec *rs)
> return 0;
> }
>
> +static int read_bpf_map_counters(void)
> +{
> + struct evsel *counter;
> + int err;
> +
> + evlist__for_each_entry(evsel_list, counter) {
> + err = bpf_counter__read(counter);
> + if (err)
> + return err;
> + }
> + return 0;
> +}
> +
> static void read_counters(struct timespec *rs)
> {
> struct evsel *counter;
> + int err;
>
> - if (!stat_config.stop_read_counter && (read_affinity_counters(rs) < 0))
> - return;
> + if (!stat_config.stop_read_counter) {
> + if (target__has_bpf(&target))
> + err = read_bpf_map_counters();
> + else
> + err = read_affinity_counters(rs);
> + if (err < 0)
> + return;
> + }
>
> evlist__for_each_entry(evsel_list, counter) {
> if (counter->err)
> @@ -496,11 +517,22 @@ static bool handle_interval(unsigned int interval, int *times)
> return false;
> }
>
> -static void enable_counters(void)
> +static int enable_counters(void)
> {
> + struct evsel *evsel;
> + int err;
> +
> + if (target__has_bpf(&target)) {
> + evlist__for_each_entry(evsel_list, evsel) {
> + err = bpf_counter__enable(evsel);
> + if (err)
> + return err;
> + }
> + }
> +
> if (stat_config.initial_delay < 0) {
> pr_info(EVLIST_DISABLED_MSG);
> - return;
> + return 0;
> }
>
> if (stat_config.initial_delay > 0) {
> @@ -518,6 +550,7 @@ static void enable_counters(void)
> if (stat_config.initial_delay > 0)
> pr_info(EVLIST_ENABLED_MSG);
> }
> + return 0;
> }
>
> static void disable_counters(void)
> @@ -720,7 +753,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> const bool forks = (argc > 0);
> bool is_pipe = STAT_RECORD ? perf_stat.data.is_pipe : false;
> struct affinity affinity;
> - int i, cpu;
> + int i, cpu, err;
> bool second_pass = false;
>
> if (forks) {
> @@ -737,6 +770,13 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> if (affinity__setup(&affinity) < 0)
> return -1;
>
> + if (target__has_bpf(&target)) {
> + evlist__for_each_entry(evsel_list, counter) {
> + if (bpf_counter__load(counter, &target))
> + return -1;
> + }
> + }
> +
> evlist__for_each_cpu (evsel_list, i, cpu) {
> affinity__set(&affinity, cpu);
>
> @@ -850,7 +890,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> }
>
> if (STAT_RECORD) {
> - int err, fd = perf_data__fd(&perf_stat.data);
> + int fd = perf_data__fd(&perf_stat.data);
>
> if (is_pipe) {
> err = perf_header__write_pipe(perf_data__fd(&perf_stat.data));
> @@ -876,7 +916,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>
> if (forks) {
> evlist__start_workload(evsel_list);
> - enable_counters();
> + err = enable_counters();
> + if (err)
> + return -1;
>
> if (interval || timeout || evlist__ctlfd_initialized(evsel_list))
> status = dispatch_events(forks, timeout, interval, &times);
> @@ -895,7 +937,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> if (WIFSIGNALED(status))
> psignal(WTERMSIG(status), argv[0]);
> } else {
> - enable_counters();
> + err = enable_counters();
> + if (err)
> + return -1;
> status = dispatch_events(forks, timeout, interval, &times);
> }
>
> @@ -1085,6 +1129,10 @@ static struct option stat_options[] = {
> "stat events on existing process id"),
> OPT_STRING('t', "tid", &target.tid, "tid",
> "stat events on existing thread id"),
> +#ifdef HAVE_BPF_SKEL
> + OPT_STRING('b', "bpf-prog", &target.bpf_str, "bpf-prog-id",
> + "stat events on existing bpf program id"),
> +#endif
> OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
> "system-wide collection from all CPUs"),
> OPT_BOOLEAN('g', "group", &group,
> @@ -2064,11 +2112,12 @@ int cmd_stat(int argc, const char **argv)
> "perf stat [<options>] [<command>]",
> NULL
> };
> - int status = -EINVAL, run_idx;
> + int status = -EINVAL, run_idx, err;
> const char *mode;
> FILE *output = stderr;
> unsigned int interval, timeout;
> const char * const stat_subcommands[] = { "record", "report" };
> + char errbuf[BUFSIZ];
>
> setlocale(LC_ALL, "");
>
> @@ -2179,6 +2228,12 @@ int cmd_stat(int argc, const char **argv)
> } else if (big_num_opt == 0) /* User passed --no-big-num */
> stat_config.big_num = false;
>
> + err = target__validate(&target);
> + if (err) {
> + target__strerror(&target, err, errbuf, BUFSIZ);
> + pr_warning("%s\n", errbuf);
> + }
> +
> setup_system_wide(argc);
>
> /*
> @@ -2252,8 +2307,6 @@ int cmd_stat(int argc, const char **argv)
> }
> }
>
> - target__validate(&target);
> -
> if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
> target.per_thread = true;
>
> @@ -2384,9 +2437,10 @@ int cmd_stat(int argc, const char **argv)
> * tools remain -acme
> */
> int fd = perf_data__fd(&perf_stat.data);
> - int err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
> - process_synthesized_event,
> - &perf_stat.session->machines.host);
> +
> + err = perf_event__synthesize_kernel_mmap((void *)&perf_stat,
> + process_synthesized_event,
> + &perf_stat.session->machines.host);
> if (err) {
> pr_warning("Couldn't synthesize the kernel mmap record, harmless, "
> "older tools may produce warnings about this file\n.");
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index e2563d0154eb6..188521f343470 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -135,6 +135,7 @@ perf-y += clockid.o
>
> perf-$(CONFIG_LIBBPF) += bpf-loader.o
> perf-$(CONFIG_LIBBPF) += bpf_map.o
> +perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o
> perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
> perf-$(CONFIG_LIBELF) += symbol-elf.o
> perf-$(CONFIG_LIBELF) += probe-file.o
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> new file mode 100644
> index 0000000000000..8c977f038f497
> --- /dev/null
> +++ b/tools/perf/util/bpf_counter.c
> @@ -0,0 +1,309 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/* Copyright (c) 2019 Facebook */
> +
> +#include <assert.h>
> +#include <limits.h>
> +#include <unistd.h>
> +#include <sys/time.h>
> +#include <sys/resource.h>
> +#include <linux/err.h>
> +#include <linux/zalloc.h>
> +#include <bpf/bpf.h>
> +#include <bpf/btf.h>
> +#include <bpf/libbpf.h>
> +
> +#include "bpf_counter.h"
> +#include "counts.h"
> +#include "debug.h"
> +#include "evsel.h"
> +#include "target.h"
> +
> +#include "bpf_skel/bpf_prog_profiler.skel.h"
> +
> +static inline void *u64_to_ptr(__u64 ptr)
> +{
> + return (void *)(unsigned long)ptr;
> +}
> +
> +static void set_max_rlimit(void)
> +{
> + struct rlimit rinf = { RLIM_INFINITY, RLIM_INFINITY };
> +
> + setrlimit(RLIMIT_MEMLOCK, &rinf);
> +}
> +
> +static struct bpf_counter *bpf_counter_alloc(void)
> +{
> + struct bpf_counter *counter;
> +
> + counter = zalloc(sizeof(*counter));
> + if (counter)
> + INIT_LIST_HEAD(&counter->list);
> + return counter;
> +}
> +
> +static int bpf_program_profiler__destroy(struct evsel *evsel)
> +{
> + struct bpf_counter *counter, *tmp;
> +
> + list_for_each_entry_safe(counter, tmp,
> + &evsel->bpf_counter_list, list) {
> + list_del_init(&counter->list);
> + bpf_prog_profiler_bpf__destroy(counter->skel);
> + free(counter);
> + }
> + assert(list_empty(&evsel->bpf_counter_list));
> +
> + return 0;
> +}
> +
> +static char *bpf_target_prog_name(int tgt_fd)
> +{
> + struct bpf_prog_info_linear *info_linear;
> + struct bpf_func_info *func_info;
> + const struct btf_type *t;
> + char *name = NULL;
> + struct btf *btf;
> +
> + info_linear = bpf_program__get_prog_info_linear(
> + tgt_fd, 1UL << BPF_PROG_INFO_FUNC_INFO);
> + if (IS_ERR_OR_NULL(info_linear)) {
> + pr_debug("failed to get info_linear for prog FD %d\n", tgt_fd);
> + return NULL;
> + }
> +
> + if (info_linear->info.btf_id == 0 ||
> + btf__get_from_id(info_linear->info.btf_id, &btf)) {
> + pr_debug("prog FD %d doesn't have valid btf\n", tgt_fd);
> + goto out;
> + }
> +
> + func_info = u64_to_ptr(info_linear->info.func_info);
> + t = btf__type_by_id(btf, func_info[0].type_id);
> + if (!t) {
> + pr_debug("btf %d doesn't have type %d\n",
> + info_linear->info.btf_id, func_info[0].type_id);
> + goto out;
> + }
> + name = strdup(btf__name_by_offset(btf, t->name_off));
> +out:
> + free(info_linear);
> + return name;
> +}
> +
> +static int bpf_program_profiler_load_one(struct evsel *evsel, u32 prog_id)
> +{
> + struct bpf_prog_profiler_bpf *skel;
> + struct bpf_counter *counter;
> + struct bpf_program *prog;
> + char *prog_name;
> + int prog_fd;
> + int err;
> +
> + prog_fd = bpf_prog_get_fd_by_id(prog_id);
> + if (prog_fd < 0) {
> + pr_err("Failed to open fd for bpf prog %u\n", prog_id);
> + return -1;
> + }
> + counter = bpf_counter_alloc();
> + if (!counter) {
> + close(prog_fd);
> + return -1;
> + }
> +
> + skel = bpf_prog_profiler_bpf__open();
> + if (!skel) {
> + pr_err("Failed to open bpf skeleton\n");
> + goto err_out;
> + }
> +
> + skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> +
> + bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> + bpf_map__resize(skel->maps.fentry_readings, 1);
> + bpf_map__resize(skel->maps.accum_readings, 1);
> +
> + prog_name = bpf_target_prog_name(prog_fd);
> + if (!prog_name) {
> + pr_err("Failed to get program name for bpf prog %u. Does it have BTF?\n", prog_id);
> + goto err_out;
> + }
> +
> + bpf_object__for_each_program(prog, skel->obj) {
> + err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> + if (err) {
> + pr_err("bpf_program__set_attach_target failed.\n"
> + "Does bpf prog %u have BTF?\n", prog_id);
> + goto err_out;
> + }
> + }
> + set_max_rlimit();
> + err = bpf_prog_profiler_bpf__load(skel);
> + if (err) {
> + pr_err("bpf_prog_profiler_bpf__load failed\n");
> + goto err_out;
> + }
> +
> + assert(skel != NULL);
> + counter->skel = skel;
> + list_add(&counter->list, &evsel->bpf_counter_list);
> + close(prog_fd);
> + return 0;
> +err_out:
> + bpf_prog_profiler_bpf__destroy(skel);
> + free(counter);
> + close(prog_fd);
> + return -1;
> +}
> +
> +static int bpf_program_profiler__load(struct evsel *evsel, struct target *target)
> +{
> + char *bpf_str, *bpf_str_, *tok, *saveptr = NULL, *p;
> + u32 prog_id;
> + int ret;
> +
> + bpf_str_ = bpf_str = strdup(target->bpf_str);
> + if (!bpf_str)
> + return -1;
> +
> + while ((tok = strtok_r(bpf_str, ",", &saveptr)) != NULL) {
> + prog_id = strtoul(tok, &p, 10);
> + if (prog_id == 0 || prog_id == UINT_MAX ||
> + (*p != '\0' && *p != ',')) {
> + pr_err("Failed to parse bpf prog ids %s\n",
> + target->bpf_str);
> + return -1;
> + }
> +
> + ret = bpf_program_profiler_load_one(evsel, prog_id);
> + if (ret) {
> + bpf_program_profiler__destroy(evsel);
> + free(bpf_str_);
> + return -1;
> + }
> + bpf_str = NULL;
> + }
> + free(bpf_str_);
> + return 0;
> +}
> +
> +static int bpf_program_profiler__enable(struct evsel *evsel)
> +{
> + struct bpf_counter *counter;
> + int ret;
> +
> + list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> + assert(counter->skel != NULL);
> + ret = bpf_prog_profiler_bpf__attach(counter->skel);
> + if (ret) {
> + bpf_program_profiler__destroy(evsel);
> + return ret;
> + }
> + }
> + return 0;
> +}
> +
> +static int bpf_program_profiler__read(struct evsel *evsel)
> +{
> + int num_cpu = evsel__nr_cpus(evsel);
> + struct bpf_perf_event_value values[num_cpu];
> + struct bpf_counter *counter;
> + int reading_map_fd;
> + __u32 key = 0;
> + int err, cpu;
> +
> + if (list_empty(&evsel->bpf_counter_list))
> + return -EAGAIN;
> +
> + for (cpu = 0; cpu < num_cpu; cpu++) {
> + perf_counts(evsel->counts, cpu, 0)->val = 0;
> + perf_counts(evsel->counts, cpu, 0)->ena = 0;
> + perf_counts(evsel->counts, cpu, 0)->run = 0;
> + }
> + list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> + struct bpf_prog_profiler_bpf *skel = counter->skel;
> +
> + assert(skel != NULL);
> + reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
> +
> + err = bpf_map_lookup_elem(reading_map_fd, &key, values);
> + if (err) {
> + pr_err("failed to read value\n");
> + return err;
> + }
> +
> + for (cpu = 0; cpu < num_cpu; cpu++) {
> + perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
> + perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
> + perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
> + }
> + }
> + return 0;
> +}
> +
> +static int bpf_program_profiler__install_pe(struct evsel *evsel, int cpu,
> + int fd)
> +{
> + struct bpf_prog_profiler_bpf *skel;
> + struct bpf_counter *counter;
> + int ret;
> +
> + list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> + skel = counter->skel;
> + assert(skel != NULL);
> +
> + ret = bpf_map_update_elem(bpf_map__fd(skel->maps.events),
> + &cpu, &fd, BPF_ANY);
> + if (ret)
> + return ret;
> + }
> + return 0;
> +}
> +
> +struct bpf_counter_ops bpf_program_profiler_ops = {
> + .load = bpf_program_profiler__load,
> + .enable = bpf_program_profiler__enable,
> + .read = bpf_program_profiler__read,
> + .destroy = bpf_program_profiler__destroy,
> + .install_pe = bpf_program_profiler__install_pe,
> +};
> +
> +int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd)
> +{
> + if (list_empty(&evsel->bpf_counter_list))
> + return 0;
> + return evsel->bpf_counter_ops->install_pe(evsel, cpu, fd);
> +}
> +
> +int bpf_counter__load(struct evsel *evsel, struct target *target)
> +{
> + if (target__has_bpf(target))
> + evsel->bpf_counter_ops = &bpf_program_profiler_ops;
> +
> + if (evsel->bpf_counter_ops)
> + return evsel->bpf_counter_ops->load(evsel, target);
> + return 0;
> +}
> +
> +int bpf_counter__enable(struct evsel *evsel)
> +{
> + if (list_empty(&evsel->bpf_counter_list))
> + return 0;
> + return evsel->bpf_counter_ops->enable(evsel);
> +}
> +
> +int bpf_counter__read(struct evsel *evsel)
> +{
> + if (list_empty(&evsel->bpf_counter_list))
> + return -EAGAIN;
> + return evsel->bpf_counter_ops->read(evsel);
> +}
> +
> +void bpf_counter__destroy(struct evsel *evsel)
> +{
> + if (list_empty(&evsel->bpf_counter_list))
> + return;
> + evsel->bpf_counter_ops->destroy(evsel);
> + evsel->bpf_counter_ops = NULL;
> +}
> diff --git a/tools/perf/util/bpf_counter.h b/tools/perf/util/bpf_counter.h
> new file mode 100644
> index 0000000000000..2eca210e5dc16
> --- /dev/null
> +++ b/tools/perf/util/bpf_counter.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __PERF_BPF_COUNTER_H
> +#define __PERF_BPF_COUNTER_H 1
> +
> +#include <linux/list.h>
> +
> +struct evsel;
> +struct target;
> +struct bpf_counter;
> +
> +typedef int (*bpf_counter_evsel_op)(struct evsel *evsel);
> +typedef int (*bpf_counter_evsel_target_op)(struct evsel *evsel,
> + struct target *target);
> +typedef int (*bpf_counter_evsel_install_pe_op)(struct evsel *evsel,
> + int cpu,
> + int fd);
> +
> +struct bpf_counter_ops {
> + bpf_counter_evsel_target_op load;
> + bpf_counter_evsel_op enable;
> + bpf_counter_evsel_op read;
> + bpf_counter_evsel_op destroy;
> + bpf_counter_evsel_install_pe_op install_pe;
> +};
> +
> +struct bpf_counter {
> + void *skel;
> + struct list_head list;
> +};
> +
> +#ifdef HAVE_BPF_SKEL
> +
> +int bpf_counter__load(struct evsel *evsel, struct target *target);
> +int bpf_counter__enable(struct evsel *evsel);
> +int bpf_counter__read(struct evsel *evsel);
> +void bpf_counter__destroy(struct evsel *evsel);
> +int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
> +
> +#else /* HAVE_BPF_SKEL */
> +
> +#include<linux/err.h>
> +
> +static inline int bpf_counter__load(struct evsel *evsel __maybe_unused,
> + struct target *target __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static inline int bpf_counter__enable(struct evsel *evsel __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static inline int bpf_counter__read(struct evsel *evsel __maybe_unused)
> +{
> + return -EAGAIN;
> +}
> +
> +static inline void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
> +{
> +}
> +
> +static inline int bpf_counter__install_pe(struct evsel *evsel __maybe_unused,
> + int cpu __maybe_unused,
> + int fd __maybe_unused)
> +{
> + return 0;
> +}
> +
> +#endif /* HAVE_BPF_SKEL */
> +
> +#endif /* __PERF_BPF_COUNTER_H */
> diff --git a/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
> new file mode 100644
> index 0000000000000..c7cec92d02360
> --- /dev/null
> +++ b/tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +// Copyright (c) 2020 Facebook
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +
> +/* map of perf event fds, num_cpu * num_metric entries */
> +struct {
> + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(int));
> +} events SEC(".maps");
> +
> +/* readings at fentry */
> +struct {
> + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(struct bpf_perf_event_value));
> + __uint(max_entries, 1);
> +} fentry_readings SEC(".maps");
> +
> +/* accumulated readings */
> +struct {
> + __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
> + __uint(key_size, sizeof(__u32));
> + __uint(value_size, sizeof(struct bpf_perf_event_value));
> + __uint(max_entries, 1);
> +} accum_readings SEC(".maps");
> +
> +const volatile __u32 num_cpu = 1;
> +
> +SEC("fentry/XXX")
> +int BPF_PROG(fentry_XXX)
> +{
> + __u32 key = bpf_get_smp_processor_id();
> + struct bpf_perf_event_value *ptr;
> + __u32 zero = 0;
> + long err;
> +
> + /* look up before reading, to reduce error */
> + ptr = bpf_map_lookup_elem(&fentry_readings, &zero);
> + if (!ptr)
> + return 0;
> +
> + err = bpf_perf_event_read_value(&events, key, ptr, sizeof(*ptr));
> + if (err)
> + return 0;
> +
> + return 0;
> +}
> +
> +static inline void
> +fexit_update_maps(struct bpf_perf_event_value *after)
> +{
> + struct bpf_perf_event_value *before, diff, *accum;
> + __u32 zero = 0;
> +
> + before = bpf_map_lookup_elem(&fentry_readings, &zero);
> + /* only account samples with a valid fentry_reading */
> + if (before && before->counter) {
> + struct bpf_perf_event_value *accum;
> +
> + diff.counter = after->counter - before->counter;
> + diff.enabled = after->enabled - before->enabled;
> + diff.running = after->running - before->running;
> +
> + accum = bpf_map_lookup_elem(&accum_readings, &zero);
> + if (accum) {
> + accum->counter += diff.counter;
> + accum->enabled += diff.enabled;
> + accum->running += diff.running;
> + }
> + }
> +}
> +
> +SEC("fexit/XXX")
> +int BPF_PROG(fexit_XXX)
> +{
> + struct bpf_perf_event_value reading;
> + __u32 cpu = bpf_get_smp_processor_id();
> + __u32 one = 1, zero = 0;
> + int err;
> +
> + /* read all events before updating the maps, to reduce error */
> + err = bpf_perf_event_read_value(&events, cpu, &reading, sizeof(reading));
> + if (err)
> + return 0;
> +
> + fexit_update_maps(&reading);
> + return 0;
> +}
> +
> +char LICENSE[] SEC("license") = "Dual BSD/GPL";
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index dc0cfa5f26101..7a5ca00496000 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -25,6 +25,7 @@
> #include <stdlib.h>
> #include <perf/evsel.h>
> #include "asm/bug.h"
> +#include "bpf_counter.h"
> #include "callchain.h"
> #include "cgroup.h"
> #include "counts.h"
> @@ -51,6 +52,10 @@
> #include <internal/lib.h>
>
> #include <linux/ctype.h>
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +#include <bpf/btf.h>
> +#include "rlimit.h"
>
> struct perf_missing_features perf_missing_features;
>
> @@ -247,6 +252,7 @@ void evsel__init(struct evsel *evsel,
> evsel->bpf_obj = NULL;
> evsel->bpf_fd = -1;
> INIT_LIST_HEAD(&evsel->config_terms);
> + INIT_LIST_HEAD(&evsel->bpf_counter_list);
> perf_evsel__object.init(evsel);
> evsel->sample_size = __evsel__sample_size(attr->sample_type);
> evsel__calc_id_pos(evsel);
> @@ -1368,6 +1374,7 @@ void evsel__exit(struct evsel *evsel)
> {
> assert(list_empty(&evsel->core.node));
> assert(evsel->evlist == NULL);
> + bpf_counter__destroy(evsel);
> evsel__free_counts(evsel);
> perf_evsel__free_fd(&evsel->core);
> perf_evsel__free_id(&evsel->core);
> @@ -1783,6 +1790,8 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>
> FD(evsel, cpu, thread) = fd;
>
> + bpf_counter__install_pe(evsel, cpu, fd);
> +
> if (unlikely(test_attr__enabled)) {
> test_attr__open(&evsel->core.attr, pid, cpus->map[cpu],
> fd, group_fd, flags);
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index cd1d8dd431997..8226b1fefa8cf 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -17,6 +17,8 @@ struct cgroup;
> struct perf_counts;
> struct perf_stat_evsel;
> union perf_event;
> +struct bpf_counter_ops;
> +struct target;
>
> typedef int (evsel__sb_cb_t)(union perf_event *event, void *data);
>
> @@ -127,6 +129,8 @@ struct evsel {
> * See also evsel__has_callchain().
> */
> __u64 synth_sample_type;
> + struct list_head bpf_counter_list;
> + struct bpf_counter_ops *bpf_counter_ops;
> };
>
> struct perf_missing_features {
> @@ -424,4 +428,5 @@ static inline bool evsel__is_dummy_event(struct evsel *evsel)
> struct perf_env *evsel__env(struct evsel *evsel);
>
> int evsel__store_ids(struct evsel *evsel, struct evlist *evlist);
> +
> #endif /* __PERF_EVSEL_H */
> diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c
> index cc5ade85a33fc..278abecb5bdfc 100644
> --- a/tools/perf/util/python.c
> +++ b/tools/perf/util/python.c
> @@ -79,6 +79,27 @@ int metricgroup__copy_metric_events(struct evlist *evlist, struct cgroup *cgrp,
> return 0;
> }
>
> +/*
> + * XXX: All these evsel destructors need some better mechanism, like a linked
> + * list of destructors registered when the relevant code indeed is used instead
> + * of having more and more calls in perf_evsel__delete(). -- acme
> + *
> + * For now, add some more:
> + *
> + * Not to drag the BPF bandwagon...
> + */
> +void bpf_counter__destroy(struct evsel *evsel);
> +int bpf_counter__install_pe(struct evsel *evsel, int cpu, int fd);
> +
> +void bpf_counter__destroy(struct evsel *evsel __maybe_unused)
> +{
> +}
> +
> +int bpf_counter__install_pe(struct evsel *evsel __maybe_unused, int cpu __maybe_unused, int fd __maybe_unused)
> +{
> + return 0;
> +}
> +
> /*
> * Support debug printing even though util/debug.c is not linked. That means
> * implementing 'verbose' and 'eprintf'.
> diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
> index 583ae4f09c5d1..cce7a76d6473c 100644
> --- a/tools/perf/util/stat-display.c
> +++ b/tools/perf/util/stat-display.c
> @@ -1045,7 +1045,9 @@ static void print_header(struct perf_stat_config *config,
> if (!config->csv_output) {
> fprintf(output, "\n");
> fprintf(output, " Performance counter stats for ");
> - if (_target->system_wide)
> + if (_target->bpf_str)
> + fprintf(output, "\'BPF program(s) %s", _target->bpf_str);
> + else if (_target->system_wide)
> fprintf(output, "\'system wide");
> else if (_target->cpu_list)
> fprintf(output, "\'CPU(s) %s", _target->cpu_list);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 8ce1479c98f03..0b3957323f668 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -527,7 +527,7 @@ int create_perf_stat_counter(struct evsel *evsel,
> if (leader->core.nr_members > 1)
> attr->read_format |= PERF_FORMAT_ID|PERF_FORMAT_GROUP;
>
> - attr->inherit = !config->no_inherit;
> + attr->inherit = !config->no_inherit && list_empty(&evsel->bpf_counter_list);
>
> /*
> * Some events get initialized with sample_(period/type) set,
> diff --git a/tools/perf/util/target.c b/tools/perf/util/target.c
> index a3db13dea937c..0f383418e3df5 100644
> --- a/tools/perf/util/target.c
> +++ b/tools/perf/util/target.c
> @@ -56,6 +56,34 @@ enum target_errno target__validate(struct target *target)
> ret = TARGET_ERRNO__UID_OVERRIDE_SYSTEM;
> }
>
> + /* BPF and CPU are mutually exclusive */
> + if (target->bpf_str && target->cpu_list) {
> + target->cpu_list = NULL;
> + if (ret == TARGET_ERRNO__SUCCESS)
> + ret = TARGET_ERRNO__BPF_OVERRIDE_CPU;
> + }
> +
> + /* BPF and PID/TID are mutually exclusive */
> + if (target->bpf_str && target->tid) {
> + target->tid = NULL;
> + if (ret == TARGET_ERRNO__SUCCESS)
> + ret = TARGET_ERRNO__BPF_OVERRIDE_PID;
> + }
> +
> + /* BPF and UID are mutually exclusive */
> + if (target->bpf_str && target->uid_str) {
> + target->uid_str = NULL;
> + if (ret == TARGET_ERRNO__SUCCESS)
> + ret = TARGET_ERRNO__BPF_OVERRIDE_UID;
> + }
> +
> + /* BPF and THREADS are mutually exclusive */
> + if (target->bpf_str && target->per_thread) {
> + target->per_thread = false;
> + if (ret == TARGET_ERRNO__SUCCESS)
> + ret = TARGET_ERRNO__BPF_OVERRIDE_THREAD;
> + }
> +
> /* THREAD and SYSTEM/CPU are mutually exclusive */
> if (target->per_thread && (target->system_wide || target->cpu_list)) {
> target->per_thread = false;
> @@ -109,6 +137,10 @@ static const char *target__error_str[] = {
> "PID/TID switch overriding SYSTEM",
> "UID switch overriding SYSTEM",
> "SYSTEM/CPU switch overriding PER-THREAD",
> + "BPF switch overriding CPU",
> + "BPF switch overriding PID/TID",
> + "BPF switch overriding UID",
> + "BPF switch overriding THREAD",
> "Invalid User: %s",
> "Problems obtaining information for user %s",
> };
> @@ -134,7 +166,7 @@ int target__strerror(struct target *target, int errnum,
>
> switch (errnum) {
> case TARGET_ERRNO__PID_OVERRIDE_CPU ...
> - TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD:
> + TARGET_ERRNO__BPF_OVERRIDE_THREAD:
> snprintf(buf, buflen, "%s", msg);
> break;
>
> diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
> index 6ef01a83b24e9..f132c6c2eef81 100644
> --- a/tools/perf/util/target.h
> +++ b/tools/perf/util/target.h
> @@ -10,6 +10,7 @@ struct target {
> const char *tid;
> const char *cpu_list;
> const char *uid_str;
> + const char *bpf_str;
> uid_t uid;
> bool system_wide;
> bool uses_mmap;
> @@ -36,6 +37,10 @@ enum target_errno {
> TARGET_ERRNO__PID_OVERRIDE_SYSTEM,
> TARGET_ERRNO__UID_OVERRIDE_SYSTEM,
> TARGET_ERRNO__SYSTEM_OVERRIDE_THREAD,
> + TARGET_ERRNO__BPF_OVERRIDE_CPU,
> + TARGET_ERRNO__BPF_OVERRIDE_PID,
> + TARGET_ERRNO__BPF_OVERRIDE_UID,
> + TARGET_ERRNO__BPF_OVERRIDE_THREAD,
>
> /* for target__parse_uid() */
> TARGET_ERRNO__INVALID_UID,
> @@ -59,6 +64,11 @@ static inline bool target__has_cpu(struct target *target)
> return target->system_wide || target->cpu_list;
> }
>
> +static inline bool target__has_bpf(struct target *target)
> +{
> + return target->bpf_str;
> +}
> +
> static inline bool target__none(struct target *target)
> {
> return !target__has_task(target) && !target__has_cpu(target);
> --
> 2.24.1
>

--

- Arnaldo

2021-01-19 05:36:07

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Hi Arnaldo,

> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Tue, Dec 29, 2020 at 01:42:14PM -0800, Song Liu escreveu:
>> Introduce perf-stat -b option, which counts events for BPF programs, like:
>>
>> [root@localhost ~]# ~/perf stat -e ref-cycles,cycles -b 254 -I 1000
>> 1.487903822 115,200 ref-cycles
>> 1.487903822 86,012 cycles
>> 2.489147029 80,560 ref-cycles
>> 2.489147029 73,784 cycles
>> 3.490341825 60,720 ref-cycles
>> 3.490341825 37,797 cycles
>> 4.491540887 37,120 ref-cycles
>> 4.491540887 31,963 cycles
>>
>> The example above counts cycles and ref-cycles of BPF program of id 254.
>> This is similar to bpftool-prog-profile command, but more flexible.
>
> So trying to get back to this after vacation:
>
> [root@five ~]# perf stat -vvv -e cycles -b 103
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]#
>
> [root@five perf]# tools/bpf/runqslower/.output/
> bpftool/ libbpf/ runqslower
> [root@five perf]# tools/bpf/runqslower/.output/runqslower
> libbpf: elf: skipping unrecognized data section(10) .eh_frame
> libbpf: elf: skipping relo section(17) .rel.eh_frame for section(10) .eh_frame
> Tracing run queue latency higher than 10000 us
> TIME COMM PID LAT(us)
>
> [root@five ~]# bpftool prog
> 6: tracing name dump_bpf_map tag e54c922dfa54f65f gpl
> loaded_at 2021-01-15T11:14:38-0300 uid 0
> xlated 256B jited 154B memlock 4096B map_ids 3
> btf_id 6
> 8: tracing name dump_bpf_prog tag 4cb5558b902ee92e gpl
> loaded_at 2021-01-15T11:14:38-0300 uid 0
> xlated 520B jited 301B memlock 4096B map_ids 3
> btf_id 6
> 80: cgroup_device tag 8b9c33f36f812014 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 744B jited 447B memlock 4096B
> 81: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 82: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 83: cgroup_device tag ee0e253c78993a24 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 416B jited 255B memlock 4096B
> 84: cgroup_device tag e3dbd137be8d6168 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 504B jited 309B memlock 4096B
> 85: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 86: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 87: cgroup_device tag c8b47a902f1cc68b gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 464B jited 288B memlock 4096B
> 88: cgroup_device tag 0ecd07b7b633809f gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 496B jited 307B memlock 4096B
> 89: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 90: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 91: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 92: cgroup_skb tag 6deef7357e7b4530 gpl
> loaded_at 2021-01-17T18:35:22-0300 uid 0
> xlated 64B jited 54B memlock 4096B
> 100: tracing name handle__sched_w tag 00ba1f1d332f5a57 gpl
> loaded_at 2021-01-18T16:34:00-0300 uid 0
> xlated 152B jited 93B memlock 4096B map_ids 11,9
> btf_id 248
> 102: tracing name handle__sched_w tag 00ba1f1d332f5a57 gpl
> loaded_at 2021-01-18T16:34:00-0300 uid 0
> xlated 152B jited 93B memlock 4096B map_ids 11,9
> btf_id 248
> 103: tracing name handle__sched_s tag a4ac42153fd49be2 gpl
> loaded_at 2021-01-18T16:34:00-0300 uid 0
> xlated 560B jited 332B memlock 4096B map_ids 11,9,10
> btf_id 248
> [root@five ~]# perf stat -vvv -e cycles -b 92
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> prog FD 3 doesn't have valid btf
> Failed to get program name for bpf prog 92. Does it have BTF?
> [root@five ~]# perf stat -vvv -e cycles -b 103
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]# perf stat -vvv -e cycles -b 102
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]#
> [root@five ~]# perf stat -vvv -e cycles -b 6
> Control descriptor is not initialized
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: load bpf program failed: Invalid argument
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> Cannot recursively attach
> processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
>
> libbpf: -- END LOG --
> libbpf: failed to load program 'fentry_XXX'
> libbpf: failed to load object 'bpf_prog_profiler_bpf'
> libbpf: failed to load BPF skeleton 'bpf_prog_profiler_bpf': -4007
> bpf_prog_profiler_bpf__load failed
> [root@five ~]#
>
> So no more segfaults, but still something to look at, I'm pushing my
> local perf/core to tmp.perf/core, please take a look.
>
> - Arnaldo

We are looking at two issues:
1. Cannot recursively attach;
2. prog FD 3 doesn't have valid btf.

#1 was caused by the verifier disallowing attaching fentry/fexit program
to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
This constraint was added when we only had fentry/fexit in the TRACING
type. We have extended the TRACING type to many other use cases, like
"tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
this constraint. I will work on this.

For #2, we require the target program to have BTF. I guess we won't remove
this requirement.

While I work on improving #1, could you please test with some kprobe
programs? For example, we can use fileslower.py from bcc.

Thanks,
Song

[...]

2021-01-19 15:13:22

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > > On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > We are looking at two issues:
> > 1. Cannot recursively attach;
> > 2. prog FD 3 doesn't have valid btf.

> > #1 was caused by the verifier disallowing attaching fentry/fexit program
> > to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
> > This constraint was added when we only had fentry/fexit in the TRACING
> > type. We have extended the TRACING type to many other use cases, like
> > "tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
> > this constraint. I will work on this.

> > For #2, we require the target program to have BTF. I guess we won't remove
> > this requirement.

> > While I work on improving #1, could you please test with some kprobe
> > programs? For example, we can use fileslower.py from bcc.

> Sure, and please consider improving the error messages to state what you
> described above.

Terminal 1:

[root@five perf]# perf trace -e 5sec.c
^C
[root@five perf]# cat 5sec.c
#include <bpf.h>

#define NSEC_PER_SEC 1000000000L

int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
{
return sec / NSEC_PER_SEC == 5;
}

license(GPL);
[root@five perf]# perf trace -e 5sec.c/max-stack=16/
0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
hrtimer_nanosleep ([kernel.kallsyms])
common_nsleep ([kernel.kallsyms])
__x64_sys_clock_nanosleep ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
__clock_nanosleep_2 (/usr/lib64/libc-2.32.so)


Terminal 2:

[root@five ~]# perf stat -e cycles -b 180 -I 1000
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
Aborted (core dumped)
[root@five ~]# gdb perf
GNU gdb (GDB) Fedora 10.1-2.fc33
<SNIP>
Reading symbols from perf...
(gdb) run stat -e cycles -b 180 -I 1000
Starting program: /root/bin/perf stat -e cycles -b 180 -I 1000
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.32-2.fc33.x86_64
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff75149d5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install bzip2-libs-1.0.8-4.fc33.x86_64 cyrus-sasl-lib-2.1.27-6.fc33.x86_64 elfutils-debuginfod-client-0.182-1.fc33.x86_64 elfutils-libelf-0.182-1.fc33.x86_64 elfutils-libs-0.182-1.fc33.x86_64 keyutils-libs-1.6.1-1.fc33.x86_64 krb5-libs-1.18.2-29.fc33.x86_64 libbabeltrace-1.5.8-3.fc33.x86_64 libbrotli-1.0.9-3.fc33.x86_64 libcap-2.26-8.fc33.x86_64 libcom_err-1.45.6-4.fc33.x86_64 libcurl-7.71.1-8.fc33.x86_64 libgcc-10.2.1-9.fc33.x86_64 libidn2-2.3.0-4.fc33.x86_64 libnghttp2-1.41.0-3.fc33.x86_64 libpsl-0.21.1-2.fc33.x86_64 libselinux-3.1-2.fc33.x86_64 libssh-0.9.5-1.fc33.x86_64 libunistring-0.9.10-9.fc33.x86_64 libunwind-1.4.0-4.fc33.x86_64 libuuid-2.36.1-1.fc33.x86_64 libxcrypt-4.4.17-1.fc33.x86_64 libzstd-1.4.7-1.fc33.x86_64 numactl-libs-2.0.14-1.fc33.x86_64 openldap-2.4.50-5.fc33.x86_64 openssl-libs-1.1.1i-1.fc33.x86_64 pcre-8.44-2.fc33.x86_64 pcre2-10.36-1.fc33.x86_64 perl-libs-5.32.0-465.fc33.x86_64 popt-1.18-2.fc33.x86_64 python3-libs-3.9.1-1.fc33.x86_64 slang-2.3.2-8.fc33.x86_64 xz-libs-5.2.5-4.fc33.x86_64 zlib-1.2.11-23.fc33.x86_64
(gdb) bt
#0 0x00007ffff75149d5 in raise () from /lib64/libc.so.6
#1 0x00007ffff74fd8a4 in abort () from /lib64/libc.so.6
#2 0x00007ffff74fd789 in __assert_fail_base.cold () from /lib64/libc.so.6
#3 0x00007ffff750d026 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000005661c3 in bpf_program_profiler__read (evsel=0xc3a2c0) at util/bpf_counter.c:227
#5 0x0000000000000000 in ?? ()
(gdb)

207 static int bpf_program_profiler__read(struct evsel *evsel)
208 {
209 int num_cpu = evsel__nr_cpus(evsel);
210 struct bpf_perf_event_value values[num_cpu];
211 struct bpf_counter *counter;
212 int reading_map_fd;
213 __u32 key = 0;
214 int err, cpu;
215
216 if (list_empty(&evsel->bpf_counter_list))
217 return -EAGAIN;
218
219 for (cpu = 0; cpu < num_cpu; cpu++) {
220 perf_counts(evsel->counts, cpu, 0)->val = 0;
221 perf_counts(evsel->counts, cpu, 0)->ena = 0;
222 perf_counts(evsel->counts, cpu, 0)->run = 0;
223 }
224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
225 struct bpf_prog_profiler_bpf *skel = counter->skel;
226
227 assert(skel != NULL);
228 reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
229
230 err = bpf_map_lookup_elem(reading_map_fd, &key, values);
231 if (err) {
232 pr_err("failed to read value\n");
233 return err;
234 }
235
236 for (cpu = 0; cpu < num_cpu; cpu++) {
237 perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
238 perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
239 perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
240 }
241 }
242 return 0;
243 }

- Arnaldo

2021-01-19 18:34:33

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > > > On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > We are looking at two issues:
> > > 1. Cannot recursively attach;
> > > 2. prog FD 3 doesn't have valid btf.
>
> > > #1 was caused by the verifier disallowing attaching fentry/fexit program
> > > to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
> > > This constraint was added when we only had fentry/fexit in the TRACING
> > > type. We have extended the TRACING type to many other use cases, like
> > > "tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
> > > this constraint. I will work on this.
>
> > > For #2, we require the target program to have BTF. I guess we won't remove
> > > this requirement.
>
> > > While I work on improving #1, could you please test with some kprobe
> > > programs? For example, we can use fileslower.py from bcc.
>
> > Sure, and please consider improving the error messages to state what you
> > described above.
>
> Terminal 1:
>
> [root@five perf]# perf trace -e 5sec.c
> ^C
> [root@five perf]# cat 5sec.c
> #include <bpf.h>
>
> #define NSEC_PER_SEC 1000000000L
>
> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> {
> return sec / NSEC_PER_SEC == 5;
> }
>
> license(GPL);
> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
> 0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
> hrtimer_nanosleep ([kernel.kallsyms])
> common_nsleep ([kernel.kallsyms])
> __x64_sys_clock_nanosleep ([kernel.kallsyms])
> do_syscall_64 ([kernel.kallsyms])
> entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
> __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
>
>
> Terminal 2:
>
> [root@five ~]# perf stat -e cycles -b 180 -I 1000
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
> Aborted (core dumped)

Out to lunch, will continue later, but this may help you figure this out
till then :)

Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
96 {
(gdb) n
104 prog_fd = bpf_prog_get_fd_by_id(prog_id);
(gdb)
105 if (prog_fd < 0) {
(gdb)
109 counter = bpf_counter_alloc();
(gdb)
110 if (!counter) {
(gdb) n
115 skel = bpf_prog_profiler_bpf__open();
(gdb) p counter
$9 = (struct bpf_counter *) 0xce09e0
(gdb) p *counter
$10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
(gdb) p *counter
$11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
(gdb) n
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
116 if (!skel) {
(gdb)
121 skel->rodata->num_cpu = evsel__nr_cpus(evsel);
(gdb)
123 bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
(gdb)
124 bpf_map__resize(skel->maps.fentry_readings, 1);
(gdb)
125 bpf_map__resize(skel->maps.accum_readings, 1);
(gdb)
127 prog_name = bpf_target_prog_name(prog_fd);
(gdb)
128 if (!prog_name) {
(gdb)
133 bpf_object__for_each_program(prog, skel->obj) {
(gdb)
134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
(gdb)
135 if (err) {
(gdb)
133 bpf_object__for_each_program(prog, skel->obj) {
(gdb) p evsel
$12 = (struct evsel *) 0xce02c0
(gdb) p evsel->name
$13 = 0xce04e0 "cycles"
(gdb) n
134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
(gdb)
135 if (err) {
(gdb)
133 bpf_object__for_each_program(prog, skel->obj) {
(gdb)
141 set_max_rlimit();
(gdb)
142 err = bpf_prog_profiler_bpf__load(skel);
(gdb)
143 if (err) {
(gdb)
148 assert(skel != NULL);
(gdb)
149 counter->skel = skel;
(gdb)
150 list_add(&counter->list, &evsel->bpf_counter_list);
(gdb) c
Continuing.

Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
247 {
(gdb) n
252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb)
253 skel = counter->skel;
(gdb) watch counter->skel
Hardware watchpoint 6: counter->skel
(gdb) p counter->skel
$14 = (void *) 0xce0a00
(gdb) n
254 assert(skel != NULL);
(gdb) p skel
$15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
(gdb) c
Continuing.

Hardware watchpoint 6: counter->skel

Old value = (void *) 0xce0a00
New value = (void *) 0x0
0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) info b
Num Type Disp Enb Address What
1 breakpoint keep y 0x00000000005ceb44 in bpf_program_profiler_load_one at util/bpf_counter.c:96
breakpoint already hit 1 time
2 breakpoint keep y 0x00000000005cef2e in bpf_program_profiler__enable at util/bpf_counter.c:192
3 breakpoint keep y 0x00000000005cf00a in bpf_program_profiler__read at util/bpf_counter.c:208
4 breakpoint keep y 0x00000000005cf3ba in bpf_program_profiler__install_pe at util/bpf_counter.c:247
breakpoint already hit 1 time
5 breakpoint keep y 0x00000000005ce8d4 in bpf_program_profiler__destroy at util/bpf_counter.c:47
6 hw watchpoint keep y counter->skel
breakpoint already hit 1 time
(gdb) p evsel
$16 = (struct evsel *) 0xce02c0
(gdb) bt
#0 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
#1 0x00000000005cf4f3 in bpf_counter__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:276
#2 0x00000000004f3ef1 in evsel__open_cpu (evsel=0xce02c0, cpus=0xce0500, threads=0xce75c0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
#3 0x00000000004f4aa7 in evsel__open_per_cpu (evsel=0xce02c0, cpus=0xce0500, cpu=0) at util/evsel.c:1996
#4 0x000000000057f6f2 in create_perf_stat_counter (evsel=0xce02c0, config=0xa97860 <stat_config>, target=0xa97660 <target>, cpu=0) at util/stat.c:568
#5 0x0000000000435d96 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:790
#6 0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
#7 0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
#8 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
#9 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
#10 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
#11 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
(gdb) n
261 return 0;

2021-01-19 21:59:01

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs



> On Jan 19, 2021, at 8:31 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>
> Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
>>> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
>>>>> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
>>>> We are looking at two issues:
>>>> 1. Cannot recursively attach;
>>>> 2. prog FD 3 doesn't have valid btf.
>>
>>>> #1 was caused by the verifier disallowing attaching fentry/fexit program
>>>> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
>>>> This constraint was added when we only had fentry/fexit in the TRACING
>>>> type. We have extended the TRACING type to many other use cases, like
>>>> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
>>>> this constraint. I will work on this.
>>
>>>> For #2, we require the target program to have BTF. I guess we won't remove
>>>> this requirement.
>>
>>>> While I work on improving #1, could you please test with some kprobe
>>>> programs? For example, we can use fileslower.py from bcc.
>>
>>> Sure, and please consider improving the error messages to state what you
>>> described above.
>>
>> Terminal 1:
>>
>> [root@five perf]# perf trace -e 5sec.c
>> ^C
>> [root@five perf]# cat 5sec.c
>> #include <bpf.h>
>>
>> #define NSEC_PER_SEC 1000000000L
>>
>> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
>> {
>> return sec / NSEC_PER_SEC == 5;
>> }
>>
>> license(GPL);
>> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
>> 0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
>> hrtimer_nanosleep ([kernel.kallsyms])
>> common_nsleep ([kernel.kallsyms])
>> __x64_sys_clock_nanosleep ([kernel.kallsyms])
>> do_syscall_64 ([kernel.kallsyms])
>> entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
>> __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
>>
>>
>> Terminal 2:
>>
>> [root@five ~]# perf stat -e cycles -b 180 -I 1000
>> libbpf: elf: skipping unrecognized data section(9) .eh_frame
>> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
>> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
>> Aborted (core dumped)
>
> Out to lunch, will continue later, but this may help you figure this out
> till then :)
>
> Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
>
> Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
> 96 {
> (gdb) n
> 104 prog_fd = bpf_prog_get_fd_by_id(prog_id);
> (gdb)
> 105 if (prog_fd < 0) {
> (gdb)
> 109 counter = bpf_counter_alloc();
> (gdb)
> 110 if (!counter) {
> (gdb) n
> 115 skel = bpf_prog_profiler_bpf__open();
> (gdb) p counter
> $9 = (struct bpf_counter *) 0xce09e0
> (gdb) p *counter
> $10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> (gdb) p *counter
> $11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> (gdb) n
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> 116 if (!skel) {
> (gdb)
> 121 skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> (gdb)
> 123 bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> (gdb)
> 124 bpf_map__resize(skel->maps.fentry_readings, 1);
> (gdb)
> 125 bpf_map__resize(skel->maps.accum_readings, 1);
> (gdb)
> 127 prog_name = bpf_target_prog_name(prog_fd);
> (gdb)
> 128 if (!prog_name) {
> (gdb)
> 133 bpf_object__for_each_program(prog, skel->obj) {
> (gdb)
> 134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> (gdb)
> 135 if (err) {
> (gdb)
> 133 bpf_object__for_each_program(prog, skel->obj) {
> (gdb) p evsel
> $12 = (struct evsel *) 0xce02c0
> (gdb) p evsel->name
> $13 = 0xce04e0 "cycles"
> (gdb) n
> 134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> (gdb)
> 135 if (err) {
> (gdb)
> 133 bpf_object__for_each_program(prog, skel->obj) {
> (gdb)
> 141 set_max_rlimit();
> (gdb)
> 142 err = bpf_prog_profiler_bpf__load(skel);
> (gdb)
> 143 if (err) {
> (gdb)
> 148 assert(skel != NULL);
> (gdb)
> 149 counter->skel = skel;
> (gdb)
> 150 list_add(&counter->list, &evsel->bpf_counter_list);
> (gdb) c
> Continuing.
>
> Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
> 247 {
> (gdb) n
> 252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> (gdb)
> 253 skel = counter->skel;
> (gdb) watch counter->skel
> Hardware watchpoint 6: counter->skel
> (gdb) p counter->skel
> $14 = (void *) 0xce0a00
> (gdb) n
> 254 assert(skel != NULL);
> (gdb) p skel
> $15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
> (gdb) c
> Continuing.
>
> Hardware watchpoint 6: counter->skel
>
> Old value = (void *) 0xce0a00
> New value = (void *) 0x0
> 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> 252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {

So it was the list operation that set counter->skel to NULL? I am really confused...

Song

> (gdb) info b
> Num Type Disp Enb Address What
> 1 breakpoint keep y 0x00000000005ceb44 in bpf_program_profiler_load_one at util/bpf_counter.c:96
> breakpoint already hit 1 time
> 2 breakpoint keep y 0x00000000005cef2e in bpf_program_profiler__enable at util/bpf_counter.c:192
> 3 breakpoint keep y 0x00000000005cf00a in bpf_program_profiler__read at util/bpf_counter.c:208
> 4 breakpoint keep y 0x00000000005cf3ba in bpf_program_profiler__install_pe at util/bpf_counter.c:247
> breakpoint already hit 1 time
> 5 breakpoint keep y 0x00000000005ce8d4 in bpf_program_profiler__destroy at util/bpf_counter.c:47
> 6 hw watchpoint keep y counter->skel
> breakpoint already hit 1 time
> (gdb) p evsel
> $16 = (struct evsel *) 0xce02c0
> (gdb) bt
> #0 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> #1 0x00000000005cf4f3 in bpf_counter__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:276
> #2 0x00000000004f3ef1 in evsel__open_cpu (evsel=0xce02c0, cpus=0xce0500, threads=0xce75c0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
> #3 0x00000000004f4aa7 in evsel__open_per_cpu (evsel=0xce02c0, cpus=0xce0500, cpu=0) at util/evsel.c:1996
> #4 0x000000000057f6f2 in create_perf_stat_counter (evsel=0xce02c0, config=0xa97860 <stat_config>, target=0xa97660 <target>, cpu=0) at util/stat.c:568
> #5 0x0000000000435d96 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:790
> #6 0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
> #7 0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
> #8 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
> #9 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
> #10 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
> #11 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
> (gdb) n
> 261 return 0;
>

2021-01-19 22:33:38

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Jan 19, 2021 at 09:54:50PM +0000, Song Liu escreveu:
>
>
> > On Jan 19, 2021, at 8:31 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> >
> > Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> >>> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> >>>>> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> >>>> We are looking at two issues:
> >>>> 1. Cannot recursively attach;
> >>>> 2. prog FD 3 doesn't have valid btf.
> >>
> >>>> #1 was caused by the verifier disallowing attaching fentry/fexit program
> >>>> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
> >>>> This constraint was added when we only had fentry/fexit in the TRACING
> >>>> type. We have extended the TRACING type to many other use cases, like
> >>>> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
> >>>> this constraint. I will work on this.
> >>
> >>>> For #2, we require the target program to have BTF. I guess we won't remove
> >>>> this requirement.
> >>
> >>>> While I work on improving #1, could you please test with some kprobe
> >>>> programs? For example, we can use fileslower.py from bcc.
> >>
> >>> Sure, and please consider improving the error messages to state what you
> >>> described above.
> >>
> >> Terminal 1:
> >>
> >> [root@five perf]# perf trace -e 5sec.c
> >> ^C
> >> [root@five perf]# cat 5sec.c
> >> #include <bpf.h>
> >>
> >> #define NSEC_PER_SEC 1000000000L
> >>
> >> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> >> {
> >> return sec / NSEC_PER_SEC == 5;
> >> }
> >>
> >> license(GPL);
> >> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
> >> 0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
> >> hrtimer_nanosleep ([kernel.kallsyms])
> >> common_nsleep ([kernel.kallsyms])
> >> __x64_sys_clock_nanosleep ([kernel.kallsyms])
> >> do_syscall_64 ([kernel.kallsyms])
> >> entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
> >> __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
> >>
> >>
> >> Terminal 2:
> >>
> >> [root@five ~]# perf stat -e cycles -b 180 -I 1000
> >> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> >> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> >> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
> >> Aborted (core dumped)
> >
> > Out to lunch, will continue later, but this may help you figure this out
> > till then :)
> >
> > Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> >
> > Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
> > 96 {
> > (gdb) n
> > 104 prog_fd = bpf_prog_get_fd_by_id(prog_id);
> > (gdb)
> > 105 if (prog_fd < 0) {
> > (gdb)
> > 109 counter = bpf_counter_alloc();
> > (gdb)
> > 110 if (!counter) {
> > (gdb) n
> > 115 skel = bpf_prog_profiler_bpf__open();
> > (gdb) p counter
> > $9 = (struct bpf_counter *) 0xce09e0
> > (gdb) p *counter
> > $10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > (gdb) p *counter
> > $11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > (gdb) n
> > libbpf: elf: skipping unrecognized data section(9) .eh_frame
> > libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> > 116 if (!skel) {
> > (gdb)
> > 121 skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> > (gdb)
> > 123 bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> > (gdb)
> > 124 bpf_map__resize(skel->maps.fentry_readings, 1);
> > (gdb)
> > 125 bpf_map__resize(skel->maps.accum_readings, 1);
> > (gdb)
> > 127 prog_name = bpf_target_prog_name(prog_fd);
> > (gdb)
> > 128 if (!prog_name) {
> > (gdb)
> > 133 bpf_object__for_each_program(prog, skel->obj) {
> > (gdb)
> > 134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > (gdb)
> > 135 if (err) {
> > (gdb)
> > 133 bpf_object__for_each_program(prog, skel->obj) {
> > (gdb) p evsel
> > $12 = (struct evsel *) 0xce02c0
> > (gdb) p evsel->name
> > $13 = 0xce04e0 "cycles"
> > (gdb) n
> > 134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > (gdb)
> > 135 if (err) {
> > (gdb)
> > 133 bpf_object__for_each_program(prog, skel->obj) {
> > (gdb)
> > 141 set_max_rlimit();
> > (gdb)
> > 142 err = bpf_prog_profiler_bpf__load(skel);
> > (gdb)
> > 143 if (err) {
> > (gdb)
> > 148 assert(skel != NULL);
> > (gdb)
> > 149 counter->skel = skel;
> > (gdb)
> > 150 list_add(&counter->list, &evsel->bpf_counter_list);
> > (gdb) c
> > Continuing.
> >
> > Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
> > 247 {
> > (gdb) n
> > 252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > (gdb)
> > 253 skel = counter->skel;
> > (gdb) watch counter->skel
> > Hardware watchpoint 6: counter->skel
> > (gdb) p counter->skel
> > $14 = (void *) 0xce0a00
> > (gdb) n
> > 254 assert(skel != NULL);
> > (gdb) p skel
> > $15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
> > (gdb) c
> > Continuing.
> >
> > Hardware watchpoint 6: counter->skel
> >
> > Old value = (void *) 0xce0a00
> > New value = (void *) 0x0
> > 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> > 252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
>
> So it was the list operation that set counter->skel to NULL? I am really confused...

Yep, I'm confused as well and trying to reproduce this, but got
sidetracked...

- Arnaldo

> Song
>
> > (gdb) info b
> > Num Type Disp Enb Address What
> > 1 breakpoint keep y 0x00000000005ceb44 in bpf_program_profiler_load_one at util/bpf_counter.c:96
> > breakpoint already hit 1 time
> > 2 breakpoint keep y 0x00000000005cef2e in bpf_program_profiler__enable at util/bpf_counter.c:192
> > 3 breakpoint keep y 0x00000000005cf00a in bpf_program_profiler__read at util/bpf_counter.c:208
> > 4 breakpoint keep y 0x00000000005cf3ba in bpf_program_profiler__install_pe at util/bpf_counter.c:247
> > breakpoint already hit 1 time
> > 5 breakpoint keep y 0x00000000005ce8d4 in bpf_program_profiler__destroy at util/bpf_counter.c:47
> > 6 hw watchpoint keep y counter->skel
> > breakpoint already hit 1 time
> > (gdb) p evsel
> > $16 = (struct evsel *) 0xce02c0
> > (gdb) bt
> > #0 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> > #1 0x00000000005cf4f3 in bpf_counter__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:276
> > #2 0x00000000004f3ef1 in evsel__open_cpu (evsel=0xce02c0, cpus=0xce0500, threads=0xce75c0, start_cpu=0, end_cpu=1) at util/evsel.c:1792
> > #3 0x00000000004f4aa7 in evsel__open_per_cpu (evsel=0xce02c0, cpus=0xce0500, cpu=0) at util/evsel.c:1996
> > #4 0x000000000057f6f2 in create_perf_stat_counter (evsel=0xce02c0, config=0xa97860 <stat_config>, target=0xa97660 <target>, cpu=0) at util/stat.c:568
> > #5 0x0000000000435d96 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:790
> > #6 0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
> > #7 0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
> > #8 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
> > #9 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
> > #10 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
> > #11 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
> > (gdb) n
> > 261 return 0;
> >
>

--

- Arnaldo

2021-01-19 22:46:10

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> We are looking at two issues:
> 1. Cannot recursively attach;
> 2. prog FD 3 doesn't have valid btf.

> #1 was caused by the verifier disallowing attaching fentry/fexit program
> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
> This constraint was added when we only had fentry/fexit in the TRACING
> type. We have extended the TRACING type to many other use cases, like
> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
> this constraint. I will work on this.

> For #2, we require the target program to have BTF. I guess we won't remove
> this requirement.

> While I work on improving #1, could you please test with some kprobe
> programs? For example, we can use fileslower.py from bcc.

Sure, and please consider improving the error messages to state what you
described above.

- Arnaldo

2021-01-20 13:16:00

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Tue, Jan 19, 2021 at 07:30:21PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Jan 19, 2021 at 09:54:50PM +0000, Song Liu escreveu:
> >
> >
> > > On Jan 19, 2021, at 8:31 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >
> > > Em Tue, Jan 19, 2021 at 11:42:49AM -0300, Arnaldo Carvalho de Melo escreveu:
> > >> Em Tue, Jan 19, 2021 at 11:31:44AM -0300, Arnaldo Carvalho de Melo escreveu:
> > >>> Em Tue, Jan 19, 2021 at 12:48:19AM +0000, Song Liu escreveu:
> > >>>>> On Jan 18, 2021, at 11:38 AM, Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >>>> We are looking at two issues:
> > >>>> 1. Cannot recursively attach;
> > >>>> 2. prog FD 3 doesn't have valid btf.
> > >>
> > >>>> #1 was caused by the verifier disallowing attaching fentry/fexit program
> > >>>> to program with type BPF_PROG_TYPE_TRACING (in bpf_check_attach_target).
> > >>>> This constraint was added when we only had fentry/fexit in the TRACING
> > >>>> type. We have extended the TRACING type to many other use cases, like
> > >>>> "tp_btf/", "fmod_ret" and "iter/". Therefore, it is good time to revisit
> > >>>> this constraint. I will work on this.
> > >>
> > >>>> For #2, we require the target program to have BTF. I guess we won't remove
> > >>>> this requirement.
> > >>
> > >>>> While I work on improving #1, could you please test with some kprobe
> > >>>> programs? For example, we can use fileslower.py from bcc.
> > >>
> > >>> Sure, and please consider improving the error messages to state what you
> > >>> described above.
> > >>
> > >> Terminal 1:
> > >>
> > >> [root@five perf]# perf trace -e 5sec.c
> > >> ^C
> > >> [root@five perf]# cat 5sec.c
> > >> #include <bpf.h>
> > >>
> > >> #define NSEC_PER_SEC 1000000000L
> > >>
> > >> int probe(hrtimer_nanosleep, rqtp)(void *ctx, int err, long long sec)
> > >> {
> > >> return sec / NSEC_PER_SEC == 5;
> > >> }
> > >>
> > >> license(GPL);
> > >> [root@five perf]# perf trace -e 5sec.c/max-stack=16/
> > >> 0.000 sleep/3739435 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1743337312, rqtp: 5000000000)
> > >> hrtimer_nanosleep ([kernel.kallsyms])
> > >> common_nsleep ([kernel.kallsyms])
> > >> __x64_sys_clock_nanosleep ([kernel.kallsyms])
> > >> do_syscall_64 ([kernel.kallsyms])
> > >> entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
> > >> __clock_nanosleep_2 (/usr/lib64/libc-2.32.so)
> > >>
> > >>
> > >> Terminal 2:
> > >>
> > >> [root@five ~]# perf stat -e cycles -b 180 -I 1000
> > >> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> > >> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> > >> perf: util/bpf_counter.c:227: bpf_program_profiler__read: Assertion `skel != NULL' failed.
> > >> Aborted (core dumped)
> > >
> > > Out to lunch, will continue later, but this may help you figure this out
> > > till then :)
> > >
> > > Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> > > [Thread debugging using libthread_db enabled]
> > > Using host libthread_db library "/lib64/libthread_db.so.1".
> > >
> > > Breakpoint 1, bpf_program_profiler_load_one (evsel=0xce02c0, prog_id=244) at util/bpf_counter.c:96
> > > 96 {
> > > (gdb) n
> > > 104 prog_fd = bpf_prog_get_fd_by_id(prog_id);
> > > (gdb)
> > > 105 if (prog_fd < 0) {
> > > (gdb)
> > > 109 counter = bpf_counter_alloc();
> > > (gdb)
> > > 110 if (!counter) {
> > > (gdb) n
> > > 115 skel = bpf_prog_profiler_bpf__open();
> > > (gdb) p counter
> > > $9 = (struct bpf_counter *) 0xce09e0
> > > (gdb) p *counter
> > > $10 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > > (gdb) p *counter
> > > $11 = {skel = 0x0, list = {next = 0xce09e8, prev = 0xce09e8}}
> > > (gdb) n
> > > libbpf: elf: skipping unrecognized data section(9) .eh_frame
> > > libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> > > 116 if (!skel) {
> > > (gdb)
> > > 121 skel->rodata->num_cpu = evsel__nr_cpus(evsel);
> > > (gdb)
> > > 123 bpf_map__resize(skel->maps.events, evsel__nr_cpus(evsel));
> > > (gdb)
> > > 124 bpf_map__resize(skel->maps.fentry_readings, 1);
> > > (gdb)
> > > 125 bpf_map__resize(skel->maps.accum_readings, 1);
> > > (gdb)
> > > 127 prog_name = bpf_target_prog_name(prog_fd);
> > > (gdb)
> > > 128 if (!prog_name) {
> > > (gdb)
> > > 133 bpf_object__for_each_program(prog, skel->obj) {
> > > (gdb)
> > > 134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > > (gdb)
> > > 135 if (err) {
> > > (gdb)
> > > 133 bpf_object__for_each_program(prog, skel->obj) {
> > > (gdb) p evsel
> > > $12 = (struct evsel *) 0xce02c0
> > > (gdb) p evsel->name
> > > $13 = 0xce04e0 "cycles"
> > > (gdb) n
> > > 134 err = bpf_program__set_attach_target(prog, prog_fd, prog_name);
> > > (gdb)
> > > 135 if (err) {
> > > (gdb)
> > > 133 bpf_object__for_each_program(prog, skel->obj) {
> > > (gdb)
> > > 141 set_max_rlimit();
> > > (gdb)
> > > 142 err = bpf_prog_profiler_bpf__load(skel);
> > > (gdb)
> > > 143 if (err) {
> > > (gdb)
> > > 148 assert(skel != NULL);
> > > (gdb)
> > > 149 counter->skel = skel;
> > > (gdb)
> > > 150 list_add(&counter->list, &evsel->bpf_counter_list);
> > > (gdb) c
> > > Continuing.
> > >
> > > Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:247
> > > 247 {
> > > (gdb) n
> > > 252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > > (gdb)
> > > 253 skel = counter->skel;
> > > (gdb) watch counter->skel
> > > Hardware watchpoint 6: counter->skel
> > > (gdb) p counter->skel
> > > $14 = (void *) 0xce0a00
> > > (gdb) n
> > > 254 assert(skel != NULL);
> > > (gdb) p skel
> > > $15 = (struct bpf_prog_profiler_bpf *) 0xce0a00
> > > (gdb) c
> > > Continuing.
> > >
> > > Hardware watchpoint 6: counter->skel
> > >
> > > Old value = (void *) 0xce0a00
> > > New value = (void *) 0x0
> > > 0x00000000005cf45e in bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=0, fd=3) at util/bpf_counter.c:252
> > > 252 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> >
> > So it was the list operation that set counter->skel to NULL? I am really confused...
>
> Yep, I'm confused as well and trying to reproduce this, but got
> sidetracked...

Coming back to this, now it is exploding later:

Breakpoint 8, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=22, fd=32) at util/bpf_counter.c:254
254 assert(skel != NULL);
(gdb) p skel
$52 = (struct bpf_prog_profiler_bpf *) 0xce0a00
(gdb) c
Continuing.

Breakpoint 4, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=23, fd=33) at util/bpf_counter.c:247
247 {
(gdb) p skel
$53 = (struct bpf_prog_profiler_bpf *) 0xce04c0
(gdb) c
Continuing.

Breakpoint 8, bpf_program_profiler__install_pe (evsel=0xce02c0, cpu=23, fd=33) at util/bpf_counter.c:254
254 assert(skel != NULL);
(gdb) p skel
$54 = (struct bpf_prog_profiler_bpf *) 0xce0a00
(gdb) c
Continuing.

Breakpoint 2, bpf_program_profiler__enable (evsel=0xce02c0) at util/bpf_counter.c:192
192 {
(gdb) n
196 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb)
197 assert(counter->skel != NULL);
(gdb)
198 ret = bpf_prog_profiler_bpf__attach(counter->skel);
(gdb) c
Continuing.

Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:208
208 {
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) p evsel
$55 = (struct evsel *) 0x0
(gdb) bt
#0 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
#1 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
#1 0x0000000000000000 in ?? ()
(gdb)
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff647f900 (LWP 1725711) "perf" 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
(gdb)

2021-01-20 14:51:55

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Wed, Jan 20, 2021 at 09:37:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:208
> 208 {
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> 224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> (gdb) p evsel
> $55 = (struct evsel *) 0x0
> (gdb) bt
> #0 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> #1 0x0000000000000000 in ?? ()
> (gdb) bt
> #0 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> #1 0x0000000000000000 in ?? ()
> (gdb)
> (gdb) info threads
> Id Target Id Frame
> * 1 Thread 0x7ffff647f900 (LWP 1725711) "perf" 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> (gdb)

(gdb) run stat -e cycles -b 244 -I 1000
Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame

Breakpoint 1, read_bpf_map_counters () at builtin-stat.c:414
414 {
(gdb) c
Continuing.

Breakpoint 2, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
(gdb) bt
#0 bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
#1 0x00000000005cf648 in bpf_counter__read (evsel=0xce02c0) at util/bpf_counter.c:300
#2 0x0000000000434e46 in read_bpf_map_counters () at builtin-stat.c:419
#3 0x0000000000434ecb in read_counters (rs=0x7fffffff9530) at builtin-stat.c:433
#4 0x00000000004351f3 in process_interval () at builtin-stat.c:498
#5 0x00000000004352b9 in handle_interval (interval=1000, times=0x7fffffff9618) at builtin-stat.c:513
#6 0x0000000000435859 in dispatch_events (forks=false, timeout=0, interval=1000, times=0x7fffffff9618) at builtin-stat.c:676
#7 0x00000000004365f1 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:944
#8 0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
#9 0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
#10 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
#11 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
#12 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
#13 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
(gdb) n
225 struct bpf_prog_profiler_bpf *skel = counter->skel;
(gdb) n
227 assert(skel != NULL);
(gdb) n
228 reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
(gdb) p skel->maps.accum_readings
$16 = (struct bpf_map *) 0xce7400
(gdb) p *skel->maps.accum_readings
$17 = {name = 0xce7580 "accum_readings", fd = 7, sec_idx = 6, sec_offset = 56, map_ifindex = 0, inner_map_fd = -1, def = {type = 6, key_size = 4, value_size = 24, max_entries = 1, map_flags = 0}, numa_node = 0, btf_var_idx = 2, btf_key_type_id = 0, btf_value_type_id = 0,
btf_vmlinux_value_type_id = 0, priv = 0x0, clear_priv = 0x0, libbpf_type = LIBBPF_MAP_UNSPEC, mmaped = 0x0, st_ops = 0x0, inner_map = 0x0, init_slots = 0x0, init_slots_sz = 0, pin_path = 0x0, pinned = false, reused = false}
(gdb) n
230 err = bpf_map_lookup_elem(reading_map_fd, &key, values);
(gdb) n
231 if (err) {
(gdb) p evsel
$18 = (struct evsel *) 0x0
(gdb)

So at this point is stack corruption when doing the bpf lookup on this
Ryzen system with 12 cores/24 threads:

[root@five ~]# bpftool prog | tail -4
244: kprobe name hrtimer_nanosle tag 0e77bacaf4555f83 gpl
loaded_at 2021-01-19T13:04:03-0300 uid 0
xlated 80B jited 49B memlock 4096B
btf_id 360
[root@five ~]# perf stat -e cycles -b 244 -I 1000
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
# time counts unit events
1.246867591 37,371 cycles
2.247955619 33,652 cycles
3.249017879 38,983 cycles
4.250083954 49,546 cycles
5.251150241 57,319 cycles
6.252221407 44,170 cycles
7.253279040 34,691 cycles
8.254333356 48,195 cycles
^C 9.009242074 41,320 cycles

[root@five ~]#

This is with this hack:

diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
index 8c977f038f497fc1..f227fd09d33794b5 100644
--- a/tools/perf/util/bpf_counter.c
+++ b/tools/perf/util/bpf_counter.c
@@ -207,7 +207,7 @@ static int bpf_program_profiler__enable(struct evsel *evsel)
static int bpf_program_profiler__read(struct evsel *evsel)
{
int num_cpu = evsel__nr_cpus(evsel);
- struct bpf_perf_event_value values[num_cpu];
+ struct bpf_perf_event_value values[num_cpu * 2];
struct bpf_counter *counter;
int reading_map_fd;
__u32 key = 0;


--------------------------

Now to check the proper fix...

Works with all the events, etc, we need to remove that nnoying warning
about .eh_frame (is this solved in the bpf tree?)

[root@five ~]# perf stat -b 244 -I 1000
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
libbpf: elf: skipping unrecognized data section(9) .eh_frame
libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
# time counts unit events
3.624452622 0.12 msec cpu-clock # 0.000 CPUs utilized
3.624452622 0 context-switches # 0.000 K/sec
3.624452622 0 cpu-migrations # 0.000 K/sec
3.624452622 0 page-faults # 0.000 K/sec
3.624452622 302,518 cycles # 2.468 GHz (83.41%)
3.624452622 26,101 stalled-cycles-frontend # 8.63% frontend cycles idle (81.05%)
3.624452622 96,327 stalled-cycles-backend # 31.84% backend cycles idle (80.58%)
3.624452622 269,500 instructions # 0.89 insn per cycle
3.624452622 # 0.36 stalled cycles per insn (81.81%)
3.624452622 55,003 branches # 448.667 M/sec (86.94%)
3.624452622 467 branch-misses # 0.85% of all branches (77.18%)
4.625690606 0.11 msec cpu-clock # 0.000 CPUs utilized
4.625690606 0 context-switches # 0.000 K/sec
4.625690606 0 cpu-migrations # 0.000 K/sec
4.625690606 0 page-faults # 0.000 K/sec
4.625690606 290,093 cycles # 2.535 GHz (79.46%)
4.625690606 28,274 stalled-cycles-frontend # 9.75% frontend cycles idle (79.47%)
4.625690606 101,772 stalled-cycles-backend # 35.08% backend cycles idle (84.05%)
4.625690606 261,867 instructions # 0.90 insn per cycle
4.625690606 # 0.39 stalled cycles per insn (86.38%)
4.625690606 55,334 branches # 483.473 M/sec (82.58%)
4.625690606 373 branch-misses # 0.67% of all branches (88.59%)
5.626686730 0.14 msec cpu-clock # 0.000 CPUs utilized
5.626686730 0 context-switches # 0.000 K/sec
5.626686730 0 cpu-migrations # 0.000 K/sec
5.626686730 0 page-faults # 0.000 K/sec
5.626686730 369,810 cycles # 2.574 GHz (79.77%)
5.626686730 32,212 stalled-cycles-frontend # 8.71% frontend cycles idle (76.00%)
5.626686730 122,778 stalled-cycles-backend # 33.20% backend cycles idle (81.49%)
5.626686730 275,699 instructions # 0.75 insn per cycle
5.626686730 # 0.45 stalled cycles per insn (78.39%)
5.626686730 58,135 branches # 404.716 M/sec (73.61%)
5.626686730 588 branch-misses # 1.01% of all branches (71.01%)
6.627688626 0.14 msec cpu-clock # 0.000 CPUs utilized
6.627688626 0 context-switches # 0.000 K/sec
6.627688626 0 cpu-migrations # 0.000 K/sec
6.627688626 0 page-faults # 0.000 K/sec
6.627688626 358,906 cycles # 2.543 GHz (81.90%)
6.627688626 26,411 stalled-cycles-frontend # 7.36% frontend cycles idle (80.87%)
6.627688626 129,526 stalled-cycles-backend # 36.09% backend cycles idle (77.72%)
6.627688626 278,756 instructions # 0.78 insn per cycle
6.627688626 # 0.46 stalled cycles per insn (70.31%)
6.627688626 56,514 branches # 400.448 M/sec (80.53%)
6.627688626 687 branch-misses # 1.22% of all branches (75.74%)
7.628688818 0.15 msec cpu-clock # 0.000 CPUs utilized
7.628688818 0 context-switches # 0.000 K/sec
7.628688818 0 cpu-migrations # 0.000 K/sec
7.628688818 0 page-faults # 0.000 K/sec
7.628688818 384,382 cycles # 2.574 GHz (84.08%)
7.628688818 27,148 stalled-cycles-frontend # 7.06% frontend cycles idle (81.82%)
7.628688818 128,434 stalled-cycles-backend # 33.41% backend cycles idle (82.29%)
7.628688818 270,693 instructions # 0.70 insn per cycle
7.628688818 # 0.47 stalled cycles per insn (83.58%)
7.628688818 57,277 branches # 383.614 M/sec (81.68%)
7.628688818 756 branch-misses # 1.32% of all branches (85.39%)
^C 7.934955676 0.05 msec cpu-clock # 0.000 CPUs utilized
7.934955676 0 context-switches # 0.000 K/sec
7.934955676 0 cpu-migrations # 0.000 K/sec
7.934955676 0 page-faults # 0.000 K/sec
7.934955676 126,813 cycles # 2.626 GHz (77.14%)
7.934955676 9,424 stalled-cycles-frontend # 7.43% frontend cycles idle (87.96%)
7.934955676 44,506 stalled-cycles-backend # 35.10% backend cycles idle (82.43%)
7.934955676 86,383 instructions # 0.68 insn per cycle
7.934955676 # 0.52 stalled cycles per insn (86.26%)
7.934955676 17,491 branches # 362.222 M/sec (87.10%)
7.934955676 247 branch-misses # 1.41% of all branches (76.62%)

[root@five ~]#

2021-01-21 22:13:06

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH v7 3/3] perf-stat: enable counting events for BPF programs

Em Wed, Jan 20, 2021 at 10:01:37AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Jan 20, 2021 at 09:37:27AM -0300, Arnaldo Carvalho de Melo escreveu:
> > Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:208
> > 208 {
> > (gdb) c
> > Continuing.
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > 224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> > (gdb) p evsel
> > $55 = (struct evsel *) 0x0
> > (gdb) bt
> > #0 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > #1 0x0000000000000000 in ?? ()
> > (gdb) bt
> > #0 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > #1 0x0000000000000000 in ?? ()
> > (gdb)
> > (gdb) info threads
> > Id Target Id Frame
> > * 1 Thread 0x7ffff647f900 (LWP 1725711) "perf" 0x00000000005cf34b in bpf_program_profiler__read (evsel=0x0) at util/bpf_counter.c:224
> > (gdb)

[root@five perf]# bpftool prog | grep ^244: -A100
244: kprobe name hrtimer_nanosle tag 0e77bacaf4555f83 gpl
loaded_at 2021-01-19T13:04:03-0300 uid 0
xlated 80B jited 49B memlock 4096B
btf_id 360
486: tracing name fentry_XXX tag 51a406116b7e2f01 gpl
loaded_at 2021-01-20T10:36:29-0300 uid 0
xlated 160B jited 98B memlock 4096B map_ids 367,366,369
btf_id 489
488: tracing name fexit_XXX tag a0cbac3880048978 gpl
loaded_at 2021-01-20T10:36:29-0300 uid 0
xlated 416B jited 242B memlock 4096B map_ids 366,367,368,369
btf_id 489
[root@five perf]#

[root@five perf]# bpftool map
3: array name iterator.rodata flags 0x480
key 4B value 98B max_entries 1 memlock 4096B
btf_id 6 frozen
366: perf_event_array name events flags 0x0
key 4B value 4B max_entries 24 memlock 4096B
367: percpu_array name fentry_readings flags 0x0
key 4B value 24B max_entries 1 memlock 4096B
368: percpu_array name accum_readings flags 0x0
key 4B value 24B max_entries 1 memlock 4096B
369: array name bpf_prog.rodata flags 0x480
key 4B value 4B max_entries 1 memlock 4096B
btf_id 489 frozen
[root@five perf]#

So sizeof(struct bpf_perf_event_value) == 24 and it is a per-cpu array, the
machine has 24 cpus, why is the kernel thinking it has more and end up zeroing
entries after the 24 cores? Some percpu map subtlety (or obvious thing ;-\) I'm
missing?

Checking lookups into per cpu maps in sample code now...

Breakpoint 2, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:230
230 values[num_cpu].counter = 111111;
(gdb) del 2
(gdb) b 233
Breakpoint 3 at 0x5cf270: file util/bpf_counter.c, line 233.
(gdb) c
Continuing.

Breakpoint 3, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:233
233 err = bpf_map_lookup_elem(reading_map_fd, &key, values);
(gdb) p values[num_cpu].running
$4 = 333333
(gdb) c
Continuing.

Breakpoint 1, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:239
239 for (cpu = 0; cpu < num_cpu; cpu++) {
(gdb) p values[num_cpu].running
$5 = 0
(gdb) list -10
224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
225 struct bpf_prog_profiler_bpf *skel = counter->skel;
226
227 assert(skel != NULL);
228 reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
229
230 values[num_cpu].counter = 111111;
231 values[num_cpu].enabled = 222222;
232 values[num_cpu].running = 333333;
233 err = bpf_map_lookup_elem(reading_map_fd, &key, values);
(gdb)
234 if (err) {
235 pr_err("failed to read value\n");
236 return err;
237 }
238
239 for (cpu = 0; cpu < num_cpu; cpu++) {
240 perf_counts(evsel->counts, cpu, 0)->val += values[cpu].counter;
241 perf_counts(evsel->counts, cpu, 0)->ena += values[cpu].enabled;
242 perf_counts(evsel->counts, cpu, 0)->run += values[cpu].running;
243 }
(gdb)

> (gdb) run stat -e cycles -b 244 -I 1000
> Starting program: /root/bin/perf stat -e cycles -b 244 -I 1000
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
>
> Breakpoint 1, read_bpf_map_counters () at builtin-stat.c:414
> 414 {
> (gdb) c
> Continuing.
>
> Breakpoint 2, bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
> 224 list_for_each_entry(counter, &evsel->bpf_counter_list, list) {
> (gdb) bt
> #0 bpf_program_profiler__read (evsel=0xce02c0) at util/bpf_counter.c:224
> #1 0x00000000005cf648 in bpf_counter__read (evsel=0xce02c0) at util/bpf_counter.c:300
> #2 0x0000000000434e46 in read_bpf_map_counters () at builtin-stat.c:419
> #3 0x0000000000434ecb in read_counters (rs=0x7fffffff9530) at builtin-stat.c:433
> #4 0x00000000004351f3 in process_interval () at builtin-stat.c:498
> #5 0x00000000004352b9 in handle_interval (interval=1000, times=0x7fffffff9618) at builtin-stat.c:513
> #6 0x0000000000435859 in dispatch_events (forks=false, timeout=0, interval=1000, times=0x7fffffff9618) at builtin-stat.c:676
> #7 0x00000000004365f1 in __run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:944
> #8 0x00000000004367e3 in run_perf_stat (argc=0, argv=0x7fffffffdb60, run_idx=0) at builtin-stat.c:1001
> #9 0x0000000000439e35 in cmd_stat (argc=0, argv=0x7fffffffdb60) at builtin-stat.c:2415
> #10 0x00000000004cf058 in run_builtin (p=0xaabe00 <commands+288>, argc=7, argv=0x7fffffffdb60) at perf.c:312
> #11 0x00000000004cf2c5 in handle_internal_command (argc=7, argv=0x7fffffffdb60) at perf.c:364
> #12 0x00000000004cf40c in run_argv (argcp=0x7fffffffd9ac, argv=0x7fffffffd9a0) at perf.c:408
> #13 0x00000000004cf7d8 in main (argc=7, argv=0x7fffffffdb60) at perf.c:538
> (gdb) n
> 225 struct bpf_prog_profiler_bpf *skel = counter->skel;
> (gdb) n
> 227 assert(skel != NULL);
> (gdb) n
> 228 reading_map_fd = bpf_map__fd(skel->maps.accum_readings);
> (gdb) p skel->maps.accum_readings
> $16 = (struct bpf_map *) 0xce7400
> (gdb) p *skel->maps.accum_readings
> $17 = {name = 0xce7580 "accum_readings", fd = 7, sec_idx = 6, sec_offset = 56, map_ifindex = 0, inner_map_fd = -1, def = {type = 6, key_size = 4, value_size = 24, max_entries = 1, map_flags = 0}, numa_node = 0, btf_var_idx = 2, btf_key_type_id = 0, btf_value_type_id = 0,
> btf_vmlinux_value_type_id = 0, priv = 0x0, clear_priv = 0x0, libbpf_type = LIBBPF_MAP_UNSPEC, mmaped = 0x0, st_ops = 0x0, inner_map = 0x0, init_slots = 0x0, init_slots_sz = 0, pin_path = 0x0, pinned = false, reused = false}
> (gdb) n
> 230 err = bpf_map_lookup_elem(reading_map_fd, &key, values);
> (gdb) n
> 231 if (err) {
> (gdb) p evsel
> $18 = (struct evsel *) 0x0
> (gdb)
>
> So at this point is stack corruption when doing the bpf lookup on this
> Ryzen system with 12 cores/24 threads:
>
> [root@five ~]# bpftool prog | tail -4
> 244: kprobe name hrtimer_nanosle tag 0e77bacaf4555f83 gpl
> loaded_at 2021-01-19T13:04:03-0300 uid 0
> xlated 80B jited 49B memlock 4096B
> btf_id 360
> [root@five ~]# perf stat -e cycles -b 244 -I 1000
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> # time counts unit events
> 1.246867591 37,371 cycles
> 2.247955619 33,652 cycles
> 3.249017879 38,983 cycles
> 4.250083954 49,546 cycles
> 5.251150241 57,319 cycles
> 6.252221407 44,170 cycles
> 7.253279040 34,691 cycles
> 8.254333356 48,195 cycles
> ^C 9.009242074 41,320 cycles
>
> [root@five ~]#
>
> This is with this hack:
>
> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
> index 8c977f038f497fc1..f227fd09d33794b5 100644
> --- a/tools/perf/util/bpf_counter.c
> +++ b/tools/perf/util/bpf_counter.c
> @@ -207,7 +207,7 @@ static int bpf_program_profiler__enable(struct evsel *evsel)
> static int bpf_program_profiler__read(struct evsel *evsel)
> {
> int num_cpu = evsel__nr_cpus(evsel);
> - struct bpf_perf_event_value values[num_cpu];
> + struct bpf_perf_event_value values[num_cpu * 2];
> struct bpf_counter *counter;
> int reading_map_fd;
> __u32 key = 0;
>
>
> --------------------------
>
> Now to check the proper fix...
>
> Works with all the events, etc, we need to remove that nnoying warning
> about .eh_frame (is this solved in the bpf tree?)
>
> [root@five ~]# perf stat -b 244 -I 1000
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> libbpf: elf: skipping unrecognized data section(9) .eh_frame
> libbpf: elf: skipping relo section(15) .rel.eh_frame for section(9) .eh_frame
> # time counts unit events
> 3.624452622 0.12 msec cpu-clock # 0.000 CPUs utilized
> 3.624452622 0 context-switches # 0.000 K/sec
> 3.624452622 0 cpu-migrations # 0.000 K/sec
> 3.624452622 0 page-faults # 0.000 K/sec
> 3.624452622 302,518 cycles # 2.468 GHz (83.41%)
> 3.624452622 26,101 stalled-cycles-frontend # 8.63% frontend cycles idle (81.05%)
> 3.624452622 96,327 stalled-cycles-backend # 31.84% backend cycles idle (80.58%)
> 3.624452622 269,500 instructions # 0.89 insn per cycle
> 3.624452622 # 0.36 stalled cycles per insn (81.81%)
> 3.624452622 55,003 branches # 448.667 M/sec (86.94%)
> 3.624452622 467 branch-misses # 0.85% of all branches (77.18%)
> 4.625690606 0.11 msec cpu-clock # 0.000 CPUs utilized
> 4.625690606 0 context-switches # 0.000 K/sec
> 4.625690606 0 cpu-migrations # 0.000 K/sec
> 4.625690606 0 page-faults # 0.000 K/sec
> 4.625690606 290,093 cycles # 2.535 GHz (79.46%)
> 4.625690606 28,274 stalled-cycles-frontend # 9.75% frontend cycles idle (79.47%)
> 4.625690606 101,772 stalled-cycles-backend # 35.08% backend cycles idle (84.05%)
> 4.625690606 261,867 instructions # 0.90 insn per cycle
> 4.625690606 # 0.39 stalled cycles per insn (86.38%)
> 4.625690606 55,334 branches # 483.473 M/sec (82.58%)
> 4.625690606 373 branch-misses # 0.67% of all branches (88.59%)
> 5.626686730 0.14 msec cpu-clock # 0.000 CPUs utilized
> 5.626686730 0 context-switches # 0.000 K/sec
> 5.626686730 0 cpu-migrations # 0.000 K/sec
> 5.626686730 0 page-faults # 0.000 K/sec
> 5.626686730 369,810 cycles # 2.574 GHz (79.77%)
> 5.626686730 32,212 stalled-cycles-frontend # 8.71% frontend cycles idle (76.00%)
> 5.626686730 122,778 stalled-cycles-backend # 33.20% backend cycles idle (81.49%)
> 5.626686730 275,699 instructions # 0.75 insn per cycle
> 5.626686730 # 0.45 stalled cycles per insn (78.39%)
> 5.626686730 58,135 branches # 404.716 M/sec (73.61%)
> 5.626686730 588 branch-misses # 1.01% of all branches (71.01%)
> 6.627688626 0.14 msec cpu-clock # 0.000 CPUs utilized
> 6.627688626 0 context-switches # 0.000 K/sec
> 6.627688626 0 cpu-migrations # 0.000 K/sec
> 6.627688626 0 page-faults # 0.000 K/sec
> 6.627688626 358,906 cycles # 2.543 GHz (81.90%)
> 6.627688626 26,411 stalled-cycles-frontend # 7.36% frontend cycles idle (80.87%)
> 6.627688626 129,526 stalled-cycles-backend # 36.09% backend cycles idle (77.72%)
> 6.627688626 278,756 instructions # 0.78 insn per cycle
> 6.627688626 # 0.46 stalled cycles per insn (70.31%)
> 6.627688626 56,514 branches # 400.448 M/sec (80.53%)
> 6.627688626 687 branch-misses # 1.22% of all branches (75.74%)
> 7.628688818 0.15 msec cpu-clock # 0.000 CPUs utilized
> 7.628688818 0 context-switches # 0.000 K/sec
> 7.628688818 0 cpu-migrations # 0.000 K/sec
> 7.628688818 0 page-faults # 0.000 K/sec
> 7.628688818 384,382 cycles # 2.574 GHz (84.08%)
> 7.628688818 27,148 stalled-cycles-frontend # 7.06% frontend cycles idle (81.82%)
> 7.628688818 128,434 stalled-cycles-backend # 33.41% backend cycles idle (82.29%)
> 7.628688818 270,693 instructions # 0.70 insn per cycle
> 7.628688818 # 0.47 stalled cycles per insn (83.58%)
> 7.628688818 57,277 branches # 383.614 M/sec (81.68%)
> 7.628688818 756 branch-misses # 1.32% of all branches (85.39%)
> ^C 7.934955676 0.05 msec cpu-clock # 0.000 CPUs utilized
> 7.934955676 0 context-switches # 0.000 K/sec
> 7.934955676 0 cpu-migrations # 0.000 K/sec
> 7.934955676 0 page-faults # 0.000 K/sec
> 7.934955676 126,813 cycles # 2.626 GHz (77.14%)
> 7.934955676 9,424 stalled-cycles-frontend # 7.43% frontend cycles idle (87.96%)
> 7.934955676 44,506 stalled-cycles-backend # 35.10% backend cycles idle (82.43%)
> 7.934955676 86,383 instructions # 0.68 insn per cycle
> 7.934955676 # 0.52 stalled cycles per insn (86.26%)
> 7.934955676 17,491 branches # 362.222 M/sec (87.10%)
> 7.934955676 247 branch-misses # 1.41% of all branches (76.62%)
>
> [root@five ~]#

--

- Arnaldo