2015-08-21 10:12:11

by Wang Nan

[permalink] [raw]
Subject: [GIT PULL 00/29] perf tools: filtering events using eBPF programs

Hi Arnaldo,

This is the first time I use my new kernel.org account to generate pull
request for you. I hope it work.

The following changes since commit acc60583a43fe93ff42d707a39d478c7dbe24028:

perf tools: Fix Intel PT timestamp handling (2015-08-20 16:45:01 -0300)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux tags/perf-ebpf-for-acme-20150821

for you to fetch changes up to 15756861299809103c00d9342352eb22bb84f20f:

perf tools: Support attach BPF program on uprobe events (2015-08-21 03:57:12 +0000)

----------------------------------------------------------------
perf bpf improvements:

- Improve 'perf test BPF'

Don't use the hacking way in previous series that make perf act test
target and run 'perf record' using system().

In this series, perf test BPF open the evlist and evsel, attach then
record, analysis result in its own context.

- Improve error reporting

Use strerror style error reporting. Only output debug information
in bpf-loader.c.

Signed-off-by: Wang Nan <[email protected]>

----------------------------------------------------------------
He Kuang (3):
perf tools: Move linux/filter.h to tools/include
perf tools: Introduce arch_get_reg_info() for x86
perf record: Support custom vmlinux path

Wang Nan (24):
perf probe: Try to use symbol table if searching debug info failed
perf tools: Make perf depend on libbpf
perf ebpf: Add the libbpf glue
perf tools: Enable passing bpf object file to --event
perf probe: Attach trace_probe_event with perf_probe_event
perf record, bpf: Parse and probe eBPF programs probe points
perf bpf: Collect 'struct perf_probe_event' for bpf_program
perf record: Load all eBPF object into kernel
perf tools: Add bpf_fd field to evsel and config it
perf tools: Attach eBPF program to perf event
perf tools: Suppress probing messages when probing by BPF loading
perf record: Add clang options for compiling BPF scripts
perf tools: Infrastructure for compiling scriptlets when passing '.c' to --event
perf tests: Enforce LLVM test for BPF test
perf test: Add 'perf test BPF'
bpf tools: Load a program with different instances using preprocessor
perf tools: Fix probe-event.h include
perf probe: Reset args and nargs for probe_trace_event when failure
perf tools: Add BPF_PROLOGUE config options for further patches
perf tools: Add prologue for BPF programs for fetching arguments
perf tools: Generate prologue for BPF programs
perf tools: Use same BPF program if arguments are identical
perf probe: Init symbol as kprobe
perf tools: Support attach BPF program on uprobe events

tools/build/Makefile.feature | 6 +-
tools/include/linux/filter.h | 237 +++++++++++
tools/lib/bpf/libbpf.c | 143 ++++++-
tools/lib/bpf/libbpf.h | 22 ++
tools/perf/MANIFEST | 4 +
tools/perf/Makefile.perf | 19 +-
tools/perf/arch/x86/Makefile | 1 +
tools/perf/arch/x86/util/Build | 2 +
tools/perf/arch/x86/util/dwarf-regs.c | 104 +++--
tools/perf/builtin-probe.c | 4 +-
tools/perf/builtin-record.c | 52 ++-
tools/perf/config/Makefile | 31 +-
tools/perf/tests/Build | 10 +-
tools/perf/tests/bpf-script-example.c | 44 +++
tools/perf/tests/bpf.c | 170 ++++++++
tools/perf/tests/builtin-test.c | 12 +
tools/perf/tests/llvm.c | 123 +++++-
tools/perf/tests/llvm.h | 15 +
tools/perf/tests/make | 4 +-
tools/perf/tests/tests.h | 3 +
tools/perf/util/Build | 2 +
tools/perf/util/bpf-loader.c | 723 ++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 92 +++++
tools/perf/util/bpf-prologue.c | 442 +++++++++++++++++++++
tools/perf/util/bpf-prologue.h | 34 ++
tools/perf/util/evlist.c | 41 ++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 17 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/include/dwarf-regs.h | 7 +
tools/perf/util/parse-events.c | 26 ++
tools/perf/util/parse-events.h | 4 +
tools/perf/util/parse-events.l | 6 +
tools/perf/util/parse-events.y | 29 +-
tools/perf/util/probe-event.c | 86 ++--
tools/perf/util/probe-event.h | 8 +-
tools/perf/util/probe-file.c | 5 +-
tools/perf/util/probe-finder.c | 4 +
38 files changed, 2432 insertions(+), 102 deletions(-)
create mode 100644 tools/include/linux/filter.h
create mode 100644 tools/perf/tests/bpf-script-example.c
create mode 100644 tools/perf/tests/bpf.c
create mode 100644 tools/perf/tests/llvm.h
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h
create mode 100644 tools/perf/util/bpf-prologue.c
create mode 100644 tools/perf/util/bpf-prologue.h


2015-08-21 10:10:20

by Wang Nan

[permalink] [raw]
Subject: [PATCH 01/29] perf probe: Try to use symbol table if searching debug info failed

A problem can occure in statically linked perf when vmlinux can be found:

# perf probe --add sys_epoll_pwait
probe-definition(0): sys_epoll_pwait
symbol:sys_epoll_pwait file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (7 entries long)
Using /lib/modules/4.2.0-rc1+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.2.0-rc1+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_epoll_pwait address found : ffffffff8122bd40
Matched function: SyS_epoll_pwait
Failed to get call frame on 0xffffffff8122bd40
An error occurred in debuginfo analysis (-2).
Error: Failed to add events. Reason: No such file or directory (Code: -2)

The reason is caused by libdw that, if libdw is statically linked, it can't
load libebl_{arch}.so reliable.

In this case it is still possible to get the address from /proc/kalksyms.
However, perf tries that only when libdw returns -EBADF.

This patch gives it another chance to utilize symbol table, even if libdw
return error code other than -EBADF.

After applying this patch:

# perf probe -nv --add sys_epoll_pwait
probe-definition(0): sys_epoll_pwait
symbol:sys_epoll_pwait file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (7 entries long)
Using /lib/modules/4.2.0-rc1+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.2.0-rc1+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_epoll_pwait address found : ffffffff8122bd40
Matched function: SyS_epoll_pwait
Failed to get call frame on 0xffffffff8122bd40
An error occurred in debuginfo analysis (-2).
Trying to use symbols.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Added new event:
Writing event: p:probe/sys_epoll_pwait _text+2276672
probe:sys_epoll_pwait (on sys_epoll_pwait)

You can now use it in all perf tools, such as:

perf record -e probe:sys_epoll_pwait -aR sleep 1

Although libdw returns an error (Failed to get call frame),
perf tries symbol table and finally gets correct address.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/probe-event.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index fe4941a..f07374b 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -705,9 +705,10 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
}
/* Error path : ntevs < 0 */
pr_debug("An error occurred in debuginfo analysis (%d).\n", ntevs);
- if (ntevs == -EBADF) {
- pr_warning("Warning: No dwarf info found in the vmlinux - "
- "please rebuild kernel with CONFIG_DEBUG_INFO=y.\n");
+ if (ntevs < 0) {
+ if (ntevs == -EBADF)
+ pr_warning("Warning: No dwarf info found in the vmlinux - "
+ "please rebuild kernel with CONFIG_DEBUG_INFO=y.\n");
if (!need_dwarf) {
pr_debug("Trying to use symbols.\n");
return 0;
--
2.1.0

2015-08-21 10:17:54

by Wang Nan

[permalink] [raw]
Subject: [PATCH 02/29] perf tools: Make perf depend on libbpf

By adding libbpf into perf's Makefile, this patch enables perf to build
libbpf during building if libelf is found and neither NO_LIBELF nor
NO_LIBBPF is set. The newly introduced code is similar to libapi and
libtraceevent building in Makefile.perf.

MANIFEST is also updated for 'make perf-*-src-pkg'.

Append make_no_libbpf to tools/perf/tests/make.

'bpf' feature check is appended into default FEATURE_TESTS and
FEATURE_DISPLAY, so perf will check API version of bpf in
/path/to/kernel/include/uapi/linux/bpf.h. Which should not fail except
when we are trying to port this code to an old kernel.

Error messages are also updated to notify users about the disable of BPF
support of 'perf record' if libelf is missed or BPF API check failed.

tools/lib/bpf is added into TAG_FOLDERS to allow us to navigate on
libbpf files when working on perf using tools/perf/tags.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/build/Makefile.feature | 6 ++++--
tools/perf/MANIFEST | 3 +++
tools/perf/Makefile.perf | 19 +++++++++++++++++--
tools/perf/config/Makefile | 19 ++++++++++++++++++-
tools/perf/tests/make | 4 +++-
5 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 2975632..5ec6b37 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -51,7 +51,8 @@ FEATURE_TESTS ?= \
timerfd \
libdw-dwarf-unwind \
zlib \
- lzma
+ lzma \
+ bpf

FEATURE_DISPLAY ?= \
dwarf \
@@ -67,7 +68,8 @@ FEATURE_DISPLAY ?= \
libunwind \
libdw-dwarf-unwind \
zlib \
- lzma
+ lzma \
+ bpf

# Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
# If in the future we need per-feature checks/flags for features not
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index f31f15a..3db15cf 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -17,6 +17,7 @@ tools/build
tools/arch/x86/include/asm/atomic.h
tools/arch/x86/include/asm/rmwcc.h
tools/lib/traceevent
+tools/lib/bpf
tools/lib/api
tools/lib/bpf
tools/lib/hweight.c
@@ -68,6 +69,8 @@ arch/*/lib/memset*.S
include/linux/poison.h
include/linux/hw_breakpoint.h
include/uapi/linux/perf_event.h
+include/uapi/linux/bpf.h
+include/uapi/linux/bpf_common.h
include/uapi/linux/const.h
include/uapi/linux/swab.h
include/uapi/linux/hw_breakpoint.h
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index d9863cb..a6a789e 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -145,6 +145,7 @@ AWK = awk

LIB_DIR = $(srctree)/tools/lib/api/
TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
+BPF_DIR = $(srctree)/tools/lib/bpf/

# include config/Makefile by default and rule out
# non-config cases
@@ -180,6 +181,7 @@ strip-libs = $(filter-out -l%,$(1))

ifneq ($(OUTPUT),)
TE_PATH=$(OUTPUT)
+ BPF_PATH=$(OUTPUT)
ifneq ($(subdir),)
LIB_PATH=$(OUTPUT)/../lib/api/
else
@@ -188,6 +190,7 @@ endif
else
TE_PATH=$(TRACE_EVENT_DIR)
LIB_PATH=$(LIB_DIR)
+ BPF_PATH=$(BPF_DIR)
endif

LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
@@ -199,6 +202,8 @@ LIBTRACEEVENT_DYNAMIC_LIST_LDFLAGS = -Xlinker --dynamic-list=$(LIBTRACEEVENT_DYN
LIBAPI = $(LIB_PATH)libapi.a
export LIBAPI

+LIBBPF = $(BPF_PATH)libbpf.a
+
# python extension build directories
PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/
PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/
@@ -251,6 +256,9 @@ export PERL_PATH
LIB_FILE=$(OUTPUT)libperf.a

PERFLIBS = $(LIB_FILE) $(LIBAPI) $(LIBTRACEEVENT)
+ifndef NO_LIBBPF
+ PERFLIBS += $(LIBBPF)
+endif

# We choose to avoid "if .. else if .. else .. endif endif"
# because maintaining the nesting to match is a pain. If
@@ -420,6 +428,13 @@ $(LIBAPI)-clean:
$(call QUIET_CLEAN, libapi)
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null

+$(LIBBPF): FORCE
+ $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a
+
+$(LIBBPF)-clean:
+ $(call QUIET_CLEAN, libbpf)
+ $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
+
help:
@echo 'Perf make targets:'
@echo ' doc - make *all* documentation (see below)'
@@ -459,7 +474,7 @@ INSTALL_DOC_TARGETS += quick-install-doc quick-install-man quick-install-html
$(DOC_TARGETS):
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:doc=all)

-TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol
+TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol ../lib/bpf
TAG_FILES= ../../include/uapi/linux/perf_event.h

TAGS:
@@ -567,7 +582,7 @@ config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null

-clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
+clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean config-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 827557f..38a4144 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -106,6 +106,7 @@ ifdef LIBBABELTRACE
FEATURE_CHECK_LDFLAGS-libbabeltrace := $(LIBBABELTRACE_LDFLAGS) -lbabeltrace-ctf
endif

+FEATURE_CHECK_CFLAGS-bpf = -I. -I$(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
# include ARCH specific config
-include $(src-perf)/arch/$(ARCH)/Makefile

@@ -233,6 +234,7 @@ ifdef NO_LIBELF
NO_DEMANGLE := 1
NO_LIBUNWIND := 1
NO_LIBDW_DWARF_UNWIND := 1
+ NO_LIBBPF := 1
else
ifeq ($(feature-libelf), 0)
ifeq ($(feature-glibc), 1)
@@ -242,13 +244,14 @@ else
LIBC_SUPPORT := 1
endif
ifeq ($(LIBC_SUPPORT),1)
- msg := $(warning No libelf found, disables 'probe' tool, please install elfutils-libelf-devel/libelf-dev);
+ msg := $(warning No libelf found, disables 'probe' tool and BPF support in 'perf record', please install elfutils-libelf-devel/libelf-dev);

NO_LIBELF := 1
NO_DWARF := 1
NO_DEMANGLE := 1
NO_LIBUNWIND := 1
NO_LIBDW_DWARF_UNWIND := 1
+ NO_LIBBPF := 1
else
ifneq ($(filter s% -static%,$(LDFLAGS),),)
msg := $(error No static glibc found, please install glibc-static);
@@ -305,6 +308,13 @@ ifndef NO_LIBELF
$(call detected,CONFIG_DWARF)
endif # PERF_HAVE_DWARF_REGS
endif # NO_DWARF
+
+ ifndef NO_LIBBPF
+ ifeq ($(feature-bpf), 1)
+ CFLAGS += -DHAVE_LIBBPF_SUPPORT
+ $(call detected,CONFIG_LIBBPF)
+ endif
+ endif # NO_LIBBPF
endif # NO_LIBELF

ifeq ($(ARCH),powerpc)
@@ -320,6 +330,13 @@ ifndef NO_LIBUNWIND
endif
endif

+ifndef NO_LIBBPF
+ ifneq ($(feature-bpf), 1)
+ msg := $(warning BPF API too old. Please install recent kernel headers. BPF support in 'perf record' is disabled.)
+ NO_LIBBPF := 1
+ endif
+endif
+
dwarf-post-unwind := 1
dwarf-post-unwind-text := BUG

diff --git a/tools/perf/tests/make b/tools/perf/tests/make
index ba31c4b..2cbd0c6 100644
--- a/tools/perf/tests/make
+++ b/tools/perf/tests/make
@@ -44,6 +44,7 @@ make_no_libnuma := NO_LIBNUMA=1
make_no_libaudit := NO_LIBAUDIT=1
make_no_libbionic := NO_LIBBIONIC=1
make_no_auxtrace := NO_AUXTRACE=1
+make_no_libbpf := NO_LIBBPF=1
make_tags := tags
make_cscope := cscope
make_help := help
@@ -66,7 +67,7 @@ make_static := LDFLAGS=-static
make_minimal := NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1
make_minimal += NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1
make_minimal += NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1
-make_minimal += NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1
+make_minimal += NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1

# $(run) contains all available tests
run := make_pure
@@ -94,6 +95,7 @@ run += make_no_libnuma
run += make_no_libaudit
run += make_no_libbionic
run += make_no_auxtrace
+run += make_no_libbpf
run += make_help
run += make_doc
run += make_perf_o
--
2.1.0

2015-08-21 10:10:04

by Wang Nan

[permalink] [raw]
Subject: [PATCH 03/29] perf ebpf: Add the libbpf glue

The 'bpf-loader.[ch]' files are introduced in this patch. Which will be
the interface between perf and libbpf. bpf__prepare_load() resides in
bpf-loader.c. Dummy functions should be used because bpf-loader.c is
available only when CONFIG_LIBBPF is on.

Functions in bpf-loader.c should not report error explicitly. Instead,
strerror style error reporting should be used.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/bpf-loader.c | 92 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 47 ++++++++++++++++++++++
2 files changed, 139 insertions(+)
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
new file mode 100644
index 0000000..88531ea
--- /dev/null
+++ b/tools/perf/util/bpf-loader.c
@@ -0,0 +1,92 @@
+/*
+ * bpf-loader.c
+ *
+ * Copyright (C) 2015 Wang Nan <[email protected]>
+ * Copyright (C) 2015 Huawei Inc.
+ */
+
+#include <bpf/libbpf.h>
+#include "perf.h"
+#include "debug.h"
+#include "bpf-loader.h"
+
+#define DEFINE_PRINT_FN(name, level) \
+static int libbpf_##name(const char *fmt, ...) \
+{ \
+ va_list args; \
+ int ret; \
+ \
+ va_start(args, fmt); \
+ ret = veprintf(level, verbose, pr_fmt(fmt), args);\
+ va_end(args); \
+ return ret; \
+}
+
+DEFINE_PRINT_FN(warning, 0)
+DEFINE_PRINT_FN(info, 0)
+DEFINE_PRINT_FN(debug, 1)
+
+static bool libbpf_initialized;
+
+int bpf__prepare_load(const char *filename)
+{
+ struct bpf_object *obj;
+
+ if (!libbpf_initialized)
+ libbpf_set_print(libbpf_warning,
+ libbpf_info,
+ libbpf_debug);
+
+ obj = bpf_object__open(filename);
+ if (!obj) {
+ pr_debug("bpf: failed to load %s\n", filename);
+ return -EINVAL;
+ }
+
+ /*
+ * Throw object pointer away: it will be retrived using
+ * bpf_objects iterater.
+ */
+
+ return 0;
+}
+
+void bpf__clear(void)
+{
+ struct bpf_object *obj, *tmp;
+
+ bpf_object__for_each_safe(obj, tmp)
+ bpf_object__close(obj);
+}
+
+#define bpf__strerror_head(err, buf, size) \
+ char sbuf[STRERR_BUFSIZE], *emsg;\
+ if (!size)\
+ return 0;\
+ if (err < 0)\
+ err = -err;\
+ emsg = strerror_r(err, sbuf, sizeof(sbuf));\
+ switch (err) {\
+ default:\
+ scnprintf(buf, size, "%s", emsg);\
+ break;
+
+#define bpf__strerror_entry(val, fmt...)\
+ case val: {\
+ scnprintf(buf, size, fmt);\
+ break;\
+ }
+
+#define bpf__strerror_end(buf, size)\
+ }\
+ buf[size - 1] = '\0';
+
+int bpf__strerror_prepare_load(const char *filename, int err,
+ char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(EINVAL, "%s: BPF object file '%s' is invalid",
+ emsg, filename)
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
new file mode 100644
index 0000000..12be630
--- /dev/null
+++ b/tools/perf/util/bpf-loader.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ */
+#ifndef __BPF_LOADER_H
+#define __BPF_LOADER_H
+
+#include <linux/compiler.h>
+#include <string.h>
+#include "debug.h"
+
+#ifdef HAVE_LIBBPF_SUPPORT
+int bpf__prepare_load(const char *filename);
+int bpf__strerror_prepare_load(const char *filename, int err,
+ char *buf, size_t size);
+
+void bpf__clear(void);
+#else
+static inline int bpf__prepare_load(const char *filename __maybe_unused)
+{
+ pr_debug("ERROR: eBPF object loading is disabled during compiling.\n");
+ return -1;
+}
+
+static inline void bpf__clear(void) { }
+
+static inline int
+__bpf_strerror(char *buf, size_t size)
+{
+ if (!size)
+ return 0;
+ strncpy(buf,
+ "ERROR: eBPF object loading is disabled during compiling.\n",
+ size);
+ buf[size - 1] = '\0';
+ return 0;
+}
+
+static inline int
+bpf__strerror_prepare_load(const char *filename __maybe_unused,
+ int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
+#endif
+#endif
--
2.1.0

2015-08-21 10:10:02

by Wang Nan

[permalink] [raw]
Subject: [PATCH 04/29] perf tools: Enable passing bpf object file to --event

By introducing new rules in tools/perf/util/parse-events.[ly], this
patch enables 'perf record --event bpf_file.o' to select events by an
eBPF object file. It calls parse_events_load_bpf() to load that file,
which uses bpf__prepare_load() and finally calls bpf_object__open() for
the object files.

Instead of introducing evsel to evlist during parsing, events selected
by eBPF object files are appended separately. The reason is:

1. During parsing, the probing points have not been initialized.

2. Currently we are unable to call add_perf_probe_events() twice,
therefore we have to wait until all such events are collected,
then probe all points by one call.

The real probing and selecting is reside in following patches.

Commiter note:

Testing if the event parsing changes indeed call the BPF loading
routines:

[root@felicio ~]# ls -la foo.o
ls: cannot access foo.o: No such file or directory
[root@felicio ~]# perf record --event foo.o sleep
libbpf: failed to open foo.o: No such file or directory
bpf: failed to load foo.o
invalid or unsupported event: 'foo.o'
Run 'perf list' for a list of valid events

usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]

-e, --event <event> event selector. use 'perf list' to list available events
[root@felicio ~]#

Yes, it does this time around.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[ The veprintf() and bpf loader parts were split from this one ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/Build | 1 +
tools/perf/util/parse-events.c | 26 ++++++++++++++++++++++++++
tools/perf/util/parse-events.h | 3 +++
tools/perf/util/parse-events.l | 3 +++
tools/perf/util/parse-events.y | 18 +++++++++++++++++-
5 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c20473d..4d944d4 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -82,6 +82,7 @@ libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
libperf-$(CONFIG_AUXTRACE) += intel-pt.o
libperf-y += parse-branch-options.o

+libperf-$(CONFIG_LIBBPF) += bpf-loader.o
libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-file.o
libperf-$(CONFIG_LIBELF) += probe-event.o
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d826e6f..9eaceb2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -19,6 +19,7 @@
#include "thread_map.h"
#include "cpumap.h"
#include "asm/bug.h"
+#include "bpf-loader.h"

#define MAX_NAME_LEN 100

@@ -481,6 +482,31 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
return add_tracepoint_event(list, idx, sys, event);
}

+int parse_events_load_bpf(struct parse_events_evlist *data,
+ struct list_head *list __maybe_unused,
+ char *bpf_file_name)
+{
+ int err;
+ char errbuf[BUFSIZ];
+
+ /*
+ * Currently don't link any event to list. BPF object files
+ * should be saved to a seprated list and processed together.
+ *
+ * Things could be changed if we solve perf probe reentering
+ * problem. After that probe events file by file is possible.
+ * However, probing cost is still need to be considered.
+ */
+ err = bpf__prepare_load(bpf_file_name);
+ if (err) {
+ bpf__strerror_prepare_load(bpf_file_name, err,
+ errbuf, sizeof(errbuf));
+ data->error->str = strdup(errbuf);
+ data->error->help = strdup("(add -v to see detail)");
+ }
+ return err;
+}
+
static int
parse_breakpoint_type(const char *type, struct perf_event_attr *attr)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index a09b0e2..3652387 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -119,6 +119,9 @@ int parse_events__modifier_group(struct list_head *list, char *event_mod);
int parse_events_name(struct list_head *list, char *name);
int parse_events_add_tracepoint(struct list_head *list, int *idx,
char *sys, char *event);
+int parse_events_load_bpf(struct parse_events_evlist *data,
+ struct list_head *list,
+ char *bpf_file_name);
int parse_events_add_numeric(struct parse_events_evlist *data,
struct list_head *list,
u32 type, u64 config,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 936d566..22e8f93 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -115,6 +115,7 @@ do { \
group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
event [^,{}/]+
+bpf_object .*\.(o|bpf)

num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
@@ -159,6 +160,7 @@ modifier_bp [rwx]{1,3}
}

{event_pmu} |
+{bpf_object} |
{event} {
BEGIN(INITIAL);
REWIND(1);
@@ -264,6 +266,7 @@ r{num_raw_hex} { return raw(yyscanner); }
{num_hex} { return value(yyscanner, 16); }

{modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
+{bpf_object} { return str(yyscanner, PE_BPF_OBJECT); }
{name} { return pmu_str_check(yyscanner); }
"/" { BEGIN(config); return '/'; }
- { return '-'; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 591905a..3ee3a32 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -42,6 +42,7 @@ static inc_group_count(struct list_head *list,
%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
%token PE_EVENT_NAME
%token PE_NAME
+%token PE_BPF_OBJECT
%token PE_MODIFIER_EVENT PE_MODIFIER_BP
%token PE_NAME_CACHE_TYPE PE_NAME_CACHE_OP_RESULT
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
@@ -53,6 +54,7 @@ static inc_group_count(struct list_head *list,
%type <num> PE_RAW
%type <num> PE_TERM
%type <str> PE_NAME
+%type <str> PE_BPF_OBJECT
%type <str> PE_NAME_CACHE_TYPE
%type <str> PE_NAME_CACHE_OP_RESULT
%type <str> PE_MODIFIER_EVENT
@@ -69,6 +71,7 @@ static inc_group_count(struct list_head *list,
%type <head> event_legacy_tracepoint
%type <head> event_legacy_numeric
%type <head> event_legacy_raw
+%type <head> event_bpf_file
%type <head> event_def
%type <head> event_mod
%type <head> event_name
@@ -198,7 +201,8 @@ event_def: event_pmu |
event_legacy_mem |
event_legacy_tracepoint sep_dc |
event_legacy_numeric sep_dc |
- event_legacy_raw sep_dc
+ event_legacy_raw sep_dc |
+ event_bpf_file

event_pmu:
PE_NAME '/' event_config '/'
@@ -420,6 +424,18 @@ PE_RAW
$$ = list;
}

+event_bpf_file:
+PE_BPF_OBJECT
+{
+ struct parse_events_evlist *data = _data;
+ struct list_head *list;
+
+ ALLOC_LIST(list);
+ ABORT_ON(parse_events_load_bpf(data, list, $1));
+ $$ = list;
+}
+
+
start_terms: event_config
{
struct parse_events_terms *data = _data;
--
2.1.0

2015-08-21 10:10:12

by Wang Nan

[permalink] [raw]
Subject: [PATCH 05/29] perf probe: Attach trace_probe_event with perf_probe_event

This patch drops struct __event_package structure. Instead, it adds
trace_probe_event into 'struct perf_probe_event'.

trace_probe_event information gives further patches a chance to access
actual probe points and actual arguments. Using them, bpf_loader will
be able to attach one bpf program to different probing points of a
inline functions (which has multiple probing points) and glob
functions. Moreover, by reading arguments information, bpf code for
reading those arguments can be generated.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-probe.c | 4 ++-
tools/perf/util/probe-event.c | 60 +++++++++++++++++++++----------------------
tools/perf/util/probe-event.h | 6 ++++-
3 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index b81cec3..826d452 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -496,7 +496,9 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
usage_with_options(probe_usage, options);
}

- ret = add_perf_probe_events(params.events, params.nevents);
+ ret = add_perf_probe_events(params.events,
+ params.nevents,
+ true);
if (ret < 0) {
pr_err_with_code(" Error: Failed to add events.", ret);
return ret;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index f07374b..d5368ac 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1926,6 +1926,9 @@ void clear_perf_probe_event(struct perf_probe_event *pev)
struct perf_probe_arg_field *field, *next;
int i;

+ if (pev->ntevs)
+ cleanup_perf_probe_event(pev);
+
free(pev->event);
free(pev->group);
free(pev->target);
@@ -2602,61 +2605,58 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
return find_probe_trace_events_from_map(pev, tevs);
}

-struct __event_package {
- struct perf_probe_event *pev;
- struct probe_trace_event *tevs;
- int ntevs;
-};
-
-int add_perf_probe_events(struct perf_probe_event *pevs, int npevs)
+int cleanup_perf_probe_event(struct perf_probe_event *pev)
{
- int i, j, ret;
- struct __event_package *pkgs;
+ int i;

- ret = 0;
- pkgs = zalloc(sizeof(struct __event_package) * npevs);
+ if (!pev || !pev->ntevs)
+ return 0;

- if (pkgs == NULL)
- return -ENOMEM;
+ for (i = 0; i < pev->ntevs; i++)
+ clear_probe_trace_event(&pev->tevs[i]);
+
+ zfree(&pev->tevs);
+ pev->ntevs = 0;
+ return 0;
+}
+
+int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
+ bool cleanup)
+{
+ int i, ret;

ret = init_symbol_maps(pevs->uprobes);
- if (ret < 0) {
- free(pkgs);
+ if (ret < 0)
return ret;
- }

/* Loop 1: convert all events */
for (i = 0; i < npevs; i++) {
- pkgs[i].pev = &pevs[i];
/* Init kprobe blacklist if needed */
- if (!pkgs[i].pev->uprobes)
+ if (pevs[i].uprobes)
kprobe_blacklist__init();
/* Convert with or without debuginfo */
- ret = convert_to_probe_trace_events(pkgs[i].pev,
- &pkgs[i].tevs);
- if (ret < 0)
+ ret = convert_to_probe_trace_events(&pevs[i], &pevs[i].tevs);
+ if (ret < 0) {
+ cleanup = true;
goto end;
- pkgs[i].ntevs = ret;
+ }
+ pevs[i].ntevs = ret;
}
/* This just release blacklist only if allocated */
kprobe_blacklist__release();

/* Loop 2: add all events */
for (i = 0; i < npevs; i++) {
- ret = __add_probe_trace_events(pkgs[i].pev, pkgs[i].tevs,
- pkgs[i].ntevs,
+ ret = __add_probe_trace_events(&pevs[i], pevs[i].tevs,
+ pevs[i].ntevs,
probe_conf.force_add);
if (ret < 0)
break;
}
end:
/* Loop 3: cleanup and free trace events */
- for (i = 0; i < npevs; i++) {
- for (j = 0; j < pkgs[i].ntevs; j++)
- clear_probe_trace_event(&pkgs[i].tevs[j]);
- zfree(&pkgs[i].tevs);
- }
- free(pkgs);
+ for (i = 0; cleanup && (i < npevs); i++)
+ cleanup_perf_probe_event(&pevs[i]);
exit_symbol_maps();

return ret;
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 83ee95e..5db0f20 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -86,6 +86,8 @@ struct perf_probe_event {
bool uprobes; /* Uprobe event flag */
char *target; /* Target binary */
struct perf_probe_arg *args; /* Arguments */
+ struct probe_trace_event *tevs;
+ int ntevs;
};

/* Line range */
@@ -136,8 +138,10 @@ extern void line_range__clear(struct line_range *lr);
/* Initialize line range */
extern int line_range__init(struct line_range *lr);

-extern int add_perf_probe_events(struct perf_probe_event *pevs, int npevs);
+extern int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
+ bool cleanup);
extern int del_perf_probe_events(struct strfilter *filter);
+extern int cleanup_perf_probe_event(struct perf_probe_event *pev);
extern int show_perf_probe_events(struct strfilter *filter);
extern int show_line_range(struct line_range *lr, const char *module,
bool user);
--
2.1.0

2015-08-21 10:17:53

by Wang Nan

[permalink] [raw]
Subject: [PATCH 06/29] perf record, bpf: Parse and probe eBPF programs probe points

This patch introduces bpf__{un,}probe() functions to enable callers to
create kprobe points based on section names of BPF programs. It parses
the section names of each eBPF program and creates corresponding 'struct
perf_probe_event' structures. The parse_perf_probe_command() function is
used to do the main parsing work.

Parsing result is stored into an array to satisify
add_perf_probe_events(). It accepts an array of 'struct perf_probe_event'
and do all the work in one call.

Define PERF_BPF_PROBE_GROUP as "perf_bpf_probe", which will be used as
the group name of all eBPF probing points.

probe_conf.max_probes is set to MAX_PROBES to support glob matching.

Before ending of bpf__probe(), data in each 'struct perf_probe_event' is
cleaned. Things will be changed by following patches because they need
'struct probe_trace_event' in them,

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
Link: http://lkml.kernel.org/n/[email protected]
[Merged by two patches]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 20 ++++++-
tools/perf/util/bpf-loader.c | 133 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 13 +++++
3 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a660022..4e9c022 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -29,6 +29,7 @@
#include "util/data.h"
#include "util/auxtrace.h"
#include "util/parse-branch-options.h"
+#include "util/bpf-loader.h"

#include <unistd.h>
#include <sched.h>
@@ -1137,7 +1138,23 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
if (err)
return err;

- err = -ENOMEM;
+ /*
+ * bpf__probe must be called before symbol__init() because we
+ * need init_symbol_maps. If called after symbol__init,
+ * symbol_conf.sort_by_name won't take effect.
+ *
+ * bpf__unprobe() is safe even if bpf__probe() failed, and it
+ * also calls symbol__init. Therefore, goto out_symbol_exit
+ * is safe when probe failed.
+ */
+ err = bpf__probe();
+ if (err) {
+ bpf__strerror_probe(err, errbuf, sizeof(errbuf));
+
+ pr_err("Probing at events in BPF object failed.\n");
+ pr_err("\t%s\n", errbuf);
+ goto out_symbol_exit;
+ }

symbol__init(NULL);

@@ -1198,6 +1215,7 @@ out_symbol_exit:
perf_evlist__delete(rec->evlist);
symbol__exit();
auxtrace_record__free(rec->itr);
+ bpf__unprobe();
return err;
}

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 88531ea..435f52e 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -9,6 +9,8 @@
#include "perf.h"
#include "debug.h"
#include "bpf-loader.h"
+#include "probe-event.h"
+#include "probe-finder.h"

#define DEFINE_PRINT_FN(name, level) \
static int libbpf_##name(const char *fmt, ...) \
@@ -28,6 +30,58 @@ DEFINE_PRINT_FN(debug, 1)

static bool libbpf_initialized;

+static int
+config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
+{
+ const char *config_str;
+ int err;
+
+ config_str = bpf_program__title(prog, false);
+ if (!config_str) {
+ pr_debug("bpf: unable to get title for program\n");
+ return -EINVAL;
+ }
+
+ pr_debug("bpf: config program '%s'\n", config_str);
+ err = parse_perf_probe_command(config_str, pev);
+ if (err < 0) {
+ pr_debug("bpf: '%s' is not a valid config string\n",
+ config_str);
+ /* parse failed, don't need clear pev. */
+ return -EINVAL;
+ }
+
+ if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
+ pr_debug("bpf: '%s': group for event is set and not '%s'.\n",
+ config_str, PERF_BPF_PROBE_GROUP);
+ err = -EINVAL;
+ goto errout;
+ } else if (!pev->group)
+ pev->group = strdup(PERF_BPF_PROBE_GROUP);
+
+ if (!pev->group) {
+ pr_debug("bpf: strdup failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ if (!pev->event) {
+ pr_debug("bpf: '%s': event name is missing\n",
+ config_str);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ pr_debug("bpf: config '%s' is ok\n", config_str);
+
+ return 0;
+
+errout:
+ if (pev)
+ clear_perf_probe_event(pev);
+ return err;
+}
+
int bpf__prepare_load(const char *filename)
{
struct bpf_object *obj;
@@ -59,6 +113,74 @@ void bpf__clear(void)
bpf_object__close(obj);
}

+static bool is_probed;
+
+int bpf__unprobe(void)
+{
+ struct strfilter *delfilter;
+ int ret;
+
+ if (!is_probed)
+ return 0;
+
+ delfilter = strfilter__new(PERF_BPF_PROBE_GROUP ":*", NULL);
+ if (!delfilter) {
+ pr_debug("Failed to create delfilter when unprobing\n");
+ return -ENOMEM;
+ }
+
+ ret = del_perf_probe_events(delfilter);
+ strfilter__delete(delfilter);
+ if (ret < 0 && is_probed)
+ pr_debug("Error: failed to delete events: %s\n",
+ strerror(-ret));
+ else
+ is_probed = false;
+ return ret < 0 ? ret : 0;
+}
+
+int bpf__probe(void)
+{
+ int err, nr_events = 0;
+ struct bpf_object *obj, *tmp;
+ struct bpf_program *prog;
+ struct perf_probe_event *pevs;
+
+ pevs = calloc(MAX_PROBES, sizeof(pevs[0]));
+ if (!pevs)
+ return -ENOMEM;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ bpf_object__for_each_program(prog, obj) {
+ err = config_bpf_program(prog, &pevs[nr_events++]);
+ if (err < 0)
+ goto out;
+
+ if (nr_events >= MAX_PROBES) {
+ pr_debug("Too many (more than %d) events\n",
+ MAX_PROBES);
+ err = -ERANGE;
+ goto out;
+ };
+ }
+ }
+
+ probe_conf.max_probes = MAX_PROBES;
+ /* Let add_perf_probe_events generates probe_trace_event (tevs) */
+ err = add_perf_probe_events(pevs, nr_events, false);
+
+ /* add_perf_probe_events return negative when fail */
+ if (err < 0) {
+ pr_debug("bpf probe: failed to probe events\n");
+ } else
+ is_probed = true;
+out:
+ while (nr_events > 0)
+ clear_perf_probe_event(&pevs[--nr_events]);
+ free(pevs);
+ return err < 0 ? err : 0;
+}
+
#define bpf__strerror_head(err, buf, size) \
char sbuf[STRERR_BUFSIZE], *emsg;\
if (!size)\
@@ -90,3 +212,14 @@ int bpf__strerror_prepare_load(const char *filename, int err,
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_probe(int err, char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(ERANGE, "Too many (more than %d) events",
+ MAX_PROBES);
+ bpf__strerror_entry(ENOENT, "Selected kprobe point doesn't exist.");
+ bpf__strerror_entry(EEXIST, "Selected kprobe point already exist, try perf probe -d '*'.");
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 12be630..6b09a85 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -9,10 +9,15 @@
#include <string.h>
#include "debug.h"

+#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
+
#ifdef HAVE_LIBBPF_SUPPORT
int bpf__prepare_load(const char *filename);
int bpf__strerror_prepare_load(const char *filename, int err,
char *buf, size_t size);
+int bpf__probe(void);
+int bpf__unprobe(void);
+int bpf__strerror_probe(int err, char *buf, size_t size);

void bpf__clear(void);
#else
@@ -22,6 +27,8 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused)
return -1;
}

+static inline int bpf__probe(void) { return 0; }
+static inline int bpf__unprobe(void) { return 0; }
static inline void bpf__clear(void) { }

static inline int
@@ -43,5 +50,11 @@ bpf__strerror_prepare_load(const char *filename __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int bpf__strerror_probe(int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
--
2.1.0

2015-08-21 10:17:57

by Wang Nan

[permalink] [raw]
Subject: [PATCH 07/29] perf bpf: Collect 'struct perf_probe_event' for bpf_program

This patch utilizes bpf_program__set_private(), binding perf_probe_event
with bpf program by private field.

Saving those information so 'perf record' knows which kprobe point a program
should be attached.

Since data in 'struct perf_probe_event' is build by 2 stages, tev_ready
is used to mark whether the information (especially tevs) in 'struct
perf_probe_event' is valid or not. It is false at first, and set to true
by sync_bpf_program_pev(), which copy all pointers in original pev into
a program specific memory region. sync_bpf_program_pev() is called after
add_perf_probe_events() to make sure ready of data.

Remove code which clean 'struct perf_probe_event' after bpf__probe()
because pointers in pevs are copied to program's private field, calling
clear_perf_probe_event() becomes unsafe.

Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[Splitted from a larger patch]
---
tools/perf/util/bpf-loader.c | 90 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 88 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 435f52e..ae23f6f 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -30,9 +30,35 @@ DEFINE_PRINT_FN(debug, 1)

static bool libbpf_initialized;

+struct bpf_prog_priv {
+ /*
+ * If pev_ready is false, ppev pointes to a local memory which
+ * is only valid inside bpf__probe().
+ * pev is valid only when pev_ready.
+ */
+ bool pev_ready;
+ union {
+ struct perf_probe_event *ppev;
+ struct perf_probe_event pev;
+ };
+};
+
+static void
+bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
+ void *_priv)
+{
+ struct bpf_prog_priv *priv = _priv;
+
+ /* check if pev is initialized */
+ if (priv && priv->pev_ready)
+ clear_perf_probe_event(&priv->pev);
+ free(priv);
+}
+
static int
config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
{
+ struct bpf_prog_priv *priv = NULL;
const char *config_str;
int err;

@@ -74,14 +100,58 @@ config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)

pr_debug("bpf: config '%s' is ok\n", config_str);

+ priv = calloc(1, sizeof(*priv));
+ if (!priv) {
+ pr_debug("bpf: failed to alloc memory\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ /*
+ * At this very early stage, tevs inside pev are not ready.
+ * It becomes usable after add_perf_probe_events() is called.
+ * set pev_ready to false so further access read priv->ppev
+ * only.
+ */
+ priv->pev_ready = false;
+ priv->ppev = pev;
+
+ err = bpf_program__set_private(prog, priv,
+ bpf_prog_priv__clear);
+ if (err) {
+ pr_debug("bpf: set program private failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
return 0;

errout:
if (pev)
clear_perf_probe_event(pev);
+ if (priv)
+ free(priv);
return err;
}

+static int
+sync_bpf_program_pev(struct bpf_program *prog)
+{
+ int err;
+ struct bpf_prog_priv *priv;
+ struct perf_probe_event *ppev;
+
+ err = bpf_program__get_private(prog, (void **)&priv);
+ if (err || !priv || priv->pev_ready) {
+ pr_debug("Internal error: sync_bpf_program_pev\n");
+ return -EINVAL;
+ }
+
+ ppev = priv->ppev;
+ memcpy(&priv->pev, ppev, sizeof(*ppev));
+ priv->pev_ready = true;
+ return 0;
+}
+
int bpf__prepare_load(const char *filename)
{
struct bpf_object *obj;
@@ -172,11 +242,27 @@ int bpf__probe(void)
/* add_perf_probe_events return negative when fail */
if (err < 0) {
pr_debug("bpf probe: failed to probe events\n");
+ goto out;
} else
is_probed = true;
+
+ /*
+ * After add_perf_probe_events, 'struct perf_probe_event' is ready.
+ * Until now copying program's priv->pev field and freeing
+ * the big array allocated before become safe.
+ */
+ bpf_object__for_each_safe(obj, tmp) {
+ bpf_object__for_each_program(prog, obj) {
+ err = sync_bpf_program_pev(prog);
+ if (err)
+ goto out;
+ }
+ }
out:
- while (nr_events > 0)
- clear_perf_probe_event(&pevs[--nr_events]);
+ /*
+ * Don't call clear_perf_probe_event() for entries of pevs:
+ * they are used by prog's private field.
+ */
free(pevs);
return err < 0 ? err : 0;
}
--
2.1.0

2015-08-21 10:23:26

by Wang Nan

[permalink] [raw]
Subject: [PATCH 08/29] perf record: Load all eBPF object into kernel

This patch utilizes bpf_object__load() provided by libbpf to load all
objects into kernel.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 15 +++++++++++++++
tools/perf/util/bpf-loader.c | 28 ++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 10 ++++++++++
3 files changed, 53 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4e9c022..2e19c95c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1156,6 +1156,21 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_symbol_exit;
}

+ /*
+ * bpf__probe() also calls symbol__init() if there are probe
+ * events in bpf objects, so calling symbol_exit when failure
+ * is safe. If there is no probe event, bpf__load() always
+ * success.
+ */
+ err = bpf__load();
+ if (err) {
+ pr_err("Loading BPF programs failed:\n");
+
+ bpf__strerror_load(err, errbuf, sizeof(errbuf));
+ pr_err("\t%s\n", errbuf);
+ goto out_symbol_exit;
+ }
+
symbol__init(NULL);

if (symbol_conf.kptr_restrict)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index ae23f6f..d63a594 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -267,6 +267,25 @@ out:
return err < 0 ? err : 0;
}

+int bpf__load(void)
+{
+ struct bpf_object *obj, *tmp;
+ int err = 0;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ err = bpf_object__load(obj);
+ if (err) {
+ pr_debug("bpf: load objects failed\n");
+ goto errout;
+ }
+ }
+ return 0;
+errout:
+ bpf_object__for_each_safe(obj, tmp)
+ bpf_object__unload(obj);
+ return err;
+}
+
#define bpf__strerror_head(err, buf, size) \
char sbuf[STRERR_BUFSIZE], *emsg;\
if (!size)\
@@ -309,3 +328,12 @@ int bpf__strerror_probe(int err, char *buf, size_t size)
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_load(int err, char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(EINVAL, "%s: add -v to see detail. Run a CONFIG_BPF_SYSCALL kernel?",
+ emsg)
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 6b09a85..4d7552e 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -19,6 +19,9 @@ int bpf__probe(void);
int bpf__unprobe(void);
int bpf__strerror_probe(int err, char *buf, size_t size);

+int bpf__load(void);
+int bpf__strerror_load(int err, char *buf, size_t size);
+
void bpf__clear(void);
#else
static inline int bpf__prepare_load(const char *filename __maybe_unused)
@@ -29,6 +32,7 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused)

static inline int bpf__probe(void) { return 0; }
static inline int bpf__unprobe(void) { return 0; }
+static inline int bpf__load(void) { return 0; }
static inline void bpf__clear(void) { }

static inline int
@@ -56,5 +60,11 @@ static inline int bpf__strerror_probe(int err __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int bpf__strerror_load(int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
--
2.1.0

2015-08-21 10:12:29

by Wang Nan

[permalink] [raw]
Subject: [PATCH 09/29] perf tools: Add bpf_fd field to evsel and config it

This patch adds a bpf_fd field to 'struct evsel' then introduces method
to config it. In bpf-loader, a bpf__foreach_tev() function is added,
which calls the callback function for each 'struct probe_trace_event'
events for each bpf program with their file descriptors. In evlist.c,
perf_evlist__add_bpf() is introduced to add all bpf events into evlist.
The event names are found from probe_trace_event structure.
'perf record' calls perf_evlist__add_bpf().

Since bpf-loader.c will not be built if libbpf is turned off, an empty
bpf__foreach_tev() is defined in bpf-loader.h to avoid compiling
error.

This patch iterates over 'struct probe_trace_event' instead of
'struct probe_trace_event' during the loop for further patches, which
will generate multiple instances form one BPF program and install then
onto different 'struct probe_trace_event'.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 6 ++++++
tools/perf/util/bpf-loader.c | 41 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 13 +++++++++++++
tools/perf/util/evlist.c | 41 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/evsel.h | 1 +
7 files changed, 104 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 2e19c95c..120b00d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1171,6 +1171,12 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_symbol_exit;
}

+ err = perf_evlist__add_bpf(rec->evlist);
+ if (err < 0) {
+ pr_err("Failed to add events from BPF object(s)\n");
+ goto out_symbol_exit;
+ }
+
symbol__init(NULL);

if (symbol_conf.kptr_restrict)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index d63a594..126aa71 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -286,6 +286,47 @@ errout:
return err;
}

+int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
+{
+ struct bpf_object *obj, *tmp;
+ struct bpf_program *prog;
+ int err;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ bpf_object__for_each_program(prog, obj) {
+ struct probe_trace_event *tev;
+ struct perf_probe_event *pev;
+ struct bpf_prog_priv *priv;
+ int i, fd;
+
+ err = bpf_program__get_private(prog,
+ (void **)&priv);
+ if (err || !priv) {
+ pr_debug("bpf: failed to get private field\n");
+ return -EINVAL;
+ }
+
+ pev = &priv->pev;
+ for (i = 0; i < pev->ntevs; i++) {
+ tev = &pev->tevs[i];
+
+ fd = bpf_program__fd(prog);
+ if (fd < 0) {
+ pr_debug("bpf: failed to get file descriptor\n");
+ return fd;
+ }
+
+ err = func(tev, fd, arg);
+ if (err) {
+ pr_debug("bpf: call back failed, stop iterate\n");
+ return err;
+ }
+ }
+ }
+ }
+ return 0;
+}
+
#define bpf__strerror_head(err, buf, size) \
char sbuf[STRERR_BUFSIZE], *emsg;\
if (!size)\
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 4d7552e..34656f8 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -7,10 +7,14 @@

#include <linux/compiler.h>
#include <string.h>
+#include "probe-event.h"
#include "debug.h"

#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"

+typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
+ int fd, void *arg);
+
#ifdef HAVE_LIBBPF_SUPPORT
int bpf__prepare_load(const char *filename);
int bpf__strerror_prepare_load(const char *filename, int err,
@@ -23,6 +27,8 @@ int bpf__load(void);
int bpf__strerror_load(int err, char *buf, size_t size);

void bpf__clear(void);
+
+int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg);
#else
static inline int bpf__prepare_load(const char *filename __maybe_unused)
{
@@ -36,6 +42,13 @@ static inline int bpf__load(void) { return 0; }
static inline void bpf__clear(void) { }

static inline int
+bpf__foreach_tev(bpf_prog_iter_callback_t func __maybe_unused,
+ void *arg __maybe_unused)
+{
+ return 0;
+}
+
+static inline int
__bpf_strerror(char *buf, size_t size)
{
if (!size)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e9a5d43..3931f5e 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -14,6 +14,7 @@
#include "target.h"
#include "evlist.h"
#include "evsel.h"
+#include "bpf-loader.h"
#include "debug.h"
#include <unistd.h>

@@ -194,6 +195,46 @@ error:
return -ENOMEM;
}

+static int add_bpf_event(struct probe_trace_event *tev, int fd,
+ void *arg)
+{
+ struct perf_evlist *evlist = arg;
+ struct perf_evsel *pos;
+ struct list_head list;
+ int err, idx, entries;
+
+ pr_debug("add bpf event %s:%s and attach bpf program %d\n",
+ tev->group, tev->event, fd);
+ INIT_LIST_HEAD(&list);
+ idx = evlist->nr_entries;
+
+ pr_debug("adding %s:%s\n", tev->group, tev->event);
+ err = parse_events_add_tracepoint(&list, &idx, tev->group,
+ tev->event);
+ if (err) {
+ struct perf_evsel *evsel, *tmp;
+
+ pr_err("Failed to add BPF event %s:%s\n",
+ tev->group, tev->event);
+ list_for_each_entry_safe(evsel, tmp, &list, node) {
+ list_del(&evsel->node);
+ perf_evsel__delete(evsel);
+ }
+ return -EINVAL;
+ }
+
+ list_for_each_entry(pos, &list, node)
+ pos->bpf_fd = fd;
+ entries = idx - evlist->nr_entries;
+ perf_evlist__splice_list_tail(evlist, &list, entries);
+ return 0;
+}
+
+int perf_evlist__add_bpf(struct perf_evlist *evlist)
+{
+ return bpf__foreach_tev(add_bpf_event, evlist);
+}
+
static int perf_evlist__add_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs)
{
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 436e358..7b8fd90 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -72,6 +72,7 @@ void perf_evlist__delete(struct perf_evlist *evlist);

void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
int perf_evlist__add_default(struct perf_evlist *evlist);
+int perf_evlist__add_bpf(struct perf_evlist *evlist);
int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs);

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b096ef7..ae9ad2d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -206,6 +206,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->leader = evsel;
evsel->unit = "";
evsel->scale = 1.0;
+ evsel->bpf_fd = -1;
INIT_LIST_HEAD(&evsel->node);
INIT_LIST_HEAD(&evsel->config_terms);
perf_evsel__object.init(evsel);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 93ac6b1..3ff4aad 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -115,6 +115,7 @@ struct perf_evsel {
char *group_name;
bool cmdline_group_boundary;
struct list_head config_terms;
+ int bpf_fd;
};

union u64_swap {
--
2.1.0

2015-08-21 10:11:52

by Wang Nan

[permalink] [raw]
Subject: [PATCH 10/29] perf tools: Attach eBPF program to perf event

This is the final patch which makes basic BPF filter work. After
applying this patch, users are allowed to use BPF filter like:

# perf record --event ./hello_world.c ls

In this patch PERF_EVENT_IOC_SET_BPF ioctl is used to attach eBPF
program to a newly created perf event. The file descriptor of the
eBPF program is passed to perf record using previous patches, and
stored into evsel->bpf_fd.

It is possible that different perf event are created for one kprobe
events for different CPUs. In this case, when trying to call the
ioctl, EEXIST will be return. This patch doesn't treat it as an error.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/evsel.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index ae9ad2d..40b2722 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1331,6 +1331,22 @@ retry_open:
err);
goto try_fallback;
}
+
+ if (evsel->bpf_fd >= 0) {
+ int evt_fd = FD(evsel, cpu, thread);
+ int bpf_fd = evsel->bpf_fd;
+
+ err = ioctl(evt_fd,
+ PERF_EVENT_IOC_SET_BPF,
+ bpf_fd);
+ if (err && errno != EEXIST) {
+ pr_err("failed to attach bpf fd %d: %s\n",
+ bpf_fd, strerror(errno));
+ err = -EINVAL;
+ goto out_close;
+ }
+ }
+
set_rlimit = NO_CHANGE;

/*
--
2.1.0

2015-08-21 10:11:44

by Wang Nan

[permalink] [raw]
Subject: [PATCH 11/29] perf tools: Suppress probing messages when probing by BPF loading

This patch suppresses message output by add_perf_probe_events() and
del_perf_probe_events() if they are triggered by BPF loading. Before
this patch, when using 'perf record' with BPF object/source as event
selector, following message will be output:

Added new event:
perf_bpf_probe:lock_page_ret (on __lock_page%return)
You can now use it in all perf tools, such as:
perf record -e perf_bpf_probe:lock_page_ret -aR sleep 1
...
Removed event: perf_bpf_probe:lock_page_ret

Which is misleading, especially 'use it in all perf tools' because they
will be removed after 'pref record' exit.

In this patch, a 'silent' field is appended into probe_conf to control
output. bpf__{,un}probe() set it to true when calling
{add,del}_perf_probe_events().

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 6 ++++++
tools/perf/util/probe-event.c | 17 ++++++++++++-----
tools/perf/util/probe-event.h | 1 +
tools/perf/util/probe-file.c | 5 ++++-
4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 126aa71..8e91ad1 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -188,6 +188,7 @@ static bool is_probed;
int bpf__unprobe(void)
{
struct strfilter *delfilter;
+ bool old_silent = probe_conf.silent;
int ret;

if (!is_probed)
@@ -199,7 +200,9 @@ int bpf__unprobe(void)
return -ENOMEM;
}

+ probe_conf.silent = true;
ret = del_perf_probe_events(delfilter);
+ probe_conf.silent = old_silent;
strfilter__delete(delfilter);
if (ret < 0 && is_probed)
pr_debug("Error: failed to delete events: %s\n",
@@ -215,6 +218,7 @@ int bpf__probe(void)
struct bpf_object *obj, *tmp;
struct bpf_program *prog;
struct perf_probe_event *pevs;
+ bool old_silent = probe_conf.silent;

pevs = calloc(MAX_PROBES, sizeof(pevs[0]));
if (!pevs)
@@ -235,9 +239,11 @@ int bpf__probe(void)
}
}

+ probe_conf.silent = true;
probe_conf.max_probes = MAX_PROBES;
/* Let add_perf_probe_events generates probe_trace_event (tevs) */
err = add_perf_probe_events(pevs, nr_events, false);
+ probe_conf.silent = old_silent;

/* add_perf_probe_events return negative when fail */
if (err < 0) {
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index d5368ac..ffc0113 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -52,7 +52,9 @@
#define PERFPROBE_GROUP "probe"

bool probe_event_dry_run; /* Dry run flag */
-struct probe_conf probe_conf;
+struct probe_conf probe_conf = {
+ .silent = false,
+};

#define semantic_error(msg ...) pr_err("Semantic error :" msg)

@@ -2133,10 +2135,12 @@ static int show_perf_probe_event(const char *group, const char *event,

ret = perf_probe_event__sprintf(group, event, pev, module, &buf);
if (ret >= 0) {
- if (use_stdout)
+ if (use_stdout && !probe_conf.silent)
printf("%s\n", buf.buf);
- else
+ else if (!probe_conf.silent)
pr_info("%s\n", buf.buf);
+ else
+ pr_debug("%s\n", buf.buf);
}
strbuf_release(&buf);

@@ -2357,7 +2361,10 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
}

ret = 0;
- pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
+ if (!probe_conf.silent)
+ pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
+ else
+ pr_debug("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
for (i = 0; i < ntevs; i++) {
tev = &tevs[i];
/* Skip if the symbol is out of .text or blacklisted */
@@ -2393,7 +2400,7 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
warn_uprobe_event_compat(tev);

/* Note that it is possible to skip all events because of blacklist */
- if (ret >= 0 && event) {
+ if (ret >= 0 && event && !probe_conf.silent) {
/* Show how to use the event. */
pr_info("\nYou can now use it in all perf tools, such as:\n\n");
pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 5db0f20..ba981c5 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -13,6 +13,7 @@ struct probe_conf {
bool force_add;
bool no_inlines;
int max_probes;
+ bool silent;
};
extern struct probe_conf probe_conf;
extern bool probe_event_dry_run;
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index bbb2437..db7bd4c 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -267,7 +267,10 @@ static int __del_trace_probe_event(int fd, struct str_node *ent)
goto error;
}

- pr_info("Removed event: %s\n", ent->s);
+ if (!probe_conf.silent)
+ pr_info("Removed event: %s\n", ent->s);
+ else
+ pr_debug("Removed event: %s\n", ent->s);
return 0;
error:
pr_warning("Failed to delete event: %s\n",
--
2.1.0

2015-08-21 10:17:03

by Wang Nan

[permalink] [raw]
Subject: [PATCH 12/29] perf record: Add clang options for compiling BPF scripts

Although previous patch allows setting BPF compiler related options in
perfconfig, on some ad-hoc situation it still requires passing options
through cmdline. This patch introduces 2 options to 'perf record' for
this propose: --clang-path and --clang-opt.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 120b00d..c7353ba 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -30,6 +30,7 @@
#include "util/auxtrace.h"
#include "util/parse-branch-options.h"
#include "util/bpf-loader.h"
+#include "util/llvm-utils.h"

#include <unistd.h>
#include <sched.h>
@@ -1094,6 +1095,12 @@ struct option __record_options[] = {
"per thread proc mmap processing timeout in ms"),
OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
"Record context switch events"),
+#ifdef HAVE_LIBBPF_SUPPORT
+ OPT_STRING(0, "clang-path", &llvm_param.clang_path, "clang path",
+ "clang binary to use for compiling BPF scriptlets"),
+ OPT_STRING(0, "clang-opt", &llvm_param.clang_opt, "clang options",
+ "options passed to clang when compiling BPF scriptlets"),
+#endif
OPT_END()
};

--
2.1.0

2015-08-21 10:11:40

by Wang Nan

[permalink] [raw]
Subject: [PATCH 13/29] perf tools: Infrastructure for compiling scriptlets when passing '.c' to --event

This patch provides infrastructure for passing source files to --event
directly using:

# perf record --event bpf-file.c command

This patch does following works:

1) Allow passing '.c' file to '--event'. parse_events_load_bpf() is
expanded to allow caller tell it whether the passed file is source
file or object.

2) llvm__compile_bpf() is called to compile the '.c' file, the result
is saved into memory. Use bpf_object__open_buffer() to load the
in-memory object.

Introduces a bpf-script-example.c so we can manually test it:

# perf record --clang-opt "-DLINUX_VERSION_CODE=0x40200" --event ./bpf-script-example.c sleep 1

Note that '--clang-opt' must put before '--event'.

Futher patches will merge it into a testcase so can be tested automatically.

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: He Kuang <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/tests/bpf-script-example.c | 44 +++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.c | 25 +++++++++++++++-----
tools/perf/util/bpf-loader.h | 10 ++++----
tools/perf/util/parse-events.c | 6 ++---
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.l | 3 +++
tools/perf/util/parse-events.y | 15 ++++++++++--
7 files changed, 90 insertions(+), 16 deletions(-)
create mode 100644 tools/perf/tests/bpf-script-example.c

diff --git a/tools/perf/tests/bpf-script-example.c b/tools/perf/tests/bpf-script-example.c
new file mode 100644
index 0000000..410a70b
--- /dev/null
+++ b/tools/perf/tests/bpf-script-example.c
@@ -0,0 +1,44 @@
+#ifndef LINUX_VERSION_CODE
+# error Need LINUX_VERSION_CODE
+# error Example: for 4.2 kernel, put 'clang-opt="-DLINUX_VERSION_CODE=0x40200" into llvm section of ~/.perfconfig'
+#endif
+#define BPF_ANY 0
+#define BPF_MAP_TYPE_ARRAY 2
+#define BPF_FUNC_map_lookup_elem 1
+#define BPF_FUNC_map_update_elem 2
+
+static void *(*bpf_map_lookup_elem)(void *map, void *key) =
+ (void *) BPF_FUNC_map_lookup_elem;
+static void *(*bpf_map_update_elem)(void *map, void *key, void *value, int flags) =
+ (void *) BPF_FUNC_map_update_elem;
+
+struct bpf_map_def {
+ unsigned int type;
+ unsigned int key_size;
+ unsigned int value_size;
+ unsigned int max_entries;
+};
+
+#define SEC(NAME) __attribute__((section(NAME), used))
+struct bpf_map_def SEC("maps") flip_table = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .max_entries = 1,
+};
+
+SEC("func=sys_epoll_pwait")
+int bpf_func__sys_epoll_pwait(void *ctx)
+{
+ int ind =0;
+ int *flag = bpf_map_lookup_elem(&flip_table, &ind);
+ int new_flag;
+ if (!flag)
+ return 0;
+ /* flip flag and store back */
+ new_flag = !*flag;
+ bpf_map_update_elem(&flip_table, &ind, &new_flag, BPF_ANY);
+ return new_flag;
+}
+char _license[] SEC("license") = "GPL";
+int _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 8e91ad1..5a203ae 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -11,6 +11,7 @@
#include "bpf-loader.h"
#include "probe-event.h"
#include "probe-finder.h"
+#include "llvm-utils.h"

#define DEFINE_PRINT_FN(name, level) \
static int libbpf_##name(const char *fmt, ...) \
@@ -152,16 +153,28 @@ sync_bpf_program_pev(struct bpf_program *prog)
return 0;
}

-int bpf__prepare_load(const char *filename)
+int bpf__prepare_load(const char *filename, bool source)
{
struct bpf_object *obj;
+ int err;

if (!libbpf_initialized)
libbpf_set_print(libbpf_warning,
libbpf_info,
libbpf_debug);

- obj = bpf_object__open(filename);
+ if (source) {
+ void *obj_buf;
+ size_t obj_buf_sz;
+
+ err = llvm__compile_bpf(filename, &obj_buf, &obj_buf_sz);
+ if (err)
+ return err;
+ obj = bpf_object__open_buffer(obj_buf, obj_buf_sz);
+ free(obj_buf);
+ } else
+ obj = bpf_object__open(filename);
+
if (!obj) {
pr_debug("bpf: failed to load %s\n", filename);
return -EINVAL;
@@ -355,12 +368,12 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
}\
buf[size - 1] = '\0';

-int bpf__strerror_prepare_load(const char *filename, int err,
- char *buf, size_t size)
+int bpf__strerror_prepare_load(const char *filename, bool source,
+ int err, char *buf, size_t size)
{
bpf__strerror_head(err, buf, size);
- bpf__strerror_entry(EINVAL, "%s: BPF object file '%s' is invalid",
- emsg, filename)
+ bpf__strerror_entry(EINVAL, "%s: BPF %s file '%s' is invalid",
+ emsg, source ? "source" : "object", filename);
bpf__strerror_end(buf, size);
return 0;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 34656f8..c55ffde 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -16,9 +16,9 @@ typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
int fd, void *arg);

#ifdef HAVE_LIBBPF_SUPPORT
-int bpf__prepare_load(const char *filename);
-int bpf__strerror_prepare_load(const char *filename, int err,
- char *buf, size_t size);
+int bpf__prepare_load(const char *filename, bool source);
+int bpf__strerror_prepare_load(const char *filename, bool source,
+ int err, char *buf, size_t size);
int bpf__probe(void);
int bpf__unprobe(void);
int bpf__strerror_probe(int err, char *buf, size_t size);
@@ -30,7 +30,8 @@ void bpf__clear(void);

int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg);
#else
-static inline int bpf__prepare_load(const char *filename __maybe_unused)
+static inline int bpf__prepare_load(const char *filename __maybe_unused,
+ bool source __maybe_unused)
{
pr_debug("ERROR: eBPF object loading is disabled during compiling.\n");
return -1;
@@ -62,6 +63,7 @@ __bpf_strerror(char *buf, size_t size)

static inline int
bpf__strerror_prepare_load(const char *filename __maybe_unused,
+ bool source __maybe_unused,
int err __maybe_unused,
char *buf, size_t size)
{
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 9eaceb2..cce3e43 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -484,7 +484,7 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,

int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list __maybe_unused,
- char *bpf_file_name)
+ char *bpf_file_name, bool source)
{
int err;
char errbuf[BUFSIZ];
@@ -497,9 +497,9 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
* problem. After that probe events file by file is possible.
* However, probing cost is still need to be considered.
*/
- err = bpf__prepare_load(bpf_file_name);
+ err = bpf__prepare_load(bpf_file_name, source);
if (err) {
- bpf__strerror_prepare_load(bpf_file_name, err,
+ bpf__strerror_prepare_load(bpf_file_name, source, err,
errbuf, sizeof(errbuf));
data->error->str = strdup(errbuf);
data->error->help = strdup("(add -v to see detail)");
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 3652387..728a424 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -121,7 +121,8 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
char *sys, char *event);
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
- char *bpf_file_name);
+ char *bpf_file_name,
+ bool source);
int parse_events_add_numeric(struct parse_events_evlist *data,
struct list_head *list,
u32 type, u64 config,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 22e8f93..8033890 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -116,6 +116,7 @@ group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
event [^,{}/]+
bpf_object .*\.(o|bpf)
+bpf_source .*\.c

num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
@@ -161,6 +162,7 @@ modifier_bp [rwx]{1,3}

{event_pmu} |
{bpf_object} |
+{bpf_source} |
{event} {
BEGIN(INITIAL);
REWIND(1);
@@ -267,6 +269,7 @@ r{num_raw_hex} { return raw(yyscanner); }

{modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
{bpf_object} { return str(yyscanner, PE_BPF_OBJECT); }
+{bpf_source} { return str(yyscanner, PE_BPF_SOURCE); }
{name} { return pmu_str_check(yyscanner); }
"/" { BEGIN(config); return '/'; }
- { return '-'; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 3ee3a32..90d2458 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -42,7 +42,7 @@ static inc_group_count(struct list_head *list,
%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
%token PE_EVENT_NAME
%token PE_NAME
-%token PE_BPF_OBJECT
+%token PE_BPF_OBJECT PE_BPF_SOURCE
%token PE_MODIFIER_EVENT PE_MODIFIER_BP
%token PE_NAME_CACHE_TYPE PE_NAME_CACHE_OP_RESULT
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
@@ -55,6 +55,7 @@ static inc_group_count(struct list_head *list,
%type <num> PE_TERM
%type <str> PE_NAME
%type <str> PE_BPF_OBJECT
+%type <str> PE_BPF_SOURCE
%type <str> PE_NAME_CACHE_TYPE
%type <str> PE_NAME_CACHE_OP_RESULT
%type <str> PE_MODIFIER_EVENT
@@ -431,7 +432,17 @@ PE_BPF_OBJECT
struct list_head *list;

ALLOC_LIST(list);
- ABORT_ON(parse_events_load_bpf(data, list, $1));
+ ABORT_ON(parse_events_load_bpf(data, list, $1, false));
+ $$ = list;
+}
+|
+PE_BPF_SOURCE
+{
+ struct parse_events_evlist *data = _data;
+ struct list_head *list;
+
+ ALLOC_LIST(list);
+ ABORT_ON(parse_events_load_bpf(data, list, $1, true));
$$ = list;
}

--
2.1.0

2015-08-21 10:11:14

by Wang Nan

[permalink] [raw]
Subject: [PATCH 14/29] perf tests: Enforce LLVM test for BPF test

This patch replaces the original toy BPF program with previous introduced
bpf-script-example.c. Dynamically embedded it into 'llvm-src.c'.

The newly introduced BPF program attaches a BPF program at
'sys_epoll_pwait()', and collect half samples from it. perf itself never
use that syscall, so further test can verify their result with it.

Since BPF program require LINUX_VERSION_CODE of runtime kernel, this
patch computes that code from uname.

Since the resuling BPF object is useful for further testcases, this patch
introduces 'prepare' and 'cleanup' method to tests, and makes test__llvm()
create a MAP_SHARED memory array to hold the resulting object.

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
---
tools/perf/tests/Build | 9 +++-
tools/perf/tests/builtin-test.c | 8 ++++
tools/perf/tests/llvm.c | 104 +++++++++++++++++++++++++++++++++++-----
tools/perf/tests/llvm.h | 14 ++++++
tools/perf/tests/tests.h | 2 +
5 files changed, 123 insertions(+), 14 deletions(-)
create mode 100644 tools/perf/tests/llvm.h

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index c1518bd..8c98409 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -32,7 +32,14 @@ perf-y += sample-parsing.o
perf-y += parse-no-sample-id-all.o
perf-y += kmod-path.o
perf-y += thread-map.o
-perf-y += llvm.o
+perf-y += llvm.o llvm-src.o
+
+$(OUTPUT)tests/llvm-src.c: tests/bpf-script-example.c
+ $(Q)echo '#include <tests/llvm.h>' > $@
+ $(Q)echo 'const char test_llvm__bpf_prog[] =' >> $@
+ $(Q)sed -e 's/"/\\"/g' -e 's/\(.*\)/"\1\\n"/g' $< >> $@
+ $(Q)echo ';' >> $@
+

perf-$(CONFIG_X86) += perf-time-to-tsc.o

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 136cd93..1a349e8 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -17,6 +17,8 @@
static struct test {
const char *desc;
int (*func)(void);
+ void (*prepare)(void);
+ void (*cleanup)(void);
} tests[] = {
{
.desc = "vmlinux symtab matches kallsyms",
@@ -177,6 +179,8 @@ static struct test {
{
.desc = "Test LLVM searching and compiling",
.func = test__llvm,
+ .prepare = test__llvm_prepare,
+ .cleanup = test__llvm_cleanup,
},
{
.func = NULL,
@@ -265,7 +269,11 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
}

pr_debug("\n--- start ---\n");
+ if (tests[curr].prepare)
+ tests[curr].prepare();
err = run_test(&tests[curr]);
+ if (tests[curr].cleanup)
+ tests[curr].cleanup();
pr_debug("---- end ----\n%s:", tests[curr].desc);

switch (err) {
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index a337356..e48eb64 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -1,9 +1,13 @@
#include <stdio.h>
+#include <sys/utsname.h>
#include <bpf/libbpf.h>
#include <util/llvm-utils.h>
#include <util/cache.h>
+#include <util/util.h>
+#include <sys/mman.h>
#include "tests.h"
#include "debug.h"
+#include "llvm.h"

static int perf_config_cb(const char *var, const char *val,
void *arg __maybe_unused)
@@ -11,16 +15,6 @@ static int perf_config_cb(const char *var, const char *val,
return perf_default_config(var, val, arg);
}

-/*
- * Randomly give it a "version" section since we don't really load it
- * into kernel
- */
-static const char test_bpf_prog[] =
- "__attribute__((section(\"do_fork\"), used)) "
- "int fork(void *ctx) {return 0;} "
- "char _license[] __attribute__((section(\"license\"), used)) = \"GPL\";"
- "int _version __attribute__((section(\"version\"), used)) = 0x40100;";
-
#ifdef HAVE_LIBBPF_SUPPORT
static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
{
@@ -41,12 +35,44 @@ static int test__bpf_parsing(void *obj_buf __maybe_unused,
}
#endif

+static char *
+compose_source(void)
+{
+ struct utsname utsname;
+ int version, patchlevel, sublevel, err;
+ unsigned long version_code;
+ char *code;
+
+ if (uname(&utsname))
+ return NULL;
+
+ err = sscanf(utsname.release, "%d.%d.%d",
+ &version, &patchlevel, &sublevel);
+ if (err != 3) {
+ fprintf(stderr, " (Can't get kernel version from uname '%s')",
+ utsname.release);
+ return NULL;
+ }
+
+ version_code = (version << 16) + (patchlevel << 8) + sublevel;
+ err = asprintf(&code, "#define LINUX_VERSION_CODE 0x%08lx;\n%s",
+ version_code, test_llvm__bpf_prog);
+ if (err < 0)
+ return NULL;
+
+ return code;
+}
+
+#define SHARED_BUF_INIT_SIZE (1 << 20)
+struct test_llvm__bpf_result *p_test_llvm__bpf_result;
+
int test__llvm(void)
{
char *tmpl_new, *clang_opt_new;
void *obj_buf;
size_t obj_buf_sz;
int err, old_verbose;
+ char *source;

perf_config(perf_config_cb, NULL);

@@ -73,10 +99,22 @@ int test__llvm(void)
if (!llvm_param.clang_opt)
llvm_param.clang_opt = strdup("");

- err = asprintf(&tmpl_new, "echo '%s' | %s", test_bpf_prog,
- llvm_param.clang_bpf_cmd_template);
- if (err < 0)
+ source = compose_source();
+ if (!source) {
+ pr_err("Failed to compose source code\n");
+ return -1;
+ }
+
+ /* Quote __EOF__ so strings in source won't be expanded by shell */
+ err = asprintf(&tmpl_new, "cat << '__EOF__' | %s\n%s\n__EOF__\n",
+ llvm_param.clang_bpf_cmd_template, source);
+ free(source);
+ source = NULL;
+ if (err < 0) {
+ pr_err("Failed to alloc new template\n");
return -1;
+ }
+
err = asprintf(&clang_opt_new, "-xc %s", llvm_param.clang_opt);
if (err < 0)
return -1;
@@ -93,6 +131,46 @@ int test__llvm(void)
}

err = test__bpf_parsing(obj_buf, obj_buf_sz);
+ if (!err && p_test_llvm__bpf_result) {
+ if (obj_buf_sz > SHARED_BUF_INIT_SIZE) {
+ pr_err("Resulting object too large\n");
+ } else {
+ p_test_llvm__bpf_result->size = obj_buf_sz;
+ memcpy(p_test_llvm__bpf_result->object,
+ obj_buf, obj_buf_sz);
+ }
+ }
free(obj_buf);
return err;
}
+
+void test__llvm_prepare(void)
+{
+ p_test_llvm__bpf_result = mmap(NULL, SHARED_BUF_INIT_SIZE,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (!p_test_llvm__bpf_result)
+ return;
+ memset((void *)p_test_llvm__bpf_result, '\0', SHARED_BUF_INIT_SIZE);
+}
+
+void test__llvm_cleanup(void)
+{
+ unsigned long boundary, buf_end;
+
+ if (!p_test_llvm__bpf_result)
+ return;
+ if (p_test_llvm__bpf_result->size == 0) {
+ munmap((void *)p_test_llvm__bpf_result, SHARED_BUF_INIT_SIZE);
+ p_test_llvm__bpf_result = NULL;
+ return;
+ }
+
+ buf_end = (unsigned long)p_test_llvm__bpf_result + SHARED_BUF_INIT_SIZE;
+
+ boundary = (unsigned long)(p_test_llvm__bpf_result);
+ boundary += p_test_llvm__bpf_result->size;
+ boundary = (boundary + (page_size - 1)) &
+ (~((unsigned long)page_size - 1));
+ munmap((void *)boundary, buf_end - boundary);
+}
diff --git a/tools/perf/tests/llvm.h b/tools/perf/tests/llvm.h
new file mode 100644
index 0000000..1e89e46
--- /dev/null
+++ b/tools/perf/tests/llvm.h
@@ -0,0 +1,14 @@
+#ifndef PERF_TEST_LLVM_H
+#define PERF_TEST_LLVM_H
+
+#include <stddef.h> /* for size_t */
+
+struct test_llvm__bpf_result {
+ size_t size;
+ char object[];
+};
+
+extern struct test_llvm__bpf_result *p_test_llvm__bpf_result;
+extern const char test_llvm__bpf_prog[];
+
+#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bf113a2..0d79f04 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -63,6 +63,8 @@ int test__fdarray__add(void);
int test__kmod_path__parse(void);
int test__thread_map(void);
int test__llvm(void);
+void test__llvm_prepare(void);
+void test__llvm_cleanup(void);

#if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
#ifdef HAVE_DWARF_UNWIND_SUPPORT
--
2.1.0

2015-08-21 10:15:07

by Wang Nan

[permalink] [raw]
Subject: [PATCH 15/29] perf test: Add 'perf test BPF'

This patch adds BPF testcase for testing BPF event filtering.

By utilizing the result of 'perf test LLVM', this patch compiles the
eBPF sample program then test it ability. The BPF script in 'perf test
LLVM' collects half of execution of epoll_pwait(). This patch runs 111
times of it, so the resule should contains 56 samples.

Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
---
tools/perf/tests/Build | 1 +
tools/perf/tests/bpf.c | 170 ++++++++++++++++++++++++++++++++++++++++
tools/perf/tests/builtin-test.c | 4 +
tools/perf/tests/llvm.c | 19 +++++
tools/perf/tests/llvm.h | 1 +
tools/perf/tests/tests.h | 1 +
tools/perf/util/bpf-loader.c | 13 +++
tools/perf/util/bpf-loader.h | 7 ++
8 files changed, 216 insertions(+)
create mode 100644 tools/perf/tests/bpf.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 8c98409..7ceb448 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -33,6 +33,7 @@ perf-y += parse-no-sample-id-all.o
perf-y += kmod-path.o
perf-y += thread-map.o
perf-y += llvm.o llvm-src.o
+perf-y += bpf.o

$(OUTPUT)tests/llvm-src.c: tests/bpf-script-example.c
$(Q)echo '#include <tests/llvm.h>' > $@
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
new file mode 100644
index 0000000..1b0ee4e
--- /dev/null
+++ b/tools/perf/tests/bpf.c
@@ -0,0 +1,170 @@
+#include <stdio.h>
+#include <sys/epoll.h>
+#include <util/bpf-loader.h>
+#include <util/evlist.h>
+#include "tests.h"
+#include "llvm.h"
+#include "debug.h"
+#define NR_ITERS 111
+
+#ifdef HAVE_LIBBPF_SUPPORT
+
+static int epoll_pwait_loop(void)
+{
+ int i;
+
+ /* Should fail NR_ITERS times */
+ for (i = 0; i < NR_ITERS; i++)
+ epoll_pwait(-(i + 1), NULL, 0, 0, NULL);
+ return 0;
+}
+
+static int prepare_bpf(void *obj_buf, size_t obj_buf_sz)
+{
+ int err;
+ char errbuf[BUFSIZ];
+
+ err = bpf__prepare_load_buffer(obj_buf, obj_buf_sz);
+ if (err) {
+ bpf__strerror_prepare_load("[buffer]", false, err, errbuf,
+ sizeof(errbuf));
+ fprintf(stderr, " (%s)", errbuf);
+ return TEST_FAIL;
+ }
+
+ err = bpf__probe();
+ if (err) {
+ bpf__strerror_load(err, errbuf, sizeof(errbuf));
+ fprintf(stderr, " (%s)", errbuf);
+ if (getuid() != 0)
+ fprintf(stderr, " (try run as root)");
+ return TEST_FAIL;
+ }
+
+ err = bpf__load();
+ if (err) {
+ bpf__strerror_load(err, errbuf, sizeof(errbuf));
+ fprintf(stderr, " (%s)", errbuf);
+ return TEST_FAIL;
+ }
+
+ return 0;
+}
+
+static int do_test(void)
+{
+ struct record_opts opts = {
+ .target = {
+ .uid = UINT_MAX,
+ .uses_mmap = true,
+ },
+ .freq = 0,
+ .mmap_pages = 256,
+ .default_interval = 1,
+ };
+
+ int err, i, count = 0;
+ char pid[16];
+ char sbuf[STRERR_BUFSIZE];
+ struct perf_evlist *evlist;
+
+ snprintf(pid, sizeof(pid), "%d", getpid());
+ pid[sizeof(pid) - 1] = '\0';
+ opts.target.tid = opts.target.pid = pid;
+
+ /* Instead of perf_evlist__new_default, don't add default events */
+ evlist = perf_evlist__new();
+ if (!evlist) {
+ pr_debug("No ehough memory to create evlist\n");
+ return -ENOMEM;
+ }
+
+ err = perf_evlist__create_maps(evlist, &opts.target);
+ if (err < 0) {
+ pr_debug("Not enough memory to create thread/cpu maps\n");
+ goto out_delete_evlist;
+ }
+
+ err = perf_evlist__add_bpf(evlist);
+ if (err) {
+ fprintf(stderr, " (Failed to add events selected by BPF)");
+ goto out_delete_evlist;
+ }
+
+ perf_evlist__config(evlist, &opts);
+
+ err = perf_evlist__open(evlist);
+ if (err < 0) {
+ pr_debug("perf_evlist__open: %s\n",
+ strerror_r(errno, sbuf, sizeof(sbuf)));
+ goto out_delete_evlist;
+ }
+
+ err = perf_evlist__mmap(evlist, opts.mmap_pages, false);
+ if (err < 0) {
+ pr_debug("perf_evlist__mmap: %s\n",
+ strerror_r(errno, sbuf, sizeof(sbuf)));
+ goto out_delete_evlist;
+ }
+
+ perf_evlist__enable(evlist);
+ epoll_pwait_loop();
+ perf_evlist__disable(evlist);
+
+ for (i = 0; i < evlist->nr_mmaps; i++) {
+ union perf_event *event;
+
+ while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+ const u32 type = event->header.type;
+
+ if (type == PERF_RECORD_SAMPLE)
+ count ++;
+ }
+ }
+
+ if (count != (NR_ITERS + 1) / 2) {
+ fprintf(stderr, " (filter result incorrect)");
+ err = -EBADF;
+ }
+
+out_delete_evlist:
+ perf_evlist__delete(evlist);
+ if (err)
+ return TEST_FAIL;
+ return 0;
+}
+
+int test__bpf(void)
+{
+ int err;
+ void *obj_buf;
+ size_t obj_buf_sz;
+
+ test_llvm__fetch_bpf_obj(&obj_buf, &obj_buf_sz);
+ if (!obj_buf || !obj_buf_sz) {
+ if (verbose == 0)
+ fprintf(stderr, " (fix 'perf test LLVM' first)");
+ return TEST_SKIP;
+ }
+
+ err = prepare_bpf(obj_buf, obj_buf_sz);
+ if (err)
+ goto out;
+
+ err = do_test();
+ if (err)
+ goto out;
+out:
+ bpf__unprobe();
+ bpf__clear();
+ if (err)
+ return TEST_FAIL;
+ return 0;
+}
+
+#else
+int test__bpf(void)
+{
+ return TEST_SKIP;
+}
+#endif
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 1a349e8..c32c836 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -183,6 +183,10 @@ static struct test {
.cleanup = test__llvm_cleanup,
},
{
+ .desc = "Test BPF filter",
+ .func = test__bpf,
+ },
+ {
.func = NULL,
},
};
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index e48eb64..94ed5f2 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -174,3 +174,22 @@ void test__llvm_cleanup(void)
(~((unsigned long)page_size - 1));
munmap((void *)boundary, buf_end - boundary);
}
+
+void
+test_llvm__fetch_bpf_obj(void **p_obj_buf, size_t *p_obj_buf_sz)
+{
+ *p_obj_buf = NULL;
+ *p_obj_buf_sz = 0;
+
+ if (!p_test_llvm__bpf_result) {
+ test__llvm_prepare();
+ test__llvm();
+ test__llvm_cleanup();
+ }
+
+ if (!p_test_llvm__bpf_result)
+ return;
+
+ *p_obj_buf = p_test_llvm__bpf_result->object;
+ *p_obj_buf_sz = p_test_llvm__bpf_result->size;
+}
diff --git a/tools/perf/tests/llvm.h b/tools/perf/tests/llvm.h
index 1e89e46..2fd7ed6 100644
--- a/tools/perf/tests/llvm.h
+++ b/tools/perf/tests/llvm.h
@@ -10,5 +10,6 @@ struct test_llvm__bpf_result {

extern struct test_llvm__bpf_result *p_test_llvm__bpf_result;
extern const char test_llvm__bpf_prog[];
+void test_llvm__fetch_bpf_obj(void **p_obj_buf, size_t *p_obj_buf_sz);

#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 0d79f04..f8ded73 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -65,6 +65,7 @@ int test__thread_map(void);
int test__llvm(void);
void test__llvm_prepare(void);
void test__llvm_cleanup(void);
+int test__bpf(void);

#if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
#ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 5a203ae..f4693b6 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -153,6 +153,19 @@ sync_bpf_program_pev(struct bpf_program *prog)
return 0;
}

+int bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz)
+{
+ struct bpf_object *obj;
+
+ obj = bpf_object__open_buffer(obj_buf, obj_buf_sz);
+ if (!obj) {
+ pr_debug("bpf: failed to load buffer\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
int bpf__prepare_load(const char *filename, bool source)
{
struct bpf_object *obj;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index c55ffde..3a86209 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -17,6 +17,7 @@ typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,

#ifdef HAVE_LIBBPF_SUPPORT
int bpf__prepare_load(const char *filename, bool source);
+int bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz);
int bpf__strerror_prepare_load(const char *filename, bool source,
int err, char *buf, size_t size);
int bpf__probe(void);
@@ -37,6 +38,12 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused,
return -1;
}

+static inline int bpf__prepare_load_buffer(void *obj_buf __maybe_unused,
+ size_t obj_buf_sz __maybe_unused)
+{
+ return bpf__prepare_load(NULL, false);
+}
+
static inline int bpf__probe(void) { return 0; }
static inline int bpf__unprobe(void) { return 0; }
static inline int bpf__load(void) { return 0; }
--
2.1.0

2015-08-21 10:10:34

by Wang Nan

[permalink] [raw]
Subject: [PATCH 16/29] bpf tools: Load a program with different instances using preprocessor

In this patch, caller of libbpf is able to control the loaded programs
by installing a preprocessor callback for a BPF program. With
preprocessor, different instances can be created from one BPF program.

This patch will be used by perf to generate different prologue for
different 'struct probe_trace_event' instances matched by one
'struct perf_probe_event'.

bpf_program__set_prep() is added to support this feature. Caller
should pass libbpf the number of instances should be created and a
preprocessor function which will be called when doing real loading.
The callback should return instructions arrays for each instances.

fd field in bpf_programs is replaced by instance, which has an nr field
and fds array. bpf_program__nth_fd() is introduced for read fd of
instances. Old interface bpf_program__fd() is reimplemented by
returning the first fd.

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: He Kuang <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
[wangnan: Add missing '!',
allows bpf_program__unload() when prog->instance.nr == -1
]
---
tools/lib/bpf/libbpf.c | 143 +++++++++++++++++++++++++++++++++++++++++++++----
tools/lib/bpf/libbpf.h | 22 ++++++++
2 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4fa4bc4..1ff6a19 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -98,7 +98,11 @@ struct bpf_program {
} *reloc_desc;
int nr_reloc;

- int fd;
+ struct {
+ int nr;
+ int *fds;
+ } instance;
+ bpf_program_prep_t preprocessor;

struct bpf_object *obj;
void *priv;
@@ -152,10 +156,24 @@ struct bpf_object {

static void bpf_program__unload(struct bpf_program *prog)
{
+ int i;
+
if (!prog)
return;

- zclose(prog->fd);
+ /*
+ * If the object is opened but the program is never loaded,
+ * it is possible that prog->instance.nr == -1.
+ */
+ if (prog->instance.nr > 0) {
+ for (i = 0; i < prog->instance.nr; i++)
+ zclose(prog->instance.fds[i]);
+ } else if (prog->instance.nr != -1)
+ pr_warning("Internal error: instance.nr is %d\n",
+ prog->instance.nr);
+
+ prog->instance.nr = -1;
+ zfree(&prog->instance.fds);
}

static void bpf_program__exit(struct bpf_program *prog)
@@ -206,7 +224,8 @@ bpf_program__init(void *data, size_t size, char *name, int idx,
memcpy(prog->insns, data,
prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
- prog->fd = -1;
+ prog->instance.fds = NULL;
+ prog->instance.nr = -1;

return 0;
errout:
@@ -795,13 +814,71 @@ static int
bpf_program__load(struct bpf_program *prog,
char *license, u32 kern_version)
{
- int err, fd;
+ int err = 0, fd, i;
+
+ if (prog->instance.nr < 0 || !prog->instance.fds) {
+ if (prog->preprocessor) {
+ pr_warning("Internal error: can't load program '%s'\n",
+ prog->section_name);
+ return -EINVAL;
+ }
+
+ prog->instance.fds = malloc(sizeof(int));
+ if (!prog->instance.fds) {
+ pr_warning("No enough memory for fds\n");
+ return -ENOMEM;
+ }
+ prog->instance.nr = 1;
+ prog->instance.fds[0] = -1;
+ }
+
+ if (!prog->preprocessor) {
+ if (prog->instance.nr != 1)
+ pr_warning("Program '%s' inconsistent: nr(%d) not 1\n",
+ prog->section_name, prog->instance.nr);

- err = load_program(prog->insns, prog->insns_cnt,
- license, kern_version, &fd);
- if (!err)
- prog->fd = fd;
+ err = load_program(prog->insns, prog->insns_cnt,
+ license, kern_version, &fd);
+ if (!err)
+ prog->instance.fds[0] = fd;
+ goto out;
+ }
+
+ for (i = 0; i < prog->instance.nr; i++) {
+ struct bpf_prog_prep_result result;
+ bpf_program_prep_t preprocessor = prog->preprocessor;
+
+ bzero(&result, sizeof(result));
+ err = preprocessor(prog, i, prog->insns,
+ prog->insns_cnt, &result);
+ if (err) {
+ pr_warning("Preprocessing %dth instance of program '%s' failed\n",
+ i, prog->section_name);
+ goto out;
+ }
+
+ if (!result.new_insn_ptr || !result.new_insn_cnt) {
+ pr_debug("Skip loading %dth instance of program '%s'\n",
+ i, prog->section_name);
+ prog->instance.fds[i] = -1;
+ continue;
+ }
+
+ err = load_program(result.new_insn_ptr,
+ result.new_insn_cnt,
+ license, kern_version, &fd);
+
+ if (err) {
+ pr_warning("Loading %dth instance of program '%s' failed\n",
+ i, prog->section_name);
+ goto out;
+ }

+ if (result.pfd)
+ *result.pfd = fd;
+ prog->instance.fds[i] = fd;
+ }
+out:
if (err)
pr_warning("failed to load program '%s'\n",
prog->section_name);
@@ -1033,5 +1110,53 @@ const char *bpf_program__title(struct bpf_program *prog, bool dup)

int bpf_program__fd(struct bpf_program *prog)
{
- return prog->fd;
+ return bpf_program__nth_fd(prog, 0);
+}
+
+int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
+ bpf_program_prep_t prep)
+{
+ int *instance_fds;
+
+ if (nr_instance <= 0 || !prep)
+ return -EINVAL;
+
+ if (prog->instance.nr > 0 || prog->instance.fds) {
+ pr_warning("Can't set pre-processor after loading\n");
+ return -EINVAL;
+ }
+
+ instance_fds = malloc(sizeof(int) * nr_instance);
+ if (!instance_fds) {
+ pr_warning("alloc memory failed for instance of fds\n");
+ return -ENOMEM;
+ }
+
+ /* fill all fd with -1 */
+ memset(instance_fds, 0xff, sizeof(int) * nr_instance);
+
+ prog->instance.nr = nr_instance;
+ prog->instance.fds = instance_fds;
+ prog->preprocessor = prep;
+ return 0;
+}
+
+int bpf_program__nth_fd(struct bpf_program *prog, int n)
+{
+ int fd;
+
+ if (n >= prog->instance.nr || n < 0) {
+ pr_warning("Can't get the %dth fd from program %s: only %d instances\n",
+ n, prog->section_name, prog->instance.nr);
+ return -EINVAL;
+ }
+
+ fd = prog->instance.fds[n];
+ if (fd < 0) {
+ pr_warning("%dth instance of program '%s' is invalid\n",
+ n, prog->section_name);
+ return -ENOENT;
+ }
+
+ return fd;
}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ea8adc2..9fa7b09 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -65,6 +65,28 @@ const char *bpf_program__title(struct bpf_program *prog, bool dup);

int bpf_program__fd(struct bpf_program *prog);

+struct bpf_insn;
+struct bpf_prog_prep_result {
+ /*
+ * If not NULL, load new instruction array.
+ * If set to NULL, don't load this instance.
+ */
+ struct bpf_insn *new_insn_ptr;
+ int new_insn_cnt;
+
+ /* If not NULL, result fd is set to it */
+ int *pfd;
+};
+
+typedef int (*bpf_program_prep_t)(struct bpf_program *, int n,
+ struct bpf_insn *, int insn_cnt,
+ struct bpf_prog_prep_result *res);
+
+int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
+ bpf_program_prep_t prep);
+
+int bpf_program__nth_fd(struct bpf_program *prog, int n);
+
/*
* We don't need __attribute__((packed)) now since it is
* unnecessary for 'bpf_map_def' because they are all aligned.
--
2.1.0

2015-08-21 10:10:16

by Wang Nan

[permalink] [raw]
Subject: [PATCH 17/29] perf tools: Fix probe-event.h include

Commit 7b6ff0bdbf4f7f429c2116cca92a6d171217449e ("perf probe ppc64le:
Fixup function entry if using kallsyms lookup") adds 'struct map' into
probe-event.h but not include "util/map.h" in it. This patch fixes it.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/probe-event.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index ba981c5..9c7b99e 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -5,6 +5,7 @@
#include "intlist.h"
#include "strlist.h"
#include "strfilter.h"
+#include "map.h"

/* Probe related configurations */
struct probe_conf {
--
2.1.0

2015-08-21 10:10:27

by Wang Nan

[permalink] [raw]
Subject: [PATCH 18/29] perf probe: Reset args and nargs for probe_trace_event when failure

When failure occures in add_probe_trace_event(), args in
probe_trace_event is incomplete. Since information in it may be used
in further, this patch frees the allocated memory and set it to NULL
to avoid dangling pointer.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/probe-finder.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 7b80f8c..4b1e074 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -1243,6 +1243,10 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf)

end:
free(args);
+ if (ret) {
+ tev->nargs = 0;
+ zfree(&tev->args);
+ }
return ret;
}

--
2.1.0

2015-08-21 10:11:17

by Wang Nan

[permalink] [raw]
Subject: [PATCH 19/29] perf tools: Move linux/filter.h to tools/include

From: He Kuang <[email protected]>

This patch moves filter.h from include/linux/kernel.h to
tools/include/linux/filter.h to enable other libraries use macros in
it, like libbpf which will be introduced by further patches. Currenty,
the moved filter.h only contains the useful macros needed by libbpf
for not introducing too much dependence.

MANIFEST is also updated for 'make perf-*-src-pkg'.

One change:
imm field of BPF_EMIT_CALL becomes ((FUNC) - BPF_FUNC_unspec) to
suit user space code generator.

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/include/linux/filter.h | 237 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/MANIFEST | 1 +
2 files changed, 238 insertions(+)
create mode 100644 tools/include/linux/filter.h

diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
new file mode 100644
index 0000000..11d2b1c
--- /dev/null
+++ b/tools/include/linux/filter.h
@@ -0,0 +1,237 @@
+/*
+ * Linux Socket Filter Data Structures
+ */
+#ifndef __TOOLS_LINUX_FILTER_H
+#define __TOOLS_LINUX_FILTER_H
+
+#include <linux/bpf.h>
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1 BPF_REG_1
+#define BPF_REG_ARG2 BPF_REG_2
+#define BPF_REG_ARG3 BPF_REG_3
+#define BPF_REG_ARG4 BPF_REG_4
+#define BPF_REG_ARG5 BPF_REG_5
+#define BPF_REG_CTX BPF_REG_6
+#define BPF_REG_FP BPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A BPF_REG_0
+#define BPF_REG_X BPF_REG_7
+#define BPF_REG_TMP BPF_REG_8
+
+/* BPF program can access up to 512 bytes of stack space. */
+#define MAX_BPF_STACK 512
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Endianness conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
+
+#define BPF_ENDIAN(TYPE, DST, LEN) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_END | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = LEN })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_MOV32_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Short form of mov based on type,
+ * BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32
+ */
+
+#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
+
+#define BPF_LD_IND(SIZE, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_IND, \
+ .dst_reg = 0, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Conditional jumps against registers,
+ * if (dst_reg 'op' src_reg) goto pc + off16
+ */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Conditional jumps against immediates,
+ * if (dst_reg 'op' imm32) goto pc + off16
+ */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Function call */
+
+#define BPF_EMIT_CALL(FUNC) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_CALL, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((FUNC) - BPF_FUNC_unspec) })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = CODE, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN() \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_EXIT, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = 0 })
+
+#endif /* __TOOLS_LINUX_FILTER_H */
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index 3db15cf..21a5237 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -45,6 +45,7 @@ tools/include/linux/compiler.h
tools/include/linux/export.h
tools/include/linux/hash.h
tools/include/linux/kernel.h
+tools/include/linux/filter.h
tools/include/linux/list.h
tools/include/linux/log2.h
tools/include/linux/poison.h
--
2.1.0

2015-08-21 10:11:15

by Wang Nan

[permalink] [raw]
Subject: [PATCH 20/29] perf tools: Add BPF_PROLOGUE config options for further patches

If both LIBBPF and DWARF are detected, it is possible to create prologue
for eBPF programs to help them accessing kernel data. HAVE_BPF_PROLOGUE
and CONFIG_BPF_PROLOGUE is added as flags for this feature.

PERF_HAVE_ARCH_GET_REG_OFFSET indicates an architecture supports
converting name of a register to its offset in 'struct pt_regs'.
Without this support, BPF_PROLOGUE should be turned off.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/config/Makefile | 12 ++++++++++++
tools/perf/util/include/dwarf-regs.h | 7 +++++++
2 files changed, 19 insertions(+)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 38a4144..d46765b7 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -314,6 +314,18 @@ ifndef NO_LIBELF
CFLAGS += -DHAVE_LIBBPF_SUPPORT
$(call detected,CONFIG_LIBBPF)
endif
+
+ ifndef NO_DWARF
+ ifneq ($(origin PERF_HAVE_ARCH_GET_REG_INFO), undefined)
+ CFLAGS += -DHAVE_BPF_PROLOGUE
+ $(call detected,CONFIG_BPF_PROLOGUE)
+ else
+ msg := $(warning BPF prologue is not supported by architecture $(ARCH));
+ endif
+ else
+ msg := $(warning DWARF support is off, BPF prologue is disabled);
+ endif
+
endif # NO_LIBBPF
endif # NO_LIBELF

diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 8f14965..3dda083 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -5,4 +5,11 @@
const char *get_arch_regstr(unsigned int n);
#endif

+#ifdef HAVE_BPF_PROLOGUE
+/*
+ * Arch should support fetching the offset of a register in pt_regs
+ * by its name.
+ */
+int arch_get_reg_info(const char *name, int *offset);
+#endif
#endif
--
2.1.0

2015-08-21 10:15:45

by Wang Nan

[permalink] [raw]
Subject: [PATCH 21/29] perf tools: Introduce arch_get_reg_info() for x86

From: He Kuang <[email protected]>

arch_get_reg_info() is a helper function which converts register name
like "%rax" to offset of a register in 'struct pt_regs', which is
required by BPF prologue generator.

This patch replaces original string table by a 'struct reg_info' table,
which records offset of registers according to its name.

For x86, since there are two sub-archs (x86_32 and x86_64) but we can
only get pt_regs for the arch we are currently on, this patch fills
offset with '-1' for another sub-arch. This introduces a limitation to
perf prologue that, we are unable to generate prologue on a x86_32
compiled perf for BPF programs targeted on x86_64 kernel. This
limitation is acceptable, because this is a very rare usecase.

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: He Kuang <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/arch/x86/Makefile | 1 +
tools/perf/arch/x86/util/Build | 2 +
tools/perf/arch/x86/util/dwarf-regs.c | 104 ++++++++++++++++++++++++----------
3 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 21322e0..a84a6f6f 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -2,3 +2,4 @@ ifndef NO_DWARF
PERF_HAVE_DWARF_REGS := 1
endif
HAVE_KVM_STAT_SUPPORT := 1
+PERF_HAVE_ARCH_GET_REG_INFO := 1
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index a8be9f9..a3837f5 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,6 +3,8 @@ libperf-y += tsc.o
libperf-y += pmu.o
libperf-y += kvm-stat.o

+# BPF_PROLOGUE also need dwarf-regs.o. However, if CONFIG_BPF_PROLOGUE
+# is true, CONFIG_DWARF must true.
libperf-$(CONFIG_DWARF) += dwarf-regs.o

libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
diff --git a/tools/perf/arch/x86/util/dwarf-regs.c b/tools/perf/arch/x86/util/dwarf-regs.c
index be22dd4..9928caf 100644
--- a/tools/perf/arch/x86/util/dwarf-regs.c
+++ b/tools/perf/arch/x86/util/dwarf-regs.c
@@ -22,44 +22,67 @@

#include <stddef.h>
#include <dwarf-regs.h>
+#include <string.h>
+#include <linux/ptrace.h>
+#include <linux/kernel.h> /* for offsetof */
+#include <util/bpf-loader.h>
+
+struct reg_info {
+ const char *name; /* Reg string in debuginfo */
+ int offset; /* Reg offset in struct pt_regs */
+};

/*
* Generic dwarf analysis helpers
*/
-
+/*
+ * x86_64 compiling can't access pt_regs for x86_32, so fill offset
+ * with -1.
+ */
+#ifdef __x86_64__
+# define REG_INFO(n, f) { .name = n, .offset = -1, }
+#else
+# define REG_INFO(n, f) { .name = n, .offset = offsetof(struct pt_regs, f), }
+#endif
#define X86_32_MAX_REGS 8
-const char *x86_32_regs_table[X86_32_MAX_REGS] = {
- "%ax",
- "%cx",
- "%dx",
- "%bx",
- "$stack", /* Stack address instead of %sp */
- "%bp",
- "%si",
- "%di",
+
+struct reg_info x86_32_regs_table[X86_32_MAX_REGS] = {
+ REG_INFO("%ax", eax),
+ REG_INFO("%cx", ecx),
+ REG_INFO("%dx", edx),
+ REG_INFO("%bx", ebx),
+ REG_INFO("$stack", esp), /* Stack address instead of %sp */
+ REG_INFO("%bp", ebp),
+ REG_INFO("%si", esi),
+ REG_INFO("%di", edi),
};

+#undef REG_INFO
+#ifdef __x86_64__
+# define REG_INFO(n, f) { .name = n, .offset = offsetof(struct pt_regs, f), }
+#else
+# define REG_INFO(n, f) { .name = n, .offset = -1, }
+#endif
#define X86_64_MAX_REGS 16
-const char *x86_64_regs_table[X86_64_MAX_REGS] = {
- "%ax",
- "%dx",
- "%cx",
- "%bx",
- "%si",
- "%di",
- "%bp",
- "%sp",
- "%r8",
- "%r9",
- "%r10",
- "%r11",
- "%r12",
- "%r13",
- "%r14",
- "%r15",
+struct reg_info x86_64_regs_table[X86_64_MAX_REGS] = {
+ REG_INFO("%ax", rax),
+ REG_INFO("%dx", rdx),
+ REG_INFO("%cx", rcx),
+ REG_INFO("%bx", rbx),
+ REG_INFO("%si", rsi),
+ REG_INFO("%di", rdi),
+ REG_INFO("%bp", rbp),
+ REG_INFO("%sp", rsp),
+ REG_INFO("%r8", r8),
+ REG_INFO("%r9", r9),
+ REG_INFO("%r10", r10),
+ REG_INFO("%r11", r11),
+ REG_INFO("%r12", r12),
+ REG_INFO("%r13", r13),
+ REG_INFO("%r14", r14),
+ REG_INFO("%r15", r15),
};

-/* TODO: switching by dwarf address size */
#ifdef __x86_64__
#define ARCH_MAX_REGS X86_64_MAX_REGS
#define arch_regs_table x86_64_regs_table
@@ -71,5 +94,28 @@ const char *x86_64_regs_table[X86_64_MAX_REGS] = {
/* Return architecture dependent register string (for kprobe-tracer) */
const char *get_arch_regstr(unsigned int n)
{
- return (n <= ARCH_MAX_REGS) ? arch_regs_table[n] : NULL;
+ return (n <= ARCH_MAX_REGS) ? arch_regs_table[n].name : NULL;
}
+
+#ifdef HAVE_BPF_PROLOGUE
+int arch_get_reg_info(const char *name, int *offset)
+{
+ int i;
+ struct reg_info *info;
+
+ if (!name || !offset)
+ return -1;
+
+ for (i = 0; i < ARCH_MAX_REGS; i++) {
+ info = &arch_regs_table[i];
+ if (strcmp(info->name, name) == 0) {
+ if (info->offset < 0)
+ return -1;
+ *offset = info->offset;
+ return 0;
+ }
+ }
+
+ return -1;
+}
+#endif
--
2.1.0

2015-08-21 10:10:55

by Wang Nan

[permalink] [raw]
Subject: [PATCH 22/29] perf tools: Add prologue for BPF programs for fetching arguments

This patch generates prologue for a BPF program which fetch arguments
for it. With this patch, the program can have arguments as follow:

SEC("lock_page=__lock_page page->flags")
int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
{
return 1;
}

This patch passes at most 3 arguments from r3, r4 and r5. r1 is still
the ctx pointer. r2 is used to indicate the successfulness of
dereferencing.

This patch uses r6 to hold ctx (struct pt_regs) and r7 to hold stack
pointer for result. Result of each arguments first store on stack:

low address
BPF_REG_FP - 24 ARG3
BPF_REG_FP - 16 ARG2
BPF_REG_FP - 8 ARG1
BPF_REG_FP
high address

Then loaded into r3, r4 and r5.

The output prologue for offn(...off2(off1(reg)))) should be:

r6 <- r1 // save ctx into a callee saved register
r7 <- fp
r7 <- r7 - stack_offset // pointer to result slot
/* load r3 with the offset in pt_regs of 'reg' */
(r7) <- r3 // make slot valid
r3 <- r3 + off1 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7 // result put onto stack
call probe_read // read unsafe pointer
jnei r0, 0, err // error checking
r3 <- (r7) // read result
r3 <- r3 + off2 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7
call probe_read
jnei r0, 0, err
...
/* load r2, r3, r4 from stack */
goto success
err:
r2 <- 1
/* load r3, r4, r5 with 0 */
goto usercode
success:
r2 <- 0
usercode:
r1 <- r6 // restore ctx
// original user code

If all of arguments reside in register (dereferencing is not
required), gen_prologue_fastpath() will be used to create
fast prologue:

r3 <- (r1 + offset of reg1)
r4 <- (r1 + offset of reg2)
r5 <- (r1 + offset of reg3)
r2 <- 0

P.S.

eBPF calling convention is defined as:

* r0 - return value from in-kernel function, and exit value
for eBPF program
* r1 - r5 - arguments from eBPF program to in-kernel function
* r6 - r9 - callee saved registers that in-kernel function will
preserve
* r10 - read-only frame pointer to access stack

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/Build | 1 +
tools/perf/util/bpf-prologue.c | 442 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-prologue.h | 34 ++++
3 files changed, 477 insertions(+)
create mode 100644 tools/perf/util/bpf-prologue.c
create mode 100644 tools/perf/util/bpf-prologue.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 4d944d4..5b4d032 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -83,6 +83,7 @@ libperf-$(CONFIG_AUXTRACE) += intel-pt.o
libperf-y += parse-branch-options.o

libperf-$(CONFIG_LIBBPF) += bpf-loader.o
+libperf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-file.o
libperf-$(CONFIG_LIBELF) += probe-event.o
diff --git a/tools/perf/util/bpf-prologue.c b/tools/perf/util/bpf-prologue.c
new file mode 100644
index 0000000..2a5f4c7
--- /dev/null
+++ b/tools/perf/util/bpf-prologue.c
@@ -0,0 +1,442 @@
+/*
+ * bpf-prologue.c
+ *
+ * Copyright (C) 2015 He Kuang <[email protected]>
+ * Copyright (C) 2015 Huawei Inc.
+ */
+
+#include <bpf/libbpf.h>
+#include "perf.h"
+#include "debug.h"
+#include "bpf-prologue.h"
+#include "probe-finder.h"
+#include <dwarf-regs.h>
+#include <linux/filter.h>
+
+#define BPF_REG_SIZE 8
+
+#define JMP_TO_ERROR_CODE -1
+#define JMP_TO_SUCCESS_CODE -2
+#define JMP_TO_USER_CODE -3
+
+struct bpf_insn_pos {
+ struct bpf_insn *begin;
+ struct bpf_insn *end;
+ struct bpf_insn *pos;
+};
+
+static inline int
+pos_get_cnt(struct bpf_insn_pos *pos)
+{
+ return pos->pos - pos->begin;
+}
+
+static int
+append_insn(struct bpf_insn new_insn, struct bpf_insn_pos *pos)
+{
+ if (!pos->pos)
+ return -ERANGE;
+
+ if (pos->pos + 1 >= pos->end) {
+ pr_err("bpf prologue: prologue too long\n");
+ pos->pos = NULL;
+ return -ERANGE;
+ }
+
+ *(pos->pos)++ = new_insn;
+ return 0;
+}
+
+static int
+check_pos(struct bpf_insn_pos *pos)
+{
+ if (!pos->pos || pos->pos >= pos->end)
+ return -ERANGE;
+ return 0;
+}
+
+/* Give it a shorter name */
+#define ins(i, p) append_insn((i), (p))
+
+/*
+ * Give a register name (in 'reg'), generate instruction to
+ * load register into an eBPF register rd:
+ * 'ldd target_reg, offset(ctx_reg)', where:
+ * ctx_reg is pre initialized to pointer of 'struct pt_regs'.
+ */
+static int
+gen_ldx_reg_from_ctx(struct bpf_insn_pos *pos, int ctx_reg,
+ const char *reg, int target_reg)
+{
+ int offset;
+
+ if (arch_get_reg_info(reg, &offset)) {
+ pr_err("bpf: prologue: failed to get register %s\n",
+ reg);
+ return -1;
+ }
+ ins(BPF_LDX_MEM(BPF_DW, target_reg, ctx_reg, offset), pos);
+
+ if (check_pos(pos))
+ return -ERANGE;
+ return 0;
+}
+
+/*
+ * Generate a BPF_FUNC_probe_read function call.
+ *
+ * src_base_addr_reg is a register holding base address,
+ * dst_addr_reg is a register holding dest address (on stack),
+ * result is:
+ *
+ * *[dst_addr_reg] = *([src_base_addr_reg] + offset)
+ *
+ * Arguments of BPF_FUNC_probe_read:
+ * ARG1: ptr to stack (dest)
+ * ARG2: size (8)
+ * ARG3: unsafe ptr (src)
+ */
+static int
+gen_read_mem(struct bpf_insn_pos *pos,
+ int src_base_addr_reg,
+ int dst_addr_reg,
+ long offset)
+{
+ /* mov arg3, src_base_addr_reg */
+ if (src_base_addr_reg != BPF_REG_ARG3)
+ ins(BPF_MOV64_REG(BPF_REG_ARG3, src_base_addr_reg), pos);
+ /* add arg3, #offset */
+ if (offset)
+ ins(BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG3, offset), pos);
+
+ /* mov arg2, #reg_size */
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_REG_ARG2, BPF_REG_SIZE), pos);
+
+ /* mov arg1, dst_addr_reg */
+ if (dst_addr_reg != BPF_REG_ARG1)
+ ins(BPF_MOV64_REG(BPF_REG_ARG1, dst_addr_reg), pos);
+
+ /* Call probe_read */
+ ins(BPF_EMIT_CALL(BPF_FUNC_probe_read), pos);
+ /*
+ * Error processing: if read fail, goto error code,
+ * will be relocated. Target should be the start of
+ * error processing code.
+ */
+ ins(BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, JMP_TO_ERROR_CODE),
+ pos);
+
+ if (check_pos(pos))
+ return -ERANGE;
+ return 0;
+}
+
+/*
+ * Each arg should be bare register. Fetch and save them into argument
+ * registers (r3 - r5).
+ *
+ * BPF_REG_1 should have been initialized with pointer to
+ * 'struct pt_regs'.
+ */
+static int
+gen_prologue_fastpath(struct bpf_insn_pos *pos,
+ struct probe_trace_arg *args, int nargs)
+{
+ int i;
+
+ for (i = 0; i < nargs; i++)
+ if (gen_ldx_reg_from_ctx(pos, BPF_REG_1, args[i].value,
+ BPF_PROLOGUE_START_ARG_REG + i))
+ goto errout;
+
+ if (check_pos(pos))
+ goto errout;
+ return 0;
+errout:
+ return -1;
+}
+
+/*
+ * Slow path:
+ * At least one argument has the form of 'offset($rx)'.
+ *
+ * Following code first stores them into stack, then loads all of then
+ * to r2 - r5.
+ * Before final loading, the final result should be:
+ *
+ * low address
+ * BPF_REG_FP - 24 ARG3
+ * BPF_REG_FP - 16 ARG2
+ * BPF_REG_FP - 8 ARG1
+ * BPF_REG_FP
+ * high address
+ *
+ * For each argument (described as: offn(...off2(off1(reg)))),
+ * generates following code:
+ *
+ * r7 <- fp
+ * r7 <- r7 - stack_offset // Ideal code should initialize r7 using
+ * // fp before generating args. However,
+ * // eBPF won't regard r7 as stack pointer
+ * // if it is generated by minus 8 from
+ * // another stack pointer except fp.
+ * // This is why we have to set r7
+ * // to fp for each variable.
+ * r3 <- value of 'reg'-> generated using gen_ldx_reg_from_ctx()
+ * (r7) <- r3 // skip following instructions for bare reg
+ * r3 <- r3 + off1 . // skip if off1 == 0
+ * r2 <- 8 \
+ * r1 <- r7 |-> generated by gen_read_mem()
+ * call probe_read /
+ * jnei r0, 0, err ./
+ * r3 <- (r7)
+ * r3 <- r3 + off2 . // skip if off2 == 0
+ * r2 <- 8 \ // r2 may be broken by probe_read, so set again
+ * r1 <- r7 |-> generated by gen_read_mem()
+ * call probe_read /
+ * jnei r0, 0, err ./
+ * ...
+ */
+static int
+gen_prologue_slowpath(struct bpf_insn_pos *pos,
+ struct probe_trace_arg *args, int nargs)
+{
+ int i;
+
+ for (i = 0; i < nargs; i++) {
+ struct probe_trace_arg *arg = &args[i];
+ const char *reg = arg->value;
+ struct probe_trace_arg_ref *ref = NULL;
+ int stack_offset = (i + 1) * -8;
+
+ pr_debug("prologue: fetch arg %d, base reg is %s\n",
+ i, reg);
+
+ /* value of base register is stored into ARG3 */
+ if (gen_ldx_reg_from_ctx(pos, BPF_REG_CTX, reg,
+ BPF_REG_ARG3)) {
+ pr_err("prologue: failed to get offset of register %s\n",
+ reg);
+ goto errout;
+ }
+
+ /* Make r7 the stack pointer. */
+ ins(BPF_MOV64_REG(BPF_REG_7, BPF_REG_FP), pos);
+ /* r7 += -8 */
+ ins(BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, stack_offset), pos);
+ /*
+ * Store r3 (base register) onto stack
+ * Ensure fp[offset] is set.
+ * fp is the only valid base register when storing
+ * into stack. We are not allowed to use r7 as base
+ * register here.
+ */
+ ins(BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_ARG3,
+ stack_offset), pos);
+
+ ref = arg->ref;
+ while (ref) {
+ pr_debug("prologue: arg %d: offset %ld\n",
+ i, ref->offset);
+ if (gen_read_mem(pos, BPF_REG_3, BPF_REG_7,
+ ref->offset)) {
+ pr_err("prologue: failed to generate probe_read function call\n");
+ goto errout;
+ }
+
+ ref = ref->next;
+ /*
+ * Load previous result into ARG3. Use
+ * BPF_REG_FP instead of r7 because verifier
+ * allows FP based addressing only.
+ */
+ if (ref)
+ ins(BPF_LDX_MEM(BPF_DW, BPF_REG_ARG3,
+ BPF_REG_FP, stack_offset), pos);
+ }
+ }
+
+ /* Final pass: read to registers */
+ for (i = 0; i < nargs; i++)
+ ins(BPF_LDX_MEM(BPF_DW, BPF_PROLOGUE_START_ARG_REG + i,
+ BPF_REG_FP, -BPF_REG_SIZE * (i + 1)), pos);
+
+ ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_SUCCESS_CODE), pos);
+
+ if (check_pos(pos))
+ goto errout;
+ return 0;
+errout:
+ return -1;
+}
+
+static int
+prologue_relocate(struct bpf_insn_pos *pos, struct bpf_insn *error_code,
+ struct bpf_insn *success_code, struct bpf_insn *user_code)
+{
+ struct bpf_insn *insn;
+
+ if (check_pos(pos))
+ return -ERANGE;
+
+ for (insn = pos->begin; insn < pos->pos; insn++) {
+ u8 class = BPF_CLASS(insn->code);
+ u8 opcode;
+
+ if (class != BPF_JMP)
+ continue;
+ opcode = BPF_OP(insn->code);
+ if (opcode == BPF_CALL)
+ continue;
+
+ switch (insn->off) {
+ case JMP_TO_ERROR_CODE:
+ insn->off = error_code - (insn + 1);
+ break;
+ case JMP_TO_SUCCESS_CODE:
+ insn->off = success_code - (insn + 1);
+ break;
+ case JMP_TO_USER_CODE:
+ insn->off = user_code - (insn + 1);
+ break;
+ default:
+ pr_err("bpf prologue: internal error: relocation failed\n");
+ return -1;
+ }
+ }
+ return 0;
+}
+
+int bpf__gen_prologue(struct probe_trace_arg *args, int nargs,
+ struct bpf_insn *new_prog, size_t *new_cnt,
+ size_t cnt_space)
+{
+ struct bpf_insn *success_code = NULL;
+ struct bpf_insn *error_code = NULL;
+ struct bpf_insn *user_code = NULL;
+ struct bpf_insn_pos pos;
+ bool fastpath = true;
+ int i;
+
+ if (!new_prog || !new_cnt)
+ return -EINVAL;
+
+ pos.begin = new_prog;
+ pos.end = new_prog + cnt_space;
+ pos.pos = new_prog;
+
+ if (!nargs) {
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 0),
+ &pos);
+
+ if (check_pos(&pos))
+ goto errout;
+
+ *new_cnt = pos_get_cnt(&pos);
+ return 0;
+ }
+
+ if (nargs > BPF_PROLOGUE_MAX_ARGS)
+ nargs = BPF_PROLOGUE_MAX_ARGS;
+ if (cnt_space > BPF_MAXINSNS)
+ cnt_space = BPF_MAXINSNS;
+
+ /* First pass: validation */
+ for (i = 0; i < nargs; i++) {
+ struct probe_trace_arg_ref *ref = args[i].ref;
+
+ if (args[i].value[0] == '@') {
+ /* TODO: fetch global variable */
+ pr_err("bpf: prologue: global %s%+ld not support\n",
+ args[i].value, ref ? ref->offset : 0);
+ return -ENOTSUP;
+ }
+
+ while (ref) {
+ /* fastpath is true if all args has ref == NULL */
+ fastpath = false;
+
+ /*
+ * Instruction encodes immediate value using
+ * s32, ref->offset is long. On systems which
+ * can't fill long in s32, refuse to process if
+ * ref->offset too large (or small).
+ */
+#ifdef __LP64__
+#define OFFSET_MAX ((1LL << 31) - 1)
+#define OFFSET_MIN ((1LL << 31) * -1)
+ if (ref->offset > OFFSET_MAX ||
+ ref->offset < OFFSET_MIN) {
+ pr_err("bpf: prologue: offset out of bound: %ld\n",
+ ref->offset);
+ return -E2BIG;
+ }
+#endif
+ ref = ref->next;
+ }
+ }
+ pr_debug("prologue: pass validation\n");
+
+ if (fastpath) {
+ /* If all variables are registers... */
+ pr_debug("prologue: fast path\n");
+ if (gen_prologue_fastpath(&pos, args, nargs))
+ goto errout;
+ } else {
+ pr_debug("prologue: slow path\n");
+
+ /* Initialization: move ctx to a callee saved register. */
+ ins(BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1), &pos);
+
+ if (gen_prologue_slowpath(&pos, args, nargs))
+ goto errout;
+ /*
+ * start of ERROR_CODE (only slow pass needs error code)
+ * mov r2 <- 1
+ * goto usercode
+ */
+ error_code = pos.pos;
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 1),
+ &pos);
+
+ for (i = 0; i < nargs; i++)
+ ins(BPF_ALU64_IMM(BPF_MOV,
+ BPF_PROLOGUE_START_ARG_REG + i,
+ 0),
+ &pos);
+ ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_USER_CODE),
+ &pos);
+ }
+
+ /*
+ * start of SUCCESS_CODE:
+ * mov r2 <- 0
+ * goto usercode // skip
+ */
+ success_code = pos.pos;
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 0), &pos);
+
+ /*
+ * start of USER_CODE:
+ * Restore ctx to r1
+ */
+ user_code = pos.pos;
+ if (!fastpath) {
+ /*
+ * Only slow path needs restoring of ctx. In fast path,
+ * register are loaded directly from r1.
+ */
+ ins(BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_CTX), &pos);
+ if (prologue_relocate(&pos, error_code, success_code,
+ user_code))
+ goto errout;
+ }
+
+ if (check_pos(&pos))
+ goto errout;
+
+ *new_cnt = pos_get_cnt(&pos);
+ return 0;
+errout:
+ return -ERANGE;
+}
diff --git a/tools/perf/util/bpf-prologue.h b/tools/perf/util/bpf-prologue.h
new file mode 100644
index 0000000..f1e4c5d
--- /dev/null
+++ b/tools/perf/util/bpf-prologue.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2015, He Kuang <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ */
+#ifndef __BPF_PROLOGUE_H
+#define __BPF_PROLOGUE_H
+
+#include <linux/compiler.h>
+#include <linux/filter.h>
+#include "probe-event.h"
+
+#define BPF_PROLOGUE_MAX_ARGS 3
+#define BPF_PROLOGUE_START_ARG_REG BPF_REG_3
+#define BPF_PROLOGUE_FETCH_RESULT_REG BPF_REG_2
+
+#ifdef HAVE_BPF_PROLOGUE
+int bpf__gen_prologue(struct probe_trace_arg *args, int nargs,
+ struct bpf_insn *new_prog, size_t *new_cnt,
+ size_t cnt_space);
+#else
+static inline int
+bpf__gen_prologue(struct probe_trace_arg *args __maybe_unused,
+ int nargs __maybe_unused,
+ struct bpf_insn *new_prog __maybe_unused,
+ size_t *new_cnt,
+ size_t cnt_space __maybe_unused)
+{
+ if (!new_cnt)
+ return -EINVAL;
+ *new_cnt = 0;
+ return 0;
+}
+#endif
+#endif /* __BPF_PROLOGUE_H */
--
2.1.0

2015-08-21 10:13:01

by Wang Nan

[permalink] [raw]
Subject: [PATCH 23/29] perf tools: Generate prologue for BPF programs

This patch generates prologue for each 'struct probe_trace_event' for
fetching arguments for BPF programs.

After bpf__probe(), iterate over each programs to check whether
prologue is required. If none of 'struct perf_probe_event' a program
will attach to has at least one argument, simply skip preprocessor
hooking. For those who prologue is required, calls bpf__gen_prologue()
and paste original instruction after prologue.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 120 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 119 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index f4693b6..10fbd33 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -5,10 +5,13 @@
* Copyright (C) 2015 Huawei Inc.
*/

+#include <linux/bpf.h>
#include <bpf/libbpf.h>
#include "perf.h"
#include "debug.h"
#include "bpf-loader.h"
+#include "bpf-prologue.h"
+#include "llvm-utils.h"
#include "probe-event.h"
#include "probe-finder.h"
#include "llvm-utils.h"
@@ -42,6 +45,8 @@ struct bpf_prog_priv {
struct perf_probe_event *ppev;
struct perf_probe_event pev;
};
+ bool need_prologue;
+ struct bpf_insn *insns_buf;
};

static void
@@ -53,6 +58,7 @@ bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
/* check if pev is initialized */
if (priv && priv->pev_ready)
clear_perf_probe_event(&priv->pev);
+ zfree(&priv->insns_buf);
free(priv);
}

@@ -238,6 +244,103 @@ int bpf__unprobe(void)
return ret < 0 ? ret : 0;
}

+static int
+preproc_gen_prologue(struct bpf_program *prog, int n,
+ struct bpf_insn *orig_insns, int orig_insns_cnt,
+ struct bpf_prog_prep_result *res)
+{
+ struct probe_trace_event *tev;
+ struct perf_probe_event *pev;
+ struct bpf_prog_priv *priv;
+ struct bpf_insn *buf;
+ size_t prologue_cnt = 0;
+ int err;
+
+ err = bpf_program__get_private(prog, (void **)&priv);
+ if (err || !priv || !priv->pev_ready)
+ goto errout;
+
+ pev = &priv->pev;
+
+ if (n < 0 || n >= pev->ntevs)
+ goto errout;
+
+ tev = &pev->tevs[n];
+
+ buf = priv->insns_buf;
+ err = bpf__gen_prologue(tev->args, tev->nargs,
+ buf, &prologue_cnt,
+ BPF_MAXINSNS - orig_insns_cnt);
+ if (err) {
+ const char *title;
+
+ title = bpf_program__title(prog, false);
+ if (!title)
+ title = "??";
+
+ pr_debug("Failed to generate prologue for program %s\n",
+ title);
+ return err;
+ }
+
+ memcpy(&buf[prologue_cnt], orig_insns,
+ sizeof(struct bpf_insn) * orig_insns_cnt);
+
+ res->new_insn_ptr = buf;
+ res->new_insn_cnt = prologue_cnt + orig_insns_cnt;
+ res->pfd = NULL;
+ return 0;
+
+errout:
+ pr_debug("Internal error in preproc_gen_prologue\n");
+ return -EINVAL;
+}
+
+static int hook_load_preprocessor(struct bpf_program *prog)
+{
+ struct perf_probe_event *pev;
+ struct bpf_prog_priv *priv;
+ bool need_prologue = false;
+ int err, i;
+
+ err = bpf_program__get_private(prog, (void **)&priv);
+ if (err || !priv) {
+ pr_debug("Internal error when hook preprocessor\n");
+ return -EINVAL;
+ }
+
+ pev = &priv->pev;
+ for (i = 0; i < pev->ntevs; i++) {
+ struct probe_trace_event *tev = &pev->tevs[i];
+
+ if (tev->nargs > 0) {
+ need_prologue = true;
+ break;
+ }
+ }
+
+ /*
+ * Since all tev doesn't have argument, we don't need generate
+ * prologue.
+ */
+ if (!need_prologue) {
+ priv->need_prologue = false;
+ return 0;
+ }
+
+ priv->need_prologue = true;
+ priv->insns_buf = malloc(sizeof(struct bpf_insn) *
+ BPF_MAXINSNS);
+ if (!priv->insns_buf) {
+ pr_debug("No enough memory: alloc insns_buf failed\n");
+ return -ENOMEM;
+ }
+
+ err = bpf_program__set_prep(prog, pev->ntevs,
+ preproc_gen_prologue);
+ return err;
+}
+
int bpf__probe(void)
{
int err, nr_events = 0;
@@ -288,6 +391,17 @@ int bpf__probe(void)
err = sync_bpf_program_pev(prog);
if (err)
goto out;
+ /*
+ * After probing, let's consider prologue, which
+ * adds program fetcher to BPF programs.
+ *
+ * hook_load_preprocessorr() hooks pre-processor
+ * to bpf_program, let it generate prologue
+ * dynamically during loading.
+ */
+ err = hook_load_preprocessor(prog);
+ if (err)
+ goto out;
}
}
out:
@@ -342,7 +456,11 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
for (i = 0; i < pev->ntevs; i++) {
tev = &pev->tevs[i];

- fd = bpf_program__fd(prog);
+ if (priv->need_prologue)
+ fd = bpf_program__nth_fd(prog, i);
+ else
+ fd = bpf_program__fd(prog);
+
if (fd < 0) {
pr_debug("bpf: failed to get file descriptor\n");
return fd;
--
2.1.0

2015-08-21 10:16:29

by Wang Nan

[permalink] [raw]
Subject: [PATCH 24/29] perf tools: Use same BPF program if arguments are identical

This patch allows creating only one BPF program for different
'probe_trace_event'(tev) generated by one 'perf_probe_event'(pev), if
their prologues are identical.

This is done by comparing argument list of different tev, and maps type
of prologue and tev using a mapping array. This patch utilizes qsort to
sort tevs. After sorting, tevs with identical argument list will group
together.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 133 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 126 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 10fbd33..68c059b 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -47,6 +47,8 @@ struct bpf_prog_priv {
};
bool need_prologue;
struct bpf_insn *insns_buf;
+ int nr_types;
+ int *type_mapping;
};

static void
@@ -59,6 +61,7 @@ bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
if (priv && priv->pev_ready)
clear_perf_probe_event(&priv->pev);
zfree(&priv->insns_buf);
+ zfree(&priv->type_mapping);
free(priv);
}

@@ -254,7 +257,7 @@ preproc_gen_prologue(struct bpf_program *prog, int n,
struct bpf_prog_priv *priv;
struct bpf_insn *buf;
size_t prologue_cnt = 0;
- int err;
+ int i, err;

err = bpf_program__get_private(prog, (void **)&priv);
if (err || !priv || !priv->pev_ready)
@@ -262,10 +265,20 @@ preproc_gen_prologue(struct bpf_program *prog, int n,

pev = &priv->pev;

- if (n < 0 || n >= pev->ntevs)
+ if (n < 0 || n >= priv->nr_types)
goto errout;

- tev = &pev->tevs[n];
+ /* Find a tev belongs to that type */
+ for (i = 0; i < pev->ntevs; i++)
+ if (priv->type_mapping[i] == n)
+ break;
+
+ if (i >= pev->ntevs) {
+ pr_debug("Internal error: prologue type %d not found\n", n);
+ return -ENOENT;
+ }
+
+ tev = &pev->tevs[i];

buf = priv->insns_buf;
err = bpf__gen_prologue(tev->args, tev->nargs,
@@ -296,6 +309,98 @@ errout:
return -EINVAL;
}

+/*
+ * compare_tev_args is reflexive, transitive and antisymmetric.
+ * I can show that but this margin is too narrow to contain.
+ */
+static int compare_tev_args(const void *ptev1, const void *ptev2)
+{
+ int i, ret;
+ const struct probe_trace_event *tev1 =
+ *(const struct probe_trace_event **)ptev1;
+ const struct probe_trace_event *tev2 =
+ *(const struct probe_trace_event **)ptev2;
+
+ ret = tev2->nargs - tev1->nargs;
+ if (ret)
+ return ret;
+
+ for (i = 0; i < tev1->nargs; i++) {
+ struct probe_trace_arg *arg1, *arg2;
+ struct probe_trace_arg_ref *ref1, *ref2;
+
+ arg1 = &tev1->args[i];
+ arg2 = &tev2->args[i];
+
+ ret = strcmp(arg1->value, arg2->value);
+ if (ret)
+ return ret;
+
+ ref1 = arg1->ref;
+ ref2 = arg2->ref;
+
+ while (ref1 && ref2) {
+ ret = ref2->offset - ref1->offset;
+ if (ret)
+ return ret;
+
+ ref1 = ref1->next;
+ ref2 = ref2->next;
+ }
+
+ if (ref1 || ref2)
+ return ref2 ? 1 : -1;
+ }
+
+ return 0;
+}
+
+static int map_prologue(struct perf_probe_event *pev, int *mapping,
+ int *nr_types)
+{
+ int i, type = 0;
+ struct {
+ struct probe_trace_event *tev;
+ int idx;
+ } *stevs;
+ size_t array_sz = sizeof(*stevs) * pev->ntevs;
+
+ stevs = malloc(array_sz);
+ if (!stevs) {
+ pr_debug("No ehough memory: alloc stevs failed\n");
+ return -ENOMEM;
+ }
+
+ pr_debug("In map_prologue, ntevs=%d\n", pev->ntevs);
+ for (i = 0; i < pev->ntevs; i++) {
+ stevs[i].tev = &pev->tevs[i];
+ stevs[i].idx = i;
+ }
+ qsort(stevs, pev->ntevs, sizeof(*stevs),
+ compare_tev_args);
+
+ for (i = 0; i < pev->ntevs; i++) {
+ if (i == 0) {
+ mapping[stevs[i].idx] = type;
+ pr_debug("mapping[%d]=%d\n", stevs[i].idx,
+ type);
+ continue;
+ }
+
+ if (compare_tev_args(stevs + i, stevs + i - 1) == 0)
+ mapping[stevs[i].idx] = type;
+ else
+ mapping[stevs[i].idx] = ++type;
+
+ pr_debug("mapping[%d]=%d\n", stevs[i].idx,
+ mapping[stevs[i].idx]);
+ }
+ free(stevs);
+ *nr_types = type + 1;
+
+ return 0;
+}
+
static int hook_load_preprocessor(struct bpf_program *prog)
{
struct perf_probe_event *pev;
@@ -336,7 +441,19 @@ static int hook_load_preprocessor(struct bpf_program *prog)
return -ENOMEM;
}

- err = bpf_program__set_prep(prog, pev->ntevs,
+ priv->type_mapping = malloc(sizeof(int) * pev->ntevs);
+ if (!priv->type_mapping) {
+ pr_debug("No enough memory: alloc type_mapping failed\n");
+ return -ENOMEM;
+ }
+ memset(priv->type_mapping, 0xff,
+ sizeof(int) * pev->ntevs);
+
+ err = map_prologue(pev, priv->type_mapping, &priv->nr_types);
+ if (err)
+ return err;
+
+ err = bpf_program__set_prep(prog, priv->nr_types,
preproc_gen_prologue);
return err;
}
@@ -456,9 +573,11 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
for (i = 0; i < pev->ntevs; i++) {
tev = &pev->tevs[i];

- if (priv->need_prologue)
- fd = bpf_program__nth_fd(prog, i);
- else
+ if (priv->need_prologue) {
+ int type = priv->type_mapping[i];
+
+ fd = bpf_program__nth_fd(prog, type);
+ } else
fd = bpf_program__fd(prog);

if (fd < 0) {
--
2.1.0

2015-08-21 10:15:35

by Wang Nan

[permalink] [raw]
Subject: [PATCH 25/29] perf record: Support custom vmlinux path

From: He Kuang <[email protected]>

Make perf-record command support --vmlinux option if BPF_PROLOGUE is on.

'perf record' needs vmlinux as the source of DWARF info to generate
prologue for BPF programs, so path of vmlinux should be specified.

Short name 'k' has been taken by 'clockid'. This patch skips the short
option name and use '--vmlinux' for vmlinux path.

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c7353ba..c5e1080 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1100,6 +1100,10 @@ struct option __record_options[] = {
"clang binary to use for compiling BPF scriptlets"),
OPT_STRING(0, "clang-opt", &llvm_param.clang_opt, "clang options",
"options passed to clang when compiling BPF scriptlets"),
+#ifdef HAVE_BPF_PROLOGUE
+ OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name,
+ "file", "vmlinux pathname"),
+#endif
#endif
OPT_END()
};
--
2.1.0

2015-08-21 10:10:24

by Wang Nan

[permalink] [raw]
Subject: [PATCH 26/29] perf probe: Init symbol as kprobe

Before this patch, add_perf_probe_events() init symbol maps only for
uprobe if the first 'struct perf_probe_event' passed to it is a uprobe
event. This is a trick because 'perf probe''s command line syntax
constrains the first elements of the probe_event arrays must be kprobes
if there is one kprobe there.

However, with the incoming BPF uprobe support, that constrain is not
hold since 'perf record' will also probe on k/u probes through BPF
object, and is possible to pass an array with kprobe but the first
element is uprobe.

This patch init symbol maps for kprobes even if all of events are
uprobes, because the extra cost should be small enough.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/probe-event.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index ffc0113..f173372 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2632,7 +2632,7 @@ int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
{
int i, ret;

- ret = init_symbol_maps(pevs->uprobes);
+ ret = init_symbol_maps(false);
if (ret < 0)
return ret;

--
2.1.0

2015-08-21 10:10:39

by Wang Nan

[permalink] [raw]
Subject: [PATCH 27/29] perf tools: Support attach BPF program on uprobe events

This patch appends new syntax to BPF object section name to support
probing at uprobe event. Now we can use BPF program like this:

SEC(
"target=/lib64/libc.so.6\n"
"libcwrite=__write"
)
int libcwrite(void *ctx)
{
return 1;
}

Where, in section name of a program, before the main config string,
we can use 'key=value' style options. Now the only option key "target"
is for uprobe probing.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 88 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 81 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 68c059b..4d6cdb6 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -66,6 +66,84 @@ bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
}

static int
+do_config(const char *key, const char *value,
+ struct perf_probe_event *pev)
+{
+ pr_debug("config bpf program: %s=%s\n", key, value);
+ if (strcmp(key, "target") == 0) {
+ pev->uprobes = true;
+ pev->target = strdup(value);
+ return 0;
+ }
+
+ pr_warning("BPF: WARNING: invalid config option in object: %s=%s\n",
+ key, value);
+ pr_warning("\tHint: Currently only valid option is 'target=<file>'\n");
+ return 0;
+}
+
+static const char *
+parse_config_kvpair(const char *config_str, struct perf_probe_event *pev)
+{
+ char *text = strdup(config_str);
+ char *sep, *line;
+ const char *main_str = NULL;
+ int err = 0;
+
+ if (!text) {
+ pr_debug("No enough memory: dup config_str failed\n");
+ return NULL;
+ }
+
+ line = text;
+ while ((sep = strchr(line, '\n'))) {
+ char *equ;
+
+ *sep = '\0';
+ equ = strchr(line, '=');
+ if (!equ) {
+ pr_warning("WARNING: invalid config in BPF object: %s\n",
+ line);
+ pr_warning("\tShould be 'key=value'.\n");
+ goto nextline;
+ }
+ *equ = '\0';
+
+ err = do_config(line, equ + 1, pev);
+ if (err)
+ break;
+nextline:
+ line = sep + 1;
+ }
+
+ if (!err)
+ main_str = config_str + (line - text);
+ free(text);
+
+ return main_str;
+}
+
+static int
+parse_config(const char *config_str, struct perf_probe_event *pev)
+{
+ const char *main_str;
+ int err;
+
+ main_str = parse_config_kvpair(config_str, pev);
+ if (!main_str)
+ return -EINVAL;
+
+ err = parse_perf_probe_command(main_str, pev);
+ if (err < 0) {
+ pr_debug("bpf: '%s' is not a valid config string\n",
+ config_str);
+ /* parse failed, don't need clear pev. */
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
{
struct bpf_prog_priv *priv = NULL;
@@ -79,13 +157,9 @@ config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
}

pr_debug("bpf: config program '%s'\n", config_str);
- err = parse_perf_probe_command(config_str, pev);
- if (err < 0) {
- pr_debug("bpf: '%s' is not a valid config string\n",
- config_str);
- /* parse failed, don't need clear pev. */
- return -EINVAL;
- }
+ err = parse_config(config_str, pev);
+ if (err)
+ return err;

if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
pr_debug("bpf: '%s': group for event is set and not '%s'.\n",
--
2.1.0

2015-08-21 10:13:51

by Wang Nan

[permalink] [raw]
Subject: [PATCH 28/29] tools lib traceevent: Support function __get_dynamic_array_len

From: He Kuang <[email protected]>

Support helper function __get_dynamic_array_len() in libtraceevent,
this function is used accompany with __print_array() or __print_hex(),
but currently it is not an available function in the function list of
process_function().

The total allocated length of the dynamic array is embedded in the top
half of __data_loc_##item field. This patch adds new arg type
PRINT_DYNAMIC_ARRAY_LEN to return the length to eval_num_arg(),

Signed-off-by: He Kuang <[email protected]>
---
tools/lib/traceevent/event-parse.c | 56 +++++++++++++++++++++-
tools/lib/traceevent/event-parse.h | 1 +
.../perf/util/scripting-engines/trace-event-perl.c | 1 +
.../util/scripting-engines/trace-event-python.c | 1 +
4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index fcd8a9e..4888674 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -848,6 +848,7 @@ static void free_arg(struct print_arg *arg)
free(arg->bitmask.bitmask);
break;
case PRINT_DYNAMIC_ARRAY:
+ case PRINT_DYNAMIC_ARRAY_LEN:
free(arg->dynarray.index);
break;
case PRINT_OP:
@@ -2720,6 +2721,42 @@ process_dynamic_array(struct event_format *event, struct print_arg *arg, char **
}

static enum event_type
+process_dynamic_array_len(struct event_format *event, struct print_arg *arg,
+ char **tok)
+{
+ struct format_field *field;
+ enum event_type type;
+ char *token;
+
+ if (read_expect_type(EVENT_ITEM, &token) < 0)
+ goto out_free;
+
+ arg->type = PRINT_DYNAMIC_ARRAY_LEN;
+
+ /* Find the field */
+ field = pevent_find_field(event, token);
+ if (!field)
+ goto out_free;
+
+ arg->dynarray.field = field;
+ arg->dynarray.index = 0;
+
+ if (read_expected(EVENT_DELIM, ")") < 0)
+ goto out_err;
+
+ type = read_token(&token);
+ *tok = token;
+
+ return type;
+
+ out_free:
+ free_token(token);
+ out_err:
+ *tok = NULL;
+ return EVENT_ERROR;
+}
+
+static enum event_type
process_paren(struct event_format *event, struct print_arg *arg, char **tok)
{
struct print_arg *item_arg;
@@ -2966,6 +3003,10 @@ process_function(struct event_format *event, struct print_arg *arg,
free_token(token);
return process_dynamic_array(event, arg, tok);
}
+ if (strcmp(token, "__get_dynamic_array_len") == 0) {
+ free_token(token);
+ return process_dynamic_array_len(event, arg, tok);
+ }

func = find_func_handler(event->pevent, token);
if (func) {
@@ -3646,14 +3687,25 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg
goto out_warning_op;
}
break;
+ case PRINT_DYNAMIC_ARRAY_LEN:
+ offset = pevent_read_number(pevent,
+ data + arg->dynarray.field->offset,
+ arg->dynarray.field->size);
+ /*
+ * The total allocated length of the dynamic array is
+ * stored in the top half of the field, and the offset
+ * is in the bottom half of the 32 bit field.
+ */
+ val = (unsigned long long)(offset >> 16);
+ break;
case PRINT_DYNAMIC_ARRAY:
/* Without [], we pass the address to the dynamic data */
offset = pevent_read_number(pevent,
data + arg->dynarray.field->offset,
arg->dynarray.field->size);
/*
- * The actual length of the dynamic array is stored
- * in the top half of the field, and the offset
+ * The total allocated length of the dynamic array is
+ * stored in the top half of the field, and the offset
* is in the bottom half of the 32 bit field.
*/
offset &= 0xffff;
diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
index 204befb..6fc83c7 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -294,6 +294,7 @@ enum print_arg_type {
PRINT_OP,
PRINT_FUNC,
PRINT_BITMASK,
+ PRINT_DYNAMIC_ARRAY_LEN,
};

struct print_arg {
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index 1bd593b..544509c 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -221,6 +221,7 @@ static void define_event_symbols(struct event_format *event,
break;
case PRINT_BSTRING:
case PRINT_DYNAMIC_ARRAY:
+ case PRINT_DYNAMIC_ARRAY_LEN:
case PRINT_STRING:
case PRINT_BITMASK:
break;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index ace2484..aa9e125 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -251,6 +251,7 @@ static void define_event_symbols(struct event_format *event,
/* gcc warns for these? */
case PRINT_BSTRING:
case PRINT_DYNAMIC_ARRAY:
+ case PRINT_DYNAMIC_ARRAY_LEN:
case PRINT_FUNC:
case PRINT_BITMASK:
/* we should warn... */
--
2.1.0

2015-08-21 10:13:07

by Wang Nan

[permalink] [raw]
Subject: [PATCH 29/29] bpf: Introduce function for outputing data to perf event

From: He Kuang <[email protected]>

There're scenarios that we need an eBPF program to record not only
kprobe point args, but also the PMU counters, time latencies or the
number of cache misses between two probe points and other information
when the probe point is entered.

This patch adds a new trace event to establish infrastruction for bpf to
output data to perf. Userspace perf tools can detect and use this event
as using the existing tracepoint events.

New bpf trace event entry in debugfs:

/sys/kernel/debug/tracing/events/bpf/bpf_output_data

Userspace perf tools detect the new tracepoint event as:

bpf:bpf_output_data [Tracepoint event]

Data in ring-buffer of perf events added to this event will be polled
out, sample types and other attributes can be adjusted to those events
directly without touching the original kprobe events.

The bpf helper function gives eBPF program ability to output data as
perf sample event. This helper simple call the new trace event and
userspace perf tools can record the BPF ftrace event to collect those
records.

Signed-off-by: He Kuang <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
---
include/trace/events/bpf.h | 30 ++++++++++++++++++++++++++++++
include/uapi/linux/bpf.h | 7 +++++++
kernel/trace/bpf_trace.c | 23 +++++++++++++++++++++++
samples/bpf/bpf_helpers.h | 2 ++
4 files changed, 62 insertions(+)
create mode 100644 include/trace/events/bpf.h

diff --git a/include/trace/events/bpf.h b/include/trace/events/bpf.h
new file mode 100644
index 0000000..6b739b8
--- /dev/null
+++ b/include/trace/events/bpf.h
@@ -0,0 +1,30 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bpf
+
+#if !defined(_TRACE_BPF_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BPF_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(bpf_output_data,
+
+ TP_PROTO(u64 *src, int size),
+
+ TP_ARGS(src, size),
+
+ TP_STRUCT__entry(
+ __dynamic_array(u8, buf, size)
+ ),
+
+ TP_fast_assign(
+ memcpy(__get_dynamic_array(buf), src, size);
+ ),
+
+ TP_printk("%s", __print_hex(__get_dynamic_array(buf),
+ __get_dynamic_array_len(buf)))
+);
+
+#endif /* _TRACE_BPF_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 29ef6f9..5068ab1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -249,6 +249,13 @@ enum bpf_func_id {
* Return: 0 on success
*/
BPF_FUNC_get_current_comm,
+
+ /**
+ * int bpf_output_trace_data(void *src, int size)
+ * Return: 0 on success
+ */
+ BPF_FUNC_output_trace_data,
+
__BPF_FUNC_MAX_ID,
};

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 88a041a..219f670 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -11,7 +11,10 @@
#include <linux/filter.h>
#include <linux/uaccess.h>
#include <linux/ctype.h>
+
#include "trace.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/bpf.h>

static DEFINE_PER_CPU(int, bpf_prog_active);

@@ -79,6 +82,24 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
.arg3_type = ARG_ANYTHING,
};

+static u64 bpf_output_trace_data(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+ void *src = (void *) (long) r1;
+ int size = (int) r2;
+
+ trace_bpf_output_data(src, size);
+
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_output_trace_data_proto = {
+ .func = bpf_output_trace_data,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_STACK,
+ .arg2_type = ARG_CONST_STACK_SIZE,
+};
+
/*
* limited trace_printk()
* only %d %u %x %ld %lu %lx %lld %llu %llx %p conversion specifiers allowed
@@ -169,6 +190,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
return &bpf_map_delete_elem_proto;
case BPF_FUNC_probe_read:
return &bpf_probe_read_proto;
+ case BPF_FUNC_output_trace_data:
+ return &bpf_output_trace_data_proto;
case BPF_FUNC_ktime_get_ns:
return &bpf_ktime_get_ns_proto;
case BPF_FUNC_tail_call:
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index bdf1c16..0aeaebe 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -59,5 +59,7 @@ static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flag
(void *) BPF_FUNC_l3_csum_replace;
static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) =
(void *) BPF_FUNC_l4_csum_replace;
+static int (*bpf_output_trace_data)(void *src, int size) =
+ (void *) BPF_FUNC_output_trace_data;

#endif
--
2.1.0

2015-08-21 14:13:17

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 01/29] perf probe: Try to use symbol table if searching debug info failed

Em Fri, Aug 21, 2015 at 10:09:02AM +0000, Wang Nan escreveu:
> Although libdw returns an error (Failed to get call frame),
> perf tries symbol table and finally gets correct address.

So, what my script does when processing from the e-mail messages is to
add a Cc: line for each of the e-mail addresses found, that way, in the
git history, we will know who got notifications about the patch.

There is also the Link tag, that it forms using the Message-ID, i.e.:

Link: http://lkml.kernel.org/r/<MESSAGE-ID>

Which I put right before my own "Signed-off-by: Arnaldo" line.

So:

Reported-by: Name <[email protected]>
Signed-off-by: Original Author <[email protected]>
Reviewed-by: Name <[email protected]>
Acked-by: Name <[email protected]>
Tested-by: Bla <[email protected]>
Link: http://lkml.kernel.org/r/<MESSAGE-ID>
Signed-off-by: You <[email protected]>

I've been avoiding having the same person in multiple tags, i.e. if
someone reviewed, tested, acked, then no need to have it on the Cc:
list, if someone tested and acked, the Tested-by: is enough, implies an
Acked-by.

My script has a database of e-mail addresses and expands it to have the
names and addresses, i.e.: "Arnaldo Carvalho de Melo <[email protected]>"
instead of just [email protected], etc.

If someone reacts to your patch and acks it, you should try to add the
relevant tag to your patch, possibly doing a 'rebase -i
patch-just-before, reword it, etc" before I process it, doing so will
reduce the work I have to do to process your patches.

Now ideally this would all happen before you put it somewhere public so
that we don't disrupt git trees cloned from ours, but I'm leaving that
to my upstreamers (Ingo and Linus), which I should refrain from as I go
on pulling from other people more often, which I really want, its all
just a matter of us all agreeing to some common ground on how to format
those patches that doesn't requires me being the one doing the patch
editing, etc, which adds up as a burden, preventing me from doing more
interesting work :)

This set of rules evolved over time as I went by pushing patches to
Ingo, it would be good if you followed it so that I could straight away
pull from your tree.

Anyway, thanks for trying to be taking the steps in that direction, I
will now go back to processing the Intel PT patches so that I can get
back to eBPF.

- Arnaldo

> Signed-off-by: Wang Nan <[email protected]>
> ---
> tools/perf/util/probe-event.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index fe4941a..f07374b 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -705,9 +705,10 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
> }
> /* Error path : ntevs < 0 */
> pr_debug("An error occurred in debuginfo analysis (%d).\n", ntevs);
> - if (ntevs == -EBADF) {
> - pr_warning("Warning: No dwarf info found in the vmlinux - "
> - "please rebuild kernel with CONFIG_DEBUG_INFO=y.\n");
> + if (ntevs < 0) {
> + if (ntevs == -EBADF)
> + pr_warning("Warning: No dwarf info found in the vmlinux - "
> + "please rebuild kernel with CONFIG_DEBUG_INFO=y.\n");
> if (!need_dwarf) {
> pr_debug("Trying to use symbols.\n");
> return 0;
> --
> 2.1.0

2015-08-21 16:00:34

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 28/29] tools lib traceevent: Support function __get_dynamic_array_len

Em Fri, Aug 21, 2015 at 10:09:29AM +0000, Wang Nan escreveu:
> From: He Kuang <[email protected]>
>
> Support helper function __get_dynamic_array_len() in libtraceevent,
> this function is used accompany with __print_array() or __print_hex(),
> but currently it is not an available function in the function list of
> process_function().

Rostedt, Namhyung, Ack?

- Arnaldo

> The total allocated length of the dynamic array is embedded in the top
> half of __data_loc_##item field. This patch adds new arg type
> PRINT_DYNAMIC_ARRAY_LEN to return the length to eval_num_arg(),
>
> Signed-off-by: He Kuang <[email protected]>
> ---
> tools/lib/traceevent/event-parse.c | 56 +++++++++++++++++++++-
> tools/lib/traceevent/event-parse.h | 1 +
> .../perf/util/scripting-engines/trace-event-perl.c | 1 +
> .../util/scripting-engines/trace-event-python.c | 1 +
> 4 files changed, 57 insertions(+), 2 deletions(-)
>
> diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
> index fcd8a9e..4888674 100644
> --- a/tools/lib/traceevent/event-parse.c
> +++ b/tools/lib/traceevent/event-parse.c
> @@ -848,6 +848,7 @@ static void free_arg(struct print_arg *arg)
> free(arg->bitmask.bitmask);
> break;
> case PRINT_DYNAMIC_ARRAY:
> + case PRINT_DYNAMIC_ARRAY_LEN:
> free(arg->dynarray.index);
> break;
> case PRINT_OP:
> @@ -2720,6 +2721,42 @@ process_dynamic_array(struct event_format *event, struct print_arg *arg, char **
> }
>
> static enum event_type
> +process_dynamic_array_len(struct event_format *event, struct print_arg *arg,
> + char **tok)
> +{
> + struct format_field *field;
> + enum event_type type;
> + char *token;
> +
> + if (read_expect_type(EVENT_ITEM, &token) < 0)
> + goto out_free;
> +
> + arg->type = PRINT_DYNAMIC_ARRAY_LEN;
> +
> + /* Find the field */
> + field = pevent_find_field(event, token);
> + if (!field)
> + goto out_free;
> +
> + arg->dynarray.field = field;
> + arg->dynarray.index = 0;
> +
> + if (read_expected(EVENT_DELIM, ")") < 0)
> + goto out_err;
> +
> + type = read_token(&token);
> + *tok = token;
> +
> + return type;
> +
> + out_free:
> + free_token(token);
> + out_err:
> + *tok = NULL;
> + return EVENT_ERROR;
> +}
> +
> +static enum event_type
> process_paren(struct event_format *event, struct print_arg *arg, char **tok)
> {
> struct print_arg *item_arg;
> @@ -2966,6 +3003,10 @@ process_function(struct event_format *event, struct print_arg *arg,
> free_token(token);
> return process_dynamic_array(event, arg, tok);
> }
> + if (strcmp(token, "__get_dynamic_array_len") == 0) {
> + free_token(token);
> + return process_dynamic_array_len(event, arg, tok);
> + }
>
> func = find_func_handler(event->pevent, token);
> if (func) {
> @@ -3646,14 +3687,25 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg
> goto out_warning_op;
> }
> break;
> + case PRINT_DYNAMIC_ARRAY_LEN:
> + offset = pevent_read_number(pevent,
> + data + arg->dynarray.field->offset,
> + arg->dynarray.field->size);
> + /*
> + * The total allocated length of the dynamic array is
> + * stored in the top half of the field, and the offset
> + * is in the bottom half of the 32 bit field.
> + */
> + val = (unsigned long long)(offset >> 16);
> + break;
> case PRINT_DYNAMIC_ARRAY:
> /* Without [], we pass the address to the dynamic data */
> offset = pevent_read_number(pevent,
> data + arg->dynarray.field->offset,
> arg->dynarray.field->size);
> /*
> - * The actual length of the dynamic array is stored
> - * in the top half of the field, and the offset
> + * The total allocated length of the dynamic array is
> + * stored in the top half of the field, and the offset
> * is in the bottom half of the 32 bit field.
> */
> offset &= 0xffff;
> diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
> index 204befb..6fc83c7 100644
> --- a/tools/lib/traceevent/event-parse.h
> +++ b/tools/lib/traceevent/event-parse.h
> @@ -294,6 +294,7 @@ enum print_arg_type {
> PRINT_OP,
> PRINT_FUNC,
> PRINT_BITMASK,
> + PRINT_DYNAMIC_ARRAY_LEN,
> };
>
> struct print_arg {
> diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
> index 1bd593b..544509c 100644
> --- a/tools/perf/util/scripting-engines/trace-event-perl.c
> +++ b/tools/perf/util/scripting-engines/trace-event-perl.c
> @@ -221,6 +221,7 @@ static void define_event_symbols(struct event_format *event,
> break;
> case PRINT_BSTRING:
> case PRINT_DYNAMIC_ARRAY:
> + case PRINT_DYNAMIC_ARRAY_LEN:
> case PRINT_STRING:
> case PRINT_BITMASK:
> break;
> diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
> index ace2484..aa9e125 100644
> --- a/tools/perf/util/scripting-engines/trace-event-python.c
> +++ b/tools/perf/util/scripting-engines/trace-event-python.c
> @@ -251,6 +251,7 @@ static void define_event_symbols(struct event_format *event,
> /* gcc warns for these? */
> case PRINT_BSTRING:
> case PRINT_DYNAMIC_ARRAY:
> + case PRINT_DYNAMIC_ARRAY_LEN:
> case PRINT_FUNC:
> case PRINT_BITMASK:
> /* we should warn... */
> --
> 2.1.0

Subject: [tip:perf/core] perf probe: Try to use symbol table if searching debug info failed

Commit-ID: 1c0bd0e891aaed0219010bfe79b32e1b0b82d662
Gitweb: http://git.kernel.org/tip/1c0bd0e891aaed0219010bfe79b32e1b0b82d662
Author: Wang Nan <[email protected]>
AuthorDate: Fri, 21 Aug 2015 10:09:02 +0000
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Fri, 21 Aug 2015 12:57:20 -0300

perf probe: Try to use symbol table if searching debug info failed

A problem can occur in a statically linked perf when vmlinux can be found:

# perf probe --add sys_epoll_pwait
probe-definition(0): sys_epoll_pwait
symbol:sys_epoll_pwait file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (7 entries long)
Using /lib/modules/4.2.0-rc1+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.2.0-rc1+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_epoll_pwait address found : ffffffff8122bd40
Matched function: SyS_epoll_pwait
Failed to get call frame on 0xffffffff8122bd40
An error occurred in debuginfo analysis (-2).
Error: Failed to add events. Reason: No such file or directory (Code: -2)

The reason is caused by libdw that, if libdw is statically linked, it
can't load libebl_{arch}.so reliable.

In this case it is still possible to get the address from
/proc/kalksyms. However, perf tries that only when libdw returns
-EBADF.

This patch gives it another chance to utilize symbol table, even if
libdw returns an error code other than -EBADF.

After applying this patch:

# perf probe -nv --add sys_epoll_pwait
probe-definition(0): sys_epoll_pwait
symbol:sys_epoll_pwait file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (7 entries long)
Using /lib/modules/4.2.0-rc1+/build/vmlinux for symbols
Open Debuginfo file: /lib/modules/4.2.0-rc1+/build/vmlinux
Try to find probe point from debuginfo.
Symbol sys_epoll_pwait address found : ffffffff8122bd40
Matched function: SyS_epoll_pwait
Failed to get call frame on 0xffffffff8122bd40
An error occurred in debuginfo analysis (-2).
Trying to use symbols.
Opening /sys/kernel/debug/tracing/kprobe_events write=1
Added new event:
Writing event: p:probe/sys_epoll_pwait _text+2276672
probe:sys_epoll_pwait (on sys_epoll_pwait)

You can now use it in all perf tools, such as:

perf record -e probe:sys_epoll_pwait -aR sleep 1

Although libdw returns an error (Failed to get call frame), perf tries
symbol table and finally gets correct address.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/probe-event.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index fe4941a..f07374b 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -705,9 +705,10 @@ static int try_to_find_probe_trace_events(struct perf_probe_event *pev,
}
/* Error path : ntevs < 0 */
pr_debug("An error occurred in debuginfo analysis (%d).\n", ntevs);
- if (ntevs == -EBADF) {
- pr_warning("Warning: No dwarf info found in the vmlinux - "
- "please rebuild kernel with CONFIG_DEBUG_INFO=y.\n");
+ if (ntevs < 0) {
+ if (ntevs == -EBADF)
+ pr_warning("Warning: No dwarf info found in the vmlinux - "
+ "please rebuild kernel with CONFIG_DEBUG_INFO=y.\n");
if (!need_dwarf) {
pr_debug("Trying to use symbols.\n");
return 0;