2015-05-17 10:58:08

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

This is the 3rd version of 'perf bpf' patch series, based on
v4.1-rc3.

The goal of this series of patches is to integrate eBPF with perf.
After applying these patches, users are allowed to use following
command to load eBPF program compiled by LLVM into kernel then start
recording with filters on:

# perf bpf record --object sample_bpf.o -- -a sleep 4

I post new version before receiving enough comment because I found that
v2 series losts a small but importand patch (37/37 in this series). It
does the final work, attaches eBPF programs to perf event.

Other than the previous change, v3 patch series drops the '|' event
syntax introduced in v2, because I realized that in v2 users are
allowed to pass any bpf fd by using it, like:

# perf bpf record -- -e sched:sched_switch|100| sleep 1

which may become trouble maker. In v3, patch 36/37 passes file
descriptors to evsel by bpf_for_each_program(), which iterates over
each bpf programs and calls a callback function. bpf_fd is set by the
callback.

According to Ingo's suggestion, I renamed every titles of all patches in
this series to make shortlog easier to read.

Since I haven't received enough comment, question in v2 is still open:

Are we actually need a 'perf bpf' command? We can get similar result by
modifying 'perf record' to make it load eBPF program before recording.

I suggest to keep 'perf bpf', group all eBPF stuffs together using a
uniform entry. Also, eBPF programs can act not only as filters but also
data aggregator. It is possible to make something link 'perf bpf run'
to simply make it run, and dump result after user hit 'C-c' or timeout.
The 'config' section may be utilized in this case to let 'perf bpf'
know how to display results.

Following is detail description. Most of following text is copied from
cover letter of v2. You can skip reading if you have already read this
in v2 series.

Patch 1/37 - 5/37 are preparation and bugfixs. Some of them are already
acked.

Patch 6/37 - 26/37 creates tools/lib/bpf.

Libbpf will be compiled into libbpf.a and libbpf.so. It can be
devided into 2 parts:

1) User-kernel interface. The API is defined by tools/lib/bpf/bpf.h,
encapsulates map and program loading operations. In
bpf_load_program(), it doesn't use log buffer in the first try to
improve performance, and retry with log buffer enabled when
failure.

2) ELF operations. The structure of eBPF object file is defined
here. API of this part can be found in tools/lib/bpf/libbpf.h.
'struct bpf_map_def' is also put here.

Libbpf's API hides internal structures. Callers access data of
object files with handlers and accessors. 'struct bpf_object *'
is handler of a whole object file. 'struct bpf_prog_handler *'
is handler and iterator of programs. Some of accessors are
defined to enable caller to retrive section name and file
descriptor of a program. Further accessor can be appended.

In the design of libbpf, I explictly separate full procedure
into opening and loading phase. Data are collected during
'opening' phase. BPF syscalls are called in 'loading' phase.
The separation is designed for potential cross-objects
operations. Such separation also give caller a chance to let
him/her to adjust bytecode and/or maps before real loading.
(API of such operation is not provided in this version).

During loading, fields in 'struct bpf_map_def' are also swapped
if endianess mismatched.

Patch 27/37 - 37/37 are patches on perf, which introduce 'perf bpf'
command and 'perf bpf record' subcommand.

'perf bpf' is not a standalone command. The usage should be:

perf bpf [<options>] <command> --objects <objfile> -- \
<args passed to other cmds>

First two patches make 'perf bpf' avaliable and make perf depend on
libbpf. 29/37 creates 'perf bpf record' and directly passes
everything after '--' to cmd_record(). Other stuffs resides in
tools/perf/utils/bpf-loader.[ch], which are introduced in 30/37.
Following patches do collection -> probing -> loading works step
by step. In those operations, 'perf bpf' collects all required
objects before creating kprobe points, and loads programs into
kernel after probing finish.

A 'bpf_unload()' is used to remove kprobe points. I use 'atexit'
hook to ensure it called before exiting. However, I find that
atexit hookers are not always work well. For example, when program
is canceled by SIGINT. Therefore we still need to call bpf_unload()
after cmd_record().

Patch 36/37 adds bpf_fd field to evsel and config it.

Patch 37/37 finally attachs eBPF program to perf event.

Wang Nan (37):
perf/events/core: fix race in bpf program unregister
perf tools: Set vmlinux_path__nr_entries to 0 in vmlinux_path__exit
tools lib traceevent: Install libtraceevent.a into libdir
tools: Change FEATURE_TESTS and FEATURE_DISPLAY to weak binding
tools: Add __aligned_u64 to types.h
bpf tools: Introduce 'bpf' library to tools
bpf tools: Allow caller to set printing function
bpf tools: Define basic interface
bpf tools: Open eBPF object file and do basic validation
bpf tools: Check endianess and set swap flag according to EHDR
bpf tools: Iterate over ELF sections to collect information
bpf tools: Collect version and license from ELF sections
bpf tools: Collect map definitions from 'maps' section
bpf tools: Collect config string from 'config' section
bpf tools: Collect symbol table from SHT_SYMTAB section
bpf tools: Collect eBPF programs from their own sections
bpf tools: Collect relocation sections from SHT_REL sections
bpf tools: Record map accessing instructions for each program
bpf tools: Clear libelf and ELF parsing resrouce to finish opening
bpf tools: Add bpf.c/h for common bpf operations
bpf tools: Create eBPF maps defined in an object file
bpf tools: Relocate eBPF programs
bpf tools: Introduce bpf_load_program() to bpf.c
bpf tools: Load eBPF programs in object files into kernel
bpf tools: Introduce accessors for struct bpf_program
bpf tools: Introduce accessors for struct bpf_object
perf tools: Add new 'perf bpf' command
perf tools: Make perf depend on libbpf
perf bpf: Add 'perf bpf record' subcommand
perf bpf: Add bpf-loader and open ELF object files
perf bpf: Collect all eBPF programs
perf bpf: Parse probe points of eBPF programs during preparation
perf bpf: Probe at kprobe points
perf bpf: Load all eBPF object into kernel
perf tools: Add a bpf_wrapper global flag
perf tools: Add bpf_fd field to evsel and config it
perf tools: Attach eBPF program to perf event

kernel/events/core.c | 3 +-
tools/build/Makefile.feature | 4 +-
tools/include/linux/types.h | 5 +
tools/lib/bpf/.gitignore | 2 +
tools/lib/bpf/Build | 1 +
tools/lib/bpf/Makefile | 191 ++++++
tools/lib/bpf/bpf.c | 87 +++
tools/lib/bpf/bpf.h | 24 +
tools/lib/bpf/libbpf.c | 1089 +++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 66 ++
tools/lib/traceevent/Makefile | 20 +-
tools/perf/Build | 1 +
tools/perf/Documentation/perf-bpf.txt | 18 +
tools/perf/Makefile.perf | 20 +-
tools/perf/builtin-bpf.c | 217 +++++++
tools/perf/builtin-record.c | 6 +
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 10 +
tools/perf/perf.h | 1 +
tools/perf/util/Build | 1 +
tools/perf/util/bpf-loader.c | 262 ++++++++
tools/perf/util/bpf-loader.h | 34 +
tools/perf/util/debug.c | 5 +
tools/perf/util/debug.h | 1 +
tools/perf/util/evlist.c | 34 +
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 17 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/parse-options.c | 8 +-
tools/perf/util/parse-options.h | 2 +
tools/perf/util/symbol.c | 1 +
32 files changed, 2122 insertions(+), 12 deletions(-)
create mode 100644 tools/lib/bpf/.gitignore
create mode 100644 tools/lib/bpf/Build
create mode 100644 tools/lib/bpf/Makefile
create mode 100644 tools/lib/bpf/bpf.c
create mode 100644 tools/lib/bpf/bpf.h
create mode 100644 tools/lib/bpf/libbpf.c
create mode 100644 tools/lib/bpf/libbpf.h
create mode 100644 tools/perf/Documentation/perf-bpf.txt
create mode 100644 tools/perf/builtin-bpf.c
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h

--
1.8.3.4


2015-05-17 11:04:50

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 01/37] perf/events/core: fix race in bpf program unregister

there is a race between perf_event_free_bpf_prog() and free_trace_kprobe():

__free_event()
event->destroy(event)
tp_perf_event_destroy()
perf_trace_destroy()
perf_trace_event_unreg()

which is dropping event->tp_event->perf_refcount and allows to proceed in:

unregister_trace_kprobe()
unregister_kprobe_event()
trace_remove_event_call()
probe_remove_event_call()
free_trace_kprobe()

while __free_event does:
call_rcu(&event->rcu_head, free_event_rcu);
free_event_rcu()
perf_event_free_bpf_prog()

To fix the race simply move perf_event_free_bpf_prog() before
event->destroy(), since event->tp_event is still valid at that point.

Note, perf_trace_destroy() is not racing with trace_remove_event_call()
since they both grab event_mutex.

Fixes: 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes")
Reported-by: Wang Nan <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
---
kernel/events/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 81aa3a4..2288b48 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3422,7 +3422,6 @@ static void free_event_rcu(struct rcu_head *head)
if (event->ns)
put_pid_ns(event->ns);
perf_event_free_filter(event);
- perf_event_free_bpf_prog(event);
kfree(event);
}

@@ -3553,6 +3552,8 @@ static void __free_event(struct perf_event *event)
put_callchain_buffers();
}

+ perf_event_free_bpf_prog(event);
+
if (event->destroy)
event->destroy(event);

--
1.8.3.4

2015-05-17 10:57:56

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 02/37] perf tools: Set vmlinux_path__nr_entries to 0 in vmlinux_path__exit

Original vmlinux_path__exit() doesn't revert vmlinux_path__nr_entries
to its original state. After the while loop vmlinux_path__nr_entries
becomes -1 instead of 0. This makes a problem that, if runs twice,
during the second run vmlinux_path__init() will set vmlinux_path[-1]
to strdup("vmlinux"), corrupts random memory.

This patch reset vmlinux_path__nr_entries to 0 after the while loop.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
---
tools/perf/util/symbol.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 201f6c4c..451777f 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1802,6 +1802,7 @@ static void vmlinux_path__exit(void)
{
while (--vmlinux_path__nr_entries >= 0)
zfree(&vmlinux_path[vmlinux_path__nr_entries]);
+ vmlinux_path__nr_entries = 0;

zfree(&vmlinux_path);
}
--
1.8.3.4

2015-05-17 11:04:18

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 03/37] tools lib traceevent: Install libtraceevent.a into libdir

Before this patch, 'make install' installs libraries into bindir:

$ make install DESTDIR=./tree
INSTALL trace_plugins
INSTALL libtraceevent.a
INSTALL libtraceevent.so
$ find ./tree
./tree/
./tree/usr
./tree/usr/local
./tree/usr/local/bin
./tree/usr/local/bin/libtraceevent.a
./tree/usr/local/bin/libtraceevent.so
...

/usr/local/lib( or lib64) should be a better place.

This patch replaces 'bin' with libdir. For __LP64__ building, libraries
are installed to /usr/local/lib64. For other building, to
/usr/local/lib instead.

After applying this patch:

$ make install DESTDIR=./tree
INSTALL trace_plugins
INSTALL libtraceevent.a
INSTALL libtraceevent.so
$ find ./tree
./tree
./tree/usr
./tree/usr/local
./tree/usr/local/lib64
./tree/usr/local/lib64/libtraceevent.a
./tree/usr/local/lib64/traceevent
./tree/usr/local/lib64/traceevent/plugins
./tree/usr/local/lib64/traceevent/plugins/plugin_mac80211.so
./tree/usr/local/lib64/traceevent/plugins/plugin_hrtimer.so
...
./tree/usr/local/lib64/libtraceevent.so

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/traceevent/Makefile | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index d410da3..8464039 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -34,9 +34,15 @@ INSTALL = install
DESTDIR ?=
DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'

+LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
+ifeq ($(LP64), 1)
+ libdir_relative = lib64
+else
+ libdir_relative = lib
+endif
+
prefix ?= /usr/local
-bindir_relative = bin
-bindir = $(prefix)/$(bindir_relative)
+libdir = $(prefix)/$(libdir_relative)
man_dir = $(prefix)/share/man
man_dir_SQ = '$(subst ','\'',$(man_dir))'

@@ -58,7 +64,7 @@ ifeq ($(prefix),$(HOME))
override plugin_dir = $(HOME)/.traceevent/plugins
set_plugin_dir := 0
else
-override plugin_dir = $(prefix)/lib/traceevent/plugins
+override plugin_dir = $(libdir)/traceevent/plugins
endif
endif

@@ -85,11 +91,11 @@ srctree := $(patsubst %/,%,$(dir $(srctree)))
#$(info Determined 'srctree' to be $(srctree))
endif

-export prefix bindir src obj
+export prefix libdir src obj

# Shell quotes
-bindir_SQ = $(subst ','\'',$(bindir))
-bindir_relative_SQ = $(subst ','\'',$(bindir_relative))
+libdir_SQ = $(subst ','\'',$(libdir))
+libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
plugin_dir_SQ = $(subst ','\'',$(plugin_dir))

LIB_FILE = libtraceevent.a libtraceevent.so
@@ -240,7 +246,7 @@ endef

install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
- $(call do_install,$(LIB_FILE),$(bindir_SQ))
+ $(call do_install,$(LIB_FILE),$(libdir_SQ))

install_plugins: $(PLUGINS)
$(call QUIET_INSTALL, trace_plugins) \
--
1.8.3.4

2015-05-17 10:58:20

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 04/37] tools: Change FEATURE_TESTS and FEATURE_DISPLAY to weak binding

Replace strong binding of FEATURE_TESTS and FEATURE_DISPLAY by weak
binding. This patch enables other makefiles which include
tools/build/Makefile.feature enable only limited feathres to check.

Signed-off-by: Wang Nan <[email protected]>
---
tools/build/Makefile.feature | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 3a0b0ca..2975632 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -27,7 +27,7 @@ endef
# the rule that uses them - an example for that is the 'bionic'
# feature check. ]
#
-FEATURE_TESTS = \
+FEATURE_TESTS ?= \
backtrace \
dwarf \
fortify-source \
@@ -53,7 +53,7 @@ FEATURE_TESTS = \
zlib \
lzma

-FEATURE_DISPLAY = \
+FEATURE_DISPLAY ?= \
dwarf \
glibc \
gtk2 \
--
1.8.3.4

2015-05-17 11:05:08

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 05/37] tools: Add __aligned_u64 to types.h

Following patches will introduce linux/bpf.h to a new libbpf library,
which requires definition of __aligned_u64. This patch add it to the
common types.h for tools.

Signed-off-by: Wang Nan <[email protected]>
---
tools/include/linux/types.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/tools/include/linux/types.h b/tools/include/linux/types.h
index b5cf25e..10a2cdc 100644
--- a/tools/include/linux/types.h
+++ b/tools/include/linux/types.h
@@ -60,6 +60,11 @@ typedef __u32 __bitwise __be32;
typedef __u64 __bitwise __le64;
typedef __u64 __bitwise __be64;

+/* Taken from uapi/linux/types.h. Required by linux/bpf.h */
+#ifndef __aligned_u64
+# define __aligned_u64 __u64 __attribute__((aligned(8)))
+#endif
+
struct list_head {
struct list_head *next, *prev;
};
--
1.8.3.4

2015-05-17 11:05:14

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools

This is the first patch of libbpf. The goal of libbpf is to create a
standard way for accessing eBPF object files. This patch creates
Makefile and Build for it, allows 'make' to build libbpf.a and
libbpf.so, 'make install' to put them into proper directories.
Most part of Makefile is borrowed from traceevent. Before building,
it checks the existance of libelf in Makefile, and deny to build if
not found. Instead of throw an error if libelf not found, the error
raises in a phony target "elfdep". This design is to ensure
'make clean' still workable even if libelf is not found.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/.gitignore | 2 +
tools/lib/bpf/Build | 1 +
tools/lib/bpf/Makefile | 191 +++++++++++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.c | 15 ++++
tools/lib/bpf/libbpf.h | 12 +++
5 files changed, 221 insertions(+)
create mode 100644 tools/lib/bpf/.gitignore
create mode 100644 tools/lib/bpf/Build
create mode 100644 tools/lib/bpf/Makefile
create mode 100644 tools/lib/bpf/libbpf.c
create mode 100644 tools/lib/bpf/libbpf.h

diff --git a/tools/lib/bpf/.gitignore b/tools/lib/bpf/.gitignore
new file mode 100644
index 0000000..812aeed
--- /dev/null
+++ b/tools/lib/bpf/.gitignore
@@ -0,0 +1,2 @@
+libbpf_version.h
+FEATURE-DUMP
diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
new file mode 100644
index 0000000..a316484
--- /dev/null
+++ b/tools/lib/bpf/Build
@@ -0,0 +1 @@
+libbpf-y := libbpf.o
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
new file mode 100644
index 0000000..bd5769c
--- /dev/null
+++ b/tools/lib/bpf/Makefile
@@ -0,0 +1,191 @@
+BPF_VERSION = 0
+BPF_PATCHLEVEL = 0
+BPF_EXTRAVERSION = 1
+
+# Version of eBPF elf file
+FILE_VERSION = 1
+
+MAKEFLAGS += --no-print-directory
+
+
+# Makefiles suck: This macro sets a default value of $(2) for the
+# variable named by $(1), unless the variable has been set by
+# environment or command line. This is necessary for CC and AR
+# because make sets default values, so the simpler ?= approach
+# won't work as expected.
+define allow-override
+ $(if $(or $(findstring environment,$(origin $(1))),\
+ $(findstring command line,$(origin $(1)))),,\
+ $(eval $(1) = $(2)))
+endef
+
+# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
+$(call allow-override,CC,$(CROSS_COMPILE)gcc)
+$(call allow-override,AR,$(CROSS_COMPILE)ar)
+
+EXT = -std=gnu99
+INSTALL = install
+
+# Use DESTDIR for installing into a different root directory.
+# This is useful for building a package. The program will be
+# installed in this directory as if it was the root directory.
+# Then the build tool can move it later.
+DESTDIR ?=
+DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'
+
+LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
+ifeq ($(LP64), 1)
+ libdir_relative = lib64
+else
+ libdir_relative = lib
+endif
+
+prefix ?= /usr/local
+libdir = $(prefix)/$(libdir_relative)
+man_dir = $(prefix)/share/man
+man_dir_SQ = '$(subst ','\'',$(man_dir))'
+
+export man_dir man_dir_SQ INSTALL
+export DESTDIR DESTDIR_SQ
+
+include ../../scripts/Makefile.include
+
+# copy a bit from Linux kbuild
+
+ifeq ("$(origin V)", "command line")
+ VERBOSE = $(V)
+endif
+ifndef VERBOSE
+ VERBOSE = 0
+endif
+
+ifeq ($(srctree),)
+srctree := $(patsubst %/,%,$(dir $(shell pwd)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+srctree := $(patsubst %/,%,$(dir $(srctree)))
+#$(info Determined 'srctree' to be $(srctree))
+endif
+
+FEATURE_DISPLAY = libelf libelf-getphdrnum libelf-mmap
+FEATURE_TESTS = libelf
+include $(srctree)/tools/build/Makefile.feature
+
+ifeq ($(feature-libelf-mmap), 1)
+ override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
+endif
+
+ifeq ($(feature-libelf-getphdrnum), 1)
+ override CFLAGS += -DHAVE_ELF_GETPHDRNUM_SUPPORT
+endif
+
+export prefix libdir src obj
+
+# Shell quotes
+libdir_SQ = $(subst ','\'',$(libdir))
+libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
+plugin_dir_SQ = $(subst ','\'',$(plugin_dir))
+
+LIB_FILE = libbpf.a libbpf.so
+
+VERSION = $(BPF_VERSION)
+PATCHLEVEL = $(BPF_PATCHLEVEL)
+EXTRAVERSION = $(BPF_EXTRAVERSION)
+
+OBJ = $@
+N =
+
+LIBBPF_VERSION = $(BPF_VERSION).$(BPF_PATCHLEVEL).$(BPF_EXTRAVERSION)
+
+INCLUDES = -I. -I $(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
+
+# Set compile option CFLAGS
+ifdef EXTRA_CFLAGS
+ CFLAGS := $(EXTRA_CFLAGS)
+else
+ CFLAGS := -g -Wall
+endif
+
+# Append required CFLAGS
+override CFLAGS += -fPIC
+override CFLAGS += $(INCLUDES)
+override CFLAGS += -D_GNU_SOURCE
+
+ifeq ($(VERBOSE),1)
+ Q =
+else
+ Q = @
+endif
+
+# Disable command line variables (CFLAGS) overide from top
+# level Makefile (perf), otherwise build Makefile will get
+# the same command line setup.
+MAKEOVERRIDES=
+
+export srctree OUTPUT CC LD CFLAGS V
+build := -f $(srctree)/tools/build/Makefile.build dir=. obj
+
+BPF_IN := $(OUTPUT)libbpf-in.o
+LIB_FILE := $(addprefix $(OUTPUT),$(LIB_FILE))
+
+CMD_TARGETS = $(LIB_FILE)
+
+TARGETS = $(CMD_TARGETS)
+
+all: $(VERSION_FILES) all_cmd
+
+all_cmd: $(CMD_TARGETS)
+
+$(BPF_IN): force elfdep
+ $(Q)$(MAKE) $(build)=libbpf
+
+$(OUTPUT)libbpf.so: $(BPF_IN)
+ $(QUIET_LINK)$(CC) --shared $^ -o $@
+
+$(OUTPUT)libbpf.a: $(BPF_IN)
+ $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
+
+define update_dir
+ (echo $1 > [email protected]; \
+ if [ -r $@ ] && cmp -s $@ [email protected]; then \
+ rm -f [email protected]; \
+ else \
+ echo ' UPDATE $@'; \
+ mv -f [email protected] $@; \
+ fi);
+endef
+
+define do_install
+ if [ ! -d '$(DESTDIR_SQ)$2' ]; then \
+ $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$2'; \
+ fi; \
+ $(INSTALL) $1 '$(DESTDIR_SQ)$2'
+endef
+
+install_lib: all_cmd
+ $(call QUIET_INSTALL, $(LIB_FILE)) \
+ $(call do_install,$(LIB_FILE),$(libdir_SQ))
+
+install: install_lib
+
+### Cleaning rules
+
+config-clean:
+ $(call QUIET_CLEAN, config)
+ $(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
+
+clean:
+ $(call QUIET_CLEAN, libbpf) $(RM) *.o *~ $(TARGETS) *.a *.so $(VERSION_FILES) .*.d \
+ $(RM) LIBBPF-CFLAGS
+ $(call QUIET_CLEAN, core-gen) $(RM) $(OUTPUT)FEATURE-DUMP
+
+
+
+PHONY += force elfdep
+force:
+
+elfdep:
+ @if [ "$(feature-libelf)" != "1" ]; then echo "No libelf found"; exit -1 ; fi
+
+# Declare the contents of the .PHONY variable as phony. We keep that
+# information in a variable so we can use it in if_changed and friends.
+.PHONY: $(PHONY)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
new file mode 100644
index 0000000..bebe99a
--- /dev/null
+++ b/tools/lib/bpf/libbpf.c
@@ -0,0 +1,15 @@
+/*
+ * common eBPF ELF loading operations.
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+
+#include <stdlib.h>
+#include <unistd.h>
+#include <asm/unistd.h>
+#include <linux/bpf.h>
+
+#include "libbpf.h"
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
new file mode 100644
index 0000000..10845a0
--- /dev/null
+++ b/tools/lib/bpf/libbpf.h
@@ -0,0 +1,12 @@
+/*
+ * Common eBPF object loader operations.
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __BPF_LIBBPF_H
+#define __BPF_LIBBPF_H
+
+#endif
--
1.8.3.4

2015-05-17 10:58:32

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 07/37] bpf tools: Allow caller to set printing function

By libbpf_set_print(), users of libbpf are allowed to register he/she
own debug, info and warning printing functions. Libbpf will use those
functions to print messages. If not provided, default info and warning
printing functions are fprintf(stderr, ...); defailt debug printing
is NULL.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 43 +++++++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 4 ++++
2 files changed, 47 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index bebe99a..d7a7869 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8,8 +8,51 @@
*/

#include <stdlib.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
#include <unistd.h>
#include <asm/unistd.h>
#include <linux/bpf.h>

#include "libbpf.h"
+
+#define __printf(a, b) __attribute__((format(printf, a, b)))
+
+__printf(1, 2)
+static int __base_pr(const char *format, ...)
+{
+ va_list args;
+ int err;
+
+ va_start(args, format);
+ err = vfprintf(stderr, format, args);
+ va_end(args);
+ return err;
+}
+
+static __printf(1, 2) int (*__pr_warning)(const char *format, ...) =
+ __base_pr;
+static __printf(1, 2) int (*__pr_info)(const char *format, ...) =
+ __base_pr;
+static __printf(1, 2) int (*__pr_debug)(const char *format, ...) =
+ NULL;
+
+#define __pr(func, fmt, ...) \
+do { \
+ if ((func)) \
+ (func)("libbpf: " fmt, ##__VA_ARGS__); \
+} while(0)
+
+#define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
+#define pr_info(fmt, ...) __pr(__pr_info, fmt, ##__VA_ARGS__)
+#define pr_debug(fmt, ...) __pr(__pr_debug, fmt, ##__VA_ARGS__)
+
+void libbpf_set_print(int (*warn)(const char *format, ...),
+ int (*info)(const char *format, ...),
+ int (*debug)(const char *format, ...))
+{
+ __pr_warning = warn;
+ __pr_info = info;
+ __pr_debug = debug;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 10845a0..eb306c0 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -9,4 +9,8 @@
#ifndef __BPF_LIBBPF_H
#define __BPF_LIBBPF_H

+void libbpf_set_print(int (*warn)(const char *format, ...),
+ int (*info)(const char *format, ...),
+ int (*debug)(const char *format, ...));
+
#endif
--
1.8.3.4

2015-05-17 11:04:37

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 08/37] bpf tools: Define basic interface

bpf_open_object() and bpf_close_object() are open and close function of
eBPF object files. 'struct bpf_object' will be handler of one object
file. Its internal structure is hide to user.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 11 +++++++++++
tools/lib/bpf/libbpf.h | 8 ++++++++
2 files changed, 19 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d7a7869..f8decff 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12,6 +12,7 @@
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
+#include <errno.h>
#include <asm/unistd.h>
#include <linux/bpf.h>

@@ -56,3 +57,13 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
__pr_info = info;
__pr_debug = debug;
}
+
+struct bpf_object *bpf_open_object(const char *path)
+{
+ return NULL;
+}
+
+void bpf_close_object(struct bpf_object *object)
+{
+ return 0;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index eb306c0..e523ae9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -9,8 +9,16 @@
#ifndef __BPF_LIBBPF_H
#define __BPF_LIBBPF_H

+#include <stdio.h>
+
void libbpf_set_print(int (*warn)(const char *format, ...),
int (*info)(const char *format, ...),
int (*debug)(const char *format, ...));

+/* Hide internal to user */
+struct bpf_object;
+
+struct bpf_object *bpf_open_object(const char *path);
+void bpf_close_object(struct bpf_object *object);
+
#endif
--
1.8.3.4

2015-05-17 11:05:00

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

This patch adds basic 'struct bpf_object' which will be used for eBPF
object file loading. eBPF object files are compiled by LLVM as ELF
format. In this patch, libelf is used to open those files, read EHDR
and do basic validation according to e_type and e_machine.

All elf related staffs are grouped together and reside in elf field of
'struct bpf_object'. bpf_obj_clear_elf() is introduced to clear it.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 154 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f8decff..43c16bc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -12,9 +12,12 @@
#include <stdarg.h>
#include <string.h>
#include <unistd.h>
+#include <fcntl.h>
#include <errno.h>
#include <asm/unistd.h>
#include <linux/bpf.h>
+#include <libelf.h>
+#include <gelf.h>

#include "libbpf.h"

@@ -58,12 +61,161 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
__pr_debug = debug;
}

-struct bpf_object *bpf_open_object(const char *path)
+#ifdef HAVE_LIBELF_MMAP_SUPPORT
+# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ_MMAP
+#else
+# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
+#endif
+
+struct bpf_object {
+ char *path;
+
+ /*
+ * Information when doing elf related work. Only valid if fd
+ * is valid.
+ */
+ struct {
+ int fd;
+ Elf *elf;
+ GElf_Ehdr ehdr;
+ } elf;
+};
+#define obj_elf_valid(o) ((o)->elf.fd >= 0)
+
+static struct bpf_object *__bpf_obj_alloc(const char *path)
+{
+ struct bpf_object *obj;
+
+ obj = calloc(1, sizeof(struct bpf_object));
+ if (!obj) {
+ pr_warning("alloc memory failed for %s\n", path);
+ return NULL;
+ }
+
+ obj->path = strdup(path);
+ if (!obj->path) {
+ pr_warning("failed to strdup '%s'\n", path);
+ free(obj);
+ return NULL;
+ }
+ return obj;
+}
+
+static struct bpf_object *bpf_obj_alloc(const char *path)
{
+ struct bpf_object *obj;
+
+ obj = __bpf_obj_alloc(path);
+ if (!obj)
+ goto out;
+
+ obj->elf.fd = -1;
+ return obj;
+out:
+ bpf_close_object(obj);
return NULL;
}

-void bpf_close_object(struct bpf_object *object)
+static void bpf_obj_clear_elf(struct bpf_object *obj)
+{
+ if (!obj_elf_valid(obj))
+ return;
+
+ if (obj->elf.elf) {
+ elf_end(obj->elf.elf);
+ obj->elf.elf = NULL;
+ }
+ if (obj->elf.fd >= 0) {
+ close(obj->elf.fd);
+ obj->elf.fd = -1;
+ }
+}
+
+static int bpf_obj_elf_init(struct bpf_object *obj)
{
+ int err = 0;
+ GElf_Ehdr *ep;
+
+ if (obj_elf_valid(obj)) {
+ pr_warning("elf init: internal error\n");
+ return -EEXIST;
+ }
+
+ obj->elf.fd = open(obj->path, O_RDONLY);
+ if (obj->elf.fd < 0) {
+ pr_warning("failed to open %s: %s\n", obj->path,
+ strerror(errno));
+ return -errno;
+ }
+
+ obj->elf.elf = elf_begin(obj->elf.fd,
+ LIBBPF_ELF_C_READ_MMAP,
+ NULL);
+ if (!obj->elf.elf) {
+ pr_warning("failed to open %s as ELF file\n",
+ obj->path);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ if (!gelf_getehdr(obj->elf.elf, &obj->elf.ehdr)) {
+ pr_warning("failed to get EHDR from %s\n",
+ obj->path);
+ err = -EINVAL;
+ goto errout;
+ }
+ ep = &obj->elf.ehdr;
+
+ if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {
+ pr_warning("%s is not an eBPF object file\n",
+ obj->path);
+ err = -EINVAL;
+ goto errout;
+ }
+
return 0;
+errout:
+ bpf_obj_clear_elf(obj);
+ return err;
+}
+
+struct bpf_object *bpf_open_object(const char *path)
+{
+ struct bpf_object *obj;
+
+ /* param validation */
+ if (!path)
+ return NULL;
+
+ pr_debug("loading %s\n", path);
+
+ if (elf_version(EV_CURRENT) == EV_NONE) {
+ pr_warning("failed to init libelf for %s\n", path);
+ return NULL;
+ }
+
+ obj = bpf_obj_alloc(path);
+ if (!obj)
+ return NULL;
+
+ if (bpf_obj_elf_init(obj))
+ goto out;
+
+ return obj;
+
+out:
+ bpf_close_object(obj);
+ return NULL;
+}
+
+void bpf_close_object(struct bpf_object *obj)
+{
+ if (!obj)
+ return;
+
+ bpf_obj_clear_elf(obj);
+
+ if (obj->path)
+ free(obj->path);
+ free(obj);
}
--
1.8.3.4

2015-05-17 11:04:29

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 10/37] bpf tools: Check endianess and set swap flag according to EHDR

Check endianess according to EHDR to support loading eBPF objects into
big endian machines. Code is taken from tools/perf/util/symbol-elf.c.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 43c16bc..a4910a8 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -69,6 +69,7 @@ void libbpf_set_print(int (*warn)(const char *format, ...),

struct bpf_object {
char *path;
+ bool needs_swap;

/*
* Information when doing elf related work. Only valid if fd
@@ -109,6 +110,7 @@ static struct bpf_object *bpf_obj_alloc(const char *path)
if (!obj)
goto out;

+ obj->needs_swap = false;
obj->elf.fd = -1;
return obj;
out:
@@ -179,6 +181,31 @@ errout:
return err;
}

+static int
+bpf_obj_swap_init(struct bpf_object *obj)
+{
+ static unsigned int const endian = 1;
+
+ obj->needs_swap = false;
+
+ switch (obj->elf.ehdr.e_ident[EI_DATA]) {
+ case ELFDATA2LSB:
+ /* We are big endian, BPF obj is little endian. */
+ if (*(unsigned char const *)&endian != 1)
+ obj->needs_swap = true;
+ return 0;
+
+ case ELFDATA2MSB:
+ /* We are little endian, BPF obj is big endian. */
+ if (*(unsigned char const *)&endian != 0)
+ obj->needs_swap = true;
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
+
struct bpf_object *bpf_open_object(const char *path)
{
struct bpf_object *obj;
@@ -200,6 +227,8 @@ struct bpf_object *bpf_open_object(const char *path)

if (bpf_obj_elf_init(obj))
goto out;
+ if (bpf_obj_swap_init(obj))
+ goto out;

return obj;

--
1.8.3.4

2015-05-17 11:04:10

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 11/37] bpf tools: Iterate over ELF sections to collect information

bpf_obj_elf_collect() is introduced to iterate over each elf sections
to collection informations in eBPF object files. This function will
futher enhanced to collect license, kernel version, programs, configs
and map information.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index a4910a8..1af80c3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -206,6 +206,58 @@ bpf_obj_swap_init(struct bpf_object *obj)
}
}

+static int bpf_obj_elf_collect(struct bpf_object *obj)
+{
+ Elf *elf = obj->elf.elf;
+ GElf_Ehdr *ep = &obj->elf.ehdr;
+ Elf_Scn *scn = NULL;
+ int idx = 0, err = 0;
+
+ /* Elf is corrupted/truncated, avoid calling elf_strptr. */
+ if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) {
+ pr_warning("failed to get e_shstrndx from %s\n",
+ obj->path);
+ return -EINVAL;
+ }
+
+ while ((scn = elf_nextscn(elf, scn)) != NULL) {
+ char *name;
+ GElf_Shdr sh;
+ Elf_Data *data;
+
+ idx++;
+ if (gelf_getshdr(scn, &sh) != &sh) {
+ pr_warning("failed to get section header"
+ " from %s\n", obj->path);
+ err = -EINVAL;
+ goto out;
+ }
+
+ name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
+ if (!name) {
+ pr_warning("failed to get section name "
+ "from %s\n", obj->path);
+ err = -EINVAL;
+ goto out;
+ }
+
+ data = elf_getdata(scn, 0);
+ if (!data) {
+ pr_warning("failed to get section data "
+ "from %s(%s)\n", name, obj->path);
+ err = -EINVAL;
+ goto out;
+ }
+ pr_debug("section %s, size %ld, link %d, "
+ "flags %lx, type=%d\n",
+ name, (unsigned long)data->d_size,
+ (int)sh.sh_link, (unsigned long)sh.sh_flags,
+ (int)sh.sh_type);
+ }
+out:
+ return err;
+}
+
struct bpf_object *bpf_open_object(const char *path)
{
struct bpf_object *obj;
@@ -229,6 +281,8 @@ struct bpf_object *bpf_open_object(const char *path)
goto out;
if (bpf_obj_swap_init(obj))
goto out;
+ if (bpf_obj_elf_collect(obj))
+ goto out;

return obj;

--
1.8.3.4

2015-05-17 11:03:58

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 12/37] bpf tools: Collect version and license from ELF sections

Expand bpf_obj_elf_collect() to collect license and kernel version
information in eBPF object file. eBPF object file should have a section
named 'license', which contains a string. It should also have a section
named 'version', contains a u32 LINUX_VERSION_CODE.

bpf_obj_validate() is introduced to validate object file after loaded.
Currently it only check existance of 'version' section.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 59 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1af80c3..b26f1ee 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -15,12 +15,22 @@
#include <fcntl.h>
#include <errno.h>
#include <asm/unistd.h>
+#include <byteswap.h>
#include <linux/bpf.h>
#include <libelf.h>
#include <gelf.h>

#include "libbpf.h"

+#ifdef min
+# undef min
+#endif
+#define min(x, y) ({ \
+ typeof(x) _min1 = (x); \
+ typeof(y) _min2 = (y); \
+ (void) (&_min1 == &_min2); \
+ _min1 < _min2 ? _min1 : _min2; })
+
#define __printf(a, b) __attribute__((format(printf, a, b)))

__printf(1, 2)
@@ -70,6 +80,8 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
struct bpf_object {
char *path;
bool needs_swap;
+ char license[64];
+ u32 kern_version;

/*
* Information when doing elf related work. Only valid if fd
@@ -206,6 +218,32 @@ bpf_obj_swap_init(struct bpf_object *obj)
}
}

+static int bpf_obj_license_init(struct bpf_object *obj,
+ void *data, size_t size)
+{
+ memcpy(obj->license, data,
+ min(size, sizeof(obj->license) - 1));
+ pr_debug("license of %s is %s\n", obj->path, obj->license);
+ return 0;
+}
+
+static int bpf_obj_kver_init(struct bpf_object *obj,
+ void *data, size_t size)
+{
+ u32 kver;
+ if (size < sizeof(kver)) {
+ pr_warning("invalid kver section in %s\n", obj->path);
+ return -EINVAL;
+ }
+ memcpy(&kver, data, sizeof(kver));
+ if (obj->needs_swap)
+ kver = bswap_32(kver);
+ obj->kern_version = kver;
+ pr_debug("kernel version of %s is %x\n", obj->path,
+ obj->kern_version);
+ return 0;
+}
+
static int bpf_obj_elf_collect(struct bpf_object *obj)
{
Elf *elf = obj->elf.elf;
@@ -253,11 +291,30 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
name, (unsigned long)data->d_size,
(int)sh.sh_link, (unsigned long)sh.sh_flags,
(int)sh.sh_type);
+
+ if (strcmp(name, "license") == 0)
+ err = bpf_obj_license_init(obj, data->d_buf,
+ data->d_size);
+ else if (strcmp(name, "version") == 0)
+ err = bpf_obj_kver_init(obj, data->d_buf,
+ data->d_size);
+ if (err)
+ goto out;
}
out:
return err;
}

+static int bpf_obj_validate(struct bpf_object *obj)
+{
+ if (obj->kern_version == 0) {
+ pr_warning("%s doesn't provide kernel version\n",
+ obj->path);
+ return -EINVAL;
+ }
+ return 0;
+}
+
struct bpf_object *bpf_open_object(const char *path)
{
struct bpf_object *obj;
@@ -283,6 +340,8 @@ struct bpf_object *bpf_open_object(const char *path)
goto out;
if (bpf_obj_elf_collect(obj))
goto out;
+ if (bpf_obj_validate(obj))
+ goto out;

return obj;

--
1.8.3.4

2015-05-17 11:04:15

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 13/37] bpf tools: Collect map definitions from 'maps' section

If maps are used by eBPF programs, corresponding object file(s) should
contain a section named 'map'. Which contains map definitions. This
patch copies the data of the whole section. Map data parsing should be
acted just before map loading.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b26f1ee..6ee5f3c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -82,6 +82,8 @@ struct bpf_object {
bool needs_swap;
char license[64];
u32 kern_version;
+ void *maps_buf;
+ size_t maps_buf_sz;

/*
* Information when doing elf related work. Only valid if fd
@@ -244,6 +246,27 @@ static int bpf_obj_kver_init(struct bpf_object *obj,
return 0;
}

+static int bpf_obj_maps_init(struct bpf_object *obj, void *data,
+ size_t size)
+{
+ if (size == 0) {
+ pr_debug("%s doesn't need map definition\n",
+ obj->path);
+ return 0;
+ }
+
+ obj->maps_buf = malloc(size);
+ if (!obj->maps_buf) {
+ pr_warning("malloc maps failed: %s\n", obj->path);
+ return -ENOMEM;
+ }
+
+ obj->maps_buf_sz = size;
+ memcpy(obj->maps_buf, data, size);
+ pr_debug("maps in %s: %ld bytes\n", obj->path, (long)size);
+ return 0;
+}
+
static int bpf_obj_elf_collect(struct bpf_object *obj)
{
Elf *elf = obj->elf.elf;
@@ -298,6 +321,9 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
else if (strcmp(name, "version") == 0)
err = bpf_obj_kver_init(obj, data->d_buf,
data->d_size);
+ else if (strcmp(name, "maps") == 0)
+ err = bpf_obj_maps_init(obj, data->d_buf,
+ data->d_size);
if (err)
goto out;
}
@@ -359,5 +385,7 @@ void bpf_close_object(struct bpf_object *obj)

if (obj->path)
free(obj->path);
+ if (obj->maps_buf)
+ free(obj->maps_buf);
free(obj);
}
--
1.8.3.4

2015-05-17 11:03:42

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 14/37] bpf tools: Collect config string from 'config' section

A 'config' section is allowed to enable eBPF object file to pass
something to user. libbpf doesn't use config string.

To make further processing easiler, this patch converts '\0' in the
whole config strings into '\n'

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6ee5f3c..43b22a5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -84,6 +84,7 @@ struct bpf_object {
u32 kern_version;
void *maps_buf;
size_t maps_buf_sz;
+ char *config_str;

/*
* Information when doing elf related work. Only valid if fd
@@ -267,6 +268,48 @@ static int bpf_obj_maps_init(struct bpf_object *obj, void *data,
return 0;
}

+static int bpf_obj_config_init(struct bpf_object *obj, void *data,
+ size_t size)
+{
+ char *config_str;
+ char *p, *pend;
+
+ if (size == 0) {
+ pr_debug("bpf: config section in %s empty\n",
+ obj->path);
+ return 0;
+ }
+ if (obj->config_str) {
+ pr_warning("bpf: multiple config section in %s\n",
+ obj->path);
+ return -EEXIST;
+ }
+
+ config_str = malloc(size + 1);
+ if (!config_str) {
+ pr_warning("bpf: malloc config string failed\n");
+ return -ENOMEM;
+ }
+
+ memcpy(config_str, data, size);
+
+ /*
+ * It is possible that config section contains multiple
+ * Make it a big string by converting all '\0' to '\n' and
+ * append final '\0'.
+ */
+ pend = config_str + size;
+ for (p = config_str; p < pend; p++)
+ *p == '\0' ? *p = '\n' : 0 ;
+ *pend = '\0';
+
+ obj->config_str = config_str;
+ pr_debug("--- CONFIG STRING IN %s: ---\n%s\n",
+ obj->path, config_str);
+ pr_debug("----------------------------\n");
+ return 0;
+}
+
static int bpf_obj_elf_collect(struct bpf_object *obj)
{
Elf *elf = obj->elf.elf;
@@ -324,6 +367,9 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
else if (strcmp(name, "maps") == 0)
err = bpf_obj_maps_init(obj, data->d_buf,
data->d_size);
+ else if (strcmp(name, "config") == 0)
+ err = bpf_obj_config_init(obj, data->d_buf,
+ data->d_size);
if (err)
goto out;
}
@@ -387,5 +433,7 @@ void bpf_close_object(struct bpf_object *obj)
free(obj->path);
if (obj->maps_buf)
free(obj->maps_buf);
+ if (obj->config_str)
+ free(obj->config_str);
free(obj);
}
--
1.8.3.4

2015-05-17 11:04:05

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 15/37] bpf tools: Collect symbol table from SHT_SYMTAB section

This patch collects symbols section. This section is useful when
linking ELF maps.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 43b22a5..2068f0b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -94,6 +94,7 @@ struct bpf_object {
int fd;
Elf *elf;
GElf_Ehdr ehdr;
+ Elf_Data *symbols;
} elf;
};
#define obj_elf_valid(o) ((o)->elf.fd >= 0)
@@ -142,6 +143,8 @@ static void bpf_obj_clear_elf(struct bpf_object *obj)
elf_end(obj->elf.elf);
obj->elf.elf = NULL;
}
+ obj->elf.symbols = NULL;
+
if (obj->elf.fd >= 0) {
close(obj->elf.fd);
obj->elf.fd = -1;
@@ -370,6 +373,14 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
else if (strcmp(name, "config") == 0)
err = bpf_obj_config_init(obj, data->d_buf,
data->d_size);
+ else if (sh.sh_type == SHT_SYMTAB) {
+ if (obj->elf.symbols) {
+ pr_warning("bpf: multiple SYMTAB in %s\n",
+ obj->path);
+ err = -EEXIST;
+ } else
+ obj->elf.symbols = data;
+ }
if (err)
goto out;
}
--
1.8.3.4

2015-05-17 10:57:44

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 16/37] bpf tools: Collect eBPF programs from their own sections

This patch collects all programs in an object file into an array of
'struct bpf_program' for further processing. That structure is for
representing each eBPF program. 'bpf_prog' should be a better name, but
it has been used by linux/filter.h. Although it is a kernel space name,
I still prefer to call it 'bpf_program' to prevent possible confusion.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2068f0b..18b221f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -77,6 +77,18 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
#endif

+/*
+ * bpf_prog should be a better name but it has been used in
+ * linux/filter.h.
+ */
+struct bpf_program {
+ /* Index in elf obj file, for relocation use. */
+ int idx;
+ char *section_name;
+ struct bpf_insn *insns;
+ size_t insns_cnt;
+};
+
struct bpf_object {
char *path;
bool needs_swap;
@@ -86,6 +98,9 @@ struct bpf_object {
size_t maps_buf_sz;
char *config_str;

+ struct bpf_program *programs;
+ size_t nr_programs;
+
/*
* Information when doing elf related work. Only valid if fd
* is valid.
@@ -99,6 +114,79 @@ struct bpf_object {
};
#define obj_elf_valid(o) ((o)->elf.fd >= 0)

+static void bpf_clear_program(struct bpf_program *prog)
+{
+ if (!prog)
+ return;
+ if (prog->section_name) {
+ free(prog->section_name);
+ prog->section_name = NULL;
+ }
+ if (prog->insns) {
+ free(prog->insns);
+ prog->insns = NULL;
+ }
+ prog->insns_cnt = 0;
+ prog->idx = -1;
+}
+
+static struct bpf_program *
+bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
+ char *name, int idx)
+{
+ struct bpf_program *prog, *progs;
+ int nr_progs;
+
+ if (size < sizeof(struct bpf_insn)) {
+ pr_warning("corrupted section '%s'\n", name);
+ return NULL;
+ }
+
+ progs = obj->programs;
+ nr_progs = obj->nr_programs;
+
+ progs = realloc(progs, sizeof(*prog) * (nr_progs + 1));
+ if (!progs) {
+ /*
+ * In this case the original obj->programs
+ * is still valid, so don't need special treat for
+ * bpf_close_object().
+ */
+ pr_warning("failed to alloc a new program '%s'\n",
+ name);
+ return NULL;
+ }
+
+ obj->programs = progs;
+
+ prog = &progs[nr_progs];
+ bzero(prog, sizeof(*prog));
+
+ obj->nr_programs = nr_progs + 1;
+
+ prog->section_name = strdup(name);
+ if (!prog->section_name) {
+ pr_warning("failed to alloc name for prog %s\n",
+ name);
+ goto out;
+ }
+
+ prog->insns = malloc(size);
+ if (!prog->insns) {
+ pr_warning("failed to alloc insns for %s\n", name);
+ goto out;
+ }
+ prog->insns_cnt = size / sizeof(struct bpf_insn);
+ memcpy(prog->insns, data,
+ prog->insns_cnt * sizeof(struct bpf_insn));
+ prog->idx = idx;
+
+ return prog;
+out:
+ bpf_clear_program(prog);
+ return NULL;
+}
+
static struct bpf_object *__bpf_obj_alloc(const char *path)
{
struct bpf_object *obj;
@@ -380,6 +468,22 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
err = -EEXIST;
} else
obj->elf.symbols = data;
+ } else if ((sh.sh_type == SHT_PROGBITS) &&
+ (sh.sh_flags & SHF_EXECINSTR) &&
+ (data->d_size > 0)) {
+ struct bpf_program *prog;
+
+ prog = bpf_program_alloc(obj, data->d_buf,
+ data->d_size, name,
+ idx);
+ if (!prog) {
+ pr_warning("failed to alloc "
+ "program %s (%s)", name,
+ obj->path);
+ err = -ENOMEM;
+ } else
+ pr_debug("found program %s\n",
+ prog->section_name);
}
if (err)
goto out;
@@ -446,5 +550,12 @@ void bpf_close_object(struct bpf_object *obj)
free(obj->maps_buf);
if (obj->config_str)
free(obj->config_str);
+ if (obj->programs) {
+ size_t i;
+
+ for (i = 0; i < obj->nr_programs; i++)
+ bpf_clear_program(&obj->programs[i]);
+ free(obj->programs);
+ }
free(obj);
}
--
1.8.3.4

2015-05-17 11:02:59

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 17/37] bpf tools: Collect relocation sections from SHT_REL sections

This patch collects relocation sections into 'struct object'.
Such sections are used for connecting maps to bpf programs.
'reloc' field in 'struct bpf_object' is introduced for storing
such informations.

This patch simply store the data into 'reloc' field. Following
patch will parse them to know the exact instructions which are
needed to be relocated.

Note that the collected data will be invalid after ELF object file
is closed.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 18b221f..8fd29a9 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -110,6 +110,11 @@ struct bpf_object {
Elf *elf;
GElf_Ehdr ehdr;
Elf_Data *symbols;
+ struct {
+ GElf_Shdr shdr;
+ Elf_Data *data;
+ } *reloc;
+ int nr_reloc;
} elf;
};
#define obj_elf_valid(o) ((o)->elf.fd >= 0)
@@ -233,6 +238,12 @@ static void bpf_obj_clear_elf(struct bpf_object *obj)
}
obj->elf.symbols = NULL;

+ if (obj->elf.reloc) {
+ free(obj->elf.reloc);
+ obj->elf.reloc = NULL;
+ obj->elf.nr_reloc = 0;
+ }
+
if (obj->elf.fd >= 0) {
close(obj->elf.fd);
obj->elf.fd = -1;
@@ -484,6 +495,24 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
} else
pr_debug("found program %s\n",
prog->section_name);
+ } else if (sh.sh_type == SHT_REL) {
+ void *reloc = obj->elf.reloc;
+ int nr_reloc = obj->elf.nr_reloc;
+
+ reloc = realloc(reloc,
+ sizeof(*obj->elf.reloc) * (++nr_reloc));
+ if (!reloc) {
+ pr_warning("realloc failed\n");
+ err = -ENOMEM;
+ } else {
+ int n = nr_reloc - 1;
+
+ obj->elf.reloc = reloc;
+ obj->elf.nr_reloc = nr_reloc;
+
+ obj->elf.reloc[n].shdr = sh;
+ obj->elf.reloc[n].data = data;
+ }
}
if (err)
goto out;
--
1.8.3.4

2015-05-17 11:02:37

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 18/37] bpf tools: Record map accessing instructions for each program

This patch records the indics of instructions which are needed to be
relocated. Those information are saved in 'reloc_desc' field in
'struct bpf_program'. In loading phase (this patch takes effect in
opening phase), the collected instructions will be replaced by
map loading instructions.

Since we are going to close the ELF file and clear all data at the end
of 'opening' phase, ELF information will no longer be valid in
'loading' phase. We have to locate the instructions before maps are
loaded, instead of directly modifying the instruction.

'struct bpf_map_def' is introduce in this patch to let us know how many
maps defined in the object.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 128 +++++++++++++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 10 ++++
2 files changed, 138 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8fd29a9..ded96cb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10,6 +10,7 @@
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
+#include <inttypes.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
@@ -87,6 +88,12 @@ struct bpf_program {
char *section_name;
struct bpf_insn *insns;
size_t insns_cnt;
+
+ struct {
+ int insn_idx;
+ int map_idx;
+ } *reloc_desc;
+ int nr_reloc;
};

struct bpf_object {
@@ -131,6 +138,12 @@ static void bpf_clear_program(struct bpf_program *prog)
free(prog->insns);
prog->insns = NULL;
}
+ if (prog->reloc_desc) {
+ free(prog->reloc_desc);
+ prog->reloc_desc = NULL;
+ }
+
+ prog->nr_reloc = 0;
prog->insns_cnt = 0;
prog->idx = -1;
}
@@ -521,6 +534,119 @@ out:
return err;
}

+static struct bpf_program *
+bpf_find_prog_by_idx(struct bpf_object *obj, int idx)
+{
+ struct bpf_program *prog;
+ size_t i;
+
+ for (i = 0; i < obj->nr_programs; i++) {
+ prog = &obj->programs[i];
+ if (prog->idx == idx)
+ return prog;
+ }
+ return NULL;
+}
+
+static int
+bpf_program_collect_reloc(struct bpf_object *obj,
+ GElf_Shdr *shdr,
+ Elf_Data *data,
+ struct bpf_program *prog)
+{
+ int i, nrels;
+ size_t nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
+
+ pr_debug("collecting relocating info for: '%s'\n",
+ prog->section_name);
+ nrels = shdr->sh_size / shdr->sh_entsize;
+
+ prog->reloc_desc = malloc(sizeof(*prog->reloc_desc) * nrels);
+ if (!prog->reloc_desc) {
+ pr_warning("failed to alloc memory in relocation\n");
+ return -ENOMEM;
+ }
+ prog->nr_reloc = nrels;
+
+ for (i = 0; i < nrels; i++) {
+ GElf_Sym sym;
+ GElf_Rel rel;
+ unsigned int insn_idx;
+ struct bpf_insn *insns = prog->insns;
+ size_t map_idx;
+
+ if (!gelf_getrel(data, i, &rel)) {
+ pr_warning("relocation: failed to get "
+ "%d reloc\n", i);
+ return -EINVAL;
+ }
+
+ insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+ pr_debug("relocation: insn_idx=%u\n", insn_idx);
+
+ if (!gelf_getsym(obj->elf.symbols,
+ GELF_R_SYM(rel.r_info),
+ &sym)) {
+ pr_warning("relocation: symbol %"PRIx64
+ " not found\n",
+ GELF_R_SYM(rel.r_info));
+ return -EINVAL;
+ }
+
+ if (insns[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
+ pr_warning("bpf: relocation: invalid relo for "
+ "insns[%d].code 0x%x\n",
+ insn_idx, insns[insn_idx].code);
+ return -EINVAL;
+ }
+
+ map_idx = sym.st_value / sizeof(struct bpf_map_def);
+ if (map_idx >= nr_maps) {
+ pr_warning("bpf relocation: map_idx %d large than %d\n",
+ (int)map_idx, (int)nr_maps - 1);
+ return -EINVAL;
+ }
+
+ prog->reloc_desc[i].insn_idx = insn_idx;
+ prog->reloc_desc[i].map_idx = map_idx;
+ }
+ return 0;
+}
+
+static int bpf_obj_collect_reloc(struct bpf_object *obj)
+{
+ int i, err;
+
+ if (!obj_elf_valid(obj)) {
+ pr_warning("Internal error: elf object is closed\n");
+ return -EINVAL;
+ }
+
+ for (i = 0; i < obj->elf.nr_reloc; i++) {
+ GElf_Shdr *shdr = &obj->elf.reloc[i].shdr;
+ Elf_Data *data = obj->elf.reloc[i].data;
+ int idx = shdr->sh_info;
+ struct bpf_program *prog;
+
+ if (shdr->sh_type != SHT_REL) {
+ pr_warning("internal error at %d\n", __LINE__);
+ return -EINVAL;
+ }
+
+ prog = bpf_find_prog_by_idx(obj, idx);
+ if (!prog) {
+ pr_warning("relocation failed: no %d section\n",
+ idx);
+ return -ENOENT;
+ }
+
+ err = bpf_program_collect_reloc(obj, shdr, data, prog);
+ if (err)
+ return -EINVAL;
+ }
+ return 0;
+}
+
static int bpf_obj_validate(struct bpf_object *obj)
{
if (obj->kern_version == 0) {
@@ -556,6 +682,8 @@ struct bpf_object *bpf_open_object(const char *path)
goto out;
if (bpf_obj_elf_collect(obj))
goto out;
+ if (bpf_obj_collect_reloc(obj))
+ goto out;
if (bpf_obj_validate(obj))
goto out;

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e523ae9..3505be7 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -21,4 +21,14 @@ struct bpf_object;
struct bpf_object *bpf_open_object(const char *path);
void bpf_close_object(struct bpf_object *object);

+/*
+ * packed attribute is unnecessary for 'bpf_map_def'.
+ */
+struct bpf_map_def {
+ unsigned int type;
+ unsigned int key_size;
+ unsigned int value_size;
+ unsigned int max_entries;
+};
+
#endif
--
1.8.3.4

2015-05-17 11:03:09

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 19/37] bpf tools: Clear libelf and ELF parsing resrouce to finish opening

After all eBPF programs in an object file are loaded, related ELF
information is useless. Close the object file and free those memory.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ded96cb..9ed8cca 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -687,8 +687,8 @@ struct bpf_object *bpf_open_object(const char *path)
if (bpf_obj_validate(obj))
goto out;

+ bpf_obj_clear_elf(obj);
return obj;
-
out:
bpf_close_object(obj);
return NULL;
--
1.8.3.4

2015-05-17 11:02:24

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 20/37] bpf tools: Add bpf.c/h for common bpf operations

This patch introduces bpf.c and bpf.h, which hold common functions
issuing bpf syscall. The goal of these two files is to hide syscall
completly from user. Note that bpf.c and bpf.h only deal with kernel
interface. Things like structure of 'map' section in the ELF object is
not cared by of bpf.[ch].

We first introduce bpf_create_map().

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/Build | 2 +-
tools/lib/bpf/bpf.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/bpf.h | 17 +++++++++++++++++
3 files changed, 71 insertions(+), 1 deletion(-)
create mode 100644 tools/lib/bpf/bpf.c
create mode 100644 tools/lib/bpf/bpf.h

diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build
index a316484..d874975 100644
--- a/tools/lib/bpf/Build
+++ b/tools/lib/bpf/Build
@@ -1 +1 @@
-libbpf-y := libbpf.o
+libbpf-y := libbpf.o bpf.o
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
new file mode 100644
index 0000000..3dbe30d
--- /dev/null
+++ b/tools/lib/bpf/bpf.c
@@ -0,0 +1,53 @@
+/*
+ * common eBPF ELF operations.
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+
+#include <stdlib.h>
+#include <memory.h>
+#include <unistd.h>
+#include <asm/unistd.h>
+#include <linux/bpf.h>
+#include "bpf.h"
+
+/* When building perf, unistd.h is override. __NR_bpf by ourself. */
+#if defined(__i386__)
+#ifndef __NR_bpf
+# define __NR_bpf 357
+#endif
+#endif
+
+#if defined(__x86_64__)
+#ifndef __NR_bpf
+# define __NR_bpf 321
+#endif
+#endif
+
+#if defined(__aarch64__)
+#ifndef __NR_bpf
+# define __NR_bpf 280
+#endif
+#endif
+
+static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
+{
+ return syscall(__NR_bpf, cmd, attr, size);
+}
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+ int max_entries)
+{
+ union bpf_attr attr;
+ memset(&attr, '\0', sizeof(attr));
+
+ attr.map_type = map_type;
+ attr.key_size = key_size;
+ attr.value_size = value_size;
+ attr.max_entries = max_entries;
+
+ return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
new file mode 100644
index 0000000..1e9a53b
--- /dev/null
+++ b/tools/lib/bpf/bpf.h
@@ -0,0 +1,17 @@
+/*
+ * common eBPF ELF operations.
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __LIBBPF_BPF_H
+#define __LIBBPF_BPF_H
+
+#include <linux/bpf.h>
+
+int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
+ int max_entries);
+
+#endif
--
1.8.3.4

2015-05-17 11:03:25

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 21/37] bpf tools: Create eBPF maps defined in an object file

This patch creates maps based on 'map' section in object file using
bpf_create_map(), and store the fds into an array in
'struct bpf_object'. Since the byte order of the object may differ
from the host, swap map definition before processing.

This is the first patch in 'loading' phase. Previous patches parse ELF
object file and create needed data structure, but doesnnt play with
kernel. They belong to 'opening' phase.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 4 +++
2 files changed, 102 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9ed8cca..6ff4cb6 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -22,6 +22,7 @@
#include <gelf.h>

#include "libbpf.h"
+#include "bpf.h"

#ifdef min
# undef min
@@ -107,6 +108,7 @@ struct bpf_object {

struct bpf_program *programs;
size_t nr_programs;
+ int *maps_fds;

/*
* Information when doing elf related work. Only valid if fd
@@ -613,6 +615,67 @@ bpf_program_collect_reloc(struct bpf_object *obj,
return 0;
}

+static int
+bpf_obj_create_maps(struct bpf_object *obj)
+{
+ unsigned int i;
+ size_t nr_maps;
+ int *pfd;
+
+ nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
+ if (!obj->maps_buf || !nr_maps) {
+ pr_debug("don't need create maps for %s\n",
+ obj->path);
+ return 0;
+ }
+
+ obj->maps_fds = malloc(sizeof(int) * nr_maps);
+ if (!obj->maps_fds) {
+ pr_warning("realloc perf_bpf_maps_fds failed\n");
+ return -ENOMEM;
+ }
+
+ /* fill all fd with -1 */
+ memset(obj->maps_fds, 0xff, sizeof(int) * nr_maps);
+
+ pfd = obj->maps_fds;
+ for (i = 0; i < nr_maps; i++) {
+ struct bpf_map_def def;
+
+ def = *(struct bpf_map_def *)(obj->maps_buf +
+ i * sizeof(struct bpf_map_def));
+
+ if (obj->needs_swap) {
+ def.type = bswap_32(def.type);
+ def.key_size = bswap_32(def.key_size);
+ def.value_size = bswap_32(def.value_size);
+ def.max_entries = bswap_32(def.max_entries);
+ }
+
+ *pfd = bpf_create_map(def.type,
+ def.key_size,
+ def.value_size,
+ def.max_entries);
+ if (*pfd < 0) {
+ size_t j;
+ int err = *pfd;
+
+ pr_warning("failed to create map: %s\n",
+ strerror(errno));
+ for (j = 0; j < i; j++) {
+ close(obj->maps_fds[j]);
+ obj->maps_fds[j] = -1;
+ }
+ free(obj->maps_fds);
+ obj->maps_fds = NULL;
+ return err;
+ }
+ pr_debug("create map: fd=%d\n", *pfd);
+ pfd ++;
+ }
+ return 0;
+}
+
static int bpf_obj_collect_reloc(struct bpf_object *obj)
{
int i, err;
@@ -694,12 +757,47 @@ out:
return NULL;
}

+int bpf_unload_object(struct bpf_object *obj)
+{
+ if (!obj)
+ return -EINVAL;
+
+ if (obj->maps_fds) {
+ size_t i;
+ size_t sz = sizeof(struct bpf_map_def);
+
+ for (i = 0; i < obj->maps_buf_sz; i += sz) {
+ if (obj->maps_fds[i] >= 0)
+ close(obj->maps_fds[i]);
+ }
+ free(obj->maps_fds);
+ }
+
+ return 0;
+}
+
+int bpf_load_object(struct bpf_object *obj)
+{
+ if (!obj)
+ return -EINVAL;
+
+ if (bpf_obj_create_maps(obj))
+ goto out;
+
+ return 0;
+out:
+ bpf_unload_object(obj);
+ pr_warning("failed to load object '%s'\n", obj->path);
+ return -EINVAL;
+}
+
void bpf_close_object(struct bpf_object *obj)
{
if (!obj)
return;

bpf_obj_clear_elf(obj);
+ bpf_unload_object(obj);

if (obj->path)
free(obj->path);
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 3505be7..c0b290d 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -21,6 +21,10 @@ struct bpf_object;
struct bpf_object *bpf_open_object(const char *path);
void bpf_close_object(struct bpf_object *object);

+/* Load/unload object into/from kernel */
+int bpf_load_object(struct bpf_object *obj);
+int bpf_unload_object(struct bpf_object *obj);
+
/*
* packed attribute is unnecessary for 'bpf_map_def'.
*/
--
1.8.3.4

2015-05-17 11:02:48

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 22/37] bpf tools: Relocate eBPF programs

If an eBPF program access a map, LLVM generates a relocated load
instruction. To enable the usage of that map, relocation must be done
by replacing original instructions by map loading instructions.

Based on relocation description collected during 'opening' phase, this
patch replaces the instructions with map loading with correct map fd.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 48 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6ff4cb6..5e25ea7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -676,6 +676,52 @@ bpf_obj_create_maps(struct bpf_object *obj)
return 0;
}

+static int
+bpf_program_relocate(struct bpf_object *obj, struct bpf_program *prog)
+{
+ int i;
+
+ if (!prog || !prog->reloc_desc)
+ return 0;
+
+ for (i = 0; i < prog->nr_reloc; i++) {
+ int insn_idx, map_idx;
+ struct bpf_insn *insns = prog->insns;
+
+ insn_idx = prog->reloc_desc[i].insn_idx;
+ map_idx = prog->reloc_desc[i].map_idx;
+
+ if (insn_idx >= (int)prog->insns_cnt) {
+ pr_warning("relocation out of range: '%s'\n",
+ prog->section_name);
+ return -ERANGE;
+ }
+ insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+ insns[insn_idx].imm = obj->maps_fds[map_idx];
+ }
+
+ return 0;
+}
+
+static int
+bpf_obj_relocate(struct bpf_object *obj)
+{
+ struct bpf_program *prog;
+ size_t i;
+ int err;
+
+ for (i = 0; i < obj->nr_programs; i++) {
+ prog = &obj->programs[i];
+
+ if ((err = bpf_program_relocate(obj, prog))) {
+ pr_warning("failed to relocate '%s'\n",
+ prog->section_name);
+ return err;
+ }
+ }
+ return 0;
+}
+
static int bpf_obj_collect_reloc(struct bpf_object *obj)
{
int i, err;
@@ -783,6 +829,8 @@ int bpf_load_object(struct bpf_object *obj)

if (bpf_obj_create_maps(obj))
goto out;
+ if (bpf_obj_relocate(obj))
+ goto out;

return 0;
out:
--
1.8.3.4

2015-05-17 11:01:13

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 23/37] bpf tools: Introduce bpf_load_program() to bpf.c

bpf_load_program() can be used to load bpf program into kernel. To make
loading faster, first try to load without logbuf. Try again with logbuf
if the first try failed.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/bpf.c | 34 ++++++++++++++++++++++++++++++++++
tools/lib/bpf/bpf.h | 7 +++++++
2 files changed, 41 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 3dbe30d..e4eed22 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -33,6 +33,11 @@
#endif
#endif

+static __u64 ptr_to_u64(void *ptr)
+{
+ return (__u64) (unsigned long) ptr;
+}
+
static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
{
return syscall(__NR_bpf, cmd, attr, size);
@@ -51,3 +56,32 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,

return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
+
+int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+ size_t insns_cnt, char *license,
+ u32 kern_version, char *log_buf, size_t log_buf_sz)
+{
+ int fd;
+ union bpf_attr attr;
+
+ bzero(&attr, sizeof(attr));
+ attr.prog_type = type;
+ attr.insn_cnt = (__u32)insns_cnt;
+ attr.insns = ptr_to_u64((void *) insns);
+ attr.license = ptr_to_u64((void *) license);
+ attr.log_buf = ptr_to_u64(NULL);
+ attr.log_size = 0;
+ attr.log_level = 0;
+ attr.kern_version = kern_version;
+
+ fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+ if ((fd >= 0) || !log_buf || !log_buf_sz)
+ return fd;
+
+ /* Try again with log */
+ attr.log_buf = ptr_to_u64(log_buf);
+ attr.log_size = log_buf_sz;
+ attr.log_level = 1;
+ log_buf[0] = 0;
+ return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 1e9a53b..fb2a613 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -14,4 +14,11 @@
int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries);

+/* Recommend log buffer size */
+#define BPF_LOG_BUF_SIZE 65536
+int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
+ size_t insns_cnt, char *license,
+ u32 kern_version, char *log_buf,
+ size_t log_buf_sz);
+
#endif
--
1.8.3.4

2015-05-17 10:58:41

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 24/37] bpf tools: Load eBPF programs in object files into kernel

This patch utilizes previous introduced bpf_load_program to load
programs in the ELF file into kernel. Result is stored in 'fd' field
in 'struct bpf_program'.

During loading, it allocs a log buffer and free it before return.
Note that that buffer is not passed to bpf_load_program() if the first
loading try is successful. Doesn't use a statically allocated log
buffer to avoid potention multi-thread problem.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 5e25ea7..e8ef78e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -95,6 +95,8 @@ struct bpf_program {
int map_idx;
} *reloc_desc;
int nr_reloc;
+
+ int fd;
};

struct bpf_object {
@@ -128,10 +130,24 @@ struct bpf_object {
};
#define obj_elf_valid(o) ((o)->elf.fd >= 0)

+static void bpf_unload_program(struct bpf_program *prog)
+{
+ if (!prog)
+ return;
+
+ if (prog->fd >= 0) {
+ close(prog->fd);
+ prog->fd = -1;
+ }
+}
+
static void bpf_clear_program(struct bpf_program *prog)
{
if (!prog)
return;
+
+ bpf_unload_program(prog);
+
if (prog->section_name) {
free(prog->section_name);
prog->section_name = NULL;
@@ -200,6 +216,7 @@ bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
memcpy(prog->insns, data,
prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
+ prog->fd = -1;

return prog;
out:
@@ -756,6 +773,59 @@ static int bpf_obj_collect_reloc(struct bpf_object *obj)
return 0;
}

+static int
+bpf_obj_load_prog(struct bpf_object *obj, struct bpf_program *prog)
+{
+ int fd, err;
+ char *log_buf;
+
+ log_buf = malloc(BPF_LOG_BUF_SIZE);
+ if (!log_buf)
+ pr_warning("Alloc log buffer for bpf loader error, "
+ "continue without log\n");
+
+ fd = bpf_load_program(BPF_PROG_TYPE_KPROBE, prog->insns,
+ prog->insns_cnt, obj->license,
+ obj->kern_version, log_buf,
+ BPF_LOG_BUF_SIZE);
+
+ if (fd >= 0) {
+ prog->fd = fd;
+ pr_debug("load bpf program '%s': fd = %d\n",
+ prog->section_name, prog->fd);
+ err = 0;
+ goto out;
+ }
+
+ err = -EINVAL;
+ pr_warning("load bpf program '%s' failed: %s\n",
+ prog->section_name, strerror(errno));
+
+ if (log_buf)
+ pr_warning("bpf: load: failed to load program '%s':\n"
+ "-- BEGIN DUMP LOG ---\n%s\n-- END LOG --\n",
+ prog->section_name, log_buf);
+
+out:
+ if (log_buf)
+ free(log_buf);
+ return err;
+}
+
+static int
+bpf_obj_load_progs(struct bpf_object *obj)
+{
+ size_t i;
+ int err;
+
+ for (i = 0; i < obj->nr_programs; i++) {
+ err = bpf_obj_load_prog(obj, &obj->programs[i]);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
static int bpf_obj_validate(struct bpf_object *obj)
{
if (obj->kern_version == 0) {
@@ -819,6 +889,13 @@ int bpf_unload_object(struct bpf_object *obj)
free(obj->maps_fds);
}

+ if (obj->programs) {
+ size_t i;
+
+ for (i = 0; i < obj->nr_programs; i++)
+ bpf_unload_program(&obj->programs[i]);
+ }
+
return 0;
}

@@ -831,6 +908,8 @@ int bpf_load_object(struct bpf_object *obj)
goto out;
if (bpf_obj_relocate(obj))
goto out;
+ if (bpf_obj_load_progs(obj))
+ goto out;

return 0;
out:
--
1.8.3.4

2015-05-17 11:01:34

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 25/37] bpf tools: Introduce accessors for struct bpf_program

This patch introduces accessors for user of libbpf to retrive section
name and fd of a opened/loaded eBPF program. 'struct bpf_prog_handler'
is used for that purpose. Accessors of programs section name and file
descriptor are provided. Set/get private data are also impelmented.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 137 +++++++++++++++++++++++++++++++++++++++++++++++++
tools/lib/bpf/libbpf.h | 25 +++++++++
2 files changed, 162 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index e8ef78e..d770adc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -79,6 +79,20 @@ void libbpf_set_print(int (*warn)(const char *format, ...),
# define LIBBPF_ELF_C_READ_MMAP ELF_C_READ
#endif

+#ifndef container_of
+# define container_of(ptr, type, member) ({ \
+ const typeof( ((type *)0)->member ) *__mptr = (ptr); \
+ (type *)( (char *)__mptr - offsetof(type,member) );})
+#endif
+
+/* Accessing of project */
+struct bpf_prog_handler {
+ struct bpf_object *obj;
+
+ void *priv;
+ bpf_prog_clear_priv_t clear_priv;
+};
+
/*
* bpf_prog should be a better name but it has been used in
* linux/filter.h.
@@ -97,6 +111,7 @@ struct bpf_program {
int nr_reloc;

int fd;
+ struct bpf_prog_handler handler;
};

struct bpf_object {
@@ -146,6 +161,12 @@ static void bpf_clear_program(struct bpf_program *prog)
if (!prog)
return;

+ if (prog->handler.clear_priv)
+ prog->handler.clear_priv(&prog->handler, prog->handler.priv);
+
+ prog->handler.priv = NULL;
+ prog->handler.clear_priv = NULL;
+
bpf_unload_program(prog);

if (prog->section_name) {
@@ -217,6 +238,7 @@ bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
prog->fd = -1;
+ prog->handler.obj = obj;

return prog;
out:
@@ -941,3 +963,118 @@ void bpf_close_object(struct bpf_object *obj)
}
free(obj);
}
+
+static inline struct bpf_program *
+handler_to_prog(struct bpf_prog_handler *handler)
+{
+ struct bpf_program *prog;
+ struct bpf_object *obj;
+ size_t idx;
+
+ if (!handler)
+ goto out_warn;
+
+ obj = handler->obj;
+ prog = container_of(handler, struct bpf_program, handler);
+ idx = prog - obj->programs;
+ if (idx >= obj->nr_programs)
+ goto out_warn;
+ return &obj->programs[idx];
+
+out_warn:
+ pr_warning("invalid handler %p\n", handler);
+ return NULL;
+}
+
+struct bpf_prog_handler *
+bpf_next_prog(struct bpf_object *obj, struct bpf_prog_handler *handler)
+{
+ struct bpf_program *prog;
+ size_t idx;
+
+ if (!obj->programs)
+ return NULL;
+ /* First handler */
+ if (handler == NULL)
+ return (&obj->programs[0].handler);
+
+ if (handler->obj != obj) {
+ pr_warning("error: program handler doesn't "
+ "match object\n");
+ return NULL;
+ }
+
+ prog = handler_to_prog(handler);
+ if (!prog)
+ return NULL;
+
+ idx = (prog - obj->programs) + 1;
+ if (idx >= obj->nr_programs)
+ return NULL;
+ return &obj->programs[idx].handler;
+}
+
+int bpf_prog_set_private(struct bpf_prog_handler *handler, void *priv,
+ bpf_prog_clear_priv_t clear_priv)
+{
+ struct bpf_program *prog;
+
+ prog = handler_to_prog(handler);
+ if (!prog)
+ return -EINVAL;
+
+ if (prog->handler.priv && prog->handler.clear_priv)
+ prog->handler.clear_priv(&prog->handler, prog->handler.priv);
+
+ prog->handler.priv = priv;
+ prog->handler.clear_priv = clear_priv;
+ return 0;
+}
+
+int bpf_prog_get_private(struct bpf_prog_handler *handler, void **ppriv)
+{
+ struct bpf_program *prog;
+
+ prog = handler_to_prog(handler);
+ if (!prog || !ppriv)
+ return -EINVAL;
+
+ *ppriv = prog->handler.priv;
+ return 0;
+}
+
+int bpf_prog_get_title(struct bpf_prog_handler *handler,
+ const char **ptitle, bool dup)
+{
+ struct bpf_program *prog;
+ const char *title;
+
+ prog = handler_to_prog(handler);
+ if (!prog || !ptitle)
+ return -EINVAL;
+
+ title = prog->section_name;
+ if (dup) {
+ title = strdup(title);
+ if (!title) {
+ pr_warning("failed to strdup program title\n");
+ *ptitle = NULL;
+ return -ENOMEM;
+ }
+ }
+
+ *ptitle = title;
+ return 0;
+}
+
+int bpf_prog_get_fd(struct bpf_prog_handler *handler, int *pfd)
+{
+ struct bpf_program *prog;
+
+ prog = handler_to_prog(handler);
+ if (!prog || !pfd)
+ return -EINVAL;
+
+ *pfd = prog->fd;
+ return 0;
+}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index c0b290d..b451d1a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -10,6 +10,7 @@
#define __BPF_LIBBPF_H

#include <stdio.h>
+#include <stdbool.h>

void libbpf_set_print(int (*warn)(const char *format, ...),
int (*info)(const char *format, ...),
@@ -25,6 +26,30 @@ void bpf_close_object(struct bpf_object *object);
int bpf_load_object(struct bpf_object *obj);
int bpf_unload_object(struct bpf_object *obj);

+/* Accessors of bpf_program. */
+struct bpf_prog_handler;
+struct bpf_prog_handler *bpf_next_prog(struct bpf_object *obj,
+ struct bpf_prog_handler *handler);
+
+#define bpf_obj_for_each_prog(obj, pos) \
+ for ((pos) = bpf_next_prog((obj), NULL); \
+ (pos) != NULL; \
+ (pos) = bpf_next_prog((obj), (pos)))
+
+typedef void (*bpf_prog_clear_priv_t)(struct bpf_prog_handler *,
+ void *);
+
+int bpf_prog_set_private(struct bpf_prog_handler *handler, void *priv,
+ bpf_prog_clear_priv_t clear_priv);
+
+int bpf_prog_get_private(struct bpf_prog_handler *handler,
+ void **ppriv);
+
+int bpf_prog_get_title(struct bpf_prog_handler *handler,
+ const char **ptitle, bool dup);
+
+int bpf_prog_get_fd(struct bpf_prog_handler *handler, int *pfd);
+
/*
* packed attribute is unnecessary for 'bpf_map_def'.
*/
--
1.8.3.4

2015-05-17 10:58:48

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 26/37] bpf tools: Introduce accessors for struct bpf_object

This patch add an accessor which allows caller to get count of programs
in an object file.

Signed-off-by: Wang Nan <[email protected]>
---
tools/lib/bpf/libbpf.c | 9 +++++++++
tools/lib/bpf/libbpf.h | 3 +++
2 files changed, 12 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index d770adc..89c725a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -964,6 +964,15 @@ void bpf_close_object(struct bpf_object *obj)
free(obj);
}

+int bpf_obj_get_prog_cnt(struct bpf_object *obj, size_t *pcnt)
+{
+ if (!obj || !pcnt)
+ return -EINVAL;
+
+ *pcnt = obj->nr_programs;
+ return 0;
+}
+
static inline struct bpf_program *
handler_to_prog(struct bpf_prog_handler *handler)
{
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b451d1a..31ff5d9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -26,6 +26,9 @@ void bpf_close_object(struct bpf_object *object);
int bpf_load_object(struct bpf_object *obj);
int bpf_unload_object(struct bpf_object *obj);

+/* Accessors of bpf_object */
+int bpf_obj_get_prog_cnt(struct bpf_object *obj, size_t *pcnt);
+
/* Accessors of bpf_program. */
struct bpf_prog_handler;
struct bpf_prog_handler *bpf_next_prog(struct bpf_object *obj,
--
1.8.3.4

2015-05-17 11:01:27

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 27/37] perf tools: Add new 'perf bpf' command

Adding new 'perf bpf' command to provide eBPF program management
operations. This patch only creates basic 'perf bpf'. Subcommands will
be introduced in following patches.

To utilize existing code of usage_with_options() while enable
subcommand list get output after 'Usage ...' indicator, this patch
add a usage_with_options_noexit() function, which does similar thing
except exiting, allows caller print more information before quit.

In this patch, 'perf bpf' command won't be built if doesn't find
libelf.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/Build | 1 +
tools/perf/Documentation/perf-bpf.txt | 18 ++++++++
tools/perf/builtin-bpf.c | 85 +++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/command-list.txt | 1 +
tools/perf/perf.c | 3 ++
tools/perf/util/parse-options.c | 8 +++-
tools/perf/util/parse-options.h | 2 +
8 files changed, 118 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/Documentation/perf-bpf.txt
create mode 100644 tools/perf/builtin-bpf.c

diff --git a/tools/perf/Build b/tools/perf/Build
index b77370e..c3c6cc3 100644
--- a/tools/perf/Build
+++ b/tools/perf/Build
@@ -19,6 +19,7 @@ perf-y += builtin-kvm.o
perf-y += builtin-inject.o
perf-y += builtin-mem.o
perf-y += builtin-data.o
+perf-$(CONFIG_LIBELF) += builtin-bpf.o

perf-$(CONFIG_AUDIT) += builtin-trace.o
perf-$(CONFIG_LIBELF) += builtin-probe.o
diff --git a/tools/perf/Documentation/perf-bpf.txt b/tools/perf/Documentation/perf-bpf.txt
new file mode 100644
index 0000000..0e8b590
--- /dev/null
+++ b/tools/perf/Documentation/perf-bpf.txt
@@ -0,0 +1,18 @@
+perf-bpf(1)
+==============
+
+NAME
+----
+perf-bpf - Management of eBPF programs.
+
+SYNOPSIS
+--------
+[verse]
+'perf bpf' [<common options>] <command> [<options>]",
+
+DESCRIPTION
+-----------
+Management of eBPF programs.
+
+OPTIONS
+-------
diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
new file mode 100644
index 0000000..a8858e2
--- /dev/null
+++ b/tools/perf/builtin-bpf.c
@@ -0,0 +1,85 @@
+/*
+ * builtin-bpf.c
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ *
+ * Builtin bpf command: management of bpf programs.
+ */
+#include "builtin.h"
+#include "perf.h"
+#include "debug.h"
+#include "parse-options.h"
+
+typedef int (*bpf_cmd_fn_t)(int argc, const char **argv, const char *prefix);
+
+struct bpf_cmd {
+ const char *name;
+ const char *summary;
+ bpf_cmd_fn_t fn;
+};
+
+static struct bpf_cmd bpf_cmds[];
+
+#define for_each_cmd(cmd) \
+ for (cmd = bpf_cmds; cmd && cmd->name; cmd++)
+
+struct option bpf_options[] = {
+ OPT_INCR('v', "verbose", &verbose, "be more verbose "
+ "(show debug information)"),
+ OPT_END()
+};
+
+static const char *bpf_usage[] = {
+ "perf bpf [<command options>] <command> [<options>]",
+ NULL
+};
+
+static void print_usage(void)
+{
+ struct bpf_cmd *cmd;
+
+ usage_with_options_noexit(bpf_usage, bpf_options);
+ printf("\tAvailable commands:\n");
+ for_each_cmd(cmd)
+ printf("\t %s\t- %s\n", cmd->name, cmd->summary);
+ exit(129);
+}
+
+static const char * const bpf_subcommands[] = { NULL };
+
+static struct bpf_cmd bpf_cmds[] = {
+ { .name = NULL, },
+};
+
+int cmd_bpf(int argc, const char **argv,
+ const char *prefix __maybe_unused)
+{
+ struct bpf_cmd *cmd;
+ const char *cmdstr;
+
+ /* No command specified. */
+ if (argc < 2)
+ goto usage;
+
+ argc = parse_options_subcommand(argc, argv, bpf_options, bpf_subcommands, bpf_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (argc < 1)
+ goto usage;
+
+ cmdstr = argv[0];
+
+ for_each_cmd(cmd) {
+ if (strcmp(cmd->name, cmdstr))
+ continue;
+
+ return cmd->fn(argc, argv, prefix);
+ }
+
+ pr_err("Unknown command %s\n", cmdstr);
+usage:
+ print_usage();
+ return -1;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 3688ad2..c2c4a0d 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -38,6 +38,7 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_inject(int argc, const char **argv, const char *prefix);
extern int cmd_mem(int argc, const char **argv, const char *prefix);
extern int cmd_data(int argc, const char **argv, const char *prefix);
+extern int cmd_bpf(int argc, const char **argv, const char *prefix);

extern int find_scripts(char **scripts_array, char **scripts_path_array);
#endif
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index 00fcaf8..1000463 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -5,6 +5,7 @@
perf-annotate mainporcelain common
perf-archive mainporcelain common
perf-bench mainporcelain common
+perf-bpf mainporcelain full
perf-buildid-cache mainporcelain common
perf-buildid-list mainporcelain common
perf-data mainporcelain common
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index b857fcb..eff1a55 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -64,6 +64,9 @@ static struct cmd_struct commands[] = {
{ "inject", cmd_inject, 0 },
{ "mem", cmd_mem, 0 },
{ "data", cmd_data, 0 },
+#ifdef HAVE_LIBELF_SUPPORT
+ { "bpf", cmd_bpf, 0 },
+#endif
};

struct pager_config {
diff --git a/tools/perf/util/parse-options.c b/tools/perf/util/parse-options.c
index 01626be..09e48a2 100644
--- a/tools/perf/util/parse-options.c
+++ b/tools/perf/util/parse-options.c
@@ -672,9 +672,15 @@ int usage_with_options_internal(const char * const *usagestr,
void usage_with_options(const char * const *usagestr,
const struct option *opts)
{
+ usage_with_options_noexit(usagestr, opts);
+ exit(129);
+}
+
+void usage_with_options_noexit(const char * const *usagestr,
+ const struct option *opts)
+{
exit_browser(false);
usage_with_options_internal(usagestr, opts, 0);
- exit(129);
}

int parse_options_usage(const char * const *usagestr,
diff --git a/tools/perf/util/parse-options.h b/tools/perf/util/parse-options.h
index 59561fd..41194db 100644
--- a/tools/perf/util/parse-options.h
+++ b/tools/perf/util/parse-options.h
@@ -156,6 +156,8 @@ extern int parse_options_subcommand(int argc, const char **argv,

extern NORETURN void usage_with_options(const char * const *usagestr,
const struct option *options);
+extern void usage_with_options_noexit(const char * const *usagestr,
+ const struct option *options);

/*----- incremantal advanced APIs -----*/

--
1.8.3.4

2015-05-17 11:01:40

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 28/37] perf tools: Make perf depend on libbpf

By adding libbpf into perf's Makefile, this patch enable perf to
build libbpf during building if libelf is found and NO_LIBELF is not
set. The newly introduced code is similar to libapi and libtraceevent
building in Makefile.perf.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/Makefile.perf | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c43a205..c69821c 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -123,6 +123,7 @@ STRIP = strip

LIB_DIR = $(srctree)/tools/lib/api/
TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
+BPF_DIR = $(srctree)/tools/lib/bpf/

# include config/Makefile by default and rule out
# non-config cases
@@ -158,6 +159,7 @@ strip-libs = $(filter-out -l%,$(1))

ifneq ($(OUTPUT),)
TE_PATH=$(OUTPUT)
+ BPF_PATH=$(OUTPUT)
ifneq ($(subdir),)
LIB_PATH=$(OUTPUT)/../lib/api/
else
@@ -166,6 +168,7 @@ endif
else
TE_PATH=$(TRACE_EVENT_DIR)
LIB_PATH=$(LIB_DIR)
+ BPF_PATH=$(BPF_DIR)
endif

LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
@@ -174,6 +177,9 @@ export LIBTRACEEVENT
LIBAPI = $(LIB_PATH)libapi.a
export LIBAPI

+LIBBPF = $(BPF_PATH)/libbpf.a
+export LIBBPF
+
# python extension build directories
PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/
PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/
@@ -225,6 +231,11 @@ export PERL_PATH
LIB_FILE=$(OUTPUT)libperf.a

PERFLIBS = $(LIB_FILE) $(LIBAPI) $(LIBTRACEEVENT)
+ifndef NO_LIBELF
+ ifeq ($(feature-libelf), 1)
+ PERFLIBS += $(LIBBPF)
+ endif
+endif

# We choose to avoid "if .. else if .. else .. endif endif"
# because maintaining the nesting to match is a pain. If
@@ -387,6 +398,13 @@ $(LIBAPI)-clean:
$(call QUIET_CLEAN, libapi)
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null

+$(LIBBPF): FORCE
+ $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a
+
+$(LIBBPF)-clean:
+ $(call QUIET_CLEAN, libbpf)
+ $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
+
help:
@echo 'Perf make targets:'
@echo ' doc - make *all* documentation (see below)'
@@ -525,7 +543,7 @@ config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null

-clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
+clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean config-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) .config-detected
--
1.8.3.4

2015-05-17 11:00:11

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 29/37] perf bpf: Add 'perf bpf record' subcommand

'perf bpf record' will be implemented to load eBPF object file then
start recording on events defined in it. This patch only adds a
'--object' option for selecting object file. Other arguments are
directly passed to cmd_record.

Example:

# perf bpf --object obj1.o --object obj2.o -- -a command-to-record

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/builtin-bpf.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 109 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index a8858e2..b5c613f 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -48,14 +48,121 @@ static void print_usage(void)
exit(129);
}

-static const char * const bpf_subcommands[] = { NULL };
+static const char * const bpf_record_usage[] = {
+ "perf bpf record [<options>] -- [options passed to record]",
+ NULL
+};
+
+struct param {
+ struct strlist *object_file_names;
+} param;
+
+static int add_bpf_object_file(const struct option *opt,
+ const char *str,
+ int unset __maybe_unused)
+{
+ struct strlist **filenames = (struct strlist **)opt->value;
+
+ if (!*filenames)
+ *filenames = strlist__new(true, NULL);
+
+ if (!*filenames) {
+ pr_err("alloc strlist failed\n");
+ return -ENOMEM;
+ }
+
+ strlist__add(*filenames, str);
+ return 0;
+}
+
+static int start_bpf_record(int argc, const char **argv)
+{
+ int i, new_argc, err, pos = 0;
+ const char **new_argv;
+
+ new_argc = argc + 1;
+ new_argv = malloc((new_argc + 1) * sizeof(const char *));
+ if (!new_argv) {
+ pr_err("malloc failed\n");
+ return -ENOMEM;
+ }
+ bzero(new_argv, sizeof(const char *) * (new_argc + 1));
+
+ new_argv[pos++] = strdup("bpf record --");
+
+ for (i = 0; i < argc; i++) {
+ new_argv[pos++] = strdup(argv[i]);
+ if (!new_argv[pos - 1]) {
+ pr_err("strdup failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+ }
+
+ return cmd_record(new_argc, new_argv, NULL);
+
+errout:
+ if (new_argv) {
+ for (i = 0; i < pos; i++)
+ free((void *)new_argv[i]);
+ free(new_argv);
+ }
+ return err;
+}
+
+static int cmd_bpf_record(int argc, const char **argv,
+ const char *prefix __maybe_unused)
+{
+ /*
+ * Options in perf-record may be mirrored here. This command
+ * should add '-e' options to cmd_record.
+ */
+ static const struct option options[] = {
+ OPT_CALLBACK(0, "object", &param.object_file_names,
+ "file", "eBPF object file name",
+ add_bpf_object_file),
+ OPT_END()
+ };
+ struct str_node *str_node;
+
+ argc = parse_options(argc, argv, options,
+ bpf_record_usage, PARSE_OPT_KEEP_DASHDASH);
+
+ if (!param.object_file_names) {
+ pr_err("At least one '--object' option is needed to "
+ "select an eBPF object file\n");
+ goto usage;
+ }
+
+ if (!argv || strcmp(argv[0], "--")) {
+ pr_err("Should use '--' to pass options to perf "
+ "record\n");
+ goto usage;
+ }
+
+ /* skip "--" */
+ argc--;
+ argv++;
+
+ strlist__for_each(str_node, param.object_file_names)
+ pr_debug("loading %s\n", str_node->s);
+
+ return start_bpf_record(argc, argv);
+usage:
+ usage_with_options(bpf_record_usage, options);
+ return -1;
+}
+
+
+static const char * const bpf_subcommands[] = { "record", NULL };

static struct bpf_cmd bpf_cmds[] = {
+ { "record", "load eBPF program into kernel then start record on events in it", cmd_bpf_record },
{ .name = NULL, },
};

int cmd_bpf(int argc, const char **argv,
- const char *prefix __maybe_unused)
+ const char *prefix)
{
struct bpf_cmd *cmd;
const char *cmdstr;
--
1.8.3.4

2015-05-17 11:00:29

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 30/37] perf bpf: Add bpf-loader and open ELF object files

bpf_prepare_laod() is used to open each eBPF object files. The
returned handlers are stored into an array. A corresponding bpf_clear()
is introduced to free all resources.

For the propose of logging, 3 printing functions are defined using
DEFINE_PRINT_FN, which require a veprintf to process va_list args.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/builtin-bpf.c | 12 +++++++-
tools/perf/util/Build | 1 +
tools/perf/util/bpf-loader.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 13 +++++++++
tools/perf/util/debug.c | 5 ++++
tools/perf/util/debug.h | 1 +
6 files changed, 100 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index b5c613f..e922921 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -12,6 +12,7 @@
#include "perf.h"
#include "debug.h"
#include "parse-options.h"
+#include "bpf-loader.h"

typedef int (*bpf_cmd_fn_t)(int argc, const char **argv, const char *prefix);

@@ -124,6 +125,7 @@ static int cmd_bpf_record(int argc, const char **argv,
OPT_END()
};
struct str_node *str_node;
+ int err;

argc = parse_options(argc, argv, options,
bpf_record_usage, PARSE_OPT_KEEP_DASHDASH);
@@ -144,8 +146,16 @@ static int cmd_bpf_record(int argc, const char **argv,
argc--;
argv++;

- strlist__for_each(str_node, param.object_file_names)
+ strlist__for_each(str_node, param.object_file_names) {
+ const char *fn = str_node->s;
+
pr_debug("loading %s\n", str_node->s);
+ if ((err = bpf_prepare_load(fn))) {
+ pr_err("bpf: failed to load object file %s\n",
+ fn);
+ return -1;
+ }
+ }

return start_bpf_record(argc, argv);
usage:
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 797490a..609f6d6 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -75,6 +75,7 @@ libperf-$(CONFIG_X86) += tsc.o
libperf-y += cloexec.o
libperf-y += thread-stack.o

+libperf-$(CONFIG_LIBELF) += bpf-loader.o
libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-event.o

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
new file mode 100644
index 0000000..17cd2b6
--- /dev/null
+++ b/tools/perf/util/bpf-loader.c
@@ -0,0 +1,69 @@
+/*
+ * bpf-loader.c
+ *
+ * Load bpf files into kernel using libbpf; create kprobe events.
+ */
+
+#include <bpf/libbpf.h>
+#include "perf.h"
+#include "debug.h"
+#include "bpf-loader.h"
+
+#define DEFINE_PRINT_FN(name, level) \
+static int libbpf_##name(const char *fmt, ...) \
+{ \
+ va_list args; \
+ int ret; \
+ \
+ va_start(args, fmt); \
+ ret = veprintf(level, verbose, pr_fmt(fmt), args);\
+ va_end(args); \
+ return ret; \
+}
+
+DEFINE_PRINT_FN(warning, 0)
+DEFINE_PRINT_FN(info, 0)
+DEFINE_PRINT_FN(debug, 1)
+
+static bool libbpf_inited = false;
+
+#define MAX_OBJECTS 128
+
+struct {
+ struct bpf_object *objects[MAX_OBJECTS];
+ size_t nr_objects;
+} params;
+
+int bpf_prepare_load(const char *filename)
+{
+ struct bpf_object *obj;
+
+ if (!libbpf_inited)
+ libbpf_set_print(libbpf_warning,
+ libbpf_info,
+ libbpf_debug);
+
+ pr_debug("bpf: loading %s\n", filename);
+ if (params.nr_objects + 1 > MAX_OBJECTS) {
+ pr_err("Too many objects to load. "
+ "Increase MAX_OBJECTS\n");
+ return -EMFILE;
+ }
+
+ obj = bpf_open_object(filename);
+ if (!obj) {
+ pr_err("bpf: failed to load %s\n", filename);
+ return -EINVAL;
+ }
+
+ params.objects[params.nr_objects++] = obj;
+ return 0;
+}
+
+void bpf_clear(void)
+{
+ size_t i;
+
+ for (i = 0; i < params.nr_objects; i++)
+ bpf_close_object(params.objects[i]);
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
new file mode 100644
index 0000000..74eebc3
--- /dev/null
+++ b/tools/perf/util/bpf-loader.h
@@ -0,0 +1,13 @@
+/*
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __BPF_LOADER_H
+#define __BPF_LOADER_H
+
+int bpf_prepare_load(const char *filename);
+
+void bpf_clear(void);
+#endif
diff --git a/tools/perf/util/debug.c b/tools/perf/util/debug.c
index 2da5581..86d9c73 100644
--- a/tools/perf/util/debug.c
+++ b/tools/perf/util/debug.c
@@ -36,6 +36,11 @@ static int _eprintf(int level, int var, const char *fmt, va_list args)
return ret;
}

+int veprintf(int level, int var, const char *fmt, va_list args)
+{
+ return _eprintf(level, var, fmt, args);
+}
+
int eprintf(int level, int var, const char *fmt, ...)
{
va_list args;
diff --git a/tools/perf/util/debug.h b/tools/perf/util/debug.h
index caac2fd..8b9a088 100644
--- a/tools/perf/util/debug.h
+++ b/tools/perf/util/debug.h
@@ -50,6 +50,7 @@ void pr_stat(const char *fmt, ...);

int eprintf(int level, int var, const char *fmt, ...) __attribute__((format(printf, 3, 4)));
int eprintf_time(int level, int var, u64 t, const char *fmt, ...) __attribute__((format(printf, 4, 5)));
+int veprintf(int level, int var, const char *fmt, va_list args);

int perf_debug_option(const char *str);

--
1.8.3.4

2015-05-17 10:59:41

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 31/37] perf bpf: Collect all eBPF programs

This patch collects 'struct bpf_prog_handler *' after opening an object
file. Handlers are stored into an array of MAX_PROBES slots.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 17cd2b6..67bfb62 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -8,6 +8,7 @@
#include "perf.h"
#include "debug.h"
#include "bpf-loader.h"
+#include "probe-finder.h" // for MAX_PROBES

#define DEFINE_PRINT_FN(name, level) \
static int libbpf_##name(const char *fmt, ...) \
@@ -32,11 +33,17 @@ static bool libbpf_inited = false;
struct {
struct bpf_object *objects[MAX_OBJECTS];
size_t nr_objects;
+
+ struct {
+ struct bpf_prog_handler *prog;
+ } progs[MAX_PROBES];
+ size_t nr_progs;
} params;

int bpf_prepare_load(const char *filename)
{
struct bpf_object *obj;
+ struct bpf_prog_handler *prog;

if (!libbpf_inited)
libbpf_set_print(libbpf_warning,
@@ -57,6 +64,24 @@ int bpf_prepare_load(const char *filename)
}

params.objects[params.nr_objects++] = obj;
+
+ bpf_obj_for_each_prog(obj, prog) {
+ const char *title;
+
+ if (params.nr_progs + 1 > MAX_PROBES) {
+ pr_err("Too many programs. "
+ "Increase MAX_PROBES\n");
+ return -EMFILE;
+ }
+
+ params.progs[params.nr_progs++].prog = prog;
+
+ if (bpf_prog_get_title(prog, &title, false)) {
+ pr_err("Unable to get title of a program\n");
+ return -EINVAL;
+ }
+ pr_debug("bpf: add program '%s'\n", title);
+ }
return 0;
}

--
1.8.3.4

2015-05-17 11:00:22

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 32/37] perf bpf: Parse probe points of eBPF programs during preparation

This patch parses section name of each program, and creates
corresponding 'struct perf_probe_event' structure.

parse_perf_probe_command() is used to do the main parsing works.
Parsing result is stored into a global array. This is because
add_perf_probe_events() is non-reentrantable. In following patch,
add_perf_probe_events will be introduced to insert kprobes. It accepts
an array of 'struct perf_probe_event' and do works in one call.

Define PERF_BPF_PROBE_GROUP as "perf_bpf_probe", which will be used
as group name of all eBPF probing points.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 74 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 67bfb62..3dc9b61 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -29,6 +29,7 @@ DEFINE_PRINT_FN(debug, 1)
static bool libbpf_inited = false;

#define MAX_OBJECTS 128
+#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"

struct {
struct bpf_object *objects[MAX_OBJECTS];
@@ -36,10 +37,78 @@ struct {

struct {
struct bpf_prog_handler *prog;
+ struct perf_probe_event *pevent;
} progs[MAX_PROBES];
size_t nr_progs;
+
+ struct perf_probe_event event_array[MAX_PROBES];
+ size_t nr_events;
} params;

+static struct perf_probe_event *
+alloc_perf_probe_event(void)
+{
+ struct perf_probe_event *pev;
+ int n = params.nr_events;
+
+ if (n >= MAX_PROBES) {
+ pr_err("bpf: too many events, increase MAX_PROBES\n");
+ return NULL;
+ }
+
+ params.nr_events = n + 1;
+ pev = &params.event_array[n];
+ bzero(pev, sizeof(*pev));
+ return pev;
+}
+
+static int
+bpf_do_config(size_t prog_idx, const char *config_str)
+{
+ struct perf_probe_event *pev = alloc_perf_probe_event();
+ int err = 0;
+
+ if (!pev)
+ return -ENOMEM;
+
+ if ((err = parse_perf_probe_command(config_str, pev)) < 0) {
+ pr_err("bpf config: %s is not a valid config string\n",
+ config_str);
+ /* parse failed, don't need clear pev. */
+ return -EINVAL;
+ }
+
+ if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
+ pr_err("bpf config: '%s': group for event is set "
+ "and not '%s'.\n", config_str,
+ PERF_BPF_PROBE_GROUP);
+ err = -EINVAL;
+ goto errout;
+ } else if (!pev->group)
+ pev->group = strdup(PERF_BPF_PROBE_GROUP);
+
+ if (!pev->group) {
+ pr_err("bpf config: strdup failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ if (!pev->event) {
+ pr_err("bpf config: '%s': event name is missing\n",
+ config_str);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ pr_debug("bpf config: config '%s' ok\n", config_str);
+ params.progs[prog_idx].pevent = pev;
+ return 0;
+errout:
+ if (pev)
+ clear_perf_probe_event(pev);
+ return err;
+}
+
int bpf_prepare_load(const char *filename)
{
struct bpf_object *obj;
@@ -81,6 +150,8 @@ int bpf_prepare_load(const char *filename)
return -EINVAL;
}
pr_debug("bpf: add program '%s'\n", title);
+
+ bpf_do_config(params.nr_progs - 1, title);
}
return 0;
}
@@ -89,6 +160,9 @@ void bpf_clear(void)
{
size_t i;

+ for (i = 0; i < params.nr_events; i++)
+ clear_perf_probe_event(&params.event_array[i]);
+
for (i = 0; i < params.nr_objects; i++)
bpf_close_object(params.objects[i]);
}
--
1.8.3.4

2015-05-17 11:00:02

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 33/37] perf bpf: Probe at kprobe points

In this patch, kprobe points are created using add_perf_probe_events.
Since all events are already grouped together in an array, calling
add_perf_probe_events() once creates all of them.

To ensure recover the system when existing, a bpf_unprobe() is also
provided and hooked to atexit(). Because all of events are in group
"perf_bpf_probe" (PERF_BPF_PROBE_GROUP), use 'perf_bpf_probe:*' string
to remove all of them should be a simple method. However, this also
introduces a constrain that only one instance of 'perf bpf' is allowed
to be active.

Due to the atexit hook, this patch must ensure bpf_unprobe() won't
error if it has been executed. A global flag 'is_probing' is used to
track probing state. bpf_unprobe() will do nothing if it is unset.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/builtin-bpf.c | 8 ++++++++
tools/perf/util/bpf-loader.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 2 ++
3 files changed, 58 insertions(+)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index e922921..95e0c65 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -157,10 +157,18 @@ static int cmd_bpf_record(int argc, const char **argv,
}
}

+ if (bpf_probe()) {
+ pr_err("bpf: failed to probe\n");
+ goto errout;
+ }
+
return start_bpf_record(argc, argv);
usage:
usage_with_options(bpf_record_usage, options);
return -1;
+errout:
+ bpf_clear();
+ return -1;
}


diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 3dc9b61..c820d1a 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -160,9 +160,57 @@ void bpf_clear(void)
{
size_t i;

+ bpf_unprobe();
for (i = 0; i < params.nr_events; i++)
clear_perf_probe_event(&params.event_array[i]);

for (i = 0; i < params.nr_objects; i++)
bpf_close_object(params.objects[i]);
}
+
+static bool is_probing = false;
+
+int bpf_unprobe(void)
+{
+ struct strlist *dellist;
+ int ret;
+
+ if (!is_probing)
+ return 0;
+
+ dellist = strlist__new(true, PERF_BPF_PROBE_GROUP ":*");
+ if (!dellist) {
+ pr_err("Failed to create dellist when unprobing\n");
+ return -ENOMEM;
+ }
+
+ ret = del_perf_probe_events(dellist);
+ strlist__delete(dellist);
+ if (ret < 0)
+ pr_err(" Error: failed to delete events: %s\n",
+ strerror(-ret));
+ else
+ is_probing = false;
+ return ret < 0 ? ret : 0;
+}
+
+static void bpf_unprobe_atexit(void)
+{
+ bpf_unprobe();
+}
+
+int bpf_probe(void)
+{
+ int err = add_perf_probe_events(params.event_array,
+ params.nr_events,
+ MAX_PROBES, 0);
+ /* add_perf_probe_events return negative when fail */
+ if (err < 0)
+ pr_err("bpf probe: failed to probe events\n");
+ else {
+ is_probing = true;
+ atexit(bpf_unprobe_atexit);
+ }
+
+ return err < 0 ? err : 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 74eebc3..30dea2e 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -8,6 +8,8 @@
#define __BPF_LOADER_H

int bpf_prepare_load(const char *filename);
+int bpf_probe(void);
+int bpf_unprobe(void);

void bpf_clear(void);
#endif
--
1.8.3.4

2015-05-17 10:59:50

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 34/37] perf bpf: Load all eBPF object into kernel

This patch utilizes bpf_load_object() provided by libbpf to load all
objects into kernel.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/builtin-bpf.c | 5 +++++
tools/perf/util/bpf-loader.c | 16 ++++++++++++++++
tools/perf/util/bpf-loader.h | 3 ++-
3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index 95e0c65..8155f39 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -162,6 +162,11 @@ static int cmd_bpf_record(int argc, const char **argv,
goto errout;
}

+ if (bpf_load()) {
+ pr_err("bpf: failed to load\n");
+ goto errout;
+ }
+
return start_bpf_record(argc, argv);
usage:
usage_with_options(bpf_record_usage, options);
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index c820d1a..7295a3b 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -214,3 +214,19 @@ int bpf_probe(void)

return err < 0 ? err : 0;
}
+
+int bpf_load(void)
+{
+ size_t i;
+ int err;
+
+ for (i = 0; i < params.nr_objects; i++) {
+ err = bpf_load_object(params.objects[i]);
+ if (err) {
+ pr_err("failed to load object\n");
+ return err;
+ }
+ }
+
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 30dea2e..1ccebdf 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -9,7 +9,8 @@

int bpf_prepare_load(const char *filename);
int bpf_probe(void);
-int bpf_unprobe(void);
+int bpf_load(void);

+int bpf_unprobe(void);
void bpf_clear(void);
#endif
--
1.8.3.4

2015-05-17 11:01:20

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 35/37] perf tools: Add a bpf_wrapper global flag

The newly introduced flag is a indicator for 'perf bpf'. When commands
like 'cmd_record' is started using 'perf bpf', they should consider the
binding of bpf programs.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/builtin-bpf.c | 2 ++
tools/perf/perf.c | 7 +++++++
tools/perf/perf.h | 1 +
3 files changed, 10 insertions(+)

diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
index 8155f39..4ef294a 100644
--- a/tools/perf/builtin-bpf.c
+++ b/tools/perf/builtin-bpf.c
@@ -201,6 +201,8 @@ int cmd_bpf(int argc, const char **argv,

cmdstr = argv[0];

+ bpf_wrapper = true;
+
for_each_cmd(cmd) {
if (strcmp(cmd->name, cmdstr))
continue;
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index eff1a55..2c41c43 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -28,6 +28,13 @@ int use_browser = -1;
static int use_pager = -1;
const char *input_name;

+/*
+ * Only for cmd_bpf, set this wrapper to true. This flag is to tell
+ * commands like 'record' that they are running inside a 'perf bpf'
+ * command, and let them consider binding of bpf programs.
+ */
+bool bpf_wrapper = false;
+
struct cmd_struct {
const char *cmd;
int (*fn)(int, const char **, const char *);
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index e14bb63..f3d233a 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -69,4 +69,5 @@ struct record_opts {
struct option;
extern const char * const *record_usage;
extern struct option *record_options;
+extern bool bpf_wrapper;
#endif
--
1.8.3.4

2015-05-17 10:59:20

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 36/37] perf tools: Add bpf_fd field to evsel and config it

This patch adds a bpf_fd field to 'struct evsel' then introduces method
to config it. In bpf-loader, a bpf_for_each_program() function is added.
Which calls the callback function for each found eBPF program with
their names and file descriptors. In evlist.c, perf_evlist__add_bpf()
is added to add all bpf events into evlist. 'perf record' calls
perf_evlist__add_bpf() if it is wrapped by perf bpf.

Since bpf-loader.c will not built if libelf not found, an empty
bpf_for_each_program() is defined in bpf-loader.h to avoid compiling
error.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/builtin-record.c | 6 ++++++
tools/perf/util/bpf-loader.c | 30 ++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 18 ++++++++++++++++++
tools/perf/util/evlist.c | 34 ++++++++++++++++++++++++++++++++++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/evsel.h | 1 +
7 files changed, 91 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c3efdfb..21f3dcb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -978,6 +978,12 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_symbol_exit;
}

+ if (bpf_wrapper &&
+ perf_evlist__add_bpf(rec->evlist) < 0) {
+ pr_err("Failed to add events from bpf object\n");
+ goto out_symbol_exit;
+ }
+
if (rec->opts.target.tid && !rec->opts.no_inherit_set)
rec->opts.no_inherit = true;

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 7295a3b..b793c69 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -230,3 +230,33 @@ int bpf_load(void)

return 0;
}
+
+int bpf_for_each_program(bpf_prog_iter_callback_t func, void *arg)
+{
+ int i;
+
+ if (!func)
+ return -EINVAL;
+
+ for (i = 0; i < (int)params.nr_progs; i++) {
+ int err, fd;
+ const char *group, *event;
+ struct bpf_prog_handler *prog =
+ params.progs[i].prog;
+ struct perf_probe_event *pevent =
+ params.progs[i].pevent;
+
+ group = pevent->group;
+ event = pevent->event;
+ err = bpf_prog_get_fd(prog, &fd);
+ if (err) {
+ pr_err("Unable to get program fd\n");
+ return -EINVAL;
+ }
+
+ err = func(group, event, fd, arg);
+ if (err)
+ return err;
+ }
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 1ccebdf..ffdd535 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -7,10 +7,28 @@
#ifndef __BPF_LOADER_H
#define __BPF_LOADER_H

+#include <linux/compiler.h> // for __maybe_unused
+
int bpf_prepare_load(const char *filename);
int bpf_probe(void);
int bpf_load(void);

int bpf_unprobe(void);
void bpf_clear(void);
+
+typedef int (*bpf_prog_iter_callback_t)(const char *group,
+ const char *event,
+ int fd, void *arg);
+
+#ifdef HAVE_LIBELF_SUPPORT
+int bpf_for_each_program(bpf_prog_iter_callback_t func, void *arg);
+#else
+static inline int
+bpf_for_each_program(bpf_prog_iter_callback_t func __maybe_unused,
+ void *arg __maybe_unused)
+{
+ return 0;
+}
+#endif
+
#endif
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 080be93..cc26317 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -14,6 +14,7 @@
#include "target.h"
#include "evlist.h"
#include "evsel.h"
+#include "bpf-loader.h"
#include "debug.h"
#include <unistd.h>

@@ -194,6 +195,39 @@ error:
return -ENOMEM;
}

+static int add_bpf_event(const char *group, const char *event, int fd,
+ void *arg)
+{
+ struct perf_evlist *evlist = arg;
+ struct perf_evsel *pos;
+ struct list_head list;
+ int err, idx, entries;
+
+ pr_info("add bpf event %s:%s and attach bpf program %d\n",
+ group, event, fd);
+
+ INIT_LIST_HEAD(&list);
+ idx = evlist->nr_entries;
+ err = parse_events_add_tracepoint(&list, &idx, (char *)group,
+ (char *)event);
+ if (err) {
+ pr_err("Failed to add BPF event %s:%s\n",
+ group, event);
+ return err;
+ }
+ list_for_each_entry(pos, &list, node)
+ pos->bpf_fd = fd;
+
+ entries = idx - evlist->nr_entries;
+ perf_evlist__splice_list_tail(evlist, &list, entries);
+ return 0;
+}
+
+int perf_evlist__add_bpf(struct perf_evlist *evlist)
+{
+ return bpf_for_each_program(add_bpf_event, evlist);
+}
+
static int perf_evlist__add_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs)
{
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index b5cce95..032f04d 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -68,6 +68,7 @@ void perf_evlist__delete(struct perf_evlist *evlist);

void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
int perf_evlist__add_default(struct perf_evlist *evlist);
+int perf_evlist__add_bpf(struct perf_evlist *evlist);
int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs);

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 33e3fd8..04d60a7 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -205,6 +205,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->leader = evsel;
evsel->unit = "";
evsel->scale = 1.0;
+ evsel->bpf_fd = -1;
INIT_LIST_HEAD(&evsel->node);
perf_evsel__object.init(evsel);
evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index e486151..ff1f634 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -100,6 +100,7 @@ struct perf_evsel {
int sample_read;
struct perf_evsel *leader;
char *group_name;
+ int bpf_fd;
};

union u64_swap {
--
1.8.3.4

2015-05-17 10:59:30

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH v3 37/37] perf tools: Attach eBPF program to perf event

In this patch PERF_EVENT_IOC_SET_BPF ioctl is used to attach eBPF
program to a newly created perf event. The file descriptor of the
eBPF program is passed to perf record using previous patches, and
stored into evsel->bpf_fd.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/evsel.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 04d60a7..62c0a8c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1201,6 +1201,22 @@ retry_open:
err);
goto try_fallback;
}
+
+ if (evsel->bpf_fd >= 0) {
+ int evt_fd = FD(evsel, cpu, thread);
+ int bpf_fd = evsel->bpf_fd;
+
+ err = ioctl(evt_fd,
+ PERF_EVENT_IOC_SET_BPF,
+ bpf_fd);
+ if (err) {
+ pr_err("failed to attach bpf fd %d\n",
+ bpf_fd);
+ err = -EINVAL;
+ goto out_close;
+ }
+ }
+
set_rlimit = NO_CHANGE;

/*
--
1.8.3.4

2015-05-18 12:31:00

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 16/37] bpf tools: Collect eBPF programs from their own sections

Hi,

On Sun, May 17, 2015 at 10:56:41AM +0000, Wang Nan wrote:
> This patch collects all programs in an object file into an array of
> 'struct bpf_program' for further processing. That structure is for
> representing each eBPF program. 'bpf_prog' should be a better name, but
> it has been used by linux/filter.h. Although it is a kernel space name,
> I still prefer to call it 'bpf_program' to prevent possible confusion.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---

[SNIP]

> +static struct bpf_program *
> +bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
> + char *name, int idx)
> +{
> + struct bpf_program *prog, *progs;
> + int nr_progs;
> +
> + if (size < sizeof(struct bpf_insn)) {
> + pr_warning("corrupted section '%s'\n", name);
> + return NULL;
> + }
> +
> + progs = obj->programs;
> + nr_progs = obj->nr_programs;
> +
> + progs = realloc(progs, sizeof(*prog) * (nr_progs + 1));
> + if (!progs) {
> + /*
> + * In this case the original obj->programs
> + * is still valid, so don't need special treat for
> + * bpf_close_object().
> + */
> + pr_warning("failed to alloc a new program '%s'\n",
> + name);
> + return NULL;
> + }
> +
> + obj->programs = progs;
> +
> + prog = &progs[nr_progs];
> + bzero(prog, sizeof(*prog));
> +
> + obj->nr_programs = nr_progs + 1;
> +
> + prog->section_name = strdup(name);
> + if (!prog->section_name) {
> + pr_warning("failed to alloc name for prog %s\n",
> + name);
> + goto out;
> + }
> +
> + prog->insns = malloc(size);
> + if (!prog->insns) {
> + pr_warning("failed to alloc insns for %s\n", name);
> + goto out;
> + }
> + prog->insns_cnt = size / sizeof(struct bpf_insn);
> + memcpy(prog->insns, data,
> + prog->insns_cnt * sizeof(struct bpf_insn));

Doesn't the data need to be swapped?

Thanks,
Namhyung


> + prog->idx = idx;
> +
> + return prog;
> +out:
> + bpf_clear_program(prog);
> + return NULL;
> +}
> +
> static struct bpf_object *__bpf_obj_alloc(const char *path)
> {
> struct bpf_object *obj;
> @@ -380,6 +468,22 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
> err = -EEXIST;
> } else
> obj->elf.symbols = data;
> + } else if ((sh.sh_type == SHT_PROGBITS) &&
> + (sh.sh_flags & SHF_EXECINSTR) &&
> + (data->d_size > 0)) {
> + struct bpf_program *prog;
> +
> + prog = bpf_program_alloc(obj, data->d_buf,
> + data->d_size, name,
> + idx);
> + if (!prog) {
> + pr_warning("failed to alloc "
> + "program %s (%s)", name,
> + obj->path);
> + err = -ENOMEM;
> + } else
> + pr_debug("found program %s\n",
> + prog->section_name);
> }
> if (err)
> goto out;
> @@ -446,5 +550,12 @@ void bpf_close_object(struct bpf_object *obj)
> free(obj->maps_buf);
> if (obj->config_str)
> free(obj->config_str);
> + if (obj->programs) {
> + size_t i;
> +
> + for (i = 0; i < obj->nr_programs; i++)
> + bpf_clear_program(&obj->programs[i]);
> + free(obj->programs);
> + }
> free(obj);
> }
> --
> 1.8.3.4
>

2015-05-18 12:48:51

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 16/37] bpf tools: Collect eBPF programs from their own sections



在 2015/5/18 20:29, Namhyung Kim 写道:
> Hi,
>
> On Sun, May 17, 2015 at 10:56:41AM +0000, Wang Nan wrote:
>> This patch collects all programs in an object file into an array of
>> 'struct bpf_program' for further processing. That structure is for
>> representing each eBPF program. 'bpf_prog' should be a better name, but
>> it has been used by linux/filter.h. Although it is a kernel space name,
>> I still prefer to call it 'bpf_program' to prevent possible confusion.
>>
>> Signed-off-by: Wang Nan <[email protected]>
>> ---
> [SNIP]
>
>> +static struct bpf_program *
>> +bpf_program_alloc(struct bpf_object *obj, void *data, size_t size,
>> + char *name, int idx)
>> +{
>> + struct bpf_program *prog, *progs;
>> + int nr_progs;
>> +
>> + if (size < sizeof(struct bpf_insn)) {
>> + pr_warning("corrupted section '%s'\n", name);
>> + return NULL;
>> + }
>> +
>> + progs = obj->programs;
>> + nr_progs = obj->nr_programs;
>> +
>> + progs = realloc(progs, sizeof(*prog) * (nr_progs + 1));
>> + if (!progs) {
>> + /*
>> + * In this case the original obj->programs
>> + * is still valid, so don't need special treat for
>> + * bpf_close_object().
>> + */
>> + pr_warning("failed to alloc a new program '%s'\n",
>> + name);
>> + return NULL;
>> + }
>> +
>> + obj->programs = progs;
>> +
>> + prog = &progs[nr_progs];
>> + bzero(prog, sizeof(*prog));
>> +
>> + obj->nr_programs = nr_progs + 1;
>> +
>> + prog->section_name = strdup(name);
>> + if (!prog->section_name) {
>> + pr_warning("failed to alloc name for prog %s\n",
>> + name);
>> + goto out;
>> + }
>> +
>> + prog->insns = malloc(size);
>> + if (!prog->insns) {
>> + pr_warning("failed to alloc insns for %s\n", name);
>> + goto out;
>> + }
>> + prog->insns_cnt = size / sizeof(struct bpf_insn);
>> + memcpy(prog->insns, data,
>> + prog->insns_cnt * sizeof(struct bpf_insn));
> Doesn't the data need to be swapped?
>
> Thanks,
> Namhyung
>

I'm not very sure, since they are instructions. Byte order of
instructions and byte order of data
are not always same. Think about ARM. Therefore another choice is to
swap them in kernel,
keep user-kernel interface clean.

Alexei Starovoitov, do you think we should use uniformed instruction
byte order in both big and little
endian kernel on user-kernel interface, or let userspace feed swapped
instructions to kernel if
endianess is not match?

Thank you.

>> + prog->idx = idx;
>> +
>> + return prog;
>> +out:
>> + bpf_clear_program(prog);
>> + return NULL;
>> +}
>> +
>> static struct bpf_object *__bpf_obj_alloc(const char *path)
>> {
>> struct bpf_object *obj;
>> @@ -380,6 +468,22 @@ static int bpf_obj_elf_collect(struct bpf_object *obj)
>> err = -EEXIST;
>> } else
>> obj->elf.symbols = data;
>> + } else if ((sh.sh_type == SHT_PROGBITS) &&
>> + (sh.sh_flags & SHF_EXECINSTR) &&
>> + (data->d_size > 0)) {
>> + struct bpf_program *prog;
>> +
>> + prog = bpf_program_alloc(obj, data->d_buf,
>> + data->d_size, name,
>> + idx);
>> + if (!prog) {
>> + pr_warning("failed to alloc "
>> + "program %s (%s)", name,
>> + obj->path);
>> + err = -ENOMEM;
>> + } else
>> + pr_debug("found program %s\n",
>> + prog->section_name);
>> }
>> if (err)
>> goto out;
>> @@ -446,5 +550,12 @@ void bpf_close_object(struct bpf_object *obj)
>> free(obj->maps_buf);
>> if (obj->config_str)
>> free(obj->config_str);
>> + if (obj->programs) {
>> + size_t i;
>> +
>> + for (i = 0; i < obj->nr_programs; i++)
>> + bpf_clear_program(&obj->programs[i]);
>> + free(obj->programs);
>> + }
>> free(obj);
>> }
>> --
>> 1.8.3.4
>>

2015-05-18 14:28:23

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 03/37] tools lib traceevent: Install libtraceevent.a into libdir

On Sun, May 17, 2015 at 10:56:28AM +0000, Wang Nan wrote:
> Before this patch, 'make install' installs libraries into bindir:
>
> $ make install DESTDIR=./tree
> INSTALL trace_plugins
> INSTALL libtraceevent.a
> INSTALL libtraceevent.so
> $ find ./tree
> ./tree/
> ./tree/usr
> ./tree/usr/local
> ./tree/usr/local/bin
> ./tree/usr/local/bin/libtraceevent.a
> ./tree/usr/local/bin/libtraceevent.so
> ...
>
> /usr/local/lib( or lib64) should be a better place.
>
> This patch replaces 'bin' with libdir. For __LP64__ building, libraries
> are installed to /usr/local/lib64. For other building, to
> /usr/local/lib instead.
>
> After applying this patch:
>
> $ make install DESTDIR=./tree
> INSTALL trace_plugins
> INSTALL libtraceevent.a
> INSTALL libtraceevent.so
> $ find ./tree
> ./tree
> ./tree/usr
> ./tree/usr/local
> ./tree/usr/local/lib64
> ./tree/usr/local/lib64/libtraceevent.a
> ./tree/usr/local/lib64/traceevent
> ./tree/usr/local/lib64/traceevent/plugins
> ./tree/usr/local/lib64/traceevent/plugins/plugin_mac80211.so
> ./tree/usr/local/lib64/traceevent/plugins/plugin_hrtimer.so
> ...
> ./tree/usr/local/lib64/libtraceevent.so
>
> Signed-off-by: Wang Nan <[email protected]>

I've already Ack-ed this one
http://marc.info/?l=linux-kernel&m=143094417302353&w=2

and so did Steven
http://marc.info/?l=linux-kernel&m=143096066606455&w=2

jirka

2015-05-18 14:31:28

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 04/37] tools: Change FEATURE_TESTS and FEATURE_DISPLAY to weak binding

On Sun, May 17, 2015 at 10:56:29AM +0000, Wang Nan wrote:
> Replace strong binding of FEATURE_TESTS and FEATURE_DISPLAY by weak
> binding. This patch enables other makefiles which include
> tools/build/Makefile.feature enable only limited feathres to check.
>
> Signed-off-by: Wang Nan <[email protected]>

Acked-by: Jiri Olsa <[email protected]>

thanks,
jirka

2015-05-18 16:59:07

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 01/37] perf/events/core: fix race in bpf program unregister

On 5/17/15 3:56 AM, Wang Nan wrote:
> there is a race between perf_event_free_bpf_prog() and free_trace_kprobe():
...
> Fixes: 2541517c32be ("tracing, perf: Implement BPF programs attached to kprobes")
> Reported-by: Wang Nan <[email protected]>
> Signed-off-by: Alexei Starovoitov <[email protected]>

there is no need to repost it.
Rather ack and/or add your tested-by to original thread:
https://lkml.org/lkml/2015/5/15/586

My fix is also not related to this series. Your work exposed
the bug, but the fix is needed in 4.1 whereas this stuff is
targeting 4.2 the earliest.

2015-05-18 17:02:01

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 02/37] perf tools: Set vmlinux_path__nr_entries to 0 in vmlinux_path__exit

On 5/17/15 3:56 AM, Wang Nan wrote:
> Original vmlinux_path__exit() doesn't revert vmlinux_path__nr_entries
> to its original state. After the while loop vmlinux_path__nr_entries
> becomes -1 instead of 0. This makes a problem that, if runs twice,
> during the second run vmlinux_path__init() will set vmlinux_path[-1]
> to strdup("vmlinux"), corrupts random memory.
>
> This patch reset vmlinux_path__nr_entries to 0 after the while loop.
>
> Signed-off-by: Wang Nan <[email protected]>
> Acked-by: Namhyung Kim <[email protected]>

no need to repost. Arnaldo picked it up already.
When you repost it many times, it gets very confusing.
If patch is not moving, rather ping it in original thread.
Same with patches 3,4,5.

2015-05-18 17:35:29

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools

On 5/17/15 3:56 AM, Wang Nan wrote:
> This is the first patch of libbpf. The goal of libbpf is to create a
> standard way for accessing eBPF object files. This patch creates
> Makefile and Build for it, allows 'make' to build libbpf.a and
> libbpf.so, 'make install' to put them into proper directories.
> Most part of Makefile is borrowed from traceevent. Before building,
> it checks the existance of libelf in Makefile, and deny to build if
> not found. Instead of throw an error if libelf not found, the error
> raises in a phony target "elfdep". This design is to ensure
> 'make clean' still workable even if libelf is not found.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
...
> +
> +# Version of eBPF elf file
> +FILE_VERSION = 1

what that comment suppose to mean?

> +# Makefiles suck: This macro sets a default value of $(2) for the
> +# variable named by $(1), unless the variable has been set by
> +# environment or command line. This is necessary for CC and AR
> +# because make sets default values, so the simpler ?= approach
> +# won't work as expected.

what this for? copy-paste?

> +# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
> +$(call allow-override,CC,$(CROSS_COMPILE)gcc)
> +$(call allow-override,AR,$(CROSS_COMPILE)ar)

was cross-compile tested or just copy-pasted?

> +EXT = -std=gnu99

I guess it was copy-pasted from libtraceevent, but please double
check that it's actually needed.

> +# Append required CFLAGS
> +override CFLAGS += -fPIC
> +override CFLAGS += $(INCLUDES)
> +override CFLAGS += -D_GNU_SOURCE

_GNU_SOURCE actually needed?
Please sanitize the file.

> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> new file mode 100644
> index 0000000..bebe99a
> --- /dev/null
> +++ b/tools/lib/bpf/libbpf.c
> @@ -0,0 +1,15 @@
> +/*
> + * common eBPF ELF loading operations.
> + *
> + * Copyright (C) 2015, Wang Nan <[email protected]>
> + * Copyright (C) 2015, Huawei Inc.

since it borrows heavily from samples/bpf/bpf_load.c, libbpf.h
would be nice if you mention the source and/or copyright in the header.

> + *
> + * Released under the GPL v2. (and only v2, not any later version)
> + */

are you sure about this restriction? libtracevent is LGPL, for example.

2015-05-18 17:55:56

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 07/37] bpf tools: Allow caller to set printing function

On 5/17/15 3:56 AM, Wang Nan wrote:
> By libbpf_set_print(), users of libbpf are allowed to register he/she
> own debug, info and warning printing functions. Libbpf will use those
> functions to print messages. If not provided, default info and warning
> printing functions are fprintf(stderr, ...); defailt debug printing
> is NULL.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---

Acked-by: Alexei Starovoitov <[email protected]>

would be good to add a sentence to the commit log explaining
that perf and patch 30 makes use of this api:

> +void libbpf_set_print(int (*warn)(const char *format, ...),
> + int (*info)(const char *format, ...),
> + int (*debug)(const char *format, ...));
> +

2015-05-18 17:57:40

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 08/37] bpf tools: Define basic interface

On 5/17/15 3:56 AM, Wang Nan wrote:
> bpf_open_object() and bpf_close_object() are open and close function of
> eBPF object files. 'struct bpf_object' will be handler of one object
> file. Its internal structure is hide to user.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
...
> +
> +struct bpf_object *bpf_open_object(const char *path)
> +{
> + return NULL;
> +}
> +
> +void bpf_close_object(struct bpf_object *object)
> +{
> + return 0;
> +}

I'm not a fan of introducing empty helpers. I would
squash it with the patch that actually adds meat to it.

2015-05-18 18:06:09

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

On 5/17/15 3:56 AM, Wang Nan wrote:
> This patch adds basic 'struct bpf_object' which will be used for eBPF
> object file loading. eBPF object files are compiled by LLVM as ELF
> format. In this patch, libelf is used to open those files, read EHDR
> and do basic validation according to e_type and e_machine.
>
> All elf related staffs are grouped together and reside in elf field of
> 'struct bpf_object'. bpf_obj_clear_elf() is introduced to clear it.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
...
> +static void bpf_obj_clear_elf(struct bpf_object *obj)
> +{
> + if (!obj_elf_valid(obj))
> + return;
> +
> + if (obj->elf.elf) {
> + elf_end(obj->elf.elf);
> + obj->elf.elf = NULL;

the name of the function is odd.
'..clear_elf' ? Only because the field was named 'elf' ?
Also obj->elf.elf looks unbalanced.

may be bpf_obj_elf_finish() to match bpf_obj_elf_init()
and obj->efile.elf ?

> + }
> + if (obj->elf.fd >= 0) {
> + close(obj->elf.fd);

and the above will change to obj->efile.fd ?

2015-05-18 18:19:42

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 10/37] bpf tools: Check endianess and set swap flag according to EHDR

On 5/17/15 3:56 AM, Wang Nan wrote:
> Check endianess according to EHDR to support loading eBPF objects into
> big endian machines. Code is taken from tools/perf/util/symbol-elf.c.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
...
> +static int
> +bpf_obj_swap_init(struct bpf_object *obj)
> +{
> + static unsigned int const endian = 1;
> +
> + obj->needs_swap = false;
> +
> + switch (obj->elf.ehdr.e_ident[EI_DATA]) {
> + case ELFDATA2LSB:
> + /* We are big endian, BPF obj is little endian. */
> + if (*(unsigned char const *)&endian != 1)
> + obj->needs_swap = true;
> + return 0;
> +
> + case ELFDATA2MSB:
> + /* We are little endian, BPF obj is big endian. */
> + if (*(unsigned char const *)&endian != 0)
> + obj->needs_swap = true;
> + return 0;

I don't like 'needs_swap', since it implies that bpf loader
can magically convert little endian code into big endian.
It's obviously not the case.
For example original C program may have: (u8) ptr->val;
where 'val' is 32-bit variable. llvm may optimize it into
byte load, but offset will be different depending on endianness
of the target. By byteswapping instruction encoding you'll be able
to load such code, but it will be logically incorrect. Verifier will
catch the broken code if accesses are out of bounds, but for
this u32->u8 example, the code is safe, but logically incorrect.
Instead, please help fixing llvm. It needs -march=bpfbe switch,
which will generate proper big-endian code.

2015-05-18 18:21:27

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 11/37] bpf tools: Iterate over ELF sections to collect information

On 5/17/15 3:56 AM, Wang Nan wrote:
> bpf_obj_elf_collect() is introduced to iterate over each elf sections
> to collection informations in eBPF object files. This function will
> futher enhanced to collect license, kernel version, programs, configs
> and map information.
>
> Signed-off-by: Wang Nan <[email protected]>
...
> + if (gelf_getshdr(scn, &sh) != &sh) {
> + pr_warning("failed to get section header"
> + " from %s\n", obj->path);

please don't wrap strings.
Here and in all other patches.

2015-05-18 18:27:21

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 12/37] bpf tools: Collect version and license from ELF sections

On 5/17/15 3:56 AM, Wang Nan wrote:
> Expand bpf_obj_elf_collect() to collect license and kernel version
> information in eBPF object file. eBPF object file should have a section
> named 'license', which contains a string. It should also have a section
> named 'version', contains a u32 LINUX_VERSION_CODE.
>
> bpf_obj_validate() is introduced to validate object file after loaded.
> Currently it only check existance of 'version' section.
>
> Signed-off-by: Wang Nan <[email protected]>
...
> +#ifdef min
> +# undef min
> +#endif
> +#define min(x, y) ({ \
> + typeof(x) _min1 = (x); \
> + typeof(y) _min2 = (y); \
> + (void) (&_min1 == &_min2); \
> + _min1 < _min2 ? _min1 : _min2; })
> +

copy-paste from lib traceevent?
there is another container_of copy-paste later in the patches.
please use something like tools/lib/bpf/utils.h file for such
things, so we can consolidate and share this code later.

2015-05-18 18:30:25

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 14/37] bpf tools: Collect config string from 'config' section

On 5/17/15 3:56 AM, Wang Nan wrote:
> A 'config' section is allowed to enable eBPF object file to pass
> something to user. libbpf doesn't use config string.
>
> To make further processing easiler, this patch converts '\0' in the
> whole config strings into '\n'
>
> Signed-off-by: Wang Nan <[email protected]>
...
> + else if (strcmp(name, "config") == 0)
> + err = bpf_obj_config_init(obj, data->d_buf,
> + data->d_size);

the cover letter says that 'config' section name is unused right now.
Let's not add the code to parse it then.

2015-05-18 18:32:13

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 16/37] bpf tools: Collect eBPF programs from their own sections

On 5/18/15 5:47 AM, Wangnan (F) wrote:
>
>
>>> + prog->insns_cnt = size / sizeof(struct bpf_insn);
>>> + memcpy(prog->insns, data,
>>> + prog->insns_cnt * sizeof(struct bpf_insn));
>> Doesn't the data need to be swapped?
>>
>> Thanks,
>> Namhyung
>>
>
> I'm not very sure, since they are instructions. Byte order of
> instructions and byte order of data
> are not always same. Think about ARM. Therefore another choice is to
> swap them in kernel,
> keep user-kernel interface clean.
>
> Alexei Starovoitov, do you think we should use uniformed instruction
> byte order in both big and little
> endian kernel on user-kernel interface, or let userspace feed swapped
> instructions to kernel if
> endianess is not match?

see my reply to the other patch. endianness is the job of llvm.
Let's fix it there.

2015-05-18 18:34:49

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 18/37] bpf tools: Record map accessing instructions for each program

On 5/17/15 3:56 AM, Wang Nan wrote:
> This patch records the indics of instructions which are needed to be
> relocated. Those information are saved in 'reloc_desc' field in
> 'struct bpf_program'. In loading phase (this patch takes effect in
> opening phase), the collected instructions will be replaced by
> map loading instructions.
>
> Since we are going to close the ELF file and clear all data at the end
> of 'opening' phase, ELF information will no longer be valid in
> 'loading' phase. We have to locate the instructions before maps are
> loaded, instead of directly modifying the instruction.
>
> 'struct bpf_map_def' is introduce in this patch to let us know how many
> maps defined in the object.
>
> Signed-off-by: Wang Nan <[email protected]>
...
> +/*
> + * packed attribute is unnecessary for 'bpf_map_def'.
> + */
> +struct bpf_map_def {
> + unsigned int type;
> + unsigned int key_size;
> + unsigned int value_size;
> + unsigned int max_entries;
> +};

the comment looks out of place. 'packed' is necessary somewhere else?
What were you concerned about here?

2015-05-18 18:36:12

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 19/37] bpf tools: Clear libelf and ELF parsing resrouce to finish opening

On 5/17/15 3:56 AM, Wang Nan wrote:
> After all eBPF programs in an object file are loaded, related ELF
> information is useless. Close the object file and free those memory.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
> tools/lib/bpf/libbpf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index ded96cb..9ed8cca 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -687,8 +687,8 @@ struct bpf_object *bpf_open_object(const char *path)
> if (bpf_obj_validate(obj))
> goto out;
>
> + bpf_obj_clear_elf(obj);
> return obj;
> -
> out:
> bpf_close_object(obj);
> return NULL;
>

looks like a bug in some previous patch in the same set.
Should be squashed with that patch?

2015-05-18 18:39:52

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 20/37] bpf tools: Add bpf.c/h for common bpf operations

On 5/17/15 3:56 AM, Wang Nan wrote:
> This patch introduces bpf.c and bpf.h, which hold common functions
> issuing bpf syscall. The goal of these two files is to hide syscall
> completly from user. Note that bpf.c and bpf.h only deal with kernel
> interface. Things like structure of 'map' section in the ELF object is
> not cared by of bpf.[ch].
>
> We first introduce bpf_create_map().
>
> Signed-off-by: Wang Nan <[email protected]>
...
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> new file mode 100644
> index 0000000..3dbe30d
> --- /dev/null
> +++ b/tools/lib/bpf/bpf.c
> @@ -0,0 +1,53 @@
> +/*
> + * common eBPF ELF operations.
> + *
> + * Copyright (C) 2015, Wang Nan <[email protected]>
> + * Copyright (C) 2015, Huawei Inc.
> + *
> + * Released under the GPL v2. (and only v2, not any later version)
> + */

same comment as in the other patch.

2015-05-18 18:49:02

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 21/37] bpf tools: Create eBPF maps defined in an object file

On 5/17/15 3:56 AM, Wang Nan wrote:
> This patch creates maps based on 'map' section in object file using
> bpf_create_map(), and store the fds into an array in
> 'struct bpf_object'. Since the byte order of the object may differ
> from the host, swap map definition before processing.
>
> This is the first patch in 'loading' phase. Previous patches parse ELF
> object file and create needed data structure, but doesnnt play with
> kernel. They belong to 'opening' phase.
>
> Signed-off-by: Wang Nan <[email protected]>
...
> + for (j = 0; j < i; j++) {
> + close(obj->maps_fds[j]);
> + obj->maps_fds[j] = -1;

this line is unnecessary, since you're freeing the whole array
at the line below:

> + }
> + free(obj->maps_fds);
> + obj->maps_fds = NULL;

...

> void bpf_close_object(struct bpf_object *obj)
> {
> if (!obj)
> return;
>
> bpf_obj_clear_elf(obj);
> + bpf_unload_object(obj);

just realized that this API keeps elf file open for the whole
life time. I think that should be split up.
bpf_open_object() can do elf parsing, creation of maps and
bpf loading.
bpf_close_object() can unload programs, maps. That's fine,
but closing of elf file and freeing memory associated
with parsing sections should be a separate call.

2015-05-18 18:53:06

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 23/37] bpf tools: Introduce bpf_load_program() to bpf.c

On 5/17/15 3:56 AM, Wang Nan wrote:
> bpf_load_program() can be used to load bpf program into kernel. To make
> loading faster, first try to load without logbuf. Try again with logbuf
> if the first try failed.
>
> Signed-off-by: Wang Nan <[email protected]>
...
> + attr.insn_cnt = (__u32)insns_cnt;
> + attr.insns = ptr_to_u64((void *) insns);
> + attr.license = ptr_to_u64((void *) license);

unnecessary casts?

> + attr.log_buf = ptr_to_u64(NULL);
> + attr.log_size = 0;
> + attr.log_level = 0;
> + attr.kern_version = kern_version;
> +
> + fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
> + if ((fd >= 0) || !log_buf || !log_buf_sz)

unnecessary ()

> + return fd;
> +
> + /* Try again with log */
> + attr.log_buf = ptr_to_u64(log_buf);
> + attr.log_size = log_buf_sz;
> + attr.log_level = 1;
> + log_buf[0] = 0;
> + return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));

this part looks good. thank you for addressing the feedback from v1.

2015-05-18 19:25:19

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 33/37] perf bpf: Probe at kprobe points

On 5/17/15 3:56 AM, Wang Nan wrote:
> In this patch, kprobe points are created using add_perf_probe_events.
> Since all events are already grouped together in an array, calling
> add_perf_probe_events() once creates all of them.
>
> To ensure recover the system when existing, a bpf_unprobe() is also
> provided and hooked to atexit(). Because all of events are in group
> "perf_bpf_probe" (PERF_BPF_PROBE_GROUP), use 'perf_bpf_probe:*' string
> to remove all of them should be a simple method. However, this also
> introduces a constrain that only one instance of 'perf bpf' is allowed
> to be active.

perf bits look ok to me, but I'm not a perf codebase expert.
The above restriction also looks fine for now.
We can relax it later if it really becomes a problem.

2015-05-18 19:38:47

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/17/15 3:56 AM, Wang Nan wrote:
> This is the 3rd version of 'perf bpf' patch series, based on
> v4.1-rc3.
>
> The goal of this series of patches is to integrate eBPF with perf.
> After applying these patches, users are allowed to use following
> command to load eBPF program compiled by LLVM into kernel then start
> recording with filters on:
>
> # perf bpf record --object sample_bpf.o -- -a sleep 4

I think using programs are sophisticated filters is a good start
and are useful already. Let's focus on that at the moment.
I wouldn't grow the patchset any bigger.

> Other than the previous change, v3 patch series drops the '|' event
> syntax introduced in v2, because I realized that in v2 users are
> allowed to pass any bpf fd by using it, like:
>
> # perf bpf record -- -e sched:sched_switch|100| sleep 1
>
> which may become trouble maker.

passing fd number as a string is an odd interface anyway.
So I think that was the right call. We can improve it later.

> Are we actually need a 'perf bpf' command? We can get similar result by
> modifying 'perf record' to make it load eBPF program before recording.
>
> I suggest to keep 'perf bpf', group all eBPF stuffs together using a
> uniform entry. Also, eBPF programs can act not only as filters but also
> data aggregator. It is possible to make something link 'perf bpf run'
> to simply make it run, and dump result after user hit 'C-c' or timeout.

Though it's tempting to group under 'perf bpf'. I think it's cleaner to
add --object flag to 'perf record'
Since it will avoid unnecessary '--'.
Unless we can drop it? Like
perf bpf record --object sample_bpf.o -a sleep 4
should work?
If not, then the following is better:
perf record --object sample_bpf.o -a sleep 4

Thank you for the hard work!

2015-05-18 20:29:06

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 02/37] perf tools: Set vmlinux_path__nr_entries to 0 in vmlinux_path__exit

Em Mon, May 18, 2015 at 10:01:56AM -0700, Alexei Starovoitov escreveu:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >Original vmlinux_path__exit() doesn't revert vmlinux_path__nr_entries
> >to its original state. After the while loop vmlinux_path__nr_entries
> >becomes -1 instead of 0. This makes a problem that, if runs twice,
> >during the second run vmlinux_path__init() will set vmlinux_path[-1]
> >to strdup("vmlinux"), corrupts random memory.
> >
> >This patch reset vmlinux_path__nr_entries to 0 after the while loop.
> >
> >Signed-off-by: Wang Nan <[email protected]>
> >Acked-by: Namhyung Kim <[email protected]>
>
> no need to repost. Arnaldo picked it up already.

Yeah, I try to cherry pick stuff that is not related to the main topic
of a patchkit and that is an independent bug fix.

- Arnaldo

> When you repost it many times, it gets very confusing.
> If patch is not moving, rather ping it in original thread.
> Same with patches 3,4,5.

2015-05-18 20:34:39

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 12/37] bpf tools: Collect version and license from ELF sections

Em Mon, May 18, 2015 at 11:27:17AM -0700, Alexei Starovoitov escreveu:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >Expand bpf_obj_elf_collect() to collect license and kernel version
> >information in eBPF object file. eBPF object file should have a section
> >named 'license', which contains a string. It should also have a section
> >named 'version', contains a u32 LINUX_VERSION_CODE.
> >
> >bpf_obj_validate() is introduced to validate object file after loaded.
> >Currently it only check existance of 'version' section.
> >
> >Signed-off-by: Wang Nan <[email protected]>
> ...
> >+#ifdef min
> >+# undef min
> >+#endif
> >+#define min(x, y) ({ \
> >+ typeof(x) _min1 = (x); \
> >+ typeof(y) _min2 = (y); \
> >+ (void) (&_min1 == &_min2); \
> >+ _min1 < _min2 ? _min1 : _min2; })
> >+
>
> copy-paste from lib traceevent?
> there is another container_of copy-paste later in the patches.
> please use something like tools/lib/bpf/utils.h file for such
> things, so we can consolidate and share this code later.

Better, ask what the kernel does, find out where it does and replicate
that in tools/include/, i.e. a min function?!? This looks like useful
stuff! ;-)

So, the kernel has this one, in include/linux/kernel.h:

/*
* min()/max()/clamp() macros that also do
* strict type-checking.. See the
* "unnecessary" pointer comparison.
*/
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })

Looks a lot like that one above, no? :-)

So, lets take a look at: tools/include/linux/kernel.h... bummer, that
one wasn't yet moved from tools/perf/util/include/linux/kernel.h there.
Will do, and then it has this:

#ifndef min
#define min(x, y) ({ \
typeof(x) _min1 = (x); \
typeof(y) _min2 = (y); \
(void) (&_min1 == &_min2); \
_min1 < _min2 ? _min1 : _min2; })
#endif

We should minimize this mess :-)

- Arnaldo

2015-05-18 20:35:30

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 19/37] bpf tools: Clear libelf and ELF parsing resrouce to finish opening

Em Mon, May 18, 2015 at 11:36:07AM -0700, Alexei Starovoitov escreveu:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >After all eBPF programs in an object file are loaded, related ELF
> >information is useless. Close the object file and free those memory.
> >
> >Signed-off-by: Wang Nan <[email protected]>
> >---
> > tools/lib/bpf/libbpf.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >index ded96cb..9ed8cca 100644
> >--- a/tools/lib/bpf/libbpf.c
> >+++ b/tools/lib/bpf/libbpf.c
> >@@ -687,8 +687,8 @@ struct bpf_object *bpf_open_object(const char *path)
> > if (bpf_obj_validate(obj))
> > goto out;
> >
> >+ bpf_obj_clear_elf(obj);
> > return obj;
> >-
> > out:
> > bpf_close_object(obj);
> > return NULL;
> >
>
> looks like a bug in some previous patch in the same set.
> Should be squashed with that patch?

Right, please don't introduce bugs + fixes, we love bisecting :-)

- Arnaldo

2015-05-18 20:37:07

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 21/37] bpf tools: Create eBPF maps defined in an object file

Em Mon, May 18, 2015 at 11:48:57AM -0700, Alexei Starovoitov escreveu:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >This patch creates maps based on 'map' section in object file using
> >bpf_create_map(), and store the fds into an array in
> >'struct bpf_object'. Since the byte order of the object may differ
> >from the host, swap map definition before processing.
> >
> >This is the first patch in 'loading' phase. Previous patches parse ELF
> >object file and create needed data structure, but doesnnt play with
> >kernel. They belong to 'opening' phase.
> >
> >Signed-off-by: Wang Nan <[email protected]>
> ...
> >+ for (j = 0; j < i; j++) {
> >+ close(obj->maps_fds[j]);
> >+ obj->maps_fds[j] = -1;
>
> this line is unnecessary, since you're freeing the whole array
> at the line below:

I guess this is kinda of a zfree() that he wants to achieve, i.e. if a
bug creeps in that makes some code use the deleted maps_fds, he rather
wants it to access a -1 fd than an some reused fd, no?

> >+ }
> >+ free(obj->maps_fds);

> >+ obj->maps_fds = NULL;
>
> ...
>
> > void bpf_close_object(struct bpf_object *obj)
> > {
> > if (!obj)
> > return;
> >
> > bpf_obj_clear_elf(obj);
> >+ bpf_unload_object(obj);
>
> just realized that this API keeps elf file open for the whole
> life time. I think that should be split up.
> bpf_open_object() can do elf parsing, creation of maps and
> bpf loading.
> bpf_close_object() can unload programs, maps. That's fine,
> but closing of elf file and freeing memory associated
> with parsing sections should be a separate call.

2015-05-18 20:37:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 23/37] bpf tools: Introduce bpf_load_program() to bpf.c

Em Mon, May 18, 2015 at 11:52:59AM -0700, Alexei Starovoitov escreveu:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >bpf_load_program() can be used to load bpf program into kernel. To make
> >loading faster, first try to load without logbuf. Try again with logbuf
> >if the first try failed.
> >
> >Signed-off-by: Wang Nan <[email protected]>
> ...
> >+ attr.insn_cnt = (__u32)insns_cnt;
> >+ attr.insns = ptr_to_u64((void *) insns);
> >+ attr.license = ptr_to_u64((void *) license);
>
> unnecessary casts?

And inconsistent at that, we don't need a space before the casted
variable.

> >+ attr.log_buf = ptr_to_u64(NULL);
> >+ attr.log_size = 0;
> >+ attr.log_level = 0;
> >+ attr.kern_version = kern_version;
> >+
> >+ fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
> >+ if ((fd >= 0) || !log_buf || !log_buf_sz)
>
> unnecessary ()
>
> >+ return fd;
> >+
> >+ /* Try again with log */
> >+ attr.log_buf = ptr_to_u64(log_buf);
> >+ attr.log_size = log_buf_sz;
> >+ attr.log_level = 1;
> >+ log_buf[0] = 0;
> >+ return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
>
> this part looks good. thank you for addressing the feedback from v1.

2015-05-18 20:44:25

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Mon, May 18, 2015 at 12:38:42PM -0700, Alexei Starovoitov escreveu:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >This is the 3rd version of 'perf bpf' patch series, based on
> >v4.1-rc3.
> >
> >The goal of this series of patches is to integrate eBPF with perf.
> >After applying these patches, users are allowed to use following
> >command to load eBPF program compiled by LLVM into kernel then start
> >recording with filters on:
> >
> > # perf bpf record --object sample_bpf.o -- -a sleep 4
>
> I think using programs are sophisticated filters is a good start
> and are useful already. Let's focus on that at the moment.
> I wouldn't grow the patchset any bigger.

Right, I am just now trying to slowly get involved, and my first
impression would be like that, i.e. we have:

perf record --filter, to pass a filter to tracepoints, if I could
instead of a filter expression pass, say, filter_bpf.o, that would seem
natural for me, i.e. no new option, just an alternative type of filter,
one way more powerful.

If i could write it as a C expression that would then get wrapped up as
a bpf, compiled, turned into an object, and then inserted in the kernel
to be used as my filter, then that would be almost like a tracepoint
filter.

> >Other than the previous change, v3 patch series drops the '|' event
> >syntax introduced in v2, because I realized that in v2 users are
> >allowed to pass any bpf fd by using it, like:
> >
> > # perf bpf record -- -e sched:sched_switch|100| sleep 1

So, what was this supposed to achieve? What does 100 mean there?

> >
> >which may become trouble maker.
>
> passing fd number as a string is an odd interface anyway.
> So I think that was the right call. We can improve it later.
>
> > Are we actually need a 'perf bpf' command? We can get similar result by
> > modifying 'perf record' to make it load eBPF program before recording.
> >
> > I suggest to keep 'perf bpf', group all eBPF stuffs together using a
> > uniform entry. Also, eBPF programs can act not only as filters but also
> > data aggregator. It is possible to make something link 'perf bpf run'
> > to simply make it run, and dump result after user hit 'C-c' or timeout.
>
> Though it's tempting to group under 'perf bpf'. I think it's cleaner to
> add --object flag to 'perf record'

I'd say keep it in --filter, that noticing it is a bpf object would
dtrt:

perf record --filter bpf_thing.o usleep 1


> Since it will avoid unnecessary '--'.
> Unless we can drop it? Like
> perf bpf record --object sample_bpf.o -a sleep 4
> should work?
> If not, then the following is better:
> perf record --object sample_bpf.o -a sleep 4
>
> Thank you for the hard work!

Ditto!

- Arnaldo

2015-05-18 20:51:42

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 12/37] bpf tools: Collect version and license from ELF sections

On 5/18/15 1:34 PM, Arnaldo Carvalho de Melo wrote:
>
> So, lets take a look at: tools/include/linux/kernel.h... bummer, that
> one wasn't yet moved from tools/perf/util/include/linux/kernel.h there.
> Will do, and then it has this:
>
> #ifndef min
> #define min(x, y) ({ \
> typeof(x) _min1 = (x); \
> typeof(y) _min2 = (y); \
> (void) (&_min1 == &_min2); \
> _min1 < _min2 ? _min1 : _min2; })
> #endif
>
> We should minimize this mess :-)
>

yes. that's even better! Thanks!

2015-05-18 21:05:41

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/18/15 1:44 PM, Arnaldo Carvalho de Melo wrote:
>
> perf record --filter, to pass a filter to tracepoints, if I could
> instead of a filter expression pass, say, filter_bpf.o, that would seem
> natural for me, i.e. no new option, just an alternative type of filter,
> one way more powerful.
...
> I'd say keep it in --filter, that noticing it is a bpf object would
> dtrt:
>
> perf record --filter bpf_thing.o usleep 1
>

agree. make sense.
The only thing is that such bpf program defines both event and filter.
Existing --filter applies to --event, whereas this bpf_thing.o does both
and likely kprob-ing multiple events underneath.
I guess '--filter' still fits. Just need to document it clearly.

2015-05-18 21:20:20

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Mon, May 18, 2015 at 02:05:35PM -0700, Alexei Starovoitov escreveu:
> On 5/18/15 1:44 PM, Arnaldo Carvalho de Melo wrote:
> >
> >perf record --filter, to pass a filter to tracepoints, if I could
> >instead of a filter expression pass, say, filter_bpf.o, that would seem
> >natural for me, i.e. no new option, just an alternative type of filter,
> >one way more powerful.
> ...
> >I'd say keep it in --filter, that noticing it is a bpf object would
> >dtrt:
> >
> > perf record --filter bpf_thing.o usleep 1
> >
>
> agree. make sense.
> The only thing is that such bpf program defines both event and filter.
> Existing --filter applies to --event, whereas this bpf_thing.o does both
> and likely kprob-ing multiple events underneath.
> I guess '--filter' still fits. Just need to document it clearly.

Humm, unsure then, because it is not a filter anymore, i.e. it is both a
filter and event selector :-\

I was thinking more like, hey, for an existing event, i.e. a place in
the kernel where it will collect something, collect if this filter
returns true. That would fit the existing --filter semantic.

perf record --event bpf_thing.o

Looks more natural then, as it is an event that will take place when the
filter returns true, and in addition to that, it will come with a bunch
of variables, etc.

And if that is the case, then what is the difference from a kprobe
event? I.e. for the existing tooling it wouldn't matter how this event
was set up, as long as it was available via tracefs, etc. I.e. it would
be completely similar to a tracepoint, kprobe, uprobe, etc, i.e. first
set it up, expose its internals via tracefs, no changes to perf.

I must be missing something... But then it would happen when I would try
to deal with this eBPF thing, bear with me :-)

- Arnaldo

2015-05-18 21:46:01

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 18, 2015 at 02:05:35PM -0700, Alexei Starovoitov escreveu:
>> On 5/18/15 1:44 PM, Arnaldo Carvalho de Melo wrote:
>>>
>>> perf record --filter, to pass a filter to tracepoints, if I could
>>> instead of a filter expression pass, say, filter_bpf.o, that would seem
>>> natural for me, i.e. no new option, just an alternative type of filter,
>>> one way more powerful.
>> ...
>>> I'd say keep it in --filter, that noticing it is a bpf object would
>>> dtrt:
>>>
>>> perf record --filter bpf_thing.o usleep 1
>>>
>>
>> agree. make sense.
>> The only thing is that such bpf program defines both event and filter.
>> Existing --filter applies to --event, whereas this bpf_thing.o does both
>> and likely kprob-ing multiple events underneath.
>> I guess '--filter' still fits. Just need to document it clearly.
>
> Humm, unsure then, because it is not a filter anymore, i.e. it is both a
> filter and event selector :-\
>
> I was thinking more like, hey, for an existing event, i.e. a place in
> the kernel where it will collect something, collect if this filter
> returns true. That would fit the existing --filter semantic.
>
> perf record --event bpf_thing.o
>
> Looks more natural then, as it is an event that will take place when the
> filter returns true, and in addition to that, it will come with a bunch
> of variables, etc.

well, I think --event fits a bit less than --filter ;)
Both not ideal.
May be --bpfobj would be a better flag, since it's a clean slate.
Short version '-b' is also unused :)

> And if that is the case, then what is the difference from a kprobe
> event? I.e. for the existing tooling it wouldn't matter how this event
> was set up, as long as it was available via tracefs, etc. I.e. it would
> be completely similar to a tracepoint, kprobe, uprobe, etc, i.e. first
> set it up, expose its internals via tracefs, no changes to perf.

the main difference that programs are not static as kprobes.
bpf maps, programs need to be dynamically created and loaded and they
will cease to exist as soon as process that holds FDs exits. So it
matches perf_event_open model which is FD based as well.
And that's only filtering like usage. Where 'perf report' facilities
are reused. For 'kernel debugging', 'latency heatmaps' use cases some
new visualizations in perf will be needed. That's where
'perf bpf command' fits.

2015-05-19 13:45:07

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Mon, May 18, 2015 at 02:45:58PM -0700, Alexei Starovoitov escreveu:
> On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
> > perf record --event bpf_thing.o

>> Looks more natural then, as it is an event that will take place when the
>> filter returns true, and in addition to that, it will come with a bunch
>> of variables, etc.
>
> well, I think --event fits a bit less than --filter ;)
> Both not ideal.

I thought --event was more suited, as it states what is the event, and
when it should happen, i.e. --filter is about reducing the frequency of
something well defined, i.e. an existing --event.

> May be --bpfobj would be a better flag, since it's a clean slate.
> Short version '-b' is also unused :)

Anything with "bpf" seems artificial ;-\ Using short options for
something controvertial also doesn't seems like a good idea :-)

>> And if that is the case, then what is the difference from a kprobe
>> event? I.e. for the existing tooling it wouldn't matter how this event
>> was set up, as long as it was available via tracefs, etc. I.e. it would
>> be completely similar to a tracepoint, kprobe, uprobe, etc, i.e. first
>> set it up, expose its internals via tracefs, no changes to perf.

> the main difference that programs are not static as kprobes.

Well, I could, and indeed have been thinking about, using kprobes as
part of the 'trace' process, i.e. to collect things not available in
existing data sources (aka --event's), for the purposes of a tracing
session, i.e. it would be set up and torn down as part of calling
'perf trace foo-workload'.

In this sense it would be more "not static", with the only caveat that
with the current way of setting up (ku)probes, it would be available for
whoever would wanted to/could use it while that tracing session would be
underway.

> bpf maps, programs need to be dynamically created and loaded and they
> will cease to exist as soon as process that holds FDs exits. So it

Ok, so its just that the setting up of the event is hardwired with the
use of it via --event in the command line.

Or, looking at it another way, using it via --event would set it up
_and_ use it, then tear it down.

> matches perf_event_open model which is FD based as well.
> And that's only filtering like usage. Where 'perf report' facilities
> are reused. For 'kernel debugging', 'latency heatmaps' use cases some
> new visualizations in perf will be needed. That's where
> 'perf bpf command' fits.

Humm, why not use 'perf script', 'perf trace' as well for those things?

A 'perf script' that actually uses a C subset, gets compiled by llvm and
then immediately used, with caching for amortizing llvm calls if those
are that expensive, etc, instead of the current python or perl scripting
would come in handy for people like PeterZ, right?

Oh, I would like it too.

- Arnaldo

2015-05-19 16:06:17

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On Tue, May 19, 2015 at 10:44:58AM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, May 18, 2015 at 02:45:58PM -0700, Alexei Starovoitov escreveu:
> > On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
> > > perf record --event bpf_thing.o
>
> >> Looks more natural then, as it is an event that will take place when the
> >> filter returns true, and in addition to that, it will come with a bunch
> >> of variables, etc.
> >
> > well, I think --event fits a bit less than --filter ;)
> > Both not ideal.
>
> I thought --event was more suited, as it states what is the event, and
> when it should happen, i.e. --filter is about reducing the frequency of
> something well defined, i.e. an existing --event.

If we go with 'perf record' rather than 'perf bpf record', I agree
that --event option is more natural than --filter. The --event option
says that it will record - or enable, at least - a (kprobe) event for
bpf programs in it and then do something with it. :)

Maybe something like this?

perf record --event bpf:/path/to/object


>
> > May be --bpfobj would be a better flag, since it's a clean slate.
> > Short version '-b' is also unused :)
>
> Anything with "bpf" seems artificial ;-\ Using short options for
> something controvertial also doesn't seems like a good idea :-)
>
> >> And if that is the case, then what is the difference from a kprobe
> >> event? I.e. for the existing tooling it wouldn't matter how this event
> >> was set up, as long as it was available via tracefs, etc. I.e. it would
> >> be completely similar to a tracepoint, kprobe, uprobe, etc, i.e. first
> >> set it up, expose its internals via tracefs, no changes to perf.
>
> > the main difference that programs are not static as kprobes.
>
> Well, I could, and indeed have been thinking about, using kprobes as
> part of the 'trace' process, i.e. to collect things not available in
> existing data sources (aka --event's), for the purposes of a tracing
> session, i.e. it would be set up and torn down as part of calling
> 'perf trace foo-workload'.
>
> In this sense it would be more "not static", with the only caveat that
> with the current way of setting up (ku)probes, it would be available for
> whoever would wanted to/could use it while that tracing session would be
> underway.
>
> > bpf maps, programs need to be dynamically created and loaded and they
> > will cease to exist as soon as process that holds FDs exits. So it
>
> Ok, so its just that the setting up of the event is hardwired with the
> use of it via --event in the command line.
>
> Or, looking at it another way, using it via --event would set it up
> _and_ use it, then tear it down.
>
> > matches perf_event_open model which is FD based as well.
> > And that's only filtering like usage. Where 'perf report' facilities
> > are reused. For 'kernel debugging', 'latency heatmaps' use cases some
> > new visualizations in perf will be needed. That's where
> > 'perf bpf command' fits.
>
> Humm, why not use 'perf script', 'perf trace' as well for those things?
>
> A 'perf script' that actually uses a C subset, gets compiled by llvm and
> then immediately used, with caching for amortizing llvm calls if those
> are that expensive, etc, instead of the current python or perl scripting
> would come in handy for people like PeterZ, right?

Oh, this looks like an interesting approach.. are you saying something
like below?

1. bpf generation

# add kprobe event to be used by bpf program
$ perf probe --add <wanted function>

# record dummy data file for the event above
$ perf record -e <probe>

# generate sample bpf program for the event
$ perf script --gen-script bpf

# write the program and compile it with llvm
# copy the resulting binary into $PERF_EXEC_PATH
# now 'perf script --list' should show the bpf object/script

# delete kprobe event
$ perf probe --del <probe>


2. usage

# do the real work here via shell script
# add kprobe event, load and attach bpf program, delete kprobe after record
$ perf script record <bpf script>

# maybe we can have custom code/script to display the result
$ perf script report <bpf script>

2015-05-19 16:40:47

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
> On Tue, May 19, 2015 at 10:44:58AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, May 18, 2015 at 02:45:58PM -0700, Alexei Starovoitov escreveu:
> > > On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
> > > > perf record --event bpf_thing.o
> >
> > >> Looks more natural then, as it is an event that will take place when the
> > >> filter returns true, and in addition to that, it will come with a bunch
> > >> of variables, etc.
> > >
> > > well, I think --event fits a bit less than --filter ;)
> > > Both not ideal.
> >
> > I thought --event was more suited, as it states what is the event, and
> > when it should happen, i.e. --filter is about reducing the frequency of
> > something well defined, i.e. an existing --event.
>
> If we go with 'perf record' rather than 'perf bpf record', I agree
> that --event option is more natural than --filter. The --event option
> says that it will record - or enable, at least - a (kprobe) event for
> bpf programs in it and then do something with it. :)
>
> Maybe something like this?
>
> perf record --event bpf:/path/to/object

The syntax maybe one of many, say if it sees a ".o" suffix in the even
name, look if the provided event name is a file and if this file has the
ELF header, whatever.

<SNIP>

> > > matches perf_event_open model which is FD based as well.
> > > And that's only filtering like usage. Where 'perf report' facilities
> > > are reused. For 'kernel debugging', 'latency heatmaps' use cases some
> > > new visualizations in perf will be needed. That's where
> > > 'perf bpf command' fits.
> >
> > Humm, why not use 'perf script', 'perf trace' as well for those things?
> >
> > A 'perf script' that actually uses a C subset, gets compiled by llvm and
> > then immediately used, with caching for amortizing llvm calls if those
> > are that expensive, etc, instead of the current python or perl scripting
> > would come in handy for people like PeterZ, right?
>
> Oh, this looks like an interesting approach.. are you saying something
> like below?

No, those are way too many steps :-)

What 'perf script' does? Right now you can ask for a script to run and
it will both start 'perf record' with the proper events, and then
"immediately" consume it, piping the output of the 'record' "script" to
the consumer, that is 'perf script' itself running an interpreter, perl
or python.

Look at:

[acme@ssdandy linux]$ ls -la ~/libexec/perf-core/scripts/perl/bin/
total 64
drwxr-xr-x. 2 acme acme 4096 May 19 11:53 .
drwxr-xr-x. 4 acme acme 4096 May 19 11:53 ..
-rwxr-xr-x. 1 acme acme 78 May 19 11:53 check-perf-trace-record
-rwxr-xr-x. 1 acme acme 132 Jan 6 11:36 check-perf-trace-report
-rwxr-xr-x. 1 acme acme 109 May 19 11:53 failed-syscalls-record
-rwxr-xr-x. 1 acme acme 241 May 19 11:53 failed-syscalls-report
-rwxr-xr-x. 1 acme acme 83 May 19 11:53 rw-by-file-record
-rwxr-xr-x. 1 acme acme 232 May 19 11:53 rw-by-file-report
-rwxr-xr-x. 1 acme acme 135 May 19 11:53 rw-by-pid-record
-rwxr-xr-x. 1 acme acme 114 May 19 11:53 rw-by-pid-report
-rwxr-xr-x. 1 acme acme 135 May 19 11:53 rwtop-record
-rwxr-xr-x. 1 acme acme 398 May 19 11:53 rwtop-report
-rwxr-xr-x. 1 acme acme 75 May 19 11:53 wakeup-latency-record
-rwxr-xr-x. 1 acme acme 133 May 19 11:53 wakeup-latency-report
-rwxr-xr-x. 1 acme acme 160 Jan 6 13:05 workqueue-stats-record
-rwxr-xr-x. 1 acme acme 136 Jan 6 13:05 workqueue-stats-report
[acme@ssdandy linux]$

I.e. the first part, say, failed-syscalls-record, would be done
internally, loading the bpf object, etc, the second part would be the
event massaging, but done in a C subset :-)

- Arnaldo

2015-05-19 20:46:18

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/19/15 9:40 AM, Arnaldo Carvalho de Melo wrote:
> Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
>> On Tue, May 19, 2015 at 10:44:58AM -0300, Arnaldo Carvalho de Melo wrote:
>>> Em Mon, May 18, 2015 at 02:45:58PM -0700, Alexei Starovoitov escreveu:
>>>> On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
>>>>> perf record --event bpf_thing.o
>>>
>>>>> Looks more natural then, as it is an event that will take place when the
>>>>> filter returns true, and in addition to that, it will come with a bunch
>>>>> of variables, etc.
>>>>
>>>> well, I think --event fits a bit less than --filter ;)
>>>> Both not ideal.
>>>
>>> I thought --event was more suited, as it states what is the event, and
>>> when it should happen, i.e. --filter is about reducing the frequency of
>>> something well defined, i.e. an existing --event.
>>
>> If we go with 'perf record' rather than 'perf bpf record', I agree
>> that --event option is more natural than --filter. The --event option
>> says that it will record - or enable, at least - a (kprobe) event for
>> bpf programs in it and then do something with it. :)
>>
>> Maybe something like this?
>>
>> perf record --event bpf:/path/to/object
>
> The syntax maybe one of many, say if it sees a ".o" suffix in the even
> name, look if the provided event name is a file and if this file has the
> ELF header, whatever.

agree. 'bpf:' prefix is redundant.
To me the following syntax is fine:
perf record --event bpf_file.o

In the future it can support automatically:
perf record --event bpf_file.c

Wang, thoughts?

>> Oh, this looks like an interesting approach.. are you saying something
>> like below?
>
> No, those are way too many steps :-)
>
> What 'perf script' does? Right now you can ask for a script to run and
> it will both start 'perf record' with the proper events, and then
> "immediately" consume it, piping the output of the 'record' "script" to
> the consumer, that is 'perf script' itself running an interpreter, perl
> or python.

if you're proposing to do something like:
perf script bpf_file.c
that will do event creation, filtering, aggregation, reporting
and printing results, then it's fine.
This is pretty much what I thought 'perf bpf run' will do.

> I.e. the first part, say, failed-syscalls-record, would be done
> internally, loading the bpf object, etc, the second part would be the
> event massaging, but done in a C subset :-)

not sure that's doable. The task (perf) that loaded the program in the
step one should be still alive when 2nd part is running.
For 'perf bpf run' use case, the whole record/report split is
artificial. There is no need for perf.data.
In my mind 'perf bpf run file.c' suppose to quickly compile .c,
hook into kernel, collect and print something, then Ctrl-C of
'perf bpf run' should automatically stop everything.
No perf.data anywhere. It should be quick single step
tool for live debugging.
At the same time 'perf record --event bpf_file.o' plus generic
'perf report' model works very well as well.
They are different use cases.

I guess I'm saying let's not get too much ahead of ourselves ;)
I think Wang's current patchset is close enough to make
'perf record --event bpf_file.o' to be useful.
Let's take this first step.

2015-05-19 21:10:44

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Tue, May 19, 2015 at 01:46:10PM -0700, Alexei Starovoitov escreveu:
> On 5/19/15 9:40 AM, Arnaldo Carvalho de Melo wrote:
> >Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
> >>If we go with 'perf record' rather than 'perf bpf record', I agree
> >>that --event option is more natural than --filter. The --event option
> >>says that it will record - or enable, at least - a (kprobe) event for
> >>bpf programs in it and then do something with it. :)
> >>
> >>Maybe something like this?
> >>
> >> perf record --event bpf:/path/to/object
> >
> >The syntax maybe one of many, say if it sees a ".o" suffix in the even
> >name, look if the provided event name is a file and if this file has the
> >ELF header, whatever.
>
> agree. 'bpf:' prefix is redundant.

I would say unnecessarily exposing an implementation detail :-)

> To me the following syntax is fine:
> perf record --event bpf_file.o

Agreed, for something pre-compiled.

> In the future it can support automatically:
> perf record --event bpf_file.c

Right, to compile it, then use the resulting bpf_file.o just like in the
first case, then, on another patch, cache it and next time just check
that the contents of the file hasn't changed, so the .o file can be
used, i.e. ccache like stuff.

> Wang, thoughts?
>
> >>Oh, this looks like an interesting approach.. are you saying something
> >>like below?
> >
> >No, those are way too many steps :-)
> >
> >What 'perf script' does? Right now you can ask for a script to run and
> >it will both start 'perf record' with the proper events, and then
> >"immediately" consume it, piping the output of the 'record' "script" to
> >the consumer, that is 'perf script' itself running an interpreter, perl
> >or python.
>
> if you're proposing to do something like:
> perf script bpf_file.c
> that will do event creation, filtering, aggregation, reporting
> and printing results, then it's fine.
> This is pretty much what I thought 'perf bpf run' will do.

Agreed, that is what I think should be done, parts of what is in
bpf.file.c are related to the data collection, some are for filtering,
and parts are for reporting, etc.

This all should use infrastructure in perf for symbol resolving,
callcahins, etc.

> >I.e. the first part, say, failed-syscalls-record, would be done
> >internally, loading the bpf object, etc, the second part would be the
> >event massaging, but done in a C subset :-)
>
> not sure that's doable. The task (perf) that loaded the program in the

Well,perhaps in this part I wasn't clear, but what I think should be
fone is what is above, i.e. end goal should be "perf script bpf_file.c"
does it all.

> step one should be still alive when 2nd part is running.
> For 'perf bpf run' use case, the whole record/report split is
> artificial. There is no need for perf.data.

Right, but if we do a:

perf script -i perf.data bpf_file.c

Then there would be a short circuit effect of just doing the
aggregation and/or reporting.

> In my mind 'perf bpf run file.c' suppose to quickly compile .c,
> hook into kernel, collect and print something, then Ctrl-C of
> 'perf bpf run' should automatically stop everything.

That is what I thought about 'perf script file.c'. That would be stopped
either because it finishes due to some condition in the kernel, in the
reporting or by means of control+C, either by design or because it is
taking too long, etc.

> No perf.data anywhere. It should be quick single step
> tool for live debugging.

Well, we could have a tee like mode where perf.data file could be
generated so that we could run it again after doing some change on the
aggregation code, so that we wouldn't have to re-run the data collection
parts, that could be about some condition hard to capture, etc.

> At the same time 'perf record --event bpf_file.o' plus generic
> 'perf report' model works very well as well.
> They are different use cases.

Right.

> I guess I'm saying let's not get too much ahead of ourselves ;)

No problem, but then we need to talk at this moment not to add new stuff
that we think should not be added, like 'perf bpf' :-)

> I think Wang's current patchset is close enough to make
> 'perf record --event bpf_file.o' to be useful.
> Let's take this first step.

- Arnaldo

2015-05-19 21:42:19

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/19/15 2:10 PM, Arnaldo Carvalho de Melo wrote:
>
> This all should use infrastructure in perf for symbol resolving,
> callcahins, etc.

yes. 100%

> Right, but if we do a:
>
> perf script -i perf.data bpf_file.c
>
> Then there would be a short circuit effect of just doing the
> aggregation and/or reporting.

That can work, but I don't see why I would use bpf for
scripting on top of perf.data. If trace is already collected,
the existing perl/python scripts will work fine.

> Well, we could have a tee like mode where perf.data file could be
> generated so that we could run it again after doing some change on the
> aggregation code, so that we wouldn't have to re-run the data collection
> parts, that could be about some condition hard to capture, etc.

true, but again I don't think in such scenario you'd need bpf.
bpf is needed when the number of events is huge and the user
needs to aggregate/process them in-kernel.

>> I guess I'm saying let's not get too much ahead of ourselves ;)
>
> No problem, but then we need to talk at this moment not to add new stuff
> that we think should not be added, like 'perf bpf' :-)

ahh, that's where you going :) Sure. I don't mind avoiding that
unbearable abbreviation if we can :)
'perf script file.[oc]' command line works for me.

2015-05-19 22:05:39

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Tue, May 19, 2015 at 02:42:11PM -0700, Alexei Starovoitov escreveu:
> On 5/19/15 2:10 PM, Arnaldo Carvalho de Melo wrote:
> >
> >This all should use infrastructure in perf for symbol resolving,
> >callcahins, etc.
>
> yes. 100%
>
> >Right, but if we do a:
> >
> >perf script -i perf.data bpf_file.c
> >
> >Then there would be a short circuit effect of just doing the
> >aggregation and/or reporting.

Ok, can you point me to this bpf_file.c, an example? So that we can talk
about the parts of it that would be short circuited when not loading the
bpf_file.o, etc.

> That can work, but I don't see why I would use bpf for
> scripting on top of perf.data. If trace is already collected,
> the existing perl/python scripts will work fine.

> >Well, we could have a tee like mode where perf.data file could be
> >generated so that we could run it again after doing some change on the
> >aggregation code, so that we wouldn't have to re-run the data collection
> >parts, that could be about some condition hard to capture, etc.
>
> true, but again I don't think in such scenario you'd need bpf.
> bpf is needed when the number of events is huge and the user
> needs to aggregate/process them in-kernel.
>
> >>I guess I'm saying let's not get too much ahead of ourselves ;)
> >
> >No problem, but then we need to talk at this moment not to add new stuff
> >that we think should not be added, like 'perf bpf' :-)
>
> ahh, that's where you going :) Sure. I don't mind avoiding that

:-) If there is no need, why add a new command? I.e. if all that is
wanted can be used by integrating eBPF into existing commands, that
could be best.

> unbearable abbreviation if we can :)

:-)

> 'perf script file.[oc]' command line works for me.

- Arnaldo

2015-05-20 00:33:17

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On Tue, May 19, 2015 at 01:40:40PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
> > On Tue, May 19, 2015 at 10:44:58AM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, May 18, 2015 at 02:45:58PM -0700, Alexei Starovoitov escreveu:
> > > > On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
> > > > > perf record --event bpf_thing.o
> > >
> > > >> Looks more natural then, as it is an event that will take place when the
> > > >> filter returns true, and in addition to that, it will come with a bunch
> > > >> of variables, etc.
> > > >
> > > > well, I think --event fits a bit less than --filter ;)
> > > > Both not ideal.
> > >
> > > I thought --event was more suited, as it states what is the event, and
> > > when it should happen, i.e. --filter is about reducing the frequency of
> > > something well defined, i.e. an existing --event.
> >
> > If we go with 'perf record' rather than 'perf bpf record', I agree
> > that --event option is more natural than --filter. The --event option
> > says that it will record - or enable, at least - a (kprobe) event for
> > bpf programs in it and then do something with it. :)
> >
> > Maybe something like this?
> >
> > perf record --event bpf:/path/to/object
>
> The syntax maybe one of many, say if it sees a ".o" suffix in the even
> name, look if the provided event name is a file and if this file has the
> ELF header, whatever.

I was thinking about something similar for SDT. I think that it'll
eventually handle ELF files as an event somehow, no? Maybe we need to
add ".bpf" suffix?


>
> <SNIP>
>
> > > > matches perf_event_open model which is FD based as well.
> > > > And that's only filtering like usage. Where 'perf report' facilities
> > > > are reused. For 'kernel debugging', 'latency heatmaps' use cases some
> > > > new visualizations in perf will be needed. That's where
> > > > 'perf bpf command' fits.
> > >
> > > Humm, why not use 'perf script', 'perf trace' as well for those things?
> > >
> > > A 'perf script' that actually uses a C subset, gets compiled by llvm and
> > > then immediately used, with caching for amortizing llvm calls if those
> > > are that expensive, etc, instead of the current python or perl scripting
> > > would come in handy for people like PeterZ, right?
> >
> > Oh, this looks like an interesting approach.. are you saying something
> > like below?
>
> No, those are way too many steps :-)
>
> What 'perf script' does? Right now you can ask for a script to run and
> it will both start 'perf record' with the proper events, and then
> "immediately" consume it, piping the output of the 'record' "script" to
> the consumer, that is 'perf script' itself running an interpreter, perl
> or python.

Ah, okay. So 'perf script <xxx>' is same as 'perf script record <xxx> -o- | \
perf script report <xxx> -i-', right?


>
> Look at:
>
> [acme@ssdandy linux]$ ls -la ~/libexec/perf-core/scripts/perl/bin/
> total 64
> drwxr-xr-x. 2 acme acme 4096 May 19 11:53 .
> drwxr-xr-x. 4 acme acme 4096 May 19 11:53 ..
> -rwxr-xr-x. 1 acme acme 78 May 19 11:53 check-perf-trace-record
> -rwxr-xr-x. 1 acme acme 132 Jan 6 11:36 check-perf-trace-report
> -rwxr-xr-x. 1 acme acme 109 May 19 11:53 failed-syscalls-record
> -rwxr-xr-x. 1 acme acme 241 May 19 11:53 failed-syscalls-report
> -rwxr-xr-x. 1 acme acme 83 May 19 11:53 rw-by-file-record
> -rwxr-xr-x. 1 acme acme 232 May 19 11:53 rw-by-file-report
> -rwxr-xr-x. 1 acme acme 135 May 19 11:53 rw-by-pid-record
> -rwxr-xr-x. 1 acme acme 114 May 19 11:53 rw-by-pid-report
> -rwxr-xr-x. 1 acme acme 135 May 19 11:53 rwtop-record
> -rwxr-xr-x. 1 acme acme 398 May 19 11:53 rwtop-report
> -rwxr-xr-x. 1 acme acme 75 May 19 11:53 wakeup-latency-record
> -rwxr-xr-x. 1 acme acme 133 May 19 11:53 wakeup-latency-report
> -rwxr-xr-x. 1 acme acme 160 Jan 6 13:05 workqueue-stats-record
> -rwxr-xr-x. 1 acme acme 136 Jan 6 13:05 workqueue-stats-report
> [acme@ssdandy linux]$
>
> I.e. the first part, say, failed-syscalls-record, would be done
> internally, loading the bpf object, etc, the second part would be the
> event massaging, but done in a C subset :-)

IIRC there's a patch posted to support C for scripting.

https://lkml.org/lkml/2014/5/22/13

Are you saying about it? A rough idea is that it'd be nice to have a
kind of this report/display code in a section of the bpf file itself.

Thanks,
Namhyung

2015-05-20 00:40:26

by Namhyung Kim

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On Tue, May 19, 2015 at 06:10:33PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 19, 2015 at 01:46:10PM -0700, Alexei Starovoitov escreveu:
> > On 5/19/15 9:40 AM, Arnaldo Carvalho de Melo wrote:
> > >Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
> > >>If we go with 'perf record' rather than 'perf bpf record', I agree
> > >>that --event option is more natural than --filter. The --event option
> > >>says that it will record - or enable, at least - a (kprobe) event for
> > >>bpf programs in it and then do something with it. :)
> > >>
> > >>Maybe something like this?
> > >>
> > >> perf record --event bpf:/path/to/object
> > >
> > >The syntax maybe one of many, say if it sees a ".o" suffix in the even
> > >name, look if the provided event name is a file and if this file has the
> > >ELF header, whatever.
> >
> > agree. 'bpf:' prefix is redundant.
>
> I would say unnecessarily exposing an implementation detail :-)
>
> > To me the following syntax is fine:
> > perf record --event bpf_file.o
>
> Agreed, for something pre-compiled.
>
> > In the future it can support automatically:
> > perf record --event bpf_file.c
>
> Right, to compile it, then use the resulting bpf_file.o just like in the
> first case, then, on another patch, cache it and next time just check
> that the contents of the file hasn't changed, so the .o file can be
> used, i.e. ccache like stuff.
>
> > Wang, thoughts?
> >
> > >>Oh, this looks like an interesting approach.. are you saying something
> > >>like below?
> > >
> > >No, those are way too many steps :-)
> > >
> > >What 'perf script' does? Right now you can ask for a script to run and
> > >it will both start 'perf record' with the proper events, and then
> > >"immediately" consume it, piping the output of the 'record' "script" to
> > >the consumer, that is 'perf script' itself running an interpreter, perl
> > >or python.
> >
> > if you're proposing to do something like:
> > perf script bpf_file.c
> > that will do event creation, filtering, aggregation, reporting
> > and printing results, then it's fine.
> > This is pretty much what I thought 'perf bpf run' will do.
>
> Agreed, that is what I think should be done, parts of what is in
> bpf.file.c are related to the data collection, some are for filtering,
> and parts are for reporting, etc.

Looks great! :)

>
> This all should use infrastructure in perf for symbol resolving,
> callcahins, etc.

But then we need to stabilize libperf APIs IMHO. :)

Thanks,
Namhyung

2015-05-20 00:37:50

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Wed, May 20, 2015 at 09:23:50AM +0900, Namhyung Kim escreveu:
> On Tue, May 19, 2015 at 01:40:40PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
> > > On Tue, May 19, 2015 at 10:44:58AM -0300, Arnaldo Carvalho de Melo wrote:
> > > If we go with 'perf record' rather than 'perf bpf record', I agree
> > > that --event option is more natural than --filter. The --event option
> > > says that it will record - or enable, at least - a (kprobe) event for
> > > bpf programs in it and then do something with it. :)
> > >
> > > Maybe something like this?
> > >
> > > perf record --event bpf:/path/to/object
> >
> > The syntax maybe one of many, say if it sees a ".o" suffix in the even
> > name, look if the provided event name is a file and if this file has the
> > ELF header, whatever.

> I was thinking about something similar for SDT. I think that it'll
> eventually handle ELF files as an event somehow, no? Maybe we need to
> add ".bpf" suffix?

Well, if both are ELF files, then probably they will have something that
is unique to them, like perhaps some init function, or some section name
that can be used to disambiguate, no?

> > <SNIP>
> >
> > > > > matches perf_event_open model which is FD based as well.
> > > > > And that's only filtering like usage. Where 'perf report' facilities
> > > > > are reused. For 'kernel debugging', 'latency heatmaps' use cases some
> > > > > new visualizations in perf will be needed. That's where
> > > > > 'perf bpf command' fits.
> > > >
> > > > Humm, why not use 'perf script', 'perf trace' as well for those things?
> > > >
> > > > A 'perf script' that actually uses a C subset, gets compiled by llvm and
> > > > then immediately used, with caching for amortizing llvm calls if those
> > > > are that expensive, etc, instead of the current python or perl scripting
> > > > would come in handy for people like PeterZ, right?
> > >
> > > Oh, this looks like an interesting approach.. are you saying something
> > > like below?
> >
> > No, those are way too many steps :-)
> >
> > What 'perf script' does? Right now you can ask for a script to run and
> > it will both start 'perf record' with the proper events, and then
> > "immediately" consume it, piping the output of the 'record' "script" to
> > the consumer, that is 'perf script' itself running an interpreter, perl
> > or python.
>
> Ah, okay. So 'perf script <xxx>' is same as 'perf script record <xxx> -o- | \
> perf script report <xxx> -i-', right?

Something like that, but without two commands spawned using pipes for
communication.

> > Look at:
> >
> > [acme@ssdandy linux]$ ls -la ~/libexec/perf-core/scripts/perl/bin/
> > total 64
> > drwxr-xr-x. 2 acme acme 4096 May 19 11:53 .
> > drwxr-xr-x. 4 acme acme 4096 May 19 11:53 ..
> > -rwxr-xr-x. 1 acme acme 78 May 19 11:53 check-perf-trace-record
> > -rwxr-xr-x. 1 acme acme 132 Jan 6 11:36 check-perf-trace-report
> > -rwxr-xr-x. 1 acme acme 109 May 19 11:53 failed-syscalls-record
> > -rwxr-xr-x. 1 acme acme 241 May 19 11:53 failed-syscalls-report
> > -rwxr-xr-x. 1 acme acme 83 May 19 11:53 rw-by-file-record
> > -rwxr-xr-x. 1 acme acme 232 May 19 11:53 rw-by-file-report
> > -rwxr-xr-x. 1 acme acme 135 May 19 11:53 rw-by-pid-record
> > -rwxr-xr-x. 1 acme acme 114 May 19 11:53 rw-by-pid-report
> > -rwxr-xr-x. 1 acme acme 135 May 19 11:53 rwtop-record
> > -rwxr-xr-x. 1 acme acme 398 May 19 11:53 rwtop-report
> > -rwxr-xr-x. 1 acme acme 75 May 19 11:53 wakeup-latency-record
> > -rwxr-xr-x. 1 acme acme 133 May 19 11:53 wakeup-latency-report
> > -rwxr-xr-x. 1 acme acme 160 Jan 6 13:05 workqueue-stats-record
> > -rwxr-xr-x. 1 acme acme 136 Jan 6 13:05 workqueue-stats-report
> > [acme@ssdandy linux]$
> >
> > I.e. the first part, say, failed-syscalls-record, would be done
> > internally, loading the bpf object, etc, the second part would be the
> > event massaging, but done in a C subset :-)
>
> IIRC there's a patch posted to support C for scripting.
>
> https://lkml.org/lkml/2014/5/22/13

Lemme look at it... Yeah, that is part of what needs to be done, i.e.
somehow we need to get the .c "script", build it and connect it to the
perf tool.

> Are you saying about it? A rough idea is that it'd be nice to have a
> kind of this report/display code in a section of the bpf file itself.

Exactly. All in one file, parts of it built for specific purposes and
loaded in each needed way, then all gets connected.

- Arnaldo

2015-05-20 00:43:12

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Wed, May 20, 2015 at 09:30:52AM +0900, Namhyung Kim escreveu:
> On Tue, May 19, 2015 at 06:10:33PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, May 19, 2015 at 01:46:10PM -0700, Alexei Starovoitov escreveu:
> > > if you're proposing to do something like:
> > > perf script bpf_file.c
> > > that will do event creation, filtering, aggregation, reporting
> > > and printing results, then it's fine.
> > > This is pretty much what I thought 'perf bpf run' will do.

> > Agreed, that is what I think should be done, parts of what is in
> > bpf.file.c are related to the data collection, some are for filtering,
> > and parts are for reporting, etc.

> Looks great! :)

> > This all should use infrastructure in perf for symbol resolving,
> > callcahins, etc.

> But then we need to stabilize libperf APIs IMHO. :)

Well, all this will remain in tools/, more so in tools/perf/ where we
can continue to do what you propose :-)

- Arnaldo

2015-05-20 01:02:18

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/19/15 3:05 PM, Arnaldo Carvalho de Melo wrote:
>
> Ok, can you point me to this bpf_file.c, an example? So that we can talk
> about the parts of it that would be short circuited when not loading the
> bpf_file.o, etc.

There are different use cases that would fit in different perf commands.
- 1st use case is quick kernel debugging which could be something
similar to: samples/bpf/tracex1_kern.c
where the user would type a little program with some filtering and
printing and will say 'perf script prog.c'
where it would compile (in microseconds), load, hook into kernel
and print stuff to the screen.
If it prints too much, the user will just Ctrl-C, tweak the prog
and repeat.
I'm seeing it as replacement of 'add printk to kernel, recompile,
reboot' cycle.

- 2nd use case is performance analysis of various things where
aggregation is happening in the kernel and data is printed by
user space in human friendly way. There are three main categories
of visualization: histograms, heatmaps, flamegraphs.
The program will collect the data into such structures
and perf will be visualizing them.
Currently tracex2 is an example of histogram, tracex3 - heatmap
and flamegraphs in the works.

- 3rd use case is 'top' like visualization
tracex4 example is trying to demo it.

these are my use cases.
- Wang's use case turned out to be 4th category.
My understanding he wants to write a program that pretty
much sophisticated filter with dependencies across multiple
events in the system and at the end visualize one 'master'
event via 'perf report'.
That's what this patch set is about.
I think 'perf record --event file.o' + 'perf report' fits
Wang's use case quite well.
My 1st use case fits 'perf script file.c' model.
How to fit 2 and 3 into perf is hard to see yet.

2015-05-20 01:15:04

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Tue, May 19, 2015 at 06:02:13PM -0700, Alexei Starovoitov escreveu:
> On 5/19/15 3:05 PM, Arnaldo Carvalho de Melo wrote:
> >
> >Ok, can you point me to this bpf_file.c, an example? So that we can talk
> >about the parts of it that would be short circuited when not loading the
> >bpf_file.o, etc.

Thanks! I'll continue tomorrow, reading those examples as well messages
I saw on netdev.

- Arnaldo

> There are different use cases that would fit in different perf commands.
> - 1st use case is quick kernel debugging which could be something
> similar to: samples/bpf/tracex1_kern.c
> where the user would type a little program with some filtering and
> printing and will say 'perf script prog.c'
> where it would compile (in microseconds), load, hook into kernel
> and print stuff to the screen.
> If it prints too much, the user will just Ctrl-C, tweak the prog
> and repeat.
> I'm seeing it as replacement of 'add printk to kernel, recompile,
> reboot' cycle.
>
> - 2nd use case is performance analysis of various things where
> aggregation is happening in the kernel and data is printed by
> user space in human friendly way. There are three main categories
> of visualization: histograms, heatmaps, flamegraphs.
> The program will collect the data into such structures
> and perf will be visualizing them.
> Currently tracex2 is an example of histogram, tracex3 - heatmap
> and flamegraphs in the works.
>
> - 3rd use case is 'top' like visualization
> tracex4 example is trying to demo it.
>
> these are my use cases.
> - Wang's use case turned out to be 4th category.
> My understanding he wants to write a program that pretty
> much sophisticated filter with dependencies across multiple
> events in the system and at the end visualize one 'master'
> event via 'perf report'.
> That's what this patch set is about.
> I think 'perf record --event file.o' + 'perf report' fits
> Wang's use case quite well.
> My 1st use case fits 'perf script file.c' model.
> How to fit 2 and 3 into perf is hard to see yet.

2015-05-20 03:48:51

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools



在 2015/5/19 1:35, Alexei Starovoitov 写道:
> On 5/17/15 3:56 AM, Wang Nan wrote:
>> This is the first patch of libbpf. The goal of libbpf is to create a
>> standard way for accessing eBPF object files. This patch creates
>> Makefile and Build for it, allows 'make' to build libbpf.a and
>> libbpf.so, 'make install' to put them into proper directories.
>> Most part of Makefile is borrowed from traceevent. Before building,
>> it checks the existance of libelf in Makefile, and deny to build if
>> not found. Instead of throw an error if libelf not found, the error
>> raises in a phony target "elfdep". This design is to ensure
>> 'make clean' still workable even if libelf is not found.
>>
>> Signed-off-by: Wang Nan <[email protected]>
>> ---
> ...
>> +
>> +# Version of eBPF elf file
>> +FILE_VERSION = 1
>
> what that comment suppose to mean?

The format of eBPF objects can be improved in futher. A version number
here is the precaution of backward compatibility. However this patch doesn't
utilize it.

I'd like to append a 'format' section into eBPF object format to let
libbpf know
the version of the object. What do you think?

Thank you.

2015-05-20 05:24:53

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools

On 5/19/15 8:48 PM, Wangnan (F) wrote:
>
>>> +
>>> +# Version of eBPF elf file
>>> +FILE_VERSION = 1
>>
>> what that comment suppose to mean?
>
> The format of eBPF objects can be improved in futher. A version number
> here is the precaution of backward compatibility. However this patch
> doesn't
> utilize it.
>
> I'd like to append a 'format' section into eBPF object format to let
> libbpf know
> the version of the object. What do you think?

I don't think it will help.
Version number is quite inconvenient.
perf_event_attr and bpf_attr are using 'size' instead, since the only
way keep to backward compatibility is to force new additions to preserve
old fields. The notion that 'new version number can start fresh' doesn't
really work, because it means duplicated code. In this case, in libbpf.
I don't think we want 'if (elf_version == X) parse sections this way'
type of code. iproute2 already reserved 'classifier' and 'action'
names and I think 'kprobe' and 'socket' are good enough prefixes for
BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_SOCKET_FILTER programs as well.
So the prefix to indicate the program type is already settled.
What comes after the prefix is tbd.
I proposed 'kprobe/perform_write(void*, void*, long long)'
style for vmlinux without debug info and
'kprobe/perform_write+122(file->f_mapping->a_ops, bytes, offset)'
with debug. It looks flexible enough and can be extended with
new features later.
I don't think 'config', 'format' sections are needed.

Subject: [tip:perf/core] perf tools: Set vmlinux_path__nr_entries to 0 in vmlinux_path__exit

Commit-ID: c4f035473d93c1594d8225f6dd97332317820801
Gitweb: http://git.kernel.org/tip/c4f035473d93c1594d8225f6dd97332317820801
Author: Wang Nan <[email protected]>
AuthorDate: Sun, 17 May 2015 10:56:27 +0000
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Mon, 18 May 2015 10:17:39 -0300

perf tools: Set vmlinux_path__nr_entries to 0 in vmlinux_path__exit

Original vmlinux_path__exit() doesn't revert vmlinux_path__nr_entries to
its original state. After the while loop vmlinux_path__nr_entries
becomes -1 instead of 0.

This makes a problem that, if runs twice, during the second run
vmlinux_path__init() will set vmlinux_path[-1] to strdup("vmlinux"),
corrupts random memory.

This patch reset vmlinux_path__nr_entries to 0 after the while loop.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexei Starovoitov <[email protected]
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Zefan Li <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/symbol.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 9ef8b89..82a31fd 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1819,6 +1819,7 @@ static void vmlinux_path__exit(void)
{
while (--vmlinux_path__nr_entries >= 0)
zfree(&vmlinux_path[vmlinux_path__nr_entries]);
+ vmlinux_path__nr_entries = 0;

zfree(&vmlinux_path);
}

Subject: [tip:perf/core] tools lib traceevent: Install libtraceevent.a into libdir

Commit-ID: bb53e176fed9b6c9321904558b4e605b4770e454
Gitweb: http://git.kernel.org/tip/bb53e176fed9b6c9321904558b4e605b4770e454
Author: Wang Nan <[email protected]>
AuthorDate: Sun, 17 May 2015 10:56:28 +0000
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Mon, 18 May 2015 12:36:25 -0300

tools lib traceevent: Install libtraceevent.a into libdir

Before this patch, 'make install' installs libraries into bindir:

$ make install DESTDIR=./tree
INSTALL trace_plugins
INSTALL libtraceevent.a
INSTALL libtraceevent.so
$ find ./tree
./tree/
./tree/usr
./tree/usr/local
./tree/usr/local/bin
./tree/usr/local/bin/libtraceevent.a
./tree/usr/local/bin/libtraceevent.so
...

/usr/local/lib( or lib64) should be a better place.

This patch replaces 'bin' with libdir. For __LP64__ building, libraries
are installed to /usr/local/lib64. For other building, to
/usr/local/lib instead.

After applying this patch:

$ make install DESTDIR=./tree
INSTALL trace_plugins
INSTALL libtraceevent.a
INSTALL libtraceevent.so
$ find ./tree
./tree
./tree/usr
./tree/usr/local
./tree/usr/local/lib64
./tree/usr/local/lib64/libtraceevent.a
./tree/usr/local/lib64/traceevent
./tree/usr/local/lib64/traceevent/plugins
./tree/usr/local/lib64/traceevent/plugins/plugin_mac80211.so
./tree/usr/local/lib64/traceevent/plugins/plugin_hrtimer.so
...
./tree/usr/local/lib64/libtraceevent.so

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Zefan Li <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/lib/traceevent/Makefile | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index d410da3..8464039 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -34,9 +34,15 @@ INSTALL = install
DESTDIR ?=
DESTDIR_SQ = '$(subst ','\'',$(DESTDIR))'

+LP64 := $(shell echo __LP64__ | ${CC} ${CFLAGS} -E -x c - | tail -n 1)
+ifeq ($(LP64), 1)
+ libdir_relative = lib64
+else
+ libdir_relative = lib
+endif
+
prefix ?= /usr/local
-bindir_relative = bin
-bindir = $(prefix)/$(bindir_relative)
+libdir = $(prefix)/$(libdir_relative)
man_dir = $(prefix)/share/man
man_dir_SQ = '$(subst ','\'',$(man_dir))'

@@ -58,7 +64,7 @@ ifeq ($(prefix),$(HOME))
override plugin_dir = $(HOME)/.traceevent/plugins
set_plugin_dir := 0
else
-override plugin_dir = $(prefix)/lib/traceevent/plugins
+override plugin_dir = $(libdir)/traceevent/plugins
endif
endif

@@ -85,11 +91,11 @@ srctree := $(patsubst %/,%,$(dir $(srctree)))
#$(info Determined 'srctree' to be $(srctree))
endif

-export prefix bindir src obj
+export prefix libdir src obj

# Shell quotes
-bindir_SQ = $(subst ','\'',$(bindir))
-bindir_relative_SQ = $(subst ','\'',$(bindir_relative))
+libdir_SQ = $(subst ','\'',$(libdir))
+libdir_relative_SQ = $(subst ','\'',$(libdir_relative))
plugin_dir_SQ = $(subst ','\'',$(plugin_dir))

LIB_FILE = libtraceevent.a libtraceevent.so
@@ -240,7 +246,7 @@ endef

install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
- $(call do_install,$(LIB_FILE),$(bindir_SQ))
+ $(call do_install,$(LIB_FILE),$(libdir_SQ))

install_plugins: $(PLUGINS)
$(call QUIET_INSTALL, trace_plugins) \

Subject: [tip:perf/core] tools build: Change FEATURE_TESTS and FEATURE_DISPLAY to weak binding

Commit-ID: 8135c8c750cf018cd43bf955117529467ba178db
Gitweb: http://git.kernel.org/tip/8135c8c750cf018cd43bf955117529467ba178db
Author: Wang Nan <[email protected]>
AuthorDate: Sun, 17 May 2015 10:56:29 +0000
Committer: Arnaldo Carvalho de Melo <[email protected]>
CommitDate: Mon, 18 May 2015 12:36:46 -0300

tools build: Change FEATURE_TESTS and FEATURE_DISPLAY to weak binding

Replace strong binding of FEATURE_TESTS and FEATURE_DISPLAY by weak
binding. This patch enables other makefiles which include
tools/build/Makefile.feature enable only limited feathres to check.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Zefan Li <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/build/Makefile.feature | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 3a0b0ca..2975632 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -27,7 +27,7 @@ endef
# the rule that uses them - an example for that is the 'bionic'
# feature check. ]
#
-FEATURE_TESTS = \
+FEATURE_TESTS ?= \
backtrace \
dwarf \
fortify-source \
@@ -53,7 +53,7 @@ FEATURE_TESTS = \
zlib \
lzma

-FEATURE_DISPLAY = \
+FEATURE_DISPLAY ?= \
dwarf \
glibc \
gtk2 \

2015-05-21 00:27:54

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools



在 2015/5/20 13:24, Alexei Starovoitov 写道:
> On 5/19/15 8:48 PM, Wangnan (F) wrote:
>>
>>>> +
>>>> +# Version of eBPF elf file
>>>> +FILE_VERSION = 1
>>>
>>> what that comment suppose to mean?
>>
>> The format of eBPF objects can be improved in futher. A version number
>> here is the precaution of backward compatibility. However this patch
>> doesn't
>> utilize it.
>>
>> I'd like to append a 'format' section into eBPF object format to let
>> libbpf know
>> the version of the object. What do you think?
>
> I don't think it will help.
> Version number is quite inconvenient.
> perf_event_attr and bpf_attr are using 'size' instead, since the only
> way keep to backward compatibility is to force new additions to preserve
> old fields. The notion that 'new version number can start fresh' doesn't
> really work, because it means duplicated code. In this case, in libbpf.
> I don't think we want 'if (elf_version == X) parse sections this way'
> type of code.


> iproute2 already reserved 'classifier' and 'action'
> names and I think 'kprobe' and 'socket' are good enough prefixes for
> BPF_PROG_TYPE_KPROBE and BPF_PROG_TYPE_SOCKET_FILTER programs as well.

Do you think we should classify kprobe/socket programs in libbpf layer
instead of perf?

In my current implementation, type of a program is determined by perf by
parsing names of
corresponding sections. Format of section names should be part of
interface between perf
(and iproute2 and others) and eBPF programs. What libbpf should do is to
fetch those
names and give them to caller, let caller decide the type of programs.
Therefore, if
the program is written for perf, writer of program even don't need to
think about
kprobe/socket program type since in perf currently we can only use
kprobe program.

(Currently "config" section is special, but as you suggested I decide to
drop it in next version.)

> So the prefix to indicate the program type is already settled.
> What comes after the prefix is tbd.
> I proposed 'kprobe/perform_write(void*, void*, long long)'
> style for vmlinux without debug info and
> 'kprobe/perform_write+122(file->f_mapping->a_ops, bytes, offset)'
> with debug. It looks flexible enough and can be extended with
> new features later.

Now section name has similary format as 'perf probe' because I think by
doing this we can avoid
inventing a new syntax for eBPF programs. What do you think?

> I don't think 'config', 'format' sections are needed.
>

You won't see them in v4 :)

Thank you.

2015-05-21 07:12:01

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools



在 2015/5/19 1:35, Alexei Starovoitov 写道:
> On 5/17/15 3:56 AM, Wang Nan wrote:
>> This is the first patch of libbpf. The goal of libbpf is to create a
>> standard way for accessing eBPF object files. This patch creates
>> Makefile and Build for it, allows 'make' to build libbpf.a and
>> libbpf.so, 'make install' to put them into proper directories.
>> Most part of Makefile is borrowed from traceevent. Before building,
>> it checks the existance of libelf in Makefile, and deny to build if
>> not found. Instead of throw an error if libelf not found, the error
>> raises in a phony target "elfdep". This design is to ensure
>> 'make clean' still workable even if libelf is not found.
>>
>> Signed-off-by: Wang Nan <[email protected]>
>> ---
> ...
>> +
>> +# Version of eBPF elf file
>> +FILE_VERSION = 1
>
> what that comment suppose to mean?
>
>> +# Makefiles suck: This macro sets a default value of $(2) for the
>> +# variable named by $(1), unless the variable has been set by
>> +# environment or command line. This is necessary for CC and AR
>> +# because make sets default values, so the simpler ?= approach
>> +# won't work as expected.
>
> what this for? copy-paste?
>
>> +# Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
>> +$(call allow-override,CC,$(CROSS_COMPILE)gcc)
>> +$(call allow-override,AR,$(CROSS_COMPILE)ar)
>
> was cross-compile tested or just copy-pasted?
>

Cross compiling is tested on aarch64.

>> +EXT = -std=gnu99
>
> I guess it was copy-pasted from libtraceevent, but please double
> check that it's actually needed.
>

Will remove.

>> +# Append required CFLAGS
>> +override CFLAGS += -fPIC
>> +override CFLAGS += $(INCLUDES)
>> +override CFLAGS += -D_GNU_SOURCE
>
> _GNU_SOURCE actually needed?
> Please sanitize the file.
>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> new file mode 100644
>> index 0000000..bebe99a
>> --- /dev/null
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -0,0 +1,15 @@
>> +/*
>> + * common eBPF ELF loading operations.
>> + *
>> + * Copyright (C) 2015, Wang Nan <[email protected]>
>> + * Copyright (C) 2015, Huawei Inc.
>
> since it borrows heavily from samples/bpf/bpf_load.c, libbpf.h
> would be nice if you mention the source and/or copyright in the header.
>

Will change. Could you please provide copyright information so I can add
it here?

>> + *
>> + * Released under the GPL v2. (and only v2, not any later version)
>> + */
>
> are you sure about this restriction? libtracevent is LGPL, for example.
>

Will change.

Thank you.

2015-05-21 17:53:25

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 06/37] bpf tools: Introduce 'bpf' library to tools

On 5/20/15 5:24 PM, Wangnan (F) wrote:
>
>
> Do you think we should classify kprobe/socket programs in libbpf layer
> instead of perf?
>
> In my current implementation, type of a program is determined by perf by
> parsing names of
> corresponding sections. Format of section names should be part of
> interface between perf
> (and iproute2 and others) and eBPF programs. What libbpf should do is to
> fetch those
> names and give them to caller, let caller decide the type of programs.
> Therefore, if
> the program is written for perf, writer of program even don't need to
> think about
> kprobe/socket program type since in perf currently we can only use
> kprobe program.

if whole section name string is passed to perf/iproute2 then it's fine,
but libbpf already parses 'map', 'license', 'version' section names.
Are you saying everything else will be passed to perf directly?
perf will parse section string, generate extra prologue insns based on
debug info found in vmlinux and call into library back to load them?
Sounds ok-ish. Let's see how code looks.

2015-05-22 17:22:42

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

On Sun, May 17, 2015 at 10:56:34AM +0000, Wang Nan wrote:

SNIP

> +
> +void bpf_close_object(struct bpf_object *obj)
> +{
> + if (!obj)
> + return;
> +
> + bpf_obj_clear_elf(obj);
> +
> + if (obj->path)
> + free(obj->path);

you're safe calling free with NULL, so there's
no need for the condition

jirka

> + free(obj);
> }
> --
> 1.8.3.4
>

2015-05-22 17:23:31

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 08/37] bpf tools: Define basic interface

On Mon, May 18, 2015 at 10:57:36AM -0700, Alexei Starovoitov wrote:
> On 5/17/15 3:56 AM, Wang Nan wrote:
> >bpf_open_object() and bpf_close_object() are open and close function of
> >eBPF object files. 'struct bpf_object' will be handler of one object
> >file. Its internal structure is hide to user.
> >
> >Signed-off-by: Wang Nan <[email protected]>
> >---
> ...
> >+
> >+struct bpf_object *bpf_open_object(const char *path)
> >+{
> >+ return NULL;
> >+}
> >+
> >+void bpf_close_object(struct bpf_object *object)
> >+{
> >+ return 0;
> >+}
>
> I'm not a fan of introducing empty helpers. I would
> squash it with the patch that actually adds meat to it.
>

agreed, also:

[jolsa@krava bpf]$ make
CC libbpf.o
libbpf.c: In function ‘bpf_close_object’:
libbpf.c:68:2: warning: ‘return’ with a value, in function returning void [enabled by default]
return 0;
^
LD libbpf-in.o
LINK libbpf.a
LINK libbpf.so


jirka

2015-05-22 17:23:47

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 21/37] bpf tools: Create eBPF maps defined in an object file

On Sun, May 17, 2015 at 10:56:46AM +0000, Wang Nan wrote:

SNIP

> + i * sizeof(struct bpf_map_def));
> +
> + if (obj->needs_swap) {
> + def.type = bswap_32(def.type);
> + def.key_size = bswap_32(def.key_size);
> + def.value_size = bswap_32(def.value_size);
> + def.max_entries = bswap_32(def.max_entries);
> + }
> +
> + *pfd = bpf_create_map(def.type,
> + def.key_size,
> + def.value_size,
> + def.max_entries);
> + if (*pfd < 0) {
> + size_t j;
> + int err = *pfd;
> +
> + pr_warning("failed to create map: %s\n",
> + strerror(errno));
> + for (j = 0; j < i; j++) {
> + close(obj->maps_fds[j]);
> + obj->maps_fds[j] = -1;
> + }
> + free(obj->maps_fds);
> + obj->maps_fds = NULL;

the close/free code above could go into separate function
and be used in bpf_unload_object

jirka

2015-05-22 17:24:34

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

On Sun, May 17, 2015 at 10:56:34AM +0000, Wang Nan wrote:

SNIP

> + obj->path);
> + err = -EINVAL;
> + goto errout;
> + }
> + ep = &obj->elf.ehdr;
> +
> + if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {
> + pr_warning("%s is not an eBPF object file\n",
> + obj->path);
> + err = -EINVAL;
> + goto errout;
> + }
> +
> return 0;
> +errout:
> + bpf_obj_clear_elf(obj);
> + return err;
> +}
> +
> +struct bpf_object *bpf_open_object(const char *path)
> +{

another suggestion for the namespace.. Arnaldo forces us ;-)
to use the object name first plus '__(method name)' for
interface functions so that would be:

bpf_object__open
bpf_object__close

not sure we want to keep that standard in here though.. Arnaldo?

jirka


SNIP

> +void bpf_close_object(struct bpf_object *obj)
> +{
> + if (!obj)
> + return;
> +
> + bpf_obj_clear_elf(obj);
> +
> + if (obj->path)
> + free(obj->path);
> + free(obj);
> }
> --
> 1.8.3.4
>

2015-05-22 17:24:14

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 20/37] bpf tools: Add bpf.c/h for common bpf operations

On Sun, May 17, 2015 at 10:56:45AM +0000, Wang Nan wrote:

SNIP

> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> new file mode 100644
> index 0000000..1e9a53b
> --- /dev/null
> +++ b/tools/lib/bpf/bpf.h
> @@ -0,0 +1,17 @@
> +/*
> + * common eBPF ELF operations.
> + *
> + * Copyright (C) 2015, Wang Nan <[email protected]>
> + * Copyright (C) 2015, Huawei Inc.
> + *
> + * Released under the GPL v2. (and only v2, not any later version)
> + */
> +#ifndef __LIBBPF_BPF_H
> +#define __LIBBPF_BPF_H

libbpf.h uses __BPF_LIBBPF_H ... you might want to come
with some common pattern for these defines

jirka

2015-05-22 17:24:29

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 28/37] perf tools: Make perf depend on libbpf

On Sun, May 17, 2015 at 10:56:53AM +0000, Wang Nan wrote:

SNIP

> else
> TE_PATH=$(TRACE_EVENT_DIR)
> LIB_PATH=$(LIB_DIR)
> + BPF_PATH=$(BPF_DIR)
> endif
>
> LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
> @@ -174,6 +177,9 @@ export LIBTRACEEVENT
> LIBAPI = $(LIB_PATH)libapi.a
> export LIBAPI
>
> +LIBBPF = $(BPF_PATH)/libbpf.a
> +export LIBBPF

hum, why does LIBBPF need to be exported?

if it's to match the LIBAPI, it's exported because
of its usage in the util/setup.py, which is not the
case for LIBBPF

jirka

2015-05-22 17:24:47

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 30/37] perf bpf: Add bpf-loader and open ELF object files

On Sun, May 17, 2015 at 10:56:55AM +0000, Wang Nan wrote:

SNIP

> +#define DEFINE_PRINT_FN(name, level) \
> +static int libbpf_##name(const char *fmt, ...) \
> +{ \
> + va_list args; \
> + int ret; \
> + \
> + va_start(args, fmt); \
> + ret = veprintf(level, verbose, pr_fmt(fmt), args);\
> + va_end(args); \
> + return ret; \
> +}
> +
> +DEFINE_PRINT_FN(warning, 0)
> +DEFINE_PRINT_FN(info, 0)
> +DEFINE_PRINT_FN(debug, 1)
> +
> +static bool libbpf_inited = false;
> +
> +#define MAX_OBJECTS 128
> +
> +struct {
> + struct bpf_object *objects[MAX_OBJECTS];
> + size_t nr_objects;
> +} params;

apart from that we dont like this kind of static stuff, this seems like
nice case for having simple handler like 'struct bpf_objects' carrying
the above data.. what do I miss?

also params should actually be static right?

jirka

2015-05-22 17:25:03

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 32/37] perf bpf: Parse probe points of eBPF programs during preparation

On Sun, May 17, 2015 at 10:56:57AM +0000, Wang Nan wrote:

SNIP

> + if (pev)
> + clear_perf_probe_event(pev);
> + return err;
> +}
> +
> int bpf_prepare_load(const char *filename)
> {
> struct bpf_object *obj;
> @@ -81,6 +150,8 @@ int bpf_prepare_load(const char *filename)
> return -EINVAL;
> }
> pr_debug("bpf: add program '%s'\n", title);
> +
> + bpf_do_config(params.nr_progs - 1, title);

shouldn't you check&pass return value of bpf_do_config ?

jirka

2015-05-23 01:01:06

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

On 5/22/15 10:23 AM, Jiri Olsa wrote:
>> +
>> +struct bpf_object *bpf_open_object(const char *path)
>> +{
>
> another suggestion for the namespace.. Arnaldo forces us ;-)
> to use the object name first plus '__(method name)' for
> interface functions so that would be:
>
> bpf_object__open
> bpf_object__close
>
> not sure we want to keep that standard in here though.. Arnaldo?

have been thinking back and forth on this one.
Finally convinced myself that we shouldn't be forcing it here.
object__method style would force the library to look like fake
object oriented whereas it's not. It's a normal C. Let's keep it
simple. Objects are not needed here. May be 'bpf_object' is an
unfortunate name, but it doesn't make the library to be 'ooo'.
libtraceevent doesn't use this style either...

2015-05-25 07:40:05

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 18/37] bpf tools: Record map accessing instructions for each program



On 2015/5/19 2:34, Alexei Starovoitov wrote:
> On 5/17/15 3:56 AM, Wang Nan wrote:
>> This patch records the indics of instructions which are needed to be
>> relocated. Those information are saved in 'reloc_desc' field in
>> 'struct bpf_program'. In loading phase (this patch takes effect in
>> opening phase), the collected instructions will be replaced by
>> map loading instructions.
>>
>> Since we are going to close the ELF file and clear all data at the end
>> of 'opening' phase, ELF information will no longer be valid in
>> 'loading' phase. We have to locate the instructions before maps are
>> loaded, instead of directly modifying the instruction.
>>
>> 'struct bpf_map_def' is introduce in this patch to let us know how many
>> maps defined in the object.
>>
>> Signed-off-by: Wang Nan <[email protected]>
> ...
>> +/*
>> + * packed attribute is unnecessary for 'bpf_map_def'.
>> + */
>> +struct bpf_map_def {
>> + unsigned int type;
>> + unsigned int key_size;
>> + unsigned int value_size;
>> + unsigned int max_entries;
>> +};
>
> the comment looks out of place. 'packed' is necessary somewhere else?
> What were you concerned about here?
>

I see a warning message:

make: Entering directory `/home/w00229757/kernel-hydrogen/tools/lib/bpf'
CC libbpf.o
In file included from libbpf.c:23:0:
libbpf.h:31:1: error: packed attribute is unnecessary for
‘bpf_map_def’ [-Werror=packed]
} __attribute__((packed));
^
cc1: all warnings being treated as errors

if I append __attribute__((packed)) here. This is because
Makefile.include in tools/script
as '-Wpacked' warning message and it is inherited by perf. When
compiling with perf this
problem breaks compiling. Therefore I don't use 'packed' here. However I
think there should
be one. This comment is for that.









2015-05-25 09:26:20

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 21/37] bpf tools: Create eBPF maps defined in an object file



On 2015/5/19 2:48, Alexei Starovoitov wrote:
> On 5/17/15 3:56 AM, Wang Nan wrote:
>> This patch creates maps based on 'map' section in object file using
>> bpf_create_map(), and store the fds into an array in
>> 'struct bpf_object'. Since the byte order of the object may differ
>> from the host, swap map definition before processing.
>>
>> This is the first patch in 'loading' phase. Previous patches parse ELF
>> object file and create needed data structure, but doesnnt play with
>> kernel. They belong to 'opening' phase.
>>
>> Signed-off-by: Wang Nan <[email protected]>
> ...
>> + for (j = 0; j < i; j++) {
>> + close(obj->maps_fds[j]);
>> + obj->maps_fds[j] = -1;
>
> this line is unnecessary, since you're freeing the whole array
> at the line below:
>
>> + }
>> + free(obj->maps_fds);
>> + obj->maps_fds = NULL;
>
> ...
>
>> void bpf_close_object(struct bpf_object *obj)
>> {
>> if (!obj)
>> return;
>>
>> bpf_obj_clear_elf(obj);
>> + bpf_unload_object(obj);
>
> just realized that this API keeps elf file open for the whole
> life time. I think that should be split up.
> bpf_open_object() can do elf parsing, creation of maps and
> bpf loading.
> bpf_close_object() can unload programs, maps. That's fine,
> but closing of elf file and freeing memory associated
> with parsing sections should be a separate call.
>

Looks like I didn't describe the API of libbpf very clearly in patch 0/37.

There are two phases of the API: opening and loading. Opening phase
does ELF parsing, and records all related information info 'struct
bpf_object'.
After that, ELF file itself is closed by bpf_obj_elf_finish(). See patch
19/37.
After that, caller will have a chance to adjust programs and maps
(currently not
implemented). Caller should call bpf_load_object() to do real useful
things with
kernel. bpf_unload_object() remove map and programs in the object from
kernel. It
also calls bpf_obj_elf_finish(). However, if it has been called once,
the 2nd call
doesn't take effect at all. Just in case.

Thank you.

2015-05-25 11:47:48

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 30/37] perf bpf: Add bpf-loader and open ELF object files



On 2015/5/23 1:24, Jiri Olsa wrote:
> On Sun, May 17, 2015 at 10:56:55AM +0000, Wang Nan wrote:
>
> SNIP
>
>> +#define DEFINE_PRINT_FN(name, level) \
>> +static int libbpf_##name(const char *fmt, ...) \
>> +{ \
>> + va_list args; \
>> + int ret; \
>> + \
>> + va_start(args, fmt); \
>> + ret = veprintf(level, verbose, pr_fmt(fmt), args);\
>> + va_end(args); \
>> + return ret; \
>> +}
>> +
>> +DEFINE_PRINT_FN(warning, 0)
>> +DEFINE_PRINT_FN(info, 0)
>> +DEFINE_PRINT_FN(debug, 1)
>> +
>> +static bool libbpf_inited = false;
>> +
>> +#define MAX_OBJECTS 128
>> +
>> +struct {
>> + struct bpf_object *objects[MAX_OBJECTS];
>> + size_t nr_objects;
>> +} params;
> apart from that we dont like this kind of static stuff, this seems like
> nice case for having simple handler like 'struct bpf_objects' carrying
> the above data.. what do I miss?

I want to avoid fragmented memory allocation for storing bpf_object
pointers.
Storing them together into an array can make code simpler. I think I can
made
something like 'struct bpf_object *bpf_next_object(obj)' in libbpf so we can
iterate over each loaded bpf objects, then this array and nr_objects can be
hidden.

> also params should actually be static right?
>
> jirka

2015-05-25 13:00:24

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 30/37] perf bpf: Add bpf-loader and open ELF object files

> On Sun, May 17, 2015 at 10:56:55AM +0000, Wang Nan wrote:

SNIP

> +static bool libbpf_inited = false;

libbpf_initialized?

[acme@zoo linux]$ find . -name "*.[ch]" | xargs grep initialized | wc -l
5134
[acme@zoo linux]$ find . -name "*.[ch]" | xargs grep inited | wc -l
226

- Arnaldo

2015-05-25 13:30:49

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

Em Fri, May 22, 2015 at 06:00:58PM -0700, Alexei Starovoitov escreveu:
> On 5/22/15 10:23 AM, Jiri Olsa wrote:
> >>+struct bpf_object *bpf_open_object(const char *path)

> >another suggestion for the namespace.. Arnaldo forces us ;-)
> >to use the object name first plus '__(method name)' for
> >interface functions so that would be:

> > bpf_object__open
> > bpf_object__close

> >not sure we want to keep that standard in here though.. Arnaldo?

> have been thinking back and forth on this one.
> Finally convinced myself that we shouldn't be forcing it here.
> object__method style would force the library to look like fake
> object oriented whereas it's not. It's a normal C. Let's keep it

Why "fake"? Just because C doesn't have explicit support for OO doesn't
mean we can't use the concept of OO with structs and functions :-)

> simple. Objects are not needed here. May be 'bpf_object' is an
> unfortunate name, but it doesn't make the library to be 'ooo'.

Well, I don't think that what leads one to think about using some
convention was because "object" was in its name, but because OO _is_
being used in this case, albeit a restricted set, the one possible while
using C.

For instance, in this patch:

struct bpf_object {
/*
* Information when doing elf related work. Only valid if fd
* is valid.
*/
struct {
int fd;
Elf *elf;
GElf_Ehdr ehdr;
} elf;
char path[]; /* Changed from being a pointer to here, to avoid one alloc */
};

static struct bpf_object *__bpf_obj_alloc(const char *path)
{
struct bpf_object *obj;

obj = calloc(1, sizeof(struct bpf_object));
if (!obj) {
pr_warning("alloc memory failed for %s\n", path);
return NULL;
}

obj->path = strdup(path);
if (!obj->path) {
pr_warning("failed to strdup '%s'\n", path);
free(obj);
return NULL;
}
return obj;
}

The above is for me naturally a constructor, in the restricted OO
possible with C used in tools/perf (or anywhere else :)), and thus we
have a convention for this, short one, struct being instantiated + __ +
new.

struct bpf_object *bpf_object__new(const char *path)
{
struct bpf_object *obj = zalloc(sizeof(*obj) + strlen(path) + 1);

if (obj) {
strcpy(obj->path, path);
obj->elf.fd = -1;
}

return obj;
}

---

If it doesn't allocates, i.e. it is embedded in another struct, then, we
have struct being initiated + __ + init, and so on for __delete +
__exit, etc.

Just a convention, that we (or at least I) try to follow as judiciously
as possible in tools/perf/.

Like others in the kernel, like using, hey, "__" in front of functions
to state that they do slightly less than the a function with the same
name, normally locking is done on foo() that calls __foo() to do the
unlocked part.

It avoids ambiguity as what is the struct being acted upon by the
function, since we use _ to separate words in the struct name
(bpf_object, perf_evlist, etc) and in the function name (findnew_thread,
process_events, etc), helps with grepping the source code base, etc.

> libtraceevent doesn't use this style either...

Well, there are many styles to pick, the fact that perf uses __ to
separate class name from class method doesn't mean that you should as
well, as you may find it inconvenient or useless to you, you may prefer
CamelCase notation, for instance ;-)

In the same fashion the fact that libtraceevent doesn't doesn't mean you
shouldn't use what the perf tooling uses.

- Arnaldo

2015-05-26 00:06:19

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation



On 2015/5/25 21:30, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 22, 2015 at 06:00:58PM -0700, Alexei Starovoitov escreveu:
>> On 5/22/15 10:23 AM, Jiri Olsa wrote:
>>>> +struct bpf_object *bpf_open_object(const char *path)
>>> another suggestion for the namespace.. Arnaldo forces us ;-)
>>> to use the object name first plus '__(method name)' for
>>> interface functions so that would be:
>>> bpf_object__open
>>> bpf_object__close
>>> not sure we want to keep that standard in here though.. Arnaldo?
>
>> have been thinking back and forth on this one.
>> Finally convinced myself that we shouldn't be forcing it here.
>> object__method style would force the library to look like fake
>> object oriented whereas it's not. It's a normal C. Let's keep it
> Why "fake"? Just because C doesn't have explicit support for OO doesn't
> mean we can't use the concept of OO with structs and functions :-)
>
>> simple. Objects are not needed here. May be 'bpf_object' is an
>> unfortunate name, but it doesn't make the library to be 'ooo'.
> Well, I don't think that what leads one to think about using some
> convention was because "object" was in its name, but because OO _is_
> being used in this case, albeit a restricted set, the one possible while
> using C.
>
> For instance, in this patch:
>
> struct bpf_object {
> /*
> * Information when doing elf related work. Only valid if fd
> * is valid.
> */
> struct {
> int fd;
> Elf *elf;
> GElf_Ehdr ehdr;
> } elf;
> char path[]; /* Changed from being a pointer to here, to avoid one alloc */
> };
>
> static struct bpf_object *__bpf_obj_alloc(const char *path)
> {
> struct bpf_object *obj;
>
> obj = calloc(1, sizeof(struct bpf_object));
> if (!obj) {
> pr_warning("alloc memory failed for %s\n", path);
> return NULL;
> }
>
> obj->path = strdup(path);
> if (!obj->path) {
> pr_warning("failed to strdup '%s'\n", path);
> free(obj);
> return NULL;
> }
> return obj;
> }
>
> The above is for me naturally a constructor, in the restricted OO
> possible with C used in tools/perf (or anywhere else :)), and thus we
> have a convention for this, short one, struct being instantiated + __ +
> new.
>
> struct bpf_object *bpf_object__new(const char *path)
> {
> struct bpf_object *obj = zalloc(sizeof(*obj) + strlen(path) + 1);
>
> if (obj) {
> strcpy(obj->path, path);
> obj->elf.fd = -1;
> }
>
> return obj;
> }
>
> ---
>
> If it doesn't allocates, i.e. it is embedded in another struct, then, we
> have struct being initiated + __ + init, and so on for __delete +
> __exit, etc.
>
> Just a convention, that we (or at least I) try to follow as judiciously
> as possible in tools/perf/.
>
> Like others in the kernel, like using, hey, "__" in front of functions
> to state that they do slightly less than the a function with the same
> name, normally locking is done on foo() that calls __foo() to do the
> unlocked part.
>
> It avoids ambiguity as what is the struct being acted upon by the
> function, since we use _ to separate words in the struct name
> (bpf_object, perf_evlist, etc) and in the function name (findnew_thread,
> process_events, etc), helps with grepping the source code base, etc.
>
>> libtraceevent doesn't use this style either...
> Well, there are many styles to pick, the fact that perf uses __ to
> separate class name from class method doesn't mean that you should as
> well, as you may find it inconvenient or useless to you, you may prefer
> CamelCase notation, for instance ;-)
>
> In the same fashion the fact that libtraceevent doesn't doesn't mean you
> shouldn't use what the perf tooling uses.
>
> - Arnaldo
I'll try OO style naming and see the results in my next version.
However, I'm
not very sure whether such naming make sense, since we have only 2
classes in
libbpf: 'bpf_object, bpf_program', 'bpf_map' has potential to become a
class but
currently not. In addition, the internal functions are hidden to user,
so the only
meaning to user of such API is an additional '_' in each function.

Anyway, let's see the result then decide whether it is good enough.

Thank you.

2015-05-26 00:41:21

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [RFC PATCH v3 09/37] bpf tools: Open eBPF object file and do basic validation

Em Tue, May 26, 2015 at 08:05:28AM +0800, Wangnan (F) escreveu:
> On 2015/5/25 21:30, Arnaldo Carvalho de Melo wrote:
> >Well, there are many styles to pick, the fact that perf uses __ to
> >separate class name from class method doesn't mean that you should as
> >well, as you may find it inconvenient or useless to you, you may prefer
> >CamelCase notation, for instance ;-)

> >In the same fashion the fact that libtraceevent doesn't doesn't mean you
> >shouldn't use what the perf tooling uses.

> I'll try OO style naming and see the results in my next version.

> However, I'm not very sure whether such naming make sense, since we
> have only 2 classes in libbpf: 'bpf_object, bpf_program', 'bpf_map'
> has potential to become a class but currently not.

> In addition, the internal functions are hidden to user, so the only
> meaning to user of such API is an additional '_' in each function.

> Anyway, let's see the result then decide whether it is good enough.

Well, perhaps the number of classes in libbpf is really small, haven't
looked in detail, but over time, if perf's track record is to be taken
into accound, many more may appear :-)

Time permitting I'll try to look at the current codebase again, to try
to provide more comments,

Thanks,

- Arnaldo

2015-05-26 06:24:17

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH v3 00/37] perf tools: introduce 'perf bpf' command to load eBPF programs.



On 2015/5/20 4:46, Alexei Starovoitov wrote:
> On 5/19/15 9:40 AM, Arnaldo Carvalho de Melo wrote:
>> Em Wed, May 20, 2015 at 01:04:48AM +0900, Namhyung Kim escreveu:
>>> On Tue, May 19, 2015 at 10:44:58AM -0300, Arnaldo Carvalho de Melo
>>> wrote:
>>>> Em Mon, May 18, 2015 at 02:45:58PM -0700, Alexei Starovoitov escreveu:
>>>>> On 5/18/15 2:20 PM, Arnaldo Carvalho de Melo wrote:
>>>>>> perf record --event bpf_thing.o
>>>>
>>>>>> Looks more natural then, as it is an event that will take place
>>>>>> when the
>>>>>> filter returns true, and in addition to that, it will come with a
>>>>>> bunch
>>>>>> of variables, etc.
>>>>>
>>>>> well, I think --event fits a bit less than --filter ;)
>>>>> Both not ideal.
>>>>
>>>> I thought --event was more suited, as it states what is the event, and
>>>> when it should happen, i.e. --filter is about reducing the
>>>> frequency of
>>>> something well defined, i.e. an existing --event.
>>>
>>> If we go with 'perf record' rather than 'perf bpf record', I agree
>>> that --event option is more natural than --filter. The --event option
>>> says that it will record - or enable, at least - a (kprobe) event for
>>> bpf programs in it and then do something with it. :)
>>>
>>> Maybe something like this?
>>>
>>> perf record --event bpf:/path/to/object
>>
>> The syntax maybe one of many, say if it sees a ".o" suffix in the even
>> name, look if the provided event name is a file and if this file has the
>> ELF header, whatever.
>
> agree. 'bpf:' prefix is redundant.
> To me the following syntax is fine:
> perf record --event bpf_file.o
>
> In the future it can support automatically:
> perf record --event bpf_file.c
>
> Wang, thoughts?
>

After reading the full thread I think we finally agree that the user
interface should be

perf record --event bpf_file.o

for event filtering case. I'll do this in next version.

Thank you.

>>> Oh, this looks like an interesting approach.. are you saying something
>>> like below?
>>
>> No, those are way too many steps :-)
>>
>> What 'perf script' does? Right now you can ask for a script to run and
>> it will both start 'perf record' with the proper events, and then
>> "immediately" consume it, piping the output of the 'record' "script" to
>> the consumer, that is 'perf script' itself running an interpreter, perl
>> or python.
>
> if you're proposing to do something like:
> perf script bpf_file.c
> that will do event creation, filtering, aggregation, reporting
> and printing results, then it's fine.
> This is pretty much what I thought 'perf bpf run' will do.
>
>> I.e. the first part, say, failed-syscalls-record, would be done
>> internally, loading the bpf object, etc, the second part would be the
>> event massaging, but done in a C subset :-)
>
> not sure that's doable. The task (perf) that loaded the program in the
> step one should be still alive when 2nd part is running.
> For 'perf bpf run' use case, the whole record/report split is
> artificial. There is no need for perf.data.
> In my mind 'perf bpf run file.c' suppose to quickly compile .c,
> hook into kernel, collect and print something, then Ctrl-C of
> 'perf bpf run' should automatically stop everything.
> No perf.data anywhere. It should be quick single step
> tool for live debugging.
> At the same time 'perf record --event bpf_file.o' plus generic
> 'perf report' model works very well as well.
> They are different use cases.
>
> I guess I'm saying let's not get too much ahead of ourselves ;)
> I think Wang's current patchset is close enough to make
> 'perf record --event bpf_file.o' to be useful.
> Let's take this first step.
>