2015-08-28 07:06:50

by Wang Nan

Subject: [GIT PULL 00/32] perf tools: filtering events using eBPF programs

Hi Arnaldo,

This time I have adjusted the Cc and Link fields in each patch.

Four new patches (1, 2, 3 and 12/32) are introduced to fix a bug
related to the '--filter' option. Patch 06/32 is also modified. Please
keep an eye on them.

Since Steven has not been very responsive, I still keep patches 31 and
32 in my tree.

The following changes since commit 327938ee9ea16cc03f02f0e9cc74cdc6ac704cc6:

tools lib traceeveent: Allow for negative numbers in print format (2015-08-27 11:13:53 -0300)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux tags/perf-ebpf-for-acme-20150828

for you to fetch changes up to 430fc318354a17dddbc66ebb0964a0831f643c77:

bpf: Introduce function for outputing data to perf event (2015-08-28 06:24:43 +0000)

----------------------------------------------------------------
perf BPF related improvements and bug fixes:

- Call bpf__clear() in the 'record', 'top', 'stat' and 'trace' commands
to ensure BPF related resources are released.

- Correct the --filter option when trying to apply a filter to a BPF object.

Signed-off-by: Wang Nan <[email protected]>

----------------------------------------------------------------
He Kuang (5):
perf tools: Move linux/filter.h to tools/include
perf tools: Introduce arch_get_reg_info() for x86
perf record: Support custom vmlinux path
tools lib traceevent: Support function __get_dynamic_array_len
bpf: Introduce function for outputing data to perf event

Wang Nan (27):
bpf tools: New API to get name from a BPF object
perf tools: Don't set cmdline_group_boundary if no evsel is collected
perf tools: Introduce dummy evsel
perf tools: Make perf depend on libbpf
perf ebpf: Add the libbpf glue
perf tools: Enable passing bpf object file to --event
perf probe: Attach trace_probe_event with perf_probe_event
perf record, bpf: Parse and probe eBPF programs probe points
perf bpf: Collect 'struct perf_probe_event' for bpf_program
perf record: Load all eBPF object into kernel
perf tools: Add bpf_fd field to evsel and config it
perf tools: Allow filter option to be applied to bpf object
perf tools: Attach eBPF program to perf event
perf tools: Suppress probing messages when probing by BPF loading
perf record: Add clang options for compiling BPF scripts
perf tools: Infrastructure for compiling scriptlets when passing '.c' to --event
perf tests: Enforce LLVM test for BPF test
perf test: Add 'perf test BPF'
bpf tools: Load a program with different instances using preprocessor
perf tools: Fix probe-event.h include
perf probe: Reset args and nargs for probe_trace_event when failure
perf tools: Add BPF_PROLOGUE config options for further patches
perf tools: Add prologue for BPF programs for fetching arguments
perf tools: Generate prologue for BPF programs
perf tools: Use same BPF program if arguments are identical
perf probe: Init symbol as kprobe
perf tools: Support attach BPF program on uprobe events

include/trace/events/bpf.h | 30 +
include/uapi/linux/bpf.h | 7 +
kernel/trace/bpf_trace.c | 23 +
samples/bpf/bpf_helpers.h | 2 +
tools/build/Makefile.feature | 6 +-
tools/include/linux/filter.h | 237 +++++++
tools/lib/bpf/libbpf.c | 168 ++++-
tools/lib/bpf/libbpf.h | 26 +-
tools/lib/traceevent/event-parse.c | 56 +-
tools/lib/traceevent/event-parse.h | 1 +
tools/perf/MANIFEST | 4 +
tools/perf/Makefile.perf | 19 +-
tools/perf/arch/x86/Makefile | 1 +
tools/perf/arch/x86/util/Build | 2 +
tools/perf/arch/x86/util/dwarf-regs.c | 104 ++-
tools/perf/builtin-probe.c | 4 +-
tools/perf/builtin-record.c | 64 +-
tools/perf/builtin-stat.c | 9 +-
tools/perf/builtin-top.c | 11 +-
tools/perf/builtin-trace.c | 6 +-
tools/perf/config/Makefile | 31 +-
tools/perf/tests/Build | 10 +-
tools/perf/tests/bpf-script-example.c | 44 ++
tools/perf/tests/bpf.c | 170 +++++
tools/perf/tests/builtin-test.c | 12 +
tools/perf/tests/llvm.c | 125 +++-
tools/perf/tests/llvm.h | 15 +
tools/perf/tests/make | 4 +-
tools/perf/tests/tests.h | 3 +
tools/perf/util/Build | 2 +
tools/perf/util/bpf-loader.c | 730 +++++++++++++++++++++
tools/perf/util/bpf-loader.h | 95 +++
tools/perf/util/bpf-prologue.c | 442 +++++++++++++
tools/perf/util/bpf-prologue.h | 34 +
tools/perf/util/evlist.c | 107 +++
tools/perf/util/evlist.h | 2 +
tools/perf/util/evsel.c | 49 ++
tools/perf/util/evsel.h | 7 +
tools/perf/util/include/dwarf-regs.h | 7 +
tools/perf/util/parse-events.c | 73 ++-
tools/perf/util/parse-events.h | 4 +
tools/perf/util/parse-events.l | 6 +
tools/perf/util/parse-events.y | 29 +-
tools/perf/util/probe-event.c | 79 ++-
tools/perf/util/probe-event.h | 8 +-
tools/perf/util/probe-file.c | 5 +-
tools/perf/util/probe-finder.c | 4 +
.../perf/util/scripting-engines/trace-event-perl.c | 1 +
.../util/scripting-engines/trace-event-python.c | 1 +
49 files changed, 2759 insertions(+), 120 deletions(-)
create mode 100644 include/trace/events/bpf.h
create mode 100644 tools/include/linux/filter.h
create mode 100644 tools/perf/tests/bpf-script-example.c
create mode 100644 tools/perf/tests/bpf.c
create mode 100644 tools/perf/tests/llvm.h
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h
create mode 100644 tools/perf/util/bpf-prologue.c
create mode 100644 tools/perf/util/bpf-prologue.h

--
2.1.0


2015-08-28 07:06:57

by Wang Nan

Subject: [PATCH 01/32] bpf tools: New API to get name from a BPF object

Before this patch there was no way to connect a loaded bpf object
to its source file. This makes applying perf's '--filter' to a BPF
object harder, because perf loads all programs together, while the
'--filter' setting is per object.

The API of bpf_object__open_buffer() is changed to allow passing a
name. Fortunately, at this time there is only one user of it (perf
test LLVM), so we update it as well.
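
A minimal caller-side sketch of the changed API (obj_buf and
obj_buf_sz stand for an ELF image already in memory):

  struct bpf_object *obj;

  /* Passing NULL as the name falls back to an address-based name. */
  obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, "test_bpf.o");
  if (obj)
          pr_debug("opened object '%s'\n", bpf_object__get_name(obj));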

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
tools/lib/bpf/libbpf.c | 25 ++++++++++++++++++++++---
tools/lib/bpf/libbpf.h | 4 +++-
tools/perf/tests/llvm.c | 2 +-
3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4fa4bc4..4252fc2 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -880,15 +880,26 @@ struct bpf_object *bpf_object__open(const char *path)
}

struct bpf_object *bpf_object__open_buffer(void *obj_buf,
- size_t obj_buf_sz)
+ size_t obj_buf_sz,
+ const char *name)
{
+ char tmp_name[64];
+
/* param validation */
if (!obj_buf || obj_buf_sz <= 0)
return NULL;

- pr_debug("loading object from buffer\n");
+ if (!name) {
+ snprintf(tmp_name, sizeof(tmp_name), "%lx-%lx",
+ (unsigned long)obj_buf,
+ (unsigned long)obj_buf_sz);
+ tmp_name[sizeof(tmp_name) - 1] = '\0';
+ name = tmp_name;
+ }
+ pr_debug("loading object '%s' from buffer\n",
+ name);

- return __bpf_object__open("[buffer]", obj_buf, obj_buf_sz);
+ return __bpf_object__open(name, obj_buf, obj_buf_sz);
}

int bpf_object__unload(struct bpf_object *obj)
@@ -975,6 +986,14 @@ bpf_object__next(struct bpf_object *prev)
return next;
}

+const char *
+bpf_object__get_name(struct bpf_object *obj)
+{
+ if (!obj)
+ return NULL;
+ return obj->path;
+}
+
struct bpf_program *
bpf_program__next(struct bpf_program *prev, struct bpf_object *obj)
{
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index ea8adc2..f16170c 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -28,12 +28,14 @@ struct bpf_object;

struct bpf_object *bpf_object__open(const char *path);
struct bpf_object *bpf_object__open_buffer(void *obj_buf,
- size_t obj_buf_sz);
+ size_t obj_buf_sz,
+ const char *name);
void bpf_object__close(struct bpf_object *object);

/* Load/unload object into/from kernel */
int bpf_object__load(struct bpf_object *obj);
int bpf_object__unload(struct bpf_object *obj);
+const char *bpf_object__get_name(struct bpf_object *obj);

struct bpf_object *bpf_object__next(struct bpf_object *prev);
#define bpf_object__for_each_safe(pos, tmp) \
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index a337356..52d5597 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -26,7 +26,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
{
struct bpf_object *obj;

- obj = bpf_object__open_buffer(obj_buf, obj_buf_sz);
+ obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, NULL);
if (!obj)
return -1;
bpf_object__close(obj);
--
2.1.0

2015-08-28 07:15:10

by Wang Nan

Subject: [PATCH 02/32] perf tools: Don't set cmdline_group_boundary if no evsel is collected

If parse_events__scanner() collects no entry, perf_evlist__last(evlist)
is invalid, and setting cmdline_group_boundary through it touches
invalid memory.

This can happen in the current BPF implementation, see [1]. Although
that can be fixed, for safety reasons it would be better to introduce
this check.

Instead of checking the number of entries, check data.list, so that a
dummy evsel can be added here later.

[1]: http://lkml.kernel.org/n/[email protected]
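
To illustrate the failure mode (assuming perf_evlist__last() boils
down to list_entry() on evlist->entries.prev):

  /* With an empty list, entries.prev points back at the list head,
   * so 'last' is not a real evsel and the store below writes into
   * invalid memory: */
  struct perf_evsel *last = perf_evlist__last(evlist);
  last->cmdline_group_boundary = true;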

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
tools/perf/util/parse-events.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d826e6f..14cd7e3 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1143,10 +1143,14 @@ int parse_events(struct perf_evlist *evlist, const char *str,
int entries = data.idx - evlist->nr_entries;
struct perf_evsel *last;

+ if (!list_empty(&data.list)) {
+ last = list_entry(data.list.prev,
+ struct perf_evsel, node);
+ last->cmdline_group_boundary = true;
+ }
+
perf_evlist__splice_list_tail(evlist, &data.list, entries);
evlist->nr_groups += data.nr_groups;
- last = perf_evlist__last(evlist);
- last->cmdline_group_boundary = true;

return 0;
}
--
2.1.0

2015-08-28 07:14:34

by Wang Nan

Subject: [PATCH 03/32] perf tools: Introduce dummy evsel

This patch allows linking a dummy evsel onto an evlist as a
placeholder. It is for the following patch, which allows passing a BPF
object using '--event object.o'.

Unlike other event selectors, when a BPF object file is passed to
'--event', nothing is linked onto the evlist at parse time. Instead,
events described in the BPF object file are probed and linked in a
delayed manner, because we want to do all the probing work together.
Therefore, evsels for events in a BPF object would be linked at the
end of the evlist, which causes a small problem: if a '--filter'
setting is passed after the object file, the filter option won't be
correctly applied to those events.

This patch links a dummy evsel onto the evlist, so a following
'--filter' can be collected by the dummy evsel. For this reason, dummy
evsels are set to PERF_TYPE_TRACEPOINT.

Due to the possible existence of dummy evsels,
perf_evlist__purge_dummy() must be called right after parse_options().
This patch adds it to the record, top, trace and stat builtin commands.
A further patch moves it down, after the real BPF events have been
processed.
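
The reason dummy evsels must look like tracepoints is that the
'--filter' handler rejects everything else. A sketch of the check they
have to pass (assuming the set_filter() logic in util/parse-events.c):

  if (evsel == NULL || evsel->attr.type != PERF_TYPE_TRACEPOINT) {
          fprintf(stderr,
                  "--filter option should follow a -e tracepoint option\n");
          return -1;
  }
  /* A dummy evsel with attr.type == PERF_TYPE_TRACEPOINT passes,
   * so the filter string gets stored on it for later use. */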

Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
---
tools/perf/builtin-record.c | 2 ++
tools/perf/builtin-stat.c | 1 +
tools/perf/builtin-top.c | 1 +
tools/perf/builtin-trace.c | 1 +
tools/perf/util/evlist.c | 19 +++++++++++++++++++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 32 ++++++++++++++++++++++++++++++++
tools/perf/util/evsel.h | 6 ++++++
tools/perf/util/parse-events.c | 25 +++++++++++++++++++++----
9 files changed, 84 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index a660022..81829de 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1112,6 +1112,8 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)

argc = parse_options(argc, argv, record_options, record_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
+ perf_evlist__purge_dummy(rec->evlist);
+
if (!argc && target__none(&rec->opts.target))
usage_with_options(record_usage, record_options);

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7aa039b..99b62f1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1208,6 +1208,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)

argc = parse_options(argc, argv, options, stat_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
+ perf_evlist__purge_dummy(evsel_list);

interval = stat_config.interval;

diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 8c465c8..246203b 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1198,6 +1198,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
perf_config(perf_top_config, &top);

argc = parse_options(argc, argv, options, top_usage, 0);
+ perf_evlist__purge_dummy(top.evlist);
if (argc)
usage_with_options(top_usage, options);

diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index 2f1162d..ef5fde6 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3080,6 +3080,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)

argc = parse_options_subcommand(argc, argv, trace_options, trace_subcommands,
trace_usage, PARSE_OPT_STOP_AT_NON_OPTION);
+ perf_evlist__purge_dummy(trace.evlist);

if (trace.trace_pgfaults) {
trace.opts.sample_address = true;
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e9a5d43..30fc327 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1694,3 +1694,22 @@ void perf_evlist__set_tracking_event(struct perf_evlist *evlist,

tracking_evsel->tracking = true;
}
+
+void perf_evlist__purge_dummy(struct perf_evlist *evlist)
+{
+ struct perf_evsel *pos, *n;
+
+ /*
+ * Remove all dummy events.
+ * During linking, we don't touch anything except link
+ * it into evlist. As a result, we don't
+ * need to adjust evlist->nr_entries during removal.
+ */
+
+ evlist__for_each_safe(evlist, n, pos) {
+ if (perf_evsel__is_dummy(pos)) {
+ list_del_init(&pos->node);
+ perf_evsel__delete(pos);
+ }
+ }
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 436e358..df4820e 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -180,6 +180,7 @@ bool perf_evlist__valid_read_format(struct perf_evlist *evlist);
void perf_evlist__splice_list_tail(struct perf_evlist *evlist,
struct list_head *list,
int nr_entries);
+void perf_evlist__purge_dummy(struct perf_evlist *evlist);

static inline struct perf_evsel *perf_evlist__first(struct perf_evlist *evlist)
{
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index b096ef7..35947f5 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -212,6 +212,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
perf_evsel__calc_id_pos(evsel);
evsel->cmdline_group_boundary = false;
+ evsel->is_dummy = false;
}

struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
@@ -224,6 +225,37 @@ struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx)
return evsel;
}

+struct perf_evsel *perf_evsel__new_dummy(const char *name)
+{
+ struct perf_evsel *evsel = zalloc(perf_evsel__object.size);
+
+ if (!evsel)
+ return NULL;
+
+ /*
+ * No need to call perf_evsel__init() for a dummy evsel.
+ * Keep it simple.
+ */
+ evsel->name = strdup(name);
+ if (!evsel->name)
+ goto out_free;
+
+ INIT_LIST_HEAD(&evsel->node);
+ INIT_LIST_HEAD(&evsel->config_terms);
+
+ evsel->cmdline_group_boundary = false;
+ /*
+ * Set dummy evsel as TRACEPOINT event so it can collect filter
+ * options.
+ */
+ evsel->attr.type = PERF_TYPE_TRACEPOINT;
+ evsel->is_dummy = true;
+ return evsel;
+out_free:
+ free(evsel);
+ return NULL;
+}
+
struct perf_evsel *perf_evsel__newtp_idx(const char *sys, const char *name, int idx)
{
struct perf_evsel *evsel = zalloc(perf_evsel__object.size);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 93ac6b1..443995b 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -114,6 +114,7 @@ struct perf_evsel {
struct perf_evsel *leader;
char *group_name;
bool cmdline_group_boundary;
+ bool is_dummy;
struct list_head config_terms;
};

@@ -149,6 +150,11 @@ int perf_evsel__object_config(size_t object_size,
void (*fini)(struct perf_evsel *evsel));

struct perf_evsel *perf_evsel__new_idx(struct perf_event_attr *attr, int idx);
+struct perf_evsel *perf_evsel__new_dummy(const char *name);
+static inline bool perf_evsel__is_dummy(struct perf_evsel *evsel)
+{
+ return evsel->is_dummy;
+}

static inline struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr)
{
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 14cd7e3..71d91fb 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1141,7 +1141,7 @@ int parse_events(struct perf_evlist *evlist, const char *str,
perf_pmu__parse_cleanup();
if (!ret) {
int entries = data.idx - evlist->nr_entries;
- struct perf_evsel *last;
+ struct perf_evsel *last = NULL;

if (!list_empty(&data.list)) {
last = list_entry(data.list.prev,
@@ -1149,8 +1149,25 @@ int parse_events(struct perf_evlist *evlist, const char *str,
last->cmdline_group_boundary = true;
}

- perf_evlist__splice_list_tail(evlist, &data.list, entries);
- evlist->nr_groups += data.nr_groups;
+ if (last && perf_evsel__is_dummy(last)) {
+ if (!list_is_singular(&data.list)) {
+ parse_events_evlist_error(&data, 0,
+ "Dummy evsel error: not on a singular list");
+ return -1;
+ }
+ /*
+ * We are introducing a dummy event. Don't touch
+ * anything, just link it.
+ *
+ * Don't use perf_evlist__splice_list_tail() since
+ * it alters evlist->nr_entries, which affects the header
+ * of the resulting perf.data.
+ */
+ list_splice_tail(&data.list, &evlist->entries);
+ } else {
+ perf_evlist__splice_list_tail(evlist, &data.list, entries);
+ evlist->nr_groups += data.nr_groups;
+ }

return 0;
}
@@ -1256,7 +1273,7 @@ foreach_evsel_in_last_glob(struct perf_evlist *evlist,
struct perf_evsel *last = NULL;
int err;

- if (evlist->nr_entries > 0)
+ if (!list_empty(&evlist->entries))
last = perf_evlist__last(evlist);

do {
--
2.1.0

2015-08-28 07:07:06

by Wang Nan

Subject: [PATCH 04/32] perf tools: Make perf depend on libbpf

By adding libbpf into perf's Makefile, this patch enables perf to build
libbpf during the build if libelf is found and neither NO_LIBELF nor
NO_LIBBPF is set. The newly introduced code is similar to how libapi
and libtraceevent are built in Makefile.perf.

MANIFEST is also updated for 'make perf-*-src-pkg'.

Append make_no_libbpf to tools/perf/tests/make.

A 'bpf' feature check is appended to the default FEATURE_TESTS and
FEATURE_DISPLAY, so perf will check the API version of bpf in
/path/to/kernel/include/uapi/linux/bpf.h, which should not fail except
when we are trying to port this code to an old kernel.
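
For reference, a sketch of what such a feature probe could look like
(hypothetical; the real test lives under tools/build/feature/):

  #include <uapi/linux/bpf.h>

  int main(void)
  {
          /* Fails to compile against kernel headers which predate
           * the BPF program types we rely on. */
          union bpf_attr attr;

          attr.prog_type = BPF_PROG_TYPE_KPROBE;
          return (int)attr.prog_type;
  }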

Error messages are also updated to notify users that BPF support in
'perf record' is disabled if libelf is missing or the BPF API check
failed.

tools/lib/bpf is added to TAG_FOLDERS to allow us to navigate libbpf
files when working on perf using tools/perf/tags.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/build/Makefile.feature | 6 ++++--
tools/perf/MANIFEST | 3 +++
tools/perf/Makefile.perf | 19 +++++++++++++++++--
tools/perf/config/Makefile | 19 ++++++++++++++++++-
tools/perf/tests/make | 4 +++-
5 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/tools/build/Makefile.feature b/tools/build/Makefile.feature
index 2975632..5ec6b37 100644
--- a/tools/build/Makefile.feature
+++ b/tools/build/Makefile.feature
@@ -51,7 +51,8 @@ FEATURE_TESTS ?= \
timerfd \
libdw-dwarf-unwind \
zlib \
- lzma
+ lzma \
+ bpf

FEATURE_DISPLAY ?= \
dwarf \
@@ -67,7 +68,8 @@ FEATURE_DISPLAY ?= \
libunwind \
libdw-dwarf-unwind \
zlib \
- lzma
+ lzma \
+ bpf

# Set FEATURE_CHECK_(C|LD)FLAGS-all for all FEATURE_TESTS features.
# If in the future we need per-feature checks/flags for features not
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index af009bd..56fe0c9 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -17,6 +17,7 @@ tools/build
tools/arch/x86/include/asm/atomic.h
tools/arch/x86/include/asm/rmwcc.h
tools/lib/traceevent
+tools/lib/bpf
tools/lib/api
tools/lib/bpf
tools/lib/hweight.c
@@ -67,6 +68,8 @@ arch/*/lib/memset*.S
include/linux/poison.h
include/linux/hw_breakpoint.h
include/uapi/linux/perf_event.h
+include/uapi/linux/bpf.h
+include/uapi/linux/bpf_common.h
include/uapi/linux/const.h
include/uapi/linux/swab.h
include/uapi/linux/hw_breakpoint.h
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index d9863cb..a6a789e 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -145,6 +145,7 @@ AWK = awk

LIB_DIR = $(srctree)/tools/lib/api/
TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
+BPF_DIR = $(srctree)/tools/lib/bpf/

# include config/Makefile by default and rule out
# non-config cases
@@ -180,6 +181,7 @@ strip-libs = $(filter-out -l%,$(1))

ifneq ($(OUTPUT),)
TE_PATH=$(OUTPUT)
+ BPF_PATH=$(OUTPUT)
ifneq ($(subdir),)
LIB_PATH=$(OUTPUT)/../lib/api/
else
@@ -188,6 +190,7 @@ endif
else
TE_PATH=$(TRACE_EVENT_DIR)
LIB_PATH=$(LIB_DIR)
+ BPF_PATH=$(BPF_DIR)
endif

LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
@@ -199,6 +202,8 @@ LIBTRACEEVENT_DYNAMIC_LIST_LDFLAGS = -Xlinker --dynamic-list=$(LIBTRACEEVENT_DYN
LIBAPI = $(LIB_PATH)libapi.a
export LIBAPI

+LIBBPF = $(BPF_PATH)libbpf.a
+
# python extension build directories
PYTHON_EXTBUILD := $(OUTPUT)python_ext_build/
PYTHON_EXTBUILD_LIB := $(PYTHON_EXTBUILD)lib/
@@ -251,6 +256,9 @@ export PERL_PATH
LIB_FILE=$(OUTPUT)libperf.a

PERFLIBS = $(LIB_FILE) $(LIBAPI) $(LIBTRACEEVENT)
+ifndef NO_LIBBPF
+ PERFLIBS += $(LIBBPF)
+endif

# We choose to avoid "if .. else if .. else .. endif endif"
# because maintaining the nesting to match is a pain. If
@@ -420,6 +428,13 @@ $(LIBAPI)-clean:
$(call QUIET_CLEAN, libapi)
$(Q)$(MAKE) -C $(LIB_DIR) O=$(OUTPUT) clean >/dev/null

+$(LIBBPF): FORCE
+ $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) $(OUTPUT)libbpf.a
+
+$(LIBBPF)-clean:
+ $(call QUIET_CLEAN, libbpf)
+ $(Q)$(MAKE) -C $(BPF_DIR) O=$(OUTPUT) clean >/dev/null
+
help:
@echo 'Perf make targets:'
@echo ' doc - make *all* documentation (see below)'
@@ -459,7 +474,7 @@ INSTALL_DOC_TARGETS += quick-install-doc quick-install-man quick-install-html
$(DOC_TARGETS):
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) $(@:doc=all)

-TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol
+TAG_FOLDERS= . ../lib/traceevent ../lib/api ../lib/symbol ../lib/bpf
TAG_FILES= ../../include/uapi/linux/perf_event.h

TAGS:
@@ -567,7 +582,7 @@ config-clean:
$(call QUIET_CLEAN, config)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null

-clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
+clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean config-clean
$(call QUIET_CLEAN, core-objs) $(RM) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
$(Q)$(RM) $(OUTPUT).config-detected
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 827557f..38a4144 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -106,6 +106,7 @@ ifdef LIBBABELTRACE
FEATURE_CHECK_LDFLAGS-libbabeltrace := $(LIBBABELTRACE_LDFLAGS) -lbabeltrace-ctf
endif

+FEATURE_CHECK_CFLAGS-bpf = -I. -I$(srctree)/tools/include -I$(srctree)/arch/$(ARCH)/include/uapi -I$(srctree)/include/uapi
# include ARCH specific config
-include $(src-perf)/arch/$(ARCH)/Makefile

@@ -233,6 +234,7 @@ ifdef NO_LIBELF
NO_DEMANGLE := 1
NO_LIBUNWIND := 1
NO_LIBDW_DWARF_UNWIND := 1
+ NO_LIBBPF := 1
else
ifeq ($(feature-libelf), 0)
ifeq ($(feature-glibc), 1)
@@ -242,13 +244,14 @@ else
LIBC_SUPPORT := 1
endif
ifeq ($(LIBC_SUPPORT),1)
- msg := $(warning No libelf found, disables 'probe' tool, please install elfutils-libelf-devel/libelf-dev);
+ msg := $(warning No libelf found, disables 'probe' tool and BPF support in 'perf record', please install elfutils-libelf-devel/libelf-dev);

NO_LIBELF := 1
NO_DWARF := 1
NO_DEMANGLE := 1
NO_LIBUNWIND := 1
NO_LIBDW_DWARF_UNWIND := 1
+ NO_LIBBPF := 1
else
ifneq ($(filter s% -static%,$(LDFLAGS),),)
msg := $(error No static glibc found, please install glibc-static);
@@ -305,6 +308,13 @@ ifndef NO_LIBELF
$(call detected,CONFIG_DWARF)
endif # PERF_HAVE_DWARF_REGS
endif # NO_DWARF
+
+ ifndef NO_LIBBPF
+ ifeq ($(feature-bpf), 1)
+ CFLAGS += -DHAVE_LIBBPF_SUPPORT
+ $(call detected,CONFIG_LIBBPF)
+ endif
+ endif # NO_LIBBPF
endif # NO_LIBELF

ifeq ($(ARCH),powerpc)
@@ -320,6 +330,13 @@ ifndef NO_LIBUNWIND
endif
endif

+ifndef NO_LIBBPF
+ ifneq ($(feature-bpf), 1)
+ msg := $(warning BPF API too old. Please install recent kernel headers. BPF support in 'perf record' is disabled.)
+ NO_LIBBPF := 1
+ endif
+endif
+
dwarf-post-unwind := 1
dwarf-post-unwind-text := BUG

diff --git a/tools/perf/tests/make b/tools/perf/tests/make
index ba31c4b..2cbd0c6 100644
--- a/tools/perf/tests/make
+++ b/tools/perf/tests/make
@@ -44,6 +44,7 @@ make_no_libnuma := NO_LIBNUMA=1
make_no_libaudit := NO_LIBAUDIT=1
make_no_libbionic := NO_LIBBIONIC=1
make_no_auxtrace := NO_AUXTRACE=1
+make_no_libbpf := NO_LIBBPF=1
make_tags := tags
make_cscope := cscope
make_help := help
@@ -66,7 +67,7 @@ make_static := LDFLAGS=-static
make_minimal := NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1
make_minimal += NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1
make_minimal += NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1
-make_minimal += NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1
+make_minimal += NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1

# $(run) contains all available tests
run := make_pure
@@ -94,6 +95,7 @@ run += make_no_libnuma
run += make_no_libaudit
run += make_no_libbionic
run += make_no_auxtrace
+run += make_no_libbpf
run += make_help
run += make_doc
run += make_perf_o
--
2.1.0

2015-08-28 07:14:33

by Wang Nan

Subject: [PATCH 05/32] perf ebpf: Add the libbpf glue

The 'bpf-loader.[ch]' files introduced in this patch will be the
interface between perf and libbpf. bpf__prepare_load() resides in
bpf-loader.c. Dummy functions are provided for the case when
CONFIG_LIBBPF is off, since bpf-loader.c is built only when it is on.

Functions in bpf-loader.c should not report errors explicitly.
Instead, strerror-style error reporting should be used.
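
A sketch of the intended strerror-style usage from a caller (the
filename variable is a placeholder):

  char errbuf[BUFSIZ];
  int err = bpf__prepare_load(filename);

  if (err) {
          bpf__strerror_prepare_load(filename, err,
                                     errbuf, sizeof(errbuf));
          pr_err("bpf: %s\n", errbuf);
  }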

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/util/bpf-loader.c | 92 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 47 ++++++++++++++++++++++
2 files changed, 139 insertions(+)
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
new file mode 100644
index 0000000..88531ea
--- /dev/null
+++ b/tools/perf/util/bpf-loader.c
@@ -0,0 +1,92 @@
+/*
+ * bpf-loader.c
+ *
+ * Copyright (C) 2015 Wang Nan <[email protected]>
+ * Copyright (C) 2015 Huawei Inc.
+ */
+
+#include <bpf/libbpf.h>
+#include "perf.h"
+#include "debug.h"
+#include "bpf-loader.h"
+
+#define DEFINE_PRINT_FN(name, level) \
+static int libbpf_##name(const char *fmt, ...) \
+{ \
+ va_list args; \
+ int ret; \
+ \
+ va_start(args, fmt); \
+ ret = veprintf(level, verbose, pr_fmt(fmt), args);\
+ va_end(args); \
+ return ret; \
+}
+
+DEFINE_PRINT_FN(warning, 0)
+DEFINE_PRINT_FN(info, 0)
+DEFINE_PRINT_FN(debug, 1)
+
+static bool libbpf_initialized;
+
+int bpf__prepare_load(const char *filename)
+{
+ struct bpf_object *obj;
+
+ if (!libbpf_initialized)
+ libbpf_set_print(libbpf_warning,
+ libbpf_info,
+ libbpf_debug);
+
+ obj = bpf_object__open(filename);
+ if (!obj) {
+ pr_debug("bpf: failed to load %s\n", filename);
+ return -EINVAL;
+ }
+
+ /*
+ * Throw the object pointer away: it will be retrieved using
+ * the bpf_object iterator.
+ */
+
+ return 0;
+}
+
+void bpf__clear(void)
+{
+ struct bpf_object *obj, *tmp;
+
+ bpf_object__for_each_safe(obj, tmp)
+ bpf_object__close(obj);
+}
+
+#define bpf__strerror_head(err, buf, size) \
+ char sbuf[STRERR_BUFSIZE], *emsg;\
+ if (!size)\
+ return 0;\
+ if (err < 0)\
+ err = -err;\
+ emsg = strerror_r(err, sbuf, sizeof(sbuf));\
+ switch (err) {\
+ default:\
+ scnprintf(buf, size, "%s", emsg);\
+ break;
+
+#define bpf__strerror_entry(val, fmt...)\
+ case val: {\
+ scnprintf(buf, size, fmt);\
+ break;\
+ }
+
+#define bpf__strerror_end(buf, size)\
+ }\
+ buf[size - 1] = '\0';
+
+int bpf__strerror_prepare_load(const char *filename, int err,
+ char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(EINVAL, "%s: BPF object file '%s' is invalid",
+ emsg, filename)
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
new file mode 100644
index 0000000..12be630
--- /dev/null
+++ b/tools/perf/util/bpf-loader.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ */
+#ifndef __BPF_LOADER_H
+#define __BPF_LOADER_H
+
+#include <linux/compiler.h>
+#include <string.h>
+#include "debug.h"
+
+#ifdef HAVE_LIBBPF_SUPPORT
+int bpf__prepare_load(const char *filename);
+int bpf__strerror_prepare_load(const char *filename, int err,
+ char *buf, size_t size);
+
+void bpf__clear(void);
+#else
+static inline int bpf__prepare_load(const char *filename __maybe_unused)
+{
+ pr_debug("ERROR: eBPF object loading is disabled during compiling.\n");
+ return -1;
+}
+
+static inline void bpf__clear(void) { }
+
+static inline int
+__bpf_strerror(char *buf, size_t size)
+{
+ if (!size)
+ return 0;
+ strncpy(buf,
+ "ERROR: eBPF object loading is disabled during compiling.\n",
+ size);
+ buf[size - 1] = '\0';
+ return 0;
+}
+
+static inline int
+bpf__strerror_prepare_load(const char *filename __maybe_unused,
+ int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
+#endif
+#endif
--
2.1.0

2015-08-28 07:07:08

by Wang Nan

Subject: [PATCH 06/32] perf tools: Enable passing bpf object file to --event

By introducing new rules in tools/perf/util/parse-events.[ly], this
patch enables 'perf record --event bpf_file.o' to select events using
an eBPF object file. It calls parse_events_load_bpf() to load that
file, which uses bpf__prepare_load() and finally calls
bpf_object__open() for the object file.

Instead of introducing evsels into the evlist during parsing, events
selected by eBPF object files are appended separately. The reasons are:

1. During parsing, the probing points have not been initialized.

2. Currently we are unable to call add_perf_probe_events() twice,
therefore we have to wait until all such events are collected,
then probe all points by one call.

The real probing and selecting resides in the following patches.

To collect '--filter' options, a dummy evsel is added during parsing.

Since bpf__prepare_load() may be called during cmdline parsing, all
builtin commands which may call parse_events_option() should release
BPF resources during cleanup. Add bpf__clear() to the stat, record,
top and trace commands, although currently we are going to support
'perf record' only.

Committer note:

Testing if the event parsing changes indeed call the BPF loading
routines:

[root@felicio ~]# ls -la foo.o
ls: cannot access foo.o: No such file or directory
[root@felicio ~]# perf record --event foo.o sleep
libbpf: failed to open foo.o: No such file or directory
bpf: failed to load foo.o
invalid or unsupported event: 'foo.o'
Run 'perf list' for a list of valid events

usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]

-e, --event <event> event selector. use 'perf list' to list available events
[root@felicio ~]#

Yes, it does this time around.

Signed-off-by: Wang Nan <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[ The veprintf() and bpf loader parts were split from this one;
Add bpf__clear() into stat, record, top and trace commands.
Add dummy evsel when parsing.
]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 7 +++++--
tools/perf/builtin-stat.c | 8 ++++++--
tools/perf/builtin-top.c | 10 +++++++---
tools/perf/builtin-trace.c | 5 ++++-
tools/perf/util/Build | 1 +
tools/perf/util/parse-events.c | 40 ++++++++++++++++++++++++++++++++++++++++
tools/perf/util/parse-events.h | 3 +++
tools/perf/util/parse-events.l | 3 +++
tools/perf/util/parse-events.y | 18 +++++++++++++++++-
9 files changed, 86 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 81829de..31934b1 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -29,6 +29,7 @@
#include "util/data.h"
#include "util/auxtrace.h"
#include "util/parse-branch-options.h"
+#include "util/bpf-loader.h"

#include <unistd.h>
#include <sched.h>
@@ -1131,13 +1132,13 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
if (!rec->itr) {
rec->itr = auxtrace_record__init(rec->evlist, &err);
if (err)
- return err;
+ goto out_bpf_clear;
}

err = auxtrace_parse_snapshot_options(rec->itr, &rec->opts,
rec->opts.auxtrace_snapshot_opts);
if (err)
- return err;
+ goto out_bpf_clear;

err = -ENOMEM;

@@ -1200,6 +1201,8 @@ out_symbol_exit:
perf_evlist__delete(rec->evlist);
symbol__exit();
auxtrace_record__free(rec->itr);
+out_bpf_clear:
+ bpf__clear();
return err;
}

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 99b62f1..d50a19a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -59,6 +59,7 @@
#include "util/thread.h"
#include "util/thread_map.h"
#include "util/counts.h"
+#include "util/bpf-loader.h"

#include <stdlib.h>
#include <sys/prctl.h>
@@ -1235,7 +1236,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
output = fopen(output_name, mode);
if (!output) {
perror("failed to create output file");
- return -1;
+ status = -1;
+ goto out;
}
clock_gettime(CLOCK_REALTIME, &tm);
fprintf(output, "# started on %s\n", ctime(&tm.tv_sec));
@@ -1244,7 +1246,8 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
output = fdopen(output_fd, mode);
if (!output) {
perror("Failed opening logfd");
- return -errno;
+ status = -errno;
+ goto out;
}
}

@@ -1377,5 +1380,6 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
perf_evlist__free_stats(evsel_list);
out:
perf_evlist__delete(evsel_list);
+ bpf__clear();
return status;
}
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 246203b..ee946dc 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -41,6 +41,7 @@
#include "util/sort.h"
#include "util/intlist.h"
#include "util/parse-branch-options.h"
+#include "util/bpf-loader.h"
#include "arch/common.h"

#include "util/debug.h"
@@ -1271,8 +1272,10 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)
symbol_conf.priv_size = sizeof(struct annotation);

symbol_conf.try_vmlinux_path = (symbol_conf.vmlinux_name == NULL);
- if (symbol__init(NULL) < 0)
- return -1;
+ if (symbol__init(NULL) < 0) {
+ status = -1;
+ goto out_bpf_clear;
+ }

sort__setup_elide(stdout);

@@ -1290,6 +1293,7 @@ int cmd_top(int argc, const char **argv, const char *prefix __maybe_unused)

out_delete_evlist:
perf_evlist__delete(top.evlist);
-
+out_bpf_clear:
+ bpf__clear();
return status;
}
diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
index ef5fde6..24c8b63 100644
--- a/tools/perf/builtin-trace.c
+++ b/tools/perf/builtin-trace.c
@@ -3090,6 +3090,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
if (trace.evlist->nr_entries > 0)
evlist__set_evsel_handler(trace.evlist, trace__event_handler);

+ /* trace__record calls cmd_record, which calls bpf__clear() */
if ((argc >= 1) && (strcmp(argv[0], "record") == 0))
return trace__record(&trace, argc-1, &argv[1]);

@@ -3100,7 +3101,8 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
if (!trace.trace_syscalls && !trace.trace_pgfaults &&
trace.evlist->nr_entries == 0 /* Was --events used? */) {
pr_err("Please specify something to trace.\n");
- return -1;
+ err = -1;
+ goto out;
}

if (output_name != NULL) {
@@ -3159,5 +3161,6 @@ out_close:
if (output_name != NULL)
fclose(trace.output);
out:
+ bpf__clear();
return err;
}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e912856..c0ca4a1 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -83,6 +83,7 @@ libperf-$(CONFIG_AUXTRACE) += intel-pt.o
libperf-$(CONFIG_AUXTRACE) += intel-bts.o
libperf-y += parse-branch-options.o

+libperf-$(CONFIG_LIBBPF) += bpf-loader.o
libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-file.o
libperf-$(CONFIG_LIBELF) += probe-event.o
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 71d91fb..4343433 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -19,6 +19,7 @@
#include "thread_map.h"
#include "cpumap.h"
#include "asm/bug.h"
+#include "bpf-loader.h"

#define MAX_NAME_LEN 100

@@ -481,6 +482,45 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
return add_tracepoint_event(list, idx, sys, event);
}

+int parse_events_load_bpf(struct parse_events_evlist *data,
+ struct list_head *list,
+ char *bpf_file_name)
+{
+ int err;
+ char errbuf[BUFSIZ];
+ struct perf_evsel *evsel;
+
+ /*
+ * Currently we don't link a useful event to the list. BPF object files
+ * should be saved to a separate list and processed together.
+ *
+ * A dummy event is added here to collect '--filter' option.
+ *
+ * Things could change if we solve the perf probe reentering
+ * problem. After that, probing events file by file becomes possible.
+ * However, probing cost still needs to be considered.
+ */
+ err = bpf__prepare_load(bpf_file_name);
+ if (err) {
+ bpf__strerror_prepare_load(bpf_file_name, err,
+ errbuf, sizeof(errbuf));
+ data->error->str = strdup(errbuf);
+ data->error->help = strdup("(add -v to see detail)");
+ return err;
+ }
+
+ /*
+ * No need to call perf_evsel__init() for a dummy evsel.
+ * Also, don't increase data->idx:
+ * data->idx affects other evsels' tracking field.
+ */
+ evsel = perf_evsel__new_dummy(bpf_file_name);
+ if (!evsel)
+ return -ENOMEM;
+ list_add_tail(&evsel->node, list);
+ return 0;
+}
+
static int
parse_breakpoint_type(const char *type, struct perf_event_attr *attr)
{
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index a09b0e2..3652387 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -119,6 +119,9 @@ int parse_events__modifier_group(struct list_head *list, char *event_mod);
int parse_events_name(struct list_head *list, char *name);
int parse_events_add_tracepoint(struct list_head *list, int *idx,
char *sys, char *event);
+int parse_events_load_bpf(struct parse_events_evlist *data,
+ struct list_head *list,
+ char *bpf_file_name);
int parse_events_add_numeric(struct parse_events_evlist *data,
struct list_head *list,
u32 type, u64 config,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 936d566..22e8f93 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -115,6 +115,7 @@ do { \
group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
event [^,{}/]+
+bpf_object .*\.(o|bpf)

num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
@@ -159,6 +160,7 @@ modifier_bp [rwx]{1,3}
}

{event_pmu} |
+{bpf_object} |
{event} {
BEGIN(INITIAL);
REWIND(1);
@@ -264,6 +266,7 @@ r{num_raw_hex} { return raw(yyscanner); }
{num_hex} { return value(yyscanner, 16); }

{modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
+{bpf_object} { return str(yyscanner, PE_BPF_OBJECT); }
{name} { return pmu_str_check(yyscanner); }
"/" { BEGIN(config); return '/'; }
- { return '-'; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 591905a..3ee3a32 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -42,6 +42,7 @@ static inc_group_count(struct list_head *list,
%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
%token PE_EVENT_NAME
%token PE_NAME
+%token PE_BPF_OBJECT
%token PE_MODIFIER_EVENT PE_MODIFIER_BP
%token PE_NAME_CACHE_TYPE PE_NAME_CACHE_OP_RESULT
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
@@ -53,6 +54,7 @@ static inc_group_count(struct list_head *list,
%type <num> PE_RAW
%type <num> PE_TERM
%type <str> PE_NAME
+%type <str> PE_BPF_OBJECT
%type <str> PE_NAME_CACHE_TYPE
%type <str> PE_NAME_CACHE_OP_RESULT
%type <str> PE_MODIFIER_EVENT
@@ -69,6 +71,7 @@ static inc_group_count(struct list_head *list,
%type <head> event_legacy_tracepoint
%type <head> event_legacy_numeric
%type <head> event_legacy_raw
+%type <head> event_bpf_file
%type <head> event_def
%type <head> event_mod
%type <head> event_name
@@ -198,7 +201,8 @@ event_def: event_pmu |
event_legacy_mem |
event_legacy_tracepoint sep_dc |
event_legacy_numeric sep_dc |
- event_legacy_raw sep_dc
+ event_legacy_raw sep_dc |
+ event_bpf_file

event_pmu:
PE_NAME '/' event_config '/'
@@ -420,6 +424,18 @@ PE_RAW
$$ = list;
}

+event_bpf_file:
+PE_BPF_OBJECT
+{
+ struct parse_events_evlist *data = _data;
+ struct list_head *list;
+
+ ALLOC_LIST(list);
+ ABORT_ON(parse_events_load_bpf(data, list, $1));
+ $$ = list;
+}
+
+
start_terms: event_config
{
struct parse_events_terms *data = _data;
--
2.1.0

2015-08-28 07:13:51

by Wang Nan

Subject: [PATCH 07/32] perf probe: Attach trace_probe_event with perf_probe_event

This patch drops the struct __event_package structure. Instead, it
adds trace_probe_event into 'struct perf_probe_event'.

The trace_probe_event information gives further patches a chance to
access the actual probe points and arguments. Using them, the bpf
loader will be able to attach one bpf program to the different probing
points of an inline function (which has multiple probing points) and
of glob-matched functions. Moreover, by reading the argument
information, BPF code for fetching those arguments can be generated.
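
A sketch of what this enables for later patches (attach_one_tev() is a
hypothetical consumer):

  /* With cleanup == false, the converted probe points stay attached
   * to each pev instead of being freed inside the call. */
  err = add_perf_probe_events(pevs, npevs, false);

  for (i = 0; i < pevs[0].ntevs; i++)
          attach_one_tev(&pevs[0].tevs[i]);       /* hypothetical */

  /* The caller releases them explicitly afterwards. */
  cleanup_perf_probe_event(&pevs[0]);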

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-probe.c | 4 ++-
tools/perf/util/probe-event.c | 60 +++++++++++++++++++++----------------------
tools/perf/util/probe-event.h | 6 ++++-
3 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index b81cec3..826d452 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -496,7 +496,9 @@ __cmd_probe(int argc, const char **argv, const char *prefix __maybe_unused)
usage_with_options(probe_usage, options);
}

- ret = add_perf_probe_events(params.events, params.nevents);
+ ret = add_perf_probe_events(params.events,
+ params.nevents,
+ true);
if (ret < 0) {
pr_err_with_code(" Error: Failed to add events.", ret);
return ret;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index eb5f18b..57a7bae 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1985,6 +1985,9 @@ void clear_perf_probe_event(struct perf_probe_event *pev)
struct perf_probe_arg_field *field, *next;
int i;

+ if (pev->ntevs)
+ cleanup_perf_probe_event(pev);
+
free(pev->event);
free(pev->group);
free(pev->target);
@@ -2759,61 +2762,58 @@ static int convert_to_probe_trace_events(struct perf_probe_event *pev,
return find_probe_trace_events_from_map(pev, tevs);
}

-struct __event_package {
- struct perf_probe_event *pev;
- struct probe_trace_event *tevs;
- int ntevs;
-};
-
-int add_perf_probe_events(struct perf_probe_event *pevs, int npevs)
+int cleanup_perf_probe_event(struct perf_probe_event *pev)
{
- int i, j, ret;
- struct __event_package *pkgs;
+ int i;

- ret = 0;
- pkgs = zalloc(sizeof(struct __event_package) * npevs);
+ if (!pev || !pev->ntevs)
+ return 0;

- if (pkgs == NULL)
- return -ENOMEM;
+ for (i = 0; i < pev->ntevs; i++)
+ clear_probe_trace_event(&pev->tevs[i]);
+
+ zfree(&pev->tevs);
+ pev->ntevs = 0;
+ return 0;
+}
+
+int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
+ bool cleanup)
+{
+ int i, ret;

ret = init_symbol_maps(pevs->uprobes);
- if (ret < 0) {
- free(pkgs);
+ if (ret < 0)
return ret;
- }

/* Loop 1: convert all events */
for (i = 0; i < npevs; i++) {
- pkgs[i].pev = &pevs[i];
/* Init kprobe blacklist if needed */
- if (!pkgs[i].pev->uprobes)
+ if (!pevs[i].uprobes)
kprobe_blacklist__init();
/* Convert with or without debuginfo */
- ret = convert_to_probe_trace_events(pkgs[i].pev,
- &pkgs[i].tevs);
- if (ret < 0)
+ ret = convert_to_probe_trace_events(&pevs[i], &pevs[i].tevs);
+ if (ret < 0) {
+ cleanup = true;
goto end;
- pkgs[i].ntevs = ret;
+ }
+ pevs[i].ntevs = ret;
}
/* This just release blacklist only if allocated */
kprobe_blacklist__release();

/* Loop 2: add all events */
for (i = 0; i < npevs; i++) {
- ret = __add_probe_trace_events(pkgs[i].pev, pkgs[i].tevs,
- pkgs[i].ntevs,
+ ret = __add_probe_trace_events(&pevs[i], pevs[i].tevs,
+ pevs[i].ntevs,
probe_conf.force_add);
if (ret < 0)
break;
}
end:
/* Loop 3: cleanup and free trace events */
- for (i = 0; i < npevs; i++) {
- for (j = 0; j < pkgs[i].ntevs; j++)
- clear_probe_trace_event(&pkgs[i].tevs[j]);
- zfree(&pkgs[i].tevs);
- }
- free(pkgs);
+ for (i = 0; cleanup && (i < npevs); i++)
+ cleanup_perf_probe_event(&pevs[i]);
exit_symbol_maps();

return ret;
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 6e7ec68..915f0d8 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -87,6 +87,8 @@ struct perf_probe_event {
bool uprobes; /* Uprobe event flag */
char *target; /* Target binary */
struct perf_probe_arg *args; /* Arguments */
+ struct probe_trace_event *tevs;
+ int ntevs;
};

/* Line range */
@@ -137,8 +139,10 @@ extern void line_range__clear(struct line_range *lr);
/* Initialize line range */
extern int line_range__init(struct line_range *lr);

-extern int add_perf_probe_events(struct perf_probe_event *pevs, int npevs);
+extern int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
+ bool cleanup);
extern int del_perf_probe_events(struct strfilter *filter);
+extern int cleanup_perf_probe_event(struct perf_probe_event *pev);
extern int show_perf_probe_events(struct strfilter *filter);
extern int show_line_range(struct line_range *lr, const char *module,
bool user);
--
2.1.0

2015-08-28 07:07:13

by Wang Nan

Subject: [PATCH 08/32] perf record, bpf: Parse and probe eBPF programs probe points

This patch introduces bpf__{un,}probe() functions to enable callers to
create kprobe points based on section names of BPF programs. It parses
the section names of each eBPF program and creates corresponding 'struct
perf_probe_event' structures. The parse_perf_probe_command() function is
used to do the main parsing work.

The parsing result is stored into an array to satisfy
add_perf_probe_events(), which accepts an array of 'struct
perf_probe_event' and does all the work in one call.

Define PERF_BPF_PROBE_GROUP as "perf_bpf_probe", which will be used as
the group name of all eBPF probing points.

probe_conf.max_probes is set to MAX_PROBES to support glob matching.

Before bpf__probe() ends, the data in each 'struct perf_probe_event'
is cleaned. This will be changed by following patches because they
need the 'struct probe_trace_event' in them.
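
For illustration, a hypothetical scriptlet whose section name encodes
the probe point; the part before '=' names the event and the rest is
an ordinary perf probe definition, the exact grammar being whatever
parse_perf_probe_command() accepts:

  /* Probed at the entry of do_sys_open(); the group defaults to
   * "perf_bpf_probe" when the section name doesn't set one. */
  __attribute__((section("func=do_sys_open"), used))
  int prog(void *ctx)
  {
          return 1;
  }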

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
Link: http://lkml.kernel.org/n/[email protected]
[Merged from two patches]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/builtin-record.c | 19 ++++++-
tools/perf/util/bpf-loader.c | 133 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 13 +++++
3 files changed, 164 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 31934b1..8833186 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1140,7 +1140,23 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
if (err)
goto out_bpf_clear;

- err = -ENOMEM;
+ /*
+ * bpf__probe must be called before symbol__init() because we
+ * need init_symbol_maps. If called after symbol__init,
+ * symbol_conf.sort_by_name won't take effect.
+ *
+ * bpf__unprobe() is safe even if bpf__probe() failed, and it
+ * also calls symbol__init. Therefore, goto out_symbol_exit
+ * is safe when probe failed.
+ */
+ err = bpf__probe();
+ if (err) {
+ bpf__strerror_probe(err, errbuf, sizeof(errbuf));
+
+ pr_err("Probing at events in BPF object failed.\n");
+ pr_err("\t%s\n", errbuf);
+ goto out_symbol_exit;
+ }

symbol__init(NULL);

@@ -1201,6 +1217,7 @@ out_symbol_exit:
perf_evlist__delete(rec->evlist);
symbol__exit();
auxtrace_record__free(rec->itr);
+ bpf__unprobe();
out_bpf_clear:
bpf__clear();
return err;
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 88531ea..435f52e 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -9,6 +9,8 @@
#include "perf.h"
#include "debug.h"
#include "bpf-loader.h"
+#include "probe-event.h"
+#include "probe-finder.h"

#define DEFINE_PRINT_FN(name, level) \
static int libbpf_##name(const char *fmt, ...) \
@@ -28,6 +30,58 @@ DEFINE_PRINT_FN(debug, 1)

static bool libbpf_initialized;

+static int
+config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
+{
+ const char *config_str;
+ int err;
+
+ config_str = bpf_program__title(prog, false);
+ if (!config_str) {
+ pr_debug("bpf: unable to get title for program\n");
+ return -EINVAL;
+ }
+
+ pr_debug("bpf: config program '%s'\n", config_str);
+ err = parse_perf_probe_command(config_str, pev);
+ if (err < 0) {
+ pr_debug("bpf: '%s' is not a valid config string\n",
+ config_str);
+ /* parse failed, don't need clear pev. */
+ return -EINVAL;
+ }
+
+ if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
+ pr_debug("bpf: '%s': group for event is set and not '%s'.\n",
+ config_str, PERF_BPF_PROBE_GROUP);
+ err = -EINVAL;
+ goto errout;
+ } else if (!pev->group)
+ pev->group = strdup(PERF_BPF_PROBE_GROUP);
+
+ if (!pev->group) {
+ pr_debug("bpf: strdup failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ if (!pev->event) {
+ pr_debug("bpf: '%s': event name is missing\n",
+ config_str);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ pr_debug("bpf: config '%s' is ok\n", config_str);
+
+ return 0;
+
+errout:
+ if (pev)
+ clear_perf_probe_event(pev);
+ return err;
+}
+
int bpf__prepare_load(const char *filename)
{
struct bpf_object *obj;
@@ -59,6 +113,74 @@ void bpf__clear(void)
bpf_object__close(obj);
}

+static bool is_probed;
+
+int bpf__unprobe(void)
+{
+ struct strfilter *delfilter;
+ int ret;
+
+ if (!is_probed)
+ return 0;
+
+ delfilter = strfilter__new(PERF_BPF_PROBE_GROUP ":*", NULL);
+ if (!delfilter) {
+ pr_debug("Failed to create delfilter when unprobing\n");
+ return -ENOMEM;
+ }
+
+ ret = del_perf_probe_events(delfilter);
+ strfilter__delete(delfilter);
+ if (ret < 0 && is_probed)
+ pr_debug("Error: failed to delete events: %s\n",
+ strerror(-ret));
+ else
+ is_probed = false;
+ return ret < 0 ? ret : 0;
+}
+
+int bpf__probe(void)
+{
+ int err, nr_events = 0;
+ struct bpf_object *obj, *tmp;
+ struct bpf_program *prog;
+ struct perf_probe_event *pevs;
+
+ pevs = calloc(MAX_PROBES, sizeof(pevs[0]));
+ if (!pevs)
+ return -ENOMEM;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ bpf_object__for_each_program(prog, obj) {
+ err = config_bpf_program(prog, &pevs[nr_events++]);
+ if (err < 0)
+ goto out;
+
+ if (nr_events >= MAX_PROBES) {
+ pr_debug("Too many (more than %d) events\n",
+ MAX_PROBES);
+ err = -ERANGE;
+ goto out;
+ }
+ }
+ }
+
+ probe_conf.max_probes = MAX_PROBES;
+ /* Let add_perf_probe_events() generate probe_trace_events (tevs) */
+ err = add_perf_probe_events(pevs, nr_events, false);
+
+ /* add_perf_probe_events() returns negative on failure */
+ if (err < 0) {
+ pr_debug("bpf probe: failed to probe events\n");
+ } else
+ is_probed = true;
+out:
+ while (nr_events > 0)
+ clear_perf_probe_event(&pevs[--nr_events]);
+ free(pevs);
+ return err < 0 ? err : 0;
+}
+
#define bpf__strerror_head(err, buf, size) \
char sbuf[STRERR_BUFSIZE], *emsg;\
if (!size)\
@@ -90,3 +212,14 @@ int bpf__strerror_prepare_load(const char *filename, int err,
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_probe(int err, char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(ERANGE, "Too many (more than %d) events",
+ MAX_PROBES);
+ bpf__strerror_entry(ENOENT, "Selected kprobe point doesn't exist.");
+ bpf__strerror_entry(EEXIST, "Selected kprobe point already exist, try perf probe -d '*'.");
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 12be630..6b09a85 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -9,10 +9,15 @@
#include <string.h>
#include "debug.h"

+#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
+
#ifdef HAVE_LIBBPF_SUPPORT
int bpf__prepare_load(const char *filename);
int bpf__strerror_prepare_load(const char *filename, int err,
char *buf, size_t size);
+int bpf__probe(void);
+int bpf__unprobe(void);
+int bpf__strerror_probe(int err, char *buf, size_t size);

void bpf__clear(void);
#else
@@ -22,6 +27,8 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused)
return -1;
}

+static inline int bpf__probe(void) { return 0; }
+static inline int bpf__unprobe(void) { return 0; }
static inline void bpf__clear(void) { }

static inline int
@@ -43,5 +50,11 @@ bpf__strerror_prepare_load(const char *filename __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int bpf__strerror_probe(int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
--
2.1.0

2015-08-28 07:07:23

by Wang Nan

[permalink] [raw]
Subject: [PATCH 09/32] perf bpf: Collect 'struct perf_probe_event' for bpf_program

This patch utilizes bpf_program__set_private() to bind a
perf_probe_event to each BPF program through its private field.

This information is saved so 'perf record' knows which kprobe point a
program should be attached to.

Since the data in 'struct perf_probe_event' is built in two stages,
pev_ready is used to mark whether the information (especially tevs) in
'struct perf_probe_event' is valid or not. It is false at first, and is
set to true by sync_bpf_program_pev(), which copies all pointers in the
original pev into a program specific memory region.
sync_bpf_program_pev() is called after add_perf_probe_events() to make
sure the data is ready.

Remove the code which cleans 'struct perf_probe_event' after
bpf__probe(): since the pointers in pevs are copied into the programs'
private fields, calling clear_perf_probe_event() on them becomes
unsafe.
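
A minimal sketch of the resulting lifecycle (illustration only, not
part of the patch; the names follow this series):

  struct perf_probe_event *pevs = calloc(MAX_PROBES, sizeof(pevs[0]));

  /* stage 1: bind pev to the program; tevs are not ready yet,
   * so priv->ppev = &pevs[i] and priv->pev_ready = false */
  config_bpf_program(prog, &pevs[i]);

  /* probing fills pevs[i].tevs */
  add_perf_probe_events(pevs, nr_events, false);

  /* stage 2: copy the now-complete pev into program private
   * memory and set priv->pev_ready = true */
  sync_bpf_program_pev(prog);

  /* only the array itself may be freed: priv->pev owns the pointers,
   * so clear_perf_probe_event() must not be called on the entries */
  free(pevs);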

Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[Splitted from a larger patch]
---
tools/perf/util/bpf-loader.c | 90 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 88 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 435f52e..ae23f6f 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -30,9 +30,35 @@ DEFINE_PRINT_FN(debug, 1)

static bool libbpf_initialized;

+struct bpf_prog_priv {
+ /*
+ * If pev_ready is false, ppev points to local memory which
+ * is only valid inside bpf__probe().
+ * pev is valid only when pev_ready.
+ */
+ bool pev_ready;
+ union {
+ struct perf_probe_event *ppev;
+ struct perf_probe_event pev;
+ };
+};
+
+static void
+bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
+ void *_priv)
+{
+ struct bpf_prog_priv *priv = _priv;
+
+ /* check if pev is initialized */
+ if (priv && priv->pev_ready)
+ clear_perf_probe_event(&priv->pev);
+ free(priv);
+}
+
static int
config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
{
+ struct bpf_prog_priv *priv = NULL;
const char *config_str;
int err;

@@ -74,14 +100,58 @@ config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)

pr_debug("bpf: config '%s' is ok\n", config_str);

+ priv = calloc(1, sizeof(*priv));
+ if (!priv) {
+ pr_debug("bpf: failed to alloc memory\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ /*
+ * At this very early stage, the tevs inside pev are not ready.
+ * They become usable after add_perf_probe_events() is called.
+ * Set pev_ready to false so further accesses read priv->ppev
+ * only.
+ */
+ priv->pev_ready = false;
+ priv->ppev = pev;
+
+ err = bpf_program__set_private(prog, priv,
+ bpf_prog_priv__clear);
+ if (err) {
+ pr_debug("bpf: set program private failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
return 0;

errout:
if (pev)
clear_perf_probe_event(pev);
+ if (priv)
+ free(priv);
return err;
}

+static int
+sync_bpf_program_pev(struct bpf_program *prog)
+{
+ int err;
+ struct bpf_prog_priv *priv;
+ struct perf_probe_event *ppev;
+
+ err = bpf_program__get_private(prog, (void **)&priv);
+ if (err || !priv || priv->pev_ready) {
+ pr_debug("Internal error: sync_bpf_program_pev\n");
+ return -EINVAL;
+ }
+
+ ppev = priv->ppev;
+ memcpy(&priv->pev, ppev, sizeof(*ppev));
+ priv->pev_ready = true;
+ return 0;
+}
+
int bpf__prepare_load(const char *filename)
{
struct bpf_object *obj;
@@ -172,11 +242,27 @@ int bpf__probe(void)
/* add_perf_probe_events return negative when fail */
if (err < 0) {
pr_debug("bpf probe: failed to probe events\n");
+ goto out;
} else
is_probed = true;
+
+ /*
+ * After add_perf_probe_events, 'struct perf_probe_event' is ready.
+ * Only now do copying into each program's priv->pev field and
+ * freeing the big array allocated before become safe.
+ */
+ bpf_object__for_each_safe(obj, tmp) {
+ bpf_object__for_each_program(prog, obj) {
+ err = sync_bpf_program_pev(prog);
+ if (err)
+ goto out;
+ }
+ }
out:
- while (nr_events > 0)
- clear_perf_probe_event(&pevs[--nr_events]);
+ /*
+ * Don't call clear_perf_probe_event() for entries of pevs:
+ * their pointers are now owned by the programs' private fields.
+ */
free(pevs);
return err < 0 ? err : 0;
}
--
2.1.0

2015-08-28 07:07:18

by Wang Nan

[permalink] [raw]
Subject: [PATCH 10/32] perf record: Load all eBPF object into kernel

This patch utilizes bpf_object__load() provided by libbpf to load all
objects into the kernel.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 15 +++++++++++++++
tools/perf/util/bpf-loader.c | 28 ++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 10 ++++++++++
3 files changed, 53 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 8833186..c335ac5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1158,6 +1158,21 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_symbol_exit;
}

+ /*
+ * bpf__probe() also calls symbol__init() if there are probe
+ * events in bpf objects, so calling symbol__exit() on failure
+ * is safe. If there is no probe event, bpf__load() always
+ * succeeds.
+ */
+ err = bpf__load();
+ if (err) {
+ pr_err("Loading BPF programs failed:\n");
+
+ bpf__strerror_load(err, errbuf, sizeof(errbuf));
+ pr_err("\t%s\n", errbuf);
+ goto out_symbol_exit;
+ }
+
symbol__init(NULL);

if (symbol_conf.kptr_restrict)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index ae23f6f..d63a594 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -267,6 +267,25 @@ out:
return err < 0 ? err : 0;
}

+int bpf__load(void)
+{
+ struct bpf_object *obj, *tmp;
+ int err = 0;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ err = bpf_object__load(obj);
+ if (err) {
+ pr_debug("bpf: load objects failed\n");
+ goto errout;
+ }
+ }
+ return 0;
+errout:
+ bpf_object__for_each_safe(obj, tmp)
+ bpf_object__unload(obj);
+ return err;
+}
+
#define bpf__strerror_head(err, buf, size) \
char sbuf[STRERR_BUFSIZE], *emsg;\
if (!size)\
@@ -309,3 +328,12 @@ int bpf__strerror_probe(int err, char *buf, size_t size)
bpf__strerror_end(buf, size);
return 0;
}
+
+int bpf__strerror_load(int err, char *buf, size_t size)
+{
+ bpf__strerror_head(err, buf, size);
+ bpf__strerror_entry(EINVAL, "%s: add -v to see details. Is CONFIG_BPF_SYSCALL enabled in your kernel?",
+ emsg)
+ bpf__strerror_end(buf, size);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 6b09a85..4d7552e 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -19,6 +19,9 @@ int bpf__probe(void);
int bpf__unprobe(void);
int bpf__strerror_probe(int err, char *buf, size_t size);

+int bpf__load(void);
+int bpf__strerror_load(int err, char *buf, size_t size);
+
void bpf__clear(void);
#else
static inline int bpf__prepare_load(const char *filename __maybe_unused)
@@ -29,6 +32,7 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused)

static inline int bpf__probe(void) { return 0; }
static inline int bpf__unprobe(void) { return 0; }
+static inline int bpf__load(void) { return 0; }
static inline void bpf__clear(void) { }

static inline int
@@ -56,5 +60,11 @@ static inline int bpf__strerror_probe(int err __maybe_unused,
{
return __bpf_strerror(buf, size);
}
+
+static inline int bpf__strerror_load(int err __maybe_unused,
+ char *buf, size_t size)
+{
+ return __bpf_strerror(buf, size);
+}
#endif
#endif
--
2.1.0

2015-08-28 07:07:16

by Wang Nan

[permalink] [raw]
Subject: [PATCH 11/32] perf tools: Add bpf_fd field to evsel and config it

This patch adds a bpf_fd field to 'struct perf_evsel' and introduces a
method to configure it. In bpf-loader, a bpf__foreach_tev() function is
added, which calls a callback function for each 'struct
probe_trace_event' of each BPF program, passing the program's file
descriptor along. In evlist.c, perf_evlist__add_bpf() is introduced to
add all BPF events into the evlist. The event names are taken from the
probe_trace_event structures. 'perf record' calls
perf_evlist__add_bpf().

Since bpf-loader.c will not be built if libbpf is turned off, an empty
bpf__foreach_tev() is defined in bpf-loader.h to avoid a compiling
error.

This patch iterates over 'struct probe_trace_event' instead of 'struct
perf_probe_event' during the loop, for further patches which will
generate multiple instances from one BPF program and install them onto
different 'struct probe_trace_event'.
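
As an illustration (not part of this patch), a minimal callback
matching bpf_prog_iter_callback_t could simply count the events it is
handed:

  static int count_tev(struct probe_trace_event *tev, int fd, void *arg)
  {
          int *cnt = arg;

          pr_debug("%s:%s is served by program fd %d\n",
                   tev->group, tev->event, fd);
          (*cnt)++;
          return 0;       /* a non-zero return stops the iteration */
  }

  ...
  int nr_tevs = 0;

  bpf__foreach_tev(count_tev, &nr_tevs);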

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 6 ++++++
tools/perf/util/bpf-loader.c | 41 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 13 +++++++++++++
tools/perf/util/evlist.c | 41 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/evlist.h | 1 +
tools/perf/util/evsel.c | 1 +
tools/perf/util/evsel.h | 1 +
7 files changed, 104 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index c335ac5..5051d3b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1173,6 +1173,12 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
goto out_symbol_exit;
}

+ err = perf_evlist__add_bpf(rec->evlist);
+ if (err < 0) {
+ pr_err("Failed to add events from BPF object(s)\n");
+ goto out_symbol_exit;
+ }
+
symbol__init(NULL);

if (symbol_conf.kptr_restrict)
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index d63a594..126aa71 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -286,6 +286,47 @@ errout:
return err;
}

+int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
+{
+ struct bpf_object *obj, *tmp;
+ struct bpf_program *prog;
+ int err;
+
+ bpf_object__for_each_safe(obj, tmp) {
+ bpf_object__for_each_program(prog, obj) {
+ struct probe_trace_event *tev;
+ struct perf_probe_event *pev;
+ struct bpf_prog_priv *priv;
+ int i, fd;
+
+ err = bpf_program__get_private(prog,
+ (void **)&priv);
+ if (err || !priv) {
+ pr_debug("bpf: failed to get private field\n");
+ return -EINVAL;
+ }
+
+ pev = &priv->pev;
+ for (i = 0; i < pev->ntevs; i++) {
+ tev = &pev->tevs[i];
+
+ fd = bpf_program__fd(prog);
+ if (fd < 0) {
+ pr_debug("bpf: failed to get file descriptor\n");
+ return fd;
+ }
+
+ err = func(tev, fd, arg);
+ if (err) {
+ pr_debug("bpf: call back failed, stop iterate\n");
+ return err;
+ }
+ }
+ }
+ }
+ return 0;
+}
+
#define bpf__strerror_head(err, buf, size) \
char sbuf[STRERR_BUFSIZE], *emsg;\
if (!size)\
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 4d7552e..34656f8 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -7,10 +7,14 @@

#include <linux/compiler.h>
#include <string.h>
+#include "probe-event.h"
#include "debug.h"

#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"

+typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
+ int fd, void *arg);
+
#ifdef HAVE_LIBBPF_SUPPORT
int bpf__prepare_load(const char *filename);
int bpf__strerror_prepare_load(const char *filename, int err,
@@ -23,6 +27,8 @@ int bpf__load(void);
int bpf__strerror_load(int err, char *buf, size_t size);

void bpf__clear(void);
+
+int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg);
#else
static inline int bpf__prepare_load(const char *filename __maybe_unused)
{
@@ -36,6 +42,13 @@ static inline int bpf__load(void) { return 0; }
static inline void bpf__clear(void) { }

static inline int
+bpf__foreach_tev(bpf_prog_iter_callback_t func __maybe_unused,
+ void *arg __maybe_unused)
+{
+ return 0;
+}
+
+static inline int
__bpf_strerror(char *buf, size_t size)
{
if (!size)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 30fc327..3bedf64 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -14,6 +14,7 @@
#include "target.h"
#include "evlist.h"
#include "evsel.h"
+#include "bpf-loader.h"
#include "debug.h"
#include <unistd.h>

@@ -194,6 +195,46 @@ error:
return -ENOMEM;
}

+static int add_bpf_event(struct probe_trace_event *tev, int fd,
+ void *arg)
+{
+ struct perf_evlist *evlist = arg;
+ struct perf_evsel *pos;
+ struct list_head list;
+ int err, idx, entries;
+
+ pr_debug("add bpf event %s:%s and attach bpf program %d\n",
+ tev->group, tev->event, fd);
+ INIT_LIST_HEAD(&list);
+ idx = evlist->nr_entries;
+
+ pr_debug("adding %s:%s\n", tev->group, tev->event);
+ err = parse_events_add_tracepoint(&list, &idx, tev->group,
+ tev->event);
+ if (err) {
+ struct perf_evsel *evsel, *tmp;
+
+ pr_err("Failed to add BPF event %s:%s\n",
+ tev->group, tev->event);
+ list_for_each_entry_safe(evsel, tmp, &list, node) {
+ list_del(&evsel->node);
+ perf_evsel__delete(evsel);
+ }
+ return -EINVAL;
+ }
+
+ list_for_each_entry(pos, &list, node)
+ pos->bpf_fd = fd;
+ entries = idx - evlist->nr_entries;
+ perf_evlist__splice_list_tail(evlist, &list, entries);
+ return 0;
+}
+
+int perf_evlist__add_bpf(struct perf_evlist *evlist)
+{
+ return bpf__foreach_tev(add_bpf_event, evlist);
+}
+
static int perf_evlist__add_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs)
{
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index df4820e..779aa27 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -72,6 +72,7 @@ void perf_evlist__delete(struct perf_evlist *evlist);

void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry);
int perf_evlist__add_default(struct perf_evlist *evlist);
+int perf_evlist__add_bpf(struct perf_evlist *evlist);
int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs);

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 35947f5..77a425f 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -206,6 +206,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->leader = evsel;
evsel->unit = "";
evsel->scale = 1.0;
+ evsel->bpf_fd = -1;
INIT_LIST_HEAD(&evsel->node);
INIT_LIST_HEAD(&evsel->config_terms);
perf_evsel__object.init(evsel);
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 443995b..9618099 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -116,6 +116,7 @@ struct perf_evsel {
bool cmdline_group_boundary;
bool is_dummy;
struct list_head config_terms;
+ int bpf_fd;
};

union u64_swap {
--
2.1.0

2015-08-28 07:13:36

by Wang Nan

[permalink] [raw]
Subject: [PATCH 12/32] perf tools: Allow filter option to be applied to bpf object

Before this patch, --filter options can't be applied to events coming
from a BPF object. For example, the following command:

# perf record -e cycles -e test_bpf.o --exclude-perf -a sleep 1

doesn't apply '--exclude-perf' to the events in test_bpf.o. Instead,
the filter will be applied to the 'cycles' event. This is caused by the
delayed manner of adding real BPF events: because all BPF probing
points are probed by one call, we can't add real events until all BPF
objects are collected. In the previous patch (perf tools: Enable
passing bpf object file to --event), nothing is appended to the evlist.

This patch fixes that by utilizing the dummy event linked during
parse_events(). Filter settings go to the dummy event, and are synced
with the real events in add_bpf_event().
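
In other words (an illustration, not part of the patch), for the
command above the intended filter placement is:

  # perf record -e cycles -e test_bpf.o --exclude-perf -a sleep 1
  #
  # before: --exclude-perf is applied to 'cycles'            (wrong)
  # after:  --exclude-perf is stored in the dummy evsel named
  #         'test_bpf.o' and copied by sync_with_dummy() to
  #         every perf_bpf_probe:* evsel from that object    (intended)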

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
---
tools/perf/builtin-record.c | 6 ++++-
tools/perf/util/bpf-loader.c | 8 ++++++-
tools/perf/util/bpf-loader.h | 2 ++
tools/perf/util/evlist.c | 53 +++++++++++++++++++++++++++++++++++++++++---
4 files changed, 64 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5051d3b..fd56a5b 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1113,7 +1113,6 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)

argc = parse_options(argc, argv, record_options, record_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
- perf_evlist__purge_dummy(rec->evlist);

if (!argc && target__none(&rec->opts.target))
usage_with_options(record_usage, record_options);
@@ -1178,6 +1177,11 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
pr_err("Failed to add events from BPF object(s)\n");
goto out_symbol_exit;
}
+ /*
+ * Only now can we purge the dummy events: filter options have
+ * been attached to the real events by perf_evlist__add_bpf().
+ */
+ perf_evlist__purge_dummy(rec->evlist);

symbol__init(NULL);

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 126aa71..c3bc0a8 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -293,6 +293,12 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
int err;

bpf_object__for_each_safe(obj, tmp) {
+ const char *obj_name;
+
+ obj_name = bpf_object__get_name(obj);
+ if (!obj_name)
+ obj_name = "[unknown]";
+
bpf_object__for_each_program(prog, obj) {
struct probe_trace_event *tev;
struct perf_probe_event *pev;
@@ -316,7 +322,7 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
return fd;
}

- err = func(tev, fd, arg);
+ err = func(tev, obj_name, fd, arg);
if (err) {
pr_debug("bpf: call back failed, stop iterate\n");
return err;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 34656f8..323e664 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -6,6 +6,7 @@
#define __BPF_LOADER_H

#include <linux/compiler.h>
+#include <linux/perf_event.h>
#include <string.h>
#include "probe-event.h"
#include "debug.h"
@@ -13,6 +14,7 @@
#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"

typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
+ const char *obj_name,
int fd, void *arg);

#ifdef HAVE_LIBBPF_SUPPORT
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3bedf64..21a11c9 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -195,7 +195,45 @@ error:
return -ENOMEM;
}

-static int add_bpf_event(struct probe_trace_event *tev, int fd,
+static void
+sync_with_dummy(struct perf_evlist *evlist, const char *obj_name,
+ struct list_head *list)
+{
+ struct perf_evsel *dummy_evsel, *pos;
+ const char *filter;
+ bool found = false;
+ int err;
+
+ evlist__for_each(evlist, dummy_evsel) {
+ if (!perf_evsel__is_dummy(dummy_evsel))
+ continue;
+
+ if (strcmp(dummy_evsel->name, obj_name) == 0) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ pr_debug("Failed to find dummy event of '%s'\n",
+ obj_name);
+ return;
+ }
+
+ filter = dummy_evsel->filter;
+ if (!filter)
+ return;
+
+ list_for_each_entry(pos, list, node) {
+ err = perf_evsel__set_filter(pos, filter);
+ if (err)
+ pr_debug("Failed to set filter '%s' to evsel %s\n",
+ filter, pos->name);
+ }
+}
+
+static int add_bpf_event(struct probe_trace_event *tev,
+ const char *obj_name, int fd,
void *arg)
{
struct perf_evlist *evlist = arg;
@@ -203,8 +241,8 @@ static int add_bpf_event(struct probe_trace_event *tev, int fd,
struct list_head list;
int err, idx, entries;

- pr_debug("add bpf event %s:%s and attach bpf program %d\n",
- tev->group, tev->event, fd);
+ pr_debug("add bpf event %s:%s and attach bpf program %d (from %s)\n",
+ tev->group, tev->event, fd, obj_name);
INIT_LIST_HEAD(&list);
idx = evlist->nr_entries;

@@ -226,6 +264,15 @@ static int add_bpf_event(struct probe_trace_event *tev, int fd,
list_for_each_entry(pos, &list, node)
pos->bpf_fd = fd;
entries = idx - evlist->nr_entries;
+
+ sync_with_dummy(evlist, obj_name, &list);
+
+ /*
+ * Currently we don't need to link those new events at the
+ * same place where the dummy node resides, because the order
+ * of events in the cmdline won't be used after
+ * 'perf_evlist__add_bpf'.
+ */
perf_evlist__splice_list_tail(evlist, &list, entries);
return 0;
}
--
2.1.0

2015-08-28 07:12:12

by Wang Nan

[permalink] [raw]
Subject: [PATCH 13/32] perf tools: Attach eBPF program to perf event

This is the final patch which makes the basic BPF filter work. After
applying this patch, users can use a BPF filter like:

# perf record --event ./hello_world.c ls

In this patch the PERF_EVENT_IOC_SET_BPF ioctl is used to attach an
eBPF program to a newly created perf event. The file descriptor of the
eBPF program is passed to perf record by the previous patches and
stored in evsel->bpf_fd.

It is possible that different perf events are created for one kprobe
event on different CPUs. In this case, when trying to call the ioctl,
EEXIST will be returned. This patch doesn't treat it as an error.
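
Stripped of the surrounding evsel code, the attach step is essentially
(a sketch; the actual change lives in tools/perf/util/evsel.c, shown
below):

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/perf_event.h>

  /* evt_fd comes from sys_perf_event_open(), bpf_fd is the fd of
   * the loaded eBPF program */
  if (ioctl(evt_fd, PERF_EVENT_IOC_SET_BPF, bpf_fd) && errno != EEXIST)
          return -EINVAL; /* a real failure, not a per-CPU duplicate */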

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/evsel.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 77a425f..51132d3 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1363,6 +1363,22 @@ retry_open:
err);
goto try_fallback;
}
+
+ if (evsel->bpf_fd >= 0) {
+ int evt_fd = FD(evsel, cpu, thread);
+ int bpf_fd = evsel->bpf_fd;
+
+ err = ioctl(evt_fd,
+ PERF_EVENT_IOC_SET_BPF,
+ bpf_fd);
+ if (err && errno != EEXIST) {
+ pr_err("failed to attach bpf fd %d: %s\n",
+ bpf_fd, strerror(errno));
+ err = -EINVAL;
+ goto out_close;
+ }
+ }
+
set_rlimit = NO_CHANGE;

/*
--
2.1.0

2015-08-28 07:07:21

by Wang Nan

[permalink] [raw]
Subject: [PATCH 14/32] perf tools: Suppress probing messages when probing by BPF loading

This patch suppresses the messages output by add_perf_probe_events()
and del_perf_probe_events() when they are triggered by BPF loading.
Before this patch, when using 'perf record' with a BPF object/source
as event selector, the following messages were output:

Added new event:
perf_bpf_probe:lock_page_ret (on __lock_page%return)
You can now use it in all perf tools, such as:
perf record -e perf_bpf_probe:lock_page_ret -aR sleep 1
...
Removed event: perf_bpf_probe:lock_page_ret

This is misleading, especially 'use it in all perf tools', because the
events will be removed after 'perf record' exits.

In this patch, a 'silent' field is added to probe_conf to control the
output. bpf__{,un}probe() set it to true when calling
{add,del}_perf_probe_events().

Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 6 ++++++
tools/perf/util/probe-event.c | 17 ++++++++++++-----
tools/perf/util/probe-event.h | 1 +
tools/perf/util/probe-file.c | 5 ++++-
4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index c3bc0a8..77eeb99 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -188,6 +188,7 @@ static bool is_probed;
int bpf__unprobe(void)
{
struct strfilter *delfilter;
+ bool old_silent = probe_conf.silent;
int ret;

if (!is_probed)
@@ -199,7 +200,9 @@ int bpf__unprobe(void)
return -ENOMEM;
}

+ probe_conf.silent = true;
ret = del_perf_probe_events(delfilter);
+ probe_conf.silent = old_silent;
strfilter__delete(delfilter);
if (ret < 0 && is_probed)
pr_debug("Error: failed to delete events: %s\n",
@@ -215,6 +218,7 @@ int bpf__probe(void)
struct bpf_object *obj, *tmp;
struct bpf_program *prog;
struct perf_probe_event *pevs;
+ bool old_silent = probe_conf.silent;

pevs = calloc(MAX_PROBES, sizeof(pevs[0]));
if (!pevs)
@@ -235,9 +239,11 @@ int bpf__probe(void)
}
}

+ probe_conf.silent = true;
probe_conf.max_probes = MAX_PROBES;
/* Let add_perf_probe_events generates probe_trace_event (tevs) */
err = add_perf_probe_events(pevs, nr_events, false);
+ probe_conf.silent = old_silent;

/* add_perf_probe_events return negative when fail */
if (err < 0) {
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 57a7bae..e720913 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -52,7 +52,9 @@
#define PERFPROBE_GROUP "probe"

bool probe_event_dry_run; /* Dry run flag */
-struct probe_conf probe_conf;
+struct probe_conf probe_conf = {
+ .silent = false,
+};

#define semantic_error(msg ...) pr_err("Semantic error :" msg)

@@ -2192,10 +2194,12 @@ static int show_perf_probe_event(const char *group, const char *event,

ret = perf_probe_event__sprintf(group, event, pev, module, &buf);
if (ret >= 0) {
- if (use_stdout)
+ if (use_stdout && !probe_conf.silent)
printf("%s\n", buf.buf);
- else
+ else if (!probe_conf.silent)
pr_info("%s\n", buf.buf);
+ else
+ pr_debug("%s\n", buf.buf);
}
strbuf_release(&buf);

@@ -2418,7 +2422,10 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
}

ret = 0;
- pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
+ if (!probe_conf.silent)
+ pr_info("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
+ else
+ pr_debug("Added new event%s\n", (ntevs > 1) ? "s:" : ":");
for (i = 0; i < ntevs; i++) {
tev = &tevs[i];
/* Skip if the symbol is out of .text or blacklisted */
@@ -2454,7 +2461,7 @@ static int __add_probe_trace_events(struct perf_probe_event *pev,
warn_uprobe_event_compat(tev);

/* Note that it is possible to skip all events because of blacklist */
- if (ret >= 0 && event) {
+ if (ret >= 0 && event && !probe_conf.silent) {
/* Show how to use the event. */
pr_info("\nYou can now use it in all perf tools, such as:\n\n");
pr_info("\tperf record -e %s:%s -aR sleep 1\n\n", group, event);
diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 915f0d8..3ab9c3e 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -13,6 +13,7 @@ struct probe_conf {
bool force_add;
bool no_inlines;
int max_probes;
+ bool silent;
};
extern struct probe_conf probe_conf;
extern bool probe_event_dry_run;
diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index bbb2437..db7bd4c 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -267,7 +267,10 @@ static int __del_trace_probe_event(int fd, struct str_node *ent)
goto error;
}

- pr_info("Removed event: %s\n", ent->s);
+ if (!probe_conf.silent)
+ pr_info("Removed event: %s\n", ent->s);
+ else
+ pr_debug("Removed event: %s\n", ent->s);
return 0;
error:
pr_warning("Failed to delete event: %s\n",
--
2.1.0

2015-08-28 07:12:42

by Wang Nan

[permalink] [raw]
Subject: [PATCH 15/32] perf record: Add clang options for compiling BPF scripts

Although the previous patch allows setting BPF compiler related options
in perfconfig, in some ad-hoc situations it is still necessary to pass
options through the cmdline. This patch introduces 2 options to 'perf
record' for this purpose: --clang-path and --clang-opt.
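
For example (the paths and values here are hypothetical), to compile a
scriptlet with a specific clang binary and an extra macro definition:

# perf record --clang-path=/opt/llvm/bin/clang \
      --clang-opt='-DLINUX_VERSION_CODE=0x40200' \
      --event ./test_bpf.c -a sleep 1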

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index fd56a5b..212718c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -30,6 +30,7 @@
#include "util/auxtrace.h"
#include "util/parse-branch-options.h"
#include "util/bpf-loader.h"
+#include "util/llvm-utils.h"

#include <unistd.h>
#include <sched.h>
@@ -1094,6 +1095,12 @@ struct option __record_options[] = {
"per thread proc mmap processing timeout in ms"),
OPT_BOOLEAN(0, "switch-events", &record.opts.record_switch_events,
"Record context switch events"),
+#ifdef HAVE_LIBBPF_SUPPORT
+ OPT_STRING(0, "clang-path", &llvm_param.clang_path, "clang path",
+ "clang binary to use for compiling BPF scriptlets"),
+ OPT_STRING(0, "clang-opt", &llvm_param.clang_opt, "clang options",
+ "options passed to clang when compiling BPF scriptlets"),
+#endif
OPT_END()
};

--
2.1.0

2015-08-28 07:13:12

by Wang Nan

[permalink] [raw]
Subject: [PATCH 16/32] perf tools: Infrastructure for compiling scriptlets when passing '.c' to --event

This patch provides infrastructure for passing source files to --event
directly using:

# perf record --event bpf-file.c command

This patch does the following:

1) Allow passing a '.c' file to '--event'. parse_events_load_bpf() is
expanded to let the caller tell it whether the passed file is a
source file or an object.

2) llvm__compile_bpf() is called to compile the '.c' file, and the
result is saved into memory. bpf_object__open_buffer() is then used
to load the in-memory object.

It introduces bpf-script-example.c so we can manually test it:

# perf record --clang-opt "-DLINUX_VERSION_CODE=0x40200" --event ./bpf-script-example.c sleep 1

Note that '--clang-opt' must be put before '--event'.

Further patches will merge it into a testcase so it can be tested
automatically.

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: He Kuang <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
[ wangnan: Pass name of source file to bpf_object__open_buffer(). ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
---
tools/perf/tests/bpf-script-example.c | 44 +++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.c | 25 +++++++++++++++-----
tools/perf/util/bpf-loader.h | 10 ++++----
tools/perf/util/parse-events.c | 8 +++----
tools/perf/util/parse-events.h | 3 ++-
tools/perf/util/parse-events.l | 3 +++
tools/perf/util/parse-events.y | 15 ++++++++++--
7 files changed, 91 insertions(+), 17 deletions(-)
create mode 100644 tools/perf/tests/bpf-script-example.c

diff --git a/tools/perf/tests/bpf-script-example.c b/tools/perf/tests/bpf-script-example.c
new file mode 100644
index 0000000..410a70b
--- /dev/null
+++ b/tools/perf/tests/bpf-script-example.c
@@ -0,0 +1,44 @@
+#ifndef LINUX_VERSION_CODE
+# error Need LINUX_VERSION_CODE
+# error Example: for 4.2 kernel, put 'clang-opt="-DLINUX_VERSION_CODE=0x40200" into llvm section of ~/.perfconfig'
+#endif
+#define BPF_ANY 0
+#define BPF_MAP_TYPE_ARRAY 2
+#define BPF_FUNC_map_lookup_elem 1
+#define BPF_FUNC_map_update_elem 2
+
+static void *(*bpf_map_lookup_elem)(void *map, void *key) =
+ (void *) BPF_FUNC_map_lookup_elem;
+static void *(*bpf_map_update_elem)(void *map, void *key, void *value, int flags) =
+ (void *) BPF_FUNC_map_update_elem;
+
+struct bpf_map_def {
+ unsigned int type;
+ unsigned int key_size;
+ unsigned int value_size;
+ unsigned int max_entries;
+};
+
+#define SEC(NAME) __attribute__((section(NAME), used))
+struct bpf_map_def SEC("maps") flip_table = {
+ .type = BPF_MAP_TYPE_ARRAY,
+ .key_size = sizeof(int),
+ .value_size = sizeof(int),
+ .max_entries = 1,
+};
+
+SEC("func=sys_epoll_pwait")
+int bpf_func__sys_epoll_pwait(void *ctx)
+{
+ int ind = 0;
+ int *flag = bpf_map_lookup_elem(&flip_table, &ind);
+ int new_flag;
+ if (!flag)
+ return 0;
+ /* flip flag and store back */
+ new_flag = !*flag;
+ bpf_map_update_elem(&flip_table, &ind, &new_flag, BPF_ANY);
+ return new_flag;
+}
+char _license[] SEC("license") = "GPL";
+int _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 77eeb99..c2aafe2 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -11,6 +11,7 @@
#include "bpf-loader.h"
#include "probe-event.h"
#include "probe-finder.h"
+#include "llvm-utils.h"

#define DEFINE_PRINT_FN(name, level) \
static int libbpf_##name(const char *fmt, ...) \
@@ -152,16 +153,28 @@ sync_bpf_program_pev(struct bpf_program *prog)
return 0;
}

-int bpf__prepare_load(const char *filename)
+int bpf__prepare_load(const char *filename, bool source)
{
struct bpf_object *obj;
+ int err;

if (!libbpf_initialized)
libbpf_set_print(libbpf_warning,
libbpf_info,
libbpf_debug);

- obj = bpf_object__open(filename);
+ if (source) {
+ void *obj_buf;
+ size_t obj_buf_sz;
+
+ err = llvm__compile_bpf(filename, &obj_buf, &obj_buf_sz);
+ if (err)
+ return err;
+ obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
+ free(obj_buf);
+ } else
+ obj = bpf_object__open(filename);
+
if (!obj) {
pr_debug("bpf: failed to load %s\n", filename);
return -EINVAL;
@@ -361,12 +374,12 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
}\
buf[size - 1] = '\0';

-int bpf__strerror_prepare_load(const char *filename, int err,
- char *buf, size_t size)
+int bpf__strerror_prepare_load(const char *filename, bool source,
+ int err, char *buf, size_t size)
{
bpf__strerror_head(err, buf, size);
- bpf__strerror_entry(EINVAL, "%s: BPF object file '%s' is invalid",
- emsg, filename)
+ bpf__strerror_entry(EINVAL, "%s: BPF %s file '%s' is invalid",
+ emsg, source ? "source" : "object", filename);
bpf__strerror_end(buf, size);
return 0;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 323e664..97aed65 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -18,9 +18,9 @@ typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
int fd, void *arg);

#ifdef HAVE_LIBBPF_SUPPORT
-int bpf__prepare_load(const char *filename);
-int bpf__strerror_prepare_load(const char *filename, int err,
- char *buf, size_t size);
+int bpf__prepare_load(const char *filename, bool source);
+int bpf__strerror_prepare_load(const char *filename, bool source,
+ int err, char *buf, size_t size);
int bpf__probe(void);
int bpf__unprobe(void);
int bpf__strerror_probe(int err, char *buf, size_t size);
@@ -32,7 +32,8 @@ void bpf__clear(void);

int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg);
#else
-static inline int bpf__prepare_load(const char *filename __maybe_unused)
+static inline int bpf__prepare_load(const char *filename __maybe_unused,
+ bool source __maybe_unused)
{
pr_debug("ERROR: eBPF object loading is disabled during compiling.\n");
return -1;
@@ -64,6 +65,7 @@ __bpf_strerror(char *buf, size_t size)

static inline int
bpf__strerror_prepare_load(const char *filename __maybe_unused,
+ bool source __maybe_unused,
int err __maybe_unused,
char *buf, size_t size)
{
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4343433..08b277b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -483,8 +483,8 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
}

int parse_events_load_bpf(struct parse_events_evlist *data,
- struct list_head *list,
- char *bpf_file_name)
+ struct list_head *list __maybe_unused,
+ char *bpf_file_name, bool source)
{
int err;
char errbuf[BUFSIZ];
@@ -500,9 +500,9 @@ int parse_events_load_bpf(struct parse_events_evlist *data,
* problem. After that probe events file by file is possible.
* However, probing cost is still need to be considered.
*/
- err = bpf__prepare_load(bpf_file_name);
+ err = bpf__prepare_load(bpf_file_name, source);
if (err) {
- bpf__strerror_prepare_load(bpf_file_name, err,
+ bpf__strerror_prepare_load(bpf_file_name, source, err,
errbuf, sizeof(errbuf));
data->error->str = strdup(errbuf);
data->error->help = strdup("(add -v to see detail)");
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index 3652387..728a424 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -121,7 +121,8 @@ int parse_events_add_tracepoint(struct list_head *list, int *idx,
char *sys, char *event);
int parse_events_load_bpf(struct parse_events_evlist *data,
struct list_head *list,
- char *bpf_file_name);
+ char *bpf_file_name,
+ bool source);
int parse_events_add_numeric(struct parse_events_evlist *data,
struct list_head *list,
u32 type, u64 config,
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 22e8f93..8033890 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -116,6 +116,7 @@ group [^,{}/]*[{][^}]*[}][^,{}/]*
event_pmu [^,{}/]+[/][^/]*[/][^,{}/]*
event [^,{}/]+
bpf_object .*\.(o|bpf)
+bpf_source .*\.c

num_dec [0-9]+
num_hex 0x[a-fA-F0-9]+
@@ -161,6 +162,7 @@ modifier_bp [rwx]{1,3}

{event_pmu} |
{bpf_object} |
+{bpf_source} |
{event} {
BEGIN(INITIAL);
REWIND(1);
@@ -267,6 +269,7 @@ r{num_raw_hex} { return raw(yyscanner); }

{modifier_event} { return str(yyscanner, PE_MODIFIER_EVENT); }
{bpf_object} { return str(yyscanner, PE_BPF_OBJECT); }
+{bpf_source} { return str(yyscanner, PE_BPF_SOURCE); }
{name} { return pmu_str_check(yyscanner); }
"/" { BEGIN(config); return '/'; }
- { return '-'; }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 3ee3a32..90d2458 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -42,7 +42,7 @@ static inc_group_count(struct list_head *list,
%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_RAW PE_TERM
%token PE_EVENT_NAME
%token PE_NAME
-%token PE_BPF_OBJECT
+%token PE_BPF_OBJECT PE_BPF_SOURCE
%token PE_MODIFIER_EVENT PE_MODIFIER_BP
%token PE_NAME_CACHE_TYPE PE_NAME_CACHE_OP_RESULT
%token PE_PREFIX_MEM PE_PREFIX_RAW PE_PREFIX_GROUP
@@ -55,6 +55,7 @@ static inc_group_count(struct list_head *list,
%type <num> PE_TERM
%type <str> PE_NAME
%type <str> PE_BPF_OBJECT
+%type <str> PE_BPF_SOURCE
%type <str> PE_NAME_CACHE_TYPE
%type <str> PE_NAME_CACHE_OP_RESULT
%type <str> PE_MODIFIER_EVENT
@@ -431,7 +432,17 @@ PE_BPF_OBJECT
struct list_head *list;

ALLOC_LIST(list);
- ABORT_ON(parse_events_load_bpf(data, list, $1));
+ ABORT_ON(parse_events_load_bpf(data, list, $1, false));
+ $$ = list;
+}
+|
+PE_BPF_SOURCE
+{
+ struct parse_events_evlist *data = _data;
+ struct list_head *list;
+
+ ALLOC_LIST(list);
+ ABORT_ON(parse_events_load_bpf(data, list, $1, true));
$$ = list;
}

--
2.1.0

2015-08-28 07:07:26

by Wang Nan

[permalink] [raw]
Subject: [PATCH 17/32] perf tests: Enforce LLVM test for BPF test

This patch replaces the original toy BPF program with the previously
introduced bpf-script-example.c, dynamically embedding it into
'llvm-src.c'.

The newly introduced BPF script attaches a program at
'sys_epoll_pwait()' and collects samples from half of its executions.
perf itself never uses that syscall, so further tests can verify their
results with it.

Since BPF programs require the LINUX_VERSION_CODE of the running
kernel, this patch computes that code from uname.
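
For example (a worked instance of the formula used below, not from the
patch), a running 4.2.0 kernel yields:

  version_code = (4 << 16) + (2 << 8) + 0;    /* == 0x40200 */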

Since the resulting BPF object is useful for further testcases, this
patch introduces 'prepare' and 'cleanup' methods to tests, and makes
test__llvm() create a MAP_SHARED memory array to hold the resulting
object.

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/tests/Build | 9 +++-
tools/perf/tests/builtin-test.c | 8 ++++
tools/perf/tests/llvm.c | 104 +++++++++++++++++++++++++++++++++++-----
tools/perf/tests/llvm.h | 14 ++++++
tools/perf/tests/tests.h | 2 +
5 files changed, 123 insertions(+), 14 deletions(-)
create mode 100644 tools/perf/tests/llvm.h

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index c1518bd..8c98409 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -32,7 +32,14 @@ perf-y += sample-parsing.o
perf-y += parse-no-sample-id-all.o
perf-y += kmod-path.o
perf-y += thread-map.o
-perf-y += llvm.o
+perf-y += llvm.o llvm-src.o
+
+$(OUTPUT)tests/llvm-src.c: tests/bpf-script-example.c
+ $(Q)echo '#include <tests/llvm.h>' > $@
+ $(Q)echo 'const char test_llvm__bpf_prog[] =' >> $@
+ $(Q)sed -e 's/"/\\"/g' -e 's/\(.*\)/"\1\\n"/g' $< >> $@
+ $(Q)echo ';' >> $@
+

perf-$(CONFIG_X86) += perf-time-to-tsc.o

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 136cd93..1a349e8 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -17,6 +17,8 @@
static struct test {
const char *desc;
int (*func)(void);
+ void (*prepare)(void);
+ void (*cleanup)(void);
} tests[] = {
{
.desc = "vmlinux symtab matches kallsyms",
@@ -177,6 +179,8 @@ static struct test {
{
.desc = "Test LLVM searching and compiling",
.func = test__llvm,
+ .prepare = test__llvm_prepare,
+ .cleanup = test__llvm_cleanup,
},
{
.func = NULL,
@@ -265,7 +269,11 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
}

pr_debug("\n--- start ---\n");
+ if (tests[curr].prepare)
+ tests[curr].prepare();
err = run_test(&tests[curr]);
+ if (tests[curr].cleanup)
+ tests[curr].cleanup();
pr_debug("---- end ----\n%s:", tests[curr].desc);

switch (err) {
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 52d5597..236bf39 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -1,9 +1,13 @@
#include <stdio.h>
+#include <sys/utsname.h>
#include <bpf/libbpf.h>
#include <util/llvm-utils.h>
#include <util/cache.h>
+#include <util/util.h>
+#include <sys/mman.h>
#include "tests.h"
#include "debug.h"
+#include "llvm.h"

static int perf_config_cb(const char *var, const char *val,
void *arg __maybe_unused)
@@ -11,16 +15,6 @@ static int perf_config_cb(const char *var, const char *val,
return perf_default_config(var, val, arg);
}

-/*
- * Randomly give it a "version" section since we don't really load it
- * into kernel
- */
-static const char test_bpf_prog[] =
- "__attribute__((section(\"do_fork\"), used)) "
- "int fork(void *ctx) {return 0;} "
- "char _license[] __attribute__((section(\"license\"), used)) = \"GPL\";"
- "int _version __attribute__((section(\"version\"), used)) = 0x40100;";
-
#ifdef HAVE_LIBBPF_SUPPORT
static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
{
@@ -41,12 +35,44 @@ static int test__bpf_parsing(void *obj_buf __maybe_unused,
}
#endif

+static char *
+compose_source(void)
+{
+ struct utsname utsname;
+ int version, patchlevel, sublevel, err;
+ unsigned long version_code;
+ char *code;
+
+ if (uname(&utsname))
+ return NULL;
+
+ err = sscanf(utsname.release, "%d.%d.%d",
+ &version, &patchlevel, &sublevel);
+ if (err != 3) {
+ fprintf(stderr, " (Can't get kernel version from uname '%s')",
+ utsname.release);
+ return NULL;
+ }
+
+ version_code = (version << 16) + (patchlevel << 8) + sublevel;
+ err = asprintf(&code, "#define LINUX_VERSION_CODE 0x%08lx\n%s",
+ version_code, test_llvm__bpf_prog);
+ if (err < 0)
+ return NULL;
+
+ return code;
+}
+
+#define SHARED_BUF_INIT_SIZE (1 << 20)
+struct test_llvm__bpf_result *p_test_llvm__bpf_result;
+
int test__llvm(void)
{
char *tmpl_new, *clang_opt_new;
void *obj_buf;
size_t obj_buf_sz;
int err, old_verbose;
+ char *source;

perf_config(perf_config_cb, NULL);

@@ -73,10 +99,22 @@ int test__llvm(void)
if (!llvm_param.clang_opt)
llvm_param.clang_opt = strdup("");

- err = asprintf(&tmpl_new, "echo '%s' | %s", test_bpf_prog,
- llvm_param.clang_bpf_cmd_template);
- if (err < 0)
+ source = compose_source();
+ if (!source) {
+ pr_err("Failed to compose source code\n");
+ return -1;
+ }
+
+ /* Quote __EOF__ so strings in source won't be expanded by shell */
+ err = asprintf(&tmpl_new, "cat << '__EOF__' | %s\n%s\n__EOF__\n",
+ llvm_param.clang_bpf_cmd_template, source);
+ free(source);
+ source = NULL;
+ if (err < 0) {
+ pr_err("Failed to alloc new template\n");
return -1;
+ }
+
err = asprintf(&clang_opt_new, "-xc %s", llvm_param.clang_opt);
if (err < 0)
return -1;
@@ -93,6 +131,46 @@ int test__llvm(void)
}

err = test__bpf_parsing(obj_buf, obj_buf_sz);
+ if (!err && p_test_llvm__bpf_result) {
+ if (obj_buf_sz > SHARED_BUF_INIT_SIZE) {
+ pr_err("Resulting object too large\n");
+ } else {
+ p_test_llvm__bpf_result->size = obj_buf_sz;
+ memcpy(p_test_llvm__bpf_result->object,
+ obj_buf, obj_buf_sz);
+ }
+ }
free(obj_buf);
return err;
}
+
+void test__llvm_prepare(void)
+{
+ p_test_llvm__bpf_result = mmap(NULL, SHARED_BUF_INIT_SIZE,
+ PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (p_test_llvm__bpf_result == MAP_FAILED) {
+ /* mmap() reports failure with MAP_FAILED, not NULL */
+ p_test_llvm__bpf_result = NULL;
+ return;
+ }
+ memset((void *)p_test_llvm__bpf_result, '\0', SHARED_BUF_INIT_SIZE);
+}
+
+void test__llvm_cleanup(void)
+{
+ unsigned long boundary, buf_end;
+
+ if (!p_test_llvm__bpf_result)
+ return;
+ if (p_test_llvm__bpf_result->size == 0) {
+ munmap((void *)p_test_llvm__bpf_result, SHARED_BUF_INIT_SIZE);
+ p_test_llvm__bpf_result = NULL;
+ return;
+ }
+
+ buf_end = (unsigned long)p_test_llvm__bpf_result + SHARED_BUF_INIT_SIZE;
+
+ boundary = (unsigned long)(p_test_llvm__bpf_result);
+ boundary += p_test_llvm__bpf_result->size;
+ boundary = (boundary + (page_size - 1)) &
+ (~((unsigned long)page_size - 1));
+ munmap((void *)boundary, buf_end - boundary);
+}
diff --git a/tools/perf/tests/llvm.h b/tools/perf/tests/llvm.h
new file mode 100644
index 0000000..1e89e46
--- /dev/null
+++ b/tools/perf/tests/llvm.h
@@ -0,0 +1,14 @@
+#ifndef PERF_TEST_LLVM_H
+#define PERF_TEST_LLVM_H
+
+#include <stddef.h> /* for size_t */
+
+struct test_llvm__bpf_result {
+ size_t size;
+ char object[];
+};
+
+extern struct test_llvm__bpf_result *p_test_llvm__bpf_result;
+extern const char test_llvm__bpf_prog[];
+
+#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index bf113a2..0d79f04 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -63,6 +63,8 @@ int test__fdarray__add(void);
int test__kmod_path__parse(void);
int test__thread_map(void);
int test__llvm(void);
+void test__llvm_prepare(void);
+void test__llvm_cleanup(void);

#if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
#ifdef HAVE_DWARF_UNWIND_SUPPORT
--
2.1.0

2015-08-28 07:11:32

by Wang Nan

[permalink] [raw]
Subject: [PATCH 18/32] perf test: Add 'perf test BPF'

This patch adds a BPF testcase for testing BPF event filtering.

By utilizing the result of 'perf test LLVM', this patch compiles the
eBPF sample program and then tests its ability. The BPF script in
'perf test LLVM' collects samples from half of the executions of
epoll_pwait(). This patch calls it 111 times, so the result should
contain (111 + 1) / 2 = 56 samples.

Signed-off-by: Wang Nan <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/tests/Build | 1 +
tools/perf/tests/bpf.c | 170 ++++++++++++++++++++++++++++++++++++++++
tools/perf/tests/builtin-test.c | 4 +
tools/perf/tests/llvm.c | 19 +++++
tools/perf/tests/llvm.h | 1 +
tools/perf/tests/tests.h | 1 +
tools/perf/util/bpf-loader.c | 14 ++++
tools/perf/util/bpf-loader.h | 8 ++
8 files changed, 218 insertions(+)
create mode 100644 tools/perf/tests/bpf.c

diff --git a/tools/perf/tests/Build b/tools/perf/tests/Build
index 8c98409..7ceb448 100644
--- a/tools/perf/tests/Build
+++ b/tools/perf/tests/Build
@@ -33,6 +33,7 @@ perf-y += parse-no-sample-id-all.o
perf-y += kmod-path.o
perf-y += thread-map.o
perf-y += llvm.o llvm-src.o
+perf-y += bpf.o

$(OUTPUT)tests/llvm-src.c: tests/bpf-script-example.c
$(Q)echo '#include <tests/llvm.h>' > $@
diff --git a/tools/perf/tests/bpf.c b/tools/perf/tests/bpf.c
new file mode 100644
index 0000000..6c238ca
--- /dev/null
+++ b/tools/perf/tests/bpf.c
@@ -0,0 +1,170 @@
+#include <stdio.h>
+#include <sys/epoll.h>
+#include <util/bpf-loader.h>
+#include <util/evlist.h>
+#include "tests.h"
+#include "llvm.h"
+#include "debug.h"
+#define NR_ITERS 111
+
+#ifdef HAVE_LIBBPF_SUPPORT
+
+static int epoll_pwait_loop(void)
+{
+ int i;
+
+ /* Should fail NR_ITERS times */
+ for (i = 0; i < NR_ITERS; i++)
+ epoll_pwait(-(i + 1), NULL, 0, 0, NULL);
+ return 0;
+}
+
+static int prepare_bpf(void *obj_buf, size_t obj_buf_sz)
+{
+ int err;
+ char errbuf[BUFSIZ];
+
+ err = bpf__prepare_load_buffer(obj_buf, obj_buf_sz, NULL);
+ if (err) {
+ bpf__strerror_prepare_load("[buffer]", false, err, errbuf,
+ sizeof(errbuf));
+ fprintf(stderr, " (%s)", errbuf);
+ return TEST_FAIL;
+ }
+
+ err = bpf__probe();
+ if (err) {
+ bpf__strerror_probe(err, errbuf, sizeof(errbuf));
+ fprintf(stderr, " (%s)", errbuf);
+ if (getuid() != 0)
+ fprintf(stderr, " (try run as root)");
+ return TEST_FAIL;
+ }
+
+ err = bpf__load();
+ if (err) {
+ bpf__strerror_load(err, errbuf, sizeof(errbuf));
+ fprintf(stderr, " (%s)", errbuf);
+ return TEST_FAIL;
+ }
+
+ return 0;
+}
+
+static int do_test(void)
+{
+ struct record_opts opts = {
+ .target = {
+ .uid = UINT_MAX,
+ .uses_mmap = true,
+ },
+ .freq = 0,
+ .mmap_pages = 256,
+ .default_interval = 1,
+ };
+
+ int err, i, count = 0;
+ char pid[16];
+ char sbuf[STRERR_BUFSIZE];
+ struct perf_evlist *evlist;
+
+ snprintf(pid, sizeof(pid), "%d", getpid());
+ pid[sizeof(pid) - 1] = '\0';
+ opts.target.tid = opts.target.pid = pid;
+
+ /* Instead of perf_evlist__new_default, don't add default events */
+ evlist = perf_evlist__new();
+ if (!evlist) {
+ pr_debug("No ehough memory to create evlist\n");
+ return -ENOMEM;
+ }
+
+ err = perf_evlist__create_maps(evlist, &opts.target);
+ if (err < 0) {
+ pr_debug("Not enough memory to create thread/cpu maps\n");
+ goto out_delete_evlist;
+ }
+
+ err = perf_evlist__add_bpf(evlist);
+ if (err) {
+ fprintf(stderr, " (Failed to add events selected by BPF)");
+ goto out_delete_evlist;
+ }
+
+ perf_evlist__config(evlist, &opts);
+
+ err = perf_evlist__open(evlist);
+ if (err < 0) {
+ pr_debug("perf_evlist__open: %s\n",
+ strerror_r(errno, sbuf, sizeof(sbuf)));
+ goto out_delete_evlist;
+ }
+
+ err = perf_evlist__mmap(evlist, opts.mmap_pages, false);
+ if (err < 0) {
+ pr_debug("perf_evlist__mmap: %s\n",
+ strerror_r(errno, sbuf, sizeof(sbuf)));
+ goto out_delete_evlist;
+ }
+
+ perf_evlist__enable(evlist);
+ epoll_pwait_loop();
+ perf_evlist__disable(evlist);
+
+ for (i = 0; i < evlist->nr_mmaps; i++) {
+ union perf_event *event;
+
+ while ((event = perf_evlist__mmap_read(evlist, i)) != NULL) {
+ const u32 type = event->header.type;
+
+ if (type == PERF_RECORD_SAMPLE)
+ count++;
+ }
+ }
+
+ if (count != (NR_ITERS + 1) / 2) {
+ fprintf(stderr, " (filter result incorrect)");
+ err = -EBADF;
+ }
+
+out_delete_evlist:
+ perf_evlist__delete(evlist);
+ if (err)
+ return TEST_FAIL;
+ return 0;
+}
+
+int test__bpf(void)
+{
+ int err;
+ void *obj_buf;
+ size_t obj_buf_sz;
+
+ test_llvm__fetch_bpf_obj(&obj_buf, &obj_buf_sz);
+ if (!obj_buf || !obj_buf_sz) {
+ if (verbose == 0)
+ fprintf(stderr, " (fix 'perf test LLVM' first)");
+ return TEST_SKIP;
+ }
+
+ err = prepare_bpf(obj_buf, obj_buf_sz);
+ if (err)
+ goto out;
+
+ err = do_test();
+out:
+ bpf__unprobe();
+ bpf__clear();
+ if (err)
+ return TEST_FAIL;
+ return 0;
+}
+
+#else
+int test__bpf(void)
+{
+ return TEST_SKIP;
+}
+#endif
diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index 1a349e8..c32c836 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -183,6 +183,10 @@ static struct test {
.cleanup = test__llvm_cleanup,
},
{
+ .desc = "Test BPF filter",
+ .func = test__bpf,
+ },
+ {
.func = NULL,
},
};
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 236bf39..fd5fdb0 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -174,3 +174,22 @@ void test__llvm_cleanup(void)
(~((unsigned long)page_size - 1));
munmap((void *)boundary, buf_end - boundary);
}
+
+void
+test_llvm__fetch_bpf_obj(void **p_obj_buf, size_t *p_obj_buf_sz)
+{
+ *p_obj_buf = NULL;
+ *p_obj_buf_sz = 0;
+
+ if (!p_test_llvm__bpf_result) {
+ test__llvm_prepare();
+ test__llvm();
+ test__llvm_cleanup();
+ }
+
+ if (!p_test_llvm__bpf_result)
+ return;
+
+ *p_obj_buf = p_test_llvm__bpf_result->object;
+ *p_obj_buf_sz = p_test_llvm__bpf_result->size;
+}
diff --git a/tools/perf/tests/llvm.h b/tools/perf/tests/llvm.h
index 1e89e46..2fd7ed6 100644
--- a/tools/perf/tests/llvm.h
+++ b/tools/perf/tests/llvm.h
@@ -10,5 +10,6 @@ struct test_llvm__bpf_result {

extern struct test_llvm__bpf_result *p_test_llvm__bpf_result;
extern const char test_llvm__bpf_prog[];
+void test_llvm__fetch_bpf_obj(void **p_obj_buf, size_t *p_obj_buf_sz);

#endif
diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
index 0d79f04..f8ded73 100644
--- a/tools/perf/tests/tests.h
+++ b/tools/perf/tests/tests.h
@@ -65,6 +65,7 @@ int test__thread_map(void);
int test__llvm(void);
void test__llvm_prepare(void);
void test__llvm_cleanup(void);
+int test__bpf(void);

#if defined(__x86_64__) || defined(__i386__) || defined(__arm__) || defined(__aarch64__)
#ifdef HAVE_DWARF_UNWIND_SUPPORT
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index c2aafe2..95e529b 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -153,6 +153,20 @@ sync_bpf_program_pev(struct bpf_program *prog)
return 0;
}

+int bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz,
+ const char *name)
+{
+ struct bpf_object *obj;
+
+ obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, name);
+ if (!obj) {
+ pr_debug("bpf: failed to load buffer\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
int bpf__prepare_load(const char *filename, bool source)
{
struct bpf_object *obj;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 97aed65..dead4d4 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -19,6 +19,8 @@ typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,

#ifdef HAVE_LIBBPF_SUPPORT
int bpf__prepare_load(const char *filename, bool source);
+int bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz,
+ const char *name);
int bpf__strerror_prepare_load(const char *filename, bool source,
int err, char *buf, size_t size);
int bpf__probe(void);
@@ -39,6 +41,12 @@ static inline int bpf__prepare_load(const char *filename __maybe_unused,
return -1;
}

+static inline int bpf__prepare_load_buffer(void *obj_buf __maybe_unused,
+ size_t obj_buf_sz __maybe_unused, const char *name __maybe_unused)
+{
+ return bpf__prepare_load(NULL, false);
+}
+
static inline int bpf__probe(void) { return 0; }
static inline int bpf__unprobe(void) { return 0; }
static inline int bpf__load(void) { return 0; }
--
2.1.0

2015-08-28 07:10:05

by Wang Nan

[permalink] [raw]
Subject: [PATCH 19/32] bpf tools: Load a program with different instances using preprocessor

With this patch, a caller of libbpf can control the programs being
loaded by installing a preprocessor callback on a BPF program. With a
preprocessor, different instances can be created from one BPF program.

This will be used by perf to generate different prologues for the
different 'struct probe_trace_event' instances matched by one
'struct perf_probe_event'.

bpf_program__set_prep() is added to support this feature. The caller
passes libbpf the number of instances to be created and a preprocessor
function which will be called when the real loading is done. The
callback should return an instruction array for each instance.

The fd field in 'struct bpf_program' is replaced by 'instance', which
holds an nr field and an fds array. bpf_program__nth_fd() is
introduced to read the fd of a given instance. The old interface
bpf_program__fd() is reimplemented to return the first fd.
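
A minimal usage sketch (not part of this patch; 'my_prep', 'insns_buf'
and the two-instance count are made up, and 'prog' is assumed to come
from libbpf elsewhere). The callback fills 'res' for the n-th
instance, and libbpf loads each returned instruction array as a
separate program:

#include <string.h>
#include <linux/bpf.h>
#include <bpf/libbpf.h>

static struct bpf_insn insns_buf[BPF_MAXINSNS];

static int my_prep(struct bpf_program *prog, int n,
		   struct bpf_insn *orig_insns, int orig_insns_cnt,
		   struct bpf_prog_prep_result *res)
{
	/* instance n could get its own prologue here; this sketch
	 * simply reuses the original instructions unchanged */
	memcpy(insns_buf, orig_insns,
	       sizeof(struct bpf_insn) * orig_insns_cnt);
	res->new_insn_ptr = insns_buf;
	res->new_insn_cnt = orig_insns_cnt;
	res->pfd = NULL;	/* don't need the resulting fd */
	return 0;
}

	/* ask for 2 instances; read them back with bpf_program__nth_fd() */
	err = bpf_program__set_prep(prog, 2, my_prep);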

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: He Kuang <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
[wangnan: Add missing '!',
allows bpf_program__unload() when prog->instance.nr == -1
]
---
tools/lib/bpf/libbpf.c | 143 +++++++++++++++++++++++++++++++++++++++++++++----
tools/lib/bpf/libbpf.h | 22 ++++++++
2 files changed, 156 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4252fc2..6a07b26 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -98,7 +98,11 @@ struct bpf_program {
} *reloc_desc;
int nr_reloc;

- int fd;
+ struct {
+ int nr;
+ int *fds;
+ } instance;
+ bpf_program_prep_t preprocessor;

struct bpf_object *obj;
void *priv;
@@ -152,10 +156,24 @@ struct bpf_object {

static void bpf_program__unload(struct bpf_program *prog)
{
+ int i;
+
if (!prog)
return;

- zclose(prog->fd);
+ /*
+ * If the object is opened but the program is never loaded,
+ * it is possible that prog->instance.nr == -1.
+ */
+ if (prog->instance.nr > 0) {
+ for (i = 0; i < prog->instance.nr; i++)
+ zclose(prog->instance.fds[i]);
+ } else if (prog->instance.nr != -1)
+ pr_warning("Internal error: instance.nr is %d\n",
+ prog->instance.nr);
+
+ prog->instance.nr = -1;
+ zfree(&prog->instance.fds);
}

static void bpf_program__exit(struct bpf_program *prog)
@@ -206,7 +224,8 @@ bpf_program__init(void *data, size_t size, char *name, int idx,
memcpy(prog->insns, data,
prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
- prog->fd = -1;
+ prog->instance.fds = NULL;
+ prog->instance.nr = -1;

return 0;
errout:
@@ -795,13 +814,71 @@ static int
bpf_program__load(struct bpf_program *prog,
char *license, u32 kern_version)
{
- int err, fd;
+ int err = 0, fd, i;
+
+ if (prog->instance.nr < 0 || !prog->instance.fds) {
+ if (prog->preprocessor) {
+ pr_warning("Internal error: can't load program '%s'\n",
+ prog->section_name);
+ return -EINVAL;
+ }
+
+ prog->instance.fds = malloc(sizeof(int));
+ if (!prog->instance.fds) {
+ pr_warning("No enough memory for fds\n");
+ return -ENOMEM;
+ }
+ prog->instance.nr = 1;
+ prog->instance.fds[0] = -1;
+ }
+
+ if (!prog->preprocessor) {
+ if (prog->instance.nr != 1)
+ pr_warning("Program '%s' inconsistent: nr(%d) not 1\n",
+ prog->section_name, prog->instance.nr);

- err = load_program(prog->insns, prog->insns_cnt,
- license, kern_version, &fd);
- if (!err)
- prog->fd = fd;
+ err = load_program(prog->insns, prog->insns_cnt,
+ license, kern_version, &fd);
+ if (!err)
+ prog->instance.fds[0] = fd;
+ goto out;
+ }
+
+ for (i = 0; i < prog->instance.nr; i++) {
+ struct bpf_prog_prep_result result;
+ bpf_program_prep_t preprocessor = prog->preprocessor;
+
+ bzero(&result, sizeof(result));
+ err = preprocessor(prog, i, prog->insns,
+ prog->insns_cnt, &result);
+ if (err) {
+ pr_warning("Preprocessing %dth instance of program '%s' failed\n",
+ i, prog->section_name);
+ goto out;
+ }
+
+ if (!result.new_insn_ptr || !result.new_insn_cnt) {
+ pr_debug("Skip loading %dth instance of program '%s'\n",
+ i, prog->section_name);
+ prog->instance.fds[i] = -1;
+ continue;
+ }
+
+ err = load_program(result.new_insn_ptr,
+ result.new_insn_cnt,
+ license, kern_version, &fd);
+
+ if (err) {
+ pr_warning("Loading %dth instance of program '%s' failed\n",
+ i, prog->section_name);
+ goto out;
+ }

+ if (result.pfd)
+ *result.pfd = fd;
+ prog->instance.fds[i] = fd;
+ }
+out:
if (err)
pr_warning("failed to load program '%s'\n",
prog->section_name);
@@ -1052,5 +1129,53 @@ const char *bpf_program__title(struct bpf_program *prog, bool dup)

int bpf_program__fd(struct bpf_program *prog)
{
- return prog->fd;
+ return bpf_program__nth_fd(prog, 0);
+}
+
+int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
+ bpf_program_prep_t prep)
+{
+ int *instance_fds;
+
+ if (nr_instance <= 0 || !prep)
+ return -EINVAL;
+
+ if (prog->instance.nr > 0 || prog->instance.fds) {
+ pr_warning("Can't set pre-processor after loading\n");
+ return -EINVAL;
+ }
+
+ instance_fds = malloc(sizeof(int) * nr_instance);
+ if (!instance_fds) {
+ pr_warning("alloc memory failed for instance of fds\n");
+ return -ENOMEM;
+ }
+
+ /* fill all fd with -1 */
+ memset(instance_fds, 0xff, sizeof(int) * nr_instance);
+
+ prog->instance.nr = nr_instance;
+ prog->instance.fds = instance_fds;
+ prog->preprocessor = prep;
+ return 0;
+}
+
+int bpf_program__nth_fd(struct bpf_program *prog, int n)
+{
+ int fd;
+
+ if (n >= prog->instance.nr || n < 0) {
+ pr_warning("Can't get the %dth fd from program %s: only %d instances\n",
+ n, prog->section_name, prog->instance.nr);
+ return -EINVAL;
+ }
+
+ fd = prog->instance.fds[n];
+ if (fd < 0) {
+ pr_warning("%dth instance of program '%s' is invalid\n",
+ n, prog->section_name);
+ return -ENOENT;
+ }
+
+ return fd;
}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index f16170c..d82b89e 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -67,6 +67,28 @@ const char *bpf_program__title(struct bpf_program *prog, bool dup);

int bpf_program__fd(struct bpf_program *prog);

+struct bpf_insn;
+struct bpf_prog_prep_result {
+ /*
+ * If not NULL, load new instruction array.
+ * If set to NULL, don't load this instance.
+ */
+ struct bpf_insn *new_insn_ptr;
+ int new_insn_cnt;
+
+ /* If not NULL, result fd is set to it */
+ int *pfd;
+};
+
+typedef int (*bpf_program_prep_t)(struct bpf_program *, int n,
+ struct bpf_insn *, int insn_cnt,
+ struct bpf_prog_prep_result *res);
+
+int bpf_program__set_prep(struct bpf_program *prog, int nr_instance,
+ bpf_program_prep_t prep);
+
+int bpf_program__nth_fd(struct bpf_program *prog, int n);
+
/*
* We don't need __attribute__((packed)) now since it is
* unnecessary for 'bpf_map_def' because they are all aligned.
--
2.1.0

2015-08-28 07:07:29

by Wang Nan

[permalink] [raw]
Subject: [PATCH 20/32] perf tools: Fix probe-event.h include

Commit 7b6ff0bdbf4f7f429c2116cca92a6d171217449e ("perf probe ppc64le:
Fixup function entry if using kallsyms lookup") adds 'struct map' to
probe-event.h but does not include "util/map.h" there. This patch
fixes that.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/probe-event.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/probe-event.h b/tools/perf/util/probe-event.h
index 3ab9c3e..de0dd13 100644
--- a/tools/perf/util/probe-event.h
+++ b/tools/perf/util/probe-event.h
@@ -5,6 +5,7 @@
#include "intlist.h"
#include "strlist.h"
#include "strfilter.h"
+#include "map.h"

/* Probe related configurations */
struct probe_conf {
--
2.1.0

2015-08-28 07:12:40

by Wang Nan

[permalink] [raw]
Subject: [PATCH 21/32] perf probe: Reset args and nargs for probe_trace_event when failure

When a failure occurs in add_probe_trace_event(), the args array in
'struct probe_trace_event' is left incomplete. Since the information
in it may be used later, this patch frees the allocated memory and
sets the pointer to NULL to avoid a dangling pointer.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/probe-finder.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 29c43c068..5ab9cd6 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -1228,6 +1228,10 @@ static int add_probe_trace_event(Dwarf_Die *sc_die, struct probe_finder *pf)

end:
free(args);
+ if (ret) {
+ tev->nargs = 0;
+ zfree(&tev->args);
+ }
return ret;
}

--
2.1.0

2015-08-28 07:08:41

by Wang Nan

[permalink] [raw]
Subject: [PATCH 22/32] perf tools: Move linux/filter.h to tools/include

From: He Kuang <[email protected]>

This patch moves filter.h from include/linux/filter.h to
tools/include/linux/filter.h to enable other libraries, like libbpf
(introduced by further patches), to use the macros in it. Currently,
the moved filter.h contains only the macros needed by libbpf, so as
not to introduce too many dependencies.

MANIFEST is also updated for 'make perf-*-src-pkg'.

One change:
the imm field of BPF_EMIT_CALL becomes ((FUNC) - BPF_FUNC_unspec) to
suit the user space code generator.
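
As a quick illustration (a sketch, not part of this patch), these
macros let user space compose eBPF instruction arrays directly; the
two-instruction program below just does 'return 1':

#include <linux/filter.h>

struct bpf_insn insns[] = {
	BPF_MOV64_IMM(BPF_REG_0, 1),	/* r0 = 1 */
	BPF_EXIT_INSN(),		/* return r0 */
};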

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/include/linux/filter.h | 237 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/MANIFEST | 1 +
2 files changed, 238 insertions(+)
create mode 100644 tools/include/linux/filter.h

diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
new file mode 100644
index 0000000..11d2b1c
--- /dev/null
+++ b/tools/include/linux/filter.h
@@ -0,0 +1,237 @@
+/*
+ * Linux Socket Filter Data Structures
+ */
+#ifndef __TOOLS_LINUX_FILTER_H
+#define __TOOLS_LINUX_FILTER_H
+
+#include <linux/bpf.h>
+
+/* ArgX, context and stack frame pointer register positions. Note,
+ * Arg1, Arg2, Arg3, etc are used as argument mappings of function
+ * calls in BPF_CALL instruction.
+ */
+#define BPF_REG_ARG1 BPF_REG_1
+#define BPF_REG_ARG2 BPF_REG_2
+#define BPF_REG_ARG3 BPF_REG_3
+#define BPF_REG_ARG4 BPF_REG_4
+#define BPF_REG_ARG5 BPF_REG_5
+#define BPF_REG_CTX BPF_REG_6
+#define BPF_REG_FP BPF_REG_10
+
+/* Additional register mappings for converted user programs. */
+#define BPF_REG_A BPF_REG_0
+#define BPF_REG_X BPF_REG_7
+#define BPF_REG_TMP BPF_REG_8
+
+/* BPF program can access up to 512 bytes of stack space. */
+#define MAX_BPF_STACK 512
+
+/* Helper macros for filter block array initializers. */
+
+/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */
+
+#define BPF_ALU64_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_ALU32_REG(OP, DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */
+
+#define BPF_ALU64_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_ALU32_IMM(OP, DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Endianness conversion, cpu_to_{l,b}e(), {l,b}e_to_cpu() */
+
+#define BPF_ENDIAN(TYPE, DST, LEN) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_END | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = LEN })
+
+/* Short form of mov, dst_reg = src_reg */
+
+#define BPF_MOV64_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+#define BPF_MOV32_REG(DST, SRC) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = 0 })
+
+/* Short form of mov, dst_reg = imm32 */
+
+#define BPF_MOV64_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_IMM(DST, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Short form of mov based on type,
+ * BPF_X: dst_reg = src_reg, BPF_K: dst_reg = imm32
+ */
+
+#define BPF_MOV64_RAW(TYPE, DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU64 | BPF_MOV | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+#define BPF_MOV32_RAW(TYPE, DST, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ALU | BPF_MOV | BPF_SRC(TYPE), \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */
+
+#define BPF_LD_ABS(SIZE, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Indirect packet access, R0 = *(uint *) (skb->data + src_reg + imm32) */
+
+#define BPF_LD_IND(SIZE, SRC, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_LD | BPF_SIZE(SIZE) | BPF_IND, \
+ .dst_reg = 0, \
+ .src_reg = SRC, \
+ .off = 0, \
+ .imm = IMM })
+
+/* Memory load, dst_reg = *(uint *) (src_reg + off16) */
+
+#define BPF_LDX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = src_reg */
+
+#define BPF_STX_MEM(SIZE, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
+
+#define BPF_ST_MEM(SIZE, DST, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Conditional jumps against registers,
+ * if (dst_reg 'op' src_reg) goto pc + off16
+ */
+
+#define BPF_JMP_REG(OP, DST, SRC, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_X, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = 0 })
+
+/* Conditional jumps against immediates,
+ * if (dst_reg 'op' imm32) goto pc + off16
+ */
+
+#define BPF_JMP_IMM(OP, DST, IMM, OFF) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_OP(OP) | BPF_K, \
+ .dst_reg = DST, \
+ .src_reg = 0, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Function call */
+
+#define BPF_EMIT_CALL(FUNC) \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_CALL, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = ((FUNC) - BPF_FUNC_unspec) })
+
+/* Raw code statement block */
+
+#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM) \
+ ((struct bpf_insn) { \
+ .code = CODE, \
+ .dst_reg = DST, \
+ .src_reg = SRC, \
+ .off = OFF, \
+ .imm = IMM })
+
+/* Program exit */
+
+#define BPF_EXIT_INSN() \
+ ((struct bpf_insn) { \
+ .code = BPF_JMP | BPF_EXIT, \
+ .dst_reg = 0, \
+ .src_reg = 0, \
+ .off = 0, \
+ .imm = 0 })
+
+#endif /* __TOOLS_LINUX_FILTER_H */
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index 56fe0c9..14e8b98 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -42,6 +42,7 @@ tools/include/asm-generic/bitops.h
tools/include/linux/atomic.h
tools/include/linux/bitops.h
tools/include/linux/compiler.h
+tools/include/linux/filter.h
tools/include/linux/hash.h
tools/include/linux/kernel.h
tools/include/linux/list.h
--
2.1.0

2015-08-28 07:07:35

by Wang Nan

[permalink] [raw]
Subject: [PATCH 23/32] perf tools: Add BPF_PROLOGUE config options for further patches

If both LIBBPF and DWARF are detected, it is possible to create a
prologue for eBPF programs to help them access kernel data.
HAVE_BPF_PROLOGUE and CONFIG_BPF_PROLOGUE are added as flags for this
feature.

PERF_HAVE_ARCH_GET_REG_INFO indicates that an architecture supports
converting the name of a register to its offset in 'struct pt_regs'.
Without this support, BPF_PROLOGUE should be turned off.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/config/Makefile | 12 ++++++++++++
tools/perf/util/include/dwarf-regs.h | 7 +++++++
2 files changed, 19 insertions(+)

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 38a4144..d46765b7 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -314,6 +314,18 @@ ifndef NO_LIBELF
CFLAGS += -DHAVE_LIBBPF_SUPPORT
$(call detected,CONFIG_LIBBPF)
endif
+
+ ifndef NO_DWARF
+ ifneq ($(origin PERF_HAVE_ARCH_GET_REG_INFO), undefined)
+ CFLAGS += -DHAVE_BPF_PROLOGUE
+ $(call detected,CONFIG_BPF_PROLOGUE)
+ else
+ msg := $(warning BPF prologue is not supported by architecture $(ARCH));
+ endif
+ else
+ msg := $(warning DWARF support is off, BPF prologue is disabled);
+ endif
+
endif # NO_LIBBPF
endif # NO_LIBELF

diff --git a/tools/perf/util/include/dwarf-regs.h b/tools/perf/util/include/dwarf-regs.h
index 8f14965..3dda083 100644
--- a/tools/perf/util/include/dwarf-regs.h
+++ b/tools/perf/util/include/dwarf-regs.h
@@ -5,4 +5,11 @@
const char *get_arch_regstr(unsigned int n);
#endif

+#ifdef HAVE_BPF_PROLOGUE
+/*
+ * Arch should support fetching the offset of a register in pt_regs
+ * by its name.
+ */
+int arch_get_reg_info(const char *name, int *offset);
+#endif
#endif
--
2.1.0

2015-08-28 07:11:13

by Wang Nan

[permalink] [raw]
Subject: [PATCH 24/32] perf tools: Introduce arch_get_reg_info() for x86

From: He Kuang <[email protected]>

arch_get_reg_info() is a helper function which converts a register
name like "%rax" to the offset of that register in 'struct pt_regs',
which is required by the BPF prologue generator.

This patch replaces the original string table with a 'struct reg_info'
table, which records the offset of each register together with its
name.

For x86, since there are two sub-archs (x86_32 and x86_64) but we can
only get pt_regs for the arch we are currently on, this patch fills
the offset with '-1' for the other sub-arch. This limits the perf
prologue: an x86_32-compiled perf is unable to generate prologues for
BPF programs targeting an x86_64 kernel. The limitation is acceptable
because this is a very rare use case.
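
A minimal usage sketch (not part of this patch; the function name is
made up, and pr_debug() is used only for illustration):

#include "util/debug.h"
#include <dwarf-regs.h>

static void show_di_offset(void)
{
	int offset;

	/* on an x86_64 build: offset == offsetof(struct pt_regs, rdi) */
	if (arch_get_reg_info("%di", &offset) == 0)
		pr_debug("%%di is at offset %d in struct pt_regs\n", offset);
}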

Signed-off-by: Wang Nan <[email protected]>
Signed-off-by: He Kuang <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/arch/x86/Makefile | 1 +
tools/perf/arch/x86/util/Build | 2 +
tools/perf/arch/x86/util/dwarf-regs.c | 104 ++++++++++++++++++++++++----------
3 files changed, 78 insertions(+), 29 deletions(-)

diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 21322e0..a84a6f6f 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -2,3 +2,4 @@ ifndef NO_DWARF
PERF_HAVE_DWARF_REGS := 1
endif
HAVE_KVM_STAT_SUPPORT := 1
+PERF_HAVE_ARCH_GET_REG_INFO := 1
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 2c55e1b..09429f6 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -3,6 +3,8 @@ libperf-y += tsc.o
libperf-y += pmu.o
libperf-y += kvm-stat.o

+# BPF_PROLOGUE also needs dwarf-regs.o. However, if CONFIG_BPF_PROLOGUE
+# is true, CONFIG_DWARF must be true.
libperf-$(CONFIG_DWARF) += dwarf-regs.o

libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
diff --git a/tools/perf/arch/x86/util/dwarf-regs.c b/tools/perf/arch/x86/util/dwarf-regs.c
index be22dd4..9928caf 100644
--- a/tools/perf/arch/x86/util/dwarf-regs.c
+++ b/tools/perf/arch/x86/util/dwarf-regs.c
@@ -22,44 +22,67 @@

#include <stddef.h>
#include <dwarf-regs.h>
+#include <string.h>
+#include <linux/ptrace.h>
+#include <linux/kernel.h> /* for offsetof */
+#include <util/bpf-loader.h>
+
+struct reg_info {
+ const char *name; /* Reg string in debuginfo */
+ int offset; /* Reg offset in struct pt_regs */
+};

/*
* Generic dwarf analysis helpers
*/
-
+/*
+ * An x86_64 build can't access pt_regs for x86_32, so fill the
+ * offset with -1.
+ */
+#ifdef __x86_64__
+# define REG_INFO(n, f) { .name = n, .offset = -1, }
+#else
+# define REG_INFO(n, f) { .name = n, .offset = offsetof(struct pt_regs, f), }
+#endif
#define X86_32_MAX_REGS 8
-const char *x86_32_regs_table[X86_32_MAX_REGS] = {
- "%ax",
- "%cx",
- "%dx",
- "%bx",
- "$stack", /* Stack address instead of %sp */
- "%bp",
- "%si",
- "%di",
+
+struct reg_info x86_32_regs_table[X86_32_MAX_REGS] = {
+ REG_INFO("%ax", eax),
+ REG_INFO("%cx", ecx),
+ REG_INFO("%dx", edx),
+ REG_INFO("%bx", ebx),
+ REG_INFO("$stack", esp), /* Stack address instead of %sp */
+ REG_INFO("%bp", ebp),
+ REG_INFO("%si", esi),
+ REG_INFO("%di", edi),
};

+#undef REG_INFO
+#ifdef __x86_64__
+# define REG_INFO(n, f) { .name = n, .offset = offsetof(struct pt_regs, f), }
+#else
+# define REG_INFO(n, f) { .name = n, .offset = -1, }
+#endif
#define X86_64_MAX_REGS 16
-const char *x86_64_regs_table[X86_64_MAX_REGS] = {
- "%ax",
- "%dx",
- "%cx",
- "%bx",
- "%si",
- "%di",
- "%bp",
- "%sp",
- "%r8",
- "%r9",
- "%r10",
- "%r11",
- "%r12",
- "%r13",
- "%r14",
- "%r15",
+struct reg_info x86_64_regs_table[X86_64_MAX_REGS] = {
+ REG_INFO("%ax", rax),
+ REG_INFO("%dx", rdx),
+ REG_INFO("%cx", rcx),
+ REG_INFO("%bx", rbx),
+ REG_INFO("%si", rsi),
+ REG_INFO("%di", rdi),
+ REG_INFO("%bp", rbp),
+ REG_INFO("%sp", rsp),
+ REG_INFO("%r8", r8),
+ REG_INFO("%r9", r9),
+ REG_INFO("%r10", r10),
+ REG_INFO("%r11", r11),
+ REG_INFO("%r12", r12),
+ REG_INFO("%r13", r13),
+ REG_INFO("%r14", r14),
+ REG_INFO("%r15", r15),
};

-/* TODO: switching by dwarf address size */
#ifdef __x86_64__
#define ARCH_MAX_REGS X86_64_MAX_REGS
#define arch_regs_table x86_64_regs_table
@@ -71,5 +94,28 @@ const char *x86_64_regs_table[X86_64_MAX_REGS] = {
/* Return architecture dependent register string (for kprobe-tracer) */
const char *get_arch_regstr(unsigned int n)
{
- return (n <= ARCH_MAX_REGS) ? arch_regs_table[n] : NULL;
+ return (n < ARCH_MAX_REGS) ? arch_regs_table[n].name : NULL;
}
+
+#ifdef HAVE_BPF_PROLOGUE
+int arch_get_reg_info(const char *name, int *offset)
+{
+ int i;
+ struct reg_info *info;
+
+ if (!name || !offset)
+ return -1;
+
+ for (i = 0; i < ARCH_MAX_REGS; i++) {
+ info = &arch_regs_table[i];
+ if (strcmp(info->name, name) == 0) {
+ if (info->offset < 0)
+ return -1;
+ *offset = info->offset;
+ return 0;
+ }
+ }
+
+ return -1;
+}
+#endif
--
2.1.0

2015-08-28 07:08:43

by Wang Nan

[permalink] [raw]
Subject: [PATCH 25/32] perf tools: Add prologue for BPF programs for fetching arguments

This patch generates a prologue for a BPF program which fetches
arguments for it. With this patch, the program can have arguments as
follows:

SEC("lock_page=__lock_page page->flags")
int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
{
return 1;
}

This patch passes at most 3 arguments, in r3, r4 and r5. r1 is still
the ctx pointer. r2 is used to indicate whether the dereferencing
succeeded.

This patch uses r6 to hold ctx (struct pt_regs) and r7 to hold a
stack pointer for results. The result of each argument is first stored
on the stack:

low address
BPF_REG_FP - 24 ARG3
BPF_REG_FP - 16 ARG2
BPF_REG_FP - 8 ARG1
BPF_REG_FP
high address

Then loaded into r3, r4 and r5.

The output prologue for offn(...off2(off1(reg))) should be:

r6 <- r1 // save ctx into a callee saved register
r7 <- fp
r7 <- r7 - stack_offset // pointer to result slot
/* load r3 with the offset in pt_regs of 'reg' */
(r7) <- r3 // make slot valid
r3 <- r3 + off1 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7 // result put onto stack
call probe_read // read unsafe pointer
jnei r0, 0, err // error checking
r3 <- (r7) // read result
r3 <- r3 + off2 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7
call probe_read
jnei r0, 0, err
...
/* load r3, r4, r5 from stack */
goto success
err:
r2 <- 1
/* load r3, r4, r5 with 0 */
goto usercode
success:
r2 <- 0
usercode:
r1 <- r6 // restore ctx
// original user code

If all arguments reside in registers (dereferencing is not
required), gen_prologue_fastpath() will be used to create a
fast prologue:

r3 <- (r1 + offset of reg1)
r4 <- (r1 + offset of reg2)
r5 <- (r1 + offset of reg3)
r2 <- 0

P.S.

eBPF calling convention is defined as:

* r0 - return value from in-kernel function, and exit value
for eBPF program
* r1 - r5 - arguments from eBPF program to in-kernel function
* r6 - r9 - callee saved registers that in-kernel function will
preserve
* r10 - read-only frame pointer to access stack
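
Based on this convention, a scriptlet can test the fetch-result
argument before trusting the fetched values. A sketch, reusing the
probe point from the example above:

SEC("lock_page=__lock_page page->flags")
int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
{
	if (err)	/* dereferencing failed; flags was set to 0 */
		return 0;
	return flags != 0;
}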

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/Build | 1 +
tools/perf/util/bpf-prologue.c | 442 +++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-prologue.h | 34 ++++
3 files changed, 477 insertions(+)
create mode 100644 tools/perf/util/bpf-prologue.c
create mode 100644 tools/perf/util/bpf-prologue.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c0ca4a1..fd2f084 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -84,6 +84,7 @@ libperf-$(CONFIG_AUXTRACE) += intel-bts.o
libperf-y += parse-branch-options.o

libperf-$(CONFIG_LIBBPF) += bpf-loader.o
+libperf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-file.o
libperf-$(CONFIG_LIBELF) += probe-event.o
diff --git a/tools/perf/util/bpf-prologue.c b/tools/perf/util/bpf-prologue.c
new file mode 100644
index 0000000..2a5f4c7
--- /dev/null
+++ b/tools/perf/util/bpf-prologue.c
@@ -0,0 +1,442 @@
+/*
+ * bpf-prologue.c
+ *
+ * Copyright (C) 2015 He Kuang <[email protected]>
+ * Copyright (C) 2015 Huawei Inc.
+ */
+
+#include <bpf/libbpf.h>
+#include "perf.h"
+#include "debug.h"
+#include "bpf-prologue.h"
+#include "probe-finder.h"
+#include <dwarf-regs.h>
+#include <linux/filter.h>
+
+#define BPF_REG_SIZE 8
+
+#define JMP_TO_ERROR_CODE -1
+#define JMP_TO_SUCCESS_CODE -2
+#define JMP_TO_USER_CODE -3
+
+struct bpf_insn_pos {
+ struct bpf_insn *begin;
+ struct bpf_insn *end;
+ struct bpf_insn *pos;
+};
+
+static inline int
+pos_get_cnt(struct bpf_insn_pos *pos)
+{
+ return pos->pos - pos->begin;
+}
+
+static int
+append_insn(struct bpf_insn new_insn, struct bpf_insn_pos *pos)
+{
+ if (!pos->pos)
+ return -ERANGE;
+
+ if (pos->pos + 1 >= pos->end) {
+ pr_err("bpf prologue: prologue too long\n");
+ pos->pos = NULL;
+ return -ERANGE;
+ }
+
+ *(pos->pos)++ = new_insn;
+ return 0;
+}
+
+static int
+check_pos(struct bpf_insn_pos *pos)
+{
+ if (!pos->pos || pos->pos >= pos->end)
+ return -ERANGE;
+ return 0;
+}
+
+/* Give it a shorter name */
+#define ins(i, p) append_insn((i), (p))
+
+/*
+ * Give a register name (in 'reg'), generate instruction to
+ * load register into an eBPF register rd:
+ * 'ldd target_reg, offset(ctx_reg)', where:
+ * ctx_reg is pre initialized to pointer of 'struct pt_regs'.
+ */
+static int
+gen_ldx_reg_from_ctx(struct bpf_insn_pos *pos, int ctx_reg,
+ const char *reg, int target_reg)
+{
+ int offset;
+
+ if (arch_get_reg_info(reg, &offset)) {
+ pr_err("bpf: prologue: failed to get register %s\n",
+ reg);
+ return -1;
+ }
+ ins(BPF_LDX_MEM(BPF_DW, target_reg, ctx_reg, offset), pos);
+
+ if (check_pos(pos))
+ return -ERANGE;
+ return 0;
+}
+
+/*
+ * Generate a BPF_FUNC_probe_read function call.
+ *
+ * src_base_addr_reg is a register holding base address,
+ * dst_addr_reg is a register holding dest address (on stack),
+ * result is:
+ *
+ * *[dst_addr_reg] = *([src_base_addr_reg] + offset)
+ *
+ * Arguments of BPF_FUNC_probe_read:
+ * ARG1: ptr to stack (dest)
+ * ARG2: size (8)
+ * ARG3: unsafe ptr (src)
+ */
+static int
+gen_read_mem(struct bpf_insn_pos *pos,
+ int src_base_addr_reg,
+ int dst_addr_reg,
+ long offset)
+{
+ /* mov arg3, src_base_addr_reg */
+ if (src_base_addr_reg != BPF_REG_ARG3)
+ ins(BPF_MOV64_REG(BPF_REG_ARG3, src_base_addr_reg), pos);
+ /* add arg3, #offset */
+ if (offset)
+ ins(BPF_ALU64_IMM(BPF_ADD, BPF_REG_ARG3, offset), pos);
+
+ /* mov arg2, #reg_size */
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_REG_ARG2, BPF_REG_SIZE), pos);
+
+ /* mov arg1, dst_addr_reg */
+ if (dst_addr_reg != BPF_REG_ARG1)
+ ins(BPF_MOV64_REG(BPF_REG_ARG1, dst_addr_reg), pos);
+
+ /* Call probe_read */
+ ins(BPF_EMIT_CALL(BPF_FUNC_probe_read), pos);
+ /*
+ * Error processing: if read fail, goto error code,
+ * will be relocated. Target should be the start of
+ * error processing code.
+ */
+ ins(BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, JMP_TO_ERROR_CODE),
+ pos);
+
+ if (check_pos(pos))
+ return -ERANGE;
+ return 0;
+}
+
+/*
+ * Each arg should be bare register. Fetch and save them into argument
+ * registers (r3 - r5).
+ *
+ * BPF_REG_1 should have been initialized with pointer to
+ * 'struct pt_regs'.
+ */
+static int
+gen_prologue_fastpath(struct bpf_insn_pos *pos,
+ struct probe_trace_arg *args, int nargs)
+{
+ int i;
+
+ for (i = 0; i < nargs; i++)
+ if (gen_ldx_reg_from_ctx(pos, BPF_REG_1, args[i].value,
+ BPF_PROLOGUE_START_ARG_REG + i))
+ goto errout;
+
+ if (check_pos(pos))
+ goto errout;
+ return 0;
+errout:
+ return -1;
+}
+
+/*
+ * Slow path:
+ * At least one argument has the form of 'offset($rx)'.
+ *
+ * The following code first stores them onto the stack, then loads
+ * them back into r3 - r5.
+ * Before final loading, the final result should be:
+ *
+ * low address
+ * BPF_REG_FP - 24 ARG3
+ * BPF_REG_FP - 16 ARG2
+ * BPF_REG_FP - 8 ARG1
+ * BPF_REG_FP
+ * high address
+ *
+ * For each argument (described as: offn(...off2(off1(reg)))),
+ * generates following code:
+ *
+ * r7 <- fp
+ * r7 <- r7 - stack_offset // Ideally, r7 would be set from fp once,
+ * // before generating args. However, the
+ * // verifier won't regard r7 as a stack
+ * // pointer if it is derived by subtracting
+ * // from any register other than fp.
+ * // This is why we have to reset r7
+ * // to fp for each variable.
+ * r3 <- value of 'reg'-> generated using gen_ldx_reg_from_ctx()
+ * (r7) <- r3 // skip following instructions for bare reg
+ * r3 <- r3 + off1 . // skip if off1 == 0
+ * r2 <- 8 \
+ * r1 <- r7 |-> generated by gen_read_mem()
+ * call probe_read /
+ * jnei r0, 0, err ./
+ * r3 <- (r7)
+ * r3 <- r3 + off2 . // skip if off2 == 0
+ * r2 <- 8 \ // r2 may be broken by probe_read, so set again
+ * r1 <- r7 |-> generated by gen_read_mem()
+ * call probe_read /
+ * jnei r0, 0, err ./
+ * ...
+ */
+static int
+gen_prologue_slowpath(struct bpf_insn_pos *pos,
+ struct probe_trace_arg *args, int nargs)
+{
+ int i;
+
+ for (i = 0; i < nargs; i++) {
+ struct probe_trace_arg *arg = &args[i];
+ const char *reg = arg->value;
+ struct probe_trace_arg_ref *ref = NULL;
+ int stack_offset = (i + 1) * -8;
+
+ pr_debug("prologue: fetch arg %d, base reg is %s\n",
+ i, reg);
+
+ /* value of base register is stored into ARG3 */
+ if (gen_ldx_reg_from_ctx(pos, BPF_REG_CTX, reg,
+ BPF_REG_ARG3)) {
+ pr_err("prologue: failed to get offset of register %s\n",
+ reg);
+ goto errout;
+ }
+
+ /* Make r7 the stack pointer. */
+ ins(BPF_MOV64_REG(BPF_REG_7, BPF_REG_FP), pos);
+ /* r7 += -8 */
+ ins(BPF_ALU64_IMM(BPF_ADD, BPF_REG_7, stack_offset), pos);
+ /*
+ * Store r3 (base register) onto stack
+ * Ensure fp[offset] is set.
+ * fp is the only valid base register when storing
+ * into stack. We are not allowed to use r7 as base
+ * register here.
+ */
+ ins(BPF_STX_MEM(BPF_DW, BPF_REG_FP, BPF_REG_ARG3,
+ stack_offset), pos);
+
+ ref = arg->ref;
+ while (ref) {
+ pr_debug("prologue: arg %d: offset %ld\n",
+ i, ref->offset);
+ if (gen_read_mem(pos, BPF_REG_3, BPF_REG_7,
+ ref->offset)) {
+ pr_err("prologue: failed to generate probe_read function call\n");
+ goto errout;
+ }
+
+ ref = ref->next;
+ /*
+ * Load previous result into ARG3. Use
+ * BPF_REG_FP instead of r7 because verifier
+ * allows FP based addressing only.
+ */
+ if (ref)
+ ins(BPF_LDX_MEM(BPF_DW, BPF_REG_ARG3,
+ BPF_REG_FP, stack_offset), pos);
+ }
+ }
+
+ /* Final pass: read to registers */
+ for (i = 0; i < nargs; i++)
+ ins(BPF_LDX_MEM(BPF_DW, BPF_PROLOGUE_START_ARG_REG + i,
+ BPF_REG_FP, -BPF_REG_SIZE * (i + 1)), pos);
+
+ ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_SUCCESS_CODE), pos);
+
+ if (check_pos(pos))
+ goto errout;
+ return 0;
+errout:
+ return -1;
+}
+
+static int
+prologue_relocate(struct bpf_insn_pos *pos, struct bpf_insn *error_code,
+ struct bpf_insn *success_code, struct bpf_insn *user_code)
+{
+ struct bpf_insn *insn;
+
+ if (check_pos(pos))
+ return -ERANGE;
+
+ for (insn = pos->begin; insn < pos->pos; insn++) {
+ u8 class = BPF_CLASS(insn->code);
+ u8 opcode;
+
+ if (class != BPF_JMP)
+ continue;
+ opcode = BPF_OP(insn->code);
+ if (opcode == BPF_CALL)
+ continue;
+
+ switch (insn->off) {
+ case JMP_TO_ERROR_CODE:
+ insn->off = error_code - (insn + 1);
+ break;
+ case JMP_TO_SUCCESS_CODE:
+ insn->off = success_code - (insn + 1);
+ break;
+ case JMP_TO_USER_CODE:
+ insn->off = user_code - (insn + 1);
+ break;
+ default:
+ pr_err("bpf prologue: internal error: relocation failed\n");
+ return -1;
+ }
+ }
+ return 0;
+}
+
+int bpf__gen_prologue(struct probe_trace_arg *args, int nargs,
+ struct bpf_insn *new_prog, size_t *new_cnt,
+ size_t cnt_space)
+{
+ struct bpf_insn *success_code = NULL;
+ struct bpf_insn *error_code = NULL;
+ struct bpf_insn *user_code = NULL;
+ struct bpf_insn_pos pos;
+ bool fastpath = true;
+ int i;
+
+ if (!new_prog || !new_cnt)
+ return -EINVAL;
+
+ pos.begin = new_prog;
+ pos.end = new_prog + cnt_space;
+ pos.pos = new_prog;
+
+ if (!nargs) {
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 0),
+ &pos);
+
+ if (check_pos(&pos))
+ goto errout;
+
+ *new_cnt = pos_get_cnt(&pos);
+ return 0;
+ }
+
+ if (nargs > BPF_PROLOGUE_MAX_ARGS)
+ nargs = BPF_PROLOGUE_MAX_ARGS;
+ if (cnt_space > BPF_MAXINSNS)
+ cnt_space = BPF_MAXINSNS;
+
+ /* First pass: validation */
+ for (i = 0; i < nargs; i++) {
+ struct probe_trace_arg_ref *ref = args[i].ref;
+
+ if (args[i].value[0] == '@') {
+ /* TODO: fetch global variable */
+ pr_err("bpf: prologue: global %s%+ld not support\n",
+ args[i].value, ref ? ref->offset : 0);
+ return -ENOTSUP;
+ }
+
+ while (ref) {
+ /* fastpath is true if all args has ref == NULL */
+ fastpath = false;
+
+ /*
+ * Instruction encodes immediate value using
+ * s32, ref->offset is long. On systems which
+ * can't fill long in s32, refuse to process if
+ * ref->offset too large (or small).
+ */
+#ifdef __LP64__
+#define OFFSET_MAX ((1LL << 31) - 1)
+#define OFFSET_MIN ((1LL << 31) * -1)
+ if (ref->offset > OFFSET_MAX ||
+ ref->offset < OFFSET_MIN) {
+ pr_err("bpf: prologue: offset out of bound: %ld\n",
+ ref->offset);
+ return -E2BIG;
+ }
+#endif
+ ref = ref->next;
+ }
+ }
+ pr_debug("prologue: pass validation\n");
+
+ if (fastpath) {
+ /* If all variables are registers... */
+ pr_debug("prologue: fast path\n");
+ if (gen_prologue_fastpath(&pos, args, nargs))
+ goto errout;
+ } else {
+ pr_debug("prologue: slow path\n");
+
+ /* Initialization: move ctx to a callee saved register. */
+ ins(BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1), &pos);
+
+ if (gen_prologue_slowpath(&pos, args, nargs))
+ goto errout;
+ /*
+ * start of ERROR_CODE (only slow pass needs error code)
+ * mov r2 <- 1
+ * goto usercode
+ */
+ error_code = pos.pos;
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 1),
+ &pos);
+
+ for (i = 0; i < nargs; i++)
+ ins(BPF_ALU64_IMM(BPF_MOV,
+ BPF_PROLOGUE_START_ARG_REG + i,
+ 0),
+ &pos);
+ ins(BPF_JMP_IMM(BPF_JA, BPF_REG_0, 0, JMP_TO_USER_CODE),
+ &pos);
+ }
+
+ /*
+ * start of SUCCESS_CODE:
+ * mov r2 <- 0
+ * goto usercode // skip
+ */
+ success_code = pos.pos;
+ ins(BPF_ALU64_IMM(BPF_MOV, BPF_PROLOGUE_FETCH_RESULT_REG, 0), &pos);
+
+ /*
+ * start of USER_CODE:
+ * Restore ctx to r1
+ */
+ user_code = pos.pos;
+ if (!fastpath) {
+ /*
+ * Only slow path needs restoring of ctx. In fast path,
+ * register are loaded directly from r1.
+ */
+ ins(BPF_MOV64_REG(BPF_REG_ARG1, BPF_REG_CTX), &pos);
+ if (prologue_relocate(&pos, error_code, success_code,
+ user_code))
+ goto errout;
+ }
+
+ if (check_pos(&pos))
+ goto errout;
+
+ *new_cnt = pos_get_cnt(&pos);
+ return 0;
+errout:
+ return -ERANGE;
+}
diff --git a/tools/perf/util/bpf-prologue.h b/tools/perf/util/bpf-prologue.h
new file mode 100644
index 0000000..f1e4c5d
--- /dev/null
+++ b/tools/perf/util/bpf-prologue.h
@@ -0,0 +1,34 @@
+/*
+ * Copyright (C) 2015, He Kuang <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ */
+#ifndef __BPF_PROLOGUE_H
+#define __BPF_PROLOGUE_H
+
+#include <linux/compiler.h>
+#include <linux/filter.h>
+#include "probe-event.h"
+
+#define BPF_PROLOGUE_MAX_ARGS 3
+#define BPF_PROLOGUE_START_ARG_REG BPF_REG_3
+#define BPF_PROLOGUE_FETCH_RESULT_REG BPF_REG_2
+
+#ifdef HAVE_BPF_PROLOGUE
+int bpf__gen_prologue(struct probe_trace_arg *args, int nargs,
+ struct bpf_insn *new_prog, size_t *new_cnt,
+ size_t cnt_space);
+#else
+static inline int
+bpf__gen_prologue(struct probe_trace_arg *args __maybe_unused,
+ int nargs __maybe_unused,
+ struct bpf_insn *new_prog __maybe_unused,
+ size_t *new_cnt,
+ size_t cnt_space __maybe_unused)
+{
+ if (!new_cnt)
+ return -EINVAL;
+ *new_cnt = 0;
+ return 0;
+}
+#endif
+#endif /* __BPF_PROLOGUE_H */
--
2.1.0

2015-08-28 07:08:42

by Wang Nan

[permalink] [raw]
Subject: [PATCH 26/32] perf tools: Generate prologue for BPF programs

This patch generates a prologue for each 'struct probe_trace_event'
to fetch arguments for BPF programs.

After bpf__probe(), iterate over each program to check whether a
prologue is required: if none of the 'struct perf_probe_event's a
program will attach to has at least one argument, simply skip the
preprocessor hooking. For those requiring a prologue, call
bpf__gen_prologue() and paste the original instructions after the
prologue.
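
After preproc_gen_prologue() succeeds for one instance, the buffer
handed back to libbpf is laid out as follows (a sketch using the
names from the code below):

	insns_buf:  [ prologue        : prologue_cnt insns   ]
	            [ original program: orig_insns_cnt insns ]

	res->new_insn_ptr = insns_buf;
	res->new_insn_cnt = prologue_cnt + orig_insns_cnt;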

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 120 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 119 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 95e529b..66d9bea 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -5,10 +5,13 @@
* Copyright (C) 2015 Huawei Inc.
*/

+#include <linux/bpf.h>
#include <bpf/libbpf.h>
#include "perf.h"
#include "debug.h"
#include "bpf-loader.h"
+#include "bpf-prologue.h"
+#include "llvm-utils.h"
#include "probe-event.h"
#include "probe-finder.h"
#include "llvm-utils.h"
@@ -42,6 +45,8 @@ struct bpf_prog_priv {
struct perf_probe_event *ppev;
struct perf_probe_event pev;
};
+ bool need_prologue;
+ struct bpf_insn *insns_buf;
};

static void
@@ -53,6 +58,7 @@ bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
/* check if pev is initialized */
if (priv && priv->pev_ready)
clear_perf_probe_event(&priv->pev);
+ zfree(&priv->insns_buf);
free(priv);
}

@@ -239,6 +245,103 @@ int bpf__unprobe(void)
return ret < 0 ? ret : 0;
}

+static int
+preproc_gen_prologue(struct bpf_program *prog, int n,
+ struct bpf_insn *orig_insns, int orig_insns_cnt,
+ struct bpf_prog_prep_result *res)
+{
+ struct probe_trace_event *tev;
+ struct perf_probe_event *pev;
+ struct bpf_prog_priv *priv;
+ struct bpf_insn *buf;
+ size_t prologue_cnt = 0;
+ int err;
+
+ err = bpf_program__get_private(prog, (void **)&priv);
+ if (err || !priv || !priv->pev_ready)
+ goto errout;
+
+ pev = &priv->pev;
+
+ if (n < 0 || n >= pev->ntevs)
+ goto errout;
+
+ tev = &pev->tevs[n];
+
+ buf = priv->insns_buf;
+ err = bpf__gen_prologue(tev->args, tev->nargs,
+ buf, &prologue_cnt,
+ BPF_MAXINSNS - orig_insns_cnt);
+ if (err) {
+ const char *title;
+
+ title = bpf_program__title(prog, false);
+ if (!title)
+ title = "??";
+
+ pr_debug("Failed to generate prologue for program %s\n",
+ title);
+ return err;
+ }
+
+ memcpy(&buf[prologue_cnt], orig_insns,
+ sizeof(struct bpf_insn) * orig_insns_cnt);
+
+ res->new_insn_ptr = buf;
+ res->new_insn_cnt = prologue_cnt + orig_insns_cnt;
+ res->pfd = NULL;
+ return 0;
+
+errout:
+ pr_debug("Internal error in preproc_gen_prologue\n");
+ return -EINVAL;
+}
+
+static int hook_load_preprocessor(struct bpf_program *prog)
+{
+ struct perf_probe_event *pev;
+ struct bpf_prog_priv *priv;
+ bool need_prologue = false;
+ int err, i;
+
+ err = bpf_program__get_private(prog, (void **)&priv);
+ if (err || !priv) {
+ pr_debug("Internal error when hook preprocessor\n");
+ return -EINVAL;
+ }
+
+ pev = &priv->pev;
+ for (i = 0; i < pev->ntevs; i++) {
+ struct probe_trace_event *tev = &pev->tevs[i];
+
+ if (tev->nargs > 0) {
+ need_prologue = true;
+ break;
+ }
+ }
+
+ /*
+ * Since no tev has an argument, we don't need to generate
+ * a prologue.
+ */
+ if (!need_prologue) {
+ priv->need_prologue = false;
+ return 0;
+ }
+
+ priv->need_prologue = true;
+ priv->insns_buf = malloc(sizeof(struct bpf_insn) *
+ BPF_MAXINSNS);
+ if (!priv->insns_buf) {
+ pr_debug("No enough memory: alloc insns_buf failed\n");
+ return -ENOMEM;
+ }
+
+ err = bpf_program__set_prep(prog, pev->ntevs,
+ preproc_gen_prologue);
+ return err;
+}
+
int bpf__probe(void)
{
int err, nr_events = 0;
@@ -289,6 +392,17 @@ int bpf__probe(void)
err = sync_bpf_program_pev(prog);
if (err)
goto out;
+ /*
+ * After probing, let's consider the prologue, which
+ * adds an argument fetcher to BPF programs.
+ *
+ * hook_load_preprocessor() hooks a pre-processor
+ * to the bpf_program, letting it generate the prologue
+ * dynamically during loading.
+ */
+ err = hook_load_preprocessor(prog);
+ if (err)
+ goto out;
}
}
out:
@@ -349,7 +463,11 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
for (i = 0; i < pev->ntevs; i++) {
tev = &pev->tevs[i];

- fd = bpf_program__fd(prog);
+ if (priv->need_prologue)
+ fd = bpf_program__nth_fd(prog, i);
+ else
+ fd = bpf_program__fd(prog);
+
if (fd < 0) {
pr_debug("bpf: failed to get file descriptor\n");
return fd;
--
2.1.0

2015-08-28 07:10:50

by Wang Nan

[permalink] [raw]
Subject: [PATCH 27/32] perf tools: Use same BPF program if arguments are identical

This patch creates only one BPF program for the different
'probe_trace_event's (tevs) generated by one 'perf_probe_event' (pev)
when their prologues are identical.

This is done by comparing the argument lists of the different tevs,
and mapping each tev to a prologue type through a mapping array. This
patch uses qsort() to sort the tevs; after sorting, tevs with
identical argument lists are grouped together.
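
A worked example with hypothetical argument lists: suppose one pev
expands to four tevs whose argument lists are A, B, A, A. After
sorting they group as {A, A, A, B}, and map_prologue() yields:

	type_mapping = {0, 1, 0, 0};	/* indexed by original tev index */
	nr_types = 2;

so only two program instances are generated instead of four.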

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 133 ++++++++++++++++++++++++++++++++++++++++---
1 file changed, 126 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 66d9bea..a23aaf0 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -47,6 +47,8 @@ struct bpf_prog_priv {
};
bool need_prologue;
struct bpf_insn *insns_buf;
+ int nr_types;
+ int *type_mapping;
};

static void
@@ -59,6 +61,7 @@ bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
if (priv && priv->pev_ready)
clear_perf_probe_event(&priv->pev);
zfree(&priv->insns_buf);
+ zfree(&priv->type_mapping);
free(priv);
}

@@ -255,7 +258,7 @@ preproc_gen_prologue(struct bpf_program *prog, int n,
struct bpf_prog_priv *priv;
struct bpf_insn *buf;
size_t prologue_cnt = 0;
- int err;
+ int i, err;

err = bpf_program__get_private(prog, (void **)&priv);
if (err || !priv || !priv->pev_ready)
@@ -263,10 +266,20 @@ preproc_gen_prologue(struct bpf_program *prog, int n,

pev = &priv->pev;

- if (n < 0 || n >= pev->ntevs)
+ if (n < 0 || n >= priv->nr_types)
goto errout;

- tev = &pev->tevs[n];
+ /* Find a tev belongs to that type */
+ for (i = 0; i < pev->ntevs; i++)
+ if (priv->type_mapping[i] == n)
+ break;
+
+ if (i >= pev->ntevs) {
+ pr_debug("Internal error: prologue type %d not found\n", n);
+ return -ENOENT;
+ }
+
+ tev = &pev->tevs[i];

buf = priv->insns_buf;
err = bpf__gen_prologue(tev->args, tev->nargs,
@@ -297,6 +310,98 @@ errout:
return -EINVAL;
}

+/*
+ * compare_tev_args is reflexive, transitive and antisymmetric.
+ * I can show that but this margin is too narrow to contain.
+ */
+static int compare_tev_args(const void *ptev1, const void *ptev2)
+{
+ int i, ret;
+ const struct probe_trace_event *tev1 =
+ *(const struct probe_trace_event **)ptev1;
+ const struct probe_trace_event *tev2 =
+ *(const struct probe_trace_event **)ptev2;
+
+ ret = tev2->nargs - tev1->nargs;
+ if (ret)
+ return ret;
+
+ for (i = 0; i < tev1->nargs; i++) {
+ struct probe_trace_arg *arg1, *arg2;
+ struct probe_trace_arg_ref *ref1, *ref2;
+
+ arg1 = &tev1->args[i];
+ arg2 = &tev2->args[i];
+
+ ret = strcmp(arg1->value, arg2->value);
+ if (ret)
+ return ret;
+
+ ref1 = arg1->ref;
+ ref2 = arg2->ref;
+
+ while (ref1 && ref2) {
+ ret = ref2->offset - ref1->offset;
+ if (ret)
+ return ret;
+
+ ref1 = ref1->next;
+ ref2 = ref2->next;
+ }
+
+ if (ref1 || ref2)
+ return ref2 ? 1 : -1;
+ }
+
+ return 0;
+}
+
+static int map_prologue(struct perf_probe_event *pev, int *mapping,
+ int *nr_types)
+{
+ int i, type = 0;
+ struct {
+ struct probe_trace_event *tev;
+ int idx;
+ } *stevs;
+ size_t array_sz = sizeof(*stevs) * pev->ntevs;
+
+ stevs = malloc(array_sz);
+ if (!stevs) {
+ pr_debug("No ehough memory: alloc stevs failed\n");
+ return -ENOMEM;
+ }
+
+ pr_debug("In map_prologue, ntevs=%d\n", pev->ntevs);
+ for (i = 0; i < pev->ntevs; i++) {
+ stevs[i].tev = &pev->tevs[i];
+ stevs[i].idx = i;
+ }
+ qsort(stevs, pev->ntevs, sizeof(*stevs),
+ compare_tev_args);
+
+ for (i = 0; i < pev->ntevs; i++) {
+ if (i == 0) {
+ mapping[stevs[i].idx] = type;
+ pr_debug("mapping[%d]=%d\n", stevs[i].idx,
+ type);
+ continue;
+ }
+
+ if (compare_tev_args(stevs + i, stevs + i - 1) == 0)
+ mapping[stevs[i].idx] = type;
+ else
+ mapping[stevs[i].idx] = ++type;
+
+ pr_debug("mapping[%d]=%d\n", stevs[i].idx,
+ mapping[stevs[i].idx]);
+ }
+ free(stevs);
+ *nr_types = type + 1;
+
+ return 0;
+}
+
static int hook_load_preprocessor(struct bpf_program *prog)
{
struct perf_probe_event *pev;
@@ -337,7 +442,19 @@ static int hook_load_preprocessor(struct bpf_program *prog)
return -ENOMEM;
}

- err = bpf_program__set_prep(prog, pev->ntevs,
+ priv->type_mapping = malloc(sizeof(int) * pev->ntevs);
+ if (!priv->type_mapping) {
+ pr_debug("No enough memory: alloc type_mapping failed\n");
+ return -ENOMEM;
+ }
+ memset(priv->type_mapping, 0xff,
+ sizeof(int) * pev->ntevs);
+
+ err = map_prologue(pev, priv->type_mapping, &priv->nr_types);
+ if (err)
+ return err;
+
+ err = bpf_program__set_prep(prog, priv->nr_types,
preproc_gen_prologue);
return err;
}
@@ -463,9 +580,11 @@ int bpf__foreach_tev(bpf_prog_iter_callback_t func, void *arg)
for (i = 0; i < pev->ntevs; i++) {
tev = &pev->tevs[i];

- if (priv->need_prologue)
- fd = bpf_program__nth_fd(prog, i);
- else
+ if (priv->need_prologue) {
+ int type = priv->type_mapping[i];
+
+ fd = bpf_program__nth_fd(prog, type);
+ } else
fd = bpf_program__fd(prog);

if (fd < 0) {
--
2.1.0

2015-08-28 07:10:06

by Wang Nan

[permalink] [raw]
Subject: [PATCH 28/32] perf record: Support custom vmlinux path

From: He Kuang <[email protected]>

Make the perf-record command support the --vmlinux option if
BPF_PROLOGUE is on.

'perf record' needs vmlinux as the source of DWARF info to generate
prologues for BPF programs, so the path of vmlinux must be
specifiable.

The short name 'k' has already been taken by 'clockid', so this patch
skips the short option name and uses '--vmlinux' for the vmlinux path.
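
A hypothetical invocation (the vmlinux path and the scriptlet name
are placeholders):

# use DWARF info from the given vmlinux to generate BPF prologues
perf record --vmlinux /boot/vmlinux --event test_bpf.c -a sleep 1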

Signed-off-by: He Kuang <[email protected]>
Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/builtin-record.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 212718c..8eb39d5 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1100,6 +1100,10 @@ struct option __record_options[] = {
"clang binary to use for compiling BPF scriptlets"),
OPT_STRING(0, "clang-opt", &llvm_param.clang_opt, "clang options",
"options passed to clang when compiling BPF scriptlets"),
+#ifdef HAVE_BPF_PROLOGUE
+ OPT_STRING(0, "vmlinux", &symbol_conf.vmlinux_name,
+ "file", "vmlinux pathname"),
+#endif
#endif
OPT_END()
};
--
2.1.0

2015-08-28 07:08:40

by Wang Nan

[permalink] [raw]
Subject: [PATCH 29/32] perf probe: Init symbol as kprobe

Before this patch, add_perf_probe_events() initializes symbol maps
only for uprobes if the first 'struct perf_probe_event' passed to it
is a uprobe event. This trick works because 'perf probe''s command
line syntax constrains the first element of the probe_event array to
be a kprobe if there is any kprobe in it.

However, with the incoming BPF uprobe support, that constraint no
longer holds, since 'perf record' will also probe on k/u probes
through a BPF object, and it is possible to pass an array that
contains a kprobe while its first element is a uprobe.

This patch initializes symbol maps for kprobes even if all of the
events are uprobes, because the extra cost should be small enough.

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/probe-event.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index e720913..b94a8d7 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -2789,7 +2789,7 @@ int add_perf_probe_events(struct perf_probe_event *pevs, int npevs,
{
int i, ret;

- ret = init_symbol_maps(pevs->uprobes);
+ ret = init_symbol_maps(false);
if (ret < 0)
return ret;

--
2.1.0

2015-08-28 07:08:06

by Wang Nan

[permalink] [raw]
Subject: [PATCH 30/32] perf tools: Support attach BPF program on uprobe events

This patch adds new syntax to the BPF object section name to support
probing at uprobe events. Now we can use a BPF program like this:

SEC(
"target=/lib64/libc.so.6\n"
"libcwrite=__write"
)
int libcwrite(void *ctx)
{
return 1;
}

In the section name of a program, 'key=value' style options can be
given before the main config string. Currently the only valid option
key is "target", which selects uprobe probing; see the sketch below.
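
A hedged sketch of how the two syntaxes can coexist in one object (the
event names, symbols and library path are illustrative only; the
kprobe section uses the existing 'perf probe'-style config string):

SEC("mywrite=sys_write")
int on_kprobe(void *ctx)
{
return 1;
}

SEC(
"target=/lib64/libc.so.6\n"
"libcmalloc=malloc"
)
int on_uprobe(void *ctx)
{
return 1;
}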

Signed-off-by: Wang Nan <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Brendan Gregg <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: David Ahern <[email protected]>
Cc: He Kuang <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kaixu Xia <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Cc: [email protected]
Cc: Arnaldo Carvalho de Melo <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/perf/util/bpf-loader.c | 88 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 81 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index a23aaf0..2735389 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -66,6 +66,84 @@ bpf_prog_priv__clear(struct bpf_program *prog __maybe_unused,
}

static int
+do_config(const char *key, const char *value,
+ struct perf_probe_event *pev)
+{
+ pr_debug("config bpf program: %s=%s\n", key, value);
+ if (strcmp(key, "target") == 0) {
+ pev->uprobes = true;
+ pev->target = strdup(value);
+ return 0;
+ }
+
+ pr_warning("BPF: WARNING: invalid config option in object: %s=%s\n",
+ key, value);
+ pr_warning("\tHint: Currently only valid option is 'target=<file>'\n");
+ return 0;
+}
+
+static const char *
+parse_config_kvpair(const char *config_str, struct perf_probe_event *pev)
+{
+ char *text = strdup(config_str);
+ char *sep, *line;
+ const char *main_str = NULL;
+ int err = 0;
+
+ if (!text) {
+ pr_debug("No enough memory: dup config_str failed\n");
+ return NULL;
+ }
+
+ line = text;
+ while ((sep = strchr(line, '\n'))) {
+ char *equ;
+
+ *sep = '\0';
+ equ = strchr(line, '=');
+ if (!equ) {
+ pr_warning("WARNING: invalid config in BPF object: %s\n",
+ line);
+ pr_warning("\tShould be 'key=value'.\n");
+ goto nextline;
+ }
+ *equ = '\0';
+
+ err = do_config(line, equ + 1, pev);
+ if (err)
+ break;
+nextline:
+ line = sep + 1;
+ }
+
+ if (!err)
+ main_str = config_str + (line - text);
+ free(text);
+
+ return main_str;
+}
+
+static int
+parse_config(const char *config_str, struct perf_probe_event *pev)
+{
+ const char *main_str;
+ int err;
+
+ main_str = parse_config_kvpair(config_str, pev);
+ if (!main_str)
+ return -EINVAL;
+
+ err = parse_perf_probe_command(main_str, pev);
+ if (err < 0) {
+ pr_debug("bpf: '%s' is not a valid config string\n",
+ config_str);
+ /* parse failed, don't need clear pev. */
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static int
config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
{
struct bpf_prog_priv *priv = NULL;
@@ -79,13 +157,9 @@ config_bpf_program(struct bpf_program *prog, struct perf_probe_event *pev)
}

pr_debug("bpf: config program '%s'\n", config_str);
- err = parse_perf_probe_command(config_str, pev);
- if (err < 0) {
- pr_debug("bpf: '%s' is not a valid config string\n",
- config_str);
- /* parse failed, don't need clear pev. */
- return -EINVAL;
- }
+ err = parse_config(config_str, pev);
+ if (err)
+ return err;

if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
pr_debug("bpf: '%s': group for event is set and not '%s'.\n",
--
2.1.0

2015-08-28 07:08:39

by Wang Nan

[permalink] [raw]
Subject: [PATCH 31/32] tools lib traceevent: Support function __get_dynamic_array_len

From: He Kuang <[email protected]>

Support the helper function __get_dynamic_array_len() in
libtraceevent. This function is used together with __print_array() or
__print_hex(), but currently it is not among the functions handled by
process_function().

The total allocated length of the dynamic array is embedded in the top
half of the __data_loc_##item field. This patch adds the new arg type
PRINT_DYNAMIC_ARRAY_LEN so that eval_num_arg() can return that length.
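
A minimal standalone sketch of that encoding (the helper names here
are illustrative, not part of the patch): the offset lives in the
bottom 16 bits of the 32-bit field and the allocated length in the top
16 bits:

	static unsigned int data_loc_offset(unsigned int data_loc)
	{
		return data_loc & 0xffff;	/* bottom half: offset */
	}

	static unsigned int data_loc_len(unsigned int data_loc)
	{
		return data_loc >> 16;		/* top half: allocated length */
	}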

Signed-off-by: He Kuang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Signed-off-by: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
tools/lib/traceevent/event-parse.c | 56 +++++++++++++++++++++-
tools/lib/traceevent/event-parse.h | 1 +
.../perf/util/scripting-engines/trace-event-perl.c | 1 +
.../util/scripting-engines/trace-event-python.c | 1 +
4 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c
index 4d88593..1244797 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -848,6 +848,7 @@ static void free_arg(struct print_arg *arg)
free(arg->bitmask.bitmask);
break;
case PRINT_DYNAMIC_ARRAY:
+ case PRINT_DYNAMIC_ARRAY_LEN:
free(arg->dynarray.index);
break;
case PRINT_OP:
@@ -2729,6 +2730,42 @@ process_dynamic_array(struct event_format *event, struct print_arg *arg, char **
}

static enum event_type
+process_dynamic_array_len(struct event_format *event, struct print_arg *arg,
+ char **tok)
+{
+ struct format_field *field;
+ enum event_type type;
+ char *token;
+
+ if (read_expect_type(EVENT_ITEM, &token) < 0)
+ goto out_free;
+
+ arg->type = PRINT_DYNAMIC_ARRAY_LEN;
+
+ /* Find the field */
+ field = pevent_find_field(event, token);
+ if (!field)
+ goto out_free;
+
+ arg->dynarray.field = field;
+ arg->dynarray.index = 0;
+
+ if (read_expected(EVENT_DELIM, ")") < 0)
+ goto out_err;
+
+ type = read_token(&token);
+ *tok = token;
+
+ return type;
+
+ out_free:
+ free_token(token);
+ out_err:
+ *tok = NULL;
+ return EVENT_ERROR;
+}
+
+static enum event_type
process_paren(struct event_format *event, struct print_arg *arg, char **tok)
{
struct print_arg *item_arg;
@@ -2975,6 +3012,10 @@ process_function(struct event_format *event, struct print_arg *arg,
free_token(token);
return process_dynamic_array(event, arg, tok);
}
+ if (strcmp(token, "__get_dynamic_array_len") == 0) {
+ free_token(token);
+ return process_dynamic_array_len(event, arg, tok);
+ }

func = find_func_handler(event->pevent, token);
if (func) {
@@ -3655,14 +3696,25 @@ eval_num_arg(void *data, int size, struct event_format *event, struct print_arg
goto out_warning_op;
}
break;
+ case PRINT_DYNAMIC_ARRAY_LEN:
+ offset = pevent_read_number(pevent,
+ data + arg->dynarray.field->offset,
+ arg->dynarray.field->size);
+ /*
+ * The total allocated length of the dynamic array is
+ * stored in the top half of the field, and the offset
+ * is in the bottom half of the 32 bit field.
+ */
+ val = (unsigned long long)(offset >> 16);
+ break;
case PRINT_DYNAMIC_ARRAY:
/* Without [], we pass the address to the dynamic data */
offset = pevent_read_number(pevent,
data + arg->dynarray.field->offset,
arg->dynarray.field->size);
/*
- * The actual length of the dynamic array is stored
- * in the top half of the field, and the offset
+ * The total allocated length of the dynamic array is
+ * stored in the top half of the field, and the offset
* is in the bottom half of the 32 bit field.
*/
offset &= 0xffff;
diff --git a/tools/lib/traceevent/event-parse.h b/tools/lib/traceevent/event-parse.h
index 204befb..6fc83c7 100644
--- a/tools/lib/traceevent/event-parse.h
+++ b/tools/lib/traceevent/event-parse.h
@@ -294,6 +294,7 @@ enum print_arg_type {
PRINT_OP,
PRINT_FUNC,
PRINT_BITMASK,
+ PRINT_DYNAMIC_ARRAY_LEN,
};

struct print_arg {
diff --git a/tools/perf/util/scripting-engines/trace-event-perl.c b/tools/perf/util/scripting-engines/trace-event-perl.c
index 1bd593b..544509c 100644
--- a/tools/perf/util/scripting-engines/trace-event-perl.c
+++ b/tools/perf/util/scripting-engines/trace-event-perl.c
@@ -221,6 +221,7 @@ static void define_event_symbols(struct event_format *event,
break;
case PRINT_BSTRING:
case PRINT_DYNAMIC_ARRAY:
+ case PRINT_DYNAMIC_ARRAY_LEN:
case PRINT_STRING:
case PRINT_BITMASK:
break;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index ace2484..aa9e125 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -251,6 +251,7 @@ static void define_event_symbols(struct event_format *event,
/* gcc warns for these? */
case PRINT_BSTRING:
case PRINT_DYNAMIC_ARRAY:
+ case PRINT_DYNAMIC_ARRAY_LEN:
case PRINT_FUNC:
case PRINT_BITMASK:
/* we should warn... */
--
2.1.0

2015-08-28 07:08:05

by Wang Nan

[permalink] [raw]
Subject: [PATCH 32/32] bpf: Introduce function for outputing data to perf event

From: He Kuang <[email protected]>

There are scenarios where we need an eBPF program to record not only
the args at a kprobe point, but also other information when the probe
point is entered, such as PMU counters, time latencies or the number
of cache misses between two probe points.

This patch adds a new trace event to establish infrastructure for bpf
to output data to perf. Userspace perf tools can detect and use this
event like the existing tracepoint events.

New bpf trace event entry in debugfs:

/sys/kernel/debug/tracing/events/bpf/bpf_output_data

Userspace perf tools detect the new tracepoint event as:

bpf:bpf_output_data [Tracepoint event]

Data in the ring-buffer of perf events attached to this event will be
polled out; sample types and other attributes can be adjusted on those
events directly without touching the original kprobe events.

The new bpf helper function gives an eBPF program the ability to
output data as a perf sample event. The helper simply calls the new
trace event, and userspace perf tools can record the
bpf:bpf_output_data event to collect those records.
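
A minimal usage sketch in the samples/bpf style (the probe point and
payload here are illustrative assumptions, not part of this patch):

#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct out_data {
	unsigned long long ktime;	/* illustrative payload */
};

SEC("kprobe/sys_write")
int probe_write(void *ctx)
{
	struct out_data out = {};

	out.ktime = bpf_ktime_get_ns();

	/* emit the record through the bpf:bpf_output_data tracepoint */
	bpf_output_trace_data(&out, sizeof(out));
	return 0;
}

/* the helper is gpl_only, so the program must be GPL-licensed */
char _license[] SEC("license") = "GPL";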

Signed-off-by: He Kuang <[email protected]>
Acked-by: Namhyung Kim <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Signed-off-by: Wang Nan <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
---
include/trace/events/bpf.h | 30 ++++++++++++++++++++++++++++++
include/uapi/linux/bpf.h | 7 +++++++
kernel/trace/bpf_trace.c | 23 +++++++++++++++++++++++
samples/bpf/bpf_helpers.h | 2 ++
4 files changed, 62 insertions(+)
create mode 100644 include/trace/events/bpf.h

diff --git a/include/trace/events/bpf.h b/include/trace/events/bpf.h
new file mode 100644
index 0000000..6b739b8
--- /dev/null
+++ b/include/trace/events/bpf.h
@@ -0,0 +1,30 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM bpf
+
+#if !defined(_TRACE_BPF_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_BPF_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(bpf_output_data,
+
+ TP_PROTO(u64 *src, int size),
+
+ TP_ARGS(src, size),
+
+ TP_STRUCT__entry(
+ __dynamic_array(u8, buf, size)
+ ),
+
+ TP_fast_assign(
+ memcpy(__get_dynamic_array(buf), src, size);
+ ),
+
+ TP_printk("%s", __print_hex(__get_dynamic_array(buf),
+ __get_dynamic_array_len(buf)))
+);
+
+#endif /* _TRACE_BPF_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 29ef6f9..5068ab1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -249,6 +249,13 @@ enum bpf_func_id {
* Return: 0 on success
*/
BPF_FUNC_get_current_comm,
+
+ /**
+ * int bpf_output_trace_data(void *src, int size)
+ * Return: 0 on success
+ */
+ BPF_FUNC_output_trace_data,
+
__BPF_FUNC_MAX_ID,
};

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 88a041a..219f670 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -11,7 +11,10 @@
#include <linux/filter.h>
#include <linux/uaccess.h>
#include <linux/ctype.h>
+
#include "trace.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/bpf.h>

static DEFINE_PER_CPU(int, bpf_prog_active);

@@ -79,6 +82,24 @@ static const struct bpf_func_proto bpf_probe_read_proto = {
.arg3_type = ARG_ANYTHING,
};

+static u64 bpf_output_trace_data(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+ void *src = (void *) (long) r1;
+ int size = (int) r2;
+
+ trace_bpf_output_data(src, size);
+
+ return 0;
+}
+
+static const struct bpf_func_proto bpf_output_trace_data_proto = {
+ .func = bpf_output_trace_data,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_STACK,
+ .arg2_type = ARG_CONST_STACK_SIZE,
+};
+
/*
* limited trace_printk()
* only %d %u %x %ld %lu %lx %lld %llu %llx %p conversion specifiers allowed
@@ -169,6 +190,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
return &bpf_map_delete_elem_proto;
case BPF_FUNC_probe_read:
return &bpf_probe_read_proto;
+ case BPF_FUNC_output_trace_data:
+ return &bpf_output_trace_data_proto;
case BPF_FUNC_ktime_get_ns:
return &bpf_ktime_get_ns_proto;
case BPF_FUNC_tail_call:
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index bdf1c16..0aeaebe 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -59,5 +59,7 @@ static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flag
(void *) BPF_FUNC_l3_csum_replace;
static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) =
(void *) BPF_FUNC_l4_csum_replace;
+static int (*bpf_output_trace_data)(void *src, int size) =
+ (void *) BPF_FUNC_output_trace_data;

#endif
--
2.1.0

2015-08-29 00:25:31

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [GIT PULL 00/32] perf tools: filtering events using eBPF programs

On 8/28/15 12:05 AM, Wang Nan wrote:
> Hi Arnaldo,
>
> This time I adjust all Cc and Link field in each patch.
>
> Four new patches (1,2,3,12/32) is newly introduced for fixing a bug
> related to '--filter' option. Patch 06/32 is also modified. Please keep
> an eye on it.

Arnaldo, what is the latest news on this set?
I think you've looked at most of them over the last months and a few
patch reorders were necessary. Has it all been addressed? All further
work is sadly blocked, because these core patches need to come in first.
I took another look today and to me patches 1-30 look good.
Thanks!

2015-08-29 00:29:43

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH 31/32] tools lib traceevent: Support function __get_dynamic_array_len

On 8/28/15 12:06 AM, Wang Nan wrote:
> From: He Kuang <[email protected]>
>
> Support the helper function __get_dynamic_array_len() in
> libtraceevent. This function is used together with __print_array() or
> __print_hex(), but currently it is not among the functions handled by
> process_function().
>
> The total allocated length of the dynamic array is embedded in the top
> half of the __data_loc_##item field. This patch adds the new arg type
> PRINT_DYNAMIC_ARRAY_LEN so that eval_num_arg() can return that length.
>
> Signed-off-by: He Kuang <[email protected]>
> Acked-by: Namhyung Kim <[email protected]>

Tested-by: Alexei Starovoitov <[email protected]>

this patch fixes the perf crash:
Warning: [bpf:bpf_output_data] function __get_dynamic_array_len not
defined
Warning: Error: expected type 5 but read 0
*** glibc detected *** perf_4.2.0: double free or corruption (fasttop):
0x00000000032caf20 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7ec96)[0x7f0d5d2d3c96]

it's not strictly necessary until patch 32 lands, but I think it's
a good fix regardless.
Steven, could you take it into your tree?

2015-08-29 00:45:11

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event

On 8/28/15 12:06 AM, Wang Nan wrote:
> This patch adds a new trace event to establish infrastructure for bpf to
> output data to perf. Userspace perf tools can detect and use this event
> as using the existing tracepoint events.
>
> New bpf trace event entry in debugfs:
>
> /sys/kernel/debug/tracing/events/bpf/bpf_output_data
>
> Userspace perf tools detect the new tracepoint event as:
>
> bpf:bpf_output_data [Tracepoint event]
>
> Data in the ring-buffer of perf events attached to this event will be
> polled out; sample types and other attributes can be adjusted on those
> events directly without touching the original kprobe events.

Wang,
I have second thoughts on this.
I've played with it, but the global bpf:bpf_output_data event is
limiting. I'd like to use this bpf_output_trace_data() helper for tcp
estats gathering, but a global collector will prevent other similar
bpf programs from running in parallel.
So as a concept I think it's very useful, but we need a way to select
which ring-buffer to output data to.
proposal A:
Can we use the ftrace instances concept and make bpf_output_trace_data()
write into that particular trace_pipe?
proposal B:
The bpf_perf_event_read() model uses the nice concept of an array of
perf_events. Can we perf_event_open a 'new' event that can be mmapped
in user space and bpf_output_trace_data(idx, buf, buf_size) into it,
where 'idx' will be the index of the FD from perf_event_open of such a
new event?

Thanks!

2015-08-29 00:51:42

by Wang Nan

[permalink] [raw]
Subject: Re: [PATCH 06/32] perf tools: Enable passing bpf object file to --event



On 2015/8/28 15:05, Wang Nan wrote:
> diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c
> index ef5fde6..24c8b63 100644
> --- a/tools/perf/builtin-trace.c
> +++ b/tools/perf/builtin-trace.c
> @@ -3090,6 +3090,7 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
> if (trace.evlist->nr_entries > 0)
> evlist__set_evsel_handler(trace.evlist, trace__event_handler);
>
> + /* trace__record calls cmd_record, which calls bpf__clear() */
> if ((argc >= 1) && (strcmp(argv[0], "record") == 0))
> return trace__record(&trace, argc-1, &argv[1]);
>
> @@ -3100,7 +3101,8 @@ int cmd_trace(int argc, const char **argv, const char *prefix __maybe_unused)
> if (!trace.trace_syscalls && !trace.trace_pgfaults &&
> trace.evlist->nr_entries == 0 /* Was --events used? */) {
> pr_err("Please specify something to trace.\n");
> - return -1;
> + err = -1;
> + goto out;
> }
>
> if (output_name != NULL) {
> @@ -3159,5 +3161,6 @@ out_close:
> if (output_name != NULL)
> fclose(trace.output);
> out:
> + bpf__clear();
> return err;
> }
>

Sorry, there is a silly mistake here: I missed

#include "bpf-loader.h"

at the head of builtin-trace.c. In my default environment
builtin-trace.c is not compiled, so I only found this problem today
when I compiled it on another machine. I'll fix it in my tree.

Arnaldo, since you suggested that Ingo pull directly, shall I make
another pull request with the whole 32 patches to fix that line?

Thank you.


2015-08-29 01:20:51

by Wang Nan

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event



On 2015/8/29 8:45, Alexei Starovoitov wrote:
> On 8/28/15 12:06 AM, Wang Nan wrote:
>> This patch adds a new trace event to establish infrastructure for bpf to
>> output data to perf. Userspace perf tools can detect and use this event
>> as using the existing tracepoint events.
>>
>> New bpf trace event entry in debugfs:
>>
>> /sys/kernel/debug/tracing/events/bpf/bpf_output_data
>>
>> Userspace perf tools detect the new tracepoint event as:
>>
>> bpf:bpf_output_data [Tracepoint event]
>>
>> Data in the ring-buffer of perf events attached to this event will be
>> polled out; sample types and other attributes can be adjusted on those
>> events directly without touching the original kprobe events.
>
> Wang,
> I have 2nd thoughts on this.
> I've played with it, but global bpf:bpf_output_data event is limiting.
> I'd like to use this bpf_output_trace_data() helper for tcp estats
> gathering, but global collector will prevent other similar bpf programs
> running in parallel.

So the current model works for you, but the problem is that all output
goes into one place, which prevents similar BPF programs from running
in parallel because the receiver is unable to tell which message was
generated by whom. So you actually want a publish-and-subscribe model,
where a subscriber gets messages only from the publishers it is
interested in. Do I understand your problem correctly?

> So as a concept I think it's very useful, but we need a way to select
> which ring-buffer to output data to.
> proposal A:
> Can we use the ftrace instances concept and make bpf_output_trace_data()
> write into that particular trace_pipe?
> proposal B:
> The bpf_perf_event_read() model uses the nice concept of an array of
> perf_events. Can we perf_event_open a 'new' event that can be mmapped
> in user space and bpf_output_trace_data(idx, buf, buf_size) into it,
> where 'idx' will be the index of the FD from perf_event_open of such a
> new event?
>

I've also been thinking about adding an extra id parameter to
bpf_output_trace_data(), but that is for encoding the type of the
output data, which is totally different from what you want.

For me, I use bpf_output_trace_data() to output information like PMU
count values. Perf is the only receiver, so a global collector is
perfect. Could you please describe your usecase in more detail?

Thank you for using that feature!

> Thanks!
>

2015-08-29 01:34:42

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event

On 8/28/15 6:19 PM, Wangnan (F) wrote:
> For me, I use bpf_output_trace_data() to output information like PMU
> count values. Perf is the only receiver, so a global collector is
> perfect. Could you please describe your usecase in more detail?

there is a special receiver in user space that only wants the data from
the bpf program that it loaded. It shouldn't conflict with any other
processes. Like when it's running, I should still be able to use perf
for other performance analysis. There is no way to share a single
bpf:bpf_output_data event, since these user processes are completely
independent.

2015-08-29 02:16:31

by Wang Nan

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event



On 2015/8/29 9:34, Alexei Starovoitov wrote:
> On 8/28/15 6:19 PM, Wangnan (F) wrote:
>> For me, I use bpf_output_trace_data() to output information like PMU
>> count values. Perf is the only receiver, so a global collector is
>> perfect. Could you please describe your usecase in more detail?
>
> there is a special receiver in user space that only wants the data from
> the bpf program that it loaded. It shouldn't conflict with any other
> processes. Like when it's running, I should still be able to use perf
> for other performance analysis. There is no way to share a single
> bpf:bpf_output_data event, since these user processes are completely
> independent.
>
I'd like to see whether it is possible to create dynamic tracepoints so
that different receivers can listen on different tracepoints. For my
part, maybe I can encode format information into the new tracepoints so
I don't need those LLVM patches.

For example:

# echo 'dynamic_tracepoint:mytracepoint <encode its format>' >>
/sys/kernel/debug/tracing/dynamic_trace_events
# perf list
...
dynamic_tracepoint:mytracepoint
...

On the perf side we can encode the creation of dynamic tracepoints into
bpf-loader, like what we currently do when probing kprobes.

This approach requires us to create a fresh new event source, in
parallel with tracepoints. I'm not sure how much work it needs. What do
you think?

Thank you.

2015-08-29 02:22:07

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event

On 8/28/15 7:15 PM, Wangnan (F) wrote:
> I'd like to see whether it is possible to create dynamic tracepoints so
> different receivers can listen on different tracepoints.

see my proposal A. I think ftrace instances might work for this.

I'm not sure about the 'format' part though. The kernel side shouldn't
be aware of it. It's only the contract between the bpf program and the
user process that deals with it.

2015-08-29 02:37:23

by Wang Nan

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event



On 2015/8/29 10:22, Alexei Starovoitov wrote:
> On 8/28/15 7:15 PM, Wangnan (F) wrote:
>> I'd like to see whether it is possible to create dynamic tracepoints so
>> different receivers can listen on different tracepoints.
>
> see my proposal A. I think ftrace instances might work for this.
>
> I'm not sure about the 'format' part though. The kernel side shouldn't
> be aware of it. It's only the contract between the bpf program and the
> user process that deals with it.
>
It is an option. Let's keep an open mind for now :)

For the current patch 32/32, I think it is useful enough for some
simple cases, and we have already started using it internally. What
about keeping it as it is now and creating an independent method for
your usecase?

Thank you.

2015-08-29 02:49:46

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event

On 8/28/15 7:36 PM, Wangnan (F) wrote:
> For the current patch 32/32, I think it is useful enough for some
> simple cases, and we have already started using it internally. What
> about keeping it as it is now and creating an independent method for
> your usecase?

well, though the patch is small and contained, I think we can do better
and define a more generic helper. I believe Namhyung had the same
concern back in July.

2015-08-29 02:52:04

by Wang Nan

[permalink] [raw]
Subject: Re: [PATCH 32/32] bpf: Introduce function for outputing data to perf event



On 2015/8/29 10:49, Alexei Starovoitov wrote:
> On 8/28/15 7:36 PM, Wangnan (F) wrote:
>> For the current patch 32/32, I think it is useful enough for some
>> simple cases, and we have already started using it internally. What
>> about keeping it as it is now and creating an independent method for
>> your usecase?
>
> well, though the patch is small and contained, I think we can do better
> and define more generic helper. I believe Namhyung back in July had
> the same concern.
>
OK. I'll drop this one in my next pull request.

Thank you.

2015-08-31 13:59:17

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [GIT PULL 00/32] perf tools: filtering events using eBPF programs

Em Fri, Aug 28, 2015 at 05:25:27PM -0700, Alexei Starovoitov escreveu:
> On 8/28/15 12:05 AM, Wang Nan wrote:
> >This time I adjust all Cc and Link field in each patch.

> >Four new patches (1,2,3,12/32) is newly introduced for fixing a bug
> >related to '--filter' option. Patch 06/32 is also modified. Please keep
> >an eye on it.

> Arnaldo, what is the latest news on this set?
> I think you've looked at most of them over the last months and few patch
> reorders were necessary. Is it all addressed ? All further work is
> sadly blocked, because these core patches need to come in first.
> I took another look today and to me patches 1-30 look good.

I asked Ingo if he had anything else to mention about the changelog
format so that I could try pulling it directly, i.e. I need to give it
a last look. This is not a per-patchkit cost, it's just fine tuning
that _should_ make processing subsequent patchkits faster, by pulling
instead of me going through each patch.

But I disagree that it "prevents further work"; nobody has to wait for
everything to get upstream to do work. Anyway, it will be processed.

- Arnaldo

2015-08-31 14:10:08

by Wang Nan

[permalink] [raw]
Subject: Re: [GIT PULL 00/32] perf tools: filtering events using eBPF programs



On 2015/8/31 21:59, Arnaldo Carvalho de Melo wrote:
> Em Fri, Aug 28, 2015 at 05:25:27PM -0700, Alexei Starovoitov escreveu:
>> On 8/28/15 12:05 AM, Wang Nan wrote:
>>> This time I adjust all Cc and Link field in each patch.
>>> Four new patches (1,2,3,12/32) is newly introduced for fixing a bug
>>> related to '--filter' option. Patch 06/32 is also modified. Please keep
>>> an eye on it.
>
>> Arnaldo, what is the latest news on this set?
>> I think you've looked at most of them over the last months and few patch
>> reorders were necessary. Is it all addressed ? All further work is
>> sadly blocked, because these core patches need to come in first.
>> I took another look today and to me patches 1-30 look good.
> I asked Ingo if he had anything else to mention about changelog format
> so that I could try pulling it directly, i.e. I need give it a last
> look, and this is not a per-patchkit cost, its just fine tuning that
> _should_ make processing subsequent patchkits faster, by pulling instead
> of me going thru each patch.
>
> But I disagree it "prevents further work", nobody has to wait for
> everything to get upstream to do work, anyway, it will be processed.

I think Xia Kaixu's BPF PMU-reading patch series is waiting for this.
Please have a look at [1]. His kernel side patches have already been
collected by net-next and are waiting for the userspace update.

However, he is also waiting for net-next to be merged, and currently
we are the only user of that feature :)

[1] http://lkml.kernel.org/r/[email protected]

> - Arnaldo