2015-04-30 11:02:31

by Wang Nan

Subject: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

This series of patches is an approach to integrate eBPF with perf.
After applying these patches, users can use the following
command to load an eBPF program compiled by LLVM into the kernel:

$ perf bpf sample_bpf.o

The required BPF code and the loading procedure are similar to Alexei
Starovoitov's libbpf in samples/bpf, with the following exceptions:

1. Section names are not required to start with 'kprobe/' or
'kretprobe/'. Without such a prefix, any valid C variable name can be used.

2. A 'config' section can be provided to describe the position and
arguments of a program. Its syntax is identical to that of 'perf probe'.
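As an illustration of the first rule, a loader might classify a program section by its name roughly as follows. This is a minimal sketch, not the code from this series; `enum prog_kind` and `classify_section()` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of a program section name. */
enum prog_kind {
	PROG_KPROBE,    /* "kprobe/<func>": probe at function entry */
	PROG_KRETPROBE, /* "kretprobe/<func>": probe at function return */
	PROG_CONFIG,    /* any other name: look it up in the 'config' section */
};

/* Returns the kind and points *target at the text after the prefix, if any. */
static enum prog_kind classify_section(const char *name, const char **target)
{
	static const char kp[] = "kprobe/";
	static const char krp[] = "kretprobe/";

	if (strncmp(name, kp, sizeof(kp) - 1) == 0) {
		*target = name + sizeof(kp) - 1;
		return PROG_KPROBE;
	}
	if (strncmp(name, krp, sizeof(krp) - 1) == 0) {
		*target = name + sizeof(krp) - 1;
		return PROG_KRETPROBE;
	}
	*target = name; /* e.g. "mybpfprog", resolved through the config section */
	return PROG_CONFIG;
}
```

With this scheme, "kprobe/kmem_cache_free" yields PROG_KPROBE with target "kmem_cache_free", while "mybpfprog" falls through to PROG_CONFIG.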

An example is pasted at the bottom of this cover letter. In that
example, mybpfprog is configured by a string in the config section and
will be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:

$ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
-Wno-unused-value -Wno-pointer-sign \
-O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
sample_bpf.o

It can then be loaded using:

$ perf bpf sample_bpf.o

This series provides only limited functionality. The following items are
on the todo list:

1. Unprobe kprobe stubs used by eBPF programs when unloading;

2. Enable eBPF programs to access local variables and arguments
by utilizing debuginfo;

3. Output data in the perf way.

In this series:

Patch 1/22 is a bugfix in perf probe, which may be triggered by the
following patches.

Patches 2-3/22 are preparation: they add the required macros and syscall
definitions to the perf source tree.

Patch 4/22 adds the 'perf bpf' command.

Patches 5-20/22 do the groundwork: they parse the ELF object file, collect
information from it, create the maps needed by programs, link maps to
programs, configure programs, and load them into the kernel.

Patches 21-22/22 complete the work. Patch 21 creates the kprobe points
used by eBPF programs, and patch 22 creates perf file descriptors and
attaches the eBPF programs to them.

-------- EXAMPLE --------
----- sample_bpf.c -----
#include <uapi/linux/bpf.h>
#include <linux/version.h>
#include <uapi/linux/ptrace.h>

#define SEC(NAME) __attribute__((section(NAME), used))

static int (*bpf_map_delete_elem)(void *map, void *key) =
(void *) BPF_FUNC_map_delete_elem;
static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
(void *) BPF_FUNC_trace_printk;

struct bpf_map_def {
unsigned int type;
unsigned int key_size;
unsigned int value_size;
unsigned int max_entries;
};

struct pair {
u64 val;
u64 ip;
};

struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(long),
.value_size = sizeof(struct pair),
.max_entries = 1000000,
};

SEC("kprobe/kmem_cache_free")
int bpf_prog1(struct pt_regs *ctx)
{
long ptr = ctx->r14;
bpf_map_delete_elem(&my_map, &ptr);
return 0;
}

SEC("mybpfprog")
int bpf_prog_my(void *ctx)
{
char fmt[] = "Haha\n";
bpf_trace_printk(fmt, sizeof(fmt));
return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
char _config[] SEC("config") = ""
"mybpfprog=__alloc_pages_nodemask\n";

Wang Nan (22):
perf: probe: avoid segfault if passed with ''.
perf: bpf: prepare: add __aligned_u64 to types.h.
perf: add bpf common operations.
perf tools: Add new 'perf bpf' command.
perf bpf: open eBPF object file and do basic validation.
perf bpf: check swap according to EHDR.
perf bpf: iterate over elf sections to collect information.
perf bpf: collect version and license from ELF.
perf bpf: collect map definitions.
perf bpf: collect config section in object.
perf bpf: collect symbol table in object files.
perf bpf: collect bpf programs from object files.
perf bpf: collects relocation sections from object file.
perf bpf: config eBPF programs based on their names.
perf bpf: config eBPF programs using config section.
perf bpf: create maps needed by object file.
perf bpf: relocation programs.
perf bpf: load eBPF programs into kernel.
perf bpf: dump eBPF program before loading.
perf bpf: clean elf memory after loading.
perf bpf: probe at kprobe points.
perf bpf: attaches eBPF program to perf fd.

tools/include/linux/types.h | 5 +
tools/perf/Build | 1 +
tools/perf/Documentation/perf-bpf.txt | 18 +
tools/perf/builtin-bpf.c | 63 ++
tools/perf/builtin.h | 1 +
tools/perf/perf-sys.h | 6 +
tools/perf/perf.c | 1 +
tools/perf/util/Build | 2 +
tools/perf/util/bpf-loader.c | 1061 +++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 73 +++
tools/perf/util/bpf.c | 195 ++++++
tools/perf/util/bpf.h | 23 +
tools/perf/util/probe-event.c | 2 +
13 files changed, 1451 insertions(+)
create mode 100644 tools/perf/Documentation/perf-bpf.txt
create mode 100644 tools/perf/builtin-bpf.c
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h
create mode 100644 tools/perf/util/bpf.c
create mode 100644 tools/perf/util/bpf.h

--
1.8.3.4


2015-04-30 10:53:36

by Wang Nan

Subject: [RFC PATCH 01/22] perf: probe: avoid segfault if passed with ''.

Since parse_perf_probe_point() deals with a user passed argument, we
should not assume it to be a valid string.

Without this patch, passing '' to perf probe causes a segfault:

$ perf probe -a ''
Segmentation fault

This patch checks the argument of parse_perf_probe_point() before any
string processing.

After this patch:

$ perf probe -a ''

usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
...
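The fix follows the usual pattern of rejecting a missing argument before any string scanning. A minimal sketch of that pattern, using a hypothetical `parse_point()` rather than the real parse_perf_probe_point(), which does much more:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical parser: splits "EVENT=FUNC" and reports the event name
 * length. The NULL check comes first, so strpbrk() never sees an
 * invalid pointer.
 */
static int parse_point(const char *arg, size_t *event_len)
{
	const char *ptr;

	if (!arg)
		return -EINVAL;

	*event_len = 0;
	ptr = strpbrk(arg, ";=@+%");
	if (ptr && *ptr == '=')	/* Event name present */
		*event_len = (size_t)(ptr - arg);
	return 0;
}
```

Returning -EINVAL lets the caller print the usage text instead of crashing.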

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/probe-event.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index d8bb616..d05b77c 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -1084,6 +1084,8 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
*
* TODO:Group name support
*/
+ if (!arg)
+ return -EINVAL;

ptr = strpbrk(arg, ";=@+%");
if (ptr && *ptr == '=') { /* Event name */
--
1.8.3.4

2015-04-30 10:53:32

by Wang Nan

Subject: [RFC PATCH 02/22] perf: bpf: prepare: add __aligned_u64 to types.h.

The following patches will introduce linux/bpf.h to perf, which
requires a definition of __aligned_u64. This patch adds it to the common
types.h for tools.
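The attribute matters because some 32-bit ABIs (i386, for example) align a plain 64-bit integer to only 4 bytes inside structures, whereas the kernel's uapi structures expect 8. A small sketch, assuming GCC or Clang attribute syntax and a local stand-in typedef for __u64; `struct example_attr` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t __u64; /* local stand-in for the tools' __u64 */

#ifndef __aligned_u64
# define __aligned_u64 __u64 __attribute__((aligned(8)))
#endif

/* Without the attribute, 'val' could sit at offset 4 on i386. */
struct example_attr {
	uint32_t flags;
	__aligned_u64 val;
};
```

With the macro, the layout is the same on 32-bit and 64-bit builds, so a structure passed to the kernel keeps the offsets the kernel expects.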

Signed-off-by: Wang Nan <[email protected]>
---
tools/include/linux/types.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/tools/include/linux/types.h b/tools/include/linux/types.h
index b5cf25e..10a2cdc 100644
--- a/tools/include/linux/types.h
+++ b/tools/include/linux/types.h
@@ -60,6 +60,11 @@ typedef __u32 __bitwise __be32;
typedef __u64 __bitwise __le64;
typedef __u64 __bitwise __be64;

+/* Taken from uapi/linux/types.h. Required by linux/bpf.h */
+#ifndef __aligned_u64
+# define __aligned_u64 __u64 __attribute__((aligned(8)))
+#endif
+
struct list_head {
struct list_head *next, *prev;
};
--
1.8.3.4

2015-04-30 11:02:37

by Wang Nan

Subject: [RFC PATCH 03/22] perf: add bpf common operations.

Add the bpf syscall and related structures to perf for use by the bpf loader.
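The fallback pattern in perf-sys.h can be sketched in isolation: use the C library's __NR_bpf if it exists, otherwise hard-code the numbers this patch adds (321 on x86_64, 357 on i386), and invoke the raw syscall through syscall(2). This is a minimal sketch; `raw_bpf()` is a hypothetical name, and any other architecture simply fails to compile here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_bpf
# if defined(__x86_64__)
#  define __NR_bpf 321
# elif defined(__i386__)
#  define __NR_bpf 357
# else
#  error "__NR_bpf is not known for this architecture"
# endif
#endif

/* Thin wrapper; returns -1 and sets errno on failure, like syscall(2). */
static long raw_bpf(int cmd, void *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}
```

The #ifndef guard means newer C library headers always win, so the hard-coded numbers only matter on older toolchains.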

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/perf-sys.h | 6 ++++++
tools/perf/util/Build | 1 +
tools/perf/util/bpf.c | 39 +++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf.h | 22 ++++++++++++++++++++++
4 files changed, 68 insertions(+)
create mode 100644 tools/perf/util/bpf.c
create mode 100644 tools/perf/util/bpf.h

diff --git a/tools/perf/perf-sys.h b/tools/perf/perf-sys.h
index 6ef6816..b38ca8b 100644
--- a/tools/perf/perf-sys.h
+++ b/tools/perf/perf-sys.h
@@ -22,6 +22,9 @@
#ifndef __NR_gettid
# define __NR_gettid 224
#endif
+#ifndef __NR_bpf
+# define __NR_bpf 357
+#endif
#endif

#if defined(__x86_64__)
@@ -39,6 +42,9 @@
#ifndef __NR_gettid
# define __NR_gettid 186
#endif
+#ifndef __NR_bpf
+# define __NR_bpf 321
+#endif
#endif

#ifdef __powerpc__
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 797490a..dfba2f0 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -74,6 +74,7 @@ libperf-y += data.o
libperf-$(CONFIG_X86) += tsc.o
libperf-y += cloexec.o
libperf-y += thread-stack.o
+libperf-y += bpf.o

libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-event.o
diff --git a/tools/perf/util/bpf.c b/tools/perf/util/bpf.c
new file mode 100644
index 0000000..f752723
--- /dev/null
+++ b/tools/perf/util/bpf.c
@@ -0,0 +1,39 @@
+/*
+ * common BPF operations.
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+
+#include <stdlib.h>
+#include <string.h>
+#include <linux/unistd.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+#include <errno.h>
+#include "perf.h"
+#include "bpf.h"
+
+int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, size_t size)
+{
+ return syscall(__NR_bpf, cmd, attr, size);
+}
+
+int bpf_create_map(struct bpf_map_def *map_def)
+{
+ union bpf_attr attr;
+
+ if (!map_def)
+ return -EFAULT;
+
+ bzero(&attr, sizeof(attr));
+
+ attr.map_type = map_def->type;
+ attr.key_size = map_def->key_size;
+ attr.value_size = map_def->value_size;
+ attr.max_entries = map_def->max_entries;
+
+ return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+}
diff --git a/tools/perf/util/bpf.h b/tools/perf/util/bpf.h
new file mode 100644
index 0000000..be106b0
--- /dev/null
+++ b/tools/perf/util/bpf.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __PERF_BPF_H
+#define __PERF_BPF_H
+
+#include <linux/bpf.h>
+
+struct bpf_map_def {
+ unsigned int type;
+ unsigned int key_size;
+ unsigned int value_size;
+ unsigned int max_entries;
+};
+
+int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, size_t size);
+
+int bpf_create_map(struct bpf_map_def *map_def);
+#endif
--
1.8.3.4

2015-04-30 11:01:00

by Wang Nan

Subject: [RFC PATCH 04/22] perf tools: Add new 'perf bpf' command.

Add a new 'perf bpf' command to provide eBPF program loading and
management support.
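The new command becomes reachable once cmd_bpf is registered in the commands[] table in perf.c, which perf searches by name. A simplified sketch of that name-based dispatch; `struct cmd_entry`, `run_command()` and the stub handlers are illustrative only, not perf's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for perf's struct cmd_struct. */
struct cmd_entry {
	const char *name;
	int (*fn)(int argc, const char **argv, const char *prefix);
};

/* Stub handlers standing in for cmd_data() and cmd_bpf(). */
static int stub_data(int argc, const char **argv, const char *prefix)
{
	(void)argv; (void)prefix;
	return argc;
}

static int stub_bpf(int argc, const char **argv, const char *prefix)
{
	(void)argv; (void)prefix;
	return 100 + argc;
}

static const struct cmd_entry commands[] = {
	{ "data", stub_data },
	{ "bpf",  stub_bpf },
};

/* Finds argv[0] in the table and runs the handler; -1 if unknown. */
static int run_command(int argc, const char **argv)
{
	size_t i;

	for (i = 0; i < sizeof(commands) / sizeof(commands[0]); i++)
		if (strcmp(argv[0], commands[i].name) == 0)
			return commands[i].fn(argc, argv, NULL);
	return -1;
}
```

For example, run_command() with argv { "bpf", "sample_bpf.o" } reaches the bpf handler with argc == 2.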

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/Build | 1 +
tools/perf/Documentation/perf-bpf.txt | 18 ++++++++++
tools/perf/builtin-bpf.c | 63 +++++++++++++++++++++++++++++++++++
tools/perf/builtin.h | 1 +
tools/perf/perf.c | 1 +
tools/perf/util/Build | 1 +
tools/perf/util/bpf-loader.c | 35 +++++++++++++++++++
tools/perf/util/bpf-loader.h | 21 ++++++++++++
8 files changed, 141 insertions(+)
create mode 100644 tools/perf/Documentation/perf-bpf.txt
create mode 100644 tools/perf/builtin-bpf.c
create mode 100644 tools/perf/util/bpf-loader.c
create mode 100644 tools/perf/util/bpf-loader.h

diff --git a/tools/perf/Build b/tools/perf/Build
index b77370e..c69f0c1 100644
--- a/tools/perf/Build
+++ b/tools/perf/Build
@@ -19,6 +19,7 @@ perf-y += builtin-kvm.o
perf-y += builtin-inject.o
perf-y += builtin-mem.o
perf-y += builtin-data.o
+perf-y += builtin-bpf.o

perf-$(CONFIG_AUDIT) += builtin-trace.o
perf-$(CONFIG_LIBELF) += builtin-probe.o
diff --git a/tools/perf/Documentation/perf-bpf.txt b/tools/perf/Documentation/perf-bpf.txt
new file mode 100644
index 0000000..634d588
--- /dev/null
+++ b/tools/perf/Documentation/perf-bpf.txt
@@ -0,0 +1,18 @@
+perf-bpf(1)
+==============
+
+NAME
+----
+perf-bpf - loads eBPF programs into kernel.
+
+SYNOPSIS
+--------
+[verse]
+'perf bpf' [<common options>] <bpfprogram.o>
+
+DESCRIPTION
+-----------
+Load eBPF programs into the kernel.
+
+OPTIONS
+-------
diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
new file mode 100644
index 0000000..0fc7a82
--- /dev/null
+++ b/tools/perf/builtin-bpf.c
@@ -0,0 +1,63 @@
+/*
+ * builtin-bpf.c
+ *
+ * Builtin bpf command: Load bpf and attach bpf programs onto kprobes.
+ */
+#include "builtin.h"
+#include "perf.h"
+#include "debug.h"
+#include "parse-options.h"
+#include "bpf-loader.h"
+
+static const char *bpf_usage[] = {
+ "perf bpf [<options>] <bpfobj>",
+ NULL
+};
+
+static void print_usage(void)
+{
+ printf("Usage:\n");
+ printf("\t%s\n\n", bpf_usage[0]);
+}
+
+struct option __bpf_options[] = {
+ OPT_INCR('v', "verbose", &verbose, "be more verbose"),
+ OPT_END()
+};
+
+struct option *bpf_options = __bpf_options;
+
+int cmd_bpf(int argc, const char **argv,
+ const char *prefix __maybe_unused)
+{
+ int err;
+ const char **pfn;
+
+ if (argc < 2)
+ goto usage;
+
+ argc = parse_options(argc, argv, bpf_options, bpf_usage,
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ if (argc < 1)
+ goto usage;
+
+ pfn = argv;
+ while (*pfn != NULL) {
+ const char *fn = *pfn++;
+
+ err = bpf__load(fn);
+ if (err) {
+ pr_err("bpf: load bpf program from %s: result: %d\n",
+ fn, err);
+ break;
+ }
+ }
+
+ if (!err)
+ bpf__run();
+ return err;
+usage:
+ print_usage();
+ return -1;
+}
+
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 3688ad2..c2c4a0d 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -38,6 +38,7 @@ extern int cmd_trace(int argc, const char **argv, const char *prefix);
extern int cmd_inject(int argc, const char **argv, const char *prefix);
extern int cmd_mem(int argc, const char **argv, const char *prefix);
extern int cmd_data(int argc, const char **argv, const char *prefix);
+extern int cmd_bpf(int argc, const char **argv, const char *prefix);

extern int find_scripts(char **scripts_array, char **scripts_path_array);
#endif
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index b857fcb..779f2fb 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -64,6 +64,7 @@ static struct cmd_struct commands[] = {
{ "inject", cmd_inject, 0 },
{ "mem", cmd_mem, 0 },
{ "data", cmd_data, 0 },
+ { "bpf", cmd_bpf, 0 },
};

struct pager_config {
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index dfba2f0..39287a5 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -75,6 +75,7 @@ libperf-$(CONFIG_X86) += tsc.o
libperf-y += cloexec.o
libperf-y += thread-stack.o
libperf-y += bpf.o
+libperf-y += bpf-loader.o

libperf-$(CONFIG_LIBELF) += symbol-elf.o
libperf-$(CONFIG_LIBELF) += probe-event.o
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
new file mode 100644
index 0000000..84d3cc3
--- /dev/null
+++ b/tools/perf/util/bpf-loader.c
@@ -0,0 +1,35 @@
+/*
+ * BPF loader support.
+ *
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#include <stdio.h>
+#include <errno.h>
+
+#include "perf.h"
+#include "debug.h"
+#include "symbol.h"
+#include "bpf-loader.h"
+#include "probe-event.h"
+#include "probe-finder.h" // for MAX_PROBES
+
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+
+int bpf__load(const char *path)
+{
+ pr_debug("bpf: loading %s\n", path);
+ return 0;
+}
+
+int bpf__run(void)
+{
+ pr_info("BPF is running. Use Ctrl-c to stop.\n");
+ while(1)
+ sleep(1);
+ return 0;
+}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
new file mode 100644
index 0000000..122b178
--- /dev/null
+++ b/tools/perf/util/bpf-loader.h
@@ -0,0 +1,21 @@
+/*
+ * Copyright (C) 2015, Wang Nan <[email protected]>
+ * Copyright (C) 2015, Huawei Inc.
+ *
+ * Released under the GPL v2. (and only v2, not any later version)
+ */
+#ifndef __BPF_LOADER_H
+#define __BPF_LOADER_H
+
+#include <linux/unistd.h>
+#include <unistd.h>
+#include <linux/bpf.h>
+
+#include "perf.h"
+#include "symbol.h"
+#include "probe-event.h"
+
+int bpf__load(const char *path);
+int bpf__run(void);
+
+#endif
--
1.8.3.4

2015-04-30 11:02:33

by Wang Nan

Subject: [RFC PATCH 05/22] perf bpf: open eBPF object file and do basic validation.

This patch adds a basic 'struct bpf_obj', which will be used for loading
eBPF object files. eBPF object files are compiled by LLVM in ELF
format. In this patch, libelf is used to open those files, read the EHDR
and do basic validation according to e_type and e_machine.

All ELF-related state is grouped together and resides in
'struct bpf_obj'. bpf_obj_clear_elf() is introduced to clear it.
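The validation step can be sketched without libelf by checking the raw header directly. This minimal example assumes a 64-bit object and glibc's <elf.h>; the helper name `looks_like_bpf_obj()` is hypothetical, and it mirrors the patch's test of e_type == ET_REL and e_machine == 0.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Returns 1 if the header looks like a relocatable eBPF object, as in the patch. */
static int looks_like_bpf_obj(const Elf64_Ehdr *ep)
{
	if (memcmp(ep->e_ident, ELFMAG, SELFMAG) != 0)
		return 0; /* not ELF at all */
	if (ep->e_ident[EI_CLASS] != ELFCLASS64)
		return 0; /* this sketch only handles 64-bit objects */
	return ep->e_type == ET_REL && ep->e_machine == 0;
}
```

An x86_64 object (e_machine == EM_X86_64) or a linked executable (ET_EXEC) is rejected, which matches the error path in bpf_obj_elf_init().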

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 133 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 16 ++++++
2 files changed, 149 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 84d3cc3..3eb7504 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -20,10 +20,143 @@
#include <linux/types.h>
#include <linux/bpf.h>

+static LIST_HEAD(bpf_obj_list);
+
+static struct bpf_obj *__bpf_obj_alloc(const char *path)
+{
+ struct bpf_obj *obj;
+
+ obj = calloc(1, sizeof(struct bpf_obj));
+ if (!obj) {
+ pr_err("bpf: alloc memory failed for %s\n", path);
+ return NULL;
+ }
+
+ obj->path = strdup(path);
+ if (!obj->path) {
+ pr_err("bpf: failed to strdup '%s'\n", path);
+ free(obj);
+ return NULL;
+ }
+ return obj;
+}
+
+static void bpf_obj_clear_elf(struct bpf_obj *obj)
+{
+ if (!obj_elf_valid(obj))
+ return;
+
+ if (obj->elf.elf) {
+ elf_end(obj->elf.elf);
+ obj->elf.elf = NULL;
+ }
+ if (obj->elf.fd >= 0) {
+ close(obj->elf.fd);
+ obj->elf.fd = -1;
+ }
+}
+
+static void bpf_obj_close(struct bpf_obj *obj)
+{
+ if (!obj)
+ return;
+
+ bpf_obj_clear_elf(obj);
+
+ if (obj->path)
+ free(obj->path);
+ free(obj);
+}
+
+static struct bpf_obj *bpf_obj_alloc(const char *path)
+{
+ struct bpf_obj *obj;
+
+ obj = __bpf_obj_alloc(path);
+ if (!obj)
+ goto out;
+
+ obj->elf.fd = -1;
+ return obj;
+out:
+ bpf_obj_close(obj);
+ return NULL;
+}
+
+static int bpf_obj_elf_init(struct bpf_obj *obj)
+{
+ int err = 0;
+ GElf_Ehdr *ep;
+
+ if (obj_elf_valid(obj)) {
+ pr_err("bpf: elf init: internal error\n");
+ return -EEXIST;
+ }
+
+ obj->elf.fd = open(obj->path, O_RDONLY);
+ if (obj->elf.fd < 0) {
+ pr_err("bpf: failed to open %s: %s\n", obj->path,
+ strerror(errno));
+ return -errno;
+ }
+
+ obj->elf.elf = elf_begin(obj->elf.fd,
+ PERF_ELF_C_READ_MMAP,
+ NULL);
+ if (!obj->elf.elf) {
+ pr_err("bpf: failed to open %s as ELF file\n",
+ obj->path);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ if (!gelf_getehdr(obj->elf.elf, &obj->elf.ehdr)) {
+ pr_err("bpf: failed to get EHDR from %s\n",
+ obj->path);
+ err = -EINVAL;
+ goto errout;
+ }
+ ep = &obj->elf.ehdr;
+
+ if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {
+ pr_err("bpf: %s is not an eBPF object file\n",
+ obj->path);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ return 0;
+errout:
+ bpf_obj_clear_elf(obj);
+ return err;
+}
+
int bpf__load(const char *path)
{
+ struct bpf_obj *obj;
+ int err;
+
pr_debug("bpf: loading %s\n", path);
+
+ if (elf_version(EV_CURRENT) == EV_NONE) {
+ pr_err("bpf: failed to init libelf for %s\n", path);
+ return -ENOTSUP;
+ }
+
+ obj = bpf_obj_alloc(path);
+ if (!obj) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ if ((err = bpf_obj_elf_init(obj)))
+ goto out;
+
+ list_add(&obj->list, &bpf_obj_list);
return 0;
+out:
+ bpf_obj_close(obj);
+ return err;
}

int bpf__run(void)
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 122b178..6a6651b 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -18,4 +18,20 @@
int bpf__load(const char *path);
int bpf__run(void);

+struct bpf_obj {
+ /* All bpf objs should be linked together. */
+ struct list_head list;
+ char *path;
+
+ /*
+ * Information when doing elf related work. Only valid if fd
+ * is valid.
+ */
+ struct {
+ int fd;
+ Elf *elf;
+ GElf_Ehdr ehdr;
+ } elf;
+};
+#define obj_elf_valid(o) ((o)->elf.fd >= 0)
#endif
--
1.8.3.4

2015-04-30 10:53:40

by Wang Nan

Subject: [RFC PATCH 06/22] perf bpf: check swap according to EHDR.

Check endianness according to the EHDR to support loading eBPF objects on
big-endian machines. The code is taken from util/symbol-elf.c.
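The check boils down to comparing the object's EI_DATA byte with the host byte order. A standalone sketch of the same logic; `host_is_little_endian()`, `needs_swap()` and `obj_u32()` are hypothetical helpers, and bswap_32() comes from <byteswap.h>, the same macro patch 8 of this series uses for the version word.

```c
#include <assert.h>
#include <byteswap.h>
#include <elf.h>
#include <stdint.h>

static int host_is_little_endian(void)
{
	const unsigned int one = 1;

	/* The lowest-addressed byte of 1 is 1 only on little-endian hosts. */
	return *(const unsigned char *)&one == 1;
}

/* Returns 1 if a swap is needed, 0 if not, -1 for an unknown encoding. */
static int needs_swap(unsigned char ei_data)
{
	switch (ei_data) {
	case ELFDATA2LSB:
		return !host_is_little_endian();
	case ELFDATA2MSB:
		return host_is_little_endian();
	default:
		return -1;
	}
}

/* Converts a 32-bit word read from the object into host order. */
static uint32_t obj_u32(uint32_t raw, int swap)
{
	return swap ? bswap_32(raw) : raw;
}
```

Every multi-byte field read from the object afterwards has to go through a conversion like obj_u32() when needs_swap is set.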

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 28 ++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 1 +
2 files changed, 29 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 3eb7504..14d76f6 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -76,6 +76,7 @@ static struct bpf_obj *bpf_obj_alloc(const char *path)
if (!obj)
goto out;

+ obj->needs_swap = false;
obj->elf.fd = -1;
return obj;
out:
@@ -131,6 +132,31 @@ errout:
return err;
}

+static int
+bpf_obj_swap_init(struct bpf_obj *obj)
+{
+ static unsigned int const endian = 1;
+
+ obj->needs_swap = false;
+
+ switch (obj->elf.ehdr.e_ident[EI_DATA]) {
+ case ELFDATA2LSB:
+ /* We are big endian, BPF obj is little endian. */
+ if (*(unsigned char const *)&endian != 1)
+ obj->needs_swap = true;
+ return 0;
+
+ case ELFDATA2MSB:
+ /* We are little endian, BPF obj is big endian. */
+ if (*(unsigned char const *)&endian != 0)
+ obj->needs_swap = true;
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -151,6 +177,8 @@ int bpf__load(const char *path)

if ((err = bpf_obj_elf_init(obj)))
goto out;
+ if ((err = bpf_obj_swap_init(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 6a6651b..c27b0ac 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -22,6 +22,7 @@ struct bpf_obj {
/* All bpf objs should be linked together. */
struct list_head list;
char *path;
+ bool needs_swap;

/*
* Information when doing elf related work. Only valid if fd
--
1.8.3.4

2015-04-30 11:01:59

by Wang Nan

Subject: [RFC PATCH 07/22] perf bpf: iterate over elf sections to collect information.

bpf_obj_elf_collect() is introduced to iterate over each ELF section
and collect information from eBPF object files. This function will be
further enhanced to collect license, kernel version, program, config
and map information.
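What the loop does can be sketched without libelf by walking a raw section header table and resolving each sh_name through the section-name string table (the section indexed by e_shstrndx). A minimal sketch over in-memory data, assuming a 64-bit object in host byte order; `find_section()` is a hypothetical helper, and real code should use libelf as the patch does.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Looks up a section by name. 'shdrs' is the section header table and
 * 'shstrtab' the (NUL-terminated) contents of section e_shstrndx.
 * Returns the index or -1. Index 0 is the reserved null section, so it
 * is skipped, just as elf_nextscn(elf, NULL) starts at section 1.
 */
static int find_section(const Elf64_Shdr *shdrs, size_t nr,
			const char *shstrtab, size_t strsz,
			const char *want)
{
	size_t i;

	for (i = 1; i < nr; i++) {
		if (shdrs[i].sh_name >= strsz)
			return -1; /* corrupted name offset */
		if (strcmp(shstrtab + shdrs[i].sh_name, want) == 0)
			return (int)i;
	}
	return -1;
}
```

Bounds-checking sh_name plays the same role as the elf_rawdata() sanity check before elf_strptr() in the patch.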

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 14d76f6..9c077dd 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -157,6 +157,58 @@ bpf_obj_swap_init(struct bpf_obj *obj)
}
}

+static int bpf_obj_elf_collect(struct bpf_obj *obj)
+{
+ Elf *elf = obj->elf.elf;
+ GElf_Ehdr *ep = &obj->elf.ehdr;
+ Elf_Scn *scn = NULL;
+ int idx = 0, err = 0;
+
+ /* Elf is corrupted/truncated, avoid calling elf_strptr. */
+ if (!elf_rawdata(elf_getscn(elf, ep->e_shstrndx), NULL)) {
+ pr_err("bpf: failed to get e_shstrndx from %s\n",
+ obj->path);
+ return -EINVAL;
+ }
+
+ while ((scn = elf_nextscn(elf, scn)) != NULL) {
+ char *name;
+ GElf_Shdr sh;
+ Elf_Data *data;
+
+ idx++;
+ if (gelf_getshdr(scn, &sh) != &sh) {
+ pr_err("bpf: failed to get section header"
+ " from %s\n", obj->path);
+ err = -EINVAL;
+ goto out;
+ }
+
+ name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
+ if (!name) {
+ pr_err("bpf: failed to get section name "
+ "from %s\n", obj->path);
+ err = -EINVAL;
+ goto out;
+ }
+
+ data = elf_getdata(scn, 0);
+ if (!data) {
+ pr_err("bpf: failed to get section data "
+ "from %s(%s)\n", name, obj->path);
+ err = -EINVAL;
+ goto out;
+ }
+ pr_debug("bpf: section %s, size %ld, link %d, flags %lx, type=%d\n",
+ name,(unsigned long)data->d_size,
+ (int)sh.sh_link,
+ (unsigned long)sh.sh_flags,
+ (int)sh.sh_type);
+ }
+out:
+ return err;
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -179,6 +231,8 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_swap_init(obj)))
goto out;
+ if ((err = bpf_obj_elf_collect(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
--
1.8.3.4

2015-04-30 10:53:44

by Wang Nan

Subject: [RFC PATCH 08/22] perf bpf: collect version and license from ELF.

Expand bpf_obj_elf_collect() to collect license and kernel version
information from eBPF object files. An eBPF object file should have
a section named 'license', which contains a string. It should also
have a section named 'version', which contains a u32 LINUX_VERSION_CODE.

bpf_obj_validate() is introduced to validate the object file after loading.
Currently it only checks that a nonzero kernel version was collected,
i.e. that the 'version' section exists.
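Both readers must tolerate sections of unexpected size. A small sketch of the same checks; the helper names are hypothetical, and unlike the patch, which relies on calloc() having zeroed obj->license, this version NUL-terminates the buffer explicitly.

```c
#include <assert.h>
#include <byteswap.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Copies at most bufsz - 1 bytes and always NUL-terminates (bufsz >= 1). */
static void read_license(char *buf, size_t bufsz, const void *data, size_t size)
{
	size_t n = size < bufsz - 1 ? size : bufsz - 1;

	memcpy(buf, data, n);
	buf[n] = '\0';
}

/* Reads the u32 LINUX_VERSION_CODE; rejects truncated sections. */
static int read_kver(uint32_t *kver, const void *data, size_t size, int swap)
{
	uint32_t v;

	if (size < sizeof(v))
		return -EINVAL;
	memcpy(&v, data, sizeof(v)); /* memcpy avoids unaligned access */
	*kver = swap ? bswap_32(v) : v;
	return 0;
}
```

A license longer than the buffer is silently truncated, while a short 'version' section is treated as an error, matching the patch's behaviour.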

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 2 ++
2 files changed, 49 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 9c077dd..296fb06 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -157,6 +157,32 @@ bpf_obj_swap_init(struct bpf_obj *obj)
}
}

+static int bpf_obj_license_init(struct bpf_obj *obj,
+ void *data, size_t size)
+{
+ memcpy(obj->license, data,
+ min(size, sizeof(obj->license) - 1));
+ pr_debug("bpf: license of %s is %s\n", obj->path, obj->license);
+ return 0;
+}
+
+static int bpf_obj_kver_init(struct bpf_obj *obj,
+ void *data, size_t size)
+{
+ u32 kver;
+ if (size < sizeof(kver)) {
+ pr_err("bpf: invalid kver section in %s\n", obj->path);
+ return -EINVAL;
+ }
+ memcpy(&kver, data, sizeof(kver));
+ if (obj->needs_swap)
+ kver = bswap_32(kver);
+ obj->kern_version = kver;
+ pr_debug("bpf: kernel version of %s is %x\n", obj->path,
+ obj->kern_version);
+ return 0;
+}
+
static int bpf_obj_elf_collect(struct bpf_obj *obj)
{
Elf *elf = obj->elf.elf;
@@ -204,11 +230,30 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
(int)sh.sh_link,
(unsigned long)sh.sh_flags,
(int)sh.sh_type);
+
+ if (strcmp(name, "license") == 0)
+ err = bpf_obj_license_init(obj, data->d_buf,
+ data->d_size);
+ else if (strcmp(name, "version") == 0)
+ err = bpf_obj_kver_init(obj, data->d_buf,
+ data->d_size);
+ if (err)
+ goto out;
}
out:
return err;
}

+static int bpf_obj_validate(struct bpf_obj *obj)
+{
+ if (obj->kern_version == 0) {
+ pr_err("bpf: %s doesn't provide kernel version\n",
+ obj->path);
+ return -EINVAL;
+ }
+ return 0;
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -233,6 +278,8 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_elf_collect(obj)))
goto out;
+ if ((err = bpf_obj_validate(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index c27b0ac..e1d5c42 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -23,6 +23,8 @@ struct bpf_obj {
struct list_head list;
char *path;
bool needs_swap;
+ char license[64];
+ u32 kern_version;

/*
* Information when doing elf related work. Only valid if fd
--
1.8.3.4

2015-04-30 10:53:50

by Wang Nan

Subject: [RFC PATCH 09/22] perf bpf: collect map definitions.

If maps are used by eBPF programs, the corresponding object file(s)
should contain a section named 'maps', which holds the map definitions,
one for each map, describing its format. 'struct bpf_map_def' serves
as part of the protocol between perf and eBPF programs. All map
definitions are copied to obj->maps.

bpf.h, which provides common bpf operations, is now included by bpf-loader.h.
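The copy step amounts to treating the 'maps' section as an array of fixed-size definitions. A sketch under the assumption that each definition is the four-field struct used in this series; a trailing partial entry is ignored, exactly as the integer division in bpf_obj_maps_init() does. `copy_map_defs()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Copies whole definitions from 'data'; *out stays NULL if there are none. */
static int copy_map_defs(const void *data, size_t size,
			 struct bpf_map_def **out, size_t *nr)
{
	size_t n = size / sizeof(struct bpf_map_def);

	*out = NULL;
	*nr = 0;
	if (n == 0)
		return 0; /* object needs no maps */
	*out = malloc(n * sizeof(**out));
	if (!*out)
		return -ENOMEM;
	memcpy(*out, data, n * sizeof(**out));
	*nr = n;
	return 0;
}
```

Returning an error code separately from the count keeps "no maps" distinguishable from an allocation failure.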

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 31 +++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 3 +++
2 files changed, 34 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 296fb06..bf3b793 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -65,6 +65,8 @@ static void bpf_obj_close(struct bpf_obj *obj)

if (obj->path)
free(obj->path);
+ if (obj->maps)
+ free(obj->maps);
free(obj);
}

@@ -183,6 +185,32 @@ static int bpf_obj_kver_init(struct bpf_obj *obj,
return 0;
}

+static int bpf_obj_maps_init(struct bpf_obj *obj, void *data,
+ size_t size)
+{
+ size_t map_def_sz = sizeof(struct bpf_map_def);
+ int nr_maps = size / map_def_sz;
+
+ if (nr_maps == 0) {
+ pr_debug("bpf: %s doesn't need map definition\n",
+ obj->path);
+ return 0;
+ }
+
+ obj->maps = malloc(nr_maps * map_def_sz);
+ if (!obj->maps) {
+ pr_err("bpf: malloc maps failed: %s\n", obj->path);
+ return -ENOMEM;
+ }
+
+ obj->nr_maps = nr_maps;
+ memcpy(obj->maps, data, nr_maps * map_def_sz);
+ pr_debug("bpf: %d map%s in %s\n", nr_maps,
+ nr_maps > 1 ? "s" : "",
+ obj->path);
+ return 0;
+}
+
static int bpf_obj_elf_collect(struct bpf_obj *obj)
{
Elf *elf = obj->elf.elf;
@@ -237,6 +265,9 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
else if (strcmp(name, "version") == 0)
err = bpf_obj_kver_init(obj, data->d_buf,
data->d_size);
+ else if (strcmp(name, "maps") == 0)
+ err = bpf_obj_maps_init(obj, data->d_buf,
+ data->d_size);
if (err)
goto out;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index e1d5c42..6c5c8d6 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -14,6 +14,7 @@
#include "perf.h"
#include "symbol.h"
#include "probe-event.h"
+#include "bpf.h"

int bpf__load(const char *path);
int bpf__run(void);
@@ -25,6 +26,8 @@ struct bpf_obj {
bool needs_swap;
char license[64];
u32 kern_version;
+ struct bpf_map_def *maps;
+ size_t nr_maps;

/*
* Information when doing elf related work. Only valid if fd
--
1.8.3.4

2015-04-30 10:53:38

by Wang Nan

Subject: [RFC PATCH 10/22] perf bpf: collect config section in object.

eBPF programs are allowed to include 'config' sections, which contain
strings describing each bpf program. Config strings use the same format
as 'perf probe'. They describe the positions and arguments used by
bpf programs.

Defining config strings separately is allowed. For example:

#define SEC(NAME) __attribute__((section(NAME), used))
char _bpf_prog1_config[] SEC("config") = "bpf_prog1=kmem_cache_free%return";
SEC("bpf_prog1")
int bpf_prog1(struct pt_regs *ctx) {
....
}
char _bpf_prog2_config[] SEC("config") = "bpf_prog2=kmem_cache_free";
SEC("bpf_prog2")
int bpf_prog2(struct pt_regs *ctx) {
....
}
char other_config[] SEC("config") = "bpf_prog3=kmem_cache_alloc\n"
"bpf_prog4=__alloc_pages_nodemask%return";

To make further processing easier, this patch converts every '\0' in the
whole config section into '\n'.
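After the conversion, the config text is a sequence of newline-separated 'name=probe-point' entries. A minimal sketch of both steps; `join_config()` mirrors the patch's loop, while `parse_config()`, `struct config_list` and `collect_entry()` are hypothetical, and the probe-point part is passed on unparsed because its syntax is that of 'perf probe'.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

/* Mirrors the patch: 'buf' holds 'size' bytes plus room for one more. */
static void join_config(char *buf, size_t size)
{
	size_t i;

	for (i = 0; i < size; i++)
		if (buf[i] == '\0')
			buf[i] = '\n';
	buf[size] = '\0';
}

typedef int (*config_cb)(const char *prog, const char *point, void *arg);

/* Splits "a=x\nb=y\n..." in place; empty lines are skipped. */
static int parse_config(char *str, config_cb cb, void *arg)
{
	char *line, *save = NULL;

	for (line = strtok_r(str, "\n", &save); line;
	     line = strtok_r(NULL, "\n", &save)) {
		char *eq = strchr(line, '=');
		int err;

		if (!eq || eq == line)
			return -1; /* missing program name */
		*eq = '\0';
		err = cb(line, eq + 1, arg);
		if (err)
			return err;
	}
	return 0;
}

struct config_list {
	int nr;
	const char *prog[8];
	const char *point[8];
};

/* Example callback: records entries (pointers into the parsed string). */
static int collect_entry(const char *prog, const char *point, void *arg)
{
	struct config_list *l = arg;

	if (l->nr >= 8)
		return -1;
	l->prog[l->nr] = prog;
	l->point[l->nr] = point;
	l->nr++;
	return 0;
}
```

Because strtok_r() collapses consecutive delimiters, the blank lines produced by padding NULs between separately defined strings are skipped naturally.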

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 1 +
2 files changed, 45 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index bf3b793..b913d6f 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -67,6 +67,8 @@ static void bpf_obj_close(struct bpf_obj *obj)
free(obj->path);
if (obj->maps)
free(obj->maps);
+ if (obj->config_str)
+ free(obj->config_str);
free(obj);
}

@@ -211,6 +213,45 @@ static int bpf_obj_maps_init(struct bpf_obj *obj, void *data,
return 0;
}

+static int bpf_obj_config_init(struct bpf_obj *obj, void *data,
+ size_t size)
+{
+ char *config_str;
+ char *p, *pend;
+
+ if (size == 0) {
+ pr_debug("bpf: config section in %s empty\n",
+ obj->path);
+ return 0;
+ }
+ if (obj->config_str) {
+ pr_err("bpf: multiple config section in %s\n",
+ obj->path);
+ return -EEXIST;
+ }
+
+ config_str = malloc(size + 1);
+ if (!config_str) {
+ pr_err("bpf: malloc config string failed\n");
+ return -ENOMEM;
+ }
+
+ memcpy(config_str, data, size);
+
+ /*
+ * The config section may contain multiple strings.
+ * Make it one big string by converting all '\0' to '\n' and
+ * appending a final '\0'.
+ */
+ pend = config_str + size;
+ for (p = config_str; p < pend; p++)
+ *p == '\0' ? *p = '\n' : 0 ;
+ *pend = '\0';
+
+ obj->config_str = config_str;
+ return 0;
+}
+
static int bpf_obj_elf_collect(struct bpf_obj *obj)
{
Elf *elf = obj->elf.elf;
@@ -268,6 +309,9 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
else if (strcmp(name, "maps") == 0)
err = bpf_obj_maps_init(obj, data->d_buf,
data->d_size);
+ else if (strcmp(name, "config") == 0)
+ err = bpf_obj_config_init(obj, data->d_buf,
+ data->d_size);
if (err)
goto out;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 6c5c8d6..086f28d 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -28,6 +28,7 @@ struct bpf_obj {
u32 kern_version;
struct bpf_map_def *maps;
size_t nr_maps;
+ char *config_str;

/*
* Information when doing elf related work. Only valid if fd
--
1.8.3.4

2015-04-30 11:01:05

by Wang Nan

Subject: [RFC PATCH 11/22] perf bpf: collect symbol table in object files.

This patch collects the symbol table (SHT_SYMTAB) section. The symbol
table is needed later, when map references in programs are relocated.
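Here is a rough sketch of what the collected section holds. It assumes a 64-bit ELF object whose byte order matches the host. The raw SHT_SYMTAB data is an array of Elf64_Sym entries, and there are sh_size / sh_entsize entries in total. The series itself reads symbols through libelf's gelf_getsym(). symtab_entry() below is a hypothetical libelf-free equivalent that uses only <elf.h>:

```c
#include <elf.h>
#include <stddef.h>

/*
 * Hypothetical helper: treat the raw bytes of a SHT_SYMTAB section as
 * an array of Elf64_Sym and fetch one entry by index (for example the
 * index taken from a relocation's r_info). Returns NULL when the header
 * does not describe a 64-bit symtab or the index is out of range.
 */
const Elf64_Sym *symtab_entry(const Elf64_Shdr *sh, const void *data,
			      size_t idx)
{
	size_t nsyms;

	if (sh->sh_type != SHT_SYMTAB || sh->sh_entsize != sizeof(Elf64_Sym))
		return NULL;

	nsyms = sh->sh_size / sh->sh_entsize;
	if (idx >= nsyms)
		return NULL;

	return (const Elf64_Sym *)data + idx;
}
```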

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 11 +++++++++++
tools/perf/util/bpf-loader.h | 1 +
2 files changed, 12 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index b913d6f..b9c701a 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -50,6 +50,9 @@ static void bpf_obj_clear_elf(struct bpf_obj *obj)
elf_end(obj->elf.elf);
obj->elf.elf = NULL;
}
+
+ obj->elf.symbols = NULL;
+
if (obj->elf.fd >= 0) {
close(obj->elf.fd);
obj->elf.fd = -1;
@@ -312,6 +315,14 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
else if (strcmp(name, "config") == 0)
err = bpf_obj_config_init(obj, data->d_buf,
data->d_size);
+ else if (sh.sh_type == SHT_SYMTAB) {
+ if (obj->elf.symbols) {
+ pr_err("bpf: multiple SYMTAB in %s\n",
+ obj->path);
+ err = -EEXIST;
+ } else
+ obj->elf.symbols = data;
+ }
if (err)
goto out;
}
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 086f28d..f0b573c 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -38,6 +38,7 @@ struct bpf_obj {
int fd;
Elf *elf;
GElf_Ehdr ehdr;
+ Elf_Data *symbols;
} elf;
};
#define obj_elf_valid(o) ((o)->elf.fd >= 0)
--
1.8.3.4

2015-04-30 11:00:48

by Wang Nan

Subject: [RFC PATCH 12/22] perf bpf: collect bpf programs from object files.

This patch collects all programs in an object file and links them into
a list for further processing. Each eBPF program is represented by a
'struct bpf_perf_prog'. 'bpf_prog' would be a better name, but it is
already used by linux/filter.h. Although that is a kernel-space name,
I prefer 'bpf_perf_prog' to avoid possible confusion.
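Below is a minimal sketch of the two checks used when collecting programs. It assumes the 8-byte instruction layout of struct bpf_insn, which is redeclared here as the hypothetical struct sketch_bpf_insn so the example builds without kernel headers. A section counts as a program if it is executable SHT_PROGBITS with a non-empty body. Its instruction count is its size divided by the instruction size:

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mirror of struct bpf_insn from <linux/bpf.h> (8 bytes). */
struct sketch_bpf_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate */
};

/* Same test as the patch: executable PROGBITS with a non-empty body. */
bool is_bpf_prog_section(const Elf64_Shdr *sh, size_t data_size)
{
	return sh->sh_type == SHT_PROGBITS &&
	       (sh->sh_flags & SHF_EXECINSTR) &&
	       data_size > 0;
}

/* Number of whole instructions; 0 means the section is too small. */
size_t bpf_insn_count(size_t data_size)
{
	return data_size / sizeof(struct sketch_bpf_insn);
}
```

The division drops any trailing bytes that do not form a whole instruction. This matches how insns_cnt is computed in bpf_perf_prog_alloc().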

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 91 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 14 +++++++
2 files changed, 105 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index b9c701a..bbebaf1 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -59,11 +59,20 @@ static void bpf_obj_clear_elf(struct bpf_obj *obj)
}
}

+static void bpf_perf_prog_free(struct bpf_perf_prog *prog);
+
static void bpf_obj_close(struct bpf_obj *obj)
{
+ struct bpf_perf_prog *prog, *tmp;
+
if (!obj)
return;

+ list_for_each_entry_safe(prog, tmp, &obj->progs_list, list) {
+ list_del(&prog->list);
+ bpf_perf_prog_free(prog);
+ }
+
bpf_obj_clear_elf(obj);

if (obj->path)
@@ -85,6 +94,7 @@ static struct bpf_obj *bpf_obj_alloc(const char *path)

obj->needs_swap = false;
obj->elf.fd = -1;
+ INIT_LIST_HEAD(&obj->progs_list);
return obj;
out:
bpf_obj_close(obj);
@@ -255,6 +265,70 @@ static int bpf_obj_config_init(struct bpf_obj *obj, void *data,
return 0;
}

+static void
+bpf_perf_prog_free(struct bpf_perf_prog *prog)
+{
+ if (!prog)
+ return;
+
+ if (prog->name)
+ free(prog->name);
+ if (prog->insns)
+ free(prog->insns);
+ free(prog);
+}
+
+static struct bpf_perf_prog *
+bpf_perf_prog_alloc(struct bpf_obj *obj __maybe_unused,
+ void *data, size_t size,
+ char *name, int idx)
+{
+ struct bpf_perf_prog *prog;
+
+ if (size < sizeof(struct bpf_insn)) {
+ pr_err("bpf: corrupted section %s\n", name);
+ return NULL;
+ }
+
+ prog = calloc(1, sizeof(struct bpf_perf_prog));
+ if (!prog) {
+ pr_err("bpf: failed to alloc prog\n");
+ return NULL;
+ }
+
+ /*
+ * Name of a program could be:
+ * "k(ret)probe/[a-zA-Z_][a-zA-Z_0-9]*"
+ *
+ * or
+ *
+ * "[a-zA-Z_][a-zA-Z_0-9]*"
+ *
+ * Will be parsed in other function. This function only saves
+ * the name.
+ */
+ prog->name = strdup(name);
+ if (!prog->name) {
+ pr_err("bpf: failed to alloc name for prog %s\n",
+ name);
+ goto out;
+ }
+
+ prog->insns = malloc(size);
+ if (!prog->insns) {
+ pr_err("bpf: failed to alloc insns for %s\n", name);
+ goto out;
+ }
+ prog->insns_cnt = size / sizeof(struct bpf_insn);
+ memcpy(prog->insns, data,
+ prog->insns_cnt * sizeof(struct bpf_insn));
+ prog->idx = idx;
+ return prog;
+out:
+ bpf_perf_prog_free(prog);
+ return NULL;
+}
+
static int bpf_obj_elf_collect(struct bpf_obj *obj)
{
Elf *elf = obj->elf.elf;
@@ -322,6 +396,23 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
err = -EEXIST;
} else
obj->elf.symbols = data;
+ } else if ((sh.sh_type == SHT_PROGBITS) &&
+ (sh.sh_flags & SHF_EXECINSTR) &&
+ (data->d_size > 0)) {
+ struct bpf_perf_prog *prog;
+
+ prog = bpf_perf_prog_alloc(obj, data->d_buf,
+ data->d_size, name,
+ idx);
+ if (!prog) {
+ pr_err("bpf: failed to alloc "
+ "program %s (%s)\n", name, obj->path);
+ err = -ENOMEM;
+ } else {
+ pr_debug("bpf: found program %s\n",
+ prog->name);
+ list_add(&prog->list, &obj->progs_list);
+ }
}
if (err)
goto out;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index f0b573c..f9cb46b 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -19,9 +19,23 @@
int bpf__load(const char *path);
int bpf__run(void);

+struct bpf_perf_prog {
+ struct list_head list;
+
+ /* Index in elf obj file, for relocation use. */
+ int idx;
+ char *name;
+ struct bpf_insn *insns;
+ size_t insns_cnt;
+};
+
struct bpf_obj {
/* All bpf objs should be linked together. */
struct list_head list;
+
+ /* All eBPF programs are linked at this list */
+ struct list_head progs_list;
+
char *path;
bool needs_swap;
char license[64];
--
1.8.3.4

2015-04-30 11:00:51

by Wang Nan

Subject: [RFC PATCH 13/22] perf bpf: collects relocation sections from object file.

This patch collects relocation sections into 'struct bpf_obj'. These
sections are used to associate maps with eBPF programs.
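The collection step amounts to appending to a growable array. The sketch below is hypothetical: struct reloc_vec and reloc_vec_push() are illustrative names. For each SHT_REL section it records the index of the section the relocations apply to (sh_info) and its raw data. A plain 'p = realloc(p, ...)' would lose the old array on failure. This sketch instead assigns the result through a temporary, so a failed allocation neither leaks the old array nor bumps the count:

```c
#include <stdlib.h>

/* Hypothetical record: which section a relocation table applies to. */
struct reloc_rec {
	int target_idx;		/* sh_info of the SHT_REL section */
	const void *data;	/* raw relocation entries */
};

struct reloc_vec {
	struct reloc_rec *recs;
	int nr;
};

/*
 * Append one record. On failure the existing array is left intact
 * (and still owned by v), and -1 is returned.
 */
int reloc_vec_push(struct reloc_vec *v, int target_idx, const void *data)
{
	struct reloc_rec *tmp;

	tmp = realloc(v->recs, sizeof(*tmp) * (v->nr + 1));
	if (!tmp)
		return -1;

	v->recs = tmp;
	v->recs[v->nr].target_idx = target_idx;
	v->recs[v->nr].data = data;
	v->nr++;
	return 0;
}
```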

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 17 +++++++++++++++++
tools/perf/util/bpf-loader.h | 5 +++++
2 files changed, 22 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index bbebaf1..66fbca2 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -46,6 +46,10 @@ static void bpf_obj_clear_elf(struct bpf_obj *obj)
if (!obj_elf_valid(obj))
return;

+ if (obj->elf.reloc) {
+ free(obj->elf.reloc);
+ obj->elf.reloc = NULL;
+ }
if (obj->elf.elf) {
elf_end(obj->elf.elf);
obj->elf.elf = NULL;
@@ -413,6 +417,19 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
prog->name);
list_add(&prog->list, &obj->progs_list);
}
+ } else if (sh.sh_type == SHT_REL) {
+ void *reloc = realloc(obj->elf.reloc,
+ sizeof(*obj->elf.reloc) * (obj->elf.nr_reloc + 1));
+ if (!reloc) {
+ pr_err("bpf: realloc reloc record failed\n");
+ err = -ENOMEM;
+ } else {
+ int n = obj->elf.nr_reloc++;
+
+ obj->elf.reloc = reloc;
+ obj->elf.reloc[n].shdr = sh;
+ obj->elf.reloc[n].data = data;
+ }
}
if (err)
goto out;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index f9cb46b..1417c0d 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -53,6 +53,11 @@ struct bpf_obj {
Elf *elf;
GElf_Ehdr ehdr;
Elf_Data *symbols;
+ struct {
+ GElf_Shdr shdr;
+ Elf_Data *data;
+ } *reloc;
+ int nr_reloc;
} elf;
};
#define obj_elf_valid(o) ((o)->elf.fd >= 0)
--
1.8.3.4

2015-04-30 11:00:21

by Wang Nan

Subject: [RFC PATCH 14/22] perf bpf: config eBPF programs based on their names.

This patch partially implements bpf_obj_config(), which determines the
k(ret)probe positions that eBPF programs will be attached to.

parse_perf_probe_command() does the main parsing work. The parsing
results are stored in a global array because add_perf_probe_events() is
not reentrant. A following patch calls add_perf_probe_events() once to
insert all kprobes, passing it the whole array of
'struct perf_probe_event'.

For now, this patch only handles programs with names like
'kprobe/myprobe'. It generates a perf probe command string for them and
then calls parse_perf_probe_command().
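The name-based rule can be sketched on its own as follows. prog_name_to_probe_cmd() is a hypothetical helper that mirrors the string built in bpf_perf_prog_config(). 'kprobe/FUNC' becomes 'FUNC=FUNC', and 'kretprobe/FUNC' becomes 'FUNC=FUNC%return'. In other words, the event is named after the function, using perf probe's EVENT=PROBE syntax. The sketch uses strncmp() so that short names are never read past their end:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turn a "kprobe/FUNC" or "kretprobe/FUNC" section
 * name into a perf probe command in buf. Returns 0 on success, -EINVAL
 * for an unknown prefix or if the result does not fit in len bytes.
 */
int prog_name_to_probe_cmd(const char *name, char *buf, size_t len)
{
	const char *func, *ret_str;
	int n;

	if (strncmp(name, "kprobe/", 7) == 0) {
		func = name + 7;
		ret_str = "";
	} else if (strncmp(name, "kretprobe/", 10) == 0) {
		func = name + 10;
		ret_str = "%return";
	} else {
		return -EINVAL;
	}

	n = snprintf(buf, len, "%s=%s%s", func, func, ret_str);
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;		/* function name too long */
	return 0;
}
```

For example, a program in section 'kretprobe/do_fork' would yield the command 'do_fork=do_fork%return'.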

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 201 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 2 +
2 files changed, 203 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 66fbca2..b2871fc 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -22,6 +22,38 @@

static LIST_HEAD(bpf_obj_list);

+#define MAX_CMDLEN 256
+#define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
+/*
+ * perf probe is not reentrant, so we must group all events
+ * together and call add_perf_probe_events() only once.
+ */
+
+struct params {
+ struct perf_probe_event event_array[MAX_PROBES];
+ size_t nr_events;
+};
+static struct params params = {
+ .nr_events = 0,
+};
+
+static struct perf_probe_event *
+alloc_perf_probe_event(void)
+{
+ struct perf_probe_event *pev;
+ int n = params.nr_events;
+
+ if (n >= MAX_PROBES) {
+ pr_err("bpf: too many events, increase MAX_PROBES\n");
+ return NULL;
+ }
+
+ params.nr_events = n + 1;
+ pev = &params.event_array[n];
+ bzero(pev, sizeof(*pev));
+ return pev;
+}
+
static struct bpf_obj *__bpf_obj_alloc(const char *path)
{
struct bpf_obj *obj;
@@ -279,6 +311,10 @@ bpf_perf_prog_free(struct bpf_perf_prog *prog)
free(prog->name);
if (prog->insns)
free(prog->insns);
+ if (prog->pev) {
+ clear_perf_probe_event(prog->pev);
+ bzero(prog->pev, sizeof(*prog->pev));
+ }
free(prog);
}

@@ -448,6 +484,169 @@ static int bpf_obj_validate(struct bpf_obj *obj)
return 0;
}

+static struct bpf_perf_prog *
+bpf_find_prog_by_name(struct bpf_obj *obj, const char *name)
+{
+ struct bpf_perf_prog *prog;
+
+ list_for_each_entry(prog, &obj->progs_list, list)
+ if (strcmp(name, prog->name) == 0)
+ return prog;
+ return NULL;
+}
+
+/*
+ * Use config_str to config program. If prog is NULL, find a
+ * prog based on config_str. config_str should not be NULL.
+ */
+static int __bpf_perf_prog_config(struct bpf_obj *obj,
+ struct bpf_perf_prog *prog,
+ char *config_str)
+{
+ struct perf_probe_event *pev = alloc_perf_probe_event();
+ int err = 0;
+
+ if (!pev)
+ return -ENOMEM;
+
+ if ((err = parse_perf_probe_command(config_str, pev)) < 0) {
+ pr_err("bpf config: %s is not a valid config string\n",
+ config_str);
+ /* parse failed, don't need clear pev. */
+ return -EINVAL;
+ }
+
+ if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
+ pr_err("bpf config: '%s': group for event is set "
+ "and not '%s'.\n", config_str,
+ PERF_BPF_PROBE_GROUP);
+ err = -EINVAL;
+ goto errout;
+ } else if (!pev->group)
+ pev->group = strdup(PERF_BPF_PROBE_GROUP);
+
+ if (!pev->group) {
+ pr_err("bpf config: strdup failed\n");
+ err = -ENOMEM;
+ goto errout;
+ }
+
+ if (!pev->event) {
+ pr_err("bpf config: '%s': event name is missing\n",
+ config_str);
+ err = -EINVAL;
+ goto errout;
+ }
+
+ if (!prog)
+ prog = bpf_find_prog_by_name(obj, pev->event);
+ if (!prog) {
+ pr_err("bpf config: section %s not found for"
+ " config '%s'\n", pev->event, config_str);
+ err = -ENOENT;
+ goto errout;
+ }
+
+ if (prog->pev) {
+ pr_err("bpf config: duplicate config for section %s\n",
+ prog->name);
+ err = -EEXIST;
+ goto errout;
+ }
+
+ prog->pev = pev;
+ pr_debug("bpf config: config %s ok\n", prog->name);
+ return 0;
+errout:
+ if (pev)
+ clear_perf_probe_event(pev);
+ return err;
+}
+
+/*
+ * Config specific prog using config_str. Both prog and config_str
+ * can be set to NULL, but not both. If prog is NULL, search prog
+ * based on config_str. If config_str is NULL, try to generate a
+ * config_str using prog->name.
+ */
+static int bpf_perf_prog_config(struct bpf_obj *obj,
+ struct bpf_perf_prog *prog,
+ char *config_str)
+{
+ char __config_str[MAX_CMDLEN];
+ char *func_str;
+ char *name = NULL;
+
+ if (!prog && !config_str) {
+ pr_err("bpf config: internal error\n");
+ return -EINVAL;
+ }
+
+ if (prog)
+ name = prog->name;
+
+ if (name && (func_str = strchr(name, '/'))) {
+ const char *ret_str;
+ int err = 0;
+
+ /* try to config prog based on its name */
+ if (config_str) {
+ pr_err("bpf config: bad config %s for %s\n",
+ config_str, name);
+ return -EINVAL;
+ }
+ config_str = __config_str;
+
+ if (strncmp(name, "kprobe/", 7) == 0)
+ ret_str = "";
+ else if (strncmp(name, "kretprobe/", 10) == 0)
+ ret_str = "%return";
+ else {
+ pr_err("bpf: bad section name: '%s'\n", name);
+ return -EINVAL;
+ }
+
+ /* skip '/' */
+ func_str += 1;
+ err = snprintf(config_str, MAX_CMDLEN, "%s=%s%s",
+ func_str, func_str, ret_str);
+ if (err >= MAX_CMDLEN) {
+ pr_err("bpf: function name %s too long\n", func_str);
+ return -EINVAL;
+ }
+
+ err = __bpf_perf_prog_config(obj, prog, config_str);
+ if (err)
+ return err;
+ return 0;
+ }
+
+ if (config_str)
+ /* prog should be NULL in this case. */
+ return __bpf_perf_prog_config(obj, prog, config_str);
+
+ /*
+ * prog->name is a symbol and config_str is NULL.
+ * Return normally. It will be config again with config_str.
+ */
+ return 0;
+}
+
+static int bpf_obj_config(struct bpf_obj *obj)
+{
+ struct bpf_perf_prog *prog;
+ int err;
+
+ /* try to config progs based on their names */
+ list_for_each_entry(prog, &obj->progs_list, list) {
+ err = bpf_perf_prog_config(obj, prog, NULL);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -474,6 +673,8 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_validate(obj)))
goto out;
+ if ((err = bpf_obj_config(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 1417c0d..09f77a5 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -27,6 +27,8 @@ struct bpf_perf_prog {
char *name;
struct bpf_insn *insns;
size_t insns_cnt;
+
+ struct perf_probe_event *pev;
};

struct bpf_obj {
--
1.8.3.4

2015-04-30 11:00:56

by Wang Nan

Subject: [RFC PATCH 15/22] perf bpf: config eBPF programs using config section.

This patch uses bpf_perf_prog_config() to configure eBPF programs from
the config section. bpf_obj_config() splits the section into lines and
parses the config string line by line.

Together with the previous patch, this should configure every eBPF
program. This patch also makes bpf_obj_validate() check for programs
without a config and abort further processing if one is found. For this
reason, validation now runs after configuration.

Once all programs are configured, obj->config_str is no longer needed,
so it is freed after configuration is done.
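Here is a minimal sketch of the line-by-line walk. It assumes the config string is writable and newline-separated, as produced by the config section patch. for_each_config_line() and count_line() are hypothetical names. In the patch, the callback role is played by bpf_perf_prog_config(obj, NULL, line). That function parses the line and uses the event name before '=' to find the matching program:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: walk a newline-separated buffer in place, skip
 * blank lines, NUL-terminate each line and pass it to cb(). Stops at
 * the first non-zero return from cb() and propagates it.
 */
int for_each_config_line(char *str, int (*cb)(char *line, void *arg),
			 void *arg)
{
	char *end = str + strlen(str);	/* computed before any edits */

	while (str < end) {
		char *nl;
		int err;

		if (*str == '\n') {	/* blank line */
			str++;
			continue;
		}

		nl = strchr(str, '\n');
		if (nl)
			*nl = '\0';

		err = cb(str, arg);
		if (err)
			return err;

		if (!nl)
			break;
		str = nl + 1;
	}
	return 0;
}

/* Example callback: count the non-blank lines that were seen. */
int count_line(char *line, void *arg)
{
	(void)line;
	(*(int *)arg)++;
	return 0;
}
```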

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 56 ++++++++++++++++++++++++++++++++++++++++----
1 file changed, 52 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index b2871fc..6a1c800 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -476,11 +476,22 @@ out:

static int bpf_obj_validate(struct bpf_obj *obj)
{
+ struct bpf_perf_prog *prog;
+
if (obj->kern_version == 0) {
pr_err("bpf: %s doesn't provide kernel version\n",
obj->path);
return -EINVAL;
}
+
+ list_for_each_entry(prog, &obj->progs_list, list) {
+ if (!prog->pev) {
+ pr_err("bpf: program %s doesn't have config.\n",
+ prog->name);
+ return -EINVAL;
+
+ }
+ }
return 0;
}

@@ -635,7 +646,9 @@ static int bpf_perf_prog_config(struct bpf_obj *obj,
static int bpf_obj_config(struct bpf_obj *obj)
{
struct bpf_perf_prog *prog;
- int err;
+ char *config_str = obj->config_str;
+ char *pend;
+ int err = 0;

/* try to config progs based on their names */
list_for_each_entry(prog, &obj->progs_list, list) {
@@ -644,7 +657,42 @@ static int bpf_obj_config(struct bpf_obj *obj)
return err;
}

- return 0;
+ /* return if no 'config' section provided */
+ if (!config_str)
+ return 0;
+
+ /* config progs use obj->config_str */
+ pend = config_str + strlen(config_str);
+ while (config_str < pend) {
+ char *ptr;
+
+ /* skip blank lines */
+ if (*config_str == '\n') {
+ config_str++;
+ continue;
+ }
+
+ ptr = strpbrk(config_str, "\n");
+ if (ptr)
+ *ptr = '\0';
+ err = bpf_perf_prog_config(obj, NULL, config_str);
+ if (err) {
+ pr_err("bpf config: '%s' is not a valid "
+ "config string: err %d\n",
+ config_str, err);
+ goto out;
+ }
+
+ if (!ptr)
+ break;
+ config_str = ptr + 1;
+ }
+out:
+ if (obj->config_str) {
+ free(obj->config_str);
+ obj->config_str = NULL;
+ }
+ return err;
}

int bpf__load(const char *path)
@@ -671,10 +719,10 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_elf_collect(obj)))
goto out;
- if ((err = bpf_obj_validate(obj)))
- goto out;
if ((err = bpf_obj_config(obj)))
goto out;
+ if ((err = bpf_obj_validate(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
--
1.8.3.4

2015-04-30 10:59:30

by Wang Nan

Subject: [RFC PATCH 16/22] perf bpf: create maps needed by object file.

This patch creates the maps described by the 'maps' section of the
object file using bpf_create_map(), and stores their fds in an array in
'struct bpf_obj'.
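The create-then-roll-back pattern can be sketched independently of the bpf syscall. create_all() is hypothetical. Its create and destroy callbacks stand in for bpf_create_map() and close(). fake_create() and fake_destroy() are toy implementations for demonstration only. Note that the cleanup loop must advance its own index j:

```c
#include <stddef.h>

/*
 * Hypothetical helper: create n objects and record each fd. On the
 * first failure, destroy every fd created so far, reset the slots to
 * -1 and return the (negative) error value from create().
 */
int create_all(int *fds, size_t n, int (*create)(size_t idx),
	       void (*destroy)(int fd))
{
	size_t i, j;

	for (i = 0; i < n; i++)
		fds[i] = -1;

	for (i = 0; i < n; i++) {
		fds[i] = create(i);
		if (fds[i] < 0) {
			int err = fds[i];

			for (j = 0; j < i; j++) {	/* j++, not i++ */
				destroy(fds[j]);
				fds[j] = -1;
			}
			return err;
		}
	}
	return 0;
}

/* Toy stand-ins: creation fails for index 2, destroys are counted. */
int fake_closed;
int fake_create(size_t idx) { return idx == 2 ? -1 : (int)idx + 10; }
void fake_destroy(int fd) { (void)fd; fake_closed++; }
```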

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 2 ++
2 files changed, 55 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 6a1c800..34ccc10 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -117,6 +117,15 @@ static void bpf_obj_close(struct bpf_obj *obj)
free(obj->maps);
if (obj->config_str)
free(obj->config_str);
+ if (obj->maps_fds) {
+ size_t i;
+
+ for (i = 0; i < obj->nr_maps; i++) {
+ if (obj->maps_fds[i] >= 0)
+ close(obj->maps_fds[i]);
+ }
+ free(obj->maps_fds);
+ }
free(obj);
}

@@ -695,6 +704,48 @@ out:
return err;
}

+static int
+bpf_obj_create_maps(struct bpf_obj *obj)
+{
+ unsigned int i;
+ size_t nr_maps = obj->nr_maps;
+ int *pfd;
+
+ if (!nr_maps) {
+ pr_debug("bpf: no maps to create for %s\n",
+ obj->path);
+ return 0;
+ }
+
+ obj->maps_fds = malloc(sizeof(int) * nr_maps);
+ if (!obj->maps_fds) {
+ pr_err("bpf: malloc maps_fds failed\n");
+ return -ENOMEM;
+ }
+
+ /* fill all fd with -1 */
+ memset(obj->maps_fds, 0xff, sizeof(int) * nr_maps);
+
+ pfd = obj->maps_fds;
+ for (i = 0; i < nr_maps; i++) {
+ *pfd = bpf_create_map(&obj->maps[i]);
+ if (*pfd < 0) {
+ size_t j;
+
+ pr_err("bpf: failed to create map: %s\n",
+ strerror(errno));
+ for (j = 0; j < i; j++) {
+ close(obj->maps_fds[j]);
+ obj->maps_fds[j] = -1;
+ }
+ return *pfd;
+ }
+ pr_debug("bpf: create map: fd=%d\n", *pfd);
+ pfd++;
+ }
+ return 0;
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -723,6 +774,8 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_validate(obj)))
goto out;
+ if ((err = bpf_obj_create_maps(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 09f77a5..756ac2e 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -46,6 +46,8 @@ struct bpf_obj {
size_t nr_maps;
char *config_str;

+ int *maps_fds;
+
/*
* Information when doing elf related work. Only valid if fd
* is valid.
--
1.8.3.4

2015-04-30 10:59:25

by Wang Nan

Subject: [RFC PATCH 17/22] perf bpf: relocation programs.

If an eBPF program accesses a map, LLVM emits a 64-bit immediate load
instruction with a relocation entry against the map's symbol. To let the
program use that map, that instruction must be patched to load the
map's fd instead (src_reg = BPF_PSEUDO_MAP_FD, imm = fd).

This patch does that, using the relocation section information
previously stored in 'struct bpf_obj'.
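Here is a standalone sketch of the per-entry patching. It rests on several assumptions. The opcode and BPF_PSEUDO_MAP_FD values are copied from <linux/bpf.h> and <linux/bpf_common.h>. The map definition is assumed to be four 32-bit fields (16 bytes). All SK_/sk_ names and relocate_one() are hypothetical. Unlike the patch, the sketch also bounds-checks the instruction index:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, mirrored from the kernel UAPI headers. */
#define SK_BPF_LD		0x00
#define SK_BPF_IMM		0x00
#define SK_BPF_DW		0x18
#define SK_BPF_PSEUDO_MAP_FD	1

struct sk_bpf_insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

/* Assumed map definition layout: four 32-bit fields, 16 bytes. */
struct sk_map_def {
	unsigned int type, key_size, value_size, max_entries;
};

/*
 * Patch one relocated instruction. r_offset is the byte offset of the
 * instruction in the program section; st_value is the byte offset of
 * the referenced map definition in the "maps" section.
 */
int relocate_one(struct sk_bpf_insn *insns, size_t nr_insns,
		 uint64_t r_offset, uint64_t st_value,
		 const int *map_fds, size_t nr_maps)
{
	size_t insn_idx = r_offset / sizeof(struct sk_bpf_insn);
	size_t map_idx = st_value / sizeof(struct sk_map_def);

	if (insn_idx >= nr_insns)
		return -EINVAL;
	/* Only a 64-bit immediate load (ld_imm64) can carry a map fd. */
	if (insns[insn_idx].code != (SK_BPF_LD | SK_BPF_IMM | SK_BPF_DW))
		return -EINVAL;
	if (map_idx >= nr_maps)
		return -EINVAL;

	insns[insn_idx].src_reg = SK_BPF_PSEUDO_MAP_FD;
	insns[insn_idx].imm = map_fds[map_idx];
	return 0;
}
```

An ld_imm64 instruction occupies two 8-byte slots. Only the first slot carries the fd in imm, so relocation patches exactly the instruction at r_offset.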

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 97 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 97 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 34ccc10..fe623df 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -515,6 +515,17 @@ bpf_find_prog_by_name(struct bpf_obj *obj, const char *name)
return NULL;
}

+static struct bpf_perf_prog *
+bpf_find_prog_by_idx(struct bpf_obj *obj, int idx)
+{
+ struct bpf_perf_prog *prog;
+
+ list_for_each_entry(prog, &obj->progs_list, list)
+ if (prog->idx == idx)
+ return prog;
+ return NULL;
+}
+
/*
* Use config_str to config program. If prog is NULL, find a
* prog based on config_str. config_str should not be NULL.
@@ -746,6 +757,90 @@ bpf_obj_create_maps(struct bpf_obj *obj)
return 0;
}

+static int
+bpf_perf_prog_relocate(struct bpf_obj *obj,
+ GElf_Shdr *shdr,
+ Elf_Data *data,
+ struct bpf_perf_prog *prog)
+{
+ int i, nrels;
+
+ pr_debug("bpf: relocating: %s\n", prog->name);
+ nrels = shdr->sh_size / shdr->sh_entsize;
+
+ for (i = 0; i < nrels; i++) {
+ GElf_Sym sym;
+ GElf_Rel rel;
+ unsigned int insn_idx;
+ struct bpf_insn *insns = prog->insns;
+ size_t map_idx;
+
+ if (!gelf_getrel(data, i, &rel)) {
+ pr_err("bpf: relocation: failed to get %d reloc\n", i);
+ return -EINVAL;
+ }
+
+ insn_idx = rel.r_offset / sizeof(struct bpf_insn);
+ pr_debug("bpf: relocation: insn_idx=%u\n", insn_idx);
+
+ if (!gelf_getsym(obj->elf.symbols,
+ GELF_R_SYM(rel.r_info),
+ &sym)) {
+ pr_err("bpf: relocation: symbol %"PRIx64" not found\n",
+ GELF_R_SYM(rel.r_info));
+ return -EINVAL;
+ }
+
+ if (insns[insn_idx].code != (BPF_LD | BPF_IMM | BPF_DW)) {
+ pr_err("bpf: relocation: invalid relo for "
+ "insns[%d].code 0x%x\n",
+ insn_idx, insns[insn_idx].code);
+ return -EINVAL;
+ }
+
+ map_idx = sym.st_value / sizeof(struct bpf_map_def);
+ if (map_idx >= obj->nr_maps) {
+ pr_err("bpf relocation: map_idx %d larger than %d\n",
+ (int)map_idx, (int)obj->nr_maps - 1);
+ return -EINVAL;
+ }
+
+ insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+ insns[insn_idx].imm = obj->maps_fds[map_idx];
+ }
+ return 0;
+}
+
+static int
+bpf_obj_relocate(struct bpf_obj *obj)
+{
+ int i, err;
+
+ for (i = 0; i < obj->elf.nr_reloc; i++) {
+ GElf_Shdr *shdr = &obj->elf.reloc[i].shdr;
+ Elf_Data *data = obj->elf.reloc[i].data;
+ int idx = shdr->sh_info;
+ struct bpf_perf_prog *prog;
+
+ if (shdr->sh_type != SHT_REL) {
+ pr_err("bpf: internal error\n");
+ return -EINVAL;
+ }
+
+ prog = bpf_find_prog_by_idx(obj, idx);
+ if (!prog) {
+ pr_err("bpf: relocation failed: no %d section\n",
+ idx);
+ return -ENOENT;
+ }
+
+ err = bpf_perf_prog_relocate(obj, shdr, data, prog);
+ if (err)
+ return -EINVAL;
+ }
+ return 0;
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -776,6 +871,8 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_create_maps(obj)))
goto out;
+ if ((err = bpf_obj_relocate(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
--
1.8.3.4

2015-04-30 10:58:37

by Wang Nan

Subject: [RFC PATCH 18/22] perf bpf: load eBPF programs into kernel.

This patch uses the bpf system call to load all eBPF programs into the
kernel. The resulting fds are stored in the prog_fd field of
'struct bpf_perf_prog'.
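For reference, here is a minimal sketch of the underlying BPF_PROG_LOAD call. It assumes Linux kernel headers that define BPF_PROG_TYPE_KPROBE and __NR_bpf. load_kprobe_prog() is a hypothetical wrapper. The patch fills the same union bpf_attr fields in bpf_perf_prog_load() and goes through perf's sys_bpf():

```c
#define _GNU_SOURCE		/* for syscall() */
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: load a kprobe program. Returns the new program
 * fd, or -1 with errno set; any verifier output lands in log_buf.
 */
int load_kprobe_prog(const struct bpf_insn *insns, unsigned int insn_cnt,
		     const char *license, unsigned int kern_version,
		     char *log_buf, unsigned int log_size)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_KPROBE;
	attr.insn_cnt = insn_cnt;
	attr.insns = (uint64_t)(unsigned long)insns;
	attr.license = (uint64_t)(unsigned long)license;
	attr.log_buf = (uint64_t)(unsigned long)log_buf;
	attr.log_size = log_size;
	attr.log_level = log_buf ? 1 : 0;
	/* Older kernels require this to match for kprobe programs. */
	attr.kern_version = kern_version;

	if (log_buf && log_size)
		log_buf[0] = '\0';

	return (int)syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
}
```

On failure the caller can print log_buf, where the verifier explains why it rejected a program. Loading kprobe programs typically requires root privileges.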

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 58 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 3 +++
2 files changed, 61 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index fe623df..0395483 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -324,6 +324,8 @@ bpf_perf_prog_free(struct bpf_perf_prog *prog)
clear_perf_probe_event(prog->pev);
bzero(prog->pev, sizeof(*prog->pev));
}
+ if (prog->prog_fd >= 0)
+ close(prog->prog_fd);
free(prog);
}

@@ -372,6 +374,7 @@ bpf_perf_prog_alloc(struct bpf_obj *obj __maybe_unused,
memcpy(prog->insns, data,
prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
+ prog->prog_fd = -1;
return prog;
out:
bpf_perf_prog_free(prog);
@@ -841,6 +844,59 @@ bpf_obj_relocate(struct bpf_obj *obj)
return 0;
}

+static __u64 ptr_to_u64(void *ptr)
+{
+ return (__u64) (unsigned long) ptr;
+}
+
+static int
+bpf_perf_prog_load(struct bpf_obj *obj, struct bpf_perf_prog *prog)
+{
+ int fd;
+ union bpf_attr attr;
+#define LOG_BUF_SIZE 65536
+ static char bpf_log_buf[LOG_BUF_SIZE];
+
+ bzero(&attr, sizeof(attr));
+
+ attr.prog_type = BPF_PROG_TYPE_KPROBE;
+ attr.insn_cnt = prog->insns_cnt;
+ attr.insns = ptr_to_u64((void *) prog->insns);
+ attr.license = ptr_to_u64((void *) obj->license);
+ attr.log_buf = ptr_to_u64(bpf_log_buf);
+ attr.log_size = LOG_BUF_SIZE;
+ attr.log_level = 1;
+ attr.kern_version = obj->kern_version;
+
+ bpf_log_buf[0] = 0;
+ fd = sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
+
+ if (fd < 0) {
+ pr_err("bpf: load: failed to load program %s:\n"
+ "-- BEGIN DUMP LOG ---\n%s\n-- END LOG --\n",
+ prog->name, bpf_log_buf);
+ return fd;
+ }
+ pr_debug("bpf load: load program %s as %d\n", prog->name, fd);
+ prog->prog_fd = fd;
+
+ return 0;
+}
+
+static int
+bpf_obj_load_progs(struct bpf_obj *obj)
+{
+ struct bpf_perf_prog *prog;
+ int err;
+
+ list_for_each_entry(prog, &obj->progs_list, list) {
+ err = bpf_perf_prog_load(obj, prog);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
int bpf__load(const char *path)
{
struct bpf_obj *obj;
@@ -873,6 +929,8 @@ int bpf__load(const char *path)
goto out;
if ((err = bpf_obj_relocate(obj)))
goto out;
+ if ((err = bpf_obj_load_progs(obj)))
+ goto out;

list_add(&obj->list, &bpf_obj_list);
return 0;
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index 756ac2e..dfdc3ca 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -29,6 +29,9 @@ struct bpf_perf_prog {
size_t insns_cnt;

struct perf_probe_event *pev;
+
+ /* fd of loaded eBPF program */
+ int prog_fd;
};

struct bpf_obj {
--
1.8.3.4

2015-04-30 10:59:18

by Wang Nan

Subject: [RFC PATCH 19/22] perf bpf: dump eBPF program before loading.

Copy print_bpf_insn() from kernel/bpf/verifier.c into util/bpf.c, and
use it to dump each program before loading when perf runs with -vv.
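As a small illustration of how the dumper decodes an opcode, the sketch below performs only the first step: the low three bits (BPF_CLASS) select the instruction class. bpf_insn_class_name() is a hypothetical helper. Its table follows bpf_class_string[] in this patch, which prints 0x06 (classic BPF's BPF_RET) as "BUG" because that class should not appear in eBPF:

```c
#include <stdint.h>

/* Class names indexed by BPF_CLASS(code), i.e. code & 0x07. */
static const char *const class_names[8] = {
	"ld", "ldx", "st", "stx", "alu", "jmp", "BUG", "alu64",
};

/* Hypothetical helper: name of the instruction class of an opcode. */
const char *bpf_insn_class_name(uint8_t code)
{
	return class_names[code & 0x07];
}
```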

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 14 +++-
tools/perf/util/bpf.c | 156 +++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf.h | 1 +
3 files changed, 170 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 0395483..6587e99 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -849,6 +849,16 @@ static __u64 ptr_to_u64(void *ptr)
return (__u64) (unsigned long) ptr;
}

+static void
+dump_perf_prog(struct bpf_perf_prog *prog)
+{
+ if (verbose < 2)
+ return;
+ pr_info("-- BEGIN DUMP '%s' --\n", prog->name);
+ bpf_dump_prog(prog->insns, prog->insns_cnt);
+ pr_info("-- FINISH DUMP '%s' --\n", prog->name);
+}
+
static int
bpf_perf_prog_load(struct bpf_obj *obj, struct bpf_perf_prog *prog)
{
@@ -857,6 +867,8 @@ bpf_perf_prog_load(struct bpf_obj *obj, struct bpf_perf_prog *prog)
#define LOG_BUF_SIZE 65536
static char bpf_log_buf[LOG_BUF_SIZE];

+ dump_perf_prog(prog);
+
bzero(&attr, sizeof(attr));

attr.prog_type = BPF_PROG_TYPE_KPROBE;
@@ -902,7 +914,7 @@ int bpf__load(const char *path)
struct bpf_obj *obj;
int err;

- pr_debug("bpf: loading %s\n", path);
+ pr_debug("bpf: loading %s, verbose=%d\n", path, verbose);

if (elf_version(EV_CURRENT) == EV_NONE) {
pr_err("bpf: failed to init libelf for %s\n", path);
diff --git a/tools/perf/util/bpf.c b/tools/perf/util/bpf.c
index f752723..eb3411b 100644
--- a/tools/perf/util/bpf.c
+++ b/tools/perf/util/bpf.c
@@ -8,12 +8,14 @@
*/

#include <stdlib.h>
+#include <stdio.h>
#include <string.h>
#include <linux/unistd.h>
#include <unistd.h>
#include <linux/bpf.h>
#include <errno.h>
#include "perf.h"
+#include "debug.h"
#include "bpf.h"

int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, size_t size)
@@ -37,3 +39,157 @@ int bpf_create_map(struct bpf_map_def *map_def)

return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}
+
+/* Taken from kernel/bpf/verifier.c, s/verbose/_verbose/g*/
+static const char *const bpf_class_string[] = {
+ [BPF_LD] = "ld",
+ [BPF_LDX] = "ldx",
+ [BPF_ST] = "st",
+ [BPF_STX] = "stx",
+ [BPF_ALU] = "alu",
+ [BPF_JMP] = "jmp",
+ [BPF_RET] = "BUG",
+ [BPF_ALU64] = "alu64",
+};
+
+static const char *const bpf_alu_string[] = {
+ [BPF_ADD >> 4] = "+=",
+ [BPF_SUB >> 4] = "-=",
+ [BPF_MUL >> 4] = "*=",
+ [BPF_DIV >> 4] = "/=",
+ [BPF_OR >> 4] = "|=",
+ [BPF_AND >> 4] = "&=",
+ [BPF_LSH >> 4] = "<<=",
+ [BPF_RSH >> 4] = ">>=",
+ [BPF_NEG >> 4] = "neg",
+ [BPF_MOD >> 4] = "%=",
+ [BPF_XOR >> 4] = "^=",
+ [BPF_MOV >> 4] = "=",
+ [BPF_ARSH >> 4] = "s>>=",
+ [BPF_END >> 4] = "endian",
+};
+
+static const char *const bpf_ldst_string[] = {
+ [BPF_W >> 3] = "u32",
+ [BPF_H >> 3] = "u16",
+ [BPF_B >> 3] = "u8",
+ [BPF_DW >> 3] = "u64",
+};
+
+static const char *const bpf_jmp_string[] = {
+ [BPF_JA >> 4] = "jmp",
+ [BPF_JEQ >> 4] = "==",
+ [BPF_JGT >> 4] = ">",
+ [BPF_JGE >> 4] = ">=",
+ [BPF_JSET >> 4] = "&",
+ [BPF_JNE >> 4] = "!=",
+ [BPF_JSGT >> 4] = "s>",
+ [BPF_JSGE >> 4] = "s>=",
+ [BPF_CALL >> 4] = "call",
+ [BPF_EXIT >> 4] = "exit",
+};
+#define _verbose pr_info
+static void print_bpf_insn(struct bpf_insn *insn)
+{
+ u8 class = BPF_CLASS(insn->code);
+
+ if (class == BPF_ALU || class == BPF_ALU64) {
+ if (BPF_SRC(insn->code) == BPF_X)
+ _verbose("(%02x) %sr%d %s %sr%d\n",
+ insn->code, class == BPF_ALU ? "(u32) " : "",
+ insn->dst_reg,
+ bpf_alu_string[BPF_OP(insn->code) >> 4],
+ class == BPF_ALU ? "(u32) " : "",
+ insn->src_reg);
+ else
+ _verbose("(%02x) %sr%d %s %s%d\n",
+ insn->code, class == BPF_ALU ? "(u32) " : "",
+ insn->dst_reg,
+ bpf_alu_string[BPF_OP(insn->code) >> 4],
+ class == BPF_ALU ? "(u32) " : "",
+ insn->imm);
+ } else if (class == BPF_STX) {
+ if (BPF_MODE(insn->code) == BPF_MEM)
+ _verbose("(%02x) *(%s *)(r%d %+d) = r%d\n",
+ insn->code,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg,
+ insn->off, insn->src_reg);
+ else if (BPF_MODE(insn->code) == BPF_XADD)
+ _verbose("(%02x) lock *(%s *)(r%d %+d) += r%d\n",
+ insn->code,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off,
+ insn->src_reg);
+ else
+ _verbose("BUG_%02x\n", insn->code);
+ } else if (class == BPF_ST) {
+ if (BPF_MODE(insn->code) != BPF_MEM) {
+ _verbose("BUG_st_%02x\n", insn->code);
+ return;
+ }
+ _verbose("(%02x) *(%s *)(r%d %+d) = %d\n",
+ insn->code,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg,
+ insn->off, insn->imm);
+ } else if (class == BPF_LDX) {
+ if (BPF_MODE(insn->code) != BPF_MEM) {
+ _verbose("BUG_ldx_%02x\n", insn->code);
+ return;
+ }
+ _verbose("(%02x) r%d = *(%s *)(r%d %+d)\n",
+ insn->code, insn->dst_reg,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->src_reg, insn->off);
+ } else if (class == BPF_LD) {
+ if (BPF_MODE(insn->code) == BPF_ABS) {
+ _verbose("(%02x) r0 = *(%s *)skb[%d]\n",
+ insn->code,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->imm);
+ } else if (BPF_MODE(insn->code) == BPF_IND) {
+ _verbose("(%02x) r0 = *(%s *)skb[r%d + %d]\n",
+ insn->code,
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->src_reg, insn->imm);
+ } else if (BPF_MODE(insn->code) == BPF_IMM) {
+ _verbose("(%02x) r%d = 0x%x\n",
+ insn->code, insn->dst_reg, insn->imm);
+ } else {
+ _verbose("BUG_ld_%02x\n", insn->code);
+ return;
+ }
+ } else if (class == BPF_JMP) {
+ u8 opcode = BPF_OP(insn->code);
+
+ if (opcode == BPF_CALL) {
+ _verbose("(%02x) call %d\n", insn->code, insn->imm);
+ } else if (insn->code == (BPF_JMP | BPF_JA)) {
+ _verbose("(%02x) goto pc%+d\n",
+ insn->code, insn->off);
+ } else if (insn->code == (BPF_JMP | BPF_EXIT)) {
+ _verbose("(%02x) exit\n", insn->code);
+ } else if (BPF_SRC(insn->code) == BPF_X) {
+ _verbose("(%02x) if r%d %s r%d goto pc%+d\n",
+ insn->code, insn->dst_reg,
+ bpf_jmp_string[BPF_OP(insn->code) >> 4],
+ insn->src_reg, insn->off);
+ } else {
+ _verbose("(%02x) if r%d %s 0x%x goto pc%+d\n",
+ insn->code, insn->dst_reg,
+ bpf_jmp_string[BPF_OP(insn->code) >> 4],
+ insn->imm, insn->off);
+ }
+ } else {
+ _verbose("(%02x) %s\n", insn->code, bpf_class_string[class]);
+ }
+}
+
+void bpf_dump_prog(struct bpf_insn *insn, size_t nr_insn)
+{
+ unsigned int i;
+
+ for (i = 0; i < nr_insn; i++)
+ print_bpf_insn(&insn[i]);
+}
diff --git a/tools/perf/util/bpf.h b/tools/perf/util/bpf.h
index be106b0..b4a6802 100644
--- a/tools/perf/util/bpf.h
+++ b/tools/perf/util/bpf.h
@@ -19,4 +19,5 @@ struct bpf_map_def {
int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, size_t size);

int bpf_create_map(struct bpf_map_def *map_def);
+void bpf_dump_prog(struct bpf_insn *insn, size_t nr_insn);
#endif
--
1.8.3.4

2015-04-30 10:59:22

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH 20/22] perf bpf: clean elf memory after loading.

After all eBPF programs in an object file are loaded, the related ELF
information is no longer needed. Close the object file and free that memory.

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 6587e99..208f5e8 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -944,6 +944,7 @@ int bpf__load(const char *path)
if ((err = bpf_obj_load_progs(obj)))
goto out;

+ bpf_obj_clear_elf(obj);
list_add(&obj->list, &bpf_obj_list);
return 0;
out:
--
1.8.3.4

2015-04-30 10:55:23

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH 21/22] perf bpf: probe at kprobe points.

---
tools/perf/util/bpf-loader.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 208f5e8..186a3d0 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -952,8 +952,26 @@ out:
return -1;
}

+static int bpf_probe(void)
+{
+ int err = add_perf_probe_events(params.event_array,
+ params.nr_events,
+ MAX_PROBES, 0);
+ /* add_perf_probe_events returns a negative value on failure */
+ if (err < 0)
+ pr_err("bpf probe: failed to probe events\n");
+
+ return err < 0 ? err : 0;
+}
+
int bpf__run(void)
{
+ int err;
+
+ pr_debug("bpf: probing\n");
+ if ((err = bpf_probe()))
+ return err;
+
pr_info("BPF is running. Use Ctrl-c to stop.\n");
while(1)
sleep(1);
--
1.8.3.4

2015-04-30 10:58:06

by Wang Nan

[permalink] [raw]
Subject: [RFC PATCH 22/22] perf bpf: attaches eBPF program to perf fd.

This patch does the final work which makes eBPF programs actually work.
It introduces bpf_attach(), which first retrieves the id of each previously
created k(ret)probe event, opens a perf event on it, then uses
PERF_EVENT_IOC_SET_BPF to attach the eBPF program to that event.
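bpf_perf_prog_attach() below gets the event id by parsing the event format through trace_event__tp_format(). For reference, the same id is also exposed as a plain integer in the event's `id` file under the tracing directory (typically /sys/kernel/debug/tracing/events/<group>/<event>/id). The sketch below shows that alternative; it is not what the patch does, the helper names are hypothetical, and the tracing mount point is an assumption passed in by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Build "<tracing_dir>/events/<group>/<event>/id" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
static int event_id_path(char *buf, size_t len, const char *tracing_dir,
			 const char *group, const char *event)
{
	int n = snprintf(buf, len, "%s/events/%s/%s/id",
			 tracing_dir, group, event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the numeric trace event id from an "id" file.
 * Returns the id (>= 0) on success, -1 on failure. */
static int read_event_id(const char *id_path)
{
	FILE *f = fopen(id_path, "r");
	int id = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}
```

The resulting id is what goes into attr.config for a PERF_TYPE_TRACEPOINT event, exactly as prog->event_id is used in the patch.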

Signed-off-by: Wang Nan <[email protected]>
---
tools/perf/util/bpf-loader.c | 82 ++++++++++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-loader.h | 2 ++
2 files changed, 84 insertions(+)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 186a3d0..c646ca4 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -15,6 +15,7 @@
#include "bpf-loader.h"
#include "probe-event.h"
#include "probe-finder.h" // for MAX_PROBES
+#include "trace-event.h"

#include <linux/list.h>
#include <linux/types.h>
@@ -326,6 +327,8 @@ bpf_perf_prog_free(struct bpf_perf_prog *prog)
}
if (prog->prog_fd >= 0)
close(prog->prog_fd);
+ if (prog->perf_fd >= 0)
+ close(prog->perf_fd);
free(prog);
}

@@ -375,6 +378,8 @@ bpf_perf_prog_alloc(struct bpf_obj *obj __maybe_unused,
prog->insns_cnt * sizeof(struct bpf_insn));
prog->idx = idx;
prog->prog_fd = -1;
+ prog->event_id = -1;
+ prog->perf_fd = -1;
return prog;
out:
bpf_perf_prog_free(prog);
@@ -964,6 +969,81 @@ static int bpf_probe(void)
return err < 0 ? err : 0;
}

+static int bpf_perf_prog_attach(struct bpf_perf_prog *prog)
+{
+ struct event_format* format;
+ struct perf_event_attr attr;
+ int fd, err;
+
+ format = trace_event__tp_format(prog->pev->group,
+ prog->pev->event);
+ if (!format) {
+ pr_err("bpf: attach: failed to get format of %s/%s\n",
+ prog->pev->group, prog->pev->event);
+ return -EINVAL;
+ }
+
+ pr_debug("bpf: attach %s/%s: id=%d\n",
+ prog->pev->group,
+ prog->pev->event,
+ format->id);
+ prog->event_id = format->id;
+
+ pevent_free_format(format);
+
+ memset(&attr, '\0', sizeof(attr));
+
+ attr.type = PERF_TYPE_TRACEPOINT;
+ attr.sample_type = PERF_SAMPLE_RAW;
+ attr.sample_period = 1;
+ attr.wakeup_events = 1;
+ attr.config = prog->event_id;
+
+ fd = sys_perf_event_open(&attr, -1, 0, -1, 0);
+ if (fd < 0) {
+ pr_err("bpf: open event %d failed\n", prog->event_id);
+ return -errno;
+ }
+ prog->perf_fd = fd;
+
+ err = ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
+ if (err) {
+ pr_err("bpf attach: enable failed: %s\n",
+ strerror(errno));
+ return -errno;
+ }
+
+ err = ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog->prog_fd);
+ if (err) {
+ pr_err("bpf attach: set bpf: %s\n", strerror(errno));
+ return -errno;
+ }
+ pr_debug("bpf: attach %s to event %d\n", prog->name,
+ prog->event_id);
+
+ return 0;
+}
+
+static int bpf_attach(void)
+{
+ struct bpf_obj *obj;
+ int err;
+
+ list_for_each_entry(obj, &bpf_obj_list, list) {
+ struct bpf_perf_prog *prog;
+
+ list_for_each_entry(prog, &obj->progs_list, list) {
+ err = bpf_perf_prog_attach(prog);
+ if (err) {
+ pr_err("bpf: attach: failed to attach %s\n",
+ prog->name);
+ return err;
+ }
+ }
+ }
+ return 0;
+}
+
int bpf__run(void)
{
int err;
@@ -971,6 +1051,8 @@ int bpf__run(void)
pr_debug("bpf: probing\n");
if ((err = bpf_probe()))
return err;
+ if ((err = bpf_attach()))
+ return err;

pr_info("BPF is running. Use Ctrl-c to stop.\n");
while(1)
diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
index dfdc3ca..baa4404 100644
--- a/tools/perf/util/bpf-loader.h
+++ b/tools/perf/util/bpf-loader.h
@@ -32,6 +32,8 @@ struct bpf_perf_prog {

/* fd of loaded eBPF program */
int prog_fd;
+ int event_id;
+ int perf_fd;
};

struct bpf_obj {
--
1.8.3.4

2015-05-01 04:37:11

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 4/30/15 3:52 AM, Wang Nan wrote:
> This series of patches is an approach to integrate eBPF with perf.
> After applying these patches, users are allowed to use following
> command to load eBPF program compiled by LLVM into kernel:
>
> $ perf bpf sample_bpf.o
>
> The required BPF code and the loading procedure is similar to Alexei
> Starovoitov's libbpf in sample/bpf, with following exceptions:
>
> 1. The section name are not required leading with 'kprobe/' or
> 'kretprobe/'. Without such leading, any valid C var name can be use.
>
> 2. A 'config' section can be provided to describe the position and
> arguments of a program. Syntax is identical to 'perf probe'.
>
> An example is pasted at the bottom of this cover letter. In that
> example, mybpfprog is configured by string in config section, and will
> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>
> $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
> -Wno-unused-value -Wno-pointer-sign \
> -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
> sample_bpf.o
>
> And can be loaded using:
>
> $ perf bpf sample_bpf.o
>
> This series is only a limited functional. Following works are on the
> todo list:
>
> 1. Unprobe kprobe stubs used by eBPF programs when unloading;
>
> 2. Enable eBPF programs to access local variables and arguments
> by utilizing debuginfo;
>
> 3. Output data in perf way.
>
> In this series:
>
> Patch 1/22 is a bugfix in perf probe, and may be triggered by following
> patches;
>
> Patch 2-3/22 are preparation, add required macros and syscall
> definition into perf source tree.
>
> Patch 4/22 add 'perf bpf' command.
>
> Patch 5-20/22 are labor works, which parse the ELF object file, collect
> information in object files, create maps needed by programs, link map
> and programs, config programs and load programs into kernel.
>
> Patch 21-22/22 are the final work. Patch 21 creates kprobe points which
> will be used by eBPF programs, patch 22 creates perf file descriptors
> then attach eBPF programs on them.

I'm very happy to see this work. Looks great. All patches are
impressively clean and concise.
I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
Patches 4 and above are clean and polished, but probably need to go into
some 'staging area', like a branch of the perf tree, since I suspect the
user interface may change a little in the coming months and it's
a bit too early to expose the 'perf bpf' command to every perf user.
Arnaldo, Ingo, what do you guys think should be the arrangement?
'perf/bpf' branch in acme/linux.git or in tip/tip.git ?

I have a few comments on patches 18 and 19, but let's figure out
the long term plan first.

We're also working in parallel on creating a new tracing language
that together with llvm backend can be used as a single shared library
that can be called from perf or anything else.
Then clang compilation step will be gone and programs can be run
as 'perf bpf file.bpf'.

Thanks!

2015-05-01 07:16:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.


* Wang Nan <[email protected]> wrote:

> This series of patches is an approach to integrate eBPF with perf.

Very promising!

> After applying these patches, users are allowed to use following
> command to load eBPF program compiled by LLVM into kernel:
>
> $ perf bpf sample_bpf.o

Please keep space for a subcommand space as most other perf
subcommands do, i.e. make it something like:

perf bpf add sample_bpf.o

or:

perf bpf run sample_bpf.o

or:

perf bpf load sample_bpf.o

So that future subcommands can be added:

perf bpf list
perf bpf del <...>
perf bpf enable <...>
perf bpf disable <...>
perf bpf help

and 'perf bpf' should probably display the help page by default, so if
curious perf users stumble into the new subcommand, they get a basic
idea about what it's all about.
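A minimal sketch of the subcommand layout suggested above, with a bare 'perf bpf' falling back to help. The table and handler names are hypothetical, not perf's actual builtin code; the handlers only return placeholder codes, and del/enable/disable would simply be further table entries.

```c
#include <stdio.h>
#include <string.h>

struct bpf_subcmd {
	const char *name;
	int (*fn)(int argc, const char **argv);
};

/* Hypothetical handlers: real ones would load/list programs. */
static int cmd_help(int argc, const char **argv)
{
	(void)argc; (void)argv;
	puts("usage: perf bpf {load|list|help} [<args>]");
	return 0;
}

static int cmd_load(int argc, const char **argv)
{
	(void)argv;		/* argv[0] would be the object file */
	return argc >= 1 ? 1 : -1;
}

static int cmd_list(int argc, const char **argv)
{
	(void)argc; (void)argv;
	return 2;
}

static const struct bpf_subcmd subcmds[] = {
	{ "load", cmd_load },
	{ "list", cmd_list },
	{ "help", cmd_help },
};

/* argv[0] is the subcommand name; with no subcommand, show help. */
static int run_perf_bpf(int argc, const char **argv)
{
	size_t i;

	if (argc < 1)
		return cmd_help(0, NULL);
	for (i = 0; i < sizeof(subcmds) / sizeof(subcmds[0]); i++)
		if (strcmp(argv[0], subcmds[i].name) == 0)
			return subcmds[i].fn(argc - 1, argv + 1);

	fprintf(stderr, "perf bpf: unknown subcommand '%s'\n", argv[0]);
	cmd_help(0, NULL);
	return -1;
}
```

Keeping the dispatch table-driven means adding 'perf bpf list' later is one new entry rather than a change to argument parsing.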

I.e. you should think about the high level subcommand space right now,
and pick proper names - because this is going to determine the future
usability and the success of the tool to a large degree.

Thanks,

Ingo

2015-05-01 11:07:21

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> We're also working in parallel on creating a new tracing language
> that together with llvm backend can be used as a single shared library
> that can be called from perf or anything else.

Gurgh, please also keep normal C an option. I never can remember how all
these fancy arse special case languages work and it's just too annoying /
frustrating to have to figure out how to do simple things every time you
need it to just work.

2015-05-01 11:49:51

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.


* Peter Zijlstra <[email protected]> wrote:

> On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> > We're also working in parallel on creating a new tracing language
> > that together with llvm backend can be used as a single shared library
> > that can be called from perf or anything else.
>
> Gurgh, please also keep normal C an option. [...]

Absolutely, I thought there was agreement on that when we started
merging all these eBPF patches ...

It might be 'simplified C', in that it's just a subset of C, but
please don't re-do something that works, especially if it's used to
instrument a kernel that is written in C ...

Thanks,

Ingo

2015-05-01 16:56:30

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/1/15 4:49 AM, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
>>> We're also working in parallel on creating a new tracing language
>>> that together with llvm backend can be used as a single shared library
>>> that can be called from perf or anything else.
>>
>> Gurgh, please also keep normal C an option. [...]
>
> Absolutely, I thought there was agreement on that when we started
> merging all these eBPF patches ...
>
> It might be 'simplified C', in that it's just a subset of C, but
> please don't re-do something that works, especially if it's used to
> instrument a kernel that is written in C ...

Of course. When did I say that I like 'bird' languages? :)
By 'new' I mean that we're not trying to port an existing tracing
language like dtrace, systemtap or ktap to bpf.
I believe dtrace would have been more widely adopted if it didn't
invent new syntax. We're trying to do a C -- with ++.
It's C where unsupported things like 'for', 'while' and 'asm' are
actively rejected by the front-end, and additional syntactic
sugar is added for things that are too ugly/verbose in vanilla C.
Full C via clang will always be there, but it looks like it will have
a hard time, because full C has way too many things that are not
supported by the bpf VM. We're trying to act on feedback that new users
are giving us. It's much friendlier when the compiler tells you right
away that 'for' is not supported, instead of the kernel verifier saying
that there is a loop. One new thing is map[key] access, which is equivalent
to bpf_map_lookup(&map, &key) followed by
bpf_map_update(&map, &key, &zero_value) if the lookup doesn't find
an element. It turned out that for tracing use cases it's a very common
pattern.
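To make the map[key] semantics concrete, here is a sketch of the same "lookup, and on a miss insert zero and look up again" pattern. In kernel BPF C the calls would be the bpf_map_lookup_elem()/bpf_map_update_elem() helpers; so that the sketch runs anywhere, a hypothetical fixed-size userspace table (`toy_map`, linear probing, no deletion) stands in for the BPF hash map.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_SLOTS 64

/* Toy stand-in for a BPF hash map: u64 keys, u64 values. */
struct toy_map {
	uint64_t keys[MAP_SLOTS];
	uint64_t vals[MAP_SLOTS];
	unsigned char used[MAP_SLOTS];
};

static uint64_t *toy_lookup(struct toy_map *m, uint64_t key)
{
	size_t i, h = (size_t)(key % MAP_SLOTS);

	for (i = 0; i < MAP_SLOTS; i++) {
		size_t s = (h + i) % MAP_SLOTS;

		if (!m->used[s])
			return NULL;	/* no deletions, so a hole ends the probe */
		if (m->keys[s] == key)
			return &m->vals[s];
	}
	return NULL;
}

static int toy_update(struct toy_map *m, uint64_t key, uint64_t val)
{
	size_t i, h = (size_t)(key % MAP_SLOTS);

	for (i = 0; i < MAP_SLOTS; i++) {
		size_t s = (h + i) % MAP_SLOTS;

		if (!m->used[s] || m->keys[s] == key) {
			m->used[s] = 1;
			m->keys[s] = key;
			m->vals[s] = val;
			return 0;
		}
	}
	return -1;		/* map full */
}

/* What 'map[key]' would desugar to. Returns NULL only if the map is full. */
static uint64_t *map_subscript(struct toy_map *m, uint64_t key)
{
	uint64_t *v = toy_lookup(m, key);

	if (!v) {
		if (toy_update(m, key, 0))
			return NULL;
		v = toy_lookup(m, key);
	}
	return v;
}
```

With this, a counter like `map[pid]++` becomes `(*map_subscript(&m, pid))++`, which is the boilerplate the new syntax is meant to hide.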

Anyway, back to my original question about the long-term home:
where should the 'perf/bpf' branch land?

I also agree on leaving room for additional arguments after 'perf bpf'.
In particular, I'd like to see 'perf bpf list'.

2015-05-01 17:06:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.


* Alexei Starovoitov <[email protected]> wrote:

> On 5/1/15 4:49 AM, Ingo Molnar wrote:
> >
> >* Peter Zijlstra <[email protected]> wrote:
> >
> >>On Thu, Apr 30, 2015 at 09:37:04PM -0700, Alexei Starovoitov wrote:
> >>>We're also working in parallel on creating a new tracing language
> >>>that together with llvm backend can be used as a single shared library
> >>>that can be called from perf or anything else.
> >>
> >>Gurgh, please also keep normal C an option. [...]
> >
> >Absolutely, I thought there was agreement on that when we started
> >merging all these eBPF patches ...
> >
> >It might be 'simplified C', in that it's just a subset of C, but
> >please don't re-do something that works, especially if it's used to
> >instrument a kernel that is written in C ...
>
> of course. When did I say that I like 'bird' languages? :)
> By 'new' I mean that we're not trying to port existing tracing
> language like dtrace, systemtap, ktap to bpf.
> I believe dtrace would have been more widely adopted if it didn't
> invent new syntax. We're trying to do a C -- with ++.
> It's C where non-supported things like 'for', 'while', 'asm' are
> actively error-ed by front-end and additional syntactic
> sugar for things that too ugly/verbose in vanilla C are added.

Ok, sounds very good to me!

Thanks,

Ingo

2015-05-02 07:19:59

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 2015/5/1 12:37, Alexei Starovoitov wrote:
> On 4/30/15 3:52 AM, Wang Nan wrote:
>> [...]
>
> I'm very happy to see this work. Looks great. All patches are impressively clean and concise.
> I think patches 1-3 are ready to go into Arnaldo's perf tree right now.
> 4 and above are clean and polished, but probably need to go into
> some 'staging area' like a branch of perf tree, since I suspect the
> user interface may change a little in the coming months and it's
> a bit too early to expose 'perf bpf' command to every perf user ?
> Arnaldo, Ingo, what do you guys think should be the arrangement?
> 'perf/bpf' branch in acme/linux.git or in tip/tip.git ?
>
> I have few comments for patches 18 and 19, but let's figure out
> the long term plan first.
>

Hi,

I'm very happy to see the positive feedback from you and others. I'm also interested in
how these patches can be merged into mainline. I'd like to keep sending patches
to this list so you can all see my improvements, and let the maintainers decide whether
and how to merge them.

We are also backporting the eBPF patches so that they work on our
older kernels. After that we will use eBPF in our profiling work.
I think this RFC series is only a starting point for using eBPF. Further requirements
will likely arise during our real work.

I'd like to do the following in the next version (based on my experience and your feedback):

1. Safely clean up kprobe points after unloading;

2. Add subcommand space to 'perf bpf'. The current functionality should move to 'perf bpf load';

3. Extract the eBPF ELF walking and collecting work into a separate library to help others.

My colleague He Kuang is working on variable access. Probing inside a function body
and accessing its local variables will be supported like this:

SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
int prog(struct pt_regs *ctx, unsigned long vara) {
// vara is the value of localvara of function func_name
return 0;
}

I'd also like to discuss with you and others:

1. How should eBPF output its tracing and aggregation results to perf?

Thanks!

> We're also working in parallel on creating a new tracing language
> that together with llvm backend can be used as a single shared library
> that can be called from perf or anything else.
> Then clang compilation step will be gone and programs can be run
> as 'perf bpf file.bpf'.
>
> Thanks!
>

2015-05-05 03:02:15

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/2/15 12:19 AM, Wang Nan wrote:
>
> I'd like to do following works in the next version (based on my experience and feedbacks):
>
> 1. Safely clean up kprobe points after unloading;
>
> 2. Add subcommand space to 'perf bpf'. Current staff should be reside in 'perf bpf load';
>
> 3. Extract eBPF ELF walking and collecting work to a separated library to help others.

That's a good list.

Feedback on the existing patches:
patch 18 - since we're creating a generic library for bpf elf
loading, it would be great to do the following:
first try to load with
attr.log_buf = NULL;
attr.log_level = 0;
then, only if that fails, allocate a buffer and repeat with log_level = 1.
The reason is that it's better to have fast program loading by default,
without any verbosity emitted by the verifier.
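A sketch of the quiet-first loading scheme described above. To keep it runnable without root, the actual bpf(BPF_PROG_LOAD, ...) call is hidden behind a hypothetical loader callback; a real one would fill log_buf, log_size and log_level in union bpf_attr. The stub loaders exist only to exercise the logic, and the 64 KiB log size is an arbitrary assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical loader: returns a program fd (>= 0) or a negative error. */
typedef int (*prog_loader_t)(void *prog, char *log_buf, size_t log_size,
			     unsigned int log_level);

#define VERIFIER_LOG_SIZE 65536

/* Fast path first; only on failure retry with the verifier log enabled.
 * On failure *log_out holds the verifier output (caller frees it). */
static int load_prog_quiet_first(prog_loader_t loader, void *prog,
				 char **log_out)
{
	char *log;
	int fd;

	*log_out = NULL;
	fd = loader(prog, NULL, 0, 0);	/* log_buf = NULL, log_level = 0 */
	if (fd >= 0)
		return fd;

	log = malloc(VERIFIER_LOG_SIZE);
	if (!log)
		return fd;		/* keep the original error */
	log[0] = '\0';
	fd = loader(prog, log, VERIFIER_LOG_SIZE, 1);
	if (fd < 0)
		*log_out = log;
	else
		free(log);
	return fd;
}

/* Stub loaders for illustration only. */
static int stub_calls;

static int stub_accept(void *prog, char *log_buf, size_t log_size,
		       unsigned int log_level)
{
	(void)prog; (void)log_buf; (void)log_size; (void)log_level;
	stub_calls++;
	return 3;			/* pretend program fd */
}

static int stub_reject(void *prog, char *log_buf, size_t log_size,
		       unsigned int log_level)
{
	(void)prog;
	stub_calls++;
	if (log_level && log_buf && log_size)
		snprintf(log_buf, log_size, "stub verifier: program rejected\n");
	return -EINVAL;			/* pretend verifier rejection */
}
```

A successful load costs exactly one call and no log allocation; only failing programs pay for the verbose second pass.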

patch 19 - I think it's unnecessary.
The verifier already dumps the instructions, so this '-v' flag can be
translated into verbose loading.
There is also .s output from llvm for those interested in bpf asm
instructions.

> My collage He Kuang is working on variable accessing. Probing inside function body
> and accessing its local variable will be supported like this:
>
> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
> int prog(struct pt_regs *ctx, unsigned long vara) {
> // vara is the value of localvara of function func_name
> }

That would be great. I'm not sure, though, how you can achieve that
without changing the C front-end.
This type of feature is exactly the reason why we're trying to write
our own front-end.
In general there are two ways to achieve a 'restricted C' language:
- start from clang and chop out all features that are not supported.
I believe Jovi already tried to do that and it became very difficult.
- start from a simple front-end with minimal C and add features one by
one. That's what we're trying to do. So far we have most of the normal
syntax. The problem with our approach is that we cannot easily
#include existing .h files. We're working on that.
It's still too experimental. We may drop it and go back to the
first approach.

The reason for extending the front-end is your example above, where
the user would want to write:
int prog(struct pt_regs *ctx, unsigned long vara) {
// use 'vara'
but the generated BPF should have only one 'ctx' pointer, since that's
the only thing that the verifier will accept. bpf/core and JITs expect
only one argument, etc.
So this func definition + 'vara' access can be compiled as ctx->si
(if vara is actually in a register) or
bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
(if vara is on the stack),
or it can also be done via store_trace_args(), but that will be slower
and requires hacking the kernel, whereas the ctx->... style is pure userspace.
Lots of things to brainstorm, so please share your progress soon.

> And I want to discuss with you and others about:
>
> 1. How to make eBPF output its tracing and aggregation results to perf?

Well, the output of a bpf program is data stored in maps. Each program
needs a corresponding user space reader/printer/sorter for this data.
For example, tracex2 prints this data as a histogram and tracex3 prints it
as a heatmap. We can standardize a few things like this, but ideally we
leave it up to the user, so that the user can write a single file consisting
of functions that are loaded as bpf into the kernel and other functions
that are executed in user space. llvm could JIT the first set to bpf and the
second set to x86. That's the distant future, though.
So far the samples/bpf/ style of kern.c+user.c has worked quite well.
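As an illustration of the user-space "printer" half, here is a sketch of a log2 histogram printer in the spirit of tracex2. It assumes the per-bucket counts have already been read out of the map; that step is left out, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_SLOTS 32

/* Power-of-two bucket for a value; 0 and 1 both land in slot 0. */
static unsigned int log2_slot(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Print counts[] as a text histogram, one row per bucket up to the
 * last non-empty one. Returns the number of rows printed. */
static int print_log2_hist(FILE *out, const uint64_t *counts,
			   unsigned int nslots)
{
	static const char stars[] = "****************************************";
	uint64_t max = 0;
	unsigned int i, last = 0;
	int rows = 0;

	for (i = 0; i < nslots; i++) {
		if (counts[i] > max)
			max = counts[i];
		if (counts[i])
			last = i;
	}
	if (!max)
		return 0;

	for (i = 0; i <= last; i++) {
		int width = (int)(counts[i] * 40 / max);

		fprintf(out, "%10llu -> %-10llu : %-8llu |%-40.*s|\n",
			(unsigned long long)(i ? 1ULL << i : 0ULL),
			(unsigned long long)((2ULL << i) - 1),
			(unsigned long long)counts[i], width, stars);
		rows++;
	}
	return rows;
}
```

The kernel side would only bump counts[log2_slot(value)]; all formatting stays in user space, which matches the kern.c+user.c split.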

2015-05-05 04:42:31

by Wang Nan

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 2015/5/5 11:02, Alexei Starovoitov wrote:
> On 5/2/15 12:19 AM, Wang Nan wrote:
>>
>> I'd like to do following works in the next version (based on my experience and feedbacks):
>>
>> 1. Safely clean up kprobe points after unloading;
>>
>> 2. Add subcommand space to 'perf bpf'. Current staff should be reside in 'perf bpf load';
>>
>> 3. Extract eBPF ELF walking and collecting work to a separated library to help others.
>
> that's a good list.
>
> The feedback for existing patches:
> patch 18 - since we're creating a generic library for bpf elf
> loading it would great to do the following:
> first try to load with
> attr.log_buf = NULL;
> attr.log_level = 0;
> then only if it fails, allocate a buffer and repeat with log_level = 1.
> The reason is that it's better to have fast program loading by default
> without any verbosity emitted by verifier.
>

Will do.

> patch 19 - I think it's unnecessary.
> verifier already dumps it. so this '-v' flag can be translated into
> verbose loading.
> There is also .s output from llvm for those interested in bpf asm
> instructions.
>

That's great. Could you please add a description of 'llvm -s' to your README
or comments? I spent a lot of time figuring out how to dump eBPF instructions, so I
decided to add it to perf...

>> My collage He Kuang is working on variable accessing. Probing inside function body
>> and accessing its local variable will be supported like this:
>>
>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>> int prog(struct pt_regs *ctx, unsigned long vara) {
>> // vara is the value of localvara of function func_name
>> }
>
> that would be great. I'm not sure though how you can achieve that
> without changing C front-end ?

It's not very difficult. He is trying to generate the loader of vara
as a prologue, then concatenate the prologue and the main eBPF program.
From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
prologue fetches the value of vara and puts it into the proper register,
then the main program runs.
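A sketch of the prologue idea, under stated assumptions: `struct insn` mirrors the layout of `struct bpf_insn`, the opcode constants are spelled out by hand from the eBPF encoding, and `ctx_off` is a hypothetical offset of the register holding vara inside pt_regs. It only covers the case where vara lives in a register; a variable on the stack would need a bpf_probe_read() call in the prologue instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as struct bpf_insn, redeclared to avoid kernel headers. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define OP_LDX_MEM_DW 0x79	/* BPF_LDX|BPF_MEM|BPF_DW: dst = *(u64 *)(src + off) */
#define OP_MOV64_IMM  0xb7	/* BPF_ALU64|BPF_MOV|BPF_K: dst = imm */
#define OP_EXIT       0x95	/* BPF_JMP|BPF_EXIT */

/*
 * Return a new array "prologue + main". The prologue copies a u64 from
 * ctx (r1) into r2 and leaves r1 untouched, so the main program still
 * sees ctx in r1. BPF jump offsets are relative, so prepending does not
 * break branches inside the main program. Caller frees the result.
 */
static struct insn *prepend_prologue(const struct insn *main_prog,
				     size_t main_cnt, int16_t ctx_off,
				     size_t *out_cnt)
{
	const struct insn prologue[] = {
		{ .code = OP_LDX_MEM_DW, .dst_reg = 2, .src_reg = 1,
		  .off = ctx_off },
	};
	size_t pcnt = sizeof(prologue) / sizeof(prologue[0]);
	struct insn *out = malloc((pcnt + main_cnt) * sizeof(*out));

	if (!out)
		return NULL;
	memcpy(out, prologue, sizeof(prologue));
	memcpy(out + pcnt, main_prog, main_cnt * sizeof(*out));
	*out_cnt = pcnt + main_cnt;
	return out;
}
```

The combined array is what would be handed to BPF_PROG_LOAD, so the verifier only ever sees a one-argument program.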

Another possible solution is to change the protocol between kprobes and eBPF
programs: kprobes would call the fetchers and pass the results to the eBPF
program as a second param (grouping all varx together).
A prologue may still be needed in this case to load each param into the
correct register.

> This type of feature is exactly the reason why we're trying to write
> our front-end.
> In general there are two ways to achieve 'restricted C' language:
> - start from clang and chop all features that are not supported.
> I believe Jovi already tried to do that and it became very difficult.
> - start from simple front-end with minimal C and add all things one by
> one. That's what we're trying to do. So far we have most of normal
> syntax. The problem with our approach is that we cannot easily do
> #include of existing .h files. We're working on that.
> It's too experimental still. May be will be drop it and go back to
> first approach.
>
> The reason for extending front-end is your example above, where
> the user would want to write:
> int prog(struct pt_regs *ctx, unsigned long vara) {
> // use 'vara'
> but generated BPF should have only one 'ctx' pointer, since that's
> the only thing that verifier will accept. bpf/core and JITs expect
> only one argument, etc.
> So this func definition + 'vara' access can be compiled as ctx->si
> (if vara is actually in register) or
> bpf_probe_read(ctx->bp + magic_offset_from_debug_info)
> (if vara is on stack)
> or it can also be done via store_trace_args() but that will be slower
> and requires hacking kernel, whereas ctx->... style is pure userspace.
> Lot's of things to brainstorm. So please share your progress soon.
>
>> And I want to discuss with you and others about:
>>
>> 1. How to make eBPF output its tracing and aggregation results to perf?
>
> well, the output of bpf program is a data stored in maps. Each program
> needs a corresponding user space reader/printer/sorter of this data.
> Like tracex2 prints this data as histogram and tracex3 prints it as
> heatmap. We can standardize few things like this, but ideally we
> keep it up to user. So that user can write single file that consists
> of functions that are loaded as bpf into kernel and other functions
> that are executed in user space. llvm can jit first set to bpf and
> second set to x86. That's distant future though.
> So far samples/bpf/ style of kern.c+user.c worked quite well.
>

Well, it looks like in your design BPF programs are mainly used to produce
aggregation results. On my side, I want them to also act as trace filters.

Could you please consider the following problem?

We found that several __lock_page() calls last a very long time. We want
to find the corresponding __unlock_page() calls so we can see what blocks them. We want to
insert an eBPF program before io_schedule() in __lock_page(), and another eBPF program
at the entry of __unlock_page(), so we can compute the interval between page locking and
unlocking. If the interval exceeds a threshold, __unlock_page() triggers a perf sample
so we get its call stack. In this case, the eBPF program acts as a trace filter.
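The filter logic described above can be modelled in plain userspace C. In a real eBPF version the pending table would be a hash map keyed by the page pointer and the timestamps would come from bpf_ktime_get_ns(); a nonzero return from the kprobe program is what lets the event through to perf. Everything here, including the fixed-size table and the function names, is a hypothetical stand-in.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 128

/* One pending wait: page pointer (0 = free slot) and start time. */
struct pending_lock {
	uintptr_t page;
	uint64_t start_ns;
};

static struct pending_lock pending[SLOTS];

/* Models the program placed before io_schedule() in __lock_page(). */
static void on_lock_wait(uintptr_t page, uint64_t now_ns)
{
	size_t i, free_slot = SLOTS;

	for (i = 0; i < SLOTS; i++) {
		if (pending[i].page == page) {
			pending[i].start_ns = now_ns;
			return;
		}
		if (!pending[i].page && free_slot == SLOTS)
			free_slot = i;
	}
	if (free_slot < SLOTS) {
		pending[free_slot].page = page;
		pending[free_slot].start_ns = now_ns;
	}
	/* else: table full, the event is dropped */
}

/* Models the program at __unlock_page() entry: returns 1 (emit a
 * sample) if the unlock came more than threshold_ns after the wait. */
static int on_unlock(uintptr_t page, uint64_t now_ns, uint64_t threshold_ns)
{
	size_t i;

	for (i = 0; i < SLOTS; i++) {
		if (pending[i].page == page) {
			uint64_t delta = now_ns - pending[i].start_ns;

			pending[i].page = 0;	/* consume the entry */
			return delta > threshold_ns;
		}
	}
	return 0;			/* no matching wait recorded */
}
```

Only unlocks that return 1 would produce a sample, so the recorded call stacks are exactly the slow lock/unlock pairs.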

Thank you.


2015-05-05 05:49:07

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/4/15 9:41 PM, Wang Nan wrote:
>
> That's great. Could you please append the description of 'llvm -s' into your README
> or comments? It has cost me a lot of time for dumping eBPF instructions so I decide to
> add it into perf...

Sure. It's just the -filetype=asm flag to llc instead of -filetype=obj.
Eventually it will work like a normal 'clang -S file.c' once a few more
llvm commits are accepted upstream.

>>> My collage He Kuang is working on variable accessing. Probing inside function body
>>> and accessing its local variable will be supported like this:
>>>
>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara"
>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>> // vara is the value of localvara of function func_name
>>> }
>>
>> that would be great. I'm not sure though how you can achieve that
>> without changing C front-end ?
>
> It's not very difficult. He is trying to generate the loader of vara
> as prologue, then paste the prologue and the main eBPF program together.
> From the viewpoint of kernel bpf verifier, there is only one param (ctx); the
> prologue program fetches the value of vara then put it into a propoer register,
> then main program work.

Got it. I think that's much cleaner than what I was proposing.
The only question then is:
char _prog_config[] = "prog: func_name:1234 vara=localvara"
should actually be something like "... r2=localvara", right?
Since the prologue would need to assign into r2.
Otherwise I don't see how you would find 'vara' inside the
compiled bpf code.

It would be nice if this could be done without debug info.
For example, in tracex2_kern.c I have:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *ctx)
{
long wr_size = ctx->dx; /* arg3 */

With your prologue generator, the above could be rewritten as:
SEC("kprobe/sys_write")
int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
{
/* use wr_size */

That would improve ease of use a lot.
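Following the prologue approach discussed in this thread, a generator for the rewritten sys_write example could copy the first function arguments out of pt_regs into r2, r3, r4, ... while r1 keeps ctx. On x86-64 the first arguments arrive in di, si, dx, cx, and eBPF passes at most five arguments in r1-r5, which caps this at four extra arguments. `struct fake_pt_regs` is a simplified stand-in, not the real pt_regs layout, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct bpf_insn, redeclared to avoid kernel headers. */
struct insn {
	uint8_t code;
	uint8_t dst_reg:4;
	uint8_t src_reg:4;
	int16_t off;
	int32_t imm;
};

#define OP_LDX_MEM_DW 0x79	/* BPF_LDX|BPF_MEM|BPF_DW */

/* Simplified stand-in: argument registers only, NOT the real layout. */
struct fake_pt_regs {
	uint64_t di, si, dx, cx, r8, r9;	/* x86-64 argument order */
};

static const size_t arg_offsets[] = {
	offsetof(struct fake_pt_regs, di),
	offsetof(struct fake_pt_regs, si),
	offsetof(struct fake_pt_regs, dx),
	offsetof(struct fake_pt_regs, cx),
};

/* Emit "r(2+i) = *(u64 *)(r1 + offset of arg i)" for each argument.
 * Returns the number of instructions written, or -1. */
static int gen_arg_prologue(struct insn *out, size_t out_cap,
			    unsigned int nargs)
{
	unsigned int i;

	if (nargs > 4 || nargs > out_cap)
		return -1;
	for (i = 0; i < nargs; i++) {
		struct insn in = {
			.code = OP_LDX_MEM_DW,
			.dst_reg = 2 + i,
			.src_reg = 1,
			.off = (int16_t)arg_offsets[i],
		};
		out[i] = in;
	}
	return (int)nargs;
}
```

For the sys_write example, nargs = 3 would place fd in r2, buf in r3 and wr_size in r4, matching the parameter order of the rewritten bpf_prog().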

> Another possible solution is to change the protocol between kprobe and eBPF
> program, makes kprobes calls fetchers and passes them to eBPF program as
> a second param (group all varx together).
> A prologue may still need in this case to load each param into correct
> register.

You mean grouping the varx values in some other struct and embedding it
together with pt_regs in a new container struct?
Doable, but your first approach is already quite clean. Why bother?

> Could you please consider the following problem?
>
> We found that several __lock_page() calls last a very long time. We are trying
> to find the corresponding __unlock_page() calls so we know what blocks them. We want to
> insert an eBPF program before io_schedule() in __lock_page(), and also add an eBPF program
> at the entry of __unlock_page(), so we can compute the interval between page locking and
> unlocking. If the interval is longer than a threshold, let __unlock_page() trigger a perf sample
> so we get its call stack. In this case, the eBPF program acts as a trace filter.

all makes sense and your use case fits quite well into existing
bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
A problem of how to display that call stack from perf?
I would say it fits better as a sample than a trace.
If you dump it as a trace, it won't be easy to decipher, whereas if you
treat it as a sampling event, perf record/report facility will pick it up
and display it nicely. Meaning that one sample == lock_page/unlock_page
latency > N. Then existing sample_callchain flag should work.

2015-05-05 06:15:15

by Wang Nan

Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 2015/5/5 13:49, Alexei Starovoitov wrote:
> On 5/4/15 9:41 PM, Wang Nan wrote:
>>
>> That's great. Could you please add the description of 'llvm -s' to your README
>> or comments? It cost me a lot of time to dump eBPF instructions, so I decided to
>> add it into perf...
>
> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
> Eventually it will work as normal 'clang -S file.c' when a few more
> llvm commits are accepted upstream.
>
>>>> My colleague He Kuang is working on variable access. Probing inside a function body
>>>> and accessing its local variables will be supported like this:
>>>>
>>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>> // vara is the value of localvara of function func_name
>>>> }
>>>
>>> that would be great. I'm not sure though how you can achieve that
>>> without changing the C front-end?
>>
>> It's not very difficult. He is trying to generate a loader for vara
>> as a prologue, then paste the prologue and the main eBPF program together.
>> From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
>> prologue fetches the value of vara, puts it into the proper register,
>> and then the main program runs.
>
> got it. I think that's much cleaner than what I was proposing.
> The only question is then:
> char _prog_config[] = "prog: func_name:1234 vara=localvara"
> should actually be something like "... r2=localvara", right?
> since the prologue would need to assign into r2.
> Otherwise I don't see where you find out about 'vara' inside
> compiled bpf code.
>

I think the calling convention tells us which var should go into which
register. In the case of

SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }

llvm compiles 'prog' according to the calling convention, so the body of that
program assumes vara is in r2 and varb in r3. The prologue also puts the vars into
r2 and r3 according to the calling convention. Therefore, after pasting them together, the final
program should run properly. There is no need to describe register numbers explicitly.
What do you think?
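A minimal C sketch of the prologue-pasting idea, assuming each requested variable is held in a register saved in struct pt_regs at the probe point, so a single 64-bit load from ctx at a known offset fetches it (stack-resident variables would need a different fetch sequence). The struct bpf_insn layout and opcode values follow the uapi eBPF headers; gen_prologue(), paste_progs() and any offsets passed to them are hypothetical, not part of the patchset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as struct bpf_insn in include/uapi/linux/bpf.h. */
struct bpf_insn {
	uint8_t code;		/* opcode */
	uint8_t dst_reg:4;	/* destination register */
	uint8_t src_reg:4;	/* source register */
	int16_t off;		/* signed offset */
	int32_t imm;		/* signed immediate constant */
};

/* eBPF opcode fields and registers used below. */
#define BPF_LDX		0x01	/* instruction class: load into register */
#define BPF_DW		0x18	/* size: 64-bit double word */
#define BPF_MEM		0x60	/* mode: memory at src_reg + off */
#define BPF_REG_1	1	/* holds ctx (struct pt_regs *) on entry */
#define BPF_REG_2	2	/* first argument register after ctx */
#define MAX_EXTRA_ARGS	4	/* r2..r5 carry the extra arguments */

/*
 * Hypothetical prologue generator. For variable i it emits
 *     r(2+i) = *(u64 *)(r1 + ctx_off[i])
 * so vara lands in r2, varb in r3, ..., matching how the compiled
 * prog(ctx, vara, varb, ...) expects its arguments. r1 is left intact.
 */
static int gen_prologue(const int16_t *ctx_off, int nr_vars,
			struct bpf_insn *out)
{
	int i;

	assert(nr_vars >= 0 && nr_vars <= MAX_EXTRA_ARGS);
	for (i = 0; i < nr_vars; i++) {
		memset(&out[i], 0, sizeof(out[i]));
		out[i].code = BPF_LDX | BPF_MEM | BPF_DW;
		out[i].dst_reg = BPF_REG_2 + i;
		out[i].src_reg = BPF_REG_1;
		out[i].off = ctx_off[i];
	}
	return nr_vars;
}

/*
 * Place the prologue in front of the compiled program. eBPF jump offsets
 * are relative to the next instruction, so branches inside the main
 * program remain valid after being shifted.
 */
static int paste_progs(const struct bpf_insn *pro, int pro_len,
		       const struct bpf_insn *prog, int prog_len,
		       struct bpf_insn *out)
{
	memcpy(out, pro, (size_t)pro_len * sizeof(*pro));
	memcpy(out + pro_len, prog, (size_t)prog_len * sizeof(*prog));
	return pro_len + prog_len;
}
```

Because r1 is never overwritten, the pasted body still sees ctx in r1, while vara and varb arrive in r2 and r3, where the compiled prog() expects them.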


> Would be nice if this can be done without debug info.
> Like in tracex2_kern.c I have:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *ctx)
> {
> long wr_size = ctx->dx; /* arg3 */
>
> with your prolog generator the above can be rewritten as:
> SEC("kprobe/sys_write")
> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
> {
> /* use wr_size */
>
> that will improve ease of use a lot.
>

It is possible when probing at the entry of a function. However, when probing inside the
function body, we still need a way to pass the list of variables required by the
program to perf, so that it can generate the correct prologue. We'd like to implement
the generic approach (listing vars in the config string) first, then add function
parameter access as syntactic sugar.
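For the function-entry case, the variable list can come straight from the function's parameters. A sketch assuming x86_64, where the first six integer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9 (hence ctx->dx for arg3 in tracex2_kern.c). The struct reproduces the assumed field order of the kernel's x86_64 struct pt_regs and should be checked against arch/x86/include/asm/ptrace.h; arg_to_ctx_off() is a hypothetical helper whose result could feed a prologue generator.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed field order of x86_64 struct pt_regs (verify against the
 * kernel's arch/x86/include/asm/ptrace.h before relying on it). */
struct x86_64_pt_regs {
	unsigned long r15, r14, r13, r12, bp, bx;
	unsigned long r11, r10, r9, r8, ax, cx, dx, si, di;
	unsigned long orig_ax, ip, cs, flags, sp, ss;
};

/*
 * Hypothetical helper: offset inside pt_regs of the n-th (0-based)
 * integer argument at function entry, following the x86_64 calling
 * convention. Returns -1 if the argument is not passed in a register.
 */
static int arg_to_ctx_off(int n)
{
	static const size_t offs[] = {
		offsetof(struct x86_64_pt_regs, di),	/* arg1 */
		offsetof(struct x86_64_pt_regs, si),	/* arg2 */
		offsetof(struct x86_64_pt_regs, dx),	/* arg3 */
		offsetof(struct x86_64_pt_regs, cx),	/* arg4 */
		offsetof(struct x86_64_pt_regs, r8),	/* arg5 */
		offsetof(struct x86_64_pt_regs, r9),	/* arg6 */
	};

	if (n < 0 || n >= (int)(sizeof(offs) / sizeof(offs[0])))
		return -1;
	return (int)offs[n];
}
```

For sys_write(fd, buf, count), arg_to_ctx_off(2) yields the offset of dx, the same field tracex2_kern.c reads by hand.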

>> Another possible solution is to change the protocol between kprobe and the eBPF
>> program: make kprobes call the fetchers and pass the results to the eBPF program as
>> a second param (grouping all varx together).
>> A prologue may still be needed in this case to load each param into the correct
>> register.
>
> you mean grouping varx together in some other struct and embedding it
> together with pt_regs into new container struct?
> doable, but your first approach is quite clean already. why bother.
>

The second approach lets us reuse the fetcher code that is already in the
kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
of PMU counters), we support them automatically.
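To make the second approach concrete, here is a purely hypothetical context layout (nothing like it exists in the kernel or in this patchset): the kprobe handler would run the existing fetchers and pass the program a container holding both the saved registers and the fetched values, in config-string order. kprobe_bpf_ctx and ctx_arg() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

struct pt_regs;		/* opaque here; only passed through */

#define MAX_FETCHED 8

/* Hypothetical container the kprobe handler would build. */
struct kprobe_bpf_ctx {
	struct pt_regs *regs;			/* original register context */
	unsigned int nr_args;			/* how many fetchers ran */
	unsigned long args[MAX_FETCHED];	/* fetched values, config order */
};

/* Read the i-th configured variable (vara for i == 0, varb for i == 1);
 * returns 0 for an index that no fetcher filled. */
static unsigned long ctx_arg(const struct kprobe_bpf_ctx *ctx, unsigned int i)
{
	return i < ctx->nr_args ? ctx->args[i] : 0;
}
```

A program could then read vara as ctx_arg(ctx, 0); a prologue would only be needed to move these slots into r2..r5 if prog() still declared them as parameters.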

>> Could you please consider the following problem?
>>
>> We found that several __lock_page() calls last a very long time. We are trying
>> to find the corresponding __unlock_page() calls so we know what blocks them. We want to
>> insert an eBPF program before io_schedule() in __lock_page(), and also add an eBPF program
>> at the entry of __unlock_page(), so we can compute the interval between page locking and
>> unlocking. If the interval is longer than a threshold, let __unlock_page() trigger a perf sample
>> so we get its call stack. In this case, the eBPF program acts as a trace filter.
>
> all makes sense and your use case fits quite well into existing
> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
> A problem of how to display that call stack from perf?
> I would say it fits better as a sample than a trace.
> If you dump it as a trace, it won't be easy to decipher, whereas if you
> treat it as a sampling event, perf record/report facility will pick it up
> and display it nicely. Meaning that one sample == lock_page/unlock_page
> latency > N. Then existing sample_callchain flag should work.
>

Sounds good. Do we have an eBPF function like

static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = (void *) BPF_FUNC_perf_sample;

so we can use it in the program probed in the body of __unlock_page() like this:

...
if (latency > 0.5s)
bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
...

Thank you.

Subject: Re: [RFC PATCH 01/22] perf: probe: avoid segfault if passed with ''.

On 2015/04/30 19:52, Wang Nan wrote:
> Since parse_perf_probe_point() deals with a user-passed argument, we
> should not assume it to be a valid string.
>
> Without this patch, passing '' to perf probe causes a segfault:
>
> $ perf probe -a ''
> Segmentation fault
>
> This patch checks the argument of parse_perf_probe_point() before
> string processing.
>
> After this patch:
>
> $ perf probe -a ''
>
> usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
> or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
> ...

This looks OK to me.

Acked-by: Masami Hiramatsu <[email protected]>

Could you split this out as an independent bugfix with my ack?

Thank you,

>
> Signed-off-by: Wang Nan <[email protected]>
> ---
> tools/perf/util/probe-event.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
> index d8bb616..d05b77c 100644
> --- a/tools/perf/util/probe-event.c
> +++ b/tools/perf/util/probe-event.c
> @@ -1084,6 +1084,8 @@ static int parse_perf_probe_point(char *arg, struct perf_probe_event *pev)
> *
> * TODO:Group name support
> */
> + if (!arg)
> + return -EINVAL;
>
> ptr = strpbrk(arg, ";=@+%");
> if (ptr && *ptr == '=') { /* Event name */
>
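The failure mode the patch guards against can be shown in isolation. This is a simplified, hypothetical stand-in for parse_perf_probe_point() that keeps only the new NULL check and the first strpbrk() scan; it assumes, as the commit message implies, that an empty probe definition reaches the parser as a NULL pointer. Passing NULL to strpbrk() is undefined behavior and the likely source of the segfault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, trimmed-down parse step: reject a missing argument
 * before any string function touches it, then locate an optional
 * "EVENT=" prefix the same way the original code does.
 */
static int parse_probe_point_sketch(char *arg, char **event_sep)
{
	char *ptr;

	*event_sep = NULL;
	if (!arg)
		return -EINVAL;	/* strpbrk(NULL, ...) would be undefined */

	ptr = strpbrk(arg, ";=@+%");
	if (ptr && *ptr == '=')	/* "EVENT=FUNC..." form */
		*event_sep = ptr;
	return 0;
}
```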


--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: [email protected]

2015-05-05 16:10:15

by Arnaldo Carvalho de Melo

Subject: Re: [RFC PATCH 01/22] perf: probe: avoid segfault if passed with ''.

Em Tue, May 05, 2015 at 11:09:21PM +0900, Masami Hiramatsu escreveu:
> On 2015/04/30 19:52, Wang Nan wrote:
> > Since parse_perf_probe_point() deals with a user-passed argument, we
> > should not assume it to be a valid string.
> >
> > Without this patch, passing '' to perf probe causes a segfault:
> >
> > $ perf probe -a ''
> > Segmentation fault

> > This patch checks the argument of parse_perf_probe_point() before
> > string processing.

> > After this patch:

> > $ perf probe -a ''
> >
> > usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
> > or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
> > ...

> This looks OK to me.

> Acked-by: Masami Hiramatsu <[email protected]>

> Could you split this out as an independent bugfix with my ack?

You mean split from this patchkit? I already did that, added it to my
perf/urgent branch, where I'll add your acked-by as well,

- Arnaldo

2015-05-05 16:10:49

by Arnaldo Carvalho de Melo

Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

Em Fri, May 01, 2015 at 09:56:23AM -0700, Alexei Starovoitov escreveu:
> Anyway, back to my original question about long term home.
> where to land 'perf/bpf' branch ?

I don't care, but for me to merge it, please go on addressing the
comments made in this thread (perf bpf command --args, etc) and at some
point provide a small patchset that implements the most basic stuff,
like, say, a "hello, world" style proggie, together with the
tools/perf/Documentation/perf-bpf.txt file, detailed instructions on how
to use the feature, i.e. what dependencies are needed, what kernel
options should be enabled, etc.

Nice warning/error messages for when the user doesn't have those options
enabled or doesn't have appropriate permissions, etc.

I.e. just by following what is in each changeset comment log I should be
able to test patch after patch.

After we get one such, say, 10-long patchkit with a very basic feature
of eBPF exposed via 'perf bpf', we can go to the next, and so on.

Try to use 'perf trace usleep 1', 'perf trace -a usleep 1' as non-root,
for instance, to see examples on how to inform the user about what is
needed to use the tool.

- Arnaldo

Subject: Re: [RFC PATCH 01/22] perf: probe: avoid segfault if passed with ''.

On 2015/05/06 0:26, Arnaldo Carvalho de Melo wrote:
> Em Tue, May 05, 2015 at 11:09:21PM +0900, Masami Hiramatsu escreveu:
>> On 2015/04/30 19:52, Wang Nan wrote:
>>> Since parse_perf_probe_point() deals with a user-passed argument, we
>>> should not assume it to be a valid string.
>>>
>>> Without this patch, passing '' to perf probe causes a segfault:
>>>
>>> $ perf probe -a ''
>>> Segmentation fault
>
>>> This patch checks the argument of parse_perf_probe_point() before
>>> string processing.
>
>>> After this patch:
>
>>> $ perf probe -a ''
>>>
>>> usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
>>> or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
>>> ...
>
>> This looks OK to me.
>
>> Acked-by: Masami Hiramatsu <[email protected]>
>
>> Could you split this out as an independent bugfix with my ack?
>
> You mean split from this patchkit? I already did that, added it to my
> perf/urgent branch, where I'll add your acked-by as well,

Ah, you've done it! I missed it.

Thanks!

>
> - Arnaldo
>


--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: [email protected]

Subject: Re: [RFC PATCH 21/22] perf bpf: probe at kprobe points.

At least we need a description of what this patch does: what will be
done with this patch, what the user will see, and what they can do and how.

Thank you,

On 2015/04/30 19:52, Wang Nan wrote:
> ---
> tools/perf/util/bpf-loader.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> index 208f5e8..186a3d0 100644
> --- a/tools/perf/util/bpf-loader.c
> +++ b/tools/perf/util/bpf-loader.c
> @@ -952,8 +952,26 @@ out:
> return -1;
> }
>
> +static int bpf_probe(void)
> +{
> + int err = add_perf_probe_events(params.event_array,
> + params.nr_events,
> + MAX_PROBES, 0);
> + /* add_perf_probe_events return negative when fail */
> + if (err < 0)
> + pr_err("bpf probe: failed to probe events\n");
> +
> + return err < 0 ? err : 0;
> +}
> +
> int bpf__run(void)
> {
> + int err;
> +
> + pr_debug("bpf: probing\n");
> + if ((err = bpf_probe()))
> + return err;
> +
> pr_info("BPF is running. Use Ctrl-c to stop.\n");
> while(1)
> sleep(1);
>


--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: [email protected]

2015-05-05 21:52:28

by Brendan Gregg

Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On Thu, Apr 30, 2015 at 3:52 AM, Wang Nan <[email protected]> wrote:
[...]
> An example is pasted at the bottom of this cover letter. In that
> example, mybpfprog is configured by string in config section, and will
> be probed at __alloc_pages_nodemask. sample_bpf.o is generated using:
>
> $ $CLANG -I/usr/src/kernel/include -I/usr/src/kernel/usr/include -D__KERNEL__ \
> -Wno-unused-value -Wno-pointer-sign \
> -O2 -emit-llvm -c sample_bpf.c -o -| $LLC -march=bpf -filetype=obj -o \
> sample_bpf.o
>
> And can be loaded using:
>
> $ perf bpf sample_bpf.o
[...]
> -------- EXAMPLE --------
> ----- sample_bpf.c -----
> #include <uapi/linux/bpf.h>
> #include <linux/version.h>
> #include <uapi/linux/ptrace.h>
>
> #define SEC(NAME) __attribute__((section(NAME), used))
>
> static int (*bpf_map_delete_elem)(void *map, void *key) =
> (void *) BPF_FUNC_map_delete_elem;
> static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
> (void *) BPF_FUNC_trace_printk;
>
> struct bpf_map_def {
> unsigned int type;
> unsigned int key_size;
> unsigned int value_size;
> unsigned int max_entries;
> };
>
> struct pair {
> u64 val;
> u64 ip;
> };
>
> struct bpf_map_def SEC("maps") my_map = {
> .type = BPF_MAP_TYPE_HASH,
> .key_size = sizeof(long),
> .value_size = sizeof(struct pair),
> .max_entries = 1000000,
> };
>
> SEC("kprobe/kmem_cache_free")
> int bpf_prog1(struct pt_regs *ctx)
> {
> long ptr = ctx->r14;
> bpf_map_delete_elem(&my_map, &ptr);
> return 0;
> }
>
> SEC("mybpfprog")
> int bpf_prog_my(void *ctx)
> {
> char fmt[] = "Haha\n";
> bpf_trace_printk(fmt, sizeof(fmt));
> return 0;
> }
>
> char _license[] SEC("license") = "GPL";
> u32 _version SEC("version") = LINUX_VERSION_CODE;
> char _config[] SEC("config") = ""
> "mybpfprog=__alloc_pages_nodemask\n";
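In the example, the config section is a string of newline-separated lines, each mapping a program's section name to a probe point ('mybpfprog=__alloc_pages_nodemask'). A hypothetical sketch of splitting it, assuming exactly that 'name=definition' line format; the cover letter says the syntax matches 'perf probe', whose full probe grammar this sketch does not model. struct bpf_config_entry and parse_bpf_config() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct bpf_config_entry {
	const char *prog;	/* section name of the eBPF program */
	const char *probe;	/* probe point, e.g. a function name */
};

/*
 * Hypothetical parser: split "name=probe\n..." in place. Blank lines are
 * skipped. Returns the number of entries, or -1 if a line has no '=',
 * has an empty side, or there are more than max entries.
 */
static int parse_bpf_config(char *buf, struct bpf_config_entry *out, int max)
{
	int n = 0;
	char *line = buf;

	while (line) {
		char *next = strchr(line, '\n');
		char *eq;

		if (next)
			*next++ = '\0';		/* terminate this line */
		if (*line) {
			eq = strchr(line, '=');
			if (!eq || eq == line || eq[1] == '\0' || n >= max)
				return -1;
			*eq = '\0';
			out[n].prog = line;
			out[n].probe = eq + 1;
			n++;
		}
		line = next;
	}
	return n;
}
```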

Was this just some random eBPF code to test the perf framework? Or was
it to do something useful with
kmem_cache_free()/__alloc_pages_nodemask() tracing as well? It looks a
bit incomplete.

If it's just random code, I'd include a comment to state that,
otherwise it's a bit confusing. A complete example might be better;
eg, something like Alexei's tracex1, for a simple example of
bpf_trace_printk(), or sockex1, for a simple map example.

Brendan

2015-05-06 02:38:04

by Wang Nan

Subject: Re: [RFC PATCH 21/22] perf bpf: probe at kprobe points.

On 2015/5/6 0:34, Masami Hiramatsu wrote:
> At least we need a description of what this patch does: what will be
> done with this patch, what the user will see, and what they can do and how.
>
> Thank you,
>

Sorry. I manually checked the commit messages after 'git format-patch' but
missed this one. I'll fix it in the next version.

What this patch does is simply call 'add_perf_probe_events' (originally used by
perf probe) to create kprobe events. The previous patch puts all events into a
uniform array.

> On 2015/04/30 19:52, Wang Nan wrote:
>> ---
>> tools/perf/util/bpf-loader.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
>> index 208f5e8..186a3d0 100644
>> --- a/tools/perf/util/bpf-loader.c
>> +++ b/tools/perf/util/bpf-loader.c
>> @@ -952,8 +952,26 @@ out:
>> return -1;
>> }
>>
>> +static int bpf_probe(void)
>> +{
>> + int err = add_perf_probe_events(params.event_array,
>> + params.nr_events,
>> + MAX_PROBES, 0);
>> + /* add_perf_probe_events return negative when fail */
>> + if (err < 0)
>> + pr_err("bpf probe: failed to probe events\n");
>> +
>> + return err < 0 ? err : 0;
>> +}
>> +
>> int bpf__run(void)
>> {
>> + int err;
>> +
>> + pr_debug("bpf: probing\n");
>> + if ((err = bpf_probe()))
>> + return err;
>> +
>> pr_info("BPF is running. Use Ctrl-c to stop.\n");
>> while(1)
>> sleep(1);
>>
>
>

2015-05-06 04:47:11

by Wang Nan

Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

Hi Alexei Starovoitov,

Have you ever read this mail?

I'm very interested in triggering a perf sample from BPF code.
You said it is not a problem. Could you please give me some
further information?

Thank you.

On 2015/5/5 14:14, Wang Nan wrote:
> On 2015/5/5 13:49, Alexei Starovoitov wrote:
>> On 5/4/15 9:41 PM, Wang Nan wrote:
>>>
>>> That's great. Could you please add the description of 'llvm -s' to your README
>>> or comments? It cost me a lot of time to dump eBPF instructions, so I decided to
>>> add it into perf...
>>
>> sure. it's just -filetype=asm flag to llc instead of -filetype=obj.
>> Eventually it will work as normal 'clang -S file.c' when a few more
>> llvm commits are accepted upstream.
>>
>>>>> My colleague He Kuang is working on variable access. Probing inside a function body
>>>>> and accessing its local variables will be supported like this:
>>>>>
>>>>> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara";
>>>>> int prog(struct pt_regs *ctx, unsigned long vara) {
>>>>> // vara is the value of localvara of function func_name
>>>>> }
>>>>
>>>> that would be great. I'm not sure though how you can achieve that
>>>> without changing the C front-end?
>>>
>>> It's not very difficult. He is trying to generate a loader for vara
>>> as a prologue, then paste the prologue and the main eBPF program together.
>>> From the viewpoint of the kernel bpf verifier, there is only one param (ctx); the
>>> prologue fetches the value of vara, puts it into the proper register,
>>> and then the main program runs.
>>
>> got it. I think that's much cleaner than what I was proposing.
>> The only question is then:
>> char _prog_config[] = "prog: func_name:1234 vara=localvara"
>> should actually be something like "... r2=localvara", right?
>> since the prologue would need to assign into r2.
>> Otherwise I don't see where you find out about 'vara' inside
>> compiled bpf code.
>>
>
> I think the calling convention tells us which var should go into which
> register. In the case of
>
> SEC("config") char _prog_config[] = "prog: func_name:1234 vara=localvara varb=globalvarb";
> int prog(struct pt_regs *ctx, unsigned long vara, unsigned long varb) { ... }
>
> llvm compiles 'prog' according to the calling convention, so the body of that
> program assumes vara is in r2 and varb in r3. The prologue also puts the vars into
> r2 and r3 according to the calling convention. Therefore, after pasting them together, the final
> program should run properly. There is no need to describe register numbers explicitly.
> What do you think?
>
>
>> Would be nice if this can be done without debug info.
>> Like in tracex2_kern.c I have:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *ctx)
>> {
>> long wr_size = ctx->dx; /* arg3 */
>>
>> with your prolog generator the above can be rewritten as:
>> SEC("kprobe/sys_write")
>> int bpf_prog(struct pt_regs *unused, int fd, char *buf, size_t wr_size)
>> {
>> /* use wr_size */
>>
>> that will improve ease of use a lot.
>>
>
> It is possible when probing at the entry of a function. However, when probing inside the
> function body, we still need a way to pass the list of variables required by the
> program to perf, so that it can generate the correct prologue. We'd like to implement
> the generic approach (listing vars in the config string) first, then add function
> parameter access as syntactic sugar.
>
>>> Another possible solution is to change the protocol between kprobe and the eBPF
>>> program: make kprobes call the fetchers and pass the results to the eBPF program as
>>> a second param (grouping all varx together).
>>> A prologue may still be needed in this case to load each param into the correct
>>> register.
>>
>> you mean grouping varx together in some other struct and embedding it
>> together with pt_regs into new container struct?
>> doable, but your first approach is quite clean already. why bother.
>>
>
> The second approach lets us reuse the fetcher code that is already in the
> kernel. Furthermore, if new types of fetchers appear (for example, a fetcher
> of PMU counters), we support them automatically.
>
>>> Could you please consider the following problem?
>>>
>>> We found that several __lock_page() calls last a very long time. We are trying
>>> to find the corresponding __unlock_page() calls so we know what blocks them. We want to
>>> insert an eBPF program before io_schedule() in __lock_page(), and also add an eBPF program
>>> at the entry of __unlock_page(), so we can compute the interval between page locking and
>>> unlocking. If the interval is longer than a threshold, let __unlock_page() trigger a perf sample
>>> so we get its call stack. In this case, the eBPF program acts as a trace filter.
>>
>> all makes sense and your use case fits quite well into existing
>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>> A problem of how to display that call stack from perf?
>> I would say it fits better as a sample than a trace.
>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>> treat it as a sampling event, perf record/report facility will pick it up
>> and display it nicely. Meaning that one sample == lock_page/unlock_page
>> latency > N. Then existing sample_callchain flag should work.
>>
>
> Sounds good. Do we have an eBPF function like
>
> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = (void *) BPF_FUNC_perf_sample;
>
> so we can use it in the program probed in the body of __unlock_page() like this:
>
> ...
> if (latency > 0.5s)
> bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
> ...
>
> Thank you.
>
>

2015-05-06 04:57:04

by Alexei Starovoitov

Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 5/5/15 9:46 PM, Wang Nan wrote:
> Hi Alexei Starovoitov,
>
> Have you ever read this mail?

please don't top post.

>>> all makes sense and your use case fits quite well into existing
>>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>>> A problem of how to display that call stack from perf?
>>> I would say it fits better as a sample than a trace.
>>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>>> treat it as a sampling event, perf record/report facility will pick it up
>>> and display it nicely. Meaning that one sample == lock_page/unlock_page
>>> latency > N. Then existing sample_callchain flag should work.
>>>
>>
>> Sounds good. Do we have an eBPF function like
>>
>> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = (void *) BPF_FUNC_perf_sample;
>>
>> so we can use it in the program probed in the body of __unlock_page() like this:
>>
>> ...
>> if (latency > 0.5s)
>> bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);

No need for an extra helper. The program's return value already
serves this purpose.
From kernel/trace/bpf_trace.c:
* Return: BPF programs always return an integer which is interpreted by
* kprobe handler as:
* 0 - return from kprobe (event is filtered out)
* 1 - store kprobe event into ring buffer

in your case the program attached to unlock_page() can return 1
when it needs to store this event into the ring buffer, so that perf can
process it. If I'm not mistaken, the sample_callchain flag cannot be
applied to kprobe events, but that's a general problem (not
related to bpf) and can be addressed as such.
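A userspace simulation of the filter logic discussed here, under stated assumptions: a small direct-mapped table stands in for a BPF hash map keyed by page, timestamps are passed in rather than read with a kernel time helper, and the return values follow the bpf_trace.c convention quoted above (0 filters the event out, 1 stores it into the ring buffer). on_lock_page(), on_unlock_page() and the 0.5 s threshold are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS		64
#define LATENCY_THRESHOLD_NS	500000000ULL	/* 0.5 s, as in the example */

struct lock_slot {
	uintptr_t page;		/* 0 means "slot unused" */
	uint64_t start_ns;	/* when the waiter went to sleep */
};

static struct lock_slot slots[NR_SLOTS];

/* Index by 4K page frame number, modulo the table size. */
static struct lock_slot *slot_for(uintptr_t page)
{
	return &slots[(page >> 12) % NR_SLOTS];
}

/* Program placed before io_schedule() in __lock_page(): remember when
 * this page started blocking. A colliding page simply overwrites it. */
static int on_lock_page(uintptr_t page, uint64_t now_ns)
{
	struct lock_slot *s = slot_for(page);

	s->page = page;
	s->start_ns = now_ns;
	return 0;			/* never emit an event here */
}

/* Program on __unlock_page(): 1 = store the event (latency too high),
 * 0 = filter it out. */
static int on_unlock_page(uintptr_t page, uint64_t now_ns)
{
	struct lock_slot *s = slot_for(page);

	if (s->page != page)
		return 0;		/* nobody recorded a wait on this page */
	s->page = 0;
	return now_ns - s->start_ns > LATENCY_THRESHOLD_NS ? 1 : 0;
}
```

Only the slow unlocks return 1, so perf would see one stored event per lock_page/unlock_page pair whose latency exceeds the threshold.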

2015-05-06 05:00:40

by Wang Nan

Subject: Re: [RFC PATCH 00/22] perf tools: introduce 'perf bpf' command to load eBPF programs.

On 2015/5/6 12:56, Alexei Starovoitov wrote:
> On 5/5/15 9:46 PM, Wang Nan wrote:
>> Hi Alexei Starovoitov,
>>
>> Have you ever read this mail?
>
> please don't top post.
>
>>>> all makes sense and your use case fits quite well into existing
>>>> bpf+kprobe model. I'm not sure why you're calling it a 'problem'.
>>>> A problem of how to display that call stack from perf?
>>>> I would say it fits better as a sample than a trace.
>>>> If you dump it as a trace, it won't be easy to decipher, whereas if you
>>>> treat it as a sampling event, perf record/report facility will pick it up
>>>> and display it nicely. Meaning that one sample == lock_page/unlock_page
>>>> latency > N. Then existing sample_callchain flag should work.
>>>>
>>>
>>> Sounds good. Do we have an eBPF function like
>>>
>>> static int (*bpf_perf_sample)(const char *fmt, int fmt_size, ...) = (void *) BPF_FUNC_perf_sample;
>>>
>>> so we can use it in the program probed in the body of __unlock_page() like this:
>>>
>>> ...
>>> if (latency > 0.5s)
>>> bpf_perf_sample("page=%p, latency=%d", sizeof(...), page, latency);
>
> No need for an extra helper. The program's return value already
> serves this purpose.
> From kernel/trace/bpf_trace.c:
> * Return: BPF programs always return an integer which is interpreted by
> * kprobe handler as:
> * 0 - return from kprobe (event is filtered out)
> * 1 - store kprobe event into ring buffer
>
> in your case the program attached to unlock_page() can return 1
> when it needs to store this event into the ring buffer, so that perf can
> process it. If I'm not mistaken, the sample_callchain flag cannot be
> applied to kprobe events, but that's a general problem (not
> related to bpf) and can be addressed as such.
>

That's great! Thanks for your response!

2015-05-11 06:38:16

by Namhyung Kim

Subject: Re: [RFC PATCH 04/22] perf tools: Add new 'perf bpf' command.

Hi,

On Thu, Apr 30, 2015 at 10:52:27AM +0000, Wang Nan wrote:
> Adding new 'perf bpf' command to provide eBPF program loading and
> management support.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
> tools/perf/Build | 1 +
> tools/perf/Documentation/perf-bpf.txt | 18 ++++++++++
> tools/perf/builtin-bpf.c | 63 +++++++++++++++++++++++++++++++++++
> tools/perf/builtin.h | 1 +
> tools/perf/perf.c | 1 +
> tools/perf/util/Build | 1 +
> tools/perf/util/bpf-loader.c | 35 +++++++++++++++++++
> tools/perf/util/bpf-loader.h | 21 ++++++++++++
> 8 files changed, 141 insertions(+)
> create mode 100644 tools/perf/Documentation/perf-bpf.txt
> create mode 100644 tools/perf/builtin-bpf.c
> create mode 100644 tools/perf/util/bpf-loader.c
> create mode 100644 tools/perf/util/bpf-loader.h
>
> diff --git a/tools/perf/Build b/tools/perf/Build
> index b77370e..c69f0c1 100644
> --- a/tools/perf/Build
> +++ b/tools/perf/Build
> @@ -19,6 +19,7 @@ perf-y += builtin-kvm.o
> perf-y += builtin-inject.o
> perf-y += builtin-mem.o
> perf-y += builtin-data.o
> +perf-y += builtin-bpf.o
>
> perf-$(CONFIG_AUDIT) += builtin-trace.o
> perf-$(CONFIG_LIBELF) += builtin-probe.o
> diff --git a/tools/perf/Documentation/perf-bpf.txt b/tools/perf/Documentation/perf-bpf.txt
> new file mode 100644
> index 0000000..634d588
> --- /dev/null
> +++ b/tools/perf/Documentation/perf-bpf.txt
> @@ -0,0 +1,18 @@
> +perf-bpf(1)
> +==============
> +
> +NAME
> +----
> +perf-bpf - loads eBPF programs into kernel.
> +
> +SYNOPSIS
> +--------
> +[verse]
> +'perf bpf' [<common options>] <bpfprogram.o>
> +
> +DESCRIPTION
> +-----------
> +Loading eBPF programs into kernel.
> +
> +OPTIONS
> +-------
> diff --git a/tools/perf/builtin-bpf.c b/tools/perf/builtin-bpf.c
> new file mode 100644
> index 0000000..0fc7a82
> --- /dev/null
> +++ b/tools/perf/builtin-bpf.c
> @@ -0,0 +1,63 @@
> +/*
> + * buildin-bpf.c

s/buildin/builtin/

> + *
> + * Buildin bpf command: Load bpf and attach bpf programs onto kprobes.

ditto.

> + */
> +#include "builtin.h"
> +#include "perf.h"
> +#include "debug.h"
> +#include "parse-options.h"
> +#include "bpf-loader.h"
> +
> +static const char *bpf_usage[] = {
> + "perf bpf [<options>] <bpfobj>",
> + NULL
> +};
> +
> +static void print_usage(void)
> +{
> + printf("Usage:\n");
> + printf("\t%s\n\n", bpf_usage[0]);
> +}

Why not use usage_with_options() for this?

Thanks,
Namhyung


> +
> +struct option __bpf_options[] = {
> + OPT_INCR('v', "verbose", &verbose, "be more verbose"),
> + OPT_END()
> +};
> +
> +struct option *bpf_options = __bpf_options;
> +
> +int cmd_bpf(int argc, const char **argv,
> + const char *prefix __maybe_unused)
> +{
> + int err;
> + const char **pfn;
> +
> + if (argc < 2)
> + goto usage;
> +
> + argc = parse_options(argc, argv, bpf_options, bpf_usage,
> + PARSE_OPT_STOP_AT_NON_OPTION);
> + if (argc < 1)
> + goto usage;
> +
> + pfn = argv;
> + while (*pfn != NULL) {
> + const char *fn = *pfn++;
> +
> + err = bpf__load(fn);
> + if (err) {
> + pr_err("bpf: load bpf program from %s: result: %d\n",
> + fn, err);
> + break;
> + }
> + }
> +
> + if (!err)
> + bpf__run();
> + return err;
> +usage:
> + print_usage();
> + return -1;
> +}
> +

2015-05-11 06:42:16

by Namhyung Kim

Subject: Re: [RFC PATCH 09/22] perf bpf: collect map definitions.

On Thu, Apr 30, 2015 at 10:52:32AM +0000, Wang Nan wrote:
> If maps are used by eBPF programs, the corresponding object file(s) should
> contain a section named 'maps', which contains map definitions, one for
> each map, describing its format. 'struct perf_bpf_map_def' is
> introduced as part of the protocol between perf and eBPF programs. All map
> definitions are copied to obj->maps.
>
> bpf.h is introduced for common bpf operations.
>
> Signed-off-by: Wang Nan <[email protected]>
> ---
> tools/perf/util/bpf-loader.c | 31 +++++++++++++++++++++++++++++++
> tools/perf/util/bpf-loader.h | 3 +++
> 2 files changed, 34 insertions(+)
>
> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> index 296fb06..bf3b793 100644
> --- a/tools/perf/util/bpf-loader.c
> +++ b/tools/perf/util/bpf-loader.c
> @@ -65,6 +65,8 @@ static void bpf_obj_close(struct bpf_obj *obj)
>
> if (obj->path)
> free(obj->path);
> + if (obj->maps)
> + free(obj->maps);
> free(obj);
> }
>
> @@ -183,6 +185,32 @@ static int bpf_obj_kver_init(struct bpf_obj *obj,
> return 0;
> }
>
> +static int bpf_obj_maps_init(struct bpf_obj *obj, void *data,
> + size_t size)
> +{
> + size_t map_def_sz = sizeof(struct bpf_map_def);
> + int nr_maps = size / map_def_sz;
> +
> + if (nr_maps == 0) {
> + pr_debug("bpf: %s doesn't need map definition\n",
> + obj->path);
> + return 0;
> + }
> +
> + obj->maps = malloc(nr_maps * map_def_sz);
> + if (!obj->maps) {
> + pr_err("bpf: malloc maps failed: %s\n", obj->path);
> + return -ENOMEM;
> + }
> +
> + obj->nr_maps = nr_maps;
> + memcpy(obj->maps, data, nr_maps * map_def_sz);

Doesn't it need to swap the data as it's binary?

Thanks,
Namhyung


> + pr_debug("bpf: %d map%s in %s\n", nr_maps,
> + nr_maps > 1 ? "s" : "",
> + obj->path);
> + return 0;
> +}
> +
> static int bpf_obj_elf_collect(struct bpf_obj *obj)
> {
> Elf *elf = obj->elf.elf;
> @@ -237,6 +265,9 @@ static int bpf_obj_elf_collect(struct bpf_obj *obj)
> else if (strcmp(name, "version") == 0)
> err = bpf_obj_kver_init(obj, data->d_buf,
> data->d_size);
> + else if (strcmp(name, "maps") == 0)
> + err = bpf_obj_maps_init(obj, data->d_buf,
> + data->d_size);
> if (err)
> goto out;
> }
> diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
> index e1d5c42..6c5c8d6 100644
> --- a/tools/perf/util/bpf-loader.h
> +++ b/tools/perf/util/bpf-loader.h
> @@ -14,6 +14,7 @@
> #include "perf.h"
> #include "symbol.h"
> #include "probe-event.h"
> +#include "bpf.h"
>
> int bpf__load(const char *path);
> int bpf__run(void);
> @@ -25,6 +26,8 @@ struct bpf_obj {
> bool needs_swap;
> char license[64];
> u32 kern_version;
> + struct bpf_map_def *maps;
> + size_t nr_maps;
>
> /*
> * Information when doing elf related work. Only valid if fd
> --
> 1.8.3.4
>
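Namhyung's question about swapping applies because struct bpf_obj already records needs_swap. A sketch of what honoring it could look like, assuming every field of struct bpf_map_def is a 32-bit unsigned int as in the sample program; copy_map_defs() and swap32() are hypothetical helpers, not part of the patch. Unlike the patch, the sketch also rejects a section size that is not a whole number of definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same four fields as struct bpf_map_def in the sample program
 * (assumes unsigned int is 32 bits wide). */
struct bpf_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffU) << 24) | ((v & 0x0000ff00U) << 8) |
	       ((v & 0x00ff0000U) >> 8)  | ((v & 0xff000000U) >> 24);
}

/*
 * Hypothetical variant of bpf_obj_maps_init(): copy the "maps" section
 * and byte-swap every field when the object file's endianness differs
 * from the host's (needs_swap).
 */
static int copy_map_defs(const void *data, size_t size, bool needs_swap,
			 struct bpf_map_def **maps, size_t *nr_maps)
{
	size_t n = size / sizeof(struct bpf_map_def);
	size_t i;

	*maps = NULL;
	*nr_maps = 0;
	if (size % sizeof(struct bpf_map_def))
		return -EINVAL;		/* truncated or padded section */
	if (n == 0)
		return 0;		/* object needs no maps */

	*maps = malloc(n * sizeof(**maps));
	if (!*maps)
		return -ENOMEM;
	memcpy(*maps, data, n * sizeof(**maps));

	if (needs_swap) {
		for (i = 0; i < n; i++) {
			struct bpf_map_def *m = &(*maps)[i];

			m->type = swap32(m->type);
			m->key_size = swap32(m->key_size);
			m->value_size = swap32(m->value_size);
			m->max_entries = swap32(m->max_entries);
		}
	}
	*nr_maps = n;
	return 0;
}
```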