2015-08-27 10:43:47

by xiakaixu

Subject: [RFC PATCH 0/4] perf tools: Use the new ability of eBPF programs to access hardware PMU counter

According to the discussions on this subject (https://lkml.org/lkml/2015/5/27/1027),
we want to give eBPF programs the ability to access hardware PMU counters
and to use this ability from perf.

Now the kernel side patch set 'bpf: Introduce the new ability of eBPF
programs to access hardware PMU counter' has been applied and can be
found in the net-next tree.

ffe8690c85b8 ("perf: add the necessary core perf APIs when accessing events counters in eBPF programs")
2a36f0b92eb6 ("bpf: Make the bpf_prog_array_map more generic")
ea317b267e9d ("bpf: Add new bpf map type to store the pointer to struct perf_event")
35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
47efb30274cb ("samples/bpf: example of get selected PMU counter value")

According to the design plan, we still need the perf side code.
This patch set is based on Wang Nan's patches (perf tools: filtering events
using eBPF programs):
(git://git.kernel.org/pub/scm/linux/kernel/git/pi3orama/linux tags/perf-ebpf-for-acme-20150821)
The kernel side patch set above also needs to be merged if you want to
test this patch set.

An example is pasted at the bottom of this cover letter. In that example,
we can get the cpu_cycles and exception-taken counts in sys_write.

$ cat /sys/kernel/debug/tracing/trace_pipe
$ ./perf record --event perf-bpf.o ls
...
cat-1653 [003] d..1 88174.613854: : ente: CPU-3 cyc:48746333 exc:84
cat-1653 [003] d..2 88174.613861: : exit: CPU-3 cyc:48756041 exc:84
cat-1653 [003] d..1 88174.613872: : ente: CPU-3 cyc:48771199 exc:86
cat-1653 [003] d..2 88174.613879: : exit: CPU-3 cyc:48780448 exc:86
cat-1678 [003] d..1 88174.615001: : ente: CPU-3 cyc:50293479 exc:93
sshd-1669 [000] d..1 88174.615199: : ente: CPU-0 cyc:44402694 exc:51
sshd-1669 [000] d..2 88174.615283: : exit: CPU-0 cyc:44517335 exc:51
ls-1680 [003] d..1 88174.620260: : ente: CPU-3 cyc:57281750 exc:241
sshd-1669 [000] d..1 88174.620474: : ente: CPU-0 cyc:44998837 exc:69
sshd-1669 [000] d..2 88174.620549: : exit: CPU-0 cyc:45101855 exc:69
sshd-1669 [000] d..1 88174.620608: : ente: CPU-0 cyc:45181848 exc:77
sshd-1669 [000] d..2 88174.620709: : exit: CPU-0 cyc:45317439 exc:78
sshd-1669 [000] d..1 88174.620801: : ente: CPU-0 cyc:45441321 exc:87
sshd-1669 [000] d..2 88174.620856: : exit: CPU-0 cyc:45515882 exc:87
...

Limitation of this patch set: the perf events created for and read by the
eBPF programs can only be bound to CPUs; binding them to a specific PID is
not supported yet. The sketch below shows why.
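
A minimal sketch of what the loader does for each BPF_MAP_TYPE_PERF_EVENT_ARRAY
map (mirroring patch 2/4; 'attr' stands for the perf_event_attr taken from the
'maps' section and 'map_fd' for the fd of the map itself):

int cpu, event_fd;
int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);

for (cpu = 0; cpu < nr_cpus; cpu++) {
        /* pid == -1, cpu >= 0: count all tasks, but only on this CPU */
        event_fd = perf_event_open(&attr, -1, cpu, -1, 0);
        /* store the event in slot 'cpu' of the map */
        bpf_update_elem(map_fd, &cpu, &event_fd, BPF_ANY);
}

Since every event is opened with pid == -1 and an explicit cpu, it counts
system-wide on that CPU rather than for a single task.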

The details of the patches are as follows:

Patch 1/4 introduces bpf_update_elem() and perf_event_open() in
bpf.c/h for common bpf operations, so we can store the pointers to
struct perf_event in maps;

Patch 2/4 collects the BPF_MAP_TYPE_PERF_EVENT_ARRAY map definitions
from the 'maps' section and matches each event with its map;

Patch 3/4 saves the perf event fds from the 'maps' sections to
'struct bpf_object', so we can enable/disable these perf events
at the appropriate time;

Patch 4/4 enables/disables the perf events stored in 'struct bpf_object'.
A sketch of how perf is expected to drive this flow is shown right after
this list.
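
A minimal sketch against the bpf_object API from Wang Nan's series (the
function name here is made up, error handling is trimmed, and in reality
perf drives this from the record command):

#include "libbpf.h" /* tools/lib/bpf */

static int use_bpf_object(const char *path)
{
        struct bpf_object *obj;

        /* parse the ELF file, collect the 'maps' and program sections */
        obj = bpf_object__open(path);
        if (!obj)
                return -1;

        /*
         * bpf_object__load() creates the maps; for BPF_MAP_TYPE_PERF_EVENT_ARRAY
         * maps it also perf_event_open()s one disabled event per CPU, stores
         * the fds in the map (patches 1-3) and enables the events once the
         * programs are loaded (patch 4).
         */
        if (bpf_object__load(obj)) {
                bpf_object__close(obj);
                return -1;
        }

        /* ... attach the programs and run the workload ... */

        bpf_object__close(obj); /* disables and closes the perf events */
        return 0;
}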

-------- EXAMPLE --------
----- perf-bpf.c -----

struct perf_event_map_def SEC("maps") my_cycles_map = {
        .map_def = {
                .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                .key_size = sizeof(int),
                .value_size = sizeof(u32),
                .max_entries = 32,
        },
        .attr = {
                .freq = 0,
                .inherit = 0,
                .sample_period = 0x7fffffffffffffffULL,
                .type = PERF_TYPE_HARDWARE,
                .read_format = 0,
                .sample_type = 0,
                .config = 0,            /* PMU: cycles */
        },
};

struct perf_event_map_def SEC("maps") my_exception_map = {
        .map_def = {
                .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
                .key_size = sizeof(int),
                .value_size = sizeof(u32),
                .max_entries = 32,
        },
        .attr = {
                .freq = 0,
                .inherit = 0,
                .sample_period = 0x7fffffffffffffffULL,
                .type = PERF_TYPE_RAW,
                .read_format = 0,
                .sample_type = 0,
                .config = 0x09,         /* PMU: exception */
        },
};

SEC("ente=sys_write")
int bpf_prog_1(struct pt_regs *ctx)
{
        u64 count_cycles, count_exception;
        u32 key = bpf_get_smp_processor_id();
        char fmt[] = "ente: CPU-%d cyc:%llu exc:%llu\n";

        count_cycles = bpf_perf_event_read(&my_cycles_map, key);
        count_exception = bpf_perf_event_read(&my_exception_map, key);
        bpf_trace_printk(fmt, sizeof(fmt), key, count_cycles, count_exception);

        return 0;
}

SEC("exit=sys_write%return")
int bpf_prog_2(struct pt_regs *ctx)
{
        u64 count_cycles, count_exception;
        u32 key = bpf_get_smp_processor_id();
        char fmt[] = "exit: CPU-%d cyc:%llu exc:%llu\n";

        count_cycles = bpf_perf_event_read(&my_cycles_map, key);
        count_exception = bpf_perf_event_read(&my_exception_map, key);
        bpf_trace_printk(fmt, sizeof(fmt), key, count_cycles, count_exception);

        return 0;
}

Kaixu Xia (4):
bpf tools: Add bpf_update_elem() and perf_event_open() for common bpf
operations
bpf tools: Collect BPF_MAP_TYPE_PERF_EVENT_ARRAY map definitions from
'maps' section
bpf tools: Save the perf event fds from "maps" sections to 'struct
bpf_object'
bpf tools: Enable/disable the perf events stored in 'struct
bpf_object'

tools/lib/bpf/bpf.c | 34 +++++++++++++
tools/lib/bpf/bpf.h | 4 ++
tools/lib/bpf/libbpf.c | 130 +++++++++++++++++++++++++++++++++++++++++++------
tools/lib/bpf/libbpf.h | 13 +++++
4 files changed, 166 insertions(+), 15 deletions(-)

--
1.8.3.4


2015-08-27 10:42:48

by xiakaixu

Subject: [RFC PATCH 1/4] bpf tools: Add bpf_update_elem() and perf_event_open() for common bpf operations

This patch introduces bpf_update_elem() and perf_event_open()
in bpf.c/h for common bpf operations.
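
For illustration, a caller is expected to use the two helpers together
roughly as below ('map_fd' stands for the fd of a BPF_MAP_TYPE_PERF_EVENT_ARRAY
map created earlier with bpf_create_map(); the function name is made up and
the snippet is not part of this patch):

#include <errno.h>
#include <linux/perf_event.h>
#include "bpf.h" /* tools/lib/bpf */

/*
 * Open a cycles counter on 'cpu' and store it at index 'cpu' of the
 * BPF_MAP_TYPE_PERF_EVENT_ARRAY map referenced by 'map_fd'.
 */
static int store_cycles_event(int map_fd, int cpu)
{
        struct perf_event_attr attr = {
                .type = PERF_TYPE_HARDWARE,
                .config = PERF_COUNT_HW_CPU_CYCLES,
                .sample_period = 0x7fffffffffffffffULL,
                .disabled = 1, /* enabled later via PERF_EVENT_IOC_ENABLE */
        };
        int event_fd;

        /* pid == -1, cpu >= 0: count all tasks, but only on this CPU */
        event_fd = perf_event_open(&attr, -1, cpu, -1, 0);
        if (event_fd < 0)
                return -errno;

        /* the kernel resolves the fd to a struct perf_event pointer */
        return bpf_update_elem(map_fd, &cpu, &event_fd, BPF_ANY);
}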

Signed-off-by: Kaixu Xia <[email protected]>
---
tools/lib/bpf/bpf.c | 34 ++++++++++++++++++++++++++++++++++
tools/lib/bpf/bpf.h | 4 ++++
2 files changed, 38 insertions(+)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index a633105..5ff7f09 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -10,6 +10,7 @@
#include <memory.h>
#include <unistd.h>
#include <asm/unistd.h>
+#include <sys/syscall.h>
#include <linux/bpf.h>
#include "bpf.h"

@@ -29,6 +30,18 @@
# endif
#endif

+#if defined(__i386__)
+#ifndef __NR_perf_event_open
+# define __NR_perf_event_open 336
+#endif
+#endif
+
+#if defined(__x86_64__)
+#ifndef __NR_perf_event_open
+# define __NR_perf_event_open 298
+#endif
+#endif
+
static __u64 ptr_to_u64(void *ptr)
{
return (__u64) (unsigned long) ptr;
@@ -55,6 +68,20 @@ int bpf_create_map(enum bpf_map_type map_type, int key_size,
return sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}

+int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags)
+{
+ union bpf_attr attr;
+
+ memset(&attr, '\0', sizeof(attr));
+
+ attr.map_fd = fd;
+ attr.key = ptr_to_u64(key);
+ attr.value = ptr_to_u64(value);
+ attr.flags = flags;
+
+ return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+}
+
int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
size_t insns_cnt, char *license,
u32 kern_version, char *log_buf, size_t log_buf_sz)
@@ -83,3 +110,10 @@ int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
log_buf[0] = 0;
return sys_bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
+
+int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
+ int group_fd, unsigned long flags)
+{
+ return syscall(__NR_perf_event_open, attr, pid, cpu,
+ group_fd, flags);
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 854b736..7f1283c 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -12,6 +12,10 @@

int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size,
int max_entries);
+int bpf_update_elem(int fd, void *key, void *value, unsigned long long flags);
+struct perf_event_attr;
+int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
+ int group_fd, unsigned long flags);

/* Recommend log buffer size */
#define BPF_LOG_BUF_SIZE 65536
--
1.8.3.4

2015-08-27 10:44:23

by xiakaixu

Subject: [RFC PATCH 2/4] bpf tools: Collect BPF_MAP_TYPE_PERF_EVENT_ARRAY map definitions from 'maps' section

In order to make use of the new ability of eBPF programs to access
hardware PMU counters, we need to match each event with its map. So we
introduce struct perf_event_map_def, which contains a struct bpf_map_def
and a struct perf_event_attr. We can get the necessary info from the
'maps' section and store the pointers to struct perf_event in
BPF_MAP_TYPE_PERF_EVENT_ARRAY maps.
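
The loader walks the 'maps' section in sizeof(struct bpf_map_def) steps, so
the embedded attr is padded out to ATTR_LENGTH of those slots (the matching
map_fds[] entries are simply set to -1). The illustrative test program below
(not part of the patch) prints the sizes involved:

#include <stdio.h>
#include "libbpf.h" /* struct bpf_map_def, struct perf_event_map_def, ATTR_LENGTH */

int main(void)
{
        printf("bpf_map_def: %zu bytes, perf_event_attr: %zu bytes\n",
               sizeof(struct bpf_map_def), sizeof(struct perf_event_attr));
        /* the attr padding occupies ATTR_LENGTH map-sized slots */
        printf("perf_event_map_def: %zu bytes (1 + %zu map slots)\n",
               sizeof(struct perf_event_map_def), (size_t)ATTR_LENGTH);
        return 0;
}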

Signed-off-by: Kaixu Xia <[email protected]>
---
tools/lib/bpf/libbpf.c | 76 ++++++++++++++++++++++++++++++++++++++++----------
tools/lib/bpf/libbpf.h | 13 +++++++++
2 files changed, 74 insertions(+), 15 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1ff6a19..83d79c4 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -570,7 +570,7 @@ bpf_object__find_prog_by_idx(struct bpf_object *obj, int idx)

static int
bpf_program__collect_reloc(struct bpf_program *prog,
- size_t nr_maps, GElf_Shdr *shdr,
+ size_t max_maps, GElf_Shdr *shdr,
Elf_Data *data, Elf_Data *symbols)
{
int i, nrels;
@@ -616,9 +616,9 @@ bpf_program__collect_reloc(struct bpf_program *prog,
}

map_idx = sym.st_value / sizeof(struct bpf_map_def);
- if (map_idx >= nr_maps) {
+ if (map_idx >= max_maps) {
pr_warning("bpf relocation: map_idx %d large than %d\n",
- (int)map_idx, (int)nr_maps - 1);
+ (int)map_idx, (int)max_maps - 1);
return -EINVAL;
}

@@ -629,11 +629,42 @@ bpf_program__collect_reloc(struct bpf_program *prog,
}

static int
+bpf_object__collect_perf_event_maps(void *data, int **pfd)
+{
+ int i, event_fd;
+ int maps_fd = **pfd;
+ int attr_length = ATTR_LENGTH;
+ int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+ struct perf_event_attr *attr;
+
+ attr = (struct perf_event_attr *)(data + sizeof(struct bpf_map_def));
+ if (attr->type != PERF_TYPE_RAW &&
+ attr->type != PERF_TYPE_HARDWARE)
+ return -EINVAL;
+ attr->disabled = 1;
+
+ do {
+ (*pfd)++;
+ **pfd = -1;
+ } while (--attr_length);
+
+ for (i = 0; i < nr_cpus; i++) {
+ event_fd = perf_event_open(attr, -1/*pid*/, i/*cpu*/, -1/*group_fd*/, 0);
+ if (event_fd < 0) {
+ pr_warning("event syscall failed\n");
+ return -EINVAL;
+ }
+ bpf_update_elem(maps_fd, &i, &event_fd, BPF_ANY);
+ }
+ return 0;
+}
+
+static int
bpf_object__create_maps(struct bpf_object *obj)
{
unsigned int i;
- size_t nr_maps;
- int *pfd;
+ size_t nr_maps, j;
+ int *pfd, err;

nr_maps = obj->maps_buf_sz / sizeof(struct bpf_map_def);
if (!obj->maps_buf || !nr_maps) {
@@ -664,24 +695,37 @@ bpf_object__create_maps(struct bpf_object *obj)
def.value_size,
def.max_entries);
if (*pfd < 0) {
- size_t j;
- int err = *pfd;
-
+ err = *pfd;
pr_warning("failed to create map: %s\n",
strerror(errno));
- for (j = 0; j < i; j++)
- zclose(obj->map_fds[j]);
- obj->nr_map_fds = 0;
- zfree(&obj->map_fds);
- return err;
+ goto out_close;
}
pr_debug("create map: fd=%d\n", *pfd);
+
+ if (def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
+ void *data = obj->maps_buf + i * sizeof(struct bpf_map_def);
+
+ err = bpf_object__collect_perf_event_maps(data, &pfd);
+ if (err < 0) {
+ pr_warning("failed to collect perf_event maps: %s\n",
+ strerror(errno));
+ goto out_close;
+ }
+ i += ATTR_LENGTH;
+ }
pfd++;
}

zfree(&obj->maps_buf);
obj->maps_buf_sz = 0;
return 0;
+
+out_close:
+ for (j = 0; j < i; j++)
+ zclose(obj->map_fds[j]);
+ obj->nr_map_fds = 0;
+ zfree(&obj->map_fds);
+ return err;
}

static int
@@ -705,6 +749,8 @@ bpf_program__relocate(struct bpf_program *prog, int *map_fds)
return -ERANGE;
}
insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+ if (map_fds[map_idx] == -1)
+ return -EINVAL;
insns[insn_idx].imm = map_fds[map_idx];
}

@@ -748,7 +794,7 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
Elf_Data *data = obj->efile.reloc[i].data;
int idx = shdr->sh_info;
struct bpf_program *prog;
- size_t nr_maps = obj->maps_buf_sz /
+ size_t max_maps = obj->maps_buf_sz /
sizeof(struct bpf_map_def);

if (shdr->sh_type != SHT_REL) {
@@ -763,7 +809,7 @@ static int bpf_object__collect_reloc(struct bpf_object *obj)
return -ENOENT;
}

- err = bpf_program__collect_reloc(prog, nr_maps,
+ err = bpf_program__collect_reloc(prog, max_maps,
shdr, data,
obj->efile.symbols);
if (err)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 9fa7b09..8361dd5 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -10,6 +10,7 @@

#include <stdio.h>
#include <stdbool.h>
+#include <linux/perf_event.h>

/*
* In include/linux/compiler-gcc.h, __printf is defined. However
@@ -100,4 +101,16 @@ struct bpf_map_def {
unsigned int max_entries;
};

+#define ATTR_LENGTH ((sizeof(struct perf_event_attr) + \
+ sizeof(struct bpf_map_def) - 1) /\
+ sizeof(struct bpf_map_def))
+
+struct perf_event_map_def {
+ struct bpf_map_def map_def;
+ union {
+ struct perf_event_attr attr;
+ struct bpf_map_def align[ATTR_LENGTH];
+ };
+};
+
#endif
--
1.8.3.4

2015-08-27 10:42:47

by xiakaixu

Subject: [RFC PATCH 3/4] bpf tools: Save the perf event fds from "maps" sections to 'struct bpf_object'

This patch saves the perf_event fds from "maps" sections to struct
bpf_object. So we can enable/disable these perf events at the
appropriate time.

Signed-off-by: Kaixu Xia <[email protected]>
---
tools/lib/bpf/libbpf.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 83d79c4..2b3940e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -126,6 +126,8 @@ struct bpf_object {
*/
size_t nr_map_fds;
bool loaded;
+ int *bpf_event_fds;
+ int nr_bpf_event_fds;

/*
* Information when doing elf related work. Only valid if fd
@@ -629,7 +631,26 @@ bpf_program__collect_reloc(struct bpf_program *prog,
}

static int
-bpf_object__collect_perf_event_maps(void *data, int **pfd)
+bpf_object__save_event_bpf_fd(struct bpf_object *obj, int event_fd)
+{
+ void *bpf_event = obj->bpf_event_fds;
+ int nr_bpf_event = obj->nr_bpf_event_fds;
+
+ bpf_event = realloc(bpf_event, sizeof(int)*(nr_bpf_event + 1));
+
+ if (!bpf_event) {
+ pr_warning("realloc failed\n");
+ return -ENOMEM;
+ }
+
+ obj->bpf_event_fds = bpf_event;
+ obj->nr_bpf_event_fds = nr_bpf_event + 1;
+ obj->bpf_event_fds[nr_bpf_event] = event_fd;
+ return 0;
+}
+
+static int
+bpf_object__collect_perf_event_maps(struct bpf_object *obj, void *data, int **pfd)
{
int i, event_fd;
int maps_fd = **pfd;
@@ -655,6 +676,8 @@ bpf_object__collect_perf_event_maps(void *data, int **pfd)
return -EINVAL;
}
bpf_update_elem(maps_fd, &i, &event_fd, BPF_ANY);
+ if (bpf_object__save_event_bpf_fd(obj, event_fd))
+ return -ENOMEM;
}
return 0;
}
@@ -705,7 +728,7 @@ bpf_object__create_maps(struct bpf_object *obj)
if (def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY) {
void *data = obj->maps_buf + i * sizeof(struct bpf_map_def);

- err = bpf_object__collect_perf_event_maps(data, &pfd);
+ err = bpf_object__collect_perf_event_maps(obj, data, &pfd);
if (err < 0) {
pr_warning("failed to collect perf_event maps: %s\n",
strerror(errno));
@@ -1074,6 +1097,7 @@ void bpf_object__close(struct bpf_object *obj)
bpf_program__exit(&obj->programs[i]);
}
zfree(&obj->programs);
+ zfree(&obj->bpf_event_fds);

list_del(&obj->list);
free(obj);
--
1.8.3.4

2015-08-27 10:43:17

by xiakaixu

Subject: [RFC PATCH 4/4] bpf tools: Enable/disable the perf events stored in 'struct bpf_object'

This patch enables/disables the perf events stored in 'struct
bpf_object' at the appropriate time. The events we are interested
in come from the 'maps' sections.

Signed-off-by: Kaixu Xia <[email protected]>
---
tools/lib/bpf/libbpf.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2b3940e..7fab959 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -20,6 +20,7 @@
#include <linux/list.h>
#include <libelf.h>
#include <gelf.h>
+#include <sys/ioctl.h>

#include "libbpf.h"
#include "bpf.h"
@@ -972,6 +973,32 @@ bpf_object__load_progs(struct bpf_object *obj)
return 0;
}

+static int
+bpf_object__maps_event_enable(struct bpf_object *obj)
+{
+ int i;
+ int nr_bpf_event = obj->nr_bpf_event_fds;
+
+ for (i = 0; i < nr_bpf_event; i++)
+ ioctl((int)obj->bpf_event_fds[i],
+ PERF_EVENT_IOC_ENABLE, 0);
+
+ return 0;
+}
+
+static void bpf_object__maps_event_close(struct bpf_object *obj)
+{
+ int i;
+ int nr_bpf_event = obj->nr_bpf_event_fds;
+
+ for (i = 0; i < nr_bpf_event; i++) {
+ int maps_event_fd = (int)obj->bpf_event_fds[i];
+
+ ioctl(maps_event_fd, PERF_EVENT_IOC_DISABLE, 0);
+ close(maps_event_fd);
+ }
+}
+
static int bpf_object__validate(struct bpf_object *obj)
{
if (obj->kern_version == 0) {
@@ -1072,6 +1099,8 @@ int bpf_object__load(struct bpf_object *obj)
goto out;
if (bpf_object__load_progs(obj))
goto out;
+ if (bpf_object__maps_event_enable(obj))
+ goto out;

return 0;
out:
@@ -1087,6 +1116,7 @@ void bpf_object__close(struct bpf_object *obj)
if (!obj)
return;

+ bpf_object__maps_event_close(obj);
bpf_object__elf_finish(obj);
bpf_object__unload(obj);

--
1.8.3.4

2015-08-29 01:28:37

by Alexei Starovoitov

Subject: Re: [RFC PATCH 0/4] perf tools: Use the new ability of eBPF programs to access hardware PMU counter

On 8/27/15 3:42 AM, Kaixu Xia wrote:
> An example is pasted at the bottom of this cover letter. In that example,
> we can get the cpu_cycles and exception taken in sys_write.
>
> $ cat /sys/kernel/debug/tracing/trace_pipe
> $ ./perf record --event perf-bpf.o ls
> ...
> cat-1653 [003] d..1 88174.613854: : ente: CPU-3 cyc:48746333 exc:84
> cat-1653 [003] d..2 88174.613861: : exit: CPU-3 cyc:48756041 exc:84

Nice. Probably a more complex example that computes the delta of the PMU
counters on the kernel side would be even more interesting.
Do you think you can extend 'perf stat' with a flag that does
stats collection for a given kernel or user function instead of the
whole process?
Then we can use perf record/report to figure out hot functions and
follow with 'perf stat -f my_hot_func my_process' to drill into
particular function stats.
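
A rough sketch of such a kernel-side delta computation, building on the
my_cycles_map definition and the helpers used in the cover-letter example
(the hash map, section names and program names here are made up, and this
sketch is untested):

struct bpf_map_def SEC("maps") start_cycles_map = {
        .type = BPF_MAP_TYPE_HASH,
        .key_size = sizeof(u32),
        .value_size = sizeof(u64),
        .max_entries = 1024,
};

SEC("ente_delta=sys_write")
int bpf_prog_delta_enter(struct pt_regs *ctx)
{
        u32 pid = (u32)bpf_get_current_pid_tgid();
        u32 cpu = bpf_get_smp_processor_id();
        u64 cycles = bpf_perf_event_read(&my_cycles_map, cpu);

        /* remember the counter value at function entry, keyed by pid */
        bpf_map_update_elem(&start_cycles_map, &pid, &cycles, BPF_ANY);
        return 0;
}

SEC("exit_delta=sys_write%return")
int bpf_prog_delta_exit(struct pt_regs *ctx)
{
        char fmt[] = "sys_write delta cyc:%llu\n";
        u32 pid = (u32)bpf_get_current_pid_tgid();
        u32 cpu = bpf_get_smp_processor_id();
        u64 *start, delta;

        start = bpf_map_lookup_elem(&start_cycles_map, &pid);
        if (!start)
                return 0;

        /* counter value at return minus counter value at entry */
        delta = bpf_perf_event_read(&my_cycles_map, cpu) - *start;
        bpf_trace_printk(fmt, sizeof(fmt), delta);
        bpf_map_delete_elem(&start_cycles_map, &pid);
        return 0;
}

Note that this assumes entry and return happen on the same CPU; since the
counters are opened per CPU (see the limitation in the cover letter), a
migration between entry and return would skew the delta.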

2015-08-29 02:15:23

by xiakaixu

Subject: Re: [RFC PATCH 0/4] perf tools: Use the new ability of eBPF programs to access hardware PMU counter

On 2015/8/29 9:28, Alexei Starovoitov wrote:
> On 8/27/15 3:42 AM, Kaixu Xia wrote:
>> An example is pasted at the bottom of this cover letter. In that example,
>> we can get the cpu_cycles and exception taken in sys_write.
>>
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> $ ./perf record --event perf-bpf.o ls
>> ...
>> cat-1653 [003] d..1 88174.613854: : ente: CPU-3 cyc:48746333 exc:84
>> cat-1653 [003] d..2 88174.613861: : exit: CPU-3 cyc:48756041 exc:84
>
> nice. probably more complex example that computes the delta of the pmu
> counters on the kernel side would be even more interesting.

Right, this is just a little example. Actually, I have tested this
ability on the kernel side and the user space side, that is, with kprobes
and uprobes. The collected deltas of the PMU counters from the kernel and
glibc are correct and meet the expected goals. I will include them in the
next version.

At this time I wish to get your comments on the currently chosen
implementation. Now struct perf_event_map_def is introduced and the user
can directly define the struct perf_event_attr, so we can skip the
parse_events process and call sys_perf_event_open() on the events
directly. This is the simplest implementation, but I am not sure it is
the most appropriate one.
> Do you think you can extend 'perf stat' with a flag that does
> stats collection for a given kernel or user function instead of the
> whole process ?
> Then we can use perf record/report to figure out hot functions and
> follow with 'perf stat -f my_hot_func my_process' to drill into
> particular function stats.

Good idea! I will consider it when this patchset is basically completed.

2015-08-29 02:42:32

by Alexei Starovoitov

Subject: Re: [RFC PATCH 0/4] perf tools: Use the new ability of eBPF programs to access hardware PMU counter

On 8/28/15 7:14 PM, xiakaixu wrote:
> Right, this is just a little example. Actually, I have tested this
> ability on kernel side and user space side, that is kprobe and uprobe.

great to hear.

> At this time i wish to get your comment on the current chosen implementation.
> Now the struct perf_event_map_def is introduced and the user can directly
> define the struct perf_event_attr, so we can skip the parse_events process
> and call the sys_perf_event_open on the events directly. This is the most
> simple implementation, but I am not sure it is the most appropriate.

I think it's a bit kludgy. You are trying to squeeze more and more
information into sections and pass it via ELF.
It worked for samples early on, but now it's time to do better.
Like in bcc we just write normal C and extract all necessary information
by looking at the C via the clang rewriter API. I think it's a cleaner
approach. In our use case we can compile on the host, so no intermediate
files, no ELF files. If you have to cross-compile you can still use the
same approach and let llvm generate the .o and emit all the extra stuff
as another configuration file (say in .json), then let the host load the
.o and use the .json to set up the PMU events and everything else. That
will work for a higher number of use cases, but in the end I don't see
how you can avoid moving to a C+Python or C+whatever approach, since a
static configuration (whether in .json or in an ELF section) is not going
to be enough. You'd need a program in user space to deal with all the
data that the BPF program in the kernel is collecting.