2018-11-06 20:56:02

by Song Liu

Subject: [RFC perf,bpf 0/5] reveal invisible bpf programs

This is a follow-up to Alexei's earlier effort to show bpf programs:

https://www.spinics.net/lists/netdev/msg524232.html

In this version, PERF_RECORD_BPF_EVENT is introduced to send real-time BPF
load/unload events to user space. In user space, perf-record is modified
to listen for these events (through a dedicated ring buffer) and to generate
detailed information about each program (struct bpf_prog_info_event).
perf-report then translates these events into proper symbols.

With this set, perf-report will show bpf programs as:

18.49% 0.16% test [kernel.vmlinux] [k] ksys_write
18.01% 0.47% test [kernel.vmlinux] [k] vfs_write
17.02% 0.40% test bpf_prog [k] bpf_prog_F
16.97% 0.10% test [kernel.vmlinux] [k] __vfs_write
16.86% 0.12% test [kernel.vmlinux] [k] comm_write
16.67% 0.39% test [kernel.vmlinux] [k] bpf_probe_read

Note that the program name is still a work in progress; it will be cleaner
once function types are available in BTF.

Please share your comments on this.

Thanks,
Song

Song Liu (5):
perf, bpf: Introduce PERF_RECORD_BPF_EVENT
perf: sync tools/include/uapi/linux/perf_event.h
perf util: basic handling of PERF_RECORD_BPF_EVENT
perf util: introduce bpf_prog_info_event
perf util: generate bpf_prog_info_event for short-lived bpf programs

include/linux/perf_event.h | 5 +
include/uapi/linux/perf_event.h | 27 ++-
kernel/bpf/syscall.c | 4 +
kernel/events/core.c | 82 +++++++-
tools/include/uapi/linux/perf_event.h | 27 ++-
tools/perf/builtin-record.c | 55 +++++
tools/perf/builtin-report.c | 2 +
tools/perf/util/Build | 2 +
tools/perf/util/bpf-info.c | 287 ++++++++++++++++++++++++++
tools/perf/util/bpf-info.h | 29 +++
tools/perf/util/event.c | 20 ++
tools/perf/util/event.h | 29 +++
tools/perf/util/evlist.c | 42 +++-
tools/perf/util/evlist.h | 4 +
tools/perf/util/evsel.c | 9 +
tools/perf/util/evsel.h | 3 +
tools/perf/util/machine.c | 10 +
tools/perf/util/machine.h | 2 +
tools/perf/util/session.c | 8 +
tools/perf/util/tool.h | 7 +-
20 files changed, 646 insertions(+), 8 deletions(-)
create mode 100644 tools/perf/util/bpf-info.c
create mode 100644 tools/perf/util/bpf-info.h

--
2.17.1


2018-11-06 20:54:59

by Song Liu

Subject: [RFC perf,bpf 2/5] perf: sync tools/include/uapi/linux/perf_event.h

Sync changes for PERF_RECORD_BPF_EVENT.

Signed-off-by: Song Liu <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 27 ++++++++++++++++++++++++++-
1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index f35eb72739c0..d51cacb3077a 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
namespaces : 1, /* include namespaces data */
- __reserved_1 : 35;
+ bpf_event : 1, /* include bpf events */
+ __reserved_1 : 34;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -963,9 +964,33 @@ enum perf_event_type {
*/
PERF_RECORD_NAMESPACES = 16,

+ /*
+ * Record different types of bpf events:
+ * enum perf_bpf_event_type {
+ * PERF_BPF_EVENT_UNKNOWN = 0,
+ * PERF_BPF_EVENT_PROG_LOAD = 1,
+ * PERF_BPF_EVENT_PROG_UNLOAD = 2,
+ * };
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u16 type;
+ * u16 flags;
+ * u32 id; // prog_id or map_id
+ * };
+ */
+ PERF_RECORD_BPF_EVENT = 17,
+
PERF_RECORD_MAX, /* non-ABI */
};

+enum perf_bpf_event_type {
+ PERF_BPF_EVENT_UNKNOWN = 0,
+ PERF_BPF_EVENT_PROG_LOAD = 1,
+ PERF_BPF_EVENT_PROG_UNLOAD = 2,
+ PERF_BPF_EVENT_MAX, /* non-ABI */
+};
+
#define PERF_MAX_STACK_DEPTH 127
#define PERF_MAX_CONTEXTS_PER_STACK 8

--
2.17.1


2018-11-06 20:56:05

by Song Liu

Subject: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

For better performance analysis of BPF programs, this patch introduces
PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
load/unload information to user space.

/*
* Record different types of bpf events:
* enum perf_bpf_event_type {
* PERF_BPF_EVENT_UNKNOWN = 0,
* PERF_BPF_EVENT_PROG_LOAD = 1,
* PERF_BPF_EVENT_PROG_UNLOAD = 2,
* };
*
* struct {
* struct perf_event_header header;
* u16 type;
* u16 flags;
* u32 id; // prog_id or map_id
* };
*/
PERF_RECORD_BPF_EVENT = 17,

PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
The perf utility (or other user space tools) should listen to this event
and fetch more details about the program via BPF syscalls
(BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).

Currently, PERF_RECORD_BPF_EVENT only supports two event types:
PERF_BPF_EVENT_PROG_LOAD and PERF_BPF_EVENT_PROG_UNLOAD, but it can be
easily extended to support more events.
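
For illustration, a user space consumer could handle one such record
roughly as below. This is a minimal sketch, assuming libbpf's
bpf_prog_get_fd_by_id() and bpf_obj_get_info_by_fd() (the same calls
patch 4/5 of this series uses); it is not part of the posted patches:

#include <unistd.h>
#include <bpf/bpf.h>
#include <linux/bpf.h>

/* hypothetical handler for one PERF_RECORD_BPF_EVENT record */
static int handle_bpf_event(__u16 type, __u32 id)
{
	struct bpf_prog_info info = {};
	__u32 info_len = sizeof(info);
	int fd, err;

	if (type != PERF_BPF_EVENT_PROG_LOAD)
		return 0;	/* nothing to query after an unload */

	fd = bpf_prog_get_fd_by_id(id);	/* requires CAP_SYS_ADMIN */
	if (fd < 0)
		return fd;	/* the program may already be gone */

	err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
	close(fd);
	return err;	/* on success, info.id, info.name, etc. are valid */
}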

Signed-off-by: Song Liu <[email protected]>
---
include/linux/perf_event.h | 5 ++
include/uapi/linux/perf_event.h | 27 ++++++++++-
kernel/bpf/syscall.c | 4 ++
kernel/events/core.c | 82 ++++++++++++++++++++++++++++++++-
4 files changed, 116 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f0ca79..a3126fd5b7f1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1113,6 +1113,9 @@ static inline void perf_event_task_sched_out(struct task_struct *prev,
}

extern void perf_event_mmap(struct vm_area_struct *vma);
+extern void perf_event_bpf_event(enum perf_bpf_event_type type,
+ u16 flags, u32 id);
+
extern struct perf_guest_info_callbacks *perf_guest_cbs;
extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
@@ -1333,6 +1336,8 @@ static inline int perf_unregister_guest_info_callbacks
(struct perf_guest_info_callbacks *callbacks) { return 0; }

static inline void perf_event_mmap(struct vm_area_struct *vma) { }
+static inline void perf_event_bpf_event(enum perf_bpf_event_type type,
+ u16 flags, u32 id) { }
static inline void perf_event_exec(void) { }
static inline void perf_event_comm(struct task_struct *tsk, bool exec) { }
static inline void perf_event_namespaces(struct task_struct *tsk) { }
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index f35eb72739c0..d51cacb3077a 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
namespaces : 1, /* include namespaces data */
- __reserved_1 : 35;
+ bpf_event : 1, /* include bpf events */
+ __reserved_1 : 34;

union {
__u32 wakeup_events; /* wakeup every n events */
@@ -963,9 +964,33 @@ enum perf_event_type {
*/
PERF_RECORD_NAMESPACES = 16,

+ /*
+ * Record different types of bpf events:
+ * enum perf_bpf_event_type {
+ * PERF_BPF_EVENT_UNKNOWN = 0,
+ * PERF_BPF_EVENT_PROG_LOAD = 1,
+ * PERF_BPF_EVENT_PROG_UNLOAD = 2,
+ * };
+ *
+ * struct {
+ * struct perf_event_header header;
+ * u16 type;
+ * u16 flags;
+ * u32 id; // prog_id or map_id
+ * };
+ */
+ PERF_RECORD_BPF_EVENT = 17,
+
PERF_RECORD_MAX, /* non-ABI */
};

+enum perf_bpf_event_type {
+ PERF_BPF_EVENT_UNKNOWN = 0,
+ PERF_BPF_EVENT_PROG_LOAD = 1,
+ PERF_BPF_EVENT_PROG_UNLOAD = 2,
+ PERF_BPF_EVENT_MAX, /* non-ABI */
+};
+
#define PERF_MAX_STACK_DEPTH 127
#define PERF_MAX_CONTEXTS_PER_STACK 8

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 18e3be193a05..b37051a13be6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1101,9 +1101,12 @@ static void __bpf_prog_put_rcu(struct rcu_head *rcu)
static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
{
if (atomic_dec_and_test(&prog->aux->refcnt)) {
+ int prog_id = prog->aux->id;
+
/* bpf_prog_free_id() must be called first */
bpf_prog_free_id(prog, do_idr_lock);
bpf_prog_kallsyms_del_all(prog);
+ perf_event_bpf_event(PERF_BPF_EVENT_PROG_UNLOAD, 0, prog_id);

call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
}
@@ -1441,6 +1444,7 @@ static int bpf_prog_load(union bpf_attr *attr)
}

bpf_prog_kallsyms_add(prog);
+ perf_event_bpf_event(PERF_BPF_EVENT_PROG_LOAD, 0, prog->aux->id);
return err;

free_used_maps:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5a97f34bc14c..54667be6669b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -385,6 +385,7 @@ static atomic_t nr_namespaces_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
static atomic_t nr_freq_events __read_mostly;
static atomic_t nr_switch_events __read_mostly;
+static atomic_t nr_bpf_events __read_mostly;

static LIST_HEAD(pmus);
static DEFINE_MUTEX(pmus_lock);
@@ -4235,7 +4236,7 @@ static bool is_sb_event(struct perf_event *event)

if (attr->mmap || attr->mmap_data || attr->mmap2 ||
attr->comm || attr->comm_exec ||
- attr->task ||
+ attr->task || attr->bpf_event ||
attr->context_switch)
return true;
return false;
@@ -4305,6 +4306,8 @@ static void unaccount_event(struct perf_event *event)
dec = true;
if (has_branch_stack(event))
dec = true;
+ if (event->attr.bpf_event)
+ atomic_dec(&nr_bpf_events);

if (dec) {
if (!atomic_add_unless(&perf_sched_count, -1, 1))
@@ -7650,6 +7653,81 @@ static void perf_log_throttle(struct perf_event *event, int enable)
perf_output_end(&handle);
}

+/*
+ * bpf load/unload tracking
+ */
+
+struct perf_bpf_event {
+ struct {
+ struct perf_event_header header;
+ u16 type;
+ u16 flags;
+ u32 id;
+ } event_id;
+};
+
+static int perf_event_bpf_match(struct perf_event *event)
+{
+ return event->attr.bpf_event;
+}
+
+static void perf_event_bpf_output(struct perf_event *event,
+ void *data)
+{
+ struct perf_bpf_event *bpf_event = data;
+ struct perf_output_handle handle;
+ struct perf_sample_data sample;
+ int size = bpf_event->event_id.header.size;
+ int ret;
+
+ if (!perf_event_bpf_match(event))
+ return;
+
+ perf_event_header__init_id(&bpf_event->event_id.header, &sample, event);
+ ret = perf_output_begin(&handle, event,
+ bpf_event->event_id.header.size);
+ if (ret)
+ goto out;
+
+ perf_output_put(&handle, bpf_event->event_id);
+ perf_event__output_id_sample(event, &handle, &sample);
+
+ perf_output_end(&handle);
+out:
+ bpf_event->event_id.header.size = size;
+}
+
+static void perf_event_bpf(struct perf_bpf_event *bpf_event)
+{
+ perf_iterate_sb(perf_event_bpf_output,
+ bpf_event,
+ NULL);
+}
+
+void perf_event_bpf_event(enum perf_bpf_event_type type, u16 flags, u32 id)
+{
+ struct perf_bpf_event bpf_event;
+
+ if (!atomic_read(&nr_bpf_events))
+ return;
+
+ if (type <= PERF_BPF_EVENT_UNKNOWN || type >= PERF_BPF_EVENT_MAX)
+ return;
+
+ bpf_event = (struct perf_bpf_event){
+ .event_id = {
+ .header = {
+ .type = PERF_RECORD_BPF_EVENT,
+ .size = sizeof(bpf_event.event_id),
+ },
+ .type = type,
+ .flags = flags,
+ .id = id,
+ },
+ };
+ perf_event_bpf(&bpf_event);
+}
+
void perf_event_itrace_started(struct perf_event *event)
{
event->attach_state |= PERF_ATTACH_ITRACE;
@@ -9871,6 +9949,8 @@ static void account_event(struct perf_event *event)
inc = true;
if (is_cgroup_event(event))
inc = true;
+ if (event->attr.bpf_event)
+ atomic_inc(&nr_bpf_events);

if (inc) {
/*
--
2.17.1


2018-11-06 20:56:27

by Song Liu

Subject: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

This patch enables perf-record to listen for bpf_event and to generate a
bpf_prog_info_event for bpf programs loaded and unloaded during the
perf-record run.

To minimize the latency between a bpf_event and the bpf syscalls that
follow it, a separate mmap with a watermark of 1 is created to process
these VIP events. A dedicated dummy event is then attached to this mmap.

By default, perf-record listens for bpf_event. A no-bpf-event option is
added in case the user wants to opt out.
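
For reference, the dummy event uses roughly the following
perf_event_attr configuration (mirroring perf_evlist__add_bpf_tracker()
below). A minimal standalone sketch, assuming uapi headers that already
carry the bpf_event bit from patch 1/5:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* open a dummy event that reports bpf load/unload and wakes the
 * reader on every record (watermark of 1)
 */
static int open_bpf_tracker(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);	/* capture ABI version */
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_DUMMY;
	attr.bpf_event = 1;
	attr.watermark = 1;
	attr.wakeup_watermark = 1;

	/* pid = -1, cpu = 0: one system-wide event on CPU 0 */
	return syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
}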

Signed-off-by: Song Liu <[email protected]>
---
tools/perf/builtin-record.c | 50 +++++++++++++++++++++++++++++++++++++
tools/perf/util/evlist.c | 42 ++++++++++++++++++++++++++++---
tools/perf/util/evlist.h | 4 +++
tools/perf/util/evsel.c | 8 ++++++
tools/perf/util/evsel.h | 3 +++
5 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 73b02bde1ebc..1036a64eb9f7 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -80,6 +80,7 @@ struct record {
bool buildid_all;
bool timestamp_filename;
bool timestamp_boundary;
+ bool no_bpf_event;
struct switch_output switch_output;
unsigned long long samples;
};
@@ -381,6 +382,8 @@ static int record__open(struct record *rec)
pos->tracking = 1;
pos->attr.enable_on_exec = 1;
}
+ if (!rec->no_bpf_event)
+ perf_evlist__add_bpf_tracker(evlist);

perf_evlist__config(evlist, opts, &callchain_param);

@@ -562,10 +565,55 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli
return rc;
}

+static int record__mmap_process_vip_events(struct record *rec)
+{
+ int i;
+
+ for (i = 0; i < rec->evlist->nr_mmaps; i++) {
+ struct perf_mmap *map = &rec->evlist->vip_mmap[i];
+ union perf_event *event;
+
+ perf_mmap__read_init(map);
+ while ((event = perf_mmap__read_event(map)) != NULL) {
+ pr_debug("processing vip event of type %d\n",
+ event->header.type);
+ switch (event->header.type) {
+ case PERF_RECORD_BPF_EVENT:
+ switch (event->bpf_event.type) {
+ case PERF_BPF_EVENT_PROG_LOAD:
+ perf_event__synthesize_one_bpf_prog_info(
+ &rec->tool,
+ process_synthesized_event,
+ &rec->session->machines.host,
+ event->bpf_event.id);
+ /* fall through */
+ case PERF_BPF_EVENT_PROG_UNLOAD:
+ record__write(rec, NULL, event,
+ event->header.size);
+ break;
+ default:
+ break;
+ }
+ break;
+ default:
+ break;
+ }
+ perf_mmap__consume(map);
+ }
+ perf_mmap__read_done(map);
+ }
+
+ return 0;
+}
+
static int record__mmap_read_all(struct record *rec)
{
int err;

+ err = record__mmap_process_vip_events(rec);
+ if (err)
+ return err;
+
err = record__mmap_read_evlist(rec, rec->evlist, false);
if (err)
return err;
@@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
"signal"),
OPT_BOOLEAN(0, "dry-run", &dry_run,
"Parse options then exit"),
+ OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
+ "do not record event on bpf program load/unload"),
OPT_END()
};

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index be440df29615..466a9f7b1e93 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -45,6 +45,7 @@ void perf_evlist__init(struct perf_evlist *evlist, struct cpu_map *cpus,
for (i = 0; i < PERF_EVLIST__HLIST_SIZE; ++i)
INIT_HLIST_HEAD(&evlist->heads[i]);
INIT_LIST_HEAD(&evlist->entries);
+ INIT_LIST_HEAD(&evlist->vip_entries);
perf_evlist__set_maps(evlist, cpus, threads);
fdarray__init(&evlist->pollfd, 64);
evlist->workload.pid = -1;
@@ -177,6 +178,8 @@ void perf_evlist__add(struct perf_evlist *evlist, struct perf_evsel *entry)
{
entry->evlist = evlist;
list_add_tail(&entry->node, &evlist->entries);
+ if (entry->vip)
+ list_add_tail(&entry->vip_node, &evlist->vip_entries);
entry->idx = evlist->nr_entries;
entry->tracking = !entry->idx;

@@ -267,6 +270,27 @@ int perf_evlist__add_dummy(struct perf_evlist *evlist)
return 0;
}

+int perf_evlist__add_bpf_tracker(struct perf_evlist *evlist)
+{
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_SOFTWARE,
+ .config = PERF_COUNT_SW_DUMMY,
+ .watermark = 1,
+ .bpf_event = 1,
+ .wakeup_watermark = 1,
+ .size = sizeof(attr), /* to capture ABI version */
+ };
+ struct perf_evsel *evsel = perf_evsel__new_idx(&attr,
+ evlist->nr_entries);
+
+ if (evsel == NULL)
+ return -ENOMEM;
+
+ evsel->vip = true;
+ perf_evlist__add(evlist, evsel);
+ return 0;
+}
+
static int perf_evlist__add_attrs(struct perf_evlist *evlist,
struct perf_event_attr *attrs, size_t nr_attrs)
{
@@ -770,6 +794,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
int evlist_cpu = cpu_map__cpu(evlist->cpus, cpu_idx);

evlist__for_each_entry(evlist, evsel) {
+ struct perf_mmap *vip_maps = evlist->vip_mmap;
struct perf_mmap *maps = evlist->mmap;
int *output = _output;
int fd;
@@ -800,7 +825,11 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,

fd = FD(evsel, cpu, thread);

- if (*output == -1) {
+ if (evsel->vip) {
+ if (perf_mmap__mmap(&vip_maps[idx], mp,
+ fd, evlist_cpu) < 0)
+ return -1;
+ } else if (*output == -1) {
*output = fd;

if (perf_mmap__mmap(&maps[idx], mp, *output, evlist_cpu) < 0)
@@ -822,8 +851,12 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
* Therefore don't add it for polling.
*/
if (!evsel->system_wide &&
- __perf_evlist__add_pollfd(evlist, fd, &maps[idx], revent) < 0) {
- perf_mmap__put(&maps[idx]);
+ __perf_evlist__add_pollfd(
+ evlist, fd,
+ evsel->vip ? &vip_maps[idx] : &maps[idx],
+ revent) < 0) {
+ perf_mmap__put(evsel->vip ?
+ &vip_maps[idx] : &maps[idx]);
return -1;
}

@@ -1035,6 +1068,9 @@ int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
if (!evlist->mmap)
return -ENOMEM;

+ if (!evlist->vip_mmap)
+ evlist->vip_mmap = perf_evlist__alloc_mmap(evlist, false);
+
if (evlist->pollfd.entries == NULL && perf_evlist__alloc_pollfd(evlist) < 0)
return -ENOMEM;

diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index dc66436add98..6d99e8dab570 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -26,6 +26,7 @@ struct record_opts;

struct perf_evlist {
struct list_head entries;
+ struct list_head vip_entries;
struct hlist_head heads[PERF_EVLIST__HLIST_SIZE];
int nr_entries;
int nr_groups;
@@ -43,6 +44,7 @@ struct perf_evlist {
} workload;
struct fdarray pollfd;
struct perf_mmap *mmap;
+ struct perf_mmap *vip_mmap;
struct perf_mmap *overwrite_mmap;
struct thread_map *threads;
struct cpu_map *cpus;
@@ -84,6 +86,8 @@ int __perf_evlist__add_default_attrs(struct perf_evlist *evlist,

int perf_evlist__add_dummy(struct perf_evlist *evlist);

+int perf_evlist__add_bpf_tracker(struct perf_evlist *evlist);
+
int perf_evlist__add_newtp(struct perf_evlist *evlist,
const char *sys, const char *name, void *handler);

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index af9d539e4b6a..94456a493607 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -235,6 +235,7 @@ void perf_evsel__init(struct perf_evsel *evsel,
evsel->evlist = NULL;
evsel->bpf_fd = -1;
INIT_LIST_HEAD(&evsel->node);
+ INIT_LIST_HEAD(&evsel->vip_node);
INIT_LIST_HEAD(&evsel->config_terms);
perf_evsel__object.init(evsel);
evsel->sample_size = __perf_evsel__sample_size(attr->sample_type);
@@ -1795,6 +1796,8 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
PERF_SAMPLE_BRANCH_NO_CYCLES);
if (perf_missing_features.group_read && evsel->attr.inherit)
evsel->attr.read_format &= ~(PERF_FORMAT_GROUP|PERF_FORMAT_ID);
+ if (perf_missing_features.bpf_event)
+ evsel->attr.bpf_event = 0;
retry_sample_id:
if (perf_missing_features.sample_id_all)
evsel->attr.sample_id_all = 0;
@@ -1939,6 +1942,11 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
perf_missing_features.exclude_guest = true;
pr_debug2("switching off exclude_guest, exclude_host\n");
goto fallback_missing_features;
+ } else if (!perf_missing_features.bpf_event &&
+ evsel->attr.bpf_event) {
+ perf_missing_features.bpf_event = true;
+ pr_debug2("switching off bpf_event\n");
+ goto fallback_missing_features;
} else if (!perf_missing_features.sample_id_all) {
perf_missing_features.sample_id_all = true;
pr_debug2("switching off sample_id_all\n");
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 4107c39f4a54..82b1d3e42603 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -89,6 +89,7 @@ struct perf_stat_evsel;
*/
struct perf_evsel {
struct list_head node;
+ struct list_head vip_node;
struct perf_evlist *evlist;
struct perf_event_attr attr;
char *filter;
@@ -128,6 +129,7 @@ struct perf_evsel {
bool ignore_missing_thread;
bool forced_leader;
bool use_uncore_alias;
+ bool vip; /* vip events have their own mmap */
/* parse modifier helper */
int exclude_GH;
int nr_members;
@@ -163,6 +165,7 @@ struct perf_missing_features {
bool lbr_flags;
bool write_backward;
bool group_read;
+ bool bpf_event;
};

extern struct perf_missing_features perf_missing_features;
--
2.17.1


2018-11-06 20:56:32

by Song Liu

Subject: [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event

This patch introduces struct bpf_prog_info_event to union perf_event.

struct bpf_prog_info_event {
struct perf_event_header header;
u32 prog_info_len;
u32 ksym_table_len;
u64 ksym_table;
struct bpf_prog_info prog_info;
char data[];
};

struct bpf_prog_info_event contains information about a bpf program.
These events are written to perf.data by perf-record, and processed by
perf-report.

struct bpf_prog_info_event uses arrays for some data (ksym_table, and the
arrays in struct bpf_prog_info). To make these arrays easy to serialize,
we allocate one contiguous chunk of memory (data). The array pointers are
translated to offsets within bpf_prog_info_event before the event is
written to file, and back to pointers when the event is read from file.
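
To illustrate the idea on a reduced, hypothetical event with a single
array pointer (this sketch is not the patch code itself):

struct toy_event {
	__u64 table;	/* points into data[] while in memory */
	char data[];
};

/* before writing to file: store an offset relative to the event */
static void toy_ptr_to_offset(struct toy_event *e)
{
	e->table -= (__u64)(unsigned long)e;
}

/* after reading from file: turn the offset back into a pointer */
static void toy_offset_to_ptr(struct toy_event *e)
{
	e->table += (__u64)(unsigned long)e;
}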

This patch enables synthesizing these events at the beginning of the
perf-record run. The next patch handles short-lived bpf programs that are
created and destroyed while perf-record runs.

Signed-off-by: Song Liu <[email protected]>
---
tools/perf/builtin-record.c | 5 +
tools/perf/builtin-report.c | 2 +
tools/perf/util/Build | 2 +
tools/perf/util/bpf-info.c | 287 ++++++++++++++++++++++++++++++++++++
tools/perf/util/bpf-info.h | 29 ++++
tools/perf/util/event.c | 1 +
tools/perf/util/event.h | 14 ++
tools/perf/util/session.c | 4 +
tools/perf/util/tool.h | 3 +-
9 files changed, 346 insertions(+), 1 deletion(-)
create mode 100644 tools/perf/util/bpf-info.c
create mode 100644 tools/perf/util/bpf-info.h

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 0980dfe3396b..73b02bde1ebc 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -41,6 +41,7 @@
#include "util/perf-hooks.h"
#include "util/time-utils.h"
#include "util/units.h"
+#include "util/bpf-info.h"
#include "asm/bug.h"

#include <errno.h>
@@ -850,6 +851,9 @@ static int record__synthesize(struct record *rec, bool tail)
err = __machine__synthesize_threads(machine, tool, &opts->target, rec->evlist->threads,
process_synthesized_event, opts->sample_address,
opts->proc_map_timeout, 1);
+
+ err = perf_event__synthesize_bpf_prog_info(
+ &rec->tool, process_synthesized_event, machine);
out:
return err;
}
@@ -1531,6 +1535,7 @@ static struct record record = {
.namespaces = perf_event__process_namespaces,
.mmap = perf_event__process_mmap,
.mmap2 = perf_event__process_mmap2,
+ .bpf_prog_info = perf_event__process_bpf_prog_info,
.ordered_events = true,
},
};
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index c0703979c51d..4a9a3e8da4e0 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -41,6 +41,7 @@
#include "util/auxtrace.h"
#include "util/units.h"
#include "util/branch.h"
+#include "util/bpf-info.h"

#include <dlfcn.h>
#include <errno.h>
@@ -981,6 +982,7 @@ int cmd_report(int argc, const char **argv)
.auxtrace_info = perf_event__process_auxtrace_info,
.auxtrace = perf_event__process_auxtrace,
.feature = process_feature_event,
+ .bpf_prog_info = perf_event__process_bpf_prog_info,
.ordered_events = true,
.ordering_requires_timestamps = true,
},
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index ecd9f9ceda77..624c7281217c 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -150,6 +150,8 @@ endif

libperf-y += perf-hooks.o

+libperf-$(CONFIG_LIBBPF) += bpf-info.o
+
libperf-$(CONFIG_CXX) += c++/

CFLAGS_config.o += -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
diff --git a/tools/perf/util/bpf-info.c b/tools/perf/util/bpf-info.c
new file mode 100644
index 000000000000..fa598c4328be
--- /dev/null
+++ b/tools/perf/util/bpf-info.c
@@ -0,0 +1,287 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2018 Facebook
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <bpf/bpf.h>
+#include "bpf-info.h"
+#include "debug.h"
+#include "session.h"
+
+#define KSYM_NAME_LEN 128
+#define BPF_PROG_INFO_MIN_SIZE 128 /* minimum size to include jited_func_lens */
+
+static inline __u64 ptr_to_u64(const void *ptr)
+{
+ return (__u64) (unsigned long) ptr;
+}
+
+/* fetch information of the bpf program via bpf syscall. */
+struct bpf_prog_info_event *perf_bpf_info__get_bpf_prog_info_event(u32 prog_id)
+{
+ struct bpf_prog_info_event *prog_info_event = NULL;
+ struct bpf_prog_info info = {};
+ u32 info_len = sizeof(info);
+ u32 event_len, i;
+ int fd, err;
+ void *ptr;
+
+ fd = bpf_prog_get_fd_by_id(prog_id);
+ if (fd < 0) {
+ pr_debug("Failed to get fd for prog_id %u\n", prog_id);
+ return NULL;
+ }
+
+ err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+ if (err) {
+ pr_debug("can't get prog info: %s", strerror(errno));
+ goto close_fd;
+ }
+ if (info_len < BPF_PROG_INFO_MIN_SIZE) {
+ pr_debug("kernel is too old to support proper prog info\n");
+ goto close_fd;
+ }
+
+ /* calculate size of bpf_prog_info_event */
+ event_len = sizeof(struct bpf_prog_info_event);
+ event_len += info_len;
+ event_len -= sizeof(info);
+ event_len += info.jited_prog_len;
+ event_len += info.xlated_prog_len;
+ event_len += info.nr_map_ids * sizeof(u32);
+ event_len += info.nr_jited_ksyms * sizeof(u64);
+ event_len += info.nr_jited_func_lens * sizeof(u32);
+ event_len += info.nr_jited_ksyms * KSYM_NAME_LEN;
+
+ prog_info_event = (struct bpf_prog_info_event *) malloc(event_len);
+ if (!prog_info_event)
+ goto close_fd;
+
+ /* assign pointers for map_ids, jited_prog_insns, etc. */
+ ptr = prog_info_event->data;
+ info.map_ids = ptr_to_u64(ptr);
+ ptr += info.nr_map_ids * sizeof(u32);
+ info.jited_prog_insns = ptr_to_u64(ptr);
+ ptr += info.jited_prog_len;
+ info.xlated_prog_insns = ptr_to_u64(ptr);
+ ptr += info.xlated_prog_len;
+ info.jited_ksyms = ptr_to_u64(ptr);
+ ptr += info.nr_jited_ksyms * sizeof(u64);
+ info.jited_func_lens = ptr_to_u64(ptr);
+ ptr += info.nr_jited_func_lens * sizeof(u32);
+
+ err = bpf_obj_get_info_by_fd(fd, &info, &info_len);
+ if (err) {
+ pr_err("can't get prog info: %s\n", strerror(errno));
+ free(prog_info_event);
+ prog_info_event = NULL;
+ goto close_fd;
+ }
+
+ /* fill data in prog_info_event */
+ prog_info_event->header.type = PERF_RECORD_BPF_PROG_INFO;
+ prog_info_event->header.misc = 0;
+ prog_info_event->prog_info_len = info_len;
+
+ memcpy(&prog_info_event->prog_info, &info, info_len);
+
+ prog_info_event->ksym_table_len = 0;
+ prog_info_event->ksym_table = ptr_to_u64(ptr);
+
+ /* fill in fake symbol name for now, add real name after BTF */
+ if (info.nr_jited_func_lens == 1 && info.name) { /* only main prog */
+ size_t l;
+
+ assert(info.nr_jited_ksyms == 1);
+ l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%s", info.name);
+ prog_info_event->ksym_table_len += l + 1;
+ ptr += l + 1;
+
+ } else {
+ assert(info.nr_jited_ksyms == info.nr_jited_func_lens);
+
+ for (i = 0; i < info.nr_jited_ksyms; i++) {
+ size_t l;
+
+ l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%d_%d",
+ info.id, i);
+ prog_info_event->ksym_table_len += l + 1;
+ ptr += l + 1;
+ }
+ }
+
+ prog_info_event->header.size = ptr - (void *)prog_info_event;
+
+close_fd:
+ close(fd);
+ return prog_info_event;
+}
+
+static size_t fprintf_bpf_prog_info(
+ struct bpf_prog_info_event *prog_info_event, FILE *fp)
+{
+ struct bpf_prog_info *info = &prog_info_event->prog_info;
+ unsigned long *jited_ksyms = (unsigned long *)(info->jited_ksyms);
+ char *name_ptr = (char *) prog_info_event->ksym_table;
+ unsigned int i;
+ size_t ret;
+
+ ret = fprintf(fp, "bpf_prog: type: %u id: %u ", info->type, info->id);
+ ret += fprintf(fp, "nr_jited_ksyms: %u\n", info->nr_jited_ksyms);
+
+ for (i = 0; i < info->nr_jited_ksyms; i++) {
+ ret += fprintf(fp, "jited_ksyms[%u]: %lx %s\n",
+ i, jited_ksyms[i], name_ptr);
+ name_ptr += strlen(name_ptr) + 1;
+ }
+ return ret;
+}
+
+size_t perf_event__fprintf_bpf_prog_info(union perf_event *event, FILE *fp)
+{
+ return fprintf_bpf_prog_info(&event->bpf_prog_info, fp);
+}
+
+/*
+ * translate all array ptr to offset from base address, called before
+ * writing the event to file
+ */
+void perf_bpf_info__ptr_to_offset(
+ struct bpf_prog_info_event *prog_info_event)
+{
+ u64 base = ptr_to_u64(prog_info_event);
+
+ prog_info_event->ksym_table -= base;
+ prog_info_event->prog_info.jited_prog_insns -= base;
+ prog_info_event->prog_info.xlated_prog_insns -= base;
+ prog_info_event->prog_info.map_ids -= base;
+ prog_info_event->prog_info.jited_ksyms -= base;
+ prog_info_event->prog_info.jited_func_lens -= base;
+}
+
+/*
+ * translate offset from base address to array pointer, called after
+ * reading the event from file
+ */
+void perf_bpf_info__offset_to_ptr(
+ struct bpf_prog_info_event *prog_info_event)
+{
+ u64 base = ptr_to_u64(prog_info_event);
+
+ prog_info_event->ksym_table += base;
+ prog_info_event->prog_info.jited_prog_insns += base;
+ prog_info_event->prog_info.xlated_prog_insns += base;
+ prog_info_event->prog_info.map_ids += base;
+ prog_info_event->prog_info.jited_ksyms += base;
+ prog_info_event->prog_info.jited_func_lens += base;
+}
+
+int perf_event__synthesize_one_bpf_prog_info(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine,
+ __u32 id)
+{
+ struct bpf_prog_info_event *prog_info_event;
+
+ prog_info_event = perf_bpf_info__get_bpf_prog_info_event(id);
+
+ if (!prog_info_event) {
+ pr_err("Failed to get prog_info_event\n");
+ return -1;
+ }
+ perf_bpf_info__ptr_to_offset(prog_info_event);
+
+ if (perf_tool__process_synth_event(
+ tool, (union perf_event *)prog_info_event,
+ machine, process) != 0) {
+ free(prog_info_event);
+ return -1;
+ }
+
+ free(prog_info_event);
+ return 0;
+}
+
+int perf_event__synthesize_bpf_prog_info(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine)
+{
+ __u32 id = 0;
+ int err = 0;
+
+ while (true) {
+ err = bpf_prog_get_next_id(id, &id);
+ if (err) {
+ if (errno == ENOENT) {
+ err = 0;
+ break;
+ }
+ fprintf(stderr, "can't get next program: %s%s\n",
+ strerror(errno),
+ errno == EINVAL ? " -- kernel too old?" : "");
+ err = -1;
+ break;
+ }
+ err = perf_event__synthesize_one_bpf_prog_info(
+ tool, process, machine, id);
+ }
+ return err;
+}
+
+int perf_event__process_bpf_prog_info(struct perf_session *session,
+ union perf_event *event)
+{
+ struct machine *machine = &session->machines.host;
+ struct bpf_prog_info_event *prog_info_event;
+ struct bpf_prog_info *info;
+ struct symbol *sym;
+ struct map *map;
+ char *name_ptr;
+ int ret = 0;
+ u64 *addrs;
+ u32 *lens;
+ u32 i;
+
+ prog_info_event = (struct bpf_prog_info_event *)
+ malloc(event->header.size);
+ if (!prog_info_event)
+ return -ENOMEM;
+
+ /* copy the data to rw memory so we can modify it */
+ memcpy(prog_info_event, &event->bpf_prog_info, event->header.size);
+ info = &prog_info_event->prog_info;
+
+ perf_bpf_info__offset_to_ptr(prog_info_event);
+ name_ptr = (char *) prog_info_event->ksym_table;
+ addrs = (u64 *)info->jited_ksyms;
+ lens = (u32 *)info->jited_func_lens;
+ for (i = 0; i < info->nr_jited_ksyms; i++) {
+ u32 len = info->nr_jited_func_lens == 1 ?
+ info->jited_prog_len : lens[i];
+
+ map = map_groups__find(&machine->kmaps, addrs[i]);
+ if (!map) {
+ map = dso__new_map("bpf_prog");
+ if (!map) {
+ ret = -ENOMEM;
+ break;
+ }
+ map->start = addrs[i];
+ map->pgoff = map->start;
+ map->end = map->start + len;
+ map_groups__insert(&machine->kmaps, map);
+ }
+
+ sym = symbol__new(addrs[i], len, 0, 0, name_ptr);
+ if (!sym) {
+ ret = -ENOMEM;
+ break;
+ }
+ dso__insert_symbol(map->dso, sym);
+ name_ptr += strlen(name_ptr) + 1;
+ }
+
+ free(prog_info_event);
+ return ret;
+}
diff --git a/tools/perf/util/bpf-info.h b/tools/perf/util/bpf-info.h
new file mode 100644
index 000000000000..813cad07bacb
--- /dev/null
+++ b/tools/perf/util/bpf-info.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PERF_BPF_INFO_H
+#define __PERF_BPF_INFO_H
+
+#include "event.h"
+#include "machine.h"
+#include "tool.h"
+#include "symbol.h"
+
+struct bpf_prog_info_event *perf_bpf_info__get_bpf_prog_info_event(u32 prog_id);
+
+size_t perf_event__fprintf_bpf_prog_info(union perf_event *event, FILE *fp);
+
+int perf_event__synthesize_one_bpf_prog_info(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine,
+ __u32 id);
+
+int perf_event__synthesize_bpf_prog_info(struct perf_tool *tool,
+ perf_event__handler_t process,
+ struct machine *machine);
+
+void perf_bpf_info__ptr_to_offset(struct bpf_prog_info_event *prog_info_event);
+void perf_bpf_info__offset_to_ptr(struct bpf_prog_info_event *prog_info_event);
+
+int perf_event__process_bpf_prog_info(struct perf_session *session,
+ union perf_event *event);
+
+#endif /* __PERF_BPF_INFO_H */
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 601432afbfb2..33b1c168b83e 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -61,6 +61,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_EVENT_UPDATE] = "EVENT_UPDATE",
[PERF_RECORD_TIME_CONV] = "TIME_CONV",
[PERF_RECORD_HEADER_FEATURE] = "FEATURE",
+ [PERF_RECORD_BPF_PROG_INFO] = "BPF_PROG_INFO",
};

static const char *perf_ns__names[] = {
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 13a0c64dd0ed..dc64d800eaa6 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -5,6 +5,7 @@
#include <limits.h>
#include <stdio.h>
#include <linux/kernel.h>
+#include <linux/bpf.h>

#include "../perf.h"
#include "build-id.h"
@@ -258,6 +259,7 @@ enum perf_user_event_type { /* above any possible kernel type */
PERF_RECORD_EVENT_UPDATE = 78,
PERF_RECORD_TIME_CONV = 79,
PERF_RECORD_HEADER_FEATURE = 80,
+ PERF_RECORD_BPF_PROG_INFO = 81,
PERF_RECORD_HEADER_MAX
};

@@ -629,6 +631,17 @@ struct feature_event {
char data[];
};

+#define KSYM_NAME_LEN 128
+
+struct bpf_prog_info_event {
+ struct perf_event_header header;
+ u32 prog_info_len;
+ u32 ksym_table_len;
+ u64 ksym_table;
+ struct bpf_prog_info prog_info;
+ char data[];
+};
+
union perf_event {
struct perf_event_header header;
struct mmap_event mmap;
@@ -661,6 +674,7 @@ union perf_event {
struct time_conv_event time_conv;
struct feature_event feat;
struct bpf_event bpf_event;
+ struct bpf_prog_info_event bpf_prog_info;
};

void perf_event__print_totals(void);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index dffe5120d2d3..5365ee1dfbec 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -415,6 +415,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->time_conv = process_event_op2_stub;
if (tool->feature == NULL)
tool->feature = process_event_op2_stub;
+ if (tool->bpf_prog_info == NULL)
+ tool->bpf_prog_info = process_event_op2_stub;
}

static void swap_sample_id_all(union perf_event *event, void *data)
@@ -1397,6 +1399,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
return tool->time_conv(session, event);
case PERF_RECORD_HEADER_FEATURE:
return tool->feature(session, event);
+ case PERF_RECORD_BPF_PROG_INFO:
+ return tool->bpf_prog_info(session, event);
default:
return -EINVAL;
}
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 69ae898ca024..739a4b1188f7 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -70,7 +70,8 @@ struct perf_tool {
stat_config,
stat,
stat_round,
- feature;
+ feature,
+ bpf_prog_info;
event_op3 auxtrace;
bool ordered_events;
bool ordering_requires_timestamps;
--
2.17.1


2018-11-06 21:14:08

by Alexei Starovoitov

Subject: Re: [RFC perf,bpf 4/5] perf util: introduce bpf_prog_info_event

On 11/6/18 12:52 PM, Song Liu wrote:
> + /* fill in fake symbol name for now, add real name after BTF */
> + if (info.nr_jited_func_lens == 1 && info.name) { /* only main prog */
> + size_t l;
> +
> + assert(info.nr_jited_ksyms == 1);
> + l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%s", info.name);

please include the prog tag here. Just like kernel kallsyms do.
Other than this small nit the patch set looks great to me.
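
For illustration, such a tag-based name could be built from the 8-byte
tag field of struct bpf_prog_info, matching the kernel's
bpf_prog_<tag>_<name> kallsyms format (a sketch, not part of the posted
patch):

	char tag[BPF_TAG_SIZE * 2 + 1];
	unsigned int i;

	for (i = 0; i < BPF_TAG_SIZE; i++)
		snprintf(&tag[i * 2], 3, "%02x", info.tag[i]);
	/* e.g. "bpf_prog_1accc788e7f04c38_balancer_ingres" */
	l = snprintf(ptr, KSYM_NAME_LEN, "bpf_prog_%s_%s", tag, info.name);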

2018-11-06 21:52:20

by Song Liu

Subject: [RFC perf,bpf 3/5] perf util: basic handling of PERF_RECORD_BPF_EVENT

This patch adds basic handling of PERF_RECORD_BPF_EVENT in perf util.
The following patches add more logic that leverages these events.

Signed-off-by: Song Liu <[email protected]>
---
tools/perf/util/event.c | 19 +++++++++++++++++++
tools/perf/util/event.h | 15 +++++++++++++++
tools/perf/util/evsel.c | 1 +
tools/perf/util/machine.c | 10 ++++++++++
tools/perf/util/machine.h | 2 ++
tools/perf/util/session.c | 4 ++++
tools/perf/util/tool.h | 4 +++-
7 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 0cd42150f712..601432afbfb2 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -43,6 +43,7 @@ static const char *perf_event__names[] = {
[PERF_RECORD_SWITCH] = "SWITCH",
[PERF_RECORD_SWITCH_CPU_WIDE] = "SWITCH_CPU_WIDE",
[PERF_RECORD_NAMESPACES] = "NAMESPACES",
+ [PERF_RECORD_BPF_EVENT] = "BPF_EVENT",
[PERF_RECORD_HEADER_ATTR] = "ATTR",
[PERF_RECORD_HEADER_EVENT_TYPE] = "EVENT_TYPE",
[PERF_RECORD_HEADER_TRACING_DATA] = "TRACING_DATA",
@@ -1333,6 +1334,14 @@ int perf_event__process_switch(struct perf_tool *tool __maybe_unused,
return machine__process_switch_event(machine, event);
}

+int perf_event__process_bpf_event(struct perf_tool *tool __maybe_unused,
+ union perf_event *event,
+ struct perf_sample *sample __maybe_unused,
+ struct machine *machine)
+{
+ return machine__process_bpf_event(machine, event);
+}
+
size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
{
return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -1465,6 +1474,13 @@ static size_t perf_event__fprintf_lost(union perf_event *event, FILE *fp)
return fprintf(fp, " lost %" PRIu64 "\n", event->lost.lost);
}

+size_t perf_event__fprintf_bpf_event(union perf_event *event, FILE *fp)
+{
+ return fprintf(fp, " bpf event with type %u, flags %u, id %u\n",
+ event->bpf_event.type, event->bpf_event.flags,
+ event->bpf_event.id);
+}
+
size_t perf_event__fprintf(union perf_event *event, FILE *fp)
{
size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -1500,6 +1516,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
case PERF_RECORD_LOST:
ret += perf_event__fprintf_lost(event, fp);
break;
+ case PERF_RECORD_BPF_EVENT:
+ ret += perf_event__fprintf_bpf_event(event, fp);
+ break;
default:
ret += fprintf(fp, "\n");
}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index bfa60bcafbde..13a0c64dd0ed 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -84,6 +84,15 @@ struct throttle_event {
u64 stream_id;
};

+
+/* Record different types of bpf events, see enum perf_bpf_event_type */
+struct bpf_event {
+ struct perf_event_header header;
+ u16 type;
+ u16 flags;
+ u32 id;
+};
+
#define PERF_SAMPLE_MASK \
(PERF_SAMPLE_IP | PERF_SAMPLE_TID | \
PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR | \
@@ -651,6 +660,7 @@ union perf_event {
struct stat_round_event stat_round;
struct time_conv_event time_conv;
struct feature_event feat;
+ struct bpf_event bpf_event;
};

void perf_event__print_totals(void);
@@ -750,6 +760,10 @@ int perf_event__process_exit(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct machine *machine);
+int perf_event__process_bpf_event(struct perf_tool *tool,
+ union perf_event *event,
+ struct perf_sample *sample,
+ struct machine *machine);
int perf_tool__process_synth_event(struct perf_tool *tool,
union perf_event *event,
struct machine *machine,
@@ -814,6 +828,7 @@ size_t perf_event__fprintf_switch(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_thread_map(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_cpu_map(union perf_event *event, FILE *fp);
size_t perf_event__fprintf_namespaces(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_bpf_event(union perf_event *event, FILE *fp);
size_t perf_event__fprintf(union perf_event *event, FILE *fp);

int kallsyms__get_function_start(const char *kallsyms_filename,
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index cb7f01059940..af9d539e4b6a 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1635,6 +1635,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
PRINT_ATTRf(context_switch, p_unsigned);
PRINT_ATTRf(write_backward, p_unsigned);
PRINT_ATTRf(namespaces, p_unsigned);
+ PRINT_ATTRf(bpf_event, p_unsigned);

PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
PRINT_ATTRf(bp_type, p_unsigned);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 111ae858cbcb..74c295f6fdc4 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -681,6 +681,14 @@ int machine__process_switch_event(struct machine *machine __maybe_unused,
return 0;
}

+int machine__process_bpf_event(struct machine *machine __maybe_unused,
+ union perf_event *event)
+{
+ if (dump_trace)
+ perf_event__fprintf_bpf_event(event, stdout);
+ return 0;
+}
+
static void dso__adjust_kmod_long_name(struct dso *dso, const char *filename)
{
const char *dup_filename;
@@ -1795,6 +1803,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
case PERF_RECORD_SWITCH:
case PERF_RECORD_SWITCH_CPU_WIDE:
ret = machine__process_switch_event(machine, event); break;
+ case PERF_RECORD_BPF_EVENT:
+ ret = machine__process_bpf_event(machine, event); break;
default:
ret = -1;
break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d856b85862e2..0ed237d6fd0a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -127,6 +127,8 @@ int machine__process_mmap_event(struct machine *machine, union perf_event *event
struct perf_sample *sample);
int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);
+int machine__process_bpf_event(struct machine *machine,
+ union perf_event *event);
int machine__process_event(struct machine *machine, union perf_event *event,
struct perf_sample *sample);

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7d2c8ce6cfad..dffe5120d2d3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -371,6 +371,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
tool->itrace_start = perf_event__process_itrace_start;
if (tool->context_switch == NULL)
tool->context_switch = perf_event__process_switch;
+ if (tool->bpf_event == NULL)
+ tool->bpf_event = perf_event__process_bpf_event;
if (tool->read == NULL)
tool->read = process_event_sample_stub;
if (tool->throttle == NULL)
@@ -1300,6 +1302,8 @@ static int machines__deliver_event(struct machines *machines,
case PERF_RECORD_SWITCH:
case PERF_RECORD_SWITCH_CPU_WIDE:
return tool->context_switch(tool, event, sample, machine);
+ case PERF_RECORD_BPF_EVENT:
+ return tool->bpf_event(tool, event, sample, machine);
default:
++evlist->stats.nr_unknown_events;
return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 56e4ca54020a..69ae898ca024 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -53,7 +53,9 @@ struct perf_tool {
itrace_start,
context_switch,
throttle,
- unthrottle;
+ unthrottle,
+ bpf_event;
+
event_attr_op attr;
event_attr_op event_update;
event_op2 tracing_data;
--
2.17.1


2018-11-06 21:56:54

by David Ahern

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 1:52 PM, Song Liu wrote:
> +
> static int record__mmap_read_all(struct record *rec)
> {
> int err;
>
> + err = record__mmap_process_vip_events(rec);
> + if (err)
> + return err;
> +
> err = record__mmap_read_evlist(rec, rec->evlist, false);
> if (err)
> return err;

Seems to me that is going to increase the overhead of perf on any system
doing BPF updates. The BPF events cause a wakeup every load and unload,
and perf processes not only the VIP events but then walks all of the
other maps.

> @@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
> "signal"),
> OPT_BOOLEAN(0, "dry-run", &dry_run,
> "Parse options then exit"),
> + OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
> + "do not record event on bpf program load/unload"),

Why should this default to on? If I am recording FIB events, I don't care
about BPF events.


2018-11-06 23:19:41

by Song Liu

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs



> On Nov 6, 2018, at 1:54 PM, David Ahern <[email protected]> wrote:
>
> On 11/6/18 1:52 PM, Song Liu wrote:
>> +
>> static int record__mmap_read_all(struct record *rec)
>> {
>> int err;
>>
>> + err = record__mmap_process_vip_events(rec);
>> + if (err)
>> + return err;
>> +
>> err = record__mmap_read_evlist(rec, rec->evlist, false);
>> if (err)
>> return err;
>
> Seems to me that is going to increase the overhead of perf on any system
> doing BPF updates. The BPF events cause a wakeup every load and unload,
> and perf processes not only the VIP events but then walks all of the
> other maps.

BPF prog load/unload events should be rare in real-world use cases.
So I think the overhead is OK. Also, I don't see an easy way to improve
this.

>
>> @@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
>> "signal"),
>> OPT_BOOLEAN(0, "dry-run", &dry_run,
>> "Parse options then exit"),
>> + OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
>> + "do not record event on bpf program load/unload"),
>
> Why should this default to on? If I am recording FIB events, I don't care
> about BPF events.
>

I am OK with default off if that's the preferred way.

Thanks,
Song


2018-11-06 23:31:52

by Alexei Starovoitov

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 3:17 PM, Song Liu wrote:
>
>
>> On Nov 6, 2018, at 1:54 PM, David Ahern <[email protected]> wrote:
>>
>> On 11/6/18 1:52 PM, Song Liu wrote:
>>> +
>>> static int record__mmap_read_all(struct record *rec)
>>> {
>>> int err;
>>>
>>> + err = record__mmap_process_vip_events(rec);
>>> + if (err)
>>> + return err;
>>> +
>>> err = record__mmap_read_evlist(rec, rec->evlist, false);
>>> if (err)
>>> return err;
>>
>> Seems to me that is going to increase the overhead of perf on any system
>> doing BPF updates. The BPF events cause a wakeup every load and unload,
>> and perf processes not only the VIP events but then walks all of the
>> other maps.
>
> BPF prog load/unload events should be rare in real-world use cases.
> So I think the overhead is OK. Also, I don't see an easy way to improve
> this.
>
>>
>>> @@ -1686,6 +1734,8 @@ static struct option __record_options[] = {
>>> "signal"),
>>> OPT_BOOLEAN(0, "dry-run", &dry_run,
>>> "Parse options then exit"),
>>> + OPT_BOOLEAN(0, "no-bpf-event", &record.no_bpf_event,
>>> + "do not record event on bpf program load/unload"),
>>
>> Why should this default to on? If I am recording FIB events, I don't care
>> about BPF events.
>>
>
> I am OK with default off if that's the preferred way.

I think concerns with perf overhead from collecting bpf events
are unfounded.
I would prefer for this flag to be on by default.

2018-11-06 23:38:45

by David Miller

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

From: Alexei Starovoitov <[email protected]>
Date: Tue, 6 Nov 2018 23:29:07 +0000

> I think concerns with perf overhead from collecting bpf events
> are unfounded.
> I would prefer for this flag to be on by default.

I will sit in userspace looping over bpf load/unload and see how the
person trying to monitor something else with perf feels about that.

Really, it is inappropriate to turn this on by default, I completely
agree with David Ahern.

It's hard enough, _AS IS_, for me to fight back all of the bloat that
is in perf right now and get it back to being able to handle simple
full workloads without dropping events..

Every new event type like this sets us back.

If people want to monitor new things, or have new functionality, fine.

But not by default, please.

2018-11-07 00:15:33

by Alexei Starovoitov

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 3:36 PM, David Miller wrote:
> From: Alexei Starovoitov <[email protected]>
> Date: Tue, 6 Nov 2018 23:29:07 +0000
>
>> I think concerns with perf overhead from collecting bpf events
>> are unfounded.
>> I would prefer for this flag to be on by default.
>
> I will sit in userspace looping over bpf load/unload and see how the
> person trying to monitor something else with perf feels about that.
>
> Really, it is inappropriate to turn this on by default, I completely
> agree with David Ahern.
>
> It's hard enough, _AS IS_, for me to fight back all of the bloat that
> is in perf right now and get it back to being able to handle simple
> full workloads without dropping events..

It's a separate perf thread and a separate event with its own epoll.
I don't see how it can affect the main event collection.
Let's put it this way. If it does have an effect, then yes,
it should not be on. If it does not, there is no downside to keeping it on.
A typical user expects to type 'perf record' and see everything that
is happening on the system. Right now short-lived bpf programs
will not be seen. How is the user supposed to know when to use the flag?
The only option is to always pass the flag 'just in case',
which is an unnecessary burden.
The problem of dropped events is certainly valid, but it's
a separate issue. The aio work that Alexey Budankov is doing
is supposed to address that.

2018-11-07 00:25:37

by David Ahern

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
> On 11/6/18 3:36 PM, David Miller wrote:
>> From: Alexei Starovoitov <[email protected]>
>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>
>>> I think concerns with perf overhead from collecting bpf events
>>> are unfounded.
>>> I would prefer for this flag to be on by default.
>>
>> I will sit in userspace looping over bpf load/unload and see how the
>> person trying to monitor something else with perf feels about that.
>>
>> Really, it is inappropriate to turn this on by default, I completely
>> agree with David Ahern.
>>
>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>> is in perf right now and get it back to being able to handle simple
>> full workloads without dropping events..
>
> It's a separate perf thread and a separate event with its own epoll.
> I don't see how it can affect the main event collection.
> Let's put it this way. If it does have an effect, then yes,
> it should not be on. If it does not, there is no downside to keeping it on.
> A typical user expects to type 'perf record' and see everything that
> is happening on the system. Right now short-lived bpf programs
> will not be seen. How is the user supposed to know when to use the flag?

The default is profiling where perf record collects task events and
periodic samples. So for the default record/report, the bpf load /
unload events are not relevant.

> The only option is to always pass the flag 'just in case',
> which is an unnecessary burden.

People who care about an event enable collection of that event.

2018-11-07 00:27:46

by Alexei Starovoitov

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 4:23 PM, David Ahern wrote:
> On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
>> On 11/6/18 3:36 PM, David Miller wrote:
>>> From: Alexei Starovoitov <[email protected]>
>>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>>
>>>> I think concerns with perf overhead from collecting bpf events
>>>> are unfounded.
>>>> I would prefer for this flag to be on by default.
>>>
>>> I will sit in userspace looping over bpf load/unload and see how the
>>> person trying to monitor something else with perf feels about that.
>>>
>>> Really, it is inappropriate to turn this on by default, I completely
>>> agree with David Ahern.
>>>
>>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>>> is in perf right now and get it back to being able to handle simple
>>> full workloads without dropping events..
>>
>> It's a separate perf thread and a separate event with its own epoll.
>> I don't see how it can affect the main event collection.
>> Let's put it this way. If it does have an effect, then yes,
>> it should not be on. If it does not, there is no downside to keeping it on.
>> A typical user expects to type 'perf record' and see everything that
>> is happening on the system. Right now short-lived bpf programs
>> will not be seen. How is the user supposed to know when to use the flag?
>
> The default is profiling where perf record collects task events and
> periodic samples. So for the default record/report, the bpf load /
> unload events are not relevant.

Exactly the opposite.
It's for default 'perf record' collection of periodic samples.
It can be off for -e collection. That's easy.

2018-11-07 00:52:19

by David Ahern

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 5:26 PM, Alexei Starovoitov wrote:
> On 11/6/18 4:23 PM, David Ahern wrote:
>> On 11/6/18 5:13 PM, Alexei Starovoitov wrote:
>>> On 11/6/18 3:36 PM, David Miller wrote:
>>>> From: Alexei Starovoitov <[email protected]>
>>>> Date: Tue, 6 Nov 2018 23:29:07 +0000
>>>>
>>>>> I think concerns with perf overhead from collecting bpf events
>>>>> are unfounded.
>>>>> I would prefer for this flag to be on by default.
>>>>
>>>> I will sit in userspace looping over bpf load/unload and see how the
>>>> person trying to monitor something else with perf feels about that.
>>>>
>>>> Really, it is inappropriate to turn this on by default, I completely
>>>> agree with David Ahern.
>>>>
>>>> It's hard enough, _AS IS_, for me to fight back all of the bloat that
>>>> is in perf right now and get it back to being able to handle simple
>>>> full workloads without dropping events..
>>>
>>> It's a separate perf thread and a separate event with its own epoll.
>>> I don't see how it can affect the main event collection.
>>> Let's put it this way. If it does have an effect, then yes,
>>> it should not be on. If it does not, there is no downside to keeping it on.
>>> A typical user expects to type 'perf record' and see everything that
>>> is happening on the system. Right now short-lived bpf programs
>>> will not be seen. How is the user supposed to know when to use the flag?
>>
>> The default is profiling where perf record collects task events and
>> periodic samples. So for the default record/report, the bpf load /
>> unload events are not relevant.
>
> Exactly the opposite.
> It's for default 'perf record' collection of periodic samples.
> It can be off for -e collection. That's easy.
>

So one use case is profiling bpf programs. I was also considering the
auditing discussion from some weeks ago, which I thought these events
were also targeting.

As for the overhead, I did not see a separate thread getting spun off
for the bpf events, so the events are processed inline for this RFC set.

2018-11-07 01:13:28

by Alexei Starovoitov

Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short-lived bpf programs

On 11/6/18 4:44 PM, David Ahern wrote:
>
> So one use case is profiling bpf programs. I was also considering the
> auditing discussion from some weeks ago, which I thought these events
> were also targeting.

yes. there should be a separate mode for 're: audit discussion' where
only bpf events are collected. This patch set doesn't add that to
the perf user space side.
The kernel side is common though. It can be used for bpf load/unload
only, and for the different use case in this set, which is making
bpf programs appear in a normal 'perf report'.
Please see the link in the cover letter.
We decided to abandon my old approach in favor of this one,
but the end result is the same:
from 0.81% of cpu in some magic 0x00007fffa001a660
to 18.13% in bpf_prog_1accc788e7f04c38_balancer_ingres.


> As for the overhead, I did not see a separate thread getting spun off
> for the bpf events, so the events are processed inline for this RFC set.

argh. you're right. we have to fix that.

2018-11-07 08:41:39

by Peter Zijlstra

Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
> For better performance analysis of BPF programs, this patch introduces
> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> load/unload information to user space.
>
> /*
> * Record different types of bpf events:
> * enum perf_bpf_event_type {
> * PERF_BPF_EVENT_UNKNOWN = 0,
> * PERF_BPF_EVENT_PROG_LOAD = 1,
> * PERF_BPF_EVENT_PROG_UNLOAD = 2,
> * };
> *
> * struct {
> * struct perf_event_header header;
> * u16 type;
> * u16 flags;
> * u32 id; // prog_id or map_id
> * };
> */
> PERF_RECORD_BPF_EVENT = 17,
>
> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
> Perf utility (or other user space tools) should listen to this event and
> fetch more details about the event via BPF syscalls
> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).

Why !? You're failing to explain why it cannot provide the full
information there.

2018-11-07 18:13:35

by David Ahern

[permalink] [raw]
Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs

On 11/6/18 6:09 PM, Alexei Starovoitov wrote:
> On 11/6/18 4:44 PM, David Ahern wrote:
>>
>> So one use case is profiling bpf programs. I was also considering the
>> auditing discussion from some weeks ago, which I thought these events
>> were also targeting.
>
> yes. there should be a separate mode for the 're: audit discussion' case
> where only bpf events are collected. This patch set doesn't add that to
> the perf user-space side.
> The kernel side is common though. It can be used for bpf load/unload
> only, and for the different use case in this set, which is making
> bpf programs appear in a normal 'perf report'.

It would be good for perf-script to work in the next version. Users
should be able to dump both the events from the kernel and the
synthesized events. This should be trivial to add, and it allows a
review of both perspectives -- profiling and monitoring.

2018-11-07 18:26:37

by Song Liu

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT



> On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
>> For better performance analysis of BPF programs, this patch introduces
>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>> load/unload information to user space.
>>
>> /*
>> * Record different types of bpf events:
>> * enum perf_bpf_event_type {
>> * PERF_BPF_EVENT_UNKNOWN = 0,
>> * PERF_BPF_EVENT_PROG_LOAD = 1,
>> * PERF_BPF_EVENT_PROG_UNLOAD = 2,
>> * };
>> *
>> * struct {
>> * struct perf_event_header header;
>> * u16 type;
>> * u16 flags;
>> * u32 id; // prog_id or map_id
>> * };
>> */
>> PERF_RECORD_BPF_EVENT = 17,
>>
>> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
>> Perf utility (or other user space tools) should listen to this event and
>> fetch more details about the event via BPF syscalls
>> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
>
> Why !? You're failing to explain why it cannot provide the full
> information there.

Aha, I missed this part. I will add the following to the next version.
Please let me know if anything is not clear.

Thanks,
Song



This design decision was made for the following reasons. First, BPF
programs can be loaded-and-jited and/or unloaded before, during, or
after a perf-record run. Once a BPF program is unloaded, it is
impossible to recover the details of the program, so the information
cannot be provided through a simple key (like a build ID). Second, BPF
program annotation is under fast development. More information will be
added to bpf_prog_info in the next few releases. Including all the
information about a BPF program in the perf ring buffer would require
frequent changes to the perf ABI, and thus make it very difficult to
manage compatibility of the perf utility.

For more information on this topic, please refer to the following
discussions:

https://www.spinics.net/lists/netdev/msg524230.html
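
To make the intended flow concrete, here is a minimal user-space sketch
of resolving the id carried in a PERF_RECORD_BPF_EVENT into a full
bpf_prog_info via the bpf(2) syscall. It is illustrative rather than
code from the patch set; error handling is trimmed and the helper names
are made up:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr)
{
        return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Resolve a prog_id from PERF_RECORD_BPF_EVENT into bpf_prog_info.
 * BPF_PROG_GET_FD_BY_ID fails if the program has already been
 * unloaded, which is exactly the race the synthesized events cover. */
static int fetch_prog_info(__u32 prog_id, struct bpf_prog_info *info)
{
        union bpf_attr attr;
        int fd, err;

        memset(&attr, 0, sizeof(attr));
        attr.prog_id = prog_id;
        fd = sys_bpf(BPF_PROG_GET_FD_BY_ID, &attr);
        if (fd < 0)
                return fd;

        memset(&attr, 0, sizeof(attr));
        attr.info.bpf_fd = fd;
        attr.info.info_len = sizeof(*info);
        attr.info.info = (__u64)(unsigned long)info;
        err = sys_bpf(BPF_OBJ_GET_INFO_BY_FD, &attr);
        close(fd);
        return err;
}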


2018-11-07 18:30:09

by Song Liu

[permalink] [raw]
Subject: Re: [RFC perf,bpf 5/5] perf util: generate bpf_prog_info_event for short living bpf programs



> On Nov 7, 2018, at 10:12 AM, David Ahern <[email protected]> wrote:
>
> On 11/6/18 6:09 PM, Alexei Starovoitov wrote:
>> On 11/6/18 4:44 PM, David Ahern wrote:
>>>
>>> So one use case is profiling bpf programs. I was also considering the
>>> auditing discussion from some weeks ago, which I thought these events
>>> were also targeting.
>>
>> yes. there should be a separate mode for the 're: audit discussion' case
>> where only bpf events are collected. This patch set doesn't add that to
>> the perf user-space side.
>> The kernel side is common though. It can be used for bpf load/unload
>> only, and for the different use case in this set, which is making
>> bpf programs appear in a normal 'perf report'.
>
> It would be good for perf-script to work in the next version. Users
> should be able to dump both the events from the kernel and the
> synthesized events. This should be trivial to add, and it allows a
> review of both perspectives -- profiling and monitoring.

Yes, we do plan to include this information in other perf commands,
including perf-script, perf-top, perf-annotate, etc.

Thanks,
Song

2018-11-08 15:01:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

On Wed, Nov 07, 2018 at 06:25:04PM +0000, Song Liu wrote:
>
>
> > On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <[email protected]> wrote:
> >
> > On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
> >> For better performance analysis of BPF programs, this patch introduces
> >> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
> >> load/unload information to user space.
> >>
> >> /*
> >> * Record different types of bpf events:
> >> * enum perf_bpf_event_type {
> >> * PERF_BPF_EVENT_UNKNOWN = 0,
> >> * PERF_BPF_EVENT_PROG_LOAD = 1,
> >> * PERF_BPF_EVENT_PROG_UNLOAD = 2,
> >> * };
> >> *
> >> * struct {
> >> * struct perf_event_header header;
> >> * u16 type;
> >> * u16 flags;
> >> * u32 id; // prog_id or map_id
> >> * };
> >> */
> >> PERF_RECORD_BPF_EVENT = 17,
> >>
> >> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
> >> Perf utility (or other user space tools) should listen to this event and
> >> fetch more details about the event via BPF syscalls
> >> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
> >
> > Why !? You're failing to explain why it cannot provide the full
> > information there.
>
> Aha, I missed this part. I will add the following to the next version.
> Please let me know if anything is not clear.

>
> This design decision was made for the following reasons. First, BPF
> programs can be loaded-and-jited and/or unloaded before, during, or
> after a perf-record run. Once a BPF program is unloaded, it is
> impossible to recover the details of the program, so the information
> cannot be provided through a simple key (like a build ID). Second, BPF
> program annotation is under fast development. More information will be
> added to bpf_prog_info in the next few releases. Including all the
> information about a BPF program in the perf ring buffer would require
> frequent changes to the perf ABI, and thus make it very difficult to
> manage compatibility of the perf utility.

So I don't agree with that reasoning. If you want symbol information
you'll just have to commit to some form of ABI. That bpf_prog_info is an
ABI too.

And relying on userspace to synchronously consume perf output to
directly call into the kernel again to get more info (through another
ABI) is a pretty terrible design.

So please try harder. NAK on this.

2018-11-08 18:06:35

by Song Liu

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

Hi Peter,

> On Nov 8, 2018, at 7:00 AM, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Nov 07, 2018 at 06:25:04PM +0000, Song Liu wrote:
>>
>>
>>> On Nov 7, 2018, at 12:40 AM, Peter Zijlstra <[email protected]> wrote:
>>>
>>> On Tue, Nov 06, 2018 at 12:52:42PM -0800, Song Liu wrote:
>>>> For better performance analysis of BPF programs, this patch introduces
>>>> PERF_RECORD_BPF_EVENT, a new perf_event_type that exposes BPF program
>>>> load/unload information to user space.
>>>>
>>>> /*
>>>> * Record different types of bpf events:
>>>> * enum perf_bpf_event_type {
>>>> * PERF_BPF_EVENT_UNKNOWN = 0,
>>>> * PERF_BPF_EVENT_PROG_LOAD = 1,
>>>> * PERF_BPF_EVENT_PROG_UNLOAD = 2,
>>>> * };
>>>> *
>>>> * struct {
>>>> * struct perf_event_header header;
>>>> * u16 type;
>>>> * u16 flags;
>>>> * u32 id; // prog_id or map_id
>>>> * };
>>>> */
>>>> PERF_RECORD_BPF_EVENT = 17,
>>>>
>>>> PERF_RECORD_BPF_EVENT contains minimal information about the BPF program.
>>>> Perf utility (or other user space tools) should listen to this event and
>>>> fetch more details about the event via BPF syscalls
>>>> (BPF_PROG_GET_FD_BY_ID, BPF_OBJ_GET_INFO_BY_FD, etc.).
>>>
>>> Why !? You're failing to explain why it cannot provide the full
>>> information there.
>>
>> Aha, I missed this part. I will add the following to the next version.
>> Please let me know if anything is not clear.
>
>>
>> This design decision was made for the following reasons. First, BPF
>> programs can be loaded-and-jited and/or unloaded before, during, or
>> after a perf-record run. Once a BPF program is unloaded, it is
>> impossible to recover the details of the program, so the information
>> cannot be provided through a simple key (like a build ID). Second, BPF
>> program annotation is under fast development. More information will be
>> added to bpf_prog_info in the next few releases. Including all the
>> information about a BPF program in the perf ring buffer would require
>> frequent changes to the perf ABI, and thus make it very difficult to
>> manage compatibility of the perf utility.
>
> So I don't agree with that reasoning. If you want symbol information
> you'll just have to commit to some form of ABI. That bpf_prog_info is an
> ABI too.

At the beginning of the perf-record run, perf needs to query the
bpf_prog_info of already loaded BPF programs. Therefore, we need to
commit to the bpf_prog_info ABI anyway. If we also include the full
information of the BPF program in the perf ring buffer, we will be
committing to TWO ABIs.

Also, perf-record writes the events to the perf.data file, so the data
needs to be serialized. This is implemented in patch 4/5. To include
the data in the ring buffer, we would need another piece of code in the
kernel to do the same serialization work.
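
As a sketch of that initial query (illustrative, reusing the sys_bpf()
helper from the earlier sketch; this is not the code in patch 5/5),
already loaded programs can be enumerated with BPF_PROG_GET_NEXT_ID:

/* Walk all BPF programs loaded before perf-record started. For each
 * id, something like fetch_prog_info() would feed the synthesized
 * bpf_prog_info_event that gets written to perf.data. */
static void for_each_loaded_prog(void (*cb)(__u32 prog_id))
{
        union bpf_attr attr;
        __u32 id = 0;

        for (;;) {
                memset(&attr, 0, sizeof(attr));
                attr.start_id = id;
                if (sys_bpf(BPF_PROG_GET_NEXT_ID, &attr))
                        break;  /* -ENOENT: no more programs */
                id = attr.next_id;
                cb(id);
        }
}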

On the other hand, processing BPF load/unload events synchronously
should not introduce too much overhead for meaningful use cases. If
many BPF progs are being loaded/unloaded within a short period of
time, that is not the steady state that profiling work cares about.

Would this resolve your concerns?

Thanks,
Song


2018-11-08 18:38:19

by David Ahern

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

On 11/8/18 11:04 AM, Song Liu wrote:
> On the other hand, processing BPF load/unload events synchronously
> should not introduce too much overhead for meaningful use cases. If
> many BPF progs are being loaded/unloaded within a short period of
> time, that is not the steady state that profiling work cares about.

but profiling is not the only use case, and perf-record is common to
those other use cases.

I think that answers why your RFC set does not fork a thread for the
bpf events. You are focused on profiling, on already loaded programs,
and on a small number of programs loaded by a specific workload
started by perf.

2018-11-08 18:50:44

by Song Liu

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

Hi David,

> On Nov 8, 2018, at 10:26 AM, David Ahern <[email protected]> wrote:
>
> On 11/8/18 11:04 AM, Song Liu wrote:
>> On the other hand, processing BPF load/unload events synchronously
>> should not introduce too much overhead for meaningful use cases. If
>> many BPF progs are being loaded/unloaded within a short period of
>> time, that is not the steady state that profiling work cares about.
>
> but profiling is not the only use case, and perf-record is common to
> those other use cases.
>
> I think that answers why your RFC set does not fork a thread for the
> bpf events. You are focused on profiling, on already loaded programs,
> and on a small number of programs loaded by a specific workload
> started by perf.

We can certainly fork a thread for the BPF events, but I guess that's
not Peter's main concern here...

Could you please point me to more information about the use cases you worry
about? I am more than happy to optimize the logic for those use cases.

Thanks,
Song


2018-11-09 17:11:56

by David Ahern

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

On 11/8/18 11:49 AM, Song Liu wrote:
> Could you please point me to more information about the use cases you worry
> about? I am more than happy to optimize the logic for those use cases.

bpf load and unload are just another tracepoint to throw into a set of
events that are monitored. As mentioned before, auditing the loads and
unloads is one example.

And that brings up another comment: why are you adding a PERF_RECORD_*
rather than a tracepoint? From what I can see, the PERF_RECORD_BPF_EVENT
definition does not include who is loading / unloading a bpf program.
That is important information as well.

2018-11-09 18:52:57

by Song Liu

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT



> On Nov 9, 2018, at 9:08 AM, David Ahern <[email protected]> wrote:
>
> On 11/8/18 11:49 AM, Song Liu wrote:
>> Could you please point me to more information about the use cases you worry
>> about? I am more than happy to optimize the logic for those use cases.
>
> bpf load and unload are just another tracepoint to throw into a set of
> events that are monitored. As mentioned before, auditing the loads and
> unloads is one example.

For the tracepoint approach, we would need similar synchronous logic
in perf to process BPF load/unload events. If we agree this is the
right direction, I will modify the set to use tracepoints.

>
> And that brings up another comment: Why are you adding a PERF_RECORD_*
> rather than a tracepoint? From what I can see the PERF_RECORD_BPF_EVENT
> definition does not include the who is loading / unloading a bpf
> program. That is important information as well.

bpf_prog_info has "__u32 created_by_uid;" in it, so that is not really
needed in the current version.
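
For example (a hypothetical snippet building on the fetch_prog_info()
sketch earlier in the thread), the loader's uid falls out of the same
query:

        struct bpf_prog_info info = {};

        if (!fetch_prog_info(prog_id, &info))
                printf("prog %u loaded by uid %u\n",
                       prog_id, info.created_by_uid);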

Thanks,
Song

2018-11-09 19:15:56

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [RFC perf,bpf 1/5] perf, bpf: Introduce PERF_RECORD_BPF_EVENT

On 11/9/18 10:49 AM, Song Liu wrote:
>
>
>> On Nov 9, 2018, at 9:08 AM, David Ahern <[email protected]> wrote:
>>
>> On 11/8/18 11:49 AM, Song Liu wrote:
>>> Could you please point me to more information about the use cases you worry
>>> about? I am more than happy to optimize the logic for those use cases.
>>
>> bpf load and unload are just another tracepoint to throw into a set of
>> events that are monitored. As mentioned before, auditing the loads and
>> unloads is one example.
>
> For the tracepoint approach, we would need similar synchronous logic
> in perf to process BPF load/unload events. If we agree this is the
> right direction, I will modify the set to use tracepoints.

we had tracepoints in the bpf core. We removed them.
I don't think we'll be going back.