hi,
adding support to create multiple kprobes/uprobes within single
perf event. This way we can associate single bpf program with
multiple kprobes.
Sending this as RFC because I'm not completely sure I haven't
missed anything in the trace/events area.
Also it needs following uprobe fix to work properly:
https://lore.kernel.org/lkml/[email protected]/
Also available at:
https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
bpf/kuprobe_batch
thanks,
jirka
---
Jiri Olsa (8):
perf/kprobe: Add support to create multiple probes
perf/uprobe: Add support to create multiple probes
libbpf: Add libbpf__kallsyms_parse function
libbpf: Add struct perf_event_open_args
libbpf: Add support to attach multiple [ku]probes
libbpf: Add support for k[ret]probe.multi program section
selftest/bpf: Add kprobe multi attach test
selftest/bpf: Add uprobe multi attach test
include/uapi/linux/perf_event.h | 1 +
kernel/trace/trace_event_perf.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
kernel/trace/trace_kprobe.c | 47 ++++++++++++++++---
kernel/trace/trace_probe.c | 2 +-
kernel/trace/trace_probe.h | 6 ++-
kernel/trace/trace_uprobe.c | 43 +++++++++++++++--
tools/include/uapi/linux/perf_event.h | 1 +
tools/lib/bpf/libbpf.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
tools/lib/bpf/libbpf.h | 25 +++++++++-
tools/lib/bpf/libbpf_internal.h | 5 ++
tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c | 83 +++++++++++++++++++++++++++++++++
tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c | 97 ++++++++++++++++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/multi_kprobe.c | 58 +++++++++++++++++++++++
tools/testing/selftests/bpf/progs/multi_uprobe.c | 26 +++++++++++
14 files changed, 765 insertions(+), 78 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c
create mode 100644 tools/testing/selftests/bpf/progs/multi_kprobe.c
create mode 100644 tools/testing/selftests/bpf/progs/multi_uprobe.c
Adding support to create multiple probes within single perf event.
This way we can associate single bpf program with multiple kprobes,
because bpf program gets associated with the perf event.
The perf_event_attr is not extended, current fields for kprobe
attachment are used for multi attachment.
For the current kprobe attachment we use either:
kprobe_func (in config1) + probe_offset (in config2)
to define kprobe by function name with offset, or:
kprobe_addr (in config2)
to define kprobe with direct address value.
For multi probe attach the same fields point to array of values
with the same semantic. Each probe is defined as set of values
with the same array index (idx) as:
kprobe_func[idx] + probe_offset[idx]
to define kprobe by function name with offset, or:
kprobe_addr[idx]
to define kprobe with direct address value.
The number of probes is passed in probe_cnt value, which shares
the union with wakeup_events/wakeup_watermark values which are
not used for kprobes.
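For example, a minimal userspace sketch of attaching two kprobes
within one event (assuming this patch; ksys_read/ksys_write are just
placeholder targets, and kprobe_type stands for the kprobe PMU type
value read from /sys/bus/event_source/devices/kprobe/type):

  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  static int open_kprobes_multi(int kprobe_type)
  {
          const char *names[2] = { "ksys_read", "ksys_write" };
          __u64 funcs[2] = { (__u64)(unsigned long)names[0],
                             (__u64)(unsigned long)names[1] };
          __u64 offs[2] = { 0, 0 };       /* offsets within the functions */
          struct perf_event_attr attr = {
                  .size      = sizeof(attr),
                  .type      = kprobe_type,
                  .config1   = (__u64)(unsigned long)funcs, /* kprobe_func array */
                  .config2   = (__u64)(unsigned long)offs,  /* probe_offset array */
                  .probe_cnt = 2,
          };

          return syscall(__NR_perf_event_open, &attr, -1 /* pid */,
                         0 /* cpu */, -1 /* group fd */,
                         PERF_FLAG_FD_CLOEXEC);
  }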
Since [1] it's possible to stack multiple probe events under
one head event. We use the same code to allow that for probes
defined under the perf kprobe interface.
[1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
Signed-off-by: Jiri Olsa <[email protected]>
---
include/uapi/linux/perf_event.h | 1 +
kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++----
kernel/trace/trace_kprobe.c | 47 ++++++++++++--
kernel/trace/trace_probe.c | 2 +-
kernel/trace/trace_probe.h | 3 +-
5 files changed, 138 insertions(+), 21 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index bd8860eeb291..eea80709d1ed 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -414,6 +414,7 @@ struct perf_event_attr {
union {
__u32 wakeup_events; /* wakeup every n events */
__u32 wakeup_watermark; /* bytes before wakeup */
+ __u32 probe_cnt; /* number of [k,u] probes */
};
__u32 bp_type;
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index a114549720d6..26078e40c299 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -245,23 +245,27 @@ void perf_trace_destroy(struct perf_event *p_event)
}
#ifdef CONFIG_KPROBE_EVENTS
-int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
+static struct trace_event_call*
+kprobe_init(bool is_retprobe, u64 kprobe_func, u64 kprobe_addr,
+ u64 probe_offset, struct trace_event_call *old)
{
int ret;
char *func = NULL;
struct trace_event_call *tp_event;
- if (p_event->attr.kprobe_func) {
+ if (kprobe_func) {
func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL);
if (!func)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
ret = strncpy_from_user(
- func, u64_to_user_ptr(p_event->attr.kprobe_func),
+ func, u64_to_user_ptr(kprobe_func),
KSYM_NAME_LEN);
if (ret == KSYM_NAME_LEN)
ret = -E2BIG;
- if (ret < 0)
- goto out;
+ if (ret < 0) {
+ kfree(func);
+ return ERR_PTR(ret);
+ }
if (func[0] == '\0') {
kfree(func);
@@ -270,20 +274,96 @@ int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
}
tp_event = create_local_trace_kprobe(
- func, (void *)(unsigned long)(p_event->attr.kprobe_addr),
- p_event->attr.probe_offset, is_retprobe);
- if (IS_ERR(tp_event)) {
- ret = PTR_ERR(tp_event);
- goto out;
+ func, (void *)(unsigned long)(kprobe_addr),
+ probe_offset, is_retprobe, old);
+ kfree(func);
+ return tp_event;
+}
+
+static struct trace_event_call*
+kprobe_init_multi(struct perf_event *p_event, bool is_retprobe)
+{
+ void __user *kprobe_func = u64_to_user_ptr(p_event->attr.kprobe_func);
+ void __user *kprobe_addr = u64_to_user_ptr(p_event->attr.kprobe_addr);
+ u64 *funcs = NULL, *addrs = NULL, *offs = NULL;
+ struct trace_event_call *tp_event, *tp_old = NULL;
+ u32 i, cnt = p_event->attr.probe_cnt;
+ int ret = -EINVAL;
+ size_t size;
+
+ if (!cnt)
+ return ERR_PTR(-EINVAL);
+
+ size = cnt * sizeof(u64);
+ if (kprobe_func) {
+ ret = -ENOMEM;
+ funcs = kmalloc(size, GFP_KERNEL);
+ if (!funcs)
+ goto out;
+ ret = -EFAULT;
+ if (copy_from_user(funcs, kprobe_func, size))
+ goto out;
+ }
+
+ if (kprobe_addr) {
+ ret = -ENOMEM;
+ addrs = kmalloc(size, GFP_KERNEL);
+ if (!addrs)
+ goto out;
+ ret = -EFAULT;
+ if (copy_from_user(addrs, kprobe_addr, size))
+ goto out;
+ /* addresses and offsets share the same array */
+ offs = addrs;
}
+ for (i = 0; i < cnt; i++) {
+ tp_event = kprobe_init(is_retprobe, funcs ? funcs[i] : 0,
+ addrs ? addrs[i] : 0, offs ? offs[i] : 0,
+ tp_old);
+ if (IS_ERR(tp_event)) {
+ if (tp_old)
+ destroy_local_trace_kprobe(tp_old);
+ ret = PTR_ERR(tp_event);
+ goto out;
+ }
+ if (!tp_old)
+ tp_old = tp_event;
+ }
+ ret = 0;
+out:
+ kfree(funcs);
+ kfree(addrs);
+ return ret ? ERR_PTR(ret) : tp_old;
+}
+
+static struct trace_event_call*
+kprobe_init_single(struct perf_event *p_event, bool is_retprobe)
+{
+ struct perf_event_attr *attr = &p_event->attr;
+
+ return kprobe_init(is_retprobe, attr->kprobe_func, attr->kprobe_addr,
+ attr->probe_offset, NULL);
+}
+
+int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
+{
+ struct trace_event_call *tp_event;
+ int ret;
+
+ if (p_event->attr.probe_cnt)
+ tp_event = kprobe_init_multi(p_event, is_retprobe);
+ else
+ tp_event = kprobe_init_single(p_event, is_retprobe);
+
+ if (IS_ERR(tp_event))
+ return PTR_ERR(tp_event);
+
mutex_lock(&event_mutex);
ret = perf_trace_event_init(tp_event, p_event);
if (ret)
destroy_local_trace_kprobe(tp_event);
mutex_unlock(&event_mutex);
-out:
- kfree(func);
return ret;
}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 33272a7b6912..86a7aada853a 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -237,13 +237,18 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
+static void __free_trace_kprobe(struct trace_kprobe *tk)
+{
+ kfree(tk->symbol);
+ free_percpu(tk->nhit);
+ kfree(tk);
+}
+
static void free_trace_kprobe(struct trace_kprobe *tk)
{
if (tk) {
trace_probe_cleanup(&tk->tp);
- kfree(tk->symbol);
- free_percpu(tk->nhit);
- kfree(tk);
+ __free_trace_kprobe(tk);
}
}
@@ -1796,7 +1801,7 @@ static int unregister_kprobe_event(struct trace_kprobe *tk)
/* create a trace_kprobe, but don't add it to global lists */
struct trace_event_call *
create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
- bool is_return)
+ bool is_return, struct trace_event_call *old)
{
enum probe_print_type ptype;
struct trace_kprobe *tk;
@@ -1820,6 +1825,28 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
return ERR_CAST(tk);
}
+ if (old) {
+ struct trace_kprobe *tk_old;
+
+ tk_old = trace_kprobe_primary_from_call(old);
+ if (!tk_old) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ /* Append to existing event */
+ ret = trace_probe_append(&tk->tp, &tk_old->tp);
+ if (ret)
+ goto error;
+
+ /* Register k*probe */
+ ret = __register_trace_kprobe(tk);
+ if (ret)
+ goto error;
+
+ return trace_probe_event_call(&tk->tp);
+ }
+
init_trace_event_call(tk);
ptype = trace_kprobe_is_return(tk) ?
@@ -1841,6 +1868,8 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
void destroy_local_trace_kprobe(struct trace_event_call *event_call)
{
+ struct trace_probe_event *event;
+ struct trace_probe *pos, *tmp;
struct trace_kprobe *tk;
tk = trace_kprobe_primary_from_call(event_call);
@@ -1852,9 +1881,15 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
return;
}
- __unregister_trace_kprobe(tk);
+ event = tk->tp.event;
+ list_for_each_entry_safe(pos, tmp, &event->probes, list) {
+ tk = container_of(pos, struct trace_kprobe, tp);
+ list_del_init(&pos->list);
+ __unregister_trace_kprobe(tk);
+ __free_trace_kprobe(tk);
+ }
- free_trace_kprobe(tk);
+ trace_probe_event_free(event);
}
#endif /* CONFIG_PERF_EVENTS */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 3ed2a3f37297..2dff85aa21e9 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -974,7 +974,7 @@ int traceprobe_define_arg_fields(struct trace_event_call *event_call,
return 0;
}
-static void trace_probe_event_free(struct trace_probe_event *tpe)
+void trace_probe_event_free(struct trace_probe_event *tpe)
{
kfree(tpe->class.system);
kfree(tpe->call.name);
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 99e7a5df025e..ba8e46c7efe8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -333,6 +333,7 @@ int trace_probe_init(struct trace_probe *tp, const char *event,
const char *group, bool alloc_filter);
void trace_probe_cleanup(struct trace_probe *tp);
int trace_probe_append(struct trace_probe *tp, struct trace_probe *to);
+void trace_probe_event_free(struct trace_probe_event *tpe);
void trace_probe_unlink(struct trace_probe *tp);
int trace_probe_register_event_call(struct trace_probe *tp);
int trace_probe_add_file(struct trace_probe *tp, struct trace_event_file *file);
@@ -377,7 +378,7 @@ extern int traceprobe_set_print_fmt(struct trace_probe *tp, enum probe_print_typ
#ifdef CONFIG_PERF_EVENTS
extern struct trace_event_call *
create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
- bool is_return);
+ bool is_return, struct trace_event_call *old);
extern void destroy_local_trace_kprobe(struct trace_event_call *event_call);
extern struct trace_event_call *
--
2.33.1
Adding support to create multiple probes within single perf event.
This way we can associate single bpf program with multiple uprobes,
because bpf program gets associated with the perf event.
The perf_event_attr is not extended, current fields for uprobe
attachment are used for multi attachment.
For the current uprobe attachment we use:
uprobe_path (in config1) + probe_offset (in config2)
to define the uprobe by executable path with offset.
For multi probe attach the same fields point to array of values
with the same semantic. Each probe is defined as set of values
with the same array index (idx) as:
uprobe_path[idx] (in config1) + probe_offset[idx] (in config2)
to define the uprobe by executable path with offset.
The number of probes is passed in probe_cnt value, which shares
the union with wakeup_events/wakeup_watermark values which are
not used for uprobes.
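For example, a minimal sketch of the attr setup for two uprobes
(assuming this patch; the binary path and offsets are placeholders,
and uprobe_type stands for the uprobe PMU type value read from
/sys/bus/event_source/devices/uprobe/type):

  const char *paths[2] = { "/bin/bash", "/bin/bash" };
  __u64 path_ptrs[2] = { (__u64)(unsigned long)paths[0],
                         (__u64)(unsigned long)paths[1] };
  __u64 offs[2] = { 0x1234, 0x5678 };     /* placeholder offsets */
  struct perf_event_attr attr = {
          .size      = sizeof(attr),
          .type      = uprobe_type,
          .config1   = (__u64)(unsigned long)path_ptrs, /* uprobe_path array */
          .config2   = (__u64)(unsigned long)offs,      /* probe_offset array */
          .probe_cnt = 2,
  };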
Since [1] it's possible to stack multiple probe events under
one head event. We use the same code to allow that for probes
defined under the perf uprobe interface.
[1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
Signed-off-by: Jiri Olsa <[email protected]>
---
kernel/trace/trace_event_perf.c | 108 +++++++++++++++++++++++++++-----
kernel/trace/trace_probe.h | 3 +-
kernel/trace/trace_uprobe.c | 43 +++++++++++--
3 files changed, 133 insertions(+), 21 deletions(-)
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 26078e40c299..fb5db6a43d37 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -379,34 +379,114 @@ void perf_kprobe_destroy(struct perf_event *p_event)
#endif /* CONFIG_KPROBE_EVENTS */
#ifdef CONFIG_UPROBE_EVENTS
-int perf_uprobe_init(struct perf_event *p_event,
- unsigned long ref_ctr_offset, bool is_retprobe)
+static struct trace_event_call*
+uprobe_init(u64 uprobe_path, u64 probe_offset, unsigned long ref_ctr_offset,
+ bool is_retprobe, struct trace_event_call *old)
{
int ret;
char *path = NULL;
struct trace_event_call *tp_event;
- if (!p_event->attr.uprobe_path)
- return -EINVAL;
+ if (!uprobe_path)
+ return ERR_PTR(-EINVAL);
- path = strndup_user(u64_to_user_ptr(p_event->attr.uprobe_path),
+ path = strndup_user(u64_to_user_ptr(uprobe_path),
PATH_MAX);
if (IS_ERR(path)) {
ret = PTR_ERR(path);
- return (ret == -EINVAL) ? -E2BIG : ret;
+ return ERR_PTR((ret == -EINVAL) ? -E2BIG : ret);
}
if (path[0] == '\0') {
- ret = -EINVAL;
- goto out;
+ kfree(path);
+ return ERR_PTR(-EINVAL);
}
- tp_event = create_local_trace_uprobe(path, p_event->attr.probe_offset,
- ref_ctr_offset, is_retprobe);
- if (IS_ERR(tp_event)) {
- ret = PTR_ERR(tp_event);
- goto out;
+ tp_event = create_local_trace_uprobe(path, probe_offset,
+ ref_ctr_offset, is_retprobe, old);
+ kfree(path);
+ return tp_event;
+}
+
+static struct trace_event_call*
+uprobe_init_multi(struct perf_event *p_event, unsigned long ref_ctr_offset,
+ bool is_retprobe)
+{
+ void __user *probe_offset = u64_to_user_ptr(p_event->attr.probe_offset);
+ void __user *uprobe_path = u64_to_user_ptr(p_event->attr.uprobe_path);
+ struct trace_event_call *tp_event, *tp_old = NULL;
+ u32 i, cnt = p_event->attr.probe_cnt;
+ u64 *paths = NULL, *offs = NULL;
+ int ret = -EINVAL;
+ size_t size;
+
+ if (!cnt)
+ return ERR_PTR(-EINVAL);
+
+ size = cnt * sizeof(u64);
+ if (uprobe_path) {
+ ret = -ENOMEM;
+ paths = kmalloc(size, GFP_KERNEL);
+ if (!paths)
+ goto out;
+ ret = -EFAULT;
+ if (copy_from_user(paths, uprobe_path, size))
+ goto out;
}
+ if (probe_offset) {
+ ret = -ENOMEM;
+ offs = kmalloc(size, GFP_KERNEL);
+ if (!offs)
+ goto out;
+ ret = -EFAULT;
+ if (copy_from_user(offs, probe_offset, size))
+ goto out;
+ }
+
+ for (i = 0; i < cnt; i++) {
+ tp_event = uprobe_init(paths ? paths[i] : 0, offs ? offs[i] : 0,
+ ref_ctr_offset, is_retprobe, tp_old);
+ if (IS_ERR(tp_event)) {
+ if (tp_old)
+ destroy_local_trace_uprobe(tp_old);
+ ret = PTR_ERR(tp_event);
+ goto out;
+ }
+ if (!tp_old)
+ tp_old = tp_event;
+ }
+ ret = 0;
+
+out:
+ kfree(paths);
+ kfree(offs);
+ return ret ? ERR_PTR(ret) : tp_old;
+}
+
+static struct trace_event_call*
+uprobe_init_single(struct perf_event *p_event, unsigned long ref_ctr_offset,
+ bool is_retprobe)
+{
+ struct perf_event_attr *attr = &p_event->attr;
+
+ return uprobe_init(attr->uprobe_path, attr->probe_offset,
+ ref_ctr_offset, is_retprobe, NULL);
+}
+
+int perf_uprobe_init(struct perf_event *p_event,
+ unsigned long ref_ctr_offset, bool is_retprobe)
+{
+ struct trace_event_call *tp_event;
+ int ret;
+
+ if (p_event->attr.probe_cnt)
+ tp_event = uprobe_init_multi(p_event, ref_ctr_offset, is_retprobe);
+ else
+ tp_event = uprobe_init_single(p_event, ref_ctr_offset, is_retprobe);
+
+ if (IS_ERR(tp_event))
+ return PTR_ERR(tp_event);
+
/*
* local trace_uprobe need to hold event_mutex to call
* uprobe_buffer_enable() and uprobe_buffer_disable().
@@ -417,8 +497,6 @@ int perf_uprobe_init(struct perf_event *p_event,
if (ret)
destroy_local_trace_uprobe(tp_event);
mutex_unlock(&event_mutex);
-out:
- kfree(path);
return ret;
}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index ba8e46c7efe8..6c81926874ff 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -383,7 +383,8 @@ extern void destroy_local_trace_kprobe(struct trace_event_call *event_call);
extern struct trace_event_call *
create_local_trace_uprobe(char *name, unsigned long offs,
- unsigned long ref_ctr_offset, bool is_return);
+ unsigned long ref_ctr_offset, bool is_return,
+ struct trace_event_call *old);
extern void destroy_local_trace_uprobe(struct trace_event_call *event_call);
#endif
extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f5f0039d31e5..ca76f9ab6811 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -358,15 +358,20 @@ alloc_trace_uprobe(const char *group, const char *event, int nargs, bool is_ret)
return ERR_PTR(ret);
}
+static void __free_trace_uprobe(struct trace_uprobe *tu)
+{
+ path_put(&tu->path);
+ kfree(tu->filename);
+ kfree(tu);
+}
+
static void free_trace_uprobe(struct trace_uprobe *tu)
{
if (!tu)
return;
- path_put(&tu->path);
trace_probe_cleanup(&tu->tp);
- kfree(tu->filename);
- kfree(tu);
+ __free_trace_uprobe(tu);
}
static struct trace_uprobe *find_probe_event(const char *event, const char *group)
@@ -1584,7 +1589,8 @@ static int unregister_uprobe_event(struct trace_uprobe *tu)
#ifdef CONFIG_PERF_EVENTS
struct trace_event_call *
create_local_trace_uprobe(char *name, unsigned long offs,
- unsigned long ref_ctr_offset, bool is_return)
+ unsigned long ref_ctr_offset, bool is_return,
+ struct trace_event_call *old)
{
enum probe_print_type ptype;
struct trace_uprobe *tu;
@@ -1619,6 +1625,24 @@ create_local_trace_uprobe(char *name, unsigned long offs,
tu->path = path;
tu->ref_ctr_offset = ref_ctr_offset;
tu->filename = kstrdup(name, GFP_KERNEL);
+
+ if (old) {
+ struct trace_uprobe *tu_old;
+
+ tu_old = trace_uprobe_primary_from_call(old);
+ if (!tu_old) {
+ ret = -EINVAL;
+ goto error;
+ }
+
+ /* Append to existing event */
+ ret = trace_probe_append(&tu->tp, &tu_old->tp);
+ if (ret)
+ goto error;
+
+ return trace_probe_event_call(&tu->tp);
+ }
+
init_trace_event_call(tu);
ptype = is_ret_probe(tu) ? PROBE_PRINT_RETURN : PROBE_PRINT_NORMAL;
@@ -1635,11 +1659,20 @@ create_local_trace_uprobe(char *name, unsigned long offs,
void destroy_local_trace_uprobe(struct trace_event_call *event_call)
{
+ struct trace_probe_event *event;
+ struct trace_probe *pos, *tmp;
struct trace_uprobe *tu;
tu = trace_uprobe_primary_from_call(event_call);
- free_trace_uprobe(tu);
+ event = tu->tp.event;
+ list_for_each_entry_safe(pos, tmp, &event->probes, list) {
+ tu = container_of(pos, struct trace_uprobe, tp);
+ list_del_init(&pos->list);
+ __free_trace_uprobe(tu);
+ }
+
+ trace_probe_event_free(event);
}
#endif /* CONFIG_PERF_EVENTS */
--
2.33.1
Move the kallsyms parsing into the internal libbpf__kallsyms_parse
function, so it can be used from other places.
It will be used in following changes.
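For example, a hypothetical callback counting text symbols could
look like:

  static int count_text_syms(void *arg, unsigned long long sym_addr,
                             char sym_type, const char *sym_name)
  {
          int *cnt = arg;

          if (sym_type == 't' || sym_type == 'T')
                  (*cnt)++;
          return 0;       /* non-zero return stops the walk */
  }

  /* caller */
  int cnt = 0, err;

  err = libbpf__kallsyms_parse(&cnt, count_text_syms);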
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/bpf/libbpf.c | 62 ++++++++++++++++++++-------------
tools/lib/bpf/libbpf_internal.h | 5 +++
2 files changed, 43 insertions(+), 24 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index af405c38aadc..b55c0fbfcc03 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6950,12 +6950,10 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj)
return 0;
}
-static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
+int libbpf__kallsyms_parse(void *arg, kallsyms_cb_t cb)
{
char sym_type, sym_name[500];
unsigned long long sym_addr;
- const struct btf_type *t;
- struct extern_desc *ext;
int ret, err = 0;
FILE *f;
@@ -6974,35 +6972,51 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
if (ret != 3) {
pr_warn("failed to read kallsyms entry: %d\n", ret);
err = -EINVAL;
- goto out;
+ break;
}
- ext = find_extern_by_name(obj, sym_name);
- if (!ext || ext->type != EXT_KSYM)
- continue;
-
- t = btf__type_by_id(obj->btf, ext->btf_id);
- if (!btf_is_var(t))
- continue;
-
- if (ext->is_set && ext->ksym.addr != sym_addr) {
- pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
- sym_name, ext->ksym.addr, sym_addr);
- err = -EINVAL;
- goto out;
- }
- if (!ext->is_set) {
- ext->is_set = true;
- ext->ksym.addr = sym_addr;
- pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr);
- }
+ err = cb(arg, sym_addr, sym_type, sym_name);
+ if (err)
+ break;
}
-out:
fclose(f);
return err;
}
+static int kallsyms_cb(void *arg, unsigned long long sym_addr,
+ char sym_type, const char *sym_name)
+{
+ struct bpf_object *obj = arg;
+ const struct btf_type *t;
+ struct extern_desc *ext;
+
+ ext = find_extern_by_name(obj, sym_name);
+ if (!ext || ext->type != EXT_KSYM)
+ return 0;
+
+ t = btf__type_by_id(obj->btf, ext->btf_id);
+ if (!btf_is_var(t))
+ return 0;
+
+ if (ext->is_set && ext->ksym.addr != sym_addr) {
+ pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
+ sym_name, ext->ksym.addr, sym_addr);
+ return -EINVAL;
+ }
+ if (!ext->is_set) {
+ ext->is_set = true;
+ ext->ksym.addr = sym_addr;
+ pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr);
+ }
+ return 0;
+}
+
+static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
+{
+ return libbpf__kallsyms_parse(obj, kallsyms_cb);
+}
+
static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
__u16 kind, struct btf **res_btf,
struct module_btf **res_mod_btf)
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index f7ac349650a1..511cb09f593f 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -406,6 +406,11 @@ __s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
extern enum libbpf_strict_mode libbpf_mode;
+typedef int (*kallsyms_cb_t)(void *arg, unsigned long long sym_addr,
+ char sym_type, const char *sym_name);
+
+int libbpf__kallsyms_parse(void *arg, kallsyms_cb_t cb);
+
/* handle direct returned errors */
static inline int libbpf_err(int ret)
{
--
2.33.1
Adding struct perf_event_open_args to hold arguments for
perf_event_open_probe, because there are already 6 arguments
and more will come in following changes.
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/bpf/libbpf.c | 42 ++++++++++++++++++++++++++++++++----------
1 file changed, 32 insertions(+), 10 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b55c0fbfcc03..34219a0c39a7 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -9625,11 +9625,20 @@ static int determine_uprobe_retprobe_bit(void)
#define PERF_UPROBE_REF_CTR_OFFSET_BITS 32
#define PERF_UPROBE_REF_CTR_OFFSET_SHIFT 32
-static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
- uint64_t offset, int pid, size_t ref_ctr_off)
+struct perf_event_open_args {
+ bool retprobe;
+ const char *name;
+ uint64_t offset;
+ int pid;
+ size_t ref_ctr_off;
+};
+
+static int perf_event_open_probe(bool uprobe, struct perf_event_open_args *args)
{
+ size_t ref_ctr_off = args->ref_ctr_off;
struct perf_event_attr attr = {};
char errmsg[STRERR_BUFSIZE];
+ int pid = args->pid;
int type, pfd, err;
if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
@@ -9643,7 +9652,7 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
libbpf_strerror_r(type, errmsg, sizeof(errmsg)));
return type;
}
- if (retprobe) {
+ if (args->retprobe) {
int bit = uprobe ? determine_uprobe_retprobe_bit()
: determine_kprobe_retprobe_bit();
@@ -9658,8 +9667,8 @@ static int perf_event_open_probe(bool uprobe, bool retprobe, const char *name,
attr.size = sizeof(attr);
attr.type = type;
attr.config |= (__u64)ref_ctr_off << PERF_UPROBE_REF_CTR_OFFSET_SHIFT;
- attr.config1 = ptr_to_u64(name); /* kprobe_func or uprobe_path */
- attr.config2 = offset; /* kprobe_addr or probe_offset */
+ attr.config1 = ptr_to_u64(args->name); /* kprobe_func or uprobe_path */
+ attr.config2 = args->offset; /* kprobe_addr or probe_offset */
/* pid filter is meaningful only for uprobes */
pfd = syscall(__NR_perf_event_open, &attr,
@@ -9791,9 +9800,15 @@ bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
legacy = determine_kprobe_perf_type() < 0;
if (!legacy) {
- pfd = perf_event_open_probe(false /* uprobe */, retprobe,
- func_name, offset,
- -1 /* pid */, 0 /* ref_ctr_off */);
+ struct perf_event_open_args args = {
+ .retprobe = retprobe,
+ .name = func_name,
+ .offset = offset,
+ .pid = -1,
+ .ref_ctr_off = 0,
+ };
+
+ pfd = perf_event_open_probe(false /* uprobe */, &args);
} else {
char probe_name[256];
@@ -9984,8 +9999,15 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
legacy = determine_uprobe_perf_type() < 0;
if (!legacy) {
- pfd = perf_event_open_probe(true /* uprobe */, retprobe, binary_path,
- func_offset, pid, ref_ctr_off);
+ struct perf_event_open_args args = {
+ .retprobe = retprobe,
+ .name = binary_path,
+ .offset = func_offset,
+ .pid = pid,
+ .ref_ctr_off = ref_ctr_off,
+ };
+
+ pfd = perf_event_open_probe(true /* uprobe */, &args);
} else {
char probe_name[512];
--
2.33.1
Adding support to attach multiple [ku]probes.
Extending both bpf_kprobe_opts and bpf_uprobe_opts structs
with config values to define multiple probes within single
bpf_program__attach_[ku]probe_opts call.
For multiple probes in the bpf_program__attach_kprobe_opts function
the 'func_name' argument is ignored and probes are defined in
bpf_kprobe_opts struct with:
struct {
/* probes count */
__u32 cnt;
/* function names array */
char **funcs;
/* address/offset values array */
union {
__u64 *addrs;
__u64 *offs;
};
} multi;
For multiple probes in the bpf_program__attach_uprobe_opts function
both 'binary_path' and 'func_offset' arguments are ignored and
probes are defined in the bpf_uprobe_opts struct with:
/* multi uprobe values */
struct {
/* probes count */
__u32 cnt;
/* paths names array */
const char **paths;
/* offsets values array */
__u64 *offs;
} multi;
The multiple probe attachment is enabled when multi.cnt != 0.
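For example, a sketch of attaching one kprobe program to two
functions (the function names are placeholders):

  DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts);
  char *funcs[] = { "ksys_read", "ksys_write" };
  struct bpf_link *link;

  opts.multi.cnt = 2;
  opts.multi.funcs = funcs;
  /* the func_name argument is ignored when multi.cnt != 0 */
  link = bpf_program__attach_kprobe_opts(prog, NULL, &opts);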
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/include/uapi/linux/perf_event.h | 1 +
tools/lib/bpf/libbpf.c | 30 +++++++++++++++++++++++++--
tools/lib/bpf/libbpf.h | 25 ++++++++++++++++++++--
3 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index bd8860eeb291..eea80709d1ed 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -414,6 +414,7 @@ struct perf_event_attr {
union {
__u32 wakeup_events; /* wakeup every n events */
__u32 wakeup_watermark; /* bytes before wakeup */
+ __u32 probe_cnt; /* number of [k,u] probes */
};
__u32 bp_type;
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 34219a0c39a7..b570e93de735 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -9631,6 +9631,11 @@ struct perf_event_open_args {
uint64_t offset;
int pid;
size_t ref_ctr_off;
+ struct {
+ __u32 probe_cnt;
+ __u64 config1;
+ __u64 config2;
+ } multi;
};
static int perf_event_open_probe(bool uprobe, struct perf_event_open_args *args)
@@ -9667,8 +9672,15 @@ static int perf_event_open_probe(bool uprobe, struct perf_event_open_args *args)
attr.size = sizeof(attr);
attr.type = type;
attr.config |= (__u64)ref_ctr_off << PERF_UPROBE_REF_CTR_OFFSET_SHIFT;
- attr.config1 = ptr_to_u64(args->name); /* kprobe_func or uprobe_path */
- attr.config2 = args->offset; /* kprobe_addr or probe_offset */
+
+ if (args->multi.probe_cnt) {
+ attr.probe_cnt = args->multi.probe_cnt;
+ attr.config1 = args->multi.config1;
+ attr.config2 = args->multi.config2;
+ } else {
+ attr.config1 = ptr_to_u64(args->name); /* kprobe_func or uprobe_path */
+ attr.config2 = args->offset; /* kprobe_addr or probe_offset */
+ }
/* pid filter is meaningful only for uprobes */
pfd = syscall(__NR_perf_event_open, &attr,
@@ -9807,7 +9819,14 @@ bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
.pid = -1,
.ref_ctr_off = 0,
};
+ __u32 probe_cnt = OPTS_GET(opts, multi.cnt, false);
+ if (probe_cnt) {
+ args.multi.probe_cnt = probe_cnt;
+ args.multi.config1 = ptr_to_u64(OPTS_GET(opts, multi.funcs, false));
+ /* multi.addrs and multi.offs share the same array */
+ args.multi.config2 = ptr_to_u64(OPTS_GET(opts, multi.addrs, false));
+ }
pfd = perf_event_open_probe(false /* uprobe */, &args);
} else {
char probe_name[256];
@@ -10006,6 +10025,13 @@ bpf_program__attach_uprobe_opts(const struct bpf_program *prog, pid_t pid,
.pid = pid,
.ref_ctr_off = ref_ctr_off,
};
+ __u32 probe_cnt = OPTS_GET(opts, multi.cnt, false);
+
+ if (probe_cnt) {
+ args.multi.probe_cnt = probe_cnt;
+ args.multi.config1 = ptr_to_u64(OPTS_GET(opts, multi.paths, false));
+ args.multi.config2 = ptr_to_u64(OPTS_GET(opts, multi.offs, false));
+ }
pfd = perf_event_open_probe(true /* uprobe */, &args);
} else {
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d02139fec4ac..ae072882b5dd 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -321,9 +321,21 @@ struct bpf_kprobe_opts {
size_t offset;
/* kprobe is return probe */
bool retprobe;
+ /* multi kprobe values */
+ struct {
+ /* probes count */
+ __u32 cnt;
+ /* function names array */
+ char **funcs;
+ /* address/offset values array */
+ union {
+ __u64 *addrs;
+ __u64 *offs;
+ };
+ } multi;
size_t :0;
};
-#define bpf_kprobe_opts__last_field retprobe
+#define bpf_kprobe_opts__last_field multi.addrs
LIBBPF_API struct bpf_link *
bpf_program__attach_kprobe(const struct bpf_program *prog, bool retprobe,
@@ -344,9 +356,18 @@ struct bpf_uprobe_opts {
__u64 bpf_cookie;
/* uprobe is return probe, invoked at function return time */
bool retprobe;
+ /* multi uprobe values */
+ struct {
+ /* probes count */
+ __u32 cnt;
+ /* paths names array */
+ const char **paths;
+ /* offsets values array */
+ __u64 *offs;
+ } multi;
size_t :0;
};
-#define bpf_uprobe_opts__last_field retprobe
+#define bpf_uprobe_opts__last_field multi.offs
LIBBPF_API struct bpf_link *
bpf_program__attach_uprobe(const struct bpf_program *prog, bool retprobe,
--
2.33.1
Adding new sections kprobe.multi/kretprobe.multi for multi
kprobe programs.
It's now possible to define kprobe/kretprobe program like:
SEC("kprobe.multi/bpf_fentry_test*")
and it will be automatically attached to bpf_fentry_test*
functions.
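A minimal program sketch using the new section (the glob is resolved
against /proc/kallsyms at attach time, and each matching function
gets a probe):

  #include <linux/ptrace.h>
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  char _license[] SEC("license") = "GPL";

  SEC("kprobe.multi/bpf_fentry_test*")
  int test(struct pt_regs *ctx)
  {
          return 0;
  }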
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/bpf/libbpf.c | 105 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 105 insertions(+)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b570e93de735..c1feb5f389a0 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -8348,6 +8348,7 @@ int bpf_program__set_flags(struct bpf_program *prog, __u32 flags)
}
static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cookie);
+static struct bpf_link *attach_kprobe_multi(const struct bpf_program *prog, long cookie);
static struct bpf_link *attach_tp(const struct bpf_program *prog, long cookie);
static struct bpf_link *attach_raw_tp(const struct bpf_program *prog, long cookie);
static struct bpf_link *attach_trace(const struct bpf_program *prog, long cookie);
@@ -8362,6 +8363,8 @@ static const struct bpf_sec_def section_defs[] = {
SEC_DEF("uprobe/", KPROBE, 0, SEC_NONE),
SEC_DEF("kretprobe/", KPROBE, 0, SEC_NONE, attach_kprobe),
SEC_DEF("uretprobe/", KPROBE, 0, SEC_NONE),
+ SEC_DEF("kprobe.multi/", KPROBE, 0, SEC_NONE, attach_kprobe_multi),
+ SEC_DEF("kretprobe.multi/", KPROBE, 0, SEC_NONE, attach_kprobe_multi),
SEC_DEF("tc", SCHED_CLS, 0, SEC_NONE),
SEC_DEF("classifier", SCHED_CLS, 0, SEC_NONE | SEC_SLOPPY_PFX),
SEC_DEF("action", SCHED_ACT, 0, SEC_NONE | SEC_SLOPPY_PFX),
@@ -9918,6 +9921,108 @@ static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cooki
return link;
}
+struct kprobe_resolve_multi {
+ const char *name;
+ char **funcs;
+ __u32 alloc;
+ __u32 cnt;
+};
+
+static bool glob_matches(const char *glob, const char *s)
+{
+ int n = strlen(glob);
+
+ if (n == 1 && glob[0] == '*')
+ return true;
+
+ if (glob[0] == '*' && glob[n - 1] == '*') {
+ const char *subs;
+ /* substring match */
+
+ /* this is hacky, but we don't want to allocate
+ * for no good reason
+ */
+ ((char *)glob)[n - 1] = '\0';
+ subs = strstr(s, glob + 1);
+ ((char *)glob)[n - 1] = '*';
+
+ return subs != NULL;
+ } else if (glob[0] == '*') {
+ size_t nn = strlen(s);
+ /* suffix match */
+
+ /* too short for a given suffix */
+ if (nn < n - 1)
+ return false;
+ return strcmp(s + nn - (n - 1), glob + 1) == 0;
+ } else if (glob[n - 1] == '*') {
+ /* prefix match */
+ return strncmp(s, glob, n - 1) == 0;
+ } else {
+ /* exact match */
+ return strcmp(glob, s) == 0;
+ }
+}
+
+static int kprobe_resolve_multi_cb(void *arg, unsigned long long sym_addr,
+ char sym_type, const char *sym_name)
+{
+ struct kprobe_resolve_multi *res = arg;
+ char **p, *sym;
+
+ if (!glob_matches(res->name, sym_name))
+ return 0;
+
+ if (res->cnt == res->alloc) {
+ res->alloc = max((__u32) 16, res->alloc * 3 / 2);
+ p = libbpf_reallocarray(res->funcs, res->alloc, sizeof(*p));
+ if (!p)
+ return -ENOMEM;
+ res->funcs = p;
+ }
+ sym = strdup(sym_name);
+ if (!sym)
+ return -ENOMEM;
+ res->funcs[res->cnt++] = sym;
+ return 0;
+}
+
+static void free_str_array(char **func, __u32 cnt)
+{
+ __u32 i;
+
+ for (i = 0; i < cnt; i++)
+ free(func[i]);
+ free(func);
+}
+
+static struct bpf_link *attach_kprobe_multi(const struct bpf_program *prog, long cookie)
+{
+ DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts);
+ struct kprobe_resolve_multi res = { };
+ struct bpf_link *link;
+ int err;
+
+ opts.retprobe = str_has_pfx(prog->sec_name, "kretprobe.multi/");
+ if (opts.retprobe)
+ res.name = prog->sec_name + sizeof("kretprobe.multi/") - 1;
+ else
+ res.name = prog->sec_name + sizeof("kprobe.multi/") - 1;
+
+ err = libbpf__kallsyms_parse(&res, kprobe_resolve_multi_cb);
+ if (err) {
+ free_str_array(res.funcs, res.cnt);
+ return libbpf_err_ptr(err);
+ }
+ if (!res.cnt)
+ return libbpf_err_ptr(-ENOENT);
+ opts.multi.cnt = res.cnt;
+ opts.multi.funcs = res.funcs;
+ link = bpf_program__attach_kprobe_opts(prog, NULL, &opts);
+ free_str_array(res.funcs, res.cnt);
+ return link;
+}
+
static void gen_uprobe_legacy_event_name(char *buf, size_t buf_sz,
const char *binary_path, uint64_t offset)
{
--
2.33.1
Adding kprobe multi attach test that uses new interface
to attach multiple probes within single perf event.
The test attaches to the bpf_fentry_test* functions and uses a
single trampoline program with bpf_prog_test_run to trigger the
bpf_fentry_test* functions.
Signed-off-by: Jiri Olsa <[email protected]>
---
.../bpf/prog_tests/multi_kprobe_test.c | 83 +++++++++++++++++++
.../selftests/bpf/progs/multi_kprobe.c | 58 +++++++++++++
2 files changed, 141 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c
create mode 100644 tools/testing/selftests/bpf/progs/multi_kprobe.c
diff --git a/tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c b/tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c
new file mode 100644
index 000000000000..4aae472f9c16
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c
@@ -0,0 +1,83 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "multi_kprobe.skel.h"
+#include "trace_helpers.h"
+
+static void test_funcs_api(void)
+{
+ struct multi_kprobe *skel = NULL;
+ __u32 duration = 0, retval;
+ int err, prog_fd;
+
+ skel = multi_kprobe__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fentry_multi_skel_load"))
+ goto cleanup;
+
+ err = multi_kprobe__attach(skel);
+ if (!ASSERT_OK(err, "fentry_attach"))
+ goto cleanup;
+
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+ NULL, NULL, &retval, &duration);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(retval, 0, "test_run");
+
+ ASSERT_EQ(skel->bss->test2_result, 8, "test2_result");
+ ASSERT_EQ(skel->bss->test3_result, 8, "test3_result");
+
+cleanup:
+ multi_kprobe__destroy(skel);
+}
+
+static void test_addrs_api(void)
+{
+ struct bpf_link *link1 = NULL, *link2 = NULL;
+ DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts);
+ struct multi_kprobe *skel = NULL;
+ __u32 duration = 0, retval;
+ int err, prog_fd;
+ __u64 addrs[8];
+
+ skel = multi_kprobe__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fentry_multi_skel_load"))
+ goto cleanup;
+
+ kallsyms_find("bpf_fentry_test1", &addrs[0]);
+ kallsyms_find("bpf_fentry_test2", &addrs[1]);
+ kallsyms_find("bpf_fentry_test3", &addrs[2]);
+ kallsyms_find("bpf_fentry_test4", &addrs[3]);
+ kallsyms_find("bpf_fentry_test5", &addrs[4]);
+ kallsyms_find("bpf_fentry_test6", &addrs[5]);
+ kallsyms_find("bpf_fentry_test7", &addrs[6]);
+ kallsyms_find("bpf_fentry_test8", &addrs[7]);
+
+ opts.multi.cnt = 8;
+ opts.multi.addrs = (__u64 *) &addrs;
+ link1 = bpf_program__attach_kprobe_opts(skel->progs.test2, NULL, &opts);
+ if (!ASSERT_OK_PTR(link1, "link1"))
+ goto cleanup;
+
+ link2 = bpf_program__attach_kprobe_opts(skel->progs.test3, NULL, &opts);
+ if (!ASSERT_OK_PTR(link2, "link2"))
+ goto cleanup;
+
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+ NULL, NULL, &retval, &duration);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(retval, 0, "test_run");
+
+ ASSERT_EQ(skel->bss->test2_result, 8, "test2_result");
+ ASSERT_EQ(skel->bss->test3_result, 8, "test3_result");
+
+cleanup:
+ bpf_link__destroy(link1);
+ bpf_link__destroy(link2);
+ multi_kprobe__destroy(skel);
+}
+void test_multi_kprobe_test(void)
+{
+ test_funcs_api();
+ test_addrs_api();
+}
diff --git a/tools/testing/selftests/bpf/progs/multi_kprobe.c b/tools/testing/selftests/bpf/progs/multi_kprobe.c
new file mode 100644
index 000000000000..67fe4c2486fc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/multi_kprobe.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+extern const void bpf_fentry_test1 __ksym;
+extern const void bpf_fentry_test2 __ksym;
+extern const void bpf_fentry_test3 __ksym;
+extern const void bpf_fentry_test4 __ksym;
+extern const void bpf_fentry_test5 __ksym;
+extern const void bpf_fentry_test6 __ksym;
+extern const void bpf_fentry_test7 __ksym;
+extern const void bpf_fentry_test8 __ksym;
+
+/* No tests, just to trigger bpf_fentry_test* through tracing test_run */
+SEC("fentry/bpf_modify_return_test")
+int BPF_PROG(test1)
+{
+ return 0;
+}
+
+__u64 test2_result = 0;
+
+SEC("kprobe.multi/bpf_fentry_test*")
+int test2(struct pt_regs *ctx)
+{
+ __u64 addr = bpf_get_func_ip(ctx);
+
+ test2_result += (const void *) addr == &bpf_fentry_test1 ||
+ (const void *) addr == &bpf_fentry_test2 ||
+ (const void *) addr == &bpf_fentry_test3 ||
+ (const void *) addr == &bpf_fentry_test4 ||
+ (const void *) addr == &bpf_fentry_test5 ||
+ (const void *) addr == &bpf_fentry_test6 ||
+ (const void *) addr == &bpf_fentry_test7 ||
+ (const void *) addr == &bpf_fentry_test8;
+ return 0;
+}
+
+__u64 test3_result = 0;
+
+SEC("kretprobe.multi/bpf_fentry_test*")
+int test3(struct pt_regs *ctx)
+{
+ __u64 addr = bpf_get_func_ip(ctx);
+
+ test3_result += (const void *) addr == &bpf_fentry_test1 ||
+ (const void *) addr == &bpf_fentry_test2 ||
+ (const void *) addr == &bpf_fentry_test3 ||
+ (const void *) addr == &bpf_fentry_test4 ||
+ (const void *) addr == &bpf_fentry_test5 ||
+ (const void *) addr == &bpf_fentry_test6 ||
+ (const void *) addr == &bpf_fentry_test7 ||
+ (const void *) addr == &bpf_fentry_test8;
+ return 0;
+}
--
2.33.1
Adding uprobe multi attach test that uses new interface
to attach multiple probes within single perf event.
Signed-off-by: Jiri Olsa <[email protected]>
---
.../bpf/prog_tests/multi_uprobe_test.c | 97 +++++++++++++++++++
.../selftests/bpf/progs/multi_uprobe.c | 26 +++++
2 files changed, 123 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c
create mode 100644 tools/testing/selftests/bpf/progs/multi_uprobe.c
diff --git a/tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c b/tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c
new file mode 100644
index 000000000000..0f939893ef45
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "multi_uprobe.skel.h"
+
+/* this is how USDT semaphore is actually defined, except volatile modifier */
+extern volatile unsigned short uprobe_ref_ctr;
+
+/* attach points */
+static void method0(void) { return ; }
+static void method1(void) { return ; }
+static void method2(void) { return ; }
+static void method3(void) { return ; }
+static void method4(void) { return ; }
+static void method5(void) { return ; }
+static void method6(void) { return ; }
+static void method7(void) { return ; }
+
+void test_multi_uprobe_test(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, uprobe_opts);
+ struct bpf_link *uretprobe_link = NULL;
+ struct bpf_link *uprobe_link = NULL;
+ ssize_t base_addr, ref_ctr_offset;
+ struct multi_uprobe *skel;
+ const char *paths[8];
+ int duration = 0;
+ __u64 offs[8];
+
+ skel = multi_uprobe__open_and_load();
+ if (CHECK(!skel, "skel_open", "failed to open skeleton\n"))
+ return;
+
+ base_addr = get_base_addr();
+ if (CHECK(base_addr < 0, "get_base_addr",
+ "failed to find base addr: %zd", base_addr))
+ return;
+
+ ref_ctr_offset = get_rel_offset((uintptr_t)&uprobe_ref_ctr);
+ if (!ASSERT_GE(ref_ctr_offset, 0, "ref_ctr_offset"))
+ return;
+
+#define INIT(__i) \
+ do { \
+ paths[__i] = (const char *) "/proc/self/exe"; \
+ offs[__i] = get_uprobe_offset(&method ## __i, base_addr); \
+ } while (0)
+
+ INIT(0);
+ INIT(1);
+ INIT(2);
+ INIT(3);
+ INIT(4);
+ INIT(5);
+ INIT(6);
+ INIT(7);
+
+#undef INIT
+
+ uprobe_opts.multi.paths = paths;
+ uprobe_opts.multi.offs = offs;
+
+ uprobe_opts.multi.cnt = 8;
+
+ uprobe_opts.retprobe = false;
+ uprobe_opts.ref_ctr_offset = ref_ctr_offset;
+ uprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uprobe,
+ 0 /* self pid */,
+ NULL, 0,
+ &uprobe_opts);
+ if (!ASSERT_OK_PTR(uprobe_link, "attach_uprobe"))
+ goto cleanup;
+
+ uprobe_opts.retprobe = true;
+ uretprobe_link = bpf_program__attach_uprobe_opts(skel->progs.handle_uretprobe,
+ -1 /* any pid */,
+ NULL, 0,
+ &uprobe_opts);
+ if (!ASSERT_OK_PTR(uretprobe_link, "attach_uretprobe"))
+ goto cleanup;
+
+ method0();
+ method1();
+ method2();
+ method3();
+ method4();
+ method5();
+ method6();
+ method7();
+
+ ASSERT_EQ(skel->bss->test_uprobe_result, 8, "test_uprobe_result");
+ ASSERT_EQ(skel->bss->test_uretprobe_result, 8, "test_uretprobe_result");
+
+cleanup:
+ bpf_link__destroy(uretprobe_link);
+ bpf_link__destroy(uprobe_link);
+ multi_uprobe__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/multi_uprobe.c b/tools/testing/selftests/bpf/progs/multi_uprobe.c
new file mode 100644
index 000000000000..831f7b98baef
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/multi_uprobe.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/ptrace.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+__u64 test_uprobe_result = 0;
+
+SEC("uprobe/trigger_func")
+int handle_uprobe(struct pt_regs *ctx)
+{
+ test_uprobe_result++;
+ return 0;
+}
+
+__u64 test_uretprobe_result = 0;
+
+SEC("uretprobe/trigger_func")
+int handle_uretprobe(struct pt_regs *ctx)
+{
+ test_uretprobe_result++;
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.33.1
Hi Jiri,
On Wed, 24 Nov 2021 09:41:11 +0100
Jiri Olsa <[email protected]> wrote:
> hi,
> adding support to create multiple kprobes/uprobes within single
> perf event. This way we can associate single bpf program with
> multiple kprobes.
Thanks for the change. Basically, you can repeatedly call
create_local_trace_kprobe() and register the result.
>
> Sending this as RFC because I'm not completely sure I haven't
> missed anything in the trace/events area.
OK let me check that.
Thanks,
>
> Also it needs following uprobe fix to work properly:
> https://lore.kernel.org/lkml/[email protected]/
>
> Also available at:
> https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
> bpf/kuprobe_batch
>
> thanks,
> jirka
>
>
> ---
> Jiri Olsa (8):
> perf/kprobe: Add support to create multiple probes
> perf/uprobe: Add support to create multiple probes
> libbpf: Add libbpf__kallsyms_parse function
> libbpf: Add struct perf_event_open_args
> libbpf: Add support to attach multiple [ku]probes
> libbpf: Add support for k[ret]probe.multi program section
> selftest/bpf: Add kprobe multi attach test
> selftest/bpf: Add uprobe multi attach test
>
> include/uapi/linux/perf_event.h | 1 +
> kernel/trace/trace_event_perf.c | 214 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
> kernel/trace/trace_kprobe.c | 47 ++++++++++++++++---
> kernel/trace/trace_probe.c | 2 +-
> kernel/trace/trace_probe.h | 6 ++-
> kernel/trace/trace_uprobe.c | 43 +++++++++++++++--
> tools/include/uapi/linux/perf_event.h | 1 +
> tools/lib/bpf/libbpf.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------
> tools/lib/bpf/libbpf.h | 25 +++++++++-
> tools/lib/bpf/libbpf_internal.h | 5 ++
> tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c | 83 +++++++++++++++++++++++++++++++++
> tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c | 97 ++++++++++++++++++++++++++++++++++++++
> tools/testing/selftests/bpf/progs/multi_kprobe.c | 58 +++++++++++++++++++++++
> tools/testing/selftests/bpf/progs/multi_uprobe.c | 26 +++++++++++
> 14 files changed, 765 insertions(+), 78 deletions(-)
> create mode 100644 tools/testing/selftests/bpf/prog_tests/multi_kprobe_test.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/multi_uprobe_test.c
> create mode 100644 tools/testing/selftests/bpf/progs/multi_kprobe.c
> create mode 100644 tools/testing/selftests/bpf/progs/multi_uprobe.c
>
--
Masami Hiramatsu <[email protected]>
On Wed, 24 Nov 2021 09:41:12 +0100
Jiri Olsa <[email protected]> wrote:
> Adding support to create multiple probes within single perf event.
> This way we can associate single bpf program with multiple kprobes,
> because bpf program gets associated with the perf event.
>
> The perf_event_attr is not extended, current fields for kprobe
> attachment are used for multi attachment.
>
> For the current kprobe attachment we use either:
>
> kprobe_func (in config1) + probe_offset (in config2)
>
> to define kprobe by function name with offset, or:
>
> kprobe_addr (in config2)
>
> to define kprobe with direct address value.
>
> For multi probe attach the same fields point to array of values
> with the same semantic. Each probe is defined as set of values
> with the same array index (idx) as:
>
> kprobe_func[idx] + probe_offset[idx]
>
> to define kprobe by function name with offset, or:
>
> kprobe_addr[idx]
>
> to define kprobe with direct address value.
>
> The number of probes is passed in probe_cnt value, which shares
> the union with wakeup_events/wakeup_watermark values which are
> not used for kprobes.
>
> Since [1] it's possible to stack multiple probe events under
> one head event. We use the same code to allow that for probes
> defined under the perf kprobe interface.
OK, so you also want to add multi-probes on a single event by a
single perf-event syscall, not defining different events.
Those are a bit different; multi-probes on a single event can
invoke a single event handler from different probe points. For
example, the same user bpf handler will be invoked from different
addresses.
>
> [1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 1 +
> kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++----
> kernel/trace/trace_kprobe.c | 47 ++++++++++++--
> kernel/trace/trace_probe.c | 2 +-
> kernel/trace/trace_probe.h | 3 +-
> 5 files changed, 138 insertions(+), 21 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index bd8860eeb291..eea80709d1ed 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -414,6 +414,7 @@ struct perf_event_attr {
> union {
> __u32 wakeup_events; /* wakeup every n events */
> __u32 wakeup_watermark; /* bytes before wakeup */
> + __u32 probe_cnt; /* number of [k,u] probes */
> };
>
> __u32 bp_type;
> diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> index a114549720d6..26078e40c299 100644
> --- a/kernel/trace/trace_event_perf.c
> +++ b/kernel/trace/trace_event_perf.c
> @@ -245,23 +245,27 @@ void perf_trace_destroy(struct perf_event *p_event)
> }
>
> #ifdef CONFIG_KPROBE_EVENTS
> -int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> +static struct trace_event_call*
> +kprobe_init(bool is_retprobe, u64 kprobe_func, u64 kprobe_addr,
> + u64 probe_offset, struct trace_event_call *old)
> {
> int ret;
> char *func = NULL;
> struct trace_event_call *tp_event;
>
> - if (p_event->attr.kprobe_func) {
> + if (kprobe_func) {
> func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL);
> if (!func)
> - return -ENOMEM;
> + return ERR_PTR(-ENOMEM);
> ret = strncpy_from_user(
> - func, u64_to_user_ptr(p_event->attr.kprobe_func),
> + func, u64_to_user_ptr(kprobe_func),
> KSYM_NAME_LEN);
> if (ret == KSYM_NAME_LEN)
> ret = -E2BIG;
> - if (ret < 0)
> - goto out;
> + if (ret < 0) {
> + kfree(func);
> + return ERR_PTR(ret);
> + }
>
> if (func[0] == '\0') {
> kfree(func);
> @@ -270,20 +274,96 @@ int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> }
>
> tp_event = create_local_trace_kprobe(
> - func, (void *)(unsigned long)(p_event->attr.kprobe_addr),
> - p_event->attr.probe_offset, is_retprobe);
> - if (IS_ERR(tp_event)) {
> - ret = PTR_ERR(tp_event);
> - goto out;
> + func, (void *)(unsigned long)(kprobe_addr),
> + probe_offset, is_retprobe, old);
Hmm, here I have a concern (maybe no real issue is caused at this moment).
Ftrace's multi-probe event has the same event/group name among the
probes' internal event calls. However, create_local_trace_kprobe()
actually uses the "func" name for the event name.
I think you should choose a random, different "representative" event name
for the event (not the probe), and share it among the probes on the event,
if the perf event has no event name.
(I'm not sure how the event names are used from inside of the BPF programs,
but just for consistency.)
> + kfree(func);
> + return tp_event;
> +}
> +
> +static struct trace_event_call*
> +kprobe_init_multi(struct perf_event *p_event, bool is_retprobe)
> +{
> + void __user *kprobe_func = u64_to_user_ptr(p_event->attr.kprobe_func);
> + void __user *kprobe_addr = u64_to_user_ptr(p_event->attr.kprobe_addr);
> + u64 *funcs = NULL, *addrs = NULL, *offs = NULL;
> + struct trace_event_call *tp_event, *tp_old = NULL;
> + u32 i, cnt = p_event->attr.probe_cnt;
> + int ret = -EINVAL;
> + size_t size;
> +
> + if (!cnt)
> + return ERR_PTR(-EINVAL);
> +
> + size = cnt * sizeof(u64);
> + if (kprobe_func) {
> + ret = -ENOMEM;
> + funcs = kmalloc(size, GFP_KERNEL);
> + if (!funcs)
> + goto out;
> + ret = -EFAULT;
> + if (copy_from_user(funcs, kprobe_func, size))
> + goto out;
> + }
> +
> + if (kprobe_addr) {
> + ret = -ENOMEM;
> + addrs = kmalloc(size, GFP_KERNEL);
> + if (!addrs)
> + goto out;
> + ret = -EFAULT;
> + if (copy_from_user(addrs, kprobe_addr, size))
> + goto out;
> + /* addresses and offsets share the same array */
> + offs = addrs;
> }
>
> + for (i = 0; i < cnt; i++) {
> + tp_event = kprobe_init(is_retprobe, funcs ? funcs[i] : 0,
> + addrs ? addrs[i] : 0, offs ? offs[i] : 0,
> + tp_old);
> + if (IS_ERR(tp_event)) {
> + if (tp_old)
> + destroy_local_trace_kprobe(tp_old);
> + ret = PTR_ERR(tp_event);
> + goto out;
> + }
> + if (!tp_old)
> + tp_old = tp_event;
> + }
> + ret = 0;
> +out:
> + kfree(funcs);
> + kfree(addrs);
> + return ret ? ERR_PTR(ret) : tp_old;
> +}
> +
> +static struct trace_event_call*
> +kprobe_init_single(struct perf_event *p_event, bool is_retprobe)
> +{
> + struct perf_event_attr *attr = &p_event->attr;
> +
> + return kprobe_init(is_retprobe, attr->kprobe_func, attr->kprobe_addr,
> + attr->probe_offset, NULL);
> +}
> +
> +int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> +{
> + struct trace_event_call *tp_event;
> + int ret;
> +
> + if (p_event->attr.probe_cnt)
isn't this "p_event->attr.probe_cnt > 1" ?
> + tp_event = kprobe_init_multi(p_event, is_retprobe);
> + else
> + tp_event = kprobe_init_single(p_event, is_retprobe);
> +
> + if (IS_ERR(tp_event))
> + return PTR_ERR(tp_event);
> +
> mutex_lock(&event_mutex);
> ret = perf_trace_event_init(tp_event, p_event);
> if (ret)
> destroy_local_trace_kprobe(tp_event);
> mutex_unlock(&event_mutex);
> -out:
> - kfree(func);
> return ret;
> }
>
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index 33272a7b6912..86a7aada853a 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -237,13 +237,18 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
> static int kretprobe_dispatcher(struct kretprobe_instance *ri,
> struct pt_regs *regs);
>
> +static void __free_trace_kprobe(struct trace_kprobe *tk)
> +{
> + kfree(tk->symbol);
> + free_percpu(tk->nhit);
> + kfree(tk);
> +}
> +
> static void free_trace_kprobe(struct trace_kprobe *tk)
> {
> if (tk) {
> trace_probe_cleanup(&tk->tp);
> - kfree(tk->symbol);
> - free_percpu(tk->nhit);
> - kfree(tk);
> + __free_trace_kprobe(tk);
Why is this needed?
> }
> }
>
> @@ -1796,7 +1801,7 @@ static int unregister_kprobe_event(struct trace_kprobe *tk)
> /* create a trace_kprobe, but don't add it to global lists */
> struct trace_event_call *
> create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> - bool is_return)
> + bool is_return, struct trace_event_call *old)
> {
> enum probe_print_type ptype;
> struct trace_kprobe *tk;
> @@ -1820,6 +1825,28 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> return ERR_CAST(tk);
> }
>
> + if (old) {
> + struct trace_kprobe *tk_old;
> +
> + tk_old = trace_kprobe_primary_from_call(old);
So, this will choose the first (primary) probe's function name as
the representative event name, but other probes can have different
event names.
> + if (!tk_old) {
> + ret = -EINVAL;
> + goto error;
> + }
> +
> + /* Append to existing event */
> + ret = trace_probe_append(&tk->tp, &tk_old->tp);
> + if (ret)
> + goto error;
> +
> + /* Register k*probe */
> + ret = __register_trace_kprobe(tk);
> + if (ret)
> + goto error;
If "appended" probe failed to register, it must be "unlinked" from
the first one and goto error to free the trace_kprobe.
if (ret) {
trace_probe_unlink(&tk->tp);
goto error;
}
See append_trace_kprobe() for details.
> +
> + return trace_probe_event_call(&tk->tp);
> + }
> +
> init_trace_event_call(tk);
>
> ptype = trace_kprobe_is_return(tk) ?
> @@ -1841,6 +1868,8 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
>
> void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> {
> + struct trace_probe_event *event;
> + struct trace_probe *pos, *tmp;
> struct trace_kprobe *tk;
>
> tk = trace_kprobe_primary_from_call(event_call);
> @@ -1852,9 +1881,15 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> return;
> }
>
> - __unregister_trace_kprobe(tk);
> + event = tk->tp.event;
> + list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> + tk = container_of(pos, struct trace_kprobe, tp);
> + list_del_init(&pos->list);
> + __unregister_trace_kprobe(tk);
> + __free_trace_kprobe(tk);
> + }
>
> - free_trace_kprobe(tk);
> + trace_probe_event_free(event);
Actually, each probe has already allocated its trace_probe event (which is
not used if it is appended). Thus you have to use trace_probe_unlink(&tk->tp)
in the above loop.
list_for_each_entry_safe(pos, tmp, &event->probes, list) {
tk = container_of(pos, struct trace_kprobe, tp);
__unregister_trace_kprobe(tk);
trace_probe_unlink(&tk->tp); /* This will call trace_probe_event_free() internally */
free_trace_kprobe(tk);
}
Thank you,
> }
> #endif /* CONFIG_PERF_EVENTS */
>
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 3ed2a3f37297..2dff85aa21e9 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -974,7 +974,7 @@ int traceprobe_define_arg_fields(struct trace_event_call *event_call,
> return 0;
> }
>
> -static void trace_probe_event_free(struct trace_probe_event *tpe)
> +void trace_probe_event_free(struct trace_probe_event *tpe)
> {
> kfree(tpe->class.system);
> kfree(tpe->call.name);
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 99e7a5df025e..ba8e46c7efe8 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -333,6 +333,7 @@ int trace_probe_init(struct trace_probe *tp, const char *event,
> const char *group, bool alloc_filter);
> void trace_probe_cleanup(struct trace_probe *tp);
> int trace_probe_append(struct trace_probe *tp, struct trace_probe *to);
> +void trace_probe_event_free(struct trace_probe_event *tpe);
> void trace_probe_unlink(struct trace_probe *tp);
> int trace_probe_register_event_call(struct trace_probe *tp);
> int trace_probe_add_file(struct trace_probe *tp, struct trace_event_file *file);
> @@ -377,7 +378,7 @@ extern int traceprobe_set_print_fmt(struct trace_probe *tp, enum probe_print_typ
> #ifdef CONFIG_PERF_EVENTS
> extern struct trace_event_call *
> create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> - bool is_return);
> + bool is_return, struct trace_event_call *old);
> extern void destroy_local_trace_kprobe(struct trace_event_call *event_call);
>
> extern struct trace_event_call *
> --
> 2.33.1
>
--
Masami Hiramatsu <[email protected]>
On Sun, Nov 28, 2021 at 10:49:54PM +0900, Masami Hiramatsu wrote:
> On Wed, 24 Nov 2021 09:41:12 +0100
> Jiri Olsa <[email protected]> wrote:
>
> > Adding support to create multiple probes within single perf event.
> > This way we can associate single bpf program with multiple kprobes,
> > because bpf program gets associated with the perf event.
> >
> > The perf_event_attr is not extended, current fields for kprobe
> > attachment are used for multi attachment.
> >
> > For current kprobe attachment we use either:
> >
> > kprobe_func (in config1) + probe_offset (in config2)
> >
> > to define kprobe by function name with offset, or:
> >
> > kprobe_addr (in config2)
> >
> > to define kprobe with direct address value.
> >
> > For multi probe attach the same fields point to array of values
> > with the same semantic. Each probe is defined as set of values
> > with the same array index (idx) as:
> >
> > kprobe_func[idx] + probe_offset[idx]
> >
> > to define kprobe by function name with offset, or:
> >
> > kprobe_addr[idx]
> >
> > to define kprobe with direct address value.
> >
> > The number of probes is passed in probe_cnt value, which shares
> > the union with wakeup_events/wakeup_watermark values which are
> > not used for kprobes.
> >
> > Since [1] it's possible to stack multiple probes events under
> > one head event. Using the same code to allow that for probes
> > defined under perf kprobe interface.
>
> OK, so you also want to add multi-probes on a single event by a
> single perf-event syscall, not defining different events.
correct.. bpf program is then attached to perf event with
ioctl call.. this way we can have multiple probes attached
to single bpf program
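fwiw, the whole flow from user space would then look roughly like
this (just a sketch of the proposed interface, error handling
omitted; 'kprobe_type' stands for the dynamic kprobe PMU type read
from /sys/bus/event_source/devices/kprobe/type and 'prog_fd' for a
loaded BPF_PROG_TYPE_KPROBE program fd):

  #include <linux/perf_event.h>
  #include <sys/ioctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  const char *syms[2] = { "ksys_read", "ksys_write" };
  __u64 funcs[2] = { (__u64)(unsigned long)syms[0],
                     (__u64)(unsigned long)syms[1] };
  __u64 offs[2]  = { 0, 0 };

  struct perf_event_attr attr = {
          .size         = sizeof(attr),
          .type         = kprobe_type,
          .kprobe_func  = (__u64)(unsigned long)funcs, /* config1: array of name ptrs */
          .probe_offset = (__u64)(unsigned long)offs,  /* config2: array of offsets */
          .probe_cnt    = 2,
  };

  int fd = syscall(__NR_perf_event_open, &attr, 0 /* pid */, -1 /* cpu */,
                   -1 /* group_fd */, PERF_FLAG_FD_CLOEXEC);

  /* single bpf program gets attached to all probes in the event */
  ioctl(fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
  ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);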
>
> Those are a bit different, multi-probes on a single event can
> invoke a single event handler from different probe points. For
> example the same user bpf handler will be invoked from different
> addresses.
right, that's the goal, having single bpf program executed
from multiple probes
>
> >
> > [1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/uapi/linux/perf_event.h | 1 +
> > kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++----
> > kernel/trace/trace_kprobe.c | 47 ++++++++++++--
> > kernel/trace/trace_probe.c | 2 +-
> > kernel/trace/trace_probe.h | 3 +-
> > 5 files changed, 138 insertions(+), 21 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> > index bd8860eeb291..eea80709d1ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -414,6 +414,7 @@ struct perf_event_attr {
> > union {
> > __u32 wakeup_events; /* wakeup every n events */
> > __u32 wakeup_watermark; /* bytes before wakeup */
> > + __u32 probe_cnt; /* number of [k,u] probes */
> > };
> >
> > __u32 bp_type;
> > diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
> > index a114549720d6..26078e40c299 100644
> > --- a/kernel/trace/trace_event_perf.c
> > +++ b/kernel/trace/trace_event_perf.c
> > @@ -245,23 +245,27 @@ void perf_trace_destroy(struct perf_event *p_event)
> > }
> >
> > #ifdef CONFIG_KPROBE_EVENTS
> > -int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> > +static struct trace_event_call*
> > +kprobe_init(bool is_retprobe, u64 kprobe_func, u64 kprobe_addr,
> > + u64 probe_offset, struct trace_event_call *old)
> > {
> > int ret;
> > char *func = NULL;
> > struct trace_event_call *tp_event;
> >
> > - if (p_event->attr.kprobe_func) {
> > + if (kprobe_func) {
> > func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL);
> > if (!func)
> > - return -ENOMEM;
> > + return ERR_PTR(-ENOMEM);
> > ret = strncpy_from_user(
> > - func, u64_to_user_ptr(p_event->attr.kprobe_func),
> > + func, u64_to_user_ptr(kprobe_func),
> > KSYM_NAME_LEN);
> > if (ret == KSYM_NAME_LEN)
> > ret = -E2BIG;
> > - if (ret < 0)
> > - goto out;
> > + if (ret < 0) {
> > + kfree(func);
> > + return ERR_PTR(ret);
> > + }
> >
> > if (func[0] == '\0') {
> > kfree(func);
> > @@ -270,20 +274,96 @@ int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> > }
> >
> > tp_event = create_local_trace_kprobe(
> > - func, (void *)(unsigned long)(p_event->attr.kprobe_addr),
> > - p_event->attr.probe_offset, is_retprobe);
> > - if (IS_ERR(tp_event)) {
> > - ret = PTR_ERR(tp_event);
> > - goto out;
> > + func, (void *)(unsigned long)(kprobe_addr),
> > + probe_offset, is_retprobe, old);
>
> Hmm, here I have a concern (maybe no real issue is caused at this moment.)
> Since ftrace's multi-probe event has the same event/group name among the
> probes' internal event-calls. However, create_local_trace_kprobe()
> actually uses the "func" name for the event name.
> I think you should choose a random, different "representative" event name
> for the event (not probe), and share it among the probes on the event,
> if the perf event has no event name.
>
> (I'm not sure how the event names are used from inside of the BPF programs,
> but just for the consistency.)
ok, I don't think event names are used, I'll check
>
> > + kfree(func);
> > + return tp_event;
> > +}
> > +
> > +static struct trace_event_call*
> > +kprobe_init_multi(struct perf_event *p_event, bool is_retprobe)
> > +{
> > + void __user *kprobe_func = u64_to_user_ptr(p_event->attr.kprobe_func);
> > + void __user *kprobe_addr = u64_to_user_ptr(p_event->attr.kprobe_addr);
> > + u64 *funcs = NULL, *addrs = NULL, *offs = NULL;
> > + struct trace_event_call *tp_event, *tp_old = NULL;
> > + u32 i, cnt = p_event->attr.probe_cnt;
> > + int ret = -EINVAL;
> > + size_t size;
> > +
> > + if (!cnt)
> > + return ERR_PTR(-EINVAL);
> > +
> > + size = cnt * sizeof(u64);
> > + if (kprobe_func) {
> > + ret = -ENOMEM;
> > + funcs = kmalloc(size, GFP_KERNEL);
> > + if (!funcs)
> > + goto out;
> > + ret = -EFAULT;
> > + if (copy_from_user(funcs, kprobe_func, size))
> > + goto out;
> > + }
> > +
> > + if (kprobe_addr) {
> > + ret = -ENOMEM;
> > + addrs = kmalloc(size, GFP_KERNEL);
> > + if (!addrs)
> > + goto out;
> > + ret = -EFAULT;
> > + if (copy_from_user(addrs, kprobe_addr, size))
> > + goto out;
> > + /* addresses and offsets share the same array */
> > + offs = addrs;
> > }
> >
> > + for (i = 0; i < cnt; i++) {
> > + tp_event = kprobe_init(is_retprobe, funcs ? funcs[i] : 0,
> > + addrs ? addrs[i] : 0, offs ? offs[i] : 0,
> > + tp_old);
> > + if (IS_ERR(tp_event)) {
> > + if (tp_old)
> > + destroy_local_trace_kprobe(tp_old);
> > + ret = PTR_ERR(tp_event);
> > + goto out;
> > + }
> > + if (!tp_old)
> > + tp_old = tp_event;
> > + }
> > + ret = 0;
> > +out:
> > + kfree(funcs);
> > + kfree(addrs);
> > + return ret ? ERR_PTR(ret) : tp_old;
> > +}
> > +
> > +static struct trace_event_call*
> > +kprobe_init_single(struct perf_event *p_event, bool is_retprobe)
> > +{
> > + struct perf_event_attr *attr = &p_event->attr;
> > +
> > + return kprobe_init(is_retprobe, attr->kprobe_func, attr->kprobe_addr,
> > + attr->probe_offset, NULL);
> > +}
> > +
> > +int perf_kprobe_init(struct perf_event *p_event, bool is_retprobe)
> > +{
> > + struct trace_event_call *tp_event;
> > + int ret;
> > +
> > + if (p_event->attr.probe_cnt)
>
> isn't this "p_event->attr.probe_cnt > 1" ?
right, probe_cnt is just added by this patchset and used only when
there are multiple probes being attached, so that's why it works
even with 1 at the moment
will change
>
> > + tp_event = kprobe_init_multi(p_event, is_retprobe);
> > + else
> > + tp_event = kprobe_init_single(p_event, is_retprobe);
> > +
> > + if (IS_ERR(tp_event))
> > + return PTR_ERR(tp_event);
> > +
> > mutex_lock(&event_mutex);
> > ret = perf_trace_event_init(tp_event, p_event);
> > if (ret)
> > destroy_local_trace_kprobe(tp_event);
> > mutex_unlock(&event_mutex);
> > -out:
> > - kfree(func);
> > return ret;
> > }
> >
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index 33272a7b6912..86a7aada853a 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -237,13 +237,18 @@ static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs);
> > static int kretprobe_dispatcher(struct kretprobe_instance *ri,
> > struct pt_regs *regs);
> >
> > +static void __free_trace_kprobe(struct trace_kprobe *tk)
> > +{
> > + kfree(tk->symbol);
> > + free_percpu(tk->nhit);
> > + kfree(tk);
> > +}
> > +
> > static void free_trace_kprobe(struct trace_kprobe *tk)
> > {
> > if (tk) {
> > trace_probe_cleanup(&tk->tp);
> > - kfree(tk->symbol);
> > - free_percpu(tk->nhit);
> > - kfree(tk);
> > + __free_trace_kprobe(tk);
>
> Why is this needed?
I needed some free function that does not call trace_probe_cleanup and
trace_probe_unlink.. I wrote details in the destroy_local_trace_kprobe
comment below
>
> > }
> > }
> >
> > @@ -1796,7 +1801,7 @@ static int unregister_kprobe_event(struct trace_kprobe *tk)
> > /* create a trace_kprobe, but don't add it to global lists */
> > struct trace_event_call *
> > create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > - bool is_return)
> > + bool is_return, struct trace_event_call *old)
> > {
> > enum probe_print_type ptype;
> > struct trace_kprobe *tk;
> > @@ -1820,6 +1825,28 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > return ERR_CAST(tk);
> > }
> >
> > + if (old) {
> > + struct trace_kprobe *tk_old;
> > +
> > + tk_old = trace_kprobe_primary_from_call(old);
>
> So, this will choose the first (primary) probe's function name as
> the representative event name. But other probes can have different
> event names.
ok
>
> > + if (!tk_old) {
> > + ret = -EINVAL;
> > + goto error;
> > + }
> > +
> > + /* Append to existing event */
> > + ret = trace_probe_append(&tk->tp, &tk_old->tp);
> > + if (ret)
> > + goto error;
> > +
> > + /* Register k*probe */
> > + ret = __register_trace_kprobe(tk);
> > + if (ret)
> > + goto error;
>
> If "appended" probe failed to register, it must be "unlinked" from
> the first one and goto error to free the trace_kprobe.
>
> if (ret) {
> trace_probe_unlink(&tk->tp);
> goto error;
> }
>
> See append_trace_kprobe() for details.
so there's goto error jumping to:
error:
free_trace_kprobe(tk);
that calls:
trace_probe_cleanup
-> trace_probe_unlink
that should do it, right?
>
> > +
> > + return trace_probe_event_call(&tk->tp);
> > + }
> > +
> > init_trace_event_call(tk);
> >
> > ptype = trace_kprobe_is_return(tk) ?
> > @@ -1841,6 +1868,8 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> >
> > void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> > {
> > + struct trace_probe_event *event;
> > + struct trace_probe *pos, *tmp;
> > struct trace_kprobe *tk;
> >
> > tk = trace_kprobe_primary_from_call(event_call);
> > @@ -1852,9 +1881,15 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> > return;
> > }
> >
> > - __unregister_trace_kprobe(tk);
> > + event = tk->tp.event;
> > + list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> > + list_del_init(&pos->list);
> > + __unregister_trace_kprobe(tk);
> > + __free_trace_kprobe(tk);
> > + }
> >
> > - free_trace_kprobe(tk);
> > + trace_probe_event_free(event);
>
> Actually, each probe already allocated the trace_probe events (which are not
> used if it is appended). Thus you have to use trace_probe_unlink(&tk->tp) in
> the above loop.
>
> list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> __unregister_trace_kprobe(tk);
> trace_probe_unlink(&tk->tp); /* This will call trace_probe_event_free() internally */
> free_trace_kprobe(tk);
> }
so calling trace_probe_event_free inside this loop is a problem,
because the loop iterates that trace_probe_event's probes list,
and the last probe removed will trigger trace_probe_event_free, which
will free the list we iterate.. and we go down ;-)
so that's why I added new free function '__free_trace_kprobe'
that frees everything as free_trace_kprobe, but does not call
trace_probe_unlink
event = tk->tp.event;
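/*
 * note: can't use trace_probe_unlink here, removing the last
 * probe would trigger trace_probe_event_free and free the very
 * list we are iterating, so unlink manually and free the event
 * once after the loop
 */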
list_for_each_entry_safe(pos, tmp, &event->probes, list) {
list_del_init(&pos->list);
__unregister_trace_kprobe(tk);
__free_trace_kprobe(tk);
}
trace_probe_event_free(event);
and there's trace_probe_event_free(event) to make the final free
thanks,
jirka
On Sun, 28 Nov 2021 23:34:13 +0100
Jiri Olsa <[email protected]> wrote:
>
> > > + if (!tk_old) {
> > > + ret = -EINVAL;
> > > + goto error;
> > > + }
> > > +
> > > + /* Append to existing event */
> > > + ret = trace_probe_append(&tk->tp, &tk_old->tp);
> > > + if (ret)
> > > + goto error;
> > > +
> > > + /* Register k*probe */
> > > + ret = __register_trace_kprobe(tk);
> > > + if (ret)
> > > + goto error;
> >
> > If "appended" probe failed to register, it must be "unlinked" from
> > the first one and goto error to free the trace_kprobe.
> >
> > if (ret) {
> > trace_probe_unlink(&tk->tp);
> > goto error;
> > }
> >
> > See append_trace_kprobe() for details.
>
> so there's goto error jumping to:
>
> error:
> free_trace_kprobe(tk);
>
> that calls:
> trace_probe_cleanup
> -> trace_probe_unlink
>
> that should do it, right?
Ah, OK. Clean up all the kprobe events in this function. Then it's good.
>
> >
> > > +
> > > + return trace_probe_event_call(&tk->tp);
> > > + }
> > > +
> > > init_trace_event_call(tk);
> > >
> > > ptype = trace_kprobe_is_return(tk) ?
> > > @@ -1841,6 +1868,8 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
> > >
> > > void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> > > {
> > > + struct trace_probe_event *event;
> > > + struct trace_probe *pos, *tmp;
> > > struct trace_kprobe *tk;
> > >
> > > tk = trace_kprobe_primary_from_call(event_call);
> > > @@ -1852,9 +1881,15 @@ void destroy_local_trace_kprobe(struct trace_event_call *event_call)
> > > return;
> > > }
> > >
> > > - __unregister_trace_kprobe(tk);
> > > + event = tk->tp.event;
> > > + list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> > > + list_del_init(&pos->list);
> > > + __unregister_trace_kprobe(tk);
> > > + __free_trace_kprobe(tk);
> > > + }
> > >
> > > - free_trace_kprobe(tk);
> > > + trace_probe_event_free(event);
> >
> > Actually, each probe already allocated the trace_probe events (which are not
> > used if it is appended). Thus you have to use trace_probe_unlink(&tk->tp) in
> > the above loop.
> >
> > list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> > __unregister_trace_kprobe(tk);
> > trace_probe_unlink(&tk->tp); /* This will call trace_probe_event_free() internally */
> > free_trace_kprobe(tk);
> > }
>
> so calling trace_probe_event_free inside this loop is a problem,
> because the loop iterates that trace_probe_event's probes list,
> and the last probe removed will trigger trace_probe_event_free, which
> will free the list we iterate.. and we go down ;-)
Oops, right. So in this case, you are looping on all the probes
on an event, so the event is referred to outside of the loop.
OK, I got it.
In the ftrace kprobe-event, this loop cursor is done by dynevent,
so this problem doesn't occur. But BPF is only using the
trace_event, thus this special routine is needed.
Could you add such a comment on your loop?
Thank you,
>
> so that's why I added new free function '__free_trace_kprobe'
> that frees everything as free_trace_kprobe, but does not call
> trace_probe_unlink
>
> event = tk->tp.event;
> list_for_each_entry_safe(pos, tmp, &event->probes, list) {
> list_del_init(&pos->list);
> __unregister_trace_kprobe(tk);
> __free_trace_kprobe(tk);
> }
>
> trace_probe_event_free(event);
>
> and there's trace_probe_event_free(event) to make the final free
>
> thanks,
> jirka
>
--
Masami Hiramatsu <[email protected]>
On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
>
> Adding support to create multiple probes within single perf event.
> This way we can associate single bpf program with multiple kprobes,
> because bpf program gets associated with the perf event.
>
> The perf_event_attr is not extended, current fields for kprobe
> attachment are used for multi attachment.
I'm a bit concerned with complicating perf_event_attr further to
support this multi-attach. For BPF, at least, we now have
bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
syscall which allows much simpler and cleaner API to do this. Libbpf
will actually pick bpf_link-based attachment if kernel supports it. I
think we should better do bpf_link-based approach from the get go.
Another thing I'd like you to keep in mind and think about is BPF
cookie. Currently kprobe/uprobe/tracepoint allow to associate
arbitrary user-provided u64 value which will be accessible from BPF
program with bpf_get_attach_cookie(). With multi-attach kprobes this
becomes an extremely crucial feature to support, otherwise it's both
expensive, inconvenient and complicated to be able to distinguish
between different instances of the same multi-attach kprobe
invocation. So with that, what would be the interface to specify these
BPF cookies for this multi-attach kprobe, if we are going through
perf_event_attr. Probably picking yet another unused field and
union-izing it with a pointer. It will work, but makes the interface
even more overloaded. While for LINK_CREATE we can just add another
pointer to a u64[] with the same size as number of kfunc names and
offsets.
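To make that concrete, the LINK_CREATE part of union bpf_attr could
grow something along these lines (purely a hypothetical sketch, the
field names are made up):

  struct { /* used by BPF_LINK_CREATE for multi-attach kprobes */
          __u32   cnt;     /* number of probes */
          __u64   syms;    /* u64[cnt], user ptrs to function names */
          __u64   addrs;   /* u64[cnt], or direct addresses */
          __u64   cookies; /* u64[cnt], per-probe BPF cookies */
  } kprobe_multi;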
But other than that, I'm super happy that you are working on these
complicated multi-attach capabilities! It would be great to benchmark
one-by-one attachment vs multi-attach to the same set of kprobes once
you arrive at the final implementation.
>
> For current kprobe attachment we use either:
>
> kprobe_func (in config1) + probe_offset (in config2)
>
> to define kprobe by function name with offset, or:
>
> kprobe_addr (in config2)
>
> to define kprobe with direct address value.
>
> For multi probe attach the same fields point to array of values
> with the same semantic. Each probe is defined as set of values
> with the same array index (idx) as:
>
> kprobe_func[idx] + probe_offset[idx]
>
> to define kprobe by function name with offset, or:
>
> kprobe_addr[idx]
>
> to define kprobe with direct address value.
>
> The number of probes is passed in probe_cnt value, which shares
> the union with wakeup_events/wakeup_watermark values which are
> not used for kprobes.
>
> Since [1] it's possible to stack multiple probes events under
> one head event. Using the same code to allow that for probes
> defined under perf kprobe interface.
>
> [1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 1 +
> kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++----
> kernel/trace/trace_kprobe.c | 47 ++++++++++++--
> kernel/trace/trace_probe.c | 2 +-
> kernel/trace/trace_probe.h | 3 +-
> 5 files changed, 138 insertions(+), 21 deletions(-)
>
[...]
On Tue, Nov 30, 2021 at 10:53 PM Andrii Nakryiko
<[email protected]> wrote:
>
> On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> >
> > Adding support to create multiple probes within single perf event.
> > This way we can associate single bpf program with multiple kprobes,
> > because bpf program gets associated with the perf event.
> >
> > The perf_event_attr is not extended, current fields for kprobe
> > attachment are used for multi attachment.
>
> I'm a bit concerned with complicating perf_event_attr further to
> support this multi-attach. For BPF, at least, we now have
> bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> syscall which allows much simpler and cleaner API to do this. Libbpf
> will actually pick bpf_link-based attachment if kernel supports it. I
> think we should better do bpf_link-based approach from the get go.
>
> Another thing I'd like you to keep in mind and think about is BPF
> cookie. Currently kprobe/uprobe/tracepoint allow to associate
> arbitrary user-provided u64 value which will be accessible from BPF
> program with bpf_get_attach_cookie(). With multi-attach kprobes this
> becomes an extremely crucial feature to support, otherwise it's both
> expensive, inconvenient and complicated to be able to distinguish
> between different instances of the same multi-attach kprobe
> invocation. So with that, what would be the interface to specify these
> BPF cookies for this multi-attach kprobe, if we are going through
> perf_event_attr. Probably picking yet another unused field and
> union-izing it with a pointer. It will work, but makes the interface
> even more overloaded. While for LINK_CREATE we can just add another
> pointer to a u64[] with the same size as number of kfunc names and
> offsets.
Oh, and to be clear, I'm not proposing to bypass underlying perf
infra. Rather use it directly as an internal API, not through
perf_event_open syscall.
>
> But other than that, I'm super happy that you are working on these
> complicated multi-attach capabilities! It would be great to benchmark
> one-by-one attachment vs multi-attach to the same set of kprobes once
> you arrive at the final implementation.
>
> >
> > For current kprobe attachment we use either:
> >
> > kprobe_func (in config1) + probe_offset (in config2)
> >
> > to define kprobe by function name with offset, or:
> >
> > kprobe_addr (in config2)
> >
> > to define kprobe with direct address value.
> >
> > For multi probe attach the same fields point to array of values
> > with the same semantic. Each probe is defined as set of values
> > with the same array index (idx) as:
> >
> > kprobe_func[idx] + probe_offset[idx]
> >
> > to define kprobe by function name with offset, or:
> >
> > kprobe_addr[idx]
> >
> > to define kprobe with direct address value.
> >
> > The number of probes is passed in probe_cnt value, which shares
> > the union with wakeup_events/wakeup_watermark values which are
> > not used for kprobes.
> >
> > Since [1] it's possible to stack multiple probes events under
> > one head event. Using the same code to allow that for probes
> > defined under perf kprobe interface.
> >
> > [1] https://lore.kernel.org/lkml/156095682948.28024.14190188071338900568.stgit@devnote2/
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/uapi/linux/perf_event.h | 1 +
> > kernel/trace/trace_event_perf.c | 106 ++++++++++++++++++++++++++++----
> > kernel/trace/trace_kprobe.c | 47 ++++++++++++--
> > kernel/trace/trace_probe.c | 2 +-
> > kernel/trace/trace_probe.h | 3 +-
> > 5 files changed, 138 insertions(+), 21 deletions(-)
> >
>
> [...]
On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> >
> > Adding support to create multiple probes within single perf event.
> > This way we can associate single bpf program with multiple kprobes,
> > because bpf program gets associated with the perf event.
> >
> > The perf_event_attr is not extended, current fields for kprobe
> > attachment are used for multi attachment.
>
> I'm a bit concerned with complicating perf_event_attr further to
> support this multi-attach. For BPF, at least, we now have
> bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> syscall which allows much simpler and cleaner API to do this. Libbpf
> will actually pick bpf_link-based attachment if kernel supports it. I
> think we should better do bpf_link-based approach from the get go.
>
> Another thing I'd like you to keep in mind and think about is BPF
> cookie. Currently kprobe/uprobe/tracepoint allow to associate
> arbitrary user-provided u64 value which will be accessible from BPF
> program with bpf_get_attach_cookie(). With multi-attach kprobes this
> becomes an extremely crucial feature to support, otherwise it's both
> expensive, inconvenient and complicated to be able to distinguish
> between different instances of the same multi-attach kprobe
> invocation. So with that, what would be the interface to specify these
> BPF cookies for this multi-attach kprobe, if we are going through
> perf_event_attr. Probably picking yet another unused field and
> union-izing it with a pointer. It will work, but makes the interface
> even more overloaded. While for LINK_CREATE we can just add another
> pointer to a u64[] with the same size as number of kfunc names and
> offsets.
I'm not sure we could bypass perf event easily.. perhaps introduce
BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
that might be better, that way we'd have full control over the API
>
> But other than that, I'm super happy that you are working on these
> complicated multi-attach capabilities! It would be great to benchmark
> one-by-one attachment vs multi-attach to the same set of kprobes once
> you arrive at the final implementation.
I have the change for bpftrace to use this and even though there's
some speed up, it's not as substantial as for trampolines
looks like we 'only' got rid of the multiple perf syscall overheads,
compared to rcu syncs timeouts like we eliminated for trampolines
I'll make full benchmarks once we have some final solution
jirka
On Wed, Dec 1, 2021 at 1:32 PM Jiri Olsa <[email protected]> wrote:
>
> On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> > On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Adding support to create multiple probes within single perf event.
> > > This way we can associate single bpf program with multiple kprobes,
> > > because bpf program gets associated with the perf event.
> > >
> > > The perf_event_attr is not extended, current fields for kprobe
> > > attachment are used for multi attachment.
> >
> > I'm a bit concerned with complicating perf_event_attr further to
> > support this multi-attach. For BPF, at least, we now have
> > bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> > syscall which allows much simpler and cleaner API to do this. Libbpf
> > will actually pick bpf_link-based attachment if kernel supports it. I
> > think we should better do bpf_link-based approach from the get go.
> >
> > Another thing I'd like you to keep in mind and think about is BPF
> > cookie. Currently kprobe/uprobe/tracepoint allow to associate
> > arbitrary user-provided u64 value which will be accessible from BPF
> > program with bpf_get_attach_cookie(). With multi-attach kprobes this
> > becomes an extremely crucial feature to support, otherwise it's both
> > expensive, inconvenient and complicated to be able to distinguish
> > between different instances of the same multi-attach kprobe
> > invocation. So with that, what would be the interface to specify these
> > BPF cookies for this multi-attach kprobe, if we are going through
> > perf_event_attr. Probably picking yet another unused field and
> > union-izing it with a pointer. It will work, but makes the interface
> > even more overloaded. While for LINK_CREATE we can just add another
> > pointer to a u64[] with the same size as number of kfunc names and
> > offsets.
>
> I'm not sure we could bypass perf event easily.. perhaps introduce
> BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
> type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
> that might be better, that way we'd have full control over the API
Indeed. The existing kprobe prog type has this api:
* Return: BPF programs always return an integer which is interpreted by
* kprobe handler as:
* 0 - return from kprobe (event is filtered out)
* 1 - store kprobe event into ring buffer
that part we cannot change.
No one was using that filtering feature. It often was in the way.
New MULTI_KPROBE prog type should not have it.
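For reference, that return contract applies to any kprobe program,
even a trivial one (minimal sketch, ksys_read is just an example
target):

  #include <linux/ptrace.h>
  #include <bpf/bpf_helpers.h>

  SEC("kprobe/ksys_read")
  int trace_read(struct pt_regs *ctx)
  {
          /* 1 - store kprobe event into ring buffer, 0 - filter it out */
          return 1;
  }

  char LICENSE[] SEC("license") = "GPL";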
On Wed, Dec 1, 2021 at 1:32 PM Jiri Olsa <[email protected]> wrote:
>
> On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> > On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Adding support to create multiple probes within single perf event.
> > > This way we can associate single bpf program with multiple kprobes,
> > > because bpf program gets associated with the perf event.
> > >
> > > The perf_event_attr is not extended, current fields for kprobe
> > > attachment are used for multi attachment.
> >
> > I'm a bit concerned with complicating perf_event_attr further to
> > support this multi-attach. For BPF, at least, we now have
> > bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> > syscall which allows much simpler and cleaner API to do this. Libbpf
> > will actually pick bpf_link-based attachment if kernel supports it. I
> > think we should better do bpf_link-based approach from the get go.
> >
> > Another thing I'd like you to keep in mind and think about is BPF
> > cookie. Currently kprobe/uprobe/tracepoint allow to associate
> > arbitrary user-provided u64 value which will be accessible from BPF
> > program with bpf_get_attach_cookie(). With multi-attach kprobes this
> > > becomes an extremely crucial feature to support, otherwise it's both
> > expensive, inconvenient and complicated to be able to distinguish
> > between different instances of the same multi-attach kprobe
> > invocation. So with that, what would be the interface to specify these
> > BPF cookies for this multi-attach kprobe, if we are going through
> > perf_event_attr. Probably picking yet another unused field and
> > union-izing it with a pointer. It will work, but makes the interface
> > even more overloaded. While for LINK_CREATE we can just add another
> > pointer to a u64[] with the same size as number of kfunc names and
> > offsets.
>
> I'm not sure we could bypass perf event easily.. perhaps introduce
> BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
> type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
> that might be better, that way we'd have full control over the API
Sure, new type works.
>
> >
> > But other than that, I'm super happy that you are working on these
> > complicated multi-attach capabilities! It would be great to benchmark
> > one-by-one attachment vs multi-attach to the same set of kprobes once
> > you arrive at the final implementation.
>
> I have the change for bpftrace to use this and even though there's
> some speed up, it's not as substantial as for trampolines
>
> looks like we 'only' got rid of the multiple perf syscall overheads,
> compared to rcu syncs timeouts like we eliminated for trampolines
if it's just eliminating a pretty small overhead of multiple syscalls,
then it would be quite disappointing to add a bunch of complexity just
for that. Are there any reasons we can't use the same low-level ftrace
batch attach API to speed this up considerably? I assume it's only
possible if kprobe is attached at the beginning of the function (not
sure how kretprobe is treated here), so we can either say that this
new kprobe prog type can only be attached at the beginning of each
function and enforce that (probably would be totally reasonable
assumption as that's what's happening most frequently in practice).
Worst case, should be possible to split all requested attach targets
into two groups, one fast at function entry and all the rest.
Am I too far off on this one? There might be some more complications
that I don't see.
>
> I'll make full benchmarks once we have some final solution
>
> jirka
>
On Mon, Dec 06, 2021 at 07:15:58PM -0800, Andrii Nakryiko wrote:
> On Wed, Dec 1, 2021 at 1:32 PM Jiri Olsa <[email protected]> wrote:
> >
> > On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> > > On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> > > >
> > > > Adding support to create multiple probes within single perf event.
> > > > This way we can associate single bpf program with multiple kprobes,
> > > > because bpf program gets associated with the perf event.
> > > >
> > > > The perf_event_attr is not extended, current fields for kprobe
> > > > attachment are used for multi attachment.
> > >
> > > I'm a bit concerned with complicating perf_event_attr further to
> > > support this multi-attach. For BPF, at least, we now have
> > > bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> > > syscall which allows much simpler and cleaner API to do this. Libbpf
> > > will actually pick bpf_link-based attachment if kernel supports it. I
> > > think we should better do bpf_link-based approach from the get go.
> > >
> > > Another thing I'd like you to keep in mind and think about is BPF
> > > cookie. Currently kprobe/uprobe/tracepoint allow to associate
> > > arbitrary user-provided u64 value which will be accessible from BPF
> > > program with bpf_get_attach_cookie(). With multi-attach kprobes this
> > > becomes an extremely crucial feature to support, otherwise it's both
> > > expensive, inconvenient and complicated to be able to distinguish
> > > between different instances of the same multi-attach kprobe
> > > invocation. So with that, what would be the interface to specify these
> > > BPF cookies for this multi-attach kprobe, if we are going through
> > > perf_event_attr. Probably picking yet another unused field and
> > > union-izing it with a pointer. It will work, but makes the interface
> > > even more overloaded. While for LINK_CREATE we can just add another
> > > pointer to a u64[] with the same size as number of kfunc names and
> > > offsets.
> >
> > I'm not sure we could bypass perf event easily.. perhaps introduce
> > BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
> > type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
> > that might be better, that way we'd have full control over the API
>
> Sure, new type works.
>
> >
> > >
> > > But other than that, I'm super happy that you are working on these
> > > complicated multi-attach capabilities! It would be great to benchmark
> > > one-by-one attachment vs multi-attach to the same set of kprobes once
> > > you arrive at the final implementation.
> >
> > I have the change for bpftrace to use this and even though there's
> > some speed up, it's not as substantial as for trampolines
> >
> > looks like we 'only' got rid of the multiple perf syscall overheads,
> > compared to rcu syncs timeouts like we eliminated for trampolines
>
> if it's just eliminating a pretty small overhead of multiple syscalls,
> then it would be quite disappointing to add a bunch of complexity just
> for that.
I meant it's not as huge a save as for trampolines, but I expect some
noticeable speedup, I'll make more benchmarks with the current patchset
> Are there any reasons we can't use the same low-level ftrace
> batch attach API to speed this up considerably? I assume it's only
> possible if kprobe is attached at the beginning of the function (not
> sure how kretprobe is treated here), so we can either say that this
> new kprobe prog type can only be attached at the beginning of each
> function and enforce that (probably would be totally reasonable
> assumption as that's what's happening most frequently in practice).
> Worst case, should be possible to split all requested attach targets
> into two groups, one fast at function entry and all the rest.
>
> Am I too far off on this one? There might be some more complications
> that I don't see.
I'd need to check more on kprobes internals, but.. ;-)
the new ftrace interface is special for 'direct' trampolines and
I think that although kprobes can use ftrace for attaching, they
use it in a different way
also this current 'multi attach' approach is on top of current kprobe
interface, if we wanted to use the new ftrace API we'd need to add new
kprobe interface and change the kprobe attaching to use it (for cases
it's attached at the function entry)
jirka
>
> >
> > I'll make full benchmarks once we have some final solution
> >
> > jirka
> >
>
On Wed, Dec 08, 2021 at 02:50:09PM +0100, Jiri Olsa wrote:
> On Mon, Dec 06, 2021 at 07:15:58PM -0800, Andrii Nakryiko wrote:
> > On Wed, Dec 1, 2021 at 1:32 PM Jiri Olsa <[email protected]> wrote:
> > >
> > > On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> > > > On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> > > > >
> > > > > Adding support to create multiple probes within single perf event.
> > > > > This way we can associate single bpf program with multiple kprobes,
> > > > > because bpf program gets associated with the perf event.
> > > > >
> > > > > The perf_event_attr is not extended, current fields for kprobe
> > > > > attachment are used for multi attachment.
> > > >
> > > > I'm a bit concerned with complicating perf_event_attr further to
> > > > support this multi-attach. For BPF, at least, we now have
> > > > bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> > > > syscall which allows much simpler and cleaner API to do this. Libbpf
> > > > will actually pick bpf_link-based attachment if kernel supports it. I
> > > > think we should better do bpf_link-based approach from the get go.
> > > >
> > > > Another thing I'd like you to keep in mind and think about is BPF
> > > > cookie. Currently kprobe/uprobe/tracepoint allow to associate
> > > > arbitrary user-provided u64 value which will be accessible from BPF
> > > > program with bpf_get_attach_cookie(). With multi-attach kprobes this
> > > > because extremely crucial feature to support, otherwise it's both
> > > > expensive, inconvenient and complicated to be able to distinguish
> > > > between different instances of the same multi-attach kprobe
> > > > invocation. So with that, what would be the interface to specify these
> > > > BPF cookies for this multi-attach kprobe, if we are going through
> > > > perf_event_attr. Probably picking yet another unused field and
> > > > union-izing it with a pointer. It will work, but makes the interface
> > > > even more overloaded. While for LINK_CREATE we can just add another
> > > > pointer to a u64[] with the same size as number of kfunc names and
> > > > offsets.
> > >
> > > I'm not sure we could bypass perf event easily.. perhaps introduce
> > > BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
> > > type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
> > > that might be better, that way we'd have full control over the API
> >
> > Sure, new type works.
> >
> > >
> > > >
> > > > But other than that, I'm super happy that you are working on these
> > > > complicated multi-attach capabilities! It would be great to benchmark
> > > > one-by-one attachment vs multi-attach to the same set of kprobes once
> > > > you arrive at the final implementation.
> > >
> > > I have the change for bpftrace to use this and even though there's
> > > some speed up, it's not as substantial as for trampolines
> > >
> > > looks like we 'only' got rid of the multiple perf syscall overheads,
> > > compared to rcu syncs timeouts like we eliminated for trampolines
> >
> > if it's just eliminating a pretty small overhead of multiple syscalls,
> > then it would be quite disappointing to add a bunch of complexity just
> > for that.
>
> I meant it's not as huge a save as for trampolines, but I expect some
> noticeable speedup, I'll make more benchmarks with the current patchset
so with this approach there's noticeable speedup, but it's not the
'instant attachment speed' as for trampolines
as a base I used bpftrace with change that allows to reuse bpf program
for multiple kprobes
bpftrace standard attach of 672 kprobes:
Performance counter stats for './src/bpftrace -vv -e kprobe:kvm* { @[kstack] += 1; } i:ms:10 { printf("KRAVA\n"); exit() }':
70.548897815 seconds time elapsed
0.909996000 seconds user
50.622834000 seconds sys
bpftrace using interface from this patchset attach of 673 kprobes:
Performance counter stats for './src/bpftrace -vv -e kprobe:kvm* { @[kstack] += 1; } i:ms:10 { printf("KRAVA\n"); exit() }':
36.947586803 seconds time elapsed
0.272585000 seconds user
30.900831000 seconds sys
so it's noticeable, but I wonder if it's not enough ;-)
jirka
>
> > Are there any reasons we can't use the same low-level ftrace
> > batch attach API to speed this up considerably? I assume it's only
> > possible if kprobe is attached at the beginning of the function (not
> > sure how kretprobe is treated here), so we can either say that this
> > new kprobe prog type can only be attached at the beginning of each
> > function and enforce that (probably would be totally reasonable
> > assumption as that's what's happening most frequently in practice).
> > Worst case, should be possible to split all requested attach targets
> > into two groups, one fast at function entry and all the rest.
> >
> > Am I too far off on this one? There might be some more complications
> > that I don't see.
>
> I'd need to check more on kprobes internals, but.. ;-)
>
> the new ftrace interface is special for 'direct' trampolines and
> I think that although kprobes can use ftrace for attaching, they
> use it in a different way
>
> also this current 'multi attach' approach is on top of current kprobe
> interface, if we wanted to use the new ftrace API we'd need to add new
> kprobe interface and change the kprobe attaching to use it (for cases
> it's attached at the function entry)
>
> jirka
>
> >
> > >
> > > I'll make full benchmarks once we have some final solution
> > >
> > > jirka
> > >
> >
On Fri, Dec 10, 2021 at 4:42 AM Jiri Olsa <[email protected]> wrote:
>
> On Wed, Dec 08, 2021 at 02:50:09PM +0100, Jiri Olsa wrote:
> > On Mon, Dec 06, 2021 at 07:15:58PM -0800, Andrii Nakryiko wrote:
> > > On Wed, Dec 1, 2021 at 1:32 PM Jiri Olsa <[email protected]> wrote:
> > > >
> > > > On Tue, Nov 30, 2021 at 10:53:58PM -0800, Andrii Nakryiko wrote:
> > > > > On Wed, Nov 24, 2021 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> > > > > >
> > > > > > Adding support to create multiple probes within single perf event.
> > > > > > This way we can associate single bpf program with multiple kprobes,
> > > > > > because bpf program gets associated with the perf event.
> > > > > >
> > > > > > The perf_event_attr is not extended, current fields for kprobe
> > > > > > attachment are used for multi attachment.
> > > > >
> > > > > I'm a bit concerned with complicating perf_event_attr further to
> > > > > support this multi-attach. For BPF, at least, we now have
> > > > > bpf_perf_link and corresponding BPF_LINK_CREATE command in bpf()
> > > > > syscall which allows much simpler and cleaner API to do this. Libbpf
> > > > > will actually pick bpf_link-based attachment if kernel supports it. I
> > > > > think we should better do bpf_link-based approach from the get go.
> > > > >
> > > > > Another thing I'd like you to keep in mind and think about is BPF
> > > > > cookie. Currently kprobe/uprobe/tracepoint allow to associate
> > > > > arbitrary user-provided u64 value which will be accessible from BPF
> > > > > program with bpf_get_attach_cookie(). With multi-attach kprobes this
> > > > > becomes an extremely crucial feature to support, otherwise it's both
> > > > > expensive, inconvenient and complicated to be able to distinguish
> > > > > between different instances of the same multi-attach kprobe
> > > > > invocation. So with that, what would be the interface to specify these
> > > > > BPF cookies for this multi-attach kprobe, if we are going through
> > > > > perf_event_attr. Probably picking yet another unused field and
> > > > > union-izing it with a pointer. It will work, but makes the interface
> > > > > even more overloaded. While for LINK_CREATE we can just add another
> > > > > pointer to a u64[] with the same size as number of kfunc names and
> > > > > offsets.
> > > >
> > > > I'm not sure we could bypass perf event easily.. perhaps introduce
> > > > BPF_PROG_TYPE_RAW_KPROBE as we did for tracepoints or just new
> > > > type for multi kprobe attachment like BPF_PROG_TYPE_MULTI_KPROBE
> > > > that might be better, that way we'd have full control over the API
> > >
> > > Sure, new type works.
> > >
> > > >
> > > > >
> > > > > But other than that, I'm super happy that you are working on these
> > > > > complicated multi-attach capabilities! It would be great to benchmark
> > > > > one-by-one attachment vs multi-attach to the same set of kprobes once
> > > > > you arrive at the final implementation.
> > > >
> > > > I have the change for bpftrace to use this and even though there's
> > > > some speed up, it's not as substantial as for trampolines
> > > >
> > > > looks like we 'only' got rid of the multiple perf syscall overheads,
> > > > compared to rcu syncs timeouts like we eliminated for trampolines
> > >
> > > if it's just eliminating a pretty small overhead of multiple syscalls,
> > > then it would be quite disappointing to add a bunch of complexity just
> > > for that.
> >
> > I meant it's not as huge a save as for trampolines, but I expect some
> > noticeable speedup, I'll make more benchmarks with the current patchset
>
> so with this approach there's noticeable speedup, but it's not the
> 'instant attachment speed' as for trampolines
>
> as a base I used bpftrace with change that allows to reuse bpf program
> for multiple kprobes
>
> bpftrace standard attach of 672 kprobes:
>
> Performance counter stats for './src/bpftrace -vv -e kprobe:kvm* { @[kstack] += 1; } i:ms:10 { printf("KRAVA\n"); exit() }':
>
> 70.548897815 seconds time elapsed
>
> 0.909996000 seconds user
> 50.622834000 seconds sys
>
>
> bpftrace using interface from this patchset attach of 673 kprobes:
>
> Performance counter stats for './src/bpftrace -vv -e kprobe:kvm* { @[kstack] += 1; } i:ms:10 { printf("KRAVA\n"); exit() }':
>
> 36.947586803 seconds time elapsed
>
> 0.272585000 seconds user
> 30.900831000 seconds sys
>
>
> so it's noticeable, but I wonder if it's not enough ;-)
Typical retsnoop run for the BPF use case attaches to ~1200 functions.
Answer for yourself if you think a tool that takes 36 seconds to
start up is a great user experience? ;)
>
> jirka
>
> >
> > > Are there any reasons we can't use the same low-level ftrace
> > > batch attach API to speed this up considerably? I assume it's only
> > > possible if kprobe is attached at the beginning of the function (not
> > > sure how kretprobe is treated here), so we can either say that this
> > > new kprobe prog type can only be attached at the beginning of each
> > > function and enforce that (probably would be totally reasonable
> > > assumption as that's what's happening most frequently in practice).
> > > Worst case, should be possible to split all requested attach targets
> > > into two groups, one fast at function entry and all the rest.
> > >
> > > Am I too far off on this one? There might be some more complications
> > > that I don't see.
> >
> > I'd need to check more on kprobes internals, but.. ;-)
> >
> > the new ftrace interface is special for 'direct' trampolines and
> > I think that although kprobes can use ftrace for attaching, they
> > use it in a different way
> >
> > also this current 'multi attach' approach is on top of current kprobe
> > interface, if we wanted to use the new ftrace API we'd need to add new
> > kprobe interface and change the kprobe attaching to use it (for cases
> > it's attached at the function entry)
> >
> > jirka
> >
> > >
> > > >
> > > > I'll make full benchmarks once we have some final solution
> > > >
> > > > jirka
> > > >
> > >
>