hi,
adding support to attach multiple kprobes within a single syscall
and to speed up the attachment of many kprobes.
The previous attempt [1] wasn't fast enough, so this version comes
with a new approach that adds a new kprobe interface.
The attachment speed of this approach (tested in bpftrace)
is now comparable to the ftrace tracer attachment speed.. fast ;-)
The limitation of this approach comes from using ftrace as the
attach layer, so it allows only kprobes on function entry (plus
return probes).
This patchset contains:
- kprobes support to register multiple kprobes with the current
  kprobe API (patches 1 - 7)
- bpf support to create a new kprobe link allowing to attach
  multiple addresses (patches 8 - 13)
We don't need to care about multiple probes on the same functions
because that is taken care of on the ftrace_ops layer.
Also available at:
https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git
kprobe/multi
thanks,
jirka
[1] https://lore.kernel.org/bpf/[email protected]/
---
Jiri Olsa (13):
ftrace: Add ftrace_set_filter_ips function
kprobe: Keep traced function address
kprobe: Add support to register multiple ftrace kprobes
kprobe: Add support to register multiple ftrace kretprobes
kprobe: Allow to get traced function address for multi ftrace kprobes
samples/kprobes: Add support for multi kprobe interface
samples/kprobes: Add support for multi kretprobe interface
bpf: Add kprobe link for attaching raw kprobes
libbpf: Add libbpf__kallsyms_parse function
libbpf: Add bpf_link_create support for multi kprobes
libbpf: Add bpf_program__attach_kprobe_opts for multi kprobes
selftest/bpf: Add raw kprobe attach test
selftest/bpf: Add bpf_cookie test for raw_k[ret]probe
arch/Kconfig | 3 ++
arch/x86/Kconfig | 1 +
arch/x86/kernel/kprobes/ftrace.c | 51 +++++++++++++-----
include/linux/bpf_types.h | 1 +
include/linux/ftrace.h | 3 ++
include/linux/kprobes.h | 55 ++++++++++++++++++++
include/uapi/linux/bpf.h | 12 +++++
kernel/bpf/syscall.c | 191 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/kprobes.c | 264 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
kernel/trace/bpf_trace.c | 7 ++-
kernel/trace/ftrace.c | 53 +++++++++++++++----
samples/kprobes/kprobe_example.c | 47 +++++++++++++++--
samples/kprobes/kretprobe_example.c | 43 +++++++++++++++-
tools/include/uapi/linux/bpf.h | 12 +++++
tools/lib/bpf/bpf.c | 5 ++
tools/lib/bpf/bpf.h | 7 ++-
tools/lib/bpf/libbpf.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
tools/lib/bpf/libbpf_internal.h | 5 ++
tools/testing/selftests/bpf/prog_tests/bpf_cookie.c | 42 +++++++++++++++
tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c | 92 +++++++++++++++++++++++++++++++++
tools/testing/selftests/bpf/progs/get_func_ip_test.c | 4 +-
tools/testing/selftests/bpf/progs/raw_kprobe.c | 58 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/test_bpf_cookie.c | 24 ++++++++-
23 files changed, 1062 insertions(+), 104 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c
create mode 100644 tools/testing/selftests/bpf/progs/raw_kprobe.c
Adding the ftrace_set_filter_ips function to be able to set a filter
on multiple ip addresses at once.
With the kprobe multi attach interface we have cases where we need to
initialize the ftrace_ops object with thousands of functions, so
having a single function that takes ftrace_lock and dives into
ftrace_hash_move_and_update_ops just once is faster.
The function ips are passed as an unsigned long array together with
a count.
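A minimal usage sketch (the ops object and the addresses are
hypothetical, error handling is omitted) could look like:
  unsigned long ips[] = { addr1, addr2, addr3 };
  /* set the filter for all addresses under a single ftrace_lock round */
  err = ftrace_set_filter_ips(&my_ops, ips, ARRAY_SIZE(ips), 0, 0);
  if (!err)
          err = register_ftrace_function(&my_ops);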
Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/ftrace.h | 3 +++
kernel/trace/ftrace.c | 53 +++++++++++++++++++++++++++++++++++-------
2 files changed, 47 insertions(+), 9 deletions(-)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9999e29187de..60847cbce0da 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -512,6 +512,8 @@ struct dyn_ftrace {
int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset);
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt, int remove, int reset);
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
int len, int reset);
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
@@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip)
#define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
#define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
#define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
+#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_free_filter(ops) do { } while (0)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index be5f6b32a012..39350aa38649 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
}
static int
-ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
+__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
{
struct ftrace_func_entry *entry;
@@ -4976,9 +4976,25 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
return add_hash_entry(hash, ip);
}
+static int
+ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
+ unsigned int cnt, int remove)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < cnt; i++) {
+ err = __ftrace_match_addr(hash, ips[i], remove);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
static int
ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
- unsigned long ip, int remove, int reset, int enable)
+ unsigned long *ips, unsigned int cnt,
+ int remove, int reset, int enable)
{
struct ftrace_hash **orig_hash;
struct ftrace_hash *hash;
@@ -5008,8 +5024,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
ret = -EINVAL;
goto out_regex_unlock;
}
- if (ip) {
- ret = ftrace_match_addr(hash, ip, remove);
+ if (ips) {
+ ret = ftrace_match_addr(hash, ips, cnt, remove);
if (ret < 0)
goto out_regex_unlock;
}
@@ -5026,10 +5042,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
}
static int
-ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
- int reset, int enable)
+ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
+ int remove, int reset, int enable)
{
- return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
+ return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
}
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
@@ -5634,10 +5650,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset)
{
ftrace_ops_init(ops);
- return ftrace_set_addr(ops, ip, remove, reset, 1);
+ return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
}
EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
+/**
+ * ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
+ * @ops - the ops to set the filter with
+ * @ips - the array of addresses to add to or remove from the filter.
+ * @cnt - the number of addresses in @ips
+ * @remove - non zero to remove ips from the filter
+ * @reset - non zero to reset all filters before applying this filter.
+ *
+ * Filters denote which functions should be enabled when tracing is enabled.
+ * If the @ips array or any ip specified within is NULL, the filter update fails.
+ */
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt, int remove, int reset)
+{
+ ftrace_ops_init(ops);
+ return ftrace_set_addr(ops, ips, cnt, remove, reset, 1);
+}
+EXPORT_SYMBOL_GPL(ftrace_set_filter_ips);
+
/**
* ftrace_ops_set_global_filter - setup ops to use global filters
* @ops - the ops which will use the global filters
@@ -5659,7 +5694,7 @@ static int
ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
int reset, int enable)
{
- return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable);
+ return ftrace_set_hash(ops, buf, len, NULL, 0, 0, reset, enable);
}
/**
--
2.33.1
The bpf_get_func_ip_kprobe helper should return the traced function
address, but it does so only for kprobes that are placed on the
function entry.
If the kprobe is placed within the function, bpf_get_func_ip_kprobe
returns that address instead of the function entry.
Storing the function entry address directly in the kprobe object, so
it can be used in the bpf_get_func_ip_kprobe helper.
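For illustration only (a sketch, the section/offset and program name
are made up), a kprobe placed inside the function body now reports
the function entry:
  SEC("kprobe/kernel_clone+0x10")
  int sample(struct pt_regs *ctx)
  {
          /* returns the entry address of kernel_clone, not entry + 0x10 */
          __u64 ip = bpf_get_func_ip(ctx);
          return 0;
  }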
Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/kprobes.h | 3 +++
kernel/kprobes.c | 12 ++++++++++++
kernel/trace/bpf_trace.c | 2 +-
tools/testing/selftests/bpf/progs/get_func_ip_test.c | 4 ++--
4 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 8c8f7a4d93af..a204df4fef96 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -74,6 +74,9 @@ struct kprobe {
/* Offset into the symbol */
unsigned int offset;
+ /* traced function address */
+ unsigned long func_addr;
+
/* Called before addr is executed. */
kprobe_pre_handler_t pre_handler;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index d20ae8232835..c4060a8da050 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1310,6 +1310,7 @@ static void init_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
copy_kprobe(p, ap);
flush_insn_slot(ap);
ap->addr = p->addr;
+ ap->func_addr = p->func_addr;
ap->flags = p->flags & ~KPROBE_FLAG_OPTIMIZED;
ap->pre_handler = aggr_pre_handler;
/* We don't care the kprobe which has gone. */
@@ -1588,6 +1589,16 @@ static int check_kprobe_address_safe(struct kprobe *p,
return ret;
}
+static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
+{
+ char str[KSYM_SYMBOL_LEN];
+ unsigned long offset;
+
+ if (kallsyms_lookup((unsigned long) addr, NULL, &offset, NULL, str))
+ return (unsigned long) addr - offset;
+ return 0;
+}
+
int register_kprobe(struct kprobe *p)
{
int ret;
@@ -1600,6 +1611,7 @@ int register_kprobe(struct kprobe *p)
if (IS_ERR(addr))
return PTR_ERR(addr);
p->addr = addr;
+ p->func_addr = resolve_func_addr(addr);
ret = warn_kprobe_rereg(p);
if (ret)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 21aa30644219..25631253084a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1026,7 +1026,7 @@ BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
{
struct kprobe *kp = kprobe_running();
- return kp ? (uintptr_t)kp->addr : 0;
+ return kp ? (uintptr_t)kp->func_addr : 0;
}
static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
diff --git a/tools/testing/selftests/bpf/progs/get_func_ip_test.c b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
index a587aeca5ae0..e988aefa567e 100644
--- a/tools/testing/selftests/bpf/progs/get_func_ip_test.c
+++ b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
@@ -69,7 +69,7 @@ int test6(struct pt_regs *ctx)
{
__u64 addr = bpf_get_func_ip(ctx);
- test6_result = (const void *) addr == &bpf_fentry_test6 + 5;
+ test6_result = (const void *) addr == &bpf_fentry_test6;
return 0;
}
@@ -79,6 +79,6 @@ int test7(struct pt_regs *ctx)
{
__u64 addr = bpf_get_func_ip(ctx);
- test7_result = (const void *) addr == &bpf_fentry_test7 + 5;
+ test7_result = (const void *) addr == &bpf_fentry_test7;
return 0;
}
--
2.33.1
Adding support to register a kprobe on multiple addresses within
a single kprobe object instance.
It uses the CONFIG_KPROBES_ON_FTRACE feature (so it's only
available for function entry addresses) and a separate ftrace_ops
object placed directly in the kprobe object.
There's a new CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE config option,
enabled for archs that support multi kprobes.
It registers all the provided addresses in the ftrace_ops object
filter and adds a separate ftrace_ops callback.
To register a multi kprobe, the user provides an array of addresses
or symbols with their count, like:
struct kprobe kp = {};
kp.multi.symbols = (const char **) symbols;
kp.multi.cnt = cnt;
...
err = register_kprobe(&kp);
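Alternatively (a sketch, assuming the entry addresses were resolved
beforehand), the addresses can be passed directly:
  kp.multi.addrs = (kprobe_opcode_t **) addrs;
  kp.multi.cnt = cnt;
  ...
  err = register_kprobe(&kp);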
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/kernel/kprobes/ftrace.c | 48 ++++++--
include/linux/kprobes.h | 25 ++++
kernel/kprobes.c | 204 +++++++++++++++++++++++++------
5 files changed, 235 insertions(+), 46 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig
index d3c4ab249e9c..0131636e1ef8 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -191,6 +191,9 @@ config HAVE_OPTPROBES
config HAVE_KPROBES_ON_FTRACE
bool
+config HAVE_KPROBES_MULTI_ON_FTRACE
+ bool
+
config ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
bool
help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2ef..0c870238016a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -217,6 +217,7 @@ config X86
select HAVE_KERNEL_ZSTD
select HAVE_KPROBES
select HAVE_KPROBES_ON_FTRACE
+ select HAVE_KPROBES_MULTI_ON_FTRACE
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_KRETPROBES
select HAVE_KVM
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index dd2ec14adb77..ac4d256b89c6 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -12,22 +12,14 @@
#include "common.h"
-/* Ftrace callback handler for kprobes -- called under preempt disabled */
-void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
- struct ftrace_ops *ops, struct ftrace_regs *fregs)
+static void ftrace_handler(struct kprobe *p, unsigned long ip,
+ struct ftrace_regs *fregs)
{
struct pt_regs *regs = ftrace_get_regs(fregs);
- struct kprobe *p;
struct kprobe_ctlblk *kcb;
- int bit;
- bit = ftrace_test_recursion_trylock(ip, parent_ip);
- if (bit < 0)
- return;
-
- p = get_kprobe((kprobe_opcode_t *)ip);
if (unlikely(!p) || kprobe_disabled(p))
- goto out;
+ return;
kcb = get_kprobe_ctlblk();
if (kprobe_running()) {
@@ -57,11 +49,43 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
*/
__this_cpu_write(current_kprobe, NULL);
}
-out:
+}
+
+/* Ftrace callback handler for kprobes -- called under preempt disabled */
+void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+ struct kprobe *p;
+ int bit;
+
+ bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ if (bit < 0)
+ return;
+
+ p = get_kprobe((kprobe_opcode_t *)ip);
+ ftrace_handler(p, ip, fregs);
+
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(kprobe_ftrace_handler);
+void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+ struct kprobe *p;
+ int bit;
+
+ bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ if (bit < 0)
+ return;
+
+ p = container_of(ops, struct kprobe, multi.ops);
+ ftrace_handler(p, ip, fregs);
+
+ ftrace_test_recursion_unlock(bit);
+}
+NOKPROBE_SYMBOL(kprobe_ftrace_multi_handler);
+
int arch_prepare_kprobe_ftrace(struct kprobe *p)
{
p->ainsn.insn = NULL;
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index a204df4fef96..03fd86ef69cb 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -68,6 +68,16 @@ struct kprobe {
/* location of the probe point */
kprobe_opcode_t *addr;
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ /* location of the multi probe points */
+ struct {
+ const char **symbols;
+ kprobe_opcode_t **addrs;
+ unsigned int cnt;
+ struct ftrace_ops ops;
+ } multi;
+#endif
+
/* Allow user to indicate symbol name of the probe point */
const char *symbol_name;
@@ -105,6 +115,7 @@ struct kprobe {
* this flag is only for optimized_kprobe.
*/
#define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */
+#define KPROBE_FLAG_MULTI 16 /* probe multi addresses */
/* Has this kprobe gone ? */
static inline bool kprobe_gone(struct kprobe *p)
@@ -130,6 +141,18 @@ static inline bool kprobe_ftrace(struct kprobe *p)
return p->flags & KPROBE_FLAG_FTRACE;
}
+/* Is this ftrace multi kprobe ? */
+static inline bool kprobe_ftrace_multi(struct kprobe *p)
+{
+ return kprobe_ftrace(p) && (p->flags & KPROBE_FLAG_MULTI);
+}
+
+/* Is this single kprobe ? */
+static inline bool kprobe_single(struct kprobe *p)
+{
+ return !(p->flags & KPROBE_FLAG_MULTI);
+}
+
/*
* Function-return probe -
* Note:
@@ -365,6 +388,8 @@ static inline void wait_for_kprobe_optimizer(void) { }
#ifdef CONFIG_KPROBES_ON_FTRACE
extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs);
+extern void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs);
extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
#else
static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index c4060a8da050..e7729e20d85c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -44,6 +44,7 @@
#include <asm/cacheflush.h>
#include <asm/errno.h>
#include <linux/uaccess.h>
+#include <linux/ftrace.h>
#define KPROBE_HASH_BITS 6
#define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
@@ -1022,6 +1023,35 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
}
#endif /* CONFIG_OPTPROBES */
+static int check_kprobe_address(unsigned long addr)
+{
+ /* Ensure it is not in reserved area nor out of text */
+ return !kernel_text_address(addr) ||
+ within_kprobe_blacklist(addr) ||
+ jump_label_text_reserved((void *) addr, (void *) addr) ||
+ static_call_text_reserved((void *) addr, (void *) addr) ||
+ find_bug(addr);
+}
+
+static int check_ftrace_location(unsigned long addr, struct kprobe *p)
+{
+ unsigned long ftrace_addr;
+
+ ftrace_addr = ftrace_location(addr);
+ if (ftrace_addr) {
+#ifdef CONFIG_KPROBES_ON_FTRACE
+ /* Given address is not on the instruction boundary */
+ if (addr != ftrace_addr)
+ return -EILSEQ;
+ if (p)
+ p->flags |= KPROBE_FLAG_FTRACE;
+#else /* !CONFIG_KPROBES_ON_FTRACE */
+ return -EINVAL;
+#endif
+ }
+ return 0;
+}
+
#ifdef CONFIG_KPROBES_ON_FTRACE
static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
.func = kprobe_ftrace_handler,
@@ -1043,6 +1073,13 @@ static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
lockdep_assert_held(&kprobe_mutex);
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (kprobe_ftrace_multi(p)) {
+ ret = register_ftrace_function(&p->multi.ops);
+ WARN(ret < 0, "Failed to register kprobe-multi-ftrace (error %d)\n", ret);
+ return ret;
+ }
+#endif
ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0);
if (WARN_ONCE(ret < 0, "Failed to arm kprobe-ftrace at %pS (error %d)\n", p->addr, ret))
return ret;
@@ -1081,6 +1118,13 @@ static int __disarm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
lockdep_assert_held(&kprobe_mutex);
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (kprobe_ftrace_multi(p)) {
+ ret = unregister_ftrace_function(&p->multi.ops);
+ WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret);
+ return ret;
+ }
+#endif
if (*cnt == 1) {
ret = unregister_ftrace_function(ops);
if (WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret))
@@ -1103,6 +1147,94 @@ static int disarm_kprobe_ftrace(struct kprobe *p)
ipmodify ? &kprobe_ipmodify_ops : &kprobe_ftrace_ops,
ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled);
}
+
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+/*
+ * In addition to standard kprobe address check for multi
+ * ftrace kprobes we also allow only:
+ * - ftrace managed function entry address
+ * - kernel core only address
+ */
+static unsigned long check_ftrace_addr(unsigned long addr)
+{
+ int err;
+
+ if (!addr)
+ return -EINVAL;
+ err = check_ftrace_location(addr, NULL);
+ if (err)
+ return err;
+ if (check_kprobe_address(addr))
+ return -EINVAL;
+ if (__module_text_address(addr))
+ return -EINVAL;
+ return 0;
+}
+
+static int check_ftrace_multi(struct kprobe *p)
+{
+ kprobe_opcode_t **addrs = p->multi.addrs;
+ const char **symbols = p->multi.symbols;
+ unsigned int i, cnt = p->multi.cnt;
+ unsigned long addr, *ips;
+ int err;
+
+ if ((symbols && addrs) || (!symbols && !addrs))
+ return -EINVAL;
+
+ /* do we want sysctl for this? */
+ if (cnt >= 20000)
+ return -E2BIG;
+
+ ips = kmalloc(sizeof(*ips) * cnt, GFP_KERNEL);
+ if (!ips)
+ return -ENOMEM;
+
+ for (i = 0; i < cnt; i++) {
+ if (symbols)
+ addr = (unsigned long) kprobe_lookup_name(symbols[i], 0);
+ else
+ addr = (unsigned long) addrs[i];
+ ips[i] = addr;
+ }
+
+ jump_label_lock();
+ preempt_disable();
+
+ for (i = 0; i < cnt; i++) {
+ err = check_ftrace_addr(ips[i]);
+ if (err)
+ break;
+ }
+
+ preempt_enable();
+ jump_label_unlock();
+
+ if (err)
+ goto out;
+
+ err = ftrace_set_filter_ips(&p->multi.ops, ips, cnt, 0, 0);
+ if (err)
+ goto out;
+
+ p->multi.ops.func = kprobe_ftrace_multi_handler;
+ p->multi.ops.flags = FTRACE_OPS_FL_SAVE_REGS|FTRACE_OPS_FL_DYNAMIC;
+
+ p->flags |= KPROBE_FLAG_MULTI|KPROBE_FLAG_FTRACE;
+ if (p->post_handler)
+ p->multi.ops.flags |= FTRACE_OPS_FL_IPMODIFY;
+
+out:
+ kfree(ips);
+ return err;
+}
+
+static void free_ftrace_multi(struct kprobe *p)
+{
+ ftrace_free_filter(&p->multi.ops);
+}
+#endif
+
#else /* !CONFIG_KPROBES_ON_FTRACE */
static inline int arm_kprobe_ftrace(struct kprobe *p)
{
@@ -1489,6 +1621,9 @@ static struct kprobe *__get_valid_kprobe(struct kprobe *p)
lockdep_assert_held(&kprobe_mutex);
+ if (kprobe_ftrace_multi(p))
+ return p;
+
ap = get_kprobe(p->addr);
if (unlikely(!ap))
return NULL;
@@ -1520,41 +1655,18 @@ static inline int warn_kprobe_rereg(struct kprobe *p)
return ret;
}
-static int check_ftrace_location(struct kprobe *p)
-{
- unsigned long ftrace_addr;
-
- ftrace_addr = ftrace_location((unsigned long)p->addr);
- if (ftrace_addr) {
-#ifdef CONFIG_KPROBES_ON_FTRACE
- /* Given address is not on the instruction boundary */
- if ((unsigned long)p->addr != ftrace_addr)
- return -EILSEQ;
- p->flags |= KPROBE_FLAG_FTRACE;
-#else /* !CONFIG_KPROBES_ON_FTRACE */
- return -EINVAL;
-#endif
- }
- return 0;
-}
-
static int check_kprobe_address_safe(struct kprobe *p,
struct module **probed_mod)
{
int ret;
- ret = check_ftrace_location(p);
+ ret = check_ftrace_location((unsigned long) p->addr, p);
if (ret)
return ret;
jump_label_lock();
preempt_disable();
- /* Ensure it is not in reserved area nor out of text */
- if (!kernel_text_address((unsigned long) p->addr) ||
- within_kprobe_blacklist((unsigned long) p->addr) ||
- jump_label_text_reserved(p->addr, p->addr) ||
- static_call_text_reserved(p->addr, p->addr) ||
- find_bug((unsigned long)p->addr)) {
+ if (check_kprobe_address((unsigned long) p->addr)) {
ret = -EINVAL;
goto out;
}
@@ -1599,13 +1711,16 @@ static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
return 0;
}
-int register_kprobe(struct kprobe *p)
+static int check_addr(struct kprobe *p, struct module **probed_mod)
{
int ret;
- struct kprobe *old_p;
- struct module *probed_mod;
kprobe_opcode_t *addr;
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (p->multi.cnt)
+ return check_ftrace_multi(p);
+#endif
+
/* Adjust probe address from symbol */
addr = kprobe_addr(p);
if (IS_ERR(addr))
@@ -1616,13 +1731,21 @@ int register_kprobe(struct kprobe *p)
ret = warn_kprobe_rereg(p);
if (ret)
return ret;
+ return check_kprobe_address_safe(p, probed_mod);
+}
+
+int register_kprobe(struct kprobe *p)
+{
+ struct module *probed_mod = NULL;
+ struct kprobe *old_p;
+ int ret;
/* User can pass only KPROBE_FLAG_DISABLED to register_kprobe */
p->flags &= KPROBE_FLAG_DISABLED;
p->nmissed = 0;
INIT_LIST_HEAD(&p->list);
- ret = check_kprobe_address_safe(p, &probed_mod);
+ ret = check_addr(p, &probed_mod);
if (ret)
return ret;
@@ -1644,14 +1767,21 @@ int register_kprobe(struct kprobe *p)
if (ret)
goto out;
- INIT_HLIST_NODE(&p->hlist);
- hlist_add_head_rcu(&p->hlist,
- &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
+ /*
+ * Multi ftrace kprobes do not have single address,
+ * so they are not stored in the kprobe_table hash.
+ */
+ if (kprobe_single(p)) {
+ INIT_HLIST_NODE(&p->hlist);
+ hlist_add_head_rcu(&p->hlist,
+ &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
+ }
if (!kprobes_all_disarmed && !kprobe_disabled(p)) {
ret = arm_kprobe(p);
if (ret) {
- hlist_del_rcu(&p->hlist);
+ if (kprobe_single(p))
+ hlist_del_rcu(&p->hlist);
synchronize_rcu();
goto out;
}
@@ -1778,7 +1908,13 @@ static int __unregister_kprobe_top(struct kprobe *p)
return 0;
disarmed:
- hlist_del_rcu(&ap->hlist);
+ if (kprobe_single(ap))
+ hlist_del_rcu(&ap->hlist);
+
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (kprobe_ftrace_multi(ap))
+ free_ftrace_multi(ap);
+#endif
return 0;
}
--
2.33.1
Adding support to register a kretprobe on multiple addresses within
a single kprobe object instance.
The interface stays the same as for kprobes; the patch just adds a
blacklist check for each address, which is specific to kretprobes.
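A minimal sketch mirroring the kprobe case (the handler name is
hypothetical):
  struct kretprobe rp = {};
  rp.kp.multi.symbols = (const char **) symbols;
  rp.kp.multi.cnt = cnt;
  rp.handler = ret_handler;
  ...
  err = register_kretprobe(&rp);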
Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/kprobes.h | 1 +
kernel/kprobes.c | 46 ++++++++++++++++++++++++++++++++---------
2 files changed, 37 insertions(+), 10 deletions(-)
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 03fd86ef69cb..a31da6202b5c 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -75,6 +75,7 @@ struct kprobe {
kprobe_opcode_t **addrs;
unsigned int cnt;
struct ftrace_ops ops;
+ bool check_kretprobe;
} multi;
#endif
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index e7729e20d85c..04fc411ca30c 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1052,6 +1052,17 @@ static int check_ftrace_location(unsigned long addr, struct kprobe *p)
return 0;
}
+static bool in_kretprobe_blacklist(void *addr)
+{
+ int i;
+
+ for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
+ if (kretprobe_blacklist[i].addr == addr)
+ return true;
+ }
+ return false;
+}
+
#ifdef CONFIG_KPROBES_ON_FTRACE
static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
.func = kprobe_ftrace_handler,
@@ -1155,7 +1166,8 @@ static int disarm_kprobe_ftrace(struct kprobe *p)
* - ftrace managed function entry address
* - kernel core only address
*/
-static unsigned long check_ftrace_addr(unsigned long addr)
+static unsigned long check_ftrace_addr(unsigned long addr,
+ bool check_kretprobe)
{
int err;
@@ -1168,6 +1180,8 @@ static unsigned long check_ftrace_addr(unsigned long addr)
return -EINVAL;
if (__module_text_address(addr))
return -EINVAL;
+ if (check_kretprobe && in_kretprobe_blacklist((void *) addr))
+ return -EINVAL;
return 0;
}
@@ -1202,7 +1216,7 @@ static int check_ftrace_multi(struct kprobe *p)
preempt_disable();
for (i = 0; i < cnt; i++) {
- err = check_ftrace_addr(ips[i]);
+ err = check_ftrace_addr(ips[i], p->multi.check_kretprobe);
if (err)
break;
}
@@ -2188,13 +2202,17 @@ int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long o
return 0;
}
-int register_kretprobe(struct kretprobe *rp)
+static int check_kretprobe_address(struct kretprobe *rp)
{
int ret;
- struct kretprobe_instance *inst;
- int i;
void *addr;
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (rp->kp.multi.cnt) {
+ rp->kp.multi.check_kretprobe = !!kretprobe_blacklist_size;
+ return 0;
+ }
+#endif
ret = kprobe_on_func_entry(rp->kp.addr, rp->kp.symbol_name, rp->kp.offset);
if (ret)
return ret;
@@ -2207,12 +2225,20 @@ int register_kretprobe(struct kretprobe *rp)
addr = kprobe_addr(&rp->kp);
if (IS_ERR(addr))
return PTR_ERR(addr);
-
- for (i = 0; kretprobe_blacklist[i].name != NULL; i++) {
- if (kretprobe_blacklist[i].addr == addr)
- return -EINVAL;
- }
+ if (in_kretprobe_blacklist(addr))
+ return -EINVAL;
}
+ return 0;
+}
+
+int register_kretprobe(struct kretprobe *rp)
+{
+ struct kretprobe_instance *inst;
+ int i, ret = 0;
+
+ ret = check_kretprobe_address(rp);
+ if (ret)
+ return ret;
if (rp->data_size > KRETPROBE_MAX_DATA_SIZE)
return -E2BIG;
--
2.33.1
The current bpf_get_func_ip_kprobe helper does not work properly
when used in a bpf program triggered by the new multi kprobes.
We can't use the kprobe's func_addr in the bpf_get_func_ip_kprobe
helper, because there are multiple functions registered for a single
kprobe object.
Adding a new per-cpu variable current_ftrace_multi_addr and an extra
address in the kretprobe_instance object to keep the currently traced
function address for each cpu, for both the kprobe handler and the
kretprobe trampoline.
The address value is set/passed as follows, for kprobe:
  kprobe_ftrace_multi_handler
  {
    old = kprobe_ftrace_multi_addr_set(ip);
    handler..
    kprobe_ftrace_multi_addr_set(old);
  }
For kretprobe:
  kprobe_ftrace_multi_handler
  {
    old = kprobe_ftrace_multi_addr_set(ip);
    ...
    pre_handler_kretprobe
    {
      ri->ftrace_multi_addr = kprobe_ftrace_multi_addr
    }
    ...
    kprobe_ftrace_multi_addr_set(old);
  }
  __kretprobe_trampoline_handler
  {
    prev_func_addr = kprobe_ftrace_multi_addr_set(ri->ftrace_multi_addr);
    handler..
    kprobe_ftrace_multi_addr_set(prev_func_addr);
  }
Signed-off-by: Jiri Olsa <[email protected]>
---
arch/x86/kernel/kprobes/ftrace.c | 3 +++
include/linux/kprobes.h | 26 ++++++++++++++++++++++++++
kernel/kprobes.c | 6 ++++++
kernel/trace/bpf_trace.c | 7 ++++++-
4 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
index ac4d256b89c6..8caaa58c3a64 100644
--- a/arch/x86/kernel/kprobes/ftrace.c
+++ b/arch/x86/kernel/kprobes/ftrace.c
@@ -72,6 +72,7 @@ NOKPROBE_SYMBOL(kprobe_ftrace_handler);
void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
+ unsigned long old;
struct kprobe *p;
int bit;
@@ -79,8 +80,10 @@ void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
return;
+ old = kprobe_ftrace_multi_addr_set(ip);
p = container_of(ops, struct kprobe, multi.ops);
ftrace_handler(p, ip, fregs);
+ kprobe_ftrace_multi_addr_set(old);
ftrace_test_recursion_unlock(bit);
}
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index a31da6202b5c..3f0522b9538b 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -191,6 +191,7 @@ struct kretprobe_instance {
struct kretprobe_holder *rph;
kprobe_opcode_t *ret_addr;
void *fp;
+ unsigned long ftrace_multi_addr;
char data[];
};
@@ -387,16 +388,37 @@ static inline void wait_for_kprobe_optimizer(void) { }
#endif /* CONFIG_OPTPROBES */
#ifdef CONFIG_KPROBES_ON_FTRACE
+DECLARE_PER_CPU(unsigned long, current_ftrace_multi_addr);
extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs);
extern void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs);
extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
+
+static inline unsigned long kprobe_ftrace_multi_addr(void)
+{
+ return __this_cpu_read(current_ftrace_multi_addr);
+}
+static inline unsigned long kprobe_ftrace_multi_addr_set(unsigned long addr)
+{
+ unsigned long old = __this_cpu_read(current_ftrace_multi_addr);
+
+ __this_cpu_write(current_ftrace_multi_addr, addr);
+ return old;
+}
#else
static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
{
return -EINVAL;
}
+static inline unsigned long kprobe_ftrace_multi_addr_set(unsigned long addr)
+{
+ return 0;
+}
+static inline unsigned long kprobe_ftrace_multi_addr(void)
+{
+ return 0;
+}
#endif /* CONFIG_KPROBES_ON_FTRACE */
/* Get the kprobe at this addr (if any) - called with preemption disabled */
@@ -514,6 +536,10 @@ static inline int kprobe_get_kallsym(unsigned int symnum, unsigned long *value,
{
return -ERANGE;
}
+static inline unsigned long kprobe_ftrace_multi_addr(void)
+{
+ return 0;
+}
#endif /* CONFIG_KPROBES */
static inline int disable_kretprobe(struct kretprobe *rp)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 04fc411ca30c..6ba249f3a0cb 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1064,6 +1064,8 @@ static bool in_kretprobe_blacklist(void *addr)
}
#ifdef CONFIG_KPROBES_ON_FTRACE
+DEFINE_PER_CPU(unsigned long, current_ftrace_multi_addr);
+
static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
.func = kprobe_ftrace_handler,
.flags = FTRACE_OPS_FL_SAVE_REGS,
@@ -2106,11 +2108,14 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
rp = get_kretprobe(ri);
if (rp && rp->handler) {
struct kprobe *prev = kprobe_running();
+ unsigned long prev_func_addr;
+ prev_func_addr = kprobe_ftrace_multi_addr_set(ri->ftrace_multi_addr);
__this_cpu_write(current_kprobe, &rp->kp);
ri->ret_addr = correct_ret_addr;
rp->handler(ri, regs);
__this_cpu_write(current_kprobe, prev);
+ kprobe_ftrace_multi_addr_set(prev_func_addr);
}
if (first == node)
break;
@@ -2161,6 +2166,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
}
arch_prepare_kretprobe(ri, regs);
+ ri->ftrace_multi_addr = kprobe_ftrace_multi_addr();
__llist_add(&ri->llist, &current->kretprobe_instances);
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 25631253084a..39f4d476cfca 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1026,7 +1026,12 @@ BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
{
struct kprobe *kp = kprobe_running();
- return kp ? (uintptr_t)kp->func_addr : 0;
+ if (!kp)
+ return 0;
+ if (kprobe_ftrace_multi(kp))
+ return (uintptr_t) kprobe_ftrace_multi_addr();
+ else
+ return (uintptr_t) kp->func_addr;
}
static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
--
2.33.1
Adding support to test the multi kprobe interface. It's now possible
to register multiple functions for the module handler, like:
# modprobe kprobe_example symbol='sched_fork,kernel_clone'
Signed-off-by: Jiri Olsa <[email protected]>
---
samples/kprobes/kprobe_example.c | 47 +++++++++++++++++++++++++++++---
1 file changed, 43 insertions(+), 4 deletions(-)
diff --git a/samples/kprobes/kprobe_example.c b/samples/kprobes/kprobe_example.c
index f991a66b5b02..4ae31184f098 100644
--- a/samples/kprobes/kprobe_example.c
+++ b/samples/kprobes/kprobe_example.c
@@ -15,8 +15,10 @@
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/string.h>
-#define MAX_SYMBOL_LEN 64
+#define MAX_SYMBOL_LEN 4096
static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
module_param_string(symbol, symbol, sizeof(symbol), 0644);
@@ -97,17 +99,54 @@ static void __kprobes handler_post(struct kprobe *p, struct pt_regs *regs,
static int __init kprobe_init(void)
{
+ char **symbols = NULL;
int ret;
+
kp.pre_handler = handler_pre;
kp.post_handler = handler_post;
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (strchr(symbol, ',')) {
+ char *p, *tmp;
+ int cnt;
+
+ tmp = kstrdup(symbol, GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ p = strchr(tmp, ',');
+ while (p) {
+ *p = ' ';
+ p = strchr(p + 1, ',');
+ }
+
+ symbols = argv_split(GFP_KERNEL, tmp, &cnt);
+ kfree(tmp);
+ if (!symbols) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ kp.multi.symbols = (const char **) symbols;
+ kp.multi.cnt = cnt;
+ }
+#endif
+
ret = register_kprobe(&kp);
if (ret < 0) {
pr_err("register_kprobe failed, returned %d\n", ret);
- return ret;
+ goto out;
}
- pr_info("Planted kprobe at %p\n", kp.addr);
- return 0;
+
+ if (symbols)
+ pr_info("Planted multi kprobe to %s\n", symbol);
+ else
+ pr_info("Planted kprobe at %p\n", kp.addr);
+
+out:
+ if (symbols)
+ argv_free(symbols);
+ return ret;
}
static void __exit kprobe_exit(void)
--
2.33.1
Adding support to test the multi kretprobe interface. It's now
possible to register multiple functions for the module handler, like:
# modprobe kretprobe_example.ko func='sched_fork,kernel_clone'
Signed-off-by: Jiri Olsa <[email protected]>
---
samples/kprobes/kretprobe_example.c | 43 +++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)
diff --git a/samples/kprobes/kretprobe_example.c b/samples/kprobes/kretprobe_example.c
index 228321ecb161..2181cf0d6e4a 100644
--- a/samples/kprobes/kretprobe_example.c
+++ b/samples/kprobes/kretprobe_example.c
@@ -25,6 +25,8 @@
#include <linux/ktime.h>
#include <linux/limits.h>
#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
static char func_name[NAME_MAX] = "kernel_clone";
module_param_string(func, func_name, NAME_MAX, S_IRUGO);
@@ -80,17 +82,54 @@ static struct kretprobe my_kretprobe = {
static int __init kretprobe_init(void)
{
+ char **symbols = NULL;
int ret;
my_kretprobe.kp.symbol_name = func_name;
+
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+ if (strchr(func_name, ',')) {
+ char *p, *tmp;
+ int cnt;
+
+ tmp = kstrdup(func_name, GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ p = strchr(tmp, ',');
+ while (p) {
+ *p = ' ';
+ p = strchr(p + 1, ',');
+ }
+
+ symbols = argv_split(GFP_KERNEL, tmp, &cnt);
+ kfree(tmp);
+ if (!symbols) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ my_kretprobe.kp.multi.symbols = (const char **) symbols;
+ my_kretprobe.kp.multi.cnt = cnt;
+ }
+#endif
ret = register_kretprobe(&my_kretprobe);
if (ret < 0) {
pr_err("register_kretprobe failed, returned %d\n", ret);
return ret;
}
- pr_info("Planted return probe at %s: %p\n",
+
+ if (symbols) {
+ pr_info("Planted multi return kprobe to %s\n", func_name);
+ } else {
+ pr_info("Planted return probe at %s: %p\n",
my_kretprobe.kp.symbol_name, my_kretprobe.kp.addr);
- return 0;
+ }
+
+out:
+ if (symbols)
+ argv_free(symbols);
+ return ret;
}
static void __exit kretprobe_exit(void)
--
2.33.1
Adding a new link type BPF_LINK_TYPE_KPROBE to attach kprobes
directly through the register_kprobe/kretprobe API.
Adding a new attach type BPF_TRACE_RAW_KPROBE that enables such a
link for kprobe programs.
The new link allows creating a link over multiple kprobes by using
the new link_create interface:
struct {
__aligned_u64 addrs;
__u32 cnt;
__u64 bpf_cookie;
} kprobe;
Plus a new flag BPF_F_KPROBE_RETURN for link_create.flags to create
a return probe.
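A minimal sketch of creating such a link directly through the bpf
syscall (addresses are assumed to be resolved from kallsyms, error
handling is omitted):
  union bpf_attr attr = {};
  __u64 addrs[2] = { addr1, addr2 };
  attr.link_create.prog_fd = prog_fd;
  attr.link_create.attach_type = BPF_TRACE_RAW_KPROBE;
  attr.link_create.flags = BPF_F_KPROBE_RETURN; /* omit for entry probes */
  attr.link_create.kprobe.addrs = (__u64) (unsigned long) addrs;
  attr.link_create.kprobe.cnt = 2;
  link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));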
Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/bpf_types.h | 1 +
include/uapi/linux/bpf.h | 12 +++
kernel/bpf/syscall.c | 191 ++++++++++++++++++++++++++++++++-
tools/include/uapi/linux/bpf.h | 12 +++
4 files changed, 211 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 48a91c51c015..a9000feab34e 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
#ifdef CONFIG_PERF_EVENTS
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
+BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE, kprobe)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..5216b333c688 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -995,6 +995,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
+ BPF_TRACE_RAW_KPROBE,
__MAX_BPF_ATTACH_TYPE
};
@@ -1009,6 +1010,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
+ BPF_LINK_TYPE_KPROBE = 8,
MAX_BPF_LINK_TYPE,
};
@@ -1111,6 +1113,11 @@ enum bpf_link_type {
*/
#define BPF_F_SLEEPABLE (1U << 4)
+/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
+ * attach type.
+ */
+#define BPF_F_KPROBE_RETURN (1U << 0)
+
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@@ -1465,6 +1472,11 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
} link_create;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fa4505f9b611..7e5be63c820f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -32,6 +32,7 @@
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
+#include <linux/kprobes.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -3014,8 +3015,178 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
fput(perf_file);
return err;
}
+#else
+static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ return -ENOTSUPP;
+}
#endif /* CONFIG_PERF_EVENTS */
+#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
+
+struct bpf_kprobe_link {
+ struct bpf_link link;
+ struct kretprobe rp;
+ bool is_return;
+ kprobe_opcode_t **addrs;
+ u32 cnt;
+ u64 bpf_cookie;
+};
+
+static void bpf_kprobe_link_release(struct bpf_link *link)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(link, struct bpf_kprobe_link, link);
+
+ if (kprobe_link->is_return)
+ unregister_kretprobe(&kprobe_link->rp);
+ else
+ unregister_kprobe(&kprobe_link->rp.kp);
+}
+
+static void bpf_kprobe_link_dealloc(struct bpf_link *link)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(link, struct bpf_kprobe_link, link);
+ kfree(kprobe_link->addrs);
+ kfree(kprobe_link);
+}
+
+static const struct bpf_link_ops bpf_kprobe_link_lops = {
+ .release = bpf_kprobe_link_release,
+ .dealloc = bpf_kprobe_link_dealloc,
+};
+
+static int kprobe_link_prog_run(struct bpf_kprobe_link *kprobe_link,
+ struct pt_regs *regs)
+{
+ struct bpf_trace_run_ctx run_ctx;
+ struct bpf_run_ctx *old_run_ctx;
+ int err;
+
+ if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
+ err = 0;
+ goto out;
+ }
+
+ old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+ run_ctx.bpf_cookie = kprobe_link->bpf_cookie;
+
+ rcu_read_lock();
+ migrate_disable();
+ err = bpf_prog_run(kprobe_link->link.prog, regs);
+ migrate_enable();
+ rcu_read_unlock();
+
+ bpf_reset_run_ctx(old_run_ctx);
+
+ out:
+ __this_cpu_dec(bpf_prog_active);
+ return err;
+}
+
+static int kprobe_dispatcher(struct kprobe *kp, struct pt_regs *regs)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(kp, struct bpf_kprobe_link, rp.kp);
+ return kprobe_link_prog_run(kprobe_link, regs);
+}
+
+static int
+kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
+{
+ struct kretprobe *rp = get_kretprobe(ri);
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(rp, struct bpf_kprobe_link, rp);
+ return kprobe_link_prog_run(kprobe_link, regs);
+}
+
+static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ struct bpf_link_primer link_primer;
+ struct bpf_kprobe_link *link;
+ kprobe_opcode_t **addrs;
+ u32 flags, cnt, size;
+ void __user *uaddrs;
+ u64 **tmp;
+ int err;
+
+ flags = attr->link_create.flags;
+ if (flags & ~BPF_F_KPROBE_RETURN)
+ return -EINVAL;
+
+ uaddrs = u64_to_user_ptr(attr->link_create.kprobe.addrs);
+ cnt = attr->link_create.kprobe.cnt;
+ size = cnt * sizeof(*tmp);
+
+ tmp = kzalloc(size, GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ if (copy_from_user(tmp, uaddrs, size)) {
+ err = -EFAULT;
+ goto error;
+ }
+
+ /* TODO add extra copy for 32bit archs */
+ if (sizeof(u64) != sizeof(void *))
+ return -EINVAL;
+
+ addrs = (kprobe_opcode_t **) tmp;
+
+ link = kzalloc(sizeof(*link), GFP_KERNEL);
+ if (!link)
+ return -ENOMEM;
+
+ bpf_link_init(&link->link, BPF_LINK_TYPE_KPROBE, &bpf_kprobe_link_lops, prog);
+
+ err = bpf_link_prime(&link->link, &link_primer);
+ if (err) {
+ kfree(link);
+ goto error;
+ }
+
+ INIT_HLIST_NODE(&link->rp.kp.hlist);
+ INIT_LIST_HEAD(&link->rp.kp.list);
+ link->is_return = flags & BPF_F_KPROBE_RETURN;
+ link->addrs = addrs;
+ link->cnt = cnt;
+ link->bpf_cookie = attr->link_create.kprobe.bpf_cookie;
+
+ link->rp.kp.multi.addrs = addrs;
+ link->rp.kp.multi.cnt = cnt;
+
+ if (link->is_return)
+ link->rp.handler = kretprobe_dispatcher;
+ else
+ link->rp.kp.pre_handler = kprobe_dispatcher;
+
+ if (link->is_return)
+ err = register_kretprobe(&link->rp);
+ else
+ err = register_kprobe(&link->rp.kp);
+
+ if (err) {
+ bpf_link_cleanup(&link_primer);
+ return err;
+ }
+ return bpf_link_settle(&link_primer);
+
+error:
+ kfree(tmp);
+ return err;
+}
+#else /* !CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE */
+static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ return -ENOTSUPP;
+}
+#endif
+
#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
@@ -4242,7 +4413,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
return -EINVAL;
}
-#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
+#define BPF_LINK_CREATE_LAST_FIELD link_create.kprobe.bpf_cookie
static int link_create(union bpf_attr *attr, bpfptr_t uattr)
{
enum bpf_prog_type ptype;
@@ -4266,7 +4437,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = tracing_bpf_link_attach(attr, uattr, prog);
goto out;
case BPF_PROG_TYPE_PERF_EVENT:
- case BPF_PROG_TYPE_KPROBE:
case BPF_PROG_TYPE_TRACEPOINT:
if (attr->link_create.attach_type != BPF_PERF_EVENT) {
ret = -EINVAL;
@@ -4274,6 +4444,14 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
}
ptype = prog->type;
break;
+ case BPF_PROG_TYPE_KPROBE:
+ if (attr->link_create.attach_type != BPF_PERF_EVENT &&
+ attr->link_create.attach_type != BPF_TRACE_RAW_KPROBE) {
+ ret = -EINVAL;
+ goto out;
+ }
+ ptype = prog->type;
+ break;
default:
ptype = attach_type_to_prog_type(attr->link_create.attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@@ -4305,13 +4483,16 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = bpf_xdp_link_attach(attr, prog);
break;
#endif
-#ifdef CONFIG_PERF_EVENTS
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
- case BPF_PROG_TYPE_KPROBE:
ret = bpf_perf_link_attach(attr, prog);
break;
-#endif
+ case BPF_PROG_TYPE_KPROBE:
+ if (attr->link_create.attach_type == BPF_PERF_EVENT)
+ ret = bpf_perf_link_attach(attr, prog);
+ else
+ ret = bpf_kprobe_link_attach(attr, prog);
+ break;
default:
ret = -EINVAL;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b0383d371b9a..5216b333c688 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -995,6 +995,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
+ BPF_TRACE_RAW_KPROBE,
__MAX_BPF_ATTACH_TYPE
};
@@ -1009,6 +1010,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
+ BPF_LINK_TYPE_KPROBE = 8,
MAX_BPF_LINK_TYPE,
};
@@ -1111,6 +1113,11 @@ enum bpf_link_type {
*/
#define BPF_F_SLEEPABLE (1U << 4)
+/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
+ * attach type.
+ */
+#define BPF_F_KPROBE_RETURN (1U << 0)
+
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@@ -1465,6 +1472,11 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
} link_create;
--
2.33.1
Move the kallsyms parsing into the internal libbpf__kallsyms_parse
function, so it can be used from other places.
It will be used in the following changes.
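A minimal usage sketch (the callback and the looked-up symbol are
hypothetical):
  static int find_addr_cb(void *arg, unsigned long long sym_addr,
                          char sym_type, const char *sym_name)
  {
          if (strcmp(sym_name, "kernel_clone") == 0)
                  *(unsigned long long *) arg = sym_addr;
          return 0;
  }
  ...
  unsigned long long addr = 0;
  err = libbpf__kallsyms_parse(&addr, find_addr_cb);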
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/bpf/libbpf.c | 62 ++++++++++++++++++++-------------
tools/lib/bpf/libbpf_internal.h | 5 +++
2 files changed, 43 insertions(+), 24 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 9cb99d1e2385..25512b4dbc8c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7146,12 +7146,10 @@ static int bpf_object__sanitize_maps(struct bpf_object *obj)
return 0;
}
-static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
+int libbpf__kallsyms_parse(void *arg, kallsyms_cb_t cb)
{
char sym_type, sym_name[500];
unsigned long long sym_addr;
- const struct btf_type *t;
- struct extern_desc *ext;
int ret, err = 0;
FILE *f;
@@ -7170,35 +7168,51 @@ static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
if (ret != 3) {
pr_warn("failed to read kallsyms entry: %d\n", ret);
err = -EINVAL;
- goto out;
+ break;
}
- ext = find_extern_by_name(obj, sym_name);
- if (!ext || ext->type != EXT_KSYM)
- continue;
-
- t = btf__type_by_id(obj->btf, ext->btf_id);
- if (!btf_is_var(t))
- continue;
-
- if (ext->is_set && ext->ksym.addr != sym_addr) {
- pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
- sym_name, ext->ksym.addr, sym_addr);
- err = -EINVAL;
- goto out;
- }
- if (!ext->is_set) {
- ext->is_set = true;
- ext->ksym.addr = sym_addr;
- pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr);
- }
+ err = cb(arg, sym_addr, sym_type, sym_name);
+ if (err)
+ break;
}
-out:
fclose(f);
return err;
}
+static int kallsyms_cb(void *arg, unsigned long long sym_addr,
+ char sym_type, const char *sym_name)
+{
+ struct bpf_object *obj = arg;
+ const struct btf_type *t;
+ struct extern_desc *ext;
+
+ ext = find_extern_by_name(obj, sym_name);
+ if (!ext || ext->type != EXT_KSYM)
+ return 0;
+
+ t = btf__type_by_id(obj->btf, ext->btf_id);
+ if (!btf_is_var(t))
+ return 0;
+
+ if (ext->is_set && ext->ksym.addr != sym_addr) {
+ pr_warn("extern (ksym) '%s' resolution is ambiguous: 0x%llx or 0x%llx\n",
+ sym_name, ext->ksym.addr, sym_addr);
+ return -EINVAL;
+ }
+ if (!ext->is_set) {
+ ext->is_set = true;
+ ext->ksym.addr = sym_addr;
+ pr_debug("extern (ksym) %s=0x%llx\n", sym_name, sym_addr);
+ }
+ return 0;
+}
+
+static int bpf_object__read_kallsyms_file(struct bpf_object *obj)
+{
+ return libbpf__kallsyms_parse(obj, kallsyms_cb);
+}
+
static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name,
__u16 kind, struct btf **res_btf,
struct module_btf **res_mod_btf)
diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h
index 1565679eb432..dc544d239509 100644
--- a/tools/lib/bpf/libbpf_internal.h
+++ b/tools/lib/bpf/libbpf_internal.h
@@ -446,6 +446,11 @@ __s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name,
extern enum libbpf_strict_mode libbpf_mode;
+typedef int (*kallsyms_cb_t)(void *arg, unsigned long long sym_addr,
+ char sym_type, const char *sym_name);
+
+int libbpf__kallsyms_parse(void *arg, kallsyms_cb_t cb);
+
/* handle direct returned errors */
static inline int libbpf_err(int ret)
{
--
2.33.1
Adding a new kprobe struct to the bpf_link_create_opts object
to pass multi kprobe data to the link_create attr API.
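A minimal usage sketch (prog_fd and the addresses are assumed to be
prepared by the caller, error handling is omitted):
  DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
  __u64 addrs[2] = { addr1, addr2 };
  opts.kprobe.addrs = (__u64) (unsigned long) addrs;
  opts.kprobe.cnt = 2;
  opts.flags = BPF_F_KPROBE_RETURN; /* optional, for return probes */
  link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_KPROBE, &opts);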
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/bpf/bpf.c | 5 +++++
tools/lib/bpf/bpf.h | 7 ++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9b64eed2b003..40cad575ad62 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -848,6 +848,11 @@ int bpf_link_create(int prog_fd, int target_fd,
if (!OPTS_ZEROED(opts, perf_event))
return libbpf_err(-EINVAL);
break;
+ case BPF_TRACE_RAW_KPROBE:
+ attr.link_create.kprobe.addrs = OPTS_GET(opts, kprobe.addrs, 0);
+ attr.link_create.kprobe.cnt = OPTS_GET(opts, kprobe.cnt, 0);
+ attr.link_create.kprobe.bpf_cookie = OPTS_GET(opts, kprobe.bpf_cookie, 0);
+ break;
default:
if (!OPTS_ZEROED(opts, flags))
return libbpf_err(-EINVAL);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 00619f64a040..9611023138b1 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -298,10 +298,15 @@ struct bpf_link_create_opts {
struct {
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
size_t :0;
};
-#define bpf_link_create_opts__last_field perf_event
+#define bpf_link_create_opts__last_field kprobe.bpf_cookie
LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
enum bpf_attach_type attach_type,
--
2.33.1
SEC("raw_kprobe/bpf_fentry_test*")
to attach program to all bpf_fentry_test* functions.
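A minimal usage sketch from the caller side (the skeleton/program
names are made up):
  DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts, .retprobe = false);
  struct bpf_link *link;
  link = bpf_program__attach_kprobe_opts(skel->progs.test2,
                                         "bpf_fentry_test*", &opts);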
Signed-off-by: Jiri Olsa <[email protected]>
---
tools/lib/bpf/libbpf.c | 124 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 123 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 25512b4dbc8c..0061ab02fc5a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -10000,6 +10000,125 @@ static int perf_event_kprobe_open_legacy(const char *probe_name, bool retprobe,
return pfd;
}
+struct kprobe_resolve_raw_kprobe {
+ const char *name;
+ __u64 *addrs;
+ __u32 alloc;
+ __u32 cnt;
+};
+
+static bool glob_matches(const char *glob, const char *s)
+{
+ int n = strlen(glob);
+
+ if (n == 1 && glob[0] == '*')
+ return true;
+
+ if (glob[0] == '*' && glob[n - 1] == '*') {
+ const char *subs;
+ /* substring match */
+
+ /* this is hacky, but we don't want to allocate
+ * for no good reason
+ */
+ ((char *)glob)[n - 1] = '\0';
+ subs = strstr(s, glob + 1);
+ ((char *)glob)[n - 1] = '*';
+
+ return subs != NULL;
+ } else if (glob[0] == '*') {
+ size_t nn = strlen(s);
+ /* suffix match */
+
+ /* too short for a given suffix */
+ if (nn < n - 1)
+ return false;
+ return strcmp(s + nn - (n - 1), glob + 1) == 0;
+ } else if (glob[n - 1] == '*') {
+ /* prefix match */
+ return strncmp(s, glob, n - 1) == 0;
+ } else {
+ /* exact match */
+ return strcmp(glob, s) == 0;
+ }
+}
+
+static int
+kprobe_resolve_raw_kprobe_cb(void *arg, unsigned long long sym_addr,
+ char sym_type, const char *sym_name)
+{
+ struct kprobe_resolve_raw_kprobe *res = arg;
+ __u64 *p;
+
+ if (!glob_matches(res->name, sym_name))
+ return 0;
+
+ if (res->cnt == res->alloc) {
+ res->alloc = max((__u32) 16, res->alloc * 3 / 2);
+ p = libbpf_reallocarray(res->addrs, res->alloc, sizeof(*res->addrs));
+ if (!p)
+ return -ENOMEM;
+ res->addrs = p;
+ }
+ res->addrs[res->cnt++] = sym_addr;
+ return 0;
+}
+
+static struct bpf_link *
+attach_raw_kprobe_opts(const struct bpf_program *prog,
+ const char *func_name,
+ const struct bpf_kprobe_opts *kopts)
+{
+ DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+ struct kprobe_resolve_raw_kprobe res = {
+ .name = func_name,
+ };
+ struct bpf_link *link = NULL;
+ char errmsg[STRERR_BUFSIZE];
+ int err, link_fd, prog_fd;
+ __u64 bpf_cookie;
+ bool retprobe;
+
+ err = libbpf__kallsyms_parse(&res, kprobe_resolve_raw_kprobe_cb);
+ if (err)
+ goto error;
+ if (!res.cnt) {
+ err = -ENOENT;
+ goto error;
+ }
+
+ retprobe = OPTS_GET(kopts, retprobe, false);
+ bpf_cookie = OPTS_GET(kopts, bpf_cookie, 0);
+
+ opts.kprobe.addrs = (__u64) res.addrs;
+ opts.kprobe.cnt = res.cnt;
+ opts.kprobe.bpf_cookie = bpf_cookie;
+ opts.flags = retprobe ? BPF_F_KPROBE_RETURN : 0;
+
+ link = calloc(1, sizeof(*link));
+ if (!link)
+ return libbpf_err_ptr(-ENOMEM);
+ link->detach = &bpf_link__detach_fd;
+
+ prog_fd = bpf_program__fd(prog);
+ link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_KPROBE, &opts);
+ if (link_fd < 0) {
+ err = -errno;
+ pr_warn("prog '%s': failed to attach to %s: %s\n",
+ prog->name, res.name,
+ libbpf_strerror_r(err, errmsg, sizeof(errmsg)));
+ goto error;
+ }
+ link->fd = link_fd;
+ free(res.addrs);
+ return link;
+
+error:
+ free(link);
+ free(res.addrs);
+ return libbpf_err_ptr(err);
+}
+
struct bpf_link *
bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
const char *func_name,
@@ -10016,6 +10135,9 @@ bpf_program__attach_kprobe_opts(const struct bpf_program *prog,
if (!OPTS_VALID(opts, bpf_kprobe_opts))
return libbpf_err_ptr(-EINVAL);
+ if (strchr(func_name, '*'))
+ return attach_raw_kprobe_opts(prog, func_name, opts);
+
retprobe = OPTS_GET(opts, retprobe, false);
offset = OPTS_GET(opts, offset, 0);
pe_opts.bpf_cookie = OPTS_GET(opts, bpf_cookie, 0);
@@ -10096,7 +10218,7 @@ static struct bpf_link *attach_kprobe(const struct bpf_program *prog, long cooki
else
func_name = prog->sec_name + sizeof("kprobe/") - 1;
- n = sscanf(func_name, "%m[a-zA-Z0-9_.]+%li", &func, &offset);
+ n = sscanf(func_name, "%m[a-zA-Z0-9_.*]+%li", &func, &offset);
if (n < 1) {
err = -EINVAL;
pr_warn("kprobe name is invalid: %s\n", func_name);
--
2.33.1
Adding a raw kprobe attach test that uses the new interface
to attach multiple kprobes with the new kprobe link.
The test attaches to the bpf_fentry_test* functions and uses a
single trampoline program with bpf_prog_test_run to trigger the
bpf_fentry_test* functions.
Signed-off-by: Jiri Olsa <[email protected]>
---
.../bpf/prog_tests/raw_kprobe_test.c | 92 +++++++++++++++++++
.../testing/selftests/bpf/progs/raw_kprobe.c | 58 ++++++++++++
2 files changed, 150 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c
create mode 100644 tools/testing/selftests/bpf/progs/raw_kprobe.c
diff --git a/tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c b/tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c
new file mode 100644
index 000000000000..5ade44c57c9e
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/raw_kprobe_test.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include "raw_kprobe.skel.h"
+#include "trace_helpers.h"
+
+static void test_skel_api(void)
+{
+ struct raw_kprobe *skel = NULL;
+ __u32 duration = 0, retval;
+ int err, prog_fd;
+
+ skel = raw_kprobe__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "raw_kprobe__open_and_load"))
+ goto cleanup;
+
+ err = raw_kprobe__attach(skel);
+ if (!ASSERT_OK(err, "raw_kprobe__attach"))
+ goto cleanup;
+
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+ NULL, NULL, &retval, &duration);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(retval, 0, "test_run");
+
+ ASSERT_EQ(skel->bss->test2_result, 8, "test2_result");
+ ASSERT_EQ(skel->bss->test3_result, 8, "test3_result");
+
+cleanup:
+ raw_kprobe__destroy(skel);
+}
+
+static void test_link_api(void)
+{
+ DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+ int err, prog_fd, link1_fd = -1, link2_fd = -1;
+ struct raw_kprobe *skel = NULL;
+ __u32 duration = 0, retval;
+ __u64 addrs[8];
+
+ skel = raw_kprobe__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load"))
+ goto cleanup;
+
+ kallsyms_find("bpf_fentry_test1", &addrs[0]);
+ kallsyms_find("bpf_fentry_test2", &addrs[1]);
+ kallsyms_find("bpf_fentry_test3", &addrs[2]);
+ kallsyms_find("bpf_fentry_test4", &addrs[3]);
+ kallsyms_find("bpf_fentry_test5", &addrs[4]);
+ kallsyms_find("bpf_fentry_test6", &addrs[5]);
+ kallsyms_find("bpf_fentry_test7", &addrs[6]);
+ kallsyms_find("bpf_fentry_test8", &addrs[7]);
+
+ opts.kprobe.addrs = (__u64) addrs;
+ opts.kprobe.cnt = 8;
+
+ prog_fd = bpf_program__fd(skel->progs.test2);
+ link1_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_KPROBE, &opts);
+ if (!ASSERT_GE(link1_fd, 0, "link_fd"))
+ goto cleanup;
+
+ opts.flags = BPF_F_KPROBE_RETURN;
+ prog_fd = bpf_program__fd(skel->progs.test3);
+ link2_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_KPROBE, &opts);
+ if (!ASSERT_GE(link2_fd, 0, "link_fd"))
+ goto cleanup;
+
+ skel->bss->test2_result = 0;
+ skel->bss->test3_result = 0;
+
+ prog_fd = bpf_program__fd(skel->progs.test1);
+ err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+ NULL, NULL, &retval, &duration);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(retval, 0, "test_run");
+
+ ASSERT_EQ(skel->bss->test2_result, 8, "test2_result");
+ ASSERT_EQ(skel->bss->test3_result, 8, "test3_result");
+
+cleanup:
+ if (link1_fd != -1)
+ close(link1_fd);
+ if (link2_fd != -1)
+ close(link2_fd);
+ raw_kprobe__destroy(skel);
+}
+
+void test_raw_kprobe_test(void)
+{
+ test_skel_api();
+ test_link_api();
+}
diff --git a/tools/testing/selftests/bpf/progs/raw_kprobe.c b/tools/testing/selftests/bpf/progs/raw_kprobe.c
new file mode 100644
index 000000000000..baf7086203f9
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/raw_kprobe.c
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+char _license[] SEC("license") = "GPL";
+
+extern const void bpf_fentry_test1 __ksym;
+extern const void bpf_fentry_test2 __ksym;
+extern const void bpf_fentry_test3 __ksym;
+extern const void bpf_fentry_test4 __ksym;
+extern const void bpf_fentry_test5 __ksym;
+extern const void bpf_fentry_test6 __ksym;
+extern const void bpf_fentry_test7 __ksym;
+extern const void bpf_fentry_test8 __ksym;
+
+/* No tests, just to trigger bpf_fentry_test* through tracing test_run */
+SEC("fentry/bpf_modify_return_test")
+int BPF_PROG(test1)
+{
+ return 0;
+}
+
+__u64 test2_result = 0;
+
+SEC("kprobe/bpf_fentry_test*")
+int test2(struct pt_regs *ctx)
+{
+ __u64 addr = bpf_get_func_ip(ctx);
+
+ test2_result += (const void *) addr == &bpf_fentry_test1 ||
+ (const void *) addr == &bpf_fentry_test2 ||
+ (const void *) addr == &bpf_fentry_test3 ||
+ (const void *) addr == &bpf_fentry_test4 ||
+ (const void *) addr == &bpf_fentry_test5 ||
+ (const void *) addr == &bpf_fentry_test6 ||
+ (const void *) addr == &bpf_fentry_test7 ||
+ (const void *) addr == &bpf_fentry_test8;
+ return 0;
+}
+
+__u64 test3_result = 0;
+
+SEC("kretprobe/bpf_fentry_test*")
+int test3(struct pt_regs *ctx)
+{
+ __u64 addr = bpf_get_func_ip(ctx);
+
+ test3_result += (const void *) addr == &bpf_fentry_test1 ||
+ (const void *) addr == &bpf_fentry_test2 ||
+ (const void *) addr == &bpf_fentry_test3 ||
+ (const void *) addr == &bpf_fentry_test4 ||
+ (const void *) addr == &bpf_fentry_test5 ||
+ (const void *) addr == &bpf_fentry_test6 ||
+ (const void *) addr == &bpf_fentry_test7 ||
+ (const void *) addr == &bpf_fentry_test8;
+ return 0;
+}
--
2.33.1
Adding bpf_cookie test for raw_k[ret]probes.
Using the bpf_link_create interface directly and a single
trampoline program to trigger the bpf_fentry_test1
execution.
Signed-off-by: Jiri Olsa <[email protected]>
---
.../selftests/bpf/prog_tests/bpf_cookie.c | 42 +++++++++++++++++++
.../selftests/bpf/progs/test_bpf_cookie.c | 24 ++++++++++-
2 files changed, 65 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
index 5eea3c3a40fe..aee01dd3a823 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_cookie.c
@@ -57,6 +57,46 @@ static void kprobe_subtest(struct test_bpf_cookie *skel)
bpf_link__destroy(retlink2);
}
+static void rawkprobe_subtest(struct test_bpf_cookie *skel)
+{
+ DECLARE_LIBBPF_OPTS(bpf_link_create_opts, opts);
+ int err, prog_fd, link1_fd = -1, link2_fd = -1;
+ __u32 duration = 0, retval;
+ __u64 addr;
+
+ kallsyms_find("bpf_fentry_test1", &addr);
+
+ opts.kprobe.addrs = (__u64) &addr;
+ opts.kprobe.cnt = 1;
+ opts.kprobe.bpf_cookie = 0x1;
+ prog_fd = bpf_program__fd(skel->progs.handle_raw_kprobe);
+
+ link1_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_KPROBE, &opts);
+ if (!ASSERT_GE(link1_fd, 0, "link1_fd"))
+ return;
+
+ opts.flags = BPF_F_KPROBE_RETURN;
+ opts.kprobe.bpf_cookie = 0x2;
+ prog_fd = bpf_program__fd(skel->progs.handle_raw_kretprobe);
+
+ link2_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_RAW_KPROBE, &opts);
+ if (!ASSERT_GE(link2_fd, 0, "link2_fd"))
+ goto cleanup;
+
+ prog_fd = bpf_program__fd(skel->progs.raw_trigger);
+ err = bpf_prog_test_run(prog_fd, 1, NULL, 0,
+ NULL, NULL, &retval, &duration);
+ ASSERT_OK(err, "test_run");
+ ASSERT_EQ(retval, 0, "test_run");
+
+ ASSERT_EQ(skel->bss->raw_kprobe_res, 0x1, "raw_kprobe_res");
+ ASSERT_EQ(skel->bss->raw_kretprobe_res, 0x2, "raw_kretprobe_res");
+
+cleanup:
+ close(link1_fd);
+ close(link2_fd);
+}
+
static void uprobe_subtest(struct test_bpf_cookie *skel)
{
DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts);
@@ -243,6 +283,8 @@ void test_bpf_cookie(void)
if (test__start_subtest("kprobe"))
kprobe_subtest(skel);
+ if (test__start_subtest("rawkprobe"))
+ rawkprobe_subtest(skel);
if (test__start_subtest("uprobe"))
uprobe_subtest(skel);
if (test__start_subtest("tracepoint"))
diff --git a/tools/testing/selftests/bpf/progs/test_bpf_cookie.c b/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
index 2d3a7710e2ce..409f87464b1f 100644
--- a/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
+++ b/tools/testing/selftests/bpf/progs/test_bpf_cookie.c
@@ -8,8 +8,9 @@
int my_tid;
int kprobe_res;
-int kprobe_multi_res;
int kretprobe_res;
+int raw_kprobe_res;
+int raw_kretprobe_res;
int uprobe_res;
int uretprobe_res;
int tp_res;
@@ -37,6 +38,27 @@ int handle_kretprobe(struct pt_regs *ctx)
return 0;
}
+SEC("kprobe/bpf_fentry_test1")
+int handle_raw_kprobe(struct pt_regs *ctx)
+{
+ update(ctx, &raw_kprobe_res);
+ return 0;
+}
+
+SEC("kretprobe/bpf_fentry_test1")
+int handle_raw_kretprobe(struct pt_regs *ctx)
+{
+ update(ctx, &raw_kretprobe_res);
+ return 0;
+}
+
+/* just to trigger bpf_fentry_test1 through tracing test_run */
+SEC("fentry/bpf_modify_return_test")
+int BPF_PROG(raw_trigger)
+{
+ return 0;
+}
+
SEC("uprobe/trigger_func")
int handle_uprobe(struct pt_regs *ctx)
{
--
2.33.1
On Tue, Jan 4, 2022 at 12:09 AM Jiri Olsa <[email protected]> wrote:
>
> hi,
> adding support to attach multiple kprobes within single syscall
> and speed up attachment of many kprobes.
>
> The previous attempt [1] wasn't fast enough, so coming with new
> approach that adds new kprobe interface.
>
> The attachment speed of of this approach (tested in bpftrace)
> is now comparable to ftrace tracer attachment speed.. fast ;-)
What are the absolute numbers?
How quickly a single bpf prog can attach to 1k kprobes?
On Tue, Jan 04, 2022 at 10:53:19AM -0800, Alexei Starovoitov wrote:
> On Tue, Jan 4, 2022 at 12:09 AM Jiri Olsa <[email protected]> wrote:
> >
> > hi,
> > adding support to attach multiple kprobes within single syscall
> > and speed up attachment of many kprobes.
> >
> > The previous attempt [1] wasn't fast enough, so coming with new
> > approach that adds new kprobe interface.
> >
> > The attachment speed of of this approach (tested in bpftrace)
> > is now comparable to ftrace tracer attachment speed.. fast ;-)
>
> What are the absolute numbers?
> How quickly a single bpf prog can attach to 1k kprobes?
>
I'd need to write a special tool to test exactly 1k kprobes;
we could add some benchmark selftest for that
for now I tested the following counts with the current bpftrace interface
(note the times include both attach and detach)
2 seconds for 673 kprobes:
# perf stat -e cycles:u,cycles:k ./src/bpftrace -e 'kprobe:kvm* { } i:ms:10 { printf("KRAVA\n"); exit() }'
Attaching 2 probes...
Attaching 673 functions
KRAVA
Performance counter stats for './src/bpftrace -e kprobe:kvm* { } i:ms:10 { printf("KRAVA\n"); exit() }':
1,695,142,901 cycles:u
1,909,616,944 cycles:k
1.990434019 seconds time elapsed
0.767746000 seconds user
0.921166000 seconds sys
5 seconds for 3337 kprobes:
# perf stat -e cycles:u,cycles:k ./src/bpftrace -e 'kprobe:x* { } i:ms:10 { printf("KRAVA\n"); exit() }'
Attaching 2 probes...
Attaching 3337 functions
KRAVA
Performance counter stats for './src/bpftrace -e kprobe:x* { } i:ms:10 { printf("KRAVA\n"); exit() }':
1,731,646,061 cycles:u
9,815,306,940 cycles:k
5.196176904 seconds time elapsed
0.780508000 seconds user
4.078170000 seconds sys
a lot of the time above is spent in kallsyms:
42.70% bpftrace [kernel.kallsyms] [k] kallsyms_expand_symbol.constprop.0
5.11% bpftrace [kernel.kallsyms] [k] insn_get_prefixes.part.0
3.91% bpftrace [kernel.kallsyms] [k] insn_decode
3.09% bpftrace [kernel.kallsyms] [k] arch_jump_entry_size
1.98% bpftrace [kernel.kallsyms] [k] __lock_acquire
1.51% bpftrace [kernel.kallsyms] [k] static_call_text_reserved
by checking if the address is on the kprobe blacklist:
42.70% bpftrace [kernel.kallsyms] [k] kallsyms_expand_symbol.constprop.0
|
---kallsyms_expand_symbol.constprop.0
|
--42.22%--kallsyms_lookup_name
within_kprobe_blacklist.part.0
check_kprobe_address
register_kprobe
bpf_kprobe_link_attach
__sys_bpf
__x64_sys_bpf
do_syscall_64
entry_SYSCALL_64_after_hwframe
syscall
bpftrace::AttachedProbe::attach_kprobe
I could revive that patch that did bsearch on kallsyms or we could
add 'do-not-check-kprobe-blacklist' unsafe mode to get more speed
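for reference, a rough sketch of the bsearch idea (invented shape, not the
revived patch and not the real kallsyms layout): build a name-sorted index
once, then each lookup is a binary search instead of a linear symbol walk

  #include <linux/bsearch.h>
  #include <linux/sort.h>
  #include <linux/string.h>

  struct sym_entry {			/* hypothetical index entry */
	const char *name;
	unsigned long addr;
  };

  static int sym_cmp(const void *a, const void *b)
  {
	const struct sym_entry *sa = a, *sb = b;

	return strcmp(sa->name, sb->name);
  }

  /*
   * entries[] is sorted once when the index is built, e.g. with
   * sort(entries, cnt, sizeof(*entries), sym_cmp, NULL).
   */
  static unsigned long lookup_name_fast(const struct sym_entry *entries,
					unsigned int cnt, const char *name)
  {
	struct sym_entry key = { .name = name };
	const struct sym_entry *found;

	found = bsearch(&key, entries, cnt, sizeof(key), sym_cmp);
	return found ? found->addr : 0;
  }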
jirka
On Tue, 4 Jan 2022 09:09:32 +0100
Jiri Olsa <[email protected]> wrote:
> The bpf_get_func_ip_kprobe helper should return traced function
> address, but it's doing so only for kprobes that are placed on
> the function entry.
>
> If kprobe is placed within the function, bpf_get_func_ip_kprobe
> returns that address instead of function entry.
>
> Storing the function entry directly in kprobe object, so it could
> be used in bpf_get_func_ip_kprobe helper.
Hmm, please do this in bpf side, which should have some data structure
around the kprobe itself. Do not add this "specialized" field to
the kprobe data structure.
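For illustration, one shape that suggestion could take (an invented sketch,
not code from the series): the bpf side keeps the resolved entry address in
its own wrapper around the kprobe and recovers it with container_of(), so
struct kprobe itself stays untouched.

  #include <linux/kprobes.h>

  struct bpf_kprobe_wrapper {		/* hypothetical bpf-side struct */
	struct kprobe kp;
	unsigned long func_addr;	/* entry address resolved at attach time */
	u64 cookie;			/* room for the bpf_cookie as well */
  };

  static u64 bpf_kprobe_wrapper_func_addr(struct kprobe *kp)
  {
	struct bpf_kprobe_wrapper *w;

	w = container_of(kp, struct bpf_kprobe_wrapper, kp);
	return w->func_addr;
  }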
Thank you,
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/linux/kprobes.h | 3 +++
> kernel/kprobes.c | 12 ++++++++++++
> kernel/trace/bpf_trace.c | 2 +-
> tools/testing/selftests/bpf/progs/get_func_ip_test.c | 4 ++--
> 4 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 8c8f7a4d93af..a204df4fef96 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -74,6 +74,9 @@ struct kprobe {
> /* Offset into the symbol */
> unsigned int offset;
>
> + /* traced function address */
> + unsigned long func_addr;
> +
> /* Called before addr is executed. */
> kprobe_pre_handler_t pre_handler;
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d20ae8232835..c4060a8da050 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1310,6 +1310,7 @@ static void init_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
> copy_kprobe(p, ap);
> flush_insn_slot(ap);
> ap->addr = p->addr;
> + ap->func_addr = p->func_addr;
> ap->flags = p->flags & ~KPROBE_FLAG_OPTIMIZED;
> ap->pre_handler = aggr_pre_handler;
> /* We don't care the kprobe which has gone. */
> @@ -1588,6 +1589,16 @@ static int check_kprobe_address_safe(struct kprobe *p,
> return ret;
> }
>
> +static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
> +{
> + char str[KSYM_SYMBOL_LEN];
> + unsigned long offset;
> +
> + if (kallsyms_lookup((unsigned long) addr, NULL, &offset, NULL, str))
> + return (unsigned long) addr - offset;
> + return 0;
> +}
> +
> int register_kprobe(struct kprobe *p)
> {
> int ret;
> @@ -1600,6 +1611,7 @@ int register_kprobe(struct kprobe *p)
> if (IS_ERR(addr))
> return PTR_ERR(addr);
> p->addr = addr;
> + p->func_addr = resolve_func_addr(addr);
>
> ret = warn_kprobe_rereg(p);
> if (ret)
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 21aa30644219..25631253084a 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1026,7 +1026,7 @@ BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
> {
> struct kprobe *kp = kprobe_running();
>
> - return kp ? (uintptr_t)kp->addr : 0;
> + return kp ? (uintptr_t)kp->func_addr : 0;
> }
>
> static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
> diff --git a/tools/testing/selftests/bpf/progs/get_func_ip_test.c b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> index a587aeca5ae0..e988aefa567e 100644
> --- a/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> +++ b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> @@ -69,7 +69,7 @@ int test6(struct pt_regs *ctx)
> {
> __u64 addr = bpf_get_func_ip(ctx);
>
> - test6_result = (const void *) addr == &bpf_fentry_test6 + 5;
> + test6_result = (const void *) addr == &bpf_fentry_test6;
> return 0;
> }
>
> @@ -79,6 +79,6 @@ int test7(struct pt_regs *ctx)
> {
> __u64 addr = bpf_get_func_ip(ctx);
>
> - test7_result = (const void *) addr == &bpf_fentry_test7 + 5;
> + test7_result = (const void *) addr == &bpf_fentry_test7;
> return 0;
> }
> --
> 2.33.1
>
--
Masami Hiramatsu <[email protected]>
On Tue, 4 Jan 2022 09:09:33 +0100
Jiri Olsa <[email protected]> wrote:
> Adding support to register kprobe on multiple addresses within
> single kprobe object instance.
>
> It uses the CONFIG_KPROBES_ON_FTRACE feature (so it's only
> available for function entry addresses) and separated ftrace_ops
> object placed directly in the kprobe object.
>
> There's new CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE config option,
> enabled for archs that support multi kprobes.
>
> It registers all the provided addresses in the ftrace_ops object
> filter and adds separate ftrace_ops callback.
>
> To register multi kprobe user provides array of addresses or
> symbols with their count, like:
>
> struct kprobe kp = {};
>
> kp.multi.symbols = (const char **) symbols;
> kp.multi.cnt = cnt;
> ...
>
> err = register_kprobe(&kp);
I would like to keep the kprobe itself as simple as possible, which
also should provide a consistent probe handling model.
I understand that you consider the overhead of having multiple
probes, but as far as I can see, this implementation design
smells no good, sorry.
The worst point is that the multi kprobe only supports function
entry (and is only available if FTRACE is enabled). Then, that is
FTRACE, not kprobes.
Yes, kprobe supports probing on FTRACE by using FTRACE, but that does
not mean kprobes wraps FTRACE. That is just for "avoidance" of the
limitation (because we can not put a breakpoint on self-modifying code).
If you need a probe which support multiple address but only
on function entry, that should be something like 'fprobe', not
kprobes.
IMHO, that should be implemented as similar but different APIs
because those are simply different.
So, can't we use ftrace directly from bpf? I don't see any reason
that bpf has to stick to the kprobes APIs.
Thank you,
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> arch/Kconfig | 3 +
> arch/x86/Kconfig | 1 +
> arch/x86/kernel/kprobes/ftrace.c | 48 ++++++--
> include/linux/kprobes.h | 25 ++++
> kernel/kprobes.c | 204 +++++++++++++++++++++++++------
> 5 files changed, 235 insertions(+), 46 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index d3c4ab249e9c..0131636e1ef8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -191,6 +191,9 @@ config HAVE_OPTPROBES
> config HAVE_KPROBES_ON_FTRACE
> bool
>
> +config HAVE_KPROBES_MULTI_ON_FTRACE
> + bool
> +
> config ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
> bool
> help
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 5c2ccb85f2ef..0c870238016a 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -217,6 +217,7 @@ config X86
> select HAVE_KERNEL_ZSTD
> select HAVE_KPROBES
> select HAVE_KPROBES_ON_FTRACE
> + select HAVE_KPROBES_MULTI_ON_FTRACE
> select HAVE_FUNCTION_ERROR_INJECTION
> select HAVE_KRETPROBES
> select HAVE_KVM
> diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c
> index dd2ec14adb77..ac4d256b89c6 100644
> --- a/arch/x86/kernel/kprobes/ftrace.c
> +++ b/arch/x86/kernel/kprobes/ftrace.c
> @@ -12,22 +12,14 @@
>
> #include "common.h"
>
> -/* Ftrace callback handler for kprobes -- called under preempt disabled */
> -void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> - struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +static void ftrace_handler(struct kprobe *p, unsigned long ip,
> + struct ftrace_regs *fregs)
> {
> struct pt_regs *regs = ftrace_get_regs(fregs);
> - struct kprobe *p;
> struct kprobe_ctlblk *kcb;
> - int bit;
>
> - bit = ftrace_test_recursion_trylock(ip, parent_ip);
> - if (bit < 0)
> - return;
> -
> - p = get_kprobe((kprobe_opcode_t *)ip);
> if (unlikely(!p) || kprobe_disabled(p))
> - goto out;
> + return;
>
> kcb = get_kprobe_ctlblk();
> if (kprobe_running()) {
> @@ -57,11 +49,43 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> */
> __this_cpu_write(current_kprobe, NULL);
> }
> -out:
> +}
> +
> +/* Ftrace callback handler for kprobes -- called under preempt disabled */
> +void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> + struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +{
> + struct kprobe *p;
> + int bit;
> +
> + bit = ftrace_test_recursion_trylock(ip, parent_ip);
> + if (bit < 0)
> + return;
> +
> + p = get_kprobe((kprobe_opcode_t *)ip);
> + ftrace_handler(p, ip, fregs);
> +
> ftrace_test_recursion_unlock(bit);
> }
> NOKPROBE_SYMBOL(kprobe_ftrace_handler);
>
> +void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
> + struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +{
> + struct kprobe *p;
> + int bit;
> +
> + bit = ftrace_test_recursion_trylock(ip, parent_ip);
> + if (bit < 0)
> + return;
> +
> + p = container_of(ops, struct kprobe, multi.ops);
> + ftrace_handler(p, ip, fregs);
> +
> + ftrace_test_recursion_unlock(bit);
> +}
> +NOKPROBE_SYMBOL(kprobe_ftrace_multi_handler);
> +
> int arch_prepare_kprobe_ftrace(struct kprobe *p)
> {
> p->ainsn.insn = NULL;
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index a204df4fef96..03fd86ef69cb 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -68,6 +68,16 @@ struct kprobe {
> /* location of the probe point */
> kprobe_opcode_t *addr;
>
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> + /* location of the multi probe points */
> + struct {
> + const char **symbols;
> + kprobe_opcode_t **addrs;
> + unsigned int cnt;
> + struct ftrace_ops ops;
> + } multi;
> +#endif
> +
> /* Allow user to indicate symbol name of the probe point */
> const char *symbol_name;
>
> @@ -105,6 +115,7 @@ struct kprobe {
> * this flag is only for optimized_kprobe.
> */
> #define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */
> +#define KPROBE_FLAG_MULTI 16 /* probe multi addresses */
>
> /* Has this kprobe gone ? */
> static inline bool kprobe_gone(struct kprobe *p)
> @@ -130,6 +141,18 @@ static inline bool kprobe_ftrace(struct kprobe *p)
> return p->flags & KPROBE_FLAG_FTRACE;
> }
>
> +/* Is this ftrace multi kprobe ? */
> +static inline bool kprobe_ftrace_multi(struct kprobe *p)
> +{
> + return kprobe_ftrace(p) && (p->flags & KPROBE_FLAG_MULTI);
> +}
> +
> +/* Is this single kprobe ? */
> +static inline bool kprobe_single(struct kprobe *p)
> +{
> + return !(p->flags & KPROBE_FLAG_MULTI);
> +}
> +
> /*
> * Function-return probe -
> * Note:
> @@ -365,6 +388,8 @@ static inline void wait_for_kprobe_optimizer(void) { }
> #ifdef CONFIG_KPROBES_ON_FTRACE
> extern void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
> struct ftrace_ops *ops, struct ftrace_regs *fregs);
> +extern void kprobe_ftrace_multi_handler(unsigned long ip, unsigned long parent_ip,
> + struct ftrace_ops *ops, struct ftrace_regs *fregs);
> extern int arch_prepare_kprobe_ftrace(struct kprobe *p);
> #else
> static inline int arch_prepare_kprobe_ftrace(struct kprobe *p)
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index c4060a8da050..e7729e20d85c 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -44,6 +44,7 @@
> #include <asm/cacheflush.h>
> #include <asm/errno.h>
> #include <linux/uaccess.h>
> +#include <linux/ftrace.h>
>
> #define KPROBE_HASH_BITS 6
> #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS)
> @@ -1022,6 +1023,35 @@ static struct kprobe *alloc_aggr_kprobe(struct kprobe *p)
> }
> #endif /* CONFIG_OPTPROBES */
>
> +static int check_kprobe_address(unsigned long addr)
> +{
> + /* Ensure it is not in reserved area nor out of text */
> + return !kernel_text_address(addr) ||
> + within_kprobe_blacklist(addr) ||
> + jump_label_text_reserved((void *) addr, (void *) addr) ||
> + static_call_text_reserved((void *) addr, (void *) addr) ||
> + find_bug(addr);
> +}
> +
> +static int check_ftrace_location(unsigned long addr, struct kprobe *p)
> +{
> + unsigned long ftrace_addr;
> +
> + ftrace_addr = ftrace_location(addr);
> + if (ftrace_addr) {
> +#ifdef CONFIG_KPROBES_ON_FTRACE
> + /* Given address is not on the instruction boundary */
> + if (addr != ftrace_addr)
> + return -EILSEQ;
> + if (p)
> + p->flags |= KPROBE_FLAG_FTRACE;
> +#else /* !CONFIG_KPROBES_ON_FTRACE */
> + return -EINVAL;
> +#endif
> + }
> + return 0;
> +}
> +
> #ifdef CONFIG_KPROBES_ON_FTRACE
> static struct ftrace_ops kprobe_ftrace_ops __read_mostly = {
> .func = kprobe_ftrace_handler,
> @@ -1043,6 +1073,13 @@ static int __arm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
>
> lockdep_assert_held(&kprobe_mutex);
>
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> + if (kprobe_ftrace_multi(p)) {
> + ret = register_ftrace_function(&p->multi.ops);
> + WARN(ret < 0, "Failed to register kprobe-multi-ftrace (error %d)\n", ret);
> + return ret;
> + }
> +#endif
> ret = ftrace_set_filter_ip(ops, (unsigned long)p->addr, 0, 0);
> if (WARN_ONCE(ret < 0, "Failed to arm kprobe-ftrace at %pS (error %d)\n", p->addr, ret))
> return ret;
> @@ -1081,6 +1118,13 @@ static int __disarm_kprobe_ftrace(struct kprobe *p, struct ftrace_ops *ops,
>
> lockdep_assert_held(&kprobe_mutex);
>
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> + if (kprobe_ftrace_multi(p)) {
> + ret = unregister_ftrace_function(&p->multi.ops);
> + WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret);
> + return ret;
> + }
> +#endif
> if (*cnt == 1) {
> ret = unregister_ftrace_function(ops);
> if (WARN(ret < 0, "Failed to unregister kprobe-ftrace (error %d)\n", ret))
> @@ -1103,6 +1147,94 @@ static int disarm_kprobe_ftrace(struct kprobe *p)
> ipmodify ? &kprobe_ipmodify_ops : &kprobe_ftrace_ops,
> ipmodify ? &kprobe_ipmodify_enabled : &kprobe_ftrace_enabled);
> }
> +
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> +/*
> + * In addition to standard kprobe address check for multi
> + * ftrace kprobes we also allow only:
> + * - ftrace managed function entry address
> + * - kernel core only address
> + */
> +static unsigned long check_ftrace_addr(unsigned long addr)
> +{
> + int err;
> +
> + if (!addr)
> + return -EINVAL;
> + err = check_ftrace_location(addr, NULL);
> + if (err)
> + return err;
> + if (check_kprobe_address(addr))
> + return -EINVAL;
> + if (__module_text_address(addr))
> + return -EINVAL;
> + return 0;
> +}
> +
> +static int check_ftrace_multi(struct kprobe *p)
> +{
> + kprobe_opcode_t **addrs = p->multi.addrs;
> + const char **symbols = p->multi.symbols;
> + unsigned int i, cnt = p->multi.cnt;
> + unsigned long addr, *ips;
> + int err;
> +
> + if ((symbols && addrs) || (!symbols && !addrs))
> + return -EINVAL;
> +
> + /* do we want sysctl for this? */
> + if (cnt >= 20000)
> + return -E2BIG;
> +
> + ips = kmalloc(sizeof(*ips) * cnt, GFP_KERNEL);
> + if (!ips)
> + return -ENOMEM;
> +
> + for (i = 0; i < cnt; i++) {
> + if (symbols)
> + addr = (unsigned long) kprobe_lookup_name(symbols[i], 0);
> + else
> + addr = (unsigned long) addrs[i];
> + ips[i] = addr;
> + }
> +
> + jump_label_lock();
> + preempt_disable();
> +
> + for (i = 0; i < cnt; i++) {
> + err = check_ftrace_addr(ips[i]);
> + if (err)
> + break;
> + }
> +
> + preempt_enable();
> + jump_label_unlock();
> +
> + if (err)
> + goto out;
> +
> + err = ftrace_set_filter_ips(&p->multi.ops, ips, cnt, 0, 0);
> + if (err)
> + goto out;
> +
> + p->multi.ops.func = kprobe_ftrace_multi_handler;
> + p->multi.ops.flags = FTRACE_OPS_FL_SAVE_REGS|FTRACE_OPS_FL_DYNAMIC;
> +
> + p->flags |= KPROBE_FLAG_MULTI|KPROBE_FLAG_FTRACE;
> + if (p->post_handler)
> + p->multi.ops.flags |= FTRACE_OPS_FL_IPMODIFY;
> +
> +out:
> + kfree(ips);
> + return err;
> +}
> +
> +static void free_ftrace_multi(struct kprobe *p)
> +{
> + ftrace_free_filter(&p->multi.ops);
> +}
> +#endif
> +
> #else /* !CONFIG_KPROBES_ON_FTRACE */
> static inline int arm_kprobe_ftrace(struct kprobe *p)
> {
> @@ -1489,6 +1621,9 @@ static struct kprobe *__get_valid_kprobe(struct kprobe *p)
>
> lockdep_assert_held(&kprobe_mutex);
>
> + if (kprobe_ftrace_multi(p))
> + return p;
> +
> ap = get_kprobe(p->addr);
> if (unlikely(!ap))
> return NULL;
> @@ -1520,41 +1655,18 @@ static inline int warn_kprobe_rereg(struct kprobe *p)
> return ret;
> }
>
> -static int check_ftrace_location(struct kprobe *p)
> -{
> - unsigned long ftrace_addr;
> -
> - ftrace_addr = ftrace_location((unsigned long)p->addr);
> - if (ftrace_addr) {
> -#ifdef CONFIG_KPROBES_ON_FTRACE
> - /* Given address is not on the instruction boundary */
> - if ((unsigned long)p->addr != ftrace_addr)
> - return -EILSEQ;
> - p->flags |= KPROBE_FLAG_FTRACE;
> -#else /* !CONFIG_KPROBES_ON_FTRACE */
> - return -EINVAL;
> -#endif
> - }
> - return 0;
> -}
> -
> static int check_kprobe_address_safe(struct kprobe *p,
> struct module **probed_mod)
> {
> int ret;
>
> - ret = check_ftrace_location(p);
> + ret = check_ftrace_location((unsigned long) p->addr, p);
> if (ret)
> return ret;
> jump_label_lock();
> preempt_disable();
>
> - /* Ensure it is not in reserved area nor out of text */
> - if (!kernel_text_address((unsigned long) p->addr) ||
> - within_kprobe_blacklist((unsigned long) p->addr) ||
> - jump_label_text_reserved(p->addr, p->addr) ||
> - static_call_text_reserved(p->addr, p->addr) ||
> - find_bug((unsigned long)p->addr)) {
> + if (check_kprobe_address((unsigned long) p->addr)) {
> ret = -EINVAL;
> goto out;
> }
> @@ -1599,13 +1711,16 @@ static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
> return 0;
> }
>
> -int register_kprobe(struct kprobe *p)
> +static int check_addr(struct kprobe *p, struct module **probed_mod)
> {
> int ret;
> - struct kprobe *old_p;
> - struct module *probed_mod;
> kprobe_opcode_t *addr;
>
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> + if (p->multi.cnt)
> + return check_ftrace_multi(p);
> +#endif
> +
> /* Adjust probe address from symbol */
> addr = kprobe_addr(p);
> if (IS_ERR(addr))
> @@ -1616,13 +1731,21 @@ int register_kprobe(struct kprobe *p)
> ret = warn_kprobe_rereg(p);
> if (ret)
> return ret;
> + return check_kprobe_address_safe(p, probed_mod);
> +}
> +
> +int register_kprobe(struct kprobe *p)
> +{
> + struct module *probed_mod = NULL;
> + struct kprobe *old_p;
> + int ret;
>
> /* User can pass only KPROBE_FLAG_DISABLED to register_kprobe */
> p->flags &= KPROBE_FLAG_DISABLED;
> p->nmissed = 0;
> INIT_LIST_HEAD(&p->list);
>
> - ret = check_kprobe_address_safe(p, &probed_mod);
> + ret = check_addr(p, &probed_mod);
> if (ret)
> return ret;
>
> @@ -1644,14 +1767,21 @@ int register_kprobe(struct kprobe *p)
> if (ret)
> goto out;
>
> - INIT_HLIST_NODE(&p->hlist);
> - hlist_add_head_rcu(&p->hlist,
> - &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
> + /*
> + * Multi ftrace kprobes do not have single address,
> + * so they are not stored in the kprobe_table hash.
> + */
> + if (kprobe_single(p)) {
> + INIT_HLIST_NODE(&p->hlist);
> + hlist_add_head_rcu(&p->hlist,
> + &kprobe_table[hash_ptr(p->addr, KPROBE_HASH_BITS)]);
> + }
>
> if (!kprobes_all_disarmed && !kprobe_disabled(p)) {
> ret = arm_kprobe(p);
> if (ret) {
> - hlist_del_rcu(&p->hlist);
> + if (kprobe_single(p))
> + hlist_del_rcu(&p->hlist);
> synchronize_rcu();
> goto out;
> }
> @@ -1778,7 +1908,13 @@ static int __unregister_kprobe_top(struct kprobe *p)
> return 0;
>
> disarmed:
> - hlist_del_rcu(&ap->hlist);
> + if (kprobe_single(ap))
> + hlist_del_rcu(&ap->hlist);
> +
> +#ifdef CONFIG_HAVE_KPROBES_MULTI_ON_FTRACE
> + if (kprobe_ftrace_multi(ap))
> + free_ftrace_multi(ap);
> +#endif
> return 0;
> }
>
> --
> 2.33.1
>
--
Masami Hiramatsu <[email protected]>
On Tue, 4 Jan 2022 09:09:30 +0100
Jiri Olsa <[email protected]> wrote:
> hi,
> adding support to attach multiple kprobes within single syscall
> and speed up attachment of many kprobes.
>
> The previous attempt [1] wasn't fast enough, so coming with new
> approach that adds new kprobe interface.
Yes, since register_kprobes() just registers multiple kprobes on
array. This is designed for dozens of kprobes.
> The attachment speed of of this approach (tested in bpftrace)
> is now comparable to ftrace tracer attachment speed.. fast ;-)
Yes, because that is ftrace, not kprobes.
> The limit of this approach is forced by using ftrace as attach
> layer, so it allows only kprobes on function's entry (plus
> return probes).
Note that you also need to multiply the number of instances.
>
> This patchset contains:
> - kprobes support to register multiple kprobes with current
> kprobe API (patches 1 - 8)
> - bpf support ot create new kprobe link allowing to attach
> multiple addresses (patches 9 - 14)
>
> We don't need to care about multiple probes on same functions
> because it's taken care on the ftrace_ops layer.
Hmm, I think there may be a time to split the "kprobe as an
interface for the software breakpoint" and "kprobe as a wrapper
interface for the callbacks of various instrumentations", like
'raw_kprobe'(or kswbp) and 'kprobes'.
And this may be called as 'fprobe' as ftrace_ops wrapper.
(But if the bpf is enough flexible, this kind of intermediate layer
may not be needed, it can use ftrace_ops directly, eventually)
Jiri, have you already considered to use ftrace_ops from the
bpf directly? Are there any issues?
(bpf depends on 'kprobe' widely?)
Thank you,
--
Masami Hiramatsu <[email protected]>
On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
>
> The bpf_get_func_ip_kprobe helper should return traced function
> address, but it's doing so only for kprobes that are placed on
> the function entry.
>
> If kprobe is placed within the function, bpf_get_func_ip_kprobe
> returns that address instead of function entry.
>
> Storing the function entry directly in kprobe object, so it could
> be used in bpf_get_func_ip_kprobe helper.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/linux/kprobes.h | 3 +++
> kernel/kprobes.c | 12 ++++++++++++
> kernel/trace/bpf_trace.c | 2 +-
> tools/testing/selftests/bpf/progs/get_func_ip_test.c | 4 ++--
> 4 files changed, 18 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 8c8f7a4d93af..a204df4fef96 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -74,6 +74,9 @@ struct kprobe {
> /* Offset into the symbol */
> unsigned int offset;
>
> + /* traced function address */
> + unsigned long func_addr;
> +
keep in mind that we'll also need (maybe in a follow up series) to
store bpf_cookie somewhere close to this func_addr as well. Just
mentioning to keep in mind as you decide with Masami where to put it.
> /* Called before addr is executed. */
> kprobe_pre_handler_t pre_handler;
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index d20ae8232835..c4060a8da050 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1310,6 +1310,7 @@ static void init_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
> copy_kprobe(p, ap);
> flush_insn_slot(ap);
> ap->addr = p->addr;
> + ap->func_addr = p->func_addr;
> ap->flags = p->flags & ~KPROBE_FLAG_OPTIMIZED;
> ap->pre_handler = aggr_pre_handler;
> /* We don't care the kprobe which has gone. */
> @@ -1588,6 +1589,16 @@ static int check_kprobe_address_safe(struct kprobe *p,
> return ret;
> }
>
> +static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
> +{
> + char str[KSYM_SYMBOL_LEN];
> + unsigned long offset;
> +
> + if (kallsyms_lookup((unsigned long) addr, NULL, &offset, NULL, str))
> + return (unsigned long) addr - offset;
> + return 0;
> +}
> +
> int register_kprobe(struct kprobe *p)
> {
> int ret;
> @@ -1600,6 +1611,7 @@ int register_kprobe(struct kprobe *p)
> if (IS_ERR(addr))
> return PTR_ERR(addr);
> p->addr = addr;
> + p->func_addr = resolve_func_addr(addr);
>
> ret = warn_kprobe_rereg(p);
> if (ret)
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index 21aa30644219..25631253084a 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1026,7 +1026,7 @@ BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
> {
> struct kprobe *kp = kprobe_running();
>
> - return kp ? (uintptr_t)kp->addr : 0;
> + return kp ? (uintptr_t)kp->func_addr : 0;
> }
>
> static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
> diff --git a/tools/testing/selftests/bpf/progs/get_func_ip_test.c b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> index a587aeca5ae0..e988aefa567e 100644
> --- a/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> +++ b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> @@ -69,7 +69,7 @@ int test6(struct pt_regs *ctx)
> {
> __u64 addr = bpf_get_func_ip(ctx);
>
> - test6_result = (const void *) addr == &bpf_fentry_test6 + 5;
> + test6_result = (const void *) addr == &bpf_fentry_test6;
> return 0;
> }
>
> @@ -79,6 +79,6 @@ int test7(struct pt_regs *ctx)
> {
> __u64 addr = bpf_get_func_ip(ctx);
>
> - test7_result = (const void *) addr == &bpf_fentry_test7 + 5;
> + test7_result = (const void *) addr == &bpf_fentry_test7;
we can treat this as a bug fix for bpf_get_func_ip() for kprobes,
right? I think "Fixes: " tag is in order then.
> return 0;
> }
> --
> 2.33.1
>
On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
>
> The current bpf_get_func_ip_kprobe helper does not work properly,
> when used in ebpf program triggered by the new multi kprobes.
>
> We can't use kprobe's func_addr in bpf_get_func_ip_kprobe helper,
> because there are multiple functions registered for single kprobe
> object.
>
> Adding new per cpu variable current_ftrace_multi_addr and extra
> address in kretprobe_instance object to keep current traced function
> address for each cpu for both kprobe handler and kretprobe trampoline.
>
> The address value is set/passed as follows, for kprobe:
>
> kprobe_ftrace_multi_handler
> {
> old = kprobe_ftrace_multi_addr_set(ip);
> handler..
> kprobe_ftrace_multi_addr_set(old);
> }
>
> For kretprobe:
>
> kprobe_ftrace_multi_handler
> {
> old = kprobe_ftrace_multi_addr_set(ip);
> ...
> pre_handler_kretprobe
> {
> ri->ftrace_multi_addr = kprobe_ftrace_multi_addr
> }
> ...
> kprobe_ftrace_multi_addr_set(old);
> }
>
> __kretprobe_trampoline_handler
> {
> prev_func_addr = kprobe_ftrace_multi_addr_set(ri->ftrace_multi_addr);
> handler..
> kprobe_ftrace_multi_addr_set(prev_func_addr);
> }
>
Is it possible to record or calculate the multi-kprobe "instance
index" (i.e., which position in addrs array did we get triggered for)?
If yes, then storing that index would allow fetching both the IP and the
cookie value with just one per-cpu variable.
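For illustration, a rough sketch of that index idea (names are invented, this
is not code from the series): the handler records which slot of the link's
parallel arrays fired, and both helpers read it back from one per-CPU value.

  #include <linux/percpu.h>
  #include <linux/types.h>

  struct kprobe_multi_link {		/* hypothetical link state */
	unsigned long *addrs;		/* sorted at attach time */
	u64 *cookies;			/* cookies[i] pairs with addrs[i] */
	u32 cnt;
  };

  static DEFINE_PER_CPU(u32, kprobe_multi_idx);

  /* called from the ftrace handler, only for ips registered in addrs[] */
  static void kprobe_multi_set_idx(struct kprobe_multi_link *link, unsigned long ip)
  {
	u32 i;

	/* addrs[] is sorted, so this linear lookup could also be a bsearch */
	for (i = 0; i < link->cnt; i++) {
		if (link->addrs[i] == ip) {
			__this_cpu_write(kprobe_multi_idx, i);
			return;
		}
	}
  }

  static u64 kprobe_multi_func_ip(struct kprobe_multi_link *link)
  {
	return link->addrs[__this_cpu_read(kprobe_multi_idx)];
  }

  static u64 kprobe_multi_cookie(struct kprobe_multi_link *link)
  {
	return link->cookies[__this_cpu_read(kprobe_multi_idx)];
  }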
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> arch/x86/kernel/kprobes/ftrace.c | 3 +++
> include/linux/kprobes.h | 26 ++++++++++++++++++++++++++
> kernel/kprobes.c | 6 ++++++
> kernel/trace/bpf_trace.c | 7 ++++++-
> 4 files changed, 41 insertions(+), 1 deletion(-)
>
[...]
On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
>
> Adding new link type BPF_LINK_TYPE_KPROBE to attach kprobes
> directly through register_kprobe/kretprobe API.
>
> Adding new attach type BPF_TRACE_RAW_KPROBE that enables
> such link for kprobe program.
>
> The new link allows to create multiple kprobes link by using
> new link_create interface:
>
> struct {
> __aligned_u64 addrs;
> __u32 cnt;
> __u64 bpf_cookie;
I'm afraid bpf_cookie has to be different for each addr, otherwise
it's severely limiting. So it would be an array of cookies alongside
an array of addresses
> } kprobe;
>
> Plus new flag BPF_F_KPROBE_RETURN for link_create.flags to
> create return probe.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/linux/bpf_types.h | 1 +
> include/uapi/linux/bpf.h | 12 +++
> kernel/bpf/syscall.c | 191 ++++++++++++++++++++++++++++++++-
> tools/include/uapi/linux/bpf.h | 12 +++
> 4 files changed, 211 insertions(+), 5 deletions(-)
>
[...]
> @@ -1111,6 +1113,11 @@ enum bpf_link_type {
> */
> #define BPF_F_SLEEPABLE (1U << 4)
>
> +/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
> + * attach type.
> + */
> +#define BPF_F_KPROBE_RETURN (1U << 0)
> +
we have plenty of flexibility to have per-link type fields, so why not
add `bool is_retprobe` next to addrs and cnt?
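Putting the two comments together, the per-link fields could end up looking
roughly like this (a sketch of the suggestion only, not what was posted or
merged; is_retprobe is shown as a fixed-width field rather than a bool):

  struct {
	__aligned_u64	addrs;		/* array of cnt kernel addresses */
	__aligned_u64	cookies;	/* array of cnt cookies, may be zero */
	__u32		cnt;
	__u32		is_retprobe;
  } kprobe;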
> /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> * the following extensions:
> *
> @@ -1465,6 +1472,11 @@ union bpf_attr {
> */
> __u64 bpf_cookie;
> } perf_event;
> + struct {
> + __aligned_u64 addrs;
> + __u32 cnt;
> + __u64 bpf_cookie;
> + } kprobe;
> };
> } link_create;
>
[...]
On Thu, Jan 06, 2022 at 12:24:35AM +0900, Masami Hiramatsu wrote:
> On Tue, 4 Jan 2022 09:09:30 +0100
> Jiri Olsa <[email protected]> wrote:
>
> > hi,
> > adding support to attach multiple kprobes within single syscall
> > and speed up attachment of many kprobes.
> >
> > The previous attempt [1] wasn't fast enough, so coming with new
> > approach that adds new kprobe interface.
>
> Yes, since register_kprobes() just registers multiple kprobes on
> array. This is designed for dozens of kprobes.
>
> > The attachment speed of of this approach (tested in bpftrace)
> > is now comparable to ftrace tracer attachment speed.. fast ;-)
>
> > Yes, because that is ftrace, not kprobes.
>
> > The limit of this approach is forced by using ftrace as attach
> > layer, so it allows only kprobes on function's entry (plus
> > return probes).
>
> Note that you also need to multiply the number of instances.
>
> >
> > This patchset contains:
> > - kprobes support to register multiple kprobes with current
> > kprobe API (patches 1 - 8)
> > - bpf support ot create new kprobe link allowing to attach
> > multiple addresses (patches 9 - 14)
> >
> > We don't need to care about multiple probes on same functions
> > because it's taken care on the ftrace_ops layer.
>
> Hmm, I think there may be a time to split the "kprobe as an
> interface for the software breakpoint" and "kprobe as a wrapper
> interface for the callbacks of various instrumentations", like
> 'raw_kprobe'(or kswbp) and 'kprobes'.
> And this may be called as 'fprobe' as ftrace_ops wrapper.
> (But if the bpf is enough flexible, this kind of intermediate layer
> may not be needed, it can use ftrace_ops directly, eventually)
>
> Jiri, have you already considered to use ftrace_ops from the
> bpf directly? Are there any issues?
> (bpf depends on 'kprobe' widely?)
at the moment there's not ftrace public interface for the return
probe merged in, so to get the kretprobe working I had to use
kprobe interface
but.. there are patches Steven shared some time ago, that do that
and make graph_ops available as kernel interface
I recall we considered graph_ops interface before as common attach
layer for trampolines, which was bad, but it might actually make
sense for kprobes
I'll need to check it in more details but I think both graph_ops and
kprobe do about similar thing wrt hooking return probe, so it should
be comparable.. and they are already doing the same for the entry hook,
because kprobe is mostly using ftrace for that
we would not need to introduce new program type - kprobe programs
should be able to run from ftrace callbacks just fine
so we would have:
- kprobe type programs attaching to:
- new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
jirka
On Wed, Jan 05, 2022 at 11:32:52PM +0900, Masami Hiramatsu wrote:
> On Tue, 4 Jan 2022 09:09:32 +0100
> Jiri Olsa <[email protected]> wrote:
>
> > The bpf_get_func_ip_kprobe helper should return traced function
> > address, but it's doing so only for kprobes that are placed on
> > the function entry.
> >
> > If kprobe is placed within the function, bpf_get_func_ip_kprobe
> > returns that address instead of function entry.
> >
> > Storing the function entry directly in kprobe object, so it could
> > be used in bpf_get_func_ip_kprobe helper.
>
> Hmm, please do this in bpf side, which should have some data structure
> around the kprobe itself. Do not add this "specialized" field to
> the kprobe data structure.
ok, will check
thanks,
jirka
>
> Thank you,
>
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/linux/kprobes.h | 3 +++
> > kernel/kprobes.c | 12 ++++++++++++
> > kernel/trace/bpf_trace.c | 2 +-
> > tools/testing/selftests/bpf/progs/get_func_ip_test.c | 4 ++--
> > 4 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> > index 8c8f7a4d93af..a204df4fef96 100644
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -74,6 +74,9 @@ struct kprobe {
> > /* Offset into the symbol */
> > unsigned int offset;
> >
> > + /* traced function address */
> > + unsigned long func_addr;
> > +
> > /* Called before addr is executed. */
> > kprobe_pre_handler_t pre_handler;
> >
> > diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> > index d20ae8232835..c4060a8da050 100644
> > --- a/kernel/kprobes.c
> > +++ b/kernel/kprobes.c
> > @@ -1310,6 +1310,7 @@ static void init_aggr_kprobe(struct kprobe *ap, struct kprobe *p)
> > copy_kprobe(p, ap);
> > flush_insn_slot(ap);
> > ap->addr = p->addr;
> > + ap->func_addr = p->func_addr;
> > ap->flags = p->flags & ~KPROBE_FLAG_OPTIMIZED;
> > ap->pre_handler = aggr_pre_handler;
> > /* We don't care the kprobe which has gone. */
> > @@ -1588,6 +1589,16 @@ static int check_kprobe_address_safe(struct kprobe *p,
> > return ret;
> > }
> >
> > +static unsigned long resolve_func_addr(kprobe_opcode_t *addr)
> > +{
> > + char str[KSYM_SYMBOL_LEN];
> > + unsigned long offset;
> > +
> > + if (kallsyms_lookup((unsigned long) addr, NULL, &offset, NULL, str))
> > + return (unsigned long) addr - offset;
> > + return 0;
> > +}
> > +
> > int register_kprobe(struct kprobe *p)
> > {
> > int ret;
> > @@ -1600,6 +1611,7 @@ int register_kprobe(struct kprobe *p)
> > if (IS_ERR(addr))
> > return PTR_ERR(addr);
> > p->addr = addr;
> > + p->func_addr = resolve_func_addr(addr);
> >
> > ret = warn_kprobe_rereg(p);
> > if (ret)
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 21aa30644219..25631253084a 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -1026,7 +1026,7 @@ BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
> > {
> > struct kprobe *kp = kprobe_running();
> >
> > - return kp ? (uintptr_t)kp->addr : 0;
> > + return kp ? (uintptr_t)kp->func_addr : 0;
> > }
> >
> > static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
> > diff --git a/tools/testing/selftests/bpf/progs/get_func_ip_test.c b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> > index a587aeca5ae0..e988aefa567e 100644
> > --- a/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> > +++ b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> > @@ -69,7 +69,7 @@ int test6(struct pt_regs *ctx)
> > {
> > __u64 addr = bpf_get_func_ip(ctx);
> >
> > - test6_result = (const void *) addr == &bpf_fentry_test6 + 5;
> > + test6_result = (const void *) addr == &bpf_fentry_test6;
> > return 0;
> > }
> >
> > @@ -79,6 +79,6 @@ int test7(struct pt_regs *ctx)
> > {
> > __u64 addr = bpf_get_func_ip(ctx);
> >
> > - test7_result = (const void *) addr == &bpf_fentry_test7 + 5;
> > + test7_result = (const void *) addr == &bpf_fentry_test7;
> > return 0;
> > }
> > --
> > 2.33.1
> >
>
>
> --
> Masami Hiramatsu <[email protected]>
>
On Wed, Jan 05, 2022 at 08:30:48PM -0800, Andrii Nakryiko wrote:
> On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
> >
> > The bpf_get_func_ip_kprobe helper should return traced function
> > address, but it's doing so only for kprobes that are placed on
> > the function entry.
> >
> > If kprobe is placed within the function, bpf_get_func_ip_kprobe
> > returns that address instead of function entry.
> >
> > Storing the function entry directly in kprobe object, so it could
> > be used in bpf_get_func_ip_kprobe helper.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/linux/kprobes.h | 3 +++
> > kernel/kprobes.c | 12 ++++++++++++
> > kernel/trace/bpf_trace.c | 2 +-
> > tools/testing/selftests/bpf/progs/get_func_ip_test.c | 4 ++--
> > 4 files changed, 18 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> > index 8c8f7a4d93af..a204df4fef96 100644
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -74,6 +74,9 @@ struct kprobe {
> > /* Offset into the symbol */
> > unsigned int offset;
> >
> > + /* traced function address */
> > + unsigned long func_addr;
> > +
>
> keep in mind that we'll also need (maybe in a follow up series) to
> store bpf_cookie somewhere close to this func_addr as well. Just
> mentioning to keep in mind as you decide with Masami where to put it.
ok
SNIP
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index 21aa30644219..25631253084a 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -1026,7 +1026,7 @@ BPF_CALL_1(bpf_get_func_ip_kprobe, struct pt_regs *, regs)
> > {
> > struct kprobe *kp = kprobe_running();
> >
> > - return kp ? (uintptr_t)kp->addr : 0;
> > + return kp ? (uintptr_t)kp->func_addr : 0;
> > }
> >
> > static const struct bpf_func_proto bpf_get_func_ip_proto_kprobe = {
> > diff --git a/tools/testing/selftests/bpf/progs/get_func_ip_test.c b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> > index a587aeca5ae0..e988aefa567e 100644
> > --- a/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> > +++ b/tools/testing/selftests/bpf/progs/get_func_ip_test.c
> > @@ -69,7 +69,7 @@ int test6(struct pt_regs *ctx)
> > {
> > __u64 addr = bpf_get_func_ip(ctx);
> >
> > - test6_result = (const void *) addr == &bpf_fentry_test6 + 5;
> > + test6_result = (const void *) addr == &bpf_fentry_test6;
> > return 0;
> > }
> >
> > @@ -79,6 +79,6 @@ int test7(struct pt_regs *ctx)
> > {
> > __u64 addr = bpf_get_func_ip(ctx);
> >
> > - test7_result = (const void *) addr == &bpf_fentry_test7 + 5;
> > + test7_result = (const void *) addr == &bpf_fentry_test7;
>
> we can treat this as a bug fix for bpf_get_func_ip() for kprobes,
> right? I think "Fixes: " tag is in order then.
true, will add that in next version
thanks,
jirka
On Wed, Jan 05, 2022 at 08:30:56PM -0800, Andrii Nakryiko wrote:
> On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
> >
> > Adding new link type BPF_LINK_TYPE_KPROBE to attach kprobes
> > directly through register_kprobe/kretprobe API.
> >
> > Adding new attach type BPF_TRACE_RAW_KPROBE that enables
> > such link for kprobe program.
> >
> > The new link allows to create multiple kprobes link by using
> > new link_create interface:
> >
> > struct {
> > __aligned_u64 addrs;
> > __u32 cnt;
> > __u64 bpf_cookie;
>
> I'm afraid bpf_cookie has to be different for each addr, otherwise
> it's severely limiting. So it would be an array of cookies alongside
> an array of addresses
ok
>
> > } kprobe;
> >
> > Plus new flag BPF_F_KPROBE_RETURN for link_create.flags to
> > create return probe.
> >
> > Signed-off-by: Jiri Olsa <[email protected]>
> > ---
> > include/linux/bpf_types.h | 1 +
> > include/uapi/linux/bpf.h | 12 +++
> > kernel/bpf/syscall.c | 191 ++++++++++++++++++++++++++++++++-
> > tools/include/uapi/linux/bpf.h | 12 +++
> > 4 files changed, 211 insertions(+), 5 deletions(-)
> >
>
> [...]
>
> > @@ -1111,6 +1113,11 @@ enum bpf_link_type {
> > */
> > #define BPF_F_SLEEPABLE (1U << 4)
> >
> > +/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
> > + * attach type.
> > + */
> > +#define BPF_F_KPROBE_RETURN (1U << 0)
> > +
>
> we have plenty of flexibility to have per-link type fields, so why not
> add `bool is_retprobe` next to addrs and cnt?
well I thought if I do that, people would suggest to use the empty
flags field instead ;-)
we can move it there as you suggest, but I wonder if it's a good idea to
use bool in uapi headers, because the bool size definition is vague
jirka
>
> > /* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> > * the following extensions:
> > *
> > @@ -1465,6 +1472,11 @@ union bpf_attr {
> > */
> > __u64 bpf_cookie;
> > } perf_event;
> > + struct {
> > + __aligned_u64 addrs;
> > + __u32 cnt;
> > + __u64 bpf_cookie;
> > + } kprobe;
> > };
> > } link_create;
> >
>
> [...]
>
On Thu, 6 Jan 2022 09:29:02 +0100
Jiri Olsa <[email protected]> wrote:
> On Thu, Jan 06, 2022 at 12:24:35AM +0900, Masami Hiramatsu wrote:
> > On Tue, 4 Jan 2022 09:09:30 +0100
> > Jiri Olsa <[email protected]> wrote:
> >
> > > hi,
> > > adding support to attach multiple kprobes within single syscall
> > > and speed up attachment of many kprobes.
> > >
> > > The previous attempt [1] wasn't fast enough, so coming with new
> > > approach that adds new kprobe interface.
> >
> > Yes, since register_kprobes() just registers multiple kprobes on
> > array. This is designed for dozens of kprobes.
> >
> > > The attachment speed of of this approach (tested in bpftrace)
> > > is now comparable to ftrace tracer attachment speed.. fast ;-)
> >
> > Yes, because that is ftrace, not kprobes.
> >
> > > The limit of this approach is forced by using ftrace as attach
> > > layer, so it allows only kprobes on function's entry (plus
> > > return probes).
> >
> > Note that you also need to multiply the number of instances.
> >
> > >
> > > This patchset contains:
> > > - kprobes support to register multiple kprobes with current
> > > kprobe API (patches 1 - 8)
> > > - bpf support ot create new kprobe link allowing to attach
> > > multiple addresses (patches 9 - 14)
> > >
> > > We don't need to care about multiple probes on same functions
> > > because it's taken care on the ftrace_ops layer.
> >
> > Hmm, I think there may be a time to split the "kprobe as an
> > interface for the software breakpoint" and "kprobe as a wrapper
> > interface for the callbacks of various instrumentations", like
> > 'raw_kprobe'(or kswbp) and 'kprobes'.
> > And this may be called as 'fprobe' as ftrace_ops wrapper.
> > (But if the bpf is enough flexible, this kind of intermediate layer
> > may not be needed, it can use ftrace_ops directly, eventually)
> >
> > Jiri, have you already considered to use ftrace_ops from the
> > bpf directly? Are there any issues?
> > (bpf depends on 'kprobe' widely?)
>
> at the moment there's not ftrace public interface for the return
> probe merged in, so to get the kretprobe working I had to use
> kprobe interface
Yeah, I found that too. We have to ask Steve to salvage it ;)
> but.. there are patches Steven shared some time ago, that do that
> and make graph_ops available as kernel interface
>
> I recall we considered graph_ops interface before as common attach
> layer for trampolines, which was bad, but it might actually make
> sense for kprobes
I started working on making 'fprobe' which will provide multiple
function probe with similar interface of kprobes. See attached
patch. Then you can use it in bpf, maybe with an union like
union {
struct kprobe kp; // for function body
struct fprobe fp; // for function entry and return
};
At this moment, fprobe only support entry_handler, but when we
re-start the generic graph_ops interface, it is easy to expand
to support exit_handler.
If this works, I think kretprobe can be phased out, since at that
moment, kprobe_event can replace it with the fprobe exit_handler.
(This is a benefit of decoupling the instrumentation layer from
the event layer. It can choose the best way without changing
user interface.)
> I'll need to check it in more details but I think both graph_ops and
> kprobe do about similar thing wrt hooking return probe, so it should
> be comparable.. and they are already doing the same for the entry hook,
> because kprobe is mostly using ftrace for that
>
> we would not need to introduce new program type - kprobe programs
> should be able to run from ftrace callbacks just fine
That seems to bind your mind. The program type is just a programming
'model' of the bpf. You can choose the best implementation to provide
equal functionality. 'kprobe' in bpf is just a name that you call some
instrumentations which can probe kernel code.
Thank you,
>
> so we would have:
> - kprobe type programs attaching to:
> - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
>
> jirka
>
--
Masami Hiramatsu <[email protected]>
On Thu, Jan 06, 2022 at 10:59:43PM +0900, Masami Hiramatsu wrote:
SNIP
> > >
> > > Hmm, I think there may be a time to split the "kprobe as an
> > > interface for the software breakpoint" and "kprobe as a wrapper
> > > interface for the callbacks of various instrumentations", like
> > > 'raw_kprobe'(or kswbp) and 'kprobes'.
> > > And this may be called as 'fprobe' as ftrace_ops wrapper.
> > > (But if the bpf is enough flexible, this kind of intermediate layer
> > > may not be needed, it can use ftrace_ops directly, eventually)
> > >
> > > Jiri, have you already considered to use ftrace_ops from the
> > > bpf directly? Are there any issues?
> > > (bpf depends on 'kprobe' widely?)
> >
> > at the moment there's not ftrace public interface for the return
> > probe merged in, so to get the kretprobe working I had to use
> > kprobe interface
>
> Yeah, I found that too. We have to ask Steve to salvage it ;)
I got those patches rebased on upstream code like half a year ago,
so it should be easy to revive them
>
> > but.. there are patches Steven shared some time ago, that do that
> > and make graph_ops available as kernel interface
> >
> > I recall we considered graph_ops interface before as common attach
> > layer for trampolines, which was bad, but it might actually make
> > sense for kprobes
>
> I started working on making 'fprobe' which will provide multiple
> function probe with similar interface of kprobes. See attached
> patch. Then you can use it in bpf, maybe with an union like
>
> union {
> struct kprobe kp; // for function body
> struct fprobe fp; // for function entry and return
> };
>
> At this moment, fprobe only support entry_handler, but when we
> re-start the generic graph_ops interface, it is easy to expand
> to support exit_handler.
> If this works, I think kretprobe can be phased out, since at that
> moment, kprobe_event can replace it with the fprobe exit_handler.
> (This is a benefit of decoupling the instrumentation layer from
> the event layer. It can choose the best way without changing
> user interface.)
>
I can resend out graph_ops patches if you want to base
it directly on that
> > I'll need to check it in more details but I think both graph_ops and
> > kprobe do about similar thing wrt hooking return probe, so it should
> > be comparable.. and they are already doing the same for the entry hook,
> > because kprobe is mostly using ftrace for that
> >
> > we would not need to introduce new program type - kprobe programs
> > should be able to run from ftrace callbacks just fine
>
> That seems to bind your mind. The program type is just a programming
> 'model' of the bpf. You can choose the best implementation to provide
> equal functionality. 'kprobe' in bpf is just a name that you call some
> instrumentations which can probe kernel code.
I don't want to introduce a new type, there are some dependencies
in the bpf verifier and helpers code we'd need to handle for that
I'm looking for a solution that lets the current kprobe bpf program type
be registered for multiple addresses quickly
>
> Thank you,
>
> >
> > so we would have:
> > - kprobe type programs attaching to:
> > - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
> >
> > jirka
> >
>
>
> --
> Masami Hiramatsu <[email protected]>
> From 269b86597c166d6d4c5dd564168237603533165a Mon Sep 17 00:00:00 2001
> From: Masami Hiramatsu <[email protected]>
> Date: Thu, 6 Jan 2022 15:40:36 +0900
> Subject: [PATCH] fprobe: Add ftrace based probe APIs
>
> The fprobe is a wrapper API for ftrace function tracer.
> Unlike kprobes, this probes only supports the function entry, but
> it can probe multiple functions by one fprobe. The usage is almost
> same as the kprobe, user will specify the function names by
> fprobe::syms, the number of syms by fprobe::nsyms, and the user
> handler by fprobe::handler.
>
> struct fprobe = { 0 };
> const char *targets[] = {"func1", "func2", "func3"};
>
> fprobe.handler = user_handler;
> fprobe.nsyms = ARRAY_SIZE(targets);
> fprobe.syms = targets;
>
> ret = register_fprobe(&fprobe);
> ...
>
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> ---
> include/linux/fprobes.h | 52 ++++++++++++++++
> kernel/trace/Kconfig | 10 ++++
> kernel/trace/Makefile | 1 +
> kernel/trace/fprobes.c | 128 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 191 insertions(+)
> create mode 100644 include/linux/fprobes.h
> create mode 100644 kernel/trace/fprobes.c
>
> diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h
> new file mode 100644
> index 000000000000..22db748bf491
> --- /dev/null
> +++ b/include/linux/fprobes.h
> @@ -0,0 +1,52 @@
> +#ifndef _LINUX_FPROBES_H
> +#define _LINUX_FPROBES_H
> +/* Simple ftrace probe wrapper */
> +
> +#include <linux/compiler.h>
> +#include <linux/ftrace.h>
> +
> +struct fprobe {
> + const char **syms;
> + unsigned long *addrs;
could you add array of user data for each addr/sym?
SNIP
> +static int populate_func_addresses(struct fprobe *fp)
> +{
> + unsigned int i;
> +
> + fp->addrs = kmalloc(sizeof(void *) * fp->nsyms, GFP_KERNEL);
> + if (!fp->addrs)
> + return -ENOMEM;
> +
> + for (i = 0; i < fp->nsyms; i++) {
> + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> + if (!fp->addrs[i]) {
> + kfree(fp->addrs);
> + fp->addrs = NULL;
> + return -ENOENT;
> + }
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * register_fprobe - Register fprobe to ftrace
> + * @fp: A fprobe data structure to be registered.
> + *
> + * This expects the user set @fp::syms or @fp::addrs (not both),
> + * @fp::nsyms (number of entries of @fp::syms or @fp::addrs) and
> + * @fp::handler. Other fields are initialized by this function.
> + */
> +int register_fprobe(struct fprobe *fp)
> +{
> + unsigned int i;
> + int ret;
> +
> + if (!fp)
> + return -EINVAL;
> +
> + if (!fp->nsyms || (!fp->syms && !fp->addrs) || (fp->syms && fp->addrs))
> + return -EINVAL;
> +
> + if (fp->syms) {
> + ret = populate_func_addresses(fp);
> + if (ret < 0)
> + return ret;
> + }
> +
> + fp->ftrace.func = fprobe_handler;
> + fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;
> +
> + for (i = 0; i < fp->nsyms; i++) {
> + ret = ftrace_set_filter_ip(&fp->ftrace, fp->addrs[i], 0, 0);
> + if (ret < 0)
> + goto error;
> + }
I introduced ftrace_set_filter_ips, because a loop like the one above was slow:
https://lore.kernel.org/bpf/[email protected]/
thanks,
jirka
> +
> + fp->nmissed = 0;
> + ret = register_ftrace_function(&fp->ftrace);
> + if (!ret)
> + return ret;
> +
> +error:
> + if (fp->syms) {
> + kfree(fp->addrs);
> + fp->addrs = NULL;
> + }
> +
> + return ret;
> +}
> +
> +/**
> + * unregister_fprobe - Unregister fprobe from ftrace
> + * @fp: A fprobe data structure to be unregistered.
> + */
> +int unregister_fprobe(struct fprobe *fp)
> +{
> + int ret;
> +
> + if (!fp)
> + return -EINVAL;
> +
> + if (!fp->nsyms || !fp->addrs)
> + return -EINVAL;
> +
> + ret = unregister_ftrace_function(&fp->ftrace);
> +
> + if (fp->syms) {
> + /* fp->addrs is allocated by register_fprobe() */
> + kfree(fp->addrs);
> + fp->addrs = NULL;
> + }
> +
> + return ret;
> +}
> --
> 2.25.1
>
On Thu, 6 Jan 2022 22:59:43 +0900
Masami Hiramatsu <[email protected]> wrote:
> > at the moment there's not ftrace public interface for the return
> > probe merged in, so to get the kretprobe working I had to use
> > kprobe interface
>
> Yeah, I found that too. We have to ask Steve to salvage it ;)
I have one more week of being unemployed (and I'm done with my office
renovation), so perhaps I'll start looking into this.
This was the work to merge function graph tracer with kretprobes, right?
-- Steve
On Thu, Jan 6, 2022 at 12:41 AM Jiri Olsa <[email protected]> wrote:
>
> On Wed, Jan 05, 2022 at 08:30:56PM -0800, Andrii Nakryiko wrote:
> > On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
> > >
> > > Adding new link type BPF_LINK_TYPE_KPROBE to attach kprobes
> > > directly through register_kprobe/kretprobe API.
> > >
> > > Adding new attach type BPF_TRACE_RAW_KPROBE that enables
> > > such link for kprobe program.
> > >
> > > The new link allows to create multiple kprobes link by using
> > > new link_create interface:
> > >
> > > struct {
> > > __aligned_u64 addrs;
> > > __u32 cnt;
> > > __u64 bpf_cookie;
> >
> > I'm afraid bpf_cookie has to be different for each addr, otherwise
> > it's severely limiting. So it would be an array of cookies alongside
> > an array of addresses
>
> ok
>
> >
> > > } kprobe;
> > >
> > > Plus new flag BPF_F_KPROBE_RETURN for link_create.flags to
> > > create return probe.
> > >
> > > Signed-off-by: Jiri Olsa <[email protected]>
> > > ---
> > > include/linux/bpf_types.h | 1 +
> > > include/uapi/linux/bpf.h | 12 +++
> > > kernel/bpf/syscall.c | 191 ++++++++++++++++++++++++++++++++-
> > > tools/include/uapi/linux/bpf.h | 12 +++
> > > 4 files changed, 211 insertions(+), 5 deletions(-)
> > >
> >
> > [...]
> >
> > > @@ -1111,6 +1113,11 @@ enum bpf_link_type {
> > > */
> > > #define BPF_F_SLEEPABLE (1U << 4)
> > >
> > > +/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
> > > + * attach type.
> > > + */
> > > +#define BPF_F_KPROBE_RETURN (1U << 0)
> > > +
> >
> > we have plenty of flexibility to have per-link type fields, so why not
> > add `bool is_retprobe` next to addrs and cnt?
>
> well I thought if I do that, people would suggest to use the empty
> flags field instead ;-)
>
> we can move it there as you suggest, but I wonder it's good idea to
> use bool in uapi headers, because the bool size definition is vague
Good point. No 'bool' please.
grep bool include/uapi/linux/
Only gives openvswitch.h and it's guarded by ifdef KERNEL
So not a single uapi header has bool in it.
On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <[email protected]> wrote:
>
> That seems to bind your mind. The program type is just a programing
> 'model' of the bpf. You can choose the best implementation to provide
> equal functionality. 'kprobe' in bpf is just a name that you call some
> instrumentations which can probe kernel code.
No. We're not going to call it "fprobe" or any other name.
From bpf user's pov it's going to be "multi attach kprobe",
because this is how everyone got to know kprobes.
The 99% usage is at the beginning of the funcs.
When users say "kprobe" they don't care how kernel attaches it.
The func entry limitation for "multi attach kprobe" is a no-brainer.
And we need both "multi attach kprobe" and "multi attach kretprobe"
at the same time. It's no go to implement one first and the other
some time later.
On Thu, Jan 6, 2022 at 8:32 AM Alexei Starovoitov
<[email protected]> wrote:
>
> On Thu, Jan 6, 2022 at 12:41 AM Jiri Olsa <[email protected]> wrote:
> >
> > On Wed, Jan 05, 2022 at 08:30:56PM -0800, Andrii Nakryiko wrote:
> > > On Tue, Jan 4, 2022 at 12:10 AM Jiri Olsa <[email protected]> wrote:
> > > >
> > > > Adding new link type BPF_LINK_TYPE_KPROBE to attach kprobes
> > > > directly through register_kprobe/kretprobe API.
> > > >
> > > > Adding new attach type BPF_TRACE_RAW_KPROBE that enables
> > > > such link for kprobe program.
> > > >
> > > > The new link allows to create multiple kprobes link by using
> > > > new link_create interface:
> > > >
> > > > struct {
> > > > __aligned_u64 addrs;
> > > > __u32 cnt;
> > > > __u64 bpf_cookie;
> > >
> > > I'm afraid bpf_cookie has to be different for each addr, otherwise
> > > it's severely limiting. So it would be an array of cookies alongside
> > > an array of addresses
> >
> > ok
> >
> > >
> > > > } kprobe;
> > > >
> > > > Plus new flag BPF_F_KPROBE_RETURN for link_create.flags to
> > > > create return probe.
> > > >
> > > > Signed-off-by: Jiri Olsa <[email protected]>
> > > > ---
> > > > include/linux/bpf_types.h | 1 +
> > > > include/uapi/linux/bpf.h | 12 +++
> > > > kernel/bpf/syscall.c | 191 ++++++++++++++++++++++++++++++++-
> > > > tools/include/uapi/linux/bpf.h | 12 +++
> > > > 4 files changed, 211 insertions(+), 5 deletions(-)
> > > >
> > >
> > > [...]
> > >
> > > > @@ -1111,6 +1113,11 @@ enum bpf_link_type {
> > > > */
> > > > #define BPF_F_SLEEPABLE (1U << 4)
> > > >
> > > > +/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
> > > > + * attach type.
> > > > + */
> > > > +#define BPF_F_KPROBE_RETURN (1U << 0)
> > > > +
> > >
> > > we have plenty of flexibility to have per-link type fields, so why not
> > > add `bool is_retprobe` next to addrs and cnt?
> >
> > well I thought if I do that, people would suggest to use the empty
> > flags field instead ;-)
> >
> > we can move it there as you suggest, but I wonder it's good idea to
> > use bool in uapi headers, because the bool size definition is vague
>
> Good point. No 'bool' please.
> grep bool include/uapi/linux/
> Only gives openvswitch.h and it's guarded by ifdef KERNEL
> So not a single uapi header has bool in it.
Yeah, I don't insist on bool specifically.
But I was trying to avoid link_create.flags becoming like map_flags, where
we have various flags, each taking a separate bit, but most flags
don't apply to most map types. Ideally link_create.flags would have a
few flags that apply to all or most link types (i.e., something
orthogonal to a specific link type), and for this case we could have
kprobe-specific flags (link_create.kprobe.flags) to adjust kprobe link
creation behavior.
But I don't know, maybe I'm overthinking this.
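To make that concrete, something along these lines (just a sketch, not the
actual uapi; the field names are only illustrative):

	struct {
		__aligned_u64	addrs;		/* array of cnt u64 addresses */
		__aligned_u64	bpf_cookies;	/* array of cnt u64 cookies, one per addr */
		__u32		cnt;
		__u32		flags;		/* kprobe-only flags, e.g. a "return probe" bit */
	} kprobe;

That would keep link_create.flags free for flags that make sense for every
link type.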
On Thu, 6 Jan 2022 09:40:17 -0800
Alexei Starovoitov <[email protected]> wrote:
> On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <[email protected]> wrote:
> >
> > That seems to bind your mind. The program type is just a programing
> > 'model' of the bpf. You can choose the best implementation to provide
> > equal functionality. 'kprobe' in bpf is just a name that you call some
> > instrumentations which can probe kernel code.
>
> No. We're not going to call it "fprobe" or any other name.
> From bpf user's pov it's going to be "multi attach kprobe",
> because this is how everyone got to know kprobes.
> The 99% usage is at the beginning of the funcs.
> When users say "kprobe" they don't care how kernel attaches it.
> The func entry limitation for "multi attach kprobe" is a no-brainer.
Agreed. I think I might have misled you. From the bpf user's pov, it will
always be shown as 'multi attached kprobes' (but only for the function entry);
the 'fprobe' is a kernel-internal API name.
> And we need both "multi attach kprobe" and "multi attach kretprobe"
> at the same time. It's no go to implement one first and the other
> some time later.
You can provide the interface to user space, but the kernel implementation
can be optimized step by step. We can start with using real multiple
kretprobes, and then switch to 'fprobe' after integrating the fgraph
callback. :)
Thank you,
--
Masami Hiramatsu <[email protected]>
On Thu, Jan 6, 2022 at 3:52 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Thu, 6 Jan 2022 09:40:17 -0800
> Alexei Starovoitov <[email protected]> wrote:
>
> > On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > That seems to bind your mind. The program type is just a programing
> > > 'model' of the bpf. You can choose the best implementation to provide
> > > equal functionality. 'kprobe' in bpf is just a name that you call some
> > > instrumentations which can probe kernel code.
> >
> > No. We're not going to call it "fprobe" or any other name.
> > From bpf user's pov it's going to be "multi attach kprobe",
> > because this is how everyone got to know kprobes.
> > The 99% usage is at the beginning of the funcs.
> > When users say "kprobe" they don't care how kernel attaches it.
> > The func entry limitation for "multi attach kprobe" is a no-brainer.
>
> Agreed. I think I might mislead you. From the bpf user pov, it always be
> shown as 'multi attached kprobes (but only for the function entry)'
> the 'fprobe' is kernel internal API name.
>
> > And we need both "multi attach kprobe" and "multi attach kretprobe"
> > at the same time. It's no go to implement one first and the other
> > some time later.
>
> You can provide the interface to user space, but the kernel implementation
> is optimized step by step. We can start it with using real multiple
> kretprobes, and then, switch to 'fprobe' after integrating fgraph
> callback. :)
Sounds good to me.
My point was that users often want to say:
"profile speed of all foo* functions".
To perform such a command a tracer would need to
attach kprobes and kretprobes to all such functions.
The speed of attach/detach has to be fast.
Currently tracers artificially limit the regex just because
attach/detach is so slow that the user will likely Ctrl-C
instead of waiting for many seconds.
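Just to illustrate what a tracer has to do before it can even start attaching,
here is a rough user-space sketch (not from any patch here) that collects the
addresses of all text symbols starting with "foo" from /proc/kallsyms; the
result would become the addrs/cnt pair of a single multi-attach link_create
call:

	#include <stdio.h>
	#include <string.h>

	/* Collect up to 'max' addresses of text symbols starting with 'prefix'. */
	static size_t collect_addrs(const char *prefix, unsigned long *addrs, size_t max)
	{
		unsigned long addr;
		char type, sym[256];
		size_t n = 0;
		FILE *f = fopen("/proc/kallsyms", "r");

		if (!f)
			return 0;
		while (n < max &&
		       fscanf(f, "%lx %c %255s%*[^\n]", &addr, &type, sym) == 3) {
			if ((type == 't' || type == 'T') &&
			    !strncmp(sym, prefix, strlen(prefix)))
				addrs[n++] = addr;	/* needs root, so kallsyms shows real addresses */
		}
		fclose(f);
		return n;	/* 'n' becomes cnt, 'addrs' becomes the addrs array */
	}

Gathering the list is cheap; it's the per-address attach that dominates.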
On Thu, 6 Jan 2022 15:57:10 +0100
Jiri Olsa <[email protected]> wrote:
> On Thu, Jan 06, 2022 at 10:59:43PM +0900, Masami Hiramatsu wrote:
>
> SNIP
>
> > > >
> > > > Hmm, I think there may be a time to split the "kprobe as an
> > > > interface for the software breakpoint" and "kprobe as a wrapper
> > > > interface for the callbacks of various instrumentations", like
> > > > 'raw_kprobe'(or kswbp) and 'kprobes'.
> > > > And this may be called as 'fprobe' as ftrace_ops wrapper.
> > > > (But if the bpf is enough flexible, this kind of intermediate layer
> > > > may not be needed, it can use ftrace_ops directly, eventually)
> > > >
> > > > Jiri, have you already considered to use ftrace_ops from the
> > > > bpf directly? Are there any issues?
> > > > (bpf depends on 'kprobe' widely?)
> > >
> > > at the moment there's not ftrace public interface for the return
> > > probe merged in, so to get the kretprobe working I had to use
> > > kprobe interface
> >
> > Yeah, I found that too. We have to ask Steve to salvage it ;)
>
> I got those patches rebased like half a year ago upstream code,
> so should be easy to revive them
Nice! :)
>
> >
> > > but.. there are patches Steven shared some time ago, that do that
> > > and make graph_ops available as kernel interface
> > >
> > > I recall we considered graph_ops interface before as common attach
> > > layer for trampolines, which was bad, but it might actually make
> > > sense for kprobes
> >
> > I started working on making 'fprobe' which will provide multiple
> > function probe with similar interface of kprobes. See attached
> > patch. Then you can use it in bpf, maybe with an union like
> >
> > union {
> > struct kprobe kp; // for function body
> > struct fprobe fp; // for function entry and return
> > };
> >
> > At this moment, fprobe only support entry_handler, but when we
> > re-start the generic graph_ops interface, it is easy to expand
> > to support exit_handler.
> > If this works, I think kretprobe can be phased out, since at that
> > moment, kprobe_event can replace it with the fprobe exit_handler.
> > (This is a benefit of decoupling the instrumentation layer from
> > the event layer. It can choose the best way without changing
> > user interface.)
> >
>
> I can resend out graph_ops patches if you want to base
> it directly on that
Yes, that's very helpful. Now I'm considering using it (or via fprobe)
from kretprobes, like the ftrace-based kprobe.
> > > I'll need to check it in more details but I think both graph_ops and
> > > kprobe do about similar thing wrt hooking return probe, so it should
> > > be comparable.. and they are already doing the same for the entry hook,
> > > because kprobe is mostly using ftrace for that
> > >
> > > we would not need to introduce new program type - kprobe programs
> > > should be able to run from ftrace callbacks just fine
> >
> > That seems to bind your mind. The program type is just a programing
> > 'model' of the bpf. You can choose the best implementation to provide
> > equal functionality. 'kprobe' in bpf is just a name that you call some
> > instrumentations which can probe kernel code.
>
> I don't want to introduce new type, there's some dependencies
> in bpf verifier and helpers code we'd need to handle for that
>
> I'm looking for solution for current kprobe bpf program type
> to be registered for multiple addresses quickly
Yes, as I replied to Alex, the bpf program type itself keeps 'kprobe'.
For example, you've introduced bpf_kprobe_link at [8/13],
struct bpf_kprobe_link {
struct bpf_link link;
union {
struct kretprobe rp;
struct fprobe fp;
};
bool is_return;
bool is_fentry;
kprobe_opcode_t **addrs;
u32 cnt;
u64 bpf_cookie;
};
If all "addrs" are function entry, ::fp will be used.
If cnt == 1 then use ::rp.
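Just to illustrate the selection (an untested sketch, not part of the attached
patch; it assumes the struct above plus the fprobe/kprobes headers), the attach
path could pick the union member like this:

	static int bpf_kprobe_link_attach(struct bpf_kprobe_link *link)
	{
		bool all_entry = true;
		u32 i;

		for (i = 0; i < link->cnt; i++) {
			unsigned long addr = (unsigned long)link->addrs[i];

			/* treat an address as a function entry only if it is exactly an ftrace location */
			if (ftrace_location(addr) != addr)
				all_entry = false;
		}

		if (all_entry)		/* every address is a function entry -> fprobe */
			return register_fprobe(&link->fp);
		if (link->cnt == 1)	/* single arbitrary address -> normal k(ret)probe */
			return link->is_return ? register_kretprobe(&link->rp) :
						 register_kprobe(&link->rp.kp);
		return -EINVAL;		/* multiple non-entry addresses are not supported */
	}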
> > > so we would have:
> > > - kprobe type programs attaching to:
> > > - new BPF_LINK_TYPE_FPROBE link using the graph_ops as attachment layer
> > >
> > > jirka
> > >
> >
> >
> > --
> > Masami Hiramatsu <[email protected]>
>
> > From 269b86597c166d6d4c5dd564168237603533165a Mon Sep 17 00:00:00 2001
> > From: Masami Hiramatsu <[email protected]>
> > Date: Thu, 6 Jan 2022 15:40:36 +0900
> > Subject: [PATCH] fprobe: Add ftrace based probe APIs
> >
> > The fprobe is a wrapper API for ftrace function tracer.
> > Unlike kprobes, this probes only supports the function entry, but
> > it can probe multiple functions by one fprobe. The usage is almost
> > same as the kprobe, user will specify the function names by
> > fprobe::syms, the number of syms by fprobe::nsyms, and the user
> > handler by fprobe::handler.
> >
> > struct fprobe = { 0 };
> > const char *targets[] = {"func1", "func2", "func3"};
> >
> > fprobe.handler = user_handler;
> > fprobe.nsyms = ARRAY_SIZE(targets);
> > fprobe.syms = targets;
> >
> > ret = register_fprobe(&fprobe);
> > ...
> >
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > ---
> > include/linux/fprobes.h | 52 ++++++++++++++++
> > kernel/trace/Kconfig | 10 ++++
> > kernel/trace/Makefile | 1 +
> > kernel/trace/fprobes.c | 128 ++++++++++++++++++++++++++++++++++++++++
> > 4 files changed, 191 insertions(+)
> > create mode 100644 include/linux/fprobes.h
> > create mode 100644 kernel/trace/fprobes.c
> >
> > diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h
> > new file mode 100644
> > index 000000000000..22db748bf491
> > --- /dev/null
> > +++ b/include/linux/fprobes.h
> > @@ -0,0 +1,52 @@
> > +#ifndef _LINUX_FPROBES_H
> > +#define _LINUX_FPROBES_H
> > +/* Simple ftrace probe wrapper */
> > +
> > +#include <linux/compiler.h>
> > +#include <linux/ftrace.h>
> > +
> > +struct fprobe {
> > + const char **syms;
> > + unsigned long *addrs;
>
> could you add array of user data for each addr/sym?
OK, something like this?
void **user_data;
But note that it takes O(N) to search for the entry corresponding to
a specific address. To reduce the overhead, we may need to sort
the array in advance (e.g. when registering it).
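Roughly like this, using the generic lib helpers (just a sketch of the idea,
not the code I would post):

	#include <linux/bsearch.h>
	#include <linux/sort.h>

	struct probe_entry {
		unsigned long addr;
		void *data;		/* per-probe-point user data */
	};

	static int probe_entry_cmp(const void *a, const void *b)
	{
		const struct probe_entry *ea = a, *eb = b;

		if (ea->addr == eb->addr)
			return 0;
		return ea->addr < eb->addr ? -1 : 1;
	}

	/* at registration time: sort(ents, nentry, sizeof(*ents), probe_entry_cmp, NULL); */

	/* in the handler: O(log N) lookup of the user data for the probed 'ip' */
	static void *probe_entry_find(struct probe_entry *ents, size_t nentry, unsigned long ip)
	{
		struct probe_entry key = { .addr = ip };
		struct probe_entry *found;

		found = bsearch(&key, ents, nentry, sizeof(*ents), probe_entry_cmp);
		return found ? found->data : NULL;
	}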
>
> SNIP
>
> > +static int populate_func_addresses(struct fprobe *fp)
> > +{
> > + unsigned int i;
> > +
> > + fp->addrs = kmalloc(sizeof(void *) * fp->nsyms, GFP_KERNEL);
> > + if (!fp->addrs)
> > + return -ENOMEM;
> > +
> > + for (i = 0; i < fp->nsyms; i++) {
> > + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> > + if (!fp->addrs[i]) {
> > + kfree(fp->addrs);
> > + fp->addrs = NULL;
> > + return -ENOENT;
> > + }
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * register_fprobe - Register fprobe to ftrace
> > + * @fp: A fprobe data structure to be registered.
> > + *
> > + * This expects the user set @fp::syms or @fp::addrs (not both),
> > + * @fp::nsyms (number of entries of @fp::syms or @fp::addrs) and
> > + * @fp::handler. Other fields are initialized by this function.
> > + */
> > +int register_fprobe(struct fprobe *fp)
> > +{
> > + unsigned int i;
> > + int ret;
> > +
> > + if (!fp)
> > + return -EINVAL;
> > +
> > + if (!fp->nsyms || (!fp->syms && !fp->addrs) || (fp->syms && fp->addrs))
> > + return -EINVAL;
> > +
> > + if (fp->syms) {
> > + ret = populate_func_addresses(fp);
> > + if (ret < 0)
> > + return ret;
> > + }
> > +
> > + fp->ftrace.func = fprobe_handler;
> > + fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;
> > +
> > + for (i = 0; i < fp->nsyms; i++) {
> > + ret = ftrace_set_filter_ip(&fp->ftrace, fp->addrs[i], 0, 0);
> > + if (ret < 0)
> > + goto error;
> > + }
>
> I introduced ftrace_set_filter_ips, because loop like above was slow:
> https://lore.kernel.org/bpf/[email protected]/
Ah, thanks for noticing!
Thank you,
>
> thanks,
> jirka
>
> > +
> > + fp->nmissed = 0;
> > + ret = register_ftrace_function(&fp->ftrace);
> > + if (!ret)
> > + return ret;
> > +
> > +error:
> > + if (fp->syms) {
> > + kfree(fp->addrs);
> > + fp->addrs = NULL;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +/**
> > + * unregister_fprobe - Unregister fprobe from ftrace
> > + * @fp: A fprobe data structure to be unregistered.
> > + */
> > +int unregister_fprobe(struct fprobe *fp)
> > +{
> > + int ret;
> > +
> > + if (!fp)
> > + return -EINVAL;
> > +
> > + if (!fp->nsyms || !fp->addrs)
> > + return -EINVAL;
> > +
> > + ret = unregister_ftrace_function(&fp->ftrace);
> > +
> > + if (fp->syms) {
> > + /* fp->addrs is allocated by register_fprobe() */
> > + kfree(fp->addrs);
> > + fp->addrs = NULL;
> > + }
> > +
> > + return ret;
> > +}
> > --
> > 2.25.1
> >
>
--
Masami Hiramatsu <[email protected]>
On Thu, 6 Jan 2022 16:20:05 -0800
Alexei Starovoitov <[email protected]> wrote:
> On Thu, Jan 6, 2022 at 3:52 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Thu, 6 Jan 2022 09:40:17 -0800
> > Alexei Starovoitov <[email protected]> wrote:
> >
> > > On Thu, Jan 6, 2022 at 5:59 AM Masami Hiramatsu <[email protected]> wrote:
> > > >
> > > > That seems to bind your mind. The program type is just a programing
> > > > 'model' of the bpf. You can choose the best implementation to provide
> > > > equal functionality. 'kprobe' in bpf is just a name that you call some
> > > > instrumentations which can probe kernel code.
> > >
> > > No. We're not going to call it "fprobe" or any other name.
> > > From bpf user's pov it's going to be "multi attach kprobe",
> > > because this is how everyone got to know kprobes.
> > > The 99% usage is at the beginning of the funcs.
> > > When users say "kprobe" they don't care how kernel attaches it.
> > > The func entry limitation for "multi attach kprobe" is a no-brainer.
> >
> > Agreed. I think I might mislead you. From the bpf user pov, it always be
> > shown as 'multi attached kprobes (but only for the function entry)'
> > the 'fprobe' is kernel internal API name.
> >
> > > And we need both "multi attach kprobe" and "multi attach kretprobe"
> > > at the same time. It's no go to implement one first and the other
> > > some time later.
> >
> > You can provide the interface to user space, but the kernel implementation
> > is optimized step by step. We can start it with using real multiple
> > kretprobes, and then, switch to 'fprobe' after integrating fgraph
> > callback. :)
>
> Sounds good to me.
> My point was that users often want to say:
> "profile speed of all foo* functions".
> To perform such a command a tracer would need to
> attach kprobes and kretprobes to all such functions.
Yeah, I know. That has been an issue for more than 10 years, since
systemtap. :)
> The speed of attach/detach has to be fast.
Yes, that's why I provided register/unregister_kprobes(),
but it sounds like that is not enough (and maybe not optimized enough,
because all the handlers are the same)
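For reference, from the caller side the existing batch API is used roughly
like this (minimal sketch, module boilerplate omitted); every kprobe is still
set up and armed individually underneath, which is where the time goes with
thousands of symbols:

	#include <linux/init.h>
	#include <linux/kernel.h>
	#include <linux/kprobes.h>

	static int shared_pre_handler(struct kprobe *p, struct pt_regs *regs)
	{
		pr_info("hit %ps\n", p->addr);
		return 0;
	}

	static struct kprobe probes[] = {
		{ .symbol_name = "kernel_clone", .pre_handler = shared_pre_handler },
		{ .symbol_name = "do_exit",      .pre_handler = shared_pre_handler },
	};

	static struct kprobe *kps[] = { &probes[0], &probes[1] };

	static int __init batch_probes_init(void)
	{
		/* one call, but each kprobe is still registered one by one */
		return register_kprobes(kps, ARRAY_SIZE(kps));
	}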
> Currently tracers artificially limit the regex just because
> attach/detach is so slow that the user will likely Ctrl-C
> instead of waiting for many seconds.
Ah, OK.
Anyway, I also would like to fix that issue. If the user wants
only function entry/exit, it should be done by ftrace. But
given the syntax (and the user's mental model) it should be done via
'kprobe', so transparently converting such requests to ftrace
is needed.
Thank you,
--
Masami Hiramatsu <[email protected]>
Hi Jiri,
Here is a short series of patches, which shows what I described
in my reply to your series.
This introduces the fprobe, the function entry/exit probe with
multiple probe point support. This also introduces the rethook
for hooking function return, which I cloned from kretprobe.
I also rewrote your [08/13] bpf patch to use this fprobe instead
of kprobes. I didn't test that one, but the sample module seems
to work. Please test the bpf part with your libbpf updates.
BTW, while implementing the fprobe, I introduced the per-probe
point private data, but I'm not sure why you need it. It seems
that data is not used from bpf...
If this is good for you, I would like to proceed with this and
the rethook, and rewrite the kretprobe to use the rethook to
hook the function returns. That should be much cleaner (and easier to
prepare for the fgraph tracer integration)
Thank you,
---
Jiri Olsa (1):
bpf: Add kprobe link for attaching raw kprobes
Masami Hiramatsu (5):
fprobe: Add ftrace based probe APIs
rethook: Add a generic return hook
rethook: x86: Add rethook x86 implementation
fprobe: Add exit_handler support
fprobe: Add sample program for fprobe
arch/x86/Kconfig | 1
arch/x86/kernel/Makefile | 1
arch/x86/kernel/rethook.c | 115 ++++++++++++++++++++
include/linux/bpf_types.h | 1
include/linux/fprobes.h | 75 +++++++++++++
include/linux/rethook.h | 74 +++++++++++++
include/linux/sched.h | 3 +
include/uapi/linux/bpf.h | 12 ++
kernel/bpf/syscall.c | 199 +++++++++++++++++++++++++++++++++-
kernel/exit.c | 2
kernel/fork.c | 3 +
kernel/trace/Kconfig | 22 ++++
kernel/trace/Makefile | 2
kernel/trace/fprobes.c | 187 ++++++++++++++++++++++++++++++++
kernel/trace/rethook.c | 226 +++++++++++++++++++++++++++++++++++++++
samples/Kconfig | 6 +
samples/Makefile | 1
samples/fprobe/Makefile | 3 +
samples/fprobe/fprobe_example.c | 103 ++++++++++++++++++
tools/include/uapi/linux/bpf.h | 12 ++
20 files changed, 1043 insertions(+), 5 deletions(-)
create mode 100644 arch/x86/kernel/rethook.c
create mode 100644 include/linux/fprobes.h
create mode 100644 include/linux/rethook.h
create mode 100644 kernel/trace/fprobes.c
create mode 100644 kernel/trace/rethook.c
create mode 100644 samples/fprobe/Makefile
create mode 100644 samples/fprobe/fprobe_example.c
--
Masami Hiramatsu (Linaro) <[email protected]>
The fprobe is a wrapper API for the ftrace function tracer.
Unlike kprobes, this probe only supports the function entry, but
it can probe multiple functions with one fprobe. The usage is almost
the same as kprobe: the user specifies the function names via
fprobe::entries[].sym, the number of entries via fprobe::nentry,
and the user handler via fprobe::entry_handler.
struct fprobe fp = { 0 };
struct fprobe_entry targets[] =
{{.sym = "func1"}, {.sym = "func2"}, {.sym = "func3"}};
fp.entry_handler = user_handler;
fp.nentry = ARRAY_SIZE(targets);
fp.entries = targets;
ret = register_fprobe(&fp);
Note that the fp::entries will be sorted by the converted
function address.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
include/linux/fprobes.h | 71 +++++++++++++++++++++++++
kernel/trace/Kconfig | 10 ++++
kernel/trace/Makefile | 1
kernel/trace/fprobes.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 214 insertions(+)
create mode 100644 include/linux/fprobes.h
create mode 100644 kernel/trace/fprobes.c
diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h
new file mode 100644
index 000000000000..fa85a2fc3ad1
--- /dev/null
+++ b/include/linux/fprobes.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Simple ftrace probe wrapper */
+#ifndef _LINUX_FPROBES_H
+#define _LINUX_FPROBES_H
+
+#include <linux/compiler.h>
+#include <linux/ftrace.h>
+
+/*
+ * fprobe_entry - function entry for fprobe
+ * @sym: The symbol name of the function.
+ * @addr: The address of @sym.
+ * @data: per-entry data
+ *
+ * User must specify either @sym or @addr (not both). @data is optional.
+ */
+struct fprobe_entry {
+ const char *sym;
+ unsigned long addr;
+ void *data;
+};
+
+struct fprobe {
+ struct fprobe_entry *entries;
+ unsigned int nentry;
+
+ struct ftrace_ops ftrace;
+ unsigned long nmissed;
+ unsigned int flags;
+ void (*entry_handler) (struct fprobe *, unsigned long, struct pt_regs *);
+};
+
+#define FPROBE_FL_DISABLED 1
+
+static inline bool fprobe_disabled(struct fprobe *fp)
+{
+ return (fp) ? fp->flags & FPROBE_FL_DISABLED : false;
+}
+
+#ifdef CONFIG_FPROBES
+int register_fprobe(struct fprobe *fp);
+int unregister_fprobe(struct fprobe *fp);
+struct fprobe_entry *fprobe_find_entry(struct fprobe *fp, unsigned long addr);
+#else
+static inline int register_fprobe(struct fprobe *fp)
+{
+ return -ENOTSUPP;
+}
+static inline int unregister_fprobe(struct fprobe *fp)
+{
+ return -ENOTSUPP;
+}
+static inline struct fprobe_entry *fprobe_find_entry(struct fprobe *fp, unsigned long addr)
+{
+ return NULL;
+}
+#endif
+
+static inline void disable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags |= FPROBE_FL_DISABLED;
+}
+
+static inline void enable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags &= ~FPROBE_FL_DISABLED;
+}
+
+#endif
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 420ff4bc67fd..45a3618a20a7 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -223,6 +223,16 @@ config DYNAMIC_FTRACE_WITH_ARGS
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS
+config FPROBES
+ bool "Kernel Function Probe (fprobe)"
+ depends on FUNCTION_TRACER
+ depends on DYNAMIC_FTRACE_WITH_REGS
+ default n
+ help
+ This option enables kernel function probe feature, which is
+ similar to kprobes, but probes only for kernel function entries
+ and it can probe multiple functions by one fprobe.
+
config FUNCTION_PROFILER
bool "Kernel function profiler"
depends on FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index bedc5caceec7..47a37a3bb974 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
+obj-$(CONFIG_FPROBES) += fprobes.o
obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
diff --git a/kernel/trace/fprobes.c b/kernel/trace/fprobes.c
new file mode 100644
index 000000000000..0a609093d48c
--- /dev/null
+++ b/kernel/trace/fprobes.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "fprobes: " fmt
+
+#include <linux/fprobes.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+ struct fprobe *fp;
+ int bit;
+
+ fp = container_of(ops, struct fprobe, ftrace);
+ if (fprobe_disabled(fp))
+ return;
+
+ bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ if (bit < 0) {
+ fp->nmissed++;
+ return;
+ }
+
+ if (fp->entry_handler)
+ fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
+
+ ftrace_test_recursion_unlock(bit);
+}
+NOKPROBE_SYMBOL(fprobe_handler);
+
+static int convert_func_addresses(struct fprobe *fp)
+{
+ unsigned int i;
+ struct fprobe_entry *ent = fp->entries;
+
+ for (i = 0; i < fp->nentry; i++) {
+ if ((ent[i].sym && ent[i].addr) ||
+ (!ent[i].sym && !ent[i].addr))
+ return -EINVAL;
+
+ if (ent[i].addr)
+ continue;
+
+ ent[i].addr = kallsyms_lookup_name(ent[i].sym);
+ if (!ent[i].addr)
+ return -ENOENT;
+ }
+
+ return 0;
+}
+
+/* Since the entry list is sorted, we can search it by bisect */
+struct fprobe_entry *fprobe_find_entry(struct fprobe *fp, unsigned long addr)
+{
+	int lo = 0, hi = fp->nentry - 1, mid;
+
+	while (lo <= hi) {
+		mid = lo + (hi - lo) / 2;
+		if (fp->entries[mid].addr == addr)
+			return fp->entries + mid;
+		if (fp->entries[mid].addr < addr)
+			lo = mid + 1;
+		else
+			hi = mid - 1;
+	}
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(fprobe_find_entry);
+
+static int fprobe_comp_func(const void *a, const void *b)
+{
+ return ((struct fprobe_entry *)a)->addr - ((struct fprobe_entry *)b)->addr;
+}
+
+/**
+ * register_fprobe - Register fprobe to ftrace
+ * @fp: A fprobe data structure to be registered.
+ *
+ * This expects the user set @fp::entry_handler, @fp::entries and @fp::nentry.
+ * For each entry of @fp::entries[], user must set 'addr' or 'sym'.
+ * Note that you do not set both of 'addr' and 'sym' of the entry.
+ */
+int register_fprobe(struct fprobe *fp)
+{
+ unsigned int i;
+ int ret;
+
+ if (!fp || !fp->nentry || !fp->entries)
+ return -EINVAL;
+
+ ret = convert_func_addresses(fp);
+ if (ret < 0)
+ return ret;
+ /*
+ * Sort the addresses so that the handler can find corresponding user data
+ * immediately.
+ */
+ sort(fp->entries, fp->nentry, sizeof(*fp->entries),
+ fprobe_comp_func, NULL);
+
+ fp->nmissed = 0;
+ fp->ftrace.func = fprobe_handler;
+ fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;
+
+ for (i = 0; i < fp->nentry; i++) {
+ ret = ftrace_set_filter_ip(&fp->ftrace, fp->entries[i].addr, 0, 0);
+ if (ret < 0)
+ return ret;
+ }
+
+ return register_ftrace_function(&fp->ftrace);
+}
+EXPORT_SYMBOL_GPL(register_fprobe);
+
+/**
+ * unregister_fprobe - Unregister fprobe from ftrace
+ * @fp: A fprobe data structure to be unregistered.
+ */
+int unregister_fprobe(struct fprobe *fp)
+{
+ if (!fp || !fp->nentry || !fp->entries)
+ return -EINVAL;
+
+ return unregister_ftrace_function(&fp->ftrace);
+}
+EXPORT_SYMBOL_GPL(unregister_fprobe);
Add a return hook framework which hooks the function
return. Most of the idea came from the kretprobe, but
this is independent of kretprobe.
Note that this is expected to be used with other
function entry hooking features, like ftrace, fprobe,
and kprobes. Eventually this will replace the
kretprobe (e.g. kprobe + rethook = kretprobe), but
at this moment, this is just an additional hook.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
include/linux/rethook.h | 74 +++++++++++++++
include/linux/sched.h | 3 +
kernel/exit.c | 2
kernel/fork.c | 3 +
kernel/trace/Kconfig | 11 ++
kernel/trace/Makefile | 1
kernel/trace/rethook.c | 226 +++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 320 insertions(+)
create mode 100644 include/linux/rethook.h
create mode 100644 kernel/trace/rethook.c
diff --git a/include/linux/rethook.h b/include/linux/rethook.h
new file mode 100644
index 000000000000..2622bcd5213a
--- /dev/null
+++ b/include/linux/rethook.h
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Return hooking with list-based shadow stack.
+ */
+#ifndef _LINUX_RETHOOK_H
+#define _LINUX_RETHOOK_H
+
+#include <linux/compiler.h>
+#include <linux/freelist.h>
+#include <linux/llist.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+
+struct rethook_node;
+
+typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
+
+struct rethook {
+ void *data;
+ rethook_handler_t handler;
+ struct freelist_head pool;
+ refcount_t ref;
+ struct rcu_head rcu;
+};
+
+struct rethook_node {
+ union {
+ struct freelist_node freelist;
+ struct rcu_head rcu;
+ };
+ struct llist_node llist;
+ struct rethook *rethook;
+ unsigned long ret_addr;
+ unsigned long frame;
+};
+
+int rethook_node_init(struct rethook_node *node);
+
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
+void rethook_free(struct rethook *rh);
+void rethook_add_node(struct rethook *rh, struct rethook_node *node);
+
+struct rethook_node *rethook_try_get(struct rethook *rh);
+void rethook_node_recycle(struct rethook_node *node);
+void rethook_hook_current(struct rethook_node *node, struct pt_regs *regs);
+
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+ struct llist_node **cur);
+
+/* Arch dependent code must implement this and trampoline code */
+void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs);
+void arch_rethook_trampoline(void);
+
+static inline bool is_rethook_trampoline(unsigned long addr)
+{
+ return addr == (unsigned long)arch_rethook_trampoline;
+}
+
+/* If the architecture needs a fixup the return address, implement it. */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr);
+
+/* Generic trampoline handler, arch code must prepare asm stub */
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+ unsigned long frame);
+
+#ifdef CONFIG_RETHOOK
+void rethook_flush_task(struct task_struct *tk);
+#else
+#define rethook_flush_task(tsk) do { } while (0)
+#endif
+
+#endif
+
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..2bfabf5355b7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1473,6 +1473,9 @@ struct task_struct {
#ifdef CONFIG_KRETPROBES
struct llist_head kretprobe_instances;
#endif
+#ifdef CONFIG_RETHOOK
+ struct llist_head rethooks;
+#endif
#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
/*
diff --git a/kernel/exit.c b/kernel/exit.c
index f702a6a63686..a39a321c1f37 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -64,6 +64,7 @@
#include <linux/compat.h>
#include <linux/io_uring.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
kprobe_flush_task(tsk);
+ rethook_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
put_task_struct(tsk);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697..ffae38be64c4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_KRETPROBES
p->kretprobe_instances.first = NULL;
#endif
+#ifdef CONFIG_RETHOOK
+ p->rethooks.first = NULL;
+#endif
/*
* Ensure that the cgroup subsystem policies allow the new process to be
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 45a3618a20a7..9328724258dc 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
config NOP_TRACER
bool
+config HAVE_RETHOOK
+ bool
+
+config RETHOOK
+ bool
+ depends on HAVE_RETHOOK
+ help
+ Enable generic return hooking feature. This is an internal
+ API, which will be used by other function-entry hooking
+ feature like fprobe and kprobes.
+
config HAVE_FUNCTION_TRACER
bool
help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 47a37a3bb974..c68fdacbf9ef 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBES) += fprobes.o
+obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
new file mode 100644
index 000000000000..80c0584e8497
--- /dev/null
+++ b/kernel/trace/rethook.c
@@ -0,0 +1,226 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "rethook: " fmt
+
+#include <linux/bug.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/preempt.h>
+#include <linux/rethook.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+/* Return hook list (shadow stack by list) */
+
+void rethook_flush_task(struct task_struct *tk)
+{
+ struct rethook_node *rhn;
+ struct llist_node *node;
+
+ preempt_disable();
+
+ node = __llist_del_all(&tk->rethooks);
+ while (node) {
+ rhn = container_of(node, struct rethook_node, llist);
+ node = node->next;
+ rethook_node_recycle(rhn);
+ }
+
+ preempt_enable();
+}
+
+static void rethook_free_rcu(struct rcu_head *head)
+{
+ struct rethook *rh = container_of(head, struct rethook, rcu);
+ struct rethook_node *rhn;
+ struct freelist_node *node;
+ int count = 1;
+
+ node = rh->pool.head;
+ while (node) {
+ rhn = container_of(node, struct rethook_node, freelist);
+ node = node->next;
+ kfree(rhn);
+ count++;
+ }
+
+ /* The rh->ref is the number of pooled node + 1 */
+ if (refcount_sub_and_test(count, &rh->ref))
+ kfree(rh);
+}
+
+void rethook_free(struct rethook *rh)
+{
+ rh->handler = NULL;
+ rh->data = NULL;
+
+ call_rcu(&rh->rcu, rethook_free_rcu);
+}
+
+/*
+ * @handler must not be NULL. @handler == NULL means this rethook is
+ * going to be freed.
+ */
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler)
+{
+ struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
+
+ if (!rh || !handler)
+ return NULL;
+
+ rh->data = data;
+ rh->handler = handler;
+ rh->pool.head = NULL;
+ refcount_set(&rh->ref, 1);
+
+ return rh;
+}
+
+void rethook_add_node(struct rethook *rh, struct rethook_node *node)
+{
+ node->rethook = rh;
+ freelist_add(&node->freelist, &rh->pool);
+ refcount_inc(&rh->ref);
+}
+
+static void free_rethook_node_rcu(struct rcu_head *head)
+{
+ struct rethook_node *node = container_of(head, struct rethook_node, rcu);
+
+ if (refcount_dec_and_test(&node->rethook->ref))
+ kfree(node->rethook);
+ kfree(node);
+}
+
+void rethook_node_recycle(struct rethook_node *node)
+{
+ if (likely(READ_ONCE(node->rethook->handler)))
+ freelist_add(&node->freelist, &node->rethook->pool);
+ else
+ call_rcu(&node->rcu, free_rethook_node_rcu);
+}
+
+struct rethook_node *rethook_try_get(struct rethook *rh)
+{
+ struct freelist_node *fn;
+
+ /* Check whether @rh is going to be freed. */
+ if (unlikely(!READ_ONCE(rh->handler)))
+ return NULL;
+
+ fn = freelist_try_get(&rh->pool);
+ if (!fn)
+ return NULL;
+
+ return container_of(fn, struct rethook_node, freelist);
+}
+
+void rethook_hook_current(struct rethook_node *node, struct pt_regs *regs)
+{
+ arch_rethook_prepare(node, regs);
+	__llist_add(&node->llist, &current->rethooks);
+}
+
+/* This assumes the 'tsk' is the current task or is not running. */
+static unsigned long __rethook_find_ret_addr(struct task_struct *tsk,
+ struct llist_node **cur)
+{
+ struct rethook_node *rh = NULL;
+ struct llist_node *node = *cur;
+
+ if (!node)
+ node = tsk->rethooks.first;
+ else
+ node = node->next;
+
+ while (node) {
+ rh = container_of(node, struct rethook_node, llist);
+ if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) {
+ *cur = node;
+ return rh->ret_addr;
+ }
+ node = node->next;
+ }
+ return 0;
+}
+NOKPROBE_SYMBOL(__rethook_find_ret_addr);
+
+/**
+ * rethook_find_ret_addr -- Find correct return address modified by rethook
+ * @tsk: Target task
+ * @frame: A frame pointer
+ * @cur: a storage of the loop cursor llist_node pointer for next call
+ *
+ * Find the correct return address modified by a rethook on @tsk in unsigned
+ * long type. If it finds the return address, this returns that address value,
+ * or this returns 0.
+ * The @tsk must be 'current' or a task which is not running. @frame is a hint
+ * to get the correct return address - which is compared with the
+ * rethook::frame field. The @cur is a loop cursor for searching the
+ * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the
+ * first call, but '@cur' itself must not be NULL.
+ */
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+ struct llist_node **cur)
+{
+ struct rethook_node *rhn = NULL;
+ unsigned long ret;
+
+ if (WARN_ON_ONCE(!cur))
+ return 0;
+
+ do {
+ ret = __rethook_find_ret_addr(tsk, cur);
+ if (!ret)
+ break;
+ rhn = container_of(*cur, struct rethook_node, llist);
+ } while (rhn->frame != frame);
+
+ return ret;
+}
+NOKPROBE_SYMBOL(rethook_find_ret_addr);
+
+void __weak arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ /*
+ * Do nothing by default. If the architecture which uses a
+ * frame pointer to record real return address on the stack,
+ * it should fill this function to fixup the return address
+ * so that stacktrace works from the rethook handler.
+ */
+}
+
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+ unsigned long frame)
+{
+ struct rethook_node *rhn;
+ struct llist_node *first, *node = NULL;
+ unsigned long correct_ret_addr = __rethook_find_ret_addr(current, &node);
+
+ if (!correct_ret_addr) {
+ pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n");
+ BUG_ON(1);
+ }
+
+ instruction_pointer_set(regs, correct_ret_addr);
+ arch_rethook_fixup_return(regs, correct_ret_addr);
+
+ first = current->rethooks.first;
+ current->rethooks.first = node->next;
+ node->next = NULL;
+
+ while (first) {
+ rhn = container_of(first, struct rethook_node, llist);
+ if (WARN_ON_ONCE(rhn->frame != frame))
+ break;
+ if (rhn->rethook->handler)
+ rhn->rethook->handler(rhn, rhn->rethook->data, regs);
+
+ first = first->next;
+ rethook_node_recycle(rhn);
+ }
+
+ return correct_ret_addr;
+}
+
Add rethook for x86 implementation. Most of the code
has been copied from kretprobes on x86.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/Kconfig | 1
arch/x86/kernel/Makefile | 1
arch/x86/kernel/rethook.c | 115 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 117 insertions(+)
create mode 100644 arch/x86/kernel/rethook.c
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 7399327d1eff..939c4c897e63 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -219,6 +219,7 @@ config X86
select HAVE_KPROBES_ON_FTRACE
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_KRETPROBES
+ select HAVE_RETHOOK
select HAVE_KVM
select HAVE_LIVEPATCH if X86_64
select HAVE_MIXED_BREAKPOINTS_REGS
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..66593d8c4d74 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o
obj-$(CONFIG_X86_TSC) += trace_clock.o
obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
new file mode 100644
index 000000000000..f2f3b9526e43
--- /dev/null
+++ b/arch/x86/kernel/rethook.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * x86 implementation of rethook. Mostly copied from arch/x86/kernel/kprobes/core.c.
+ */
+#include <linux/bug.h>
+#include <linux/rethook.h>
+#include <linux/kprobes.h>
+
+#include "kprobes/common.h"
+
+/*
+ * Called from arch_rethook_trampoline
+ */
+__used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
+{
+ unsigned long *frame_pointer;
+
+ /* fixup registers */
+ regs->cs = __KERNEL_CS;
+#ifdef CONFIG_X86_32
+ regs->gs = 0;
+#endif
+ regs->ip = (unsigned long)&arch_rethook_trampoline;
+ regs->orig_ax = ~0UL;
+ regs->sp += sizeof(long);
+	frame_pointer = &regs->sp + 1;
+
+ /*
+ * The return address at 'frame_pointer' is recovered by the
+ * arch_rethook_fixup_return() which called from this
+ * rethook_trampoline_handler().
+ */
+ rethook_trampoline_handler(regs, (unsigned long)frame_pointer);
+
+ /*
+	 * Copy FLAGS to 'pt_regs::sp' so that arch_rethook_trampoline()
+ * can do RET right after POPF.
+ */
+ regs->sp = regs->flags;
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+
+/*
+ * When a target function returns, this code saves registers and calls
+ * arch_rethook_trampoline_callback(), which calls the rethook handler.
+ */
+asm(
+ ".text\n"
+ ".global arch_rethook_trampoline\n"
+ ".type arch_rethook_trampoline, @function\n"
+ "arch_rethook_trampoline:\n"
+#ifdef CONFIG_X86_64
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushq $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 8', this will be fixed later. */
+ " pushq %rsp\n"
+ " pushfq\n"
+ SAVE_REGS_STRING
+ " movq %rsp, %rdi\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addq $8, %rsp\n"
+ " popfq\n"
+#else
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushl $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 4', this will be fixed later. */
+ " pushl %esp\n"
+ " pushfl\n"
+ SAVE_REGS_STRING
+ " movl %esp, %eax\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addl $4, %esp\n"
+ " popfl\n"
+#endif
+ " ret\n"
+ ".size arch_rethook_trampoline, .-arch_rethook_trampoline\n"
+);
+NOKPROBE_SYMBOL(arch_rethook_trampoline);
+/*
+ * arch_rethook_trampoline() skips updating frame pointer. The frame pointer
+ * saved in arch_rethook_trampoline_callback() points to the real caller
+ * function's frame pointer. Thus the arch_rethook_trampoline() doesn't have
+ * a standard stack frame with CONFIG_FRAME_POINTER=y.
+ * Let's mark it non-standard function. Anyway, FP unwinder can correctly
+ * unwind without the hint.
+ */
+STACK_FRAME_NON_STANDARD_FP(arch_rethook_trampoline);
+
+/* This is called from rethook_trampoline_handler(). */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+	unsigned long *frame_pointer = &regs->sp + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = correct_ret_addr;
+}
+
+void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs)
+{
+ unsigned long *stack = (unsigned long *)regs->sp;
+
+ rh->ret_addr = stack[0];
+ rh->frame = regs->sp;
+
+ /* Replace the return addr with trampoline addr */
+ stack[0] = (unsigned long) arch_rethook_trampoline;
+}
+NOKPROBE_SYMBOL(arch_rethook_prepare);
Add exit_handler to fprobe. fprobe + rethook allows us
to hook the kernel function return without the fgraph tracer.
Eventually, the fgraph tracer will be generic array-based
return hooking and fprobe may use it if the user requests.
Since the array-based and the list-based approaches both
have pros and cons (e.g. memory consumption vs. fewer
missed events), it is better to keep both, but fprobe
will provide the same exit-handler interface.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
include/linux/fprobes.h | 4 +++
kernel/trace/Kconfig | 1 +
kernel/trace/fprobes.c | 59 +++++++++++++++++++++++++++++++++++++++++++++--
3 files changed, 62 insertions(+), 2 deletions(-)
diff --git a/include/linux/fprobes.h b/include/linux/fprobes.h
index fa85a2fc3ad1..d2eb064c5b79 100644
--- a/include/linux/fprobes.h
+++ b/include/linux/fprobes.h
@@ -5,6 +5,7 @@
#include <linux/compiler.h>
#include <linux/ftrace.h>
+#include <linux/rethook.h>
/*
* fprobe_entry - function entry for fprobe
@@ -27,7 +28,10 @@ struct fprobe {
struct ftrace_ops ftrace;
unsigned long nmissed;
unsigned int flags;
+ struct rethook *rethook;
+
void (*entry_handler) (struct fprobe *, unsigned long, struct pt_regs *);
+ void (*exit_handler) (struct fprobe *, unsigned long, struct pt_regs *);
};
#define FPROBE_FL_DISABLED 1
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 9328724258dc..59e227ade0b7 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -238,6 +238,7 @@ config FPROBES
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS
+ select RETHOOK
default n
help
This option enables kernel function probe feature, which is
diff --git a/kernel/trace/fprobes.c b/kernel/trace/fprobes.c
index 0a609093d48c..1e8202a19e3d 100644
--- a/kernel/trace/fprobes.c
+++ b/kernel/trace/fprobes.c
@@ -5,12 +5,20 @@
#include <linux/fprobes.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <linux/slab.h>
#include <linux/sort.h>
+struct fprobe_rethook_node {
+ struct rethook_node node;
+ unsigned long entry_ip;
+};
+
static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
+ struct fprobe_rethook_node *fpr;
+ struct rethook_node *rh;
struct fprobe *fp;
int bit;
@@ -27,10 +35,34 @@ static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
if (fp->entry_handler)
fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
+ if (fp->exit_handler) {
+ rh = rethook_try_get(fp->rethook);
+ if (!rh) {
+ fp->nmissed++;
+ goto out;
+ }
+ fpr = container_of(rh, struct fprobe_rethook_node, node);
+ fpr->entry_ip = ip;
+ rethook_hook_current(rh, ftrace_get_regs(fregs));
+ }
+
+out:
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_handler);
+static void fprobe_exit_handler(struct rethook_node *rh, void *data,
+ struct pt_regs *regs)
+{
+ struct fprobe *fp = (struct fprobe *)data;
+ struct fprobe_rethook_node *fpr;
+
+ fpr = container_of(rh, struct fprobe_rethook_node, node);
+
+ fp->exit_handler(fp, fpr->entry_ip, regs);
+}
+NOKPROBE_SYMBOL(fprobe_exit_handler);
+
static int convert_func_addresses(struct fprobe *fp)
{
unsigned int i;
@@ -88,7 +120,7 @@ static int fprobe_comp_func(const void *a, const void *b)
*/
int register_fprobe(struct fprobe *fp)
{
- unsigned int i;
+ unsigned int i, size;
int ret;
if (!fp || !fp->nentry || !fp->entries)
@@ -114,6 +146,23 @@ int register_fprobe(struct fprobe *fp)
return ret;
}
+ /* Initialize rethook if needed */
+ if (fp->exit_handler) {
+ size = fp->nentry * num_possible_cpus() * 2;
+ fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);
+ for (i = 0; i < size; i++) {
+ struct rethook_node *node;
+
+ node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL);
+ if (!node) {
+ rethook_free(fp->rethook);
+ return -ENOMEM;
+ }
+ rethook_add_node(fp->rethook, node);
+ }
+ } else
+ fp->rethook = NULL;
+
return register_ftrace_function(&fp->ftrace);
}
EXPORT_SYMBOL_GPL(register_fprobe);
@@ -124,9 +173,15 @@ EXPORT_SYMBOL_GPL(register_fprobe);
*/
int unregister_fprobe(struct fprobe *fp)
{
+ int ret;
+
if (!fp || !fp->nentry || !fp->entries)
return -EINVAL;
- return unregister_ftrace_function(&fp->ftrace);
+ ret = unregister_ftrace_function(&fp->ftrace);
+ if (!ret)
+ rethook_free(fp->rethook);
+
+ return ret;
}
EXPORT_SYMBOL_GPL(unregister_fprobe);
Add a sample program for the fprobe.
Signed-off-by: Masami Hiramatsu <[email protected]>
---
samples/Kconfig | 6 ++
samples/Makefile | 1
samples/fprobe/Makefile | 3 +
samples/fprobe/fprobe_example.c | 103 +++++++++++++++++++++++++++++++++++++++
4 files changed, 113 insertions(+)
create mode 100644 samples/fprobe/Makefile
create mode 100644 samples/fprobe/fprobe_example.c
diff --git a/samples/Kconfig b/samples/Kconfig
index 43d2e9aa557f..487b5d17f722 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -73,6 +73,12 @@ config SAMPLE_HW_BREAKPOINT
help
This builds kernel hardware breakpoint example modules.
+config SAMPLE_FPROBE
+ tristate "Build fprobe examples -- loadable modules only"
+ depends on FPROBES && m
+ help
+	  This builds several fprobe example modules.
+
config SAMPLE_KFIFO
tristate "Build kfifo examples -- loadable modules only"
depends on m
diff --git a/samples/Makefile b/samples/Makefile
index 4bcd6b93bffa..4f73fe7aa473 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/
subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog
subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/
+obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
diff --git a/samples/fprobe/Makefile b/samples/fprobe/Makefile
new file mode 100644
index 000000000000..ecccbfa6e99b
--- /dev/null
+++ b/samples/fprobe/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_SAMPLE_FPROBE) += fprobe_example.o
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
new file mode 100644
index 000000000000..8ea335cfe916
--- /dev/null
+++ b/samples/fprobe/fprobe_example.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Here's a sample kernel module showing the use of fprobe to dump a
+ * stack trace and selected registers when kernel_clone() is called.
+ *
+ * For more information on theory of operation of kprobes, see
+ * Documentation/trace/kprobes.rst
+ *
+ * You will see the trace data in /var/log/messages and on the console
+ * whenever kernel_clone() is invoked to create a new process.
+ */
+
+#define pr_fmt(fmt) "%s: " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/fprobes.h>
+#include <linux/slab.h>
+
+#define MAX_SYMBOL_LEN 4096
+static struct fprobe sample_probe;
+static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
+module_param_string(symbol, symbol, sizeof(symbol), 0644);
+
+static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+{
+ const char *sym = "<no-sym>";
+ struct fprobe_entry *ent;
+
+ ent = fprobe_find_entry(fp, ip);
+ if (ent)
+ sym = ent->sym;
+
+ pr_info("Enter <%s> ip = 0x%p (%pS)\n", sym, (void *)ip, (void *)ip);
+}
+
+static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+{
+ unsigned long rip = instruction_pointer(regs);
+ const char *sym = "<no-sym>";
+ struct fprobe_entry *ent;
+
+ ent = fprobe_find_entry(fp, ip);
+ if (ent)
+ sym = ent->sym;
+
+ pr_info("Return from <%s> ip = 0x%p to rip = 0x%p (%pS)\n", sym, (void *)ip,
+ (void *)rip, (void *)rip);
+}
+
+static int __init fprobe_init(void)
+{
+ struct fprobe_entry *ents;
+ char *tmp, *p;
+ int ret, count, i;
+
+ sample_probe.entry_handler = sample_entry_handler;
+ sample_probe.exit_handler = sample_exit_handler;
+
+ if (strchr(symbol, ',')) {
+ tmp = kstrdup(symbol, GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+ p = tmp;
+ count = 1;
+ for (; (p = strchr(p, ',')) != NULL; p++)
+ count++;
+ } else {
+ count = 1;
+ tmp = symbol;
+ }
+
+ ents = kzalloc(count * sizeof(*ents), GFP_KERNEL);
+ if (!ents) {
+ if (tmp != symbol)
+ kfree(tmp);
+ return -ENOMEM;
+ }
+
+ for (i = 0; i < count; i++)
+ ents[i].sym = strsep(&tmp, ",");
+
+ sample_probe.entries = ents;
+ sample_probe.nentry = count;
+
+ ret = register_fprobe(&sample_probe);
+ if (ret < 0) {
+ pr_err("register_fprobe failed, returned %d\n", ret);
+ return ret;
+ }
+ pr_info("Planted fprobe at %s\n", symbol);
+ return 0;
+}
+
+static void __exit fprobe_exit(void)
+{
+ unregister_fprobe(&sample_probe);
+ pr_info("fprobe at %s unregistered\n", symbol);
+}
+
+module_init(fprobe_init)
+module_exit(fprobe_exit)
+MODULE_LICENSE("GPL");
From: Jiri Olsa <[email protected]>
Adding a new link type BPF_LINK_TYPE_KPROBE to attach kprobes
directly through the fprobe API. Since attaching many kprobes
that share one handler is not efficient, this uses fprobe, which
natively supports multiple probe points per handler but is limited
to function entry and exit.
Adding a new attach type BPF_TRACE_RAW_KPROBE that enables such a
link for kprobe programs.
The new link allows attaching multiple kprobes at once through the
new link_create interface:
struct {
__aligned_u64 addrs;
__u32 cnt;
__u64 bpf_cookie;
} kprobe;
Plus a new flag BPF_F_KPROBE_RETURN in link_create.flags to
create a return probe.
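For illustration, a minimal userspace sketch of driving this interface
directly through the bpf() syscall; prog_fd is assumed to be an already
loaded BPF_PROG_TYPE_KPROBE program, the helper name is made up, and
error handling is omitted:

#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Create a multi-kprobe link for prog_fd on cnt kernel function addresses
 * (e.g. resolved from /proc/kallsyms). Returns a link fd on success,
 * a negative value on failure.
 */
static int raw_kprobe_link_create(int prog_fd, const __u64 *addrs, __u32 cnt,
				  __u64 cookie, bool is_return)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.attach_type = BPF_TRACE_RAW_KPROBE;
	attr.link_create.flags = is_return ? BPF_F_KPROBE_RETURN : 0;
	attr.link_create.kprobe.addrs = (__u64)(unsigned long)addrs;
	attr.link_create.kprobe.cnt = cnt;
	attr.link_create.kprobe.bpf_cookie = cookie;

	return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}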
Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
---
include/linux/bpf_types.h | 1
include/uapi/linux/bpf.h | 12 ++
kernel/bpf/syscall.c | 199 +++++++++++++++++++++++++++++++++++++++-
tools/include/uapi/linux/bpf.h | 12 ++
4 files changed, 219 insertions(+), 5 deletions(-)
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 48a91c51c015..a9000feab34e 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
#ifdef CONFIG_PERF_EVENTS
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
+BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE, kprobe)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba5af15e25f5..10e9b56a074e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -995,6 +995,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
+ BPF_TRACE_RAW_KPROBE,
__MAX_BPF_ATTACH_TYPE
};
@@ -1009,6 +1010,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
+ BPF_LINK_TYPE_KPROBE = 8,
MAX_BPF_LINK_TYPE,
};
@@ -1111,6 +1113,11 @@ enum bpf_link_type {
*/
#define BPF_F_SLEEPABLE (1U << 4)
+/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
+ * attach type.
+ */
+#define BPF_F_KPROBE_RETURN (1U << 0)
+
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@@ -1463,6 +1470,11 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
} link_create;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 1033ee8c0caf..d237ba7762ec 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,7 @@
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
+#include <linux/fprobes.h>
#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -3013,8 +3014,186 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
fput(perf_file);
return err;
}
+#else
+static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ return -ENOTSUPP;
+}
#endif /* CONFIG_PERF_EVENTS */
+#ifdef CONFIG_FPROBES
+
+/* Note that this is called 'kprobe_link' but it uses fprobe inside */
+struct bpf_kprobe_link {
+ struct bpf_link link;
+ struct fprobe fp;
+ bool is_return;
+ unsigned long *addrs;
+ u32 cnt;
+ u64 bpf_cookie;
+};
+
+static void bpf_kprobe_link_release(struct bpf_link *link)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(link, struct bpf_kprobe_link, link);
+
+ unregister_fprobe(&kprobe_link->fp);
+}
+
+static void bpf_kprobe_link_dealloc(struct bpf_link *link)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(link, struct bpf_kprobe_link, link);
+ kfree(kprobe_link->fp.entries);
+ kfree(kprobe_link->addrs);
+ kfree(kprobe_link);
+}
+
+static const struct bpf_link_ops bpf_kprobe_link_lops = {
+ .release = bpf_kprobe_link_release,
+ .dealloc = bpf_kprobe_link_dealloc,
+};
+
+static int kprobe_link_prog_run(struct bpf_kprobe_link *kprobe_link,
+ struct pt_regs *regs)
+{
+ struct bpf_trace_run_ctx run_ctx;
+ struct bpf_run_ctx *old_run_ctx;
+ int err;
+
+ if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
+ err = 0;
+ goto out;
+ }
+
+ old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+ run_ctx.bpf_cookie = kprobe_link->bpf_cookie;
+
+ rcu_read_lock();
+ migrate_disable();
+ err = bpf_prog_run(kprobe_link->link.prog, regs);
+ migrate_enable();
+ rcu_read_unlock();
+
+ bpf_reset_run_ctx(old_run_ctx);
+
+ out:
+ __this_cpu_dec(bpf_prog_active);
+ return err;
+}
+
+static void kprobe_link_entry_handler(struct fprobe *fp, unsigned long entry_ip,
+ struct pt_regs *regs)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(fp, struct bpf_kprobe_link, fp);
+ kprobe_link_prog_run(kprobe_link, regs);
+}
+
+static void kprobe_link_exit_handler(struct fprobe *fp, unsigned long entry_ip,
+ struct pt_regs *regs)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(fp, struct bpf_kprobe_link, fp);
+ kprobe_link_prog_run(kprobe_link, regs);
+}
+
+static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ struct bpf_link_primer link_primer;
+ struct bpf_kprobe_link *link = NULL;
+ struct fprobe_entry *ents = NULL;
+ unsigned long *addrs;
+ u32 flags, cnt, size, i;
+ void __user *uaddrs;
+ u64 *tmp;
+ int err;
+
+ flags = attr->link_create.flags;
+ if (flags & ~BPF_F_KPROBE_RETURN)
+ return -EINVAL;
+
+ uaddrs = u64_to_user_ptr(attr->link_create.kprobe.addrs);
+ cnt = attr->link_create.kprobe.cnt;
+ size = cnt * sizeof(*tmp);
+
+ tmp = kzalloc(size, GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ if (copy_from_user(tmp, uaddrs, size)) {
+ err = -EFAULT;
+ goto error;
+ }
+
+ /* TODO add extra copy for 32bit archs */
+ if (sizeof(u64) != sizeof(void *)) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ addrs = (unsigned long *) tmp;
+
+ link = kzalloc(sizeof(*link), GFP_KERNEL);
+ if (!link) {
+ err = -ENOMEM;
+ goto error;
+ }
+
+ ents = kzalloc(sizeof(*ents) * cnt, GFP_KERNEL);
+ if (!ents) {
+ err = -ENOMEM;
+ goto error;
+ }
+ for (i = 0; i < cnt; i++)
+ ents[i].addr = addrs[i];
+
+ bpf_link_init(&link->link, BPF_LINK_TYPE_KPROBE, &bpf_kprobe_link_lops, prog);
+
+ err = bpf_link_prime(&link->link, &link_primer);
+ if (err)
+ goto error;
+
+ link->is_return = flags & BPF_F_KPROBE_RETURN;
+ link->addrs = addrs;
+ link->cnt = cnt;
+ link->bpf_cookie = attr->link_create.kprobe.bpf_cookie;
+
+ link->fp.entries = ents;
+ link->fp.nentry = cnt;
+
+ if (link->is_return)
+ link->fp.exit_handler = kprobe_link_exit_handler;
+ else
+ link->fp.entry_handler = kprobe_link_entry_handler;
+
+ err = register_fprobe(&link->fp);
+ if (err) {
+ bpf_link_cleanup(&link_primer);
+ goto error;
+ }
+
+ return bpf_link_settle(&link_primer);
+
+error:
+ kfree(ents);
+ kfree(link);
+ kfree(tmp);
+
+ return err;
+}
+#else /* !CONFIG_FPROBES */
+static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ return -ENOTSUPP;
+}
+#endif
+
#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd
static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
@@ -4241,7 +4420,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
return -EINVAL;
}
-#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
+#define BPF_LINK_CREATE_LAST_FIELD link_create.kprobe.bpf_cookie
static int link_create(union bpf_attr *attr, bpfptr_t uattr)
{
enum bpf_prog_type ptype;
@@ -4265,7 +4444,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = tracing_bpf_link_attach(attr, uattr, prog);
goto out;
case BPF_PROG_TYPE_PERF_EVENT:
- case BPF_PROG_TYPE_KPROBE:
case BPF_PROG_TYPE_TRACEPOINT:
if (attr->link_create.attach_type != BPF_PERF_EVENT) {
ret = -EINVAL;
@@ -4273,6 +4451,14 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
}
ptype = prog->type;
break;
+ case BPF_PROG_TYPE_KPROBE:
+ if (attr->link_create.attach_type != BPF_PERF_EVENT &&
+ attr->link_create.attach_type != BPF_TRACE_RAW_KPROBE) {
+ ret = -EINVAL;
+ goto out;
+ }
+ ptype = prog->type;
+ break;
default:
ptype = attach_type_to_prog_type(attr->link_create.attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@@ -4304,13 +4490,16 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = bpf_xdp_link_attach(attr, prog);
break;
#endif
-#ifdef CONFIG_PERF_EVENTS
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
- case BPF_PROG_TYPE_KPROBE:
ret = bpf_perf_link_attach(attr, prog);
break;
-#endif
+ case BPF_PROG_TYPE_KPROBE:
+ if (attr->link_create.attach_type == BPF_PERF_EVENT)
+ ret = bpf_perf_link_attach(attr, prog);
+ else
+ ret = bpf_kprobe_link_attach(attr, prog);
+ break;
default:
ret = -EINVAL;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index ba5af15e25f5..10e9b56a074e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -995,6 +995,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
+ BPF_TRACE_RAW_KPROBE,
__MAX_BPF_ATTACH_TYPE
};
@@ -1009,6 +1010,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
+ BPF_LINK_TYPE_KPROBE = 8,
MAX_BPF_LINK_TYPE,
};
@@ -1111,6 +1113,11 @@ enum bpf_link_type {
*/
#define BPF_F_SLEEPABLE (1U << 4)
+/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
+ * attach type.
+ */
+#define BPF_F_KPROBE_RETURN (1U << 0)
+
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@@ -1463,6 +1470,11 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
} link_create;
On Wed, Jan 12, 2022 at 12:00:17AM +0900, Masami Hiramatsu wrote:
> Hi Jiri,
>
> Here is a short series of patches, which shows what I replied
> to your series.
>
> This introduces the fprobe, the function entry/exit probe with
> multiple probe point support. This also introduces the rethook
> for hooking function return, which I cloned from kretprobe.
>
> I also rewrote your [08/13] bpf patch to use this fprobe instead
> of kprobes. I haven't tested that one, but the sample module seems
> to work. Please test the bpf part with your libbpf updates.
>
> BTW, while implementing the fprobe, I introduced the per-probe
> point private data, but I'm not sure why you need it. It seems
> that data is not used from bpf...
>
> If this is good for you, I would like to proceed with this,
> together with the rethook, and rewrite kretprobe to use the rethook
> to hook function returns. That should be much cleaner (and easier
> to prepare for the fgraph tracer integration).
What is the speed of attach/detach of thousands of fprobes?
Hi Alexei,
On Tue, 11 Jan 2022 14:39:44 -0800
Alexei Starovoitov <[email protected]> wrote:
> On Wed, Jan 12, 2022 at 12:00:17AM +0900, Masami Hiramatsu wrote:
> > Hi Jiri,
> >
> > Here is a short series of patches, which shows what I replied
> > to your series.
> >
> > This introduces the fprobe, the function entry/exit probe with
> > multiple probe point support. This also introduces the rethook
> > for hooking function return, which I cloned from kretprobe.
> >
> > I also rewrote your [08/13] bpf patch to use this fprobe instead
> > of kprobes. I haven't tested that one, but the sample module seems
> > to work. Please test the bpf part with your libbpf updates.
> >
> > BTW, while implementing the fprobe, I introduced the per-probe
> > point private data, but I'm not sure why you need it. It seems
> > that data is not used from bpf...
> >
> > If this is good for you, I would like to proceed with this,
> > together with the rethook, and rewrite kretprobe to use the rethook
> > to hook function returns. That should be much cleaner (and easier
> > to prepare for the fgraph tracer integration).
>
> What is the speed of attach/detach of thousands of fprobes?
I've tweaked my example module and it shows the result below:
/lib/modules/5.16.0-rc4+/kernel/samples/fprobe # time insmod ./fprobe_example.ko symbol='btrfs_*'
[ 187.095925] fprobe_init: 1028 symbols found
[ 188.521694] fprobe_init: Planted fprobe at btrfs_*
real 0m 1.47s
user 0m 0.00s
sys 0m 1.36s
I think using ftrace_set_filter_ips() can make it faster.
(It may need to drop the per-probe point private data, which
currently prevents fprobe from using that interface.)
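For illustration, a rough sketch of such a batched setup, assuming the
ftrace_set_filter_ips(ops, ips, cnt, remove, reset) signature added
earlier in this series and the already resolved fprobe entry addresses
(the helper itself is hypothetical):

/* Hypothetical helper: hand all probe addresses to ftrace in one call,
 * so the filter hash is rebuilt once under ftrace_lock instead of once
 * per symbol.
 */
static int fprobe_set_filter_ips(struct fprobe *fp)
{
	unsigned long *ips;
	unsigned int i;
	int ret;

	ips = kcalloc(fp->nentry, sizeof(*ips), GFP_KERNEL);
	if (!ips)
		return -ENOMEM;

	for (i = 0; i < fp->nentry; i++)
		ips[i] = fp->entries[i].addr;

	ret = ftrace_set_filter_ips(&fp->ftrace, ips, fp->nentry, 0, 0);

	kfree(ips);
	return ret;
}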
Thank you,
--
Masami Hiramatsu <[email protected]>
On Wed, 12 Jan 2022 16:33:53 +0900
Masami Hiramatsu <[email protected]> wrote:
> Hi Alexei,
>
> On Tue, 11 Jan 2022 14:39:44 -0800
> Alexei Starovoitov <[email protected]> wrote:
>
> > On Wed, Jan 12, 2022 at 12:00:17AM +0900, Masami Hiramatsu wrote:
> > > Hi Jiri,
> > >
> > > Here is a short series of patches, which shows what I replied
> > > to your series.
> > >
> > > This introduces the fprobe, the function entry/exit probe with
> > > multiple probe point support. This also introduces the rethook
> > > for hooking function return, which I cloned from kretprobe.
> > >
> > > I also rewrote your [08/13] bpf patch to use this fprobe instead
> > > of kprobes. I haven't tested that one, but the sample module seems
> > > to work. Please test the bpf part with your libbpf updates.
> > >
> > > BTW, while implementing the fprobe, I introduced the per-probe
> > > point private data, but I'm not sure why you need it. It seems
> > > that data is not used from bpf...
> > >
> > > If this is good for you, I would like to proceed with this,
> > > together with the rethook, and rewrite kretprobe to use the rethook
> > > to hook function returns. That should be much cleaner (and easier
> > > to prepare for the fgraph tracer integration).
> >
> > What is the speed of attach/detach of thousands of fprobes?
>
> I've tweaked my example module and it shows the result below:
>
> /lib/modules/5.16.0-rc4+/kernel/samples/fprobe # time insmod ./fprobe_example.ko
> symbol='btrfs_*'
> [ 187.095925] fprobe_init: 1028 symbols found
> [ 188.521694] fprobe_init: Planted fprobe at btrfs_*
> real 0m 1.47s
> user 0m 0.00s
> sys 0m 1.36s
>
> I think using ftrace_set_filter_ips() can make it faster.
> (It may need to drop the per-probe point private data, which
> currently prevents fprobe from using that interface.)
OK, I've updated fprobe to use ftrace_set_filter_ips()
and got the result below.
/lib/modules/5.16.0-rc4+/kernel/samples/fprobe # time insmod fprobe_example.ko symbol='btrfs_*'
[ 36.130947] fprobe_init: 1028 symbols found
[ 36.177901] fprobe_init: Planted fprobe at btrfs_*
real 0m 0.08s
user 0m 0.00s
sys 0m 0.07s
Let me update the series :)
Thank you,
--
Masami Hiramatsu <[email protected]>