2022-01-21 19:20:04

by Masami Hiramatsu

Subject: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

Hello Jiri,

Here is the 3rd version of fprobe. I added some comments and
fixed some issues. But I still see some problems when I apply
your selftest patches.

This series introduces the fprobe, a function entry/exit probe
with multiple probe point support. It also introduces the rethook
for hooking a function return, in the same way kretprobe does. This
abstraction will help us to generalize the fgraph tracer,
because fprobe can simply switch from rethook to it, depending
on the kernel configuration.

Patches [1/9] and [7/9] are from Jiri's series[1]. The other libbpf
patches are not affected by this change.

[1] https://lore.kernel.org/all/[email protected]/T/#u

However, when I applied all the other patches on top of this series,
I saw that the "#8 bpf_cookie" test case got stuck (maybe related
to the bpf_cookie issue which Andrii and Jiri discussed?). And when I
removed the last selftest patch[2], the selftests stopped at "#112
raw_tp_test_run".

[2] https://lore.kernel.org/all/[email protected]/T/#m242d2b3a3775eeb5baba322424b15901e5e78483

Note that I used tools/testing/selftests/bpf/vmtest.sh to check it.

This version adds 2 more out-of-tree patches: [8/9] adds wildcard
support to the sample program, and [9/9] is a testing patch that
replaces the kretprobe trampoline with rethook.
While doing this work, I noticed that using rethook in kretprobe
needs 2 steps:
1. Port rethook to all architectures which support kretprobes.
(some arches require CONFIG_KPROBES for rethook)
2. Replace the kretprobe trampoline with rethook on all archs at once.
This must be done in one treewide patch.

Anyway, I'll do the kretprobe update as another series in the next step.
(This testing patch is just for confirming that the rethook is correctly
implemented.)

BTW, on x86, the ftrace (fentry) location address is the same as the
symbol address. But on other archs it will be different (e.g. arm64
needs 2 instructions to save the link register and call ftrace, and the
2nd instruction is the ftrace location.)
Does libbpf handle that correctly?

Thank you,

---

Jiri Olsa (2):
ftrace: Add ftrace_set_filter_ips function
bpf: Add kprobe link for attaching raw kprobes

Masami Hiramatsu (7):
fprobe: Add ftrace based probe APIs
rethook: Add a generic return hook
rethook: x86: Add rethook x86 implementation
fprobe: Add exit_handler support
fprobe: Add sample program for fprobe
[DO NOT MERGE] Out-of-tree: Support wildcard symbol option to sample
[DO NOT MERGE] out-of-tree: kprobes: Use rethook for kretprobe


arch/x86/Kconfig | 1
arch/x86/include/asm/unwind.h | 8 +
arch/x86/kernel/Makefile | 1
arch/x86/kernel/kprobes/core.c | 106 --------------
arch/x86/kernel/rethook.c | 115 +++++++++++++++
include/linux/bpf_types.h | 1
include/linux/fprobe.h | 84 +++++++++++
include/linux/ftrace.h | 3
include/linux/kprobes.h | 85 +----------
include/linux/rethook.h | 99 +++++++++++++
include/linux/sched.h | 4 -
include/uapi/linux/bpf.h | 12 ++
kernel/bpf/syscall.c | 195 +++++++++++++++++++++++++-
kernel/exit.c | 3
kernel/fork.c | 4 -
kernel/kallsyms.c | 1
kernel/kprobes.c | 265 +++++------------------------------
kernel/trace/Kconfig | 22 +++
kernel/trace/Makefile | 2
kernel/trace/fprobe.c | 179 ++++++++++++++++++++++++
kernel/trace/ftrace.c | 54 ++++++-
kernel/trace/rethook.c | 295 +++++++++++++++++++++++++++++++++++++++
kernel/trace/trace_kprobe.c | 4 -
kernel/trace/trace_output.c | 2
samples/Kconfig | 7 +
samples/Makefile | 1
samples/fprobe/Makefile | 3
samples/fprobe/fprobe_example.c | 154 ++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 12 ++
29 files changed, 1283 insertions(+), 439 deletions(-)
create mode 100644 arch/x86/kernel/rethook.c
create mode 100644 include/linux/fprobe.h
create mode 100644 include/linux/rethook.h
create mode 100644 kernel/trace/fprobe.c
create mode 100644 kernel/trace/rethook.c
create mode 100644 samples/fprobe/Makefile
create mode 100644 samples/fprobe/fprobe_example.c

--
Masami Hiramatsu (Linaro) <[email protected]>


2022-01-21 19:20:09

by Masami Hiramatsu

Subject: [RFC PATCH v3 1/9] ftrace: Add ftrace_set_filter_ips function

From: Jiri Olsa <[email protected]>

Add the ftrace_set_filter_ips function, to be able to set a filter on
multiple ip addresses at once.

With the kprobe multi attach interface we have cases where we need to
initialize an ftrace_ops object with thousands of functions, so having a
single function that dives into ftrace_hash_move_and_update_ops under
ftrace_lock just once is faster.

The function addresses (ips) are passed as an unsigned long array,
together with a count.

Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/ftrace.h | 3 +++
kernel/trace/ftrace.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9999e29187de..60847cbce0da 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -512,6 +512,8 @@ struct dyn_ftrace {

int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset);
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt, int remove, int reset);
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
int len, int reset);
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
@@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip)
#define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
#define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
#define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
+#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_free_filter(ops) do { } while (0)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index be5f6b32a012..39350aa38649 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
}

static int
-ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
+__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
{
struct ftrace_func_entry *entry;

@@ -4976,9 +4976,25 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
return add_hash_entry(hash, ip);
}

+static int
+ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
+ unsigned int cnt, int remove)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < cnt; i++) {
+ err = __ftrace_match_addr(hash, ips[i], remove);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
static int
ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
- unsigned long ip, int remove, int reset, int enable)
+ unsigned long *ips, unsigned int cnt,
+ int remove, int reset, int enable)
{
struct ftrace_hash **orig_hash;
struct ftrace_hash *hash;
@@ -5008,8 +5024,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
ret = -EINVAL;
goto out_regex_unlock;
}
- if (ip) {
- ret = ftrace_match_addr(hash, ip, remove);
+ if (ips) {
+ ret = ftrace_match_addr(hash, ips, cnt, remove);
if (ret < 0)
goto out_regex_unlock;
}
@@ -5026,10 +5042,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
}

static int
-ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
- int reset, int enable)
+ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
+ int remove, int reset, int enable)
{
- return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
+ return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
}

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
@@ -5634,10 +5650,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset)
{
ftrace_ops_init(ops);
- return ftrace_set_addr(ops, ip, remove, reset, 1);
+ return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
}
EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);

+/**
+ * ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
+ * @ops - the ops to set the filter with
+ * @ips - the array of addresses to add to or remove from the filter.
+ * @cnt - the number of addresses in @ips
+ * @remove - non zero to remove ips from the filter
+ * @reset - non zero to reset all filters before applying this filter.
+ *
+ * Filters denote which functions should be enabled when tracing is enabled.
+ * If the @ips array or any ip within it is NULL, it fails to update the filter.
+ */
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt, int remove, int reset)
+{
+ ftrace_ops_init(ops);
+ return ftrace_set_addr(ops, ips, cnt, remove, reset, 1);
+}
+EXPORT_SYMBOL_GPL(ftrace_set_filter_ips);
+
/**
* ftrace_ops_set_global_filter - setup ops to use global filters
* @ops - the ops which will use the global filters
@@ -5659,7 +5694,7 @@ static int
ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
int reset, int enable)
{
- return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable);
+ return ftrace_set_hash(ops, buf, len, NULL, 0, 0, reset, enable);
}

/**

2022-01-21 19:20:15

by Masami Hiramatsu

Subject: [RFC PATCH v3 2/9] fprobe: Add ftrace based probe APIs

The fprobe is a wrapper API for the ftrace function tracer.
Unlike kprobes, this probe only supports the function entry, but
it can probe multiple functions with one fprobe. The usage is almost
the same as with kprobes: the user specifies the function names via
fprobe::syms, the number of syms via fprobe::nentry,
and the user handler via fprobe::entry_handler.

struct fprobe fp = { 0 };
const char *targets[] = { "func1", "func2", "func3"};

fp.entry_handler = user_handler;
fp.nentry = ARRAY_SIZE(targets);
fp.syms = targets;

ret = register_fprobe(&fp);

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v3:
- Update kerneldocs.
---
include/linux/fprobe.h | 80 +++++++++++++++++++++++++++++++++
kernel/trace/Kconfig | 10 ++++
kernel/trace/Makefile | 1
kernel/trace/fprobe.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 208 insertions(+)
create mode 100644 include/linux/fprobe.h
create mode 100644 kernel/trace/fprobe.c

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
new file mode 100644
index 000000000000..2fc487d933e3
--- /dev/null
+++ b/include/linux/fprobe.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Simple ftrace probe wrapper */
+#ifndef _LINUX_FPROBE_H
+#define _LINUX_FPROBE_H
+
+#include <linux/compiler.h>
+#include <linux/ftrace.h>
+
+/**
+ * struct fprobe - ftrace based probe.
+ * @syms: The array of symbols to probe.
+ * @addrs: The array of address of the symbols.
+ * @nentry: The number of entries of @syms or @addrs.
+ * @ftrace: The ftrace_ops.
+ * @nmissed: The counter for missing events.
+ * @flags: The status flag.
+ * @entry_handler: The callback function for function entry.
+ *
+ * User must set either @syms or @addrs, but not both. If user sets
+ * only @syms, the @addrs are generated when registering the fprobe.
+ * That auto-generated @addrs will be freed when unregistering.
+ */
+struct fprobe {
+ const char **syms;
+ unsigned long *addrs;
+ unsigned int nentry;
+
+ struct ftrace_ops ftrace;
+ unsigned long nmissed;
+ unsigned int flags;
+ void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+};
+
+#define FPROBE_FL_DISABLED 1
+
+static inline bool fprobe_disabled(struct fprobe *fp)
+{
+ return (fp) ? fp->flags & FPROBE_FL_DISABLED : false;
+}
+
+#ifdef CONFIG_FPROBE
+int register_fprobe(struct fprobe *fp);
+int unregister_fprobe(struct fprobe *fp);
+#else
+static inline int register_fprobe(struct fprobe *fp)
+{
+ return -EOPNOTSUPP;
+}
+static inline int unregister_fprobe(struct fprobe *fp)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
+/**
+ * disable_fprobe() - Disable fprobe
+ * @fp: The fprobe to be disabled.
+ *
+ * This will soft-disable @fp. Note that this doesn't remove the ftrace
+ * hooks from the function entry.
+ */
+static inline void disable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags |= FPROBE_FL_DISABLED;
+}
+
+/**
+ * enable_fprobe() - Enable fprobe
+ * @fp: The fprobe to be enabled.
+ *
+ * This will soft-enable @fp.
+ */
+static inline void enable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags &= ~FPROBE_FL_DISABLED;
+}
+
+#endif
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 420ff4bc67fd..6834b0272798 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -223,6 +223,16 @@ config DYNAMIC_FTRACE_WITH_ARGS
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS

+config FPROBE
+ bool "Kernel Function Probe (fprobe)"
+ depends on FUNCTION_TRACER
+ depends on DYNAMIC_FTRACE_WITH_REGS
+ default n
+ help
+ This option enables the kernel function probe (fprobe) feature,
+ which is similar to kprobes, but probes only kernel function
+ entries and can probe multiple functions with one fprobe.
+
config FUNCTION_PROFILER
bool "Kernel function profiler"
depends on FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index bedc5caceec7..79255f9de9a4 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
+obj-$(CONFIG_FPROBE) += fprobe.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
new file mode 100644
index 000000000000..8b068deadc48
--- /dev/null
+++ b/kernel/trace/fprobe.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fprobe - Simple ftrace probe wrapper for function entry.
+ */
+#define pr_fmt(fmt) "fprobe: " fmt
+
+#include <linux/fprobe.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+ struct fprobe *fp;
+ int bit;
+
+ fp = container_of(ops, struct fprobe, ftrace);
+ if (fprobe_disabled(fp))
+ return;
+
+ bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ if (bit < 0) {
+ fp->nmissed++;
+ return;
+ }
+
+ if (fp->entry_handler)
+ fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
+
+ ftrace_test_recursion_unlock(bit);
+}
+NOKPROBE_SYMBOL(fprobe_handler);
+
+static int convert_func_addresses(struct fprobe *fp)
+{
+ unsigned int i;
+
+ if (!fp->syms)
+ return 0;
+
+ fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
+ if (!fp->addrs)
+ return -ENOMEM;
+
+ for (i = 0; i < fp->nentry; i++) {
+
+ fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
+ if (!fp->addrs[i])
+ return -ENOENT;
+ }
+
+ return 0;
+}
+
+/**
+ * register_fprobe() - Register fprobe to ftrace
+ * @fp: A fprobe data structure to be registered.
+ *
+ * This expects the user to set @fp::entry_handler, either @fp::syms or
+ * @fp::addrs, and @fp::nentry.
+ * Note that you must not set both @fp::addrs and @fp::syms.
+ */
+int register_fprobe(struct fprobe *fp)
+{
+ int ret;
+
+ if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
+ (fp->syms && fp->addrs))
+ return -EINVAL;
+
+ ret = convert_func_addresses(fp);
+ if (ret < 0)
+ return ret;
+
+ fp->nmissed = 0;
+ fp->ftrace.func = fprobe_handler;
+ fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;
+
+ ret = ftrace_set_filter_ips(&fp->ftrace, fp->addrs, fp->nentry, 0, 0);
+ if (!ret)
+ ret = register_ftrace_function(&fp->ftrace);
+
+ if (ret < 0 && fp->syms) {
+ kfree(fp->addrs);
+ fp->addrs = NULL;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_fprobe);
+
+/**
+ * unregister_fprobe() - Unregister fprobe from ftrace
+ * @fp: A fprobe data structure to be unregistered.
+ *
+ * Unregister fprobe (and remove ftrace hooks from the function entries).
+ * If the @fp::addrs were generated by register_fprobe(), they will be freed
+ * automatically.
+ */
+int unregister_fprobe(struct fprobe *fp)
+{
+ int ret;
+
+ if (!fp || !fp->nentry || !fp->addrs)
+ return -EINVAL;
+
+ ret = unregister_ftrace_function(&fp->ftrace);
+
+ if (!ret && fp->syms) {
+ kfree(fp->addrs);
+ fp->addrs = NULL;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(unregister_fprobe);

2022-01-21 19:20:26

by Masami Hiramatsu

Subject: [RFC PATCH v3 4/9] rethook: x86: Add rethook x86 implementation

Add the x86 implementation of rethook. Most of the code
has been copied from kretprobes on x86.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/Kconfig | 1
arch/x86/kernel/Makefile | 1
arch/x86/kernel/rethook.c | 115 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 117 insertions(+)
create mode 100644 arch/x86/kernel/rethook.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2ef..0a7d48a63787 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -219,6 +219,7 @@ config X86
select HAVE_KPROBES_ON_FTRACE
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_KRETPROBES
+ select HAVE_RETHOOK
select HAVE_KVM
select HAVE_LIVEPATCH if X86_64
select HAVE_MIXED_BREAKPOINTS_REGS
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..66593d8c4d74 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o
obj-$(CONFIG_X86_TSC) += trace_clock.o
obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
new file mode 100644
index 000000000000..f2f3b9526e43
--- /dev/null
+++ b/arch/x86/kernel/rethook.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * x86 implementation of rethook. Mostly copied from arch/x86/kernel/kprobes/core.c.
+ */
+#include <linux/bug.h>
+#include <linux/rethook.h>
+#include <linux/kprobes.h>
+
+#include "kprobes/common.h"
+
+/*
+ * Called from arch_rethook_trampoline
+ */
+__used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
+{
+ unsigned long *frame_pointer;
+
+ /* fixup registers */
+ regs->cs = __KERNEL_CS;
+#ifdef CONFIG_X86_32
+ regs->gs = 0;
+#endif
+ regs->ip = (unsigned long)&arch_rethook_trampoline;
+ regs->orig_ax = ~0UL;
+ regs->sp += sizeof(long);
+ frame_pointer = &regs->sp + 1;
+
+ /*
+ * The return address at 'frame_pointer' is recovered by
+ * arch_rethook_fixup_return(), which is called from
+ * rethook_trampoline_handler().
+ */
+ rethook_trampoline_handler(regs, (unsigned long)frame_pointer);
+
+ /*
+ * Copy FLAGS to 'pt_regs::sp' so that arch_rethook_trampoline()
+ * can do RET right after POPF.
+ */
+ regs->sp = regs->flags;
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+
+/*
+ * When a target function returns, this code saves registers and calls
+ * arch_rethook_trampoline_callback(), which calls the rethook handler.
+ */
+asm(
+ ".text\n"
+ ".global arch_rethook_trampoline\n"
+ ".type arch_rethook_trampoline, @function\n"
+ "arch_rethook_trampoline:\n"
+#ifdef CONFIG_X86_64
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushq $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 8', this will be fixed later. */
+ " pushq %rsp\n"
+ " pushfq\n"
+ SAVE_REGS_STRING
+ " movq %rsp, %rdi\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addq $8, %rsp\n"
+ " popfq\n"
+#else
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushl $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 4', this will be fixed later. */
+ " pushl %esp\n"
+ " pushfl\n"
+ SAVE_REGS_STRING
+ " movl %esp, %eax\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addl $4, %esp\n"
+ " popfl\n"
+#endif
+ " ret\n"
+ ".size arch_rethook_trampoline, .-arch_rethook_trampoline\n"
+);
+NOKPROBE_SYMBOL(arch_rethook_trampoline);
+/*
+ * arch_rethook_trampoline() skips updating frame pointer. The frame pointer
+ * saved in arch_rethook_trampoline_callback() points to the real caller
+ * function's frame pointer. Thus the arch_rethook_trampoline() doesn't have
+ * a standard stack frame with CONFIG_FRAME_POINTER=y.
+ * Let's mark it non-standard function. Anyway, FP unwinder can correctly
+ * unwind without the hint.
+ */
+STACK_FRAME_NON_STANDARD_FP(arch_rethook_trampoline);
+
+/* This is called from rethook_trampoline_handler(). */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ unsigned long *frame_pointer = &regs->sp + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = correct_ret_addr;
+}
+
+void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs)
+{
+ unsigned long *stack = (unsigned long *)regs->sp;
+
+ rh->ret_addr = stack[0];
+ rh->frame = regs->sp;
+
+ /* Replace the return addr with trampoline addr */
+ stack[0] = (unsigned long) arch_rethook_trampoline;
+}
+NOKPROBE_SYMBOL(arch_rethook_prepare);

2022-01-21 19:20:32

by Masami Hiramatsu

Subject: [RFC PATCH v3 3/9] rethook: Add a generic return hook

Add a return hook framework which hooks the function
return. Most of the idea came from the kretprobe, but
this is independent of kretprobe.
Note that this is expected to be used with other
function entry hooking features, like ftrace, fprobe,
and kprobes. Eventually this will replace the
kretprobe (e.g. kprobe + rethook = kretprobe), but
at this moment, this is just an additional hook.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v3:
- Strictly check the rethook::handler.
- Protect the handler callback loop from rcu.
- Add some comments and kernel docs.
- Change some function name.
---
include/linux/rethook.h | 99 ++++++++++++++++
include/linux/sched.h | 3
kernel/exit.c | 2
kernel/fork.c | 3
kernel/trace/Kconfig | 11 ++
kernel/trace/Makefile | 1
kernel/trace/rethook.c | 295 +++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 414 insertions(+)
create mode 100644 include/linux/rethook.h
create mode 100644 kernel/trace/rethook.c

diff --git a/include/linux/rethook.h b/include/linux/rethook.h
new file mode 100644
index 000000000000..39cfbff1a03c
--- /dev/null
+++ b/include/linux/rethook.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Return hooking with list-based shadow stack.
+ */
+#ifndef _LINUX_RETHOOK_H
+#define _LINUX_RETHOOK_H
+
+#include <linux/compiler.h>
+#include <linux/freelist.h>
+#include <linux/llist.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+
+struct rethook_node;
+
+typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
+
+/**
+ * struct rethook - The rethook management data structure.
+ * @data: The user-defined data storage.
+ * @handler: The user-defined return hook handler.
+ * @pool: The pool of struct rethook_node.
+ * @ref: The reference counter.
+ * @rcu: The rcu_head for deferred freeing.
+ *
+ * Don't embed this into another data structure, because this data structure
+ * destroys itself when all rethook_node are freed.
+ */
+struct rethook {
+ void *data;
+ rethook_handler_t handler;
+ struct freelist_head pool;
+ refcount_t ref;
+ struct rcu_head rcu;
+};
+
+/**
+ * struct rethook_node - The rethook shadow-stack entry node.
+ * @freelist: The freelist, linked to struct rethook::pool.
+ * @rcu: The rcu_head for deferred freeing.
+ * @llist: The llist, linked to a struct task_struct::rethooks.
+ * @rethook: The pointer to the struct rethook.
+ * @ret_addr: The storage for the real return address.
+ * @frame: The storage for the frame pointer.
+ *
+ * You can embed this with your extended data structure to store any data
+ * on the entry of shadow stack.
+ */
+struct rethook_node {
+ union {
+ struct freelist_node freelist;
+ struct rcu_head rcu;
+ };
+ struct llist_node llist;
+ struct rethook *rethook;
+ unsigned long ret_addr;
+ unsigned long frame;
+};
+
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
+void rethook_free(struct rethook *rh);
+void rethook_add_node(struct rethook *rh, struct rethook_node *node);
+struct rethook_node *rethook_try_get(struct rethook *rh);
+void rethook_recycle(struct rethook_node *node);
+void rethook_hook(struct rethook_node *node, struct pt_regs *regs);
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+ struct llist_node **cur);
+
+/* Arch dependent code must implement arch_* and trampoline code */
+void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs);
+void arch_rethook_trampoline(void);
+
+/**
+ * is_rethook_trampoline() - Check whether the address is rethook trampoline
+ * @addr: The address to be checked
+ *
+ * Return true if the @addr is rethook trampoline address.
+ */
+static inline bool is_rethook_trampoline(unsigned long addr)
+{
+ return addr == (unsigned long)arch_rethook_trampoline;
+}
+
+/* If the architecture needs to fix up the return address, implement this. */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr);
+
+/* Generic trampoline handler, arch code must prepare asm stub */
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+ unsigned long frame);
+
+#ifdef CONFIG_RETHOOK
+void rethook_flush_task(struct task_struct *tk);
+#else
+#define rethook_flush_task(tsk) do { } while (0)
+#endif
+
+#endif
+
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..2bfabf5355b7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1473,6 +1473,9 @@ struct task_struct {
#ifdef CONFIG_KRETPROBES
struct llist_head kretprobe_instances;
#endif
+#ifdef CONFIG_RETHOOK
+ struct llist_head rethooks;
+#endif

#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
/*
diff --git a/kernel/exit.c b/kernel/exit.c
index f702a6a63686..a39a321c1f37 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -64,6 +64,7 @@
#include <linux/compat.h>
#include <linux/io_uring.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>

#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);

kprobe_flush_task(tsk);
+ rethook_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
put_task_struct(tsk);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697..ffae38be64c4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_KRETPROBES
p->kretprobe_instances.first = NULL;
#endif
+#ifdef CONFIG_RETHOOK
+ p->rethooks.first = NULL;
+#endif

/*
* Ensure that the cgroup subsystem policies allow the new process to be
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 6834b0272798..44c473ad9021 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
config NOP_TRACER
bool

+config HAVE_RETHOOK
+ bool
+
+config RETHOOK
+ bool
+ depends on HAVE_RETHOOK
+ help
+ Enable the generic return hooking feature. This is an internal
+ API, which will be used by other function-entry hooking
+ features like fprobe and kprobes.
+
config HAVE_FUNCTION_TRACER
bool
help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 79255f9de9a4..c6f11a139eac 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBE) += fprobe.o
+obj-$(CONFIG_RETHOOK) += rethook.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
new file mode 100644
index 000000000000..ccd7be2cd09b
--- /dev/null
+++ b/kernel/trace/rethook.c
@@ -0,0 +1,295 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "rethook: " fmt
+
+#include <linux/bug.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/preempt.h>
+#include <linux/rethook.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+/* Return hook list (shadow stack by list) */
+
+/*
+ * This function is called from delayed_put_task_struct() when a task is
+ * dead and cleaned up, to recycle any rethook nodes associated with
+ * this task. These leftover nodes represent probed functions that
+ * have been called but will never return.
+ */
+void rethook_flush_task(struct task_struct *tk)
+{
+ struct rethook_node *rhn;
+ struct llist_node *node;
+
+ preempt_disable();
+
+ node = __llist_del_all(&tk->rethooks);
+ while (node) {
+ rhn = container_of(node, struct rethook_node, llist);
+ node = node->next;
+ rethook_recycle(rhn);
+ }
+
+ preempt_enable();
+}
+
+static void rethook_free_rcu(struct rcu_head *head)
+{
+ struct rethook *rh = container_of(head, struct rethook, rcu);
+ struct rethook_node *rhn;
+ struct freelist_node *node;
+ int count = 1;
+
+ node = rh->pool.head;
+ while (node) {
+ rhn = container_of(node, struct rethook_node, freelist);
+ node = node->next;
+ kfree(rhn);
+ count++;
+ }
+
+ /* The rh->ref is the number of pooled node + 1 */
+ if (refcount_sub_and_test(count, &rh->ref))
+ kfree(rh);
+}
+
+/**
+ * rethook_free() - Free struct rethook.
+ * @rh: the struct rethook to be freed.
+ *
+ * Free the rethook. Before calling this function, the user must ensure that
+ * @rh::data is cleaned up if needed (otherwise, the handler may still access
+ * it after this call.) The @rh will actually be freed only after all
+ * rethook_node are freed (i.e. not immediately), and the caller must
+ * not touch @rh after calling this.
+ */
+void rethook_free(struct rethook *rh)
+{
+ rcu_assign_pointer(rh->handler, NULL);
+
+ call_rcu(&rh->rcu, rethook_free_rcu);
+}
+
+/**
+ * rethook_alloc() - Allocate struct rethook.
+ * @data: the data to pass to @handler when hooking a return.
+ * @handler: the return hook callback function.
+ *
+ * Allocate and initialize a new rethook with @data and @handler.
+ * Return NULL if memory allocation fails or @handler is NULL.
+ * Note that @handler == NULL means this rethook is going to be freed.
+ */
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler)
+{
+ struct rethook *rh;
+
+ if (!handler)
+ return NULL;
+
+ rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
+ if (!rh)
+ return NULL;
+
+ rh->data = data;
+ rh->handler = handler;
+ rh->pool.head = NULL;
+ refcount_set(&rh->ref, 1);
+
+ return rh;
+}
+
+/**
+ * rethook_add_node() - Add a new node to the rethook.
+ * @rh: the struct rethook.
+ * @node: the struct rethook_node to be added.
+ *
+ * Add @node to @rh. User must allocate @node (as a part of user's
+ * data structure.) The @node fields are initialized in this function.
+ */
+void rethook_add_node(struct rethook *rh, struct rethook_node *node)
+{
+ node->rethook = rh;
+ freelist_add(&node->freelist, &rh->pool);
+ refcount_inc(&rh->ref);
+}
+
+static void free_rethook_node_rcu(struct rcu_head *head)
+{
+ struct rethook_node *node = container_of(head, struct rethook_node, rcu);
+
+ if (refcount_dec_and_test(&node->rethook->ref))
+ kfree(node->rethook);
+ kfree(node);
+}
+
+/**
+ * rethook_recycle() - return the node to rethook.
+ * @node: The struct rethook_node to be returned.
+ *
+ * Give the @node back to @node::rethook. If the @node::rethook is already
+ * marked as freed, this will free the @node.
+ */
+void rethook_recycle(struct rethook_node *node)
+{
+ lockdep_assert_preemption_disabled();
+
+ if (likely(READ_ONCE(node->rethook->handler)))
+ freelist_add(&node->freelist, &node->rethook->pool);
+ else
+ call_rcu(&node->rcu, free_rethook_node_rcu);
+}
+NOKPROBE_SYMBOL(rethook_recycle);
+
+/**
+ * rethook_try_get() - get an unused rethook node.
+ * @rh: The struct rethook which pools the nodes.
+ *
+ * Get an unused rethook node from @rh. If the node pool is empty, this
+ * will return NULL. Caller must disable preemption.
+ */
+struct rethook_node *rethook_try_get(struct rethook *rh)
+{
+ rethook_handler_t handler = READ_ONCE(rh->handler);
+ struct freelist_node *fn;
+
+ lockdep_assert_preemption_disabled();
+
+ /* Check whether @rh is going to be freed. */
+ if (unlikely(!handler))
+ return NULL;
+
+ fn = freelist_try_get(&rh->pool);
+ if (!fn)
+ return NULL;
+
+ return container_of(fn, struct rethook_node, freelist);
+}
+NOKPROBE_SYMBOL(rethook_try_get);
+
+/**
+ * rethook_hook() - Hook the current function return.
+ * @node: The struct rethook node to hook the function return.
+ * @regs: The struct pt_regs for the function entry.
+ *
+ * Hook the return of the currently running function. This must be called
+ * at function entry (or at least @regs must hold the registers as of the
+ * function entry.)
+ */
+void rethook_hook(struct rethook_node *node, struct pt_regs *regs)
+{
+ arch_rethook_prepare(node, regs);
+ __llist_add(&node->llist, &current->rethooks);
+}
+NOKPROBE_SYMBOL(rethook_hook);
+
+/* This assumes the 'tsk' is the current task or a task that is not running. */
+static unsigned long __rethook_find_ret_addr(struct task_struct *tsk,
+ struct llist_node **cur)
+{
+ struct rethook_node *rh = NULL;
+ struct llist_node *node = *cur;
+
+ if (!node)
+ node = tsk->rethooks.first;
+ else
+ node = node->next;
+
+ while (node) {
+ rh = container_of(node, struct rethook_node, llist);
+ if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) {
+ *cur = node;
+ return rh->ret_addr;
+ }
+ node = node->next;
+ }
+ return 0;
+}
+NOKPROBE_SYMBOL(__rethook_find_ret_addr);
+
+/**
+ * rethook_find_ret_addr -- Find correct return address modified by rethook
+ * @tsk: Target task
+ * @frame: A frame pointer
+ * @cur: a storage of the loop cursor llist_node pointer for next call
+ *
+ * Find the correct return address modified by a rethook on @tsk and
+ * return it as an unsigned long. If the return address is found, this
+ * returns that address value; otherwise it returns 0.
+ * The @tsk must be 'current' or a task which is not running. @frame is a
+ * hint to get the correct return address - which is compared with the
+ * rethook_node::frame field. The @cur is a loop cursor for searching the
+ * rethook return addresses on @tsk. The '*@cur' should be NULL for the
+ * first call, but '@cur' itself must not be NULL.
+ */
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+ struct llist_node **cur)
+{
+ struct rethook_node *rhn = NULL;
+ unsigned long ret;
+
+ if (WARN_ON_ONCE(!cur))
+ return 0;
+
+ do {
+ ret = __rethook_find_ret_addr(tsk, cur);
+ if (!ret)
+ break;
+ rhn = container_of(*cur, struct rethook_node, llist);
+ } while (rhn->frame != frame);
+
+ return ret;
+}
+NOKPROBE_SYMBOL(rethook_find_ret_addr);
+
+void __weak arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ /*
+ * Do nothing by default. If the architecture uses a frame pointer
+ * to record the real return address on the stack, it should
+ * implement this function to fix up the return address so that
+ * stacktrace works from the rethook handler.
+ */
+}
+
+/* This function will be called from each arch-defined trampoline. */
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+ unsigned long frame)
+{
+ struct llist_node *first, *node = NULL;
+ unsigned long correct_ret_addr;
+ rethook_handler_t handler;
+ struct rethook_node *rhn;
+
+ correct_ret_addr = __rethook_find_ret_addr(current, &node);
+ if (!correct_ret_addr) {
+ pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n");
+ BUG();
+ }
+
+ instruction_pointer_set(regs, correct_ret_addr);
+ arch_rethook_fixup_return(regs, correct_ret_addr);
+
+ first = current->rethooks.first;
+ current->rethooks.first = node->next;
+ node->next = NULL;
+
+ /*
+ * This loop must be protected from rethook_free_rcu() because this is
+ * accessing 'rhn->rethook'.
+ */
+ preempt_disable();
+ while (first) {
+ rhn = container_of(first, struct rethook_node, llist);
+ if (WARN_ON_ONCE(rhn->frame != frame))
+ break;
+ handler = READ_ONCE(rhn->rethook->handler);
+ if (handler)
+ handler(rhn, rhn->rethook->data, regs);
+
+ first = first->next;
+ rethook_recycle(rhn);
+ }
+ preempt_enable();
+
+ return correct_ret_addr;
+}
+NOKPROBE_SYMBOL(rethook_trampoline_handler);

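The node-pool lifecycle above (rethook_alloc / rethook_add_node / rethook_try_get / rethook_recycle / rethook_free) amounts to a refcounted free list: every pooled node pins the pool, and the pool is only really freed once every node has come home. Here is a minimal single-threaded user-space analogue of that pattern. All names below are local to this sketch, and it deliberately omits the kernel's lock-free freelist and RCU-deferred frees:

```c
#include <assert.h>
#include <stdlib.h>

struct pool;

struct node {
	struct node *next;   /* freelist link (cf. freelist_node) */
	struct pool *pool;   /* back-pointer (cf. rethook_node::rethook) */
};

struct pool {
	struct node *head;   /* freelist head (cf. rethook::pool) */
	int ref;             /* cf. rethook::ref */
	int dead;            /* cf. rethook::handler == NULL */
};

static struct pool *pool_alloc(void)
{
	struct pool *p = calloc(1, sizeof(*p));

	if (p)
		p->ref = 1;  /* the pool holds one reference on itself */
	return p;
}

/* cf. rethook_add_node(): each pooled node pins the pool */
static void pool_add_node(struct pool *p, struct node *n)
{
	n->pool = p;
	n->next = p->head;
	p->head = n;
	p->ref++;
}

/* cf. rethook_try_get(): NULL if empty or the pool is being freed */
static struct node *pool_try_get(struct pool *p)
{
	struct node *n = p->dead ? NULL : p->head;

	if (n)
		p->head = n->next;
	return n;
}

/* cf. rethook_recycle(): return the node, or free it if the pool died */
static void node_put(struct node *n)
{
	struct pool *p = n->pool;

	if (!p->dead) {
		n->next = p->head;
		p->head = n;
	} else {
		free(n);
		if (--p->ref == 0)
			free(p);
	}
}

/* cf. rethook_free(): the pool is really freed only once every node is */
static void pool_free(struct pool *p)
{
	p->dead = 1;
	while (p->head) {
		struct node *n = p->head;

		p->head = n->next;
		free(n);
		p->ref--;
	}
	if (--p->ref == 0)
		free(p);
}
```

In the kernel version the freelist operations are lock-free and the final frees are RCU-deferred; this sketch keeps only the refcount discipline that lets in-flight nodes outlive rethook_free().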
2022-01-21 19:20:56

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH v3 7/9] bpf: Add kprobe link for attaching raw kprobes

From: Jiri Olsa <[email protected]>

Adding a new link type BPF_LINK_TYPE_KPROBE to attach so-called
"raw kprobes" directly through the fprobe API. Note that since
using many kprobes that share one handler is not efficient, this
uses fprobe, which natively supports multiple probe points for a
single handler, though it is limited to function entry and exit.

Adding a new attach type BPF_TRACE_RAW_KPROBE that enables
such a link for kprobe programs.

The new link allows creating a multi-kprobe link by using the
new link_create interface:

struct {
__aligned_u64 addrs;
__u32 cnt;
__u64 bpf_cookie;
} kprobe;

Plus a new flag BPF_F_KPROBE_RETURN for link_create.flags to
create a return probe.
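As a rough sketch of how user space would populate these new fields: the struct and helpers below are local illustrations that mirror this RFC's uapi additions (they are not the real header, and the names make_kprobe_args/validate_kprobe_flags are invented for this sketch; the error value follows the kernel's -EINVAL convention):

```c
#include <assert.h>
#include <stdint.h>

#define BPF_F_KPROBE_RETURN (1U << 0)  /* link_create.flags bit from this patch */

/* Local mirror of the 'kprobe' member added to union bpf_attr here. */
struct kprobe_link_args {
	uint64_t addrs;      /* __aligned_u64: user pointer to a u64 array */
	uint32_t cnt;        /* number of entries in addrs */
	uint64_t bpf_cookie; /* read back later via bpf_get_attach_cookie() */
};

static struct kprobe_link_args
make_kprobe_args(const uint64_t *addrs, uint32_t cnt, uint64_t cookie)
{
	struct kprobe_link_args a = {
		.addrs = (uint64_t)(uintptr_t)addrs,  /* pointer carried as u64 */
		.cnt = cnt,
		.bpf_cookie = cookie,
	};
	return a;
}

/* Mirrors the kernel-side check: only BPF_F_KPROBE_RETURN is accepted. */
static int validate_kprobe_flags(uint32_t flags)
{
	return (flags & ~BPF_F_KPROBE_RETURN) ? -22 /* -EINVAL */ : 0;
}
```

An actual attachment would pass these fields inside union bpf_attr to the BPF_LINK_CREATE command with attach_type BPF_TRACE_RAW_KPROBE, setting BPF_F_KPROBE_RETURN in flags for the exit-probe variant.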

Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
---
include/linux/bpf_types.h | 1
include/uapi/linux/bpf.h | 12 ++
kernel/bpf/syscall.c | 195 +++++++++++++++++++++++++++++++++++++++-
tools/include/uapi/linux/bpf.h | 12 ++
4 files changed, 215 insertions(+), 5 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 48a91c51c015..a9000feab34e 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp)
#ifdef CONFIG_PERF_EVENTS
BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf)
#endif
+BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE, kprobe)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b0383d371b9a..5216b333c688 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -995,6 +995,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
+ BPF_TRACE_RAW_KPROBE,
__MAX_BPF_ATTACH_TYPE
};

@@ -1009,6 +1010,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
+ BPF_LINK_TYPE_KPROBE = 8,

MAX_BPF_LINK_TYPE,
};
@@ -1111,6 +1113,11 @@ enum bpf_link_type {
*/
#define BPF_F_SLEEPABLE (1U << 4)

+/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
+ * attach type.
+ */
+#define BPF_F_KPROBE_RETURN (1U << 0)
+
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@@ -1465,6 +1472,11 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
} link_create;

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fa4505f9b611..dc4f74fa6882 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -32,6 +32,7 @@
#include <linux/bpf-netns.h>
#include <linux/rcupdate_trace.h>
#include <linux/memcontrol.h>
+#include <linux/fprobe.h>

#define IS_FD_ARRAY(map) ((map)->map_type == BPF_MAP_TYPE_PERF_EVENT_ARRAY || \
(map)->map_type == BPF_MAP_TYPE_CGROUP_ARRAY || \
@@ -3014,8 +3015,182 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
fput(perf_file);
return err;
}
+#else
+static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ return -ENOTSUPP;
+}
#endif /* CONFIG_PERF_EVENTS */

+#ifdef CONFIG_FPROBE
+
+/* Note that this is called 'kprobe_link' but uses fprobe internally */
+struct bpf_kprobe_link {
+ struct bpf_link link;
+ struct fprobe fp;
+ bool is_return;
+ unsigned long *addrs;
+ u32 cnt;
+ u64 bpf_cookie;
+};
+
+static void bpf_kprobe_link_release(struct bpf_link *link)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(link, struct bpf_kprobe_link, link);
+
+ unregister_fprobe(&kprobe_link->fp);
+}
+
+static void bpf_kprobe_link_dealloc(struct bpf_link *link)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(link, struct bpf_kprobe_link, link);
+ kfree(kprobe_link->addrs);
+ kfree(kprobe_link);
+}
+
+static const struct bpf_link_ops bpf_kprobe_link_lops = {
+ .release = bpf_kprobe_link_release,
+ .dealloc = bpf_kprobe_link_dealloc,
+};
+
+static int kprobe_link_prog_run(struct bpf_kprobe_link *kprobe_link,
+ struct pt_regs *regs)
+{
+ struct bpf_trace_run_ctx run_ctx;
+ struct bpf_run_ctx *old_run_ctx;
+ int err;
+
+ if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
+ err = 0;
+ goto out;
+ }
+
+ old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+ run_ctx.bpf_cookie = kprobe_link->bpf_cookie;
+
+ rcu_read_lock();
+ migrate_disable();
+ err = bpf_prog_run(kprobe_link->link.prog, regs);
+ migrate_enable();
+ rcu_read_unlock();
+
+ bpf_reset_run_ctx(old_run_ctx);
+
+ out:
+ __this_cpu_dec(bpf_prog_active);
+ return err;
+}
+
+static void kprobe_link_entry_handler(struct fprobe *fp, unsigned long entry_ip,
+ struct pt_regs *regs)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ /*
+ * Because fprobe sets regs->ip to the instruction following the
+ * dynamic-ftrace call site, the correct entry ip must be restored
+ * here so that the bpf program can read the entry address from
+ * regs, the same way as with kprobes.
+ */
+ instruction_pointer_set(regs, entry_ip);
+ kprobe_link = container_of(fp, struct bpf_kprobe_link, fp);
+ kprobe_link_prog_run(kprobe_link, regs);
+}
+
+static void kprobe_link_exit_handler(struct fprobe *fp, unsigned long entry_ip,
+ struct pt_regs *regs)
+{
+ struct bpf_kprobe_link *kprobe_link;
+
+ kprobe_link = container_of(fp, struct bpf_kprobe_link, fp);
+ kprobe_link_prog_run(kprobe_link, regs);
+}
+
+static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ struct bpf_link_primer link_primer;
+ struct bpf_kprobe_link *link = NULL;
+ unsigned long *addrs;
+ u32 flags, cnt, size;
+ void __user *uaddrs;
+ u64 **tmp;
+ int err;
+
+ flags = attr->link_create.flags;
+ if (flags & ~BPF_F_KPROBE_RETURN)
+ return -EINVAL;
+
+ uaddrs = u64_to_user_ptr(attr->link_create.kprobe.addrs);
+ cnt = attr->link_create.kprobe.cnt;
+ size = cnt * sizeof(*tmp);
+
+ tmp = kzalloc(size, GFP_KERNEL);
+ if (!tmp)
+ return -ENOMEM;
+
+ if (copy_from_user(tmp, uaddrs, size)) {
+ err = -EFAULT;
+ goto error;
+ }
+
+ /* TODO add extra copy for 32bit archs */
+ if (sizeof(u64) != sizeof(void *)) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ addrs = (unsigned long *) tmp;
+
+ link = kzalloc(sizeof(*link), GFP_KERNEL);
+ if (!link) {
+ err = -ENOMEM;
+ goto error;
+ }
+
+ bpf_link_init(&link->link, BPF_LINK_TYPE_KPROBE, &bpf_kprobe_link_lops, prog);
+
+ err = bpf_link_prime(&link->link, &link_primer);
+ if (err)
+ goto error;
+
+ link->is_return = flags & BPF_F_KPROBE_RETURN;
+ link->addrs = addrs;
+ link->cnt = cnt;
+ link->bpf_cookie = attr->link_create.kprobe.bpf_cookie;
+
+ link->fp.addrs = addrs;
+ link->fp.nentry = cnt;
+
+ if (link->is_return)
+ link->fp.exit_handler = kprobe_link_exit_handler;
+ else
+ link->fp.entry_handler = kprobe_link_entry_handler;
+
+ err = register_fprobe(&link->fp);
+ if (err) {
+ bpf_link_cleanup(&link_primer);
+ goto error;
+ }
+
+ return bpf_link_settle(&link_primer);
+
+error:
+ kfree(link);
+ kfree(tmp);
+
+ return err;
+}
+#else /* !CONFIG_FPROBE */
+static int bpf_kprobe_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
+{
+ return -ENOTSUPP;
+}
+#endif
+
#define BPF_RAW_TRACEPOINT_OPEN_LAST_FIELD raw_tracepoint.prog_fd

static int bpf_raw_tracepoint_open(const union bpf_attr *attr)
@@ -4242,7 +4417,7 @@ static int tracing_bpf_link_attach(const union bpf_attr *attr, bpfptr_t uattr,
return -EINVAL;
}

-#define BPF_LINK_CREATE_LAST_FIELD link_create.iter_info_len
+#define BPF_LINK_CREATE_LAST_FIELD link_create.kprobe.bpf_cookie
static int link_create(union bpf_attr *attr, bpfptr_t uattr)
{
enum bpf_prog_type ptype;
@@ -4266,7 +4441,6 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = tracing_bpf_link_attach(attr, uattr, prog);
goto out;
case BPF_PROG_TYPE_PERF_EVENT:
- case BPF_PROG_TYPE_KPROBE:
case BPF_PROG_TYPE_TRACEPOINT:
if (attr->link_create.attach_type != BPF_PERF_EVENT) {
ret = -EINVAL;
@@ -4274,6 +4448,14 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
}
ptype = prog->type;
break;
+ case BPF_PROG_TYPE_KPROBE:
+ if (attr->link_create.attach_type != BPF_PERF_EVENT &&
+ attr->link_create.attach_type != BPF_TRACE_RAW_KPROBE) {
+ ret = -EINVAL;
+ goto out;
+ }
+ ptype = prog->type;
+ break;
default:
ptype = attach_type_to_prog_type(attr->link_create.attach_type);
if (ptype == BPF_PROG_TYPE_UNSPEC || ptype != prog->type) {
@@ -4305,13 +4487,16 @@ static int link_create(union bpf_attr *attr, bpfptr_t uattr)
ret = bpf_xdp_link_attach(attr, prog);
break;
#endif
-#ifdef CONFIG_PERF_EVENTS
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
- case BPF_PROG_TYPE_KPROBE:
ret = bpf_perf_link_attach(attr, prog);
break;
-#endif
+ case BPF_PROG_TYPE_KPROBE:
+ if (attr->link_create.attach_type == BPF_PERF_EVENT)
+ ret = bpf_perf_link_attach(attr, prog);
+ else
+ ret = bpf_kprobe_link_attach(attr, prog);
+ break;
default:
ret = -EINVAL;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b0383d371b9a..5216b333c688 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -995,6 +995,7 @@ enum bpf_attach_type {
BPF_SK_REUSEPORT_SELECT,
BPF_SK_REUSEPORT_SELECT_OR_MIGRATE,
BPF_PERF_EVENT,
+ BPF_TRACE_RAW_KPROBE,
__MAX_BPF_ATTACH_TYPE
};

@@ -1009,6 +1010,7 @@ enum bpf_link_type {
BPF_LINK_TYPE_NETNS = 5,
BPF_LINK_TYPE_XDP = 6,
BPF_LINK_TYPE_PERF_EVENT = 7,
+ BPF_LINK_TYPE_KPROBE = 8,

MAX_BPF_LINK_TYPE,
};
@@ -1111,6 +1113,11 @@ enum bpf_link_type {
*/
#define BPF_F_SLEEPABLE (1U << 4)

+/* link_create flags used in LINK_CREATE command for BPF_TRACE_RAW_KPROBE
+ * attach type.
+ */
+#define BPF_F_KPROBE_RETURN (1U << 0)
+
/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
* the following extensions:
*
@@ -1465,6 +1472,11 @@ union bpf_attr {
*/
__u64 bpf_cookie;
} perf_event;
+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;
};
} link_create;


2022-01-21 19:21:05

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH v3 9/9] [DO NOT MERGE] out-of-tree: kprobes: Use rethook for kretprobe

Replace the kretprobe return trampoline code with rethook APIs.
Note that this is still under testing. It currently works only on
x86 and breaks other architectures.
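A key structural change in the diff below is that struct kretprobe_instance now embeds a struct rethook_node as its first member, and kretprobe_rethook_handler() recovers the instance from the node with container_of(). A minimal user-space sketch of that embedding pattern (the types here are illustrative stand-ins, not the kernel definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Same pointer arithmetic as the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rethook_node {
	void *rethook;           /* stand-in for the real struct */
};

/* As in the patch: the instance embeds the rethook node. */
struct kretprobe_instance {
	struct rethook_node node;
	char data[16];           /* per-instance user data */
};

/* What kretprobe_rethook_handler() does to recover its instance. */
static struct kretprobe_instance *instance_of(struct rethook_node *rh)
{
	return container_of(rh, struct kretprobe_instance, node);
}
```

Because the node is embedded rather than referenced, the rethook core can hand back only the node pointer and the kretprobe layer still reaches its own per-instance data with no extra allocation or lookup.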

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/include/asm/unwind.h | 8 +
arch/x86/kernel/kprobes/core.c | 106 ----------------
include/linux/kprobes.h | 85 +------------
include/linux/sched.h | 3
kernel/exit.c | 1
kernel/fork.c | 3
kernel/kprobes.c | 265 ++++++----------------------------------
kernel/trace/trace_kprobe.c | 4 -
kernel/trace/trace_output.c | 2
9 files changed, 50 insertions(+), 427 deletions(-)

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 2a1f8734416d..9237576f7ab4 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -4,7 +4,7 @@

#include <linux/sched.h>
#include <linux/ftrace.h>
-#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <asm/ptrace.h>
#include <asm/stacktrace.h>

@@ -107,9 +107,9 @@ static inline
unsigned long unwind_recover_kretprobe(struct unwind_state *state,
unsigned long addr, unsigned long *addr_p)
{
-#ifdef CONFIG_KRETPROBES
- return is_kretprobe_trampoline(addr) ?
- kretprobe_find_ret_addr(state->task, addr_p, &state->kr_cur) :
+#ifdef CONFIG_RETHOOK
+ return is_rethook_trampoline(addr) ?
+ rethook_find_ret_addr(state->task, (unsigned long)addr_p, &state->kr_cur) :
addr;
#else
return addr;
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index fce99e249d61..b986434035f9 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -801,18 +801,6 @@ set_current_kprobe(struct kprobe *p, struct pt_regs *regs,
= (regs->flags & X86_EFLAGS_IF);
}

-void arch_prepare_kretprobe(struct kretprobe_instance *ri, struct pt_regs *regs)
-{
- unsigned long *sara = stack_addr(regs);
-
- ri->ret_addr = (kprobe_opcode_t *) *sara;
- ri->fp = sara;
-
- /* Replace the return addr with trampoline addr */
- *sara = (unsigned long) &__kretprobe_trampoline;
-}
-NOKPROBE_SYMBOL(arch_prepare_kretprobe);
-
static void kprobe_post_process(struct kprobe *cur, struct pt_regs *regs,
struct kprobe_ctlblk *kcb)
{
@@ -1013,100 +1001,6 @@ int kprobe_int3_handler(struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kprobe_int3_handler);

-/*
- * When a retprobed function returns, this code saves registers and
- * calls trampoline_handler() runs, which calls the kretprobe's handler.
- */
-asm(
- ".text\n"
- ".global __kretprobe_trampoline\n"
- ".type __kretprobe_trampoline, @function\n"
- "__kretprobe_trampoline:\n"
-#ifdef CONFIG_X86_64
- /* Push a fake return address to tell the unwinder it's a kretprobe. */
- " pushq $__kretprobe_trampoline\n"
- UNWIND_HINT_FUNC
- /* Save the 'sp - 8', this will be fixed later. */
- " pushq %rsp\n"
- " pushfq\n"
- SAVE_REGS_STRING
- " movq %rsp, %rdi\n"
- " call trampoline_handler\n"
- RESTORE_REGS_STRING
- /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */
- " addq $8, %rsp\n"
- " popfq\n"
-#else
- /* Push a fake return address to tell the unwinder it's a kretprobe. */
- " pushl $__kretprobe_trampoline\n"
- UNWIND_HINT_FUNC
- /* Save the 'sp - 4', this will be fixed later. */
- " pushl %esp\n"
- " pushfl\n"
- SAVE_REGS_STRING
- " movl %esp, %eax\n"
- " call trampoline_handler\n"
- RESTORE_REGS_STRING
- /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */
- " addl $4, %esp\n"
- " popfl\n"
-#endif
- " ret\n"
- ".size __kretprobe_trampoline, .-__kretprobe_trampoline\n"
-);
-NOKPROBE_SYMBOL(__kretprobe_trampoline);
-/*
- * __kretprobe_trampoline() skips updating frame pointer. The frame pointer
- * saved in trampoline_handler() points to the real caller function's
- * frame pointer. Thus the __kretprobe_trampoline() doesn't have a
- * standard stack frame with CONFIG_FRAME_POINTER=y.
- * Let's mark it non-standard function. Anyway, FP unwinder can correctly
- * unwind without the hint.
- */
-STACK_FRAME_NON_STANDARD_FP(__kretprobe_trampoline);
-
-/* This is called from kretprobe_trampoline_handler(). */
-void arch_kretprobe_fixup_return(struct pt_regs *regs,
- kprobe_opcode_t *correct_ret_addr)
-{
- unsigned long *frame_pointer = &regs->sp + 1;
-
- /* Replace fake return address with real one. */
- *frame_pointer = (unsigned long)correct_ret_addr;
-}
-
-/*
- * Called from __kretprobe_trampoline
- */
-__used __visible void trampoline_handler(struct pt_regs *regs)
-{
- unsigned long *frame_pointer;
-
- /* fixup registers */
- regs->cs = __KERNEL_CS;
-#ifdef CONFIG_X86_32
- regs->gs = 0;
-#endif
- regs->ip = (unsigned long)&__kretprobe_trampoline;
- regs->orig_ax = ~0UL;
- regs->sp += sizeof(long);
- frame_pointer = &regs->sp + 1;
-
- /*
- * The return address at 'frame_pointer' is recovered by the
- * arch_kretprobe_fixup_return() which called from the
- * kretprobe_trampoline_handler().
- */
- kretprobe_trampoline_handler(regs, frame_pointer);
-
- /*
- * Copy FLAGS to 'pt_regs::sp' so that __kretprobe_trapmoline()
- * can do RET right after POPF.
- */
- regs->sp = regs->flags;
-}
-NOKPROBE_SYMBOL(trampoline_handler);
-
int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
{
struct kprobe *cur = kprobe_running();
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 8c8f7a4d93af..0742b38181c1 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -26,8 +26,7 @@
#include <linux/rcupdate.h>
#include <linux/mutex.h>
#include <linux/ftrace.h>
-#include <linux/refcount.h>
-#include <linux/freelist.h>
+#include <linux/rethook.h>
#include <asm/kprobes.h>

#ifdef CONFIG_KPROBES
@@ -137,10 +136,6 @@ static inline bool kprobe_ftrace(struct kprobe *p)
* ignored, due to maxactive being too low.
*
*/
-struct kretprobe_holder {
- struct kretprobe *rp;
- refcount_t ref;
-};

struct kretprobe {
struct kprobe kp;
@@ -149,21 +144,13 @@ struct kretprobe {
int maxactive;
int nmissed;
size_t data_size;
- struct freelist_head freelist;
- struct kretprobe_holder *rph;
+ struct rethook *rh;
};

#define KRETPROBE_MAX_DATA_SIZE 4096

struct kretprobe_instance {
- union {
- struct freelist_node freelist;
- struct rcu_head rcu;
- };
- struct llist_node llist;
- struct kretprobe_holder *rph;
- kprobe_opcode_t *ret_addr;
- void *fp;
+ struct rethook_node node;
char data[];
};

@@ -186,57 +173,17 @@ extern void kprobe_busy_begin(void);
extern void kprobe_busy_end(void);

#ifdef CONFIG_KRETPROBES
-extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
- struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);

-void arch_kretprobe_fixup_return(struct pt_regs *regs,
- kprobe_opcode_t *correct_ret_addr);
-
-void __kretprobe_trampoline(void);
-/*
- * Since some architecture uses structured function pointer,
- * use dereference_function_descriptor() to get real function address.
- */
-static nokprobe_inline void *kretprobe_trampoline_addr(void)
-{
- return dereference_kernel_function_descriptor(__kretprobe_trampoline);
-}
-
-/* If the trampoline handler called from a kprobe, use this version */
-unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
- void *frame_pointer);
-
-static nokprobe_inline
-unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
- void *frame_pointer)
-{
- unsigned long ret;
- /*
- * Set a dummy kprobe for avoiding kretprobe recursion.
- * Since kretprobe never runs in kprobe handler, no kprobe must
- * be running at this point.
- */
- kprobe_busy_begin();
- ret = __kretprobe_trampoline_handler(regs, frame_pointer);
- kprobe_busy_end();
-
- return ret;
-}
-
static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri)
{
RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(),
"Kretprobe is accessed from instance under preemptive context");

- return READ_ONCE(ri->rph->rp);
+ return (struct kretprobe *)READ_ONCE(ri->node.rethook->data);
}

#else /* !CONFIG_KRETPROBES */
-static inline void arch_prepare_kretprobe(struct kretprobe *rp,
- struct pt_regs *regs)
-{
-}
static inline int arch_trampoline_kprobe(struct kprobe *p)
{
return 0;
@@ -400,8 +347,6 @@ void unregister_kretprobe(struct kretprobe *rp);
int register_kretprobes(struct kretprobe **rps, int num);
void unregister_kretprobes(struct kretprobe **rps, int num);

-void kprobe_flush_task(struct task_struct *tk);
-
void kprobe_free_init_mem(void);

int disable_kprobe(struct kprobe *kp);
@@ -510,28 +455,6 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr)
}
#endif /* !CONFIG_OPTPROBES */

-#ifdef CONFIG_KRETPROBES
-static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
-{
- return (void *)addr == kretprobe_trampoline_addr();
-}
-
-unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
- struct llist_node **cur);
-#else
-static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
-{
- return false;
-}
-
-static nokprobe_inline
-unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
- struct llist_node **cur)
-{
- return 0;
-}
-#endif
-
/* Returns true if kprobes handled the fault */
static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
unsigned int trap)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2bfabf5355b7..b0fa25523a35 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1470,9 +1470,6 @@ struct task_struct {
int mce_count;
#endif

-#ifdef CONFIG_KRETPROBES
- struct llist_head kretprobe_instances;
-#endif
#ifdef CONFIG_RETHOOK
struct llist_head rethooks;
#endif
diff --git a/kernel/exit.c b/kernel/exit.c
index a39a321c1f37..e3a05589faa4 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -169,7 +169,6 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
{
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);

- kprobe_flush_task(tsk);
rethook_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
diff --git a/kernel/fork.c b/kernel/fork.c
index ffae38be64c4..bce71536c20a 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2279,9 +2279,6 @@ static __latent_entropy struct task_struct *copy_process(
p->task_works = NULL;
clear_posix_cputimers_work(p);

-#ifdef CONFIG_KRETPROBES
- p->kretprobe_instances.first = NULL;
-#endif
#ifdef CONFIG_RETHOOK
p->rethooks.first = NULL;
#endif
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 21eccc961bba..1f7bba138f7e 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1216,26 +1216,6 @@ void kprobes_inc_nmissed_count(struct kprobe *p)
}
NOKPROBE_SYMBOL(kprobes_inc_nmissed_count);

-static void free_rp_inst_rcu(struct rcu_head *head)
-{
- struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
-
- if (refcount_dec_and_test(&ri->rph->ref))
- kfree(ri->rph);
- kfree(ri);
-}
-NOKPROBE_SYMBOL(free_rp_inst_rcu);
-
-static void recycle_rp_inst(struct kretprobe_instance *ri)
-{
- struct kretprobe *rp = get_kretprobe(ri);
-
- if (likely(rp))
- freelist_add(&ri->freelist, &rp->freelist);
- else
- call_rcu(&ri->rcu, free_rp_inst_rcu);
-}
-NOKPROBE_SYMBOL(recycle_rp_inst);

static struct kprobe kprobe_busy = {
.addr = (void *) get_kprobe,
@@ -1257,56 +1237,6 @@ void kprobe_busy_end(void)
preempt_enable();
}

-/*
- * This function is called from delayed_put_task_struct() when a task is
- * dead and cleaned up to recycle any kretprobe instances associated with
- * this task. These left over instances represent probed functions that
- * have been called but will never return.
- */
-void kprobe_flush_task(struct task_struct *tk)
-{
- struct kretprobe_instance *ri;
- struct llist_node *node;
-
- /* Early boot, not yet initialized. */
- if (unlikely(!kprobes_initialized))
- return;
-
- kprobe_busy_begin();
-
- node = __llist_del_all(&tk->kretprobe_instances);
- while (node) {
- ri = container_of(node, struct kretprobe_instance, llist);
- node = node->next;
-
- recycle_rp_inst(ri);
- }
-
- kprobe_busy_end();
-}
-NOKPROBE_SYMBOL(kprobe_flush_task);
-
-static inline void free_rp_inst(struct kretprobe *rp)
-{
- struct kretprobe_instance *ri;
- struct freelist_node *node;
- int count = 0;
-
- node = rp->freelist.head;
- while (node) {
- ri = container_of(node, struct kretprobe_instance, freelist);
- node = node->next;
-
- kfree(ri);
- count++;
- }
-
- if (refcount_sub_and_test(count, &rp->rph->ref)) {
- kfree(rp->rph);
- rp->rph = NULL;
- }
-}
-
/* Add the new probe to 'ap->list'. */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
{
@@ -1862,139 +1792,6 @@ static struct notifier_block kprobe_exceptions_nb = {
};

#ifdef CONFIG_KRETPROBES
-
-/* This assumes the 'tsk' is the current task or the is not running. */
-static kprobe_opcode_t *__kretprobe_find_ret_addr(struct task_struct *tsk,
- struct llist_node **cur)
-{
- struct kretprobe_instance *ri = NULL;
- struct llist_node *node = *cur;
-
- if (!node)
- node = tsk->kretprobe_instances.first;
- else
- node = node->next;
-
- while (node) {
- ri = container_of(node, struct kretprobe_instance, llist);
- if (ri->ret_addr != kretprobe_trampoline_addr()) {
- *cur = node;
- return ri->ret_addr;
- }
- node = node->next;
- }
- return NULL;
-}
-NOKPROBE_SYMBOL(__kretprobe_find_ret_addr);
-
-/**
- * kretprobe_find_ret_addr -- Find correct return address modified by kretprobe
- * @tsk: Target task
- * @fp: A frame pointer
- * @cur: a storage of the loop cursor llist_node pointer for next call
- *
- * Find the correct return address modified by a kretprobe on @tsk in unsigned
- * long type. If it finds the return address, this returns that address value,
- * or this returns 0.
- * The @tsk must be 'current' or a task which is not running. @fp is a hint
- * to get the currect return address - which is compared with the
- * kretprobe_instance::fp field. The @cur is a loop cursor for searching the
- * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the
- * first call, but '@cur' itself must NOT NULL.
- */
-unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
- struct llist_node **cur)
-{
- struct kretprobe_instance *ri = NULL;
- kprobe_opcode_t *ret;
-
- if (WARN_ON_ONCE(!cur))
- return 0;
-
- do {
- ret = __kretprobe_find_ret_addr(tsk, cur);
- if (!ret)
- break;
- ri = container_of(*cur, struct kretprobe_instance, llist);
- } while (ri->fp != fp);
-
- return (unsigned long)ret;
-}
-NOKPROBE_SYMBOL(kretprobe_find_ret_addr);
-
-void __weak arch_kretprobe_fixup_return(struct pt_regs *regs,
- kprobe_opcode_t *correct_ret_addr)
-{
- /*
- * Do nothing by default. Please fill this to update the fake return
- * address on the stack with the correct one on each arch if possible.
- */
-}
-
-unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
- void *frame_pointer)
-{
- kprobe_opcode_t *correct_ret_addr = NULL;
- struct kretprobe_instance *ri = NULL;
- struct llist_node *first, *node = NULL;
- struct kretprobe *rp;
-
- /* Find correct address and all nodes for this frame. */
- correct_ret_addr = __kretprobe_find_ret_addr(current, &node);
- if (!correct_ret_addr) {
- pr_err("kretprobe: Return address not found, not execute handler. Maybe there is a bug in the kernel.\n");
- BUG_ON(1);
- }
-
- /*
- * Set the return address as the instruction pointer, because if the
- * user handler calls stack_trace_save_regs() with this 'regs',
- * the stack trace will start from the instruction pointer.
- */
- instruction_pointer_set(regs, (unsigned long)correct_ret_addr);
-
- /* Run the user handler of the nodes. */
- first = current->kretprobe_instances.first;
- while (first) {
- ri = container_of(first, struct kretprobe_instance, llist);
-
- if (WARN_ON_ONCE(ri->fp != frame_pointer))
- break;
-
- rp = get_kretprobe(ri);
- if (rp && rp->handler) {
- struct kprobe *prev = kprobe_running();
-
- __this_cpu_write(current_kprobe, &rp->kp);
- ri->ret_addr = correct_ret_addr;
- rp->handler(ri, regs);
- __this_cpu_write(current_kprobe, prev);
- }
- if (first == node)
- break;
-
- first = first->next;
- }
-
- arch_kretprobe_fixup_return(regs, correct_ret_addr);
-
- /* Unlink all nodes for this frame. */
- first = current->kretprobe_instances.first;
- current->kretprobe_instances.first = node->next;
- node->next = NULL;
-
- /* Recycle free instances. */
- while (first) {
- ri = container_of(first, struct kretprobe_instance, llist);
- first = first->next;
-
- recycle_rp_inst(ri);
- }
-
- return (unsigned long)correct_ret_addr;
-}
-NOKPROBE_SYMBOL(__kretprobe_trampoline_handler)
-
/*
* This kprobe pre_handler is registered with every kretprobe. When probe
* hits it will set up the return probe.
@@ -2003,29 +1800,47 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
{
struct kretprobe *rp = container_of(p, struct kretprobe, kp);
struct kretprobe_instance *ri;
- struct freelist_node *fn;
+ struct rethook_node *rhn;

- fn = freelist_try_get(&rp->freelist);
- if (!fn) {
+ rhn = rethook_try_get(rp->rh);
+ if (!rhn) {
rp->nmissed++;
return 0;
}

- ri = container_of(fn, struct kretprobe_instance, freelist);
+ ri = container_of(rhn, struct kretprobe_instance, node);

if (rp->entry_handler && rp->entry_handler(ri, regs)) {
- freelist_add(&ri->freelist, &rp->freelist);
+ rethook_recycle(rhn);
return 0;
}

- arch_prepare_kretprobe(ri, regs);
-
- __llist_add(&ri->llist, &current->kretprobe_instances);
+ rethook_hook(rhn, regs);

return 0;
}
NOKPROBE_SYMBOL(pre_handler_kretprobe);

+static void kretprobe_rethook_handler(struct rethook_node *rh, void *data,
+ struct pt_regs *regs)
+{
+ struct kretprobe *rp = (struct kretprobe *)data;
+ struct kretprobe_instance *ri;
+ struct kprobe_ctlblk *kcb;
+
+ if (!data)
+ return;
+
+ __this_cpu_write(current_kprobe, &rp->kp);
+ kcb = get_kprobe_ctlblk();
+ kcb->kprobe_status = KPROBE_HIT_ACTIVE;
+
+ ri = container_of(rh, struct kretprobe_instance, node);
+ rp->handler(ri, regs);
+
+ __this_cpu_write(current_kprobe, NULL);
+}
+
bool __weak arch_kprobe_on_func_entry(unsigned long offset)
{
return !offset;
@@ -2100,30 +1915,28 @@ int register_kretprobe(struct kretprobe *rp)
rp->maxactive = num_possible_cpus();
#endif
}
- rp->freelist.head = NULL;
- rp->rph = kzalloc(sizeof(struct kretprobe_holder), GFP_KERNEL);
- if (!rp->rph)
+ rp->rh = rethook_alloc((void *)rp, kretprobe_rethook_handler);
+ if (!rp->rh)
return -ENOMEM;

- rp->rph->rp = rp;
for (i = 0; i < rp->maxactive; i++) {
inst = kzalloc(sizeof(struct kretprobe_instance) +
rp->data_size, GFP_KERNEL);
if (inst == NULL) {
- refcount_set(&rp->rph->ref, i);
- free_rp_inst(rp);
+ rethook_free(rp->rh);
+ rp->rh = NULL;
return -ENOMEM;
}
- inst->rph = rp->rph;
- freelist_add(&inst->freelist, &rp->freelist);
+ rethook_add_node(rp->rh, &inst->node);
}
- refcount_set(&rp->rph->ref, i);

rp->nmissed = 0;
/* Establish function entry probe point */
ret = register_kprobe(&rp->kp);
- if (ret != 0)
- free_rp_inst(rp);
+ if (ret != 0) {
+ rethook_free(rp->rh);
+ rp->rh = NULL;
+ }
return ret;
}
EXPORT_SYMBOL_GPL(register_kretprobe);
@@ -2162,16 +1975,16 @@ void unregister_kretprobes(struct kretprobe **rps, int num)
for (i = 0; i < num; i++) {
if (__unregister_kprobe_top(&rps[i]->kp) < 0)
rps[i]->kp.addr = NULL;
- rps[i]->rph->rp = NULL;
+ rcu_assign_pointer(rps[i]->rh->data, NULL);
+ barrier();
+ rethook_free(rps[i]->rh);
}
mutex_unlock(&kprobe_mutex);

synchronize_rcu();
for (i = 0; i < num; i++) {
- if (rps[i]->kp.addr) {
+ if (rps[i]->kp.addr)
__unregister_kprobe_bottom(&rps[i]->kp);
- free_rp_inst(rps[i]);
- }
}
}
EXPORT_SYMBOL_GPL(unregister_kretprobes);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 4e1257f50aa3..3d88cf33f7cc 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1446,7 +1446,7 @@ __kretprobe_trace_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
fbuffer.regs = regs;
entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
entry->func = (unsigned long)tk->rp.kp.addr;
- entry->ret_ip = (unsigned long)ri->ret_addr;
+ entry->ret_ip = ri->node.ret_addr;
store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);

trace_event_buffer_commit(&fbuffer);
@@ -1641,7 +1641,7 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
return;

entry->func = (unsigned long)tk->rp.kp.addr;
- entry->ret_ip = (unsigned long)ri->ret_addr;
+ entry->ret_ip = ri->node.ret_addr;
store_trace_args(&entry[1], &tk->tp, regs, sizeof(*entry), dsize);
perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
head, NULL);
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 3547e7176ff7..b1a391f7bc78 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -349,7 +349,7 @@ EXPORT_SYMBOL_GPL(trace_output_call);

static inline const char *kretprobed(const char *name, unsigned long addr)
{
- if (is_kretprobe_trampoline(addr))
+ if (is_rethook_trampoline(addr))
return "[unknown/kretprobe'd]";
return name;
}

2022-01-21 19:21:07

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH v3 6/9] fprobe: Add sample program for fprobe

Add a sample program for the fprobe.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v2:
- Fix infinite loop for multiple symbols.
- Fix memory leaks for copied string and entry array.
- Update for new fprobe APIs.
- Fix style issues.
---
samples/Kconfig | 7 +++
samples/Makefile | 1
samples/fprobe/Makefile | 3 +
samples/fprobe/fprobe_example.c | 103 +++++++++++++++++++++++++++++++++++++++
4 files changed, 114 insertions(+)
create mode 100644 samples/fprobe/Makefile
create mode 100644 samples/fprobe/fprobe_example.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 43d2e9aa557f..e010c2c1256c 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -73,6 +73,13 @@ config SAMPLE_HW_BREAKPOINT
help
This builds kernel hardware breakpoint example modules.

+config SAMPLE_FPROBE
+ tristate "Build fprobe examples -- loadable modules only"
+ depends on FPROBE && m
+ help
+ This builds an fprobe example module. This module has an option 'symbol'.
+ You can specify a probed symbol, or multiple symbols separated by ','.
+
config SAMPLE_KFIFO
tristate "Build kfifo examples -- loadable modules only"
depends on m
diff --git a/samples/Makefile b/samples/Makefile
index 4bcd6b93bffa..4f73fe7aa473 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/
subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog
subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/
+obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
diff --git a/samples/fprobe/Makefile b/samples/fprobe/Makefile
new file mode 100644
index 000000000000..ecccbfa6e99b
--- /dev/null
+++ b/samples/fprobe/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_SAMPLE_FPROBE) += fprobe_example.o
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
new file mode 100644
index 000000000000..c28320537f98
--- /dev/null
+++ b/samples/fprobe/fprobe_example.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Here's a sample kernel module showing the use of fprobe to dump a
+ * stack trace and selected registers when kernel_clone() is called.
+ *
+ * For more information on theory of operation of kprobes, see
+ * Documentation/trace/kprobes.rst
+ *
+ * You will see the trace data in /var/log/messages and on the console
+ * whenever kernel_clone() is invoked to create a new process.
+ */
+
+#define pr_fmt(fmt) "%s: " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/fprobe.h>
+#include <linux/slab.h>
+
+#define MAX_SYMBOL_LEN 4096
+struct fprobe sample_probe;
+static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
+module_param_string(symbol, symbol, sizeof(symbol), 0644);
+
+static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+{
+ pr_info("Enter <%pS> ip = 0x%p\n", (void *)ip, (void *)ip);
+}
+
+static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+{
+ unsigned long rip = instruction_pointer(regs);
+
+ pr_info("Return from <%pS> ip = 0x%p to rip = 0x%p (%pS)\n",
+ (void *)ip, (void *)ip, (void *)rip, (void *)rip);
+}
+
+static char *symbuf;
+
+static int __init fprobe_init(void)
+{
+ const char **syms;
+ char *p;
+ int ret, count, i;
+
+ sample_probe.entry_handler = sample_entry_handler;
+ sample_probe.exit_handler = sample_exit_handler;
+
+ if (strchr(symbol, ',')) {
+ symbuf = kstrdup(symbol, GFP_KERNEL);
+ if (!symbuf)
+ return -ENOMEM;
+ p = symbuf;
+ count = 1;
+ while ((p = strchr(++p, ',')) != NULL)
+ count++;
+ } else {
+ count = 1;
+ symbuf = symbol;
+ }
+ pr_info("%d symbols found\n", count);
+
+ syms = kcalloc(count, sizeof(char *), GFP_KERNEL);
+ if (!syms) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ p = symbuf;
+ for (i = 0; i < count; i++)
+ syms[i] = strsep(&p, ",");
+
+ sample_probe.syms = syms;
+ sample_probe.nentry = count;
+
+ ret = register_fprobe(&sample_probe);
+ if (ret < 0) {
+ pr_err("register_fprobe failed, returned %d\n", ret);
+ goto error;
+ }
+ pr_info("Planted fprobe at %s\n", symbol);
+ return 0;
+
+error:
+ if (symbuf != symbol)
+ kfree(symbuf);
+ return ret;
+}
+
+static void __exit fprobe_exit(void)
+{
+ unregister_fprobe(&sample_probe);
+
+ kfree(sample_probe.syms);
+ if (symbuf != symbol)
+ kfree(symbuf);
+
+ pr_info("fprobe at %s unregistered\n", symbol);
+}
+
+module_init(fprobe_init)
+module_exit(fprobe_exit)
+MODULE_LICENSE("GPL");

2022-01-21 19:21:08

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH v3 8/9] [DO NOT MERGE] Out-of-tree: Support wildcard symbol option to sample

This is not intended to be merged into upstream code (since it
exposes some kernel-internal functions just for the example).

But it is good for showing how time-efficient fprobe is at
registering a probe on thousands of functions.

# time insmod fprobe_example.ko symbol='btrfs_*'
[ 36.130947] fprobe_init: 1028 symbols found
[ 36.177901] fprobe_init: Planted fprobe at btrfs_*
real 0m 0.08s
user 0m 0.00s
sys 0m 0.07s

Signed-off-by: Masami Hiramatsu <[email protected]>
---
kernel/kallsyms.c | 1 +
kernel/trace/ftrace.c | 1 +
samples/fprobe/fprobe_example.c | 69 ++++++++++++++++++++++++++++++++++-----
3 files changed, 62 insertions(+), 9 deletions(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 3011bc33a5ba..d0c4073acbfd 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -246,6 +246,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
}
return 0;
}
+EXPORT_SYMBOL_GPL(kallsyms_on_each_symbol);
#endif /* CONFIG_LIVEPATCH */

static unsigned long get_symbol_pos(unsigned long addr,
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 39350aa38649..7ce604bc9529 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1580,6 +1580,7 @@ unsigned long ftrace_location(unsigned long ip)
{
return ftrace_location_range(ip, ip);
}
+EXPORT_SYMBOL_GPL(ftrace_location);

/**
* ftrace_text_reserved - return true if range contains an ftrace location
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index c28320537f98..df034e00661e 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -12,6 +12,7 @@

#define pr_fmt(fmt) "%s: " fmt, __func__

+#include <linux/glob.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/fprobe.h>
@@ -37,16 +38,51 @@ static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_r

static char *symbuf;

+struct sym_search_param {
+ unsigned long *addrs;
+ const char *pat;
+ int cnt;
+};
+
+#define MAX_FPROBE_ENTS (16 * 1024)
+
+static int wildcard_match(void *data, const char *symbol, struct module *mod,
+ unsigned long addr)
+{
+ struct sym_search_param *param = (struct sym_search_param *)data;
+
+ if (glob_match(param->pat, symbol)) {
+ if (!ftrace_location(addr))
+ return 0;
+
+ if (param->addrs)
+ param->addrs[param->cnt] = addr;
+ param->cnt++;
+ if (param->cnt >= MAX_FPROBE_ENTS)
+ return -E2BIG;
+ }
+ return 0;
+}
+
static int __init fprobe_init(void)
{
- const char **syms;
+ struct sym_search_param param = {.pat = symbol, .addrs = NULL, .cnt = 0};
+ unsigned long *addrs = NULL;
+ const char **syms = NULL;
char *p;
int ret, count, i;
+ bool wildcard = false;

sample_probe.entry_handler = sample_entry_handler;
sample_probe.exit_handler = sample_exit_handler;

- if (strchr(symbol, ',')) {
+ if (strchr(symbol, '*')) {
+ kallsyms_on_each_symbol(wildcard_match, &param);
+ count = param.cnt;
+ if (!count)
+ return -ENOENT;
+ wildcard = true;
+ } else if (strchr(symbol, ',')) {
symbuf = kstrdup(symbol, GFP_KERNEL);
if (!symbuf)
return -ENOMEM;
@@ -58,19 +94,31 @@ static int __init fprobe_init(void)
count = 1;
symbuf = symbol;
}
- pr_info("%d symbols found\n", count);

- syms = kcalloc(count, sizeof(char *), GFP_KERNEL);
- if (!syms) {
+ if (wildcard)
+ addrs = kcalloc(count, sizeof(unsigned long), GFP_KERNEL);
+ else
+ syms = kcalloc(count, sizeof(char *), GFP_KERNEL);
+ if (!syms && !addrs) {
ret = -ENOMEM;
goto error;
}

- p = symbuf;
- for (i = 0; i < count; i++)
- syms[i] = strsep(&p, ",");
+ if (wildcard) {
+ param.addrs = addrs;
+ param.cnt = 0;
+
+ kallsyms_on_each_symbol(wildcard_match, &param);
+ count = param.cnt;
+ sample_probe.addrs = addrs;
+ } else {
+ p = symbuf;
+ for (i = 0; i < count; i++)
+ syms[i] = strsep(&p, ",");
+ sample_probe.syms = syms;
+ }
+ pr_info("%d symbols found\n", count);

- sample_probe.syms = syms;
sample_probe.nentry = count;

ret = register_fprobe(&sample_probe);
@@ -82,6 +130,8 @@ static int __init fprobe_init(void)
return 0;

error:
+ kfree(addrs);
+ kfree(syms);
if (symbuf != symbol)
kfree(symbuf);
return ret;
@@ -92,6 +142,7 @@ static void __exit fprobe_exit(void)
unregister_fprobe(&sample_probe);

kfree(sample_probe.syms);
+ kfree(sample_probe.addrs);
if (symbuf != symbol)
kfree(symbuf);


2022-01-21 19:21:45

by Masami Hiramatsu

[permalink] [raw]
Subject: [RFC PATCH v3 5/9] fprobe: Add exit_handler support

Add an exit_handler to fprobe. fprobe + rethook allows us
to hook a kernel function return without the fgraph tracer.
Eventually, the fgraph tracer will become a generic array-based
return hooking facility, and fprobe may use it if the user requests.
Since the array-based and list-based approaches each have
pros and cons (e.g. memory consumption vs. fewer missed
events), it is better to keep both, but fprobe will provide
the same exit-handler interface either way.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v3:
- Make sure to clear rethook->data before freeing it.
- The handler checks that the data is not NULL.
- Free the rethook only if it is in use.
---
include/linux/fprobe.h | 4 +++
kernel/trace/Kconfig | 1 +
kernel/trace/fprobe.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 2fc487d933e3..dd9d65294552 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -5,6 +5,7 @@

#include <linux/compiler.h>
#include <linux/ftrace.h>
+#include <linux/rethook.h>

/**
* struct fprobe - ftrace based probe.
@@ -28,7 +29,10 @@ struct fprobe {
struct ftrace_ops ftrace;
unsigned long nmissed;
unsigned int flags;
+ struct rethook *rethook;
+
void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+ void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
};

#define FPROBE_FL_DISABLED 1
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 44c473ad9021..00bdd2a2f417 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -238,6 +238,7 @@ config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS
+ select RETHOOK
default n
help
This option enables kernel function probe feature, which is
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 8b068deadc48..7d98ca026c72 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -7,12 +7,20 @@
#include <linux/fprobe.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <linux/slab.h>
#include <linux/sort.h>

+struct fprobe_rethook_node {
+ struct rethook_node node;
+ unsigned long entry_ip;
+};
+
static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
+ struct fprobe_rethook_node *fpr;
+ struct rethook_node *rh;
struct fprobe *fp;
int bit;

@@ -29,10 +37,37 @@ static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
if (fp->entry_handler)
fp->entry_handler(fp, ip, ftrace_get_regs(fregs));

+ if (fp->exit_handler) {
+ rh = rethook_try_get(fp->rethook);
+ if (!rh) {
+ fp->nmissed++;
+ goto out;
+ }
+ fpr = container_of(rh, struct fprobe_rethook_node, node);
+ fpr->entry_ip = ip;
+ rethook_hook(rh, ftrace_get_regs(fregs));
+ }
+
+out:
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_handler);

+static void fprobe_exit_handler(struct rethook_node *rh, void *data,
+ struct pt_regs *regs)
+{
+ struct fprobe *fp = (struct fprobe *)data;
+ struct fprobe_rethook_node *fpr;
+
+ if (!data)
+ return;
+
+ fpr = container_of(rh, struct fprobe_rethook_node, node);
+
+ fp->exit_handler(fp, fpr->entry_ip, regs);
+}
+NOKPROBE_SYMBOL(fprobe_exit_handler);
+
static int convert_func_addresses(struct fprobe *fp)
{
unsigned int i;
@@ -45,7 +80,6 @@ static int convert_func_addresses(struct fprobe *fp)
return -ENOMEM;

for (i = 0; i < fp->nentry; i++) {
-
fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
if (!fp->addrs[i])
return -ENOENT;
@@ -64,6 +98,7 @@ static int convert_func_addresses(struct fprobe *fp)
*/
int register_fprobe(struct fprobe *fp)
{
+ unsigned int i, size;
int ret;

if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
@@ -78,10 +113,29 @@ int register_fprobe(struct fprobe *fp)
fp->ftrace.func = fprobe_handler;
fp->ftrace.flags = FTRACE_OPS_FL_SAVE_REGS;

+ /* Initialize rethook if needed */
+ if (fp->exit_handler) {
+ size = fp->nentry * num_possible_cpus() * 2;
+ fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);
+ for (i = 0; i < size; i++) {
+ struct rethook_node *node;
+
+ node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL);
+ if (!node) {
+ rethook_free(fp->rethook);
+ ret = -ENOMEM;
+ goto out;
+ }
+ rethook_add_node(fp->rethook, node);
+ }
+ } else
+ fp->rethook = NULL;
+
ret = ftrace_set_filter_ips(&fp->ftrace, fp->addrs, fp->nentry, 0, 0);
if (!ret)
ret = register_ftrace_function(&fp->ftrace);

+out:
if (ret < 0 && fp->syms) {
kfree(fp->addrs);
fp->addrs = NULL;
@@ -107,8 +161,16 @@ int unregister_fprobe(struct fprobe *fp)
return -EINVAL;

ret = unregister_ftrace_function(&fp->ftrace);
+ if (ret < 0)
+ return ret;

- if (!ret && fp->syms) {
+ if (fp->rethook) {
+ /* Make sure to clear rethook->data before freeing. */
+ WRITE_ONCE(fp->rethook->data, NULL);
+ barrier();
+ rethook_free(fp->rethook);
+ }
+ if (fp->syms) {
kfree(fp->addrs);
fp->addrs = NULL;
}

2022-01-21 21:18:11

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 7/9] bpf: Add kprobe link for attaching raw kprobes

On Wed, Jan 19, 2022 at 11:57:50PM +0900, Masami Hiramatsu wrote:

SNIP

> +static int kprobe_link_prog_run(struct bpf_kprobe_link *kprobe_link,
> + struct pt_regs *regs)
> +{
> + struct bpf_trace_run_ctx run_ctx;
> + struct bpf_run_ctx *old_run_ctx;
> + int err;
> +
> + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> + err = 0;
> + goto out;
> + }
> +
> + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> + run_ctx.bpf_cookie = kprobe_link->bpf_cookie;
> +
> + rcu_read_lock();
> + migrate_disable();
> + err = bpf_prog_run(kprobe_link->link.prog, regs);
> + migrate_enable();
> + rcu_read_unlock();
> +
> + bpf_reset_run_ctx(old_run_ctx);
> +
> + out:
> + __this_cpu_dec(bpf_prog_active);
> + return err;
> +}
> +
> +static void kprobe_link_entry_handler(struct fprobe *fp, unsigned long entry_ip,
> + struct pt_regs *regs)
> +{
> + struct bpf_kprobe_link *kprobe_link;
> +
> + /*
> + * Because fprobe's regs->ip is set to the next instruction of
> + * dynamic-ftrace instruction, correct entry ip must be set, so
> + * that the bpf program can access entry address via regs as same
> + * as kprobes.
> + */
> + instruction_pointer_set(regs, entry_ip);

ok, so this is actually what causes the stall for me.. it changes
the return address back so the call repeats

but I think it's a good idea to carry the original ip in regs
(for the bpf_get_func_ip helper), so I think we need to save it
first and restore it after the callback

I'll make the fix and add cookie change Andrii asked for
on top of your ftrace changes and let you know

thanks,
jirka

> + kprobe_link = container_of(fp, struct bpf_kprobe_link, fp);
> + kprobe_link_prog_run(kprobe_link, regs);
> +}
> +

SNIP

2022-01-21 21:57:58

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH v3 7/9] bpf: Add kprobe link for attaching raw kprobes

On Thu, 20 Jan 2022 11:44:54 +0100
Jiri Olsa <[email protected]> wrote:

> On Wed, Jan 19, 2022 at 11:57:50PM +0900, Masami Hiramatsu wrote:
>
> SNIP
>
> > +static int kprobe_link_prog_run(struct bpf_kprobe_link *kprobe_link,
> > + struct pt_regs *regs)
> > +{
> > + struct bpf_trace_run_ctx run_ctx;
> > + struct bpf_run_ctx *old_run_ctx;
> > + int err;
> > +
> > + if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> > + err = 0;
> > + goto out;
> > + }
> > +
> > + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > + run_ctx.bpf_cookie = kprobe_link->bpf_cookie;
> > +
> > + rcu_read_lock();
> > + migrate_disable();
> > + err = bpf_prog_run(kprobe_link->link.prog, regs);
> > + migrate_enable();
> > + rcu_read_unlock();
> > +
> > + bpf_reset_run_ctx(old_run_ctx);
> > +
> > + out:
> > + __this_cpu_dec(bpf_prog_active);
> > + return err;
> > +}
> > +
> > +static void kprobe_link_entry_handler(struct fprobe *fp, unsigned long entry_ip,
> > + struct pt_regs *regs)
> > +{
> > + struct bpf_kprobe_link *kprobe_link;
> > +
> > + /*
> > + * Because fprobe's regs->ip is set to the next instruction of
> > + * dynamic-ftrace instruction, correct entry ip must be set, so
> > + * that the bpf program can access entry address via regs as same
> > + * as kprobes.
> > + */
> > + instruction_pointer_set(regs, entry_ip);
>
> ok, so this is actually what causes the stall for me.. it changes
> the return address back so the call repeats

Good catch! And don't worry, this change was introduced by me :-P.
I thought that if I didn't add FTRACE_FL_IPMODIFY, ftrace
would ignore the updated regs->ip, but it doesn't.

> but I think it's a good idea to carry the original ip in regs
> (for the bpf_get_func_ip helper), so I think we need to save it
> first and restore it after the callback

Yes, btw, should I do that fix in fprobe?

>
> I'll make the fix and add cookie change Andrii asked for
> on top of your ftrace changes and let you know

OK, thank you!

>
> thanks,
> jirka
>
> > + kprobe_link = container_of(fp, struct bpf_kprobe_link, fp);
> > + kprobe_link_prog_run(kprobe_link, regs);
> > +}
> > +
>
> SNIP
>


--
Masami Hiramatsu <[email protected]>

2022-01-22 00:17:19

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
>
> SNIP
>
> BTW, on the x86, ftrace (with fentry) location address is same as
> symbol address. But on other archs, it will be different (e.g. arm64
> will need 2 instructions to save link-register and call ftrace, the
> 2nd instruction will be the ftrace location.)
> Does libbpf correctly handle it?

libbpf doesn't do anything there. The interface for kprobe is based on
function name and kernel performs name lookups internally to resolve
IP. For fentry it's similar (kernel handles IP resolution), but
instead of function name we specify BTF ID of a function type.

>
> Thank you,
>
> ---
>
> Jiri Olsa (2):
> ftrace: Add ftrace_set_filter_ips function
> bpf: Add kprobe link for attaching raw kprobes
>
> Masami Hiramatsu (7):
> fprobe: Add ftrace based probe APIs
> rethook: Add a generic return hook
> rethook: x86: Add rethook x86 implementation
> fprobe: Add exit_handler support
> fprobe: Add sample program for fprobe
> [DO NOT MERGE] Out-of-tree: Support wildcard symbol option to sample
> [DO NOT MERGE] out-of-tree: kprobes: Use rethook for kretprobe
>
>
> arch/x86/Kconfig | 1
> arch/x86/include/asm/unwind.h | 8 +
> arch/x86/kernel/Makefile | 1
> arch/x86/kernel/kprobes/core.c | 106 --------------
> arch/x86/kernel/rethook.c | 115 +++++++++++++++
> include/linux/bpf_types.h | 1
> include/linux/fprobe.h | 84 +++++++++++
> include/linux/ftrace.h | 3
> include/linux/kprobes.h | 85 +----------
> include/linux/rethook.h | 99 +++++++++++++
> include/linux/sched.h | 4 -
> include/uapi/linux/bpf.h | 12 ++
> kernel/bpf/syscall.c | 195 +++++++++++++++++++++++++-
> kernel/exit.c | 3
> kernel/fork.c | 4 -
> kernel/kallsyms.c | 1
> kernel/kprobes.c | 265 +++++------------------------------
> kernel/trace/Kconfig | 22 +++
> kernel/trace/Makefile | 2
> kernel/trace/fprobe.c | 179 ++++++++++++++++++++++++
> kernel/trace/ftrace.c | 54 ++++++-
> kernel/trace/rethook.c | 295 +++++++++++++++++++++++++++++++++++++++
> kernel/trace/trace_kprobe.c | 4 -
> kernel/trace/trace_output.c | 2
> samples/Kconfig | 7 +
> samples/Makefile | 1
> samples/fprobe/Makefile | 3
> samples/fprobe/fprobe_example.c | 154 ++++++++++++++++++++
> tools/include/uapi/linux/bpf.h | 12 ++
> 29 files changed, 1283 insertions(+), 439 deletions(-)
> create mode 100644 arch/x86/kernel/rethook.c
> create mode 100644 include/linux/fprobe.h
> create mode 100644 include/linux/rethook.h
> create mode 100644 kernel/trace/fprobe.c
> create mode 100644 kernel/trace/rethook.c
> create mode 100644 samples/fprobe/Makefile
> create mode 100644 samples/fprobe/fprobe_example.c
>
> --
> Masami Hiramatsu (Linaro) <[email protected]>

2022-01-22 00:33:35

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Thu, 20 Jan 2022 14:24:15 -0800
Andrii Nakryiko <[email protected]> wrote:

> On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
> >
> > SNIP
> >
> > BTW, on the x86, ftrace (with fentry) location address is same as
> > symbol address. But on other archs, it will be different (e.g. arm64
> > will need 2 instructions to save link-register and call ftrace, the
> > 2nd instruction will be the ftrace location.)
> > Does libbpf correctly handle it?
>
> libbpf doesn't do anything there. The interface for kprobe is based on
> function name and kernel performs name lookups internally to resolve
> IP. For fentry it's similar (kernel handles IP resolution), but
> instead of function name we specify BTF ID of a function type.

Hmm, according to Jiri's original patch, it seems to pass an array of
addresses, so I thought those had been resolved by libbpf.

+ struct {
+ __aligned_u64 addrs;
+ __u32 cnt;
+ __u64 bpf_cookie;
+ } kprobe;

Anyway, fprobe itself also has the same issue. I'll try to fix it.

Thank you!

SNIP


--
Masami Hiramatsu <[email protected]>

2022-01-22 02:04:15

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Thu, Jan 20, 2022 at 8:55 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Thu, 20 Jan 2022 14:24:15 -0800
> Andrii Nakryiko <[email protected]> wrote:
>
> > On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
> > >
SNIP
> > >
> > > BTW, on the x86, ftrace (with fentry) location address is same as
> > > symbol address. But on other archs, it will be different (e.g. arm64
> > > will need 2 instructions to save link-register and call ftrace, the
> > > 2nd instruction will be the ftrace location.)
> > > Does libbpf correctly handle it?
> >
> > libbpf doesn't do anything there. The interface for kprobe is based on
> > function name and kernel performs name lookups internally to resolve
> > IP. For fentry it's similar (kernel handles IP resolution), but
> > instead of function name we specify BTF ID of a function type.
>
> Hmm, according to Jiri's original patch, it seems to pass an array of
> addresses. So I thought that has been resolved by libbpf.
>
> + struct {
> + __aligned_u64 addrs;

I think this is a pointer to an array of pointers to zero-terminated C strings

> + __u32 cnt;
> + __u64 bpf_cookie;
> + } kprobe;
>
> Anyway, fprobe itself also has same issue. I'll try to fix it.
>
> Thank you!

2022-01-24 14:31:34

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Fri, Jan 21, 2022 at 09:29:00AM -0800, Andrii Nakryiko wrote:
> On Thu, Jan 20, 2022 at 8:55 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Thu, 20 Jan 2022 14:24:15 -0800
> > Andrii Nakryiko <[email protected]> wrote:
> >
> > > On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
> > > >
SNIP
> > > > BTW, on the x86, ftrace (with fentry) location address is same as
> > > > symbol address. But on other archs, it will be different (e.g. arm64
> > > > will need 2 instructions to save link-register and call ftrace, the
> > > > 2nd instruction will be the ftrace location.)
> > > > Does libbpf correctly handle it?

hm, I'm probably missing something, but should this be handled by arm-specific
kernel code? the user passes whatever is found in kallsyms, right?


> > >
> > > libbpf doesn't do anything there. The interface for kprobe is based on
> > > function name and kernel performs name lookups internally to resolve
> > > IP. For fentry it's similar (kernel handles IP resolution), but
> > > instead of function name we specify BTF ID of a function type.
> >
> > Hmm, according to Jiri's original patch, it seems to pass an array of
> > addresses. So I thought that has been resolved by libbpf.
> >
> > + struct {
> > + __aligned_u64 addrs;
>
> I think this is a pointer to an array of pointers to zero-terminated C strings

I used direct addresses, because bpftrace already has them, so there was
no point passing strings. I can add support for that

jirka

> >
> > >
> > > >
> > > > Thank you,
> > > >
> > > > ---
> > > >
> > > > Jiri Olsa (2):
> > > > ftrace: Add ftrace_set_filter_ips function
> > > > bpf: Add kprobe link for attaching raw kprobes
> > > >
> > > > Masami Hiramatsu (7):
> > > > fprobe: Add ftrace based probe APIs
> > > > rethook: Add a generic return hook
> > > > rethook: x86: Add rethook x86 implementation
> > > > fprobe: Add exit_handler support
> > > > fprobe: Add sample program for fprobe
> > > > [DO NOT MERGE] Out-of-tree: Support wildcard symbol option to sample
> > > > [DO NOT MERGE] out-of-tree: kprobes: Use rethook for kretprobe
> > > >
> > > >
> > > > arch/x86/Kconfig | 1
> > > > arch/x86/include/asm/unwind.h | 8 +
> > > > arch/x86/kernel/Makefile | 1
> > > > arch/x86/kernel/kprobes/core.c | 106 --------------
> > > > arch/x86/kernel/rethook.c | 115 +++++++++++++++
> > > > include/linux/bpf_types.h | 1
> > > > include/linux/fprobe.h | 84 +++++++++++
> > > > include/linux/ftrace.h | 3
> > > > include/linux/kprobes.h | 85 +----------
> > > > include/linux/rethook.h | 99 +++++++++++++
> > > > include/linux/sched.h | 4 -
> > > > include/uapi/linux/bpf.h | 12 ++
> > > > kernel/bpf/syscall.c | 195 +++++++++++++++++++++++++-
> > > > kernel/exit.c | 3
> > > > kernel/fork.c | 4 -
> > > > kernel/kallsyms.c | 1
> > > > kernel/kprobes.c | 265 +++++------------------------------
> > > > kernel/trace/Kconfig | 22 +++
> > > > kernel/trace/Makefile | 2
> > > > kernel/trace/fprobe.c | 179 ++++++++++++++++++++++++
> > > > kernel/trace/ftrace.c | 54 ++++++-
> > > > kernel/trace/rethook.c | 295 +++++++++++++++++++++++++++++++++++++++
> > > > kernel/trace/trace_kprobe.c | 4 -
> > > > kernel/trace/trace_output.c | 2
> > > > samples/Kconfig | 7 +
> > > > samples/Makefile | 1
> > > > samples/fprobe/Makefile | 3
> > > > samples/fprobe/fprobe_example.c | 154 ++++++++++++++++++++
> > > > tools/include/uapi/linux/bpf.h | 12 ++
> > > > 29 files changed, 1283 insertions(+), 439 deletions(-)
> > > > create mode 100644 arch/x86/kernel/rethook.c
> > > > create mode 100644 include/linux/fprobe.h
> > > > create mode 100644 include/linux/rethook.h
> > > > create mode 100644 kernel/trace/fprobe.c
> > > > create mode 100644 kernel/trace/rethook.c
> > > > create mode 100644 samples/fprobe/Makefile
> > > > create mode 100644 samples/fprobe/fprobe_example.c
> > > >
> > > > --
> > > > Masami Hiramatsu (Linaro) <[email protected]>
> >
> >
> > --
> > Masami Hiramatsu <[email protected]>
>

2022-01-24 15:03:37

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Mon, 24 Jan 2022 00:50:13 +0100
Jiri Olsa <[email protected]> wrote:

> On Fri, Jan 21, 2022 at 09:29:00AM -0800, Andrii Nakryiko wrote:
> > On Thu, Jan 20, 2022 at 8:55 PM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > On Thu, 20 Jan 2022 14:24:15 -0800
> > > Andrii Nakryiko <[email protected]> wrote:
> > >
> > > > On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
> > > > >
SNIP
> > > > >
> > > > > BTW, on the x86, ftrace (with fentry) location address is same as
> > > > > symbol address. But on other archs, it will be different (e.g. arm64
> > > > > will need 2 instructions to save link-register and call ftrace, the
> > > > > 2nd instruction will be the ftrace location.)
> > > > > Does libbpf correctly handle it?
>
> hm, I'm probably missing something, but should this be handled by arm
> specific kernel code? user passes whatever is found in kallsyms, right?

On x86, the fentry nop is always placed at the first instruction of the
function, but other arches can't do that if they use the LR (link register)
for storing the return address instead of the stack. E.g. arm64 saves the LR
and then calls ftrace, so the ftrace location address of a function is not
the symbol address.

Anyway, I updated fprobe to handle those cases. I also found some issues
in rethook, so let me update the series again.

> > > >
> > > > libbpf doesn't do anything there. The interface for kprobe is based on
> > > > function name and kernel performs name lookups internally to resolve
> > > > IP. For fentry it's similar (kernel handles IP resolution), but
> > > > instead of function name we specify BTF ID of a function type.
> > >
> > > Hmm, according to Jiri's original patch, it seems to pass an array of
> > > addresses. So I thought that has been resolved by libbpf.
> > >
> > > + struct {
> > > + __aligned_u64 addrs;
> >
> > I think this is a pointer to an array of pointers to zero-terminated C strings
>
> > I used direct addresses, because bpftrace already has them, so there was
> > no point passing strings. I can add support for that

So now both a direct address array and a symbol array are OK.

Thank you,

--
Masami Hiramatsu <[email protected]>

2022-01-24 19:11:45

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Mon, Jan 24, 2022 at 09:24:05AM +0900, Masami Hiramatsu wrote:
> On Mon, 24 Jan 2022 00:50:13 +0100
> Jiri Olsa <[email protected]> wrote:
>
> > On Fri, Jan 21, 2022 at 09:29:00AM -0800, Andrii Nakryiko wrote:
> > > On Thu, Jan 20, 2022 at 8:55 PM Masami Hiramatsu <[email protected]> wrote:
> > > >
> > > > On Thu, 20 Jan 2022 14:24:15 -0800
> > > > Andrii Nakryiko <[email protected]> wrote:
> > > >
> > > > > On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
> > > > > >
SNIP
> > > > > >
> > > > > > BTW, on the x86, ftrace (with fentry) location address is same as
> > > > > > symbol address. But on other archs, it will be different (e.g. arm64
> > > > > > will need 2 instructions to save link-register and call ftrace, the
> > > > > > 2nd instruction will be the ftrace location.)
> > > > > > Does libbpf correctly handle it?
> >
> > hm, I'm probably missing something, but should this be handled by arm
> > specific kernel code? user passes whatever is found in kallsyms, right?
>
> In x86, fentry nop is always placed at the first instruction of the function,
> but the other arches couldn't do that if they use LR (link register) for
> storing return address instead of stack. E.g. arm64 saves lr and call the
> ftrace. Then ftrace location address of a function is not the symbol address.
>
> Anyway, I updated fprobe to handle those cases. I also found some issues
> on rethook, so let me update the series again.

great, I reworked the bpf fprobe link change and need to add the
symbol attachment support, so you don't need to include it in the
new version.. I'll rebase it and send it on top of your patchset

thanks,
jirka


2022-01-24 23:28:35

by Jiri Olsa

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Mon, Jan 24, 2022 at 12:22:10PM -0800, Andrii Nakryiko wrote:

SNIP

> > > > > > > > (This testing patch is just for confirming the rethook is correctly
> > > > > > > > implemented.)
> > > > > > > >
> > > > > > > > BTW, on the x86, ftrace (with fentry) location address is same as
> > > > > > > > symbol address. But on other archs, it will be different (e.g. arm64
> > > > > > > > will need 2 instructions to save link-register and call ftrace, the
> > > > > > > > 2nd instruction will be the ftrace location.)
> > > > > > > > Does libbpf correctly handle it?
> > > >
> > > > hm, I'm probably missing something, but should this be handled by arm
> > > > specific kernel code? user passes whatever is found in kallsyms, right?
> > >
> > > In x86, fentry nop is always placed at the first instruction of the function,
> > > but the other arches couldn't do that if they use LR (link register) for
> > > storing return address instead of stack. E.g. arm64 saves lr and call the
> > > ftrace. Then ftrace location address of a function is not the symbol address.
> > >
> > > Anyway, I updated fprobe to handle those cases. I also found some issues
> > > on rethook, so let me update the series again.
> >
> > great, I reworked the bpf fprobe link change and need to add the
> > symbols attachment support, so you don't need to include it in
> > new version.. I'll rebase it and send on top of your patchset
>
> Using just addresses (IPs) for retsnoop and bpftrace is fine because
> such generic tools are already parsing kallsyms and probably building
> some lookup table. But in general, having IP-based attachment is a
> regression from current perf_event_open-based kprobe, where user is
> expected to pass symbolic function name. Using IPs has an advantage of
> being unambiguous (e.g., when same static function name in kernel
> belongs to multiple actual functions), so there is that. But I was
> also wondering wouldn't kernel need to do symbol to IP resolution
> anyways just to check that we are attaching to function entry?

ftrace does its own check for the address to attach; it keeps a record
of every attachable address.. so less work for us ;-)

>
> I'll wait for your patch set to see how did you go about it in a new revision.

I agree we should have support for using symbols as well, I'll add it

jirka

2022-01-25 00:33:27

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Mon, Jan 24, 2022 at 4:21 AM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Jan 24, 2022 at 09:24:05AM +0900, Masami Hiramatsu wrote:
> > On Mon, 24 Jan 2022 00:50:13 +0100
> > Jiri Olsa <[email protected]> wrote:
> >
> > > On Fri, Jan 21, 2022 at 09:29:00AM -0800, Andrii Nakryiko wrote:
> > > > On Thu, Jan 20, 2022 at 8:55 PM Masami Hiramatsu <[email protected]> wrote:
> > > > >
> > > > > On Thu, 20 Jan 2022 14:24:15 -0800
> > > > > Andrii Nakryiko <[email protected]> wrote:
> > > > >
> > > > > > On Wed, Jan 19, 2022 at 6:56 AM Masami Hiramatsu <[email protected]> wrote:
> > > > > > >
SNIP
> > > > > > >
> > > > > > > BTW, on the x86, ftrace (with fentry) location address is same as
> > > > > > > symbol address. But on other archs, it will be different (e.g. arm64
> > > > > > > will need 2 instructions to save link-register and call ftrace, the
> > > > > > > 2nd instruction will be the ftrace location.)
> > > > > > > Does libbpf correctly handle it?
> > >
> > > hm, I'm probably missing something, but should this be handled by arm
> > > specific kernel code? user passes whatever is found in kallsyms, right?
> >
> > In x86, fentry nop is always placed at the first instruction of the function,
> > but the other arches couldn't do that if they use LR (link register) for
> > storing return address instead of stack. E.g. arm64 saves lr and call the
> > ftrace. Then ftrace location address of a function is not the symbol address.
> >
> > Anyway, I updated fprobe to handle those cases. I also found some issues
> > on rethook, so let me update the series again.
>
> great, I reworked the bpf fprobe link change and need to add the
> symbols attachment support, so you don't need to include it in
> new version.. I'll rebase it and send on top of your patchset

Using just addresses (IPs) for retsnoop and bpftrace is fine because
such generic tools are already parsing kallsyms and probably building
some lookup table. But in general, having IP-based attachment is a
regression from the current perf_event_open-based kprobe, where the user
is expected to pass a symbolic function name. Using IPs has the advantage
of being unambiguous (e.g., when the same static function name in the
kernel belongs to multiple actual functions), so there is that. But I was
also wondering, wouldn't the kernel need to do symbol-to-IP resolution
anyway just to check that we are attaching to a function entry?

I'll wait for your patch set to see how you went about it in the new revision.



2022-01-25 08:56:34

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [RFC PATCH v3 0/9] fprobe: Introduce fprobe function entry/exit probe

On Mon, Jan 24, 2022 at 2:23 PM Jiri Olsa <[email protected]> wrote:
>
> On Mon, Jan 24, 2022 at 12:22:10PM -0800, Andrii Nakryiko wrote:
>
> SNIP
>
> > >
> > > great, I reworked the bpf fprobe link change and need to add the
> > > symbols attachment support, so you don't need to include it in
> > > new version.. I'll rebase it and send on top of your patchset
> >
> > Using just addresses (IPs) for retsnoop and bpftrace is fine because
> > such generic tools are already parsing kallsyms and probably building
> > some lookup table. But in general, having IP-based attachment is a
> > regression from current perf_event_open-based kprobe, where user is
> > expected to pass symbolic function name. Using IPs has an advantage of
> > being unambiguous (e.g., when same static function name in kernel
> > belongs to multiple actual functions), so there is that. But I was
> > also wondering wouldn't kernel need to do symbol to IP resolution
> > anyways just to check that we are attaching to function entry?
>
> ftrace does its own check for address to attach, it keeps record
> for every attachable address.. so less work for us ;-)

Oh, makes sense, thanks!

>
> >
> > I'll wait for your patch set to see how did you go about it in a new revision.
>
> I agree we should have the support to use symbols as well, I'll add it

sounds good, thanks
