2022-01-25 16:51:44

by Masami Hiramatsu

Subject: [PATCH v5 0/9] fprobe: Introduce fprobe function entry/exit probe

Hi,

Here is the 5th version of fprobe. This version fixes some build
issues in fprobe and rethook (thanks to the kernel test robot for
the hints!). The previous version is here [1]:

[1] https://lore.kernel.org/all/164304056155.1680787.14081905648619647218.stgit@devnote2/

This series introduces fprobe, a function entry/exit probe with
support for multiple probe points. It also introduces rethook for
hooking function returns, in the same way kretprobe does. This
abstraction will help us generalize the fgraph tracer, because
fprobe can switch its return hooking from rethook to fgraph,
depending on the kernel configuration.

The patch [1/9] is from Jiri's series[2].

[2] https://lore.kernel.org/all/[email protected]/T/#u


Thank you,

---

Jiri Olsa (1):
ftrace: Add ftrace_set_filter_ips function

Masami Hiramatsu (8):
fprobe: Add ftrace based probe APIs
rethook: Add a generic return hook
rethook: x86: Add rethook x86 implementation
ARM: rethook: Add rethook arm implementation
arm64: rethook: Add arm64 rethook implementation
fprobe: Add exit_handler support
fprobe: Add sample program for fprobe
docs: fprobe: Add fprobe description to ftrace-use.rst


Documentation/trace/fprobe.rst | 131 +++++++++++
Documentation/trace/index.rst | 1
arch/arm/Kconfig | 1
arch/arm/include/asm/stacktrace.h | 4
arch/arm/kernel/stacktrace.c | 6
arch/arm/probes/Makefile | 1
arch/arm/probes/rethook.c | 71 ++++++
arch/arm64/Kconfig | 1
arch/arm64/include/asm/stacktrace.h | 2
arch/arm64/kernel/probes/Makefile | 1
arch/arm64/kernel/probes/rethook.c | 25 ++
arch/arm64/kernel/probes/rethook_trampoline.S | 87 +++++++
arch/arm64/kernel/stacktrace.c | 7 -
arch/x86/Kconfig | 1
arch/x86/include/asm/unwind.h | 8 +
arch/x86/kernel/Makefile | 1
arch/x86/kernel/kprobes/common.h | 1
arch/x86/kernel/rethook.c | 115 +++++++++
include/linux/fprobe.h | 86 +++++++
include/linux/ftrace.h | 3
include/linux/rethook.h | 99 ++++++++
include/linux/sched.h | 3
kernel/exit.c | 2
kernel/fork.c | 3
kernel/trace/Kconfig | 25 ++
kernel/trace/Makefile | 2
kernel/trace/fprobe.c | 198 ++++++++++++++++
kernel/trace/ftrace.c | 53 ++++
kernel/trace/rethook.c | 311 +++++++++++++++++++++++++
samples/Kconfig | 7 +
samples/Makefile | 1
samples/fprobe/Makefile | 3
samples/fprobe/fprobe_example.c | 103 ++++++++
33 files changed, 1349 insertions(+), 14 deletions(-)
create mode 100644 Documentation/trace/fprobe.rst
create mode 100644 arch/arm/probes/rethook.c
create mode 100644 arch/arm64/kernel/probes/rethook.c
create mode 100644 arch/arm64/kernel/probes/rethook_trampoline.S
create mode 100644 arch/x86/kernel/rethook.c
create mode 100644 include/linux/fprobe.h
create mode 100644 include/linux/rethook.h
create mode 100644 kernel/trace/fprobe.c
create mode 100644 kernel/trace/rethook.c
create mode 100644 samples/fprobe/Makefile
create mode 100644 samples/fprobe/fprobe_example.c

--
Masami Hiramatsu (Linaro) <[email protected]>


2022-01-25 16:51:45

by Masami Hiramatsu

Subject: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

The fprobe is a wrapper API for the ftrace function tracer.
Unlike kprobes, this probe only supports function entry, but
it can probe multiple functions with one fprobe. The usage is
almost the same as kprobe: the user specifies the function names
via fprobe::syms, the number of syms via fprobe::nentry,
and the user handler via fprobe::entry_handler.

struct fprobe fp = { 0 };
const char *targets[] = { "func1", "func2", "func3" };

fp.entry_handler = user_handler;
fp.nentry = ARRAY_SIZE(targets);
fp.syms = targets;

ret = register_fprobe(&fp);

CAUTION: if the user entry handler changes registers, including
the instruction pointer, those changes take effect when the entry
handler returns. So the user handler must restore them.
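For illustration, a minimal entry handler matching the callback
signature declared in this patch's include/linux/fprobe.h might look
like the following sketch (the handler body, the pr_info message, and
the target symbol names are hypothetical; the complete sample module
is added in patch [8/9]):

```c
#include <linux/fprobe.h>
#include <linux/kernel.h>
#include <linux/module.h>

/* Hypothetical handler: just log the probed entry address. */
static void user_handler(struct fprobe *fp, unsigned long entry_ip,
			 struct pt_regs *regs)
{
	pr_info("entered probed function at 0x%lx\n", entry_ip);
}

static struct fprobe fp;
static const char *targets[] = { "func1", "func2", "func3" };

static int __init sample_init(void)
{
	fp.entry_handler = user_handler;
	fp.nentry = ARRAY_SIZE(targets);
	fp.syms = targets;

	/* Returns 0 on success, or a negative errno on failure. */
	return register_fprobe(&fp);
}

static void __exit sample_exit(void)
{
	unregister_fprobe(&fp);
}

module_init(sample_init);
module_exit(sample_exit);
MODULE_LICENSE("GPL");
```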

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v4:
- Fix a memory leak when symbol lookup failed.
- Use ftrace location address instead of symbol address.
- Convert the given symbol address to ftrace location automatically.
- Rename fprobe::ftrace to fprobe::ops.
- Update the Kconfig description.
---
include/linux/fprobe.h | 80 ++++++++++++++++++++++++++++
kernel/trace/Kconfig | 12 ++++
kernel/trace/Makefile | 1
kernel/trace/fprobe.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 228 insertions(+)
create mode 100644 include/linux/fprobe.h
create mode 100644 kernel/trace/fprobe.c

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
new file mode 100644
index 000000000000..f7de332b08c2
--- /dev/null
+++ b/include/linux/fprobe.h
@@ -0,0 +1,80 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Simple ftrace probe wrapper */
+#ifndef _LINUX_FPROBE_H
+#define _LINUX_FPROBE_H
+
+#include <linux/compiler.h>
+#include <linux/ftrace.h>
+
+/**
+ * struct fprobe - ftrace based probe.
+ * @syms: The array of symbols to probe.
+ * @addrs: The array of ftrace address of the symbols.
+ * @nentry: The number of entries of @syms or @addrs.
+ * @ops: The ftrace_ops.
+ * @nmissed: The counter for missing events.
+ * @flags: The status flag.
+ * @entry_handler: The callback function for function entry.
+ *
+ * User must set either @syms or @addrs, but not both. If user sets
+ * only @syms, the @addrs are generated when registering the fprobe.
+ * That auto-generated @addrs will be freed when unregistering.
+ */
+struct fprobe {
+ const char **syms;
+ unsigned long *addrs;
+ unsigned int nentry;
+
+ struct ftrace_ops ops;
+ unsigned long nmissed;
+ unsigned int flags;
+ void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+};
+
+#define FPROBE_FL_DISABLED 1
+
+static inline bool fprobe_disabled(struct fprobe *fp)
+{
+ return (fp) ? fp->flags & FPROBE_FL_DISABLED : false;
+}
+
+#ifdef CONFIG_FPROBE
+int register_fprobe(struct fprobe *fp);
+int unregister_fprobe(struct fprobe *fp);
+#else
+static inline int register_fprobe(struct fprobe *fp)
+{
+ return -EOPNOTSUPP;
+}
+static inline int unregister_fprobe(struct fprobe *fp)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
+/**
+ * disable_fprobe() - Disable fprobe
+ * @fp: The fprobe to be disabled.
+ *
+ * This will soft-disable @fp. Note that this doesn't remove the ftrace
+ * hooks from the function entry.
+ */
+static inline void disable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags |= FPROBE_FL_DISABLED;
+}
+
+/**
+ * enable_fprobe() - Enable fprobe
+ * @fp: The fprobe to be enabled.
+ *
+ * This will soft-enable @fp.
+ */
+static inline void enable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags &= ~FPROBE_FL_DISABLED;
+}
+
+#endif
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 420ff4bc67fd..23483dd474b0 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -223,6 +223,18 @@ config DYNAMIC_FTRACE_WITH_ARGS
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS

+config FPROBE
+ bool "Kernel Function Probe (fprobe)"
+ depends on FUNCTION_TRACER
+ depends on DYNAMIC_FTRACE_WITH_REGS
+ default n
+ help
+ This option enables kernel function probe (fprobe) based on ftrace,
+ which is similar to kprobes, but probes only kernel function
+ entries, and can probe multiple functions with one fprobe.
+
+ If unsure, say N.
+
config FUNCTION_PROFILER
bool "Kernel function profiler"
depends on FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index bedc5caceec7..79255f9de9a4 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
+obj-$(CONFIG_FPROBE) += fprobe.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
new file mode 100644
index 000000000000..748cc34765c1
--- /dev/null
+++ b/kernel/trace/fprobe.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fprobe - Simple ftrace probe wrapper for function entry.
+ */
+#define pr_fmt(fmt) "fprobe: " fmt
+
+#include <linux/fprobe.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+ struct fprobe *fp;
+ int bit;
+
+ fp = container_of(ops, struct fprobe, ops);
+ if (fprobe_disabled(fp))
+ return;
+
+ bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ if (bit < 0) {
+ fp->nmissed++;
+ return;
+ }
+
+ if (fp->entry_handler)
+ fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
+
+ ftrace_test_recursion_unlock(bit);
+}
+NOKPROBE_SYMBOL(fprobe_handler);
+
+/* Convert the symbols to their ftrace location addresses */
+static int convert_func_addresses(struct fprobe *fp)
+{
+ unsigned long addr, size;
+ unsigned int i;
+
+ /* Convert symbols to symbol address */
+ if (fp->syms) {
+ fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
+ if (!fp->addrs)
+ return -ENOMEM;
+
+ for (i = 0; i < fp->nentry; i++) {
+ fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
+ if (!fp->addrs[i]) /* Maybe wrong symbol */
+ goto error;
+ }
+ }
+
+ /* Convert symbol address to ftrace location. */
+ for (i = 0; i < fp->nentry; i++) {
+ if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
+ size = MCOUNT_INSN_SIZE;
+ addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);
+ if (!addr) /* No dynamic ftrace there. */
+ goto error;
+ fp->addrs[i] = addr;
+ }
+
+ return 0;
+
+error:
+ kfree(fp->addrs);
+ fp->addrs = NULL;
+ return -ENOENT;
+}
+
+/**
+ * register_fprobe() - Register fprobe to ftrace
+ * @fp: A fprobe data structure to be registered.
+ *
+ * This expects the user to set @fp::entry_handler, @fp::syms or @fp::addrs,
+ * and @fp::nentry. If @fp::addrs is set, it will be updated to point to
+ * the ftrace locations. If @fp::addrs is NULL, this will generate it
+ * from @fp::syms.
+ * Note that you must not set both @fp::addrs and @fp::syms.
+ */
+int register_fprobe(struct fprobe *fp)
+{
+ int ret;
+
+ if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
+ (fp->syms && fp->addrs))
+ return -EINVAL;
+
+ ret = convert_func_addresses(fp);
+ if (ret < 0)
+ return ret;
+
+ fp->nmissed = 0;
+ fp->ops.func = fprobe_handler;
+ fp->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
+
+ ret = ftrace_set_filter_ips(&fp->ops, fp->addrs, fp->nentry, 0, 0);
+ if (!ret)
+ ret = register_ftrace_function(&fp->ops);
+
+ if (ret < 0 && fp->syms) {
+ kfree(fp->addrs);
+ fp->addrs = NULL;
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_fprobe);
+
+/**
+ * unregister_fprobe() - Unregister fprobe from ftrace
+ * @fp: A fprobe data structure to be unregistered.
+ *
+ * Unregister fprobe (and remove ftrace hooks from the function entries).
+ * If the @fp::addrs are generated by register_fprobe(), it will be removed
+ * automatically.
+ */
+int unregister_fprobe(struct fprobe *fp)
+{
+ int ret;
+
+ if (!fp || !fp->nentry || !fp->addrs)
+ return -EINVAL;
+
+ ret = unregister_ftrace_function(&fp->ops);
+
+ if (!ret && fp->syms) {
+ kfree(fp->addrs);
+ fp->addrs = NULL;
+ }
+ return ret;
+}
+EXPORT_SYMBOL_GPL(unregister_fprobe);

2022-01-25 16:51:54

by Masami Hiramatsu

Subject: [PATCH v5 4/9] rethook: x86: Add rethook x86 implementation

Add an x86 implementation of rethook. Most of the code
has been copied from kretprobe on x86.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Fix a build error if !CONFIG_KRETPROBES and !CONFIG_RETHOOK.
Changes in v4:
- Fix stack backtrace in the same way kretprobe does.
---
arch/x86/Kconfig | 1
arch/x86/include/asm/unwind.h | 8 ++-
arch/x86/kernel/Makefile | 1
arch/x86/kernel/kprobes/common.h | 1
arch/x86/kernel/rethook.c | 115 ++++++++++++++++++++++++++++++++++++++
5 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/rethook.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5c2ccb85f2ef..0a7d48a63787 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -219,6 +219,7 @@ config X86
select HAVE_KPROBES_ON_FTRACE
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_KRETPROBES
+ select HAVE_RETHOOK
select HAVE_KVM
select HAVE_LIVEPATCH if X86_64
select HAVE_MIXED_BREAKPOINTS_REGS
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 2a1f8734416d..192df5b2094d 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -5,6 +5,7 @@
#include <linux/sched.h>
#include <linux/ftrace.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <asm/ptrace.h>
#include <asm/stacktrace.h>

@@ -16,7 +17,7 @@ struct unwind_state {
unsigned long stack_mask;
struct task_struct *task;
int graph_idx;
-#ifdef CONFIG_KRETPROBES
+#if defined(CONFIG_KRETPROBES) || defined(CONFIG_RETHOOK)
struct llist_node *kr_cur;
#endif
bool error;
@@ -107,6 +108,11 @@ static inline
unsigned long unwind_recover_kretprobe(struct unwind_state *state,
unsigned long addr, unsigned long *addr_p)
{
+#ifdef CONFIG_RETHOOK
+ if (is_rethook_trampoline(addr))
+ return rethook_find_ret_addr(state->task, (unsigned long)addr_p,
+ &state->kr_cur);
+#endif
#ifdef CONFIG_KRETPROBES
return is_kretprobe_trampoline(addr) ?
kretprobe_find_ret_addr(state->task, addr_p, &state->kr_cur) :
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2ff3e600f426..66593d8c4d74 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -106,6 +106,7 @@ obj-$(CONFIG_FUNCTION_GRAPH_TRACER) += ftrace.o
obj-$(CONFIG_FTRACE_SYSCALLS) += ftrace.o
obj-$(CONFIG_X86_TSC) += trace_clock.o
obj-$(CONFIG_TRACING) += trace.o
+obj-$(CONFIG_RETHOOK) += rethook.o
obj-$(CONFIG_CRASH_CORE) += crash_core_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC_CORE) += relocate_kernel_$(BITS).o crash.o
diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index 7d3a2e2daf01..c993521d4933 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -6,6 +6,7 @@

#include <asm/asm.h>
#include <asm/frame.h>
+#include <asm/insn.h>

#ifdef CONFIG_X86_64

diff --git a/arch/x86/kernel/rethook.c b/arch/x86/kernel/rethook.c
new file mode 100644
index 000000000000..f2f3b9526e43
--- /dev/null
+++ b/arch/x86/kernel/rethook.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * x86 implementation of rethook. Mostly copied from arch/x86/kernel/kprobes/core.c.
+ */
+#include <linux/bug.h>
+#include <linux/rethook.h>
+#include <linux/kprobes.h>
+
+#include "kprobes/common.h"
+
+/*
+ * Called from arch_rethook_trampoline
+ */
+__used __visible void arch_rethook_trampoline_callback(struct pt_regs *regs)
+{
+ unsigned long *frame_pointer;
+
+ /* fixup registers */
+ regs->cs = __KERNEL_CS;
+#ifdef CONFIG_X86_32
+ regs->gs = 0;
+#endif
+ regs->ip = (unsigned long)&arch_rethook_trampoline;
+ regs->orig_ax = ~0UL;
+ regs->sp += sizeof(long);
+ frame_pointer = &regs->sp + 1;
+
+ /*
+ * The return address at 'frame_pointer' is recovered by the
+ * arch_rethook_fixup_return() which is called from the
+ * rethook_trampoline_handler() below.
+ */
+ rethook_trampoline_handler(regs, (unsigned long)frame_pointer);
+
+ /*
+ * Copy FLAGS to 'pt_regs::sp' so that arch_rethook_trampoline()
+ * can do RET right after POPF.
+ */
+ regs->sp = regs->flags;
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+
+/*
+ * When a target function returns, this code saves registers and calls
+ * arch_rethook_trampoline_callback(), which calls the rethook handler.
+ */
+asm(
+ ".text\n"
+ ".global arch_rethook_trampoline\n"
+ ".type arch_rethook_trampoline, @function\n"
+ "arch_rethook_trampoline:\n"
+#ifdef CONFIG_X86_64
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushq $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 8', this will be fixed later. */
+ " pushq %rsp\n"
+ " pushfq\n"
+ SAVE_REGS_STRING
+ " movq %rsp, %rdi\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addq $8, %rsp\n"
+ " popfq\n"
+#else
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushl $arch_rethook_trampoline\n"
+ UNWIND_HINT_FUNC
+ /* Save the 'sp - 4', this will be fixed later. */
+ " pushl %esp\n"
+ " pushfl\n"
+ SAVE_REGS_STRING
+ " movl %esp, %eax\n"
+ " call arch_rethook_trampoline_callback\n"
+ RESTORE_REGS_STRING
+ /* In the callback function, 'regs->flags' is copied to 'regs->sp'. */
+ " addl $4, %esp\n"
+ " popfl\n"
+#endif
+ " ret\n"
+ ".size arch_rethook_trampoline, .-arch_rethook_trampoline\n"
+);
+NOKPROBE_SYMBOL(arch_rethook_trampoline);
+/*
+ * arch_rethook_trampoline() skips updating frame pointer. The frame pointer
+ * saved in arch_rethook_trampoline_callback() points to the real caller
+ * function's frame pointer. Thus the arch_rethook_trampoline() doesn't have
+ * a standard stack frame with CONFIG_FRAME_POINTER=y.
+ * Let's mark it non-standard function. Anyway, FP unwinder can correctly
+ * unwind without the hint.
+ */
+STACK_FRAME_NON_STANDARD_FP(arch_rethook_trampoline);
+
+/* This is called from rethook_trampoline_handler(). */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ unsigned long *frame_pointer = &regs->sp + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = correct_ret_addr;
+}
+
+void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs)
+{
+ unsigned long *stack = (unsigned long *)regs->sp;
+
+ rh->ret_addr = stack[0];
+ rh->frame = regs->sp;
+
+ /* Replace the return addr with trampoline addr */
+ stack[0] = (unsigned long) arch_rethook_trampoline;
+}
+NOKPROBE_SYMBOL(arch_rethook_prepare);

2022-01-25 16:51:56

by Masami Hiramatsu

Subject: [PATCH v5 3/9] rethook: Add a generic return hook

Add a return hook framework which hooks function returns. Most
of the idea came from kretprobe, but this is independent of
kretprobe.
Note that this is expected to be used with other function entry
hooking features, like ftrace, fprobe, and kprobes. Eventually
this will replace kretprobe (i.e. kprobe + rethook = kretprobe),
but at this moment, this is just an additional hook.
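As a hedged sketch of how a caller might use the API added here
(the my_node embedding, pool size, and handler body are hypothetical;
the entry-side hook is assumed to come from ftrace, fprobe, or a
similar mechanism that runs with preemption disabled):

```c
#include <linux/kernel.h>
#include <linux/rethook.h>
#include <linux/slab.h>

/* Hypothetical per-invocation data, embedding struct rethook_node. */
struct my_node {
	struct rethook_node node;	/* must be embedded */
	unsigned long entry_ip;		/* caller-defined payload */
};

/* Called on function return via the arch trampoline. */
static void my_ret_handler(struct rethook_node *rhn, void *data,
			   struct pt_regs *regs)
{
	struct my_node *mn = container_of(rhn, struct my_node, node);

	pr_info("return from function entered at 0x%lx\n", mn->entry_ip);
}

static struct rethook *rh;

static int my_setup(void)
{
	int i;

	rh = rethook_alloc(NULL, my_ret_handler);
	if (!rh)
		return -ENOMEM;

	/* Pre-fill the node pool; the pool size is the caller's choice. */
	for (i = 0; i < 8; i++) {
		struct my_node *mn = kzalloc(sizeof(*mn), GFP_KERNEL);

		if (!mn)
			return -ENOMEM;
		rethook_add_node(rh, &mn->node);
	}
	return 0;
}

/*
 * Called from some function-entry hook with the entry pt_regs.
 * rethook_try_get() requires preemption to be disabled.
 */
static void my_entry(unsigned long ip, struct pt_regs *regs)
{
	struct rethook_node *rhn = rethook_try_get(rh);

	if (!rhn)
		return;		/* pool empty: this event is missed */
	container_of(rhn, struct my_node, node)->entry_ip = ip;
	rethook_hook(rhn, regs);
}
```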

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v4:
- Fix rethook_trampoline_handler() to loop in the same way
kretprobe currently does. This fixes some stacktrace issues
in the rethook handler.
---
include/linux/rethook.h | 99 +++++++++++++++
include/linux/sched.h | 3
kernel/exit.c | 2
kernel/fork.c | 3
kernel/trace/Kconfig | 11 ++
kernel/trace/Makefile | 1
kernel/trace/rethook.c | 311 +++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 430 insertions(+)
create mode 100644 include/linux/rethook.h
create mode 100644 kernel/trace/rethook.c

diff --git a/include/linux/rethook.h b/include/linux/rethook.h
new file mode 100644
index 000000000000..39cfbff1a03c
--- /dev/null
+++ b/include/linux/rethook.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Return hooking with list-based shadow stack.
+ */
+#ifndef _LINUX_RETHOOK_H
+#define _LINUX_RETHOOK_H
+
+#include <linux/compiler.h>
+#include <linux/freelist.h>
+#include <linux/llist.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+
+struct rethook_node;
+
+typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
+
+/**
+ * struct rethook - The rethook management data structure.
+ * @data: The user-defined data storage.
+ * @handler: The user-defined return hook handler.
+ * @pool: The pool of struct rethook_node.
+ * @ref: The reference counter.
+ * @rcu: The rcu_head for deferred freeing.
+ *
+ * Don't embed to another data structure, because this is a self-destructive
+ * data structure when all rethook_node are freed.
+ */
+struct rethook {
+ void *data;
+ rethook_handler_t handler;
+ struct freelist_head pool;
+ refcount_t ref;
+ struct rcu_head rcu;
+};
+
+/**
+ * struct rethook_node - The rethook shadow-stack entry node.
+ * @freelist: The freelist, linked to struct rethook::pool.
+ * @rcu: The rcu_head for deferred freeing.
+ * @llist: The llist, linked to a struct task_struct::rethooks.
+ * @rethook: The pointer to the struct rethook.
+ * @ret_addr: The storage for the real return address.
+ * @frame: The storage for the frame pointer.
+ *
+ * You can embed this with your extended data structure to store any data
+ * on the entry of shadow stack.
+ */
+struct rethook_node {
+ union {
+ struct freelist_node freelist;
+ struct rcu_head rcu;
+ };
+ struct llist_node llist;
+ struct rethook *rethook;
+ unsigned long ret_addr;
+ unsigned long frame;
+};
+
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
+void rethook_free(struct rethook *rh);
+void rethook_add_node(struct rethook *rh, struct rethook_node *node);
+struct rethook_node *rethook_try_get(struct rethook *rh);
+void rethook_recycle(struct rethook_node *node);
+void rethook_hook(struct rethook_node *node, struct pt_regs *regs);
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+ struct llist_node **cur);
+
+/* Arch dependent code must implement arch_* and trampoline code */
+void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs);
+void arch_rethook_trampoline(void);
+
+/**
+ * is_rethook_trampoline() - Check whether the address is rethook trampoline
+ * @addr: The address to be checked
+ *
+ * Return true if the @addr is rethook trampoline address.
+ */
+static inline bool is_rethook_trampoline(unsigned long addr)
+{
+ return addr == (unsigned long)arch_rethook_trampoline;
+}
+
+/* If the architecture needs a fixup the return address, implement it. */
+void arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr);
+
+/* Generic trampoline handler, arch code must prepare asm stub */
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+ unsigned long frame);
+
+#ifdef CONFIG_RETHOOK
+void rethook_flush_task(struct task_struct *tk);
+#else
+#define rethook_flush_task(tsk) do { } while (0)
+#endif
+
+#endif
+
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 78c351e35fec..2bfabf5355b7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1473,6 +1473,9 @@ struct task_struct {
#ifdef CONFIG_KRETPROBES
struct llist_head kretprobe_instances;
#endif
+#ifdef CONFIG_RETHOOK
+ struct llist_head rethooks;
+#endif

#ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
/*
diff --git a/kernel/exit.c b/kernel/exit.c
index f702a6a63686..a39a321c1f37 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -64,6 +64,7 @@
#include <linux/compat.h>
#include <linux/io_uring.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>

#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);

kprobe_flush_task(tsk);
+ rethook_flush_task(tsk);
perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
put_task_struct(tsk);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3244cc56b697..ffae38be64c4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_KRETPROBES
p->kretprobe_instances.first = NULL;
#endif
+#ifdef CONFIG_RETHOOK
+ p->rethooks.first = NULL;
+#endif

/*
* Ensure that the cgroup subsystem policies allow the new process to be
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 23483dd474b0..4d27e56c6e76 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
config NOP_TRACER
bool

+config HAVE_RETHOOK
+ bool
+
+config RETHOOK
+ bool
+ depends on HAVE_RETHOOK
+ help
+ Enable generic return hooking feature. This is an internal
+ API, which will be used by other function-entry hooking
+ features like fprobe and kprobes.
+
config HAVE_FUNCTION_TRACER
bool
help
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 79255f9de9a4..c6f11a139eac 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBE) += fprobe.o
+obj-$(CONFIG_RETHOOK) += rethook.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
new file mode 100644
index 000000000000..76c9848b44a9
--- /dev/null
+++ b/kernel/trace/rethook.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define pr_fmt(fmt) "rethook: " fmt
+
+#include <linux/bug.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/preempt.h>
+#include <linux/rethook.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+/* Return hook list (shadow stack by list) */
+
+/*
+ * This function is called from delayed_put_task_struct() when a task is
+ * dead and cleaned up to recycle any rethook instances associated with
+ * this task. These leftover instances represent probed functions that
+ * have been called but will never return.
+ */
+void rethook_flush_task(struct task_struct *tk)
+{
+ struct rethook_node *rhn;
+ struct llist_node *node;
+
+ preempt_disable();
+
+ node = __llist_del_all(&tk->rethooks);
+ while (node) {
+ rhn = container_of(node, struct rethook_node, llist);
+ node = node->next;
+ rethook_recycle(rhn);
+ }
+
+ preempt_enable();
+}
+
+static void rethook_free_rcu(struct rcu_head *head)
+{
+ struct rethook *rh = container_of(head, struct rethook, rcu);
+ struct rethook_node *rhn;
+ struct freelist_node *node;
+ int count = 1;
+
+ node = rh->pool.head;
+ while (node) {
+ rhn = container_of(node, struct rethook_node, freelist);
+ node = node->next;
+ kfree(rhn);
+ count++;
+ }
+
+ /* The rh->ref is the number of pooled node + 1 */
+ if (refcount_sub_and_test(count, &rh->ref))
+ kfree(rh);
+}
+
+/**
+ * rethook_free() - Free struct rethook.
+ * @rh: the struct rethook to be freed.
+ *
+ * Free the rethook. Before calling this function, the user must ensure
+ * that @rh::data is cleaned up if needed (note that the handler may still
+ * access it after calling this function.) This function marks @rh to be
+ * freed after all rethook_node are freed (not immediately). The caller
+ * must not touch @rh after calling this.
+ */
+void rethook_free(struct rethook *rh)
+{
+ rcu_assign_pointer(rh->handler, NULL);
+
+ call_rcu(&rh->rcu, rethook_free_rcu);
+}
+
+/**
+ * rethook_alloc() - Allocate struct rethook.
+ * @data: a data to pass the @handler when hooking the return.
+ * @handler: the return hook callback function.
+ *
+ * Allocate and initialize a new rethook with @data and @handler.
+ * Return NULL if memory allocation fails or @handler is NULL.
+ * Note that @handler == NULL means this rethook is going to be freed.
+ */
+struct rethook *rethook_alloc(void *data, rethook_handler_t handler)
+{
+ struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
+
+ if (!rh || !handler)
+ return NULL;
+
+ rh->data = data;
+ rh->handler = handler;
+ rh->pool.head = NULL;
+ refcount_set(&rh->ref, 1);
+
+ return rh;
+}
+
+/**
+ * rethook_add_node() - Add a new node to the rethook.
+ * @rh: the struct rethook.
+ * @node: the struct rethook_node to be added.
+ *
+ * Add @node to @rh. User must allocate @node (as a part of user's
+ * data structure.) The @node fields are initialized in this function.
+ */
+void rethook_add_node(struct rethook *rh, struct rethook_node *node)
+{
+ node->rethook = rh;
+ freelist_add(&node->freelist, &rh->pool);
+ refcount_inc(&rh->ref);
+}
+
+static void free_rethook_node_rcu(struct rcu_head *head)
+{
+ struct rethook_node *node = container_of(head, struct rethook_node, rcu);
+
+ if (refcount_dec_and_test(&node->rethook->ref))
+ kfree(node->rethook);
+ kfree(node);
+}
+
+/**
+ * rethook_recycle() - return the node to rethook.
+ * @node: The struct rethook_node to be returned.
+ *
+ * Return back the @node to @node::rethook. If the @node::rethook is already
+ * marked as freed, this will free the @node.
+ */
+void rethook_recycle(struct rethook_node *node)
+{
+ lockdep_assert_preemption_disabled();
+
+ if (likely(READ_ONCE(node->rethook->handler)))
+ freelist_add(&node->freelist, &node->rethook->pool);
+ else
+ call_rcu(&node->rcu, free_rethook_node_rcu);
+}
+NOKPROBE_SYMBOL(rethook_recycle);
+
+/**
+ * rethook_try_get() - get an unused rethook node.
+ * @rh: The struct rethook which pools the nodes.
+ *
+ * Get an unused rethook node from @rh. If the node pool is empty, this
+ * will return NULL. Caller must disable preemption.
+ */
+struct rethook_node *rethook_try_get(struct rethook *rh)
+{
+ rethook_handler_t handler = READ_ONCE(rh->handler);
+ struct freelist_node *fn;
+
+ lockdep_assert_preemption_disabled();
+
+ /* Check whether @rh is going to be freed. */
+ if (unlikely(!handler))
+ return NULL;
+
+ fn = freelist_try_get(&rh->pool);
+ if (!fn)
+ return NULL;
+
+ return container_of(fn, struct rethook_node, freelist);
+}
+NOKPROBE_SYMBOL(rethook_try_get);
+
+/**
+ * rethook_hook() - Hook the current function return.
+ * @node: The struct rethook node to hook the function return.
+ * @regs: The struct pt_regs for the function entry.
+ *
+ * Hook the current running function return. This must be called when the
+ * function entry (or at least @regs must be the registers of the function
+ * entry.)
+ */
+void rethook_hook(struct rethook_node *node, struct pt_regs *regs)
+{
+ arch_rethook_prepare(node, regs);
+ __llist_add(&node->llist, &current->rethooks);
+}
+NOKPROBE_SYMBOL(rethook_hook);
+
+/* This assumes the 'tsk' is the current task or is not running. */
+static unsigned long __rethook_find_ret_addr(struct task_struct *tsk,
+ struct llist_node **cur)
+{
+ struct rethook_node *rh = NULL;
+ struct llist_node *node = *cur;
+
+ if (!node)
+ node = tsk->rethooks.first;
+ else
+ node = node->next;
+
+ while (node) {
+ rh = container_of(node, struct rethook_node, llist);
+ if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) {
+ *cur = node;
+ return rh->ret_addr;
+ }
+ node = node->next;
+ }
+ return 0;
+}
+NOKPROBE_SYMBOL(__rethook_find_ret_addr);
+
+/**
+ * rethook_find_ret_addr -- Find correct return address modified by rethook
+ * @tsk: Target task
+ * @frame: A frame pointer
+ * @cur: a storage of the loop cursor llist_node pointer for next call
+ *
+ * Find the correct return address modified by a rethook on @tsk in unsigned
+ * long type. If it finds the return address, this returns that address value,
+ * or this returns 0.
+ * The @tsk must be 'current' or a task which is not running. @frame is a hint
+ * to get the correct return address - which is compared with the
+ * rethook::frame field. The @cur is a loop cursor for searching the
+ * rethook return addresses on the @tsk. The '*@cur' should be NULL at the
+ * first call, but '@cur' itself must NOT be NULL.
+ */
+unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
+ struct llist_node **cur)
+{
+ struct rethook_node *rhn = NULL;
+ unsigned long ret;
+
+ if (WARN_ON_ONCE(!cur))
+ return 0;
+
+ do {
+ ret = __rethook_find_ret_addr(tsk, cur);
+ if (!ret)
+ break;
+ rhn = container_of(*cur, struct rethook_node, llist);
+ } while (rhn->frame != frame);
+
+ return ret;
+}
+NOKPROBE_SYMBOL(rethook_find_ret_addr);
+
+void __weak arch_rethook_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ /*
+ * Do nothing by default. If the architecture uses a frame
+ * pointer to record the real return address on the stack, it
+ * should implement this function to fix up the return address
+ * so that stacktrace works from the rethook handler.
+ */
+}
+
+/* This function will be called from each arch-defined trampoline. */
+unsigned long rethook_trampoline_handler(struct pt_regs *regs,
+ unsigned long frame)
+{
+ struct llist_node *first, *node = NULL;
+ unsigned long correct_ret_addr;
+ rethook_handler_t handler;
+ struct rethook_node *rhn;
+
+ correct_ret_addr = __rethook_find_ret_addr(current, &node);
+ if (!correct_ret_addr) {
+ pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n");
+ BUG_ON(1);
+ }
+
+ instruction_pointer_set(regs, correct_ret_addr);
+
+ /*
+ * These loops must be protected from rethook_free_rcu() because they
+ * access 'rhn->rethook'.
+ */
+ preempt_disable();
+
+ /*
+ * Run the handlers on the shadow stack. Do not unlink the list here,
+ * because a stack dump inside the handlers needs to decode it.
+ */
+ first = current->rethooks.first;
+ while (first) {
+ rhn = container_of(first, struct rethook_node, llist);
+ if (WARN_ON_ONCE(rhn->frame != frame))
+ break;
+ handler = READ_ONCE(rhn->rethook->handler);
+ if (handler)
+ handler(rhn, rhn->rethook->data, regs);
+
+ if (first == node)
+ break;
+ first = first->next;
+ }
+
+ /* Fixup registers for returning to correct address. */
+ arch_rethook_fixup_return(regs, correct_ret_addr);
+
+ /* Unlink used shadow stack */
+ first = current->rethooks.first;
+ current->rethooks.first = node->next;
+ node->next = NULL;
+
+ while (first) {
+ rhn = container_of(first, struct rethook_node, llist);
+ first = first->next;
+ rethook_recycle(rhn);
+ }
+ preempt_enable();
+
+ return correct_ret_addr;
+}
+NOKPROBE_SYMBOL(rethook_trampoline_handler);

2022-01-25 16:52:19

by Masami Hiramatsu

Subject: [PATCH v5 7/9] fprobe: Add exit_handler support

Add exit_handler to fprobe. fprobe + rethook allows us
to hook a kernel function return without the fgraph tracer.
Eventually, the fgraph tracer will become a generic array-based
return hooking mechanism, and fprobe may use it if the user
requests it. Since the array-based and list-based approaches
each have pros and cons (e.g. memory consumption vs. fewer
missed events), it is better to keep both, but fprobe will
provide the same exit-handler interface either way.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Add dependency for HAVE_RETHOOK.
Changes in v4:
- Check fprobe is disabled in the exit handler.
Changes in v3:
- Make sure to clear rethook->data before freeing it.
- The handler checks that the data is not NULL.
- Free the rethook only if the rethook is in use.
---
include/linux/fprobe.h | 6 ++++
kernel/trace/Kconfig | 2 +
kernel/trace/fprobe.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index f7de332b08c2..4692089b9118 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -5,6 +5,7 @@

#include <linux/compiler.h>
#include <linux/ftrace.h>
+#include <linux/rethook.h>

/**
* struct fprobe - ftrace based probe.
@@ -14,7 +15,9 @@
* @ops: The ftrace_ops.
* @nmissed: The counter for missing events.
* @flags: The status flag.
+ * @rethook: The rethook data structure. (internal data)
* @entry_handler: The callback function for function entry.
+ * @exit_handler: The callback function for function exit.
*
* User must set either @syms or @addrs, but not both. If user sets
* only @syms, the @addrs are generated when registering the fprobe.
@@ -28,7 +31,10 @@ struct fprobe {
struct ftrace_ops ops;
unsigned long nmissed;
unsigned int flags;
+ struct rethook *rethook;
+
void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+ void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
};

#define FPROBE_FL_DISABLED 1
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 4d27e56c6e76..f47213140499 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -238,6 +238,8 @@ config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS
+ depends on HAVE_RETHOOK
+ select RETHOOK
default n
help
This option enables kernel function probe (fprobe) based on ftrace,
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 748cc34765c1..4d089dda89c2 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -7,12 +7,20 @@
#include <linux/fprobe.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <linux/slab.h>
#include <linux/sort.h>

+struct fprobe_rethook_node {
+ struct rethook_node node;
+ unsigned long entry_ip;
+};
+
static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
+ struct fprobe_rethook_node *fpr;
+ struct rethook_node *rh;
struct fprobe *fp;
int bit;

@@ -29,10 +37,37 @@ static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
if (fp->entry_handler)
fp->entry_handler(fp, ip, ftrace_get_regs(fregs));

+ if (fp->exit_handler) {
+ rh = rethook_try_get(fp->rethook);
+ if (!rh) {
+ fp->nmissed++;
+ goto out;
+ }
+ fpr = container_of(rh, struct fprobe_rethook_node, node);
+ fpr->entry_ip = ip;
+ rethook_hook(rh, ftrace_get_regs(fregs));
+ }
+
+out:
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_handler);

+static void fprobe_exit_handler(struct rethook_node *rh, void *data,
+ struct pt_regs *regs)
+{
+ struct fprobe *fp = (struct fprobe *)data;
+ struct fprobe_rethook_node *fpr;
+
+ if (!fp || fprobe_disabled(fp))
+ return;
+
+ fpr = container_of(rh, struct fprobe_rethook_node, node);
+
+ fp->exit_handler(fp, fpr->entry_ip, regs);
+}
+NOKPROBE_SYMBOL(fprobe_exit_handler);
+
/* Convert ftrace location address from symbols */
static int convert_func_addresses(struct fprobe *fp)
{
@@ -82,6 +117,7 @@ static int convert_func_addresses(struct fprobe *fp)
*/
int register_fprobe(struct fprobe *fp)
{
+ unsigned int i, size;
int ret;

if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
@@ -96,10 +132,29 @@ int register_fprobe(struct fprobe *fp)
fp->ops.func = fprobe_handler;
fp->ops.flags = FTRACE_OPS_FL_SAVE_REGS;

+ /* Initialize rethook if needed */
+ if (fp->exit_handler) {
+ size = fp->nentry * num_possible_cpus() * 2;
+ fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);
+ for (i = 0; i < size; i++) {
+ struct rethook_node *node;
+
+ node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL);
+ if (!node) {
+ rethook_free(fp->rethook);
+ ret = -ENOMEM;
+ goto out;
+ }
+ rethook_add_node(fp->rethook, node);
+ }
+ } else
+ fp->rethook = NULL;
+
ret = ftrace_set_filter_ips(&fp->ops, fp->addrs, fp->nentry, 0, 0);
if (!ret)
ret = register_ftrace_function(&fp->ops);

+out:
if (ret < 0 && fp->syms) {
kfree(fp->addrs);
fp->addrs = NULL;
@@ -125,8 +180,16 @@ int unregister_fprobe(struct fprobe *fp)
return -EINVAL;

ret = unregister_ftrace_function(&fp->ops);
+ if (ret < 0)
+ return ret;

- if (!ret && fp->syms) {
+ if (fp->rethook) {
+ /* Make sure to clear rethook->data before freeing. */
+ WRITE_ONCE(fp->rethook->data, NULL);
+ barrier();
+ rethook_free(fp->rethook);
+ }
+ if (fp->syms) {
kfree(fp->addrs);
fp->addrs = NULL;
}

2022-01-25 16:52:26

by Masami Hiramatsu

Subject: [PATCH v5 8/9] fprobe: Add sample program for fprobe

Add a sample program for the fprobe.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v2:
- Fix infinite loop for multiple symbols.
- Fix memory leaks for copied string and entry array.
- Update for new fprobe APIs.
- Fix style issues.
---
samples/Kconfig | 7 +++
samples/Makefile | 1
samples/fprobe/Makefile | 3 +
samples/fprobe/fprobe_example.c | 103 +++++++++++++++++++++++++++++++++++++++
4 files changed, 114 insertions(+)
create mode 100644 samples/fprobe/Makefile
create mode 100644 samples/fprobe/fprobe_example.c

diff --git a/samples/Kconfig b/samples/Kconfig
index 43d2e9aa557f..e010c2c1256c 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -73,6 +73,13 @@ config SAMPLE_HW_BREAKPOINT
help
This builds kernel hardware breakpoint example modules.

+config SAMPLE_FPROBE
+ tristate "Build fprobe examples -- loadable modules only"
+ depends on FPROBE && m
+ help
+ This builds an fprobe example module. This module has a 'symbol' option,
+ with which you can specify the probed symbol or symbols separated by ','.
+
config SAMPLE_KFIFO
tristate "Build kfifo examples -- loadable modules only"
depends on m
diff --git a/samples/Makefile b/samples/Makefile
index 4bcd6b93bffa..4f73fe7aa473 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_SAMPLE_INTEL_MEI) += mei/
subdir-$(CONFIG_SAMPLE_WATCHDOG) += watchdog
subdir-$(CONFIG_SAMPLE_WATCH_QUEUE) += watch_queue
obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak/
+obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
diff --git a/samples/fprobe/Makefile b/samples/fprobe/Makefile
new file mode 100644
index 000000000000..ecccbfa6e99b
--- /dev/null
+++ b/samples/fprobe/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_SAMPLE_FPROBE) += fprobe_example.o
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
new file mode 100644
index 000000000000..c28320537f98
--- /dev/null
+++ b/samples/fprobe/fprobe_example.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Here's a sample kernel module showing the use of fprobe to dump a
+ * stack trace and selected registers when kernel_clone() is called.
+ *
+ * For more information on theory of operation of kprobes, see
+ * Documentation/trace/kprobes.rst
+ *
+ * You will see the trace data in /var/log/messages and on the console
+ * whenever kernel_clone() is invoked to create a new process.
+ */
+
+#define pr_fmt(fmt) "%s: " fmt, __func__
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/fprobe.h>
+#include <linux/slab.h>
+
+#define MAX_SYMBOL_LEN 4096
+struct fprobe sample_probe;
+static char symbol[MAX_SYMBOL_LEN] = "kernel_clone";
+module_param_string(symbol, symbol, sizeof(symbol), 0644);
+
+static void sample_entry_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+{
+ pr_info("Enter <%pS> ip = 0x%p\n", (void *)ip, (void *)ip);
+}
+
+static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs)
+{
+ unsigned long rip = instruction_pointer(regs);
+
+ pr_info("Return from <%pS> ip = 0x%p to rip = 0x%p (%pS)\n",
+ (void *)ip, (void *)ip, (void *)rip, (void *)rip);
+}
+
+static char *symbuf;
+
+static int __init fprobe_init(void)
+{
+ const char **syms;
+ char *p;
+ int ret, count, i;
+
+ sample_probe.entry_handler = sample_entry_handler;
+ sample_probe.exit_handler = sample_exit_handler;
+
+ if (strchr(symbol, ',')) {
+ symbuf = kstrdup(symbol, GFP_KERNEL);
+ if (!symbuf)
+ return -ENOMEM;
+ p = symbuf;
+ count = 1;
+ while ((p = strchr(++p, ',')) != NULL)
+ count++;
+ } else {
+ count = 1;
+ symbuf = symbol;
+ }
+ pr_info("%d symbols found\n", count);
+
+ syms = kcalloc(count, sizeof(char *), GFP_KERNEL);
+ if (!syms) {
+ ret = -ENOMEM;
+ goto error;
+ }
+
+ p = symbuf;
+ for (i = 0; i < count; i++)
+ syms[i] = strsep(&p, ",");
+
+ sample_probe.syms = syms;
+ sample_probe.nentry = count;
+
+ ret = register_fprobe(&sample_probe);
+ if (ret < 0) {
+ pr_err("register_fprobe failed, returned %d\n", ret);
+ goto error;
+ }
+ pr_info("Planted fprobe at %s\n", symbol);
+ return 0;
+
+error:
+ if (symbuf != symbol)
+ kfree(symbuf);
+ return ret;
+}
+
+static void __exit fprobe_exit(void)
+{
+ unregister_fprobe(&sample_probe);
+
+ kfree(sample_probe.syms);
+ if (symbuf != symbol)
+ kfree(symbuf);
+
+ pr_info("fprobe at %s unregistered\n", symbol);
+}
+
+module_init(fprobe_init)
+module_exit(fprobe_exit)
+MODULE_LICENSE("GPL");

2022-01-25 16:52:31

by Masami Hiramatsu

Subject: [PATCH v5 9/9] docs: fprobe: Add fprobe description to ftrace-use.rst

Add documentation of fprobe for users who need
this interface.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Documentation/trace/fprobe.rst | 131 ++++++++++++++++++++++++++++++++++++++++
Documentation/trace/index.rst | 1
2 files changed, 132 insertions(+)
create mode 100644 Documentation/trace/fprobe.rst

diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
new file mode 100644
index 000000000000..c53950a1f91e
--- /dev/null
+++ b/Documentation/trace/fprobe.rst
@@ -0,0 +1,131 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+Fprobe - Function entry/exit probe
+==================================
+
+.. Author: Masami Hiramatsu <[email protected]>
+
+Introduction
+============
+
+If you only want to attach callbacks on function entry and exit, similar to
+kprobes and kretprobes, you can use fprobe instead of the full ftrace feature
+set. Compared with kprobes and kretprobes, fprobe gives faster
+instrumentation of multiple functions with a single handler. This document
+describes how to use fprobe.
+
+The usage of fprobe
+===================
+
+The fprobe is a wrapper around ftrace (plus a kretprobe-like return callback)
+that attaches callbacks to multiple function entries and exits. The user needs
+to set up a `struct fprobe` and pass it to `register_fprobe()`.
+
+Typically, `fprobe` data structure is initialized with the `syms`, `nentry`
+and `entry_handler` and/or `exit_handler` as below.
+
+.. code-block:: c
+
+ const char *targets[] = {"func1", "func2", "func3"};
+ struct fprobe fp = {
+ .syms = targets,
+ .nentry = ARRAY_SIZE(targets),
+ .entry_handler = my_entry_callback,
+ .exit_handler = my_exit_callback,
+ };
+
+The ftrace_ops in the fprobe is set automatically; the FTRACE_OPS_FL_SAVE_REGS
+and FTRACE_OPS_FL_RECURSION flags will be set. If you need other flags,
+please set them yourself:
+
+.. code-block:: c
+
+ fp.ops.flags |= FTRACE_OPS_FL_RCU;
+
+To enable this fprobe, call::
+
+ register_fprobe(&fp);
+
+To disable (remove from functions) this fprobe, call::
+
+ unregister_fprobe(&fp);
+
+You can temporarily (soft) disable the fprobe by::
+
+ disable_fprobe(&fp);
+
+and resume by::
+
+ enable_fprobe(&fp);
+
+The above APIs are made available by including the header::
+
+ #include <linux/fprobe.h>
+
+Same as ftrace, the registered callback will start being called some time
+after the register_fprobe() is called and before it returns. See
+:file:`Documentation/trace/ftrace.rst`.
+
+
+The fprobe entry/exit handler
+=============================
+
+The prototype of the entry/exit callback function is as follows:
+
+.. code-block:: c
+
+ void callback_func(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+
+Note that both entry and exit callbacks have the same prototype. The @entry_ip
+is saved at function entry and passed to the exit handler.
+
+@fp
+ This is the address of `fprobe` data structure related to this handler.
+ You can embed the `fprobe` to your data structure and get it by
+ container_of() macro from @fp. The @fp must not be NULL.
+
+@entry_ip
+ This is the entry address of the traced function (both entry and exit).
+
+@regs
+ This is the `pt_regs` data structure at the entry and exit. Note that
+ the instruction pointer of @regs may be different from the @entry_ip
+ in the entry_handler. If you need the traced instruction pointer, you
+ need to use @entry_ip. On the other hand, in the exit_handler, the
+ instruction pointer of @regs is set to the correct return address.
+
+
+Use fprobe with raw address list
+================================
+
+Instead of passing an array of symbols, you can pass an array of raw
+function addresses via `fprobe::addrs`. In this case, the values in
+this array will be changed automatically to the dynamic ftrace NOP
+location addresses in the given kernel functions. So please take care
+if you share this array with others.
+
+
+The missed counter
+==================
+
+The `fprobe` data structure has a `fprobe::nmissed` counter field, the same
+as kprobes. This counter is incremented when:
+
+ - fprobe fails to take ftrace_recursion lock. This usually means that a function
+ which is traced by other ftrace users is called from the entry_handler.
+
+ - fprobe fails to set up the function exit because of a shortage of rethook
+ nodes (the shadow stack for hooking the function return).
+
+Note that the `fprobe::nmissed` field is incremented in both cases. The former
+case skips both the entry and exit callbacks, and the latter skips only the
+exit callback, but in both cases the counter is increased by just 1.
+
+Functions and structures
+========================
+
+.. kernel-doc:: include/linux/fprobe.h
+.. kernel-doc:: kernel/trace/fprobe.c
+
diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst
index 3769b9b7aed8..b9f3757f8269 100644
--- a/Documentation/trace/index.rst
+++ b/Documentation/trace/index.rst
@@ -9,6 +9,7 @@ Linux Tracing Technologies
tracepoint-analysis
ftrace
ftrace-uses
+ fprobe
kprobes
kprobetrace
uprobetracer

2022-01-25 16:54:26

by Masami Hiramatsu

Subject: [PATCH v5 1/9] ftrace: Add ftrace_set_filter_ips function

From: Jiri Olsa <[email protected]>

Add the ftrace_set_filter_ips function to be able to set a filter on
multiple ip addresses at once.

With the kprobe multi attach interface we have cases where we need to
initialize an ftrace_ops object with thousands of functions, so having
a single function dive into ftrace_hash_move_and_update_ops under
ftrace_lock is faster.

The function ips are passed as an unsigned long array together with a count.

Signed-off-by: Jiri Olsa <[email protected]>
---
include/linux/ftrace.h | 3 +++
kernel/trace/ftrace.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9999e29187de..60847cbce0da 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -512,6 +512,8 @@ struct dyn_ftrace {

int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset);
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt, int remove, int reset);
int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
int len, int reset);
int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
@@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip)
#define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
#define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
#define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
+#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
#define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
#define ftrace_free_filter(ops) do { } while (0)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index be5f6b32a012..39350aa38649 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
}

static int
-ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
+__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
{
struct ftrace_func_entry *entry;

@@ -4976,9 +4976,25 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
return add_hash_entry(hash, ip);
}

+static int
+ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
+ unsigned int cnt, int remove)
+{
+ unsigned int i;
+ int err;
+
+ for (i = 0; i < cnt; i++) {
+ err = __ftrace_match_addr(hash, ips[i], remove);
+ if (err)
+ return err;
+ }
+ return 0;
+}
+
static int
ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
- unsigned long ip, int remove, int reset, int enable)
+ unsigned long *ips, unsigned int cnt,
+ int remove, int reset, int enable)
{
struct ftrace_hash **orig_hash;
struct ftrace_hash *hash;
@@ -5008,8 +5024,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
ret = -EINVAL;
goto out_regex_unlock;
}
- if (ip) {
- ret = ftrace_match_addr(hash, ip, remove);
+ if (ips) {
+ ret = ftrace_match_addr(hash, ips, cnt, remove);
if (ret < 0)
goto out_regex_unlock;
}
@@ -5026,10 +5042,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
}

static int
-ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
- int reset, int enable)
+ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
+ int remove, int reset, int enable)
{
- return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
+ return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
}

#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
@@ -5634,10 +5650,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
int remove, int reset)
{
ftrace_ops_init(ops);
- return ftrace_set_addr(ops, ip, remove, reset, 1);
+ return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
}
EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);

+/**
+ * ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
+ * @ops - the ops to set the filter with
+ * @ips - the array of addresses to add to or remove from the filter.
+ * @cnt - the number of addresses in @ips
+ * @remove - non zero to remove ips from the filter
+ * @reset - non zero to reset all filters before applying this filter.
+ *
+ * Filters denote which functions should be enabled when tracing is enabled.
+ * If the @ips array or any ip specified within it is NULL, the filter update
+ * fails.
+ */
+int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
+ unsigned int cnt, int remove, int reset)
+{
+ ftrace_ops_init(ops);
+ return ftrace_set_addr(ops, ips, cnt, remove, reset, 1);
+}
+EXPORT_SYMBOL_GPL(ftrace_set_filter_ips);
+
/**
* ftrace_ops_set_global_filter - setup ops to use global filters
* @ops - the ops which will use the global filters
@@ -5659,7 +5694,7 @@ static int
ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
int reset, int enable)
{
- return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable);
+ return ftrace_set_hash(ops, buf, len, NULL, 0, 0, reset, enable);
}

/**

2022-01-25 16:54:51

by Masami Hiramatsu

Subject: [PATCH v5 6/9] arm64: rethook: Add arm64 rethook implementation

Add rethook arm64 implementation. Most of the code
has been copied from kretprobes on arm64.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Add description.
- Fix build error if !CONFIG_KRETPROBES
---
arch/arm64/Kconfig | 1
arch/arm64/include/asm/stacktrace.h | 2 -
arch/arm64/kernel/probes/Makefile | 1
arch/arm64/kernel/probes/rethook.c | 25 +++++++
arch/arm64/kernel/probes/rethook_trampoline.S | 87 +++++++++++++++++++++++++
arch/arm64/kernel/stacktrace.c | 7 ++
6 files changed, 121 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/kernel/probes/rethook.c
create mode 100644 arch/arm64/kernel/probes/rethook_trampoline.S

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..c706ed25ea50 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -201,6 +201,7 @@ config ARM64
select HAVE_SYSCALL_TRACEPOINTS
select HAVE_KPROBES
select HAVE_KRETPROBES
+ select HAVE_RETHOOK
select HAVE_GENERIC_VDSO
select IOMMU_DMA if IOMMU_SUPPORT
select IRQ_DOMAIN
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 6564a01cc085..1cc472eb4852 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -54,7 +54,7 @@ struct stackframe {
DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
unsigned long prev_fp;
enum stack_type prev_type;
-#ifdef CONFIG_KRETPROBES
+#if defined(CONFIG_KRETPROBES) || defined(CONFIG_RETHOOK)
struct llist_node *kr_cur;
#endif
};
diff --git a/arch/arm64/kernel/probes/Makefile b/arch/arm64/kernel/probes/Makefile
index 8e4be92e25b1..24e689f44c32 100644
--- a/arch/arm64/kernel/probes/Makefile
+++ b/arch/arm64/kernel/probes/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_KPROBES) += kprobes.o decode-insn.o \
simulate-insn.o
obj-$(CONFIG_UPROBES) += uprobes.o decode-insn.o \
simulate-insn.o
+obj-$(CONFIG_RETHOOK) += rethook.o rethook_trampoline.o
diff --git a/arch/arm64/kernel/probes/rethook.c b/arch/arm64/kernel/probes/rethook.c
new file mode 100644
index 000000000000..38c33c81438b
--- /dev/null
+++ b/arch/arm64/kernel/probes/rethook.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Generic return hook for arm64.
+ * Most of the code is copied from arch/arm64/kernel/probes/kprobes.c
+ */
+
+#include <linux/kprobes.h>
+#include <linux/rethook.h>
+
+/* This is called from arch_rethook_trampoline() */
+unsigned long __used arch_rethook_trampoline_callback(struct pt_regs *regs)
+{
+ return rethook_trampoline_handler(regs, regs->regs[29]);
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+
+void arch_rethook_prepare(struct rethook_node *rhn, struct pt_regs *regs)
+{
+ rhn->ret_addr = regs->regs[30];
+ rhn->frame = regs->regs[29];
+
+ /* replace return addr (x30) with trampoline */
+ regs->regs[30] = (u64)arch_rethook_trampoline;
+}
+NOKPROBE_SYMBOL(arch_rethook_prepare);
diff --git a/arch/arm64/kernel/probes/rethook_trampoline.S b/arch/arm64/kernel/probes/rethook_trampoline.S
new file mode 100644
index 000000000000..610f520ee72b
--- /dev/null
+++ b/arch/arm64/kernel/probes/rethook_trampoline.S
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * trampoline entry and return code for rethook.
+ * Copied from arch/arm64/kernel/probes/kprobes_trampoline.S
+ */
+
+#include <linux/linkage.h>
+#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
+
+ .text
+
+ .macro save_all_base_regs
+ stp x0, x1, [sp, #S_X0]
+ stp x2, x3, [sp, #S_X2]
+ stp x4, x5, [sp, #S_X4]
+ stp x6, x7, [sp, #S_X6]
+ stp x8, x9, [sp, #S_X8]
+ stp x10, x11, [sp, #S_X10]
+ stp x12, x13, [sp, #S_X12]
+ stp x14, x15, [sp, #S_X14]
+ stp x16, x17, [sp, #S_X16]
+ stp x18, x19, [sp, #S_X18]
+ stp x20, x21, [sp, #S_X20]
+ stp x22, x23, [sp, #S_X22]
+ stp x24, x25, [sp, #S_X24]
+ stp x26, x27, [sp, #S_X26]
+ stp x28, x29, [sp, #S_X28]
+ add x0, sp, #PT_REGS_SIZE
+ stp lr, x0, [sp, #S_LR]
+ /*
+ * Construct a useful saved PSTATE
+ */
+ mrs x0, nzcv
+ mrs x1, daif
+ orr x0, x0, x1
+ mrs x1, CurrentEL
+ orr x0, x0, x1
+ mrs x1, SPSel
+ orr x0, x0, x1
+ stp xzr, x0, [sp, #S_PC]
+ .endm
+
+ .macro restore_all_base_regs
+ ldr x0, [sp, #S_PSTATE]
+ and x0, x0, #(PSR_N_BIT | PSR_Z_BIT | PSR_C_BIT | PSR_V_BIT)
+ msr nzcv, x0
+ ldp x0, x1, [sp, #S_X0]
+ ldp x2, x3, [sp, #S_X2]
+ ldp x4, x5, [sp, #S_X4]
+ ldp x6, x7, [sp, #S_X6]
+ ldp x8, x9, [sp, #S_X8]
+ ldp x10, x11, [sp, #S_X10]
+ ldp x12, x13, [sp, #S_X12]
+ ldp x14, x15, [sp, #S_X14]
+ ldp x16, x17, [sp, #S_X16]
+ ldp x18, x19, [sp, #S_X18]
+ ldp x20, x21, [sp, #S_X20]
+ ldp x22, x23, [sp, #S_X22]
+ ldp x24, x25, [sp, #S_X24]
+ ldp x26, x27, [sp, #S_X26]
+ ldp x28, x29, [sp, #S_X28]
+ .endm
+
+SYM_CODE_START(arch_rethook_trampoline)
+ sub sp, sp, #PT_REGS_SIZE
+
+ save_all_base_regs
+
+ /* Setup a frame pointer. */
+ add x29, sp, #S_FP
+
+ mov x0, sp
+ bl arch_rethook_trampoline_callback
+ /*
+ * Replace trampoline address in lr with actual orig_ret_addr return
+ * address.
+ */
+ mov lr, x0
+
+ /* The frame pointer (x29) is restored with other registers. */
+ restore_all_base_regs
+
+ add sp, sp, #PT_REGS_SIZE
+ ret
+
+SYM_CODE_END(arch_rethook_trampoline)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index b0f21677764d..be83a9402268 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -8,6 +8,7 @@
#include <linux/export.h>
#include <linux/ftrace.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
@@ -38,7 +39,7 @@ void notrace start_backtrace(struct stackframe *frame, unsigned long fp,
{
frame->fp = fp;
frame->pc = pc;
-#ifdef CONFIG_KRETPROBES
+#if defined(CONFIG_KRETPROBES) || defined(CONFIG_RETHOOK)
frame->kr_cur = NULL;
#endif

@@ -137,6 +138,10 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
if (is_kretprobe_trampoline(frame->pc))
frame->pc = kretprobe_find_ret_addr(tsk, (void *)frame->fp, &frame->kr_cur);
#endif
+#ifdef CONFIG_RETHOOK
+ if (is_rethook_trampoline(frame->pc))
+ frame->pc = rethook_find_ret_addr(tsk, frame->fp, &frame->kr_cur);
+#endif

return 0;
}

2022-01-25 16:54:58

by Masami Hiramatsu

Subject: [PATCH v5 5/9] ARM: rethook: Add rethook arm implementation

Add rethook arm implementation. Most of the code
has been copied from kretprobes on arm.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Fix build error when !CONFIG_KRETPROBES
---
arch/arm/Kconfig | 1 +
arch/arm/include/asm/stacktrace.h | 4 +-
arch/arm/kernel/stacktrace.c | 6 +++
arch/arm/probes/Makefile | 1 +
arch/arm/probes/rethook.c | 71 +++++++++++++++++++++++++++++++++++++
5 files changed, 81 insertions(+), 2 deletions(-)
create mode 100644 arch/arm/probes/rethook.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c2724d986fa0..2fe24bbca618 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -106,6 +106,7 @@ config ARM
select HAVE_MOD_ARCH_SPECIFIC
select HAVE_NMI
select HAVE_OPTPROBES if !THUMB2_KERNEL
+ select HAVE_RETHOOK
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
diff --git a/arch/arm/include/asm/stacktrace.h b/arch/arm/include/asm/stacktrace.h
index 8f54f9ad8a9b..babed1707ca8 100644
--- a/arch/arm/include/asm/stacktrace.h
+++ b/arch/arm/include/asm/stacktrace.h
@@ -14,7 +14,7 @@ struct stackframe {
unsigned long sp;
unsigned long lr;
unsigned long pc;
-#ifdef CONFIG_KRETPROBES
+#if defined(CONFIG_KRETPROBES) || defined(CONFIG_RETHOOK)
struct llist_node *kr_cur;
struct task_struct *tsk;
#endif
@@ -27,7 +27,7 @@ void arm_get_current_stackframe(struct pt_regs *regs, struct stackframe *frame)
frame->sp = regs->ARM_sp;
frame->lr = regs->ARM_lr;
frame->pc = regs->ARM_pc;
-#ifdef CONFIG_KRETPROBES
+#if defined(CONFIG_KRETPROBES) || defined(CONFIG_RETHOOK)
frame->kr_cur = NULL;
frame->tsk = current;
#endif
diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c
index 75e905508f27..f509c6be4f57 100644
--- a/arch/arm/kernel/stacktrace.c
+++ b/arch/arm/kernel/stacktrace.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <linux/export.h>
#include <linux/kprobes.h>
+#include <linux/rethook.h>
#include <linux/sched.h>
#include <linux/sched/debug.h>
#include <linux/stacktrace.h>
@@ -66,6 +67,11 @@ int notrace unwind_frame(struct stackframe *frame)
frame->sp = *(unsigned long *)(fp - 8);
frame->pc = *(unsigned long *)(fp - 4);
#endif
+#ifdef CONFIG_RETHOOK
+ if (is_rethook_trampoline(frame->pc))
+ frame->pc = rethook_find_ret_addr(frame->tsk, frame->fp,
+ &frame->kr_cur);
+#endif
#ifdef CONFIG_KRETPROBES
if (is_kretprobe_trampoline(frame->pc))
frame->pc = kretprobe_find_ret_addr(frame->tsk,
diff --git a/arch/arm/probes/Makefile b/arch/arm/probes/Makefile
index 8b0ea5ace100..10c083a22223 100644
--- a/arch/arm/probes/Makefile
+++ b/arch/arm/probes/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_KPROBES) += decode-thumb.o
else
obj-$(CONFIG_KPROBES) += decode-arm.o
endif
+obj-$(CONFIG_RETHOOK) += rethook.o
diff --git a/arch/arm/probes/rethook.c b/arch/arm/probes/rethook.c
new file mode 100644
index 000000000000..adc16cdf358a
--- /dev/null
+++ b/arch/arm/probes/rethook.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * arm implementation of rethook. Mostly copied from arch/arm/probes/kprobes/core.c
+ */
+
+#include <linux/kprobes.h>
+#include <linux/rethook.h>
+
+/* Called from arch_rethook_trampoline */
+static __used unsigned long arch_rethook_trampoline_callback(struct pt_regs *regs)
+{
+ return rethook_trampoline_handler(regs, regs->ARM_fp);
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline_callback);
+
+/*
+ * When a rethook'ed function returns, it returns to arch_rethook_trampoline
+ * which calls rethook callback. We construct a struct pt_regs to
+ * give a view of registers r0-r11, sp, lr, and pc to the user
+ * return-handler. This is not a complete pt_regs structure, but that
+ * should be enough for stacktrace from the return handler with or
+ * without pt_regs.
+ */
+void __naked arch_rethook_trampoline(void)
+{
+ __asm__ __volatile__ (
+#ifdef CONFIG_FRAME_POINTER
+ "ldr lr, =arch_rethook_trampoline \n\t"
+ /* this makes a framepointer on pt_regs. */
+#ifdef CONFIG_CC_IS_CLANG
+ "stmdb sp, {sp, lr, pc} \n\t"
+ "sub sp, sp, #12 \n\t"
+ /* In clang case, pt_regs->ip = lr. */
+ "stmdb sp!, {r0 - r11, lr} \n\t"
+ /* fp points regs->r11 (fp) */
+ "add fp, sp, #44 \n\t"
+#else /* !CONFIG_CC_IS_CLANG */
+ /* In gcc case, pt_regs->ip = fp. */
+ "stmdb sp, {fp, sp, lr, pc} \n\t"
+ "sub sp, sp, #16 \n\t"
+ "stmdb sp!, {r0 - r11} \n\t"
+ /* fp points regs->r15 (pc) */
+ "add fp, sp, #60 \n\t"
+#endif /* CONFIG_CC_IS_CLANG */
+#else /* !CONFIG_FRAME_POINTER */
+ "sub sp, sp, #16 \n\t"
+ "stmdb sp!, {r0 - r11} \n\t"
+#endif /* CONFIG_FRAME_POINTER */
+ "mov r0, sp \n\t"
+ "bl arch_rethook_trampoline_callback \n\t"
+ "mov lr, r0 \n\t"
+ "ldmia sp!, {r0 - r11} \n\t"
+ "add sp, sp, #16 \n\t"
+#ifdef CONFIG_THUMB2_KERNEL
+ "bx lr \n\t"
+#else
+ "mov pc, lr \n\t"
+#endif
+ : : : "memory");
+}
+NOKPROBE_SYMBOL(arch_rethook_trampoline);
+
+void arch_rethook_prepare(struct rethook_node *rh, struct pt_regs *regs)
+{
+ rh->ret_addr = regs->ARM_lr;
+ rh->frame = regs->ARM_fp;
+
+ /* Replace the return addr with trampoline addr. */
+ regs->ARM_lr = (unsigned long)arch_rethook_trampoline;
+}
+NOKPROBE_SYMBOL(arch_rethook_prepare);

2022-01-25 22:58:45

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] ftrace: Add ftrace_set_filter_ips function

On Tue, 25 Jan 2022 21:11:46 +0900
Masami Hiramatsu <[email protected]> wrote:

> From: Jiri Olsa <[email protected]>
>
> Adding ftrace_set_filter_ips function to be able to set filter on
> multiple ip addresses at once.
>
> With the kprobe multi attach interface we have cases where we need to
> initialize the ftrace_ops object with thousands of functions, so having a
> single function diving into ftrace_hash_move_and_update_ops with
> ftrace_lock held is faster.
>
> The function ips are passed as an unsigned long array with a count.
>
> Signed-off-by: Jiri Olsa <[email protected]>
> ---
> include/linux/ftrace.h | 3 +++
> kernel/trace/ftrace.c | 53 ++++++++++++++++++++++++++++++++++++++++--------
> 2 files changed, 47 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 9999e29187de..60847cbce0da 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -512,6 +512,8 @@ struct dyn_ftrace {
>
> int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> int remove, int reset);
> +int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
> + unsigned int cnt, int remove, int reset);
> int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf,
> int len, int reset);
> int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf,
> @@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip)
> #define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; })
> #define ftrace_set_early_filter(ops, buf, enable) do { } while (0)
> #define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; })
> +#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; })
> #define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; })
> #define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; })
> #define ftrace_free_filter(ops) do { } while (0)
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index be5f6b32a012..39350aa38649 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
> }
>
> static int
> -ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> +__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> {
> struct ftrace_func_entry *entry;
>
> @@ -4976,9 +4976,25 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> return add_hash_entry(hash, ip);
> }
>
> +static int
> +ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
> + unsigned int cnt, int remove)
> +{
> + unsigned int i;
> + int err;
> +
> + for (i = 0; i < cnt; i++) {
> + err = __ftrace_match_addr(hash, ips[i], remove);
> + if (err)
> + return err;

On error should we revert what was done?

goto err;
> + }
> + return 0;

err:
for (i--; i >= 0; i--)
__ftrace_match_addr(hash, ips[i], !remove);
return err;

Although it may not matter, as it looks like it is only used on a temporary
hash. But either it should be commented that this is the case, or we do the
above just to be more robust.

> +}
> +
> static int
> ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> - unsigned long ip, int remove, int reset, int enable)
> + unsigned long *ips, unsigned int cnt,
> + int remove, int reset, int enable)
> {
> struct ftrace_hash **orig_hash;
> struct ftrace_hash *hash;
> @@ -5008,8 +5024,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> ret = -EINVAL;
> goto out_regex_unlock;
> }
> - if (ip) {
> - ret = ftrace_match_addr(hash, ip, remove);
> + if (ips) {
> + ret = ftrace_match_addr(hash, ips, cnt, remove);
> if (ret < 0)
> goto out_regex_unlock;
> }
> @@ -5026,10 +5042,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> }
>
> static int
> -ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
> - int reset, int enable)
> +ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
> + int remove, int reset, int enable)
> {
> - return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
> + return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
> }
>
> #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> @@ -5634,10 +5650,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> int remove, int reset)
> {
> ftrace_ops_init(ops);
> - return ftrace_set_addr(ops, ip, remove, reset, 1);
> + return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
> }
> EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
>
> +/**
> + * ftrace_set_filter_ips - set a functions to filter on in ftrace by addresses

- set functions to filter on ...

-- Steve

> + * @ops - the ops to set the filter with
> + * @ips - the array of addresses to add to or remove from the filter.
> + * @cnt - the number of addresses in @ips
> + * @remove - non zero to remove ips from the filter
> + * @reset - non zero to reset all filters before applying this filter.
> + *
> + * Filters denote which functions should be enabled when tracing is enabled
> + * If the @ips array or any ip within it is NULL, it fails to update the filter.
> + */
> +int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips,
> + unsigned int cnt, int remove, int reset)
> +{
> + ftrace_ops_init(ops);
> + return ftrace_set_addr(ops, ips, cnt, remove, reset, 1);
> +}
> +EXPORT_SYMBOL_GPL(ftrace_set_filter_ips);
> +
> /**
> * ftrace_ops_set_global_filter - setup ops to use global filters
> * @ops - the ops which will use the global filters
> @@ -5659,7 +5694,7 @@ static int
> ftrace_set_regex(struct ftrace_ops *ops, unsigned char *buf, int len,
> int reset, int enable)
> {
> - return ftrace_set_hash(ops, buf, len, 0, 0, reset, enable);
> + return ftrace_set_hash(ops, buf, len, NULL, 0, 0, reset, enable);
> }
>
> /**

2022-01-25 23:03:29

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Tue, 25 Jan 2022 21:11:57 +0900
Masami Hiramatsu <[email protected]> wrote:

> The fprobe is a wrapper API for ftrace function tracer.
> Unlike kprobes, this probe only supports the function entry, but
> it can probe multiple functions with one fprobe. The usage is almost
> the same as the kprobe: the user specifies the function names by
> fprobe::syms, the number of syms by fprobe::nentry,
> and the user handler by fprobe::entry_handler.
>
> struct fprobe fp = { 0 };
> const char *targets[] = { "func1", "func2", "func3"};
>
> fp.entry_handler = user_handler;
> fp.nentry = ARRAY_SIZE(targets);
> fp.syms = targets;
>
> ret = register_fprobe(&fp);
>
> CAUTION: if user entry handler changes registers including
> ip address, it will be applied when returns from the
> entry handler. So user handler must recover it.

Can you rephrase the above, I'm not sure what you mean by it.

>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> ---
> Changes in v4:
> - Fix a memory leak when symbol lookup failed.
> - Use ftrace location address instead of symbol address.
> - Convert the given symbol address to ftrace location automatically.
> - Rename fprobe::ftrace to fprobe::ops.
> - Update the Kconfig description.
> ---
> include/linux/fprobe.h | 80 ++++++++++++++++++++++++++++
> kernel/trace/Kconfig | 12 ++++
> kernel/trace/Makefile | 1
> kernel/trace/fprobe.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 228 insertions(+)
> create mode 100644 include/linux/fprobe.h
> create mode 100644 kernel/trace/fprobe.c
>
> diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
> new file mode 100644
> index 000000000000..f7de332b08c2
> --- /dev/null
> +++ b/include/linux/fprobe.h
> @@ -0,0 +1,80 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Simple ftrace probe wrapper */
> +#ifndef _LINUX_FPROBE_H
> +#define _LINUX_FPROBE_H
> +
> +#include <linux/compiler.h>
> +#include <linux/ftrace.h>
> +
> +/**
> + * struct fprobe - ftrace based probe.
> + * @syms: The array of symbols to probe.
> + * @addrs: The array of ftrace address of the symbols.
> + * @nentry: The number of entries of @syms or @addrs.
> + * @ops: The ftrace_ops.
> + * @nmissed: The counter for missing events.
> + * @flags: The status flag.
> + * @entry_handler: The callback function for function entry.
> + *
> + * User must set either @syms or @addrs, but not both. If user sets
> + * only @syms, the @addrs are generated when registering the fprobe.
> + * That auto-generated @addrs will be freed when unregistering.
> + */
> +struct fprobe {
> + const char **syms;
> + unsigned long *addrs;
> + unsigned int nentry;
> +
> + struct ftrace_ops ops;
> + unsigned long nmissed;
> + unsigned int flags;
> + void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
> +};
> +
> +#define FPROBE_FL_DISABLED 1
> +
> +static inline bool fprobe_disabled(struct fprobe *fp)
> +{
> + return (fp) ? fp->flags & FPROBE_FL_DISABLED : false;
> +}
> +
> +#ifdef CONFIG_FPROBE
> +int register_fprobe(struct fprobe *fp);
> +int unregister_fprobe(struct fprobe *fp);
> +#else
> +static inline int register_fprobe(struct fprobe *fp)
> +{
> + return -EOPNOTSUPP;
> +}
> +static inline int unregister_fprobe(struct fprobe *fp)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
> +
> +/**
> + * disable_fprobe() - Disable fprobe
> + * @fp: The fprobe to be disabled.
> + *
> + * This will soft-disable @fp. Note that this doesn't remove the ftrace
> + * hooks from the function entry.
> + */
> +static inline void disable_fprobe(struct fprobe *fp)
> +{
> + if (fp)
> + fp->flags |= FPROBE_FL_DISABLED;
> +}
> +
> +/**
> + * enable_fprobe() - Enable fprobe
> + * @fp: The fprobe to be enabled.
> + *
> + * This will soft-enable @fp.
> + */
> +static inline void enable_fprobe(struct fprobe *fp)
> +{
> + if (fp)
> + fp->flags &= ~FPROBE_FL_DISABLED;
> +}
> +
> +#endif
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 420ff4bc67fd..23483dd474b0 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -223,6 +223,18 @@ config DYNAMIC_FTRACE_WITH_ARGS
> depends on DYNAMIC_FTRACE
> depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS
>
> +config FPROBE
> + bool "Kernel Function Probe (fprobe)"
> + depends on FUNCTION_TRACER
> + depends on DYNAMIC_FTRACE_WITH_REGS
> + default n
> + help
> + This option enables kernel function probe (fprobe) based on ftrace,
> + which is similar to kprobes, but probes only for kernel function
> + entries and it can probe multiple functions by one fprobe.
> +
> + If unsure, say N.
> +
> config FUNCTION_PROFILER
> bool "Kernel function profiler"
> depends on FUNCTION_TRACER
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index bedc5caceec7..79255f9de9a4 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -97,6 +97,7 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
> obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
> obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
> obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
> +obj-$(CONFIG_FPROBE) += fprobe.o
>
> obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
>
> diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
> new file mode 100644
> index 000000000000..748cc34765c1
> --- /dev/null
> +++ b/kernel/trace/fprobe.c
> @@ -0,0 +1,135 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fprobe - Simple ftrace probe wrapper for function entry.
> + */
> +#define pr_fmt(fmt) "fprobe: " fmt
> +
> +#include <linux/fprobe.h>
> +#include <linux/kallsyms.h>
> +#include <linux/kprobes.h>
> +#include <linux/slab.h>
> +#include <linux/sort.h>
> +
> +static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
> + struct ftrace_ops *ops, struct ftrace_regs *fregs)
> +{
> + struct fprobe *fp;
> + int bit;
> +
> + fp = container_of(ops, struct fprobe, ops);
> + if (fprobe_disabled(fp))
> + return;
> +
> + bit = ftrace_test_recursion_trylock(ip, parent_ip);
> + if (bit < 0) {
> + fp->nmissed++;
> + return;
> + }
> +
> + if (fp->entry_handler)
> + fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
> +
> + ftrace_test_recursion_unlock(bit);
> +}
> +NOKPROBE_SYMBOL(fprobe_handler);
> +
> +/* Convert ftrace location address from symbols */
> +static int convert_func_addresses(struct fprobe *fp)
> +{
> + unsigned long addr, size;
> + unsigned int i;
> +
> + /* Convert symbols to symbol address */
> + if (fp->syms) {
> + fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
> + if (!fp->addrs)
> + return -ENOMEM;
> +
> + for (i = 0; i < fp->nentry; i++) {
> + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> + if (!fp->addrs[i]) /* Maybe wrong symbol */
> + goto error;
> + }
> + }

I wonder if we should just copy the addrs when fp->syms is not set, and
not have to worry about not freeing addrs (see below). This will make
things easier to maintain. Or better yet, have the syms and addrs passed
in, and then we assign it.

static int convert_func_addresses(struct fprobe *fp, const char **syms,
unsigned long *addrs)
{
unsigned int i;

fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
if (!fp->addrs)
return -ENOMEM;

if (syms) {
for (i = 0; i < fp->nentry; i++) {
fp->addrs[i] = kallsyms_lookup_name(syms[i]);
if (!fp->addrs[i]) /* Maybe wrong symbol */
goto error;
}
} else {
memcpy(fp->addrs, addrs, fp->nentry * sizeof(*addrs));
}

> +
> + /* Convert symbol address to ftrace location. */
> + for (i = 0; i < fp->nentry; i++) {
> + if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
> + size = MCOUNT_INSN_SIZE;
> + addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);
> + if (!addr) /* No dynamic ftrace there. */
> + goto error;
> + fp->addrs[i] = addr;
> + }
> +
> + return 0;
> +
> +error:
> + kfree(fp->addrs);

The above doesn't check if fp->syms was set, so if it wasn't we just freed
the addrs that were passed in. Again, I think these should be passed into
the register function as separate parameters and not via the fp handle.

> + fp->addrs = NULL;
> + return -ENOENT;
> +}
> +
> +/**
> + * register_fprobe() - Register fprobe to ftrace
> + * @fp: A fprobe data structure to be registered.
> + *
> + * This expects the user set @fp::entry_handler, @fp::syms or @fp:addrs,
> + * and @fp::nentry. If @fp::addrs are set, that will be updated to point
> + * the ftrace location. If @fp::addrs are NULL, this will generate it
> + * from @fp::syms.
> + * Note that you do not set both of @fp::addrs and @fp::syms.

Again, I think this should pass in the syms and addrs as parameters.

-- Steve

> + */
> +int register_fprobe(struct fprobe *fp)
> +{
> + int ret;
> +
> + if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
> + (fp->syms && fp->addrs))
> + return -EINVAL;
> +
> + ret = convert_func_addresses(fp);
> + if (ret < 0)
> + return ret;
> +
> + fp->nmissed = 0;
> + fp->ops.func = fprobe_handler;
> + fp->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
> +
> + ret = ftrace_set_filter_ips(&fp->ops, fp->addrs, fp->nentry, 0, 0);
> + if (!ret)
> + ret = register_ftrace_function(&fp->ops);
> +
> + if (ret < 0 && fp->syms) {
> + kfree(fp->addrs);
> + fp->addrs = NULL;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(register_fprobe);
> +
> +/**
> + * unregister_fprobe() - Unregister fprobe from ftrace
> + * @fp: A fprobe data structure to be unregistered.
> + *
> + * Unregister fprobe (and remove ftrace hooks from the function entries).
> + * If the @fp::addrs are generated by register_fprobe(), it will be removed
> + * automatically.
> + */
> +int unregister_fprobe(struct fprobe *fp)
> +{
> + int ret;
> +
> + if (!fp || !fp->nentry || !fp->addrs)
> + return -EINVAL;
> +
> + ret = unregister_ftrace_function(&fp->ops);
> +
> + if (!ret && fp->syms) {
> + kfree(fp->addrs);
> + fp->addrs = NULL;
> + }
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(unregister_fprobe);

2022-01-26 01:18:42

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Tue, Jan 25, 2022 at 09:11:57PM +0900, Masami Hiramatsu wrote:

SNIP

> +
> +/* Convert ftrace location address from symbols */
> +static int convert_func_addresses(struct fprobe *fp)
> +{
> + unsigned long addr, size;
> + unsigned int i;
> +
> + /* Convert symbols to symbol address */
> + if (fp->syms) {
> + fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
> + if (!fp->addrs)
> + return -ENOMEM;
> +
> + for (i = 0; i < fp->nentry; i++) {
> + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> + if (!fp->addrs[i]) /* Maybe wrong symbol */
> + goto error;
> + }
> + }
> +
> + /* Convert symbol address to ftrace location. */
> + for (i = 0; i < fp->nentry; i++) {
> + if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
> + size = MCOUNT_INSN_SIZE;
> + addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);

you need to subtract 1 from 'end' here, as explained in the
__within_notrace_func comment:

/*
* Since ftrace_location_range() does inclusive range check, we need
* to subtract 1 byte from the end address.
*/

like in the patch below

also this conversion is for archs where the address from kallsyms does not
match the real attach address, like for arm as you mentioned earlier, right?

could we have that arch specific, so we don't have an extra heavy search
loop for archs that do not need it?

thanks,
jirka


---
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 4d089dda89c2..7970418820e7 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -91,7 +91,7 @@ static int convert_func_addresses(struct fprobe *fp)
for (i = 0; i < fp->nentry; i++) {
if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
size = MCOUNT_INSN_SIZE;
- addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);
+ addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size - 1);
if (!addr) /* No dynamic ftrace there. */
goto error;
fp->addrs[i] = addr;

2022-01-26 01:19:16

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] ftrace: Add ftrace_set_filter_ips function

On Tue, Jan 25, 2022 at 11:06:59AM -0500, Steven Rostedt wrote:

SNIP

> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index be5f6b32a012..39350aa38649 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
> > }
> >
> > static int
> > -ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> > +__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> > {
> > struct ftrace_func_entry *entry;
> >
> > @@ -4976,9 +4976,25 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> > return add_hash_entry(hash, ip);
> > }
> >
> > +static int
> > +ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
> > + unsigned int cnt, int remove)
> > +{
> > + unsigned int i;
> > + int err;
> > +
> > + for (i = 0; i < cnt; i++) {
> > + err = __ftrace_match_addr(hash, ips[i], remove);
> > + if (err)
> > + return err;
>
> On error should we revert what was done?
>
> goto err;
> > + }
> > + return 0;
>
> err:
> for (i--; i >= 0; i--)
> __ftrace_match_addr(hash, ips[i], !remove);
> return err;
>
> Although it may not matter as it looks like it is only used on a temporary
> hash. But either it should be commented that is the case, or we do the above
> just to be more robust.

yes, that's the case.. it populates just the hash at this point
and if __ftrace_match_addr fails, the hash is released after
jumping to out_regex_unlock

>
> > +}
> > +
> > static int
> > ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> > - unsigned long ip, int remove, int reset, int enable)
> > + unsigned long *ips, unsigned int cnt,
> > + int remove, int reset, int enable)
> > {
> > struct ftrace_hash **orig_hash;
> > struct ftrace_hash *hash;
> > @@ -5008,8 +5024,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> > ret = -EINVAL;
> > goto out_regex_unlock;
> > }
> > - if (ip) {
> > - ret = ftrace_match_addr(hash, ip, remove);
> > + if (ips) {
> > + ret = ftrace_match_addr(hash, ips, cnt, remove);
> > if (ret < 0)
> > goto out_regex_unlock;
> > }
> > @@ -5026,10 +5042,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> > }
> >
> > static int
> > -ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
> > - int reset, int enable)
> > +ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
> > + int remove, int reset, int enable)
> > {
> > - return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
> > + return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
> > }
> >
> > #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> > @@ -5634,10 +5650,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> > int remove, int reset)
> > {
> > ftrace_ops_init(ops);
> > - return ftrace_set_addr(ops, ip, remove, reset, 1);
> > + return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
> > }
> > EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
> >
> > +/**
> > + * ftrace_set_filter_ips - set a functions to filter on in ftrace by addresses
>
> - set functions to filter on ...

will fix,

thanks,
jirka

2022-01-26 01:19:50

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 3/9] rethook: Add a generic return hook

On Tue, 25 Jan 2022 21:12:09 +0900
Masami Hiramatsu <[email protected]> wrote:

> Add a return hook framework which hooks the function
> return. Most of the idea came from the kretprobe, but
> this is independent from kretprobe.
> Note that this is expected to be used with other
> function entry hooking feature, like ftrace, fprobe,
> and kprobes. Eventually this will replace the
> kretprobe (e.g. kprobe + rethook = kretprobe), but
> at this moment, this is just an additional hook.

BTW, 74 characters is the recommended width of a change log (I usually use
76). Not sure why your width is set to 55 characters.

>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> ---
> Changes in v4:
> - Fix rethook_trampoline_handler() loops as same as
> what currently kretprobe does. This will fix some
> stacktrace issue in the rethook handler.
> ---
> include/linux/rethook.h | 99 +++++++++++++++
> include/linux/sched.h | 3
> kernel/exit.c | 2
> kernel/fork.c | 3
> kernel/trace/Kconfig | 11 ++
> kernel/trace/Makefile | 1
> kernel/trace/rethook.c | 311 +++++++++++++++++++++++++++++++++++++++++++++++
> 7 files changed, 430 insertions(+)
> create mode 100644 include/linux/rethook.h
> create mode 100644 kernel/trace/rethook.c
>
> diff --git a/include/linux/rethook.h b/include/linux/rethook.h
> new file mode 100644
> index 000000000000..39cfbff1a03c
> --- /dev/null
> +++ b/include/linux/rethook.h
> @@ -0,0 +1,99 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Return hooking with list-based shadow stack.
> + */
> +#ifndef _LINUX_RETHOOK_H
> +#define _LINUX_RETHOOK_H
> +
> +#include <linux/compiler.h>
> +#include <linux/freelist.h>
> +#include <linux/llist.h>
> +#include <linux/rcupdate.h>
> +#include <linux/refcount.h>
> +
> +struct rethook_node;
> +
> +typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
> +
> +/**
> + * struct rethook - The rethook management data structure.
> + * @data: The user-defined data storage.
> + * @handler: The user-defined return hook handler.
> + * @pool: The pool of struct rethook_node.
> + * @ref: The reference counter.
> + * @rcu: The rcu_head for deferred freeing.
> + *
> + * Don't embed to another data structure, because this is a self-destructive
> + * data structure when all rethook_node are freed.
> + */
> +struct rethook {
> + void *data;
> + rethook_handler_t handler;
> + struct freelist_head pool;
> + refcount_t ref;
> + struct rcu_head rcu;
> +};
> +
> +/**
> + * struct rethook_node - The rethook shadow-stack entry node.
> + * @freelist: The freelist, linked to struct rethook::pool.
> + * @rcu: The rcu_head for deferred freeing.
> + * @llist: The llist, linked to a struct task_struct::rethooks.
> + * @rethook: The pointer to the struct rethook.
> + * @ret_addr: The storage for the real return address.
> + * @frame: The stroage for the frame pointer.

"storage"

> + *
> + * You can embed this with your extended data structure to store any data
> + * on the entry of shadow stack.

"the shadow stack"?

> + */
> +struct rethook_node {
> + union {
> + struct freelist_node freelist;
> + struct rcu_head rcu;
> + };
> + struct llist_node llist;
> + struct rethook *rethook;
> + unsigned long ret_addr;
> + unsigned long frame;
> +};
> +
> +struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
> +void rethook_free(struct rethook *rh);
> +void rethook_add_node(struct rethook *rh, struct rethook_node *node);
> +struct rethook_node *rethook_try_get(struct rethook *rh);
> +void rethook_recycle(struct rethook_node *node);
> +void rethook_hook(struct rethook_node *node, struct pt_regs *regs);
> +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
> + struct llist_node **cur);
> +
> +/* Arch dependent code must implement arch_* and trampoline code */
> +void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs);
> +void arch_rethook_trampoline(void);
> +
> +/**
> + * is_rethook_trampoline() - Check whether the address is rethook trampoline
> + * @addr: The address to be checked
> + *
> + * Return true if the @addr is rethook trampoline address.

"is a rethook"

> + */
> +static inline bool is_rethook_trampoline(unsigned long addr)
> +{
> + return addr == (unsigned long)arch_rethook_trampoline;

Will this work on architectures like PPC that have strange ways of holding
the function addresses? Or is that what the below fixup handles?

> +}
> +
> +/* If the architecture needs a fixup the return address, implement it. */

"needs to fixup the"

> +void arch_rethook_fixup_return(struct pt_regs *regs,
> + unsigned long correct_ret_addr);
> +
> +/* Generic trampoline handler, arch code must prepare asm stub */
> +unsigned long rethook_trampoline_handler(struct pt_regs *regs,
> + unsigned long frame);
> +
> +#ifdef CONFIG_RETHOOK
> +void rethook_flush_task(struct task_struct *tk);
> +#else
> +#define rethook_flush_task(tsk) do { } while (0)
> +#endif
> +
> +#endif
> +
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 78c351e35fec..2bfabf5355b7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1473,6 +1473,9 @@ struct task_struct {
> #ifdef CONFIG_KRETPROBES
> struct llist_head kretprobe_instances;
> #endif
> +#ifdef CONFIG_RETHOOK
> + struct llist_head rethooks;
> +#endif
>
> #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
> /*
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f702a6a63686..a39a321c1f37 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -64,6 +64,7 @@
> #include <linux/compat.h>
> #include <linux/io_uring.h>
> #include <linux/kprobes.h>
> +#include <linux/rethook.h>
>
> #include <linux/uaccess.h>
> #include <asm/unistd.h>
> @@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
> struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
>
> kprobe_flush_task(tsk);
> + rethook_flush_task(tsk);
> perf_event_delayed_put(tsk);
> trace_sched_process_free(tsk);
> put_task_struct(tsk);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3244cc56b697..ffae38be64c4 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process(
> #ifdef CONFIG_KRETPROBES
> p->kretprobe_instances.first = NULL;
> #endif
> +#ifdef CONFIG_RETHOOK
> + p->rethooks.first = NULL;
> +#endif
>
> /*
> * Ensure that the cgroup subsystem policies allow the new process to be
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 23483dd474b0..4d27e56c6e76 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
> config NOP_TRACER
> bool
>
> +config HAVE_RETHOOK
> + bool
> +
> +config RETHOOK
> + bool
> + depends on HAVE_RETHOOK
> + help
> + Enable generic return hooking feature. This is an internal
> + API, which will be used by other function-entry hooking
> + feature like fprobe and kprobes.

"features"

> +
> config HAVE_FUNCTION_TRACER
> bool
> help
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 79255f9de9a4..c6f11a139eac 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -98,6 +98,7 @@ obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
> obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
> obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
> obj-$(CONFIG_FPROBE) += fprobe.o
> +obj-$(CONFIG_RETHOOK) += rethook.o
>
> obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
>
> diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> new file mode 100644
> index 000000000000..76c9848b44a9
> --- /dev/null
> +++ b/kernel/trace/rethook.c
> @@ -0,0 +1,311 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define pr_fmt(fmt) "rethook: " fmt
> +
> +#include <linux/bug.h>
> +#include <linux/kallsyms.h>
> +#include <linux/kprobes.h>
> +#include <linux/preempt.h>
> +#include <linux/rethook.h>
> +#include <linux/slab.h>
> +#include <linux/sort.h>
> +
> +/* Return hook list (shadow stack by list) */
> +
> +/*
> + * This function is called from delayed_put_task_struct() when a task is
> + * dead and cleaned up to recycle any kretprobe instances associated with
> + * this task. These left over instances represent probed functions that
> + * have been called but will never return.
> + */
> +void rethook_flush_task(struct task_struct *tk)
> +{
> + struct rethook_node *rhn;
> + struct llist_node *node;
> +
> + preempt_disable();
> +
> + node = __llist_del_all(&tk->rethooks);

Hmm, this keeps preemption disabled for the entire walk of the list.
Can we enable it here, and then just disable it when calling
rethook_recycle()?

> + while (node) {
> + rhn = container_of(node, struct rethook_node, llist);
> + node = node->next;

preempt_disable();
> + rethook_recycle(rhn);
preempt_enable();

? I'm concerned about the latency that this can add to RT tasks.

> + }
> +
> + preempt_enable();
> +}
> +
> +static void rethook_free_rcu(struct rcu_head *head)
> +{
> + struct rethook *rh = container_of(head, struct rethook, rcu);
> + struct rethook_node *rhn;
> + struct freelist_node *node;
> + int count = 1;
> +
> + node = rh->pool.head;
> + while (node) {
> + rhn = container_of(node, struct rethook_node, freelist);
> + node = node->next;
> + kfree(rhn);
> + count++;
> + }
> +
> + /* The rh->ref is the number of pooled node + 1 */
> + if (refcount_sub_and_test(count, &rh->ref))
> + kfree(rh);
> +}
> +
> +/**
> + * rethook_free() - Free struct rethook.
> + * @rh: the struct rethook to be freed.
> + *
> + * Free the rethook. Before calling this function, user must ensure the
> + * @rh::data is cleaned if needed (or, the handler can access it after
> + * calling this function.) This function will set the @rh to be freed
> + * after all rethook_node are freed (not soon). And the caller must
> + * not touch @rh after calling this.
> + */
> +void rethook_free(struct rethook *rh)
> +{
> + rcu_assign_pointer(rh->handler, NULL);
> +
> + call_rcu(&rh->rcu, rethook_free_rcu);
> +}
> +
> +/**
> + * rethook_alloc() - Allocate struct rethook.
> + * @data: a data to pass the @handler when hooking the return.
> + * @handler: the return hook callback function.
> + *
> + * Allocate and initialize a new rethook with @data and @handler.
> + * Return NULL if memory allocation fails or @handler is NULL.
> + * Note that @handler == NULL means this rethook is going to be freed.
> + */
> +struct rethook *rethook_alloc(void *data, rethook_handler_t handler)
> +{
> + struct rethook *rh = kzalloc(sizeof(struct rethook), GFP_KERNEL);
> +
> + if (!rh || !handler)
> + return NULL;
> +
> + rh->data = data;
> + rh->handler = handler;
> + rh->pool.head = NULL;
> + refcount_set(&rh->ref, 1);
> +
> + return rh;
> +}
> +
> +/**
> + * rethook_add_node() - Add a new node to the rethook.
> + * @rh: the struct rethook.
> + * @node: the struct rethook_node to be added.
> + *
> + * Add @node to @rh. User must allocate @node (as a part of user's
> + * data structure.) The @node fields are initialized in this function.
> + */
> +void rethook_add_node(struct rethook *rh, struct rethook_node *node)
> +{
> + node->rethook = rh;
> + freelist_add(&node->freelist, &rh->pool);
> + refcount_inc(&rh->ref);
> +}
> +
> +static void free_rethook_node_rcu(struct rcu_head *head)
> +{
> + struct rethook_node *node = container_of(head, struct rethook_node, rcu);
> +
> + if (refcount_dec_and_test(&node->rethook->ref))
> + kfree(node->rethook);
> + kfree(node);
> +}
> +
> +/**
> + * rethook_recycle() - return the node to rethook.
> + * @node: The struct rethook_node to be returned.
> + *
> + * Return back the @node to @node::rethook. If the @node::rethook is already
> + * marked as freed, this will free the @node.
> + */
> +void rethook_recycle(struct rethook_node *node)
> +{
> + lockdep_assert_preemption_disabled();
> +
> + if (likely(READ_ONCE(node->rethook->handler)))
> + freelist_add(&node->freelist, &node->rethook->pool);
> + else
> + call_rcu(&node->rcu, free_rethook_node_rcu);
> +}
> +NOKPROBE_SYMBOL(rethook_recycle);
> +
> +/**
> + * rethook_try_get() - get an unused rethook node.
> + * @rh: The struct rethook which pools the nodes.
> + *
> + * Get an unused rethook node from @rh. If the node pool is empty, this
> + * will return NULL. Caller must disable preemption.
> + */
> +struct rethook_node *rethook_try_get(struct rethook *rh)
> +{
> + rethook_handler_t handler = READ_ONCE(rh->handler);
> + struct freelist_node *fn;
> +
> + lockdep_assert_preemption_disabled();
> +
> + /* Check whether @rh is going to be freed. */
> + if (unlikely(!handler))
> + return NULL;
> +
> + fn = freelist_try_get(&rh->pool);
> + if (!fn)
> + return NULL;
> +
> + return container_of(fn, struct rethook_node, freelist);
> +}
> +NOKPROBE_SYMBOL(rethook_try_get);
> +
> +/**
> + * rethook_hook() - Hook the current function return.
> + * @node: The struct rethook node to hook the function return.
> + * @regs: The struct pt_regs for the function entry.
> + *
> + * Hook the current running function return. This must be called when the
> + * function entry (or at least @regs must be the registers of the function
> + * entry.)
> + */
> +void rethook_hook(struct rethook_node *node, struct pt_regs *regs)
> +{
> + arch_rethook_prepare(node, regs);
> + __llist_add(&node->llist, &current->rethooks);
> +}
> +NOKPROBE_SYMBOL(rethook_hook);
> +
> +/* This assumes the 'tsk' is the current task or the is not running. */

"or the is not running" ?

> +static unsigned long __rethook_find_ret_addr(struct task_struct *tsk,
> + struct llist_node **cur)
> +{
> + struct rethook_node *rh = NULL;
> + struct llist_node *node = *cur;
> +
> + if (!node)
> + node = tsk->rethooks.first;
> + else
> + node = node->next;
> +
> + while (node) {
> + rh = container_of(node, struct rethook_node, llist);
> + if (rh->ret_addr != (unsigned long)arch_rethook_trampoline) {
> + *cur = node;
> + return rh->ret_addr;
> + }
> + node = node->next;
> + }
> + return 0;
> +}
> +NOKPROBE_SYMBOL(__rethook_find_ret_addr);
> +
> +/**
> + * rethook_find_ret_addr -- Find correct return address modified by rethook
> + * @tsk: Target task
> + * @frame: A frame pointer
> + * @cur: a storage of the loop cursor llist_node pointer for next call
> + *
> + * Find the correct return address modified by a rethook on @tsk in unsigned
> + * long type. If it finds the return address, this returns that address value,
> + * or this returns 0.

space

> + * The @tsk must be 'current' or a task which is not running. @frame is a hint

How do you know a tsk is not running? How can that be guaranteed?

> + * to get the currect return address - which is compared with the
> + * rethook::frame field. The @cur is a loop cursor for searching the
> + * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the
> + * first call, but '@cur' itself must NOT NULL.

I know you state what the return value is above, but it should be stated
(again) here. As kernel-doc should have a separate section for return
values:

* Returns found address value or zero if not found.

> + */
> +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
> + struct llist_node **cur)
> +{
> + struct rethook_node *rhn = NULL;
> + unsigned long ret;
> +
> + if (WARN_ON_ONCE(!cur))
> + return 0;
> +
> + do {
> + ret = __rethook_find_ret_addr(tsk, cur);
> + if (!ret)
> + break;
> + rhn = container_of(*cur, struct rethook_node, llist);
> + } while (rhn->frame != frame);
> +
> + return ret;
> +}

-- Steve


> +NOKPROBE_SYMBOL(rethook_find_ret_addr);
> +
> +void __weak arch_rethook_fixup_return(struct pt_regs *regs,
> + unsigned long correct_ret_addr)
> +{
> + /*
> + * Do nothing by default. If the architecture which uses a
> + * frame pointer to record real return address on the stack,
> + * it should fill this function to fixup the return address
> + * so that stacktrace works from the rethook handler.
> + */
> +}
> +
> +/* This function will be called from each arch-defined trampoline. */
> +unsigned long rethook_trampoline_handler(struct pt_regs *regs,
> + unsigned long frame)
> +{
> + struct llist_node *first, *node = NULL;
> + unsigned long correct_ret_addr;
> + rethook_handler_t handler;
> + struct rethook_node *rhn;
> +
> + correct_ret_addr = __rethook_find_ret_addr(current, &node);
> + if (!correct_ret_addr) {
> + pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n");
> + BUG_ON(1);
> + }
> +
> + instruction_pointer_set(regs, correct_ret_addr);
> +
> + /*
> + * These loops must be protected from rethook_free_rcu() because those
> + * are accessing 'rhn->rethook'.
> + */
> + preempt_disable();
> +
> + /*
> + * Run the handler on the shadow stack. Do not unlink the list here because
> + * stackdump inside the handlers needs to decode it.
> + */
> + first = current->rethooks.first;
> + while (first) {
> + rhn = container_of(first, struct rethook_node, llist);
> + if (WARN_ON_ONCE(rhn->frame != frame))
> + break;
> + handler = READ_ONCE(rhn->rethook->handler);
> + if (handler)
> + handler(rhn, rhn->rethook->data, regs);
> +
> + if (first == node)
> + break;
> + first = first->next;
> + }
> +
> + /* Fixup registers for returning to correct address. */
> + arch_rethook_fixup_return(regs, correct_ret_addr);
> +
> + /* Unlink used shadow stack */
> + first = current->rethooks.first;
> + current->rethooks.first = node->next;
> + node->next = NULL;
> +
> + while (first) {
> + rhn = container_of(first, struct rethook_node, llist);
> + first = first->next;
> + rethook_recycle(rhn);
> + }
> + preempt_enable();
> +
> + return correct_ret_addr;
> +}
> +NOKPROBE_SYMBOL(rethook_trampoline_handler);

2022-01-26 01:25:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 7/9] fprobe: Add exit_handler support

On Tue, 25 Jan 2022 21:12:56 +0900
Masami Hiramatsu <[email protected]> wrote:

> Add exit_handler to fprobe. fprobe + rethook allows us
> to hook the kernel function return without fgraph tracer.
> Eventually, the fgraph tracer will be generic array based
> return hooking and fprobe may use it if user requests.
> Since both array-based approach and list-based approach
> have Pros and Cons, (e.g. memory consumption v.s. less
> missing events) it is better to keep both but fprobe
> will provide the same exit-handler interface.

Again the 55 character width ;-)

>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> ---
> Changes in v5:
> - Add dependency for HAVE_RETHOOK.
> Changes in v4:
> - Check fprobe is disabled in the exit handler.
> Changes in v3:
> - Make sure to clear rethook->data before free.
> - Handler checks the data is not NULL.
> - Free rethook only if the rethook is using.
> ---

> @@ -82,6 +117,7 @@ static int convert_func_addresses(struct fprobe *fp)
> */
> int register_fprobe(struct fprobe *fp)
> {
> + unsigned int i, size;
> int ret;
>
> if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
> @@ -96,10 +132,29 @@ int register_fprobe(struct fprobe *fp)
> fp->ops.func = fprobe_handler;
> fp->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
>
> + /* Initialize rethook if needed */
> + if (fp->exit_handler) {
> + size = fp->nentry * num_possible_cpus() * 2;
> + fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler);

Shouldn't we check whether fp->rethook was successfully allocated?

> + for (i = 0; i < size; i++) {
> + struct rethook_node *node;
> +
> + node = kzalloc(sizeof(struct fprobe_rethook_node), GFP_KERNEL);
> + if (!node) {
> + rethook_free(fp->rethook);
> + ret = -ENOMEM;
> + goto out;
> + }
> + rethook_add_node(fp->rethook, node);
> + }
> + } else
> + fp->rethook = NULL;
> +
> ret = ftrace_set_filter_ips(&fp->ops, fp->addrs, fp->nentry, 0, 0);
> if (!ret)
> ret = register_ftrace_function(&fp->ops);
>
> +out:
> if (ret < 0 && fp->syms) {
> kfree(fp->addrs);
> fp->addrs = NULL;
> @@ -125,8 +180,16 @@ int unregister_fprobe(struct fprobe *fp)
> return -EINVAL;
>
> ret = unregister_ftrace_function(&fp->ops);
> + if (ret < 0)
> + return ret;

If we fail to unregister the fp->ops, we do not free the allocated nodes
above?

-- Steve

>
> - if (!ret && fp->syms) {
> + if (fp->rethook) {
> + /* Make sure to clear rethook->data before freeing. */
> + WRITE_ONCE(fp->rethook->data, NULL);
> + barrier();
> + rethook_free(fp->rethook);
> + }
> + if (fp->syms) {
> kfree(fp->addrs);
> fp->addrs = NULL;
> }

2022-01-26 03:18:41

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Tue, Jan 25, 2022 at 05:41:24PM +0100, Jiri Olsa wrote:
> On Tue, Jan 25, 2022 at 09:11:57PM +0900, Masami Hiramatsu wrote:
>
> SNIP
>
> > +
> > +/* Convert ftrace location address from symbols */
> > +static int convert_func_addresses(struct fprobe *fp)
> > +{
> > + unsigned long addr, size;
> > + unsigned int i;
> > +
> > + /* Convert symbols to symbol address */
> > + if (fp->syms) {
> > + fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
> > + if (!fp->addrs)
> > + return -ENOMEM;
> > +
> > + for (i = 0; i < fp->nentry; i++) {
> > + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> > + if (!fp->addrs[i]) /* Maybe wrong symbol */
> > + goto error;
> > + }
> > + }
> > +
> > + /* Convert symbol address to ftrace location. */
> > + for (i = 0; i < fp->nentry; i++) {
> > + if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
> > + size = MCOUNT_INSN_SIZE;
> > + addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);
>
> you need to subtract 1 from 'end' in here, as explained in
> __within_notrace_func comment:
>
> /*
> * Since ftrace_location_range() does inclusive range check, we need
> * to subtract 1 byte from the end address.
> */
>
> like in the patch below
>
> also this convert is for archs where address from kallsyms does not match
> the real attach addresss, like for arm you mentioned earlier, right?
>
> could we have that arch specific, so we don't have extra heavy search
> loop for archs that do not need it?

one more question..

I'm adding support for users to pass function symbols to the bpf fprobe link,
and I thought I'd pass the symbols array to register_fprobe, but I'd need to
copy the whole array of strings from user space first, which could take a
lot of memory when attaching to 10k+ functions

so I'm thinking a better way is to resolve the symbols already in the bpf
fprobe link code and pass just the addresses to register_fprobe

I assume you want to keep the symbol interface, right? could we have some
flag ensuring the conversion code is skipped, so we don't go through
it twice?

in any case I need the addresses before I call register_fprobe, because
of the bpf cookie setup

thanks,
jirka

2022-01-26 13:39:08

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 3/9] rethook: Add a generic return hook

On Tue, 25 Jan 2022 11:46:15 -0500
Steven Rostedt <[email protected]> wrote:

> On Tue, 25 Jan 2022 21:12:09 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Add a return hook framework which hooks the function
> > return. Most of the idea came from the kretprobe, but
> > this is independent from kretprobe.
> > Note that this is expected to be used with other
> > function entry hooking feature, like ftrace, fprobe,
> > adn kprobes. Eventually this will replace the
> > kretprobe (e.g. kprobe + rethook = kretprobe), but
> > at this moment, this is just a additional hook.
>
> BTW, 74 characters is the recommended width of a change log (I usually use
> 76). Not sure why your width is set to 55 characters.

OK, let me make it wider. It's a strange habit of mine (maybe it comes
from writing plain-text email.)

Thank you!

>
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > ---
> > Changes in v4:
> > - Fix rethook_trampoline_handler() loops as same as
> > what currently kretprobe does. This will fix some
> > stacktrace issue in the rethook handler.
> > ---
> > include/linux/rethook.h | 99 +++++++++++++++
> > include/linux/sched.h | 3
> > kernel/exit.c | 2
> > kernel/fork.c | 3
> > kernel/trace/Kconfig | 11 ++
> > kernel/trace/Makefile | 1
> > kernel/trace/rethook.c | 311 +++++++++++++++++++++++++++++++++++++++++++++++
> > 7 files changed, 430 insertions(+)
> > create mode 100644 include/linux/rethook.h
> > create mode 100644 kernel/trace/rethook.c
> >
> > diff --git a/include/linux/rethook.h b/include/linux/rethook.h
> > new file mode 100644
> > index 000000000000..39cfbff1a03c
> > --- /dev/null
> > +++ b/include/linux/rethook.h
> > @@ -0,0 +1,99 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Return hooking with list-based shadow stack.
> > + */
> > +#ifndef _LINUX_RETHOOK_H
> > +#define _LINUX_RETHOOK_H
> > +
> > +#include <linux/compiler.h>
> > +#include <linux/freelist.h>
> > +#include <linux/llist.h>
> > +#include <linux/rcupdate.h>
> > +#include <linux/refcount.h>
> > +
> > +struct rethook_node;
> > +
> > +typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
> > +
> > +/**
> > + * struct rethook - The rethook management data structure.
> > + * @data: The user-defined data storage.
> > + * @handler: The user-defined return hook handler.
> > + * @pool: The pool of struct rethook_node.
> > + * @ref: The reference counter.
> > + * @rcu: The rcu_head for deferred freeing.
> > + *
> > + * Don't embed to another data structure, because this is a self-destructive
> > + * data structure when all rethook_node are freed.
> > + */
> > +struct rethook {
> > + void *data;
> > + rethook_handler_t handler;
> > + struct freelist_head pool;
> > + refcount_t ref;
> > + struct rcu_head rcu;
> > +};
> > +
> > +/**
> > + * struct rethook_node - The rethook shadow-stack entry node.
> > + * @freelist: The freelist, linked to struct rethook::pool.
> > + * @rcu: The rcu_head for deferred freeing.
> > + * @llist: The llist, linked to a struct task_struct::rethooks.
> > + * @rethook: The pointer to the struct rethook.
> > + * @ret_addr: The storage for the real return address.
> > + * @frame: The stroage for the frame pointer.
>
> "storage"
>
> > + *
> > + * You can embed this with your extended data structure to store any data
> > + * on the entry of shadow stack.
>
> "the shadow stack"?
>
> > + */
> > +struct rethook_node {
> > + union {
> > + struct freelist_node freelist;
> > + struct rcu_head rcu;
> > + };
> > + struct llist_node llist;
> > + struct rethook *rethook;
> > + unsigned long ret_addr;
> > + unsigned long frame;
> > +};
> > +
> > +struct rethook *rethook_alloc(void *data, rethook_handler_t handler);
> > +void rethook_free(struct rethook *rh);
> > +void rethook_add_node(struct rethook *rh, struct rethook_node *node);
> > +struct rethook_node *rethook_try_get(struct rethook *rh);
> > +void rethook_recycle(struct rethook_node *node);
> > +void rethook_hook(struct rethook_node *node, struct pt_regs *regs);
> > +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
> > + struct llist_node **cur);
> > +
> > +/* Arch dependent code must implement arch_* and trampoline code */
> > +void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs);
> > +void arch_rethook_trampoline(void);
> > +
> > +/**
> > + * is_rethook_trampoline() - Check whether the address is rethook trampoline
> > + * @addr: The address to be checked
> > + *
> > + * Return true if the @addr is rethook trampoline address.
>
> "is a rethook"
>
> > + */
> > +static inline bool is_rethook_trampoline(unsigned long addr)
> > +{
> > + return addr == (unsigned long)arch_rethook_trampoline;
>
> Will this work on architectures like PPC that have strange ways of holding
> the function addresses? Or is that what the below fixup handles?
>
> > +}
> > +
> > +/* If the architecture needs a fixup the return address, implement it. */
>
> "needs to fixup the"
>
> > +void arch_rethook_fixup_return(struct pt_regs *regs,
> > + unsigned long correct_ret_addr);
> > +
> > +/* Generic trampoline handler, arch code must prepare asm stub */
> > +unsigned long rethook_trampoline_handler(struct pt_regs *regs,
> > + unsigned long frame);
> > +
> > +#ifdef CONFIG_RETHOOK
> > +void rethook_flush_task(struct task_struct *tk);
> > +#else
> > +#define rethook_flush_task(tsk) do { } while (0)
> > +#endif
> > +
> > +#endif
> > +
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 78c351e35fec..2bfabf5355b7 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1473,6 +1473,9 @@ struct task_struct {
> > #ifdef CONFIG_KRETPROBES
> > struct llist_head kretprobe_instances;
> > #endif
> > +#ifdef CONFIG_RETHOOK
> > + struct llist_head rethooks;
> > +#endif
> >
> > #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
> > /*
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index f702a6a63686..a39a321c1f37 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -64,6 +64,7 @@
> > #include <linux/compat.h>
> > #include <linux/io_uring.h>
> > #include <linux/kprobes.h>
> > +#include <linux/rethook.h>
> >
> > #include <linux/uaccess.h>
> > #include <asm/unistd.h>
> > @@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
> > struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
> >
> > kprobe_flush_task(tsk);
> > + rethook_flush_task(tsk);
> > perf_event_delayed_put(tsk);
> > trace_sched_process_free(tsk);
> > put_task_struct(tsk);
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 3244cc56b697..ffae38be64c4 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process(
> > #ifdef CONFIG_KRETPROBES
> > p->kretprobe_instances.first = NULL;
> > #endif
> > +#ifdef CONFIG_RETHOOK
> > + p->rethooks.first = NULL;
> > +#endif
> >
> > /*
> > * Ensure that the cgroup subsystem policies allow the new process to be
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 23483dd474b0..4d27e56c6e76 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
> > config NOP_TRACER
> > bool
> >
> > +config HAVE_RETHOOK
> > + bool
> > +
> > +config RETHOOK
> > + bool
> > + depends on HAVE_RETHOOK
> > + help
> > + Enable generic return hooking feature. This is an internal
> > + API, which will be used by other function-entry hooking
> > + feature like fprobe and kprobes.
>
> "features"
>
> > +
> > config HAVE_FUNCTION_TRACER
> > bool
> > help
> > SNIP


--
Masami Hiramatsu <[email protected]>

2022-01-26 16:20:27

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Tue, 25 Jan 2022 19:11:52 +0100
Jiri Olsa <[email protected]> wrote:

> On Tue, Jan 25, 2022 at 05:41:24PM +0100, Jiri Olsa wrote:
> > On Tue, Jan 25, 2022 at 09:11:57PM +0900, Masami Hiramatsu wrote:
> >
> > SNIP
> >
> > > +
> > > +/* Convert ftrace location address from symbols */
> > > +static int convert_func_addresses(struct fprobe *fp)
> > > +{
> > > + unsigned long addr, size;
> > > + unsigned int i;
> > > +
> > > + /* Convert symbols to symbol address */
> > > + if (fp->syms) {
> > > + fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
> > > + if (!fp->addrs)
> > > + return -ENOMEM;
> > > +
> > > + for (i = 0; i < fp->nentry; i++) {
> > > + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> > > + if (!fp->addrs[i]) /* Maybe wrong symbol */
> > > + goto error;
> > > + }
> > > + }
> > > +
> > > + /* Convert symbol address to ftrace location. */
> > > + for (i = 0; i < fp->nentry; i++) {
> > > + if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
> > > + size = MCOUNT_INSN_SIZE;
> > > + addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);
> >
> > you need to substract 1 from 'end' in here, as explained in
> > __within_notrace_func comment:
> >
> > /*
> > * Since ftrace_location_range() does inclusive range check, we need
> > * to subtract 1 byte from the end address.
> > */
> >
> > like in the patch below

Good catch, I missed that point.

> > also this convert is for archs where address from kallsyms does not match
> > the real attach addresss, like for arm you mentioned earlier, right?

Yes, that's right.

> >
> > could we have that arch specific, so we don't have extra heavy search
> > loop for archs that do not need it?

Hmm, is that really so heavy? Even so, this kind of conversion is better
implemented in the arch-specific ftrace code, which knows the best way to
do it because the offset is fixed for each arch.

>
> one more question..
>
> I'm adding support for user to pass function symbols to bpf fprobe link
> and I thought I'd pass symbols array to register_fprobe, but I'd need to
> copy the whole array of strings from user space first, which could take
> lot of memory considering attachment of 10k+ functions
>
> so I'm thinking better way is to resolve symbols already in bpf fprobe
> link code and pass just addresses to register_fprobe

That is OK. Fprobe accepts either ::syms or ::addrs.

>
> I assume you want to keep symbol interface, right? could we have some
> flag ensuring the conversion code is skipped, so we don't go through
> it twice?

Yeah, we still have many unused bits in fprobe::flags. :)

Thank you,

> in any case I need addresses before I call register_fprobe, because
> of the bpf cookies setup
>
> thanks,
> jirka
>


--
Masami Hiramatsu <[email protected]>

2022-01-26 18:42:50

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 3/9] rethook: Add a generic return hook

Hi Steve,

Thanks for the review.
I'll fix the typos.


On Tue, 25 Jan 2022 11:46:15 -0500
Steven Rostedt <[email protected]> wrote:

> > + */
> > +static inline bool is_rethook_trampoline(unsigned long addr)
> > +{
> > + return addr == (unsigned long)arch_rethook_trampoline;
>
> Will this work on architectures like PPC that have strange ways of holding
> the function addresses? Or is that what the below fixup handles?

Yes, I'll use dereference_symbol_descriptor().

> > +}
> > +
> > +/* If the architecture needs a fixup the return address, implement it. */
>
> "needs to fixup the"
>
> > +void arch_rethook_fixup_return(struct pt_regs *regs,
> > + unsigned long correct_ret_addr);
> > +
> > +/* Generic trampoline handler, arch code must prepare asm stub */
> > +unsigned long rethook_trampoline_handler(struct pt_regs *regs,
> > + unsigned long frame);
> > +
> > +#ifdef CONFIG_RETHOOK
> > +void rethook_flush_task(struct task_struct *tk);
> > +#else
> > +#define rethook_flush_task(tsk) do { } while (0)
> > +#endif
> > +
> > +#endif
> > +
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 78c351e35fec..2bfabf5355b7 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1473,6 +1473,9 @@ struct task_struct {
> > #ifdef CONFIG_KRETPROBES
> > struct llist_head kretprobe_instances;
> > #endif
> > +#ifdef CONFIG_RETHOOK
> > + struct llist_head rethooks;
> > +#endif
> >
> > #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH
> > /*
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index f702a6a63686..a39a321c1f37 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -64,6 +64,7 @@
> > #include <linux/compat.h>
> > #include <linux/io_uring.h>
> > #include <linux/kprobes.h>
> > +#include <linux/rethook.h>
> >
> > #include <linux/uaccess.h>
> > #include <asm/unistd.h>
> > @@ -169,6 +170,7 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
> > struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
> >
> > kprobe_flush_task(tsk);
> > + rethook_flush_task(tsk);
> > perf_event_delayed_put(tsk);
> > trace_sched_process_free(tsk);
> > put_task_struct(tsk);
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 3244cc56b697..ffae38be64c4 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -2282,6 +2282,9 @@ static __latent_entropy struct task_struct *copy_process(
> > #ifdef CONFIG_KRETPROBES
> > p->kretprobe_instances.first = NULL;
> > #endif
> > +#ifdef CONFIG_RETHOOK
> > + p->rethooks.first = NULL;
> > +#endif
> >
> > /*
> > * Ensure that the cgroup subsystem policies allow the new process to be
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 23483dd474b0..4d27e56c6e76 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -10,6 +10,17 @@ config USER_STACKTRACE_SUPPORT
> > config NOP_TRACER
> > bool
> >
> > +config HAVE_RETHOOK
> > + bool
> > +
> > +config RETHOOK
> > + bool
> > + depends on HAVE_RETHOOK
> > + help
> > + Enable generic return hooking feature. This is an internal
> > + API, which will be used by other function-entry hooking
> > + feature like fprobe and kprobes.
>
> "features"
>
> > +
> > config HAVE_FUNCTION_TRACER
> > bool
> > help
> > diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> > index 79255f9de9a4..c6f11a139eac 100644
> > --- a/kernel/trace/Makefile
> > +++ b/kernel/trace/Makefile
> > @@ -98,6 +98,7 @@ obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
> > obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
> > obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
> > obj-$(CONFIG_FPROBE) += fprobe.o
> > +obj-$(CONFIG_RETHOOK) += rethook.o
> >
> > obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
> >
> > diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
> > new file mode 100644
> > index 000000000000..76c9848b44a9
> > --- /dev/null
> > +++ b/kernel/trace/rethook.c
> > @@ -0,0 +1,311 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#define pr_fmt(fmt) "rethook: " fmt
> > +
> > +#include <linux/bug.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/kprobes.h>
> > +#include <linux/preempt.h>
> > +#include <linux/rethook.h>
> > +#include <linux/slab.h>
> > +#include <linux/sort.h>
> > +
> > +/* Return hook list (shadow stack by list) */
> > +
> > +/*
> > + * This function is called from delayed_put_task_struct() when a task is
> > + * dead and cleaned up to recycle any kretprobe instances associated with
> > + * this task. These left over instances represent probed functions that
> > + * have been called but will never return.
> > + */
> > +void rethook_flush_task(struct task_struct *tk)
> > +{
> > + struct rethook_node *rhn;
> > + struct llist_node *node;
> > +
> > + preempt_disable();
> > +
> > + node = __llist_del_all(&tk->rethooks);
>
> Hmm, this keeps preemption disabled for the entire walk of the list.
> Can we enable it here, and then just disable it when calling
> rethook_recycle()?

Yes, it is possible.

> > + while (node) {
> > + rhn = container_of(node, struct rethook_node, llist);
> > + node = node->next;
>
> preempt_disable();
> > + rethook_recycle(rhn);
> preempt_enable();
>
> ? I'm concerned about the latency that this can add on RT tasks.

OK, actually I just followed what the kretprobe does.

[SNIP]
> > +/* This assumes the 'tsk' is the current task or the is not running. */
>
> "or the is not running" ?

Oops, "a task which is not running"

> > +/**
> > + * rethook_find_ret_addr -- Find correct return address modified by rethook
> > + * @tsk: Target task
> > + * @frame: A frame pointer
> > + * @cur: a storage of the loop cursor llist_node pointer for next call
> > + *
> > + * Find the correct return address modified by a rethook on @tsk in unsigned
> > + * long type. If it finds the return address, this returns that address value,
> > + * or this returns 0.
>
> space
>
> > + * The @tsk must be 'current' or a task which is not running. @frame is a hint
>
> How do you know a tsk is not running? How can that be guaranteed?

There is no check yet (I expected the caller to ensure it).
I'll add a task_is_running() check.

> > + * to get the currect return address - which is compared with the
> > + * rethook::frame field. The @cur is a loop cursor for searching the
> > + * kretprobe return addresses on the @tsk. The '*@cur' should be NULL at the
> > + * first call, but '@cur' itself must NOT NULL.
>
> I know you state what the return value is above, but it should be stated
> (again) here. As kernel-doc should have a separate section for return
> values:
>
> * Returns found address value or zero if not found.

OK.

>
> > + */
> > +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame,
> > + struct llist_node **cur)
> > +{
> > + struct rethook_node *rhn = NULL;
> > + unsigned long ret;
> > +
> > + if (WARN_ON_ONCE(!cur))
> > + return 0;
> > +
> > + do {
> > + ret = __rethook_find_ret_addr(tsk, cur);
> > + if (!ret)
> > + break;
> > + rhn = container_of(*cur, struct rethook_node, llist);
> > + } while (rhn->frame != frame);
> > +
> > + return ret;
> > +}
>

Thank you!

> -- Steve
>
>
> > +NOKPROBE_SYMBOL(rethook_find_ret_addr);
> > +
> > +void __weak arch_rethook_fixup_return(struct pt_regs *regs,
> > + unsigned long correct_ret_addr)
> > +{
> > + /*
> > + * Do nothing by default. If the architecture which uses a
> > + * frame pointer to record real return address on the stack,
> > + * it should fill this function to fixup the return address
> > + * so that stacktrace works from the rethook handler.
> > + */
> > +}
> > +
> > +/* This function will be called from each arch-defined trampoline. */
> > +unsigned long rethook_trampoline_handler(struct pt_regs *regs,
> > + unsigned long frame)
> > +{
> > + struct llist_node *first, *node = NULL;
> > + unsigned long correct_ret_addr;
> > + rethook_handler_t handler;
> > + struct rethook_node *rhn;
> > +
> > + correct_ret_addr = __rethook_find_ret_addr(current, &node);
> > + if (!correct_ret_addr) {
> > + pr_err("rethook: Return address not found! Maybe there is a bug in the kernel\n");
> > + BUG_ON(1);
> > + }
> > +
> > + instruction_pointer_set(regs, correct_ret_addr);
> > +
> > + /*
> > + * These loops must be protected from rethook_free_rcu() because those
> > + * are accessing 'rhn->rethook'.
> > + */
> > + preempt_disable();
> > +
> > + /*
> > + * Run the handler on the shadow stack. Do not unlink the list here because
> > + * stackdump inside the handlers needs to decode it.
> > + */
> > + first = current->rethooks.first;
> > + while (first) {
> > + rhn = container_of(first, struct rethook_node, llist);
> > + if (WARN_ON_ONCE(rhn->frame != frame))
> > + break;
> > + handler = READ_ONCE(rhn->rethook->handler);
> > + if (handler)
> > + handler(rhn, rhn->rethook->data, regs);
> > +
> > + if (first == node)
> > + break;
> > + first = first->next;
> > + }
> > +
> > + /* Fixup registers for returning to correct address. */
> > + arch_rethook_fixup_return(regs, correct_ret_addr);
> > +
> > + /* Unlink used shadow stack */
> > + first = current->rethooks.first;
> > + current->rethooks.first = node->next;
> > + node->next = NULL;
> > +
> > + while (first) {
> > + rhn = container_of(first, struct rethook_node, llist);
> > + first = first->next;
> > + rethook_recycle(rhn);
> > + }
> > + preempt_enable();
> > +
> > + return correct_ret_addr;
> > +}
> > +NOKPROBE_SYMBOL(rethook_trampoline_handler);
>


--
Masami Hiramatsu <[email protected]>

2022-01-26 20:39:36

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Tue, 25 Jan 2022 11:21:23 -0500
Steven Rostedt <[email protected]> wrote:

> On Tue, 25 Jan 2022 21:11:57 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > The fprobe is a wrapper API for ftrace function tracer.
> > Unlike kprobes, this probes only supports the function entry, but
> > it can probe multiple functions by one fprobe. The usage is almost
> > same as the kprobe, user will specify the function names by
> > fprobe::syms, the number of syms by fprobe::nentry,
> > and the user handler by fprobe::entry_handler.
> >
> > struct fprobe fp = { 0 };
> > const char *targets[] = { "func1", "func2", "func3"};
> >
> > fp.handler = user_handler;
> > fp.nentry = ARRAY_SIZE(targets);
> > fp.syms = targets;
> >
> > ret = register_fprobe(&fp);
> >
> > CAUTION: if user entry handler changes registers including
> > ip address, it will be applied when returns from the
> > entry handler. So user handler must recover it.
>
> Can you rephrase the above, I'm not sure what you mean by it.

OK, but I think this should be written in the document.
I meant that the entry_handler can change the regs, but doing so will
change the execution path. So if the handler needs to change the
registers for some reason, it must restore them in the same
entry_handler.

[SNIP]
> > +/* Convert ftrace location address from symbols */
> > +static int convert_func_addresses(struct fprobe *fp)
> > +{
> > + unsigned long addr, size;
> > + unsigned int i;
> > +
> > + /* Convert symbols to symbol address */
> > + if (fp->syms) {
> > + fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
> > + if (!fp->addrs)
> > + return -ENOMEM;
> > +
> > + for (i = 0; i < fp->nentry; i++) {
> > + fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> > + if (!fp->addrs[i]) /* Maybe wrong symbol */
> > + goto error;
> > + }
> > + }
>
> I wonder if we should just copy the addrs when fp->syms is not set, and
> not have to worry about not freeing addrs (see below). This will make
> things easier to maintain. Or better yet, have the syms and addrs passed
> in, and then we assign it.
>
> static int convert_func_addresses(struct fprobe *fp, const char **syms,
> unsigned long *addrs)
> {
> unsigned int i;
>
> fp->addrs = kcalloc(fp->nentry, sizeof(*fp->addrs), GFP_KERNEL);
> if (!fp->addrs)
> return -ENOMEM;
>
> if (syms) {
> for (i = 0; i < fp->nentry; i++) {
> fp->addrs[i] = kallsyms_lookup_name(fp->syms[i]);
> if (!fp->addrs[i]) /* Maybe wrong symbol */
> goto error;
> }
> } else {
> memcpy(fp->addrs, addrs, fp->nentry * sizeof(*addrs));
> }

Actually, since fprobe doesn't touch the addrs and syms except at
registration time, instead of changing fp->addrs I would like to
make a temporary address array just for ftrace_set_filter_ips(). Then
we don't need to free it later.

>
> > +
> > + /* Convert symbol address to ftrace location. */
> > + for (i = 0; i < fp->nentry; i++) {
> > + if (!kallsyms_lookup_size_offset(fp->addrs[i], &size, NULL))
> > + size = MCOUNT_INSN_SIZE;
> > + addr = ftrace_location_range(fp->addrs[i], fp->addrs[i] + size);
> > + if (!addr) /* No dynamic ftrace there. */
> > + goto error;
> > + fp->addrs[i] = addr;
> > + }
> > +
> > + return 0;
> > +
> > +error:
> > + kfree(fp->addrs);
>
> The above doesn't check if fp->syms was set, so if it wasn't we just freed
> the addrs that was passed in. Again, I think these should be passed into
> the register function as separate parameters and not via the fp handle.

Agreed. I also would like to remove those params from struct fprobe.

>
> > + fp->addrs = NULL;
> > + return -ENOENT;
> > +}
> > +
> > +/**
> > + * register_fprobe() - Register fprobe to ftrace
> > + * @fp: A fprobe data structure to be registered.
> > + *
> > + * This expects the user set @fp::entry_handler, @fp::syms or @fp:addrs,
> > + * and @fp::nentry. If @fp::addrs are set, that will be updated to point
> > + * the ftrace location. If @fp::addrs are NULL, this will generate it
> > + * from @fp::syms.
> > + * Note that you do not set both of @fp::addrs and @fp::syms.
>
> Again, I think this should pass in the syms and addrs as parameters.

That's good to me :)

Thank you,

>
> -- Steve
>
> > + */
> > +int register_fprobe(struct fprobe *fp)
> > +{
> > + int ret;
> > +
> > + if (!fp || !fp->nentry || (!fp->syms && !fp->addrs) ||
> > + (fp->syms && fp->addrs))
> > + return -EINVAL;
> > +
> > + ret = convert_func_addresses(fp);
> > + if (ret < 0)
> > + return ret;
> > +
> > + fp->nmissed = 0;
> > + fp->ops.func = fprobe_handler;
> > + fp->ops.flags = FTRACE_OPS_FL_SAVE_REGS;
> > +
> > + ret = ftrace_set_filter_ips(&fp->ops, fp->addrs, fp->nentry, 0, 0);
> > + if (!ret)
> > + ret = register_ftrace_function(&fp->ops);
> > +
> > + if (ret < 0 && fp->syms) {
> > + kfree(fp->addrs);
> > + fp->addrs = NULL;
> > + }
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(register_fprobe);
> > +
> > +/**
> > + * unregister_fprobe() - Unregister fprobe from ftrace
> > + * @fp: A fprobe data structure to be unregistered.
> > + *
> > + * Unregister fprobe (and remove ftrace hooks from the function entries).
> > + * If the @fp::addrs are generated by register_fprobe(), it will be removed
> > + * automatically.
> > + */
> > +int unregister_fprobe(struct fprobe *fp)
> > +{
> > + int ret;
> > +
> > + if (!fp || !fp->nentry || !fp->addrs)
> > + return -EINVAL;
> > +
> > + ret = unregister_ftrace_function(&fp->ops);
> > +
> > + if (!ret && fp->syms) {
> > + kfree(fp->addrs);
> > + fp->addrs = NULL;
> > + }
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(unregister_fprobe);
>


--
Masami Hiramatsu <[email protected]>

2022-01-26 22:11:22

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Wed, 26 Jan 2022 11:50:22 +0900
Masami Hiramatsu <[email protected]> wrote:

> > one more question..
> >
> > I'm adding support for user to pass function symbols to bpf fprobe link
> > and I thought I'd pass symbols array to register_fprobe, but I'd need to
> > copy the whole array of strings from user space first, which could take
> > lot of memory considering attachment of 10k+ functions
> >
> > so I'm thinking better way is to resolve symbols already in bpf fprobe
> > link code and pass just addresses to register_fprobe
>
> That is OK. Fprobe accepts either ::syms or ::addrs.
>
> >
> > I assume you want to keep symbol interface, right? could we have some
> > flag ensuring the conversion code is skipped, so we don't go through
> > it twice?
>
> Yeah, we still have many unused bits in fprobe::flags. :)

Instead of that, according to Steve's comment, I would like to introduce
3 registration APIs.

int register_fprobe(struct fprobe *fp, const char *filter, const char *notrace);
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);

register_fprobe_ips() will not touch @addrs; you have to set the
correct ftrace location addresses in @addrs yourself.

Thank you,

--
Masami Hiramatsu <[email protected]>

2022-01-26 22:29:36

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v5 2/9] fprobe: Add ftrace based probe APIs

On Thu, Jan 27, 2022 at 12:59:52AM +0900, Masami Hiramatsu wrote:
> On Wed, 26 Jan 2022 11:50:22 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > > one more question..
> > >
> > > I'm adding support for user to pass function symbols to bpf fprobe link
> > > and I thought I'd pass symbols array to register_fprobe, but I'd need to
> > > copy the whole array of strings from user space first, which could take
> > > lot of memory considering attachment of 10k+ functions
> > >
> > > so I'm thinking better way is to resolve symbols already in bpf fprobe
> > > link code and pass just addresses to register_fprobe
> >
> > That is OK. Fprobe accepts either ::syms or ::addrs.
> >
> > >
> > > I assume you want to keep symbol interface, right? could we have some
> > > flag ensuring the conversion code is skipped, so we don't go through
> > > it twice?
> >
> > Yeah, we still have many unused bits in fprobe::flags. :)
>
> Instead of that, according to Steve's comment, I would like to introduce
> 3 registration APIs.
>
> int register_fprobe(struct fprobe *fp, const char *filter, const char *notrace);
> int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
> int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
>
> The register_fprobe_ips() will not touch the @addrs. You have to set the
> correct ftrace location address in the @addrs.

ok, sounds good

thanks,
jirka

2022-01-29 08:12:23

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] ftrace: Add ftrace_set_filter_ips function

Hi Jiri,

On Tue, 25 Jan 2022 17:46:07 +0100
Jiri Olsa <[email protected]> wrote:

> On Tue, Jan 25, 2022 at 11:06:59AM -0500, Steven Rostedt wrote:
>
> SNIP
>
> > > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > > index be5f6b32a012..39350aa38649 100644
> > > --- a/kernel/trace/ftrace.c
> > > +++ b/kernel/trace/ftrace.c
> > > @@ -4958,7 +4958,7 @@ ftrace_notrace_write(struct file *file, const char __user *ubuf,
> > > }
> > >
> > > static int
> > > -ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> > > +__ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> > > {
> > > struct ftrace_func_entry *entry;
> > >
> > > @@ -4976,9 +4976,25 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long ip, int remove)
> > > return add_hash_entry(hash, ip);
> > > }
> > >
> > > +static int
> > > +ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
> > > + unsigned int cnt, int remove)
> > > +{
> > > + unsigned int i;
> > > + int err;
> > > +
> > > + for (i = 0; i < cnt; i++) {
> > > + err = __ftrace_match_addr(hash, ips[i], remove);
> > > + if (err)
> > > + return err;
> >
> > On error should we revert what was done?
> >
> > goto err;
> > > + }
> > > + return 0;
> >
> > err:
> > for (i--; i >= 0; i--)
> > __ftrace_match_addr(hash, ips[i], !remove);
> > return err;
> >
> > Although it may not matter as it looks like it is only used on a temporary
> > hash. But either it should be commented that is the case, or we do the above
> > just to be more robust.
>
> yes, that's the case.. it populates just the hash at this point
> and if __ftrace_match_addr fails, the hash is released after
> jumping to out_regex_unlock
>
> >
> > > +}
> > > +
> > > static int
> > > ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> > > - unsigned long ip, int remove, int reset, int enable)
> > > + unsigned long *ips, unsigned int cnt,
> > > + int remove, int reset, int enable)
> > > {
> > > struct ftrace_hash **orig_hash;
> > > struct ftrace_hash *hash;
> > > @@ -5008,8 +5024,8 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> > > ret = -EINVAL;
> > > goto out_regex_unlock;
> > > }
> > > - if (ip) {
> > > - ret = ftrace_match_addr(hash, ip, remove);
> > > + if (ips) {
> > > + ret = ftrace_match_addr(hash, ips, cnt, remove);
> > > if (ret < 0)
> > > goto out_regex_unlock;
> > > }
> > > @@ -5026,10 +5042,10 @@ ftrace_set_hash(struct ftrace_ops *ops, unsigned char *buf, int len,
> > > }
> > >
> > > static int
> > > -ftrace_set_addr(struct ftrace_ops *ops, unsigned long ip, int remove,
> > > - int reset, int enable)
> > > +ftrace_set_addr(struct ftrace_ops *ops, unsigned long *ips, unsigned int cnt,
> > > + int remove, int reset, int enable)
> > > {
> > > - return ftrace_set_hash(ops, NULL, 0, ip, remove, reset, enable);
> > > + return ftrace_set_hash(ops, NULL, 0, ips, cnt, remove, reset, enable);
> > > }
> > >
> > > #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
> > > @@ -5634,10 +5650,29 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> > > int remove, int reset)
> > > {
> > > ftrace_ops_init(ops);
> > > - return ftrace_set_addr(ops, ip, remove, reset, 1);
> > > + return ftrace_set_addr(ops, &ip, 1, remove, reset, 1);
> > > }
> > > EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
> > >
> > > +/**
> > > + * ftrace_set_filter_ips - set a functions to filter on in ftrace by addresses
> >
> > - set functions to filter on ...
>
> will fix,

So, I wrote a below change for the next version. Is that OK for you?

Thank you,

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index f305e18f699f..a28b1bdb234a 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -4985,8 +4985,13 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,

for (i = 0; i < cnt; i++) {
err = __ftrace_match_addr(hash, ips[i], remove);
- if (err)
+ if (err) {
+ /*
+ * This expects the @hash is a temporary hash and if this
+ * fails the caller must free the @hash.
+ */
return err;
+ }
}
return 0;
}
@@ -5649,7 +5654,7 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);

/**
- * ftrace_set_filter_ips - set a functions to filter on in ftrace by addresses
+ * ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
* @ops - the ops to set the filter with
* @ips - the array of addresses to add to or remove from the filter.
* @cnt - the number of addresses in @ips

>
> thanks,
> jirka
>


--
Masami Hiramatsu <[email protected]>

2022-01-30 20:50:24

by Jiri Olsa

[permalink] [raw]
Subject: Re: [PATCH v5 1/9] ftrace: Add ftrace_set_filter_ips function

On Fri, Jan 28, 2022 at 11:05:23AM +0900, Masami Hiramatsu wrote:

SNIP

>
> So, I wrote a below change for the next version. Is that OK for you?
>
> Thank you,
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index f305e18f699f..a28b1bdb234a 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -4985,8 +4985,13 @@ ftrace_match_addr(struct ftrace_hash *hash, unsigned long *ips,
>
> for (i = 0; i < cnt; i++) {
> err = __ftrace_match_addr(hash, ips[i], remove);
> - if (err)
> + if (err) {
> + /*
> + * This expects the @hash is a temporary hash and if this
> + * fails the caller must free the @hash.
> + */
> return err;
> + }
> }
> return 0;
> }
> @@ -5649,7 +5654,7 @@ int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip,
> EXPORT_SYMBOL_GPL(ftrace_set_filter_ip);
>
> /**
> - * ftrace_set_filter_ips - set a functions to filter on in ftrace by addresses
> + * ftrace_set_filter_ips - set functions to filter on in ftrace by addresses
> * @ops - the ops to set the filter with
> * @ips - the array of addresses to add to or remove from the filter.
> * @cnt - the number of addresses in @ips

looks good, thanks

jirka