LinuxLists.cc - [for-next][PATCH 03/13] fprobe: Add ftrace based probe APIs

2022-03-17 20:08:29

Subject: [for-next][PATCH 03/13] fprobe: Add ftrace based probe APIs

From: Masami Hiramatsu <[email protected]>

The fprobe is a wrapper API for ftrace function tracer.
Unlike kprobes, this probes only supports the function entry, but this
can probe multiple functions by one fprobe. The usage is similar, user
will set their callback to fprobe::entry_handler and call
register_fprobe*() with probed functions.
There are 3 registration interfaces,

- register_fprobe() takes filtering patterns of the functin names.
- register_fprobe_ips() takes an array of ftrace-location addresses.
- register_fprobe_syms() takes an array of function names.

The registered fprobes can be unregistered with unregister_fprobe().
e.g.

struct fprobe fp = { .entry_handler = user_handler };
const char *targets[] = { "func1", "func2", "func3"};
...

ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets));

...

unregister_fprobe(&fp);

Link: https://lkml.kernel.org/r/164735283857.1084943.1154436951479395551.stgit@devnote2

Cc: Jiri Olsa <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Yonghong Song <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: KP Singh <[email protected]>
Cc: "Naveen N . Rao" <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: "David S . Miller" <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
---
include/linux/fprobe.h | 87 +++++++++++++++++
kernel/trace/Kconfig | 12 +++
kernel/trace/Makefile | 1 +
kernel/trace/fprobe.c | 211 +++++++++++++++++++++++++++++++++++++++++
4 files changed, 311 insertions(+)
create mode 100644 include/linux/fprobe.h
create mode 100644 kernel/trace/fprobe.c

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
new file mode 100644
index 000000000000..2ba099aff041
--- /dev/null
+++ b/include/linux/fprobe.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Simple ftrace probe wrapper */
+#ifndef _LINUX_FPROBE_H
+#define _LINUX_FPROBE_H
+
+#include <linux/compiler.h>
+#include <linux/ftrace.h>
+
+/**
+ * struct fprobe - ftrace based probe.
+ * @ops: The ftrace_ops.
+ * @nmissed: The counter for missing events.
+ * @flags: The status flag.
+ * @entry_handler: The callback function for function entry.
+ */
+struct fprobe {
+#ifdef CONFIG_FUNCTION_TRACER
+ /*
+ * If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too.
+ * But user of fprobe may keep embedding the struct fprobe on their own
+ * code. To avoid build error, this will keep the fprobe data structure
+ * defined here, but remove ftrace_ops data structure.
+ */
+ struct ftrace_ops ops;
+#endif
+ unsigned long nmissed;
+ unsigned int flags;
+ void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs);
+};
+
+#define FPROBE_FL_DISABLED 1
+
+static inline bool fprobe_disabled(struct fprobe *fp)
+{
+ return (fp) ? fp->flags & FPROBE_FL_DISABLED : false;
+}
+
+#ifdef CONFIG_FPROBE
+int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter);
+int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
+int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
+int unregister_fprobe(struct fprobe *fp);
+#else
+static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
+{
+ return -EOPNOTSUPP;
+}
+static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
+{
+ return -EOPNOTSUPP;
+}
+static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
+{
+ return -EOPNOTSUPP;
+}
+static inline int unregister_fprobe(struct fprobe *fp)
+{
+ return -EOPNOTSUPP;
+}
+#endif
+
+/**
+ * disable_fprobe() - Disable fprobe
+ * @fp: The fprobe to be disabled.
+ *
+ * This will soft-disable @fp. Note that this doesn't remove the ftrace
+ * hooks from the function entry.
+ */
+static inline void disable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags |= FPROBE_FL_DISABLED;
+}
+
+/**
+ * enable_fprobe() - Enable fprobe
+ * @fp: The fprobe to be enabled.
+ *
+ * This will soft-enable @fp.
+ */
+static inline void enable_fprobe(struct fprobe *fp)
+{
+ if (fp)
+ fp->flags &= ~FPROBE_FL_DISABLED;
+}
+
+#endif
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 16a52a71732d..79d3e8e5a167 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -236,6 +236,18 @@ config DYNAMIC_FTRACE_WITH_ARGS
depends on DYNAMIC_FTRACE
depends on HAVE_DYNAMIC_FTRACE_WITH_ARGS

+config FPROBE
+ bool "Kernel Function Probe (fprobe)"
+ depends on FUNCTION_TRACER
+ depends on DYNAMIC_FTRACE_WITH_REGS
+ default n
+ help
+ This option enables kernel function probe (fprobe) based on ftrace,
+ which is similar to kprobes, but probes only for kernel function
+ entries and it can probe multiple functions by one fprobe.
+
+ If unsure, say N.
+
config FUNCTION_PROFILER
bool "Kernel function profiler"
depends on FUNCTION_TRACER
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 19ef3758da95..cb6245da30bb 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
+obj-$(CONFIG_FPROBE) += fprobe.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o

diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
new file mode 100644
index 000000000000..7e8ceee339a0
--- /dev/null
+++ b/kernel/trace/fprobe.c
@@ -0,0 +1,211 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fprobe - Simple ftrace probe wrapper for function entry.
+ */
+#define pr_fmt(fmt) "fprobe: " fmt
+
+#include <linux/err.h>
+#include <linux/fprobe.h>
+#include <linux/kallsyms.h>
+#include <linux/kprobes.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+
+static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *ops, struct ftrace_regs *fregs)
+{
+ struct fprobe *fp;
+ int bit;
+
+ fp = container_of(ops, struct fprobe, ops);
+ if (fprobe_disabled(fp))
+ return;
+
+ bit = ftrace_test_recursion_trylock(ip, parent_ip);
+ if (bit < 0) {
+ fp->nmissed++;
+ return;
+ }
+
+ if (fp->entry_handler)
+ fp->entry_handler(fp, ip, ftrace_get_regs(fregs));
+
+ ftrace_test_recursion_unlock(bit);
+}
+NOKPROBE_SYMBOL(fprobe_handler);
+
+/* Convert ftrace location address from symbols */
+static unsigned long *get_ftrace_locations(const char **syms, int num)
+{
+ unsigned long addr, size;
+ unsigned long *addrs;
+ int i;
+
+ /* Convert symbols to symbol address */
+ addrs = kcalloc(num, sizeof(*addrs), GFP_KERNEL);
+ if (!addrs)
+ return ERR_PTR(-ENOMEM);
+
+ for (i = 0; i < num; i++) {
+ addr = kallsyms_lookup_name(syms[i]);
+ if (!addr) /* Maybe wrong symbol */
+ goto error;
+
+ /* Convert symbol address to ftrace location. */
+ if (!kallsyms_lookup_size_offset(addr, &size, NULL) || !size)
+ goto error;
+
+ addr = ftrace_location_range(addr, addr + size - 1);
+ if (!addr) /* No dynamic ftrace there. */
+ goto error;
+
+ addrs[i] = addr;
+ }
+
+ return addrs;
+
+error:
+ kfree(addrs);
+
+ return ERR_PTR(-ENOENT);
+}
+
+static void fprobe_init(struct fprobe *fp)
+{
+ fp->nmissed = 0;
+ fp->ops.func = fprobe_handler;
+ fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
+}
+
+/**
+ * register_fprobe() - Register fprobe to ftrace by pattern.
+ * @fp: A fprobe data structure to be registered.
+ * @filter: A wildcard pattern of probed symbols.
+ * @notfilter: A wildcard pattern of NOT probed symbols.
+ *
+ * Register @fp to ftrace for enabling the probe on the symbols matched to @filter.
+ * If @notfilter is not NULL, the symbols matched the @notfilter are not probed.
+ *
+ * Return 0 if @fp is registered successfully, -errno if not.
+ */
+int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
+{
+ unsigned char *str;
+ int ret, len;
+
+ if (!fp || !filter)
+ return -EINVAL;
+
+ fprobe_init(fp);
+
+ len = strlen(filter);
+ str = kstrdup(filter, GFP_KERNEL);
+ ret = ftrace_set_filter(&fp->ops, str, len, 0);
+ kfree(str);
+ if (ret)
+ return ret;
+
+ if (notfilter) {
+ len = strlen(notfilter);
+ str = kstrdup(notfilter, GFP_KERNEL);
+ ret = ftrace_set_notrace(&fp->ops, str, len, 0);
+ kfree(str);
+ if (ret)
+ goto out;
+ }
+
+ ret = register_ftrace_function(&fp->ops);
+out:
+ if (ret)
+ ftrace_free_filter(&fp->ops);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_fprobe);
+
+/**
+ * register_fprobe_ips() - Register fprobe to ftrace by address.
+ * @fp: A fprobe data structure to be registered.
+ * @addrs: An array of target ftrace location addresses.
+ * @num: The number of entries of @addrs.
+ *
+ * Register @fp to ftrace for enabling the probe on the address given by @addrs.
+ * The @addrs must be the addresses of ftrace location address, which may be
+ * the symbol address + arch-dependent offset.
+ * If you unsure what this mean, please use other registration functions.
+ *
+ * Return 0 if @fp is registered successfully, -errno if not.
+ */
+int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
+{
+ int ret;
+
+ if (!fp || !addrs || num <= 0)
+ return -EINVAL;
+
+ fprobe_init(fp);
+
+ ret = ftrace_set_filter_ips(&fp->ops, addrs, num, 0, 0);
+ if (!ret)
+ ret = register_ftrace_function(&fp->ops);
+
+ if (ret)
+ ftrace_free_filter(&fp->ops);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_fprobe_ips);
+
+/**
+ * register_fprobe_syms() - Register fprobe to ftrace by symbols.
+ * @fp: A fprobe data structure to be registered.
+ * @syms: An array of target symbols.
+ * @num: The number of entries of @syms.
+ *
+ * Register @fp to the symbols given by @syms array. This will be useful if
+ * you are sure the symbols exist in the kernel.
+ *
+ * Return 0 if @fp is registered successfully, -errno if not.
+ */
+int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
+{
+ unsigned long *addrs;
+ int ret;
+
+ if (!fp || !syms || num <= 0)
+ return -EINVAL;
+
+ addrs = get_ftrace_locations(syms, num);
+ if (IS_ERR(addrs))
+ return PTR_ERR(addrs);
+
+ ret = register_fprobe_ips(fp, addrs, num);
+
+ kfree(addrs);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(register_fprobe_syms);
+
+/**
+ * unregister_fprobe() - Unregister fprobe from ftrace
+ * @fp: A fprobe data structure to be unregistered.
+ *
+ * Unregister fprobe (and remove ftrace hooks from the function entries).
+ *
+ * Return 0 if @fp is unregistered successfully, -errno if not.
+ */
+int unregister_fprobe(struct fprobe *fp)
+{
+ int ret;
+
+ if (!fp || fp->ops.func != fprobe_handler)
+ return -EINVAL;
+
+ ret = unregister_ftrace_function(&fp->ops);
+
+ if (!ret)
+ ftrace_free_filter(&fp->ops);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(unregister_fprobe);
--
2.35.1

2022-03-17 22:07:10

by Andrii Nakryiko

[permalink] [raw]

Subject: Re: [for-next][PATCH 03/13] fprobe: Add ftrace based probe APIs

On Thu, Mar 17, 2022 at 8:25 AM Steven Rostedt <[email protected]> wrote:
>
> From: Masami Hiramatsu <[email protected]>
>
> The fprobe is a wrapper API for ftrace function tracer.
> Unlike kprobes, this probes only supports the function entry, but this
> can probe multiple functions by one fprobe. The usage is similar, user
> will set their callback to fprobe::entry_handler and call
> register_fprobe*() with probed functions.
> There are 3 registration interfaces,
>
> - register_fprobe() takes filtering patterns of the functin names.
> - register_fprobe_ips() takes an array of ftrace-location addresses.
> - register_fprobe_syms() takes an array of function names.
>
> The registered fprobes can be unregistered with unregister_fprobe().
> e.g.
>
> struct fprobe fp = { .entry_handler = user_handler };
> const char *targets[] = { "func1", "func2", "func3"};
> ...
>
> ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets));
>
> ...
>
> unregister_fprobe(&fp);
>
> Link: https://lkml.kernel.org/r/164735283857.1084943.1154436951479395551.stgit@devnote2
>
> Cc: Jiri Olsa <[email protected]>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Daniel Borkmann <[email protected]>
> Cc: Andrii Nakryiko <[email protected]>
> Cc: Martin KaFai Lau <[email protected]>
> Cc: Song Liu <[email protected]>
> Cc: Yonghong Song <[email protected]>
> Cc: John Fastabend <[email protected]>
> Cc: KP Singh <[email protected]>
> Cc: "Naveen N . Rao" <[email protected]>
> Cc: Anil S Keshavamurthy <[email protected]>
> Cc: "David S . Miller" <[email protected]>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Signed-off-by: Steven Rostedt (Google) <[email protected]>
> ---

Hey Steven!

Do I understand correctly that this patch set was applied in your
tree? I was under the impression that we agreed to route this through
the bpf-next tree earlier (see [0]), but I might have misunderstood
something, sorry.

Either way, the reason it matters is because Jiri's multi-attach
kprobe patch set ([1]) is depending on Masami's patches and having
fprobe patches in bpf-next tree would simplify logistics
significantly.

So I wonder if it's still possible to route it through bpf-next?

If not, we'd need a way to get these changes into the bpf-next tree
somehow. Having it in a separate branch that we can merge would be a
way to go about this, I presume? But it's certainly a more complicated
way, so it would be preferable to back it out and land through
bpf-next.

Please let me know how we should proceed. Thanks!

[0] https://lore.kernel.org/bpf/CAEf4BzaugZWf6f_0JzA-mqaGfp52tCwEp5dWdhpeVt6GjDLQ3Q@mail.gmail.com/
[1] https://lore.kernel.org/bpf/[email protected]/

> include/linux/fprobe.h | 87 +++++++++++++++++
> kernel/trace/Kconfig | 12 +++
> kernel/trace/Makefile | 1 +
> kernel/trace/fprobe.c | 211 +++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 311 insertions(+)
> create mode 100644 include/linux/fprobe.h
> create mode 100644 kernel/trace/fprobe.c
>

[...]

2022-03-18 01:18:09

by Steven Rostedt

[permalink] [raw]

Subject: Re: [for-next][PATCH 03/13] fprobe: Add ftrace based probe APIs

On Thu, 17 Mar 2022 15:03:33 -0700
Andrii Nakryiko <[email protected]> wrote:

> Do I understand correctly that this patch set was applied in your
> tree? I was under the impression that we agreed to route this through
> the bpf-next tree earlier (see [0]), but I might have misunderstood
> something, sorry.
>
> Either way, the reason it matters is because Jiri's multi-attach
> kprobe patch set ([1]) is depending on Masami's patches and having
> fprobe patches in bpf-next tree would simplify logistics
> significantly.

I knew Jiri's patches were to go through the bpf tree, but I missed that
those were dependent on this and you wanted these to go through as well.

I had just finished my automated tests that ran these patches. I haven't
pushed them to my next branch yet so I can hold them off. I don't have
anything dependent on them.

Would you be able to take these for-next patches directly (as they all have
been tested) and you can switch my signed-off-by to Reviewed-by.

The first of the series is unrelated and will go through my tree. That's
the user_events patch.

-- Steve

>
> So I wonder if it's still possible to route it through bpf-next?
>
> If not, we'd need a way to get these changes into the bpf-next tree
> somehow. Having it in a separate branch that we can merge would be a
> way to go about this, I presume? But it's certainly a more complicated
> way, so it would be preferable to back it out and land through
> bpf-next.

2022-03-18 01:55:22

by Alexei Starovoitov

[permalink] [raw]

Subject: Re: [for-next][PATCH 03/13] fprobe: Add ftrace based probe APIs

On Thu, Mar 17, 2022 at 4:49 PM Steven Rostedt <[email protected]> wrote:
>
> On Thu, 17 Mar 2022 15:03:33 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> > Do I understand correctly that this patch set was applied in your
> > tree? I was under the impression that we agreed to route this through
> > the bpf-next tree earlier (see [0]), but I might have misunderstood
> > something, sorry.
> >
> > Either way, the reason it matters is because Jiri's multi-attach
> > kprobe patch set ([1]) is depending on Masami's patches and having
> > fprobe patches in bpf-next tree would simplify logistics
> > significantly.
>
> I knew Jiri's patches were to go through the bpf tree, but I missed that
> those were dependent on this and you wanted these to go through as well.
>
> I had just finished my automated tests that ran these patches. I haven't
> pushed them to my next branch yet so I can hold them off. I don't have
> anything dependent on them.

Excellent. Thanks for testing.

> Would you be able to take these for-next patches directly (as they all have
> been tested) and you can switch my signed-off-by to Reviewed-by.
>
> The first of the series is unrelated and will go through my tree. That's
> the user_events patch.

Right.
We're talking about this set:
https://patchwork.kernel.org/project/netdevbpf/cover/164735281449.1084943.12438881786173547153.stgit@devnote2/

I believe it has a small doc difference vs what you've tested.
Looks like our CI is green on it too.
We'll do additional testing and add your SOB and Tested-by
before pushing.

Thanks!

2022-03-18 14:46:36

by Masami Hiramatsu

[permalink] [raw]

Subject: Re: [for-next][PATCH 03/13] fprobe: Add ftrace based probe APIs

On Thu, 17 Mar 2022 17:26:23 -0700
Alexei Starovoitov <[email protected]> wrote:

> On Thu, Mar 17, 2022 at 4:49 PM Steven Rostedt <[email protected]> wrote:
> >
> > On Thu, 17 Mar 2022 15:03:33 -0700
> > Andrii Nakryiko <[email protected]> wrote:
> >
> > > Do I understand correctly that this patch set was applied in your
> > > tree? I was under the impression that we agreed to route this through
> > > the bpf-next tree earlier (see [0]), but I might have misunderstood
> > > something, sorry.
> > >
> > > Either way, the reason it matters is because Jiri's multi-attach
> > > kprobe patch set ([1]) is depending on Masami's patches and having
> > > fprobe patches in bpf-next tree would simplify logistics
> > > significantly.
> >
> > I knew Jiri's patches were to go through the bpf tree, but I missed that
> > those were dependent on this and you wanted these to go through as well.
> >
> > I had just finished my automated tests that ran these patches. I haven't
> > pushed them to my next branch yet so I can hold them off. I don't have
> > anything dependent on them.
>
> Excellent. Thanks for testing.
>
> > Would you be able to take these for-next patches directly (as they all have
> > been tested) and you can switch my signed-off-by to Reviewed-by.
> >
> > The first of the series is unrelated and will go through my tree. That's
> > the user_events patch.
>
> Right.
> We're talking about this set:
> https://patchwork.kernel.org/project/netdevbpf/cover/164735281449.1084943.12438881786173547153.stgit@devnote2/
>
> I believe it has a small doc difference vs what you've tested.
> Looks like our CI is green on it too.
> We'll do additional testing and add your SOB and Tested-by
> before pushing.

Thanks Alexei and Steve, I think this fprobe series is good to go through
bpf-next tree because it is currently used only from bpf with Jiri's patch.
Sorry Steve for confusion. I think I might better move you to CC when I
tagged it 'bpf-next'.

Thank you,

>
> Thanks!

--
Masami Hiramatsu <[email protected]>