2023-04-20 11:27:03

by Masami Hiramatsu

Subject: [PATCH v5 0/9] tracing: Add fprobe events

Hi,

Here is the 5th version of the series to improve fprobe and add basic fprobe
event support for ftrace (tracefs) and perf. Here is the previous version:

https://lore.kernel.org/all/168035963900.397811.6647648816464443553.stgit@mhiramat.roam.corp.google.com/

I updated the first 3 patches and added 6 additional patches to support
tracepoint probe (fprobe on tracepoint) and BTF support for function
entry and tracepoint. This allows us to trace function entries with
parameter names (if BTF is available).

With these fprobe events, we can continue to trace function entry/exit
even if CONFIG_KPROBES_ON_FTRACE is not available. Since
CONFIG_KPROBES_ON_FTRACE requires CONFIG_DYNAMIC_FTRACE_WITH_REGS,
it is not available if the architecture only supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS (e.g. arm64). That means kprobe events
can not probe function entry/exit effectively on such architectures.
But this problem can be solved if the dynamic events support fprobe events,
because fprobe events don't use kprobes but ftrace via fprobe.

Fprobe events allow users to add new events on the entry and exit of
kernel functions (those which can be ftraced). Unlike kprobe events,
fprobe events can only probe the function entry and exit, so the probed
IP address will have some offset from the symbol address. And they can
only trace the function arguments, return value, and stacks (no registers).
For probing a function body, users can continue to use kprobe events.

The tracepoint probe events (tprobe events) also allow users to add new
events dynamically on tracepoints. Most tracepoints already have
trace events, so this feature is useful if you only want to record a
specific parameter, or want to trace tracepoints which have no trace
events (e.g. the sched_*_tp tracepoints expose only the tracepoints).

The fprobe event syntax is:

f[:[GRP/][EVENT]] FUNCTION [FETCHARGS]
f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS]

And the tracepoint probe event syntax is:

t[:[GRP/][EVENT]] TRACEPOINT [FETCHARGS]

This series also includes BTF argument support for fprobe/tracepoint
events and kprobe events. This allows us to fetch a specific function
parameter by name, and all parameters by '$$args'.
Note that to enable this feature, you need to enable CONFIG_BPF_SYSCALL
and confirm that your arch supports CONFIG_HAVE_FUNCTION_ARG_ACCESS_API.
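For example, with BTF available, function parameters can be fetched by
name or all at once via '$$args'. (A sketch following the syntax above;
the actual argument expansion depends on the BTF of your kernel.)

# echo 'f vfs_read $$args' >> dynamic_events
# echo 'f vfs_read%return $retval' >> dynamic_events
# cat dynamic_events
f:fprobes/vfs_read__entry vfs_read file=file buf=buf count=count pos=pos
f:fprobes/vfs_read__exit vfs_read%return arg1=$retval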

E.g.

# echo 't kfree ptr' >> dynamic_events
# echo 'f kfree object' >> dynamic_events
# cat dynamic_events
t:tracepoints/kfree kfree ptr=ptr
f:fprobes/kfree__entry kfree object=object
# echo 1 > events/fprobes/enable
# echo 1 > events/tracepoints/enable
# echo > trace
# head -n 20 trace | tail
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
tail-84 [000] ..... 1324.561958: kfree__entry: (kfree+0x4/0x140) object=0xffff888006383c00
tail-84 [000] ...1. 1324.561961: kfree: (__probestub_kfree+0x4/0x10) ptr=0xffff888006383c00
tail-84 [000] ..... 1324.561988: kfree__entry: (kfree+0x4/0x140) object=0x0
tail-84 [000] ...1. 1324.561988: kfree: (__probestub_kfree+0x4/0x10) ptr=0x0
tail-84 [000] ..... 1324.561989: kfree__entry: (kfree+0x4/0x140) object=0xffff88800671e600
tail-84 [000] ...1. 1324.561989: kfree: (__probestub_kfree+0x4/0x10) ptr=0xffff88800671e600
tail-84 [000] ..... 1324.562368: kfree__entry: (kfree+0x4/0x140) object=0xffff8880065e0580
tail-84 [000] ...1. 1324.562369: kfree: (__probestub_kfree+0x4/0x10) ptr=0xffff8880065e0580


Thank you,

---

Masami Hiramatsu (Google) (9):
fprobe: Pass return address to the handlers
tracing/probes: Add fprobe events for tracing function entry and exit.
selftests/ftrace: Add fprobe related testcases
tracing/probes: Add tracepoint support on fprobe_event
tracing/probes: Move event parameter fetching code to common parser
tracing/probes: Support function parameters if BTF is available
tracing/probes: Add $$args meta argument for all function args
selftests/ftrace: Add tracepoint probe test case
selftests/ftrace: Add BTF arguments test cases


include/linux/fprobe.h | 11
include/linux/rethook.h | 2
include/linux/trace_events.h | 3
include/linux/tracepoint-defs.h | 1
include/linux/tracepoint.h | 5
kernel/kprobes.c | 1
kernel/trace/Kconfig | 25
kernel/trace/Makefile | 1
kernel/trace/bpf_trace.c | 6
kernel/trace/fprobe.c | 17
kernel/trace/rethook.c | 3
kernel/trace/trace.c | 13
kernel/trace/trace.h | 11
kernel/trace/trace_eprobe.c | 44 -
kernel/trace/trace_fprobe.c | 1214 ++++++++++++++++++++
kernel/trace/trace_kprobe.c | 33 -
kernel/trace/trace_probe.c | 448 ++++++-
kernel/trace/trace_probe.h | 42 +
kernel/trace/trace_uprobe.c | 8
lib/test_fprobe.c | 10
samples/fprobe/fprobe_example.c | 6
.../ftrace/test.d/dynevent/add_remove_btfarg.tc | 54 +
.../ftrace/test.d/dynevent/add_remove_fprobe.tc | 26
.../ftrace/test.d/dynevent/add_remove_tprobe.tc | 27
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 98 ++
.../ftrace/test.d/dynevent/tprobe_syntax_errors.tc | 82 +
.../ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 12
27 files changed, 2064 insertions(+), 139 deletions(-)
create mode 100644 kernel/trace/trace_fprobe.c
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/tprobe_syntax_errors.tc

--
Masami Hiramatsu (Google) <[email protected]>


2023-04-20 11:27:16

by Masami Hiramatsu

Subject: [PATCH v5 1/9] fprobe: Pass return address to the handlers

From: Masami Hiramatsu (Google) <[email protected]>

Pass return address as 'ret_ip' to the fprobe entry and return handlers.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
include/linux/fprobe.h | 6 ++++--
include/linux/rethook.h | 2 +-
kernel/kprobes.c | 1 +
kernel/trace/bpf_trace.c | 6 ++++--
kernel/trace/fprobe.c | 6 +++---
kernel/trace/rethook.c | 3 ++-
lib/test_fprobe.c | 10 +++++++---
samples/fprobe/fprobe_example.c | 6 ++++--
8 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 47fefc7f363b..134f0f59ffa8 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -35,9 +35,11 @@ struct fprobe {
int nr_maxactive;

int (*entry_handler)(struct fprobe *fp, unsigned long entry_ip,
- struct pt_regs *regs, void *entry_data);
+ unsigned long ret_ip, struct pt_regs *regs,
+ void *entry_data);
void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip,
- struct pt_regs *regs, void *entry_data);
+ unsigned long ret_ip, struct pt_regs *regs,
+ void *entry_data);
};

/* This fprobe is soft-disabled. */
diff --git a/include/linux/rethook.h b/include/linux/rethook.h
index c8ac1e5afcd1..fdf26cd0e742 100644
--- a/include/linux/rethook.h
+++ b/include/linux/rethook.h
@@ -14,7 +14,7 @@

struct rethook_node;

-typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *);
+typedef void (*rethook_handler_t) (struct rethook_node *, void *, unsigned long, struct pt_regs *);

/**
* struct rethook - The rethook management data structure.
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 00e177de91cc..ce13f1a35251 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -2127,6 +2127,7 @@ static int pre_handler_kretprobe(struct kprobe *p, struct pt_regs *regs)
NOKPROBE_SYMBOL(pre_handler_kretprobe);

static void kretprobe_rethook_handler(struct rethook_node *rh, void *data,
+ unsigned long ret_addr,
struct pt_regs *regs)
{
struct kretprobe *rp = (struct kretprobe *)data;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d804172b709c..c0a32118f08f 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2646,7 +2646,8 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,

static int
kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
- struct pt_regs *regs, void *data)
+ unsigned long ret_ip, struct pt_regs *regs,
+ void *data)
{
struct bpf_kprobe_multi_link *link;

@@ -2657,7 +2658,8 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,

static void
kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
- struct pt_regs *regs, void *data)
+ unsigned long ret_ip, struct pt_regs *regs,
+ void *data)
{
struct bpf_kprobe_multi_link *link;

diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 293184227394..d7d2b9bd2fc7 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -52,7 +52,7 @@ static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
}

if (fp->entry_handler)
- ret = fp->entry_handler(fp, ip, ftrace_get_regs(fregs), entry_data);
+ ret = fp->entry_handler(fp, ip, parent_ip, ftrace_get_regs(fregs), entry_data);

/* If entry_handler returns !0, nmissed is not counted. */
if (rh) {
@@ -81,7 +81,7 @@ static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
}

static void fprobe_exit_handler(struct rethook_node *rh, void *data,
- struct pt_regs *regs)
+ unsigned long ret_ip, struct pt_regs *regs)
{
struct fprobe *fp = (struct fprobe *)data;
struct fprobe_rethook_node *fpr;
@@ -91,7 +91,7 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,

fpr = container_of(rh, struct fprobe_rethook_node, node);

- fp->exit_handler(fp, fpr->entry_ip, regs,
+ fp->exit_handler(fp, fpr->entry_ip, ret_ip, regs,
fp->entry_data_size ? (void *)fpr->data : NULL);
}
NOKPROBE_SYMBOL(fprobe_exit_handler);
diff --git a/kernel/trace/rethook.c b/kernel/trace/rethook.c
index 32c3dfdb4d6a..fc196e186737 100644
--- a/kernel/trace/rethook.c
+++ b/kernel/trace/rethook.c
@@ -301,7 +301,8 @@ unsigned long rethook_trampoline_handler(struct pt_regs *regs,
break;
handler = READ_ONCE(rhn->rethook->handler);
if (handler)
- handler(rhn, rhn->rethook->data, regs);
+ handler(rhn, rhn->rethook->data,
+ correct_ret_addr, regs);

if (first == node)
break;
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index 0fe5273e960b..ade7e3d93dac 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -39,7 +39,8 @@ static noinline u32 fprobe_selftest_nest_target(u32 value, u32 (*nest)(u32))
}

static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
- struct pt_regs *regs, void *data)
+ unsigned long ret_ip,
+ struct pt_regs *regs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
/* This can be called on the fprobe_selftest_target and the fprobe_selftest_target2 */
@@ -57,6 +58,7 @@ static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
}

static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
+ unsigned long ret_ip,
struct pt_regs *regs, void *data)
{
unsigned long ret = regs_return_value(regs);
@@ -78,14 +80,16 @@ static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
}

static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
- struct pt_regs *regs, void *data)
+ unsigned long ret_ip,
+ struct pt_regs *regs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
return 0;
}

static notrace void nest_exit_handler(struct fprobe *fp, unsigned long ip,
- struct pt_regs *regs, void *data)
+ unsigned long ret_ip,
+ struct pt_regs *regs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
KUNIT_EXPECT_EQ(current_test, ip, target_nest_ip);
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 4efc8feb6277..64e715e7ed11 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -49,6 +49,7 @@ static void show_backtrace(void)
}

static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
+ unsigned long ret_ip,
struct pt_regs *regs, void *data)
{
if (use_trace)
@@ -65,10 +66,11 @@ static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
return 0;
}

-static void sample_exit_handler(struct fprobe *fp, unsigned long ip, struct pt_regs *regs,
+static void sample_exit_handler(struct fprobe *fp, unsigned long ip,
+ unsigned long ret_ip, struct pt_regs *regs,
void *data)
{
- unsigned long rip = instruction_pointer(regs);
+ unsigned long rip = ret_ip;

if (use_trace)
/*
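
With this change, fprobe users update their handlers to accept the new
'ret_ip' argument. A hypothetical minimal user, modeled on the
samples/fprobe code above (not part of this series; names are made up
for illustration):

static int my_entry_handler(struct fprobe *fp, unsigned long entry_ip,
			    unsigned long ret_ip, struct pt_regs *regs,
			    void *data)
{
	pr_info("%ps called from %ps\n", (void *)entry_ip, (void *)ret_ip);
	return 0;
}

static void my_exit_handler(struct fprobe *fp, unsigned long entry_ip,
			    unsigned long ret_ip, struct pt_regs *regs,
			    void *data)
{
	/* ret_ip is now passed directly instead of read from regs */
	pr_info("%ps returns to %ps\n", (void *)entry_ip, (void *)ret_ip);
}

static struct fprobe my_fprobe = {
	.entry_handler = my_entry_handler,
	.exit_handler = my_exit_handler,
};
/* register_fprobe(&my_fprobe, "vfs_read", NULL); */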

2023-04-20 11:27:20

by Masami Hiramatsu

Subject: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

From: Masami Hiramatsu (Google) <[email protected]>

Add fprobe events for tracing function entry and exit instead of kprobe
events. With this change, we can continue to trace function entry/exit
even if CONFIG_KPROBES_ON_FTRACE is not available. Since
CONFIG_KPROBES_ON_FTRACE requires CONFIG_DYNAMIC_FTRACE_WITH_REGS,
it is not available if the architecture only supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS. That means kprobe events can not
probe function entry/exit effectively on such architectures.
But this can be solved if the dynamic events support fprobe events.

The fprobe event is a new dynamic event which is available only for
function (symbol) entry and exit. This event accepts non-register fetch
arguments so that users can trace function arguments and return values.

The fprobe event syntax is:

f[:[GRP/][EVENT]] FUNCTION [FETCHARGS]
f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS]

E.g.

# echo 'f vfs_read $arg1' >> dynamic_events
# echo 'f vfs_read%return $retval' >> dynamic_events
# cat dynamic_events
f:fprobes/vfs_read__entry vfs_read arg1=$arg1
f:fprobes/vfs_read__exit vfs_read%return arg1=$retval
# echo 1 > events/fprobes/enable
# head -n 20 trace | tail
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
sh-142 [005] ...1. 448.386420: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.386436: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
sh-142 [005] ...1. 448.386451: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.386458: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
sh-142 [005] ...1. 448.386469: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.386476: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1
sh-142 [005] ...1. 448.602073: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540
sh-142 [005] ..... 448.602089: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1


Reported-by: kernel test robot <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v5:
- Sort header files alphabetically.
- Fix typo.
- Change default event name with '__<place>' suffix.
- Change common error name MAXACT_NO_KPROBE to BAD_MAXACT_TYPE
- Fix to show README entry correctly
---
include/linux/fprobe.h | 5
include/linux/trace_events.h | 3
kernel/trace/Kconfig | 14
kernel/trace/Makefile | 1
kernel/trace/fprobe.c | 11
kernel/trace/trace.c | 8
kernel/trace/trace.h | 11
kernel/trace/trace_fprobe.c | 1073 ++++++++++++++++++++
kernel/trace/trace_kprobe.c | 2
kernel/trace/trace_probe.c | 4
kernel/trace/trace_probe.h | 5
.../ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 2
12 files changed, 1130 insertions(+), 9 deletions(-)
create mode 100644 kernel/trace/trace_fprobe.c

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 134f0f59ffa8..3e03758151f4 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -66,6 +66,7 @@ int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num);
int register_fprobe_syms(struct fprobe *fp, const char **syms, int num);
int unregister_fprobe(struct fprobe *fp);
+bool fprobe_is_registered(struct fprobe *fp);
#else
static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
{
@@ -83,6 +84,10 @@ static inline int unregister_fprobe(struct fprobe *fp)
{
return -EOPNOTSUPP;
}
+static inline bool fprobe_is_registered(struct fprobe *fp)
+{
+ return false;
+}
#endif

/**
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 0e373222a6df..fcdc1f0bed23 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -318,6 +318,7 @@ enum {
TRACE_EVENT_FL_KPROBE_BIT,
TRACE_EVENT_FL_UPROBE_BIT,
TRACE_EVENT_FL_EPROBE_BIT,
+ TRACE_EVENT_FL_FPROBE_BIT,
TRACE_EVENT_FL_CUSTOM_BIT,
};

@@ -332,6 +333,7 @@ enum {
* KPROBE - Event is a kprobe
* UPROBE - Event is a uprobe
* EPROBE - Event is an event probe
+ * FPROBE - Event is an function probe
* CUSTOM - Event is a custom event (to be attached to an exsiting tracepoint)
* This is set when the custom event has not been attached
* to a tracepoint yet, then it is cleared when it is.
@@ -346,6 +348,7 @@ enum {
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT),
+ TRACE_EVENT_FL_FPROBE = (1 << TRACE_EVENT_FL_FPROBE_BIT),
TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT),
};

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8cf97fa4a4b3..8e10a9453c96 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -650,6 +650,20 @@ config BLK_DEV_IO_TRACE

If unsure, say N.

+config FPROBE_EVENTS
+ depends on FPROBE
+ depends on HAVE_REGS_AND_STACK_ACCESS_API
+ bool "Enable fprobe-based dynamic events"
+ select TRACING
+ select PROBE_EVENTS
+ select DYNAMIC_EVENTS
+ default y
+ help
+ This allows user to add tracing events on the function entry and
+ exit via ftrace interface. The syntax is same as the kprobe events
+ and the kprobe events on function entry and exit will be
+ transparently converted to this fprobe events.
+
config KPROBE_EVENTS
depends on KPROBES
depends on HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index c6651e16b557..64b61f67a403 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -104,6 +104,7 @@ obj-$(CONFIG_BOOTTIME_TRACING) += trace_boot.o
obj-$(CONFIG_FTRACE_RECORD_RECURSION) += trace_recursion_record.o
obj-$(CONFIG_FPROBE) += fprobe.o
obj-$(CONFIG_RETHOOK) += rethook.o
+obj-$(CONFIG_FPROBE_EVENTS) += trace_fprobe.o

obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
obj-$(CONFIG_RV) += rv/
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index d7d2b9bd2fc7..44cf354d730e 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -305,6 +305,14 @@ int register_fprobe_syms(struct fprobe *fp, const char **syms, int num)
}
EXPORT_SYMBOL_GPL(register_fprobe_syms);

+bool fprobe_is_registered(struct fprobe *fp)
+{
+ if (!fp || (fp->ops.saved_func != fprobe_handler &&
+ fp->ops.saved_func != fprobe_kprobe_handler))
+ return false;
+ return true;
+}
+
/**
* unregister_fprobe() - Unregister fprobe from ftrace
* @fp: A fprobe data structure to be unregistered.
@@ -317,8 +325,7 @@ int unregister_fprobe(struct fprobe *fp)
{
int ret;

- if (!fp || (fp->ops.saved_func != fprobe_handler &&
- fp->ops.saved_func != fprobe_kprobe_handler))
+ if (!fprobe_is_registered(fp))
return -EINVAL;

/*
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 076d893d2965..4d7d5708d3cd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5644,10 +5644,16 @@ static const char readme_msg[] =
" uprobe_events\t\t- Create/append/remove/show the userspace dynamic events\n"
"\t\t\t Write into this file to define/undefine new trace events.\n"
#endif
-#if defined(CONFIG_KPROBE_EVENTS) || defined(CONFIG_UPROBE_EVENTS)
+#if defined(CONFIG_KPROBE_EVENTS) || defined(CONFIG_UPROBE_EVENTS) || \
+ defined(CONFIG_FPROBE_EVENTS)
"\t accepts: event-definitions (one definition per line)\n"
+#if defined(CONFIG_KPROBE_EVENTS) || defined(CONFIG_UPROBE_EVENTS)
"\t Format: p[:[<group>/][<event>]] <place> [<args>]\n"
"\t r[maxactive][:[<group>/][<event>]] <place> [<args>]\n"
+#endif
+#ifdef CONFIG_FPROBE_EVENTS
+ "\t f[:[<group>/][<event>]] <func-name>[%return] [<args>]\n"
+#endif
#ifdef CONFIG_HIST_TRIGGERS
"\t s:[synthetic/]<event> <field> [<field>]\n"
#endif
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 79bdefe9261b..b5ab5479f9e3 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -148,6 +148,17 @@ struct kretprobe_trace_entry_head {
unsigned long ret_ip;
};

+struct fentry_trace_entry_head {
+ struct trace_entry ent;
+ unsigned long ip;
+};
+
+struct fexit_trace_entry_head {
+ struct trace_entry ent;
+ unsigned long func;
+ unsigned long ret_ip;
+};
+
#define TRACE_BUF_SIZE 1024

struct trace_array;
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
new file mode 100644
index 000000000000..9dcc320b1158
--- /dev/null
+++ b/kernel/trace/trace_fprobe.c
@@ -0,0 +1,1073 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Fprobe-based tracing events
+ * Copyright (C) 2022 Google LLC.
+ */
+#define pr_fmt(fmt) "trace_fprobe: " fmt
+
+#include <linux/fprobe.h>
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/security.h>
+#include <linux/uaccess.h>
+
+#include "trace_dynevent.h"
+#include "trace_probe.h"
+#include "trace_probe_kernel.h"
+#include "trace_probe_tmpl.h"
+
+#define FPROBE_EVENT_SYSTEM "fprobes"
+#define RETHOOK_MAXACTIVE_MAX 4096
+
+static int trace_fprobe_create(const char *raw_command);
+static int trace_fprobe_show(struct seq_file *m, struct dyn_event *ev);
+static int trace_fprobe_release(struct dyn_event *ev);
+static bool trace_fprobe_is_busy(struct dyn_event *ev);
+static bool trace_fprobe_match(const char *system, const char *event,
+ int argc, const char **argv, struct dyn_event *ev);
+
+static struct dyn_event_operations trace_fprobe_ops = {
+ .create = trace_fprobe_create,
+ .show = trace_fprobe_show,
+ .is_busy = trace_fprobe_is_busy,
+ .free = trace_fprobe_release,
+ .match = trace_fprobe_match,
+};
+
+/*
+ * Fprobe event core functions
+ */
+struct trace_fprobe {
+ struct dyn_event devent;
+ struct fprobe fp;
+ const char *symbol;
+ struct trace_probe tp;
+};
+
+static bool is_trace_fprobe(struct dyn_event *ev)
+{
+ return ev->ops == &trace_fprobe_ops;
+}
+
+static struct trace_fprobe *to_trace_fprobe(struct dyn_event *ev)
+{
+ return container_of(ev, struct trace_fprobe, devent);
+}
+
+/**
+ * for_each_trace_fprobe - iterate over the trace_fprobe list
+ * @pos: the struct trace_fprobe * for each entry
+ * @dpos: the struct dyn_event * to use as a loop cursor
+ */
+#define for_each_trace_fprobe(pos, dpos) \
+ for_each_dyn_event(dpos) \
+ if (is_trace_fprobe(dpos) && (pos = to_trace_fprobe(dpos)))
+
+static bool trace_fprobe_is_return(struct trace_fprobe *tf)
+{
+ return tf->fp.exit_handler != NULL;
+}
+
+static const char *trace_fprobe_symbol(struct trace_fprobe *tf)
+{
+ return tf->symbol ? tf->symbol : "unknown";
+}
+
+static bool trace_fprobe_is_busy(struct dyn_event *ev)
+{
+ struct trace_fprobe *tf = to_trace_fprobe(ev);
+
+ return trace_probe_is_enabled(&tf->tp);
+}
+
+static bool trace_fprobe_match_command_head(struct trace_fprobe *tf,
+ int argc, const char **argv)
+{
+ char buf[MAX_ARGSTR_LEN + 1];
+
+ if (!argc)
+ return true;
+
+ snprintf(buf, sizeof(buf), "%s", trace_fprobe_symbol(tf));
+ if (strcmp(buf, argv[0]))
+ return false;
+ argc--; argv++;
+
+ return trace_probe_match_command_args(&tf->tp, argc, argv);
+}
+
+static bool trace_fprobe_match(const char *system, const char *event,
+ int argc, const char **argv, struct dyn_event *ev)
+{
+ struct trace_fprobe *tf = to_trace_fprobe(ev);
+
+ return (event[0] == '\0' ||
+ strcmp(trace_probe_name(&tf->tp), event) == 0) &&
+ (!system || strcmp(trace_probe_group_name(&tf->tp), system) == 0) &&
+ trace_fprobe_match_command_head(tf, argc, argv);
+}
+
+static bool trace_fprobe_is_registered(struct trace_fprobe *tf)
+{
+ return fprobe_is_registered(&tf->fp);
+}
+
+/* Note that we don't verify it, since the code does not come from user space */
+static int
+process_fetch_insn(struct fetch_insn *code, void *rec, void *dest,
+ void *base)
+{
+ struct pt_regs *regs = rec;
+ unsigned long val;
+
+retry:
+ /* 1st stage: get value from context */
+ switch (code->op) {
+ case FETCH_OP_REG:
+ val = regs_get_register(regs, code->param);
+ break;
+ case FETCH_OP_STACK:
+ val = regs_get_kernel_stack_nth(regs, code->param);
+ break;
+ case FETCH_OP_STACKP:
+ val = kernel_stack_pointer(regs);
+ break;
+ case FETCH_OP_RETVAL:
+ val = regs_return_value(regs);
+ break;
+ case FETCH_OP_IMM:
+ val = code->immediate;
+ break;
+ case FETCH_OP_COMM:
+ val = (unsigned long)current->comm;
+ break;
+ case FETCH_OP_DATA:
+ val = (unsigned long)code->data;
+ break;
+#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
+ case FETCH_OP_ARG:
+ val = regs_get_kernel_argument(regs, code->param);
+ break;
+#endif
+ case FETCH_NOP_SYMBOL: /* Ignore a place holder */
+ code++;
+ goto retry;
+ default:
+ return -EILSEQ;
+ }
+ code++;
+
+ return process_fetch_insn_bottom(code, val, dest, base);
+}
+NOKPROBE_SYMBOL(process_fetch_insn)
+
+/* function entry handler */
+static nokprobe_inline void
+__fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
+ struct pt_regs *regs,
+ struct trace_event_file *trace_file)
+{
+ struct fentry_trace_entry_head *entry;
+ struct trace_event_call *call = trace_probe_event_call(&tf->tp);
+ struct trace_event_buffer fbuffer;
+ int dsize;
+
+ WARN_ON(call != trace_file->event_call);
+
+ if (trace_trigger_soft_disabled(trace_file))
+ return;
+
+ dsize = __get_data_size(&tf->tp, regs);
+
+ entry = trace_event_buffer_reserve(&fbuffer, trace_file,
+ sizeof(*entry) + tf->tp.size + dsize);
+ if (!entry)
+ return;
+
+ fbuffer.regs = regs;
+ entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
+ entry->ip = entry_ip;
+ store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+
+ trace_event_buffer_commit(&fbuffer);
+}
+
+static void
+fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
+ struct pt_regs *regs)
+{
+ struct event_file_link *link;
+
+ trace_probe_for_each_link_rcu(link, &tf->tp)
+ __fentry_trace_func(tf, entry_ip, regs, link->file);
+}
+NOKPROBE_SYMBOL(fentry_trace_func);
+
+/* Kretprobe handler */
+static nokprobe_inline void
+__fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
+ unsigned long ret_ip, struct pt_regs *regs,
+ struct trace_event_file *trace_file)
+{
+ struct fexit_trace_entry_head *entry;
+ struct trace_event_buffer fbuffer;
+ struct trace_event_call *call = trace_probe_event_call(&tf->tp);
+ int dsize;
+
+ WARN_ON(call != trace_file->event_call);
+
+ if (trace_trigger_soft_disabled(trace_file))
+ return;
+
+ dsize = __get_data_size(&tf->tp, regs);
+
+ entry = trace_event_buffer_reserve(&fbuffer, trace_file,
+ sizeof(*entry) + tf->tp.size + dsize);
+ if (!entry)
+ return;
+
+ fbuffer.regs = regs;
+ entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
+ entry->func = entry_ip;
+ entry->ret_ip = ret_ip;
+ store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+
+ trace_event_buffer_commit(&fbuffer);
+}
+
+static void
+fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
+ unsigned long ret_ip, struct pt_regs *regs)
+{
+ struct event_file_link *link;
+
+ trace_probe_for_each_link_rcu(link, &tf->tp)
+ __fexit_trace_func(tf, entry_ip, ret_ip, regs, link->file);
+}
+NOKPROBE_SYMBOL(fexit_trace_func);
+
+#ifdef CONFIG_PERF_EVENTS
+
+static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
+ struct pt_regs *regs)
+{
+ struct trace_event_call *call = trace_probe_event_call(&tf->tp);
+ struct fentry_trace_entry_head *entry;
+ struct hlist_head *head;
+ int size, __size, dsize;
+ int rctx;
+
+ if (bpf_prog_array_valid(call)) {
+ unsigned long orig_ip = instruction_pointer(regs);
+ int ret;
+
+ ret = trace_call_bpf(call, regs);
+
+ /*
+ * We need to check and see if we modified the pc of the
+ * pt_regs, and if so return 1 so that we don't do the
+ * single stepping.
+ */
+ if (orig_ip != instruction_pointer(regs))
+ return 1;
+ if (!ret)
+ return 0;
+ }
+
+ head = this_cpu_ptr(call->perf_events);
+ if (hlist_empty(head))
+ return 0;
+
+ dsize = __get_data_size(&tf->tp, regs);
+ __size = sizeof(*entry) + tf->tp.size + dsize;
+ size = ALIGN(__size + sizeof(u32), sizeof(u64));
+ size -= sizeof(u32);
+
+ entry = perf_trace_buf_alloc(size, NULL, &rctx);
+ if (!entry)
+ return 0;
+
+ entry->ip = entry_ip;
+ memset(&entry[1], 0, dsize);
+ store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+ perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
+ head, NULL);
+ return 0;
+}
+NOKPROBE_SYMBOL(fentry_perf_func);
+
+static void
+fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
+ unsigned long ret_ip, struct pt_regs *regs)
+{
+ struct trace_event_call *call = trace_probe_event_call(&tf->tp);
+ struct fexit_trace_entry_head *entry;
+ struct hlist_head *head;
+ int size, __size, dsize;
+ int rctx;
+
+ if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
+ return;
+
+ head = this_cpu_ptr(call->perf_events);
+ if (hlist_empty(head))
+ return;
+
+ dsize = __get_data_size(&tf->tp, regs);
+ __size = sizeof(*entry) + tf->tp.size + dsize;
+ size = ALIGN(__size + sizeof(u32), sizeof(u64));
+ size -= sizeof(u32);
+
+ entry = perf_trace_buf_alloc(size, NULL, &rctx);
+ if (!entry)
+ return;
+
+ entry->func = entry_ip;
+ entry->ret_ip = ret_ip;
+ store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
+ perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
+ head, NULL);
+}
+NOKPROBE_SYMBOL(fexit_perf_func);
+#endif /* CONFIG_PERF_EVENTS */
+
+static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
+ unsigned long ret_ip, struct pt_regs *regs,
+ void *entry_data)
+{
+ struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+ int ret = 0;
+
+ if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
+ fentry_trace_func(tf, entry_ip, regs);
+#ifdef CONFIG_PERF_EVENTS
+ if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
+ ret = fentry_perf_func(tf, entry_ip, regs);
+#endif
+ return ret;
+}
+NOKPROBE_SYMBOL(fentry_dispatcher);
+
+static void fexit_dispatcher(struct fprobe *fp, unsigned long entry_ip,
+ unsigned long ret_ip, struct pt_regs *regs,
+ void *entry_data)
+{
+ struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+
+ if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
+ fexit_trace_func(tf, entry_ip, ret_ip, regs);
+#ifdef CONFIG_PERF_EVENTS
+ if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
+ fexit_perf_func(tf, entry_ip, ret_ip, regs);
+#endif
+}
+NOKPROBE_SYMBOL(fexit_dispatcher);
+
+static void free_trace_fprobe(struct trace_fprobe *tf)
+{
+ if (tf) {
+ trace_probe_cleanup(&tf->tp);
+ kfree(tf->symbol);
+ kfree(tf);
+ }
+}
+
+/*
+ * Allocate new trace_probe and initialize it (including fprobe).
+ */
+static struct trace_fprobe *alloc_trace_fprobe(const char *group,
+ const char *event,
+ const char *symbol,
+ int maxactive,
+ int nargs, bool is_return)
+{
+ struct trace_fprobe *tf;
+ int ret = -ENOMEM;
+
+ tf = kzalloc(struct_size(tf, tp.args, nargs), GFP_KERNEL);
+ if (!tf)
+ return ERR_PTR(ret);
+
+ tf->symbol = kstrdup(symbol, GFP_KERNEL);
+ if (!tf->symbol)
+ goto error;
+
+ if (is_return)
+ tf->fp.exit_handler = fexit_dispatcher;
+ else
+ tf->fp.entry_handler = fentry_dispatcher;
+
+ tf->fp.nr_maxactive = maxactive;
+
+ ret = trace_probe_init(&tf->tp, event, group, false);
+ if (ret < 0)
+ goto error;
+
+ dyn_event_init(&tf->devent, &trace_fprobe_ops);
+ return tf;
+error:
+ free_trace_fprobe(tf);
+ return ERR_PTR(ret);
+}
+
+static struct trace_fprobe *find_trace_fprobe(const char *event,
+ const char *group)
+{
+ struct dyn_event *pos;
+ struct trace_fprobe *tf;
+
+ for_each_trace_fprobe(tf, pos)
+ if (strcmp(trace_probe_name(&tf->tp), event) == 0 &&
+ strcmp(trace_probe_group_name(&tf->tp), group) == 0)
+ return tf;
+ return NULL;
+}
+
+static inline int __enable_trace_fprobe(struct trace_fprobe *tf)
+{
+ if (trace_fprobe_is_registered(tf))
+ enable_fprobe(&tf->fp);
+
+ return 0;
+}
+
+static void __disable_trace_fprobe(struct trace_probe *tp)
+{
+ struct trace_fprobe *tf;
+
+ list_for_each_entry(tf, trace_probe_probe_list(tp), tp.list) {
+ if (!trace_fprobe_is_registered(tf))
+ continue;
+ disable_fprobe(&tf->fp);
+ }
+}
+
+/*
+ * Enable trace_probe.
+ * If the file is NULL, enable the "perf" handler; otherwise enable the
+ * "trace" handler.
+ */
+static int enable_trace_fprobe(struct trace_event_call *call,
+ struct trace_event_file *file)
+{
+ struct trace_probe *tp;
+ struct trace_fprobe *tf;
+ bool enabled;
+ int ret = 0;
+
+ tp = trace_probe_primary_from_call(call);
+ if (WARN_ON_ONCE(!tp))
+ return -ENODEV;
+ enabled = trace_probe_is_enabled(tp);
+
+ /* This also changes "enabled" state */
+ if (file) {
+ ret = trace_probe_add_file(tp, file);
+ if (ret)
+ return ret;
+ } else
+ trace_probe_set_flag(tp, TP_FLAG_PROFILE);
+
+ if (!enabled) {
+ list_for_each_entry(tf, trace_probe_probe_list(tp), tp.list) {
+ /* TODO: check the fprobe is gone */
+ __enable_trace_fprobe(tf);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Disable trace_probe.
+ * If the file is NULL, disable the "perf" handler; otherwise disable the
+ * "trace" handler.
+ */
+static int disable_trace_fprobe(struct trace_event_call *call,
+ struct trace_event_file *file)
+{
+ struct trace_probe *tp;
+
+ tp = trace_probe_primary_from_call(call);
+ if (WARN_ON_ONCE(!tp))
+ return -ENODEV;
+
+ if (file) {
+ if (!trace_probe_get_file_link(tp, file))
+ return -ENOENT;
+ if (!trace_probe_has_single_file(tp))
+ goto out;
+ trace_probe_clear_flag(tp, TP_FLAG_TRACE);
+ } else
+ trace_probe_clear_flag(tp, TP_FLAG_PROFILE);
+
+ if (!trace_probe_is_enabled(tp))
+ __disable_trace_fprobe(tp);
+
+ out:
+ if (file)
+ /*
+ * Synchronization is done in the function below. For perf
+ * events, file == NULL and perf_trace_event_unreg() calls
+ * tracepoint_synchronize_unregister() to synchronize the
+ * event, so we don't need to care about it here.
+ */
+ trace_probe_remove_file(tp, file);
+
+ return 0;
+}
+
+/* Event entry printers */
+static enum print_line_t
+print_fentry_event(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct fentry_trace_entry_head *field;
+ struct trace_seq *s = &iter->seq;
+ struct trace_probe *tp;
+
+ field = (struct fentry_trace_entry_head *)iter->ent;
+ tp = trace_probe_primary_from_call(
+ container_of(event, struct trace_event_call, event));
+ if (WARN_ON_ONCE(!tp))
+ goto out;
+
+ trace_seq_printf(s, "%s: (", trace_probe_name(tp));
+
+ if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET))
+ goto out;
+
+ trace_seq_putc(s, ')');
+
+ if (trace_probe_print_args(s, tp->args, tp->nr_args,
+ (u8 *)&field[1], field) < 0)
+ goto out;
+
+ trace_seq_putc(s, '\n');
+ out:
+ return trace_handle_return(s);
+}
+
+static enum print_line_t
+print_fexit_event(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct fexit_trace_entry_head *field;
+ struct trace_seq *s = &iter->seq;
+ struct trace_probe *tp;
+
+ field = (struct fexit_trace_entry_head *)iter->ent;
+ tp = trace_probe_primary_from_call(
+ container_of(event, struct trace_event_call, event));
+ if (WARN_ON_ONCE(!tp))
+ goto out;
+
+ trace_seq_printf(s, "%s: (", trace_probe_name(tp));
+
+ if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET))
+ goto out;
+
+ trace_seq_puts(s, " <- ");
+
+ if (!seq_print_ip_sym(s, field->func, flags & ~TRACE_ITER_SYM_OFFSET))
+ goto out;
+
+ trace_seq_putc(s, ')');
+
+ if (trace_probe_print_args(s, tp->args, tp->nr_args,
+ (u8 *)&field[1], field) < 0)
+ goto out;
+
+ trace_seq_putc(s, '\n');
+
+ out:
+ return trace_handle_return(s);
+}
+
+static int fentry_event_define_fields(struct trace_event_call *event_call)
+{
+ int ret;
+ struct fentry_trace_entry_head field;
+ struct trace_probe *tp;
+
+ tp = trace_probe_primary_from_call(event_call);
+ if (WARN_ON_ONCE(!tp))
+ return -ENOENT;
+
+ DEFINE_FIELD(unsigned long, ip, FIELD_STRING_IP, 0);
+
+ return traceprobe_define_arg_fields(event_call, sizeof(field), tp);
+}
+
+static int fexit_event_define_fields(struct trace_event_call *event_call)
+{
+ int ret;
+ struct fexit_trace_entry_head field;
+ struct trace_probe *tp;
+
+ tp = trace_probe_primary_from_call(event_call);
+ if (WARN_ON_ONCE(!tp))
+ return -ENOENT;
+
+ DEFINE_FIELD(unsigned long, func, FIELD_STRING_FUNC, 0);
+ DEFINE_FIELD(unsigned long, ret_ip, FIELD_STRING_RETIP, 0);
+
+ return traceprobe_define_arg_fields(event_call, sizeof(field), tp);
+}
+
+static struct trace_event_functions fentry_funcs = {
+ .trace = print_fentry_event
+};
+
+static struct trace_event_functions fexit_funcs = {
+ .trace = print_fexit_event
+};
+
+static struct trace_event_fields fentry_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = fentry_event_define_fields },
+ {}
+};
+
+static struct trace_event_fields fexit_fields_array[] = {
+ { .type = TRACE_FUNCTION_TYPE,
+ .define_fields = fexit_event_define_fields },
+ {}
+};
+
+static int fprobe_register(struct trace_event_call *event,
+ enum trace_reg type, void *data);
+
+static inline void init_trace_event_call(struct trace_fprobe *tf)
+{
+ struct trace_event_call *call = trace_probe_event_call(&tf->tp);
+
+ if (trace_fprobe_is_return(tf)) {
+ call->event.funcs = &fexit_funcs;
+ call->class->fields_array = fexit_fields_array;
+ } else {
+ call->event.funcs = &fentry_funcs;
+ call->class->fields_array = fentry_fields_array;
+ }
+
+ call->flags = TRACE_EVENT_FL_FPROBE;
+ call->class->reg = fprobe_register;
+}
+
+static int register_fprobe_event(struct trace_fprobe *tf)
+{
+ init_trace_event_call(tf);
+
+ return trace_probe_register_event_call(&tf->tp);
+}
+
+static int unregister_fprobe_event(struct trace_fprobe *tf)
+{
+ return trace_probe_unregister_event_call(&tf->tp);
+}
+
+/* Internal register function - just handle fprobe and flags */
+static int __register_trace_fprobe(struct trace_fprobe *tf)
+{
+ int i, ret;
+
+ /* Should we need new LOCKDOWN flag for fprobe? */
+ ret = security_locked_down(LOCKDOWN_KPROBES);
+ if (ret)
+ return ret;
+
+ if (trace_fprobe_is_registered(tf))
+ return -EINVAL;
+
+ for (i = 0; i < tf->tp.nr_args; i++) {
+ ret = traceprobe_update_arg(&tf->tp.args[i]);
+ if (ret)
+ return ret;
+ }
+
+ /* Set/clear disabled flag according to tp->flag */
+ if (trace_probe_is_enabled(&tf->tp))
+ tf->fp.flags &= ~FPROBE_FL_DISABLED;
+ else
+ tf->fp.flags |= FPROBE_FL_DISABLED;
+
+ /* TODO: handle filter, nofilter or symbol list */
+ return register_fprobe(&tf->fp, tf->symbol, NULL);
+}
+
+/* Internal unregister function - just handle fprobe and flags */
+static void __unregister_trace_fprobe(struct trace_fprobe *tf)
+{
+ if (trace_fprobe_is_registered(tf)) {
+ unregister_fprobe(&tf->fp);
+ memset(&tf->fp, 0, sizeof(tf->fp));
+ }
+}
+
+/* TODO: make this trace_*probe common function */
+/* Unregister a trace_probe and probe_event */
+static int unregister_trace_fprobe(struct trace_fprobe *tf)
+{
+ /* If other probes are on the event, just unregister fprobe */
+ if (trace_probe_has_sibling(&tf->tp))
+ goto unreg;
+
+ /* Enabled event can not be unregistered */
+ if (trace_probe_is_enabled(&tf->tp))
+ return -EBUSY;
+
+ /* If there's a reference to the dynamic event */
+ if (trace_event_dyn_busy(trace_probe_event_call(&tf->tp)))
+ return -EBUSY;
+
+ /* Will fail if probe is being used by ftrace or perf */
+ if (unregister_fprobe_event(tf))
+ return -EBUSY;
+
+unreg:
+ __unregister_trace_fprobe(tf);
+ dyn_event_remove(&tf->devent);
+ trace_probe_unlink(&tf->tp);
+
+ return 0;
+}
+
+static bool trace_fprobe_has_same_fprobe(struct trace_fprobe *orig,
+ struct trace_fprobe *comp)
+{
+ struct trace_probe_event *tpe = orig->tp.event;
+ int i;
+
+ list_for_each_entry(orig, &tpe->probes, tp.list) {
+ if (strcmp(trace_fprobe_symbol(orig),
+ trace_fprobe_symbol(comp)))
+ continue;
+
+ /*
+ * trace_probe_compare_arg_type() ensured that nr_args and
+ * each argument name and type are same. Let's compare comm.
+ */
+ for (i = 0; i < orig->tp.nr_args; i++) {
+ if (strcmp(orig->tp.args[i].comm,
+ comp->tp.args[i].comm))
+ break;
+ }
+
+ if (i == orig->tp.nr_args)
+ return true;
+ }
+
+ return false;
+}
+
+static int append_trace_fprobe(struct trace_fprobe *tf, struct trace_fprobe *to)
+{
+ int ret;
+
+ if (trace_fprobe_is_return(tf) != trace_fprobe_is_return(to)) {
+ trace_probe_log_set_index(0);
+ trace_probe_log_err(0, DIFF_PROBE_TYPE);
+ return -EEXIST;
+ }
+ ret = trace_probe_compare_arg_type(&tf->tp, &to->tp);
+ if (ret) {
+ /* Note that argument starts index = 2 */
+ trace_probe_log_set_index(ret + 1);
+ trace_probe_log_err(0, DIFF_ARG_TYPE);
+ return -EEXIST;
+ }
+ if (trace_fprobe_has_same_fprobe(to, tf)) {
+ trace_probe_log_set_index(0);
+ trace_probe_log_err(0, SAME_PROBE);
+ return -EEXIST;
+ }
+
+ /* Append to existing event */
+ ret = trace_probe_append(&tf->tp, &to->tp);
+ if (ret)
+ return ret;
+
+ ret = __register_trace_fprobe(tf);
+ if (ret)
+ trace_probe_unlink(&tf->tp);
+ else
+ dyn_event_add(&tf->devent, trace_probe_event_call(&tf->tp));
+
+ return ret;
+}
+
+/* Register a trace_probe and probe_event */
+static int register_trace_fprobe(struct trace_fprobe *tf)
+{
+ struct trace_fprobe *old_tf;
+ int ret;
+
+ mutex_lock(&event_mutex);
+
+ old_tf = find_trace_fprobe(trace_probe_name(&tf->tp),
+ trace_probe_group_name(&tf->tp));
+ if (old_tf) {
+ ret = append_trace_fprobe(tf, old_tf);
+ goto end;
+ }
+
+ /* Register new event */
+ ret = register_fprobe_event(tf);
+ if (ret) {
+ if (ret == -EEXIST) {
+ trace_probe_log_set_index(0);
+ trace_probe_log_err(0, EVENT_EXIST);
+ } else
+ pr_warn("Failed to register probe event(%d)\n", ret);
+ goto end;
+ }
+
+ /* Register fprobe */
+ ret = __register_trace_fprobe(tf);
+ if (ret < 0)
+ unregister_fprobe_event(tf);
+ else
+ dyn_event_add(&tf->devent, trace_probe_event_call(&tf->tp));
+
+end:
+ mutex_unlock(&event_mutex);
+ return ret;
+}
+
+static int __trace_fprobe_create(int argc, const char *argv[])
+{
+ /*
+ * Argument syntax:
+ * - Add fentry probe:
+ * f[:[GRP/][EVENT]] [MOD:]KSYM [FETCHARGS]
+ * - Add fexit probe:
+ * f[N][:[GRP/][EVENT]] [MOD:]KSYM%return [FETCHARGS]
+ *
+ * Fetch args:
+ * $retval : fetch return value
+ * $stack : fetch stack address
+ * $stackN : fetch Nth entry of stack (N:0-)
+ * $argN : fetch Nth argument (N:1-)
+ * $comm : fetch current task comm
+ * @ADDR : fetch memory at ADDR (ADDR should be in kernel)
+ * @SYM[+|-offs] : fetch memory at SYM +|- offs (SYM is a data symbol)
+ * Dereferencing memory fetch:
+ * +|-offs(ARG) : fetch memory at ARG +|- offs address.
+ * Alias name of args:
+ * NAME=FETCHARG : set NAME as alias of FETCHARG.
+ * Type of args:
+ * FETCHARG:TYPE : use TYPE instead of unsigned long.
+ */
+ struct trace_fprobe *tf = NULL;
+ int i, len, ret = 0;
+ bool is_return = false;
+ char *symbol = NULL, *tmp = NULL;
+ const char *event = NULL, *group = FPROBE_EVENT_SYSTEM;
+ int maxactive = 0;
+ char buf[MAX_EVENT_NAME_LEN];
+ char gbuf[MAX_EVENT_NAME_LEN];
+ unsigned int flags = TPARG_FL_KERNEL;
+
+ if (argv[0][0] != 'f' || argc < 2)
+ return -ECANCELED;
+
+ trace_probe_log_init("trace_fprobe", argc, argv);
+
+ event = strchr(&argv[0][1], ':');
+ if (event)
+ event++;
+
+ if (isdigit(argv[0][1])) {
+ if (event)
+ len = event - &argv[0][1] - 1;
+ else
+ len = strlen(&argv[0][1]);
+ if (len > MAX_EVENT_NAME_LEN - 1) {
+ trace_probe_log_err(1, BAD_MAXACT);
+ goto parse_error;
+ }
+ memcpy(buf, &argv[0][1], len);
+ buf[len] = '\0';
+ ret = kstrtouint(buf, 0, &maxactive);
+ if (ret || !maxactive) {
+ trace_probe_log_err(1, BAD_MAXACT);
+ goto parse_error;
+ }
+ /*
+ * fprobe rethook instances are iterated over via a list. The
+ * maximum should stay reasonable.
+ */
+ if (maxactive > RETHOOK_MAXACTIVE_MAX) {
+ trace_probe_log_err(1, MAXACT_TOO_BIG);
+ goto parse_error;
+ }
+ }
+
+ trace_probe_log_set_index(1);
+
+ /* a symbol specified */
+ symbol = kstrdup(argv[1], GFP_KERNEL);
+ if (!symbol)
+ return -ENOMEM;
+
+ tmp = strchr(symbol, '%');
+ if (tmp) {
+ if (!strcmp(tmp, "%return")) {
+ *tmp = '\0';
+ is_return = true;
+ } else {
+ trace_probe_log_err(tmp - symbol, BAD_ADDR_SUFFIX);
+ goto parse_error;
+ }
+ }
+ if (!is_return && maxactive) {
+ trace_probe_log_set_index(0);
+ trace_probe_log_err(1, BAD_MAXACT_TYPE);
+ goto parse_error;
+ }
+
+ flags |= TPARG_FL_FENTRY;
+ if (is_return)
+ flags |= TPARG_FL_RETURN;
+
+ trace_probe_log_set_index(0);
+ if (event) {
+ ret = traceprobe_parse_event_name(&event, &group, gbuf,
+ event - argv[0]);
+ if (ret)
+ goto parse_error;
+ }
+
+ if (!event) {
+ /* Make a new event name */
+ snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
+ is_return ? "exit" : "entry");
+ sanitize_event_name(buf);
+ event = buf;
+ }
+
+ /* setup a probe */
+ tf = alloc_trace_fprobe(group, event, symbol, maxactive,
+ argc - 2, is_return);
+ if (IS_ERR(tf)) {
+ ret = PTR_ERR(tf);
+ /* This must return -ENOMEM, else there is a bug */
+ WARN_ON_ONCE(ret != -ENOMEM);
+ goto out; /* We know tf is not allocated */
+ }
+ argc -= 2; argv += 2;
+
+ /* parse arguments */
+ for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+ trace_probe_log_set_index(i + 2);
+ ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], flags);
+ if (ret)
+ goto error; /* This can be -ENOMEM */
+ }
+
+ ret = traceprobe_set_print_fmt(&tf->tp,
+ is_return ? PROBE_PRINT_RETURN : PROBE_PRINT_NORMAL);
+ if (ret < 0)
+ goto error;
+
+ ret = register_trace_fprobe(tf);
+ if (ret) {
+ trace_probe_log_set_index(1);
+ if (ret == -EILSEQ)
+ trace_probe_log_err(0, BAD_INSN_BNDRY);
+ else if (ret == -ENOENT)
+ trace_probe_log_err(0, BAD_PROBE_ADDR);
+ else if (ret != -ENOMEM && ret != -EEXIST)
+ trace_probe_log_err(0, FAIL_REG_PROBE);
+ goto error;
+ }
+
+out:
+ trace_probe_log_clear();
+ kfree(symbol);
+ return ret;
+
+parse_error:
+ ret = -EINVAL;
+error:
+ free_trace_fprobe(tf);
+ goto out;
+}
+
+static int trace_fprobe_create(const char *raw_command)
+{
+ return trace_probe_create(raw_command, __trace_fprobe_create);
+}
+
+static int trace_fprobe_release(struct dyn_event *ev)
+{
+ struct trace_fprobe *tf = to_trace_fprobe(ev);
+ int ret = unregister_trace_fprobe(tf);
+
+ if (!ret)
+ free_trace_fprobe(tf);
+ return ret;
+}
+
+static int trace_fprobe_show(struct seq_file *m, struct dyn_event *ev)
+{
+ struct trace_fprobe *tf = to_trace_fprobe(ev);
+ int i;
+
+ seq_putc(m, 'f');
+ if (trace_fprobe_is_return(tf) && tf->fp.nr_maxactive)
+ seq_printf(m, "%d", tf->fp.nr_maxactive);
+ seq_printf(m, ":%s/%s", trace_probe_group_name(&tf->tp),
+ trace_probe_name(&tf->tp));
+
+ seq_printf(m, " %s%s", trace_fprobe_symbol(tf),
+ trace_fprobe_is_return(tf) ? "%return" : "");
+
+ for (i = 0; i < tf->tp.nr_args; i++)
+ seq_printf(m, " %s=%s", tf->tp.args[i].name, tf->tp.args[i].comm);
+ seq_putc(m, '\n');
+
+ return 0;
+}
+
+/*
+ * Called by perf_trace_init() or __ftrace_set_clr_event() under event_mutex.
+ */
+static int fprobe_register(struct trace_event_call *event,
+ enum trace_reg type, void *data)
+{
+ struct trace_event_file *file = data;
+
+ switch (type) {
+ case TRACE_REG_REGISTER:
+ return enable_trace_fprobe(event, file);
+ case TRACE_REG_UNREGISTER:
+ return disable_trace_fprobe(event, file);
+
+#ifdef CONFIG_PERF_EVENTS
+ case TRACE_REG_PERF_REGISTER:
+ return enable_trace_fprobe(event, NULL);
+ case TRACE_REG_PERF_UNREGISTER:
+ return disable_trace_fprobe(event, NULL);
+ case TRACE_REG_PERF_OPEN:
+ case TRACE_REG_PERF_CLOSE:
+ case TRACE_REG_PERF_ADD:
+ case TRACE_REG_PERF_DEL:
+ return 0;
+#endif
+ }
+ return 0;
+}
+
+/*
+ * Register dynevent at core_initcall. This allows the kernel to set up
+ * fprobe events in postcore_initcall without tracefs.
+ */
+static __init int init_fprobe_trace_early(void)
+{
+ int ret;
+
+ ret = dyn_event_register(&trace_fprobe_ops);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+core_initcall(init_fprobe_trace_early);
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 59cda19a9033..38e34ed89d55 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -764,7 +764,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])

if (isdigit(argv[0][1])) {
if (!is_return) {
- trace_probe_log_err(1, MAXACT_NO_KPROBE);
+ trace_probe_log_err(1, BAD_MAXACT_TYPE);
goto parse_error;
}
if (event)
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 20d0c4a97633..8646b097d56c 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -393,8 +393,8 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
break;

case '%': /* named register */
- if (flags & TPARG_FL_TPOINT) {
- /* eprobes do not handle registers */
+ if (flags & (TPARG_FL_TPOINT | TPARG_FL_FPROBE)) {
+ /* eprobe and fprobe do not handle registers */
trace_probe_log_err(offs, BAD_VAR);
break;
}
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index ef8ed3b65d05..a2ee6d336bd8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -362,7 +362,8 @@ int trace_probe_print_args(struct trace_seq *s, struct probe_arg *args, int nr_a
#define TPARG_FL_FENTRY BIT(2)
#define TPARG_FL_TPOINT BIT(3)
#define TPARG_FL_USER BIT(4)
-#define TPARG_FL_MASK GENMASK(4, 0)
+#define TPARG_FL_FPROBE BIT(5)
+#define TPARG_FL_MASK GENMASK(5, 0)

extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
const char *argv, unsigned int flags);
@@ -404,7 +405,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(REFCNT_OPEN_BRACE, "Reference counter brace is not closed"), \
C(BAD_REFCNT_SUFFIX, "Reference counter has wrong suffix"), \
C(BAD_UPROBE_OFFS, "Invalid uprobe offset"), \
- C(MAXACT_NO_KPROBE, "Maxactive is not for kprobe"), \
+ C(BAD_MAXACT_TYPE, "Maxactive is only for function exit"), \
C(BAD_MAXACT, "Invalid maxactive number"), \
C(MAXACT_TOO_BIG, "Maxactive is too big"), \
C(BAD_PROBE_ADDR, "Invalid probed address or symbol"), \
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
index 9e85d3019ff0..97c08867490a 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
@@ -8,7 +8,7 @@ check_error() { # command-with-error-pos-by-^
}

if grep -q 'r\[maxactive\]' README; then
-check_error 'p^100 vfs_read' # MAXACT_NO_KPROBE
+check_error 'p^100 vfs_read' # BAD_MAXACT_TYPE
check_error 'r^1a111 vfs_read' # BAD_MAXACT
check_error 'r^100000 vfs_read' # MAXACT_TOO_BIG
fi

2023-04-20 11:27:49

by Masami Hiramatsu

Subject: [PATCH v5 3/9] selftests/ftrace: Add fprobe related testcases

From: Masami Hiramatsu (Google) <[email protected]>

Add a syntax error testcase and an add/remove testcase for fprobe events.
This ensures that fprobe events can be added and removed, and that the
parser handles syntax errors correctly.
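For readers unfamiliar with the command format these testcases exercise, the
accepted shape (from the cover letter) can be modeled roughly as follows. This
is a hypothetical Python sketch of the grammar for illustration only — it is
not the kernel's parser, which also reports caret error positions:

```python
import re

# Rough model of the fprobe command shape:
#   f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION[%return] [FETCHARGS]
FPROBE_RE = re.compile(
    r"^f(?P<maxactive>\d+)?"                        # optional maxactive
    r"(?::(?:(?P<group>[A-Za-z_]\w*)/)?(?P<event>[A-Za-z_]\w*)?)?"
    r"\s+(?P<func>[\w.]+)(?P<ret>%return)?"         # symbol, optional %return
    r"(?P<args>(?:\s+\S+)*)\s*$"                    # optional fetch args
)

def parse_fprobe(cmd):
    """Return the parsed fields, or None if the command is malformed."""
    m = FPROBE_RE.match(cmd)
    if not m:
        return None
    d = m.groupdict()
    # Maxactive is only meaningful for exit probes (cf. BAD_MAXACT_TYPE).
    if d["maxactive"] and not d["ret"]:
        return None
    return d
```

For example, `parse_fprobe("f100 vfs_read")` is rejected because maxactive is
given without `%return`, which mirrors the maxactive error cases tested below.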

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
.../ftrace/test.d/dynevent/add_remove_fprobe.tc | 26 ++++++
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 88 ++++++++++++++++++++
2 files changed, 114 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
new file mode 100644
index 000000000000..53e0d5671687
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
@@ -0,0 +1,26 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove fprobe events
+# requires: dynamic_events "f[:[<group>/][<event>]] <func-name>[%return] [<args>]": README
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=$FUNCTION_FORK
+
+echo "f:myevent1 $PLACE" >> dynamic_events
+echo "f:myevent2 $PLACE%return" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+test -d events/fprobes/myevent1
+test -d events/fprobes/myevent2
+
+echo "-:myevent2" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+! grep -q myevent2 dynamic_events
+
+echo > dynamic_events
+
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
new file mode 100644
index 000000000000..549daa162d84
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -0,0 +1,88 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Fprobe event parser error log check
+# requires: dynamic_events "f[:[<group>/][<event>]] <func-name>[%return] [<args>]": README
+
+check_error() { # command-with-error-pos-by-^
+ ftrace_errlog_check 'trace_fprobe' "$1" 'dynamic_events'
+}
+
+check_error 'f^100 vfs_read' # BAD_MAXACT_TYPE
+check_error 'f^1a111 vfs_read' # BAD_MAXACT
+check_error 'f^100000 vfs_read' # MAXACT_TOO_BIG
+
+check_error 'f ^non_exist_func' # BAD_PROBE_ADDR (enoent)
+check_error 'f ^vfs_read+10' # BAD_PROBE_ADDR
+check_error 'f:^/bar vfs_read' # NO_GROUP_NAME
+check_error 'f:^12345678901234567890123456789012345678901234567890123456789012345/bar vfs_read' # GROUP_TOO_LONG
+
+check_error 'f:^foo.1/bar vfs_read' # BAD_GROUP_NAME
+check_error 'f:^ vfs_read' # NO_EVENT_NAME
+check_error 'f:foo/^12345678901234567890123456789012345678901234567890123456789012345 vfs_read' # EVENT_TOO_LONG
+check_error 'f:foo/^bar.1 vfs_read' # BAD_EVENT_NAME
+
+check_error 'f vfs_read ^$retval' # RETVAL_ON_PROBE
+check_error 'f vfs_read ^$stack10000' # BAD_STACK_NUM
+
+check_error 'f vfs_read ^$arg10000' # BAD_ARG_NUM
+
+check_error 'f vfs_read ^$none_var' # BAD_VAR
+check_error 'f vfs_read ^%rax' # BAD_VAR
+
+check_error 'f vfs_read ^@12345678abcde' # BAD_MEM_ADDR
+check_error 'f vfs_read ^@+10' # FILE_ON_KPROBE
+
+grep -q "imm-value" README && \
+check_error 'f vfs_read arg1=\^x' # BAD_IMM
+grep -q "imm-string" README && \
+check_error 'f vfs_read arg1=\"abcd^' # IMMSTR_NO_CLOSE
+
+check_error 'f vfs_read ^+0@0)' # DEREF_NEED_BRACE
+check_error 'f vfs_read ^+0ab1(@0)' # BAD_DEREF_OFFS
+check_error 'f vfs_read +0(+0(@0^)' # DEREF_OPEN_BRACE
+
+if grep -A1 "fetcharg:" README | grep -q '\$comm' ; then
+check_error 'f vfs_read +0(^$comm)' # COMM_CANT_DEREF
+fi
+
+check_error 'f vfs_read ^&1' # BAD_FETCH_ARG
+
+
+# We've introduced this limitation with array support
+if grep -q ' <type>\\\[<array-size>\\\]' README; then
+check_error 'f vfs_read +0(^+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(@0))))))))))))))' # TOO_MANY_OPS?
+check_error 'f vfs_read +0(@11):u8[10^' # ARRAY_NO_CLOSE
+check_error 'f vfs_read +0(@11):u8[10]^a' # BAD_ARRAY_SUFFIX
+check_error 'f vfs_read +0(@11):u8[^10a]' # BAD_ARRAY_NUM
+check_error 'f vfs_read +0(@11):u8[^256]' # ARRAY_TOO_BIG
+fi
+
+check_error 'f vfs_read @11:^unknown_type' # BAD_TYPE
+check_error 'f vfs_read $stack0:^string' # BAD_STRING
+check_error 'f vfs_read @11:^b10@a/16' # BAD_BITFIELD
+
+check_error 'f vfs_read ^arg123456789012345678901234567890=@11' # ARG_NAME_TOO_LONG
+check_error 'f vfs_read ^=@11' # NO_ARG_NAME
+check_error 'f vfs_read ^var.1=@11' # BAD_ARG_NAME
+check_error 'f vfs_read var1=@11 ^var1=@12' # USED_ARG_NAME
+check_error 'f vfs_read ^+1234567(+1234567(+1234567(+1234567(+1234567(+1234567(@1234))))))' # ARG_TOO_LONG
+check_error 'f vfs_read arg1=^' # NO_ARG_BODY
+
+
+# multiprobe errors
+if grep -q "Create/append/" README && grep -q "imm-value" README; then
+echo "f:fprobes/testevent $FUNCTION_FORK" > dynamic_events
+check_error '^f:fprobes/testevent do_exit%return' # DIFF_PROBE_TYPE
+
+# Explicitly use printf "%s" to not interpret \1
+printf "%s" "f:fprobes/testevent $FUNCTION_FORK abcd=\\1" > dynamic_events
+check_error "f:fprobes/testevent $FUNCTION_FORK ^bcd=\\1" # DIFF_ARG_TYPE
+check_error "f:fprobes/testevent $FUNCTION_FORK ^abcd=\\1:u8" # DIFF_ARG_TYPE
+check_error "f:fprobes/testevent $FUNCTION_FORK ^abcd=\\\"foo\"" # DIFF_ARG_TYPE
+check_error "^f:fprobes/testevent $FUNCTION_FORK abcd=\\1" # SAME_PROBE
+fi
+
+# %return suffix errors
+check_error 'f vfs_read^%hoge' # BAD_ADDR_SUFFIX
+
+exit 0

2023-04-20 11:27:56

by Masami Hiramatsu

Subject: [PATCH v5 4/9] tracing/probes: Add tracepoint support on fprobe_event

From: Masami Hiramatsu (Google) <[email protected]>

Allow fprobe_event to trace raw tracepoints so that users can trace
tracepoints which don't have trace-event wrappers. This new event type is
always available if fprobe events are enabled, because when a tracepoint
has no trace-event, the trace-event and dynamic-event interfaces are also
not available for it.

e.g.
# echo 't sched_overutilized_tp' >> dynamic_events
# echo 't 9p_client_req' >> dynamic_events
# cat dynamic_events
t:tracepoints/sched_overutilized_tp sched_overutilized_tp
t:tracepoints/_9p_client_req 9p_client_req

The event name is based on the tracepoint name, but if it starts with a
digit character, an underscore '_' is prepended.
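The name mangling described above can be sketched as follows. This is an
illustrative Python model, not the kernel's helper — the function name and
exact character rules here are assumptions for demonstration:

```python
def tracepoint_event_name(tp_name):
    """Model of the event-name mangling: replace characters that are
    not valid in an event name with '_' and prepend '_' if the name
    would otherwise start with a digit."""
    name = "".join(c if (c.isalnum() or c == "_") else "_" for c in tp_name)
    if name and name[0].isdigit():
        name = "_" + name
    return name

# Mirrors the examples in the text:
#   sched_overutilized_tp -> sched_overutilized_tp
#   9p_client_req         -> _9p_client_req
```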

NOTE: to avoid further confusion, this renames TPARG_FL_TPOINT to
TPARG_FL_TEVENT, because that flag is used for eprobes (trace-event
probes), and reuses TPARG_FL_TPOINT for this raw tracepoint probe.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
include/linux/tracepoint-defs.h | 1
include/linux/tracepoint.h | 5 ++
kernel/trace/trace.c | 1
kernel/trace/trace_eprobe.c | 2 -
kernel/trace/trace_fprobe.c | 129 +++++++++++++++++++++++++++++++++++++--
kernel/trace/trace_probe.c | 15 +++--
kernel/trace/trace_probe.h | 14 ++++
7 files changed, 152 insertions(+), 15 deletions(-)

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index e7c2276be33e..4dc4955f0fbf 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -35,6 +35,7 @@ struct tracepoint {
struct static_call_key *static_call_key;
void *static_call_tramp;
void *iterator;
+ void *probestub;
int (*regfunc)(void);
void (*unregfunc)(void);
struct tracepoint_func __rcu *funcs;
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 6811e43c1b5c..37196aaf8930 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -303,6 +303,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
__section("__tracepoints_strings") = #_name; \
extern struct static_call_key STATIC_CALL_KEY(tp_func_##_name); \
int __traceiter_##_name(void *__data, proto); \
+ void __probestub_##_name(void *, proto); \
struct tracepoint __tracepoint_##_name __used \
__section("__tracepoints") = { \
.name = __tpstrtab_##_name, \
@@ -310,6 +311,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
.static_call_key = &STATIC_CALL_KEY(tp_func_##_name), \
.static_call_tramp = STATIC_CALL_TRAMP_ADDR(tp_func_##_name), \
.iterator = &__traceiter_##_name, \
+ .probestub = &__probestub_##_name, \
.regfunc = _reg, \
.unregfunc = _unreg, \
.funcs = NULL }; \
@@ -330,6 +332,9 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
} \
return 0; \
} \
+ void __probestub_##_name(void *__data, proto) \
+ { \
+ } \
DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);

#define DEFINE_TRACE(name, proto, args) \
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4d7d5708d3cd..9da9c979faa3 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5653,6 +5653,7 @@ static const char readme_msg[] =
#endif
#ifdef CONFIG_FPROBE_EVENTS
"\t f[:[<group>/][<event>]] <func-name>[%return] [<args>]\n"
+ "\t t[:[<group>/][<event>]] <tracepoint> [<args>]\n"
#endif
#ifdef CONFIG_HIST_TRIGGERS
"\t s:[synthetic/]<event> <field> [<field>]\n"
diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index 67e854979d53..fd64cd5d5745 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -817,7 +817,7 @@ find_and_get_event(const char *system, const char *event_name)

static int trace_eprobe_tp_update_arg(struct trace_eprobe *ep, const char *argv[], int i)
{
- unsigned int flags = TPARG_FL_KERNEL | TPARG_FL_TPOINT;
+ unsigned int flags = TPARG_FL_KERNEL | TPARG_FL_TEVENT;
int ret;

ret = traceprobe_parse_probe_arg(&ep->tp, i, argv[i], flags);
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 9dcc320b1158..8cf581568a14 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -9,6 +9,7 @@
#include <linux/module.h>
#include <linux/rculist.h>
#include <linux/security.h>
+#include <linux/tracepoint.h>
#include <linux/uaccess.h>

#include "trace_dynevent.h"
@@ -17,6 +18,7 @@
#include "trace_probe_tmpl.h"

#define FPROBE_EVENT_SYSTEM "fprobes"
+#define TRACEPOINT_EVENT_SYSTEM "tracepoints"
#define RETHOOK_MAXACTIVE_MAX 4096

static int trace_fprobe_create(const char *raw_command);
@@ -41,6 +43,8 @@ struct trace_fprobe {
struct dyn_event devent;
struct fprobe fp;
const char *symbol;
+ struct tracepoint *tpoint;
+ struct module *mod;
struct trace_probe tp;
};

@@ -68,6 +72,11 @@ static bool trace_fprobe_is_return(struct trace_fprobe *tf)
return tf->fp.exit_handler != NULL;
}

+static bool trace_fprobe_is_tracepoint(struct trace_fprobe *tf)
+{
+ return tf->tpoint != NULL;
+}
+
static const char *trace_fprobe_symbol(struct trace_fprobe *tf)
{
return tf->symbol ? tf->symbol : "unknown";
@@ -689,6 +698,21 @@ static int __register_trace_fprobe(struct trace_fprobe *tf)
else
tf->fp.flags |= FPROBE_FL_DISABLED;

+ if (trace_fprobe_is_tracepoint(tf)) {
+ struct tracepoint *tpoint = tf->tpoint;
+ unsigned long ip = (unsigned long)tpoint->probestub;
+ /*
+ * Here, we do 2 steps to enable fprobe on a tracepoint.
+ * First, put the __probestub_##TP function on the tracepoint,
+ * then put a fprobe on that stub function.
+ */
+ ret = tracepoint_probe_register_prio_may_exist(tpoint,
+ tpoint->probestub, NULL, 0);
+ if (ret < 0)
+ return ret;
+ return register_fprobe_ips(&tf->fp, &ip, 1);
+ }
+
/* TODO: handle filter, nofilter or symbol list */
return register_fprobe(&tf->fp, tf->symbol, NULL);
}
@@ -699,6 +723,12 @@ static void __unregister_trace_fprobe(struct trace_fprobe *tf)
if (trace_fprobe_is_registered(tf)) {
unregister_fprobe(&tf->fp);
memset(&tf->fp, 0, sizeof(tf->fp));
+ if (trace_fprobe_is_tracepoint(tf)) {
+ tracepoint_probe_unregister(tf->tpoint,
+ tf->tpoint->probestub, NULL);
+ tf->tpoint = NULL;
+ tf->mod = NULL;
+ }
}
}

@@ -762,7 +792,8 @@ static int append_trace_fprobe(struct trace_fprobe *tf, struct trace_fprobe *to)
{
int ret;

- if (trace_fprobe_is_return(tf) != trace_fprobe_is_return(to)) {
+ if (trace_fprobe_is_return(tf) != trace_fprobe_is_return(to) ||
+ trace_fprobe_is_tracepoint(tf) != trace_fprobe_is_tracepoint(to)) {
trace_probe_log_set_index(0);
trace_probe_log_err(0, DIFF_PROBE_TYPE);
return -EEXIST;
@@ -832,6 +863,58 @@ static int register_trace_fprobe(struct trace_fprobe *tf)
return ret;
}

+static int __tracepoint_probe_module_cb(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ struct tp_module *tp_mod = data;
+ struct trace_fprobe *tf;
+ struct dyn_event *pos;
+
+ if (val != MODULE_STATE_GOING)
+ return NOTIFY_DONE;
+
+ mutex_lock(&event_mutex);
+ for_each_trace_fprobe(tf, pos) {
+ if (tp_mod->mod == tf->mod) {
+ tracepoint_probe_unregister(tf->tpoint,
+ tf->tpoint->probestub, NULL);
+ tf->tpoint = NULL;
+ tf->mod = NULL;
+ }
+ }
+ mutex_unlock(&event_mutex);
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block tracepoint_module_nb = {
+ .notifier_call = __tracepoint_probe_module_cb,
+};
+
+struct __find_tracepoint_cb_data {
+ const char *tp_name;
+ struct tracepoint *tpoint;
+};
+
+static void __find_tracepoint_cb(struct tracepoint *tp, void *priv)
+{
+ struct __find_tracepoint_cb_data *data = priv;
+
+ if (!data->tpoint && !strcmp(data->tp_name, tp->name))
+ data->tpoint = tp;
+}
+
+static struct tracepoint *find_tracepoint(const char *tp_name)
+{
+ struct __find_tracepoint_cb_data data = {
+ .tp_name = tp_name,
+ };
+
+ for_each_kernel_tracepoint(__find_tracepoint_cb, &data);
+
+ return data.tpoint;
+}
+
static int __trace_fprobe_create(int argc, const char *argv[])
{
/*
@@ -840,6 +923,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
* f[:[GRP/][EVENT]] [MOD:]KSYM [FETCHARGS]
* - Add fexit probe:
* f[N][:[GRP/][EVENT]] [MOD:]KSYM%return [FETCHARGS]
+ * - Add tracepoint probe:
+ * t[:[GRP/][EVENT]] TRACEPOINT [FETCHARGS]
*
* Fetch args:
* $retval : fetch return value
@@ -865,10 +950,16 @@ static int __trace_fprobe_create(int argc, const char *argv[])
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
unsigned int flags = TPARG_FL_KERNEL;
+ bool is_tracepoint = false;

- if (argv[0][0] != 'f' || argc < 2)
+ if ((argv[0][0] != 'f' && argv[0][0] != 't') || argc < 2)
return -ECANCELED;

+ if (argv[0][0] == 't') {
+ is_tracepoint = true;
+ group = TRACEPOINT_EVENT_SYSTEM;
+ }
+
trace_probe_log_init("trace_fprobe", argc, argv);

event = strchr(&argv[0][1], ':');
@@ -902,14 +993,14 @@ static int __trace_fprobe_create(int argc, const char *argv[])

trace_probe_log_set_index(1);

- /* a symbol specified */
+ /* a symbol(or tracepoint) must be specified */
symbol = kstrdup(argv[1], GFP_KERNEL);
if (!symbol)
return -ENOMEM;

tmp = strchr(symbol, '%');
if (tmp) {
- if (!strcmp(tmp, "%return")) {
+ if (!is_tracepoint && !strcmp(tmp, "%return")) {
*tmp = '\0';
is_return = true;
} else {
@@ -926,6 +1017,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
flags |= TPARG_FL_FENTRY;
if (is_return)
flags |= TPARG_FL_RETURN;
+ if (is_tracepoint)
+ flags |= TPARG_FL_TPOINT;

trace_probe_log_set_index(0);
if (event) {
@@ -937,8 +1030,11 @@ static int __trace_fprobe_create(int argc, const char *argv[])

if (!event) {
/* Make a new event name */
- snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
- is_return ? "exit" : "entry");
+ if (is_tracepoint)
+ strscpy(buf, symbol, MAX_EVENT_NAME_LEN);
+ else
+ snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
+ is_return ? "exit" : "entry");
sanitize_event_name(buf);
event = buf;
}
@@ -952,6 +1048,18 @@ static int __trace_fprobe_create(int argc, const char *argv[])
WARN_ON_ONCE(ret != -ENOMEM);
goto out; /* We know tf is not allocated */
}
+
+ if (is_tracepoint) {
+ tf->tpoint = find_tracepoint(tf->symbol);
+ if (!tf->tpoint) {
+ trace_probe_log_set_index(1);
+ trace_probe_log_err(0, NO_TRACEPOINT);
+ goto parse_error;
+ }
+ tf->mod = __module_text_address(
+ (unsigned long)tf->tpoint->probestub);
+ }
+
argc -= 2; argv += 2;

/* parse arguments */
@@ -1011,7 +1119,10 @@ static int trace_fprobe_show(struct seq_file *m, struct dyn_event *ev)
struct trace_fprobe *tf = to_trace_fprobe(ev);
int i;

- seq_putc(m, 'f');
+ if (trace_fprobe_is_tracepoint(tf))
+ seq_putc(m, 't');
+ else
+ seq_putc(m, 'f');
if (trace_fprobe_is_return(tf) && tf->fp.nr_maxactive)
seq_printf(m, "%d", tf->fp.nr_maxactive);
seq_printf(m, ":%s/%s", trace_probe_group_name(&tf->tp),
@@ -1068,6 +1179,10 @@ static __init int init_fprobe_trace_early(void)
if (ret)
return ret;

+ ret = register_tracepoint_module_notifier(&tracepoint_module_nb);
+ if (ret)
+ return ret;
+
return 0;
}
core_initcall(init_fprobe_trace_early);
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 8646b097d56c..dffbed2ddae9 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -292,7 +292,7 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
int ret = 0;
int len;

- if (flags & TPARG_FL_TPOINT) {
+ if (flags & TPARG_FL_TEVENT) {
if (code->data)
return -EFAULT;
code->data = kstrdup(arg, GFP_KERNEL);
@@ -326,8 +326,7 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
} else if (strcmp(arg, "comm") == 0 || strcmp(arg, "COMM") == 0) {
code->op = FETCH_OP_COMM;
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
- } else if (((flags & TPARG_FL_MASK) ==
- (TPARG_FL_KERNEL | TPARG_FL_FENTRY)) &&
+ } else if (tparg_is_function_entry(flags) &&
(len = str_has_prefix(arg, "arg"))) {
ret = kstrtoul(arg + len, 10, &param);
if (ret) {
@@ -338,6 +337,12 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
}
code->op = FETCH_OP_ARG;
code->param = (unsigned int)param - 1;
+ /*
+ * The tracepoint probe will probe a stub function, and the
+ * first parameter of the stub is a dummy and should be ignored.
+ */
+ if (flags & TPARG_FL_TPOINT)
+ code->param++;
#endif
} else
goto inval_var;
@@ -393,7 +398,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
break;

case '%': /* named register */
- if (flags & (TPARG_FL_TPOINT | TPARG_FL_FPROBE)) {
+ if (flags & (TPARG_FL_TEVENT | TPARG_FL_FPROBE)) {
/* eprobe and fprobe do not handle registers */
trace_probe_log_err(offs, BAD_VAR);
break;
@@ -633,7 +638,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
* Since $comm and immediate string can not be dereferenced,
* we can find those by strcmp. But ignore for eprobes.
*/
- if (!(flags & TPARG_FL_TPOINT) &&
+ if (!(flags & TPARG_FL_TEVENT) &&
(strcmp(arg, "$comm") == 0 || strcmp(arg, "$COMM") == 0 ||
strncmp(arg, "\\\"", 2) == 0)) {
/* The type of $comm must be "string", and not an array. */
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index a2ee6d336bd8..6b23206068f7 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -360,10 +360,19 @@ int trace_probe_print_args(struct trace_seq *s, struct probe_arg *args, int nr_a
#define TPARG_FL_RETURN BIT(0)
#define TPARG_FL_KERNEL BIT(1)
#define TPARG_FL_FENTRY BIT(2)
-#define TPARG_FL_TPOINT BIT(3)
+#define TPARG_FL_TEVENT BIT(3)
#define TPARG_FL_USER BIT(4)
#define TPARG_FL_FPROBE BIT(5)
-#define TPARG_FL_MASK GENMASK(5, 0)
+#define TPARG_FL_TPOINT BIT(6)
+#define TPARG_FL_MASK GENMASK(6, 0)
+
+static inline bool tparg_is_function_entry(unsigned int flags)
+{
+ /* Tracepoint probe is also function entry */
+ unsigned int mask = TPARG_FL_MASK & ~TPARG_FL_TPOINT;
+
+ return (flags & mask) == (TPARG_FL_KERNEL | TPARG_FL_FENTRY);
+}

extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
const char *argv, unsigned int flags);
@@ -410,6 +419,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(MAXACT_TOO_BIG, "Maxactive is too big"), \
C(BAD_PROBE_ADDR, "Invalid probed address or symbol"), \
C(BAD_RETPROBE, "Retprobe address must be an function entry"), \
+ C(NO_TRACEPOINT, "Tracepoint is not found"), \
C(BAD_ADDR_SUFFIX, "Invalid probed address suffix"), \
C(NO_GROUP_NAME, "Group name is not specified"), \
C(GROUP_TOO_LONG, "Group name is too long"), \

2023-04-20 11:28:04

by Masami Hiramatsu

Subject: [PATCH v5 5/9] tracing/probes: Move event parameter fetching code to common parser

From: Masami Hiramatsu (Google) <[email protected]>

Move the trace event parameter fetching code to the common parser in
trace_probe.c. This simplifies eprobe's trace-event variable fetching
code by introducing a parse context data structure.
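
The refactoring pattern can be sketched in isolation. In the sketch below, `parse_context`, the `FL_*` bits, and `parse_var` are simplified stand-ins for the kernel's `traceprobe_parse_context`, `TPARG_FL_*` flags, and `parse_probe_vars`, not the actual API: the point is that bundling the loose `(flags, offset)` arguments into one struct lets later patches add fields (such as the attached trace event) without touching every call site.

```c
#include <assert.h>
#include <string.h>

/* Illustrative flag bits, standing in for the kernel's TPARG_FL_*
 * values (simplified names and values, not the real definitions). */
#define FL_KERNEL (1u << 0)
#define FL_RETURN (1u << 1)

/* Everything the parse helpers previously took as separate arguments
 * is carried in one context structure. */
struct parse_context {
	unsigned int flags;	/* FL_*-style capability bits */
	int offset;		/* error-reporting offset into the argument */
};

/* Toy parser: "retval" is only valid when FL_RETURN is set; any
 * other name just advances the offset, as a nested parser would. */
static int parse_var(const char *arg, struct parse_context *ctx)
{
	if (strcmp(arg, "retval") == 0)
		return (ctx->flags & FL_RETURN) ? 0 : -1;
	ctx->offset += (int)strlen(arg);
	return 0;
}
```

In the real patch each probe type (kprobe, fprobe, uprobe, eprobe) builds such a context once and passes it down through `traceprobe_parse_probe_arg()`.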

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
kernel/trace/trace_eprobe.c | 44 +-----------
kernel/trace/trace_fprobe.c | 4 +
kernel/trace/trace_kprobe.c | 4 +
kernel/trace/trace_probe.c | 156 ++++++++++++++++++++++++++-----------------
kernel/trace/trace_probe.h | 9 ++
kernel/trace/trace_uprobe.c | 8 +-
6 files changed, 117 insertions(+), 108 deletions(-)

diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index fd64cd5d5745..cb0077ba2b49 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -227,37 +227,6 @@ static struct trace_eprobe *alloc_event_probe(const char *group,
return ERR_PTR(ret);
}

-static int trace_eprobe_tp_arg_update(struct trace_eprobe *ep, int i)
-{
- struct probe_arg *parg = &ep->tp.args[i];
- struct ftrace_event_field *field;
- struct list_head *head;
- int ret = -ENOENT;
-
- head = trace_get_fields(ep->event);
- list_for_each_entry(field, head, link) {
- if (!strcmp(parg->code->data, field->name)) {
- kfree(parg->code->data);
- parg->code->data = field;
- return 0;
- }
- }
-
- /*
- * Argument not found on event. But allow for comm and COMM
- * to be used to get the current->comm.
- */
- if (strcmp(parg->code->data, "COMM") == 0 ||
- strcmp(parg->code->data, "comm") == 0) {
- parg->code->op = FETCH_OP_COMM;
- ret = 0;
- }
-
- kfree(parg->code->data);
- parg->code->data = NULL;
- return ret;
-}
-
static int eprobe_event_define_fields(struct trace_event_call *event_call)
{
struct eprobe_trace_entry_head field;
@@ -817,19 +786,16 @@ find_and_get_event(const char *system, const char *event_name)

static int trace_eprobe_tp_update_arg(struct trace_eprobe *ep, const char *argv[], int i)
{
- unsigned int flags = TPARG_FL_KERNEL | TPARG_FL_TEVENT;
+ struct traceprobe_parse_context ctx = {
+ .event = ep->event,
+ .flags = TPARG_FL_KERNEL | TPARG_FL_TEVENT,
+ };
int ret;

- ret = traceprobe_parse_probe_arg(&ep->tp, i, argv[i], flags);
+ ret = traceprobe_parse_probe_arg(&ep->tp, i, argv[i], &ctx);
if (ret)
return ret;

- if (ep->tp.args[i].code->op == FETCH_OP_TP_ARG) {
- ret = trace_eprobe_tp_arg_update(ep, i);
- if (ret)
- trace_probe_log_err(0, BAD_ATTACH_ARG);
- }
-
/* Handle symbols "@" */
if (!ret)
ret = traceprobe_update_arg(&ep->tp.args[i]);
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 8cf581568a14..cd91bf57baac 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -1064,8 +1064,10 @@ static int __trace_fprobe_create(int argc, const char *argv[])

/* parse arguments */
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+ struct traceprobe_parse_context ctx = { .flags = flags };
+
trace_probe_log_set_index(i + 2);
- ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], flags);
+ ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], &ctx);
if (ret)
goto error; /* This can be -ENOMEM */
}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 38e34ed89d55..fd62de2a2f51 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -867,8 +867,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])

/* parse arguments */
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+ struct traceprobe_parse_context ctx = { .flags = flags };
+
trace_probe_log_set_index(i + 2);
- ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], flags);
+ ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], &ctx);
if (ret)
goto error; /* This can be -ENOMEM */
}
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index dffbed2ddae9..84a9f0446390 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -283,27 +283,53 @@ int traceprobe_parse_event_name(const char **pevent, const char **pgroup,
return 0;
}

+static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
+ struct traceprobe_parse_context *ctx)
+{
+ struct ftrace_event_field *field;
+ struct list_head *head;
+
+ head = trace_get_fields(ctx->event);
+ list_for_each_entry(field, head, link) {
+ if (!strcmp(arg, field->name)) {
+ code->op = FETCH_OP_TP_ARG;
+ code->data = field;
+ return 0;
+ }
+ }
+ return -ENOENT;
+}
+
#define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))

static int parse_probe_vars(char *arg, const struct fetch_type *t,
- struct fetch_insn *code, unsigned int flags, int offs)
+ struct fetch_insn *code,
+ struct traceprobe_parse_context *ctx)
{
unsigned long param;
int ret = 0;
int len;

- if (flags & TPARG_FL_TEVENT) {
+ if (ctx->flags & TPARG_FL_TEVENT) {
if (code->data)
return -EFAULT;
- code->data = kstrdup(arg, GFP_KERNEL);
- if (!code->data)
- return -ENOMEM;
- code->op = FETCH_OP_TP_ARG;
- } else if (strcmp(arg, "retval") == 0) {
- if (flags & TPARG_FL_RETURN) {
+ ret = parse_trace_event_arg(arg, code, ctx);
+ if (!ret)
+ return 0;
+ if (strcmp(arg, "comm") == 0 || strcmp(arg, "COMM") == 0) {
+ code->op = FETCH_OP_COMM;
+ return 0;
+ }
+ /* backward compatibility */
+ ctx->offset = 0;
+ goto inval_var;
+ }
+
+ if (strcmp(arg, "retval") == 0) {
+ if (ctx->flags & TPARG_FL_RETURN) {
code->op = FETCH_OP_RETVAL;
} else {
- trace_probe_log_err(offs, RETVAL_ON_PROBE);
+ trace_probe_log_err(ctx->offset, RETVAL_ON_PROBE);
ret = -EINVAL;
}
} else if ((len = str_has_prefix(arg, "stack"))) {
@@ -313,9 +339,9 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
ret = kstrtoul(arg + len, 10, &param);
if (ret) {
goto inval_var;
- } else if ((flags & TPARG_FL_KERNEL) &&
+ } else if ((ctx->flags & TPARG_FL_KERNEL) &&
param > PARAM_MAX_STACK) {
- trace_probe_log_err(offs, BAD_STACK_NUM);
+ trace_probe_log_err(ctx->offset, BAD_STACK_NUM);
ret = -EINVAL;
} else {
code->op = FETCH_OP_STACK;
@@ -326,13 +352,13 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
} else if (strcmp(arg, "comm") == 0 || strcmp(arg, "COMM") == 0) {
code->op = FETCH_OP_COMM;
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
- } else if (tparg_is_function_entry(flags) &&
+ } else if (tparg_is_function_entry(ctx->flags) &&
(len = str_has_prefix(arg, "arg"))) {
ret = kstrtoul(arg + len, 10, &param);
if (ret) {
goto inval_var;
} else if (!param || param > PARAM_MAX_STACK) {
- trace_probe_log_err(offs, BAD_ARG_NUM);
+ trace_probe_log_err(ctx->offset, BAD_ARG_NUM);
return -EINVAL;
}
code->op = FETCH_OP_ARG;
@@ -341,7 +367,7 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
* The tracepoint probe will probe a stub function, and the
* first parameter of the stub is a dummy and should be ignored.
*/
- if (flags & TPARG_FL_TPOINT)
+ if (ctx->flags & TPARG_FL_TPOINT)
code->param++;
#endif
} else
@@ -350,7 +376,7 @@ static int parse_probe_vars(char *arg, const struct fetch_type *t,
return ret;

inval_var:
- trace_probe_log_err(offs, BAD_VAR);
+ trace_probe_log_err(ctx->offset, BAD_VAR);
return -EINVAL;
}

@@ -383,7 +409,7 @@ static int __parse_imm_string(char *str, char **pbuf, int offs)
static int
parse_probe_arg(char *arg, const struct fetch_type *type,
struct fetch_insn **pcode, struct fetch_insn *end,
- unsigned int flags, int offs)
+ struct traceprobe_parse_context *ctx)
{
struct fetch_insn *code = *pcode;
unsigned long param;
@@ -394,13 +420,13 @@ parse_probe_arg(char *arg, const struct fetch_type *type,

switch (arg[0]) {
case '$':
- ret = parse_probe_vars(arg + 1, type, code, flags, offs);
+ ret = parse_probe_vars(arg + 1, type, code, ctx);
break;

case '%': /* named register */
- if (flags & (TPARG_FL_TEVENT | TPARG_FL_FPROBE)) {
+ if (ctx->flags & (TPARG_FL_TEVENT | TPARG_FL_FPROBE)) {
/* eprobe and fprobe do not handle registers */
- trace_probe_log_err(offs, BAD_VAR);
+ trace_probe_log_err(ctx->offset, BAD_VAR);
break;
}
ret = regs_query_register_offset(arg + 1);
@@ -409,14 +435,14 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
code->param = (unsigned int)ret;
ret = 0;
} else
- trace_probe_log_err(offs, BAD_REG_NAME);
+ trace_probe_log_err(ctx->offset, BAD_REG_NAME);
break;

case '@': /* memory, file-offset or symbol */
if (isdigit(arg[1])) {
ret = kstrtoul(arg + 1, 0, &param);
if (ret) {
- trace_probe_log_err(offs, BAD_MEM_ADDR);
+ trace_probe_log_err(ctx->offset, BAD_MEM_ADDR);
break;
}
/* load address */
@@ -424,13 +450,13 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
code->immediate = param;
} else if (arg[1] == '+') {
/* kprobes don't support file offsets */
- if (flags & TPARG_FL_KERNEL) {
- trace_probe_log_err(offs, FILE_ON_KPROBE);
+ if (ctx->flags & TPARG_FL_KERNEL) {
+ trace_probe_log_err(ctx->offset, FILE_ON_KPROBE);
return -EINVAL;
}
ret = kstrtol(arg + 2, 0, &offset);
if (ret) {
- trace_probe_log_err(offs, BAD_FILE_OFFS);
+ trace_probe_log_err(ctx->offset, BAD_FILE_OFFS);
break;
}

@@ -438,8 +464,8 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
code->immediate = (unsigned long)offset; // imm64?
} else {
/* uprobes don't support symbols */
- if (!(flags & TPARG_FL_KERNEL)) {
- trace_probe_log_err(offs, SYM_ON_UPROBE);
+ if (!(ctx->flags & TPARG_FL_KERNEL)) {
+ trace_probe_log_err(ctx->offset, SYM_ON_UPROBE);
return -EINVAL;
}
/* Preserve symbol for updating */
@@ -448,7 +474,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
if (!code->data)
return -ENOMEM;
if (++code == end) {
- trace_probe_log_err(offs, TOO_MANY_OPS);
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
return -EINVAL;
}
code->op = FETCH_OP_IMM;
@@ -456,7 +482,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
}
/* These are fetching from memory */
if (++code == end) {
- trace_probe_log_err(offs, TOO_MANY_OPS);
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
return -EINVAL;
}
*pcode = code;
@@ -475,36 +501,38 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
arg++; /* Skip '+', because kstrtol() rejects it. */
tmp = strchr(arg, '(');
if (!tmp) {
- trace_probe_log_err(offs, DEREF_NEED_BRACE);
+ trace_probe_log_err(ctx->offset, DEREF_NEED_BRACE);
return -EINVAL;
}
*tmp = '\0';
ret = kstrtol(arg, 0, &offset);
if (ret) {
- trace_probe_log_err(offs, BAD_DEREF_OFFS);
+ trace_probe_log_err(ctx->offset, BAD_DEREF_OFFS);
break;
}
- offs += (tmp + 1 - arg) + (arg[0] != '-' ? 1 : 0);
+ ctx->offset += (tmp + 1 - arg) + (arg[0] != '-' ? 1 : 0);
arg = tmp + 1;
tmp = strrchr(arg, ')');
if (!tmp) {
- trace_probe_log_err(offs + strlen(arg),
+ trace_probe_log_err(ctx->offset + strlen(arg),
DEREF_OPEN_BRACE);
return -EINVAL;
} else {
- const struct fetch_type *t2 = find_fetch_type(NULL, flags);
+ const struct fetch_type *t2 = find_fetch_type(NULL, ctx->flags);
+ int cur_offs = ctx->offset;

*tmp = '\0';
- ret = parse_probe_arg(arg, t2, &code, end, flags, offs);
+ ret = parse_probe_arg(arg, t2, &code, end, ctx);
if (ret)
break;
+ ctx->offset = cur_offs;
if (code->op == FETCH_OP_COMM ||
code->op == FETCH_OP_DATA) {
- trace_probe_log_err(offs, COMM_CANT_DEREF);
+ trace_probe_log_err(ctx->offset, COMM_CANT_DEREF);
return -EINVAL;
}
if (++code == end) {
- trace_probe_log_err(offs, TOO_MANY_OPS);
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
return -EINVAL;
}
*pcode = code;
@@ -515,7 +543,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
break;
case '\\': /* Immediate value */
if (arg[1] == '"') { /* Immediate string */
- ret = __parse_imm_string(arg + 2, &tmp, offs + 2);
+ ret = __parse_imm_string(arg + 2, &tmp, ctx->offset + 2);
if (ret)
break;
code->op = FETCH_OP_DATA;
@@ -523,7 +551,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
} else {
ret = str_to_immediate(arg + 1, &code->immediate);
if (ret)
- trace_probe_log_err(offs + 1, BAD_IMM);
+ trace_probe_log_err(ctx->offset + 1, BAD_IMM);
else
code->op = FETCH_OP_IMM;
}
@@ -531,7 +559,7 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
}
if (!ret && code->op == FETCH_OP_NOP) {
/* Parsed, but do not find fetch method */
- trace_probe_log_err(offs, BAD_FETCH_ARG);
+ trace_probe_log_err(ctx->offset, BAD_FETCH_ARG);
ret = -EINVAL;
}
return ret;
@@ -576,12 +604,13 @@ static int __parse_bitfield_probe_arg(const char *bf,

/* String length checking wrapper */
static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
- struct probe_arg *parg, unsigned int flags, int offset)
+ struct probe_arg *parg,
+ struct traceprobe_parse_context *ctx)
{
struct fetch_insn *code, *scode, *tmp = NULL;
char *t, *t2, *t3;
- char *arg;
int ret, len;
+ char *arg;

arg = kstrdup(argv, GFP_KERNEL);
if (!arg)
@@ -590,10 +619,10 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
ret = -EINVAL;
len = strlen(arg);
if (len > MAX_ARGSTR_LEN) {
- trace_probe_log_err(offset, ARG_TOO_LONG);
+ trace_probe_log_err(ctx->offset, ARG_TOO_LONG);
goto out;
} else if (len == 0) {
- trace_probe_log_err(offset, NO_ARG_BODY);
+ trace_probe_log_err(ctx->offset, NO_ARG_BODY);
goto out;
}

@@ -611,23 +640,24 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
*t2++ = '\0';
t3 = strchr(t2, ']');
if (!t3) {
- offset += t2 + strlen(t2) - arg;
- trace_probe_log_err(offset,
+ int offs = t2 + strlen(t2) - arg;
+
+ trace_probe_log_err(ctx->offset + offs,
ARRAY_NO_CLOSE);
goto out;
} else if (t3[1] != '\0') {
- trace_probe_log_err(offset + t3 + 1 - arg,
+ trace_probe_log_err(ctx->offset + t3 + 1 - arg,
BAD_ARRAY_SUFFIX);
goto out;
}
*t3 = '\0';
if (kstrtouint(t2, 0, &parg->count) || !parg->count) {
- trace_probe_log_err(offset + t2 - arg,
+ trace_probe_log_err(ctx->offset + t2 - arg,
BAD_ARRAY_NUM);
goto out;
}
if (parg->count > MAX_ARRAY_LEN) {
- trace_probe_log_err(offset + t2 - arg,
+ trace_probe_log_err(ctx->offset + t2 - arg,
ARRAY_TOO_BIG);
goto out;
}
@@ -638,17 +668,17 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
* Since $comm and immediate string can not be dereferenced,
* we can find those by strcmp. But ignore for eprobes.
*/
- if (!(flags & TPARG_FL_TEVENT) &&
+ if (!(ctx->flags & TPARG_FL_TEVENT) &&
(strcmp(arg, "$comm") == 0 || strcmp(arg, "$COMM") == 0 ||
strncmp(arg, "\\\"", 2) == 0)) {
/* The type of $comm must be "string", and not an array. */
if (parg->count || (t && strcmp(t, "string")))
goto out;
- parg->type = find_fetch_type("string", flags);
+ parg->type = find_fetch_type("string", ctx->flags);
} else
- parg->type = find_fetch_type(t, flags);
+ parg->type = find_fetch_type(t, ctx->flags);
if (!parg->type) {
- trace_probe_log_err(offset + (t ? (t - arg) : 0), BAD_TYPE);
+ trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0), BAD_TYPE);
goto out;
}
parg->offset = *size;
@@ -670,7 +700,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
code[FETCH_INSN_MAX - 1].op = FETCH_OP_END;

ret = parse_probe_arg(arg, parg->type, &code, &code[FETCH_INSN_MAX - 1],
- flags, offset);
+ ctx);
if (ret)
goto fail;

@@ -681,7 +711,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
if (code->op != FETCH_OP_REG && code->op != FETCH_OP_STACK &&
code->op != FETCH_OP_RETVAL && code->op != FETCH_OP_ARG &&
code->op != FETCH_OP_DEREF && code->op != FETCH_OP_TP_ARG) {
- trace_probe_log_err(offset + (t ? (t - arg) : 0),
+ trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0),
BAD_SYMSTRING);
goto fail;
}
@@ -689,7 +719,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
if (code->op != FETCH_OP_DEREF && code->op != FETCH_OP_UDEREF &&
code->op != FETCH_OP_IMM && code->op != FETCH_OP_COMM &&
code->op != FETCH_OP_DATA && code->op != FETCH_OP_TP_ARG) {
- trace_probe_log_err(offset + (t ? (t - arg) : 0),
+ trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0),
BAD_STRING);
goto fail;
}
@@ -708,7 +738,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
*/
code++;
if (code->op != FETCH_OP_NOP) {
- trace_probe_log_err(offset, TOO_MANY_OPS);
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
goto fail;
}
}
@@ -731,7 +761,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
} else {
code++;
if (code->op != FETCH_OP_NOP) {
- trace_probe_log_err(offset, TOO_MANY_OPS);
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
goto fail;
}
code->op = FETCH_OP_ST_RAW;
@@ -742,7 +772,7 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
if (t != NULL) {
ret = __parse_bitfield_probe_arg(t, parg->type, &code);
if (ret) {
- trace_probe_log_err(offset + t - arg, BAD_BITFIELD);
+ trace_probe_log_err(ctx->offset + t - arg, BAD_BITFIELD);
goto fail;
}
}
@@ -752,13 +782,13 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
if (scode->op != FETCH_OP_ST_MEM &&
scode->op != FETCH_OP_ST_STRING &&
scode->op != FETCH_OP_ST_USTRING) {
- trace_probe_log_err(offset + (t ? (t - arg) : 0),
+ trace_probe_log_err(ctx->offset + (t ? (t - arg) : 0),
BAD_STRING);
goto fail;
}
code++;
if (code->op != FETCH_OP_NOP) {
- trace_probe_log_err(offset, TOO_MANY_OPS);
+ trace_probe_log_err(ctx->offset, TOO_MANY_OPS);
goto fail;
}
code->op = FETCH_OP_LP_ARRAY;
@@ -807,7 +837,7 @@ static int traceprobe_conflict_field_name(const char *name,
}

int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
- unsigned int flags)
+ struct traceprobe_parse_context *ctx)
{
struct probe_arg *parg = &tp->args[i];
const char *body;
@@ -842,9 +872,9 @@ int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
trace_probe_log_err(0, USED_ARG_NAME);
return -EINVAL;
}
+ ctx->offset = body - arg;
/* Parse fetch argument */
- return traceprobe_parse_probe_arg_body(body, &tp->size, parg, flags,
- body - arg);
+ return traceprobe_parse_probe_arg_body(body, &tp->size, parg, ctx);
}

void traceprobe_free_probe_arg(struct probe_arg *arg)
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 6b23206068f7..2dc1e5c4c9e8 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -374,8 +374,15 @@ static inline bool tparg_is_function_entry(unsigned int flags)
return (flags & mask) == (TPARG_FL_KERNEL | TPARG_FL_FENTRY);
}

+struct traceprobe_parse_context {
+ struct trace_event_call *event;
+ unsigned int flags;
+ int offset;
+};
+
extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
- const char *argv, unsigned int flags);
+ const char *argv,
+ struct traceprobe_parse_context *ctx);

extern int traceprobe_update_arg(struct probe_arg *arg);
extern void traceprobe_free_probe_arg(struct probe_arg *arg);
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 8b92e34ff0c8..fa09b33ee731 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -686,10 +686,12 @@ static int __trace_uprobe_create(int argc, const char **argv)

/* parse arguments */
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+ struct traceprobe_parse_context ctx = {
+ .flags = (is_return ? TPARG_FL_RETURN : 0) | TPARG_FL_USER,
+ };
+
trace_probe_log_set_index(i + 2);
- ret = traceprobe_parse_probe_arg(&tu->tp, i, argv[i],
- (is_return ? TPARG_FL_RETURN : 0) |
- TPARG_FL_USER);
+ ret = traceprobe_parse_probe_arg(&tu->tp, i, argv[i], &ctx);
if (ret)
goto error;
}

2023-04-20 11:28:19

by Masami Hiramatsu

Subject: [PATCH v5 6/9] tracing/probes: Support function parameters if BTF is available

From: Masami Hiramatsu (Google) <[email protected]>

Support specifying function or tracepoint parameters by name if BTF is
available and the event is for function entry (this means it is available
for kprobe events, fprobe events, and tracepoint probe events).

BTF variable syntax is a bit special because it doesn't need any prefix.
Also, if only the BTF variable name is given, the argument name also
becomes the BTF variable name, e.g.

# echo 'p vfs_read count pos' >> dynamic_events
# echo 'f vfs_write count pos' >> dynamic_events
# echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
# cat dynamic_events
p:kprobes/p_vfs_read_0 vfs_read count=count pos=pos
f:fprobes/vfs_write__entry vfs_write count=count pos=pos
t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
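
The name-defaulting shown above (`count` becoming `count=count`) can be sketched as follows; `canonicalize_arg` is a hypothetical helper for illustration, not a function in this patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch (not the kernel implementation): when a probe
 * argument is given only as a BTF variable name, the event field name
 * defaults to that variable name, so "count" is recorded as if the
 * user had written "count=count". */
static void canonicalize_arg(const char *arg, char *out, size_t len)
{
	if (strchr(arg, '='))
		snprintf(out, len, "%s", arg);		/* explicit NAME=VAR */
	else
		snprintf(out, len, "%s=%s", arg, arg);	/* NAME defaults to VAR */
}
```

This matches the `cat dynamic_events` output above, where each bare name is echoed back in `name=name` form.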

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
kernel/trace/Kconfig | 11 ++
kernel/trace/trace.c | 4 +
kernel/trace/trace_fprobe.c | 49 ++++++-----
kernel/trace/trace_kprobe.c | 12 +--
kernel/trace/trace_probe.c | 192 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 9 ++
6 files changed, 248 insertions(+), 29 deletions(-)

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 8e10a9453c96..e2b415b9fcd4 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -664,6 +664,17 @@ config FPROBE_EVENTS
and the kprobe events on function entry and exit will be
transparently converted to this fprobe events.

+config PROBE_EVENTS_BTF_ARGS
+ depends on HAVE_FUNCTION_ARG_ACCESS_API
+ depends on FPROBE_EVENTS || KPROBE_EVENTS
+ depends on DEBUG_INFO_BTF && BPF_SYSCALL
+ bool "Support BTF function arguments for probe events"
+ default y
+ help
+ The user can specify the arguments of the probe event using the names
+ of the arguments of the probed function. This feature only works if
+ the probe location is a kernel function entry or a tracepoint.
+
config KPROBE_EVENTS
depends on KPROBES
depends on HAVE_REGS_AND_STACK_ACCESS_API
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 9da9c979faa3..0d9c48197a5c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5670,7 +5670,11 @@ static const char readme_msg[] =
"\t args: <name>=fetcharg[:type]\n"
"\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
+#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
+ "\t $stack<index>, $stack, $retval, $comm, $arg<N>, <argname>\n"
+#else
"\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
+#endif
#else
"\t $stack<index>, $stack, $retval, $comm,\n"
#endif
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index cd91bf57baac..d88079c2d2e3 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -387,6 +387,7 @@ static void free_trace_fprobe(struct trace_fprobe *tf)
static struct trace_fprobe *alloc_trace_fprobe(const char *group,
const char *event,
const char *symbol,
+ struct tracepoint *tpoint,
int maxactive,
int nargs, bool is_return)
{
@@ -406,6 +407,7 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group,
else
tf->fp.entry_handler = fentry_dispatcher;

+ tf->tpoint = tpoint;
tf->fp.nr_maxactive = maxactive;

ret = trace_probe_init(&tf->tp, event, group, false);
@@ -949,8 +951,12 @@ static int __trace_fprobe_create(int argc, const char *argv[])
int maxactive = 0;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
- unsigned int flags = TPARG_FL_KERNEL;
+ char sbuf[KSYM_NAME_LEN];
bool is_tracepoint = false;
+ struct tracepoint *tpoint = NULL;
+ struct traceprobe_parse_context ctx = {
+ .flags = TPARG_FL_KERNEL | TPARG_FL_FENTRY,
+ };

if ((argv[0][0] != 'f' && argv[0][0] != 't') || argc < 2)
return -ECANCELED;
@@ -1014,12 +1020,6 @@ static int __trace_fprobe_create(int argc, const char *argv[])
goto parse_error;
}

- flags |= TPARG_FL_FENTRY;
- if (is_return)
- flags |= TPARG_FL_RETURN;
- if (is_tracepoint)
- flags |= TPARG_FL_TPOINT;
-
trace_probe_log_set_index(0);
if (event) {
ret = traceprobe_parse_event_name(&event, &group, gbuf,
@@ -1031,7 +1031,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
if (!event) {
/* Make a new event name */
if (is_tracepoint)
- strscpy(buf, symbol, MAX_EVENT_NAME_LEN);
+ snprintf(buf, MAX_EVENT_NAME_LEN, "%s%s",
+ isdigit(*symbol) ? "_" : "", symbol);
else
snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
is_return ? "exit" : "entry");
@@ -1039,8 +1040,25 @@ static int __trace_fprobe_create(int argc, const char *argv[])
event = buf;
}

+ if (is_return)
+ ctx.flags |= TPARG_FL_RETURN;
+
+ if (is_tracepoint) {
+ ctx.flags |= TPARG_FL_TPOINT;
+ tpoint = find_tracepoint(symbol);
+ if (!tpoint) {
+ trace_probe_log_set_index(1);
+ trace_probe_log_err(0, NO_TRACEPOINT);
+ goto parse_error;
+ }
+ ctx.funcname = kallsyms_lookup(
+ (unsigned long)tpoint->probestub,
+ NULL, NULL, NULL, sbuf);
+ } else
+ ctx.funcname = symbol;
+
/* setup a probe */
- tf = alloc_trace_fprobe(group, event, symbol, maxactive,
+ tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
argc - 2, is_return);
if (IS_ERR(tf)) {
ret = PTR_ERR(tf);
@@ -1049,24 +1067,15 @@ static int __trace_fprobe_create(int argc, const char *argv[])
goto out; /* We know tf is not allocated */
}

- if (is_tracepoint) {
- tf->tpoint = find_tracepoint(tf->symbol);
- if (!tf->tpoint) {
- trace_probe_log_set_index(1);
- trace_probe_log_err(0, NO_TRACEPOINT);
- goto parse_error;
- }
+ if (is_tracepoint)
tf->mod = __module_text_address(
(unsigned long)tf->tpoint->probestub);
- }

argc -= 2; argv += 2;
-
/* parse arguments */
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
- struct traceprobe_parse_context ctx = { .flags = flags };
-
trace_probe_log_set_index(i + 2);
+ ctx.offset = 0;
ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], &ctx);
if (ret)
goto error; /* This can be -ENOMEM */
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index fd62de2a2f51..aff6c1a5e161 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -742,7 +742,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
- unsigned int flags = TPARG_FL_KERNEL;
+ struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };

switch (argv[0][0]) {
case 'r':
@@ -823,10 +823,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
goto parse_error;
}
if (is_return)
- flags |= TPARG_FL_RETURN;
+ ctx.flags |= TPARG_FL_RETURN;
ret = kprobe_on_func_entry(NULL, symbol, offset);
if (ret == 0)
- flags |= TPARG_FL_FENTRY;
+ ctx.flags |= TPARG_FL_FENTRY;
/* Defer the ENOENT case until register kprobe */
if (ret == -EINVAL && is_return) {
trace_probe_log_err(0, BAD_RETPROBE);
@@ -856,7 +856,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])

/* setup a probe */
tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
- argc - 2, is_return);
+ argc - 2, is_return);
if (IS_ERR(tk)) {
ret = PTR_ERR(tk);
/* This must return -ENOMEM, else there is a bug */
@@ -866,10 +866,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
argc -= 2; argv += 2;

/* parse arguments */
+ ctx.funcname = symbol;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
- struct traceprobe_parse_context ctx = { .flags = flags };
-
trace_probe_log_set_index(i + 2);
+ ctx.offset = 0;
ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], &ctx);
if (ret)
goto error; /* This can be -ENOMEM */
diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index 84a9f0446390..f55d633b3e2a 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -300,6 +300,174 @@ static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
return -ENOENT;
}

+#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
+
+static DEFINE_MUTEX(tp_btf_mutex);
+static struct btf *traceprobe_btf;
+
+static struct btf *traceprobe_get_btf(void)
+{
+ if (!traceprobe_btf && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
+ mutex_lock(&tp_btf_mutex);
+ if (!traceprobe_btf)
+ traceprobe_btf = btf_parse_vmlinux();
+ mutex_unlock(&tp_btf_mutex);
+ }
+
+ return traceprobe_btf;
+}
+
+static u32 btf_type_int(const struct btf_type *t)
+{
+ return *(u32 *)(t + 1);
+}
+
+static const char *type_from_btf_id(struct btf *btf, s32 id)
+{
+ const struct btf_type *t;
+ u32 intdata;
+ s32 tid;
+
+ /* TODO: const char * could be converted as a string */
+ t = btf_type_skip_modifiers(btf, id, &tid);
+
+ switch (BTF_INFO_KIND(t->info)) {
+ case BTF_KIND_ENUM:
+ /* enum is "int", so convert to "s32" */
+ return "s32";
+ case BTF_KIND_PTR:
+ /* pointer will be converted to "x??" */
+ if (IS_ENABLED(CONFIG_64BIT))
+ return "x64";
+ else
+ return "x32";
+ case BTF_KIND_INT:
+ intdata = btf_type_int(t);
+ if (BTF_INT_ENCODING(intdata) & BTF_INT_SIGNED) {
+ switch (BTF_INT_BITS(intdata)) {
+ case 8:
+ return "s8";
+ case 16:
+ return "s16";
+ case 32:
+ return "s32";
+ case 64:
+ return "s64";
+ }
+ } else { /* unsigned */
+ switch (BTF_INT_BITS(intdata)) {
+ case 8:
+ return "u8";
+ case 16:
+ return "u16";
+ case 32:
+ return "u32";
+ case 64:
+ return "u64";
+ }
+ }
+ }
+ /* TODO: support other types */
+
+ return NULL;
+}
+
+static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr)
+{
+ struct btf *btf = traceprobe_get_btf();
+ const struct btf_type *t;
+ s32 id;
+
+ if (!btf || !funcname || !nr)
+ return ERR_PTR(-EINVAL);
+
+ id = btf_find_by_name_kind(btf, funcname, BTF_KIND_FUNC);
+ if (id <= 0)
+ return ERR_PTR(-ENOENT);
+
+ /* Get BTF_KIND_FUNC type */
+ t = btf_type_by_id(btf, id);
+ if (!btf_type_is_func(t))
+ return ERR_PTR(-ENOENT);
+
+ /* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
+ t = btf_type_by_id(btf, t->type);
+ if (!btf_type_is_func_proto(t))
+ return ERR_PTR(-ENOENT);
+
+ *nr = btf_type_vlen(t);
+
+ if (*nr)
+ return (const struct btf_param *)(t + 1);
+ else
+ return NULL;
+}
+
+static int parse_btf_arg(const char *varname, struct fetch_insn *code,
+ struct traceprobe_parse_context *ctx)
+{
+ struct btf *btf = traceprobe_get_btf();
+ const struct btf_param *params;
+ int i;
+
+ if (!btf) {
+ trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+ return -EOPNOTSUPP;
+ }
+
+ if (WARN_ON_ONCE(!ctx->funcname))
+ return -EINVAL;
+
+ if (!ctx->params) {
+ params = find_btf_func_param(ctx->funcname, &ctx->nr_params);
+ if (IS_ERR(params)) {
+ trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
+ return PTR_ERR(params);
+ }
+ ctx->params = params;
+ } else
+ params = ctx->params;
+
+ for (i = 0; i < ctx->nr_params; i++) {
+ const char *name = btf_name_by_offset(btf, params[i].name_off);
+
+ if (name && !strcmp(name, varname)) {
+ code->op = FETCH_OP_ARG;
+ code->param = i;
+ return 0;
+ }
+ }
+ trace_probe_log_err(ctx->offset, NO_BTFARG);
+ return -ENOENT;
+}
+
+static const struct fetch_type *parse_btf_arg_type(int arg_idx,
+ struct traceprobe_parse_context *ctx)
+{
+ struct btf *btf = traceprobe_get_btf();
+ const char *typestr = NULL;
+
+ if (btf && ctx->params)
+ typestr = type_from_btf_id(btf, ctx->params[arg_idx].type);
+
+ return find_fetch_type(typestr, ctx->flags);
+}
+#else
+static struct btf *traceprobe_get_btf(void)
+{
+ return NULL;
+}
+
+static int parse_btf_arg(const char *varname, struct fetch_insn *code,
+ struct traceprobe_parse_context *ctx)
+{
+ trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+ return -EOPNOTSUPP;
+}
+#define parse_btf_arg_type(idx, ctx) \
+ find_fetch_type(NULL, ctx->flags)
+#endif
+
#define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))

static int parse_probe_vars(char *arg, const struct fetch_type *t,
@@ -556,6 +724,15 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
code->op = FETCH_OP_IMM;
}
break;
+ default:
+ if (isalpha(arg[0]) || arg[0] == '_') { /* BTF variable */
+ if (!tparg_is_function_entry(ctx->flags)) {
+ trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
+ return -EINVAL;
+ }
+ ret = parse_btf_arg(arg, code, ctx);
+ break;
+ }
}
if (!ret && code->op == FETCH_OP_NOP) {
/* Parsed, but do not find fetch method */
@@ -704,6 +881,11 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
if (ret)
goto fail;

+ /* Update storing type if BTF is available */
+ if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
+ !t && code->op == FETCH_OP_ARG)
+ parg->type = parse_btf_arg_type(code->param, ctx);
+
ret = -EINVAL;
/* Store operation */
if (parg->type->is_string) {
@@ -857,8 +1039,14 @@ int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
parg->name = kmemdup_nul(arg, body - arg, GFP_KERNEL);
body++;
} else {
- /* If argument name is omitted, set "argN" */
- parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
+ /*
+ * If argument name is omitted, try arg as a name (BTF variable)
+ * or "argN".
+ */
+ if (is_good_name(arg))
+ parg->name = kstrdup(arg, GFP_KERNEL);
+ else
+ parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
body = arg;
}
if (!parg->name)
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 2dc1e5c4c9e8..9ea5c7e8753f 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -23,6 +23,7 @@
#include <linux/limits.h>
#include <linux/uaccess.h>
#include <linux/bitops.h>
+#include <linux/btf.h>
#include <asm/bitsperlong.h>

#include "trace.h"
@@ -376,6 +377,9 @@ static inline bool tparg_is_function_entry(unsigned int flags)

struct traceprobe_parse_context {
struct trace_event_call *event;
+ const struct btf_param *params;
+ s32 nr_params;
+ const char *funcname;
unsigned int flags;
int offset;
};
@@ -474,7 +478,10 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(NO_EVENT_INFO, "This requires both group and event name to attach"),\
C(BAD_ATTACH_EVENT, "Attached event does not exist"),\
C(BAD_ATTACH_ARG, "Attached event does not have this field"),\
- C(NO_EP_FILTER, "No filter rule after 'if'"),
+ C(NO_EP_FILTER, "No filter rule after 'if'"), \
+ C(NOSUP_BTFARG, "BTF is not available or not supported"), \
+ C(NO_BTFARG, "This variable is not found at this probe point"),\
+ C(NO_BTF_ENTRY, "No BTF entry for this probe point"),

#undef C
#define C(a, b) TP_ERR_##a

2023-04-20 11:28:36

by Masami Hiramatsu

Subject: [PATCH v5 8/9] selftests/ftrace: Add tracepoint probe test case

From: Masami Hiramatsu (Google) <[email protected]>

Add test cases for tracepoint probe events.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
.../ftrace/test.d/dynevent/add_remove_tprobe.tc | 27 +++++++
.../ftrace/test.d/dynevent/tprobe_syntax_errors.tc | 82 ++++++++++++++++++++
2 files changed, 109 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/tprobe_syntax_errors.tc

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe.tc
new file mode 100644
index 000000000000..afc8412fde6b
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_tprobe.tc
@@ -0,0 +1,27 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove tracepoint probe events
+# requires: dynamic_events "t[:[<group>/][<event>]] <tracepoint> [<args>]": README
+
+echo 0 > events/enable
+echo > dynamic_events
+
+TRACEPOINT1=kmem_cache_alloc
+TRACEPOINT2=kmem_cache_free
+
+echo "t:myevent1 $TRACEPOINT1" >> dynamic_events
+echo "t:myevent2 $TRACEPOINT2" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+grep -q myevent2 dynamic_events
+test -d events/tracepoints/myevent1
+test -d events/tracepoints/myevent2
+
+echo "-:myevent2" >> dynamic_events
+
+grep -q myevent1 dynamic_events
+! grep -q myevent2 dynamic_events
+
+echo > dynamic_events
+
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/tprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/tprobe_syntax_errors.tc
new file mode 100644
index 000000000000..c8dac5c1cfa8
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/tprobe_syntax_errors.tc
@@ -0,0 +1,82 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Tracepoint probe event parser error log check
+# requires: dynamic_events "t[:[<group>/][<event>]] <tracepoint> [<args>]": README
+
+check_error() { # command-with-error-pos-by-^
+ ftrace_errlog_check 'trace_fprobe' "$1" 'dynamic_events'
+}
+
+check_error 't^100 kfree' # BAD_MAXACT_TYPE
+
+check_error 't ^non_exist_tracepoint' # NO_TRACEPOINT
+check_error 't:^/bar kfree' # NO_GROUP_NAME
+check_error 't:^12345678901234567890123456789012345678901234567890123456789012345/bar kfree' # GROUP_TOO_LONG
+
+check_error 't:^foo.1/bar kfree' # BAD_GROUP_NAME
+check_error 't:^ kfree' # NO_EVENT_NAME
+check_error 't:foo/^12345678901234567890123456789012345678901234567890123456789012345 kfree' # EVENT_TOO_LONG
+check_error 't:foo/^bar.1 kfree' # BAD_EVENT_NAME
+
+check_error 't kfree ^$retval' # RETVAL_ON_PROBE
+check_error 't kfree ^$stack10000' # BAD_STACK_NUM
+
+check_error 't kfree ^$arg10000' # BAD_ARG_NUM
+
+check_error 't kfree ^$none_var' # BAD_VAR
+check_error 't kfree ^%rax' # BAD_VAR
+
+check_error 't kfree ^@12345678abcde' # BAD_MEM_ADDR
+check_error 't kfree ^@+10' # FILE_ON_KPROBE
+
+grep -q "imm-value" README && \
+check_error 't kfree arg1=\^x' # BAD_IMM
+grep -q "imm-string" README && \
+check_error 't kfree arg1=\"abcd^' # IMMSTR_NO_CLOSE
+
+check_error 't kfree ^+0@0)' # DEREF_NEED_BRACE
+check_error 't kfree ^+0ab1(@0)' # BAD_DEREF_OFFS
+check_error 't kfree +0(+0(@0^)' # DEREF_OPEN_BRACE
+
+if grep -A1 "fetcharg:" README | grep -q '\$comm' ; then
+check_error 't kfree +0(^$comm)' # COMM_CANT_DEREF
+fi
+
+check_error 't kfree ^&1' # BAD_FETCH_ARG
+
+
+# We've introduced this limitation with array support
+if grep -q ' <type>\\\[<array-size>\\\]' README; then
+check_error 't kfree +0(^+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(+0(@0))))))))))))))' # TOO_MANY_OPS?
+check_error 't kfree +0(@11):u8[10^' # ARRAY_NO_CLOSE
+check_error 't kfree +0(@11):u8[10]^a' # BAD_ARRAY_SUFFIX
+check_error 't kfree +0(@11):u8[^10a]' # BAD_ARRAY_NUM
+check_error 't kfree +0(@11):u8[^256]' # ARRAY_TOO_BIG
+fi
+
+check_error 't kfree @11:^unknown_type' # BAD_TYPE
+check_error 't kfree $stack0:^string' # BAD_STRING
+check_error 't kfree @11:^b10@a/16' # BAD_BITFIELD
+
+check_error 't kfree ^arg123456789012345678901234567890=@11' # ARG_NAME_TOO_LONG
+check_error 't kfree ^=@11' # NO_ARG_NAME
+check_error 't kfree ^var.1=@11' # BAD_ARG_NAME
+check_error 't kfree var1=@11 ^var1=@12' # USED_ARG_NAME
+check_error 't kfree ^+1234567(+1234567(+1234567(+1234567(+1234567(+1234567(@1234))))))' # ARG_TOO_LONG
+check_error 't kfree arg1=^' # NO_ARG_BODY
+
+
+# multiprobe errors
+if grep -q "Create/append/" README && grep -q "imm-value" README; then
+echo "t:tracepoint/testevent kfree" > dynamic_events
+check_error '^f:tracepoint/testevent kfree' # DIFF_PROBE_TYPE
+
+# Explicitly use printf "%s" to not interpret \1
+printf "%s" "t:tracepoints/testevent kfree abcd=\\1" > dynamic_events
+check_error "t:tracepoints/testevent kfree ^bcd=\\1" # DIFF_ARG_TYPE
+check_error "t:tracepoints/testevent kfree ^abcd=\\1:u8" # DIFF_ARG_TYPE
+check_error "t:tracepoints/testevent kfree ^abcd=\\\"foo\"" # DIFF_ARG_TYPE
+check_error "^t:tracepoints/testevent kfree abcd=\\1" # SAME_PROBE
+fi
+
+exit 0

2023-04-20 11:28:40

by Masami Hiramatsu

Subject: [PATCH v5 7/9] tracing/probes: Add $$args meta argument for all function args

From: Masami Hiramatsu (Google) <[email protected]>

Add the '$$args' fetch argument for function-entry probe events. This
will be expanded to all arguments using BTF function argument information.

e.g.
# echo 'p vfs_read $$args' >> dynamic_events
# echo 'f vfs_write $$args' >> dynamic_events
# echo 't sched_overutilized_tp $$args' >> dynamic_events
# cat dynamic_events
p:kprobes/p_vfs_read_0 vfs_read file=file buf=buf count=count pos=pos
f:fprobes/vfs_write__entry vfs_write file=file buf=buf count=count pos=pos
t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
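For illustration, the expansion can be sketched in plain shell. This is a hypothetical
stand-alone helper, not the kernel code (the real work is done by
traceprobe_expand_meta_args() on the argv array, and the parameter names come from BTF
in the kernel; here they are passed in explicitly):

```shell
# Sketch of '$$args' expansion: each function parameter name becomes a
# 'name=name' fetch argument in place of the '$$args' token.
expand_meta_args() { # $1: probe spec; remaining args: BTF parameter names
	spec="$1"; shift
	out=""
	for word in $spec; do
		if [ "$word" = '$$args' ]; then
			# Replace the meta argument with one name=name pair
			# per function parameter.
			for p in "$@"; do
				out="$out $p=$p"
			done
		else
			out="$out $word"
		fi
	done
	printf '%s\n' "${out# }"
}

# e.g. 'p vfs_read $$args' with BTF parameters file, buf, count, pos
expand_meta_args 'p vfs_read $$args' file buf count pos
```

Running it on the first example prints `p vfs_read file=file buf=buf count=count pos=pos`,
matching the expanded event definition shown above.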

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
kernel/trace/trace_fprobe.c | 21 ++++++++--
kernel/trace/trace_kprobe.c | 23 +++++++++--
kernel/trace/trace_probe.c | 93 +++++++++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 9 ++++
4 files changed, 138 insertions(+), 8 deletions(-)

diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index d88079c2d2e3..3d0b96862963 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -944,14 +944,16 @@ static int __trace_fprobe_create(int argc, const char *argv[])
* FETCHARG:TYPE : use TYPE instead of unsigned long.
*/
struct trace_fprobe *tf = NULL;
- int i, len, ret = 0;
+ int i, len, new_argc = 0, ret = 0;
bool is_return = false;
char *symbol = NULL, *tmp = NULL;
const char *event = NULL, *group = FPROBE_EVENT_SYSTEM;
+ const char **new_argv = NULL;
int maxactive = 0;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
char sbuf[KSYM_NAME_LEN];
+ char abuf[MAX_BTF_ARGS_LEN];
bool is_tracepoint = false;
struct tracepoint *tpoint = NULL;
struct traceprobe_parse_context ctx = {
@@ -1057,9 +1059,22 @@ static int __trace_fprobe_create(int argc, const char *argv[])
} else
ctx.funcname = symbol;

+ argc -= 2; argv += 2;
+ new_argv = traceprobe_expand_meta_args(argc, argv, &new_argc,
+ abuf, MAX_BTF_ARGS_LEN, &ctx);
+ if (IS_ERR(new_argv)) {
+ ret = PTR_ERR(new_argv);
+ new_argv = NULL;
+ goto out;
+ }
+ if (new_argv) {
+ argc = new_argc;
+ argv = new_argv;
+ }
+
/* setup a probe */
tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
- argc - 2, is_return);
+ argc, is_return);
if (IS_ERR(tf)) {
ret = PTR_ERR(tf);
/* This must return -ENOMEM, else there is a bug */
@@ -1071,7 +1086,6 @@ static int __trace_fprobe_create(int argc, const char *argv[])
tf->mod = __module_text_address(
(unsigned long)tf->tpoint->probestub);

- argc -= 2; argv += 2;
/* parse arguments */
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
trace_probe_log_set_index(i + 2);
@@ -1100,6 +1114,7 @@ static int __trace_fprobe_create(int argc, const char *argv[])

out:
trace_probe_log_clear();
+ kfree(new_argv);
kfree(symbol);
return ret;

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index aff6c1a5e161..2d7c0188c2b1 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -732,9 +732,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
* FETCHARG:TYPE : use TYPE instead of unsigned long.
*/
struct trace_kprobe *tk = NULL;
- int i, len, ret = 0;
+ int i, len, new_argc = 0, ret = 0;
bool is_return = false;
char *symbol = NULL, *tmp = NULL;
+ const char **new_argv = NULL;
const char *event = NULL, *group = KPROBE_EVENT_SYSTEM;
enum probe_print_type ptype;
int maxactive = 0;
@@ -742,6 +743,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
+ char abuf[MAX_BTF_ARGS_LEN];
struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };

switch (argv[0][0]) {
@@ -854,19 +856,31 @@ static int __trace_kprobe_create(int argc, const char *argv[])
event = buf;
}

+ argc -= 2; argv += 2;
+ ctx.funcname = symbol;
+ new_argv = traceprobe_expand_meta_args(argc, argv, &new_argc,
+ abuf, MAX_BTF_ARGS_LEN, &ctx);
+ if (IS_ERR(new_argv)) {
+ ret = PTR_ERR(new_argv);
+ new_argv = NULL;
+ goto out;
+ }
+ if (new_argv) {
+ argc = new_argc;
+ argv = new_argv;
+ }
+
/* setup a probe */
tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
- argc - 2, is_return);
+ argc, is_return);
if (IS_ERR(tk)) {
ret = PTR_ERR(tk);
/* This must return -ENOMEM, else there is a bug */
WARN_ON_ONCE(ret != -ENOMEM);
goto out; /* We know tk is not allocated */
}
- argc -= 2; argv += 2;

/* parse arguments */
- ctx.funcname = symbol;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
trace_probe_log_set_index(i + 2);
ctx.offset = 0;
@@ -894,6 +908,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])

out:
trace_probe_log_clear();
+ kfree(new_argv);
kfree(symbol);
return ret;

diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
index f55d633b3e2a..c75b6a1bd5ea 100644
--- a/kernel/trace/trace_probe.c
+++ b/kernel/trace/trace_probe.c
@@ -452,12 +452,18 @@ static const struct fetch_type *parse_btf_arg_type(int arg_idx,

return find_fetch_type(typestr, ctx->flags);
}
+
#else
static struct btf *traceprobe_get_btf(void)
{
return NULL;
}

+static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
static int parse_btf_arg(const char *varname, struct fetch_insn *code,
struct traceprobe_parse_context *ctx)
{
@@ -1081,6 +1087,93 @@ void traceprobe_free_probe_arg(struct probe_arg *arg)
kfree(arg->fmt);
}

+/* Return new_argv which must be freed after use */
+const char **traceprobe_expand_meta_args(int argc, const char *argv[],
+ int *new_argc, char *buf, int bufsize,
+ struct traceprobe_parse_context *ctx)
+{
+ struct btf *btf = traceprobe_get_btf();
+ const struct btf_param *params = NULL;
+ int i, j, used, ret, args_idx = -1;
+ const char **new_argv = NULL;
+ int nr_skipped;
+
+ /* The first argument of tracepoint should be skipped. */
+ nr_skipped = ctx->flags & TPARG_FL_TPOINT ? 1 : 0;
+ for (i = 0; i < argc; i++)
+ if (!strcmp(argv[i], "$$args")) {
+ trace_probe_log_set_index(i + 2);
+
+ if (!tparg_is_function_entry(ctx->flags)) {
+ trace_probe_log_err(0, NOFENTRY_ARGS);
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (args_idx >= 0) {
+ trace_probe_log_err(0, DOUBLE_ARGS);
+ return ERR_PTR(-EINVAL);
+ }
+
+ args_idx = i;
+ params = find_btf_func_param(ctx->funcname, &ctx->nr_params);
+ if (IS_ERR(params)) {
+ trace_probe_log_err(0, NOSUP_BTFARG);
+ return (const char **)params;
+ }
+ ctx->params = params;
+ }
+
+ /* If target has no arguments, return NULL and the original argc. */
+ if (args_idx < 0 || ctx->nr_params < nr_skipped) {
+ *new_argc = argc;
+ return NULL;
+ }
+
+ *new_argc = argc - 1 + ctx->nr_params - nr_skipped;
+
+ new_argv = kcalloc(*new_argc, sizeof(char *), GFP_KERNEL);
+ if (!new_argv)
+ return ERR_PTR(-ENOMEM);
+
+ for (i = 0; i < args_idx; i++)
+ new_argv[i] = argv[i];
+
+ used = 0;
+ trace_probe_log_set_index(args_idx + 2);
+ for (i = 0; i < ctx->nr_params - nr_skipped; i++) {
+ const char *name;
+
+ name = btf_name_by_offset(btf, params[i + nr_skipped].name_off);
+ if (!name) {
+ trace_probe_log_err(0, NO_BTF_ENTRY);
+ ret = -ENOENT;
+ goto error;
+ }
+ ret = snprintf(buf + used, bufsize - used, "%s", name);
+ if (ret >= bufsize - used) {
+ trace_probe_log_err(0, ARGS_2LONG);
+ ret = -E2BIG;
+ goto error;
+ }
+ new_argv[args_idx + i] = buf + used;
+ used += ret + 1; /* include null byte */
+ }
+
+ /* Note: we have to skip $$args */
+ j = args_idx + ctx->nr_params - nr_skipped;
+ for (i = args_idx + 1; i < argc; i++, j++) {
+ if (WARN_ON(j >= *new_argc))
+ goto error;
+ new_argv[j] = argv[i];
+ }
+
+ return new_argv;
+
+error:
+ kfree(new_argv);
+ return ERR_PTR(ret);
+}
+
int traceprobe_update_arg(struct probe_arg *arg)
{
struct fetch_insn *code = arg->code;
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 9ea5c7e8753f..8c5b029c5d62 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -33,6 +33,7 @@
#define MAX_ARGSTR_LEN 63
#define MAX_ARRAY_LEN 64
#define MAX_ARG_NAME_LEN 32
+#define MAX_BTF_ARGS_LEN 128
#define MAX_STRING_SIZE PATH_MAX

/* Reserved field names */
@@ -387,6 +388,9 @@ struct traceprobe_parse_context {
extern int traceprobe_parse_probe_arg(struct trace_probe *tp, int i,
const char *argv,
struct traceprobe_parse_context *ctx);
+const char **traceprobe_expand_meta_args(int argc, const char *argv[],
+ int *new_argc, char *buf, int bufsize,
+ struct traceprobe_parse_context *ctx);

extern int traceprobe_update_arg(struct probe_arg *arg);
extern void traceprobe_free_probe_arg(struct probe_arg *arg);
@@ -481,7 +485,10 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(NO_EP_FILTER, "No filter rule after 'if'"), \
C(NOSUP_BTFARG, "BTF is not available or not supported"), \
C(NO_BTFARG, "This variable is not found at this probe point"),\
- C(NO_BTF_ENTRY, "No BTF entry for this probe point"),
+ C(NO_BTF_ENTRY, "No BTF entry for this probe point"), \
+ C(NOFENTRY_ARGS, "$$args can be used only on function entry"), \
+ C(DOUBLE_ARGS, "$$args can be used only once in the parameter"), \
+ C(ARGS_2LONG, "$$args failed because the argument is too long"),

#undef C
#define C(a, b) TP_ERR_##a

2023-04-20 11:32:44

by Masami Hiramatsu

Subject: [PATCH v5 9/9] selftests/ftrace: Add BTF arguments test cases

From: Masami Hiramatsu (Google) <[email protected]>

Add test cases to check that BTF arguments are correctly supported.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
.../ftrace/test.d/dynevent/add_remove_btfarg.tc | 54 ++++++++++++++++++++
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 10 ++++
.../ftrace/test.d/kprobe/kprobe_syntax_errors.tc | 10 ++++
3 files changed, 74 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc
new file mode 100644
index 000000000000..15515f0a8c9a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_btfarg.tc
@@ -0,0 +1,54 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - add/remove probes with BTF arguments
+# requires: dynamic_events "<argname>":README
+
+KPROBES=
+FPROBES=
+
+if grep -qF "p[:[<group>/][<event>]] <place> [<args>]" README ; then
+ KPROBES=yes
+fi
+if grep -qF "f[:[<group>/][<event>]] <func-name>[%return] [<args>]" README ; then
+ FPROBES=yes
+fi
+
+if [ -z "$KPROBES" -a -z "$FPROBES" ] ; then
+ exit_unsupported
+fi
+
+echo 0 > events/enable
+echo > dynamic_events
+
+TP=kfree
+
+if [ "$FPROBES" ] ; then
+echo "f:fpevent $TP object" >> dynamic_events
+echo "t:tpevent $TP ptr" >> dynamic_events
+
+grep -q "fpevent.*object=object" dynamic_events
+grep -q "tpevent.*ptr=ptr" dynamic_events
+
+echo > dynamic_events
+
+echo "f:fpevent $TP "'$$args' >> dynamic_events
+echo "t:tpevent $TP "'$$args' >> dynamic_events
+
+grep -q "fpevent.*object=object" dynamic_events
+grep -q "tpevent.*ptr=ptr" dynamic_events
+! grep -q "tpevent.*_data" dynamic_events
+fi
+
+echo > dynamic_events
+
+if [ "$KPROBES" ] ; then
+echo "p:kpevent $TP object" >> dynamic_events
+grep -q "kpevent.*object=object" dynamic_events
+
+echo > dynamic_events
+
+echo "p:kpevent $TP "'$$args' >> dynamic_events
+grep -q "kpevent.*object=object" dynamic_events
+fi
+
+clear_trace
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index 549daa162d84..fb92566916a9 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -85,4 +85,14 @@ fi
# %return suffix errors
check_error 'f vfs_read^%hoge' # BAD_ADDR_SUFFIX

+# BTF arguments errors
+if grep -q "<argname>" README; then
+check_error 'f vfs_read $$args ^$$args' # DOUBLE_ARGS
+check_error 'f vfs_read%return ^$$args' # NOFENTRY_ARGS
+check_error 'f vfs_read ^hoge' # NO_BTFARG
+else
+check_error 'f vfs_read $$args' # NOSUP_BTFARG
+check_error 't kfree $$args' # NOSUP_BTFARG
+fi
+
exit 0
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
index 97c08867490a..f2d1edbb650a 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_syntax_errors.tc
@@ -103,4 +103,14 @@ check_error 'p vfs_read^%hoge' # BAD_ADDR_SUFFIX
check_error 'p ^vfs_read+10%return' # BAD_RETPROBE
fi

+# BTF arguments errors
+if grep -q "<argname>" README; then
+check_error 'p vfs_read $$args ^$$args' # DOUBLE_ARGS
+check_error 'r vfs_read ^$$args' # NOFENTRY_ARGS
+check_error 'p vfs_read+8 ^$$args' # NOFENTRY_ARGS
+check_error 'p vfs_read ^hoge' # NO_BTFARG
+else
+check_error 'p vfs_read $$args' # NOSUP_BTFARG
+fi
+
exit 0

2023-04-20 18:51:29

by Alexei Starovoitov

Subject: Re: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

On Thu, Apr 20, 2023 at 08:25:50PM +0900, Masami Hiramatsu (Google) wrote:
> +static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> + struct pt_regs *regs)
> +{
> + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> + struct fentry_trace_entry_head *entry;
> + struct hlist_head *head;
> + int size, __size, dsize;
> + int rctx;
> +
> + if (bpf_prog_array_valid(call)) {
> + unsigned long orig_ip = instruction_pointer(regs);
> + int ret;
> +
> + ret = trace_call_bpf(call, regs);

Please do not call bpf from fprobe.
There is no use case for it.

> +
> + /*
> + * We need to check and see if we modified the pc of the
> + * pt_regs, and if so return 1 so that we don't do the
> + * single stepping.
> + */
> + if (orig_ip != instruction_pointer(regs))
> + return 1;
> + if (!ret)
> + return 0;
> + }
> +
> + head = this_cpu_ptr(call->perf_events);
> + if (hlist_empty(head))
> + return 0;
> +
> + dsize = __get_data_size(&tf->tp, regs);
> + __size = sizeof(*entry) + tf->tp.size + dsize;
> + size = ALIGN(__size + sizeof(u32), sizeof(u64));
> + size -= sizeof(u32);
> +
> + entry = perf_trace_buf_alloc(size, NULL, &rctx);
> + if (!entry)
> + return 0;
> +
> + entry->ip = entry_ip;
> + memset(&entry[1], 0, dsize);
> + store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
> + perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> + head, NULL);
> + return 0;
> +}
> +NOKPROBE_SYMBOL(fentry_perf_func);
> +
> +static void
> +fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> + unsigned long ret_ip, struct pt_regs *regs)
> +{
> + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> + struct fexit_trace_entry_head *entry;
> + struct hlist_head *head;
> + int size, __size, dsize;
> + int rctx;
> +
> + if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> + return;

Same here.
These two parts look like copy-paste from kprobes.
I suspect this code wasn't tested at all.

2023-04-20 19:14:25

by Alan Maguire

Subject: Re: [PATCH v5 6/9] tracing/probes: Support function parameters if BTF is available

On 20/04/2023 12:26, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <[email protected]>
>
> Support function or tracepoint parameters by name if BTF is available
> and the event is for function entry (This means it is available for
> kprobe-events, fprobe-events and tracepoint probe events.)
>
> BTF variable syntax is a bit special because it doesn't need any prefix.
> Also, if only the BTF variable name is given, the argument name
> also becomes the BTF variable name. e.g.
>
> # echo 'p vfs_read count pos' >> dynamic_events
> # echo 'f vfs_write count pos' >> dynamic_events
> # echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
> # cat dynamic_events
> p:kprobes/p_vfs_read_0 vfs_read count=count pos=pos
> f:fprobes/vfs_write__entry vfs_write count=count pos=pos
> t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
>
> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> ---
> kernel/trace/Kconfig | 11 ++
> kernel/trace/trace.c | 4 +
> kernel/trace/trace_fprobe.c | 49 ++++++-----
> kernel/trace/trace_kprobe.c | 12 +--
> kernel/trace/trace_probe.c | 192 +++++++++++++++++++++++++++++++++++++++++++
> kernel/trace/trace_probe.h | 9 ++
> 6 files changed, 248 insertions(+), 29 deletions(-)
>
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 8e10a9453c96..e2b415b9fcd4 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -664,6 +664,17 @@ config FPROBE_EVENTS
> and the kprobe events on function entry and exit will be
> transparently converted to this fprobe events.
>
> +config PROBE_EVENTS_BTF_ARGS
> + depends on HAVE_FUNCTION_ARG_ACCESS_API
> + depends on FPROBE_EVENTS || KPROBE_EVENTS
> + depends on DEBUG_INFO_BTF && BPF_SYSCALL
> + bool "Support BTF function arguments for probe events"
> + default y
> + help
> + The user can specify the arguments of the probe event using the names
> + of the arguments of the probed function. This feature only works if
> + the probe location is a kernel function entry or a tracepoint.
> +
> config KPROBE_EVENTS
> depends on KPROBES
> depends on HAVE_REGS_AND_STACK_ACCESS_API
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 9da9c979faa3..0d9c48197a5c 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -5670,7 +5670,11 @@ static const char readme_msg[] =
> "\t args: <name>=fetcharg[:type]\n"
> "\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
> #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
> +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> + "\t $stack<index>, $stack, $retval, $comm, $arg<N>, <argname>\n"
> +#else
> "\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
> +#endif
> #else
> "\t $stack<index>, $stack, $retval, $comm,\n"
> #endif
> diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
> index cd91bf57baac..d88079c2d2e3 100644
> --- a/kernel/trace/trace_fprobe.c
> +++ b/kernel/trace/trace_fprobe.c
> @@ -387,6 +387,7 @@ static void free_trace_fprobe(struct trace_fprobe *tf)
> static struct trace_fprobe *alloc_trace_fprobe(const char *group,
> const char *event,
> const char *symbol,
> + struct tracepoint *tpoint,
> int maxactive,
> int nargs, bool is_return)
> {
> @@ -406,6 +407,7 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group,
> else
> tf->fp.entry_handler = fentry_dispatcher;
>
> + tf->tpoint = tpoint;
> tf->fp.nr_maxactive = maxactive;
>
> ret = trace_probe_init(&tf->tp, event, group, false);
> @@ -949,8 +951,12 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> int maxactive = 0;
> char buf[MAX_EVENT_NAME_LEN];
> char gbuf[MAX_EVENT_NAME_LEN];
> - unsigned int flags = TPARG_FL_KERNEL;
> + char sbuf[KSYM_NAME_LEN];
> bool is_tracepoint = false;
> + struct tracepoint *tpoint = NULL;
> + struct traceprobe_parse_context ctx = {
> + .flags = TPARG_FL_KERNEL | TPARG_FL_FENTRY,
> + };
>
> if ((argv[0][0] != 'f' && argv[0][0] != 't') || argc < 2)
> return -ECANCELED;
> @@ -1014,12 +1020,6 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> goto parse_error;
> }
>
> - flags |= TPARG_FL_FENTRY;
> - if (is_return)
> - flags |= TPARG_FL_RETURN;
> - if (is_tracepoint)
> - flags |= TPARG_FL_TPOINT;
> -
> trace_probe_log_set_index(0);
> if (event) {
> ret = traceprobe_parse_event_name(&event, &group, gbuf,
> @@ -1031,7 +1031,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> if (!event) {
> /* Make a new event name */
> if (is_tracepoint)
> - strscpy(buf, symbol, MAX_EVENT_NAME_LEN);
> + snprintf(buf, MAX_EVENT_NAME_LEN, "%s%s",
> + isdigit(*symbol) ? "_" : "", symbol);
> else
> snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
> is_return ? "exit" : "entry");
> @@ -1039,8 +1040,25 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> event = buf;
> }
>
> + if (is_return)
> + ctx.flags |= TPARG_FL_RETURN;
> +
> + if (is_tracepoint) {
> + ctx.flags |= TPARG_FL_TPOINT;
> + tpoint = find_tracepoint(symbol);
> + if (!tpoint) {
> + trace_probe_log_set_index(1);
> + trace_probe_log_err(0, NO_TRACEPOINT);
> + goto parse_error;
> + }
> + ctx.funcname = kallsyms_lookup(
> + (unsigned long)tpoint->probestub,
> + NULL, NULL, NULL, sbuf);
> + } else
> + ctx.funcname = symbol;
> +
> /* setup a probe */
> - tf = alloc_trace_fprobe(group, event, symbol, maxactive,
> + tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
> argc - 2, is_return);
> if (IS_ERR(tf)) {
> ret = PTR_ERR(tf);
> @@ -1049,24 +1067,15 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> goto out; /* We know tf is not allocated */
> }
>
> - if (is_tracepoint) {
> - tf->tpoint = find_tracepoint(tf->symbol);
> - if (!tf->tpoint) {
> - trace_probe_log_set_index(1);
> - trace_probe_log_err(0, NO_TRACEPOINT);
> - goto parse_error;
> - }
> + if (is_tracepoint)
> tf->mod = __module_text_address(
> (unsigned long)tf->tpoint->probestub);
> - }
>
> argc -= 2; argv += 2;
> -
> /* parse arguments */
> for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> - struct traceprobe_parse_context ctx = { .flags = flags };
> -
> trace_probe_log_set_index(i + 2);
> + ctx.offset = 0;
> ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], &ctx);
> if (ret)
> goto error; /* This can be -ENOMEM */
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index fd62de2a2f51..aff6c1a5e161 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -742,7 +742,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> void *addr = NULL;
> char buf[MAX_EVENT_NAME_LEN];
> char gbuf[MAX_EVENT_NAME_LEN];
> - unsigned int flags = TPARG_FL_KERNEL;
> + struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };
>
> switch (argv[0][0]) {
> case 'r':
> @@ -823,10 +823,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> goto parse_error;
> }
> if (is_return)
> - flags |= TPARG_FL_RETURN;
> + ctx.flags |= TPARG_FL_RETURN;
> ret = kprobe_on_func_entry(NULL, symbol, offset);
> if (ret == 0)
> - flags |= TPARG_FL_FENTRY;
> + ctx.flags |= TPARG_FL_FENTRY;
> /* Defer the ENOENT case until register kprobe */
> if (ret == -EINVAL && is_return) {
> trace_probe_log_err(0, BAD_RETPROBE);
> @@ -856,7 +856,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
>
> /* setup a probe */
> tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
> - argc - 2, is_return);
> + argc - 2, is_return);
> if (IS_ERR(tk)) {
> ret = PTR_ERR(tk);
> /* This must return -ENOMEM, else there is a bug */
> @@ -866,10 +866,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> argc -= 2; argv += 2;
>
> /* parse arguments */
> + ctx.funcname = symbol;
> for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> - struct traceprobe_parse_context ctx = { .flags = flags };
> -
> trace_probe_log_set_index(i + 2);
> + ctx.offset = 0;
> ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], &ctx);
> if (ret)
> goto error; /* This can be -ENOMEM */
> diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> index 84a9f0446390..f55d633b3e2a 100644
> --- a/kernel/trace/trace_probe.c
> +++ b/kernel/trace/trace_probe.c
> @@ -300,6 +300,174 @@ static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
> return -ENOENT;
> }
>
> +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> +
> +static DEFINE_MUTEX(tp_btf_mutex);
> +static struct btf *traceprobe_btf;
> +
> +static struct btf *traceprobe_get_btf(void)
> +{
> + if (!traceprobe_btf && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
> + mutex_lock(&tp_btf_mutex);
> + if (!traceprobe_btf)
> + traceprobe_btf = btf_parse_vmlinux();

Apologies if I missed this in previous discussion, but should we
use bpf_get_btf_vmlinux() here instead, since it will
return an already-parsed BTF? There's a bunch of additional
work that btf_parse_vmlinux() does that's not needed from
a tracing POV.

> + mutex_unlock(&tp_btf_mutex);
> + }
> +
> + return traceprobe_btf;
> +}
> +
> +static u32 btf_type_int(const struct btf_type *t)
> +{
> + return *(u32 *)(t + 1);
> +}
> +
> +static const char *type_from_btf_id(struct btf *btf, s32 id)
> +{
> + const struct btf_type *t;
> + u32 intdata;
> + s32 tid;
> +
> + /* TODO: const char * could be converted as a string */
> + t = btf_type_skip_modifiers(btf, id, &tid);
> +
> + switch (BTF_INFO_KIND(t->info)) {
> + case BTF_KIND_ENUM:
> + /* enum is "int", so convert to "s32" */
> + return "s32";
> + case BTF_KIND_PTR:
> + /* pointer will be converted to "x??" */
> + if (IS_ENABLED(CONFIG_64BIT))
> + return "x64";
> + else
> + return "x32";
> + case BTF_KIND_INT:
> + intdata = btf_type_int(t);
> + if (BTF_INT_ENCODING(intdata) & BTF_INT_SIGNED) {
> + switch (BTF_INT_BITS(intdata)) {
> + case 8:
> + return "s8";
> + case 16:
> + return "s16";
> + case 32:
> + return "s32";
> + case 64:
> + return "s64";
> + }
> + } else { /* unsigned */
> + switch (BTF_INT_BITS(intdata)) {
> + case 8:
> + return "u8";
> + case 16:
> + return "u16";
> + case 32:
> + return "u32";
> + case 64:
> + return "u64";
> + }
> + }
> + }
> + /* TODO: support other types */
> +
> + return NULL;
> +}
> +

I wonder if we could haul out some common code for printing the type
from btf_show_name()? Part of what it does is stringify a type name;
if we had that, it might help solve the TODO above, I think.

Alan

> +static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr)
> +{
> + struct btf *btf = traceprobe_get_btf();
> + const struct btf_type *t;
> + s32 id;
> +
> + if (!btf || !funcname || !nr)
> + return ERR_PTR(-EINVAL);
> +
> + id = btf_find_by_name_kind(btf, funcname, BTF_KIND_FUNC);
> + if (id <= 0)
> + return ERR_PTR(-ENOENT);
> +
> + /* Get BTF_KIND_FUNC type */
> + t = btf_type_by_id(btf, id);
> + if (!btf_type_is_func(t))
> + return ERR_PTR(-ENOENT);
> +
> + /* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
> + t = btf_type_by_id(btf, t->type);
> + if (!btf_type_is_func_proto(t))
> + return ERR_PTR(-ENOENT);
> +
> + *nr = btf_type_vlen(t);
> +
> + if (*nr)
> + return (const struct btf_param *)(t + 1);
> + else
> + return NULL;
> +}
> +
> +static int parse_btf_arg(const char *varname, struct fetch_insn *code,
> + struct traceprobe_parse_context *ctx)
> +{
> + struct btf *btf = traceprobe_get_btf();
> + const struct btf_param *params;
> + int i;
> +
> + if (!btf) {
> + trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
> + return -EOPNOTSUPP;
> + }
> +
> + if (WARN_ON_ONCE(!ctx->funcname))
> + return -EINVAL;
> +
> + if (!ctx->params) {
> + params = find_btf_func_param(ctx->funcname, &ctx->nr_params);
> + if (IS_ERR(params)) {
> + trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
> + return PTR_ERR(params);
> + }
> + ctx->params = params;
> + } else
> + params = ctx->params;
> +
> + for (i = 0; i < ctx->nr_params; i++) {
> + const char *name = btf_name_by_offset(btf, params[i].name_off);
> +
> + if (name && !strcmp(name, varname)) {
> + code->op = FETCH_OP_ARG;
> + code->param = i;
> + return 0;
> + }
> + }
> + trace_probe_log_err(ctx->offset, NO_BTFARG);
> + return -ENOENT;
> +}
> +
> +static const struct fetch_type *parse_btf_arg_type(int arg_idx,
> + struct traceprobe_parse_context *ctx)
> +{
> + struct btf *btf = traceprobe_get_btf();
> + const char *typestr = NULL;
> +
> + if (btf && ctx->params)
> + typestr = type_from_btf_id(btf, ctx->params[arg_idx].type);
> +
> + return find_fetch_type(typestr, ctx->flags);
> +}
> +#else
> +static struct btf *traceprobe_get_btf(void)
> +{
> + return NULL;
> +}
> +
> +static int parse_btf_arg(const char *varname, struct fetch_insn *code,
> + struct traceprobe_parse_context *ctx)
> +{
> + trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
> + return -EOPNOTSUPP;
> +}
> +#define parse_btf_arg_type(idx, ctx) \
> + find_fetch_type(NULL, ctx->flags)
> +#endif
> +
> #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
>
> static int parse_probe_vars(char *arg, const struct fetch_type *t,
> @@ -556,6 +724,15 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> code->op = FETCH_OP_IMM;
> }
> break;
> + default:
> + if (isalpha(arg[0]) || arg[0] == '_') { /* BTF variable */
> + if (!tparg_is_function_entry(ctx->flags)) {
> + trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
> + return -EINVAL;
> + }
> + ret = parse_btf_arg(arg, code, ctx);
> + break;
> + }
> }
> if (!ret && code->op == FETCH_OP_NOP) {
> /* Parsed, but do not find fetch method */
> @@ -704,6 +881,11 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
> if (ret)
> goto fail;
>
> + /* Update storing type if BTF is available */
> + if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
> + !t && code->op == FETCH_OP_ARG)
> + parg->type = parse_btf_arg_type(code->param, ctx);
> +
> ret = -EINVAL;
> /* Store operation */
> if (parg->type->is_string) {
> @@ -857,8 +1039,14 @@ int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
> parg->name = kmemdup_nul(arg, body - arg, GFP_KERNEL);
> body++;
> } else {
> - /* If argument name is omitted, set "argN" */
> - parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
> + /*
> + * If argument name is omitted, try arg as a name (BTF variable)
> + * or "argN".
> + */
> + if (is_good_name(arg))
> + parg->name = kstrdup(arg, GFP_KERNEL);
> + else
> + parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
> body = arg;
> }
> if (!parg->name)
> diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> index 2dc1e5c4c9e8..9ea5c7e8753f 100644
> --- a/kernel/trace/trace_probe.h
> +++ b/kernel/trace/trace_probe.h
> @@ -23,6 +23,7 @@
> #include <linux/limits.h>
> #include <linux/uaccess.h>
> #include <linux/bitops.h>
> +#include <linux/btf.h>
> #include <asm/bitsperlong.h>
>
> #include "trace.h"
> @@ -376,6 +377,9 @@ static inline bool tparg_is_function_entry(unsigned int flags)
>
> struct traceprobe_parse_context {
> struct trace_event_call *event;
> + const struct btf_param *params;
> + s32 nr_params;
> + const char *funcname;
> unsigned int flags;
> int offset;
> };
> @@ -474,7 +478,10 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
> C(NO_EVENT_INFO, "This requires both group and event name to attach"),\
> C(BAD_ATTACH_EVENT, "Attached event does not exist"),\
> C(BAD_ATTACH_ARG, "Attached event does not have this field"),\
> - C(NO_EP_FILTER, "No filter rule after 'if'"),
> + C(NO_EP_FILTER, "No filter rule after 'if'"), \
> + C(NOSUP_BTFARG, "BTF is not available or not supported"), \
> + C(NO_BTFARG, "This variable is not found at this probe point"),\
> + C(NO_BTF_ENTRY, "No BTF entry for this probe point"),
>
> #undef C
> #define C(a, b) TP_ERR_##a
>

2023-04-20 23:46:37

by Masami Hiramatsu

Subject: Re: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

On Thu, 20 Apr 2023 11:49:32 -0700
Alexei Starovoitov <[email protected]> wrote:

> On Thu, Apr 20, 2023 at 08:25:50PM +0900, Masami Hiramatsu (Google) wrote:
> > +static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > + struct pt_regs *regs)
> > +{
> > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > + struct fentry_trace_entry_head *entry;
> > + struct hlist_head *head;
> > + int size, __size, dsize;
> > + int rctx;
> > +
> > + if (bpf_prog_array_valid(call)) {
> > + unsigned long orig_ip = instruction_pointer(regs);
> > + int ret;
> > +
> > + ret = trace_call_bpf(call, regs);
>
> Please do not call bpf from fprobe.
> There is no use case for it.

OK.

>
> > +
> > + /*
> > + * We need to check and see if we modified the pc of the
> > + * pt_regs, and if so return 1 so that we don't do the
> > + * single stepping.
> > + */
> > + if (orig_ip != instruction_pointer(regs))
> > + return 1;
> > + if (!ret)
> > + return 0;
> > + }
> > +
> > + head = this_cpu_ptr(call->perf_events);
> > + if (hlist_empty(head))
> > + return 0;
> > +
> > + dsize = __get_data_size(&tf->tp, regs);
> > + __size = sizeof(*entry) + tf->tp.size + dsize;
> > + size = ALIGN(__size + sizeof(u32), sizeof(u64));
> > + size -= sizeof(u32);
> > +
> > + entry = perf_trace_buf_alloc(size, NULL, &rctx);
> > + if (!entry)
> > + return 0;
> > +
> > + entry->ip = entry_ip;
> > + memset(&entry[1], 0, dsize);
> > + store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
> > + perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> > + head, NULL);
> > + return 0;
> > +}
> > +NOKPROBE_SYMBOL(fentry_perf_func);
> > +
> > +static void
> > +fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > + unsigned long ret_ip, struct pt_regs *regs)
> > +{
> > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > + struct fexit_trace_entry_head *entry;
> > + struct hlist_head *head;
> > + int size, __size, dsize;
> > + int rctx;
> > +
> > + if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> > + return;
>
> Same here.
> These two parts look like copy-paste from kprobes.
> I suspect this code wasn't tested at all.

OK, I missed testing that bpf part. I thought bpf could be attached to
anything that looks like a "trace-event", couldn't it?

Thank you,

--
Masami Hiramatsu (Google) <[email protected]>

2023-04-20 23:53:52

by Alexei Starovoitov

Subject: Re: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

On Thu, Apr 20, 2023 at 4:41 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Thu, 20 Apr 2023 11:49:32 -0700
> Alexei Starovoitov <[email protected]> wrote:
>
> > On Thu, Apr 20, 2023 at 08:25:50PM +0900, Masami Hiramatsu (Google) wrote:
> > > +static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > + struct pt_regs *regs)
> > > +{
> > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > + struct fentry_trace_entry_head *entry;
> > > + struct hlist_head *head;
> > > + int size, __size, dsize;
> > > + int rctx;
> > > +
> > > + if (bpf_prog_array_valid(call)) {
> > > + unsigned long orig_ip = instruction_pointer(regs);
> > > + int ret;
> > > +
> > > + ret = trace_call_bpf(call, regs);
> >
> > Please do not call bpf from fprobe.
> > There is no use case for it.
>
> OK.
>
> >
> > > +
> > > + /*
> > > + * We need to check and see if we modified the pc of the
> > > + * pt_regs, and if so return 1 so that we don't do the
> > > + * single stepping.
> > > + */
> > > + if (orig_ip != instruction_pointer(regs))
> > > + return 1;
> > > + if (!ret)
> > > + return 0;
> > > + }
> > > +
> > > + head = this_cpu_ptr(call->perf_events);
> > > + if (hlist_empty(head))
> > > + return 0;
> > > +
> > > + dsize = __get_data_size(&tf->tp, regs);
> > > + __size = sizeof(*entry) + tf->tp.size + dsize;
> > > + size = ALIGN(__size + sizeof(u32), sizeof(u64));
> > > + size -= sizeof(u32);
> > > +
> > > + entry = perf_trace_buf_alloc(size, NULL, &rctx);
> > > + if (!entry)
> > > + return 0;
> > > +
> > > + entry->ip = entry_ip;
> > > + memset(&entry[1], 0, dsize);
> > > + store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
> > > + perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> > > + head, NULL);
> > > + return 0;
> > > +}
> > > +NOKPROBE_SYMBOL(fentry_perf_func);
> > > +
> > > +static void
> > > +fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > + unsigned long ret_ip, struct pt_regs *regs)
> > > +{
> > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > + struct fexit_trace_entry_head *entry;
> > > + struct hlist_head *head;
> > > + int size, __size, dsize;
> > > + int rctx;
> > > +
> > > + if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> > > + return;
> >
> > Same here.
> > These two parts look like copy-paste from kprobes.
> > I suspect this code wasn't tested at all.
>
> OK, I missed testing that bpf part. I thought bpf could be attached to
> anything that looks like a "trace-event", couldn't it?

No. We're not applying bpf filtering to any random event
that gets introduced in a tracing subsystem.
fprobe falls into that category.
Every hook where bpf can be invoked has to be thought through.
That mental exercise didn't happen here.

2023-04-21 00:28:13

by kernel test robot

Subject: Re: [PATCH v5 4/9] tracing/probes: Add tracepoint support on fprobe_event

Hi Masami,

kernel test robot noticed the following build warnings:

[auto build test WARNING on next-20230419]
[cannot apply to shuah-kselftest/next shuah-kselftest/fixes bpf-next/master bpf/master linus/master rostedt-trace/for-next rostedt-trace/for-next-urgent v6.3-rc7 v6.3-rc6 v6.3-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Masami-Hiramatsu-Google/fprobe-Pass-return-address-to-the-handlers/20230420-192935
patch link: https://lore.kernel.org/r/168198997089.1795549.1009510263722958117.stgit%40mhiramat.roam.corp.google.com
patch subject: [PATCH v5 4/9] tracing/probes: Add tracepoint support on fprobe_event
config: riscv-allmodconfig (https://download.01.org/0day-ci/archive/20230421/[email protected]/config)
compiler: riscv64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/intel-lab-lkp/linux/commit/ecaf9aff228ea0fe18d72079792f4ff6a95263b7
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Masami-Hiramatsu-Google/fprobe-Pass-return-address-to-the-handlers/20230420-192935
git checkout ecaf9aff228ea0fe18d72079792f4ff6a95263b7
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv olddefconfig
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash kernel/trace/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

>> kernel/trace/trace_fprobe.c:907:20: warning: no previous prototype for 'find_tracepoint' [-Wmissing-prototypes]
907 | struct tracepoint *find_tracepoint(const char *tp_name)
| ^~~~~~~~~~~~~~~


vim +/find_tracepoint +907 kernel/trace/trace_fprobe.c

906
> 907 struct tracepoint *find_tracepoint(const char *tp_name)
908 {
909 struct __find_tracepoint_cb_data data = {
910 .tp_name = tp_name,
911 };
912
913 for_each_kernel_tracepoint(__find_tracepoint_cb, &data);
914
915 return data.tpoint;
916 }
917

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

2023-04-21 00:57:44

by Masami Hiramatsu

Subject: Re: [PATCH v5 6/9] tracing/probes: Support function parameters if BTF is available

On Thu, 20 Apr 2023 20:08:00 +0100
Alan Maguire <[email protected]> wrote:

> On 20/04/2023 12:26, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Support function or tracepoint parameters by name if BTF is available
> > and the event is for function entry (This means it is available for
> > kprobe-events, fprobe-events and tracepoint probe events.)
> >
> > BTF variable syntax is a bit special because it doesn't need any prefix.
> > Also, if only the BTF variable name is given, the argument name is
> > also becomes the BTF variable name. e.g.
> >
> > # echo 'p vfs_read count pos' >> dynamic_events
> > # echo 'f vfs_write count pos' >> dynamic_events
> > # echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
> > # cat dynamic_events
> > p:kprobes/p_vfs_read_0 vfs_read count=count pos=pos
> > f:fprobes/vfs_write__entry vfs_write count=count pos=pos
> > t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
> >
> > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > ---
> > kernel/trace/Kconfig | 11 ++
> > kernel/trace/trace.c | 4 +
> > kernel/trace/trace_fprobe.c | 49 ++++++-----
> > kernel/trace/trace_kprobe.c | 12 +--
> > kernel/trace/trace_probe.c | 192 +++++++++++++++++++++++++++++++++++++++++++
> > kernel/trace/trace_probe.h | 9 ++
> > 6 files changed, 248 insertions(+), 29 deletions(-)
> >
> > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > index 8e10a9453c96..e2b415b9fcd4 100644
> > --- a/kernel/trace/Kconfig
> > +++ b/kernel/trace/Kconfig
> > @@ -664,6 +664,17 @@ config FPROBE_EVENTS
> > and the kprobe events on function entry and exit will be
> > transparently converted to this fprobe events.
> >
> > +config PROBE_EVENTS_BTF_ARGS
> > + depends on HAVE_FUNCTION_ARG_ACCESS_API
> > + depends on FPROBE_EVENTS || KPROBE_EVENTS
> > + depends on DEBUG_INFO_BTF && BPF_SYSCALL
> > + bool "Support BTF function arguments for probe events"
> > + default y
> > + help
> > + The user can specify the arguments of the probe event using the names
> > + of the arguments of the probed function. This feature only works if
> > + the probe location is a kernel function entry or a tracepoint.
> > +
> > config KPROBE_EVENTS
> > depends on KPROBES
> > depends on HAVE_REGS_AND_STACK_ACCESS_API
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 9da9c979faa3..0d9c48197a5c 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -5670,7 +5670,11 @@ static const char readme_msg[] =
> > "\t args: <name>=fetcharg[:type]\n"
> > "\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
> > #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
> > +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> > + "\t $stack<index>, $stack, $retval, $comm, $arg<N>, <argname>\n"
> > +#else
> > "\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
> > +#endif
> > #else
> > "\t $stack<index>, $stack, $retval, $comm,\n"
> > #endif
> > diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
> > index cd91bf57baac..d88079c2d2e3 100644
> > --- a/kernel/trace/trace_fprobe.c
> > +++ b/kernel/trace/trace_fprobe.c
> > @@ -387,6 +387,7 @@ static void free_trace_fprobe(struct trace_fprobe *tf)
> > static struct trace_fprobe *alloc_trace_fprobe(const char *group,
> > const char *event,
> > const char *symbol,
> > + struct tracepoint *tpoint,
> > int maxactive,
> > int nargs, bool is_return)
> > {
> > @@ -406,6 +407,7 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group,
> > else
> > tf->fp.entry_handler = fentry_dispatcher;
> >
> > + tf->tpoint = tpoint;
> > tf->fp.nr_maxactive = maxactive;
> >
> > ret = trace_probe_init(&tf->tp, event, group, false);
> > @@ -949,8 +951,12 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > int maxactive = 0;
> > char buf[MAX_EVENT_NAME_LEN];
> > char gbuf[MAX_EVENT_NAME_LEN];
> > - unsigned int flags = TPARG_FL_KERNEL;
> > + char sbuf[KSYM_NAME_LEN];
> > bool is_tracepoint = false;
> > + struct tracepoint *tpoint = NULL;
> > + struct traceprobe_parse_context ctx = {
> > + .flags = TPARG_FL_KERNEL | TPARG_FL_FENTRY,
> > + };
> >
> > if ((argv[0][0] != 'f' && argv[0][0] != 't') || argc < 2)
> > return -ECANCELED;
> > @@ -1014,12 +1020,6 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > goto parse_error;
> > }
> >
> > - flags |= TPARG_FL_FENTRY;
> > - if (is_return)
> > - flags |= TPARG_FL_RETURN;
> > - if (is_tracepoint)
> > - flags |= TPARG_FL_TPOINT;
> > -
> > trace_probe_log_set_index(0);
> > if (event) {
> > ret = traceprobe_parse_event_name(&event, &group, gbuf,
> > @@ -1031,7 +1031,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > if (!event) {
> > /* Make a new event name */
> > if (is_tracepoint)
> > - strscpy(buf, symbol, MAX_EVENT_NAME_LEN);
> > + snprintf(buf, MAX_EVENT_NAME_LEN, "%s%s",
> > + isdigit(*symbol) ? "_" : "", symbol);
> > else
> > snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
> > is_return ? "exit" : "entry");
> > @@ -1039,8 +1040,25 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > event = buf;
> > }
> >
> > + if (is_return)
> > + ctx.flags |= TPARG_FL_RETURN;
> > +
> > + if (is_tracepoint) {
> > + ctx.flags |= TPARG_FL_TPOINT;
> > + tpoint = find_tracepoint(symbol);
> > + if (!tpoint) {
> > + trace_probe_log_set_index(1);
> > + trace_probe_log_err(0, NO_TRACEPOINT);
> > + goto parse_error;
> > + }
> > + ctx.funcname = kallsyms_lookup(
> > + (unsigned long)tpoint->probestub,
> > + NULL, NULL, NULL, sbuf);
> > + } else
> > + ctx.funcname = symbol;
> > +
> > /* setup a probe */
> > - tf = alloc_trace_fprobe(group, event, symbol, maxactive,
> > + tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
> > argc - 2, is_return);
> > if (IS_ERR(tf)) {
> > ret = PTR_ERR(tf);
> > @@ -1049,24 +1067,15 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > goto out; /* We know tf is not allocated */
> > }
> >
> > - if (is_tracepoint) {
> > - tf->tpoint = find_tracepoint(tf->symbol);
> > - if (!tf->tpoint) {
> > - trace_probe_log_set_index(1);
> > - trace_probe_log_err(0, NO_TRACEPOINT);
> > - goto parse_error;
> > - }
> > + if (is_tracepoint)
> > tf->mod = __module_text_address(
> > (unsigned long)tf->tpoint->probestub);
> > - }
> >
> > argc -= 2; argv += 2;
> > -
> > /* parse arguments */
> > for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> > - struct traceprobe_parse_context ctx = { .flags = flags };
> > -
> > trace_probe_log_set_index(i + 2);
> > + ctx.offset = 0;
> > ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], &ctx);
> > if (ret)
> > goto error; /* This can be -ENOMEM */
> > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > index fd62de2a2f51..aff6c1a5e161 100644
> > --- a/kernel/trace/trace_kprobe.c
> > +++ b/kernel/trace/trace_kprobe.c
> > @@ -742,7 +742,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > void *addr = NULL;
> > char buf[MAX_EVENT_NAME_LEN];
> > char gbuf[MAX_EVENT_NAME_LEN];
> > - unsigned int flags = TPARG_FL_KERNEL;
> > + struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };
> >
> > switch (argv[0][0]) {
> > case 'r':
> > @@ -823,10 +823,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > goto parse_error;
> > }
> > if (is_return)
> > - flags |= TPARG_FL_RETURN;
> > + ctx.flags |= TPARG_FL_RETURN;
> > ret = kprobe_on_func_entry(NULL, symbol, offset);
> > if (ret == 0)
> > - flags |= TPARG_FL_FENTRY;
> > + ctx.flags |= TPARG_FL_FENTRY;
> > /* Defer the ENOENT case until register kprobe */
> > if (ret == -EINVAL && is_return) {
> > trace_probe_log_err(0, BAD_RETPROBE);
> > @@ -856,7 +856,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> >
> > /* setup a probe */
> > tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
> > - argc - 2, is_return);
> > + argc - 2, is_return);
> > if (IS_ERR(tk)) {
> > ret = PTR_ERR(tk);
> > /* This must return -ENOMEM, else there is a bug */
> > @@ -866,10 +866,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > argc -= 2; argv += 2;
> >
> > /* parse arguments */
> > + ctx.funcname = symbol;
> > for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> > - struct traceprobe_parse_context ctx = { .flags = flags };
> > -
> > trace_probe_log_set_index(i + 2);
> > + ctx.offset = 0;
> > ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], &ctx);
> > if (ret)
> > goto error; /* This can be -ENOMEM */
> > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > index 84a9f0446390..f55d633b3e2a 100644
> > --- a/kernel/trace/trace_probe.c
> > +++ b/kernel/trace/trace_probe.c
> > @@ -300,6 +300,174 @@ static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
> > return -ENOENT;
> > }
> >
> > +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> > +
> > +static DEFINE_MUTEX(tp_btf_mutex);
> > +static struct btf *traceprobe_btf;
> > +
> > +static struct btf *traceprobe_get_btf(void)
> > +{
> > + if (!traceprobe_btf && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
> > + mutex_lock(&tp_btf_mutex);
> > + if (!traceprobe_btf)
> > + traceprobe_btf = btf_parse_vmlinux();
>
> Apologies if I missed this in previous discussion, but should we
> use bpf_get_btf_vmlinux() here instead, since it will
> return an already-parsed BTF? There's a bunch of additional
> work that btf_parse_vmlinux() does that's not needed from
> a tracing POV.

I thought bpf_get_btf_vmlinux() was only for the BPF subsystem because
it is exposed in bpf.h, not btf.h. But yeah, it is wasteful to have
two BTF instances. So I'll use it.

By the way, I would eventually like to use BTF even when BPF_SYSCALL
is not enabled. Is that feasible, or is it hard to decouple BTF
from BPF?

>
> > + mutex_unlock(&tp_btf_mutex);
> > + }
> > +
> > + return traceprobe_btf;
> > +}
> > +
> > +static u32 btf_type_int(const struct btf_type *t)
> > +{
> > + return *(u32 *)(t + 1);
> > +}
> > +
> > +static const char *type_from_btf_id(struct btf *btf, s32 id)
> > +{
> > + const struct btf_type *t;
> > + u32 intdata;
> > + s32 tid;
> > +
> > + /* TODO: const char * could be converted as a string */
> > + t = btf_type_skip_modifiers(btf, id, &tid);
> > +
> > + switch (BTF_INFO_KIND(t->info)) {
> > + case BTF_KIND_ENUM:
> > + /* enum is "int", so convert to "s32" */
> > + return "s32";
> > + case BTF_KIND_PTR:
> > + /* pointer will be converted to "x??" */
> > + if (IS_ENABLED(CONFIG_64BIT))
> > + return "x64";
> > + else
> > + return "x32";
> > + case BTF_KIND_INT:
> > + intdata = btf_type_int(t);
> > + if (BTF_INT_ENCODING(intdata) & BTF_INT_SIGNED) {
> > + switch (BTF_INT_BITS(intdata)) {
> > + case 8:
> > + return "s8";
> > + case 16:
> > + return "s16";
> > + case 32:
> > + return "s32";
> > + case 64:
> > + return "s64";
> > + }
> > + } else { /* unsigned */
> > + switch (BTF_INT_BITS(intdata)) {
> > + case 8:
> > + return "u8";
> > + case 16:
> > + return "u16";
> > + case 32:
> > + return "u32";
> > + case 64:
> > + return "u64";
> > + }
> > + }
> > + }
> > + /* TODO: support other types */
> > +
> > + return NULL;
> > +}
> > +
>
> I wonder if we could haul out some common code for printing the type
> from btf_show_name()? Part of what it does is stringify a type name;
> if we had that, it might help solve the TODO above I think..

I don't think we can share common code, since the type names which BPF
needs and these are very different.
This code converts a BTF type into a trace probe fetch type (it supports
only a limited set of fixed-size types, not 'int' etc., because the fetch
type must have an explicit size), but yours needs to show the type name
in C style.

Thank you!

>
> Alan
>
> > +static const struct btf_param *find_btf_func_param(const char *funcname, s32 *nr)
> > +{
> > + struct btf *btf = traceprobe_get_btf();
> > + const struct btf_type *t;
> > + s32 id;
> > +
> > + if (!btf || !funcname || !nr)
> > + return ERR_PTR(-EINVAL);
> > +
> > + id = btf_find_by_name_kind(btf, funcname, BTF_KIND_FUNC);
> > + if (id <= 0)
> > + return ERR_PTR(-ENOENT);
> > +
> > + /* Get BTF_KIND_FUNC type */
> > + t = btf_type_by_id(btf, id);
> > + if (!btf_type_is_func(t))
> > + return ERR_PTR(-ENOENT);
> > +
> > + /* The type of BTF_KIND_FUNC is BTF_KIND_FUNC_PROTO */
> > + t = btf_type_by_id(btf, t->type);
> > + if (!btf_type_is_func_proto(t))
> > + return ERR_PTR(-ENOENT);
> > +
> > + *nr = btf_type_vlen(t);
> > +
> > + if (*nr)
> > + return (const struct btf_param *)(t + 1);
> > + else
> > + return NULL;
> > +}
> > +
> > +static int parse_btf_arg(const char *varname, struct fetch_insn *code,
> > + struct traceprobe_parse_context *ctx)
> > +{
> > + struct btf *btf = traceprobe_get_btf();
> > + const struct btf_param *params;
> > + int i;
> > +
> > + if (!btf) {
> > + trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + if (WARN_ON_ONCE(!ctx->funcname))
> > + return -EINVAL;
> > +
> > + if (!ctx->params) {
> > + params = find_btf_func_param(ctx->funcname, &ctx->nr_params);
> > + if (IS_ERR(params)) {
> > + trace_probe_log_err(ctx->offset, NO_BTF_ENTRY);
> > + return PTR_ERR(params);
> > + }
> > + ctx->params = params;
> > + } else
> > + params = ctx->params;
> > +
> > + for (i = 0; i < ctx->nr_params; i++) {
> > + const char *name = btf_name_by_offset(btf, params[i].name_off);
> > +
> > + if (name && !strcmp(name, varname)) {
> > + code->op = FETCH_OP_ARG;
> > + code->param = i;
> > + return 0;
> > + }
> > + }
> > + trace_probe_log_err(ctx->offset, NO_BTFARG);
> > + return -ENOENT;
> > +}
> > +
> > +static const struct fetch_type *parse_btf_arg_type(int arg_idx,
> > + struct traceprobe_parse_context *ctx)
> > +{
> > + struct btf *btf = traceprobe_get_btf();
> > + const char *typestr = NULL;
> > +
> > + if (btf && ctx->params)
> > + typestr = type_from_btf_id(btf, ctx->params[arg_idx].type);
> > +
> > + return find_fetch_type(typestr, ctx->flags);
> > +}
> > +#else
> > +static struct btf *traceprobe_get_btf(void)
> > +{
> > + return NULL;
> > +}
> > +
> > +static int parse_btf_arg(const char *varname, struct fetch_insn *code,
> > + struct traceprobe_parse_context *ctx)
> > +{
> > + trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
> > + return -EOPNOTSUPP;
> > +}
> > +#define parse_btf_arg_type(idx, ctx) \
> > + find_fetch_type(NULL, ctx->flags)
> > +#endif
> > +
> > #define PARAM_MAX_STACK (THREAD_SIZE / sizeof(unsigned long))
> >
> > static int parse_probe_vars(char *arg, const struct fetch_type *t,
> > @@ -556,6 +724,15 @@ parse_probe_arg(char *arg, const struct fetch_type *type,
> > code->op = FETCH_OP_IMM;
> > }
> > break;
> > + default:
> > + if (isalpha(arg[0]) || arg[0] == '_') { /* BTF variable */
> > + if (!tparg_is_function_entry(ctx->flags)) {
> > + trace_probe_log_err(ctx->offset, NOSUP_BTFARG);
> > + return -EINVAL;
> > + }
> > + ret = parse_btf_arg(arg, code, ctx);
> > + break;
> > + }
> > }
> > if (!ret && code->op == FETCH_OP_NOP) {
> > /* Parsed, but do not find fetch method */
> > @@ -704,6 +881,11 @@ static int traceprobe_parse_probe_arg_body(const char *argv, ssize_t *size,
> > if (ret)
> > goto fail;
> >
> > + /* Update storing type if BTF is available */
> > + if (IS_ENABLED(CONFIG_PROBE_EVENTS_BTF_ARGS) &&
> > + !t && code->op == FETCH_OP_ARG)
> > + parg->type = parse_btf_arg_type(code->param, ctx);
> > +
> > ret = -EINVAL;
> > /* Store operation */
> > if (parg->type->is_string) {
> > @@ -857,8 +1039,14 @@ int traceprobe_parse_probe_arg(struct trace_probe *tp, int i, const char *arg,
> > parg->name = kmemdup_nul(arg, body - arg, GFP_KERNEL);
> > body++;
> > } else {
> > - /* If argument name is omitted, set "argN" */
> > - parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
> > + /*
> > + * If argument name is omitted, try arg as a name (BTF variable)
> > + * or "argN".
> > + */
> > + if (is_good_name(arg))
> > + parg->name = kstrdup(arg, GFP_KERNEL);
> > + else
> > + parg->name = kasprintf(GFP_KERNEL, "arg%d", i + 1);
> > body = arg;
> > }
> > if (!parg->name)
> > diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
> > index 2dc1e5c4c9e8..9ea5c7e8753f 100644
> > --- a/kernel/trace/trace_probe.h
> > +++ b/kernel/trace/trace_probe.h
> > @@ -23,6 +23,7 @@
> > #include <linux/limits.h>
> > #include <linux/uaccess.h>
> > #include <linux/bitops.h>
> > +#include <linux/btf.h>
> > #include <asm/bitsperlong.h>
> >
> > #include "trace.h"
> > @@ -376,6 +377,9 @@ static inline bool tparg_is_function_entry(unsigned int flags)
> >
> > struct traceprobe_parse_context {
> > struct trace_event_call *event;
> > + const struct btf_param *params;
> > + s32 nr_params;
> > + const char *funcname;
> > unsigned int flags;
> > int offset;
> > };
> > @@ -474,7 +478,10 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
> > C(NO_EVENT_INFO, "This requires both group and event name to attach"),\
> > C(BAD_ATTACH_EVENT, "Attached event does not exist"),\
> > C(BAD_ATTACH_ARG, "Attached event does not have this field"),\
> > - C(NO_EP_FILTER, "No filter rule after 'if'"),
> > + C(NO_EP_FILTER, "No filter rule after 'if'"), \
> > + C(NOSUP_BTFARG, "BTF is not available or not supported"), \
> > + C(NO_BTFARG, "This variable is not found at this probe point"),\
> > + C(NO_BTF_ENTRY, "No BTF entry for this probe point"),
> >
> > #undef C
> > #define C(a, b) TP_ERR_##a
> >


--
Masami Hiramatsu (Google) <[email protected]>

2023-04-21 01:03:45

by Alexei Starovoitov

Subject: Re: [PATCH v5 6/9] tracing/probes: Support function parameters if BTF is available

On Thu, Apr 20, 2023 at 5:57 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Thu, 20 Apr 2023 20:08:00 +0100
> Alan Maguire <[email protected]> wrote:
>
> > On 20/04/2023 12:26, Masami Hiramatsu (Google) wrote:
> > > From: Masami Hiramatsu (Google) <[email protected]>
> > >
> > > Support function or tracepoint parameters by name if BTF is available
> > > and the event is for function entry (This means it is available for
> > > kprobe-events, fprobe-events and tracepoint probe events.)
> > >
> > > BTF variable syntax is a bit special because it doesn't need any prefix.
> > > Also, if only the BTF variable name is given, the argument name is
> > > also becomes the BTF variable name. e.g.
> > >
> > > # echo 'p vfs_read count pos' >> dynamic_events
> > > # echo 'f vfs_write count pos' >> dynamic_events
> > > # echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
> > > # cat dynamic_events
> > > p:kprobes/p_vfs_read_0 vfs_read count=count pos=pos
> > > f:fprobes/vfs_write__entry vfs_write count=count pos=pos
> > > t:tracepoints/sched_overutilized_tp sched_overutilized_tp rd=rd overutilized=overutilized
> > >
> > > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > > ---
> > > kernel/trace/Kconfig | 11 ++
> > > kernel/trace/trace.c | 4 +
> > > kernel/trace/trace_fprobe.c | 49 ++++++-----
> > > kernel/trace/trace_kprobe.c | 12 +--
> > > kernel/trace/trace_probe.c | 192 +++++++++++++++++++++++++++++++++++++++++++
> > > kernel/trace/trace_probe.h | 9 ++
> > > 6 files changed, 248 insertions(+), 29 deletions(-)
> > >
> > > diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> > > index 8e10a9453c96..e2b415b9fcd4 100644
> > > --- a/kernel/trace/Kconfig
> > > +++ b/kernel/trace/Kconfig
> > > @@ -664,6 +664,17 @@ config FPROBE_EVENTS
> > > and the kprobe events on function entry and exit will be
> > > transparently converted to this fprobe events.
> > >
> > > +config PROBE_EVENTS_BTF_ARGS
> > > + depends on HAVE_FUNCTION_ARG_ACCESS_API
> > > + depends on FPROBE_EVENTS || KPROBE_EVENTS
> > > + depends on DEBUG_INFO_BTF && BPF_SYSCALL
> > > + bool "Support BTF function arguments for probe events"
> > > + default y
> > > + help
> > > + The user can specify the arguments of the probe event using the names
> > > + of the arguments of the probed function. This feature only works if
> > > + the probe location is a kernel function entry or a tracepoint.
> > > +
> > > config KPROBE_EVENTS
> > > depends on KPROBES
> > > depends on HAVE_REGS_AND_STACK_ACCESS_API
> > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > > index 9da9c979faa3..0d9c48197a5c 100644
> > > --- a/kernel/trace/trace.c
> > > +++ b/kernel/trace/trace.c
> > > @@ -5670,7 +5670,11 @@ static const char readme_msg[] =
> > > "\t args: <name>=fetcharg[:type]\n"
> > > "\t fetcharg: (%<register>|$<efield>), @<address>, @<symbol>[+|-<offset>],\n"
> > > #ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
> > > +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> > > + "\t $stack<index>, $stack, $retval, $comm, $arg<N>, <argname>\n"
> > > +#else
> > > "\t $stack<index>, $stack, $retval, $comm, $arg<N>,\n"
> > > +#endif
> > > #else
> > > "\t $stack<index>, $stack, $retval, $comm,\n"
> > > #endif
> > > diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
> > > index cd91bf57baac..d88079c2d2e3 100644
> > > --- a/kernel/trace/trace_fprobe.c
> > > +++ b/kernel/trace/trace_fprobe.c
> > > @@ -387,6 +387,7 @@ static void free_trace_fprobe(struct trace_fprobe *tf)
> > > static struct trace_fprobe *alloc_trace_fprobe(const char *group,
> > > const char *event,
> > > const char *symbol,
> > > + struct tracepoint *tpoint,
> > > int maxactive,
> > > int nargs, bool is_return)
> > > {
> > > @@ -406,6 +407,7 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group,
> > > else
> > > tf->fp.entry_handler = fentry_dispatcher;
> > >
> > > + tf->tpoint = tpoint;
> > > tf->fp.nr_maxactive = maxactive;
> > >
> > > ret = trace_probe_init(&tf->tp, event, group, false);
> > > @@ -949,8 +951,12 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > > int maxactive = 0;
> > > char buf[MAX_EVENT_NAME_LEN];
> > > char gbuf[MAX_EVENT_NAME_LEN];
> > > - unsigned int flags = TPARG_FL_KERNEL;
> > > + char sbuf[KSYM_NAME_LEN];
> > > bool is_tracepoint = false;
> > > + struct tracepoint *tpoint = NULL;
> > > + struct traceprobe_parse_context ctx = {
> > > + .flags = TPARG_FL_KERNEL | TPARG_FL_FENTRY,
> > > + };
> > >
> > > if ((argv[0][0] != 'f' && argv[0][0] != 't') || argc < 2)
> > > return -ECANCELED;
> > > @@ -1014,12 +1020,6 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > > goto parse_error;
> > > }
> > >
> > > - flags |= TPARG_FL_FENTRY;
> > > - if (is_return)
> > > - flags |= TPARG_FL_RETURN;
> > > - if (is_tracepoint)
> > > - flags |= TPARG_FL_TPOINT;
> > > -
> > > trace_probe_log_set_index(0);
> > > if (event) {
> > > ret = traceprobe_parse_event_name(&event, &group, gbuf,
> > > @@ -1031,7 +1031,8 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > > if (!event) {
> > > /* Make a new event name */
> > > if (is_tracepoint)
> > > - strscpy(buf, symbol, MAX_EVENT_NAME_LEN);
> > > + snprintf(buf, MAX_EVENT_NAME_LEN, "%s%s",
> > > + isdigit(*symbol) ? "_" : "", symbol);
> > > else
> > > snprintf(buf, MAX_EVENT_NAME_LEN, "%s__%s", symbol,
> > > is_return ? "exit" : "entry");
> > > @@ -1039,8 +1040,25 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > > event = buf;
> > > }
> > >
> > > + if (is_return)
> > > + ctx.flags |= TPARG_FL_RETURN;
> > > +
> > > + if (is_tracepoint) {
> > > + ctx.flags |= TPARG_FL_TPOINT;
> > > + tpoint = find_tracepoint(symbol);
> > > + if (!tpoint) {
> > > + trace_probe_log_set_index(1);
> > > + trace_probe_log_err(0, NO_TRACEPOINT);
> > > + goto parse_error;
> > > + }
> > > + ctx.funcname = kallsyms_lookup(
> > > + (unsigned long)tpoint->probestub,
> > > + NULL, NULL, NULL, sbuf);
> > > + } else
> > > + ctx.funcname = symbol;
> > > +
> > > /* setup a probe */
> > > - tf = alloc_trace_fprobe(group, event, symbol, maxactive,
> > > + tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
> > > argc - 2, is_return);
> > > if (IS_ERR(tf)) {
> > > ret = PTR_ERR(tf);
> > > @@ -1049,24 +1067,15 @@ static int __trace_fprobe_create(int argc, const char *argv[])
> > > goto out; /* We know tf is not allocated */
> > > }
> > >
> > > - if (is_tracepoint) {
> > > - tf->tpoint = find_tracepoint(tf->symbol);
> > > - if (!tf->tpoint) {
> > > - trace_probe_log_set_index(1);
> > > - trace_probe_log_err(0, NO_TRACEPOINT);
> > > - goto parse_error;
> > > - }
> > > + if (is_tracepoint)
> > > tf->mod = __module_text_address(
> > > (unsigned long)tf->tpoint->probestub);
> > > - }
> > >
> > > argc -= 2; argv += 2;
> > > -
> > > /* parse arguments */
> > > for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> > > - struct traceprobe_parse_context ctx = { .flags = flags };
> > > -
> > > trace_probe_log_set_index(i + 2);
> > > + ctx.offset = 0;
> > > ret = traceprobe_parse_probe_arg(&tf->tp, i, argv[i], &ctx);
> > > if (ret)
> > > goto error; /* This can be -ENOMEM */
> > > diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> > > index fd62de2a2f51..aff6c1a5e161 100644
> > > --- a/kernel/trace/trace_kprobe.c
> > > +++ b/kernel/trace/trace_kprobe.c
> > > @@ -742,7 +742,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > > void *addr = NULL;
> > > char buf[MAX_EVENT_NAME_LEN];
> > > char gbuf[MAX_EVENT_NAME_LEN];
> > > - unsigned int flags = TPARG_FL_KERNEL;
> > > + struct traceprobe_parse_context ctx = { .flags = TPARG_FL_KERNEL };
> > >
> > > switch (argv[0][0]) {
> > > case 'r':
> > > @@ -823,10 +823,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > > goto parse_error;
> > > }
> > > if (is_return)
> > > - flags |= TPARG_FL_RETURN;
> > > + ctx.flags |= TPARG_FL_RETURN;
> > > ret = kprobe_on_func_entry(NULL, symbol, offset);
> > > if (ret == 0)
> > > - flags |= TPARG_FL_FENTRY;
> > > + ctx.flags |= TPARG_FL_FENTRY;
> > > /* Defer the ENOENT case until register kprobe */
> > > if (ret == -EINVAL && is_return) {
> > > trace_probe_log_err(0, BAD_RETPROBE);
> > > @@ -856,7 +856,7 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > >
> > > /* setup a probe */
> > > tk = alloc_trace_kprobe(group, event, addr, symbol, offset, maxactive,
> > > - argc - 2, is_return);
> > > + argc - 2, is_return);
> > > if (IS_ERR(tk)) {
> > > ret = PTR_ERR(tk);
> > > /* This must return -ENOMEM, else there is a bug */
> > > @@ -866,10 +866,10 @@ static int __trace_kprobe_create(int argc, const char *argv[])
> > > argc -= 2; argv += 2;
> > >
> > > /* parse arguments */
> > > + ctx.funcname = symbol;
> > > for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
> > > - struct traceprobe_parse_context ctx = { .flags = flags };
> > > -
> > > trace_probe_log_set_index(i + 2);
> > > + ctx.offset = 0;
> > > ret = traceprobe_parse_probe_arg(&tk->tp, i, argv[i], &ctx);
> > > if (ret)
> > > goto error; /* This can be -ENOMEM */
> > > diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c
> > > index 84a9f0446390..f55d633b3e2a 100644
> > > --- a/kernel/trace/trace_probe.c
> > > +++ b/kernel/trace/trace_probe.c
> > > @@ -300,6 +300,174 @@ static int parse_trace_event_arg(char *arg, struct fetch_insn *code,
> > > return -ENOENT;
> > > }
> > >
> > > +#ifdef CONFIG_PROBE_EVENTS_BTF_ARGS
> > > +
> > > +static DEFINE_MUTEX(tp_btf_mutex);
> > > +static struct btf *traceprobe_btf;
> > > +
> > > +static struct btf *traceprobe_get_btf(void)
> > > +{
> > > + if (!traceprobe_btf && IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) {
> > > + mutex_lock(&tp_btf_mutex);
> > > + if (!traceprobe_btf)
> > > + traceprobe_btf = btf_parse_vmlinux();
> >
> > Apologies if I missed this in previous discussion, but should we
> > use bpf_get_btf_vmlinux() here instead, since it will
> > return an already-parsed BTF? There's a bunch of additional
> > work that btf_parse_vmlinux() does that's not needed from
> > a tracing POV.
>
> I thought bpf_get_btf_vmlinux() was only for BPF subsystem because
> it is exposed in bpf.h, not btf.h. But yeah, it is useless if we
> have 2 BTF instance. So I'll use it.
>
> By the way, I eventually would like to use the BTF things even if the
> BPF_SYSCALL is not enabled. Can we make it, or is it hard to decouple
> the BTF things from BPF?

It's hard. I'd rather keep it as-is. The test matrix is already huge.

2023-04-21 05:42:11

by Masami Hiramatsu

Subject: Re: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

On Thu, 20 Apr 2023 16:46:08 -0700
Alexei Starovoitov <[email protected]> wrote:

> On Thu, Apr 20, 2023 at 4:41 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Thu, 20 Apr 2023 11:49:32 -0700
> > Alexei Starovoitov <[email protected]> wrote:
> >
> > > On Thu, Apr 20, 2023 at 08:25:50PM +0900, Masami Hiramatsu (Google) wrote:
> > > > +static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > > + struct pt_regs *regs)
> > > > +{
> > > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > > + struct fentry_trace_entry_head *entry;
> > > > + struct hlist_head *head;
> > > > + int size, __size, dsize;
> > > > + int rctx;
> > > > +
> > > > + if (bpf_prog_array_valid(call)) {
> > > > + unsigned long orig_ip = instruction_pointer(regs);
> > > > + int ret;
> > > > +
> > > > + ret = trace_call_bpf(call, regs);
> > >
> > > Please do not call bpf from fprobe.
> > > There is no use case for it.
> >
> > OK.
> >
> > >
> > > > +
> > > > + /*
> > > > + * We need to check and see if we modified the pc of the
> > > > + * pt_regs, and if so return 1 so that we don't do the
> > > > + * single stepping.
> > > > + */
> > > > + if (orig_ip != instruction_pointer(regs))
> > > > + return 1;
> > > > + if (!ret)
> > > > + return 0;
> > > > + }
> > > > +
> > > > + head = this_cpu_ptr(call->perf_events);
> > > > + if (hlist_empty(head))
> > > > + return 0;
> > > > +
> > > > + dsize = __get_data_size(&tf->tp, regs);
> > > > + __size = sizeof(*entry) + tf->tp.size + dsize;
> > > > + size = ALIGN(__size + sizeof(u32), sizeof(u64));
> > > > + size -= sizeof(u32);
> > > > +
> > > > + entry = perf_trace_buf_alloc(size, NULL, &rctx);
> > > > + if (!entry)
> > > > + return 0;
> > > > +
> > > > + entry->ip = entry_ip;
> > > > + memset(&entry[1], 0, dsize);
> > > > + store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
> > > > + perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> > > > + head, NULL);
> > > > + return 0;
> > > > +}
> > > > +NOKPROBE_SYMBOL(fentry_perf_func);
> > > > +
> > > > +static void
> > > > +fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > > + unsigned long ret_ip, struct pt_regs *regs)
> > > > +{
> > > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > > + struct fexit_trace_entry_head *entry;
> > > > + struct hlist_head *head;
> > > > + int size, __size, dsize;
> > > > + int rctx;
> > > > +
> > > > + if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> > > > + return;
> > >
> > > Same here.
> > > These two parts look like copy-paste from kprobes.
> > > I suspect this code wasn't tested at all.
> >
> > OK, I failed to test that bpf part. I thought bpf could be attached to
> > anything that looks like a trace-event, couldn't it?
>
> No. We're not applying bpf filtering to any random event
> that gets introduced in a tracing subsystem.
> fprobe falls into that category.
> Every hook where bpf can be invoked has to be thought through.
> That mental exercise didn't happen here.

OK. Just out of curiosity, where is the "tracepoint" filter applied?
In the kernel (verifier?) or in userspace?

Thank you,


--
Masami Hiramatsu (Google) <[email protected]>

2023-04-21 16:35:12

by Alexei Starovoitov

Subject: Re: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

On Thu, Apr 20, 2023 at 10:38 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Thu, 20 Apr 2023 16:46:08 -0700
> Alexei Starovoitov <[email protected]> wrote:
>
> > On Thu, Apr 20, 2023 at 4:41 PM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > On Thu, 20 Apr 2023 11:49:32 -0700
> > > Alexei Starovoitov <[email protected]> wrote:
> > >
> > > > On Thu, Apr 20, 2023 at 08:25:50PM +0900, Masami Hiramatsu (Google) wrote:
> > > > > +static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > > > + struct pt_regs *regs)
> > > > > +{
> > > > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > > > + struct fentry_trace_entry_head *entry;
> > > > > + struct hlist_head *head;
> > > > > + int size, __size, dsize;
> > > > > + int rctx;
> > > > > +
> > > > > + if (bpf_prog_array_valid(call)) {
> > > > > + unsigned long orig_ip = instruction_pointer(regs);
> > > > > + int ret;
> > > > > +
> > > > > + ret = trace_call_bpf(call, regs);
> > > >
> > > > Please do not call bpf from fprobe.
> > > > There is no use case for it.
> > >
> > > OK.
> > >
> > > >
> > > > > +
> > > > > + /*
> > > > > + * We need to check and see if we modified the pc of the
> > > > > + * pt_regs, and if so return 1 so that we don't do the
> > > > > + * single stepping.
> > > > > + */
> > > > > + if (orig_ip != instruction_pointer(regs))
> > > > > + return 1;
> > > > > + if (!ret)
> > > > > + return 0;
> > > > > + }
> > > > > +
> > > > > + head = this_cpu_ptr(call->perf_events);
> > > > > + if (hlist_empty(head))
> > > > > + return 0;
> > > > > +
> > > > > + dsize = __get_data_size(&tf->tp, regs);
> > > > > + __size = sizeof(*entry) + tf->tp.size + dsize;
> > > > > + size = ALIGN(__size + sizeof(u32), sizeof(u64));
> > > > > + size -= sizeof(u32);
> > > > > +
> > > > > + entry = perf_trace_buf_alloc(size, NULL, &rctx);
> > > > > + if (!entry)
> > > > > + return 0;
> > > > > +
> > > > > + entry->ip = entry_ip;
> > > > > + memset(&entry[1], 0, dsize);
> > > > > + store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
> > > > > + perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> > > > > + head, NULL);
> > > > > + return 0;
> > > > > +}
> > > > > +NOKPROBE_SYMBOL(fentry_perf_func);
> > > > > +
> > > > > +static void
> > > > > +fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > > > + unsigned long ret_ip, struct pt_regs *regs)
> > > > > +{
> > > > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > > > + struct fexit_trace_entry_head *entry;
> > > > > + struct hlist_head *head;
> > > > > + int size, __size, dsize;
> > > > > + int rctx;
> > > > > +
> > > > > + if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> > > > > + return;
> > > >
> > > > Same here.
> > > > These two parts look like copy-paste from kprobes.
> > > > I suspect this code wasn't tested at all.
> > >
> > > OK, I missed to test that bpf part. I thought bpf could be appended to
> > > any "trace-event" (looks like trace-event), isn't it?
> >
> > No. We're not applying bpf filtering to any random event
> > that gets introduced in a tracing subsystem.
> > fprobe falls into that category.
> > Every hook where bpf can be invoked has to be thought through.
> > That mental exercise didn't happen here.
>
> OK. Just out of curiousity, where is the "tracepoint" filter applied?
> In the kernel (verifier?) or the userspace?

Sorry. I don't understand the question.
Are you talking about BPF_PROG_TYPE_TRACEPOINT or BPF_PROG_TYPE_RAW_TRACEPOINT ?

2023-04-23 07:45:17

by Jiri Olsa

Subject: Re: [PATCH v5 4/9] tracing/probes: Add tracepoint support on fprobe_event

On Thu, Apr 20, 2023 at 08:26:10PM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <[email protected]>
>
> Allow fprobe_event to trace raw tracepoints so that user can trace
> tracepoints which don't have traceevent wrappers. This new event is
> always available if fprobe event is enabled since the tracepoint is
> disabled, trace-event and dynamic event is also not available.

I thought of ftrace tracepoint wrappers as standard in distros;
could you specify which config options that involves?

> + if (trace_fprobe_is_tracepoint(tf)) {
> + struct tracepoint *tpoint = tf->tpoint;
> + unsigned long ip = (unsigned long)tpoint->probestub;
> + /*
> + * Here, we do 2 steps to enable fprobe on a tracepoint.
> + * At first, put __probestub_##TP function on the tracepoint
> + * and put a fprobe on the stub function.
> + */
> + ret = tracepoint_probe_register_prio_may_exist(tpoint,
> + tpoint->probestub, NULL, 0);
> + if (ret < 0)
> + return ret;
> + return register_fprobe_ips(&tf->fp, &ip, 1);

nice idea

jirka

> + }
> +
> /* TODO: handle filter, nofilter or symbol list */
> return register_fprobe(&tf->fp, tf->symbol, NULL);
> }
> @@ -699,6 +723,12 @@ static void __unregister_trace_fprobe(struct trace_fprobe *tf)
> if (trace_fprobe_is_registered(tf)) {
> unregister_fprobe(&tf->fp);
> memset(&tf->fp, 0, sizeof(tf->fp));
> + if (trace_fprobe_is_tracepoint(tf)) {
> + tracepoint_probe_unregister(tf->tpoint,
> + tf->tpoint->probestub, NULL);
> + tf->tpoint = NULL;
> + tf->mod = NULL;
> + }
> }
> }

SNIP

2023-04-23 13:38:23

by Masami Hiramatsu

Subject: Re: [PATCH v5 4/9] tracing/probes: Add tracepoint support on fprobe_event

On Sun, 23 Apr 2023 09:41:54 +0200
Jiri Olsa <[email protected]> wrote:

> On Thu, Apr 20, 2023 at 08:26:10PM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Allow fprobe_event to trace raw tracepoints so that user can trace
> > tracepoints which don't have traceevent wrappers. This new event is
> > always available if fprobe event is enabled since the tracepoint is
> > disabled, trace-event and dynamic event is also not available.
>
> I thought of ftrace tracepoints wrappers as standard in distros,
> could you specify which config options that involves?

Ah, sorry, I completely confused you.

----
This new event is always available if the fprobe event is enabled. If
the tracepoint is disabled, the trace-event and dynamic event (including
the fprobe event) are also not available.
----
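
For illustration, assuming a kernel with CONFIG_FPROBE_EVENTS=y and a
mounted tracefs (the tracepoint name is just the example from the cover
letter), a tprobe event would be created like this:

```
cd /sys/kernel/tracing
echo 't sched_overutilized_tp rd overutilized' >> dynamic_events
echo 1 > events/tracepoints/sched_overutilized_tp/enable
```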

>
> > + if (trace_fprobe_is_tracepoint(tf)) {
> > + struct tracepoint *tpoint = tf->tpoint;
> > + unsigned long ip = (unsigned long)tpoint->probestub;
> > + /*
> > + * Here, we do 2 steps to enable fprobe on a tracepoint.
> > + * At first, put __probestub_##TP function on the tracepoint
> > + * and put a fprobe on the stub function.
> > + */
> > + ret = tracepoint_probe_register_prio_may_exist(tpoint,
> > + tpoint->probestub, NULL, 0);
> > + if (ret < 0)
> > + return ret;
> > + return register_fprobe_ips(&tf->fp, &ip, 1);
>
> nice idea

Thanks!


>
> jirka
>
> > + }
> > +
> > /* TODO: handle filter, nofilter or symbol list */
> > return register_fprobe(&tf->fp, tf->symbol, NULL);
> > }
> > @@ -699,6 +723,12 @@ static void __unregister_trace_fprobe(struct trace_fprobe *tf)
> > if (trace_fprobe_is_registered(tf)) {
> > unregister_fprobe(&tf->fp);
> > memset(&tf->fp, 0, sizeof(tf->fp));
> > + if (trace_fprobe_is_tracepoint(tf)) {
> > + tracepoint_probe_unregister(tf->tpoint,
> > + tf->tpoint->probestub, NULL);
> > + tf->tpoint = NULL;
> > + tf->mod = NULL;
> > + }
> > }
> > }
>
> SNIP


--
Masami Hiramatsu (Google) <[email protected]>

2023-04-24 04:33:13

by Masami Hiramatsu

Subject: Re: [PATCH v5 2/9] tracing/probes: Add fprobe events for tracing function entry and exit.

On Fri, 21 Apr 2023 09:31:12 -0700
Alexei Starovoitov <[email protected]> wrote:

> On Thu, Apr 20, 2023 at 10:38 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Thu, 20 Apr 2023 16:46:08 -0700
> > Alexei Starovoitov <[email protected]> wrote:
> >
> > > On Thu, Apr 20, 2023 at 4:41 PM Masami Hiramatsu <[email protected]> wrote:
> > > >
> > > > On Thu, 20 Apr 2023 11:49:32 -0700
> > > > Alexei Starovoitov <[email protected]> wrote:
> > > >
> > > > > On Thu, Apr 20, 2023 at 08:25:50PM +0900, Masami Hiramatsu (Google) wrote:
> > > > > > +static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > > > > + struct pt_regs *regs)
> > > > > > +{
> > > > > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > > > > + struct fentry_trace_entry_head *entry;
> > > > > > + struct hlist_head *head;
> > > > > > + int size, __size, dsize;
> > > > > > + int rctx;
> > > > > > +
> > > > > > + if (bpf_prog_array_valid(call)) {
> > > > > > + unsigned long orig_ip = instruction_pointer(regs);
> > > > > > + int ret;
> > > > > > +
> > > > > > + ret = trace_call_bpf(call, regs);
> > > > >
> > > > > Please do not call bpf from fprobe.
> > > > > There is no use case for it.
> > > >
> > > > OK.
> > > >
> > > > >
> > > > > > +
> > > > > > + /*
> > > > > > + * We need to check and see if we modified the pc of the
> > > > > > + * pt_regs, and if so return 1 so that we don't do the
> > > > > > + * single stepping.
> > > > > > + */
> > > > > > + if (orig_ip != instruction_pointer(regs))
> > > > > > + return 1;
> > > > > > + if (!ret)
> > > > > > + return 0;
> > > > > > + }
> > > > > > +
> > > > > > + head = this_cpu_ptr(call->perf_events);
> > > > > > + if (hlist_empty(head))
> > > > > > + return 0;
> > > > > > +
> > > > > > + dsize = __get_data_size(&tf->tp, regs);
> > > > > > + __size = sizeof(*entry) + tf->tp.size + dsize;
> > > > > > + size = ALIGN(__size + sizeof(u32), sizeof(u64));
> > > > > > + size -= sizeof(u32);
> > > > > > +
> > > > > > + entry = perf_trace_buf_alloc(size, NULL, &rctx);
> > > > > > + if (!entry)
> > > > > > + return 0;
> > > > > > +
> > > > > > + entry->ip = entry_ip;
> > > > > > + memset(&entry[1], 0, dsize);
> > > > > > + store_trace_args(&entry[1], &tf->tp, regs, sizeof(*entry), dsize);
> > > > > > + perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
> > > > > > + head, NULL);
> > > > > > + return 0;
> > > > > > +}
> > > > > > +NOKPROBE_SYMBOL(fentry_perf_func);
> > > > > > +
> > > > > > +static void
> > > > > > +fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
> > > > > > + unsigned long ret_ip, struct pt_regs *regs)
> > > > > > +{
> > > > > > + struct trace_event_call *call = trace_probe_event_call(&tf->tp);
> > > > > > + struct fexit_trace_entry_head *entry;
> > > > > > + struct hlist_head *head;
> > > > > > + int size, __size, dsize;
> > > > > > + int rctx;
> > > > > > +
> > > > > > + if (bpf_prog_array_valid(call) && !trace_call_bpf(call, regs))
> > > > > > + return;
> > > > >
> > > > > Same here.
> > > > > These two parts look like copy-paste from kprobes.
> > > > > I suspect this code wasn't tested at all.
> > > >
> > > > OK, I missed to test that bpf part. I thought bpf could be appended to
> > > > any "trace-event" (looks like trace-event), isn't it?
> > >
> > > No. We're not applying bpf filtering to any random event
> > > that gets introduced in a tracing subsystem.
> > > fprobe falls into that category.
> > > Every hook where bpf can be invoked has to be thought through.
> > > That mental exercise didn't happen here.
> >
> > OK. Just out of curiousity, where is the "tracepoint" filter applied?
> > In the kernel (verifier?) or the userspace?
>
> Sorry. I don't understand the question.
> Are you talking about BPF_PROG_TYPE_TRACEPOINT or BPF_PROG_TYPE_RAW_TRACEPOINT ?

I thought that you filtered the available events by name, but I found
that perf_event_set_bpf_prog() checks the TRACE_EVENT_FL_* flags and
their combinations. Yeah, in that case this new fprobe event introduces
TRACE_EVENT_FL_FPROBE and bpf will reject it.

OK, let me remove the BPF support from this series. I think the fprobe
event can be used the same way as kprobe events, but I have a plan to
change it to support fprobe on a wider range of architectures. That will
require a somewhat different way of getting the register values.

Thank you,


--
Masami Hiramatsu (Google) <[email protected]>

2023-04-24 07:41:20

by Jiri Olsa

Subject: Re: [PATCH v5 4/9] tracing/probes: Add tracepoint support on fprobe_event

On Sun, Apr 23, 2023 at 10:37:40PM +0900, Masami Hiramatsu wrote:
> On Sun, 23 Apr 2023 09:41:54 +0200
> Jiri Olsa <[email protected]> wrote:
>
> > On Thu, Apr 20, 2023 at 08:26:10PM +0900, Masami Hiramatsu (Google) wrote:
> > > From: Masami Hiramatsu (Google) <[email protected]>
> > >
> > > Allow fprobe_event to trace raw tracepoints so that user can trace
> > > tracepoints which don't have traceevent wrappers. This new event is
> > > always available if fprobe event is enabled since the tracepoint is
> > > disabled, trace-event and dynamic event is also not available.
> >
> > I thought of ftrace tracepoints wrappers as standard in distros,
> > could you specify which config options that involves?
>
> Ah, sorry, I completely confused you.
>
> ----
> This new event is always available if the fprobe event is enabled. If the
> tracepoint is disabled, trace-events and dynamic events including
> fprobe are also not available.
> ----

so this basically adds another way of attaching to tracepoints through the
ftrace dynamic_events file interface... but we can already attach to
tracepoints through ftrace via /sys/kernel/debug/tracing/events/*, right?

I'm trying to find out in which config this new way of attaching to
tracepoints is useful

thanks,
jirka
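For comparison, the two attach paths look roughly like this (the tracepoint
and event names are just examples, and the 't' command syntax follows the
tprobe format described in this series' cover letter):

```shell
# Existing interface: enable the static trace-event for a tracepoint
echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable

# New interface from this series: create a dynamic tprobe event on the
# raw tracepoint, then enable it like any other event
echo 't:mygrp/myswitch sched_switch' >> /sys/kernel/tracing/dynamic_events
echo 1 > /sys/kernel/tracing/events/mygrp/myswitch/enable
```

The difference matters for tracepoints that have no trace-event wrapper
(e.g. the sched_*_tp probe points mentioned in the cover letter), which the
events/* directory does not expose at all.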

>
> >
> > > + if (trace_fprobe_is_tracepoint(tf)) {
> > > + struct tracepoint *tpoint = tf->tpoint;
> > > + unsigned long ip = (unsigned long)tpoint->probestub;
> > > + /*
> > > + * Here, we do 2 steps to enable fprobe on a tracepoint.
> > > + * At first, put __probestub_##TP function on the tracepoint
> > > + * and put a fprobe on the stub function.
> > > + */
> > > + ret = tracepoint_probe_register_prio_may_exist(tpoint,
> > > + tpoint->probestub, NULL, 0);
> > > + if (ret < 0)
> > > + return ret;
> > > + return register_fprobe_ips(&tf->fp, &ip, 1);
> >
> > nice idea
>
> Thanks!
>
>
> >
> > jirka
> >
> > > + }
> > > +
> > > /* TODO: handle filter, nofilter or symbol list */
> > > return register_fprobe(&tf->fp, tf->symbol, NULL);
> > > }
> > > @@ -699,6 +723,12 @@ static void __unregister_trace_fprobe(struct trace_fprobe *tf)
> > > if (trace_fprobe_is_registered(tf)) {
> > > unregister_fprobe(&tf->fp);
> > > memset(&tf->fp, 0, sizeof(tf->fp));
> > > + if (trace_fprobe_is_tracepoint(tf)) {
> > > + tracepoint_probe_unregister(tf->tpoint,
> > > + tf->tpoint->probestub, NULL);
> > > + tf->tpoint = NULL;
> > > + tf->mod = NULL;
> > > + }
> > > }
> > > }
> >
> > SNIP
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>