2024-04-15 12:49:18

by Masami Hiramatsu

Subject: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

Hi,

Here is the 9th version of the series to re-implement fprobe on top of
the function-graph tracer. The previous version is:

https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/

This version is ported to the latest kernel (v6.9-rc3 + probes/for-next),
fixes some bugs, and adds a performance optimization patch [36/36].
- [12/36] Fix to clear the fgraph_array entry on registration failure, and
return -ENOSPC when fgraph_array is full.
- [28/36] Add new store_fprobe_entry_data() for fprobe.
- [31/36] Remove DIV_ROUND_UP() and fix the entry data address calculation.
- [36/36] Add a new flag to skip timestamp recording.

Overview
--------
This series makes two major changes: it enables multiple function-graph
tracers on ftrace (e.g. allows function-graph on sub-instances), and it
rewrites fprobe on top of this function-graph.

The former change was sent by Steven Rostedt 4 years ago (*). It allows
users to configure the function-graph tracer (and other tracers based on
function-graph) differently in each trace instance at the same time.

(*) https://lore.kernel.org/all/[email protected]/

The purposes of the latter change are:

1) Remove the dependency on rethook from fprobe so that we can reduce
the return hook code and shadow stacks.

2) Make 'ftrace_regs' the common trace interface for the function
boundary.

1) Currently we have 2 (or 3) different function return hook
implementations: the function-graph tracer and rethook (and the legacy
kretprobe). Since this is redundant and doubles the maintenance cost,
I would like to unify them. From the user's viewpoint, the
function-graph tracer is very useful for grasping the execution path.
For this purpose, it is hard to use rethook in the function-graph
tracer, but the opposite is possible. (Strictly speaking, kretprobe
cannot use it because it requires 'pt_regs' for historical reasons.)

2) Currently fprobe passes 'pt_regs' to its handlers, but that is
the wrong interface for function entry and exit. Moreover, depending on
the architecture, there is no way to accurately reproduce 'pt_regs'
outside of interrupt or exception handlers. This means fprobe should
not use 'pt_regs', because it does not go through such exceptions.
(Conversely, kprobe should use 'pt_regs' because it is an abstract
interface to the software breakpoint exception.)

This series changes fprobe to use the function-graph tracer for tracing
function entry and exit, instead of a mixture of ftrace and rethook.
Unlike rethook, which is a per-task list of system-wide allocated
nodes, the function graph's ret_stack is a per-task shadow stack.
Thus there is no need to set 'nr_maxactive' (the number of
pre-allocated nodes).
Also, the handlers will get 'ftrace_regs' instead of 'pt_regs'.
Since eBPF multi_kprobe/multi_kretprobe events still use 'pt_regs' as
their register interface, this series converts 'ftrace_regs' to
'pt_regs' for them. Of course this conversion produces an incomplete
'pt_regs', so users must access only the registers for function
parameters or the return value.
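
To illustrate why the converted 'pt_regs' is incomplete, here is a minimal
userspace sketch. It does not reproduce ftrace_partial_regs() from this
series. The struct names, the field layout and to_full_regs() are all
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <string.h>

#define NR_ARGS 6

/* Hypothetical stand-in for 'ftrace_regs': only what a function boundary provides. */
struct partial_regs {
	unsigned long args[NR_ARGS];
	unsigned long retval;
	unsigned long sp;
	unsigned long ip;
};

/* Hypothetical stand-in for 'pt_regs': a full exception-time register frame. */
struct full_regs {
	unsigned long args[NR_ARGS];
	unsigned long retval;
	unsigned long sp;
	unsigned long ip;
	unsigned long flags;		/* never captured at a function boundary */
	unsigned long callee_saved[6];	/* never captured at a function boundary */
};

/* Copy the registers we do have; everything else is zeroed, not recovered. */
static struct full_regs *to_full_regs(const struct partial_regs *in,
				      struct full_regs *out)
{
	memset(out, 0, sizeof(*out));
	memcpy(out->args, in->args, sizeof(out->args));
	out->retval = in->retval;
	out->sp = in->sp;
	out->ip = in->ip;
	return out;
}
```

In this model a handler that reads 'flags' or a callee-saved register gets
zero rather than the real value, which is why only the parameter and
return-value registers should be trusted.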

Design
------
Instead of using ftrace's function entry hook directly, the new fprobe
is built on top of the function-graph's entry and return callbacks
with 'ftrace_regs'.

Since fprobe requires access to 'ftrace_regs', the architecture
must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
CONFIG_HAVE_FTRACE_GRAPH_FUNC, which allow the function-graph
entry callback to be called with 'ftrace_regs', and also
CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
return_to_handler.

All fprobes share a single function-graph ops (which means they share a
common ftrace filter), similar to kprobe-on-ftrace. This needs another
layer to find the corresponding fprobe in the common function-graph
callbacks, but it scales much better, since the number of
registered function-graph ops is limited.

In the entry callback, the fprobe runs its entry_handler and saves the
address of 'fprobe' on the function-graph's shadow stack as data. The
return callback decodes the data to get the 'fprobe' address, and runs
the exit_handler.

fprobe introduces two hash tables. One is for the entry callback, which
searches for the fprobes related to the function address passed by the
entry callback. The other is for the return callback, which checks
whether the given 'fprobe' data structure pointer is still valid. Note
that it is possible to unregister an fprobe before the return callback
runs, so the address must be validated before it is used in the return
callback.
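
The entry/return flow described above can be modelled in userspace as
follows. This is only a sketch under simplifying assumptions: one linear
array stands in for both hash tables, a global array stands in for the
per-task shadow stack, and at most one probe per function is handled. All
names here are hypothetical and none of them are kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROBES   8
#define SHADOW_DEPTH 32

struct probe {
	unsigned long func;	/* address of the probed function */
	int entry_hits;
	int exit_hits;
};

static struct probe *registered[MAX_PROBES];	/* stands in for both hash tables */
static struct probe *shadow[SHADOW_DEPTH];	/* stands in for the shadow stack */
static int shadow_top;

static int probe_register(struct probe *p)
{
	int i;

	for (i = 0; i < MAX_PROBES; i++) {
		if (!registered[i]) {
			registered[i] = p;
			return 0;
		}
	}
	return -1;
}

static void probe_unregister(struct probe *p)
{
	int i;

	for (i = 0; i < MAX_PROBES; i++)
		if (registered[i] == p)
			registered[i] = NULL;
}

static int probe_is_valid(const struct probe *p)
{
	int i;

	for (i = 0; i < MAX_PROBES; i++)
		if (registered[i] == p)
			return 1;
	return 0;
}

/* Entry side: find the probe by address, run it, save its pointer. */
static void on_entry(unsigned long func)
{
	struct probe *found = NULL;
	int i;

	for (i = 0; i < MAX_PROBES; i++) {
		if (registered[i] && registered[i]->func == func) {
			found = registered[i];
			break;
		}
	}
	if (found)
		found->entry_hits++;
	assert(shadow_top < SHADOW_DEPTH);
	shadow[shadow_top++] = found;
}

/* Return side: the saved pointer is validated before it is dereferenced. */
static void on_return(void)
{
	struct probe *p;

	assert(shadow_top > 0);
	p = shadow[--shadow_top];
	if (p && probe_is_valid(p))
		p->exit_hits++;
}
```

If a probe is unregistered between on_entry() and on_return(), the saved
pointer fails the validity check and the exit handler is skipped instead
of touching a stale object.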

This series can be applied against the probes/for-next branch, which
is based on v6.9-rc3.

This series can also be found in the branch below.

https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph

Thank you,

---

Masami Hiramatsu (Google) (21):
tracing: Add a comment about ftrace_regs definition
tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
x86: tracing: Add ftrace_regs definition in the header
function_graph: Use a simple LRU for fgraph_array index number
ftrace: Add multiple fgraph storage selftest
function_graph: Pass ftrace_regs to entryfunc
function_graph: Replace fgraph_ret_regs with ftrace_regs
function_graph: Pass ftrace_regs to retfunc
fprobe: Use ftrace_regs in fprobe entry handler
fprobe: Use ftrace_regs in fprobe exit handler
tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
tracing: Add ftrace_fill_perf_regs() for perf event
tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
fprobe: Rewrite fprobe on function-graph tracer
tracing/fprobe: Remove nr_maxactive from fprobe
selftests: ftrace: Remove obsolete maxactive syntax check
selftests/ftrace: Add a test case for repeating register/unregister fprobe
Documentation: probes: Update fprobe on function-graph tracer
fgraph: Skip recording calltime/rettime if it is not needed

Steven Rostedt (VMware) (15):
function_graph: Convert ret_stack to a series of longs
fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
function_graph: Add an array structure that will allow multiple callbacks
function_graph: Allow multiple users to attach to function graph
function_graph: Remove logic around ftrace_graph_entry and return
ftrace/function_graph: Pass fgraph_ops to function graph callbacks
ftrace: Allow function_graph tracer to be enabled in instances
ftrace: Allow ftrace startup flags exist without dynamic ftrace
function_graph: Have the instances use their own ftrace_ops for filtering
function_graph: Add "task variables" per task for fgraph_ops
function_graph: Move set_graph_function tests to shadow stack global var
function_graph: Move graph depth stored data to shadow stack global var
function_graph: Move graph notrace bit to shadow stack global var
function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
function_graph: Add selftest for passing local variables


Documentation/trace/fprobe.rst | 42 +
arch/arm64/Kconfig | 3
arch/arm64/include/asm/ftrace.h | 47 +
arch/arm64/kernel/asm-offsets.c | 12
arch/arm64/kernel/entry-ftrace.S | 32 -
arch/arm64/kernel/ftrace.c | 21
arch/loongarch/Kconfig | 4
arch/loongarch/include/asm/ftrace.h | 32 -
arch/loongarch/kernel/asm-offsets.c | 12
arch/loongarch/kernel/ftrace_dyn.c | 15
arch/loongarch/kernel/mcount.S | 17
arch/loongarch/kernel/mcount_dyn.S | 14
arch/powerpc/Kconfig | 1
arch/powerpc/include/asm/ftrace.h | 15
arch/powerpc/kernel/trace/ftrace.c | 3
arch/powerpc/kernel/trace/ftrace_64_pg.c | 10
arch/riscv/Kconfig | 3
arch/riscv/include/asm/ftrace.h | 21
arch/riscv/kernel/ftrace.c | 15
arch/riscv/kernel/mcount.S | 24
arch/s390/Kconfig | 3
arch/s390/include/asm/ftrace.h | 39 -
arch/s390/kernel/asm-offsets.c | 6
arch/s390/kernel/mcount.S | 9
arch/x86/Kconfig | 4
arch/x86/include/asm/ftrace.h | 43 -
arch/x86/kernel/ftrace.c | 51 +
arch/x86/kernel/ftrace_32.S | 15
arch/x86/kernel/ftrace_64.S | 17
include/linux/fprobe.h | 57 +
include/linux/ftrace.h | 170 +++
include/linux/sched.h | 2
include/linux/trace_recursion.h | 39 -
kernel/trace/Kconfig | 23
kernel/trace/bpf_trace.c | 14
kernel/trace/fgraph.c | 1005 ++++++++++++++++----
kernel/trace/fprobe.c | 637 +++++++++----
kernel/trace/ftrace.c | 13
kernel/trace/ftrace_internal.h | 2
kernel/trace/trace.h | 96 ++
kernel/trace/trace_fprobe.c | 147 ++-
kernel/trace/trace_functions.c | 8
kernel/trace/trace_functions_graph.c | 98 +-
kernel/trace/trace_irqsoff.c | 12
kernel/trace/trace_probe_tmpl.h | 2
kernel/trace/trace_sched_wakeup.c | 12
kernel/trace/trace_selftest.c | 262 +++++
lib/test_fprobe.c | 51 -
samples/fprobe/fprobe_example.c | 4
.../test.d/dynevent/add_remove_fprobe_repeat.tc | 19
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 4
51 files changed, 2325 insertions(+), 882 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc

--
Masami Hiramatsu (Google) <[email protected]>


2024-04-15 12:50:23

by Masami Hiramatsu

Subject: [PATCH v9 01/36] tracing: Add a comment about ftrace_regs definition

From: Masami Hiramatsu (Google) <[email protected]>

To clarify what is expected of ftrace_regs, add a comment to the
architecture-independent definition of ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
Changes in v8:
- Update that the saved registers depends on the context.
Changes in v3:
- Add instruction pointer
Changes in v2:
- newly added.
---
include/linux/ftrace.h | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 54d53f345d14..b81f1afa82a1 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -118,6 +118,32 @@ extern int ftrace_enabled;

#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS

+/**
+ * ftrace_regs - ftrace partial/optimal register set
+ *
+ * ftrace_regs represents a group of registers which is used at the
+ * function entry and exit. There are three types of registers.
+ *
+ * - Registers for passing the parameters to callee, including the stack
+ * pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
+ * - Registers for passing the return values to caller.
+ * (e.g. rax and rdx on x86_64)
+ * - Registers for hooking the function call and return including the
+ * frame pointer (the frame pointer is architecture/config dependent)
+ * (e.g. rip, rbp and rsp for x86_64)
+ *
+ * Also, architecture dependent fields can be used for internal process.
+ * (e.g. orig_ax on x86_64)
+ *
+ * On the function entry, those registers will be restored except for
+ * the stack pointer, so that user can change the function parameters
+ * and instruction pointer (e.g. live patching.)
+ * On the function exit, only registers which is used for return values
+ * are restored.
+ *
+ * NOTE: user *must not* access regs directly, only do it via APIs, because
+ * the member can be changed according to the architecture.
+ */
struct ftrace_regs {
struct pt_regs regs;
};


2024-04-15 12:52:31

by Masami Hiramatsu

Subject: [PATCH v9 02/36] tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value

From: Masami Hiramatsu (Google) <[email protected]>

Rename ftrace_regs_return_value to ftrace_regs_get_return_value to match
the other ftrace_regs_get/set_* APIs.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Mark Rutland <[email protected]>
---
Changes in v6:
- Moved to top of the series.
Changes in v3:
- Newly added.
---
arch/loongarch/include/asm/ftrace.h | 2 +-
arch/powerpc/include/asm/ftrace.h | 2 +-
arch/s390/include/asm/ftrace.h | 2 +-
arch/x86/include/asm/ftrace.h | 2 +-
include/linux/ftrace.h | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/loongarch/include/asm/ftrace.h b/arch/loongarch/include/asm/ftrace.h
index de891c2c83d4..b43acfc5776c 100644
--- a/arch/loongarch/include/asm/ftrace.h
+++ b/arch/loongarch/include/asm/ftrace.h
@@ -70,7 +70,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, unsigned long ip)
regs_get_kernel_argument(&(fregs)->regs, n)
#define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
#define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 107fc5a48456..cfec6c5a47d0 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -61,7 +61,7 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs)
regs_get_kernel_argument(&(fregs)->regs, n)
#define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
#define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 621f23d5ae30..1912b598d1b8 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -88,7 +88,7 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
regs_get_kernel_argument(&(fregs)->regs, n)
#define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
#define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 897cf02c20b1..cf88cc8cc74d 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -58,7 +58,7 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
regs_get_kernel_argument(&(fregs)->regs, n)
#define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(&(fregs)->regs)
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(&(fregs)->regs)
#define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(&(fregs)->regs, ret)
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index b81f1afa82a1..d5df5f8fc35a 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -184,7 +184,7 @@ static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs)
regs_get_kernel_argument(ftrace_get_regs(fregs), n)
#define ftrace_regs_get_stack_pointer(fregs) \
kernel_stack_pointer(ftrace_get_regs(fregs))
-#define ftrace_regs_return_value(fregs) \
+#define ftrace_regs_get_return_value(fregs) \
regs_return_value(ftrace_get_regs(fregs))
#define ftrace_regs_set_return_value(fregs, ret) \
regs_set_return_value(ftrace_get_regs(fregs), ret)


2024-04-15 12:53:25

by Masami Hiramatsu

Subject: [PATCH v9 03/36] x86: tracing: Add ftrace_regs definition in the header

From: Masami Hiramatsu (Google) <[email protected]>

Add the ftrace_regs definition for x86_64 in the ftrace header to
clarify which registers are accessible from ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v3:
- Add rip to be saved.
Changes in v2:
- Newly added.
---
arch/x86/include/asm/ftrace.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index cf88cc8cc74d..c88bf47f46da 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -36,6 +36,12 @@ static inline unsigned long ftrace_call_adjust(unsigned long addr)

#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
struct ftrace_regs {
+ /*
+ * On the x86_64, the ftrace_regs saves;
+ * rax, rcx, rdx, rdi, rsi, r8, r9, rbp, rip and rsp.
+ * Also orig_ax is used for passing direct trampoline address.
+ * x86_32 doesn't support ftrace_regs.
+ */
struct pt_regs regs;
};



2024-04-15 12:56:51

by Masami Hiramatsu

Subject: [PATCH v9 05/36] fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long

From: Steven Rostedt (VMware) <[email protected]>

Instead of using "ALIGN()", use BUILD_BUG_ON() as the structures should
always be divisible by sizeof(long).
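
As a rough userspace analogue of this idea, C11 _Static_assert can play
the role of BUILD_BUG_ON(). The struct layout, the 4096-byte stack size
and the align_up() helper below are assumptions made for the example, not
the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ret_stack; the real layout differs. */
struct ret_frame {
	unsigned long ret;
	unsigned long func;
	unsigned long long calltime;
};

#define STACK_BYTES 4096	/* assumed stand-in for SHADOW_STACK_SIZE */
#define FRAME_BYTES sizeof(struct ret_frame)

/* The build fails here if either size is not a whole number of longs. */
_Static_assert(STACK_BYTES % sizeof(long) == 0, "stack size must be a multiple of long");
_Static_assert(FRAME_BYTES % sizeof(long) == 0, "frame size must be a multiple of long");

/* With divisibility guaranteed at build time, plain division is exact. */
#define FRAME_WORDS (FRAME_BYTES / sizeof(long))
#define STACK_WORDS (STACK_BYTES / sizeof(long))

/* Round x up to a multiple of a, used only to compare with the old ALIGN() form. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) / a * a;
}
```

Once the assertions hold, rounding up before dividing gives the same
result as dividing directly, so the rounding step adds nothing.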

Link: http://lkml.kernel.org/r/[email protected]

Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v7:
- Use DIV_ROUND_UP() to calculate FGRAPH_RET_INDEX
---
kernel/trace/fgraph.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 30edeb6d4aa9..6f8d36370994 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -26,10 +26,9 @@
#endif

#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
-#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
#define SHADOW_STACK_SIZE (PAGE_SIZE)
-#define SHADOW_STACK_INDEX \
- (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
/* Leave on a buffer at the end */
#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)

@@ -91,6 +90,8 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
if (!current->ret_stack)
return -EBUSY;

+ BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));
+
/*
* We must make sure the ret_stack is tested before we read
* anything else.
@@ -325,6 +326,8 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
{
int index = task->curr_ret_stack;

+ BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
index -= FGRAPH_RET_INDEX * (idx + 1);
if (index < 0)
return NULL;


2024-04-15 12:56:56

by Masami Hiramatsu

Subject: [PATCH v9 04/36] function_graph: Convert ret_stack to a series of longs

From: Steven Rostedt (VMware) <[email protected]>

In order to make it possible to have multiple callbacks registered with the
function_graph tracer, the retstack needs to be converted from an array of
ftrace_ret_stack structures to an array of longs. This will allow the list
of callbacks to be stored on the stack for the return side of the functions.
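
The indexing scheme can be sketched in userspace as below. This is a
simplified model, not the kernel code: 'struct frame', the 64-word stack
and the push/pop/peek helpers are hypothetical, and there are no memory
barriers or per-task state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ret_stack. */
struct frame {
	unsigned long ret;
	unsigned long func;
};

#define FRAME_WORDS ((int)(sizeof(struct frame) / sizeof(unsigned long)))
#define STACK_WORDS 64
/* Leave one frame of headroom at the end, as the patch does. */
#define MAX_INDEX   (STACK_WORDS - FRAME_WORDS)

_Static_assert(sizeof(struct frame) % sizeof(unsigned long) == 0,
	       "frame must be a whole number of words");

static unsigned long stack_words[STACK_WORDS];
static int top;		/* index of the next free word; 0 means empty */

/* View the words starting at index i as one frame. */
#define FRAME_AT(i) ((struct frame *)&stack_words[i])

static int push(unsigned long ret, unsigned long func)
{
	struct frame *f;

	if (top >= MAX_INDEX)
		return -1;	/* stack is full */
	f = FRAME_AT(top);
	top += FRAME_WORDS;
	f->ret = ret;
	f->func = func;
	return 0;
}

static int pop(struct frame *out)
{
	if (top < FRAME_WORDS)
		return -1;	/* stack is empty */
	top -= FRAME_WORDS;
	*out = *FRAME_AT(top);
	return 0;
}

/* idx 0 is the most recent frame, idx 1 the one before it, and so on. */
static struct frame *peek(int idx)
{
	int index = top - FRAME_WORDS * (idx + 1);

	return index < 0 ? NULL : FRAME_AT(index);
}
```

Because the index counts words rather than frames, an empty stack is
represented by 0 instead of -1, which matches the initial value change
for curr_ret_stack in this patch.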

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
include/linux/sched.h | 2 -
kernel/trace/fgraph.c | 124 ++++++++++++++++++++++++++++---------------------
2 files changed, 71 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3c2abbc587b4..e453ad8d2d79 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1396,7 +1396,7 @@ struct task_struct {
int curr_ret_depth;

/* Stack of return addresses for return function tracing: */
- struct ftrace_ret_stack *ret_stack;
+ unsigned long *ret_stack;

/* Timestamp for last schedule: */
unsigned long long ftrace_timestamp;
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index c83c005e654e..30edeb6d4aa9 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -25,6 +25,18 @@
#define ASSIGN_OPS_HASH(opsname, val)
#endif

+#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
+#define FGRAPH_RET_INDEX (ALIGN(FGRAPH_RET_SIZE, sizeof(long)) / sizeof(long))
+#define SHADOW_STACK_SIZE (PAGE_SIZE)
+#define SHADOW_STACK_INDEX \
+ (ALIGN(SHADOW_STACK_SIZE, sizeof(long)) / sizeof(long))
+/* Leave on a buffer at the end */
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+
+#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
+#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
+#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })
+
DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
int ftrace_graph_active;

@@ -69,6 +81,7 @@ static int
ftrace_push_return_trace(unsigned long ret, unsigned long func,
unsigned long frame_pointer, unsigned long *retp)
{
+ struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
int index;

@@ -85,23 +98,25 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
smp_rmb();

/* The return trace stack is full */
- if (current->curr_ret_stack == FTRACE_RETFUNC_DEPTH - 1) {
+ if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
atomic_inc(&current->trace_overrun);
return -EBUSY;
}

calltime = trace_clock_local();

- index = ++current->curr_ret_stack;
+ index = current->curr_ret_stack;
+ RET_STACK_INC(current->curr_ret_stack);
+ ret_stack = RET_STACK(current, index);
barrier();
- current->ret_stack[index].ret = ret;
- current->ret_stack[index].func = func;
- current->ret_stack[index].calltime = calltime;
+ ret_stack->ret = ret;
+ ret_stack->func = func;
+ ret_stack->calltime = calltime;
#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
- current->ret_stack[index].fp = frame_pointer;
+ ret_stack->fp = frame_pointer;
#endif
#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
- current->ret_stack[index].retp = retp;
+ ret_stack->retp = retp;
#endif
return 0;
}
@@ -148,7 +163,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,

return 0;
out_ret:
- current->curr_ret_stack--;
+ RET_STACK_DEC(current->curr_ret_stack);
out:
current->curr_ret_depth--;
return -EBUSY;
@@ -159,11 +174,13 @@ static void
ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
unsigned long frame_pointer)
{
+ struct ftrace_ret_stack *ret_stack;
int index;

index = current->curr_ret_stack;
+ RET_STACK_DEC(index);

- if (unlikely(index < 0 || index >= FTRACE_RETFUNC_DEPTH)) {
+ if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
ftrace_graph_stop();
WARN_ON(1);
/* Might as well panic, otherwise we have no where to go */
@@ -171,6 +188,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
return;
}

+ ret_stack = RET_STACK(current, index);
#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
/*
* The arch may choose to record the frame pointer used
@@ -186,22 +204,22 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
* Note, -mfentry does not use frame pointers, and this test
* is not needed if CC_USING_FENTRY is set.
*/
- if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
+ if (unlikely(ret_stack->fp != frame_pointer)) {
ftrace_graph_stop();
WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
" from func %ps return to %lx\n",
current->ret_stack[index].fp,
frame_pointer,
- (void *)current->ret_stack[index].func,
- current->ret_stack[index].ret);
+ (void *)ret_stack->func,
+ ret_stack->ret);
*ret = (unsigned long)panic;
return;
}
#endif

- *ret = current->ret_stack[index].ret;
- trace->func = current->ret_stack[index].func;
- trace->calltime = current->ret_stack[index].calltime;
+ *ret = ret_stack->ret;
+ trace->func = ret_stack->func;
+ trace->calltime = ret_stack->calltime;
trace->overrun = atomic_read(&current->trace_overrun);
trace->depth = current->curr_ret_depth--;
/*
@@ -262,7 +280,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
* curr_ret_stack is after that.
*/
barrier();
- current->curr_ret_stack--;
+ RET_STACK_DEC(current->curr_ret_stack);

if (unlikely(!ret)) {
ftrace_graph_stop();
@@ -305,12 +323,13 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
struct ftrace_ret_stack *
ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
{
- idx = task->curr_ret_stack - idx;
+ int index = task->curr_ret_stack;

- if (idx >= 0 && idx <= task->curr_ret_stack)
- return &task->ret_stack[idx];
+ index -= FGRAPH_RET_INDEX * (idx + 1);
+ if (index < 0)
+ return NULL;

- return NULL;
+ return RET_STACK(task, index);
}

/**
@@ -332,18 +351,20 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp)
{
+ struct ftrace_ret_stack *ret_stack;
int index = task->curr_ret_stack;
int i;

if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
return ret;

- if (index < 0)
- return ret;
+ RET_STACK_DEC(index);

- for (i = 0; i <= index; i++)
- if (task->ret_stack[i].retp == retp)
- return task->ret_stack[i].ret;
+ for (i = index; i >= 0; RET_STACK_DEC(i)) {
+ ret_stack = RET_STACK(task, i);
+ if (ret_stack->retp == retp)
+ return ret_stack->ret;
+ }

return ret;
}
@@ -357,14 +378,15 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
return ret;

task_idx = task->curr_ret_stack;
+ RET_STACK_DEC(task_idx);

if (!task->ret_stack || task_idx < *idx)
return ret;

task_idx -= *idx;
- (*idx)++;
+ RET_STACK_INC(*idx);

- return task->ret_stack[task_idx].ret;
+ return RET_STACK(task, task_idx);
}
#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */

@@ -402,7 +424,7 @@ trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;

/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
-static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
+static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
{
int i;
int ret = 0;
@@ -410,10 +432,7 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)
struct task_struct *g, *t;

for (i = 0; i < FTRACE_RETSTACK_ALLOC_SIZE; i++) {
- ret_stack_list[i] =
- kmalloc_array(FTRACE_RETFUNC_DEPTH,
- sizeof(struct ftrace_ret_stack),
- GFP_KERNEL);
+ ret_stack_list[i] = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
if (!ret_stack_list[i]) {
start = 0;
end = i;
@@ -431,9 +450,9 @@ static int alloc_retstack_tasklist(struct ftrace_ret_stack **ret_stack_list)

if (t->ret_stack == NULL) {
atomic_set(&t->trace_overrun, 0);
- t->curr_ret_stack = -1;
+ t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
- /* Make sure the tasks see the -1 first: */
+ /* Make sure the tasks see the 0 first: */
smp_wmb();
t->ret_stack = ret_stack_list[start++];
}
@@ -453,6 +472,7 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
struct task_struct *next,
unsigned int prev_state)
{
+ struct ftrace_ret_stack *ret_stack;
unsigned long long timestamp;
int index;

@@ -477,8 +497,11 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
*/
timestamp -= next->ftrace_timestamp;

- for (index = next->curr_ret_stack; index >= 0; index--)
- next->ret_stack[index].calltime += timestamp;
+ for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
+ ret_stack = RET_STACK(next, index);
+ ret_stack->calltime += timestamp;
+ index -= FGRAPH_RET_INDEX;
+ }
}

static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
@@ -521,10 +544,10 @@ void update_function_graph_func(void)
ftrace_graph_entry = __ftrace_graph_entry;
}

-static DEFINE_PER_CPU(struct ftrace_ret_stack *, idle_ret_stack);
+static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);

static void
-graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
+graph_init_task(struct task_struct *t, unsigned long *ret_stack)
{
atomic_set(&t->trace_overrun, 0);
t->ftrace_timestamp = 0;
@@ -539,7 +562,7 @@ graph_init_task(struct task_struct *t, struct ftrace_ret_stack *ret_stack)
*/
void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
{
- t->curr_ret_stack = -1;
+ t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
/*
* The idle task has no parent, it either has its own
@@ -549,14 +572,11 @@ void ftrace_graph_init_idle_task(struct task_struct *t, int cpu)
WARN_ON(t->ret_stack != per_cpu(idle_ret_stack, cpu));

if (ftrace_graph_active) {
- struct ftrace_ret_stack *ret_stack;
+ unsigned long *ret_stack;

ret_stack = per_cpu(idle_ret_stack, cpu);
if (!ret_stack) {
- ret_stack =
- kmalloc_array(FTRACE_RETFUNC_DEPTH,
- sizeof(struct ftrace_ret_stack),
- GFP_KERNEL);
+ ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
if (!ret_stack)
return;
per_cpu(idle_ret_stack, cpu) = ret_stack;
@@ -570,15 +590,13 @@ void ftrace_graph_init_task(struct task_struct *t)
{
/* Make sure we do not use the parent ret_stack */
t->ret_stack = NULL;
- t->curr_ret_stack = -1;
+ t->curr_ret_stack = 0;
t->curr_ret_depth = -1;

if (ftrace_graph_active) {
- struct ftrace_ret_stack *ret_stack;
+ unsigned long *ret_stack;

- ret_stack = kmalloc_array(FTRACE_RETFUNC_DEPTH,
- sizeof(struct ftrace_ret_stack),
- GFP_KERNEL);
+ ret_stack = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);
if (!ret_stack)
return;
graph_init_task(t, ret_stack);
@@ -587,7 +605,7 @@ void ftrace_graph_init_task(struct task_struct *t)

void ftrace_graph_exit_task(struct task_struct *t)
{
- struct ftrace_ret_stack *ret_stack = t->ret_stack;
+ unsigned long *ret_stack = t->ret_stack;

t->ret_stack = NULL;
/* NULL must become visible to IRQs before we free it: */
@@ -599,12 +617,10 @@ void ftrace_graph_exit_task(struct task_struct *t)
/* Allocate a return stack for each task */
static int start_graph_tracing(void)
{
- struct ftrace_ret_stack **ret_stack_list;
+ unsigned long **ret_stack_list;
int ret, cpu;

- ret_stack_list = kmalloc_array(FTRACE_RETSTACK_ALLOC_SIZE,
- sizeof(struct ftrace_ret_stack *),
- GFP_KERNEL);
+ ret_stack_list = kmalloc(SHADOW_STACK_SIZE, GFP_KERNEL);

if (!ret_stack_list)
return -ENOMEM;


2024-04-15 12:58:44

by Masami Hiramatsu

Subject: [PATCH v9 06/36] function_graph: Add an array structure that will allow multiple callbacks

From: Steven Rostedt (VMware) <[email protected]>

Add an array structure that will eventually allow the function graph tracer
to have up to 16 simultaneous callbacks attached. It's an array of 16
fgraph_ops pointers, each of which is assigned when an fgraph_ops is
registered. On entry to a function, the entry callback of the first item in
the array is called. It returns non-zero if it wants the return callback to
be called on exit of the function, and zero otherwise.

The array will simplify the process of having more than one callback
attached to the same function, as its index into the array can be stored on
the shadow stack. We need to save only the index, because this will allow
the fgraph_ops to be freed before the function returns (which may happen if
the function calls schedule() and sleeps for a long time).
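
A minimal userspace sketch of the slot array follows, assuming that
unregistering simply puts the stub back into the slot (the unregister path
is not part of this description, so that is a guess). 'struct ops' and all
function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_COUNT 16

struct ops {
	int (*entry)(unsigned long func);
	void (*ret)(unsigned long func);
};

/* The stub does nothing, so a slot can always be called safely. */
static int stub_entry(unsigned long func) { (void)func; return 0; }
static void stub_ret(unsigned long func) { (void)func; }
static struct ops stub = { stub_entry, stub_ret };

static struct ops *slots[SLOT_COUNT];

/* Returns the slot index on success, -1 if every slot is taken. */
static int ops_register(struct ops *o)
{
	int i;

	/* The array must always hold real callbacks, never NULL. */
	if (!slots[0])
		for (i = 0; i < SLOT_COUNT; i++)
			slots[i] = &stub;

	/* Look for an available spot. */
	for (i = 0; i < SLOT_COUNT; i++)
		if (slots[i] == &stub)
			break;
	if (i >= SLOT_COUNT)
		return -1;

	slots[i] = o;
	return i;
}

/* Assumption: unregistering restores the stub in that slot. */
static void ops_unregister(int idx)
{
	slots[idx] = &stub;
}

/* Sample callbacks used to observe which slot really ran. */
static int ret_calls;
static int sample_entry(unsigned long func) { (void)func; return 1; }
static void sample_ret(unsigned long func) { (void)func; ret_calls++; }
static struct ops sample = { sample_entry, sample_ret };
```

A return path that saved only the index can still call slots[idx]->ret()
after the owner has gone away: it reaches the stub rather than freed
memory.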

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Remove unneeded brace.
---
kernel/trace/fgraph.c | 114 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 81 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6f8d36370994..3f9dd213e7d8 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -39,6 +39,11 @@
DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
int ftrace_graph_active;

+static int fgraph_array_cnt;
+#define FGRAPH_ARRAY_SIZE 16
+
+static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+
/* Both enabled by default (can be cleared by function_graph tracer flags */
static bool fgraph_sleep_time = true;

@@ -62,6 +67,20 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
}
#endif

+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+{
+ return 0;
+}
+
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+{
+}
+
+static struct fgraph_ops fgraph_stub = {
+ .entryfunc = ftrace_graph_entry_stub,
+ .retfunc = ftrace_graph_ret_stub,
+};
+
/**
* ftrace_graph_stop - set to permanently disable function graph tracing
*
@@ -159,7 +178,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
goto out;

/* Only trace if the calling function expects to */
- if (!ftrace_graph_entry(&trace))
+ if (!fgraph_array[0]->entryfunc(&trace))
goto out_ret;

return 0;
@@ -274,7 +293,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
trace.retval = fgraph_ret_regs_return_value(ret_regs);
#endif
trace.rettime = trace_clock_local();
- ftrace_graph_return(&trace);
+ fgraph_array[0]->retfunc(&trace);
/*
* The ftrace_graph_return() may still access the current
* ret_stack structure, we need to make sure the update of
@@ -410,11 +429,6 @@ void ftrace_graph_sleep_time_control(bool enable)
fgraph_sleep_time = enable;
}

-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
-{
- return 0;
-}
-
/*
* Simply points to ftrace_stub, but with the proper protocol.
* Defined by the linker script in linux/vmlinux.lds.h
@@ -652,37 +666,54 @@ static int start_graph_tracing(void)
int register_ftrace_graph(struct fgraph_ops *gops)
{
int ret = 0;
+ int i;

mutex_lock(&ftrace_lock);

- /* we currently allow only one tracer registered at a time */
- if (ftrace_graph_active) {
+ if (!fgraph_array[0]) {
+ /* The array must always have real data on it */
+ for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
+ fgraph_array[i] = &fgraph_stub;
+ }
+
+ /* Look for an available spot */
+ for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+ if (fgraph_array[i] == &fgraph_stub)
+ break;
+ }
+ if (i >= FGRAPH_ARRAY_SIZE) {
ret = -EBUSY;
goto out;
}

- register_pm_notifier(&ftrace_suspend_notifier);
+ fgraph_array[i] = gops;
+ if (i + 1 > fgraph_array_cnt)
+ fgraph_array_cnt = i + 1;

ftrace_graph_active++;
- ret = start_graph_tracing();
- if (ret) {
- ftrace_graph_active--;
- goto out;
- }

- ftrace_graph_return = gops->retfunc;
+ if (ftrace_graph_active == 1) {
+ register_pm_notifier(&ftrace_suspend_notifier);
+ ret = start_graph_tracing();
+ if (ret) {
+ ftrace_graph_active--;
+ goto out;
+ }
+
+ ftrace_graph_return = gops->retfunc;

- /*
- * Update the indirect function to the entryfunc, and the
- * function that gets called to the entry_test first. Then
- * call the update fgraph entry function to determine if
- * the entryfunc should be called directly or not.
- */
- __ftrace_graph_entry = gops->entryfunc;
- ftrace_graph_entry = ftrace_graph_entry_test;
- update_function_graph_func();
+ /*
+ * Update the indirect function to the entryfunc, and the
+ * function that gets called to the entry_test first. Then
+ * call the update fgraph entry function to determine if
+ * the entryfunc should be called directly or not.
+ */
+ __ftrace_graph_entry = gops->entryfunc;
+ ftrace_graph_entry = ftrace_graph_entry_test;
+ update_function_graph_func();

- ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+ ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+ }
out:
mutex_unlock(&ftrace_lock);
return ret;
@@ -690,19 +721,36 @@ int register_ftrace_graph(struct fgraph_ops *gops)

void unregister_ftrace_graph(struct fgraph_ops *gops)
{
+ int i;
+
mutex_lock(&ftrace_lock);

if (unlikely(!ftrace_graph_active))
goto out;

- ftrace_graph_active--;
- ftrace_graph_return = ftrace_stub_graph;
- ftrace_graph_entry = ftrace_graph_entry_stub;
- __ftrace_graph_entry = ftrace_graph_entry_stub;
- ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
- unregister_pm_notifier(&ftrace_suspend_notifier);
- unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+ for (i = 0; i < fgraph_array_cnt; i++)
+ if (gops == fgraph_array[i])
+ break;
+ if (i >= fgraph_array_cnt)
+ goto out;

+ fgraph_array[i] = &fgraph_stub;
+ if (i + 1 == fgraph_array_cnt) {
+ for (; i >= 0; i--)
+ if (fgraph_array[i] != &fgraph_stub)
+ break;
+ fgraph_array_cnt = i + 1;
+ }
+
+ ftrace_graph_active--;
+ if (!ftrace_graph_active) {
+ ftrace_graph_return = ftrace_stub_graph;
+ ftrace_graph_entry = ftrace_graph_entry_stub;
+ __ftrace_graph_entry = ftrace_graph_entry_stub;
+ ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
+ unregister_pm_notifier(&ftrace_suspend_notifier);
+ unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
+ }
out:
mutex_unlock(&ftrace_lock);
}


2024-04-15 13:00:52

by Masami Hiramatsu

Subject: [PATCH v9 07/36] function_graph: Allow multiple users to attach to function graph

From: Steven Rostedt (VMware) <[email protected]>

Allow for multiple users to attach to function graph tracer at the same
time. Only 16 simultaneous users can attach to the tracer. This is because
there's an array that stores the pointers to the attached fgraph_ops. When
a function being traced is entered, the entryfunc of each fgraph_ops is
called and if it returns non-zero, its index into the array will be added
to the shadow stack.

On exit of the function being traced, the shadow stack will contain the
indexes of the fgraph_ops on the array that want their retfunc to be
called.
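
A rough userspace sketch of how such a shadow-stack word can be packed
and read back. The bit layout (bits 0-9 offset, bits 10-11 type, bits
12-27 bitmap of array indexes) follows the comment this patch adds to
kernel/trace/fgraph.c. The demo_* names are hypothetical, and the offset
value used by a caller is arbitrary, since the real FGRAPH_RET_INDEX
depends on sizeof(struct ftrace_ret_stack).

```c
#include <assert.h>

#define DEMO_OFFSET_BITS	10
#define DEMO_TYPE_BITS		2
#define DEMO_BITMAP_BITS	16

#define DEMO_OFFSET_MASK	((1UL << DEMO_OFFSET_BITS) - 1)
#define DEMO_TYPE_MASK		((1UL << DEMO_TYPE_BITS) - 1)
#define DEMO_BITMAP_MASK	((1UL << DEMO_BITMAP_BITS) - 1)

#define DEMO_TYPE_SHIFT		DEMO_OFFSET_BITS
#define DEMO_BITMAP_SHIFT	(DEMO_TYPE_SHIFT + DEMO_TYPE_BITS)

#define DEMO_TYPE_BITMAP	1UL

/* Build one word: bitmap in bits 12-27, type in 10-11, offset in 0-9 */
static unsigned long demo_pack(unsigned long bitmap, unsigned long offset)
{
	assert(offset <= DEMO_OFFSET_MASK);
	assert(bitmap <= DEMO_BITMAP_MASK);
	return (bitmap << DEMO_BITMAP_SHIFT) |
	       (DEMO_TYPE_BITMAP << DEMO_TYPE_SHIFT) | offset;
}

static unsigned long demo_offset(unsigned long word)
{
	return word & DEMO_OFFSET_MASK;
}

static unsigned long demo_type(unsigned long word)
{
	return (word >> DEMO_TYPE_SHIFT) & DEMO_TYPE_MASK;
}

static unsigned long demo_bitmap(unsigned long word)
{
	return (word >> DEMO_BITMAP_SHIFT) & DEMO_BITMAP_MASK;
}

/* Number of retfuncs the return path would call for this word */
static int demo_count_ret_callbacks(unsigned long word)
{
	unsigned long bitmap = demo_bitmap(word);
	int i, n = 0;

	for (i = 0; i < DEMO_BITMAP_BITS; i++) {
		if (bitmap & (1UL << i))
			n++;
	}
	return n;
}
```

With slots 0 and 3 interested in the return, the bitmap is
BIT(3) | BIT(0), and the return path only has to walk the set bits
instead of storing one index per callback.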

Because a function may sleep for a long time (if a task sleeps itself),
the return of the function may be literally days later. If the fgraph_ops
is removed, its place on the array is replaced with an fgraph_ops that
contains the stub functions and that will be called when the function
finally returns.

If another fgraph_ops is added that happens to get the same index into the
array, its return function may be called. But that's actually the way
things currently work with the old function graph tracer. If one tracer is
removed and another is added, the new one will get the return calls of the
function traced by the previous one, thus this is not a regression. This
can be fixed by adding a counter that is incremented each time the array
item is updated and saving it on the shadow stack as well, such that the
return function won't be called if the counter saved does not match the
counter on the array.
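
The counter idea is not implemented by this patch. The following is only
a hypothetical userspace sketch of how it could look, with every demo_*
name invented for illustration: each slot carries a generation number
that is bumped on every update, and the saved (index, generation) pair
must still match when the function returns.

```c
#include <assert.h>

#define DEMO_SLOTS 16

struct demo_slot {
	void (*retfunc)(void);
	unsigned int gen;	/* bumped on every update of the slot */
};

/* What would be saved on the shadow stack at function entry */
struct demo_saved {
	int idx;
	unsigned int gen;
};

static struct demo_slot demo_slots[DEMO_SLOTS];

/* Install (or replace) the return callback of a slot */
static void demo_set_slot(int idx, void (*retfunc)(void))
{
	assert(idx >= 0 && idx < DEMO_SLOTS);
	demo_slots[idx].retfunc = retfunc;
	demo_slots[idx].gen++;
}

/* At function entry: remember the slot and its current generation */
static struct demo_saved demo_save(int idx)
{
	struct demo_saved s;

	assert(idx >= 0 && idx < DEMO_SLOTS);
	s.idx = idx;
	s.gen = demo_slots[idx].gen;
	return s;
}

/* At function return: returns 1 if the callback ran, 0 if it was skipped */
static int demo_call_if_current(struct demo_saved s)
{
	struct demo_slot *slot = &demo_slots[s.idx];

	if (slot->gen != s.gen || !slot->retfunc)
		return 0;
	slot->retfunc();
	return 1;
}

/* Two sample return callbacks so the behaviour can be observed */
static int demo_a_hits, demo_b_hits;
static void demo_ret_a(void) { demo_a_hits++; }
static void demo_ret_b(void) { demo_b_hits++; }
```

A real implementation would also have to decide how many bits the
counter gets on the shadow stack and what happens when it wraps. The
sketch ignores both.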

Note, being able to filter functions when both are called is not completely
handled yet, but that shouldn't be too hard to manage.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v7:
- Fix max limitation check in ftrace_graph_push_return().
- Rewrite the shadow stack implementation using bitmap entry. This allows
us to detect recursive call/tail call more easily. (This implementation
is moved from a later patch in the series.)
Changes in v2:
- Check return value of the ftrace_pop_return_trace() instead of 'ret'
since 'ret' is set to the address of panic().
- Fix typo and make lines shorter than 76 chars in description.
---
include/linux/ftrace.h | 1
kernel/trace/fgraph.c | 360 ++++++++++++++++++++++++++++++++++++++++--------
2 files changed, 301 insertions(+), 60 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d5df5f8fc35a..bedc3c5fc36f 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1066,6 +1066,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
struct fgraph_ops {
trace_func_graph_ent_t entryfunc;
trace_func_graph_ret_t retfunc;
+ int idx;
};

/*
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 3f9dd213e7d8..b9a2399b75ee 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -7,6 +7,7 @@
*
* Highly modified by Steven Rostedt (VMware).
*/
+#include <linux/bits.h>
#include <linux/jump_label.h>
#include <linux/suspend.h>
#include <linux/ftrace.h>
@@ -27,23 +28,157 @@

#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
#define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
+
+/*
+ * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
+ * is allocated on the task's ret_stack with indexes entry, then each
+ * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that returns
+ * non-zero, the index into the fgraph_array[] for that fgraph_ops is recorded
+ * on the indexes entry as a bit flag.
+ * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
+ * be found, the index to it is also added to the ret_stack along with the
+ * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
+ * called.
+ *
+ * The top of the ret_stack (when not empty) will always have a reference
+ * to the last ftrace_ret_stack saved. All references to the
+ * ftrace_ret_stack has the format of:
+ *
+ * bits: 0 - 9 offset in words from the previous ftrace_ret_stack
+ * (bitmap type should have FGRAPH_RET_INDEX always)
+ * bits: 10 - 11 Type of storage
+ * 0 - reserved
+ * 1 - bitmap of fgraph_array index
+ *
+ * For bitmap of fgraph_array index
+ * bits: 12 - 27 The bitmap of fgraph_ops fgraph_array index
+ *
+ * That is, at the end of function_graph_enter, if the first and fourth
+ * fgraph_ops on the fgraph_array[] (index 0 and 3) need their retfunc called
+ * on the return of the function being traced, this is what will be on the
+ * task's shadow ret_stack: (the stack grows upward)
+ *
+ * | | <- task->curr_ret_stack
+ * +--------------------------------------------+
+ * | bitmap_type(bitmap:(BIT(3)|BIT(0)), |
+ * | offset:FGRAPH_RET_INDEX) | <- the offset is from here
+ * +--------------------------------------------+
+ * | struct ftrace_ret_stack |
+ * | (stores the saved ret pointer) | <- the offset points here
+ * +--------------------------------------------+
+ * | (X) | (N) | ( N words away from
+ * | | previous ret_stack)
+ *
+ * If a backtrace is required, and the real return pointer needs to be
+ * fetched, then it looks at the task's curr_ret_stack index, if it
+ * is greater than zero (reserved, or right before popped), it would mask
+ * the value by FGRAPH_RET_INDEX_MASK to get the offset index of the
+ * ftrace_ret_stack structure stored on the shadow stack.
+ */
+
+#define FGRAPH_RET_INDEX_SIZE 10
+#define FGRAPH_RET_INDEX_MASK GENMASK(FGRAPH_RET_INDEX_SIZE - 1, 0)
+
+#define FGRAPH_TYPE_SIZE 2
+#define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_SIZE - 1, 0)
+#define FGRAPH_TYPE_SHIFT FGRAPH_RET_INDEX_SIZE
+
+enum {
+ FGRAPH_TYPE_RESERVED = 0,
+ FGRAPH_TYPE_BITMAP = 1,
+};
+
+#define FGRAPH_INDEX_SIZE 16
+#define FGRAPH_INDEX_MASK GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
+#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
+/* Currently the max stack index can't be more than register callers */
+#define FGRAPH_MAX_INDEX (FGRAPH_INDEX_SIZE + FGRAPH_RET_INDEX)
+
+#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_SIZE
+
#define SHADOW_STACK_SIZE (PAGE_SIZE)
#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
/* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
+#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))

#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))
-#define RET_STACK_INC(c) ({ c += FGRAPH_RET_INDEX; })
-#define RET_STACK_DEC(c) ({ c -= FGRAPH_RET_INDEX; })

DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
int ftrace_graph_active;

static int fgraph_array_cnt;
-#define FGRAPH_ARRAY_SIZE 16

static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];

+static inline int get_ret_stack_index(struct task_struct *t, int offset)
+{
+ return t->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int get_fgraph_type(struct task_struct *t, int offset)
+{
+ return (t->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+}
+
+static inline unsigned long
+get_fgraph_index_bitmap(struct task_struct *t, int offset)
+{
+ return (t->ret_stack[offset] >> FGRAPH_INDEX_SHIFT) & FGRAPH_INDEX_MASK;
+}
+
+static inline void
+set_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
+{
+ t->ret_stack[offset] = (bitmap << FGRAPH_INDEX_SHIFT) |
+ (FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX;
+}
+
+static inline bool is_fgraph_index_set(struct task_struct *t, int offset, int idx)
+{
+ return !!(get_fgraph_index_bitmap(t, offset) & BIT(idx));
+}
+
+static inline void
+add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
+{
+ t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
+}
+
+/*
+ * @offset: The index into @t->ret_stack to find the ret_stack entry
+ * @index: Where to place the index into @t->ret_stack of that entry
+ *
+ * Calling this with:
+ *
+ * offset = task->curr_ret_stack;
+ * do {
+ * ret_stack = get_ret_stack(task, offset, &offset);
+ * } while (ret_stack);
+ *
+ * Will iterate through all the ret_stack entries from curr_ret_stack
+ * down to the first one.
+ */
+static inline struct ftrace_ret_stack *
+get_ret_stack(struct task_struct *t, int offset, int *index)
+{
+ int idx;
+
+ BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
+
+ if (unlikely(offset <= 0))
+ return NULL;
+
+ idx = get_ret_stack_index(t, --offset);
+ if (WARN_ON_ONCE(idx <= 0 || idx > offset))
+ return NULL;
+
+ offset -= idx;
+
+ *index = offset;
+ return RET_STACK(t, offset);
+}
+
/* Both enabled by default (can be cleared by function_graph tracer flags */
static bool fgraph_sleep_time = true;

@@ -97,10 +232,12 @@ void ftrace_graph_stop(void)
/* Add a function return address to the trace stack on thread info.*/
static int
ftrace_push_return_trace(unsigned long ret, unsigned long func,
- unsigned long frame_pointer, unsigned long *retp)
+ unsigned long frame_pointer, unsigned long *retp,
+ int fgraph_idx)
{
struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
+ unsigned long val;
int index;

if (unlikely(ftrace_graph_is_dead()))
@@ -109,6 +246,21 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
if (!current->ret_stack)
return -EBUSY;

+ /*
+ * At first, check whether the previous fgraph callback is pushed by
+ * the fgraph on the same function entry.
+ * But if @func is the self tail-call function, we also need to ensure
+ * the ret_stack is not for the previous call by checking whether the
+ * bit of @fgraph_idx is set or not.
+ */
+ ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
+ if (ret_stack && ret_stack->func == func &&
+ get_fgraph_type(current, index + FGRAPH_RET_INDEX) == FGRAPH_TYPE_BITMAP &&
+ !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx))
+ return index + FGRAPH_RET_INDEX;
+
+ val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX;
+
BUILD_BUG_ON(SHADOW_STACK_SIZE % sizeof(long));

/*
@@ -118,17 +270,44 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
smp_rmb();

/* The return trace stack is full */
- if (current->curr_ret_stack >= SHADOW_STACK_MAX_INDEX) {
+ if (current->curr_ret_stack + FGRAPH_RET_INDEX + 1 >= SHADOW_STACK_MAX_INDEX) {
atomic_inc(&current->trace_overrun);
return -EBUSY;
}

calltime = trace_clock_local();

- index = current->curr_ret_stack;
- RET_STACK_INC(current->curr_ret_stack);
+ index = READ_ONCE(current->curr_ret_stack);
ret_stack = RET_STACK(current, index);
+ index += FGRAPH_RET_INDEX;
+
+ /* ret offset = FGRAPH_RET_INDEX ; type = reserved */
+ current->ret_stack[index] = val;
+ ret_stack->ret = ret;
+ /*
+ * The unwinders expect curr_ret_stack to point to either zero
+ * or an index where to find the next ret_stack. Even though the
+ * ret stack might be bogus, we want to write the ret and the
+ * index to find the ret_stack before we increment the stack point.
+ * If an interrupt comes in now before we increment the curr_ret_stack
+ * it may blow away what we wrote. But that's fine, because the
+ * index will still be correct (even though the 'ret' won't be).
+ * What we worry about is the index being correct after we increment
+ * the curr_ret_stack and before we update that index, as if an
+ * interrupt comes in and does an unwind stack dump, it will need
+ * at least a correct index!
+ */
+ barrier();
+ current->curr_ret_stack = index + 1;
+ /*
+ * This next barrier is to ensure that an interrupt coming in
+ * will not corrupt what we are about to write.
+ */
barrier();
+
+ /* Still keep it reserved even if an interrupt came in */
+ current->ret_stack[index] = val;
+
ret_stack->ret = ret;
ret_stack->func = func;
ret_stack->calltime = calltime;
@@ -138,7 +317,7 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
#ifdef HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
ret_stack->retp = retp;
#endif
- return 0;
+ return index;
}

/*
@@ -155,10 +334,14 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
# define MCOUNT_INSN_SIZE 0
#endif

+/* If the caller does not use ftrace, call this function. */
int function_graph_enter(unsigned long ret, unsigned long func,
unsigned long frame_pointer, unsigned long *retp)
{
struct ftrace_graph_ent trace;
+ unsigned long bitmap = 0;
+ int index;
+ int i;

#ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
/*
@@ -171,44 +354,59 @@ int function_graph_enter(unsigned long ret, unsigned long func,
ftrace_find_rec_direct(ret - MCOUNT_INSN_SIZE))
return -EBUSY;
#endif
+
trace.func = func;
trace.depth = ++current->curr_ret_depth;

- if (ftrace_push_return_trace(ret, func, frame_pointer, retp))
+ index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0);
+ if (index < 0)
goto out;

- /* Only trace if the calling function expects to */
- if (!fgraph_array[0]->entryfunc(&trace))
+ for (i = 0; i < fgraph_array_cnt; i++) {
+ struct fgraph_ops *gops = fgraph_array[i];
+
+ if (gops == &fgraph_stub)
+ continue;
+
+ if (gops->entryfunc(&trace))
+ bitmap |= BIT(i);
+ }
+
+ if (!bitmap)
goto out_ret;

+ /*
+ * Since this function uses fgraph_idx = 0 as a tail-call checking
+ * flag, set that bit always.
+ */
+ set_fgraph_index_bitmap(current, index, bitmap | BIT(0));
+
return 0;
out_ret:
- RET_STACK_DEC(current->curr_ret_stack);
+ current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
out:
current->curr_ret_depth--;
return -EBUSY;
}

/* Retrieve a function return address to the trace stack on thread info.*/
-static void
+static struct ftrace_ret_stack *
ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
- unsigned long frame_pointer)
+ unsigned long frame_pointer, int *index)
{
struct ftrace_ret_stack *ret_stack;
- int index;

- index = current->curr_ret_stack;
- RET_STACK_DEC(index);
+ ret_stack = get_ret_stack(current, current->curr_ret_stack, index);

- if (unlikely(index < 0 || index > SHADOW_STACK_MAX_INDEX)) {
+ if (unlikely(!ret_stack)) {
ftrace_graph_stop();
- WARN_ON(1);
+ WARN(1, "Bad function graph ret_stack pointer: %d",
+ current->curr_ret_stack);
/* Might as well panic, otherwise we have no where to go */
*ret = (unsigned long)panic;
- return;
+ return NULL;
}

- ret_stack = RET_STACK(current, index);
#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
/*
* The arch may choose to record the frame pointer used
@@ -228,26 +426,29 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
ftrace_graph_stop();
WARN(1, "Bad frame pointer: expected %lx, received %lx\n"
" from func %ps return to %lx\n",
- current->ret_stack[index].fp,
+ ret_stack->fp,
frame_pointer,
(void *)ret_stack->func,
ret_stack->ret);
*ret = (unsigned long)panic;
- return;
+ return NULL;
}
#endif

+ *index += FGRAPH_RET_INDEX;
*ret = ret_stack->ret;
trace->func = ret_stack->func;
trace->calltime = ret_stack->calltime;
trace->overrun = atomic_read(&current->trace_overrun);
- trace->depth = current->curr_ret_depth--;
+ trace->depth = current->curr_ret_depth;
/*
* We still want to trace interrupts coming in if
* max_depth is set to 1. Make sure the decrement is
* seen before ftrace_graph_return.
*/
barrier();
+
+ return ret_stack;
}

/*
@@ -285,30 +486,47 @@ struct fgraph_ret_regs;
static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs,
unsigned long frame_pointer)
{
+ struct ftrace_ret_stack *ret_stack;
struct ftrace_graph_ret trace;
+ unsigned long bitmap;
unsigned long ret;
+ int index;
+ int i;

- ftrace_pop_return_trace(&trace, &ret, frame_pointer);
+ ret_stack = ftrace_pop_return_trace(&trace, &ret, frame_pointer, &index);
+
+ if (unlikely(!ret_stack)) {
+ ftrace_graph_stop();
+ WARN_ON(1);
+ /* Might as well panic. What else to do? */
+ return (unsigned long)panic;
+ }
+
+ trace.rettime = trace_clock_local();
#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
trace.retval = fgraph_ret_regs_return_value(ret_regs);
#endif
- trace.rettime = trace_clock_local();
- fgraph_array[0]->retfunc(&trace);
+
+ bitmap = get_fgraph_index_bitmap(current, index);
+ for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+ struct fgraph_ops *gops = fgraph_array[i];
+
+ if (!(bitmap & BIT(i)))
+ continue;
+ if (gops == &fgraph_stub)
+ continue;
+
+ gops->retfunc(&trace);
+ }
+
/*
* The ftrace_graph_return() may still access the current
* ret_stack structure, we need to make sure the update of
* curr_ret_stack is after that.
*/
barrier();
- RET_STACK_DEC(current->curr_ret_stack);
-
- if (unlikely(!ret)) {
- ftrace_graph_stop();
- WARN_ON(1);
- /* Might as well panic. What else to do? */
- ret = (unsigned long)panic;
- }
-
+ current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
+ current->curr_ret_depth--;
return ret;
}

@@ -343,15 +561,17 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
struct ftrace_ret_stack *
ftrace_graph_get_ret_stack(struct task_struct *task, int idx)
{
+ struct ftrace_ret_stack *ret_stack = NULL;
int index = task->curr_ret_stack;

- BUILD_BUG_ON(FGRAPH_RET_SIZE % sizeof(long));
-
- index -= FGRAPH_RET_INDEX * (idx + 1);
if (index < 0)
return NULL;

- return RET_STACK(task, index);
+ do {
+ ret_stack = get_ret_stack(task, index, &index);
+ } while (ret_stack && --idx >= 0);
+
+ return ret_stack;
}

/**
@@ -374,17 +594,26 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp)
{
struct ftrace_ret_stack *ret_stack;
- int index = task->curr_ret_stack;
- int i;
+ int i = task->curr_ret_stack;

if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
return ret;

- RET_STACK_DEC(index);
-
- for (i = index; i >= 0; RET_STACK_DEC(i)) {
- ret_stack = RET_STACK(task, i);
- if (ret_stack->retp == retp)
+ while (i > 0) {
+ ret_stack = get_ret_stack(task, i, &i);
+ if (!ret_stack)
+ break;
+ /*
+ * For the tail-call, there would be 2 or more ftrace_ret_stacks on
+ * the ret_stack, which records "return_to_handler" as the return
+ * address except for the last one.
+ * But on the real stack, there should be 1 entry because tail-call
+ * reuses the return address on the stack and jump to the next function.
+ * Thus we will continue to find real return address.
+ */
+ if (ret_stack->retp == retp &&
+ ret_stack->ret !=
+ (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
return ret_stack->ret;
}

@@ -394,21 +623,29 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp)
{
- int task_idx;
+ struct ftrace_ret_stack *ret_stack;
+ int task_idx = task->curr_ret_stack;
+ int i;

if (ret != (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
return ret;

- task_idx = task->curr_ret_stack;
- RET_STACK_DEC(task_idx);
-
- if (!task->ret_stack || task_idx < *idx)
+ if (!idx)
return ret;

- task_idx -= *idx;
- RET_STACK_INC(*idx);
+ i = *idx;
+ do {
+ ret_stack = get_ret_stack(task, task_idx, &task_idx);
+ if (ret_stack && ret_stack->ret ==
+ (unsigned long)dereference_kernel_function_descriptor(return_to_handler))
+ continue;
+ i--;
+ } while (i >= 0 && ret_stack);
+
+ if (ret_stack)
+ return ret_stack->ret;

- return RET_STACK(task, task_idx);
+ return ret;
}
#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */

@@ -514,10 +751,10 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
*/
timestamp -= next->ftrace_timestamp;

- for (index = next->curr_ret_stack - FGRAPH_RET_INDEX; index >= 0; ) {
- ret_stack = RET_STACK(next, index);
- ret_stack->calltime += timestamp;
- index -= FGRAPH_RET_INDEX;
+ for (index = next->curr_ret_stack; index > 0; ) {
+ ret_stack = get_ret_stack(next, index, &index);
+ if (ret_stack)
+ ret_stack->calltime += timestamp;
}
}

@@ -568,6 +805,8 @@ graph_init_task(struct task_struct *t, unsigned long *ret_stack)
{
atomic_set(&t->trace_overrun, 0);
t->ftrace_timestamp = 0;
+ t->curr_ret_stack = 0;
+ t->curr_ret_depth = -1;
/* make curr_ret_stack visible before we add the ret_stack */
smp_wmb();
t->ret_stack = ret_stack;
@@ -689,6 +928,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
fgraph_array[i] = gops;
if (i + 1 > fgraph_array_cnt)
fgraph_array_cnt = i + 1;
+ gops->idx = i;

ftrace_graph_active++;



2024-04-15 13:02:40

by Masami Hiramatsu

Subject: [PATCH v9 08/36] function_graph: Remove logic around ftrace_graph_entry and return

From: Steven Rostedt (VMware) <[email protected]>

The function pointers ftrace_graph_entry and ftrace_graph_return are no
longer called via the function_graph tracer. Instead, an array structure is
now used that will allow for multiple users of the function_graph
infrastructure. The variables are still used by the architecture code for
non dynamic ftrace configs, where a test is made against them to see if
they point to the default stub function or not. This is how the static
function tracing knows to call into the function graph tracer
infrastructure or not.

Two new stub functions are made: entry_run() and return_run(). The
ftrace_graph_entry and ftrace_graph_return are set to them respectively
when the function graph tracer is enabled, and this will trigger the
architecture specific function graph code to be executed.
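
A small userspace sketch of the pointer test described above, under the
assumption that the arch code only compares the two variables against
their default stubs. The demo_* names are hypothetical stand-ins for
ftrace_graph_entry, ftrace_graph_return, entry_run() and return_run().

```c
#include <assert.h>

typedef int (*demo_entry_t)(unsigned long func);
typedef void (*demo_ret_t)(unsigned long func);

/* Default stubs: "function graph is off" */
static int demo_entry_stub(unsigned long func) { (void)func; return 0; }
static void demo_ret_stub(unsigned long func) { (void)func; }

/* Marker functions: same bodies, but distinct addresses */
static int demo_entry_run(unsigned long func) { (void)func; return 0; }
static void demo_return_run(unsigned long func) { (void)func; }

static demo_entry_t demo_graph_entry = demo_entry_stub;
static demo_ret_t demo_graph_return = demo_ret_stub;

static void demo_enable(void)
{
	demo_graph_return = demo_return_run;
	demo_graph_entry = demo_entry_run;
}

static void demo_disable(void)
{
	demo_graph_return = demo_ret_stub;
	demo_graph_entry = demo_entry_stub;
}

/* Stand-in for the arch test: only the pointer values matter */
static int demo_arch_should_call_graph(void)
{
	return demo_graph_entry != demo_entry_stub ||
	       demo_graph_return != demo_ret_stub;
}
```

The marker functions are never meant to do any work. Only their
addresses differ from the defaults, which is enough to make the test
pass.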

This also requires checking the global_ops hash for all calls into the
function_graph tracer.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Fix typo and make lines shorter than 76 chars in the description.
- Remove unneeded return from return_run() function.
---
kernel/trace/fgraph.c | 67 +++++++++-------------------------------
kernel/trace/ftrace.c | 2 -
kernel/trace/ftrace_internal.h | 2 -
3 files changed, 15 insertions(+), 56 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index b9a2399b75ee..6f3ba8e113c1 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -145,6 +145,17 @@ add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
}

+/* ftrace_graph_entry set to this to tell some archs to run function graph */
+static int entry_run(struct ftrace_graph_ent *trace)
+{
+ return 0;
+}
+
+/* ftrace_graph_return set to this to tell some archs to run function graph */
+static void return_run(struct ftrace_graph_ret *trace)
+{
+}
+
/*
* @offset: The index into @t->ret_stack to find the ret_stack entry
* @index: Where to place the index into @t->ret_stack of that entry
@@ -675,7 +686,6 @@ extern void ftrace_stub_graph(struct ftrace_graph_ret *);
/* The callbacks that hook a function */
trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
-static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;

/* Try to assign a return stack array on FTRACE_RETSTACK_ALLOC_SIZE tasks. */
static int alloc_retstack_tasklist(unsigned long **ret_stack_list)
@@ -758,46 +768,6 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
}
}

-static int ftrace_graph_entry_test(struct ftrace_graph_ent *trace)
-{
- if (!ftrace_ops_test(&global_ops, trace->func, NULL))
- return 0;
- return __ftrace_graph_entry(trace);
-}
-
-/*
- * The function graph tracer should only trace the functions defined
- * by set_ftrace_filter and set_ftrace_notrace. If another function
- * tracer ops is registered, the graph tracer requires testing the
- * function against the global ops, and not just trace any function
- * that any ftrace_ops registered.
- */
-void update_function_graph_func(void)
-{
- struct ftrace_ops *op;
- bool do_test = false;
-
- /*
- * The graph and global ops share the same set of functions
- * to test. If any other ops is on the list, then
- * the graph tracing needs to test if its the function
- * it should call.
- */
- do_for_each_ftrace_op(op, ftrace_ops_list) {
- if (op != &global_ops && op != &graph_ops &&
- op != &ftrace_list_end) {
- do_test = true;
- /* in double loop, break out with goto */
- goto out;
- }
- } while_for_each_ftrace_op(op);
- out:
- if (do_test)
- ftrace_graph_entry = ftrace_graph_entry_test;
- else
- ftrace_graph_entry = __ftrace_graph_entry;
-}
-
static DEFINE_PER_CPU(unsigned long *, idle_ret_stack);

static void
@@ -939,18 +909,12 @@ int register_ftrace_graph(struct fgraph_ops *gops)
ftrace_graph_active--;
goto out;
}
-
- ftrace_graph_return = gops->retfunc;
-
/*
- * Update the indirect function to the entryfunc, and the
- * function that gets called to the entry_test first. Then
- * call the update fgraph entry function to determine if
- * the entryfunc should be called directly or not.
+ * Some archs just test to see if these are not
+ * the default function
*/
- __ftrace_graph_entry = gops->entryfunc;
- ftrace_graph_entry = ftrace_graph_entry_test;
- update_function_graph_func();
+ ftrace_graph_return = return_run;
+ ftrace_graph_entry = entry_run;

ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
}
@@ -986,7 +950,6 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
if (!ftrace_graph_active) {
ftrace_graph_return = ftrace_stub_graph;
ftrace_graph_entry = ftrace_graph_entry_stub;
- __ftrace_graph_entry = ftrace_graph_entry_stub;
ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
unregister_pm_notifier(&ftrace_suspend_notifier);
unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index da1710499698..fef833f63647 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -235,8 +235,6 @@ static void update_ftrace_function(void)
func = ftrace_ops_list_func;
}

- update_function_graph_func();
-
/* If there's no change, then do nothing more here */
if (ftrace_trace_function == func)
return;
diff --git a/kernel/trace/ftrace_internal.h b/kernel/trace/ftrace_internal.h
index 5012c04f92c0..19eddcb91584 100644
--- a/kernel/trace/ftrace_internal.h
+++ b/kernel/trace/ftrace_internal.h
@@ -42,10 +42,8 @@ ftrace_ops_test(struct ftrace_ops *ops, unsigned long ip, void *regs)

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
extern int ftrace_graph_active;
-void update_function_graph_func(void);
#else /* !CONFIG_FUNCTION_GRAPH_TRACER */
# define ftrace_graph_active 0
-static inline void update_function_graph_func(void) { }
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

#else /* !CONFIG_FUNCTION_TRACER */


2024-04-15 13:04:25

by Masami Hiramatsu

Subject: [PATCH v9 09/36] ftrace/function_graph: Pass fgraph_ops to function graph callbacks

From: Steven Rostedt (VMware) <[email protected]>

Pass the fgraph_ops structure to the function graph callbacks. This will
allow callbacks to add a descriptor to an fgraph_ops private field that
will be added in the future and use it for the callbacks. This will be
useful when more than one callback can be registered to the function graph
tracer.
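
To show why passing the ops pointer helps, here is a hypothetical
userspace sketch. The "private" member below is an assumption standing in
for the field this message says will be added later, and all demo_* names
are invented. Two registrations share one callback and tell themselves
apart through the ops they are handed.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ent {
	unsigned long func;
	int depth;
};

struct demo_ops;
typedef int (*demo_entry_t)(struct demo_ent *trace, struct demo_ops *ops);

struct demo_ops {
	demo_entry_t entryfunc;
	void *private;		/* hypothetical per-user descriptor */
};

/* Per-user state, e.g. one per trace instance */
struct demo_instance {
	const char *name;
	int entries;
};

/* One callback shared by several users; state comes from the ops */
static int demo_entry(struct demo_ent *trace, struct demo_ops *ops)
{
	struct demo_instance *inst = ops->private;

	(void)trace;
	inst->entries++;
	return 1;
}

/* The dispatcher hands each callback its own ops, as in this patch */
static int demo_dispatch(struct demo_ops **array, size_t n,
			 struct demo_ent *trace)
{
	size_t i;
	int called = 0;

	for (i = 0; i < n; i++) {
		if (array[i]->entryfunc(trace, array[i]))
			called++;
	}
	return called;
}
```

Without the extra argument, the shared callback would need a global to
find its state, which stops working once more than one user is
registered.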

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- cleanup to set argument name on function prototype.
---
include/linux/ftrace.h | 10 +++++++---
kernel/trace/fgraph.c | 16 +++++++++-------
kernel/trace/ftrace.c | 6 ++++--
kernel/trace/trace.h | 4 ++--
kernel/trace/trace_functions_graph.c | 11 +++++++----
kernel/trace/trace_irqsoff.c | 6 ++++--
kernel/trace/trace_sched_wakeup.c | 6 ++++--
kernel/trace/trace_selftest.c | 5 +++--
8 files changed, 40 insertions(+), 24 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index bedc3c5fc36f..483876444d32 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1055,11 +1055,15 @@ struct ftrace_graph_ret {
unsigned long long rettime;
} __packed;

+struct fgraph_ops;
+
/* Type of the callback handlers for tracing function graph*/
-typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *); /* return */
-typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *); /* entry */
+typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
+ struct fgraph_ops *); /* return */
+typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
+ struct fgraph_ops *); /* entry */

-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);

#ifdef CONFIG_FUNCTION_GRAPH_TRACER

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 6f3ba8e113c1..47b461b1cf7e 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -146,13 +146,13 @@ add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
}

/* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
{
return 0;
}

/* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
{
}

@@ -213,12 +213,14 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
}
#endif

-int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
+int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
{
return 0;
}

-static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace)
+static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
{
}

@@ -379,7 +381,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
if (gops == &fgraph_stub)
continue;

- if (gops->entryfunc(&trace))
+ if (gops->entryfunc(&trace, gops))
bitmap |= BIT(i);
}

@@ -527,7 +529,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
if (gops == &fgraph_stub)
continue;

- gops->retfunc(&trace);
+ gops->retfunc(&trace, gops);
}

/*
@@ -681,7 +683,7 @@ void ftrace_graph_sleep_time_control(bool enable)
* Simply points to ftrace_stub, but with the proper protocol.
* Defined by the linker script in linux/vmlinux.lds.h
*/
-extern void ftrace_stub_graph(struct ftrace_graph_ret *);
+void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);

/* The callbacks that hook a function */
trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index fef833f63647..4b0708106692 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -815,7 +815,8 @@ void ftrace_graph_graph_time_control(bool enable)
fgraph_graph_time = enable;
}

-static int profile_graph_entry(struct ftrace_graph_ent *trace)
+static int profile_graph_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
{
struct ftrace_ret_stack *ret_stack;

@@ -832,7 +833,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace)
return 1;
}

-static void profile_graph_return(struct ftrace_graph_ret *trace)
+static void profile_graph_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
{
struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 64450615ca0c..55bb9a3bf322 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -678,8 +678,8 @@ void trace_latency_header(struct seq_file *m);
void trace_default_header(struct seq_file *m);
void print_trace_header(struct seq_file *m, struct trace_iterator *iter);

-void trace_graph_return(struct ftrace_graph_ret *trace);
-int trace_graph_entry(struct ftrace_graph_ent *trace);
+void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
+int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
void set_graph_array(struct trace_array *tr);

void tracing_start_cmdline_record(void);
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index c35fbaab2a47..b7b142b65299 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -129,7 +129,8 @@ static inline int ftrace_graph_ignore_irqs(void)
return in_hardirq();
}

-int trace_graph_entry(struct ftrace_graph_ent *trace)
+int trace_graph_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
{
struct trace_array *tr = graph_array;
struct trace_array_cpu *data;
@@ -238,7 +239,8 @@ void __trace_graph_return(struct trace_array *tr,
trace_buffer_unlock_commit_nostack(buffer, event);
}

-void trace_graph_return(struct ftrace_graph_ret *trace)
+void trace_graph_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
{
struct trace_array *tr = graph_array;
struct trace_array_cpu *data;
@@ -275,7 +277,8 @@ void set_graph_array(struct trace_array *tr)
smp_mb();
}

-static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
+static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
{
ftrace_graph_addr_finish(trace);

@@ -288,7 +291,7 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace)
(trace->rettime - trace->calltime < tracing_thresh))
return;
else
- trace_graph_return(trace);
+ trace_graph_return(trace, gops);
}

static struct fgraph_ops funcgraph_thresh_ops = {
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index ba37f768e2f2..5478f4c4f708 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -175,7 +175,8 @@ static int irqsoff_display_graph(struct trace_array *tr, int set)
return start_irqsoff_tracer(irqsoff_trace, set);
}

-static int irqsoff_graph_entry(struct ftrace_graph_ent *trace)
+static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
{
struct trace_array *tr = irqsoff_trace;
struct trace_array_cpu *data;
@@ -205,7 +206,8 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace)
return ret;
}

-static void irqsoff_graph_return(struct ftrace_graph_ret *trace)
+static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
{
struct trace_array *tr = irqsoff_trace;
struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 0469a04a355f..49bcc812652c 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -112,7 +112,8 @@ static int wakeup_display_graph(struct trace_array *tr, int set)
return start_func_tracer(tr, set);
}

-static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
{
struct trace_array *tr = wakeup_trace;
struct trace_array_cpu *data;
@@ -141,7 +142,8 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
return ret;
}

-static void wakeup_graph_return(struct ftrace_graph_ret *trace)
+static void wakeup_graph_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
{
struct trace_array *tr = wakeup_trace;
struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index e9c5058a8efd..56f269c0560a 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -762,7 +762,8 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
static unsigned int graph_hang_thresh;

/* Wrap the real function entry probe to avoid possible hanging */
-static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
+static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
{
/* This is harmlessly racy, we want to approximately detect a hang */
if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
@@ -776,7 +777,7 @@ static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace)
return 0;
}

- return trace_graph_entry(trace);
+ return trace_graph_entry(trace, gops);
}

static struct fgraph_ops fgraph_ops __initdata = {


2024-04-15 13:08:09

by Masami Hiramatsu

Subject: [PATCH v9 12/36] function_graph: Have the instances use their own ftrace_ops for filtering

From: Steven Rostedt (VMware) <[email protected]>

Allow instances to have their own ftrace_ops as part of the fgraph_ops,
so that the function_graph tracer filters on the set_ftrace_filter file
of the instance and not on that of the top instance.

Note that this also requires updating ftrace_graph_func() to call the new
function_graph_enter_ops() instead of function_graph_enter() so that
it avoids pushing onto the shadow stack multiple times for the same function.
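
The two mechanisms this relies on can be sketched in userspace C. This is
only a model under stated assumptions: struct ftrace_ops is reduced to a
hypothetical one-address filter (filter_ip), ops_test() stands in for
ftrace_ops_test(), and enter_one() is an invented helper that combines the
owner lookup with the filter check for demonstration. In the patch itself
the check is added to function_graph_enter(), while the
ftrace_graph_func() path relies on ftrace having already applied the
per-ops filter.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in: the real ftrace_ops carries filter hashes. */
struct ftrace_ops {
	unsigned long filter_ip;	/* hypothetical filter; 0 means "all" */
};

struct fgraph_ops {
	int (*entryfunc)(unsigned long func, struct fgraph_ops *gops);
	struct ftrace_ops ops;		/* embedded, as in the patch */
	int idx;
};

/* Hypothetical stand-in for ftrace_ops_test(). */
static int ops_test(const struct ftrace_ops *ops, unsigned long ip)
{
	return ops->filter_ip == 0 || ops->filter_ip == ip;
}

/* ftrace passes the callback a ftrace_ops pointer; recover its owner. */
static struct fgraph_ops *gops_from_ops(struct ftrace_ops *op)
{
	return container_of(op, struct fgraph_ops, ops);
}

static int accept_all(unsigned long func, struct fgraph_ops *gops)
{
	(void)func;
	(void)gops;
	return 1;
}

/* Returns the bitmap bit for this ops if it traces func, else 0. */
static unsigned long enter_one(struct ftrace_ops *op, unsigned long func)
{
	struct fgraph_ops *gops = gops_from_ops(op);

	if (ops_test(&gops->ops, func) && gops->entryfunc(func, gops))
		return 1UL << gops->idx;
	return 0;
}
```

Embedding the ftrace_ops rather than pointing to it is what makes the
container_of() lookup possible without any extra table.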

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v9:
- Fix to clear fgraph_array correctly when ftrace_startup() fails.
- Return -ENOSPC if fgraph_array is full.
Changes in v8:
- Fix a compilation error in loongarch implementation.
- Update riscv implementation of ftrace_graph_func().
Changes in v7:
- Move FGRAPH_TYPE_BITMAP type implementation to earlier patch (
which implements FGRAPH_TYPE_ARRAY) so that it does not need to
replace the FGRAPH_TYPE_ARRAY type.
- Update loongarch and powerpc implementation of ftrace_graph_func().
- Update description.
Changes in v6:
- Fix to check whether the fgraph_ops is already unregistered in
function_graph_enter_ops().
- Fix stack unwinder error on arm64 because of passing wrong value
as retp. Thanks Mark!
Changes in v4:
- Simplify get_ret_stack() sanity check and use WARN_ON_ONCE() for
obviously wrong value.
- Do not check ret == return_to_handler but always read the previous
ret_stack in ftrace_push_return_trace() to check it is reusable.
- Set the bit 0 of the bitmap entry always in function_graph_enter()
because it uses bit 0 to check re-usability.
- Fix to ensure the ret_stack entry is bitmap type when checking the
bitmap.
Changes in v3:
- Pass current fgraph_ops to the new entry handler
(function_graph_enter_ops) if fgraph use ftrace.
- Add fgraph_ops::idx in this patch.
- Replace the array type with the bitmap type so that it can record
which fgraph is called.
- Fix some helper functions to use the passed task_struct instead of current.
- Reduce the ret-index size to 1024 words.
- Make the ret-index point directly to the ret_stack.
- Fix ftrace_graph_ret_addr() to handle tail-call case correctly.
Changes in v2:
- Use ftrace_graph_func and FTRACE_OPS_GRAPH_STUB instead of
ftrace_stub and FTRACE_OPS_FL_STUB for new ftrace based fgraph.
---
arch/arm64/kernel/ftrace.c | 21 +++++-
arch/loongarch/kernel/ftrace_dyn.c | 15 ++++
arch/powerpc/kernel/trace/ftrace.c | 3 +
arch/riscv/kernel/ftrace.c | 15 ++++
arch/x86/kernel/ftrace.c | 19 +++++
include/linux/ftrace.h | 6 ++
kernel/trace/fgraph.c | 125 +++++++++++++++++++++++++---------
kernel/trace/ftrace.c | 4 +
kernel/trace/trace.h | 16 ++--
kernel/trace/trace_functions.c | 2 -
kernel/trace/trace_functions_graph.c | 8 ++
11 files changed, 183 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index a650f5e11fc5..b96740829798 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -481,7 +481,26 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
- prepare_ftrace_return(ip, &fregs->lr, fregs->fp);
+ struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+ unsigned long frame_pointer = fregs->fp;
+ unsigned long *parent = &fregs->lr;
+ int bit;
+
+ if (unlikely(ftrace_graph_is_dead()))
+ return;
+
+ if (unlikely(atomic_read(&current->tracing_graph_pause)))
+ return;
+
+ bit = ftrace_test_recursion_trylock(ip, *parent);
+ if (bit < 0)
+ return;
+
+ if (!function_graph_enter_ops(*parent, ip, frame_pointer,
+ (void *)frame_pointer, gops))
+ *parent = (unsigned long)&return_to_handler;
+
+ ftrace_test_recursion_unlock(bit);
}
#else
/*
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 73858c9029cc..920eb673b32b 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -241,10 +241,21 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent)
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
+ struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+ unsigned long return_hooker = (unsigned long)&return_to_handler;
struct pt_regs *regs = &fregs->regs;
- unsigned long *parent = (unsigned long *)&regs->regs[1];
+ unsigned long *parent;
+ unsigned long old;
+
+ parent = (unsigned long *)&regs->regs[1];

- prepare_ftrace_return(ip, (unsigned long *)parent);
+ if (unlikely(atomic_read(&current->tracing_graph_pause)))
+ return;
+
+ old = *parent;
+
+ if (!function_graph_enter_ops(old, ip, 0, parent, gops))
+ *parent = return_hooker;
}
#else
static int ftrace_modify_graph_caller(bool enable)
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index d8d6b4fd9a14..4a9294821c0d 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -421,6 +421,7 @@ int __init ftrace_dyn_arch_init(void)
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
+ struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
unsigned long sp = fregs->regs.gpr[1];
int bit;

@@ -434,7 +435,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
goto out;

- if (!function_graph_enter(parent_ip, ip, 0, (unsigned long *)sp))
+ if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, gops))
parent_ip = ppc_function_entry(return_to_handler);

ftrace_test_recursion_unlock(bit);
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index f5aa24d9e1c1..eb86fb005f34 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -182,10 +182,23 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
+ struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+ unsigned long return_hooker = (unsigned long)&return_to_handler;
struct pt_regs *regs = arch_ftrace_get_regs(fregs);
unsigned long *parent = (unsigned long *)&regs->ra;
+ unsigned long old;
+
+ if (unlikely(atomic_read(&current->tracing_graph_pause)))
+ return;
+
+ /*
+ * We don't suffer access faults, so no extra fault-recovery assembly
+ * is needed here.
+ */
+ old = *parent;

- prepare_ftrace_return(parent, ip, frame_pointer(regs));
+ if (!function_graph_enter_ops(old, ip, frame_pointer(regs), parent, gops))
+ *parent = return_hooker;
}
#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
extern void ftrace_graph_call(void);
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 70139d9d2e01..5e30cd69b8ab 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -658,9 +658,24 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
struct pt_regs *regs = &fregs->regs;
- unsigned long *stack = (unsigned long *)kernel_stack_pointer(regs);
+ unsigned long *parent = (unsigned long *)kernel_stack_pointer(regs);
+ struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
+ int bit;
+
+ if (unlikely(ftrace_graph_is_dead()))
+ return;
+
+ if (unlikely(atomic_read(&current->tracing_graph_pause)))
+ return;

- prepare_ftrace_return(ip, (unsigned long *)stack, 0);
+ bit = ftrace_test_recursion_trylock(ip, *parent);
+ if (bit < 0)
+ return;
+
+ if (!function_graph_enter_ops(*parent, ip, 0, parent, gops))
+ *parent = (unsigned long)&return_to_handler;
+
+ ftrace_test_recursion_unlock(bit);
}
#endif

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d66ebc77e4e4..6aaca057a078 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1070,6 +1070,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
struct fgraph_ops {
trace_func_graph_ent_t entryfunc;
trace_func_graph_ret_t retfunc;
+ struct ftrace_ops ops; /* for the hash lists */
void *private;
int idx;
};
@@ -1105,6 +1106,11 @@ extern int
function_graph_enter(unsigned long ret, unsigned long func,
unsigned long frame_pointer, unsigned long *retp);

+extern int
+function_graph_enter_ops(unsigned long ret, unsigned long func,
+ unsigned long frame_pointer, unsigned long *retp,
+ struct fgraph_ops *gops);
+
struct ftrace_ret_stack *
ftrace_graph_get_ret_stack(struct task_struct *task, int idx);

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 47b461b1cf7e..aa9a4fac3373 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -18,14 +18,6 @@
#include "ftrace_internal.h"
#include "trace.h"

-#ifdef CONFIG_DYNAMIC_FTRACE
-#define ASSIGN_OPS_HASH(opsname, val) \
- .func_hash = val, \
- .local_hash.regex_lock = __MUTEX_INITIALIZER(opsname.local_hash.regex_lock),
-#else
-#define ASSIGN_OPS_HASH(opsname, val)
-#endif
-
#define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
#define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))

@@ -381,7 +373,8 @@ int function_graph_enter(unsigned long ret, unsigned long func,
if (gops == &fgraph_stub)
continue;

- if (gops->entryfunc(&trace, gops))
+ if (ftrace_ops_test(&gops->ops, func, NULL) &&
+ gops->entryfunc(&trace, gops))
bitmap |= BIT(i);
}

@@ -402,6 +395,46 @@ int function_graph_enter(unsigned long ret, unsigned long func,
return -EBUSY;
}

+/* This is called from ftrace_graph_func() via ftrace */
+int function_graph_enter_ops(unsigned long ret, unsigned long func,
+ unsigned long frame_pointer, unsigned long *retp,
+ struct fgraph_ops *gops)
+{
+ struct ftrace_graph_ent trace;
+ int index;
+ int type;
+
+ /* Check whether the fgraph_ops is unregistered. */
+ if (unlikely(fgraph_array[gops->idx] == &fgraph_stub))
+ return -ENODEV;
+
+ /* Use start for the distance to ret_stack (skipping over reserve) */
+ index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx);
+ if (index < 0)
+ return index;
+ type = get_fgraph_type(current, index);
+
+ /* This is the first ret_stack for this fentry */
+ if (type == FGRAPH_TYPE_RESERVED)
+ ++current->curr_ret_depth;
+
+ trace.func = func;
+ trace.depth = current->curr_ret_depth;
+ if (gops->entryfunc(&trace, gops)) {
+ if (type == FGRAPH_TYPE_RESERVED)
+ set_fgraph_index_bitmap(current, index, BIT(gops->idx));
+ else
+ add_fgraph_index_bitmap(current, index, BIT(gops->idx));
+ return 0;
+ }
+
+ if (type == FGRAPH_TYPE_RESERVED) {
+ current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
+ current->curr_ret_depth--;
+ }
+ return -EBUSY;
+}
+
/* Retrieve a function return address to the trace stack on thread info.*/
static struct ftrace_ret_stack *
ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
@@ -662,17 +695,25 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
}
#endif /* HAVE_FUNCTION_GRAPH_RET_ADDR_PTR */

-static struct ftrace_ops graph_ops = {
- .func = ftrace_graph_func,
- .flags = FTRACE_OPS_FL_INITIALIZED |
- FTRACE_OPS_FL_PID |
- FTRACE_OPS_GRAPH_STUB,
+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+ struct ftrace_ops *src_ops)
+{
+ dst_ops->func = ftrace_graph_func;
+ dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_GRAPH_STUB;
+
#ifdef FTRACE_GRAPH_TRAMP_ADDR
- .trampoline = FTRACE_GRAPH_TRAMP_ADDR,
+ dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
/* trampoline_size is only needed for dynamically allocated tramps */
#endif
- ASSIGN_OPS_HASH(graph_ops, &global_ops.local_hash)
-};
+
+#ifdef CONFIG_DYNAMIC_FTRACE
+ if (src_ops) {
+ dst_ops->func_hash = &src_ops->local_hash;
+ mutex_init(&dst_ops->local_hash.regex_lock);
+ dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED;
+ }
+#endif
+}

void ftrace_graph_sleep_time_control(bool enable)
{
@@ -876,11 +917,20 @@ static int start_graph_tracing(void)

int register_ftrace_graph(struct fgraph_ops *gops)
{
+ int command = 0;
int ret = 0;
int i;

mutex_lock(&ftrace_lock);

+ if (!gops->ops.func) {
+ gops->ops.flags |= FTRACE_OPS_GRAPH_STUB;
+ gops->ops.func = ftrace_graph_func;
+#ifdef FTRACE_GRAPH_TRAMP_ADDR
+ gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
+#endif
+ }
+
if (!fgraph_array[0]) {
/* The array must always have real data on it */
for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
@@ -893,7 +943,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
break;
}
if (i >= FGRAPH_ARRAY_SIZE) {
- ret = -EBUSY;
+ ret = -ENOSPC;
goto out;
}

@@ -907,18 +957,22 @@ int register_ftrace_graph(struct fgraph_ops *gops)
if (ftrace_graph_active == 1) {
register_pm_notifier(&ftrace_suspend_notifier);
ret = start_graph_tracing();
- if (ret) {
- ftrace_graph_active--;
- goto out;
- }
+ if (ret)
+ goto error;
/*
* Some archs just test to see if these are not
* the default function
*/
ftrace_graph_return = return_run;
ftrace_graph_entry = entry_run;
+ command = FTRACE_START_FUNC_RET;
+ }

- ret = ftrace_startup(&graph_ops, FTRACE_START_FUNC_RET);
+ ret = ftrace_startup(&gops->ops, command);
+error:
+ if (ret) {
+ fgraph_array[i] = &fgraph_stub;
+ ftrace_graph_active--;
}
out:
mutex_unlock(&ftrace_lock);
@@ -927,6 +981,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)

void unregister_ftrace_graph(struct fgraph_ops *gops)
{
+ int command = 0;
int i;

mutex_lock(&ftrace_lock);
@@ -934,25 +989,29 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
if (unlikely(!ftrace_graph_active))
goto out;

- for (i = 0; i < fgraph_array_cnt; i++)
- if (gops == fgraph_array[i])
- break;
- if (i >= fgraph_array_cnt)
+ if (unlikely(gops->idx < 0 || gops->idx >= fgraph_array_cnt))
goto out;

- fgraph_array[i] = &fgraph_stub;
- if (i + 1 == fgraph_array_cnt) {
- for (; i >= 0; i--)
- if (fgraph_array[i] != &fgraph_stub)
- break;
+ WARN_ON_ONCE(fgraph_array[gops->idx] != gops);
+
+ fgraph_array[gops->idx] = &fgraph_stub;
+ if (gops->idx + 1 == fgraph_array_cnt) {
+ i = gops->idx;
+ while (i >= 0 && fgraph_array[i] == &fgraph_stub)
+ i--;
fgraph_array_cnt = i + 1;
}

ftrace_graph_active--;
+
+ if (!ftrace_graph_active)
+ command = FTRACE_STOP_FUNC_RET;
+
+ ftrace_shutdown(&gops->ops, command);
+
if (!ftrace_graph_active) {
ftrace_graph_return = ftrace_stub_graph;
ftrace_graph_entry = ftrace_graph_entry_stub;
- ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
unregister_pm_notifier(&ftrace_suspend_notifier);
unregister_trace_sched_switch(ftrace_graph_probe_sched_switch, NULL);
}
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 92abb9869198..45fd2710f81b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3017,6 +3017,8 @@ int ftrace_startup(struct ftrace_ops *ops, int command)
if (unlikely(ftrace_disabled))
return -ENODEV;

+ ftrace_ops_init(ops);
+
ret = __register_ftrace_function(ops);
if (ret)
return ret;
@@ -7326,7 +7328,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
tr->ops = &global_ops;
tr->ops->private = tr;
ftrace_init_trace_array(tr);
- init_array_fgraph_ops(tr);
+ init_array_fgraph_ops(tr, tr->ops);
}

void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 114b120afd2a..9995d6b00a93 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -893,8 +893,8 @@ extern int __trace_graph_entry(struct trace_array *tr,
extern void __trace_graph_return(struct trace_array *tr,
struct ftrace_graph_ret *trace,
unsigned int trace_ctx);
-extern void init_array_fgraph_ops(struct trace_array *tr);
-extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
+extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
extern void free_fgraph_ops(struct trace_array *tr);

#ifdef CONFIG_DYNAMIC_FTRACE
@@ -977,6 +977,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
preempt_enable_notrace();
return ret;
}
+
#else
static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
{
@@ -1002,18 +1003,19 @@ static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
(fgraph_max_depth && trace->depth >= fgraph_max_depth);
}

+void fgraph_init_ops(struct ftrace_ops *dst_ops,
+ struct ftrace_ops *src_ops);
+
#else /* CONFIG_FUNCTION_GRAPH_TRACER */
static inline enum print_line_t
print_graph_function_flags(struct trace_iterator *iter, u32 flags)
{
return TRACE_TYPE_UNHANDLED;
}
-static inline void init_array_fgraph_ops(struct trace_array *tr) { }
-static inline int allocate_fgraph_ops(struct trace_array *tr)
-{
- return 0;
-}
static inline void free_fgraph_ops(struct trace_array *tr) { }
+/* ftrace_ops may not be defined */
+#define init_array_fgraph_ops(tr, ops) do { } while (0)
+#define allocate_fgraph_ops(tr, ops) ({ 0; })
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 8e8da0d0ee52..13bf2415245d 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -91,7 +91,7 @@ int ftrace_create_function_files(struct trace_array *tr,
if (!tr->ops)
return -EINVAL;

- ret = allocate_fgraph_ops(tr);
+ ret = allocate_fgraph_ops(tr, tr->ops);
if (ret) {
kfree(tr->ops);
return ret;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 9ccc904a7703..7f30652f0e97 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -288,7 +288,7 @@ static struct fgraph_ops funcgraph_ops = {
.retfunc = &trace_graph_return,
};

-int allocate_fgraph_ops(struct trace_array *tr)
+int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops)
{
struct fgraph_ops *gops;

@@ -301,6 +301,9 @@ int allocate_fgraph_ops(struct trace_array *tr)

tr->gops = gops;
gops->private = tr;
+
+ fgraph_init_ops(&gops->ops, ops);
+
return 0;
}

@@ -309,10 +312,11 @@ void free_fgraph_ops(struct trace_array *tr)
kfree(tr->gops);
}

-__init void init_array_fgraph_ops(struct trace_array *tr)
+__init void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops)
{
tr->gops = &funcgraph_ops;
funcgraph_ops.private = tr;
+ fgraph_init_ops(&tr->gops->ops, ops);
}

static int graph_trace_init(struct trace_array *tr)


2024-04-15 13:08:15

by Masami Hiramatsu

Subject: [PATCH v9 11/36] ftrace: Allow ftrace startup flags to exist without dynamic ftrace

From: Steven Rostedt (VMware) <[email protected]>

Some of the flags for ftrace_startup() may be exposed even when
CONFIG_DYNAMIC_FTRACE is not configured in. This is fine, as the difference
between dynamic ftrace and static ftrace is handled within the internals of
ftrace itself. There is no need to have use cases fail to compile because
dynamic ftrace is disabled.

This change is needed to move some of the logic of what is passed to
ftrace_startup() out of the parameters of ftrace_startup().
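
A small userspace sketch of what that caller-side logic can look like is
below. The enum values are copied from the diff in this mail. The helper
names startup_command() and shutdown_command() are hypothetical; they only
mirror how patch 12/36 in this thread picks the command from the
ftrace_graph_active count.

```c
#include <assert.h>

/* Values as they appear in include/linux/ftrace.h in this patch. */
enum {
	FTRACE_UPDATE_CALLS		= (1 << 0),
	FTRACE_DISABLE_CALLS		= (1 << 1),
	FTRACE_UPDATE_TRACE_FUNC	= (1 << 2),
	FTRACE_START_FUNC_RET		= (1 << 3),
	FTRACE_STOP_FUNC_RET		= (1 << 4),
	FTRACE_MAY_SLEEP		= (1 << 5),
};

/* Only the first registered user needs to start the return hooks. */
static int startup_command(int active_after_increment)
{
	return active_after_increment == 1 ? FTRACE_START_FUNC_RET : 0;
}

/* Only the last unregistered user needs to stop them. */
static int shutdown_command(int active_after_decrement)
{
	return active_after_decrement == 0 ? FTRACE_STOP_FUNC_RET : 0;
}
```

Because the caller now computes the flag, the flag names have to be visible
to it in every configuration, which is what moving the enum achieves.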

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
include/linux/ftrace.h | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 2eb4981ec80b..d66ebc77e4e4 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -538,6 +538,15 @@ static inline void stack_tracer_disable(void) { }
static inline void stack_tracer_enable(void) { }
#endif

+enum {
+ FTRACE_UPDATE_CALLS = (1 << 0),
+ FTRACE_DISABLE_CALLS = (1 << 1),
+ FTRACE_UPDATE_TRACE_FUNC = (1 << 2),
+ FTRACE_START_FUNC_RET = (1 << 3),
+ FTRACE_STOP_FUNC_RET = (1 << 4),
+ FTRACE_MAY_SLEEP = (1 << 5),
+};
+
#ifdef CONFIG_DYNAMIC_FTRACE

void ftrace_arch_code_modify_prepare(void);
@@ -632,15 +641,6 @@ void ftrace_set_global_notrace(unsigned char *buf, int len, int reset);
void ftrace_free_filter(struct ftrace_ops *ops);
void ftrace_ops_set_global_filter(struct ftrace_ops *ops);

-enum {
- FTRACE_UPDATE_CALLS = (1 << 0),
- FTRACE_DISABLE_CALLS = (1 << 1),
- FTRACE_UPDATE_TRACE_FUNC = (1 << 2),
- FTRACE_START_FUNC_RET = (1 << 3),
- FTRACE_STOP_FUNC_RET = (1 << 4),
- FTRACE_MAY_SLEEP = (1 << 5),
-};
-
/*
* The FTRACE_UPDATE_* enum is used to pass information back
* from the ftrace_update_record() and ftrace_test_record()


2024-04-15 13:08:56

by Masami Hiramatsu

Subject: [PATCH v9 14/36] function_graph: Add "task variables" per task for fgraph_ops

From: Steven Rostedt (VMware) <[email protected]>

Add a "task variables" array on the task's shadow ret_stack, with one
long-sized slot for each possible registered fgraph_ops. That's a total
of 16 slots, taking up 8 * 16 = 128 bytes (out of a 4k page).

This allows each fgraph_ops to implement per-task features by giving it
a way to maintain state for each task.
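
The layout can be modelled in userspace as follows. This is a sketch, not
the kernel code: PAGE_SIZE_BYTES assumes a 4k page, and task_vars(),
init_task_vars() and get_task_var() are simplified stand-ins for the
SHADOW_STACK_TASK_VARS() macro and the helpers in the diff below, taking
the ret_stack directly instead of a task_struct.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE_BYTES		4096	/* assumption: 4k page */
#define FGRAPH_ARRAY_SIZE	16
#define SHADOW_STACK_INDEX	(PAGE_SIZE_BYTES / sizeof(long))

/* The last FGRAPH_ARRAY_SIZE words of the stack hold the task variables. */
static unsigned long *task_vars(unsigned long *ret_stack)
{
	return &ret_stack[SHADOW_STACK_INDEX - FGRAPH_ARRAY_SIZE];
}

/* Zero all slots, as done when a ret_stack is first attached to a task. */
static void init_task_vars(unsigned long *ret_stack)
{
	memset(task_vars(ret_stack), 0,
	       sizeof(unsigned long) * FGRAPH_ARRAY_SIZE);
}

/* Address of the slot that belongs to the fgraph_ops with index idx. */
static unsigned long *get_task_var(unsigned long *ret_stack, int idx)
{
	return &task_vars(ret_stack)[idx];
}
```

Since the slots sit at the top of the page, the usable depth of the shadow
stack shrinks by FGRAPH_ARRAY_SIZE words, which is why
SHADOW_STACK_MAX_INDEX is adjusted in the same patch.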

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v3:
- Move fgraph_ops::idx to previous patch in the series.
Changes in v2:
- Make description lines shorter than 76 chars.
---
include/linux/ftrace.h | 1 +
kernel/trace/fgraph.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6aaca057a078..85b887973e02 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1116,6 +1116,7 @@ ftrace_graph_get_ret_stack(struct task_struct *task, int idx);

unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx,
unsigned long ret, unsigned long *retp);
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops);

/*
* Sometimes we don't want to trace a function with the function
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 7e73bc3eab8b..2a6e91c293fe 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -92,10 +92,18 @@ enum {
#define SHADOW_STACK_SIZE (PAGE_SIZE)
#define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
/* Leave on a buffer at the end */
-#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
+#define SHADOW_STACK_MAX_INDEX \
+ (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1 + FGRAPH_ARRAY_SIZE))

#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))

+/*
+ * Each fgraph_ops has a reserved unsigned long at the end (top) of the
+ * ret_stack to store task specific state.
+ */
+#define SHADOW_STACK_TASK_VARS(ret_stack) \
+ ((unsigned long *)(&(ret_stack)[SHADOW_STACK_INDEX - FGRAPH_ARRAY_SIZE]))
+
DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
int ftrace_graph_active;

@@ -186,6 +194,44 @@ static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
{
}

+static void ret_stack_set_task_var(struct task_struct *t, int idx, long val)
+{
+ unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+ gvals[idx] = val;
+}
+
+static unsigned long *
+ret_stack_get_task_var(struct task_struct *t, int idx)
+{
+ unsigned long *gvals = SHADOW_STACK_TASK_VARS(t->ret_stack);
+
+ return &gvals[idx];
+}
+
+static void ret_stack_init_task_vars(unsigned long *ret_stack)
+{
+ unsigned long *gvals = SHADOW_STACK_TASK_VARS(ret_stack);
+
+ memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
+}
+
+/**
+ * fgraph_get_task_var - retrieve a task specific state variable
+ * @gops: The fgraph_ops that owns the task specific variable
+ *
+ * Every registered fgraph_ops has a task state variable
+ * reserved on the task's ret_stack. This function returns the
+ * address to that variable.
+ *
+ * Returns the address to the fgraph_ops @gops tasks specific
+ * unsigned long variable.
+ */
+unsigned long *fgraph_get_task_var(struct fgraph_ops *gops)
+{
+ return ret_stack_get_task_var(current, gops->idx);
+}
+
/*
* @offset: The index into @t->ret_stack to find the ret_stack entry
* @index: Where to place the index into @t->ret_stack of that entry
@@ -795,6 +841,7 @@ static int alloc_retstack_tasklist(unsigned long **ret_stack_list)

if (t->ret_stack == NULL) {
atomic_set(&t->trace_overrun, 0);
+ ret_stack_init_task_vars(ret_stack_list[start]);
t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
/* Make sure the tasks see the 0 first: */
@@ -855,6 +902,7 @@ static void
graph_init_task(struct task_struct *t, unsigned long *ret_stack)
{
atomic_set(&t->trace_overrun, 0);
+ ret_stack_init_task_vars(ret_stack);
t->ftrace_timestamp = 0;
t->curr_ret_stack = 0;
t->curr_ret_depth = -1;
@@ -953,6 +1001,24 @@ static int start_graph_tracing(void)
return ret;
}

+static void init_task_vars(int idx)
+{
+ struct task_struct *g, *t;
+ int cpu;
+
+ for_each_online_cpu(cpu) {
+ if (idle_task(cpu)->ret_stack)
+ ret_stack_set_task_var(idle_task(cpu), idx, 0);
+ }
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, t) {
+ if (t->ret_stack)
+ ret_stack_set_task_var(t, idx, 0);
+ }
+ read_unlock(&tasklist_lock);
+}
+
int register_ftrace_graph(struct fgraph_ops *gops)
{
int command = 0;
@@ -999,6 +1065,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)
ftrace_graph_return = return_run;
ftrace_graph_entry = entry_run;
command = FTRACE_START_FUNC_RET;
+ } else {
+ init_task_vars(gops->idx);
}

ret = ftrace_startup(&gops->ops, command);
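
For readers following the layout this patch sets up, here is a minimal userspace sketch of the per-fgraph_ops task variables that occupy the top FGRAPH_ARRAY_SIZE words of the shadow stack. It is not the kernel code: MODEL_STACK_WORDS stands in for SHADOW_STACK_INDEX with an arbitrary assumed value, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MODEL_STACK_WORDS 512	/* assumed stand-in for SHADOW_STACK_INDEX */
#define MODEL_ARRAY_SIZE  16	/* FGRAPH_ARRAY_SIZE in the patch */

/* Start of the reserved area: the last MODEL_ARRAY_SIZE words. */
unsigned long *model_task_vars(unsigned long *ret_stack)
{
	return &ret_stack[MODEL_STACK_WORDS - MODEL_ARRAY_SIZE];
}

/* Address of the variable owned by the ops registered at @idx. */
unsigned long *model_get_task_var(unsigned long *ret_stack, int idx)
{
	assert(idx >= 0 && idx < MODEL_ARRAY_SIZE);
	return &model_task_vars(ret_stack)[idx];
}

/* Zero every slot, as done when a ret_stack is attached to a task. */
void model_init_task_vars(unsigned long *ret_stack)
{
	memset(model_task_vars(ret_stack), 0,
	       sizeof(unsigned long) * MODEL_ARRAY_SIZE);
}
```

With the assumed 512-word stack, slot 0 is word 496 and slot 15 is word 511, which is why SHADOW_STACK_MAX_INDEX has to shrink by FGRAPH_ARRAY_SIZE to keep pushed entries out of that area.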


2024-04-15 13:09:19

by Masami Hiramatsu

Subject: [PATCH v9 13/36] function_graph: Use a simple LRU for fgraph_array index number

From: Masami Hiramatsu (Google) <[email protected]>

Since the fgraph_array index is used for the bitmap on the shadow
stack, entries pushed by a function_graph instance may remain on the
stack after that instance is removed. If another instance reuses the
same fgraph_array index soon after it is released, fgraph may
mistakenly call the newer instance's callback for entries that were
pushed by the older instance.
To avoid reusing an fgraph_array index soon after it is released,
introduce a simple LRU table for managing the index numbers. This
reduces the likelihood of such confusion.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v8:
- Add a WARN_ON_ONCE() if fgraph_lru_table[] is broken when releasing
index, and remove WARN_ON_ONCE() from unregister_ftrace_graph()
- Fix to release allocated index if register_ftrace_graph() fails.
- Add comments and code cleanup.
Changes in v5:
- Fix the underflow bug in fgraph_lru_release_index() and return 0
if the release succeeded.
Changes in v4:
- Newly added.
---
kernel/trace/fgraph.c | 71 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 50 insertions(+), 21 deletions(-)

diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index aa9a4fac3373..7e73bc3eab8b 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -99,10 +99,48 @@ enum {
DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
int ftrace_graph_active;

-static int fgraph_array_cnt;
-
static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];

+/* LRU index table for fgraph_array */
+static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
+static int fgraph_lru_next;
+static int fgraph_lru_last;
+
+/* Initialize fgraph_lru_table with unused index */
+static void fgraph_lru_init(void)
+{
+ int i;
+
+ for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
+ fgraph_lru_table[i] = i;
+}
+
+/* Release the used index to the LRU table */
+static int fgraph_lru_release_index(int idx)
+{
+ if (idx < 0 || idx >= FGRAPH_ARRAY_SIZE ||
+ WARN_ON_ONCE(fgraph_lru_table[fgraph_lru_last] != -1))
+ return -1;
+
+ fgraph_lru_table[fgraph_lru_last] = idx;
+ fgraph_lru_last = (fgraph_lru_last + 1) % FGRAPH_ARRAY_SIZE;
+ return 0;
+}
+
+/* Allocate a new index from LRU table */
+static int fgraph_lru_alloc_index(void)
+{
+ int idx = fgraph_lru_table[fgraph_lru_next];
+
+ /* No id is available */
+ if (idx == -1)
+ return -1;
+
+ fgraph_lru_table[fgraph_lru_next] = -1;
+ fgraph_lru_next = (fgraph_lru_next + 1) % FGRAPH_ARRAY_SIZE;
+ return idx;
+}
+
static inline int get_ret_stack_index(struct task_struct *t, int offset)
{
return t->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
@@ -367,7 +405,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
if (index < 0)
goto out;

- for (i = 0; i < fgraph_array_cnt; i++) {
+ for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
struct fgraph_ops *gops = fgraph_array[i];

if (gops == &fgraph_stub)
@@ -919,7 +957,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
{
int command = 0;
int ret = 0;
- int i;
+ int i = -1;

mutex_lock(&ftrace_lock);

@@ -935,21 +973,16 @@ int register_ftrace_graph(struct fgraph_ops *gops)
/* The array must always have real data on it */
for (i = 0; i < FGRAPH_ARRAY_SIZE; i++)
fgraph_array[i] = &fgraph_stub;
+ fgraph_lru_init();
}

- /* Look for an available spot */
- for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
- if (fgraph_array[i] == &fgraph_stub)
- break;
- }
- if (i >= FGRAPH_ARRAY_SIZE) {
+ i = fgraph_lru_alloc_index();
+ if (i < 0 || WARN_ON_ONCE(fgraph_array[i] != &fgraph_stub)) {
ret = -ENOSPC;
goto out;
}

fgraph_array[i] = gops;
- if (i + 1 > fgraph_array_cnt)
- fgraph_array_cnt = i + 1;
gops->idx = i;

ftrace_graph_active++;
@@ -973,6 +1006,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
if (ret) {
fgraph_array[i] = &fgraph_stub;
ftrace_graph_active--;
+ fgraph_lru_release_index(i);
}
out:
mutex_unlock(&ftrace_lock);
@@ -982,25 +1016,20 @@ int register_ftrace_graph(struct fgraph_ops *gops)
void unregister_ftrace_graph(struct fgraph_ops *gops)
{
int command = 0;
- int i;

mutex_lock(&ftrace_lock);

if (unlikely(!ftrace_graph_active))
goto out;

- if (unlikely(gops->idx < 0 || gops->idx >= fgraph_array_cnt))
+ if (unlikely(gops->idx < 0 || gops->idx >= FGRAPH_ARRAY_SIZE ||
+ fgraph_array[gops->idx] != gops))
goto out;

- WARN_ON_ONCE(fgraph_array[gops->idx] != gops);
+ if (fgraph_lru_release_index(gops->idx) < 0)
+ goto out;

fgraph_array[gops->idx] = &fgraph_stub;
- if (gops->idx + 1 == fgraph_array_cnt) {
- i = gops->idx;
- while (i >= 0 && fgraph_array[i] == &fgraph_stub)
- i--;
- fgraph_array_cnt = i + 1;
- }

ftrace_graph_active--;
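
To illustrate why the ring delays reuse, here is a userspace sketch of the same table logic. It mirrors the three helpers in the patch but drops WARN_ON_ONCE() and locking; the lru_* names and the lru_demo() walk-through are hypothetical and only meant as a model.

```c
#include <assert.h>

#define MODEL_ARRAY_SIZE 16	/* FGRAPH_ARRAY_SIZE in the patch */

static int lru_table[MODEL_ARRAY_SIZE];
static int lru_next;	/* slot the next allocation reads from */
static int lru_last;	/* slot the next release writes to */

void lru_init(void)
{
	int i;

	for (i = 0; i < MODEL_ARRAY_SIZE; i++)
		lru_table[i] = i;
	lru_next = 0;
	lru_last = 0;
}

/* Append a released index at the tail; fails if the tail slot is in use. */
int lru_release_index(int idx)
{
	if (idx < 0 || idx >= MODEL_ARRAY_SIZE || lru_table[lru_last] != -1)
		return -1;

	lru_table[lru_last] = idx;
	lru_last = (lru_last + 1) % MODEL_ARRAY_SIZE;
	return 0;
}

/* Take the index at the head; -1 means every index is allocated. */
int lru_alloc_index(void)
{
	int idx = lru_table[lru_next];

	if (idx == -1)
		return -1;

	lru_table[lru_next] = -1;
	lru_next = (lru_next + 1) % MODEL_ARRAY_SIZE;
	return idx;
}

void lru_demo(void)
{
	int idx, i;

	lru_init();
	idx = lru_alloc_index();
	assert(idx == 0);
	idx = lru_alloc_index();
	assert(idx == 1);
	idx = lru_release_index(0);
	assert(idx == 0);
	/* The freshly released index 0 is not handed out next. */
	idx = lru_alloc_index();
	assert(idx == 2);
	/* Drain the never-used indexes 3..15. */
	for (i = 3; i < MODEL_ARRAY_SIZE; i++) {
		idx = lru_alloc_index();
		assert(idx == i);
	}
	/* Only now does the ring wrap around to the released index. */
	idx = lru_alloc_index();
	assert(idx == 0);
	idx = lru_alloc_index();
	assert(idx == -1);
}
```

In this model a released index is reused only after all indexes ahead of it in the ring have been handed out, which is the delay the commit message relies on.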



2024-04-15 13:10:07

by Masami Hiramatsu

Subject: [PATCH v9 16/36] function_graph: Move graph depth stored data to shadow stack global var

From: Steven Rostedt (VMware) <[email protected]>

The use of task->trace_recursion for the function graph depth logic
was a bit of an abuse of that variable. Now that there exist per-stack
global vars for registered graph tracers, use those instead.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
include/linux/trace_recursion.h | 29 -----------------------------
kernel/trace/trace.h | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 32 insertions(+), 31 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 2efd5ec46d7f..00e792bf148d 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,25 +44,6 @@ enum {
*/
TRACE_IRQ_BIT,

- /*
- * In the very unlikely case that an interrupt came in
- * at a start of graph tracing, and we want to trace
- * the function in that interrupt, the depth can be greater
- * than zero, because of the preempted start of a previous
- * trace. In an even more unlikely case, depth could be 2
- * if a softirq interrupted the start of graph tracing,
- * followed by an interrupt preempting a start of graph
- * tracing in the softirq, and depth can even be 3
- * if an NMI came in at the start of an interrupt function
- * that preempted a softirq start of a function that
- * preempted normal context!!!! Luckily, it can't be
- * greater than 3, so the next two bits are a mask
- * of what the depth is when we set TRACE_GRAPH_FL
- */
-
- TRACE_GRAPH_DEPTH_START_BIT,
- TRACE_GRAPH_DEPTH_END_BIT,
-
/*
* To implement set_graph_notrace, if this bit is set, we ignore
* function graph tracing of called functions, until the return
@@ -78,16 +59,6 @@ enum {
#define trace_recursion_clear(bit) do { (current)->trace_recursion &= ~(1<<(bit)); } while (0)
#define trace_recursion_test(bit) ((current)->trace_recursion & (1<<(bit)))

-#define trace_recursion_depth() \
- (((current)->trace_recursion >> TRACE_GRAPH_DEPTH_START_BIT) & 3)
-#define trace_recursion_set_depth(depth) \
- do { \
- current->trace_recursion &= \
- ~(3 << TRACE_GRAPH_DEPTH_START_BIT); \
- current->trace_recursion |= \
- ((depth) & 3) << TRACE_GRAPH_DEPTH_START_BIT; \
- } while (0)
-
#define TRACE_CONTEXT_BITS 4

#define TRACE_FTRACE_START TRACE_FTRACE_BIT
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index c7c7e7c9f700..7ab731b9ebc8 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -899,8 +899,38 @@ extern void free_fgraph_ops(struct trace_array *tr);

enum {
TRACE_GRAPH_FL = 1,
+
+ /*
+ * In the very unlikely case that an interrupt came in
+ * at a start of graph tracing, and we want to trace
+ * the function in that interrupt, the depth can be greater
+ * than zero, because of the preempted start of a previous
+ * trace. In an even more unlikely case, depth could be 2
+ * if a softirq interrupted the start of graph tracing,
+ * followed by an interrupt preempting a start of graph
+ * tracing in the softirq, and depth can even be 3
+ * if an NMI came in at the start of an interrupt function
+ * that preempted a softirq start of a function that
+ * preempted normal context!!!! Luckily, it can't be
+ * greater than 3, so the next two bits are a mask
+ * of what the depth is when we set TRACE_GRAPH_FL
+ */
+
+ TRACE_GRAPH_DEPTH_START_BIT,
+ TRACE_GRAPH_DEPTH_END_BIT,
};

+static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
+{
+ return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
+}
+
+static inline void ftrace_graph_set_depth(unsigned long *task_var, int depth)
+{
+ *task_var &= ~(3 << TRACE_GRAPH_DEPTH_START_BIT);
+ *task_var |= (depth & 3) << TRACE_GRAPH_DEPTH_START_BIT;
+}
+
#ifdef CONFIG_DYNAMIC_FTRACE
extern struct ftrace_hash __rcu *ftrace_graph_hash;
extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;
@@ -933,7 +963,7 @@ ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
* when the depth is zero.
*/
*task_var |= TRACE_GRAPH_FL;
- trace_recursion_set_depth(trace->depth);
+ ftrace_graph_set_depth(task_var, trace->depth);

/*
* If no irqs are to be traced, but a set_graph_function
@@ -958,7 +988,7 @@ ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace
unsigned long *task_var = fgraph_get_task_var(gops);

if ((*task_var & TRACE_GRAPH_FL) &&
- trace->depth == trace_recursion_depth())
+ trace->depth == ftrace_graph_depth(task_var))
*task_var &= ~TRACE_GRAPH_FL;
}
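
As a quick illustration of the bit layout used by ftrace_graph_depth() and ftrace_graph_set_depth(), here is a userspace sketch. The enum mirrors the values in the patch (TRACE_GRAPH_FL = 1, so the depth field starts at bit 2); the model_* names are hypothetical.

```c
#include <assert.h>

enum {
	MODEL_GRAPH_FL = 1,
	MODEL_DEPTH_START_BIT,	/* 2 */
	MODEL_DEPTH_END_BIT,	/* 3 */
};

/* Read the two-bit depth field out of the task variable. */
unsigned long model_graph_depth(const unsigned long *task_var)
{
	return (*task_var >> MODEL_DEPTH_START_BIT) & 3;
}

/* Overwrite the depth field and leave the other bits alone. */
void model_graph_set_depth(unsigned long *task_var, int depth)
{
	*task_var &= ~(3UL << MODEL_DEPTH_START_BIT);
	*task_var |= ((unsigned long)depth & 3) << MODEL_DEPTH_START_BIT;
}

void model_depth_demo(void)
{
	unsigned long var = MODEL_GRAPH_FL;

	model_graph_set_depth(&var, 3);
	assert(var == (MODEL_GRAPH_FL | (3UL << MODEL_DEPTH_START_BIT)));
	assert(model_graph_depth(&var) == 3);

	/* Resetting the depth keeps the flag bit intact. */
	model_graph_set_depth(&var, 0);
	assert(var == MODEL_GRAPH_FL);
}
```

Because the field is only two bits wide, the sketch masks the depth with 3 just as the patch does; the long comment in the enum explains why 3 is the maximum expected value.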



2024-04-15 13:10:19

by Masami Hiramatsu

Subject: [PATCH v9 17/36] function_graph: Move graph notrace bit to shadow stack global var

From: Steven Rostedt (VMware) <[email protected]>

The use of task->trace_recursion for the function graph no-trace logic
was a bit of an abuse of that variable. Now that there exist per-stack
global vars for registered graph tracers, use those instead.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Make description lines shorter than 76 chars.
---
include/linux/trace_recursion.h | 7 -------
kernel/trace/trace.h | 9 +++++++++
kernel/trace/trace_functions_graph.c | 10 ++++++----
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index 00e792bf148d..cc11b0e9d220 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,13 +44,6 @@ enum {
*/
TRACE_IRQ_BIT,

- /*
- * To implement set_graph_notrace, if this bit is set, we ignore
- * function graph tracing of called functions, until the return
- * function is called to clear it.
- */
- TRACE_GRAPH_NOTRACE_BIT,
-
/* Used to prevent recursion recording from recursing. */
TRACE_RECORD_RECURSION_BIT,
};
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 7ab731b9ebc8..f23b6fbd547d 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -918,8 +918,17 @@ enum {

TRACE_GRAPH_DEPTH_START_BIT,
TRACE_GRAPH_DEPTH_END_BIT,
+
+ /*
+ * To implement set_graph_notrace, if this bit is set, we ignore
+ * function graph tracing of called functions, until the return
+ * function is called to clear it.
+ */
+ TRACE_GRAPH_NOTRACE_BIT,
};

+#define TRACE_GRAPH_NOTRACE (1 << TRACE_GRAPH_NOTRACE_BIT)
+
static inline unsigned long ftrace_graph_depth(unsigned long *task_var)
{
return (*task_var >> TRACE_GRAPH_DEPTH_START_BIT) & 3;
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 66cce73e94f8..13d0387ac6a6 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -130,6 +130,7 @@ static inline int ftrace_graph_ignore_irqs(void)
int trace_graph_entry(struct ftrace_graph_ent *trace,
struct fgraph_ops *gops)
{
+ unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -138,7 +139,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
int ret;
int cpu;

- if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT))
+ if (*task_var & TRACE_GRAPH_NOTRACE)
return 0;

/*
@@ -149,7 +150,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
* returning from the function.
*/
if (ftrace_graph_notrace_addr(trace->func)) {
- trace_recursion_set(TRACE_GRAPH_NOTRACE_BIT);
+ *task_var |= TRACE_GRAPH_NOTRACE;
/*
* Need to return 1 to have the return called
* that will clear the NOTRACE bit.
@@ -240,6 +241,7 @@ void __trace_graph_return(struct trace_array *tr,
void trace_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
{
+ unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
@@ -249,8 +251,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,

ftrace_graph_addr_finish(gops, trace);

- if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
- trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
+ if (*task_var & TRACE_GRAPH_NOTRACE) {
+ *task_var &= ~TRACE_GRAPH_NOTRACE;
return;
}
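
Since TRACE_GRAPH_NOTRACE_BIT is a bit number and TRACE_GRAPH_NOTRACE is the mask derived from it, the two are not interchangeable in bitwise operations on the task variable. The userspace sketch below models the enum layout from trace.h (the nt_* and NT_* names are hypothetical) and shows what each value does when OR-ed in.

```c
#include <assert.h>

enum {
	NT_GRAPH_FL = 1,
	NT_DEPTH_START_BIT,	/* 2 */
	NT_DEPTH_END_BIT,	/* 3 */
	NT_NOTRACE_BIT,		/* 4 */
};

#define NT_NOTRACE (1UL << NT_NOTRACE_BIT)	/* mask 0x10 */

int nt_is_notrace(unsigned long task_var)
{
	return (task_var & NT_NOTRACE) != 0;
}

unsigned long nt_depth(unsigned long task_var)
{
	return (task_var >> NT_DEPTH_START_BIT) & 3;
}

void nt_demo(void)
{
	unsigned long with_mask = 0;
	unsigned long with_bit_number = 0;

	/* OR-ing the mask sets the notrace flag and nothing else. */
	with_mask |= NT_NOTRACE;
	assert(nt_is_notrace(with_mask));
	assert(nt_depth(with_mask) == 0);

	/* OR-ing the bit number (4) lands in the depth field instead. */
	with_bit_number |= NT_NOTRACE_BIT;
	assert(!nt_is_notrace(with_bit_number));
	assert(nt_depth(with_bit_number) == 1);
}
```

So any set, test, or clear of the notrace state on the task variable needs the mask form.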



2024-04-15 13:10:36

by Masami Hiramatsu

Subject: [PATCH v9 18/36] function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()

From: Steven Rostedt (VMware) <[email protected]>

Add functions that can be called by an fgraph_ops entryfunc and retfunc to
store state between the entry of the function being traced and the exit of
the same function. The fgraph_ops entryfunc() may call
fgraph_reserve_data() to store up to 32 words on the task's shadow
ret_stack, and this data can then be retrieved by fgraph_retrieve_data()
called by the corresponding retfunc().

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v8:
- Avoid using DIV_ROUND_UP() in the hot path.
Changes in v3:
- Store fgraph_array index to the data entry.
- Both functions require fgraph_array index to store/retrieve data.
- Reserve correct size of the data.
- Return correct data area.
Changes in v2:
- Retrieve the reserved size by fgraph_retrieve_data().
- Expand the maximum data size to 32 words.
- Update stack index with __get_index(val) if FGRAPH_TYPE_ARRAY entry.
- fix typos and make description lines shorter than 76 chars.
---
include/linux/ftrace.h | 3 +
kernel/trace/fgraph.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 170 insertions(+), 8 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 85b887973e02..4c53f3dffab8 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1075,6 +1075,9 @@ struct fgraph_ops {
int idx;
};

+void *fgraph_reserve_data(int idx, int size_bytes);
+void *fgraph_retrieve_data(int idx, int *size_bytes);
+
/*
* Stack of return addresses for functions
* of a thread.
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 2a6e91c293fe..d13806ca1bbb 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -41,17 +41,29 @@
* bits: 10 - 11 Type of storage
* 0 - reserved
* 1 - bitmap of fgraph_array index
+ * 2 - reserved data
*
* For bitmap of fgraph_array index
* bits: 12 - 27 The bitmap of fgraph_ops fgraph_array index
*
+ * For reserved data:
+ * bits: 12 - 16 The size in words that is stored
+ * bits: 17 - 20 The index of fgraph_array, which shows who stored it
+ *
* That is, at the end of function_graph_enter, if the first and fourth
* fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
- * on the return of the function being traced, this is what will be on the
- * task's shadow ret_stack: (the stack grows upward)
+ * on the return of the function being traced, and the fourth fgraph_ops
+ * stored two words of data, this is what will be on the task's shadow
+ * ret_stack: (the stack grows upward)
*
* | | <- task->curr_ret_stack
* +--------------------------------------------+
+ * | data_type(idx:3, size:2, |
+ * | offset:FGRAPH_RET_INDEX+3) | ( Data with size of 2 words)
+ * +--------------------------------------------+ ( It is 4 words from the ret_stack)
+ * | STORED DATA WORD 2 |
+ * | STORED DATA WORD 1 |
+ * +--------------------------------------------+
* | bitmap_type(bitmap:(BIT(3)|BIT(0)), |
* | offset:FGRAPH_RET_INDEX) | <- the offset is from here
* +--------------------------------------------+
@@ -78,14 +90,23 @@
enum {
FGRAPH_TYPE_RESERVED = 0,
FGRAPH_TYPE_BITMAP = 1,
+ FGRAPH_TYPE_DATA = 2,
};

#define FGRAPH_INDEX_SIZE 16
#define FGRAPH_INDEX_MASK GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)

-/* Currently the max stack index can't be more than register callers */
-#define FGRAPH_MAX_INDEX (FGRAPH_INDEX_SIZE + FGRAPH_RET_INDEX)
+#define FGRAPH_DATA_SIZE 5
+#define FGRAPH_DATA_MASK ((1 << FGRAPH_DATA_SIZE) - 1)
+#define FGRAPH_DATA_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
+
+#define FGRAPH_DATA_INDEX_SIZE 4
+#define FGRAPH_DATA_INDEX_MASK ((1 << FGRAPH_DATA_INDEX_SIZE) - 1)
+#define FGRAPH_DATA_INDEX_SHIFT (FGRAPH_DATA_SHIFT + FGRAPH_DATA_SIZE)
+
+#define FGRAPH_MAX_INDEX \
+ ((FGRAPH_INDEX_SIZE << FGRAPH_DATA_SIZE) + FGRAPH_RET_INDEX)

#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_SIZE

@@ -97,6 +118,8 @@ enum {

#define RET_STACK(t, index) ((struct ftrace_ret_stack *)(&(t)->ret_stack[index]))

+#define FGRAPH_MAX_DATA_SIZE (sizeof(long) * (1 << FGRAPH_DATA_SIZE))
+
/*
* Each fgraph_ops has a reserved unsigned long at the end (top) of the
* ret_stack to store task specific state.
@@ -149,14 +172,39 @@ static int fgraph_lru_alloc_index(void)
return idx;
}

+static inline int __get_index(unsigned long val)
+{
+ return val & FGRAPH_RET_INDEX_MASK;
+}
+
+static inline int __get_type(unsigned long val)
+{
+ return (val >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+}
+
+static inline int __get_data_index(unsigned long val)
+{
+ return (val >> FGRAPH_DATA_INDEX_SHIFT) & FGRAPH_DATA_INDEX_MASK;
+}
+
+static inline int __get_data_size(unsigned long val)
+{
+ return (val >> FGRAPH_DATA_SHIFT) & FGRAPH_DATA_MASK;
+}
+
+static inline unsigned long get_fgraph_entry(struct task_struct *t, int index)
+{
+ return t->ret_stack[index];
+}
+
static inline int get_ret_stack_index(struct task_struct *t, int offset)
{
- return t->ret_stack[offset] & FGRAPH_RET_INDEX_MASK;
+ return __get_index(t->ret_stack[offset]);
}

static inline int get_fgraph_type(struct task_struct *t, int offset)
{
- return (t->ret_stack[offset] >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
+ return __get_type(t->ret_stack[offset]);
}

static inline unsigned long
@@ -183,6 +231,22 @@ add_fgraph_index_bitmap(struct task_struct *t, int offset, unsigned long bitmap)
t->ret_stack[offset] |= (bitmap << FGRAPH_INDEX_SHIFT);
}

+static inline void *get_fgraph_data(struct task_struct *t, int index)
+{
+ unsigned long val = t->ret_stack[index];
+
+ if (__get_type(val) != FGRAPH_TYPE_DATA)
+ return NULL;
+ index -= __get_data_size(val);
+ return (void *)&t->ret_stack[index];
+}
+
+static inline unsigned long make_fgraph_data(int idx, int size, int offset)
+{
+ return (idx << FGRAPH_DATA_INDEX_SHIFT) | (size << FGRAPH_DATA_SHIFT) |
+ (FGRAPH_TYPE_DATA << FGRAPH_TYPE_SHIFT) | offset;
+}
+
/* ftrace_graph_entry set to this to tell some archs to run function graph */
static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
{
@@ -216,6 +280,92 @@ static void ret_stack_init_task_vars(unsigned long *ret_stack)
memset(gvals, 0, sizeof(*gvals) * FGRAPH_ARRAY_SIZE);
}

+/**
+ * fgraph_reserve_data - Reserve storage on the task's ret_stack
+ * @idx: The index of fgraph_array
+ * @size_bytes: The size in bytes to reserve
+ *
+ * Reserves space of up to FGRAPH_MAX_DATA_SIZE bytes on the
+ * task's ret_stack shadow stack, for a given fgraph_ops during
+ * the entryfunc() call. If entryfunc() returns zero, the storage
+ * is discarded. An entryfunc() can only call this once per iteration.
+ * The fgraph_ops retfunc() can retrieve this stored data with
+ * fgraph_retrieve_data().
+ *
+ * Returns: On success, a pointer to the data on the stack.
+ * Otherwise, NULL if there's not enough space left on the
+ * ret_stack for the data, or if fgraph_reserve_data() was called
+ * more than once for a single entryfunc() call.
+ */
+void *fgraph_reserve_data(int idx, int size_bytes)
+{
+ unsigned long val;
+ void *data;
+ int curr_ret_stack = current->curr_ret_stack;
+ int data_size;
+
+ if (size_bytes > FGRAPH_MAX_DATA_SIZE)
+ return NULL;
+
+ /* Convert the byte size to a number of longs, rounding up */
+ data_size = (size_bytes + sizeof(long) - 1) >> (sizeof(long) == 4 ? 2 : 3);
+
+ val = get_fgraph_entry(current, curr_ret_stack - 1);
+ data = &current->ret_stack[curr_ret_stack];
+
+ curr_ret_stack += data_size + 1;
+ if (unlikely(curr_ret_stack >= SHADOW_STACK_MAX_INDEX))
+ return NULL;
+
+ val = make_fgraph_data(idx, data_size, __get_index(val) + data_size + 1);
+
+ /* Set the last word to be reserved */
+ current->ret_stack[curr_ret_stack - 1] = val;
+
+ /* Make sure interrupts see this */
+ barrier();
+ current->curr_ret_stack = curr_ret_stack;
+ /* Again sync with interrupts, and reset reserve */
+ current->ret_stack[curr_ret_stack - 1] = val;
+
+ return data;
+}
+
+/**
+ * fgraph_retrieve_data - Retrieve stored data from fgraph_reserve_data()
+ * @idx: the index of fgraph_array (fgraph_ops::idx)
+ * @size_bytes: pointer to retrieved data size.
+ *
+ * This is to be called by a fgraph_ops retfunc(), to retrieve data that
+ * was stored by the fgraph_ops entryfunc() on the function entry.
+ * That is, this will retrieve the data that was reserved on the
+ * entry of the function that corresponds to the exit of the function
+ * that the fgraph_ops retfunc() is called on.
+ *
+ * Returns: The stored data from fgraph_reserve_data() called by the
+ * matching entryfunc() for the retfunc() this is called from.
+ * Or NULL if there was nothing stored.
+ */
+void *fgraph_retrieve_data(int idx, int *size_bytes)
+{
+ int index = current->curr_ret_stack - 1;
+ unsigned long val;
+
+ val = get_fgraph_entry(current, index);
+ while (__get_type(val) == FGRAPH_TYPE_DATA) {
+ if (__get_data_index(val) == idx)
+ goto found;
+ index -= __get_data_size(val) + 1;
+ val = get_fgraph_entry(current, index);
+ }
+ return NULL;
+found:
+ if (size_bytes)
+ *size_bytes = __get_data_size(val) *
+ sizeof(long);
+ return get_fgraph_data(current, index);
+}
+
/**
* fgraph_get_task_var - retrieve a task specific state variable
* @gops: The ftrace_ops that owns the task specific variable
@@ -453,13 +603,18 @@ int function_graph_enter(unsigned long ret, unsigned long func,

for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
struct fgraph_ops *gops = fgraph_array[i];
+ int save_curr_ret_stack;

if (gops == &fgraph_stub)
continue;

+ save_curr_ret_stack = current->curr_ret_stack;
if (ftrace_ops_test(&gops->ops, func, NULL) &&
gops->entryfunc(&trace, gops))
bitmap |= BIT(i);
+ else
+ /* Clear out any saved storage */
+ current->curr_ret_stack = save_curr_ret_stack;
}

if (!bitmap)
@@ -485,6 +640,7 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
struct fgraph_ops *gops)
{
struct ftrace_graph_ent trace;
+ int save_curr_ret_stack;
int index;
int type;

@@ -504,13 +660,15 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,

trace.func = func;
trace.depth = current->curr_ret_depth;
+ save_curr_ret_stack = current->curr_ret_stack;
if (gops->entryfunc(&trace, gops)) {
if (type == FGRAPH_TYPE_RESERVED)
set_fgraph_index_bitmap(current, index, BIT(gops->idx));
else
add_fgraph_index_bitmap(current, index, BIT(gops->idx));
return 0;
- }
+ } else
+ current->curr_ret_stack = save_curr_ret_stack;

if (type == FGRAPH_TYPE_RESERVED) {
current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
@@ -655,7 +813,8 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
* curr_ret_stack is after that.
*/
barrier();
- current->curr_ret_stack -= FGRAPH_RET_INDEX + 1;
+ current->curr_ret_stack = index - FGRAPH_RET_INDEX;
+
current->curr_ret_depth--;
return ret;
}
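
As an aid to reading make_fgraph_data() and the __get_*() helpers, here is a userspace sketch of how the data word is packed. The field widths are taken from the FGRAPH_DATA_* macros in this patch (5 bits of size starting at bit 12, then 4 bits of fgraph_array index); the 10-bit offset mask is an assumption based on the header comment placing the type at bits 10 - 11, and the MODEL_* and model_* names are hypothetical.

```c
#include <assert.h>

#define MODEL_RET_INDEX_MASK	0x3ffUL	/* assumed: offset in bits 0 - 9 */
#define MODEL_TYPE_SHIFT	10
#define MODEL_TYPE_MASK		3UL
#define MODEL_TYPE_DATA		2UL
#define MODEL_DATA_SIZE		5
#define MODEL_DATA_MASK		((1UL << MODEL_DATA_SIZE) - 1)
#define MODEL_DATA_SHIFT	12
#define MODEL_DATA_INDEX_SIZE	4
#define MODEL_DATA_INDEX_MASK	((1UL << MODEL_DATA_INDEX_SIZE) - 1)
#define MODEL_DATA_INDEX_SHIFT	(MODEL_DATA_SHIFT + MODEL_DATA_SIZE)

/* Round a byte count up to whole longs without a division. */
int model_bytes_to_words(int size_bytes)
{
	return (size_bytes + (int)sizeof(long) - 1) >>
		(sizeof(long) == 4 ? 2 : 3);
}

/* Pack owner index, size in words, type and offset into one word. */
unsigned long model_make_data(int idx, int size, int offset)
{
	return ((unsigned long)idx << MODEL_DATA_INDEX_SHIFT) |
	       ((unsigned long)size << MODEL_DATA_SHIFT) |
	       (MODEL_TYPE_DATA << MODEL_TYPE_SHIFT) |
	       (unsigned long)offset;
}

void model_encoding_demo(void)
{
	unsigned long val = model_make_data(3, 2, 7);

	assert(((val >> MODEL_TYPE_SHIFT) & MODEL_TYPE_MASK) == MODEL_TYPE_DATA);
	assert(((val >> MODEL_DATA_INDEX_SHIFT) & MODEL_DATA_INDEX_MASK) == 3);
	assert(((val >> MODEL_DATA_SHIFT) & MODEL_DATA_MASK) == 2);
	assert((val & MODEL_RET_INDEX_MASK) == 7);

	assert(model_bytes_to_words(1) == 1);
	assert(model_bytes_to_words((int)sizeof(long)) == 1);
	assert(model_bytes_to_words((int)sizeof(long) + 1) == 2);
}
```

The shift-based rounding is what replaced DIV_ROUND_UP() in the hot path, per the v8 changelog entry above.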


2024-04-15 13:10:52

by Masami Hiramatsu

Subject: [PATCH v9 19/36] function_graph: Add selftest for passing local variables

From: Steven Rostedt (VMware) <[email protected]>

Add a boot-up selftest that passes variables from a function entry to
the matching function exit and verifies that they arrive intact.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Add reserved size test.
- Use pr_*() instead of printk(KERN_*).
---
kernel/trace/trace_selftest.c | 169 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 169 insertions(+)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index f8f55fd79e53..fcdc744c245e 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -756,6 +756,173 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)

#ifdef CONFIG_FUNCTION_GRAPH_TRACER

+#ifdef CONFIG_DYNAMIC_FTRACE
+
+#define BYTE_NUMBER 123
+#define SHORT_NUMBER 12345
+#define WORD_NUMBER 1234567890
+#define LONG_NUMBER 1234567890123456789LL
+
+static int fgraph_store_size __initdata;
+static const char *fgraph_store_type_name __initdata;
+static char *fgraph_error_str __initdata;
+static char fgraph_error_str_buf[128] __initdata;
+
+static __init int store_entry(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops)
+{
+ const char *type = fgraph_store_type_name;
+ int size = fgraph_store_size;
+ void *p;
+
+ p = fgraph_reserve_data(gops->idx, size);
+ if (!p) {
+ snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ "Failed to reserve %s\n", type);
+ fgraph_error_str = fgraph_error_str_buf;
+ return 0;
+ }
+
+ switch (fgraph_store_size) {
+ case 1:
+ *(char *)p = BYTE_NUMBER;
+ break;
+ case 2:
+ *(short *)p = SHORT_NUMBER;
+ break;
+ case 4:
+ *(int *)p = WORD_NUMBER;
+ break;
+ case 8:
+ *(long long *)p = LONG_NUMBER;
+ break;
+ }
+
+ return 1;
+}
+
+static __init void store_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops)
+{
+ const char *type = fgraph_store_type_name;
+ long long expect = 0;
+ long long found = -1;
+ int size;
+ char *p;
+
+ p = fgraph_retrieve_data(gops->idx, &size);
+ if (!p) {
+ snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ "Failed to retrieve %s\n", type);
+ fgraph_error_str = fgraph_error_str_buf;
+ return;
+ }
+ if (fgraph_store_size > size) {
+ snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ "Retrieved size %d is smaller than expected %d\n",
+ size, (int)fgraph_store_size);
+ fgraph_error_str = fgraph_error_str_buf;
+ return;
+ }
+
+ switch (fgraph_store_size) {
+ case 1:
+ expect = BYTE_NUMBER;
+ found = *(char *)p;
+ break;
+ case 2:
+ expect = SHORT_NUMBER;
+ found = *(short *)p;
+ break;
+ case 4:
+ expect = WORD_NUMBER;
+ found = *(int *)p;
+ break;
+ case 8:
+ expect = LONG_NUMBER;
+ found = *(long long *)p;
+ break;
+ }
+
+ if (found != expect) {
+ snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ "%s returned not %lld but %lld\n", type, expect, found);
+ fgraph_error_str = fgraph_error_str_buf;
+ return;
+ }
+ fgraph_error_str = NULL;
+}
+
+static struct fgraph_ops store_bytes __initdata = {
+ .entryfunc = store_entry,
+ .retfunc = store_return,
+};
+
+static int __init test_graph_storage_type(const char *name, int size)
+{
+ char *func_name;
+ int len;
+ int ret;
+
+ fgraph_store_type_name = name;
+ fgraph_store_size = size;
+
+ snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ "Failed to execute storage %s\n", name);
+ fgraph_error_str = fgraph_error_str_buf;
+
+ pr_cont("PASSED\n");
+ pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : "");
+
+ func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+ len = strlen(func_name);
+
+ ret = ftrace_set_filter(&store_bytes.ops, func_name, len, 1);
+ if (ret && ret != -ENODEV) {
+ pr_cont("*Could not set filter* ");
+ return -1;
+ }
+
+ ret = register_ftrace_graph(&store_bytes);
+ if (ret) {
+ pr_warn("Failed to init store_bytes fgraph tracing\n");
+ return -1;
+ }
+
+ DYN_FTRACE_TEST_NAME();
+
+ unregister_ftrace_graph(&store_bytes);
+
+ if (fgraph_error_str) {
+ pr_cont("*** %s ***", fgraph_error_str);
+ return -1;
+ }
+
+ return 0;
+}
+/* Test the storage passed across function_graph entry and return */
+static __init int test_graph_storage(void)
+{
+ int ret;
+
+ ret = test_graph_storage_type("byte", 1);
+ if (ret)
+ return ret;
+ ret = test_graph_storage_type("short", 2);
+ if (ret)
+ return ret;
+ ret = test_graph_storage_type("word", 4);
+ if (ret)
+ return ret;
+ ret = test_graph_storage_type("long long", 8);
+ if (ret)
+ return ret;
+ return 0;
+}
+#else
+static inline int test_graph_storage(void) { return 0; }
+#endif /* CONFIG_DYNAMIC_FTRACE */
+
/* Maximum number of functions to trace before diagnosing a hang */
#define GRAPH_MAX_FUNC_TEST 100000000

@@ -913,6 +1080,8 @@ trace_selftest_startup_function_graph(struct tracer *trace,
ftrace_set_global_filter(NULL, 0, 1);
#endif

+ ret = test_graph_storage();
+
/* Don't test dynamic tracing, the function tracer already did */
out:
/* Stop it if we failed */
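
The round trip this selftest checks can be modelled in userspace. The sketch below is a simplified toy, not the kernel implementation: it keeps only the data words and their trailing header word, omits the offset field, the bitmap entry and the interrupt barriers, and all toy_* and TOY_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_STACK_WORDS	64
#define TOY_TYPE_SHIFT	10
#define TOY_TYPE_MASK	3UL
#define TOY_TYPE_DATA	2UL
#define TOY_SIZE_SHIFT	12
#define TOY_SIZE_MASK	31UL
#define TOY_IDX_SHIFT	17
#define TOY_IDX_MASK	15UL

struct toy_task {
	unsigned long ret_stack[TOY_STACK_WORDS];
	int curr;	/* next free word, like curr_ret_stack */
};

/* Reserve @size_bytes for owner @idx; a header word follows the data. */
void *toy_reserve(struct toy_task *t, int idx, int size_bytes)
{
	void *data = &t->ret_stack[t->curr];
	int words;

	assert(idx >= 0 && idx <= (int)TOY_IDX_MASK);
	if (size_bytes <= 0)
		return NULL;
	words = (size_bytes + (int)sizeof(long) - 1) / (int)sizeof(long);
	if (words > (int)TOY_SIZE_MASK ||
	    t->curr + words + 1 > TOY_STACK_WORDS)
		return NULL;

	t->ret_stack[t->curr + words] =
		((unsigned long)idx << TOY_IDX_SHIFT) |
		((unsigned long)words << TOY_SIZE_SHIFT) |
		(TOY_TYPE_DATA << TOY_TYPE_SHIFT);
	t->curr += words + 1;
	return data;
}

/* Walk header words from the top until one owned by @idx is found. */
void *toy_retrieve(struct toy_task *t, int idx, int *size_bytes)
{
	int index = t->curr - 1;

	while (index >= 0) {
		unsigned long val = t->ret_stack[index];
		int words = (int)((val >> TOY_SIZE_SHIFT) & TOY_SIZE_MASK);

		if (((val >> TOY_TYPE_SHIFT) & TOY_TYPE_MASK) != TOY_TYPE_DATA)
			break;
		if ((int)((val >> TOY_IDX_SHIFT) & TOY_IDX_MASK) == idx) {
			if (size_bytes)
				*size_bytes = words * (int)sizeof(long);
			return &t->ret_stack[index - words];
		}
		index -= words + 1;
	}
	return NULL;
}

void toy_demo(void)
{
	struct toy_task t;
	long long *big, *got_big;
	int *small, *got_small;
	int size = 0;

	memset(&t, 0, sizeof(t));
	small = toy_reserve(&t, 3, sizeof(*small));
	big = toy_reserve(&t, 5, sizeof(*big));
	assert(small && big);
	*small = 1234567890;
	*big = 1234567890123456789LL;

	got_small = toy_retrieve(&t, 3, &size);
	assert(got_small == small && *got_small == 1234567890);
	assert(size >= (int)sizeof(*small));
	got_big = toy_retrieve(&t, 5, &size);
	assert(got_big == big && *got_big == 1234567890123456789LL);
	assert(toy_retrieve(&t, 7, NULL) == NULL);
}
```

As in the selftest, the retrieved size is rounded up to whole longs, so a caller can only check that it is at least the size it asked for.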


2024-04-15 13:10:52

by Masami Hiramatsu

Subject: [PATCH v9 10/36] ftrace: Allow function_graph tracer to be enabled in instances

From: Steven Rostedt (VMware) <[email protected]>

Now that function graph tracing can handle more than one user, allow it to
be enabled in the ftrace instances. Note, the function filtering is still
shared with the top level set_ftrace_filter and friends, as well as the
graph and nograph files.

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Fix to remove set_graph_array() completely.
---
include/linux/ftrace.h | 1 +
kernel/trace/ftrace.c | 1 +
kernel/trace/trace.h | 13 ++++++-
kernel/trace/trace_functions.c | 8 ++++
kernel/trace/trace_functions_graph.c | 65 +++++++++++++++++++++-------------
kernel/trace/trace_selftest.c | 4 +-
6 files changed, 64 insertions(+), 28 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 483876444d32..2eb4981ec80b 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1070,6 +1070,7 @@ extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph
struct fgraph_ops {
trace_func_graph_ent_t entryfunc;
trace_func_graph_ret_t retfunc;
+ void *private;
int idx;
};

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 4b0708106692..92abb9869198 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7326,6 +7326,7 @@ __init void ftrace_init_global_array_ops(struct trace_array *tr)
tr->ops = &global_ops;
tr->ops->private = tr;
ftrace_init_trace_array(tr);
+ init_array_fgraph_ops(tr);
}

void ftrace_init_array_ops(struct trace_array *tr, ftrace_func_t func)
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 55bb9a3bf322..114b120afd2a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -396,6 +396,9 @@ struct trace_array {
struct ftrace_ops *ops;
struct trace_pid_list __rcu *function_pids;
struct trace_pid_list __rcu *function_no_pids;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ struct fgraph_ops *gops;
+#endif
#ifdef CONFIG_DYNAMIC_FTRACE
/* All of these are protected by the ftrace_lock */
struct list_head func_probes;
@@ -680,7 +683,6 @@ void print_trace_header(struct seq_file *m, struct trace_iterator *iter);

void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
-void set_graph_array(struct trace_array *tr);

void tracing_start_cmdline_record(void);
void tracing_stop_cmdline_record(void);
@@ -891,6 +893,9 @@ extern int __trace_graph_entry(struct trace_array *tr,
extern void __trace_graph_return(struct trace_array *tr,
struct ftrace_graph_ret *trace,
unsigned int trace_ctx);
+extern void init_array_fgraph_ops(struct trace_array *tr);
+extern int allocate_fgraph_ops(struct trace_array *tr);
+extern void free_fgraph_ops(struct trace_array *tr);

#ifdef CONFIG_DYNAMIC_FTRACE
extern struct ftrace_hash __rcu *ftrace_graph_hash;
@@ -1003,6 +1008,12 @@ print_graph_function_flags(struct trace_iterator *iter, u32 flags)
{
return TRACE_TYPE_UNHANDLED;
}
+static inline void init_array_fgraph_ops(struct trace_array *tr) { }
+static inline int allocate_fgraph_ops(struct trace_array *tr)
+{
+ return 0;
+}
+static inline void free_fgraph_ops(struct trace_array *tr) { }
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

extern struct list_head ftrace_pids;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9f1bfbe105e8..8e8da0d0ee52 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -80,6 +80,7 @@ void ftrace_free_ftrace_ops(struct trace_array *tr)
int ftrace_create_function_files(struct trace_array *tr,
struct dentry *parent)
{
+ int ret;
/*
* The top level array uses the "global_ops", and the files are
* created on boot up.
@@ -90,6 +91,12 @@ int ftrace_create_function_files(struct trace_array *tr,
if (!tr->ops)
return -EINVAL;

+ ret = allocate_fgraph_ops(tr);
+ if (ret) {
+ kfree(tr->ops);
+ return ret;
+ }
+
ftrace_create_filter_files(tr->ops, parent);

return 0;
@@ -99,6 +106,7 @@ void ftrace_destroy_function_files(struct trace_array *tr)
{
ftrace_destroy_filter_files(tr->ops);
ftrace_free_ftrace_ops(tr);
+ free_fgraph_ops(tr);
}

static ftrace_func_t select_trace_function(u32 flags_val)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index b7b142b65299..9ccc904a7703 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -83,8 +83,6 @@ static struct tracer_flags tracer_flags = {
.opts = trace_opts
};

-static struct trace_array *graph_array;
-
/*
* DURATION column is being also used to display IRQ signs,
* following values are used by print_graph_irq and others
@@ -132,7 +130,7 @@ static inline int ftrace_graph_ignore_irqs(void)
int trace_graph_entry(struct ftrace_graph_ent *trace,
struct fgraph_ops *gops)
{
- struct trace_array *tr = graph_array;
+ struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
unsigned int trace_ctx;
@@ -242,7 +240,7 @@ void __trace_graph_return(struct trace_array *tr,
void trace_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
{
- struct trace_array *tr = graph_array;
+ struct trace_array *tr = gops->private;
struct trace_array_cpu *data;
unsigned long flags;
unsigned int trace_ctx;
@@ -268,15 +266,6 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
local_irq_restore(flags);
}

-void set_graph_array(struct trace_array *tr)
-{
- graph_array = tr;
-
- /* Make graph_array visible before we start tracing */
-
- smp_mb();
-}
-
static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
{
@@ -294,25 +283,53 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
trace_graph_return(trace, gops);
}

-static struct fgraph_ops funcgraph_thresh_ops = {
- .entryfunc = &trace_graph_entry,
- .retfunc = &trace_graph_thresh_return,
-};
-
static struct fgraph_ops funcgraph_ops = {
.entryfunc = &trace_graph_entry,
.retfunc = &trace_graph_return,
};

+int allocate_fgraph_ops(struct trace_array *tr)
+{
+ struct fgraph_ops *gops;
+
+ gops = kzalloc(sizeof(*gops), GFP_KERNEL);
+ if (!gops)
+ return -ENOMEM;
+
+ gops->entryfunc = &trace_graph_entry;
+ gops->retfunc = &trace_graph_return;
+
+ tr->gops = gops;
+ gops->private = tr;
+ return 0;
+}
+
+void free_fgraph_ops(struct trace_array *tr)
+{
+ kfree(tr->gops);
+}
+
+__init void init_array_fgraph_ops(struct trace_array *tr)
+{
+ tr->gops = &funcgraph_ops;
+ funcgraph_ops.private = tr;
+}
+
static int graph_trace_init(struct trace_array *tr)
{
int ret;

- set_graph_array(tr);
+ tr->gops->entryfunc = trace_graph_entry;
+
if (tracing_thresh)
- ret = register_ftrace_graph(&funcgraph_thresh_ops);
+ tr->gops->retfunc = trace_graph_thresh_return;
else
- ret = register_ftrace_graph(&funcgraph_ops);
+ tr->gops->retfunc = trace_graph_return;
+
+ /* Make gops functions visible before we start tracing */
+ smp_mb();
+
+ ret = register_ftrace_graph(tr->gops);
if (ret)
return ret;
tracing_start_cmdline_record();
@@ -323,10 +340,7 @@ static int graph_trace_init(struct trace_array *tr)
static void graph_trace_reset(struct trace_array *tr)
{
tracing_stop_cmdline_record();
- if (tracing_thresh)
- unregister_ftrace_graph(&funcgraph_thresh_ops);
- else
- unregister_ftrace_graph(&funcgraph_ops);
+ unregister_ftrace_graph(tr->gops);
}

static int graph_trace_update_thresh(struct trace_array *tr)
@@ -1365,6 +1379,7 @@ static struct tracer graph_trace __tracer_data = {
.print_header = print_graph_headers,
.flags = &tracer_flags,
.set_flag = func_graph_set_flag,
+ .allow_instances = true,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_function_graph,
#endif
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 56f269c0560a..f8f55fd79e53 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -813,7 +813,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
* to detect and recover from possible hangs
*/
tracing_reset_online_cpus(&tr->array_buffer);
- set_graph_array(tr);
+ fgraph_ops.private = tr;
ret = register_ftrace_graph(&fgraph_ops);
if (ret) {
warn_failed_init_tracer(trace, ret);
@@ -856,7 +856,7 @@ trace_selftest_startup_function_graph(struct tracer *trace,
cond_resched();

tracing_reset_online_cpus(&tr->array_buffer);
- set_graph_array(tr);
+ fgraph_ops.private = tr;

/*
* Some archs *cough*PowerPC*cough* add characters to the


2024-04-15 13:11:08

by Masami Hiramatsu

Subject: [PATCH v9 20/36] ftrace: Add multiple fgraph storage selftest

From: Masami Hiramatsu (Google) <[email protected]>

Add a selftest for multiple function graph tracers with storage on the same
function. In this case, the shadow stack entry is shared among those
fgraph users, each with its own data storage. The test ensures that fgraph
does not mix up their stored data.
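
As a rough userspace sketch of what the test checks (an assumption-laden
model, not the selftest itself): several fixtures embed their own ops,
recover themselves with container_of(), and store differently sized data in
one shared frame without clobbering each other. struct ops, struct frame,
MAX_OPS and check_return() are hypothetical; the real code uses
fgraph_reserve_data() and fgraph_retrieve_data() on the shadow stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_OPS 4	/* hypothetical limit for this sketch */

struct ops {
	int idx;	/* slot index assigned to this user */
};

struct fixture {
	struct ops gops;	/* embedded, like fgraph_fixture::gops */
	int store_size;		/* 1, 2, 4 or 8 bytes */
	long long expect;	/* value this fixture stores */
};

/* One shared entry with a separate storage slot per ops index. */
struct frame {
	unsigned char slot[MAX_OPS][sizeof(long long)];
};

/* Entry side: store store_size bytes into this user's own slot. */
static void store_entry(struct ops *gops, struct frame *f)
{
	struct fixture *fx = container_of(gops, struct fixture, gops);

	assert(gops->idx >= 0 && gops->idx < MAX_OPS);
	assert(fx->store_size > 0 &&
	       fx->store_size <= (int)sizeof(long long));
	memcpy(f->slot[gops->idx], &fx->expect, (size_t)fx->store_size);
}

/* Return side: 1 if this user's slot still holds what it stored. */
static int check_return(struct ops *gops, const struct frame *f)
{
	struct fixture *fx = container_of(gops, struct fixture, gops);

	return memcmp(f->slot[gops->idx], &fx->expect,
		      (size_t)fx->store_size) == 0;
}
```

Registering all fixtures first and checking them afterwards mirrors the
multi-storage case: every user must read back its own bytes even though all
of them wrote into the same frame.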

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Steven Rostedt (Google) <[email protected]>
---
Changes in v8:
- Newly added.
---
kernel/trace/trace_selftest.c | 171 ++++++++++++++++++++++++++++++-----------
1 file changed, 126 insertions(+), 45 deletions(-)

diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index fcdc744c245e..369efc569238 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -762,28 +762,32 @@ trace_selftest_startup_function(struct tracer *trace, struct trace_array *tr)
#define SHORT_NUMBER 12345
#define WORD_NUMBER 1234567890
#define LONG_NUMBER 1234567890123456789LL
-
-static int fgraph_store_size __initdata;
-static const char *fgraph_store_type_name __initdata;
-static char *fgraph_error_str __initdata;
-static char fgraph_error_str_buf[128] __initdata;
+#define ERRSTR_BUFLEN 128
+
+struct fgraph_fixture {
+ struct fgraph_ops gops;
+ int store_size;
+ const char *store_type_name;
+ char error_str_buf[ERRSTR_BUFLEN];
+ char *error_str;
+};

static __init int store_entry(struct ftrace_graph_ent *trace,
struct fgraph_ops *gops)
{
- const char *type = fgraph_store_type_name;
- int size = fgraph_store_size;
+ struct fgraph_fixture *fixture = container_of(gops, struct fgraph_fixture, gops);
+ const char *type = fixture->store_type_name;
+ int size = fixture->store_size;
void *p;

p = fgraph_reserve_data(gops->idx, size);
if (!p) {
- snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
"Failed to reserve %s\n", type);
- fgraph_error_str = fgraph_error_str_buf;
return 0;
}

- switch (fgraph_store_size) {
+ switch (size) {
case 1:
*(char *)p = BYTE_NUMBER;
break;
@@ -804,7 +808,8 @@ static __init int store_entry(struct ftrace_graph_ent *trace,
static __init void store_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
{
- const char *type = fgraph_store_type_name;
+ struct fgraph_fixture *fixture = container_of(gops, struct fgraph_fixture, gops);
+ const char *type = fixture->store_type_name;
long long expect = 0;
long long found = -1;
int size;
@@ -812,20 +817,18 @@ static __init void store_return(struct ftrace_graph_ret *trace,

p = fgraph_retrieve_data(gops->idx, &size);
if (!p) {
- snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
"Failed to retrieve %s\n", type);
- fgraph_error_str = fgraph_error_str_buf;
return;
}
- if (fgraph_store_size > size) {
- snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ if (fixture->store_size > size) {
+ snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
"Retrieved size %d is smaller than expected %d\n",
- size, (int)fgraph_store_size);
- fgraph_error_str = fgraph_error_str_buf;
+ size, (int)fixture->store_size);
return;
}

- switch (fgraph_store_size) {
+ switch (fixture->store_size) {
case 1:
expect = BYTE_NUMBER;
found = *(char *)p;
@@ -845,45 +848,44 @@ static __init void store_return(struct ftrace_graph_ret *trace,
}

if (found != expect) {
- snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
+ snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
"%s returned not %lld but %lld\n", type, expect, found);
- fgraph_error_str = fgraph_error_str_buf;
return;
}
- fgraph_error_str = NULL;
+ fixture->error_str = NULL;
}

-static struct fgraph_ops store_bytes __initdata = {
- .entryfunc = store_entry,
- .retfunc = store_return,
-};
-
-static int __init test_graph_storage_type(const char *name, int size)
+static int __init init_fgraph_fixture(struct fgraph_fixture *fixture)
{
char *func_name;
int len;
- int ret;

- fgraph_store_type_name = name;
- fgraph_store_size = size;
+ snprintf(fixture->error_str_buf, ERRSTR_BUFLEN,
+ "Failed to execute storage %s\n", fixture->store_type_name);
+ fixture->error_str = fixture->error_str_buf;

- snprintf(fgraph_error_str_buf, sizeof(fgraph_error_str_buf),
- "Failed to execute storage %s\n", name);
- fgraph_error_str = fgraph_error_str_buf;
+ func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
+ len = strlen(func_name);
+
+ return ftrace_set_filter(&fixture->gops.ops, func_name, len, 1);
+}
+
+/* Test fgraph storage for each size */
+static int __init test_graph_storage_single(struct fgraph_fixture *fixture)
+{
+ int size = fixture->store_size;
+ int ret;

pr_cont("PASSED\n");
pr_info("Testing fgraph storage of %d byte%s: ", size, size > 1 ? "s" : "");

- func_name = "*" __stringify(DYN_FTRACE_TEST_NAME);
- len = strlen(func_name);
-
- ret = ftrace_set_filter(&store_bytes.ops, func_name, len, 1);
+ ret = init_fgraph_fixture(fixture);
if (ret && ret != -ENODEV) {
pr_cont("*Could not set filter* ");
return -1;
}

- ret = register_ftrace_graph(&store_bytes);
+ ret = register_ftrace_graph(&fixture->gops);
if (ret) {
pr_warn("Failed to init store_bytes fgraph tracing\n");
return -1;
@@ -891,30 +893,109 @@ static int __init test_graph_storage_type(const char *name, int size)

DYN_FTRACE_TEST_NAME();

- unregister_ftrace_graph(&store_bytes);
+ unregister_ftrace_graph(&fixture->gops);

- if (fgraph_error_str) {
- pr_cont("*** %s ***", fgraph_error_str);
+ if (fixture->error_str) {
+ pr_cont("*** %s ***", fixture->error_str);
return -1;
}

return 0;
}
+
+static struct fgraph_fixture store_bytes[4] __initdata = {
+ [0] = {
+ .gops = {
+ .entryfunc = store_entry,
+ .retfunc = store_return,
+ },
+ .store_size = 1,
+ .store_type_name = "byte",
+ },
+ [1] = {
+ .gops = {
+ .entryfunc = store_entry,
+ .retfunc = store_return,
+ },
+ .store_size = 2,
+ .store_type_name = "short",
+ },
+ [2] = {
+ .gops = {
+ .entryfunc = store_entry,
+ .retfunc = store_return,
+ },
+ .store_size = 4,
+ .store_type_name = "word",
+ },
+ [3] = {
+ .gops = {
+ .entryfunc = store_entry,
+ .retfunc = store_return,
+ },
+ .store_size = 8,
+ .store_type_name = "long long",
+ },
+};
+
+static __init int test_graph_storage_multi(void)
+{
+ struct fgraph_fixture *fixture;
+ bool printed = false;
+ int i, ret;
+
+ pr_cont("PASSED\n");
+ pr_info("Testing multiple fgraph storage on a function: ");
+
+ for (i = 0; i < ARRAY_SIZE(store_bytes); i++) {
+ fixture = &store_bytes[i];
+ ret = init_fgraph_fixture(fixture);
+ if (ret && ret != -ENODEV) {
+ pr_cont("*Could not set filter* ");
+ printed = true;
+ goto out;
+ }
+
+ ret = register_ftrace_graph(&fixture->gops);
+ if (ret) {
+ pr_warn("Failed to init store_bytes fgraph tracing\n");
+ printed = true;
+ goto out;
+ }
+ }
+
+ DYN_FTRACE_TEST_NAME();
+out:
+ while (--i >= 0) {
+ fixture = &store_bytes[i];
+ unregister_ftrace_graph(&fixture->gops);
+
+ if (fixture->error_str && !printed) {
+ pr_cont("*** %s ***", fixture->error_str);
+ printed = true;
+ }
+ }
+ return printed ? -1 : 0;
+}
+
/* Test the storage passed across function_graph entry and return */
static __init int test_graph_storage(void)
{
int ret;

- ret = test_graph_storage_type("byte", 1);
+ ret = test_graph_storage_single(&store_bytes[0]);
+ if (ret)
+ return ret;
+ ret = test_graph_storage_single(&store_bytes[1]);
if (ret)
return ret;
- ret = test_graph_storage_type("short", 2);
+ ret = test_graph_storage_single(&store_bytes[2]);
if (ret)
return ret;
- ret = test_graph_storage_type("word", 4);
+ ret = test_graph_storage_single(&store_bytes[3]);
if (ret)
return ret;
- ret = test_graph_storage_type("long long", 8);
+ ret = test_graph_storage_multi();
if (ret)
return ret;
return 0;


2024-04-15 13:11:29

by Masami Hiramatsu

Subject: [PATCH v9 21/36] function_graph: Pass ftrace_regs to entryfunc

From: Masami Hiramatsu (Google) <[email protected]>

Pass ftrace_regs to fgraph_ops::entryfunc(). If ftrace_regs is not
available, NULL is passed instead. The user callback can access some
registers (including the return address) via this ftrace_regs.
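
A minimal userspace sketch of this calling convention follows; it is an
illustration under assumptions, not kernel code. struct fake_regs, enter()
and enter_regs() are hypothetical names that mimic how
function_graph_enter() becomes a wrapper passing NULL, and how a callback
has to tolerate a missing register set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_regs. */
struct fake_regs {
	unsigned long ret_addr;
};

struct graph_ent {
	unsigned long func;
};

struct graph_ops;

/* Entry callback type with the extra register argument. */
typedef int (*entry_fn)(struct graph_ent *trace, struct graph_ops *gops,
			struct fake_regs *fregs);

struct graph_ops {
	entry_fn entryfunc;
	unsigned long last_ret;	/* last return address seen, 0 if unknown */
};

/* A callback must cope with fregs == NULL. */
static int my_entry(struct graph_ent *trace, struct graph_ops *gops,
		    struct fake_regs *fregs)
{
	assert(trace != NULL && gops != NULL);
	gops->last_ret = fregs ? fregs->ret_addr : 0;
	return 1;
}

/* Path that has registers available. */
static int enter_regs(struct graph_ops *gops, unsigned long func,
		      struct fake_regs *fregs)
{
	struct graph_ent trace = { .func = func };

	return gops->entryfunc(&trace, gops, fregs);
}

/* Legacy path without registers: forwards NULL. */
static int enter(struct graph_ops *gops, unsigned long func)
{
	return enter_regs(gops, func, NULL);
}
```

Keeping the old entry point as a thin wrapper means callers that cannot
supply registers need no change, while the NULL check stays in the callback.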

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v8:
- Just pass ftrace_regs to the handler instead of adding a new
entryregfunc.
- Update riscv ftrace_graph_func().
Changes in v3:
- Update for new multiple fgraph.
---
arch/arm64/kernel/ftrace.c | 2 +
arch/loongarch/kernel/ftrace_dyn.c | 2 +
arch/powerpc/kernel/trace/ftrace.c | 2 +
arch/powerpc/kernel/trace/ftrace_64_pg.c | 10 ++++---
arch/riscv/kernel/ftrace.c | 2 +
arch/x86/kernel/ftrace.c | 42 ++++++++++++++++--------------
include/linux/ftrace.h | 20 +++++++++++---
kernel/trace/fgraph.c | 21 +++++++++------
kernel/trace/ftrace.c | 3 +-
kernel/trace/trace.h | 3 +-
kernel/trace/trace_functions_graph.c | 3 +-
kernel/trace/trace_irqsoff.c | 3 +-
kernel/trace/trace_sched_wakeup.c | 3 +-
kernel/trace/trace_selftest.c | 8 ++++--
14 files changed, 76 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index b96740829798..779b975f03f5 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -497,7 +497,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
return;

if (!function_graph_enter_ops(*parent, ip, frame_pointer,
- (void *)frame_pointer, gops))
+ (void *)frame_pointer, fregs, gops))
*parent = (unsigned long)&return_to_handler;

ftrace_test_recursion_unlock(bit);
diff --git a/arch/loongarch/kernel/ftrace_dyn.c b/arch/loongarch/kernel/ftrace_dyn.c
index 920eb673b32b..155bdaba2012 100644
--- a/arch/loongarch/kernel/ftrace_dyn.c
+++ b/arch/loongarch/kernel/ftrace_dyn.c
@@ -254,7 +254,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,

old = *parent;

- if (!function_graph_enter_ops(old, ip, 0, parent, gops))
+ if (!function_graph_enter_ops(old, ip, 0, parent, fregs, gops))
*parent = return_hooker;
}
#else
diff --git a/arch/powerpc/kernel/trace/ftrace.c b/arch/powerpc/kernel/trace/ftrace.c
index 4a9294821c0d..501adb80fc8d 100644
--- a/arch/powerpc/kernel/trace/ftrace.c
+++ b/arch/powerpc/kernel/trace/ftrace.c
@@ -435,7 +435,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
if (bit < 0)
goto out;

- if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, gops))
+ if (!function_graph_enter_ops(parent_ip, ip, 0, (unsigned long *)sp, fregs, gops))
parent_ip = ppc_function_entry(return_to_handler);

ftrace_test_recursion_unlock(bit);
diff --git a/arch/powerpc/kernel/trace/ftrace_64_pg.c b/arch/powerpc/kernel/trace/ftrace_64_pg.c
index 12fab1803bcf..4ae9eeb1c8f1 100644
--- a/arch/powerpc/kernel/trace/ftrace_64_pg.c
+++ b/arch/powerpc/kernel/trace/ftrace_64_pg.c
@@ -800,7 +800,8 @@ int ftrace_disable_ftrace_graph_caller(void)
* in current thread info. Return the address we want to divert to.
*/
static unsigned long
-__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp)
+__prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp,
+ struct ftrace_regs *fregs)
{
unsigned long return_hooker;
int bit;
@@ -817,7 +818,7 @@ __prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp

return_hooker = ppc_function_entry(return_to_handler);

- if (!function_graph_enter(parent, ip, 0, (unsigned long *)sp))
+ if (!function_graph_enter_regs(parent, ip, 0, (unsigned long *)sp, fregs))
parent = return_hooker;

ftrace_test_recursion_unlock(bit);
@@ -829,13 +830,14 @@ __prepare_ftrace_return(unsigned long parent, unsigned long ip, unsigned long sp
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
- fregs->regs.link = __prepare_ftrace_return(parent_ip, ip, fregs->regs.gpr[1]);
+ fregs->regs.link = __prepare_ftrace_return(parent_ip, ip,
+ fregs->regs.gpr[1], fregs);
}
#else
unsigned long prepare_ftrace_return(unsigned long parent, unsigned long ip,
unsigned long sp)
{
- return __prepare_ftrace_return(parent, ip, sp);
+ return __prepare_ftrace_return(parent, ip, sp, NULL);
}
#endif
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
index eb86fb005f34..59c2824e2aaf 100644
--- a/arch/riscv/kernel/ftrace.c
+++ b/arch/riscv/kernel/ftrace.c
@@ -197,7 +197,7 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
*/
old = *parent;

- if (!function_graph_enter_ops(old, ip, frame_pointer(regs), parent, gops))
+ if (!function_graph_enter_ops(old, ip, frame_pointer(regs), parent, fregs, gops))
*parent = return_hooker;
}
#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 5e30cd69b8ab..fb81afa7d07d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -615,16 +615,8 @@ int ftrace_disable_ftrace_graph_caller(void)
}
#endif /* CONFIG_DYNAMIC_FTRACE && !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */

-/*
- * Hook the return address and push it in the stack of return addrs
- * in current thread info.
- */
-void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
- unsigned long frame_pointer)
+static inline bool skip_ftrace_return(void)
{
- unsigned long return_hooker = (unsigned long)&return_to_handler;
- int bit;
-
/*
* When resuming from suspend-to-ram, this function can be indirectly
* called from early CPU startup code while the CPU is in real mode,
@@ -634,13 +626,28 @@ void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
* This check isn't as accurate as virt_addr_valid(), but it should be
* good enough for this purpose, and it's fast.
*/
- if (unlikely((long)__builtin_frame_address(0) >= 0))
- return;
+ if ((long)__builtin_frame_address(0) >= 0)
+ return true;

- if (unlikely(ftrace_graph_is_dead()))
- return;
+ if (ftrace_graph_is_dead())
+ return true;
+
+ if (atomic_read(&current->tracing_graph_pause))
+ return true;
+ return false;
+}

- if (unlikely(atomic_read(&current->tracing_graph_pause)))
+/*
+ * Hook the return address and push it in the stack of return addrs
+ * in current thread info.
+ */
+void prepare_ftrace_return(unsigned long ip, unsigned long *parent,
+ unsigned long frame_pointer)
+{
+ unsigned long return_hooker = (unsigned long)&return_to_handler;
+ int bit;
+
+ if (unlikely(skip_ftrace_return()))
return;

bit = ftrace_test_recursion_trylock(ip, *parent);
@@ -662,17 +669,14 @@ void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct fgraph_ops *gops = container_of(op, struct fgraph_ops, ops);
int bit;

- if (unlikely(ftrace_graph_is_dead()))
- return;
-
- if (unlikely(atomic_read(&current->tracing_graph_pause)))
+ if (unlikely(skip_ftrace_return()))
return;

bit = ftrace_test_recursion_trylock(ip, *parent);
if (bit < 0)
return;

- if (!function_graph_enter_ops(*parent, ip, 0, parent, gops))
+ if (!function_graph_enter_ops(*parent, ip, 0, parent, fregs, gops))
*parent = (unsigned long)&return_to_handler;

ftrace_test_recursion_unlock(bit);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 4c53f3dffab8..087345ef0d72 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1061,9 +1061,12 @@ struct fgraph_ops;
typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
struct fgraph_ops *); /* return */
typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
- struct fgraph_ops *); /* entry */
+ struct fgraph_ops *,
+ struct ftrace_regs *); /* entry */

-extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
+extern int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs);

#ifdef CONFIG_FUNCTION_GRAPH_TRACER

@@ -1106,13 +1109,20 @@ struct ftrace_ret_stack {
extern void return_to_handler(void);

extern int
-function_graph_enter(unsigned long ret, unsigned long func,
- unsigned long frame_pointer, unsigned long *retp);
+function_graph_enter_regs(unsigned long ret, unsigned long func,
+ unsigned long frame_pointer, unsigned long *retp,
+ struct ftrace_regs *fregs);
+
+static inline int function_graph_enter(unsigned long ret, unsigned long func,
+ unsigned long fp, unsigned long *retp)
+{
+ return function_graph_enter_regs(ret, func, fp, retp, NULL);
+}

extern int
function_graph_enter_ops(unsigned long ret, unsigned long func,
unsigned long frame_pointer, unsigned long *retp,
- struct fgraph_ops *gops);
+ struct ftrace_regs *fregs, struct fgraph_ops *gops);

struct ftrace_ret_stack *
ftrace_graph_get_ret_stack(struct task_struct *task, int idx);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index d13806ca1bbb..05845291c4bd 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -248,7 +248,8 @@ static inline unsigned long make_fgraph_data(int idx, int size, int offset)
}

/* ftrace_graph_entry set to this to tell some archs to run function graph */
-static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops)
+static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops,
+ struct ftrace_regs *fregs)
{
return 0;
}
@@ -440,7 +441,7 @@ int __weak ftrace_disable_ftrace_graph_caller(void)
#endif

int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops, struct ftrace_regs *fregs)
{
return 0;
}
@@ -574,8 +575,9 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
#endif

/* If the caller does not use ftrace, call this function. */
-int function_graph_enter(unsigned long ret, unsigned long func,
- unsigned long frame_pointer, unsigned long *retp)
+int function_graph_enter_regs(unsigned long ret, unsigned long func,
+ unsigned long frame_pointer, unsigned long *retp,
+ struct ftrace_regs *fregs)
{
struct ftrace_graph_ent trace;
unsigned long bitmap = 0;
@@ -610,7 +612,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,

save_curr_ret_stack = current->curr_ret_stack;
if (ftrace_ops_test(&gops->ops, func, NULL) &&
- gops->entryfunc(&trace, gops))
+ gops->entryfunc(&trace, gops, fregs))
bitmap |= BIT(i);
else
/* Clear out any saved storage */
@@ -637,6 +639,7 @@ int function_graph_enter(unsigned long ret, unsigned long func,
/* This is called from ftrace_graph_func() via ftrace */
int function_graph_enter_ops(unsigned long ret, unsigned long func,
unsigned long frame_pointer, unsigned long *retp,
+ struct ftrace_regs *fregs,
struct fgraph_ops *gops)
{
struct ftrace_graph_ent trace;
@@ -661,7 +664,7 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
trace.func = func;
trace.depth = current->curr_ret_depth;
save_curr_ret_stack = current->curr_ret_stack;
- if (gops->entryfunc(&trace, gops)) {
+ if (gops->entryfunc(&trace, gops, fregs)) {
if (type == FGRAPH_TYPE_RESERVED)
set_fgraph_index_bitmap(current, index, BIT(gops->idx));
else
@@ -942,7 +945,8 @@ void fgraph_init_ops(struct ftrace_ops *dst_ops,
struct ftrace_ops *src_ops)
{
dst_ops->func = ftrace_graph_func;
- dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_GRAPH_STUB;
+ dst_ops->flags = FTRACE_OPS_FL_PID | FTRACE_OPS_GRAPH_STUB |
+ FTRACE_OPS_FL_SAVE_ARGS;

#ifdef FTRACE_GRAPH_TRAMP_ADDR
dst_ops->trampoline = FTRACE_GRAPH_TRAMP_ADDR;
@@ -1187,7 +1191,8 @@ int register_ftrace_graph(struct fgraph_ops *gops)
mutex_lock(&ftrace_lock);

if (!gops->ops.func) {
- gops->ops.flags |= FTRACE_OPS_GRAPH_STUB;
+ gops->ops.flags |= FTRACE_OPS_GRAPH_STUB |
+ FTRACE_OPS_FL_SAVE_ARGS;
gops->ops.func = ftrace_graph_func;
#ifdef FTRACE_GRAPH_TRAMP_ADDR
gops->ops.trampoline = FTRACE_GRAPH_TRAMP_ADDR;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 45fd2710f81b..5377a0b22ec9 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -816,7 +816,8 @@ void ftrace_graph_graph_time_control(bool enable)
}

static int profile_graph_entry(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct ftrace_ret_stack *ret_stack;

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f23b6fbd547d..8221b6febb51 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -682,7 +682,8 @@ void trace_default_header(struct seq_file *m);
void print_trace_header(struct seq_file *m, struct trace_iterator *iter);

void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
-int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops);
+int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
+ struct ftrace_regs *fregs);

void tracing_start_cmdline_record(void);
void tracing_stop_cmdline_record(void);
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 13d0387ac6a6..b9785fc919c9 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -128,7 +128,8 @@ static inline int ftrace_graph_ignore_irqs(void)
}

int trace_graph_entry(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index fce064e20570..ad739d76fc86 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -176,7 +176,8 @@ static int irqsoff_display_graph(struct trace_array *tr, int set)
}

static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct trace_array *tr = irqsoff_trace;
struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 130ca7e7787e..23360a2700de 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -113,7 +113,8 @@ static int wakeup_display_graph(struct trace_array *tr, int set)
}

static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct trace_array *tr = wakeup_trace;
struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 369efc569238..5edbf09844d9 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -773,7 +773,8 @@ struct fgraph_fixture {
};

static __init int store_entry(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct fgraph_fixture *fixture = container_of(gops, struct fgraph_fixture, gops);
const char *type = fixture->store_type_name;
@@ -1011,7 +1012,8 @@ static unsigned int graph_hang_thresh;

/* Wrap the real function entry probe to avoid possible hanging */
static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
/* This is harmlessly racy, we want to approximately detect a hang */
if (unlikely(++graph_hang_thresh > GRAPH_MAX_FUNC_TEST)) {
@@ -1025,7 +1027,7 @@ static int trace_graph_entry_watchdog(struct ftrace_graph_ent *trace,
return 0;
}

- return trace_graph_entry(trace, gops);
+ return trace_graph_entry(trace, gops, fregs);
}

static struct fgraph_ops fgraph_ops __initdata = {


2024-04-15 13:11:59

by Masami Hiramatsu

Subject: [PATCH v9 15/36] function_graph: Move set_graph_function tests to shadow stack global var

From: Steven Rostedt (VMware) <[email protected]>

The use of task->trace_recursion for the set_graph_function logic was a
bit of an abuse of that variable. Now that there exist global vars that are
per stack for registered graph tracers, use those instead.
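
The flag handling can be sketched in userspace C as below. This is a
simplified model under assumptions: struct task_state, GRAPH_FL and the
explicit flag_depth field are hypothetical, whereas the patch keeps the
depth in the recursion bits and obtains the variable through
fgraph_get_task_var().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag mirroring TRACE_GRAPH_FL. */
enum {
	GRAPH_FL = 1,
};

/* Hypothetical per-tracer, per-task state. */
struct task_state {
	unsigned long var;	/* flag word owned by one tracer */
	int flag_depth;		/* depth at which the flag was set */
};

/* Entry: if the function is in the filter, set the flag and note the depth. */
static void graph_addr(struct task_state *ts, int depth, int in_filter)
{
	assert(ts != NULL);
	if (in_filter) {
		ts->var |= GRAPH_FL;
		ts->flag_depth = depth;
	}
}

/* Return: clear the flag only when unwinding to the depth that set it. */
static void graph_addr_finish(struct task_state *ts, int depth)
{
	assert(ts != NULL);
	if ((ts->var & GRAPH_FL) && depth == ts->flag_depth)
		ts->var &= ~(unsigned long)GRAPH_FL;
}
```

Because the flag word belongs to one tracer, a second tracer with a
different filter would keep its own state and not disturb this one.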

Signed-off-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
include/linux/trace_recursion.h | 5 +----
kernel/trace/trace.h | 32 +++++++++++++++++++++-----------
kernel/trace/trace_functions_graph.c | 6 +++---
kernel/trace/trace_irqsoff.c | 4 ++--
kernel/trace/trace_sched_wakeup.c | 4 ++--
5 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
index d48cd92d2364..2efd5ec46d7f 100644
--- a/include/linux/trace_recursion.h
+++ b/include/linux/trace_recursion.h
@@ -44,9 +44,6 @@ enum {
*/
TRACE_IRQ_BIT,

- /* Set if the function is in the set_graph_function file */
- TRACE_GRAPH_BIT,
-
/*
* In the very unlikely case that an interrupt came in
* at a start of graph tracing, and we want to trace
@@ -60,7 +57,7 @@ enum {
* that preempted a softirq start of a function that
* preempted normal context!!!! Luckily, it can't be
* greater than 3, so the next two bits are a mask
- * of what the depth is when we set TRACE_GRAPH_BIT
+ * of what the depth is when we set TRACE_GRAPH_FL
*/

TRACE_GRAPH_DEPTH_START_BIT,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9995d6b00a93..c7c7e7c9f700 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -897,11 +897,16 @@ extern void init_array_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops
extern int allocate_fgraph_ops(struct trace_array *tr, struct ftrace_ops *ops);
extern void free_fgraph_ops(struct trace_array *tr);

+enum {
+ TRACE_GRAPH_FL = 1,
+};
+
#ifdef CONFIG_DYNAMIC_FTRACE
extern struct ftrace_hash __rcu *ftrace_graph_hash;
extern struct ftrace_hash __rcu *ftrace_graph_notrace_hash;

-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int
+ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
{
unsigned long addr = trace->func;
int ret = 0;
@@ -923,12 +928,11 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
}

if (ftrace_lookup_ip(hash, addr)) {
-
/*
* This needs to be cleared on the return functions
* when the depth is zero.
*/
- trace_recursion_set(TRACE_GRAPH_BIT);
+ *task_var |= TRACE_GRAPH_FL;
trace_recursion_set_depth(trace->depth);

/*
@@ -948,11 +952,14 @@ static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
return ret;
}

-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void
+ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
{
- if (trace_recursion_test(TRACE_GRAPH_BIT) &&
+ unsigned long *task_var = fgraph_get_task_var(gops);
+
+ if ((*task_var & TRACE_GRAPH_FL) &&
trace->depth == trace_recursion_depth())
- trace_recursion_clear(TRACE_GRAPH_BIT);
+ *task_var &= ~TRACE_GRAPH_FL;
}

static inline int ftrace_graph_notrace_addr(unsigned long addr)
@@ -979,7 +986,7 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
}

#else
-static inline int ftrace_graph_addr(struct ftrace_graph_ent *trace)
+static inline int ftrace_graph_addr(unsigned long *task_var, struct ftrace_graph_ent *trace)
{
return 1;
}
@@ -988,17 +995,20 @@ static inline int ftrace_graph_notrace_addr(unsigned long addr)
{
return 0;
}
-static inline void ftrace_graph_addr_finish(struct ftrace_graph_ret *trace)
+static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftrace_graph_ret *trace)
{ }
#endif /* CONFIG_DYNAMIC_FTRACE */

extern unsigned int fgraph_max_depth;

-static inline bool ftrace_graph_ignore_func(struct ftrace_graph_ent *trace)
+static inline bool
+ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace)
{
+ unsigned long *task_var = fgraph_get_task_var(gops);
+
/* trace it when it is-nested-in or is a function enabled. */
- return !(trace_recursion_test(TRACE_GRAPH_BIT) ||
- ftrace_graph_addr(trace)) ||
+ return !((*task_var & TRACE_GRAPH_FL) ||
+ ftrace_graph_addr(task_var, trace)) ||
(trace->depth < 0) ||
(fgraph_max_depth && trace->depth >= fgraph_max_depth);
}
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 7f30652f0e97..66cce73e94f8 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -160,7 +160,7 @@ int trace_graph_entry(struct ftrace_graph_ent *trace,
if (!ftrace_trace_task(tr))
return 0;

- if (ftrace_graph_ignore_func(trace))
+ if (ftrace_graph_ignore_func(gops, trace))
return 0;

if (ftrace_graph_ignore_irqs())
@@ -247,7 +247,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
long disabled;
int cpu;

- ftrace_graph_addr_finish(trace);
+ ftrace_graph_addr_finish(gops, trace);

if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
@@ -269,7 +269,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops)
{
- ftrace_graph_addr_finish(trace);
+ ftrace_graph_addr_finish(gops, trace);

if (trace_recursion_test(TRACE_GRAPH_NOTRACE_BIT)) {
trace_recursion_clear(TRACE_GRAPH_NOTRACE_BIT);
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 5478f4c4f708..fce064e20570 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -184,7 +184,7 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
unsigned int trace_ctx;
int ret;

- if (ftrace_graph_ignore_func(trace))
+ if (ftrace_graph_ignore_func(gops, trace))
return 0;
/*
* Do not trace a function if it's filtered by set_graph_notrace.
@@ -214,7 +214,7 @@ static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
unsigned long flags;
unsigned int trace_ctx;

- ftrace_graph_addr_finish(trace);
+ ftrace_graph_addr_finish(gops, trace);

if (!func_prolog_dec(tr, &data, &flags))
return;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 49bcc812652c..130ca7e7787e 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -120,7 +120,7 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
unsigned int trace_ctx;
int ret = 0;

- if (ftrace_graph_ignore_func(trace))
+ if (ftrace_graph_ignore_func(gops, trace))
return 0;
/*
* Do not trace a function if it's filtered by set_graph_notrace.
@@ -149,7 +149,7 @@ static void wakeup_graph_return(struct ftrace_graph_ret *trace,
struct trace_array_cpu *data;
unsigned int trace_ctx;

- ftrace_graph_addr_finish(trace);
+ ftrace_graph_addr_finish(gops, trace);

if (!func_prolog_preempt_disable(tr, &data, &trace_ctx))
return;


2024-04-15 13:12:06

by Masami Hiramatsu

Subject: [PATCH v9 22/36] function_graph: Replace fgraph_ret_regs with ftrace_regs

From: Masami Hiramatsu (Google) <[email protected]>

Use ftrace_regs instead of fgraph_ret_regs for tracing the return value
in the function_graph tracer, which simplifies the callback interface.

The CONFIG_HAVE_FUNCTION_GRAPH_RETVAL is also replaced by
CONFIG_HAVE_FUNCTION_GRAPH_FREGS.
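
As a rough userspace illustration of the interface change (not the kernel
code), the return path reads both the return value and the frame pointer
from one register structure through two accessors. struct demo_fregs and
every demo_* name below are hypothetical; the field layout loosely follows
the arm64 case in this patch and is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced stand-in for an arch's struct ftrace_regs. */
struct demo_fregs {
	unsigned long regs[8];	/* argument/return registers, regs[0] holds the return value */
	unsigned long fp;	/* frame pointer saved by return_to_handler */
};

/* Plays the role of ftrace_regs_get_return_value(). */
static unsigned long demo_get_return_value(const struct demo_fregs *fregs)
{
	return fregs->regs[0];
}

/* Plays the role of ftrace_regs_get_frame_pointer(). */
static unsigned long demo_get_frame_pointer(const struct demo_fregs *fregs)
{
	return fregs->fp;
}

/* What the common return handler needs from the saved registers. */
struct demo_ret {
	unsigned long retval;
	unsigned long frame_pointer;
};

/* Sketch of the common code: no per-arch fgraph_ret_regs type is needed. */
static struct demo_ret demo_return_to_handler(const struct demo_fregs *fregs)
{
	struct demo_ret ret;

	assert(fregs != NULL);
	ret.retval = demo_get_return_value(fregs);
	ret.frame_pointer = demo_get_frame_pointer(fregs);
	return ret;
}
```

Each architecture then only has to provide the accessors for its own
layout, instead of a separate structure plus asm-offsets entries.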

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v8:
- Newly added.
---
arch/arm64/Kconfig | 2 +-
arch/arm64/include/asm/ftrace.h | 23 ++++++-----------------
arch/arm64/kernel/asm-offsets.c | 12 ------------
arch/arm64/kernel/entry-ftrace.S | 32 ++++++++++++++++++--------------
arch/loongarch/Kconfig | 2 +-
arch/loongarch/include/asm/ftrace.h | 24 ++----------------------
arch/loongarch/kernel/asm-offsets.c | 12 ------------
arch/loongarch/kernel/mcount.S | 17 ++++++++++-------
arch/loongarch/kernel/mcount_dyn.S | 14 +++++++-------
arch/riscv/Kconfig | 2 +-
arch/riscv/include/asm/ftrace.h | 21 ---------------------
arch/riscv/kernel/mcount.S | 24 +++++++++++++-----------
arch/s390/Kconfig | 2 +-
arch/s390/include/asm/ftrace.h | 26 +++++++++-----------------
arch/s390/kernel/asm-offsets.c | 6 ------
arch/s390/kernel/mcount.S | 9 +++++----
arch/x86/Kconfig | 2 +-
arch/x86/include/asm/ftrace.h | 22 ++--------------------
arch/x86/kernel/ftrace_32.S | 15 +++++++++------
arch/x86/kernel/ftrace_64.S | 17 +++++++++--------
include/linux/ftrace.h | 14 +++++++++++---
kernel/trace/Kconfig | 4 ++--
kernel/trace/fgraph.c | 21 +++++++++------------
23 files changed, 117 insertions(+), 206 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..8d5047bc13bc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -209,7 +209,7 @@ config ARM64
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
- select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_FREGS
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && \
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ab158196480c..ac82dc43a57d 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -137,6 +137,12 @@ ftrace_override_function_with_return(struct ftrace_regs *fregs)
fregs->pc = fregs->lr;
}

+static __always_inline unsigned long
+ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs)
+{
+ return fregs->fp;
+}
+
int ftrace_regs_query_register_offset(const char *name);

int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
@@ -194,23 +200,6 @@ static inline bool arch_syscall_match_sym_name(const char *sym,

#ifndef __ASSEMBLY__
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-struct fgraph_ret_regs {
- /* x0 - x7 */
- unsigned long regs[8];
-
- unsigned long fp;
- unsigned long __unused;
-};
-
-static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->regs[0];
-}
-
-static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->fp;
-}

void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
unsigned long frame_pointer);
diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
index 81496083c041..81bb6704ff5a 100644
--- a/arch/arm64/kernel/asm-offsets.c
+++ b/arch/arm64/kernel/asm-offsets.c
@@ -200,18 +200,6 @@ int main(void)
DEFINE(FTRACE_OPS_FUNC, offsetof(struct ftrace_ops, func));
#endif
BLANK();
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- DEFINE(FGRET_REGS_X0, offsetof(struct fgraph_ret_regs, regs[0]));
- DEFINE(FGRET_REGS_X1, offsetof(struct fgraph_ret_regs, regs[1]));
- DEFINE(FGRET_REGS_X2, offsetof(struct fgraph_ret_regs, regs[2]));
- DEFINE(FGRET_REGS_X3, offsetof(struct fgraph_ret_regs, regs[3]));
- DEFINE(FGRET_REGS_X4, offsetof(struct fgraph_ret_regs, regs[4]));
- DEFINE(FGRET_REGS_X5, offsetof(struct fgraph_ret_regs, regs[5]));
- DEFINE(FGRET_REGS_X6, offsetof(struct fgraph_ret_regs, regs[6]));
- DEFINE(FGRET_REGS_X7, offsetof(struct fgraph_ret_regs, regs[7]));
- DEFINE(FGRET_REGS_FP, offsetof(struct fgraph_ret_regs, fp));
- DEFINE(FGRET_REGS_SIZE, sizeof(struct fgraph_ret_regs));
-#endif
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
DEFINE(FTRACE_OPS_DIRECT_CALL, offsetof(struct ftrace_ops, direct_call));
#endif
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index f0c16640ef21..169ccf600066 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -329,24 +329,28 @@ SYM_FUNC_END(ftrace_stub_graph)
* @fp is checked against the value passed by ftrace_graph_caller().
*/
SYM_CODE_START(return_to_handler)
- /* save return value regs */
- sub sp, sp, #FGRET_REGS_SIZE
- stp x0, x1, [sp, #FGRET_REGS_X0]
- stp x2, x3, [sp, #FGRET_REGS_X2]
- stp x4, x5, [sp, #FGRET_REGS_X4]
- stp x6, x7, [sp, #FGRET_REGS_X6]
- str x29, [sp, #FGRET_REGS_FP] // parent's fp
+ /* Make room for ftrace_regs */
+ sub sp, sp, #FREGS_SIZE
+
+ /* Save return value regs */
+ stp x0, x1, [sp, #FREGS_X0]
+ stp x2, x3, [sp, #FREGS_X2]
+ stp x4, x5, [sp, #FREGS_X4]
+ stp x6, x7, [sp, #FREGS_X6]
+
+ /* Save the callsite's FP */
+ str x29, [sp, #FREGS_FP]

mov x0, sp
- bl ftrace_return_to_handler // addr = ftrace_return_to_hander(regs);
+ bl ftrace_return_to_handler // addr = ftrace_return_to_handler(fregs);
mov x30, x0 // restore the original return address

- /* restore return value regs */
- ldp x0, x1, [sp, #FGRET_REGS_X0]
- ldp x2, x3, [sp, #FGRET_REGS_X2]
- ldp x4, x5, [sp, #FGRET_REGS_X4]
- ldp x6, x7, [sp, #FGRET_REGS_X6]
- add sp, sp, #FGRET_REGS_SIZE
+ /* Restore return value regs */
+ ldp x0, x1, [sp, #FREGS_X0]
+ ldp x2, x3, [sp, #FREGS_X2]
+ ldp x4, x5, [sp, #FREGS_X4]
+ ldp x6, x7, [sp, #FREGS_X6]
+ add sp, sp, #FREGS_SIZE

ret
SYM_CODE_END(return_to_handler)
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index a5f300ec6f28..da81053bfa6f 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -123,7 +123,7 @@ config LOONGARCH
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
- select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_FREGS
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS
diff --git a/arch/loongarch/include/asm/ftrace.h b/arch/loongarch/include/asm/ftrace.h
index b43acfc5776c..14a1576bf948 100644
--- a/arch/loongarch/include/asm/ftrace.h
+++ b/arch/loongarch/include/asm/ftrace.h
@@ -78,6 +78,8 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, unsigned long ip)
override_function_with_return(&(fregs)->regs)
#define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)
+#define ftrace_regs_get_frame_pointer(fregs) \
+ ((fregs)->regs.regs[22])

#define ftrace_graph_func ftrace_graph_func
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
@@ -100,26 +102,4 @@ __arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigned long addr)

#endif /* CONFIG_FUNCTION_TRACER */

-#ifndef __ASSEMBLY__
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-struct fgraph_ret_regs {
- /* a0 - a1 */
- unsigned long regs[2];
-
- unsigned long fp;
- unsigned long __unused;
-};
-
-static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->regs[0];
-}
-
-static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->fp;
-}
-#endif /* ifdef CONFIG_FUNCTION_GRAPH_TRACER */
-#endif
-
#endif /* _ASM_LOONGARCH_FTRACE_H */
diff --git a/arch/loongarch/kernel/asm-offsets.c b/arch/loongarch/kernel/asm-offsets.c
index bee9f7a3108f..714f5b5f1956 100644
--- a/arch/loongarch/kernel/asm-offsets.c
+++ b/arch/loongarch/kernel/asm-offsets.c
@@ -279,18 +279,6 @@ static void __used output_pbe_defines(void)
}
#endif

-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-static void __used output_fgraph_ret_regs_defines(void)
-{
- COMMENT("LoongArch fgraph_ret_regs offsets.");
- OFFSET(FGRET_REGS_A0, fgraph_ret_regs, regs[0]);
- OFFSET(FGRET_REGS_A1, fgraph_ret_regs, regs[1]);
- OFFSET(FGRET_REGS_FP, fgraph_ret_regs, fp);
- DEFINE(FGRET_REGS_SIZE, sizeof(struct fgraph_ret_regs));
- BLANK();
-}
-#endif
-
static void __used output_kvm_defines(void)
{
COMMENT("KVM/LoongArch Specific offsets.");
diff --git a/arch/loongarch/kernel/mcount.S b/arch/loongarch/kernel/mcount.S
index 3015896016a0..b6850503e061 100644
--- a/arch/loongarch/kernel/mcount.S
+++ b/arch/loongarch/kernel/mcount.S
@@ -79,10 +79,11 @@ SYM_FUNC_START(ftrace_graph_caller)
SYM_FUNC_END(ftrace_graph_caller)

SYM_FUNC_START(return_to_handler)
- PTR_ADDI sp, sp, -FGRET_REGS_SIZE
- PTR_S a0, sp, FGRET_REGS_A0
- PTR_S a1, sp, FGRET_REGS_A1
- PTR_S zero, sp, FGRET_REGS_FP
+ /* Save return value regs */
+ PTR_ADDI sp, sp, -PT_SIZE
+ PTR_S a0, sp, PT_R4
+ PTR_S a1, sp, PT_R5
+ PTR_S zero, sp, PT_R22

move a0, sp
bl ftrace_return_to_handler
@@ -90,9 +91,11 @@ SYM_FUNC_START(return_to_handler)
/* Restore the real parent address: a0 -> ra */
move ra, a0

- PTR_L a0, sp, FGRET_REGS_A0
- PTR_L a1, sp, FGRET_REGS_A1
- PTR_ADDI sp, sp, FGRET_REGS_SIZE
+ /* Restore return value regs */
+ PTR_L a0, sp, PT_R4
+ PTR_L a1, sp, PT_R5
+ PTR_ADDI sp, sp, PT_SIZE
+
jr ra
SYM_FUNC_END(return_to_handler)
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
diff --git a/arch/loongarch/kernel/mcount_dyn.S b/arch/loongarch/kernel/mcount_dyn.S
index 0c65cf09110c..d6b474ad1d5e 100644
--- a/arch/loongarch/kernel/mcount_dyn.S
+++ b/arch/loongarch/kernel/mcount_dyn.S
@@ -140,19 +140,19 @@ SYM_CODE_END(ftrace_graph_caller)
SYM_CODE_START(return_to_handler)
UNWIND_HINT_UNDEFINED
/* Save return value regs */
- PTR_ADDI sp, sp, -FGRET_REGS_SIZE
- PTR_S a0, sp, FGRET_REGS_A0
- PTR_S a1, sp, FGRET_REGS_A1
- PTR_S zero, sp, FGRET_REGS_FP
+ PTR_ADDI sp, sp, -PT_SIZE
+ PTR_S a0, sp, PT_R4
+ PTR_S a1, sp, PT_R5
+ PTR_S zero, sp, PT_R22

move a0, sp
bl ftrace_return_to_handler
move ra, a0

/* Restore return value regs */
- PTR_L a0, sp, FGRET_REGS_A0
- PTR_L a1, sp, FGRET_REGS_A1
- PTR_ADDI sp, sp, FGRET_REGS_SIZE
+ PTR_L a0, sp, PT_R4
+ PTR_L a1, sp, PT_R5
+ PTR_ADDI sp, sp, PT_SIZE

jr ra
SYM_CODE_END(return_to_handler)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..b58b8e81b510 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -129,7 +129,7 @@ config RISCV
select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER
- select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_FREGS
select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION
select HAVE_EBPF_JIT if MMU
select HAVE_FAST_GUP if MMU
diff --git a/arch/riscv/include/asm/ftrace.h b/arch/riscv/include/asm/ftrace.h
index 1276d7d9ca8b..9500acf8e156 100644
--- a/arch/riscv/include/asm/ftrace.h
+++ b/arch/riscv/include/asm/ftrace.h
@@ -143,25 +143,4 @@ static inline void __arch_ftrace_set_direct_caller(struct pt_regs *regs, unsigne

#endif /* CONFIG_DYNAMIC_FTRACE */

-#ifndef __ASSEMBLY__
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-struct fgraph_ret_regs {
- unsigned long a1;
- unsigned long a0;
- unsigned long s0;
- unsigned long ra;
-};
-
-static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->a0;
-}
-
-static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->s0;
-}
-#endif /* ifdef CONFIG_FUNCTION_GRAPH_TRACER */
-#endif
-
#endif /* _ASM_RISCV_FTRACE_H */
diff --git a/arch/riscv/kernel/mcount.S b/arch/riscv/kernel/mcount.S
index 3a42f6287909..068168046e0e 100644
--- a/arch/riscv/kernel/mcount.S
+++ b/arch/riscv/kernel/mcount.S
@@ -12,6 +12,8 @@
#include <asm/asm-offsets.h>
#include <asm/ftrace.h>

+#define ABI_SIZE_ON_STACK 80
+
.text

.macro SAVE_ABI_STATE
@@ -26,12 +28,12 @@
* register if a0 was not saved.
*/
.macro SAVE_RET_ABI_STATE
- addi sp, sp, -4*SZREG
- REG_S s0, 2*SZREG(sp)
- REG_S ra, 3*SZREG(sp)
- REG_S a0, 1*SZREG(sp)
- REG_S a1, 0*SZREG(sp)
- addi s0, sp, 4*SZREG
+ addi sp, sp, -ABI_SIZE_ON_STACK
+ REG_S ra, 1*SZREG(sp)
+ REG_S s0, 8*SZREG(sp)
+ REG_S a0, 10*SZREG(sp)
+ REG_S a1, 11*SZREG(sp)
+ addi s0, sp, ABI_SIZE_ON_STACK
.endm

.macro RESTORE_ABI_STATE
@@ -41,11 +43,11 @@
.endm

.macro RESTORE_RET_ABI_STATE
- REG_L ra, 3*SZREG(sp)
- REG_L s0, 2*SZREG(sp)
- REG_L a0, 1*SZREG(sp)
- REG_L a1, 0*SZREG(sp)
- addi sp, sp, 4*SZREG
+ REG_L ra, 1*SZREG(sp)
+ REG_L s0, 8*SZREG(sp)
+ REG_L a0, 10*SZREG(sp)
+ REG_L a1, 11*SZREG(sp)
+ addi sp, sp, ABI_SIZE_ON_STACK
.endm

SYM_TYPED_FUNC_START(ftrace_stub)
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 8f01ada6845e..1ed25b72eb47 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -179,7 +179,7 @@ config S390
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
- select HAVE_FUNCTION_GRAPH_RETVAL
+ select HAVE_FUNCTION_GRAPH_FREGS
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 1912b598d1b8..9f8cc6d13bec 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -54,23 +54,6 @@ static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs *
return NULL;
}

-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-struct fgraph_ret_regs {
- unsigned long gpr2;
- unsigned long fp;
-};
-
-static __always_inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->gpr2;
-}
-
-static __always_inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->fp;
-}
-#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
-
static __always_inline unsigned long
ftrace_regs_get_instruction_pointer(const struct ftrace_regs *fregs)
{
@@ -97,6 +80,15 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
#define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)

+static __always_inline unsigned long
+ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs)
+{
+ unsigned long *sp;
+
+ sp = (void *)ftrace_regs_get_stack_pointer(fregs);
+ return sp[0]; /* return backchain */
+}
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/*
* When an ftrace registered caller is tracing a function that is
diff --git a/arch/s390/kernel/asm-offsets.c b/arch/s390/kernel/asm-offsets.c
index fa5f6885c74a..73f8bcf0c873 100644
--- a/arch/s390/kernel/asm-offsets.c
+++ b/arch/s390/kernel/asm-offsets.c
@@ -178,12 +178,6 @@ int main(void)
DEFINE(OLDMEM_SIZE, PARMAREA + offsetof(struct parmarea, oldmem_size));
DEFINE(COMMAND_LINE, PARMAREA + offsetof(struct parmarea, command_line));
DEFINE(MAX_COMMAND_LINE_SIZE, PARMAREA + offsetof(struct parmarea, max_command_line_size));
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- /* function graph return value tracing */
- OFFSET(__FGRAPH_RET_GPR2, fgraph_ret_regs, gpr2);
- OFFSET(__FGRAPH_RET_FP, fgraph_ret_regs, fp);
- DEFINE(__FGRAPH_RET_SIZE, sizeof(struct fgraph_ret_regs));
-#endif
OFFSET(__FTRACE_REGS_PT_REGS, ftrace_regs, regs);
DEFINE(__FTRACE_REGS_SIZE, sizeof(struct ftrace_regs));
return 0;
diff --git a/arch/s390/kernel/mcount.S b/arch/s390/kernel/mcount.S
index ae4d4fd9afcd..cda798b976de 100644
--- a/arch/s390/kernel/mcount.S
+++ b/arch/s390/kernel/mcount.S
@@ -133,14 +133,15 @@ SYM_CODE_END(ftrace_common)
SYM_FUNC_START(return_to_handler)
stmg %r2,%r5,32(%r15)
lgr %r1,%r15
- aghi %r15,-(STACK_FRAME_OVERHEAD+__FGRAPH_RET_SIZE)
+# Allocate ftrace_regs + backchain on the stack
+ aghi %r15,-STACK_FRAME_SIZE_FREGS
stg %r1,__SF_BACKCHAIN(%r15)
la %r3,STACK_FRAME_OVERHEAD(%r15)
- stg %r1,__FGRAPH_RET_FP(%r3)
- stg %r2,__FGRAPH_RET_GPR2(%r3)
+ stg %r2,(__SF_GPRS+2*8)(%r15)
+ stg %r15,(__SF_GPRS+15*8)(%r15)
lgr %r2,%r3
brasl %r14,ftrace_return_to_handler
- aghi %r15,STACK_FRAME_OVERHEAD+__FGRAPH_RET_SIZE
+ aghi %r15,STACK_FRAME_SIZE_FREGS
lgr %r14,%r2
lmg %r2,%r5,32(%r15)
BR_EX %r14
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4fff6ed46e90..96c567714c6b 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -224,7 +224,7 @@ config X86
select HAVE_FAST_GUP
select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD
- select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_FREGS if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_TRACER if X86_32 || (X86_64 && DYNAMIC_FTRACE)
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index c88bf47f46da..8d6db2b7d03a 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -72,6 +72,8 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
override_function_with_return(&(fregs)->regs)
#define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)
+#define ftrace_regs_get_frame_pointer(fregs) \
+ frame_pointer(&(fregs)->regs)

struct ftrace_ops;
#define ftrace_graph_func ftrace_graph_func
@@ -156,24 +158,4 @@ static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
#endif /* !COMPILE_OFFSETS */
#endif /* !__ASSEMBLY__ */

-#ifndef __ASSEMBLY__
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
-struct fgraph_ret_regs {
- unsigned long ax;
- unsigned long dx;
- unsigned long bp;
-};
-
-static inline unsigned long fgraph_ret_regs_return_value(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->ax;
-}
-
-static inline unsigned long fgraph_ret_regs_frame_pointer(struct fgraph_ret_regs *ret_regs)
-{
- return ret_regs->bp;
-}
-#endif /* ifdef CONFIG_FUNCTION_GRAPH_TRACER */
-#endif
-
#endif /* _ASM_X86_FTRACE_H */
diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index 58d9ed50fe61..4b265884d06c 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -23,6 +23,8 @@ SYM_FUNC_START(__fentry__)
SYM_FUNC_END(__fentry__)
EXPORT_SYMBOL(__fentry__)

+#define FRAME_SIZE PT_OLDSS+4
+
SYM_CODE_START(ftrace_caller)

#ifdef CONFIG_FRAME_POINTER
@@ -187,14 +189,15 @@ SYM_CODE_END(ftrace_graph_caller)

.globl return_to_handler
return_to_handler:
- pushl $0
- pushl %edx
- pushl %eax
+ subl $(FRAME_SIZE), %esp
+ movl $0, PT_EBP(%esp)
+ movl %edx, PT_EDX(%esp)
+ movl %eax, PT_EAX(%esp)
movl %esp, %eax
call ftrace_return_to_handler
movl %eax, %ecx
- popl %eax
- popl %edx
- addl $4, %esp # skip ebp
+ movl PT_EAX(%esp), %eax
+ movl PT_EDX(%esp), %edx
+ addl $(FRAME_SIZE), %esp
JMP_NOSPEC ecx
#endif
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 214f30e9f0c0..d51647228596 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -348,21 +348,22 @@ STACK_FRAME_NON_STANDARD_FP(__fentry__)
SYM_CODE_START(return_to_handler)
UNWIND_HINT_UNDEFINED
ANNOTATE_NOENDBR
- subq $24, %rsp

- /* Save the return values */
- movq %rax, (%rsp)
- movq %rdx, 8(%rsp)
- movq %rbp, 16(%rsp)
+ /* Save ftrace_regs for function exit context */
+ subq $(FRAME_SIZE), %rsp
+
+ movq %rax, RAX(%rsp)
+ movq %rdx, RDX(%rsp)
+ movq %rbp, RBP(%rsp)
movq %rsp, %rdi

call ftrace_return_to_handler

movq %rax, %rdi
- movq 8(%rsp), %rdx
- movq (%rsp), %rax
+ movq RDX(%rsp), %rdx
+ movq RAX(%rsp), %rax

- addq $24, %rsp
+ addq $(FRAME_SIZE), %rsp
/*
* Jump back to the old return address. This cannot be JMP_NOSPEC rdi
* since IBT would demand that contain ENDBR, which simply isn't so for
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 087345ef0d72..0255b95f2d61 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -43,9 +43,8 @@ struct dyn_ftrace;

char *arch_ftrace_match_adjust(char *str, const char *search);

-#ifdef CONFIG_HAVE_FUNCTION_GRAPH_RETVAL
-struct fgraph_ret_regs;
-unsigned long ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs);
+#ifdef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+unsigned long ftrace_return_to_handler(struct ftrace_regs *fregs);
#else
unsigned long ftrace_return_to_handler(unsigned long frame_pointer);
#endif
@@ -135,6 +134,13 @@ extern int ftrace_enabled;
* Also, architecture dependent fields can be used for internal process.
* (e.g. orig_ax on x86_64)
*
+ * Basically, ftrace_regs stores the registers related to the context.
+ * On function entry, registers for function parameters and hooking the
+ * function call are stored, and on function exit, registers for function
+ * return value and frame pointers are stored.
+ *
+ * Which registers are restored from the ftrace_regs also depends on the
+ * context.
* On the function entry, those registers will be restored except for
* the stack pointer, so that user can change the function parameters
* and instruction pointer (e.g. live patching.)
@@ -192,6 +198,8 @@ static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs)
override_function_with_return(ftrace_get_regs(fregs))
#define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)
+#define ftrace_regs_get_frame_pointer(fregs) \
+ frame_pointer(&(fregs)->regs)
#endif

typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 61c541c36596..cb9c48a4f5bc 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -31,7 +31,7 @@ config HAVE_FUNCTION_GRAPH_TRACER
help
See Documentation/trace/ftrace-design.rst

-config HAVE_FUNCTION_GRAPH_RETVAL
+config HAVE_FUNCTION_GRAPH_FREGS
bool

config HAVE_DYNAMIC_FTRACE
@@ -232,7 +232,7 @@ config FUNCTION_GRAPH_TRACER

config FUNCTION_GRAPH_RETVAL
bool "Kernel Function Graph Return Value"
- depends on HAVE_FUNCTION_GRAPH_RETVAL
+ depends on HAVE_FUNCTION_GRAPH_FREGS
depends on FUNCTION_GRAPH_TRACER
default n
help
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 05845291c4bd..33be5af4801c 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -767,15 +767,12 @@ static struct notifier_block ftrace_suspend_notifier = {
.notifier_call = ftrace_suspend_notifier_call,
};

-/* fgraph_ret_regs is not defined without CONFIG_FUNCTION_GRAPH_RETVAL */
-struct fgraph_ret_regs;
-
/*
* Send the trace to the ring-buffer.
* @return the original return address.
*/
-static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs,
- unsigned long frame_pointer)
+static inline unsigned long
+__ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointer)
{
struct ftrace_ret_stack *ret_stack;
struct ftrace_graph_ret trace;
@@ -795,7 +792,7 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs

trace.rettime = trace_clock_local();
#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
- trace.retval = fgraph_ret_regs_return_value(ret_regs);
+ trace.retval = ftrace_regs_get_return_value(fregs);
#endif

bitmap = get_fgraph_index_bitmap(current, index);
@@ -823,14 +820,14 @@ static unsigned long __ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs
}

/*
- * After all architecures have selected HAVE_FUNCTION_GRAPH_RETVAL, we can
- * leave only ftrace_return_to_handler(ret_regs).
+ * After all architectures have selected HAVE_FUNCTION_GRAPH_FREGS, we can
+ * leave only ftrace_return_to_handler(fregs).
*/
-#ifdef CONFIG_HAVE_FUNCTION_GRAPH_RETVAL
-unsigned long ftrace_return_to_handler(struct fgraph_ret_regs *ret_regs)
+#ifdef CONFIG_HAVE_FUNCTION_GRAPH_FREGS
+unsigned long ftrace_return_to_handler(struct ftrace_regs *fregs)
{
- return __ftrace_return_to_handler(ret_regs,
- fgraph_ret_regs_frame_pointer(ret_regs));
+ return __ftrace_return_to_handler(fregs,
+ ftrace_regs_get_frame_pointer(fregs));
}
#else
unsigned long ftrace_return_to_handler(unsigned long frame_pointer)


2024-04-15 13:12:51

by Masami Hiramatsu

Subject: [PATCH v9 24/36] fprobe: Use ftrace_regs in fprobe entry handler

From: Masami Hiramatsu (Google) <[email protected]>

This makes fprobe available with CONFIG_DYNAMIC_FTRACE_WITH_ARGS as well
as CONFIG_DYNAMIC_FTRACE_WITH_REGS, so that fprobe can be enabled on
arm64.
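
The handlers that still need a full pt_regs follow one pattern in this
patch: ask for it with ftrace_get_regs() and return early if it is not
available. A minimal userspace sketch of that pattern, assuming
hypothetical demo_* types and a has_full_regs flag that stands in for
whatever the architecture uses to tell the two cases apart:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced register sets for illustration only. */
struct demo_pt_regs {
	unsigned long ip;
};

struct demo_fregs {
	struct demo_pt_regs regs;
	bool has_full_regs;	/* false models an "args only" context */
};

/* Plays the role of ftrace_get_regs(): NULL when only args were saved. */
static struct demo_pt_regs *demo_get_regs(struct demo_fregs *fregs)
{
	assert(fregs != NULL);
	return fregs->has_full_regs ? &fregs->regs : NULL;
}

static int demo_hits;			/* how often the pt_regs consumer ran */
static unsigned long demo_last_ip;	/* last ip seen by the consumer */

/* Entry handler in the new style: takes fregs, bails out without pt_regs. */
static int demo_entry_handler(struct demo_fregs *fregs)
{
	struct demo_pt_regs *regs = demo_get_regs(fregs);

	if (!regs)
		return 0;

	demo_hits++;
	demo_last_ip = regs->ip;
	return 0;
}
```

In this sketch a handler called from an args-only context silently does
nothing, which mirrors the `if (!regs) return 0;` checks added to
kprobe_multi_link_handler() and fentry_dispatcher() below.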

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Florent Revest <[email protected]>
---
Changes in v6:
- Keep using SAVE_REGS flag to avoid breaking bpf kprobe-multi test.
---
include/linux/fprobe.h | 2 +-
kernel/trace/Kconfig | 3 ++-
kernel/trace/bpf_trace.c | 10 +++++++---
kernel/trace/fprobe.c | 3 ++-
kernel/trace/trace_fprobe.c | 6 +++++-
lib/test_fprobe.c | 4 ++--
samples/fprobe/fprobe_example.c | 2 +-
7 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index f39869588117..ca64ee5e45d2 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -10,7 +10,7 @@
struct fprobe;

typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *regs,
void *entry_data);

typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index cb9c48a4f5bc..0aba53b7b0be 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -287,7 +287,7 @@ config DYNAMIC_FTRACE_WITH_ARGS
config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
- depends on DYNAMIC_FTRACE_WITH_REGS
+ depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
depends on HAVE_RETHOOK
select RETHOOK
default n
@@ -672,6 +672,7 @@ config FPROBE_EVENTS
select TRACING
select PROBE_EVENTS
select DYNAMIC_EVENTS
+ depends on DYNAMIC_FTRACE_WITH_REGS
default y
help
This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 9dc605f08a23..7837cf4e39d9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
fs_initcall(bpf_event_init);
#endif /* CONFIG_MODULES */

-#ifdef CONFIG_FPROBE
+#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
@@ -2822,10 +2822,14 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,

static int
kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *data)
{
struct bpf_kprobe_multi_link *link;
+ struct pt_regs *regs = ftrace_get_regs(fregs);
+
+ if (!regs)
+ return 0;

link = container_of(fp, struct bpf_kprobe_multi_link, fp);
kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
@@ -3099,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
kvfree(cookies);
return err;
}
-#else /* !CONFIG_FPROBE */
+#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 9ff018245840..3d3789283873 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -46,7 +46,7 @@ static inline void __fprobe_handler(unsigned long ip, unsigned long parent_ip,
}

if (fp->entry_handler)
- ret = fp->entry_handler(fp, ip, parent_ip, ftrace_get_regs(fregs), entry_data);
+ ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data);

/* If entry_handler returns !0, nmissed is not counted. */
if (rh) {
@@ -182,6 +182,7 @@ static void fprobe_init(struct fprobe *fp)
fp->ops.func = fprobe_kprobe_handler;
else
fp->ops.func = fprobe_handler;
+
fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
}

diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 62e6a8f4aae9..b2c20d4fdfd7 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -338,12 +338,16 @@ NOKPROBE_SYMBOL(fexit_perf_func);
#endif /* CONFIG_PERF_EVENTS */

static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *entry_data)
{
struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+ struct pt_regs *regs = ftrace_get_regs(fregs);
int ret = 0;

+ if (!regs)
+ return 0;
+
if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
fentry_trace_func(tf, entry_ip, regs);
#ifdef CONFIG_PERF_EVENTS
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index 24de0e5ff859..ff607babba18 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -40,7 +40,7 @@ static noinline u32 fprobe_selftest_nest_target(u32 value, u32 (*nest)(u32))

static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
- struct pt_regs *regs, void *data)
+ struct ftrace_regs *fregs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
/* This can be called on the fprobe_selftest_target and the fprobe_selftest_target2 */
@@ -81,7 +81,7 @@ static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,

static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
- struct pt_regs *regs, void *data)
+ struct ftrace_regs *fregs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
return 0;
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 64e715e7ed11..1545a1aac616 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -50,7 +50,7 @@ static void show_backtrace(void)

static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
- struct pt_regs *regs, void *data)
+ struct ftrace_regs *fregs, void *data)
{
if (use_trace)
/*


2024-04-15 13:13:38

by Masami Hiramatsu

Subject: [PATCH v9 26/36] tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs

From: Masami Hiramatsu (Google) <[email protected]>

Add ftrace_partial_regs(), which converts an ftrace_regs to a pt_regs.
eBPF needs this to keep using the same pt_regs interface for register
access, so it is used when fprobe (which backs the kprobe_multi eBPF
event) switches from pt_regs to ftrace_regs.

If the architecture defines its own ftrace_regs, this copies the
available subset of registers into the given pt_regs and returns it.
Otherwise ftrace_regs has the same layout as pt_regs, and
ftrace_partial_regs() returns the address of ftrace_regs::regs.
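
As an illustration of the "copy partial registers" case, here is a minimal
user-space sketch modeled on the arm64 hunk in this patch. The mock_* types
and the function name are hypothetical stand-ins, not the kernel's
definitions; only the set of copied fields follows the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced register set: only what the trampoline saved. */
struct mock_ftrace_regs {
	uint64_t regs[9];	/* x0-x8 */
	uint64_t fp;
	uint64_t lr;
	uint64_t sp;
	uint64_t pc;
};

/* Hypothetical full register set, shaped like arm64 pt_regs. */
struct mock_pt_regs {
	uint64_t regs[31];
	uint64_t sp;
	uint64_t pc;
};

/*
 * Copy the saved subset into a caller-provided full register set.
 * Registers that were never saved (x9-x28 here) are left untouched,
 * so the caller decides what they contain.
 */
static struct mock_pt_regs *
mock_partial_regs(const struct mock_ftrace_regs *fregs,
		  struct mock_pt_regs *regs)
{
	assert(fregs && regs);

	memcpy(regs->regs, fregs->regs, sizeof(fregs->regs));
	regs->regs[29] = fregs->fp;
	regs->regs[30] = fregs->lr;
	regs->sp = fregs->sp;
	regs->pc = fregs->pc;
	return regs;
}
```

The caller owns the destination buffer, so a consumer that only understands
the full layout can be handed a pointer to it without the trampoline having
to save every register.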

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Florent Revest <[email protected]>
---
Changes in v8:
- Add the reason why this is required to the changelog.
Changes from previous series: NOTHING, just forward ported.
---
arch/arm64/include/asm/ftrace.h | 11 +++++++++++
include/linux/ftrace.h | 17 +++++++++++++++++
2 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index ac82dc43a57d..aab2b7a0f78c 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -143,6 +143,17 @@ ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs)
return fregs->fp;
}

+static __always_inline struct pt_regs *
+ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+ memcpy(regs->regs, fregs->regs, sizeof(u64) * 9);
+ regs->sp = fregs->sp;
+ regs->pc = fregs->pc;
+ regs->regs[29] = fregs->fp;
+ regs->regs[30] = fregs->lr;
+ return regs;
+}
+
int ftrace_regs_query_register_offset(const char *name);

int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 2b35f7d851ca..9cf4c1b8b3f8 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -177,6 +177,23 @@ static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs
return arch_ftrace_get_regs(fregs);
}

+#if !defined(CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS) || \
+ defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)
+
+static __always_inline struct pt_regs *
+ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+ /*
+ * If CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, ftrace_regs memory
+ * layout is the same as pt_regs. So always returns that address.
+ * Since arch_ftrace_get_regs() will check some members and may return
+ * NULL, we can not use it.
+ */
+ return &fregs->regs;
+}
+
+#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
/*
* When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
* Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.


2024-04-15 13:13:54

by Masami Hiramatsu

Subject: [PATCH v9 27/36] tracing: Add ftrace_fill_perf_regs() for perf event

From: Masami Hiramatsu (Google) <[email protected]>

Add ftrace_fill_perf_regs(), which should be compatible with
perf_fetch_caller_regs(). In other words, the pt_regs returned from
ftrace_fill_perf_regs() must satisfy 'user_mode(regs) == false' and must be
usable for stack tracing.
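
The requirement can be sketched in user space as follows. This loosely
follows the x86 hunk in this patch; the mock_* types, MOCK_KERNEL_CS and the
privilege-level test in mock_user_mode() are simplifying assumptions made
for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_KERNEL_CS 0x10u	/* hypothetical selector with privilege level 0 */

struct mock_pt_regs {
	uint64_t ip;
	uint64_t sp;
	uint64_t cs;
	uint64_t flags;
};

struct mock_ftrace_regs {
	struct mock_pt_regs regs;
};

/* Assumed model: the low two bits of cs are non-zero in user mode. */
static bool mock_user_mode(const struct mock_pt_regs *regs)
{
	return (regs->cs & 3) != 0;
}

/*
 * Fill only what a stack tracer needs (ip and sp) and force the
 * result to look like kernel mode, whatever the buffer held before.
 */
static struct mock_pt_regs *
mock_fill_perf_regs(const struct mock_ftrace_regs *fregs,
		    struct mock_pt_regs *regs)
{
	assert(fregs && regs);

	regs->ip = fregs->regs.ip;
	regs->sp = fregs->regs.sp;
	regs->cs = MOCK_KERNEL_CS;
	regs->flags = 0;
	return regs;
}
```

The point of the sketch is that the destination may contain stale values,
so the mode-related fields are overwritten unconditionally instead of being
copied from the source.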

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes from previous series: NOTHING, just forward ported.
---
arch/arm64/include/asm/ftrace.h | 7 +++++++
arch/powerpc/include/asm/ftrace.h | 7 +++++++
arch/s390/include/asm/ftrace.h | 5 +++++
arch/x86/include/asm/ftrace.h | 7 +++++++
include/linux/ftrace.h | 31 +++++++++++++++++++++++++++++++
5 files changed, 57 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index aab2b7a0f78c..95a8f349f871 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -154,6 +154,13 @@ ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
return regs;
}

+#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \
+ (_regs)->pc = (fregs)->pc; \
+ (_regs)->regs[29] = (fregs)->fp; \
+ (_regs)->sp = (fregs)->sp; \
+ (_regs)->pstate = PSR_MODE_EL1h; \
+ } while (0)
+
int ftrace_regs_query_register_offset(const char *name);

int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec);
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index cfec6c5a47d0..51245fd6b45b 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -44,6 +44,13 @@ static __always_inline struct pt_regs *arch_ftrace_get_regs(struct ftrace_regs *
return fregs->regs.msr ? &fregs->regs : NULL;
}

+#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \
+ (_regs)->result = 0; \
+ (_regs)->nip = (fregs)->regs.nip; \
+ (_regs)->gpr[1] = (fregs)->regs.gpr[1]; \
+ asm volatile("mfmsr %0" : "=r" ((_regs)->msr)); \
+ } while (0)
+
static __always_inline void
ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs,
unsigned long ip)
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index 9f8cc6d13bec..cb8d60a5fe1d 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -89,6 +89,11 @@ ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs)
return sp[0]; /* return backchain */
}

+#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \
+ (_regs)->psw.addr = (fregs)->regs.psw.addr; \
+ (_regs)->gprs[15] = (fregs)->regs.gprs[15]; \
+ } while (0)
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
/*
* When an ftrace registered caller is tracing a function that is
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 8d6db2b7d03a..7625887fc49b 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -54,6 +54,13 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
return &fregs->regs;
}

+#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \
+ (_regs)->ip = (fregs)->regs.ip; \
+ (_regs)->sp = (fregs)->regs.sp; \
+ (_regs)->cs = __KERNEL_CS; \
+ (_regs)->flags = 0; \
+ } while (0)
+
#define ftrace_regs_set_instruction_pointer(fregs, _ip) \
do { (fregs)->regs.ip = (_ip); } while (0)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 9cf4c1b8b3f8..71dd25afdb1c 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -194,6 +194,37 @@ ftrace_partial_regs(struct ftrace_regs *fregs, struct pt_regs *regs)

#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */

+#ifdef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
+
+/*
+ * Please define an arch-dependent pt_regs filler which is compatible with
+ * perf_arch_fetch_caller_regs() but based on ftrace_regs.
+ * This requires that
+ * - user_mode(_regs) returns false (always kernel mode).
+ * - the _regs can be used for stack tracing.
+ */
+#ifndef arch_ftrace_fill_perf_regs
+/* Same as perf_arch_fetch_caller_regs(): do nothing by default */
+#define arch_ftrace_fill_perf_regs(fregs, _regs) do {} while (0)
+#endif
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+ arch_ftrace_fill_perf_regs(fregs, regs);
+ return regs;
+}
+
+#else /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */
+
+static __always_inline struct pt_regs *
+ftrace_fill_perf_regs(struct ftrace_regs *fregs, struct pt_regs *regs)
+{
+ return &fregs->regs;
+}
+
+#endif
+
/*
* When true, the ftrace_regs_{get,set}_*() functions may be used on fregs.
* Note: this can be true even when ftrace_get_regs() cannot provide a pt_regs.


2024-04-15 13:14:25

by Masami Hiramatsu

Subject: [PATCH v9 28/36] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS

From: Masami Hiramatsu (Google) <[email protected]>

Allow fprobe events to be enabled with CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
With this change, fprobe events mostly use ftrace_regs instead of pt_regs.
Note that if the arch doesn't enable HAVE_PT_REGS_COMPAT_FTRACE_REGS,
fprobe events cannot be used from perf.
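
This patch also adds ftrace_regs_get_kernel_stack_nth(), which reads the
nth word above the stack pointer only when that word lies in the same
THREAD_SIZE-aligned region as the stack pointer itself. A minimal
user-space sketch of that bounds check, assuming a hypothetical
MOCK_THREAD_SIZE and a plain pointer in place of ftrace_regs:

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_THREAD_SIZE 16384UL	/* hypothetical; must be a power of two */

/*
 * Return the nth word above sp, or 0 if that word would fall outside
 * the MOCK_THREAD_SIZE-aligned region that contains sp.
 */
static unsigned long mock_stack_nth(const unsigned long *sp, unsigned int nth)
{
	uintptr_t mask = ~(uintptr_t)(MOCK_THREAD_SIZE - 1);
	uintptr_t base, addr;

	assert(sp);

	base = (uintptr_t)sp & mask;
	/* Integer arithmetic avoids forming an out-of-bounds pointer. */
	addr = (uintptr_t)sp + (uintptr_t)nth * sizeof(unsigned long);

	if ((addr & mask) != base)
		return 0;

	return *(const unsigned long *)addr;
}
```

As in the kernel helper, an out-of-range index yields 0 and not an error, so
a caller cannot tell a stored zero from a rejected access; that trade-off
keeps the fetch path free of error handling.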

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v9:
- Copy store_trace_entry_data() as store_fprobe_entry_data() for
fprobe.
Changes in v3:
- Use ftrace_regs_get_return_value().
Changes in v2:
- Define ftrace_regs_get_kernel_stack_nth() for
!CONFIG_HAVE_REGS_AND_STACK_ACCESS_API.
Changes from previous series: Update against the new series.
---
include/linux/ftrace.h | 17 ++++++
kernel/trace/Kconfig | 1
kernel/trace/trace_fprobe.c | 107 +++++++++++++++++++++++++--------------
kernel/trace/trace_probe_tmpl.h | 2 -
4 files changed, 86 insertions(+), 41 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 71dd25afdb1c..d845a80a3d56 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -256,6 +256,23 @@ static __always_inline bool ftrace_regs_has_args(struct ftrace_regs *fregs)
frame_pointer(&(fregs)->regs)
#endif

+#ifdef CONFIG_HAVE_REGS_AND_STACK_ACCESS_API
+static __always_inline unsigned long
+ftrace_regs_get_kernel_stack_nth(struct ftrace_regs *fregs, unsigned int nth)
+{
+ unsigned long *stackp;
+
+ stackp = (unsigned long *)ftrace_regs_get_stack_pointer(fregs);
+ if (((unsigned long)(stackp + nth) & ~(THREAD_SIZE - 1)) ==
+ ((unsigned long)stackp & ~(THREAD_SIZE - 1)))
+ return *(stackp + nth);
+
+ return 0;
+}
+#else /* !CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+#define ftrace_regs_get_kernel_stack_nth(fregs, nth) (0L)
+#endif /* CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
+
typedef void (*ftrace_func_t)(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs);

diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index d818ba3ff943..e2fc600f0ef2 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -680,7 +680,6 @@ config FPROBE_EVENTS
select TRACING
select PROBE_EVENTS
select DYNAMIC_EVENTS
- depends on DYNAMIC_FTRACE_WITH_REGS
default y
help
This allows user to add tracing events on the function entry and
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 273cdf3cf70c..86cd6a8c806a 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -133,7 +133,7 @@ static int
process_fetch_insn(struct fetch_insn *code, void *rec, void *edata,
void *dest, void *base)
{
- struct pt_regs *regs = rec;
+ struct ftrace_regs *fregs = rec;
unsigned long val;
int ret;

@@ -141,17 +141,17 @@ process_fetch_insn(struct fetch_insn *code, void *rec, void *edata,
/* 1st stage: get value from context */
switch (code->op) {
case FETCH_OP_STACK:
- val = regs_get_kernel_stack_nth(regs, code->param);
+ val = ftrace_regs_get_kernel_stack_nth(fregs, code->param);
break;
case FETCH_OP_STACKP:
- val = kernel_stack_pointer(regs);
+ val = ftrace_regs_get_stack_pointer(fregs);
break;
case FETCH_OP_RETVAL:
- val = regs_return_value(regs);
+ val = ftrace_regs_get_return_value(fregs);
break;
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
case FETCH_OP_ARG:
- val = regs_get_kernel_argument(regs, code->param);
+ val = ftrace_regs_get_argument(fregs, code->param);
break;
case FETCH_OP_EDATA:
val = *(unsigned long *)((unsigned long)edata + code->offset);
@@ -174,7 +174,7 @@ NOKPROBE_SYMBOL(process_fetch_insn)
/* function entry handler */
static nokprobe_inline void
__fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
- struct pt_regs *regs,
+ struct ftrace_regs *fregs,
struct trace_event_file *trace_file)
{
struct fentry_trace_entry_head *entry;
@@ -188,41 +188,71 @@ __fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
if (trace_trigger_soft_disabled(trace_file))
return;

- dsize = __get_data_size(&tf->tp, regs, NULL);
+ dsize = __get_data_size(&tf->tp, fregs, NULL);

entry = trace_event_buffer_reserve(&fbuffer, trace_file,
sizeof(*entry) + tf->tp.size + dsize);
if (!entry)
return;

- fbuffer.regs = regs;
+ fbuffer.regs = ftrace_get_regs(fregs);
entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
entry->ip = entry_ip;
- store_trace_args(&entry[1], &tf->tp, regs, NULL, sizeof(*entry), dsize);
+ store_trace_args(&entry[1], &tf->tp, fregs, NULL, sizeof(*entry), dsize);

trace_event_buffer_commit(&fbuffer);
}

static void
fentry_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
- struct pt_regs *regs)
+ struct ftrace_regs *fregs)
{
struct event_file_link *link;

trace_probe_for_each_link_rcu(link, &tf->tp)
- __fentry_trace_func(tf, entry_ip, regs, link->file);
+ __fentry_trace_func(tf, entry_ip, fregs, link->file);
}
NOKPROBE_SYMBOL(fentry_trace_func);

+static nokprobe_inline
+void store_fprobe_entry_data(void *edata, struct trace_probe *tp, struct ftrace_regs *fregs)
+{
+ struct probe_entry_arg *earg = tp->entry_arg;
+ unsigned long val = 0;
+ int i;
+
+ if (!earg)
+ return;
+
+ for (i = 0; i < earg->size; i++) {
+ struct fetch_insn *code = &earg->code[i];
+
+ switch (code->op) {
+ case FETCH_OP_ARG:
+ val = ftrace_regs_get_argument(fregs, code->param);
+ break;
+ case FETCH_OP_ST_EDATA:
+ *(unsigned long *)((unsigned long)edata + code->offset) = val;
+ break;
+ case FETCH_OP_END:
+ goto end;
+ default:
+ break;
+ }
+ }
+end:
+ return;
+}
+
/* function exit handler */
static int trace_fprobe_entry_handler(struct fprobe *fp, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *entry_data)
{
struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);

if (tf->tp.entry_arg)
- store_trace_entry_data(entry_data, &tf->tp, regs);
+ store_fprobe_entry_data(entry_data, &tf->tp, fregs);

return 0;
}
@@ -230,7 +260,7 @@ NOKPROBE_SYMBOL(trace_fprobe_entry_handler)

static nokprobe_inline void
__fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *entry_data, struct trace_event_file *trace_file)
{
struct fexit_trace_entry_head *entry;
@@ -244,60 +274,63 @@ __fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
if (trace_trigger_soft_disabled(trace_file))
return;

- dsize = __get_data_size(&tf->tp, regs, entry_data);
+ dsize = __get_data_size(&tf->tp, fregs, entry_data);

entry = trace_event_buffer_reserve(&fbuffer, trace_file,
sizeof(*entry) + tf->tp.size + dsize);
if (!entry)
return;

- fbuffer.regs = regs;
+ fbuffer.regs = ftrace_get_regs(fregs);
entry = fbuffer.entry = ring_buffer_event_data(fbuffer.event);
entry->func = entry_ip;
entry->ret_ip = ret_ip;
- store_trace_args(&entry[1], &tf->tp, regs, entry_data, sizeof(*entry), dsize);
+ store_trace_args(&entry[1], &tf->tp, fregs, entry_data, sizeof(*entry), dsize);

trace_event_buffer_commit(&fbuffer);
}

static void
fexit_trace_func(struct trace_fprobe *tf, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs, void *entry_data)
+ unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data)
{
struct event_file_link *link;

trace_probe_for_each_link_rcu(link, &tf->tp)
- __fexit_trace_func(tf, entry_ip, ret_ip, regs, entry_data, link->file);
+ __fexit_trace_func(tf, entry_ip, ret_ip, fregs, entry_data, link->file);
}
NOKPROBE_SYMBOL(fexit_trace_func);

#ifdef CONFIG_PERF_EVENTS

static int fentry_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
- struct pt_regs *regs)
+ struct ftrace_regs *fregs)
{
struct trace_event_call *call = trace_probe_event_call(&tf->tp);
struct fentry_trace_entry_head *entry;
struct hlist_head *head;
int size, __size, dsize;
+ struct pt_regs *regs;
int rctx;

head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
return 0;

- dsize = __get_data_size(&tf->tp, regs, NULL);
+ dsize = __get_data_size(&tf->tp, fregs, NULL);
__size = sizeof(*entry) + tf->tp.size + dsize;
size = ALIGN(__size + sizeof(u32), sizeof(u64));
size -= sizeof(u32);

- entry = perf_trace_buf_alloc(size, NULL, &rctx);
+ entry = perf_trace_buf_alloc(size, &regs, &rctx);
if (!entry)
return 0;

+ regs = ftrace_fill_perf_regs(fregs, regs);
+
entry->ip = entry_ip;
memset(&entry[1], 0, dsize);
- store_trace_args(&entry[1], &tf->tp, regs, NULL, sizeof(*entry), dsize);
+ store_trace_args(&entry[1], &tf->tp, fregs, NULL, sizeof(*entry), dsize);
perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
head, NULL);
return 0;
@@ -306,31 +339,34 @@ NOKPROBE_SYMBOL(fentry_perf_func);

static void
fexit_perf_func(struct trace_fprobe *tf, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *entry_data)
{
struct trace_event_call *call = trace_probe_event_call(&tf->tp);
struct fexit_trace_entry_head *entry;
struct hlist_head *head;
int size, __size, dsize;
+ struct pt_regs *regs;
int rctx;

head = this_cpu_ptr(call->perf_events);
if (hlist_empty(head))
return;

- dsize = __get_data_size(&tf->tp, regs, entry_data);
+ dsize = __get_data_size(&tf->tp, fregs, entry_data);
__size = sizeof(*entry) + tf->tp.size + dsize;
size = ALIGN(__size + sizeof(u32), sizeof(u64));
size -= sizeof(u32);

- entry = perf_trace_buf_alloc(size, NULL, &rctx);
+ entry = perf_trace_buf_alloc(size, &regs, &rctx);
if (!entry)
return;

+ regs = ftrace_fill_perf_regs(fregs, regs);
+
entry->func = entry_ip;
entry->ret_ip = ret_ip;
- store_trace_args(&entry[1], &tf->tp, regs, entry_data, sizeof(*entry), dsize);
+ store_trace_args(&entry[1], &tf->tp, fregs, entry_data, sizeof(*entry), dsize);
perf_trace_buf_submit(entry, size, rctx, call->event.type, 1, regs,
head, NULL);
}
@@ -342,17 +378,14 @@ static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
void *entry_data)
{
struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
- struct pt_regs *regs = ftrace_get_regs(fregs);
int ret = 0;

- if (!regs)
- return 0;
-
if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
- fentry_trace_func(tf, entry_ip, regs);
+ fentry_trace_func(tf, entry_ip, fregs);
+
#ifdef CONFIG_PERF_EVENTS
if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
- ret = fentry_perf_func(tf, entry_ip, regs);
+ ret = fentry_perf_func(tf, entry_ip, fregs);
#endif
return ret;
}
@@ -363,16 +396,12 @@ static void fexit_dispatcher(struct fprobe *fp, unsigned long entry_ip,
void *entry_data)
{
struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
- struct pt_regs *regs = ftrace_get_regs(fregs);
-
- if (!regs)
- return;

if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
- fexit_trace_func(tf, entry_ip, ret_ip, regs, entry_data);
+ fexit_trace_func(tf, entry_ip, ret_ip, fregs, entry_data);
#ifdef CONFIG_PERF_EVENTS
if (trace_probe_test_flag(&tf->tp, TP_FLAG_PROFILE))
- fexit_perf_func(tf, entry_ip, ret_ip, regs, entry_data);
+ fexit_perf_func(tf, entry_ip, ret_ip, fregs, entry_data);
#endif
}
NOKPROBE_SYMBOL(fexit_dispatcher);
diff --git a/kernel/trace/trace_probe_tmpl.h b/kernel/trace/trace_probe_tmpl.h
index 2caf0d2afb32..f39b37fcdb3b 100644
--- a/kernel/trace/trace_probe_tmpl.h
+++ b/kernel/trace/trace_probe_tmpl.h
@@ -232,7 +232,7 @@ process_fetch_insn_bottom(struct fetch_insn *code, unsigned long val,

/* Sum up total data length for dynamic arrays (strings) */
static nokprobe_inline int
-__get_data_size(struct trace_probe *tp, struct pt_regs *regs, void *edata)
+__get_data_size(struct trace_probe *tp, void *regs, void *edata)
{
struct probe_arg *arg;
int i, len, ret = 0;


2024-04-15 13:15:26

by Masami Hiramatsu

Subject: [PATCH v9 30/36] ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC

From: Masami Hiramatsu (Google) <[email protected]>

Add a CONFIG_HAVE_FTRACE_GRAPH_FUNC kconfig in addition to the
ftrace_graph_func macro check. With this, other features (e.g. FPROBE) that
need to access ftrace_regs from fgraph_ops::entryfunc() can avoid being
compiled when fgraph cannot pass a valid ftrace_regs.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v8:
- Newly added.
---
arch/arm64/Kconfig | 1 +
arch/loongarch/Kconfig | 1 +
arch/powerpc/Kconfig | 1 +
arch/riscv/Kconfig | 1 +
arch/x86/Kconfig | 1 +
kernel/trace/Kconfig | 5 +++++
6 files changed, 10 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8d5047bc13bc..e0a5c69eeda2 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -206,6 +206,7 @@ config ARM64
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
+ select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index dbcba61b6acf..9e94eff7f7cb 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -121,6 +121,7 @@ config LOONGARCH
select HAVE_EFFICIENT_UNALIGNED_ACCESS if !ARCH_STRICT_ALIGN
select HAVE_EXIT_THREAD
select HAVE_FAST_GUP
+ select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_ERROR_INJECTION
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..b79d16c5846a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -237,6 +237,7 @@ config PPC
select HAVE_EBPF_JIT
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FAST_GUP
+ select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_ARG_ACCESS_API
select HAVE_FUNCTION_DESCRIPTORS if PPC64_ELF_ABI_V1
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index b58b8e81b510..6fd2a166904b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -127,6 +127,7 @@ config RISCV
select HAVE_DYNAMIC_FTRACE if !XIP_KERNEL && MMU && (CLANG_SUPPORTS_DYNAMIC_FTRACE || GCC_SUPPORTS_DYNAMIC_FTRACE)
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
+ select HAVE_FTRACE_GRAPH_FUNC
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_FREGS
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 188edde1db4d..4a4fdcd294d2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -224,6 +224,7 @@ config X86
select HAVE_EXIT_THREAD
select HAVE_FAST_GUP
select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
+ select HAVE_FTRACE_GRAPH_FUNC if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_FREGS if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_GRAPH_TRACER if X86_32 || (X86_64 && DYNAMIC_FTRACE)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index e2fc600f0ef2..9056315d56ed 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -34,6 +34,11 @@ config HAVE_FUNCTION_GRAPH_TRACER
config HAVE_FUNCTION_GRAPH_FREGS
bool

+config HAVE_FTRACE_GRAPH_FUNC
+ bool
+ help
+ True if ftrace_graph_func() is defined.
+
config HAVE_DYNAMIC_FTRACE
bool
help


2024-04-15 13:15:55

by Masami Hiramatsu

Subject: [PATCH v9 31/36] fprobe: Rewrite fprobe on function-graph tracer

From: Masami Hiramatsu (Google) <[email protected]>

Rewrite the fprobe implementation on top of the function-graph tracer.
Major API changes are:
- The 'nr_maxactive' field is deprecated.
- This depends on CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
  !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS, and
  CONFIG_HAVE_FUNCTION_GRAPH_FREGS, so it currently works only
  on x86_64.
- Currently the entry data size is limited to 15 * sizeof(long).
- If too many fprobe exit handlers are set on the same
  function, registering the probe will fail.
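
To make the size limit above concrete, here is a small user-space sketch of
a check a caller could apply to entry_data_size before registering. The
helper names and the rounding to whole words are assumptions made for
illustration; only the "15 * sizeof(long)" bound comes from this changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ENTRY_WORDS 15	/* from the "15 * sizeof(long)" limit */

/* Round a byte count up to a whole number of long-sized words. */
static size_t mock_size_in_words(size_t bytes)
{
	/* Guard the rounding addition against wrap-around. */
	assert(bytes <= SIZE_MAX - sizeof(long));
	return (bytes + sizeof(long) - 1) / sizeof(long);
}

/* True if an entry data area of this size fits within the limit. */
static bool mock_entry_size_ok(size_t entry_data_size)
{
	return mock_size_in_words(entry_data_size) <= MOCK_MAX_ENTRY_WORDS;
}
```

On a 64-bit build this accepts up to 120 bytes of entry data; anything
larger would need to be stored elsewhere and referenced from the entry data.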

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v9:
- Remove unneeded prototype of ftrace_regs_get_return_address().
- Fix entry data address calculation.
- Remove DIV_ROUND_UP() from hotpath.
Changes in v8:
- Use trace_func_graph_ret/ent_t for fgraph_ops.
- Update CONFIG_FPROBE dependencies.
- Add ftrace_regs_get_return_address() for each arch.
Changes in v3:
- Update for new reserve_data/retrieve_data API.
- Fix internal push/pop on fgraph data logic so that it can
correctly save/restore the returning fprobes.
Changes in v2:
- Add more lockdep_assert_held(fprobe_mutex)
- Use READ_ONCE() and WRITE_ONCE() for fprobe_hlist_node::fp.
- Add NOKPROBE_SYMBOL() for the functions which are called from
entry/exit callbacks.
---
arch/arm64/include/asm/ftrace.h | 6
arch/loongarch/include/asm/ftrace.h | 6
arch/powerpc/include/asm/ftrace.h | 6
arch/s390/include/asm/ftrace.h | 6
arch/x86/include/asm/ftrace.h | 6
include/linux/fprobe.h | 53 ++-
kernel/trace/Kconfig | 8
kernel/trace/fprobe.c | 638 +++++++++++++++++++++++++----------
lib/test_fprobe.c | 45 --
9 files changed, 529 insertions(+), 245 deletions(-)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 95a8f349f871..800c75f46a13 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -143,6 +143,12 @@ ftrace_regs_get_frame_pointer(const struct ftrace_regs *fregs)
return fregs->fp;
}

+static __always_inline unsigned long
+ftrace_regs_get_return_address(const struct ftrace_regs *fregs)
+{
+ return fregs->lr;
+}
+
static __always_inline struct pt_regs *
ftrace_partial_regs(const struct ftrace_regs *fregs, struct pt_regs *regs)
{
diff --git a/arch/loongarch/include/asm/ftrace.h b/arch/loongarch/include/asm/ftrace.h
index 14a1576bf948..b8432b7cc9d4 100644
--- a/arch/loongarch/include/asm/ftrace.h
+++ b/arch/loongarch/include/asm/ftrace.h
@@ -81,6 +81,12 @@ ftrace_regs_set_instruction_pointer(struct ftrace_regs *fregs, unsigned long ip)
#define ftrace_regs_get_frame_pointer(fregs) \
((fregs)->regs.regs[22])

+static __always_inline unsigned long
+ftrace_regs_get_return_address(struct ftrace_regs *fregs)
+{
+ return *(unsigned long *)(fregs->regs.regs[1]);
+}
+
#define ftrace_graph_func ftrace_graph_func
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs);
diff --git a/arch/powerpc/include/asm/ftrace.h b/arch/powerpc/include/asm/ftrace.h
index 51245fd6b45b..d8a74a6570f8 100644
--- a/arch/powerpc/include/asm/ftrace.h
+++ b/arch/powerpc/include/asm/ftrace.h
@@ -77,6 +77,12 @@ ftrace_regs_get_instruction_pointer(struct ftrace_regs *fregs)
#define ftrace_regs_query_register_offset(name) \
regs_query_register_offset(name)

+static __always_inline unsigned long
+ftrace_regs_get_return_address(struct ftrace_regs *fregs)
+{
+ return fregs->regs.link;
+}
+
struct ftrace_ops;

#define ftrace_graph_func ftrace_graph_func
diff --git a/arch/s390/include/asm/ftrace.h b/arch/s390/include/asm/ftrace.h
index cb8d60a5fe1d..d8ca1776c554 100644
--- a/arch/s390/include/asm/ftrace.h
+++ b/arch/s390/include/asm/ftrace.h
@@ -89,6 +89,12 @@ ftrace_regs_get_frame_pointer(struct ftrace_regs *fregs)
return sp[0]; /* return backchain */
}

+static __always_inline unsigned long
+ftrace_regs_get_return_address(const struct ftrace_regs *fregs)
+{
+ return fregs->regs.gprs[14];
+}
+
#define arch_ftrace_fill_perf_regs(fregs, _regs) do { \
(_regs)->psw.addr = (fregs)->regs.psw.addr; \
(_regs)->gprs[15] = (fregs)->regs.gprs[15]; \
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index 7625887fc49b..979d3458a328 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -82,6 +82,12 @@ arch_ftrace_get_regs(struct ftrace_regs *fregs)
#define ftrace_regs_get_frame_pointer(fregs) \
frame_pointer(&(fregs)->regs)

+static __always_inline unsigned long
+ftrace_regs_get_return_address(struct ftrace_regs *fregs)
+{
+ return *(unsigned long *)ftrace_regs_get_stack_pointer(fregs);
+}
+
struct ftrace_ops;
#define ftrace_graph_func ftrace_graph_func
void ftrace_graph_func(unsigned long ip, unsigned long parent_ip,
diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index ef609bcca0f9..2d06bbd99601 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -5,10 +5,11 @@

#include <linux/compiler.h>
#include <linux/ftrace.h>
-#include <linux/rethook.h>
+#include <linux/rcupdate.h>
+#include <linux/refcount.h>
+#include <linux/slab.h>

struct fprobe;
-
typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip,
unsigned long ret_ip, struct ftrace_regs *regs,
void *entry_data);
@@ -17,35 +18,57 @@ typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
unsigned long ret_ip, struct ftrace_regs *regs,
void *entry_data);

+/**
+ * struct fprobe_hlist_node - address based hash list node for fprobe.
+ *
+ * @hlist: The hlist node for address search hash table.
+ * @addr: The address represented by this.
+ * @fp: The fprobe which owns this.
+ */
+struct fprobe_hlist_node {
+ struct hlist_node hlist;
+ unsigned long addr;
+ struct fprobe *fp;
+};
+
+/**
+ * struct fprobe_hlist - hash list nodes for fprobe.
+ *
+ * @hlist: The hlist node for existence checking hash table.
+ * @rcu: rcu_head for RCU deferred release.
+ * @fp: The fprobe which owns this fprobe_hlist.
+ * @size: The size of @array.
+ * @array: The fprobe_hlist_node for each address to probe.
+ */
+struct fprobe_hlist {
+ struct hlist_node hlist;
+ struct rcu_head rcu;
+ struct fprobe *fp;
+ int size;
+ struct fprobe_hlist_node array[];
+};
+
/**
* struct fprobe - ftrace based probe.
- * @ops: The ftrace_ops.
+ *
* @nmissed: The counter for missing events.
* @flags: The status flag.
- * @rethook: The rethook data structure. (internal data)
* @entry_data_size: The private data storage size.
- * @nr_maxactive: The max number of active functions.
+ * @nr_maxactive: The max number of active functions. (*deprecated)
* @entry_handler: The callback function for function entry.
* @exit_handler: The callback function for function exit.
+ * @hlist_array: The fprobe_hlist for fprobe search from IP hash table.
*/
struct fprobe {
-#ifdef CONFIG_FUNCTION_TRACER
- /*
- * If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too.
- * But user of fprobe may keep embedding the struct fprobe on their own
- * code. To avoid build error, this will keep the fprobe data structure
- * defined here, but remove ftrace_ops data structure.
- */
- struct ftrace_ops ops;
-#endif
unsigned long nmissed;
unsigned int flags;
- struct rethook *rethook;
size_t entry_data_size;
int nr_maxactive;

fprobe_entry_cb entry_handler;
fprobe_exit_cb exit_handler;
+
+ struct fprobe_hlist *hlist_array;
};

/* This fprobe is soft-disabled. */
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 9056315d56ed..d2ee9e6f1561 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -298,11 +298,9 @@ config DYNAMIC_FTRACE_WITH_ARGS

config FPROBE
bool "Kernel Function Probe (fprobe)"
- depends on FUNCTION_TRACER
- depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
- depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || !HAVE_DYNAMIC_FTRACE_WITH_ARGS
- depends on HAVE_RETHOOK
- select RETHOOK
+ depends on FUNCTION_GRAPH_TRACER
+ depends on HAVE_FUNCTION_GRAPH_FREGS && HAVE_FTRACE_GRAPH_FUNC
+ depends on DYNAMIC_FTRACE_WITH_ARGS
default n
help
This option enables kernel function probe (fprobe) based on ftrace.
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 90a3c8e2bbdf..afa52d9816cf 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -8,98 +8,195 @@
#include <linux/fprobe.h>
#include <linux/kallsyms.h>
#include <linux/kprobes.h>
-#include <linux/rethook.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
#include <linux/slab.h>
#include <linux/sort.h>

#include "trace.h"

-struct fprobe_rethook_node {
- struct rethook_node node;
- unsigned long entry_ip;
- unsigned long entry_parent_ip;
- char data[];
-};
+#define FPROBE_IP_HASH_BITS 8
+#define FPROBE_IP_TABLE_SIZE (1 << FPROBE_IP_HASH_BITS)

-static inline void __fprobe_handler(unsigned long ip, unsigned long parent_ip,
- struct ftrace_ops *ops, struct ftrace_regs *fregs)
-{
- struct fprobe_rethook_node *fpr;
- struct rethook_node *rh = NULL;
- struct fprobe *fp;
- void *entry_data = NULL;
- int ret = 0;
+#define FPROBE_HASH_BITS 6
+#define FPROBE_TABLE_SIZE (1 << FPROBE_HASH_BITS)

- fp = container_of(ops, struct fprobe, ops);
+#define SIZE_IN_LONG(x) ((x + sizeof(long) - 1) >> (sizeof(long) == 8 ? 3 : 2))

- if (fp->exit_handler) {
- rh = rethook_try_get(fp->rethook);
- if (!rh) {
- fp->nmissed++;
- return;
- }
- fpr = container_of(rh, struct fprobe_rethook_node, node);
- fpr->entry_ip = ip;
- fpr->entry_parent_ip = parent_ip;
- if (fp->entry_data_size)
- entry_data = fpr->data;
+/*
+ * fprobe_table: hold 'fprobe_hlist::hlist' for checking the fprobe still
+ * exists. The key is the address of fprobe instance.
+ * fprobe_ip_table: hold 'fprobe_hlist::array[*]' for searching the fprobe
+ * instance related to the function address. The key is the ftrace IP
+ * address.
+ *
+ * When unregistering the fprobe, fprobe_hlist::fp and fprobe_hlist::array[*].fp
+ * are set to NULL and removed from both hash tables (by hlist_del_rcu).
+ * After an RCU grace period, the fprobe_hlist itself will be released.
+ *
+ * fprobe_table and fprobe_ip_table can be accessed from either
+ * - Normal hlist traversal and RCU add/del while 'fprobe_mutex' is held.
+ * - RCU hlist traversal with preemption disabled.
+ */
+static struct hlist_head fprobe_table[FPROBE_TABLE_SIZE];
+static struct hlist_head fprobe_ip_table[FPROBE_IP_TABLE_SIZE];
+static DEFINE_MUTEX(fprobe_mutex);
+
+/*
+ * Find first fprobe in the hlist. It will be iterated twice in the entry
+ * probe, once for calculating the total required size and once for
+ * calling back the user handlers.
+ * Thus the hlist in the fprobe_ip_table must be sorted and a new probe needs to
+ * be added *before* the first fprobe.
+ */
+static struct fprobe_hlist_node *find_first_fprobe_node(unsigned long ip)
+{
+ struct fprobe_hlist_node *node;
+ struct hlist_head *head;
+
+ head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
+ hlist_for_each_entry_rcu(node, head, hlist,
+ lockdep_is_held(&fprobe_mutex)) {
+ if (node->addr == ip)
+ return node;
}
+ return NULL;
+}
+NOKPROBE_SYMBOL(find_first_fprobe_node);
+
+/* Node insertion and deletion requires the fprobe_mutex */
+static void insert_fprobe_node(struct fprobe_hlist_node *node)
+{
+ unsigned long ip = node->addr;
+ struct fprobe_hlist_node *next;
+ struct hlist_head *head;

- if (fp->entry_handler)
- ret = fp->entry_handler(fp, ip, parent_ip, fregs, entry_data);
+ lockdep_assert_held(&fprobe_mutex);

- /* If entry_handler returns !0, nmissed is not counted. */
- if (rh) {
- if (ret)
- rethook_recycle(rh);
- else
- rethook_hook(rh, ftrace_get_regs(fregs), true);
+ next = find_first_fprobe_node(ip);
+ if (next) {
+ hlist_add_before_rcu(&node->hlist, &next->hlist);
+ return;
}
+ head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
+ hlist_add_head_rcu(&node->hlist, head);
}

-static void fprobe_handler(unsigned long ip, unsigned long parent_ip,
- struct ftrace_ops *ops, struct ftrace_regs *fregs)
+/* Return true if there are synonyms */
+static bool delete_fprobe_node(struct fprobe_hlist_node *node)
{
- struct fprobe *fp;
- int bit;
+ lockdep_assert_held(&fprobe_mutex);

- fp = container_of(ops, struct fprobe, ops);
- if (fprobe_disabled(fp))
- return;
+ WRITE_ONCE(node->fp, NULL);
+ hlist_del_rcu(&node->hlist);
+ return !!find_first_fprobe_node(node->addr);
+}

- /* recursion detection has to go before any traceable function and
- * all functions before this point should be marked as notrace
- */
- bit = ftrace_test_recursion_trylock(ip, parent_ip);
- if (bit < 0) {
- fp->nmissed++;
- return;
+/* Check existence of the fprobe */
+static bool is_fprobe_still_exist(struct fprobe *fp)
+{
+ struct hlist_head *head;
+ struct fprobe_hlist *fph;
+
+ head = &fprobe_table[hash_ptr(fp, FPROBE_HASH_BITS)];
+ hlist_for_each_entry_rcu(fph, head, hlist,
+ lockdep_is_held(&fprobe_mutex)) {
+ if (fph->fp == fp)
+ return true;
}
- __fprobe_handler(ip, parent_ip, ops, fregs);
- ftrace_test_recursion_unlock(bit);
+ return false;
+}
+NOKPROBE_SYMBOL(is_fprobe_still_exist);
+
+static int add_fprobe_hash(struct fprobe *fp)
+{
+ struct fprobe_hlist *fph = fp->hlist_array;
+ struct hlist_head *head;
+
+ lockdep_assert_held(&fprobe_mutex);

+ if (WARN_ON_ONCE(!fph))
+ return -EINVAL;
+
+ if (is_fprobe_still_exist(fp))
+ return -EEXIST;
+
+ head = &fprobe_table[hash_ptr(fp, FPROBE_HASH_BITS)];
+ hlist_add_head_rcu(&fp->hlist_array->hlist, head);
+ return 0;
}
-NOKPROBE_SYMBOL(fprobe_handler);

-static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
- struct ftrace_ops *ops, struct ftrace_regs *fregs)
+static int del_fprobe_hash(struct fprobe *fp)
{
- struct fprobe *fp;
- int bit;
+ struct fprobe_hlist *fph = fp->hlist_array;

- fp = container_of(ops, struct fprobe, ops);
- if (fprobe_disabled(fp))
- return;
+ lockdep_assert_held(&fprobe_mutex);

- /* recursion detection has to go before any traceable function and
- * all functions called before this point should be marked as notrace
- */
- bit = ftrace_test_recursion_trylock(ip, parent_ip);
- if (bit < 0) {
- fp->nmissed++;
- return;
+ if (WARN_ON_ONCE(!fph))
+ return -EINVAL;
+
+ if (!is_fprobe_still_exist(fp))
+ return -ENOENT;
+
+ fph->fp = NULL;
+ hlist_del_rcu(&fph->hlist);
+ return 0;
+}
+
+/* The entry data size field is 4 bits wide, so the maximum is 15 * sizeof(long) */
+#define FPROBE_HEADER_SIZE_BITS 4
+#define MAX_FPROBE_DATA_SIZE_WORD ((1L << FPROBE_HEADER_SIZE_BITS) - 1)
+#define MAX_FPROBE_DATA_SIZE (MAX_FPROBE_DATA_SIZE_WORD * sizeof(long))
+#define FPROBE_HEADER_PTR_BITS (BITS_PER_LONG - FPROBE_HEADER_SIZE_BITS)
+#define FPROBE_HEADER_PTR_MASK GENMASK(FPROBE_HEADER_PTR_BITS - 1, 0)
+#define FPROBE_HEADER_SIZE sizeof(unsigned long)
+
+static inline unsigned long encode_fprobe_header(struct fprobe *fp, int size_words)
+{
+ if (WARN_ON_ONCE(size_words > MAX_FPROBE_DATA_SIZE_WORD ||
+ ((unsigned long)fp & ~FPROBE_HEADER_PTR_MASK) !=
+ ~FPROBE_HEADER_PTR_MASK)) {
+ return 0;
}
+ return ((unsigned long)size_words << FPROBE_HEADER_PTR_BITS) |
+ ((unsigned long)fp & FPROBE_HEADER_PTR_MASK);
+}
+
+/* Return reserved data size in words */
+static inline int decode_fprobe_header(unsigned long val, struct fprobe **fp)
+{
+ unsigned long ptr;
+
+ ptr = (val & FPROBE_HEADER_PTR_MASK) | ~FPROBE_HEADER_PTR_MASK;
+ if (fp)
+ *fp = (struct fprobe *)ptr;
+ return val >> FPROBE_HEADER_PTR_BITS;
+}
+
+/*
+ * fprobe shadow stack management:
+ * Since fprobe shares a single fgraph_ops, it needs to share the stack entry
+ * among the probes on the same function exit. Since a new probe can be
+ * registered before a target function returns, we can not use the hash
+ * table to find the corresponding probes. Thus the probe address is stored on
+ * the shadow stack with its entry data size.
+ *
+ */
+static inline int __fprobe_handler(unsigned long ip, unsigned long parent_ip,
+ struct fprobe *fp, struct ftrace_regs *fregs,
+ void *data)
+{
+ if (!fp->entry_handler)
+ return 0;
+
+ return fp->entry_handler(fp, ip, parent_ip, fregs, data);
+}

+static inline int __fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
+ struct fprobe *fp, struct ftrace_regs *fregs,
+ void *data)
+{
+ int ret;
/*
* This user handler is shared with other kprobes and is not expected to be
* called recursively. So if any other kprobe handler is running, this will
@@ -108,45 +205,185 @@ static void fprobe_kprobe_handler(unsigned long ip, unsigned long parent_ip,
*/
if (unlikely(kprobe_running())) {
fp->nmissed++;
- goto recursion_unlock;
+ return 0;
}

kprobe_busy_begin();
- __fprobe_handler(ip, parent_ip, ops, fregs);
+ ret = __fprobe_handler(ip, parent_ip, fp, fregs, data);
kprobe_busy_end();
-
-recursion_unlock:
- ftrace_test_recursion_unlock(bit);
+ return ret;
}

-static void fprobe_exit_handler(struct rethook_node *rh, void *data,
- unsigned long ret_ip, struct pt_regs *regs)
+static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
- struct fprobe *fp = (struct fprobe *)data;
- struct fprobe_rethook_node *fpr;
- struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
- int bit;
+ struct fprobe_hlist_node *node, *first;
+ unsigned long *fgraph_data = NULL;
+ unsigned long func = trace->func;
+ unsigned long header, ret_ip;
+ int reserved_words;
+ struct fprobe *fp;
+ int used, ret;

- if (!fp || fprobe_disabled(fp))
- return;
+ if (WARN_ON_ONCE(!fregs))
+ return 0;
+
+ first = node = find_first_fprobe_node(func);
+ if (unlikely(!first))
+ return 0;

- fpr = container_of(rh, struct fprobe_rethook_node, node);
+ reserved_words = 0;
+ hlist_for_each_entry_from_rcu(node, hlist) {
+ if (node->addr != func)
+ break;
+ fp = READ_ONCE(node->fp);
+ if (!fp || !fp->exit_handler)
+ continue;
+ /*
+ * Since fprobe can be enabled until the next loop, we ignore the
+ * fprobe's disabled flag in this loop.
+ */
+ reserved_words +=
+ SIZE_IN_LONG(fp->entry_data_size) + 1;
+ }
+ node = first;
+ if (reserved_words) {
+ fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long));
+ if (unlikely(!fgraph_data)) {
+ hlist_for_each_entry_from_rcu(node, hlist) {
+ if (node->addr != func)
+ break;
+ fp = READ_ONCE(node->fp);
+ if (fp && !fprobe_disabled(fp))
+ fp->nmissed++;
+ }
+ return 0;
+ }
+ }

/*
- * we need to assure no calls to traceable functions in-between the
- * end of fprobe_handler and the beginning of fprobe_exit_handler.
+ * TODO: recursion detection has been done in the fgraph. Thus we need
+ * to add a callback to increment missed counter.
*/
- bit = ftrace_test_recursion_trylock(fpr->entry_ip, fpr->entry_parent_ip);
- if (bit < 0) {
- fp->nmissed++;
+ ret_ip = ftrace_regs_get_return_address(fregs);
+ used = 0;
+ hlist_for_each_entry_from_rcu(node, hlist) {
+ void *data;
+
+ if (node->addr != func)
+ break;
+ fp = READ_ONCE(node->fp);
+ if (!fp || fprobe_disabled(fp))
+ continue;
+
+ if (fp->entry_data_size && fp->exit_handler)
+ data = fgraph_data + used + 1;
+ else
+ data = NULL;
+
+ if (fprobe_shared_with_kprobes(fp))
+ ret = __fprobe_kprobe_handler(func, ret_ip, fp, fregs, data);
+ else
+ ret = __fprobe_handler(func, ret_ip, fp, fregs, data);
+ /* If entry_handler returns !0, nmissed is not counted, but exit_handler is skipped. */
+ if (!ret && fp->exit_handler) {
+ int size_words = SIZE_IN_LONG(fp->entry_data_size);
+
+ header = encode_fprobe_header(fp, size_words);
+ if (likely(header)) {
+ fgraph_data[used] = header;
+ used += size_words + 1;
+ }
+ }
+ }
+ if (used < reserved_words)
+ memset(fgraph_data + used, 0, (reserved_words - used) * sizeof(long));
+
+ /* If any exit_handler is set, data must be used. */
+ return used != 0;
+}
+NOKPROBE_SYMBOL(fprobe_entry);
+
+static void fprobe_return(struct ftrace_graph_ret *trace,
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
+{
+ unsigned long *fgraph_data = NULL;
+ unsigned long ret_ip;
+ unsigned long val;
+ struct fprobe *fp;
+ int size, curr;
+ int size_words;
+
+ fgraph_data = (unsigned long *)fgraph_retrieve_data(gops->idx, &size);
+ if (WARN_ON_ONCE(!fgraph_data))
return;
+ size_words = SIZE_IN_LONG(size);
+ ret_ip = ftrace_regs_get_instruction_pointer(fregs);
+
+ preempt_disable();
+
+ curr = 0;
+ while (size_words > curr) {
+ val = fgraph_data[curr++];
+ if (!val)
+ break;
+
+ size = decode_fprobe_header(val, &fp);
+ if (fp && is_fprobe_still_exist(fp) && !fprobe_disabled(fp)) {
+ if (WARN_ON_ONCE(curr + size > size_words))
+ break;
+ fp->exit_handler(fp, trace->func, ret_ip, fregs,
+ size ? fgraph_data + curr : NULL);
+ }
+ curr += size;
}
+ preempt_enable();
+}
+NOKPROBE_SYMBOL(fprobe_return);
+
+static struct fgraph_ops fprobe_graph_ops = {
+ .entryfunc = fprobe_entry,
+ .retfunc = fprobe_return,
+};
+static int fprobe_graph_active;

- fp->exit_handler(fp, fpr->entry_ip, ret_ip, fregs,
- fp->entry_data_size ? (void *)fpr->data : NULL);
- ftrace_test_recursion_unlock(bit);
+/* Add @addrs to the ftrace filter and register fgraph if needed. */
+static int fprobe_graph_add_ips(unsigned long *addrs, int num)
+{
+ int ret;
+
+ lockdep_assert_held(&fprobe_mutex);
+
+ ret = ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 0, 0);
+ if (ret)
+ return ret;
+
+ if (!fprobe_graph_active) {
+ ret = register_ftrace_graph(&fprobe_graph_ops);
+ if (WARN_ON_ONCE(ret)) {
+ ftrace_free_filter(&fprobe_graph_ops.ops);
+ return ret;
+ }
+ }
+ fprobe_graph_active++;
+ return 0;
+}
+
+/* Remove @addrs from the ftrace filter and unregister fgraph if possible. */
+static void fprobe_graph_remove_ips(unsigned long *addrs, int num)
+{
+ lockdep_assert_held(&fprobe_mutex);
+
+ fprobe_graph_active--;
+ if (!fprobe_graph_active) {
+ /* Q: should we unregister it ? */
+ unregister_ftrace_graph(&fprobe_graph_ops);
+ return;
+ }
+
+ ftrace_set_filter_ips(&fprobe_graph_ops.ops, addrs, num, 1, 0);
}
-NOKPROBE_SYMBOL(fprobe_exit_handler);

static int symbols_cmp(const void *a, const void *b)
{
@@ -176,54 +413,97 @@ static unsigned long *get_ftrace_locations(const char **syms, int num)
return ERR_PTR(-ENOENT);
}

-static void fprobe_init(struct fprobe *fp)
-{
- fp->nmissed = 0;
- if (fprobe_shared_with_kprobes(fp))
- fp->ops.func = fprobe_kprobe_handler;
- else
- fp->ops.func = fprobe_handler;
-
- fp->ops.flags |= FTRACE_OPS_FL_SAVE_REGS;
-}
+struct filter_match_data {
+ const char *filter;
+ const char *notfilter;
+ size_t index;
+ size_t size;
+ unsigned long *addrs;
+};

-static int fprobe_init_rethook(struct fprobe *fp, int num)
+static int filter_match_callback(void *data, const char *name, unsigned long addr)
{
- int size;
+ struct filter_match_data *match = data;

- if (!fp->exit_handler) {
- fp->rethook = NULL;
+ if (!glob_match(match->filter, name) ||
+ (match->notfilter && glob_match(match->notfilter, name)))
return 0;
- }

- /* Initialize rethook if needed */
- if (fp->nr_maxactive)
- num = fp->nr_maxactive;
- else
- num *= num_possible_cpus() * 2;
- if (num <= 0)
- return -EINVAL;
+ if (!ftrace_location(addr))
+ return 0;

- size = sizeof(struct fprobe_rethook_node) + fp->entry_data_size;
+ if (match->addrs)
+ match->addrs[match->index] = addr;

- /* Initialize rethook */
- fp->rethook = rethook_alloc((void *)fp, fprobe_exit_handler, size, num);
- if (IS_ERR(fp->rethook))
- return PTR_ERR(fp->rethook);
+ match->index++;
+ return match->index == match->size;
+}

- return 0;
+/*
+ * Make IP list from the filter/no-filter glob patterns.
+ * Return the number of matched symbols, or -ENOENT.
+ */
+static int ip_list_from_filter(const char *filter, const char *notfilter,
+ unsigned long *addrs, size_t size)
+{
+ struct filter_match_data match = { .filter = filter, .notfilter = notfilter,
+ .index = 0, .size = size, .addrs = addrs};
+ int ret;
+
+ ret = kallsyms_on_each_symbol(filter_match_callback, &match);
+ if (ret < 0)
+ return ret;
+ ret = module_kallsyms_on_each_symbol(NULL, filter_match_callback, &match);
+ if (ret < 0)
+ return ret;
+
+ return match.index ?: -ENOENT;
}

static void fprobe_fail_cleanup(struct fprobe *fp)
{
- if (!IS_ERR_OR_NULL(fp->rethook)) {
- /* Don't need to cleanup rethook->handler because this is not used. */
- rethook_free(fp->rethook);
- fp->rethook = NULL;
+ kfree(fp->hlist_array);
+ fp->hlist_array = NULL;
+}
+
+/* Initialize the fprobe data structure. */
+static int fprobe_init(struct fprobe *fp, unsigned long *addrs, int num)
+{
+ struct fprobe_hlist *hlist_array;
+ unsigned long addr;
+ int size, i;
+
+ if (!fp || !addrs || num <= 0)
+ return -EINVAL;
+
+ size = ALIGN(fp->entry_data_size, sizeof(long));
+ if (size > MAX_FPROBE_DATA_SIZE)
+ return -E2BIG;
+ fp->entry_data_size = size;
+
+ hlist_array = kzalloc(struct_size(hlist_array, array, num), GFP_KERNEL);
+ if (!hlist_array)
+ return -ENOMEM;
+
+ fp->nmissed = 0;
+
+ hlist_array->size = num;
+ fp->hlist_array = hlist_array;
+ hlist_array->fp = fp;
+ for (i = 0; i < num; i++) {
+ hlist_array->array[i].fp = fp;
+ addr = ftrace_location(addrs[i]);
+ if (!addr) {
+ fprobe_fail_cleanup(fp);
+ return -ENOENT;
+ }
+ hlist_array->array[i].addr = addr;
}
- ftrace_free_filter(&fp->ops);
+ return 0;
}

+#define FPROBE_IPS_MAX INT_MAX
+
/**
* register_fprobe() - Register fprobe to ftrace by pattern.
* @fp: A fprobe data structure to be registered.
@@ -237,46 +517,24 @@ static void fprobe_fail_cleanup(struct fprobe *fp)
*/
int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter)
{
- struct ftrace_hash *hash;
- unsigned char *str;
- int ret, len;
+ unsigned long *addrs;
+ int ret;

if (!fp || !filter)
return -EINVAL;

- fprobe_init(fp);
-
- len = strlen(filter);
- str = kstrdup(filter, GFP_KERNEL);
- ret = ftrace_set_filter(&fp->ops, str, len, 0);
- kfree(str);
- if (ret)
+ ret = ip_list_from_filter(filter, notfilter, NULL, FPROBE_IPS_MAX);
+ if (ret < 0)
return ret;

- if (notfilter) {
- len = strlen(notfilter);
- str = kstrdup(notfilter, GFP_KERNEL);
- ret = ftrace_set_notrace(&fp->ops, str, len, 0);
- kfree(str);
- if (ret)
- goto out;
- }
-
- /* TODO:
- * correctly calculate the total number of filtered symbols
- * from both filter and notfilter.
- */
- hash = rcu_access_pointer(fp->ops.local_hash.filter_hash);
- if (WARN_ON_ONCE(!hash))
- goto out;
-
- ret = fprobe_init_rethook(fp, (int)hash->count);
- if (!ret)
- ret = register_ftrace_function(&fp->ops);
+ addrs = kcalloc(ret, sizeof(unsigned long), GFP_KERNEL);
+ if (!addrs)
+ return -ENOMEM;
+ ret = ip_list_from_filter(filter, notfilter, addrs, ret);
+ if (ret > 0)
+ ret = register_fprobe_ips(fp, addrs, ret);

-out:
- if (ret)
- fprobe_fail_cleanup(fp);
+ kfree(addrs);
return ret;
}
EXPORT_SYMBOL_GPL(register_fprobe);
@@ -284,7 +542,7 @@ EXPORT_SYMBOL_GPL(register_fprobe);
/**
* register_fprobe_ips() - Register fprobe to ftrace by address.
* @fp: A fprobe data structure to be registered.
- * @addrs: An array of target ftrace location addresses.
+ * @addrs: An array of target function addresses.
* @num: The number of entries of @addrs.
*
* Register @fp to ftrace for enabling the probe on the address given by @addrs.
@@ -296,23 +554,27 @@ EXPORT_SYMBOL_GPL(register_fprobe);
*/
int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
{
- int ret;
-
- if (!fp || !addrs || num <= 0)
- return -EINVAL;
-
- fprobe_init(fp);
+ struct fprobe_hlist *hlist_array;
+ int ret, i;

- ret = ftrace_set_filter_ips(&fp->ops, addrs, num, 0, 0);
+ ret = fprobe_init(fp, addrs, num);
if (ret)
return ret;

- ret = fprobe_init_rethook(fp, num);
- if (!ret)
- ret = register_ftrace_function(&fp->ops);
+ mutex_lock(&fprobe_mutex);
+
+ hlist_array = fp->hlist_array;
+ ret = fprobe_graph_add_ips(addrs, num);
+ if (!ret) {
+ add_fprobe_hash(fp);
+ for (i = 0; i < hlist_array->size; i++)
+ insert_fprobe_node(&hlist_array->array[i]);
+ }
+ mutex_unlock(&fprobe_mutex);

if (ret)
fprobe_fail_cleanup(fp);
+
return ret;
}
EXPORT_SYMBOL_GPL(register_fprobe_ips);
@@ -350,14 +612,13 @@ EXPORT_SYMBOL_GPL(register_fprobe_syms);

bool fprobe_is_registered(struct fprobe *fp)
{
- if (!fp || (fp->ops.saved_func != fprobe_handler &&
- fp->ops.saved_func != fprobe_kprobe_handler))
+ if (!fp || !fp->hlist_array)
return false;
return true;
}

/**
- * unregister_fprobe() - Unregister fprobe from ftrace
+ * unregister_fprobe() - Unregister fprobe.
* @fp: A fprobe data structure to be unregistered.
*
* Unregister fprobe (and remove ftrace hooks from the function entries).
@@ -366,23 +627,40 @@ bool fprobe_is_registered(struct fprobe *fp)
*/
int unregister_fprobe(struct fprobe *fp)
{
- int ret;
+ struct fprobe_hlist *hlist_array;
+ unsigned long *addrs = NULL;
+ int ret = 0, i, count;

- if (!fprobe_is_registered(fp))
- return -EINVAL;
+ mutex_lock(&fprobe_mutex);
+ if (!fp || !is_fprobe_still_exist(fp)) {
+ ret = -EINVAL;
+ goto out;
+ }

- if (!IS_ERR_OR_NULL(fp->rethook))
- rethook_stop(fp->rethook);
+ hlist_array = fp->hlist_array;
+ addrs = kcalloc(hlist_array->size, sizeof(unsigned long), GFP_KERNEL);
+ if (!addrs) {
+ ret = -ENOMEM; /* TODO: Fallback to one-by-one loop */
+ goto out;
+ }

- ret = unregister_ftrace_function(&fp->ops);
- if (ret < 0)
- return ret;
+ /* Remove non-synonym ips from table and hash */
+ count = 0;
+ for (i = 0; i < hlist_array->size; i++) {
+ if (!delete_fprobe_node(&hlist_array->array[i]))
+ addrs[count++] = hlist_array->array[i].addr;
+ }
+ del_fprobe_hash(fp);

- if (!IS_ERR_OR_NULL(fp->rethook))
- rethook_free(fp->rethook);
+ fprobe_graph_remove_ips(addrs, count);

- ftrace_free_filter(&fp->ops);
+ kfree_rcu(hlist_array, rcu);
+ fp->hlist_array = NULL;

+out:
+ mutex_unlock(&fprobe_mutex);
+
+ kfree(addrs);
return ret;
}
EXPORT_SYMBOL_GPL(unregister_fprobe);
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index 271ce0caeec0..cf92111b5c79 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -17,10 +17,8 @@ static u32 rand1, entry_val, exit_val;
/* Use indirect calls to avoid inlining the target functions */
static u32 (*target)(u32 value);
static u32 (*target2)(u32 value);
-static u32 (*target_nest)(u32 value, u32 (*nest)(u32));
static unsigned long target_ip;
static unsigned long target2_ip;
-static unsigned long target_nest_ip;
static int entry_return_value;

static noinline u32 fprobe_selftest_target(u32 value)
@@ -33,11 +31,6 @@ static noinline u32 fprobe_selftest_target2(u32 value)
return (value / div_factor) + 1;
}

-static noinline u32 fprobe_selftest_nest_target(u32 value, u32 (*nest)(u32))
-{
- return nest(value + 2);
-}
-
static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
struct ftrace_regs *fregs, void *data)
@@ -79,22 +72,6 @@ static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
KUNIT_EXPECT_NULL(current_test, data);
}

-static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,
- unsigned long ret_ip,
- struct ftrace_regs *fregs, void *data)
-{
- KUNIT_EXPECT_FALSE(current_test, preemptible());
- return 0;
-}
-
-static notrace void nest_exit_handler(struct fprobe *fp, unsigned long ip,
- unsigned long ret_ip,
- struct ftrace_regs *fregs, void *data)
-{
- KUNIT_EXPECT_FALSE(current_test, preemptible());
- KUNIT_EXPECT_EQ(current_test, ip, target_nest_ip);
-}
-
/* Test entry only (no rethook) */
static void test_fprobe_entry(struct kunit *test)
{
@@ -191,25 +168,6 @@ static void test_fprobe_data(struct kunit *test)
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
}

-/* Test nr_maxactive */
-static void test_fprobe_nest(struct kunit *test)
-{
- static const char *syms[] = {"fprobe_selftest_target", "fprobe_selftest_nest_target"};
- struct fprobe fp = {
- .entry_handler = nest_entry_handler,
- .exit_handler = nest_exit_handler,
- .nr_maxactive = 1,
- };
-
- current_test = test;
- KUNIT_EXPECT_EQ(test, 0, register_fprobe_syms(&fp, syms, 2));
-
- target_nest(rand1, target);
- KUNIT_EXPECT_EQ(test, 1, fp.nmissed);
-
- KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
-}
-
static void test_fprobe_skip(struct kunit *test)
{
struct fprobe fp = {
@@ -247,10 +205,8 @@ static int fprobe_test_init(struct kunit *test)
rand1 = get_random_u32_above(div_factor);
target = fprobe_selftest_target;
target2 = fprobe_selftest_target2;
- target_nest = fprobe_selftest_nest_target;
target_ip = get_ftrace_location(target);
target2_ip = get_ftrace_location(target2);
- target_nest_ip = get_ftrace_location(target_nest);

return 0;
}
@@ -260,7 +216,6 @@ static struct kunit_case fprobe_testcases[] = {
KUNIT_CASE(test_fprobe),
KUNIT_CASE(test_fprobe_syms),
KUNIT_CASE(test_fprobe_data),
- KUNIT_CASE(test_fprobe_nest),
KUNIT_CASE(test_fprobe_skip),
{}
};
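
For readers following the hash-table part of this patch: the comment above find_first_fprobe_node() requires that nodes for the same address stay adjacent in a bucket, which insert_fprobe_node() achieves by inserting before the first match. The following is a minimal userspace sketch of that idea only. It uses a plain singly linked list instead of the kernel's RCU hlist, and `struct node`, `find_first()`, `insert_node()` and `count_run()` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for fprobe_hlist_node; no RCU, no hashing. */
struct node {
	unsigned long addr;
	int id;
	struct node *next;
};

/* Return the link pointing at the first node with @addr, or NULL if none. */
static struct node **find_first(struct node **head, unsigned long addr)
{
	struct node **pp;

	for (pp = head; *pp; pp = &(*pp)->next)
		if ((*pp)->addr == addr)
			return pp;
	return NULL;
}

/* Insert before the first node with the same addr, else at the list head. */
static void insert_node(struct node **head, struct node *n)
{
	struct node **pp;

	assert(n != NULL);
	pp = find_first(head, n->addr);
	if (!pp)
		pp = head;
	n->next = *pp;
	*pp = n;
}

/* Walk from the first match and count the contiguous run for @addr. */
static int count_run(struct node **head, unsigned long addr)
{
	struct node **pp = find_first(head, addr);
	struct node *n;
	int count = 0;

	for (n = pp ? *pp : NULL; n && n->addr == addr; n = n->next)
		count++;
	return count;
}
```

Because every insertion lands directly in front of the existing run, a reader that starts at the first match and stops at the first non-matching address visits every node for that address, which is the walk fprobe_entry() performs.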


2024-04-15 13:17:04

by Masami Hiramatsu

Subject: [PATCH v9 33/36] selftests: ftrace: Remove obsolete maxactive syntax check

From: Masami Hiramatsu (Google) <[email protected]>

Since the fprobe event does not support maxactive anymore, stop
testing the maxactive syntax error checking.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
.../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
index 61877d166451..c9425a34fae3 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/fprobe_syntax_errors.tc
@@ -16,9 +16,7 @@ aarch64)
REG=%r0 ;;
esac

-check_error 'f^100 vfs_read' # MAXACT_NO_KPROBE
-check_error 'f^1a111 vfs_read' # BAD_MAXACT
-check_error 'f^100000 vfs_read' # MAXACT_TOO_BIG
+check_error 'f^100 vfs_read' # BAD_MAXACT

check_error 'f ^non_exist_func' # BAD_PROBE_ADDR (enoent)
check_error 'f ^vfs_read+10' # BAD_PROBE_ADDR


2024-04-15 13:17:24

by Masami Hiramatsu

Subject: [PATCH v9 35/36] Documentation: probes: Update fprobe on function-graph tracer

From: Masami Hiramatsu (Google) <[email protected]>

Update fprobe documentation for the new fprobe on function-graph
tracer. This includes some behavior changes and pt_regs to
ftrace_regs interface change.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Update @fregs parameter explanation.
---
Documentation/trace/fprobe.rst | 42 ++++++++++++++++++++++++++--------------
1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/Documentation/trace/fprobe.rst b/Documentation/trace/fprobe.rst
index 196f52386aaa..f58bdc64504f 100644
--- a/Documentation/trace/fprobe.rst
+++ b/Documentation/trace/fprobe.rst
@@ -9,9 +9,10 @@ Fprobe - Function entry/exit probe
Introduction
============

-Fprobe is a function entry/exit probe mechanism based on ftrace.
-Instead of using ftrace full feature, if you only want to attach callbacks
-on function entry and exit, similar to the kprobes and kretprobes, you can
+Fprobe is a function entry/exit probe mechanism based on the function-graph
+tracer.
+Instead of tracing all functions, if you want to attach callbacks on specific
+function entry and exit, similar to the kprobes and kretprobes, you can
use fprobe. Compared with kprobes and kretprobes, fprobe gives faster
instrumentation for multiple functions with single handler. This document
describes how to use fprobe.
@@ -91,12 +92,14 @@ The prototype of the entry/exit callback function are as follows:

.. code-block:: c

- int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data);
+ int entry_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data);

- void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct pt_regs *regs, void *entry_data);
+ void exit_callback(struct fprobe *fp, unsigned long entry_ip, unsigned long ret_ip, struct ftrace_regs *fregs, void *entry_data);

-Note that the @entry_ip is saved at function entry and passed to exit handler.
-If the entry callback function returns !0, the corresponding exit callback will be cancelled.
+Note that the @entry_ip is saved at function entry and passed to exit
+handler.
+If the entry callback function returns !0, the corresponding exit callback
+will be cancelled.

@fp
This is the address of `fprobe` data structure related to this handler.
@@ -112,12 +115,10 @@ If the entry callback function returns !0, the corresponding exit callback will
This is the return address that the traced function will return to,
somewhere in the caller. This can be used at both entry and exit.

-@regs
- This is the `pt_regs` data structure at the entry and exit. Note that
- the instruction pointer of @regs may be different from the @entry_ip
- in the entry_handler. If you need traced instruction pointer, you need
- to use @entry_ip. On the other hand, in the exit_handler, the instruction
- pointer of @regs is set to the current return address.
+@fregs
+ This is the `ftrace_regs` data structure at the entry and exit. This
+ includes the function parameters, or the return values. So users can
+ access those values via appropriate `ftrace_regs_*` APIs.

@entry_data
This is a local storage to share the data between entry and exit handlers.
@@ -125,6 +126,17 @@ If the entry callback function returns !0, the corresponding exit callback will
and `entry_data_size` field when registering the fprobe, the storage is
allocated and passed to both `entry_handler` and `exit_handler`.

+Entry data size and exit handlers on the same function
+======================================================
+
+Since the entry data is passed via the per-task stack, which has a limited size,
+the entry data size per probe is limited to `15 * sizeof(long)`. Note also that
+when different fprobes probe the same function, this limit becomes smaller.
+The entry data size is aligned to `sizeof(long)` and each fprobe which has an
+exit handler uses an additional `sizeof(long)` of space on the stack, so you
+should keep the number of fprobes on the same function as small as
+possible.
+
Share the callbacks with kprobes
================================

@@ -165,8 +177,8 @@ This counter counts up when;
- fprobe fails to take ftrace_recursion lock. This usually means that a function
which is traced by other ftrace users is called from the entry_handler.

- - fprobe fails to setup the function exit because of the shortage of rethook
- (the shadow stack for hooking the function return.)
+ - fprobe fails to setup the function exit because of failing to allocate the
+ data buffer from the per-task shadow stack.

The `fprobe::nmissed` field counts up in both cases. Therefore, the former
skips both of entry and exit callback and the latter skips the exit
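
The entry data accounting described in the documentation hunk above can be
sketched in plain C. This is only a user-space model of the stated rules, not
the kernel implementation: MODEL_MAX_ENTRY_DATA_WORDS and both function names
are hypothetical, and the one-word cost per exit handler is my reading of the
text.

```c
#include <stddef.h>

/* Hypothetical constant mirroring the documented 15 * sizeof(long) limit. */
#define MODEL_MAX_ENTRY_DATA_WORDS 15

/* Round a byte count up to a whole number of longs. */
static size_t model_entry_data_words(size_t size_bytes)
{
	return (size_bytes + sizeof(long) - 1) / sizeof(long);
}

/*
 * Number of long-sized words one fprobe would consume on the per-task
 * stack under the documented rules, or -1 if its entry data exceeds the
 * per-probe limit. An fprobe with an exit handler costs one extra word.
 */
static long model_fprobe_stack_words(size_t entry_data_size, int has_exit_handler)
{
	size_t words = model_entry_data_words(entry_data_size);

	if (words > MODEL_MAX_ENTRY_DATA_WORDS)
		return -1;
	if (has_exit_handler)
		words += 1;
	return (long)words;
}
```

Summing this value over all fprobes attached to one function gives a rough
idea of why the effective limit shrinks as more probes share that function.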


2024-04-15 13:17:29

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v9 36/36] fgraph: Skip recording calltime/rettime if it is not needed

From: Masami Hiramatsu (Google) <[email protected]>

Skip recording calltime and rettime if the fgraph_ops does not need them.
This is a performance optimization for fprobe. Since fprobe users (e.g.
eBPF, ftrace) do not use these entries, recording the timestamps in fgraph
is just overhead. So introduce the skip_timestamp flag and, if all
fgraph_ops set this flag, skip recording calltime and rettime.
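
The "all fgraph_ops set this flag" rule can be sketched as a small user-space
model. The names below (ts_ops_model, ts_stub_model,
model_all_skip_timestamp) are stand-ins invented for illustration; the real
logic lives in update_fgraph_skip_timestamp() in the diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct fgraph_ops, reduced to the one flag of interest. */
struct ts_ops_model {
	bool skip_timestamp;
};

/* Stand-in for fgraph_stub, which marks an unused array slot. */
static const struct ts_ops_model ts_stub_model = { .skip_timestamp = true };

/*
 * Return true only if every registered (non-stub) entry asks to skip
 * timestamps. A single entry that needs timestamps turns recording on.
 */
static bool model_all_skip_timestamp(const struct ts_ops_model *array[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		const struct ts_ops_model *ops = array[i];

		if (ops == &ts_stub_model)
			continue;
		if (!ops->skip_timestamp)
			return false;
	}
	return true;
}
```

In this model the result has to be recomputed whenever the set of registered
entries changes, which matches the calls added to the register and unregister
paths in the patch.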

Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v9:
- Newly added.
---
include/linux/ftrace.h | 2 ++
kernel/trace/fgraph.c | 46 +++++++++++++++++++++++++++++++++++++++-------
kernel/trace/fprobe.c | 1 +
3 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index d845a80a3d56..06fc7cbef897 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1156,6 +1156,8 @@ struct fgraph_ops {
struct ftrace_ops ops; /* for the hash lists */
void *private;
int idx;
+ /* If skip_timestamp is true, this does not record timestamps. */
+ bool skip_timestamp;
};

void *fgraph_reserve_data(int idx, int size_bytes);
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 7556fbbae323..a5722537bb79 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -131,6 +131,7 @@ DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
int ftrace_graph_active;

static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
+static bool fgraph_skip_timestamp;

/* LRU index table for fgraph_array */
static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
@@ -475,7 +476,7 @@ void ftrace_graph_stop(void)
static int
ftrace_push_return_trace(unsigned long ret, unsigned long func,
unsigned long frame_pointer, unsigned long *retp,
- int fgraph_idx)
+ int fgraph_idx, bool skip_ts)
{
struct ftrace_ret_stack *ret_stack;
unsigned long long calltime;
@@ -498,8 +499,12 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
if (ret_stack && ret_stack->func == func &&
get_fgraph_type(current, index + FGRAPH_RET_INDEX) == FGRAPH_TYPE_BITMAP &&
- !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx))
+ !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx)) {
+ /* If previous one skips calltime, update it. */
+ if (!skip_ts && !ret_stack->calltime)
+ ret_stack->calltime = trace_clock_local();
return index + FGRAPH_RET_INDEX;
+ }

val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX;

@@ -517,7 +522,10 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
return -EBUSY;
}

- calltime = trace_clock_local();
+ if (skip_ts)
+ calltime = 0LL;
+ else
+ calltime = trace_clock_local();

index = READ_ONCE(current->curr_ret_stack);
ret_stack = RET_STACK(current, index);
@@ -601,7 +609,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned long func,
trace.func = func;
trace.depth = ++current->curr_ret_depth;

- index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0);
+ index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0,
+ fgraph_skip_timestamp);
if (index < 0)
goto out;

@@ -654,7 +663,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
return -ENODEV;

/* Use start for the distance to ret_stack (skipping over reserve) */
- index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx);
+ index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx,
+ gops->skip_timestamp);
if (index < 0)
return index;
type = get_fgraph_type(current, index);
@@ -732,6 +742,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
*ret = ret_stack->ret;
trace->func = ret_stack->func;
trace->calltime = ret_stack->calltime;
+ trace->rettime = 0;
trace->overrun = atomic_read(&current->trace_overrun);
trace->depth = current->curr_ret_depth;
/*
@@ -792,7 +803,6 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
return (unsigned long)panic;
}

- trace.rettime = trace_clock_local();
if (fregs)
ftrace_regs_set_instruction_pointer(fregs, ret);

@@ -808,6 +818,8 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
continue;
if (gops == &fgraph_stub)
continue;
+ if (!trace.rettime && !gops->skip_timestamp)
+ trace.rettime = trace_clock_local();

gops->retfunc(&trace, gops, fregs);
}
@@ -1185,6 +1197,24 @@ static void init_task_vars(int idx)
read_unlock(&tasklist_lock);
}

+static void update_fgraph_skip_timestamp(void)
+{
+ int i;
+
+ for (i = 0; i < FGRAPH_ARRAY_SIZE; i++) {
+ struct fgraph_ops *gops = fgraph_array[i];
+
+ if (gops == &fgraph_stub)
+ continue;
+
+ if (!gops->skip_timestamp) {
+ fgraph_skip_timestamp = false;
+ return;
+ }
+ }
+ fgraph_skip_timestamp = true;
+}
+
int register_ftrace_graph(struct fgraph_ops *gops)
{
int command = 0;
@@ -1219,6 +1249,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
gops->idx = i;

ftrace_graph_active++;
+ update_fgraph_skip_timestamp();

if (ftrace_graph_active == 1) {
register_pm_notifier(&ftrace_suspend_notifier);
@@ -1242,6 +1273,7 @@ int register_ftrace_graph(struct fgraph_ops *gops)
fgraph_array[i] = &fgraph_stub;
ftrace_graph_active--;
fgraph_lru_release_index(i);
+ update_fgraph_skip_timestamp();
}
out:
mutex_unlock(&ftrace_lock);
@@ -1265,8 +1297,8 @@ void unregister_ftrace_graph(struct fgraph_ops *gops)
goto out;

fgraph_array[gops->idx] = &fgraph_stub;
-
ftrace_graph_active--;
+ update_fgraph_skip_timestamp();

if (!ftrace_graph_active)
command = FTRACE_STOP_FUNC_RET;
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index afa52d9816cf..24bb8edec8a3 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -345,6 +345,7 @@ NOKPROBE_SYMBOL(fprobe_return);
static struct fgraph_ops fprobe_graph_ops = {
.entryfunc = fprobe_entry,
.retfunc = fprobe_return,
+ .skip_timestamp = true,
};
static int fprobe_graph_active;



2024-04-15 13:23:06

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v9 23/36] function_graph: Pass ftrace_regs to retfunc

From: Masami Hiramatsu (Google) <[email protected]>

Pass ftrace_regs to fgraph_ops::retfunc(). If ftrace_regs is not
available, NULL is passed instead. The user callback function can access
some registers (including the return address) via this ftrace_regs.
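
Because the new argument may be NULL, a return callback has to check it before
use. The following is a minimal user-space sketch of that contract;
ret_regs_model and both functions are hypothetical stand-ins, not the kernel's
arch-specific struct ftrace_regs or its accessors.

```c
#include <stddef.h>

/* Stand-in for struct ftrace_regs, reduced to two fields. */
struct ret_regs_model {
	unsigned long ip;
	unsigned long retval;
};

/*
 * Hypothetical retfunc-style callback: report the return address seen
 * through the registers, or 0 when no registers are available.
 */
static unsigned long model_retfunc(const struct ret_regs_model *fregs)
{
	if (!fregs)
		return 0;
	return fregs->ip;
}

/*
 * Model of the caller: when registers exist, store the return address
 * in the instruction pointer before invoking the callback.
 */
static unsigned long model_return_to_handler(struct ret_regs_model *fregs,
					     unsigned long ret)
{
	if (fregs)
		fregs->ip = ret;
	return model_retfunc(fregs);
}
```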

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v8:
- Pass ftrace_regs to retfunc, instead of adding retregfunc.
Changes in v6:
- update to use ftrace_regs_get_return_value() because of reordering
patches.
Changes in v3:
- Update for new multiple fgraph.
- Save the return address to instruction pointer in ftrace_regs.
---
include/linux/ftrace.h | 3 ++-
kernel/trace/fgraph.c | 14 ++++++++++----
kernel/trace/ftrace.c | 3 ++-
kernel/trace/trace.h | 3 ++-
kernel/trace/trace_functions_graph.c | 7 ++++---
kernel/trace/trace_irqsoff.c | 3 ++-
kernel/trace/trace_sched_wakeup.c | 3 ++-
kernel/trace/trace_selftest.c | 3 ++-
8 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 0255b95f2d61..54e60dbdb657 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1067,7 +1067,8 @@ struct fgraph_ops;

/* Type of the callback handlers for tracing function graph*/
typedef void (*trace_func_graph_ret_t)(struct ftrace_graph_ret *,
- struct fgraph_ops *); /* return */
+ struct fgraph_ops *,
+ struct ftrace_regs *); /* return */
typedef int (*trace_func_graph_ent_t)(struct ftrace_graph_ent *,
struct fgraph_ops *,
struct ftrace_regs *); /* entry */
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 33be5af4801c..7556fbbae323 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -255,7 +255,8 @@ static int entry_run(struct ftrace_graph_ent *trace, struct fgraph_ops *ops,
}

/* ftrace_graph_return set to this to tell some archs to run function graph */
-static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops)
+static void return_run(struct ftrace_graph_ret *trace, struct fgraph_ops *ops,
+ struct ftrace_regs *fregs)
{
}

@@ -447,7 +448,8 @@ int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace,
}

static void ftrace_graph_ret_stub(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
}

@@ -791,6 +793,9 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
}

trace.rettime = trace_clock_local();
+ if (fregs)
+ ftrace_regs_set_instruction_pointer(fregs, ret);
+
#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
trace.retval = ftrace_regs_get_return_value(fregs);
#endif
@@ -804,7 +809,7 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
if (gops == &fgraph_stub)
continue;

- gops->retfunc(&trace, gops);
+ gops->retfunc(&trace, gops, fregs);
}

/*
@@ -968,7 +973,8 @@ void ftrace_graph_sleep_time_control(bool enable)
* Simply points to ftrace_stub, but with the proper protocol.
* Defined by the linker script in linux/vmlinux.lds.h
*/
-void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
+void ftrace_stub_graph(struct ftrace_graph_ret *trace, struct fgraph_ops *gops,
+ struct ftrace_regs *fregs);

/* The callbacks that hook a function */
trace_func_graph_ret_t ftrace_graph_return = ftrace_stub_graph;
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 5377a0b22ec9..e869258efc52 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -835,7 +835,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
}

static void profile_graph_return(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct ftrace_ret_stack *ret_stack;
struct ftrace_profile_stat *stat;
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8221b6febb51..81cb2a90cbda 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -681,7 +681,8 @@ void trace_latency_header(struct seq_file *m);
void trace_default_header(struct seq_file *m);
void print_trace_header(struct seq_file *m, struct trace_iterator *iter);

-void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops);
+void trace_graph_return(struct ftrace_graph_ret *trace, struct fgraph_ops *gops,
+ struct ftrace_regs *fregs);
int trace_graph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
struct ftrace_regs *fregs);

diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index b9785fc919c9..241407000109 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -240,7 +240,7 @@ void __trace_graph_return(struct trace_array *tr,
}

void trace_graph_return(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops, struct ftrace_regs *fregs)
{
unsigned long *task_var = fgraph_get_task_var(gops);
struct trace_array *tr = gops->private;
@@ -270,7 +270,8 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
}

static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
ftrace_graph_addr_finish(gops, trace);

@@ -283,7 +284,7 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
(trace->rettime - trace->calltime < tracing_thresh))
return;
else
- trace_graph_return(trace, gops);
+ trace_graph_return(trace, gops, fregs);
}

static struct fgraph_ops funcgraph_ops = {
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index ad739d76fc86..504de7a05498 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -208,7 +208,8 @@ static int irqsoff_graph_entry(struct ftrace_graph_ent *trace,
}

static void irqsoff_graph_return(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct trace_array *tr = irqsoff_trace;
struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_sched_wakeup.c b/kernel/trace/trace_sched_wakeup.c
index 23360a2700de..9ffbd9326898 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -144,7 +144,8 @@ static int wakeup_graph_entry(struct ftrace_graph_ent *trace,
}

static void wakeup_graph_return(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct trace_array *tr = wakeup_trace;
struct trace_array_cpu *data;
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 5edbf09844d9..4e6dff831407 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -807,7 +807,8 @@ static __init int store_entry(struct ftrace_graph_ent *trace,
}

static __init void store_return(struct ftrace_graph_ret *trace,
- struct fgraph_ops *gops)
+ struct fgraph_ops *gops,
+ struct ftrace_regs *fregs)
{
struct fgraph_fixture *fixture = container_of(gops, struct fgraph_fixture, gops);
const char *type = fixture->store_type_name;


2024-04-15 13:24:13

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v9 29/36] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled

From: Masami Hiramatsu (Google) <[email protected]>

Enable the kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
converted from ftrace_regs by ftrace_partial_regs(), thus some registers
may always return 0. But it should be enough for function entry (access
arguments) and exit (access return value).
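
Why some registers read as 0 can be illustrated with a simplified model of a
partial conversion. Everything here is an assumption made for illustration:
the struct layouts and model_partial_regs() are hypothetical, and the real
ftrace_partial_regs() is architecture specific and may behave differently.

```c
#include <string.h>

/* Stand-in for ftrace_regs: only what is valid at a function boundary. */
struct partial_fregs {
	unsigned long args[2];
	unsigned long retval;
	unsigned long ip;
	unsigned long sp;
};

/* Stand-in for pt_regs: a superset with registers that are not saved. */
struct partial_ptregs {
	unsigned long args[2];
	unsigned long retval;
	unsigned long ip;
	unsigned long sp;
	unsigned long flags;
	unsigned long scratch[4];
};

/*
 * Fill a caller-provided buffer from the smaller register set. Fields
 * that have no source are left as 0, so readers of those registers
 * always see 0.
 */
static struct partial_ptregs *model_partial_regs(const struct partial_fregs *fregs,
						 struct partial_ptregs *regs)
{
	memset(regs, 0, sizeof(*regs));
	regs->args[0] = fregs->args[0];
	regs->args[1] = fregs->args[1];
	regs->retval = fregs->retval;
	regs->ip = fregs->ip;
	regs->sp = fregs->sp;
	return regs;
}
```

The patch passes a per-CPU pt_regs buffer as the destination, which plays the
role of the caller-provided buffer in this sketch.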

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Florent Revest <[email protected]>
---
Changes from previous series: NOTHING, Update against the new series.
---
kernel/trace/bpf_trace.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index e51a6ef87167..57b1174030c9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
fs_initcall(bpf_event_init);
#endif /* CONFIG_MODULES */

-#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
+#ifdef CONFIG_FPROBE
struct bpf_kprobe_multi_link {
struct bpf_link link;
struct fprobe fp;
@@ -2600,6 +2600,8 @@ struct user_syms {
char *buf;
};

+static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);
+
static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt)
{
unsigned long __user usymbol;
@@ -2792,13 +2794,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)

static int
kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
- unsigned long entry_ip, struct pt_regs *regs)
+ unsigned long entry_ip, struct ftrace_regs *fregs)
{
struct bpf_kprobe_multi_run_ctx run_ctx = {
.link = link,
.entry_ip = entry_ip,
};
struct bpf_run_ctx *old_run_ctx;
+ struct pt_regs *regs;
int err;

if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
@@ -2809,6 +2812,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,

migrate_disable();
rcu_read_lock();
+ regs = ftrace_partial_regs(fregs, this_cpu_ptr(&bpf_kprobe_multi_pt_regs));
old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
err = bpf_prog_run(link->link.prog, regs);
bpf_reset_run_ctx(old_run_ctx);
@@ -2826,13 +2830,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
void *data)
{
struct bpf_kprobe_multi_link *link;
- struct pt_regs *regs = ftrace_get_regs(fregs);
-
- if (!regs)
- return 0;

link = container_of(fp, struct bpf_kprobe_multi_link, fp);
- kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+ kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
return 0;
}

@@ -2842,13 +2842,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
void *data)
{
struct bpf_kprobe_multi_link *link;
- struct pt_regs *regs = ftrace_get_regs(fregs);
-
- if (!regs)
- return;

link = container_of(fp, struct bpf_kprobe_multi_link, fp);
- kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
+ kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
}

static int symbols_cmp_r(const void *a, const void *b, const void *priv)
@@ -3107,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
kvfree(cookies);
return err;
}
-#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
+#else /* !CONFIG_FPROBE */
int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
{
return -EOPNOTSUPP;


2024-04-15 13:24:14

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v9 32/36] tracing/fprobe: Remove nr_maxactive from fprobe

From: Masami Hiramatsu (Google) <[email protected]>

Remove the deprecated fprobe::nr_maxactive. As a result, fprobe events now
reject the maxactive number.
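
The resulting prefix parsing can be sketched as a standalone function. This
mirrors the replacement logic in __trace_fprobe_create() shown in the diff
below, but model_parse_event_prefix() and its -1 error value are hypothetical
simplifications; the kernel code reports BAD_MAXACT through the probe log
instead.

```c
#include <stddef.h>

/*
 * Parse the first word of an fprobe event definition such as "f",
 * "f:myevent" or "f:group/myevent". Anything other than ':' after the
 * type character (for example a maxactive number) is rejected.
 *
 * On success returns 0 and sets *event to the text after ':' or to NULL
 * if no event name was given. Returns -1 on a malformed prefix.
 */
static int model_parse_event_prefix(const char *arg0, const char **event)
{
	*event = NULL;

	if (arg0[1] != '\0') {
		if (arg0[1] != ':')
			return -1;
		*event = &arg0[2];
	}
	return 0;
}
```

With this model, "f:myevent" yields the event name "myevent", plain "f" yields
no event name, and "f10:myevent" is rejected.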

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v2:
- Newly added.
---
include/linux/fprobe.h | 2 --
kernel/trace/trace_fprobe.c | 44 ++++++-------------------------------------
2 files changed, 6 insertions(+), 40 deletions(-)

diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index 2d06bbd99601..a86b3e4df2a0 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -54,7 +54,6 @@ struct fprobe_hlist {
* @nmissed: The counter for missing events.
* @flags: The status flag.
* @entry_data_size: The private data storage size.
- * @nr_maxactive: The max number of active functions. (*deprecated)
* @entry_handler: The callback function for function entry.
* @exit_handler: The callback function for function exit.
* @hlist_array: The fprobe_hlist for fprobe search from IP hash table.
@@ -63,7 +62,6 @@ struct fprobe {
unsigned long nmissed;
unsigned int flags;
size_t entry_data_size;
- int nr_maxactive;

fprobe_entry_cb entry_handler;
fprobe_exit_cb exit_handler;
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index 86cd6a8c806a..20ef5cd5d419 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -422,7 +422,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group,
const char *event,
const char *symbol,
struct tracepoint *tpoint,
- int maxactive,
int nargs, bool is_return)
{
struct trace_fprobe *tf;
@@ -442,7 +441,6 @@ static struct trace_fprobe *alloc_trace_fprobe(const char *group,
tf->fp.entry_handler = fentry_dispatcher;

tf->tpoint = tpoint;
- tf->fp.nr_maxactive = maxactive;

ret = trace_probe_init(&tf->tp, event, group, false, nargs);
if (ret < 0)
@@ -1021,12 +1019,11 @@ static int __trace_fprobe_create(int argc, const char *argv[])
* FETCHARG:TYPE : use TYPE instead of unsigned long.
*/
struct trace_fprobe *tf = NULL;
- int i, len, new_argc = 0, ret = 0;
+ int i, new_argc = 0, ret = 0;
bool is_return = false;
char *symbol = NULL;
const char *event = NULL, *group = FPROBE_EVENT_SYSTEM;
const char **new_argv = NULL;
- int maxactive = 0;
char buf[MAX_EVENT_NAME_LEN];
char gbuf[MAX_EVENT_NAME_LEN];
char sbuf[KSYM_NAME_LEN];
@@ -1048,33 +1045,13 @@ static int __trace_fprobe_create(int argc, const char *argv[])

trace_probe_log_init("trace_fprobe", argc, argv);

- event = strchr(&argv[0][1], ':');
- if (event)
- event++;
-
- if (isdigit(argv[0][1])) {
- if (event)
- len = event - &argv[0][1] - 1;
- else
- len = strlen(&argv[0][1]);
- if (len > MAX_EVENT_NAME_LEN - 1) {
- trace_probe_log_err(1, BAD_MAXACT);
- goto parse_error;
- }
- memcpy(buf, &argv[0][1], len);
- buf[len] = '\0';
- ret = kstrtouint(buf, 0, &maxactive);
- if (ret || !maxactive) {
+ if (argv[0][1] != '\0') {
+ if (argv[0][1] != ':') {
+ trace_probe_log_set_index(0);
trace_probe_log_err(1, BAD_MAXACT);
goto parse_error;
}
- /* fprobe rethook instances are iterated over via a list. The
- * maximum should stay reasonable.
- */
- if (maxactive > RETHOOK_MAXACTIVE_MAX) {
- trace_probe_log_err(1, MAXACT_TOO_BIG);
- goto parse_error;
- }
+ event = &argv[0][2];
}

trace_probe_log_set_index(1);
@@ -1084,12 +1061,6 @@ static int __trace_fprobe_create(int argc, const char *argv[])
if (ret < 0)
goto parse_error;

- if (!is_return && maxactive) {
- trace_probe_log_set_index(0);
- trace_probe_log_err(1, BAD_MAXACT_TYPE);
- goto parse_error;
- }
-
trace_probe_log_set_index(0);
if (event) {
ret = traceprobe_parse_event_name(&event, &group, gbuf,
@@ -1147,8 +1118,7 @@ static int __trace_fprobe_create(int argc, const char *argv[])
goto out;

/* setup a probe */
- tf = alloc_trace_fprobe(group, event, symbol, tpoint, maxactive,
- argc, is_return);
+ tf = alloc_trace_fprobe(group, event, symbol, tpoint, argc, is_return);
if (IS_ERR(tf)) {
ret = PTR_ERR(tf);
/* This must return -ENOMEM, else there is a bug */
@@ -1230,8 +1200,6 @@ static int trace_fprobe_show(struct seq_file *m, struct dyn_event *ev)
seq_putc(m, 't');
else
seq_putc(m, 'f');
- if (trace_fprobe_is_return(tf) && tf->fp.nr_maxactive)
- seq_printf(m, "%d", tf->fp.nr_maxactive);
seq_printf(m, ":%s/%s", trace_probe_group_name(&tf->tp),
trace_probe_name(&tf->tp));



2024-04-15 13:24:55

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v9 25/36] fprobe: Use ftrace_regs in fprobe exit handler

From: Masami Hiramatsu (Google) <[email protected]>

Change the fprobe exit handler to use the ftrace_regs structure instead of
pt_regs. This also introduces HAVE_PT_REGS_TO_FTRACE_REGS_CAST, which means
that the memory layout of ftrace_regs is identical to that of pt_regs so
that a pt_regs pointer can be cast to ftrace_regs. Fprobe gains a new
dependency on it.
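
The cast that this option permits can be sketched with two stand-in types.
This is a user-space illustration under the assumption that ftrace_regs simply
wraps pt_regs; cast_ptregs, cast_fregs and the helper functions are
hypothetical names, not kernel definitions.

```c
#include <assert.h>

/* Stand-in for pt_regs. */
struct cast_ptregs {
	unsigned long ax;
	unsigned long ip;
};

/* Stand-in for ftrace_regs on an architecture where it wraps pt_regs. */
struct cast_fregs {
	struct cast_ptregs regs;
};

/* Same kind of compile-time guard as the one the patch adds to ftrace.h. */
static_assert(sizeof(struct cast_ptregs) == sizeof(struct cast_fregs),
	      "layouts must match for the cast to be valid");

/* Accessor in the style of ftrace_regs_get_return_value(). */
static unsigned long model_get_return_value(const struct cast_fregs *fregs)
{
	return fregs->regs.ax;
}

/* A handler that receives pt_regs and forwards it as ftrace_regs. */
static unsigned long model_exit_handler(struct cast_ptregs *regs)
{
	struct cast_fregs *fregs = (struct cast_fregs *)regs;

	return model_get_return_value(fregs);
}
```

The size check only guards against an obvious mismatch. It does not prove that
the field layouts agree, which is why the option is selected per architecture.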

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
Changes in v3:
- Use ftrace_regs_get_return_value()
Changes from previous series: NOTHING, just forward ported.
---
arch/loongarch/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/x86/Kconfig | 1 +
include/linux/fprobe.h | 2 +-
include/linux/ftrace.h | 6 ++++++
kernel/trace/Kconfig | 8 ++++++++
kernel/trace/bpf_trace.c | 6 +++++-
kernel/trace/fprobe.c | 3 ++-
kernel/trace/trace_fprobe.c | 6 +++++-
lib/test_fprobe.c | 6 +++---
samples/fprobe/fprobe_example.c | 2 +-
11 files changed, 34 insertions(+), 8 deletions(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index da81053bfa6f..dbcba61b6acf 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -114,6 +114,7 @@ config LOONGARCH
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+ select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 1ed25b72eb47..ce85b3165065 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -170,6 +170,7 @@ config S390
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_ARGS
+ select HAVE_PT_REGS_TO_FTRACE_REGS_CAST
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_EBPF_JIT if HAVE_MARCH_Z196_FEATURES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 96c567714c6b..188edde1db4d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -214,6 +214,7 @@ config X86
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_DYNAMIC_FTRACE_WITH_ARGS if X86_64
+ select HAVE_PT_REGS_TO_FTRACE_REGS_CAST if X86_64
select HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
select HAVE_SAMPLE_FTRACE_DIRECT if X86_64
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI if X86_64
diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h
index ca64ee5e45d2..ef609bcca0f9 100644
--- a/include/linux/fprobe.h
+++ b/include/linux/fprobe.h
@@ -14,7 +14,7 @@ typedef int (*fprobe_entry_cb)(struct fprobe *fp, unsigned long entry_ip,
void *entry_data);

typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *regs,
void *entry_data);

/**
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 54e60dbdb657..2b35f7d851ca 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -163,6 +163,12 @@ struct ftrace_regs {
#define ftrace_regs_set_instruction_pointer(fregs, ip) do { } while (0)
#endif /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS */

+#ifdef CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+
+static_assert(sizeof(struct pt_regs) == sizeof(struct ftrace_regs));
+
+#endif /* CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
static __always_inline struct pt_regs *ftrace_get_regs(struct ftrace_regs *fregs)
{
if (!fregs)
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 0aba53b7b0be..d818ba3ff943 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -57,6 +57,13 @@ config HAVE_DYNAMIC_FTRACE_WITH_ARGS
This allows for use of ftrace_regs_get_argument() and
ftrace_regs_get_stack_pointer().

+config HAVE_PT_REGS_TO_FTRACE_REGS_CAST
+ bool
+ help
+ If this is set, the memory layout of the ftrace_regs data structure
+ is the same as that of pt_regs, so a pt_regs pointer can be cast
+ to ftrace_regs.
+
config HAVE_DYNAMIC_FTRACE_NO_PATCHABLE
bool
help
@@ -288,6 +295,7 @@ config FPROBE
bool "Kernel Function Probe (fprobe)"
depends on FUNCTION_TRACER
depends on DYNAMIC_FTRACE_WITH_REGS || DYNAMIC_FTRACE_WITH_ARGS
+ depends on HAVE_PT_REGS_TO_FTRACE_REGS_CAST || !HAVE_DYNAMIC_FTRACE_WITH_ARGS
depends on HAVE_RETHOOK
select RETHOOK
default n
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 7837cf4e39d9..e51a6ef87167 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2838,10 +2838,14 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,

static void
kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *data)
{
struct bpf_kprobe_multi_link *link;
+ struct pt_regs *regs = ftrace_get_regs(fregs);
+
+ if (!regs)
+ return;

link = container_of(fp, struct bpf_kprobe_multi_link, fp);
kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index 3d3789283873..90a3c8e2bbdf 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c
@@ -124,6 +124,7 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
{
struct fprobe *fp = (struct fprobe *)data;
struct fprobe_rethook_node *fpr;
+ struct ftrace_regs *fregs = (struct ftrace_regs *)regs;
int bit;

if (!fp || fprobe_disabled(fp))
@@ -141,7 +142,7 @@ static void fprobe_exit_handler(struct rethook_node *rh, void *data,
return;
}

- fp->exit_handler(fp, fpr->entry_ip, ret_ip, regs,
+ fp->exit_handler(fp, fpr->entry_ip, ret_ip, fregs,
fp->entry_data_size ? (void *)fpr->data : NULL);
ftrace_test_recursion_unlock(bit);
}
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index b2c20d4fdfd7..273cdf3cf70c 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
@@ -359,10 +359,14 @@ static int fentry_dispatcher(struct fprobe *fp, unsigned long entry_ip,
NOKPROBE_SYMBOL(fentry_dispatcher);

static void fexit_dispatcher(struct fprobe *fp, unsigned long entry_ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *fregs,
void *entry_data)
{
struct trace_fprobe *tf = container_of(fp, struct trace_fprobe, fp);
+ struct pt_regs *regs = ftrace_get_regs(fregs);
+
+ if (!regs)
+ return;

if (trace_probe_test_flag(&tf->tp, TP_FLAG_TRACE))
fexit_trace_func(tf, entry_ip, ret_ip, regs, entry_data);
diff --git a/lib/test_fprobe.c b/lib/test_fprobe.c
index ff607babba18..271ce0caeec0 100644
--- a/lib/test_fprobe.c
+++ b/lib/test_fprobe.c
@@ -59,9 +59,9 @@ static notrace int fp_entry_handler(struct fprobe *fp, unsigned long ip,

static notrace void fp_exit_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
- struct pt_regs *regs, void *data)
+ struct ftrace_regs *fregs, void *data)
{
- unsigned long ret = regs_return_value(regs);
+ unsigned long ret = ftrace_regs_get_return_value(fregs);

KUNIT_EXPECT_FALSE(current_test, preemptible());
if (ip != target_ip) {
@@ -89,7 +89,7 @@ static notrace int nest_entry_handler(struct fprobe *fp, unsigned long ip,

static notrace void nest_exit_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
- struct pt_regs *regs, void *data)
+ struct ftrace_regs *fregs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
KUNIT_EXPECT_EQ(current_test, ip, target_nest_ip);
diff --git a/samples/fprobe/fprobe_example.c b/samples/fprobe/fprobe_example.c
index 1545a1aac616..d476d1f07538 100644
--- a/samples/fprobe/fprobe_example.c
+++ b/samples/fprobe/fprobe_example.c
@@ -67,7 +67,7 @@ static int sample_entry_handler(struct fprobe *fp, unsigned long ip,
}

static void sample_exit_handler(struct fprobe *fp, unsigned long ip,
- unsigned long ret_ip, struct pt_regs *regs,
+ unsigned long ret_ip, struct ftrace_regs *regs,
void *data)
{
unsigned long rip = ret_ip;


2024-04-15 13:31:34

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH v9 34/36] selftests/ftrace: Add a test case for repeating register/unregister fprobe

From: Masami Hiramatsu (Google) <[email protected]>

This test case repeatedly defines and undefines the fprobe dynamic event to
ensure that fprobe does not cause any issues with such operations.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
---
.../test.d/dynevent/add_remove_fprobe_repeat.tc | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc

diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
new file mode 100644
index 000000000000..b4ad09237e2a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
@@ -0,0 +1,19 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Generic dynamic event - Repeating add/remove fprobe events
+# requires: dynamic_events "f[:[<group>/][<event>]] <func-name>[%return] [<args>]":README
+
+echo 0 > events/enable
+echo > dynamic_events
+
+PLACE=$FUNCTION_FORK
+REPEAT_TIMES=64
+
+for i in `seq 1 $REPEAT_TIMES`; do
+ echo "f:myevent $PLACE" >> dynamic_events
+ grep -q myevent dynamic_events
+ test -d events/fprobes/myevent
+ echo > dynamic_events
+done
+
+clear_trace


2024-04-19 05:36:35

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

Hi Steve,

Can you review this series? In particular, [07/36] and [12/36] have changed
a lot from your original patches.

Thank you,

On Mon, 15 Apr 2024 21:48:59 +0900
"Masami Hiramatsu (Google)" <[email protected]> wrote:

> Hi,
>
> Here is the 9th version of the series to re-implement the fprobe on
> function-graph tracer. The previous version is;
>
> https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
>
> This version is ported on the latest kernel (v6.9-rc3 + probes/for-next)
> and fixed some bugs + performance optimization patch[36/36].
> - [12/36] Fix to clear fgraph_array entry in registration failure, also
> return -ENOSPC when fgraph_array is full.
> - [28/36] Add new store_fprobe_entry_data() for fprobe.
> - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation.
> - [36/36] Add new flag to skip timestamp recording.
>
> Overview
> --------
> This series makes 2 major changes: it enables multiple function-graph
> tracers on ftrace (e.g. function-graph on sub instances) and rewrites
> fprobe on top of this function-graph.
>
> The former changes were sent by Steven Rostedt 4 years ago (*). They
> allow users to configure the function-graph tracer (and other tracers
> based on function-graph) differently in each trace instance at the
> same time.
>
> (*) https://lore.kernel.org/all/[email protected]/
>
> The purposes of the latter change are:
>
> 1) Remove dependency of the rethook from fprobe so that we can reduce
> the return hook code and shadow stack.
>
> 2) Make 'ftrace_regs' the common trace interface for the function
> boundary.
>
> 1) Currently we have 2(or 3) different function return hook codes,
> the function-graph tracer and rethook (and legacy kretprobe).
> But since this is redundant and needs double maintenance cost,
> I would like to unify those. From the user's viewpoint, function-
> graph tracer is very useful to grasp the execution path. For this
> purpose, it is hard to use the rethook in the function-graph
> tracer, but the opposite is possible. (Strictly speaking, kretprobe
> can not use it because it requires 'pt_regs' for historical reasons.)
>
> 2) Now the fprobe provides the 'pt_regs' for its handler, but that is
> wrong for the function entry and exit. Moreover, depending on the
> architecture, there is no way to accurately reproduce 'pt_regs'
> outside of interrupt or exception handlers. This means fprobe should
> not use 'pt_regs' because it does not use such exceptions.
> (Conversely, kprobe should use 'pt_regs' because it is an abstract
> interface of the software breakpoint exception.)
>
> This series changes fprobe to use function-graph tracer for tracing
> function entry and exit, instead of mixture of ftrace and rethook.
> Unlike the rethook which is a per-task list of system-wide allocated
> nodes, the function graph's ret_stack is a per-task shadow stack.
> Thus it does not need to set 'nr_maxactive' (which is the number of
> pre-allocated nodes).
> Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'.
> Since eBPF multi_kprobe/multi_kretprobe events still use 'pt_regs' as
> their register interface, this changes it to convert 'ftrace_regs' to
> 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs',
> so users must access only registers for function parameters or
> return value.
>
> Design
> ------
> Instead of using ftrace's function entry hook directly, the new fprobe
> is built on top of the function-graph's entry and return callbacks
> with 'ftrace_regs'.
>
> Since the fprobe requires access to 'ftrace_regs', the architecture
> must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
> CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables calling the function-graph
> entry callback with 'ftrace_regs', and also
> CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
> return_to_handler.
>
> All fprobes share a single function-graph ops (that is, they share a
> common ftrace filter), similar to kprobe-on-ftrace. This needs another
> layer to find corresponding fprobe in the common function-graph
> callbacks, but has much better scalability, since the number of
> registered function-graph ops is limited.
>
> In the entry callback, the fprobe runs its entry_handler and saves the
> address of 'fprobe' on the function-graph's shadow stack as data. The
> return callback decodes the data to get the 'fprobe' address, and runs
> the exit_handler.
>
> The fprobe introduces two hash-tables, one is for entry callback which
> searches fprobes related to the given function address passed by entry
> callback. The other is for a return callback which checks if the given
> 'fprobe' data structure pointer is still valid. Note that it is
> possible to unregister fprobe before the return callback runs. Thus
> the address validation must be done before using it in the return
> callback.
>
> This series can be applied against the probes/for-next branch, which
> is based on v6.9-rc3.
>
> This series can also be found in the branch below.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (Google) (21):
> tracing: Add a comment about ftrace_regs definition
> tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
> x86: tracing: Add ftrace_regs definition in the header
> function_graph: Use a simple LRU for fgraph_array index number
> ftrace: Add multiple fgraph storage selftest
> function_graph: Pass ftrace_regs to entryfunc
> function_graph: Replace fgraph_ret_regs with ftrace_regs
> function_graph: Pass ftrace_regs to retfunc
> fprobe: Use ftrace_regs in fprobe entry handler
> fprobe: Use ftrace_regs in fprobe exit handler
> tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
> tracing: Add ftrace_fill_perf_regs() for perf event
> tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
> bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
> ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
> fprobe: Rewrite fprobe on function-graph tracer
> tracing/fprobe: Remove nr_maxactive from fprobe
> selftests: ftrace: Remove obsolate maxactive syntax check
> selftests/ftrace: Add a test case for repeating register/unregister fprobe
> Documentation: probes: Update fprobe on function-graph tracer
> fgraph: Skip recording calltime/rettime if it is not nneeded
>
> Steven Rostedt (VMware) (15):
> function_graph: Convert ret_stack to a series of longs
> fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
> function_graph: Add an array structure that will allow multiple callbacks
> function_graph: Allow multiple users to attach to function graph
> function_graph: Remove logic around ftrace_graph_entry and return
> ftrace/function_graph: Pass fgraph_ops to function graph callbacks
> ftrace: Allow function_graph tracer to be enabled in instances
> ftrace: Allow ftrace startup flags exist without dynamic ftrace
> function_graph: Have the instances use their own ftrace_ops for filtering
> function_graph: Add "task variables" per task for fgraph_ops
> function_graph: Move set_graph_function tests to shadow stack global var
> function_graph: Move graph depth stored data to shadow stack global var
> function_graph: Move graph notrace bit to shadow stack global var
> function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
> function_graph: Add selftest for passing local variables
>
>
> Documentation/trace/fprobe.rst | 42 +
> arch/arm64/Kconfig | 3
> arch/arm64/include/asm/ftrace.h | 47 +
> arch/arm64/kernel/asm-offsets.c | 12
> arch/arm64/kernel/entry-ftrace.S | 32 -
> arch/arm64/kernel/ftrace.c | 21
> arch/loongarch/Kconfig | 4
> arch/loongarch/include/asm/ftrace.h | 32 -
> arch/loongarch/kernel/asm-offsets.c | 12
> arch/loongarch/kernel/ftrace_dyn.c | 15
> arch/loongarch/kernel/mcount.S | 17
> arch/loongarch/kernel/mcount_dyn.S | 14
> arch/powerpc/Kconfig | 1
> arch/powerpc/include/asm/ftrace.h | 15
> arch/powerpc/kernel/trace/ftrace.c | 3
> arch/powerpc/kernel/trace/ftrace_64_pg.c | 10
> arch/riscv/Kconfig | 3
> arch/riscv/include/asm/ftrace.h | 21
> arch/riscv/kernel/ftrace.c | 15
> arch/riscv/kernel/mcount.S | 24
> arch/s390/Kconfig | 3
> arch/s390/include/asm/ftrace.h | 39 -
> arch/s390/kernel/asm-offsets.c | 6
> arch/s390/kernel/mcount.S | 9
> arch/x86/Kconfig | 4
> arch/x86/include/asm/ftrace.h | 43 -
> arch/x86/kernel/ftrace.c | 51 +
> arch/x86/kernel/ftrace_32.S | 15
> arch/x86/kernel/ftrace_64.S | 17
> include/linux/fprobe.h | 57 +
> include/linux/ftrace.h | 170 +++
> include/linux/sched.h | 2
> include/linux/trace_recursion.h | 39 -
> kernel/trace/Kconfig | 23
> kernel/trace/bpf_trace.c | 14
> kernel/trace/fgraph.c | 1005 ++++++++++++++++----
> kernel/trace/fprobe.c | 637 +++++++++----
> kernel/trace/ftrace.c | 13
> kernel/trace/ftrace_internal.h | 2
> kernel/trace/trace.h | 96 ++
> kernel/trace/trace_fprobe.c | 147 ++-
> kernel/trace/trace_functions.c | 8
> kernel/trace/trace_functions_graph.c | 98 +-
> kernel/trace/trace_irqsoff.c | 12
> kernel/trace/trace_probe_tmpl.h | 2
> kernel/trace/trace_sched_wakeup.c | 12
> kernel/trace/trace_selftest.c | 262 +++++
> lib/test_fprobe.c | 51 -
> samples/fprobe/fprobe_example.c | 4
> .../test.d/dynevent/add_remove_fprobe_repeat.tc | 19
> .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 4
> 51 files changed, 2325 insertions(+), 882 deletions(-)
> create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
>
> --
> Masami Hiramatsu (Google) <[email protected]>
>


--
Masami Hiramatsu (Google) <[email protected]>

2024-04-19 08:02:14

by Steven Rostedt

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Fri, 19 Apr 2024 14:36:18 +0900
Masami Hiramatsu (Google) <[email protected]> wrote:

> Hi Steve,
>
> Can you review this series? In particular, [07/36] and [12/36] have changed
> a lot from your original patches.

I haven't forgotten (just been a bit hectic).

Worst case, I'll review it tomorrow.

-- Steve



2024-04-20 03:53:20

by Steven Rostedt

Subject: Re: [PATCH v9 07/36] function_graph: Allow multiple users to attach to function graph

On Mon, 15 Apr 2024 21:50:20 +0900
"Masami Hiramatsu (Google)" <[email protected]> wrote:

> @@ -27,23 +28,157 @@
>
> #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
> #define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
> +
> +/*
> + * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
> + * is allocated on the task's ret_stack with indexes entry, then each
> + * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that returns
> + * non-zero, the index into the fgraph_array[] for that fgraph_ops is recorded
> + * on the indexes entry as a bit flag.
> + * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
> + * be found, the index to it is also added to the ret_stack along with the
> + * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
> + * called.
> + *
> + * The top of the ret_stack (when not empty) will always have a reference
> + * to the last ftrace_ret_stack saved. All references to the
> + * ftrace_ret_stack has the format of:
> + *
> + * bits: 0 - 9 offset in words from the previous ftrace_ret_stack
> + * (bitmap type should have FGRAPH_RET_INDEX always)
> + * bits: 10 - 11 Type of storage
> + * 0 - reserved
> + * 1 - bitmap of fgraph_array index
> + *
> + * For bitmap of fgraph_array index
> + * bits: 12 - 27 The bitmap of fgraph_ops fgraph_array index

I really hate the terminology I came up with here, and would love to
get better terminology for describing what is going on. I looked it
over but I'm constantly getting confused. And I wrote this code!

Perhaps we should use:

@frame : The data that represents a single function call. When a
function is traced, all the data used for all the callbacks
attached to it, is in a single frame. This would replace the
FGRAPH_RET_SIZE as FGRAPH_FRAME_SIZE.

@offset : This is the word size position on the stack. It would
replace INDEX, as I think "index" is being used for more
than one thing. Perhaps it should be "offset" when dealing
with where it is on the shadow stack, and "pos" when dealing
with which callback ops is being referenced.


> + *
> + * That is, at the end of function_graph_enter, if the first and fourth
> + * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
> + * on the return of the function being traced, this is what will be on the
> + * task's shadow ret_stack: (the stack grows upward)
> + *
> + * | | <- task->curr_ret_stack
> + * +--------------------------------------------+
> + * | bitmap_type(bitmap:(BIT(3)|BIT(0)), |
> + * | offset:FGRAPH_RET_INDEX) | <- the offset is from here
> + * +--------------------------------------------+
> + * | struct ftrace_ret_stack |
> + * | (stores the saved ret pointer) | <- the offset points here
> + * +--------------------------------------------+
> + * | (X) | (N) | ( N words away from
> + * | | previous ret_stack)
> + *
> + * If a backtrace is required, and the real return pointer needs to be
> + * fetched, then it looks at the task's curr_ret_stack index, if it
> + * is greater than zero (reserved, or right before popped), it would mask
> + * the value by FGRAPH_RET_INDEX_MASK to get the offset index of the
> + * ftrace_ret_stack structure stored on the shadow stack.
> + */
> +
> +#define FGRAPH_RET_INDEX_SIZE 10

Replace SIZE with BITS.

> +#define FGRAPH_RET_INDEX_MASK GENMASK(FGRAPH_RET_INDEX_SIZE - 1, 0)

#define FGRAPH_FRAME_SIZE_BITS 10
#define FGRAPH_FRAME_SIZE_MASK GENMASK(FGRAPH_FRAME_SIZE_BITS - 1, 0)


> +
> +#define FGRAPH_TYPE_SIZE 2
> +#define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_SIZE - 1, 0)

#define FGRAPH_TYPE_BITS 2
#define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_BITS - 1, 0)


> +#define FGRAPH_TYPE_SHIFT FGRAPH_RET_INDEX_SIZE
> +
> +enum {
> + FGRAPH_TYPE_RESERVED = 0,
> + FGRAPH_TYPE_BITMAP = 1,
> +};
> +
> +#define FGRAPH_INDEX_SIZE 16

replace "INDEX" with "OPS" as it will be the indexes of ops in the
array.

#define FGRAPH_OPS_BITS 16
#define FGRAPH_OPS_MASK GENMASK(FGRAPH_OPS_BITS - 1, 0)

> +#define FGRAPH_INDEX_MASK GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
> +#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
> +
> +/* Currently the max stack index can't be more than register callers */
> +#define FGRAPH_MAX_INDEX (FGRAPH_INDEX_SIZE + FGRAPH_RET_INDEX)

FGRAPH_MAX_INDEX isn't even used. Let's delete it.

> +
> +#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_SIZE

#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_BITS

> +
> #define SHADOW_STACK_SIZE (PAGE_SIZE)
> #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
> /* Leave on a buffer at the end */
> -#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))

We probably should rename this in previous patches as well.

Unfortunately, it's getting close to the time for me to pick up my wife
from the airport to start our vacation. But I think we should rename a
lot of these variables to make things more consistent.

I'll try to look more at the previous patches as well to make my
comments there, when I get some time. Maybe even later today.

-- Steve
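
To make the layout under discussion concrete, here is a minimal user-space
sketch of how a bitmap-type word could be packed and unpacked with the
names proposed above. It is only an illustration under stated assumptions:
GENMASK32() is a local stand-in for the kernel's GENMASK(),
FGRAPH_OPS_SHIFT is an assumed rename of FGRAPH_INDEX_SHIFT, and the
fgraph_make_bitmap_word()/fgraph_word_*() helpers are hypothetical and do
not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's GENMASK(h, l); valid for 0 <= l <= h < 32. */
#define GENMASK32(h, l) \
	((uint32_t)((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l))))

/* bits 0 - 9: offset in words back to the ftrace_ret_stack (the frame) */
#define FGRAPH_FRAME_SIZE_BITS	10
#define FGRAPH_FRAME_SIZE_MASK	GENMASK32(FGRAPH_FRAME_SIZE_BITS - 1, 0)

/* bits 10 - 11: type of storage */
#define FGRAPH_TYPE_BITS	2
#define FGRAPH_TYPE_MASK	GENMASK32(FGRAPH_TYPE_BITS - 1, 0)
#define FGRAPH_TYPE_SHIFT	FGRAPH_FRAME_SIZE_BITS

/* bits 12 - 27: bitmap of fgraph_array[] indexes (assumed shift name) */
#define FGRAPH_OPS_BITS		16
#define FGRAPH_OPS_MASK		GENMASK32(FGRAPH_OPS_BITS - 1, 0)
#define FGRAPH_OPS_SHIFT	(FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_BITS)

enum {
	FGRAPH_TYPE_RESERVED	= 0,
	FGRAPH_TYPE_BITMAP	= 1,
};

/* Hypothetical helper: pack a bitmap-type word for the shadow stack. */
static inline uint32_t fgraph_make_bitmap_word(uint32_t ops_bitmap,
					       uint32_t frame_offset)
{
	assert(ops_bitmap <= FGRAPH_OPS_MASK);
	assert(frame_offset <= FGRAPH_FRAME_SIZE_MASK);

	return (ops_bitmap << FGRAPH_OPS_SHIFT) |
	       ((uint32_t)FGRAPH_TYPE_BITMAP << FGRAPH_TYPE_SHIFT) |
	       frame_offset;
}

/* Hypothetical helpers: unpack each field again. */
static inline uint32_t fgraph_word_offset(uint32_t word)
{
	return word & FGRAPH_FRAME_SIZE_MASK;
}

static inline uint32_t fgraph_word_type(uint32_t word)
{
	return (word >> FGRAPH_TYPE_SHIFT) & FGRAPH_TYPE_MASK;
}

static inline uint32_t fgraph_word_ops(uint32_t word)
{
	return (word >> FGRAPH_OPS_SHIFT) & FGRAPH_OPS_MASK;
}
```

For the example in the comment (ops at index 0 and 3), the bitmap is
BIT(3)|BIT(0). The real offset would be FGRAPH_RET_INDEX, which depends on
sizeof(struct ftrace_ret_stack); any value that fits in 10 bits works for
trying out the helpers.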


2024-04-20 08:57:41

by Masami Hiramatsu

Subject: Re: [PATCH v9 07/36] function_graph: Allow multiple users to attach to function graph

On Fri, 19 Apr 2024 23:52:58 -0400
Steven Rostedt <[email protected]> wrote:

> On Mon, 15 Apr 2024 21:50:20 +0900
> "Masami Hiramatsu (Google)" <[email protected]> wrote:
>
> > @@ -27,23 +28,157 @@
> >
> > #define FGRAPH_RET_SIZE sizeof(struct ftrace_ret_stack)
> > #define FGRAPH_RET_INDEX DIV_ROUND_UP(FGRAPH_RET_SIZE, sizeof(long))
> > +
> > +/*
> > + * On entry to a function (via function_graph_enter()), a new ftrace_ret_stack
> > + * is allocated on the task's ret_stack with indexes entry, then each
> > + * fgraph_ops on the fgraph_array[]'s entryfunc is called and if that returns
> > + * non-zero, the index into the fgraph_array[] for that fgraph_ops is recorded
> > + * on the indexes entry as a bit flag.
> > + * As the associated ftrace_ret_stack saved for those fgraph_ops needs to
> > + * be found, the index to it is also added to the ret_stack along with the
> > + * index of the fgraph_array[] to each fgraph_ops that needs their retfunc
> > + * called.
> > + *
> > + * The top of the ret_stack (when not empty) will always have a reference
> > + * to the last ftrace_ret_stack saved. All references to the
> > + * ftrace_ret_stack has the format of:
> > + *
> > + * bits: 0 - 9 offset in words from the previous ftrace_ret_stack
> > + * (bitmap type should have FGRAPH_RET_INDEX always)
> > + * bits: 10 - 11 Type of storage
> > + * 0 - reserved
> > + * 1 - bitmap of fgraph_array index
> > + *
> > + * For bitmap of fgraph_array index
> > + * bits: 12 - 27 The bitmap of fgraph_ops fgraph_array index
>
> I really hate the terminology I came up with here, and would love to
> get better terminology for describing what is going on. I looked it
> over but I'm constantly getting confused. And I wrote this code!
>
> Perhaps we should use:
>
> @frame : The data that represents a single function call. When a
> function is traced, all the data used for all the callbacks
> attached to it, is in a single frame. This would replace the
> FGRAPH_RET_SIZE as FGRAPH_FRAME_SIZE.

Agreed.

>
> @offset : This is the word size position on the stack. It would
> replace INDEX, as I think "index" is being used for more
> than one thing. Perhaps it should be "offset" when dealing
> with where it is on the shadow stack, and "pos" when dealing
> with which callback ops is being referenced.

Indeed. @index is usually used for the index in an array, so we can use
@index for fgraph_array[]. But inside a @frame, @offset would be better.

>
>
> > + *
> > + * That is, at the end of function_graph_enter, if the first and fourth
> > + * fgraph_ops on the fgraph_array[] (index 0 and 3) needs their retfunc called
> > + * on the return of the function being traced, this is what will be on the
> > + * task's shadow ret_stack: (the stack grows upward)
> > + *
> > + * | | <- task->curr_ret_stack
> > + * +--------------------------------------------+
> > + * | bitmap_type(bitmap:(BIT(3)|BIT(0)), |
> > + * | offset:FGRAPH_RET_INDEX) | <- the offset is from here
> > + * +--------------------------------------------+
> > + * | struct ftrace_ret_stack |
> > + * | (stores the saved ret pointer) | <- the offset points here
> > + * +--------------------------------------------+
> > + * | (X) | (N) | ( N words away from
> > + * | | previous ret_stack)
> > + *
> > + * If a backtrace is required, and the real return pointer needs to be
> > + * fetched, then it looks at the task's curr_ret_stack index, if it
> > + * is greater than zero (reserved, or right before popped), it would mask
> > + * the value by FGRAPH_RET_INDEX_MASK to get the offset index of the
> > + * ftrace_ret_stack structure stored on the shadow stack.
> > + */
> > +
> > +#define FGRAPH_RET_INDEX_SIZE 10
>
> Replace SIZE with BITS.

Agreed.

>
> > +#define FGRAPH_RET_INDEX_MASK GENMASK(FGRAPH_RET_INDEX_SIZE - 1, 0)
>
> #define FGRAPH_FRAME_SIZE_BITS 10
> #define FGRAPH_FRAME_SIZE_MASK GENMASK(FGRAPH_FRAME_SIZE_BITS - 1, 0)
>
>
> > +
> > +#define FGRAPH_TYPE_SIZE 2
> > +#define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_SIZE - 1, 0)
>
> #define FGRAPH_TYPE_BITS 2
> #define FGRAPH_TYPE_MASK GENMASK(FGRAPH_TYPE_BITS - 1, 0)
>
>
> > +#define FGRAPH_TYPE_SHIFT FGRAPH_RET_INDEX_SIZE
> > +
> > +enum {
> > + FGRAPH_TYPE_RESERVED = 0,
> > + FGRAPH_TYPE_BITMAP = 1,
> > +};
> > +
> > +#define FGRAPH_INDEX_SIZE 16
>
> replace "INDEX" with "OPS" as it will be the indexes of ops in the
> array.
>
> #define FGRAPH_OPS_BITS 16
> #define FGRAPH_OPS_MASK GENMASK(FGRAPH_OPS_BITS - 1, 0)

OK, this looks good.

>
> > +#define FGRAPH_INDEX_MASK GENMASK(FGRAPH_INDEX_SIZE - 1, 0)
> > +#define FGRAPH_INDEX_SHIFT (FGRAPH_TYPE_SHIFT + FGRAPH_TYPE_SIZE)
> > +
> > +/* Currently the max stack index can't be more than register callers */
> > +#define FGRAPH_MAX_INDEX (FGRAPH_INDEX_SIZE + FGRAPH_RET_INDEX)
>
> FGRAPH_MAX_INDEX isn't even used. Let's delete it.

OK.

>
> > +
> > +#define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_SIZE
>
> #define FGRAPH_ARRAY_SIZE FGRAPH_INDEX_BITS

OK.

>
> > +
> > #define SHADOW_STACK_SIZE (PAGE_SIZE)
> > #define SHADOW_STACK_INDEX (SHADOW_STACK_SIZE / sizeof(long))
> > /* Leave on a buffer at the end */
> > -#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - FGRAPH_RET_INDEX)
> > +#define SHADOW_STACK_MAX_INDEX (SHADOW_STACK_INDEX - (FGRAPH_RET_INDEX + 1))
>
> We probably should rename this in previous patches as well.
>
> Unfortunately, it's getting close to the time for me to pick up my wife
> from the airport to start our vacation. But I think we should rename a
> lot of these variables to make things more consistent.

OK, Thanks for your review!

>
> I'll try to look more at the previous patches as well to make my
> comments there, when I get some time. Maybe even later today.

Only if you have time. I think I will also refresh the code.

Thank you,

>
> -- Steve
>


--
Masami Hiramatsu (Google) <[email protected]>
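
To illustrate the "index" side of the terminology agreed above (an index
selects a slot in fgraph_array[], while an offset is a position on the
shadow stack), here is a small user-space sketch of walking the ops bitmap
on function return. It is a simplified assumption of how such a walk could
look, not the code from the patch: struct demo_ops, demo_call_retfuncs()
and demo_record() are hypothetical names, and the real retfunc signature
differs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FGRAPH_ARRAY_SIZE 16	/* one bitmap bit per fgraph_array[] slot */

/* Hypothetical, simplified stand-in for struct fgraph_ops. */
struct demo_ops {
	void (*retfunc)(int index, void *data);
};

/*
 * Call retfunc for every ops whose index bit is set in @bitmap.
 * Returns the number of callbacks that were invoked.
 */
static int demo_call_retfuncs(struct demo_ops *array[], uint32_t bitmap,
			      void *data)
{
	int called = 0;

	assert(array != NULL);

	for (int index = 0; index < FGRAPH_ARRAY_SIZE; index++) {
		if (!(bitmap & (UINT32_C(1) << index)))
			continue;
		/* Skip empty slots (nothing registered at this index). */
		if (!array[index] || !array[index]->retfunc)
			continue;
		array[index]->retfunc(index, data);
		called++;
	}
	return called;
}

/* Example callback: record which indexes ran in a caller-provided mask. */
static void demo_record(int index, void *data)
{
	*(uint32_t *)data |= UINT32_C(1) << index;
}
```

With ops registered at index 0 and 3 and a bitmap of BIT(3)|BIT(0), both
callbacks run; a bit set for an empty slot is skipped.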

2024-04-24 12:28:28

by Florent Revest

Subject: Re: [PATCH v9 01/36] tracing: Add a comment about ftrace_regs definition

On Mon, Apr 15, 2024 at 2:49 PM Masami Hiramatsu (Google)
<[email protected]> wrote:
>
> From: Masami Hiramatsu (Google) <[email protected]>
>
> To clarify what will be expected on ftrace_regs, add a comment to the
> architecture independent definition of the ftrace_regs.
>
> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> Acked-by: Mark Rutland <[email protected]>
> ---
> Changes in v8:
> - Update that the saved registers depends on the context.
> Changes in v3:
> - Add instruction pointer
> Changes in v2:
> - newly added.
> ---
> include/linux/ftrace.h | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 54d53f345d14..b81f1afa82a1 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -118,6 +118,32 @@ extern int ftrace_enabled;
>
> #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
>
> +/**
> + * ftrace_regs - ftrace partial/optimal register set
> + *
> + * ftrace_regs represents a group of registers which is used at the
> + * function entry and exit. There are three types of registers.
> + *
> + * - Registers for passing the parameters to callee, including the stack
> + * pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
> + * - Registers for passing the return values to caller.
> + * (e.g. rax and rdx on x86_64)

Out of curiosity, have we ever considered skipping argument registers
that are not return value registers in the exit code paths? For example,
why would we want to save rdi in a return handler?

But if we want to avoid the situation of having "sparse ftrace_regs"
all over again, we'd have to split ftrace_regs into a ftrace_args_regs
and a ftrace_ret_regs which would make this refactoring even more
painful, just to skip a few instructions. :|

I don't necessarily think it's worth it, I just wanted to make sure
this was considered.

> + * - Registers for hooking the function call and return including the
> + * frame pointer (the frame pointer is architecture/config dependent)
> + * (e.g. rip, rbp and rsp for x86_64)
> + *
> + * Also, architecture dependent fields can be used for internal process.
> + * (e.g. orig_ax on x86_64)
> + *
> + * On the function entry, those registers will be restored except for
> + * the stack pointer, so that user can change the function parameters
> + * and instruction pointer (e.g. live patching.)
> + * On the function exit, only registers which is used for return values
> + * are restored.
> + *
> + * NOTE: user *must not* access regs directly, only do it via APIs, because
> + * the member can be changed according to the architecture.
> + */
> struct ftrace_regs {
> struct pt_regs regs;
> };
>

2024-04-24 13:42:05

by Florent Revest

Subject: Re: [PATCH v9 01/36] tracing: Add a comment about ftrace_regs definition

On Wed, Apr 24, 2024 at 2:23 PM Florent Revest <[email protected]> wrote:
>
> On Mon, Apr 15, 2024 at 2:49 PM Masami Hiramatsu (Google)
> <[email protected]> wrote:
> >
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > To clarify what will be expected on ftrace_regs, add a comment to the
> > architecture independent definition of the ftrace_regs.
> >
> > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > Acked-by: Mark Rutland <[email protected]>
> > ---
> > Changes in v8:
> > - Update that the saved registers depends on the context.
> > Changes in v3:
> > - Add instruction pointer
> > Changes in v2:
> > - newly added.
> > ---
> > include/linux/ftrace.h | 26 ++++++++++++++++++++++++++
> > 1 file changed, 26 insertions(+)
> >
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index 54d53f345d14..b81f1afa82a1 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -118,6 +118,32 @@ extern int ftrace_enabled;
> >
> > #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> >
> > +/**
> > + * ftrace_regs - ftrace partial/optimal register set
> > + *
> > + * ftrace_regs represents a group of registers which is used at the
> > + * function entry and exit. There are three types of registers.
> > + *
> > + * - Registers for passing the parameters to callee, including the stack
> > + * pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
> > + * - Registers for passing the return values to caller.
> > + * (e.g. rax and rdx on x86_64)
>
> Ooc, have we ever considered skipping argument registers that are not
> return value registers in the exit code paths ? For example, why would
> we want to save rdi in a return handler ?
>
> But if we want to avoid the situation of having "sparse ftrace_regs"
> all over again, we'd have to split ftrace_regs into a ftrace_args_regs
> and a ftrace_ret_regs which would make this refactoring even more
> painful, just to skip a few instructions. :|
>
> I don't necessarily think it's worth it, I just wanted to make sure
> this was considered.

Ah, well, I just reached patch 22 and noticed that you add this there:

+ * Basically, ftrace_regs stores the registers related to the context.
+ * On function entry, registers for function parameters and hooking the
+ * function call are stored, and on function exit, registers for function
+ * return value and frame pointers are stored.

So ftrace_regs can be a sparse structure then. That's fair enough with me! ;)
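
To make the "sparse structure" point concrete, here is a minimal user-space
sketch. It is not kernel code: struct mock_ftrace_regs, the at_exit flag and
both accessors are hypothetical stand-ins, and returning 0 for a register that
was not saved is an assumption of this mock, not something the patch promises.
It only illustrates why handlers should go through accessor APIs instead of
reading the members directly.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NR_ARGS 6

/* Hypothetical, simplified stand-in for the kernel's struct ftrace_regs. */
struct mock_ftrace_regs {
	unsigned long args[MOCK_NR_ARGS]; /* parameter registers, filled on entry */
	unsigned long retval;             /* return value register, filled on exit */
	unsigned long ip;                 /* address of the hooked function */
	unsigned long fp;                 /* frame pointer */
	bool at_exit;                     /* mock-only: which context filled this */
};

/* Mock accessor: argument n is only meaningful at function entry. */
static unsigned long
mock_fregs_get_argument(const struct mock_ftrace_regs *fregs, unsigned int n)
{
	if (fregs->at_exit || n >= MOCK_NR_ARGS)
		return 0; /* not saved in this context (mock behaviour) */
	return fregs->args[n];
}

/* Mock accessor: the return value is only meaningful at function exit. */
static unsigned long
mock_fregs_get_return_value(const struct mock_ftrace_regs *fregs)
{
	return fregs->at_exit ? fregs->retval : 0;
}
```

With accessors like these, an exit handler never has to know whether rdi and
friends were saved; it simply does not ask for them.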

2024-04-24 14:03:09

by Florent Revest

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

Neat! :) I had a look at mostly the "high level" part (fprobe and
arm64 specific bits) and this seems to be in a good state to me.

Thanks for all that work, that is quite a refactoring :)

On Mon, Apr 15, 2024 at 2:49 PM Masami Hiramatsu (Google)
<[email protected]> wrote:
>
> Hi,
>
> Here is the 9th version of the series to re-implement the fprobe on
> function-graph tracer. The previous version is;
>
> https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
>
> This version is ported on the latest kernel (v6.9-rc3 + probes/for-next)
> and fixed some bugs + performance optimization patch[36/36].
> - [12/36] Fix to clear fgraph_array entry in registration failure, also
> return -ENOSPC when fgraph_array is full.
> - [28/36] Add new store_fprobe_entry_data() for fprobe.
> - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation.
> - [36/36] Add new flag to skip timestamp recording.
>
> Overview
> --------
> This series does major 2 changes, enable multiple function-graphs on
> the ftrace (e.g. allow function-graph on sub instances) and rewrite the
> fprobe on this function-graph.
>
> The former changes had been sent from Steven Rostedt 4 years ago (*),
> which allows users to set different setting function-graph tracer (and
> other tracers based on function-graph) in each trace-instances at the
> same time.
>
> (*) https://lore.kernel.org/all/[email protected]/
>
> The purpose of latter change are;
>
> 1) Remove dependency of the rethook from fprobe so that we can reduce
> the return hook code and shadow stack.
>
> 2) Make 'ftrace_regs' the common trace interface for the function
> boundary.
>
> 1) Currently we have 2(or 3) different function return hook codes,
> the function-graph tracer and rethook (and legacy kretprobe).
> But since this is redundant and needs double maintenance cost,
> I would like to unify those. From the user's viewpoint, function-
> graph tracer is very useful to grasp the execution path. For this
> purpose, it is hard to use the rethook in the function-graph
> tracer, but the opposite is possible. (Strictly speaking, kretprobe
> can not use it because it requires 'pt_regs' for historical reasons.)
>
> 2) Now the fprobe provides the 'pt_regs' for its handler, but that is
> wrong for the function entry and exit. Moreover, depending on the
> architecture, there is no way to accurately reproduce 'pt_regs'
> outside of interrupt or exception handlers. This means fprobe should
> not use 'pt_regs' because it does not use such exceptions.
> (Conversely, kprobe should use 'pt_regs' because it is an abstract
> interface of the software breakpoint exception.)
>
> This series changes fprobe to use function-graph tracer for tracing
> function entry and exit, instead of mixture of ftrace and rethook.
> Unlike the rethook which is a per-task list of system-wide allocated
> nodes, the function graph's ret_stack is a per-task shadow stack.
> Thus it does not need to set 'nr_maxactive' (which is the number of
> pre-allocated nodes).
> Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'.
> Since eBPF mulit_kprobe/multi_kretprobe events still use 'pt_regs' as
> their register interface, this changes it to convert 'ftrace_regs' to
> 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs',
> so users must access only registers for function parameters or
> return value.
>
> Design
> ------
> Instead of using ftrace's function entry hook directly, the new fprobe
> is built on top of the function-graph's entry and return callbacks
> with 'ftrace_regs'.
>
> Since the fprobe requires access to 'ftrace_regs', the architecture
> must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
> CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph
> entry callback with 'ftrace_regs', and also
> CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
> return_to_handler.
>
> All fprobes share a single function-graph ops (means shares a common
> ftrace filter) similar to the kprobe-on-ftrace. This needs another
> layer to find corresponding fprobe in the common function-graph
> callbacks, but has much better scalability, since the number of
> registered function-graph ops is limited.
>
> In the entry callback, the fprobe runs its entry_handler and saves the
> address of 'fprobe' on the function-graph's shadow stack as data. The
> return callback decodes the data to get the 'fprobe' address, and runs
> the exit_handler.
>
> The fprobe introduces two hash-tables, one is for entry callback which
> searches fprobes related to the given function address passed by entry
> callback. The other is for a return callback which checks if the given
> 'fprobe' data structure pointer is still valid. Note that it is
> possible to unregister fprobe before the return callback runs. Thus
> the address validation must be done before using it in the return
> callback.
>
> This series can be applied against the probes/for-next branch, which
> is based on v6.9-rc3.
>
> This series can also be found below branch.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph
>
> Thank you,
>
> ---
>
> Masami Hiramatsu (Google) (21):
> tracing: Add a comment about ftrace_regs definition
> tracing: Rename ftrace_regs_return_value to ftrace_regs_get_return_value
> x86: tracing: Add ftrace_regs definition in the header
> function_graph: Use a simple LRU for fgraph_array index number
> ftrace: Add multiple fgraph storage selftest
> function_graph: Pass ftrace_regs to entryfunc
> function_graph: Replace fgraph_ret_regs with ftrace_regs
> function_graph: Pass ftrace_regs to retfunc
> fprobe: Use ftrace_regs in fprobe entry handler
> fprobe: Use ftrace_regs in fprobe exit handler
> tracing: Add ftrace_partial_regs() for converting ftrace_regs to pt_regs
> tracing: Add ftrace_fill_perf_regs() for perf event
> tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
> bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled
> ftrace: Add CONFIG_HAVE_FTRACE_GRAPH_FUNC
> fprobe: Rewrite fprobe on function-graph tracer
> tracing/fprobe: Remove nr_maxactive from fprobe
> selftests: ftrace: Remove obsolate maxactive syntax check
> selftests/ftrace: Add a test case for repeating register/unregister fprobe
> Documentation: probes: Update fprobe on function-graph tracer
> fgraph: Skip recording calltime/rettime if it is not nneeded
>
> Steven Rostedt (VMware) (15):
> function_graph: Convert ret_stack to a series of longs
> fgraph: Use BUILD_BUG_ON() to make sure we have structures divisible by long
> function_graph: Add an array structure that will allow multiple callbacks
> function_graph: Allow multiple users to attach to function graph
> function_graph: Remove logic around ftrace_graph_entry and return
> ftrace/function_graph: Pass fgraph_ops to function graph callbacks
> ftrace: Allow function_graph tracer to be enabled in instances
> ftrace: Allow ftrace startup flags exist without dynamic ftrace
> function_graph: Have the instances use their own ftrace_ops for filtering
> function_graph: Add "task variables" per task for fgraph_ops
> function_graph: Move set_graph_function tests to shadow stack global var
> function_graph: Move graph depth stored data to shadow stack global var
> function_graph: Move graph notrace bit to shadow stack global var
> function_graph: Implement fgraph_reserve_data() and fgraph_retrieve_data()
> function_graph: Add selftest for passing local variables
>
>
> Documentation/trace/fprobe.rst | 42 +
> arch/arm64/Kconfig | 3
> arch/arm64/include/asm/ftrace.h | 47 +
> arch/arm64/kernel/asm-offsets.c | 12
> arch/arm64/kernel/entry-ftrace.S | 32 -
> arch/arm64/kernel/ftrace.c | 21
> arch/loongarch/Kconfig | 4
> arch/loongarch/include/asm/ftrace.h | 32 -
> arch/loongarch/kernel/asm-offsets.c | 12
> arch/loongarch/kernel/ftrace_dyn.c | 15
> arch/loongarch/kernel/mcount.S | 17
> arch/loongarch/kernel/mcount_dyn.S | 14
> arch/powerpc/Kconfig | 1
> arch/powerpc/include/asm/ftrace.h | 15
> arch/powerpc/kernel/trace/ftrace.c | 3
> arch/powerpc/kernel/trace/ftrace_64_pg.c | 10
> arch/riscv/Kconfig | 3
> arch/riscv/include/asm/ftrace.h | 21
> arch/riscv/kernel/ftrace.c | 15
> arch/riscv/kernel/mcount.S | 24
> arch/s390/Kconfig | 3
> arch/s390/include/asm/ftrace.h | 39 -
> arch/s390/kernel/asm-offsets.c | 6
> arch/s390/kernel/mcount.S | 9
> arch/x86/Kconfig | 4
> arch/x86/include/asm/ftrace.h | 43 -
> arch/x86/kernel/ftrace.c | 51 +
> arch/x86/kernel/ftrace_32.S | 15
> arch/x86/kernel/ftrace_64.S | 17
> include/linux/fprobe.h | 57 +
> include/linux/ftrace.h | 170 +++
> include/linux/sched.h | 2
> include/linux/trace_recursion.h | 39 -
> kernel/trace/Kconfig | 23
> kernel/trace/bpf_trace.c | 14
> kernel/trace/fgraph.c | 1005 ++++++++++++++++----
> kernel/trace/fprobe.c | 637 +++++++++----
> kernel/trace/ftrace.c | 13
> kernel/trace/ftrace_internal.h | 2
> kernel/trace/trace.h | 96 ++
> kernel/trace/trace_fprobe.c | 147 ++-
> kernel/trace/trace_functions.c | 8
> kernel/trace/trace_functions_graph.c | 98 +-
> kernel/trace/trace_irqsoff.c | 12
> kernel/trace/trace_probe_tmpl.h | 2
> kernel/trace/trace_sched_wakeup.c | 12
> kernel/trace/trace_selftest.c | 262 +++++
> lib/test_fprobe.c | 51 -
> samples/fprobe/fprobe_example.c | 4
> .../test.d/dynevent/add_remove_fprobe_repeat.tc | 19
> .../ftrace/test.d/dynevent/fprobe_syntax_errors.tc | 4
> 51 files changed, 2325 insertions(+), 882 deletions(-)
> create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
>
> --
> Masami Hiramatsu (Google) <[email protected]>
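
The entry/return flow described in the "Design" part of the quoted cover
letter can be sketched in user space as follows. This is only an illustration
under stated assumptions: every name here (mock_fprobe, mock_entry,
mock_return, ...) is hypothetical, one small array stands in for both hash
tables, a global array stands in for the per-task shadow stack, and only one
probe per function is modelled. The real series stores encoded data on the
function-graph shadow stack.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_PROBES  8
#define MOCK_STACK_DEPTH 16

struct mock_fprobe {
	unsigned long func; /* address of the traced function */
	int entry_hits;     /* how often the entry handler ran */
	int exit_hits;      /* how often the exit handler ran */
};

/* Stand-in for both hash tables: the set of registered probes. */
static struct mock_fprobe *mock_registered[MOCK_MAX_PROBES];

/* Stand-in for the per-task shadow stack. */
static struct mock_fprobe *mock_shadow_stack[MOCK_STACK_DEPTH];
static int mock_shadow_top;

static bool mock_register(struct mock_fprobe *fp)
{
	for (int i = 0; i < MOCK_MAX_PROBES; i++) {
		if (!mock_registered[i]) {
			mock_registered[i] = fp;
			return true;
		}
	}
	return false; /* table full */
}

static void mock_unregister(struct mock_fprobe *fp)
{
	for (int i = 0; i < MOCK_MAX_PROBES; i++)
		if (mock_registered[i] == fp)
			mock_registered[i] = NULL;
}

/* Lookup used at entry: find the probe for a function address. */
static struct mock_fprobe *mock_find_by_func(unsigned long func)
{
	for (int i = 0; i < MOCK_MAX_PROBES; i++)
		if (mock_registered[i] && mock_registered[i]->func == func)
			return mock_registered[i];
	return NULL;
}

/* Lookup used at return: is this saved pointer still registered? */
static bool mock_is_valid(const struct mock_fprobe *fp)
{
	if (!fp)
		return false;
	for (int i = 0; i < MOCK_MAX_PROBES; i++)
		if (mock_registered[i] == fp)
			return true;
	return false;
}

/* Entry callback: run the entry handler and save the probe address. */
static bool mock_entry(unsigned long func)
{
	struct mock_fprobe *fp = mock_find_by_func(func);

	if (!fp || mock_shadow_top >= MOCK_STACK_DEPTH)
		return false;
	fp->entry_hits++;
	mock_shadow_stack[mock_shadow_top++] = fp;
	return true;
}

/* Return callback: validate the saved address before using it. */
static bool mock_return(void)
{
	struct mock_fprobe *fp;

	if (mock_shadow_top == 0)
		return false;
	fp = mock_shadow_stack[--mock_shadow_top];
	if (!mock_is_valid(fp))
		return false; /* probe was unregistered in the meantime */
	fp->exit_hits++;
	return true;
}
```

The important step is the check in mock_return(): the pointer is compared
against the registered set and never dereferenced before that, which mirrors
the validation the cover letter requires for the return callback.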

2024-04-24 14:33:09

by Masami Hiramatsu

Subject: Re: [PATCH v9 01/36] tracing: Add a comment about ftrace_regs definition

On Wed, 24 Apr 2024 15:19:24 +0200
Florent Revest <[email protected]> wrote:

> On Wed, Apr 24, 2024 at 2:23 PM Florent Revest <[email protected]> wrote:
> >
> > On Mon, Apr 15, 2024 at 2:49 PM Masami Hiramatsu (Google)
> > <[email protected]> wrote:
> > >
> > > From: Masami Hiramatsu (Google) <[email protected]>
> > >
> > > To clarify what will be expected on ftrace_regs, add a comment to the
> > > architecture independent definition of the ftrace_regs.
> > >
> > > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > > Acked-by: Mark Rutland <[email protected]>
> > > ---
> > > Changes in v8:
> > > - Update that the saved registers depends on the context.
> > > Changes in v3:
> > > - Add instruction pointer
> > > Changes in v2:
> > > - newly added.
> > > ---
> > > include/linux/ftrace.h | 26 ++++++++++++++++++++++++++
> > > 1 file changed, 26 insertions(+)
> > >
> > > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > > index 54d53f345d14..b81f1afa82a1 100644
> > > --- a/include/linux/ftrace.h
> > > +++ b/include/linux/ftrace.h
> > > @@ -118,6 +118,32 @@ extern int ftrace_enabled;
> > >
> > > #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS
> > >
> > > +/**
> > > + * ftrace_regs - ftrace partial/optimal register set
> > > + *
> > > + * ftrace_regs represents a group of registers which is used at the
> > > + * function entry and exit. There are three types of registers.
> > > + *
> > > + * - Registers for passing the parameters to callee, including the stack
> > > + * pointer. (e.g. rcx, rdx, rdi, rsi, r8, r9 and rsp on x86_64)
> > > + * - Registers for passing the return values to caller.
> > > + * (e.g. rax and rdx on x86_64)
> >
> > Ooc, have we ever considered skipping argument registers that are not
> > return value registers in the exit code paths ? For example, why would
> > we want to save rdi in a return handler ?
> >
> > But if we want to avoid the situation of having "sparse ftrace_regs"
> > all over again, we'd have to split ftrace_regs into a ftrace_args_regs
> > and a ftrace_ret_regs which would make this refactoring even more
> > painful, just to skip a few instructions. :|
> >
> > I don't necessarily think it's worth it, I just wanted to make sure
> > this was considered.
>
> Ah, well, I just reached patch 22 and noticed that there you add add:
>
> + * Basically, ftrace_regs stores the registers related to the context.
> + * On function entry, registers for function parameters and hooking the
> + * function call are stored, and on function exit, registers for function
> + * return value and frame pointers are stored.
>
> So ftrace_regs can be a a sparse structure then. That's fair enough with me! ;)

Yes, and in this patch, I explained that too :)

> + * On the function entry, those registers will be restored except for
> + * the stack pointer, so that user can change the function parameters
> + * and instruction pointer (e.g. live patching.)
> + * On the function exit, only registers which is used for return values
> + * are restored.

So on the function exit, ftrace_regs will be sparse.

Thank you,

--
Masami Hiramatsu (Google) <[email protected]>

2024-04-25 15:10:41

by Masami Hiramatsu

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Wed, 24 Apr 2024 15:35:15 +0200
Florent Revest <[email protected]> wrote:

> Neat! :) I had a look at mostly the "high level" part (fprobe and
> arm64 specific bits) and this seems to be in a good state to me.
>

Thanks for reviewing this long series!

> Thanks for all that work, that is quite a refactoring :)

--
Masami Hiramatsu (Google) <[email protected]>

2024-04-25 20:09:57

by Andrii Nakryiko

Subject: Re: [PATCH v9 29/36] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled

On Mon, Apr 15, 2024 at 6:22 AM Masami Hiramatsu (Google)
<[email protected]> wrote:
>
> From: Masami Hiramatsu (Google) <[email protected]>
>
> Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
> converted from ftrace_regs by ftrace_partial_regs(), thus some registers
> may always returns 0. But it should be enough for function entry (access
> arguments) and exit (access return value).
>
> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> Acked-by: Florent Revest <[email protected]>
> ---
> Changes from previous series: NOTHING, Update against the new series.
> ---
> kernel/trace/bpf_trace.c | 22 +++++++++-------------
> 1 file changed, 9 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index e51a6ef87167..57b1174030c9 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
> fs_initcall(bpf_event_init);
> #endif /* CONFIG_MODULES */
>
> -#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
> +#ifdef CONFIG_FPROBE
> struct bpf_kprobe_multi_link {
> struct bpf_link link;
> struct fprobe fp;
> @@ -2600,6 +2600,8 @@ struct user_syms {
> char *buf;
> };
>
> +static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);

This per-CPU buffer is wasted if CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y,
right? Can we guard it?


> +
> static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt)
> {
> unsigned long __user usymbol;
> @@ -2792,13 +2794,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
>
> static int
> kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> - unsigned long entry_ip, struct pt_regs *regs)
> + unsigned long entry_ip, struct ftrace_regs *fregs)
> {
> struct bpf_kprobe_multi_run_ctx run_ctx = {
> .link = link,
> .entry_ip = entry_ip,
> };
> struct bpf_run_ctx *old_run_ctx;
> + struct pt_regs *regs;
> int err;
>
> if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> @@ -2809,6 +2812,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
>
> migrate_disable();
> rcu_read_lock();
> + regs = ftrace_partial_regs(fregs, this_cpu_ptr(&bpf_kprobe_multi_pt_regs));

and then pass NULL if defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)?


> old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> err = bpf_prog_run(link->link.prog, regs);
> bpf_reset_run_ctx(old_run_ctx);
> @@ -2826,13 +2830,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
> void *data)
> {
> struct bpf_kprobe_multi_link *link;
> - struct pt_regs *regs = ftrace_get_regs(fregs);
> -
> - if (!regs)
> - return 0;
>
> link = container_of(fp, struct bpf_kprobe_multi_link, fp);
> - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
> + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
> return 0;
> }
>
> @@ -2842,13 +2842,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
> void *data)
> {
> struct bpf_kprobe_multi_link *link;
> - struct pt_regs *regs = ftrace_get_regs(fregs);
> -
> - if (!regs)
> - return;
>
> link = container_of(fp, struct bpf_kprobe_multi_link, fp);
> - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
> + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
> }
>
> static int symbols_cmp_r(const void *a, const void *b, const void *priv)
> @@ -3107,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> kvfree(cookies);
> return err;
> }
> -#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +#else /* !CONFIG_FPROBE */
> int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> {
> return -EOPNOTSUPP;
>
>
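
The guard suggested in the two comments above could look roughly like the
following user-space sketch. Everything in it is a mock and an assumption:
MOCK_PT_REGS_TO_FTRACE_REGS_CAST models
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST, a plain static variable stands in
for the per-CPU bpf_kprobe_multi_pt_regs, and mock_partial_regs() only
imitates the shape of ftrace_partial_regs() (the mock copies the whole
structure, which the real helper is not expected to do).

```c
#include <stddef.h>

/* Hypothetical, simplified register structures. */
struct mock_pt_regs {
	unsigned long gpr[4];
};

struct mock_ftrace_regs {
	struct mock_pt_regs regs;
};

#ifdef MOCK_PT_REGS_TO_FTRACE_REGS_CAST
/* ftrace_regs can be viewed as pt_regs: no scratch buffer is defined. */
#define mock_scratch_pt_regs()	((struct mock_pt_regs *)NULL)

static struct mock_pt_regs *
mock_partial_regs(struct mock_ftrace_regs *fregs, struct mock_pt_regs *scratch)
{
	(void)scratch; /* unused when a direct cast is possible */
	return &fregs->regs;
}
#else
/* Stand-in for the per-CPU scratch pt_regs. */
static struct mock_pt_regs mock_scratch;
#define mock_scratch_pt_regs()	(&mock_scratch)

static struct mock_pt_regs *
mock_partial_regs(struct mock_ftrace_regs *fregs, struct mock_pt_regs *scratch)
{
	*scratch = fregs->regs; /* mock: copy into the scratch buffer */
	return scratch;
}
#endif

/* Caller side: the same line works with or without the scratch buffer. */
static unsigned long mock_prog_run(struct mock_ftrace_regs *fregs)
{
	struct mock_pt_regs *regs;

	regs = mock_partial_regs(fregs, mock_scratch_pt_regs());
	return regs->gpr[0];
}
```

Hiding the buffer behind one macro keeps the caller free of #ifdef blocks,
and the storage disappears entirely when the cast is available.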

2024-04-25 20:15:34

by Andrii Nakryiko

Subject: Re: [PATCH v9 36/36] fgraph: Skip recording calltime/rettime if it is not nneeded

On Mon, Apr 15, 2024 at 6:25 AM Masami Hiramatsu (Google)
<[email protected]> wrote:
>
> From: Masami Hiramatsu (Google) <[email protected]>
>
> Skip recording calltime and rettime if the fgraph_ops does not need it.
> This is a kind of performance optimization for fprobe. Since the fprobe
> user does not use these entries, recording timestamp in fgraph is just
> a overhead (e.g. eBPF, ftrace). So introduce the skip_timestamp flag,
> and all fgraph_ops sets this flag, skip recording calltime and rettime.
>
> Suggested-by: Jiri Olsa <[email protected]>
> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> ---
> Changes in v9:
> - Newly added.
> ---
> include/linux/ftrace.h | 2 ++
> kernel/trace/fgraph.c | 46 +++++++++++++++++++++++++++++++++++++++-------
> kernel/trace/fprobe.c | 1 +
> 3 files changed, 42 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index d845a80a3d56..06fc7cbef897 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -1156,6 +1156,8 @@ struct fgraph_ops {
> struct ftrace_ops ops; /* for the hash lists */
> void *private;
> int idx;
> + /* If skip_timestamp is true, this does not record timestamps. */
> + bool skip_timestamp;
> };
>
> void *fgraph_reserve_data(int idx, int size_bytes);
> diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> index 7556fbbae323..a5722537bb79 100644
> --- a/kernel/trace/fgraph.c
> +++ b/kernel/trace/fgraph.c
> @@ -131,6 +131,7 @@ DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
> int ftrace_graph_active;
>
> static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
> +static bool fgraph_skip_timestamp;
>
> /* LRU index table for fgraph_array */
> static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
> @@ -475,7 +476,7 @@ void ftrace_graph_stop(void)
> static int
> ftrace_push_return_trace(unsigned long ret, unsigned long func,
> unsigned long frame_pointer, unsigned long *retp,
> - int fgraph_idx)
> + int fgraph_idx, bool skip_ts)
> {
> struct ftrace_ret_stack *ret_stack;
> unsigned long long calltime;
> @@ -498,8 +499,12 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
> if (ret_stack && ret_stack->func == func &&
> get_fgraph_type(current, index + FGRAPH_RET_INDEX) == FGRAPH_TYPE_BITMAP &&
> - !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx))
> + !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx)) {
> + /* If previous one skips calltime, update it. */
> + if (!skip_ts && !ret_stack->calltime)
> + ret_stack->calltime = trace_clock_local();
> return index + FGRAPH_RET_INDEX;
> + }
>
> val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX;
>
> @@ -517,7 +522,10 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> return -EBUSY;
> }
>
> - calltime = trace_clock_local();
> + if (skip_ts)

would it be ok to add likely() here to keep the least-overhead code path linear?

> + calltime = 0LL;
> + else
> + calltime = trace_clock_local();
>
> index = READ_ONCE(current->curr_ret_stack);
> ret_stack = RET_STACK(current, index);
> @@ -601,7 +609,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned long func,
> trace.func = func;
> trace.depth = ++current->curr_ret_depth;
>
> - index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0);
> + index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0,
> + fgraph_skip_timestamp);
> if (index < 0)
> goto out;
>
> @@ -654,7 +663,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
> return -ENODEV;
>
> /* Use start for the distance to ret_stack (skipping over reserve) */
> - index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx);
> + index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx,
> + gops->skip_timestamp);
> if (index < 0)
> return index;
> type = get_fgraph_type(current, index);
> @@ -732,6 +742,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
> *ret = ret_stack->ret;
> trace->func = ret_stack->func;
> trace->calltime = ret_stack->calltime;
> + trace->rettime = 0;
> trace->overrun = atomic_read(&current->trace_overrun);
> trace->depth = current->curr_ret_depth;
> /*
> @@ -792,7 +803,6 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> return (unsigned long)panic;
> }
>
> - trace.rettime = trace_clock_local();
> if (fregs)
> ftrace_regs_set_instruction_pointer(fregs, ret);
>
> @@ -808,6 +818,8 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> continue;
> if (gops == &fgraph_stub)
> continue;
> + if (!trace.rettime && !gops->skip_timestamp)

In addition to the above, do you mind adding unlikely() here as well?

> + trace.rettime = trace_clock_local();
>
> gops->retfunc(&trace, gops, fregs);
> }

[...]
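
For readers less familiar with the two hints requested above, here is a
minimal user-space sketch of both branches. It is not the kernel code:
likely()/unlikely() are redefined locally on top of __builtin_expect()
(a GCC/Clang builtin), and struct model_ops, model_clock(),
model_calltime() and model_rettime() are hypothetical stand-ins for
struct fgraph_ops, trace_clock_local(), the calltime assignment and the
retfunc loop. The clock is a counter so that skipped reads are visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's branch hints (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for struct fgraph_ops; only the new flag is modeled. */
struct model_ops {
	bool skip_timestamp;
};

/* Counts clock reads so the saved work is observable. */
static unsigned int clock_reads;

/* Hypothetical stand-in for trace_clock_local(); never returns 0. */
static uint64_t model_clock(void)
{
	return 1000 + ++clock_reads;
}

/* Entry side: the hinted fast path stores 0 and never reads the clock. */
static uint64_t model_calltime(bool skip_ts)
{
	if (likely(skip_ts))
		return 0;
	return model_clock();
}

/* Exit side: read the clock at most once, and only if some ops wants it. */
static uint64_t model_rettime(const struct model_ops *ops, int nr_ops)
{
	uint64_t rettime = 0;
	int i;

	assert(nr_ops >= 0);
	for (i = 0; i < nr_ops; i++) {
		if (unlikely(!rettime && !ops[i].skip_timestamp))
			rettime = model_clock();
		/* the real loop would call the ops' return handler here */
	}
	return rettime;
}
```

The hints only affect how the compiler lays out the code, not the
result: with skip_ts set, the straight-line path is the one that avoids
the clock read, which is the case the fprobe users care about.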

2024-04-25 20:32:28

by Andrii Nakryiko

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Mon, Apr 15, 2024 at 5:49 AM Masami Hiramatsu (Google)
<[email protected]> wrote:
>
> Hi,
>
> Here is the 9th version of the series to re-implement the fprobe on
> function-graph tracer. The previous version is;
>
> https://lore.kernel.org/all/170887410337.564249.6360118840946697039.stgit@devnote2/
>
> This version is ported on the latest kernel (v6.9-rc3 + probes/for-next)
> and fixed some bugs + performance optimization patch[36/36].
> - [12/36] Fix to clear fgraph_array entry in registration failure, also
> return -ENOSPC when fgraph_array is full.
> - [28/36] Add new store_fprobe_entry_data() for fprobe.
> - [31/36] Remove DIV_ROUND_UP() and fix entry data address calculation.
> - [36/36] Add new flag to skip timestamp recording.
>
> Overview
> --------
> This series does major 2 changes, enable multiple function-graphs on
> the ftrace (e.g. allow function-graph on sub instances) and rewrite the
> fprobe on this function-graph.
>
> The former changes had been sent from Steven Rostedt 4 years ago (*),
> which allows users to set different setting function-graph tracer (and
> other tracers based on function-graph) in each trace-instances at the
> same time.
>
> (*) https://lore.kernel.org/all/[email protected]/
>
> The purpose of latter change are;
>
> 1) Remove dependency of the rethook from fprobe so that we can reduce
> the return hook code and shadow stack.
>
> 2) Make 'ftrace_regs' the common trace interface for the function
> boundary.
>
> 1) Currently we have 2(or 3) different function return hook codes,
> the function-graph tracer and rethook (and legacy kretprobe).
> But since this is redundant and needs double maintenance cost,
> I would like to unify those. From the user's viewpoint, function-
> graph tracer is very useful to grasp the execution path. For this
> purpose, it is hard to use the rethook in the function-graph
> tracer, but the opposite is possible. (Strictly speaking, kretprobe
> can not use it because it requires 'pt_regs' for historical reasons.)
>
> 2) Now the fprobe provides the 'pt_regs' for its handler, but that is
> wrong for the function entry and exit. Moreover, depending on the
> architecture, there is no way to accurately reproduce 'pt_regs'
> outside of interrupt or exception handlers. This means fprobe should
> not use 'pt_regs' because it does not use such exceptions.
> (Conversely, kprobe should use 'pt_regs' because it is an abstract
> interface of the software breakpoint exception.)
>
> This series changes fprobe to use function-graph tracer for tracing
> function entry and exit, instead of mixture of ftrace and rethook.
> Unlike the rethook which is a per-task list of system-wide allocated
> nodes, the function graph's ret_stack is a per-task shadow stack.
> Thus it does not need to set 'nr_maxactive' (which is the number of
> pre-allocated nodes).
> Also the handlers will get the 'ftrace_regs' instead of 'pt_regs'.
> Since eBPF multi_kprobe/multi_kretprobe events still use 'pt_regs' as
> their register interface, this changes it to convert 'ftrace_regs' to
> 'pt_regs'. Of course this conversion makes an incomplete 'pt_regs',
> so users must access only registers for function parameters or
> return value.
>
> Design
> ------
> Instead of using ftrace's function entry hook directly, the new fprobe
> is built on top of the function-graph's entry and return callbacks
> with 'ftrace_regs'.
>
> Since the fprobe requires access to 'ftrace_regs', the architecture
> must support CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS and
> CONFIG_HAVE_FTRACE_GRAPH_FUNC, which enables to call function-graph
> entry callback with 'ftrace_regs', and also
> CONFIG_HAVE_FUNCTION_GRAPH_FREGS, which passes the ftrace_regs to
> return_to_handler.
>
> All fprobes share a single function-graph ops (means shares a common
> ftrace filter) similar to the kprobe-on-ftrace. This needs another
> layer to find corresponding fprobe in the common function-graph
> callbacks, but has much better scalability, since the number of
> registered function-graph ops is limited.
>
> In the entry callback, the fprobe runs its entry_handler and saves the
> address of 'fprobe' on the function-graph's shadow stack as data. The
> return callback decodes the data to get the 'fprobe' address, and runs
> the exit_handler.
>
> The fprobe introduces two hash-tables, one is for entry callback which
> searches fprobes related to the given function address passed by entry
> callback. The other is for a return callback which checks if the given
> 'fprobe' data structure pointer is still valid. Note that it is
> possible to unregister fprobe before the return callback runs. Thus
> the address validation must be done before using it in the return
> callback.
>
> This series can be applied against the probes/for-next branch, which
> is based on v6.9-rc3.
>
> This series can also be found below branch.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git/log/?h=topic/fprobe-on-fgraph
>
> Thank you,
>
> ---

Hey Masami,

I can't really review most of that code as I'm completely unfamiliar
with all those inner workings of fprobe/ftrace/function_graph. I left
a few comments where there were somewhat more obvious BPF-related
pieces.

But I also did run our BPF benchmarks on probes/for-next as a baseline
and then with your series applied on top. Just to see if there are any
regressions. I think it will be a useful data point for you.

You should be already familiar with the bench tool we have in BPF
selftests (I used it on some other patches for your tree).

BASELINE
========
kprobe : 24.634 ± 0.205M/s
kprobe-multi : 28.898 ± 0.531M/s
kretprobe : 10.478 ± 0.015M/s
kretprobe-multi: 11.012 ± 0.063M/s

THIS PATCH SET ON TOP
=====================
kprobe : 25.144 ± 0.027M/s (+2%)
kprobe-multi : 28.909 ± 0.074M/s
kretprobe : 9.482 ± 0.008M/s (-9.5%)
kretprobe-multi: 13.688 ± 0.027M/s (+24%)

These numbers are pretty stable and look to be more or less representative.

As you can see, kprobes got a bit faster, kprobe-multi seems to be
about the same, though.

Then (I suppose they are "legacy") kretprobes got quite noticeably
slower, almost by 10%. Not sure why, but looks real after re-running
benchmarks a bunch of times and getting stable results.

On the other hand, multi-kretprobes got significantly faster (+24%!).
Again, I don't know if it is expected or not, but it's a nice
improvement.

If you have any idea why kretprobes would get so much slower, it would
be nice to look into that and see if you can mitigate the regression
somehow. Thanks!
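
In case it helps to double-check the percentages, here is a small
stand-alone sketch of the arithmetic. The throughput values are the
means copied from the two tables above; pct_change(), ns_per_iter(),
print_row() and print_summary() are names made up for this example, and
the nanosecond figure is only a rough per-iteration cost of the whole
benchmark loop, not of the probe alone.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of 'after' versus 'base', in percent. */
static double pct_change(double base, double after)
{
	assert(base > 0.0);
	return (after - base) / base * 100.0;
}

/* Average time per benchmark iteration in ns, given millions of iterations/s. */
static double ns_per_iter(double m_per_sec)
{
	assert(m_per_sec > 0.0);
	return 1000.0 / m_per_sec;
}

/* Print one comparison row; throughput values are in M/s. */
static void print_row(const char *name, double base, double after)
{
	printf("%-16s %+6.1f%%  (%.1f ns -> %.1f ns per iteration)\n",
	       name, pct_change(base, after),
	       ns_per_iter(base), ns_per_iter(after));
}

/* Mean throughputs from the BASELINE and patched runs above. */
static void print_summary(void)
{
	print_row("kprobe", 24.634, 25.144);
	print_row("kprobe-multi", 28.898, 28.909);
	print_row("kretprobe", 10.478, 9.482);
	print_row("kretprobe-multi", 11.012, 13.688);
}
```

Calling print_summary() from main() prints one row per benchmark. The
kprobe-multi delta is far smaller than its reported error margin, which
is why I call it "about the same".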


> 51 files changed, 2325 insertions(+), 882 deletions(-)
> create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
>
> --
> Masami Hiramatsu (Google) <[email protected]>
>

2024-04-28 23:26:01

by Steven Rostedt

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Thu, 25 Apr 2024 13:31:53 -0700
Andrii Nakryiko <[email protected]> wrote:

I'm just coming back from Japan (work and then a vacation), and
catching up on my email during the 6 hour layover in Detroit.

> Hey Masami,
>
> I can't really review most of that code as I'm completely unfamiliar
> with all those inner workings of fprobe/ftrace/function_graph. I left
> a few comments where there were somewhat more obvious BPF-related
> pieces.
>
> But I also did run our BPF benchmarks on probes/for-next as a baseline
> and then with your series applied on top. Just to see if there are any
> regressions. I think it will be a useful data point for you.
>
> You should be already familiar with the bench tool we have in BPF
> selftests (I used it on some other patches for your tree).

I should get familiar with your tools too.

>
> BASELINE
> ========
> kprobe : 24.634 ± 0.205M/s
> kprobe-multi : 28.898 ± 0.531M/s
> kretprobe : 10.478 ± 0.015M/s
> kretprobe-multi: 11.012 ± 0.063M/s
>
> THIS PATCH SET ON TOP
> =====================
> kprobe : 25.144 ± 0.027M/s (+2%)
> kprobe-multi : 28.909 ± 0.074M/s
> kretprobe : 9.482 ± 0.008M/s (-9.5%)
> kretprobe-multi: 13.688 ± 0.027M/s (+24%)
>
> These numbers are pretty stable and look to be more or less representative.

Thanks for running this.

>
> As you can see, kprobes got a bit faster, kprobe-multi seems to be
> about the same, though.
>
> Then (I suppose they are "legacy") kretprobes got quite noticeably
> slower, almost by 10%. Not sure why, but looks real after re-running
> benchmarks a bunch of times and getting stable results.
>
> On the other hand, multi-kretprobes got significantly faster (+24%!).
> Again, I don't know if it is expected or not, but it's a nice
> improvement.
>
> If you have any idea why kretprobes would get so much slower, it would
> be nice to look into that and see if you can mitigate the regression
> somehow. Thanks!

My guess is that this patch set helps generic use cases for tracing the
return of functions, but will likely add more overhead for single use
cases. That is, kretprobe is made to be specific for a single function,
but kretprobe-multi is more generic. Hence the generic version will
improve at the sacrifice of the specific function. I did expect as much.

That said, I think there's probably a lot of low hanging fruit that can
be done to this series to help improve the kretprobe performance. I'm
not sure we can get back to the baseline, but I'm hoping we can at
least make it much better than that 10% slowdown.

I'll be reviewing this patch set this week as I recover from jetlag.

-- Steve

2024-04-29 13:53:37

by Masami Hiramatsu

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

Hi Andrii,

On Thu, 25 Apr 2024 13:31:53 -0700
Andrii Nakryiko <[email protected]> wrote:

> Hey Masami,
>
> I can't really review most of that code as I'm completely unfamiliar
> with all those inner workings of fprobe/ftrace/function_graph. I left
> a few comments where there were somewhat more obvious BPF-related
> pieces.
>
> But I also did run our BPF benchmarks on probes/for-next as a baseline
> and then with your series applied on top. Just to see if there are any
> regressions. I think it will be a useful data point for you.

Thanks for testing!

>
> You should be already familiar with the bench tool we have in BPF
> selftests (I used it on some other patches for your tree).

What patches do we need?

>
> BASELINE
> ========
> kprobe : 24.634 ± 0.205M/s
> kprobe-multi : 28.898 ± 0.531M/s
> kretprobe : 10.478 ± 0.015M/s
> kretprobe-multi: 11.012 ± 0.063M/s
>
> THIS PATCH SET ON TOP
> =====================
> kprobe : 25.144 ± 0.027M/s (+2%)
> kprobe-multi : 28.909 ± 0.074M/s
> kretprobe : 9.482 ± 0.008M/s (-9.5%)
> kretprobe-multi: 13.688 ± 0.027M/s (+24%)

This looks good. Kretprobe should also use kretprobe-multi (fprobe)
eventually because it should be a single callback version of
kretprobe-multi.

>
> These numbers are pretty stable and look to be more or less representative.
>
> As you can see, kprobes got a bit faster, kprobe-multi seems to be
> about the same, though.
>
> Then (I suppose they are "legacy") kretprobes got quite noticeably
> slower, almost by 10%. Not sure why, but looks real after re-running
> benchmarks a bunch of times and getting stable results.

Hmm, kretprobe on x86 should use ftrace + rethook even with my series.
So nothing should have changed. Maybe the cache access pattern has
changed?
I'll check it with tracefs (to rule out effects from the BPF-related changes).

>
> On the other hand, multi-kretprobes got significantly faster (+24%!).
> Again, I don't know if it is expected or not, but it's a nice
> improvement.

Thanks!

>
> If you have any idea why kretprobes would get so much slower, it would
> be nice to look into that and see if you can mitigate the regression
> somehow. Thanks!

OK, let me check it.

Thank you!

>
>
> > 51 files changed, 2325 insertions(+), 882 deletions(-)
> > create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
> >
> > --
> > Masami Hiramatsu (Google) <[email protected]>
> >


--
Masami Hiramatsu (Google) <[email protected]>

2024-04-29 14:56:30

by Masami Hiramatsu

Subject: Re: [PATCH v9 36/36] fgraph: Skip recording calltime/rettime if it is not nneeded

On Thu, 25 Apr 2024 13:15:08 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Mon, Apr 15, 2024 at 6:25 AM Masami Hiramatsu (Google)
> <[email protected]> wrote:
> >
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Skip recording calltime and rettime if the fgraph_ops does not need it.
> > This is a kind of performance optimization for fprobe. Since the fprobe
> > user does not use these entries, recording timestamps in fgraph is just
> > overhead (e.g. eBPF, ftrace). So introduce the skip_timestamp flag, and
> > if all fgraph_ops set this flag, skip recording calltime and rettime.
> >
> > Suggested-by: Jiri Olsa <[email protected]>
> > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > ---
> > Changes in v9:
> > - Newly added.
> > ---
> > include/linux/ftrace.h | 2 ++
> > kernel/trace/fgraph.c | 46 +++++++++++++++++++++++++++++++++++++++-------
> > kernel/trace/fprobe.c | 1 +
> > 3 files changed, 42 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> > index d845a80a3d56..06fc7cbef897 100644
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -1156,6 +1156,8 @@ struct fgraph_ops {
> > struct ftrace_ops ops; /* for the hash lists */
> > void *private;
> > int idx;
> > + /* If skip_timestamp is true, this does not record timestamps. */
> > + bool skip_timestamp;
> > };
> >
> > void *fgraph_reserve_data(int idx, int size_bytes);
> > diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
> > index 7556fbbae323..a5722537bb79 100644
> > --- a/kernel/trace/fgraph.c
> > +++ b/kernel/trace/fgraph.c
> > @@ -131,6 +131,7 @@ DEFINE_STATIC_KEY_FALSE(kill_ftrace_graph);
> > int ftrace_graph_active;
> >
> > static struct fgraph_ops *fgraph_array[FGRAPH_ARRAY_SIZE];
> > +static bool fgraph_skip_timestamp;
> >
> > /* LRU index table for fgraph_array */
> > static int fgraph_lru_table[FGRAPH_ARRAY_SIZE];
> > @@ -475,7 +476,7 @@ void ftrace_graph_stop(void)
> > static int
> > ftrace_push_return_trace(unsigned long ret, unsigned long func,
> > unsigned long frame_pointer, unsigned long *retp,
> > - int fgraph_idx)
> > + int fgraph_idx, bool skip_ts)
> > {
> > struct ftrace_ret_stack *ret_stack;
> > unsigned long long calltime;
> > @@ -498,8 +499,12 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> > ret_stack = get_ret_stack(current, current->curr_ret_stack, &index);
> > if (ret_stack && ret_stack->func == func &&
> > get_fgraph_type(current, index + FGRAPH_RET_INDEX) == FGRAPH_TYPE_BITMAP &&
> > - !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx))
> > + !is_fgraph_index_set(current, index + FGRAPH_RET_INDEX, fgraph_idx)) {
> > + /* If previous one skips calltime, update it. */
> > + if (!skip_ts && !ret_stack->calltime)
> > + ret_stack->calltime = trace_clock_local();
> > return index + FGRAPH_RET_INDEX;
> > + }
> >
> > val = (FGRAPH_TYPE_RESERVED << FGRAPH_TYPE_SHIFT) | FGRAPH_RET_INDEX;
> >
> > @@ -517,7 +522,10 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func,
> > return -EBUSY;
> > }
> >
> > - calltime = trace_clock_local();
> > + if (skip_ts)
>
> would it be ok to add likely() here to keep the least-overhead code path linear?

It's not really "likely", but hmm, yes, as you said, it keeps the
least-overhead path linear. OK, let me add likely().

Thank you,

>
> > + calltime = 0LL;
> > + else
> > + calltime = trace_clock_local();
> >
> > index = READ_ONCE(current->curr_ret_stack);
> > ret_stack = RET_STACK(current, index);
> > @@ -601,7 +609,8 @@ int function_graph_enter_regs(unsigned long ret, unsigned long func,
> > trace.func = func;
> > trace.depth = ++current->curr_ret_depth;
> >
> > - index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0);
> > + index = ftrace_push_return_trace(ret, func, frame_pointer, retp, 0,
> > + fgraph_skip_timestamp);
> > if (index < 0)
> > goto out;
> >
> > @@ -654,7 +663,8 @@ int function_graph_enter_ops(unsigned long ret, unsigned long func,
> > return -ENODEV;
> >
> > /* Use start for the distance to ret_stack (skipping over reserve) */
> > - index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx);
> > + index = ftrace_push_return_trace(ret, func, frame_pointer, retp, gops->idx,
> > + gops->skip_timestamp);
> > if (index < 0)
> > return index;
> > type = get_fgraph_type(current, index);
> > @@ -732,6 +742,7 @@ ftrace_pop_return_trace(struct ftrace_graph_ret *trace, unsigned long *ret,
> > *ret = ret_stack->ret;
> > trace->func = ret_stack->func;
> > trace->calltime = ret_stack->calltime;
> > + trace->rettime = 0;
> > trace->overrun = atomic_read(&current->trace_overrun);
> > trace->depth = current->curr_ret_depth;
> > /*
> > @@ -792,7 +803,6 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> > return (unsigned long)panic;
> > }
> >
> > - trace.rettime = trace_clock_local();
> > if (fregs)
> > ftrace_regs_set_instruction_pointer(fregs, ret);
> >
> > @@ -808,6 +818,8 @@ __ftrace_return_to_handler(struct ftrace_regs *fregs, unsigned long frame_pointe
> > continue;
> > if (gops == &fgraph_stub)
> > continue;
> > + if (!trace.rettime && !gops->skip_timestamp)
>
> In addition to the above, do you mind adding unlikely() here as well?
>
> > + trace.rettime = trace_clock_local();
> >
> > gops->retfunc(&trace, gops, fregs);
> > }
>
> [...]


--
Masami Hiramatsu (Google) <[email protected]>

2024-04-29 15:26:38

by Masami Hiramatsu

Subject: Re: [PATCH v9 29/36] bpf: Enable kprobe_multi feature if CONFIG_FPROBE is enabled

On Thu, 25 Apr 2024 13:09:32 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Mon, Apr 15, 2024 at 6:22 AM Masami Hiramatsu (Google)
> <[email protected]> wrote:
> >
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Enable kprobe_multi feature if CONFIG_FPROBE is enabled. The pt_regs is
> > converted from ftrace_regs by ftrace_partial_regs(), thus some registers
> > may always returns 0. But it should be enough for function entry (access
> > arguments) and exit (access return value).
> >
> > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > Acked-by: Florent Revest <[email protected]>
> > ---
> > Changes from previous series: NOTHING, Update against the new series.
> > ---
> > kernel/trace/bpf_trace.c | 22 +++++++++-------------
> > 1 file changed, 9 insertions(+), 13 deletions(-)
> >
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index e51a6ef87167..57b1174030c9 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -2577,7 +2577,7 @@ static int __init bpf_event_init(void)
> > fs_initcall(bpf_event_init);
> > #endif /* CONFIG_MODULES */
> >
> > -#if defined(CONFIG_FPROBE) && defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
> > +#ifdef CONFIG_FPROBE
> > struct bpf_kprobe_multi_link {
> > struct bpf_link link;
> > struct fprobe fp;
> > @@ -2600,6 +2600,8 @@ struct user_syms {
> > char *buf;
> > };
> >
> > +static DEFINE_PER_CPU(struct pt_regs, bpf_kprobe_multi_pt_regs);
>
> this is a waste if CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST=y, right?
> Can we guard it?

Good catch! Yes, we can guard it.

>
>
> > +
> > static int copy_user_syms(struct user_syms *us, unsigned long __user *usyms, u32 cnt)
> > {
> > unsigned long __user usymbol;
> > @@ -2792,13 +2794,14 @@ static u64 bpf_kprobe_multi_entry_ip(struct bpf_run_ctx *ctx)
> >
> > static int
> > kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> > - unsigned long entry_ip, struct pt_regs *regs)
> > + unsigned long entry_ip, struct ftrace_regs *fregs)
> > {
> > struct bpf_kprobe_multi_run_ctx run_ctx = {
> > .link = link,
> > .entry_ip = entry_ip,
> > };
> > struct bpf_run_ctx *old_run_ctx;
> > + struct pt_regs *regs;
> > int err;
> >
> > if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1)) {
> > @@ -2809,6 +2812,7 @@ kprobe_multi_link_prog_run(struct bpf_kprobe_multi_link *link,
> >
> > migrate_disable();
> > rcu_read_lock();
> > + regs = ftrace_partial_regs(fregs, this_cpu_ptr(&bpf_kprobe_multi_pt_regs));
>
> and then pass NULL if defined(CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST)?

Indeed.

Thank you!

>
>
> > old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > err = bpf_prog_run(link->link.prog, regs);
> > bpf_reset_run_ctx(old_run_ctx);
> > @@ -2826,13 +2830,9 @@ kprobe_multi_link_handler(struct fprobe *fp, unsigned long fentry_ip,
> > void *data)
> > {
> > struct bpf_kprobe_multi_link *link;
> > - struct pt_regs *regs = ftrace_get_regs(fregs);
> > -
> > - if (!regs)
> > - return 0;
> >
> > link = container_of(fp, struct bpf_kprobe_multi_link, fp);
> > - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
> > + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
> > return 0;
> > }
> >
> > @@ -2842,13 +2842,9 @@ kprobe_multi_link_exit_handler(struct fprobe *fp, unsigned long fentry_ip,
> > void *data)
> > {
> > struct bpf_kprobe_multi_link *link;
> > - struct pt_regs *regs = ftrace_get_regs(fregs);
> > -
> > - if (!regs)
> > - return;
> >
> > link = container_of(fp, struct bpf_kprobe_multi_link, fp);
> > - kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), regs);
> > + kprobe_multi_link_prog_run(link, get_entry_ip(fentry_ip), fregs);
> > }
> >
> > static int symbols_cmp_r(const void *a, const void *b, const void *priv)
> > @@ -3107,7 +3103,7 @@ int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
> > kvfree(cookies);
> > return err;
> > }
> > -#else /* !CONFIG_FPROBE || !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> > +#else /* !CONFIG_FPROBE */
> > int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
> > {
> > return -EOPNOTSUPP;
> >
> >
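
To make the guard discussed above concrete, here is a rough user-space
sketch, not the kernel implementation. MODEL_REGS_CAST stands in for
CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST, a plain static variable stands
in for the per-CPU pt_regs, and every model_* name is hypothetical. The
register structs are simplified to a few fields for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified register set; not the kernel's pt_regs. */
struct model_pt_regs {
	unsigned long arg0;
	unsigned long ret;
	unsigned long flags;
};

#ifdef MODEL_REGS_CAST
/* Layout-compatible case: the ftrace regs simply wrap a full pt_regs. */
struct model_ftrace_regs {
	struct model_pt_regs regs;
};

/* No scratch buffer is defined at all; callers pass NULL. */
#define model_scratch_regs() ((struct model_pt_regs *)NULL)

static struct model_pt_regs *
model_partial_regs(struct model_ftrace_regs *fregs, struct model_pt_regs *scratch)
{
	(void)scratch;		/* unused: the embedded regs are returned */
	return &fregs->regs;
}
#else
/* Separate layout: only the argument and return registers are saved. */
struct model_ftrace_regs {
	unsigned long arg0;
	unsigned long ret;
};

/* Stand-in for the per-CPU scratch pt_regs; only needed in this case. */
static struct model_pt_regs model_scratch;
#define model_scratch_regs() (&model_scratch)

static struct model_pt_regs *
model_partial_regs(struct model_ftrace_regs *fregs, struct model_pt_regs *scratch)
{
	assert(scratch != NULL);
	scratch->arg0 = fregs->arg0;
	scratch->ret = fregs->ret;
	scratch->flags = 0;	/* registers that were not saved read as 0 */
	return scratch;
}
#endif

/* Caller side: identical in both configurations. */
static unsigned long model_read_arg0(struct model_ftrace_regs *fregs)
{
	struct model_pt_regs *regs;

	regs = model_partial_regs(fregs, model_scratch_regs());
	return regs->arg0;
}
```

Building with -DMODEL_REGS_CAST drops the scratch variable entirely,
which is the saving Andrii pointed out; the caller does not change.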


--
Masami Hiramatsu (Google) <[email protected]>

2024-04-29 20:25:29

by Andrii Nakryiko

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu <[email protected]> wrote:
>
> Hi Andrii,
>
> On Thu, 25 Apr 2024 13:31:53 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> > Hey Masami,
> >
> > I can't really review most of that code as I'm completely unfamiliar
> > with all those inner workings of fprobe/ftrace/function_graph. I left
> > a few comments where there were somewhat more obvious BPF-related
> > pieces.
> >
> > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > and then with your series applied on top. Just to see if there are any
> > regressions. I think it will be a useful data point for you.
>
> Thanks for testing!
>
> >
> > You should be already familiar with the bench tool we have in BPF
> > selftests (I used it on some other patches for your tree).
>
> What patches do we need?
>

You mean for this `bench` tool? They are part of BPF selftests (under
tools/testing/selftests/bpf), you can build them by running:

$ make RELEASE=1 -j$(nproc) bench

After that you'll get a self-contained `bench` binary, which includes
all the benchmarks.

You might also find a small script (benchs/run_bench_trigger.sh inside
the BPF selftests directory) helpful. It collects a final summary of the
benchmark run and optionally accepts a specific set of benchmarks, so
you can use it like this:

$ benchs/run_bench_trigger.sh kprobe kprobe-multi
kprobe : 18.731 ± 0.639M/s
kprobe-multi : 23.938 ± 0.612M/s

By default it will run a wider set of benchmarks (no uprobes, but a
bunch of extra fentry/fexit tests and stuff like this).

> >
> > BASELINE
> > ========
> > kprobe : 24.634 ± 0.205M/s
> > kprobe-multi : 28.898 ± 0.531M/s
> > kretprobe : 10.478 ± 0.015M/s
> > kretprobe-multi: 11.012 ± 0.063M/s
> >
> > THIS PATCH SET ON TOP
> > =====================
> > kprobe : 25.144 ± 0.027M/s (+2%)
> > kprobe-multi : 28.909 ± 0.074M/s
> > kretprobe : 9.482 ± 0.008M/s (-9.5%)
> > kretprobe-multi: 13.688 ± 0.027M/s (+24%)
>
> This looks good. Kretprobe should also use kretprobe-multi (fprobe)
> eventually because it should be a single callback version of
> kretprobe-multi.
>
> >
> > These numbers are pretty stable and look to be more or less representative.
> >
> > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > about the same, though.
> >
> > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > slower, almost by 10%. Not sure why, but looks real after re-running
> > benchmarks a bunch of times and getting stable results.
>
> Hmm, kretprobe on x86 should use ftrace + rethook even with my series.
> So nothing should be changed. Maybe cache access pattern has been
> changed?
> I'll check it with tracefs (to remove the effect from bpf related changes)
>
> >
> > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > Again, I don't know if it is expected or not, but it's a nice
> > improvement.
>
> Thanks!
>
> >
> > If you have any idea why kretprobes would get so much slower, it would
> > be nice to look into that and see if you can mitigate the regression
> > somehow. Thanks!
>
> OK, let me check it.
>
> Thank you!
>
> >
> >
> > > 51 files changed, 2325 insertions(+), 882 deletions(-)
> > > create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
> > >
> > > --
> > > Masami Hiramatsu (Google) <[email protected]>
> > >
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>

2024-04-29 20:29:40

by Andrii Nakryiko

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Sun, Apr 28, 2024 at 4:25 PM Steven Rostedt <[email protected]> wrote:
>
> On Thu, 25 Apr 2024 13:31:53 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> I'm just coming back from Japan (work and then a vacation), and
> catching up on my email during the 6 hour layover in Detroit.
>
> > Hey Masami,
> >
> > I can't really review most of that code as I'm completely unfamiliar
> > with all those inner workings of fprobe/ftrace/function_graph. I left
> > a few comments where there were somewhat more obvious BPF-related
> > pieces.
> >
> > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > and then with your series applied on top. Just to see if there are any
> > regressions. I think it will be a useful data point for you.
> >
> > You should be already familiar with the bench tool we have in BPF
> > selftests (I used it on some other patches for your tree).
>
> I should get familiar with your tools too.
>

It's a nifty and self-contained tool to do some micro-benchmarking, I
replied to Masami with a few details on how to build and use it.

> >
> > BASELINE
> > ========
> > kprobe : 24.634 ± 0.205M/s
> > kprobe-multi : 28.898 ± 0.531M/s
> > kretprobe : 10.478 ± 0.015M/s
> > kretprobe-multi: 11.012 ± 0.063M/s
> >
> > THIS PATCH SET ON TOP
> > =====================
> > kprobe : 25.144 ± 0.027M/s (+2%)
> > kprobe-multi : 28.909 ± 0.074M/s
> > kretprobe : 9.482 ± 0.008M/s (-9.5%)
> > kretprobe-multi: 13.688 ± 0.027M/s (+24%)
> >
> > These numbers are pretty stable and look to be more or less representative.
>
> Thanks for running this.
>
> >
> > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > about the same, though.
> >
> > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > slower, almost by 10%. Not sure why, but looks real after re-running
> > benchmarks a bunch of times and getting stable results.
> >
> > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > Again, I don't know if it is expected or not, but it's a nice
> > improvement.
> >
> > If you have any idea why kretprobes would get so much slower, it would
> > be nice to look into that and see if you can mitigate the regression
> > somehow. Thanks!
>
> My guess is that this patch set helps generic use cases for tracing the
> return of functions, but will likely add more overhead for single use
> cases. That is, kretprobe is made to be specific for a single function,
> but kretprobe-multi is more generic. Hence the generic version will
> improve at the sacrifice of the specific function. I did expect as much.
>
> That said, I think there's probably a lot of low hanging fruit that can
> be done to this series to help improve the kretprobe performance. I'm
> not sure we can get back to the baseline, but I'm hoping we can at
> least make it much better than that 10% slowdown.

That would certainly be appreciated, thanks!

But I'm also considering trying to switch to multi-kprobe/kretprobe
automatically on libbpf side, whenever possible, so that users can get
the best performance. There might still be situations where this can't
be done, so singular kprobe/kretprobe can't be completely deprecated,
but multi variants seem to be universally faster, so I'm going to
make them a default (I need to handle some backwards compat aspect,
but that's libbpf-specific stuff you shouldn't be concerned with).

>
> I'll be reviewing this patch set this week as I recover from jetlag.
>
> -- Steve

2024-04-30 13:34:42

by Masami Hiramatsu

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Mon, 29 Apr 2024 13:25:04 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu <[email protected]> wrote:
> >
> > Hi Andrii,
> >
> > On Thu, 25 Apr 2024 13:31:53 -0700
> > Andrii Nakryiko <[email protected]> wrote:
> >
> > > Hey Masami,
> > >
> > > I can't really review most of that code as I'm completely unfamiliar
> > > with all those inner workings of fprobe/ftrace/function_graph. I left
> > > a few comments where there were somewhat more obvious BPF-related
> > > pieces.
> > >
> > > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > > and then with your series applied on top. Just to see if there are any
> > > regressions. I think it will be a useful data point for you.
> >
> > Thanks for testing!
> >
> > >
> > > You should be already familiar with the bench tool we have in BPF
> > > selftests (I used it on some other patches for your tree).
> >
> > What patches we need?
> >
>
> You mean for this `bench` tool? They are part of BPF selftests (under
> tools/testing/selftests/bpf), you can build them by running:
>
> $ make RELEASE=1 -j$(nproc) bench
>
> After that you'll get a self-contained `bench` binary, which has all
> the benchmarks built in.
>
> You might also find a small script (benchs/run_bench_trigger.sh inside
> BPF selftests directory) helpful, it collects final summary of the
> benchmark run and optionally accepts a specific set of benchmarks. So
> you can use it like this:
>
> $ benchs/run_bench_trigger.sh kprobe kprobe-multi
> kprobe : 18.731 ± 0.639M/s
> kprobe-multi : 23.938 ± 0.612M/s
>
> By default it will run a wider set of benchmarks (no uprobes, but a
> bunch of extra fentry/fexit tests and stuff like this).

origin:
# benchs/run_bench_trigger.sh
kretprobe : 1.329 ± 0.007M/s
kretprobe-multi: 1.341 ± 0.004M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.288 ± 0.014M/s
kretprobe-multi: 1.365 ± 0.002M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.329 ± 0.002M/s
kretprobe-multi: 1.331 ± 0.011M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.311 ± 0.003M/s
kretprobe-multi: 1.318 ± 0.002M/s

patched:

# benchs/run_bench_trigger.sh
kretprobe : 1.274 ± 0.003M/s
kretprobe-multi: 1.397 ± 0.002M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.307 ± 0.002M/s
kretprobe-multi: 1.406 ± 0.004M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.279 ± 0.004M/s
kretprobe-multi: 1.330 ± 0.014M/s
# benchs/run_bench_trigger.sh
kretprobe : 1.256 ± 0.010M/s
kretprobe-multi: 1.412 ± 0.003M/s

Hmm, in my case the differences seem smaller (~3%?).
I attached the perf report results for both kernels, but I don't see a large
difference between them.
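
As a rough cross-check of the ~3% estimate, the four runs on each kernel can be
averaged and compared. The following is only a minimal sketch in Python, not
part of the benchmark tooling; the numbers are copied from the runs above, and
the helper name relative_change() is made up for illustration.

```python
from statistics import mean

# Throughput in M/s, copied from the four runs on each kernel listed above.
origin = {
    "kretprobe": [1.329, 1.288, 1.329, 1.311],
    "kretprobe-multi": [1.341, 1.365, 1.331, 1.318],
}
patched = {
    "kretprobe": [1.274, 1.307, 1.279, 1.256],
    "kretprobe-multi": [1.397, 1.406, 1.330, 1.412],
}


def relative_change(before, after):
    """Percent change of mean(after) relative to mean(before)."""
    base = mean(before)
    return (mean(after) - base) / base * 100.0


for name in origin:
    change = relative_change(origin[name], patched[name])
    print(f"{name:16s}: {change:+.1f}%")
```

With these inputs it reports about -2.7% for kretprobe and +3.5% for
kretprobe-multi, which is in line with the ~3% guess. Averaging four runs
ignores the per-run error bars, so treat it as a sanity check only.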

> > >
> > > BASELINE
> > > ========
> > > kprobe : 24.634 ± 0.205M/s
> > > kprobe-multi : 28.898 ± 0.531M/s
> > > kretprobe : 10.478 ± 0.015M/s
> > > kretprobe-multi: 11.012 ± 0.063M/s
> > >
> > > THIS PATCH SET ON TOP
> > > =====================
> > > kprobe : 25.144 ± 0.027M/s (+2%)
> > > kprobe-multi : 28.909 ± 0.074M/s
> > > kretprobe : 9.482 ± 0.008M/s (-9.5%)
> > > kretprobe-multi: 13.688 ± 0.027M/s (+24%)
> >
> > This looks good. Kretprobe should also use kretprobe-multi (fprobe)
> > eventually because it should be a single callback version of
> > kretprobe-multi.

I ran another benchmark (prctl loop, attached). The result on the origin kernel is:

# sh ./benchmark.sh
count = 10000000, took 6.748133 sec

And the result on the patched kernel is:

# sh ./benchmark.sh
count = 10000000, took 6.644095 sec

I confirmed that the perf results show no big difference.
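
For reference, the two wall-clock times can be turned into a per-iteration cost
and a relative change. This is a minimal sketch in Python, assuming the reported
time covers all `count` iterations of the loop; the helper name
ns_per_iteration() is hypothetical and not part of benchmark.sh.

```python
COUNT = 10_000_000        # iteration count printed by benchmark.sh
ORIGIN_SEC = 6.748133     # origin kernel, copied from the output above
PATCHED_SEC = 6.644095    # patched kernel, copied from the output above


def ns_per_iteration(seconds, count=COUNT):
    """Average wall-clock cost of one loop iteration, in nanoseconds."""
    return seconds / count * 1e9


# Negative means the patched kernel finished the same loop faster.
change_pct = (PATCHED_SEC - ORIGIN_SEC) / ORIGIN_SEC * 100.0

print(f"origin : {ns_per_iteration(ORIGIN_SEC):.1f} ns/iteration")
print(f"patched: {ns_per_iteration(PATCHED_SEC):.1f} ns/iteration")
print(f"change : {change_pct:+.1f}%")
```

That works out to roughly 675 ns versus 664 ns per iteration, about 1.5% less
time on the patched kernel. This is a single run on each kernel, so the gap may
well be within noise.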

Thank you,


> >
> > >
> > > These numbers are pretty stable and look to be more or less representative.
> > >
> > > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > > about the same, though.
> > >
> > > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > > slower, almost by 10%. Not sure why, but looks real after re-running
> > > benchmarks a bunch of times and getting stable results.
> >
> > Hmm, kretprobe on x86 should use ftrace + rethook even with my series.
> > So nothing should be changed. Maybe cache access pattern has been
> > changed?
> > I'll check it with tracefs (to remove the effect from bpf related changes)
> >
> > >
> > > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > > Again, I don't know if it is expected or not, but it's a nice
> > > improvement.
> >
> > Thanks!
> >
> > >
> > > If you have any idea why kretprobes would get so much slower, it would
> > > be nice to look into that and see if you can mitigate the regression
> > > somehow. Thanks!
> >
> > OK, let me check it.
> >
> > Thank you!
> >
> > >
> > >
> > > > 51 files changed, 2325 insertions(+), 882 deletions(-)
> > > > create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
> > > >
> > > > --
> > > > Masami Hiramatsu (Google) <[email protected]>
> > > >
> >
> >
> > --
> > Masami Hiramatsu (Google) <[email protected]>


--
Masami Hiramatsu (Google) <[email protected]>


Attachments:
prctl_loop.c (582.00 B)
perf-out-kretprobe-nopatch.txt (64.75 kB)
perf-out-kretprobe-patched.txt (65.40 kB)

2024-04-30 16:30:16

by Andrii Nakryiko

Subject: Re: [PATCH v9 00/36] tracing: fprobe: function_graph: Multi-function graph and fprobe on fgraph

On Tue, Apr 30, 2024 at 6:32 AM Masami Hiramatsu <[email protected]> wrote:
>
> On Mon, 29 Apr 2024 13:25:04 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> > On Mon, Apr 29, 2024 at 6:51 AM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > Hi Andrii,
> > >
> > > On Thu, 25 Apr 2024 13:31:53 -0700
> > > Andrii Nakryiko <[email protected]> wrote:
> > >
> > > > Hey Masami,
> > > >
> > > > I can't really review most of that code as I'm completely unfamiliar
> > > > with all those inner workings of fprobe/ftrace/function_graph. I left
> > > > a few comments where there were somewhat more obvious BPF-related
> > > > pieces.
> > > >
> > > > But I also did run our BPF benchmarks on probes/for-next as a baseline
> > > > and then with your series applied on top. Just to see if there are any
> > > > regressions. I think it will be a useful data point for you.
> > >
> > > Thanks for testing!
> > >
> > > >
> > > > You should be already familiar with the bench tool we have in BPF
> > > > selftests (I used it on some other patches for your tree).
> > >
> > > What patches we need?
> > >
> >
> > You mean for this `bench` tool? They are part of BPF selftests (under
> > tools/testing/selftests/bpf), you can build them by running:
> >
> > $ make RELEASE=1 -j$(nproc) bench
> >
> > After that you'll get a self-contained `bench` binary, which has all
> > the benchmarks built in.
> >
> > You might also find a small script (benchs/run_bench_trigger.sh inside
> > BPF selftests directory) helpful, it collects final summary of the
> > benchmark run and optionally accepts a specific set of benchmarks. So
> > you can use it like this:
> >
> > $ benchs/run_bench_trigger.sh kprobe kprobe-multi
> > kprobe : 18.731 ± 0.639M/s
> > kprobe-multi : 23.938 ± 0.612M/s
> >
> > By default it will run a wider set of benchmarks (no uprobes, but a
> > bunch of extra fentry/fexit tests and stuff like this).
>
> origin:
> # benchs/run_bench_trigger.sh
> kretprobe : 1.329 ± 0.007M/s
> kretprobe-multi: 1.341 ± 0.004M/s
> # benchs/run_bench_trigger.sh
> kretprobe : 1.288 ± 0.014M/s
> kretprobe-multi: 1.365 ± 0.002M/s
> # benchs/run_bench_trigger.sh
> kretprobe : 1.329 ± 0.002M/s
> kretprobe-multi: 1.331 ± 0.011M/s
> # benchs/run_bench_trigger.sh
> kretprobe : 1.311 ± 0.003M/s
> kretprobe-multi: 1.318 ± 0.002M/s
>
> patched:
>
> # benchs/run_bench_trigger.sh
> kretprobe : 1.274 ± 0.003M/s
> kretprobe-multi: 1.397 ± 0.002M/s
> # benchs/run_bench_trigger.sh
> kretprobe : 1.307 ± 0.002M/s
> kretprobe-multi: 1.406 ± 0.004M/s
> # benchs/run_bench_trigger.sh
> kretprobe : 1.279 ± 0.004M/s
> kretprobe-multi: 1.330 ± 0.014M/s
> # benchs/run_bench_trigger.sh
> kretprobe : 1.256 ± 0.010M/s
> kretprobe-multi: 1.412 ± 0.003M/s
>
> Hmm, in my case, it seems smaller differences (~3%?).
> I attached perf report results for those, but I don't see large difference.

I ran my benchmarks on a bare-metal machine (and quite a powerful one at
that; you can see my numbers are almost 10x yours), with mitigations
disabled, no retpolines, etc. If you have any of those mitigations
enabled, that would probably make the differences smaller. If you are
running inside QEMU/a VM, the results might differ significantly as well.

>
> > > >
> > > > BASELINE
> > > > ========
> > > > kprobe : 24.634 ± 0.205M/s
> > > > kprobe-multi : 28.898 ± 0.531M/s
> > > > kretprobe : 10.478 ± 0.015M/s
> > > > kretprobe-multi: 11.012 ± 0.063M/s
> > > >
> > > > THIS PATCH SET ON TOP
> > > > =====================
> > > > kprobe : 25.144 ± 0.027M/s (+2%)
> > > > kprobe-multi : 28.909 ± 0.074M/s
> > > > kretprobe : 9.482 ± 0.008M/s (-9.5%)
> > > > kretprobe-multi: 13.688 ± 0.027M/s (+24%)
> > >
> > > This looks good. Kretprobe should also use kretprobe-multi (fprobe)
> > > eventually because it should be a single callback version of
> > > kretprobe-multi.
>
> I ran another benchmark (prctl loop, attached), the origin kernel result is here;
>
> # sh ./benchmark.sh
> count = 10000000, took 6.748133 sec
>
> And the patched kernel result;
>
> # sh ./benchmark.sh
> count = 10000000, took 6.644095 sec
>
> I confirmed that the perf result has no big difference.
>
> Thank you,
>
>
> > >
> > > >
> > > > These numbers are pretty stable and look to be more or less representative.
> > > >
> > > > As you can see, kprobes got a bit faster, kprobe-multi seems to be
> > > > about the same, though.
> > > >
> > > > Then (I suppose they are "legacy") kretprobes got quite noticeably
> > > > slower, almost by 10%. Not sure why, but looks real after re-running
> > > > benchmarks a bunch of times and getting stable results.
> > >
> > > Hmm, kretprobe on x86 should use ftrace + rethook even with my series.
> > > So nothing should be changed. Maybe cache access pattern has been
> > > changed?
> > > I'll check it with tracefs (to remove the effect from bpf related changes)
> > >
> > > >
> > > > On the other hand, multi-kretprobes got significantly faster (+24%!).
> > > > Again, I don't know if it is expected or not, but it's a nice
> > > > improvement.
> > >
> > > Thanks!
> > >
> > > >
> > > > If you have any idea why kretprobes would get so much slower, it would
> > > > be nice to look into that and see if you can mitigate the regression
> > > > somehow. Thanks!
> > >
> > > OK, let me check it.
> > >
> > > Thank you!
> > >
> > > >
> > > >
> > > > > 51 files changed, 2325 insertions(+), 882 deletions(-)
> > > > > create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe_repeat.tc
> > > > >
> > > > > --
> > > > > Masami Hiramatsu (Google) <[email protected]>
> > > > >
> > >
> > >
> > > --
> > > Masami Hiramatsu (Google) <[email protected]>
>
>
> --
> Masami Hiramatsu (Google) <[email protected]>