2023-03-20 13:18:24

by pengdonglin

[permalink] [raw]
Subject: [PATCH v5 0/2] function_graph: Support recording and printing the return value of function

When using the function_graph tracer to analyze system call failures,
it can be time-consuming to analyze the trace logs and locate the kernel
function that first returns an error. This change aims to simplify the
process by recording the function return value to the 'retval' member of
'ftrace_graph_ent' and printing it when outputing the trace log.

Note that even if a function's return type is void, a return value will
still be printed, so it should be ignored. If you care about this, the
BTF file can be used to obtain the details of function return type. We
can implement a tool to process the trace log and display the return
value based on its actual type.

Here is an example:

...

1) | cgroup_attach_task() {
1) | cgroup_migrate_add_src() {
1) 1.403 us | cset_cgroup_from_root(); /* = 0xffff93fc86f58010 */
1) 2.154 us | } /* cgroup_migrate_add_src = 0xffffb286c1297d00 */
1) ! 386.538 us | cgroup_migrate_prepare_dst(); /* = 0x0 */
1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */
1) | cgroup_migrate_finish() {
1) 0.411 us | put_css_set_locked(); /* = 0x8 */
1) + 62.397 us | put_css_set_locked(); /* = 0x80000001 */
1) + 64.742 us | } /* cgroup_migrate_finish = 0x80000000 */
1) ! 465.605 us | } /* cgroup_attach_task = -22 */

...

After processing the above trace logs using BTF information:

...

1) | cgroup_attach_task() {
1) | cgroup_migrate_add_src() {
1) 1.403 us | cset_cgroup_from_root(); /* = 0xffff93fc86f58010 */
1) 2.154 us | } /* cgroup_migrate_add_src */
1) ! 386.538 us | cgroup_migrate_prepare_dst(); /* = 0 */
1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task();
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */
1) | cgroup_migrate_finish() {
1) 0.411 us | put_css_set_locked();
1) + 62.397 us | put_css_set_locked();
1) + 64.742 us | } /* cgroup_migrate_finish */
1) ! 465.605 us | } /* cgroup_attach_task = -22 */

...

Donglin Peng (2):
function_graph: Support recording and printing the return value of
function
tracing: Add documentation for funcgraph-retval and graph_retval_hex

Documentation/trace/ftrace.rst | 75 ++++++++++++++++++++++
arch/arm/Kconfig | 1 +
arch/arm/kernel/entry-ftrace.S | 8 +++
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/entry-ftrace.S | 8 +++
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/mcount.S | 12 +++-
arch/x86/Kconfig | 1 +
arch/x86/kernel/ftrace_32.S | 8 +++
arch/x86/kernel/ftrace_64.S | 10 +++
include/linux/ftrace.h | 3 +
kernel/trace/Kconfig | 8 +++
kernel/trace/fgraph.c | 25 +++++++-
kernel/trace/trace.h | 2 +
kernel/trace/trace_entries.h | 26 ++++++++
kernel/trace/trace_functions_graph.c | 95 +++++++++++++++++++++++++---
16 files changed, 272 insertions(+), 12 deletions(-)

--
2.25.1



2023-03-20 13:18:29

by pengdonglin

[permalink] [raw]
Subject: [PATCH v5 2/2] tracing: Add documentation for funcgraph-retval and graph_retval_hex

Add documentation for the two newly introduced options for the
function_graph tracer. The new option funcgraph-retval is used to
control whether or not to display the return value. The new option
graph_retval_hex is used to control the display format of the return
value.

Signed-off-by: Donglin Peng <[email protected]>
---
v5:
- Describe the limitations of funcgraph-retval
---
Documentation/trace/ftrace.rst | 75 ++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)

diff --git a/Documentation/trace/ftrace.rst b/Documentation/trace/ftrace.rst
index b927fb2b94dc..797841d15da9 100644
--- a/Documentation/trace/ftrace.rst
+++ b/Documentation/trace/ftrace.rst
@@ -1328,6 +1328,19 @@ Options for function_graph tracer:
only a closing curly bracket "}" is displayed for
the return of a function.

+ funcgraph-retval
+ When set, the return event will include the function and
+ its return value. Note that even if the function return
+ type is void, a return value will also be printed, you
+ should ignore it. This option is off by default.
+
+ graph_retval_hex
+ Depend on function-retval. When set, the function return
+ value will be printed in hexadecimal format. If this is not
+ set and the return value looks like an error code, it will
+ be printed in signed decimal format, else in hexadecimal
+ format. This option is off by default.
+
sleep-time
When running function graph tracer, to include
the time a task schedules out in its function.
@@ -2673,6 +2686,68 @@ It is default disabled.
0) 1.757 us | } /* kmem_cache_free() */
0) 2.861 us | } /* putname() */

+The return value of each traced function can be displayed. This
+feature can be very useful when you encounter system call failures
+and want to locate the function that return errors firstly.
+
+ - hide: echo nofuncgraph-retval > trace_options
+ - show: echo funcgraph-retval > trace_options
+
+ Example with funcgraph-retval::
+
+ 1) | cgroup_migrate() {
+ 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
+ 1) | cgroup_migrate_execute() {
+ 1) | cpu_cgroup_can_attach() {
+ 1) | cgroup_taskset_first() {
+ 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
+ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
+ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
+ 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
+ 1) 4.369 us | } /* cgroup_migrate_execute = -22 */
+ 1) 7.143 us | } /* cgroup_migrate = -22 */
+
+The above example shows that the cgroup function cpu_cgroup_can_attach
+returned the error code -22 firstly, then we can read the code of this
+function to get the root cause.
+
+If the option graph_retval_hex is not set and the return value looks
+like an error code, it will be printed in signed decimal format. In
+other cases, it will be printed in hexadecimal format.
+
+ - hexadecimal or signed decimal: echo nograph_retval_hex > trace_options
+ - hexadecimal: echo graph_retval_hex > trace_options
+
+ Example with graph_retval_hex::
+
+ 1) | cgroup_migrate() {
+ 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
+ 1) | cgroup_migrate_execute() {
+ 1) | cpu_cgroup_can_attach() {
+ 1) | cgroup_taskset_first() {
+ 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
+ 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
+ 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
+ 1) 2.335 us | } /* cpu_cgroup_can_attach = 0xffffffea */
+ 1) 4.369 us | } /* cgroup_migrate_execute = 0xffffffea */
+ 1) 7.143 us | } /* cgroup_migrate = 0xffffffea */
+
+There are some limitations when using the funcgraph-retval currently:
+
+- Even if the function return type is void, a return value will still
+ be printed, and you can just ignore it.
+
+- Even if the return value is not an error code actually, it may be
+ displayed as an error code. You should read the code to check.
+ For example, both 0xfe and 0xfffe are be interpreted as -2.
+
+- Only the value of the first return register will be recorded and
+ printed even if the return values may be stored in two registers
+ actually. For example, both the eax and edx are used to store a
+ 64 bit return value in the x86 architecture, and the eax stores
+ the low 32 bit, the edx stores the high 32 bit, however only the
+ value stored in eax will be recorded and printed.
+
You can put some comments on specific functions by using
trace_printk() For example, if you want to put a comment inside
the __might_sleep() function, you just have to include
--
2.25.1


2023-03-20 13:18:33

by pengdonglin

[permalink] [raw]
Subject: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

When using the function_graph tracer to analyze system call failures,
it can be time-consuming to analyze the trace logs and locate the kernel
function that first returns an error. This change aims to simplify the
process by recording the function return value to the 'retval' member of
'ftrace_graph_ent' and printing it when outputing the trace log.

New trace options are introduced: funcgraph-retval and graph_retval_hex.
The former is used to control whether to display the return value, while
the latter is used to control the display format of the reutrn value.

Note that even if a function's return type is void, a return value will
still be printed, so just ignore it.

Currently, this modification supports the following commonly used
processor architectures: x86_64, x86, arm64, arm, riscv.

Here is an example:

I want to attach the demo process to a cpu cgroup, but it failed:

echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
-bash: echo: write error: Invalid argument

The strace logs tells that the write system call returned -EINVAL(-22):
...
write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
...

Use the following commands to capture trace logs when calling the write
system call:

cd /sys/kernel/debug/tracing/
echo 0 > tracing_on
echo > trace
echo *sys_write > set_graph_function
echo *spin* > set_graph_notrace
echo *rcu* >> set_graph_notrace
echo *alloc* >> set_graph_notrace
echo preempt* >> set_graph_notrace
echo kfree* >> set_graph_notrace
echo $$ > set_ftrace_pid
echo function_graph > current_tracer
echo 1 > tracing_on
echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
echo 0 > tracing_on
echo 1 > options/funcgraph-retval
cat trace > ~/trace.log

Search the error code -22 directly in the file trace.log and find the
first function that return -22, then read the function code to get the
root cause.

...

1) | cgroup_migrate() {
1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
1) | cgroup_migrate_execute() {
1) | cpu_cgroup_can_attach() {
1) | cgroup_taskset_first() {
1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
1) 4.369 us | } /* cgroup_migrate_execute = -22 */
1) 7.143 us | } /* cgroup_migrate = -22 */

...

Signed-off-by: Donglin Peng <[email protected]>
---
v5:
- Pass both the return values to ftrace_return_to_handler
- Modify the parameter sequence of ftrace_return_to_handler to
decrease the modification of assembly code, thanks to Russell King
- Wrap __ftrace_return_to_handler with ftrace_return_to_handler
for compatible

v4:
- Modify commit message
- Introduce new option graph_retval_hex to control display format
- Introduce macro CONFIG_FUNCTION_GRAPH_RETVAL and
CONFIG_HAVE_FUNCTION_GRAPH_RETVAL
- add each arch maintainers to review

v3:
- Modify the commit message: add trace logs processed with the btf tool

v2:
- Modify the commit message: use BTF to get the return type of function
---
arch/arm/Kconfig | 1 +
arch/arm/kernel/entry-ftrace.S | 8 +++
arch/arm64/Kconfig | 1 +
arch/arm64/kernel/entry-ftrace.S | 8 +++
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/mcount.S | 12 +++-
arch/x86/Kconfig | 1 +
arch/x86/kernel/ftrace_32.S | 8 +++
arch/x86/kernel/ftrace_64.S | 10 +++
include/linux/ftrace.h | 3 +
kernel/trace/Kconfig | 8 +++
kernel/trace/fgraph.c | 25 +++++++-
kernel/trace/trace.h | 2 +
kernel/trace/trace_entries.h | 26 ++++++++
kernel/trace/trace_functions_graph.c | 95 +++++++++++++++++++++++++---
15 files changed, 197 insertions(+), 12 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e24a9820e12f..ad03fc868f34 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -99,6 +99,7 @@ config ARM
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER if !XIP_KERNEL
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT if PERF_EVENTS && (CPU_V6 || CPU_V6K || CPU_V7)
diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
index 3e7bcaca5e07..ba1986e27af8 100644
--- a/arch/arm/kernel/entry-ftrace.S
+++ b/arch/arm/kernel/entry-ftrace.S
@@ -258,7 +258,15 @@ ENDPROC(ftrace_graph_regs_caller)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
ENTRY(return_to_handler)
stmdb sp!, {r0-r3}
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ /*
+ * Pass both the function return values in the register r0 and r1
+ * to ftrace_return_to_handler
+ */
+ add r2, sp, #16 @ sp at exit of instrumented routine
+#else
add r0, sp, #16 @ sp at exit of instrumented routine
+#endif
bl ftrace_return_to_handler
mov lr, r0 @ r0 has real ret addr
ldmia sp!, {r0-r3}
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1023e896d46b..5ad53d565f2b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -196,6 +196,7 @@ config ARM64
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_ERROR_INJECTION
select HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_IOREMAP_PROT
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
index 350ed81324ac..d1a5d76e6d72 100644
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -276,7 +276,15 @@ SYM_CODE_START(return_to_handler)
stp x4, x5, [sp, #32]
stp x6, x7, [sp, #48]

+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ /*
+ * Pass both the function return values in the register x0 and x1
+ * to ftrace_return_to_handler.
+ */
+ mov x2, x29 // parent's fp
+#else
mov x0, x29 // parent's fp
+#endif
bl ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
mov x30, x0 // restore the original return address

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c5e42cc37604..3efa4f645a39 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -139,6 +139,7 @@ config RISCV
select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD if !XIP_KERNEL
select HAVE_FUNCTION_GRAPH_TRACER
+ select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION

config ARCH_MMAP_RND_BITS_MIN
diff --git a/arch/riscv/kernel/mcount.S b/arch/riscv/kernel/mcount.S
index 30102aadc4d7..bba2f3835592 100644
--- a/arch/riscv/kernel/mcount.S
+++ b/arch/riscv/kernel/mcount.S
@@ -69,8 +69,18 @@ ENTRY(return_to_handler)
mv t6, s0
#endif
SAVE_RET_ABI_STATE
-#ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ /*
+ * Pass both the function return values in the register a0 and a1
+ * to ftrace_return_to_handler
+ */
+ #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
+ mv a2, t6
+ #endif
+#else
+ #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
mv a0, t6
+ #endif
#endif
call ftrace_return_to_handler
mv a2, a0
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a825bf031f49..af80f3c75a54 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -215,6 +215,7 @@ config X86
select HAVE_FENTRY if X86_64 || DYNAMIC_FTRACE
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_GRAPH_TRACER if X86_32 || (X86_64 && DYNAMIC_FTRACE)
+ select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER
select HAVE_FUNCTION_TRACER
select HAVE_GCC_PLUGINS
select HAVE_HW_BREAKPOINT
diff --git a/arch/x86/kernel/ftrace_32.S b/arch/x86/kernel/ftrace_32.S
index a0ed0e4a2c0c..ae8f215d9820 100644
--- a/arch/x86/kernel/ftrace_32.S
+++ b/arch/x86/kernel/ftrace_32.S
@@ -184,7 +184,15 @@ SYM_CODE_END(ftrace_graph_caller)
return_to_handler:
pushl %eax
pushl %edx
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ /*
+ * Pass both the function return values in the register eax and edx
+ * to ftrace_return_to_handler
+ */
+ movl $0, %ecx
+#else
movl $0, %eax
+#endif
call ftrace_return_to_handler
movl %eax, %ecx
popl %edx
diff --git a/arch/x86/kernel/ftrace_64.S b/arch/x86/kernel/ftrace_64.S
index 1265ad519249..2eefa82814dc 100644
--- a/arch/x86/kernel/ftrace_64.S
+++ b/arch/x86/kernel/ftrace_64.S
@@ -347,7 +347,17 @@ SYM_CODE_START(return_to_handler)
/* Save the return values */
movq %rax, (%rsp)
movq %rdx, 8(%rsp)
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ /*
+ * Pass both the function return values in the register rax and rdx
+ * to ftrace_return_to_handler
+ */
+ movq %rax, %rdi
+ movq %rdx, %rsi
+ movq %rbp, %rdx
+#else
movq %rbp, %rdi
+#endif

call ftrace_return_to_handler

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 366c730beaa3..be662a11cd13 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -1032,6 +1032,9 @@ struct ftrace_graph_ent {
*/
struct ftrace_graph_ret {
unsigned long func; /* Current function */
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ unsigned long retval;
+#endif
int depth;
/* Number of functions that overran the depth limit for current task */
unsigned int overrun;
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index a856d4a34c67..319eb6dc2250 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -31,6 +31,9 @@ config HAVE_FUNCTION_GRAPH_TRACER
help
See Documentation/trace/ftrace-design.rst

+config HAVE_FUNCTION_GRAPH_RETVAL
+ bool
+
config HAVE_DYNAMIC_FTRACE
bool
help
@@ -227,6 +230,11 @@ config FUNCTION_GRAPH_TRACER
the return value. This is done by setting the current return
address on the current task structure into a stack of calls.

+config FUNCTION_GRAPH_RETVAL
+ def_bool y
+ depends on HAVE_FUNCTION_GRAPH_RETVAL
+ depends on FUNCTION_GRAPH_TRACER
+
config DYNAMIC_FTRACE
bool "enable/disable function tracing dynamically"
depends on FUNCTION_TRACER
diff --git a/kernel/trace/fgraph.c b/kernel/trace/fgraph.c
index 218cd95bf8e4..f40b10fc6561 100644
--- a/kernel/trace/fgraph.c
+++ b/kernel/trace/fgraph.c
@@ -238,14 +238,24 @@ static struct notifier_block ftrace_suspend_notifier = {

/*
* Send the trace to the ring-buffer.
+ *
+ * For double register return, need to pass both register values.
+ * However only the first one is saved currently.
+ *
+ * @retval_1st: the first return value
+ * @retval_2nd: the second return value
* @return the original return address.
*/
-unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
+static unsigned long __ftrace_return_to_handler(unsigned long retval_1st,
+ unsigned long retval_2nd, unsigned long frame_pointer)
{
struct ftrace_graph_ret trace;
unsigned long ret;

ftrace_pop_return_trace(&trace, &ret, frame_pointer);
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ trace.retval = retval_1st;
+#endif
trace.rettime = trace_clock_local();
ftrace_graph_return(&trace);
/*
@@ -266,6 +276,19 @@ unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
return ret;
}

+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+unsigned long ftrace_return_to_handler(unsigned long retval_1st,
+ unsigned long retval_2nd, unsigned long frame_pointer)
+{
+ return __ftrace_return_to_handler(retval_1st, retval_2nd, frame_pointer);
+}
+#else
+unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
+{
+ return __ftrace_return_to_handler(0, 0, frame_pointer);
+}
+#endif
+
/**
* ftrace_graph_get_ret_stack - return the entry of the shadow stack
* @task: The task to read the shadow stack from
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 616e1aa1c4da..0a3c4582df10 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -831,6 +831,8 @@ static __always_inline bool ftrace_hash_empty(struct ftrace_hash *hash)
#define TRACE_GRAPH_PRINT_TAIL 0x100
#define TRACE_GRAPH_SLEEP_TIME 0x200
#define TRACE_GRAPH_GRAPH_TIME 0x400
+#define TRACE_GRAPH_PRINT_RETVAL 0x800
+#define TRACE_GRAPH_PRINT_RETVAL_HEX 0x1000
#define TRACE_GRAPH_PRINT_FILL_SHIFT 28
#define TRACE_GRAPH_PRINT_FILL_MASK (0x3 << TRACE_GRAPH_PRINT_FILL_SHIFT)

diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index cd41e863b51c..340b2fa98218 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -86,6 +86,30 @@ FTRACE_ENTRY_PACKED(funcgraph_entry, ftrace_graph_ent_entry,
);

/* Function return entry */
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+
+FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
+
+ TRACE_GRAPH_RET,
+
+ F_STRUCT(
+ __field_struct( struct ftrace_graph_ret, ret )
+ __field_packed( unsigned long, ret, func )
+ __field_packed( unsigned long, ret, retval )
+ __field_packed( int, ret, depth )
+ __field_packed( unsigned int, ret, overrun )
+ __field_packed( unsigned long long, ret, calltime)
+ __field_packed( unsigned long long, ret, rettime )
+ ),
+
+ F_printk("<-- %ps (%d) (start: %llx end: %llx) over: %d retval: %lx",
+ (void *)__entry->func, __entry->depth,
+ __entry->calltime, __entry->rettime,
+ __entry->depth, __entry->retval)
+);
+
+#else
+
FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,

TRACE_GRAPH_RET,
@@ -105,6 +129,8 @@ FTRACE_ENTRY_PACKED(funcgraph_exit, ftrace_graph_ret_entry,
__entry->depth)
);

+#endif
+
/*
* Context switch trace entry - which task (and prio) we switched from/to:
*
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 203204cadf92..e26f548a4d75 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -58,6 +58,12 @@ static struct tracer_opt trace_opts[] = {
{ TRACER_OPT(funcgraph-irqs, TRACE_GRAPH_PRINT_IRQS) },
/* Display function name after trailing } */
{ TRACER_OPT(funcgraph-tail, TRACE_GRAPH_PRINT_TAIL) },
+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+ /* Display function return value ? */
+ { TRACER_OPT(funcgraph-retval, TRACE_GRAPH_PRINT_RETVAL) },
+ /* Display function return value in hexadecimal format ? */
+ { TRACER_OPT(graph_retval_hex, TRACE_GRAPH_PRINT_RETVAL_HEX) },
+#endif
/* Include sleep time (scheduled out) between entry and return */
{ TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) },

@@ -619,6 +625,58 @@ print_graph_duration(struct trace_array *tr, unsigned long long duration,
trace_seq_puts(s, "| ");
}

+#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
+
+#define __TRACE_GRAPH_PRINT_RETVAL TRACE_GRAPH_PRINT_RETVAL
+
+static void print_graph_retval(struct trace_seq *s, unsigned long retval,
+ bool leaf, void *func, bool hex_format)
+{
+ unsigned long err_code = 0;
+
+ if (retval == 0 || hex_format)
+ goto done;
+
+ /* Guess whether the retval looks like an error code */
+ if ((retval & BIT(7)) && (retval >> 8) == 0)
+ err_code = (unsigned long)(s8)retval;
+ else if ((retval & BIT(15)) && (retval >> 16) == 0)
+ err_code = (unsigned long)(s16)retval;
+ else if ((retval & BIT(31)) && (((u64)retval) >> 32) == 0)
+ err_code = (unsigned long)(s32)retval;
+ else
+ err_code = retval;
+
+ if (!IS_ERR_VALUE(err_code))
+ err_code = 0;
+
+done:
+ if (leaf) {
+ if (hex_format || (err_code == 0))
+ trace_seq_printf(s, "%ps(); /* = 0x%lx */\n",
+ func, retval);
+ else
+ trace_seq_printf(s, "%ps(); /* = %ld */\n",
+ func, err_code);
+ } else {
+ if (hex_format || (err_code == 0))
+ trace_seq_printf(s, "} /* %ps = 0x%lx */\n",
+ func, retval);
+ else
+ trace_seq_printf(s, "} /* %ps = %ld */\n",
+ func, err_code);
+ }
+}
+
+#else
+
+#define __TRACE_GRAPH_PRINT_RETVAL 0
+
+#define print_graph_retval(_seq, _retval, _leaf, _func, _format) \
+ do {} while (0)
+
+#endif
+
/* Case of a leaf function on its call entry */
static enum print_line_t
print_graph_entry_leaf(struct trace_iterator *iter,
@@ -663,7 +721,15 @@ print_graph_entry_leaf(struct trace_iterator *iter,
for (i = 0; i < call->depth * TRACE_GRAPH_INDENT; i++)
trace_seq_putc(s, ' ');

- trace_seq_printf(s, "%ps();\n", (void *)call->func);
+ /*
+ * Write out the function return value if the option function-retval is
+ * enabled.
+ */
+ if (flags & __TRACE_GRAPH_PRINT_RETVAL)
+ print_graph_retval(s, graph_ret->retval, true, (void *)call->func,
+ !!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
+ else
+ trace_seq_printf(s, "%ps();\n", (void *)call->func);

print_graph_irq(iter, graph_ret->func, TRACE_GRAPH_RET,
cpu, iter->ent->pid, flags);
@@ -942,16 +1008,25 @@ print_graph_return(struct ftrace_graph_ret *trace, struct trace_seq *s,
trace_seq_putc(s, ' ');

/*
- * If the return function does not have a matching entry,
- * then the entry was lost. Instead of just printing
- * the '}' and letting the user guess what function this
- * belongs to, write out the function name. Always do
- * that if the funcgraph-tail option is enabled.
+ * Always write out the function name and its return value if the
+ * function-retval option is enabled.
*/
- if (func_match && !(flags & TRACE_GRAPH_PRINT_TAIL))
- trace_seq_puts(s, "}\n");
- else
- trace_seq_printf(s, "} /* %ps */\n", (void *)trace->func);
+ if (flags & __TRACE_GRAPH_PRINT_RETVAL) {
+ print_graph_retval(s, trace->retval, false, (void *)trace->func,
+ !!(flags & TRACE_GRAPH_PRINT_RETVAL_HEX));
+ } else {
+ /*
+ * If the return function does not have a matching entry,
+ * then the entry was lost. Instead of just printing
+ * the '}' and letting the user guess what function this
+ * belongs to, write out the function name. Always do
+ * that if the funcgraph-tail option is enabled.
+ */
+ if (func_match && !(flags & TRACE_GRAPH_PRINT_TAIL))
+ trace_seq_puts(s, "}\n");
+ else
+ trace_seq_printf(s, "} /* %ps */\n", (void *)trace->func);
+ }

/* Overrun */
if (flags & TRACE_GRAPH_PRINT_OVERRUN)
--
2.25.1


2023-03-21 02:31:57

by Ding Hui

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] tracing: Add documentation for funcgraph-retval and graph_retval_hex

On 2023/3/20 21:16, Donglin Peng wrote:

> +There are some limitations when using the funcgraph-retval currently:
> +
> +- Even if the function return type is void, a return value will still
> + be printed, and you can just ignore it.
> +
> +- Even if the return value is not an error code actually, it may be
> + displayed as an error code. You should read the code to check.
> + For example, both 0xfe and 0xfffe are be interpreted as -2.

For char and short types, displaying as signed decimal may be not
appropriate, because they are rarely used to store error code.

So in "smart" mode (graph_retval_hex=0), I suggest just smart convert
error value stored in int or pointer to signed decimal.

> +- Only the value of the first return register will be recorded and
> + printed even if the return values may be stored in two registers
> + actually. For example, both the eax and edx are used to store a
> + 64 bit return value in the x86 architecture, and the eax stores
> + the low 32 bit, the edx stores the high 32 bit, however only the
> + value stored in eax will be recorded and printed.
> +
> You can put some comments on specific functions by using
> trace_printk() For example, if you want to put a comment inside
> the __might_sleep() function, you just have to include

--
Thanks,
- Ding Hui


2023-03-21 03:25:03

by pengdonglin

[permalink] [raw]
Subject: Re: [PATCH v5 2/2] tracing: Add documentation for funcgraph-retval and graph_retval_hex

On 2023/3/21 10:31, Ding Hui wrote:
> On 2023/3/20 21:16, Donglin Peng wrote:
>
>> +There are some limitations when using the funcgraph-retval currently:
>> +
>> +- Even if the function return type is void, a return value will still
>> +  be printed, and you can just ignore it.
>> +
>> +- Even if the return value is not an error code actually, it may be
>> +  displayed as an error code. You should read the code to check.
>> +  For example, both 0xfe and 0xfffe are be interpreted as -2.
>
> For char and short types, displaying as signed decimal may be not
> appropriate, because they are rarely used to store error code.
>
> So in "smart" mode (graph_retval_hex=0), I suggest just smart convert
> error value stored in int or pointer to signed decimal.

Yeah, I will remove the return value conversion for char and short
types in the "smart" mode.

>
>> +- Only the value of the first return register will be recorded and
>> +  printed even if the return values may be stored in two registers
>> +  actually. For example, both the eax and edx are used to store a
>> +  64 bit return value in the x86 architecture, and the eax stores
>> +  the low 32 bit, the edx stores the high 32 bit, however only the
>> +  value stored in eax will be recorded and printed.
>> +
>>   You can put some comments on specific functions by using
>>   trace_printk() For example, if you want to put a comment inside
>>   the __might_sleep() function, you just have to include
>


2023-03-21 14:10:03

by Florian Kauer

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On 20.03.23 14:16, Donglin Peng wrote:
> When using the function_graph tracer to analyze system call failures,
> it can be time-consuming to analyze the trace logs and locate the kernel
> function that first returns an error. This change aims to simplify the
> process by recording the function return value to the 'retval' member of
> 'ftrace_graph_ent' and printing it when outputing the trace log.

I just came across your patch by pure luck and it helped me a lot
to trace down a problem I had, thanks!

So you can have my
Tested-by: Florian Kauer <[email protected]>

> New trace options are introduced: funcgraph-retval and graph_retval_hex.

I would personally prefer to have the second option scoped better, so for example
"funcgraph-retval-hex".

Greetings,
Florian

2023-03-21 14:31:41

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On Mon, Mar 20, 2023 at 06:16:49AM -0700, Donglin Peng wrote:
> When using the function_graph tracer to analyze system call failures,
> it can be time-consuming to analyze the trace logs and locate the kernel
> function that first returns an error. This change aims to simplify the
> process by recording the function return value to the 'retval' member of
> 'ftrace_graph_ent' and printing it when outputing the trace log.
>
> New trace options are introduced: funcgraph-retval and graph_retval_hex.
> The former is used to control whether to display the return value, while
> the latter is used to control the display format of the reutrn value.
>
> Note that even if a function's return type is void, a return value will
> still be printed, so just ignore it.
>
> Currently, this modification supports the following commonly used
> processor architectures: x86_64, x86, arm64, arm, riscv.
>
> Here is an example:
>
> I want to attach the demo process to a cpu cgroup, but it failed:
>
> echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
> -bash: echo: write error: Invalid argument
>
> The strace logs tells that the write system call returned -EINVAL(-22):
> ...
> write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
> ...
>
> Use the following commands to capture trace logs when calling the write
> system call:
>
> cd /sys/kernel/debug/tracing/
> echo 0 > tracing_on
> echo > trace
> echo *sys_write > set_graph_function
> echo *spin* > set_graph_notrace
> echo *rcu* >> set_graph_notrace
> echo *alloc* >> set_graph_notrace
> echo preempt* >> set_graph_notrace
> echo kfree* >> set_graph_notrace
> echo $$ > set_ftrace_pid
> echo function_graph > current_tracer
> echo 1 > tracing_on
> echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
> echo 0 > tracing_on
> echo 1 > options/funcgraph-retval
> cat trace > ~/trace.log
>
> Search the error code -22 directly in the file trace.log and find the
> first function that return -22, then read the function code to get the
> root cause.
>
> ...
>
> 1) | cgroup_migrate() {
> 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
> 1) | cgroup_migrate_execute() {
> 1) | cpu_cgroup_can_attach() {
> 1) | cgroup_taskset_first() {
> 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
> 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
> 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
> 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
> 1) 4.369 us | } /* cgroup_migrate_execute = -22 */
> 1) 7.143 us | } /* cgroup_migrate = -22 */
>
> ...
>
> Signed-off-by: Donglin Peng <[email protected]>

> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
> index 350ed81324ac..d1a5d76e6d72 100644
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -276,7 +276,15 @@ SYM_CODE_START(return_to_handler)
> stp x4, x5, [sp, #32]
> stp x6, x7, [sp, #48]
>
> +#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
> + /*
> + * Pass both the function return values in the register x0 and x1
> + * to ftrace_return_to_handler.
> + */
> + mov x2, x29 // parent's fp
> +#else
> mov x0, x29 // parent's fp
> +#endif
> bl ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
> mov x30, x0 // restore the original return address


Please don't make the calling convention of the asm change depending on a
selectable config option.

We already store the regs here; I'd be happy to make that a struct ftrace_regs
and pass a pointer to that to C code. Then it's be easy to acquire the value
you want in the exact same way as upon entry, and it'll work even if we decide
to return a structure by value somewhere (as that can use registers x2-x7 too).

Thanks,
Mark.

2023-03-21 14:44:24

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On Tue, 21 Mar 2023 15:09:40 +0100
Florian Kauer <[email protected]> wrote:

> On 20.03.23 14:16, Donglin Peng wrote:
> > When using the function_graph tracer to analyze system call failures,
> > it can be time-consuming to analyze the trace logs and locate the kernel
> > function that first returns an error. This change aims to simplify the
> > process by recording the function return value to the 'retval' member of
> > 'ftrace_graph_ent' and printing it when outputing the trace log.
>
> I just came across your patch by pure luck and it helped me a lot
> to trace down a problem I had, thanks!
>
> So you can have my
> Tested-by: Florian Kauer <[email protected]>
>
> > New trace options are introduced: funcgraph-retval and graph_retval_hex.
>
> I would personally prefer to have the second option scoped better, so for example
> "funcgraph-retval-hex".

That could be an ftrace option.

Anyway, could you tell us your use case, and that could go into the change
log of this patch as "one use case that this helped with".

Thanks,

-- Steve


2023-03-21 15:11:28

by Florian Kauer

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On 21.03.23 15:44, Steven Rostedt wrote:
> On Tue, 21 Mar 2023 15:09:40 +0100
> Florian Kauer <[email protected]> wrote:
>
>> On 20.03.23 14:16, Donglin Peng wrote:
>>> When using the function_graph tracer to analyze system call failures,
>>> it can be time-consuming to analyze the trace logs and locate the kernel
>>> function that first returns an error. This change aims to simplify the
>>> process by recording the function return value to the 'retval' member of
>>> 'ftrace_graph_ent' and printing it when outputing the trace log.
>>
>> I just came across your patch by pure luck and it helped me a lot
>> to trace down a problem I had, thanks!
>>
>> So you can have my
>> Tested-by: Florian Kauer <[email protected]>
>>
>>> New trace options are introduced: funcgraph-retval and graph_retval_hex.
>>
>> I would personally prefer to have the second option scoped better, so for example
>> "funcgraph-retval-hex".
>
> That could be an ftrace option.

What do you mean? In the current implementation both funcgraph-retval and graph_retval_hex
are options for the function_graph tracer, but one is prefixed with "funcgraph-" as nearly
all other options for the function_graph tracer and one is not (and is even snake_case, while
the others are kebab-case). So it just looks inconsistent for me, but there might be a reason?

By the way: The documentation patch also references "function-retval" instead of
"funcgraph-retval" in the documentation of the graph_retval_hex option.

> Anyway, could you tell us your use case, and that could go into the change
> log of this patch as "one use case that this helped with".

Nothing spectacular. I just wanted to find out why ICMP port unreachable messages
sporadically lead to -111 (Connection Refused) for __sys_sendto() when IP_RECVERR is set
and the call never fails if IP_RECVERR is not set. (I am still unsure if this is
REALLY intended behavior, but at least it makes sense why this occurs when reading
the sources).

And with this patch, the -111 is directly popping up in the trace, but I do not think
that my missing knowledge about details of the kernel network stack
really qualifies as a good argument ;-)

Greetings,
Florian

2023-03-21 17:32:31

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On Mon, Mar 20, 2023 at 06:16:49AM -0700, Donglin Peng wrote:
> diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
> index 3e7bcaca5e07..ba1986e27af8 100644
> --- a/arch/arm/kernel/entry-ftrace.S
> +++ b/arch/arm/kernel/entry-ftrace.S
> @@ -258,7 +258,15 @@ ENDPROC(ftrace_graph_regs_caller)
> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> ENTRY(return_to_handler)
> stmdb sp!, {r0-r3}
> +#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
> + /*
> + * Pass both the function return values in the register r0 and r1
> + * to ftrace_return_to_handler
> + */
> + add r2, sp, #16 @ sp at exit of instrumented routine
> +#else
> add r0, sp, #16 @ sp at exit of instrumented routine
> +#endif
> bl ftrace_return_to_handler
...
> -unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
> +static unsigned long __ftrace_return_to_handler(unsigned long retval_1st,
> + unsigned long retval_2nd, unsigned long frame_pointer)
...
> +#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
> +unsigned long ftrace_return_to_handler(unsigned long retval_1st,
> + unsigned long retval_2nd, unsigned long frame_pointer)
> +{
> + return __ftrace_return_to_handler(retval_1st, retval_2nd, frame_pointer);
> +}
> +#else
> +unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
> +{
> + return __ftrace_return_to_handler(0, 0, frame_pointer);
> +}
> +#endif
> +

Hi,

To echo Mark's criticism, I also don't like this. I feel it would be
better if ftrace_return_to_handler() always took the same arguments
irrespective of the setting of CONFIG_FUNCTION_GRAPH_RETVAL.

On 32-bit ARM, we have to stack r0-r3 anyway to prevent the call to
ftrace_return_to_handler() corrupting the return value, and these
are the registers we need. So we might as well pass a pointer to
these stacked registers. Whether that's acceptable on other
architectures, I couldn't say.

Thanks.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-03-22 05:14:14

by pengdonglin

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On 2023/3/21 23:11, Florian Kauer wrote:
> On 21.03.23 15:44, Steven Rostedt wrote:
>> On Tue, 21 Mar 2023 15:09:40 +0100
>> Florian Kauer <[email protected]> wrote:
>>
>>> On 20.03.23 14:16, Donglin Peng wrote:
>>>> When using the function_graph tracer to analyze system call failures,
>>>> it can be time-consuming to analyze the trace logs and locate the kernel
>>>> function that first returns an error. This change aims to simplify the
>>>> process by recording the function return value to the 'retval' member of
>>>> 'ftrace_graph_ent' and printing it when outputing the trace log.
>>>
>>> I just came across your patch by pure luck and it helped me a lot
>>> to trace down a problem I had, thanks!
>>>
>>> So you can have my
>>> Tested-by: Florian Kauer <[email protected]>
>>>
>>>> New trace options are introduced: funcgraph-retval and graph_retval_hex.
>>>
>>> I would personally prefer to have the second option scoped better, so for example
>>> "funcgraph-retval-hex".
>>
>> That could be an ftrace option.
>
> What do you mean? In the current implementation both funcgraph-retval and graph_retval_hex
> are options for the function_graph tracer, but one is prefixed with "funcgraph-" as nearly
> all other options for the function_graph tracer and one is not (and is even snake_case, while
> the others are kebab-case). So it just looks inconsistent for me, but there might be a reason?

Agree, I also think "funcgraph-retval-hex" may look better.

Hi Hiramatsu, what do you think?

>
> By the way: The documentation patch also references "function-retval" instead of
> "funcgraph-retval" in the documentation of the graph_retval_hex option.

Thanks, it has been removed in v6.

>
>> Anyway, could you tell us your use case, and that could go into the change
>> log of this patch as "one use case that this helped with".
>
> Nothing spectacular. I just wanted to find out why ICMP port unreachable messages
> sporadically lead to -111 (Connection Refused) for __sys_sendto() when IP_RECVERR is set
> and the call never fails if IP_RECVERR is not set. (I am still unsure if this is
> REALLY intended behavior, but at least it makes sense why this occurs when reading
> the sources).
>
> And with this patch, the -111 is directly popping up in the trace, but I do not think
> that my missing knowledge about details of the kernel network stack
> really qualifies as a good argument ;-)
>
> Greetings,
> Florian

2023-03-22 08:50:10

by pengdonglin

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On 2023/3/21 22:24, Mark Rutland wrote:
> On Mon, Mar 20, 2023 at 06:16:49AM -0700, Donglin Peng wrote:
>> When using the function_graph tracer to analyze system call failures,
>> it can be time-consuming to analyze the trace logs and locate the kernel
>> function that first returns an error. This change aims to simplify the
>> process by recording the function return value to the 'retval' member of
>> 'ftrace_graph_ent' and printing it when outputing the trace log.
>>
>> New trace options are introduced: funcgraph-retval and graph_retval_hex.
>> The former is used to control whether to display the return value, while
>> the latter is used to control the display format of the reutrn value.
>>
>> Note that even if a function's return type is void, a return value will
>> still be printed, so just ignore it.
>>
>> Currently, this modification supports the following commonly used
>> processor architectures: x86_64, x86, arm64, arm, riscv.
>>
>> Here is an example:
>>
>> I want to attach the demo process to a cpu cgroup, but it failed:
>>
>> echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
>> -bash: echo: write error: Invalid argument
>>
>> The strace logs tells that the write system call returned -EINVAL(-22):
>> ...
>> write(1, "273\n", 4) = -1 EINVAL (Invalid argument)
>> ...
>>
>> Use the following commands to capture trace logs when calling the write
>> system call:
>>
>> cd /sys/kernel/debug/tracing/
>> echo 0 > tracing_on
>> echo > trace
>> echo *sys_write > set_graph_function
>> echo *spin* > set_graph_notrace
>> echo *rcu* >> set_graph_notrace
>> echo *alloc* >> set_graph_notrace
>> echo preempt* >> set_graph_notrace
>> echo kfree* >> set_graph_notrace
>> echo $$ > set_ftrace_pid
>> echo function_graph > current_tracer
>> echo 1 > tracing_on
>> echo `pidof demo` > /sys/fs/cgroup/cpu/test/tasks
>> echo 0 > tracing_on
>> echo 1 > options/funcgraph-retval
>> cat trace > ~/trace.log
>>
>> Search the error code -22 directly in the file trace.log and find the
>> first function that return -22, then read the function code to get the
>> root cause.
>>
>> ...
>>
>> 1) | cgroup_migrate() {
>> 1) 0.651 us | cgroup_migrate_add_task(); /* = 0xffff93fcfd346c00 */
>> 1) | cgroup_migrate_execute() {
>> 1) | cpu_cgroup_can_attach() {
>> 1) | cgroup_taskset_first() {
>> 1) 0.732 us | cgroup_taskset_next(); /* = 0xffff93fc8fb20000 */
>> 1) 1.232 us | } /* cgroup_taskset_first = 0xffff93fc8fb20000 */
>> 1) 0.380 us | sched_rt_can_attach(); /* = 0x0 */
>> 1) 2.335 us | } /* cpu_cgroup_can_attach = -22 */
>> 1) 4.369 us | } /* cgroup_migrate_execute = -22 */
>> 1) 7.143 us | } /* cgroup_migrate = -22 */
>>
>> ...
>>
>> Signed-off-by: Donglin Peng <[email protected]>
>
>> diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
>> index 350ed81324ac..d1a5d76e6d72 100644
>> --- a/arch/arm64/kernel/entry-ftrace.S
>> +++ b/arch/arm64/kernel/entry-ftrace.S
>> @@ -276,7 +276,15 @@ SYM_CODE_START(return_to_handler)
>> stp x4, x5, [sp, #32]
>> stp x6, x7, [sp, #48]
>>
>> +#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
>> + /*
>> + * Pass both the function return values in the register x0 and x1
>> + * to ftrace_return_to_handler.
>> + */
>> + mov x2, x29 // parent's fp
>> +#else
>> mov x0, x29 // parent's fp
>> +#endif
>> bl ftrace_return_to_handler// addr = ftrace_return_to_hander(fp);
>> mov x30, x0 // restore the original return address
>
>
> Please don't make the calling convention of the asm change depending on a
> selectable config option.
>
> We already store the regs here; I'd be happy to make that a struct ftrace_regs
> and pass a pointer to that to C code. Then it's be easy to acquire the value
> you want in the exact same way as upon entry, and it'll work even if we decide
> to return a structure by value somewhere (as that can use registers x2-x7 too).

Agree, I think we only need to store return registers, but ftrace_regs
has too many members. So we can introduce a new structure called
pt_ret_regs which only contains return registers and frame pointer
register, for ARM64, they are x0~x7 and x29. Then pass the pointer to
pt_ret_regs to ftrace_return_to_handler.

>
> Thanks,
> Mark.

2023-03-22 09:06:30

by pengdonglin

[permalink] [raw]
Subject: Re: [PATCH v5 1/2] function_graph: Support recording and printing the return value of function

On 2023/3/22 1:31, Russell King (Oracle) wrote:
> On Mon, Mar 20, 2023 at 06:16:49AM -0700, Donglin Peng wrote:
>> diff --git a/arch/arm/kernel/entry-ftrace.S b/arch/arm/kernel/entry-ftrace.S
>> index 3e7bcaca5e07..ba1986e27af8 100644
>> --- a/arch/arm/kernel/entry-ftrace.S
>> +++ b/arch/arm/kernel/entry-ftrace.S
>> @@ -258,7 +258,15 @@ ENDPROC(ftrace_graph_regs_caller)
>> #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>> ENTRY(return_to_handler)
>> stmdb sp!, {r0-r3}
>> +#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
>> + /*
>> + * Pass both the function return values in the register r0 and r1
>> + * to ftrace_return_to_handler
>> + */
>> + add r2, sp, #16 @ sp at exit of instrumented routine
>> +#else
>> add r0, sp, #16 @ sp at exit of instrumented routine
>> +#endif
>> bl ftrace_return_to_handler
> ...
>> -unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
>> +static unsigned long __ftrace_return_to_handler(unsigned long retval_1st,
>> + unsigned long retval_2nd, unsigned long frame_pointer)
> ...
>> +#ifdef CONFIG_FUNCTION_GRAPH_RETVAL
>> +unsigned long ftrace_return_to_handler(unsigned long retval_1st,
>> + unsigned long retval_2nd, unsigned long frame_pointer)
>> +{
>> + return __ftrace_return_to_handler(retval_1st, retval_2nd, frame_pointer);
>> +}
>> +#else
>> +unsigned long ftrace_return_to_handler(unsigned long frame_pointer)
>> +{
>> + return __ftrace_return_to_handler(0, 0, frame_pointer);
>> +}
>> +#endif
>> +
>
> Hi,
>
> To echo Mark's criticism, I also don't like this. I feel it would be
> better if ftrace_return_to_handler() always took the same arguments
> irrespective of the setting of CONFIG_FUNCTION_GRAPH_RETVAL.
>
> On 32-bit ARM, we have to stack r0-r3 anyway to prevent the call to
> ftrace_return_to_handler() corrupting the return value, and these
> are the registers we need. So we might as well pass a pointer to
> these stacked registers. Whether that's acceptable on other
> architectures, I couldn't say.

Agree, I think we can introduce a new structure called pt_ret_regs for
each relevant architecture. The pt_ret_regs should only contain the
return registers and the frame pointer register, for arm, they are r0~r3
and r11.Then we can pass the pointer to the pt_ret_regs to
ftrace_return_to_handler.

>
> Thanks.
>