2021-08-12 20:27:22

by Madhavan T. Venkataraman

Subject: [RFC PATCH v8 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks

From: "Madhavan T. Venkataraman" <[email protected]>

Make all stack walking functions use arch_stack_walk()
======================================================

Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame(). Convert all of
them to use arch_stack_walk(). This makes maintenance easier.
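
For illustration, each converted caller ends up with a small callback plus a
single call to arch_stack_walk(). The sketch below is based on the
profile_pc() conversion in patch 1 of this series (see that patch for the
exact diff):

        /* Callback invoked by arch_stack_walk() for each return PC. */
        static bool profile_pc_cb(void *arg, unsigned long pc)
        {
                unsigned long *prof_pc = arg;

                if (in_lock_functions(pc))
                        return true;    /* keep unwinding */
                *prof_pc = pc;
                return false;           /* found the PC of interest; stop */
        }

        unsigned long profile_pc(struct pt_regs *regs)
        {
                unsigned long prof_pc = 0;

                if (!in_lock_functions(regs->pc))
                        return regs->pc;

                arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
                return prof_pc;
        }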

Reorganize the unwinder code for better consistency and maintenance
===================================================================

Rename unwinder functions to unwind_*() similar to other architectures
for naming consistency.

Annotate all of the unwind_*() functions with notrace so they cannot be
ftraced, and with NOKPROBE_SYMBOL() so they cannot be kprobed, because
ftrace and kprobe code can themselves call the unwinder.

Redefine the unwinder loop and make it similar to other architectures.
Define the following:

        unwind_start(&frame, task, fp, pc);
        while (unwind_consume(&frame, consume_entry, cookie))
                unwind_next(&frame);
        return !unwind_failed(&frame);

unwind_start()
        Same as the original start_backtrace().

unwind_consume()
        This new function does two things:

        - Calls consume_entry() to consume the return PC.

        - Implements checks to determine whether the unwind should continue
          or terminate.

unwind_next()
        Same as the original unwind_frame() except:

        - the stack trace termination check has been moved from here to
          unwind_consume(). So, unwind_next() assumes that the fp is valid.

        - unwind_frame() used to return an error value. This function only
          sets internal state and does not return anything. The state is
          retrieved via a helper. See next.

unwind_failed()
        Return a boolean to indicate whether the stack trace completed
        successfully or failed. arch_stack_walk() ignores the return
        value. But arch_stack_walk_reliable() in the future will look
        at the return value.

Unwind status
        Introduce a new flag called "failed" in struct stackframe. Set this
        flag when an error is encountered. If this flag is set, terminate
        the unwind. Also, let the unwinder return the status to the caller.
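
Putting these pieces together, the core unwind() function (as implemented
in patch 2 of this series) looks like this:

        /* Core unwind function */
        static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
                                   struct task_struct *task,
                                   unsigned long fp, unsigned long pc)
        {
                struct stackframe frame;

                unwind_start(&frame, task, fp, pc);
                while (unwind_consume(&frame, consume_entry, cookie))
                        unwind_next(&frame);
                return !unwind_failed(&frame);
        }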

Reliability checks
==================

There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.

Introduce a new function called unwind_is_reliable() that will detect
these cases and return a boolean.

Introduce a new argument to unwind() called "need_reliable" so a caller
can tell unwind() that it requires a reliable stack trace. For such a
caller, any unreliability in the stack trace must be treated as a fatal
error and the unwind must be aborted.

Call unwind_is_reliable() from unwind_consume() like this:

        if (frame->need_reliable && !unwind_is_reliable(frame)) {
                frame->failed = true;
                return false;
        }

arch_stack_walk() passes "false" for need_reliable because its callers
don't care about reliability. arch_stack_walk() is used for debug and
test purposes.

Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except for two things:

- It passes "true" for need_reliable.

- It returns -EINVAL if unwind() aborts.

Introduce the first reliability check in unwind_is_reliable() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future. Until all of the
checks are in place, arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.
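
As an illustration, debug or test code could exercise it along these lines.
This is a hypothetical caller, not part of this series; count_entry() and
test_reliable_trace() are made-up names:

        /* Count the entries of a reliable stack trace of the current task. */
        static bool count_entry(void *cookie, unsigned long pc)
        {
                unsigned int *nr = cookie;

                (*nr)++;
                return true;                    /* consume every entry */
        }

        static int test_reliable_trace(void)
        {
                unsigned int nr = 0;
                int ret;

                ret = arch_stack_walk_reliable(count_entry, &nr, current);
                if (ret)
                        return ret;             /* -EINVAL: trace unreliable */

                pr_info("reliable stack trace with %u entries\n", nr);
                return 0;
        }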

SYM_CODE check
==============

SYM_CODE functions do not follow normal calling conventions. They cannot
be unwound reliably using the frame pointer. Collect the address ranges
of these functions in a special section called "sym_code_functions".

In unwind_is_reliable(), check the return PC against these ranges. If a
match is found, then consider the stack trace unreliable. This is the
second reliability check introduced by this work.
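
In essence, this is a simple range comparison. A simplified sketch of the
check that patch 4 adds to unwind_is_reliable() (the real code walks the
sym_code_functions[] array that is set up from the linker section at
early_initcall time):

        /* Does the return PC fall inside any SYM_CODE function? */
        for (i = 0; i < num_sym_code_functions; i++) {
                range = &sym_code_functions[i];
                if (pc >= range->start && pc < range->end)
                        return false;   /* SYM_CODE: cannot unwind reliably */
        }
        return true;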

Last stack frame
----------------

If a SYM_CODE function occurs in the very last frame in the stack trace,
then the stack trace is not considered unreliable. This is because there
is no more unwinding to do. Examples:

- EL0 exception stack traces end in the top level EL0 exception
handlers.

- All kernel thread stack traces end in ret_from_fork().
---
Changelog:

v8:
From Mark Rutland:

- Make the unwinder loop similar to other architectures.

- Keep details to within the unwinder functions and return a simple
boolean to the caller.

- Convert some of the current code that contains unwinder logic to
simply use arch_stack_walk(). I have converted all of them.

- Do not copy sym_code_functions[]. Just place it in rodata for now.

- Have the main loop check for termination conditions rather than
having unwind_frame() check for them. In other words, let
unwind_frame() assume that the fp is valid.

- Replace the big comment for SYM_CODE functions with a shorter
comment.

/*
 * As SYM_CODE functions don't follow the usual calling
 * conventions, we assume by default that any SYM_CODE function
 * cannot be unwound reliably.
 *
 * Note that this includes:
 *
 * - Exception handlers and entry assembly
 * - Trampoline assembly (e.g., ftrace, kprobes)
 * - Hypervisor-related assembly
 * - Hibernation-related assembly
 * - CPU start-stop, suspend-resume assembly
 * - Kernel relocation assembly
 */

v7:
The Mailer screwed up the threading on this. So, I have resent this
same series as version 8 with proper threading to avoid confusion.
v6:
From Mark Rutland:

- The per-frame reliability concept and flag are acceptable. But more
work is needed to make the per-frame checks more accurate and more
complete. E.g., some code reorg is being worked on that will help.

I have now removed the frame->reliable flag and deleted the whole
concept of per-frame status. This is orthogonal to this patch series.
Instead, I have improved the unwinder to return proper return codes
so a caller can take appropriate action without needing per-frame
status.

- Remove the mention of PLTs and update the comment.

I have replaced the comment above the call to __kernel_text_address()
with the comment suggested by Mark Rutland.

Other comments:

- Other comments on the per-frame stuff are not relevant because
that approach is not there anymore.

v5:
From Keiya Nobuta:

- The term blacklist(ed) is not to be used anymore. I have changed it
to unreliable. So, the function unwinder_blacklisted() has been
changed to unwinder_is_unreliable().

From Mark Brown:

- Add a comment for the "reliable" flag in struct stackframe. The
reliability attribute is not complete until all the checks are
in place. Added a comment above struct stackframe.

- Include some of the comments in the cover letter in the actual
code so that we can compare it with the reliable stack trace
requirements document for completeness. I have added a comment:

- above unwinder_is_unreliable() that lists the requirements
that are addressed by the function.

- above the __kernel_text_address() call about all the cases
the call covers.

v4:
From Mark Brown:

- I was checking the return PC with __kernel_text_address() before
the Function Graph trace handling. Mark Brown felt that all the
reliability checks should be performed on the original return PC
once that is obtained. So, I have moved all the reliability checks
to after the Function Graph Trace handling code in the unwinder.
Basically, the unwinder should perform PC translations first (for
the return trampoline for Function Graph Tracing, Kretprobes, etc.).
Then, the reliability checks should be applied to the resulting
PC.

- Mark said to improve the naming of the new functions so they don't
collide with existing ones. I have used a prefix "unwinder_" for
all the new functions.

From Josh Poimboeuf:

- In the error scenarios in the unwinder, the reliable flag in the
stack frame should be set. Implemented this.

- Some of the other comments are not relevant to the new code as
I have taken a different approach in the new code. That is why
I have not made those changes. E.g., Ard wanted me to add the
"const" keyword to the global section array. That array does not
exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
the same array in a for loop.

Other changes:

- Add a new definition for SYM_CODE_END() that adds the address
range of the function to a special section called
"sym_code_functions".

- Include the new section under initdata in vmlinux.lds.S.

- Define an early_initcall() to copy the contents of the
"sym_code_functions" section to an array by the same name.

- Define a function unwinder_blacklisted() that compares a return
PC against sym_code_sections[]. If there is a match, mark the
stack trace unreliable. Call this from unwind_frame().

v3:
- Implemented a sym_code_ranges[] array to contain the section bounds
for text sections that contain SYM_CODE_*() functions. The unwinder
checks each return PC against the sections. If it falls in any of
the sections, the stack trace is marked unreliable.

- Moved SYM_CODE functions from .text and .init.text into a new
text section called ".code.text". Added this section to
vmlinux.lds.S and sym_code_ranges[].

- Fixed the logic in the unwinder that handles Function Graph
Tracer return trampoline.

- Removed all the previous code that handles:
- ftrace entry code for traced function
- special_functions[] array that lists individual functions
- kretprobe_trampoline() special case

v2
- Removed the terminating entry { 0, 0 } in special_functions[]
and replaced it with the idiom { /* sentinel */ }.

- Change the ftrace trampoline entry ftrace_graph_call in
special_functions[] to ftrace_call + 4 and added explanatory
comments.

- Unnested #ifdefs in special_functions[] for FTRACE.

v1
- Define a bool field in struct stackframe. This will indicate if
a stack trace is reliable.

- Implement a special_functions[] array that will be populated
with special functions in which the stack trace is considered
unreliable.

- Using kallsyms_lookup(), get the address ranges for the special
functions and record them.

- Implement an is_reliable_function(pc). This function will check
if a given return PC falls in any of the special functions. If
it does, the stack trace is unreliable.

- Implement check_reliability() function that will check if a
stack frame is reliable. Call is_reliable_function() from
check_reliability().

- Before a return PC is checked against special_functions[], it
must be validated as a proper kernel text address. Call
__kernel_text_address() from check_reliability().

- Finally, call check_reliability() from unwind_frame() for
each stack frame.

- Add EL1 exception handlers to special_functions[].

el1_sync();
el1_irq();
el1_error();
el1_sync_invalid();
el1_irq_invalid();
el1_fiq_invalid();
el1_error_invalid();

- The above functions are currently defined as LOCAL symbols.
Make them global so that they can be referenced from the
unwinder code.

- Add FTRACE trampolines to special_functions[]:

ftrace_graph_call()
ftrace_graph_caller()
return_to_handler()

- Add the kretprobe trampoline to special functions[]:

kretprobe_trampoline()

Previous versions and discussion
================================

v7: Mailer screwed up the threading. Sent the same as v8 with proper threading.
v6: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v5: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v4: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v3: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v2: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v1: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Madhavan T. Venkataraman (4):
arm64: Make all stack walking functions use arch_stack_walk()
arm64: Reorganize the unwinder code for better consistency and
maintenance
arm64: Introduce stack trace reliability checks in the unwinder
arm64: Create a list of SYM_CODE functions, check return PC against
list

arch/arm64/include/asm/linkage.h | 12 ++
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/include/asm/stacktrace.h | 16 +-
arch/arm64/kernel/perf_callchain.c | 5 +-
arch/arm64/kernel/process.c | 39 ++--
arch/arm64/kernel/return_address.c | 6 +-
arch/arm64/kernel/stacktrace.c | 291 ++++++++++++++++++++--------
arch/arm64/kernel/time.c | 22 ++-
arch/arm64/kernel/vmlinux.lds.S | 10 +
9 files changed, 277 insertions(+), 125 deletions(-)


base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
--
2.25.1


2021-08-12 20:28:12

by Madhavan T. Venkataraman

Subject: [RFC PATCH v8 3/4] arm64: Introduce stack trace reliability checks in the unwinder

From: "Madhavan T. Venkataraman" <[email protected]>

There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.

Introduce a new function called unwind_is_reliable() that will detect
these cases and return a boolean.

Introduce a new argument to unwind() called "need_reliable" so a caller
can tell unwind() that it requires a reliable stack trace. For such a
caller, any unreliability in the stack trace must be treated as a fatal
error and the unwind must be aborted.

Call unwind_is_reliable() from unwind_consume() like this:

        if (frame->need_reliable && !unwind_is_reliable(frame)) {
                frame->failed = true;
                return false;
        }

In other words, if the return PC in the stackframe falls in unreliable code,
then it cannot be unwound reliably.

arch_stack_walk() will pass "false" for need_reliable because its callers
don't care about reliability. arch_stack_walk() is used for debug and
test purposes.

Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except for two things:

- It passes "true" for need_reliable.

- It returns -EINVAL if unwind() says that the stack trace is
unreliable.

Introduce the first reliability check in unwind_is_reliable() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.

Other reliability checks will be added in the future. Until all of the
checks are in place, arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.

Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 4 ++
arch/arm64/kernel/stacktrace.c | 63 +++++++++++++++++++++++++++--
2 files changed, 63 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 407007376e97..65ea151da5da 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -53,6 +53,9 @@ struct stack_info {
* replacement lr value in the ftrace graph stack.
*
* @failed: Unwind failed.
+ *
+ * @need_reliable: The caller needs a reliable stack trace. Treat any
+ * unreliability as a fatal error.
*/
struct stackframe {
struct task_struct *task;
@@ -65,6 +68,7 @@ struct stackframe {
int graph;
#endif
bool failed;
+ bool need_reliable;
};

extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index ec8f5163c4d0..b60f8a20ba64 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -34,7 +34,8 @@

static void notrace unwind_start(struct stackframe *frame,
struct task_struct *task,
- unsigned long fp, unsigned long pc)
+ unsigned long fp, unsigned long pc,
+ bool need_reliable)
{
frame->task = task;
frame->fp = fp;
@@ -56,6 +57,7 @@ static void notrace unwind_start(struct stackframe *frame,
frame->prev_fp = 0;
frame->prev_type = STACK_TYPE_UNKNOWN;
frame->failed = false;
+ frame->need_reliable = need_reliable;
}

NOKPROBE_SYMBOL(unwind_start);
@@ -178,6 +180,23 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
barrier();
}

+/*
+ * Check the stack frame for conditions that make further unwinding unreliable.
+ */
+static bool notrace unwind_is_reliable(struct stackframe *frame)
+{
+ /*
+ * If the PC is not a known kernel text address, then we cannot
+ * be sure that a subsequent unwind will be reliable, as we
+ * don't know that the code follows our unwind requirements.
+ */
+ if (!__kernel_text_address(frame->pc))
+ return false;
+ return true;
+}
+
+NOKPROBE_SYMBOL(unwind_is_reliable);
+
static bool notrace unwind_consume(struct stackframe *frame,
stack_trace_consume_fn consume_entry,
void *cookie)
@@ -197,6 +216,12 @@ static bool notrace unwind_consume(struct stackframe *frame,
/* Final frame; nothing to unwind */
return false;
}
+
+ if (frame->need_reliable && !unwind_is_reliable(frame)) {
+ /* Cannot unwind to the next frame reliably. */
+ frame->failed = true;
+ return false;
+ }
return true;
}

@@ -210,11 +235,12 @@ static inline bool unwind_failed(struct stackframe *frame)
/* Core unwind function */
static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
struct task_struct *task,
- unsigned long fp, unsigned long pc)
+ unsigned long fp, unsigned long pc,
+ bool need_reliable)
{
struct stackframe frame;

- unwind_start(&frame, task, fp, pc);
+ unwind_start(&frame, task, fp, pc, need_reliable);
while (unwind_consume(&frame, consume_entry, cookie))
unwind_next(&frame);
return !unwind_failed(&frame);
@@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
fp = thread_saved_fp(task);
pc = thread_saved_pc(task);
}
- unwind(consume_entry, cookie, task, fp, pc);
+ unwind(consume_entry, cookie, task, fp, pc, false);
+}
+
+/*
+ * arch_stack_walk_reliable() may not be used for livepatch until all of
+ * the reliability checks are in place in unwind_consume(). However,
+ * debug and test code can choose to use it even if all the checks are not
+ * in place.
+ */
+noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
+ void *cookie,
+ struct task_struct *task)
+{
+ unsigned long fp, pc;
+
+ if (!task)
+ task = current;
+
+ if (task == current) {
+ /* Skip arch_stack_walk_reliable() in the stack trace. */
+ fp = (unsigned long)__builtin_frame_address(1);
+ pc = (unsigned long)__builtin_return_address(0);
+ } else {
+ /* Caller guarantees that the task is not running. */
+ fp = thread_saved_fp(task);
+ pc = thread_saved_pc(task);
+ }
+ if (unwind(consume_fn, cookie, task, fp, pc, true))
+ return 0;
+ return -EINVAL;
}

#endif
--
2.25.1

2021-08-12 20:30:52

by Madhavan T. Venkataraman

Subject: [RFC PATCH v8 4/4] arm64: Create a list of SYM_CODE functions, check return PC against list

From: "Madhavan T. Venkataraman" <[email protected]>

SYM_CODE functions don't follow the usual calling conventions. Check if the
return PC in a stack frame falls in any of these. If it does, consider the
stack trace unreliable.

Define a special section for unreliable functions
=================================================

Define a SYM_CODE_END() macro for arm64 that adds the function address
range to a new section called "sym_code_functions".

Linker file
===========

Include the "sym_code_functions" section under read-only data in
vmlinux.lds.S.

Initialization
==============

Define an early_initcall() to create a sym_code_functions[] array from
the linker data.

Unwinder check
==============

Add a reliability check in unwind_is_reliable() that compares a return
PC with sym_code_functions[]. If there is a match, then return failure.

Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/linkage.h | 12 +++++++
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/kernel/stacktrace.c | 53 +++++++++++++++++++++++++++++++
arch/arm64/kernel/vmlinux.lds.S | 10 ++++++
4 files changed, 76 insertions(+)

diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 9906541a6861..616bad74e297 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -68,4 +68,16 @@
SYM_FUNC_END_ALIAS(x); \
SYM_FUNC_END_ALIAS(__pi_##x)

+/*
+ * Record the address range of each SYM_CODE function in a struct code_range
+ * in a special section.
+ */
+#define SYM_CODE_END(name) \
+ SYM_END(name, SYM_T_NONE) ;\
+ 99: ;\
+ .pushsection "sym_code_functions", "aw" ;\
+ .quad name ;\
+ .quad 99b ;\
+ .popsection
+
#endif
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index e4ad9db53af1..c84c71063d6e 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -21,5 +21,6 @@ extern char __exittext_begin[], __exittext_end[];
extern char __irqentry_text_start[], __irqentry_text_end[];
extern char __mmuoff_data_start[], __mmuoff_data_end[];
extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+extern char __sym_code_functions_start[], __sym_code_functions_end[];

#endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index b60f8a20ba64..26dbdd4fff77 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,31 @@
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>

+struct code_range {
+ unsigned long start;
+ unsigned long end;
+};
+
+static struct code_range *sym_code_functions;
+static int num_sym_code_functions;
+
+int __init init_sym_code_functions(void)
+{
+ size_t size = (unsigned long)__sym_code_functions_end -
+ (unsigned long)__sym_code_functions_start;
+
+ sym_code_functions = (struct code_range *)__sym_code_functions_start;
+ /*
+ * Order it so that sym_code_functions is not visible before
+ * num_sym_code_functions.
+ */
+ smp_mb();
+ num_sym_code_functions = size / sizeof(struct code_range);
+
+ return 0;
+}
+early_initcall(init_sym_code_functions);
+
/*
* AArch64 PCS assigns the frame pointer to x29.
*
@@ -185,6 +210,10 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
*/
static bool notrace unwind_is_reliable(struct stackframe *frame)
{
+ const struct code_range *range;
+ unsigned long pc;
+ int i;
+
/*
* If the PC is not a known kernel text address, then we cannot
* be sure that a subsequent unwind will be reliable, as we
@@ -192,6 +221,30 @@ static bool notrace unwind_is_reliable(struct stackframe *frame)
*/
if (!__kernel_text_address(frame->pc))
return false;
+
+ /*
+ * Check the return PC against sym_code_functions[]. If there is a
+ * match, then consider the stack frame unreliable.
+ *
+ * As SYM_CODE functions don't follow the usual calling conventions,
+ * we assume by default that any SYM_CODE function cannot be unwound
+ * reliably.
+ *
+ * Note that this includes:
+ *
+ * - Exception handlers and entry assembly
+ * - Trampoline assembly (e.g., ftrace, kprobes)
+ * - Hypervisor-related assembly
+ * - Hibernation-related assembly
+ * - CPU start-stop, suspend-resume assembly
+ * - Kernel relocation assembly
+ */
+ pc = frame->pc;
+ for (i = 0; i < num_sym_code_functions; i++) {
+ range = &sym_code_functions[i];
+ if (pc >= range->start && pc < range->end)
+ return false;
+ }
return true;
}

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 709d2c433c5e..2bf769f45b54 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -111,6 +111,14 @@ jiffies = jiffies_64;
#define TRAMP_TEXT
#endif

+#define SYM_CODE_FUNCTIONS \
+ . = ALIGN(16); \
+ .symcode : AT(ADDR(.symcode) - LOAD_OFFSET) { \
+ __sym_code_functions_start = .; \
+ KEEP(*(sym_code_functions)) \
+ __sym_code_functions_end = .; \
+ }
+
/*
* The size of the PE/COFF section that covers the kernel image, which
* runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -196,6 +204,8 @@ SECTIONS
swapper_pg_dir = .;
. += PAGE_SIZE;

+ SYM_CODE_FUNCTIONS
+
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
__inittext_begin = .;
--
2.25.1

2021-08-12 22:00:18

by Madhavan T. Venkataraman

Subject: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()

From: "Madhavan T. Venkataraman" <[email protected]>

Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame(). Convert all of
them to use arch_stack_walk(). This makes maintenance easier.

Here is the list of functions:

perf_callchain_kernel()
get_wchan()
return_address()
dump_backtrace()
profile_pc()

Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 3 ---
arch/arm64/kernel/perf_callchain.c | 5 +---
arch/arm64/kernel/process.c | 39 ++++++++++++++++++-----------
arch/arm64/kernel/return_address.c | 6 +----
arch/arm64/kernel/stacktrace.c | 38 +++-------------------------
arch/arm64/kernel/time.c | 22 +++++++++-------
6 files changed, 43 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 8aebc00c1718..e43dea1c6b41 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -61,9 +61,6 @@ struct stackframe {
#endif
};

-extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
- bool (*fn)(void *, unsigned long), void *data);
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl);

diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..2f289013c9c9 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -147,15 +147,12 @@ static bool callchain_trace(void *data, unsigned long pc)
void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
- struct stackframe frame;
-
if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
/* We don't support guest os callchain now */
return;
}

- start_backtrace(&frame, regs->regs[29], regs->pc);
- walk_stackframe(current, &frame, callchain_trace, entry);
+ arch_stack_walk(callchain_trace, entry, current, regs);
}

unsigned long perf_instruction_pointer(struct pt_regs *regs)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c8989b999250..52c12fd26407 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -544,11 +544,28 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
return last;
}

+struct wchan_info {
+ unsigned long pc;
+ int count;
+};
+
+static bool get_wchan_cb(void *arg, unsigned long pc)
+{
+ struct wchan_info *wchan_info = arg;
+
+ if (!in_sched_functions(pc)) {
+ wchan_info->pc = pc;
+ return false;
+ }
+ wchan_info->count--;
+ return !!wchan_info->count;
+}
+
unsigned long get_wchan(struct task_struct *p)
{
- struct stackframe frame;
- unsigned long stack_page, ret = 0;
- int count = 0;
+ unsigned long stack_page;
+ struct wchan_info wchan_info;
+
if (!p || p == current || task_is_running(p))
return 0;

@@ -556,20 +573,12 @@ unsigned long get_wchan(struct task_struct *p)
if (!stack_page)
return 0;

- start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
+ wchan_info.pc = 0;
+ wchan_info.count = 16;
+ arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);

- do {
- if (unwind_frame(p, &frame))
- goto out;
- if (!in_sched_functions(frame.pc)) {
- ret = frame.pc;
- goto out;
- }
- } while (count++ < 16);
-
-out:
put_task_stack(p);
- return ret;
+ return wchan_info.pc;
}

unsigned long arch_align_stack(unsigned long sp)
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
index a6d18755652f..92a0f4d434e4 100644
--- a/arch/arm64/kernel/return_address.c
+++ b/arch/arm64/kernel/return_address.c
@@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
void *return_address(unsigned int level)
{
struct return_address_data data;
- struct stackframe frame;

data.level = level + 2;
data.addr = NULL;

- start_backtrace(&frame,
- (unsigned long)__builtin_frame_address(0),
- (unsigned long)return_address);
- walk_stackframe(current, &frame, save_return_addr, &data);
+ arch_stack_walk(save_return_addr, &data, current, NULL);

if (!data.level)
return data.addr;
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8982a2b78acf..1800310f92be 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -151,23 +151,21 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
}
NOKPROBE_SYMBOL(walk_stackframe);

-static void dump_backtrace_entry(unsigned long where, const char *loglvl)
+static bool dump_backtrace_entry(void *arg, unsigned long where)
{
+ char *loglvl = arg;
printk("%s %pSb\n", loglvl, (void *)where);
+ return true;
}

void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl)
{
- struct stackframe frame;
- int skip = 0;
-
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);

if (regs) {
if (user_mode(regs))
return;
- skip = 1;
}

if (!tsk)
@@ -176,36 +174,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
if (!try_get_task_stack(tsk))
return;

- if (tsk == current) {
- start_backtrace(&frame,
- (unsigned long)__builtin_frame_address(0),
- (unsigned long)dump_backtrace);
- } else {
- /*
- * task blocked in __switch_to
- */
- start_backtrace(&frame,
- thread_saved_fp(tsk),
- thread_saved_pc(tsk));
- }
-
printk("%sCall trace:\n", loglvl);
- do {
- /* skip until specified stack frame */
- if (!skip) {
- dump_backtrace_entry(frame.pc, loglvl);
- } else if (frame.fp == regs->regs[29]) {
- skip = 0;
- /*
- * Mostly, this is the case where this function is
- * called in panic/abort. As exception handler's
- * stack frame does not contain the corresponding pc
- * at which an exception has taken place, use regs->pc
- * instead.
- */
- dump_backtrace_entry(regs->pc, loglvl);
- }
- } while (!unwind_frame(tsk, &frame));
+ arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);

put_task_stack(tsk);
}
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index eebbc8d7123e..671b3038a772 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -32,22 +32,26 @@
#include <asm/stacktrace.h>
#include <asm/paravirt.h>

+static bool profile_pc_cb(void *arg, unsigned long pc)
+{
+ unsigned long *prof_pc = arg;
+
+ if (in_lock_functions(pc))
+ return true;
+ *prof_pc = pc;
+ return false;
+}
+
unsigned long profile_pc(struct pt_regs *regs)
{
- struct stackframe frame;
+ unsigned long prof_pc = 0;

if (!in_lock_functions(regs->pc))
return regs->pc;

- start_backtrace(&frame, regs->regs[29], regs->pc);
-
- do {
- int ret = unwind_frame(NULL, &frame);
- if (ret < 0)
- return 0;
- } while (in_lock_functions(frame.pc));
+ arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);

- return frame.pc;
+ return prof_pc;
}
EXPORT_SYMBOL(profile_pc);

--
2.25.1

2021-08-12 22:00:19

by Madhavan T. Venkataraman

Subject: [RFC PATCH v8 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance

From: "Madhavan T. Venkataraman" <[email protected]>

Renaming of unwinder functions
==============================

Rename unwinder functions to unwind_*() similar to other architectures
for naming consistency. More on this below.

unwind function attributes
==========================

Mark all of the unwind_*() functions with notrace so they cannot be ftraced,
and with NOKPROBE_SYMBOL() so they cannot be kprobed, because ftrace and
kprobe code can themselves call the unwinder.

start_backtrace()
=================

start_backtrace() is only called by arch_stack_walk(). Make it static.
Rename start_backtrace() to unwind_start() for naming consistency.

unwind_frame()
==============

Rename this to unwind_next() for naming consistency.

Replace walk_stackframe() with unwind()
=======================================

walk_stackframe() contains the unwinder loop that walks the stack
frames. Currently, start_backtrace() and walk_stackframe() are called
separately. They should be combined in the same function. Also, the
loop in walk_stackframe() should be simplified and should look like
the unwind loops in other architectures such as X86 and S390.

Remove walk_stackframe(). Define a new function called "unwind()" in
its place. Define the following unwinder loop:

        unwind_start(&frame, task, fp, pc);
        while (unwind_consume(&frame, consume_entry, cookie))
                unwind_next(&frame);
        return !unwind_failed(&frame);

unwind_start()
        Same as the original start_backtrace().

unwind_consume()
        This is a new function that calls the callback function to
        consume the PC in a stackframe. Do it this way so that checks
        can be performed before and after the callback to determine
        whether the unwind should continue or terminate.

unwind_next()
        Same as the original unwind_frame() except for two things:

        - the stack trace termination check has been moved from
          here to unwind_consume(). So, unwind_next() is always
          called on a valid fp.

        - unwind_frame() used to return an error value. This
          function does not return anything.

unwind_failed()
        Return a boolean to indicate if the stack trace completed
        successfully or failed. arch_stack_walk() ignores the return
        value. But arch_stack_walk_reliable() in the future will look
        at the return value.

Unwind status
=============

Introduce a new flag called "failed" in struct stackframe. unwind_next()
and unwind_consume() will set this flag when an error is encountered and
unwind_consume() will check this flag. This is in keeping with other
architectures.

The failed flag is accessed via the helper unwind_failed().

Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 9 +-
arch/arm64/kernel/stacktrace.c | 145 ++++++++++++++++++----------
2 files changed, 99 insertions(+), 55 deletions(-)

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index e43dea1c6b41..407007376e97 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -34,6 +34,8 @@ struct stack_info {
* A snapshot of a frame record or fp/lr register values, along with some
* accounting information necessary for robust unwinding.
*
+ * @task: The task whose stack is being unwound.
+ *
* @fp: The fp value in the frame record (or the real fp)
* @pc: The lr value in the frame record (or the real lr)
*
@@ -49,8 +51,11 @@ struct stack_info {
*
* @graph: When FUNCTION_GRAPH_TRACER is selected, holds the index of a
* replacement lr value in the ftrace graph stack.
+ *
+ * @failed: Unwind failed.
*/
struct stackframe {
+ struct task_struct *task;
unsigned long fp;
unsigned long pc;
DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
@@ -59,6 +64,7 @@ struct stackframe {
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
int graph;
#endif
+ bool failed;
};

extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
@@ -145,7 +151,4 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
return false;
}

-void start_backtrace(struct stackframe *frame, unsigned long fp,
- unsigned long pc);
-
#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 1800310f92be..ec8f5163c4d0 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -32,10 +32,11 @@
* add sp, sp, #0x10
*/

-
-void start_backtrace(struct stackframe *frame, unsigned long fp,
- unsigned long pc)
+static void notrace unwind_start(struct stackframe *frame,
+ struct task_struct *task,
+ unsigned long fp, unsigned long pc)
{
+ frame->task = task;
frame->fp = fp;
frame->pc = pc;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
@@ -45,7 +46,7 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
/*
* Prime the first unwind.
*
- * In unwind_frame() we'll check that the FP points to a valid stack,
+ * In unwind_next() we'll check that the FP points to a valid stack,
* which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
* treated as a transition to whichever stack that happens to be. The
* prev_fp value won't be used, but we set it to 0 such that it is
@@ -54,8 +55,11 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
frame->prev_fp = 0;
frame->prev_type = STACK_TYPE_UNKNOWN;
+ frame->failed = false;
}

+NOKPROBE_SYMBOL(unwind_start);
+
/*
* Unwind from one frame record (A) to the next frame record (B).
*
@@ -63,26 +67,26 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
* records (e.g. a cycle), determined based on the location and fp value of A
* and the location (but not the fp value) of B.
*/
-int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+static void notrace unwind_next(struct stackframe *frame)
{
unsigned long fp = frame->fp;
struct stack_info info;
+ struct task_struct *tsk = frame->task;

- if (!tsk)
- tsk = current;
-
- /* Final frame; nothing to unwind */
- if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
- return -ENOENT;
-
- if (fp & 0x7)
- return -EINVAL;
+ if (fp & 0x7) {
+ frame->failed = true;
+ return;
+ }

- if (!on_accessible_stack(tsk, fp, 16, &info))
- return -EINVAL;
+ if (!on_accessible_stack(tsk, fp, 16, &info)) {
+ frame->failed = true;
+ return;
+ }

- if (test_bit(info.type, frame->stacks_done))
- return -EINVAL;
+ if (test_bit(info.type, frame->stacks_done)) {
+ frame->failed = true;
+ return;
+ }

/*
* As stacks grow downward, any valid record on the same stack must be
@@ -98,15 +102,17 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
* stack.
*/
if (info.type == frame->prev_type) {
- if (fp <= frame->prev_fp)
- return -EINVAL;
+ if (fp <= frame->prev_fp) {
+ frame->failed = true;
+ return;
+ }
} else {
set_bit(frame->prev_type, frame->stacks_done);
}

/*
* Record this frame record's values and location. The prev_fp and
- * prev_type are only meaningful to the next unwind_frame() invocation.
+ * prev_type are only meaningful to the next unwind_next() invocation.
*/
frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
@@ -124,32 +130,18 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
* So replace it to an original value.
*/
ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
- if (WARN_ON_ONCE(!ret_stack))
- return -EINVAL;
+ if (WARN_ON_ONCE(!ret_stack)) {
+ frame->failed = true;
+ return;
+ }
frame->pc = ret_stack->ret;
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */

frame->pc = ptrauth_strip_insn_pac(frame->pc);
-
- return 0;
}
-NOKPROBE_SYMBOL(unwind_frame);

-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
- bool (*fn)(void *, unsigned long), void *data)
-{
- while (1) {
- int ret;
-
- if (!fn(data, frame->pc))
- break;
- ret = unwind_frame(tsk, frame);
- if (ret < 0)
- break;
- }
-}
-NOKPROBE_SYMBOL(walk_stackframe);
+NOKPROBE_SYMBOL(unwind_next);

static bool dump_backtrace_entry(void *arg, unsigned long where)
{
@@ -186,25 +178,74 @@ void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
barrier();
}

+static bool notrace unwind_consume(struct stackframe *frame,
+ stack_trace_consume_fn consume_entry,
+ void *cookie)
+{
+ if (frame->failed) {
+ /* PC is suspect. Cannot consume it. */
+ return false;
+ }
+
+ if (!consume_entry(cookie, frame->pc)) {
+ /* Caller terminated the unwind. */
+ frame->failed = true;
+ return false;
+ }
+
+ if (frame->fp == (unsigned long)task_pt_regs(frame->task)->stackframe) {
+ /* Final frame; nothing to unwind */
+ return false;
+ }
+ return true;
+}
+
+NOKPROBE_SYMBOL(unwind_consume);
+
+static inline bool unwind_failed(struct stackframe *frame)
+{
+ return frame->failed;
+}
+
+/* Core unwind function */
+static bool notrace unwind(stack_trace_consume_fn consume_entry, void *cookie,
+ struct task_struct *task,
+ unsigned long fp, unsigned long pc)
+{
+ struct stackframe frame;
+
+ unwind_start(&frame, task, fp, pc);
+ while (unwind_consume(&frame, consume_entry, cookie))
+ unwind_next(&frame);
+ return !unwind_failed(&frame);
+}
+
+NOKPROBE_SYMBOL(unwind);
+
#ifdef CONFIG_STACKTRACE

noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
void *cookie, struct task_struct *task,
struct pt_regs *regs)
{
- struct stackframe frame;
+ unsigned long fp, pc;
+
+ if (!task)
+ task = current;

- if (regs)
- start_backtrace(&frame, regs->regs[29], regs->pc);
- else if (task == current)
- start_backtrace(&frame,
- (unsigned long)__builtin_frame_address(1),
- (unsigned long)__builtin_return_address(0));
- else
- start_backtrace(&frame, thread_saved_fp(task),
- thread_saved_pc(task));
-
- walk_stackframe(task, &frame, consume_entry, cookie);
+ if (regs) {
+ fp = regs->regs[29];
+ pc = regs->pc;
+ } else if (task == current) {
+ /* Skip arch_stack_walk() in the stack trace. */
+ fp = (unsigned long)__builtin_frame_address(1);
+ pc = (unsigned long)__builtin_return_address(0);
+ } else {
+ /* Caller guarantees that the task is not running. */
+ fp = thread_saved_fp(task);
+ pc = thread_saved_pc(task);
+ }
+ unwind(consume_entry, cookie, task, fp, pc);
}

#endif
--
2.25.1

2021-08-12 22:00:38

by Madhavan T. Venkataraman

Subject: Re: [RFC PATCH v8 0/4] arm64: Reorganize the unwinder and implement stack trace reliability checks

OK. So, this time the threading is proper. Please review version 8.
It is identical to version 7 except for the version number and
threading.

Please disregard all emails sent as RFC PATCH v7 for this series. Again,
apologies for screwing the threading up.

Madhavan

On 8/12/21 2:05 PM, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Make all stack walking functions use arch_stack_walk()
> ======================================================
>
> Currently, there are multiple functions in ARM64 code that walk the
> stack using start_backtrace() and unwind_frame(). Convert all of
> them to use arch_stack_walk(). This makes maintenance easier.
>
> Reorganize the unwinder code for better consistency and maintenance
> ===================================================================
>
> Rename unwinder functions to unwind_*() similar to other architectures
> for naming consistency.
>
> Annotate all of the unwind_*() functions with notrace so they cannot be
> ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
> code can call the unwinder.
>
> Redefine the unwinder loop and make it similar to other architectures.
> Define the following:
>
> unwind_start(&frame, task, fp, pc);
> while (unwind_consume(&frame, consume_entry, cookie))
> unwind_next(&frame);
> return !unwind_failed(&frame);
>
> unwind_start()
> Same as the original start_backtrace().
>
> unwind_consume()
> This new function does two things:
>
> - Calls consume_entry() to consume the return PC.
>
> - Implements checks to determine whether the unwind should continue
> or terminate.
>
> unwind_next()
> Same as the original unwind_frame() except:
>
> - the stack trace termination check has been moved from here to
> unwind_consume(). So, unwind_next() assumes that the fp is valid.
>
> - unwind_frame() used to return an error value. This function only
> sets internal state and does not return anything. The state is
> retrieved via a helper. See next.
>
> unwind_failed()
> Return a boolean to indicate whether the stack trace completed
> successfully or failed. arch_stack_walk() ignores the return
> value. But arch_stack_walk_reliable() in the future will look
> at the return value.
>
> Unwind status
> Introduce a new flag called "failed" in struct stackframe. Set this
> flag when an error is encountered. If this flag is set, terminate
> the unwind. Also, let the unwinder return the status to the caller.
>
> Reliability checks
> ==================
>
> There are some kernel features and conditions that make a stack trace
> unreliable. Callers may require the unwinder to detect these cases.
> E.g., livepatch.
>
> Introduce a new function called unwind_is_reliable() that will detect
> these cases and return a boolean.
>
> Introduce a new argument to unwind() called "need_reliable" so a caller
> can tell unwind() that it requires a reliable stack trace. For such a
> caller, any unreliability in the stack trace must be treated as a fatal
> error and the unwind must be aborted.
>
> Call unwind_is_reliable() from unwind_consume() like this:
>
> if (frame->need_reliable && !unwind_is_reliable(frame)) {
> frame->failed = true;
> return false;
> }
>
> arch_stack_walk() passes "false" for need_reliable because its callers
> don't care about reliability. arch_stack_walk() is used for debug and
> test purposes.
>
> Introduce arch_stack_walk_reliable() for ARM64. This works like
> arch_stack_walk() except for two things:
>
> - It passes "true" for need_reliable.
>
> - It returns -EINVAL if unwind() aborts.
>
> Introduce the first reliability check in unwind_is_reliable() - If
> a return PC is not a valid kernel text address, consider the stack
> trace unreliable. It could be some generated code.
>
> Other reliability checks will be added in the future. Until all of the
> checks are in place, arch_stack_walk_reliable() may not be used by
> livepatch. But it may be used by debug and test code.
>
> SYM_CODE check
> ==============
>
> SYM_CODE functions do not follow normal calling conventions. They cannot
> be unwound reliably using the frame pointer. Collect the address ranges
> of these functions in a special section called "sym_code_functions".
>
> In unwind_is_reliable(), check the return PC against these ranges. If a
> match is found, then consider the stack trace unreliable. This is the
> second reliability check introduced by this work.
>
> Last stack frame
> ----------------
>
> If a SYM_CODE function occurs in the very last frame in the stack trace,
> then the stack trace is not considered unreliable. This is because there
> is no more unwinding to do. Examples:
>
> - EL0 exception stack traces end in the top level EL0 exception
> handlers.
>
> - All kernel thread stack traces end in ret_from_fork().
> ---
> Changelog:
>
> v8:
> From Mark Rutland:
>
> - Make the unwinder loop similar to other architectures.
>
> - Keep details to within the unwinder functions and return a simple
> boolean to the caller.
>
> - Convert some of the current code that contains unwinder logic to
> simply use arch_stack_walk(). I have converted all of them.
>
> - Do not copy sym_code_functions[]. Just place it in rodata for now.
>
> - Have the main loop check for termination conditions rather than
> having unwind_frame() check for them. In other words, let
> unwind_frame() assume that the fp is valid.
>
> - Replace the big comment for SYM_CODE functions with a shorter
> comment.
>
> /*
> * As SYM_CODE functions don't follow the usual calling
> * conventions, we assume by default that any SYM_CODE function
> * cannot be unwound reliably.
> *
> * Note that this includes:
> *
> * - Exception handlers and entry assembly
> * - Trampoline assembly (e.g., ftrace, kprobes)
> * - Hypervisor-related assembly
> * - Hibernation-related assembly
> * - CPU start-stop, suspend-resume assembly
> * - Kernel relocation assembly
> */
>
> v7:
> The Mailer screwed up the threading on this. So, I have resent this
> same series as version 8 with proper threading to avoid confusion.
> v6:
> From Mark Rutland:
>
> - The per-frame reliability concept and flag are acceptable. But more
> work is needed to make the per-frame checks more accurate and more
> complete. E.g., some code reorg is being worked on that will help.
>
> I have now removed the frame->reliable flag and deleted the whole
> concept of per-frame status. This is orthogonal to this patch series.
> Instead, I have improved the unwinder to return proper return codes
> so a caller can take appropriate action without needing per-frame
> status.
>
> - Remove the mention of PLTs and update the comment.
>
> I have replaced the comment above the call to __kernel_text_address()
> with the comment suggested by Mark Rutland.
>
> Other comments:
>
> - Other comments on the per-frame stuff are not relevant because
> that approach is not there anymore.
>
> v5:
> From Keiya Nobuta:
>
> - The term blacklist(ed) is not to be used anymore. I have changed it
> to unreliable. So, the function unwinder_blacklisted() has been
> changed to unwinder_is_unreliable().
>
> From Mark Brown:
>
> - Add a comment for the "reliable" flag in struct stackframe. The
> reliability attribute is not complete until all the checks are
> in place. Added a comment above struct stackframe.
>
> - Include some of the comments in the cover letter in the actual
> code so that we can compare it with the reliable stack trace
> requirements document for completeness. I have added a comment:
>
> - above unwinder_is_unreliable() that lists the requirements
> that are addressed by the function.
>
> - above the __kernel_text_address() call about all the cases
> the call covers.
>
> v4:
> From Mark Brown:
>
> - I was checking the return PC with __kernel_text_address() before
> the Function Graph trace handling. Mark Brown felt that all the
> reliability checks should be performed on the original return PC
> once that is obtained. So, I have moved all the reliability checks
> to after the Function Graph Trace handling code in the unwinder.
> Basically, the unwinder should perform PC translations first (for
> rhe return trampoline for Function Graph Tracing, Kretprobes, etc).
> Then, the reliability checks should be applied to the resulting
> PC.
>
> - Mark said to improve the naming of the new functions so they don't
> collide with existing ones. I have used a prefix "unwinder_" for
> all the new functions.
>
> From Josh Poimboeuf:
>
> - In the error scenarios in the unwinder, the reliable flag in the
> stack frame should be set. Implemented this.
>
> - Some of the other comments are not relevant to the new code as
> I have taken a different approach in the new code. That is why
> I have not made those changes. E.g., Ard wanted me to add the
> "const" keyword to the global section array. That array does not
> exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
> the same array in a for loop.
>
> Other changes:
>
> - Add a new definition for SYM_CODE_END() that adds the address
> range of the function to a special section called
> "sym_code_functions".
>
> - Include the new section under initdata in vmlinux.lds.S.
>
> - Define an early_initcall() to copy the contents of the
> "sym_code_functions" section to an array by the same name.
>
> - Define a function unwinder_blacklisted() that compares a return
> PC against sym_code_sections[]. If there is a match, mark the
> stack trace unreliable. Call this from unwind_frame().
>
> v3:
> - Implemented a sym_code_ranges[] array to contains sections bounds
> for text sections that contain SYM_CODE_*() functions. The unwinder
> checks each return PC against the sections. If it falls in any of
> the sections, the stack trace is marked unreliable.
>
> - Moved SYM_CODE functions from .text and .init.text into a new
> text section called ".code.text". Added this section to
> vmlinux.lds.S and sym_code_ranges[].
>
> - Fixed the logic in the unwinder that handles Function Graph
> Tracer return trampoline.
>
> - Removed all the previous code that handles:
> - ftrace entry code for traced function
> - special_functions[] array that lists individual functions
> - kretprobe_trampoline() special case
>
> v2
> - Removed the terminating entry { 0, 0 } in special_functions[]
> and replaced it with the idiom { /* sentinel */ }.
>
> - Change the ftrace trampoline entry ftrace_graph_call in
> special_functions[] to ftrace_call + 4 and added explanatory
> comments.
>
> - Unnested #ifdefs in special_functions[] for FTRACE.
>
> v1
> - Define a bool field in struct stackframe. This will indicate if
> a stack trace is reliable.
>
> - Implement a special_functions[] array that will be populated
> with special functions in which the stack trace is considered
> unreliable.
>
> - Using kallsyms_lookup(), get the address ranges for the special
> functions and record them.
>
> - Implement an is_reliable_function(pc). This function will check
> if a given return PC falls in any of the special functions. If
> it does, the stack trace is unreliable.
>
> - Implement check_reliability() function that will check if a
> stack frame is reliable. Call is_reliable_function() from
> check_reliability().
>
> - Before a return PC is checked against special_funtions[], it
> must be validates as a proper kernel text address. Call
> __kernel_text_address() from check_reliability().
>
> - Finally, call check_reliability() from unwind_frame() for
> each stack frame.
>
> - Add EL1 exception handlers to special_functions[].
>
> el1_sync();
> el1_irq();
> el1_error();
> el1_sync_invalid();
> el1_irq_invalid();
> el1_fiq_invalid();
> el1_error_invalid();
>
> - The above functions are currently defined as LOCAL symbols.
> Make them global so that they can be referenced from the
> unwinder code.
>
> - Add FTRACE trampolines to special_functions[]:
>
> ftrace_graph_call()
> ftrace_graph_caller()
> return_to_handler()
>
> - Add the kretprobe trampoline to special functions[]:
>
> kretprobe_trampoline()
>
> Previous versions and discussion
> ================================
>
> v7: Mailer screwed up the threading. Sent the same as v8 with proper threading.
> v6: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v5: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v4: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v3: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v2: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v1: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> Madhavan T. Venkataraman (4):
> arm64: Make all stack walking functions use arch_stack_walk()
> arm64: Reorganize the unwinder code for better consistency and
> maintenance
> arm64: Introduce stack trace reliability checks in the unwinder
> arm64: Create a list of SYM_CODE functions, check return PC against
> list
>
> arch/arm64/include/asm/linkage.h | 12 ++
> arch/arm64/include/asm/sections.h | 1 +
> arch/arm64/include/asm/stacktrace.h | 16 +-
> arch/arm64/kernel/perf_callchain.c | 5 +-
> arch/arm64/kernel/process.c | 39 ++--
> arch/arm64/kernel/return_address.c | 6 +-
> arch/arm64/kernel/stacktrace.c | 291 ++++++++++++++++++++--------
> arch/arm64/kernel/time.c | 22 ++-
> arch/arm64/kernel/vmlinux.lds.S | 10 +
> 9 files changed, 277 insertions(+), 125 deletions(-)
>
>
> base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
>

2021-08-24 06:04:08

by [email protected]

Subject: RE: [RFC PATCH v8 3/4] arm64: Introduce stack trace reliability checks in the unwinder

Hi Madhavan,

> @@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
> fp = thread_saved_fp(task);
> pc = thread_saved_pc(task);
> }
> - unwind(consume_entry, cookie, task, fp, pc);
> + unwind(consume_entry, cookie, task, fp, pc, false);
> +}
> +
> +/*
> + * arch_stack_walk_reliable() may not be used for livepatch until all of
> + * the reliability checks are in place in unwind_consume(). However,
> + * debug and test code can choose to use it even if all the checks are not
> + * in place.
> + */

I'm glad to see the long-awaited function :)

Does the above comment mean that this comment will be removed by
another patch series about live patch enablement, instead of by [PATCH 4/4]?

It seems that will take some time... but I have started thinking about test code.

Thanks,
Keiya


> +noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
> + void *cookie,
> + struct task_struct *task)
> +{
> + unsigned long fp, pc;
> +
> + if (!task)
> + task = current;
> +
> + if (task == current) {
> + /* Skip arch_stack_walk_reliable() in the stack trace. */
> + fp = (unsigned long)__builtin_frame_address(1);
> + pc = (unsigned long)__builtin_return_address(0);
> + } else {
> + /* Caller guarantees that the task is not running. */
> + fp = thread_saved_fp(task);
> + pc = thread_saved_pc(task);
> + }
> + if (unwind(consume_fn, cookie, task, fp, pc, true))
> + return 0;
> + return -EINVAL;
> }
>
> #endif
> --
> 2.25.1

2021-08-24 12:21:07

by Madhavan T. Venkataraman

Subject: Re: [RFC PATCH v8 3/4] arm64: Introduce stack trace reliability checks in the unwinder



On 8/24/21 12:55 AM, [email protected] wrote:
> Hi Madhavan,
>
>> @@ -245,7 +271,36 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>> fp = thread_saved_fp(task);
>> pc = thread_saved_pc(task);
>> }
>> - unwind(consume_entry, cookie, task, fp, pc);
>> + unwind(consume_entry, cookie, task, fp, pc, false);
>> +}
>> +
>> +/*
>> + * arch_stack_walk_reliable() may not be used for livepatch until all of
>> + * the reliability checks are in place in unwind_consume(). However,
>> + * debug and test code can choose to use it even if all the checks are not
>> + * in place.
>> + */
>
> I'm glad to see the long-awaited function :)
>
> Does the above comment mean that this comment will be removed by
> another patch series about live patch enablement, instead of [PATCH 4/4]?
>
> It seems that will take some time... but I have started thinking about test code.
>

Yes. This comment will be removed when livepatch is eventually enabled.
So, AFAICT, there are 4 pieces that are needed:

- Reliable stack trace in the kernel. I am trying to address that with my patch
series.

- Mark Rutland's work for making patching safe on ARM64.

- Objtool (or alternative method) for stack validation.

- Suraj Jitindar Singh's patch for miscellaneous things needed to enable live patch.

Once all of these pieces are in place, livepatch can be enabled.

That said, arch_stack_walk_reliable() can be used for test and debug purposes anytime
once this patch series gets accepted.
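
For example, a debug-only caller could look roughly like this (purely
illustrative; the helper names below are made up and not part of the series):

static bool count_entry(void *cookie, unsigned long pc)
{
        (*(int *)cookie)++;
        return true;
}

static void check_reliable_trace(struct task_struct *task)
{
        int entries = 0;

        /* arch_stack_walk_reliable() returns 0 on success, -EINVAL on failure. */
        if (arch_stack_walk_reliable(count_entry, &entries, task) < 0)
                pr_info("stack trace of pid %d is not reliable\n", task->pid);
        else
                pr_info("reliable stack trace of pid %d, %d entries\n",
                        task->pid, entries);
}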

Thanks.

Madhavan

2021-08-24 13:16:05

by Mark Rutland

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()

On Thu, Aug 12, 2021 at 02:06:00PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, there are multiple functions in ARM64 code that walk the
> stack using start_backtrace() and unwind_frame(). Convert all of
> them to use arch_stack_walk(). This makes maintenance easier.

It would be good to split this into a series of patches as Mark Brown
suggested in v7.

> Here is the list of functions:
>
> perf_callchain_kernel()
> get_wchan()
> return_address()
> dump_backtrace()
> profile_pc()

Note that arch_stack_walk() depends on CONFIG_STACKTRACE (which is not in
defconfig), so we'll need to reorganise things such that it's always defined,
or factor out the core of that function and add a wrapper such that we
can always use it.

> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace.h | 3 ---
> arch/arm64/kernel/perf_callchain.c | 5 +---
> arch/arm64/kernel/process.c | 39 ++++++++++++++++++-----------
> arch/arm64/kernel/return_address.c | 6 +----
> arch/arm64/kernel/stacktrace.c | 38 +++-------------------------
> arch/arm64/kernel/time.c | 22 +++++++++-------
> 6 files changed, 43 insertions(+), 70 deletions(-)
>
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index 8aebc00c1718..e43dea1c6b41 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -61,9 +61,6 @@ struct stackframe {
> #endif
> };
>
> -extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
> -extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> - bool (*fn)(void *, unsigned long), void *data);
> extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> const char *loglvl);
>
> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
> index 4a72c2727309..2f289013c9c9 100644
> --- a/arch/arm64/kernel/perf_callchain.c
> +++ b/arch/arm64/kernel/perf_callchain.c
> @@ -147,15 +147,12 @@ static bool callchain_trace(void *data, unsigned long pc)
> void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
> struct pt_regs *regs)
> {
> - struct stackframe frame;
> -
> if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> /* We don't support guest os callchain now */
> return;
> }
>
> - start_backtrace(&frame, regs->regs[29], regs->pc);
> - walk_stackframe(current, &frame, callchain_trace, entry);
> + arch_stack_walk(callchain_trace, entry, current, regs);
> }

We can also update callchain_trace to take the return value of
perf_callchain_store into account, e.g.

| static bool callchain_trace(void *data, unsigned long pc)
| {
| struct perf_callchain_entry_ctx *entry = data;
| return perf_callchain_store(entry, pc) == 0;
| }

>
> unsigned long perf_instruction_pointer(struct pt_regs *regs)
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index c8989b999250..52c12fd26407 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -544,11 +544,28 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
> return last;
> }
>
> +struct wchan_info {
> + unsigned long pc;
> + int count;
> +};
> +
> +static bool get_wchan_cb(void *arg, unsigned long pc)
> +{
> + struct wchan_info *wchan_info = arg;
> +
> + if (!in_sched_functions(pc)) {
> + wchan_info->pc = pc;
> + return false;
> + }
> + wchan_info->count--;
> + return !!wchan_info->count;
> +}

This will terminate one entry earlier than the old logic since we used
to use a post-increment (testing the prior value), and now we're
effectively using a pre-decrement (testing the new value).

I don't think that matters all that much in practice, but it might be
best to keep the original logic, e.g. initialize `count` to 0 and here
do:

return wchan_info->count++ < 16;

> +
> unsigned long get_wchan(struct task_struct *p)
> {
> - struct stackframe frame;
> - unsigned long stack_page, ret = 0;
> - int count = 0;
> + unsigned long stack_page;
> + struct wchan_info wchan_info;
> +
> if (!p || p == current || task_is_running(p))
> return 0;
>
> @@ -556,20 +573,12 @@ unsigned long get_wchan(struct task_struct *p)
> if (!stack_page)
> return 0;
>
> - start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
> + wchan_info.pc = 0;
> + wchan_info.count = 16;
> + arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);
>
> - do {
> - if (unwind_frame(p, &frame))
> - goto out;
> - if (!in_sched_functions(frame.pc)) {
> - ret = frame.pc;
> - goto out;
> - }
> - } while (count++ < 16);
> -
> -out:
> put_task_stack(p);
> - return ret;
> + return wchan_info.pc;
> }

Other than the comment above, this looks good to me.

> unsigned long arch_align_stack(unsigned long sp)
> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
> index a6d18755652f..92a0f4d434e4 100644
> --- a/arch/arm64/kernel/return_address.c
> +++ b/arch/arm64/kernel/return_address.c
> @@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
> void *return_address(unsigned int level)
> {
> struct return_address_data data;
> - struct stackframe frame;
>
> data.level = level + 2;
> data.addr = NULL;
>
> - start_backtrace(&frame,
> - (unsigned long)__builtin_frame_address(0),
> - (unsigned long)return_address);
> - walk_stackframe(current, &frame, save_return_addr, &data);
> + arch_stack_walk(save_return_addr, &data, current, NULL);
>
> if (!data.level)
> return data.addr;

Note that arch_stack_walk() will start with its caller, so
return_address() will be included in the trace where it wasn't
previously, which implies we need to skip an additional level.

That said, I'm not entirely sure why we need to skip 2 levels today; it
might be worth checking that's correct.

We should also mark return_address() as noinline to avoid surprises with
LTO.

> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 8982a2b78acf..1800310f92be 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -151,23 +151,21 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> }
> NOKPROBE_SYMBOL(walk_stackframe);
>
> -static void dump_backtrace_entry(unsigned long where, const char *loglvl)
> +static bool dump_backtrace_entry(void *arg, unsigned long where)
> {
> + char *loglvl = arg;
> printk("%s %pSb\n", loglvl, (void *)where);
> + return true;
> }
>
> void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> const char *loglvl)
> {
> - struct stackframe frame;
> - int skip = 0;
> -
> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>
> if (regs) {
> if (user_mode(regs))
> return;
> - skip = 1;
> }

We can simplify this to:

if (regs && user_mode(regs))
return;

>
> if (!tsk)
> @@ -176,36 +174,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> if (!try_get_task_stack(tsk))
> return;
>
> - if (tsk == current) {
> - start_backtrace(&frame,
> - (unsigned long)__builtin_frame_address(0),
> - (unsigned long)dump_backtrace);
> - } else {
> - /*
> - * task blocked in __switch_to
> - */
> - start_backtrace(&frame,
> - thread_saved_fp(tsk),
> - thread_saved_pc(tsk));
> - }
> -
> printk("%sCall trace:\n", loglvl);
> - do {
> - /* skip until specified stack frame */
> - if (!skip) {
> - dump_backtrace_entry(frame.pc, loglvl);
> - } else if (frame.fp == regs->regs[29]) {
> - skip = 0;
> - /*
> - * Mostly, this is the case where this function is
> - * called in panic/abort. As exception handler's
> - * stack frame does not contain the corresponding pc
> - * at which an exception has taken place, use regs->pc
> - * instead.
> - */
> - dump_backtrace_entry(regs->pc, loglvl);
> - }
> - } while (!unwind_frame(tsk, &frame));
> + arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);

It turns out we currently need this skipping to keep the ftrace call
stack balanced, and arch_stack_walk() doesn't currently do the right
thing when starting from regs. That balancing isn't quite right, and
will be wrong in some cases when unwinding across exception boundaries;
we could implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR using the FP to
solve that.

>
> put_task_stack(tsk);
> }
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index eebbc8d7123e..671b3038a772 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -32,22 +32,26 @@
> #include <asm/stacktrace.h>
> #include <asm/paravirt.h>
>
> +static bool profile_pc_cb(void *arg, unsigned long pc)
> +{
> + unsigned long *prof_pc = arg;
> +
> + if (in_lock_functions(pc))
> + return true;
> + *prof_pc = pc;
> + return false;
> +}
> +
> unsigned long profile_pc(struct pt_regs *regs)
> {
> - struct stackframe frame;
> + unsigned long prof_pc = 0;
>
> if (!in_lock_functions(regs->pc))
> return regs->pc;
>
> - start_backtrace(&frame, regs->regs[29], regs->pc);
> -
> - do {
> - int ret = unwind_frame(NULL, &frame);
> - if (ret < 0)
> - return 0;
> - } while (in_lock_functions(frame.pc));
> + arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
>
> - return frame.pc;
> + return prof_pc;
> }
> EXPORT_SYMBOL(profile_pc);

Modulo the problem above w.r.t. unwinding from regs, this looks good.

Thanks,
Mark.

>
> --
> 2.25.1
>

2021-08-24 17:27:15

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()

Thanks for the review. Responses inline...

On 8/24/21 8:13 AM, Mark Rutland wrote:
> On Thu, Aug 12, 2021 at 02:06:00PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, there are multiple functions in ARM64 code that walk the
>> stack using start_backtrace() and unwind_frame(). Convert all of
>> them to use arch_stack_walk(). This makes maintenance easier.
>
> It would be good to split this into a series of patches as Mark Brown
> suggested in v7.
>

Will do.

>> Here is the list of functions:
>>
>> perf_callchain_kernel()
>> get_wchan()
>> return_address()
>> dump_backtrace()
>> profile_pc()
>
> Note that arch_stack_walk() depends on CONFIG_STACKTRACE (which is not in
> defconfig), so we'll need to reorganise things such that it's always defined,
> or factor out the core of that function and add a wrapper such that we
> can always use it.
>

I will include CONFIG_STACKTRACE in defconfig, if that is OK with you and
Mark Brown.

>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>> ---
>> arch/arm64/include/asm/stacktrace.h | 3 ---
>> arch/arm64/kernel/perf_callchain.c | 5 +---
>> arch/arm64/kernel/process.c | 39 ++++++++++++++++++-----------
>> arch/arm64/kernel/return_address.c | 6 +----
>> arch/arm64/kernel/stacktrace.c | 38 +++-------------------------
>> arch/arm64/kernel/time.c | 22 +++++++++-------
>> 6 files changed, 43 insertions(+), 70 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>> index 8aebc00c1718..e43dea1c6b41 100644
>> --- a/arch/arm64/include/asm/stacktrace.h
>> +++ b/arch/arm64/include/asm/stacktrace.h
>> @@ -61,9 +61,6 @@ struct stackframe {
>> #endif
>> };
>>
>> -extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
>> -extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
>> - bool (*fn)(void *, unsigned long), void *data);
>> extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>> const char *loglvl);
>>
>> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
>> index 4a72c2727309..2f289013c9c9 100644
>> --- a/arch/arm64/kernel/perf_callchain.c
>> +++ b/arch/arm64/kernel/perf_callchain.c
>> @@ -147,15 +147,12 @@ static bool callchain_trace(void *data, unsigned long pc)
>> void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
>> struct pt_regs *regs)
>> {
>> - struct stackframe frame;
>> -
>> if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
>> /* We don't support guest os callchain now */
>> return;
>> }
>>
>> - start_backtrace(&frame, regs->regs[29], regs->pc);
>> - walk_stackframe(current, &frame, callchain_trace, entry);
>> + arch_stack_walk(callchain_trace, entry, current, regs);
>> }
>
> We can also update callchain_trace to take the return value of
> perf_callchain_store into account, e.g.
>
> | static bool callchain_trace(void *data, unsigned long pc)
> | {
> | struct perf_callchain_entry_ctx *entry = data;
> | return perf_callchain_store(entry, pc) == 0;
> | }
>

OK.

>>
>> unsigned long perf_instruction_pointer(struct pt_regs *regs)
>> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> index c8989b999250..52c12fd26407 100644
>> --- a/arch/arm64/kernel/process.c
>> +++ b/arch/arm64/kernel/process.c
>> @@ -544,11 +544,28 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
>> return last;
>> }
>>
>> +struct wchan_info {
>> + unsigned long pc;
>> + int count;
>> +};
>> +
>> +static bool get_wchan_cb(void *arg, unsigned long pc)
>> +{
>> + struct wchan_info *wchan_info = arg;
>> +
>> + if (!in_sched_functions(pc)) {
>> + wchan_info->pc = pc;
>> + return false;
>> + }
>> + wchan_info->count--;
>> + return !!wchan_info->count;
>> +}
>
> This will terminate one entry earlier than the old logic since we used
> to use a post-increment (testing the prior value), and now we're
> effectively using a pre-decrement (testing the new value).
>
> I don't think that matters all that much in practice, but it might be
> best to keep the original logic, e.g. initialize `count` to 0 and here
> do:
>
> return wchan_info->count++ < 16;
>

The reason I did it this way is that with the old logic the actual limit
implemented is 17 instead of 16. That seemed odd. But I could do it the
way you have suggested.
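
i.e., something along these lines (just a sketch):

static bool get_wchan_cb(void *arg, unsigned long pc)
{
        struct wchan_info *wchan_info = arg;

        if (!in_sched_functions(pc)) {
                wchan_info->pc = pc;
                return false;
        }
        /* Keep the original post-increment limit; count starts at 0. */
        return wchan_info->count++ < 16;
}

with wchan_info.count initialized to 0 in get_wchan().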


>> +
>> unsigned long get_wchan(struct task_struct *p)
>> {
>> - struct stackframe frame;
>> - unsigned long stack_page, ret = 0;
>> - int count = 0;
>> + unsigned long stack_page;
>> + struct wchan_info wchan_info;
>> +
>> if (!p || p == current || task_is_running(p))
>> return 0;
>>
>> @@ -556,20 +573,12 @@ unsigned long get_wchan(struct task_struct *p)
>> if (!stack_page)
>> return 0;
>>
>> - start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
>> + wchan_info.pc = 0;
>> + wchan_info.count = 16;
>> + arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);
>>
>> - do {
>> - if (unwind_frame(p, &frame))
>> - goto out;
>> - if (!in_sched_functions(frame.pc)) {
>> - ret = frame.pc;
>> - goto out;
>> - }
>> - } while (count++ < 16);
>> -
>> -out:
>> put_task_stack(p);
>> - return ret;
>> + return wchan_info.pc;
>> }
>
> Other than the comment above, this looks good to me.
>

OK.

>> unsigned long arch_align_stack(unsigned long sp)
>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
>> index a6d18755652f..92a0f4d434e4 100644
>> --- a/arch/arm64/kernel/return_address.c
>> +++ b/arch/arm64/kernel/return_address.c
>> @@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
>> void *return_address(unsigned int level)
>> {
>> struct return_address_data data;
>> - struct stackframe frame;
>>
>> data.level = level + 2;
>> data.addr = NULL;
>>
>> - start_backtrace(&frame,
>> - (unsigned long)__builtin_frame_address(0),
>> - (unsigned long)return_address);
>> - walk_stackframe(current, &frame, save_return_addr, &data);
>> + arch_stack_walk(save_return_addr, &data, current, NULL);
>>
>> if (!data.level)
>> return data.addr;
>
> Note that arch_stack_walk() will start with its caller, so
> return_address() will be included in the trace where it wasn't
> previously, which implies we need to skip an additional level.
>

You are correct. I will fix this. Thanks for catching this.

> That said, I'm not entirely sure why we need to skip 2 levels today; it
> might be worth checking that's correct.
>

AFAICT, return_address() acts like __builtin_return_address(). That is, it
returns the address of the caller. If func() calls return_address(),
func() wants its caller's address. So, return_address() and func() need to
be skipped.

I will change it to skip 3 levels instead of 2.


> We should also mark return_address() as noinline to avoid surprises with
> LTO.
>

Will do.

>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 8982a2b78acf..1800310f92be 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -151,23 +151,21 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
>> }
>> NOKPROBE_SYMBOL(walk_stackframe);
>>
>> -static void dump_backtrace_entry(unsigned long where, const char *loglvl)
>> +static bool dump_backtrace_entry(void *arg, unsigned long where)
>> {
>> + char *loglvl = arg;
>> printk("%s %pSb\n", loglvl, (void *)where);
>> + return true;
>> }
>>
>> void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>> const char *loglvl)
>> {
>> - struct stackframe frame;
>> - int skip = 0;
>> -
>> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>>
>> if (regs) {
>> if (user_mode(regs))
>> return;
>> - skip = 1;
>> }
>
> We can simplify this to:
>
> if (regs && user_mode(regs))
> return;
>

OK.

>>
>> if (!tsk)
>> @@ -176,36 +174,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>> if (!try_get_task_stack(tsk))
>> return;
>>
>> - if (tsk == current) {
>> - start_backtrace(&frame,
>> - (unsigned long)__builtin_frame_address(0),
>> - (unsigned long)dump_backtrace);
>> - } else {
>> - /*
>> - * task blocked in __switch_to
>> - */
>> - start_backtrace(&frame,
>> - thread_saved_fp(tsk),
>> - thread_saved_pc(tsk));
>> - }
>> -
>> printk("%sCall trace:\n", loglvl);
>> - do {
>> - /* skip until specified stack frame */
>> - if (!skip) {
>> - dump_backtrace_entry(frame.pc, loglvl);
>> - } else if (frame.fp == regs->regs[29]) {
>> - skip = 0;
>> - /*
>> - * Mostly, this is the case where this function is
>> - * called in panic/abort. As exception handler's
>> - * stack frame does not contain the corresponding pc
>> - * at which an exception has taken place, use regs->pc
>> - * instead.
>> - */
>> - dump_backtrace_entry(regs->pc, loglvl);
>> - }
>> - } while (!unwind_frame(tsk, &frame));
>> + arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
>
> It turns out we currently need this skipping to keep the ftrace call
> stack balanced, and arch_stack_walk() doesn't currently do the right
> thing when starting from regs. That balancing isn't quite right, and
> will be wrong in some cases when unwinding across exception boundaries;
> we could implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR using the FP to
> solve that.
>

I am not sure that I completely understand. So, I will study this and get back
to you with any questions.

>>
>> put_task_stack(tsk);
>> }
>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>> index eebbc8d7123e..671b3038a772 100644
>> --- a/arch/arm64/kernel/time.c
>> +++ b/arch/arm64/kernel/time.c
>> @@ -32,22 +32,26 @@
>> #include <asm/stacktrace.h>
>> #include <asm/paravirt.h>
>>
>> +static bool profile_pc_cb(void *arg, unsigned long pc)
>> +{
>> + unsigned long *prof_pc = arg;
>> +
>> + if (in_lock_functions(pc))
>> + return true;
>> + *prof_pc = pc;
>> + return false;
>> +}
>> +
>> unsigned long profile_pc(struct pt_regs *regs)
>> {
>> - struct stackframe frame;
>> + unsigned long prof_pc = 0;
>>
>> if (!in_lock_functions(regs->pc))
>> return regs->pc;
>>
>> - start_backtrace(&frame, regs->regs[29], regs->pc);
>> -
>> - do {
>> - int ret = unwind_frame(NULL, &frame);
>> - if (ret < 0)
>> - return 0;
>> - } while (in_lock_functions(frame.pc));
>> + arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
>>
>> - return frame.pc;
>> + return prof_pc;
>> }
>> EXPORT_SYMBOL(profile_pc);
>
> Modulo the problem above w.r.t. unwinding from regs, this looks good.
>

Great. Thanks.

Madhavan

2021-08-24 17:44:38

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()



>>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
>>> index a6d18755652f..92a0f4d434e4 100644
>>> --- a/arch/arm64/kernel/return_address.c
>>> +++ b/arch/arm64/kernel/return_address.c
>>> @@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
>>> void *return_address(unsigned int level)
>>> {
>>> struct return_address_data data;
>>> - struct stackframe frame;
>>>
>>> data.level = level + 2;
>>> data.addr = NULL;
>>>
>>> - start_backtrace(&frame,
>>> - (unsigned long)__builtin_frame_address(0),
>>> - (unsigned long)return_address);
>>> - walk_stackframe(current, &frame, save_return_addr, &data);
>>> + arch_stack_walk(save_return_addr, &data, current, NULL);
>>>
>>> if (!data.level)
>>> return data.addr;
>>
>> Note that arch_stack_walk() will start with its caller, so
>> return_address() will be included in the trace where it wasn't
>> previously, which implies we need to skip an additional level.
>>
>
> You are correct. I will fix this. Thanks for catching this.
>
>> That said, I'm not entirely sure why we need to skip 2 levels today; it
>> might be worth checking that's correct.
>>
>
> AFAICT, return_address() acts like builtin_return_address(). That is, it
> returns the address of the caller. If func() calls return_address(),
> func() wants its caller's address. So, return_address() and func() need to
> be skipped.
>
> I will change it to skip 3 levels instead of 2.
>

Actually, I take that back. I remember now. return_address() used to start
with PC=return_address(). That is, it used to start with itself. arch_stack_walk()
starts with its caller, which, in this case, is return_address(). So, I don't need
to change anything.
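
To spell it out with an illustrative trace, assuming some func() calls
return_address(level):

        entry 0: return_address()   <- the old walk started here (PC = return_address);
                                       arch_stack_walk() also starts with its caller,
                                       which is return_address()
        entry 1: func()
        entry 2: func()'s caller    <- what level == 0 should return

So skipping two entries, i.e. data.level = level + 2, remains correct.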

Do you agree?

Madhavan

2021-08-24 17:44:51

by Mark Brown

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()

On Tue, Aug 24, 2021 at 12:21:28PM -0500, Madhavan T. Venkataraman wrote:
> On 8/24/21 8:13 AM, Mark Rutland wrote:
> > On Thu, Aug 12, 2021 at 02:06:00PM -0500, [email protected] wrote:

> > Note that arch_stack_walk() depends on CONFIG_STACKTRACE (which is not in
> > defconfig), so we'll need to reorganise things such that it's always defined,
> > or factor out the core of that function and add a wrapper such that we
> > can always use it.

> I will include CONFIG_STACKTRACE in defconfig, if that is OK with you and
> Mark Brown.

That might be separately useful but it doesn't address the issue: if
something is optional we need to handle the case where that option is
disabled. It'll need to be one of the two options Mark Rutland
mentioned above.



2021-08-24 17:46:52

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()



On 8/24/21 12:38 PM, Mark Brown wrote:
> On Tue, Aug 24, 2021 at 12:21:28PM -0500, Madhavan T. Venkataraman wrote:
>> On 8/24/21 8:13 AM, Mark Rutland wrote:
>>> On Thu, Aug 12, 2021 at 02:06:00PM -0500, [email protected] wrote:
>
>>> Note that arch_stack_walk() depends on CONFIG_STACKTRACE (which is not in
>>> defconfig), so we'll need to reorganise things such that it's always defined,
>>> or factor out the core of that function and add a wrapper such that we
>>> can always use it.
>
>> I will include CONFIG_STACKTRACE in defconfig, if that is OK with you and
>> Mark Brown.
>
> That might be separately useful but it doesn't address the issue: if
> something is optional we need to handle the case where that option is
> disabled. It'll need to be one of the two options Mark Rutland
> mentioned above.
>

OK. I will do it so it is always defined.

Thanks.

Madhavan

2021-08-25 00:12:32

by [email protected]

[permalink] [raw]
Subject: RE: [RFC PATCH v8 3/4] arm64: Introduce stack trace reliability checks in the unwinder


Thank you for the information.

Keiya

2021-08-26 04:55:08

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()

Hi Mark Rutland, Mark Brown,

Do you have any comments on the reliability part of the patch series?

Madhavan


2021-08-26 16:27:17

by Mark Brown

[permalink] [raw]
Subject: Re: [RFC PATCH v8 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance

On Thu, Aug 12, 2021 at 02:06:01PM -0500, [email protected] wrote:

> Renaming of unwinder functions
> ==============================

> Rename unwinder functions to unwind_*() similar to other architectures
> for naming consistency. More on this below.

This feels like it could probably do with splitting up a bit for
reviewability; several of these headers you've got in the commit
logs look like they could be separate commits. Splitting things
up does help with reviewability, since having only one change to keep
in mind at once is a lot less cognitive load.

> Replace walk_stackframe() with unwind()
> =======================================
>
> walk_stackframe() contains the unwinder loop that walks the stack
> frames. Currently, start_backtrace() and walk_stackframe() are called
> separately. They should be combined in the same function. Also, the
> loop in walk_stackframe() should be simplified and should look like
> the unwind loops in other architectures such as X86 and S390.

This definitely seems like a separate change.



2021-08-26 16:29:19

by Mark Brown

[permalink] [raw]
Subject: Re: [RFC PATCH v8 3/4] arm64: Introduce stack trace reliability checks in the unwinder

On Thu, Aug 12, 2021 at 02:06:02PM -0500, [email protected] wrote:

> + if (frame->need_reliable && !unwind_is_reliable(frame)) {
> + /* Cannot unwind to the next frame reliably. */
> + frame->failed = true;
> + return false;
> + }

This means we only collect reliability information in the case
where we're specifically doing a reliable stacktrace. For
example when printing stack traces on the console it might be
useful to print a ? or something if the frame is unreliable as a
hint to the reader that the information might be misleading.
Could we therefore change the flag here to a reliability one and
our need_reliable check so that we always run
unwind_is_reliable()?

I'm not sure if we need to abandon the trace on first error when
doing a reliable trace but I can see it's a bit safer so perhaps
better to do so. If we don't abandon then we don't require the
need_reliable check at all.



2021-08-26 23:21:09

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance



On 8/26/21 10:46 AM, Mark Brown wrote:
> On Thu, Aug 12, 2021 at 02:06:01PM -0500, [email protected] wrote:
>
>> Renaming of unwinder functions
>> ==============================
>
>> Rename unwinder functions to unwind_*() similar to other architectures
>> for naming consistency. More on this below.
>
> This feels like it could probably do with splitting up a bit for
> reviewability; several of these headers you've got in the commit
> logs look like they could be separate commits. Splitting things
> up does help with reviewability, since having only one change to keep
> in mind at once is a lot less cognitive load.
>
>> Replace walk_stackframe() with unwind()
>> =======================================
>>
>> walk_stackframe() contains the unwinder loop that walks the stack
>> frames. Currently, start_backtrace() and walk_stackframe() are called
>> separately. They should be combined in the same function. Also, the
>> loop in walk_stackframe() should be simplified and should look like
>> the unwind loops in other architectures such as X86 and S390.
>
> This definitely seems like a separate change.
>

OK. I will take a look at splitting the patch.

I am also requesting a review of the sym_code special section approach.
I know that you have already approved it. I wanted one more vote. Then,
I can remove the "RFC" word from the title and then it will be just a
code review of the patch series.

Mark Rutland,

Do you also approve the idea of placing unreliable functions (from an unwind
perspective) in a special section and using that in the unwinder for
reliable stack trace?
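
(For reference, the check would be roughly along the following lines. The
struct, section and symbol names here are only illustrative and may differ
from the actual patch:)

struct code_range {
        unsigned long start;
        unsigned long end;
};

/* Table of address ranges of SYM_CODE functions, emitted into a special section. */
extern struct code_range __sym_code_functions_start[];
extern struct code_range __sym_code_functions_end[];

static bool pc_in_sym_code(unsigned long pc)
{
        const struct code_range *range;

        for (range = __sym_code_functions_start;
             range < __sym_code_functions_end; range++) {
                if (pc >= range->start && pc < range->end)
                        return true;
        }
        return false;
}

unwind_is_reliable() would then treat any return PC that falls in one of these
ranges as unreliable.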

Thanks.

Madhavan

2021-08-26 23:33:17

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 3/4] arm64: Introduce stack trace reliability checks in the unwinder



On 8/26/21 10:57 AM, Mark Brown wrote:
> On Thu, Aug 12, 2021 at 02:06:02PM -0500, [email protected] wrote:
>
>> + if (frame->need_reliable && !unwind_is_reliable(frame)) {
>> + /* Cannot unwind to the next frame reliably. */
>> + frame->failed = true;
>> + return false;
>> + }
>
> This means we only collect reliability information in the case
> where we're specifically doing a reliable stacktrace. For
> example when printing stack traces on the console it might be
> useful to print a ? or something if the frame is unreliable as a
> hint to the reader that the information might be misleading.
> Could we therefore change the flag here to a reliability one and
> our need_reliable check so that we always run
> unwind_is_reliable()?
>
> I'm not sure if we need to abandon the trace on first error when
> doing a reliable trace but I can see it's a bit safer so perhaps
> better to do so. If we don't abandon then we don't require the
> need_reliable check at all.
>

I think that the caller should be able to specify that the stack trace
should be abandoned, as livepatch requires.

So, we could always do the reliability check, but keep need_reliable.
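
In unwind_consume(), that would look something like this (a sketch; the
per-frame "reliable" field is only illustrative):

        frame->reliable = unwind_is_reliable(frame);
        if (!frame->reliable && frame->need_reliable) {
                /* Callers such as livepatch must abandon the trace. */
                frame->failed = true;
                return false;
        }

That way the reliability information is always collected (e.g. so that
dump_backtrace() could print a '?'), while callers that pass need_reliable
still abort on the first unreliable frame.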

Thanks.

Madhavan

2021-09-01 16:26:46

by Mark Brown

[permalink] [raw]
Subject: Re: [RFC PATCH v8 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance

On Thu, Aug 26, 2021 at 06:19:07PM -0500, Madhavan T. Venkataraman wrote:

> Mark Rutland,

> Do you also approve the idea of placing unreliable functions (from an unwind
> perspective) in a special section and using that in the unwinder for
> reliable stack trace?

Rutland is on vacation for a couple of weeks so he's unlikely to reply
before the merge window is over, I'm afraid.



2021-09-02 07:14:42

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 2/4] arm64: Reorganize the unwinder code for better consistency and maintenance



On 9/1/21 11:20 AM, Mark Brown wrote:
> On Thu, Aug 26, 2021 at 06:19:07PM -0500, Madhavan T. Venkataraman wrote:
>
>> Mark Rutland,
>
>> Do you also approve the idea of placing unreliable functions (from an unwind
>> perspective) in a special section and using that in the unwinder for
>> reliable stack trace?
>
> Rutland is on vacation for a couple of weeks so he's unlikely to reply
> before the merge window is over, I'm afraid.
>

OK. I am pretty sure he is fine with the special sections idea. So, I will
send out version 8 with the changes you requested and without the "RFC".

Thanks.

Madhavan

2021-10-09 23:49:56

by Madhavan T. Venkataraman

[permalink] [raw]
Subject: Re: [RFC PATCH v8 1/4] arm64: Make all stack walking functions use arch_stack_walk()



On 8/24/21 8:13 AM, Mark Rutland wrote:
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 8982a2b78acf..1800310f92be 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -151,23 +151,21 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
>> }
>> NOKPROBE_SYMBOL(walk_stackframe);
>>
>> -static void dump_backtrace_entry(unsigned long where, const char *loglvl)
>> +static bool dump_backtrace_entry(void *arg, unsigned long where)
>> {
>> + char *loglvl = arg;
>> printk("%s %pSb\n", loglvl, (void *)where);
>> + return true;
>> }
>>
>> void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>> const char *loglvl)
>> {
>> - struct stackframe frame;
>> - int skip = 0;
>> -
>> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>>
>> if (regs) {
>> if (user_mode(regs))
>> return;
>> - skip = 1;
>> }
>
> We can simplify this to:
>
> if (regs && user_mode(regs))
> return;
>
>>
>> if (!tsk)
>> @@ -176,36 +174,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
>> if (!try_get_task_stack(tsk))
>> return;
>>
>> - if (tsk == current) {
>> - start_backtrace(&frame,
>> - (unsigned long)__builtin_frame_address(0),
>> - (unsigned long)dump_backtrace);
>> - } else {
>> - /*
>> - * task blocked in __switch_to
>> - */
>> - start_backtrace(&frame,
>> - thread_saved_fp(tsk),
>> - thread_saved_pc(tsk));
>> - }
>> -
>> printk("%sCall trace:\n", loglvl);
>> - do {
>> - /* skip until specified stack frame */
>> - if (!skip) {
>> - dump_backtrace_entry(frame.pc, loglvl);
>> - } else if (frame.fp == regs->regs[29]) {
>> - skip = 0;
>> - /*
>> - * Mostly, this is the case where this function is
>> - * called in panic/abort. As exception handler's
>> - * stack frame does not contain the corresponding pc
>> - * at which an exception has taken place, use regs->pc
>> - * instead.
>> - */
>> - dump_backtrace_entry(regs->pc, loglvl);
>> - }
>> - } while (!unwind_frame(tsk, &frame));
>> + arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
>
> It turns out we currently need this skipping to keep the ftrace call
> stack balanced, and arch_stack_walk() doesn't currently do the right
> thing when starting from regs. That balancing isn't quite right, and
> will be wrong in some cases when unwinding across exception boundaries;
> we could implement HAVE_FUNCTION_GRAPH_RET_ADDR_PTR using the FP to
> solve that.
>

Hi Mark,

It seems that the behavior is the same in the old and new code. Do you
agree?

In the old code when regs is used, the stack trace starts from regs->regs[29]
and the first PC displayed is regs->pc. arch_stack_walk() does the same thing.

Can you elaborate on what change you want me to make here?

Madhavan