From: "Madhavan T. Venkataraman" <[email protected]>
Make all stack walking functions use arch_stack_walk()
======================================================
Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame() or start_backtrace()
and walk_stackframe(). Convert all of them to use arch_stack_walk().
This makes maintenance easier.
This means that arch_stack_walk() needs to be always defined. So,
select CONFIG_STACKTRACE in the ARM64 Kconfig file.
Consolidate the unwinder
========================
Currently, start_backtrace() and walk_stackframe() are called separately.
There is no need to do that. Move the call to start_backtrace() into
walk_stackframe() so that walk_stackframe() is the only unwinder function
a consumer needs to call.
The consumers of walk_stackframe() are arch_stack_walk() and
arch_stack_walk_reliable().
Rename unwinder functions
=========================
Rename unwinder functions to unwind*() similar to other architectures
for naming consistency.
start_backtrace() ==> unwind_start()
unwind_frame() ==> unwind_next()
walk_stackframe() ==> unwind()
Annotate unwinder functions
===========================
Annotate all of the unwind_*() functions with notrace so they cannot be
ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
code can call the unwinder.
Redefine the unwinder loop
==========================
Redefine the unwinder loop and make it similar to other architectures.
Define the following:
unwind_start(&frame, fp, pc);
while (unwind_continue(task, &frame, consume_entry, cookie))
unwind_next(task, &frame);
unwind_continue()
This new function implements checks to determine whether the
unwind should continue or terminate.
unwind_next()
Same as the original unwind_frame() except:
- the stack trace termination check has been moved from here to
unwind_continue(). So, unwind_next() assumes that the fp is valid.
- unwind_frame() used to return an error value. unwind_next() only
sets an internal flag "failed" to indicate that an error was
encountered. This flag is checked by unwind_continue().
Reliability checks
==================
There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.
Introduce a new function called unwind_check_reliability() that will detect
these cases and set a boolean "reliable" in the stackframe.
unwind_check_reliability() will be called for every frame. That is, in
unwind_start() as well as unwind_next().
Introduce the first reliability check in unwind_check_reliability() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code.
Other reliability checks will be added in the future.
arch_stack_walk_reliable()
==========================
Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except that it returns an error if the stack trace is
found to be unreliable.
Until all of the reliability checks are in place in
unwind_check_reliability(), arch_stack_walk_reliable() may not be used by
livepatch. But it may be used by debug and test code.
SYM_CODE check
==============
This is the second reliability check implemented.
SYM_CODE functions do not follow normal calling conventions. They cannot
be unwound reliably using the frame pointer. Collect the address ranges
of these functions in a special section called "sym_code_functions".
In unwind_check_reliability(), check the return PC against these ranges. If
a match is found, then mark the stack trace unreliable.
Last stack frame
----------------
If a SYM_CODE function occurs in the very last frame in the stack trace,
then the stack trace is not considered unreliable. This is because there
is no more unwinding to do. Examples:
- EL0 exception stack traces end in the top level EL0 exception
handlers.
- All kernel thread stack traces end in ret_from_fork().
---
Changelog:
v9:
From me:
- Removed the word "RFC" from the subject line as I believe this
is mature enough to be a regular patch.
From Mark Brown, Mark Rutland:
- Split the patches into smaller, self-contained ones.
- Always enable STACKTRACE so that arch_stack_walk() is always
defined.
From Mark Rutland:
- Update callchain_trace() take the return value of
perf_callchain_store() into acount.
- Restore get_wchan() behavior to the original code.
- Simplify an if statement in dump_backtrace().
From Mark Brown:
- Do not abort the stack trace on the first unreliable frame.
v8:
- Synced to v5.14-rc5.
From Mark Rutland:
- Make the unwinder loop similar to other architectures.
- Keep details to within the unwinder functions and return a simple
boolean to the caller.
- Convert some of the current code that contains unwinder logic to
simply use arch_stack_walk(). I have converted all of them.
- Do not copy sym_code_functions[]. Just place it in rodata for now.
- Have the main loop check for termination conditions rather than
having unwind_frame() check for them. In other words, let
unwind_frame() assume that the fp is valid.
- Replace the big comment for SYM_CODE functions with a shorter
comment.
/*
* As SYM_CODE functions don't follow the usual calling
* conventions, we assume by default that any SYM_CODE function
* cannot be unwound reliably.
*
* Note that this includes:
*
* - Exception handlers and entry assembly
* - Trampoline assembly (e.g., ftrace, kprobes)
* - Hypervisor-related assembly
* - Hibernation-related assembly
* - CPU start-stop, suspend-resume assembly
* - Kernel relocation assembly
*/
v7:
The Mailer screwed up the threading on this. So, I have resent this
same series as version 8 with proper threading to avoid confusion.
v6:
From Mark Rutland:
- The per-frame reliability concept and flag are acceptable. But more
work is needed to make the per-frame checks more accurate and more
complete. E.g., some code reorg is being worked on that will help.
I have now removed the frame->reliable flag and deleted the whole
concept of per-frame status. This is orthogonal to this patch series.
Instead, I have improved the unwinder to return proper return codes
so a caller can take appropriate action without needing per-frame
status.
- Remove the mention of PLTs and update the comment.
I have replaced the comment above the call to __kernel_text_address()
with the comment suggested by Mark Rutland.
Other comments:
- Other comments on the per-frame stuff are not relevant because
that approach is not there anymore.
v5:
From Keiya Nobuta:
- The term blacklist(ed) is not to be used anymore. I have changed it
to unreliable. So, the function unwinder_blacklisted() has been
changed to unwinder_is_unreliable().
From Mark Brown:
- Add a comment for the "reliable" flag in struct stackframe. The
reliability attribute is not complete until all the checks are
in place. Added a comment above struct stackframe.
- Include some of the comments in the cover letter in the actual
code so that we can compare it with the reliable stack trace
requirements document for completeness. I have added a comment:
- above unwinder_is_unreliable() that lists the requirements
that are addressed by the function.
- above the __kernel_text_address() call about all the cases
the call covers.
v4:
From Mark Brown:
- I was checking the return PC with __kernel_text_address() before
the Function Graph trace handling. Mark Brown felt that all the
reliability checks should be performed on the original return PC
once that is obtained. So, I have moved all the reliability checks
to after the Function Graph Trace handling code in the unwinder.
Basically, the unwinder should perform PC translations first (for
rhe return trampoline for Function Graph Tracing, Kretprobes, etc).
Then, the reliability checks should be applied to the resulting
PC.
- Mark said to improve the naming of the new functions so they don't
collide with existing ones. I have used a prefix "unwinder_" for
all the new functions.
From Josh Poimboeuf:
- In the error scenarios in the unwinder, the reliable flag in the
stack frame should be set. Implemented this.
- Some of the other comments are not relevant to the new code as
I have taken a different approach in the new code. That is why
I have not made those changes. E.g., Ard wanted me to add the
"const" keyword to the global section array. That array does not
exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
the same array in a for loop.
Other changes:
- Add a new definition for SYM_CODE_END() that adds the address
range of the function to a special section called
"sym_code_functions".
- Include the new section under initdata in vmlinux.lds.S.
- Define an early_initcall() to copy the contents of the
"sym_code_functions" section to an array by the same name.
- Define a function unwinder_blacklisted() that compares a return
PC against sym_code_sections[]. If there is a match, mark the
stack trace unreliable. Call this from unwind_frame().
v3:
- Implemented a sym_code_ranges[] array to contains sections bounds
for text sections that contain SYM_CODE_*() functions. The unwinder
checks each return PC against the sections. If it falls in any of
the sections, the stack trace is marked unreliable.
- Moved SYM_CODE functions from .text and .init.text into a new
text section called ".code.text". Added this section to
vmlinux.lds.S and sym_code_ranges[].
- Fixed the logic in the unwinder that handles Function Graph
Tracer return trampoline.
- Removed all the previous code that handles:
- ftrace entry code for traced function
- special_functions[] array that lists individual functions
- kretprobe_trampoline() special case
v2
- Removed the terminating entry { 0, 0 } in special_functions[]
and replaced it with the idiom { /* sentinel */ }.
- Change the ftrace trampoline entry ftrace_graph_call in
special_functions[] to ftrace_call + 4 and added explanatory
comments.
- Unnested #ifdefs in special_functions[] for FTRACE.
v1
- Define a bool field in struct stackframe. This will indicate if
a stack trace is reliable.
- Implement a special_functions[] array that will be populated
with special functions in which the stack trace is considered
unreliable.
- Using kallsyms_lookup(), get the address ranges for the special
functions and record them.
- Implement an is_reliable_function(pc). This function will check
if a given return PC falls in any of the special functions. If
it does, the stack trace is unreliable.
- Implement check_reliability() function that will check if a
stack frame is reliable. Call is_reliable_function() from
check_reliability().
- Before a return PC is checked against special_funtions[], it
must be validates as a proper kernel text address. Call
__kernel_text_address() from check_reliability().
- Finally, call check_reliability() from unwind_frame() for
each stack frame.
- Add EL1 exception handlers to special_functions[].
el1_sync();
el1_irq();
el1_error();
el1_sync_invalid();
el1_irq_invalid();
el1_fiq_invalid();
el1_error_invalid();
- The above functions are currently defined as LOCAL symbols.
Make them global so that they can be referenced from the
unwinder code.
- Add FTRACE trampolines to special_functions[]:
ftrace_graph_call()
ftrace_graph_caller()
return_to_handler()
- Add the kretprobe trampoline to special functions[]:
kretprobe_trampoline()
Previous versions and discussion
================================
v8: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v7: Mailer screwed up the threading. Sent the same as v8 with proper threading.
v6: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v5: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v4: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v3: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v2: https://lore.kernel.org/linux-arm-kernel/[email protected]/
v1: https://lore.kernel.org/linux-arm-kernel/[email protected]/
Madhavan T. Venkataraman (11):
arm64: Select STACKTRACE in arch/arm64/Kconfig
arm64: Make perf_callchain_kernel() use arch_stack_walk()
arm64: Make get_wchan() use arch_stack_walk()
arm64: Make return_address() use arch_stack_walk()
arm64: Make dump_stacktrace() use arch_stack_walk()
arm64: Make profile_pc() use arch_stack_walk()
arm64: Call stack_backtrace() only from within walk_stackframe()
arm64: Rename unwinder functions, prevent them from being traced and
kprobed
arm64: Make the unwind loop in unwind() similar to other architectures
arm64: Introduce stack trace reliability checks in the unwinder
arm64: Create a list of SYM_CODE functions, check return PC against
list
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/linkage.h | 12 ++
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/include/asm/stacktrace.h | 12 +-
arch/arm64/kernel/perf_callchain.c | 8 +-
arch/arm64/kernel/process.c | 38 ++--
arch/arm64/kernel/return_address.c | 6 +-
arch/arm64/kernel/stacktrace.c | 274 +++++++++++++++++++---------
arch/arm64/kernel/time.c | 22 ++-
arch/arm64/kernel/vmlinux.lds.S | 10 +
10 files changed, 257 insertions(+), 127 deletions(-)
base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, there are multiple functions in ARM64 code that walk the
stack using start_backtrace() and unwind_frame() or start_backtrace()
and walk_stackframe(). They should all be converted to use
arch_stack_walk(). This makes maintenance easier.
To do that, arch_stack_walk() must always be defined. arch_stack_walk()
is within #ifdef CONFIG_STACKTRACE. So, select STACKTRACE in
arch/arm64/Kconfig.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fdcd54d39c1e..bfb0ce60d820 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -35,6 +35,7 @@ config ARM64
select ARCH_HAS_SET_DIRECT_MAP
select ARCH_HAS_SET_MEMORY
select ARCH_STACKWALK
+ select STACKTRACE
select ARCH_HAS_STRICT_KERNEL_RWX
select ARCH_HAS_STRICT_MODULE_RWX
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, get_wchan() in ARM64 code walks the stack using start_backtrace()
and unwind_frame(). Make it use arch_stack_walk() instead. This makes
maintenance easier.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/kernel/process.c | 38 ++++++++++++++++++++++---------------
1 file changed, 23 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index c8989b999250..48ed89acce0d 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -544,11 +544,27 @@ __notrace_funcgraph struct task_struct *__switch_to(struct task_struct *prev,
return last;
}
+struct wchan_info {
+ unsigned long pc;
+ int count;
+};
+
+static bool get_wchan_cb(void *arg, unsigned long pc)
+{
+ struct wchan_info *wchan_info = arg;
+
+ if (!in_sched_functions(pc)) {
+ wchan_info->pc = pc;
+ return false;
+ }
+ return wchan_info->count++ < 16;
+}
+
unsigned long get_wchan(struct task_struct *p)
{
- struct stackframe frame;
- unsigned long stack_page, ret = 0;
- int count = 0;
+ unsigned long stack_page;
+ struct wchan_info wchan_info;
+
if (!p || p == current || task_is_running(p))
return 0;
@@ -556,20 +572,12 @@ unsigned long get_wchan(struct task_struct *p)
if (!stack_page)
return 0;
- start_backtrace(&frame, thread_saved_fp(p), thread_saved_pc(p));
+ wchan_info.pc = 0;
+ wchan_info.count = 0;
+ arch_stack_walk(get_wchan_cb, &wchan_info, p, NULL);
- do {
- if (unwind_frame(p, &frame))
- goto out;
- if (!in_sched_functions(frame.pc)) {
- ret = frame.pc;
- goto out;
- }
- } while (count++ < 16);
-
-out:
put_task_stack(p);
- return ret;
+ return wchan_info.pc;
}
unsigned long arch_align_stack(unsigned long sp)
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, arch_stack_walk() calls start_backtrace() and walk_stackframe()
separately. There is no need to do that. Instead, call start_backtrace()
from within walk_stackframe(). In other words, walk_stackframe() is the only
unwind function a consumer needs to call.
Currently, the only consumer is arch_stack_walk(). In the future,
arch_stack_walk_reliable() will be another consumer.
start_backtrace(), unwind_frame() and walk_stackframe() are only used
within arm64/kernel/stacktrace.c. Make them static and remove them from
arch/arm64/include/asm/stacktrace.h.
Currently, there is a check for a NULL task in unwind_frame(). It is not
needed since all current consumers pass a non-NULL task.
Use struct stackframe only within the unwind functions.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 6 ----
arch/arm64/kernel/stacktrace.c | 51 ++++++++++++++++-------------
2 files changed, 29 insertions(+), 28 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 8aebc00c1718..c239f357d779 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -61,9 +61,6 @@ struct stackframe {
#endif
};
-extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
-extern void walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
- bool (*fn)(void *, unsigned long), void *data);
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl);
@@ -148,7 +145,4 @@ static inline bool on_accessible_stack(const struct task_struct *tsk,
return false;
}
-void start_backtrace(struct stackframe *frame, unsigned long fp,
- unsigned long pc);
-
#endif /* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 776c4debb5a7..7d32cee9ef4b 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -33,8 +33,8 @@
*/
-void start_backtrace(struct stackframe *frame, unsigned long fp,
- unsigned long pc)
+static void start_backtrace(struct stackframe *frame, unsigned long fp,
+ unsigned long pc)
{
frame->fp = fp;
frame->pc = pc;
@@ -63,14 +63,12 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
* records (e.g. a cycle), determined based on the location and fp value of A
* and the location (but not the fp value) of B.
*/
-int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
+static int notrace unwind_frame(struct task_struct *tsk,
+ struct stackframe *frame)
{
unsigned long fp = frame->fp;
struct stack_info info;
- if (!tsk)
- tsk = current;
-
/* Final frame; nothing to unwind */
if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
return -ENOENT;
@@ -136,15 +134,21 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
}
NOKPROBE_SYMBOL(unwind_frame);
-void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
- bool (*fn)(void *, unsigned long), void *data)
+static void notrace walk_stackframe(struct task_struct *tsk,
+ unsigned long fp, unsigned long pc,
+ bool (*fn)(void *, unsigned long),
+ void *data)
{
+ struct stackframe frame;
+
+ start_backtrace(&frame, fp, pc);
+
while (1) {
int ret;
- if (!fn(data, frame->pc))
+ if (!fn(data, frame.pc))
break;
- ret = unwind_frame(tsk, frame);
+ ret = unwind_frame(tsk, &frame);
if (ret < 0)
break;
}
@@ -190,19 +194,22 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
void *cookie, struct task_struct *task,
struct pt_regs *regs)
{
- struct stackframe frame;
+ unsigned long fp, pc;
+
+ if (regs) {
+ fp = regs->regs[29];
+ pc = regs->pc;
+ } else if (task == current) {
+ /* Skip arch_stack_walk() in the stack trace. */
+ fp = (unsigned long)__builtin_frame_address(1);
+ pc = (unsigned long)__builtin_return_address(0);
+ } else {
+ /* Caller guarantees that the task is not running. */
+ fp = thread_saved_fp(task);
+ pc = thread_saved_pc(task);
+ }
+ walk_stackframe(task, fp, pc, consume_entry, cookie);
- if (regs)
- start_backtrace(&frame, regs->regs[29], regs->pc);
- else if (task == current)
- start_backtrace(&frame,
- (unsigned long)__builtin_frame_address(1),
- (unsigned long)__builtin_return_address(0));
- else
- start_backtrace(&frame, thread_saved_fp(task),
- thread_saved_pc(task));
-
- walk_stackframe(task, &frame, consume_entry, cookie);
}
#endif
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, perf_callchain_kernel() in ARM64 code walks the stack using
start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
instead. This makes maintenance easier.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/kernel/perf_callchain.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
index 4a72c2727309..f173c448e852 100644
--- a/arch/arm64/kernel/perf_callchain.c
+++ b/arch/arm64/kernel/perf_callchain.c
@@ -140,22 +140,18 @@ void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
static bool callchain_trace(void *data, unsigned long pc)
{
struct perf_callchain_entry_ctx *entry = data;
- perf_callchain_store(entry, pc);
- return true;
+ return perf_callchain_store(entry, pc) == 0;
}
void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
struct pt_regs *regs)
{
- struct stackframe frame;
-
if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
/* We don't support guest os callchain now */
return;
}
- start_backtrace(&frame, regs->regs[29], regs->pc);
- walk_stackframe(current, &frame, callchain_trace, entry);
+ arch_stack_walk(callchain_trace, entry, current, regs);
}
unsigned long perf_instruction_pointer(struct pt_regs *regs)
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, profile_pc() in ARM64 code walks the stack using
start_backtrace() and unwind_frame(). Make it use arch_stack_walk()
instead. This makes maintenance easier.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/kernel/time.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
index eebbc8d7123e..671b3038a772 100644
--- a/arch/arm64/kernel/time.c
+++ b/arch/arm64/kernel/time.c
@@ -32,22 +32,26 @@
#include <asm/stacktrace.h>
#include <asm/paravirt.h>
+static bool profile_pc_cb(void *arg, unsigned long pc)
+{
+ unsigned long *prof_pc = arg;
+
+ if (in_lock_functions(pc))
+ return true;
+ *prof_pc = pc;
+ return false;
+}
+
unsigned long profile_pc(struct pt_regs *regs)
{
- struct stackframe frame;
+ unsigned long prof_pc = 0;
if (!in_lock_functions(regs->pc))
return regs->pc;
- start_backtrace(&frame, regs->regs[29], regs->pc);
-
- do {
- int ret = unwind_frame(NULL, &frame);
- if (ret < 0)
- return 0;
- } while (in_lock_functions(frame.pc));
+ arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
- return frame.pc;
+ return prof_pc;
}
EXPORT_SYMBOL(profile_pc);
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, dump_stacktrace() in ARM64 code walks the stack using
start_backtrace() and unwind_frame(). Make it use arch_stack_walk()
instead. This makes maintenance easier.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/kernel/stacktrace.c | 44 +++++-----------------------------
1 file changed, 6 insertions(+), 38 deletions(-)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8982a2b78acf..776c4debb5a7 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -151,24 +151,20 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
}
NOKPROBE_SYMBOL(walk_stackframe);
-static void dump_backtrace_entry(unsigned long where, const char *loglvl)
+static bool dump_backtrace_entry(void *arg, unsigned long where)
{
+ char *loglvl = arg;
printk("%s %pSb\n", loglvl, (void *)where);
+ return true;
}
void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
const char *loglvl)
{
- struct stackframe frame;
- int skip = 0;
-
pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
- if (regs) {
- if (user_mode(regs))
- return;
- skip = 1;
- }
+ if (regs && user_mode(regs))
+ return;
if (!tsk)
tsk = current;
@@ -176,36 +172,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
if (!try_get_task_stack(tsk))
return;
- if (tsk == current) {
- start_backtrace(&frame,
- (unsigned long)__builtin_frame_address(0),
- (unsigned long)dump_backtrace);
- } else {
- /*
- * task blocked in __switch_to
- */
- start_backtrace(&frame,
- thread_saved_fp(tsk),
- thread_saved_pc(tsk));
- }
-
printk("%sCall trace:\n", loglvl);
- do {
- /* skip until specified stack frame */
- if (!skip) {
- dump_backtrace_entry(frame.pc, loglvl);
- } else if (frame.fp == regs->regs[29]) {
- skip = 0;
- /*
- * Mostly, this is the case where this function is
- * called in panic/abort. As exception handler's
- * stack frame does not contain the corresponding pc
- * at which an exception has taken place, use regs->pc
- * instead.
- */
- dump_backtrace_entry(regs->pc, loglvl);
- }
- } while (!unwind_frame(tsk, &frame));
+ arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
put_task_stack(tsk);
}
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Currently, return_address() in ARM64 code walks the stack using
start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
instead. This makes maintenance easier.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/kernel/return_address.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
index a6d18755652f..92a0f4d434e4 100644
--- a/arch/arm64/kernel/return_address.c
+++ b/arch/arm64/kernel/return_address.c
@@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
void *return_address(unsigned int level)
{
struct return_address_data data;
- struct stackframe frame;
data.level = level + 2;
data.addr = NULL;
- start_backtrace(&frame,
- (unsigned long)__builtin_frame_address(0),
- (unsigned long)return_address);
- walk_stackframe(current, &frame, save_return_addr, &data);
+ arch_stack_walk(save_return_addr, &data, current, NULL);
if (!data.level)
return data.addr;
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Rename unwinder functions for consistency and better naming.
- Rename start_backtrace() to unwind_start().
- Rename unwind_frame() to unwind_next().
- Rename walk_stackframe() to unwind().
Prevent the following unwinder functions from being traced:
- unwind_start()
- unwind_next()
unwind() is already prevented from being traced.
Prevent the following unwinder functions from being kprobed:
- unwind_start()
unwind_next() and unwind() are already prevented from being kprobed.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/kernel/stacktrace.c | 34 +++++++++++++++++++---------------
1 file changed, 19 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 7d32cee9ef4b..f4f3575f71fd 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -33,8 +33,8 @@
*/
-static void start_backtrace(struct stackframe *frame, unsigned long fp,
- unsigned long pc)
+static void notrace unwind_start(struct stackframe *frame, unsigned long fp,
+ unsigned long pc)
{
frame->fp = fp;
frame->pc = pc;
@@ -45,7 +45,7 @@ static void start_backtrace(struct stackframe *frame, unsigned long fp,
/*
* Prime the first unwind.
*
- * In unwind_frame() we'll check that the FP points to a valid stack,
+ * In unwind_next() we'll check that the FP points to a valid stack,
* which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
* treated as a transition to whichever stack that happens to be. The
* prev_fp value won't be used, but we set it to 0 such that it is
@@ -56,6 +56,8 @@ static void start_backtrace(struct stackframe *frame, unsigned long fp,
frame->prev_type = STACK_TYPE_UNKNOWN;
}
+NOKPROBE_SYMBOL(unwind_start);
+
/*
* Unwind from one frame record (A) to the next frame record (B).
*
@@ -63,8 +65,8 @@ static void start_backtrace(struct stackframe *frame, unsigned long fp,
* records (e.g. a cycle), determined based on the location and fp value of A
* and the location (but not the fp value) of B.
*/
-static int notrace unwind_frame(struct task_struct *tsk,
- struct stackframe *frame)
+static int notrace unwind_next(struct task_struct *tsk,
+ struct stackframe *frame)
{
unsigned long fp = frame->fp;
struct stack_info info;
@@ -104,7 +106,7 @@ static int notrace unwind_frame(struct task_struct *tsk,
/*
* Record this frame record's values and location. The prev_fp and
- * prev_type are only meaningful to the next unwind_frame() invocation.
+ * prev_type are only meaningful to the next unwind_next() invocation.
*/
frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
@@ -132,28 +134,30 @@ static int notrace unwind_frame(struct task_struct *tsk,
return 0;
}
-NOKPROBE_SYMBOL(unwind_frame);
-static void notrace walk_stackframe(struct task_struct *tsk,
- unsigned long fp, unsigned long pc,
- bool (*fn)(void *, unsigned long),
- void *data)
+NOKPROBE_SYMBOL(unwind_next);
+
+static void notrace unwind(struct task_struct *tsk,
+ unsigned long fp, unsigned long pc,
+ bool (*fn)(void *, unsigned long),
+ void *data)
{
struct stackframe frame;
- start_backtrace(&frame, fp, pc);
+ unwind_start(&frame, fp, pc);
while (1) {
int ret;
if (!fn(data, frame.pc))
break;
- ret = unwind_frame(tsk, &frame);
+ ret = unwind_next(tsk, &frame);
if (ret < 0)
break;
}
}
-NOKPROBE_SYMBOL(walk_stackframe);
+
+NOKPROBE_SYMBOL(unwind);
static bool dump_backtrace_entry(void *arg, unsigned long where)
{
@@ -208,7 +212,7 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
fp = thread_saved_fp(task);
pc = thread_saved_pc(task);
}
- walk_stackframe(task, fp, pc, consume_entry, cookie);
+ unwind(task, fp, pc, consume_entry, cookie);
}
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
Change the loop in unwind()
===========================
Change the unwind loop in unwind() to:
unwind_start(&frame, fp, pc);
while (unwind_continue(tsk, &frame, fn, data))
unwind_next(tsk, &frame);
New function unwind_continue()
==============================
Define a new function unwind_continue() that is used in the unwind loop
to check for conditions that terminate a stack trace.
The conditions checked are:
- If the bottom of the stack has been reached, terminate.
- If the consume_entry() function returns false, the caller of
unwind has asked to terminate the stack trace. So, terminate.
- If unwind_next() failed for some reason (like stack corruption),
terminate.
Do not return an error value from unwind_next()
===============================================
We want to check for terminating conditions only in unwind_continue() from
the unwinder loop. So, do not return an error value from unwind_next().
Simply set a flag in the stackframe and check the flag in unwind_continue().
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 3 ++
arch/arm64/kernel/stacktrace.c | 78 ++++++++++++++++++-----------
2 files changed, 53 insertions(+), 28 deletions(-)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index c239f357d779..ba2180c7d5cd 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -49,6 +49,8 @@ struct stack_info {
*
* @graph: When FUNCTION_GRAPH_TRACER is selected, holds the index of a
* replacement lr value in the ftrace graph stack.
+ *
+ * @failed: Unwind failed.
*/
struct stackframe {
unsigned long fp;
@@ -59,6 +61,7 @@ struct stackframe {
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
int graph;
#endif
+ bool failed;
};
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index f4f3575f71fd..8e9e6f38c975 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -54,6 +54,7 @@ static void notrace unwind_start(struct stackframe *frame, unsigned long fp,
bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
frame->prev_fp = 0;
frame->prev_type = STACK_TYPE_UNKNOWN;
+ frame->failed = false;
}
NOKPROBE_SYMBOL(unwind_start);
@@ -65,24 +66,26 @@ NOKPROBE_SYMBOL(unwind_start);
* records (e.g. a cycle), determined based on the location and fp value of A
* and the location (but not the fp value) of B.
*/
-static int notrace unwind_next(struct task_struct *tsk,
- struct stackframe *frame)
+static void notrace unwind_next(struct task_struct *tsk,
+ struct stackframe *frame)
{
unsigned long fp = frame->fp;
struct stack_info info;
- /* Final frame; nothing to unwind */
- if (fp == (unsigned long)task_pt_regs(tsk)->stackframe)
- return -ENOENT;
-
- if (fp & 0x7)
- return -EINVAL;
+ if (fp & 0x7) {
+ frame->failed = true;
+ return;
+ }
- if (!on_accessible_stack(tsk, fp, 16, &info))
- return -EINVAL;
+ if (!on_accessible_stack(tsk, fp, 16, &info)) {
+ frame->failed = true;
+ return;
+ }
- if (test_bit(info.type, frame->stacks_done))
- return -EINVAL;
+ if (test_bit(info.type, frame->stacks_done)) {
+ frame->failed = true;
+ return;
+ }
/*
* As stacks grow downward, any valid record on the same stack must be
@@ -98,8 +101,10 @@ static int notrace unwind_next(struct task_struct *tsk,
* stack.
*/
if (info.type == frame->prev_type) {
- if (fp <= frame->prev_fp)
- return -EINVAL;
+ if (fp <= frame->prev_fp) {
+ frame->failed = true;
+ return;
+ }
} else {
set_bit(frame->prev_type, frame->stacks_done);
}
@@ -124,19 +129,44 @@ static int notrace unwind_next(struct task_struct *tsk,
* So replace it to an original value.
*/
ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
- if (WARN_ON_ONCE(!ret_stack))
- return -EINVAL;
+ if (WARN_ON_ONCE(!ret_stack)) {
+ frame->failed = true;
+ return;
+ }
frame->pc = ret_stack->ret;
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
frame->pc = ptrauth_strip_insn_pac(frame->pc);
-
- return 0;
}
NOKPROBE_SYMBOL(unwind_next);
+static bool notrace unwind_continue(struct task_struct *task,
+ struct stackframe *frame,
+ stack_trace_consume_fn consume_entry,
+ void *cookie)
+{
+ if (frame->failed) {
+ /* PC is suspect. Cannot consume it. */
+ return false;
+ }
+
+ if (!consume_entry(cookie, frame->pc)) {
+ /* Caller terminated the unwind. */
+ frame->failed = true;
+ return false;
+ }
+
+ if (frame->fp == (unsigned long)task_pt_regs(task)->stackframe) {
+ /* Final frame; nothing to unwind */
+ return false;
+ }
+ return true;
+}
+
+NOKPROBE_SYMBOL(unwind_continue);
+
static void notrace unwind(struct task_struct *tsk,
unsigned long fp, unsigned long pc,
bool (*fn)(void *, unsigned long),
@@ -145,16 +175,8 @@ static void notrace unwind(struct task_struct *tsk,
struct stackframe frame;
unwind_start(&frame, fp, pc);
-
- while (1) {
- int ret;
-
- if (!fn(data, frame.pc))
- break;
- ret = unwind_next(tsk, &frame);
- if (ret < 0)
- break;
- }
+ while (unwind_continue(tsk, &frame, fn, data))
+ unwind_next(tsk, &frame);
}
NOKPROBE_SYMBOL(unwind);
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
SYM_CODE functions don't follow the usual calling conventions. Check if the
return PC in a stack frame falls in any of these. If it does, consider the
stack trace unreliable.
Define a special section for unreliable functions
=================================================
Define a SYM_CODE_END() macro for arm64 that adds the function address
range to a new section called "sym_code_functions".
Linker file
===========
Include the "sym_code_functions" section under read-only data in
vmlinux.lds.S.
Initialization
==============
Define an early_initcall() to create a sym_code_functions[] array from
the linker data.
Unwinder check
==============
Add a reliability check in unwind_is_reliable() that compares a return
PC with sym_code_functions[]. If there is a match, then return failure.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/linkage.h | 12 +++++++
arch/arm64/include/asm/sections.h | 1 +
arch/arm64/kernel/stacktrace.c | 55 +++++++++++++++++++++++++++++++
arch/arm64/kernel/vmlinux.lds.S | 10 ++++++
4 files changed, 78 insertions(+)
diff --git a/arch/arm64/include/asm/linkage.h b/arch/arm64/include/asm/linkage.h
index 9906541a6861..616bad74e297 100644
--- a/arch/arm64/include/asm/linkage.h
+++ b/arch/arm64/include/asm/linkage.h
@@ -68,4 +68,16 @@
SYM_FUNC_END_ALIAS(x); \
SYM_FUNC_END_ALIAS(__pi_##x)
+/*
+ * Record the address range of each SYM_CODE function in a struct code_range
+ * in a special section.
+ */
+#define SYM_CODE_END(name) \
+ SYM_END(name, SYM_T_NONE) ;\
+ 99: ;\
+ .pushsection "sym_code_functions", "aw" ;\
+ .quad name ;\
+ .quad 99b ;\
+ .popsection
+
#endif
diff --git a/arch/arm64/include/asm/sections.h b/arch/arm64/include/asm/sections.h
index e4ad9db53af1..c84c71063d6e 100644
--- a/arch/arm64/include/asm/sections.h
+++ b/arch/arm64/include/asm/sections.h
@@ -21,5 +21,6 @@ extern char __exittext_begin[], __exittext_end[];
extern char __irqentry_text_start[], __irqentry_text_end[];
extern char __mmuoff_data_start[], __mmuoff_data_end[];
extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+extern char __sym_code_functions_start[], __sym_code_functions_end[];
#endif /* __ASM_SECTIONS_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 142f08ae515f..40e5af7e5b1d 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,11 +18,40 @@
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
+struct code_range {
+ unsigned long start;
+ unsigned long end;
+};
+
+static struct code_range *sym_code_functions;
+static int num_sym_code_functions;
+
+int __init init_sym_code_functions(void)
+{
+ size_t size = (unsigned long)__sym_code_functions_end -
+ (unsigned long)__sym_code_functions_start;
+
+ sym_code_functions = (struct code_range *)__sym_code_functions_start;
+ /*
+ * Order it so that sym_code_functions is not visible before
+ * num_sym_code_functions.
+ */
+ smp_mb();
+ num_sym_code_functions = size / sizeof(struct code_range);
+
+ return 0;
+}
+early_initcall(init_sym_code_functions);
+
/*
* Check the stack frame for conditions that make further unwinding unreliable.
*/
static void notrace unwind_check_reliability(struct stackframe *frame)
{
+ const struct code_range *range;
+ unsigned long pc;
+ int i;
+
/*
* If the PC is not a known kernel text address, then we cannot
* be sure that a subsequent unwind will be reliable, as we
@@ -30,6 +59,32 @@ static void notrace unwind_check_reliability(struct stackframe *frame)
*/
if (!__kernel_text_address(frame->pc))
frame->reliable = false;
+
+ /*
+ * Check the return PC against sym_code_functions[]. If there is a
+ * match, then the consider the stack frame unreliable.
+ *
+ * As SYM_CODE functions don't follow the usual calling conventions,
+ * we assume by default that any SYM_CODE function cannot be unwound
+ * reliably.
+ *
+ * Note that this includes:
+ *
+ * - Exception handlers and entry assembly
+ * - Trampoline assembly (e.g., ftrace, kprobes)
+ * - Hypervisor-related assembly
+ * - Hibernation-related assembly
+ * - CPU start-stop, suspend-resume assembly
+ * - Kernel relocation assembly
+ */
+ pc = frame->pc;
+ for (i = 0; i < num_sym_code_functions; i++) {
+ range = &sym_code_functions[i];
+ if (pc >= range->start && pc < range->end) {
+ frame->reliable = false;
+ return;
+ }
+ }
}
NOKPROBE_SYMBOL(unwind_check_reliability);
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 709d2c433c5e..2bf769f45b54 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -111,6 +111,14 @@ jiffies = jiffies_64;
#define TRAMP_TEXT
#endif
+#define SYM_CODE_FUNCTIONS \
+ . = ALIGN(16); \
+ .symcode : AT(ADDR(.symcode) - LOAD_OFFSET) { \
+ __sym_code_functions_start = .; \
+ KEEP(*(sym_code_functions)) \
+ __sym_code_functions_end = .; \
+ }
+
/*
* The size of the PE/COFF section that covers the kernel image, which
* runs from _stext to _edata, must be a round multiple of the PE/COFF
@@ -196,6 +204,8 @@ SECTIONS
swapper_pg_dir = .;
. += PAGE_SIZE;
+ SYM_CODE_FUNCTIONS
+
. = ALIGN(SEGMENT_ALIGN);
__init_begin = .;
__inittext_begin = .;
--
2.25.1
From: "Madhavan T. Venkataraman" <[email protected]>
There are some kernel features and conditions that make a stack trace
unreliable. Callers may require the unwinder to detect these cases.
E.g., livepatch.
Introduce a new function called unwind_check_reliability() that will
detect these cases and set a flag in the stack frame. Call
unwind_check_reliability() for every frame, that is, in unwind_start()
and unwind_next().
Introduce the first reliability check in unwind_check_reliability() - If
a return PC is not a valid kernel text address, consider the stack
trace unreliable. It could be some generated code. Other reliability checks
will be added in the future.
Let unwind() return a boolean to indicate if the stack trace is
reliable.
Introduce arch_stack_walk_reliable() for ARM64. This works like
arch_stack_walk() except that it returns -EINVAL if the stack trace is not
reliable.
Until all the reliability checks are in place, arch_stack_walk_reliable()
may not be used by livepatch. But it may be used by debug and test code.
Signed-off-by: Madhavan T. Venkataraman <[email protected]>
---
arch/arm64/include/asm/stacktrace.h | 3 ++
arch/arm64/kernel/stacktrace.c | 48 ++++++++++++++++++++++++++++-
2 files changed, 50 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index ba2180c7d5cd..ce0710fa3037 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -51,6 +51,8 @@ struct stack_info {
* replacement lr value in the ftrace graph stack.
*
* @failed: Unwind failed.
+ *
+ * @reliable: Stack trace is reliable.
*/
struct stackframe {
unsigned long fp;
@@ -62,6 +64,7 @@ struct stackframe {
int graph;
#endif
bool failed;
+ bool reliable;
};
extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8e9e6f38c975..142f08ae515f 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,22 @@
#include <asm/stack_pointer.h>
#include <asm/stacktrace.h>
+/*
+ * Check the stack frame for conditions that make further unwinding unreliable.
+ */
+static void notrace unwind_check_reliability(struct stackframe *frame)
+{
+ /*
+ * If the PC is not a known kernel text address, then we cannot
+ * be sure that a subsequent unwind will be reliable, as we
+ * don't know that the code follows our unwind requirements.
+ */
+ if (!__kernel_text_address(frame->pc))
+ frame->reliable = false;
+}
+
+NOKPROBE_SYMBOL(unwind_check_reliability);
+
/*
* AArch64 PCS assigns the frame pointer to x29.
*
@@ -55,6 +71,8 @@ static void notrace unwind_start(struct stackframe *frame, unsigned long fp,
frame->prev_fp = 0;
frame->prev_type = STACK_TYPE_UNKNOWN;
frame->failed = false;
+ frame->reliable = true;
+ unwind_check_reliability(frame);
}
NOKPROBE_SYMBOL(unwind_start);
@@ -138,6 +156,7 @@ static void notrace unwind_next(struct task_struct *tsk,
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
frame->pc = ptrauth_strip_insn_pac(frame->pc);
+ unwind_check_reliability(frame);
}
NOKPROBE_SYMBOL(unwind_next);
@@ -167,7 +186,7 @@ static bool notrace unwind_continue(struct task_struct *task,
NOKPROBE_SYMBOL(unwind_continue);
-static void notrace unwind(struct task_struct *tsk,
+static bool notrace unwind(struct task_struct *tsk,
unsigned long fp, unsigned long pc,
bool (*fn)(void *, unsigned long),
void *data)
@@ -177,6 +196,7 @@ static void notrace unwind(struct task_struct *tsk,
unwind_start(&frame, fp, pc);
while (unwind_continue(tsk, &frame, fn, data))
unwind_next(tsk, &frame);
+ return frame.reliable;
}
NOKPROBE_SYMBOL(unwind);
@@ -238,4 +258,30 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
}
+/*
+ * arch_stack_walk_reliable() may not be used for livepatch until all of
+ * the reliability checks are in place in unwind_consume(). However,
+ * debug and test code can choose to use it even if all the checks are not
+ * in place.
+ */
+noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
+ void *cookie,
+ struct task_struct *task)
+{
+ unsigned long fp, pc;
+
+ if (task == current) {
+ /* Skip arch_stack_walk_reliable() in the stack trace. */
+ fp = (unsigned long)__builtin_frame_address(1);
+ pc = (unsigned long)__builtin_return_address(0);
+ } else {
+ /* Caller guarantees that the task is not running. */
+ fp = thread_saved_fp(task);
+ pc = thread_saved_pc(task);
+ }
+ if (unwind(task, fp, pc, consume_fn, cookie))
+ return 0;
+ return -EINVAL;
+}
+
#endif
--
2.25.1
Hi Mark Rutland, Mark Brown,
Sorry for the long delay in addressing your comments.
I was out sick for a month.
Please take a look when you get a chance.
I have removed the word "RFC" as I believe this is mature enough to be
a regular patch series at this point.
Thanks.
Madhavan
On 10/14/21 9:58 PM, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Make all stack walking functions use arch_stack_walk()
> ======================================================
>
> Currently, there are multiple functions in ARM64 code that walk the
> stack using start_backtrace() and unwind_frame() or start_backtrace()
> and walk_stackframe(). Convert all of them to use arch_stack_walk().
> This makes maintenance easier.
>
> This means that arch_stack_walk() needs to be always defined. So,
> select CONFIG_STACKTRACE in the ARM64 Kconfig file.
>
> Consolidate the unwinder
> ========================
>
> Currently, start_backtrace() and walk_stackframe() are called separately.
> There is no need to do that. Move the call to start_backtrace() into
> walk_stackframe() so that walk_stackframe() is the only unwinder function
> a consumer needs to call.
>
> The consumers of walk_stackframe() are arch_stack_walk() and
> arch_stack_walk_reliable().
>
> Rename unwinder functions
> =========================
>
> Rename unwinder functions to unwind*() similar to other architectures
> for naming consistency.
>
> start_backtrace() ==> unwind_start()
> unwind_frame() ==> unwind_next()
> walk_stackframe() ==> unwind()
>
> Annotate unwinder functions
> ===========================
>
> Annotate all of the unwind_*() functions with notrace so they cannot be
> ftraced and NOKPROBE_SYMBOL() so they cannot be kprobed. Ftrace and Kprobe
> code can call the unwinder.
>
> Redefine the unwinder loop
> ==========================
>
> Redefine the unwinder loop and make it similar to other architectures.
> Define the following:
>
> unwind_start(&frame, fp, pc);
> while (unwind_continue(task, &frame, consume_entry, cookie))
> unwind_next(task, &frame);
>
> unwind_continue()
> This new function implements checks to determine whether the
> unwind should continue or terminate.
>
> unwind_next()
> Same as the original unwind_frame() except:
>
> - the stack trace termination check has been moved from here to
> unwind_continue(). So, unwind_next() assumes that the fp is valid.
>
> - unwind_frame() used to return an error value. unwind_next() only
> sets an internal flag "failed" to indicate that an error was
> encountered. This flag is checked by unwind_continue().
>
> Reliability checks
> ==================
>
> There are some kernel features and conditions that make a stack trace
> unreliable. Callers may require the unwinder to detect these cases.
> E.g., livepatch.
>
> Introduce a new function called unwind_check_reliability() that will detect
> these cases and set a boolean "reliable" in the stackframe.
>
> unwind_check_reliability() will be called for every frame. That is, in
> unwind_start() as well as unwind_next().
>
> Introduce the first reliability check in unwind_check_reliability() - If
> a return PC is not a valid kernel text address, consider the stack
> trace unreliable. It could be some generated code.
>
> Other reliability checks will be added in the future.
>
> arch_stack_walk_reliable()
> ==========================
>
> Introduce arch_stack_walk_reliable() for ARM64. This works like
> arch_stack_walk() except that it returns an error if the stack trace is
> found to be unreliable.
>
> Until all of the reliability checks are in place in
> unwind_check_reliability(), arch_stack_walk_reliable() may not be used by
> livepatch. But it may be used by debug and test code.
>
> SYM_CODE check
> ==============
>
> This is the second reliability check implemented.
>
> SYM_CODE functions do not follow normal calling conventions. They cannot
> be unwound reliably using the frame pointer. Collect the address ranges
> of these functions in a special section called "sym_code_functions".
>
> In unwind_check_reliability(), check the return PC against these ranges. If
> a match is found, then mark the stack trace unreliable.
>
> Last stack frame
> ----------------
>
> If a SYM_CODE function occurs in the very last frame in the stack trace,
> then the stack trace is not considered unreliable. This is because there
> is no more unwinding to do. Examples:
>
> - EL0 exception stack traces end in the top level EL0 exception
> handlers.
>
> - All kernel thread stack traces end in ret_from_fork().
> ---
> Changelog:
>
> v9:
> From me:
>
> - Removed the word "RFC" from the subject line as I believe this
> is mature enough to be a regular patch.
>
> From Mark Brown, Mark Rutland:
>
> - Split the patches into smaller, self-contained ones.
>
> - Always enable STACKTRACE so that arch_stack_walk() is always
> defined.
>
> From Mark Rutland:
>
> - Update callchain_trace() take the return value of
> perf_callchain_store() into acount.
>
> - Restore get_wchan() behavior to the original code.
>
> - Simplify an if statement in dump_backtrace().
>
> From Mark Brown:
>
> - Do not abort the stack trace on the first unreliable frame.
>
>
> v8:
> - Synced to v5.14-rc5.
>
> From Mark Rutland:
>
> - Make the unwinder loop similar to other architectures.
>
> - Keep details to within the unwinder functions and return a simple
> boolean to the caller.
>
> - Convert some of the current code that contains unwinder logic to
> simply use arch_stack_walk(). I have converted all of them.
>
> - Do not copy sym_code_functions[]. Just place it in rodata for now.
>
> - Have the main loop check for termination conditions rather than
> having unwind_frame() check for them. In other words, let
> unwind_frame() assume that the fp is valid.
>
> - Replace the big comment for SYM_CODE functions with a shorter
> comment.
>
> /*
> * As SYM_CODE functions don't follow the usual calling
> * conventions, we assume by default that any SYM_CODE function
> * cannot be unwound reliably.
> *
> * Note that this includes:
> *
> * - Exception handlers and entry assembly
> * - Trampoline assembly (e.g., ftrace, kprobes)
> * - Hypervisor-related assembly
> * - Hibernation-related assembly
> * - CPU start-stop, suspend-resume assembly
> * - Kernel relocation assembly
> */
>
> v7:
> The Mailer screwed up the threading on this. So, I have resent this
> same series as version 8 with proper threading to avoid confusion.
> v6:
> From Mark Rutland:
>
> - The per-frame reliability concept and flag are acceptable. But more
> work is needed to make the per-frame checks more accurate and more
> complete. E.g., some code reorg is being worked on that will help.
>
> I have now removed the frame->reliable flag and deleted the whole
> concept of per-frame status. This is orthogonal to this patch series.
> Instead, I have improved the unwinder to return proper return codes
> so a caller can take appropriate action without needing per-frame
> status.
>
> - Remove the mention of PLTs and update the comment.
>
> I have replaced the comment above the call to __kernel_text_address()
> with the comment suggested by Mark Rutland.
>
> Other comments:
>
> - Other comments on the per-frame stuff are not relevant because
> that approach is not there anymore.
>
> v5:
> From Keiya Nobuta:
>
> - The term blacklist(ed) is not to be used anymore. I have changed it
> to unreliable. So, the function unwinder_blacklisted() has been
> changed to unwinder_is_unreliable().
>
> From Mark Brown:
>
> - Add a comment for the "reliable" flag in struct stackframe. The
> reliability attribute is not complete until all the checks are
> in place. Added a comment above struct stackframe.
>
> - Include some of the comments in the cover letter in the actual
> code so that we can compare it with the reliable stack trace
> requirements document for completeness. I have added a comment:
>
> - above unwinder_is_unreliable() that lists the requirements
> that are addressed by the function.
>
> - above the __kernel_text_address() call about all the cases
> the call covers.
>
> v4:
> From Mark Brown:
>
> - I was checking the return PC with __kernel_text_address() before
> the Function Graph trace handling. Mark Brown felt that all the
> reliability checks should be performed on the original return PC
> once that is obtained. So, I have moved all the reliability checks
> to after the Function Graph Trace handling code in the unwinder.
> Basically, the unwinder should perform PC translations first (for
> rhe return trampoline for Function Graph Tracing, Kretprobes, etc).
> Then, the reliability checks should be applied to the resulting
> PC.
>
> - Mark said to improve the naming of the new functions so they don't
> collide with existing ones. I have used a prefix "unwinder_" for
> all the new functions.
>
> From Josh Poimboeuf:
>
> - In the error scenarios in the unwinder, the reliable flag in the
> stack frame should be set. Implemented this.
>
> - Some of the other comments are not relevant to the new code as
> I have taken a different approach in the new code. That is why
> I have not made those changes. E.g., Ard wanted me to add the
> "const" keyword to the global section array. That array does not
> exist in v4. Similarly, Mark Brown said to use ARRAY_SIZE() for
> the same array in a for loop.
>
> Other changes:
>
> - Add a new definition for SYM_CODE_END() that adds the address
> range of the function to a special section called
> "sym_code_functions".
>
> - Include the new section under initdata in vmlinux.lds.S.
>
> - Define an early_initcall() to copy the contents of the
> "sym_code_functions" section to an array by the same name.
>
> - Define a function unwinder_blacklisted() that compares a return
> PC against sym_code_sections[]. If there is a match, mark the
> stack trace unreliable. Call this from unwind_frame().
>
> v3:
> - Implemented a sym_code_ranges[] array to contains sections bounds
> for text sections that contain SYM_CODE_*() functions. The unwinder
> checks each return PC against the sections. If it falls in any of
> the sections, the stack trace is marked unreliable.
>
> - Moved SYM_CODE functions from .text and .init.text into a new
> text section called ".code.text". Added this section to
> vmlinux.lds.S and sym_code_ranges[].
>
> - Fixed the logic in the unwinder that handles Function Graph
> Tracer return trampoline.
>
> - Removed all the previous code that handles:
> - ftrace entry code for traced function
> - special_functions[] array that lists individual functions
> - kretprobe_trampoline() special case
>
> v2
> - Removed the terminating entry { 0, 0 } in special_functions[]
> and replaced it with the idiom { /* sentinel */ }.
>
> - Change the ftrace trampoline entry ftrace_graph_call in
> special_functions[] to ftrace_call + 4 and added explanatory
> comments.
>
> - Unnested #ifdefs in special_functions[] for FTRACE.
>
> v1
> - Define a bool field in struct stackframe. This will indicate if
> a stack trace is reliable.
>
> - Implement a special_functions[] array that will be populated
> with special functions in which the stack trace is considered
> unreliable.
>
> - Using kallsyms_lookup(), get the address ranges for the special
> functions and record them.
>
> - Implement an is_reliable_function(pc). This function will check
> if a given return PC falls in any of the special functions. If
> it does, the stack trace is unreliable.
>
> - Implement check_reliability() function that will check if a
> stack frame is reliable. Call is_reliable_function() from
> check_reliability().
>
> - Before a return PC is checked against special_funtions[], it
> must be validates as a proper kernel text address. Call
> __kernel_text_address() from check_reliability().
>
> - Finally, call check_reliability() from unwind_frame() for
> each stack frame.
>
> - Add EL1 exception handlers to special_functions[].
>
> el1_sync();
> el1_irq();
> el1_error();
> el1_sync_invalid();
> el1_irq_invalid();
> el1_fiq_invalid();
> el1_error_invalid();
>
> - The above functions are currently defined as LOCAL symbols.
> Make them global so that they can be referenced from the
> unwinder code.
>
> - Add FTRACE trampolines to special_functions[]:
>
> ftrace_graph_call()
> ftrace_graph_caller()
> return_to_handler()
>
> - Add the kretprobe trampoline to special functions[]:
>
> kretprobe_trampoline()
>
> Previous versions and discussion
> ================================
>
> v8: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v7: Mailer screwed up the threading. Sent the same as v8 with proper threading.
> v6: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v5: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v4: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v3: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v2: https://lore.kernel.org/linux-arm-kernel/[email protected]/
> v1: https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> Madhavan T. Venkataraman (11):
> arm64: Select STACKTRACE in arch/arm64/Kconfig
> arm64: Make perf_callchain_kernel() use arch_stack_walk()
> arm64: Make get_wchan() use arch_stack_walk()
> arm64: Make return_address() use arch_stack_walk()
> arm64: Make dump_stacktrace() use arch_stack_walk()
> arm64: Make profile_pc() use arch_stack_walk()
> arm64: Call stack_backtrace() only from within walk_stackframe()
> arm64: Rename unwinder functions, prevent them from being traced and
> kprobed
> arm64: Make the unwind loop in unwind() similar to other architectures
> arm64: Introduce stack trace reliability checks in the unwinder
> arm64: Create a list of SYM_CODE functions, check return PC against
> list
>
> arch/arm64/Kconfig | 1 +
> arch/arm64/include/asm/linkage.h | 12 ++
> arch/arm64/include/asm/sections.h | 1 +
> arch/arm64/include/asm/stacktrace.h | 12 +-
> arch/arm64/kernel/perf_callchain.c | 8 +-
> arch/arm64/kernel/process.c | 38 ++--
> arch/arm64/kernel/return_address.c | 6 +-
> arch/arm64/kernel/stacktrace.c | 274 +++++++++++++++++++---------
> arch/arm64/kernel/time.c | 22 ++-
> arch/arm64/kernel/vmlinux.lds.S | 10 +
> 10 files changed, 257 insertions(+), 127 deletions(-)
>
>
> base-commit: 36a21d51725af2ce0700c6ebcb6b9594aac658a6
>
On Thu, Oct 14, 2021 at 09:58:37PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, there are multiple functions in ARM64 code that walk the
> stack using start_backtrace() and unwind_frame() or start_backtrace()
> and walk_stackframe(). They should all be converted to use
> arch_stack_walk(). This makes maintenance easier.
Reviwed-by: Mark Brown <[email protected]>
Arguably this should be squashed in with the first user but that's
getting bikesheddy and could make hassle merging things in so meh.
On Thu, Oct 14, 2021 at 09:58:38PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, perf_callchain_kernel() in ARM64 code walks the stack using
> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
> instead. This makes maintenance easier.
> static bool callchain_trace(void *data, unsigned long pc)
> {
> struct perf_callchain_entry_ctx *entry = data;
> - perf_callchain_store(entry, pc);
> - return true;
> + return perf_callchain_store(entry, pc) == 0;
> }
This changes us from unconditionally doing the whole walk to returning
an error if perf_callchain_store() returns an error so it's not quite a
straight transform, though since that seems like a useful improvement
which most likely on't have any practical impact that's fine.
Reviewed-by: Mark Brown <[email protected]>
On Thu, Oct 14, 2021 at 09:58:40PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, return_address() in ARM64 code walks the stack using
> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
> instead. This makes maintenance easier.
Reviewed-by: Mark Brown <[email protected]>
On Thu, Oct 14, 2021 at 09:58:39PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, get_wchan() in ARM64 code walks the stack using start_backtrace()
> and unwind_frame(). Make it use arch_stack_walk() instead. This makes
> maintenance easier.
This overlaps with some very similar updates that Peter Zijlstra is
working on which addresses some existing problems with wchan:
https://lore.kernel.org/all/[email protected]/
It probably makes sense for you to coordinate with Peter here, some of
that series is already merged up to his patch 6 which looks very
similar to what you've got here. In that thread you'll see that Mark
Rutland spotted an issue with the handling of __switch_to() on arm64
which probably also applies to your change.
On 10/15/21 1:28 PM, Mark Brown wrote:
> On Thu, Oct 14, 2021 at 09:58:37PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, there are multiple functions in ARM64 code that walk the
>> stack using start_backtrace() and unwind_frame() or start_backtrace()
>> and walk_stackframe(). They should all be converted to use
>> arch_stack_walk(). This makes maintenance easier.
>
> Reviwed-by: Mark Brown <[email protected]>
>
> Arguably this should be squashed in with the first user but that's
> getting bikesheddy and could make hassle merging things in so meh.
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Thanks.
Madhavan
On 10/20/21 9:59 AM, Mark Brown wrote:
> On Thu, Oct 14, 2021 at 09:58:38PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, perf_callchain_kernel() in ARM64 code walks the stack using
>> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
>> instead. This makes maintenance easier.
>
>> static bool callchain_trace(void *data, unsigned long pc)
>> {
>> struct perf_callchain_entry_ctx *entry = data;
>> - perf_callchain_store(entry, pc);
>> - return true;
>> + return perf_callchain_store(entry, pc) == 0;
>> }
>
> This changes us from unconditionally doing the whole walk to returning
> an error if perf_callchain_store() returns an error so it's not quite a
> straight transform, though since that seems like a useful improvement
> which most likely on't have any practical impact that's fine.
>
> Reviewed-by: Mark Brown <[email protected]>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Thanks.
Madhavan
Thanks.
On 10/20/21 10:03 AM, Mark Brown wrote:
> On Thu, Oct 14, 2021 at 09:58:40PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, return_address() in ARM64 code walks the stack using
>> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
>> instead. This makes maintenance easier.
>
> Reviewed-by: Mark Brown <[email protected]>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
I will take a look at what Peter has done and will coordinate with him.
Thanks.
Madhavan
On 10/20/21 11:10 AM, Mark Brown wrote:
> On Thu, Oct 14, 2021 at 09:58:39PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, get_wchan() in ARM64 code walks the stack using start_backtrace()
>> and unwind_frame(). Make it use arch_stack_walk() instead. This makes
>> maintenance easier.
>
> This overlaps with some very similar updates that Peter Zijlstra is
> working on which addresses some existing problems with wchan:
>
> https://lore.kernel.org/all/[email protected]/
>
> It probably makes sense for you to coordinate with Peter here, some of
> that series is already merged up to his patch 6 which looks very
> similar to what you've got here. In that thread you'll see that Mark
> Rutland spotted an issue with the handling of __switch_to() on arm64
> which probably also applies to your change.
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Hi Madhavan,
Apolgoies for the delay in getting round to this.
On Thu, Oct 14, 2021 at 09:58:37PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, there are multiple functions in ARM64 code that walk the
> stack using start_backtrace() and unwind_frame() or start_backtrace()
> and walk_stackframe(). They should all be converted to use
> arch_stack_walk(). This makes maintenance easier.
>
> To do that, arch_stack_walk() must always be defined. arch_stack_walk()
> is within #ifdef CONFIG_STACKTRACE. So, select STACKTRACE in
> arch/arm64/Kconfig.
I'd prefer if we could decouple ARCH_STACKWALK from STACKTRACE, so that
we don't have to expose /proc/*/stack unconditionally, which Peter
Zijlstra has a patch for:
https://lore.kernel.org/lkml/[email protected]/
... but regardless the rest of the series looks pretty good, so I'll go
review that, and we can figure out how to queue the bits and pieces in
the right order.
Thanks,
Mark.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index fdcd54d39c1e..bfb0ce60d820 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -35,6 +35,7 @@ config ARM64
> select ARCH_HAS_SET_DIRECT_MAP
> select ARCH_HAS_SET_MEMORY
> select ARCH_STACKWALK
> + select STACKTRACE
> select ARCH_HAS_STRICT_KERNEL_RWX
> select ARCH_HAS_STRICT_MODULE_RWX
> select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> --
> 2.25.1
>
On Thu, Oct 14, 2021 at 09:58:38PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, perf_callchain_kernel() in ARM64 code walks the stack using
> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
> instead. This makes maintenance easier.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
This looks good to me; bailing out when perf_callchain_store() can't
accept any more entries absolutely makes sense.
I gave this a spin with:
| # perf record -g -c1 ls
| # perf report
... and the recorded callchains look sane.
Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>
As mentioned on patch 1, I'd like to get this rebased atop Peter's
untangling of ARCH_STACKWALK from STACKTRACE.
Thanks,
Mark.
> ---
> arch/arm64/kernel/perf_callchain.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kernel/perf_callchain.c b/arch/arm64/kernel/perf_callchain.c
> index 4a72c2727309..f173c448e852 100644
> --- a/arch/arm64/kernel/perf_callchain.c
> +++ b/arch/arm64/kernel/perf_callchain.c
> @@ -140,22 +140,18 @@ void perf_callchain_user(struct perf_callchain_entry_ctx *entry,
> static bool callchain_trace(void *data, unsigned long pc)
> {
> struct perf_callchain_entry_ctx *entry = data;
> - perf_callchain_store(entry, pc);
> - return true;
> + return perf_callchain_store(entry, pc) == 0;
> }
>
> void perf_callchain_kernel(struct perf_callchain_entry_ctx *entry,
> struct pt_regs *regs)
> {
> - struct stackframe frame;
> -
> if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> /* We don't support guest os callchain now */
> return;
> }
>
> - start_backtrace(&frame, regs->regs[29], regs->pc);
> - walk_stackframe(current, &frame, callchain_trace, entry);
> + arch_stack_walk(callchain_trace, entry, current, regs);
> }
>
> unsigned long perf_instruction_pointer(struct pt_regs *regs)
> --
> 2.25.1
>
On Thu, Oct 14, 2021 at 09:58:40PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, return_address() in ARM64 code walks the stack using
> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
> instead. This makes maintenance easier.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/kernel/return_address.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
> index a6d18755652f..92a0f4d434e4 100644
> --- a/arch/arm64/kernel/return_address.c
> +++ b/arch/arm64/kernel/return_address.c
> @@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
> void *return_address(unsigned int level)
> {
> struct return_address_data data;
> - struct stackframe frame;
>
> data.level = level + 2;
> data.addr = NULL;
>
> - start_backtrace(&frame,
> - (unsigned long)__builtin_frame_address(0),
> - (unsigned long)return_address);
> - walk_stackframe(current, &frame, save_return_addr, &data);
> + arch_stack_walk(save_return_addr, &data, current, NULL);
This looks equivalent to me. Previously the arguments to
start_backtrace() meant that walk_stackframe would report
return_address(), then the caller of return_address(), and so on. As
arch_stack_walk() starts from its immediate caller (i.e.
return_address()), that should result in the same trace.
It would be nice if we could note something to that effect in the commit
message.
I had a play with ftrace, which uses return_address(), and that all
looks sound.
>
> if (!data.level)
> return data.addr;
The end of this function currently does:
if (!data.level)
return data.addr;
else
return NULL;
... but since we initialize data.addr to NULL, and save_return_addr()
only writes to data.addr when called at the correct level, we can
simplify that to:
return data.addr;
Regardles of that cleanup:
Reviewed-by: Mark Rutland <[email protected]>
Tested-by: Mark Rutland <[email protected]>
I'll continue reviewing the series next week.
Thanks,
Mark.
On 10/22/21 1:11 PM, Mark Rutland wrote:
> On Thu, Oct 14, 2021 at 09:58:38PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, perf_callchain_kernel() in ARM64 code walks the stack using
>> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
>> instead. This makes maintenance easier.
>>
>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>
> This looks good to me; bailing out when perf_callchain_store() can't
> accept any more entries absolutely makes sense.
>
> I gave this a spin with:
>
> | # perf record -g -c1 ls
> | # perf report
>
> ... and the recorded callchains look sane.
>
> Reviewed-by: Mark Rutland <[email protected]>
> Tested-by: Mark Rutland <[email protected]>
>
Thanks a lot!
> As mentioned on patch 1, I'd like to get this rebased atop Peter's
> untangling of ARCH_STACKWALK from STACKTRACE.
>
Will do.
Thanks.
Madhavan
On 10/22/21 1:51 PM, Mark Rutland wrote:
> On Thu, Oct 14, 2021 at 09:58:40PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, return_address() in ARM64 code walks the stack using
>> start_backtrace() and walk_stackframe(). Make it use arch_stack_walk()
>> instead. This makes maintenance easier.
>>
>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>> ---
>> arch/arm64/kernel/return_address.c | 6 +-----
>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/return_address.c b/arch/arm64/kernel/return_address.c
>> index a6d18755652f..92a0f4d434e4 100644
>> --- a/arch/arm64/kernel/return_address.c
>> +++ b/arch/arm64/kernel/return_address.c
>> @@ -35,15 +35,11 @@ NOKPROBE_SYMBOL(save_return_addr);
>> void *return_address(unsigned int level)
>> {
>> struct return_address_data data;
>> - struct stackframe frame;
>>
>> data.level = level + 2;
>> data.addr = NULL;
>>
>> - start_backtrace(&frame,
>> - (unsigned long)__builtin_frame_address(0),
>> - (unsigned long)return_address);
>> - walk_stackframe(current, &frame, save_return_addr, &data);
>> + arch_stack_walk(save_return_addr, &data, current, NULL);
>
> This looks equivalent to me. Previously the arguments to
> start_backtrace() meant that walk_stackframe would report
> return_address(), then the caller of return_address(), and so on. As
> arch_stack_walk() starts from its immediate caller (i.e.
> return_address()), that should result in the same trace.
>
> It would be nice if we could note something to that effect in the commit
> message.
>
Will do.
> I had a play with ftrace, which uses return_address(), and that all
> looks sound.
>
Thanks a lot!
>>
>> if (!data.level)
>> return data.addr;
>
> The end of this function currently does:
>
> if (!data.level)
> return data.addr;
> else
> return NULL;
>
> ... but since we initialize data.addr to NULL, and save_return_addr()
> only writes to data.addr when called at the correct level, we can
> simplify that to:
>
> return data.addr;
>
OK. I will make this change.
> Regardles of that cleanup:
>
> Reviewed-by: Mark Rutland <[email protected]>
> Tested-by: Mark Rutland <[email protected]>
>
Thanks a lot!
> I'll continue reviewing the series next week.
>
Great!
Madhavan
Hi,
> -----Original Message-----
> From: [email protected] <[email protected]>
> Sent: Friday, October 15, 2021 11:59 AM
> To: [email protected]; [email protected]; [email protected]; [email protected]; Nobuta, Keiya/信田 圭哉
> <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: [PATCH v10 06/11] arm64: Make profile_pc() use arch_stack_walk()
>
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, profile_pc() in ARM64 code walks the stack using
> start_backtrace() and unwind_frame(). Make it use arch_stack_walk() instead. This makes maintenance easier.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/kernel/time.c | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c index eebbc8d7123e..671b3038a772 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -32,22 +32,26 @@
> #include <asm/stacktrace.h>
> #include <asm/paravirt.h>
>
> +static bool profile_pc_cb(void *arg, unsigned long pc) {
> + unsigned long *prof_pc = arg;
> +
> + if (in_lock_functions(pc))
> + return true;
> + *prof_pc = pc;
> + return false;
> +}
> +
> unsigned long profile_pc(struct pt_regs *regs) {
> - struct stackframe frame;
> + unsigned long prof_pc = 0;
>
> if (!in_lock_functions(regs->pc))
> return regs->pc;
>
> - start_backtrace(&frame, regs->regs[29], regs->pc);
> -
> - do {
> - int ret = unwind_frame(NULL, &frame);
> - if (ret < 0)
> - return 0;
> - } while (in_lock_functions(frame.pc));
> + arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
>
> - return frame.pc;
> + return prof_pc;
> }
> EXPORT_SYMBOL(profile_pc);
>
> --
> 2.25.1
I've got build error with CONFIG_ACPI=n:
====
arch/arm64/kernel/time.c: In function 'profile_pc':
arch/arm64/kernel/time.c:52:2: error: implicit declaration of function 'arch_stack_walk' [-Werror=implicit-function-declaration]
52 | arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
| ^~~~~~~~~~~~~~~
====
I think it should include <linux/stacktrace.h> instead of <asm/stacktrace.h>.
Thanks,
Keiya
On Thu, Oct 14, 2021 at 09:58:41PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, dump_stacktrace() in ARM64 code walks the stack using
> start_backtrace() and unwind_frame(). Make it use arch_stack_walk()
> instead. This makes maintenance easier.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/kernel/stacktrace.c | 44 +++++-----------------------------
> 1 file changed, 6 insertions(+), 38 deletions(-)
>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 8982a2b78acf..776c4debb5a7 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -151,24 +151,20 @@ void notrace walk_stackframe(struct task_struct *tsk, struct stackframe *frame,
> }
> NOKPROBE_SYMBOL(walk_stackframe);
>
> -static void dump_backtrace_entry(unsigned long where, const char *loglvl)
> +static bool dump_backtrace_entry(void *arg, unsigned long where)
> {
> + char *loglvl = arg;
> printk("%s %pSb\n", loglvl, (void *)where);
> + return true;
> }
>
> void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> const char *loglvl)
> {
> - struct stackframe frame;
> - int skip = 0;
> -
> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>
> - if (regs) {
> - if (user_mode(regs))
> - return;
> - skip = 1;
> - }
> + if (regs && user_mode(regs))
> + return;
>
> if (!tsk)
> tsk = current;
> @@ -176,36 +172,8 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
> if (!try_get_task_stack(tsk))
> return;
>
> - if (tsk == current) {
> - start_backtrace(&frame,
> - (unsigned long)__builtin_frame_address(0),
> - (unsigned long)dump_backtrace);
> - } else {
> - /*
> - * task blocked in __switch_to
> - */
> - start_backtrace(&frame,
> - thread_saved_fp(tsk),
> - thread_saved_pc(tsk));
> - }
> -
> printk("%sCall trace:\n", loglvl);
> - do {
> - /* skip until specified stack frame */
> - if (!skip) {
> - dump_backtrace_entry(frame.pc, loglvl);
> - } else if (frame.fp == regs->regs[29]) {
> - skip = 0;
> - /*
> - * Mostly, this is the case where this function is
> - * called in panic/abort. As exception handler's
> - * stack frame does not contain the corresponding pc
> - * at which an exception has taken place, use regs->pc
> - * instead.
> - */
> - dump_backtrace_entry(regs->pc, loglvl);
> - }
> - } while (!unwind_frame(tsk, &frame));
> + arch_stack_walk(dump_backtrace_entry, (void *)loglvl, tsk, regs);
This looks really nice!
Unfortunately, removing the `skip` logic higlights a latent issue with
the unwinder (which we previously worked around here but not elsewhere),
whreby we can report erroneous entries when unwinding from regs, because
the fgraph ret stack can have entries added between exception entry and
performing unwind steps.
With this patch as-is, if you enable the function graph tracer, then use
magic-sysrq l, you can see something like:
| sysrq: Show backtrace of all active CPUs
| sysrq: CPU0:
| CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc4-00005-g9097969cd989 #2
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : arch_cpu_idle+0x1c/0x30
| lr : arch_cpu_idle+0x14/0x30
| sp : ffffc74542823d50
| x29: ffffc74542823d50 x28: 000000004129444c x27: 0000000000000000
| x26: ffffc74542833f00 x25: 0000000000000000 x24: 0000000000000000
| x23: ffffc74542829b10 x22: ffffc7454213ccb8 x21: ffffc745428299e8
| x20: ffffc74542829ae0 x19: ffffc7454212d000 x18: 0000000000000000
| x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000001 x12: 0000000002c00ff0
| x11: 0000000002c00000 x10: ffffc74542823d30 x9 : ffff26b8cc7ba600
| x8 : 0000000000000001 x7 : 0000000000000089 x6 : 000000291c72cca9
| x5 : ffff5f74b4953000 x4 : 0000000000004a09 x3 : ffff5f74b4953000
| x2 : 0000000000004a09 x1 : ffff26b9f6a91e20 x0 : 00000000000000e0
| Call trace:
| arch_cpu_idle+0x1c/0x30
| default_idle_call+0x4c/0x19c
| dump_backtrace+0x30/0x3c
| show_regs+0x38/0x50
| rest_init+0xf0/0x100
| arch_call_rest_init+0x1c/0x28
| start_kernel+0x69c/0x6dc
| __primary_switched+0xc0/0xc8
The 'dump_backtrace` and `show_regs` entries don't belong there, and are
hiding the real entries at that part of the callchain.
I think the fix for that is something like the below, which we should
take as a preparatory fix, but as this could trigger warnings in
legitimate usage scenarios we'll need to think a bit harder about the
case of unwinding from regs, and whether the WARN_ON_ONCE() is
warranted.
Thanks,
Mark.
---->8----
From f3e66ca75aff3474355839f72d123276028204e1 Mon Sep 17 00:00:00 2001
From: Mark Rutland <[email protected]>
Date: Mon, 25 Oct 2021 13:23:11 +0100
Subject: [PATCH] arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
When CONFIG_FUNCTION_GRAPH_TRACER is selected, and the function graph:
tracer is in use, unwind_frame() may erroneously asscociate a traced
function with an incorrect return address. This can happen when starting
an unwind from a pt_regs, or when unwinding across an exception
boundary.
The underlying problem is that ftrace_graph_get_ret_stack() takes an
index offset from the most recent entry added to the fgraph return
stack. We start an unwind at offset 0, and increment the offset each
time we encounter `return_to_handler`, which indicates a rewritten
return address. This is broken in two cases:
* Between creating a pt_regs and starting the unwind, function calls may
place entries on the stack, leaving an abitrary offset which we can
only determine by performing a full unwind from the caller of the
unwind code. While this initial unwind is open-coded in
dump_backtrace(), this is not performed for other unwinders such as
perf_callchain_kernel().
* When unwinding across an exception boundary (whether continuing an
unwind or starting a new unwind from regs), we always consume the LR
of the interrupted context, though this may not have been live at the
time of the exception. Where the LR was not live but happened to
contain `return_to_handler`, we'll recover an address from the graph
return stack and increment the current offset, leaving subsequent
entries off-by-one.
Where the LR was not live and did not contain `return_to_handler`, we
will still report an erroneous address, but subsequent entries will be
unaffected.
We can fix the graph return address issues by using
HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, and associating each rewritten return
address with a unique location on the stack. As the return address is
passed in the LR (and so is not guaranteed a unique location in memory),
we use the FP upon entry to the function (i.e. the address of the
caller's frame record) as the return address pointer.
NOTE: With this patch applied, unwinding using regs from a context where
LR is not live but contains `return_to_handler` should consistently
trigger the WARN_ON_ONCE() in unwind_frame(), as the FP must be
different to the FP upon entry to the function. So far I have been
unable to trigger this in local testing. This is a latent bug which
already existed, but could have been masked by the prior behviour of
consuming extra entries in the ftrace graph stack. So far I have not
been able to trigger this in local testing.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Madhavan T. Venkataraman <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/ftrace.h | 11 +++++++++++
arch/arm64/include/asm/stacktrace.h | 6 ------
arch/arm64/kernel/ftrace.c | 6 +++---
arch/arm64/kernel/stacktrace.c | 12 +++++-------
4 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 91fa4baa1a93..c96d47cb8f46 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -12,6 +12,17 @@
#define HAVE_FUNCTION_GRAPH_FP_TEST
+/*
+ * HAVE_FUNCTION_GRAPH_RET_ADDR_PTR means that the architecture can provide a
+ * "return address pointer" which can be used to uniquely identify a return
+ * address which has been overwritten.
+ *
+ * On arm64 we use the address of the caller's frame record, which remains the
+ * same for the lifetime of the instrumented function, unlike the return
+ * address in the LR.
+ */
+#define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
+
#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
#define ARCH_SUPPORTS_FTRACE_OPS 1
#else
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 8aebc00c1718..9a319eca5cab 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -46,9 +46,6 @@ struct stack_info {
* @prev_type: The type of stack this frame record was on, or a synthetic
* value of STACK_TYPE_UNKNOWN. This is used to detect a
* transition from one stack to another.
- *
- * @graph: When FUNCTION_GRAPH_TRACER is selected, holds the index of a
- * replacement lr value in the ftrace graph stack.
*/
struct stackframe {
unsigned long fp;
@@ -56,9 +53,6 @@ struct stackframe {
DECLARE_BITMAP(stacks_done, __NR_STACK_TYPES);
unsigned long prev_fp;
enum stack_type prev_type;
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- int graph;
-#endif
};
extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c
index 7f467bd9db7a..3e5d0ed63eb7 100644
--- a/arch/arm64/kernel/ftrace.c
+++ b/arch/arm64/kernel/ftrace.c
@@ -249,8 +249,6 @@ int __init ftrace_dyn_arch_init(void)
* on the way back to parent. For this purpose, this function is called
* in _mcount() or ftrace_caller() to replace return address (*parent) on
* the call stack to return_to_handler.
- *
- * Note that @frame_pointer is used only for sanity check later.
*/
void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
unsigned long frame_pointer)
@@ -268,8 +266,10 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
*/
old = *parent;
- if (!function_graph_enter(old, self_addr, frame_pointer, NULL))
+ if (!function_graph_enter(old, self_addr, frame_pointer,
+ (void *)frame_pointer)) {
*parent = return_hooker;
+ }
}
#ifdef CONFIG_DYNAMIC_FTRACE
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 8982a2b78acf..749b680b4999 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -38,9 +38,6 @@ void start_backtrace(struct stackframe *frame, unsigned long fp,
{
frame->fp = fp;
frame->pc = pc;
-#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- frame->graph = 0;
-#endif
/*
* Prime the first unwind.
@@ -116,17 +113,18 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
(ptrauth_strip_insn_pac(frame->pc) == (unsigned long)return_to_handler)) {
- struct ftrace_ret_stack *ret_stack;
+ unsigned long orig_pc;
/*
* This is a case where function graph tracer has
* modified a return address (LR) in a stack frame
* to hook a function return.
* So replace it to an original value.
*/
- ret_stack = ftrace_graph_get_ret_stack(tsk, frame->graph++);
- if (WARN_ON_ONCE(!ret_stack))
+ orig_pc = ftrace_graph_ret_addr(tsk, NULL, frame->pc,
+ (void *)frame->fp);
+ if (WARN_ON_ONCE(frame->pc == orig_pc))
return -EINVAL;
- frame->pc = ret_stack->ret;
+ frame->pc = orig_pc;
}
#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
--
2.11.0
On Mon, Oct 25, 2021 at 05:49:25PM +0100, Mark Rutland wrote:
> From f3e66ca75aff3474355839f72d123276028204e1 Mon Sep 17 00:00:00 2001
> From: Mark Rutland <[email protected]>
> Date: Mon, 25 Oct 2021 13:23:11 +0100
> Subject: [PATCH] arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
>
> When CONFIG_FUNCTION_GRAPH_TRACER is selected, and the function graph:
> tracer is in use, unwind_frame() may erroneously asscociate a traced
> function with an incorrect return address. This can happen when starting
> an unwind from a pt_regs, or when unwinding across an exception
> boundary.
>
> The underlying problem is that ftrace_graph_get_ret_stack() takes an
> index offset from the most recent entry added to the fgraph return
> stack. We start an unwind at offset 0, and increment the offset each
> time we encounter `return_to_handler`, which indicates a rewritten
> return address. This is broken in two cases:
>
> * Between creating a pt_regs and starting the unwind, function calls may
> place entries on the stack, leaving an abitrary offset which we can
> only determine by performing a full unwind from the caller of the
> unwind code. While this initial unwind is open-coded in
> dump_backtrace(), this is not performed for other unwinders such as
> perf_callchain_kernel().
>
> * When unwinding across an exception boundary (whether continuing an
> unwind or starting a new unwind from regs), we always consume the LR
> of the interrupted context, though this may not have been live at the
> time of the exception. Where the LR was not live but happened to
> contain `return_to_handler`, we'll recover an address from the graph
> return stack and increment the current offset, leaving subsequent
> entries off-by-one.
>
> Where the LR was not live and did not contain `return_to_handler`, we
> will still report an erroneous address, but subsequent entries will be
> unaffected.
It turns out I had this backwards, and we currently always *skip* the LR
when unwinding across regs, because:
* The entry assembly creates a synthetic frame record with the original
FP and the ELR_EL1 value (i.e. the PC at the point of the exception),
skipping the LR.
* In arch_stack_walk() we start the walk from regs->pc, and continue
with the frame record, skipping the LR.
* In the existing dump_backtrace, we skip until we hit a frame record
whose FP value matches the FP in the regs (i.e. the synthetic frame
record created by the entry assembly). That'll dump the ELR_EL1 value,
then continue to the next frame record, skipping the LR.
So case two is bogus, and only case one can happen today. This cleanup
shouldn't trigger the WARN_ON_ONCE() in unwind_frame(), and we can fix
the missing LR entry in a subsequent cleanup.
Thanks,
Mark.
On Thu, Oct 14, 2021 at 09:58:42PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Currently, profile_pc() in ARM64 code walks the stack using
> start_backtrace() and unwind_frame(). Make it use arch_stack_walk()
> instead. This makes maintenance easier.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/kernel/time.c | 22 +++++++++++++---------
> 1 file changed, 13 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
> index eebbc8d7123e..671b3038a772 100644
> --- a/arch/arm64/kernel/time.c
> +++ b/arch/arm64/kernel/time.c
> @@ -32,22 +32,26 @@
> #include <asm/stacktrace.h>
> #include <asm/paravirt.h>
>
> +static bool profile_pc_cb(void *arg, unsigned long pc)
> +{
> + unsigned long *prof_pc = arg;
> +
> + if (in_lock_functions(pc))
> + return true;
> + *prof_pc = pc;
> + return false;
> +}
> +
> unsigned long profile_pc(struct pt_regs *regs)
> {
> - struct stackframe frame;
> + unsigned long prof_pc = 0;
>
> if (!in_lock_functions(regs->pc))
> return regs->pc;
This can go -- the first call to profile_pc_cb() will use regs->pc.
With that gone, and the include updates to use <linux/stacktrace.h>:
Reviewed-by: Mark Rutland <[email protected]>
Mark.
> - start_backtrace(&frame, regs->regs[29], regs->pc);
> -
> - do {
> - int ret = unwind_frame(NULL, &frame);
> - if (ret < 0)
> - return 0;
> - } while (in_lock_functions(frame.pc));
> + arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
>
> - return frame.pc;
> + return prof_pc;
> }
> EXPORT_SYMBOL(profile_pc);
>
> --
> 2.25.1
>
On 10/26/21 7:05 AM, Mark Rutland wrote:
> On Mon, Oct 25, 2021 at 05:49:25PM +0100, Mark Rutland wrote:
>> From f3e66ca75aff3474355839f72d123276028204e1 Mon Sep 17 00:00:00 2001
>> From: Mark Rutland <[email protected]>
>> Date: Mon, 25 Oct 2021 13:23:11 +0100
>> Subject: [PATCH] arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
>>
>> When CONFIG_FUNCTION_GRAPH_TRACER is selected, and the function graph:
>> tracer is in use, unwind_frame() may erroneously asscociate a traced
>> function with an incorrect return address. This can happen when starting
>> an unwind from a pt_regs, or when unwinding across an exception
>> boundary.
>>
>> The underlying problem is that ftrace_graph_get_ret_stack() takes an
>> index offset from the most recent entry added to the fgraph return
>> stack. We start an unwind at offset 0, and increment the offset each
>> time we encounter `return_to_handler`, which indicates a rewritten
>> return address. This is broken in two cases:
>>
>> * Between creating a pt_regs and starting the unwind, function calls may
>> place entries on the stack, leaving an abitrary offset which we can
>> only determine by performing a full unwind from the caller of the
>> unwind code. While this initial unwind is open-coded in
>> dump_backtrace(), this is not performed for other unwinders such as
>> perf_callchain_kernel().
>>
>> * When unwinding across an exception boundary (whether continuing an
>> unwind or starting a new unwind from regs), we always consume the LR
>> of the interrupted context, though this may not have been live at the
>> time of the exception. Where the LR was not live but happened to
>> contain `return_to_handler`, we'll recover an address from the graph
>> return stack and increment the current offset, leaving subsequent
>> entries off-by-one.
>>
>> Where the LR was not live and did not contain `return_to_handler`, we
>> will still report an erroneous address, but subsequent entries will be
>> unaffected.
>
> It turns out I had this backwards, and we currently always *skip* the LR
> when unwinding across regs, because:
>
> * The entry assembly creates a synthetic frame record with the original
> FP and the ELR_EL1 value (i.e. the PC at the point of the exception),
> skipping the LR.
>
> * In arch_stack_walk() we start the walk from regs->pc, and continue
> with the frame record, skipping the LR.
>
> * In the existing dump_backtrace, we skip until we hit a frame record
> whose FP value matches the FP in the regs (i.e. the synthetic frame
> record created by the entry assembly). That'll dump the ELR_EL1 value,
> then continue to the next frame record, skipping the LR.
>
> So case two is bogus, and only case one can happen today. This cleanup
> shouldn't trigger the WARN_ON_ONCE() in unwind_frame(), and we can fix
> the missing LR entry in a subsequent cleanup.
>
OK. Thanks.
Madhavan
On 10/24/21 9:18 PM, [email protected] wrote:
> Hi,
>
>> -----Original Message-----
>> From: [email protected] <[email protected]>
>> Sent: Friday, October 15, 2021 11:59 AM
>> To: [email protected]; [email protected]; [email protected]; [email protected]; Nobuta, Keiya/信田 圭哉
>> <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: [PATCH v10 06/11] arm64: Make profile_pc() use arch_stack_walk()
>>
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, profile_pc() in ARM64 code walks the stack using
>> start_backtrace() and unwind_frame(). Make it use arch_stack_walk() instead. This makes maintenance easier.
>>
>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>> ---
>> arch/arm64/kernel/time.c | 22 +++++++++++++---------
>> 1 file changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c index eebbc8d7123e..671b3038a772 100644
>> --- a/arch/arm64/kernel/time.c
>> +++ b/arch/arm64/kernel/time.c
>> @@ -32,22 +32,26 @@
>> #include <asm/stacktrace.h>
>> #include <asm/paravirt.h>
>>
>> +static bool profile_pc_cb(void *arg, unsigned long pc) {
>> + unsigned long *prof_pc = arg;
>> +
>> + if (in_lock_functions(pc))
>> + return true;
>> + *prof_pc = pc;
>> + return false;
>> +}
>> +
>> unsigned long profile_pc(struct pt_regs *regs) {
>> - struct stackframe frame;
>> + unsigned long prof_pc = 0;
>>
>> if (!in_lock_functions(regs->pc))
>> return regs->pc;
>>
>> - start_backtrace(&frame, regs->regs[29], regs->pc);
>> -
>> - do {
>> - int ret = unwind_frame(NULL, &frame);
>> - if (ret < 0)
>> - return 0;
>> - } while (in_lock_functions(frame.pc));
>> + arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
>>
>> - return frame.pc;
>> + return prof_pc;
>> }
>> EXPORT_SYMBOL(profile_pc);
>>
>> --
>> 2.25.1
>
>
> I've got build error with CONFIG_ACPI=n:
> ====
> arch/arm64/kernel/time.c: In function 'profile_pc':
> arch/arm64/kernel/time.c:52:2: error: implicit declaration of function 'arch_stack_walk' [-Werror=implicit-function-declaration]
> 52 | arch_stack_walk(profile_pc_cb, &prof_pc, current, regs);
> | ^~~~~~~~~~~~~~~
> ====
>
> I think it should include <linux/stacktrace.h> instead of <asm/stacktrace.h>.
>
OK. I will fix this.
Thanks for catching this.
Madhavan
On 10/27/21 8:32 AM, Mark Rutland wrote:
> On Thu, Oct 14, 2021 at 09:58:42PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Currently, profile_pc() in ARM64 code walks the stack using
>> start_backtrace() and unwind_frame(). Make it use arch_stack_walk()
>> instead. This makes maintenance easier.
>>
>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>> ---
>> arch/arm64/kernel/time.c | 22 +++++++++++++---------
>> 1 file changed, 13 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/arm64/kernel/time.c b/arch/arm64/kernel/time.c
>> index eebbc8d7123e..671b3038a772 100644
>> --- a/arch/arm64/kernel/time.c
>> +++ b/arch/arm64/kernel/time.c
>> @@ -32,22 +32,26 @@
>> #include <asm/stacktrace.h>
>> #include <asm/paravirt.h>
>>
>> +static bool profile_pc_cb(void *arg, unsigned long pc)
>> +{
>> + unsigned long *prof_pc = arg;
>> +
>> + if (in_lock_functions(pc))
>> + return true;
>> + *prof_pc = pc;
>> + return false;
>> +}
>> +
>> unsigned long profile_pc(struct pt_regs *regs)
>> {
>> - struct stackframe frame;
>> + unsigned long prof_pc = 0;
>>
>> if (!in_lock_functions(regs->pc))
>> return regs->pc;
>
> This can go -- the first call to profile_pc_cb() will use regs->pc.
>
Agreed.
> With that gone, and the include updates to use <linux/stacktrace.h>:
>
> Reviewed-by: Mark Rutland <[email protected]>
>
I will make the two changes.
Madhavan
On Thu, Oct 14, 2021 at 09:58:44PM -0500, [email protected] wrote:
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> Rename unwinder functions for consistency and better naming.
>
> - Rename start_backtrace() to unwind_start().
> - Rename unwind_frame() to unwind_next().
> - Rename walk_stackframe() to unwind().
This looks good to me.
Could we split this from the krpbes/tracing changes? I think this stands
on it's own, and (as below) the kprobes/tracing changes need some more
explanation, and would make sense as a separate patch.
> Prevent the following unwinder functions from being traced:
>
> - unwind_start()
> - unwind_next()
>
> unwind() is already prevented from being traced.
This could do with an explanation in the commis message as to why we
need to do this. If this is fixing a latent issue, it should be in a
preparatory patch that we can backport.
I dug into this a bit, and from taking a look, we prohibited ftrace in commit:
0c32706dac1b0a72 ("arm64: stacktrace: avoid tracing arch_stack_walk()")
... which is just one special case of graph return stack unbalancing,
and should be addressed by using HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, so
with the patch making us use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, that's
no longer necessary.
So we no longer seem to have a specific reason to prohibit ftrace
here.
> Prevent the following unwinder functions from being kprobed:
>
> - unwind_start()
>
> unwind_next() and unwind() are already prevented from being kprobed.
Likewise, I think this needs some explanation. From diggin, we
prohibited kprobes in commit:
ee07b93e7721ccd5 ("arm64: unwind: Prohibit probing on return_address()")
... and the commit message says we need to do this because this is
(transitively) called by trace_hardirqs_off(), which is kprobes
blacklisted, but doesn't explain the actual problem this results in.
AFAICT x86 directly uses __builtin_return_address() here, but that won't
recover rewritten addresses, which seems like a bug (or at least a
limitation) on x86, assuming I've read that correctly.
Thanks,
Mark.
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/kernel/stacktrace.c | 34 +++++++++++++++++++---------------
> 1 file changed, 19 insertions(+), 15 deletions(-)
>
> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
> index 7d32cee9ef4b..f4f3575f71fd 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -33,8 +33,8 @@
> */
>
>
> -static void start_backtrace(struct stackframe *frame, unsigned long fp,
> - unsigned long pc)
> +static void notrace unwind_start(struct stackframe *frame, unsigned long fp,
> + unsigned long pc)
> {
> frame->fp = fp;
> frame->pc = pc;
> @@ -45,7 +45,7 @@ static void start_backtrace(struct stackframe *frame, unsigned long fp,
> /*
> * Prime the first unwind.
> *
> - * In unwind_frame() we'll check that the FP points to a valid stack,
> + * In unwind_next() we'll check that the FP points to a valid stack,
> * which can't be STACK_TYPE_UNKNOWN, and the first unwind will be
> * treated as a transition to whichever stack that happens to be. The
> * prev_fp value won't be used, but we set it to 0 such that it is
> @@ -56,6 +56,8 @@ static void start_backtrace(struct stackframe *frame, unsigned long fp,
> frame->prev_type = STACK_TYPE_UNKNOWN;
> }
>
> +NOKPROBE_SYMBOL(unwind_start);
> +
> /*
> * Unwind from one frame record (A) to the next frame record (B).
> *
> @@ -63,8 +65,8 @@ static void start_backtrace(struct stackframe *frame, unsigned long fp,
> * records (e.g. a cycle), determined based on the location and fp value of A
> * and the location (but not the fp value) of B.
> */
> -static int notrace unwind_frame(struct task_struct *tsk,
> - struct stackframe *frame)
> +static int notrace unwind_next(struct task_struct *tsk,
> + struct stackframe *frame)
> {
> unsigned long fp = frame->fp;
> struct stack_info info;
> @@ -104,7 +106,7 @@ static int notrace unwind_frame(struct task_struct *tsk,
>
> /*
> * Record this frame record's values and location. The prev_fp and
> - * prev_type are only meaningful to the next unwind_frame() invocation.
> + * prev_type are only meaningful to the next unwind_next() invocation.
> */
> frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)(fp));
> frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));
> @@ -132,28 +134,30 @@ static int notrace unwind_frame(struct task_struct *tsk,
>
> return 0;
> }
> -NOKPROBE_SYMBOL(unwind_frame);
>
> -static void notrace walk_stackframe(struct task_struct *tsk,
> - unsigned long fp, unsigned long pc,
> - bool (*fn)(void *, unsigned long),
> - void *data)
> +NOKPROBE_SYMBOL(unwind_next);
> +
> +static void notrace unwind(struct task_struct *tsk,
> + unsigned long fp, unsigned long pc,
> + bool (*fn)(void *, unsigned long),
> + void *data)
> {
> struct stackframe frame;
>
> - start_backtrace(&frame, fp, pc);
> + unwind_start(&frame, fp, pc);
>
> while (1) {
> int ret;
>
> if (!fn(data, frame.pc))
> break;
> - ret = unwind_frame(tsk, &frame);
> + ret = unwind_next(tsk, &frame);
> if (ret < 0)
> break;
> }
> }
> -NOKPROBE_SYMBOL(walk_stackframe);
> +
> +NOKPROBE_SYMBOL(unwind);
>
> static bool dump_backtrace_entry(void *arg, unsigned long where)
> {
> @@ -208,7 +212,7 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
> fp = thread_saved_fp(task);
> pc = thread_saved_pc(task);
> }
> - walk_stackframe(task, fp, pc, consume_entry, cookie);
> + unwind(task, fp, pc, consume_entry, cookie);
>
> }
>
> --
> 2.25.1
>
On 10/27/21 12:53 PM, Mark Rutland wrote:
> On Thu, Oct 14, 2021 at 09:58:44PM -0500, [email protected] wrote:
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> Rename unwinder functions for consistency and better naming.
>>
>> - Rename start_backtrace() to unwind_start().
>> - Rename unwind_frame() to unwind_next().
>> - Rename walk_stackframe() to unwind().
>
> This looks good to me.
>
Thanks.
> Could we split this from the krpbes/tracing changes? I think this stands
> on it's own, and (as below) the kprobes/tracing changes need some more
> explanation, and would make sense as a separate patch.
>
OK. I will split the patches.
>> Prevent the following unwinder functions from being traced:
>>
>> - unwind_start()
>> - unwind_next()
>>
>> unwind() is already prevented from being traced.
>
> This could do with an explanation in the commis message as to why we
> need to do this. If this is fixing a latent issue, it should be in a
> preparatory patch that we can backport.
>
> I dug into this a bit, and from taking a look, we prohibited ftrace in commit:
>
> 0c32706dac1b0a72 ("arm64: stacktrace: avoid tracing arch_stack_walk()")
>
> ... which is just one special case of graph return stack unbalancing,
> and should be addressed by using HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, so
> with the patch making us use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, that's
> no longer necessary.
>
> So we no longer seem to have a specific reason to prohibit ftrace
> here.
>
OK, I will think about this and add a comment.
>> Prevent the following unwinder functions from being kprobed:
>>
>> - unwind_start()
>>
>> unwind_next() and unwind() are already prevented from being kprobed.
>
> Likewise, I think this needs some explanation. From diggin, we
> prohibited kprobes in commit:
>
> ee07b93e7721ccd5 ("arm64: unwind: Prohibit probing on return_address()")
>
> ... and the commit message says we need to do this because this is
> (transitively) called by trace_hardirqs_off(), which is kprobes
> blacklisted, but doesn't explain the actual problem this results in.
>
OK. I will think about this and add a comment.
> AFAICT x86 directly uses __builtin_return_address() here, but that won't
> recover rewritten addresses, which seems like a bug (or at least a
> limitation) on x86, assuming I've read that correctly.
>
OK.
Thanks,
Madhavan
Hi Madhavan,
> -----Original Message-----
> From: [email protected] <[email protected]>
> Sent: Friday, October 15, 2021 11:59 AM
> To: [email protected]; [email protected]; [email protected]; [email protected]; Nobuta, Keiya/信田 圭哉
> <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]
> Subject: [PATCH v10 10/11] arm64: Introduce stack trace reliability checks in the unwinder
>
> From: "Madhavan T. Venkataraman" <[email protected]>
>
> There are some kernel features and conditions that make a stack trace unreliable. Callers may require the unwinder to detect
> these cases.
> E.g., livepatch.
>
> Introduce a new function called unwind_check_reliability() that will detect these cases and set a flag in the stack frame. Call
> unwind_check_reliability() for every frame, that is, in unwind_start() and unwind_next().
>
> Introduce the first reliability check in unwind_check_reliability() - If a return PC is not a valid kernel text address, consider the
> stack trace unreliable. It could be some generated code. Other reliability checks will be added in the future.
>
> Let unwind() return a boolean to indicate if the stack trace is reliable.
>
> Introduce arch_stack_walk_reliable() for ARM64. This works like
> arch_stack_walk() except that it returns -EINVAL if the stack trace is not reliable.
>
> Until all the reliability checks are in place, arch_stack_walk_reliable() may not be used by livepatch. But it may be used by
> debug and test code.
>
> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> ---
> arch/arm64/include/asm/stacktrace.h | 3 ++
> arch/arm64/kernel/stacktrace.c | 48 ++++++++++++++++++++++++++++-
> 2 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
> index ba2180c7d5cd..ce0710fa3037 100644
> --- a/arch/arm64/include/asm/stacktrace.h
> +++ b/arch/arm64/include/asm/stacktrace.h
> @@ -51,6 +51,8 @@ struct stack_info {
> * replacement lr value in the ftrace graph stack.
> *
> * @failed: Unwind failed.
> + *
> + * @reliable: Stack trace is reliable.
> */
> struct stackframe {
> unsigned long fp;
> @@ -62,6 +64,7 @@ struct stackframe {
> int graph;
> #endif
> bool failed;
> + bool reliable;
> };
>
> extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk, diff --git a/arch/arm64/kernel/stacktrace.c
> b/arch/arm64/kernel/stacktrace.c index 8e9e6f38c975..142f08ae515f 100644
> --- a/arch/arm64/kernel/stacktrace.c
> +++ b/arch/arm64/kernel/stacktrace.c
> @@ -18,6 +18,22 @@
> #include <asm/stack_pointer.h>
> #include <asm/stacktrace.h>
>
> +/*
> + * Check the stack frame for conditions that make further unwinding unreliable.
> + */
> +static void notrace unwind_check_reliability(struct stackframe *frame)
> +{
> + /*
> + * If the PC is not a known kernel text address, then we cannot
> + * be sure that a subsequent unwind will be reliable, as we
> + * don't know that the code follows our unwind requirements.
> + */
> + if (!__kernel_text_address(frame->pc))
> + frame->reliable = false;
> +}
> +
> +NOKPROBE_SYMBOL(unwind_check_reliability);
> +
> /*
> * AArch64 PCS assigns the frame pointer to x29.
> *
> @@ -55,6 +71,8 @@ static void notrace unwind_start(struct stackframe *frame, unsigned long fp,
> frame->prev_fp = 0;
> frame->prev_type = STACK_TYPE_UNKNOWN;
> frame->failed = false;
> + frame->reliable = true;
> + unwind_check_reliability(frame);
> }
>
> NOKPROBE_SYMBOL(unwind_start);
> @@ -138,6 +156,7 @@ static void notrace unwind_next(struct task_struct *tsk, #endif /*
> CONFIG_FUNCTION_GRAPH_TRACER */
>
> frame->pc = ptrauth_strip_insn_pac(frame->pc);
> + unwind_check_reliability(frame);
> }
Isn't it necessary to check "final frame" before unwind_check_reliability()?
The frame at this point is unwound frame, so may be last frame.
Or if move unwind_check_reliability() into unwind(), I think unwind() can
be twins as below:
~~~~~~~~
unwind(...) {
<...>
for (unwind_start(...); unwind_continue(...); unwind_next(...))
unwind_check_reliability(&frame);
}
unwind_reliable(...) {
<...>
for (unwind_start(...); unwind_continue(...); unwind_next(...)) {
unwind_check_reliability(&frame);
if (!frame.reliable)
break;
}
return (frame.reliable && !frame.failed);
}
~~~~~~~~
Thanks,
Keiya
>
> NOKPROBE_SYMBOL(unwind_next);
> @@ -167,7 +186,7 @@ static bool notrace unwind_continue(struct task_struct *task,
>
> NOKPROBE_SYMBOL(unwind_continue);
>
> -static void notrace unwind(struct task_struct *tsk,
> +static bool notrace unwind(struct task_struct *tsk,
> unsigned long fp, unsigned long pc,
> bool (*fn)(void *, unsigned long),
> void *data)
> @@ -177,6 +196,7 @@ static void notrace unwind(struct task_struct *tsk,
> unwind_start(&frame, fp, pc);
> while (unwind_continue(tsk, &frame, fn, data))
> unwind_next(tsk, &frame);
> + return frame.reliable;
> }
>
> NOKPROBE_SYMBOL(unwind);
> @@ -238,4 +258,30 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>
> }
>
> +/*
> + * arch_stack_walk_reliable() may not be used for livepatch until all
> +of
> + * the reliability checks are in place in unwind_consume(). However,
> + * debug and test code can choose to use it even if all the checks are
> +not
> + * in place.
> + */
> +noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
> + void *cookie,
> + struct task_struct *task)
> +{
> + unsigned long fp, pc;
> +
> + if (task == current) {
> + /* Skip arch_stack_walk_reliable() in the stack trace. */
> + fp = (unsigned long)__builtin_frame_address(1);
> + pc = (unsigned long)__builtin_return_address(0);
> + } else {
> + /* Caller guarantees that the task is not running. */
> + fp = thread_saved_fp(task);
> + pc = thread_saved_pc(task);
> + }
> + if (unwind(task, fp, pc, consume_fn, cookie))
> + return 0;
> + return -EINVAL;
> +}
> +
> #endif
> --
> 2.25.1
Hi Nobuta,
Sorry for the delay in responding to your comment.
I will fix the issue you have raised in the next version.
Thanks. Again, sorry for the late response.
Madhavan
On 11/4/21 7:39 AM, [email protected] wrote:
> Hi Madhavan,
>
>> -----Original Message-----
>> From: [email protected] <[email protected]>
>> Sent: Friday, October 15, 2021 11:59 AM
>> To: [email protected]; [email protected]; [email protected]; [email protected]; Nobuta, Keiya/信田 圭哉
>> <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]
>> Subject: [PATCH v10 10/11] arm64: Introduce stack trace reliability checks in the unwinder
>>
>> From: "Madhavan T. Venkataraman" <[email protected]>
>>
>> There are some kernel features and conditions that make a stack trace unreliable. Callers may require the unwinder to detect
>> these cases.
>> E.g., livepatch.
>>
>> Introduce a new function called unwind_check_reliability() that will detect these cases and set a flag in the stack frame. Call
>> unwind_check_reliability() for every frame, that is, in unwind_start() and unwind_next().
>>
>> Introduce the first reliability check in unwind_check_reliability() - If a return PC is not a valid kernel text address, consider the
>> stack trace unreliable. It could be some generated code. Other reliability checks will be added in the future.
>>
>> Let unwind() return a boolean to indicate if the stack trace is reliable.
>>
>> Introduce arch_stack_walk_reliable() for ARM64. This works like
>> arch_stack_walk() except that it returns -EINVAL if the stack trace is not reliable.
>>
>> Until all the reliability checks are in place, arch_stack_walk_reliable() may not be used by livepatch. But it may be used by
>> debug and test code.
>>
>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>> ---
>> arch/arm64/include/asm/stacktrace.h | 3 ++
>> arch/arm64/kernel/stacktrace.c | 48 ++++++++++++++++++++++++++++-
>> 2 files changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
>> index ba2180c7d5cd..ce0710fa3037 100644
>> --- a/arch/arm64/include/asm/stacktrace.h
>> +++ b/arch/arm64/include/asm/stacktrace.h
>> @@ -51,6 +51,8 @@ struct stack_info {
>> * replacement lr value in the ftrace graph stack.
>> *
>> * @failed: Unwind failed.
>> + *
>> + * @reliable: Stack trace is reliable.
>> */
>> struct stackframe {
>> unsigned long fp;
>> @@ -62,6 +64,7 @@ struct stackframe {
>> int graph;
>> #endif
>> bool failed;
>> + bool reliable;
>> };
>>
>> extern void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk, diff --git a/arch/arm64/kernel/stacktrace.c
>> b/arch/arm64/kernel/stacktrace.c index 8e9e6f38c975..142f08ae515f 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -18,6 +18,22 @@
>> #include <asm/stack_pointer.h>
>> #include <asm/stacktrace.h>
>>
>> +/*
>> + * Check the stack frame for conditions that make further unwinding unreliable.
>> + */
>> +static void notrace unwind_check_reliability(struct stackframe *frame)
>> +{
>> + /*
>> + * If the PC is not a known kernel text address, then we cannot
>> + * be sure that a subsequent unwind will be reliable, as we
>> + * don't know that the code follows our unwind requirements.
>> + */
>> + if (!__kernel_text_address(frame->pc))
>> + frame->reliable = false;
>> +}
>> +
>> +NOKPROBE_SYMBOL(unwind_check_reliability);
>> +
>> /*
>> * AArch64 PCS assigns the frame pointer to x29.
>> *
>> @@ -55,6 +71,8 @@ static void notrace unwind_start(struct stackframe *frame, unsigned long fp,
>> frame->prev_fp = 0;
>> frame->prev_type = STACK_TYPE_UNKNOWN;
>> frame->failed = false;
>> + frame->reliable = true;
>> + unwind_check_reliability(frame);
>> }
>>
>> NOKPROBE_SYMBOL(unwind_start);
>> @@ -138,6 +156,7 @@ static void notrace unwind_next(struct task_struct *tsk, #endif /*
>> CONFIG_FUNCTION_GRAPH_TRACER */
>>
>> frame->pc = ptrauth_strip_insn_pac(frame->pc);
>> + unwind_check_reliability(frame);
>> }
>
> Isn't it necessary to check "final frame" before unwind_check_reliability()?
> The frame at this point is unwound frame, so may be last frame.
>
> Or if move unwind_check_reliability() into unwind(), I think unwind() can
> be twins as below:
>
> ~~~~~~~~
> unwind(...) {
> <...>
> for (unwind_start(...); unwind_continue(...); unwind_next(...))
> unwind_check_reliability(&frame);
> }
>
> unwind_reliable(...) {
> <...>
> for (unwind_start(...); unwind_continue(...); unwind_next(...)) {
> unwind_check_reliability(&frame);
> if (!frame.reliable)
> break;
> }
>
> return (frame.reliable && !frame.failed);
> }
> ~~~~~~~~
>
>
>
> Thanks,
> Keiya
>
>
>>
>> NOKPROBE_SYMBOL(unwind_next);
>> @@ -167,7 +186,7 @@ static bool notrace unwind_continue(struct task_struct *task,
>>
>> NOKPROBE_SYMBOL(unwind_continue);
>>
>> -static void notrace unwind(struct task_struct *tsk,
>> +static bool notrace unwind(struct task_struct *tsk,
>> unsigned long fp, unsigned long pc,
>> bool (*fn)(void *, unsigned long),
>> void *data)
>> @@ -177,6 +196,7 @@ static void notrace unwind(struct task_struct *tsk,
>> unwind_start(&frame, fp, pc);
>> while (unwind_continue(tsk, &frame, fn, data))
>> unwind_next(tsk, &frame);
>> + return frame.reliable;
>> }
>>
>> NOKPROBE_SYMBOL(unwind);
>> @@ -238,4 +258,30 @@ noinline notrace void arch_stack_walk(stack_trace_consume_fn consume_entry,
>>
>> }
>>
>> +/*
>> + * arch_stack_walk_reliable() may not be used for livepatch until all
>> +of
>> + * the reliability checks are in place in unwind_consume(). However,
>> + * debug and test code can choose to use it even if all the checks are
>> +not
>> + * in place.
>> + */
>> +noinline int notrace arch_stack_walk_reliable(stack_trace_consume_fn consume_fn,
>> + void *cookie,
>> + struct task_struct *task)
>> +{
>> + unsigned long fp, pc;
>> +
>> + if (task == current) {
>> + /* Skip arch_stack_walk_reliable() in the stack trace. */
>> + fp = (unsigned long)__builtin_frame_address(1);
>> + pc = (unsigned long)__builtin_return_address(0);
>> + } else {
>> + /* Caller guarantees that the task is not running. */
>> + fp = thread_saved_fp(task);
>> + pc = thread_saved_pc(task);
>> + }
>> + if (unwind(task, fp, pc, consume_fn, cookie))
>> + return 0;
>> + return -EINVAL;
>> +}
>> +
>> #endif
>> --
>> 2.25.1
>
On Fri, Oct 22, 2021 at 07:02:43PM +0100, Mark Rutland wrote:
> On Thu, Oct 14, 2021 at 09:58:37PM -0500, [email protected] wrote:
> > From: "Madhavan T. Venkataraman" <[email protected]>
> >
> > Currently, there are multiple functions in ARM64 code that walk the
> > stack using start_backtrace() and unwind_frame() or start_backtrace()
> > and walk_stackframe(). They should all be converted to use
> > arch_stack_walk(). This makes maintenance easier.
> >
> > To do that, arch_stack_walk() must always be defined. arch_stack_walk()
> > is within #ifdef CONFIG_STACKTRACE. So, select STACKTRACE in
> > arch/arm64/Kconfig.
>
> I'd prefer if we could decouple ARCH_STACKWALK from STACKTRACE, so that
> we don't have to expose /proc/*/stack unconditionally, which Peter
> Zijlstra has a patch for:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> ... but regardless the rest of the series looks pretty good, so I'll go
> review that, and we can figure out how to queue the bits and pieces in
> the right order.
FWIW, it looks like the direction of travel there is not go and unify
the various arch unwinders, but I would like to not depend on
STACKTRACE. Regardless, the initial arch_stack_walk() cleanup patches
all look good, so I reckon we should try to get those out of the way and
queue those for arm64 soon even if we need some more back-and-forth over
the later part of the series.
With that in mind, I've picked up Peter's patch decoupling
ARCH_STACKWALK from STACKTRACE, and rebased the initial patches from
this series atop. Since there's some subtltety in a few cases (and this
was easy to miss while reviewing), I've expanded the commit messages
with additional rationale as to why each transformation is safe.
I've pushed that to:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/arch-stack-walk
There's a dependency on:
https://lore.kernel.org/r/[email protected]
... which was queued for v5.16-rc1, but got dropped due to a conflict,
and I'm expecting it to be re-queued as a fix for v5.16-rc2 shortly
after v5.16-rc1 is tagged. Hopefully that means we have a table base by
v5.16-rc2.
I'll send the preparatory series as I've prepared it shortly after
v5.16-rc1 so that people can shout if I've messed something up.
Hopefully it's easy enough to use that as a base for the more involved
rework later in this series.
Thanks,
Mark.
> Thanks,
> Mark.
>
> >
> > Signed-off-by: Madhavan T. Venkataraman <[email protected]>
> > ---
> > arch/arm64/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index fdcd54d39c1e..bfb0ce60d820 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -35,6 +35,7 @@ config ARM64
> > select ARCH_HAS_SET_DIRECT_MAP
> > select ARCH_HAS_SET_MEMORY
> > select ARCH_STACKWALK
> > + select STACKTRACE
> > select ARCH_HAS_STRICT_KERNEL_RWX
> > select ARCH_HAS_STRICT_MODULE_RWX
> > select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> > --
> > 2.25.1
> >
I reviewed the changes briefly. They look good. I will take a more detailed look this week.
Thanks for doing this!
Once this is part of v5.16-rc2, I will send out version 11 on top of that with the rest of
the patches in my series.
Madhavan
On 11/12/21 11:44 AM, Mark Rutland wrote:
> On Fri, Oct 22, 2021 at 07:02:43PM +0100, Mark Rutland wrote:
>> On Thu, Oct 14, 2021 at 09:58:37PM -0500, [email protected] wrote:
>>> From: "Madhavan T. Venkataraman" <[email protected]>
>>>
>>> Currently, there are multiple functions in ARM64 code that walk the
>>> stack using start_backtrace() and unwind_frame() or start_backtrace()
>>> and walk_stackframe(). They should all be converted to use
>>> arch_stack_walk(). This makes maintenance easier.
>>>
>>> To do that, arch_stack_walk() must always be defined. arch_stack_walk()
>>> is within #ifdef CONFIG_STACKTRACE. So, select STACKTRACE in
>>> arch/arm64/Kconfig.
>>
>> I'd prefer if we could decouple ARCH_STACKWALK from STACKTRACE, so that
>> we don't have to expose /proc/*/stack unconditionally, which Peter
>> Zijlstra has a patch for:
>>
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> ... but regardless the rest of the series looks pretty good, so I'll go
>> review that, and we can figure out how to queue the bits and pieces in
>> the right order.
>
> FWIW, it looks like the direction of travel there is not go and unify
> the various arch unwinders, but I would like to not depend on
> STACKTRACE. Regardless, the initial arch_stack_walk() cleanup patches
> all look good, so I reckon we should try to get those out of the way and
> queue those for arm64 soon even if we need some more back-and-forth over
> the later part of the series.
>
> With that in mind, I've picked up Peter's patch decoupling
> ARCH_STACKWALK from STACKTRACE, and rebased the initial patches from
> this series atop. Since there's some subtltety in a few cases (and this
> was easy to miss while reviewing), I've expanded the commit messages
> with additional rationale as to why each transformation is safe.
> I've pushed that to:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/stacktrace/arch-stack-walk
>
> There's a dependency on:
>
> https://lore.kernel.org/r/[email protected]
>
> ... which was queued for v5.16-rc1, but got dropped due to a conflict,
> and I'm expecting it to be re-queued as a fix for v5.16-rc2 shortly
> after v5.16-rc1 is tagged. Hopefully that means we have a table base by
> v5.16-rc2.
>
> I'll send the preparatory series as I've prepared it shortly after
> v5.16-rc1 so that people can shout if I've messed something up.
>
> Hopefully it's easy enough to use that as a base for the more involved
> rework later in this series.
>
> Thanks,
> Mark.
>
>> Thanks,
>> Mark.
>>
>>>
>>> Signed-off-by: Madhavan T. Venkataraman <[email protected]>
>>> ---
>>> arch/arm64/Kconfig | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index fdcd54d39c1e..bfb0ce60d820 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -35,6 +35,7 @@ config ARM64
>>> select ARCH_HAS_SET_DIRECT_MAP
>>> select ARCH_HAS_SET_MEMORY
>>> select ARCH_STACKWALK
>>> + select STACKTRACE
>>> select ARCH_HAS_STRICT_KERNEL_RWX
>>> select ARCH_HAS_STRICT_MODULE_RWX
>>> select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>>> --
>>> 2.25.1
>>>