Architectures use assembly code to initialize ftrace_regs and call
ftrace_ops_list_func(). Therefore, from KMSAN's point of view,
ftrace_regs is poisoned on ftrace_ops_list_func() entry. This causes
KMSAN warnings when running the ftrace testsuite.
Fix by trusting the architecture-specific assembly code and always
unpoisoning ftrace_regs in ftrace_ops_list_func.
Acked-by: Steven Rostedt (Google) <[email protected]>
Reviewed-by: Alexander Potapenko <[email protected]>
Signed-off-by: Ilya Leoshkevich <[email protected]>
---
kernel/trace/ftrace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8de8bec5f366..dfb8b26966aa 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -7399,6 +7399,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
+ kmsan_unpoison_memory(fregs, sizeof(*fregs));
__ftrace_ops_list_func(ip, parent_ip, NULL, fregs);
}
#else
--
2.43.0
On Thu, 14 Dec 2023 00:24:21 +0100
Ilya Leoshkevich <[email protected]> wrote:
> Architectures use assembly code to initialize ftrace_regs and call
> ftrace_ops_list_func(). Therefore, from KMSAN's point of view,
> ftrace_regs is poisoned on ftrace_ops_list_func() entry. This causes
> KMSAN warnings when running the ftrace testsuite.
BTW, why is this only a problem for s390 and no other architectures?
If it is only a s390 thing, then we should do this instead:
in include/linux/ftrace.h:
/* Add a comment here to why this is needed */
#ifndef ftrace_list_func_unpoison
# define ftrace_list_func_unpoison(fregs) do { } while(0)
#endif
In arch/s390/include/asm/ftrace.h:
/* Add a comment to why s390 is special */
# define ftrace_list_func_unpoison(fregs) kmsan_unpoison_memory(fregs, sizeof(*fregs))
>
> Fix by trusting the architecture-specific assembly code and always
> unpoisoning ftrace_regs in ftrace_ops_list_func.
>
> Acked-by: Steven Rostedt (Google) <[email protected]>
I'm taking my ack away for this change in favor of what I'm suggesting now.
> Reviewed-by: Alexander Potapenko <[email protected]>
> Signed-off-by: Ilya Leoshkevich <[email protected]>
> ---
> kernel/trace/ftrace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 8de8bec5f366..dfb8b26966aa 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -7399,6 +7399,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> struct ftrace_ops *op, struct ftrace_regs *fregs)
> {
> + kmsan_unpoison_memory(fregs, sizeof(*fregs));
And here have:
ftrace_list_func_unpoison(fregs);
That way we only do it for archs that really need it, and do not affect
archs that do not.
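For illustration only, the override pattern suggested above can be modeled in
userspace C. This is a minimal sketch, not kernel code: struct ftrace_regs is a
stand-in with a made-up layout, kmsan_unpoison_memory() is a stub that merely
counts bytes, and ARCH_HAS_UNPOISON and list_func() are hypothetical names
standing in for the arch header and for arch_ftrace_ops_list_func().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical switch: 1 emulates an arch whose header overrides the hook
 * (s390 in the suggestion above), 0 emulates every other arch.
 */
#define ARCH_HAS_UNPOISON 1

/* Stand-in for the kernel structure; the layout is made up. */
struct ftrace_regs {
	unsigned long gprs[16];
};

/* Stub: records how many bytes were "unpoisoned" instead of touching shadow. */
static size_t unpoisoned_bytes;

static void kmsan_unpoison_memory(const void *addr, size_t size)
{
	assert(addr != NULL);
	unpoisoned_bytes += size;
}

/* What the arch header would provide. */
#if ARCH_HAS_UNPOISON
#define ftrace_list_func_unpoison(fregs) \
	kmsan_unpoison_memory(fregs, sizeof(*(fregs)))
#endif

/* What the generic header would provide: a no-op unless overridden. */
#ifndef ftrace_list_func_unpoison
#define ftrace_list_func_unpoison(fregs) do { } while (0)
#endif

/* Call site; returns the running byte count so the effect is observable. */
static size_t list_func(struct ftrace_regs *fregs)
{
	ftrace_list_func_unpoison(fregs);
	return unpoisoned_bytes;
}
```

With ARCH_HAS_UNPOISON set to 0 the call site expands to an empty statement and
the counter stays at zero, which is the "do not affect archs that do not need
it" property.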
I want to know why this only affects s390, because if we are just doing
this because "it works", it could be just covering up a symptom of
something else and not actually doing the "right thing".
-- Steve
> __ftrace_ops_list_func(ip, parent_ip, NULL, fregs);
> }
> #else
On Tue, 2024-01-02 at 10:17 -0500, Steven Rostedt wrote:
> On Thu, 14 Dec 2023 00:24:21 +0100
> Ilya Leoshkevich <[email protected]> wrote:
>
> > Architectures use assembly code to initialize ftrace_regs and call
> > ftrace_ops_list_func(). Therefore, from KMSAN's point of view,
> > ftrace_regs is poisoned on ftrace_ops_list_func() entry. This causes
> > KMSAN warnings when running the ftrace testsuite.
>
> BTW, why is this only a problem for s390 and no other architectures?
>
> If it is only a s390 thing, then we should do this instead:
>
> in include/linux/ftrace.h:
>
> /* Add a comment here to why this is needed */
> #ifndef ftrace_list_func_unpoison
> # define ftrace_list_func_unpoison(fregs) do { } while(0)
> #endif
>
> In arch/s390/include/asm/ftrace.h:
>
> /* Add a comment to why s390 is special */
> # define ftrace_list_func_unpoison(fregs) kmsan_unpoison_memory(fregs, sizeof(*fregs))
>
> >
> > Fix by trusting the architecture-specific assembly code and always
> > unpoisoning ftrace_regs in ftrace_ops_list_func.
> >
> > Acked-by: Steven Rostedt (Google) <[email protected]>
>
> I'm taking my ack away for this change in favor of what I'm
> suggesting now.
>
> > Reviewed-by: Alexander Potapenko <[email protected]>
> > Signed-off-by: Ilya Leoshkevich <[email protected]>
> > ---
> > kernel/trace/ftrace.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> > index 8de8bec5f366..dfb8b26966aa 100644
> > --- a/kernel/trace/ftrace.c
> > +++ b/kernel/trace/ftrace.c
> > @@ -7399,6 +7399,7 @@ __ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> > void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
> > struct ftrace_ops *op, struct ftrace_regs *fregs)
> > {
> > + kmsan_unpoison_memory(fregs, sizeof(*fregs));
>
> And here have:
>
> ftrace_list_func_unpoison(fregs);
>
> That way we only do it for archs that really need it, and do not affect
> archs that do not.
>
>
> I want to know why this only affects s390, because if we are just doing
> this because "it works", it could be just covering up a symptom of
> something else and not actually doing the "right thing".
>
>
> -- Steve
>
>
> > __ftrace_ops_list_func(ip, parent_ip, NULL, fregs);
> > }
> > #else
>
Ok, it has been a while, but I believe I have a good answer now. KMSAN
shadow for not-yet-allocated stack memory (past the top of the stack,
i.e. at addresses below $rsp) is essentially random. Here is an example
(you'll need a GDB hack from [1] if you want to try this at home):
(gdb) x/5i do_nanosleep
0xffffffff843607c0 <do_nanosleep>: call 0xffffffffc0201000
Thread 3 hit Breakpoint 1, 0xffffffffc0201000 in ?? ()
(gdb) x/64bx kmsan_get_metadata($rsp - 64, 0)
0xffffd1000087bd38: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd40: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd48: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd50: 0x00 0x00 0x00 0x00 0xff 0xff 0xff 0xff
0xffffd1000087bd58: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0xffffd1000087bd60: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0xffffd1000087bd68: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0xffffd1000087bd70: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
So if assembly (in this case ftrace_regs_caller) allocates struct
pt_regs on the stack, it may or may not be poisoned depending on what
was called before. So, by accident, on s390x it's poisoned and trips
KMSAN, and on x86_64 it's not. Based on this observation, I'd say we
need an unpoison call in all ftrace handlers (e.g.,
kprobe_ftrace_handler), and not just this one.
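The effect described above can be reproduced with a toy userspace model. This
is only a sketch under stated assumptions: it is not how KMSAN is implemented,
and every name in it (stack_shadow, instrumented_call(), asm_alloc_regs(), the
256-byte stack) is hypothetical. It models a zeroed stack, instrumented
functions that poison their locals on entry and unpoison whatever they store
to, and an uninstrumented assembly stub that writes a register block without
updating the shadow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 256

static unsigned char stack_mem[STACK_SIZE];    /* simulated stack contents */
static unsigned char stack_shadow[STACK_SIZE]; /* 0x00 = initialized, 0xff = poisoned */
static size_t sp = STACK_SIZE;                 /* the stack grows down */

/* A fresh stack is zeroed, so its shadow starts fully unpoisoned. */
static void stack_reset(void)
{
	memset(stack_mem, 0, sizeof(stack_mem));
	memset(stack_shadow, 0, sizeof(stack_shadow));
	sp = STACK_SIZE;
}

/*
 * Instrumented C function: poisons its frame on entry, initializes the first
 * init_bytes of it, and returns without touching the shadow again.
 */
static void instrumented_call(size_t frame_size, size_t init_bytes)
{
	assert(frame_size <= sp && init_bytes <= frame_size);
	sp -= frame_size;
	memset(stack_shadow + sp, 0xff, frame_size); /* poison locals on entry */
	memset(stack_mem + sp, 0xaa, init_bytes);    /* stores to locals ... */
	memset(stack_shadow + sp, 0x00, init_bytes); /* ... unpoison them */
	sp += frame_size;                            /* return: shadow is left as is */
}

/*
 * Uninstrumented assembly: fills a register block just below sp with real
 * data but never updates the shadow. Returns how many bytes of that block
 * the shadow still reports as poisoned.
 */
static size_t asm_alloc_regs(size_t regs_size)
{
	size_t i, poisoned = 0;

	assert(regs_size <= sp);
	memset(stack_mem + sp - regs_size, 0x11, regs_size);
	for (i = sp - regs_size; i < sp; i++)
		poisoned += stack_shadow[i] != 0;
	return poisoned;
}
```

On a fresh stack asm_alloc_regs(64) reports nothing poisoned; after
instrumented_call(64, 16) the same call reports leftover poison, even though
the register block itself was fully written. Which outcome a given
architecture sees depends only on the earlier call history.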
But why is this the case? Kernel stacks are created by
alloc_thread_stack_node() using __vmalloc_node_range(__GFP_ZERO), so
they are fully unpoisoned. As functions are then called and return, their
locals are poisoned and unpoisoned. Interestingly enough, on return,
they are not poisoned again, even though
commit 37ad4ee8364255c73026a3c343403b5977fa7e79
Author: Alexander Potapenko <[email protected]>
Date: Thu Sep 15 17:04:13 2022 +0200
x86: kmsan: don't instrument stack walking functions
says they are. So what if we introduce that [2]?
# echo "p:nanosleep do_nanosleep %di" >/sys/kernel/tracing/kprobe_events
# echo 1 >/sys/kernel/debug/tracing/events/kprobes/nanosleep/enable
# sleep 1
=====================================================
BUG: KMSAN: uninit-value in kprobe_ftrace_handler+0x5b9/0x790
kprobe_ftrace_handler+0x5b9/0x790
0xffffffffc02010de
do_nanosleep+0x5/0x670
hrtimer_nanosleep+0x169/0x3b0
common_nsleep+0xc7/0x100
__x64_sys_clock_nanosleep+0x4e2/0x650
do_syscall_64+0x6e/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Local variable nd created at:
do_filp_open+0x3b2/0x5e0
Quite similar to s390. Local variable nd is a random leftover from a
different call stack, which the modified instrumentation poisoned on
return from do_filp_open().
Alexander, what do you think about adding [2] upstream as an option
that can be enabled from the command line? Also, what do you think
about poisoning kernel stacks? Formally they are zeroed out, but I
think valid code has no business reading these zeroes.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=31878
[2] https://github.com/iii-i/llvm-project/commits/msan-poison-allocas-before-returning-2024-06-12/