The current fprobe and rethook implementations have some pitfalls and can cause kernel stack
recursion, especially in heavy tracing scenarios.
For example, when DEBUG_PREEMPT or TRACE_PREEMPT_TOGGLE is enabled, preempt_count_{add, sub} can be
traced via ftrace. If we happen to use fprobe + rethook, which are based on ftrace, to hook those
functions, recursion is introduced in functions such as rethook_trampoline_handler, and the kernel
crashes because of a stack overflow.
A snippet of such a crash looks like this:
[ 56.038709] BUG: #DF stack guard page was hit at 000000000b5b7199 (stack is 00000000f4b5a9b2..00000000af4160ce)
[ 56.038713] stack guard page: 0000 [#1] PREEMPT SMP NOPTI
[ 56.038715] CPU: 5 PID: 1836 Comm: retsnoop Kdump: loaded Not tainted 6.1.18 #2
[ 56.038717] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/05/2021
[ 56.038717] BUG: #DF stack guard page was hit at 0000000069dc65a2 (stack is 000000006b9345c5..00000000a221349b)
[ 56.038718] RIP: 0010:ftrace_ops_test+0x1a/0x70
[ 56.038721] Code: 89 df e8 79 e2 ff ff e9 6e ff ff ff 0f 1f 40 00 48 81 ec b0 00 00 00 49 89 f1 49 89 f8 31 c0 48 89 e6 b9 16 00 00 00 48 89 f7 <f3> 48 ab 48 85 d2 74 35 49 8b 80 d8 00 00 00 48 8b 40 08 48 89 44
[ 56.038722] RSP: 0018:fffffe5a8bba5fa0 EFLAGS: 00010046
[ 56.038724] RAX: 0000000000000000 RBX: fffffe5a8bba6090 RCX: 0000000000000016
[ 56.038725] RDX: fffffe5a8bba6090 RSI: fffffe5a8bba5fa0 RDI: fffffe5a8bba5fa0
[ 56.038726] RBP: ffffffffb7137910 R08: ffff8b967f827c70 R09: ffffffffb7137910
[ 56.038727] R10: 0000000000000000 R11: 0000000000000000 R12: fffffe5a8bba6090
[ 56.038727] R13: ffffffffb729e2bf R14: ffffffffffffffdf R15: ffff8b967f827c70
[ 56.038728] FS: 00007f6592d6ed00(0000) GS:ffff8b977da00000(0000) knlGS:0000000000000000
[ 56.038730] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 56.038730] CR2: fffffe5a8bba5f98 CR3: 000000010ed94002 CR4: 00000000003726e0
[ 56.038733] Call Trace:
[ 56.038735] <#DF>
[ 56.038740] ? exc_int3+0xa/0xc0
[ 56.038743] arch_ftrace_ops_list_func+0xc2/0x190
[ 56.038745] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038748] ftrace_regs_call+0x5/0x52
[ 56.038751] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038754] ? osnoise_arch_unregister+0x210/0x210
[ 56.038757] ? preempt_count_add+0x5/0xa0
[ 56.038760] preempt_count_add+0x5/0xa0
[ 56.038762] rethook_trampoline_handler+0x5f/0x140
[ 56.038764] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038766] arch_rethook_trampoline_callback+0x3b/0x50
[ 56.038768] arch_rethook_trampoline+0x2c/0x60
[ 56.038770] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038775] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038778] osnoise_arch_unregister+0x210/0x210
[ 56.038780] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038781] arch_rethook_trampoline_callback+0x3b/0x50
[ 56.038783] arch_rethook_trampoline+0x2c/0x60
[ 56.038785] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038790] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038792] osnoise_arch_unregister+0x210/0x210
[ 56.038794] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038795] arch_rethook_trampoline_callback+0x3b/0x50
[ 56.038797] arch_rethook_trampoline+0x2c/0x60
[ 56.038799] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038804] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038806] osnoise_arch_unregister+0x210/0x210
[ 56.038808] ? rethook_trampoline_handler+0x5f/0x140
[ 56.038810] arch_rethook_trampoline_callback+0x3b/0x50
[ 56.038811] arch_rethook_trampoline+0x2c/0x60
...
[ 56.039133] ? rethook_trampoline_handler+0x5f/0x140
[ 56.039137] ? rethook_trampoline_handler+0x5f/0x140
[ 56.039139] osnoise_arch_unregister+0x210/0x210
[ 56.039141] ? rethook_trampoline_handler+0x5f/0x140
[ 56.039143] arch_rethook_trampoline_callback+0x3b/0x50
[ 56.039144] arch_rethook_trampoline+0x2c/0x60
[ 56.039147] ? rethook_trampoline_handler+0x5f/0x140
[ 56.039151] ? rethook_trampoline_handler+0x5f/0x140
[ 56.039156] ? vsnprintf+0x2a3/0x550
[ 56.039161] ? sprintf+0x4e/0x60
[ 56.039163] ? kallsyms_lookup_buildid+0x5f/0x130
[ 56.039167] ? __sprint_symbol.constprop.0+0xec/0x110
[ 56.039171] ? symbol_string+0xc5/0x150
[ 56.039197] ? vsnprintf+0x33a/0x550
[ 56.039201] ? exc_int3+0xa/0xc0
[ 56.039204] ? exc_int3+0xa/0xc0
[ 56.039205] ? ftrace_regs_call+0x5/0x52
[ 56.039208] ? ftrace_regs_call+0x5/0x52
[ 56.039211] ? lock_acquire+0x25d/0x2e0
[ 56.039214] ? lock_release+0x208/0x460
[ 56.039218] ? is_bpf_text_address+0x67/0xf0
[ 56.039220] ? kernel_text_address+0x111/0x120
[ 56.039223] ? __kernel_text_address+0xe/0x40
[ 56.039225] ? show_trace_log_lvl+0x1d7/0x336
[ 56.039227] ? show_trace_log_lvl+0x1d7/0x336
[ 56.039236] ? __die_body.cold+0x1a/0x1f
[ 56.039239] ? die+0x2a/0x50
[ 56.039242] ? handle_stack_overflow+0x49/0x60
[ 56.039245] ? exc_double_fault+0x148/0x180
[ 56.039248] ? asm_exc_double_fault+0x1f/0x30
[ 56.039251] ? rethook_trampoline_handler+0x5f/0x140
[ 56.039252] ? cpu_cgroup_css_free+0x30/0x30
[ 56.039254] ? cpu_cgroup_css_free+0x30/0x30
[ 56.039258] ? ftrace_ops_test+0x1a/0x70
[ 56.039260] </#DF>
This bug was found with the retsnoop tool, which internally uses BPF on top of fprobe + rethook.
Discussion of this bug can be found here:
Link: https://lore.kernel.org/bpf/[email protected]/
This patch series fixes the problem by adding recursion detection to each possible entry
function, and by marking as notrace the fprobe- and rethook-specific functions that lie outside
the recursion-guarded region.
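As a rough illustration of the recursion-detection idea only (this is not the code from the
series), the userspace sketch below guards a handler with a re-entry flag. All names here
(probe_handler, traced_helper, in_handler, handled, skipped) are hypothetical, and a plain static
flag stands in for whatever per-context state the kernel would actually track.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-context recursion state. */
static bool in_handler;
static int handled;	/* probe hits that were actually processed */
static int skipped;	/* probe hits dropped by the recursion guard */

static void traced_helper(void);

/* Stand-in for a probe entry point such as an fprobe/rethook handler. */
static void probe_handler(void)
{
	if (in_handler) {
		/* Re-entered from a helper that is itself traced: bail out. */
		skipped++;
		return;
	}

	in_handler = true;
	handled++;
	traced_helper();	/* plays the role of a traced preempt_count_add() */
	in_handler = false;
}

/* A helper called by the handler that is traced, so it fires the probe again. */
static void traced_helper(void)
{
	probe_handler();
}
```

Without the in_handler check, probe_handler() and traced_helper() would call each other until
the stack overflows, which is the shape of the trace above. Marking helpers notrace addresses the
same loop from the other side, by keeping the helper from firing the probe in the first place.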
Ze Gao (4):
rethook: use preempt_{disable, enable}_notrace in
rethook_trampoline_handler
fprobe: make fprobe_kprobe_handler recursion free
fprobe: add recursion detection in fprobe_exit_handler
rethook, fprobe: mark rethook related functions notrace
arch/riscv/kernel/probes/rethook.c | 4 +-
arch/s390/kernel/rethook.c | 6 +--
arch/x86/kernel/rethook.c | 8 ++--
kernel/trace/fprobe.c | 76 +++++++++++++++++++++++-------
kernel/trace/rethook.c | 12 ++---
5 files changed, 75 insertions(+), 31 deletions(-)
--
2.40.1