2021-07-29 14:13:14

by Masami Hiramatsu

Subject: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

Hello,

This is the 10th version of the series to fix the stacktrace with kretprobe on x86.

The previous version is here;

https://lore.kernel.org/bpf/162601048053.1318837.1550594515476777588.stgit@devnote2/

This version is rebased on top of the new kprobes cleanup series (*1) and merges
Josh's objtool update series (*2)(*3) as [6/16] and [7/16].

(*1) https://lore.kernel.org/bpf/162748615977.59465.13262421617578791515.stgit@devnote2/
(*2) https://lore.kernel.org/bpf/20210710192433.x5cgjsq2ksvaqnss@treble/
(*3) https://lore.kernel.org/bpf/20210710192514.ghvksi3ozhez4lvb@treble/

Changes from v9:
- Add Josh's objtool update patches with a build error fix as [6/16] and [7/16].
- Add an API document for kretprobe_find_ret_addr() and check cur != NULL in [5/16].

With this series, the unwinder can unwind the stack correctly from ftrace, as below:

# cd /sys/kernel/debug/tracing
# echo > trace
# echo 1 > options/sym-offset
# echo r vfs_read >> kprobe_events
# echo r full_proxy_read >> kprobe_events
# echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
# echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
# echo 1 > events/kprobes/enable
# cat /sys/kernel/debug/kprobes/list
ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
ffffffff812560b0 r vfs_read+0x0 [FTRACE]
# echo 0 > events/kprobes/enable
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
<...>-134 [007] ...1 16.185901: <stack trace>
=> kretprobe_trace_func+0x209/0x300
=> kretprobe_dispatcher+0x4a/0x70
=> __kretprobe_trampoline_handler+0xd4/0x170
=> trampoline_handler+0x43/0x60
=> kretprobe_trampoline+0x2a/0x50
=> vfs_read+0x98/0x180
=> ksys_read+0x5f/0xe0
=> do_syscall_64+0x37/0x90
=> entry_SYSCALL_64_after_hwframe+0x44/0xae
<...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)

This shows the double return probes (vfs_read() and full_proxy_read()) on the stack
correctly unwound. (vfs_read() returns to 'ksys_read+0x5f' and full_proxy_read()
returns to 'vfs_read+0x98')

This also changes the kretprobe behavior a bit, now the instruction pointer in
the 'pt_regs' passed to the kretprobe user handler is correctly set to the real return
address. So user handlers can get it via the instruction_pointer() API, and can use
stack_trace_save_regs().
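
For example, a kretprobe user handler can now do something like this
(a minimal sketch, not part of this series; 'vfs_read' and the entry
count of 16 are arbitrary examples):

#include <linux/kernel.h>
#include <linux/kprobes.h>
#include <linux/stacktrace.h>

static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	unsigned long entries[16];
	unsigned int n;

	/* With this series, regs->ip holds the real return address. */
	pr_info("returning to %pS\n", (void *)instruction_pointer(regs));

	/* So the unwinder can start directly from 'regs' (as eBPF does). */
	n = stack_trace_save_regs(regs, entries, ARRAY_SIZE(entries), 0);
	stack_trace_print(entries, n, 0);

	return 0;
}

static struct kretprobe my_rp = {
	.handler	= ret_handler,
	.kp.symbol_name	= "vfs_read",
};
/* register_kretprobe(&my_rp) on init, unregister_kretprobe() on exit. */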

You can also get this series from
git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v9


Thank you,

---

Josh Poimboeuf (3):
objtool: Add frame-pointer-specific function ignore
objtool: Ignore unwind hints for ignored functions
x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline()

Masami Hiramatsu (13):
ia64: kprobes: Fix to pass correct trampoline address to the handler
kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
kprobes: treewide: Make it harder to refer kretprobe_trampoline directly
kprobes: Add kretprobe_find_ret_addr() for searching return address
ARC: Add instruction_pointer_set() API
ia64: Add instruction_pointer_set() API
arm: kprobes: Make space for instruction pointer on stack
kprobes: Enable stacktrace from pt_regs in kretprobe handler
x86/kprobes: Push a fake return address at kretprobe_trampoline
x86/unwind: Recover kretprobe trampoline entry
tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
x86/kprobes: Fixup return address in generic trampoline handler


arch/arc/include/asm/kprobes.h | 2
arch/arc/include/asm/ptrace.h | 5 +
arch/arc/kernel/kprobes.c | 13 +-
arch/arm/probes/kprobes/core.c | 11 +-
arch/arm64/include/asm/kprobes.h | 2
arch/arm64/kernel/probes/kprobes.c | 5 -
arch/arm64/kernel/probes/kprobes_trampoline.S | 4 -
arch/csky/include/asm/kprobes.h | 2
arch/csky/kernel/probes/kprobes.c | 4 -
arch/csky/kernel/probes/kprobes_trampoline.S | 4 -
arch/ia64/include/asm/ptrace.h | 5 +
arch/ia64/kernel/kprobes.c | 15 +--
arch/mips/kernel/kprobes.c | 15 +--
arch/parisc/kernel/kprobes.c | 6 +
arch/powerpc/include/asm/kprobes.h | 2
arch/powerpc/kernel/kprobes.c | 29 ++---
arch/powerpc/kernel/optprobes.c | 2
arch/powerpc/kernel/stacktrace.c | 2
arch/riscv/include/asm/kprobes.h | 2
arch/riscv/kernel/probes/kprobes.c | 4 -
arch/riscv/kernel/probes/kprobes_trampoline.S | 4 -
arch/s390/include/asm/kprobes.h | 2
arch/s390/kernel/kprobes.c | 12 +-
arch/s390/kernel/stacktrace.c | 2
arch/sh/include/asm/kprobes.h | 2
arch/sh/kernel/kprobes.c | 12 +-
arch/sparc/include/asm/kprobes.h | 2
arch/sparc/kernel/kprobes.c | 12 +-
arch/x86/include/asm/kprobes.h | 1
arch/x86/include/asm/unwind.h | 23 ++++
arch/x86/include/asm/unwind_hints.h | 5 +
arch/x86/kernel/kprobes/core.c | 71 ++++++++++---
arch/x86/kernel/unwind_frame.c | 3 -
arch/x86/kernel/unwind_guess.c | 3 -
arch/x86/kernel/unwind_orc.c | 21 +++-
include/linux/kprobes.h | 44 +++++++-
include/linux/objtool.h | 12 ++
kernel/kprobes.c | 133 +++++++++++++++++++------
kernel/trace/trace_output.c | 17 +--
lib/error-inject.c | 3 -
tools/include/linux/objtool.h | 12 ++
tools/objtool/check.c | 2
42 files changed, 360 insertions(+), 172 deletions(-)

--
Masami Hiramatsu (Linaro) <[email protected]>


2021-07-29 14:13:14

by Masami Hiramatsu

Subject: [PATCH -tip v10 09/16] ARC: Add instruction_pointer_set() API

Add the instruction_pointer_set() API for ARC.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/arc/include/asm/ptrace.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
index 4c3c9be5bd16..cca8d6583e31 100644
--- a/arch/arc/include/asm/ptrace.h
+++ b/arch/arc/include/asm/ptrace.h
@@ -149,6 +149,11 @@ static inline long regs_return_value(struct pt_regs *regs)
return (long)regs->r0;
}

+static inline void instruction_pointer_set(struct pt_regs *regs,
+ unsigned long val)
+{
+ instruction_pointer(regs) = val;
+}
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_PTRACE_H */


2021-07-29 14:13:20

by Masami Hiramatsu

Subject: [PATCH -tip v10 12/16] kprobes: Enable stacktrace from pt_regs in kretprobe handler

Since the ORC unwinder requires 'regs->ip' to be set up correctly
when unwinding from pt_regs, set the real return address in
regs->ip before calling the user kretprobe handler.

This allows the kretprobe handler to trace the stack from the
kretprobe's pt_regs with stack_trace_save_regs() (eBPF will do
this), instead of stack tracing from the handler context with
stack_trace_save() (ftrace will do this).
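
To illustrate the difference, here is a hedged sketch of the two
calls inside a kretprobe handler (buffer sizes are arbitrary; assumes
<linux/kprobes.h> and <linux/stacktrace.h>):

static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	unsigned long from_ctx[8], from_regs[8];
	unsigned int n, m;

	/* From the handler context (what ftrace does): */
	n = stack_trace_save(from_ctx, ARRAY_SIZE(from_ctx), 0);

	/* From the kretprobe's pt_regs (what eBPF does); this needs
	 * regs->ip to hold the real return address, which this patch
	 * now ensures: */
	m = stack_trace_save_regs(regs, from_regs, ARRAY_SIZE(from_regs), 0);

	stack_trace_print(from_ctx, n, 0);
	stack_trace_print(from_regs, m, 0);
	return 0;
}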

Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryiko <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
---
Changes in v9:
- Update comment to explain specifically why this is necessary.
Changes in v8:
- Update comment to clarify why this is needed.
Changes in v3:
- Cast the correct_ret_addr to unsigned long.
---
kernel/kprobes.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 833f07f33115..ebc587b9a346 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1937,6 +1937,13 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
BUG_ON(1);
}

+ /*
+ * Set the return address as the instruction pointer, because if the
+ * user handler calls stack_trace_save_regs() with this 'regs',
+ * the stack trace will start from the instruction pointer.
+ */
+ instruction_pointer_set(regs, (unsigned long)correct_ret_addr);
+
/* Run the user handler of the nodes. */
first = current->kretprobe_instances.first;
while (first) {


2021-07-29 14:13:25

by Masami Hiramatsu

Subject: [PATCH -tip v10 16/16] x86/kprobes: Fixup return address in generic trampoline handler

On x86, the fake return address on the stack saved by
__kretprobe_trampoline() is replaced with the real return
address after trampoline_handler() returns. Before the return
address is fixed up, the real return address can be found in
'current->kretprobe_instances'.

However, since there is a window between updating
'current->kretprobe_instances' and fixing the address on the stack,
if an interrupt happens in that window and the interrupt handler
does a stacktrace, it may fail to unwind because it cannot get
the correct return address from 'current->kretprobe_instances'.

This patch eliminates that window by fixing the return address
right before updating 'current->kretprobe_instances'.

Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryiko <[email protected]>
---
Changes in v9:
- Fix the changelog: this eliminates the window.
- Add more comments about how it works.
Changes in v7:
- Add a prototype for arch_kretprobe_fixup_return()
---
arch/x86/kernel/kprobes/core.c | 18 ++++++++++++++++--
include/linux/kprobes.h | 3 +++
kernel/kprobes.c | 11 +++++++++++
3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 7e1111c19605..fce99e249d61 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1065,6 +1065,16 @@ NOKPROBE_SYMBOL(__kretprobe_trampoline);
*/
STACK_FRAME_NON_STANDARD_FP(__kretprobe_trampoline);

+/* This is called from kretprobe_trampoline_handler(). */
+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+ kprobe_opcode_t *correct_ret_addr)
+{
+ unsigned long *frame_pointer = &regs->sp + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = (unsigned long)correct_ret_addr;
+}
+
/*
* Called from __kretprobe_trampoline
*/
@@ -1082,8 +1092,12 @@ __used __visible void trampoline_handler(struct pt_regs *regs)
regs->sp += sizeof(long);
frame_pointer = &regs->sp + 1;

- /* Replace fake return address with real one. */
- *frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
+ /*
+ * The return address at 'frame_pointer' is recovered by
+ * arch_kretprobe_fixup_return(), which is called from
+ * kretprobe_trampoline_handler().
+ */
+ kretprobe_trampoline_handler(regs, frame_pointer);

/*
* Copy FLAGS to 'pt_regs::sp' so that __kretprobe_trampoline()
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 6d47a9da1e0a..e974caf39d3e 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -188,6 +188,9 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);

+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+ kprobe_opcode_t *correct_ret_addr);
+
void __kretprobe_trampoline(void);
/*
* Since some architecture uses structured function pointer,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ebc587b9a346..b62af9fc3607 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1922,6 +1922,15 @@ unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
}
NOKPROBE_SYMBOL(kretprobe_find_ret_addr);

+void __weak arch_kretprobe_fixup_return(struct pt_regs *regs,
+ kprobe_opcode_t *correct_ret_addr)
+{
+ /*
+ * Do nothing by default. Each arch should override this to update the
+ * fake return address on the stack with the correct one, if possible.
+ */
+}
+
unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *frame_pointer)
{
@@ -1967,6 +1976,8 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
first = first->next;
}

+ arch_kretprobe_fixup_return(regs, correct_ret_addr);
+
/* Unlink all nodes for this frame. */
first = current->kretprobe_instances.first;
current->kretprobe_instances.first = node->next;


2021-07-29 14:13:33

by Masami Hiramatsu

Subject: [PATCH -tip v10 02/16] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()

~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:

3d7e33825d87: ("jprobes: make jprobes a little safer for users")

But this is just open-coded dereference_symbol_descriptor() in essence, and
its obscure nature was causing bugs.

Just use the real thing and remove arch_deref_entry_point().
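
For reference, dereference_symbol_descriptor() does roughly the
following (a simplified sketch of the helper in <linux/kallsyms.h>;
on architectures without function descriptors it simply returns the
pointer unchanged):

static inline void *dereference_symbol_descriptor(void *ptr)
{
	struct module *mod;

	/* Unwrap core-kernel function descriptors (ia64, ppc64 ELFv1). */
	ptr = dereference_kernel_function_descriptor(ptr);
	if (is_ksym_addr((unsigned long)ptr))
		return ptr;

	/* Otherwise, look up the owning module and unwrap its descriptor. */
	mod = __module_address((unsigned long)ptr);
	if (mod)
		ptr = dereference_module_function_descriptor(mod, ptr);

	return ptr;
}

Unlike the open-coded arch implementations, this also handles
addresses in modules correctly.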

Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryiko <[email protected]>
---
Changes in v9:
- Update changelog according to Ingo's suggestion.
Changes in v6:
- Use dereference_symbol_descriptor() so that it can handle addresses
in modules correctly.
---
arch/ia64/kernel/kprobes.c | 5 -----
arch/powerpc/kernel/kprobes.c | 11 -----------
include/linux/kprobes.h | 1 -
kernel/kprobes.c | 7 +------
lib/error-inject.c | 3 ++-
5 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index d4048518a1d7..0f8573bbf520 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -891,11 +891,6 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
return ret;
}

-unsigned long arch_deref_entry_point(void *entry)
-{
- return ((struct fnptr *)entry)->ip;
-}
-
static struct kprobe trampoline_p = {
.pre_handler = trampoline_probe_handler
};
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index cbc28d1a2e1b..077ab73e6e56 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -541,17 +541,6 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
}
NOKPROBE_SYMBOL(kprobe_fault_handler);

-unsigned long arch_deref_entry_point(void *entry)
-{
-#ifdef PPC64_ELF_ABI_v1
- if (!kernel_text_address((unsigned long)entry))
- return ppc_global_function_entry(entry);
- else
-#endif
- return (unsigned long)entry;
-}
-NOKPROBE_SYMBOL(arch_deref_entry_point);
-
static struct kprobe trampoline_p = {
.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
.pre_handler = trampoline_probe_handler
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 0ba3f9e316d4..2ed61fcbc89c 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -381,7 +381,6 @@ int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
int register_kprobes(struct kprobe **kps, int num);
void unregister_kprobes(struct kprobe **kps, int num);
-unsigned long arch_deref_entry_point(void *);

int register_kretprobe(struct kretprobe *rp);
void unregister_kretprobe(struct kretprobe *rp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 8021bccb7770..550042d9a6ef 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1861,11 +1861,6 @@ static struct notifier_block kprobe_exceptions_nb = {
.priority = 0x7fffffff /* we need to be notified first */
};

-unsigned long __weak arch_deref_entry_point(void *entry)
-{
- return (unsigned long)entry;
-}
-
#ifdef CONFIG_KRETPROBES

unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
@@ -2327,7 +2322,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
int ret;

for (iter = start; iter < end; iter++) {
- entry = arch_deref_entry_point((void *)*iter);
+ entry = (unsigned long)dereference_symbol_descriptor((void *)*iter);
ret = kprobe_add_ksym_blacklist(entry);
if (ret == -EINVAL)
continue;
diff --git a/lib/error-inject.c b/lib/error-inject.c
index c73651b15b76..2ff5ef689d72 100644
--- a/lib/error-inject.c
+++ b/lib/error-inject.c
@@ -8,6 +8,7 @@
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/slab.h>
+#include <asm/sections.h>

/* Whitelist of symbols that can be overridden for error injection. */
static LIST_HEAD(error_injection_list);
@@ -64,7 +65,7 @@ static void populate_error_injection_list(struct error_injection_entry *start,

mutex_lock(&ei_mutex);
for (iter = start; iter < end; iter++) {
- entry = arch_deref_entry_point((void *)iter->addr);
+ entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr);

if (!kernel_text_address(entry) ||
!kallsyms_lookup_size_offset(entry, &size, &offset)) {


2021-07-29 14:13:34

by Masami Hiramatsu

Subject: [PATCH -tip v10 13/16] x86/kprobes: Push a fake return address at kretprobe_trampoline

Change __kretprobe_trampoline() to push its own address as a fake
return address at the bottom of the stack frame. This fake return
address will be replaced with the correct return address in
trampoline_handler().

With this change, the ORC unwinder can check whether the return
address has been modified by kretprobes or not.
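
For reference, the stack this produces on x86-64 looks roughly like
this by the time trampoline_handler() runs (a sketch derived from the
asm in this patch; the fake address sits in the slot just above
'pt_regs::sp', which is why the handler uses '&regs->sp + 1' as the
frame pointer):

	&regs->sp + 1 : $__kretprobe_trampoline (fake return address,
	                 replaced with the real one later)
	regs->sp      : copied stack pointer (fixed up in the handler)
	regs->flags   : saved FLAGS
	...           : rest of pt_regs saved by SAVE_REGS_STRING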

Signed-off-by: Masami Hiramatsu <[email protected]>
Suggested-by: Josh Poimboeuf <[email protected]>
Tested-by: Andrii Nakryiko <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
---
Changes in v9:
- Update changelog and comment.
- Remove unneeded type casting.
---
arch/x86/kernel/kprobes/core.c | 34 +++++++++++++++++++++++++---------
1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index d1436d7463fd..7e1111c19605 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1022,28 +1022,33 @@ asm(
".global __kretprobe_trampoline\n"
".type __kretprobe_trampoline, @function\n"
"__kretprobe_trampoline:\n"
- /* We don't bother saving the ss register */
#ifdef CONFIG_X86_64
- " pushq %rsp\n"
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushq $__kretprobe_trampoline\n"
UNWIND_HINT_FUNC
+ /* Save the 'sp - 8', this will be fixed later. */
+ " pushq %rsp\n"
" pushfq\n"
SAVE_REGS_STRING
" movq %rsp, %rdi\n"
" call trampoline_handler\n"
- /* Replace saved sp with true return address. */
- " movq %rax, 19*8(%rsp)\n"
RESTORE_REGS_STRING
+ /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */
+ " addq $8, %rsp\n"
" popfq\n"
#else
- " pushl %esp\n"
+ /* Push a fake return address to tell the unwinder it's a kretprobe. */
+ " pushl $__kretprobe_trampoline\n"
UNWIND_HINT_FUNC
+ /* Save the 'sp - 4', this will be fixed later. */
+ " pushl %esp\n"
" pushfl\n"
SAVE_REGS_STRING
" movl %esp, %eax\n"
" call trampoline_handler\n"
- /* Replace saved sp with true return address. */
- " movl %eax, 15*4(%esp)\n"
RESTORE_REGS_STRING
+ /* In trampoline_handler(), 'regs->flags' is copied to 'regs->sp'. */
+ " addl $4, %esp\n"
" popfl\n"
#endif
" ret\n"
@@ -1063,8 +1068,10 @@ STACK_FRAME_NON_STANDARD_FP(__kretprobe_trampoline);
/*
* Called from __kretprobe_trampoline
*/
-__used __visible void *trampoline_handler(struct pt_regs *regs)
+__used __visible void trampoline_handler(struct pt_regs *regs)
{
+ unsigned long *frame_pointer;
+
/* fixup registers */
regs->cs = __KERNEL_CS;
#ifdef CONFIG_X86_32
@@ -1072,8 +1079,17 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
#endif
regs->ip = (unsigned long)&__kretprobe_trampoline;
regs->orig_ax = ~0UL;
+ regs->sp += sizeof(long);
+ frame_pointer = &regs->sp + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);

- return (void *)kretprobe_trampoline_handler(regs, &regs->sp);
+ /*
+ * Copy FLAGS to 'pt_regs::sp' so that __kretprobe_trampoline()
+ * can do RET right after POPF.
+ */
+ regs->sp = regs->flags;
}
NOKPROBE_SYMBOL(trampoline_handler);



2021-07-29 23:37:22

by Masami Hiramatsu

Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Thu, 29 Jul 2021 23:05:56 +0900
Masami Hiramatsu <[email protected]> wrote:

> Hello,
>
> This is the 10th version of the series to fix the stacktrace with kretprobe on x86.
>
> The previous version is here;
>
> https://lore.kernel.org/bpf/162601048053.1318837.1550594515476777588.stgit@devnote2/
>
> This version is rebased on top of new kprobes cleanup series(*1) and merging
> Josh's objtool update series (*2)(*3) as [6/16] and [7/16].
>
> (*1) https://lore.kernel.org/bpf/162748615977.59465.13262421617578791515.stgit@devnote2/
> (*2) https://lore.kernel.org/bpf/20210710192433.x5cgjsq2ksvaqnss@treble/
> (*3) https://lore.kernel.org/bpf/20210710192514.ghvksi3ozhez4lvb@treble/
>
> Changes from v9:
> - Add Josh's objtool update patches with a build error fix as [6/16] and [7/16].
> - Add an API document for kretprobe_find_ret_addr() and check cur != NULL in [5/16].
>
> With this series, unwinder can unwind stack correctly from ftrace as below;
>
> # cd /sys/kernel/debug/tracing
> # echo > trace
> # echo 1 > options/sym-offset
> # echo r vfs_read >> kprobe_events
> # echo r full_proxy_read >> kprobe_events
> # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> # echo 1 > events/kprobes/enable
> # cat /sys/kernel/debug/kprobes/list
> ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
> ffffffff812560b0 r vfs_read+0x0 [FTRACE]
> # echo 0 > events/kprobes/enable
> # cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 3/3 #P:8
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> <...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
> <...>-134 [007] ...1 16.185901: <stack trace>
> => kretprobe_trace_func+0x209/0x300
> => kretprobe_dispatcher+0x4a/0x70
> => __kretprobe_trampoline_handler+0xd4/0x170
> => trampoline_handler+0x43/0x60
> => kretprobe_trampoline+0x2a/0x50
> => vfs_read+0x98/0x180
> => ksys_read+0x5f/0xe0
> => do_syscall_64+0x37/0x90
> => entry_SYSCALL_64_after_hwframe+0x44/0xae
> <...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
>
> This shows the double return probes (vfs_read() and full_proxy_read()) on the stack
> correctly unwound. (vfs_read() returns to 'ksys_read+0x5f' and full_proxy_read()
> returns to 'vfs_read+0x98')
>
> This also changes the kretprobe behavior a bit, now the instruction pointer in
> the 'pt_regs' passed to kretprobe user handler is correctly set to the real return
> address. So user handlers can get it via instruction_pointer() API, and can use
> stack_trace_save_regs().
>
> You can also get this series from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v9

Oops, this is of course 'kprobes/kretprobe-stackfix-v10'. And this branch includes the above (*1) series.

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-08-24 05:16:03

by Andrii Nakryiko

Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Thu, Jul 29, 2021 at 4:35 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Thu, 29 Jul 2021 23:05:56 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > Hello,
> >
> > This is the 10th version of the series to fix the stacktrace with kretprobe on x86.
> >
> > The previous version is here;
> >
> > https://lore.kernel.org/bpf/162601048053.1318837.1550594515476777588.stgit@devnote2/
> >
> > This version is rebased on top of new kprobes cleanup series(*1) and merging
> > Josh's objtool update series (*2)(*3) as [6/16] and [7/16].
> >
> > (*1) https://lore.kernel.org/bpf/162748615977.59465.13262421617578791515.stgit@devnote2/
> > (*2) https://lore.kernel.org/bpf/20210710192433.x5cgjsq2ksvaqnss@treble/
> > (*3) https://lore.kernel.org/bpf/20210710192514.ghvksi3ozhez4lvb@treble/
> >
> > Changes from v9:
> > - Add Josh's objtool update patches with a build error fix as [6/16] and [7/16].
> > - Add an API document for kretprobe_find_ret_addr() and check cur != NULL in [5/16].
> >
> > With this series, unwinder can unwind stack correctly from ftrace as below;
> >
> > # cd /sys/kernel/debug/tracing
> > # echo > trace
> > # echo 1 > options/sym-offset
> > # echo r vfs_read >> kprobe_events
> > # echo r full_proxy_read >> kprobe_events
> > # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> > # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> > # echo 1 > events/kprobes/enable
> > # cat /sys/kernel/debug/kprobes/list
> > ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
> > ffffffff812560b0 r vfs_read+0x0 [FTRACE]
> > # echo 0 > events/kprobes/enable
> > # cat trace
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 3/3 #P:8
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > <...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
> > <...>-134 [007] ...1 16.185901: <stack trace>
> > => kretprobe_trace_func+0x209/0x300
> > => kretprobe_dispatcher+0x4a/0x70
> > => __kretprobe_trampoline_handler+0xd4/0x170
> > => trampoline_handler+0x43/0x60
> > => kretprobe_trampoline+0x2a/0x50
> > => vfs_read+0x98/0x180
> > => ksys_read+0x5f/0xe0
> > => do_syscall_64+0x37/0x90
> > => entry_SYSCALL_64_after_hwframe+0x44/0xae
> > <...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
> >
> > This shows the double return probes (vfs_read() and full_proxy_read()) on the stack
> > correctly unwound. (vfs_read() returns to 'ksys_read+0x5f' and full_proxy_read()
> > returns to 'vfs_read+0x98')
> >
> > This also changes the kretprobe behavior a bit, now the instruction pointer in
> > the 'pt_regs' passed to kretprobe user handler is correctly set to the real return
> > address. So user handlers can get it via instruction_pointer() API, and can use
> > stack_trace_save_regs().
> >
> > You can also get this series from
> > git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v9
>
> Oops, this is of course 'kprobes/kretprobe-stackfix-v10'. And this branch includes above (*1) series.

Hi Masami,

Was this ever merged/applied? This is very important functionality
for BPF kretprobes, so I hope this won't slip through the cracks.
Thanks!

>
> Thank you,
>
> --
> Masami Hiramatsu <[email protected]>

2021-08-24 05:34:31

by Masami Hiramatsu

Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Mon, 23 Aug 2021 22:12:06 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Thu, Jul 29, 2021 at 4:35 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Thu, 29 Jul 2021 23:05:56 +0900
> > Masami Hiramatsu <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > This is the 10th version of the series to fix the stacktrace with kretprobe on x86.
> > >
> > > The previous version is here;
> > >
> > > https://lore.kernel.org/bpf/162601048053.1318837.1550594515476777588.stgit@devnote2/
> > >
> > > This version is rebased on top of new kprobes cleanup series(*1) and merging
> > > Josh's objtool update series (*2)(*3) as [6/16] and [7/16].
> > >
> > > (*1) https://lore.kernel.org/bpf/162748615977.59465.13262421617578791515.stgit@devnote2/
> > > (*2) https://lore.kernel.org/bpf/20210710192433.x5cgjsq2ksvaqnss@treble/
> > > (*3) https://lore.kernel.org/bpf/20210710192514.ghvksi3ozhez4lvb@treble/
> > >
> > > Changes from v9:
> > > - Add Josh's objtool update patches with a build error fix as [6/16] and [7/16].
> > > - Add an API document for kretprobe_find_ret_addr() and check cur != NULL in [5/16].
> > >
> > > With this series, unwinder can unwind stack correctly from ftrace as below;
> > >
> > > # cd /sys/kernel/debug/tracing
> > > # echo > trace
> > > # echo 1 > options/sym-offset
> > > # echo r vfs_read >> kprobe_events
> > > # echo r full_proxy_read >> kprobe_events
> > > # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> > > # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> > > # echo 1 > events/kprobes/enable
> > > # cat /sys/kernel/debug/kprobes/list
> > > ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
> > > ffffffff812560b0 r vfs_read+0x0 [FTRACE]
> > > # echo 0 > events/kprobes/enable
> > > # cat trace
> > > # tracer: nop
> > > #
> > > # entries-in-buffer/entries-written: 3/3 #P:8
> > > #
> > > # _-----=> irqs-off
> > > # / _----=> need-resched
> > > # | / _---=> hardirq/softirq
> > > # || / _--=> preempt-depth
> > > # ||| / delay
> > > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > > # | | | |||| | |
> > > <...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
> > > <...>-134 [007] ...1 16.185901: <stack trace>
> > > => kretprobe_trace_func+0x209/0x300
> > > => kretprobe_dispatcher+0x4a/0x70
> > > => __kretprobe_trampoline_handler+0xd4/0x170
> > > => trampoline_handler+0x43/0x60
> > > => kretprobe_trampoline+0x2a/0x50
> > > => vfs_read+0x98/0x180
> > > => ksys_read+0x5f/0xe0
> > > => do_syscall_64+0x37/0x90
> > > => entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > <...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
> > >
> > > This shows the double return probes (vfs_read() and full_proxy_read()) on the stack
> > > correctly unwound. (vfs_read() returns to 'ksys_read+0x5f' and full_proxy_read()
> > > returns to 'vfs_read+0x98')
> > >
> > > This also changes the kretprobe behavior a bit, now the instruction pointer in
> > > the 'pt_regs' passed to kretprobe user handler is correctly set to the real return
> > > address. So user handlers can get it via instruction_pointer() API, and can use
> > > stack_trace_save_regs().
> > >
> > > You can also get this series from
> > > git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v9
> >
> > Oops, this is of course 'kprobes/kretprobe-stackfix-v10'. And this branch includes above (*1) series.
>
> Hi Masami,
>
> Was this ever merged/applied? This is a very important functionality
> for BPF kretprobes, so I hope this won't slip through the cracks.

No, not yet as far as I know.
I'm waiting for any comments on this series. Since this is basically
an x86 ORC unwinder improvement, this series should be merged via the -tip tree.

Ingo and Josh,

Could you give me any comment, please?

Thank you,


> Thanks!
>
> >
> > Thank you,
> >
> > --
> > Masami Hiramatsu <[email protected]>


--
Masami Hiramatsu <[email protected]>

2021-08-29 14:24:53

by Masami Hiramatsu

Subject: [RFC PATCH 0/1] Non stack-intrusive return probe event

Hello,

For a long time, we have been tackling various issues around kretprobe.
One of the latest actions was the stacktrace fix on x86 in this
thread.

https://lore.kernel.org/bpf/162756755600.301564.4957591913842010341.stgit@devnote2/

However, there seems to be no progress/further discussion. So I would
like to take another approach for this (and the other issues.)

Here is my idea -- replace kretprobe with kprobe.
In other words, put a kprobe on the "return instruction" directly
instead of modifying the kernel stack. This can solve most
of the kretprobe disadvantages. E.g.

- Since it doesn't change the kernel stack, any special stack
unwinder fixup is not needed anymore.
- No "max-instance" limitations anymore, because it will use
kprobes directly.
- Scalability/performance will be improved to the same level as kprobes.
No list operations at probe runtime.

Here is PoC code which introduces a "retinsn_probe" event as a part
of the ftrace kprobe event. I don't think we need to replace
kretprobe. This should be a higher-layer feature, because some
kernel functions can have multiple "return instructions". Thus,
the "retinsn_probe" must manage multiple kprobes. That means the
"retinsn_probe" will be a user of kprobes. I decided to implement it
inside the ftrace "kprobe-event". This gives us another advantage
for eBPF support: because eBPF uses "kprobe-event" instead of
"kprobe" directly, if the "retinsn_probe" is implemented in the
"kprobe-event", eBPF can use it without any change.
Anyway, this can co-exist with kretprobe, so as long as any
user uses kretprobe, we can keep it.
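
As a minimal sketch of the core mechanism (hypothetical code, not the
PoC itself; one kprobe would be registered per return instruction
found by find_return_instructions()):

----
/* Runs after the 'ret' has been single-stepped, so regs->ip already
 * points at the return target -- the kernel stack is never modified. */
static void ret_post_handler(struct kprobe *kp, struct pt_regs *regs,
			     unsigned long flags)
{
	pr_info("returned to %pS\n", (void *)instruction_pointer(regs));
}

static struct kprobe ret_kp = {
	.post_handler	= ret_post_handler,
};

static int probe_one_return(kprobe_opcode_t *ret_insn_addr)
{
	/* 'ret_insn_addr' comes from find_return_instructions(). */
	ret_kp.addr = ret_insn_addr;
	return register_kprobe(&ret_kp);
}
----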


Example
=======
For example, I ran a shell script, which was used in the
stacktrace fix series.

----
mount -t debugfs debugfs /sys/kernel/debug/
cd /sys/kernel/debug/tracing
echo > trace
echo 1 > options/sym-offset
echo r vfs_read >> kprobe_events
echo r full_proxy_read >> kprobe_events
echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
echo 1 > events/kprobes/enable
cat /sys/kernel/debug/kprobes/list
echo 0 > events/kprobes/enable
cat trace
----

This is the result.
----
ffffffff813b420e k full_proxy_read+0x6e
ffffffff812b7c0a k vfs_read+0xda
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
cat-136 [007] d.Z. 8.038381: r_full_proxy_read_0: (vfs_read+0x9b/0x180 <- full_proxy_read)
cat-136 [007] d.Z. 8.038386: <stack trace>
=> kretprobe_trace_func+0x209/0x300
=> retinsn_dispatcher+0x7a/0xa0
=> kprobe_post_process+0x28/0x80
=> kprobe_int3_handler+0x166/0x1a0
=> exc_int3+0x47/0x140
=> asm_exc_int3+0x31/0x40
=> vfs_read+0x9b/0x180
=> ksys_read+0x68/0xe0
=> do_syscall_64+0x3b/0x90
=> entry_SYSCALL_64_after_hwframe+0x44/0xae
cat-136 [007] d.Z. 8.038387: r_vfs_read_0: (ksys_read+0x68/0xe0 <- vfs_read)
----

You can see the return probe events are translated to kprobes
instead of kretprobes. Also, in the stacktrace, we can see that
an int3 invokes the kprobe and that the stacktrace is decoded correctly.


TODO
====
Of course, this is just PoC code; there are many TODOs.

- This PoC code only supports x86 at this moment. But I think this
can be done on the other architectures. What it needs is
to implement "find_return_instructions()".
- Code cleanup is not enough yet. I have to remove "kretprobe" from
the "trace_kprobe" data structure, rewrite related functions, etc.
- It has to handle "tail-call" optimized code, which replaces
a "call + return" with a "jump". find_return_instructions() should
detect it and decode the jump destination too.


Thank you,


---

Masami Hiramatsu (1):
[PoC] tracing: kprobe: Add non-stack intrusion return probe event


arch/x86/kernel/kprobes/core.c | 59 +++++++++++++++++++++
kernel/trace/trace_kprobe.c | 110 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 164 insertions(+), 5 deletions(-)

--
Masami Hiramatsu (Linaro) <[email protected]>

2021-08-29 14:26:15

by Masami Hiramatsu

Subject: [RFC PATCH 1/1] [PoC] tracing: kprobe: Add non-stack intrusion return probe event

Add a kernel return instruction probe event (kriprobe event)
to the kprobe event. This hooks the returns of the target function
but does not intrude on the real stack entries.
This depends on each architecture implementing one function --
find_return_instructions(). If it is implemented, the
kprobe event uses the kriprobe event instead of kretprobe.

Note, this is just PoC code for x86. It doesn't work on
other architectures, which only support kretprobe.
Also, it doesn't support functions with tail calls
(a jump into another function instead of call & return);
kriprobe doesn't work with those yet.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 59 +++++++++++++++++++++
kernel/trace/trace_kprobe.c | 110 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 164 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index b6e046e4b289..4c4094505712 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1117,3 +1117,62 @@ int arch_trampoline_kprobe(struct kprobe *p)
{
return 0;
}
+
+static bool insn_is_return(struct insn *insn)
+{
+ switch (insn->opcode.bytes[0]) {
+ case 0xc2: /* RET near with imm16 */
+ case 0xc3: /* RET near */
+ case 0xca: /* RET far with imm16 */
+ case 0xcb: /* RET far */
+ return true;
+ default:
+ return false;
+ }
+}
+
+/**
+ * find_return_instructions -- Search for return instructions in a function
+ * @func: The target function entry address
+ * @rets: Storage for the return instruction addresses
+ * @nr_rets: The length of @rets
+ *
+ * This searches for the addresses of return instructions in @func (@func
+ * must be the entry address of the target function). Up to @nr_rets
+ * addresses are stored in @rets, and the number of stored addresses is
+ * returned.
+ */
+int find_return_instructions(kprobe_opcode_t *func, kprobe_opcode_t *rets[], int nr_rets)
+{
+ unsigned long addr, end, size = 0, offset = 0;
+ kprobe_opcode_t buf[MAX_INSN_SIZE];
+ unsigned long recovered_insn;
+ struct insn insn;
+ int ret, nr = 0;
+
+ addr = (unsigned long)func;
+ if (!kallsyms_lookup_size_offset(addr, &size, &offset))
+ return -EINVAL;
+
+ if (offset != 0)
+ return -EINVAL;
+ end = addr + size;
+
+ /* Decode the function to find return instructions */
+ while (addr < end) {
+ recovered_insn = recover_probed_instruction(buf, addr);
+ if (!recovered_insn)
+ return -EILSEQ;
+ ret = insn_decode_kernel(&insn, (void *)recovered_insn);
+ if (ret < 0)
+ return -EILSEQ;
+ if (insn_is_return(&insn)) {
+ if (nr < nr_rets)
+ rets[nr++] = (kprobe_opcode_t *)addr;
+ }
+ /* TODO: find jmp for tail call (outside of this func) */
+ addr += insn.length;
+ }
+
+ return nr;
+}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3a64ba4bbad6..99e508ff45ad 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -50,6 +50,13 @@ static struct dyn_event_operations trace_kprobe_ops = {
.match = trace_kprobe_match,
};

+#define MAX_RET_INSNS 16
+
+struct kprobe_holder {
+ struct kprobe kp;
+ struct trace_kprobe *tk;
+};
+
/*
* Kprobe event core functions
*/
@@ -59,6 +66,8 @@ struct trace_kprobe {
unsigned long __percpu *nhit;
const char *symbol; /* symbol name */
struct trace_probe tp;
+ struct kprobe_holder *krets;
+ int nr_krets;
};

static bool is_trace_kprobe(struct dyn_event *ev)
@@ -82,7 +91,7 @@ static struct trace_kprobe *to_trace_kprobe(struct dyn_event *ev)

static nokprobe_inline bool trace_kprobe_is_return(struct trace_kprobe *tk)
{
- return tk->rp.handler != NULL;
+ return tk->rp.handler != NULL || tk->krets != NULL;
}

static nokprobe_inline const char *trace_kprobe_symbol(struct trace_kprobe *tk)
@@ -180,7 +189,7 @@ static nokprobe_inline unsigned long trace_kprobe_nhit(struct trace_kprobe *tk)

static nokprobe_inline bool trace_kprobe_is_registered(struct trace_kprobe *tk)
{
- return !(list_empty(&tk->rp.kp.list) &&
+ return tk->krets || !(list_empty(&tk->rp.kp.list) &&
hlist_unhashed(&tk->rp.kp.hlist));
}

@@ -311,13 +320,23 @@ static struct trace_kprobe *find_trace_kprobe(const char *event,
return NULL;
}

+static int enable_retinsn_probe(struct trace_kprobe *tk)
+{
+ int ret = 0, i;
+
+ for (i = 0; i < tk->nr_krets; i++)
+ ret = enable_kprobe(&(tk->krets[i].kp));
+
+ return ret;
+}
+
static inline int __enable_trace_kprobe(struct trace_kprobe *tk)
{
int ret = 0;

if (trace_kprobe_is_registered(tk) && !trace_kprobe_has_gone(tk)) {
if (trace_kprobe_is_return(tk))
- ret = enable_kretprobe(&tk->rp);
+ ret = enable_retinsn_probe(tk);
else
ret = enable_kprobe(&tk->rp.kp);
}
@@ -474,6 +493,68 @@ static bool within_notrace_func(struct trace_kprobe *tk)
#define within_notrace_func(tk) (false)
#endif

+int find_return_instructions(kprobe_opcode_t *func, kprobe_opcode_t *rets[], int nr_rets);
+static void retinsn_dispatcher(struct kprobe *kp, struct pt_regs *regs, unsigned long flags);
+
+static void unregister_retinsn_probe(struct trace_kprobe *tk)
+{
+ struct kprobe *kpp[MAX_RET_INSNS];
+ int i;
+
+ for (i = 0; i < tk->nr_krets; i++)
+ kpp[i] = &tk->krets[i].kp;
+
+ unregister_kprobes(kpp, tk->nr_krets);
+}
+
+static int register_retinsn_probe(struct trace_kprobe *tk)
+{
+ kprobe_opcode_t *func = (kprobe_opcode_t *)trace_kprobe_address(tk);
+ kprobe_opcode_t *rets[MAX_RET_INSNS];
+ struct kprobe *kpp[MAX_RET_INSNS];
+ struct kprobe_holder *khs;
+ int i, ret, nrets;
+
+ /* Find return instruction in the target function. */
+ ret = find_return_instructions(func, rets, MAX_RET_INSNS);
+ if (ret < 0)
+ return ret;
+
+ /* There might be a tail call (jump) in the function. */
+ if (ret == 0)
+ return -ENOENT;
+
+ /* Or, too many return instructions. */
+ if (ret > MAX_RET_INSNS)
+ return -E2BIG;
+
+ /* Allocate kprobes which probe the return instructions directly. */
+ nrets = ret;
+ khs = kcalloc(nrets, sizeof(struct kprobe_holder), GFP_KERNEL);
+ if (!khs)
+ return -ENOMEM;
+
+ for (i = 0; i < nrets; i++) {
+ khs[i].kp.addr = rets[i];
+ khs[i].kp.flags = tk->rp.kp.flags;
+ khs[i].kp.post_handler = retinsn_dispatcher;
+ khs[i].tk = tk;
+ kpp[i] = &khs[i].kp;
+ }
+
+ ret = register_kprobes(kpp, nrets);
+ if (ret < 0) {
+ kfree(khs);
+ return ret;
+ }
+
+ tk->rp.kp.addr = trace_kprobe_address(tk);
+ tk->krets = khs;
+ tk->nr_krets = nrets;
+
+ return 0;
+}
+
/* Internal register function - just handle k*probes and flags */
static int __register_trace_kprobe(struct trace_kprobe *tk)
{
@@ -505,7 +586,7 @@ static int __register_trace_kprobe(struct trace_kprobe *tk)
tk->rp.kp.flags |= KPROBE_FLAG_DISABLED;

if (trace_kprobe_is_return(tk))
- ret = register_kretprobe(&tk->rp);
+ ret = register_retinsn_probe(tk);
else
ret = register_kprobe(&tk->rp.kp);

@@ -517,7 +598,7 @@ static void __unregister_trace_kprobe(struct trace_kprobe *tk)
{
if (trace_kprobe_is_registered(tk)) {
if (trace_kprobe_is_return(tk))
- unregister_kretprobe(&tk->rp);
+ unregister_retinsn_probe(tk);
else
unregister_kprobe(&tk->rp.kp);
/* Cleanup kprobe for reuse and mark it unregistered */
@@ -1744,6 +1825,25 @@ kretprobe_dispatcher(struct kretprobe_instance *ri, struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kretprobe_dispatcher);

+static void retinsn_dispatcher(struct kprobe *kp, struct pt_regs *regs, unsigned long flags)
+{
+ struct kprobe_holder *kh = container_of(kp, struct kprobe_holder, kp);
+ struct trace_kprobe *tk = kh->tk;
+ struct kretprobe_instance ri; /* dummy : to be fixed */
+
+ ri.ret_addr = (void *)instruction_pointer(regs);
+
+ raw_cpu_inc(*tk->nhit);
+
+ if (trace_probe_test_flag(&tk->tp, TP_FLAG_TRACE))
+ kretprobe_trace_func(tk, &ri, regs);
+#ifdef CONFIG_PERF_EVENTS
+ if (trace_probe_test_flag(&tk->tp, TP_FLAG_PROFILE))
+ kretprobe_perf_func(tk, &ri, regs);
+#endif
+}
+NOKPROBE_SYMBOL(retinsn_dispatcher);
+
static struct trace_event_functions kretprobe_funcs = {
.trace = print_kretprobe_event
};

2021-08-30 19:13:11

by Andrii Nakryiko

Subject: Re: [RFC PATCH 0/1] Non stack-intrusive return probe event

On Sun, Aug 29, 2021 at 7:22 AM Masami Hiramatsu <[email protected]> wrote:
>
> Hello,
>
> For a long time, we tackled to fix some issues around kretprobe.
> One of the latest action was the stacktrace fix on x86 in this
> thread.
>
> https://lore.kernel.org/bpf/162756755600.301564.4957591913842010341.stgit@devnote2/
>
> However, there seems no progress/further discussion. So I would
> like to make another approach for this (and the other issues.)

v10 of the kretprobe+stacktrace fixes ([0]) from Masami has received no
comments or objections in the month since it was posted. It fixes
the very real and very limiting problem of not being able to capture a
stack trace from BPF kretprobe programs. Masami, while I don't mind
your new approach, I think we shouldn't consider them as "either/or"
solutions. We have a fix that works for existing implementations; can
we please land it, and then work on further improvements
independently?

Ingo, Peter, Steven,

I'm not sure who this has to go through, or which kernel tree, but
assuming it's one of you/yours, can you please take a look at [0] and
apply it where appropriate? The work has been going on since March and
it blocks development of some extremely useful tooling (retsnoop [1]
being one of them). There were also bpftrace users who were
completely surprised by the inability to use stack trace capturing
from kretprobe handlers, so it's not just me. I (and a bunch of other
BPF users) would greatly appreciate help with getting this problem
fixed. Thank you!

[0] https://lore.kernel.org/bpf/162756755600.301564.4957591913842010341.stgit@devnote2/
[1] https://github.com/anakryiko/retsnoop

>
> Here is my idea -- replace kretprobe with kprobe.
> In other words, put a kprobe on the "return instruction" directly
> instead of modifying the kernel stack. This can solve most
> of the kretprobe disadvantages. E.g.
>
> - Since it doesn't change the kernel stack, any special stack
> unwinder fixup is not needed anymore.
> - No "max-instance" limitations anymore, because it will use
> kprobes directly.
> - Scalability performance will be improved as same as kprobes.
> No list-operation in probe-runtime.
>
> Here is a PoC code which introduces "retinsn_probe" event as a part
> of ftrace kprobe event. I don't think we need to replace the
> kretprobe. This should be a higher layer feature, because some
> kernel functions can have multiple "return instructions". Thus,
> the "retinsn_probe" must manage multiple kprobes. That means the
> "retinsn_probe" will be a user of kprobes. I decided to make it
> inside the ftrace "kprobe-event". This gives us another advantage
> for eBPF support. Because eBPF uses "kprobe-event" instead of
> "kprobe" directly, if the "retinsn_probe" is implemented in the
> "kprobe-event", eBPF can use it without any change.
> Anyway, this can be co-exist with kretprobe. So as far as any
> user uses kretprobe, we can keep it.
>
>
> Example
> =======
> For example, I ran a shell script, which was used in the
> stacktrace fix series.
>
> ----
> mount -t debugfs debugfs /sys/kernel/debug/
> cd /sys/kernel/debug/tracing
> echo > trace
> echo 1 > options/sym-offset
> echo r vfs_read >> kprobe_events
> echo r full_proxy_read >> kprobe_events
> echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> echo 1 > events/kprobes/enable
> cat /sys/kernel/debug/kprobes/list
> echo 0 > events/kprobes/enable
> cat trace
> ----
>
> This is the result.
> ----
> ffffffff813b420e k full_proxy_read+0x6e
> ffffffff812b7c0a k vfs_read+0xda
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 3/3 #P:8
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> cat-136 [007] d.Z. 8.038381: r_full_proxy_read_0: (vfs_read+0x9b/0x180 <- full_proxy_read)
> cat-136 [007] d.Z. 8.038386: <stack trace>
> => kretprobe_trace_func+0x209/0x300
> => retinsn_dispatcher+0x7a/0xa0
> => kprobe_post_process+0x28/0x80
> => kprobe_int3_handler+0x166/0x1a0
> => exc_int3+0x47/0x140
> => asm_exc_int3+0x31/0x40
> => vfs_read+0x9b/0x180
> => ksys_read+0x68/0xe0
> => do_syscall_64+0x3b/0x90
> => entry_SYSCALL_64_after_hwframe+0x44/0xae
> cat-136 [007] d.Z. 8.038387: r_vfs_read_0: (ksys_read+0x68/0xe0 <- vfs_read)
> ----
>
> You can see the return probe events are translated to kprobes
> instead of kretprobes. And also, on the stacktrace, we can see
> an int3 calls the kprobe and decode stacktrace correctly.
>
>
> TODO
> ====
> Of course, this is just PoC code, there are many TODOs.
>
> - This PoC code only supports x86 at this moment. But I think this
> can be done on the other architectures. What it needs is
> to implement "find_return_instructions()".
> - Code cleanup is not enough. I have to remove "kretprobe" from
> "trace_kprobe" data structure, rewrite related functions etc.
> - It has to handle "tail-call" optimized code, which replaces
> a "call + return" into "jump". find_return_instruction() should
> detect it and decode the jump destination too.
>
>
> Thank you,
>
>
> ---
>
> Masami Hiramatsu (1):
> [PoC] tracing: kprobe: Add non-stack intrusion return probe event
>
>
> arch/x86/kernel/kprobes/core.c | 59 +++++++++++++++++++++
> kernel/trace/trace_kprobe.c | 110 ++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 164 insertions(+), 5 deletions(-)
>
> --
> Masami Hiramatsu (Linaro) <[email protected]>

2021-08-31 06:08:09

by Masami Hiramatsu

Subject: Re: [RFC PATCH 0/1] Non stack-intrusive return probe event

On Mon, 30 Aug 2021 12:04:56 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Sun, Aug 29, 2021 at 7:22 AM Masami Hiramatsu <[email protected]> wrote:
> >
> > Hello,
> >
> > For a long time, we tackled to fix some issues around kretprobe.
> > One of the latest action was the stacktrace fix on x86 in this
> > thread.
> >
> > https://lore.kernel.org/bpf/162756755600.301564.4957591913842010341.stgit@devnote2/
> >
> > However, there seems no progress/further discussion. So I would
> > like to make another approach for this (and the other issues.)
>
> v10 of kretprobe+stacktrace fixes ([0]) from Masami has received no
> comment or objections in the last month, since it was posted. It fixes
> the very real and very limiting problem of not being able to capture a
> stack trace from BPF kretprobe programs. Masami, while I don't mind
> your new approach, I think we shouldn't consider them as "either/or"
> solutions. We have a fix that works for existing implementations, can
> we please land it, and then work on further improvements
> independently?

That's right. The original stacktrace fix series is for fixing an
existing issue with kretprobes and the stack unwinder.
This one is just for avoiding the issue from dynamic events.
So we can proceed with both anyway.

>
> Ingo, Peter, Steven,
>
> I'm not sure who and which kernel tree this has to go through, but
> assuming it's one of you/yours, can you please take a look at [0] and
> apply it where appropriate? The work has been going on since March and
> it blocks development of some extremely useful tooling (retsnoop [1]
> being one of them). There were also bpftrace users that were
> completely surprised about the inability to use stack trace capturing
> from kretprobe handlers, so it's not just me. I (and a bunch of other
> BPF users) would greatly appreciate help with getting this problem
> fixed. Thank you!
>
> [0] https://lore.kernel.org/bpf/162756755600.301564.4957591913842010341.stgit@devnote2/
> [1] https://github.com/anakryiko/retsnoop

Thank you,

>
> >
> > Here is my idea -- replace kretprobe with kprobe.
> > In other words, put a kprobe on the "return instruction" directly
> > instead of modifying the kernel stack. This can solve most
> > of the kretprobe disadvantages. E.g.
> >
> > - Since it doesn't change the kernel stack, any special stack
> > unwinder fixup is not needed anymore.
> > - No "max-instance" limitations anymore, because it will use
> > kprobes directly.
> > - Scalability performance will be improved as same as kprobes.
> > No list-operation in probe-runtime.
> >
> > Here is a PoC code which introduces "retinsn_probe" event as a part
> > of ftrace kprobe event. I don't think we need to replace the
> > kretprobe. This should be a higher layer feature, because some
> > kernel functions can have multiple "return instructions". Thus,
> > the "retinsn_probe" must manage multiple kprobes. That means the
> > "retinsn_probe" will be a user of kprobes. I decided to make it
> > inside the ftrace "kprobe-event". This gives us another advantage
> > for eBPF support. Because eBPF uses "kprobe-event" instead of
> > "kprobe" directly, if the "retinsn_probe" is implemented in the
> > "kprobe-event", eBPF can use it without any change.
> > Anyway, this can be co-exist with kretprobe. So as far as any
> > user uses kretprobe, we can keep it.
> >
> >
> > Example
> > =======
> > For example, I ran a shell script, which was used in the
> > stacktrace fix series.
> >
> > ----
> > mount -t debugfs debugfs /sys/kernel/debug/
> > cd /sys/kernel/debug/tracing
> > echo > trace
> > echo 1 > options/sym-offset
> > echo r vfs_read >> kprobe_events
> > echo r full_proxy_read >> kprobe_events
> > echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> > echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> > echo 1 > events/kprobes/enable
> > cat /sys/kernel/debug/kprobes/list
> > echo 0 > events/kprobes/enable
> > cat trace
> > ----
> >
> > This is the result.
> > ----
> > ffffffff813b420e k full_proxy_read+0x6e
> > ffffffff812b7c0a k vfs_read+0xda
> > # tracer: nop
> > #
> > # entries-in-buffer/entries-written: 3/3 #P:8
> > #
> > # _-----=> irqs-off
> > # / _----=> need-resched
> > # | / _---=> hardirq/softirq
> > # || / _--=> preempt-depth
> > # ||| / delay
> > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > # | | | |||| | |
> > cat-136 [007] d.Z. 8.038381: r_full_proxy_read_0: (vfs_read+0x9b/0x180 <- full_proxy_read)
> > cat-136 [007] d.Z. 8.038386: <stack trace>
> > => kretprobe_trace_func+0x209/0x300
> > => retinsn_dispatcher+0x7a/0xa0
> > => kprobe_post_process+0x28/0x80
> > => kprobe_int3_handler+0x166/0x1a0
> > => exc_int3+0x47/0x140
> > => asm_exc_int3+0x31/0x40
> > => vfs_read+0x9b/0x180
> > => ksys_read+0x68/0xe0
> > => do_syscall_64+0x3b/0x90
> > => entry_SYSCALL_64_after_hwframe+0x44/0xae
> > cat-136 [007] d.Z. 8.038387: r_vfs_read_0: (ksys_read+0x68/0xe0 <- vfs_read)
> > ----
> >
> > You can see the return probe events are translated to kprobes
> > instead of kretprobes. And also, on the stacktrace, we can see
> > an int3 calls the kprobe and decode stacktrace correctly.
> >
> >
> > TODO
> > ====
> > Of course, this is just PoC code, there are many TODOs.
> >
> > - This PoC code only supports x86 at this moment. But I think this
> > can be done on the other architectures. What it needs is
> > to implement "find_return_instructions()".
> > - Code cleanup is not enough. I have to remove "kretprobe" from
> > "trace_kprobe" data structure, rewrite related functions etc.
> > - It has to handle "tail-call" optimized code, which replaces
> > a "call + return" into "jump". find_return_instruction() should
> > detect it and decode the jump destination too.
> >
> >
> > Thank you,
> >
> >
> > ---
> >
> > Masami Hiramatsu (1):
> > [PoC] tracing: kprobe: Add non-stack intrusion return probe event
> >
> >
> > arch/x86/kernel/kprobes/core.c | 59 +++++++++++++++++++++
> > kernel/trace/trace_kprobe.c | 110 ++++++++++++++++++++++++++++++++++++++--
> > 2 files changed, 164 insertions(+), 5 deletions(-)
> >
> > --
> > Masami Hiramatsu (Linaro) <[email protected]>


--
Masami Hiramatsu <[email protected]>

2021-09-13 17:19:52

by Andrii Nakryiko

Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Mon, Aug 23, 2021 at 10:32 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Mon, 23 Aug 2021 22:12:06 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> > On Thu, Jul 29, 2021 at 4:35 PM Masami Hiramatsu <[email protected]> wrote:
> > >
> > > On Thu, 29 Jul 2021 23:05:56 +0900
> > > Masami Hiramatsu <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > This is the 10th version of the series to fix the stacktrace with kretprobe on x86.
> > > >
> > > > The previous version is here;
> > > >
> > > > https://lore.kernel.org/bpf/162601048053.1318837.1550594515476777588.stgit@devnote2/
> > > >
> > > > This version is rebased on top of new kprobes cleanup series(*1) and merging
> > > > Josh's objtool update series (*2)(*3) as [6/16] and [7/16].
> > > >
> > > > (*1) https://lore.kernel.org/bpf/162748615977.59465.13262421617578791515.stgit@devnote2/
> > > > (*2) https://lore.kernel.org/bpf/20210710192433.x5cgjsq2ksvaqnss@treble/
> > > > (*3) https://lore.kernel.org/bpf/20210710192514.ghvksi3ozhez4lvb@treble/
> > > >
> > > > Changes from v9:
> > > > - Add Josh's objtool update patches with a build error fix as [6/16] and [7/16].
> > > > - Add an API document for kretprobe_find_ret_addr() and check cur != NULL in [5/16].
> > > >
> > > > With this series, unwinder can unwind stack correctly from ftrace as below;
> > > >
> > > > # cd /sys/kernel/debug/tracing
> > > > # echo > trace
> > > > # echo 1 > options/sym-offset
> > > > # echo r vfs_read >> kprobe_events
> > > > # echo r full_proxy_read >> kprobe_events
> > > > # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> > > > # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> > > > # echo 1 > events/kprobes/enable
> > > > # cat /sys/kernel/debug/kprobes/list
> > > > ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
> > > > ffffffff812560b0 r vfs_read+0x0 [FTRACE]
> > > > # echo 0 > events/kprobes/enable
> > > > # cat trace
> > > > # tracer: nop
> > > > #
> > > > # entries-in-buffer/entries-written: 3/3 #P:8
> > > > #
> > > > # _-----=> irqs-off
> > > > # / _----=> need-resched
> > > > # | / _---=> hardirq/softirq
> > > > # || / _--=> preempt-depth
> > > > # ||| / delay
> > > > # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> > > > # | | | |||| | |
> > > > <...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
> > > > <...>-134 [007] ...1 16.185901: <stack trace>
> > > > => kretprobe_trace_func+0x209/0x300
> > > > => kretprobe_dispatcher+0x4a/0x70
> > > > => __kretprobe_trampoline_handler+0xd4/0x170
> > > > => trampoline_handler+0x43/0x60
> > > > => kretprobe_trampoline+0x2a/0x50
> > > > => vfs_read+0x98/0x180
> > > > => ksys_read+0x5f/0xe0
> > > > => do_syscall_64+0x37/0x90
> > > > => entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > > <...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
> > > >
> > > > This shows the double return probes (vfs_read() and full_proxy_read()) on the stack
> > > > correctly unwinded. (vfs_read() returns to 'ksys_read+0x5f' and full_proxy_read()
> > > > returns to 'vfs_read+0x98')
> > > >
> > > > This also changes the kretprobe behavisor a bit, now the instraction pointer in
> > > > the 'pt_regs' passed to kretprobe user handler is correctly set the real return
> > > > address. So user handlers can get it via instruction_pointer() API, and can use
> > > > stack_trace_save_regs().
> > > >
> > > > You can also get this series from
> > > > git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v9
> > >
> > > Oops, this is of course 'kprobes/kretprobe-stackfix-v10'. And this branch includes above (*1) series.
> >
> > Hi Masami,
> >
> > Was this ever merged/applied? This is very important functionality
> > for BPF kretprobes, so I hope it won't slip through the cracks.
>
> No, not yet as far as I know.
> I'm waiting for comments on this series. Since this is basically an
> x86 ORC unwinder improvement, it should be merged through the -tip tree.
>

Hey Masami,

It's been a while since you posted v10. It seems like this series
doesn't apply cleanly anymore. Do you mind rebasing and resubmitting
it again to refresh the series and make it easier for folks to review
and test it?

Also, do I understand correctly that [0] is a dependency of this
series? If so, please rebase and resubmit that one as well. I'm not
sure of the status of Josh's patches you depend on, either. Can you
please coordinate with him and maybe incorporate them into your
series?

Could you also cc Paul McKenney <[email protected]> on future
revisions so he can follow along as well? Thanks!


[0] https://patchwork.kernel.org/project/netdevbpf/list/?series=522757&state=*



> Ingo and Josh,
>
> Could you give me some comments, please?
>
> Thank you,

2021-09-14 01:30:50

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Mon, 13 Sep 2021 10:14:55 -0700
Andrii Nakryiko <[email protected]> wrote:

> [...]
>
> Hey Masami,
>
> It's been a while since you posted v10. It seems like this series
> doesn't apply cleanly anymore. Do you mind rebasing and resubmitting
> it again to refresh the series and make it easier for folks to review
> and test it?

Yes, I'm planning to do that this week.
Thank you for pinging me :)

>
> Also, do I understand correctly that [0] is a dependency of this
> series? If so, please rebase and resubmit that one as well. I'm not
> sure of the status of Josh's patches you depend on, either. Can you
> please coordinate with him and maybe incorporate them into your
> series?

Sorry, I cannot see [0]; could you tell me another URL or the title?
Or is that Kees's patch [1]?

[1] https://lore.kernel.org/all/[email protected]/T/#u


>
> Could you also cc Paul McKenney <[email protected]> on future
> revisions so he can follow along as well? Thanks!

OK!

>
>
> [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=522757&state=*


--
Masami Hiramatsu <[email protected]>

2021-09-14 01:39:00

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Mon, Sep 13, 2021 at 5:38 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Mon, 13 Sep 2021 10:14:55 -0700
> Andrii Nakryiko <[email protected]> wrote:
>
> > [...]
> >
> > Hey Masami,
> >
> > It's been a while since you posted v10. It seems like this series
> > doesn't apply cleanly anymore. Do you mind rebasing and resubmitting
> > it again to refresh the series and make it easier for folks to review
> > and test it?
>
> Yes, I'm planning to do that this week.
> Thank you for pinging me :)

Sounds good, thank you.

>
> >
> > Also, do I understand correctly that [0] is a dependency of this
> > series? If so, please rebase and resubmit that one as well. I'm not
> > sure of the status of Josh's patches you depend on, either. Can you
> > please coordinate with him and maybe incorporate them into your
> > series?
>
> Sorry, I cannot see [0]; could you tell me another URL or the title?

Make sure that you have `state=*` when you copy/paste that URL; it is
just a normal Patchwork URL and should be visible to anyone. It's just
your "kprobes: treewide: Clean up kprobe code" series (v3).

> Or is that Kees's patch [1]?
>
> [1] https://lore.kernel.org/all/[email protected]/T/#u
>
> > [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=522757&state=*

2021-09-14 05:11:32

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v10 00/16] kprobes: Fix stacktrace with kretprobes on x86

On Mon, 13 Sep 2021 18:36:10 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Mon, Sep 13, 2021 at 5:38 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Mon, 13 Sep 2021 10:14:55 -0700
> > Andrii Nakryiko <[email protected]> wrote:
> >
> > > [...]
>
> >
> > >
> > > Also, do I understand correctly that [0] is a dependency of this
> > > series? If so, please rebase and resubmit that one as well. I'm not
> > > sure of the status of Josh's patches you depend on, either. Can you
> > > please coordinate with him and maybe incorporate them into your
> > > series?
> >
> > Sorry, I cannot see [0]; could you tell me another URL or the title?
>
> Make sure that you have `state=*` when you copy/paste that URL; it is
> just a normal Patchwork URL and should be visible to anyone. It's just
> your "kprobes: treewide: Clean up kprobe code" series (v3).

Ah, OK. Of course, I will include it :)

Thank you!

>
> > > [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=522757&state=*


--
Masami Hiramatsu <[email protected]>