2021-06-18 08:57:26

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86

Hello,

Here is the 8th version of the series to fix the stacktrace with kretprobe on x86.

The previous version is;

https://lore.kernel.org/bpf/162209754288.436794.3904335049560916855.stgit@devnote2/

This version fixes to call appropriate function and drop some unneeded
patches.


Changes from v7:
[03/13]: Call dereference_kernel_function_descriptor() for getting the
address of kretprobe_trampoline.
[09/13]: Update the title and description to explain why it is needed.
[10/13][11/13]: Add Josh's Acked-by.



With this series, unwinder can unwind stack correctly from ftrace as below;

# cd /sys/kernel/debug/tracing
# echo > trace
# echo 1 > options/sym-offset
# echo r vfs_read >> kprobe_events
# echo r full_proxy_read >> kprobe_events
# echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
# echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
# echo 1 > events/kprobes/enable
# cat /sys/kernel/debug/kprobes/list
ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
ffffffff812560b0 r vfs_read+0x0 [FTRACE]
# echo 0 > events/kprobes/enable
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
<...>-134 [007] ...1 16.185901: <stack trace>
=> kretprobe_trace_func+0x209/0x300
=> kretprobe_dispatcher+0x4a/0x70
=> __kretprobe_trampoline_handler+0xd4/0x170
=> trampoline_handler+0x43/0x60
=> kretprobe_trampoline+0x2a/0x50
=> vfs_read+0x98/0x180
=> ksys_read+0x5f/0xe0
=> do_syscall_64+0x37/0x90
=> entry_SYSCALL_64_after_hwframe+0x44/0xae
<...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)

This shows the double return probes (vfs_read and full_proxy_read) on the stack
correctly unwinded. (vfs_read will return to ksys_read+0x5f and full_proxy_read
will return to vfs_read+0x98)

This actually changes the kretprobe behavisor a bit, now the instraction pointer in
the pt_regs passed to kretprobe user handler is correctly set the real return
address. So user handlers can get it via instruction_pointer() API, and can use
stack_trace_save_regs().

You can also get this series from
git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v8


Thank you,

---

Josh Poimboeuf (1):
x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

Masami Hiramatsu (12):
ia64: kprobes: Fix to pass correct trampoline address to the handler
kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
kprobes: Add kretprobe_find_ret_addr() for searching return address
ARC: Add instruction_pointer_set() API
ia64: Add instruction_pointer_set() API
arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
kprobes: Enable stacktrace from pt_regs in kretprobe handler
x86/kprobes: Push a fake return address at kretprobe_trampoline
x86/unwind: Recover kretprobe trampoline entry
tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
x86/kprobes: Fixup return address in generic trampoline handler


arch/arc/include/asm/ptrace.h | 5 ++
arch/arc/kernel/kprobes.c | 2 -
arch/arm/probes/kprobes/core.c | 5 +-
arch/arm64/kernel/probes/kprobes.c | 3 -
arch/csky/kernel/probes/kprobes.c | 2 -
arch/ia64/include/asm/ptrace.h | 5 ++
arch/ia64/kernel/kprobes.c | 15 ++---
arch/mips/kernel/kprobes.c | 3 -
arch/parisc/kernel/kprobes.c | 4 +
arch/powerpc/kernel/kprobes.c | 13 ----
arch/riscv/kernel/probes/kprobes.c | 2 -
arch/s390/kernel/kprobes.c | 2 -
arch/sh/kernel/kprobes.c | 2 -
arch/sparc/kernel/kprobes.c | 2 -
arch/x86/include/asm/kprobes.h | 1
arch/x86/include/asm/unwind.h | 23 +++++++
arch/x86/include/asm/unwind_hints.h | 5 ++
arch/x86/kernel/kprobes/core.c | 53 +++++++++++++++--
arch/x86/kernel/unwind_frame.c | 3 -
arch/x86/kernel/unwind_guess.c | 3 -
arch/x86/kernel/unwind_orc.c | 18 +++++-
include/linux/kprobes.h | 44 ++++++++++++--
kernel/kprobes.c | 108 +++++++++++++++++++++++++----------
kernel/trace/trace_output.c | 17 +-----
lib/error-inject.c | 3 +
25 files changed, 238 insertions(+), 105 deletions(-)

--
Masami Hiramatsu (Linaro) <[email protected]>


2021-06-18 08:59:06

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address

Add kretprobe_find_ret_addr() for searching correct return address
from kretprobe instance list.

Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
Changes in v6:
- Replace BUG_ON() with WARN_ON_ONCE() in __kretprobe_trampoline_handler().
Changes in v3:
- Remove generic stacktrace fixup. Instead, it should be solved in
each unwinder. This just provide the generic interface.
Changes in v2:
- Add is_kretprobe_trampoline() for checking address outside of
kretprobe_find_ret_addr()
- Remove unneeded addr from kretprobe_find_ret_addr()
- Rename fixup_kretprobe_tramp_addr() to fixup_kretprobe_trampoline()
---
include/linux/kprobes.h | 22 +++++++++++
kernel/kprobes.c | 91 ++++++++++++++++++++++++++++++++++-------------
2 files changed, 87 insertions(+), 26 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 5ce677819a25..08d3415e4418 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -207,6 +207,14 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
return dereference_kernel_function_descriptor(kretprobe_trampoline);
}

+static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
+{
+ return (void *)addr == kretprobe_trampoline_addr();
+}
+
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+ struct llist_node **cur);
+
/* If the trampoline handler called from a kprobe, use this version */
unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *frame_pointer);
@@ -506,6 +514,20 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr)
}
#endif

+#if !defined(CONFIG_KRETPROBES)
+static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
+{
+ return false;
+}
+
+static nokprobe_inline
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+ struct llist_node **cur)
+{
+ return 0;
+}
+#endif
+
/* Returns true if kprobes handled the fault */
static nokprobe_inline bool kprobe_page_fault(struct pt_regs *regs,
unsigned int trap)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ad7a8c81ab06..650cbe738666 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1840,45 +1840,69 @@ static struct notifier_block kprobe_exceptions_nb = {

#ifdef CONFIG_KRETPROBES

-unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
- void *frame_pointer)
+/* This assumes the tsk is current or the task which is not running. */
+static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
+ struct llist_node **cur)
{
- kprobe_opcode_t *correct_ret_addr = NULL;
struct kretprobe_instance *ri = NULL;
- struct llist_node *first, *node;
- struct kretprobe *rp;
+ struct llist_node *node = *cur;
+
+ if (!node)
+ node = tsk->kretprobe_instances.first;
+ else
+ node = node->next;

- /* Find all nodes for this frame. */
- first = node = current->kretprobe_instances.first;
while (node) {
ri = container_of(node, struct kretprobe_instance, llist);
-
- BUG_ON(ri->fp != frame_pointer);
-
if (ri->ret_addr != kretprobe_trampoline_addr()) {
- correct_ret_addr = ri->ret_addr;
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- goto found;
+ *cur = node;
+ return (unsigned long)ri->ret_addr;
}
-
node = node->next;
}
- pr_err("Oops! Kretprobe fails to find correct return address.\n");
- BUG_ON(1);
+ return 0;
+}
+NOKPROBE_SYMBOL(__kretprobe_find_ret_addr);

-found:
- /* Unlink all nodes for this frame. */
- current->kretprobe_instances.first = node->next;
- node->next = NULL;
+unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
+ struct llist_node **cur)
+{
+ struct kretprobe_instance *ri = NULL;
+ unsigned long ret;
+
+ do {
+ ret = __kretprobe_find_ret_addr(tsk, cur);
+ if (!ret)
+ return ret;
+ ri = container_of(*cur, struct kretprobe_instance, llist);
+ } while (ri->fp != fp);
+
+ return ret;
+}
+NOKPROBE_SYMBOL(kretprobe_find_ret_addr);

- /* Run them.. */
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *frame_pointer)
+{
+ kprobe_opcode_t *correct_ret_addr = NULL;
+ struct kretprobe_instance *ri = NULL;
+ struct llist_node *first, *node = NULL;
+ struct kretprobe *rp;
+
+ /* Find correct address and all nodes for this frame. */
+ correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
+ if (!correct_ret_addr) {
+ pr_err("Oops! Kretprobe fails to find correct return address.\n");
+ BUG_ON(1);
+ }
+
+ /* Run them. */
+ first = current->kretprobe_instances.first;
while (first) {
ri = container_of(first, struct kretprobe_instance, llist);
- first = first->next;
+
+ if (WARN_ON_ONCE(ri->fp != frame_pointer))
+ break;

rp = get_kretprobe(ri);
if (rp && rp->handler) {
@@ -1889,6 +1913,21 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
rp->handler(ri, regs);
__this_cpu_write(current_kprobe, prev);
}
+ if (first == node)
+ break;
+
+ first = first->next;
+ }
+
+ /* Unlink all nodes for this frame. */
+ first = current->kretprobe_instances.first;
+ current->kretprobe_instances.first = node->next;
+ node->next = NULL;
+
+ /* Recycle them. */
+ while (first) {
+ ri = container_of(first, struct kretprobe_instance, llist);
+ first = first->next;

recycle_rp_inst(ri);
}

2021-06-18 08:59:29

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler

Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
missed to pass the wrong trampoline address (it passes the descriptor address
instead of function entry address).
This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
This also changes to use correct symbol dereference function to get the
function address from the kretprobe_trampoline.

Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v5:
- Fix a compile error typo.
---
arch/ia64/kernel/kprobes.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 441ed04b1037..d4048518a1d7 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -398,7 +398,8 @@ static void kretprobe_trampoline(void)

int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- regs->cr_iip = __kretprobe_trampoline_handler(regs, kretprobe_trampoline, NULL);
+ regs->cr_iip = __kretprobe_trampoline_handler(regs,
+ dereference_function_descriptor(kretprobe_trampoline), NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler
@@ -414,7 +415,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
ri->fp = NULL;

/* Replace the return addr with trampoline addr */
- regs->b0 = ((struct fnptr *)kretprobe_trampoline)->ip;
+ regs->b0 = (unsigned long)dereference_function_descriptor(kretprobe_trampoline);
}

/* Check the instruction in the slot is break */
@@ -902,14 +903,14 @@ static struct kprobe trampoline_p = {
int __init arch_init_kprobes(void)
{
trampoline_p.addr =
- (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip;
+ dereference_function_descriptor(kretprobe_trampoline);
return register_kprobe(&trampoline_p);
}

int __kprobes arch_trampoline_kprobe(struct kprobe *p)
{
if (p->addr ==
- (kprobe_opcode_t *)((struct fnptr *)kretprobe_trampoline)->ip)
+ dereference_function_descriptor(kretprobe_trampoline))
return 1;

return 0;

2021-06-18 08:59:34

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()

Replace arch_deref_entry_point() with dereference_symbol_descriptor()
because those are doing same thing.

Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
Changes in v6:
- Use dereference_symbol_descriptor() so that it can handle address in
modules correctly.
---
arch/ia64/kernel/kprobes.c | 5 -----
arch/powerpc/kernel/kprobes.c | 11 -----------
include/linux/kprobes.h | 1 -
kernel/kprobes.c | 7 +------
lib/error-inject.c | 3 ++-
5 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index d4048518a1d7..0f8573bbf520 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -891,11 +891,6 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
return ret;
}

-unsigned long arch_deref_entry_point(void *entry)
-{
- return ((struct fnptr *)entry)->ip;
-}
-
static struct kprobe trampoline_p = {
.pre_handler = trampoline_probe_handler
};
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index c64a5feaebbe..24472f2c2cfc 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -522,17 +522,6 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
}
NOKPROBE_SYMBOL(kprobe_fault_handler);

-unsigned long arch_deref_entry_point(void *entry)
-{
-#ifdef PPC64_ELF_ABI_v1
- if (!kernel_text_address((unsigned long)entry))
- return ppc_global_function_entry(entry);
- else
-#endif
- return (unsigned long)entry;
-}
-NOKPROBE_SYMBOL(arch_deref_entry_point);
-
static struct kprobe trampoline_p = {
.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
.pre_handler = trampoline_probe_handler
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 523ffc7bc3a8..713c3a683011 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -382,7 +382,6 @@ int register_kprobe(struct kprobe *p);
void unregister_kprobe(struct kprobe *p);
int register_kprobes(struct kprobe **kps, int num);
void unregister_kprobes(struct kprobe **kps, int num);
-unsigned long arch_deref_entry_point(void *);

int register_kretprobe(struct kretprobe *rp);
void unregister_kretprobe(struct kretprobe *rp);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index e41385afe79d..f8fe9d077b41 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1838,11 +1838,6 @@ static struct notifier_block kprobe_exceptions_nb = {
.priority = 0x7fffffff /* we need to be notified first */
};

-unsigned long __weak arch_deref_entry_point(void *entry)
-{
- return (unsigned long)entry;
-}
-
#ifdef CONFIG_KRETPROBES

unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
@@ -2305,7 +2300,7 @@ static int __init populate_kprobe_blacklist(unsigned long *start,
int ret;

for (iter = start; iter < end; iter++) {
- entry = arch_deref_entry_point((void *)*iter);
+ entry = (unsigned long)dereference_symbol_descriptor((void *)*iter);
ret = kprobe_add_ksym_blacklist(entry);
if (ret == -EINVAL)
continue;
diff --git a/lib/error-inject.c b/lib/error-inject.c
index c73651b15b76..2ff5ef689d72 100644
--- a/lib/error-inject.c
+++ b/lib/error-inject.c
@@ -8,6 +8,7 @@
#include <linux/mutex.h>
#include <linux/list.h>
#include <linux/slab.h>
+#include <asm/sections.h>

/* Whitelist of symbols that can be overridden for error injection. */
static LIST_HEAD(error_injection_list);
@@ -64,7 +65,7 @@ static void populate_error_injection_list(struct error_injection_entry *start,

mutex_lock(&ei_mutex);
for (iter = start; iter < end; iter++) {
- entry = arch_deref_entry_point((void *)iter->addr);
+ entry = (unsigned long)dereference_symbol_descriptor((void *)iter->addr);

if (!kernel_text_address(entry) ||
!kallsyms_lookup_size_offset(entry, &size, &offset)) {

2021-06-18 09:01:09

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline

Change kretprobe_trampoline to make a space for regs->ARM_pc so that
kretprobe_trampoline_handler can call instruction_pointer_set()
safely.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/arm/probes/kprobes/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index 583f6b1a2a6f..556b1560fb2c 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -374,11 +374,13 @@ int __kprobes kprobe_exceptions_notify(struct notifier_block *self,
void __naked __kprobes kretprobe_trampoline(void)
{
__asm__ __volatile__ (
+ "sub sp, sp, #16 \n\t"
"stmdb sp!, {r0 - r11} \n\t"
"mov r0, sp \n\t"
"bl trampoline_handler \n\t"
"mov lr, r0 \n\t"
"ldmia sp!, {r0 - r11} \n\t"
+ "add sp, sp, #16 \n\t"
#ifdef CONFIG_THUMB2_KERNEL
"bx lr \n\t"
#else

2021-06-18 09:02:34

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

From: Josh Poimboeuf <[email protected]>

Add UNWIND_HINT_FUNC on kretporbe_trampoline code so that ORC
information is generated on the kretprobe_trampoline correctly.

Note that when the CONFIG_FRAME_POINTER=y, since the
kretprobe_trampoline skips updating frame pointer, the stack frame
of the kretprobe_trampoline seems non-standard. So this marks it
is STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC.
Anyway, with the frame pointer, FP unwinder can unwind the stack
frame correctly without that hint.

Signed-off-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
Changes in v4:
- Apply UNWIND_HINT_FUNC only if CONFIG_FRAME_POINTER=n.
---
arch/x86/include/asm/unwind_hints.h | 5 +++++
arch/x86/kernel/kprobes/core.c | 17 +++++++++++++++--
2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/unwind_hints.h b/arch/x86/include/asm/unwind_hints.h
index 8e574c0afef8..8b33674288ea 100644
--- a/arch/x86/include/asm/unwind_hints.h
+++ b/arch/x86/include/asm/unwind_hints.h
@@ -52,6 +52,11 @@
UNWIND_HINT sp_reg=ORC_REG_SP sp_offset=8 type=UNWIND_HINT_TYPE_FUNC
.endm

+#else
+
+#define UNWIND_HINT_FUNC \
+ UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_UNWIND_HINTS_H */
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 2dccb4347453..74f049b6e77f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1019,6 +1019,19 @@ int kprobe_int3_handler(struct pt_regs *regs)
}
NOKPROBE_SYMBOL(kprobe_int3_handler);

+#ifdef CONFIG_FRAME_POINTER
+/*
+ * kretprobe_trampoline skips updating frame pointer. The frame pointer
+ * saved in trampoline_handler points to the real caller function's
+ * frame pointer. Thus the kretprobe_trampoline doesn't seems to have a
+ * standard stack frame with CONFIG_FRAME_POINTER=y.
+ * Let's mark it non-standard function. Anyway, FP unwinder can correctly
+ * unwind without the hint.
+ */
+STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
+#undef UNWIND_HINT_FUNC
+#define UNWIND_HINT_FUNC
+#endif
/*
* When a retprobed function returns, this code saves registers and
* calls trampoline_handler() runs, which calls the kretprobe's handler.
@@ -1031,6 +1044,7 @@ asm(
/* We don't bother saving the ss register */
#ifdef CONFIG_X86_64
" pushq %rsp\n"
+ UNWIND_HINT_FUNC
" pushfq\n"
SAVE_REGS_STRING
" movq %rsp, %rdi\n"
@@ -1041,6 +1055,7 @@ asm(
" popfq\n"
#else
" pushl %esp\n"
+ UNWIND_HINT_FUNC
" pushfl\n"
SAVE_REGS_STRING
" movl %esp, %eax\n"
@@ -1054,8 +1069,6 @@ asm(
".size kretprobe_trampoline, .-kretprobe_trampoline\n"
);
NOKPROBE_SYMBOL(kretprobe_trampoline);
-STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
-

/*
* Called from kretprobe_trampoline

2021-06-18 10:31:48

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()

Remove trampoline_address from kretprobe_trampoline_handler().
Instead of passing the address, kretprobe_trampoline_handler()
can use new kretprobe_trampoline_addr().

Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
Changes in v8:
- Use dereference_kernel_function_descriptor() to
get the kretprobe_trampoline address.
Changes in v3:
- Remove wrong kretprobe_trampoline declaration from
arch/x86/include/asm/kprobes.h.
Changes in v2:
- Remove arch_deref_entry_point() from comment.
---
arch/arc/kernel/kprobes.c | 2 +-
arch/arm/probes/kprobes/core.c | 3 +--
arch/arm64/kernel/probes/kprobes.c | 3 +--
arch/csky/kernel/probes/kprobes.c | 2 +-
arch/ia64/kernel/kprobes.c | 5 ++---
arch/mips/kernel/kprobes.c | 3 +--
arch/parisc/kernel/kprobes.c | 4 ++--
arch/powerpc/kernel/kprobes.c | 2 +-
arch/riscv/kernel/probes/kprobes.c | 2 +-
arch/s390/kernel/kprobes.c | 2 +-
arch/sh/kernel/kprobes.c | 2 +-
arch/sparc/kernel/kprobes.c | 2 +-
arch/x86/include/asm/kprobes.h | 1 -
arch/x86/kernel/kprobes/core.c | 2 +-
include/linux/kprobes.h | 18 +++++++++++++-----
kernel/kprobes.c | 3 +--
16 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/arch/arc/kernel/kprobes.c b/arch/arc/kernel/kprobes.c
index 5f0415fc7328..3cee75c87f97 100644
--- a/arch/arc/kernel/kprobes.c
+++ b/arch/arc/kernel/kprobes.c
@@ -381,7 +381,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- regs->ret = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ regs->ret = __kretprobe_trampoline_handler(regs, NULL);

/* By returning a non zero value, we are telling the kprobe handler
* that we don't want the post_handler to run
diff --git a/arch/arm/probes/kprobes/core.c b/arch/arm/probes/kprobes/core.c
index 27e0af78e88b..583f6b1a2a6f 100644
--- a/arch/arm/probes/kprobes/core.c
+++ b/arch/arm/probes/kprobes/core.c
@@ -390,8 +390,7 @@ void __naked __kprobes kretprobe_trampoline(void)
/* Called from kretprobe_trampoline */
static __used __kprobes void *trampoline_handler(struct pt_regs *regs)
{
- return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
- (void *)regs->ARM_fp);
+ return (void *)kretprobe_trampoline_handler(regs, (void *)regs->ARM_fp);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 004b86eff9c2..649c970e65a2 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -396,8 +396,7 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline,
- (void *)kernel_stack_pointer(regs));
+ return (void *)kretprobe_trampoline_handler(regs, (void *)kernel_stack_pointer(regs));
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/csky/kernel/probes/kprobes.c b/arch/csky/kernel/probes/kprobes.c
index 68b22b499aeb..cc9dde2e4341 100644
--- a/arch/csky/kernel/probes/kprobes.c
+++ b/arch/csky/kernel/probes/kprobes.c
@@ -387,7 +387,7 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ return (void *)kretprobe_trampoline_handler(regs, NULL);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/ia64/kernel/kprobes.c b/arch/ia64/kernel/kprobes.c
index 0f8573bbf520..44c84c20b626 100644
--- a/arch/ia64/kernel/kprobes.c
+++ b/arch/ia64/kernel/kprobes.c
@@ -392,14 +392,13 @@ static void __kprobes set_current_kprobe(struct kprobe *p,
__this_cpu_write(current_kprobe, p);
}

-static void kretprobe_trampoline(void)
+void kretprobe_trampoline(void)
{
}

int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- regs->cr_iip = __kretprobe_trampoline_handler(regs,
- dereference_function_descriptor(kretprobe_trampoline), NULL);
+ regs->cr_iip = __kretprobe_trampoline_handler(regs, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler
diff --git a/arch/mips/kernel/kprobes.c b/arch/mips/kernel/kprobes.c
index 75bff0f77319..21a4fda1e2cb 100644
--- a/arch/mips/kernel/kprobes.c
+++ b/arch/mips/kernel/kprobes.c
@@ -486,8 +486,7 @@ void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
static int __kprobes trampoline_probe_handler(struct kprobe *p,
struct pt_regs *regs)
{
- instruction_pointer(regs) = __kretprobe_trampoline_handler(regs,
- kretprobe_trampoline, NULL);
+ instruction_pointer(regs) = __kretprobe_trampoline_handler(regs, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler
diff --git a/arch/parisc/kernel/kprobes.c b/arch/parisc/kernel/kprobes.c
index 6d21a515eea5..4a35ac6e2ca2 100644
--- a/arch/parisc/kernel/kprobes.c
+++ b/arch/parisc/kernel/kprobes.c
@@ -175,7 +175,7 @@ int __kprobes parisc_kprobe_ss_handler(struct pt_regs *regs)
return 1;
}

-static inline void kretprobe_trampoline(void)
+void kretprobe_trampoline(void)
{
asm volatile("nop");
asm volatile("nop");
@@ -193,7 +193,7 @@ static int __kprobes trampoline_probe_handler(struct kprobe *p,
{
unsigned long orig_ret_address;

- orig_ret_address = __kretprobe_trampoline_handler(regs, trampoline_p.addr, NULL);
+ orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
instruction_pointer_set(regs, orig_ret_address);

return 1;
diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
index 24472f2c2cfc..025a9f83ae88 100644
--- a/arch/powerpc/kernel/kprobes.c
+++ b/arch/powerpc/kernel/kprobes.c
@@ -399,7 +399,7 @@ static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
unsigned long orig_ret_address;

- orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
/*
* We get here through one of two paths:
* 1. by taking a trap -> kprobe_handler() -> here
diff --git a/arch/riscv/kernel/probes/kprobes.c b/arch/riscv/kernel/probes/kprobes.c
index 247e33fa5bc7..07bc8804643e 100644
--- a/arch/riscv/kernel/probes/kprobes.c
+++ b/arch/riscv/kernel/probes/kprobes.c
@@ -370,7 +370,7 @@ int __init arch_populate_kprobe_blacklist(void)

void __kprobes __used *trampoline_probe_handler(struct pt_regs *regs)
{
- return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ return (void *)kretprobe_trampoline_handler(regs, NULL);
}

void __kprobes arch_prepare_kretprobe(struct kretprobe_instance *ri,
diff --git a/arch/s390/kernel/kprobes.c b/arch/s390/kernel/kprobes.c
index 74b0bd2c24d4..1e6600765553 100644
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@ -351,7 +351,7 @@ static void __used kretprobe_trampoline_holder(void)
*/
static int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- regs->psw.addr = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ regs->psw.addr = __kretprobe_trampoline_handler(regs, NULL);
/*
* By returning a non-zero value, we are telling
* kprobe_handler() that we don't want the post_handler
diff --git a/arch/sh/kernel/kprobes.c b/arch/sh/kernel/kprobes.c
index 1c7f358ef0be..8e76a35e6e33 100644
--- a/arch/sh/kernel/kprobes.c
+++ b/arch/sh/kernel/kprobes.c
@@ -303,7 +303,7 @@ static void __used kretprobe_trampoline_holder(void)
*/
int __kprobes trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
{
- regs->pc = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ regs->pc = __kretprobe_trampoline_handler(regs, NULL);

return 1;
}
diff --git a/arch/sparc/kernel/kprobes.c b/arch/sparc/kernel/kprobes.c
index 4c05a4ee6a0e..401534236c2e 100644
--- a/arch/sparc/kernel/kprobes.c
+++ b/arch/sparc/kernel/kprobes.c
@@ -451,7 +451,7 @@ static int __kprobes trampoline_probe_handler(struct kprobe *p,
{
unsigned long orig_ret_address = 0;

- orig_ret_address = __kretprobe_trampoline_handler(regs, &kretprobe_trampoline, NULL);
+ orig_ret_address = __kretprobe_trampoline_handler(regs, NULL);
regs->tpc = orig_ret_address;
regs->tnpc = orig_ret_address + 4;

diff --git a/arch/x86/include/asm/kprobes.h b/arch/x86/include/asm/kprobes.h
index bd7f5886a789..71ea2eab43d5 100644
--- a/arch/x86/include/asm/kprobes.h
+++ b/arch/x86/include/asm/kprobes.h
@@ -49,7 +49,6 @@ extern __visible kprobe_opcode_t optprobe_template_end[];
extern const int kretprobe_blacklist_size;

void arch_remove_kprobe(struct kprobe *p);
-asmlinkage void kretprobe_trampoline(void);

extern void arch_kprobe_override_function(struct pt_regs *regs);

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index c492ad3001ca..2dccb4347453 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1070,7 +1070,7 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
regs->ip = (unsigned long)&kretprobe_trampoline;
regs->orig_ax = ~0UL;

- return (void *)kretprobe_trampoline_handler(regs, &kretprobe_trampoline, &regs->sp);
+ return (void *)kretprobe_trampoline_handler(regs, &regs->sp);
}
NOKPROBE_SYMBOL(trampoline_handler);

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 713c3a683011..5ce677819a25 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -197,15 +197,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);

+void kretprobe_trampoline(void);
+/*
+ * Since some architecture uses structured function pointer,
+ * use dereference_function_descriptor() to get real function address.
+ */
+static nokprobe_inline void *kretprobe_trampoline_addr(void)
+{
+ return dereference_kernel_function_descriptor(kretprobe_trampoline);
+}
+
/* If the trampoline handler called from a kprobe, use this version */
unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
- void *trampoline_address,
- void *frame_pointer);
+ void *frame_pointer);

static nokprobe_inline
unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
- void *trampoline_address,
- void *frame_pointer)
+ void *frame_pointer)
{
unsigned long ret;
/*
@@ -214,7 +222,7 @@ unsigned long kretprobe_trampoline_handler(struct pt_regs *regs,
* be running at this point.
*/
kprobe_busy_begin();
- ret = __kretprobe_trampoline_handler(regs, trampoline_address, frame_pointer);
+ ret = __kretprobe_trampoline_handler(regs, frame_pointer);
kprobe_busy_end();

return ret;
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index f8fe9d077b41..ad7a8c81ab06 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1841,7 +1841,6 @@ static struct notifier_block kprobe_exceptions_nb = {
#ifdef CONFIG_KRETPROBES

unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
- void *trampoline_address,
void *frame_pointer)
{
kprobe_opcode_t *correct_ret_addr = NULL;
@@ -1856,7 +1855,7 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,

BUG_ON(ri->fp != frame_pointer);

- if (ri->ret_addr != trampoline_address) {
+ if (ri->ret_addr != kretprobe_trampoline_addr()) {
correct_ret_addr = ri->ret_addr;
/*
* This is the real return address. Any other

2021-06-18 10:33:27

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler

Since the ORC unwinder from pt_regs requires setting up regs->ip
correctly, set the correct return address to the regs->ip before
calling user kretprobe handler.

This allows the kretrprobe handler to trace stack from the
kretprobe's pt_regs by stack_trace_save_regs() (eBPF will do
this), instead of stack tracing from the handler context by
stack_trace_save() (ftrace will do this).

Suggested-by: Josh Poimboeuf <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
Changes in v8:
- Update comment to clarify why this is needed.
Changes in v3:
- Cast the correct_ret_addr to unsigned long.
---
kernel/kprobes.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 650cbe738666..ba729ed05cb3 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1896,6 +1896,9 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
BUG_ON(1);
}

+ /* Set the instruction pointer to the correct address */
+ instruction_pointer_set(regs, (unsigned long)correct_ret_addr);
+
/* Run them. */
first = current->kretprobe_instances.first;
while (first) {

2021-06-18 10:33:27

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline

This changes x86/kretprobe stack frame on kretprobe_trampoline
a bit, which now push the kretprobe_trampoline as a fake return
address at the bottom of the stack frame. With this fix, the ORC
unwinder will see the kretprobe_trampoline as a return address.

Signed-off-by: Masami Hiramatsu <[email protected]>
Suggested-by: Josh Poimboeuf <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kernel/kprobes/core.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 74f049b6e77f..4d040aaf969b 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1041,28 +1041,31 @@ asm(
".global kretprobe_trampoline\n"
".type kretprobe_trampoline, @function\n"
"kretprobe_trampoline:\n"
- /* We don't bother saving the ss register */
#ifdef CONFIG_X86_64
- " pushq %rsp\n"
+ /* Push fake return address to tell the unwinder it's a kretprobe */
+ " pushq $kretprobe_trampoline\n"
UNWIND_HINT_FUNC
+ /* Save the sp-8, this will be fixed later */
+ " pushq %rsp\n"
" pushfq\n"
SAVE_REGS_STRING
" movq %rsp, %rdi\n"
" call trampoline_handler\n"
- /* Replace saved sp with true return address. */
- " movq %rax, 19*8(%rsp)\n"
RESTORE_REGS_STRING
+ " addq $8, %rsp\n"
" popfq\n"
#else
- " pushl %esp\n"
+ /* Push fake return address to tell the unwinder it's a kretprobe */
+ " pushl $kretprobe_trampoline\n"
UNWIND_HINT_FUNC
+ /* Save the sp-4, this will be fixed later */
+ " pushl %esp\n"
" pushfl\n"
SAVE_REGS_STRING
" movl %esp, %eax\n"
" call trampoline_handler\n"
- /* Replace saved sp with true return address. */
- " movl %eax, 15*4(%esp)\n"
RESTORE_REGS_STRING
+ " addl $4, %esp\n"
" popfl\n"
#endif
" ret\n"
@@ -1073,8 +1076,10 @@ NOKPROBE_SYMBOL(kretprobe_trampoline);
/*
* Called from kretprobe_trampoline
*/
-__used __visible void *trampoline_handler(struct pt_regs *regs)
+__used __visible void trampoline_handler(struct pt_regs *regs)
{
+ unsigned long *frame_pointer;
+
/* fixup registers */
regs->cs = __KERNEL_CS;
#ifdef CONFIG_X86_32
@@ -1082,8 +1087,16 @@ __used __visible void *trampoline_handler(struct pt_regs *regs)
#endif
regs->ip = (unsigned long)&kretprobe_trampoline;
regs->orig_ax = ~0UL;
+ regs->sp += sizeof(long);
+ frame_pointer = ((unsigned long *)&regs->sp) + 1;

- return (void *)kretprobe_trampoline_handler(regs, &regs->sp);
+ /* Replace fake return address with real one. */
+ *frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
+ /*
+ * Move flags to sp so that kretprobe_trapmoline can return
+ * right after popf.
+ */
+ regs->sp = regs->flags;
}
NOKPROBE_SYMBOL(trampoline_handler);


2021-06-18 10:33:36

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

Since the kretprobe replaces the function return address with
the kretprobe_trampoline on the stack, x86 unwinders can not
continue the stack unwinding at that point, or record
kretprobe_trampoline instead of correct return address.

To fix this issue, find the correct return address from task's
kretprobe_instances as like as function-graph tracer does.

With this fix, the unwinder can correctly unwind the stack
from kretprobe event on x86, as below.

<...>-135 [003] ...1 6.722338: r_full_proxy_read_0: (vfs_read+0xab/0x1a0 <- full_proxy_read)
<...>-135 [003] ...1 6.722377: <stack trace>
=> kretprobe_trace_func+0x209/0x2f0
=> kretprobe_dispatcher+0x4a/0x70
=> __kretprobe_trampoline_handler+0xca/0x150
=> trampoline_handler+0x44/0x70
=> kretprobe_trampoline+0x2a/0x50
=> vfs_read+0xab/0x1a0
=> ksys_read+0x5f/0xe0
=> do_syscall_64+0x33/0x40
=> entry_SYSCALL_64_after_hwframe+0x44/0xae


Reported-by: Daniel Xu <[email protected]>
Signed-off-by: Masami Hiramatsu <[email protected]>
Suggested-by: Josh Poimboeuf <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
Acked-by: Josh Poimboeuf <[email protected]>
---
Changes in v7:
- Remove superfluous #include <linux/kprobes.h>.
Changes in v5:
- Fix the case of interrupt happens on kretprobe_trampoline+0.
Changes in v3:
- Split out the kretprobe side patch
- Fix build error when CONFIG_KRETPROBES=n.
Changes in v2:
- Remove kretprobe wrapper functions from unwind_orc.c
- Do not fixup state->ip when unwinding with regs because
kretprobe fixup instruction pointer before calling handler.
---
arch/x86/include/asm/unwind.h | 23 +++++++++++++++++++++++
arch/x86/kernel/unwind_frame.c | 3 +--
arch/x86/kernel/unwind_guess.c | 3 +--
arch/x86/kernel/unwind_orc.c | 18 ++++++++++++++----
4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index 70fc159ebe69..36d3971c0a2c 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -4,6 +4,7 @@

#include <linux/sched.h>
#include <linux/ftrace.h>
+#include <linux/kprobes.h>
#include <asm/ptrace.h>
#include <asm/stacktrace.h>

@@ -15,6 +16,7 @@ struct unwind_state {
unsigned long stack_mask;
struct task_struct *task;
int graph_idx;
+ struct llist_node *kr_cur;
bool error;
#if defined(CONFIG_UNWINDER_ORC)
bool signal, full_regs;
@@ -99,6 +101,27 @@ void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size,
void *orc, size_t orc_size) {}
#endif

+static inline
+unsigned long unwind_recover_kretprobe(struct unwind_state *state,
+ unsigned long addr, unsigned long *addr_p)
+{
+ return is_kretprobe_trampoline(addr) ?
+ kretprobe_find_ret_addr(state->task, addr_p, &state->kr_cur) :
+ addr;
+}
+
+/* Recover the return address modified by instrumentation (e.g. kretprobe) */
+static inline
+unsigned long unwind_recover_ret_addr(struct unwind_state *state,
+ unsigned long addr, unsigned long *addr_p)
+{
+ unsigned long ret;
+
+ ret = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+ addr, addr_p);
+ return unwind_recover_kretprobe(state, ret, addr_p);
+}
+
/*
* This disables KASAN checking when reading a value from another task's stack,
* since the other task could be running on another CPU and could have poisoned
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index d7c44b257f7f..8e1c50c86e5d 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -240,8 +240,7 @@ static bool update_stack_state(struct unwind_state *state,
else {
addr_p = unwind_get_return_address_ptr(state);
addr = READ_ONCE_TASK_STACK(state->task, *addr_p);
- state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
- addr, addr_p);
+ state->ip = unwind_recover_ret_addr(state, addr, addr_p);
}

/* Save the original stack pointer for unwind_dump(): */
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
index c49f10ffd8cd..884d68a6e714 100644
--- a/arch/x86/kernel/unwind_guess.c
+++ b/arch/x86/kernel/unwind_guess.c
@@ -15,8 +15,7 @@ unsigned long unwind_get_return_address(struct unwind_state *state)

addr = READ_ONCE_NOCHECK(*state->sp);

- return ftrace_graph_ret_addr(state->task, &state->graph_idx,
- addr, state->sp);
+ return unwind_recover_ret_addr(state, addr, state->sp);
}
EXPORT_SYMBOL_GPL(unwind_get_return_address);

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a1202536fc57..ad6a9aece379 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -534,9 +534,8 @@ bool unwind_next_frame(struct unwind_state *state)
if (!deref_stack_reg(state, ip_p, &state->ip))
goto err;

- state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
- state->ip, (void *)ip_p);
-
+ state->ip = unwind_recover_ret_addr(state, state->ip,
+ (unsigned long *)ip_p);
state->sp = sp;
state->regs = NULL;
state->prev_regs = NULL;
@@ -549,7 +548,15 @@ bool unwind_next_frame(struct unwind_state *state)
(void *)orig_ip);
goto err;
}
-
+ /*
+ * There is a small chance to interrupt at the entry of
+ * kretprobe_trampoline where the ORC info doesn't exist.
+ * That point is right after the RET to kretprobe_trampoline
+ * which was modified return address. So the @addr_p must
+ * be right before the regs->sp.
+ */
+ state->ip = unwind_recover_kretprobe(state, state->ip,
+ (unsigned long *)(state->sp - sizeof(long)));
state->regs = (struct pt_regs *)sp;
state->prev_regs = NULL;
state->full_regs = true;
@@ -562,6 +569,9 @@ bool unwind_next_frame(struct unwind_state *state)
(void *)orig_ip);
goto err;
}
+ /* See UNWIND_HINT_TYPE_REGS case comment. */
+ state->ip = unwind_recover_kretprobe(state, state->ip,
+ (unsigned long *)(state->sp - sizeof(long)));

if (state->full_regs)
state->prev_regs = state->regs;

2021-06-18 10:34:14

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 12/13] tracing: Show kretprobe unknown indicator only for kretprobe_trampoline

ftrace shows "[unknown/kretprobe'd]" indicator all addresses in the
kretprobe_trampoline, but the modified address by kretprobe should
be only kretprobe_trampoline+0.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
kernel/trace/trace_output.c | 17 ++++-------------
1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index d0368a569bfa..e46780670742 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -8,6 +8,7 @@
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/ftrace.h>
+#include <linux/kprobes.h>
#include <linux/sched/clock.h>
#include <linux/sched/mm.h>

@@ -346,22 +347,12 @@ int trace_output_call(struct trace_iterator *iter, char *name, char *fmt, ...)
}
EXPORT_SYMBOL_GPL(trace_output_call);

-#ifdef CONFIG_KRETPROBES
-static inline const char *kretprobed(const char *name)
+static inline const char *kretprobed(const char *name, unsigned long addr)
{
- static const char tramp_name[] = "kretprobe_trampoline";
- int size = sizeof(tramp_name);
-
- if (strncmp(tramp_name, name, size) == 0)
+ if (is_kretprobe_trampoline(addr))
return "[unknown/kretprobe'd]";
return name;
}
-#else
-static inline const char *kretprobed(const char *name)
-{
- return name;
-}
-#endif /* CONFIG_KRETPROBES */

void
trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
@@ -374,7 +365,7 @@ trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset)
sprint_symbol(str, address);
else
kallsyms_lookup(address, NULL, NULL, NULL, str);
- name = kretprobed(str);
+ name = kretprobed(str, address);

if (name && strlen(name)) {
trace_seq_puts(s, name);

2021-06-18 11:24:37

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 07/13] ia64: Add instruction_pointer_set() API

Add instruction_pointer_set() API for ia64.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
Changes in v4:
- Make the API macro for avoiding a build error.
---
arch/ia64/include/asm/ptrace.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/ia64/include/asm/ptrace.h b/arch/ia64/include/asm/ptrace.h
index 08179135905c..a024afbc70e5 100644
--- a/arch/ia64/include/asm/ptrace.h
+++ b/arch/ia64/include/asm/ptrace.h
@@ -51,6 +51,11 @@
* the canonical representation by adding to instruction pointer.
*/
# define instruction_pointer(regs) ((regs)->cr_iip + ia64_psr(regs)->ri)
+# define instruction_pointer_set(regs, val) \
+ ({ \
+ ia64_psr(regs)->ri = (val & 0xf); \
+ regs->cr_iip = (val & ~0xfULL); \
+ })

static inline unsigned long user_stack_pointer(struct pt_regs *regs)
{

2021-06-18 11:24:37

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 06/13] ARC: Add instruction_pointer_set() API

Add instruction_pointer_set() API for arc.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
arch/arc/include/asm/ptrace.h | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/arc/include/asm/ptrace.h b/arch/arc/include/asm/ptrace.h
index 4c3c9be5bd16..cca8d6583e31 100644
--- a/arch/arc/include/asm/ptrace.h
+++ b/arch/arc/include/asm/ptrace.h
@@ -149,6 +149,11 @@ static inline long regs_return_value(struct pt_regs *regs)
return (long)regs->r0;
}

+static inline void instruction_pointer_set(struct pt_regs *regs,
+ unsigned long val)
+{
+ instruction_pointer(regs) = val;
+}
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_PTRACE_H */

2021-06-18 11:24:48

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler

In x86, kretprobe trampoline address on the stack frame will
be replaced with the real return address after returning from
trampoline_handler. Before fixing the return address, the real
return address can be found in the current->kretprobe_instances.

However, since there is a window between updating the
current->kretprobe_instances and fixing the address on the stack,
if an interrupt caused at that timing and the interrupt handler
does stacktrace, it may fail to unwind because it can not get
the correct return address from current->kretprobe_instances.

This will minimize that window by fixing the return address
right before updating current->kretprobe_instances.

Signed-off-by: Masami Hiramatsu <[email protected]>
Tested-by: Andrii Nakryik <[email protected]>
---
Changes in v7:
- Add a prototype for arch_kretprobe_fixup_return()
---
arch/x86/kernel/kprobes/core.c | 15 +++++++++++++--
include/linux/kprobes.h | 3 +++
kernel/kprobes.c | 8 ++++++++
3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4d040aaf969b..53c1dcfcb145 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -1032,6 +1032,7 @@ STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
#undef UNWIND_HINT_FUNC
#define UNWIND_HINT_FUNC
#endif
+
/*
* When a retprobed function returns, this code saves registers and
* calls trampoline_handler() runs, which calls the kretprobe's handler.
@@ -1073,6 +1074,17 @@ asm(
);
NOKPROBE_SYMBOL(kretprobe_trampoline);

+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ unsigned long *frame_pointer;
+
+ frame_pointer = ((unsigned long *)&regs->sp) + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = correct_ret_addr;
+}
+
/*
* Called from kretprobe_trampoline
*/
@@ -1090,8 +1102,7 @@ __used __visible void trampoline_handler(struct pt_regs *regs)
regs->sp += sizeof(long);
frame_pointer = ((unsigned long *)&regs->sp) + 1;

- /* Replace fake return address with real one. */
- *frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
+ kretprobe_trampoline_handler(regs, frame_pointer);
/*
* Move flags to sp so that kretprobe_trapmoline can return
* right after popf.
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 08d3415e4418..259bdc80e708 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -197,6 +197,9 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);

+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr);
+
void kretprobe_trampoline(void);
/*
* Since some architecture uses structured function pointer,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index ba729ed05cb3..72e8125fb0e9 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1881,6 +1881,12 @@ unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
}
NOKPROBE_SYMBOL(kretprobe_find_ret_addr);

+void __weak arch_kretprobe_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ /* Do nothing by default. */
+}
+
unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *frame_pointer)
{
@@ -1922,6 +1928,8 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
first = first->next;
}

+ arch_kretprobe_fixup_return(regs, (unsigned long)correct_ret_addr);
+
/* Unlink all nodes for this frame. */
first = current->kretprobe_instances.first;
current->kretprobe_instances.first = node->next;

2021-06-18 17:38:18

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH -tip v8 09/13] kprobes: Enable stacktrace from pt_regs in kretprobe handler

On Fri, Jun 18, 2021 at 04:06:46PM +0900, Masami Hiramatsu wrote:
> Since the ORC unwinder from pt_regs requires setting up regs->ip
> correctly, set the correct return address to the regs->ip before
> calling user kretprobe handler.
>
> This allows the kretrprobe handler to trace stack from the
> kretprobe's pt_regs by stack_trace_save_regs() (eBPF will do
> this), instead of stack tracing from the handler context by
> stack_trace_save() (ftrace will do this).
>
> Suggested-by: Josh Poimboeuf <[email protected]>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Tested-by: Andrii Nakryik <[email protected]>

Acked-by: Josh Poimboeuf <[email protected]>

> ---
> Changes in v8:
> - Update comment to clarify why this is needed.
> Changes in v3:
> - Cast the correct_ret_addr to unsigned long.
> ---
> kernel/kprobes.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 650cbe738666..ba729ed05cb3 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1896,6 +1896,9 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> BUG_ON(1);
> }
>
> + /* Set the instruction pointer to the correct address */
> + instruction_pointer_set(regs, (unsigned long)correct_ret_addr);
> +
> /* Run them. */
> first = current->kretprobe_instances.first;
> while (first) {
>

--
Josh

2021-06-18 22:59:43

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86

On Fri, Jun 18, 2021 at 12:05 AM Masami Hiramatsu <[email protected]> wrote:
>
> Hello,
>
> Here is the 8th version of the series to fix the stacktrace with kretprobe on x86.
>
> The previous version is;
>
> https://lore.kernel.org/bpf/162209754288.436794.3904335049560916855.stgit@devnote2/
>
> This version fixes to call appropriate function and drop some unneeded
> patches.
>
>
> Changes from v7:
> [03/13]: Call dereference_kernel_function_descriptor() for getting the
> address of kretprobe_trampoline.
> [09/13]: Update the title and description to explain why it is needed.
> [10/13][11/13]: Add Josh's Acked-by.
>
>
>
> With this series, unwinder can unwind stack correctly from ftrace as below;
>
> # cd /sys/kernel/debug/tracing
> # echo > trace
> # echo 1 > options/sym-offset
> # echo r vfs_read >> kprobe_events
> # echo r full_proxy_read >> kprobe_events
> # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> # echo 1 > events/kprobes/enable
> # cat /sys/kernel/debug/kprobes/list
> ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
> ffffffff812560b0 r vfs_read+0x0 [FTRACE]
> # echo 0 > events/kprobes/enable
> # cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 3/3 #P:8
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> <...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
> <...>-134 [007] ...1 16.185901: <stack trace>
> => kretprobe_trace_func+0x209/0x300
> => kretprobe_dispatcher+0x4a/0x70
> => __kretprobe_trampoline_handler+0xd4/0x170
> => trampoline_handler+0x43/0x60
> => kretprobe_trampoline+0x2a/0x50
> => vfs_read+0x98/0x180
> => ksys_read+0x5f/0xe0
> => do_syscall_64+0x37/0x90
> => entry_SYSCALL_64_after_hwframe+0x44/0xae
> <...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
>
> This shows the double return probes (vfs_read and full_proxy_read) on the stack
> correctly unwinded. (vfs_read will return to ksys_read+0x5f and full_proxy_read
> will return to vfs_read+0x98)
>
> This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> the pt_regs passed to kretprobe user handler is correctly set the real return
> address. So user handlers can get it via instruction_pointer() API, and can use
> stack_trace_save_regs().
>
> You can also get this series from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v8
>
>
> Thank you,
>
> ---
>
> Josh Poimboeuf (1):
> x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
>
> Masami Hiramatsu (12):
> ia64: kprobes: Fix to pass correct trampoline address to the handler
> kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
> kprobes: Add kretprobe_find_ret_addr() for searching return address
> ARC: Add instruction_pointer_set() API
> ia64: Add instruction_pointer_set() API
> arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
> kprobes: Enable stacktrace from pt_regs in kretprobe handler
> x86/kprobes: Push a fake return address at kretprobe_trampoline
> x86/unwind: Recover kretprobe trampoline entry
> tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
> x86/kprobes: Fixup return address in generic trampoline handler
>

It works for BPF cases. Thanks!

Tested-by: Andrii Nakryiko <[email protected]>


>
> arch/arc/include/asm/ptrace.h | 5 ++
> arch/arc/kernel/kprobes.c | 2 -
> arch/arm/probes/kprobes/core.c | 5 +-
> arch/arm64/kernel/probes/kprobes.c | 3 -
> arch/csky/kernel/probes/kprobes.c | 2 -
> arch/ia64/include/asm/ptrace.h | 5 ++
> arch/ia64/kernel/kprobes.c | 15 ++---
> arch/mips/kernel/kprobes.c | 3 -
> arch/parisc/kernel/kprobes.c | 4 +
> arch/powerpc/kernel/kprobes.c | 13 ----
> arch/riscv/kernel/probes/kprobes.c | 2 -
> arch/s390/kernel/kprobes.c | 2 -
> arch/sh/kernel/kprobes.c | 2 -
> arch/sparc/kernel/kprobes.c | 2 -
> arch/x86/include/asm/kprobes.h | 1
> arch/x86/include/asm/unwind.h | 23 +++++++
> arch/x86/include/asm/unwind_hints.h | 5 ++
> arch/x86/kernel/kprobes/core.c | 53 +++++++++++++++--
> arch/x86/kernel/unwind_frame.c | 3 -
> arch/x86/kernel/unwind_guess.c | 3 -
> arch/x86/kernel/unwind_orc.c | 18 +++++-
> include/linux/kprobes.h | 44 ++++++++++++--
> kernel/kprobes.c | 108 +++++++++++++++++++++++++----------
> kernel/trace/trace_output.c | 17 +-----
> lib/error-inject.c | 3 +
> 25 files changed, 238 insertions(+), 105 deletions(-)
>
> --
> Masami Hiramatsu (Linaro) <[email protected]>

2021-06-28 13:52:07

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 00/13] kprobes: Fix stacktrace with kretprobes on x86

Hi Ingo and Peter,

Can you merge this series to tip tree?
Josh reviewed the ORC unwinder parts, so I think it is a good time to pick it.
(And recently I got same effort from Qiang, he thinks this can be a phishing risk *)

* https://lore.kernel.org/bpf/CAMZfGtWPi4CuVOtmUpy2N9J_mvp+5=gSAFvqV1nmvDKP+CAvQA@mail.gmail.com/

Thank you,

On Fri, 18 Jun 2021 16:05:22 +0900
Masami Hiramatsu <[email protected]> wrote:

> Hello,
>
> Here is the 8th version of the series to fix the stacktrace with kretprobe on x86.
>
> The previous version is;
>
> https://lore.kernel.org/bpf/162209754288.436794.3904335049560916855.stgit@devnote2/
>
> This version fixes to call appropriate function and drop some unneeded
> patches.
>
>
> Changes from v7:
> [03/13]: Call dereference_kernel_function_descriptor() for getting the
> address of kretprobe_trampoline.
> [09/13]: Update the title and description to explain why it is needed.
> [10/13][11/13]: Add Josh's Acked-by.
>
>
>
> With this series, unwinder can unwind stack correctly from ftrace as below;
>
> # cd /sys/kernel/debug/tracing
> # echo > trace
> # echo 1 > options/sym-offset
> # echo r vfs_read >> kprobe_events
> # echo r full_proxy_read >> kprobe_events
> # echo traceoff:1 > events/kprobes/r_vfs_read_0/trigger
> # echo stacktrace:1 > events/kprobes/r_full_proxy_read_0/trigger
> # echo 1 > events/kprobes/enable
> # cat /sys/kernel/debug/kprobes/list
> ffffffff8133b740 r full_proxy_read+0x0 [FTRACE]
> ffffffff812560b0 r vfs_read+0x0 [FTRACE]
> # echo 0 > events/kprobes/enable
> # cat trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 3/3 #P:8
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> <...>-134 [007] ...1 16.185877: r_full_proxy_read_0: (vfs_read+0x98/0x180 <- full_proxy_read)
> <...>-134 [007] ...1 16.185901: <stack trace>
> => kretprobe_trace_func+0x209/0x300
> => kretprobe_dispatcher+0x4a/0x70
> => __kretprobe_trampoline_handler+0xd4/0x170
> => trampoline_handler+0x43/0x60
> => kretprobe_trampoline+0x2a/0x50
> => vfs_read+0x98/0x180
> => ksys_read+0x5f/0xe0
> => do_syscall_64+0x37/0x90
> => entry_SYSCALL_64_after_hwframe+0x44/0xae
> <...>-134 [007] ...1 16.185902: r_vfs_read_0: (ksys_read+0x5f/0xe0 <- vfs_read)
>
> This shows the double return probes (vfs_read and full_proxy_read) on the stack
> correctly unwinded. (vfs_read will return to ksys_read+0x5f and full_proxy_read
> will return to vfs_read+0x98)
>
> This actually changes the kretprobe behavisor a bit, now the instraction pointer in
> the pt_regs passed to kretprobe user handler is correctly set the real return
> address. So user handlers can get it via instruction_pointer() API, and can use
> stack_trace_save_regs().
>
> You can also get this series from
> git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux.git kprobes/kretprobe-stackfix-v8
>
>
> Thank you,
>
> ---
>
> Josh Poimboeuf (1):
> x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code
>
> Masami Hiramatsu (12):
> ia64: kprobes: Fix to pass correct trampoline address to the handler
> kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()
> kprobes: Add kretprobe_find_ret_addr() for searching return address
> ARC: Add instruction_pointer_set() API
> ia64: Add instruction_pointer_set() API
> arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline
> kprobes: Enable stacktrace from pt_regs in kretprobe handler
> x86/kprobes: Push a fake return address at kretprobe_trampoline
> x86/unwind: Recover kretprobe trampoline entry
> tracing: Show kretprobe unknown indicator only for kretprobe_trampoline
> x86/kprobes: Fixup return address in generic trampoline handler
>
>
> arch/arc/include/asm/ptrace.h | 5 ++
> arch/arc/kernel/kprobes.c | 2 -
> arch/arm/probes/kprobes/core.c | 5 +-
> arch/arm64/kernel/probes/kprobes.c | 3 -
> arch/csky/kernel/probes/kprobes.c | 2 -
> arch/ia64/include/asm/ptrace.h | 5 ++
> arch/ia64/kernel/kprobes.c | 15 ++---
> arch/mips/kernel/kprobes.c | 3 -
> arch/parisc/kernel/kprobes.c | 4 +
> arch/powerpc/kernel/kprobes.c | 13 ----
> arch/riscv/kernel/probes/kprobes.c | 2 -
> arch/s390/kernel/kprobes.c | 2 -
> arch/sh/kernel/kprobes.c | 2 -
> arch/sparc/kernel/kprobes.c | 2 -
> arch/x86/include/asm/kprobes.h | 1
> arch/x86/include/asm/unwind.h | 23 +++++++
> arch/x86/include/asm/unwind_hints.h | 5 ++
> arch/x86/kernel/kprobes/core.c | 53 +++++++++++++++--
> arch/x86/kernel/unwind_frame.c | 3 -
> arch/x86/kernel/unwind_guess.c | 3 -
> arch/x86/kernel/unwind_orc.c | 18 +++++-
> include/linux/kprobes.h | 44 ++++++++++++--
> kernel/kprobes.c | 108 +++++++++++++++++++++++++----------
> kernel/trace/trace_output.c | 17 +-----
> lib/error-inject.c | 3 +
> 25 files changed, 238 insertions(+), 105 deletions(-)
>
> --
> Masami Hiramatsu (Linaro) <[email protected]>


--
Masami Hiramatsu <[email protected]>

2021-07-05 07:06:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()


* Masami Hiramatsu <[email protected]> wrote:

> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -197,15 +197,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
> struct pt_regs *regs);
> extern int arch_trampoline_kprobe(struct kprobe *p);
>
> +void kretprobe_trampoline(void);
> +/*
> + * Since some architecture uses structured function pointer,
> + * use dereference_function_descriptor() to get real function address.
> + */
> +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> +{
> + return dereference_kernel_function_descriptor(kretprobe_trampoline);
> +}

Could we please also make 'kretprobe_trampoline' harder to use
accidentally, such by naming it appropriately?

__kretprobe_trampoline would be a good first step I guess.

Thanks,

Ingo

2021-07-05 07:43:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address


* Masami Hiramatsu <[email protected]> wrote:

> Add kretprobe_find_ret_addr() for searching correct return address
> from kretprobe instance list.

A better changelog:

Add kretprobe_find_ret_addr() for searching the correct return address
from the kretprobe instances list.

But an explanation of *why* we want to add this function would be even
better. Is it a cleanup? Is it in preparation for future changes?

Plus:

> include/linux/kprobes.h | 22 +++++++++++
> kernel/kprobes.c | 91 ++++++++++++++++++++++++++++++++++-------------
> 2 files changed, 87 insertions(+), 26 deletions(-)
>
> diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> index 5ce677819a25..08d3415e4418 100644
> --- a/include/linux/kprobes.h
> +++ b/include/linux/kprobes.h
> @@ -207,6 +207,14 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
> return dereference_kernel_function_descriptor(kretprobe_trampoline);
> }
>
> +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> +{
> + return (void *)addr == kretprobe_trampoline_addr();
> +}
> +
> +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> + struct llist_node **cur);

These prototypes for helpers are put into a section of:

#ifdef CONFIG_KRETPROBES

But:

> +#if !defined(CONFIG_KRETPROBES)
> +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> +{
> + return false;
> +}
> +
> +static nokprobe_inline
> +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> + struct llist_node **cur)
> +{
> + return 0;
> +}
> +#endif

Why does this use such a weird pattern? What is wrong with:

#ifndef CONFIG_KRETPROBES

But more importantly, why isn't this in the regular '#else' block of the
CONFIG_KRETPROBES block you added the other functions to ??

Why this intentional obfuscation combined with poor changelogs - is the
kprobes code too easy to read, does it have too few bugs?

And this series is on v8 already, and nobody noticed this?

> +/* This assumes the tsk is current or the task which is not running. */
> +static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
> + struct llist_node **cur)


A better comment:

/* This assumes 'tsk' is the current task, or is not running. */

We always escape variable names in English sentences. This is nothing new.

> + *cur = node;
> + return (unsigned long)ri->ret_addr;

Don't just randomly add forced type casts (which are dangerous,
bug-inducing patterns of code) without examining whether it's justified.

But a compiler warning is not justification!

In this case the examination would involve:

kepler:~/tip> git grep -w ret_addr kernel/kprobes.c

kernel/kprobes.c: if (ri->ret_addr != kretprobe_trampoline_addr()) {
kernel/kprobes.c: return (unsigned long)ri->ret_addr;
kernel/kprobes.c: ri->ret_addr = correct_ret_addr;

kepler:~/tip> git grep -w correct_ret_addr kernel/kprobes.c

kernel/kprobes.c: kprobe_opcode_t *correct_ret_addr = NULL;
kernel/kprobes.c: correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
kernel/kprobes.c: if (!correct_ret_addr) {
kernel/kprobes.c: ri->ret_addr = correct_ret_addr;
kernel/kprobes.c: return (unsigned long)correct_ret_addr;

what we can see here is unnecessary type confusion & friction of the first
degree:

- 'correct_ret_addr' is 'kprobe_opcode_t *' (which is good), but the newly
introduced __kretprobe_find_ret_addr() function doesn't return such a
type - why?

- struct_kretprobe_instance::ret_address has a 'kprobe_opcode_t *' type as
well - which is good.

- kretprobe_find_ret_addr() uses 'unsigned long', but it returns the value
to __kretprobe_trampoline_handler(), which does *another* forced type
cast:

+ correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);


So we have the following type conversions:

kprobe_opcode_t * => unsigned long => unsigned long => kprobe_opcode_t *

Is there a technical reason why we cannot just use 'kprobe_opcode_t *'.

All other type casts in the kprobes code should be reviewed as well.

> - BUG_ON(1);
> + return 0;

And in the proper, intact type propagation model this would become
'return NULL' - which is *far* more obviously a 'not found' condition
than a random zero that might mean anything...

> +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> + struct llist_node **cur)
> +{
> + struct kretprobe_instance *ri = NULL;
> + unsigned long ret;
> +
> + do {
> + ret = __kretprobe_find_ret_addr(tsk, cur);
> + if (!ret)
> + return ret;
> + ri = container_of(*cur, struct kretprobe_instance, llist);
> + } while (ri->fp != fp);
> +
> + return ret;

Here I see another type model problem: why is the frame pointer 'void *',
which makes it way too easy to mix up with text pointers such as
'kprobe_opcode_t *'?

In the x86 unwinder we use 'unsigned long *' as the frame pointer:

unsigned long *bp

but it might also make sense to introduce a more opaque dedicated type
within the kprobes code, such as 'frame_pointer_t'.

> +unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> + void *frame_pointer)
> +{
> + kprobe_opcode_t *correct_ret_addr = NULL;
> + struct kretprobe_instance *ri = NULL;
> + struct llist_node *first, *node = NULL;
> + struct kretprobe *rp;
> +
> + /* Find correct address and all nodes for this frame. */
> + correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
> + if (!correct_ret_addr) {
> + pr_err("Oops! Kretprobe fails to find correct return address.\n");

Could we please make user-facing messages less random? Right now we have:

kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c)

arch/arm64/kernel/probes/kprobes.c: pr_warn("Unrecoverable kprobe detected.\n");
arch/csky/kernel/probes/kprobes.c: pr_warn("Address not aligned.\n");
arch/csky/kernel/probes/kprobes.c: pr_warn("Unrecoverable kprobe detected.\n");
arch/mips/kernel/kprobes.c: pr_notice("Kprobes for ll and sc instructions are not"
arch/mips/kernel/kprobes.c: pr_notice("Kprobes for branch delayslot are not supported\n");
arch/mips/kernel/kprobes.c: pr_notice("Kprobes for compact branches are not supported\n");
arch/mips/kernel/kprobes.c: pr_notice("%s: unaligned epc - sending SIGBUS.\n", current->comm);
arch/mips/kernel/kprobes.c: pr_notice("Kprobes: Error in evaluating branch\n");
arch/riscv/kernel/probes/kprobes.c: pr_warn("Address not aligned.\n");
arch/riscv/kernel/probes/kprobes.c: pr_warn("Unrecoverable kprobe detected.\n");
arch/s390/kernel/kprobes.c: pr_err("Invalid kprobe detected.\n");
kernel/kprobes.c: pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n",
kernel/kprobes.c: pr_debug("Failed to init kprobe-ftrace (%d)\n", ret);
kernel/kprobes.c: pr_err("Oops! Kretprobe fails to find correct return address.\n");
kernel/kprobes.c: pr_err("Dumping kprobe:\n");
kernel/kprobes.c: pr_err("Name: %s\nOffset: %x\nAddress: %pS\n",
kernel/kprobes.c: pr_err("kprobes: failed to populate blacklist: %d\n", err);
kernel/kprobes.c: pr_err("Please take care of using kprobes.\n");
kernel/kprobes.c: pr_warn("Kprobes globally enabled, but failed to arm %d out of %d probes\n",
kernel/kprobes.c: pr_info("Kprobes globally enabled\n");
kernel/kprobes.c: pr_warn("Kprobes globally disabled, but failed to disarm %d out of %d probes\n",
kernel/kprobes.c: pr_info("Kprobes globally disabled\n");

In particular, what users may see in their syslog, when the kprobes code
runs into trouble, is, roughly:

kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c) | cut -d\" -f2

Unrecoverable kprobe detected.\n
Address not aligned.\n
Unrecoverable kprobe detected.\n
Kprobes for ll and sc instructions are not
Kprobes for branch delayslot are not supported\n
Kprobes for compact branches are not supported\n
%s: unaligned epc - sending SIGBUS.\n
Kprobes: Error in evaluating branch\n
Address not aligned.\n
Unrecoverable kprobe detected.\n
Invalid kprobe detected.\n
Failed to arm kprobe-ftrace at %pS (%d)\n
Failed to init kprobe-ftrace (%d)\n
Oops! Kretprobe fails to find correct return address.\n
Dumping kprobe:\n
Name: %s\nOffset: %x\nAddress: %pS\n
kprobes: failed to populate blacklist: %d\n
Please take care of using kprobes.\n
Kprobes globally enabled, but failed to arm %d out of %d probes\n
Kprobes globally enabled\n
Kprobes globally disabled, but failed to disarm %d out of %d probes\n
Kprobes globally disabled\n

Ugh. Some of the messages don't even have 'kprobes' in them...

So my suggestion would be:

- Introduce a subsystem syslog message prefix, via the standard pattern of:

#define pr_fmt(fmt) "kprobes: " fmt

- Standardize the messages:

- Start each message with a key noun that stresses the nature of the
failure.

- *Make each message self-explanatory*, don't leave users hanging in
the air about what is going to happen next. Messages like:

Address not aligned.\n

- Check spelling. This:

pr_err("kprobes: failed to populate blacklist: %d\n", err);
pr_err("Please take care of using kprobes.\n");

should be on a single line and should probably say something like:

pr_err("kprobes: Failed to populate blacklist (error: %d), kprobes not restricted, be careful using them!.\n", err);

and if checkpatch complains that the line is 'too long', ignore
checkpatch and keep the message self-contained.

- and most importantly: provide a suggested *resolution*.
Passive-aggressive messages like:

Oops! Kretprobe fails to find correct return address.\n

are next to useless. Instead, always describe:

- what happened,
- what is the kernel going to do or not do,
- is the kernel fine,
- what can the user do about it.

In this case, a better message would be:

kretprobes: Return address not found, not executing handler. Kernel is probably fine, but check the system tool that did this.

Each and every message should be reviewed & fixed to meet these standards -
or should be removed and replaced with a WARN_ON() if it's indicating an
internal bug that cannot be caused by kprobes using tools, such as this
one:

> + if (WARN_ON_ONCE(ri->fp != frame_pointer))
> + break;

I can help double checking the fixed messages, if you are unsure about any
of them.

> + /* Recycle them. */
> + while (first) {
> + ri = container_of(first, struct kretprobe_instance, llist);
> + first = first->next;
>
> recycle_rp_inst(ri);
> }

It would be helpful to explain, a bit more verbose comment, what
'recycling' is in this context. The code is not very helpful:

NOKPROBE_SYMBOL(free_rp_inst_rcu);

static void recycle_rp_inst(struct kretprobe_instance *ri)
{
struct kretprobe *rp = get_kretprobe(ri);

if (likely(rp)) {
freelist_add(&ri->freelist, &rp->freelist);
} else
call_rcu(&ri->rcu, free_rp_inst_rcu);
}
NOKPROBE_SYMBOL(recycle_rp_inst);

BTW., why are unnecessary curly braces used here?

The kprobes code urgently needs a quality boost.

Thanks,

Ingo

2021-07-05 07:48:04

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler


* Masami Hiramatsu <[email protected]> wrote:

> Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> missed to pass the wrong trampoline address (it passes the descriptor address
> instead of function entry address).
> This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
> This also changes to use correct symbol dereference function to get the
> function address from the kretprobe_trampoline.
>
> Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> Signed-off-by: Masami Hiramatsu <[email protected]>

A better changelog:

The following commit:

Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")

Passed the wrong trampoline address to __kretprobe_trampoline_handler(): it
passes the descriptor address instead of function entry address.

Pass the right parameter.

Also use correct symbol dereference function to get the function address
from 'kretprobe_trampoline' - an IA64 special.

(Although I realize that much of this goes away just a couple of patches
later.)

Thanks,

Ingo

2021-07-05 07:49:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()


* Masami Hiramatsu <[email protected]> wrote:

> Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> because those are doing same thing.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Tested-by: Andrii Nakryik <[email protected]>

A better changelog:

~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:

3d7e33825d87: ("jprobes: make jprobes a little safer for users")

But this is just open-coded dereference_symbol_descriptor() in essence, and
its obscure nature was causing bugs.

Just use the real thing.

Thanks,

Ingo

2021-07-05 07:52:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()


* Masami Hiramatsu <[email protected]> wrote:

> Remove trampoline_address from kretprobe_trampoline_handler().
> Instead of passing the address, kretprobe_trampoline_handler()
> can use new kretprobe_trampoline_addr().
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Tested-by: Andrii Nakryik <[email protected]>

A better changelog:

The __kretprobe_trampoline_handler() callback, called from low level
arch kprobes methods, has the 'trampoline_address' parameter, which is
entirely superfluous as it basically just replicates:

dereference_kernel_function_descriptor(kretprobe_trampoline)

In fact we had bugs in arch code where it wasn't replicated correctly.

So remove this superfluous parameter and use kretprobe_trampoline_addr()
instead.

Thanks,

Ingo

2021-07-05 08:06:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code


* Masami Hiramatsu <[email protected]> wrote:

> From: Josh Poimboeuf <[email protected]>
>
> Add UNWIND_HINT_FUNC on kretporbe_trampoline code so that ORC
> information is generated on the kretprobe_trampoline correctly.

What is a 'kretporbe'?

> Note that when the CONFIG_FRAME_POINTER=y, since the
> kretprobe_trampoline skips updating frame pointer, the stack frame
> of the kretprobe_trampoline seems non-standard. So this marks it
> is STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC.

What does 'marks it is' mean?

'undefine' UNWIND_HINT_FUNC?

Doesn't the patch do the exact opposite:

> +#define UNWIND_HINT_FUNC \
> + UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)

But it does undefine it in a specific spot:



> Anyway, with the frame pointer, FP unwinder can unwind the stack
> frame correctly without that hint.
>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Tested-by: Andrii Nakryik <[email protected]>

I have to say these changelogs are very careless.

> +#else
> +

In headers, in longer CPP blocks, please always mark the '#else' branch
with what it is the else branch of.

See the output of:

kepler:~/tip> git grep '#else' arch/x86/include/asm/ | head

> +#ifdef CONFIG_FRAME_POINTER
> +/*
> + * kretprobe_trampoline skips updating frame pointer. The frame pointer
> + * saved in trampoline_handler points to the real caller function's
> + * frame pointer. Thus the kretprobe_trampoline doesn't seems to have a
> + * standard stack frame with CONFIG_FRAME_POINTER=y.
> + * Let's mark it non-standard function. Anyway, FP unwinder can correctly
> + * unwind without the hint.

s/doesn't seems to have a standard stack frame
/doesn't have a standard stack frame

There's nothing 'seems' about the situation - it's a non-standard function
entry and stack frame situation, and the unwinder needs to know about it.

> +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> +#undef UNWIND_HINT_FUNC
> +#define UNWIND_HINT_FUNC
> +#endif
> /*
> * When a retprobed function returns, this code saves registers and
> * calls trampoline_handler() runs, which calls the kretprobe's handler.
> @@ -1031,6 +1044,7 @@ asm(
> /* We don't bother saving the ss register */
> #ifdef CONFIG_X86_64
> " pushq %rsp\n"
> + UNWIND_HINT_FUNC
> " pushfq\n"
> SAVE_REGS_STRING
> " movq %rsp, %rdi\n"
> @@ -1041,6 +1055,7 @@ asm(
> " popfq\n"
> #else
> " pushl %esp\n"
> + UNWIND_HINT_FUNC
> " pushfl\n"
> SAVE_REGS_STRING
> " movl %esp, %eax\n"

Why not provide an appropriate annotation method in <asm/unwind_hints.h>,
so that other future code can use it too instead of reinventing the wheel?

Thanks,

Ingo

2021-07-05 08:08:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline


* Masami Hiramatsu <[email protected]> wrote:

> Change kretprobe_trampoline to make a space for regs->ARM_pc so that
> kretprobe_trampoline_handler can call instruction_pointer_set()
> safely.

The idiom is "make space", but in any case, what does this mean?

Was the stack frame set up in kretprobe_trampoline() and calling
trampoline_handler() buggy?

If yes, then explain the bad effects of the bug, and make all of this clear
in the title & changelog.

Thanks,

Ingo

2021-07-05 08:18:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline


* Masami Hiramatsu <[email protected]> wrote:

> + /* Replace fake return address with real one. */
> + *frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
> + /*
> + * Move flags to sp so that kretprobe_trapmoline can return
> + * right after popf.

What is a trapmoline?

Also, in the x86 code we capitalize register and instruction names so that
they are more distinctive and easier to read in the flow of English text.

Thanks,

Ingo

2021-07-05 08:35:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler


* Masami Hiramatsu <[email protected]> wrote:

> In x86, kretprobe trampoline address on the stack frame will
> be replaced with the real return address after returning from
> trampoline_handler. Before fixing the return address, the real
> return address can be found in the current->kretprobe_instances.
>
> However, since there is a window between updating the
> current->kretprobe_instances and fixing the address on the stack,
> if an interrupt caused at that timing and the interrupt handler
> does stacktrace, it may fail to unwind because it can not get
> the correct return address from current->kretprobe_instances.
>
> This will minimize that window by fixing the return address
> right before updating current->kretprobe_instances.

Is there still a window? I.e. is it "minimized" (to how big of a window?),
or eliminated?

> +void arch_kretprobe_fixup_return(struct pt_regs *regs,
> + unsigned long correct_ret_addr)
> +{
> + unsigned long *frame_pointer;
> +
> + frame_pointer = ((unsigned long *)&regs->sp) + 1;
> +
> + /* Replace fake return address with real one. */
> + *frame_pointer = correct_ret_addr;

Firstly, why does &regs->sp have to be forced to 'unsigned long *'?

pt_regs::sp is 'unsigned long' on both 32-bit and 64-bit kernels AFAICS.

Secondly, the new code modified by your patch now looks like this:

frame_pointer = ((unsigned long *)&regs->sp) + 1;

+ kretprobe_trampoline_handler(regs, frame_pointer);

where:

+void arch_kretprobe_fixup_return(struct pt_regs *regs,
+ unsigned long correct_ret_addr)
+{
+ unsigned long *frame_pointer;
+
+ frame_pointer = ((unsigned long *)&regs->sp) + 1;
+
+ /* Replace fake return address with real one. */
+ *frame_pointer = correct_ret_addr;
+}

So we first do:

frame_pointer = ((unsigned long *)&regs->sp) + 1;

... and pass that in to arch_kretprobe_fixup_return() as
'correct_ret_addr', which does:

+ frame_pointer = ((unsigned long *)&regs->sp) + 1;
+ *frame_pointer = correct_ret_addr;

... which looks like the exact same thing as:

*frame_pointer = frame_pointer;

... obfuscated through a thick layer of type casts?

Thanks,

Ingo

2021-07-05 10:05:36

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 03/13] kprobes: treewide: Remove trampoline_address from kretprobe_trampoline_handler()

Hi Ingo,

On Mon, 5 Jul 2021 09:03:19 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -197,15 +197,23 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
> > struct pt_regs *regs);
> > extern int arch_trampoline_kprobe(struct kprobe *p);
> >
> > +void kretprobe_trampoline(void);
> > +/*
> > + * Since some architecture uses structured function pointer,
> > + * use dereference_function_descriptor() to get real function address.
> > + */
> > +static nokprobe_inline void *kretprobe_trampoline_addr(void)
> > +{
> > + return dereference_kernel_function_descriptor(kretprobe_trampoline);
> > +}
>
> Could we please also make 'kretprobe_trampoline' harder to use
> accidentally, such by naming it appropriately?
>
> __kretprobe_trampoline would be a good first step I guess.

Good idea! Let me update it.

Thank you,



--
Masami Hiramatsu <[email protected]>

2021-07-05 10:07:38

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 01/13] ia64: kprobes: Fix to pass correct trampoline address to the handler

On Mon, 5 Jul 2021 09:46:33 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> > missed to pass the wrong trampoline address (it passes the descriptor address
> > instead of function entry address).
> > This fixes it to pass correct trampoline address to __kretprobe_trampoline_handler().
> > This also changes to use correct symbol dereference function to get the
> > function address from the kretprobe_trampoline.
> >
> > Fixes: e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
> > Signed-off-by: Masami Hiramatsu <[email protected]>
>
> A better changelog:
>
> The following commit:
>
> Commit e792ff804f49 ("ia64: kprobes: Use generic kretprobe trampoline handler")
>
> Passed the wrong trampoline address to __kretprobe_trampoline_handler(): it
> passes the descriptor address instead of function entry address.
>
> Pass the right parameter.
>
> Also use correct symbol dereference function to get the function address
> from 'kretprobe_trampoline' - an IA64 special.

Thanks for rewriting! OK, I'll update it.

>
> (Although I realize that much of this goes away just a couple of patches
> later.)

Yes, but since this is a real bug. I think I should split it for backporting
to stable trees. (Oh, I also forgot to add Cc: stable. Sorry about that.)

Thank you,


--
Masami Hiramatsu <[email protected]>

2021-07-05 12:04:40

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()

On Mon, 5 Jul 2021 09:48:09 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> > because those are doing same thing.
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Tested-by: Andrii Nakryik <[email protected]>
>
> A better changelog:
>
> ~15 years ago kprobes grew the 'arch_deref_entry_point()' __weak function:
>
> 3d7e33825d87: ("jprobes: make jprobes a little safer for users")
>
> But this is just open-coded dereference_symbol_descriptor() in essence, and
> its obscure nature was causing bugs.
>
> Just use the real thing.

OK. BTW, I couldn't find actual bugs from it. What about this?

"its obscure nature was causing problems in the past."

Thank you,

>
> Thanks,
>
> Ingo


--
Masami Hiramatsu <[email protected]>

2021-07-05 14:12:57

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 04/13] kprobes: Add kretprobe_find_ret_addr() for searching return address

Hi Ingo,

On Mon, 5 Jul 2021 09:42:44 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > Add kretprobe_find_ret_addr() for searching correct return address
> > from kretprobe instance list.
>
> A better changelog:
>
> Add kretprobe_find_ret_addr() for searching the correct return address
> from the kretprobe instances list.
>
> But an explanation of *why* we want to add this function would be even
> better. Is it a cleanup? Is it in preparation for future changes?

It's latter. This is for exposing kretprobe_find_ret_addr() and
is_kretprobe_trampoline(), which will be used in the 11/13.

>
> Plus:
>
> > include/linux/kprobes.h | 22 +++++++++++
> > kernel/kprobes.c | 91 ++++++++++++++++++++++++++++++++++-------------
> > 2 files changed, 87 insertions(+), 26 deletions(-)
> >
> > diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
> > index 5ce677819a25..08d3415e4418 100644
> > --- a/include/linux/kprobes.h
> > +++ b/include/linux/kprobes.h
> > @@ -207,6 +207,14 @@ static nokprobe_inline void *kretprobe_trampoline_addr(void)
> > return dereference_kernel_function_descriptor(kretprobe_trampoline);
> > }
> >
> > +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> > +{
> > + return (void *)addr == kretprobe_trampoline_addr();
> > +}
> > +
> > +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> > + struct llist_node **cur);
>
> These prototypes for helpers are put into a section of:
>
> #ifdef CONFIG_KRETPROBES
>
> But:
>
> > +#if !defined(CONFIG_KRETPROBES)
> > +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
> > +{
> > + return false;
> > +}
> > +
> > +static nokprobe_inline
> > +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> > + struct llist_node **cur)
> > +{
> > + return 0;
> > +}
> > +#endif
>
> Why does this use such a weird pattern? What is wrong with:
>
> #ifndef CONFIG_KRETPROBES
>
> But more importantly, why isn't this in the regular '#else' block of the
> CONFIG_KRETPROBES block you added the other functions to ??

This is because there can be CONFIG_KPROBES=y but CONFIG_KRETPROBES=n case.

There are 3 combinations
1. CONFIG_KPROBES=y && CONFIG_KRETPROBES=y
2. CONFIG_KPROBES=y && CONFIG_KRETPROBES=n
3. CONFIG_KPROBES=n && CONFIG_KRETPROBES=n
The former definition covers case#1(note that this is in the #ifdef CONFIG_KPROBES),
and latter covers case #2 and #3.
(BTW, nowadays case #2 doesn't exist, so I think I should remove CONFIG_KRETPROBES)

Anyway, I'll put both at the last so that easier to read, something like

#ifdef CONFIG_KRETPROBES
static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
...
#else
static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr)
...
#endif

>
> Why this intentional obfuscation combined with poor changelogs - is the
> kprobes code too easy to read, does it have too few bugs?
>
> And this series is on v8 already, and nobody noticed this?
>
> > +/* This assumes the tsk is current or the task which is not running. */
> > +static unsigned long __kretprobe_find_ret_addr(struct task_struct *tsk,
> > + struct llist_node **cur)
>
>
> A better comment:
>
> /* This assumes 'tsk' is the current task, or is not running. */
>
> We always escape variable names in English sentences. This is nothing new.

OK.

>
> > + *cur = node;
> > + return (unsigned long)ri->ret_addr;
>
> Don't just randomly add forced type casts (which are dangerous,
> bug-inducing patterns of code) without examining whether it's justified.

Yes, I need to cleanup kprobes code, it seems too many '*' <-> 'unsinged long'
type castings.

> But a compiler warning is not justification!
>
> In this case the examination would involve:
>
> kepler:~/tip> git grep -w ret_addr kernel/kprobes.c
>
> kernel/kprobes.c: if (ri->ret_addr != kretprobe_trampoline_addr()) {
> kernel/kprobes.c: return (unsigned long)ri->ret_addr;
> kernel/kprobes.c: ri->ret_addr = correct_ret_addr;
>
> kepler:~/tip> git grep -w correct_ret_addr kernel/kprobes.c
>
> kernel/kprobes.c: kprobe_opcode_t *correct_ret_addr = NULL;
> kernel/kprobes.c: correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
> kernel/kprobes.c: if (!correct_ret_addr) {
> kernel/kprobes.c: ri->ret_addr = correct_ret_addr;
> kernel/kprobes.c: return (unsigned long)correct_ret_addr;
>
> what we can see here is unnecessary type confusion & friction of the first
> degree:
>
> - 'correct_ret_addr' is 'kprobe_opcode_t *' (which is good), but the newly
> introduced __kretprobe_find_ret_addr() function doesn't return such a
> type - why?

OK, this is my mistake. Since 'kprobe_opcode_t *' is used only inside the
kprobes, I would like to make kretprobe_find_ret_addr() returning 'unsigned
long'. But I used 'unsigned long' for internal function too.

> - struct_kretprobe_instance::ret_address has a 'kprobe_opcode_t *' type as
> well - which is good.
>
> - kretprobe_find_ret_addr() uses 'unsigned long', but it returns the value
> to __kretprobe_trampoline_handler(), which does *another* forced type
> cast:
>
> + correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);

This is '__kretprobe_find_ret_addr()', an internal function, which should
be fixed to return 'kprobe_opcode_t *'.

But I would like to keep the 'kretprobe_find_ret_addr()' returns 'unsigned long'
because it is used from stack unwinder, which uses 'unsigned long' for the
address type. What would you think?

> So we have the following type conversions:
>
> kprobe_opcode_t * => unsigned long => unsigned long => kprobe_opcode_t *
>
> Is there a technical reason why we cannot just use 'kprobe_opcode_t *'.

OK, I'll use the 'kprobe_opcode_t *' unless it is exposed to other subsystem.

>
> All other type casts in the kprobes code should be reviewed as well.
>
> > - BUG_ON(1);
> > + return 0;
>
> And in the proper, intact type propagation model this would become
> 'return NULL' - which is *far* more obviously a 'not found' condition
> than a random zero that might mean anything...

OK.

>
> > +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp,
> > + struct llist_node **cur)
> > +{
> > + struct kretprobe_instance *ri = NULL;
> > + unsigned long ret;
> > +
> > + do {
> > + ret = __kretprobe_find_ret_addr(tsk, cur);
> > + if (!ret)
> > + return ret;
> > + ri = container_of(*cur, struct kretprobe_instance, llist);
> > + } while (ri->fp != fp);
> > +
> > + return ret;
>
> Here I see another type model problem: why is the frame pointer 'void *',
> which makes it way too easy to mix up with text pointers such as
> 'kprobe_opcode_t *'?

(at that moment, I just used same type of 'kretprobe_instance->fp')

>
> In the x86 unwinder we use 'unsigned long *' as the frame pointer:
>
> unsigned long *bp
>
> but it might also make sense to introduce a more opaque dedicated type
> within the kprobes code, such as 'frame_pointer_t'.
>
> > +unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
> > + void *frame_pointer)
> > +{
> > + kprobe_opcode_t *correct_ret_addr = NULL;
> > + struct kretprobe_instance *ri = NULL;
> > + struct llist_node *first, *node = NULL;
> > + struct kretprobe *rp;
> > +
> > + /* Find correct address and all nodes for this frame. */
> > + correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);
> > + if (!correct_ret_addr) {
> > + pr_err("Oops! Kretprobe fails to find correct return address.\n");
>
> Could we please make user-facing messages less random? Right now we have:

OK. Those are historically randomly expanded. Now the time to clean up.

>
> kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c)
>
> arch/arm64/kernel/probes/kprobes.c: pr_warn("Unrecoverable kprobe detected.\n");
> arch/csky/kernel/probes/kprobes.c: pr_warn("Address not aligned.\n");
> arch/csky/kernel/probes/kprobes.c: pr_warn("Unrecoverable kprobe detected.\n");
> arch/mips/kernel/kprobes.c: pr_notice("Kprobes for ll and sc instructions are not"
> arch/mips/kernel/kprobes.c: pr_notice("Kprobes for branch delayslot are not supported\n");
> arch/mips/kernel/kprobes.c: pr_notice("Kprobes for compact branches are not supported\n");
> arch/mips/kernel/kprobes.c: pr_notice("%s: unaligned epc - sending SIGBUS.\n", current->comm);
> arch/mips/kernel/kprobes.c: pr_notice("Kprobes: Error in evaluating branch\n");
> arch/riscv/kernel/probes/kprobes.c: pr_warn("Address not aligned.\n");
> arch/riscv/kernel/probes/kprobes.c: pr_warn("Unrecoverable kprobe detected.\n");
> arch/s390/kernel/kprobes.c: pr_err("Invalid kprobe detected.\n");
> kernel/kprobes.c: pr_debug("Failed to arm kprobe-ftrace at %pS (%d)\n",
> kernel/kprobes.c: pr_debug("Failed to init kprobe-ftrace (%d)\n", ret);
> kernel/kprobes.c: pr_err("Oops! Kretprobe fails to find correct return address.\n");
> kernel/kprobes.c: pr_err("Dumping kprobe:\n");
> kernel/kprobes.c: pr_err("Name: %s\nOffset: %x\nAddress: %pS\n",
> kernel/kprobes.c: pr_err("kprobes: failed to populate blacklist: %d\n", err);
> kernel/kprobes.c: pr_err("Please take care of using kprobes.\n");
> kernel/kprobes.c: pr_warn("Kprobes globally enabled, but failed to arm %d out of %d probes\n",
> kernel/kprobes.c: pr_info("Kprobes globally enabled\n");
> kernel/kprobes.c: pr_warn("Kprobes globally disabled, but failed to disarm %d out of %d probes\n",
> kernel/kprobes.c: pr_info("Kprobes globally disabled\n");
>
> In particular, what users may see in their syslog, when the kprobes code
> runs into trouble, is, roughly:
>
> kepler:~/tip> git grep -E 'pr_.*\(' kernel/kprobes.c include/linux/kprobes.h include/asm-generic/kprobes.h $(find arch/ -name kprobes.c) | cut -d\" -f2
>
> Unrecoverable kprobe detected.\n
> Address not aligned.\n
> Unrecoverable kprobe detected.\n
> Kprobes for ll and sc instructions are not
> Kprobes for branch delayslot are not supported\n
> Kprobes for compact branches are not supported\n
> %s: unaligned epc - sending SIGBUS.\n
> Kprobes: Error in evaluating branch\n
> Address not aligned.\n
> Unrecoverable kprobe detected.\n
> Invalid kprobe detected.\n
> Failed to arm kprobe-ftrace at %pS (%d)\n
> Failed to init kprobe-ftrace (%d)\n
> Oops! Kretprobe fails to find correct return address.\n
> Dumping kprobe:\n
> Name: %s\nOffset: %x\nAddress: %pS\n
> kprobes: failed to populate blacklist: %d\n
> Please take care of using kprobes.\n
> Kprobes globally enabled, but failed to arm %d out of %d probes\n
> Kprobes globally enabled\n
> Kprobes globally disabled, but failed to disarm %d out of %d probes\n
> Kprobes globally disabled\n
>
> Ugh. Some of the messages don't even have 'kprobes' in them...

Indeed.

>
> So my suggestion would be:
>
> - Introduce a subsystem syslog message prefix, via the standard pattern of:
>
> #define pr_fmt(fmt) "kprobes: " fmt

OK.

>
> - Standardize the messages:
>
> - Start each message with a key noun that stresses the nature of the
> failure.
>
> - *Make each message self-explanatory*, don't leave users hanging in
> the air about what is going to happen next. Messages like:
>
> Address not aligned.\n
>
> - Check spelling. This:
>
> pr_err("kprobes: failed to populate blacklist: %d\n", err);
> pr_err("Please take care of using kprobes.\n");
>
> should be on a single line and should probably say something like:
>
> pr_err("kprobes: Failed to populate blacklist (error: %d), kprobes not restricted, be careful using them!.\n", err);
>
> and if checkpatch complains that the line is 'too long', ignore
> checkpatch and keep the message self-contained.

OK.

>
> - and most importantly: provide a suggested *resolution*.
> Passive-aggressive messages like:
>
> Oops! Kretprobe fails to find correct return address.\n
>
> are next to useless. Instead, always describe:
>
> - what happened,
> - what is the kernel going to do or not do,
> - is the kernel fine,
> - what can the user do about it.

OK, since above message is a kind of dying message, it must be important to notice
that the thread may not possible to continue to work.

>
> In this case, a better message would be:
>
> kretprobes: Return address not found, not executing handler. Kernel is probably fine, but check the system tool that did this.

If this happens, the kernel calls BUG_ON(1) because it is unrecovarable error
and there may be kernel bug. Can I say "kernel is probably fine"?

>
> Each and every message should be reviewed & fixed to meet these standards -
> or should be removed and replaced with a WARN_ON() if it's indicating an
> internal bug that cannot be caused by kprobes using tools, such as this
> one:
>
> > + if (WARN_ON_ONCE(ri->fp != frame_pointer))
> > + break;
>
> I can help double checking the fixed messages, if you are unsure about any
> of them.

Thanks for you help!

>
> > + /* Recycle them. */
> > + while (first) {
> > + ri = container_of(first, struct kretprobe_instance, llist);
> > + first = first->next;
> >
> > recycle_rp_inst(ri);
> > }
>
> It would be helpful to explain, a bit more verbose comment, what
> 'recycling' is in this context. The code is not very helpful:

OK.

>
> NOKPROBE_SYMBOL(free_rp_inst_rcu);
>
> static void recycle_rp_inst(struct kretprobe_instance *ri)
> {
> struct kretprobe *rp = get_kretprobe(ri);
>
> if (likely(rp)) {
> freelist_add(&ri->freelist, &rp->freelist);
> } else
> call_rcu(&ri->rcu, free_rp_inst_rcu);
> }
> NOKPROBE_SYMBOL(recycle_rp_inst);
>
> BTW., why are unnecessary curly braces used here?

Maybe I forgot to remove it when I introduced freelist_add() there...

>
> The kprobes code urgently needs a quality boost.

OK, before this series, I'll clean it up first.

Thank you,

>
> Thanks,
>
> Ingo


--
Masami Hiramatsu <[email protected]>

2021-07-05 14:41:35

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 08/13] arm: kprobes: Make a space for regs->ARM_pc at kretprobe_trampoline

On Mon, 5 Jul 2021 10:04:41 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > Change kretprobe_trampoline to make a space for regs->ARM_pc so that
> > kretprobe_trampoline_handler can call instruction_pointer_set()
> > safely.
>
> The idiom is "make space", but in any case, what does this mean?

Since arm's kretprobe_trampoline() saves partial pt_regs, regs->ARM_pc
is not accessible (it points the caller function's stack frame).
Therefore, this extends the stack frame for storing regs->ARM_pc.

>
> Was the stack frame set up in kretprobe_trampoline() and calling
> trampoline_handler() buggy?
>
> If yes, then explain the bad effects of the bug, and make all of this clear
> in the title & changelog.

This is actually buggy from the specification viewpoint. And if
the kretprobe handler sets the instruction pointer, it must be
ignored, but in reallty, it breaks the stack frame (this does
not happen in the ftrace/perf dynamic events, but a custom kretprobe
kernel module can do this.)

Thank you,

>
> Thanks,
>
> Ingo


--
Masami Hiramatsu <[email protected]>

2021-07-05 15:44:25

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Mon, 5 Jul 2021 13:36:14 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, Jun 18, 2021 at 04:07:06PM +0900, Masami Hiramatsu wrote:
> > @@ -549,7 +548,15 @@ bool unwind_next_frame(struct unwind_state *state)
> > (void *)orig_ip);
> > goto err;
> > }
> > -
> > + /*
> > + * There is a small chance to interrupt at the entry of
> > + * kretprobe_trampoline where the ORC info doesn't exist.
> > + * That point is right after the RET to kretprobe_trampoline
> > + * which was modified return address. So the @addr_p must
> > + * be right before the regs->sp.
> > + */
> > + state->ip = unwind_recover_kretprobe(state, state->ip,
> > + (unsigned long *)(state->sp - sizeof(long)));
> > state->regs = (struct pt_regs *)sp;
> > state->prev_regs = NULL;
> > state->full_regs = true;
> > @@ -562,6 +569,9 @@ bool unwind_next_frame(struct unwind_state *state)
> > (void *)orig_ip);
> > goto err;
> > }
> > + /* See UNWIND_HINT_TYPE_REGS case comment. */
> > + state->ip = unwind_recover_kretprobe(state, state->ip,
> > + (unsigned long *)(state->sp - sizeof(long)));
> >
> > if (state->full_regs)
> > state->prev_regs = state->regs;
>
> Why doesn't the ftrace case have this? That is, why aren't both return
> trampolines having the same general shape?

Ah, this strongly depends what the trampoline code does.
For the kretprobe case, the PUSHQ at the entry of the kretprobe_trampoline()
does not covered by UNWIND_HINT_FUNC. Thus it needs to find 'correct_ret_addr'
by the frame pointer (which is next to the sp).

"kretprobe_trampoline:\n"
#ifdef CONFIG_X86_64
/* Push fake return address to tell the unwinder it's a kretprobe */
" pushq $kretprobe_trampoline\n"
UNWIND_HINT_FUNC

But I'm not so sure how ftrace treat it. It seems that the return_to_handler()
doesn't care such case. (anyway, return_to_handler() does not return but jump
to the original call-site, in that case, the information will be lost.)

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-07-06 13:02:09

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 13/13] x86/kprobes: Fixup return address in generic trampoline handler

On Mon, 5 Jul 2021 10:34:58 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > In x86, kretprobe trampoline address on the stack frame will
> > be replaced with the real return address after returning from
> > trampoline_handler. Before fixing the return address, the real
> > return address can be found in the current->kretprobe_instances.
> >
> > However, since there is a window between updating the
> > current->kretprobe_instances and fixing the address on the stack,
> > if an interrupt caused at that timing and the interrupt handler
> > does stacktrace, it may fail to unwind because it can not get
> > the correct return address from current->kretprobe_instances.
> >
> > This will minimize that window by fixing the return address
> > right before updating current->kretprobe_instances.
>
> Is there still a window? I.e. is it "minimized" (to how big of a window?),
> or eliminated?

Oh, this will eliminate the window, because the return address is
fixed before updating the 'current->kretprobe_instance'.


>
> > +void arch_kretprobe_fixup_return(struct pt_regs *regs,
> > + unsigned long correct_ret_addr)
> > +{
> > + unsigned long *frame_pointer;
> > +
> > + frame_pointer = ((unsigned long *)&regs->sp) + 1;
> > +
> > + /* Replace fake return address with real one. */
> > + *frame_pointer = correct_ret_addr;
>
> Firstly, why does &regs->sp have to be forced to 'unsigned long *'?
>
> pt_regs::sp is 'unsigned long' on both 32-bit and 64-bit kernels AFAICS.

Ah, right.

>
> Secondly, the new code modified by your patch now looks like this:
>
> frame_pointer = ((unsigned long *)&regs->sp) + 1;
>
> + kretprobe_trampoline_handler(regs, frame_pointer);
>
> where:
>
> +void arch_kretprobe_fixup_return(struct pt_regs *regs,
> + unsigned long correct_ret_addr)
> +{
> + unsigned long *frame_pointer;
> +
> + frame_pointer = ((unsigned long *)&regs->sp) + 1;
> +
> + /* Replace fake return address with real one. */
> + *frame_pointer = correct_ret_addr;
> +}
>
> So we first do:
>
> frame_pointer = ((unsigned long *)&regs->sp) + 1;
>
> ... and pass that in to arch_kretprobe_fixup_return() as
> 'correct_ret_addr', which does:

No, 'correct_ret_addr' is found from 'current->kretprobe_instances'

/* Find correct address and all nodes for this frame. */
correct_ret_addr = (void *)__kretprobe_find_ret_addr(current, &node);

>
> + frame_pointer = ((unsigned long *)&regs->sp) + 1;
> + *frame_pointer = correct_ret_addr;
>
> ... which looks like the exact same thing as:
>
> *frame_pointer = frame_pointer;
>
> ... obfuscated through a thick layer of type casts?

Thus it will be the same thing as

*frame_pointer = __kretprobe_find_ret_addr(current, &node);

Actually, this is a bit confusing because same 'frame_pointer' is
calcurated twice from 'regs->sp'. This is because the return address
is stored at 'frame_pointer' or not depends on the architecture.


Thank you,

>
> Thanks,
>
> Ingo


--
Masami Hiramatsu <[email protected]>

2021-07-06 15:13:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Tue, 6 Jul 2021 09:55:03 +0200
Peter Zijlstra <[email protected]> wrote:

> > But I'm not so sure how ftrace treat it. It seems that the return_to_handler()
> > doesn't care such case. (anyway, return_to_handler() does not return but jump
> > to the original call-site, in that case, the information will be lost.)
>
> I find it bothersome (OCD, sorry :-) that both return trampolines behave
> differently. Doubly so because I know people (Steve in particular) have
> been talking about unifying them.

They were developed separately, and designed differently with different
goals in mind. Yes, I want to unify them, but trying to get the
different goals together, compounded by the fact that almost every arch
also implemented them differently (in which case, we need to find a way
to do it one arch at a time), makes the process extremely frustrating.

>
> Steve, can you clarify the ftrace side here? Afaict return_to_handler()
> is similarly affected.

I'm not exactly sure what the issue is. As Masami stated, kretprobe
uses a ret to return to the calling function, but ftrace uses a jmp.

kretprobe return tracing is more complex than the function graph return
tracing is (which is one of the issues I need to overcome to unify
them), and when the function graph return trampoline was created, it
did things as simple as possible (and before ORC existed).

Is this something to worry about now, or should we look to fix his in
the unifying process?

-- Steve

2021-07-07 10:16:38

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Wed, 7 Jul 2021 10:20:41 +0200
Peter Zijlstra <[email protected]> wrote:

> > > Steve, can you clarify the ftrace side here? Afaict return_to_handler()
> > > is similarly affected.
> >
> > I'm not exactly sure what the issue is. As Masami stated, kretprobe
> > uses a ret to return to the calling function, but ftrace uses a jmp.
>
> I'll have to re-read the ftrace bits, but from the top of my head you
> cannot do an indirect jump and preserve all registers at the same time,
> so a return stub must use jump from stack aka. ret.
>
> > kretprobe return tracing is more complex than the function graph return
> > tracing is (which is one of the issues I need to overcome to unify
> > them),
>
> I'm not sure it is. IIRC the biggest pain point with kretprobe is that
> 'silly' property that the kretprobe_instance are not the same between
> kretprobes. Luckily, that's not actually used anywhere, so we can simply
> rip that out.

I actually don't want to keep this feature because no one use it.
(only systemtap needs it?)

Anyway, if we keep the idea-level compatibility (not code level),
what we need is 'void *data' in the struct kretprobe_instance.
User who needs it can allocate their own instance data for their
kretprobes when initialising it and sets in their entry handler.

Then we can have a simple kretprobe_instance.

Thank you,



--
Masami Hiramatsu <[email protected]>

2021-07-07 10:47:23

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Wed, 7 Jul 2021 12:20:57 +0200
Peter Zijlstra <[email protected]> wrote:

> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
>
> > I actually don't want to keep this feature because no one use it.
> > (only systemtap needs it?)
>
> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> care. Their problem.
>
> > Anyway, if we keep the idea-level compatibility (not code level),
> > what we need is 'void *data' in the struct kretprobe_instance.
> > User who needs it can allocate their own instance data for their
> > kretprobes when initialising it and sets in their entry handler.
> >
> > Then we can have a simple kretprobe_instance.
>
> When would you do the alloc? When installing the retprobe, but that
> might be inside the allocator, which means you can't call the allocator
> etc.. :-)

Yes, so the user may need to allocate a pool right before register_kretprobe().
(whether per-kretprobe or per-task or global pool, that is user's choice.)

>
> If we look at struct ftrace_ret_stack, it has a few fixed function
> fields. The calltime one is all that is needed for the kretprobe
> example code.

kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
stores callee function address in 'kretprobe::kp.addr'), a return
address and a frame pointer (*).

* note that this frame pointer might be used for fixing up the
stack trace, but the fixup method depends on the architecture.

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-07-07 13:33:03

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Wed, 7 Jul 2021 19:45:30 +0900
Masami Hiramatsu <[email protected]> wrote:

> On Wed, 7 Jul 2021 12:20:57 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> >
> > > I actually don't want to keep this feature because no one use it.
> > > (only systemtap needs it?)
> >
> > Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> > care. Their problem.

Yeah, maybe it is not hard to update.

> >
> > > Anyway, if we keep the idea-level compatibility (not code level),
> > > what we need is 'void *data' in the struct kretprobe_instance.
> > > User who needs it can allocate their own instance data for their
> > > kretprobes when initialising it and sets in their entry handler.
> > >
> > > Then we can have a simple kretprobe_instance.
> >
> > When would you do the alloc? When installing the retprobe, but that
> > might be inside the allocator, which means you can't call the allocator
> > etc.. :-)
>
> Yes, so the user may need to allocate a pool right before register_kretprobe().
> (whether per-kretprobe or per-task or global pool, that is user's choice.)
>
> >
> > If we look at struct ftrace_ret_stack, it has a few fixed function
> > fields. The calltime one is all that is needed for the kretprobe
> > example code.
>
> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
> stores callee function address in 'kretprobe::kp.addr'), a return
> address and a frame pointer (*).

Oops, I forgot to add "void *data" for storing user data.

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-07-07 14:44:58

by wuqiang.matt

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
> On Wed, 7 Jul 2021 19:45:30 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
>> On Wed, 7 Jul 2021 12:20:57 +0200
>> Peter Zijlstra <[email protected]> wrote:
>>
>>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
>>>
>>>> I actually don't want to keep this feature because no one use it.
>>>> (only systemtap needs it?)
>>>
>>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
>>> care. Their problem.
>
> Yeah, maybe it is not hard to update.
>
>>>
>>>> Anyway, if we keep the idea-level compatibility (not code level),
>>>> what we need is 'void *data' in the struct kretprobe_instance.
>>>> User who needs it can allocate their own instance data for their
>>>> kretprobes when initialising it and sets in their entry handler.
>>>>
>>>> Then we can have a simple kretprobe_instance.
>>>
>>> When would you do the alloc? When installing the retprobe, but that
>>> might be inside the allocator, which means you can't call the allocator
>>> etc.. :-)
>>
>> Yes, so the user may need to allocate a pool right before register_kretprobe().
>> (whether per-kretprobe or per-task or global pool, that is user's choice.)
>>
>>>
>>> If we look at struct ftrace_ret_stack, it has a few fixed function
>>> fields. The calltime one is all that is needed for the kretprobe
>>> example code.
>>
>> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
>> stores callee function address in 'kretprobe::kp.addr'), a return
>> address and a frame pointer (*).
> > Oops, I forgot to add "void *data" for storing user data.
>

Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
to 3rd-party module (which might be unloaded any time).

User's own pool might not work if the module can be unloaded. Better manage
the pool in kretprobe_holder, which needs no changes from user side.

2021-07-07 18:32:42

by Andrii Nakryiko

[permalink] [raw]
Subject: Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()

On Fri, Jun 18, 2021 at 12:05 AM Masami Hiramatsu <[email protected]> wrote:
>
> Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> because those are doing same thing.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Tested-by: Andrii Nakryik <[email protected]>

Hi Masami,

If you are going to post v9 anyway, can you please fix up my name, it
should be "Andrii Nakryiko", thanks!

> ---
> Changes in v6:
> - Use dereference_symbol_descriptor() so that it can handle address in
> modules correctly.
> ---
> arch/ia64/kernel/kprobes.c | 5 -----
> arch/powerpc/kernel/kprobes.c | 11 -----------
> include/linux/kprobes.h | 1 -
> kernel/kprobes.c | 7 +------
> lib/error-inject.c | 3 ++-
> 5 files changed, 3 insertions(+), 24 deletions(-)
>

[...]

2021-07-08 04:10:22

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 02/13] kprobes: treewide: Replace arch_deref_entry_point() with dereference_symbol_descriptor()

On Wed, 7 Jul 2021 11:28:10 -0700
Andrii Nakryiko <[email protected]> wrote:

> On Fri, Jun 18, 2021 at 12:05 AM Masami Hiramatsu <[email protected]> wrote:
> >
> > Replace arch_deref_entry_point() with dereference_symbol_descriptor()
> > because those are doing same thing.
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Tested-by: Andrii Nakryik <[email protected]>
>
> Hi Masami,
>
> If you are going to post v9 anyway, can you please fix up my name, it
> should be "Andrii Nakryiko", thanks!

Oops, sorry. OK, I'll fix it.

Thank you,

>
> > ---
> > Changes in v6:
> > - Use dereference_symbol_descriptor() so that it can handle address in
> > modules correctly.
> > ---
> > arch/ia64/kernel/kprobes.c | 5 -----
> > arch/powerpc/kernel/kprobes.c | 11 -----------
> > include/linux/kprobes.h | 1 -
> > kernel/kprobes.c | 7 +------
> > lib/error-inject.c | 3 ++-
> > 5 files changed, 3 insertions(+), 24 deletions(-)
> >
>
> [...]


--
Masami Hiramatsu <[email protected]>

2021-07-09 14:57:19

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 10/13] x86/kprobes: Push a fake return address at kretprobe_trampoline

On Mon, 5 Jul 2021 10:17:26 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > + /* Replace fake return address with real one. */
> > + *frame_pointer = kretprobe_trampoline_handler(regs, frame_pointer);
> > + /*
> > + * Move flags to sp so that kretprobe_trapmoline can return
> > + * right after popf.
>
> What is a trapmoline?

This means kretprobe_trampoline() code.

>
> Also, in the x86 code we capitalize register and instruction names so that
> they are more distinctive and easier to read in the flow of English text.

OK, let me update it.

Thank you,

>
> Thanks,
>
> Ingo


--
Masami Hiramatsu <[email protected]>

2021-07-09 15:35:30

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

On Mon, 5 Jul 2021 10:02:47 +0200
Ingo Molnar <[email protected]> wrote:

>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > From: Josh Poimboeuf <[email protected]>
> >
> > Add UNWIND_HINT_FUNC on kretporbe_trampoline code so that ORC
> > information is generated on the kretprobe_trampoline correctly.
>
> What is a 'kretporbe'?

Oops, it's a typo.

>
> > Note that when the CONFIG_FRAME_POINTER=y, since the
> > kretprobe_trampoline skips updating frame pointer, the stack frame
> > of the kretprobe_trampoline seems non-standard. So this marks it
> > is STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC.
>
> What does 'marks it is' mean?

Sorry, I meant, this marks the kretprobe_trampoline as non-standard
stack frame by STACK_FRAME_NON_STANDARD().

>
> 'undefine' UNWIND_HINT_FUNC?
>
> Doesn't the patch do the exact opposite:
>
> > +#define UNWIND_HINT_FUNC \
> > + UNWIND_HINT(ORC_REG_SP, 8, UNWIND_HINT_TYPE_FUNC, 0)
>
> But it does undefine it in a specific spot:

Yes, if you think this is not correct way, what about the following?

#ifdef CONFIG_FRAME_POINTER
STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
#define KRETPROBE_UNWIND_HINT_FUNC
#else
#define KRETPROBE_UNWIND_HINT_FUNC UNWIND_HINT_FUNC
#endif


> > Anyway, with the frame pointer, FP unwinder can unwind the stack
> > frame correctly without that hint.
> >
> > Signed-off-by: Josh Poimboeuf <[email protected]>
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Tested-by: Andrii Nakryik <[email protected]>
>
> I have to say these changelogs are very careless.

Sorry for inconvenience...

>
> > +#else
> > +
>
> In headers, in longer CPP blocks, please always mark the '#else' branch
> with what it is the else branch of.

OK.

>
> See the output of:
>
> kepler:~/tip> git grep '#else' arch/x86/include/asm/ | head

Thanks for the hint!

>
> > +#ifdef CONFIG_FRAME_POINTER
> > +/*
> > + * kretprobe_trampoline skips updating frame pointer. The frame pointer
> > + * saved in trampoline_handler points to the real caller function's
> > + * frame pointer. Thus the kretprobe_trampoline doesn't seems to have a
> > + * standard stack frame with CONFIG_FRAME_POINTER=y.
> > + * Let's mark it non-standard function. Anyway, FP unwinder can correctly
> > + * unwind without the hint.
>
> s/doesn't seems to have a standard stack frame
> /doesn't have a standard stack frame
>
> There's nothing 'seems' about the situation - it's a non-standard function
> entry and stack frame situation, and the unwinder needs to know about it.

OK.

>
> > +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > +#undef UNWIND_HINT_FUNC
> > +#define UNWIND_HINT_FUNC
> > +#endif
> > /*
> > * When a retprobed function returns, this code saves registers and
> > * calls trampoline_handler() runs, which calls the kretprobe's handler.
> > @@ -1031,6 +1044,7 @@ asm(
> > /* We don't bother saving the ss register */
> > #ifdef CONFIG_X86_64
> > " pushq %rsp\n"
> > + UNWIND_HINT_FUNC
> > " pushfq\n"
> > SAVE_REGS_STRING
> > " movq %rsp, %rdi\n"
> > @@ -1041,6 +1055,7 @@ asm(
> > " popfq\n"
> > #else
> > " pushl %esp\n"
> > + UNWIND_HINT_FUNC
> > " pushfl\n"
> > SAVE_REGS_STRING
> > " movl %esp, %eax\n"
>
> Why not provide an appropriate annotation method in <asm/unwind_hints.h>,
> so that other future code can use it too instead of reinventing the wheel?

Would you mean we should define the UNWIND_HINT_FUNC as a macro
which depends on CONFIG_FRAME_POINTER, in <asm/unwind_hints.h>?

Josh, what would you think?

Thank you,


--
Masami Hiramatsu <[email protected]>

2021-07-10 01:44:18

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

Hi Ingo and Josh,

On Sat, 10 Jul 2021 00:31:40 +0900
Masami Hiramatsu <[email protected]> wrote:

> > > +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > > +#undef UNWIND_HINT_FUNC
> > > +#define UNWIND_HINT_FUNC
> > > +#endif
> > > /*
> > > * When a retprobed function returns, this code saves registers and
> > > * calls trampoline_handler() runs, which calls the kretprobe's handler.
> > > @@ -1031,6 +1044,7 @@ asm(
> > > /* We don't bother saving the ss register */
> > > #ifdef CONFIG_X86_64
> > > " pushq %rsp\n"
> > > + UNWIND_HINT_FUNC
> > > " pushfq\n"
> > > SAVE_REGS_STRING
> > > " movq %rsp, %rdi\n"
> > > @@ -1041,6 +1055,7 @@ asm(
> > > " popfq\n"
> > > #else
> > > " pushl %esp\n"
> > > + UNWIND_HINT_FUNC
> > > " pushfl\n"
> > > SAVE_REGS_STRING
> > > " movl %esp, %eax\n"
> >
> > Why not provide an appropriate annotation method in <asm/unwind_hints.h>,
> > so that other future code can use it too instead of reinventing the wheel?

I think I got what you meant. Let me summarize the issue.

In case of CONFIG_FRAME_POINTER=n, it is OK just adding UNWIND_HINT_FUNC.

In case of CONFIG_FRAME_POINTER=y, without STACK_FRAME_NON_STANDARD(),
the objtool complains that a CALL instruction without the frame pointer.
---
arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x25: call without frame pointer save/setup
---

If we just add STACK_FRAME_NON_STANDARD() with UNWIND_HINT_FUNC macro,
the objtool complains that non-standard function has unwind hint.
---
arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x1: BUG: why am I validating an ignored function?
---

Thus, add STACK_FRAME_NON_STANDARD() and undefine UNWIND_HINT_FUNC macro,
the objtool doesn't complain.

This means that the STACK_FRAME_NON_STANDARD() and UNWIND_HINT_FUNC macro
are mutually exclusive. However, those macros are used different way.
The STACK_FRAME_NON_STANDARD() will have the target symbol and the
UNWIND_HINT_FUNC needs to be embedded in the target code.
Thus we can not combine them in general.

If we can have something like UNWIND_HINT_FUNC_NO_FP, it may solve this
issue without ugly #ifdef and #undef.

Is that correct?

Maybe I can add UNWIND_HINT_TYPE_FUNC_NO_FP for UNWIND_HINT and make objtool
ignore the call without frame pointer. This makes an exception that the
kretprobe_trampoline will be noted in '.discard.unwind_hints' section
instead of '.discard.func_stack_frame_non_standard' section.

Or another idea is to introduce ANNOTATE_NO_FP_FUNCTION_CALL with a new
'.discard.no_fp_function_calls' section.

What do you think these ideas?

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-07-10 19:09:51

by Josh Poimboeuf

[permalink] [raw]
Subject: Re: [PATCH -tip v8 05/13] x86/kprobes: Add UNWIND_HINT_FUNC on kretprobe_trampoline code

On Sat, Jul 10, 2021 at 10:41:04AM +0900, Masami Hiramatsu wrote:
> Hi Ingo and Josh,
>
> On Sat, 10 Jul 2021 00:31:40 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> > > > +STACK_FRAME_NON_STANDARD(kretprobe_trampoline);
> > > > +#undef UNWIND_HINT_FUNC
> > > > +#define UNWIND_HINT_FUNC
> > > > +#endif
> > > > /*
> > > > * When a retprobed function returns, this code saves registers and
> > > > * calls trampoline_handler() runs, which calls the kretprobe's handler.
> > > > @@ -1031,6 +1044,7 @@ asm(
> > > > /* We don't bother saving the ss register */
> > > > #ifdef CONFIG_X86_64
> > > > " pushq %rsp\n"
> > > > + UNWIND_HINT_FUNC
> > > > " pushfq\n"
> > > > SAVE_REGS_STRING
> > > > " movq %rsp, %rdi\n"
> > > > @@ -1041,6 +1055,7 @@ asm(
> > > > " popfq\n"
> > > > #else
> > > > " pushl %esp\n"
> > > > + UNWIND_HINT_FUNC
> > > > " pushfl\n"
> > > > SAVE_REGS_STRING
> > > > " movl %esp, %eax\n"
> > >
> > > Why not provide an appropriate annotation method in <asm/unwind_hints.h>,
> > > so that other future code can use it too instead of reinventing the wheel?
>
> I think I got what you meant. Let me summarize the issue.
>
> In case of CONFIG_FRAME_POINTER=n, it is OK just adding UNWIND_HINT_FUNC.
>
> In case of CONFIG_FRAME_POINTER=y, without STACK_FRAME_NON_STANDARD(),
> the objtool complains that a CALL instruction without the frame pointer.
> ---
> arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x25: call without frame pointer save/setup
> ---
>
> If we just add STACK_FRAME_NON_STANDARD() with UNWIND_HINT_FUNC macro,
> the objtool complains that non-standard function has unwind hint.
> ---
> arch/x86/kernel/kprobes/core.o: warning: objtool: __kretprobe_trampoline()+0x1: BUG: why am I validating an ignored function?

I'm thinking this latter warning indicates an objtool bug (as the BUG
warning claims). If a function is ignored with
STACK_FRAME_NON_STANDARD() then objtool should probably also ignore its
hints. Then we should be able to get rid of the #undef/#ifdef
UNWIND_HINT_FUNC silliness.

The other warning is correct and STACK_FRAME_NON_STANDARD() still needs
to be behind '#ifdef CONFIG_FRAME_POINTER' since the function is missing
a frame pointer. So maybe we can make a STACK_FRAME_NON_STANDARD_FP()
or similar.

I'll post a few patches.

--
Josh

2021-07-10 19:27:11

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH 2/2] objtool: Ignore unwind hints for ignored functions

If a function is ignored, also ignore its hints. This is useful for the
case where the function ignore is conditional on frame pointers, e.g.
STACK_FRAME_NON_STANDARD_FP().

Signed-off-by: Josh Poimboeuf <[email protected]>
---
tools/objtool/check.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e5947fbb9e7a..67cbdcfcabae 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2909,7 +2909,7 @@ static int validate_unwind_hints(struct objtool_file *file, struct section *sec)
}

while (&insn->list != &file->insn_list && (!sec || insn->sec == sec)) {
- if (insn->hint && !insn->visited) {
+ if (insn->hint && !insn->visited && !insn->ignore) {
ret = validate_branch(file, insn->func, insn, state);
if (ret && backtrace)
BT_FUNC("<=== (hint)", insn);
--
2.31.1

2021-07-10 19:27:12

by Josh Poimboeuf

[permalink] [raw]
Subject: [PATCH 1/2] objtool: Add frame-pointer-specific function ignore

Add a CONFIG_FRAME_POINTER-specific version of
STACK_FRAME_NON_STANDARD() for the case where a function is
intentionally missing frame pointer setup, but otherwise needs
objtool/ORC coverage when frame pointers are disabled.

Signed-off-by: Josh Poimboeuf <[email protected]>
---
include/linux/objtool.h | 11 +++++++++++
tools/include/linux/objtool.h | 11 +++++++++++
2 files changed, 22 insertions(+)

diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index 7e72d975cb76..c9575ed91052 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -66,6 +66,17 @@ struct unwind_hint {
static void __used __section(".discard.func_stack_frame_non_standard") \
*__func_stack_frame_non_standard_##func = func

+/*
+ * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
+ * for the case where a function is intentionally missing frame pointer setup,
+ * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
+ */
+#ifdef CONFIG_FRAME_POINTER
+#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
+#else
+#define STACK_FRAME_NON_STANDARD_FP(func)
+#endif
+
#else /* __ASSEMBLY__ */

/*
diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
index 7e72d975cb76..c9575ed91052 100644
--- a/tools/include/linux/objtool.h
+++ b/tools/include/linux/objtool.h
@@ -66,6 +66,17 @@ struct unwind_hint {
static void __used __section(".discard.func_stack_frame_non_standard") \
*__func_stack_frame_non_standard_##func = func

+/*
+ * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
+ * for the case where a function is intentionally missing frame pointer setup,
+ * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
+ */
+#ifdef CONFIG_FRAME_POINTER
+#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
+#else
+#define STACK_FRAME_NON_STANDARD_FP(func)
+#endif
+
#else /* __ASSEMBLY__ */

/*
--
2.31.1

2021-07-11 01:19:14

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 1/2] objtool: Add frame-pointer-specific function ignore

On Sat, 10 Jul 2021 12:24:33 -0700
Josh Poimboeuf <[email protected]> wrote:

> Add a CONFIG_FRAME_POINTER-specific version of
> STACK_FRAME_NON_STANDARD() for the case where a function is
> intentionally missing frame pointer setup, but otherwise needs
> objtool/ORC coverage when frame pointers are disabled.

Looks good to me.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> include/linux/objtool.h | 11 +++++++++++
> tools/include/linux/objtool.h | 11 +++++++++++
> 2 files changed, 22 insertions(+)
>
> diff --git a/include/linux/objtool.h b/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/include/linux/objtool.h
> +++ b/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
> static void __used __section(".discard.func_stack_frame_non_standard") \
> *__func_stack_frame_non_standard_##func = func
>
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
> #else /* __ASSEMBLY__ */
>
> /*
> diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/tools/include/linux/objtool.h
> +++ b/tools/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
> static void __used __section(".discard.func_stack_frame_non_standard") \
> *__func_stack_frame_non_standard_##func = func
>
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
> #else /* __ASSEMBLY__ */
>
> /*
> --
> 2.31.1
>


--
Masami Hiramatsu <[email protected]>

2021-07-11 02:15:08

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 2/2] objtool: Ignore unwind hints for ignored functions

On Sat, 10 Jul 2021 12:25:14 -0700
Josh Poimboeuf <[email protected]> wrote:

> If a function is ignored, also ignore its hints. This is useful for the
> case where the function ignore is conditional on frame pointers, e.g.
> STACK_FRAME_NON_STANDARD_FP().

This also looks good to me, and test with my series works fine.

Reviewed-by: Masami Hiramatsu <[email protected]>
Tested-by: Masami Hiramatsu <[email protected]>

Thanks!

>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> tools/objtool/check.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index e5947fbb9e7a..67cbdcfcabae 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -2909,7 +2909,7 @@ static int validate_unwind_hints(struct objtool_file *file, struct section *sec)
> }
>
> while (&insn->list != &file->insn_list && (!sec || insn->sec == sec)) {
> - if (insn->hint && !insn->visited) {
> + if (insn->hint && !insn->visited && !insn->ignore) {
> ret = validate_branch(file, insn->func, insn, state);
> if (ret && backtrace)
> BT_FUNC("<=== (hint)", insn);
> --
> 2.31.1
>


--
Masami Hiramatsu <[email protected]>

2021-07-11 14:09:58

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Wed, 7 Jul 2021 22:42:47 +0800
Matt Wu <[email protected]> wrote:

> On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
> > On Wed, 7 Jul 2021 19:45:30 +0900
> > Masami Hiramatsu <[email protected]> wrote:
> >
> >> On Wed, 7 Jul 2021 12:20:57 +0200
> >> Peter Zijlstra <[email protected]> wrote:
> >>
> >>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> >>>
> >>>> I actually don't want to keep this feature because no one use it.
> >>>> (only systemtap needs it?)
> >>>
> >>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> >>> care. Their problem.
> >
> > Yeah, maybe it is not hard to update.
> >
> >>>
> >>>> Anyway, if we keep the idea-level compatibility (not code level),
> >>>> what we need is 'void *data' in the struct kretprobe_instance.
> >>>> User who needs it can allocate their own instance data for their
> >>>> kretprobes when initialising it and sets in their entry handler.
> >>>>
> >>>> Then we can have a simple kretprobe_instance.
> >>>
> >>> When would you do the alloc? When installing the retprobe, but that
> >>> might be inside the allocator, which means you can't call the allocator
> >>> etc.. :-)
> >>
> >> Yes, so the user may need to allocate a pool right before register_kretprobe().
> >> (whether per-kretprobe or per-task or global pool, that is user's choice.)
> >>
> >>>
> >>> If we look at struct ftrace_ret_stack, it has a few fixed function
> >>> fields. The calltime one is all that is needed for the kretprobe
> >>> example code.
> >>
> >> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
> >> stores callee function address in 'kretprobe::kp.addr'), a return
> >> address and a frame pointer (*).
> > > Oops, I forgot to add "void *data" for storing user data.
> >
>
> Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
> to 3rd-party module (which might be unloaded any time).

Good catch. Yes, instead of 'struct kretprobe', we need to use the holder.

> User's own pool might not work if the module can be unloaded. Better manage
> the pool in kretprobe_holder, which needs no changes from user side.

No, since the 'data' will be only refered from user handler. If the kretprobe
is released, then the kretprobe_holder will clear the refernce to the 'struct
kretprobe'. Then, the user handler is never called. No one access the 'data'.

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-07-11 16:30:59

by wuqiang.matt

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On 2021/7/11 PM10:09, Masami Hiramatsu wrote:
> On Wed, 7 Jul 2021 22:42:47 +0800
> Matt Wu <[email protected]> wrote:
>
>> On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
>>> On Wed, 7 Jul 2021 19:45:30 +0900
>>> Masami Hiramatsu <[email protected]> wrote:
>>>
>>>> On Wed, 7 Jul 2021 12:20:57 +0200
>>>> Peter Zijlstra <[email protected]> wrote:
>>>>
>>>>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
>>>>>
>>>>>> I actually don't want to keep this feature because no one use it.
>>>>>> (only systemtap needs it?)
>>>>>
>>>>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
>>>>> care. Their problem.
>>>
>>> Yeah, maybe it is not hard to update.
>>>
>>>>>
>>>>>> Anyway, if we keep the idea-level compatibility (not code level),
>>>>>> what we need is 'void *data' in the struct kretprobe_instance.
>>>>>> User who needs it can allocate their own instance data for their
>>>>>> kretprobes when initialising it and sets in their entry handler.
>>>>>>
>>>>>> Then we can have a simple kretprobe_instance.
>>>>>
>>>>> When would you do the alloc? When installing the retprobe, but that
>>>>> might be inside the allocator, which means you can't call the allocator
>>>>> etc.. :-)
>>>>
>>>> Yes, so the user may need to allocate a pool right before register_kretprobe().
>>>> (whether per-kretprobe or per-task or global pool, that is user's choice.)
>>>>
>>>>>
>>>>> If we look at struct ftrace_ret_stack, it has a few fixed function
>>>>> fields. The calltime one is all that is needed for the kretprobe
>>>>> example code.
>>>>
>>>> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
>>>> stores callee function address in 'kretprobe::kp.addr'), a return
>>>> address and a frame pointer (*).
>>> > Oops, I forgot to add "void *data" for storing user data.
>>>
>>
>> Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
>> to 3rd-party module (which might be unloaded any time).
>
> Good catch. Yes, instead of 'struct kretprobe', we need to use the holder.
>
>> User's own pool might not work if the module can be unloaded. Better manage
>> the pool in kretprobe_holder, which needs no changes from user side.
>
> No, since the 'data' will be only refered from user handler. If the kretprobe
> is released, then the kretprobe_holder will clear the refernce to the 'struct
> kretprobe'. Then, the user handler is never called. No one access the 'data'.

Indeed, there is no race of "data" accessing, since unregister_kretprobes()
is taking care of it.

This implementation just increases the complexity of caller to keep track
of all allocated instances and release them after unregistration.

But guys are likely to use kmalloc in pre-handler and kfree in post-handler,
which will lead to memory leaks.

2021-07-12 04:57:58

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip v8 11/13] x86/unwind: Recover kretprobe trampoline entry

On Sun, 11 Jul 2021 23:28:49 +0800
Matt Wu <[email protected]> wrote:

> On 2021/7/11 PM10:09, Masami Hiramatsu wrote:
> > On Wed, 7 Jul 2021 22:42:47 +0800
> > Matt Wu <[email protected]> wrote:
> >
> >> On 2021/7/7 PM9:29, Masami Hiramatsu wrote:
> >>> On Wed, 7 Jul 2021 19:45:30 +0900
> >>> Masami Hiramatsu <[email protected]> wrote:
> >>>
> >>>> On Wed, 7 Jul 2021 12:20:57 +0200
> >>>> Peter Zijlstra <[email protected]> wrote:
> >>>>
> >>>>> On Wed, Jul 07, 2021 at 07:15:10PM +0900, Masami Hiramatsu wrote:
> >>>>>
> >>>>>> I actually don't want to keep this feature because no one use it.
> >>>>>> (only systemtap needs it?)
> >>>>>
> >>>>> Yeah, you mentioned systemtap, but since that's out-of-tree I don't
> >>>>> care. Their problem.
> >>>
> >>> Yeah, maybe it is not hard to update.
> >>>
> >>>>>
> >>>>>> Anyway, if we keep the idea-level compatibility (not code level),
> >>>>>> what we need is 'void *data' in the struct kretprobe_instance.
> >>>>>> User who needs it can allocate their own instance data for their
> >>>>>> kretprobes when initialising it and sets in their entry handler.
> >>>>>>
> >>>>>> Then we can have a simple kretprobe_instance.
> >>>>>
> >>>>> When would you do the alloc? When installing the retprobe, but that
> >>>>> might be inside the allocator, which means you can't call the allocator
> >>>>> etc.. :-)
> >>>>
> >>>> Yes, so the user may need to allocate a pool right before register_kretprobe().
> >>>> (whether per-kretprobe or per-task or global pool, that is user's choice.)
> >>>>
> >>>>>
> >>>>> If we look at struct ftrace_ret_stack, it has a few fixed function
> >>>>> fields. The calltime one is all that is needed for the kretprobe
> >>>>> example code.
> >>>>
> >>>> kretprobe consumes 3 fields, a pointer to 'struct kretprobe' (which
> >>>> stores callee function address in 'kretprobe::kp.addr'), a return
> >>>> address and a frame pointer (*).
> >>> > Oops, I forgot to add "void *data" for storing user data.
> >>>
> >>
> >> Should use "struct kretprobe_holder *rph", since "struct kretprobe" belongs
> >> to 3rd-party module (which might be unloaded any time).
> >
> > Good catch. Yes, instead of 'struct kretprobe', we need to use the holder.
> >
> >> User's own pool might not work if the module can be unloaded. Better manage
> >> the pool in kretprobe_holder, which needs no changes from user side.
> >
> > No, since the 'data' will be only refered from user handler. If the kretprobe
> > is released, then the kretprobe_holder will clear the refernce to the 'struct
> > kretprobe'. Then, the user handler is never called. No one access the 'data'.
>
> Indeed, there is no race of "data" accessing, since unregister_kretprobes()
> is taking care of it.
>
> This implementation just increases the complexity of caller to keep track
> of all allocated instances and release them after unregistration.

Yes, but user can manage it with an array of pointers (or directly allocate
an array of their desired data). Not hard to track it in that case.

> But guys are likely to use kmalloc in pre-handler and kfree in post-handler,
> which will lead to memory leaks.

I will note "do not allocate memory inside kprobe handler" on manual.
I think that's all what we need. We cannot stop someone shooting their feet
especially in the kernel...

Thank you,

--
Masami Hiramatsu <[email protected]>

2021-07-29 02:32:43

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH 1/2] objtool: Add frame-pointer-specific function ignore

Hi Josh,

I found one mistake on this patch. You have to add STACK_FRAME_NON_STANDARD_FP() in
case of !CONFIG_STACK_VALIDATION.

diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index c9575ed91052..aca52db2f3f3 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -138,6 +138,7 @@ struct unwind_hint {
#define UNWIND_HINT(sp_reg, sp_offset, type, end) \
"\n\t"
#define STACK_FRAME_NON_STANDARD(func)
+#define STACK_FRAME_NON_STANDARD_FP(func)
#else
#define ANNOTATE_INTRA_FUNCTION_CALL
.macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0

Anyway, I will send my series including these patches and this fix as v10.

Thank you,


On Sat, 10 Jul 2021 12:24:33 -0700
Josh Poimboeuf <[email protected]> wrote:

> Add a CONFIG_FRAME_POINTER-specific version of
> STACK_FRAME_NON_STANDARD() for the case where a function is
> intentionally missing frame pointer setup, but otherwise needs
> objtool/ORC coverage when frame pointers are disabled.
>
> Signed-off-by: Josh Poimboeuf <[email protected]>
> ---
> include/linux/objtool.h | 11 +++++++++++
> tools/include/linux/objtool.h | 11 +++++++++++
> 2 files changed, 22 insertions(+)
>
> diff --git a/include/linux/objtool.h b/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/include/linux/objtool.h
> +++ b/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
> static void __used __section(".discard.func_stack_frame_non_standard") \
> *__func_stack_frame_non_standard_##func = func
>
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
> #else /* __ASSEMBLY__ */
>
> /*
> diff --git a/tools/include/linux/objtool.h b/tools/include/linux/objtool.h
> index 7e72d975cb76..c9575ed91052 100644
> --- a/tools/include/linux/objtool.h
> +++ b/tools/include/linux/objtool.h
> @@ -66,6 +66,17 @@ struct unwind_hint {
> static void __used __section(".discard.func_stack_frame_non_standard") \
> *__func_stack_frame_non_standard_##func = func
>
> +/*
> + * STACK_FRAME_NON_STANDARD_FP() is a frame-pointer-specific function ignore
> + * for the case where a function is intentionally missing frame pointer setup,
> + * but otherwise needs objtool/ORC coverage when frame pointers are disabled.
> + */
> +#ifdef CONFIG_FRAME_POINTER
> +#define STACK_FRAME_NON_STANDARD_FP(func) STACK_FRAME_NON_STANDARD(func)
> +#else
> +#define STACK_FRAME_NON_STANDARD_FP(func)
> +#endif
> +
> #else /* __ASSEMBLY__ */
>
> /*
> --
> 2.31.1
>


--
Masami Hiramatsu <[email protected]>