2021-03-04 08:24:12

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: Broken kretprobe stack traces

Hi Daniel,

On Tue, 02 Mar 2021 17:15:13 -0800
"Daniel Xu" <[email protected]> wrote:

> Hi Masami,
>
> Jakub reported a bug with kretprobe stack traces -- wondering if you've gotten
> any bug reports related to stack traces being broken for kretprobes.

Yeah, stack dumper must check the stack entry is kretprobe'd and skip it.
For example, ftrace checks it as below.

/sys/kernel/debug/tracing # echo r vfs_read > kprobe_events
/sys/kernel/debug/tracing # echo stacktrace > events/kprobes/r_vfs_read_0/trigger
/sys/kernel/debug/tracing # echo 1 > events/kprobes/r_vfs_read_0/enable
/sys/kernel/debug/tracing # head -20 trace
# tracer: nop
#
# entries-in-buffer/entries-written: 15685/336094 #P:8
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
sh-132 [006] ...1 38.920352: <stack trace>
=> kretprobe_dispatcher
=> __kretprobe_trampoline_handler
=> trampoline_handler
=> [unknown/kretprobe'd]
=> [unknown/kretprobe'd]
=> __x64_sys_read
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe


>
> I think (can't prove) this used to work:

I'm not sure the bpftrace had correctly handled it or not.

>
> # bpftrace -e 'kretprobe:__tcp_retransmit_skb { @[kstack()] = count() }'
> Attaching 1 probe...
> ^C
>
> @[
> kretprobe_trampoline+0
> ]: 1

Would you know how the bpftrace stacktracer rewinds the stack entries?
FYI, ftrace does it in trace_seq_print_sym()@kernel/trace/trace_output.c

Thank you,

>
> fentry/fexit probes seem to work:
>
> # bpftrace -e 'kretfunc:__tcp_retransmit_skb { @[kstack()] = count() }'
> Attaching 1 probe...
> ^C
>
> @[
> ftrace_trampoline+10799
> bpf_get_stackid_raw_tp+121
> ftrace_trampoline+10799
> __tun_chr_ioctl.isra.0.cold+33312
> __tcp_retransmit_skb+5
> tcp_send_loss_probe+254
> tcp_write_timer_handler+394
> tcp_write_timer+149
> call_timer_fn+41
> __run_timers+493
> run_timer_softirq+25
> __softirqentry_text_start+207
> asm_call_sysvec_on_stack+18
> do_softirq_own_stack+55
> irq_exit_rcu+158
> sysvec_apic_timer_interrupt+54
> asm_sysvec_apic_timer_interrupt+18
> ]: 1
> @[
> ftrace_trampoline+10799
> bpf_get_stackid_raw_tp+121
> ftrace_trampoline+10799
> __tun_chr_ioctl.isra.0.cold+33312
> __tcp_retransmit_skb+5
> <...>
>
> which makes me suspect it's a kprobe specific issue.
>
> Thanks,
> Daniel


--
Masami Hiramatsu <[email protected]>


2021-03-04 11:08:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: Broken kretprobe stack traces

On Wed, 3 Mar 2021 13:48:28 +0900
Masami Hiramatsu <[email protected]> wrote:

>
> >
> > I think (can't prove) this used to work:

Would be good to find out if it did.

>
> I'm not sure the bpftrace had correctly handled it or not.
>
> >
> > # bpftrace -e 'kretprobe:__tcp_retransmit_skb { @[kstack()] = count() }'
> > Attaching 1 probe...
> > ^C
> >
> > @[
> > kretprobe_trampoline+0
> > ]: 1
>
> Would you know how the bpftrace stacktracer rewinds the stack entries?
> FYI, ftrace does it in trace_seq_print_sym()@kernel/trace/trace_output.c
>

The difference between trace events and normal function tracing stack
traces is that it keeps its original return address. But kretprobes (and
function graph tracing, and some bpf trampolines too) modify the return
pointer, and that could possibly cause havoc with the stack trace.

-- Steve

2021-03-04 12:45:29

by Daniel Xu

[permalink] [raw]
Subject: Re: Broken kretprobe stack traces

On Wed, Mar 03, 2021 at 01:48:28PM +0900, Masami Hiramatsu wrote:
> Hi Daniel,
>
> On Tue, 02 Mar 2021 17:15:13 -0800
> "Daniel Xu" <[email protected]> wrote:
>
> > Hi Masami,
> >
> > Jakub reported a bug with kretprobe stack traces -- wondering if you've gotten
> > any bug reports related to stack traces being broken for kretprobes.
>
> Yeah, stack dumper must check the stack entry is kretprobe'd and skip it.
> For example, ftrace checks it as below.
>
> /sys/kernel/debug/tracing # echo r vfs_read > kprobe_events
> /sys/kernel/debug/tracing # echo stacktrace > events/kprobes/r_vfs_read_0/trigger
> /sys/kernel/debug/tracing # echo 1 > events/kprobes/r_vfs_read_0/enable
> /sys/kernel/debug/tracing # head -20 trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 15685/336094 #P:8
> #
> # _-----=> irqs-off
> # / _----=> need-resched
> # | / _---=> hardirq/softirq
> # || / _--=> preempt-depth
> # ||| / delay
> # TASK-PID CPU# |||| TIMESTAMP FUNCTION
> # | | | |||| | |
> sh-132 [006] ...1 38.920352: <stack trace>
> => kretprobe_dispatcher
> => __kretprobe_trampoline_handler
> => trampoline_handler
> => [unknown/kretprobe'd]
> => [unknown/kretprobe'd]
> => __x64_sys_read
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe
>
>
> >
> > I think (can't prove) this used to work:
>
> I'm not sure the bpftrace had correctly handled it or not.
>
> >
> > # bpftrace -e 'kretprobe:__tcp_retransmit_skb { @[kstack()] = count() }'
> > Attaching 1 probe...
> > ^C
> >
> > @[
> > kretprobe_trampoline+0
> > ]: 1
>
> Would you know how the bpftrace stacktracer rewinds the stack entries?
> FYI, ftrace does it in trace_seq_print_sym()@kernel/trace/trace_output.c

Thanks for the hint, I'll take a look.

bpftrace generates a bpf program that calls into the kernel's
bpf_get_stackid():

https://github.com/torvalds/linux/blob/f69d02e37a85645aa90d18cacfff36dba370f797/include/uapi/linux/bpf.h#L1296

so it could be a bug with bpf.

<...>

2021-03-04 23:29:18

by Daniel Xu

[permalink] [raw]
Subject: Re: Broken kretprobe stack traces

On Wed, Mar 03, 2021 at 09:26:04AM -0500, Steven Rostedt wrote:
> On Wed, 3 Mar 2021 13:48:28 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> >
> > >
> > > I think (can't prove) this used to work:
>
> Would be good to find out if it did.

I'm installing some older kernels now to check. Will report back.

<...>

Thanks,
Daniel

2021-03-05 00:13:16

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: Broken kretprobe stack traces

On Wed, 3 Mar 2021 09:26:04 -0500
Steven Rostedt <[email protected]> wrote:

> On Wed, 3 Mar 2021 13:48:28 +0900
> Masami Hiramatsu <[email protected]> wrote:
>
> >
> > >
> > > I think (can't prove) this used to work:
>
> Would be good to find out if it did.
>
> >
> > I'm not sure the bpftrace had correctly handled it or not.
> >
> > >
> > > # bpftrace -e 'kretprobe:__tcp_retransmit_skb { @[kstack()] = count() }'
> > > Attaching 1 probe...
> > > ^C
> > >
> > > @[
> > > kretprobe_trampoline+0
> > > ]: 1
> >
> > Would you know how the bpftrace stacktracer rewinds the stack entries?
> > FYI, ftrace does it in trace_seq_print_sym()@kernel/trace/trace_output.c
> >
>
> The difference between trace events and normal function tracing stack
> traces is that it keeps its original return address. But kretprobes (and
> function graph tracing, and some bpf trampolines too) modify the return
> pointer, and that could possibly cause havoc with the stack trace.

BTW, I think if the stack tracer passes the nth of kretprobe_trampoline
or a cursor, kretprobe can find the correct return address from given task.

I've made a patch to do that only for the CONFIG_ARCH_STACKWALK=y

Here is an example on x86.

# echo r vfs_read > kprobe_events
# echo stacktrace > events/kprobes/r_vfs_read_0/trigger
# echo 1 > events/kprobes/r_vfs_read_0/enable
# echo 1 > options/sym-offset
# less trace
...

sh-132 [007] ...1 22.524917: <stack trace>
=> kretprobe_dispatcher+0x7d/0xc0
=> __kretprobe_trampoline_handler+0xdb/0x1b0
=> trampoline_handler+0x48/0x60
=> kretprobe_trampoline+0x2a/0x50
=> ksys_read+0x70/0xf0
=> __x64_sys_read+0x1a/0x20
=> do_syscall_64+0x38/0x50
=> entry_SYSCALL_64_after_hwframe+0x44/0xae
=> 0
=> 0

------

From 77a785a3a0791171b570830d0b2f099f8a4ba337 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <[email protected]>
Date: Thu, 4 Mar 2021 14:19:24 +0900
Subject: [PATCH] kprobes: stacktrace: Recover the address changed by kretprobe

Recover the return address on the stack which changed by the
kretprobe.

Signed-off-by: Masami Hiramatsu <[email protected]>
---
include/linux/kprobes.h | 3 ++
kernel/kprobes.c | 81 +++++++++++++++++++++++++++--------------
kernel/stacktrace.c | 26 +++++++++++++
3 files changed, 82 insertions(+), 28 deletions(-)

diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 1883a4a9f16a..a022e507d829 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -205,6 +205,9 @@ extern void arch_prepare_kretprobe(struct kretprobe_instance *ri,
struct pt_regs *regs);
extern int arch_trampoline_kprobe(struct kprobe *p);

+unsigned long kretprobe_real_stack_tsk(struct task_struct *tsk,
+ unsigned long addr,
+ struct llist_node **cur);
/* If the trampoline handler called from a kprobe, use this version */
unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
void *trampoline_address,
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 745f08fdd7a6..b3d9dbd6086f 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1863,46 +1863,56 @@ unsigned long __weak arch_deref_entry_point(void *entry)

#ifdef CONFIG_KRETPROBES

-unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
- void *trampoline_address,
- void *frame_pointer)
+/* This assumes the tsk is current or the task which is not running. */
+unsigned long kretprobe_real_stack_tsk(struct task_struct *tsk,
+ unsigned long addr,
+ struct llist_node **cur)
{
- kprobe_opcode_t *correct_ret_addr = NULL;
struct kretprobe_instance *ri = NULL;
- struct llist_node *first, *node;
- struct kretprobe *rp;
+ struct llist_node *node = *cur;

- /* Find all nodes for this frame. */
- first = node = current->kretprobe_instances.first;
- while (node) {
- ri = container_of(node, struct kretprobe_instance, llist);
+ if (addr != (unsigned long)&kretprobe_trampoline)
+ return addr;

- BUG_ON(ri->fp != frame_pointer);
+ if (!node)
+ node = tsk->kretprobe_instances.first;
+ else
+ node = node->next;

- if (ri->ret_addr != trampoline_address) {
- correct_ret_addr = ri->ret_addr;
- /*
- * This is the real return address. Any other
- * instances associated with this task are for
- * other calls deeper on the call stack
- */
- goto found;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);
+ if (ri->ret_addr != (void *)&kretprobe_trampoline) {
+ *cur = node;
+ return (unsigned long)ri->ret_addr;
}
-
node = node->next;
}
- pr_err("Oops! Kretprobe fails to find correct return address.\n");
- BUG_ON(1);
+ return 0;
+}

-found:
- /* Unlink all nodes for this frame. */
- current->kretprobe_instances.first = node->next;
- node->next = NULL;
+unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
+ void *trampoline_address,
+ void *frame_pointer)
+{
+ kprobe_opcode_t *correct_ret_addr = NULL;
+ struct kretprobe_instance *ri = NULL;
+ struct llist_node *first, *node = NULL;
+ struct kretprobe *rp;

- /* Run them.. */
+ /* Find correct address and all nodes for this frame. */
+ correct_ret_addr = (void*)kretprobe_real_stack_tsk(current,
+ (unsigned long)&kretprobe_trampoline, &node);
+ if (!correct_ret_addr) {
+ pr_err("Oops! Kretprobe fails to find correct return address.\n");
+ BUG_ON(1);
+ }
+
+ /* Run them. */
+ first = current->kretprobe_instances.first;
while (first) {
ri = container_of(first, struct kretprobe_instance, llist);
- first = first->next;
+
+ BUG_ON(ri->fp != frame_pointer);

rp = get_kretprobe(ri);
if (rp && rp->handler) {
@@ -1913,6 +1923,21 @@ unsigned long __kretprobe_trampoline_handler(struct pt_regs *regs,
rp->handler(ri, regs);
__this_cpu_write(current_kprobe, prev);
}
+ if (first == node)
+ break;
+
+ first = first->next;
+ }
+
+ /* Unlink all nodes for this frame. */
+ first = current->kretprobe_instances.first;
+ current->kretprobe_instances.first = node->next;
+ node->next = NULL;
+
+ /* Recycle them. */
+ while (first) {
+ ri = container_of(first, struct kretprobe_instance, llist);
+ first = first->next;

recycle_rp_inst(ri);
}
diff --git a/kernel/stacktrace.c b/kernel/stacktrace.c
index 9f8117c7cfdd..416f357e64b8 100644
--- a/kernel/stacktrace.c
+++ b/kernel/stacktrace.c
@@ -13,6 +13,7 @@
#include <linux/export.h>
#include <linux/kallsyms.h>
#include <linux/stacktrace.h>
+#include <linux/kprobes.h>

/**
* stack_trace_print - Print the entries in the stack trace
@@ -76,6 +77,10 @@ struct stacktrace_cookie {
unsigned int size;
unsigned int skip;
unsigned int len;
+#ifdef CONFIG_KRETPROBES
+ struct llist_node *cur;
+ struct task_struct *tsk;
+#endif
};

static bool stack_trace_consume_entry(void *cookie, unsigned long addr)
@@ -89,6 +94,7 @@ static bool stack_trace_consume_entry(void *cookie, unsigned long addr)
c->skip--;
return true;
}
+ addr = kretprobe_real_stack_tsk(c->tsk, addr, &c->cur);
c->store[c->len++] = addr;
return c->len < c->size;
}
@@ -116,6 +122,10 @@ unsigned int stack_trace_save(unsigned long *store, unsigned int size,
.store = store,
.size = size,
.skip = skipnr + 1,
+#ifdef CONFIG_KRETPROBES
+ .cur = NULL,
+ .tsk = current,
+#endif
};

arch_stack_walk(consume_entry, &c, current, NULL);
@@ -141,6 +151,10 @@ unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
.size = size,
/* skip this function if they are tracing us */
.skip = skipnr + (current == tsk),
+#ifdef CONFIG_KRETPROBES
+ .cur = NULL,
+ .tsk = tsk,
+#endif
};

if (!try_get_task_stack(tsk))
@@ -168,6 +182,10 @@ unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store,
.store = store,
.size = size,
.skip = skipnr,
+#ifdef CONFIG_KRETPROBES
+ .cur = NULL,
+ .tsk = current,
+#endif
};

arch_stack_walk(consume_entry, &c, current, regs);
@@ -194,6 +212,10 @@ int stack_trace_save_tsk_reliable(struct task_struct *tsk, unsigned long *store,
struct stacktrace_cookie c = {
.store = store,
.size = size,
+#ifdef CONFIG_KRETPROBES
+ .cur = NULL,
+ .tsk = tsk,
+#endif
};
int ret;

@@ -224,6 +246,10 @@ unsigned int stack_trace_save_user(unsigned long *store, unsigned int size)
struct stacktrace_cookie c = {
.store = store,
.size = size,
+#ifdef CONFIG_KRETPROBES
+ .cur = NULL,
+ .tsk = current,
+#endif
};
mm_segment_t fs;

--
2.25.1




Thank you,

--
Masami Hiramatsu <[email protected]>

2021-03-05 00:57:01

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] kprobes: stacktrace: Recover the address changed by kretprobe

Hi Masami,

I love your patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.12-rc1 next-20210304]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Masami-Hiramatsu/kprobes-stacktrace-Recover-the-address-changed-by-kretprobe/20210304-212528
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git f69d02e37a85645aa90d18cacfff36dba370f797
config: parisc-allyesconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/2204ff150821a6a3c13b4fa10784b5efb3e3adc8
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Masami-Hiramatsu/kprobes-stacktrace-Recover-the-address-changed-by-kretprobe/20210304-212528
git checkout 2204ff150821a6a3c13b4fa10784b5efb3e3adc8
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

kernel/kprobes.c:1847:12: warning: no previous prototype for 'kprobe_exceptions_notify' [-Wmissing-prototypes]
1847 | int __weak kprobe_exceptions_notify(struct notifier_block *self,
| ^~~~~~~~~~~~~~~~~~~~~~~~
kernel/kprobes.c: In function 'kretprobe_real_stack_tsk':
>> kernel/kprobes.c:1874:30: error: 'kretprobe_trampoline' undeclared (first use in this function); did you mean 'kretprobe_blackpoint'?
1874 | if (addr != (unsigned long)&kretprobe_trampoline)
| ^~~~~~~~~~~~~~~~~~~~
| kretprobe_blackpoint
kernel/kprobes.c:1874:30: note: each undeclared identifier is reported only once for each function it appears in
kernel/kprobes.c: In function '__kretprobe_trampoline_handler':
kernel/kprobes.c:1904:21: error: 'kretprobe_trampoline' undeclared (first use in this function); did you mean 'kretprobe_blackpoint'?
1904 | (unsigned long)&kretprobe_trampoline, &node);
| ^~~~~~~~~~~~~~~~~~~~
| kretprobe_blackpoint


vim +1874 kernel/kprobes.c

1865
1866 /* This assumes the tsk is current or the task which is not running. */
1867 unsigned long kretprobe_real_stack_tsk(struct task_struct *tsk,
1868 unsigned long addr,
1869 struct llist_node **cur)
1870 {
1871 struct kretprobe_instance *ri = NULL;
1872 struct llist_node *node = *cur;
1873
> 1874 if (addr != (unsigned long)&kretprobe_trampoline)
1875 return addr;
1876
1877 if (!node)
1878 node = tsk->kretprobe_instances.first;
1879 else
1880 node = node->next;
1881
1882 while (node) {
1883 ri = container_of(node, struct kretprobe_instance, llist);
1884 if (ri->ret_addr != (void *)&kretprobe_trampoline) {
1885 *cur = node;
1886 return (unsigned long)ri->ret_addr;
1887 }
1888 node = node->next;
1889 }
1890 return 0;
1891 }
1892

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (3.59 kB)
.config.gz (66.42 kB)
Download all attachments