Hi all,
I'm currently working on a patchset to make kretprobes produce
reasonable stack traces[1], and it appears this is a generic problem
across the entire kernel -- you can see the same kretprobe_trampoline()
issue when using ftrace just as much as bpf_trace.
However, in working on this patch, I've noticed that there appear to be
several different implementations of "get the stack trace from this
pt_regs" which all appear quite similar. Namely:
* struct ftrace_stack;
* struct perf_callchain_entry; [**]
* struct stack_trace;
* oprofile_operations->backtrace [This is not related to the kretprobe
problem, very tangential, but since it's usage is not very
complicated -- logging to dmesg -- it wouldn't be too bad to
refactor this one too].
Would there be a strong objection to me trying to merge together these
so that they all use 'struct stack_trace', and no longer have
arch-specific code that is doing (what appears to be) the same unwind
code? Or is this something that was intentionally avoided because there
are some differences that I'm not seeing?
The reason I ask is because the kretprobes patch would require saving
the stacktrace during pre_handler_kretprobe() -- and so in order for all
of the tracing subsystems to take advantage of it they'd need to be able
to use that saved stack trace.
The only other option I can see would be to implement some sort of
translation from 'struct stack_trace' to the others. This wouldn't be
too bad, but I imagine it would be uglier than refactoring them all to
use the same struct.
[**] perf_callchain_entry has the concept of "marking" a context in the
stack trace. But I wonder whether this is something that we could
do with 'struct stack_trace' -- after all it's just magic ->ip
values. *But* then the question is what is the purpose of
sysctl_perf_event_max_contexts_per_stack? It limits the number of
contexts, but isn't that already implicitly limited by the number
of stack entries? We could also implement this with 'struct
stack_trace' but it would require wrapping 'struct stack_trace' to
make it efficient.
[1]: https://github.com/iovisor/bpftrace/issues/101
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
On 2018-10-21, Aleksa Sarai <[email protected]> wrote:
> However, in working on this patch, I've noticed that there appear to be
> several different implementations of "get the stack trace from this
> pt_regs" which all appear quite similar. Namely:
>
> * struct ftrace_stack;
Sorry, I made a mistake here. ftrace_stack *does* use stack_trace under
the hood, and so it can be easily modified to use the saved kretprobe
stack. The only issue is with perf_callchain_entry (which affects
bpf_get_stack).
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
On Sun, Oct 21, 2018 at 08:31:06PM +1100, Aleksa Sarai wrote:
> Hi all,
>
> I'm currently working on a patchset to make kretprobes produce
> reasonable stack traces[1], and it appears this is a generic problem
> across the entire kernel -- you can see the same kretprobe_trampoline()
> issue when using ftrace just as much as bpf_trace.
>
> However, in working on this patch, I've noticed that there appear to be
> several different implementations of "get the stack trace from this
> pt_regs" which all appear quite similar. Namely:
>
> * struct ftrace_stack;
> * struct perf_callchain_entry; [**]
> * struct stack_trace;
the perf thing also does userspace stack, where the others do not afaik.
And for the kernel part it already uses the regular kernel unwinder.
I'm not sure what you're proposing.
On 2018-10-21, Peter Zijlstra <[email protected]> wrote:
> > Hi all,
> >
> > I'm currently working on a patchset to make kretprobes produce
> > reasonable stack traces[1], and it appears this is a generic problem
> > across the entire kernel -- you can see the same kretprobe_trampoline()
> > issue when using ftrace just as much as bpf_trace.
> >
> > However, in working on this patch, I've noticed that there appear to be
> > several different implementations of "get the stack trace from this
> > pt_regs" which all appear quite similar. Namely:
> >
> > * struct ftrace_stack;
> > * struct perf_callchain_entry; [**]
> > * struct stack_trace;
>
> the perf thing also does userspace stack, where the others do not afaik.
Ah right, I forgot about that. Though I wonder whether there would be a
strong argument against extending 'struct stack_trace' to contain the
ustack so we could use it then.
> And for the kernel part it already uses the regular kernel unwinder.
> I'm not sure what you're proposing.
They do use the kernel unwinder, but they are both implemented
per-architecture and use different structures to store and represent the
stack trace (even though at the end of the day they are just an array of
longs).
oprofile for instance (appears to) reimplement the following code
per-architecture:
struct stack_trace trace;
save_stack_trace(&trace);
print_stack_trace(&trace, 0);
The same applies for perf_callchain_entry which is all implemented
per-architecture even though get_perf_callchain_kernel() could just
re-use save_stack_trace_regs().
But after looking at this a bit more, I think I'll write a patch which
does this the "silly" way and then I'll see whether maintainers prefer
it to be refactored.
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>