2008-11-11 05:46:39

by Frederic Weisbecker

[permalink] [raw]
Subject: [RFC v3][PATCH 0/2] Make ftrace able to trace function return

This patchset adds the ability for ftrace to trace the function even on call time and on
return time. So we can now measure the time of execution of the most part of the functions
inside the kernel with ftrace.

The first patch bring the low level tools to add the support of return tracing on X86-32. It is
totally separated from the traditional implementation of ftrace and doesn't support
dynamic ftrace at this time.

The second patch adds a tracer based on the ring-buffer which measure the time of execution of the functions inside the kernel.

---

Most of the changes in this V3 have been suggested by Steven Rostedt.
The low-level function handling uses now the thread info and use a stack-style
storage of addresses. So it can follow the call stack. This implementation uses
a maximum depth of 20 calls per thread.
This is much more scalable than the old trampoline based implementation because
each thread has its own stack of return addresses. There should be lesser overruns
(cases when the tracer have to drop one trace because there is no more slot to store
the return address).
NMIs are still not supported but you can now trace even if the watchdog is enabled (functions
called in NMI context will not be traced).

There are other changes that are described below.
---

Changelog[V3]:

_CONFIG_FTRACE_RETURN depends now on CONFIG_FUNCTION_TRACER.
_ftrace_return_stub disappears to use now ftrace_stub
_Use a simplified version of the nmi context detection by ftrace
to not trace in nmi context.
_Warn and disable tracing if fault or if the return address is not in kernel text.
_Get rid of the trampoline to use the thread_info of a task to store return addresses.
A task get its stack of return addresses initialized in two cases:
_ The init task begins with an empty stack of return adresses
_ When a process forks, its child obtains an empty stack of return addresses because it will begin
its execution in userspace and doesn't need the memorized return adresses of its parent.
_ The return_to_handler execute in two states: a first part in assembly that save appropriate registers
(return values) and jump to the original return adresse. The second part retrieve the return address,
send the trace and return the address.
_Use CFLAGS_REMOVE for some files when CONFIG_FTRACE_RETURN is set to let function tracer still able
to trace its functions. V2 used notrace.
_Correct a bug during trace insertion where "disabled" wasn't well verified.
_Move case statement of the printing for return traces on trace.c to call the function provided by the
return tracer. So other tracers can now use it.

Project of future improvements:

_ Set a more fine grained locking when touching the return addresses trace stack.
The actual lock uses a global spinlock for all the stack of each thread.
It would be better to set one spinlock per thread or even use a lockless solution
based on atomic operations to manipulate the index on the stack.

_ Passing arguments to prepare_ftrace_return by register instead of using the stack. (faster)

_ Let the ftrace API to assign more than one return handler.

_ Implement return tracing on X86-64 and ARM.

Changelog[V2]:

_ Fix deadlock caused by NMI (disable tracing if watchdog is active)
_ Fix return value when it is 64 bits wide.
_ First really working version


Stats for the V3:
---
arch/x86/Kconfig | 1 +
arch/x86/include/asm/ftrace.h | 26 +++++
arch/x86/include/asm/thread_info.h | 24 +++++
arch/x86/kernel/Makefile | 6 +
arch/x86/kernel/entry_32.S | 33 ++++++
arch/x86/kernel/ftrace.c | 181 +++++++++++++++++++++++++++++++--
include/linux/ftrace.h | 20 ++++
include/linux/ftrace_irq.h | 2 +-
include/linux/sched.h | 11 ++
kernel/Makefile | 4 +
kernel/trace/Kconfig | 14 +++
kernel/trace/Makefile | 1 +
kernel/trace/ftrace.c | 16 +++
kernel/trace/trace.c | 65 ++++++++++--
kernel/trace/trace.h | 35 +++++++
kernel/trace/trace_functions_return.c | 82 +++++++++++++++
16 files changed, 505 insertions(+), 16 deletions(-)


2008-11-11 09:43:38

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return


* Frederic Weisbecker <[email protected]> wrote:

> This patchset adds the ability for ftrace to trace the function even
> on call time and on return time. So we can now measure the time of
> execution of the most part of the functions inside the kernel with
> ftrace.
>
> The first patch bring the low level tools to add the support of
> return tracing on X86-32. It is totally separated from the
> traditional implementation of ftrace and doesn't support dynamic
> ftrace at this time.
>
> The second patch adds a tracer based on the ring-buffer which
> measure the time of execution of the functions inside the kernel.

okay, i've created tip/tracing/function-return-tracer topic so that we
can start testing this for real and do can start iterating it via
smaller changes. I've also integrated it into tip/master, it's looking
good so far.

i did a couple of cleanups straight away:

1) i did a FTRACE_RETURN => FUNCTION_RET_TRACER rename, to move it in
line with the function tracer

2) i cleaned up the arch/x86/Kconfig impact:

f1c4be5: tracing, x86: clean up FUNCTION_RET_TRACER Kconfig

the return-tracer builds and boots, and works fabulously:

# cd /sys/debug/tracing/

# cat available_tracers
return function sched_switch nop

# echo return > current_tracer

clocksource_read+0xd/0xf -> acpi_pm_read (1547 ns)
getnstimeofday+0x3e/0xc8 -> clocksource_read (1951 ns)
ktime_get_ts+0x25/0x49 -> getnstimeofday (2354 ns)
ktime_get_ts+0x45/0x49 -> set_normalized_timespec (120 ns)

Small quirk, it does not seem possible to switch from ftrace to
ftrace-ret:

# echo ftrace > current_tracer
# echo return > current_tracer
bash: echo: write error: Invalid argument

i have to go via the "nop" tracer for this to work. Steve: switching
between tracer plugins should be seemless.

Question: have you thought about extending the return-tracer to
dyn-function-tracer?

That's where it will really shine: the dftrace workflow is to
typically enable a low number of functions - and there the
return-tracing overhead does not show up nearly as much as in the
static-function-tracing workflow.

Another suggestion: i think the "return" plugin name is confusing to
users (it confused me when i first saw it listed in
available_tracers).

So lets use "function_full" and "function" tracing perhaps? The "full"
tracer is what traces both entry and exit points, and establishes full
function-call timings/costs. The "function" tracer is more lightweight
and traces function entry events.

Ingo

2008-11-11 10:13:17

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] tracing, x86: function return tracer, fix assembly constraints


>From 867f7fb3ebb831970847b179e7df5a9ab10da16d Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Tue, 11 Nov 2008 11:18:14 +0100
Subject: [PATCH] tracing, x86: function return tracer, fix assembly constraints

fix:

arch/x86/kernel/ftrace.c: Assembler messages:
arch/x86/kernel/ftrace.c:140: Error: missing ')'
arch/x86/kernel/ftrace.c:140: Error: junk `(%ebp))' after expression
arch/x86/kernel/ftrace.c:141: Error: missing ')'
arch/x86/kernel/ftrace.c:141: Error: junk `(%ebp))' after expression

the [parent_replaced] is used in an =rm fashion, so that constraint
is correct in isolation - but [parent_old] aliases register %0 and uses
it in an addressing mode that is only valid with registers - so change
the constraint from =rm to =r.

This fixes the build failure.

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/ftrace.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d68033b..9b2325a 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -151,7 +151,7 @@ void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr)
" .long 2b, 3b\n"
".previous\n"

- : [parent_replaced] "=rm" (parent), [old] "=r" (old),
+ : [parent_replaced] "=r" (parent), [old] "=r" (old),
[faulted] "=r" (faulted)
: [parent_old] "0" (parent), [return_hooker] "r" (return_hooker)
: "memory"

2008-11-11 11:04:30

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] tracing: function return tracer, build fix


another small fixlet.

------------->
>From 19b3e9671c5a219b8c34da2cc66e0ce7c3a501ae Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Tue, 11 Nov 2008 11:57:02 +0100
Subject: [PATCH] tracing: function return tracer, build fix

fix:

arch/x86/kernel/ftrace.c: In function 'ftrace_return_to_handler':
arch/x86/kernel/ftrace.c:112: error: implicit declaration of function 'cpu_clock'

cpu_clock() is implicitly included via a number of ways, but its real
location is sched.h. (Build failure is triggerable if enough other
kernel components are turned off.)

Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/ftrace.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 9b2325a..16a571d 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -14,6 +14,7 @@
#include <linux/uaccess.h>
#include <linux/ftrace.h>
#include <linux/percpu.h>
+#include <linux/sched.h>
#include <linux/init.h>
#include <linux/list.h>

2008-11-11 15:30:25

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return

2008/11/11 Ingo Molnar <[email protected]>:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
>> This patchset adds the ability for ftrace to trace the function even
>> on call time and on return time. So we can now measure the time of
>> execution of the most part of the functions inside the kernel with
>> ftrace.
>>
>> The first patch bring the low level tools to add the support of
>> return tracing on X86-32. It is totally separated from the
>> traditional implementation of ftrace and doesn't support dynamic
>> ftrace at this time.
>>
>> The second patch adds a tracer based on the ring-buffer which
>> measure the time of execution of the functions inside the kernel.
>
> okay, i've created tip/tracing/function-return-tracer topic so that we
> can start testing this for real and do can start iterating it via
> smaller changes. I've also integrated it into tip/master, it's looking
> good so far.


Thanks!

> i did a couple of cleanups straight away:
>
> 1) i did a FTRACE_RETURN => FUNCTION_RET_TRACER rename, to move it in
> line with the function tracer


Yes I started the development of this thing when FUNCTION_TRACER was
still FTRACE and I kept the old and confusing
name.

> 2) i cleaned up the arch/x86/Kconfig impact:
>
> f1c4be5: tracing, x86: clean up FUNCTION_RET_TRACER Kconfig


Ok.
And concerning the constraint fixed in assembly. That's strange my
compiler didn't even warn me about it.
Thanks for these three fixes.

> the return-tracer builds and boots, and works fabulously:
>
> # cd /sys/debug/tracing/
>
> # cat available_tracers
> return function sched_switch nop
>
> # echo return > current_tracer
>
> clocksource_read+0xd/0xf -> acpi_pm_read (1547 ns)
> getnstimeofday+0x3e/0xc8 -> clocksource_read (1951 ns)
> ktime_get_ts+0x25/0x49 -> getnstimeofday (2354 ns)
> ktime_get_ts+0x45/0x49 -> set_normalized_timespec (120 ns)
>
> Small quirk, it does not seem possible to switch from ftrace to
> ftrace-ret:
>
> # echo ftrace > current_tracer
> # echo return > current_tracer
> bash: echo: write error: Invalid argument
>
> i have to go via the "nop" tracer for this to work. Steve: switching
> between tracer plugins should be seemless.
>
> Question: have you thought about extending the return-tracer to
> dyn-function-tracer?


Yes. I forgot it into the project section. But it will be implemented :-)


> That's where it will really shine: the dftrace workflow is to
> typically enable a low number of functions - and there the
> return-tracing overhead does not show up nearly as much as in the
> static-function-tracing workflow.
>
> Another suggestion: i think the "return" plugin name is confusing to
> users (it confused me when i first saw it listed in
> available_tracers).
>
> So lets use "function_full" and "function" tracing perhaps? The "full"
> tracer is what traces both entry and exit points, and establishes full
> function-call timings/costs. The "function" tracer is more lightweight
> and traces function entry events.


Right. I first chose function_return but it was the largest tracer
name and so raised a bug that has been corrected
since. And I kept this confusing name.
I'm not sure function_full is really the good one because it's not
really a full version of the function tracer. It doesn't do the same
thing: when you look into a trace done by the function tracer, the
functions appear in order of their call whereas with the function
return tracer, they appear in the order of their return which is not
the same. The effect is not the same.
Why not function_return or something like that?

2008-11-11 17:15:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return


* Fr?d?ric Weisbecker <[email protected]> wrote:

> > 2) i cleaned up the arch/x86/Kconfig impact:
> >
> > f1c4be5: tracing, x86: clean up FUNCTION_RET_TRACER Kconfig
>
>
> Ok.
> And concerning the constraint fixed in assembly. That's strange my
> compiler didn't even warn me about it.

yep, that's common with assembly constraints - often the bugs will
only trigger with certain compilers and with certain config options.
Generally we've got enough variations of these parameters in -tip
testing (randconfig plus multiple tools versions) to reliably trigger
such issues.

> > So lets use "function_full" and "function" tracing perhaps? The
> > "full" tracer is what traces both entry and exit points, and
> > establishes full function-call timings/costs. The "function"
> > tracer is more lightweight and traces function entry events.
>
> Right. I first chose function_return but it was the largest tracer
> name and so raised a bug that has been corrected since. And I kept
> this confusing name.
>
> I'm not sure function_full is really the good one because it's not
> really a full version of the function tracer. It doesn't do the same
> thing: when you look into a trace done by the function tracer, the
> functions appear in order of their call whereas with the function
> return tracer, they appear in the order of their return which is not
> the same. The effect is not the same. Why not function_return or
> something like that?

at the risk of bikeshed-painting this issue too much, the problem with
function_return is that it has little meaning to actual users and even
to developers. What does the "return" mean? We know what it means,
because we know that opposed to function entry we'll also capture
function returns, and hence be able to do full function call tracing.

so function_full i thought to conduct this aspect of it better. But
suggestions are welcome.

Ingo

2008-11-11 17:24:40

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return

2008/11/11 Ingo Molnar <[email protected]>:
>
> * Fr?d?ric Weisbecker <[email protected]> wrote:
>
>> > 2) i cleaned up the arch/x86/Kconfig impact:
>> >
>> > f1c4be5: tracing, x86: clean up FUNCTION_RET_TRACER Kconfig
>>
>>
>> Ok.
>> And concerning the constraint fixed in assembly. That's strange my
>> compiler didn't even warn me about it.
>
> yep, that's common with assembly constraints - often the bugs will
> only trigger with certain compilers and with certain config options.
> Generally we've got enough variations of these parameters in -tip
> testing (randconfig plus multiple tools versions) to reliably trigger
> such issues.


I love -tip testing :-)


>> > So lets use "function_full" and "function" tracing perhaps? The
>> > "full" tracer is what traces both entry and exit points, and
>> > establishes full function-call timings/costs. The "function"
>> > tracer is more lightweight and traces function entry events.
>>
>> Right. I first chose function_return but it was the largest tracer
>> name and so raised a bug that has been corrected since. And I kept
>> this confusing name.
>>
>> I'm not sure function_full is really the good one because it's not
>> really a full version of the function tracer. It doesn't do the same
>> thing: when you look into a trace done by the function tracer, the
>> functions appear in order of their call whereas with the function
>> return tracer, they appear in the order of their return which is not
>> the same. The effect is not the same. Why not function_return or
>> something like that?
>
> at the risk of bikeshed-painting this issue too much, the problem with
> function_return is that it has little meaning to actual users and even
> to developers. What does the "return" mean? We know what it means,
> because we know that opposed to function entry we'll also capture
> function returns, and hence be able to do full function call tracing.
>
> so function_full i thought to conduct this aspect of it better. But
> suggestions are welcome.


Ok. Let's change into function_full, after all, I think that this tool
will be mostly used with a parsing pass
of its traces with a script to produce statistics and hierarchical
representation like does draw_functrace.py
After that, the order of apparition of the functions in the trace will
not really matter.

2008-11-11 18:43:27

by Ingo Molnar

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return


* Fr?d?ric Weisbecker <[email protected]> wrote:

> > at the risk of bikeshed-painting this issue too much, the problem
> > with function_return is that it has little meaning to actual users
> > and even to developers. What does the "return" mean? We know what
> > it means, because we know that opposed to function entry we'll
> > also capture function returns, and hence be able to do full
> > function call tracing.
> >
> > so function_full i thought to conduct this aspect of it better.
> > But suggestions are welcome.
>
>
> Ok. Let's change into function_full, after all, I think that this
> tool will be mostly used with a parsing pass of its traces with a
> script to produce statistics and hierarchical representation like
> does draw_functrace.py After that, the order of apparition of the
> functions in the trace will not really matter.

how about function_cost ?

that's what it's primarily about at this stage: the ability to capture
entry+exit, and have the cost printed.

as opposed to function tracer, which traces function entry events, but
does not try to build a coherent picture about per function execution
cost.

Ingo

2008-11-11 19:15:00

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return

2008/11/11 Ingo Molnar <[email protected]>:
>
> * Fr?d?ric Weisbecker <[email protected]> wrote:
>
>> > at the risk of bikeshed-painting this issue too much, the problem
>> > with function_return is that it has little meaning to actual users
>> > and even to developers. What does the "return" mean? We know what
>> > it means, because we know that opposed to function entry we'll
>> > also capture function returns, and hence be able to do full
>> > function call tracing.
>> >
>> > so function_full i thought to conduct this aspect of it better.
>> > But suggestions are welcome.
>>
>>
>> Ok. Let's change into function_full, after all, I think that this
>> tool will be mostly used with a parsing pass of its traces with a
>> script to produce statistics and hierarchical representation like
>> does draw_functrace.py After that, the order of apparition of the
>> functions in the trace will not really matter.
>
> how about function_cost ?
>
> that's what it's primarily about at this stage: the ability to capture
> entry+exit, and have the cost printed.
>
> as opposed to function tracer, which traces function entry events, but
> does not try to build a coherent picture about per function execution
> cost.
>
> Ingo
>

Ah yes. That's better!

2008-11-12 20:44:39

by Frank Ch. Eigler

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return


Frederic Weisbecker <[email protected]> writes:

> This patchset adds the ability for ftrace to trace the function even
> on call time and on return time. So we can now measure the time of
> execution of the most part of the functions inside the kernel with
> ftrace. [...]

By the way, one extra complication you may need to deal with, beyond
normal retprobes, is that by the time dyn-ftrace gets hold of the
function entry, some part of the function prologue will have been
executed. Because the mcount call/sequence is not the first
instruction in the function body, is there a risk that the call frame
cannot be reliably modified because of interference from those first
few other instructions? Maybe on some architectures/optimization
levels?

- FChE

2008-11-13 00:10:52

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [RFC v3][PATCH 0/2] Make ftrace able to trace function return

2008/11/12 Frank Ch. Eigler <[email protected]>:
>
> Frederic Weisbecker <[email protected]> writes:
>
>> This patchset adds the ability for ftrace to trace the function even
>> on call time and on return time. So we can now measure the time of
>> execution of the most part of the functions inside the kernel with
>> ftrace. [...]
>
> By the way, one extra complication you may need to deal with, beyond
> normal retprobes, is that by the time dyn-ftrace gets hold of the
> function entry, some part of the function prologue will have been
> executed. Because the mcount call/sequence is not the first
> instruction in the function body, is there a risk that the call frame
> cannot be reliably modified because of interference from those first
> few other instructions? Maybe on some architectures/optimization
> levels?
>
> - FChE
>

I don't think that's a problem since CONFIG_FRAME_POINTER is activated
for function tracing.
Whatever instruction is called before the call to mcount, the return
address will remain just
before the frame pointer. It doesn't really differ from usual function
tracing which logs the address
of the parent by using this way.
And the case of dynamic_ftrace doesn't change anything since the call
to mcount is redirected to
ftrace_caller which can handle the return addresses properly.