2009-03-24 19:39:56

by Tim Bird

[permalink] [raw]
Subject: Anyone working on ftrace function graph support on ARM?

ARM and FTRACE people,

Is anyone working on function graph support for ARM?

I haven't done much ARM assembly, but the Intel mechanism
for the return hook looks fairly straightforward,
and I thought I'd take a shot at implementing it on ARM.

But if someone else is already doing it, I'd rather work
with them.

BTW - I turned on -finstrument-functions on ARM, and it seems
to work OK (with the exception being that I don't see evenly matched
calls and returns). For this latter reason, I'm going to
start with an implementation that copies the return hook
used by x86, with a fallback to using __cyg_profile_... instead
of mcount, if the hook approach proves too hard for me on ARM.

My ultimate goal is to add function duration filtering, which
is one of the nicer features of KFT (an older tracer I used
to maintain out-of-mainline). With all the ftrace and ringbuffer
support already in mainline, this shouldn't be too hard, but
I need to start with basic graph support on ARM first.

Any comments or feedback on the approach, or on current plans
to extend ftrace support on ARM, before I get too far along,
are welcome.

Thanks,
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================


2009-03-24 20:26:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?


* Tim Bird <[email protected]> wrote:

> My ultimate goal is to add function duration filtering, which is
> one of the nicer features of KFT (an older tracer I used to
> maintain out-of-mainline). With all the ftrace and ringbuffer
> support already in mainline, this shouldn't be too hard, but I
> need to start with basic graph support on ARM first.

ah, function duration filtering - is that to only show functions
which have a duration beyond a certain (runtime configurable) value?

Ingo

2009-03-24 20:38:28

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Hello,

On Tue, Mar 24, 2009 at 12:38:50PM -0700, Tim Bird wrote:
> ARM and FTRACE people,
>
> Is anyone working on function graph support for ARM?
I don't think there is anyone working in that area.

You can Cc: me when your patches are ready, I will happily proof-read
them.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-03-24 21:03:49

by Tim Bird

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Ingo Molnar wrote:
> * Tim Bird <[email protected]> wrote:
>
>> My ultimate goal is to add function duration filtering, which is
>> one of the nicer features of KFT (an older tracer I used to
>> maintain out-of-mainline). With all the ftrace and ringbuffer
>> support already in mainline, this shouldn't be too hard, but I
>> need to start with basic graph support on ARM first.
>
> ah, function duration filtering - is that to only show functions
> which have a duration beyond a certain (runtime configurable) value?

Exactly. With KFT, you set a "mintime filter", and it removed from
the trace buffer any functions which were less than the threshhold.
This was not high cost, because you usually only removed the most
recently added function in the trace buffer.

This lets you focus on routines that last a long time. I used
it mainly to find long-duration routines in early boot.
The initcall tracer serves a similar purpose, but not at the
same granularity.

It greatly extends the length of time you can trace, before the
buffer fills up.
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-03-24 21:36:45

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Tue, Mar 24, 2009 at 12:38:50PM -0700, Tim Bird wrote:
> ARM and FTRACE people,
>
> Is anyone working on function graph support for ARM?


Aah, it was on my plans!
But well, if you are about to do it, so don't hesitate.

I don't have any arm board for now, though I have a pending
one which I will probably get in few weeks, but for now
I can't work on it, so knock yourself out.


> I haven't done much ARM assembly, but the Intel mechanism
> for the return hook looks fairly straightforward,
> and I thought I'd take a shot at implementing it on ARM.


Yes, ie:

_Before jumping to the function entry hook, you must save
the arguments for the traced function on the stack.
On x86, its eax, edx and ecx.
On arm, it will be r0-r3.
Then you have to transmit the address of the traced function
(it's on r14) and it's parent (must rely on fp for that).
Then you call the entry hook and restore the old scratch/arg
registers.


_ On return hook it's pretty the same, (saving the scratch
registers, especially the return value which should be on r0
and r1 if I'm not wrong).
But you'll have to get the original return address from the
return handler and then put it on pc.

Well it's a very naive listing, there are sometimes some problems.
For example on x86-64, I had to save even some non-scratch registers
before calling the return hook, I still don't know why.


> But if someone else is already doing it, I'd rather work
> with them.
>
> BTW - I turned on -finstrument-functions on ARM, and it seems
> to work OK (with the exception being that I don't see evenly matched
> calls and returns). For this latter reason, I'm going to
> start with an implementation that copies the return hook
> used by x86, with a fallback to using __cyg_profile_... instead
> of mcount, if the hook approach proves too hard for me on ARM.


Be care, -finstrument-functions will add one more handler and then
increase the size of the kernel, may be a lot.

Moreover it will not be compatible with dynamic tracing which is
designed to patch the mcount call sites. It doesn't support patching
of __cyg_profile. And patching __cyg_profile call sites would mean
running twice the time to patch the kernel code.

Function tracing (hooks only on function entries) and dynamic tracing
are already implemented on Arm. Using the mcount way shoudn't really be
a problem because all is already set for that with the function tracer.


Anyway if you need some help, don't hesitate!


> My ultimate goal is to add function duration filtering, which
> is one of the nicer features of KFT (an older tracer I used
> to maintain out-of-mainline). With all the ftrace and ringbuffer
> support already in mainline, this shouldn't be too hard, but
> I need to start with basic graph support on ARM first.


Yeah, seems a nice idea.

Thanks,
Frederic.


> Any comments or feedback on the approach, or on current plans
> to extend ftrace support on ARM, before I get too far along,
> are welcome.
>
> Thanks,
> -- Tim
>
> =============================
> Tim Bird
> Architecture Group Chair, CE Linux Forum
> Senior Staff Engineer, Sony Corporation of America
> =============================
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2009-03-24 21:40:44

by Tim Bird

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Frederic Weisbecker wrote:
> Yes, ie:
>
> _Before jumping to the function entry hook, you must save
> the arguments for the traced function on the stack.
> On x86, its eax, edx and ecx.
> On arm, it will be r0-r3.
> Then you have to transmit the address of the traced function
> (it's on r14) and it's parent (must rely on fp for that).
> Then you call the entry hook and restore the old scratch/arg
> registers.
>
>
> _ On return hook it's pretty the same, (saving the scratch
> registers, especially the return value which should be on r0
> and r1 if I'm not wrong).
> But you'll have to get the original return address from the
> return handler and then put it on pc.
>
> Well it's a very naive listing, there are sometimes some problems.
> For example on x86-64, I had to save even some non-scratch registers
> before calling the return hook, I still don't know why.

Thanks - this is very helpful. I'll send patches around when
I get something going. Hopefully in the next week or two.
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-03-24 21:49:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?


* Frederic Weisbecker <[email protected]> wrote:

> Well it's a very naive listing, there are sometimes some problems.
> For example on x86-64, I had to save even some non-scratch
> registers before calling the return hook, I still don't know why.

btw., which are those registers?

Ingo

2009-03-24 21:57:56

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > Well it's a very naive listing, there are sometimes some problems.
> > For example on x86-64, I had to save even some non-scratch
> > registers before calling the return hook, I still don't know why.
>
> btw., which are those registers?
>
> Ingo


I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which
are used for parameters.
And I had some crashes until I append r10 and r11 which actually are
scratch if I'm not wrong, but since they are scratch and are not used for
arguments, I thought they didn't need to be saved.

Well, I think there were some code flow cases I was missing.


The complete code is:

movq %rax, (%rsp)
movq %rcx, 8(%rsp)
movq %rdx, 16(%rsp)
movq %rsi, 24(%rsp)
movq %rdi, 32(%rsp)
movq %r8, 40(%rsp)
movq %r9, 48(%rsp)
movq %r10, 56(%rsp)
movq %r11, 64(%rsp)

call ftrace_return_to_handler

movq %rax, 72(%rsp) <-- get original return value
movq 64(%rsp), %r11
movq 56(%rsp), %r10
movq 48(%rsp), %r9
movq 40(%rsp), %r8
movq 32(%rsp), %rdi
movq 24(%rsp), %rsi
movq 16(%rsp), %rdx
movq 8(%rsp), %rcx
movq (%rsp), %rax
addq $72, %rsp

2009-03-24 22:15:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?


* Frederic Weisbecker <[email protected]> wrote:

> On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker <[email protected]> wrote:
> >
> > > Well it's a very naive listing, there are sometimes some problems.
> > > For example on x86-64, I had to save even some non-scratch
> > > registers before calling the return hook, I still don't know why.
> >
> > btw., which are those registers?
> >
> > Ingo
>
>
> I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which are
> used for parameters.

> And I had some crashes until I append r10 and r11 which actually
> are scratch if I'm not wrong, but since they are scratch and are
> not used for arguments, I thought they didn't need to be saved.
>
> Well, I think there were some code flow cases I was missing.

Correct, r10 and r11 are clobbered registers too - and you need to
save them too in mcount methods.

The reason is that mcount has a special calling convention - it's
not just about not destroying arguments - GCC can keep data in r10
or r11 scratch registers across function calls as well - for example
for relatively static functions that are in its local optimization
scope.

If GCC can prove that the local scope function itself does not
clobber r10/r11, it does not have to clobber them across the
function call. But the mcount() callback still gets inserted.

So the rule is: mcount must not destroy _any_ register state.
(beyond flags)

ngo

2009-03-24 22:29:42

by Abhishek Sagar

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Frederic Weisbecker wrote:
> Yes, ie:
>
> _Before jumping to the function entry hook, you must save
> the arguments for the traced function on the stack.
> On x86, its eax, edx and ecx.
> On arm, it will be r0-r3.
> Then you have to transmit the address of the traced function
> (it's on r14) and it's parent (must rely on fp for that).
> Then you call the entry hook and restore the old scratch/arg
> registers.
>
Instead of just restoring the old backed-up args, lr can be fixed up
inside the entry hook to point to the return hook. So when the traced
function returns, it actually returns to the return hook (where we can
restore the original return address). This means that
-finstrument-functions is not required at all. This is analogous to how
kretprobes work. The only difference here is that instead of planting a
kprobe at the function entry and redirecting the function return to the
profiling exit routine, we can use mcount. This is slightly more
complicated to implement but can be a very efficient alternative to
kretprobes.
--
Abhishek

2009-03-24 22:49:22

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> Frederic Weisbecker wrote:
> > Yes, ie:
> >
> > _Before jumping to the function entry hook, you must save
> > the arguments for the traced function on the stack.
> > On x86, its eax, edx and ecx.
> > On arm, it will be r0-r3.
> > Then you have to transmit the address of the traced function
> > (it's on r14) and it's parent (must rely on fp for that).
> > Then you call the entry hook and restore the old scratch/arg
> > registers.
> >
> Instead of just restoring the old backed-up args, lr can be fixed up
> inside the entry hook to point to the return hook. So when the traced
> function returns, it actually returns to the return hook (where we can
> restore the original return address). This means that
> -finstrument-functions is not required at all. This is analogous to how
> kretprobes work. The only difference here is that instead of planting a
> kprobe at the function entry and redirecting the function return to the
> profiling exit routine, we can use mcount. This is slightly more
> complicated to implement but can be a very efficient alternative to
> kretprobes.
> --
> Abhishek
>

Indeed, you need to override lr, that even the only solution.
I was still thinking in an x86 way with its on stack return address.

2009-03-24 22:55:21

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Tue, Mar 24, 2009 at 11:14:39PM +0100, Ingo Molnar wrote:
>
> * Frederic Weisbecker <[email protected]> wrote:
>
> > On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> > >
> > > * Frederic Weisbecker <[email protected]> wrote:
> > >
> > > > Well it's a very naive listing, there are sometimes some problems.
> > > > For example on x86-64, I had to save even some non-scratch
> > > > registers before calling the return hook, I still don't know why.
> > >
> > > btw., which are those registers?
> > >
> > > Ingo
> >
> >
> > I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which are
> > used for parameters.
>
> > And I had some crashes until I append r10 and r11 which actually
> > are scratch if I'm not wrong, but since they are scratch and are
> > not used for arguments, I thought they didn't need to be saved.
> >
> > Well, I think there were some code flow cases I was missing.
>
> Correct, r10 and r11 are clobbered registers too - and you need to
> save them too in mcount methods.
>
> The reason is that mcount has a special calling convention - it's
> not just about not destroying arguments - GCC can keep data in r10
> or r11 scratch registers across function calls as well - for example
> for relatively static functions that are in its local optimization
> scope.
>
> If GCC can prove that the local scope function itself does not
> clobber r10/r11, it does not have to clobber them across the
> function call. But the mcount() callback still gets inserted.
>
> So the rule is: mcount must not destroy _any_ register state.
> (beyond flags)
>
> ngo


Aah, ok, understood!
Thanks.

2009-03-25 08:36:54

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Tue, Mar 24, 2009 at 11:14:39PM +0100, Ingo Molnar wrote:
> So the rule is: mcount must not destroy _any_ register state.
> (beyond flags)

As has already been discussed on the ARM lists, we have a problem with
using mcount along with EABI, or OABI without frame pointers. We can
not avoid destroying 'lr' - the return address for the function.

OABI works around this by assuming that the parent function saved a
stack frame, which means it's at a known offset from the frame pointer
(provided frame pointers are enabled.) mcount reloads 'lr' before
returning:

ENTRY(mcount)
stmdb sp!, {r0-r3, lr}
mov r0, lr
sub r0, r0, #MCOUNT_INSN_SIZE

.globl mcount_call
mcount_call:
bl ftrace_stub
ldr lr, [fp, #-4] @ restore lr
ldmia sp!, {r0-r3, pc}

So I think if Tim is expecting to be able to use this feature on EABI
kernels, he's going to struggle.

2009-03-25 08:43:19

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Tue, Mar 24, 2009 at 11:48:58PM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> > Instead of just restoring the old backed-up args, lr can be fixed up
> > inside the entry hook to point to the return hook. So when the traced
> > function returns, it actually returns to the return hook (where we can
> > restore the original return address). This means that
> > -finstrument-functions is not required at all. This is analogous to how
> > kretprobes work. The only difference here is that instead of planting a
> > kprobe at the function entry and redirecting the function return to the
> > profiling exit routine, we can use mcount. This is slightly more
> > complicated to implement but can be a very efficient alternative to
> > kretprobes.
> > --
> > Abhishek
> >
>
> Indeed, you need to override lr, that even the only solution.
> I was still thinking in an x86 way with its on stack return address.

As pointed out in my previous mail, identifying where on the stack the
return address is stored is only possible for OABI with frame pointers.

EABI will probably be possible with the stack unwinding code, but it
probably won't be cheap. The EABI unwinder is scheduled for merging
during the present now-open merge window.

2009-03-25 08:54:45

by Ingo Molnar

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?


* Russell King - ARM Linux <[email protected]> wrote:

> On Tue, Mar 24, 2009 at 11:48:58PM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> > > Instead of just restoring the old backed-up args, lr can be fixed up
> > > inside the entry hook to point to the return hook. So when the traced
> > > function returns, it actually returns to the return hook (where we can
> > > restore the original return address). This means that
> > > -finstrument-functions is not required at all. This is analogous to how
> > > kretprobes work. The only difference here is that instead of planting a
> > > kprobe at the function entry and redirecting the function return to the
> > > profiling exit routine, we can use mcount. This is slightly more
> > > complicated to implement but can be a very efficient alternative to
> > > kretprobes.
> > > --
> > > Abhishek
> > >
> >
> > Indeed, you need to override lr, that even the only solution.
> > I was still thinking in an x86 way with its on stack return address.
>
> As pointed out in my previous mail, identifying where on the stack
> the return address is stored is only possible for OABI with frame
> pointers.
>
> EABI will probably be possible with the stack unwinding code, but
> it probably won't be cheap. The EABI unwinder is scheduled for
> merging during the present now-open merge window.

Unwinding is not realistic or desired for the function tracer - it
runs in every kernel function so performance is paramount.

So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS
dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER
Kconfig rule.

Ingo

2009-03-25 09:58:36

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, Mar 25, 2009 at 09:54:18AM +0100, Ingo Molnar wrote:
> Unwinding is not realistic or desired for the function tracer - it
> runs in every kernel function so performance is paramount.

Which would also include the unwinder itself as well.

> So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS
> dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER
> Kconfig rule.

If we have frame pointers enabled with EABI, then it looks like it will
work as well. So the dependency should be on FRAME_POINTERS for _every_
feature using the mcount code.

Hmm, and it looks like the ftrace code is rather crap:

ENTRY(mcount)
stmdb sp!, {r0-r3, lr}
ldr r0, =ftrace_trace_function
ldr r2, [r0]
adr r0, ftrace_stub
cmp r0, r2
bne trace
ldr lr, [fp, #-4] @ restore lr
ldmia sp!, {r0-r3, pc}

trace:
ldr r1, [fp, #-4] @ lr of instrumented routine
mov r0, lr
sub r0, r0, #MCOUNT_INSN_SIZE
mov lr, pc
mov pc, r2
XXX calling a C function results in r0-r3,ip,lr being clobbered XXX

mov lr, r1 @ restore lr
XXX not necessarily, r1 might be some other random value

ldmia sp!, {r0-r3, pc}

In fact, to me the above code looks totally crap, because it's checking
whether the caller is 'ftrace_stub'. It can never be 'ftrace_stub'
because that is an assembly function:

.globl ftrace_stub
ftrace_stub:
mov pc, lr

and therefore gcc has no hand in adding a mcount call to it.

Moreover, the _dynamic_ ftrace code does this:

ENTRY(mcount)
stmdb sp!, {r0-r3, lr}
mov r0, lr
sub r0, r0, #MCOUNT_INSN_SIZE

.globl mcount_call
mcount_call:
bl ftrace_stub
ldr lr, [fp, #-4] @ restore lr
ldmia sp!, {r0-r3, pc}

ENTRY(ftrace_caller)
stmdb sp!, {r0-r3, lr}
ldr r1, [fp, #-4]
mov r0, lr
sub r0, r0, #MCOUNT_INSN_SIZE

.globl ftrace_call
ftrace_call:
bl ftrace_stub
ldr lr, [fp, #-4] @ restore lr
ldmia sp!, {r0-r3, pc}

In other words, it pushes some words onto the stack, sets r0, calls
an assembly function which does nothing but just returns, reloads lr,
restores the stack and returns. This ftrace implementation looks like
an exercise in slowing down execution to me with no added value.

2009-03-25 10:45:29

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Hi Russell,

On Wed, Mar 25, 2009 at 09:57:51AM +0000, Russell King - ARM Linux wrote:
> On Wed, Mar 25, 2009 at 09:54:18AM +0100, Ingo Molnar wrote:
> > Unwinding is not realistic or desired for the function tracer - it
> > runs in every kernel function so performance is paramount.
>
> Which would also include the unwinder itself as well.
>
> > So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS
> > dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER
> > Kconfig rule.
>
> If we have frame pointers enabled with EABI, then it looks like it will
> work as well. So the dependency should be on FRAME_POINTERS for _every_
> feature using the mcount code.
>
> Hmm, and it looks like the ftrace code is rather crap:
>
> ENTRY(mcount)
> stmdb sp!, {r0-r3, lr}
> ldr r0, =ftrace_trace_function
> ldr r2, [r0]
> adr r0, ftrace_stub
> cmp r0, r2
> bne trace
> ldr lr, [fp, #-4] @ restore lr
> ldmia sp!, {r0-r3, pc}
>
> trace:
> ldr r1, [fp, #-4] @ lr of instrumented routine
> mov r0, lr
> sub r0, r0, #MCOUNT_INSN_SIZE
> mov lr, pc
> mov pc, r2
> XXX calling a C function results in r0-r3,ip,lr being clobbered XXX
>
> mov lr, r1 @ restore lr
> XXX not necessarily, r1 might be some other random value
>
> ldmia sp!, {r0-r3, pc}
>
> In fact, to me the above code looks totally crap, because it's checking
> whether the caller is 'ftrace_stub'. It can never be 'ftrace_stub'
> because that is an assembly function:
>
> .globl ftrace_stub
> ftrace_stub:
> mov pc, lr
>
> and therefore gcc has no hand in adding a mcount call to it.
Hhhm. Isn't the equivalent C-Code ~ as follows:

if (ftrace_trace_function != ftrace_stub)
trace(some, args);
return;
? ftrace_trace_function is initialised to ftrace_stub at compile time
and is changed when a tracerfunction is registered.

> Moreover, the _dynamic_ ftrace code does this:
>
> ENTRY(mcount)
> stmdb sp!, {r0-r3, lr}
> mov r0, lr
> sub r0, r0, #MCOUNT_INSN_SIZE
>
> .globl mcount_call
> mcount_call:
> bl ftrace_stub
> ldr lr, [fp, #-4] @ restore lr
> ldmia sp!, {r0-r3, pc}
>
> ENTRY(ftrace_caller)
> stmdb sp!, {r0-r3, lr}
> ldr r1, [fp, #-4]
> mov r0, lr
> sub r0, r0, #MCOUNT_INSN_SIZE
>
> .globl ftrace_call
> ftrace_call:
> bl ftrace_stub
> ldr lr, [fp, #-4] @ restore lr
> ldmia sp!, {r0-r3, pc}
>
> In other words, it pushes some words onto the stack, sets r0, calls
> an assembly function which does nothing but just returns, reloads lr,
> restores the stack and returns. This ftrace implementation looks like
> an exercise in slowing down execution to me with no added value.
The idea is that the instruction at address mcount_call (and
ftrace_call IIRC) is rewritten at run time.
Still the code is not active currently (because CONFIG_DYNAMIC_FTRACE
isn't selectable for ARM) and needs some care anyhow on reactivation
because the way how dynamic ftrace works changed somehow. Didn't look
at it up to now though.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-03-25 11:22:21

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, Mar 25, 2009 at 11:45:05AM +0100, Uwe Kleine-K?nig wrote:
> Hi Russell,
>
> On Wed, Mar 25, 2009 at 09:57:51AM +0000, Russell King - ARM Linux wrote:
> > On Wed, Mar 25, 2009 at 09:54:18AM +0100, Ingo Molnar wrote:
> > > Unwinding is not realistic or desired for the function tracer - it
> > > runs in every kernel function so performance is paramount.
> >
> > Which would also include the unwinder itself as well.
> >
> > > So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS
> > > dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER
> > > Kconfig rule.
> >
> > If we have frame pointers enabled with EABI, then it looks like it will
> > work as well. So the dependency should be on FRAME_POINTERS for _every_
> > feature using the mcount code.
> >
> > Hmm, and it looks like the ftrace code is rather crap:
> >
> > ENTRY(mcount)
> > stmdb sp!, {r0-r3, lr}
> > ldr r0, =ftrace_trace_function
> > ldr r2, [r0]
> > adr r0, ftrace_stub
> > cmp r0, r2
> > bne trace
> > ldr lr, [fp, #-4] @ restore lr
> > ldmia sp!, {r0-r3, pc}
> >
> > trace:
> > ldr r1, [fp, #-4] @ lr of instrumented routine
> > mov r0, lr
> > sub r0, r0, #MCOUNT_INSN_SIZE
> > mov lr, pc
> > mov pc, r2
> > XXX calling a C function results in r0-r3,ip,lr being clobbered XXX
> >
> > mov lr, r1 @ restore lr
> > XXX not necessarily, r1 might be some other random value
> >
> > ldmia sp!, {r0-r3, pc}
> >
> > In fact, to me the above code looks totally crap, because it's checking
> > whether the caller is 'ftrace_stub'. It can never be 'ftrace_stub'
> > because that is an assembly function:
> >
> > .globl ftrace_stub
> > ftrace_stub:
> > mov pc, lr
> >
> > and therefore gcc has no hand in adding a mcount call to it.
> Hhhm. Isn't the equivalent C-Code ~ as follows:
>
> if (ftrace_trace_function != ftrace_stub)
> trace(some, args);
> return;
> ? ftrace_trace_function is initialised to ftrace_stub at compile time
> and is changed when a tracerfunction is registered.

Correct. But my point is that there's no way for ftrace_stub to ever call
mcount. So the check there is pointless.

> > Moreover, the _dynamic_ ftrace code does this:
> >
> > ENTRY(mcount)
> > stmdb sp!, {r0-r3, lr}
> > mov r0, lr
> > sub r0, r0, #MCOUNT_INSN_SIZE
> >
> > .globl mcount_call
> > mcount_call:
> > bl ftrace_stub
> > ldr lr, [fp, #-4] @ restore lr
> > ldmia sp!, {r0-r3, pc}
> >
> > ENTRY(ftrace_caller)
> > stmdb sp!, {r0-r3, lr}
> > ldr r1, [fp, #-4]
> > mov r0, lr
> > sub r0, r0, #MCOUNT_INSN_SIZE
> >
> > .globl ftrace_call
> > ftrace_call:
> > bl ftrace_stub
> > ldr lr, [fp, #-4] @ restore lr
> > ldmia sp!, {r0-r3, pc}
> >
> > In other words, it pushes some words onto the stack, sets r0, calls
> > an assembly function which does nothing but just returns, reloads lr,
> > restores the stack and returns. This ftrace implementation looks like
> > an exercise in slowing down execution to me with no added value.
> The idea is that the instruction at address mcount_call (and
> ftrace_call IIRC) is rewritten at run time.
> Still the code is not active currently (because CONFIG_DYNAMIC_FTRACE
> isn't selectable for ARM) and needs some care anyhow on reactivation
> because the way how dynamic ftrace works changed somehow. Didn't look
> at it up to now though.

Ok - it would be nice if there was a comment to explain that.

Is someone going to fix the existing ftrace before trying to build stuff
on top of it?

2009-03-25 11:41:51

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, Mar 25, 2009 at 08:42:48AM +0000, Russell King - ARM Linux wrote:
> On Tue, Mar 24, 2009 at 11:48:58PM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 24, 2009 at 06:29:03PM -0400, Abhishek Sagar wrote:
> > > Instead of just restoring the old backed-up args, lr can be fixed up
> > > inside the entry hook to point to the return hook. So when the traced
> > > function returns, it actually returns to the return hook (where we can
> > > restore the original return address). This means that
> > > -finstrument-functions is not required at all. This is analogous to how
> > > kretprobes work. The only difference here is that instead of planting a
> > > kprobe at the function entry and redirecting the function return to the
> > > profiling exit routine, we can use mcount. This is slightly more
> > > complicated to implement but can be a very efficient alternative to
> > > kretprobes.
> > > --
> > > Abhishek
> > >
> >
> > Indeed, you need to override lr, that even the only solution.
> > I was still thinking in an x86 way with its on stack return address.
>
> As pointed out in my previous mail, identifying where on the stack the
> return address is stored is only possible for OABI with frame pointers.
>
> EABI will probably be possible with the stack unwinding code, but it
> probably won't be cheap. The EABI unwinder is scheduled for merging
> during the present now-open merge window.


Hm, if I'm not wrong, the function tracer already depends on frame pointer,
this is necessary to retrieve the parent of the caller.

2009-03-25 12:10:41

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Hi Russell,

> > > Hmm, and it looks like the ftrace code is rather crap:
> > >
> > > ENTRY(mcount)
> > > stmdb sp!, {r0-r3, lr}
> > > ldr r0, =ftrace_trace_function
> > > ldr r2, [r0]
> > > adr r0, ftrace_stub
> > > cmp r0, r2
> > > bne trace
> > > ldr lr, [fp, #-4] @ restore lr
> > > ldmia sp!, {r0-r3, pc}
> > >
> > > trace:
> > > ldr r1, [fp, #-4] @ lr of instrumented routine
> > > mov r0, lr
> > > sub r0, r0, #MCOUNT_INSN_SIZE
> > > mov lr, pc
> > > mov pc, r2
> > > XXX calling a C function results in r0-r3,ip,lr being clobbered XXX
> > >
> > > mov lr, r1 @ restore lr
> > > XXX not necessarily, r1 might be some other random value
> > >
> > > ldmia sp!, {r0-r3, pc}
> > >
> > > In fact, to me the above code looks totally crap, because it's checking
> > > whether the caller is 'ftrace_stub'. It can never be 'ftrace_stub'
> > > because that is an assembly function:
> > >
> > > .globl ftrace_stub
> > > ftrace_stub:
> > > mov pc, lr
> > >
> > > and therefore gcc has no hand in adding a mcount call to it.
> > Hhhm. Isn't the equivalent C-Code ~ as follows:
> >
> > if (ftrace_trace_function != ftrace_stub)
> > trace(some, args);
> > return;
> > ? ftrace_trace_function is initialised to ftrace_stub at compile time
> > and is changed when a tracerfunction is registered.
>
> Correct. But my point is that there's no way for ftrace_stub to ever call
> mcount. So the check there is pointless.
Which check? ftrace_trace_function isn't the caller of mcount but a
function pointer defined in kernel/trace/ftrace.c. And if this pointer
changes, the if-condition becomes true.

> Ok - it would be nice if there was a comment to explain that.
Some comments would be nice, that's right.

> Is someone going to fix the existing ftrace before trying to build stuff
> on top of it?
Mmh, I think you misunderstood the code, so I don't see something to fix
here (at least for non-dynamic ftrace). If the next few mails show,
that it's me who didn't understand the code, I will fix the problems, of
course.

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-03-25 16:00:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?


On Tue, 24 Mar 2009, Frederic Weisbecker wrote:

> On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> >
> > * Frederic Weisbecker <[email protected]> wrote:
> >
> > > Well it's a very naive listing, there are sometimes some problems.
> > > For example on x86-64, I had to save even some non-scratch
> > > registers before calling the return hook, I still don't know why.
> >
> > btw., which are those registers?
> >
> > Ingo
>
>
> I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which
> are used for parameters.
> And I had some crashes until I append r10 and r11 which actually are
> scratch if I'm not wrong, but since they are scratch and are not used for
> arguments, I thought they didn't need to be saved.
>
> Well, I think there were some code flow cases I was missing.
>
>
> The complete code is:
>
> movq %rax, (%rsp)
> movq %rcx, 8(%rsp)
> movq %rdx, 16(%rsp)
> movq %rsi, 24(%rsp)
> movq %rdi, 32(%rsp)
> movq %r8, 40(%rsp)
> movq %r9, 48(%rsp)
> movq %r10, 56(%rsp)
> movq %r11, 64(%rsp)
>
> call ftrace_return_to_handler
>
> movq %rax, 72(%rsp) <-- get original return value
> movq 64(%rsp), %r11
> movq 56(%rsp), %r10
> movq 48(%rsp), %r9
> movq 40(%rsp), %r8
> movq 32(%rsp), %rdi
> movq 24(%rsp), %rsi
> movq 16(%rsp), %rdx
> movq 8(%rsp), %rcx
> movq (%rsp), %rax
> addq $72, %rsp

This bothers me. In PowerPC 64, all I have is:

_GLOBAL(return_to_handler)
/* need to save return values */
std r4, -24(r1)
std r3, -16(r1)
std r31, -8(r1)
mr r31, r1
stdu r1, -112(r1)

bl .ftrace_return_to_handler
nop

/* return value has real return address */
mtlr r3

ld r1, 0(r1)
ld r4, -24(r1)
ld r3, -16(r1)
ld r31, -8(r1)

/* Jump back to real return address */
blr

All I save is the return values (and I'm paranoid with that, by saving
both r3 and r4 and not just r3) as well as saving the stack. There should
be no reason to save any other registers.

This is not the same as mcount. mcount varies differently from arch to
arch. But this is the return of a function. This is not a mcount call, and
really has nothing to do with mcount.

If you think about it, the return is coming back from a function that
should have already saved all the registers that it modifies. The caller
of that function (the one we will return to) should have saved any
registers that are allowed to be modified by the callee.

When we call our ftrace_return_to_handler function it too will save any
register that it must for callees and restore it on return.

Perhaps the issue you had with x86_64 was that you did not set up the
stack frame properly? And by saving all those registers, it just happen to
do it for you?

-- Steve

2009-03-25 16:34:47

by Tim Bird

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Russell King - ARM Linux wrote:
> As pointed out in my previous mail, identifying where on the stack the
> return address is stored is only possible for OABI with frame pointers.
>
> EABI will probably be possible with the stack unwinding code, but it
> probably won't be cheap. The EABI unwinder is scheduled for merging
> during the present now-open merge window.
>

-finstrument-functions is looking better and better. I know it
adds more overhead than the mcount call, and may wreak havoc with
the dynamic ftrace mechanisms, but at least the callouts are
simple, clear, and you get both entry and exit, at fixed
costs. I'll take a look at the EABI unwinder to see
what kind of variability it introduces (e.g. if it does a stack
scan or something).

Thanks very much for the information!
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-03-25 16:41:48

by Tim Bird

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Ingo Molnar wrote:
> Unwinding is not realistic or desired for the function tracer - it
> runs in every kernel function so performance is paramount.
>
> So, if i understood you correctly, an OABI_COMPAT and FRAME_POINTERS
> dependency has to be added to the ARM HAVE_FUNCTION_GRAPH_TRACER
> Kconfig rule.

This, unfortunately, is a problem for me. All my ARM
projects are now using EABI. If the unwinding looks like
it has too much overhead, I'll do some research on the
-finstrument-functions (__cyg_profile_func_enter/exit) approach.

I'm not sure, however, if it's possible to integrate this with
the dynamic tracing mechanisms, though. So something may have
to give to get this supported on ARM.
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-03-25 17:05:54

by Uwe Kleine-König

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
> Russell King - ARM Linux wrote:
> > As pointed out in my previous mail, identifying where on the stack the
> > return address is stored is only possible for OABI with frame pointers.
> >
> > EABI will probably be possible with the stack unwinding code, but it
> > probably won't be cheap. The EABI unwinder is scheduled for merging
> > during the present now-open merge window.
EABI with frame pointers should work, too, shouldn't it?

Best regards
Uwe

--
Pengutronix e.K. | Uwe Kleine-K?nig |
Industrial Linux Solutions | http://www.pengutronix.de/ |

2009-03-25 17:14:19

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, Mar 25, 2009 at 12:00:24PM -0400, Steven Rostedt wrote:
>
> On Tue, 24 Mar 2009, Frederic Weisbecker wrote:
>
> > On Tue, Mar 24, 2009 at 10:48:46PM +0100, Ingo Molnar wrote:
> > >
> > > * Frederic Weisbecker <[email protected]> wrote:
> > >
> > > > Well it's a very naive listing, there are sometimes some problems.
> > > > For example on x86-64, I had to save even some non-scratch
> > > > registers before calling the return hook, I still don't know why.
> > >
> > > btw., which are those registers?
> > >
> > > Ingo
> >
> >
> > I would expect to only save rax,rdi,rsi,rdx,rcx,r8,r9 which
> > are used for parameters.
> > And I had some crashes until I append r10 and r11 which actually are
> > scratch if I'm not wrong, but since they are scratch and are not used for
> > arguments, I thought they didn't need to be saved.
> >
> > Well, I think there were some code flow cases I was missing.
> >
> >
> > The complete code is:
> >
> > movq %rax, (%rsp)
> > movq %rcx, 8(%rsp)
> > movq %rdx, 16(%rsp)
> > movq %rsi, 24(%rsp)
> > movq %rdi, 32(%rsp)
> > movq %r8, 40(%rsp)
> > movq %r9, 48(%rsp)
> > movq %r10, 56(%rsp)
> > movq %r11, 64(%rsp)
> >
> > call ftrace_return_to_handler
> >
> > movq %rax, 72(%rsp) <-- get original return value
> > movq 64(%rsp), %r11
> > movq 56(%rsp), %r10
> > movq 48(%rsp), %r9
> > movq 40(%rsp), %r8
> > movq 32(%rsp), %rdi
> > movq 24(%rsp), %rsi
> > movq 16(%rsp), %rdx
> > movq 8(%rsp), %rcx
> > movq (%rsp), %rax
> > addq $72, %rsp
>
> This bothers me. In PowerPC 64, all I have is:
>
> _GLOBAL(return_to_handler)
> /* need to save return values */
> std r4, -24(r1)
> std r3, -16(r1)
> std r31, -8(r1)
> mr r31, r1
> stdu r1, -112(r1)
>
> bl .ftrace_return_to_handler
> nop
>
> /* return value has real return address */
> mtlr r3
>
> ld r1, 0(r1)
> ld r4, -24(r1)
> ld r3, -16(r1)
> ld r31, -8(r1)
>
> /* Jump back to real return address */
> blr
>
> All I save is the return values (and I'm paranoid with that, by saving
> both r3 and r4 and not just r3) as well as saving the stack. There should
> be no reason to save any other registers.
>
> This is not the same as mcount. mcount varies differently from arch to
> arch. But this is the return of a function. This is not a mcount call, and
> really has nothing to do with mcount.
>
> If you think about it, the return is coming back from a function that
> should have already saved all the registers that it modifies. The caller
> of that function (the one we will return to) should have saved any
> registers that are allowed to be modified by the callee.
>
> When we call our ftrace_return_to_handler function it too will save any
> register that it must for callees and restore it on return.
>
> Perhaps the issue you had with x86_64 was that you did not set up the
> stack frame properly? And by saving all those registers, it just happen to
> do it for you?


I don't know. It seems to me that the stack frame is well set.
This is weird.


> -- Steve
>

2009-03-25 17:18:23

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, Mar 25, 2009 at 06:05:33PM +0100, Uwe Kleine-K?nig wrote:
> On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
> > Russell King - ARM Linux wrote:
> > > As pointed out in my previous mail, identifying where on the stack the
> > > return address is stored is only possible for OABI with frame pointers.
> > >
> > > EABI will probably be possible with the stack unwinding code, but it
> > > probably won't be cheap. The EABI unwinder is scheduled for merging
> > > during the present now-open merge window.
> EABI with frame pointers should work, too, shouldn't it?

Yes, as I said at the top of my reply dated Wed, 25 Mar 2009 09:57:51 +0000

2009-03-25 18:38:56

by Tim Bird

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Russell King - ARM Linux wrote:
> On Wed, Mar 25, 2009 at 06:05:33PM +0100, Uwe Kleine-K�nig wrote:
>> On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
>>> Russell King - ARM Linux wrote:
>>>> As pointed out in my previous mail, identifying where on the stack the
>>>> return address is stored is only possible for OABI with frame pointers.
>>>>
>>>> EABI will probably be possible with the stack unwinding code, but it
>>>> probably won't be cheap. The EABI unwinder is scheduled for merging
>>>> during the present now-open merge window.
>> EABI with frame pointers should work, too, shouldn't it?
>
> Yes, as I said at the top of my reply dated Wed, 25 Mar 2009 09:57:51 +0000

It turns out you can't use -pg and -fomit-frame-pointers at the same time.
At least, my gcc complains about this:
$ make
arm-sony-linux-gnueabi-dev-gcc -g -pg -fomit-frame-pointer -o hello hello.c
arm-sony-linux-gnueabi-dev-gcc: -pg and -fomit-frame-pointer are incompatible
make: *** [hello] Error 1

=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================

2009-03-25 18:41:29

by Steven Rostedt

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?


On Wed, 25 Mar 2009, Tim Bird wrote:

> Russell King - ARM Linux wrote:
> > On Wed, Mar 25, 2009 at 06:05:33PM +0100, Uwe Kleine-K???nig wrote:
> >> On Wed, Mar 25, 2009 at 09:34:07AM -0700, Tim Bird wrote:
> >>> Russell King - ARM Linux wrote:
> >>>> As pointed out in my previous mail, identifying where on the stack the
> >>>> return address is stored is only possible for OABI with frame pointers.
> >>>>
> >>>> EABI will probably be possible with the stack unwinding code, but it
> >>>> probably won't be cheap. The EABI unwinder is scheduled for merging
> >>>> during the present now-open merge window.
> >> EABI with frame pointers should work, too, shouldn't it?
> >
> > Yes, as I said at the top of my reply dated Wed, 25 Mar 2009 09:57:51 +0000
>
> It turns out you can't use -pg and -fomit-frame-pointers at the same time.
> At least, my gcc complains about this:
> $ make
> arm-sony-linux-gnueabi-dev-gcc -g -pg -fomit-frame-pointer -o hello hello.c
> arm-sony-linux-gnueabi-dev-gcc: -pg and -fomit-frame-pointer are incompatible
> make: *** [hello] Error 1

Correct, that is because mcount requires (as Russell earlier pointed out)
the use of frame pointers.

-- Steve

2009-03-25 20:27:53

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64


Ingo,

Please pull the latest tip/tracing/function-graph tree, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
tip/tracing/function-graph


Steven Rostedt (1):
x86, function-graph: only save return values on x86_64

----
arch/x86/kernel/entry_64.S | 19 +++----------------
1 files changed, 3 insertions(+), 16 deletions(-)
---------------------------
commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
Author: Steven Rostedt <[email protected]>
Date: Wed Mar 25 14:30:04 2009 -0400

x86, function-graph: only save return values on x86_64

Impact: speed up

The return to handler portion of the function graph tracer should only
need to save the return values. The caller already saved off the
registers that the callee can modify. The returning function already
saved the registers it modified. When we call our own trace function
it too will save the registers that the callee must restore.

There's no reason to save off anything more that the registers used
to return the values.

Note, I did a complete kernel build with this modification and the
function graph tracer running on x86_64.

Signed-off-by: Steven Rostedt <[email protected]>

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index a331ec3..1ac9986 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -147,27 +147,14 @@ END(ftrace_graph_caller)
GLOBAL(return_to_handler)
subq $80, %rsp

+ /* Save the return values */
movq %rax, (%rsp)
- movq %rcx, 8(%rsp)
- movq %rdx, 16(%rsp)
- movq %rsi, 24(%rsp)
- movq %rdi, 32(%rsp)
- movq %r8, 40(%rsp)
- movq %r9, 48(%rsp)
- movq %r10, 56(%rsp)
- movq %r11, 64(%rsp)
+ movq %rdx, 8(%rsp)

call ftrace_return_to_handler

movq %rax, 72(%rsp)
- movq 64(%rsp), %r11
- movq 56(%rsp), %r10
- movq 48(%rsp), %r9
- movq 40(%rsp), %r8
- movq 32(%rsp), %rdi
- movq 24(%rsp), %rsi
- movq 16(%rsp), %rdx
- movq 8(%rsp), %rcx
+ movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $72, %rsp
retq

2009-03-25 20:46:53

by Jaswinder Singh Rajput

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64

On Wed, 2009-03-25 at 16:27 -0400, Steven Rostedt wrote:
> Ingo,
>
> Please pull the latest tip/tracing/function-graph tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/function-graph
>
>
> Steven Rostedt (1):
> x86, function-graph: only save return values on x86_64
>
> ----
> arch/x86/kernel/entry_64.S | 19 +++----------------
> 1 files changed, 3 insertions(+), 16 deletions(-)
> ---------------------------
> commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
> Author: Steven Rostedt <[email protected]>
> Date: Wed Mar 25 14:30:04 2009 -0400
>
> x86, function-graph: only save return values on x86_64
>
> Impact: speed up
>
> The return to handler portion of the function graph tracer should only
> need to save the return values. The caller already saved off the
> registers that the callee can modify. The returning function already
> saved the registers it modified. When we call our own trace function
> it too will save the registers that the callee must restore.
>
> There's no reason to save off anything more that the registers used
> to return the values.
>
> Note, I did a complete kernel build with this modification and the
> function graph tracer running on x86_64.
>
> Signed-off-by: Steven Rostedt <[email protected]>
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index a331ec3..1ac9986 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
> GLOBAL(return_to_handler)
> subq $80, %rsp
>
> + /* Save the return values */
> movq %rax, (%rsp)
> - movq %rcx, 8(%rsp)
> - movq %rdx, 16(%rsp)
> - movq %rsi, 24(%rsp)
> - movq %rdi, 32(%rsp)
> - movq %r8, 40(%rsp)
> - movq %r9, 48(%rsp)
> - movq %r10, 56(%rsp)
> - movq %r11, 64(%rsp)
> + movq %rdx, 8(%rsp)
>
> call ftrace_return_to_handler
>
> movq %rax, 72(%rsp)
> - movq 64(%rsp), %r11
> - movq 56(%rsp), %r10
> - movq 48(%rsp), %r9
> - movq 40(%rsp), %r8
> - movq 32(%rsp), %rdi
> - movq 24(%rsp), %rsi
> - movq 16(%rsp), %rdx
> - movq 8(%rsp), %rcx
> + movq 8(%rsp), %rdx
> movq (%rsp), %rax
> addq $72, %rsp
> retq

hmm, is this gonna work:
movq %rax, 16(%rsp)
movq 8(%rsp), %rdx
movq (%rsp), %rax
addq $16, %rsp
retq

--
JSR

2009-03-25 21:26:38

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64


On Thu, 26 Mar 2009, Jaswinder Singh Rajput wrote:
> >
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index a331ec3..1ac9986 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
> > GLOBAL(return_to_handler)
> > subq $80, %rsp
> >
> > + /* Save the return values */
> > movq %rax, (%rsp)
> > - movq %rcx, 8(%rsp)
> > - movq %rdx, 16(%rsp)
> > - movq %rsi, 24(%rsp)
> > - movq %rdi, 32(%rsp)
> > - movq %r8, 40(%rsp)
> > - movq %r9, 48(%rsp)
> > - movq %r10, 56(%rsp)
> > - movq %r11, 64(%rsp)
> > + movq %rdx, 8(%rsp)
> >
> > call ftrace_return_to_handler
> >
> > movq %rax, 72(%rsp)
> > - movq 64(%rsp), %r11
> > - movq 56(%rsp), %r10
> > - movq 48(%rsp), %r9
> > - movq 40(%rsp), %r8
> > - movq 32(%rsp), %rdi
> > - movq 24(%rsp), %rsi
> > - movq 16(%rsp), %rdx
> > - movq 8(%rsp), %rcx
> > + movq 8(%rsp), %rdx
> > movq (%rsp), %rax
> > addq $72, %rsp
> > retq
>
> hmm, is this gonna work:
> movq %rax, 16(%rsp)
> movq 8(%rsp), %rdx
> movq (%rsp), %rax
> addq $16, %rsp
> retq

Interesting. I was just talking over IRC with Frederic about this. One
theory about his earlier failures was not having a proper stack frame when
calling the function. This is probably a bogus theory and there was
probably something else that broke the code. But there is no harm in
allocating 80 bytes (it worked before). Thus, I rather error on the side
of least change than risk it.

-- Steve

2009-03-27 12:59:45

by Catalin Marinas

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

On Wed, 2009-03-25 at 09:34 -0700, Tim Bird wrote:
> Russell King - ARM Linux wrote:
> > As pointed out in my previous mail, identifying where on the stack the
> > return address is stored is only possible for OABI with frame pointers.
> >
> > EABI will probably be possible with the stack unwinding code, but it
> > probably won't be cheap. The EABI unwinder is scheduled for merging
> > during the present now-open merge window.
>
> -finstrument-functions is looking better and better. I know it
> adds more overhead than the mcount call, and may wreak havoc with
> the dynamic ftrace mechanisms, but at least the callouts are
> simple, clear, and you get both entry and exit, at fixed
> costs.

This approach would work even with Thumb-2 compiled kernels (not in
mainline) where the frame pointer is useless wrt backtraces. The Thumb-2
kernel cannot currently be compiled with frame pointer because there is
some inline assembly where gcc fail to allocate lower registers (in
Thumb, the frame pointer is r7).

Anyway, the EABI toolchain I have (from CodeSourcery) generates
something like below with -pg for both ARM and Thumb code (so that it
doesn't rely on the frame pointer):

push {lr}
bl __gnu_mcount_nc

I think this will be (was?) merged into the mainline gcc for ARM. The
-pg option is still incompatible with -fomit-frame-pointer but maybe it
shouldn't be anymore.

> I'll take a look at the EABI unwinder to see
> what kind of variability it introduces (e.g. if it does a stack
> scan or something).

It's not cheap - it looks up the function address in a binary tree and
it reads some bytecodes (either from the tree or from a different table
pointed to by the tree) which it interprets to perform the equivalent of
a return from the current function (reading registers from the stack and
changing SP, LR).

--
Catalin

2009-04-08 16:09:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64


(replying to old mail - working down my awful email backlog)

do we still need this for -tip?

Ingo

* Steven Rostedt <[email protected]> wrote:

>
> Ingo,
>
> Please pull the latest tip/tracing/function-graph tree, which can be found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> tip/tracing/function-graph
>
>
> Steven Rostedt (1):
> x86, function-graph: only save return values on x86_64
>
> ----
> arch/x86/kernel/entry_64.S | 19 +++----------------
> 1 files changed, 3 insertions(+), 16 deletions(-)
> ---------------------------
> commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
> Author: Steven Rostedt <[email protected]>
> Date: Wed Mar 25 14:30:04 2009 -0400
>
> x86, function-graph: only save return values on x86_64
>
> Impact: speed up
>
> The return to handler portion of the function graph tracer should only
> need to save the return values. The caller already saved off the
> registers that the callee can modify. The returning function already
> saved the registers it modified. When we call our own trace function
> it too will save the registers that the callee must restore.
>
> There's no reason to save off anything more that the registers used
> to return the values.
>
> Note, I did a complete kernel build with this modification and the
> function graph tracer running on x86_64.
>
> Signed-off-by: Steven Rostedt <[email protected]>
>
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index a331ec3..1ac9986 100644
> --- a/arch/x86/kernel/entry_64.S
> +++ b/arch/x86/kernel/entry_64.S
> @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
> GLOBAL(return_to_handler)
> subq $80, %rsp
>
> + /* Save the return values */
> movq %rax, (%rsp)
> - movq %rcx, 8(%rsp)
> - movq %rdx, 16(%rsp)
> - movq %rsi, 24(%rsp)
> - movq %rdi, 32(%rsp)
> - movq %r8, 40(%rsp)
> - movq %r9, 48(%rsp)
> - movq %r10, 56(%rsp)
> - movq %r11, 64(%rsp)
> + movq %rdx, 8(%rsp)
>
> call ftrace_return_to_handler
>
> movq %rax, 72(%rsp)
> - movq 64(%rsp), %r11
> - movq 56(%rsp), %r10
> - movq 48(%rsp), %r9
> - movq 40(%rsp), %r8
> - movq 32(%rsp), %rdi
> - movq 24(%rsp), %rsi
> - movq 16(%rsp), %rdx
> - movq 8(%rsp), %rcx
> + movq 8(%rsp), %rdx
> movq (%rsp), %rax
> addq $72, %rsp
> retq

2009-04-08 16:38:10

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64



On Wed, 8 Apr 2009, Ingo Molnar wrote:

>
> (replying to old mail - working down my awful email backlog)
>
> do we still need this for -tip?
>

I've ran it without issues. It will help speed up the function graph
tracer since it removes a bunch of copies at every function call. But I
would test it in your work suite, since I only tested it on two boxes.

-- Steve

2009-04-08 16:42:21

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64


* Steven Rostedt <[email protected]> wrote:

> > do we still need this for -tip?
>
> I've ran it without issues. It will help speed up the function
> graph tracer since it removes a bunch of copies at every function
> call. But I would test it in your work suite, since I only tested
> it on two boxes.

ok, great - i've applied it to tip:tracing/ftrace.

Ingo

2009-04-08 17:40:59

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH][GIT PULL] x86, function-graph: only save return values on x86_64

On Wed, Apr 08, 2009 at 06:09:15PM +0200, Ingo Molnar wrote:
>
> (replying to old mail - working down my awful email backlog)
>
> do we still need this for -tip?
>
> Ingo


Yes. It drops a small overhead on the function graph tracer.

Thanks,
Frederic.


> * Steven Rostedt <[email protected]> wrote:
>
> >
> > Ingo,
> >
> > Please pull the latest tip/tracing/function-graph tree, which can be found at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git
> > tip/tracing/function-graph
> >
> >
> > Steven Rostedt (1):
> > x86, function-graph: only save return values on x86_64
> >
> > ----
> > arch/x86/kernel/entry_64.S | 19 +++----------------
> > 1 files changed, 3 insertions(+), 16 deletions(-)
> > ---------------------------
> > commit d5b754b2208b1c22691a7cd5d1a65398b5973d74
> > Author: Steven Rostedt <[email protected]>
> > Date: Wed Mar 25 14:30:04 2009 -0400
> >
> > x86, function-graph: only save return values on x86_64
> >
> > Impact: speed up
> >
> > The return to handler portion of the function graph tracer should only
> > need to save the return values. The caller already saved off the
> > registers that the callee can modify. The returning function already
> > saved the registers it modified. When we call our own trace function
> > it too will save the registers that the callee must restore.
> >
> > There's no reason to save off anything more that the registers used
> > to return the values.
> >
> > Note, I did a complete kernel build with this modification and the
> > function graph tracer running on x86_64.
> >
> > Signed-off-by: Steven Rostedt <[email protected]>
> >
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index a331ec3..1ac9986 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -147,27 +147,14 @@ END(ftrace_graph_caller)
> > GLOBAL(return_to_handler)
> > subq $80, %rsp
> >
> > + /* Save the return values */
> > movq %rax, (%rsp)
> > - movq %rcx, 8(%rsp)
> > - movq %rdx, 16(%rsp)
> > - movq %rsi, 24(%rsp)
> > - movq %rdi, 32(%rsp)
> > - movq %r8, 40(%rsp)
> > - movq %r9, 48(%rsp)
> > - movq %r10, 56(%rsp)
> > - movq %r11, 64(%rsp)
> > + movq %rdx, 8(%rsp)
> >
> > call ftrace_return_to_handler
> >
> > movq %rax, 72(%rsp)
> > - movq 64(%rsp), %r11
> > - movq 56(%rsp), %r10
> > - movq 48(%rsp), %r9
> > - movq 40(%rsp), %r8
> > - movq 32(%rsp), %rdi
> > - movq 24(%rsp), %rsi
> > - movq 16(%rsp), %rdx
> > - movq 8(%rsp), %rcx
> > + movq 8(%rsp), %rdx
> > movq (%rsp), %rax
> > addq $72, %rsp
> > retq

2009-04-09 15:54:24

by Daniel Jacobowitz

[permalink] [raw]
Subject: Re: Anyone working on ftrace function graph support on ARM?

Sorry for the late reply, I'm way behind on list mail.

On Fri, Mar 27, 2009 at 12:58:08PM +0000, Catalin Marinas wrote:
> Anyway, the EABI toolchain I have (from CodeSourcery) generates
> something like below with -pg for both ARM and Thumb code (so that it
> doesn't rely on the frame pointer):
>
> push {lr}
> bl __gnu_mcount_nc
>
> I think this will be (was?) merged into the mainline gcc for ARM. The
> -pg option is still incompatible with -fomit-frame-pointer but maybe it
> shouldn't be anymore.

Was merged. It'll be in GCC 4.4 and is in our current Lite releases;
it was created to solve precisely this problem. It should be easy to
apply to an older GCC release if desired, but it's not in existing
FSF releases.

--
Daniel Jacobowitz
CodeSourcery