LinuxLists.cc - Ftrace for Microblaze

2009-11-03 13:34:12

Subject: Ftrace for Microblaze - notrace

Hi Thomas, Ingo, Steven and others

I am working on ftrace support for Microblaze. I have done basic ftrace
support - initcall, kmemtrace and other.
Currently I am working on function trace support. I did asm code as is
written in Documentation/trace/ftrace-design.txt.

I have two poins/questions.

I desided to use HAVE_FUNCTION_TRACE_MCOUNT_TEST - just test
function_trace_stop in asm code - that shouldn't be a problem.

I disable -pg in main Makefile because not work for me (more info below)
and I enable it just for two file in arch/microblaze (irq.c and intc.c)
- recompile kernel and test. Log is below and you can see that I am able
to see output for function. That's why I think that my mcount function
should be correct. Am I right?

uclinux login: root
Password:
# cd ; mkdir /debug; mount -t debugfs none /debug; cat
/debug/tracing/available_
tracers; echo function > /debug/tracing/current_tracer;echo 1 >
/debug/tracing/t
racing_enabled;cat /debug/tracing/trace | head -n 10
wakeup_rt wakeup function sched_switch nop
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
sh-52 [000] 184.940000: irq_call <-do_IRQ
sh-52 [000] 184.940000: do_IRQ <-get_irq
sh-52 [000] 184.940000: handle_level_irq <-intc_mask_ack
sh-52 [000] 184.950000: handle_level_irq
<-intc_enable_or_unmask
sh-52 [000] 184.950000: do_IRQ <-get_irq
sh-52 [000] 184.950000: handle_level_irq <-intc_mask_ack
#

When is used -pg flag kernel stops when I enable function tracer.
Interesting is that freeze than I setup function tracing (echo function
> .../current/tracer). Is it correct behavior? I think that trace
should start after (echo 1> .../tracing_enabled).

The next thing is about annotation notrace. I look at all arch. Is there
any algorithm which function should use it? I think that this could be a
reason why I my kernel freeze when enable function tracer.

I tested to add -pg flag for process.c file and after it kernel freeze
too. Is it mean that problem could be there?

Any other suggestion?

Thanks for your help,
Michal

2009-11-03 16:36:03

by Steven Rostedt

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

On Tue, 2009-11-03 at 14:32 +0100, Michal Simek wrote:
> Hi Thomas, Ingo, Steven and others
>
> I am working on ftrace support for Microblaze. I have done basic ftrace
> support - initcall, kmemtrace and other.
> Currently I am working on function trace support. I did asm code as is
> written in Documentation/trace/ftrace-design.txt.
>
> I have two poins/questions.
>
> I desided to use HAVE_FUNCTION_TRACE_MCOUNT_TEST - just test
> function_trace_stop in asm code - that shouldn't be a problem.

Yeah, and if you don't have that, it is taken care of in the generic
code too.

>
> I disable -pg in main Makefile because not work for me (more info below)
> and I enable it just for two file in arch/microblaze (irq.c and intc.c)
> - recompile kernel and test. Log is below and you can see that I am able
> to see output for function. That's why I think that my mcount function
> should be correct. Am I right?
>
> uclinux login: root
> Password:
> # cd ; mkdir /debug; mount -t debugfs none /debug; cat
> /debug/tracing/available_
> tracers; echo function > /debug/tracing/current_tracer;echo 1 >
> /debug/tracing/t
> racing_enabled;cat /debug/tracing/trace | head -n 10
> wakeup_rt wakeup function sched_switch nop
> # tracer: function
> #
> # TASK-PID CPU# TIMESTAMP FUNCTION
> # | | | | |
> sh-52 [000] 184.940000: irq_call <-do_IRQ
> sh-52 [000] 184.940000: do_IRQ <-get_irq
> sh-52 [000] 184.940000: handle_level_irq <-intc_mask_ack
> sh-52 [000] 184.950000: handle_level_irq
> <-intc_enable_or_unmask
> sh-52 [000] 184.950000: do_IRQ <-get_irq
> sh-52 [000] 184.950000: handle_level_irq <-intc_mask_ack
> #
>
>
> When is used -pg flag kernel stops when I enable function tracer.
> Interesting is that freeze than I setup function tracing (echo function
> > .../current/tracer). Is it correct behavior? I think that trace
> should start after (echo 1> .../tracing_enabled).

No, tracing_enabled is default on and should not be used (may go away).
Tracing starts as soon as you echo function into current_tracer.

>
>
> The next thing is about annotation notrace. I look at all arch. Is there
> any algorithm which function should use it? I think that this could be a
> reason why I my kernel freeze when enable function tracer.
>
> I tested to add -pg flag for process.c file and after it kernel freeze
> too. Is it mean that problem could be there?

Probably.

>
> Any other suggestion?

Just remember that -pg is very intrusive. I don't know your arch, nor
have I seen the code you did to update it. You can disable the -pg from
process.c by adding in the Makefile:

CFLAGS_REMOVE_process.o = -pg

And that will keep the process.c from being compiled with it. There may
be other files in the arch that should not have it. There's several in
x86 and in PowerPC.

-- Steve

2009-11-03 16:46:45

by Michal Simek

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

Steven Rostedt wrote:
> On Tue, 2009-11-03 at 14:32 +0100, Michal Simek wrote:
>> Hi Thomas, Ingo, Steven and others
>>
>> I am working on ftrace support for Microblaze. I have done basic ftrace
>> support - initcall, kmemtrace and other.
>> Currently I am working on function trace support. I did asm code as is
>> written in Documentation/trace/ftrace-design.txt.
>>
>> I have two poins/questions.
>>
>> I desided to use HAVE_FUNCTION_TRACE_MCOUNT_TEST - just test
>> function_trace_stop in asm code - that shouldn't be a problem.
>
> Yeah, and if you don't have that, it is taken care of in the generic
> code too.

jj, I know - asm checking is just two instruction - that's why I use it.

>
>> I disable -pg in main Makefile because not work for me (more info below)
>> and I enable it just for two file in arch/microblaze (irq.c and intc.c)
>> - recompile kernel and test. Log is below and you can see that I am able
>> to see output for function. That's why I think that my mcount function
>> should be correct. Am I right?
>>
>> uclinux login: root
>> Password:
>> # cd ; mkdir /debug; mount -t debugfs none /debug; cat
>> /debug/tracing/available_
>> tracers; echo function > /debug/tracing/current_tracer;echo 1 >
>> /debug/tracing/t
>> racing_enabled;cat /debug/tracing/trace | head -n 10
>> wakeup_rt wakeup function sched_switch nop
>> # tracer: function
>> #
>> # TASK-PID CPU# TIMESTAMP FUNCTION
>> # | | | | |
>> sh-52 [000] 184.940000: irq_call <-do_IRQ
>> sh-52 [000] 184.940000: do_IRQ <-get_irq
>> sh-52 [000] 184.940000: handle_level_irq <-intc_mask_ack
>> sh-52 [000] 184.950000: handle_level_irq
>> <-intc_enable_or_unmask
>> sh-52 [000] 184.950000: do_IRQ <-get_irq
>> sh-52 [000] 184.950000: handle_level_irq <-intc_mask_ack
>> #
>>
>>
>> When is used -pg flag kernel stops when I enable function tracer.
>> Interesting is that freeze than I setup function tracing (echo function
>> > .../current/tracer). Is it correct behavior? I think that trace
>> should start after (echo 1> .../tracing_enabled).
>
> No, tracing_enabled is default on and should not be used (may go away).
> Tracing starts as soon as you echo function into current_tracer.

ok

>
>>
>> The next thing is about annotation notrace. I look at all arch. Is there
>> any algorithm which function should use it? I think that this could be a
>> reason why I my kernel freeze when enable function tracer.
>>
>> I tested to add -pg flag for process.c file and after it kernel freeze
>> too. Is it mean that problem could be there?
>
> Probably.

Is there any manual or instruction which functions are dangerous and
especially why?

>
>> Any other suggestion?
>
> Just remember that -pg is very intrusive. I don't know your arch, nor
> have I seen the code you did to update it. You can disable the -pg from
> process.c by adding in the Makefile:
>
> CFLAGS_REMOVE_process.o = -pg

jj, I know how to do it.

>
> And that will keep the process.c from being compiled with it. There may
> be other files in the arch that should not have it. There's several in
> x86 and in PowerPC.

Do you know why they exclude it?

Can I use any special debug features? I mean I look at kernel.h and
there are some trace_printk function - but they don't work for me. Is it
possible to use them or not?

Thanks,
Michal

>
> -- Steve
>
>

2009-11-03 17:32:54

by Steven Rostedt

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

On Tue, 2009-11-03 at 17:45 +0100, Michal Simek wrote:
> Steven Rostedt wrote:

> >>
> >> The next thing is about annotation notrace. I look at all arch. Is there
> >> any algorithm which function should use it? I think that this could be a
> >> reason why I my kernel freeze when enable function tracer.
> >>
> >> I tested to add -pg flag for process.c file and after it kernel freeze
> >> too. Is it mean that problem could be there?
> >
> > Probably.
>
> Is there any manual or instruction which functions are dangerous and
> especially why?

No there's no manual. It varies from arch to arch. Hmm, maybe I'll
update the current doc to have some of the issues that come with it.
(more info farther down)

> >
> > And that will keep the process.c from being compiled with it. There may
> > be other files in the arch that should not have it. There's several in
> > x86 and in PowerPC.
>
> Do you know why they exclude it?

I did the ports for x86 and PowerPC so I guess I'm the best to answer
this ;-)

In arch/x86 we have these removing -pg:

tsc.o, rtc.o - Do not trace clock output. The tracer itself uses it.
Even though it protects against recursion, it is a big overhead to trace
something within the tracer.

paravirt-spinlocks.o - paravirt does crazy things with locks and such,
but overwriting text. Don't add mcount to them and overwrite the text
with the ftrace facility.

ftrace.o - well duh ;-)

early_printk.o - This is mostly used for debugging, don't trace it.

cpu/common.o - This is used in early stages of boot up of secordary
CPUs. As well as used in hot plug. Simply using something like
smp_processor_id() can crash the kernel here. The tracer code uses that,
so don't trace it.

Xen has a bunch that they added, and I think that's also because they do
funny things too.

For powerpc, we have:

cputable.o, prom_init.o, btext.o and prom.o that are all very sensitive
with boot up code. Simply calling mcount here can crash the system. But
this prevents boot up. It does not lock up the kernel when we enable
function tracing.

ftrace.o - again, this should be obvious.

time.o - this is the same reason we don't trace tsc and rtc in x86.

>
> Can I use any special debug features? I mean I look at kernel.h and
> there are some trace_printk function - but they don't work for me. Is it
> possible to use them or not?

Should work. That's not arch dependent. But you need to have some ftrace
enabled. Enable CONFIG_CONTEXT_SWITCH_TRACER, and trace_printk should
work. That writes to the ring buffer and you can see the output
in /debug/tracing/trace.

-- Steve

2009-11-03 18:13:25

by Michal Simek

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

Steven Rostedt wrote:
> On Tue, 2009-11-03 at 17:45 +0100, Michal Simek wrote:
>> Steven Rostedt wrote:
>
>>>> The next thing is about annotation notrace. I look at all arch. Is there
>>>> any algorithm which function should use it? I think that this could be a
>>>> reason why I my kernel freeze when enable function tracer.
>>>>
>>>> I tested to add -pg flag for process.c file and after it kernel freeze
>>>> too. Is it mean that problem could be there?
>>> Probably.
>> Is there any manual or instruction which functions are dangerous and
>> especially why?
>
> No there's no manual. It varies from arch to arch. Hmm, maybe I'll
> update the current doc to have some of the issues that come with it.
> (more info farther down)

yes, will be helpful - anyway I found missing references to
ftrace-implementation.txt

kernel/trace/Kconfig:15: See
Documentation/trace/ftrace-implementation.txt
kernel/trace/Kconfig:20: See
Documentation/trace/ftrace-implementation.txt
kernel/trace/Kconfig:25: See
Documentation/trace/ftrace-implementation.txt
kernel/trace/Kconfig:37: See
Documentation/trace/ftrace-implementation.txt
kernel/trace/Kconfig:42: See
Documentation/trace/ftrace-implementation.txt
kernel/trace/Kconfig:47: See
Documentation/trace/ftrace-implementation.txt
kernel/trace/Kconfig:55: See
Documentation/trace/ftrace-implementation.txt

>
>>> And that will keep the process.c from being compiled with it. There may
>>> be other files in the arch that should not have it. There's several in
>>> x86 and in PowerPC.
>> Do you know why they exclude it?
>
> I did the ports for x86 and PowerPC so I guess I'm the best to answer
> this ;-)

I hope so.

>
> In arch/x86 we have these removing -pg:
>
> tsc.o, rtc.o - Do not trace clock output. The tracer itself uses it.
> Even though it protects against recursion, it is a big overhead to trace
> something within the tracer.
>
> paravirt-spinlocks.o - paravirt does crazy things with locks and such,
> but overwriting text. Don't add mcount to them and overwrite the text
> with the ftrace facility.
>
> ftrace.o - well duh ;-)
>
> early_printk.o - This is mostly used for debugging, don't trace it.
>
> cpu/common.o - This is used in early stages of boot up of secordary
> CPUs. As well as used in hot plug. Simply using something like
> smp_processor_id() can crash the kernel here. The tracer code uses that,
> so don't trace it.
>
> Xen has a bunch that they added, and I think that's also because they do
> funny things too.
>
>
> For powerpc, we have:
>
> cputable.o, prom_init.o, btext.o and prom.o that are all very sensitive
> with boot up code. Simply calling mcount here can crash the system. But
> this prevents boot up. It does not lock up the kernel when we enable
> function tracing.
>
> ftrace.o - again, this should be obvious.
>
> time.o - this is the same reason we don't trace tsc and rtc in x86.

ok In general - all timing functions and function which are related with
startup code - called only once before turning on irqs - there are a lot
of functions. I understand that make no sense to trace that files. And
plus all debugging functions. And function which uses locks.
Anything else?

>
>> Can I use any special debug features? I mean I look at kernel.h and
>> there are some trace_printk function - but they don't work for me. Is it
>> possible to use them or not?
>
> Should work. That's not arch dependent. But you need to have some ftrace
> enabled. Enable CONFIG_CONTEXT_SWITCH_TRACER, and trace_printk should
> work. That writes to the ring buffer and you can see the output
> in /debug/tracing/trace.

It could be nice features but in case that you have command line and you
can read that trace debug output. I mean any printk messages - which not
contain any trace facility - just pure printk to console. Are that
messages written to any place which could be read from debuger? Like log
buffer? Is there any place for them?

The second things is that I haven't implemented save_stack_trace
function - it is just while(1) loop. My kernel not jumps to this
function that's why I am not interested to implement it. Am I right that
ftrace code don't use it? Or the standard behavior is that ftrace code
use this function and really care about. Does your kernel use that function?

From my test I don't understand MCOUNT_INSN_SIZE - and which function
doesn't care about. There is this line of pseudocode in your doc
(unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;) but what
the reason is to do that. If I understand correct selfpc and frompc just
give addresses which functions called that mcount and has impact to log.
I mean that if that address is not proper begin of function is found
(not sure if the code do it or not) function name (and function) which
contain that address. Or is there any special reason why to substract
that MCOUNT_INSN_SIZE - for timing or anything like that?

I simple just not find any reason for it. And I believe that don't have
connection with my problem.

Thanks,
Michal

>
> -- Steve
>
>

2009-11-04 02:03:10

by Steven Rostedt

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

On Tue, 2009-11-03 at 19:11 +0100, Michal Simek wrote:

> > cputable.o, prom_init.o, btext.o and prom.o that are all very sensitive
> > with boot up code. Simply calling mcount here can crash the system. But
> > this prevents boot up. It does not lock up the kernel when we enable
> > function tracing.
> >
> > ftrace.o - again, this should be obvious.
> >
> > time.o - this is the same reason we don't trace tsc and rtc in x86.
>
> ok In general - all timing functions and function which are related with
> startup code - called only once before turning on irqs - there are a lot
> of functions. I understand that make no sense to trace that files. And
> plus all debugging functions. And function which uses locks.
> Anything else?

Actually, you only need to "notrace" the start up code that can crash
the box if they are traced. The powerpc code was so fragile there that
just calling mcount caused a panic.

>
>
> >
> >> Can I use any special debug features? I mean I look at kernel.h and
> >> there are some trace_printk function - but they don't work for me. Is it
> >> possible to use them or not?
> >
> > Should work. That's not arch dependent. But you need to have some ftrace
> > enabled. Enable CONFIG_CONTEXT_SWITCH_TRACER, and trace_printk should
> > work. That writes to the ring buffer and you can see the output
> > in /debug/tracing/trace.
>
> It could be nice features but in case that you have command line and you
> can read that trace debug output. I mean any printk messages - which not
> contain any trace facility - just pure printk to console. Are that
> messages written to any place which could be read from debuger? Like log
> buffer? Is there any place for them?

The are only written to the ring buffer. You may be able to read that
too, but it will take a bit of work.

>
> The second things is that I haven't implemented save_stack_trace
> function - it is just while(1) loop. My kernel not jumps to this
> function that's why I am not interested to implement it. Am I right that
> ftrace code don't use it? Or the standard behavior is that ftrace code
> use this function and really care about. Does your kernel use that function?

We use it for saving stack traces. The ftrace stack tracer also uses it.
But if it is a nop, the other parts of ftrace should still work.

>
> From my test I don't understand MCOUNT_INSN_SIZE - and which function
> doesn't care about. There is this line of pseudocode in your doc
> (unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;) but what
> the reason is to do that. If I understand correct selfpc and frompc just
> give addresses which functions called that mcount and has impact to log.
> I mean that if that address is not proper begin of function is found
> (not sure if the code do it or not) function name (and function) which
> contain that address. Or is there any special reason why to substract
> that MCOUNT_INSN_SIZE - for timing or anything like that?

The passed in address to mcount is really the return address and not the
caller to mcount. We want the actual address to the mcount caller. To
get that we must subtract size of the call to mcount from the address
passed in.

>
> I simple just not find any reason for it. And I believe that don't have
> connection with my problem.

The code must account for it. If you have implemented dynamic ftrace,
then you need to do that. Otherwise we are updating the wrong places in
text.

-- Steve

2009-11-04 02:26:39

by John Williams

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

>> ?From my test I don't understand MCOUNT_INSN_SIZE - and which function
>> doesn't care about. There is this line of pseudocode in your doc
>> (unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;) but what
>> the reason is to do that. If I understand correct selfpc and frompc just
>> give addresses which functions called that mcount and has impact to log.
>> I mean that if that address is not proper begin of function is found
>> (not sure if the code do it or not) function name (and function) which
>> contain that address. Or is there any special reason why to substract
>> that MCOUNT_INSN_SIZE - for timing or anything like that?
>
> The passed in address to mcount is really the return address and not the
> caller to mcount. We want the actual address to the mcount caller. To
> get that we must subtract size of the call to mcount from the address
> passed in.

Is it critical that mcount calculate and use the exact entry point of
the calling function, or just that it be consistent and be located
"inside" the caller's footprint? If so, MCOUNT_INSN_SIZE would be
superfluous?...

MicroBlaze gcc currently inserts the mcount call after the function
prologue, and the prologue will differ in length depending upon how
much stack and register shuffling is required for that function. So,
a single constant cannot correctly describe the offset of the mcount
call site to the function's true entry point.

Or, perhaps the entry point is defined as "the first opcode after the
prologue", in which case MCOUNT_INSN_SIZE can just be the size of the
"bralid r15, _mcount; nop" sequence that we are generating (ie 8
bytes).

There is a simple GCC #define that will force the mcount branch before
the prologue, but that makes things a little trickier because our
mcount hook must take care of saving away the branch link register.
If we call after the prologue, gcc has already done that for us.

Thanks,

John
--
John Williams
PetaLogix - Linux Solutions for a Reconfigurable World
w: http://www.petalogix.com p: +61-7-30090663 f: +61-7-30090663

2009-11-04 02:42:54

by Steven Rostedt

[permalink] [raw]

Subject: Re: Ftrace for Microblaze - notrace

On Wed, 2009-11-04 at 12:26 +1000, John Williams wrote:
> >> From my test I don't understand MCOUNT_INSN_SIZE - and which function
> >> doesn't care about. There is this line of pseudocode in your doc
> >> (unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;) but what
> >> the reason is to do that. If I understand correct selfpc and frompc just
> >> give addresses which functions called that mcount and has impact to log.
> >> I mean that if that address is not proper begin of function is found
> >> (not sure if the code do it or not) function name (and function) which
> >> contain that address. Or is there any special reason why to substract
> >> that MCOUNT_INSN_SIZE - for timing or anything like that?
> >
> > The passed in address to mcount is really the return address and not the
> > caller to mcount. We want the actual address to the mcount caller. To
> > get that we must subtract size of the call to mcount from the address
> > passed in.
>
> Is it critical that mcount calculate and use the exact entry point of
> the calling function, or just that it be consistent and be located
> "inside" the caller's footprint? If so, MCOUNT_INSN_SIZE would be
> superfluous?...

No no no.

I may not have articulated this well enough.

The address is of the actual call to mcount, not the entry point of the
function. For example:

ffffffff812dcb3c <schedule>:
ffffffff812dcb3c: 55 push %rbp
ffffffff812dcb3d: 48 89 e5 mov %rsp,%rbp
ffffffff812dcb40: 41 57 push %r15
ffffffff812dcb42: 41 56 push %r14
ffffffff812dcb44: 41 55 push %r13
ffffffff812dcb46: 41 54 push %r12
ffffffff812dcb48: 53 push %rbx
ffffffff812dcb49: 48 83 ec 68 sub $0x68,%rsp
ffffffff812dcb4d: e8 6e ee d2 ff callq ffffffff8100b9c0 <mcount>
ffffffff812dcb52: bf 01 00 00 00 mov $0x1,%edi
ffffffff812dcb57: e8 e4 91 d5 ff callq ffffffff81035d40 <add_preempt_count>

When mcount is called in the above code, the stack contains the address
0xffffffff812dcb52 which is the mov $0x1,%edi. But the dynamic
ftrace functionality needs to modify the call co mcount itself. To do
so, we must subtract the MCOUNT_INSN_SIZE (which for x86 is 5). This
gives us the actual call site to mcount (0xffffffff812dcb4d).

The generic code only uses that macro for debugging purposes. If
something goes wrong, we write out what was at the address of mcount, by
showing MCOUNT_INSN_SIZE bytes.

That macro is used in the arch specific code to modify the .text. It is
used as the size parameter.

>
> MicroBlaze gcc currently inserts the mcount call after the function
> prologue, and the prologue will differ in length depending upon how
> much stack and register shuffling is required for that function. So,
> a single constant cannot correctly describe the offset of the mcount
> call site to the function's true entry point.
>
> Or, perhaps the entry point is defined as "the first opcode after the
> prologue", in which case MCOUNT_INSN_SIZE can just be the size of the
> "bralid r15, _mcount; nop" sequence that we are generating (ie 8
> bytes).
>
> There is a simple GCC #define that will force the mcount branch before
> the prologue, but that makes things a little trickier because our
> mcount hook must take care of saving away the branch link register.
> If we call after the prologue, gcc has already done that for us.

You want MCOUNT_INSN_SIZE to be the number of bytes to back up to get to
the actual call to mcount (the bralid r15, mcount).

-- Steve