2008-11-16 21:25:44

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


The following patches are for my work on porting the new dynamic ftrace
framework to PowerPC. The issue I had with both PPC64 and PPC32 is
that the calls to mcount are 24 bit jumps. Since the modules are
loaded in vmalloc address space, the call to mcount is farther than
what a 24 bit jump can make. The way PPC solves this is with the use
of trampolines. The trampoline is a memory space allocated within the
24 bit region of the module. The code in the trampoline that the
jump is made to does a far jump to the core kernel code.

The way PPC64 implements this is slightly different than the way
PPC32 does. Since my PPC64 box has a serial port it makes developing
and debugging easier, so my first patches port to PPC64, and then
the later patches include the work to get PPC32 working.

I'm describing what both PPC archs do in a bit of detail so that the
PPC exports CC'd can tell me if I'm incorrect. I did not read any
PPC specs to find out what was happening, I only reviewed the existing
PPC code that was in Linux.

The PPC64 dynamic ftrace:

PPC64, although works with 64 bit registers, the op codes are still
32 bit in length. PPC64 uses table of contents (TOC) fields
to make their calls to functions. A function name is really a pointer
into the TOC table that stores the actual address of the function
along with the TOC of that function. The r2 register plays as the
TOC pointer. The actual name of the function is the function name
with a dot '.' prefix. The reference name "schedule" is really
to the TOC entry, which calls the actual code with the reference
name ".schedule". This also explains why the list of available filter
functions on PPC64 all have a dot prefix.

When a funtion is called, it uses the 'bl' command which is a 24
bit function jump (saving the return address in the link register).
The next operation after all 'bl' calls is a nop. What the module
load code does when one of these 'bl' calls is farther than 24 bits
can handle, it creates a entry in the TOC and has the 'bl' call to
that entry. The entry in the TOC will save the r2 register on the
stack "40(r1)" load the actually function into the ctrl register
and make the far jump using that register (I'm using the term
'far' to mean more than 24 bits, nothing to do with the x86 far jumps
that deal with segments). The module linker also modifies the
nop after the 'bl' call in the module into an op code that will restore
the r2 register 'ld r2,40(r1)'.

Now for what we need to do with dynamic ftrace on PPC64:

Dynamic ftrace needs to convert these calls to mcount into nops. It
also needs to be able to convert them back into a call to the
ftrace_caller (mcount is just a stub, the actual function recording
is done by a different function called ftrace_caller).

Before the dynamic ftrace modifies any code, it first makes sure
what it is changing is indeed what it expects it to be. This means
the dynamic ftrace code for PPC64 must be fully aware of the module
trampolines. When a mcount call is farther than 24 bits, it
takes a look at where that mcount call is at. The call should be into
an entry in the TOC, and the dynamic ftrace code reads the entry
that the call points to. It makes sure that the entry will make a
call to mcount (otherwise it returns failure).

After verifying that the 'bl' call calls into the TOC that calls
mcount, it converts that 'bl' call into a nop. It also converts
the following op code (the load of r2) into a nop since the TOC
is no longer saved. It does test that the following op is a load
of r2 before making any of the above changes. It also stores the
module structure pointer into the dyn_ftrace record field, for later
use.

On enabling the call back to ftrace_caller, the dynamic ftrace code
first verifys that the two op codes are two nops. It then reads
the dyn_ftrace structure module pointer to find the TOC and the entry
for the ftrace_caller (the ftrace_caller is added to the module
TOC on module load). It then changes the call to the ftrace_caller
and the op that reloads the r2 register.

Note, to disable the ftrace_caller, the same as the disabling of
mcount is done, but instead it verifies that the TOC entry calls
the ftrace_caller and not mcount.


The PPC32 dynamic ftrace:

The work for PPC32 is very much the same as the PPC64 code but the 32
version does not need to deal with TOCS. This makes the code much
simpler. Pretty much everything as PPC64 is done, except it does not
need to index a TOC.

To disable mcount (or ftrace_caller):

If the call is greater than 24 bits, it looks to see where the 'bl'
jumps to. It verifies that the trampoline that it jumps to makes the
call to 'mcount' (or ftrace_caller if that is what is expected).
It then simply converts the 'bl' to a nop.

To enable ftrace_caller:

The 'bl' is converted to jump to the ftrace_caller trampoline entry
that was created on module load.

I've tested the following patches on both PPC64 and PPC32. I will
admit that the PPC64 does not seem that stable, but neither does the
code when all this is not enabled ;-) I'll debug it more to see if
I can find the cause of my crashes, which may or may not be related
to the dynamic ftrace code. But the use of TOCS in PPC64 make me
a bit nervious that I did not do this correctly. Any help in reviewing
my code for mistakes would be greatly appreciated.

The following patches are in:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

branch: tip/ppc


Steven Rostedt (7):
ftrace, PPC: do not latency trace idle
ftrace, ppc: convert to new dynamic ftrace arch API
ftrace: powerpc mcount record port
ftrace, PPC: use probe_kernel API to modify code
ftrace, PPC64: handle module trampolines for dyn ftrace
ftrace,ppc32: enabled dynamic ftrace
ftrace,ppc32: dynamic ftrace to handle modules

----
arch/powerpc/Kconfig | 2 +
arch/powerpc/include/asm/ftrace.h | 14 +-
arch/powerpc/include/asm/module.h | 16 ++-
arch/powerpc/kernel/ftrace.c | 460 +++++++++++++++++++++++++++++++++---
arch/powerpc/kernel/idle.c | 5 +
arch/powerpc/kernel/module_32.c | 10 +
arch/powerpc/kernel/module_64.c | 13 +
scripts/recordmcount.pl | 18 ++-
8 files changed, 495 insertions(+), 43 deletions(-)


2008-11-16 22:56:18

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC

Steven Rostedt writes:

> The following patches are for my work on porting the new dynamic ftrace
> framework to PowerPC. The issue I had with both PPC64 and PPC32 is
> that the calls to mcount are 24 bit jumps. Since the modules are
> loaded in vmalloc address space, the call to mcount is farther than
> what a 24 bit jump can make. The way PPC solves this is with the use
> of trampolines. The trampoline is a memory space allocated within the
> 24 bit region of the module. The code in the trampoline that the
> jump is made to does a far jump to the core kernel code.

Thanks for doing this work. I'll go through the patches in detail
today, but first I'd like to clear up a couple of things for you. The
first is that unconditional branches on PowerPC effectively have a
26-bit sign-extended offset, not 24-bit. The offset field in the
instruction is 24 bits long, but because all instructions are 4 bytes
long, two extra 0 bits get appended to the offset field, giving a
26-bit offset and a range of +/- 32MB from the branch instruction.

> PPC64, although works with 64 bit registers, the op codes are still
> 32 bit in length. PPC64 uses table of contents (TOC) fields
> to make their calls to functions. A function name is really a pointer
> into the TOC table that stores the actual address of the function
> along with the TOC of that function. The r2 register plays as the
> TOC pointer. The actual name of the function is the function name
> with a dot '.' prefix. The reference name "schedule" is really
> to the TOC entry, which calls the actual code with the reference
> name ".schedule". This also explains why the list of available filter
> functions on PPC64 all have a dot prefix.

A little more detail: the TOC mainly stores addresses and other
constants. Functions have a descriptor that is stored in a .opd
section (not the TOC, though the TOC may contain pointers to procedure
descriptors). Each descriptor has the address of the code, the
address of the TOC for the function, and a static chain pointer (not
used for C, but can used for other languages). As you note, the
symbol for a function will be the address of the descriptor rather
than the address of the function code.

> When a funtion is called, it uses the 'bl' command which is a 24
> bit function jump (saving the return address in the link register).
> The next operation after all 'bl' calls is a nop. What the module
> load code does when one of these 'bl' calls is farther than 24 bits
> can handle, it creates a entry in the TOC and has the 'bl' call to

The module loader allocates some memory for these trampolines, but
that's a distinct area from the TOC and the OPD section.

> that entry. The entry in the TOC will save the r2 register on the
> stack "40(r1)" load the actually function into the ctrl register

"counter" register, actually, not "ctrl".

> The work for PPC32 is very much the same as the PPC64 code but the 32
> version does not need to deal with TOCS. This makes the code much
> simpler. Pretty much everything as PPC64 is done, except it does not
> need to index a TOC.

Right.

> I've tested the following patches on both PPC64 and PPC32. I will
> admit that the PPC64 does not seem that stable, but neither does the
> code when all this is not enabled ;-) I'll debug it more to see if
> I can find the cause of my crashes, which may or may not be related
> to the dynamic ftrace code. But the use of TOCS in PPC64 make me
> a bit nervious that I did not do this correctly. Any help in reviewing
> my code for mistakes would be greatly appreciated.

Hmmm. What sort of crashes are you seeing?

Regards,
Paul.

2008-11-17 15:42:23

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


On Mon, 17 Nov 2008, Paul Mackerras wrote:

> Steven Rostedt writes:
>
> > The following patches are for my work on porting the new dynamic ftrace
> > framework to PowerPC. The issue I had with both PPC64 and PPC32 is
> > that the calls to mcount are 24 bit jumps. Since the modules are
> > loaded in vmalloc address space, the call to mcount is farther than
> > what a 24 bit jump can make. The way PPC solves this is with the use
> > of trampolines. The trampoline is a memory space allocated within the
> > 24 bit region of the module. The code in the trampoline that the
> > jump is made to does a far jump to the core kernel code.
>
> Thanks for doing this work. I'll go through the patches in detail
> today, but first I'd like to clear up a couple of things for you. The
> first is that unconditional branches on PowerPC effectively have a
> 26-bit sign-extended offset, not 24-bit. The offset field in the
> instruction is 24 bits long, but because all instructions are 4 bytes
> long, two extra 0 bits get appended to the offset field, giving a
> 26-bit offset and a range of +/- 32MB from the branch instruction.

Ah yes, thanks for the clarification.

>
> > PPC64, although works with 64 bit registers, the op codes are still
> > 32 bit in length. PPC64 uses table of contents (TOC) fields
> > to make their calls to functions. A function name is really a pointer
> > into the TOC table that stores the actual address of the function
> > along with the TOC of that function. The r2 register plays as the
> > TOC pointer. The actual name of the function is the function name
> > with a dot '.' prefix. The reference name "schedule" is really
> > to the TOC entry, which calls the actual code with the reference
> > name ".schedule". This also explains why the list of available filter
> > functions on PPC64 all have a dot prefix.
>
> A little more detail: the TOC mainly stores addresses and other
> constants. Functions have a descriptor that is stored in a .opd
> section (not the TOC, though the TOC may contain pointers to procedure
> descriptors). Each descriptor has the address of the code, the
> address of the TOC for the function, and a static chain pointer (not
> used for C, but can used for other languages). As you note, the
> symbol for a function will be the address of the descriptor rather
> than the address of the function code.
>
> > When a funtion is called, it uses the 'bl' command which is a 24
> > bit function jump (saving the return address in the link register).
> > The next operation after all 'bl' calls is a nop. What the module
> > load code does when one of these 'bl' calls is farther than 24 bits
> > can handle, it creates a entry in the TOC and has the 'bl' call to
>
> The module loader allocates some memory for these trampolines, but
> that's a distinct area from the TOC and the OPD section.

Ah, yes, my mistake. It is a trampoline entry, not part of the TOC.

>
> > that entry. The entry in the TOC will save the r2 register on the
> > stack "40(r1)" load the actually function into the ctrl register
>
> "counter" register, actually, not "ctrl".

Oops, I still make that mistake :-/ I use to do a lot of PPC work several
years ago, and I would always call that the control register, and my
colleagues would always correct me and say its the counter register. I
guess some things just don't change ;-)


>
> > The work for PPC32 is very much the same as the PPC64 code but the 32
> > version does not need to deal with TOCS. This makes the code much
> > simpler. Pretty much everything as PPC64 is done, except it does not
> > need to index a TOC.
>
> Right.
>
> > I've tested the following patches on both PPC64 and PPC32. I will
> > admit that the PPC64 does not seem that stable, but neither does the
> > code when all this is not enabled ;-) I'll debug it more to see if
> > I can find the cause of my crashes, which may or may not be related
> > to the dynamic ftrace code. But the use of TOCS in PPC64 make me
> > a bit nervious that I did not do this correctly. Any help in reviewing
> > my code for mistakes would be greatly appreciated.
>
> Hmmm. What sort of crashes are you seeing?

This code is in tip, which is mainly used to develop for x86. I've hit a
few crashes, and I think I hit a couple without this code. But here's an
example:

huh, entered softirq 4 c000000000846ad8 preempt_count 10000103, exited
with fffefffe?
------------[ cut here ]------------
Badness at kernel/sched_fair.c:875
NIP: c00000000004bfb8 LR: c00000000004bf7c CTR: c0000000000b5830
REGS: c00000003929cce0 TRAP: 0700 Not tainted (2.6.28-rc4-tip)
MSR: 9000000000021032 <ME,IR,DR> CR: 28822842 XER: 20000000
TASK = c00000003d93cd10[2061] 'remove-trailing' THREAD: c00000003929c000
CPU: 1
GPR00: 0000000000000001 c00000003929cf60 c000000000887070 c000000000ae2d00
GPR04: c00000000004c2c0 0000000000003320 c00000003929cb70 000000000000080d
GPR08: c00000000079333c 000000000002ffff c000000000903380 c000000000903380
GPR12: 0000000048822848 c000000000903580 c000000000794000 0000000000000000
GPR16: c000000000903380 0000000000000001 c000000000909f7c 7fffffffffffffff
GPR20: c00000003929d8e0 c000000000ae2f20 00000086b6e84cc0 0000000000000001
GPR24: 0000000000000001 c000000000794000 c00000003d93cd10 c00000003d934f20
GPR28: c000000000ae4000 c00000003d93cd48 c000000000803550 c00000003929cf60
cpu 0x1: Vector: 400 (Instruction Access) at [c00000003929be1f]
pc: 01c0000000000ae8
lr: 01c0000000000aeb
sp: c00000003929c09f
msr: 9000000040001032
current = 0xc00000003d93cd10
paca = 0xc000000000903580
pid = 2061, comm = remove-trailing


Then it went into the monitor that is loaded. When I fix the rest of my
patches, I'll see if it is not my code that is crashing this, and then
I'll see if I can figure out what is causing some of these crashes.

Thanks Paul for all the feedback!

-- Steve

2008-11-17 20:04:18

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


On Mon, 17 Nov 2008, Steven Rostedt wrote:
> On Mon, 17 Nov 2008, Paul Mackerras wrote:
> >
> > > I've tested the following patches on both PPC64 and PPC32. I will
> > > admit that the PPC64 does not seem that stable, but neither does the
> > > code when all this is not enabled ;-) I'll debug it more to see if
> > > I can find the cause of my crashes, which may or may not be related
> > > to the dynamic ftrace code. But the use of TOCS in PPC64 make me
> > > a bit nervious that I did not do this correctly. Any help in reviewing
> > > my code for mistakes would be greatly appreciated.
> >
> > Hmmm. What sort of crashes are you seeing?
>
> This code is in tip, which is mainly used to develop for x86. I've hit a
> few crashes, and I think I hit a couple without this code. But here's an
> example:
>
> huh, entered softirq 4 c000000000846ad8 preempt_count 10000103, exited
> with fffefffe?
> ------------[ cut here ]------------
> Badness at kernel/sched_fair.c:875
> NIP: c00000000004bfb8 LR: c00000000004bf7c CTR: c0000000000b5830
> REGS: c00000003929cce0 TRAP: 0700 Not tainted (2.6.28-rc4-tip)
> MSR: 9000000000021032 <ME,IR,DR> CR: 28822842 XER: 20000000
> TASK = c00000003d93cd10[2061] 'remove-trailing' THREAD: c00000003929c000
> CPU: 1
> GPR00: 0000000000000001 c00000003929cf60 c000000000887070 c000000000ae2d00
> GPR04: c00000000004c2c0 0000000000003320 c00000003929cb70 000000000000080d
> GPR08: c00000000079333c 000000000002ffff c000000000903380 c000000000903380
> GPR12: 0000000048822848 c000000000903580 c000000000794000 0000000000000000
> GPR16: c000000000903380 0000000000000001 c000000000909f7c 7fffffffffffffff
> GPR20: c00000003929d8e0 c000000000ae2f20 00000086b6e84cc0 0000000000000001
> GPR24: 0000000000000001 c000000000794000 c00000003d93cd10 c00000003d934f20
> GPR28: c000000000ae4000 c00000003d93cd48 c000000000803550 c00000003929cf60
> cpu 0x1: Vector: 400 (Instruction Access) at [c00000003929be1f]
> pc: 01c0000000000ae8
> lr: 01c0000000000aeb
> sp: c00000003929c09f
> msr: 9000000040001032
> current = 0xc00000003d93cd10
> paca = 0xc000000000903580
> pid = 2061, comm = remove-trailing
>
>
> Then it went into the monitor that is loaded. When I fix the rest of my
> patches, I'll see if it is not my code that is crashing this, and then
> I'll see if I can figure out what is causing some of these crashes.
>


Paul,

I'm thinking that I'm hitting stack overflows. I just got this crash:

Badness at net/core/skbuff.c:393
NIP: c0000000004c805c LR: c0000000004c800c CTR: c0000000000b57d0
------------[ cut here ]------------
kernel BUG at kernel/sched.c:1155!
cpu 0x1: Vector: 700 (Program Check) at [c00000002ad18500]
pc: c000000000049884: .resched_task+0x54/0xe0
lr: c000000000049858: .resched_task+0x28/0xe0
sp: c00000002ad18780
msr: 9000000000021032
current = 0xc00000002a516c30
paca = 0xc000000000903580
pid = 17578, comm = fixdep
kernel BUG at kernel/sched.c:1155!
enter ? for help
[c00000002ad18810] c000000000054494 .task_tick_fair+0xc4/0x120
[c00000002ad188a0] c000000000056188 .scheduler_tick+0x108/0x2d0
[c00000002ad18940] c00000000006b1d4 .update_process_times+0x74/0xb0
[c00000002ad189e0] c00000000008bb4c .tick_sched_timer+0x8c/0x120
[c00000002ad18a90] c000000000080f88 .__run_hrtimer+0xd8/0x130
[c00000002ad18b30] c000000000081fac .hrtimer_interrupt+0x16c/0x220
[c00000002ad18c20] c000000000023a0c .timer_interrupt+0xcc/0x110
[c00000002ad18cc0] c0000000000034e0 decrementer_common+0xe0/0x100
--- Exception: 901 (Decrementer) at c00000000000b978
.raw_local_irq_restore+0x58/0x60
[link register ] c00000000005b8d8 .vprintk+0x318/0x4b0
[c00000002ad18fb0] c00000000005b7e0 .vprintk+0x220/0x4b0 (unreliable)
[c00000002ad19110] c00000000005bac4 .printk+0x54/0x70
[c00000002ad191b0] c0000000000119d0 .show_regs+0x50/0x380
[c00000002ad19260] c000000000244150 .report_bug+0xb0/0x130
[c00000002ad19300] c0000000000253c0 .program_check_exception+0x1e0/0x610
[c00000002ad19390] c0000000000043e0 program_check_common+0xe0/0x100
--- Exception: 700 (Program Check) at c0000000004c805c
.skb_release_head_state+0x7c/0xe0
[c00000002ad19710] c0000000004c8f0c .skb_release_all+0x2c/0x50
[c00000002ad197a0] c0000000004c80f4 .__kfree_skb+0x34/0x120
[c00000002ad19830] c0000000004c8224 .kfree_skb+0x44/0x90
[c00000002ad198c0] c0000000004d31f4 .dev_hard_start_xmit+0x224/0x390
[c00000002ad19980] c0000000004ed1d0 .__qdisc_run+0x240/0x340
[c00000002ad19a50] c0000000004d7778 .dev_queue_xmit+0x328/0x630
[c00000002ad19b00] c000000000501940 .ip_finish_output+0x160/0x3d0
[c00000002ad19bc0] c0000000005022c4 .ip_output+0xc4/0xf0
[c00000002ad19c60] c000000000501c88 .ip_local_out+0x58/0x80
[c00000002ad19cf0] c000000000502824 .ip_queue_xmit+0x254/0x480
[c00000002ad19e00] c00000000051ace8 .tcp_transmit_skb+0x498/0x970
[c00000002ad19ef0] c00000000051cf98 .__tcp_push_pending_frames+0x248/0x9a0
[c00000002ad19fe0] c000000000518ec0 .tcp_rcv_established+0x180/0x710
[c00000002ad1a0a0] c0000000005219cc .tcp_v4_do_rcv+0x11c/0x2b0
[c00000002ad1a160] c000000000524084 .tcp_v4_rcv+0x6d4/0x870
[c00000002ad1a230] c0000000004fc0e8 .ip_local_deliver+0xf8/0x2b0
[c00000002ad1a2d0] c0000000004fc614 .ip_rcv+0x374/0x670
[c00000002ad1a3a0] c0000000004d28a8 .netif_receive_skb+0x298/0x380
[c00000002ad1a470] c00000000054a528 .lro_receive_skb+0x68/0xa0
[c00000002ad1a510] c00000000031a8d4 .pasemi_mac_clean_rx+0x2e4/0x500
[c00000002ad1a610] c00000000031b534 .pasemi_mac_poll+0x54/0x150
[c00000002ad1a6b0] c0000000004d5f90 .net_rx_action+0x150/0x290
[c00000002ad1a790] c000000000062438 .__do_softirq+0xe8/0x1e0
[c00000002ad1a850] c00000000000d2c4 .do_softirq+0xa4/0xd0
[c00000002ad1a8e0] c000000000062104 .irq_exit+0xb4/0xf0
[c00000002ad1a970] c00000000000d6a4 .do_IRQ+0x114/0x150
[c00000002ad1aa10] c000000000004160 hardware_interrupt_entry+0x1c/0x3c
--- Exception: 501 (Hardware Interrupt) at c000000000193d84
.journal_dirty_metadata+0x254/0x260
[c00000002ad1ad00] c00000000018e05c
.__ext3_journal_dirty_metadata+0x4c/0xa0 (unreliable)
[c00000002ad1ada0] c00000000017bcf4 .ext3_mark_iloc_dirty+0x2a4/0x590
[c00000002ad1ae70] c00000000017c5ec .ext3_mark_inode_dirty+0x5c/0x80
[c00000002ad1af30] c000000000180dec .ext3_dirty_inode+0xac/0xf0
[c00000002ad1afd0] c00000000012454c .__mark_inode_dirty+0x7c/0x210
[c00000002ad1b080] c000000000178174 .ext3_new_blocks+0xd4/0x770
[c00000002ad1b1b0] c00000000017d194 .ext3_get_blocks_handle+0x2a4/0xcd0
[c00000002ad1b380] c00000000017dee0 .ext3_get_block+0xa0/0x170
[c00000002ad1b440] c00000000012ca6c .__block_prepare_write+0x32c/0x510
[c00000002ad1b580] c00000000012cd80 .block_write_begin+0x90/0x160
[c00000002ad1b640] c0000000001800e0 .ext3_write_begin+0x130/0x2c0
[c00000002ad1b760] c0000000000b9d30
.generic_file_buffered_write+0x180/0x3c0
[c00000002ad1b8b0] c0000000000ba510
.__generic_file_aio_write_nolock+0x2c0/0x450
[c00000002ad1b9c0] c0000000000ba730 .generic_file_aio_write+0x90/0x130
[c00000002ad1ba80] c000000000179b00 .ext3_file_write+0x60/0x130
[c00000002ad1bb30] c0000000000f8fd4 .do_sync_write+0xf4/0x160
[c00000002ad1bcd0] c0000000000f99dc .vfs_write+0xdc/0x1e0
[c00000002ad1bd80] c0000000000fa48c .sys_write+0x6c/0xd0
[c00000002ad1be30] c0000000000074b0 syscall_exit+0x0/0x40
--- Exception: c01 (System Call) at 000000000ff470c4
SP (ffb8be40) is in userspace
1:mon> X

And running my stack_trace in ftrace, I've been seeing large hits on the
stack:

root@electra ~> cat /debug/tracing/stack_trace
Depth Size Location (56 entries)
----- ---- --------
0) 14032 112 ftrace_call+0x4/0x14
1) 13920 128 .sched_clock+0x20/0x60
2) 13792 128 .sched_clock_cpu+0x34/0x50
3) 13664 144 .cpu_clock+0x3c/0xa0
4) 13520 144 .get_timestamp+0x2c/0x50
5) 13376 192 .softlockup_tick+0x100/0x220
6) 13184 128 .run_local_timers+0x34/0x50
7) 13056 160 .update_process_times+0x44/0xb0
8) 12896 176 .tick_sched_timer+0x8c/0x120
9) 12720 160 .__run_hrtimer+0xd8/0x130
10) 12560 240 .hrtimer_interrupt+0x16c/0x220
11) 12320 160 .timer_interrupt+0xcc/0x110
12) 12160 240 decrementer_common+0xe0/0x100
13) 11920 576 0x80
14) 11344 160 .usb_hcd_irq+0x94/0x150
15) 11184 160 .handle_IRQ_event+0x80/0x120
16) 11024 160 .handle_fasteoi_irq+0xd8/0x1e0
17) 10864 160 .do_IRQ+0xbc/0x150
18) 10704 144 hardware_interrupt_entry+0x1c/0x3c
19) 10560 672 0x0
20) 9888 144 ._spin_unlock_irqrestore+0x84/0xd0
21) 9744 160 .scsi_dispatch_cmd+0x170/0x360
22) 9584 208 .scsi_request_fn+0x324/0x5e0
23) 9376 144 .blk_invoke_request_fn+0xc8/0x1b0
24) 9232 144 .__blk_run_queue+0x48/0x60
25) 9088 144 .blk_run_queue+0x40/0x70
26) 8944 192 .scsi_run_queue+0x3a8/0x3e0
27) 8752 160 .scsi_next_command+0x58/0x90
28) 8592 176 .scsi_end_request+0xd4/0x130
29) 8416 208 .scsi_io_completion+0x15c/0x500
30) 8208 160 .scsi_finish_command+0x15c/0x190
31) 8048 160 .scsi_softirq_done+0x138/0x1e0
32) 7888 160 .blk_done_softirq+0xd0/0x100
33) 7728 192 .__do_softirq+0xe8/0x1e0
34) 7536 144 .do_softirq+0xa4/0xd0
35) 7392 144 .irq_exit+0xb4/0xf0
36) 7248 160 .do_IRQ+0x114/0x150
37) 7088 752 hardware_interrupt_entry+0x1c/0x3c
38) 6336 144 .blk_rq_init+0x28/0xc0
39) 6192 208 .get_request+0x13c/0x3d0
40) 5984 240 .get_request_wait+0x60/0x170
41) 5744 192 .__make_request+0xd4/0x560
42) 5552 192 .generic_make_request+0x210/0x300
43) 5360 208 .submit_bio+0x168/0x1a0
44) 5152 160 .submit_bh+0x188/0x1e0
45) 4992 1280 .block_read_full_page+0x23c/0x430
46) 3712 1280 .do_mpage_readpage+0x43c/0x740
47) 2432 352 .mpage_readpages+0x130/0x1c0
48) 2080 160 .ext3_readpages+0x50/0x80
49) 1920 256 .__do_page_cache_readahead+0x1e4/0x340
50) 1664 160 .do_page_cache_readahead+0x94/0xe0
51) 1504 240 .filemap_fault+0x360/0x530
52) 1264 256 .__do_fault+0xb8/0x600
53) 1008 240 .handle_mm_fault+0x190/0x920
54) 768 320 .do_page_fault+0x3d4/0x5f0
55) 448 448 handle_page_fault+0x20/0x5c

This is one of the ftrace features, it tests the stack at every function
call and if it is a new max, then it records the stack. This dump was what
was recorded right after boot up. That is about 14K of stack use, and I
hear that the stack is 16K. That looks like it is getting pretty close.

I'm also told that most PPC64 configs should have CONFIG_IRQSTACKS set,
but I see that I do not. I am in the process of building my box without it
and see if I still have these crashes.

-- Steve

2008-11-18 13:56:48

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


Paul and Benjamin,

Can I add your Acked-by: to all these patches that I submitted? I'm going
to recommit them with a consistent subject (all lower case ppc), but I'm
not going to change the patches themselves.

Would you two be fine with that? Or at least one of you?

-- Steve

2008-11-19 02:47:12

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC

Steven Rostedt writes:

> Can I add your Acked-by: to all these patches that I submitted? I'm going
> to recommit them with a consistent subject (all lower case ppc), but I'm
> not going to change the patches themselves.
>
> Would you two be fine with that? Or at least one of you?

My preference would be for the patches to go through the powerpc tree
unless there is a good reason for them to go via another tree.

The style we use for the headline is "powerpc: Add xyz feature" or
"powerpc/subsystem: Fix foo bug".

As for the acked-by, I feel I first need to go through the whole
series again with the changes you have made recently. Have you
reworked the earlier patches to avoid introducing any bugs, rather
than just fixing the bugs with later patches? If you haven't, are you
sure that the bugs won't cause any problems in bisecting?

Also, what's the best place to find the latest patch set?

Thanks again for doing all this work.

Paul.

2008-11-19 03:04:45

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


On Wed, 19 Nov 2008, Paul Mackerras wrote:

> Steven Rostedt writes:
>
> > Can I add your Acked-by: to all these patches that I submitted? I'm going
> > to recommit them with a consistent subject (all lower case ppc), but I'm
> > not going to change the patches themselves.
> >
> > Would you two be fine with that? Or at least one of you?
>
> My preference would be for the patches to go through the powerpc tree
> unless there is a good reason for them to go via another tree.

I have no problem with that. The only thing is that we have a lot of
pending work still in the linux-tip tree, which you may need to pull
in to get these patches working. Well, there's two or three commits in the
generic code that I know the PPC code is dependent on.

I could give you a list of commits in tip that need to go mainline first
before we can pull in the PPC changes. Then you could wait till those
changes make it into 29 and then you could push the PPC modifications in
from your tree.

>
> The style we use for the headline is "powerpc: Add xyz feature" or
> "powerpc/subsystem: Fix foo bug".

We've been using the "ftrace: subject" format for most of our ftrace
commits, and have been using "ftrace, ppc: subject" or ppc32 or ppc64 for
those. But since this is a powerpc port, I will conform to your style.

>
> As for the acked-by, I feel I first need to go through the whole
> series again with the changes you have made recently. Have you
> reworked the earlier patches to avoid introducing any bugs, rather
> than just fixing the bugs with later patches? If you haven't, are you
> sure that the bugs won't cause any problems in bisecting?

Fair enough. I'll rework the patches again to fold back in the changes
based on your comments, as well as the comments of others. When I'm done,
I'll repost to the list. This way, if you pull them in, you can add your
Signed-off-by yourself.

>
> Also, what's the best place to find the latest patch set?

I'll be keeping the changes in my repo:

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git

branch tip/ppc

Note, this branch is based on top of Ingo's linux-tip tree. I can make a
a branch "ppc" that will be based on top of mainline, and I can add the
patches needed to get PPC working. The generic commits will have the
"ftrace: " format.

The generic ftrace patches will still need to go in through linux-tip. But
when they do, feel free to push the PowerPC port in. Anyway, you are the
ones that can test it better than we can. I have two PPC boxes that I test
on, but I'm sure you have a lot more.

>
> Thanks again for doing all this work.

Thank you for the reviews.

-- Steve

2008-11-19 09:28:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


* Steven Rostedt <[email protected]> wrote:

> On Wed, 19 Nov 2008, Paul Mackerras wrote:
>
> > Steven Rostedt writes:
> >
> > > Can I add your Acked-by: to all these patches that I submitted? I'm going
> > > to recommit them with a consistent subject (all lower case ppc), but I'm
> > > not going to change the patches themselves.
> > >
> > > Would you two be fine with that? Or at least one of you?
> >
> > My preference would be for the patches to go through the powerpc tree
> > unless there is a good reason for them to go via another tree.
>
> I have no problem with that. The only thing is that we have a lot of
> pending work still in the linux-tip tree, which you may need to pull
> in to get these patches working. Well, there's two or three commits
> in the generic code that I know the PPC code is dependent on.
>
> I could give you a list of commits in tip that need to go mainline
> first before we can pull in the PPC changes. Then you could wait
> till those changes make it into 29 and then you could push the PPC
> modifications in from your tree.

note that this inserts a lot of (unnecessary) serialization and a
window of non-testing - by all likelyhood this will delay ppc ftrace
to v2.6.30 or later kernels.

Ingo

2008-11-19 10:38:42

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC

Ingo Molnar writes:

> * Steven Rostedt <[email protected]> wrote:
>
> > On Wed, 19 Nov 2008, Paul Mackerras wrote:
> >
> > > Steven Rostedt writes:
> > >
> > > > Can I add your Acked-by: to all these patches that I submitted? I'm going
> > > > to recommit them with a consistent subject (all lower case ppc), but I'm
> > > > not going to change the patches themselves.
> > > >
> > > > Would you two be fine with that? Or at least one of you?
> > >
> > > My preference would be for the patches to go through the powerpc tree
> > > unless there is a good reason for them to go via another tree.
> >
> > I have no problem with that. The only thing is that we have a lot of
> > pending work still in the linux-tip tree, which you may need to pull
> > in to get these patches working. Well, there's two or three commits
> > in the generic code that I know the PPC code is dependent on.
> >
> > I could give you a list of commits in tip that need to go mainline
> > first before we can pull in the PPC changes. Then you could wait
> > till those changes make it into 29 and then you could push the PPC
> > modifications in from your tree.
>
> note that this inserts a lot of (unnecessary) serialization and a
> window of non-testing - by all likelyhood this will delay ppc ftrace
> to v2.6.30 or later kernels.

Well, note that I said "unless there is a good reason". If it does
need to go via your tree, it can, though I don't see that it will get
much testing on powerpc there, and having it there will make it harder
to manage any conflicts with the other stuff I have queued up.

How much generic stuff that's not upstream do the powerpc ftrace
patches depend on?

Paul.

2008-11-19 10:57:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


* Paul Mackerras <[email protected]> wrote:

> Ingo Molnar writes:
>
> > * Steven Rostedt <[email protected]> wrote:
> >
> > > On Wed, 19 Nov 2008, Paul Mackerras wrote:
> > >
> > > > Steven Rostedt writes:
> > > >
> > > > > Can I add your Acked-by: to all these patches that I submitted? I'm going
> > > > > to recommit them with a consistent subject (all lower case ppc), but I'm
> > > > > not going to change the patches themselves.
> > > > >
> > > > > Would you two be fine with that? Or at least one of you?
> > > >
> > > > My preference would be for the patches to go through the powerpc tree
> > > > unless there is a good reason for them to go via another tree.
> > >
> > > I have no problem with that. The only thing is that we have a lot of
> > > pending work still in the linux-tip tree, which you may need to pull
> > > in to get these patches working. Well, there's two or three commits
> > > in the generic code that I know the PPC code is dependent on.
> > >
> > > I could give you a list of commits in tip that need to go mainline
> > > first before we can pull in the PPC changes. Then you could wait
> > > till those changes make it into 29 and then you could push the PPC
> > > modifications in from your tree.
> >
> > note that this inserts a lot of (unnecessary) serialization and a
> > window of non-testing - by all likelyhood this will delay ppc ftrace
> > to v2.6.30 or later kernels.
>
> Well, note that I said "unless there is a good reason". If it does
> need to go via your tree, it can, though I don't see that it will
> get much testing on powerpc there, and having it there will make it
> harder to manage any conflicts with the other stuff I have queued
> up.

this is the diffstat:

arch/powerpc/Kconfig | 2 +
arch/powerpc/include/asm/ftrace.h | 14 +-
arch/powerpc/include/asm/module.h | 16 ++-
arch/powerpc/kernel/ftrace.c | 460 +++++++++++++++++++++++++++++++++---
arch/powerpc/kernel/idle.c | 5 +
arch/powerpc/kernel/module_32.c | 10 +
arch/powerpc/kernel/module_64.c | 13 +
scripts/recordmcount.pl | 18 ++-
8 files changed, 495 insertions(+), 43 deletions(-)

90% of it creates new code that shouldnt be collision-happy.

it does not "need" to go via tip/tracing, i just pointed out the
effects of the proposed workflow and i'm just trying to find a more
efficient workflow for it - while not impacting yours either. I think
it can be done - git gives us tons of tools to do this intelligently.

> How much generic stuff that's not upstream do the powerpc ftrace
> patches depend on?

a lot:

66 files changed, 3266 insertions(+), 985 deletions(-)

and it's all in flux (we are in the middle of the development cycle),
so i dont think it would be a good idea for you to pull those bits
into the powerpc tree.

Maybe Steve could do the following trick: create a Linus -git based
branch that uses the new APIs but marks ppc's ftrace as "depends 0" in
the powerpc Kconfig. (the new ftrace.c wont build)

You could pull that tree into the powerpc tree and everything should
still work fine in PPC - sans ftrace.

In tip/tracing we'd merge it too (these commits will never be
rebased), and we'd also remove the depends 0 from the powerpc Kconfig.
That sole change wont conflict with anything powerpc.

It would all play out just fine in linux-next: we'd have both the
latest powerpc bits and the latest ftrace bits on powerpc. You could
test the non-ftrace impact of Steve's changes in the powerpc tree as
well and have it all part of your usual workflow.

The 2.6.29 merge window would be trouble-free as well: since the
sha1's are the same, any of the trees can be merged upstream without
having to wait for the other one and without creating conflicts or
other trouble for the other one.

Hm?

Ingo

2008-11-19 11:36:15

by Paul Mackerras

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC

Ingo Molnar writes:

> and it's all in flux (we are in the middle of the development cycle),
> so i dont think it would be a good idea for you to pull those bits
> into the powerpc tree.

Quite. OK, it does sound like this stuff needs to live in your tree
for now, and from the diffstat it doesn't look like there is any
conflict with stuff in my tree yet.

> Maybe Steve could do the following trick: create a Linus -git based
> branch that uses the new APIs but marks ppc's ftrace as "depends 0" in
> the powerpc Kconfig. (the new ftrace.c wont build)
>
> You could pull that tree into the powerpc tree and everything should
> still work fine in PPC - sans ftrace.

Sounds like a reasonable idea, except that I think I'll delay pulling
that branch into my tree until I need to in order to resolve a
conflict - at least as far as my exported branches are concerned.

> In tip/tracing we'd merge it too (these commits will never be
> rebased),

I do want to see the patches in their final form and have the
opportunity to give an acked-by before you declare the branch frozen.
Apart from that, sounds good.

Paul.

2008-11-19 12:13:57

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC


On Wed, 19 Nov 2008, Ingo Molnar wrote:
>
> Maybe Steve could do the following trick: create a Linus -git based
> branch that uses the new APIs but marks ppc's ftrace as "depends 0" in
> the powerpc Kconfig. (the new ftrace.c wont build)

There's only two generic commits that need to be added for the PowerPC
code to work.

ftrace: pass module struct to arch dynamic ftrace functions
ftrace: allow NULL pointers in mcount_loc

I've already ported them to mainline to test PowerPC there.
Paul could use these two versions and keep ftrace in a separate branch in
his tree. This way all the PowerPC code will be there, and actually can be
tested. They may still hit the same bugs that we have fixed in tip, but
those should all be minor, since any major bug is already in mainline or
on its way.

>
> You could pull that tree into the powerpc tree and everything should
> still work fine in PPC - sans ftrace.
>
> In tip/tracing we'd merge it too (these commits will never be
> rebased), and we'd also remove the depends 0 from the powerpc Kconfig.
> That sole change wont conflict with anything powerpc.
>
> It would all play out just fine in linux-next: we'd have both the
> latest powerpc bits and the latest ftrace bits on powerpc. You could
> test the non-ftrace impact of Steve's changes in the powerpc tree as
> well and have it all part of your usual workflow.
>
> The 2.6.29 merge window would be trouble-free as well: since the
> sha1's are the same, any of the trees can be merged upstream without
> having to wait for the other one and without creating conflicts or
> other trouble for the other one.
>
> Hm?

If Paul uses the ported two generic commits, then he would need to keep
that in a separate branch that will never go upstream. He could pull in
the other PowerPC patches and do as you said, keep the depend off.

I can make two branches that Paul could pull from. One would have this
code disabled in the config, and just be the PowerPC port. Although, we
would need to figure out the best way to keep it disabled. Disable it
after the patches? The patches themselve enable it.

And then I could make another branch with the back port of the two commits
that Paul would never push upstream. This would allow for the PowerPC guys
to test the code in their tree without waiting for it. We just need to
trust that Paul will not push those commits ;-)

Actually, I can change the subject of those commits to have at the
beginning:

NOT FOR UPSTREAM ftrace: ....

What do you guys think?

-- Steve

2008-11-19 12:15:37

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/7] Porting dynmaic ftrace to PowerPC

On Wed, 19 Nov 2008, Steven Rostedt wrote:

>
> On Wed, 19 Nov 2008, Ingo Molnar wrote:
> >
> > Maybe Steve could do the following trick: create a Linus -git based
> > branch that uses the new APIs but marks ppc's ftrace as "depends 0" in
> > the powerpc Kconfig. (the new ftrace.c wont build)
>
> There's only two generic commits that need to be added for the PowerPC
> code to work.
>
> ftrace: pass module struct to arch dynamic ftrace functions
> ftrace: allow NULL pointers in mcount_loc
>
> I've already ported them to mainline to test PowerPC there.
> Paul could use these two versions and keep ftrace in a separate branch in
> his tree. This way all the PowerPC code will be there, and actually can be
> tested. They may still hit the same bugs that we have fixed in tip, but
> those should all be minor, since any major bug is already in mainline or
> on its way.

I just pushed all the PowerPC patches on top of this port it works.
I still need to rework the patches for Paul.

-- Steve