I'm working on a system that is using a bit too much stack to work
reliably with 4K stacks (OpenSSI with DRBD for the root filesystem
replication) and I find that the CONFIG_DEBUG_STACKOVERFLOW stuff works
rather poorly - often the stack dump will cause a stack overflow itself.
Here's a little patch to irq.c to make stack overflow dumps use a
separate stack.
It's based on 2.6.12 'cos that's how out of date I am. :-)
Maybe it'll be of use to someone.
Please CC [email protected] with any comments, I'm not subscribed to the list.
* John Hughes <[email protected]> wrote:
> I'm working on a system that is using a bit too much stack to work
> reliably with 4K stacks (OpenSSI with DRBD for the root filesystem
> replication) and I find that the CONFIG_DEBUG_STACKOVERFLOW stuff works
> rather poorly - often the stack dump will cause a stack overflow itself.
>
> Here's a little patch to irq.c to make stack overflow dumps use a
> separate stack.
>
> It's based on 2.6.12 'cos that's how out of date I am. :-)
in recent kernel's there's an automatic maximum-stack-footprint tracer
plugin available. When you suspect stack overflow you can use it by
enabling:
CONFIG_STACK_TRACER=y
and boot the kernel with the 'stacktrace' boot option. Then you'll get
such "worst-case stack footprint since bootup" entries in
/debug/tracing/stack_trace:
# ls -l /debug/tracing/*stack*
-rw-r--r-- 1 root root 0 2009-01-03 22:42 /debug/tracing/stack_max_size
-r--r--r-- 1 root root 0 2009-01-03 22:42 /debug/tracing/stack_trace
# cat /debug/tracing/*stack*
5624
Depth Size Location (38 entries)
----- ---- --------
0) 5496 232 lookup_address+0x51/0x20e
1) 5264 496 __change_page_attr_set_clr+0xa7/0xa15
2) 4768 128 kernel_map_pages+0x121/0x140
3) 4640 224 get_page_from_freelist+0x4bb/0x6ec
4) 4416 160 __alloc_pages_internal+0x150/0x4af
5) 4256 48 alloc_pages_current+0xbe/0xc7
6) 4208 16 alloc_slab_page+0x1e/0x6e
7) 4192 64 new_slab+0x51/0x1e2
8) 4128 96 __slab_alloc+0x260/0x3fa
9) 4032 80 kmem_cache_alloc+0xa8/0x120
10) 3952 48 radix_tree_preload+0x38/0x86
11) 3904 64 add_to_page_cache_locked+0x51/0xed
12) 3840 32 add_to_page_cache_lru+0x31/0x6b
13) 3808 64 find_or_create_page+0x56/0x7d
14) 3744 80 __getblk+0x136/0x262
15) 3664 32 __bread+0x13/0x82
16) 3632 64 ext3_get_branch+0x7b/0xef
17) 3568 368 ext3_get_blocks_handle+0xa2/0x907
18) 3200 96 ext3_get_block+0xc3/0x101
19) 3104 224 do_mpage_readpage+0x1ad/0x4d0
20) 2880 240 mpage_readpages+0xb6/0xf9
21) 2640 16 ext3_readpages+0x1f/0x21
22) 2624 144 __do_page_cache_readahead+0x147/0x1bd
23) 2480 48 do_page_cache_readahead+0x5b/0x68
24) 2432 112 filemap_fault+0x176/0x34b
25) 2320 160 __do_fault+0x58/0x410
26) 2160 176 handle_mm_fault+0x4b2/0xa3e
27) 1984 800 do_page_fault+0x86d/0xcec
28) 1184 208 page_fault+0x25/0x30
29) 976 16 clear_user+0x30/0x38
30) 960 288 load_elf_binary+0x61c/0x18f0
31) 672 80 search_binary_handler+0xd9/0x279
32) 592 176 load_script+0x1b3/0x1c8
33) 416 80 search_binary_handler+0xd9/0x279
34) 336 96 do_execve+0x1df/0x296
35) 240 64 sys_execve+0x43/0x5e
36) 176 176 stub_execve+0x6a/0xc0
It uses precise stack frames and lists their sizes and the total depth of
each stack entry.
Hope this helps,
Ingo
Ingo Molnar wrote:
> in recent kernel's there's an automatic maximum-stack-footprint tracer
> plugin available.
Lovely. Maybe one day I'll be able to use a recent kernel. :-)
Thanks for the tip.
* John Hughes <[email protected]> wrote:
> Ingo Molnar wrote:
>> in recent kernel's there's an automatic maximum-stack-footprint tracer
>> plugin available.
>
> Lovely. Maybe one day I'll be able to use a recent kernel. :-)
Curious, is that inability of using recent kernls our fault or your fault?
:)
Ingo
Ingo Molnar wrote:
> * John Hughes <[email protected]> wrote:
>
>
>> Ingo Molnar wrote:
>>
>>> in recent kernel's there's an automatic maximum-stack-footprint tracer
>>> plugin available.
>>>
>> Lovely. Maybe one day I'll be able to use a recent kernel. :-)
>>
>
> Curious, is that inability of using recent kernls our fault or your fault?
> :)
>
Oh, mine, all mine. Just can't find the time. :-)
(Well, I could blame you buggers for moving too fast, but I doubt that
would be appreciated :-)).
* John Hughes <[email protected]> wrote:
> Ingo Molnar wrote:
>> * John Hughes <[email protected]> wrote:
>>
>>
>>> Ingo Molnar wrote:
>>>
>>>> in recent kernel's there's an automatic maximum-stack-footprint
>>>> tracer plugin available.
>>>>
>>> Lovely. Maybe one day I'll be able to use a recent kernel. :-)
>>>
>>
>> Curious, is that inability of using recent kernls our fault or your fault?
>> :)
>>
> Oh, mine, all mine. Just can't find the time. :-)
>
> (Well, I could blame you buggers for moving too fast, but I doubt that
> would be appreciated :-)).
oh, feel free, we'll take it as sincere flattery ;-)
Ingo
John Hughes <[email protected]> writes:
> Here's a little patch to irq.c to make stack overflow dumps use a
> separate stack.
This has been fixed last year in mainline with
04b361abfdc522239e3a071f3afdebf5787d9f03
Morale: If you use ancient kernels you miss a lot of bugfixes.
-Andi
--
[email protected] -- Speaking for myself only.
Andi Kleen wrote:
> John Hughes <[email protected]> writes:
>
>
>> Here's a little patch to irq.c to make stack overflow dumps use a
>> separate stack.
>>
>
> This has been fixed last year in mainline with
> 04b361abfdc522239e3a071f3afdebf5787d9f03
>
Ah, thanks, neater than my fix.
> Morale: If you use ancient kernels you miss a lot of bugfixes.
>
Hey, don't you be ruining my morale, it's low enough already. :-)