2009-06-27 15:07:35

by Joao Correia

[permalink] [raw]
Subject: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

Hello

Since midway through the 2.6.30 series, and continuing on the rc1 for
.31, im having this at every boot, during the boot process:

(...)
Jun 26 21:07:03 hightech kernel: NET: Registered protocol family 10
Jun 26 21:07:03 hightech kernel: lo: Disabled Privacy Extensions
Jun 26 21:07:03 hightech kernel: BUG: MAX_STACK_TRACE_ENTRIES too low!
Jun 26 21:07:03 hightech kernel: turning off the locking correctness validator.
Jun 26 21:07:03 hightech kernel: Pid: 672, comm: modprobe Not tainted
2.6.31-rc1 #1
Jun 26 21:07:03 hightech kernel: Call Trace:
Jun 26 21:07:03 hightech kernel: [<c080facf>] ? printk+0x22/0x3b
Jun 26 21:07:03 hightech kernel: [<c046c940>] save_trace+0x89/0xa3
Jun 26 21:07:03 hightech kernel: [<c046c9ca>] add_lock_to_list+0x70/0xb3
Jun 26 21:07:03 hightech kernel: [<c046eb81>] __lock_acquire+0x996/0xb08
Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
Jun 26 21:07:03 hightech kernel: [<c046edaa>] lock_acquire+0xb7/0xeb
Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
Jun 26 21:07:03 hightech kernel: [<c081283f>] _spin_lock_irqsave+0x45/0x89
Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
Jun 26 21:07:03 hightech kernel: [<c066d7c2>] account+0x30/0xef
Jun 26 21:07:03 hightech kernel: [<c066d995>] extract_entropy+0x3e/0xad
Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
Jun 26 21:07:03 hightech kernel: [<c066dab2>] xfer_secondary_pool+0xae/0xeb
Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046e4ed>] ? __lock_acquire+0x302/0xb08
Jun 26 21:07:03 hightech kernel: [<c066d988>] extract_entropy+0x31/0xad
Jun 26 21:07:03 hightech kernel: [<faa90290>] ?
ipv6_regen_rndid+0x2c/0xca [ipv6]
Jun 26 21:07:03 hightech kernel: [<faa90290>] ?
ipv6_regen_rndid+0x2c/0xca [ipv6]
Jun 26 21:07:03 hightech kernel: [<c066db18>] get_random_bytes+0x29/0x3e
Jun 26 21:07:03 hightech kernel: [<faa8e240>]
__ipv6_regen_rndid+0x2c/0xdc [ipv6]
Jun 26 21:07:03 hightech kernel: [<faa902a0>] ipv6_regen_rndid+0x3c/0xca [ipv6]
Jun 26 21:07:03 hightech kernel: [<faa90efa>] ipv6_add_dev+0x236/0x2d8 [ipv6]
Jun 26 21:07:03 hightech kernel: [<faa91e67>] addrconf_notify+0x5c/0x750 [ipv6]
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046d533>] ? mark_held_locks+0x52/0x7c
Jun 26 21:07:03 hightech kernel: [<c08111ce>] ? __mutex_lock_common+0x2ca/0x32b
Jun 26 21:07:03 hightech kernel: [<c046d800>] ?
trace_hardirqs_on_caller+0x122/0x155
Jun 26 21:07:03 hightech kernel: [<c08111df>] ? __mutex_lock_common+0x2db/0x32b
Jun 26 21:07:03 hightech kernel: [<c046d800>] ?
trace_hardirqs_on_caller+0x122/0x155
Jun 26 21:07:03 hightech kernel: [<c077d86e>] ? rtnl_lock+0x22/0x35
Jun 26 21:07:03 hightech kernel: [<faacb000>] ? inet6_init+0x0/0x2bf [ipv6]
Jun 26 21:07:03 hightech kernel: [<c0811322>] ? mutex_lock_nested+0x41/0x5a
Jun 26 21:07:03 hightech kernel: [<c077d86e>] ? rtnl_lock+0x22/0x35
Jun 26 21:07:03 hightech kernel: [<c0774eb0>]
register_netdevice_notifier+0x64/0x16c
Jun 26 21:07:03 hightech kernel: [<faacb000>] ? inet6_init+0x0/0x2bf [ipv6]
Jun 26 21:07:03 hightech kernel: [<faacb358>] addrconf_init+0x6b/0x13a [ipv6]
Jun 26 21:07:03 hightech kernel: [<faacb194>] inet6_init+0x194/0x2bf [ipv6]
Jun 26 21:07:03 hightech kernel: [<c0401186>] do_one_initcall+0x75/0x193
Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
Jun 26 21:07:03 hightech kernel: [<c046d7e7>] ?
trace_hardirqs_on_caller+0x109/0x155
Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046d533>] ? mark_held_locks+0x52/0x7c
Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
Jun 26 21:07:03 hightech kernel: [<c046d533>] ? mark_held_locks+0x52/0x7c
Jun 26 21:07:03 hightech kernel: [<c0810d74>] ?
__mutex_unlock_slowpath+0x105/0x124
Jun 26 21:07:03 hightech kernel: [<c046d800>] ?
trace_hardirqs_on_caller+0x122/0x155
Jun 26 21:07:03 hightech kernel: [<c046d84c>] ? trace_hardirqs_on+0x19/0x2c
Jun 26 21:07:03 hightech kernel: [<c046025a>] ?
__blocking_notifier_call_chain+0x5b/0x78
Jun 26 21:07:03 hightech kernel: [<c04091bd>] ? sched_clock+0x9/0xd
Jun 26 21:07:03 hightech kernel: [<c046c091>] ? lock_release_holdtime+0x39/0x143
Jun 26 21:07:03 hightech kernel: [<c046025a>] ?
__blocking_notifier_call_chain+0x5b/0x78
Jun 26 21:07:03 hightech kernel: [<c045f64e>] ? up_read+0x29/0x4f
Jun 26 21:07:03 hightech kernel: [<c046025a>] ?
__blocking_notifier_call_chain+0x5b/0x78
Jun 26 21:07:03 hightech kernel: [<c0460298>] ?
blocking_notifier_call_chain+0x21/0x37
Jun 26 21:07:03 hightech kernel: [<c04793c6>] sys_init_module+0xc8/0x1ee
Jun 26 21:07:03 hightech kernel: [<c0403793>] sysenter_do_call+0x12/0x38
Jun 26 21:07:03 hightech kernel: modprobe used greatest stack depth:
5296 bytes left
Jun 26 21:07:03 hightech kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
(...)

Now, the weird thing is that this doesnt always point at modprobe. I
have seen it occur with swapper, udevd and ip6tables-restore -at
least-.
For example, from another boot:

Jun 26 18:46:40 hightech kernel: Pid: 545, comm: udevd Not tainted
2.6.31-0.28.rc1.fc12.i586 #1
Jun 26 18:46:40 hightech kernel: Call Trace:
Jun 26 18:46:40 hightech kernel: [<c0815123>] ? printk+0x22/0x37
Jun 26 18:46:40 hightech kernel: [<c046cfa0>] save_trace+0x89/0xa3
Jun 26 18:46:40 hightech kernel: [<c046d02a>] add_lock_to_list+0x70/0xb3
Jun 26 18:46:40 hightech kernel: [<c046f224>] __lock_acquire+0x9d9/0xb08
Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
debug_check_no_obj_freed+0x7c/0x175
Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
debug_check_no_obj_freed+0x7c/0x175
Jun 26 18:46:40 hightech kernel: [<c046f40a>] lock_acquire+0xb7/0xeb
Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
debug_check_no_obj_freed+0x7c/0x175
Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
debug_check_no_obj_freed+0x7c/0x175
Jun 26 18:46:40 hightech kernel: [<c0817e8f>] _spin_lock_irqsave+0x45/0x89
Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
debug_check_no_obj_freed+0x7c/0x175
Jun 26 18:46:40 hightech kernel: [<c05fdd3d>]
debug_check_no_obj_freed+0x7c/0x175
Jun 26 18:46:40 hightech kernel: [<c046d188>] ? trace_hardirqs_off+0x19/0x2c
Jun 26 18:46:40 hightech kernel: [<c04ed7dd>] kmem_cache_free+0x92/0x125
Jun 26 18:46:40 hightech kernel: [<c05f19ad>] ? idr_remove+0x125/0x144
Jun 26 18:46:40 hightech kernel: [<c05f19ad>] idr_remove+0x125/0x144
Jun 26 18:46:40 hightech kernel: [<c05250a4>]
inotify_ignored_and_remove_idr+0x73/0x9b
Jun 26 18:46:40 hightech kernel: [<c0524526>] inotify_freeing_mark+0x1c/0x2f
Jun 26 18:46:40 hightech kernel: [<c0522da7>]
fsnotify_destroy_mark_by_entry+0x129/0x15e
Jun 26 18:46:40 hightech kernel: [<c0524b48>] sys_inotify_rm_watch+0x8e/0xbd
Jun 26 18:46:40 hightech kernel: [<c0403a5c>] syscall_call+0x7/0xb
Jun 26 18:46:40 hightech kernel: EXT3 FS on sde2, internal journal
Jun 26 18:46:40 hightech kernel: kjournald starting. Commit interval 5 seconds

Its not the exact same kernel (first one is vanilla, the second is
from rawhide fedora, but the results are the exact same. The origin of
the call trace keeps jumping around from program to program with every
boot).
The box will hang after some minutes too (as in totally hang, not even
screen refreshes). At start it will work for a couple of minutes tho.

The system is a quad-core phenom 9600 on an asus m3a-h/hdmi board,
with 4gb ddr2 800 ram, and 3 1TB sata hard disks.
The system is a fedora 11 with some rawhide updates thrown in, but
this doesnt happen on a 2.6.29 kernel, on the same system, so im
assuming its a kernel bug and not a distro problem (ofcourse, i could
be wrong).

Im attaching .config, and the full dmesg, as well as some other
informations (lspci -vv, cat /proc/cpuinfo). I can apply patches and
retest at will, as this will trigger at every boot, so its easily
reproducible on my end.

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal


Attachments:
bootlog1.txt (61.24 kB)
bootlog2.txt (58.44 kB)
cpuinfo.txt (2.98 kB)
lspci.txt (23.78 kB)
Download all attachments

2009-06-27 15:12:11

by Joao Correia

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

Forgot config. Here it goes.

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Sat, Jun 27, 2009 at 4:07 PM, Joao
Correia<[email protected]> wrote:
> Hello
>
> Since midway through the 2.6.30 series, and continuing on the rc1 for
> .31, im having this at every boot, during the boot process:
>
> (...)
> Jun 26 21:07:03 hightech kernel: NET: Registered protocol family 10
> Jun 26 21:07:03 hightech kernel: lo: Disabled Privacy Extensions
> Jun 26 21:07:03 hightech kernel: BUG: MAX_STACK_TRACE_ENTRIES too low!
> Jun 26 21:07:03 hightech kernel: turning off the locking correctness validator.
> Jun 26 21:07:03 hightech kernel: Pid: 672, comm: modprobe Not tainted
> 2.6.31-rc1 #1
> Jun 26 21:07:03 hightech kernel: Call Trace:
> Jun 26 21:07:03 hightech kernel: [<c080facf>] ? printk+0x22/0x3b
> Jun 26 21:07:03 hightech kernel: [<c046c940>] save_trace+0x89/0xa3
> Jun 26 21:07:03 hightech kernel: [<c046c9ca>] add_lock_to_list+0x70/0xb3
> Jun 26 21:07:03 hightech kernel: [<c046eb81>] __lock_acquire+0x996/0xb08
> Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
> Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
> Jun 26 21:07:03 hightech kernel: [<c046edaa>] lock_acquire+0xb7/0xeb
> Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
> Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
> Jun 26 21:07:03 hightech kernel: [<c081283f>] _spin_lock_irqsave+0x45/0x89
> Jun 26 21:07:03 hightech kernel: [<c066d7c2>] ? account+0x30/0xef
> Jun 26 21:07:03 hightech kernel: [<c066d7c2>] account+0x30/0xef
> Jun 26 21:07:03 hightech kernel: [<c066d995>] extract_entropy+0x3e/0xad
> Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
> Jun 26 21:07:03 hightech kernel: [<c066dab2>] xfer_secondary_pool+0xae/0xeb
> Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046e4ed>] ? __lock_acquire+0x302/0xb08
> Jun 26 21:07:03 hightech kernel: [<c066d988>] extract_entropy+0x31/0xad
> Jun 26 21:07:03 hightech kernel: [<faa90290>] ?
> ipv6_regen_rndid+0x2c/0xca [ipv6]
> Jun 26 21:07:03 hightech kernel: [<faa90290>] ?
> ipv6_regen_rndid+0x2c/0xca [ipv6]
> Jun 26 21:07:03 hightech kernel: [<c066db18>] get_random_bytes+0x29/0x3e
> Jun 26 21:07:03 hightech kernel: [<faa8e240>]
> __ipv6_regen_rndid+0x2c/0xdc [ipv6]
> Jun 26 21:07:03 hightech kernel: [<faa902a0>] ipv6_regen_rndid+0x3c/0xca [ipv6]
> Jun 26 21:07:03 hightech kernel: [<faa90efa>] ipv6_add_dev+0x236/0x2d8 [ipv6]
> Jun 26 21:07:03 hightech kernel: [<faa91e67>] addrconf_notify+0x5c/0x750 [ipv6]
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046d533>] ? mark_held_locks+0x52/0x7c
> Jun 26 21:07:03 hightech kernel: [<c08111ce>] ? __mutex_lock_common+0x2ca/0x32b
> Jun 26 21:07:03 hightech kernel: [<c046d800>] ?
> trace_hardirqs_on_caller+0x122/0x155
> Jun 26 21:07:03 hightech kernel: [<c08111df>] ? __mutex_lock_common+0x2db/0x32b
> Jun 26 21:07:03 hightech kernel: [<c046d800>] ?
> trace_hardirqs_on_caller+0x122/0x155
> Jun 26 21:07:03 hightech kernel: [<c077d86e>] ? rtnl_lock+0x22/0x35
> Jun 26 21:07:03 hightech kernel: [<faacb000>] ? inet6_init+0x0/0x2bf [ipv6]
> Jun 26 21:07:03 hightech kernel: [<c0811322>] ? mutex_lock_nested+0x41/0x5a
> Jun 26 21:07:03 hightech kernel: [<c077d86e>] ? rtnl_lock+0x22/0x35
> Jun 26 21:07:03 hightech kernel: [<c0774eb0>]
> register_netdevice_notifier+0x64/0x16c
> Jun 26 21:07:03 hightech kernel: [<faacb000>] ? inet6_init+0x0/0x2bf [ipv6]
> Jun 26 21:07:03 hightech kernel: [<faacb358>] addrconf_init+0x6b/0x13a [ipv6]
> Jun 26 21:07:03 hightech kernel: [<faacb194>] inet6_init+0x194/0x2bf [ipv6]
> Jun 26 21:07:03 hightech kernel: [<c0401186>] do_one_initcall+0x75/0x193
> Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
> Jun 26 21:07:03 hightech kernel: [<c046d7e7>] ?
> trace_hardirqs_on_caller+0x109/0x155
> Jun 26 21:07:03 hightech kernel: [<c046c51a>] ? register_lock_class+0x25/0x2b0
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046d533>] ? mark_held_locks+0x52/0x7c
> Jun 26 21:07:03 hightech kernel: [<c046d317>] ? mark_lock+0x29/0x1f3
> Jun 26 21:07:03 hightech kernel: [<c046d533>] ? mark_held_locks+0x52/0x7c
> Jun 26 21:07:03 hightech kernel: [<c0810d74>] ?
> __mutex_unlock_slowpath+0x105/0x124
> Jun 26 21:07:03 hightech kernel: [<c046d800>] ?
> trace_hardirqs_on_caller+0x122/0x155
> Jun 26 21:07:03 hightech kernel: [<c046d84c>] ? trace_hardirqs_on+0x19/0x2c
> Jun 26 21:07:03 hightech kernel: [<c046025a>] ?
> __blocking_notifier_call_chain+0x5b/0x78
> Jun 26 21:07:03 hightech kernel: [<c04091bd>] ? sched_clock+0x9/0xd
> Jun 26 21:07:03 hightech kernel: [<c046c091>] ? lock_release_holdtime+0x39/0x143
> Jun 26 21:07:03 hightech kernel: [<c046025a>] ?
> __blocking_notifier_call_chain+0x5b/0x78
> Jun 26 21:07:03 hightech kernel: [<c045f64e>] ? up_read+0x29/0x4f
> Jun 26 21:07:03 hightech kernel: [<c046025a>] ?
> __blocking_notifier_call_chain+0x5b/0x78
> Jun 26 21:07:03 hightech kernel: [<c0460298>] ?
> blocking_notifier_call_chain+0x21/0x37
> Jun 26 21:07:03 hightech kernel: [<c04793c6>] sys_init_module+0xc8/0x1ee
> Jun 26 21:07:03 hightech kernel: [<c0403793>] sysenter_do_call+0x12/0x38
> Jun 26 21:07:03 hightech kernel: modprobe used greatest stack depth:
> 5296 bytes left
> Jun 26 21:07:03 hightech kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
> (...)
>
> Now, the weird thing is that this doesnt always point at modprobe. I
> have seen it occur with swapper, udevd and ip6tables-restore -at
> least-.
> For example, from another boot:
>
> Jun 26 18:46:40 hightech kernel: Pid: 545, comm: udevd Not tainted
> 2.6.31-0.28.rc1.fc12.i586 #1
> Jun 26 18:46:40 hightech kernel: Call Trace:
> Jun 26 18:46:40 hightech kernel: [<c0815123>] ? printk+0x22/0x37
> Jun 26 18:46:40 hightech kernel: [<c046cfa0>] save_trace+0x89/0xa3
> Jun 26 18:46:40 hightech kernel: [<c046d02a>] add_lock_to_list+0x70/0xb3
> Jun 26 18:46:40 hightech kernel: [<c046f224>] __lock_acquire+0x9d9/0xb08
> Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
> debug_check_no_obj_freed+0x7c/0x175
> Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
> debug_check_no_obj_freed+0x7c/0x175
> Jun 26 18:46:40 hightech kernel: [<c046f40a>] lock_acquire+0xb7/0xeb
> Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
> debug_check_no_obj_freed+0x7c/0x175
> Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
> debug_check_no_obj_freed+0x7c/0x175
> Jun 26 18:46:40 hightech kernel: [<c0817e8f>] _spin_lock_irqsave+0x45/0x89
> Jun 26 18:46:40 hightech kernel: [<c05fdd3d>] ?
> debug_check_no_obj_freed+0x7c/0x175
> Jun 26 18:46:40 hightech kernel: [<c05fdd3d>]
> debug_check_no_obj_freed+0x7c/0x175
> Jun 26 18:46:40 hightech kernel: [<c046d188>] ? trace_hardirqs_off+0x19/0x2c
> Jun 26 18:46:40 hightech kernel: [<c04ed7dd>] kmem_cache_free+0x92/0x125
> Jun 26 18:46:40 hightech kernel: [<c05f19ad>] ? idr_remove+0x125/0x144
> Jun 26 18:46:40 hightech kernel: [<c05f19ad>] idr_remove+0x125/0x144
> Jun 26 18:46:40 hightech kernel: [<c05250a4>]
> inotify_ignored_and_remove_idr+0x73/0x9b
> Jun 26 18:46:40 hightech kernel: [<c0524526>] inotify_freeing_mark+0x1c/0x2f
> Jun 26 18:46:40 hightech kernel: [<c0522da7>]
> fsnotify_destroy_mark_by_entry+0x129/0x15e
> Jun 26 18:46:40 hightech kernel: [<c0524b48>] sys_inotify_rm_watch+0x8e/0xbd
> Jun 26 18:46:40 hightech kernel: [<c0403a5c>] syscall_call+0x7/0xb
> Jun 26 18:46:40 hightech kernel: EXT3 FS on sde2, internal journal
> Jun 26 18:46:40 hightech kernel: kjournald starting. ?Commit interval 5 seconds
>
> Its not the exact same kernel (first one is vanilla, the second is
> from rawhide fedora, but the results are the exact same. The origin of
> the call trace keeps jumping around from program to program with every
> boot).
> The box will hang after some minutes too (as in totally hang, not even
> screen refreshes). At start it will work for a couple of minutes tho.
>
> The system is a quad-core phenom 9600 on an asus m3a-h/hdmi board,
> with 4gb ddr2 800 ram, and 3 1TB sata hard disks.
> The system is a fedora 11 with some rawhide updates thrown in, but
> this doesnt happen on a 2.6.29 kernel, on the same system, so im
> assuming its a kernel bug and not a distro problem (ofcourse, i could
> be wrong).
>
> Im attaching .config, and the full dmesg, as well as some other
> informations (lspci -vv, cat /proc/cpuinfo). I can apply patches and
> retest at will, as this will trigger at every boot, so its easily
> reproducible on my end.
>
> Thank you for your time,
> Joao Correia
> Centro de Informatica
> Universidade da Beira Interior
> Portugal
>


Attachments:
config (91.70 kB)

2009-06-29 02:08:28

by Cong Wang

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

On Sat, Jun 27, 2009 at 11:07 PM, Joao
Correia<[email protected]> wrote:
> Hello
>

Hello,

> Since midway through the 2.6.30 series, and continuing on the rc1 for
> .31, im having this at every boot, during the boot process:
>
> (...)
> Jun 26 21:07:03 hightech kernel: NET: Registered protocol family 10
> Jun 26 21:07:03 hightech kernel: lo: Disabled Privacy Extensions
> Jun 26 21:07:03 hightech kernel: BUG: MAX_STACK_TRACE_ENTRIES too low!
> Jun 26 21:07:03 hightech kernel: turning off the locking correctness validator.
> Jun 26 21:07:03 hightech kernel: Pid: 672, comm: modprobe Not tainted
> 2.6.31-rc1 #1

Could you please try the patch from Ingo below?

http://article.gmane.org/gmane.linux.kernel/852005

Does it help?

That value was already doubled from 131072 when the last time it was seen. :)

Thanks.

2009-06-29 08:54:17

by Joao Correia

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

Hello

Well, it doesnt complain about MAX_STACK_TRACE_ENTRIES anymore, but
now it gives me:

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
Pid: 1672, comm: qemu-kvm Not tainted 2.6.30-wl #2
Call Trace:
[<c07da76f>] ? printk+0x22/0x3b
[<c0463de9>] __lock_acquire+0xa77/0xb05
[<c0463f2e>] lock_acquire+0xb7/0xeb
[<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
[<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
[<c07dd476>] _spin_lock_nest_lock+0x30/0x71
[<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
[<c04be96c>] mm_take_all_locks+0xb3/0xf1
[<c04cc741>] do_mmu_notifier_register+0x9f/0x14e
[<c04cc83d>] mmu_notifier_register+0x1e/0x31
[<f9e461b8>] kvm_dev_ioctl+0xfd/0x25d [kvm]
[<f9e460bb>] ? kvm_dev_ioctl+0x0/0x25d [kvm]
[<c04e4df6>] vfs_ioctl+0x29/0x91
[<c04e52f4>] do_vfs_ioctl+0x496/0x4e3
[<c07dfb82>] ? do_page_fault+0x229/0x26a
[<c0408e41>] ? sched_clock+0x9/0xd
[<c0461218>] ? lock_release_holdtime+0x39/0x143
[<c07dfb82>] ? do_page_fault+0x229/0x26a
[<c04e5396>] sys_ioctl+0x55/0x86
[<c04037f3>] sysenter_do_call+0x12/0x38

which looks like its younger brother :-). (Also appears on other
programs, not just qemu-kvm)

Should i increase its value too? Am i not just masking something else
by doing this?

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Mon, Jun 29, 2009 at 3:08 AM, Am?rico Wang<[email protected]> wrote:
> On Sat, Jun 27, 2009 at 11:07 PM, Joao
> Correia<[email protected]> wrote:
>> Hello
>>
>
> Hello,
>
>> Since midway through the 2.6.30 series, and continuing on the rc1 for
>> .31, im having this at every boot, during the boot process:
>>
>> (...)
>> Jun 26 21:07:03 hightech kernel: NET: Registered protocol family 10
>> Jun 26 21:07:03 hightech kernel: lo: Disabled Privacy Extensions
>> Jun 26 21:07:03 hightech kernel: BUG: MAX_STACK_TRACE_ENTRIES too low!
>> Jun 26 21:07:03 hightech kernel: turning off the locking correctness validator.
>> Jun 26 21:07:03 hightech kernel: Pid: 672, comm: modprobe Not tainted
>> 2.6.31-rc1 #1
>
> Could you please try the patch from Ingo below?
>
> http://article.gmane.org/gmane.linux.kernel/852005
>
> Does it help?
>
> That value was already doubled from 131072 when the last time it was seen. :)
>
> Thanks.
>

2009-06-29 09:13:18

by Cong Wang

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

On Mon, Jun 29, 2009 at 09:53:41AM +0100, Joao Correia wrote:
>Hello
>
>Well, it doesnt complain about MAX_STACK_TRACE_ENTRIES anymore, but
>now it gives me:
>
>BUG: MAX_LOCK_DEPTH too low!
>turning off the locking correctness validator.
>Pid: 1672, comm: qemu-kvm Not tainted 2.6.30-wl #2
>Call Trace:
> [<c07da76f>] ? printk+0x22/0x3b
> [<c0463de9>] __lock_acquire+0xa77/0xb05
> [<c0463f2e>] lock_acquire+0xb7/0xeb
> [<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
> [<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
> [<c07dd476>] _spin_lock_nest_lock+0x30/0x71
> [<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
> [<c04be96c>] mm_take_all_locks+0xb3/0xf1
> [<c04cc741>] do_mmu_notifier_register+0x9f/0x14e
> [<c04cc83d>] mmu_notifier_register+0x1e/0x31
> [<f9e461b8>] kvm_dev_ioctl+0xfd/0x25d [kvm]
> [<f9e460bb>] ? kvm_dev_ioctl+0x0/0x25d [kvm]
> [<c04e4df6>] vfs_ioctl+0x29/0x91
> [<c04e52f4>] do_vfs_ioctl+0x496/0x4e3
> [<c07dfb82>] ? do_page_fault+0x229/0x26a
> [<c0408e41>] ? sched_clock+0x9/0xd
> [<c0461218>] ? lock_release_holdtime+0x39/0x143
> [<c07dfb82>] ? do_page_fault+0x229/0x26a
> [<c04e5396>] sys_ioctl+0x55/0x86
> [<c04037f3>] sysenter_do_call+0x12/0x38
>
>which looks like its younger brother :-). (Also appears on other
>programs, not just qemu-kvm)

Hmmm.. also known. :)

>
>Should i increase its value too? Am i not just masking something else
>by doing this?

Just try to modify your sysctl.conf file, on Fedora it should be
/etc/sysctl.conf, try to double the value of
/proc/sys/kernel/max_lock_depth.

If this doesn't work, try to increase the value of 'max_lock_depth'
in the kernel source. :)

I think Peter will have a patch to fix this.

2009-06-29 10:31:16

by Joao Correia

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

Hello

The system seemed to happily ignore all the sysctl.conf changes and
all echo VALUE > /proc/sys/kernel/max_lock_depth

So i dug a little on the source and changed

include/linux/sched.h

# define MAX_LOCK_DEPTH 48UL

to

# define MAX_LOCK_DEPTH 96UL

and im getting no more errors. Of course, now its probably radioactive
and about to blow up, but at least it's not complaining anymore :).

Thank you for your time and help,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal


On Mon, Jun 29, 2009 at 10:15 AM, Amerigo Wang<[email protected]> wrote:
> On Mon, Jun 29, 2009 at 09:53:41AM +0100, Joao Correia wrote:
>>Hello
>>
>>Well, it doesnt complain about MAX_STACK_TRACE_ENTRIES anymore, but
>>now it gives me:
>>
>>BUG: MAX_LOCK_DEPTH too low!
>>turning off the locking correctness validator.
>>Pid: 1672, comm: qemu-kvm Not tainted 2.6.30-wl #2
>>Call Trace:
>> [<c07da76f>] ? printk+0x22/0x3b
>> [<c0463de9>] __lock_acquire+0xa77/0xb05
>> [<c0463f2e>] lock_acquire+0xb7/0xeb
>> [<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
>> [<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
>> [<c07dd476>] _spin_lock_nest_lock+0x30/0x71
>> [<c04be96c>] ? mm_take_all_locks+0xb3/0xf1
>> [<c04be96c>] mm_take_all_locks+0xb3/0xf1
>> [<c04cc741>] do_mmu_notifier_register+0x9f/0x14e
>> [<c04cc83d>] mmu_notifier_register+0x1e/0x31
>> [<f9e461b8>] kvm_dev_ioctl+0xfd/0x25d [kvm]
>> [<f9e460bb>] ? kvm_dev_ioctl+0x0/0x25d [kvm]
>> [<c04e4df6>] vfs_ioctl+0x29/0x91
>> [<c04e52f4>] do_vfs_ioctl+0x496/0x4e3
>> [<c07dfb82>] ? do_page_fault+0x229/0x26a
>> [<c0408e41>] ? sched_clock+0x9/0xd
>> [<c0461218>] ? lock_release_holdtime+0x39/0x143
>> [<c07dfb82>] ? do_page_fault+0x229/0x26a
>> [<c04e5396>] sys_ioctl+0x55/0x86
>> [<c04037f3>] sysenter_do_call+0x12/0x38
>>
>>which looks like its younger brother :-). (Also appears on other
>>programs, not just qemu-kvm)
>
> Hmmm.. also known. :)
>
>>
>>Should i increase its value too? Am i not just masking something else
>>by doing this?
>
> Just try to modify your sysctl.conf file, on Fedora it should be
> /etc/sysctl.conf, try to double the value of
> /proc/sys/kernel/max_lock_depth.
>
> If this doesn't work, try to increase the value of 'max_lock_depth'
> in the kernel source. :)
>
> I think Peter will have a patch to fix this.
>
>

2009-06-30 09:14:41

by Cong Wang

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

On Mon, Jun 29, 2009 at 11:30:41AM +0100, Joao Correia wrote:
>Hello
>
>The system seemed to happily ignore all the sysctl.conf changes and
>all echo VALUE > /proc/sys/kernel/max_lock_depth


Ouch, mistake! max_lock_depth is for rt_mutex... :)

>
>So i dug a little on the source and changed
>
>include/linux/sched.h
>
># define MAX_LOCK_DEPTH 48UL
>
>to
>
># define MAX_LOCK_DEPTH 96UL
>
>and im getting no more errors. Of course, now its probably radioactive
>and about to blow up, but at least it's not complaining anymore :).

Yeah, this is right.

Let's Cc: Peter to see if he would like to change this number...

Peter?

2009-07-01 20:30:51

by Joao Correia

[permalink] [raw]
Subject: Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs

Hello again

Looks like this just opened the lid on some other limits. I just hit
another thing, which hadn't shown up before i made the changes, but
looks like its just another limit thats too low. This one is harder to
reproduce tho.

Jun 30 21:35:03 hightech kernel: BUG: MAX_LOCKDEP_CHAINS too low!
Jun 30 21:35:03 hightech kernel: turning off the locking correctness validator.
Jun 30 21:35:03 hightech kernel: Pid: 9379, comm: qemu-kvm Not tainted
2.6.30-wl #3
Jun 30 21:35:03 hightech kernel: Call Trace:
Jun 30 21:35:03 hightech kernel: [<c07da76f>] ? printk+0x22/0x3b
Jun 30 21:35:03 hightech kernel: [<c04638f5>] __lock_acquire+0x583/0xb05
Jun 30 21:35:03 hightech kernel: [<c046a107>] ? __module_text_address+0x20/0x6d
Jun 30 21:35:03 hightech kernel: [<c0463f2e>] lock_acquire+0xb7/0xeb
Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c07dd75f>] _spin_lock_irqsave+0x45/0x89
Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c05cf959>] get_hash_bucket+0x3a/0x55
Jun 30 21:35:03 hightech kernel: [<c05d09e8>] add_dma_entry+0x1f/0x4f
Jun 30 21:35:03 hightech kernel: [<c05d0c66>] debug_dma_map_sg+0xe1/0x147
Jun 30 21:35:03 hightech kernel: [<c0695a19>] ata_qc_issue+0x1c0/0x254
Jun 30 21:35:03 hightech kernel: [<c069bf44>] __ata_scsi_queuecmd+0x16a/0x1c8
Jun 30 21:35:03 hightech kernel: [<c067a37e>] ? scsi_done+0x0/0x32
Jun 30 21:35:03 hightech kernel: [<c069b9b1>] ? ata_scsi_rw_xlat+0x0/0x207
Jun 30 21:35:03 hightech kernel: [<c069c05b>] ata_scsi_queuecmd+0x55/0x95
Jun 30 21:35:03 hightech kernel: [<c067a37e>] ? scsi_done+0x0/0x32
Jun 30 21:35:03 hightech kernel: [<c067a627>] scsi_dispatch_cmd+0x187/0x207
Jun 30 21:35:03 hightech kernel: [<c068044c>] scsi_request_fn+0x33d/0x480
Jun 30 21:35:03 hightech kernel: [<c0445a1e>] ? del_timer+0x54/0x6e
Jun 30 21:35:03 hightech kernel: [<c05aefbe>] __generic_unplug_device+0x3e/0x53
Jun 30 21:35:03 hightech kernel: [<c05af00a>] blk_start_queueing+0x37/0x4a
Jun 30 21:35:03 hightech kernel: [<c05bcc7e>] cfq_insert_request+0x46f/0x498
Jun 30 21:35:03 hightech kernel: [<c05ad43b>] ? __rcu_read_unlock+0x0/0x3d
Jun 30 21:35:03 hightech kernel: [<c05ac60d>] elv_insert+0x12f/0x1d0
Jun 30 21:35:03 hightech kernel: [<c05ac763>] __elv_add_request+0xb5/0xce
Jun 30 21:35:03 hightech kernel: [<c05ad465>] ? __rcu_read_unlock+0x2a/0x3d
Jun 30 21:35:03 hightech kernel: [<c05ad533>] ? drive_stat_acct+0xbb/0xd3
Jun 30 21:35:03 hightech kernel: [<c05af6b3>] __make_request+0x33b/0x3c1
Jun 30 21:35:03 hightech kernel: [<c04cd85c>] ? check_valid_pointer+0x2c/0x6c
Jun 30 21:35:03 hightech kernel: [<c046249e>] ? mark_lock+0x29/0x1f3
Jun 30 21:35:03 hightech kernel: [<c04cf83a>] ? kmem_cache_alloc+0xc2/0x150
Jun 30 21:35:03 hightech kernel: [<c05ade19>] generic_make_request+0x2fd/0x351
Jun 30 21:35:03 hightech kernel: [<c05ade19>] ? generic_make_request+0x2fd/0x351
Jun 30 21:35:03 hightech kernel: [<c04a8b4e>] ? mempool_alloc+0x5b/0x105
Jun 30 21:35:03 hightech kernel: [<c04626ba>] ? mark_held_locks+0x52/0x7c
Jun 30 21:35:03 hightech kernel: [<c05adf2d>] submit_bio+0xc0/0xd9
Jun 30 21:35:03 hightech kernel: [<c04fd28d>] ? bio_alloc_bioset+0x35/0xd5
Jun 30 21:35:03 hightech kernel: [<c04f845c>] submit_bh+0xea/0x11b
Jun 30 21:35:03 hightech kernel: [<c04fae00>]
__block_write_full_page+0x218/0x2f0
Jun 30 21:35:03 hightech kernel: [<c054427d>] ?
ext4_normal_get_block_write+0x0/0x76
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c04faf4c>]
block_write_full_page_endio+0x74/0xda
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c054427d>] ?
ext4_normal_get_block_write+0x0/0x76
Jun 30 21:35:03 hightech kernel: [<c04fafd4>] block_write_full_page+0x22/0x38
Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
Jun 30 21:35:03 hightech kernel: [<c0543390>] ext4_da_writepage+0x1aa/0x1da
Jun 30 21:35:03 hightech kernel: [<c054315b>] mpage_da_submit_io+0x9b/0xf0
Jun 30 21:35:03 hightech kernel: [<c0545f40>] ext4_da_writepages+0x2c2/0x445
Jun 30 21:35:03 hightech kernel: [<c04ad194>] do_writepages+0x32/0x58
Jun 30 21:35:03 hightech kernel: [<c04a739b>]
__filemap_fdatawrite_range+0x73/0x8c
Jun 30 21:35:03 hightech kernel: [<c04a7c81>] filemap_fdatawrite_range+0x2b/0x44
Jun 30 21:35:03 hightech kernel: [<c04a7e9e>] sync_page_range+0x7f/0xf5
Jun 30 21:35:03 hightech kernel: [<c04629d3>] ? trace_hardirqs_on+0x19/0x2c
Jun 30 21:35:03 hightech kernel: [<c04a7fd1>] generic_file_aio_write+0xbd/0xe1
Jun 30 21:35:03 hightech kernel: [<c053da09>] ext4_file_write+0xc2/0x149
Jun 30 21:35:03 hightech kernel: [<c04d7cf2>] do_sync_write+0xbe/0x10a
Jun 30 21:35:03 hightech kernel: [<c046249e>] ? mark_lock+0x29/0x1f3
Jun 30 21:35:03 hightech kernel: [<c0451586>] ?
autoremove_wake_function+0x0/0x55
Jun 30 21:35:03 hightech kernel: [<c04d920e>] ? __rcu_read_unlock+0x0/0x3d
Jun 30 21:35:03 hightech kernel: [<c058377a>] ?
security_file_permission+0x23/0x36
Jun 30 21:35:03 hightech kernel: [<c04d7eed>] ? rw_verify_area+0xa5/0xd8
Jun 30 21:35:03 hightech kernel: [<c04d7c34>] ? do_sync_write+0x0/0x10a
Jun 30 21:35:03 hightech kernel: [<c04d837f>] vfs_write+0x9f/0x10f
Jun 30 21:35:03 hightech kernel: [<c04d8459>] sys_pwrite64+0x6a/0x93
Jun 30 21:35:03 hightech kernel: [<c04037f3>] sysenter_do_call+0x12/0x38


Now the whole call trace is showing up (so the other limits were
raised correctly).

Im going to go through the source and raise this limit as well, but as
i said above, this one is harder to trigger, so it may take a couple
of days to see it again.

The first problem was indeed just the tip of the iceberg. Perphaps a
more comprehensive change of all the related limits is in order? (I do
not know if such change has already happened or not, if so, please
point me in the right direction.)

Again,
Thank you very much for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Tue, Jun 30, 2009 at 10:16 AM, Amerigo Wang<[email protected]> wrote:
> On Mon, Jun 29, 2009 at 11:30:41AM +0100, Joao Correia wrote:
>>Hello
>>
>>The system seemed to happily ignore all the sysctl.conf changes and
>>all echo VALUE > ?/proc/sys/kernel/max_lock_depth
>
>
> Ouch, mistake! max_lock_depth is for rt_mutex... :)
>
>>
>>So i dug a little on the source and changed
>>
>>include/linux/sched.h
>>
>># define MAX_LOCK_DEPTH 48UL
>>
>>to
>>
>># define MAX_LOCK_DEPTH 96UL
>>
>>and im getting no more errors. Of course, now its probably radioactive
>>and about to blow up, but at least it's not complaining anymore :).
>
> Yeah, this is right.
>
> Let's Cc: Peter to see if he would like to change this number...
>
> Peter?
>

2009-07-02 09:29:20

by Cong Wang

[permalink] [raw]
Subject: lockdep max numbers (was Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs)

On Wed, Jul 01, 2009 at 09:30:23PM +0100, Joao Correia wrote:
>Hello again
>
>Looks like this just opened the lid on some other limits. I just hit
>another thing, which hadn't shown up before i made the changes, but
>looks like its just another limit thats too low. This one is harder to
>reproduce tho.
>
>Jun 30 21:35:03 hightech kernel: BUG: MAX_LOCKDEP_CHAINS too low!
>Jun 30 21:35:03 hightech kernel: turning off the locking correctness validator.
>Jun 30 21:35:03 hightech kernel: Pid: 9379, comm: qemu-kvm Not tainted


Similar problem...

Peter? Would like to increase these numbers?


>2.6.30-wl #3
>Jun 30 21:35:03 hightech kernel: Call Trace:
>Jun 30 21:35:03 hightech kernel: [<c07da76f>] ? printk+0x22/0x3b
>Jun 30 21:35:03 hightech kernel: [<c04638f5>] __lock_acquire+0x583/0xb05
>Jun 30 21:35:03 hightech kernel: [<c046a107>] ? __module_text_address+0x20/0x6d
>Jun 30 21:35:03 hightech kernel: [<c0463f2e>] lock_acquire+0xb7/0xeb
>Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
>Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
>Jun 30 21:35:03 hightech kernel: [<c07dd75f>] _spin_lock_irqsave+0x45/0x89
>Jun 30 21:35:03 hightech kernel: [<c05cf959>] ? get_hash_bucket+0x3a/0x55
>Jun 30 21:35:03 hightech kernel: [<c05cf959>] get_hash_bucket+0x3a/0x55
>Jun 30 21:35:03 hightech kernel: [<c05d09e8>] add_dma_entry+0x1f/0x4f
>Jun 30 21:35:03 hightech kernel: [<c05d0c66>] debug_dma_map_sg+0xe1/0x147
>Jun 30 21:35:03 hightech kernel: [<c0695a19>] ata_qc_issue+0x1c0/0x254
>Jun 30 21:35:03 hightech kernel: [<c069bf44>] __ata_scsi_queuecmd+0x16a/0x1c8
>Jun 30 21:35:03 hightech kernel: [<c067a37e>] ? scsi_done+0x0/0x32
>Jun 30 21:35:03 hightech kernel: [<c069b9b1>] ? ata_scsi_rw_xlat+0x0/0x207
>Jun 30 21:35:03 hightech kernel: [<c069c05b>] ata_scsi_queuecmd+0x55/0x95
>Jun 30 21:35:03 hightech kernel: [<c067a37e>] ? scsi_done+0x0/0x32
>Jun 30 21:35:03 hightech kernel: [<c067a627>] scsi_dispatch_cmd+0x187/0x207
>Jun 30 21:35:03 hightech kernel: [<c068044c>] scsi_request_fn+0x33d/0x480
>Jun 30 21:35:03 hightech kernel: [<c0445a1e>] ? del_timer+0x54/0x6e
>Jun 30 21:35:03 hightech kernel: [<c05aefbe>] __generic_unplug_device+0x3e/0x53
>Jun 30 21:35:03 hightech kernel: [<c05af00a>] blk_start_queueing+0x37/0x4a
>Jun 30 21:35:03 hightech kernel: [<c05bcc7e>] cfq_insert_request+0x46f/0x498
>Jun 30 21:35:03 hightech kernel: [<c05ad43b>] ? __rcu_read_unlock+0x0/0x3d
>Jun 30 21:35:03 hightech kernel: [<c05ac60d>] elv_insert+0x12f/0x1d0
>Jun 30 21:35:03 hightech kernel: [<c05ac763>] __elv_add_request+0xb5/0xce
>Jun 30 21:35:03 hightech kernel: [<c05ad465>] ? __rcu_read_unlock+0x2a/0x3d
>Jun 30 21:35:03 hightech kernel: [<c05ad533>] ? drive_stat_acct+0xbb/0xd3
>Jun 30 21:35:03 hightech kernel: [<c05af6b3>] __make_request+0x33b/0x3c1
>Jun 30 21:35:03 hightech kernel: [<c04cd85c>] ? check_valid_pointer+0x2c/0x6c
>Jun 30 21:35:03 hightech kernel: [<c046249e>] ? mark_lock+0x29/0x1f3
>Jun 30 21:35:03 hightech kernel: [<c04cf83a>] ? kmem_cache_alloc+0xc2/0x150
>Jun 30 21:35:03 hightech kernel: [<c05ade19>] generic_make_request+0x2fd/0x351
>Jun 30 21:35:03 hightech kernel: [<c05ade19>] ? generic_make_request+0x2fd/0x351
>Jun 30 21:35:03 hightech kernel: [<c04a8b4e>] ? mempool_alloc+0x5b/0x105
>Jun 30 21:35:03 hightech kernel: [<c04626ba>] ? mark_held_locks+0x52/0x7c
>Jun 30 21:35:03 hightech kernel: [<c05adf2d>] submit_bio+0xc0/0xd9
>Jun 30 21:35:03 hightech kernel: [<c04fd28d>] ? bio_alloc_bioset+0x35/0xd5
>Jun 30 21:35:03 hightech kernel: [<c04f845c>] submit_bh+0xea/0x11b
>Jun 30 21:35:03 hightech kernel: [<c04fae00>]
>__block_write_full_page+0x218/0x2f0
>Jun 30 21:35:03 hightech kernel: [<c054427d>] ?
>ext4_normal_get_block_write+0x0/0x76
>Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
>Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
>Jun 30 21:35:03 hightech kernel: [<c04faf4c>]
>block_write_full_page_endio+0x74/0xda
>Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
>Jun 30 21:35:03 hightech kernel: [<c054427d>] ?
>ext4_normal_get_block_write+0x0/0x76
>Jun 30 21:35:03 hightech kernel: [<c04fafd4>] block_write_full_page+0x22/0x38
>Jun 30 21:35:03 hightech kernel: [<c04fb5ea>] ? end_buffer_async_write+0x0/0x140
>Jun 30 21:35:03 hightech kernel: [<c0543390>] ext4_da_writepage+0x1aa/0x1da
>Jun 30 21:35:03 hightech kernel: [<c054315b>] mpage_da_submit_io+0x9b/0xf0
>Jun 30 21:35:03 hightech kernel: [<c0545f40>] ext4_da_writepages+0x2c2/0x445
>Jun 30 21:35:03 hightech kernel: [<c04ad194>] do_writepages+0x32/0x58
>Jun 30 21:35:03 hightech kernel: [<c04a739b>]
>__filemap_fdatawrite_range+0x73/0x8c
>Jun 30 21:35:03 hightech kernel: [<c04a7c81>] filemap_fdatawrite_range+0x2b/0x44
>Jun 30 21:35:03 hightech kernel: [<c04a7e9e>] sync_page_range+0x7f/0xf5
>Jun 30 21:35:03 hightech kernel: [<c04629d3>] ? trace_hardirqs_on+0x19/0x2c
>Jun 30 21:35:03 hightech kernel: [<c04a7fd1>] generic_file_aio_write+0xbd/0xe1
>Jun 30 21:35:03 hightech kernel: [<c053da09>] ext4_file_write+0xc2/0x149
>Jun 30 21:35:03 hightech kernel: [<c04d7cf2>] do_sync_write+0xbe/0x10a
>Jun 30 21:35:03 hightech kernel: [<c046249e>] ? mark_lock+0x29/0x1f3
>Jun 30 21:35:03 hightech kernel: [<c0451586>] ?
>autoremove_wake_function+0x0/0x55
>Jun 30 21:35:03 hightech kernel: [<c04d920e>] ? __rcu_read_unlock+0x0/0x3d
>Jun 30 21:35:03 hightech kernel: [<c058377a>] ?
>security_file_permission+0x23/0x36
>Jun 30 21:35:03 hightech kernel: [<c04d7eed>] ? rw_verify_area+0xa5/0xd8
>Jun 30 21:35:03 hightech kernel: [<c04d7c34>] ? do_sync_write+0x0/0x10a
>Jun 30 21:35:03 hightech kernel: [<c04d837f>] vfs_write+0x9f/0x10f
>Jun 30 21:35:03 hightech kernel: [<c04d8459>] sys_pwrite64+0x6a/0x93
>Jun 30 21:35:03 hightech kernel: [<c04037f3>] sysenter_do_call+0x12/0x38
>
>
>Now the whole call trace is showing up (so the other limits were
>raised correctly).
>
>Im going to go through the source and raise this limit as well, but as
>i said above, this one is harder to trigger, so it may take a couple
>of days to see it again.
>
>The first problem was indeed just the tip of the iceberg. Perphaps a
>more comprehensive change of all the related limits is in order? (I do
>not know if such change has already happened or not, if so, please
>point me in the right direction.)
>
>Again,
>Thank you very much for your time,
>Joao Correia
>Centro de Informatica
>Universidade da Beira Interior
>Portugal
>
>On Tue, Jun 30, 2009 at 10:16 AM, Amerigo Wang<[email protected]> wrote:
>> On Mon, Jun 29, 2009 at 11:30:41AM +0100, Joao Correia wrote:
>>>Hello
>>>
>>>The system seemed to happily ignore all the sysctl.conf changes and
>>>all echo VALUE >  /proc/sys/kernel/max_lock_depth
>>
>>
>> Ouch, mistake! max_lock_depth is for rt_mutex... :)
>>
>>>
>>>So i dug a little on the source and changed
>>>
>>>include/linux/sched.h
>>>
>>># define MAX_LOCK_DEPTH 48UL
>>>
>>>to
>>>
>>># define MAX_LOCK_DEPTH 96UL
>>>
>>>and im getting no more errors. Of course, now its probably radioactive
>>>and about to blow up, but at least it's not complaining anymore :).
>>
>> Yeah, this is right.
>>
>> Let's Cc: Peter to see if he would like to change this number...
>>
>> Peter?
>>

2009-07-02 10:32:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: lockdep max numbers (was Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs)

On Thu, 2009-07-02 at 17:31 +0800, Amerigo Wang wrote:
> On Wed, Jul 01, 2009 at 09:30:23PM +0100, Joao Correia wrote:
> >Hello again
> >
> >Looks like this just opened the lid on some other limits. I just hit
> >another thing, which hadn't shown up before i made the changes, but
> >looks like its just another limit thats too low. This one is harder to
> >reproduce tho.
> >
> >Jun 30 21:35:03 hightech kernel: BUG: MAX_LOCKDEP_CHAINS too low!
> >Jun 30 21:35:03 hightech kernel: turning off the locking correctness validator.
> >Jun 30 21:35:03 hightech kernel: Pid: 9379, comm: qemu-kvm Not tainted
>
>
> Similar problem...
>
> Peter? Would like to increase these numbers?

> >2.6.30-wl #3

> >Jun 30 21:35:03 hightech kernel: [<c05d0c66>] debug_dma_map_sg+0xe1/0x147

We recently fixed the dma debug stuff to require less lockdep resources,
Joerg might know if that made it into .30.

> >>>So i dug a little on the source and changed
> >>>
> >>>include/linux/sched.h
> >>>
> >>># define MAX_LOCK_DEPTH 48UL
> >>>
> >>>to
> >>>
> >>># define MAX_LOCK_DEPTH 96UL

> >> Let's Cc: Peter to see if he would like to change this number...
> >>
> >> Peter?

I really don't think we want to raise this, I mean, holding more that 48
locks at the same time is somewhat pushing it, don't you think?

Where do you run into this limit?

2009-07-02 10:46:08

by Joao Correia

[permalink] [raw]
Subject: Re: lockdep max numbers (was Re: Crashes during boot on 2.6.30 / 2.6.31-rc, random programs)

Every boot, different programs, on a fedora 11 box. If i dont touch
any of the limits mentioned, it happens every boot.

Changing the two values mentioned, a third complain appears, althouh
more infrequently. I have had MAX_LOCKDEP_CHAINS pop up regarding
qemu-kvm, and once again on unrar, during regular box usage.

(quad core phenom 9600, asus m3a-h/hdmi board with 4gb ram).

Thank you very much for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Thu, Jul 2, 2009 at 11:32 AM, Peter Zijlstra<[email protected]> wrote:
> On Thu, 2009-07-02 at 17:31 +0800, Amerigo Wang wrote:
>> On Wed, Jul 01, 2009 at 09:30:23PM +0100, Joao Correia wrote:
>> >Hello again
>> >
>> >Looks like this just opened the lid on some other limits. I just hit
>> >another thing, which hadn't shown up before i made the changes, but
>> >looks like its just another limit thats too low. This one is harder to
>> >reproduce tho.
>> >
>> >Jun 30 21:35:03 hightech kernel: BUG: MAX_LOCKDEP_CHAINS too low!
>> >Jun 30 21:35:03 hightech kernel: turning off the locking correctness validator.
>> >Jun 30 21:35:03 hightech kernel: Pid: 9379, comm: qemu-kvm Not tainted
>>
>>
>> Similar problem...
>>
>> Peter? Would like to increase these numbers?
>
>> >2.6.30-wl #3
>
>> >Jun 30 21:35:03 hightech kernel: [<c05d0c66>] debug_dma_map_sg+0xe1/0x147
>
> We recently fixed the dma debug stuff to require less lockdep resources,
> Joerg might know if that made it into .30.
>
>> >>>So i dug a little on the source and changed
>> >>>
>> >>>include/linux/sched.h
>> >>>
>> >>># define MAX_LOCK_DEPTH 48UL
>> >>>
>> >>>to
>> >>>
>> >>># define MAX_LOCK_DEPTH 96UL
>
>> >> Let's Cc: Peter to see if he would like to change this number...
>> >>
>> >> Peter?
>
> I really don't think we want to raise this, I mean, holding more that 48
> locks at the same time is somewhat pushing it, don't you think?
>
> Where do you run into this limit?
>
>