LinuxLists.cc - BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Eric Rannaud wrote:
>
> (2) is introduced by d94a041519f3ab1ac023bf917619cd8c4a7d3c01
> [PATCH] taskstats: free skb, avoid returns in send_cpu_listeners,
> Signed-off-by: Shailabh Nagar.
> The kernel freezes after a BUG (no sysrq magic).

This one looks like the real problem is that totally broken stack unwinder
again.

The lockdep warning in itself is probably valid, but the reason for the
_hang_ is the

[ 138.831385] Unable to handle kernel paging request at ffffffff82800000 RIP:
[ 138.831439] [<ffffffff8020b491>] show_trace+0x311/0x380

and that code is just a total mess.

Andi, I really think that Dwarf unwinder needs to go. The code is totally
unreadable, and it's clearly fragile as hell. It doesn't check that the
pointers it gets are even remotely valid, but just follows them as if they
were.

The whole unwinder seems buggier than any bug it can ever unwind would be.
Really. Let's go back to the sane "try our best, don't guarantee anything,
but at least don't make things worse!" old code.

The people who wrote that crap (and yes, Andi, I mean apparently you and
Jan Beulich) really _have_ to get his act together. It's not just
unreadable and obviously buggy, it's so scarily that it's hard to even
talk about it. Lookie here:

#define HANDLE_STACK(cond) \
do while (cond) { \
unsigned long addr = *stack++; \

What the F*CK! "do while(cond) {" ????

Please. Somebody just rip out all this crap. I beg of you.

We need Al Viro here to put this kind of code into perspective. I _think_
he would have some choice words for code that is meant to "help"
debugging, and is this horrible.

Linus

----
>
> ---- netconsole for (2) with numa=noacpi
> [ 138.430672] BUG: warning at kernel/lockdep.c:565/print_infinite_recursion_bug()
> [ 138.430710] powernow-k8: MP systems not supported by PSB BIOS structure
> [ 138.430752] powernow-k8: MP systems not supported by PSB BIOS structure
> [ 138.430782] powernow-k8: MP systems not supported by PSB BIOS structure
> [ 138.430814] powernow-k8: MP systems not supported by PSB BIOS structure
> [ 138.430972]
> [ 138.430973] Call Trace:
> [ 138.431190] [<ffffffff8020b22d>] show_trace+0xad/0x380
> [ 138.431264] [<ffffffff8020b745>] dump_stack+0x15/0x20
> [ 138.431337] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 138.431465] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 138.431591] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.431718] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.431844] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.431970] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432097] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432223] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432349] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432475] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432602] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432728] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432855] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.432982] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433111] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433237] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433362] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433488] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433613] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433739] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433865] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.433992] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.434120] [<ffffffff8024c740>] check_usage+0x40/0x2b0
> [ 138.434246] [<ffffffff8024e010>] __lock_acquire+0xa50/0xd20
> [ 138.434371] [<ffffffff8024e66b>] lock_acquire+0x8b/0xc0
> [ 138.434501] [<ffffffff804ac065>] _spin_lock+0x25/0x40
> [ 138.434631] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.434730] [<ffffffff8022ceca>] rebalance_tick+0x19a/0x340
> [ 138.434831] [<ffffffff8022d0ef>] scheduler_tick+0x7f/0x390
> [ 138.434933] [<ffffffff8023afe3>] update_process_times+0x73/0x90
> [ 138.435046] [<ffffffff8021636b>] smp_local_timer_interrupt+0x2b/0x60
> [ 138.435133] [<ffffffff80216b28>] smp_apic_timer_interrupt+0x58/0x70
> [ 138.435217] [<ffffffff8020a76e>] apic_timer_interrupt+0x6a/0x70
> [ 138.435284] [<ffffffff802086b2>] default_idle+0x42/0x80
> [ 138.435348] [<ffffffff8020875a>] cpu_idle+0x6a/0x90
> [ 138.435414] [<ffffffff8094d94c>] start_secondary+0x4fc/0x510
> [ 138.435494] <IRQ> [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.435597] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 138.435659] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 138.435721] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.435782] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.435844] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.435906] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.435967] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436029] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436093] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436155] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436216] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436277] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436339] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436400] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436462] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436523] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436585] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436647] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436708] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436770] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436831] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436893] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.436954] [<ffffffff8024c740>] check_usage+0x40/0x2b0
> [ 138.437016] [<ffffffff8024e010>] __lock_acquire+0xa50/0xd20
> [ 138.437080] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.437141] [<ffffffff8024e66b>] lock_acquire+0x8b/0xc0
> [ 138.437202] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.437264] [<ffffffff804ac065>] _spin_lock+0x25/0x40
> [ 138.437325] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.437385] [<ffffffff8022ceca>] rebalance_tick+0x19a/0x340
> [ 138.437446] [<ffffffff804ac54b>] _spin_unlock_irq+0x2b/0x40
> [ 138.437508] [<ffffffff8022d0ef>] scheduler_tick+0x7f/0x390
> [ 138.437569] [<ffffffff8023afe3>] update_process_times+0x73/0x90
> [ 138.437631] [<ffffffff8021636b>] smp_local_timer_interrupt+0x2b/0x60
> [ 138.437694] [<ffffffff80216b28>] smp_apic_timer_interrupt+0x58/0x70
> [ 138.437756] [<ffffffff802086b0>] default_idle+0x40/0x80
> [ 138.437815] [<ffffffff80208670>] default_idle+0x0/0x80
> [ 138.437875] [<ffffffff8020a76e>] apic_timer_interrupt+0x6a/0x70
> [ 138.437936] <EOI> [<ffffffff8045c7ff>] ipq_kill+0x5f/0xa0
> [ 138.751237] [<ffffffff8021ed29>] flush_kernel_map+0x39/0x40
> [ 138.751306] [<ffffffff8021ecc0>] search_extable+0x40/0x70
> [ 138.763697] [<ffffffff8020efeb>] show_cpuinfo+0x18b/0x360
> [ 138.763765] [<ffffffff8020efef>] show_cpuinfo+0x18f/0x360
> [ 138.763828] [<ffffffff8020f0e6>] show_cpuinfo+0x286/0x360
> [ 138.763889] [<ffffffff8020f0ed>] show_cpuinfo+0x28d/0x360
> [ 138.764152] [<ffffffff8021de9f>] show_mem+0xbf/0x170
> [ 138.764211] [<ffffffff8021dea1>] show_mem+0xc1/0x170
> [ 138.764275] [<ffffffff8021dec0>] show_mem+0xe0/0x170
> [ 138.764334] [<ffffffff8021dec2>] show_mem+0xe2/0x170
> [ 138.764394] [<ffffffff8021ded7>] show_mem+0xf7/0x170
> [ 138.764453] [<ffffffff8021ded9>] show_mem+0xf9/0x170
> [ 138.764514] [<ffffffff8021dfc0>] __add_pages+0x40/0xd0
> [ 138.764576] [<ffffffff8021e089>] vmalloc_sync_all+0x39/0x170
> [ 138.799966] [<ffffffff8049ee60>] xfrm_state_netlink+0x0/0xa0
> [ 138.800034] [<ffffffff8049ee69>] xfrm_state_netlink+0x9/0xa0
> [ 138.800100] [<ffffffff8049ee86>] xfrm_state_netlink+0x26/0xa0
> [ 138.800161] [<ffffffff8049ee8f>] xfrm_state_netlink+0x2f/0xa0
> [ 138.800230] [<ffffffff8049f182>] xfrm_send_state_notify+0x282/0x6c0
> [ 138.800292] [<ffffffff8049f195>] xfrm_send_state_notify+0x295/0x6c0
> [ 138.800370] [<ffffffff8022ad33>] find_busiest_group+0x5d3/0x6f0
> [ 138.800431] [<ffffffff8022ad3c>] find_busiest_group+0x5dc/0x6f0
> [ 138.800499] [<ffffffff8049f23c>] xfrm_send_state_notify+0x33c/0x6c0
> [ 138.800562] [<ffffffff8049f243>] xfrm_send_state_notify+0x343/0x6c0
> [ 138.800628] [<ffffffff8022ae75>] migration_thread+0x25/0x2e0
> [ 138.800689] [<ffffffff8022ae7e>] migration_thread+0x2e/0x2e0
> [ 138.800752] [<ffffffff8022aea0>] migration_thread+0x50/0x2e0
> [ 138.800813] [<ffffffff8022aea9>] migration_thread+0x59/0x2e0
> [ 138.831385] Unable to handle kernel paging request at ffffffff82800000 RIP:
> [ 138.831439] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 138.831598] PGD 203027 PUD 205027 PMD 0
> [ 138.831762] Oops: 0000 [1] SMP
> [ 138.831889] CPU 15
> [ 138.831978] Modules linked in:
> [ 138.832073] Pid: 0, comm: swapper Not tainted 2.6.18-rc3rannaud-gd94a0415 #36
> [ 138.832137] RIP: 0010:[<ffffffff8020b491>] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 138.832252] RSP: 0018:ffff8102238e7810 EFLAGS: 00010002
> [ 138.832312] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> [ 138.832375] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff805b9940
> [ 138.832439] RBP: ffff8102238e7900 R08: 0000000000000002 R09: ffffffff80256580
> [ 138.832501] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffff827ffffe
> [ 138.832564] R13: 0000000000000000 R14: 000000000000000f R15: ffffffff8097f000
> [ 138.832627] FS: 00000000005a6860(0000) GS:ffff810223894ac0(0000) knlGS:0000000000000000
> [ 138.832705] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 138.832766] CR2: ffffffff82800000 CR3: 00000002217c0000 CR4: 00000000000006e0
> [ 138.832830] Process swapper (pid: 0, threadinfo ffff8102238e2000, task ffff81022389b100)
> [ 138.832909] Stack: 0000000080838210 ffffffff808a1ca8 0000000000000000 0000000000000000
> [ 138.833169] 0000000000000000 0000000000000000 0000000000000000 0000000000000d07
> [ 138.833387] 0000000000000001 0000000000000000 0000000000000001 0000000000000001
> [ 138.833556] Call Trace:
> [ 138.833676] [<ffffffff8020b745>] dump_stack+0x15/0x20
> [ 138.833748] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 138.833894] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 138.834020] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834146] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834273] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834399] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834527] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834652] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834777] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.834902] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835029] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835154] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835279] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835404] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835533] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835658] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835784] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.835909] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.836034] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.836159] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.836285] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.836411] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.836539] [<ffffffff8024c740>] check_usage+0x40/0x2b0
> [ 138.836664] [<ffffffff8024e010>] __lock_acquire+0xa50/0xd20
> [ 138.836790] [<ffffffff8024e66b>] lock_acquire+0x8b/0xc0
> [ 138.836919] [<ffffffff804ac065>] _spin_lock+0x25/0x40
> [ 138.837049] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.837148] [<ffffffff8022ceca>] rebalance_tick+0x19a/0x340
> [ 138.837249] [<ffffffff8022d0ef>] scheduler_tick+0x7f/0x390
> [ 138.837351] [<ffffffff8023afe3>] update_process_times+0x73/0x90
> [ 138.837464] [<ffffffff8021636b>] smp_local_timer_interrupt+0x2b/0x60
> [ 138.837549] [<ffffffff80216b28>] smp_apic_timer_interrupt+0x58/0x70
> [ 138.837633] [<ffffffff8020a76e>] apic_timer_interrupt+0x6a/0x70
> [ 138.837702] [<ffffffff802086b2>] default_idle+0x42/0x80
> [ 138.837766] [<ffffffff8020875a>] cpu_idle+0x6a/0x90
> [ 138.837832] [<ffffffff8094d94c>] start_secondary+0x4fc/0x510
> [ 138.837912] <IRQ> [<ffffffff8020b745>] dump_stack+0x15/0x20
> [ 138.838015] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.838077] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 138.838140] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 138.838203] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839444] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839506] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839569] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839631] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839694] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839756] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839819] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839881] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.839943] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840006] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840068] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840130] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840193] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840255] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840316] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840378] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840439] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840502] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840564] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 138.840625] [<ffffffff8024c740>] check_usage+0x40/0x2b0
> [ 138.840687] [<ffffffff8024e010>] __lock_acquire+0xa50/0xd20
> [ 138.840749] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.840811] [<ffffffff8024e66b>] lock_acquire+0x8b/0xc0
> [ 138.840871] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.840933] [<ffffffff804ac065>] _spin_lock+0x25/0x40
> [ 138.840993] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 138.841054] [<ffffffff8022ceca>] rebalance_tick+0x19a/0x340
> [ 138.841116] [<ffffffff804ac54b>] _spin_unlock_irq+0x2b/0x40
> [ 138.841179] [<ffffffff8022d0ef>] scheduler_tick+0x7f/0x390
> [ 138.841239] [<ffffffff8023afe3>] update_process_times+0x73/0x90
> [ 138.841301] [<ffffffff8021636b>] smp_local_timer_interrupt+0x2b/0x60
> [ 138.841363] [<ffffffff80216b28>] smp_apic_timer_interrupt+0x58/0x70
> [ 138.841425] [<ffffffff802086b0>] default_idle+0x40/0x80
> [ 138.841485] [<ffffffff80208670>] default_idle+0x0/0x80
> [ 138.841544] [<ffffffff8020a76e>] apic_timer_interrupt+0x6a/0x70
> [ 138.841605] <EOI> [<ffffffff8045c7ff>] ipq_kill+0x5f/0xa0
> [ 139.154267] [<ffffffff8021ed29>] flush_kernel_map+0x39/0x40
> [ 139.154336] [<ffffffff8021ecc0>] search_extable+0x40/0x70
> [ 139.166705] [<ffffffff8020efeb>] show_cpuinfo+0x18b/0x360
> [ 139.166774] [<ffffffff8020efef>] show_cpuinfo+0x18f/0x360
> [ 139.166837] [<ffffffff8020f0e6>] show_cpuinfo+0x286/0x360
> [ 139.166898] [<ffffffff8020f0ed>] show_cpuinfo+0x28d/0x360
> [ 139.167161] [<ffffffff8021de9f>] show_mem+0xbf/0x170
> [ 139.167225] [<ffffffff8021dea1>] show_mem+0xc1/0x170
> [ 139.167285] [<ffffffff8021dec0>] show_mem+0xe0/0x170
> [ 139.167345] [<ffffffff8021dec2>] show_mem+0xe2/0x170
> [ 139.167407] [<ffffffff8021ded7>] show_mem+0xf7/0x170
> [ 139.167467] [<ffffffff8021ded9>] show_mem+0xf9/0x170
> [ 139.167528] [<ffffffff8021dfc0>] __add_pages+0x40/0xd0
> [ 139.167588] [<ffffffff8021e089>] vmalloc_sync_all+0x39/0x170
> [ 139.202919] [<ffffffff8049ee60>] xfrm_state_netlink+0x0/0xa0
> [ 139.202993] [<ffffffff8049ee69>] xfrm_state_netlink+0x9/0xa0
> [ 139.203055] [<ffffffff8049ee86>] xfrm_state_netlink+0x26/0xa0
> [ 139.203116] [<ffffffff8049ee8f>] xfrm_state_netlink+0x2f/0xa0
> [ 139.203187] [<ffffffff8049f182>] xfrm_send_state_notify+0x282/0x6c0
> [ 139.203250] [<ffffffff8049f195>] xfrm_send_state_notify+0x295/0x6c0
> [ 139.203329] [<ffffffff8022ad33>] find_busiest_group+0x5d3/0x6f0
> [ 139.203389] [<ffffffff8022ad3c>] find_busiest_group+0x5dc/0x6f0
> [ 139.203457] [<ffffffff8049f23c>] xfrm_send_state_notify+0x33c/0x6c0
> [ 139.203519] [<ffffffff8049f243>] xfrm_send_state_notify+0x343/0x6c0
> [ 139.203582] [<ffffffff8022ae75>] migration_thread+0x25/0x2e0
> [ 139.203644] [<ffffffff8022ae7e>] migration_thread+0x2e/0x2e0
> [ 139.203707] [<ffffffff8022aea0>] migration_thread+0x50/0x2e0
> [ 139.203767] [<ffffffff8022aea9>] migration_thread+0x59/0x2e0
> [ 139.234201] Unable to handle kernel paging request at ffffffff82800000 RIP:
> [ 139.234255] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.234412] PGD 203027 PUD 205027 PMD 0
> [ 139.234574] Oops: 0000 [2] SMP
> [ 139.234699] CPU 15
> [ 139.234789] Modules linked in:
> [ 139.234880] Pid: 0, comm: swapper Not tainted 2.6.18-rc3rannaud-gd94a0415 #36
> [ 139.234942] RIP: 0010:[<ffffffff8020b491>] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.235057] RSP: 0018:ffff8102238e7488 EFLAGS: 00010002
> [ 139.235117] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
> [ 139.235180] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff805b9940
> [ 139.235243] RBP: ffff8102238e7578 R08: 0000000000000002 R09: ffffffff80256580
> [ 139.235306] R10: ffffffffffffffff R11: 0000000000000000 R12: ffffffff827ffffe
> [ 139.235368] R13: 0000000000000000 R14: 000000000000000f R15: ffffffff8097f000
> [ 139.235431] FS: 00000000005a6860(0000) GS:ffff810223894ac0(0000) knlGS:0000000000000000
> [ 139.235509] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 139.235569] CR2: ffffffff82800000 CR3: 00000002217c0000 CR4: 00000000000006e0
> [ 139.235634] Process swapper (pid: 0, threadinfo ffff8102238e2000, task ffff81022389b100)
> [ 139.235711] Stack: 00000000238e3fc0 0000000000000000 0000000000000000 0000000000000000
> [ 139.235970] 0000000000000000 0000000000000000 0000000000000000 0000000000000d07
> [ 139.236187] 0000000000000001 0000000000000000 0000000000000001 0000000000000001
> [ 139.236358] Call Trace:
> [ 139.236476] [<ffffffff8020b5fe>] _show_stack+0xfe/0x110
> [ 139.236545] [<ffffffff8020b69e>] show_registers+0x8e/0x110
> [ 139.236616] [<ffffffff804ad11a>] __die+0x9a/0xe0
> [ 139.236684] [<ffffffff804aee07>] do_page_fault+0x837/0x960
> [ 139.236785] [<ffffffff8020a8d1>] error_exit+0x0/0x96
> [ 139.236852] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.236923] [<ffffffff8020b745>] dump_stack+0x15/0x20
> [ 139.236995] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 139.237133] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 139.237259] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.237384] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.237510] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.237637] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.237763] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.237889] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238015] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238140] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238266] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238391] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238517] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238642] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238768] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.238896] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239022] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239147] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239273] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239399] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239527] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239653] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.239780] [<ffffffff8024c740>] check_usage+0x40/0x2b0
> [ 139.239907] [<ffffffff8024e010>] __lock_acquire+0xa50/0xd20
> [ 139.240034] [<ffffffff8024e66b>] lock_acquire+0x8b/0xc0
> [ 139.240161] [<ffffffff804ac065>] _spin_lock+0x25/0x40
> [ 139.240293] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 139.240392] [<ffffffff8022ceca>] rebalance_tick+0x19a/0x340
> [ 139.240493] [<ffffffff8022d0ef>] scheduler_tick+0x7f/0x390
> [ 139.240595] [<ffffffff8023afe3>] update_process_times+0x73/0x90
> [ 139.240707] [<ffffffff8021636b>] smp_local_timer_interrupt+0x2b/0x60
> [ 139.240795] [<ffffffff80216b28>] smp_apic_timer_interrupt+0x58/0x70
> [ 139.240880] [<ffffffff8020a76e>] apic_timer_interrupt+0x6a/0x70
> [ 139.240947] [<ffffffff802086b2>] default_idle+0x42/0x80
> [ 139.241011] [<ffffffff8020875a>] cpu_idle+0x6a/0x90
> [ 139.241079] [<ffffffff8094d94c>] start_secondary+0x4fc/0x510
> [ 139.241160] <IRQ> [<ffffffff8020b5fe>] _show_stack+0xfe/0x110
> [ 139.241260] [<ffffffff8020b69e>] show_registers+0x8e/0x110
> [ 139.241321] [<ffffffff804ad11a>] __die+0x9a/0xe0
> [ 139.241381] [<ffffffff804aee07>] do_page_fault+0x837/0x960
> [ 139.241442] [<ffffffff80231f97>] printk+0x67/0x70
> [ 139.241501] [<ffffffff8022aea9>] migration_thread+0x59/0x2e0
> [ 139.241564] [<ffffffff80256af0>] kallsyms_lookup+0xd0/0x1f0
> [ 139.241626] [<ffffffff8020a8d1>] error_exit+0x0/0x96
> [ 139.241687] [<ffffffff80256580>] module_text_address+0x20/0x50
> [ 139.241748] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.241812] [<ffffffff8020b745>] dump_stack+0x15/0x20
> [ 139.241873] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.241934] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 139.241999] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 139.242061] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242123] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242185] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242247] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242310] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242373] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242434] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242497] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242558] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242620] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242681] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242742] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242805] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242867] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242930] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.242992] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.243054] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.243116] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.243178] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.243241] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.243304] [<ffffffff8024c740>] check_usage+0x40/0x2b0
> [ 139.243365] [<ffffffff8024e010>] __lock_acquire+0xa50/0xd20
> [ 139.243427] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 139.243488] [<ffffffff8024e66b>] lock_acquire+0x8b/0xc0
> [ 139.243549] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 139.243610] [<ffffffff804ac065>] _spin_lock+0x25/0x40
> [ 139.243670] [<ffffffff80227d30>] double_rq_lock+0x40/0x50
> [ 139.243730] [<ffffffff8022ceca>] rebalance_tick+0x19a/0x340
> [ 139.243792] [<ffffffff804ac54b>] _spin_unlock_irq+0x2b/0x40
> [ 139.243855] [<ffffffff8022d0ef>] scheduler_tick+0x7f/0x390
> [ 139.243916] [<ffffffff8023afe3>] update_process_times+0x73/0x90
> [ 139.245151] [<ffffffff8021636b>] smp_local_timer_interrupt+0x2b/0x60
> [ 139.245214] [<ffffffff80216b28>] smp_apic_timer_interrupt+0x58/0x70
> [ 139.245277] [<ffffffff802086b0>] default_idle+0x40/0x80
> [ 139.245336] [<ffffffff80208670>] default_idle+0x0/0x80
> [ 139.245397] [<ffffffff8020a76e>] apic_timer_interrupt+0x6a/0x70
> [ 139.245458] <EOI> [<ffffffff8045c7ff>] ipq_kill+0x5f/0xa0
> [ 139.559005] [<ffffffff8021ed29>] flush_kernel_map+0x39/0x40
> [ 139.559075] [<ffffffff8021ecc0>] search_extable+0x40/0x70
> [ 139.571503] [<ffffffff8020efeb>] show_cpuinfo+0x18b/0x360
> [ 139.571572] [<ffffffff8020efef>] show_cpuinfo+0x18f/0x360
> [ 139.571635] [<ffffffff8020f0e6>] show_cpuinfo+0x286/0x360
> [ 139.571696] [<ffffffff8020f0ed>] show_cpuinfo+0x28d/0x360
> [ 139.571965] [<ffffffff8021de9f>] show_mem+0xbf/0x170
> [ 139.572024] [<ffffffff8021dea1>] show_mem+0xc1/0x170
> [ 139.572084] [<ffffffff8021dec0>] show_mem+0xe0/0x170
> [ 139.572144] [<ffffffff8021dec2>] show_mem+0xe2/0x170
> [ 139.572204] [<ffffffff8021ded7>] show_mem+0xf7/0x170
> [ 139.572263] [<ffffffff8021ded9>] show_mem+0xf9/0x170
> [ 139.572325] [<ffffffff8021dfc0>] __add_pages+0x40/0xd0
> [ 139.572386] [<ffffffff8021e089>] vmalloc_sync_all+0x39/0x170
> [ 139.607781] [<ffffffff8049ee60>] xfrm_state_netlink+0x0/0xa0
> [ 139.607850] [<ffffffff8049ee69>] xfrm_state_netlink+0x9/0xa0
> [ 139.607912] [<ffffffff8049ee86>] xfrm_state_netlink+0x26/0xa0
> [ 139.607973] [<ffffffff8049ee8f>] xfrm_state_netlink+0x2f/0xa0
> [ 139.608044] [<ffffffff8049f182>] xfrm_send_state_notify+0x282/0x6c0
> [ 139.608105] [<ffffffff8049f195>] xfrm_send_state_notify+0x295/0x6c0
> [ 139.608185] [<ffffffff8022ad33>] find_busiest_group+0x5d3/0x6f0
> [ 139.608250] [<ffffffff8022ad3c>] find_busiest_group+0x5dc/0x6f0
> [ 139.608320] [<ffffffff8049f23c>] xfrm_send_state_notify+0x33c/0x6c0
> [ 139.608382] [<ffffffff8049f243>] xfrm_send_state_notify+0x343/0x6c0
> [ 139.608446] [<ffffffff8022ae75>] migration_thread+0x25/0x2e0
> [ 139.608506] [<ffffffff8022ae7e>] migration_thread+0x2e/0x2e0
> [ 139.608569] [<ffffffff8022aea0>] migration_thread+0x50/0x2e0
> [ 139.608630] [<ffffffff8022aea9>] migration_thread+0x59/0x2e0
> [ 139.639155] Unable to handle kernel paging request at ffffffff82800000 RIP:
> [ 139.639209] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.639366] PGD 203027 PUD 205027 PMD 0
> [ 139.639531] Oops: 0000 [3] SMP
> [ 139.639657] CPU 15
> [ 139.639745] Modules linked in:
> [ 139.639836] Pid: 0, comm: swapper Not tainted 2.6.18-rc3rannaud-gd94a0415 #36
> [ 139.639899] RIP: 0010:[<ffffffff8020b491>] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.572325] [<ffffffff8021dfc0>] __add_pages+0x40/0xd0
> [ 139.572386] [<ffffffff8021e089>] vmalloc_sync_all+0x39/0x170
> [ 139.607781] [<ffffffff8049ee60>] xfrm_state_netlink+0x0/0xa0
> [ 139.607850] [<ffffffff8049ee69>] xfrm_state_netlink+0x9/0xa0
> [ 139.607912] [<ffffffff8049ee86>] xfrm_state_netlink+0x26/0xa0
> [ 139.607973] [<ffffffff8049ee8f>] xfrm_state_netlink+0x2f/0xa0
> [ 139.608044] [<ffffffff8049f182>] xfrm_send_state_notify+0x282/0x6c0
> [ 139.608105] [<ffffffff8049f195>] xfrm_send_state_notify+0x295/0x6c0
> [ 139.608185] [<ffffffff8022ad33>] find_busiest_group+0x5d3/0x6f0
> [ 139.608250] [<ffffffff8022ad3c>] find_busiest_group+0x5dc/0x6f0
> [ 139.608320] [<ffffffff8049f23c>] xfrm_send_state_notify+0x33c/0x6c0
> [ 139.608382] [<ffffffff8049f243>] xfrm_send_state_notify+0x343/0x6c0
> [ 139.608446] [<ffffffff8022ae75>] migration_thread+0x25/0x2e0
> [ 139.608506] [<ffffffff8022ae7e>] migration_thread+0x2e/0x2e0
> [ 139.608569] [<ffffffff8022aea0>] migration_thread+0x50/0x2e0
> [ 139.608630] [<ffffffff8022aea9>] migration_thread+0x59/0x2e0
> [ 139.639155] Unable to handle kernel paging request at ffffffff82800000 RIP:
> [ 139.639209] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.639366] PGD 203027 PUD 205027 PMD 0
> [ 139.639531] Oops: 0000 [3] SMP
> [ 139.639657] CPU 15
> [ 139.639745] Modules linked in:
> [ 139.639836] Pid: 0, comm: swapper Not tainted 2.6.18-rc3rannaud-gd94a0415
> #36
> [ 139.639899] RIP: 0010:[<ffffffff8020b491>] [<ffffffff8020b491>]
> show_trace+0x311/0x380
> [ 139.640016] RSP: 0018:ffff8102238e70f8 EFLAGS: 00010002
> [ 139.640075] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> 0000000000000001
> [ 139.640138] RDX: 0000000000000000 RSI: 0000000000000001 RDI:
> ffffffff805b9940
> [ 139.640202] RBP: ffff8102238e71e8 R08: 0000000000000002 R09:
> ffffffff80256580
> [ 139.640265] R10: ffffffffffffffff R11: 0000000000000000 R12:
> ffffffff827ffffe
> [ 139.640329] R13: 0000000000000000 R14: 000000000000000f R15:
> ffffffff8097f000
> [ 139.640392] FS: 00000000005a6860(0000) GS:ffff810223894ac0(0000)
> knlGS:0000000000000000
> [ 139.640470] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [ 139.640532] CR2: ffffffff82800000 CR3: 00000002217c0000 CR4:
> 00000000000006e0
> [ 139.640596] Process swapper (pid: 0, threadinfo ffff8102238e2000,
> task ffff81022389b100)
> [ 139.640674] Stack: 00000000238e3fc0 0000000000000000
> 0000000000000000 0000000000000000
> [ 139.640933] 0000000000000000 0000000000000000 0000000000000000
> 0000000000000d07
> [ 139.641151] 0000000000000001 0000000000000000 0000000000000001
> 0000000000000001
> [ 139.641321] Call Trace:
> [ 139.641438] [<ffffffff8020b5fe>] _show_stack+0xfe/0x110
> [ 139.641508] [<ffffffff8020b69e>] show_registers+0x8e/0x110
> [ 139.641579] [<ffffffff804ad11a>] __die+0x9a/0xe0
> [ 139.641648] [<ffffffff804aee07>] do_page_fault+0x837/0x960
> [ 139.641750] [<ffffffff8020a8d1>] error_exit+0x0/0x96
> [ 139.641817] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.641886] [<ffffffff8020b5fe>] _show_stack+0xfe/0x110
> [ 139.641955] [<ffffffff8020b69e>] show_registers+0x8e/0x110
> [ 139.642025] [<ffffffff804ad11a>] __die+0x9a/0xe0
> [ 139.642093] [<ffffffff804aee07>] do_page_fault+0x837/0x960
> [ 139.642186] [<ffffffff8020a8d1>] error_exit+0x0/0x96
> [ 139.642252] [<ffffffff8020b491>] show_trace+0x311/0x380
> [ 139.642322] [<ffffffff8020b745>] dump_stack+0x15/0x20
> [ 139.642395] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> [ 139.642534] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> [ 139.642662] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.642789] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.642917] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643044] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643172] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643300] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643429] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643557] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643685] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643812] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.643940] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644067] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644194] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644321] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644450] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644577] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644705] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> [ 139.644832] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> ---- (a bit more on console)
>
>
> ---- netconsole for v2.6.18 with numa=noacpi
> [ 320.125317] BUG: warning at
> kernel/lockdep.c:565/print_infinite_recursion_bug()
> [ 320.125380]
> [ 320.125381] Call Trace:
> [ 320.125550] [<ffffffff8100b22d>] show_trace+0xad/0x3b0
> [ 320.125604] [<ffffffff8100b775>] dump_stack+0x15/0x20
> [ 320.125659] [<ffffffff8104c23d>] print_infinite_recursion_bug+0x3d/0x50
> [ 320.125771] [<ffffffff8104c41f>] check_noncircular+0x2f/0xe0
> [ 320.125880] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.125993] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126103] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126213] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126336] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126459] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126582] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126705] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126828] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.126953] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127078] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127201] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127325] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127448] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127571] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127694] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127816] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.127940] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.128063] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.128186] [<ffffffff8104c473>] check_noncircular+0x83/0xe0
> [ 320.128309] [<ffffffff8104e5da>] __lock_acquire+0x9fa/0xd20
> [ 320.128432] [<ffffffff8104ec8b>] lock_acquire+0x8b/0xc0
> [ 320.128559] [<ffffffff812cadf7>] __mutex_lock_slowpath+0xd7/0x250
> [ 320.128681] [<ffffffff812caf95>] mutex_lock+0x25/0x30
> [ 320.128799] [<ffffffff81115613>] elevator_exit+0x23/0x60
> [ 320.129066] [<ffffffff811180f5>] blk_cleanup_queue+0x45/0x60
> [ 320.129319] [<ffffffff811e5fd9>] scsi_free_queue+0x9/0x10
> [ 320.129717] [<ffffffff811eae86>]
> scsi_device_dev_release_usercontext+0x106/0x160
> [ 320.130119] [<ffffffff81043e3b>] execute_in_process_context+0x2b/0x70
> [ 320.130237] [<ffffffff811e9dfa>] scsi_device_dev_release+0x1a/0x20
> [ 320.130617] [<ffffffff8119e84c>] device_release+0x1c/0x70
> [ 320.130959] [<ffffffff811270e3>] kobject_cleanup+0x63/0xa0
> [ 320.131220] [<ffffffff8112712d>] kobject_release+0xd/0x10
> [ 320.131481] [<ffffffff811276ef>] kref_put+0x6f/0x90
> [ 320.131741] [<ffffffff81126979>] kobject_put+0x19/0x20
> [ 320.131999] [<ffffffff8119eb45>] put_device+0x15/0x20
> [ 320.132339] [<ffffffff811e8f68>] scsi_probe_and_add_lun+0xb58/0xba0
> [ 320.132721] [<ffffffff811e9082>] __scsi_scan_target+0xd2/0x690
> [ 320.133102] [<ffffffff811e9695>] scsi_scan_channel+0x55/0xa0
> [ 320.133482] [<ffffffff811e97bd>] scsi_scan_host_selected+0xdd/0x130
> [ 320.133865] [<ffffffff811e9825>] scsi_scan_host+0x15/0x20
> [ 320.134247] [<ffffffff811f00f0>] megaraid_probe_one+0x1110/0x11c0
> [ 320.134634] [<ffffffff81136e43>] pci_device_probe+0xf3/0x170
> [ 320.134908] [<ffffffff811a0fc7>] driver_probe_device+0x67/0xe0
> [ 320.135251] [<ffffffff811a1177>] __driver_attach+0x97/0xf0
> [ 320.135593] [<ffffffff811a014f>] bus_for_each_dev+0x4f/0x80
> [ 320.135936] [<ffffffff811a0e1c>] driver_attach+0x1c/0x20
> [ 320.136279] [<ffffffff811a0598>] bus_add_driver+0x88/0x180
> [ 320.136621] [<ffffffff811a16c3>] driver_register+0x93/0xa0
> [ 320.136965] [<ffffffff811368a3>] __pci_register_driver+0x63/0x90
> [ 320.137242] [<ffffffff818a3290>] megaraid_init+0x70/0xa0
> [ 320.137621] [<ffffffff810071c4>] init+0x164/0x350
> [ 320.137679] [<ffffffff8100aaa0>] child_rip+0xa/0x12
> [ 320.137741]
> [ 320.137742] -> #20 (&rq->rq_lock_key#16){++..}:
> [ 320.137977] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.138296] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.138615] [<ffffffff8102860f>] double_rq_lock+0x3f/0x50
> [ 320.138934] [<ffffffff8102d789>] rebalance_tick+0x199/0x340
> [ 320.139254] [<ffffffff8102d9ae>] scheduler_tick+0x7e/0x390
> [ 320.139572] [<ffffffff8103b7a2>] update_process_times+0x72/0x90
> [ 320.139894] [<ffffffff8101639a>] smp_local_timer_interrupt+0x2a/0x60
> [ 320.140215] [<ffffffff81016b57>] smp_apic_timer_interrupt+0x57/0x70
> [ 320.140535] [<ffffffff8100a76e>] apic_timer_interrupt+0x6a/0x70
> [ 320.140854]
> [ 320.140855] -> #19 (&rq->rq_lock_key#15){++..}:
> [ 320.141088] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.141405] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.141722] [<ffffffff810285fa>] double_rq_lock+0x2a/0x50
> [ 320.142040] [<ffffffff8102b93a>] migration_thread+0x22a/0x2e0
> [ 320.142359] [<ffffffff81047589>] kthread+0xd9/0x110
> [ 320.142676] [<ffffffff8100aa9f>] child_rip+0x9/0x12
> [ 320.142992]
> [ 320.142993] -> #18 (&rq->rq_lock_key#14){++..}:
> [ 320.143226] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.143543] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.143863] [<ffffffff810285fa>] double_rq_lock+0x2a/0x50
> [ 320.144182] [<ffffffff8102b93a>] migration_thread+0x22a/0x2e0
> [ 320.144501] [<ffffffff81047589>] kthread+0xd9/0x110
> [ 320.144818] [<ffffffff8100aa9f>] child_rip+0x9/0x12
> [ 320.145135]
> [ 320.145135] -> #17 (&rq->rq_lock_key#13){++..}:
> [ 320.145369] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.145686] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.146004] [<ffffffff810285fa>] double_rq_lock+0x2a/0x50
> [ 320.146323] [<ffffffff8102b93a>] migration_thread+0x22a/0x2e0
> [ 320.146642] [<ffffffff81047589>] kthread+0xd9/0x110
> [ 320.146959] [<ffffffff8100aa9f>] child_rip+0x9/0x12
> [ 320.147275]
> [ 320.147276] -> #16 (&rq->rq_lock_key#12){++..}:
> [ 320.147508] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.147826] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.148141] [<ffffffff810285fa>] double_rq_lock+0x2a/0x50
> [ 320.148460] [<ffffffff8102b93a>] migration_thread+0x22a/0x2e0
> [ 320.148778] [<ffffffff81047589>] kthread+0xd9/0x110
> [ 320.149096] [<ffffffff8100aa9f>] child_rip+0x9/0x12
> [ 320.149411]
> [ 320.149412] -> #15 (&rq->rq_lock_key#11){++..}:
> [ 320.149645] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.149962] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.150280] [<ffffffff810285fa>] double_rq_lock+0x2a/0x50
> [ 320.150598] [<ffffffff8102b93a>] migration_thread+0x22a/0x2e0
> [ 320.150917] [<ffffffff81047589>] kthread+0xd9/0x110
> [ 320.151235] [<ffffffff8100aa9f>] child_rip+0x9/0x12
> [ 320.151551]
> [ 320.151552] -> #14 (&rq->rq_lock_key#10){++..}:
> [ 320.151785] [<ffffffff8104ec8a>] lock_acquire+0x8a/0xc0
> [ 320.152102] [<ffffffff812cc404>] _spin_lock+0x24/0x40
> [ 320.152421] [<ffffffff810285fa>] double_rq_lock+0x2a/0x50
> [ 320.152741] [<ffffffff8102b93a>] migration_thread+0x22a/0x2e0
> [ 320.153060]
> -----
> (there was slightly more info on the local console, but it locked up)
>
>
>
> ----- /proc/cpuinfo
> processor : 15
> vendor_id : AuthenticAMD
> cpu family : 15
> model : 33
> model name : Dual Core AMD Opteron(tm) Processor 880
> stepping : 2
> cpu MHz : 2411.123
> cache size : 1024 KB
> physical id : 7
> siblings : 2
> core id : 1
> cpu cores : 2
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext
> fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
> bogomips : 4822.64
> TLB size : 1024 4K pages
> clflush size : 64
> cache_alignment : 64
> address sizes : 40 bits physical, 48 bits virtual
> power management: ts fid vid ttp
> --
>
>
>
> ---- lspci -vvv
> http://engm.ath.cx/liw64-lspci.txt
> ----
>

2006-09-30 20:13:22

by Andrew Morton

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006 21:20:58 +0200
"Eric Rannaud" <[email protected]> wrote:

> On a 16-way Opteron (8 dual-core 880) with 8GB of RAM, vanilla 2.6.18
> crashes early on boot with a BUG.

omg what a mess. Have you tried it with lockdep disabled in config?

2006-09-30 20:21:21

by Al Viro

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, Sep 30, 2006 at 12:54:21PM -0700, Linus Torvalds wrote:
> We need Al Viro here to put this kind of code into perspective. I _think_
> he would have some choice words for code that is meant to "help"
> debugging, and is this horrible.

Oh, so _that_ is what it is supposed to do? I've seen it when it went
in, tried to read, barfed and chalked it up to KDB or itanic braindamage
(both have turds of that genre). Didn't realize that lockdep used it too...

2006-09-30 20:24:12

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Peter Zijlstra wrote:
>
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm1/broken-out/slab-fix-lockdep-warnings.patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm1/broken-out/slab-fix-lockdep-warnings-fix.patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm1/broken-out/slab-fix-lockdep-warnings-fix-2.patch
>
> Those should rid you off the trace seen under:

Well, the more serious problem seems to be that any warning that causes a
back-trace on x86-64 seems liable to just lock up due to the backtrace
code itself being buggy.

At least I _hope_ its x86-64 only.

Linus

2006-09-30 20:29:30

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Al Viro wrote:
>
> Oh, so _that_ is what it is supposed to do? I've seen it when it went
> in, tried to read, barfed and chalked it up to KDB or itanic braindamage
> (both have turds of that genre). Didn't realize that lockdep used it too...

Well, anything that shows or needs a back-trace. By definition, it's
pretty much just debug code.

I sure as hell hope we don't have any actual _semantics_ that depend on
back-traces, like the broken asynchronous C++ exception handling code etc
that people have in user space (what a total brain-damage _that_ is!).

Linus

2006-09-30 20:30:38

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Saturday 30 September 2006 21:54, Linus Torvalds wrote:
>
> On Sat, 30 Sep 2006, Eric Rannaud wrote:
> >
> > (2) is introduced by d94a041519f3ab1ac023bf917619cd8c4a7d3c01
> > [PATCH] taskstats: free skb, avoid returns in send_cpu_listeners,
> > Signed-off-by: Shailabh Nagar.
> > The kernel freezes after a BUG (no sysrq magic).
>
> This one looks like the real problem is that totally broken stack unwinder
> again.
>
> The lockdep warning in itself is probably valid, but the reason for the
> _hang_ is the
>
> [ 138.831385] Unable to handle kernel paging request at ffffffff82800000 RIP:
> [ 138.831439] [<ffffffff8020b491>] show_trace+0x311/0x380
>
> and that code is just a total mess.

The code decides to do a fallback at the top of stack space for some reason.

Hmm, i've seen this working on other kernel threads, but maybe
that was luck. Kernel threads don't end up in user space
so the normal check for that doesn't work.
I guess we just need another termination for the kernel threads
by pushing a 0 there explicitely. Jan, do you agree?

>
> Andi, I really think that Dwarf unwinder needs to go. The code is totally
> unreadable, and it's clearly fragile as hell. It doesn't check that the
> pointers it gets are even remotely valid, but just follows them as if they
> were.

It assumes the dwarf2 information is correct, but stack is normally checked
(actually it doesn't even follow any pointers on the stack unlike a frame pointer
unwinder)

>
> The whole unwinder seems buggier than any bug it can ever unwind would be.

We didn't so far find any bug in the unwinder code itself (ok if you don't
count the performance issue Ingo found) just lots in the annotations and one
bug in the dwarf2 standard. There were a couple of bugs in the fallback
code I added later, but that was added only because the annotations were
so buggy and we were afraid to lose information.

> Really. Let's go back to the sane "try our best, don't guarantee anything,
> but at least don't make things worse!" old code.

If you kick the people who add more than three levels of callback
to core driver code to get their acts together too that's fine
to me. Unfortunately I don't think that's realistic. So we clearly
need better unwinding.

> The people who wrote that crap (and yes, Andi, I mean apparently you and
> Jan Beulich) really _have_ to get his act together. It's not just
> unreadable and obviously buggy, it's so scarily that it's hard to even
> talk about it. Lookie here:
>
> #define HANDLE_STACK(cond) \
> do while (cond) { \
> unsigned long addr = *stack++; \

> What the F*CK! "do while(cond) {" ????

That is because while () alone doesn't tolerate a semicolon without
breaking if () MACRO; else ... Given I don't think it's used in a
else context here, but there are valid reasons for this convention.

[of course if C had a non crappy preprocessor this all wouldn't be needed,
but we're stuck with it now]

> > [ 138.430973] Call Trace:
> > [ 138.431190] [<ffffffff8020b22d>] show_trace+0xad/0x380
> > [ 138.431264] [<ffffffff8020b745>] dump_stack+0x15/0x20
> > [ 138.431337] [<ffffffff8024bc1d>] print_infinite_recursion_bug+0x3d/0x50
> > [ 138.431465] [<ffffffff8024bd2f>] find_usage_backwards+0x2f/0xd0
> > [ 138.431591] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0
> > [ 138.431718] [<ffffffff8024bd93>] find_usage_backwards+0x93/0xd0

... lots of recursive calls snipped ...

Scary. That code really seems to be quite deeply recursive.
Linus, why don't you flame code like this? @) Actually I guess the
real culprit here is not even lockdep, but too complex locking
in XFRM.

Actually I wonder if the stack didn't just overflow. Unfortunately it can't
be seen because the tail is in a interrupt stack.

> > [ 138.751306] [<ffffffff8021ecc0>] search_extable+0x40/0x70

After here the unwinder seems to become a bit and it shouldn't print
multiple entries. Jan any ideas why?

> > [ 138.763697] [<ffffffff8020efeb>] show_cpuinfo+0x18b/0x360
> > [ 138.763765] [<ffffffff8020efef>] show_cpuinfo+0x18f/0x360
> > [ 138.763828] [<ffffffff8020f0e6>] show_cpuinfo+0x286/0x360
> > [ 138.763889] [<ffffffff8020f0ed>] show_cpuinfo+0x28d/0x360
> > [ 138.764152] [<ffffffff8021de9f>] show_mem+0xbf/0x170
> > [ 138.764211] [<ffffffff8021dea1>] show_mem+0xc1/0x170
> > [ 138.764275] [<ffffffff8021dec0>] show_mem+0xe0/0x170
> > [ 138.764334] [<ffffffff8021dec2>] show_mem+0xe2/0x170
> > [ 138.764394] [<ffffffff8021ded7>] show_mem+0xf7/0x170
> > [ 138.764453] [<ffffffff8021ded9>] show_mem+0xf9/0x170
> > [ 138.764514] [<ffffffff8021dfc0>] __add_pages+0x40/0xd0
> > [ 138.764576] [<ffffffff8021e089>] vmalloc_sync_all+0x39/0x170
> > [ 138.799966] [<ffffffff8049ee60>] xfrm_state_netlink+0x0/0xa0
> > [ 138.800034] [<ffffffff8049ee69>] xfrm_state_netlink+0x9/0xa0
> > [ 138.800100] [<ffffffff8049ee86>] xfrm_state_netlink+0x26/0xa0
> > [ 138.800161] [<ffffffff8049ee8f>] xfrm_state_netlink+0x2f/0xa0
> > [ 138.800230] [<ffffffff8049f182>] xfrm_send_state_notify+0x282/0x6c0
> > [ 138.800292] [<ffffffff8049f195>] xfrm_send_state_notify+0x295/0x6c0
> > [ 138.800370] [<ffffffff8022ad33>] find_busiest_group+0x5d3/0x6f0
> > [ 138.800431] [<ffffffff8022ad3c>] find_busiest_group+0x5dc/0x6f0
> > [ 138.800499] [<ffffffff8049f23c>] xfrm_send_state_notify+0x33c/0x6c0
> > [ 138.800562] [<ffffffff8049f243>] xfrm_send_state_notify+0x343/0x6c0
> > [ 138.800628] [<ffffffff8022ae75>] migration_thread+0x25/0x2e0
> > [ 138.800689] [<ffffffff8022ae7e>] migration_thread+0x2e/0x2e0
> > [ 138.800752] [<ffffffff8022aea0>] migration_thread+0x50/0x2e0
> > [ 138.800813] [<ffffffff8022aea9>] migration_thread+0x59/0x2e0
> > [ 138.831385] Unable to handle kernel paging request at ffffffff82800000 RIP:
> > [ 138.831439] [<ffffffff8020b491>] show_trace+0x311/0x380

When show_trace faults it is actually not the unwinder itself
(which would be unwind()/processCFI() etc.), but fallback code
which is actually just the old unwinder. So most of your accusations
are hitting the wrong piece of code, Linus.

I bet it would have worked with call_trace=new (then it won't do
any fallbacks)

Anyways, I guess we need even more validation in the fallback code,
but just terminating the kernel thread stacks should fix that particular case.

Proposed patch appended. Jan, what do you think?

-Andi

Index: linux/arch/i386/kernel/process.c
===================================================================
--- linux.orig/arch/i386/kernel/process.c
+++ linux/arch/i386/kernel/process.c
@@ -328,6 +328,7 @@ extern void kernel_thread_helper(void);
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
struct pt_regs regs;
+ int err;

memset(&regs, 0, sizeof(regs));

@@ -342,7 +343,9 @@ int kernel_thread(int (*fn)(void *), voi
regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;

/* Ok, create the new process.. */
- return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
+ err = do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
+ if (err == 0) /* terminate kernel stack */
+ task_pt_regs(current)->eip = 0;
}
EXPORT_SYMBOL(kernel_thread);

Index: linux/arch/x86_64/kernel/entry.S
===================================================================
--- linux.orig/arch/x86_64/kernel/entry.S
+++ linux/arch/x86_64/kernel/entry.S
@@ -978,6 +978,11 @@ ENTRY(kernel_thread)
call do_fork
movq %rax,RAX(%rsp)
xorl %edi,%edi
+ test %rax,%rax
+ jnz 1f
+ /* terminate stack in child */
+ movq %rdi,RIP(%rsp)
+1:

/*
* It isn't worth to check for reschedule here,

2006-09-30 20:44:03

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Linus Torvalds wrote:
>
> It's not just unreadable and obviously buggy, it's so scarily that it's
> hard to even talk about it. Lookie here:
>
> #define HANDLE_STACK(cond) \
> do while (cond) { \
> unsigned long addr = *stack++; \
>
> What the F*CK! "do while(cond) {" ????

Btw, it took me quite a while to realize how something like that can
even compile. Seriously. Don't write code like that. Maybe some humans
parse it as

do {
while (cond) {
..
}
} while(0)

(which is the technically correct parsing and explains why it compiles
and wors), but I suspect I'm not the only one who went "What the F*CK"
when shown it without the "extraneous" braces.

For similar reasons, we write

#define dummy(x) do { } while (0)

rather than the shorter

#define dummy(x) do ; while (0)

(which some people _do_ seem to use. Aarggh!)

Or at least indent it. Or something.

I'll see if I can make git warn about "do <non-blockstatement> while ()",
if only because I at least personally seem to have trouble parsing it.

Linus

2006-09-30 20:49:03

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Andi Kleen wrote:
>
> We didn't so far find any bug in the unwinder code itself (ok if you don't
> count the performance issue Ingo found) just lots in the annotations and one
> bug in the dwarf2 standard.

I don't think it matters if it's a bug in the unwinding code or in the
data generated for it. It's still a bug in the unwinder.

Those bugs have been compiler bugs, manual annotation bugs, and it
doesn't _matter_ what kind of bugs. The end result is the same: the
unwinder is buggy.

> If you kick the people who add more than three levels of callback
> to core driver code to get their acts together too that's fine
> to me. Unfortunately I don't think that's realistic. So we clearly
> need better unwinding.

I dispute the "clearly". We didn't have _that_ many problems with just
manually filtering out obvious left-overs from some previous callchain.

I mean, really: Andi, point me to anything that was a real problem when we
had no unwinder at all?

Linus

2006-09-30 20:52:14

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On 9/30/06, Andrew Morton <[email protected]> wrote:
> > On a 16-way Opteron (8 dual-core 880) with 8GB of RAM, vanilla 2.6.18
> > crashes early on boot with a BUG.
>
> omg what a mess. Have you tried it with lockdep disabled in config?

Well, all I can say is that without lockdep it doesn't freeze right
away (and no BUG, but that's to be expected). I can stress test it if
you want, although it will take a while, if you think it might be a
false positive.

er.

2006-09-30 20:57:15

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On 9/30/06, Peter Zijlstra <[email protected]> wrote:
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm1/broken-out/slab-fix-lockdep-warnings.patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm1/broken-out/slab-fix-lockdep-warnings-fix.patch
> http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18/2.6.18-mm1/broken-out/slab-fix-lockdep-warnings-fix-2.patch
>
> Those should rid you off the trace seen under:
>
> > ---- console for (1) without numa=noacpi
> > Sep 30 15:54:06 liw64 kernel:

Somehow, it doesn't freeze anymore, althought the BUG is still there
with more or less the same stacktrace.

er.

---- (v2.6.18 with the 3 patches, w/o numa=noacpi)
[ 209.281596] BUG: warning at
kernel/lockdep.c:565/print_infinite_recursion_bug()
[ 209.281667]
[ 209.281668] Call Trace:
[ 209.281862] [<ffffffff8020b22d>] show_trace+0xad/0x3b0
[ 209.281921] [<ffffffff8020b775>] dump_stack+0x15/0x20
[ 209.281981] [<ffffffff8024ba6d>] print_infinite_recursion_bug+0x3d/0x50
[ 209.282098] [<ffffffff8024bb7f>] find_usage_backwards+0x2f/0xd0
[ 209.282212] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.282326] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.282439] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.282567] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.282695] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.282822] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.282949] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283076] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283203] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283330] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283457] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283584] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283711] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283838] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.283965] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.284092] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.284220] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.284353] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.284398] powernow-k8: Found 16 Dual Core AMD Opteron(tm)
Processor 880 processors (version 2.00.00)
[ 209.284442] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284479] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284516] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284558] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284599] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284638] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284682] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284720] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284767] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284891] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284932] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.284972] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.285014] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.285051] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.285454] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.285581] [<ffffffff8024bbe3>] find_usage_backwards+0x93/0xd0
[ 209.285709] [<ffffffff8024c590>] check_usage+0x40/0x2b0
[ 209.285836] [<ffffffff8024de60>] __lock_acquire+0xa50/0xd20
[ 209.285964] [<ffffffff8024e4bb>] lock_acquire+0x8b/0xc0
[ 209.286095] [<ffffffff804ac595>] _spin_lock+0x25/0x40
[ 209.286227] [<ffffffff80227e4b>] double_rq_lock+0x2b/0x50
[ 209.286327] [<ffffffff8022cffa>] rebalance_tick+0x19a/0x340
[ 209.286430] [<ffffffff8022d21f>] scheduler_tick+0x7f/0x390
[ 209.286534] [<ffffffff8023b013>] update_process_times+0x73/0x90
[ 209.286648] [<ffffffff8021639b>] smp_local_timer_interrupt+0x2b/0x60
[ 209.286733] [<ffffffff80216b58>] smp_apic_timer_interrupt+0x58/0x70
[ 209.286819] [<ffffffff8020a76f>] apic_timer_interrupt+0x6b/0x70
[ 209.286894] [<ffffffff804aca80>] _spin_unlock_irq+0x30/0x40
[ 209.287024] [<ffffffff80249d69>] hrtimer_run_queues+0x159/0x180
[ 209.287149] [<ffffffff8023b34a>] run_timer_softirq+0x2a/0x200
[ 209.287265] [<ffffffff80237320>] __do_softirq+0x80/0x110
[ 209.287373] [<ffffffff8020adf8>] call_softirq+0x1c/0x28
[ 209.287442] [<ffffffff8020c57d>] do_softirq+0x3d/0xc0
[ 209.287512] [<ffffffff80236ff7>] irq_exit+0x57/0x60
[ 209.287619] [<ffffffff80216b5d>] smp_apic_timer_interrupt+0x5d/0x70
[ 209.287704] [<ffffffff8020a76f>] apic_timer_interrupt+0x6b/0x70
[ 209.287772] [<ffffffff802086b2>] default_idle+0x42/0x80
[ 209.287836] [<ffffffff8020875a>] cpu_idle+0x6a/0x90
[ 209.287903] [<ffffffff80a4c9ac>] start_secondary+0x4fc/0x510
[ 209.288027] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.288134] powernow-k8: MP systems not supported by PSB BIOS structure
[ 209.288614] Freeing unused kernel memory: 256k freed
[ 209.310446] SCSI subsystem initialized
[ 209.314846] megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16
00:01:03 EST 2006)
----

2006-09-30 20:57:16

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

* Linus Torvalds <[email protected]> wrote:

> I mean, really: Andi, point me to anything that was a real problem
> when we had no unwinder at all?

personally, i like perfect stacktraces, alot. x86_64 was a huge pain for
me without the unwinder. I got so used to perfect backtraces on i686
(when using %ebp frames) during the years, and i had to look at _many_
backtraces with lockdep. On x86_64 it was just constant brain-drain to
think away bogus stack entries. Yes, i can do it no problem when i have
to look at only a few stacktraces per day, but if it's hundreds per day
it's _alot_ of brainpower wasted.

(i'd have been happy with an %rbp based unwinder for x86_64, in fact i
implemented it for lockdep and used it for some time on x86_64, but Andi
wanted a dwarf-based, lower-overhead one. Andi also nicely integrated it
into stacktrace.c.)

Ingo

2006-09-30 21:04:43

by Andrew Morton

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006 22:52:12 +0200
"Eric Rannaud" <[email protected]> wrote:

> On 9/30/06, Andrew Morton <[email protected]> wrote:
> > > On a 16-way Opteron (8 dual-core 880) with 8GB of RAM, vanilla 2.6.18
> > > crashes early on boot with a BUG.
> >
> > omg what a mess. Have you tried it with lockdep disabled in config?
>
> Well, all I can say is that without lockdep it doesn't freeze right
> away (and no BUG, but that's to be expected). I can stress test it if
> you want, although it will take a while, if you think it might be a
> false positive.
>

Well. We always appreciate stress-testing, thanks. But if that finds a
bug, it's presumably a different one from this lockdep-vs-unwinder problem.

You could set CONFIG_UNWIND_INFO=n and CONFIG_STACK_UNWIND=n and reenable
lockdep. That will a) tell us if there's some lockdep problem and b) will
give us a clearer look at any locking problems which your kernel is
detecting.

2006-09-30 21:11:19

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Ingo Molnar wrote:
>
> (i'd have been happy with an %rbp based unwinder for x86_64, in fact i
> implemented it for lockdep and used it for some time on x86_64, but Andi
> wanted a dwarf-based, lower-overhead one. Andi also nicely integrated it
> into stacktrace.c.)

I wouldn't mind the dawrf-based one so much, if it wasn't so obviously
crap.

It could - and _should_ dammit! - do some basic sanity tests like "is the
thing even in the same stack page"? But nooo... It seems _designed_ to be
fragile and broken.

Here's a simple test: if the next stack-slot isn't on the same page, the
unwind information is bogus unless you had the IRQ stack-switch signature
there. Does the code do that? No. It just assumes that unwind information
is complete and perfect.

That's not the kind of code we write in the kernel. In the kernel, we
write code that _works_, regardless of the kind of horrible stuff people
feed it. That's _doubly_ true for something like a stack frame debugger,
which is invoced when there is trouble, and for all we know the stack
itself MIGHT BE CORRUPT.

In short, I think the stack unwinder is just _broken_. It has made all the
wrong policy decisions - it only works when everything is perfect, yet
it's actually meant to be _used_ when somethign bad happened. Doesn't that
strike anybody else as a totally flawed design?

It damn well should.

Linus

2006-09-30 21:33:52

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

* Linus Torvalds <[email protected]> wrote:

> It could - and _should_ dammit! - do some basic sanity tests like "is
> the thing even in the same stack page"? But nooo... It seems
> _designed_ to be fragile and broken.
>
> Here's a simple test: if the next stack-slot isn't on the same page,
> the unwind information is bogus unless you had the IRQ stack-switch
> signature there. Does the code do that? No. It just assumes that
> unwind information is complete and perfect.

fully agreed - i have pointed out areas of conceptual fragility to Jan
early on:

http://lkml.org/lkml/2006/6/2/59

(but AFAICS i got no reply to that mail - i missed that in the lockdep
flurry.)

Ingo

2006-09-30 21:43:24

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On 9/30/06, Andi Kleen <[email protected]> wrote:
> Index: linux/arch/i386/kernel/process.c
> ===================================================================
> --- linux.orig/arch/i386/kernel/process.c
> +++ linux/arch/i386/kernel/process.c
> @@ -328,6 +328,7 @@ extern void kernel_thread_helper(void);
> int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
> {
> struct pt_regs regs;
> + int err;
>
> memset(&regs, 0, sizeof(regs));
>
> @@ -342,7 +343,9 @@ int kernel_thread(int (*fn)(void *), voi
> regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;
>
> /* Ok, create the new process.. */
> - return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
> + err = do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
> + if (err == 0) /* terminate kernel stack */
> + task_pt_regs(current)->eip = 0;
> }
> EXPORT_SYMBOL(kernel_thread);
>
> Index: linux/arch/x86_64/kernel/entry.S
> ===================================================================
> --- linux.orig/arch/x86_64/kernel/entry.S
> +++ linux/arch/x86_64/kernel/entry.S
> @@ -978,6 +978,11 @@ ENTRY(kernel_thread)
> call do_fork
> movq %rax,RAX(%rsp)
> xorl %edi,%edi
> + test %rax,%rax
> + jnz 1f
> + /* terminate stack in child */
> + movq %rdi,RIP(%rsp)
> +1:
>
> /*
> * It isn't worth to check for reschedule here,
>

For the sake of it, I tried your patch (more exactly the version at
the end of that email, againt v2.6.18). It doesn't freeze and prints
the following stack trace:

----
[ 174.630434] BUG: warning at
kernel/lockdep.c:565/print_infinite_recursion_bug()
[ 174.630506]
[ 174.630507] Call Trace:
[ 174.630703] [<ffffffff8020b23d>] show_trace+0xad/0x3b0
[ 174.630760] [<ffffffff8020b785>] dump_stack+0x15/0x20
[ 174.630819] [<ffffffff8024ba7d>] print_infinite_recursion_bug+0x3d/0x50
[ 174.630938] [<ffffffff8024bb8f>] find_usage_backwards+0x2f/0xd0
[ 174.631051] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631163] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631277] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631403] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631530] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631655] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631786] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.631912] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632039] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632166] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632293] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632419] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632546] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632672] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632798] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.632924] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.633050] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.633176] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.633303] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.633432] [<ffffffff8024bbf3>] find_usage_backwards+0x93/0xd0
[ 174.633559] [<ffffffff8024c5a0>] check_usage+0x40/0x2b0
[ 174.633685] [<ffffffff8024de70>] __lock_acquire+0xa50/0xd20
[ 174.633811] [<ffffffff8024e4cb>] lock_acquire+0x8b/0xc0
[ 174.633941] [<ffffffff804ac5a5>] _spin_lock+0x25/0x40
[ 174.634071] [<ffffffff80227e5b>] double_rq_lock+0x2b/0x50
[ 174.634171] [<ffffffff8022b1bb>] migration_thread+0x22b/0x2e0
[ 174.634272] [<ffffffff80246dca>] kthread+0xda/0x110
[ 174.634392] [<ffffffff8020aaad>] child_rip+0xa/0x11
----

er.

diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
index 8657c73..5a7ef45 100644
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -336,6 +336,7 @@ __asm__(".section .text\n"
int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
{
struct pt_regs regs;
+ int err;

memset(&regs, 0, sizeof(regs));

@@ -350,7 +351,10 @@ int kernel_thread(int (*fn)(void *), voi
regs.eflags = X86_EFLAGS_IF | X86_EFLAGS_SF | X86_EFLAGS_PF | 0x2;

/* Ok, create the new process.. */
- return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0,
NULL, NULL);
+ err = do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs, 0,
NULL, NULL);
+ if (err == 0) /* terminate kernel stack */
+ task_pt_regs(current)->eip = 0;
+ return err;
}
EXPORT_SYMBOL(kernel_thread);

diff --git a/arch/x86_64/kernel/entry.S b/arch/x86_64/kernel/entry.S
index aa8d893..5dce978 100644
--- a/arch/x86_64/kernel/entry.S
+++ b/arch/x86_64/kernel/entry.S
@@ -958,6 +958,11 @@ ENTRY(kernel_thread)
call do_fork
movq %rax,RAX(%rsp)
xorl %edi,%edi
+ test %rax,%rax
+ jnz 1f
+ /* terminate stack in child */
+ movq %rdi,RIP(%rsp)
+1:

/*
* It isn't worth to check for reschedule here,

2006-09-30 21:57:14

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

>
> It could - and _should_ dammit! - do some basic sanity tests like "is the
> thing even in the same stack page"? But nooo... It seems _designed_ to be
> fragile and broken.

That wouldn't work with interrupt stacks.

The old unwinder code had a state machine to deal with them,
but it was distingustingly complicated (there are nasty corner cases
where you can be in multiple interrupt stacks nested). I'm not
sure we would have really wanted to retain that.

What it does is to do __get_user for the stack values and it has the
unwind tables in the executable as sanity check, so in some sense it is
more reliable than the old code.

BTW again let me repeat this particular issue wasn't in the unwinder
itself, but just in the fallback-to-old code which didn't do enough
sanity checks. So you can say it's not the new unwinder that
crashed, but the old one here. I'll add more.

-Andi

2006-09-30 21:56:48

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Andi Kleen wrote:
>
> Anyways, I guess we need even more validation in the fallback code,
> but just terminating the kernel thread stacks should fix that particular case.

Why not just add the simple validation?

A kernel stack is one page in size. If you move to another page, you
terminate. It's that simple.

What if the kernel stack is corrupt? Buffer overruns do that.

This patch seems to just paper over the _real_ problem, namely the fact
that the stack tracer code doesn't actually validate any of its arguments.

> When show_trace faults it is actually not the unwinder itself
> (which would be unwind()/processCFI() etc.), but fallback code
> which is actually just the old unwinder. So most of your accusations
> are hitting the wrong piece of code, Linus.

Ehh, that's not even _true_, Andi.

The old unwinder (well, at least for x86, and I assume x86-64 used that as
the beginning point) didn't have this problem at all, exactly because it
couldn't get on the wrong stack-page in the first place.

The old code literally had:

static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
{
return p > (void *)tinfo &&
p < (void *)tinfo + THREAD_SIZE - 3;
}

and would refuse to touch anything that wasn't in the stack page. It was
simple, AND WE NEVER _EVER_ HAD A BUG RELATED TO IT, AFAIK.

In contrast, the new code doesn't do any sanity checking at all, and this
is not the first or even the second time it causes problems.

So be honest here, don't try to shift the blame.

Linus

2006-09-30 22:54:40

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Andi Kleen wrote:
>
> That wouldn't work with interrupt stacks.

Andi.

Please spend even just a few minutes looking at the old i386 code.

> The old unwinder code had a state machine to deal with them, but it was
> distingustingly complicated (there are nasty corner cases where you can
> be in multiple interrupt stacks nested). I'm not sure we would have
> really wanted to retain that.

Here's what the old code did:

frameptr = initialize frame code
while (1) {
struct thread_info *context;
context = (struct thread_info *)
((unsigned long)stack & (~(THREAD_SIZE - 1)));
frameptr = print_context_stack(context, stack, frameptr, log_lvl);
stack = (unsigned long*)context->previous_esp;
if (!stack)
break;
printk(" =======================\n");
}

it it really was that simple. The actual frame-following code was in
"print_context_stack()", and it works entirely within a single stack page,
and returns once it is outside that stack page.

There is absolutely ZERO problem with new pages through interrupt stacks
etc, because we don't even trust the stack contents for that, we just
follow the stack context pointers that we _can_ trust.

Linus

2006-09-30 22:55:20

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sun, 1 Oct 2006, Andi Kleen wrote:
>
> No, it's not. On x86-64 it can be three or more stacks nested in
> complicated ways (process stack, interrupt stack, exception stack)
> The exception stack can happen multiple times.

And you don't think that's true on x86?

Read the x86 code. Please. The one _before_ you added unwinding.

Linus

2006-09-30 22:59:49

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Linus Torvalds wrote:
>
> And you don't think that's true on x86?
>
> Read the x86 code. Please. The one _before_ you added unwinding.

(Btw, on x86, all different stacks looked the same, since they were all
irq stacks. If they don't look that way on x86-64, just fix _that_ part).

Linus

2006-10-01 00:25:33

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sun, 1 Oct 2006, Andi Kleen wrote:
>
> On i386 it is simpler (only one interrupt stack and one process stack)
> However there can be still nasty corner cases, like the temporary NMI stacks
> that were added recently. It could be probably all handled in a state machine,
> but it would be ugly (at least I heard complaints about the x86 code that
> does it)

No, I don't understand AT ALL why you are so hung up about this state
machine thing. It's not true. There's simply no state machine involved,
although from a theoretical standpoint you can obviously always implement
just about _anything_ as a state machine.

> > Read the x86 code. Please. The one _before_ you added unwinding.
>
> It's still the same if you disable unwinding.

I think your problem is that you think that "unwinding" needs to handle
all the page crossing. That's incorrect, and that just results in stupid
and unworkable code.

Instead (and this is why I was trying to point you to the original
pre-unwinding code on i386), what you should do is to see it as two
totally independent phases:

- you have an outer loop that loops around the pages (since the _kernel_
controls the stack nesting at that level). This is the loop I quoted at
you.

- you have a _separate_ "unwinder()" for each page. It only unwinds
within that one page, and if the frame moves away from the page, it
immediately just returns that address, but it knows that it cannot be a
"valid" unwind address within that page.

That separate "unwind within one page" can well be using the dwarf info:
it only really needs to verify that
- it stays within the same page
- the unwind info results in an aligned pointer at a strictly higher
address.
those two tests are trivial, and _guarantee_ that we don't access any
half-way untrustworthy pointer.

See? No state machine. And notice how the dwarf info absolutely does _not_
need to know about the magic page-crossing events like interrupts,
exceptions or anything else. Very much on purpose.

This is what we used to do with %ebp following (at least on x86), and what
I tried to explain. Nothing complicated. And it's easy to set up, and the
dawrf unwinder (which depends on complex data structures) can be trivially
verified (ie it just stops immediately when it doesn't understand
something, and "crosses a page" is such an event).

And the page-crossing events don't need to know _anything_ about the dwarf
format (or %ebp, or any other unwinding details), and it can just depend
on the chain of pages (which is trivial to set up - if the exception pages
aren't already using the same format as the interrupt stack pages, it
should at least not be hard to make them do that).

And the page-level unwinding data format is trivial. I don't think we even
bothered verifying it on x86, but I guess some simple sanity-checking even
there might be worth it (but unlike any dwarf unwinding, this is _not_ at
all complicated, and there are absolutely _zero_ issues about compiler and
linker versions etc etc).

Linus

2006-10-01 00:51:27

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sat, 30 Sep 2006, Linus Torvalds wrote:
>
> - you have an outer loop that loops around the pages (since the _kernel_
> controls the stack nesting at that level). This is the loop I quoted at
> you.
>
> - you have a _separate_ "unwinder()" for each page. It only unwinds
> within that one page, and if the frame moves away from the page, it
> immediately just returns that address, but it knows that it cannot be a
> "valid" unwind address within that page.

Side note: it's entirely possible that the "unwinder" code shouldn't even
try to return the address outside the page, since the first/last frame on
a page is likely to be special (ie it's an exception/interrupt kind of
thing), and it's entirely possible that the "page-level" loop is better at
handling that part too.

That way you wouldn't even need to make the exception frames haev the
dwarf info etc, because you could choose to just depend on knowing what
the format of such a page was. But that's obviously just an implementation
choice..

Doesn't that sound like it should be both fairly straightforward and
reasonable?

Linus

2006-09-30 22:00:47

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On 9/30/06, Andrew Morton <[email protected]> wrote:
> You could set CONFIG_UNWIND_INFO=n and CONFIG_STACK_UNWIND=n and reenable
> lockdep. That will a) tell us if there's some lockdep problem and b) will
> give us a clearer look at any locking problems which your kernel is
> detecting.

All right. Here is the stacktrace I get with config
CONFIG_UNWIND_INFO=n and CONFIG_STACK_UNWIND=n and v2.6.18 (all the
rest being equal http://engm.ath.cx/kernel/config-2.6.18). (and no
freeze)

[ 153.552157] BUG: warning at
kernel/lockdep.c:565/print_infinite_recursion_bug()
[ 153.552252]
[ 153.552253] Call Trace:
[ 153.552346] [<ffffffff8024b84d>] print_infinite_recursion_bug+0x3d/0x50
[ 153.552398] [<ffffffff8024b95f>] find_usage_backwards+0x2f/0xd0
[ 153.552447] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552496] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552545] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552594] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552643] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552704] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552767] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552835] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552896] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.552958] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553019] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553081] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553145] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553206] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553268] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553330] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553392] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553454] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553515] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553578] [<ffffffff8024b9c3>] find_usage_backwards+0x93/0xd0
[ 153.553643] [<ffffffff804aa97a>] trace_hardirqs_on_thunk+0x35/0x37
[ 153.553706] [<ffffffff8024c370>] check_usage+0x40/0x2b0
[ 153.553768] [<ffffffff8024dc40>] __lock_acquire+0xa50/0xd20
[ 153.553830] [<ffffffff8024de2f>] __lock_acquire+0xc3f/0xd20
[ 153.553894] [<ffffffff80227c2b>] double_rq_lock+0x2b/0x50
[ 153.553955] [<ffffffff8024e29b>] lock_acquire+0x8b/0xc0
[ 153.554016] [<ffffffff80227c2b>] double_rq_lock+0x2b/0x50
[ 153.554077] [<ffffffff804aae25>] _spin_lock+0x25/0x40
[ 153.554138] [<ffffffff80227c2b>] double_rq_lock+0x2b/0x50
[ 153.554199] [<ffffffff8022af8b>] migration_thread+0x22b/0x2e0
[ 153.554260] [<ffffffff8022ad60>] migration_thread+0x0/0x2e0
[ 153.554322] [<ffffffff80246b9a>] kthread+0xda/0x110
[ 153.554384] [<ffffffff8020aaa0>] child_rip+0xa/0x12
[ 153.554443] [<ffffffff804ab30b>] _spin_unlock_irq+0x2b/0x40
[ 153.554507] [<ffffffff8020a0dc>] restore_args+0x0/0x30
[ 153.554568] [<ffffffff80246ac0>] kthread+0x0/0x110
[ 153.554628] [<ffffffff8020aa96>] child_rip+0x0/0x12

Thanks.
er.

2006-09-30 22:02:52

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Saturday 30 September 2006 23:56, Linus Torvalds wrote:
>
> On Sat, 30 Sep 2006, Andi Kleen wrote:
> >
> > Anyways, I guess we need even more validation in the fallback code,
> > but just terminating the kernel thread stacks should fix that particular case.
>
> Why not just add the simple validation?
>
> A kernel stack is one page in size. If you move to another page, you
> terminate. It's that simple.

No, it's not. On x86-64 it can be three or more stacks nested in
complicated ways (process stack, interrupt stack, exception stack)
The exception stack can happen multiple times.

> What if the kernel stack is corrupt? Buffer overruns do that.
>
> This patch seems to just paper over the _real_ problem, namely the fact
> that the stack tracer code doesn't actually validate any of its arguments.

It has pretty good sanity checking by first using __get_user for the stack
data, and the regularly double checking the EIPs by looking them up
in CFI. If it can't find them it will abort.

> The old unwinder (well, at least for x86, and I assume x86-64 used that as
> the beginning point) didn't have this problem at all, exactly because it
> couldn't get on the wrong stack-page in the first place.

In this particular case what happened is that the dwarf2 unwinder
ended and then the fallback was in the wrong page and couldn't handle
it.

> The old code literally had:
>
> static inline int valid_stack_ptr(struct thread_info *tinfo, void *p)
> {
> return p > (void *)tinfo &&
> p < (void *)tinfo + THREAD_SIZE - 3;
> }
>
> and would refuse to touch anything that wasn't in the stack page. It was
> simple, AND WE NEVER _EVER_ HAD A BUG RELATED TO IT, AFAIK.

That was before interrupt stacks were introduced. With that it is significantly
more complicated. On x86-64 even more because there are exception stacks.

-Andi

2006-09-30 22:03:51

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

> For the sake of it, I tried your patch (more exactly the version at
> the end of that email, againt v2.6.18). It doesn't freeze and prints
> the following stack trace:

Thanks for testing.

-Andi

2006-09-30 22:09:36

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18) II

>
> BTW again let me repeat this particular issue wasn't in the unwinder
> itself, but just in the fallback-to-old code which didn't do enough
> sanity checks. So you can say it's not the new unwinder that
> crashed, but the old one here. I'll add more.

I double checked this now. This case Eric ran into should be already
fixed by a patch from Jan that went in before 2.6.18 even.

He just ran with an old kernel (2.6.18-rc3) that didn't have
that particular fix.

Still the kernel stack termination is probably a good idea. I think
(haven't tested) the current 2.6.18-git* code with termination
wouldn't have crashed, but reported a (incorrect) stuck.

-Andi

2006-09-30 22:12:43

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

* Eric Rannaud <[email protected]> wrote:

> On 9/30/06, Andrew Morton <[email protected]> wrote:
> >You could set CONFIG_UNWIND_INFO=n and CONFIG_STACK_UNWIND=n and reenable
> >lockdep. That will a) tell us if there's some lockdep problem and b) will
> >give us a clearer look at any locking problems which your kernel is
> >detecting.
>
>
> All right. Here is the stacktrace I get with config
> CONFIG_UNWIND_INFO=n and CONFIG_STACK_UNWIND=n and v2.6.18 (all the
> rest being equal http://engm.ath.cx/kernel/config-2.6.18). (and no
> freeze)

hm, does the patch below solve it? In general, lockdep warnings are
intended to be non-fatal, so i have put in various practical limits on
internal data structure failure modes. We havent had a /single/
lockdep-internal crash ever since lockdep went upstream [the unwinder
crashes are outside of lockdep], and that's largely due to the good
internal checks it does.

Recursion within the dependency graph is currently limited to 20, that's
probably not enough on your box - this patch doubles it to 40. I have
written the lockdep functions to have as small stackframes as possible,
so 40 should be OK too. (The practical recursion limit should be
somewhere between 100 and 200 entries. If we hit that then i'll change
the algorithm to be iteration-based. Graph walking logic is so easy to
program via recursion, so i'd like to keep recursion as long as
possible.)

Ingo

---------------->
Subject: lockdep: increase max allowed recursion depth
From: Ingo Molnar <[email protected]>

With lots of CPUs there can be lots of deep dependencies. Will change
the algorithm to iteration-based if it gets too deep.

Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/lockdep.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

Index: linux/kernel/lockdep.c
===================================================================
--- linux.orig/kernel/lockdep.c
+++ linux/kernel/lockdep.c
@@ -575,6 +575,8 @@ static noinline int print_circular_bug_t
return 0;
}

+#define RECURSION_LIMIT 40
+
static int noinline print_infinite_recursion_bug(void)
{
__raw_spin_unlock(&hash_lock);
@@ -595,7 +597,7 @@ check_noncircular(struct lock_class *sou
debug_atomic_inc(&nr_cyclic_check_recursions);
if (depth > max_recursion_depth)
max_recursion_depth = depth;
- if (depth >= 20)
+ if (depth >= RECURSION_LIMIT)
return print_infinite_recursion_bug();
/*
* Check this lock's dependency list:
@@ -645,7 +647,7 @@ find_usage_forwards(struct lock_class *s

if (depth > max_recursion_depth)
max_recursion_depth = depth;
- if (depth >= 20)
+ if (depth >= RECURSION_LIMIT)
return print_infinite_recursion_bug();

debug_atomic_inc(&nr_find_usage_forwards_checks);
@@ -684,7 +686,7 @@ find_usage_backwards(struct lock_class *

if (depth > max_recursion_depth)
max_recursion_depth = depth;
- if (depth >= 20)
+ if (depth >= RECURSION_LIMIT)
return print_infinite_recursion_bug();

debug_atomic_inc(&nr_find_usage_backwards_checks);

2006-09-30 22:18:18

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

* Andi Kleen <[email protected]> wrote:

> > Why not just add the simple validation?
> >
> > A kernel stack is one page in size. If you move to another page, you
> > terminate. It's that simple.
>
> No, it's not. On x86-64 it can be three or more stacks nested in
> complicated ways (process stack, interrupt stack, exception stack) The
> exception stack can happen multiple times.

it could be cleanly handled though: in June i suggested to use the
next-stack pointers at the end of exception pages. The only current
complexity here is that the 'linking' of exception pages is non-uniform,
it depends on the type of page. That's largely why that complex
statemachine had to be implemented, to match up the type of the page.
Since those pointers are put there by us, there's no real reason why we
couldnt standardize them.

Ingo

2006-09-30 22:19:44

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18) II

On 10/1/06, Andi Kleen <[email protected]> wrote:
> I double checked this now. This case Eric ran into should be already
> fixed by a patch from Jan that went in before 2.6.18 even.
>
> He just ran with an old kernel (2.6.18-rc3) that didn't have
> that particular fix.

Hmm, not sure I'm following you, but I did try with the released
v2.6.18 (fourth stacktrace in my first email in this thread). The
2.6.13-rc3 (d94a041519f3ab1ac023bf917619cd8c4a7d3c01) version was
tested only as the result of git-bisect, and is the first kernel that
crashed in this way. But v2.6.18 crashed in a similar way as well.
Are you saying v2.6.18 should contain a fix preventing it from crashing?

> Still the kernel stack termination is probably a good idea. I think
> (haven't tested) the current 2.6.18-git* code with termination
> wouldn't have crashed, but reported a (incorrect) stuck.

Making sure my post was clear: that's what v2.6.18 + termination does.

Sorry if I misunderstood you.

er.

2006-09-30 22:25:04

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sunday 01 October 2006 00:10, Ingo Molnar wrote:
>
> * Andi Kleen <[email protected]> wrote:
>
> > > Why not just add the simple validation?
> > >
> > > A kernel stack is one page in size. If you move to another page, you
> > > terminate. It's that simple.
> >
> > No, it's not. On x86-64 it can be three or more stacks nested in
> > complicated ways (process stack, interrupt stack, exception stack) The
> > exception stack can happen multiple times.
>
> it could be cleanly handled though: in June i suggested to use the
> next-stack pointers at the end of exception pages.

Yes, but then you couldn't validate it like Linus asked for.
Also we've had cases where this information was corrupted

(e.g. when RSP starts somewhere completely bogus)

> The only current
> complexity here is that the 'linking' of exception pages is non-uniform,
> it depends on the type of page. That's largely why that complex
> statemachine had to be implemented, to match up the type of the page.
> Since those pointers are put there by us, there's no real reason why we
> couldnt standardize them.

Also how would you know in what kind of stack you are to select the right
check mask (since exception stacks are larger than 8K)? You would also need make
unvalidated assumptions again.

The unwinder already does stronger validation than any of this.

That particularly case here was a hole in the fallback logic that got
actually fixed before 2.6.18 (not it was a -rc3 or so)

-Andi

2006-09-30 22:25:05

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18) II

On Sunday 01 October 2006 00:19, Eric Rannaud wrote:
> On 10/1/06, Andi Kleen <[email protected]> wrote:
> > I double checked this now. This case Eric ran into should be already
> > fixed by a patch from Jan that went in before 2.6.18 even.
> >
> > He just ran with an old kernel (2.6.18-rc3) that didn't have
> > that particular fix.
>
> Hmm, not sure I'm following you, but I did try with the released
> v2.6.18 (fourth stacktrace in my first email in this thread). The
> 2.6.13-rc3 (d94a041519f3ab1ac023bf917619cd8c4a7d3c01) version was
> tested only as the result of git-bisect, and is the first kernel that
> crashed in this way. But v2.6.18 crashed in a similar way as well.
> Are you saying v2.6.18 should contain a fix preventing it from crashing?

2.6.18 should have showed the backtrace from lockdep, but not the oops at
the end of the backtrace in show_trace()

-Andi

2006-09-30 23:57:17

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Sunday 01 October 2006 00:55, Linus Torvalds wrote:
>
> On Sun, 1 Oct 2006, Andi Kleen wrote:
> >
> > No, it's not. On x86-64 it can be three or more stacks nested in
> > complicated ways (process stack, interrupt stack, exception stack)
> > The exception stack can happen multiple times.
>
> And you don't think that's true on x86?

On i386 it is simpler (only one interrupt stack and one process stack)
However there can be still nasty corner cases, like the temporary NMI stacks
that were added recently. It could be probably all handled in a state machine,
but it would be ugly (at least I heard complaints about the x86 code that
does it)

> Read the x86 code. Please. The one _before_ you added unwinding.

It's still the same if you disable unwinding.

-Andi

2006-10-01 00:59:28

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On 10/1/06, Ingo Molnar <[email protected]> wrote:
> hm, does the patch below solve it?

Well, your patch applied against v2.6.18 (I tried with git HEAD
cfae35804bcb909225d6f4eb5bd29e25971614d8 but it seems there is another
problem with the megaraid driver that I'm going to report separatly),
removes the BUG from lockdep and produces the following Oops.

I tried with noapic, but I got a (somewhat) similar stack trace, but a
bit earlier, although I couldn't get anything on netconsole.

er.

----
[ 274.496966] BUG: soft lockup detected on CPU#15!
[ 279.610130] NMI Watchdog detected LOCKUP on CPU 15
[ 279.610192] CPU 15
[ 279.610281] Modules linked in: dm_snapshot dm_zero dm_mirror ext3
jbd sata_nv libata
[ 279.610646] Pid: 1, comm: init Not tainted
2.6.18--ingo-lockdep-allowdeeperrec-agst-2.6.18rannaud-dirty #47
[ 279.610728] RIP: 0010:[<ffffffff8104ef59>] [<ffffffff8104ef59>]
.text.lock.lockdep+0x52/0x89
[ 279.610849] RSP: 0000:ffff8102238e7bc8 EFLAGS: 00000082
[ 279.610909] RAX: ffffffff81691078 RBX: 00000000000028aa RCX: 0000000000000000
[ 279.610973] RDX: ffffffff81680ca0 RSI: ffff810223f757c0 RDI: ffff810223f75040
[ 279.611036] RBP: ffff8102238e7c38 R08: 0000000000000002 R09: 0000000000000001
[ 279.611099] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff81680ca0
[ 279.611162] R13: ffff810223f757c0 R14: ffff810223f75040 R15: 0000000000000000
[ 279.611225] FS: 0000000000587850(0063) GS:ffff810223894ac0(0000)
knlGS:0000000000000000
[ 279.611305] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 279.611365] CR2: 0000000000585230 CR3: 0000000223a91000 CR4: 00000000000006e0
[ 279.611430] Process init (pid: 1, threadinfo ffff810223e00000, task
ffff810223f75040)
[ 279.611506] Stack: ffff8102238e7cc0 ffff8102238e7cc8
ffff8102238e7c18 ffffffff81056da0
[ 279.611767] 0000000200000001 0000000000000000 ffff810223e01d78
0000000200000001
[ 279.611987] ffff810223f75750 0000000000000046 ffffffff81056830
0000000000000002
[ 279.612158] Call Trace:
[ 279.612260] <IRQ> [<ffffffff81056da0>] kallsyms_lookup+0xd0/0x1f0
[ 279.612362] [<ffffffff81056830>] module_text_address+0x20/0x50
[ 279.612424] [<ffffffff8104ea6b>] lock_acquire+0x8b/0xc0
[ 279.612486] [<ffffffff81056830>] module_text_address+0x20/0x50
[ 279.612549] [<ffffffff812bd966>] _spin_lock_irqsave+0x36/0x50
[ 279.612610] [<ffffffff81056830>] module_text_address+0x20/0x50
[ 279.612673] [<ffffffff81045142>] kernel_text_address+0x22/0x40
[ 279.612735] [<ffffffff8100b270>] show_trace+0x220/0x260
[ 279.612795] [<ffffffff81063114>] softlockup_tick+0xd4/0x110
[ 279.612856] [<ffffffff8100b4f5>] dump_stack+0x15/0x20
[ 279.612916] [<ffffffff8106312a>] softlockup_tick+0xea/0x110
[ 279.612978] [<ffffffff81043c60>] delayed_work_timer_fn+0x0/0x40
[ 279.613039] [<ffffffff8103b2b3>] run_local_timers+0x13/0x20
[ 279.613099] [<ffffffff8103b567>] update_process_times+0x57/0x90
[ 279.613161] [<ffffffff8101617b>] smp_local_timer_interrupt+0x2b/0x60
[ 279.613223] [<ffffffff81016938>] smp_apic_timer_interrupt+0x58/0x70
[ 279.613285] [<ffffffff8100a76f>] apic_timer_interrupt+0x6b/0x70
[ 279.613347] [<ffffffff812bd820>] _spin_unlock_irq+0x30/0x40
[ 279.613408] [<ffffffff8103b9e6>] run_timer_softirq+0x156/0x200
[ 279.613471] [<ffffffff81037890>] __do_softirq+0x80/0x110
[ 279.613532] [<ffffffff8100adf8>] call_softirq+0x1c/0x34
[ 279.613592] [<ffffffff8100c35d>] do_softirq+0x3d/0xc0
[ 279.613652] [<ffffffff81037567>] irq_exit+0x57/0x60
[ 279.613711] [<ffffffff8101693d>] smp_apic_timer_interrupt+0x5d/0x70
[ 279.613775] [<ffffffff8100a76f>] apic_timer_interrupt+0x6b/0x70
[ 279.613834] <EOI> [<ffffffff812bd8bc>] _spin_unlock_irqrestore+0x4c/0x60
[ 279.613937] [<ffffffff8104783e>] prepare_to_wait+0x6e/0x80
[ 279.613998] [<ffffffff812bb503>] __wait_on_bit+0x33/0x80
[ 279.614058] [<ffffffff81092110>] sync_buffer+0x0/0x50
[ 279.614117] [<ffffffff81092110>] sync_buffer+0x0/0x50
[ 279.614177] [<ffffffff812bb5c8>] out_of_line_wait_on_bit+0x78/0x90
[ 279.614239] [<ffffffff81047670>] wake_bit_function+0x0/0x40
[ 279.614300] [<ffffffff810919c2>] __wait_on_buffer+0x22/0x30
[ 279.614360] [<ffffffff81094e71>] __bread+0xa1/0xc0
[ 279.614427] [<ffffffff88044dfd>] :ext3:ext3_fill_super+0x83d/0x16f0
[ 279.614490] [<ffffffff812bb945>] __mutex_unlock_slowpath+0x125/0x140
[ 279.614553] [<ffffffff8104ed1b>] trace_hardirqs_on+0x11b/0x150
[ 279.614615] [<ffffffff81097947>] get_sb_bdev+0x117/0x180
[ 279.614681] [<ffffffff880445c0>] :ext3:ext3_fill_super+0x0/0x16f0
[ 279.614743] [<ffffffff811058fe>] selinux_sb_copy_data+0x18e/0x1b6
[ 279.614809] [<ffffffff88042523>] :ext3:ext3_get_sb+0x13/0x20
[ 279.614870] [<ffffffff81096ce6>] vfs_kern_mount+0xb6/0x160
[ 279.614933] [<ffffffff81096dfa>] do_kern_mount+0x4a/0x70
[ 279.614994] [<ffffffff810afc1f>] do_mount+0x71f/0x790
[ 279.615056] [<ffffffff8106b557>] get_page_from_freelist+0x287/0x480
[ 279.615120] [<ffffffff8104ed1b>] trace_hardirqs_on+0x11b/0x150
[ 279.615184] [<ffffffff8104ed1b>] trace_hardirqs_on+0x11b/0x150
[ 279.615247] [<ffffffff8106b7c3>] __alloc_pages+0x73/0x300
[ 279.615307] [<ffffffff812bd8b9>] _spin_unlock_irqrestore+0x49/0x60
[ 279.615371] [<ffffffff81083334>] alloc_pages_current+0x84/0x90
[ 279.615436] [<ffffffff8106a98b>] __get_free_pages+0x1b/0x40
[ 279.615498] [<ffffffff810afd24>] sys_mount+0x94/0xe0
[ 279.615558] [<ffffffff812bce8a>] trace_hardirqs_on_thunk+0x35/0x37
[ 279.615621] [<ffffffff81009b4e>] system_call+0x7e/0x83
[ 279.615680]
[ 279.615731]
[ 279.615732] Code: 83 3d 90 fc 30 00 00 7e f5 e9 66 f1 ff ff f3 90 83 3d 80 fc
[ 279.616600] console shuts up ...
[ 279.616655] <3>BUG: sleeping function called from invalid context
at kernel/rwsem.c:20
[ 279.775463] in_atomic():1, irqs_disabled():0
[ 279.775522]
[ 279.775522] Call Trace:
[ 279.775626] <NMI> [<ffffffff81028dbb>] __might_sleep+0xbb/0xd0
[ 279.775728] [<ffffffff8104a6ed>] down_read+0x1d/0x50
[ 279.775789] [<ffffffff81040212>] blocking_notifier_call_chain+0x22/0x50
[ 279.775853] [<ffffffff810327b5>] profile_task_exit+0x15/0x20
[ 279.775914] [<ffffffff810341e5>] do_exit+0x25/0x9a0
[ 279.775976] [<ffffffff81031c29>] release_console_sem+0x59/0x250
[ 279.776040] [<ffffffff812be27d>] die_nmi+0xcd/0xd0
[ 279.776100] [<ffffffff812beb05>] nmi_watchdog_tick+0x115/0x1d0
[ 279.776162] [<ffffffff812be58d>] default_do_nmi+0x8d/0x190
[ 279.776223] [<ffffffff812bec21>] do_nmi+0x61/0xa0
[ 279.776285] [<ffffffff812bdf23>] nmi+0x7f/0x80
[ 279.776345] [<ffffffff8104ef59>] .text.lock.lockdep+0x52/0x89
[ 279.776406] <EOE> <IRQ> [<ffffffff81056da0>] kallsyms_lookup+0xd0/0x1f0
[ 279.776544] [<ffffffff81056830>] module_text_address+0x20/0x50
[ 279.776607] [<ffffffff8104ea6b>] lock_acquire+0x8b/0xc0
[ 279.776667] [<ffffffff81056830>] module_text_address+0x20/0x50
[ 279.776730] [<ffffffff812bd966>] _spin_lock_irqsave+0x36/0x50
[ 279.776792] [<ffffffff81056830>] module_text_address+0x20/0x50
[ 279.776854] [<ffffffff81045142>] kernel_text_address+0x22/0x40
[ 279.776918] [<ffffffff8100b270>] show_trace+0x220/0x260
[ 279.776978] [<ffffffff81063114>] softlockup_tick+0xd4/0x110
[ 279.777040] [<ffffffff8100b4f5>] dump_stack+0x15/0x20
[ 279.778250] [<ffffffff8106312a>] softlockup_tick+0xea/0x110
[ 279.778312] [<ffffffff81043c60>] delayed_work_timer_fn+0x0/0x40
[ 279.778373] [<ffffffff8103b2b3>] run_local_timers+0x13/0x20
[ 279.778434] [<ffffffff8103b567>] update_process_times+0x57/0x90
[ 279.778496] [<ffffffff8101617b>] smp_local_timer_interrupt+0x2b/0x60
[ 279.778559] [<ffffffff81016938>] smp_apic_timer_interrupt+0x58/0x70
[ 279.778622] [<ffffffff8100a76f>] apic_timer_interrupt+0x6b/0x70
[ 279.778684] [<ffffffff812bd820>] _spin_unlock_irq+0x30/0x40
[ 279.778745] [<ffffffff8103b9e6>] run_timer_softirq+0x156/0x200
[ 279.778808] [<ffffffff81037890>] __do_softirq+0x80/0x110
[ 279.778870] [<ffffffff8100adf8>] call_softirq+0x1c/0x34
[ 279.778931] [<ffffffff8100c35d>] do_softirq+0x3d/0xc0
[ 279.778991] [<ffffffff81037567>] irq_exit+0x57/0x60
[ 279.779053] [<ffffffff8101693d>] smp_apic_timer_interrupt+0x5d/0x70
[ 279.779115] [<ffffffff8100a76f>] apic_timer_interrupt+0x6b/0x70
[ 279.779176] <EOI> [<ffffffff812bd8bc>] _spin_unlock_irqrestore+0x4c/0x60
[ 279.779279] [<ffffffff8104783e>] prepare_to_wait+0x6e/0x80
[ 279.779340] [<ffffffff812bb503>] __wait_on_bit+0x33/0x80
[ 279.779400] [<ffffffff81092110>] sync_buffer+0x0/0x50
[ 279.779460] [<ffffffff81092110>] sync_buffer+0x0/0x50
[ 279.779521] [<ffffffff812bb5c8>] out_of_line_wait_on_bit+0x78/0x90
[ 279.779584] [<ffffffff81047670>] wake_bit_function+0x0/0x40
[ 279.779645] [<ffffffff810919c2>] __wait_on_buffer+0x22/0x30
[ 279.779705] [<ffffffff81094e71>] __bread+0xa1/0xc0
[ 279.779770] [<ffffffff88044dfd>] :ext3:ext3_fill_super+0x83d/0x16f0
[ 279.779832] [<ffffffff812bb945>] __mutex_unlock_slowpath+0x125/0x140
[ 279.779895] [<ffffffff8104ed1b>] trace_hardirqs_on+0x11b/0x150
[ 279.779957] [<ffffffff81097947>] get_sb_bdev+0x117/0x180
[ 279.780023] [<ffffffff880445c0>] :ext3:ext3_fill_super+0x0/0x16f0
[ 279.780086] [<ffffffff811058fe>] selinux_sb_copy_data+0x18e/0x1b6
[ 279.780153] [<ffffffff88042523>] :ext3:ext3_get_sb+0x13/0x20
[ 279.780215] [<ffffffff81096ce6>] vfs_kern_mount+0xb6/0x160
[ 279.780276] [<ffffffff81096dfa>] do_kern_mount+0x4a/0x70
[ 279.780336] [<ffffffff810afc1f>] do_mount+0x71f/0x790
[ 279.780397] [<ffffffff8106b557>] get_page_from_freelist+0x287/0x480
[ 279.780460] [<ffffffff8104ed1b>] trace_hardirqs_on+0x11b/0x150
[ 279.780524] [<ffffffff8104ed1b>] trace_hardirqs_on+0x11b/0x150
[ 279.780587] [<ffffffff8106b7c3>] __alloc_pages+0x73/0x300
[ 279.780647] [<ffffffff812bd8b9>] _spin_unlock_irqrestore+0x49/0x60
[ 279.780711] [<ffffffff81083334>] alloc_pages_current+0x84/0x90
[ 279.780772] [<ffffffff8106a98b>] __get_free_pages+0x1b/0x40
[ 279.780834] [<ffffffff810afd24>] sys_mount+0x94/0xe0
[ 279.780894] [<ffffffff812bce8a>] trace_hardirqs_on_thunk+0x35/0x37
[ 281.198364] [<ffffffff81009b4e>] system_call+0x7e/0x83
[ 281.198425]
[ 281.198479] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 281.198541] <7>APIC error on CPU1: 00(08)
----

> Subject: lockdep: increase max allowed recursion depth
> From: Ingo Molnar <[email protected]>
>
> With lots of CPUs there can be lots of deep dependencies. Will change
> the algorithm to iteration-based if it gets too deep.
>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> kernel/lockdep.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> Index: linux/kernel/lockdep.c
> ===================================================================
> --- linux.orig/kernel/lockdep.c
> +++ linux/kernel/lockdep.c
> @@ -575,6 +575,8 @@ static noinline int print_circular_bug_t
> return 0;
> }
>
> +#define RECURSION_LIMIT 40
> +
> static int noinline print_infinite_recursion_bug(void)
> {
> __raw_spin_unlock(&hash_lock);
> @@ -595,7 +597,7 @@ check_noncircular(struct lock_class *sou
> debug_atomic_inc(&nr_cyclic_check_recursions);
> if (depth > max_recursion_depth)
> max_recursion_depth = depth;
> - if (depth >= 20)
> + if (depth >= RECURSION_LIMIT)
> return print_infinite_recursion_bug();
> /*
> * Check this lock's dependency list:
> @@ -645,7 +647,7 @@ find_usage_forwards(struct lock_class *s
>
> if (depth > max_recursion_depth)
> max_recursion_depth = depth;
> - if (depth >= 20)
> + if (depth >= RECURSION_LIMIT)
> return print_infinite_recursion_bug();
>
> debug_atomic_inc(&nr_find_usage_forwards_checks);
> @@ -684,7 +686,7 @@ find_usage_backwards(struct lock_class *
>
> if (depth > max_recursion_depth)
> max_recursion_depth = depth;
> - if (depth >= 20)
> + if (depth >= RECURSION_LIMIT)
> return print_infinite_recursion_bug();
>
> debug_atomic_inc(&nr_find_usage_backwards_checks);
>

2006-10-01 09:27:56

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

> Side note: it's entirely possible that the "unwinder" code shouldn't even

How often do I need to repeat that it wasn't the dwarf2 unwinder that crashed
here, but the fallback code that is essentially just the old unwinder? Somehow
I don't seem to get through.

> try to return the address outside the page, since the first/last frame on
> a page is likely to be special (ie it's an exception/interrupt kind of
> thing), and it's entirely possible that the "page-level" loop is better at
> handling that part too.
>
> That way you wouldn't even need to make the exception frames haev the
> dwarf info etc, because you could choose to just depend on knowing what
> the format of such a page was. But that's obviously just an implementation
> choice..
>
> Doesn't that sound like it should be both fairly straightforward and
> reasonable?

Ok I guess it would be possible to add another level of stack validation to
the unwinder if you insist of it.

I don't think it would help all that much because the unwinder already does
pretty good validation based on CFI and it wouldn't have avoided that
particular problem anyways (which was already fixed in 2.6.18 BTW, Eric's
bisect just managed to find a bad spot before 2.6.18)

Also I still think the code will be fairly ugly to do this, but at least it's
already written for the old unwinder. The x86-64 code needed
at least one state variable, requiring more function arguments
all over the unwinder, but it might be possible to do it without that.

-Andi

2006-10-04 09:14:13

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

>>> Linus Torvalds <[email protected]> 30.09.06 21:54 >>>
>On Sat, 30 Sep 2006, Eric Rannaud wrote:
>>
>> (2) is introduced by d94a041519f3ab1ac023bf917619cd8c4a7d3c01
>> [PATCH] taskstats: free skb, avoid returns in send_cpu_listeners,
>> Signed-off-by: Shailabh Nagar.
>> The kernel freezes after a BUG (no sysrq magic).
>
>This one looks like the real problem is that totally broken stack unwinder
>again.

I can only support Andi here - I disagree that the unwinder is totally
broken, and most code fragments you presented during the thread
to support your statements were actually not even introduced by the
unwinding patches. While you ask Andi to look at the old code, it
seems to me you haven't really looked at the new one.

>Andi, I really think that Dwarf unwinder needs to go. The code is totally
>unreadable, and it's clearly fragile as hell. It doesn't check that the
>pointers it gets are even remotely valid, but just follows them as if they
>were.

Rather than ripping out the unwinder, anyone not liking the way it
works or it is implemented can easily just disable it - I always thought
that's what config options are for.

>The whole unwinder seems buggier than any bug it can ever unwind would be.
>Really. Let's go back to the sane "try our best, don't guarantee anything,
>but at least don't make things worse!" old code.
>
>The people who wrote that crap (and yes, Andi, I mean apparently you and
>Jan Beulich) really _have_ to get his act together. It's not just
>unreadable and obviously buggy, it's so scarily that it's hard to even
>talk about it. Lookie here:
>
> #define HANDLE_STACK(cond) \
> do while (cond) { \
> unsigned long addr = *stack++; \
>
>What the F*CK! "do while(cond) {" ????
>
>Please. Somebody just rip out all this crap. I beg of you.

One example of you blaming the unwind code for something that
was all there before. (I agree that I wrote that code, but a lot
earlier, and in order to replace what I could call crap: repeating
the same code sequence three times, making it necessary to apply
any adjustments to it in three places).

Also, I'd really like to understand if we're on a technical discussion
here, or whether we're just trying to exchange emotions.

Jan

2006-10-04 09:20:44

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

>>> Linus Torvalds <[email protected]> 30.09.06 23:11 >>>
>It could - and _should_ dammit! - do some basic sanity tests like "is the
>thing even in the same stack page"? But nooo... It seems _designed_ to be
>fragile and broken.

Your opinion. Mine's different, as the unwind directives are specifically
meant to allow changing to a different stack page. Also, there shouldn't
be any potential for corrupt unwind data (once all annotations are
correct, the lack or incorrectness of which I'd rather attribute as bugs
of the old code or, in one case, the compiler) as long as it's being
properly write protected; corruption of the stack should be affecting
the quality of old and new style back traces in similar ways.

>Here's a simple test: if the next stack-slot isn't on the same page, the
>unwind information is bogus unless you had the IRQ stack-switch signature
>there. Does the code do that? No. It just assumes that unwind information
>is complete and perfect.
>
>That's not the kind of code we write in the kernel. In the kernel, we
>write code that _works_, regardless of the kind of horrible stuff people
>feed it. That's _doubly_ true for something like a stack frame debugger,
>which is invoced when there is trouble, and for all we know the stack
>itself MIGHT BE CORRUPT.
>
>In short, I think the stack unwinder is just _broken_. It has made all the
>wrong policy decisions - it only works when everything is perfect, yet
>it's actually meant to be _used_ when somethign bad happened. Doesn't that
>strike anybody else as a totally flawed design?

Again, a corrupt stack will not allow you getting reliable data out of the
old unwinder either. Even worse when you consider a stack overflow and
your request for range checks (or pointers into the stack) - you might not
get a stack trace then at all.

Jan

2006-10-04 09:24:33

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

>> The lockdep warning in itself is probably valid, but the reason for the
>> _hang_ is the
>>
>> [ 138.831385] Unable to handle kernel paging request at ffffffff82800000 RIP:
>> [ 138.831439] [<ffffffff8020b491>] show_trace+0x311/0x380
>>
>> and that code is just a total mess.
>
>The code decides to do a fallback at the top of stack space for some reason.
>
>Hmm, i've seen this working on other kernel threads, but maybe
>that was luck. Kernel threads don't end up in user space
>so the normal check for that doesn't work.
>I guess we just need another termination for the kernel threads
>by pushing a 0 there explicitely. Jan, do you agree?

I thought we had done this already.

>> > [ 138.751306] [<ffffffff8021ecc0>] search_extable+0x40/0x70
>
>After here the unwinder seems to become a bit and it shouldn't print
>multiple entries. Jan any ideas why?

Not without raw stack contents.

>Proposed patch appended. Jan, what do you think?

As said above - I thought we added zero-termination already.

Jan

2006-10-04 10:53:58

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

> >> > [ 138.751306] [<ffffffff8021ecc0>] search_extable+0x40/0x70
> >
> >After here the unwinder seems to become a bit and it shouldn't print
> >multiple entries. Jan any ideas why?
>
> Not without raw stack contents.

I suppose those are lost in time.

Unfortunately lockdep doesn't even print stack contents because it doesn't
save them.

> >Proposed patch appended. Jan, what do you think?
>
> As said above - I thought we added zero-termination already.

For head.S but not for kernel_thread I think. At least I can't
find any existing code for kernel_thread().

-Andi

2006-10-04 11:57:48

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

>> >Proposed patch appended. Jan, what do you think?
>>
>> As said above - I thought we added zero-termination already.
>
>For head.S but not for kernel_thread I think. At least I can't
>find any existing code for kernel_thread().

2.6.18-git11 (i386) already has an annotated version of
kernel_thread_helper() in entry.S, including the pushing of a
fake (zero) return address. x86-64 has child_rip with the
added push even in original 2.6.18.

Jan

2006-10-04 12:03:45

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

On Wednesday 04 October 2006 13:58, Jan Beulich wrote:
> >> >Proposed patch appended. Jan, what do you think?
> >>
> >> As said above - I thought we added zero-termination already.
> >
> >For head.S but not for kernel_thread I think. At least I can't
> >find any existing code for kernel_thread().
>
> 2.6.18-git11 (i386) already has an annotated version of
> kernel_thread_helper() in entry.S, including the pushing of a
> fake (zero) return address. x86-64 has child_rip with the
> added push even in original 2.6.18.

True.

I wonder why it didn't work then and why my patch fixed the crash.

Ok the pushl is outside the CFI_STARTPROC

ENTRY(kernel_thread_helper)
pushl $0 # fake return address for unwinder
CFI_STARTPROC
..

-Andi

2006-10-04 12:09:06

[permalink] [raw]

Subject: Re: BUG-lockdep and freeze (was: Arrr! Linux 2.6.18)

>>> Andi Kleen <[email protected]> 04.10.06 14:03 >>>
>On Wednesday 04 October 2006 13:58, Jan Beulich wrote:
>> >> >Proposed patch appended. Jan, what do you think?
>> >>
>> >> As said above - I thought we added zero-termination already.
>> >
>> >For head.S but not for kernel_thread I think. At least I can't
>> >find any existing code for kernel_thread().
>>
>> 2.6.18-git11 (i386) already has an annotated version of
>> kernel_thread_helper() in entry.S, including the pushing of a
>> fake (zero) return address. x86-64 has child_rip with the
>> added push even in original 2.6.18.
>
>True.
>
>I wonder why it didn't work then and why my patch fixed the crash.

That what I'm curious about too.

>Ok the pushl is outside the CFI_STARTPROC
>
>ENTRY(kernel_thread_helper)
> pushl $0 # fake return address for unwinder
> CFI_STARTPROC

Intentionally - before the push there is *no* return address at all.

Jan

2006-10-04 15:12:25