Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S964854AbWH1W3Y (ORCPT ); Mon, 28 Aug 2006 18:29:24 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S964855AbWH1W3Y (ORCPT ); Mon, 28 Aug 2006 18:29:24 -0400
Received: from e31.co.us.ibm.com ([32.97.110.149]:56976 "EHLO e31.co.us.ibm.com")
        by vger.kernel.org with ESMTP id S964854AbWH1W3X (ORCPT );
        Mon, 28 Aug 2006 18:29:23 -0400
Subject: Re: boot failure, "DWARF2 unwinder stuck at 0xc0100199"
From: Badari Pulavarty
To: Jan Beulich
Cc: Andrew Morton, "J. Bruce Fields", Andi Kleen, lkml, "Randy.Dunlap"
In-Reply-To: <44EAD613.76E4.0078.0@novell.com>
References: <20060820013121.GA18401@fieldses.org>
        <44E97353.76E4.0078.0@novell.com>
        <20060821094718.79c9a31a.rdunlap@xenotime.net>
        <20060821212043.332fdd0f.akpm@osdl.org>
        <44EAD613.76E4.0078.0@novell.com>
Content-Type: text/plain
Date: Mon, 28 Aug 2006 15:32:32 -0700
Message-Id: <1156804352.447.5.camel@dyn9047017100.beaverton.ibm.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.0.4 (2.0.4-4)
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 9354
Lines: 217

On Tue, 2006-08-22 at 10:01 +0200, Jan Beulich wrote:
> >>> Andrew Morton 22.08.06 06:20 >>>
> >On Mon, 21 Aug 2006 09:47:18 -0700
> >"Randy.Dunlap" wrote:
> >
> >> > The 'stuck' unwinder issue at hand already has a fix, though planned to
> >> > be merged for 2.6.19 only. The crash after switching to the legacy
> >> > stack trace code is bad, though, but has little to do with the unwinder
> >> > additions/changes. The way that code reads the stack is just
> >> > inappropriate in contexts where things must be expected to be broken.
> >>
> >> "merged for 2.6.19" meaning:
> >> - in (before) 2.6.19, or
> >> - after 2.6.19 is released
> >>
> >> If "after," then it will likely need to be added to -stable also,
> >> so it might as well go in "before" 2.6.19 is released.
> >
> >Precisely.
>
> My understanding of 'for' is that Andi will send to Linus after in the 2.6.19
> merge window.
>
> >Guys, this unwinder change has been quite problematic. We really cannot
> >let this badness out into 2.6.18 - it degrades our ability to debug every
> >subsystem in the entire kernel. Would marking it CONFIG_BROKEN get us back
> >to 2.6.17 behaviour?
>
> I'd prefer pushing into 2.6.18 some of the patches currently scheduled for
> 2.6.19 over marking it CONFIG_BROKEN. But that's clearly not my decision.
>
> Jan

I am running into a few "unwinder" issues - see the following 2 cases
(nothing really stopping my work). It gives me more *useful* stacks than
before - so no major complaints :)

I know that you are working on some of the unwinder fixes for 2.6.19 -
I just want to let you know about my problems as well :)

Thanks,
Badari

Case 1:
========

----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/buffer.c:2791
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: sg sd_mod qla2xxx firmware_class scsi_transport_fc
scsi_mod ipv6 thermal processor fan button battery ac dm_mod floppy
parport_pc lp parport
Pid: 4216, comm: kjournald Not tainted 2.6.18-rc4 #3
RIP: 0010:[] [] submit_bh+0x29/0x130
RSP: 0018:ffff8101bde8dd08 EFLAGS: 00010246
RAX: 0000000000000005 RBX: ffff8101bd0ad250 RCX: ffff8101df880e88
RDX: ffff8101733887c0 RSI: ffff8101bd0ad250 RDI: 0000000000000001
RBP: ffff8101bde8dd28 R08: ffff8101a033be38 R09: ffff81017605d7c0
R10: 00000000000a8f52 R11: 00000000000a8f54 R12: ffff8101a0113260
R13: 0000000000000001 R14: 0000000000000003 R15: 0000000000000080
FS: 00002b5b2e4476d0(0000) GS:ffff8101800a5140(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b5b2e1bd000 CR3: 0000000000201000 CR4: 00000000000006e0
Process kjournald (pid: 4216, threadinfo ffff8101bde8c000, task ffff810180259790)
Stack: ffff810174897f70 ffff8101bd0ad250 ffff8101a0113260 000000000000004c
 ffff8101bde8dd68 ffffffff80284179 00000000bde8dd68 ffff81017441d250
 ffff8101769ee910 ffff8101dd2518c0 0000000000000080 ffff8101a0059200
Call Trace:
 [] ll_rw_block+0x79/0xd0
 [] journal_commit_transaction+0x478/0x1170
 [] kjournald+0xde/0x290
 [] kthread+0xdc/0x110
 [] child_rip+0x8/0x12
DWARF2 unwinder stuck at child_rip+0x8/0x12
Leftover inexact backtrace:
 [] kthread+0x0/0x110
 [] child_rip+0x0/0x12

Code: 0f 0b 68 d3 e0 50 80 c2 e7 0a 48 83 7b 38 00 75 0a 0f 0b 68
RIP [] submit_bh+0x29/0x130
 RSP
<1>Unable to handle kernel paging request at 0000000146f4eac0 RIP:
 [] task_rq_lock+0x38/0x90
PGD 1ddc2e067 PUD 0
Oops: 0000 [2] SMP

Case 2:
=======

kfree_debugcheck: bad ptr ffff8100d39ae000h.
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at mm/slab.c:2698
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: stap_2543 autofs4 hidp rfcomm l2cap bluetooth sunrpc
af_packet xt_state ip_conntrack nfnetlink xt_tcpudp ip6table_filter
ip6_tables x_tables ipv6 acpi_cpufreq freq_table processor binfmt_misc
parport_pc lp parport ide_cd cdrom generic floppy e752x_edac edac_mc shpchp
i2c_i801 uhci_hcd piix serio_raw ehci_hcd i2c_core pci_hotplug usbcore
dm_snapshot dm_zero dm_mirror dm_mod ide_disk ide_core
Pid: 2638, comm: fsx-linux Not tainted 2.6.18-rc4-smp #17
RIP: 0010:[] [] kfree_debugcheck+0x9a/0xa8
RSP: 0018:ffff81010d9ad5b8 EFLAGS: 00010096
RAX: 0000000000000030 RBX: ffff8100d39ae000 RCX: ffffffff8062daa0
RDX: 0000000000000000 RSI: 0000000000000092 RDI: 0000000100000000
RBP: ffff81010d9ad5c8 R08: 00000000000042ee R09: 0000000000000000
R10: 0000000000000092 R11: 0000000000000000 R12: ffff81010ab2dd60
R13: ffff8100d39ae000 R14: ffff8101212ddea8 R15: 0000000000000286
FS: 00002b611ab05200(0000) GS:ffffffff8069b000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00002af34fee4000 CR3: 000000010b1c5000 CR4: 00000000000006e0
Process fsx-linux (pid: 2638, threadinfo ffff81010d9ac000, task ffff81010b6a30c0)
Stack: 0000000000000000 0000000000000000 ffff81010d9ad618 ffffffff8027fbbc
 ffff81010d9ad618 ffffffff80309080 ffff81010d9ad608 0000000000000000
 ffff81010ab2dd60 ffff81010da6d928 ffff8101212ddea8 0000000000000000
Call Trace:
 [] kfree+0x26/0x1f2
 [] do_get_write_access+0x52e/0x54f
 [] journal_get_undo_access+0x2e/0x118
 [] ext3_try_to_allocate_with_rsv+0x4b/0x504
 [] ext3_new_blocks+0x2b9/0x74e
 [] ext3_get_blocks_handle+0x467/0xac4
 [] ext3_get_block+0xc4/0xec
 [] __block_prepare_write+0x1bf/0x41e
 [] block_prepare_write+0x22/0x30
 [] ext3_prepare_write+0xb5/0x185
 [] generic_file_buffered_write+0x2c7/0x6b7
 [] __generic_file_aio_write_nolock+0x2e5/0x331
 [] generic_file_aio_write+0x69/0xc4
 [] ext3_file_write+0x1e/0x9b
 [] do_sync_write+0xf0/0x12e
 [] vfs_write+0xcf/0x175
 [] sys_write+0x47/0x70
 [] system_call+0x7e/0x83
DWARF2 unwinder stuck at system_call+0x7e/0x83
Leftover inexact backtrace:

Code: 0f 0b 68 ae 3c 4a 80 c2 8a 0a 58 5b c9 c3 55 48 89 e5 41 57
RIP [] kfree_debugcheck+0x9a/0xa8
 RSP
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():0, irqs_disabled():1

Call Trace:
 [] show_trace+0xae/0x30e
 [] dump_stack+0x15/0x17
 [] __might_sleep+0xb2/0xb4
 [] down_read+0x1d/0x2f
 [] blocking_notifier_call_chain+0x1b/0x41
 [] profile_task_exit+0x15/0x17
 [] do_exit+0x25/0x91e
 [] kernel_math_error+0x0/0x96
 []
DWARF2 unwinder stuck at 0xffff81010b6a30c0
Leftover inexact backtrace:
 [] do_trap+0xe0/0xef
 [] do_invalid_op+0xa7/0xb3
 [] kfree_debugcheck+0x9a/0xa8
 [] _spin_unlock_irq+0x9/0xc
 [] thread_return+0x5e/0xef
 [] error_exit+0x0/0x84
 [] kfree_debugcheck+0x9a/0xa8
 [] kfree_debugcheck+0x9a/0xa8
 [] kfree+0x26/0x1f2
 [] journal_cancel_revoke+0x137/0x1ac
 [] do_get_write_access+0x52e/0x54f
 [] wake_bit_function+0x0/0x2a
 [] __find_get_block+0x171/0x183
 [] journal_get_undo_access+0x2e/0x118
 [] ext3_try_to_allocate_with_rsv+0x4b/0x504
 [] __getblk+0x39/0x25c
 [] __bread+0xe/0xb5
 [] ext3_new_blocks+0x2b9/0x74e
 [] ext3_get_blocks_handle+0x467/0xac4
 [] alloc_buffer_head+0x19/0x40
 [] cache_alloc_debugcheck_after+0x1a5/0x1b4
 [] alloc_buffer_head+0x19/0x40
 [] kmem_cache_alloc+0xbe/0xca
 [] ext3_get_block+0xc4/0xec
 [] __block_prepare_write+0x1bf/0x41e
 [] ext3_get_block+0x0/0xec
 [] block_prepare_write+0x22/0x30
 [] ext3_prepare_write+0xb5/0x185
 [] _write_unlock_irq+0x9/0xc
 [] generic_file_buffered_write+0x2c7/0x6b7
 [] touch_atime+0x6b/0xaa
 [] current_fs_time+0x3f/0x41
 [] do_generic_mapping_read+0x42e/0x47a
 [] __generic_file_aio_write_nolock+0x2e5/0x331
 [] generic_file_aio_write+0x69/0xc4
 [] ext3_file_write+0x1e/0x9b
 [] do_sync_write+0xf0/0x12e
 [] autoremove_wake_function+0x0/0x38
 [] mutex_lock+0x22/0x32
 [] vfs_write+0xcf/0x175
 [] sys_write+0x47/0x70
 [] system_call+0x7e/0x83