From: Ryan Richter
Date: Tue, 29 Nov 2005 13:34:17 -0500
To: linux-kernel@vger.kernel.org
Cc: Kai.Makisara@kolumbus.fi, linux-scsi@vger.kernel.org, ryan@tau.solarneutrino.net, Andrew Morton
Subject: Re: Fw: crash on x86_64 - mm related?
Message-ID: <20051129183417.GA6326@tau.solarneutrino.net>
References: <20051129092432.0f5742f0.akpm@osdl.org>
In-Reply-To: <20051129092432.0f5742f0.akpm@osdl.org>

On Tue, Nov 29, 2005 at 09:24:32AM -0800, Ryan Richter wrote:

Not sure if this matters, but this apparently happened in two stages.
This first part happened during the backups, as I said earlier:

> Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
> flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0
> Backtrace:
>
> Call Trace:{bad_page+99} {free_hot_cold_page+101}
>        {__page_cache_release+151} {sgl_unmap_user_pages+120}
>        {release_buffering+27} {st_write+1697}
>        {vfs_write+198} {sys_write+83}
>        {system_call+126}
> Trying to fix it up, but a reboot is needed
> Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
> flags:0x010000000000081c mapping:ffff81005c0fc310 mapcount:0 count:0
> Backtrace:
>
> Call Trace:{bad_page+99} {free_hot_cold_page+101}
>        {__page_cache_release+151} {sgl_unmap_user_pages+120}
>        {release_buffering+27} {st_write+1697}
>        {vfs_write+198} {sys_write+83}
>        {system_call+126}
> Trying to fix it up, but a reboot is needed
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at include/linux/mm.h:341
> invalid operand: 0000 [1] SMP
> CPU 1
> Modules linked in: bonding
> Pid: 2418, comm: taper Tainted: G B 2.6.14.2 #1
> RIP: 0010:[] {sgl_unmap_user_pages+93}
> RSP: 0018:ffff810035725e18 EFLAGS: 00010256
> RAX: 0000000000000000 RBX: 0000000000000007 RCX: 000000000000000f
> RDX: 00000000000000e0 RSI: 0000000000000001 RDI: ffff81000260b6f8
> RBP: ffff810004852068 R08: 00000000ffffffff R09: 0000000000000000
> R10: 0000000000008000 R11: 0000000000000200 R12: 0000000000000008
> R13: 0000000000000000 R14: 0000000000008000 R15: ffff810004949d10
> FS: 00002aaaab53d880(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002aaaaaac0000 CR3: 0000000035691000 CR4: 00000000000006e0
> Process taper (pid: 2418, threadinfo ffff810035724000, task ffff81017d680300)
> Stack: ffff8101423f3600 ffff810004852000 0000000000000040 0000000000008000
>        ffff810004949c00 ffffffff802b48fb ffff810004852000 ffffffff802b4fb1
>        ffff810000000000 ffffffff00000001
> Call Trace:{release_buffering+27} {st_write+1697}
>        {vfs_write+198} {sys_write+83}
>        {system_call+126}
>
> Code: 0f 0b 68 ba 12 3a 80 c2 55 01 f0 83 47 08 ff 0f 98 c0 84 c0
> RIP {sgl_unmap_user_pages+93} RSP
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at mm/rmap.c:487
> invalid operand: 0000 [2] SMP
> CPU 1
> Modules linked in: bonding
> Pid: 2418, comm: taper Tainted: G B 2.6.14.2 #1
> RIP: 0010:[] {page_remove_rmap+39}
> RSP: 0018:ffff810035725ab0 EFLAGS: 00010286
> RAX: 00000000ffffffff RBX: ffff8100356976f8 RCX: ffff81000000f000
> RDX: 0000000000000000 RSI: 8000000064c69067 RDI: ffff81000260b6f8
> RBP: 00002aaaaaadf000 R08: 0000000000000000 R09: ffff81000260b688
> R10: 00000000fffffffa R11: 0000000000000000 R12: ffff810101c22380
> R13: 8000000064c69067 R14: ffff81000260b6f8 R15: 0000000000000000
> FS: 00002aaaab53d880(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002aaaaaac0000 CR3: 0000000035691000 CR4: 00000000000006e0
> Process taper (pid: 2418, threadinfo ffff810035724000, task ffff81017d680300)
> Stack: ffffffff80166ecd 00002aaaaab62000 ffff810035696aa8 00002aaaaab62000
>        00002aaaaab62000 00002aaaaab61fff ffff810035695550 00002aaaaab62000
>        ffffffff80167180 ffff810035725d68
> Call Trace:{zap_pte_range+477} {unmap_page_range+496}
>        {unmap_vmas+293} {exit_mmap+162}
>        {mmput+49} {do_exit+438}
>        {die+81} {do_invalid_op+159}
>        {sgl_unmap_user_pages+93} {thread_return+86}
>        {sym_setup_data_and_start+402} {error_exit+0}
>        {sgl_unmap_user_pages+93} {sgl_unmap_user_pages+120}
>        {release_buffering+27} {st_write+1697}
>        {vfs_write+198} {sys_write+83}
>        {system_call+126}
>
> Code: 0f 0b 68 9b 35 3a 80 c2 e7 01 48 c7 c6 ff ff ff ff bf 20 00
> RIP {page_remove_rmap+39} RSP
> <1>Fixing recursive fault but reboot is needed!
> Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
> {ext3_prepare_write+27}
> PGD 355bc067 PUD 355c9067 PMD 0
> Oops: 0000 [3] SMP
> CPU 0
> Modules linked in: bonding
> Pid: 2416, comm: driver Tainted: G B 2.6.14.2 #1
> RIP: 0010:[] {ext3_prepare_write+27}
> RSP: 0018:ffff8100355e7b48 EFLAGS: 00010296
> RAX: 0000000000000000 RBX: ffffffff8040f660 RCX: 000000000000017d
> RDX: 0000000000000094 RSI: ffff81000260b6f8 RDI: ffff810035b09cc0
> RBP: 000000000000000e R08: 00000000fffffffa R09: 00000000000000e9
> R10: ffff81001190c818 R11: 0000000000000000 R12: ffff81000260b6f8
> R13: ffff81000260b6f8 R14: 000000000000017d R15: 0000000000000094
> FS: 00002aaaab53d8e0(0000) GS:ffffffff804db800(0000) knlGS:00000000555bc920
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000035555000 CR4: 00000000000006e0
> Process driver (pid: 2416, threadinfo ffff8100355e6000, task ffff8100f43e8a80)
> Stack: ffff81014e643310 ffffffff8040f660 000000000000000e ffff81000260b6f8
>        ffff81005c0fc310 0000000000000094 00000000000000e9 ffffffff80158247
>        0000000000000292 00002aaaaaac0000
> Call Trace:{generic_file_buffered_write+551}
>        {__ext3_journal_stop+45} {__mark_inode_dirty+52}
>        {inode_update_time+188} {__generic_file_aio_write_nolock+936}
>        {thread_return+86} {lock_timer_base+41}
>        {generic_file_aio_write+110} {ext3_file_write+35}
>        {do_sync_write+211} {__pollwait+0}
>        {autoremove_wake_function+0} {sys_select+1153}
>        {vfs_write+198} {sys_write+83}
>        {system_call+126}
>
> Code: 48 8b 28 48 89 ef e8 aa 26 00 00 c7 44 24 04 00 00 00 00 89
> RIP {ext3_prepare_write+27} RSP
> CR2: 0000000000000000
> <0>Bad page state at prep_new_page (in process 'dumper', page ffff81000260b6f8)
> flags:0x010000000000001d mapping:0000000000000000 mapcount:-1 count:1
> Backtrace:
>
> Call Trace:{bad_page+99} {prep_new_page+65}
>        {buffered_rmqueue+302} {__alloc_pages+261}
>        {generic_file_buffered_write+413}
>        {current_fs_time+105} {inode_update_time+62}
>        {__generic_file_aio_write_nolock+936}
>        {sock_common_recvmsg+52} {sock_aio_read+272}
>        {generic_file_aio_write+110} {ext3_file_write+35}
>        {do_sync_write+211} {__pollwait+0}
>        {autoremove_wake_function+0} {sys_select+1153}
>        {vfs_write+198} {sys_write+83}
>        {system_call+126}
> Trying to fix it up, but a reboot is needed

Everything from here on happened several hours later while updatedb was running.

> Bad page state at prep_new_page (in process 'find', page ffff81000260b6f8)
> flags:0x0100000000000064 mapping:ffff8100f3be9be9 mapcount:1 count:1
> Backtrace:
>
> Call Trace:{bad_page+99} {prep_new_page+65}
>        {buffered_rmqueue+302} {__alloc_pages+261}
>        {kmem_getpages+99} {cache_grow+192}
>        {cache_alloc_refill+459} {kmem_cache_alloc+54}
>        {d_alloc+33} {real_lookup+105}
>        {do_lookup+112} {__link_path_walk+2551}
>        {link_path_walk+178} {path_lookup+446}
>        {__user_walk+62} {vfs_lstat+38}
>        {sys_newlstat+31} {system_call+126}
>
> Trying to fix it up, but a reboot is needed
> Unable to handle kernel paging request at 00002aaaab9c5b61 RIP:
> {cache_alloc_refill+330}
> PGD c2512067 PUD c2513067 PMD 0
> Oops: 0002 [4] SMP
> CPU 0
> Modules linked in: bonding
> Pid: 3011, comm: find Tainted: G B 2.6.14.2 #1
> RIP: 0010:[] {cache_alloc_refill+330}
> RSP: 0018:ffff810112f05c28 EFLAGS: 00010082
> RAX: 00002aaaab9c5b59 RBX: 0000000000000010 RCX: 0000000000029ba6
> RDX: 00002aaaab9c5bb3 RSI: ffff810064c69040 RDI: ffff81000c01a288
> RBP: ffff8100f6fc4800 R08: ffff81000c01a250 R09: ffff81000c01a260
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff81000c01a240
> R13: ffff8100f6fc3640 R14: ffff81000c01a288 R15: 00000000000000d0
> FS: 00002aaaaae00640(0000) GS:ffffffff804db800(0000) knlGS:00000000555bc920
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00002aaaab9c5b61 CR3: 00000000c2bb3000 CR4: 00000000000006e0
> Process find (pid: 3011, threadinfo ffff810112f04000, task ffff810102b46040)
> Stack: ffff810112f05e68 ffff810179923cb8 fffffffffffffff4 ffff810112f05d28
>        ffff810179923cb8 ffff810112f05d28 ffff810112f05e68 ffffffff80160226
>        0000000000000292 ffffffff80193831
> Call Trace:{kmem_cache_alloc+54} {d_alloc+33}
>        {real_lookup+105} {do_lookup+112}
>        {__link_path_walk+2551} {link_path_walk+178}
>        {path_lookup+446} {__user_walk+62}
>        {vfs_lstat+38} {sys_newlstat+31}
>        {system_call+126}
>
> Code: 48 89 50 08 48 89 02 48 c7 46 08 00 02 20 00 83 7e 24 ff 48
> RIP {cache_alloc_refill+330} RSP
> CR2: 00002aaaab9c5b61
> NMI Watchdog detected LOCKUP on CPU 1
> CPU 1
> Modules linked in: bonding
> Pid: 7, comm: events/1 Tainted: G B 2.6.14.2 #1
> RIP: 0010:[] {.text.lock.spinlock+118}
> RSP: 0018:ffff810004869dd0 EFLAGS: 00000086
> RAX: ffff81000c01a240 RBX: ffff81000c01a288 RCX: ffff8100f6fc3640
> RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff81000c01a288
> RBP: ffff810100009dc0 R08: 0000000000000000 R09: 0000000000000000
> R10: 00000000ffffffff R11: 0000000000000066 R12: 0000000000000000
> R13: ffff810100009dd0 R14: 0000000000000292 R15: ffff810100009e40
> FS: 00002aaaaae00640(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00002aaaaaf1df40 CR3: 000000017f448000 CR4: 00000000000006e0
> Process events/1 (pid: 7, threadinfo ffff810004868000, task ffff8100f6fb6080)
> Stack: ffffffff8015e35b ffff8100f6fc3640 ffff810100009f60 0000000000000001
>        ffff810100009e40 ffff8100f6fc3640 ffff8100f6fc38e0 ffff810100009f88
>        ffffffff80161414 ffff810004869e58
> Call Trace:{drain_alien_cache+123} {cache_reap+164}
>        {cache_reap+0} {worker_thread+476}
>        {default_wake_function+0} {default_wake_function+0}
>        {worker_thread+0} {kthread+146}
>        {child_rip+8} {worker_thread+0}
>        {kthread+0} {child_rip+0}
>
>
> Code: 80 3f 00 7e f9 e9 59 fe ff ff e8 58 41 e9 ff e9 6f fe ff ff
> console shuts up ...
> <0>Kernel panic - not syncing: Aiee, killing interrupt handler!