Subject: Re: kernel 2.6.9-rc2-mm1 system hangup
From: Chris Mason
To: Roel van der Made, Neil Brown, akpm@osdl.org
Cc: linux-kernel@vger.kernel.org
In-Reply-To: <20040924075416.GH7334@telegraafnet.nl>
References: <20040924075416.GH7334@telegraafnet.nl>
Date: Mon, 27 Sep 2004 16:53:23 -0400
Message-Id: <1096318403.19249.46.camel@watt.suse.com>

On Fri, 2004-09-24 at 09:54 +0200, Roel van der Made wrote:
> Hi there,
>
> I'm having problems with servers hanging spontanely without any logging
> or console output. They're running a 2.6.9-rc2-mm1 kernel and are Dell
> PowerEdge 1750 dual Xeon servers with 4G ECC Reg. and 3 disks in
> sw-raid 5.
>
> The systems still responds to ping and listens to ie. the mysql port, but does
> not give a MySQL prompt, seems the disks are in deadlock state or so ?
>
> Using the sysrq showTasks I see the following traces (I will only show
> some since the total log is much too long to show here, the full log
> including the .config can be found on http://roel.net/backtrace/):

For reiserfs deadlocks, it's usually the task stuck in do_journal_end
that everyone else is waiting on.  Those two procs are below, anyone
have ideas where the md code is stuck?

pdflush       D 00000008     0    57     15            59    56 (L-TLB)
f7cbbba4 00000046 f7cbbb94 00000008 00000001 00000008 00000008 94891fbc
       0000003e f7c77000 f7cbbc28 00081600 f650fa80 f7e7db50 f7cbbc28 00000000
       f7c77000 00000008 00000000 b31919b8 0000000d c2fba020 00000001 f7c77154
Call Trace:
 [] as_update_arq+0x2e/0x75
 [] wait_for_completion+0x90/0xef
 [] default_wake_function+0x0/0x12
 [] default_wake_function+0x0/0x12
 [] elv_merged_request+0x1f/0x21
 [] sync_page_io+0xa3/0xb1
 [] bi_complete+0x0/0x1c
 [] write_disk_sb+0x78/0xb0
 [] sync_sbs+0x2b/0x43
 [] md_update_sb+0xa9/0xdb
 [] load_balance_newidle+0x35/0x98
 [] md_write_start+0x95/0xa0
 [] make_request+0x1df/0x226
 [] generic_make_request+0x113/0x194
 [] mempool_alloc+0x8b/0x15d
 [] autoremove_wake_function+0x0/0x57
 [] submit_bio+0x70/0x121
 [] bio_alloc+0xd9/0x1ac
 [] submit_bh+0xe0/0x133
 [] ll_rw_block+0x68/0x88
 [] flush_commit_list+0x454/0x48f
 [] do_journal_end+0x898/0xb63
 [] pdflush+0x0/0x2c
 [] journal_end_sync+0x4d/0x89
 [] reiserfs_sync_fs+0x65/0xa8
 [] sync_supers+0x9b/0x9d
 [] wb_kupdate+0x60/0x13b
 [] __pdflush+0xbf/0x191
 [] pdflush+0x28/0x2c
 [] wb_kupdate+0x0/0x13b
 [] pdflush+0x0/0x2c
 [] kthread+0xb7/0xbd
 [] kthread+0x0/0xbd
 [] kernel_thread_helper+0x5/0xb

and

munin-node    D 00000008     0 23920    978         23921 23533 (NOTLB)
e9cd59a0 00000086 e9cd598c 00000008 00000003 00000008 00000008 f753ac80
       00000007 ea202550 e9cd5a24 00201c00 f6466300 00000082 c04830c0 f6449550
       00000000 f6888e1c 00000008 c45a45e6 0000000d c2fca020 00000003 ea2026a4
Call Trace:
 [] wait_for_completion+0x90/0xef
 [] default_wake_function+0x0/0x12
 [] __find_get_block+0x5e/0xc2
 [] default_wake_function+0x0/0x12
 [] is_tree_node+0x6f/0x71
 [] sync_page_io+0xa3/0xb1
 [] bi_complete+0x0/0x1c
 [] write_disk_sb+0x78/0xb0
 [] sync_sbs+0x2b/0x43
 [] md_update_sb+0xa9/0xdb
 [] inode2sd+0xcc/0x116
 [] md_write_start+0x95/0xa0
 [] make_request+0x1df/0x226
 [] generic_make_request+0x113/0x194
 [] mempool_alloc+0x8b/0x15d
 [] autoremove_wake_function+0x0/0x57
 [] submit_bio+0x70/0x121
 [] __find_get_block+0x5e/0xc2
 [] bio_alloc+0xd9/0x1ac
 [] is_tree_node+0x6f/0x71
 [] submit_bh+0xe0/0x133
 [] submit_logged_buffer+0x5e/0x62
 [] write_chunk+0x3d/0x47
 [] kupdate_transactions+0x129/0x14c
 [] __find_get_block+0x3f/0xc2
 [] inode_get_bytes+0x3d/0x54
 [] run_timer_softirq+0x109/0x19d
 [] scheduler_tick+0x192/0x269
 [] bh_lru_install+0xb0/0xe2
 [] flush_used_journal_lists+0xbf/0xe1
 [] flush_old_journal_lists+0x3f/0x5e
 [] do_journal_end+0x7b6/0xb63
 [] journal_end+0xa2/0xc0
 [] reiserfs_dirty_inode+0x8c/0xd3
 [] __rmqueue+0xe8/0x139
 [] __mark_inode_dirty+0x1d2/0x1d7
 [] rmqueue_bulk+0x2f/0x6f
 [] inode_update_time+0xac/0xd6
 [] reiserfs_file_write+0x2f4/0x7a3
 [] do_wp_page+0x20a/0x380
 [] pte_alloc_map+0xaa/0xd1
 [] handle_mm_fault+0x15c/0x172
 [] do_page_fault+0x19f/0x5c9
 [] do_sigaction+0x1e7/0x203
 [] run_timer_softirq+0x109/0x19d
 [] vfs_write+0xbc/0x127
 [] sys_socketcall+0xf7/0x256
 [] sys_write+0x51/0x80
 [] syscall_call+0x7/0xb
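
For anyone going through the full showTasks log, here is a rough sketch of
how the tasks stuck in do_journal_end could be picked out automatically.
It assumes the dump is saved as text with one trace entry per line, as the
kernel prints it, and that the second field of each task header is the
one-letter task state. The function name blocked_tasks_in and both regexes
are made up for this illustration and are not part of any kernel tool.

```python
import re

# Header line of a task in the dump, e.g. "pdflush D 00000008 0 57 ...".
# Assumption: second field is a single-letter state, third is 8 hex digits.
TASK_HEADER = re.compile(r"^(\S+)\s+([A-Z])\s+[0-9A-Fa-f]{8}\s")

# One call-trace entry, e.g. "[<c0123456>] do_journal_end+0x898/0xb63".
# The bracket contents are optional so address-stripped logs also match.
TRACE_ENTRY = re.compile(r"\[[^\]]*\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def blocked_tasks_in(dump, function):
    """Return names of D-state tasks whose call trace contains `function`."""
    tasks = []  # list of (name, state, [functions seen in the trace])
    for line in dump.splitlines():
        header = TASK_HEADER.match(line)
        if header:
            # A new task section starts here.
            tasks.append((header.group(1), header.group(2), []))
        elif tasks:
            # Any other line belongs to the most recent task.
            tasks[-1][2].extend(TRACE_ENTRY.findall(line))
    return [name for name, state, funcs in tasks
            if state == "D" and function in funcs]
```

Calling blocked_tasks_in(open("showtasks.log").read(), "do_journal_end")
(the file name is only a placeholder) should list the tasks worth looking at
first. It only narrows the search; the md side still has to be read by hand.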