Date: Sun, 14 Jun 2009 12:32:56 -0300
From: Leandro Lucarella
To: Ryusuke Konishi
Cc: linux-kernel@vger.kernel.org, albertito@blitiri.com.ar, users@nilfs.org
Subject: Re: NILFS2 get stuck after bio_alloc() fail
Message-ID: <20090614153256.GA4020@homero.springfield.home>
In-Reply-To: <20090614.124517.47505469.konishi.ryusuke@gmail.com>

Ryusuke Konishi, on June 14 at 12:45, you wrote:
> Hi,
>
> On Sat, 13 Jun 2009 22:32:11 -0300, Leandro Lucarella wrote:
> > Hi!
> >
> > While testing nilfs2 (on 2.6.30) with some "cp"s and "rm"s, I noticed
> > they sometimes got stuck in D state, and the kernel printed the
> > following message:
> >
> >   NILFS: IO error writing segment
> >
> > A friend gave me a hand, and after adding some printk()s we found that
> > the problem seems to occur when the bio_alloc()s inside
> > nilfs_alloc_seg_bio() fail, making it return NULL; but we don't know
> > how that causes the processes to get stuck.
>
> Thank you for reporting this issue.
>
> Could you get a stack dump of the stuck nilfs task?
> You can get one as follows if you have the magic SysRq feature enabled:
>
>   # echo t > /proc/sysrq-trigger
>
> I will dig into how the process got stuck.
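For context, this is the function we were instrumenting. The sketch below is
reconstructed from the 2.6.30 sources (fs/nilfs2/segbuf.c, if I remember
right), so take the exact details with a grain of salt; the gist is that
bio_alloc() is retried with progressively fewer vectors, and if every attempt
fails the function gives up and returns NULL:

    /*
     * Rough sketch of nilfs_alloc_seg_bio() as of 2.6.30 (reconstructed,
     * not copied verbatim).  With GFP_NOWAIT the allocation can fail
     * under memory pressure; halving nr_vecs is the only fallback.
     */
    static struct bio *nilfs_alloc_seg_bio(struct super_block *sb,
                                           sector_t start, int nr_vecs)
    {
            struct bio *bio;

            bio = bio_alloc(GFP_NOWAIT, nr_vecs);
            if (bio == NULL) {
                    /* Retry with half the vectors each time, until one
                     * succeeds or nr_vecs reaches zero. */
                    while (!bio && (nr_vecs >>= 1))
                            bio = bio_alloc(GFP_NOWAIT, nr_vecs);
            }
            if (likely(bio)) {
                    bio->bi_bdev = sb->s_bdev;
                    bio->bi_sector = (sector_t)start <<
                                     (sb->s_blocksize_bits - 9);
            }
            /* Returning NULL here seems to be what leads to the
             * "NILFS: IO error writing segment" message. */
            return bio;
    }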
Here is what I think is the important stuff:

[...]

kdmflush      S dc5abf5c     0  1018      2
 dc5abf84 00000046 dc60d780 dc5abf5c c01ad12e dd4d6ed0 dd4d7148 e3504d6e
 00003c16 dc8b2560 dc5abf7c c040e24b dd846da0 dc60d7cc dd4d6ed0 dc5abf8c
 c040d628 dc5abfd0 c0131dbd dc7fe230 dd4d6ed0 dc5abfa8 dd4d6ed0 dd846da8
Call Trace:
 [] ? bio_fs_destructor+0xe/0x10
 [] ? down_write+0xb/0x30
 [] schedule+0x8/0x20
 [] worker_thread+0x16d/0x1e0
 [] ? dm_wq_work+0x0/0x120 [dm_mod]
 [] ? autoremove_wake_function+0x0/0x50
 [] ? worker_thread+0x0/0x1e0
 [] kthread+0x43/0x80
 [] ? kthread+0x0/0x80
 [] kernel_thread_helper+0x7/0x14

[...]

loop0         S dcc7bce0     0 15884      2
 d7671f48 00000046 c01ad116 dcc7bce0 dcc7bca0 d4686590 d4686808 b50316ce
 000003b8 dc7010a0 c01b0d4f c01b0cf0 dcc7bcec 0c7f3000 00000000 d7671f50
 c040d628 d7671fd0 de85391c 00000000 00000000 00000000 dcbbd108 dcbbd000
Call Trace:
 [] ? bio_free+0x46/0x50
 [] ? mpage_end_io_read+0x5f/0x70
 [] ? mpage_end_io_read+0x0/0x70
 [] schedule+0x8/0x20
 [] loop_thread+0x1cc/0x490 [loop]
 [] ? do_lo_send_aops+0x0/0x1c0 [loop]
 [] ? autoremove_wake_function+0x0/0x50
 [] ? loop_thread+0x0/0x490 [loop]
 [] kthread+0x43/0x80
 [] ? kthread+0x0/0x80
 [] kernel_thread_helper+0x7/0x14

segctord      D 00000001     0 15886      2
 d3847ef4 00000046 c011cefb 00000001 00000001 dcf48fd0 dcf49248 c052b9d0
 d50962e4 dc701720 d46871dc d46871e4 c23f180c c23f180c d3847f28 d3847efc
 c040d628 d3847f20 c040ed3d c23f1810 dcf48fd0 d46871dc 00000000 c23f180c
Call Trace:
 [] ? dequeue_task_fair+0x27b/0x280
 [] schedule+0x8/0x20
 [] rwsem_down_failed_common+0x7d/0x180
 [] rwsem_down_write_failed+0x1d/0x30
 [] call_rwsem_down_write_failed+0x6/0x8
 [] ? down_write+0x1e/0x30
 [] nilfs_transaction_lock+0x59/0x100 [nilfs2]
 [] nilfs_segctor_thread+0xcc/0x2e0 [nilfs2]
 [] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
 [] ? nilfs_segctor_thread+0x0/0x2e0 [nilfs2]
 [] kthread+0x43/0x80
 [] ? kthread+0x0/0x80
 [] kernel_thread_helper+0x7/0x14

rm            D d976bde0     0 16147      1
 d976bdf0 00000086 003abc46 d976bde0 c013cc46 c18ad190 c18ad408 00000000
 003abc46 dc789900 d976be38 d976bdf0 00000000 d976be30 d976be38 d976bdf8
 c040d628 d976be00 c040d67a d976be08 c01668dd d976be24 c040dad7 c01668b0
Call Trace:
 [] ? getnstimeofday+0x56/0x110
 [] schedule+0x8/0x20
 [] io_schedule+0x3a/0x70
 [] sync_page+0x2d/0x60
 [] __wait_on_bit+0x47/0x70
 [] ? sync_page+0x0/0x60
 [] wait_on_page_bit+0x98/0xb0
 [] ? wake_bit_function+0x0/0x60
 [] truncate_inode_pages_range+0x244/0x360
 [] ? __mark_inode_dirty+0x2c/0x160
 [] ? nilfs_transaction_commit+0x9c/0x170 [nilfs2]
 [] ? down_read+0xb/0x20
 [] truncate_inode_pages+0x1a/0x20
 [] nilfs_delete_inode+0x9f/0xd0 [nilfs2]
 [] ? nilfs_delete_inode+0x0/0xd0 [nilfs2]
 [] generic_delete_inode+0x92/0x150
 [] generic_drop_inode+0x6f/0x1b0
 [] iput+0x47/0x50
 [] do_unlinkat+0xd3/0x160
 [] ? vfs_readdir+0x66/0x90
 [] ? filldir64+0x0/0xf0
 [] ? sys_getdents64+0x96/0xb0
 [] sys_unlinkat+0x23/0x50
 [] syscall_call+0x7/0xb

umount        D d06bbe6c     0 16727      1
 d06bbe7c 00000086 d06bbe58 d06bbe6c c013cc46 dc5ef350 dc5ef5c8 00000000
 022bb380 dc6503a0 d06bbec4 d06bbe7c 00000000 d06bbebc d06bbec4 d06bbe84
 c040d628 d06bbe8c c040d67a d06bbe94 c01668dd d06bbeb0 c040dad7 c01668b0
Call Trace:
 [] ? getnstimeofday+0x56/0x110
 [] schedule+0x8/0x20
 [] io_schedule+0x3a/0x70
 [] sync_page+0x2d/0x60
 [] __wait_on_bit+0x47/0x70
 [] ? sync_page+0x0/0x60
 [] wait_on_page_bit+0x98/0xb0
 [] ? wake_bit_function+0x0/0x60
 [] wait_on_page_writeback_range+0xa4/0x110
 [] ? __filemap_fdatawrite_range+0x60/0x80
 [] filemap_fdatawait+0x34/0x40
 [] filemap_write_and_wait+0x3b/0x50
 [] sync_blockdev+0x19/0x20
 [] __sync_inodes+0x45/0x70
 [] sync_inodes+0xd/0x30
 [] do_sync+0x17/0x70
 [] sys_sync+0xd/0x20
 [] syscall_call+0x7/0xb

[...]

'rm' is the "original" stuck process; 'umount' got stuck after it, when I
tried to unmount the nilfs (it was mounted on a loop device). If it helps, my
reading of the traces is that segctord is blocked waiting for the transaction
lock in nilfs_transaction_lock(), while 'rm' sits forever in
wait_on_page_bit() on a page that apparently never completes; but that is
just a guess.

Here is the complete trace: http://pastebin.lugmen.org.ar/4931

Thank you.

-- 
Leandro Lucarella (luca) | Collective blog: http://www.mazziblog.com.ar/blog/
----------------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------------
Don't take life too seriously, you won't get out alive