From: Fredrik Andersson
Subject: Re: Fwd: Ext4 bug with fallocate
Date: Mon, 19 Oct 2009 11:49:28 +0200
To: Eric Sandeen, linux-ext4@vger.kernel.org
In-Reply-To: <4ADB3AEC.8040901@redhat.com>

Hi,

here is the data for this process:

[5958816.744013] drdbmake      D ffff88021e4c7800     0 27019  13796
[5958816.744013]  ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0 ffff8801678b1380
[5958816.744013]  0000000000010e80 000000000000c748 ffff8800404963c0 ffffffff81526360
[5958816.744013]  ffff880040496730 00000000f4ce9bf0 000000025819cebe 0000000000000282
[5958816.744013] Call Trace:
[5958816.744013]  [] schedule+0x9/0x20
[5958816.744013]  [] start_this_handle+0x365/0x5d0
[5958816.744013]  [] ? autoremove_wake_function+0x0/0x40
[5958816.744013]  [] jbd2_journal_restart+0xbe/0x150
[5958816.744013]  [] ext4_ext_truncate+0x6dd/0xa20
[5958816.744013]  [] ? find_get_pages+0x3b/0xf0
[5958816.744013]  [] ext4_truncate+0x198/0x680
[5958816.744013]  [] ? unmap_mapping_range+0x74/0x280
[5958816.744013]  [] ? jbd2_journal_stop+0x1e0/0x360
[5958816.744013]  [] vmtruncate+0xa5/0x110
[5958816.744013]  [] inode_setattr+0x30/0x180
[5958816.744013]  [] ext4_setattr+0x173/0x310
[5958816.744013]  [] notify_change+0x119/0x330
[5958816.744013]  [] do_truncate+0x63/0x90
[5958816.744013]  [] ? get_write_access+0x23/0x60
[5958816.744013]  [] sys_truncate+0x17b/0x180
[5958816.744013]  [] system_call_fastpath+0x16/0x1b

Don't know if this has anything to do with it, but I also noticed that
another process of mine, which is working just fine, is executing a
suspicious looking function called raid0_unplug. It operates on the same
raid0/ext4 filesystem as the hung process. I include the calltrace for it
here too:

[5958816.744013] nodeserv      D ffff880167bd7ca8     0 17900  13796
[5958816.744013]  ffff880167bd7bf8 0000000000000082 ffff88002800a588 ffff88021e5b56e0
[5958816.744013]  0000000000010e80 000000000000c748 ffff880100664020 ffffffff81526360
[5958816.744013]  ffff880100664390 000000008119bd17 000000026327bfa9 0000000000000002
[5958816.744013] Call Trace:
[5958816.744013]  [] ? raid0_unplug+0x51/0x70 [raid0]
[5958816.744013]  [] schedule+0x9/0x20
[5958816.744013]  [] io_schedule+0x37/0x50
[5958816.744013]  [] sync_page+0x35/0x60
[5958816.744013]  [] sync_page_killable+0x9/0x50
[5958816.744013]  [] __wait_on_bit_lock+0x52/0xb0
[5958816.744013]  [] ? sync_page_killable+0x0/0x50
[5958816.744013]  [] __lock_page_killable+0x64/0x70
[5958816.744013]  [] ? wake_bit_function+0x0/0x40
[5958816.744013]  [] ? find_get_page+0x1b/0xb0
[5958816.744013]  [] generic_file_aio_read+0x3b8/0x6b0
[5958816.744013]  [] do_sync_read+0xf1/0x140
[5958816.744013]  [] ? do_futex+0xb8/0xb20
[5958816.744013]  [] ? _spin_unlock_irqrestore+0x2f/0x40
[5958816.744013]  [] ? autoremove_wake_function+0x0/0x40
[5958816.744013]  [] ? add_wait_queue+0x43/0x60
[5958816.744013]  [] ? getnstimeofday+0x5c/0xf0
[5958816.744013]  [] vfs_read+0xc8/0x170
[5958816.744013]  [] sys_pread64+0x9a/0xa0
[5958816.744013]  [] system_call_fastpath+0x16/0x1b

Hope this makes sense to anyone, and please let me know if there is more
info I can provide.

/Fredrik

On Sun, Oct 18, 2009 at 5:57 PM, Eric Sandeen wrote:
>
> Fredrik Andersson wrote:
>>
>> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
>> this is the right place to do so.
>>
>> My program creates a big file (around 30 GB) with posix_fallocate (to
>> utilize extents), fills it with data and uses ftruncate to crop it to
>> its final size (usually somewhere between 20 and 25 GB).
>> The problem is that in around 5% of the cases, the program locks up
>> completely in a syscall. The process can thus not be killed even with
>> kill -9, and a reboot is all that will do.
>
> does echo w > /proc/sysrq-trigger (this does sleeping processes; or use
> echo t for all processes) show you where the stuck threads are?
>
> -Eric
>
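
For anyone trying to reproduce this, the workload described in the quoted
report (preallocate with posix_fallocate, fill with data, crop with
ftruncate) could be sketched roughly as below. This is only an
illustration under assumptions: the helper name alloc_fill_truncate, the
filler byte and the buffer size are hypothetical and not taken from the
original program, and the sizes are left to the caller so it can be tried
at small scale before going anywhere near 30 GB.

```c
#define _FILE_OFFSET_BITS 64  /* 64-bit off_t so multi-GB sizes also work on 32-bit builds */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original report): preallocate
 * alloc_size bytes, write final_size bytes of filler data, then crop
 * the file to final_size. alloc_size must be greater than zero.
 * Returns 0 on success, -1 on failure with errno set.
 */
int alloc_fill_truncate(const char *path, off_t alloc_size, off_t final_size)
{
    static char buf[1 << 16];   /* 64 KiB write buffer, size chosen arbitrarily */
    off_t done = 0;
    int fd, err;

    /* Caller contract: the final size must fit inside the preallocation. */
    assert(final_size >= 0 && final_size <= alloc_size);

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;

    /* posix_fallocate returns the error number directly; it does not set errno. */
    err = posix_fallocate(fd, 0, alloc_size);
    if (err != 0)
        goto fail;

    /* Fill the first final_size bytes with an arbitrary byte pattern. */
    memset(buf, 0xAB, sizeof buf);
    while (done < final_size) {
        size_t want = sizeof buf;
        ssize_t n;

        if ((off_t)want > final_size - done)
            want = (size_t)(final_size - done);
        n = write(fd, buf, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything: retry */
            err = errno;
            goto fail;
        }
        done += n;
    }

    /* Crop the file to its final size; the report describes the hang at this step. */
    if (ftruncate(fd, final_size) != 0) {
        err = errno;
        goto fail;
    }
    return close(fd);

fail:
    close(fd);
    errno = err;
    return -1;
}
```

A call such as alloc_fill_truncate("/mnt/test/big.bin", 30LL << 30,
22LL << 30) would approximate the sizes mentioned in the report; the path
is a placeholder. Since the hang was only seen in around 5% of runs, the
call would presumably need to be repeated many times on the affected
raid0/ext4 filesystem to have a chance of triggering it.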