Date: Sat, 17 Oct 2009 18:34:57 -0400 (EDT)
From: Justin Piszcz
To: linux-kernel@vger.kernel.org, linux-raid@vger.kernel.org, xfs@oss.sgi.com
Cc: Alan Piszcz
Subject: 2.6.31+2.6.31.4: XFS - All I/O locks up to D-state after 24-48 hours (sysrq-t+w available)

Hello,

I have a system I recently upgraded from 2.6.30.x. After approximately
24-48 hours (sometimes longer), the system can no longer write any files
to disk. Luckily I can still write to /dev/shm, which is where I saved
the sysrq-t and sysrq-w output:

http://home.comcast.net/~jpiszcz/20091017/sysrq-w.txt
http://home.comcast.net/~jpiszcz/20091017/sysrq-t.txt

Configuration:

$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
      136448 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
      129596288 blocks [2/2] [UU]

md3 : active raid5 sdj1[7] sdi1[6] sdh1[5] sdf1[3] sdg1[4] sde1[2] sdd1[1] sdc1[0]
      5128001536 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
      16787776 blocks [2/2] [UU]

$ mount
/dev/md2 on / type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/md1 on /boot type ext3 (rw,noatime)
/dev/md3 on /r/1 type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

Distribution: Debian Testing
Arch: x86_64

The problem occurs with 2.6.31; I upgraded to 2.6.31.4 and it persists.

Here is a snippet of two processes in D-state; the first (pickup) was not
doing anything, the second (rateup) is part of mrtg.

[121444.684000] pickup        D 0000000000000003     0 18407   4521 0x00000000
[121444.684000]  ffff880231dd2290 0000000000000086 0000000000000000 0000000000000000
[121444.684000]  000000000000ff40 000000000000c8c8 ffff880176794d10 ffff880176794f90
[121444.684000]  000000032266dd08 ffff8801407a87f0 ffff8800280878d8 ffff880176794f90
[121444.684000] Call Trace:
[121444.684000]  [] ? free_pages_and_swap_cache+0x9d/0xc0
[121444.684000]  [] ? __mutex_lock_slowpath+0xd6/0x160
[121444.684000]  [] ? mutex_lock+0x1a/0x40
[121444.684000]  [] ? generic_file_llseek+0x2f/0x70
[121444.684000]  [] ? sys_lseek+0x7e/0x90
[121444.684000]  [] ? sys_munmap+0x52/0x80
[121444.684000]  [] ? system_call_fastpath+0x16/0x1b

[121444.684000] rateup        D 0000000000000000     0 18538  18465 0x00000000
[121444.684000]  ffff88023f8a8c10 0000000000000082 0000000000000000 ffff88023ea09ec8
[121444.684000]  000000000000ff40 000000000000c8c8 ffff88023faace50 ffff88023faad0d0
[121444.684000]  0000000300003e00 000000010720cc78 0000000000003e00 ffff88023faad0d0
[121444.684000] Call Trace:
[121444.684000]  [] ? xfs_buf_iorequest+0x42/0x90
[121444.684000]  [] ? xlog_bdstrat_cb+0x3d/0x50
[121444.684000]  [] ? xlog_sync+0x20b/0x4e0
[121444.684000]  [] ? xlog_state_sync+0x26c/0x2a0
[121444.684000]  [] ? default_wake_function+0x0/0x10
[121444.684000]  [] ? _xfs_log_force+0x51/0x80
[121444.684000]  [] ? xfs_log_force+0xb/0x40
[121444.684000]  [] ? xfs_alloc_ag_vextent+0x123/0x130
[121444.684000]  [] ? xfs_alloc_vextent+0x368/0x4b0
[121444.684000]  [] ? xfs_bmap_btalloc+0x598/0xa40
[121444.684000]  [] ? xfs_bmapi+0x9e2/0x11a0
[121444.684000]  [] ? xlog_grant_push_ail+0x30/0xf0
[121444.684000]  [] ? xfs_trans_reserve+0xa8/0x220
[121444.684000]  [] ? xfs_iomap_write_allocate+0x23e/0x3b0
[121444.684000]  [] ? __xfs_get_blocks+0x8f/0x220
[121444.684000]  [] ? xfs_iomap+0x2c0/0x300
[121444.684000]  [] ? __set_page_dirty+0x66/0xd0
[121444.684000]  [] ? xfs_map_blocks+0x25/0x30
[121444.684000]  [] ? xfs_page_state_convert+0x414/0x6c0
[121444.684000]  [] ? xfs_vm_writepage+0x77/0x130
[121444.684000]  [] ? __writepage+0xa/0x40
[121444.684000]  [] ? write_cache_pages+0x1df/0x3c0
[121444.684000]  [] ? __writepage+0x0/0x40
[121444.684000]  [] ? do_sync_write+0xe3/0x130
[121444.684000]  [] ? do_writepages+0x20/0x40
[121444.684000]  [] ? __filemap_fdatawrite_range+0x4d/0x60
[121444.684000]  [] ? xfs_flush_pages+0xad/0xc0
[121444.684000]  [] ? xfs_release+0x167/0x1d0
[121444.684000]  [] ? xfs_file_release+0x10/0x20
[121444.684000]  [] ? __fput+0xcd/0x1e0
[121444.684000]  [] ? filp_close+0x56/0x90
[121444.684000]  [] ? sys_close+0xa6/0x100
[121444.684000]  [] ? system_call_fastpath+0x16/0x1b

Anyone know what is going on here?

Justin.
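P.S. For anyone else chasing a hang like this: tasks stuck in uninterruptible sleep can also be listed without sysrq, using only ps. A minimal sketch (the exact output format here is my own choice, not part of the dumps above):

```shell
#!/bin/sh
# List processes currently in uninterruptible sleep.
# ps reports a one-letter state code per process; "D" means
# uninterruptible sleep, usually a task waiting on disk I/O.
ps -eo state,pid,comm | awk '$1 ~ /^D/ { print $2, $3 }'
```

On a system wedged like the one above, this would be expected to accumulate more and more entries (pickup, rateup, ...) as writers pile up behind the stuck log I/O.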
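Since the full sysrq-w dump linked above is long, a small filter can summarize which commands are blocked. This is a sketch that assumes the task-header line format visible in the snippet above (command name in field 2, "D" state flag in field 3); the script name is hypothetical:

```shell
#!/bin/sh
# summarize-blocked.sh (hypothetical name): count blocked tasks per
# command name in a saved sysrq-w/t dump. Task header lines look like:
#   [121444.684000] pickup D 0000000000000003 0 18407 4521 0x00000000
# so field 2 is the command and field 3 is the "D" state.
awk '$3 == "D" { print $2 }' "$1" | sort | uniq -c | sort -rn
```

Usage would be e.g. `./summarize-blocked.sh sysrq-w.txt`, giving one count per blocked command.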