Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S933454AbZKXQTz (ORCPT ); Tue, 24 Nov 2009 11:19:55 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1758095AbZKXQTy (ORCPT ); Tue, 24 Nov 2009 11:19:54 -0500
Received: from lucidpixels.com ([75.144.35.66]:54585 "EHLO lucidpixels.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753306AbZKXQTx (ORCPT ); Tue, 24 Nov 2009 11:19:53 -0500
Date: Tue, 24 Nov 2009 11:20:00 -0500 (EST)
From: Justin Piszcz
To: Eric Sandeen
cc: linux-raid@vger.kernel.org, Alan Piszcz, linux-kernel@vger.kernel.org,
        xfs@oss.sgi.com
Subject: Re: Which kernel options should be enabled to find the root cause of
        this bug?
In-Reply-To: <4B0BF866.7040004@sandeen.net>
References: <4B0BF866.7040004@sandeen.net>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 24 Nov 2009, Eric Sandeen wrote:

> Justin Piszcz wrote:
>>
>>
>> On Sat, 17 Oct 2009, Justin Piszcz wrote:
>>
>>> Hello,
>>>
>>> I have a system I recently upgraded from 2.6.30.x and after
>>> approximately 24-48 hours--sometimes longer, the system cannot write
>>> any more files to disk (luckily though I can still write to /dev/shm)
>>> -- to which I have
>>> saved the sysrq-t and sysrq-w output:
>>>
>>> http://home.comcast.net/~jpiszcz/20091017/sysrq-w.txt
>>> http://home.comcast.net/~jpiszcz/20091017/sysrq-t.txt
>
> Unfortunately it looks like a lot of the sysrq-t, at least, was lost.

Yes, when this occurred the first few times, I could only save what was in
dmesg to the ramdisk; trying to access any file system other than the
ramdisk (tmpfs) at /dev/shm causes the process to lock up.
>
> The sysrq-w trace has the "show blocked state" start a ways down the file,
> for anyone playing along at home ;)
>
> Other things you might try are a sysrq-m to get memory state...

I actually performed most of the useful sysrq-commands, please see the
following:

wget http://home.comcast.net/~jpiszcz/20091018/dmesg.txt
wget http://home.comcast.net/~jpiszcz/20091018/interrupts.txt
wget http://home.comcast.net/~jpiszcz/20091018/sysrq-l.txt
wget http://home.comcast.net/~jpiszcz/20091018/sysrq-m.txt
wget http://home.comcast.net/~jpiszcz/20091018/sysrq-p.txt
wget http://home.comcast.net/~jpiszcz/20091018/sysrq-q.txt
wget http://home.comcast.net/~jpiszcz/20091018/sysrq-t.txt
wget http://home.comcast.net/~jpiszcz/20091018/sysrq-w.txt

>
>>> Configuration:
>>>
>>> $ cat /proc/mdstat
>>> Personalities : [raid1] [raid6] [raid5] [raid4]
>>> md1 : active raid1 sdb2[1] sda2[0]
>>>       136448 blocks [2/2] [UU]
>>>
>>> md2 : active raid1 sdb3[1] sda3[0]
>>>       129596288 blocks [2/2] [UU]
>>>
>>> md3 : active raid5 sdj1[7] sdi1[6] sdh1[5] sdf1[3] sdg1[4] sde1[2] sdd1[1] sdc1[0]
>>>       5128001536 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU]
>>>
>>> md0 : active raid1 sdb1[1] sda1[0]
>>>       16787776 blocks [2/2] [UU]
>>>
>>> $ mount
>>> /dev/md2 on / type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
>>> tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
>>> proc on /proc type proc (rw,noexec,nosuid,nodev)
>>> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
>>> udev on /dev type tmpfs (rw,mode=0755)
>>> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
>>> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
>>> /dev/md1 on /boot type ext3 (rw,noatime)
>>> /dev/md3 on /r/1 type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
>>> rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
>>> nfsd on /proc/fs/nfsd type nfsd (rw)
>
> Do you get the same behavior if you don't add the log options at mount time?
I have not tried disabling the log options, although they have been in
effect for a long time (logbufs and logbsize, and more recently nobarrier).

Could there be an issue using -o nobarrier on raid1+xfs?