Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756368AbXKNWbq (ORCPT ); Wed, 14 Nov 2007 17:31:46 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752063AbXKNWbh (ORCPT ); Wed, 14 Nov 2007 17:31:37 -0500
Received: from ns2.g-housing.de ([81.169.133.75]:47732 "EHLO mail.g-house.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751702AbXKNWbg (ORCPT ); Wed, 14 Nov 2007 17:31:36 -0500
Date: Wed, 14 Nov 2007 23:31:12 +0100 (CET)
From: Christian Kujau
X-X-Sender: evil@sheep.housecafe.de
To: "J. Bruce Fields"
cc: Benny Halevy, Chris Wedgwood, linux-xfs@oss.sgi.com, LKML
Subject: Re: 2.6.24-rc2 XFS nfsd hang
In-Reply-To: <20071114125907.GB4010@fieldses.org>
Message-ID:
References: <20071114070400.GA25708@puku.stupidest.org>
	<473AA72C.6020308@panasas.com> <20071114125907.GB4010@fieldses.org>
User-Agent: Alpine 0.99999 (DEB 796 2007-11-08)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3159
Lines: 75

On Wed, 14 Nov 2007, J. Bruce Fields wrote:
> On Wed, Nov 14, 2007 at 09:43:40AM +0200, Benny Halevy wrote:
>> I wonder if this is a similar hang to what Christian was seeing here:
>> http://lkml.org/lkml/2007/11/13/319
>
> Ah, thanks for noticing that.  Christian Kujau, is /data an xfs
> partition?

Sorry for the late reply :\

Yes, the nfsd process only got stuck when I did ls(1) (with or without
-l) on an NFS share which contained an XFS partition. I did not care
about the underlying fs at first, so I just ls'ed my shares and noticed
that it got stuck. Now that you mention it, I tried again with a
(git-wise) current 2.6 kernel and the same .config:

http://nerdbynature.de/bits/2.6.24-rc2/nfsd/

Running ls on an ext3 or jfs backed nfs share did succeed, running ls
on an xfs backed nfs share did not.
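
For anyone who wants to repeat the comparison without tying up a shell, here is a minimal sketch of the check described above. It is not what I actually ran (I just typed ls); `ls_completes`, the timeout value and the mount points in the usage part are hypothetical names made up for illustration, and `os.listdir` merely stands in for ls(1).

```python
import subprocess
import sys


def ls_completes(path, timeout_s=10.0):
    """Return True if listing `path` finishes cleanly within `timeout_s` seconds."""
    # List the directory in a child process so that a stuck listing
    # cannot block this script itself.
    cmd = [sys.executable, "-c", "import os, sys; os.listdir(sys.argv[1])", path]
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child in uninterruptible sleep (state D)
        # may not die, so we deliberately do not wait for it here.
        proc.kill()
        return False


if __name__ == "__main__":
    # Hypothetical client-side mount points, one per backing filesystem.
    for mount in ("/mnt/nfs-ext3", "/mnt/nfs-jfs", "/mnt/nfs-xfs"):
        print(mount, "ok" if ls_completes(mount) else "hung or failed")
```

A False result only says the listing did not finish cleanly in time (a missing path also returns False); whether nfsd is really stuck still has to be confirmed on the server, e.g. with sysrq-t.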

The sysrq-t (see dmesg.2.gz please) looks like yours (to my untrained
eye):

nfsd          D c04131c0     0  8535      2
       e7ea97b8 00000046 e7ea9000 c04131c0 e7ea97b8 e697e7e0 00000282 e697e7e8
       e7ea97e4 c0409ebc f71f3500 00000001 f71f3500 c0115540 e697e804 e697e804
       e697e7e0 8f082000 00000001 e7ea97f4 c0409cc2 00000004 00000062 e7ea9800
Nov 14 23:07:14 sheep kernel: [ 1870.124185] Call Trace:
  [] __down+0x7c/0xd0
  [] __down_failed+0xa/0x10
  [] xfs_buf_lock+0x46/0x50
  [] _xfs_buf_find+0xf2/0x190
  [] xfs_buf_get_flags+0x54/0x120
  [] xfs_buf_read_flags+0x1d/0x80
  [] xfs_trans_read_buf+0x4a/0x350
  [] xfs_da_do_buf+0x409/0x760
  [] xfs_da_read_buf+0x2f/0x40
  [] xfs_dir2_leaf_lookup_int+0x172/0x270
  [] xfs_dir2_leaf_lookup+0x1e/0x90
  [] xfs_dir_lookup+0xe4/0x100
  [] xfs_dir_lookup_int+0x2e/0x100
  [] xfs_lookup+0x62/0x90
  [] xfs_vn_lookup+0x34/0x70
  [] __lookup_hash+0xb6/0x100
  [] lookup_one_len+0x4e/0x50
  [] compose_entry_fh+0x59/0x120 [nfsd]
  [] encode_entry+0x329/0x3c0 [nfsd]
  [] nfs3svc_encode_entry_plus+0x3b/0x50 [nfsd]
  [] xfs_dir2_leaf_getdents+0x174/0x900
  [] xfs_readdir+0xba/0xd0
  [] xfs_file_readdir+0x44/0x70
  [] vfs_readdir+0x7e/0xa0
  [] nfsd_readdir+0x73/0xe0 [nfsd]
  [] nfsd3_proc_readdirplus+0xda/0x200 [nfsd]
  [] nfsd_dispatch+0x11b/0x210 [nfsd]
  [] svc_process+0x41c/0x760 [sunrpc]
  [] nfsd+0x164/0x2a0 [nfsd]
  [] kernel_thread_helper+0x7/0x10

>> Any suggestions other than to bisect this?  (Bisection might be
>> painful as it crosses the x86-merge.)

Make that "impossible" for me, as I could not boot the bisected kernel
and marking versions as "bad" for unrelated things seems to invalidate
the results. However, from ~2500 revisions (2.6.24-rc2 to 2.6.23.1) down
to ~20 or so in just 10 builds, that's pretty awesome.

Christian.
-- 
BOFH excuse #321:

Scheduled global CPU outage
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/