From: Neil Brown
Subject: Re: nfsd threads locked, 2.6.7 & ia64
Date: Mon, 28 Jun 2004 12:19:11 +1000
To: Garrick Staples
Cc: nfs@lists.sourceforge.net
Message-ID: <16607.32799.273922.585359@cse.unsw.edu.au>

On Sunday June 27, garrick@usc.edu wrote:
>
> As you may remember, I have a pair of these ia64 machines.  The one with the
> stuck threads is currently only serving 1 xfs filesystem.  The other machine
> currently has 8 ext3 filesystems.

Sounds like the finger is pointing at xfs ....

>
> Below is 3 different traces of nfsd, all 512 threads seem to follow one of
> those 3 patterns.
>
>
> nfsd          D a000000100560810     0  2462      1      2406  2404 (L-TLB)
>
> Call Trace:
>  [] schedule+0xd20/0x12a0
>                                 sp=e0000000059cfaa0 bsp=e0000000059c9290
>  [] __down+0x210/0x320
>                                 sp=e0000000059cfab0 bsp=e0000000059c9230
>  [] linvfs_writev+0x290/0x320 [xfs]
>                                 sp=e0000000059cfae0 bsp=e0000000059c91d8

So this is waiting to "down" inode->i_sem.  Someone must be holding
that semaphore already...
> nfsd          D a000000100560810     0  2400      1      2458  2455 (L-TLB)
>
> Call Trace:
>  [] schedule+0xd20/0x12a0
>                                 sp=e00000003e13fcb0 bsp=e00000003e1391b8
>  [] __down+0x210/0x320
>                                 sp=e00000003e13fcc0 bsp=e00000003e139158
>  [] nfsd_sync+0x240/0x280 [nfsd]
>                                 sp=e00000003e13fcf0 bsp=e00000003e139118

This is waiting on the same semaphore....

>
> nfsd          D a000000100562bd0     0  2455      1      2400  2399 (L-TLB)

This one is holding the semaphore (below)....

> Call Trace:
>  [] schedule+0xd20/0x12a0
>                                 sp=e000000005aa76c0 bsp=e000000005aa18b0
>  [] io_schedule+0x70/0xa0
>                                 sp=e000000005aa76d0 bsp=e000000005aa1898
>  [] __lock_page+0x260/0x2e0
>                                 sp=e000000005aa76d0 bsp=e000000005aa1860

... and it seems to be trying to lock a page.  I wonder why it can't.
Neither of the other threads should be holding a lock on this page.

Maybe if you put all the traces of all threads on a website somewhere,
and send mail to linux-kernel@vger.kernel.org suggesting that you
might have hit an XFS problem.

NeilBrown

>  [] mpage_writepages+0x290/0x700
>                                 sp=e000000005aa7750 bsp=e000000005aa1780
>  [] do_writepages+0xe0/0x100
>                                 sp=e000000005aa7800 bsp=e000000005aa1758
>  [] __filemap_fdatawrite+0x160/0x180
>                                 sp=e000000005aa7800 bsp=e000000005aa1738
>  [] xfs_flush_inode+0x40/0x60 [xfs]
>                                 sp=e000000005aa7880 bsp=e000000005aa1718
>  [] xfs_flush_space+0x1c0/0x200 [xfs]
>                                 sp=e000000005aa7880 bsp=e000000005aa16f0
>  [] xfs_iomap_write_delay+0x560/0x760 [xfs]
>                                 sp=e000000005aa7880 bsp=e000000005aa1618
>  [] xfs_iomap+0x450/0x800 [xfs]
>                                 sp=e000000005aa7930 bsp=e000000005aa15a0
>  [] xfs_bmap+0x40/0x60 [xfs]
>                                 sp=e000000005aa7970 bsp=e000000005aa1558
>  [] linvfs_get_block_core+0xe0/0x5c0 [xfs]
>                                 sp=e000000005aa7970 bsp=e000000005aa14f0
>  [] __block_prepare_write+0x5f0/0xa60
>                                 sp=e000000005aa79b0 bsp=e000000005aa1468
>  [] block_prepare_write+0x40/0xa0
>                                 sp=e000000005aa79e0 bsp=e000000005aa1438
>  [] linvfs_prepare_write+0x90/0xc0 [xfs]
>                                 sp=e000000005aa79e0 bsp=e000000005aa1400
>  [] generic_file_aio_write_nolock+0x810/0x12e0
>                                 sp=e000000005aa79e0 bsp=e000000005aa1308
>  [] xfs_write+0x3e0/0xda0 [xfs]
>                                 sp=e000000005aa7ab0 bsp=e000000005aa1230
>  [] linvfs_writev+0x1b0/0x320 [xfs]
>                                 sp=e000000005aa7ae0 bsp=e000000005aa11d8

***This is where the sem is being held****

>  [] do_readv_writev+0x330/0x500
>                                 sp=e000000005aa7c10 bsp=e000000005aa1170
>  [] nfsd_write+0x1c0/0x7e0 [nfsd]
>                                 sp=e000000005aa7c90 bsp=e000000005aa10f8
>  [] nfsd3_proc_write+0x180/0x260 [nfsd]
>                                 sp=e000000005aa7df0 bsp=e000000005aa10a8
>  [] nfsd_dispatch+0x290/0x540 [nfsd]
>                                 sp=e000000005aa7df0 bsp=e000000005aa1058
>  [] svc_process+0x10a0/0x1380 [sunrpc]
>                                 sp=e000000005aa7df0 bsp=e000000005aa0fe8
>  [] nfsd+0x500/0x9c0 [nfsd]
>                                 sp=e000000005aa7e00 bsp=e000000005aa0ee8
>  [] kernel_thread_helper+0xe0/0x100
>                                 sp=e000000005aa7e30 bsp=e000000005aa0ec0
>  [] start_kernel_thread+0x20/0x40
>                                 sp=e000000005aa7e30 bsp=e000000005aa0ec0
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California