From: Trond Myklebust
Subject: Re: NFS hang
Date: Mon, 27 Nov 2006 16:54:13 -0500
Message-ID: <1164664453.5727.44.camel@lade.trondhjem.org>
References: <1162840599.31460.8.camel@zod.rchland.ibm.com> <1164655027.5727.5.camel@lade.trondhjem.org> <1164657487.5727.12.camel@lade.trondhjem.org> <1164663614.10787.21.camel@zod.rchland.ibm.com>
In-Reply-To: <1164663614.10787.21.camel@zod.rchland.ibm.com>
To: Josh Boyer
Cc: Chris Caputo, Frank Filz, nfs@lists.sourceforge.net
List-Id: "Discussion of NFS under Linux development, interoperability, and testing."

On Mon, 2006-11-27 at 15:40 -0600, Josh Boyer wrote:
> On Mon, 2006-11-27 at 21:22 +0000, Chris Caputo wrote:
> > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > On Mon, 2006-11-27 at 19:33 +0000, Chris Caputo wrote:
> > > > On Mon, 27 Nov 2006, Trond Myklebust wrote:
> > > > > On Mon, 2006-11-27 at 19:09 +0000, Chris Caputo wrote:
> > > > > > -		if (!RPC_IS_QUEUED(task))
> > > > > > -			continue;
> > > > > > -		rpc_clear_running(task);
> > > > > > +		queue = task->u.tk_wait.rpc_waitq;
> > > > >
> > > > > NACK... There is no guarantee that task->u.tk_wait has any meaning here.
> > > > > Particularly not so in the case of an asynchronous task, where the
> > > > > storage is shared with the work_struct.
> > > >
> > > > Yikes. Would you suggest I move the lock outside of the union and try
> > > > again?
> > >
> > > No. There is no way this can work. You would need something that
> > > guarantees that the task stays queued while you are taking the queue
> > > lock.
> > >
> > > Have you instead tried Christophe Saout's patch (see attachment)?
> >
> > Thank you for the suggestion. With 65 minutes of uptime so far, Saout's
> > November 5th patch is looking good. For reference, normally I see the
> > race happen in under 15 minutes.
> >
> > I'll report back if any problems develop. This machine is an outgoing
> > newsfeed server and so it pounds on NFS client routines 24x7.
>
> Would the race condition that Chris described potentially lead to the
> stack trace I originally posted? If so, I can try to test this patch
> out myself.

Possibly. It is worth trying.

Trond

_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs