From: Chris Caputo
To: Trond Myklebust
Cc: Frank Filz, Josh Boyer, nfs@lists.sourceforge.net
Subject: Re: NFS hang
Date: Fri, 1 Dec 2006 09:14:46 +0000 (GMT)
In-Reply-To: <1164948671.5761.12.camel@lade.trondhjem.org>

On Thu, 30 Nov 2006, Trond Myklebust wrote:
> On Fri, 2006-12-01 at 04:20 +0000, Chris Caputo wrote:
> > On Thu, 30 Nov 2006, Chris Caputo wrote:
> > > I am not sure if this is related, but at just over 3 days of uptime with
> > > 2.6.19-rc6 and the Saout patch, I had this happen:
> > >
> > > ---
> > > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000028
> > >  printing eip:
> > > c029cf64
> > > EIP is at call_start+0x5c/0x6f
> >
> > I believe the above is the following line in clnt.c:call_start():
> >
> >   clnt->cl_stats->rpccnt++;
> >
> > So call_start() was called with a NULL task->tk_client.
> >
> > Similar result with 2.6.19...
> >
> > BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
> >  printing eip:
> > c029ef8e
> > EIP is at xprt_reserve+0x28/0x119
> >
> > In this one xprt.c:xprt_reserve() is I believe crashing at:
> >
> >   spin_lock(&xprt->reserve_lock);
> >
> > due to a NULL task->tk_xprt.
> >
> > Ideas on whether this is related to the Saout sched race patch or if this
> > is something else entirely?
>
> I suspect that it is related, but not the same race. I've identified a
> couple of other possible race conditions, mainly to do with the fact
> that nothing prevents an rpc_task from being freed while you are inside
> rpc_wake_up_task(). Could you try applying both the attached patches,
> and see if that helps?

I compiled 2.6.19 with your patches in addition to the Saout patch, and a
test is now in progress.

Thanks,
Chris
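
P.S. For anyone following the reasoning above about the fault addresses: a NULL struct pointer dereference faults at an address equal to the offset of the member being read, which is why 00000028 and 00000008 point at a member access rather than at address zero itself. Below is a minimal sketch of that check. The toy_clnt and toy_stats structs are hypothetical stand-ins with padding chosen so the member lands at 0x28; they are not the real rpc_clnt layout, and the real offset of cl_stats in 2.6.19 is an assumption taken from the oops, not something verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; NOT the kernel's layouts. */
struct toy_stats {
    uint32_t rpccnt;
};

struct toy_clnt {
    uint32_t pad[10];            /* 40 bytes (0x28) of placeholder fields */
    struct toy_stats *cl_stats;  /* assumed to sit at offset 0x28 */
};

/* Compile-time check that the toy layout matches the assumed offset. */
_Static_assert(offsetof(struct toy_clnt, cl_stats) == 0x28,
               "toy layout must place cl_stats at 0x28");

/* Address that a load of base->member touches, given the base pointer value. */
static uintptr_t member_address(uintptr_t base, size_t member_offset)
{
    return base + member_offset;
}

/* Returns 1 if a fault at fault_addr is what a NULL base pointer would
 * produce when reading the member at member_offset, 0 otherwise. */
static int fault_consistent_with_null_base(uintptr_t fault_addr,
                                           size_t member_offset)
{
    return member_address(0, member_offset) == fault_addr;
}
```

With the real headers, the same comparison would use offsetof(struct rpc_clnt, cl_stats) against 0x28 and offsetof(struct rpc_xprt, reserve_lock) against 0x08; a match would support the NULL tk_client and NULL tk_xprt readings, but would not prove them.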