From: Ian Campbell
To: Trond Myklebust
Cc: linux-nfs@vger.kernel.org, Max Kellermann, linux-kernel@vger.kernel.org, gcosta@redhat.com, Grant Coady, "J. Bruce Fields", Tom Tucker
Subject: Re: [PATCH] NFS regression in 2.6.26?, "task blocked for more than 120 seconds"
Date: Tue, 25 Nov 2008 13:38:59 +0000
Message-Id: <1227620339.9425.99.camel@zakaz.uk.xensource.com>
In-Reply-To: <1227619696.7057.19.camel@heimdal.trondhjem.org>

On Tue, 2008-11-25 at 08:28 -0500, Trond Myklebust wrote:
> On Tue, 2008-11-25 at 07:09 +0000, Ian Campbell wrote:
> > On Sat, 2008-11-01 at 09:41 -0400, Trond Myklebust wrote:
> > > On Sat, 2008-11-01 at 11:45 +0000, Ian Campbell wrote:
> > > > On Mon, 2008-10-20 at 07:27 +0100, Ian Campbell wrote:
> > > > > So far I have bisected down to this range and am currently testing
> > > > > acee478 which has been up for >4days.
> > > >
> > > > Another update. It has now bisected down to a small range
> > > >
> > > > 7272dcd31d56580dee7693c21e369fd167e137fe SUNRPC: xprt_autoclose() should not call xprt_disconnect()
> > > > e06799f958bf7f9f8fae15f0c6f519953fb0257c SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> > > > ef80367071dce7d2533e79ae8f3c84ec42708dc8 SUNRPC: TCP clear XPRT_CLOSE_WAIT when the socket is closed for writes
> > > > 3b948ae5be5e22532584113e2e02029519bbad8f SUNRPC: Allow the client to detect if the TCP connection is closed
> > > > 67a391d72ca7efb387c30ec761a487e50a3ff085 SUNRPC: Fix TCP rebinding logic
> > > > 66af1e558538137080615e7ad6d1f2f80862de01 SUNRPC: Fix a race in xs_tcp_state_change()
> > > >
> > > > I'm currently testing 3b948ae5be5e22532584113e2e02029519bbad8f.
> > > >
> > > > 7272dcd31d56580dee7693c21e369fd167e137fe repro'd in half a day while
> > > > ef818a28fac9bd214e676986d8301db0582b92a9 (parent of
> > > > 66af1e558538137080615e7ad6d1f2f80862de01) survived for 7 days.
> >
> > According to bisect:
> > e06799f958bf7f9f8fae15f0c6f519953fb0257c is first bad commit
> > commit e06799f958bf7f9f8fae15f0c6f519953fb0257c
> > Author: Trond Myklebust
> > Date:   Mon Nov 5 15:44:12 2007 -0500
> >
> >     SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket
> >
> >     By using shutdown() rather than close() we allow the RPC client to wait
> >     for the TCP close handshake to complete before we start trying to reconnect
> >     using the same port.
> >     We use shutdown(SHUT_WR) only instead of shutting down both directions,
> >     however we wait until the server has closed the connection on its side.
> >
> >     Signed-off-by: Trond Myklebust
> >
> > I've started testing 2.6.26 + revert. It's been a long while since I
> > started this process so I'll also have a go at an up to date version.
> >
> > Cheers,
>
> That would indicate that the server is failing to close the TCP
> connection when the client closes on its end.
>
> Could you remind me what server you are using?

2.6.25-2-486, which is a Debian package from backports.org; its changelog
indicates that it contains 2.6.25.7.

> Also, does 'netstat -t'
> show connections that are stuck in the CLOSE_WAIT state when you see the
> hang?

I'd have to wait for it to reproduce again to be 100% sure, but according
to http://lkml.indiana.edu/hypermail/linux/kernel/0808.3/0120.html I was
seeing connections in FIN_WAIT2 but not CLOSE_WAIT.

Ian.
-- 
Ian Campbell
Current Noise: Diamond Head - It's Electric

"The only real way to look younger is not to be born so soon."
		-- Charles Schulz, "Things I've Had to Learn Over and Over and Over"
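The half-close described in the quoted commit, and the states Trond asks about, can be reproduced in user space on a single Linux machine. The following is a minimal Python sketch and is not the SUNRPC code. It assumes Linux, because it parses /proc/net/tcp, the same table netstat -t reads. The helper names tcp_state, wait_for_state and half_close_demo are hypothetical. The client calls shutdown(SHUT_WR), and the server reads EOF but never closes its socket. As a result, the client should end up in FIN_WAIT2 and the server side in CLOSE_WAIT, which is the pairing Trond's question probes for.

```python
import socket
import time

# Hex codes in the "st" column of /proc/net/tcp (Linux tcp_states.h order).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def tcp_state(local_port, remote_port, table="/proc/net/tcp"):
    """Hypothetical helper: state of the IPv4 TCP socket with these ports, or None."""
    with open(table) as f:
        next(f)  # skip the column header line
        for line in f:
            fields = line.split()
            local, remote, st = fields[1], fields[2], fields[3]
            # Ports are printed in hex after the colon; the address part is
            # ignored here to sidestep byte-order questions.
            if (int(local.split(":")[1], 16) == local_port
                    and int(remote.split(":")[1], 16) == remote_port):
                return TCP_STATES.get(st, st)
    return None


def wait_for_state(local_port, remote_port, want, timeout=2.0):
    """Poll until the socket reaches `want` (e.g. after a delayed ACK) or time out."""
    deadline = time.monotonic() + timeout
    state = tcp_state(local_port, remote_port)
    while state != want and time.monotonic() < deadline:
        time.sleep(0.01)
        state = tcp_state(local_port, remote_port)
    return state


def half_close_demo():
    """Client half-closes with shutdown(SHUT_WR); the server sees EOF but
    deliberately leaves its end of the connection open."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        server.settimeout(2.0)
        client.shutdown(socket.SHUT_WR)  # send FIN; the client can still read
        eof = server.recv(1)             # b"" once the server has seen the FIN
        cport = client.getsockname()[1]
        sport = server.getsockname()[1]
        states = (wait_for_state(cport, sport, "FIN_WAIT2"),
                  wait_for_state(sport, cport, "CLOSE_WAIT"))
    finally:
        for s in (server, client, listener):
            s.close()
    return eof, states


if __name__ == "__main__":
    print(half_close_demo())
```

In this sketch the program still holds the client socket open, so the kernel leaves it in FIN_WAIT2 for as long as the server does not close. net.ipv4.tcp_fin_timeout only bounds FIN_WAIT2 for orphaned sockets that no application references any more.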