Date: Mon, 13 Apr 2009 14:35:49 -0600
From: Daniel Stickney
To: Chuck Lever
Cc: Rudy@grumpydevil.homelinux.org, linux-nfs@vger.kernel.org
Subject: Re: Unexplained NFS mount hangs

On Mon, 13 Apr 2009 13:08:14 -0400
Chuck Lever wrote:

>
> On Apr 13, 2009, at 12:47 PM, Daniel Stickney wrote:
>
> > On Mon, 13 Apr 2009 12:12:47 -0400
> > Chuck Lever wrote:
> >
> >> On Apr 13, 2009, at 11:24 AM, Daniel Stickney wrote:
> >>> Hi all,
> >>>
> >>> I am investigating some NFS mount hangs that we have started to
> >>> see over the past month on some of our servers. The behavior is
> >>> that the client mount hangs and needs to be manually unmounted
> >>> (forcefully, with 'umount -f') and remounted to make it work.
> >>> There are about 85 clients mounting a partition over NFS. About
> >>> 50 of the clients are running Fedora Core 3 with kernel
> >>> 2.6.11-1.27_FC3smp. Not one of these 50 has ever had this mount
> >>> hang. The other 35 are CentOS 5.2 with kernel 2.6.27, which was
> >>> compiled from source. The mount hangs are inconsistent, and so
> >>> far I don't know how to trigger them on demand. The timing of the
> >>> hangs, as noted by the timestamps in /var/log/messages, varies.
> >>> Not all of the 35 CentOS clients have their mounts hang at the
> >>> same time, and the NFS server continues operating apparently
> >>> normally for all other clients. Normally maybe 5 clients have a
> >>> mount hang per week, on different days, mostly at different
> >>> times. Now and then we might see a cluster of a few clients have
> >>> their mounts hang at exactly the same time, but this is not
> >>> consistent. In /var/log/messages we see:
> >>>
> >>> Apr 12 02:04:12 worker120 kernel: nfs: server broker101 not
> >>> responding, still trying
> >>
> >> Are these NFS/UDP or NFS/TCP mounts?
> >>
> >> If you use a different kernel (say, 2.6.26) on the CentOS systems,
> >> do the hangs go away?
> >
> > Hi Chuck,
> >
> > Thanks for your reply. The mounts are NFSv3 over TCP. We have not
> > tried a different kernel (because of the number of servers to be
> > upgraded), but that is next on the ToDo list. We wanted to explore
> > the possibility that some other change might resolve the issue,
> > but I am getting close to launching the kernel upgrades. (The
> > prepackaged RHEL/CentOS 2.6.18* kernels have other NFS client
> > problems with attribute caching which really mess things up, which
> > is why we have had to compile from source.)
> >
> > To add a little more info: in a post on April 10th titled "NFSv3
> > Client Timeout on 2.6.27", Bryan mentioned that his client socket
> > was in state FIN_WAIT2 and the server's in CLOSE_WAIT, which is
> > exactly what I am seeing here.
> >
> > tcp    0    0 worker120.cluster:944    broker101.cluster:nfs    FIN_WAIT2
> >
> > This is especially interesting because the original nfs "server
> > not responding" message was about 32 hours ago. On this same
> > client, all other NFS mounts to other servers are showing state
> > "established".
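
[Adding a bit of detail to my own text above: for anyone who wants to check
the same thing on their clients, something along these lines should show the
state of the TCP socket to the affected server (the grep on the server
hostname is only there to filter the output, so adjust it for your setup):

# netstat -t | grep broker101
tcp    0    0 worker120.cluster:944    broker101.cluster:nfs    FIN_WAIT2

The corresponding socket on the server end is the one sitting in CLOSE_WAIT,
as in Bryan's report.]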
>
> Poking around in git, I see this recent commit:
>
> commit 2a9e1cfa23fb62da37739af81127dab5af095d99
> Author: Trond Myklebust
> Date:   Tue Oct 28 15:21:39 2008 -0400
>
>     SUNRPC: Respond promptly to server TCP resets
>
>     If the server sends us an RST error while we're in the TCP_ESTABLISHED
>     state, then that will not result in a state change, and so the RPC
>     client ends up hanging forever (see
>     http://bugzilla.kernel.org/show_bug.cgi?id=11154)
>
>     We can intercept the reset by setting up an sk->sk_error_report
>     callback, which will then allow us to initiate a proper shutdown and
>     retry...
>
>     We also make sure that if the send request receives an ECONNRESET,
>     then we shutdown too...
>
>     Signed-off-by: Trond Myklebust
>
> Which may address part of the problem. If I'm reading the output of
> "git describe" correctly, this one should be in 2.6.28.
>
> There are a whole series of commits in this area that went upstream
> about a month ago. It's not clear if these are also necessary to
> address the problem. But they would be in 2.6.30-rc1.
>

Thanks Chuck. Reading through the bug reports, I also now believe that
a kernel upgrade is the most likely resolution to this problem. I
appreciate your time!
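
For anyone else following this thread, the first release containing that
commit can be double-checked from a clone of Linus's tree with something
like:

$ git describe --contains 2a9e1cfa23fb62da37739af81127dab5af095d99

which should print the earliest tag that contains the commit (a 2.6.28-rc
tag, if Chuck's reading of "git describe" is right), so 2.6.28 or newer
looks like the kernel to aim for.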
-Daniel

> > -Daniel
> >
> >>
> >>> One very interesting aspect of this behavior is that the load
> >>> value on the client with the hung mount immediately spikes to
> >>> (16.00 + normal load value). We have also seen client load spikes
> >>> to (30.00 + normal load value). These discrete load value
> >>> increases might be a good hint.
> >>>
> >>> Running 'df' prints some output and then hangs when it reaches
> >>> the hung mount point. 'mount -v' shows the mount point like
> >>> normal. When an NFS server is rebooted, we are used to seeing the
> >>> client log an "nfs: server ___________ not responding, still
> >>> trying" message, then an "nfs: server __________ OK" message when
> >>> it comes back online. With this issue there is never an "OK"
> >>> message, even though the NFS server is still functioning for all
> >>> other NFS clients. On a client which has a hung NFS mount,
> >>> running 'rpcinfo -p' and 'showmount -e' against the NFS server
> >>> shows that RPC and NFS appear to be functioning between client
> >>> and server even during the issue.
> >>>
> >>>
> >>> # rpcinfo -p broker101
> >>>    program vers proto   port
> >>>     100000    2   tcp    111  portmapper
> >>>     100000    2   udp    111  portmapper
> >>>     100021    1   udp  32779  nlockmgr
> >>>     100021    3   udp  32779  nlockmgr
> >>>     100021    4   udp  32779  nlockmgr
> >>>     100021    1   tcp  60389  nlockmgr
> >>>     100021    3   tcp  60389  nlockmgr
> >>>     100021    4   tcp  60389  nlockmgr
> >>>     100011    1   udp    960  rquotad
> >>>     100011    2   udp    960  rquotad
> >>>     100011    1   tcp    963  rquotad
> >>>     100011    2   tcp    963  rquotad
> >>>     100003    2   udp   2049  nfs
> >>>     100003    3   udp   2049  nfs
> >>>     100003    4   udp   2049  nfs
> >>>     100003    2   tcp   2049  nfs
> >>>     100003    3   tcp   2049  nfs
> >>>     100003    4   tcp   2049  nfs
> >>>     100005    1   udp    995  mountd
> >>>     100005    1   tcp    998  mountd
> >>>     100005    2   udp    995  mountd
> >>>     100005    2   tcp    998  mountd
> >>>     100005    3   udp    995  mountd
> >>>     100005    3   tcp    998  mountd
> >>>
> >>>
> >>> # showmount -e broker101
> >>> Export list for broker101:
> >>> /mnt/sdc1 *
> >>> /mnt/sdb1 *
> >>>
> >>>
> >>> It is confusing that the NFS client doesn't recover
> >>> automatically. Whatever the issue is, it is evidently blocking
> >>> the kernel from seeing that the NFS server is live and
> >>> functioning after the issue is triggered.
> >>>
> >>> I'm running low on ideas of how to resolve this. One idea I have
> >>> is to modify some NFS client timeout values, but I don't have a
> >>> specific reason to think this will resolve the problem. Right now
> >>> the values are:
> >>>
> >>> # sysctl -a | grep -i nfs
> >>> fs.nfs.nlm_grace_period = 0
> >>> fs.nfs.nlm_timeout = 10
> >>> fs.nfs.nlm_udpport = 0
> >>> fs.nfs.nlm_tcpport = 0
> >>> fs.nfs.nsm_use_hostnames = 0
> >>> fs.nfs.nsm_local_state = 0
> >>> fs.nfs.nfs_callback_tcpport = 0
> >>> fs.nfs.idmap_cache_timeout = 600
> >>> fs.nfs.nfs_mountpoint_timeout = 500
> >>> fs.nfs.nfs_congestion_kb = 65152
> >>> sunrpc.nfs_debug = 0
> >>> sunrpc.nfsd_debug = 0
> >>>
> >>>
> >>> I've turned on NFS debugging, but there was a tremendous amount
> >>> of output because of the NFS client's activity on several
> >>> different (and working) NFS mount points. I can capture and
> >>> supply this output again if it would be helpful. Has anyone seen
> >>> this behavior before, and does anyone have any suggestions for
> >>> how this might be resolved?
> >>>
> >>> Thanks for your time,
> >>>
> >>> Daniel Stickney
> >>> Operations Manager - Systems and Network Engineer
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> > Daniel Stickney
> > Operations Manager - Systems and Network Engineer
>

Daniel Stickney
Operations Manager - Systems and Network Engineer
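
P.S. Regarding the NFS debug output mentioned above: rather than flooding
the logs from every mount point again, next time a hang shows up I could
limit the logging to just the RPC transport layer, since the stuck TCP
socket is the suspect here. Assuming the rpcdebug utility from nfs-utils is
available on the client, something along these lines should keep the
capture manageable (the xprt and trans flags are my guess at the relevant
transport/socket activity):

# rpcdebug -m rpc -s xprt trans     (enable transport-level RPC debugging)
  ... wait for or reproduce the hang ...
# rpcdebug -m rpc -c all            (switch RPC debugging back off)

If that smaller capture would still be useful, I am happy to post it.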