2010-06-09 00:20:53

by Murata, Dennis

Subject: FW: Unable to mount nfs directories RHEL 4.8

I didn't see the original message come through; sorry if this is a duplicate.

-----Original Message-----
From: Murata, Dennis
Sent: Tuesday, June 08, 2010 3:26 PM
To: [email protected]
Subject: Unable to mount nfs directories RHEL 4.8

We are using a modified RHEL 4.8 build that accesses NetApp filers for its
data directories. The build has nfs-utils-1.0.6-93.EL4,
nfs-utils-lib-1.0.6-10.el4, and kernel-largesmp-2.6.9-89.EL, all x86_64.
After a period of use, on a very questionable network with TCP as the
NFS transport, workstations start logging error messages in
/var/log/messages and dmesg and can no longer mount or access the data
directories. A reboot is needed before the mounts work again. The time
to failure varies and seems to depend on usage, but in general the
errors start within a week of moderate use. The error messages are:

lockd: cannot monitor 192.168.10.133
lockd: failed to monitor 192.168.10.133
nsm_mon_unmon: rpc failed, status=-96
lockd: cannot monitor 192.168.10.133
lockd: failed to monitor 192.168.10.133
nsm_mon_unmon: rpc failed, status=-96
lockd: cannot monitor 192.168.10.133
lockd: failed to monitor 192.168.10.133

These errors repeat each time access to the filer is attempted (the IP
address shown has been changed). ps on the workstation shows rpc.statd
still running, and "service nfslock status" also reports rpc.statd as
running.

If the nfs-utils rpm from RHEL 4.6 (nfs-utils-1.0.6-84.EL4) is installed
instead, the problem does not occur. We have been testing a few
workstations that way for about two weeks without a problem. Any ideas?

TIA
Wayne


2010-06-09 16:19:06

by Chuck Lever

Subject: Re: FW: Unable to mount nfs directories RHEL 4.8

On 06/ 8/10 08:05 PM, Murata, Dennis wrote:
> Didn't see the original message, sorry if this is a duplicate
>
> -----Original Message-----
> From: Murata, Dennis
> Sent: Tuesday, June 08, 2010 3:26 PM
> To: [email protected]
> Subject: Unable to mount nfs directories RHEL 4.8
>
> We are using a modified RHEL 4.8 build accessing Netapp filers for data
> directories. The build has nfs-utils-1.0.6-93.EL4,
> nfs-utils-lib-1.0.6-10.el4, kernel-largesmp-2.6.9-89.EL all x86_64.
> After a period of use, on a very questionable network using tcp as the
> nfs transport, workstation will start getting error messages in
> /var/log/messages|dmesg and are not able to mount/access the data
> directories. A reboot is necessary to allow the mounts. The time
> period varies and seems to depend on the usage, but in general will
> start within a week of moderate use. The error messages are:
>
> lockd: cannot monitor 192.168.10.133
> lockd: failed to monitor 192.168.10.133
> nsm_mon_unmon: rpc failed, status=-96
> lockd: cannot monitor 192.168.10.133
> lockd: failed to monitor 192.168.10.133
> nsm_mon_unmon: rpc failed, status=-96
> lockd: cannot monitor 192.168.10.133
> lockd: failed to monitor 192.168.10.133
>
> These errors are repeated as access to the filer (ip address has been
> changed) is tried. A ps on the workstation shows rpc.statd still
> running, service nfslock status reports rpc.statd running.

Is portmap running, and is the statd service registered? Is lockd
registered for both UDP and TCP?

> If the nfs-utils rpm from RHEL 4.6 is loaded, nfs-utils-1.0.6-84.EL4,
> the problem does not occur. We have been testing a few workstations for
> ~two weeks without problem. Any ideas?
>
> TIA
> Wayne
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2010-06-09 18:47:26

by Murata, Dennis

Subject: RE: FW: Unable to mount nfs directories RHEL 4.8



> -----Original Message-----
> From: Chuck Lever [mailto:[email protected]]
> Sent: Wednesday, June 09, 2010 11:18 AM
> To: Murata, Dennis
> Cc: [email protected]
> Subject: Re: FW: Unable to mount nfs directories RHEL 4.8
>
> On 06/ 8/10 08:05 PM, Murata, Dennis wrote:
> > Didn't see the original message, sorry if this is a duplicate
> >
> > -----Original Message-----
> > From: Murata, Dennis
> > Sent: Tuesday, June 08, 2010 3:26 PM
> > To: [email protected]
> > Subject: Unable to mount nfs directories RHEL 4.8
> >
> > We are using a modified RHEL 4.8 build accessing Netapp filers for
> > data directories. The build has nfs-utils-1.0.6-93.EL4,
> > nfs-utils-lib-1.0.6-10.el4, kernel-largesmp-2.6.9-89.EL all x86_64.
> > After a period of use, on a very questionable network using tcp as the
> > nfs transport, workstation will start getting error messages in
> > /var/log/messages|dmesg and are not able to mount/access the data
> > directories. A reboot is necessary to allow the mounts. The time
> > period varies and seems to depend on the usage, but in general will
> > start within a week of moderate use. The error messages are:
> >
> > lockd: cannot monitor 192.168.10.133
> > lockd: failed to monitor 192.168.10.133
> > nsm_mon_unmon: rpc failed, status=-96
> > lockd: cannot monitor 192.168.10.133
> > lockd: failed to monitor 192.168.10.133
> > nsm_mon_unmon: rpc failed, status=-96
> > lockd: cannot monitor 192.168.10.133
> > lockd: failed to monitor 192.168.10.133
> >
> > These errors are repeated as access to the filer (ip address has been
> > changed) is tried. A ps on the workstation shows rpc.statd still
> > running, service nfslock status reports rpc.statd running.
>
> Is portmap running, and is the statd service registered? Is
> lockd registered for both UDP and TCP?
Portmap is running. I'm not sure how to check whether lockd is
registered, but here is the output from rpcinfo -p:
[root@host1 ~]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100007    2   udp    880  ypbind
    100007    1   udp    880  ypbind
    100007    2   tcp    883  ypbind
    100007    1   tcp    883  ypbind
    100011    1   udp    948  rquotad
    100011    2   udp    948  rquotad
    100011    1   tcp    963  rquotad
    100011    2   tcp    963  rquotad
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100021    1   udp  34574  nlockmgr
    100021    3   udp  34574  nlockmgr
    100021    4   udp  34574  nlockmgr
    100021    1   tcp  32786  nlockmgr
    100021    3   tcp  32786  nlockmgr
    100021    4   tcp  32786  nlockmgr
    100005    1   udp    962  mountd
    100005    1   tcp    974  mountd
    100005    2   udp    962  mountd
    100005    2   tcp    974  mountd
    100005    3   udp    962  mountd
    100005    3   tcp    974  mountd
    100024    1   udp    744  status
    100024    1   tcp    750  status
[root@host1 ~]#
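
One way to answer the registration question from a listing like the one
above is to parse it. The following is a minimal Python sketch and not part
of the original exchange: the names parse_rpcinfo and missing_registrations
are hypothetical, and SAMPLE is a subset of the listing above. It checks
that the NSM status service (program 100024) and the lock manager (program
100021) each appear for both udp and tcp.

```python
# Sketch: check `rpcinfo -p` output for statd and lockd registrations.
# parse_rpcinfo and missing_registrations are illustrative names only.

REQUIRED = {"status": 100024, "nlockmgr": 100021}  # standard RPC program numbers


def parse_rpcinfo(text):
    """Return a set of (program, version, proto) tuples from `rpcinfo -p` text."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port [name]
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # skip the header, shell prompts, and blank lines
        entries.add((int(fields[0]), int(fields[1]), fields[2]))
    return entries


def missing_registrations(text):
    """List (service, proto) pairs that have no registration at all."""
    registered = {(prog, proto) for prog, _vers, proto in parse_rpcinfo(text)}
    return [
        (name, proto)
        for name, prog in sorted(REQUIRED.items())
        for proto in ("udp", "tcp")
        if (prog, proto) not in registered
    ]


# Subset of the rpcinfo listing from this thread.
SAMPLE = """\
program vers proto port
100000 2 tcp 111 portmapper
100021 1 udp 34574 nlockmgr
100021 1 tcp 32786 nlockmgr
100024 1 udp 744 status
100024 1 tcp 750 status
"""

if __name__ == "__main__":
    gaps = missing_registrations(SAMPLE)
    print(gaps if gaps else "statd and lockd are registered for udp and tcp")
```

On a live host the text could instead come from running rpcinfo -p through
subprocess. An empty result only shows that the services are registered
with portmap, not that the daemons behind them are answering requests.
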
>
> > If the nfs-utils rpm from RHEL 4.6 is loaded, nfs-utils-1.0.6-84.EL4,
> > the problem does not occur. We have been testing a few workstations
> > for ~two weeks without problem. Any ideas?
> >
> > TIA
> > Wayne
>
>

2010-06-09 22:34:41

by Chuck Lever

Subject: Re: FW: Unable to mount nfs directories RHEL 4.8

On 06/ 9/10 02:47 PM, Murata, Dennis wrote:
>
>
>> -----Original Message-----
>> From: Chuck Lever [mailto:[email protected]]
>> Sent: Wednesday, June 09, 2010 11:18 AM
>> To: Murata, Dennis
>> Cc: [email protected]
>> Subject: Re: FW: Unable to mount nfs directories RHEL 4.8
>>
>> On 06/ 8/10 08:05 PM, Murata, Dennis wrote:
>>> Didn't see the original message, sorry if this is a duplicate
>>>
>>> -----Original Message-----
>>> From: Murata, Dennis
>>> Sent: Tuesday, June 08, 2010 3:26 PM
>>> To: [email protected]
>>> Subject: Unable to mount nfs directories RHEL 4.8
>>>
>>> We are using a modified RHEL 4.8 build accessing Netapp filers for
>>> data directories. The build has nfs-utils-1.0.6-93.EL4,
>>> nfs-utils-lib-1.0.6-10.el4, kernel-largesmp-2.6.9-89.EL all x86_64.
>>> After a period of use, on a very questionable network using tcp as the
>>> nfs transport, workstation will start getting error messages in
>>> /var/log/messages|dmesg and are not able to mount/access the data
>>> directories. A reboot is necessary to allow the mounts. The time
>>> period varies and seems to depend on the usage, but in general will
>>> start within a week of moderate use. The error messages are:
>>>
>>> lockd: cannot monitor 192.168.10.133
>>> lockd: failed to monitor 192.168.10.133
>>> nsm_mon_unmon: rpc failed, status=-96
>>> lockd: cannot monitor 192.168.10.133
>>> lockd: failed to monitor 192.168.10.133
>>> nsm_mon_unmon: rpc failed, status=-96
>>> lockd: cannot monitor 192.168.10.133
>>> lockd: failed to monitor 192.168.10.133

These are all from the kernel, specifically lockd. They are reported
when lockd can't perform the upcall (via loopback) to rpc.statd to
monitor 192.168.10.133.

status -96 means the server (both portmap and rpc.statd are on the local
host in this case) doesn't support the requested program version (either
rpcbind v2 or statd v1).
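
The negative status in these messages is a Linux errno value, so it can be
decoded by number. The Python sketch below is an illustration, not part of
the original reply: decode_rpc_status is a hypothetical helper, and the
table is a hand-copied excerpt of the generic Linux errno numbering that
covers only a few neighbouring "not supported" codes. Which errno the
kernel RPC client returns for a given failure can differ between kernel
versions, so the symbolic name is a hint rather than a diagnosis.

```python
# Sketch: decode a kernel RPC status such as -96 into a Linux errno name.
# The table is an excerpt of the generic Linux errno numbering; it is
# hard-coded because errno numbers differ between operating systems.

LINUX_ERRNO = {
    93: ("EPROTONOSUPPORT", "Protocol not supported"),
    95: ("EOPNOTSUPP", "Operation not supported"),
    96: ("EPFNOSUPPORT", "Protocol family not supported"),
    97: ("EAFNOSUPPORT", "Address family not supported by protocol"),
}


def decode_rpc_status(status):
    """Describe a (usually negative) kernel status value by errno name."""
    name, text = LINUX_ERRNO.get(abs(status), ("UNKNOWN", "not in this table"))
    return f"status={status}: {name} ({text})"


if __name__ == "__main__":
    print(decode_rpc_status(-96))
```
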

You might get more information by enabling RPC debugging messages on
clients in this state.

# sudo rpcdebug -m rpc -s all

This will cause a lot of traffic in the syslog, so only do it once the
host is wedged, but still trying to do work. We want to capture
debugging output during at least one iteration of the messages above.
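
To cut the syslog down to one iteration of the messages, the log can be
sliced between two consecutive "lockd: cannot monitor" lines. This is a
minimal Python sketch under that assumption; one_iteration is a
hypothetical helper name, and the log path is taken from the command line
rather than assumed.

```python
# Sketch: extract one full iteration of the lockd error cycle from a log.
import sys

MARKER = "lockd: cannot monitor"  # first message of each cycle in this thread


def one_iteration(lines, marker=MARKER):
    """Return the lines from the first marker through the next marker."""
    start = None
    for i, line in enumerate(lines):
        if marker in line:
            if start is None:
                start = i  # beginning of the first cycle seen
            else:
                return lines[start:i + 1]  # include the marker that ends it
    return []  # fewer than two markers: no complete iteration captured yet


if __name__ == "__main__":
    # Usage: python one_iteration.py /var/log/messages
    with open(sys.argv[1]) as log:
        for line in one_iteration(log.read().splitlines()):
            print(line)
```

Everything the RPC layer logged between the two markers ends up in the
slice. Once the capture is done, debugging can be turned off again with
rpcdebug -m rpc -c all.
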

There are Red Hat NFS engineers on this list who can help you if you can
reproduce this with stock RHEL 4.8.

>>> These errors are repeated as access to the filer (ip address has been
>>> changed) is tried. A ps on the workstation shows rpc.statd still
>>> running, service nfslock status reports rpc.statd running.
>>
>> Is portmap running, and is the statd service registered? Is
>> lockd registered for both UDP and TCP?
> Portmap is running. Not sure how to check if lockd is registered but
> the output from rpcinfo -p
> [root@host1 ~]# rpcinfo -p
> program vers proto port
> 100000 2 tcp 111 portmapper
> 100000 2 udp 111 portmapper
> 100007 2 udp 880 ypbind
> 100007 1 udp 880 ypbind
> 100007 2 tcp 883 ypbind
> 100007 1 tcp 883 ypbind
> 100011 1 udp 948 rquotad
> 100011 2 udp 948 rquotad
> 100011 1 tcp 963 rquotad
> 100011 2 tcp 963 rquotad
> 100003 2 udp 2049 nfs
> 100003 3 udp 2049 nfs
> 100003 4 udp 2049 nfs
> 100003 2 tcp 2049 nfs
> 100003 3 tcp 2049 nfs
> 100003 4 tcp 2049 nfs
> 100021 1 udp 34574 nlockmgr
> 100021 3 udp 34574 nlockmgr
> 100021 4 udp 34574 nlockmgr
> 100021 1 tcp 32786 nlockmgr
> 100021 3 tcp 32786 nlockmgr
> 100021 4 tcp 32786 nlockmgr
> 100005 1 udp 962 mountd
> 100005 1 tcp 974 mountd
> 100005 2 udp 962 mountd
> 100005 2 tcp 974 mountd
> 100005 3 udp 962 mountd
> 100005 3 tcp 974 mountd
> 100024 1 udp 744 status
> 100024 1 tcp 750 status
> [root@host1 ~]#