LinuxLists.cc - nfs3 lockd: cannot monitor errors

2012-03-03 12:21:23

Subject: nfs3 lockd: cannot monitor errors

Hi
We recently switched to nfs3 from nfs4 and now we're getting lots of
lockd errors. We can remove the error by mounting with:
-o local_lock=posix

We had to switch to be able to use the posix acl we had set on the share.

What problems may we face by setting the local_lock?
Thanks,
Steve

2012-03-05 23:23:51

by Chuck Lever III

Subject: Re: nfs3 lockd: cannot monitor errors

On Mar 5, 2012, at 6:16 PM, steve wrote:

> On 03/03/12 13:21, steve wrote:
>> Hi
>> We recently switched to nfs3 from nfs4 and now we're getting lots of
>> lockd errors. We can remove the error by mounting with:
>> -o local_lock=posix
>>
>> We had to switch to be able to use the posix acl we had set on the share.
>>
>> What problems may we face by setting the local_lock?
>> Thanks,
>> Steve
>>
>
> Hi
> Sorry to bump
> This is openSUSE 12.1. All the references to lockd probs seem to come from 10 or so years ago.
>
> Could anyone give me a one liner as to where to start looking? I've gone through all the usual channels.
>
> Cheers,
> Steve
>
> Server:
> Mar 5 16:24:46 hh3 kernel: [16760.656609] lockd: cannot monitor hh6

This error message means "hh3" cannot monitor "hh6". Usually that's a sign that rpc.statd on hh3 is having trouble getting a clean DNS lookup of hh6. Maybe enabling debugging on statd would produce a little more diagnostic information.

> ps aux | grep rpc
> root 1214 0.0 0.1 2356 656 ? Ss Mar05 0:00 /sbin/rpcbind
> root 1821 0.0 0.0 0 0 ? S< Mar05 0:00 [rpciod]
> root 4649 0.0 0.0 2416 416 ? Ss Mar05 0:00 /usr/sbin/rpc.idmapd
> root 4668 0.0 0.3 4092 1508 ? Ss Mar05 0:00 rpc.gssd
> root 5192 0.0 0.2 3840 1096 ? Ss Mar05 0:00 /usr/sbin/rpc.svcgssd
> root 5206 0.0 0.1 3032 568 ? Ss Mar05 0:00 /usr/sbin/rpc.mountd --no-nfs-version 4
> statd 5243 0.0 0.1 2584 652 ? Ss Mar05 0:00 /usr/sbin/rpc.statd --no-notify
>
> rpcinfo
> program version netid address service owner
> 100000 4 tcp6 ::.0.111 portmapper superuser
> 100000 3 tcp6 ::.0.111 portmapper superuser
> 100000 4 udp6 ::.0.111 portmapper superuser
> 100000 3 udp6 ::.0.111 portmapper superuser
> 100000 4 tcp 0.0.0.0.0.111 portmapper superuser
> 100000 3 tcp 0.0.0.0.0.111 portmapper superuser
> 100000 2 tcp 0.0.0.0.0.111 portmapper superuser
> 100000 4 udp 0.0.0.0.0.111 portmapper superuser
> 100000 3 udp 0.0.0.0.0.111 portmapper superuser
> 100000 2 udp 0.0.0.0.0.111 portmapper superuser
> 100000 4 local /var/run/rpcbind.sock portmapper superuser
> 100000 3 local /var/run/rpcbind.sock portmapper superuser
> 100005 1 udp 0.0.0.0.178.110 mountd superuser
> 100005 1 tcp 0.0.0.0.136.30 mountd superuser
> 100005 1 udp6 ::.145.115 mountd superuser
> 100005 1 tcp6 ::.220.209 mountd superuser
> 100005 2 udp 0.0.0.0.163.232 mountd superuser
> 100003 2 tcp 0.0.0.0.8.1 nfs superuser
> 100003 3 tcp 0.0.0.0.8.1 nfs superuser
> 100227 2 tcp 0.0.0.0.8.1 nfs_acl superuser
> 100227 3 tcp 0.0.0.0.8.1 nfs_acl superuser
> 100003 2 udp 0.0.0.0.8.1 nfs superuser
> 100003 3 udp 0.0.0.0.8.1 nfs superuser
> 100227 2 udp 0.0.0.0.8.1 nfs_acl superuser
> 100227 3 udp 0.0.0.0.8.1 nfs_acl superuser
> 100005 2 tcp 0.0.0.0.210.206 mountd superuser
> 100005 2 udp6 ::.150.240 mountd superuser
> 100003 2 tcp6 ::.8.1 nfs superuser
> 100003 3 tcp6 ::.8.1 nfs superuser
> 100227 2 tcp6 ::.8.1 nfs_acl superuser
> 100227 3 tcp6 ::.8.1 nfs_acl superuser
> 100003 2 udp6 ::.8.1 nfs superuser
> 100003 3 udp6 ::.8.1 nfs superuser
> 100227 2 udp6 ::.8.1 nfs_acl superuser
> 100227 3 udp6 ::.8.1 nfs_acl superuser
> 100021 1 udp 0.0.0.0.141.165 nlockmgr superuser
> 100021 3 udp 0.0.0.0.141.165 nlockmgr superuser
> 100021 4 udp 0.0.0.0.141.165 nlockmgr superuser
> 100021 1 tcp 0.0.0.0.173.52 nlockmgr superuser
> 100021 3 tcp 0.0.0.0.173.52 nlockmgr superuser
> 100021 4 tcp 0.0.0.0.173.52 nlockmgr superuser
> 100021 1 udp6 ::.208.135 nlockmgr superuser
> 100021 3 udp6 ::.208.135 nlockmgr superuser
> 100021 4 udp6 ::.208.135 nlockmgr superuser
> 100021 1 tcp6 ::.175.226 nlockmgr superuser
> 100021 3 tcp6 ::.175.226 nlockmgr superuser
> 100021 4 tcp6 ::.175.226 nlockmgr superuser
> 100005 2 tcp6 ::.200.118 mountd superuser
> 100005 3 udp 0.0.0.0.217.9 mountd superuser
> 100005 3 tcp 0.0.0.0.185.184 mountd superuser
> 100005 3 udp6 ::.226.152 mountd superuser
> 100005 3 tcp6 ::.204.113 mountd superuser
> 100024 1 udp 0.0.0.0.131.53 status 103
> 100024 1 tcp 0.0.0.0.130.61 status 103
> 100024 1 udp6 ::.140.153 status 103
> 100024 1 tcp6 ::.179.31 status 103
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2012-03-06 16:13:08

by Chuck Lever III

Subject: Re: nfs3 lockd: cannot monitor errors

On Mar 6, 2012, at 2:46 AM, steve wrote:

> On 06/03/12 00:23, Chuck Lever wrote:
>>
>> On Mar 5, 2012, at 6:16 PM, steve wrote:
>>
>>> On 03/03/12 13:21, steve wrote:
>>>> Hi
>>>> We recently switched to nfs3 from nfs4 and now we're getting lots of
>>>> lockd errors. We can remove the error by mounting with:
>>>> -o local_lock=posix
>>>>
>>>> We had to switch to be able to use the posix acl we had set on the share.
>>>>
>>>> What problems may we face by setting the local_lock?
>>>> Thanks,
>>>> Steve
>>>>
>>>
>>> Hi
>>> Sorry to bump
>>> This is openSUSE 12.1. All the references to lockd probs seem to come from 10 or so years ago.
>>>
>>> Could anyone give me a one liner as to where to start looking? I've gone through all the usual channels.
>>>
>>> Cheers,
>>> Steve
>>>
>>> Server:
>>> Mar 5 16:24:46 hh3 kernel: [16760.656609] lockd: cannot monitor hh6
>>
>> This error message means "hh3" cannot monitor "hh6". Usually that's a sign that rpc.statd on hh3 is having trouble getting a clean DNS lookup of hh6. Maybe enabling debugging on statd would produce a little more diagnostic information.
>>
>
> Hi Chuck
> Thanks for the reply. You've got me on the right track. Running at both ends in the foreground is rock solid and instantaneous:
>
> server: hh3, 192.168.1.3
> hh3:/home/steve # rpc.statd -Fd
> rpc.statd: Version 1.2.5 starting
> rpc.statd: Flags: No-Daemon Log-STDERR TI-RPC
> sm-notify: Version 1.2.5 starting
> sm-notify: Already notifying clients; Exiting!
> rpc.statd: Local NSM state number: 459
> rpc.statd: Effective UID, GID: 103, 65534
> rpc.statd: Waiting for client connections
> rpc.statd: from_local: updating local if addr list
> rpc.statd: from_local: checked 5 local if addrs; incoming address not found
> rpc.statd: check_default: access by 192.168.1.12 ALLOWED
> rpc.statd: Received SM_NOTIFY from hh6, state: 59
> rpc.statd: SM_NOTIFY from hh6 while not monitoring any hosts
> rpc.statd: Waiting for client connections
> rpc.statd: from_local: updating local if addr list
> rpc.statd: from_local: incoming address matches local interface address
> rpc.statd: check_default: access by 127.0.0.1 ALLOWED
> rpc.statd: Received SM_MON for 192.168.1.12 from hh3
> rpc.statd: get_nameinfo: failed to resolve address: Name or service not known

This is probably why you get the "lockd: cannot monitor" message. If the NFS server can't resolve "hh6.hh6.site" then it can't monitor it.

> client: hh6, 192.168.1.12
> rpc.statd: MONITORING 192.168.1.12 for hh3
> rpc.statd: Waiting for client connections
> rpc.statd -Fd
> rpc.statd: Version 1.2.5 starting
> rpc.statd: Flags: No-Daemon Log-STDERR TI-RPC
> sm-notify: Version 1.2.5 starting
> sm-notify: Already notifying clients; Exiting!
> rpc.statd: Adding record for hh3.hh3.site to the monitor list...
> rpc.statd: Loaded 1 previously monitored hosts
> rpc.statd: Local NSM state number: 59
> rpc.statd: Effective UID, GID: 103, 65534
> rpc.statd: Waiting for client connections
>
> Dropping to daemon makes the errors reappear with slow file transfer.
>
> I've also nailed the Thunar file manager under XFCE which seems to be making calls to cifs via Kerberos each time we request a file ??. With Nautilus it's fine. The nfs/server principal is called once at the start of the session with no cifs requests.
>
> Just one quick question, does the client server statd output look OK?

At first blush, it looks like you definitely have some DNS configuration problems. statd can only work when the forward and reverse DNS maps for both peers match each other.
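
One way to test the forward/reverse match is sketched below. It assumes Python 3 is available on both machines; the function name check_peer is only illustrative and is not part of nfs-utils. It does a reverse lookup of a peer's address, then a forward lookup of the name that comes back, and reports whether the two agree. It goes through the system resolver, so it also picks up /etc/hosts entries.

```python
import socket


def check_peer(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return (ok, detail) for a forward/reverse lookup round trip on ip.

    The reverse and forward arguments default to the system resolver and
    can be swapped out, which makes the logic testable without DNS.
    """
    try:
        # Reverse map: address -> canonical host name
        name, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        return False, f"reverse lookup of {ip} failed: {exc}"

    try:
        # Forward map: host name -> list of IPv4 addresses
        _name, _aliases, addrs = forward(name)
    except OSError as exc:
        return False, f"forward lookup of {name} failed: {exc}"

    if ip not in addrs:
        return False, f"{name} resolves to {addrs}, which does not include {ip}"
    return True, f"{ip} and {name} map to each other"
```

Calling check_peer("192.168.1.12") on hh3 and check_peer("192.168.1.3") on hh6 should return True on both machines if the maps agree. A False result with "reverse lookup ... failed" would point to the same kind of problem as the get_nameinfo line in the statd output above.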

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2012-03-05 23:16:51

by steve

Subject: Re: nfs3 lockd: cannot monitor errors

On 03/03/12 13:21, steve wrote:
> Hi
> We recently switched to nfs3 from nfs4 and now we're getting lots of
> lockd errors. We can remove the error by mounting with:
> -o local_lock=posix
>
> We had to switch to be able to use the posix acl we had set on the share.
>
> What problems may we face by setting the local_lock?
> Thanks,
> Steve
>

Hi
Sorry to bump
This is openSUSE 12.1. All the references to lockd probs seem to come
from 10 or so years ago.

Could anyone give me a one liner as to where to start looking? I've gone
through all the usual channels.

Cheers,
Steve

Server:
Mar 5 16:24:46 hh3 kernel: [16760.656609] lockd: cannot monitor hh6

ps aux | grep rpc
root 1214 0.0 0.1 2356 656 ? Ss Mar05 0:00
/sbin/rpcbind
root 1821 0.0 0.0 0 0 ? S< Mar05 0:00 [rpciod]
root 4649 0.0 0.0 2416 416 ? Ss Mar05 0:00
/usr/sbin/rpc.idmapd
root 4668 0.0 0.3 4092 1508 ? Ss Mar05 0:00 rpc.gssd
root 5192 0.0 0.2 3840 1096 ? Ss Mar05 0:00
/usr/sbin/rpc.svcgssd
root 5206 0.0 0.1 3032 568 ? Ss Mar05 0:00
/usr/sbin/rpc.mountd --no-nfs-version 4
statd 5243 0.0 0.1 2584 652 ? Ss Mar05 0:00
/usr/sbin/rpc.statd --no-notify

rpcinfo
program version netid address service owner
100000 4 tcp6 ::.0.111 portmapper superuser
100000 3 tcp6 ::.0.111 portmapper superuser
100000 4 udp6 ::.0.111 portmapper superuser
100000 3 udp6 ::.0.111 portmapper superuser
100000 4 tcp 0.0.0.0.0.111 portmapper superuser
100000 3 tcp 0.0.0.0.0.111 portmapper superuser
100000 2 tcp 0.0.0.0.0.111 portmapper superuser
100000 4 udp 0.0.0.0.0.111 portmapper superuser
100000 3 udp 0.0.0.0.0.111 portmapper superuser
100000 2 udp 0.0.0.0.0.111 portmapper superuser
100000 4 local /var/run/rpcbind.sock portmapper superuser
100000 3 local /var/run/rpcbind.sock portmapper superuser
100005 1 udp 0.0.0.0.178.110 mountd superuser
100005 1 tcp 0.0.0.0.136.30 mountd superuser
100005 1 udp6 ::.145.115 mountd superuser
100005 1 tcp6 ::.220.209 mountd superuser
100005 2 udp 0.0.0.0.163.232 mountd superuser
100003 2 tcp 0.0.0.0.8.1 nfs superuser
100003 3 tcp 0.0.0.0.8.1 nfs superuser
100227 2 tcp 0.0.0.0.8.1 nfs_acl superuser
100227 3 tcp 0.0.0.0.8.1 nfs_acl superuser
100003 2 udp 0.0.0.0.8.1 nfs superuser
100003 3 udp 0.0.0.0.8.1 nfs superuser
100227 2 udp 0.0.0.0.8.1 nfs_acl superuser
100227 3 udp 0.0.0.0.8.1 nfs_acl superuser
100005 2 tcp 0.0.0.0.210.206 mountd superuser
100005 2 udp6 ::.150.240 mountd superuser
100003 2 tcp6 ::.8.1 nfs superuser
100003 3 tcp6 ::.8.1 nfs superuser
100227 2 tcp6 ::.8.1 nfs_acl superuser
100227 3 tcp6 ::.8.1 nfs_acl superuser
100003 2 udp6 ::.8.1 nfs superuser
100003 3 udp6 ::.8.1 nfs superuser
100227 2 udp6 ::.8.1 nfs_acl superuser
100227 3 udp6 ::.8.1 nfs_acl superuser
100021 1 udp 0.0.0.0.141.165 nlockmgr superuser
100021 3 udp 0.0.0.0.141.165 nlockmgr superuser
100021 4 udp 0.0.0.0.141.165 nlockmgr superuser
100021 1 tcp 0.0.0.0.173.52 nlockmgr superuser
100021 3 tcp 0.0.0.0.173.52 nlockmgr superuser
100021 4 tcp 0.0.0.0.173.52 nlockmgr superuser
100021 1 udp6 ::.208.135 nlockmgr superuser
100021 3 udp6 ::.208.135 nlockmgr superuser
100021 4 udp6 ::.208.135 nlockmgr superuser
100021 1 tcp6 ::.175.226 nlockmgr superuser
100021 3 tcp6 ::.175.226 nlockmgr superuser
100021 4 tcp6 ::.175.226 nlockmgr superuser
100005 2 tcp6 ::.200.118 mountd superuser
100005 3 udp 0.0.0.0.217.9 mountd superuser
100005 3 tcp 0.0.0.0.185.184 mountd superuser
100005 3 udp6 ::.226.152 mountd superuser
100005 3 tcp6 ::.204.113 mountd superuser
100024 1 udp 0.0.0.0.131.53 status 103
100024 1 tcp 0.0.0.0.130.61 status 103
100024 1 udp6 ::.140.153 status 103
100024 1 tcp6 ::.179.31 status 103

2012-03-06 07:46:38

by steve

Subject: Re: nfs3 lockd: cannot monitor errors

On 06/03/12 00:23, Chuck Lever wrote:
>
> On Mar 5, 2012, at 6:16 PM, steve wrote:
>
>> On 03/03/12 13:21, steve wrote:
>>> Hi
>>> We recently switched to nfs3 from nfs4 and now we're getting lots of
>>> lockd errors. We can remove the error by mounting with:
>>> -o local_lock=posix
>>>
>>> We had to switch to be able to use the posix acl we had set on the share.
>>>
>>> What problems may we face by setting the local_lock?
>>> Thanks,
>>> Steve
>>>
>>
>> Hi
>> Sorry to bump
>> This is openSUSE 12.1. All the references to lockd probs seem to come from 10 or so years ago.
>>
>> Could anyone give me a one liner as to where to start looking? I've gone through all the usual channels.
>>
>> Cheers,
>> Steve
>>
>> Server:
>> Mar 5 16:24:46 hh3 kernel: [16760.656609] lockd: cannot monitor hh6
>
> This error message means "hh3" cannot monitor "hh6". Usually that's a sign that rpc.statd on hh3 is having trouble getting a clean DNS lookup of hh6. Maybe enabling debugging on statd would produce a little more diagnostic information.
>

Hi Chuck
Thanks for the reply. You've got me on the right track. Running at both
ends in the foreground is rock solid and instantaneous:

server: hh3, 192.168.1.3
hh3:/home/steve # rpc.statd -Fd
rpc.statd: Version 1.2.5 starting
rpc.statd: Flags: No-Daemon Log-STDERR TI-RPC
sm-notify: Version 1.2.5 starting
sm-notify: Already notifying clients; Exiting!
rpc.statd: Local NSM state number: 459
rpc.statd: Effective UID, GID: 103, 65534
rpc.statd: Waiting for client connections
rpc.statd: from_local: updating local if addr list
rpc.statd: from_local: checked 5 local if addrs; incoming address not found
rpc.statd: check_default: access by 192.168.1.12 ALLOWED
rpc.statd: Received SM_NOTIFY from hh6, state: 59
rpc.statd: SM_NOTIFY from hh6 while not monitoring any hosts
rpc.statd: Waiting for client connections
rpc.statd: from_local: updating local if addr list
rpc.statd: from_local: incoming address matches local interface address
rpc.statd: check_default: access by 127.0.0.1 ALLOWED
rpc.statd: Received SM_MON for 192.168.1.12 from hh3
rpc.statd: get_nameinfo: failed to resolve address: Name or service not
known

client: hh6, 192.168.1.12
rpc.statd: MONITORING 192.168.1.12 for hh3
rpc.statd: Waiting for client connections
rpc.statd -Fd
rpc.statd: Version 1.2.5 starting
rpc.statd: Flags: No-Daemon Log-STDERR TI-RPC
sm-notify: Version 1.2.5 starting
sm-notify: Already notifying clients; Exiting!
rpc.statd: Adding record for hh3.hh3.site to the monitor list...
rpc.statd: Loaded 1 previously monitored hosts
rpc.statd: Local NSM state number: 59
rpc.statd: Effective UID, GID: 103, 65534
rpc.statd: Waiting for client connections

Dropping to daemon makes the errors reappear with slow file transfer.

I've also nailed the Thunar file manager under XFCE which seems to be
making calls to cifs via Kerberos each time we request a file ??. With
Nautilus it's fine. The nfs/server principal is called once at the start
of the session with no cifs requests.

Just one quick question, does the client server statd output look OK?
Thanks,
Steve