2007-11-20 17:18:34

by Gerd Bavendiek

Subject: [NFS] 2.6.5-7.282 or 2.6.5-7.283: kernel: RPC: error 5 or nsm_mon_unmon: rpc failed, status=-13

Hi,

Yes, I know this is old software from your point of view. But as I have
not been able to find any helpful information so far, I would like to ask here.

We run SLES9 SP3 (i.e. kernel 2.6.5-7.244) on many boxes and have started
to update some of these systems to 2.6.5-7.282 or 2.6.5-7.283. Since this
update we sometimes see errors in the RPC layer.

One example is this box with a very simple setup, nothing special:

ad-test:/root>>> grep -i rpc /var/log/messages
Aug 21 13:42:47 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 21 13:42:47 ad-test kernel: RPC: failed to contact portmap (errno -5).
Aug 21 13:50:51 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 21 13:50:51 ad-test kernel: RPC: failed to contact portmap (errno -5).
Aug 21 14:37:50 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 21 14:37:50 ad-test kernel: RPC: failed to contact portmap (errno -5).
Aug 22 15:11:32 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 22 15:11:32 ad-test kernel: RPC: failed to contact portmap (errno -5).
ad-test:/root>>>

Very interesting: RPC program 100024 (status) is missing.

ad-test:/root>>> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp  32799  nlockmgr
    100021    3   udp  32799  nlockmgr
    100021    4   udp  32799  nlockmgr
    100021    1   tcp  33012  nlockmgr
    100021    3   tcp  33012  nlockmgr
    100021    4   tcp  33012  nlockmgr
ad-test:/root>>>
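To spot this condition on other boxes without eyeballing the output, a small check can parse `rpcinfo -p` and report which of the expected programs are not registered. This is only a minimal sketch: the expected set (100021 nlockmgr, 100024 status) is taken from the outputs in this mail, and the function names are hypothetical.

```python
# Program numbers and names as they appear in the rpcinfo outputs in this mail.
EXPECTED = {
    100021: "nlockmgr",
    100024: "status",
}


def registered_programs(rpcinfo_output):
    """Return the set of RPC program numbers listed in `rpcinfo -p` output."""
    programs = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; the header row
        # ("program vers proto port") and blank lines are skipped.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def missing_programs(rpcinfo_output):
    """Map each expected program number that is not registered to its name."""
    present = registered_programs(rpcinfo_output)
    return {prog: name for prog, name in EXPECTED.items() if prog not in present}
```

Fed the ad-test output above, `missing_programs` should report `{100024: "status"}`; the text can be obtained with, for example, `subprocess.run(["rpcinfo", "-p"], capture_output=True, text=True).stdout`.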

This box runs 2.6.5-7.282 and has been up for 80 days.

Another example, again a very simple setup, nothing special:

Aug 15 16:40:22 polyxena kernel: RPC: error 5 connecting to server localhost
Aug 15 16:40:22 polyxena kernel: RPC: failed to contact portmap (errno -5).
Aug 15 16:40:22 polyxena kernel: RPC: error 5 connecting to server localhost
Aug 15 16:40:22 polyxena kernel: RPC: failed to contact portmap (errno -5).

This box runs 2.6.5-7.283-smp, uptime 201 days. On this one the rpcinfo
output is fine:

polyxena:/root>>> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34281  status
    100021    1   udp  34281  nlockmgr
    100021    3   udp  34281  nlockmgr
    100021    4   udp  34281  nlockmgr
    100024    1   tcp  43627  status
    100021    1   tcp  43627  nlockmgr
    100021    3   tcp  43627  nlockmgr
    100021    4   tcp  43627  nlockmgr
polyxena:/root>>>

We have never seen these errors with 282 or 283 on x86_64, and we have
never seen them with 2.6.5-7.244.

On machines which boot via PXE and have their root file system on
NetApp filers, we see either the error described above _OR_ a second one:

Sep 16 04:46:57 c02ptec kernel: nsm_mon_unmon: rpc failed, status=-13
Sep 16 04:46:57 c02ptec kernel: lockd: cannot unmonitor 10.172.207.7

This may happen 10 minutes after a reboot or after 60 days of uptime.

Sep 16 04:48:07 c02ptec kernel: nsm_mon_unmon: rpc failed, status=-13
Sep 16 04:48:07 c02ptec kernel: lockd: cannot monitor 10.172.207.7
Sep 16 04:48:07 c02ptec kernel: lockd: failed to monitor 10.172.207.7
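The numeric codes in these kernel messages can be turned into symbolic errno names, which makes the two failure modes easier to compare: on Linux, error 5 is EIO and status -13 is EACCES. The sketch below assumes the printed numbers are standard Linux errno values (the kernel usually reports them negated); the patterns cover only the three message formats quoted in this mail, and `decode_rpc_error` is a hypothetical helper name.

```python
import errno
import re

# One pattern per kernel message quoted in this mail. The captured number
# is the error code exactly as the kernel printed it.
PATTERNS = [
    re.compile(r"RPC: error (\d+) connecting to server"),
    re.compile(r"failed to contact portmap \(errno (-?\d+)\)"),
    re.compile(r"nsm_mon_unmon: rpc failed, status=(-?\d+)"),
]


def decode_rpc_error(line):
    """Return the symbolic errno name for a matching log line, or None.

    Kernel code usually reports errors as negative errno values, so the
    sign is dropped before looking the value up in errno.errorcode.
    """
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            code = abs(int(match.group(1)))
            return errno.errorcode.get(code, "unknown (%d)" % code)
    return None
```

Running each line of /var/log/messages through `decode_rpc_error` and keeping the non-None results gives a per-timestamp list of EIO versus EACCES events. This only names the errors; it does not say which component returned them.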

Luckily the application is still alive, so I have kept this system
running:

c02ptec:/var/log>>> uptime
6:04pm up 65 days 14:22, 0 users, load average: 0.06, 0.06, 0.07

even though all NFS kernel threads are gone:

c02ptec:/var/log>>> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
 100033058    1   tcp  39722
 100033057    1   tcp  39737
c02ptec:/var/log>>>

Are there known issues?

What can I do to provide more information?

Thanks!

Gerd

_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that [email protected] is being discontinued.
Please subscribe to [email protected] instead.
http://vger.kernel.org/vger-lists.html#linux-nfs