Hi,
yes, this is old software from your point of view, but as I have not
found any helpful information so far, I would like to ask here.
We run SLES9 SP3 (i.e. kernel 2.6.5-7.244) on many boxes and have started
to update some of these systems to 2.6.5-7.282 or 2.6.5-7.283. Since the
update we sometimes see errors in the RPC layer.
One example is this box with a very simple setup, nothing special:
ad-test:/root>>> grep -i rpc /var/log/messages
Aug 21 13:42:47 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 21 13:42:47 ad-test kernel: RPC: failed to contact portmap (errno -5).
Aug 21 13:50:51 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 21 13:50:51 ad-test kernel: RPC: failed to contact portmap (errno -5).
Aug 21 14:37:50 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 21 14:37:50 ad-test kernel: RPC: failed to contact portmap (errno -5).
Aug 22 15:11:32 ad-test kernel: RPC: error 5 connecting to server localhost
Aug 22 15:11:32 ad-test kernel: RPC: failed to contact portmap (errno -5).
ad-test:/root>>>
Interestingly, RPC program 100024 (status) is missing on this box:
ad-test:/root>>> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100021    1   udp  32799  nlockmgr
    100021    3   udp  32799  nlockmgr
    100021    4   udp  32799  nlockmgr
    100021    1   tcp  33012  nlockmgr
    100021    3   tcp  33012  nlockmgr
    100021    4   tcp  33012  nlockmgr
ad-test:/root>>>
This is a 2.6.5-7.282 kernel, which has been up for 80 days.
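A check for missing registrations like this could be scripted. The following is only a minimal sketch, assuming Python 3 is available somewhere to run it against saved `rpcinfo -p` output. The names `parse_rpcinfo`, `missing_programs` and `EXPECTED`, and the choice of which programs to expect, are my own assumptions and not part of any existing tool.

```python
# Sketch: find expected RPC programs that are absent from `rpcinfo -p` output.
# EXPECTED is an assumption about what an NFS client with locking should register.
EXPECTED = {100000: "portmapper", 100021: "nlockmgr", 100024: "status"}

# Output copied from the ad-test box above (whitespace is not significant).
SAMPLE = """\
program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100021 1 udp 32799 nlockmgr
100021 3 udp 32799 nlockmgr
100021 4 udp 32799 nlockmgr
100021 1 tcp 33012 nlockmgr
100021 3 tcp 33012 nlockmgr
100021 4 tcp 33012 nlockmgr
"""


def parse_rpcinfo(text):
    """Return the set of RPC program numbers listed in `rpcinfo -p` output."""
    programs = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric program number; header and
        # shell prompt lines do not, so they are skipped.
        if fields and fields[0].isdigit():
            programs.add(int(fields[0]))
    return programs


def missing_programs(text, expected=EXPECTED):
    """Return {number: name} for expected programs absent from the output."""
    registered = parse_rpcinfo(text)
    return {num: name for num, name in expected.items() if num not in registered}


for num, name in sorted(missing_programs(SAMPLE).items()):
    print(f"missing RPC program {num} ({name})")
# prints: missing RPC program 100024 (status)
```

Run against the polyxena output further down, the same check should report nothing missing, since 100024 is registered there.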
Another example, again very simple setup, nothing special:
Aug 15 16:40:22 polyxena kernel: RPC: error 5 connecting to server localhost
Aug 15 16:40:22 polyxena kernel: RPC: failed to contact portmap (errno -5).
Aug 15 16:40:22 polyxena kernel: RPC: error 5 connecting to server localhost
Aug 15 16:40:22 polyxena kernel: RPC: failed to contact portmap (errno -5).
This is a 2.6.5-7.283-smp kernel, uptime 201 days. On this one the rpcinfo
output is fine:
polyxena:/root>>> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  34281  status
    100021    1   udp  34281  nlockmgr
    100021    3   udp  34281  nlockmgr
    100021    4   udp  34281  nlockmgr
    100024    1   tcp  43627  status
    100021    1   tcp  43627  nlockmgr
    100021    3   tcp  43627  nlockmgr
    100021    4   tcp  43627  nlockmgr
polyxena:/root>>>
We have never seen these errors with 282 or 283 on x86_64. We have never
seen them with 2.6.5-7.244.
On machines that boot via PXE and have their root file system on
NetApp filers, we see either the error described above _OR_ a second one:
Sep 16 04:46:57 c02ptec kernel: nsm_mon_unmon: rpc failed, status=-13
Sep 16 04:46:57 c02ptec kernel: lockd: cannot unmonitor 10.172.207.7
This may happen 10 minutes after a reboot or after 60 days of uptime.
Sep 16 04:48:07 c02ptec kernel: nsm_mon_unmon: rpc failed, status=-13
Sep 16 04:48:07 c02ptec kernel: lockd: cannot monitor 10.172.207.7
Sep 16 04:48:07 c02ptec kernel: lockd: failed to monitor 10.172.207.7
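For reference, the numbers in these messages are ordinary errno values: on Linux, 5 is EIO and 13 is EACCES. A small sketch for decoding them, assuming Python 3; `describe` is a hypothetical helper name of mine, and the message text returned by `os.strerror` depends on the platform:

```python
import errno
import os


def describe(status):
    """Map a kernel-style status (e.g. 5 or -13) to (errno name, message)."""
    code = abs(status)  # the kernel logs negative values such as -5 or -13
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The two codes seen in the logs above: "error 5" / "errno -5" and "status=-13".
for status in (5, -13):
    name, text = describe(status)
    print(f"{status}: {name} ({text})")
```

This only translates the codes; it says nothing about why the portmap or statd calls fail.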
Luckily the application is still alive, so I still have this system
running:
c02ptec:/var/log>>> uptime
6:04pm up 65 days 14:22, 0 users, load average: 0.06, 0.06, 0.07
even though all NFS kernel threads are gone:
c02ptec:/var/log>>> rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
 100033058    1   tcp  39722
 100033057    1   tcp  39737
c02ptec:/var/log>>>
Are there any known issues?
What can I do to provide more information?
Thanks!
Gerd
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs