2002-04-24 21:31:17

by Núñez

Subject: NFS problems: help needed

Greetings,

I'm having serious problems with my NFS environment.
The server is a dual Pentium III 1 GHz running Red Hat
7.2 (kernel 2.4.7-10smp) and the clients are dual
Pentium 1 GHz machines running Red Hat 6.2 (kernel
2.2.14-5.0smp).

We have a process that opens a lot of files across the
whole cluster, and after a few seconds all the clients
start showing the following messages:

Apr 23 19:12:05 linux0102 kernel: nfs: server lnxdev0003 OK
Apr 23 19:12:17 linux0102 kernel: nfs: task 1059 can't get a request slot
Apr 23 19:12:33 linux0102 kernel: nfs: server lnxdev0003 OK
Apr 23 19:12:44 linux0102 kernel: nfs: server lnxdev0003 not responding, still trying
Apr 23 19:12:44 linux0102 kernel: nfs: server lnxdev0003 OK
Apr 23 19:12:54 linux0102 kernel: nfs: server lnxdev0003 not responding, still trying
Apr 23 19:14:45 linux0102 automount[895]: expired /nb_apps/espiel
Apr 23 19:14:45 linux0102 automount[895]: expired /nb_apps/expat
Apr 23 19:14:45 linux0102 automount[895]: expired /nb_apps/crypto
Apr 23 19:14:45 linux0102 automount[895]: expired /nb_apps/rogue
Apr 23 19:14:45 linux0102 automount[895]: expired /nb_apps/ACE
Apr 23 19:14:45 linux0102 automount[895]: expired /nb_apps/xerces
Apr 23 19:15:49 linux0102 automount[906]: expired /home/suresh
Apr 23 19:16:06 linux0102 kernel: nfs: server lnxdev0003 OK
Apr 23 19:16:06 linux0102 kernel: nfs: server lnxdev0003 OK
Apr 23 19:16:38 linux0102 kernel: nfs: server lnxdev0003 not responding, still trying
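
To put a number on these timeouts, something like the
following could be run on each client. It is only a minimal
sketch: it assumes the kernel messages land unwrapped, one
per line, in a syslog file such as /var/log/messages, and
the function name nfs_timeouts is just illustrative.

```shell
#!/bin/sh
# Count "not responding" messages per NFS server in a syslog-style file.
# Assumption: each message is on a single line, e.g.
#   ... kernel: nfs: server lnxdev0003 not responding, still trying
# Usage: nfs_timeouts /var/log/messages
nfs_timeouts() {
    awk '/nfs: server .* not responding/ {
             # The server name is the field right after the word "server".
             for (i = 1; i < NF; i++)
                 if ($i == "server") { count[$(i + 1)]++; break }
         }
         END { for (s in count) print s, count[s] }' "$1"
}
```

Each output line is a server name followed by the number of
timeout messages found for it, which makes it easier to
compare clients before and after a change.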

I followed the tutorial that appears at
"http://nfs.sourceforge.net/nfs-howto/performance.html"
and applied the following optimizations (with no
result):

- Deactivated auto-negotiation on the hub (we have a
Cisco FastHub 400) and forced the NIC to use full
duplex at Fast Ethernet speed. This should reduce the
number of collisions.
- Increased the number of nfsd servers and the socket
queue (here is a fragment of my script,
/etc/init.d/nfs; currently I use 16 nfsd servers):

RPCNFSDCOUNT=16
....
# NFSv3 only if kernel >= 2.2.18
OS_RELEASE=`uname --release`
....
echo -n "Starting NFS mountd: "
daemon rpc.mountd $RPCMOUNTDOPTS
echo
echo -n "Starting NFS daemon: "
echo 262144 > /proc/sys/net/core/rmem_default
echo 262144 > /proc/sys/net/core/rmem_max
daemon rpc.nfsd $RPCNFSDCOUNT
echo 65536 > /proc/sys/net/core/rmem_default
echo 65536 > /proc/sys/net/core/rmem_max
echo
- Increased the rsize and wsize on my clients to
4096 (I'm using automount and NIS; this is
/etc/auto.master):
/home auto.home -rw,intr,retry=10,rsize=4096,wsize=4096
/nb_apps auto.nb_apps -rw,intr,retry=10,rsize=4096,wsize=4096
/data auto.data -rw,intr,retry=10,rsize=4096,wsize=4096
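
The init script fragment above puts 65536 back by hand after
starting nfsd. A variant that remembers whatever values were
there before could look like the sketch below. The helper
name with_rmem and the PROC_NET_CORE variable are made up
for illustration; only the two /proc files come from my
script.

```shell
#!/bin/sh
# Run a command with temporarily enlarged socket receive buffers, then
# restore the previous values instead of a hard-coded 65536.
# PROC_NET_CORE is a hypothetical override so the helper can be tried on
# ordinary files; by default it points at the real /proc directory.
PROC_NET_CORE=${PROC_NET_CORE:-/proc/sys/net/core}

with_rmem() {
    size=$1; shift
    # Remember the current settings before touching them.
    old_default=$(cat "$PROC_NET_CORE/rmem_default") || return 1
    old_max=$(cat "$PROC_NET_CORE/rmem_max") || return 1

    echo "$size" > "$PROC_NET_CORE/rmem_max"
    echo "$size" > "$PROC_NET_CORE/rmem_default"
    "$@"                      # e.g. daemon rpc.nfsd $RPCNFSDCOUNT
    status=$?

    # Put the original values back, whatever they were.
    echo "$old_default" > "$PROC_NET_CORE/rmem_default"
    echo "$old_max" > "$PROC_NET_CORE/rmem_max"
    return $status
}
```

In the init script this would be called as
"with_rmem 262144 daemon rpc.nfsd $RPCNFSDCOUNT" in place of
the four echo lines.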


What should I try next? We have a network with very
few machines and apparently no collisions (the stats at
the hub don't show many, but I still have to confirm
that).
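
To cross-check the hub statistics from the Linux side, the
collision counter of each NIC can be read from
/proc/net/dev. This is a rough sketch, assuming the usual
layout of that file where collisions ("colls") is the sixth
transmit column; nic_collisions is just a name I picked.

```shell
#!/bin/sh
# Print the collision counter for one interface from /proc/net/dev.
# Assumption: 8 receive columns followed by the transmit columns
# bytes, packets, errs, drop, fifo, colls, ... after the interface name.
# Usage: nic_collisions eth0 [file]
nic_collisions() {
    iface=$1
    file=${2:-/proc/net/dev}
    # The name can be glued to the first counter ("eth0:1234"), so turn
    # the colon into a space before splitting into fields.
    sed 's/:/ /' "$file" |
        awk -v ifc="$iface" '$1 == ifc { print $15 }'
}
```

Running "nic_collisions eth0" before and after the file
opening process starts shows whether the counter moves while
the clients are timing out.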

Should I try to update the nfs-tools on all the boxes?
(I can't do the same with the kernel because we have a
CORBA library that could break, at least on the clients.)

The server doesn't have a high load (I'm monitoring
the box using top and UCD-SNMP) and other network apps
work fine so far :(

Thanks in advance.



=====
José Vicente Nuñez Zuleta ([email protected])
Newbreak System Administrator (http://www.newbreak.com)
Office Phone: 203-355-1511, 203-355-1510
Fax: 203-355-1512
Cellular Phone: 203-570-7078


_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs