A colleague of mine who is having email difficulties asked me to forward
this to the list.
I am having a problem with nfs mounts under linux-2.4.20.
Server:
Red Hat 8.0, kernel upgraded to linux-2.4.20 + Trond's NFS patches.
Dual PIII 933 MHz, 2 GB RAM, tg3 gigabit ethernet driver.
Client:
Red Hat 7.1, kernel upgraded to linux-2.4.18 + Trond's NFS patches.
Dual Xeon 2.2 GHz, 1 GB RAM, e100 ethernet driver.
Since I upgraded the server from 2.4.14 to 2.4.20, I have been having a
problem mounting and unmounting the NFS partitions using autofs. I have
all of the clients touch the NFS filesystems (ls /misc/scratchx) to cause
them to mount. Many of the mounts (~50%) fail with the error shown below.
If I slow down the rate at which the ls is executed, I have a better
chance of mounting all of the filesystems. When I operate on chunks of
the nodes, 128 at a time, I can usually get them to mount OK.
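The batching described above could be scripted roughly as follows. This is
only a sketch under assumptions that are not in the original report: that
the nodes are reachable over ssh from a head node, that a fixed pause
between batches is acceptable, and that the helper names (chunked, ssh_ls,
trigger_mounts) and the 5-second pause are hypothetical choices.

```python
import subprocess
import time
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def ssh_ls(node: str, path: str) -> subprocess.Popen:
    """Start `ls path` on a node over ssh (assumes ssh access to the node)."""
    return subprocess.Popen(
        ["ssh", node, "ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def trigger_mounts(
    nodes: Sequence[str],
    path: str = "/misc/scratchx",
    batch_size: int = 128,
    pause: float = 5.0,  # hypothetical pause between batches, in seconds
    launch: Callable = ssh_ls,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Touch `path` on every node, one batch at a time.

    Returns the nodes whose `ls` exited non-zero so they can be retried.
    """
    failed: List[str] = []
    batches = list(chunked(nodes, batch_size))
    for index, batch in enumerate(batches):
        # Start the whole batch concurrently, then wait for all of it.
        procs = [(node, launch(node, path)) for node in batch]
        for node, proc in procs:
            if proc.wait() != 0:
                failed.append(node)
        # Give the server time to drain before the next batch.
        if index < len(batches) - 1:
            sleep(pause)
    return failed
```

Calling trigger_mounts(node_list) from a head node would return the nodes
that still need a retry; the batch size and pause would have to be tuned
against the actual server.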
In the messages file I get:
Jan 16 16:58:51 g0427 automount[537]: attempting to mount entry /misc/scratchx
Jan 16 16:59:12 g0427 automount[13130]: >> mount: RPC: Timed out
Jan 16 16:59:12 g0427 automount[13130]: mount(nfs): nfs: mount failure x00:/export/scratch1 on /misc/scratchx
I tried increasing timeo to 30 and 60, and although this appeared to
help, it did not fix the problem. I had never changed timeo before.
When the mounts do fail, I try to clean them up with umount. Often
when doing this I will get error messages on some of the nodes:
Bad UMNT RPC: RPC: Timed out
This problem can be reduced when I do not have the umount execute on
all of the nodes at the same time.
This happens whether the filesystem is ext3 or xfs.
I do not have these problems under the 2.4.14 kernel.
Are there some timing values in RPC that have changed or that I can
change to increase timeouts? Does anyone have any other ideas?
Thanks,
Craig
--
David B. Ritch
High Performance Technologies, Inc.
-------------------------------------------------------
This SF.NET email is sponsored by: Thawte.com
Understand how to protect your customers personal information by implementing
SSL on your Apache Web Server. Click here to get our FREE Thawte Apache
Guide: http://ads.sourceforge.net/cgi-bin/redirect.pl?thaw0029en
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
> After I upgraded the server from 2.4.14 to 2.4.20, I am having a problem
> mounting and unmounting the NFS partitions using autofs. I have all
> of the clients touch the nfs filesystems (ls /misc/scratchx) to cause
> them to mount. Many of the mounts fail (~ 50%) with the error.
Dave,
You might want to have Craig mention how many clients are involved.
If this is the FSL cluster, it's 1000. I don't know how mount decides
whether to use UDP or TCP for the mount operation, but I do know that
1000 UDP requests hitting the same server at the same time cause
major congestion. 1000 simultaneous TCP connections to the same server
are likewise problematic.
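One way to keep 1000 clients from hitting the server in the same instant,
without coordinating them centrally, is to have each client wait a random
delay before touching the mount point. The sketch below only illustrates
that idea; the 60-second window and the function names are assumptions,
not anything measured on this cluster.

```python
import random


def jitter_delay(window_seconds: float, rng=random) -> float:
    """Pick a uniformly random delay in [0, window_seconds]."""
    if window_seconds < 0:
        raise ValueError("window_seconds must be non-negative")
    return rng.uniform(0.0, window_seconds)


def average_request_rate(clients: int, window_seconds: float) -> float:
    """Average mount requests per second when `clients` requests are
    spread evenly over `window_seconds`."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return clients / window_seconds
```

On each client, something like time.sleep(jitter_delay(60)) before the
ls /misc/scratchx would spread 1000 requests over about a minute, which
averages out to roughly 17 requests per second instead of one burst.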
-- greg