2006-10-26 14:56:36

by Andrew Ryan

Subject: stale file handles with linux NFS server, not with NetApp

I've recently tried using a RHEL3 machine as an NFS server. The server is
a 2-CPU Xeon running the 2.4.21-37.ELsmp kernel. The clients are also RHEL3,
running 2.4.21-40.ELsmp, and they automount homedirs from the server.
For years we've been using a NetApp server (currently running
ONTAP 7.0.1) and it's worked just fine. With the RHEL3 server, though,
I'm experiencing a weird issue that I can't explain or fix.

1. Log in as a user with an automounted homedir.
2. cd into some subdirectory of the homedir (Foo/Bar/bax in the example
   below). I can't duplicate the problem in the homedir itself.
3. Wait a few minutes (automount timeout is 60 seconds).
4. Try to access cwd:
   [grue@cu015 bax]$ ls -l /proc/$$/cwd
   lrwxrwxrwx 1 grue __cubitu 0 Oct 26 06:48 /proc/16280/cwd -> /home/grue/Foo/Bar/bax
5. So far so good, now *really* try to access this:
   [grue@cu015 bax]$ sudo mkdir -p /tmp/gasd/dfsdf/sdf/
   Password:
   mkdir: cannot open current directory: Stale NFS file handle
   [grue@cu015 bax]$ ls -l /proc/$$/cwd
   lrwxrwxrwx 1 grue __cubitu 0 Oct 26 06:48 /proc/16280/cwd -> /home/grue/Foo/Bar/bax (deleted)
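
The check in steps 4 and 5 could be scripted roughly as follows. This is
only a sketch: check_dir and cwd_link are hypothetical helper names, and it
assumes that listing a directory fails once its handle has gone stale, the
same way mkdir's attempt to open the current directory failed above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of autofs or nfs-utils.

# Print where the kernel thinks this shell's cwd points.
# A " (deleted)" suffix is what showed up after the handle went stale.
cwd_link() {
    readlink "/proc/$$/cwd"
}

# Try to open and list a directory (default: the cwd).
# Prints "ok" if that works, "unreadable" if it fails for any reason
# (a stale handle is assumed to be one such reason).
check_dir() {
    dir="${1:-.}"
    if ls "$dir" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "unreadable"
    fi
}
```

Run from the subdirectory after the automount timeout has passed, for
example with `cwd_link; check_dir`, this would be expected to print the
"(deleted)" path followed by "unreadable" on an affected client.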

At this point I need to cd out of this directory and then back into the
directory in order to get the Stale NFS file handle message to go away.
According to /proc/mounts the filesystem is still mounted.
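
For reference, the /proc/mounts check can be narrowed to a single mount
point with something like the sketch below. mount_entries is a hypothetical
helper name; the optional second argument exists only so that a saved copy
of the mounts table can be inspected instead of the live one.

```shell
#!/bin/sh
# Print the entries whose mount point (second field) is exactly $1.
# $2 optionally names a file in /proc/mounts format (default: /proc/mounts).
mount_entries() {
    awk -v mp="$1" '$2 == mp' "${2:-/proc/mounts}"
}
```

On the client above, `mount_entries /home/grue` prints the entry for the
homedir, including the mount options actually in effect, while it is
mounted, and prints nothing once it has been unmounted.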

My first inclination would be to blame autofs here, except that the exact
same automount config works fine against our NetApp filer: no "Stale file
handle" message is ever received no matter how long we wait.

The export on the server uses the options "rw,sync".
On the client, auto.master is one line:
/home /etc/auto.sharedhome --debug
and /etc/auto.sharedhome is also one line:
grue -fstype=nfs,rw,tcp,nfsvers=3,rsize=32768,wsize=32768,intr,hard,fg mgr:/u1/chroot/home/&

This seems like such an obvious problem that I'm wondering what I'm doing
wrong, but I can't figure out what. I can provide any additional debugging
detail needed. Switching to 2.6/RHEL4 is not really an option for us at
this point, so we haven't tried it to see whether that fixes the problem.


Thanks
--andrew
