Hi,
I think I know what's going on, and why I get the "stale NFS handle"
error message when the NFS server is restarted (a real reboot, or simply
an /etc/init.d/nfs restart), but what I don't understand is why the NFS
client doesn't "remount" the filesystem automatically. In the case of NFS
over TCP, the NFS client could easily detect a restart of the NFS server
(the TCP connection was aborted) - or are there other factors that keep
the NFS client from recognizing such a thing?
The scenario that prompted me to write this is that I'm setting up an NFS
server at my college right now. It will export a directory to many
clients on which I don't have root access. The NFS directories are
mounted by the clients via the automounter, and if I restart my server
for any reason, I get the "stale NFS handle" error for hours. The kernel
neither remounts nor unmounts the directory, and the automounter doesn't
unmount it either. It stays mounted, and that causes me trouble for hours.
Has there been any thought on this subject here or on any other mailing
list? Is there any chance that this might be improved somehow in the future?
Thx
Sven
On Saturday, 04/09/2004 at 21:06, Sven Köhler wrote:
> Hi,
>
> I think I know what's going on, and why I get the "stale NFS handle"
> error message when the NFS server is restarted (a real reboot, or simply
> an /etc/init.d/nfs restart), but what I don't understand is why the NFS
> client doesn't "remount" the filesystem automatically. In the case of NFS
> over TCP, the NFS client could easily detect a restart of the NFS server
> (the TCP connection was aborted) - or are there other factors that keep
> the NFS client from recognizing such a thing?
Sigh. This question keeps coming up again and again and again. Why can't
you people search the archives?
Of course we could "fix" things for the user so that we just look up all
those filehandles again transparently.
The real question is: how do we know that is the right thing to do?
The NFS client wouldn't know the difference between your /etc/passwd
file and a javascript pop-up ad. If it gets an ESTALE error, then that
tells it that the original filehandle is invalid, but it does not know
WHY that is the case. The file may have been deleted and replaced by a
new one. It may be that your server is broken, and is actually losing
filehandles on reboot (as appears to be the case in your setup),...
Reopening the file, and then continuing to write from the same position
may be the right thing to do, but then again it may cause you to
overwrite a bunch of freshly written password entries.
So we bounce the error up to userland where these issues can actually be
resolved.
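As a rough illustration of the kind of recovery that only userland can
decide to do safely, here is a minimal shell sketch; the mount point and
the assumption that an automounter will re-establish the mount on the
next access are hypothetical:

    #!/bin/sh
    # retry-on-stale.sh -- hypothetical wrapper: run a command; if it fails
    # and the mount point looks inaccessible (e.g. a stale handle), lazily
    # unmount it so the automounter can re-establish it, then retry once.
    MNT=/mnt/nfs                      # hypothetical automounted mount point
    "$@" && exit 0
    if ! ls "$MNT" >/dev/null 2>&1; then
        umount -l "$MNT" 2>/dev/null  # lazy unmount of the stale mount
        ls "$MNT" >/dev/null 2>&1     # next access re-triggers the automounter
    fi
    exec "$@"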
Cheers,
Trond
> Of course we could "fix" things for the user so that we just look up all
> those filehandles again transparently.
>
> The real question is: how do we know that is the right thing to do?
>
> The NFS client wouldn't know the difference between your /etc/passwd
> file and a javascript pop-up ad. If it gets an ESTALE error, then that
> tells it that the original filehandle is invalid, but it does not know
> WHY that is the case. The file may have been deleted and replaced by a
> new one. It may be that your server is broken, and is actually losing
> filehandles on reboot (as appears to be the case in your setup),...
I agree, but you're simply admitting that the NFS client has no way of
knowing when the server was restarted. The simplest thing I can imagine
is that the NFS server generates a random integer value at startup and
transmits it along with ESTALE. If that integer value is different from
the integer value the server sent when the FS was mounted, then the
kernel has to remount it transparently. This would be a simple way for a
client to safely determine whether the server has been restarted or not,
and it only adds 4 bytes to some NFS packets.
> Reopening the file, and then continuing to write from the same position
> may be the right thing to do, but then again it may cause you to
> overwrite a bunch of freshly written password entries.
In my case, if the NFS directory is mounted on /mnt/nfs, I can't even do
a simple "cd /mnt/nfs" without getting the "stale NFS handle" error - even
if I use a different shell. I always thought that the "cd /mnt/nfs" should
work, since the shell will acquire a new handle, but it doesn't :-(
So I'm not really talking about restoring all file handles. The file
handles that were still open when the server restarted may stay broken,
but I'd like to be able to open new ones at least.
> So we bounce the error up to userland where these issues can actually be
> resolved.
This is a good thing to do in general, but I think this needs improvement.
On Saturday, 04/09/2004 at 21:51, Sven Köhler wrote:
> I agree, but you're simply admitting that the NFS client has no way of
> knowing when the server was restarted. The simplest thing I can imagine
> is that the NFS server generates a random integer value at startup and
> transmits it along with ESTALE. If that integer value is different from
> the integer value the server sent when the FS was mounted, then the
> kernel has to remount it transparently. This would be a simple way for a
> client to safely determine whether the server has been restarted or not,
> and it only adds 4 bytes to some NFS packets.
No.... The simplest thing is for the server to actually abide by the
RFCs and not generate filehandles that change on reboot.
NFSv4 is the ONLY version of the protocol that actually supports the
concept of filehandles that have a finite lifetime.
> In my case, if the NFS directory is mounted on /mnt/nfs, I can't even do
> a simple "cd /mnt/nfs" without getting the "stale NFS handle" error - even
> if I use a different shell. I always thought that the "cd /mnt/nfs" should
> work, since the shell will acquire a new handle, but it doesn't :-(
It won't if the root filehandle is broken too. That is the standard way
of telling the NFS client that the administrator has revoked our access
to the filesystem.
The solution is simple here: fix the broken server...
Cheers,
Trond
>>I agree, but you're simply admitting that the NFS client has no way of
>>knowing when the server was restarted. The simplest thing I can imagine
>>is that the NFS server generates a random integer value at startup and
>>transmits it along with ESTALE. If that integer value is different from
>>the integer value the server sent when the FS was mounted, then the
>>kernel has to remount it transparently. This would be a simple way for a
>>client to safely determine whether the server has been restarted or not,
>>and it only adds 4 bytes to some NFS packets.
>
> No.... The simplest thing is for the server to actually abide by the
> RFCs and not generate filehandles that change on reboot.
OK, that sounds complicated, but if it worked, it would be very nice
indeed.
> NFSv4 is the ONLY version of the protocol that actually supports the
> concept of filehandles that have a finite lifetime.
But NFSv4 is still experimental :-( and I think the clients don't have
NFSv4 support either.
>>In my case, if the NFS directory is mounted on /mnt/nfs, I can't even do
>>a simple "cd /mnt/nfs" without getting the "stale NFS handle" error - even
>>if I use a different shell. I always thought that the "cd /mnt/nfs" should
>>work, since the shell will acquire a new handle, but it doesn't :-(
>
> It won't if the root filehandle is broken too. That is the standard way
> of telling the NFS client that the administrator has revoked our access
> to the filesystem.
>
> The solution is simple here: fix the broken server...
Sorry? Why is my server broken? I'm using kernel 2.6.8.1 with nfs-utils
1.0.6 on my server, and I don't see what should be broken.
Thx
Sven
On Saturday, 04/09/2004 at 22:23, Sven Köhler wrote:
> Sorry? Why is my server broken? I'm using kernel 2.6.8.1 with nfs-utils
> 1.0.6 on my server, and I don't see what should be broken.
When your server fails to work as per spec, then it is said to be
"broken" no matter what kernel/nfs-utils combination you are using.
The spec is that reboots are not supposed to clobber filehandles.
So, there are 3 possibilities:
1) You are exporting a non-supported filesystem, (e.g. FAT). See the
FAQ on http://nfs.sourceforge.org.
2) A bug in your initscripts is causing the table of exports to be
clobbered. Running "exportfs" in legacy 2.4 mode (without having the
nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
least...
3) There is some other bug in knfsd that nobody else appears to be
seeing.
Cheers,
Trond
Trond Myklebust <[email protected]> said on Sat, 04 Sep 2004 23:01:07 -0400:
> On Saturday, 04/09/2004 at 22:23, Sven Köhler wrote:
>
> > Sorry? Why is my server broken? I'm using kernel 2.6.8.1 with nfs-utils
> > 1.0.6 on my server, and I don't see what should be broken.
>
> When your server fails to work as per spec, then it is said to be
> "broken" no matter what kernel/nfs-utils combination you are using.
> The spec is that reboots are not supposed to clobber filehandles.
>
> So, there are 3 possibilities:
>
> 1) You are exporting a non-supported filesystem, (e.g. FAT). See the
> FAQ on http://nfs.sourceforge.org.
> 2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
> 3) There is some other bug in knfsd that nobody else appears to be
> seeing.
Have I got 2 cases of 3) for you perhaps?
I can't give you more info, because I am not the admin of the boxes
concerned, but we do sometimes lose filehandles of specific files
spontaneously (no server reboots, nfsd restarts, etc.).
Background:
We have a compute cluster of machines all running SuSE's 2.4.20, or
thereabouts. The nfs servers run Linus's 2.4.26, talking to ext3, on
bigass apple Xserves.
I will update one directory with rsync from one host, and then try, a
little later on, to operate on that directory from another host. Every
now and then, from a single host only, a few files in that tree will
get stale filehandles - an ls of that directory will mostly be fine
apart from those files. They will also be fine from any other machine.
I have found that if I clobber cache with my alloclargemem program,
then those files will come back immediately.
The other problem we see regularly - and one I have explicitly worked
around in my scripts, because it is such a common occurrence - is that
when I start 120 jobs in a short time on 120 nodes, all of which deal with
a bunch of common files read-only and then write their own private files,
a few of them will die with the read-only files being stale. It looks as
if the server just can't cope with a hundred requests (and possibly
mounts, since they are automounted) in the space of half a minute (big
files, mind you), and starts returning bogus data.
The entire mount (which is automounted, looks like version 3) will
then remain stale for eternity, with df returning its minus 3
bazillion GB free, until automount is restarted.
Known problems? I googled for '"stale nfs file handle" spontaneous'
with no luck. Or is it perhaps likely that SuSE fscked with the NFS
(and autofs) client-side code? The sysadmins regard these failures as
a fact of life, but perhaps no-one else is seeing this, so it's worth
reporting.
--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
PUBLIC NOTICE AS REQUIRED BY LAW: Any Use of This Product, in Any Manner
Whatsoever, Will Increase the Amount of Disorder in the Universe. Although No
Liability Is Implied Herein, the Consumer Is Warned That This Process Will
Ultimately Lead to the Heat Death of the Universe.
* Tim Connors:
> Background:
>
> We have a compute cluster of machines all running SuSE's 2.4.20, or
> thereabouts. The nfs servers run Linus's 2.4.26, talking to ext3, on
> bigass apple Xserves.
Which NFS server software are you using?
On Sun, 5 Sep 2004, Florian Weimer wrote:
> * Tim Connors:
>
> > Background:
> >
> > We have a compute cluster of machines all running SuSE's 2.4.20, or
> > thereabouts. The nfs servers run Linus's 2.4.26, talking to ext3, on
> > bigass apple Xserves.
>
> Which NFS server software are you using?
kernel nfsd
Source RPM: nfs-utils-1.0.1-109.src.rpm
--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
Save the whales. Feed the hungry. Free the mallocs. --unk
> So, there are 3 possibilities:
>
> 1) You are exporting a non-supported filesystem, (e.g. FAT). See the
> FAQ on http://nfs.sourceforge.org.
I'm exporting a reiserfs.
> 2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
So there should be a filesystem mounted on /proc/fs/nfsd? This isn't the
case on my machine. Should the init script simply do a "mount -t nfsd
none /proc/fs/nfsd"? Then that would be a bug in my distribution (Gentoo).
Tim Connors wrote:
> I will update one directory with rsync from one host,
You mean rsync to the server and change files directly on the fs rather
than through an NFS client?
> and then try, a
> little later on, to operate on that directory from another host. Every
> now and then, from a single host only, a few files in that tree will
> get stale filehandles - an ls of that directory will mostly be fine
> apart from those files. They will also be fine from any other machine.
Yeah, that's what happens... Clients that had the file open are liable
to get ESTALE. Stale file handles stick around until unmount. As long as
they're around automount will consider the mount busy and not expire it
(but you can unmount manually or killall -USR1 automountd).
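For reference, the manual recovery described above looks roughly like this
(the mount point is hypothetical; umount -l is the lazy variant for when
the stale mount is still held busy):

    # drop the stale automounted filesystem by hand
    umount /mnt/nfs || umount -l /mnt/nfs
    # or prompt the automounter to expire unused mounts
    killall -USR1 automountd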
Mike
--
Mike Jagdis Web: http://www.eris-associates.co.uk
Eris Associates Limited Tel: +44 7780 608 368
Reading, England Fax: +44 118 926 6974
On Sunday, 05/09/2004 at 09:18, Sven Köhler wrote:
> So there should be a filesystem mounted on /proc/fs/nfsd? This isn't the
> case on my machine. Should the init script simply do a "mount -t nfsd
> none /proc/fs/nfsd"? Then that would be a bug in my distribution (Gentoo).
Yes... See the manpage for "exportfs".
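Concretely, and only as a sketch of what the init script might do (not a
quote from the Gentoo scripts), that means something along the lines of:

    # mount the nfsd control filesystem before exportfs / rpc.nfsd run
    mount -t nfsd none /proc/fs/nfsd

    # or, equivalently, an /etc/fstab entry such as:
    none    /proc/fs/nfsd    nfsd    defaults    0 0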
Cheers,
Trond
On Sun, 5 Sep 2004, Mike Jagdis wrote:
> Tim Connors wrote:
> > I will update one directory with rsync from one host,
>
> You mean rsync to the server and change files directly on the fs rather
> than through an NFS client?
No - the server is behind a firewall. Just an ordinary nfs client.
> > and then try, a
> > little later on, to operate on that directory from another host. Every
> > now and then, from a single host only, a few files in that tree will
> > get stale filehandles - an ls of that directory will mostly be fine
> > apart from those files. They will also be fine from any other machine.
>
> Yeah, that's what happens... Clients that had the file open are liable
> to get ESTALE. Stale file handles stick around until unmount. As long as
> they're around automount will consider the mount busy and not expire it
> (but you can unmount manually or killall -USR1 automountd).
Yep - that has normally been the case (when the entire mount went stale);
we'd just restart the automounter.
You almost hit the nail on the head with regard to the problem - this
last happened a week ago, and I seem to remember 6 files getting ESTALE.
But only 2 of those would likely have been open on the host where they
went stale at any time near when they went stale (if they were open at
all), if I am remembering things right. Unless an `ls -lA --color` counts
as "opening" (they weren't symlinks, just normal files, so I doubt it).
What is strange is that I was able to make them "unstale" simply by
clearing the cache - allocating a large block of RAM and making sure that
buffers and cache shrank to something very small. I didn't need to restart
the automounter at all. After that I could `ls` the directory fine and
`cat` the files fine.
I'm afraid that the intermittent nature of this problem is going to make
it hard for me to reproduce though!
I take it the files (normally) go stale because sillyrename only happens
when one host tries to delete a file while that same host has it open, so
the server doesn't know that a client still has it open, and if the inode
just happens to be reallocated to something new, the server has no choice
but to say "bugger off"? I thought I had seen in the past that you could
delete a file from one host while another host was still using it, and it
would do the sillyrename and the client would continue to use the file
just fine - probably on a Sun, come to think of it -- does its
equivalent of sillyrename keep track of who has what open?
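For what it's worth, the difference is easy to see by hand; the paths
below are hypothetical and this assumes a plain NFSv3 mount on two clients:

    # on client1: keep a file open, then remove it from the same client
    client1$ tail -f /mnt/nfs/somefile &
    client1$ rm /mnt/nfs/somefile
    client1$ ls -a /mnt/nfs          # a .nfsXXXX placeholder appears (sillyrename)

    # on client2: remove a file that only client1 has open
    client2$ rm /mnt/nfs/otherfile   # the server really unlinks it; sillyrename
                                     # is purely client-side, so client1's open
                                     # descriptor will start returning ESTALE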
--
TimC -- http://astronomy.swin.edu.au/staff/tconnors/
"Meddle not in the affairs of cats, for they are subtle, and will
piss on your computer." - Jeff Wilder
Trond Myklebust wrote:
>>So there should be a filesystem mounted on /proc/fs/nfsd? This isn't the
>>case on my machine. Should the init script simply do a "mount -t nfsd
>>none /proc/fs/nfsd"? Then that would be a bug in my distribution (Gentoo).
Well, I am on Gentoo as well, and it seems that it is mounted on /proc/fs/nfs.
However `cat /proc/fs/nfs/exports` showed only one of 5 exported dirs on my server.
It has been a few weeks since last restart (and NFS restart).
`/etc/init.d/nfs restart` or `exportfs -a` fixed it.
> Yes... See the manpage for "exportfs".
I had a (first) look at it, but I still cannot understand what the
difference between the "-r" and "-a" options is...
The output on my system from both `exportfs -rv` and `exportfs -av` is the same.
Kalin.
--
|| ~~~~~~~~~~~~~~~~~~~~~~ ||
( ) http://ThinRope.net/ ( )
|| ______________________ ||
On Sat, 2004-09-04 at 23:01 -0400, Trond Myklebust wrote:
> 2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
> 3) There is some other bug in knfsd that nobody else appears to be
> seeing.
The fact that we require a persistent table of exports at all, and can't
call back to mountd to authenticate 'new' clients instead of just
telling them to sod off if the kernel doesn't already know about them,
is considered by some to be a bug in knfsd.
--
dwmw2
On Monday, 06/09/2004 at 05:57, David Woodhouse wrote:
> The fact that we require a persistent table of exports at all, and can't
> call back to mountd to authenticate 'new' clients instead of just
> telling them to sod off if the kernel doesn't already know about them,
> is considered by some to be a bug in knfsd.
That should have been fixed in 2.6.x. If you do mount /proc/fs/nfsd, and
use a recent enough version of mountd, then knfsd can and will work
without any extra help from exportfs.
The one problem I have found with this implementation is that it relies
very heavily on reverse-DNS lookups, so it may give unexpected results
if you have more than one name for your client. I can't see why that
shouldn't be fixable, though...
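One way to reduce the dependence on reverse DNS - an assumption on my
part rather than anything guaranteed by the above - is to export by
address rather than by hostname, e.g. in /etc/exports:

    # hypothetical export: match clients by subnet instead of hostname
    /export    192.168.1.0/24(rw,sync)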
Cheers,
Trond
On Sun, 2004-09-05 at 13:01, Trond Myklebust wrote:
> When your server fails to work as per spec, then it is said to be
> "broken" no matter what kernel/nfs-utils combination you are using.
> The spec is that reboots are not supposed to clobber filehandles.
>
> So, there are 3 possibilities:
>
> 1) You are exporting a non-supported filesystem, (e.g. FAT). See the
> FAQ on http://nfs.sourceforge.org.
> 2) A bug in your initscripts is causing the table of exports to be
> clobbered. Running "exportfs" in legacy 2.4 mode (without having the
> nfsd filesystem mounted on /proc/fs/nfsd) appears to be broken for me at
> least...
> 3) There is some other bug in knfsd that nobody else appears to be
> seeing.
>
4) You're exporting a filesystem mounted on a block device whose
device minor number is dynamic and has changed at the last reboot,
e.g. loopback mounts or SCSI.
5) The mapping of minor numbers is stable but you physically re-arranged
the disks or SCSI cards and changed /etc/fstab correspondingly.
Before you say any more: yes, this is broken, and fixing it properly is
Hard. This is why we have the fsid export option.
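For example (the path and host pattern are hypothetical), pinning the
filehandle to a fixed fsid instead of the device number in /etc/exports:

    # fsid=1 keeps the filehandles stable even if the device's minor
    # number changes across reboots
    /export/scratch    *.example.com(rw,sync,fsid=1)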
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.