Hi,
I rebooted one of my NFS servers (no configuration changed), and when it
came back up, all clients that had the exported file system mounted
(NFSv3 hard mount) reported stale NFS file handles, for the file system
root and for files within. The STALE error travelled across the wire
according to ethereal.
The server system runs SuSE Linux 8.2, which ships a 2.4.20 kernel with
some patches (POSIX ACL stuff; SuSE claims it's Solaris-compatible, but I
haven't tested that yet). Their start script is below.
The clients run SuSE Linux 8.1 with a patched 2.4.19 (no ACL stuff). The
server configuration wasn't changed across the reboot, except for
replacing a SCSI terminator (I had borrowed one and gave it back when my
own arrived).
Is there any known issue with NFS-exporting file systems that are hosted
in LVM volumes? Is there an issue with SuSE's ACL patches?
How is the file handle obtained and under what circumstances will it
become stale after a reboot? SuSE's RPM on the server is
nfs-utils-1.0.1-89.
I was under the impression that rebooting a server into the same
configuration would NOT give stale NFS file handles; in fact, this has
worked before with a SuSE Linux 8.1 server (but that one used neither the
ACL patches nor LVM, so there are two -- for me inseparable -- major
differences here).
Sometimes on frustrated days like these I think I should just replace
this Linux NFS with Solaris. :-/
Can somebody point me to documents about NFS file handle internals or
try to explain the situation? Testing directions are welcome, as are
"kill LVM" or "kill ACL patches"; the server isn't in production yet, so
there's still time to fix things for good. (Even "kill SuSE, replace with
Debian/Red Hat" is an acceptable suggestion if there are technical
reasons.)
I also wonder if NFS clients should have a "masochistically_hard" mount
option that issues SIGKILL to processes that use stale NFS file handles;
I could use this...
SuSE 8.2 nfsserver start script excerpt (the nfslock daemon is started
before execution of this script):
PARAMS=3
test "$USE_KERNEL_NFSD_NUMBER" -gt 0 && PARAMS="$USE_KERNEL_NFSD_NUMBER"
echo -n "Starting kernel based NFS server"
/usr/sbin/exportfs -r
/usr/sbin/rpc.nfsd $PARAMS
startproc /usr/sbin/rpc.mountd
--
Matthias 'NFS sucks some days' Andree
There are a number of things that can cause stale NFS file handles on
all clients with file systems mounted from the server when the server
reboots - they include:
The device on the server containing the file system changes, e.g. a
change of SCSI ID.
The file system not being mounted on the server when exported, i.e. the
'wrong' file system is exported.
The contents of /var/lib/nfs/rmtab being missing or corrupt. If a
client's entry is missing in rmtab after a server reboot, then the
server knows nothing about the client's claim to have mounted the file
system - and gives a stale NFS file handle.
As you didn't change anything between reboots, I guess it could be the
last case above.
I've seen similar problems - I use XFS for my file systems and I _think_
the rmtab file contents got replaced with NULLs when the file server
crashed (a known 'problem' with XFS).
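For what it's worth, a quick way to spot that kind of corruption (a
minimal sketch, assuming the usual /var/lib/nfs/rmtab location - adjust
the path for your distribution):
od -c /var/lib/nfs/rmtab | head    # NUL bytes show up as \0
wc -l /var/lib/nfs/rmtab           # rough count of host:path:counter entries
A healthy rmtab has one host:path:0xNNNNNNNN line per client mount; a run
of \0 characters or an empty file after a crash would explain the stale
handles.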
Also, if a client attempts to umount the file system from the server,
but the file system is busy, then rpc.mountd on the server will remove
the client's entry from rmtab, but the umount on the client will fail
and the mount will remain active. If the server now reboots, the client
will get stale NFS file handles when the server comes back up.
James Pearson
James Pearson <[email protected]> writes:
> The device on the server containing the file system changes e.g. change
> of SCSI ID.
>
> The file system not being mounted on the server when exported i.e. the
> 'wrong' file system is exported.
I can rule these two out.
> The contents of /var/lib/nfs/rmtab being missing or corrupt. If a
> client's entry is missing in rmtab after a server reboot, then the
> server knows nothing about the client's claim to have mounted the file
> system - and gives a stale NFS file handle.
> As you didn't change anything between reboots, then I guess it could be
> the last case above.
Possible, but unlikely: I typed "reboot", which went through init and
worked fine, with no hints about fsck checking file systems because of an
unclean shutdown. If it was rmtab, I sure won't find out more before the
problem shows up again.
> I've seen similar problems - I use XFS for my file systems and I _think_
> the rmtab file contents got replaced with NULLs when the file server
> crashed (a known 'problem' with XFS).
That's not specific to XFS; it applies to any file system that only
journals metadata and doesn't enforce a special "data first" write order
the way ext3fs or patched reiserfs do. Incidentally, /var is an XFS
partition, in contrast to the exported partition (which is ext3).
> Also, if a client attempts to umount the file system from the server,
> but the file system is busy, then rpc.mountd on the server will remove
> the client's entry from rmtab, but the umount on the client will fail
> and remain active. If the server now reboots, the client will get stale
> NFS file handles when the server comes back up.
Hmm. I'd have to look at autofs 4.0.0pre10 (that's what the clients use).
It appears as though I can make the client send a mount request to the
server just by typing "mount $MOUNTPOINT"; this increments the counter at
the end of the last line in rmtab:
client.example.org:/exported:0x0000000e
So I'll try a plain "mount" from the client next time I see those stale
NFS file handles.
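To spell out the sequence (the mount point below is hypothetical - it's
whatever the client's fstab or autofs map uses):
mount /import/exported    # illustrative mount point; re-sends a MOUNT request for the already-mounted file system
and on the server the counter at the end of the matching rmtab line goes
up by one, e.g. from 0x0000000e to 0x0000000f.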
I wonder if the NFS client should try sending a mount request when it
gets a stale NFS file handle, if it turns out that a simple mount request
resolves the condition when the server has lost its rmtab.
I configured the machine to create a tar.gz file of /var/lib/nfs at
boot-up time so I can have a look should it go wrong again. Maybe /var
is still too precious for XFS ;-)
--
Matthias Andree
On Thu, 19 Jun 2003, James Pearson wrote:
> The device on the server containing the file system changes e.g.
> change of SCSI ID.
isn't there an export option to allow one to 'batten down' the
exported NFS device (and hence abstract it from the real device ID on
the NFS server)?
regards,
--
Paul Jakma [email protected] [email protected] Key ID: 64A2FF6A
warning: do not ever send email to [email protected]
Fortune:
Someday somebody has got to decide whether the typewriter is the machine,
or the person who operates it.
The export option is 'fsid=N' - you need a recent kernel (2.4.20 and
above) and nfs-utils v1.0.1 and above.
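For illustration, an /etc/exports entry using it might look like this
(the path, host and the options other than fsid are made up):
/export    client.example.org(rw,sync,fsid=1)   # illustrative path and host
With a fixed fsid, the file handle no longer depends on the underlying
device number, so a change of SCSI ID across a reboot shouldn't
invalidate existing handles. Re-run 'exportfs -r' after editing
/etc/exports.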
James Pearson
Matthias Andree wrote:
>
> James Pearson <[email protected]> writes:
>
> > The device on the server containing the file system changes e.g. change
> > of SCSI ID.
> >
> > The file system not being mounted on the server when exported i.e. the
> > 'wrong' file system is exported.
>
> I can rule these two out.
>
> > The contents of /var/lib/nfs/rmtab being missing or corrupt. If a
> > client's entry is missing in rmtab after a server reboot, then the
> > server knows nothing about the client's claim to have mounted the file
> > system - and gives a stale NFS file handle.
> > As you didn't change anything between reboots, then I guess it could be
> > the last case above.
>
> Possible, but unlikely: I typed "reboot", which went through init and
> worked fine, with no hints about fsck checking file systems because of an
> unclean shutdown. If it was rmtab, I sure won't find out more before the
> problem shows up again.
>
> > I've seen similar problems - I use XFS for my file systems and I _think_
> > the rmtab file contents got replaced with NULLs when the file server
> > crashed (a known 'problem' with XFS).
>
> That's not specific to XFS; it applies to any file system that only
> journals metadata and doesn't enforce a special "data first" write order
> the way ext3fs or patched reiserfs do. Incidentally, /var is an XFS
> partition, in contrast to the exported partition (which is ext3).
>
> > Also, if a client attempts to umount the file system from the server,
> > but the file system is busy, then rpc.mountd on the server will remove
> > the client's entry from rmtab, but the umount on the client will fail
> > and remain active. If the server now reboots, the client will get stale
> > NFS file handles when the server comes back up.
>
> Hmm. I'd have to look at autofs 4.0.0pre10 (that's what the clients use).
autofs does its own checks before doing an actual umount, so this is
unlikely.
It would be a problem if you manually ran 'umount -at nfs' on the
clients (I've been caught by this).
> It appears as though I can make the client send a mount request to the
> server just by typing "mount $MOUNTPOINT"; this increments the counter at
> the end of the last line in rmtab:
>
> client.example.org:/exported:0x0000000e
>
> So I'll try a plain "mount" from the client next time I see those stale
> NFS file handles.
>
> I wonder if the NFS client should try sending a mount request when it
> gets a stale NFS file handle, if it turns out that a simple mount request
> resolves the condition when the server has lost its rmtab.
There was a posting from Neil Brown recently about possible changes for
2.6 kernels that mentions doing away with rmtab - see:
http://marc.theaimsgroup.com/?l=linux-nfs&m=105331510308653&w=2
I don't know enough about this, but I _assume_ this will make things a
bit more 'solid'.
> I configured the machine to create a tar.gz file of /var/lib/nfs at
> boot-up time so I can have a look should it go wrong again. Maybe /var
> is still too precious for XFS ;-)
Something I've considered doing, but never got round to it ...
James Pearson
On Fri, 20 Jun 2003, James Pearson wrote:
> There was a posting from Neil Brown recently about possible changes for
> 2.6 kernels, that mentions doing away with rmtab - see:
>
> http://marc.theaimsgroup.com/?l=linux-nfs&m=105331510308653&w=2
nfsd was supposed to be stateless ;-)
> > I configured the machine to create a tar.gz file of /var/lib/nfs at
> > boot-up time so I can have a look should it go wrong again. Maybe /var
> > is still too precious for XFS ;-)
>
> Something I've considered doing, but never got round to it ...
I'm using this in SuSE 8.2's /etc/init.d/boot.local:
tar --quiet -czf /var/lib/nfs-backup-`date -u +%FT%H%M%SZ`.tar.gz /var/lib/nfs
Takes a recent GNU tar and a recent "date" command that knows %F as
YYYY-MM-DD format and -u for UTC.
--
Matthias Andree
On Friday June 20, [email protected] wrote:
> On Fri, 20 Jun 2003, James Pearson wrote:
> > There was a posting from Neil Brown recently about possible changes for
> > 2.6 kernels, that mentions doing away with rmtab - see:
> >
> > http://marc.theaimsgroup.com/?l=linux-nfs&m=105331510308653&w=2
>
> nfsd was supposed to be stateless ;-)
(I noticed the smiley, but thought I would reply anyway...)
Nope. NFS - the protocol - is stateless (except for the state files
stored on the fileserver...).
MOUNTD and NLM are separate supporting protocols, neither of which is
stateless.
And "nfsd" is a program, not a protocol. What does it mean for a
program to be stateless?
NeilBrown
On Sat, 21 Jun 2003, Neil Brown wrote:
> Nope. NFS - the protocol - is stateless (Except for the state file
> files stored on the fileserver....)
> MOUNTD and NLM are separate supporting protocols, neither of
> which are stateless.
I am aware that there is state preserved, and preserving state in itself
isn't bad; it's just that when a reboot without any obvious configuration
change makes mounts go stale for no apparent reason, that's annoying.
> and "nfsd" is a program, not a protocol. What does it mean for a
> program to be stateless?
To have only one trunk for control flow perhaps? :-> Well, seriously,
I'll watch if rmtab confuses the clients.
--
Matthias Andree