2013-02-19 23:37:12

by Mark Lehrer

Subject: Time to declare war on "stale nfs file handle"


I set up a pair of servers with DRBD, Pacemaker, and a virtual IP address. I am mounting with NFSv3 & TCP. The /etc/exports entry looks like this:

/nfs/volume01 *(rw,no_root_squash,fsid=778,insecure,async)
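When checking a failover pair, it can help to confirm that both nodes export the volume with identical options. exports(5) describes fsid= as a way to give the filesystem a stable identity, so that value in particular should match on both servers. A minimal sketch, assuming a single-client entry like the one above; parse_export is a hypothetical helper, not part of nfs-utils:

```python
def parse_export(line):
    """Split one /etc/exports line of the form 'path client(opt,opt=val,...)'.

    Assumption: exactly one client spec per line, as in the entry above.
    Real exports files may list several clients; this sketch ignores that.
    """
    path, spec = line.split(None, 1)
    client, _, rest = spec.strip().partition("(")
    options = {}
    for item in rest.rstrip(")").split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        # Flags such as 'rw' have no value; store True for them.
        options[key] = value if value else True
    return path, client, options


path, client, options = parse_export(
    "/nfs/volume01 *(rw,no_root_squash,fsid=778,insecure,async)"
)
```

Running the same helper against /etc/exports on each node and comparing the resulting dictionaries would show any mismatch in fsid or the other options.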


Failing over between nodes seems to work well whenever I am testing it.
However, whenever the setup has been running for a while and then I
fail over, clients get the dreaded "Stale NFS file handle" error on any
open files. In this case they are VM images, which means lots of
rebooting.

What should I do to prevent this error from happening? I didn't have any
luck with exportfs -f. Unfortunately this is a tough one to duplicate so
I am just trying to prepare a few tests for the next time it happens.
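One possible test to prepare: open a file on the NFS mount before the failover, keep the descriptor, and probe it afterwards. A minimal sketch, assuming a Linux client; handle_is_stale is a hypothetical name, and the probe can be answered from the client's caches, so a clean result is not proof that the handle is good:

```python
import errno
import os


def handle_is_stale(fd):
    """Return True if using the already-open descriptor fails with ESTALE."""
    try:
        os.fstat(fd)        # attribute lookup on the open file
        os.pread(fd, 1, 0)  # read one byte at offset 0, file offset unchanged
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise               # any other error is not what we are testing for
    return False
```

A test run would call os.open() on a file under the mount point before failing over, then call handle_is_stale() on that descriptor once the virtual IP has moved.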

Also, I understand that NFSv4 is not quite as stateless as the older versions.
Will this kind of arrangement have potential data loss issues on NFSv4?

Thanks,
Mark



2013-02-21 21:02:27

by Mark Lehrer

Subject: Re: Time to declare war on "stale nfs file handle"


>> been running for a while and then I fail over, clients get the dreaded
>> Stale NFS file handle error on any open files. In this case they are
>
> That's weird. The contents of /proc/net/rpc/nfsd.fh/content and
> /proc/net/rpc/nfsd.export/content on both nodes after a failed failover
> might be interesting.

Cool, this is exactly the kind of information I was looking for.


> What filesystem are you exporting?

ext4.

Thanks!
Mark


2013-02-20 20:21:22

by J. Bruce Fields

Subject: Re: Time to declare war on "stale nfs file handle"

On Tue, Feb 19, 2013 at 03:36:25PM -0700, Mark Lehrer wrote:
>
> I set up a pair of servers with DRBD, Pacemaker, and a virtual IP address. I am mounting with NFSv3 & TCP. The /etc/exports entry looks like this:
>
> /nfs/volume01 *(rw,no_root_squash,fsid=778,insecure,async)
>
>
> Failing over between nodes seems to work well whenever I am testing it.
> However, it seems that whenever I get into a situation where it has
> been running for a while and then I fail over, clients get the dreaded
> Stale NFS file handle error on any open files. In this case they are

That's weird. The contents of /proc/net/rpc/nfsd.fh/content and
/proc/net/rpc/nfsd.export/content on both nodes after a failed failover
might be interesting.
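Since the problem is hard to reproduce, it may be worth having something ready that saves those two files the moment a failover goes wrong. A minimal sketch, to be run on each node; the snapshot function and the output file naming are assumptions, only the two /proc paths come from this thread:

```python
import pathlib
import socket
import time

CACHE_FILES = (
    "/proc/net/rpc/nfsd.fh/content",
    "/proc/net/rpc/nfsd.export/content",
)


def snapshot(files=CACHE_FILES, out_dir="."):
    """Copy each readable cache file to out_dir, tagged with hostname and time."""
    tag = f"{socket.gethostname()}-{time.strftime('%Y%m%dT%H%M%S')}"
    saved = []
    for name in files:
        src = pathlib.Path(name)
        try:
            text = src.read_text()
        except OSError:
            continue  # file missing or unreadable (e.g. nfsd not running here)
        # src.parent.name is "nfsd.fh" or "nfsd.export" for the default paths.
        dest = pathlib.Path(out_dir) / f"{tag}-{src.parent.name}.txt"
        dest.write_text(text)
        saved.append(dest)
    return saved
```

Calling snapshot() on both nodes right after a failed failover leaves one text file per cache and host, which can then be diffed or posted to the list.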

What filesystem are you exporting?

> VM images, which means lots of rebooting.
>
> What should I do to prevent this error from happening? I didn't have any
> luck with exportfs -f. Unfortunately this is a tough one to duplicate so
> I am just trying to prepare a few tests for the next time it happens.
>
> Also, I understand that NFSv4 is not quite as stateless as the older versions.
> Will this kind of arrangement have potential data loss issues on NFSv4?

It should still work if you copy over all the right stuff in
/var/lib/nfs.

--b.