2004-05-06 18:59:44

by Garrick Staples

[permalink] [raw]
Subject: nfsd, rmtab, failover, and stale filehandles

I'm trying to get a pair of NFS failover servers working correctly. I've got
all heartbeat scripts worked out, the common storage, IP takeover, etc. The
only problem I have now is random stale filehandles after a failover.
Sometimes it works, sometimes it doesn't.

I was originally trying to doctor up rmtab and carry over the mount info to the
second server, but according to recent docs, rmtab is no longer used and NFS
requests should simply always work, as long as the filesystem is exported to
the client hosts.

Is there anything I can do to track down these stale filehandles, figure out
what's causing them, and prevent them? There's a thread going on right now about
ESTALEs with patches being thrown around, but I don't know if that pertains to
me.

Btw, does anyone know of a *searchable* archive of this list? The sf.net
interface drives me nuts.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-06 19:07:22

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 11:56:03AM -0700, Garrick Staples wrote:
> Btw, does anyone know of a *searchable* archive of this list? The sf.net
> interface drives me nuts.

marc.theaimsgroup.com seems to be a popular choice:

http://marc.theaimsgroup.com/?l=linux-nfs

--b.



2004-05-06 19:15:44

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 03:07:18PM -0400, J. Bruce Fields alleged:
> On Thu, May 06, 2004 at 11:56:03AM -0700, Garrick Staples wrote:
> > Btw, does anyone know of a *searchable* archive of this list? The sf.net
> > interface drives me nuts.
>
> marc.theaimsgroup.com seems to be a popular choice:
>
> http://marc.theaimsgroup.com/?l=linux-nfs

Thank you! I had tried that earlier as l=nfs and didn't find it. *duh*

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-06 19:17:32

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 11:56:03AM -0700, Garrick Staples alleged:
> I'm trying to get a pair of NFS failover servers working correctly. I've got
> all heartbeat scripts worked out, the common storage, IP takeover, etc. The
> only problem I have now is random stale filehandles after a failover.
> Sometimes it works, sometimes it doesn't.
>
> I was originally trying to doctor up rmtab and carry over the mount info to the
> second server, but according to recent docs, rmtab is no longer used and NFS
> requests should simply always work, as long as the filesystem is exported to
> the client hosts.

Btw, I'm using linux 2.6.5, glibc 2.3.2, and nfs-utils 1.0.6.


--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-06 21:54:37

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 12:13:52PM -0700, Garrick Staples alleged:
> On Thu, May 06, 2004 at 11:56:03AM -0700, Garrick Staples alleged:
> > I'm trying to get a pair of NFS failover servers working correctly. I've got
> > all heartbeat scripts worked out, the common storage, IP takeover, etc. The
> > only problem I have now is random stale filehandles after a failover.
> > Sometimes it works, sometimes it doesn't.
> >
> > I was originally trying to doctor up rmtab and carry over the mount info to the
> > second server, but according to recent docs, rmtab is no longer used and NFS
> > requests should simply always work, as long as the filesystem is exported to
> > the client hosts.
>
> Btw, I'm using linux 2.6.5, glibc 2.3.2, and nfs-utils 1.0.6.

I'm at a total loss. Everything I'm reading tells me that all I need to ensure
is that fs device names are the same on both servers so that the generated
filehandles are the same, and that I need to move all lines matching
":$mountpoint:" in rmtab to the new server. The former is done since I'm using
persistent device numbers with lvm. The latter shouldn't be needed because I'm
using the "new" proc interface with 2.6.5.
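
For reference, the rmtab-carryover approach those older docs describe amounts
to something like the sketch below; the mountpoint and rmtab paths are only
examples, and as noted it shouldn't be needed with the new cache interface.

# on the old server (or from a copy kept on shared storage): pull out the
# rmtab entries for the filesystem being taken over
mountpoint=/export/home                          # example mountpoint
grep ":$mountpoint:" /var/lib/nfs/rmtab > /tmp/rmtab.carry

# copy /tmp/rmtab.carry to the takeover server, then on that server:
cat /tmp/rmtab.carry >> /var/lib/nfs/rmtab
exportfs -ra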

rmtab definitely doesn't make any noticeable difference. I can add random text
and blank it out with no noticeable difference on the clients.

Is this a client problem? The clients are all 2.4.24 and 2.4.26.

All clients and servers are using vanilla kernels.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-06 22:25:00

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 02:53:11PM -0700, Garrick Staples wrote:
> I'm at a total loss. Everything I'm reading tells me that all I need to ensure
> is that fs device names are the same on both servers so that the generated
> filehandles are the same, and that I need to move all lines matching
> ":$mountpoint:" in rmtab to the new server. The former is done since I'm using
> persistent device numbers with lvm. The latter shouldn't be needed because I'm
> using the "new" proc interface with 2.6.5.
>
> rmtab definitely doesn't make any noticeable difference. I can add random text
> and blank it out with no noticeable difference on the clients.
>
> Is this a client problem? The clients are all 2.4.24 and 2.4.26.
>
> All clients and servers are using vanilla kernels.

I'm assuming /etc/exports is the same, and that the nfsd filesystem is
mounted (probably on /proc/fs/nfsd) and mountd is running without
complaint on the server that you're failing over to?

The kernel uses upcalls to mountd in part to construct the filehandles,
and nfserr_stale could be returned if those upcalls weren't working.
You can see the contents of the caches that hold the result of those
upcalls with something like

for i in `find /proc/net/rpc -name "content"`; do echo -e "\n$i:"; cat $i; done

Maybe the output from that (after a failed failover) would be
enlightening.

Hmm, also, could you try recompiling mountd with the following patch
applied?

--Bruce Fields


mountd needs to make sure that its internal state is synchronized with
etab before responding to kernel upcalls.


utils/mountd/cache.c | 6 ++++++
1 files changed, 6 insertions(+)

diff -puN utils/mountd/cache.c~upcall_export_check utils/mountd/cache.c
--- nfs-utils-1.0.6/utils/mountd/cache.c~upcall_export_check 2004-01-26 18:43:51.000000000 -0500
+++ nfs-utils-1.0.6-bfields/utils/mountd/cache.c 2004-01-26 18:43:51.000000000 -0500
@@ -67,6 +67,8 @@ void auth_unix_ip(FILE *f)
if (inet_aton(ipaddr, &addr)==0)
return;

+ auth_reload();
+
/* addr is a valid, interesting address, find the domain name... */
client = client_compose(addr);

@@ -138,6 +140,8 @@ void nfsd_fh(FILE *f)
break;
}

+ auth_reload();
+
/* Now determine export point for this fsid/domain */
for (i=0 ; i < MCL_MAXTYPES; i++) {
for (exp = exportlist[i]; exp; exp = exp->m_next) {
@@ -236,6 +240,8 @@ void nfsd_export(FILE *f)
if (qword_get(&cp, path, strlen(lbuf)) <= 0)
goto out;

+ auth_reload();
+
/* now find flags for this export point in this domain */
for (i=0 ; i < MCL_MAXTYPES; i++) {
for (exp = exportlist[i]; exp; exp = exp->m_next) {

_



2004-05-06 22:44:22

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 06:24:55PM -0400, J. Bruce Fields alleged:
> On Thu, May 06, 2004 at 02:53:11PM -0700, Garrick Staples wrote:
> > I'm at a total loss. Everything I'm reading tells me that all I need to ensure
> > is that fs device names are the same on both servers so that the generated
> > filehandles are the same, and that I need to move all lines matching
> > ":$mountpoint:" in rmtab to the new server. The former is done since I'm using
> > persistent device numbers with lvm. The latter shouldn't be needed because I'm
> > using the "new" proc interface with 2.6.5.
> >
> > rmtab definitely doesn't make any noticeable difference. I can add random text
> > and blank it out with no noticeable difference on the clients.
> >
> > Is this a client problem? The clients are all 2.4.24 and 2.4.26.
> >
> > All clients and servers are using vanilla kernels.
>
> I'm assuming /etc/exports is the same, and that the nfsd filesystem is
> mounted (probably on /proc/fs/nfs/) and mountd is running without
> complaint on the server that you're failing over to?

/etc/exports is basically empty, so that the machines boot without exporting the
shared disk space. The failover scripts export the filesystems directly with
'exportfs'.
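
The export step in the takeover script amounts to something like the following
sketch (the device, mountpoint, and options here are just examples, not the
real config):

# run on the server taking over: mount the shared volume, then export it
# to the client network
mount /dev/vgtest/lvtest /mnt/test
exportfs -o rw,no_subtree_check,no_root_squash,async 10.125.0.0/16:/mnt/test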

The nfsd fs is mounted. I'm definitely using the "new" upcall interface.

> The kernel uses upcalls to mountd in part to construct the filehandles,
> and nfserr_stale could be returned if those upcalls weren't working.
> You can see the contents of the caches that hold the result of those
> upcalls with something like

Yes, this seems to be the process that isn't working.


> for i in `find /proc/net/rpc -name "content"`; do echo -e "\n$i:"; cat $i; done
>
> Maybe the output from that (after a failed failover) would be
> enlightening.

$ for i in `find /proc/net/rpc -name "content"`; do echo -e "\n$i:"; cat $i; done

/proc/net/rpc/nfsd.fh/content:
#domain fsidtype fsid [path]

/proc/net/rpc/nfsd.export/content:
#path domain(flags)

/proc/net/rpc/auth.unix.ip/content:
#class IP domain
[root@hpc-fs3 root]# for i in `find /proc/net/rpc -name "content"`; do echo -e "\n$i:"; cat $i; done

/proc/net/rpc/nfsd.fh/content:
#domain fsidtype fsid [path]
# hpc*,hpc*.usc.edu 0 0x0200fe0000000002
# hpc*,hpc*.usc.edu 0 0x0600fe0000000002

/proc/net/rpc/nfsd.export/content:
#path domain(flags)

/proc/net/rpc/auth.unix.ip/content:
#class IP domain
nfsd 10.125.0.200 hpc*,hpc*.usc.edu

$ showmount -e localhost | grep hpc-25
/export/home/hpc-25 hpc*.usc.edu,rcf*.usc.edu,almaak.usc.edu


On the client (10.125.0.200),
$ df | grep hpc-25
hpc-nfs2:/export/home/hpc-25
0 1 0 0% /auto/hpc-25

hpc-nfs2 is the virtual IP that is currently rebound to the second NFS server.
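
(Rebinding a virtual IP like hpc-nfs2 by hand, outside of heartbeat, amounts
to roughly the following; the address and interface are only examples, and
heartbeat's IPaddr/send_arp resource normally does the equivalent.)

# bring the service address up on the surviving server, then send
# gratuitous ARP so clients update their ARP caches
ip addr add 10.125.0.10/16 dev eth0      # example virtual IP for hpc-nfs2
arping -c 3 -U -I eth0 10.125.0.10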


>
> Hmm, also, could you try recompiling mountd with the following patch
> applied?
>
> --Bruce Fields

Trying it now...

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-06 23:02:22

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 06:24:55PM -0400, J. Bruce Fields alleged:
> On Thu, May 06, 2004 at 02:53:11PM -0700, Garrick Staples wrote:
> > I'm at a total loss. Everything I'm reading tells me that all I need to ensure
> > is that fs device names are the same on both servers so that the generated
> > filehandles are the same, and that I need to move all lines matching
> > ":$mountpoint:" in rmtab to the new server. The former is done since I'm using
> > persistent device numbers with lvm. The latter shouldn't be needed because I'm
> > using the "new" proc interface with 2.6.5.
> >
> > rmtab definitely doesn't make any noticeable difference. I can add random text
> > and blank it out with no noticeable difference on the clients.
> >
> > Is this a client problem? The clients are all 2.4.24 and 2.4.26.
> >
> > All clients and servers are using vanilla kernels.
> Hmm, also, could you try recompiling mountd with the following patch
> applied?

No effect. All I did was kill mountd, run the patched mountd, and try to 'ls'
on the client.

Watching an strace of mountd, it doesn't seem that mountd is ever contacted.
The client reports stale filehandle pretty much immediately, and mountd doesn't
do anything.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-07 17:19:39

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 04:00:55PM -0700, Garrick Staples alleged:
> On Thu, May 06, 2004 at 06:24:55PM -0400, J. Bruce Fields alleged:
> > On Thu, May 06, 2004 at 02:53:11PM -0700, Garrick Staples wrote:
> > > I'm at a total loss. Everything I'm reading tells me that all I need to ensure
> > > is that fs device names are the same on both servers so that the generated
> > > filehandles are the same, and that I need to move all lines matching
> > > ":$mountpoint:" in rmtab to the new server. The former is done since I'm using
> > > persistent device numbers with lvm. The latter shouldn't be needed because I'm
> > > using the "new" proc interface with 2.6.5.
> > >
> > > rmtab definitely doesn't make any noticeable difference. I can add random text
> > > and blank it out with no noticeable difference on the clients.
> > >
> > > Is this a client problem? The clients are all 2.4.24 and 2.4.26.
> > >
> > > All clients and servers are using vanilla kernels.
> > Hmm, also, could you try recompiling mountd with the following patch
> > applied?
>
> No effect. All I did was kill mountd, run the patched mountd, and try to 'ls'
> on the client.
>
> Watching an strace of mountd, it doesn't seem that mountd is ever contacted.
> The client reports stale filehandle pretty much immediately, and mountd doesn't
> do anything.

I don't know how to debug this further. At this point I'm assuming nfsd is
immediately returning the ESTALE because there's something about the filehandle
it doesn't like. Is there some way to look at the filehandle on the 2.4 client
and see why the 2.6 server is rejecting it?
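
The only extra server-side knob I can think of (assuming the rpcdebug utility
from nfs-utils is available) is knfsd's filehandle debugging, e.g.:

rpcdebug -m nfsd -s fh     # enable filehandle debugging in knfsd
# reproduce the ESTALE from the client, then look at dmesg/syslog on the server
rpcdebug -m nfsd -c fh     # turn it back off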

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-07 18:25:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 06, 2004 at 04:00:55PM -0700, Garrick Staples wrote:
> No effect. All I did was kill mountd, run the patched mountd, and try to 'ls'
> on the client.

I think that's not enough....

> Watching an strace of mountd, it doesn't seem that mountd is ever contacted.
> The client reports stale filehandle pretty much immediately, and mountd doesn't
> do anything.

The upcall was made earlier, and the kernel has cached it (even if it's
a negative result). So, just to make sure, could you try:

killall rpc.mountd
exportfs -f #flush the kernel's caches
rpc.mountd #(the patched version)

and then try again?

Of course, I may be barking up completely the wrong tree here, but I
think that bug in mountd could explain the problem you're seeing....

--b.



2004-05-07 21:41:55

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Fri, May 07, 2004 at 02:25:43PM -0400, J. Bruce Fields alleged:
> On Thu, May 06, 2004 at 04:00:55PM -0700, Garrick Staples wrote:
> > No effect. All I did was kill mountd, run the patched mountd, and try to 'ls'
> > on the client.
>
> I think that's not enough....
>
> > Watching an strace of mountd, it doesn't seem that mountd is ever contacted.
> > The client reports stale filehandle pretty much immediately, and mountd doesn't
> > do anything.
>
> The upcall was made earlier, and the kernel has cached it (even if it's
> a negative result). So, just to make sure, could you try:
>
> killall rpc.mountd
> exportfs -f #flush the kernel's caches
> rpc.mountd #(the patched version)

Hey! That did, in fact, revive a long-standing stale filehandle! Cheers for
Mr. Fields!

But now I've just noticed something else that is odd, something I'm surprised
I haven't come across before: I have to unexport filesystems twice before I
can unmount them.

Good and normal...

[root@hpc-fs3 root]# mount /dev/vgtest/lvtest /mnt/test
[root@hpc-fs3 root]# exportfs -o rw,no_subtree_check,no_root_squash,async 10.125.0.0/16:/mnt/test
[root@hpc-fs3 root]# exportfs -u 10.125.0.0/16:/mnt/test
[root@hpc-fs3 root]# umount /mnt/test


Same thing, but I did an 'ls' on the client after it was exported...

[root@hpc-fs3 root]# mount /dev/vgtest/lvtest /mnt/test
[root@hpc-fs3 root]# exportfs -o rw,no_subtree_check,no_root_squash,async 10.125.0.0/16:/mnt/test
[root@hpc-fs3 root]# exportfs -u 10.125.0.0/16:/mnt/test
[root@hpc-fs3 root]# umount /mnt/test
umount: /mnt/test: device is busy
[root@hpc-fs3 root]# exportfs -u 10.125.0.0/16:/mnt/test
[root@hpc-fs3 root]# umount /mnt/test

Running 'exportfs -f' works just as well as the second 'exportfs -u'.
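
In script form, the release side of the failover with that workaround folded
in looks something like this sketch (the mountpoint and network are the test
values from above):

exportfs -u 10.125.0.0/16:/mnt/test
exportfs -f        # flush the kernel's export/fh caches so nfsd drops its reference
umount /mnt/test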

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-07 21:52:07

by elijah wright

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles


> [root@hpc-fs3 root]# mount /dev/vgtest/lvtest /mnt/test
> [root@hpc-fs3 root]# exportfs -o rw,no_subtree_check,no_root_squash,async 10.125.0.0/16:/mnt/test
> [root@hpc-fs3 root]# exportfs -u 10.125.0.0/16:/mnt/test

those sure are some weird IP addresses ( .0.0? ) ...

elijah




2004-05-07 22:18:29

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Fri, May 07, 2004 at 04:52:02PM -0500, elijah wright alleged:
>
> > [root@hpc-fs3 root]# mount /dev/vgtest/lvtest /mnt/test
> > [root@hpc-fs3 root]# exportfs -o rw,no_subtree_check,no_root_squash,async 10.125.0.0/16:/mnt/test
> > [root@hpc-fs3 root]# exportfs -u 10.125.0.0/16:/mnt/test
>
> those sure are some weird IP addresses ( .0.0? ) ...

10.125.0.0/16 is network notation. It means all IPs from 10.125.0.0 through
10.125.255.255.

--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-07 22:27:53

by elijah wright

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles



oops. sorry, i had it in my head that those were in mount commands rather
than exportfs commands. no clue what i was thinking.

elijah


> > > [root@hpc-fs3 root]# mount /dev/vgtest/lvtest /mnt/test
> > > [root@hpc-fs3 root]# exportfs -o rw,no_subtree_check,no_root_squash,async 10.125.0.0/16:/mnt/test
> > > [root@hpc-fs3 root]# exportfs -u 10.125.0.0/16:/mnt/test
> >
> > those sure are some weird IP addresses ( .0.0? ) ...
>
> 10.125.0.0/16 is network notation. It means all IPs from 10.125.0.0 through
> 10.125.255.255.
>
>



2004-05-07 22:35:43

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Fri, May 07, 2004 at 02:38:12PM -0700, Garrick Staples alleged:
> But now I've just noticed something else that is odd. Something that I'm
> surprised I haven't come across before... I have to unexport filesystems
> twice before I can unmount them.

I'm looking at exportfs.c to see if I can add in a call to flush the caching
when unmounting, does the following seem reasonable?

Force a cache flush if unexporting, and don't reread rmtab if we're using the
new cache.

--- exportfs.c_orig 2004-05-07 15:32:22.423905760 -0700
+++ exportfs.c 2004-05-07 15:33:33.669998637 -0700
@@ -127,10 +127,13 @@
*/
if (!f_reexport)
xtab_export_read();
- if (!f_export)
+ if (!f_export) {
+ force_flush=1;
for (i = optind ; i < argc ; i++)
unexportfs(argv[i], f_verbose);
- rmtab_read();
+ }
+ if (!new_cache)
+ rmtab_read();
}
if (!new_cache) {
xtab_mount_read();


--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-11 18:30:27

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

I just wanted to mention that failover has been rock solid with those two
patches. Thank you very much for your help!

Does the patch below look ok?


On Fri, May 07, 2004 at 03:34:15PM -0700, Garrick Staples alleged:
> On Fri, May 07, 2004 at 02:38:12PM -0700, Garrick Staples alleged:
> > But now I've just noticed something else that is odd. Something that I'm
> > surprised I haven't come across before... I have to unexport filesystems
> > twice before I can unmount them.
>
> I'm looking at exportfs.c to see if I can add in a call to flush the caching
> when unmounting, does the following seem reasonable?
>
> Force a cache flush if unexporting, and don't reread rmtab if we're using the
> new cache.
>
> --- exportfs.c_orig 2004-05-07 15:32:22.423905760 -0700
> +++ exportfs.c 2004-05-07 15:33:33.669998637 -0700
> @@ -127,10 +127,13 @@
> */
> if (!f_reexport)
> xtab_export_read();
> - if (!f_export)
> + if (!f_export) {
> + force_flush=1;
> for (i = optind ; i < argc ; i++)
> unexportfs(argv[i], f_verbose);
> - rmtab_read();
> + }
> + if (!new_cache)
> + rmtab_read();
> }
> if (!new_cache) {
> xtab_mount_read();
>
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California



--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-05-13 21:00:50

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Fri, May 07, 2004 at 03:34:15PM -0700, Garrick Staples wrote:
> I'm looking at exportfs.c to see if I can add in a call to flush the caching
> when unmounting, does the following seem reasonable?

Small note--I think your mailer must be messing up your patches--the
patch you appended e.g. has all tabs changed to spaces, and doesn't
apply.

> Force a cache flush if unexporting, and don't reread rmtab if we're using the
> new cache.

Hm. I don't understand why unexporting is a special case here; what
about changing export options? When I run exportfs, I expect any
changes to take place before it returns, so I don't understand why
there's a "force_flush" option at all--shouldn't the kernel's export
tables be flushed every time exportfs is run?

Can someone explain what the reason was for adding the -f flag to
exportfs?

--Bruce Fields



2004-05-13 21:40:02

by Garrick Staples

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Thu, May 13, 2004 at 05:00:45PM -0400, J. Bruce Fields alleged:
> On Fri, May 07, 2004 at 03:34:15PM -0700, Garrick Staples wrote:
> > I'm looking at exportfs.c to see if I can add in a call to flush the caching
> > when unmounting, does the following seem reasonable?
>
> Small note--I think your mailer must be messing up your patches--the
> patch you appended e.g. has all tabs changed to spaces, and doesn't
> apply.

That's odd. I use mutt with vi. IIRC, I used vi's read function to read the
patch into the email. I can't imagine where any conversion would have taken
place.

Perhaps I copy/pasted from 'less', but I don't think so.


--
Garrick Staples, Linux/HPCC Administrator
University of Southern California



2004-06-08 03:06:14

by NeilBrown

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Friday May 7, [email protected] wrote:
> On Fri, May 07, 2004 at 02:38:12PM -0700, Garrick Staples alleged:
> > But now I've just noticed something else that is odd. Something that I'm
> > surprised I haven't come across before... I have to unexport filesystems
> > twice before I can unmount them.

Thanks for reporting this. Sorry it has taken me so long to look at
it.

I can reproduce the problem, and I know what is causing it.

When caches are flushed, as they are at export, unexport, or
re-export, the caches are flushed in the order:
auth.unix.ip nfsd.export nfsd.fh

Flushing a cache involves:

1/ setting the "flush_time" on a cache to the time given. This has
   the effect of marking all entries in the cache with an earlier
   update time as expired.
2/ setting the "nextcheck" time for the cache to "now" so that all
   entries will be checked when we....
3/ call "cache_flush", which checks which entries have expired and
   are no longer in use, and removes them.

The problem is that entries in nfsd.fh refer to entries in
nfsd.export. So when we flush nfsd.export, some entries will still
be in-use by nfsd.fh and so will not be removed.

We then flush nfsd.fh, which calls "cache_flush", but as the nextcheck
on nfsd.export has been updated, it isn't checked again. So the entry
stays there.

The fix, which I have tested and does appear to work, is to change the
order in which the caches are flushed. This is in "cache_flush" at
the end of support/nfs/cacheio.c. The order should be:

static char *cachelist[] = {
	"auth.unix.ip",
	"nfsd.fh",
	"nfsd.export",
	NULL
};
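
For what it's worth, the same flush can be driven by hand from a shell, which
is handy for testing: writing a time (in seconds since epoch) into each
cache's "flush" file sets its flush_time, which is essentially what
cache_flush() in nfs-utils does. A sketch, using the corrected order:

now=`date +%s`
for c in auth.unix.ip nfsd.fh nfsd.export; do
    echo $now > /proc/net/rpc/$c/flush
done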


>
> I'm looking at exportfs.c to see if I can add in a call to flush the caching
> when unmounting, does the following seem reasonable?

The code already flushes the caches if anything has changed.

Towards the end of main() in exportfs.c there is:

	xtab_export_write();
	if (new_cache)
		cache_flush(force_flush);

xtab_export_write writes out a new copy of the export table and if
that is different (byte for byte) from the old copy, it is moved
in place of the old copy. So if you didn't make a change, the file
isn't touched.

Then cache_flush will flush the caches giving the modify time on the
'etab' file (which xtab_export_write might have changed) as the flush
time. If "force_flush==1", the current time is used instead of the
modify time.

So exportfs will always flush the caches, but will only flush entries
added before the etab file was last updated.

Adding "-f" will make it flush all changes, even if the etab isn't
updated.

So I cannot see how your patch would make any difference. Setting
force_flush at that point would have zero effect if the exportfs
command actually changed the etab file at all.

Your change to skip rmtab_read when new_cache is set is probably good,
though I cannot see how it would have a significant effect in this
case.

And to answer Bruce's subsequent question:
> Can someone explain what the reason was for adding the -f flag to
> exportfs?
Because it is useful for testing. It should not be needed during
normal running.

I'll check in the cache-flush-order and the no-rmtab_read changes to
CVS shortly.

NeilBrown




>
> Force a cache flush if unexporting, and don't reread rmtab if we're using the
> new cache.
>
> --- exportfs.c_orig 2004-05-07 15:32:22.423905760 -0700
> +++ exportfs.c 2004-05-07 15:33:33.669998637 -0700
> @@ -127,10 +127,13 @@
> */
> if (!f_reexport)
> xtab_export_read();
> - if (!f_export)
> + if (!f_export) {
> + force_flush=1;
> for (i = optind ; i < argc ; i++)
> unexportfs(argv[i], f_verbose);
> - rmtab_read();
> + }
> + if (!new_cache)
> + rmtab_read();
> }
> if (!new_cache) {
> xtab_mount_read();
>
>
> --
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California



2004-06-08 03:42:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd, rmtab, failover, and stale filehandles

On Tue, Jun 08, 2004 at 01:05:59PM +1000, Neil Brown wrote:
> > I'm looking at exportfs.c to see if I can add in a call to flush the caching
> > when unmounting, does the following seem reasonable?
>
> The code already flushes the caches if anything has changed.
>
> Towards the end of main() in exportfs.c there is:
>
> xtab_export_write();
> if (new_cache)
> cache_flush(force_flush);

Ah! I missed that.

> And to answer Bruce's subsequent question:
> > Can someone explaing what the reason was for adding the -f flag to
> > exportfs?
> Because it is useful for testing. It should not be needed during
> normal running.

Got it, that makes more sense. But if the only time "-f" is ever needed
is when there's a bug somewhere, then maybe it should just be removed,
or at least not documented? Otherwise it seems like it's only likely to
cause confusion.

--b.

