2004-08-10 09:06:28

by Frank Steiner

Subject: nfsd stales when restarting too fast

Hi,

I posted this on the kernel list already, but now that I'm subscribed here,
I guess this is the better place :-) Neil already replied to my mail on
LKML, but his first suggestion (changing the order of exportfs and killall)
didn't help.

System: SuSE 9.0 with kernel 2.6.7 (also tested up to 2.6.8-rc3) and util-linux-2.12.

Also tested with SuSE 9.1/SLES9 and SuSE's kernel 2.6.5.

When running "/etc/init.d/nfsserver restart" on the server, the clients
report "stale NFS handle" for all mounted directories that were in use
during the restart (e.g. if /var is mounted and syslogd is running, or if
a "find" is running on a mounted directory). The stale directories never
return to a sane state (except by restarting with a sleep, see below).

When using
/etc/init.d/nfsserver stop
sleep 2
/etc/init.d/nfsserver start

(or putting a "sleep 1" between the lines "$0 stop" and "$0 start" in the
init script), everything works fine. Restarting with sleep 2 also brings
back the client directories that went stale after an earlier restart
without the sleep.

Without the init script, the problem can be narrowed down to:

killall -9 nfsd
killall -9 /usr/sbin/rpc.mountd
/usr/sbin/exportfs -au
[sleep 2]
/usr/sbin/exportfs -r
/usr/sbin/rpc.nfsd
/usr/sbin/rpc.mountd

It goes stale without the sleep and does not with it. This behaviour is
independent of options like v3/v4, tcp/udp and lock/nolock, and it did
not happen with 2.4.

Unless this is something easy to fix in the kernel nfsd or client, it
might be a good idea to insert such a sleep statement into the
distributors' init scripts to keep people from running into this error.
I assume the problem reported in the mail "machine hangs - SLES9/NFS"
had the same cause.


cu,
Frank

--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-08-18 03:24:54

by NeilBrown

Subject: Re: nfsd stales when restarting too fast

On Tuesday August 10, [email protected] wrote:
> Without the init script, the problem can be narrowed down to:
>
> killall -9 nfsd
> killall -9 /usr/sbin/rpc.mountd
> /usr/sbin/exportfs -au
> [sleep 2]
> /usr/sbin/exportfs -r
> /usr/sbin/rpc.nfsd
> /usr/sbin/rpc.mountd
>
> It goes stale without the sleep and does not with it. This behaviour is
> independent of options like v3/v4, tcp/udp and lock/nolock, and it did
> not happen with 2.4.

Probably the best solution is to "not do that" - why do you want to
stop and then restart the server anyway? Why not just leave it
running.

However, there is a race there, and "sleep 1" would fix it.
Another fix would be to use "-1" instead of "-9" to kill nfsd. This
causes it to exit without clearing the export table.
A third fix would be to apply the following patch to your 2.6 kernel.

NeilBrown


diff ./net/sunrpc/cache.c~current~ ./net/sunrpc/cache.c
--- ./net/sunrpc/cache.c~current~ 2004-08-18 13:07:44.000000000 +1000
+++ ./net/sunrpc/cache.c 2004-08-18 13:12:10.000000000 +1000
@@ -400,9 +400,10 @@ void cache_flush(void)

void cache_purge(struct cache_detail *detail)
{
- detail->flush_time = get_seconds()+1;
+ detail->flush_time = LONG_MAX;
detail->nextcheck = get_seconds();
cache_flush();
+ detail->flush_time = 1;
}


2004-08-18 07:17:03

by Frank Steiner

Subject: Re: nfsd stales when restarting too fast

Neil Brown wrote:
>
> Probably the best solution is to "not do that" - why do you want to
> stop and then restart the server anyway? Why not just leave it
> running.

Aeh, well, yes. Of course I should not restart but just reload when
something has changed... That's because I'm not used to the reload
option, since it didn't exist in the early SuSE versions I was using, so
I always use restart. With reload (which just runs exportfs -r in the
SuSE init script) no problem occurs. I definitely should have thought
of that *feeling stupid* :-((

>
> However, there is a race there, and "sleep 1" would fix it.
> Another fix would be to use "-1" instead of "-9" to kill nfsd. This
> causes it to exit without clearing the export table.
> A third fix would be to apply the following patch to your 2.6 kernel.

Just in case I forget to reload again next time, I will include your
patch in my kernel rpm too. Just to be sure :-)

Thanks for your help!
cu,
Frank

