2003-01-28 17:05:24

by Christian Robottom Reis

Subject: lockd shutdown/umount order


As the final cleanup of the diskless shutdown sequence, I would like to
know what procedure is the correct one for an NFS-root workstation.
Currently, the following set of steps happens:

1. killall5 murders rpc.statd and portmap
2. umount -tnfs -a -r runs
3. the kernel complains:
       lockd_down: lockd failed to exit, clearing pid
4. halt goes on without a problem.
5. lockd complains "failed to unmonitor 192.168.99.4" *after halt*

(I never knew kernel threads could complain after halt, amazing)
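
Condensed, the sequence that produces this looks roughly like the
following (a hypothetical condensation of our sendsigs/umountnfs-style
scripts, not the literal files):

    killall5 -15; sleep 5; killall5 -9   # takes rpc.statd and portmap down too
    umount -tnfs -a -r                   # kernel: lockd_down: lockd failed to exit
    halt -d -f                           # "failed to unmonitor 192.168.99.4" after this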

Now seeing this, I have patched killall5 to allow specifying certain
pids for exclusion from the kill list, and I am contemplating *not*
killing statd and portmap. However, I'm not sure if this is the correct
approach. I see the following options:

- Ignore the lockd_down and "failed to unmonitor" messages. Can they
  be safely ignored, and if so, can they be worked around or disabled?

- Make killall5 not murder rpc.statd/portmap (a sketch of this option
  follows the list). However, doesn't this mean that they will be left
  "hanging" from the filesystem's point of view? Remember that
  /var/lib/nfs is *also* hosted on the server.

- Change something else in the shutdown order to make it right (I
can't really see how).
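
For the second option, what I have in mind is roughly this (a sketch
only; it assumes the patched killall5 understands an -o <pid> flag to
omit a process, which stock sysvinit does not have today):

    # hypothetical sendsigs fragment; one -o per pid to spare
    OMIT=""
    for p in $(pidof portmap rpc.statd); do
        OMIT="$OMIT -o $p"
    done
    killall5 -15 $OMIT
    sleep 5
    killall5 -9 $OMIT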

I see that other users have had similar problems:
http://fscked.org/writings/clusters/halt.patch alters the shutdown to
not run killall5 at all, which is not a good solution IMHO since
processes can stay running and block other umounts (and never have a
chance to die cleanly AAR).

Has anyone <wink> run into this problem and found a solution?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL




2003-01-28 18:01:18

by Trond Myklebust

Subject: Re: lockd shutdown/umount order

>>>>> " " == Christian Reis <[email protected]> writes:

> As the final cleanup of the diskless shutdown sequence, I would
> like to know what procedure is the correct one for an NFS-root
> workstation. Currently, the following set of steps happens:

> 1. killall5 murders rpc.statd and portmap
> 2. umount -tnfs -a -r runs
> 3. the kernel complains:
> lockd_down: lockd failed to exit, clearing pid
> 4. halt goes on without a problem.
> 5. lockd complains: failed to unmonitor 192.168.99.4 *after
> halt*

killall5 sucks as a method for shutting down processes. It is far too
unselective.

The above messages mean that you still had a byte-range lock being
held when statd got killed. When the process that held the lock got
killed, the kernel automatically tried to release the lock, but failed
to notify statd that it had done so 'cos statd and portmap were no
longer running.

Make sure that you kill those other processes *before* you kill statd
and portmap.


Note: if you want to be clever about this, it is possible to divine
the pids of the processes that are holding locks (at least in the case
of POSIX locks, which are the interesting ones here) by doing something
like:

    awk '/POSIX/ { print $5; }' </proc/locks
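
A rough ordering built on that (a sketch only; it assumes the lock
holders are ordinary userland processes that a TERM will take down):

    # kill the POSIX-lock holders first, so the kernel can still reach
    # statd/portmap when it releases their locks and unmonitors the server
    LOCKERS=$(awk '/POSIX/ { print $5; }' </proc/locks | sort -u)
    if [ -n "$LOCKERS" ]; then
        kill -TERM $LOCKERS
        sleep 2
    fi
    # ...only after this take down rpc.statd and portmap, then umount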

Cheers,
Trond



2003-01-28 21:38:54

by Christian Robottom Reis

Subject: Re: lockd shutdown/umount order

On Tue, Jan 28, 2003 at 07:01:18PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > As the final cleanup of the diskless shutdown sequence, I would
> > like to know what procedure is the correct one for an NFS-root
> > workstation. Currently, the following set of steps happens:
>
> > 1. killall5 murders rpc.statd and portmap
> > 2. umount -tnfs -a -r runs
> > 3. the kernel complains:
> > lockd_down: lockd failed to exit, clearing pid
> > 4. halt goes on without a problem.
> > 5. lockd complains: failed to unmonitor 192.168.99.4 *after
> > halt*
>
> killall5 sucks as a method for shutting down processes. It is far too
> unselective.

Agreed. The patch will go upstream to Miquel to see if we can at least
get basic selectivity (-o pid to omit, multiple allowed).

> The above messages mean that you still had a byte-range lock being
> held when statd got killed. When the process that held the lock got
> killed, the kernel automatically tried to release the lock, but failed
> to notify statd that it had done so 'cos statd and portmap were no
> longer running.

Well, I put a cat /proc/locks just before the killall5 and no locks are
being held (or at least none are shown), so I don't think that is the
issue. Could it be that lockd is not going down, even after all locks
are released?
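
Roughly, the kind of check I mean (a sketch; it just looks at what
/proc exposes right before killall5 runs):

    cat /proc/locks                 # any POSIX/FLOCK entries left?
    grep ' nfs ' /proc/mounts       # which NFS mounts are still pinning lockd?
    ps ax | grep '\[lockd\]'        # the kernel thread itself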

How *should* lockd be killed, actually? It doesn't seem to care about
the signals we send it, does it?

> Make sure that you kill those other processes *before* you kill statd
> and portmap.

But none of them *are* running, no locks show up, and no special
processes seem to appear in the ps ax output:

  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:03 init
    2 ?        SW     0:00 [keventd]
    3 ?        SWN    0:00 [ksoftirqd_CPU0]
    4 ?        SW     0:00 [kswapd]
    5 ?        SW     0:00 [bdflush]
    6 ?        SW     0:00 [kupdated]
    7 ?        SW     0:00 [eth0]
    8 ?        SW     0:00 [rpciod]
   90 ?        S      0:00 /sbin/portmap
  106 ?        S      0:00 /sbin/rpc.statd
  112 ?        SW     0:00 [lockd]
  166 ?        S      0:00 /sbin/klogd
  509 ?        S      0:00 /bin/sh /etc/init.d/rc 0
  600 ?        S      0:00 /bin/sh /etc/rc0.d/S29sendsigs stop
  607 ?        R      0:00 ps ax

What could it be?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

