On Monday December 10, [email protected] wrote:
> Neil Brown wrote:
> > I was under the impression that /var/run was always cleaned out on
> > reboot, so this shouldn't be a problem. Is my impression wrong?
> I don't think there are any guarantees about this. I was under
> the impression that the individual init scripts were suppose
> to clean those up and I know I fixed a few bugs in that area. ;-)
Debian has /etc/init.d/bootclean which contains:
cd /var/run ; .....
....
find . ! -xtype d ! -name utmp ! -name innd.pid \
-print0 | xargs -0r rm -f -- \
|| ....
so any files except a select few are removed.
SLES has /etc/init.d/boot.cleanup which has
find /var/run /var/lock -type f -print0 | xargs -0 -r rm -f 2>/dev/null
so again, all files get removed.
You seem to be saying that Redhat requires everything in /var/run to
be explicitly cleaned up ??? Aren't standards wonderful :-)
>
> >
> > And I think the point of the lock file was the sm-notify would only
> > run once per reboot.
> Why impose a limit? Why not recover lock at any point?
You only want to recover locks when they have been lost. Boot time is
that time.
The particular situation I wanted to handle was that I wanted it to be
OK to start statd at any time. And I wanted sm-notify to run at boot
time if possible, but also to be run when statd was started if it
hadn't run already.
An 'I have run already' file in /var/run seemed the obvious solution.
>
> > But I think that removing the lock file when sm-notify completes is
> > wrong.
> A side effect of all this is if rpc.statd is restarted and that
> file is not cleaned up the client will never even try to recover any locks.
> That's definitely not a good thing... imho...
There is no need for a client to recover locks if statd is restarted
(is there). I think statd now stores all state in /var/lib/nfs/* so
if it is killed and restarted, it should pick up exactly where it left
off.
The time that you need to notify clients is when *lockd* is
restarted. Or more significantly, when lockd arbitrarily drops it's
locks.
i.e. if you "kill -9" lockd, then you need to do something with
sm-notify and I acknowledge that isn't well documented nor is there an
automatic procedure to make it happen. So there is room for
improvement, but I don't think that running sm-notify every time that
statd is started is an improvement.
>
> >
> > Note the comment in sm-notify.c:
> >
> > /*
> > * Record pid in /var/run/sm-notify.pid
> > * This file should remain until a reboot, even if the
> > * program exits.
> > * If file already exists, fail.
> > */
> I guess I just don't understand why a pid file need to exist after a
> process is gone. Especially if its going to cause the next existence
> to imminently exit (potentially causing data corruption).
> That just does not seem very robust...
Maybe making look like a pid file was a mistake. I'd be OK with
changing the name to /var/run/sm-notify-has-run or similar. If we
could store in the file some reflection of the state of lockd, that
would be even better, but I don't think such a value is available at
the moment.
What potential for data corruption do you see?
NeilBrown