2007-12-11 03:28:36

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] nfs-utils: sm-notify does not remove its pid file.

On Monday December 10, [email protected] wrote:
> Neil Brown wrote:
> > I was under the impression that /var/run was always cleaned out on
> > reboot, so this shouldn't be a problem. Is my impression wrong?
> I don't think there are any guarantees about this. I was under
> the impression that the individual init scripts were suppose
> to clean those up and I know I fixed a few bugs in that area. ;-)

Debian has /etc/init.d/bootclean which contains:

cd /var/run ; .....
....
find . ! -xtype d ! -name utmp ! -name innd.pid \
-print0 | xargs -0r rm -f -- \
|| ....

so any files except a select few are removed.

SLES has /etc/init.d/boot.cleanup which has
find /var/run /var/lock -type f -print0 | xargs -0 -r rm -f 2>/dev/null

so again, all files get removed.

You seem to be saying that Redhat requires everything in /var/run to
be explicitly cleaned up ??? Aren't standards wonderful :-)


>
> >
> > And I think the point of the lock file was the sm-notify would only
> > run once per reboot.
> Why impose a limit? Why not recover lock at any point?

You only want to recover locks when they have been lost. Boot time is
that time.

The particular situation I wanted to handle was that I wanted it to be
OK to start statd at any time. And I wanted sm-notify to run at boot
time if possible, but also to be run when statd was started if it
hadn't run already.
An 'I have run already' file in /var/run seemed the obvious solution.

>
> > But I think that removing the lock file when sm-notify completes is
> > wrong.
> A side effect of all this is if rpc.statd is restarted and that
> file is not cleaned up the client will never even try to recover any locks.
> That's definitely not a good thing... imho...

There is no need for a client to recover locks if statd is restarted
(is there). I think statd now stores all state in /var/lib/nfs/* so
if it is killed and restarted, it should pick up exactly where it left
off.

The time that you need to notify clients is when *lockd* is
restarted. Or more significantly, when lockd arbitrarily drops it's
locks.
i.e. if you "kill -9" lockd, then you need to do something with
sm-notify and I acknowledge that isn't well documented nor is there an
automatic procedure to make it happen. So there is room for
improvement, but I don't think that running sm-notify every time that
statd is started is an improvement.

>
> >
> > Note the comment in sm-notify.c:
> >
> > /*
> > * Record pid in /var/run/sm-notify.pid
> > * This file should remain until a reboot, even if the
> > * program exits.
> > * If file already exists, fail.
> > */
> I guess I just don't understand why a pid file need to exist after a
> process is gone. Especially if its going to cause the next existence
> to imminently exit (potentially causing data corruption).
> That just does not seem very robust...

Maybe making look like a pid file was a mistake. I'd be OK with
changing the name to /var/run/sm-notify-has-run or similar. If we
could store in the file some reflection of the state of lockd, that
would be even better, but I don't think such a value is available at
the moment.

What potential for data corruption do you see?

NeilBrown