2003-01-07 15:19:15

by Christian Robottom Reis

Subject: /var/lib/nfs/sm/ files


Hi there,

Can `anybody' (Neil, Trond?) explain what the entries in
/var/lib/nfs/sm/ are for? If they refer to file locks, can we discover
which files they are referencing so I can try and understand why we get
leftover entries in there, and in which scenarios?

I'm still trying to look into the hang problems [1] I'm getting, since
there hasn't been much progress on them. Does anybody have a free minute
to try and help?

[1] http://www.uwsg.iu.edu/hypermail/linux/kernel/0210.0/1112.html

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL


2003-01-07 16:49:07

by Trond Myklebust

Subject: Re: /var/lib/nfs/sm/ files

>>>>> " " == Christian Reis <[email protected]> writes:

> Hi there,

> Can `anybody' (Neil, Trond?) explain what the entries in
> /var/lib/nfs/sm/ are for? If they refer to file locks, can we

See 'man rpc.statd'. Those files store the IP addresses of the machines
being monitored by statd. In case of a crash or a reboot, those files
tell statd which machines need to be notified.
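
To make the explanation above concrete, here is a minimal Python sketch
that lists what is currently in the directory. It assumes each entry is
a regular file whose name identifies the monitored host (a hostname or
IP address); the function name list_monitored_hosts is made up for this
example and is not part of any NFS tool.

```python
import os


def list_monitored_hosts(sm_dir="/var/lib/nfs/sm"):
    """Return (name, mtime) pairs for each regular file in sm_dir.

    Assumption: the file name identifies the monitored host.
    Reading the real directory may require root privileges.
    """
    entries = []
    for name in sorted(os.listdir(sm_dir)):
        path = os.path.join(sm_dir, name)
        # Skip subdirectories and anything else that is not a plain file.
        if os.path.isfile(path):
            entries.append((name, os.path.getmtime(path)))
    return entries
```

Calling list_monitored_hosts() on the server should then give one pair
per host that statd is monitoring at that moment.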

Cheers,
Trond

2003-01-08 11:42:14

by Christian Robottom Reis

Subject: Re: /var/lib/nfs/sm/ files

On Tue, Jan 07, 2003 at 05:54:59PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > Hi there,
>
> > Can `anybody' (Neil, Trond?) explain what the entries in
> > /var/lib/nfs/sm/ are for? If they refer to file locks, can we
>
> See 'man rpc.statd'. Those files store the IP addresses of the machines
> being monitored by statd. In case of a crash or a reboot, those files
> tell statd which machines need to be notified.

Thanks. So my questions are related to what `monitored by statd' means:

- Why don't all the diskless workstations get an entry in that
  directory while they are running? Right now I have 5 running, and
  only one has an entry there.

- Why do most entries' mtime get updated periodically, but a few of
  the entries go stale with time?

- Why do some of the stale entries get left over even after the
  workstations have halted (these ones present the nfs hang issue)?
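
One way to pin down which entries have "gone stale" in the sense used
above is to compare each file's mtime against a cutoff. The following is
only a sketch: the function name stale_entries and the age threshold are
hypothetical choices for illustration, and an old mtime by itself does
not prove that an entry is a leftover.

```python
import os
import time


def stale_entries(sm_dir, max_age_seconds, now=None):
    """Return names of files in sm_dir not modified for max_age_seconds.

    The threshold is an arbitrary, caller-chosen value (assumption);
    'now' can be passed explicitly to make the result reproducible.
    """
    if now is None:
        now = time.time()
    stale = []
    for name in sorted(os.listdir(sm_dir)):
        path = os.path.join(sm_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            stale.append(name)
    return stale
```

For example, stale_entries("/var/lib/nfs/sm", 24 * 3600) would report
entries untouched for more than a day, which could then be compared
against the list of workstations known to be halted.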

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2003-01-08 12:40:20

by Trond Myklebust

Subject: Re: /var/lib/nfs/sm/ files

On Wednesday 08 January 2003 12:50, Christian Reis wrote:
> - Why don't all the diskless workstations get an entry in that
> directory while they are running? Right now I have 5 running, and
> only one has an entry there.

...because only clients that are currently holding POSIX locks will have an
entry.
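
For readers unsure what "holding POSIX locks" looks like from an
application, here is a minimal Python sketch using fcntl-style locking.
The helper name posix_lock is hypothetical. The assumption, based on the
reply above, is that taking such a lock on a file that lives on an NFS
mount is what causes the client to be monitored; the sketch itself runs
on any local file and does not touch statd.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def posix_lock(path):
    """Hold an exclusive POSIX (fcntl) lock on path for the with-block."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is granted
        try:
            yield fd
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)  # release the lock explicitly
    finally:
        os.close(fd)
```

A program that exits or is killed while inside the with-block never
reaches the explicit unlock, which is the kind of situation where a
clean "stop monitoring" notification might not happen.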

> - Why do most entries' mtime get updated periodically, but a few of
> the entries go stale with time?

The file should get deleted every time the client releases all locks and
successfully manages to notify the server that it is stopping monitoring.

> - Why do some of the stale entries get left over even after the
> workstations have halted (these ones present the nfs hang issue)?


As I've told you before: 'stale' entries, as you call them, indicate that
rpc.statd never managed to notify the server that it should stop monitoring.
That means either the server or the client crashed before the POSIX locks
held by the client got released, or possibly that the rpc.statd processes
crashed (or got 'kill -9'ed).

Cheers,
Trond

2003-01-08 17:05:59

by Christian Robottom Reis

Subject: Re: /var/lib/nfs/sm/ files

On Wed, Jan 08, 2003 at 01:46:10PM +0100, Trond Myklebust wrote:
> > - Why do most entries' mtime get updated periodically, but a few of
> > the entries go stale with time?
>
> The file should get deleted every time the client releases all locks and
> successfully manages to notify the server that it is stopping monitoring.

Aha, this makes a lot of sense. Then the leftover files I am getting are
probably a product of:

syslog:Jan 7 08:35:47 canario rpc.statd[101]: Received erroneous SM_UNMON request from canario for 192.168.99.4
syslog:Jan 7 09:09:37 canario rpc.statd[101]: Received erroneous SM_UNMON request from canario for 192.168.99.4
syslog:Jan 7 18:23:15 canario rpc.statd[101]: Received erroneous SM_UNMON request from canario for 192.168.99.4

It seems that rpc.statd itself doesn't like the request it's getting and
never forwards it to the server. I used to think these messages were
harmless, but now I wonder why this would be happening.
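
To see how often this happens and for which hosts, the log lines above
can be tallied with a short script. This is a minimal Python sketch; the
regular expression is written against the exact message format shown in
the three lines above, and the function name count_erroneous_unmon is
made up for this example.

```python
import re
from collections import Counter

# Matches the rpc.statd message format shown in the log excerpt above.
UNMON_RE = re.compile(
    r"rpc\.statd\[\d+\]: Received erroneous SM_UNMON request "
    r"from (\S+) for (\S+)"
)


def count_erroneous_unmon(lines):
    """Count erroneous SM_UNMON messages per (requester, target) pair."""
    counts = Counter()
    for line in lines:
        match = UNMON_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Feeding it the lines of the syslog file gives a Counter keyed by
(requester, target), which can then be compared against the file names
left behind in /var/lib/nfs/sm/.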

> > - Why do some of the stale entries get left over even after the
> > workstations have halted (these ones present the nfs hang issue)?
>
>
> As I've told you before: 'stale' entries, as you call them, indicate that the

(Sorry, I am apparently clueless when it comes to these details.)

> rpc.statd never managed to notify the server that it should stop monitoring.
> It indicates either the server or the client crashed before the POSIX locks
> held by the client got released, or possibly that the rpc.statd processes
> crashed (or got 'kill -9'ed).

But at least it seems that nobody has crashed - statd is running along
fine. Both server and clients run the same versions of the daemon, and
the fact that we get repeated messages (without restarting anybody)
should indicate that it is in fact running.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL