2004-04-23 10:37:33

by Gavin Hamill

Subject: lockd / statd fun (sorry)

Hullo,

I'm afraid I have to dredge up old ground here...

I'm running about 30 diskless workstations PXE-booting to an NFS-root and
NFS-homedir with NIS logins, and the workstations are regularly getting the
familiar

Apr 23 11:17:18 10.0.0.13 kernel: nsm_mon_unmon: rpc failed, status=-13
Apr 23 11:17:18 10.0.0.13 kernel: lockd: cannot monitor 10.0.0.253
Apr 23 11:17:18 10.0.0.13 kernel: lockd: failed to monitor 10.0.0.253
Apr 23 11:17:18 10.0.0.13 kernel: nsm_mon_unmon: rpc failed, status=-13
Apr 23 11:17:18 10.0.0.13 kernel: lockd: cannot monitor 10.0.0.253

kernel messages. Now, I've read as much as I can on the topic, and I have made
sure that both statd and lockd are running on both the client and the server.

The machines have generally worked well, but OpenOffice.org's setup seems to
require locking, and the messages have increasingly irritated me, so I need
to turn to the oracles on such matters :)

I'm using kernel-mode NFS server on both the physical server and the
workstations... On the server, I see

10714 ? S 0:00 /sbin/rpc.statd
10733 ? SW 0:00 [lockd]
10734 ? SW 0:00 \_ [rpciod]

and on the clients I see the same (from memory - I can't access them via SSH
from here). Clients are using a 2.4.22 kernel and the server is on 2.4.24.

I checked the manpage for statd, and was interested by the /var/lib/nfs/sm
directory - on the server, this directory is completely empty,
and /var/lib/nfs/state contains only 4 bytes: 001d 0000 - does this sound
normal?
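
For readers following along, here is a minimal Python sketch of how those 4 bytes could be decoded. It assumes the dump came from a tool that prints 16-bit words in host byte order (as `hexdump` does by default) on a little-endian machine, so the bytes on disk would be 1d 00 00 00. The helper name `decode_state` is hypothetical and not part of nfs-utils.

```python
import struct

def decode_state(raw: bytes) -> int:
    """Interpret a 4-byte state file as a little-endian unsigned integer."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes, got %d" % len(raw))
    (value,) = struct.unpack("<I", raw)
    return value

# Assumed on-disk bytes for the "001d 0000" dump quoted above.
sample = bytes([0x1D, 0x00, 0x00, 0x00])
print(decode_state(sample))  # prints 29
```

Under that assumption the file holds a small integer rather than garbage, so a 4-byte file of this shape does not look suspicious on its own.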

/etc/hosts.allow was always blank, but today I tried the advice I found in
another mailing list posting, to add

statd: 10.0.0.

I then restarted nfs-common (rpc.statd) and nfs-kernel-server (rpc.nfsd and
rpc.mountd) with logs thusly:

Apr 23 11:06:26 10.0.0.9 kernel: nsm_mon_unmon: rpc failed, status=-13
Apr 23 11:06:26 10.0.0.9 kernel: lockd: cannot monitor 10.0.0.253
Apr 23 11:06:26 10.0.0.9 kernel: lockd: failed to monitor 10.0.0.253
Apr 23 11:06:40 fon kernel: nfsd: last server has exited
Apr 23 11:06:40 fon kernel: nfsd: unexporting all filesystems
Apr 23 11:06:42 10.0.0.24 kernel: nfs: server 10.0.0.253 not responding, still trying
Apr 23 11:06:43 10.0.0.19 kernel: nfs: server 10.0.0.253 not responding, still trying
Apr 23 11:06:43 10.0.0.9 kernel: nfs: server 10.0.0.253 not responding, still
Apr 23 11:06:45 10.0.0.19 kernel: nfs: server 10.0.0.253 OK
Apr 23 11:06:56 10.0.0.19 kernel: nsm_mon_unmon: rpc failed, status=-13
Apr 23 11:06:56 10.0.0.19 kernel: lockd: cannot monitor 10.0.0.253
Apr 23 11:06:56 10.0.0.19 kernel: lockd: failed to monitor 10.0.0.253

i.e. nothing's changed :(

Should the /var/lib/nfs be in use for the clients, too? My boot-sequence is
based on KNOPPIX and uses an initrd to symlink much of the filesystem to a
ramdisk, so I'm a bit concerned I might have messed up the permissions here.

Any advice warmly welcomed.

Cheers,
Gavin.
P.S. Olaf Kirch's "statd simplified" looks very interesting! :))


-------------------------------------------------------
This SF.net email is sponsored by: The Robotic Monkeys at ThinkGeek
For a limited time only, get FREE Ground shipping on all orders of $35
or more. Hurry up and shop folks, this offer expires April 30th!
http://www.thinkgeek.com/freeshipping/?cpg=12297
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-04-23 11:32:11

by Bernd Schubert

Subject: Re: lockd / statd fun (sorry)

Hello Gavin,

> I'm afraid I have to dredge up old ground here...
>
> I'm running about 30 diskless workstations PXE-booting to an NFS-root and
> NFS-homedir with NIS logins, and the workstations are regularly getting the
> familiar
>
> Apr 23 11:17:18 10.0.0.13 kernel: nsm_mon_unmon: rpc failed, status=-13
> Apr 23 11:17:18 10.0.0.13 kernel: lockd: cannot monitor 10.0.0.253
> Apr 23 11:17:18 10.0.0.13 kernel: lockd: failed to monitor 10.0.0.253
> Apr 23 11:17:18 10.0.0.13 kernel: nsm_mon_unmon: rpc failed, status=-13
> Apr 23 11:17:18 10.0.0.13 kernel: lockd: cannot monitor 10.0.0.253
>

We also use a diskless environment and also see that problem. However, as I
posted a long time ago to this list, it only happens if the nfs-utils are
compiled with the '--secure-statd' configure option.
So every time we perform a general Debian (testing) update that also updates
nfs-utils, we see that problem. Every time this happens, I fetch the Debian
nfs-utils source and recompile it without the '--secure-statd' option.

When I posted that workaround, Trond told me that it's not good, since faked
packets can be sent to the statd daemon that way. However, for us it's better
to have an unsecured statd running than none at all. Also, the rpc.statd
manpage says that statd can be protected by the 'tcp_wrapper library'.


Cheers,
Bernd



2004-04-23 11:48:33

by Gavin Hamill

Subject: Re: lockd / statd fun (sorry)

On Friday 23 April 2004 12:32, Bernd Schubert wrote:

> Everytime this happens, I fetch the debian nfs-utils source and recompile
> them without the '--secure-statd' option.

OK, I did see your message in the archives, but wasn't sure that it was
relevant here (I'm getting error -13 when you got -5) but I'm game for a
laugh :)

The workstations are running unstable, whilst the server is on woody. Should I
only need to update the nfs-common package on the workstations? I don't want
to touch the server too much if I can help it :)

> Also, the rpc.statd manpage says, that the statd can be protected by
> the 'tcp_wrapper library'.

<nod> That's what I've done in /etc/hosts.allow - it's a moot point, really
because it's a firewalled LAN :)

Cheers,
Gavin.



2004-04-23 11:57:08

by Olaf Kirch

Subject: Re: lockd / statd fun (sorry)

On Fri, Apr 23, 2004 at 12:48:30PM +0100, Gavin Hamill wrote:
> OK, I did see your message in the archives, but wasn't sure that it was
> relevant here (I'm getting error -13 when you got -5) but I'm game for a
> laugh :)

13 is EACCES, and is probably returned by statd. So I would say you
are able to talk to statd, only there's something wrong with the setup.
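
A small Python sketch of the lookup behind that reading, assuming the kernel's `status=` value is a negated errno number, as the message above treats it. The helper name `describe_status` is made up for illustration. It also decodes the -5 mentioned earlier in the thread.

```python
import errno
import os

def describe_status(status: int) -> str:
    """Map a negative kernel-style status (e.g. -13) to its errno name and text."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%s: %s" % (name, os.strerror(code))

for status in (-13, -5):
    print(status, describe_status(status))
```

On Linux this reports EACCES (permission denied) for -13 and EIO (input/output error) for -5, which suggests the two reports are different failures.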

statd wants the following files available and writable:

/var/lib/nfs/state    (at start-up only; to store the sequence number)
/var/lib/nfs/sm       (to store the NFS peer's address)
/var/lib/nfs/sm.bak   (at start-up only; for lock recovery)
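
The list above can be checked mechanically. The following is a rough sketch only: the paths come from the list, while the helper name `check_paths` is hypothetical. `os.access` tests against the user running the script, so it is only meaningful when run as the same user as statd.

```python
import os

# Paths taken from the list above; the helper name is hypothetical.
STATD_PATHS = ["/var/lib/nfs/state", "/var/lib/nfs/sm", "/var/lib/nfs/sm.bak"]

def check_paths(paths):
    """Return (path, exists, writable) for each path, as seen by the current user."""
    results = []
    for path in paths:
        exists = os.path.exists(path)
        writable = exists and os.access(path, os.W_OK)
        results.append((path, exists, writable))
    return results

if __name__ == "__main__":
    for path, exists, writable in check_paths(STATD_PATHS):
        print("%-22s exists=%s writable=%s" % (path, exists, writable))
```

Running it on both the server and a client shows quickly whether one of the three is missing or sits on a read-only filesystem.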

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-04-27 13:41:59

by Gavin Hamill

Subject: Re: lockd / statd fun (sorry)

On Fri, Apr 23, 2004 at 01:57:04PM +0200, Olaf Kirch wrote:
> On Fri, Apr 23, 2004 at 12:48:30PM +0100, Gavin Hamill wrote:
> > OK, I did see your message in the archives, but wasn't sure that it was
> > relevant here (I'm getting error -13 when you got -5) but I'm game for a
> > laugh :)
>
> 13 is EACCES, and is probably returned by statd. So I would say you
> are able to talk to statd, only there's something wrong with the setup.
>
> statd wants the following files available and writable:
>
> /var/lib/nfs/state (at start-up only; to store seq# number)
> /var/lib/nfs/sm to store the NFS peer's address
> /var/lib/nfs/sm.bak (at start-up only; for lock recovery)

OK, I'm assuming here that statd is only run by the machine running nfsd and
exporting filesystems, and /usr/sbin/rpc.statd is definitely in the process
list.

Moving on to the files in /var/lib, everything seems to be in order, but I
never see any files written to /var/lib/nfs/sm.

statd is running as root, and here's the structure of /var/lib/nfs:

drwxr-xr-x 4 root root 4096 Apr 27 11:05 nfs

mop:/var/lib# ls -lR nfs/
nfs/:
total 24
-rw-r--r-- 1 root root 614 Apr 27 11:05 etab
-rw-r--r-- 1 root root 68 Apr 27 13:49 rmtab
drwxr-xr-x 2 root root 4096 Jul 9 2003 sm
drwxr-xr-x 2 root root 4096 Jul 9 2003 sm.bak
-rw------- 1 root root 4 Apr 27 10:42 state
-rw-r--r-- 1 root root 288 Apr 27 11:10 xtab

nfs/sm:
total 0

nfs/sm.bak:
total 0
mop:/var/lib#

There is nothing in the log for 'statd' other than

Apr 27 10:42:53 mop rpc.statd[478]: Version 1.0 Starting

This is not a life-or-death issue because the network is working,
but this is functionality that should work, and it's bugging me :)

As always, any advice warmly received!

Cheers,
Gavin





2004-04-27 15:32:03

by Olaf Kirch

Subject: Re: lockd / statd fun (sorry)

On Tue, Apr 27, 2004 at 02:43:43PM +0100, Gavin Hamill wrote:
> OK, I'm assuming here that statd is only run by the machine running nfsd and
> exporting filesystems, and /usr/sbin/rpc.statd is definitely in the process list

No, statd needs to run on the client as well.

> Moving onto the files in /var/lib, everything seems to be in order, but I never see any
> files written to /var/lib/nfs/sm.

Files should appear in /var/lib/nfs/sm while a lock is held, and disappear
shortly after the lock is released.
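
One way to observe this is to take a POSIX lock on a scratch file that lives on the NFS mount and list the monitor directory while the lock is held. This is a minimal sketch: the function name and the example paths are placeholders, not anything shipped with nfs-utils.

```python
import fcntl
import os

def list_while_locked(lock_path, watch_dir):
    """Hold an exclusive fcntl lock on lock_path and return watch_dir's entries."""
    # Append mode opens the file for writing without truncating it.
    with open(lock_path, "a") as handle:
        fcntl.lockf(handle, fcntl.LOCK_EX)   # POSIX lock covering the whole file
        try:
            return sorted(os.listdir(watch_dir))
        finally:
            fcntl.lockf(handle, fcntl.LOCK_UN)

# Example with hypothetical paths: a scratch file in an NFS-mounted home directory.
# print(list_while_locked("/home/someuser/.locktest", "/var/lib/nfs/sm"))
```

An empty result while the lock is held would be consistent with the monitoring step failing, as the kernel messages at the start of the thread suggest.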

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-04-27 15:47:35

by Gavin Hamill

Subject: Re: lockd / statd fun (sorry)

On Tuesday 27 April 2004 16:32, Olaf Kirch wrote:
> On Tue, Apr 27, 2004 at 02:43:43PM +0100, Gavin Hamill wrote:
> > OK, I'm assuming here that statd is only run by the machine running nfsd
> > and exporting filesystems, and /usr/sbin/rpc.statd is definitely in the
> > process list
>
> No, statd needs to run on the client as well.

Right, now we're moving in the right direction. When I run statd on the
clients, I see no errors on the console or system logs, but the statd process
does not appear in 'ps fawx' output (as it does on the server).

An strace of '/sbin/rpc.statd' on one of the clients is:

execve("/sbin/rpc.statd", ["/sbin/rpc.statd"], [/* 11 vars */]) = 0
uname({sys="Linux", node="cc", ...}) = 0
brk(0) = 0x80505c8
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=30964, ...}) = 0
old_mmap(NULL, 30964, PROT_READ, MAP_PRIVATE, 3, 0) = 0x40018000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libnsl.so.1", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0000<\0\000"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=73528, ...}) = 0
old_mmap(NULL, 84864, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40020000
old_mmap(0x40032000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x11000) = 0x40032000
old_mmap(0x40033000, 7040, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40033000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200^\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1243792, ...}) = 0
old_mmap(NULL, 1253956, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40035000
old_mmap(0x4015d000, 32768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x127000) = 0x4015d000
old_mmap(0x40165000, 8772, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x40165000
close(3) = 0
munmap(0x40018000, 30964) = 0
getrlimit(RLIMIT_NOFILE, {rlim_cur=1024, rlim_max=1024}) = 0
pipe([3, 4]) = 0
fork() = 598
close(4) = 0
read(3, 0xbffffdc7, 1) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
read(3, "", 1) = 0
exit_group(1) = ?

Not very revealing :/

The whole of /var on the clients is initially held on a ramdisk, so they have
read/write access. Some of the larger 'less variable' parts of var are
symlinked to the NFS-rootfs, but /var/lib/nfs isn't one of those parts :/

I'm just about out of ideas on this one, and am ready to just mount the
NFS-root with the 'nolock' option - this works, and lets openoffice's setup
run (it requires locking..) but again, it's just 'a fix' and not The
Solution.

Cheers,
Gavin.



2004-04-27 15:56:03

by Olaf Kirch

Subject: Re: lockd / statd fun (sorry)

On Tue, Apr 27, 2004 at 04:47:26PM +0100, Gavin Hamill wrote:
> An strace of '/sbin/rpc.statd' on one of the clients is:

You should run "strace -f" to see what the child process does.
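
The end of the quoted trace (pipe, fork, a read that returns nothing after SIGCHLD, then exit_group(1)) reads like a common daemonizing handshake: the parent waits on a pipe for the child to report that start-up succeeded, and gives up when the pipe closes without a byte. The reason for the failure is then only visible in the child, which is what "-f" follows. The Python sketch below illustrates that general pattern; it is not rpc.statd's actual source, and `start_child` is a made-up name.

```python
import os

def start_child(child_ok: bool) -> int:
    """Fork a child and wait on a pipe for a one-byte 'ready' signal.

    Returns 0 if the child reported success, 1 if the pipe closed first.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                        # child process
        os.close(read_fd)
        if child_ok:
            os.write(write_fd, b"\0")   # tell the parent that start-up succeeded
        os._exit(0 if child_ok else 1)
    os.close(write_fd)                  # parent keeps only the read end
    data = os.read(read_fd, 1)          # b"" means EOF: child exited without signalling
    os.close(read_fd)
    os.waitpid(pid, 0)                  # reap the child
    return 0 if data else 1

if __name__ == "__main__":
    print(start_child(True), start_child(False))
```

The second call mirrors the trace: the parent sees end-of-file on the pipe and exits with status 1, without ever learning why the child gave up.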

Olaf
--
Olaf Kirch | The Hardware Gods hate me.
[email protected] |
---------------+



2004-04-27 16:49:21

by Trond Myklebust

Subject: Re: lockd / statd fun (sorry)

On Tue, 2004-04-27 at 11:47, Gavin Hamill wrote:

> I'm just about out of ideas on this one, and am ready to just mount the
> NFS-root with the 'nolock' option - this works, and lets openoffice's setup
> run (it requires locking..) but again, it's just 'a fix' and not The
> Solution.

Neither is keeping /var/lib/nfs on a ramdisk.

The problem then is that if you crash and reboot, rpc.statd will
restart, but it will not notify your server that you rebooted (because
/var/lib/nfs will have been wiped by the reboot). The server may then
end up hanging on to a bunch of locks that it thinks you still own.

/var/lib/nfs should *always* be on permanent storage if you want to use
locking.
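
A quick way to see what backs /var/lib/nfs is to look up its mount in /proc/mounts. This is a rough Python sketch; the function names are hypothetical, and the "volatile" test is a heuristic assumption (tmpfs, ramfs, or a /dev/ram* device), not an exhaustive rule.

```python
import os

def mount_for(path, mounts_text):
    """Return (device, mount_point, fstype) for the deepest mount containing path."""
    best = ("", "", "")
    target = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fstype = fields[:3]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point shadow an earlier one.
        if target.startswith(prefix) and len(mount_point) >= len(best[1]):
            best = (device, mount_point, fstype)
    return best

def looks_volatile(device, fstype):
    """Heuristic only: RAM-backed filesystem types or classic ramdisk devices."""
    return fstype in ("tmpfs", "ramfs") or device.startswith("/dev/ram")

if __name__ == "__main__":
    # Resolve symlinks first, since parts of the tree may be symlinked elsewhere.
    real = os.path.realpath("/var/lib/nfs")
    with open("/proc/mounts") as handle:
        device, mount_point, fstype = mount_for(real, handle.read())
    print(real, device, mount_point, fstype, looks_volatile(device, fstype))
```

If the result is flagged as volatile, statd's records will not survive a reboot, which is the situation described above.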

Cheers,
Trond



2004-04-27 18:13:48

by Gavin Hamill

Subject: Re: lockd / statd fun (sorry)

On Tue, Apr 27, 2004 at 12:49:17PM -0400, Trond Myklebust wrote:
> On Tue, 2004-04-27 at 11:47, Gavin Hamill wrote:
>
> > but again, it's just 'a fix' and not The Solution.
>
> Neither is keeping /var/lib/nfs on a ramdisk.
>
> /var/lib/nfs should *always* be on permanent storage if you want to use
> locking.

Ah I'd certainly not taken that into consideration - I assumed (always
bad, I know) it would reset any locks when the same client remounted the
same export at reboot.

OK, given that the machines have no local permanent storage, it seems
that's solved the problem for me - I must disable locking completely.

It's fortunate that the rootfs export is mounted read-only, and the
home-directory exports are never shared between multiple users. That,
and the users are not running very demanding applications.

Plus, if the worst happens, it would take only a few minutes to restore
a corrupted file from backups, or re-create their userprofile entirely.

Thanks for the advice, NFS has been around for a long time, and I've
known nothing about it until the last few weeks :)

Cheers,
Gavin.


