2002-10-03 21:39:06

by Christian Robottom Reis

Subject: 2.4.19+trond and diskless locking problems

Hey there,

We've got a network with about 20 diskless boxes at a client office.
They've been running 2.4.19 (both the knfsd server and the clients) with
Trond's patches for a while now, and though performance is really nice,
occasionally one box or another will end up with a strange locking
problem.

At bootup, and at shutdown, and for certain other tasks (reading utmp,
etc) the box hangs for a long while (during which I suspect it is trying
to lock). It hangs for about 300 seconds and then goes on normally.

When this happens, there is always a file left in /var/lib/nfs/sm
(normally there are no files in there for any of the clients, even when
they are on) for the hanging box. Is this normal?

We also occasionally get a log message in the server for this box like:

kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor 192.168.99.7

Trond, can I get you more troubleshooting information, or should I try
2.4.20-pre on server *and* clients? This is a bit weird, but since I
don't know a lot of what went on in the last changes, I'm not sure where
to start looking.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL


2002-10-03 22:42:45

by Trond Myklebust

Subject: Re: 2.4.19+trond and diskless locking problems

>>>>> " " == Christian Reis <[email protected]> writes:

> When this happens, there is always a file left in
> /var/lib/nfs/sm (normally there are no files in there for any
> of the clients, even when they are on) for the hanging box. Is
> this normal?

It means that rpc.statd did not manage to unmonitor the NFS locks
before it shut down. There could be several reasons for this, for
instance the client crashing and/or rebooting. It might also
indicate that the server could not be contacted in order to unmonitor
the lock.

> We also occasionally get a log message in the server for this
> box like:

> kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor
> 192.168.99.7

It means that the kernel was unable to contact rpc.statd, or that it
was unable to contact the server's rpc.statd for some reason.

> Trond, can I get you more troubleshooting information, or
> should I try 2.4.20-pre on server *and* clients? This is a bit
> weird, but since I don't know a lot of what went on in the last
> changes, I'm not sure where to start looking.

There is nothing in the above to suggest that this must be a kernel
problem. The fact that you are seeing files in /var/lib/nfs/sm
in these cases rather suggests that the problem lies with rpc.statd.
Can you see any reason in your setup why it should be failing?
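
A quick sanity check, run from each end, is whether rpc.statd is
registered and answering at all; something like

  rpcinfo -u <client> status
  rpcinfo -u <server> status

(with the real hostnames substituted) should report program 100024 as
ready and waiting on both boxes.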

Cheers,
Trond

2002-10-03 23:20:36

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Fri, Oct 04, 2002 at 12:47:38AM +0200, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > When this happens, there is always a file left in
> > /var/lib/nfs/sm (normally there are no files in there for any
> > of the clients, even when they are on) for the hanging box. Is
> > this normal?
>
> It means that rpc.statd did not manage to unmonitor the NFS locks
> before it shut down. There could be several reasons for this, for
> instance the client crashing and/or rebooting. It might also
> indicate that the server could not be contacted in order to unmonitor
> the lock.

Yes, these hangs only happen with boxes that crash frequently (they
crash because of a weird and unrelated bug in X, which usually forces a
reboot).

> > kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor
> > 192.168.99.7
>
> It means that the kernel was unable to contact rpc.statd, or that it
> was unable to contact the server's rpc.statd for some reason.

Hmmm, I wonder if I understand properly. Is the following flow correct
for the RPC request?

Client Kernel -> Client rpc.statd -> Server rpc.statd -> Server Kernel

>
> There is nothing in the above to suggest that this must be a kernel
> problem. The fact that you are seeing files in /var/lib/nfs/sm
> in these cases rather suggests that the problem lies with rpc.statd.
> Can you see any reason in your setup why it should be failing?

Not really. The clients run rpc.statd 1.0 and the server, 1.0.1. Should
I start gdbing it to see what is going wrong?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-10-04 01:08:01

by Trond Myklebust

Subject: Re: 2.4.19+trond and diskless locking problems

>>>>> " " == Christian Reis <[email protected]> writes:

>> > kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot
>> > monitor 192.168.99.7
>>
>> It means that the kernel was unable to contact rpc.statd, or that
>> it was unable to contact the server's rpc.statd for some reason.

> Hmmm, I wonder if I understand properly. Is the following flow
> correct for the RPC request?

> Client Kernel -> Client rpc.statd -> Server rpc.statd ->
> Server Kernel

That's more or less right, except that the communication is bidirectional.
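
Roughly, and very much simplified:

  lock request:  client lockd  --NLM_LOCK-->  server lockd
                 (the reply, and the NLM_GRANTED callback for blocking
                  locks, travel the other way)
  monitoring:    client lockd  --SM_MON(server)-->  client rpc.statd
                 server lockd  --SM_MON(client)-->  server rpc.statd
  after reboot:  rpc.statd  --SM_NOTIFY-->  the statd of each host it
                 was asked to monitor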

>> lies with rpc.statd. Can you see any reason in your setup why
>> it should be failing?

> Not really. The clients run rpc.statd 1.0 and the server,
> 1.0.1. Should I start gdbing it to see what is going wrong?

Start by using tcpdump to find out who, in the above chain, is taking
such a long time to respond.
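
For example (substitute the client's address; the nlockmgr and status
port numbers come from rpcinfo -p on each box), running

  tcpdump -n host <client> and \
      \( port 111 or port 2049 or port <nlockmgr> or port <status> \)

on both the client and the server should show on which hop the request
sits before the reply comes back.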

Cheers,
Trond

2002-10-04 07:02:22

by Hasch

Subject: Re: [NFS] 2.4.19+trond and diskless locking problems

Am Donnerstag, 3. Oktober 2002 23:44 schrieb Christian Reis:
>
> We also occasionally get a log message in the server for this box like:
>
> kernel:Aug 10 17:39:22 anthem kernel: lockd: cannot monitor
> 192.168.99.7

I got the same messages when mounting an AIX client to a Linux server after
upgrading to 2.4.19 Kernel.
After installing the latest NFS utils, the problem went away.
So I guess Trond is right, try looking at the userspace utilities.

...Juergen

2002-10-04 13:11:42

by Christian Robottom Reis

Subject: Re: [NFS] 2.4.19+trond and diskless locking problems

On Fri, Oct 04, 2002 at 09:07:47AM +0200, Juergen Hasch wrote:
> I got the same messages when mounting an AIX client to a Linux server after
> upgrading to 2.4.19 Kernel.
> After installing the latest NFS utils, the problem went away.

Does this mean nfs-utils-1.0.1 vs 1.0, or were you using a much older
version?

> So I guess Trond is right, try looking at the userspace utilities.

I will, thanks.

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-10-04 15:55:30

by Hasch

Subject: Re: [NFS] 2.4.19+trond and diskless locking problems

Am Freitag, 4. Oktober 2002 15:17 schrieb Christian Reis:
> On Fri, Oct 04, 2002 at 09:07:47AM +0200, Juergen Hasch wrote:
> > I got the same messages when mounting an AIX client to a Linux server
> > after upgrading to 2.4.19 Kernel.
> > After installing the latest NFS utils, the problem went away.
>
> Does this mean nfs-utils-1.0.1 vs 1.0, or were you using a much older
> version?

Versions 0.3.3 and 1.0.1 are working fine for me, so it looks like
your problem is different.

...Juergen

2002-11-20 13:55:23

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Fri, Oct 04, 2002 at 03:13:14AM +0200, Trond Myklebust wrote:
> That's more or less right, except that the communication is bidirectional.
>
> >> lies with rpc.statd. Can you see any reason in your setup why
> >> it should be failing?
>
> > Not really. The clients run rpc.statd 1.0 and the server,
> > 1.0.1. Should I start gdbing it to see what is going wrong?
>
> Start by using tcpdump to find out who, in the above chain, is taking
> such a long time to respond.

I haven't forgotten this. It's just that I've been unable to test: the
problem just stopped showing up when I upgraded to 2.4.20-pre11 with
your NFS-ALL patches applied to it. Could something have changed, or are
we just lucky?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-11-20 16:55:48

by Trond Myklebust

Subject: Re: 2.4.19+trond and diskless locking problems

>>>>> " " == Christian Reis <[email protected]> writes:

> I haven't forgotten this. It's just that I've been unable to
> test: the problem just stopped showing up when I upgraded to
> 2.4.20-pre11 with your NFS-ALL patches applied to it. Could
> something have changed, or are we just lucky?

The main changes have been the discovery of a couple of kmap()
imbalances. Those are also fixed in 2.4.20-rc2.

Cheers,
Trond

2002-11-27 00:34:12

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Wed, Nov 20, 2002 at 06:02:34PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > I haven't forgotten this. It's just that I've been unable to
> > test: the problem just stopped showing up when I upgraded to
> > 2.4.20-pre11 with your NFS-ALL patches applied to it. Could
> > something have changed, or are we just lucky?
>
> The main changes have been the discovery of a couple of kmap()
> imbalances. Those are also fixed in 2.4.20-rc2.

Just because I sent that email, my server decided it was going to act up
again. So this time I sat down and looked hard at the logs and tcpdump
output, and this is what I *think* it is: a lockd bug. Here are the
symptoms:

(Recap: a 100 Mbps network on a D-Link switch with about 10 diskless
clients. Some of the boxes start getting really slow shutdowns and
startups, while the server seems to be unaffected. Clients run 2.4.19
with Trond's NFS-ALL patches, the server runs 2.4.20-pre11 with the same
patches. Everything *should* be tcp-mounted, but I'm not 100% sure about
the root filesystem.)

Trond, I've attached quite a bit of debug output below, but I'm a bit
lost as to what it could be now. Do you think I should start suspecting
the hardware? I have *no* reason to do so, but if nobody else sees
this sort of issue...

a) ps ax | grep lockd returns:

94 ? DW 0:00 [lockd]

Why would lockd be in state "D"? Looks bad. Can this happen in
normal usage? It kicks the loadavg up 1 point.
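
(If it would help, I can try to see where it is sleeping with something
like

  ps -o pid,stat,wchan -C lockd

or with a sysrq-t dump, assuming SysRq is compiled into the client
kernel.)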

b) tcpdump output, as seen by the server, during bootup (server is
anthem, violinux is client):

[seemingly normal nfs operation up to here]

09:24:08.876333 anthem.async.com.br.nfs > violinux.async.com.br.2153135585: reply ok 128 lookup [|nfs] (DF)
09:24:08.876464 violinux.async.com.br.2169912801 > anthem.async.com.br.nfs: 132 setattr [|nfs] (DF)
09:24:08.876490 anthem.async.com.br.nfs > violinux.async.com.br.2169912801: reply ok 96 setattr [|nfs] (DF)

[ 10-second delay here ]

09:24:18.988289 violinux.async.com.br.793 > anthem.async.com.br.sometimes-rpc4: udp 180 (DF)

[ 11-second delay here ]

09:24:29.889685 violinux.async.com.br.2891398625 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)
09:24:29.889864 anthem.async.com.br.nfs > violinux.async.com.br.2891398625: reply ok 128 lookup [|nfs] (DF)
09:24:29.890121 violinux.async.com.br.2908175841 > anthem.async.com.br.nfs: 112 read [|nfs] (DF)
09:24:29.890245 anthem.async.com.br.nfs > violinux.async.com.br.2908175841: reply ok 720 read [|nfs] (DF)
09:24:29.890654 violinux.async.com.br.2924953057 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)

[ back to business as usual ]

This repeats itself a number of times, and these 20-second combined
hangs take their toll - it's been 10 minutes already and the box still
hasn't finished booting. The slowness does *not* manifest itself until
we move into runlevel 3 of the bootup process.

sometimes-rpc4 is port 32770, which rpcinfo -p shows us to be
nlockmgr.
nfs is port 2049.

c) netstat -apn just so we know what ports are in use (state LISTEN for
all but port 32770, which has no state listed):

Proto Recv-Q Send-Q Local Address Foreign Address PID/Program name

tcp 0 0 0.0.0.0:2049 0.0.0.0:* -
udp 31752 0 0.0.0.0:32770 0.0.0.0:* -
udp 0 0 0.0.0.0:32768 0.0.0.0:* 88/rpc.statd
udp 0 0 0.0.0.0:32769 0.0.0.0:* 90/rpc.mountd

d) rpcinfo -p

program vers proto port
100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100007 2 udp 680 ypbind
100007 1 udp 680 ypbind
100007 2 tcp 683 ypbind
100007 1 tcp 683 ypbind
100004 2 udp 685 ypserv
100004 1 udp 685 ypserv
100004 2 tcp 688 ypserv
100004 1 tcp 688 ypserv
100009 1 udp 687 yppasswdd
100024 1 udp 32768 status
100024 1 tcp 32768 status
100005 1 udp 32769 mountd
100005 1 tcp 32769 mountd
100005 2 udp 32769 mountd
100005 2 tcp 32769 mountd
100005 3 udp 32769 mountd
100005 3 tcp 32769 mountd
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100021 1 udp 32770 nlockmgr
100021 3 udp 32770 nlockmgr
100021 4 udp 32770 nlockmgr
100021 1 tcp 32770 nlockmgr
100021 3 tcp 32770 nlockmgr
100021 4 tcp 32770 nlockmgr

So, am I right in thinking it's a lockd problem? rpc.statd seems to be
okay, but I could be wrong..

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-11-27 01:07:47

by Trond Myklebust

Subject: Re: 2.4.19+trond and diskless locking problems

>>>>> " " == Christian Reis <[email protected]> writes:

> a) ps ax | grep lockd returns:

> 94 ? DW 0:00 [lockd]

> Why would lockd be in state "D"? Looks bad. Can this happen
> in normal usage? It kicks the loadavg up 1 point.
<snip>
> [ 10-second delay here ]

> 09:24:18.988289 violinux.async.com.br.793 >
> anthem.async.com.br.sometimes-rpc4: udp 180 (DF)

> [ 11-second delay here ]

OK, so you are sending out the RPC request to the server's NLM daemon,
which is clearly receiving the packet (since tcpdump was able to log
it), but is never sending a reply. Are you seeing any kernel messages
in the syslog?

BTW: the tcpdumps you're showing are all UDP, not TCP...
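
(If you want to force the ordinary mounts onto TCP, the "tcp" mount
option should do it, e.g. an fstab line along the lines of

  anthem:/home  /home  nfs  rw,hard,intr,tcp  0 0

- whether the nfsroot mount itself can use TCP is a separate question.)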

Cheers,
Trond

2002-11-27 01:37:23

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Wed, Nov 27, 2002 at 02:14:50AM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > a) ps ax | grep lockd returns:
>
> > 94 ? DW 0:00 [lockd]
>
> > Why would lockd be in state "D"? Looks bad. Can this happen
> > in normal usage? It kicks the loadavg up 1 point.
> <snip>
> > [ 10-second delay here ]
>
> > 09:24:18.988289 violinux.async.com.br.793 >
> > anthem.async.com.br.sometimes-rpc4: udp 180 (DF)
>
> > [ 11-second delay here ]
>
> OK, so you are sending out the RPC request to the server's NLM daemon,
> which is clearly receiving the packet (since tcpdump was able to log
> it), but is never sending a reply. Are you seeing any kernel messages
> in the syslog?

No. The kernel log (kern.* in syslog) is very quiet during this period -
the only messages I get are the eth0 promiscuous mode messages that
tcpdump triggers when it runs.

My messages file just lists the mounts:

Nov 22 09:24:04 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:800 for /export/root (/export)
Nov 22 09:24:06 anthem dhcpd: DHCPDISCOVER from 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPOFFER on 192.168.99.2 to 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPDISCOVER from 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPOFFER on 192.168.99.2 to 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPREQUEST for 192.168.99.2 (192.168.99.4) from 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:06 anthem dhcpd: DHCPACK on 192.168.99.2 to 00:01:03:d7:f3:0a via eth0
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:707 for /home (/home)
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:711 for /mondo (/mondo)
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:715 for /dist (/dist)
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:719 for /var/spool/mail (/var/spool/mail)
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:712 for /export/root/var/spool/violinux (/export)
Nov 22 09:24:08 anthem rpc.mountd: authenticated mount request from violinux.async.com.br:714 for /export/root/var/log/violinux (/export)
Nov 22 09:27:31 anthem named[141]: "lab.16.106.143.in-addr.arpa IN NS" points to a CNAME (grande.ic.unicamp.br)

(yep, unrelated last line just to show nothing is going on)

> BTW: the tcpdumps you're showing are all UDP, not TCP...

100% true, as I noticed as I pressed "y" to send. :-)

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-11-27 17:01:14

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Wed, Nov 27, 2002 at 02:14:50AM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > a) ps ax | grep lockd returns:
>
> > 94 ? DW 0:00 [lockd]
>
> > Why would lockd be in state "D"? Looks bad. Can this happen
> > in normal usage? It kicks the loadavg up 1 point.
> <snip>
> > [ 10-second delay here ]
>
> > 09:24:18.988289 violinux.async.com.br.793 >
> > anthem.async.com.br.sometimes-rpc4: udp 180 (DF)
>
> > [ 11-second delay here ]
>
> OK, so you are sending out the RPC request to the server's NLM daemon,
> which is clearly receiving the packet (since tcpdump was able to log
> it), but is never sending a reply. Are you seeing any kernel messages
> in the syslog?

Hmmm. Ran into this again this morning, and again just now when starting
up one of the boxes. A reboot of the workstation fixed it for the time
being.

What can I do to help further debug the issue? Try another kernel
version on clients? Server? This is giving me a headache.. :-/

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-11-27 20:24:39

by Trond Myklebust

Subject: Re: 2.4.19+trond and diskless locking problems

>>>>> " " == Christian Reis <[email protected]> writes:

> What can I do to help further debug the issue? Try another
> kernel version on clients? Server? This is giving me a
> headache.. :-/

Try to give us a dump of the server lock manager activity when this
problem occurs. Try to do

echo "217" >/proc/sys/sunrpc/nlm_debug

on the server just before the client issues a lock.
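
(That value is a bitmask of the lockd debugging flags. To switch the
logging off again afterwards:

echo "0" >/proc/sys/sunrpc/nlm_debug

and if the NLM traces alone don't tell us enough,
/proc/sys/sunrpc/rpc_debug enables debugging of the generic sunrpc
layer in the same way.)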

Cheers,
Trond

2002-11-29 01:28:49

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Wed, Nov 27, 2002 at 09:31:42PM +0100, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > What can I do to help further debug the issue? Try another
> > kernel version on clients? Server? This is giving me a
> > headache.. :-/
>
> Try to give us a dump of the server lock manager activity when this
> problem occurs. Try to do
>
> echo "217" >/proc/sys/sunrpc/nlm_debug

Okay, I've just done that. Just to complete the picture: this server uses
ReiserFS filesystems, and I know we've had issues with knfsd and reiser
before, so I guess it's at least relevant to mention.

* First, the tcpdump output for the first set of hangs (the hangs start at 15:37:15):

[ Following a million nfs packets in the same second (:15): ]

15:37:15.606342 banzai.async.com.br.2337883237 > anthem.async.com.br.nfs: 112 lookup [|nfs] (DF)
15:37:15.606365 anthem.async.com.br.nfs > banzai.async.com.br.2337883237: reply ok 120 lookup [|nfs] (DF)
15:37:15.607877 banzai.async.com.br.792 > anthem.async.com.br.sunrpc: udp 56 (DF)
15:37:15.608384 anthem.async.com.br.sunrpc > banzai.async.com.br.792: udp 28 (DF)
15:37:15.608583 banzai.async.com.br.793 > anthem.async.com.br.sometimes-rpc4: udp 176 (DF)
15:37:15.609593 anthem.async.com.br.sometimes-rpc4 > banzai.async.com.br.793: udp 36 (DF)

[ 3s here ]

15:37:18.032896 banzai.async.com.br.2421769317 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)
15:37:18.032975 anthem.async.com.br.nfs > banzai.async.com.br.2421769317: reply ok 128 lookup [|nfs] (DF)
15:37:18.033392 banzai.async.com.br.2438546533 > anthem.async.com.br.nfs: 112 read [|nfs] (DF)
15:37:18.033449 anthem.async.com.br.nfs > banzai.async.com.br.2438546533: reply ok 720 read [|nfs] (DF)
15:37:18.034177 banzai.async.com.br.2455323749 > anthem.async.com.br.nfs: 116 lookup [|nfs] (DF)
15:37:18.034211 anthem.async.com.br.nfs > banzai.async.com.br.2455323749: reply ok 128 lookup [|nfs] (DF)
15:37:18.034490 banzai.async.com.br.2472100965 > anthem.async.com.br.nfs: 112 read [|nfs] (DF)
15:37:18.034534 anthem.async.com.br.nfs > banzai.async.com.br.2472100965: reply ok 816 read [|nfs] (DF)

[ 10s here ]

15:37:28.041862 banzai.async.com.br.2488878181 > anthem.async.com.br.nfs: 100 getattr [|nfs] (DF)
15:37:28.042385 anthem.async.com.br.nfs > banzai.async.com.br.2488878181: reply ok 96 getattr [|nfs] (DF)
15:37:28.042820 banzai.async.com.br.2505655397 > anthem.async.com.br.nfs: 100 getattr [|nfs] (DF)
15:37:28.042892 anthem.async.com.br.nfs > banzai.async.com.br.2505655397: reply ok 96 getattr [|nfs] (DF)

At this point, a pattern forms. Second 28 has nfs activity, then it hangs
till second 40. Second 40 has nfs activity, then it hangs till second
45. Then second 04. Then second 15. Then 20...

* The kernel log output when 217 is echoed to nlm_debug:

Nov 28 15:36:37 anthem kernel: eth0: Setting promiscuous mode.
Nov 28 15:37:15 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:15 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:15 anthem kernel: lockd: host garbage collection
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_mark_resources
Nov 28 15:37:15 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:15 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:15 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_lock(090b/26259, ty=1, pi=1, 0-9223372036854775807, bl=1)
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=1
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:15 anthem kernel: lockd: check f=cbacccc0 pd=1 0-9223372036854775807 ty=1 cookie=124f
Nov 28 15:37:15 anthem kernel: lockd: nlmsvc_insert_block(c33be400, -1)
Nov 28 15:37:15 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:15 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:15 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:15 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:45 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_cancel(090b/26259, pi=1, 0-9223372036854775807)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=1
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=326 0-9223372036854775807 ty=1 cookie=1250
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=1 0-9223372036854775807 ty=1 cookie=124f
Nov 28 15:37:45 anthem kernel: lockd: deleting block c33be400...
Nov 28 15:37:45 anthem kernel: lockd: unblocking blocked lock
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:45 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_unlock(090b/26259, pi=1, 0-9223372036854775807)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_cancel(090b/26259, pi=1, 0-9223372036854775807)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=2
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=326 0-9223372036854775807 ty=1 cookie=1250
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: lockd: request from c0a86309
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 00006693 000071c9 0001dcc7)
Nov 28 15:37:45 anthem kernel: lockd: found file cbacccc0 (count 10)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lock(090b/26259, ty=1, pi=1, 0-9223372036854775807, bl=1)
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_lookup_block f=cbacccc0 pd=1 0-9223372036854775807 ty=1
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacc4c0 pd=401 0-9223372036854775807 ty=1 cookie=1269
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=389 0-9223372036854775807 ty=1 cookie=1274
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=383 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=391 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=407 0-9223372036854775807 ty=1 cookie=1262
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=430 0-9223372036854775807 ty=1 cookie=1272
Nov 28 15:37:45 anthem kernel: lockd: check f=cbacccc0 pd=326 0-9223372036854775807 ty=1 cookie=1250
Nov 28 15:37:45 anthem kernel: lockd: blocking on this lock (allocating).
Nov 28 15:37:45 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)
Nov 28 15:37:45 anthem kernel: lockd: get host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: created block c33be400...
Nov 28 15:37:45 anthem kernel: lockd: nlmsvc_insert_block(c33be400, -1)
Nov 28 15:37:45 anthem kernel: lockd: blocking on this lock.
Nov 28 15:37:45 anthem kernel: lockd: release host 192.168.99.9
Nov 28 15:37:45 anthem kernel: lockd: nlm_release_file(cbacccc0, ct = 11)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:37:45 anthem kernel: nlmsvc_retry_blocked(cbd6fc00, when=-1)
Nov 28 15:38:15 anthem kernel: lockd: request from c0a86309
Nov 28 15:38:15 anthem kernel: lockd: nlm_lookup_host(c0a86309, p=17, v=4)

[ Then we have a lot of entries for second 15, then a number for
second 45, then more for 15... ]

I also noticed an interesting thing. Certain boxes on the network are
hanging; others are not. And if I `ls -l /var/lib/nfs/sm', I get:

/var/lib/nfs/sm:
total 0
-rw------- 1 root root 0 Nov 26 16:41 192.168.99.2
-rw------- 1 root root 0 Nov 28 15:48 192.168.99.5
-rw------- 1 root root 0 Nov 23 17:06 192.168.99.7
-rw------- 1 root root 0 Nov 25 22:10 192.168.99.9

The boxes .2, .5 and .7 are hanging. The box .9 is not. See the dates
on their files? Could this be related to the problem?

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-12-02 14:44:03

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems

On Thu, Nov 28, 2002 at 11:34:12PM -0200, Christian Reis wrote:
> On Wed, Nov 27, 2002 at 09:31:42PM +0100, Trond Myklebust wrote:
> > >>>>> " " == Christian Reis <[email protected]> writes:
> >
> > > What can I do to help further debug the issue? Try another
> > > kernel version on clients? Server? This is giving me a
> > > headache.. :-/
> >
> > Try to give us a dump of the server lock manager activity when this
> > problem occurs. Try to do
> >
> > echo "217" >/proc/sys/sunrpc/nlm_debug
>
> Okay, I've just done that. Just completing information, this server uses
> ReiserFS filesystems, and I know we've had issues with knfsd and reiser
> before, so I guess it's at least relevant to say.

Trond, did you have a minute to look at this? I wonder if it's really
related to Reiser or not..

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

2002-12-04 14:13:51

by Christian Robottom Reis

Subject: Re: 2.4.19+trond and diskless locking problems


Hey there. Another update on the weird hangs. After rebooting the server
(and consequently cleaning up /var/lib/nfs/sm) we've had a couple of
days with no hangs. I've been keeping a close eye on that directory over
the past few days to see if we get leftover entries there, and today one
showed up:

anthem:~$ date
Wed Dec 4 11:59:31 BRDT 2002
anthem:~$ sudo ls -l /var/lib/nfs/sm
total 0
-rw------- 1 root root 0 Dec 4 08:47 192.168.99.7

So I have a file there that has been untouched for over 3 hours now.
Having a look at the logfile, I have:

[... bootup messages culminating in ntpd startup ]

Dec 4 08:44:16 canario ntpd[317]: kernel time discipline status 0040
Dec 4 08:44:16 canario ntpd[317]: frequency initialized -97.153 from /var/lib/ntp/driftfile.canario
Dec 4 08:47:22 canario rpc.statd[101]: Received erroneous SM_UNMON request from canario for 192.168.99.4

[ same time as file above, see ]

Dec 4 08:48:47 canario ntpd[317]: kernel time discipline status change 41
Dec 4 08:50:01 canario /USR/SBIN/CRON[425]: (smmsp) CMD (test -x
/usr/share/sendmail/sendmail && /usr/share/sendmail/sendmail cron-msp)

[... more bootup messages]

The reboot log from the night before shows a series of 8 erroneous
SM_UNMON messages over a period of about an hour before the actual
reboot. The machine was shut down uncleanly the time before that (well,
as uncleanly as an NFS-mounted box can be).

And, sure enough, when shutting down this box now, we get the
stalls/pauses again. Here is the nlm_debug output for both the client
and the server (neat 30s intervals show up :-) - I've selected specific
seconds of activity (i.e. there was already a set of entries at 12:06:32
before these):

CLIENT:

Dec 4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:02 canario kernel: lockd: next rebind in 6000 jiffies
Dec 4 12:07:02 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:02 canario kernel: lockd: host garbage collection
Dec 4 12:07:02 canario kernel: lockd: nlmsvc_mark_resources
Dec 4 12:07:02 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:02 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:02 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:32 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:32 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:32 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:32 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)

SERVER:

Dec 4 12:07:02 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:02 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1393
Dec 4 12:07:02 anthem kernel: lockd: deleting block d12e9000...
Dec 4 12:07:02 anthem kernel: lockd: unblocking blocked lock
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=2
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775807, bl=1)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:02 anthem kernel: lockd: blocking on this lock (allocating).
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: created block d12e9000...
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
Dec 4 12:07:02 anthem kernel: lockd: blocking on this lock.
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
Dec 4 12:07:32 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: host garbage collection
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_mark_resources
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:32 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1396
Dec 4 12:07:32 anthem kernel: lockd: deleting block d12e9000...
Dec 4 12:07:32 anthem kernel: lockd: unblocking blocked lock
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=2
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775807, bl=1)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:32 anthem kernel: lockd: blocking on this lock (allocating).
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: created block d12e9000...
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
Dec 4 12:07:32 anthem kernel: lockd: blocking on this lock.
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)

HTH. Yes, we're all sick of this subject :-/

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL