2007-03-06 20:18:00

by Jan Rękorajski

Subject: lockd not responding

Hi,
After applying Trond's patches the oops problem went away, but now I'm
back to a comatose lockd.

rainbow is an NFS server, sith is a random client:

[baggins@sith ~]$ rpcinfo -p rainbow | grep lock
100021 1 udp 32774 nlockmgr
100021 3 udp 32774 nlockmgr
100021 4 udp 32774 nlockmgr
100021 1 tcp 37150 nlockmgr
100021 3 tcp 37150 nlockmgr
100021 4 tcp 37150 nlockmgr

[baggins@sith ~]$ rpcinfo -u rainbow 100021
rpcinfo: RPC: Timed out
program 100021 version 0 is not available

[baggins@sith ~]$ rpcinfo -t rainbow 100021
rpcinfo: RPC: Timed out
program 100021 version 0 is not available

[baggins@sith ~]$ telnet rainbow 37150
Trying 10.1.1.4.37150...
Connected to rainbow.mimuw.edu.pl.
Escape character is '^]'.
^]
telnet>

[root@rainbow ~]# ps aux | grep "\[lockd\]"
root 3786 0.0 0.0 0 0 ? S 01:55 0:00 [lockd]

So lockd is up and running and I can connect to it, but it's not
responding to RPC calls. What's interesting is that it works right after
a reboot and stops responding only after some time.
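
For reference, `rpcinfo -u` makes an RPC call to procedure 0 (the NULL procedure) of the given program. The following is a minimal sketch of that check in Python, useful for polling the server from a script. It builds an ONC RPC version 2 call with AUTH_NONE and sends it over UDP. The function names, the choice of NLM version 1, and the 5-second timeout are assumptions for illustration; the port would come from the `rpcinfo -p` output above (32774 for UDP here).

```python
import socket
import struct

NLM_PROG = 100021  # nlockmgr program number, as listed by rpcinfo -p


def build_null_call(xid, prog, vers):
    """Build an ONC RPC v2 CALL for procedure 0 with AUTH_NONE cred and verifier."""
    # xid, msg_type=CALL(0), rpcvers=2, prog, vers, proc=0,
    # cred flavor=0, cred length=0, verf flavor=0, verf length=0
    return struct.pack(">10I", xid, 0, 2, prog, vers, 0, 0, 0, 0, 0)


def reply_status(data, xid):
    """Classify a reply message: 'success' or a short reason string."""
    if len(data) < 20:
        return "short reply"
    rxid, mtype, rstat = struct.unpack(">3I", data[:12])
    if rxid != xid or mtype != 1:      # msg_type 1 is REPLY
        return "not a reply to this call"
    if rstat != 0:                     # reply_stat 0 is MSG_ACCEPTED
        return "denied"
    # Accepted reply: verifier (flavor, length, padded body), then accept_stat.
    vlen = struct.unpack(">I", data[16:20])[0]
    off = 20 + ((vlen + 3) & ~3)
    if len(data) < off + 4:
        return "short reply"
    astat = struct.unpack(">I", data[off:off + 4])[0]
    return "success" if astat == 0 else f"accept_stat {astat}"


def probe_udp(host, port, vers=1, timeout=5.0, xid=0x6C6F636B):
    """Send one NULL call over UDP and report what came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_null_call(xid, NLM_PROG, vers), (host, port))
        try:
            data, _ = s.recvfrom(1024)
        except socket.timeout:
            return "timed out"
    return reply_status(data, xid)
```

Calling `probe_udp("rainbow", 32774)` from a client would be expected to return "timed out" while lockd is in the state described here, and "success" right after a reboot.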

I also see a lot of these in the logs on the server
(red13 is another NFS client):

portmap: server red13 not responding, timed out
lockd: server red13 not responding, timed out
lockd: couldn't create RPC handle for red13

It looks to me like lockd is looping over some dead client and is so
wound up in doing so that it has no time to answer new calls.
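
The effect suspected here can be illustrated with a toy model. This is not lockd's code, only a sketch of head-of-line blocking: a single worker drains requests in arrival order, and any request that involves an unreachable client costs a full timeout. The function name, the 30-second timeout, and the 1 ms normal cost are made-up values for illustration.

```python
def service_delays(requests, dead_clients, callback_timeout=30.0, normal_cost=0.001):
    """Simulate one worker handling requests in arrival order.

    requests: list of (arrival_time_seconds, client_name).
    Returns a list of (client_name, wait), where wait is the time from
    arrival until the request has been handled.
    """
    clock = 0.0
    results = []
    for arrival, client in sorted(requests):
        clock = max(clock, arrival)  # worker idles until the request arrives
        # A request touching a dead client blocks the worker for a full timeout.
        clock += callback_timeout if client in dead_clients else normal_cost
        results.append((client, clock - arrival))
    return results


# One request involving the dead client red13, then a call from sith 1 s later.
delays = service_delays([(0.0, "red13"), (1.0, "sith")], dead_clients={"red13"})
```

In this model the call from sith waits about 29 seconds even though it costs almost nothing itself, which is longer than a typical client-side RPC timeout. If requests involving the dead client keep arriving, healthy clients never get a timely answer.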

Jan
--
Jan Rekorajski | ALL SUSPECTS ARE GUILTY. PERIOD!
baggins<at>mimuw.edu.pl | OTHERWISE THEY WOULDN'T BE SUSPECTS, WOULD THEY?
BOFH, MANIAC | -- TROOPS by Kevin Rubio

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2007-09-22 21:31:57

by Jeff Layton

Subject: Re: lockd not responding

On Fri, 21 Sep 2007 16:52:12 -0400
Trond Myklebust <[email protected]> wrote:

> On Wed, 2007-09-12 at 20:27 -0400, Jeff Layton wrote:
> > On Wed, 12 Sep 2007 07:26:02 +0000 (UTC)
> > kenneth johansson <[email protected]> wrote:
> >
> > > Got a warning from the lock validating check again and
> > > later an unresponsive lockd with a backtrace this time
> > > actually at the same place the lock warning was on.
> >
> > FWIW, I think I ran into the exact same problem recently. I opened a
> > BZ case for it so I wouldn't forget about it, but haven't had time to
> > track it down:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=280311
> >
> > I'm not certain that lockd was stuck at the time, since I had some
> > other things going on with the box, but it may have been...
>
> Does the attached patch help?
>
> Cheers
> Trond
>

Thanks, Trond -- nice work :-)

Yes. It does seem to. I have a reproducer of sorts -- run this in a
continuous loop:

lock file
sleep 2 seconds
unlock file
sleep 1 second

When run simultaneously on the server and client against the same
inode, the lockdep warning usually pops up within a few minutes. With
Trond's patch applied, I never saw it, even after running for about
20 minutes.
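
The actual reproducer was not posted. A minimal sketch of the loop described above, in Python, might look like this. It assumes POSIX locks taken with `fcntl.lockf` on the whole file; the function name and parameters are hypothetical, and the path must point at a file on the NFS export (on the server) or the NFS mount (on the client).

```python
import fcntl
import time


def lock_unlock_loop(path, iterations=None, hold=2.0, pause=1.0):
    """Repeatedly take and release an exclusive POSIX lock on `path`.

    iterations=None loops forever. Returns the number of completed cycles.
    """
    done = 0
    # "a+" opens for writing (needed for an exclusive lock) without truncating.
    with open(path, "a+") as f:
        while iterations is None or done < iterations:
            fcntl.lockf(f, fcntl.LOCK_EX)  # lock file (blocks until granted)
            time.sleep(hold)               # sleep 2 seconds
            fcntl.lockf(f, fcntl.LOCK_UN)  # unlock file
            time.sleep(pause)              # sleep 1 second
            done += 1
    return done

# Example (path is a placeholder): lock_unlock_loop("/mnt/nfs/lockfile")
```

Running one copy on the server and one on the client against the same file gives the contention described above.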

--
Jeff Layton <[email protected]>


2007-09-23 18:52:57

by Kenneth Johansson

Subject: Re: lockd not responding


On Fri, 2007-09-21 at 16:52 -0400, Trond Myklebust wrote:
> On Wed, 2007-09-12 at 20:27 -0400, Jeff Layton wrote:
> > On Wed, 12 Sep 2007 07:26:02 +0000 (UTC)
> > kenneth johansson <[email protected]> wrote:
> >
> > > Got a warning from the lock validating check again and
> > > later an unresponsive lockd with a backtrace this time
> > > actually at the same place the lock warning was on.
> >
> > FWIW, I think I ran into the exact same problem recently. I opened a
> > BZ case for it so I wouldn't forget about it, but haven't had time to
> > track it down:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=280311
> >
> > I'm not certain that lockd was stuck at the time, since I had some
> > other things going on with the box, but it may have been...
>
> Does the attached patch help?

Yes it does.

I tested Jeff's lock/unlock loop. Without the patch, I got the lock
warning within a minute or two each of the three times I tried. With the
patch, it has now run for about an hour with no warning.

So there is an Ack from me on this fix.


