On Thu, May 08, 2008 at 07:18:16PM -0300, Christian Robottom Reis wrote:
> Today, apparently at random, we had a locking problem on our LAN. Client
> applications hung, restarting them led to hangs, and the client dmesgs
> showed a familiar:
>
> [443619.682118] lockd: server anthem not responding, still trying
>
> So the server lockd apparently stopped responding to clients, and
> restarting clients got us nowhere. Eventually we cycled the server and
> everything's back to normal, but I'm pretty confused as to what
> happened. I couldn't scrape any evidence on the server that would point
> to why this happened -- no OOPS, error or even warning output.
>
> I was reading through the thread at
> http://groups.google.com.br/group/fa.linux.kernel/browse_thread/thread/6c7b5e49a46aef75/91adbb9f298db509?lnk=st&q=nfs+locking+server#91adbb9f298db509
> and figured that it might be a similar problem I'm facing, but I'm not
> entirely sure as it's hard to say if somebody interrupted a client
> program or not (it's a large diskless network).

I don't think the server stopped responding to clients in the case
Miklos described.

Perhaps a sysrq-T dump of lockd would show where (and whether) it's
blocked? (So once lockd stops responding, log into the server, run
"echo t >/proc/sysrq-trigger", and collect the output from the logs,
especially the stack trace for the lockd process.)

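
The steps above could be scripted so that the dump is captured quickly the next time lockd hangs. The following is only a minimal sketch in Python, not a tested procedure: the helper names `request_task_dump` and `save_kernel_log` and the output path are made up for illustration, it assumes the script runs as root on the server, and it assumes `dmesg` still holds the dump (on a busy machine the kernel ring buffer may wrap, in which case the syslog files are the place to look).

```python
import subprocess

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def request_task_dump(trigger_path=SYSRQ_TRIGGER):
    """Ask the kernel to log the state of every task.

    Same effect as: echo t >/proc/sysrq-trigger (needs root on a real system).
    """
    with open(trigger_path, "w") as trigger:
        trigger.write("t\n")


def save_kernel_log(out_path):
    """Copy the current kernel ring buffer to a file (hypothetical helper)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    with open(out_path, "w") as out:
        out.write(result.stdout)


# Intended use on the server, as root, once lockd has stopped responding:
#   request_task_dump()
#   save_kernel_log("/tmp/sysrq-t.txt")   # path is an arbitrary example
# then look for the lockd entry and its call trace in the saved file.
```

The trigger path is a parameter only so the write can be exercised against an ordinary file without touching a live system.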
>
> Clients run 2.6.24-16-generic (stock Ubuntu Hardy) and server is
> 2.6.22-14-generic (stock Ubuntu Gutsy).
>
> If the problem happens again, what can I do on server and client to
> further debug the problem? And is there a utility that clears locks that
> we could use to avoid having to restart the server (acking the risks in
> cleared locks)?

If the server lockd has completely stopped responding to lockd requests,
then the problem isn't just a stray file lock.

--b.
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that [email protected] is being discontinued.
Please subscribe to [email protected] instead.
http://vger.kernel.org/vger-lists.html#linux-nfs
On Fri, May 09, 2008 at 11:43:05AM -0400, J. Bruce Fields wrote:
> I don't think the server stopped responding to clients in the case
> Miklos described.
Okay. Well, one month later, it happened again to me.
> Perhaps a sysrq-T dump of lockd would show where (and whether) it's
> blocked? (So once lockd stops responding, log into the server, run
> "echo t >/proc/sysrq-trigger", and collect the output from the logs,
> especially the stacktrace for the lockd process).
This time I did a ps auxww looking for the lockd process. And guess
what?
root 6323 0.0 0.0 0 0 ? D Jun01 0:50 [lockd]
I wonder why it's in the D state. I also wonder if there's a way to get
it back once it's in this state -- without reloading the kernel module
or rebooting, I guess.
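
For reference, a STAT value beginning with "D" in ps output means uninterruptible sleep (usually waiting on I/O). Spotting that state could be automated so the hang is noticed before clients pile up. This is only a sketch under assumptions: the function name `find_uninterruptible` is invented for illustration, and it assumes the 11-column layout that `ps auxww` printed above (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND).

```python
def find_uninterruptible(ps_output, name=None):
    """Return (pid, stat, command) for processes whose STAT starts with 'D'.

    ps_output is the text printed by `ps auxww`; name optionally restricts
    the result to commands containing that substring.
    """
    hits = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so COMMAND keeps any embedded spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("D") and (name is None or name in command):
            hits.append((pid, stat, command))
    return hits


# The line from the ps output quoted above.
sample = "root 6323 0.0 0.0 0 0 ? D Jun01 0:50 [lockd]"
print(find_uninterruptible(sample, name="lockd"))
# prints [(6323, 'D', '[lockd]')]
```

A check like this only reports that lockd is stuck; it says nothing about why, which is what the stack trace is for.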
I've collected a trace, at any rate, but lockd isn't even listed in it --
I can send it in if it makes sense.
What sort of debugging can I do to figure out what's wrong here?
(This is a dual-Xeon running:
Linux anthem 2.6.22-14-generic #1 SMP Tue Feb 12 07:42:25 UTC 2008 i686 GNU/Linux)
--
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3376 0125