2003-06-10 15:59:32

by Matthew Mitchell

Subject: Need help with NFSD hang in 2.4.20+NFS_ALL

Everyone,

This morning I arrived at the office to find an NFS server hung up and
users whining at me.

It appeared that an updatedb launched from a cron job had gotten hung
up. Perhaps it caused some sort of overload, I'm not really sure.
System load was over 130, which is about what I expected given that we
had 128 nfs daemon threads, all of which were presumably waiting.
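
The load figure fits how Linux computes load average: tasks in
uninterruptible sleep (state D) are counted along with runnable ones, so
128 blocked nfsd threads plus a stuck updatedb would land near 130. A
minimal sketch for listing such tasks, assuming the usual
/proc/<pid>/stat layout; parse_stat, blocked_tasks and
read_proc_stat_lines are hypothetical helper names, not an existing tool.

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so use the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(stat_lines):
    """Return (pid, comm) for every task in uninterruptible sleep (D)."""
    return [(pid, comm)
            for pid, comm, state in map(parse_stat, stat_lines)
            if state == "D"]


def read_proc_stat_lines(proc="/proc"):
    """Collect the stat line of every numeric entry under /proc."""
    if not os.path.isdir(proc):
        return []
    lines = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as fh:
                lines.append(fh.read())
        except OSError:
            # The task exited between listdir() and open().
            continue
    return lines


if __name__ == "__main__":
    for pid, comm in blocked_tasks(read_proc_stat_lines()):
        print(pid, comm)
```

If most of the D-state entries are nfsd, the load number is mostly a
count of stuck threads and says little about CPU demand.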

Since nothing was responding (couldn't touch the affected disk, couldn't
successfully sync), I tried to start a "graceful" shutdown, and the
exportfs -ua step hung. A strace -p of that process showed it hung up in

nfsservctl(0x4, <some address>, 0)

on the first mount listed in /var/lib/nfs/xtab. Nothing I could do
would make it move, and it wouldn't get any closer to shutdown so I had
to cycle the power. Joy.

Now, some background info: the NFS-exported partition is a
loopback-mounted reiserfs filesystem whose backing file sits on a big
SW-raid volume. (It's every bit as awful as it sounds.) I don't think
NFS is necessarily the culprit here, but it did seize up in the most
painful way.

There were some messages that looked like they were from lockd in the
ring buffer but (I see now) they never got written to the messages file.
Damn.

Does it sound plausible to anyone who knows this code that the system
simply hiccuped under the load of the updatedb process? That's not
exactly good, but I can easily prevent it from running again...

More germane to this list: if I find this hung up again, is there
anything I can do to diagnose the problem? I don't know now if changing
the value of /proc/sys/sunrpc/nfsd_debug would have any effect, but if
someone suggests a good value I will try it.
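
For reference, nfsd_debug is a bitmask, and the resulting messages go to
the kernel log. A minimal sketch of composing and writing a value,
assuming a kernel built with RPC debugging support. The bit values are
as I recall them from the 2.4-era include/linux/nfsd/debug.h and should
be checked against the running tree; debug_mask and set_nfsd_debug are
hypothetical helper names.

```python
# Bit values recalled from include/linux/nfsd/debug.h (2.4 era).
# ASSUMPTION: verify against your own kernel source before relying on them.
NFSD_DEBUG_FLAGS = {
    "SOCK": 0x0001,
    "FH": 0x0002,
    "EXPORT": 0x0004,
    "SVC": 0x0008,
    "PROC": 0x0010,
    "FILEOP": 0x0020,
    "AUTH": 0x0040,
    "REPCACHE": 0x0080,
    "XDR": 0x0100,
    "LOCKD": 0x0200,
}
NFSD_DEBUG_ALL = 0x7FFF  # assumed "everything" mask from the same header


def debug_mask(names):
    """OR together the bits for the given flag names (case-insensitive)."""
    mask = 0
    for name in names:
        mask |= NFSD_DEBUG_FLAGS[name.upper()]
    return mask


def set_nfsd_debug(mask, path="/proc/sys/sunrpc/nfsd_debug"):
    """Write the mask in decimal; needs root on the NFS server."""
    with open(path, "w") as fh:
        fh.write("%d\n" % mask)


if __name__ == "__main__":
    # Only print the value here; call set_nfsd_debug() deliberately.
    print(debug_mask(["svc", "export", "fileop"]))
```

Writing 0 turns the output off again. With every thread already wedged,
the debug messages may only show that no new requests are being
processed, so this is more useful when enabled before the hang recurs.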

This is an SMP box.

I'd greatly appreciate any help or suggestions or even questions to try
to figure out what is going on.

--
Matthew Mitchell
Systems Programmer/Administrator [email protected]
Geophysical Development Corporation phone 713 782 1234
1 Riverway Suite 2100, Houston, TX 77056 fax 713 782 1829



_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-06-10 19:35:33

by Stuckless, Colin

Subject: RE: Need help with NFSD hang in 2.4.20+NFS_ALL


> -----Original Message-----
> From: Matthew Mitchell [mailto:[email protected]]
> Sent: Tuesday, June 10, 2003 1:26 PM
> To: [email protected]
> Subject: [NFS] Need help with NFSD hang in 2.4.20+NFS_ALL
>
>
> Everyone,
>
> This morning I arrived at the office to find an NFS server hung up and
> users whining at me.
>
> It appeared that an updatedb launched from a cron job had gotten hung
> up. Perhaps it caused some sort of overload, I'm not really sure.
> System load was over 130, which is about what I expected given that we
> had 128 nfs daemon threads, all of which were presumably waiting.


Matthew,

I don't have a proper solution to offer, but I feel your pain.

We're running a couple of dual-CPU boxes for our reservoir simulations
on RedHat 7.3 and ran into the same issue. I removed the offending
slocate.cron entry from /etc/cron.daily as it didn't seem to be
obeying the default switch to skip NFS mounted filesystems.
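
One way to double-check which mount points such a job ought to be
skipping is to read the mount table directly. A minimal sketch, assuming
a Linux /proc/mounts style table; nfs_mount_points and the set of
filesystem types are illustrative assumptions, not what slocate itself
uses.

```python
# ASSUMPTION: the filesystem types worth excluding; adjust for your site.
NETWORK_FS_TYPES = {"nfs", "smbfs", "ncpfs"}


def nfs_mount_points(mounts_text, fs_types=NETWORK_FS_TYPES):
    """Return mount points whose filesystem type is in fs_types."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Field order: device, mount point, filesystem type, options, ...
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in fs_types:
            points.append(mount_point)
    return points


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            text = fh.read()
    except OSError:
        text = ""  # not a Linux box, or /proc is not mounted
    for point in nfs_mount_points(text):
        print(point)
```

The printed paths can then be compared against what the indexing job
actually walked.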

I have a question for you though: what distribution are you running,
and what's the background on the 2.4.20+NFS_ALL kernel? I've been
running 2.4.18-27.7xsmp lately to try to find a decent NFS client
implementation that doesn't bring our Solaris 2.6 fileserver to
its knees under heavy write activity (NFS v3 over UDP,
rsize=wsize=8192).

Colin Stuckless






