2005-01-07 11:06:01

by Frank Steiner

[permalink] [raw]
Subject: ps -aux hangs on [nfsd]

Hi,

I've encountered a very very strange "bug" on one of our nfs servers.
It runs SuSE 9.0 with kernel 2.6.9 and exports some directories from
a 600GB IDE-Raid to about 60 clients with

/mnt/extraid1/extproj @intern(rw,root_squash,sync)
/mnt/extraid1/extstorage @intern(rw,root_squash,sync)

The clients run the same kernel and mount with

turing:/mnt/extraid1/extproj /mnt/extraid1/extproj nfs
defaults,tcp,rw,hard,rsize=16384,wsize=16384 0 0
turing:/mnt/extraid1/extstorage /mnt/extraid1/extstorage nfs
defaults,tcp,rw,hard,rsize=16384,wsize=16384 0 0

This is the same for all our nfs servers and clients, so this server
is special.

Everything works fine when starting the nfsserver and mounting the
directories. However, after about 1-2 weeks of normal works, some
things start to hang on the host, suddenly "ps -aux" hangs on showing
the [nfsd] processes. We recognize that because we do some

/etc/init.d/tomcat5 status >& /dev/null || /etc/init.d/rctomcat5 restart

via cron to check if the tomcat server is still running,
and the /etc/init.d/tomcat5 script uses "ps -aux".

So, when executing ps -aux the output shows the normal processes for
root, system users etc., then it comes to nfsd and shows

...
mysql 17132 0.0 0.0 481632 1248 ? S Jan03 0:00 /usr/sbin/mysqld-max
root 17166 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17149 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17150 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17151 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17152 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17153 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17154 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17155 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17156 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17157 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]
root 17165 0.0 0.0 0 0 ? SW Jan03 0:00 [nfsd]


and hangs. When I call top and sort for memory usage, I see that ps
is taking more and more memory. I killed it when using 1GB of memory.
We are running the nfsserver with 18 threads, so just the first 11
are shown.

However, when I call just "ps -x", I get all 18 [nfsd] threads:

...
17084 ? S 0:00 /bin/sh /usr/bin/mysqld_safe ...
17166 ? SW 0:00 [nfsd]
17149 ? SW 0:00 [nfsd]
17150 ? SW 0:00 [nfsd]
17151 ? SW 0:00 [nfsd]
17152 ? SW 0:00 [nfsd]
17153 ? SW 0:00 [nfsd]
17154 ? SW 0:00 [nfsd]
17155 ? SW 0:00 [nfsd]
17156 ? SW 0:00 [nfsd]
17157 ? SW 0:00 [nfsd]
17165 ? SW 0:00 [nfsd]
17164 ? SW 0:00 [nfsd]
17163 ? SW 0:00 [nfsd]
17162 ? SW 0:00 [nfsd]
17161 ? SW 0:00 [nfsd]
17160 ? SW 0:00 [nfsd]
17159 ? SW 0:00 [nfsd]
17158 ? SW 0:00 [nfsd]
17169 ? S 0:12 /usr/sbin/rpc.mountd -p 700
...


After calling "/etc/init.d/nfsserver restart", ps -aux works again, showing
all processes.

When ps -aux hung, the nfsserver still seemed to work fine, I could access
the filessystem from the nfs clients. In the log files, I did not find any
message about nfs errors or sth. similar.

The only thing that makes this nfs server special compared to the other
servers: It is outside our firewall and mounts one filesystem from
behind the firewall with a ssh tunnel. For that, it has some fstab line
localhost:/export/rpm /rpm-export nfs
defaults,tcp,ro,hard,rsize=16384,wsize=16384,port=701,mountport=702 0 0
It shouldn't affect the nfs server I guess, but who knows...

Any idea what that could be? How can ps -aux hang on showing a nfsd
thread and leaking memory up to over one GB? What can I do to provide
more helpful information? Turn one some debug options etc.?
Unfortunately, I cannot yet reprocude the behaviour reliably. It just shows
up some time, so far after about 1-2 weeks...

cu,
Frank


--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *




-------------------------------------------------------
The SF.Net email is sponsored by: Beat the post-holiday blues
Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-01-27 10:57:22

by Frank Steiner

[permalink] [raw]
Subject: Re: ps -aux hangs on [nfsd]

Ok, it happened again and I did a "strace -f ps -aux". I can see
ps -aux going through almost all of the /proc/<pid> directories
for nfsd pids, but not all. On the 16th of the 18 nfsd pids, instead
of opening /proc/15600/cmdline, it opens /etc/passwd and then goes
into a look opening and closing /etc/passwd all over again.

Here are the relevant portions of the strace log, maybe someone
has an idea what could go wrong here.

The pids/proc dirs for the nfsd are 15585-15602. Things go wrong
on pid 15600.

Currently, this happens with the latest SuSE kernel for SuSE 9.2
(2.6.8 plus SuSE patches).


17054 stat64("/proc/15599", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
17054 open("/proc/15599/stat", O_RDONLY) = 6
17054 read(6, "15599 (nfsd) S 1 1 1 0 -1 104864"..., 1023) = 125
17054 close(6) = 0
17054 open("/proc/15599/status", O_RDONLY) = 6
17054 read(6, "Name:\tnfsd\nState:\tS (sleeping)\nS"..., 1023) = 348
17054 close(6) = 0
17054 open("/proc/15599/cmdline", O_RDONLY) = 6
17054 read(6, "", 2047) = 0
17054 close(6) = 0
17054 write(1, "root 15599 0.0 0.0 0 "..., 70) = 70
17054 stat64("/proc/15600", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
17054 open("/proc/15600/stat", O_RDONLY) = 6
17054 read(6, "15600 (nfsd) S 1 1 1 0 -1 104864"..., 1023) = 125
17054 close(6) = 0
17054 open("/proc/15600/status", O_RDONLY) = 6
17054 read(6, "Name:\tnfsd\nState:\tS (sleeping)\nS"..., 1023) = 373
17054 close(6) = 0
17054 open("/etc/passwd", O_RDONLY) = 6
===> here it should open /proc/16000/cmdline, not passwd <===

17054 fcntl64(6, F_GETFD) = 0
17054 fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
17054 _llseek(6, 0, [0], SEEK_CUR) = 0
17054 fstat64(6, {st_mode=S_IFREG|0644, st_size=1704, ...}) = 0
17054 mmap2(NULL, 1704, PROT_READ, MAP_SHARED, 6, 0) = 0xb7dd3000
17054 _llseek(6, 1704, [1704], SEEK_SET) = 0
17054 fstat64(6, {st_mode=S_IFREG|0644, st_size=1704, ...}) = 0
17054 munmap(0xb7dd3000, 1704) = 0
17054 close(6) = 0
17054 open("/etc/passwd", O_RDONLY) = 6
17054 fcntl64(6, F_GETFD) = 0
17054 fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
17054 _llseek(6, 0, [0], SEEK_CUR) = 0
17054 fstat64(6, {st_mode=S_IFREG|0644, st_size=1704, ...}) = 0
17054 mmap2(NULL, 1704, PROT_READ, MAP_SHARED, 6, 0) = 0xb7dd3000
17054 _llseek(6, 1704, [1704], SEEK_SET) = 0
17054 fstat64(6, {st_mode=S_IFREG|0644, st_size=1704, ...}) = 0
17054 munmap(0xb7dd3000, 1704) = 0
17054 close(6) = 0
17054 open("/etc/passwd", O_RDONLY) = 6
...
and so on.

After doing this about 100 times, the strace change a bit:
17054 open("/etc/passwd", O_RDONLY) = 6
17054 fcntl64(6, F_GETFD) = 0
17054 fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
17054 _llseek(6, 0, [0], SEEK_CUR) = 0
17054 fstat64(6, {st_mode=S_IFREG|0644, st_size=1704, ...}) = 0
17054 mmap2(NULL, 1704, PROT_READ, MAP_SHARED, 6, 0) = 0xb7dd3000
17054 _llseek(6, 1704, [1704], SEEK_SET) = 0
17054 fstat64(6, {st_mode=S_IFREG|0644, st_size=1704, ...}) = 0
17054 munmap(0xb7dd3000, 1704) = 0
17054 close(6) = 0
17054 mremap(0xb7d53000, 331776, 335872, MREMAP_MAYMOVE) = 0xb7d53000
17054 open("/etc/passwd", O_RDONLY) = 6
...

I.e., the "mremap" occurs. These commands (with the mremap) now loop forever,
and every about 50 times a little variation occurs:

...
17054 munmap(0xb7dd3000, 1704) = 0
17054 close(6) = 0
17054 mremap(0xb7d53000, 405504, 405504, MREMAP_MAYMOVE) = 0xb7d53000
17054 socket(PF_UNIX, SOCK_STREAM, 0) = 6
17054 connect(6, {sa_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = -1 ENOENT
(No such file or directory)
17054 close(6) = 0
17054 open("/etc/passwd", O_RDONLY) = 6
...

Anyone any ideas? I also did a "ls -laR" for all the nfsd dirs in /proc
as well as find+cat on these dirs, so all contents from /proc are logged
in case you need some more information...

cu,
Frank

--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
Tool for open source databases. Create drag-&-drop reports. Save time
by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2005-01-28 10:22:30

by Olaf Kirch

[permalink] [raw]
Subject: Re: ps -aux hangs on [nfsd]

On Thu, Jan 27, 2005 at 11:57:04AM +0100, Frank Steiner wrote:
> Here are the relevant portions of the strace log, maybe someone
> has an idea what could go wrong here.

This doesn't look like an NFS bug; it's a problem in procps or glibc.

When you do a ps aux, ps ends up in a function named simple_readtask
(in procps/proc/readproc.c) which does this:

stat /proc/<pid>
read /proc/<pid>/stat
read /proc/<pid>/status
map the uids it found to login names

The latter happens via a function named user_from_uid(), which caches
these mappings - this is the reason why you're not seeing an access to
/etc/passwd on every call.

But it doesn't look like user_from_uid() could loop at all. So
possibly somethings wrong inside glibc (unlikely as that is).

Olaf
--
Olaf Kirch | Things that make Monday morning interesting, #2:
[email protected] | "We have 8,000 NFS mount points, why do we keep
---------------+ running out of privileged ports?"


-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
Tool for open source databases. Create drag-&-drop reports. Save time
by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2005-01-28 13:48:22

by Frank Steiner

[permalink] [raw]
Subject: Re: ps -aux hangs on [nfsd]

Olaf Kirch wrote

> This doesn't look like an NFS bug; it's a problem in procps or glibc.
>
> When you do a ps aux, ps ends up in a function named simple_readtask
> (in procps/proc/readproc.c) which does this:
>
> stat /proc/<pid>
> read /proc/<pid>/stat
> read /proc/<pid>/status
> map the uids it found to login names
>
> The latter happens via a function named user_from_uid(), which caches
> these mappings - this is the reason why you're not seeing an access to
> /etc/passwd on every call.

Hmm, maybe that goes wrong, because the uid found in the status file
does not exist on the server. The server has only system users, but
exports the fs to a clients where real users exist. I checked the
content of /proc/15600/status which I recorder when it was hanging
and it shows

Name: nfsd
State: S (sleeping)
SleepAVG: 102%
Tgid: 15600
Pid: 15600
PPid: 1
TracerPid: 0
Uid: 0 0 0 19021
Gid: 0 0 0 19000
FDSize: 32
Groups: 14 16 17 33 19000
Threads: 1
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffffffffef8
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 00000000ffffffff
CapEff: 00000000fefffee0

However, uid and gid 19021 and 19000 don't exist on the nfs server, only
on the clients mounting. Of course this shouldn't be a problem, but maybe
the bug is triggered by this situation?

Btw, I saw that the read command in tge strace log for /proc/15600/status
is the only one that ends with "= 373" while all reads for the other
nfsd processes end with "= 348". Could that indicate sth. is wrong
with this status file?

> But it doesn't look like user_from_uid() could loop at all. So
> possibly somethings wrong inside glibc (unlikely as that is).

All very puzzling and I've no idea how to track this down... Anyway
thanks for the explanation so far!

cu,
Frank



--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr. 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: +49 89 2180-99-4049
* Rekursion kann man erst verstehen, wenn man Rekursion verstanden hat. *


-------------------------------------------------------
This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting
Tool for open source databases. Create drag-&-drop reports. Save time
by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc.
Download a FREE copy at http://www.intelliview.com/go/osdn_nl
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs