Hello.
Some time ago I ran into a strange problem with the 2.4 kernel.
I could reproduce it on only one system: a production 2-CPU server that is
used as an LTSP server here, also runs tons of services, and MUST always be
up.
The problem is the following.
The server runs normally (uptime may already be several weeks, or only
several hours).
Suddenly something happens.
Then the process table fills up with zombies.
It looks like any thread created by any program becomes a zombie when it
finishes. The same programs (actually, the same running processes) join()ed
finished threads correctly before Something Happened. So it looks very
much like Something happens inside the kernel.
Affected programs include mozilla, clamav, mysqld, licq and anything else
that creates short-lived threads, or at least threads that live shorter
than the program itself.
It looks like at some moment the kernel loses the ability to inform
processes that their threads are over. AFAIK, this is done by SIGCHLD?
Anyway, manually sending SIGCHLD to the parent of the zombies does not help.
After the problem happens, the server becomes unusable (because of process
table overflow) within several minutes. Once, Something Else happened, and
all those zombies disappeared. In all other cases a reboot was required.
If the process that created those "zombie threads" is terminated (i.e. the
service is stopped), all its zombies disappear. However, after the service
is restarted, zombies begin to appear again.
Although I tried, I could not find any correlation between the system
entering this "zombie-keeping" state and anything else happening on the
system. It looks like running java apps (with the blackdown jdk) makes this
happen more often, but there is still no direct correlation.
The problem happened with official 2.4.23, 2.4.24 and 2.4.25 kernels,
compiled from kernel.org sources.
Yesterday I got tired of this zombie problem (it arose twice this week),
and decided to upgrade the server to kernel 2.6.
I installed the 2.6.4 kernel from the Debian kernel-image-2.6.4-1-k7-smp
package.
Unfortunately, this did not eliminate the problem: it happened again today.
The difference is that when running on 2.6, most binaries use the NPTL libs
from /lib/i686/cmov/, and seem not to be affected by the problem (i.e. no
zombies from them). However, users need to run some statically-linked
binaries (without source available) that have non-NPTL libs statically
linked in and so still use linuxthreads; those are affected (i.e. they do
create zombies). So the problem no longer renders the server unusable (so
it is no longer that critical), but it still exists in the 2.6 kernel.
I can't reproduce the problem on any other host. And the affected system is
a production server that is somewhat difficult to use for debugging :(
It is a dual-K7 server with a Tyan Tiger MPX S2466 motherboard and 2 GB of
RAM. The output of 'lspci -vv' and 'cat /proc/cpuinfo' is attached. I can
provide any other technical information.
I'm a seasoned unix developer and sysadmin, and have some kernel hacking
experience. However, I don't work with the kernel currently, so I am not
up to date with kernel internals.
So I'm looking either for a fix :), or for some advice on what to do with
this (i.e. where to look in the kernel code and what to look for).
Nikita Youshchenko,
sysadmin at lvk.cs.msu.su
On Thursday 01 April 2004 13:42, Nikita V. Youshchenko wrote:
> Hello.
>
> Some time ago I ran into a strange problem with the 2.4 kernel.
> I could reproduce it on only one system: a production 2-CPU server that is
> used as an LTSP server here, also runs tons of services, and MUST always be
> up.
>
> The problem is the following.
> The server runs normally (uptime may already be several weeks, or only
> several hours).
> Suddenly something happens.
> Then the process table fills up with zombies.
> It looks like any thread created by any program becomes a zombie when it
> finishes. The same programs (actually, the same running processes) join()ed
> finished threads correctly before Something Happened. So it looks very
> much like Something happens inside the kernel.
> Affected programs include mozilla, clamav, mysqld, licq and anything else
> that creates short-lived threads, or at least threads that live shorter
> than the program itself.
What does ps -AH e look like?
> It looks like at some moment the kernel loses the ability to inform
> processes that their threads are over. AFAIK, this is done by SIGCHLD?
> Anyway, manually sending SIGCHLD to the parent of the zombies does not help.
Did you try stracing the parent process? It could receive SIGCHLD but
ignore/mishandle it.
> After the problem happens, the server becomes unusable (because of process
> table overflow) within several minutes. Once, Something Else happened, and
> all those zombies disappeared. In all other cases a reboot was required.
>
> If the process that created those "zombie threads" is terminated (i.e. the
> service is stopped), all its zombies disappear. However, after the service
> is restarted, zombies begin to appear again.
Probably they get reparented to init and it wait()s for them,
ending their afterlife. So SIGCHLD works (at least in this case).
> Although I tried, I could not find any correlation between the system
> entering this "zombie-keeping" state and anything else happening on the
> system. It looks like running java apps (with the blackdown jdk) makes this
> happen more often, but there is still no direct correlation.
>
> The problem happened with official 2.4.23, 2.4.24 and 2.4.25 kernels,
> compiled from kernel.org sources.
>
> Yesterday I got tired of this zombie problem (it arose twice this week),
> and decided to upgrade the server to kernel 2.6.
> I installed the 2.6.4 kernel from the Debian kernel-image-2.6.4-1-k7-smp
> package.
>
> Unfortunately, this did not eliminate the problem: it happened again today.
> The difference is that when running on 2.6, most binaries use the NPTL libs
> from /lib/i686/cmov/, and seem not to be affected by the problem (i.e. no
> zombies from them). However, users need to run some statically-linked
> binaries (without source available) that have non-NPTL libs statically
> linked in and so still use linuxthreads; those are affected (i.e. they do
> create zombies). So the problem no longer renders the server unusable (so
> it is no longer that critical), but it still exists in the 2.6 kernel.
Sounds like a userspace problem in the threading libraries.
What version of glibc/linuxthreads was in use before?
Maybe post your report to the linuxthreads mailing list.
--
vda
> > Some time ago I ran into a strange problem with the 2.4 kernel.
> > I could reproduce it on only one system: a production 2-CPU server
> > that is used as an LTSP server here, also runs tons of services, and
> > MUST always be up.
> >
> > The problem is the following.
> > The server runs normally (uptime may already be several weeks, or only
> > several hours).
> > Suddenly something happens.
> > Then the process table fills up with zombies.
> > It looks like any thread created by any program becomes a zombie when it
> > finishes. The same programs (actually, the same running processes)
> > join()ed finished threads correctly before Something Happened. So it
> > looks very much like Something happens inside the kernel.
> > Affected programs include mozilla, clamav, mysqld, licq and anything
> > else that creates short-lived threads, or at least threads that live
> > shorter than the program itself.
>
> What does ps -AH e look like?
See the output of "ps -lax" in the attachment.
> > It looks like at some moment the kernel loses the ability to inform
> > processes that their threads are over. AFAIK, this is done by SIGCHLD?
> > Anyway, manually sending SIGCHLD to the parent of the zombies does not
> > help.
>
> Did you try stracing the parent process? It could receive SIGCHLD but
> ignore/mishandle it.
I tried strace -f, so all threads show up in the output. No signals
arrive, except those sent manually by kill().
Stracing the same binary on another host shows that SIGRT_1 arrives at the
parent.
I can send the strace logs, but they are somewhat large.
So the kernel really stops delivering signals.
As far as I understand, in the case of threads SIGRT_1 is used instead of
SIGCHLD.
So I tried to send SIGRT_1 to the parent manually. And the zombies disappeared!
However, new zombies appear soon. They can still be removed by sending
SIGRT_1 manually, but that is not a solution for a kernel bug :).
> > After the problem happens, the server becomes unusable (because of
> > process table overflow) within several minutes. Once, Something Else
> > happened, and all those zombies disappeared. In all other cases a reboot
> > was required.
> >
> > If the process that created those "zombie threads" is terminated (i.e.
> > the service is stopped), all its zombies disappear. However, after the
> > service is restarted, zombies begin to appear again.
>
> Probably they get reparented to init and it wait()s for them,
> ending their afterlife. So SIGCHLD works (at least in this case).
It seems that signal delivery works only after the zombies are reparented.
> > Although I tried, I could not find any correlation between the system
> > entering this "zombie-keeping" state and anything else happening on the
> > system. It looks like running java apps (with the blackdown jdk) makes
> > this happen more often, but there is still no direct correlation.
> >
> > The problem happened with official 2.4.23, 2.4.24 and 2.4.25 kernels,
> > compiled from kernel.org sources.
> >
> > Yesterday I got tired of this zombie problem (it arose twice this
> > week), and decided to upgrade the server to kernel 2.6.
> > I installed the 2.6.4 kernel from the Debian kernel-image-2.6.4-1-k7-smp
> > package.
> >
> > Unfortunately, this did not eliminate the problem: it happened again
> > today. The difference is that when running on 2.6, most binaries use
> > the NPTL libs from /lib/i686/cmov/, and seem not to be affected by the
> > problem (i.e. no zombies from them). However, users need to run some
> > statically-linked binaries (without source available) that have
> > non-NPTL libs statically linked in and so still use linuxthreads; those
> > are affected (i.e. they do create zombies). So the problem no longer
> > renders the server unusable (so it is no longer that critical), but it
> > still exists in the 2.6 kernel.
>
> Sounds like a userspace problem in the threading libraries.
> What version of glibc/linuxthreads was in use before?
> Maybe post your report to the linuxthreads mailing list.
I doubt it is a userspace problem.
It happens with the same userspace libs and binaries (or even the same
running processes) with which it did not happen some time ago.
It happens at the same moment with different processes running from
different accounts.
Restarting processes doesn't help.
It is not reproducible on other hosts.
Manually sending the signal (kill -33 <parentpid>) removes already existing
zombies.
I can hardly imagine a userspace problem that behaves like this.
Nikita
> > > It looks like at some moment the kernel loses the ability to inform
> > > processes that their threads are over. AFAIK, this is done by SIGCHLD?
> > > Anyway, manually sending SIGCHLD to the parent of the zombies does not
> > > help.
> >
> > Did you try stracing the parent process? It could receive SIGCHLD but
> > ignore/mishandle it.
>
> I tried strace -f, so all threads show up in the output. No signals
> arrive, except those sent manually by kill().
> Stracing the same binary on another host shows that SIGRT_1 arrives at the
> parent.
> I can send the strace logs, but they are somewhat large.
> So the kernel really stops delivering signals.
Post reasonably small pieces of them.
> As far as I understand, in the case of threads SIGRT_1 is used instead of
> SIGCHLD.
> So I tried to send SIGRT_1 to the parent manually. And the zombies disappeared!
> However, new zombies appear soon. They can still be removed by sending
> SIGRT_1 manually, but that is not a solution for a kernel bug :).
Maybe. Maybe not. I am no expert; I'd try to find out how SIGRT_1
is generated in the normal case (I suppose the kernel does not distinguish
between threads and processes; maybe it's done by the threading libs?)
> > Sounds like a userspace problem in the threading libraries.
> > What version of glibc/linuxthreads was in use before?
> > Maybe post your report to the linuxthreads mailing list.
>
> I doubt it is a userspace problem.
> It happens with the same userspace libs and binaries (or even the same
> running processes) with which it did not happen some time ago.
> It happens at the same moment with different processes running from
> different accounts.
> Restarting processes doesn't help.
> It is not reproducible on other hosts.
> Manually sending the signal (kill -33 <parentpid>) removes already existing
> zombies.
> I can hardly imagine a userspace problem that behaves like this.
I won't argue. One thing is clear: there is not enough info at this time :(
Try to instrument (printk("...")) the parts of the kernel responsible for
handling exit() etc.
--
vda
> > As far as I understand, in the case of threads SIGRT_1 is used instead
> > of SIGCHLD.
> > So I tried to send SIGRT_1 to the parent manually. And the zombies
> > disappeared! However, new zombies appear soon. They can still be
> > removed by sending SIGRT_1 manually, but that is not a solution for a
> > kernel bug :).
>
> Maybe. Maybe not. I am no expert; I'd try to find out how SIGRT_1
> is generated in the normal case (I suppose the kernel does not distinguish
> between threads and processes; maybe it's done by the threading libs?)
I've looked at the kernel source.
This is what I found.
- It looks like do_notify_parent() from kernel/signal.c is called to notify
  the parent about child termination.
- do_notify_parent() calls __group_send_sig_info() to send the signal, and
  does not check the return code. However, __group_send_sig_info() may fail.
- __group_send_sig_info() calls send_signal().
- send_signal() contains the following code:
    struct sigqueue * q = NULL;
    ...
    if (atomic_read(&nr_queued_signals) < max_queued_signals)
        q = kmem_cache_alloc(sigqueue_cachep, GFP_ATOMIC);
    if (q) {
        ...
    } else {
        if (sig >= SIGRTMIN && info && (unsigned long)info != 1
            && info->si_code != SI_USER)
            return -EAGAIN;
        ...
SIGRT_1 is 33, which is greater than SIGRTMIN (32); info is definitely not
0 or 1; and info->si_code is definitely not SI_USER on the path related to
parent process notification.
nr_queued_signals and sigqueue_cachep seem to be local to the
kernel/signal.c file, and the code is organized such that
nr_queued_signals shows exactly how many elements are allocated from
sigqueue_cachep.
max_queued_signals equals 1024, so no more than 1024 elements can be
allocated from sigqueue_cachep.
sigqueue_cachep is initialized in signals_init():
    sigqueue_cachep =
        kmem_cache_create("sigqueue",
                          sizeof(struct sigqueue),
                          __alignof__(struct sigqueue),
                          0, NULL, NULL);
So I looked into /proc/slabinfo on the server running the "zombie-loving"
kernel, and found the following line:
nikita@zigzag:/proc> grep sigqueue slabinfo
sigqueue 1024 1107 144 27 1 : tunables 120 60 8 : slabdata 41 41 0
As far as I understand, the first number in this output is the number of
elements allocated from the "sigqueue" cache. That is, all 1024 elements are
allocated!
So it looks like 'atomic_read(&nr_queued_signals) < max_queued_signals' is
false, so 'q' is not allocated, and send_signal() returns -EAGAIN while
trying to send SIGRT_1 to the parent process. This error code is passed
from __group_send_sig_info() to do_notify_parent(), and just ignored
there. So the signal is not delivered, and the dying process is left in the
zombie state.
So the "something" that happens in the kernel and makes it a "zombie-lover"
is sigqueue overflow.
The remaining question is why this happens on my server at all ...
On Thursday 01 April 2004 23:25, Nikita V. Youshchenko wrote:
> > > As far as I understand, in the case of threads SIGRT_1 is used instead
> > > of SIGCHLD.
> > > So I tried to send SIGRT_1 to the parent manually. And the zombies
> > > disappeared! However, new zombies appear soon. They can still be
> > > removed by sending SIGRT_1 manually, but that is not a solution for a
> > > kernel bug :).
> >
> > Maybe. Maybe not. I am no expert; I'd try to find out how SIGRT_1
> > is generated in the normal case (I suppose the kernel does not distinguish
> > between threads and processes; maybe it's done by the threading libs?)
>
> I've looked at the kernel source.
> This is what I found.
Good! :)
> So it looks like 'atomic_read(&nr_queued_signals) < max_queued_signals' is
> false, so 'q' is not allocated, and send_signal() returns -EAGAIN while
> trying to send SIGRT_1 to the parent process. This error code is passed
> from __group_send_sig_info() to do_notify_parent(), and just ignored
> there.
Hmmm, what can it do there? Maybe only printk(). The question is why
sigqueue gets so big and does not shrink.
> So the signal is not delivered, and the dying process is left in the
> zombie state.
>
> So the "something" that happens in the kernel and makes it a
> "zombie-lover" is sigqueue overflow.
You found an explanation of why there are zombies. Now, why does it start
to happen? Why does it persist? There must be some code which shrinks
sigqueue. It does not seem to work right.
--
vda