[Rafael, one entry for the regression list]
Roland,
The opensuse 11.0 dhcpcd keeps accumulating zombies of its child scripts
with 2.6.27-rc*. 2.6.26 didn't have this problem. After one night I had more
zombies than normal processes.
I bisected it down to
commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664
Author: Roland McGrath <[email protected]>
Date: Fri Jul 25 19:45:54 2008 -0700
tracehook: death
causing the problem. Please fix.
-Andi
Thanks for the report.
Was this problem after the fixes in:
commit 5861bbfcc10fc0358abf52c7d22850c8d180f0b0
commit 5c7edcd7ee6b77b88252fe4096dce1a46a60c829
?
Do you know the process-level details that cause the problem?
(i.e. exits with what parent/signals/etc conditions)
It may take me a bit to set up an opensuse 11.0 vm to try its dhcpcd.
Thanks,
Roland
On Tue, Aug 19, 2008 at 12:43:18PM -0700, Roland McGrath wrote:
> Thanks for the report.
>
> Was this problem after the fixes in:
> commit 5861bbfcc10fc0358abf52c7d22850c8d180f0b0
> commit 5c7edcd7ee6b77b88252fe4096dce1a46a60c829
> ?
Yes. I saw it with yesterday's Linus HEAD.
>
> Do you know the process-level details that cause the problem?
> (i.e. exits with what parent/signals/etc conditions)
> It may take me a bit to set up an opensuse 11.0 vm to try its dhcpcd.
Not currently. I'll send you a strace log later.
-Andi
I reproduced it: SA_NOCLDWAIT broke. I'll have a fix momentarily.
Thanks,
Roland
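[Editor's sketch, not part of the original message.] For readers unfamiliar with SA_NOCLDWAIT, the behavior under discussion can be checked from userspace roughly as follows. This is a minimal sketch, not the reproducer Roland used; the function name child_is_auto_reaped and the handler on_sigchld are hypothetical. It installs a real SIGCHLD handler with SA_NOCLDWAIT (so the signal is caught rather than ignored), forks a child that exits at once, and then checks whether waitpid() reports that there is nothing left to wait for.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: SIGCHLD is caught, not ignored, so only the
 * SA_NOCLDWAIT flag asks the kernel to reap children automatically. */
static void on_sigchld(int signo)
{
	(void)signo;
}

/*
 * Hypothetical helper for illustration.
 * Returns  1 if the exited child was reaped automatically (waitpid fails
 *            with ECHILD, so no zombie was left for us to collect),
 *          0 if the child was still waitable (it lingered as a zombie),
 *         -1 on any other error.
 */
int child_is_auto_reaped(void)
{
	struct sigaction sa;
	int status;
	pid_t pid, r;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigchld;
	sa.sa_flags = SA_NOCLDWAIT;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, NULL) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(0);	/* child: exit immediately */

	/* The handler has no SA_RESTART, so retry if SIGCHLD interrupts us. */
	do {
		r = waitpid(pid, &status, 0);
	} while (r < 0 && errno == EINTR);

	if (r == pid)
		return 0;
	return (r < 0 && errno == ECHILD) ? 1 : -1;
}
```

Called once from a small main(), this is expected to return 1 on a kernel where SA_NOCLDWAIT works. It only illustrates the semantics; a daemon such as dhcpcd that never waits for its children would instead show the leak as a growing list of defunct entries in ps.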
The following changes since commit 1fca25427482387689fa27594c992a961d98768f:
Linus Torvalds (1):
Merge branch 'release' of git://git.kernel.org/.../aegl/linux-2.6
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-utrace.git tracehook
Roland McGrath (1):
tracehook: fix SA_NOCLDWAIT
kernel/signal.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
Thanks,
Roland
---
[PATCH] tracehook: fix SA_NOCLDWAIT
I outwitted myself again in commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664,
and broke the SA_NOCLDWAIT behavior so it leaks zombies. This fixes it.
Reported-by: Andi Kleen <[email protected]>
Signed-off-by: Roland McGrath <[email protected]>
---
kernel/signal.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index c539f60..e661b01 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1338,6 +1338,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	struct siginfo info;
 	unsigned long flags;
 	struct sighand_struct *psig;
+	int ret = sig;
 
 	BUG_ON(sig == -1);
 
@@ -1402,7 +1403,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 		 * is implementation-defined: we do (if you don't want
 		 * it, just use SIG_IGN instead).
 		 */
-		tsk->exit_signal = -1;
+		ret = tsk->exit_signal = -1;
 		if (psig->action[SIGCHLD-1].sa.sa_handler == SIG_IGN)
 			sig = -1;
 	}
@@ -1411,7 +1412,7 @@ int do_notify_parent(struct task_struct *tsk, int sig)
 	__wake_up_parent(tsk, tsk->parent);
 	spin_unlock_irqrestore(&psig->siglock, flags);
 
-	return sig;
+	return ret;
 }
 
 static void do_notify_parent_cldstop(struct task_struct *tsk, int why)
> I outwitted myself again in commit 2b2a1ff64afbadac842bbc58c5166962cf4f7664,
> and broke the SA_NOCLDWAIT behavior so it leaks zombies. This fixes it.
>
> Reported-by: Andi Kleen <[email protected]>
Confirmed -- this patch fixes the zombie problem on my box.
Tested-by: Andi Kleen <[email protected]>
-Andi
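[Editor's sketch, not part of the original thread.] The effect of the three changed lines can be modelled in plain userspace C. This is only a sketch under stated assumptions: the enclosing condition is not visible in the diff and is assumed here to be "the parent ignores SIGCHLD or has set SA_NOCLDWAIT", and the caller is assumed to treat a return value of -1 as "reap this task now instead of leaving a zombie". The names notify_model and struct notify_result are hypothetical.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical result type: what the caller sees and what the parent gets. */
struct notify_result {
	int ret;	/* value returned to the caller */
	int sent;	/* signal that would be sent to the parent, -1 for none */
};

/*
 * Userspace model of the tail of do_notify_parent() as shown in the diff.
 * 'fixed' selects the behavior after the patch (true) or before it (false).
 * The outer condition is an assumption; the diff does not show it.
 */
struct notify_result notify_model(int sig, bool handler_is_ign,
				  bool nocldwait, bool fixed)
{
	struct notify_result r;
	int ret = sig;			/* + int ret = sig; */

	if (sig == SIGCHLD && (handler_is_ign || nocldwait)) {
		/* The kernel also sets tsk->exit_signal = -1 at this point. */
		ret = -1;		/* + ret = tsk->exit_signal = -1; */
		if (handler_is_ign)
			sig = -1;	/* ignored: do not send the signal at all */
	}

	r.sent = sig;
	r.ret = fixed ? ret : sig;	/* - return sig;  + return ret; */
	return r;
}
```

In this model the two versions differ in exactly one case: a parent that catches SIGCHLD with a real handler and sets SA_NOCLDWAIT. Before the patch the function returns SIGCHLD there, so under the assumption above the caller would leave a zombie that nothing collects; after the patch it returns -1 while the signal is still sent. The SIG_IGN case returns -1 in both versions, which matches the report that only SA_NOCLDWAIT broke.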