2001-11-17 03:46:22

by Jeff Long

Subject: Zombies with 2.4.15pre5 (exit.c)

Running 2.4.15pre5 (UP) on i386, running UML 2.4.14-2.
UML processes create threads on the host system that don't
die. Threads are stuck at do_exit( ), so I backed out the
patch to kernel/exit.c @ 539 (in 2.4.15pre5 patch):

p->state = TASK_DEAD;

and things work fine. I do not see zombies with anything
other than UML processes/native threads.



2001-11-20 16:09:05

by Dave McCracken

Subject: [PATCH] Re: Zombies with 2.4.15pre5 (exit.c)


--On Saturday, November 17, 2001 03:45:49 +0000 Jeff Long
<[email protected]> wrote:

> Running 2.4.15pre5 (UP) on i386, running UML 2.4.14-2.
> UML processes create threads on the host system that don't
> die. Threads are stuck at do_exit( ), so I backed out the
> patch to kernel/exit.c @ 539 (in 2.4.15pre5 patch):
>
> p->state = TASK_DEAD;
>
> and things work fine. I do not see zombies with anything
> other than UML processes/native threads.

The intent of the original patch was to make the task unfindable to other
waiters, which fixed a race condition in sys_wait4(). My assumption was
that the task was about to be cleaned up in release_task(). What I missed
was that there are a couple of code paths that don't release the task, but
assume it'll be cleaned up later.

The patch below should fix the problem.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

-----------------------

--- linux-2.4.15-pre7/kernel/exit.c Tue Nov 20 10:00:26 2001
+++ linux-2.4.15-pre7-patch/kernel/exit.c Tue Nov 20 09:57:48 2001
@@ -544,8 +544,11 @@
retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
if (!retval && stat_addr)
retval = put_user(p->exit_code, stat_addr);
- if (retval)
+ if (retval) {
+ /* Reset state. We're not cleaning up yet */
+ p->state = TASK_ZOMBIE;
goto end_wait4;
+ }
retval = p->pid;
if (p->p_opptr != p->p_pptr) {
write_lock_irq(&tasklist_lock);
@@ -553,6 +556,8 @@
p->p_pptr = p->p_opptr;
SET_LINKS(p);
do_notify_parent(p, SIGCHLD);
+ /* Reset state. We're not cleaning up yet */
+ p->state = TASK_ZOMBIE;
write_unlock_irq(&tasklist_lock);
} else
release_task(p);

2001-11-20 18:44:33

by OGAWA Hirofumi

Subject: Re: [PATCH] Re: Zombies with 2.4.15pre5 (exit.c)

Hi,

Dave McCracken <[email protected]> writes:

> The intent of the original patch was to make the task unfindable to other
> waiters, which fixed a race condition in sys_wait4(). My assumption was
> that the task was about to be cleaned up in release_task(). What I missed
> was that there are a couple of code paths that don't release the task, but
> assume it'll be cleaned up later.

I don't think the original patch fixes the race condition, because
tasklist_lock is only held with read_lock(). Furthermore, threads that
did not receive the process status continue waiting, even when there is
no child process left.

I wrote the following patch, but I'm not sure whether it is right.

Thanks
--
OGAWA Hirofumi <[email protected]>

--- linux-head/kernel/exit.c Tue Nov 20 23:32:58 2001
+++ wait/kernel/exit.c Tue Nov 20 17:56:12 2001
@@ -529,23 +529,27 @@
retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
if (!retval && stat_addr)
retval = put_user((p->exit_code << 8) | 0x7f, stat_addr);
- if (!retval) {
- p->exit_code = 0;
- retval = p->pid;
+ if (retval)
+ goto end_wait4;
+
+ /* exactly one thread returns the process status */
+ task_lock(p);
+ if (p->exit_code == 0) {
+ task_unlock(p);
+ goto repeat;
}
+ p->exit_code = 0;
+ task_unlock(p);
+ retval = p->pid;
goto end_wait4;
case TASK_ZOMBIE:
- /* Make sure no other waiter picks this task up */
- p->state = TASK_DEAD;
-
- current->times.tms_cutime += p->times.tms_utime + p->times.tms_cutime;
- current->times.tms_cstime += p->times.tms_stime + p->times.tms_cstime;
read_unlock(&tasklist_lock);
retval = ru ? getrusage(p, RUSAGE_BOTH, ru) : 0;
if (!retval && stat_addr)
retval = put_user(p->exit_code, stat_addr);
if (retval)
- goto end_wait4;
+ goto end_wait4;
+
retval = p->pid;
if (p->p_opptr != p->p_pptr) {
write_lock_irq(&tasklist_lock);
@@ -554,8 +558,20 @@
SET_LINKS(p);
do_notify_parent(p, SIGCHLD);
write_unlock_irq(&tasklist_lock);
- } else
- release_task(p);
+ goto end_wait4;
+ }
+
+ /* exactly one thread returns the process status */
+ task_lock(p);
+ if (p->pid == 0) {
+ task_unlock(p);
+ goto repeat;
+ }
+ p->pid = 0;
+ task_unlock(p);
+ current->times.tms_cutime += p->times.tms_utime + p->times.tms_cutime;
+ current->times.tms_cstime += p->times.tms_stime + p->times.tms_cstime;
+ release_task(p);
goto end_wait4;
default:
continue;
--- linux-head/include/linux/sched.h Tue Nov 20 23:32:58 2001
+++ wait/include/linux/sched.h Tue Nov 20 17:38:11 2001
@@ -88,7 +88,6 @@
#define TASK_UNINTERRUPTIBLE 2
#define TASK_ZOMBIE 4
#define TASK_STOPPED 8
-#define TASK_DEAD 16

#define __set_task_state(tsk, state_value) \
do { (tsk)->state = (state_value); } while (0)

2001-11-21 18:11:37

by Pau Aliagas

Subject: Re: [PATCH] Re: Zombies with 2.4.15pre5 (exit.c)

On Tue, 20 Nov 2001, Dave McCracken wrote:

> > Running 2.4.15pre5 (UP) on i386, running UML 2.4.14-2.
> > UML processes create threads on the host system that don't
> > die. Threads are stuck at do_exit( ), so I backed out the
> > patch to kernel/exit.c @ 539 (in 2.4.15pre5 patch):
> >
> > p->state = TASK_DEAD;
> >
> > and things work fine. I do not see zombies with anything
> > other than UML processes/native threads.
>
> The intent of the original patch was to make the task unfindable to other
> waiters, which fixed a race condition in sys_wait4(). My assumption was
> that the task was about to be cleaned up in release_task(). What I missed
> was that there are a couple of code paths that don't release the task, but
> assume it'll be cleaned up later.
>
> The patch below should fix the problem.

It doesn't fix it for me.
I'll try OGAWA Hirofumi's patch (posted to the list) and let you know.

Pau