2001-12-18 01:14:41

by Kurt Roeckx

Subject: wait() and strace -f

I've got a weird problem here. I have a process that creates 2
children; the first one dies very quickly, before the parent can
call wait(). When I run it under strace -f, this wait() doesn't
clean up the zombie as it should.

Note that this problem only happens when I have 2 children, use
strace -f, and call wait() after the first child has died. Plain
strace, no strace at all, only 1 child, or calling wait() before
the child dies doesn't seem to cause the problem.

Btw, this is with 2.4.16.

Simple program to demonstrate it:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    int i;

    if (!fork())
    {
        /* Child 1. */
        return 0;
    }

    if (!fork())
    {
        /* Child 2. */
        sleep(10);
        return 0;
    }

    /* Parent. */
    sleep(1);
    wait(&i);
    return 0;
}

Without strace -f, this program stops after 1 second and the
second child lives on for another 9 seconds. With strace -f, the
program only stops after 10 seconds, once the second child has died.

I think it's related to strace being the "real" parent of the
child. But that doesn't really explain why I need 2 children.


Kurt


2001-12-18 15:33:15

by OGAWA Hirofumi

Subject: Re: wait() and strace -f

Kurt Roeckx <[email protected]> writes:

> int main()
> {
> int i;
>
> if (!fork())
> {
> /* Child 1. */
> return 0;
> }
>
> if (!fork())
> {
> /* Child 2. */
> sleep(10);
> return 0;
> }
>
> /* Parent. */
> sleep(1);
> wait(&i);
> return 0;
> }
>
> Without strace -f, this program stops after 1 second and the
> second child lives on for another 9 seconds. With strace -f, the
> program only stops after 10 seconds, once the second child has died.
>
> I think it's related to strace being the "real" parent of the
> child. But that doesn't really explain why I need 2 children.

This is probably a feature (or bug) of strace. If the traced process
has children, strace keeps tracing a child before it resumes the
parent's wait(). The child's exit() is then what lets the parent's
wait() continue.

> if (!fork())
> {
> /* Child 1. */
sleep(2);
> return 0;
> }

With the above change, the parent continues after 2 seconds.
--
OGAWA Hirofumi <[email protected]>

2001-12-18 20:19:23

by Kurt Roeckx

Subject: Re: wait() and strace -f

On Tue, Dec 18, 2001 at 04:59:58PM +0900, OGAWA Hirofumi wrote:
> Kurt Roeckx <[email protected]> writes:
>
> > I think it's related to strace being the "real" parent of the
> > child. But that doesn't really explain why I need 2 children.
>
> Probably, it's feature (or bug) of strace. I'm seems, if strace has
> child, trace of a child is started before wait() of parent. Then,
> exit() of child continue wait() of parent.

If I understand what you're saying, sleep(1) in child1, and
sleep(2) in the parent should fix the problem, which it doesn't.

And it still doesn't explain why it only happens with 2 children.

Maybe I should have mentioned this before: the wait() will clean up
the first child at the time the second child dies, or at least
that's what wait() returns.
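
For anyone who wants to check this without reading strace output, here is
a rough sketch that records which pid wait() returns and how long the call
blocked. The helper name run_case(), its two parameters, and the
wait_result struct are made up for this illustration and are not part of
the test program above.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one run: which child wait() reaped and how long it blocked. */
struct wait_result {
    pid_t child1;   /* pid of the child that exits immediately */
    pid_t child2;   /* pid of the child that sleeps */
    pid_t reaped;   /* return value of wait() */
    long blocked;   /* whole seconds spent inside wait() */
};

/*
 * Hypothetical helper, not in the original program: fork two children
 * as in the test case, then time the parent's wait().
 */
struct wait_result run_case(unsigned parent_delay, unsigned child2_sleep)
{
    struct wait_result r;
    int status;
    time_t start;

    /* Child 2 must still be alive when the parent calls wait(). */
    assert(parent_delay < child2_sleep);

    r.child1 = fork();
    if (r.child1 == 0)
        _exit(0);                /* Child 1: dies at once. */
    assert(r.child1 > 0);

    r.child2 = fork();
    if (r.child2 == 0) {
        sleep(child2_sleep);     /* Child 2: outlives the wait() call. */
        _exit(0);
    }
    assert(r.child2 > 0);

    sleep(parent_delay);         /* Let child 1 become a zombie first. */
    start = time(NULL);
    r.reaped = wait(&status);
    r.blocked = (long)(time(NULL) - start);
    return r;
}
```

Calling run_case(1, 10) from main() and printing r.reaped and r.blocked
should show child 1's pid and a blocked time of about 0 when run normally.
If the behaviour described above holds, the same run under strace -f would
still report child 1's pid but a blocked time of roughly 9 seconds.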

> > if (!fork())
> > {
> > /* Child 1. */
> sleep(2);
> > return 0;
> > }
>
> The above change is continued the parent after 2 seconds.

I know that too. As I said, the problem only shows up when child 1
dies before the parent calls wait().


Kurt

2001-12-19 15:27:06

by OGAWA Hirofumi

Subject: Re: wait() and strace -f

Kurt Roeckx <[email protected]> writes:

> > Probably, it's feature (or bug) of strace. I'm seems, if strace has
                                                             ^^^^^^
Sorry, s/strace/the trace process/

> > child, trace of a child is started before wait() of parent. Then,
> > exit() of child continue wait() of parent.
>
> If I understand what you're saying, sleep(1) in child1, and
> sleep(2) in the parent should fix the problem, which it doesn't.
>
> And it still doesn't explain why it only happens with 2 children.

As far as I can tell from the source, strace does not take the zombie
into account, and it waits for the exit() of child2 before restarting
the parent's wait().

strace            parent            child1      child2
                                    zombie

                  sleep(1)                      sleep(10)
                  before wait()
trap wait()



                                                before exit()
trap exit()
restart child2
                                                run exit()
restart parent
                  run wait()

> Maybe I should have mentioned this before: the wait() will clean up
> the first child at the time the second child dies, or at least
> that's what wait() returns.
>
> > > if (!fork())
> > > {
> > > /* Child 1. */
> > sleep(2);
> > > return 0;
> > > }
> >
> > The above change is continued the parent after 2 seconds.
>
> I know that too, as I said, only when child 1 dies before the
> parent calls wait().

The patch below is against strace-4.4/process.c from strace_4.4-1.tar.gz:

diff -u /tmp/t/strace-4.4/process.c.orig /tmp/t/strace-4.4/process.c
--- /tmp/t/strace-4.4/process.c.orig Fri Aug 3 20:51:28 2001
+++ /tmp/t/strace-4.4/process.c Wed Dec 19 08:20:05 2001
@@ -1349,7 +1349,7 @@
         /* WTA: fix bug with hanging children */
         if (!(tcp->u_arg[2] & WNOHANG) && tcp->nchildren > 0) {
             /* There are traced children */
-            tcp->flags |= TCB_SUSPENDED;
+            /* tcp->flags |= TCB_SUSPENDED; */
             tcp->waitpid = tcp->u_arg[0];
         }
     }

Try the above patch. It makes strace restart wait() immediately.
However, it will probably break something else. ;)
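
To make the condition in that hunk easier to reason about, here is a toy
model of it. This is only a sketch: struct model_tcb, the nzombies field,
the TCB_SUSPENDED value, and both function names are assumptions made up
for illustration, not strace's real definitions. The second function is a
hypothetical zombie-aware check, not something strace 4.4 does, and it
ignores which pid the tracee is waiting for.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/wait.h>

#define TCB_SUSPENDED 0x1       /* illustrative value, not strace's */

/* Simplified stand-in for strace's per-process control block. */
struct model_tcb {
    int flags;
    int nchildren;              /* traced children still running */
    int nzombies;               /* hypothetical: exited, not yet reaped */
    int waitpid;
    long u_arg[3];              /* wait4() arguments: pid, status, options */
};

/* Same condition as the unpatched hunk: suspend if any traced child exists. */
int suspend_as_in_4_4(struct model_tcb *tcp)
{
    assert(tcp != NULL);
    if (!(tcp->u_arg[2] & WNOHANG) && tcp->nchildren > 0) {
        tcp->flags |= TCB_SUSPENDED;
        tcp->waitpid = (int)tcp->u_arg[0];
    }
    return (tcp->flags & TCB_SUSPENDED) != 0;
}

/* Hypothetical variant: skip the suspend when a zombie can satisfy wait(). */
int suspend_counting_zombies(struct model_tcb *tcp)
{
    assert(tcp != NULL);
    if (!(tcp->u_arg[2] & WNOHANG) && tcp->nchildren > 0 &&
        tcp->nzombies == 0) {
        tcp->flags |= TCB_SUSPENDED;
        tcp->waitpid = (int)tcp->u_arg[0];
    }
    return (tcp->flags & TCB_SUSPENDED) != 0;
}
```

In the test case from this thread, the parent enters wait() with one
running child and one zombie. The first function suspends it, which matches
the hang; the second would let it run. The patch above goes further and
never sets the flag, whether or not a zombie exists.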
--
OGAWA Hirofumi <[email protected]>