2004-01-31 11:10:54

by Matthias Urlichs

[permalink] [raw]
Subject: BUG: NTPL: waitpid() doesn't return?

This partial trace is from Debian's mini-dinstall, which is a
multithreaded Python script.

What happens here is that it spawns a bunch of threads, then some of
these fork+execve external programs which they waitpid() for.

Unfortunately, some of these waitpid() calls don't return even though
the waited-for process clearly has exited.

This is kernel 2.6.2-rc2, unmodified (except for modularized IDE).

I've kept all the intervening clone() calls in the trace, hopefully
somebody can shed some light on what might be going on here.

(Imagine random other things happening between all of the following lines.)

31338 execve("/usr/bin/mini-dinstall", ["mini-dinstall"], [/* 12 vars */]) = 0
31338 rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
31338 execve("/usr/bin/mini-dinstall", ["mini-dinstall"], [/* 12 vars */]) = 0
31338 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x4018e0c8) = 31339
31339 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x4018e0c8) = 31340
31340 clone(child_stack=0x42edbb48, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0x42edbc18, {entry_number:6, base_addr:0x42edbbd0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0x42edbc18) = 31345
31342 clone( <unfinished ...>
31342 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x416dbc18) = 31346
31346 execve("/usr/bin/apt-ftparchive", ["apt-ftparchive", "packages", "testing/all"], [/* 12 vars */] <unfinished ...>
31346 <... execve resumed> ) = 0
31346 exit_group(0) = ?
31340 --- SIGCHLD (Child exited) @ 0 (0) ---
31342 waitpid(31346, <unfinished ...>

This last call never returns.

Any ideas?

NB: When not using strace, the waidpid() call does return;
unfortunately, it does so with "[Errno 10] No child processes".

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/


2004-01-31 15:37:45

by bert hubert

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

On Sat, Jan 31, 2004 at 11:46:06AM +0100, Matthias Urlichs wrote:

> This partial trace is from Debian's mini-dinstall, which is a
> multithreaded Python script.
>
> What happens here is that it spawns a bunch of threads, then some of
> these fork+execve external programs which they waitpid() for.
>
> Unfortunately, some of these waitpid() calls don't return even though
> the waited-for process clearly has exited.

It might be that in the NPTL world only one waitpid() can run per process
simultaneously? Do you wait for all pids or for a specific one?

Bert.

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2004-01-31 16:18:08

by bert hubert

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

On Sat, Jan 31, 2004 at 04:51:55PM +0100, Matthias Urlichs wrote:

> > Do you wait for all pids or for a specific one?
> >
> ... looking at the strace output, I see that thre are four different
> threads calling fork+child-exec/parent-waitpid() in parallel. The last
> one actually succeeds, so you might be right with this analysis.
>
> *Sigh* No matter how many people work at that code in the kernel, it's
> _still_ fragile. :-/

If they do not wait for a specific pid, the kernel is right. The kernel has
no way of knowing which process a specific waitpid is waiting for otherwise!

--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2004-01-31 16:10:55

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

bert hubert:
> It might be that in the NPTL world only one waitpid() can run per process
> simultaneously?

Good call ...

> Do you wait for all pids or for a specific one?
>
... looking at the strace output, I see that thre are four different
threads calling fork+child-exec/parent-waitpid() in parallel. The last
one actually succeeds, so you might be right with this analysis.

*Sigh* No matter how many people work at that code in the kernel, it's
_still_ fragile. :-/

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 18:41:33

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

bert hubert:
> If they do not wait for a specific pid, the kernel is right. The kernel has
> no way of knowing which process a specific waitpid is waiting for otherwise!
>
Please check my original mail again. The thread _is_ waiting for a
specific pid.

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 19:02:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?



On Sat, 31 Jan 2004, Matthias Urlichs wrote:
>
> This partial trace is from Debian's mini-dinstall, which is a
> multithreaded Python script.

Looks buggy.

> What happens here is that it spawns a bunch of threads, then some of
> these fork+execve external programs which they waitpid() for.
>
> Unfortunately, some of these waitpid() calls don't return even though
> the waited-for process clearly has exited.

"Clearly" is not correct. What I bet happens is:

> 31339 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x4018e0c8) = 31340
> 31340 clone(child_stack=0x42edbb48, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0x42edbc18, {entry_number:6, base_addr:0x42edbbd0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0x42edbc18) = 31345
> 31342 clone( <unfinished ...>
> 31342 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x416dbc18) = 31346
> 31346 execve("/usr/bin/apt-ftparchive", ["apt-ftparchive", "packages", "testing/all"], [/* 12 vars */] <unfinished ...>
> 31346 <... execve resumed> ) = 0
> 31346 exit_group(0) = ?
> 31340 --- SIGCHLD (Child exited) @ 0 (0) ---
> 31342 waitpid(31346, <unfinished ...>

Notice how you do "waitpid()" for a _specific_ thread when you get a
SIGCHLD.

How the heck do you know _which_ thread it was that exited?

In other words, I suspect that you got a SIGCHILD for some _other_ thread,
and now you're waitpid'ing for the wrong thread. And while you do so, the
thread you wait for may well be waiting for you to do something.

Rule: never EVER wait for a specific thread in a SIGCHLD handler. When you
get a SIGCHLD, your signal handler should just wait for "any thread".
Because SIGCHLD isn't specific enough (well, you could get the pid from
siginfo, but because SIGCHLD isn't queued, that wouldn't really work
either).

So your SIHCHLD handler should _always_ look like this:

for (;;) {
int status;
int pid = waitpid(NULL, &status, WNOHANG);
if (pid < 0)
break;
handle_exit(pid, status);
}

and anything else is basically a bug in your program (well, your SIGCHLD
handler might choose to just set a flag, and let the main loop do the
above: that usually makes some races much simpler to handle, since the
signal handler really _really_ shouldn't be taking any threading locks or
anythign like that).

> Any ideas?

Fix your thing to use NULL and WNOHANG.

> NB: When not using strace, the waidpid() call does return;
> unfortunately, it does so with "[Errno 10] No child processes".

Probably just timing changes where strace effectively serializes something
(ie you probably have other threads that also get signals and do waitpid).

Linus

2004-01-31 19:19:27

by bert hubert

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

[Added Ingo & NPTL list]

[Matthias reports that threads forking multiple programs and running waitpid
on them has problems]

On Sat, Jan 31, 2004 at 07:15:19PM +0100, Matthias Urlichs wrote:

> > If they do not wait for a specific pid, the kernel is right. The kernel has
> > no way of knowing which process a specific waitpid is waiting for otherwise!
> >
> Please check my original mail again. The thread _is_ waiting for a
> specific pid.

I can't reproduce this with 2.6.1, can you check if this works as intended
on your system? Should output:

Thread 0 Launched pid 10967
Thread 1 Launched pid 10971
Thread 1 waitpid for 10971 returned for 10971, status: 1
Thread 0 waitpid for 10967 returned for 10967, status: 3

This is with NPTL 0.60 (Debian Unstable).

./sleep program:

#!/bin/sh
sleep $1
exit $1

testcase.c:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>

void unixDie(const char* what)
{
perror(what);
exit(1);
}
int s_counter;
void *func(void *p)
{
pid_t pid;
int status;
int counter=s_counter++;

pid=fork();
if(pid<0)
unixDie("doing fork");
if(!pid) {
char *arguments[]={"./sleep","",0};
char str[10];
sprintf(str,"%d",(int)p);
arguments[1]=str;

if(execve(arguments[0],arguments,0))
unixDie("doing execve");
}
if(pid) {
pid_t ret;
printf("Thread %d Launched pid %d\n", counter, pid);
ret=waitpid(pid, &status, 0);
if(ret<0)
unixDie("waitpid");

printf("Thread %d waitpid for %d returned for %d, status: %d\n",counter, pid, ret, WEXITSTATUS(status));
}

return 0;
}


int main(int argc, char **argv)
{
pthread_t t1, t2;
pthread_create(&t1, 0, func,(void*)3);
sleep(1);
pthread_create(&t2, 0, func,(void*)1);

pthread_join(t2,0);
pthread_join(t1,0);
return 0;
}


--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2004-01-31 20:10:55

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

Linus Torvalds:
>
> "Clearly" is not correct. What I bet happens is:
>
> > 31346 execve("/usr/bin/apt-ftparchive", ["apt-ftparchive", "packages", "testing/all"], [/* 12 vars */] <unfinished ...>
> > 31346 <... execve resumed> ) = 0
> > 31346 exit_group(0) = ?
> > 31340 --- SIGCHLD (Child exited) @ 0 (0) ---
> > 31342 waitpid(31346, <unfinished ...>
>
> Notice how you do "waitpid()" for a _specific_ thread when you get a
> SIGCHLD.
>
> How the heck do you know _which_ thread it was that exited?

Please look again. The above trace clearly shows that #31346 has
exited, which is exactly the thread being waitpid()ed for.

What the program does is basically
- spawn four threads or so
- each thread forks off some process, and then waitpid()s for exactly
that pid

... and all but the last waitpid() never returns even though all four
child processes have exit_group()ed.

> Rule: never EVER wait for a specific thread in a SIGCHLD handler. When you
> get a SIGCHLD, your signal handler should just wait for "any thread".

No SIGCHLD has been installed. (I checked the strace output.)

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 20:58:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?



On Sat, 31 Jan 2004, Matthias Urlichs wrote:
>
> Please look again. The above trace clearly shows that #31346 has
> exited, which is exactly the thread being waitpid()ed for.

The strace case I'm more than willing to pass off as a strace problem.
I find it quite common that strace doesn't detach from processes, leaving
some of them in a stopped state. So I assumed that the particular trace
wasn't very interesting.

That's especially true since you said the behaviour doesn't actually occur
without "strace".

So considering the further details:

> What the program does is basically
> - spawn four threads or so
> - each thread forks off some process, and then waitpid()s for exactly
> that pid
>
> ... and all but the last waitpid() never returns even though all four
> child processes have exit_group()ed.
>
> No SIGCHLD has been installed. (I checked the strace output.)

The bugs seems to be something totally different: you create a
"CLONE_DETACHED" child, and then you expect to be able to wait for it.
That's just bogus.

If the child is detached, that means that it reaps itself, and the
behaviour you _should_ see is pretty much undefined. Doing a "wait()" on a
detached child just doesn't make much sense.

So I think the "correct" behaviour actually ends up having the wait()
pause, and once the child exits it reaps itself, and that in turn
will/should make the wait() notice that there are no children, and return
a -ENOCHLD.

Because with a detached thread, you really will never get the status - the
child will have reaped itself and not given status to the parent.

But in theory it would be equally correct to just return ENOCHLD
_immediately_, since the clone was a DETACHED clone.

The "strace" behaviour may be a bug somewhere. I don't know if it's strace
or the kernel. But quite possibly, it's not even a "bug": exactly because
the child was detached, when strace detaches from it, it wouldn't be
re-attached to the parent, so there's nothing to wake the parent up again.

Linus

2004-01-31 21:18:46

by bert hubert

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

On Sat, Jan 31, 2004 at 09:49:14PM +0100, Matthias Urlichs wrote:

> Your test program works... except that it reports, when I strace it,
>
> [pid 10629] waitpid(10631, Process 10629 suspended
> <unfinished ...>
> [pid 10628] <... mmap2 resumed> ) = 0x41966000
> [pid 10630] waitpid(10632, Process 10630 suspended
> <unfinished ...>
>
> Those "Process ### suspended" messages did NOT happen with the Python
> script that exhibits the bug.

This means your situation is different from what you describe. Python may
not in fact be doing 'real' threading on your setup. I suggest you try to
make the smallest possible python program that exhibits the program and send
it to the list.

Can you also run the Python program with LD_ASSUME_KERNEL=2.4 ? And my
program too.

Thanks.


--
http://www.PowerDNS.com Open source, database driven DNS Software
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2004-01-31 21:10:55

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

bert hubert:
> [Matthias reports that threads forking multiple programs and running waitpid
> on them has problems]
>
To recap:

This process, on a SMP machine, does generate a few threads. Each
thread forks off some process, which it then waitpid()s for.
There is no SIGCHLD handler.

All but the last of these waitpid()s, as seen in strace output, never
return.

31342 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x416dbc18) = 31346
31346 execve("/usr/bin/apt-ftparchive", ["apt-ftparchive", "packages", "testing/all"], [/* 12 vars */] <unfinished ...>
31346 <... execve resumed> ) = 0
31346 exit_group(0) = ?
31340 --- SIGCHLD (Child exited) @ 0 (0) ---
31342 waitpid(31346, <unfinished ...>

Your test program works... except that it reports, when I strace it,

[pid 10629] waitpid(10631, Process 10629 suspended
<unfinished ...>
[pid 10628] <... mmap2 resumed> ) = 0x41966000
[pid 10630] waitpid(10632, Process 10630 suspended
<unfinished ...>

Those "Process ### suspended" messages did NOT happen with the Python
script that exhibits the bug.

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 21:40:54

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

Linus Torvalds:
> The strace case I'm more than willing to pass off as a strace problem.
> I find it quite common that strace doesn't detach from processes, leaving
> some of them in a stopped state. So I assumed that the particular trace
> wasn't very interesting.
>
Hmm. "Unfortunately", the test program which bert posted DOES work
correctly under strace.

So there's definitely something fishy going on here.

> The bugs seems to be something totally different: you create a
> "CLONE_DETACHED" child, and then you expect to be able to wait for it.

That's not "me", that's "the fork() in the NPTL library".

Besides, bert's test program exhibits exactly the same clone()
arguments, yet it works ...

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 21:52:51

by Roland McGrath

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

> Your test program works... except that it reports, when I strace it,
>
> [pid 10629] waitpid(10631, Process 10629 suspended
> <unfinished ...>
> [pid 10628] <... mmap2 resumed> ) = 0x41966000
> [pid 10630] waitpid(10632, Process 10630 suspended
> <unfinished ...>
>
> Those "Process ### suspended" messages did NOT happen with the Python
> script that exhibits the bug.

This is an strace bug. Because of goofy ptrace interactions, strace does
funny business with threads doing wait calls. strace should resume those
threads when the pids they are waiting for exit.

Make sure you are using the most current strace and if it's still not
different then report the strace bug (<[email protected]>
or https://bugzilla.redhat.com are fine).

To research the issue thoroughly, you may have to avoid relying on strace
to tell you what calls your programs make.

2004-01-31 22:10:58

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

bert hubert:
> This means your situation is different from what you describe. Python may
> not in fact be doing 'real' threading on your setup.

Well, the strace was from Python, so probably I was missing something...
but I can't think of anything not-real which Python does, except that it
does synchronize its internal state with MANY futex calls. ;-)

> make the smallest possible python program that exhibits the program and send
> it to the list.
>
I'll do that.

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 22:41:05

by Matthias Urlichs

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?

Hi,

some further information: as I just found out by further mangling bert's
test program, the bug seems to be SMP related.

In other words, the Python program in question runs without any problem
at all if I simply start it as

taskset 01 mini-dinstall

i.e. everything on a single CPU.

--
Matthias Urlichs | noris network AG | http://smurf.noris.de/

2004-01-31 22:52:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: BUG: NTPL: waitpid() doesn't return?



On Sat, 31 Jan 2004, Matthias Urlichs wrote:
>
> So there's definitely something fishy going on here.

Yes. Especially as the actual clone() we're waiting for was this
one:

31342 clone( <unfinished ...>
31342 <... clone resumed> child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x416dbc18) = 31346

which wasn't the detached one at all. I looked at the wrong clone().

> Besides, bert's test program exhibits exactly the same clone()
> arguments, yet it works ...

[ twilight zone music ]

Anyway. It's interesting to see who gets the SIGCHLD (which is why I
looked at the wrong clone) - that goes to 31340:

31346 exit_group(0) = ?
31340 --- SIGCHLD (Child exited) @ 0 (0) ---
31342 waitpid(31346, <unfinished ...>

which is just because we send a thread-group signal, so the signal isn't
actually necessarily directed toward the "real parent".

But the real parent _should_ have been woken up by __wake_up_parent().
And I don't see why that wouldn't happen.

Linus