2001-04-01 17:24:22

by Dennis Noordsij

Subject: pthreads & fork & execve

Hi,

I have a question regarding the use of pthreads, fork and execve, which appear
to not work very well together :-) First let me explain the reasoning, though.

We have an app that launches a few other apps and keeps track of their
status, resource consumption etc. If one of the apps crashes, it is restarted
according to certain parameters.

The app uses pthreads, and its method of (re)starting an application is
forking and calling execve.
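
For reference, a minimal sketch of the fork-then-execve pattern described above, as it might look when called from one thread of a threaded launcher. The helper names spawn_child and wait_child are hypothetical and not taken from the application discussed in this thread; the sketch only assumes standard POSIX calls.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` in a child process.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_child(const char *path, char *const argv[], char *const envp[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: only the thread that called fork() exists here, so
         * nothing but async-signal-safe functions is called before exec. */
        execve(path, argv, envp);
        _exit(127);              /* reached only if execve() failed */
    }
    return pid;                  /* parent: child pid, or -1 on error */
}

/* Hypothetical helper: wait for the child.
 * Returns its exit status, or -1 if it was killed by a signal
 * (for example a core dump) or if waiting failed. */
int wait_child(pid_t pid)
{
    int status;

    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A monitoring thread could call wait_child() and treat a return value of -1 as "crashed, restart according to policy".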

It works fine for all but one of the other apps, which dumps core when started
this way (from the command line it works fine); the core only traces back to
int main(int argc, char **argv). That app uses both pthreads and -ldl for plugin
handling.

We have tried changing the linking order (-ldl -lpthread, -lpthread -ldl,
etc.), and even execv'ing a shell script that starts a shell script that
starts the app - the result is the same, an instant core without even running.

I can see how forks together with threads and execve's are a messy
combination, and a better solution altogether to our approach is appreciated
just as much as a way to make the current solution work :-)

We have tested both kernels 2.4.2 and 2.2.18.

We have tried on different systems, different hardware and slightly different
distributions (debian potato, unstable, etc).

To sum up: using a pthreaded app to launch another pthreaded app by means of
forking and exec(ve)'ing makes the second app dump core immediately (on
entering main). What to do?

Kind regards, and thanks for any help
Dennis Noordsij


2001-04-02 11:44:49

by Richard Guenther

Subject: Re: pthreads & fork & execve

Hi!

I tracked this down to a corrupt jump table somewhere in the pthreads
part of the libc (I didn't have the source handy at that time, though). So
I think this is a libc bug (the version does not matter) - I even posted a
follow-up to a similar bug in the libc gnats database (I think I should
have opened a new one, though...). But I failed to construct a "simple"
testcase showing the bug (we use a rather large number of threads, one or
two of which do popen() calls or a handcrafted fork() && execv(); the
SIGSEGV happens during fork()).

I stopped trying to find out what is going on as this feature is not
essential (but maybe useful in the future). So I suggest you build a
libc from source with debugging on and trace it down to the actual
libc problem - or better try to isolate a simple testcase.

I'd like to hear about the results :)

Richard.

On Fri, 30 Mar 2001, Dennis Noordsij wrote:

> Hi,
>
> I have question regarding use of pthreads, forks and execve's which appears
> to not work very well :-) First let me explain the reasoning though
>
> We have an app that launches a few other apps and keeps track of their
> status, resource consumption etc. If one of the apps crashes, it is restarted
> according to certain parameters.
>
> The app uses pthreads, and it's method of (re)starting an application is
> forking and calling execve.
>
> It works fine for all-but-one other app, which core dumps when started this
> way (from the commandline it works fine) and the core only traces back to
> int main(int argc, char **argv). It uses both pthreads and -ldl for plugin
> handling.
>
> We have tried changing the linking order (i.e. -ldl -lpthread, -lpthread,
> -ldl, etc), and even execv'ing a shell script that starts a shell script that
> starts the app - result is the same, instant core without even running.
>
> I can see who forks together with threads and execve's are a messy
> combination, and a better solution altogether to our approach is appreciated
> just as much as a way to make the current solution work :-)
>
> We have tested both kernels 2.4.2 and 2.2.18.
>
> We have tried on different systems, different hardware and slightly different
> distributions (debian potato, unstable, etc).
>
> To sum up: using a pthreaded app to launch another pthreaded app by means of
> forking and exec(ve)'ng makes the second app core immediately, (at entering
> main). What to do?
>
> Kind regards, and thanks for any help
> Dennis Noordsij

--
Richard Guenther <[email protected]>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
The GLAME Project: http://www.glame.de/

2001-04-02 12:56:23

by Gustavo Niemeyer

Subject: Re: pthreads & fork & execve

Hi Richard! Hi Dennis!

> I tracked this down to a corrupt jumptable somewhere in the pthreads
> part of the libc (didnt have the source handy at that time, though). So
> I think this is a libc bug (version does not matter) - I even did a
> followup to a similar bug in the libc gnats database (I think I should
> have opened a new one, though...). But I failed to construct a "simple"
> testcase showing the bug (We use rather large amount of threads and
> in one or two doing popen() calls - or handcrafted fork() && execv(),
> the SIGSEGV is during fork()).

We're going through two similar problems here. One is KDE, and the other
is Linuxconf. Linuxconf dumps core in a module when that module is linked
with pthread and dlopen()'ed with RTLD_GLOBAL. We must reduce one of
them to a testcase.

Btw, both are mainly C++ programs. Is your software written in C++?

> I stopped trying to find out what is going on as this feature is not
> essential (but maybe useful in the future). So I suggest you build a
> libc from source with debugging on and trace it down to the actual
> libc problem - or better try to isolate a simple testcase.

We'll probably do this here...

> I like to hear from the results :)

Please, let me know as well! :-)

Thanks!!

--
Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

2001-04-02 13:01:43

by Jakub Jelinek

Subject: Re: pthreads & fork & execve

On Mon, Apr 02, 2001 at 09:54:25AM -0300, Gustavo Niemeyer wrote:
> Hi Richard! Hi Dennis!
>
> > I tracked this down to a corrupt jumptable somewhere in the pthreads
> > part of the libc (didnt have the source handy at that time, though). So
> > I think this is a libc bug (version does not matter) - I even did a
> > followup to a similar bug in the libc gnats database (I think I should
> > have opened a new one, though...). But I failed to construct a "simple"
> > testcase showing the bug (We use rather large amount of threads and
> > in one or two doing popen() calls - or handcrafted fork() && execv(),
> > the SIGSEGV is during fork()).
>
> We're going trough two similar problems here. One is KDE, and the other
> is Linuxconf. Linuxconf is core dumping on a module when it is linked
> with pthread and dlopen()'ed with RTLD_GLOBAL. We must reduce one of
> them to a testcase.

By any chance, are you dlopening a DSO linked against -lpthread from
a program not linked against -lpthread?

Jakub

2001-04-02 13:35:07

by Adam Dickmeiss

Subject: Re: pthreads & fork & execve

Hi.

On Mon, Apr 02, 2001 at 09:54:25AM -0300, Gustavo Niemeyer wrote:
> Hi Richard! Hi Dennis!
>
> > I tracked this down to a corrupt jumptable somewhere in the pthreads
> > part of the libc (didnt have the source handy at that time, though). So
> > I think this is a libc bug (version does not matter) - I even did a
> > followup to a similar bug in the libc gnats database (I think I should
> > have opened a new one, though...). But I failed to construct a "simple"
> > testcase showing the bug (We use rather large amount of threads and
> > in one or two doing popen() calls - or handcrafted fork() && execv(),
> > the SIGSEGV is during fork()).
>
> We're going trough two similar problems here. One is KDE, and the other
> is Linuxconf. Linuxconf is core dumping on a module when it is linked
> with pthread and dlopen()'ed with RTLD_GLOBAL. We must reduce one of
> them to a testcase.
>
> Btw, both are mainly C++ programs. Is your software written in C++?
>
> > I stopped trying to find out what is going on as this feature is not
> > essential (but maybe useful in the future). So I suggest you build a
> > libc from source with debugging on and trace it down to the actual
> > libc problem - or better try to isolate a simple testcase.

People making Apache 1.3.X modules have a problem too. They have to
rebuild Apache and add -lpthread if any module uses threads.

The following small program illustrates this. The program built without
-lpthread, main-wot, crashes - the one built with it, main-wt, doesn't.

[snip] sub.c:
int sub(void) { return 1; }

[snip] main.c:

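#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <netdb.h>    /* gethostbyname() */

int main(int argc, char **argv)
{
    /* Load a DSO that was itself linked against -lpthread. */
    void *h = dlopen ("./sub.so", RTLD_NOW|RTLD_GLOBAL);
    if (!h)
    {
        printf ("dlopen failed: %s\n", dlerror ());
        exit (1);
    }
    /* Resolver call made while the DSO is loaded. */
    gethostbyname ("slashdot.org");
    dlclose (h);
    exit (0);
}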

[snip] Makefile
all: sub.so main-wt main-wot

sub.so: sub.c
	gcc -D_REENTRANT -shared sub.c -o sub.so -lpthread
main-wt: main.c
	gcc main.c -o main-wt -ldl -lpthread
main-wot: main.c
	gcc main.c -o main-wot -ldl
[end]

Cheers,
Adam

> We'll probably do this here...
>
> > I like to hear from the results :)
>
> Please, let me know as well! :-)
>
> Thanks!!
>
> --
> Gustavo Niemeyer
>
> [ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

--
Adam Dickmeiss mailto:[email protected] http://www.indexdata.dk
Index Data T: +45 33410100 Mob.: 212 212 66

2001-04-02 14:34:28

by Gustavo Niemeyer

Subject: Re: pthreads & fork & execve

> People making Apache 1.3.X modules have a problem too. They have to
> rebuilt Apache and add -lpthread if any modules uses threads.

It seems to be the same case here.

> The following small program illustrates this. The program, main-wot,
> crashes - the other, main-wt, doesn't.
[...]

Both work here... am I doing something wrong (or right :-)??

I've tried to reduce to a testcase like this before, and it has worked
as well. I don't understand what this limitation is about.

--
Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

2001-04-02 14:28:07

by Gustavo Niemeyer

Subject: Re: pthreads & fork & execve

Hello Jakub!!

[...]
> By any chance, are you dlopening a DSO linked against -lpthread from
> program not linked against -lpthread?

Yes, I am!! Is this some limitation I'm not aware of?

Indeed, this seems to be done in many cases... is this specific to pthread??

--
Gustavo Niemeyer

[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

2001-04-03 10:20:47

by Richard Guenther

Subject: Re: pthreads & fork & execve

On Mon, 2 Apr 2001, Gustavo Niemeyer wrote:

> Hi Richard! Hi Dennis!
[...]

> Btw, both are mainly C++ programs. Is your software written in C++?

No, just plain C - but one point is that we started seeing the problem
after we started using pthread_sigmask() to block certain signals from
reaching our threads.

Richard.

--
Richard Guenther <[email protected]>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
The GLAME Project: http://www.glame.de/