2004-11-04 04:32:25

by Blaisorblade

Subject: Fixing UML against NPTL (was: Re: [uml-devel] [PATCH] UML: Use PTRACE_KILL instead of SIGKILL to kill host-OS processes (take #2))

On Thursday 04 November 2004 01:39, Chris Wedgwood wrote:
> On Thu, Nov 04, 2004 at 01:13:27AM +0100, Blaisorblade wrote:
> > Well, not a lot of new work should go there.
>
> agreed
>
> > Not always. Do you think I'm a luser? Or what?
>
> no, not at all
>
> i was asking what breaks to see if i can help

Intimate knowledge of linker scripts and NPTL would help a lot.

There are two issues:

1) UML does not link, because the linker does not like its linker script,
which defines a special thread_private section (take a look at switcheroo()
and you may see the problem with copying the .text to a tmpfs file
and replacing the mapping of the executable with the tmpfs file mapping).

2) getpid() on a child clone returns the parent process's PID when run with an
NPTL-enabled glibc, while it returns the thread's own PID with a LinuxThreads one;
this causes tons of problems with UML, which uses signals for inter-thread and
intra-thread communication.

Note that UML does not use pthread_create() to create its threads; for threads
created that way, this behaviour is an improvement. I'm using a plain clone() call
without the CLONE_THREAD flag (and glibc does not add it either, according to strace).

I've not yet checked whether glibc is hijacking getpid(), but that would be
strange anyway.

Also, this behaviour was first reported around the time of 2.6.0, but UML has
almost never run against an NPTL glibc, because it is statically linked most
of the time.

--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729



2004-11-11 18:01:09

by Daniel Jacobowitz

Subject: Re: Fixing UML against NPTL (was: Re: [uml-devel] [PATCH] UML: Use PTRACE_KILL instead of SIGKILL to kill host-OS processes (take #2))

On Thu, Nov 04, 2004 at 05:31:21AM +0100, Blaisorblade wrote:
> 2) getpid() on a child clone returns the parent process's PID when run with an
> NPTL-enabled glibc, while it returns the thread's own PID with a LinuxThreads one;
> this causes tons of problems with UML, which uses signals for inter-thread and
> intra-thread communication.
>
> Note that UML does not use pthread_create() to create its threads; for threads
> created that way, this behaviour is an improvement. I'm using a plain clone() call
> without the CLONE_THREAD flag (and glibc does not add it either, according to strace).
>
> I've not yet checked whether glibc is hijacking getpid(), but that would be
> strange anyway.

Glibc caches the PID. If you're going to use clone directly, use the
gettid/getpid syscall directly. It's kind of rude that glibc breaks
getpid in this way; I recommend filing a bug in the glibc bugzilla at
sources.redhat.com.

--
Daniel Jacobowitz

2004-11-11 18:35:13

by Christophe Saout

Subject: Re: Fixing UML against NPTL (was: Re: [uml-devel] [PATCH] UML: Use PTRACE_KILL instead of SIGKILL to kill host-OS processes (take #2))

On Thursday, 11.11.2004, at 12:45 -0500, Daniel Jacobowitz wrote:

> Glibc caches the PID. If you're going to use clone directly, use the
> gettid/getpid syscall directly. It's kind of rude that glibc breaks
> getpid in this way; I recommend filing a bug in the glibc bugzilla at
> sources.redhat.com.

If glibc insists on caching the pid, it could also simply invalidate the
pid cache in the clone function.



2004-11-11 18:50:06

by Daniel Jacobowitz

Subject: Re: Fixing UML against NPTL (was: Re: [uml-devel] [PATCH] UML: Use PTRACE_KILL instead of SIGKILL to kill host-OS processes (take #2))

On Thu, Nov 11, 2004 at 07:31:51PM +0100, Christophe Saout wrote:
> On Thursday, 11.11.2004, at 12:45 -0500, Daniel Jacobowitz wrote:
>
> > Glibc caches the PID. If you're going to use clone directly, use the
> > gettid/getpid syscall directly. It's kind of rude that glibc breaks
> > getpid in this way; I recommend filing a bug in the glibc bugzilla at
> > sources.redhat.com.

... but, thinking about it, they'll probably close it as INVALID.

> If glibc insists on caching the pid, it could also simply invalidate the
> pid cache in the clone function.

It currently does this for vfork, but not clone. Basically, you can't
call into glibc at all if you use clone. If you aren't using POSIX
threads, then the POSIX-compliant library is going to fall to pieces
around you. For instance, all the file locking will break, and
anything else that, like the PID cache, relies on either global or
per-_thread_ data.

--
Daniel Jacobowitz

2004-11-12 00:35:28

by Blaisorblade

Subject: Re: Fixing UML against NPTL (was: Re: [uml-devel] [PATCH] UML: Use PTRACE_KILL instead of SIGKILL to kill host-OS processes (take #2))

On Thursday 11 November 2004 19:45, Daniel Jacobowitz wrote:
> On Thu, Nov 11, 2004 at 07:31:51PM +0100, Christophe Saout wrote:
> > On Thursday, 11.11.2004, at 12:45 -0500, Daniel Jacobowitz wrote:
> > > Glibc caches the PID. If you're going to use clone directly, use the
> > > gettid/getpid syscall directly. It's kind of rude that glibc breaks
> > > getpid in this way; I recommend filing a bug in the glibc bugzilla at
> > > sources.redhat.com.
>
> ... but, thinking about it, they'll probably close it as INVALID.
>
> > If glibc insists on caching the pid, it could also simply invalidate the
> > pid cache in the clone function.

> It currently does this for vfork, but not clone. Basically, you can't
> call into glibc at all if you use clone. If you aren't using POSIX
> threads, then the POSIX-compliant library is going to fall to pieces
> around you. For instance, all the file locking will break, and
> anything else that, like the PID cache, relies on either global or
> per-_thread_ data.

Yes, in fact I guess the problem applies to any __thread variable. And just as
fork() is not a raw syscall, clone() shouldn't be either.

I'll file a bug report when I have time to do it properly; I don't want to
hear that "if I go into dirty things using clone(), I get to keep the pieces
and set up a different TLS for the new process".
--
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729