2001-07-17 04:22:28

by Tachino Nobuhiro

Subject: [BUG 2.4.6] PPID of a process is set to itself


Hi,

While playing with the clone system call, I found a case where the cloned
process becomes a zombie that is never reaped, because the PPID of the
process is set to itself. The test program follows.



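#define _GNU_SOURCE	/* current glibc needs this for clone() and CLONE_* */
#include <sched.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

int stack[2048];

/* Entry point of the cloned task: exit immediately. */
int
func(void *p)
{
	exit(0);
}

int
main(int argc, char *argv[])
{
	/* The stack grows down, so pass the address just past its end. */
	clone(func, &stack[2048],
	      CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD,
	      NULL);

	sleep(1);
	exit(0);
}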

The following patch fixes the bug, but I don't know whether it is correct.
Can someone please explain to me why, in forget_original_parent(), the
parent of processes in a thread group is set to another process in the
thread group?

diff -u -r linux.org/kernel/exit.c linux/kernel/exit.c
--- linux.org/kernel/exit.c Sat May 5 06:44:06 2001
+++ linux/kernel/exit.c Tue Jul 17 11:06:59 2001
@@ -168,7 +168,7 @@
 			/* We dont want people slaying init */
 			p->exit_signal = SIGCHLD;
 			p->self_exec_id++;
-			p->p_opptr = reaper;
+			p->p_opptr = p == reaper ? child_reaper : reaper;
 			if (p->pdeath_signal) send_sig(p->pdeath_signal, p, 0);
 		}
 	}


2001-07-17 04:42:55

by Linus Torvalds

Subject: Re: [BUG 2.4.6] PPID of a process is set to itself

In article <[email protected]> you write:
>
>While playing with the clone system call, I found a case where the cloned
>process becomes a zombie that is never reaped, because the PPID of the
>process is set to itself. The test program follows.

Heh.

>The following patch fixes the bug, but I don't know whether it is correct.
>Can someone please explain to me why, in forget_original_parent(), the
>parent of processes in a thread group is set to another process in the
>thread group?

The point with "CLONE_THREAD" is to create a sibling that is a more
"traditional" thread in the sense that it is more identical to the
original clonee - sharing the same thread group etc, so that we can
implement full POSIX pthreads semantics.

HOWEVER, the bug you hit is because CLONE_THREAD also implies
CLONE_PARENT, and the fork() code didn't actually enforce this. So
instead of your patch, we just should not allow the parent and the child
to be in the same thread group. Suggested real patch appended. Does this
fix it for you too?

Thanks,

Linus

------
--- linux-orig/kernel/fork.c Mon Apr 30 22:23:29 2001
+++ linux/kernel/fork.c Mon Jul 16 21:38:11 2001
@@ -604,7 +604,7 @@
 	p->run_list.next = NULL;
 	p->run_list.prev = NULL;
 
-	if ((clone_flags & CLONE_VFORK) || !(clone_flags & CLONE_PARENT)) {
+	if ((clone_flags & CLONE_VFORK) || !(clone_flags & (CLONE_PARENT | CLONE_THREAD))) {
 		p->p_opptr = current;
 		if (!(p->ptrace & PT_PTRACED))
 			p->p_pptr = current;
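
The effect of the one-line change can be checked in userspace with a small model of the condition. This is only a sketch: the two helper functions and the demo function are hypothetical names, not kernel code, and they model nothing beyond the "if" test in the patch. A true result means the new task's parent would be set to the caller ("current").

```c
#define _GNU_SOURCE             /* for the CLONE_* flags in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical model of the condition before the patch. */
bool parent_is_caller_before(unsigned long clone_flags)
{
    return (clone_flags & CLONE_VFORK) || !(clone_flags & CLONE_PARENT);
}

/* Hypothetical model of the condition after the patch:
 * CLONE_THREAD is now treated like CLONE_PARENT. */
bool parent_is_caller_after(unsigned long clone_flags)
{
    return (clone_flags & CLONE_VFORK) ||
           !(clone_flags & (CLONE_PARENT | CLONE_THREAD));
}

/* Flags used by the test program in the bug report. */
void demo_flag_check(void)
{
    unsigned long flags = CLONE_VM | CLONE_FS | CLONE_FILES |
                          CLONE_SIGHAND | CLONE_THREAD;

    /* Before: the child's parent is the caller, which shares its thread group. */
    assert(parent_is_caller_before(flags));
    /* After: the child becomes a sibling of the caller instead. */
    assert(!parent_is_caller_after(flags));
}
```

With the flags from the test program, the old condition makes the caller the parent of a task in its own thread group, while the new one does not.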

2001-07-17 06:10:25

by Tachino Nobuhiro

Subject: Re: [BUG 2.4.6] PPID of a process is set to itself


At Mon, 16 Jul 2001 21:41:56 -0700,
Linus Torvalds wrote:
>
> HOWEVER, the bug you hit is because CLONE_THREAD also implies
> CLONE_PARENT, and the fork() code didn't actually enforce this. So
> instead of your patch, we just should not allow the parent and the child
> to be in the same thread group. Suggested real patch appended. Does this
> fix it for you too?

Thank you for the patch.
I tried it and found that the process cloned by my test program became
a zombie child of my shell and is not reaped, because the shell is
not expecting the process.

# ps lw
  F   UID   PID  PPID PRI  NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
000     0  1180  1176  14   0  3392 1640 wait4  S    pts/1      0:00 bash
144     0  1249  1180   9   0     0    0 do_exi Z    pts/1      0:00 [a.out <defunct>]
100     0  1279  1180  15   0  3272 1440 -      R    pts/1      0:00 ps lw


This is okay because when the shell is terminated, the child is also reaped,
but it seems a little strange.

2001-07-17 06:41:00

by Linus Torvalds

Subject: Re: [BUG 2.4.6] PPID of a process is set to itself


On Tue, 17 Jul 2001, Tachino Nobuhiro wrote:
> At Mon, 16 Jul 2001 21:41:56 -0700,
> Linus Torvalds wrote:
> >
> > HOWEVER, the bug you hit is because CLONE_THREAD also implies
> > CLONE_PARENT, and the fork() code didn't actually enforce this. So
> > instead of your patch, we just should not allow the parent and the child
> > to be in the same thread group. Suggested real patch appended. Does this
> > fix it for you too?
>
> Thank you for the patch.
> I tried it and found that the process cloned by my test program became
> a zombie child of my shell and is not reaped, because the shell is
> not expecting the process.

Right.

That, however, is because you're using CLONE_THREAD in a manner it wasn't
really meant to be used (but now it is purely _your_ problem, and will no
longer cause processes that cannot be reaped by anybody).

The real design for CLONE_THREAD is basically to allow pthreads-like
thread handling by having pthread_create() do roughly something like this:

	static int has_created_master_process = 0;

	if (!has_created_master_process) {
		new_me = clone(CLONE_VM | SIGCHLD);
		if (new_me > 0) {
			/* Original thread turns into master process */
			printf("I am the new master process, bow down before me!\n");
			has_created_master_process = 1;
			for (;;) {
				/* waitpid() returns the reaped pid, or -1 on error */
				if (waitpid(-1, NULL, 0) > 0)
					continue;
				if (errno == ECHILD)
					exit(0);
				/* .. we could do signal propagation here .. */
			}
		}
		/* This child now takes over the role of the original thread */
	}

	clone(CLONE_VM | CLONE_THREAD | SIGCHLD);
	/* The child of this clone is the new thread */
/* The child of this clone is the new thread */

ie we'd always have "n+1" kernel threads for the "n" pthreads threads,
where the extra additional thread is there to make the real parent of the
threaded application see just one "process". More importantly, it means
that the real parent sees the death of the "clone group" only after all
threads have exited.
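
The master's reaping loop can be tried in isolation. The sketch below swaps in plain fork() for clone() so that no separate stack has to be set up, which means it demonstrates only the "wait until ECHILD" logic and not thread creation. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that exit at once; returns how many were started. */
int spawn_children(int n)
{
    int started = 0;

    assert(n >= 0);
    /* Make sure exited children stay around as zombies until waited for. */
    signal(SIGCHLD, SIG_DFL);

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: leave without running atexit handlers */
        if (pid > 0)
            started++;
    }
    return started;
}

/* Reap children until none are left, as the master process would.
 * Returns the number reaped, or -1 on an unexpected error. */
int reap_all_children(void)
{
    int reaped = 0;

    for (;;) {
        if (waitpid(-1, NULL, 0) > 0) {
            reaped++;
            continue;
        }
        if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
        if (errno == ECHILD)
            return reaped;      /* no children remain */
        return -1;
    }
}
```

Only when reap_all_children() returns would the master itself exit, which is the point at which the real parent sees the whole group die.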

Alternatively, you can always use a zero signal specifier to clone(), but
you have to realize that if you do that, the original parent will only see
one exit code, and it will be the one from the first thread. The other
threads may still be running - and the original parent simply won't know
anything at all about them. This is quite acceptable thread behaviour for
some thread usage, but it is not the behaviour that pthreads is supposed
to have (this is how you can make "IO slaves" or similar - threads that
are not full-fledged parts of the process that the parent is supposed to
see).

Also, please do notice that one fundamental part of the CLONE_THREAD logic
never made it into a stable kernel: the shared signal handling. So while
CLONE_THREAD allows for many pthread-like things (one common process ID
shared by all threads, for example), the most fundamental part of it was
not actually merged into the standard kernel because of stability concerns
late in the pre-2.4 test cycle.

Linus

2001-07-17 07:13:08

by Ulrich Drepper

Subject: Re: [BUG 2.4.6] PPID of a process is set to itself

Linus Torvalds <[email protected]> writes:

> 	if (!has_created_master_process) {
> 		new_me = clone(CLONE_VM | SIGCHLD);
> 		if (new_me > 0) {
> 			/* Original thread turns into master process */
> 			printf("I am the new master process, bow down before me!\n");
> 			has_created_master_process = 1;
> 			for (;;) {
> 				/* waitpid() returns the reaped pid, or -1 on error */
> 				if (waitpid(-1, NULL, 0) > 0)
> 					continue;
> 				if (errno == ECHILD)
> 					exit(0);
> 				/* .. we could do signal propagation here .. */
> 			}
> 		}
> 		/* This child now takes over the role of the original thread */

A bit more complicated due to switching of stacks but that's basically
it. With this clone model using n+1 threads is the only way to get
the semantics right.

> Also, please do notice that one fundamental part of the CLONE_THREAD logic
> never made it into a stable kernel: the shared signal handling. So while
> CLONE_THREAD allows for many pthread-like things (one common process ID
> shared by all threads, for example), the most fundamental part of it was
> not actually merged into the standard kernel because of stability concerns
> late in the pre-2.4 test cycle.

Exactly. This is holding off everything. The way this is solved
(basically: the limitations imposed on the userland implementation)
will determine much of the implementation.

I had already done a great deal of the implementation when Linus first
put the code in the late 2.3 kernels. If somebody would finally get
the signal handling stuff done (and a few more little things, some
Linus already agreed on) we could have a compliant pthread
implementation soon.

--
---------------. ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Red Hat `--' drepper at redhat.com `------------------------

2001-07-17 23:17:02

by bert hubert

Subject: huge number of context switches under 2.2.x with SMP & threaded apps

A threads-related question - I have a nameserver with 8 active threads,
which in turn leads to 6 (in this case) MySQL connections. When
stress-testing this nameserver, we see a *huge* number of context switches.
50,000 have been observed. When raising this to ~50 active threads and ~50
MySQL connections we've seen 100,000 context switches/second. Performance
suffers.

This is a RedHat 6.2 system with a 2.2.16 kernel, 2*PIII, 900MHz.

I saw some mention of this problem on the MySQL site with regards to
processes holding a pthread_mutex_lock() for short amounts of time. They
advise to use 2.4 but right now that is not within the scope of my options.

My question: is there a 2.2 kernel in which this is resolved? And secondly,
is there a way to prevent this problem purely from userspace? In other
words, what causes this problem?

The MySQL site also mentions that 2.4 could do better in some ways,
especially regarding 'overspin'.

Thanks for your time.


--
http://www.PowerDNS.com Versatile DNS Services
Trilab The Technology People
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

2001-07-17 23:26:24

by Davide Libenzi

Subject: RE: huge number of context switches under 2.2.x with SMP & threa


On 17-Jul-2001 bert hubert wrote:
> A threads-related question - I have a nameserver with 8 active threads,
> which in turn leads to 6 (in this case) MySQL connections. When
> stress-testing this nameserver, we see a *huge* number of context switches.
> 50,000 have been observed. When raising this to ~50 active threads and ~50
> MySQL connections we've seen 100,000 context switches/second. Performance
> suffers.
>
> This is a RedHat 6.2 system with a 2.2.16 kernel, 2*PIII, 900MHz.
>
> I saw some mention of this problem on the MySQL site with regards to
> processes holding a pthread_mutex_lock() for short amounts of time. They
> advise to use 2.4 but right now that is not within the scope of my options.
>
> My question: is there a 2.2 kernel in which this is resolved? And secondly,
> is there a way to prevent this problem purely from userspace? In other
> words, what causes this problem?
>
> The MySQL site also mentions that 2.4 could do better in some ways,
> especially regarding 'overspin'.

If the lock is contended, the thread starts spinning with sched_yield() for a
given number of iterations.
This can result in a high context switch rate, and in quite a long runqueue
as well.
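
The spin-then-yield pattern described above looks roughly like the sketch below. It is an illustration only and not the actual pthread mutex code: the lock type, the function names and the spin limit are all hypothetical, and C11 atomics stand in for whatever the threading library uses internally.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical lock type: a single test-and-set flag. */
typedef struct {
    atomic_flag held;
} spin_yield_lock;

/* Try to take the lock, calling sched_yield() after each failed attempt.
 * Returns the number of yields that were needed, or -1 if the lock was
 * still held after max_spins attempts (a real mutex would block then). */
int spin_yield_acquire(spin_yield_lock *l, int max_spins)
{
    assert(max_spins > 0);

    for (int yields = 0; yields < max_spins; yields++) {
        if (!atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
            return yields;      /* got the lock */
        sched_yield();          /* each failed attempt gives up the CPU */
    }
    return -1;
}

/* Release the lock so that another thread's next attempt succeeds. */
void spin_yield_release(spin_yield_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Every failed attempt costs one sched_yield() call, so many threads contending for a briefly held lock can multiply into a very large number of context switches.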




- Davide