2003-08-09 19:17:58

by Matt Wilson

[permalink] [raw]
Subject: [PATCH] zap_other_threads() detaches thread group leader

Hi.

The change to detach the threads in zap_other_threads() broke the case
where the non-thread-group-leader is the cause of de_thread(). In
this case the group leader will be detached and freed before
switch_exec_pids() is complete and invalid data will be used.
This is a patch that makes sure that the group leader does not get
detached and reaped.

Thought that Ingo or Roland submitted this for me, but since it's
still not in test3 I figured I would send it myself. Please CC: me on
replies.

Cheers,

Matt
[email protected]

--- linux-2.6.0-test3/kernel/signal.c.zap 2003-08-09 14:58:50.000000000 -0400
+++ linux-2.6.0-test3/kernel/signal.c 2003-08-09 14:59:42.000000000 -0400
@@ -1011,9 +1011,11 @@ void zap_other_threads(struct task_struc
* killed as part of a thread group due to another
* thread doing an execve() or similar. So set the
* exit signal to -1 to allow immediate reaping of
- * the process.
+ * the process. But don't detach the thread group
+ * leader.
*/
- t->exit_signal = -1;
+ if (t != p->group_leader)
+ t->exit_signal = -1;

sigaddset(&t->pending.signal, SIGKILL);
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);


2003-08-12 07:41:02

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] zap_other_threads() detaches thread group leader

Because of other sticky issues still unresolved and the lack of apparent
real need to support the troublesome case, at last check Linus was leaning
towards disallowing CLONE_THREAD&~CLONE_DETACHED in copy_process so that
the exit_signal = -1 change in zap_other_threads is no longer apropos and
can be taken back out entirely. Unless that is done, the != group_leader
check most certainly has to go in.



Thanks,
Roland

2003-08-12 07:52:30

by Roland McGrath

[permalink] [raw]
Subject: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

Please apply this patch to get us back out of this useless quagmire and
disallow the problematic case that noone wants to try to use any more.


Thanks,
Roland


--- linux/kernel/fork.c 18 Jul 2003 16:27:39 -0000 1.142
+++ linux/kernel/fork.c 12 Aug 2003 06:20:05 -0000
@@ -744,7 +749,8 @@ struct task_struct *copy_process(unsigne
* Thread groups must share signals as well, and detached threads
* can only be started up within the thread group.
*/
- if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
+ if ((clone_flags & CLONE_THREAD) &&
+ (clone_flags & (CLONE_SIGHAND|CLONE_DETACHED)) != (CLONE_SIGHAND|CLONE_DETACHED))
return ERR_PTR(-EINVAL);
if ((clone_flags & CLONE_DETACHED) && !(clone_flags & CLONE_THREAD))
return ERR_PTR(-EINVAL);
--- linux/kernel/signal.c 7 Aug 2003 20:06:12 -0000 1.98
+++ linux/kernel/signal.c 12 Aug 2003 06:18:34 -0000
@@ -1005,16 +1016,6 @@ void zap_other_threads(struct task_struc
*/
if (t->state & (TASK_ZOMBIE|TASK_DEAD))
continue;
-
- /*
- * We don't want to notify the parent, since we are
- * killed as part of a thread group due to another
- * thread doing an execve() or similar. So set the
- * exit signal to -1 to allow immediate reaping of
- * the process.
- */
- t->exit_signal = -1;
-
sigaddset(&t->pending.signal, SIGKILL);
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
signal_wake_up(t, 1);

2003-08-13 07:11:05

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

On Tue, 2003-08-12 at 00:52, Roland McGrath wrote:
> Please apply this patch to get us back out of this useless quagmire and
> disallow the problematic case that noone wants to try to use any more.

It seems to me there's a couple of problems with this patch:

One is that it prevents any single piece of code using
clone(CLONE_THREAD) from working on both 2.4 and 2.6. While clone() is
mostly hidden under libpthread.so, there are some applications which use
it directly for various good reasons. I've implemented two versions of
my clone/wait stuff in Valgrind to cope with this, but not everyone will
have yet. Perhaps I'm the only person in the world to use CLONE_THREAD,
but it seems unlikely. After all, clone() is a public interface.

The other reason is that this seems to be covering over an actual
conceptual problem rather than fixing it. I can't say I really
understand this piece of the kernel (which isn't too surprising that it
is very complicated because it represents 30 years of Unix history
packed into a dense mass), but one niggling concern I have is that even
when you're using CLONE_DETACHED, if you've attached with ptrace(), you
effectively become a parent who can wait on the detached clone. If you
can wait via ptrace, then doesn't it mean that all the wait-related
problems are still visible? I've tried to reproduce some of the
problems I've been seeing with non-detached threads in the case where
they are detached traced threads, but so far it seems OK. But that may
be because I haven't hit it properly yet.

I guess my concern is that I'm not convinced we really understand what's
going wrong here, and so I'm not convinced that this patch really fixes
things.

J

2003-08-13 09:38:07

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

> One is that it prevents any single piece of code using
> clone(CLONE_THREAD) from working on both 2.4 and 2.6.

I am not aware of what uses have been made of CLONE_THREAD in vanilla 2.4.
The semantics are so drastically different between 2.4 and 2.6 that it
seems useful to me that anything expecting the behavior you get from
CLONE_THREAD in 2.4 have some indication that the world is different now.

> Perhaps I'm the only person in the world to use CLONE_THREAD, but it
> seems unlikely. After all, clone() is a public interface.

Whoever is using it needs to cope with the many differences in the way that
CLONE_THREAD threads behave in 2.6 vs 2.4.

> I guess my concern is that I'm not convinced we really understand what's
> going wrong here, and so I'm not convinced that this patch really fixes
> things.

Every problem that I have personally seen or thought of, we understand and
have addressed in some fashion (the last of them by punting so it's ruled
out). I am not making any special claims about the code--there certainly
could be more problems. But there is not a failure mode that I have either
seen in a reproducible test case or seen postulated in our discussion that
we have not elucidated fully. I can certainly accept that you have seen
some failures that are not specifically explained, but as I understand it
there aren't any that you are seeing now or that you could reproduce that
aren't addressed by the changes we've discussed before now. If there are
specific failures you can reproduce that we have not already addressed, I
would certainly like to see them. When the last potential problem was
identified, it seemed sufficiently sticky to address properly that the
inclination was to punt. This seemed appropriate given that the only user
to complain about the subject (you) had said he'd punted on using the
problematical feature combination (CLONE_THREAD without CLONE_DETACHED). I
wouldn't claim that the last unresolved problem is intractable (in fact, I
already proposed a hacky solution that might be adequate) or that there
necessarily are (or aren't) more problems in the whole area of exit
handling. I don't know of other problems, and if people in actual fact
care about using CLONE_THREAD without CLONE_DETACHED in 2.6, then it is
worth someone's time and effort to address the remaining known problem and
field any newly discovered ones rather than punting.

If there are in fact other mysterious problems with exit handling unrelated
to the CLONE_THREAD|SIGCHLD style usage as you seem to be alluding to,
certainly we should track them down. I honestly don't know any specific
reasons to suspect any such things. Vague concern about ptrace seems like a
red herring to me, unless you can cite some specific oddness actually
happening to a real kernel. All the exit handling code has cases for ptrace
specifically and does things differently. If you are talking specifically
about the problem we discussed most recently, de_thread blocking when there
are zombies in the thread group: indeed, I think it will block when there
are ptrace'd zombies and the exec will not complete until the tracer reaps
them all. That could be called a semantic of ptrace, rather than a bug.
Otherwise, a debugger wouldn't necessarily get reports for every traced
thread and would need to know by other means that an exec had happened and
implicitly reaped them (not really hard to deal with, but perhaps an
unnecessary complication of the ptrace thread tracing interface).


Thanks,
Roland

2003-08-14 17:34:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED


On Tue, 12 Aug 2003, Roland McGrath wrote:
>
> Please apply this patch to get us back out of this useless quagmire and
> disallow the problematic case that noone wants to try to use any more.
>
> - if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
> + if ((clone_flags & CLONE_THREAD) &&
> + (clone_flags & (CLONE_SIGHAND|CLONE_DETACHED)) != (CLONE_SIGHAND|CLONE_DETACHED))
> return ERR_PTR(-EINVAL);
> if ((clone_flags & CLONE_DETACHED) && !(clone_flags & CLONE_THREAD))

Look at the condition that follows: it disallows CLONE_DETACHED for
non-threads.

Which really means that together with the first change, CLONE_DETACHED is
always the same thing as CLONE_THREAD: you can't have one without the
other.

Which means that they are logically not separate bits any more, and we
should just get rid of CLONE_DETACHED altogether, and use CLONE_THREAD in
all cases where it is tested for.

I'd really prefer not to keep a bit around that has to mean the same thing
as another bit - that way just lies madness. So I'll document
CLONE_DETACHED as being a no-op, and change the _one_ place that used it
to just use CLONE_THREAD instead.

Linus

2003-08-14 17:53:48

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

Linus Torvalds wrote:
> I'd really prefer not to keep a bit around that has to mean the same thing
> as another bit - that way just lies madness. So I'll document
> CLONE_DETACHED as being a no-op, and change the _one_ place that used it
> to just use CLONE_THREAD instead.

Don't forget to mention that software that may be run on 2.5 kernels
needs to set both bits, else won't work as expected.

-- Jamie

2003-08-14 18:03:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED


On Thu, 14 Aug 2003, Jamie Lokier wrote:
>
> Don't forget to mention that software that may be run on 2.5 kernels
> needs to set both bits, else won't work as expected.

Well, the CLONE_DETACHED without CLONE_THREAD case was never legal, and my
current patch will actually warn about the newly disallowed case too. I've
not gotten any warnings with RH-9, and I don't think anybody else has a
new enough glibc to even use CLONE_THREAD at all.

But yes, there will be a warning, at least for a time (and eventually
we'll just return -EINVAL silently - ie the program will _fail_ the
clone(), it won't just act strangely).

Linus

2003-08-14 18:28:16

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

Linus Torvalds wrote:
> > Don't forget to mention that software that may be run on 2.5 kernels
> > needs to set both bits, else won't work as expected.
>
> Well, the CLONE_DETACHED without CLONE_THREAD case was never legal, and my
> current patch will actually warn about the newly disallowed case too. I've
> not gotten any warnings with RH-9, and I don't think anybody else has a
> new enough glibc to even use CLONE_THREAD at all.

I'm thinking of programs that don't use Glibc, but do use these features.
Perhaps I'm the only person who writes such code :)

> But yes, there will be a warning, at least for a time (and eventually
> we'll just return -EINVAL silently - ie the program will _fail_ the
> clone(), it won't just act strangely).

-EINVAL would be great.

-- Jamie

2003-08-14 18:33:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED


On Thu, 14 Aug 2003, Jamie Lokier wrote:
>
> I'm thinking of programs that don't use Glibc, but do use these features.
> Perhaps I'm the only person who writes such code :)

Sure, there have always been projects out there that have used the
"native" clone() interfaces, but they're fairly rare, and CLONE_THREAD
being a new addition makes it less likely to show up as a problem.

> > But yes, there will be a warning, at least for a time (and eventually
> > we'll just return -EINVAL silently - ie the program will _fail_ the
> > clone(), it won't just act strangely).
>
> -EINVAL would be great.

I've pushed the change to the BK trees, and if you're not a BK user you
can just test the attached patch directly to see if it affects you..

Linus

----
# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
# ChangeSet 1.1149 -> 1.1150
# include/linux/sched.h 1.159 -> 1.160
# kernel/fork.c 1.134 -> 1.135
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/08/14 [email protected] 1.1150
# Mark CLONE_DETACHED as being irrelevant: it must match CLONE_THREAD.
#
# CLONE_THREAD without CLONE_DETACHED will now return -EINVAL, and
# for a while we will warn about anything that uses it (there are no
# known users, but this will help pinpoint any problems if somebody
# used to care about the invalid combination).
# --------------------------------------------
#
diff -Nru a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h Thu Aug 14 11:32:19 2003
+++ b/include/linux/sched.h Thu Aug 14 11:32:19 2003
@@ -49,7 +49,7 @@
#define CLONE_SETTLS 0x00080000 /* create a new TLS for the child */
#define CLONE_PARENT_SETTID 0x00100000 /* set the TID in the parent */
#define CLONE_CHILD_CLEARTID 0x00200000 /* clear the TID in the child */
-#define CLONE_DETACHED 0x00400000 /* parent wants no child-exit signal */
+#define CLONE_DETACHED 0x00400000 /* Not used - CLONE_THREAD implies detached uniquely */
#define CLONE_UNTRACED 0x00800000 /* set if the tracing process can't force CLONE_PTRACE on this clone */
#define CLONE_CHILD_SETTID 0x01000000 /* set the TID in the child */
#define CLONE_STOPPED 0x02000000 /* Start in stopped state */
diff -Nru a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c Thu Aug 14 11:32:19 2003
+++ b/kernel/fork.c Thu Aug 14 11:32:19 2003
@@ -746,8 +746,22 @@
*/
if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
return ERR_PTR(-EINVAL);
- if ((clone_flags & CLONE_DETACHED) && !(clone_flags & CLONE_THREAD))
+
+ /*
+ * CLONE_DETACHED must match CLONE_THREAD: it's a historical
+ * thing.
+ */
+ if (!(clone_flags & CLONE_DETACHED) != !(clone_flags & CLONE_THREAD)) {
+ /* Warn about the old no longer supported case so that we see it */
+ if (clone_flags & CLONE_THREAD) {
+ static int count;
+ if (count < 5) {
+ count++;
+ printk(KERN_WARNING "%s trying to use CLONE_THREAD without CLONE_DETACH\n", current->comm);
+ }
+ }
return ERR_PTR(-EINVAL);
+ }

retval = security_task_create(clone_flags);
if (retval)
@@ -877,10 +891,7 @@
p->parent_exec_id = p->self_exec_id;

/* ok, now we should be set up.. */
- if (clone_flags & CLONE_DETACHED)
- p->exit_signal = -1;
- else
- p->exit_signal = clone_flags & CSIGNAL;
+ p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL);
p->pdeath_signal = 0;

/*

2003-08-14 18:38:57

by Jamie Lokier

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

Linus Torvalds wrote:
> I've pushed the change to the BK trees, and if you're not a BK user you
> can just test the attached patch directly to see if it affects you..

That's cool, I am not affected.

Simply the documentation you mentioned, I felt someone who writes code
using CLONE_THREAD in future should be aware that if they just use
CLONE_THREAD by itself, their code won't work as expected with the
later 2.5 kernels. The EINVAL ensures that perfectly. So, :)

-- Jamie

2003-08-14 20:56:22

by Roland McGrath

[permalink] [raw]
Subject: Re: [PATCH] revert zap_other_threads breakage, disallow CLONE_THREAD without CLONE_DETACHED

> Which means that they are logically not separate bits any more, and we
> should just get rid of CLONE_DETACHED altogether, and use CLONE_THREAD in
> all cases where it is tested for.

That is certainly fine by me.


Thanks,
Roland