2003-02-09 00:48:37

by Anton Blanchard

[permalink] [raw]
Subject: heavy handed exit() in latest BK


Hi,

>From BK late last night (has everything except the most recent 2
sound changesets)

kernel BUG at kernel/exit.c:710!

which is:

do_exit()
...
schedule();
BUG();
/* Avoid "noreturn function does return". */
for (;;) ;
}

Oops. In a police lineup Id finger the signal changes.

Anton


2003-02-09 01:54:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sun, 9 Feb 2003, Anton Blanchard wrote:
>
> From BK late last night (has everything except the most recent 2
> sound changesets)
>
> kernel BUG at kernel/exit.c:710!
>
> which is:
>
> do_exit()
> ...
> schedule();
> BUG();
> /* Avoid "noreturn function does return". */
> for (;;) ;
> }
>
> Oops. In a police lineup Id finger the signal changes.

Interesting. Especially as the last thing exit_notify() does (just a few
lines above the schedule()) is to do

tsk->state = TASK_ZOMBIE;

and that schedule() _really_ really shouldn't return. Regardless of any
signal handler changes.

By then we also have local interrupts disabled, and we've explicitly
disabled preemption, so I don't see how anything could ever wake us up any
more.

But yes, I'd agree with your line-up, if for no other reason than the fact
that I don't really see anything else even _remotely_ likely to muck
around with process state.

I don't see this (obviously), but could you make "try_to_wake_up()" (in
kernel/sched.c) do a

BUG_ON(p->state & TASK_ZOMBIE);

to see if somebody tries to wake up a zombie task (maybe the signal
handler stuff does that under some unlucky circumstances).

And then the call trace would be very interesting indeed if the above bug
triggers.. (and unlike the do_exit() bug, the tri_to_wake_up() thing
should clearly pinpoint where the problem actually happens, knock wood).

Linus

2003-02-09 02:08:22

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

> By then we also have local interrupts disabled, and we've explicitly
> disabled preemption, so I don't see how anything could ever wake us up any
> more.

We already know how this happens. I think it's only possible with SMP.
I described this very problem in the final paragraph of my penultimate
message on the signals changes:

Incidentally, I've run across another bug introduced by the last rework of
handle_stop_signal (or perhaps similar races have always been there, I'm
not quite sure at the moment). It can call wake_up_process on a zombie
that's on its way to exit, triggering the BUG at the end of do_exit. I
think this race may be possible in all of the signal_wake_up calls for
SIGKILL cases, and other uses of wake_up_process like PTRACE_KILL.
Some such places check ->state, but they do not lock out exit races.
Perhaps having wake_up_process itself be race-proof would be simplest.
I don't have a good sense of how best to fix this one yet.

This will probably stop biting anyone in practice after the most recent new
plan for SIGKILL we've just been discussing. For signals, the race will
only be possible for SIGCONT sent when a thread is on its way to die. That
can be avoided by checking PF_EXITING in handle_stop_signal, because after
setting PF_EXITING any thread in do_exit will take the siglock and thus
can't have gotten far enough to go to TASK_ZOMBIE without being serialized
after the loop in handle_stop_signal.

As I said above, I think this race is possible in other uses of
wake_up_process. PTRACE_KILL is one example, but there are others and I
would have to check carefully to be convinced that other factors rule out
this exit race for them. I think that BUG_ON check should definitely go
into try_wake_up so that it hits should any of these other races ever
actually bite. Unless I am missing something, it won't necessarily catch
all races unless you use xchg to set TASK_RUNNING and then check the old value.

2003-02-09 02:13:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Roland McGrath wrote:
>
> As I said above, I think this race is possible in other uses of
> wake_up_process.

I don't think you have any other users (other than signals) that wake up
processes that aren't on some wait-queue.

And by the time we exit, we have better had removed outselves from all the
wait-queues, so I suspect signals are really the only thing that can wake
up a process after it died but before it's truly gone.

Anyway, I'll code up the SIGKILL changes that should make this a non-issue
(along with the bad SIGKILL/kernel-thread interaction).

Linus

2003-02-09 02:22:16

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

> Anyway, I'll code up the SIGKILL changes that should make this a non-issue

I'm already in the midst of those and will test my various cases and gdb
and so forth. If you want to put something in before I've done testing,
then please send me the patch you use.

2003-02-09 02:28:15

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Roland McGrath wrote:
>
> I'm already in the midst of those and will test my various cases and gdb
> and so forth. If you want to put something in before I've done testing,
> then please send me the patch you use.

You just got it. HOWEVER, I haven't actually committed it to my tree, and
I might as well wait for your test results, so you can choose to just
ignore it if you want to.

Linus

2003-02-09 02:27:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Linus Torvalds wrote:
>
> And then the call trace would be very interesting indeed if the above bug
> triggers.. (and unlike the do_exit() bug, the tri_to_wake_up() thing
> should clearly pinpoint where the problem actually happens, knock wood).

Ok, let's assume it was the wake_up_process() in handle_stop_signal(),
then hopefully this (totally untested) patch should fix it.

Roland, does this look sane to you? It makes the STOP wakeup be a
SIGCONT-only thing, and makes the wakeup conditional on !PF_EXITING.

It also makes SIGKILL ignore the stopped status for delivery.

Does this make the gdb test-suite happy?

Linus

---
===== kernel/signal.c 1.61 vs edited =====
--- 1.61/kernel/signal.c Fri Feb 7 06:46:13 2003
+++ edited/kernel/signal.c Sat Feb 8 18:29:11 2003
@@ -591,8 +591,7 @@
rm_from_queue(sigmask(SIGCONT), &t->pending);
t = next_thread(t);
} while (t != p);
- }
- else if (sig_kernel_cont(sig)) {
+ } else if (sig == SIGCONT) {
/*
* Remove all stop signals from all queues,
* and wake all threads.
@@ -637,17 +636,12 @@
* running the handler. With the TIF_SIGPENDING
* flag set, the thread will pause and acquire the
* siglock that we hold now and until we've queued
- * the pending signal. For SIGKILL, we likewise
- * don't want anybody doing anything but taking the
- * SIGKILL. The only case in which a thread would
- * not already be in the signal dequeuing loop is
- * non-signal (e.g. syscall) ptrace tracing, so we
- * don't worry about an unnecessary trip through
- * the signal code and just keep this code path
- * simpler by unconditionally setting the flag.
+ * the pending signal.
*/
- set_tsk_thread_flag(t, TIF_SIGPENDING);
- wake_up_process(t);
+ if (!(t->flags & PF_EXITING)) {
+ set_tsk_thread_flag(t, TIF_SIGPENDING);
+ wake_up_process(t);
+ }
t = next_thread(t);
} while (t != p);
}
@@ -789,15 +783,17 @@
* as soon as they're available, so putting the signal on the shared queue
* will be equivalent to sending it to one such thread.
*/
-#define wants_signal(sig, p) (!sigismember(&(p)->blocked, sig) \
- && (p)->state < TASK_STOPPED \
- && !((p)->flags & PF_EXITING) \
- && (task_curr(p) || !signal_pending(p)))
+#define wants_signal(sig, p, mask) \
+ (!sigismember(&(p)->blocked, sig) \
+ && ((p)->state & mask) \
+ && !((p)->flags & PF_EXITING) \
+ && (task_curr(p) || !signal_pending(p)))

static inline int
__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
struct task_struct *t;
+ unsigned int mask;
int ret;

#if CONFIG_SMP
@@ -815,6 +811,14 @@
return 0;

/*
+ * Normally we only post signals to non-stopped threads,
+ * bit SIGKILL will puch through that too..
+ */
+ mask = TASK_RUNNING | TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE;
+ if (sig == SIGKILL)
+ mask |= TASK_STOPPED;
+
+ /*
* Put this signal on the shared-pending queue, or fail with EAGAIN.
* We always use the shared queue for process-wide signals,
* to avoid several races.
@@ -829,7 +833,7 @@
* If the main thread wants the signal, it gets first crack.
* Probably the least surprising to the average bear.
*/
- if (wants_signal(sig, p))
+ if (wants_signal(sig, p, mask))
t = p;
else if (thread_group_empty(p))
/*
@@ -847,7 +851,7 @@
t = p->signal->curr_target = p;
BUG_ON(t->tgid != p->tgid);

- while (!wants_signal(sig, t)) {
+ while (!wants_signal(sig, t, mask)) {
t = next_thread(t);
if (t == p->signal->curr_target)
/*

2003-02-09 02:32:24

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

> Roland, does this look sane to you?

Hey, that's my patch! :-) Actually, mine has one line of difference, which
is not to set TIF_SIGPENDING when SIGCONT is blocked. That difference is
visible to kernel threads that block SIGCONT and check signal_pending,
of which there are some though I don't recall understanding why.

I am still doing tests.

2003-02-09 03:21:20

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

There is definitely something after that patch. gdb is broken in ways that
are being mysterious to me, and now zombies with ppid 1 are sticking around
forever. With that patch, do:

bash$ (sleep 2 & echo zombie $! ; exec sleep 10000000) & sleep 4; kill -9 $!
[2] 1220
zombie 1222
bash$ ps l1222
[2]+ Killed ( sleep 2 & echo zombie $!; exec sleep 10000000 )
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 5281 1222 1 15 0 0 0 wait4 Z pts/1 0:00 [sleep <defun

Such zombies seem to stick around forever. I must have managed to break
the normal delivery of SIGCHLD to init, but right now it's a mystery to me how.

2003-02-09 03:24:17

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

Ah. Deciding state should be treated as a bitmask is not so hot when
TASK_RUNNING is 0.

2003-02-09 03:32:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Roland McGrath wrote:
>
> Ah. Deciding state should be treated as a bitmask is not so hot when
> TASK_RUNNING is 0.

Oops. Indeed. TASK_RUNNING isn't a bit at all, it's a lack of sleep.

Turning the logic the other way around:

/* Ignore processes that are dead or stopped (except for SIGKILL) */
mask = TASK_ZOMBIE | TASK_DEAD;
if (sig != SIGKILL)
mask != TASK_STOPPED;

and testing for !((p)->state & mask) should fix it. The mask ends up being
the states that are _not_ good to send signals for ;)

Linus

2003-02-09 03:35:02

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Linus Torvalds wrote:
>
> Oops. Indeed. TASK_RUNNING isn't a bit at all, it's a lack of sleep.

So it might work this way around..

Linus

---
===== kernel/signal.c 1.61 vs edited =====
--- 1.61/kernel/signal.c Fri Feb 7 06:46:13 2003
+++ edited/kernel/signal.c Sat Feb 8 19:39:48 2003
@@ -591,8 +591,7 @@
rm_from_queue(sigmask(SIGCONT), &t->pending);
t = next_thread(t);
} while (t != p);
- }
- else if (sig_kernel_cont(sig)) {
+ } else if (sig == SIGCONT) {
/*
* Remove all stop signals from all queues,
* and wake all threads.
@@ -637,17 +636,12 @@
* running the handler. With the TIF_SIGPENDING
* flag set, the thread will pause and acquire the
* siglock that we hold now and until we've queued
- * the pending signal. For SIGKILL, we likewise
- * don't want anybody doing anything but taking the
- * SIGKILL. The only case in which a thread would
- * not already be in the signal dequeuing loop is
- * non-signal (e.g. syscall) ptrace tracing, so we
- * don't worry about an unnecessary trip through
- * the signal code and just keep this code path
- * simpler by unconditionally setting the flag.
+ * the pending signal.
*/
- set_tsk_thread_flag(t, TIF_SIGPENDING);
- wake_up_process(t);
+ if (!(t->flags & PF_EXITING)) {
+ set_tsk_thread_flag(t, TIF_SIGPENDING);
+ wake_up_process(t);
+ }
t = next_thread(t);
} while (t != p);
}
@@ -789,15 +783,17 @@
* as soon as they're available, so putting the signal on the shared queue
* will be equivalent to sending it to one such thread.
*/
-#define wants_signal(sig, p) (!sigismember(&(p)->blocked, sig) \
- && (p)->state < TASK_STOPPED \
- && !((p)->flags & PF_EXITING) \
- && (task_curr(p) || !signal_pending(p)))
+#define wants_signal(sig, p, mask) \
+ (!sigismember(&(p)->blocked, sig) \
+ && !((p)->state & mask) \
+ && !((p)->flags & PF_EXITING) \
+ && (task_curr(p) || !signal_pending(p)))

static inline int
__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
struct task_struct *t;
+ unsigned int mask;
int ret;

#if CONFIG_SMP
@@ -815,6 +811,14 @@
return 0;

/*
+ * Don't bother zombies and stopped tasks (but
+ * SIGKILL will punch through stopped state)
+ */
+ mask = TASK_DEAD | TASK_ZOMBIE;
+ if (sig != SIGKILL)
+ mask |= TASK_STOPPED;
+
+ /*
* Put this signal on the shared-pending queue, or fail with EAGAIN.
* We always use the shared queue for process-wide signals,
* to avoid several races.
@@ -829,7 +833,7 @@
* If the main thread wants the signal, it gets first crack.
* Probably the least surprising to the average bear.
*/
- if (wants_signal(sig, p))
+ if (wants_signal(sig, p, mask))
t = p;
else if (thread_group_empty(p))
/*
@@ -847,7 +851,7 @@
t = p->signal->curr_target = p;
BUG_ON(t->tgid != p->tgid);

- while (!wants_signal(sig, t)) {
+ while (!wants_signal(sig, t, mask)) {
t = next_thread(t);
if (t == p->signal->curr_target)
/*

2003-02-09 03:39:10

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

Here is the patch vs 2.5.59-1.1007 that I am using now. gdb seems happy.
I have not run a lot of other tests yet.


--- /home/roland/redhat/linux-2.5.59-1.1007/kernel/signal.c.~1~ Fri Feb 7 20:04:29 2003
+++ /home/roland/redhat/linux-2.5.59-1.1007/kernel/signal.c Sat Feb 8 19:35:28 2003
@@ -120,9 +120,6 @@ int max_queued_signals = 1024;
#define SIG_KERNEL_STOP_MASK (\
M(SIGSTOP) | M(SIGTSTP) | M(SIGTTIN) | M(SIGTTOU) )

-#define SIG_KERNEL_CONT_MASK (\
- M(SIGCONT) | M(SIGKILL) )
-
#define SIG_KERNEL_COREDUMP_MASK (\
M(SIGQUIT) | M(SIGILL) | M(SIGTRAP) | M(SIGABRT) | \
M(SIGFPE) | M(SIGSEGV) | M(SIGBUS) | M(SIGSYS) | \
@@ -139,8 +136,6 @@ int max_queued_signals = 1024;
(((sig) < SIGRTMIN) && T(sig, SIG_KERNEL_IGNORE_MASK))
#define sig_kernel_stop(sig) \
(((sig) < SIGRTMIN) && T(sig, SIG_KERNEL_STOP_MASK))
-#define sig_kernel_cont(sig) \
- (((sig) < SIGRTMIN) && T(sig, SIG_KERNEL_CONT_MASK))

#define sig_user_defined(t, signr) \
(((t)->sighand->action[(signr)-1].sa.sa_handler != SIG_DFL) && \
@@ -592,7 +587,7 @@ static void handle_stop_signal(int sig,
t = next_thread(t);
} while (t != p);
}
- else if (sig_kernel_cont(sig)) {
+ else if (sig == SIGCONT) {
/*
* Remove all stop signals from all queues,
* and wake all threads.
@@ -618,7 +613,8 @@ static void handle_stop_signal(int sig,
p->group_leader,
p->group_leader->real_parent);
}
- rm_from_queue(SIG_KERNEL_STOP_MASK, &p->signal->shared_pending);
+ rm_from_queue(SIG_KERNEL_STOP_MASK,
+ &p->signal->shared_pending);
t = p;
do {
rm_from_queue(SIG_KERNEL_STOP_MASK, &t->pending);
@@ -637,17 +633,13 @@ static void handle_stop_signal(int sig,
* running the handler. With the TIF_SIGPENDING
* flag set, the thread will pause and acquire the
* siglock that we hold now and until we've queued
- * the pending signal. For SIGKILL, we likewise
- * don't want anybody doing anything but taking the
- * SIGKILL. The only case in which a thread would
- * not already be in the signal dequeuing loop is
- * non-signal (e.g. syscall) ptrace tracing, so we
- * don't worry about an unnecessary trip through
- * the signal code and just keep this code path
- * simpler by unconditionally setting the flag.
+ * the pending signal.
*/
- set_tsk_thread_flag(t, TIF_SIGPENDING);
- wake_up_process(t);
+ if (!(t->flags & PF_EXITING)) {
+ if (!sigismember(&t->blocked, SIGCONT))
+ set_tsk_thread_flag(t, TIF_SIGPENDING);
+ wake_up_process(t);
+ }
t = next_thread(t);
} while (t != p);
}
@@ -782,23 +774,25 @@ force_sig_specific(int sig, struct task_
}

/*
- * Test if P wants to take SIG. After we've checked all threads with this,
- * it's equivalent to finding no threads not blocking SIG. Any threads not
- * blocking SIG were ruled out because they are not running and already
- * have pending signals. Such threads will dequeue from the shared queue
- * as soon as they're available, so putting the signal on the shared queue
- * will be equivalent to sending it to one such thread.
- */
-#define wants_signal(sig, p) (!sigismember(&(p)->blocked, sig) \
- && (p)->state < TASK_STOPPED \
- && !((p)->flags & PF_EXITING) \
- && (task_curr(p) || !signal_pending(p)))
+ * Test if P wants to take SIG. MASK gives P->state bits that rule it out.
+ * After we've checked all threads with this, it's equivalent to finding no
+ * threads not blocking SIG. Any threads not blocking SIG were ruled out
+ * because they are not running and already have pending signals. Such
+ * threads will dequeue from the shared queue as soon as they're available,
+ * so putting the signal on the shared queue will be equivalent to sending
+ * it to one such thread.
+ */
+#define wants_signal(sig, p, mask) \
+ (!sigismember(&(p)->blocked, sig) && \
+ !((p)->state & (mask)) && !((p)->flags & PF_EXITING) &&\
+ (task_curr(p) || !signal_pending(p)))

static inline int
__group_send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
struct task_struct *t;
int ret;
+ unsigned int mask;

#if CONFIG_SMP
if (!spin_is_locked(&p->sighand->siglock))
@@ -824,12 +818,20 @@ __group_send_sig_info(int sig, struct si
return ret;

/*
+ * Signals are only accepted by non-stopped processes.
+ * SIGKILL is the exception: it acts on stopped processes too.
+ */
+ mask = TASK_ZOMBIE | TASK_DEAD;
+ if (sig != SIGKILL)
+ mask |= TASK_STOPPED;
+
+ /*
* Now find a thread we can wake up to take the signal off the queue.
*
* If the main thread wants the signal, it gets first crack.
* Probably the least surprising to the average bear.
*/
- if (wants_signal(sig, p))
+ if (wants_signal(sig, p, mask))
t = p;
else if (thread_group_empty(p))
/*
@@ -847,7 +849,7 @@ __group_send_sig_info(int sig, struct si
t = p->signal->curr_target = p;
BUG_ON(t->tgid != p->tgid);

- while (!wants_signal(sig, t)) {
+ while (!wants_signal(sig, t, mask)) {
t = next_thread(t);
if (t == p->signal->curr_target)
/*

2003-02-09 04:44:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Roland McGrath wrote:
>
> Here is the patch vs 2.5.59-1.1007 that I am using now. gdb seems happy.
> I have not run a lot of other tests yet.

Looks like kernel threads still go crazy at shutdown. I saw the migration
threads apparently hogging the CPU.

Linus

2003-02-09 04:50:48

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

> Looks like kernel threads still go crazy at shutdown. I saw the migration
> threads apparently hogging the CPU.

Hmm, my (2-CPU) machine reboots quickly. I'd have to be checking closely
somehow to see if there is some short period of weird hoggery (I'm not sure
how, since my fingers aren't quick enough). What exactly did you observe,
and how?


Thanks,
Roland

2003-02-09 04:51:04

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Linus Torvalds wrote:
>
> Looks like kernel threads still go crazy at shutdown. I saw the migration
> threads apparently hogging the CPU.

Never mind, I didn't merge your patch correctly, I still had my old one
there that allowed setting signal-pending even for a blocked signal (and
thus would confuse all the kernel threads that didn't expect to ever have
somebody claim they had pending signals).

I think your patch is ok.

Linus

2003-02-09 09:18:35

by Russell King

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

On Sat, Feb 08, 2003 at 08:51:05PM -0800, Linus Torvalds wrote:
> On Sat, 8 Feb 2003, Roland McGrath wrote:
> >
> > Here is the patch vs 2.5.59-1.1007 that I am using now. gdb seems happy.
> > I have not run a lot of other tests yet.
>
> Looks like kernel threads still go crazy at shutdown. I saw the migration
> threads apparently hogging the CPU.

I hope you're aware that alt-sysrq-t has been broken for some time?

include/linux/sched.h:

#define TASK_RUNNING 0
#define TASK_INTERRUPTIBLE 1
#define TASK_UNINTERRUPTIBLE 2
#define TASK_STOPPED 4
#define TASK_ZOMBIE 8
#define TASK_DEAD 16

kernel/sched.c:

static const char * stat_nam[] = { "R", "S", "D", "Z", "T", "W" };

fs/proc/array.c:

static const char *task_state_array[] = {
"R (running)", /* 0 */
"S (sleeping)", /* 1 */
"D (disk sleep)", /* 2 */
"T (stopped)", /* 8 */
"Z (zombie)", /* 4 */
"X (dead)" /* 16 */
};

So, for one more time, here's another mailing of the same patch to fix
this brokenness. In addition, we fix the wrong comment in fs/proc/array.c

--- orig/kernel/sched.c Sun Feb 9 09:16:31 2003
+++ linux/kernel/sched.c Sun Feb 9 09:23:44 2003
@@ -2037,7 +2037,7 @@
unsigned long free = 0;
task_t *relative;
int state;
- static const char * stat_nam[] = { "R", "S", "D", "Z", "T", "W" };
+ static const char * stat_nam[] = { "R", "S", "D", "T", "Z", "W" };

printk("%-13.13s ", p->comm);
state = p->state ? __ffs(p->state) + 1 : 0;
--- orig/fs/proc/array.c Sun Feb 9 09:17:36 2003
+++ linux/fs/proc/array.c Sun Feb 9 09:26:00 2003
@@ -126,8 +126,8 @@
"R (running)", /* 0 */
"S (sleeping)", /* 1 */
"D (disk sleep)", /* 2 */
- "T (stopped)", /* 8 */
- "Z (zombie)", /* 4 */
+ "T (stopped)", /* 4 */
+ "Z (zombie)", /* 8 */
"X (dead)" /* 16 */
};



--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2003-02-09 11:25:11

by Ingo Molnar

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


Roland,

there are two bugs in the signal code still:

- a read_lock(&tasklist_lock) is missing around the group_send_sig_info()
in send_sig_info().

- session-IDs and group-IDs are set outside the tasklist lock. This
causes breakage in the USB code. The correct fix is to do this:

void __set_special_pids(pid_t session, pid_t pgrp)
{
struct task_struct *curr = current;

if (curr->session != session) {
detach_pid(curr, PIDTYPE_SID);
curr->session = session;
attach_pid(curr, PIDTYPE_SID, session);
}
if (curr->pgrp != pgrp) {
detach_pid(curr, PIDTYPE_PGID);
curr->pgrp = pgrp;
attach_pid(curr, PIDTYPE_PGID, pgrp);
}
}

void set_special_pids(pid_t session, pid_t pgrp)
{
write_lock_irq(&tasklist_lock);
__set_special_pids(session, pgrp);
write_unlock_irq(&tasklist_lock);
}

and then update all places that do:

- current->session = 1;
- current->pgrp = 1;

to:

+ set_special_pids(1, 1);

otherwise we change the PIDs without properly rehashing them.

Ingo

2003-02-09 11:47:21

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

> - a read_lock(&tasklist_lock) is missing around the group_send_sig_info()
> in send_sig_info().

Indeed. I still intend to clean up those entry points and haven't gotten
to it, so I hadn't bothered with this yet either (though I think I sent it
to you for the backport). It certainly does bite in practice, e.g. SIGPIPE.

There is a similar failure to take the lock before using zap_other_threads.
I thought I sent this patch before, but it's not in 2.5 yet.

--- /home/roland/redhat/linux-2.5.59-1.1007/fs/exec.c.~1~ Fri Feb 7 20:04:27 2003
+++ /home/roland/redhat/linux-2.5.59-1.1007/fs/exec.c Sun Feb 9 03:43:36 2003
@@ -601,9 +601,12 @@ static inline int de_thread(struct task_

if (thread_group_empty(current))
goto no_thread_group;
+
/*
- * Kill all other threads in the thread group:
+ * Kill all other threads in the thread group.
+ * We must hold tasklist_lock to call zap_other_threads.
*/
+ read_lock(&tasklist_lock);
spin_lock_irq(lock);
if (oldsig->group_exit) {
/*
@@ -611,6 +614,7 @@ static inline int de_thread(struct task_
* return so that the signal is processed.
*/
spin_unlock_irq(lock);
+ read_unlock(&tasklist_lock);
kmem_cache_free(sighand_cachep, newsighand);
if (newsig)
kmem_cache_free(signal_cachep, newsig);
@@ -629,12 +633,15 @@ static inline int de_thread(struct task_
oldsig->group_exit_task = current;
current->state = TASK_UNINTERRUPTIBLE;
spin_unlock_irq(lock);
+ read_unlock(&tasklist_lock);
schedule();
+ read_lock(&tasklist_lock);
spin_lock_irq(lock);
if (oldsig->group_exit_task)
BUG();
}
spin_unlock_irq(lock);
+ read_unlock(&tasklist_lock);

/*
* At this point all other threads have exited, all we have to


> - session-IDs and group-IDs are set outside the tasklist lock. This
> causes breakage in the USB code. The correct fix is to do this:

This is outside the part of the code that I've touched lately.


Thanks,
Roland

2003-02-09 11:54:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sun, 9 Feb 2003, Roland McGrath wrote:

> > - a read_lock(&tasklist_lock) is missing around the group_send_sig_info()
> > in send_sig_info().
>
> Indeed. I still intend to clean up those entry points and haven't
> gotten to it, so I hadn't bothered with this yet either (though I think
> I sent it to you for the backport). It certainly does bite in practice,
> e.g. SIGPIPE.

it does bite - here's the correctness fix meanwhile, until the interface
is cleaned up.

Ingo

--- linux/kernel/signal.c.orig
+++ linux/kernel/signal.c
@@ -1083,17 +1083,19 @@
int
send_sig_info(int sig, struct siginfo *info, struct task_struct *p)
{
+ int ret;
+
/* XXX should nix these interfaces and update the kernel */
- if (T(sig, SIG_KERNEL_BROADCAST_MASK))
- /* XXX do callers really always hold the tasklist_lock?? */
- return group_send_sig_info(sig, info, p);
- else {
- int error;
+ if (T(sig, SIG_KERNEL_BROADCAST_MASK)) {
+ read_lock(&tasklist_lock);
+ ret = group_send_sig_info(sig, info, p);
+ read_unlock(&tasklist_lock);
+ } else {
spin_lock_irq(&p->sighand->siglock);
- error = specific_send_sig_info(sig, info, p);
+ ret = specific_send_sig_info(sig, info, p);
spin_unlock_irq(&p->sighand->siglock);
- return error;
}
+ return ret;
}

int

2003-02-09 12:06:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sun, 9 Feb 2003, Roland McGrath wrote:

> > - session-IDs and group-IDs are set outside the tasklist lock. This
> > causes breakage in the USB code. The correct fix is to do this:
>
> This is outside the part of the code that I've touched lately.

yes, i introduced the bug with the new pidhash. Here's the fix, against
BK-curr.

Ingo

--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -503,6 +503,8 @@
extern struct mm_struct init_mm;

extern struct task_struct *find_task_by_pid(int pid);
+extern void set_special_pids(pid_t session, pid_t pgrp);
+extern void __set_special_pids(pid_t session, pid_t pgrp);

/* per-UID process charging. */
extern struct user_struct * alloc_uid(uid_t);
--- linux/fs/jffs/intrep.c.orig
+++ linux/fs/jffs/intrep.c
@@ -3344,8 +3344,7 @@
lock_kernel();
exit_mm(c->gc_task);

- current->session = 1;
- current->pgrp = 1;
+ set_special_pids(1, 1);
init_completion(&c->gc_thread_comp); /* barrier */
spin_lock_irq(&current->sighand->siglock);
siginitsetinv (&current->blocked, sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGCONT));
--- linux/kernel/exit.c.orig
+++ linux/kernel/exit.c
@@ -254,6 +254,29 @@
write_unlock_irq(&tasklist_lock);
}

+void __set_special_pids(pid_t session, pid_t pgrp)
+{
+ struct task_struct *curr = current;
+
+ if (curr->session != session) {
+ detach_pid(curr, PIDTYPE_SID);
+ curr->session = session;
+ attach_pid(curr, PIDTYPE_SID, session);
+ }
+ if (curr->pgrp != pgrp) {
+ detach_pid(curr, PIDTYPE_PGID);
+ curr->pgrp = pgrp;
+ attach_pid(curr, PIDTYPE_PGID, pgrp);
+ }
+}
+
+void set_special_pids(pid_t session, pid_t pgrp)
+{
+ write_lock_irq(&tasklist_lock);
+ __set_special_pids(session, pgrp);
+ write_unlock_irq(&tasklist_lock);
+}
+
/*
* Put all the gunge required to become a kernel thread without
* attached user resources in one place where it belongs.
@@ -271,8 +294,7 @@
*/
exit_mm(current);

- current->session = 1;
- current->pgrp = 1;
+ set_special_pids(1, 1);
current->tty = NULL;

/* Become as one with the init task */
--- linux/kernel/kmod.c.orig
+++ linux/kernel/kmod.c
@@ -100,8 +100,7 @@
int i;
struct task_struct *curtask = current;

- curtask->session = 1;
- curtask->pgrp = 1;
+ set_special_pids(1, 1);

use_init_fs_context();

--- linux/kernel/sys.c.orig
+++ linux/kernel/sys.c
@@ -1021,16 +1021,7 @@
goto out;

current->leader = 1;
- if (current->session != current->pid) {
- detach_pid(current, PIDTYPE_SID);
- current->session = current->pid;
- attach_pid(current, PIDTYPE_SID, current->pid);
- }
- if (current->pgrp != current->pid) {
- detach_pid(current, PIDTYPE_PGID);
- current->pgrp = current->pid;
- attach_pid(current, PIDTYPE_PGID, current->pid);
- }
+ __set_special_pids(current->pid, current->pid);
current->tty = NULL;
current->tty_old_pgrp = 0;
err = current->pgrp;

2003-02-09 12:10:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


Arjan pointed out that this one is needed as well:

--- linux/kernel/ksyms.c.orig
+++ linux/kernel/ksyms.c
@@ -604,6 +604,7 @@
EXPORT_SYMBOL(tasklist_lock);
EXPORT_SYMBOL(find_task_by_pid);
EXPORT_SYMBOL(next_thread);
+EXPORT_SYMBOL_GPL(set_special_pids);
#if defined(CONFIG_SMP) && defined(__GENERIC_PER_CPU)
EXPORT_SYMBOL(__per_cpu_offset);
#endif

2003-02-09 12:14:13

by Arjan van de Ven

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

On Sun, Feb 09, 2003 at 01:23:14PM +0100, Ingo Molnar wrote:
>
> Arjan pointed out that this one is needed as well:

but in second thought... modules don't really need to set the
pids at all, since reparent_to_init() is designed for doing so...

2003-02-10 01:01:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sun, 9 Feb 2003, Roland McGrath wrote:
>
> There is a similar failure to take the lock before using zap_other_threads.
> I thought I sent this patch before, but it's not in 2.5 yet.

Hmm.. From looking at this patch, it seems as if you believe that
spinlocks must next correctly. They don't have to.

The ABBA kind of deadlock only means that you have to _take_ the spinlocks
in the right order, you can release them in any order you like (as long as
you release all of them).

So if the only requirement is that zap_other_threads() is called with the
tasklist lock held, I would suggest just dropping the tasklist lock after
the zap_other threads thing, despite the fact that you still want to hold
on to the siglock for a while longer. That simplifies the patch, and means
that you don't have to worry about dropping and re-taking the tasklist
lock in the loop that follows.

Linus

2003-02-10 01:17:38

by Roland McGrath

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK

--- /home/roland/redhat/linux-2.5.59-1.1007/fs/exec.c.~1~ Fri Feb 7 20:04:27 2003
+++ /home/roland/redhat/linux-2.5.59-1.1007/fs/exec.c Sun Feb 9 17:25:31 2003
@@ -601,9 +601,12 @@ static inline int de_thread(struct task_

if (thread_group_empty(current))
goto no_thread_group;
+
/*
- * Kill all other threads in the thread group:
+ * Kill all other threads in the thread group.
+ * We must hold tasklist_lock to call zap_other_threads.
*/
+ read_lock(&tasklist_lock);
spin_lock_irq(lock);
if (oldsig->group_exit) {
/*
@@ -611,6 +614,7 @@ static inline int de_thread(struct task_
* return so that the signal is processed.
*/
spin_unlock_irq(lock);
+ read_unlock(&tasklist_lock);
kmem_cache_free(sighand_cachep, newsighand);
if (newsig)
kmem_cache_free(signal_cachep, newsig);
@@ -618,6 +622,7 @@ static inline int de_thread(struct task_
}
oldsig->group_exit = 1;
zap_other_threads(current);
+ read_unlock(&tasklist_lock);

/*
* Account for the thread group leader hanging around:

2003-02-10 08:44:24

by Ingo Molnar

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Sat, 8 Feb 2003, Linus Torvalds wrote:

> Interesting. Especially as the last thing exit_notify() does (just a few
> lines above the schedule()) is to do
>
> tsk->state = TASK_ZOMBIE;
>
> and that schedule() _really_ really shouldn't return. Regardless of any
> signal handler changes.

the proper way to avoid such scenarios (besides removing tasks from all
waitqueues) is to remove the thread from all the PID-hashes prior setting
it to TASK_ZOMBIE. Since any signal-sending needs a task pointer, and the
task pointer can only be found by looking it up in the hash, the proper
way to exclude signal related wakeups (without introducing special
handling into the wakeup code), is to remove the thread from the hash.

Ingo

2003-02-10 15:16:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: heavy handed exit() in latest BK


On Mon, 10 Feb 2003, Ingo Molnar wrote:
>
> > Interesting. Especially as the last thing exit_notify() does (just a few
> > lines above the schedule()) is to do
> >
> > tsk->state = TASK_ZOMBIE;
> >
> > and that schedule() _really_ really shouldn't return. Regardless of any
> > signal handler changes.
>
> the proper way to avoid such scenarios (besides removing tasks from all
> waitqueues) is to remove the thread from all the PID-hashes prior setting
> it to TASK_ZOMBIE.

Not a good idea.

We still want to find zombie processes, since they show up in "ps"
listings etc. And I don't think sending a signal to a zombie process
should return ESRCH, since it's there.

Btw, I fixed it by making all wake-up events give a mask of which states
can be woken up. That's really what SIGCONT wanted anyway (only wake up
stopped tasks), _and_ it's what default_wake_function() really wanted.

Linus