2004-10-23 04:17:46

by Roland McGrath

[permalink] [raw]
Subject: Re: Fw: BUG_ONs in signal.c?

Once group_exit is set, it should never be cleared and group_exit_code
should never be changed. It's set at the beginning of do_coredump. If
do_coredump returned nonzero, there should be no way group_exit_code could
have changed from the value do_coredump set. If you hit one of those
BUG_ON checks, there is a problem I don't understand. I would like to know
how to reproduce it.


Thanks,
Roland


2004-10-23 04:31:40

by Roland McGrath

[permalink] [raw]
Subject: Re: Fw: BUG_ONs in signal.c?

Oh, duh. The race is obvious. Sorry for the confusion there.
I think this is the way to fix it.

--- linux-2.6/kernel/signal.c 19 Oct 2004 15:03:02 -0000 1.143
+++ linux-2.6/kernel/signal.c 23 Oct 2004 04:23:31 -0000
@@ -1909,22 +1910,16 @@ relock:
* Anything else is fatal, maybe with a core dump.
*/
current->flags |= PF_SIGNALED;
- if (sig_kernel_coredump(signr) &&
- do_coredump((long)signr, signr, regs)) {
+ if (sig_kernel_coredump(signr)) {
/*
- * That killed all other threads in the group and
- * synchronized with their demise, so there can't
- * be any more left to kill now. The group_exit
- * flags are set by do_coredump. Note that
- * thread_group_empty won't always be true yet,
- * because those threads were blocked in __exit_mm
- * and we just let them go to finish dying.
- */
- const int code = signr | 0x80;
- BUG_ON(!current->signal->group_exit);
- BUG_ON(current->signal->group_exit_code != code);
- do_exit(code);
- /* NOTREACHED */
+ * If it was able to dump core, this kills all
+ * other threads in the group and synchronizes with
+ * their demise. If we lost the race with another
+ * thread getting here, it set group_exit_code
+ * first and our do_group_exit call below will use
+ * that value and ignore the one we pass it.
+ */
+ do_coredump((long)signr, signr, regs);
}

/*


While looking at this, I noticed a bug (not directly related) in do_coredump.
It was setting the "core dumped" flag even when the format dumping hook
failed (e.g. for memory allocation failures).


--- linux-2.6/fs/exec.c 19 Oct 2004 15:05:13 -0000 1.146
+++ linux-2.6/fs/exec.c 23 Oct 2004 04:23:42 -0000
@@ -1417,7 +1417,8 @@ int do_coredump(long signr, int exit_cod

retval = binfmt->core_dump(signr, regs, file);

- current->signal->group_exit_code |= 0x80;
+ if (retval)
+ current->signal->group_exit_code |= 0x80;
close_fail:
filp_close(file, NULL);
fail_unlock:



Thanks,
Roland

2004-10-23 05:58:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: Fw: BUG_ONs in signal.c?



On Fri, 22 Oct 2004, Roland McGrath wrote:
>
> Once group_exit is set, it should never be cleared and group_exit_code
> should never be changed.

Hmm? Another signal that kills another thread, but isn't a core-dump
signal, will go through the __group_complete_signal() code in
kernel/signal.c, and do

p->signal->group_exit_code = sig;

adn the only locking there is the siglock/tasklist_lock as far as I can
see.

So as far as I can tell, I see

coredump thread other thread
=============== ============

do_coredump()
current->signal->group_exit_code = exit_code
coredump_wait(mm);

/* gets fatal non-coredump signal */
current->signal->group_exit_code = sig;
...
BUG_ON(current->signal->group_exit_code != exit_code);
!!BOOM!!

No?

Linus

2004-10-25 16:28:17

by Jesse Barnes

[permalink] [raw]
Subject: Re: Fw: BUG_ONs in signal.c?

On Friday, October 22, 2004 9:29 pm, Roland McGrath wrote:
> Oh, duh. The race is obvious. Sorry for the confusion there.
> I think this is the way to fix it.
>
> --- linux-2.6/kernel/signal.c 19 Oct 2004 15:03:02 -0000 1.143
> +++ linux-2.6/kernel/signal.c 23 Oct 2004 04:23:31 -0000
> @@ -1909,22 +1910,16 @@ relock:
> * Anything else is fatal, maybe with a core dump.
> */
> current->flags |= PF_SIGNALED;
> - if (sig_kernel_coredump(signr) &&
> - do_coredump((long)signr, signr, regs)) {
> + if (sig_kernel_coredump(signr)) {
> /*
> - * That killed all other threads in the group and
> - * synchronized with their demise, so there can't
> - * be any more left to kill now. The group_exit
> - * flags are set by do_coredump. Note that
> - * thread_group_empty won't always be true yet,
> - * because those threads were blocked in __exit_mm
> - * and we just let them go to finish dying.
> - */
> - const int code = signr | 0x80;
> - BUG_ON(!current->signal->group_exit);
> - BUG_ON(current->signal->group_exit_code != code);
> - do_exit(code);
> - /* NOTREACHED */
> + * If it was able to dump core, this kills all
> + * other threads in the group and synchronizes with
> + * their demise. If we lost the race with another
> + * thread getting here, it set group_exit_code
> + * first and our do_group_exit call below will use
> + * that value and ignore the one we pass it.
> + */
> + do_coredump((long)signr, signr, regs);
> }
>
> /*

Yeah, this looks good, although we'll end up calling do_group_exit instead of
do_exit for the dumped task, is that ok?

> While looking at this, I noticed a bug (not directly related) in
> do_coredump. It was setting the "core dumped" flag even when the format
> dumping hook failed (e.g. for memory allocation failures).
>
>
> --- linux-2.6/fs/exec.c 19 Oct 2004 15:05:13 -0000 1.146
> +++ linux-2.6/fs/exec.c 23 Oct 2004 04:23:42 -0000
> @@ -1417,7 +1417,8 @@ int do_coredump(long signr, int exit_cod
>
> retval = binfmt->core_dump(signr, regs, file);
>
> - current->signal->group_exit_code |= 0x80;
> + if (retval)
> + current->signal->group_exit_code |= 0x80;
> close_fail:
> filp_close(file, NULL);
> fail_unlock:

Yeah, saw this too, sorry I forgot to mention it.

Thanks,
Jesse

2004-10-25 18:57:07

by Roland McGrath

[permalink] [raw]
Subject: Re: Fw: BUG_ONs in signal.c?

> Yeah, this looks good, although we'll end up calling do_group_exit
> instead of do_exit for the dumped task, is that ok?

It's necessary for correcetness. do_group_exit takes the siglock and
checks the group_exit and group_exit_code fields, so in the race situation
the thread gets the right exit code.


Thanks,
Roland