LinuxLists.cc - Posix timers vs exec problems

2005-11-06 13:18:40

Subject: Posix timers vs exec problems

Roland, George, I think posix timers have problems with de_thread().

First, when non-leader thread does exec it calls release_task(->group_leader)
before calling exit_itimers(). This means that send_group_sigqueue() can oops
after taking tasklist_lock while doing spin_lock_irqsave(->sighand->siglock).
This is easy to fix.

cpu-timers have another problem. In !CPUCLOCK_PERTHREAD() case we attach the
timer to ->signal->cpu_timers, so these timers (when created by another process)
will survive after exec. However they will continue to profile execed process
only if it was group_leader who did exec, otherwise posix_cpu_timer_schedule()
will notice that timer->it.cpu.task has ->signal == NULL and stop this timer.

So, should de_thread() call posix_cpu_timers_exit_group() after exit_itimers()
thus stoping cpu-timers after exec? Another option is to change ->it.cpu.task for
each timer in ->signal->cpu_timers, this way cpu-timers will continue to work.
But this is not trivial.

Oleg.

2005-11-16 23:27:29

by George Anzinger

[permalink] [raw]

Subject: [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32

Source: MontaVista Software, Inc.
Type: Defect Fix
Description:
It appeares that the sigaction system call, when processing a SIG_IGN or
a SIG_DFL, is removing only signals that appear in the first mask word.

Signed-off-by: George Anzinger <[email protected]>

include/linux/signal.h | 16 ++++++++++++++++
kernel/signal.c | 34 ++++++++++++++++++++++++++++++++--
2 files changed, 48 insertions(+), 2 deletions(-)

Index: linux-2.6.15-rc/kernel/signal.c
===================================================================
--- linux-2.6.15-rc.orig/kernel/signal.c
+++ linux-2.6.15-rc/kernel/signal.c
@@ -633,6 +633,33 @@ void signal_wake_up(struct task_struct *
* Returns 1 if any signals were found.
*
* All callers must be holding the siglock.
+ *
+ * This version takes a sigset mask and looks at all signals,
+ * not just those in the first mask word.
+ */
+static int rm_from_queue_full(sigset_t *mask, struct sigpending *s)
+{
+ struct sigqueue *q, *n;
+ sigset_t m;
+
+ sigandsets(&m, mask, &s->signal);
+ if (sigisemptyset(&m))
+ return 0;
+
+ signandsets(&s->signal, &s->signal, mask);
+ list_for_each_entry_safe(q, n, &s->list, list) {
+ if (sigismember(mask, q->info.si_signo)) {
+ list_del_init(&q->list);
+ __sigqueue_free(q);
+ }
+ }
+ return 1;
+}
+/*
+ * Remove signals in mask from the pending set and queue.
+ * Returns 1 if any signals were found.
+ *
+ * All callers must be holding the siglock.
*/
static int rm_from_queue(unsigned long mask, struct sigpending *s)
{
@@ -2471,6 +2498,7 @@ int
do_sigaction(int sig, const struct k_sigaction *act, struct k_sigaction *oact)
{
struct k_sigaction *k;
+ sigset_t mask;

if (!valid_signal(sig) || sig < 1 || (act && sig_kernel_only(sig)))
return -EINVAL;
@@ -2518,9 +2546,11 @@ do_sigaction(int sig, const struct k_sig
*k = *act;
sigdelsetmask(&k->sa.sa_mask,
sigmask(SIGKILL) | sigmask(SIGSTOP));
- rm_from_queue(sigmask(sig), &t->signal->shared_pending);
+ sigemptyset(&mask);
+ sigaddset(&mask, sig);
+ rm_from_queue_full(&mask, &t->signal->shared_pending);
do {
- rm_from_queue(sigmask(sig), &t->pending);
+ rm_from_queue_full(&mask, &t->pending);
recalc_sigpending_tsk(t);
t = next_thread(t);
} while (t != current);
Index: linux-2.6.15-rc/include/linux/signal.h
===================================================================
--- linux-2.6.15-rc.orig/include/linux/signal.h
+++ linux-2.6.15-rc/include/linux/signal.h
@@ -82,6 +82,22 @@ static inline int sigfindinword(unsigned

#endif /* __HAVE_ARCH_SIG_BITOPS */

+static inline int sigisemptyset(sigset_t *set)
+{
+ extern void _NSIG_WORDS_is_unsupported_size(void);
+ switch (_NSIG_WORDS) {
+ case 4:
+ return (set->sig[3] | set->sig[2] |
+ set->sig[1] | set->sig[0]) == 0;
+ case 2:
+ return (set->sig[1] | set->sig[0]) == 0;
+ case 1:
+ return set->sig[0] == 0;
+ default:
+ _NSIG_WORDS_is_unsupported_size();
+ }
+}
+
#define sigmask(sig) (1UL << ((sig) - 1))

#ifndef __HAVE_ARCH_SIG_SETOPS

Attachments:

sigaction-fix.patch (2.96 kB)

2005-11-22 01:11:14

by George Anzinger

[permalink] [raw]

Subject: Thread group exec race -> null pointer... HELP

George Anzinger wrote:
> While rooting aroung in the signal code trying to understand how to
> fix the SIG_IGN ploy (set sig handler to SIG_IGN and flood system with
> high speed repeating timers) I came across what, I think, is a problem
> in sigaction() in that when processing a SIG_IGN request it flushes
> signals from 1 to SIGRTMIN and leaves the rest.

Still rooting around in the above. The test program is attached. It
creates and arms a repeating timer and then clones a thread which does
an exec() call.

If I run the test with top (only two programs running) I quickly get
an OOPS on trying to derefence a NULL pointer. It is comming from a
call the posix timer code is making to deliver a timer. Call is to
send_group_sigqueue() at ~445 in posix-timers.c. The process being
passed in is DEAD with current->signal ==NULL, thus the OOPS. In the
first instance of this, we see that the thread-group leader is dead
and the exec code at line ~718 is setting the old leaders group-leader
to him self. The failure then happens when the IRQ release is done on
the write_unlock_irq() at ~732 thus allowing the timer interrupt.

Thinking that it makes no real sense to set the group leader to a dead
process, I did the following:

--- linux-2.6.15-rc.orig/fs/exec.c
+++ linux-2.6.15-rc/fs/exec.c
@@ -715,7 +715,7 @@ static inline int de_thread(struct task_
current->parent = current->real_parent = leader->real_parent;
leader->parent = leader->real_parent = child_reaper;
current->group_leader = current;
- leader->group_leader = leader;
+ leader->group_leader = current;

add_parent(current, current->parent);
add_parent(leader, leader->parent);

This also fails as there is still a window where the group leader is
dead with a null signal pointer, i.e. the interrupt happens (this time
on another cpu) before the above changed code is executed.

It seems to me that the group leader needs to change prior to setting
the signal pointer to NULL, but I don't really know this code very well.

Help !
--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

Attachments:

timer_kill.c (1.17 kB)

2005-11-22 13:31:33

by Oleg Nesterov

[permalink] [raw]

Subject: Re: Thread group exec race -> null pointer... HELP

George Anzinger wrote:
>
> Still rooting around in the above. The test program is attached. It
> creates and arms a repeating timer and then clones a thread which does
> an exec() call.

This patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113138286512847

was intended to fix exactly this problem (and the same test program was
used to exploit the race and test the fix).

So, it does not help? I can't reproduce the problem.

Note: I think you also need this patch:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113059955626598

otherwise I beleive OOPS can happen while killing this program if you are
running the kernel with this change applied:

[PATCH] Call exit_itimers from do_exit, not __exit_signal
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=25f407f0b668f5e4ebd5d13e1fb4306ba6427ead

> first instance of this, we see that the thread-group leader is dead
> and the exec code at line ~718 is setting the old leaders group-leader
> to him self.

I think this code at line ~718

leader->group_leader = leader;

is noop, because leader->group_leader == leader here.

> - leader->group_leader = leader;
> + leader->group_leader = current;

This can't help, without SIGEV_THREAD_ID we don't check ->group_leader,
the signal goes to the thread group via timer->it_process, which is equal
to the old leader.

Oleg.

2005-11-23 20:32:49

by George Anzinger

[permalink] [raw]

Subject: Re: Thread group exec race -> null pointer... HELP

Oleg Nesterov wrote:
> George Anzinger wrote:
>
>>Still rooting around in the above. The test program is attached. It
>>creates and arms a repeating timer and then clones a thread which does
>>an exec() call.
>
>
> This patch:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113138286512847
>
> was intended to fix exactly this problem (and the same test program was
> used to exploit the race and test the fix).
>
> So, it does not help? I can't reproduce the problem.

Yes, it does fix it. Somehow I missed the posting of that patch.
>
> Note: I think you also need this patch:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113059955626598
>
> otherwise I beleive OOPS can happen while killing this program if you are
> running the kernel with this change applied:
>
> [PATCH] Call exit_itimers from do_exit, not __exit_signal
> http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=25f407f0b668f5e4ebd5d13e1fb4306ba6427ead
>
>
>
>>first instance of this, we see that the thread-group leader is dead
>>and the exec code at line ~718 is setting the old leaders group-leader
>>to him self.
>
>
> I think this code at line ~718
>
> leader->group_leader = leader;
>
> is noop, because leader->group_leader == leader here.
>
>
>>- leader->group_leader = leader;
>>+ leader->group_leader = current;
>
>
> This can't help, without SIGEV_THREAD_ID we don't check ->group_leader,
> the signal goes to the thread group via timer->it_process, which is equal
> to the old leader.

The signal code returns <0 so posix-timers digs into up the
group_leader and trys again. Still, the patch fixes it all.

--
George Anzinger [email protected]
HRT (High-res-timers): http://sourceforge.net/projects/high-res-timers/

2005-11-25 15:04:21

by Ingo Molnar

[permalink] [raw]

Subject: Re: Thread group exec race -> null pointer... HELP

* George Anzinger <[email protected]> wrote:

> >This patch:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=113138286512847
> >
> >was intended to fix exactly this problem (and the same test program was
> >used to exploit the race and test the fix).
> >
> >So, it does not help? I can't reproduce the problem.
>
> Yes, it does fix it. Somehow I missed the posting of that patch.

fyi, i've included the patch in -rt15.

Ingo