Date: Tue, 1 Jun 2004 22:57:03 -0700
From: Andrew Morton <akpm@osdl.org>
To: Jeremy Kerr <jeremy@redfishsoftware.com.au>
Cc: linux-kernel@vger.kernel.org, Linus Torvalds <torvalds@osdl.org>
Subject: Re: [PATCH] Fix signal race during process exit
Message-Id: <20040601225703.6c697bed.akpm@osdl.org>
In-Reply-To: <200406021213.58305.jeremy@redfishsoftware.com.au>
References: <200406021213.58305.jeremy@redfishsoftware.com.au>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2327
Lines: 63

Jeremy Kerr <jeremy@redfishsoftware.com.au> wrote:
>
> This patch fixes a race where timer-generated signals are delivered to an 
>  exiting process, after task->sighand is cleared.

Nasty.  I'm surprised that we haven't hit this more frequently.  I guess
timer-generated signals aren't very common.


However I'm not sure that your fix is complete:

void update_process_times(int user_tick)
{
	struct task_struct *p = current;
	int cpu = smp_processor_id(), system = user_tick ^ 1;

	/* Don't send signals to current after release_task() */
	if (likely(p->sighand))
		update_one_process(p, user_tick, system, cpu);

versus:

void __exit_sighand(struct task_struct *tsk)
{
	struct sighand_struct * sighand = tsk->sighand;

	/* Ok, we're done with the signal handlers.
	 * Set sighand to NULL to tell kernel/timer.c not
	 * to deliver further signals to this task
	 */
	tsk->sighand = NULL;
	if (atomic_dec_and_test(&sighand->count))
		kmem_cache_free(sighand_cachep, sighand);

If these two functions are running on different CPUs then the race is still
there - exit_sighand() can call update_process_times() while __exit_sighand
is throwing away p->sighand.

Question is, can these functions run on separate CPUs?  Certainly a
different CPU can run release_task(), via wait4().

And there's a little window at the end of exit_notify() where the exitting
task (which is still "current" on its CPU) can take a timer interrupt while
in a state TASK_ZOMBIE.  The CPU which is running wait4() will run
release_task() for the exitting task and the above race can occur.

(And if the exitting task is being ptraced things get more complex..)

Did I miss something?

It seems silly to be trying to deliver timer signals to processes which are
so late in exit and we could perhaps set ->it_prof_value and
->it_virt_value to zero earlier in exit.  That's sane, but doesn't fix the
race.

Right now, I see no alternative to adding locking which pins task->sighand
while the timer handler is running.  Taking tasklist_lock on each timer
tick will hurt - maybe a new per-process lock is needed?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/