2008-11-10 13:39:44

by Oleg Nesterov

[permalink] [raw]
Subject: [PATCH] for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock

The patch is ugly, but I don't see the better fix for now. Needs
the review from Peter/Ingo.

Unlike other similar routines, account_group_exec_runtime() could be
called "implicitly" from within scheduler after exit_notify(). This
means we can race with the parent doing release_task(), we can't just
check ->signal != NULL.

Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
before __cleanup_signal() to make sure ->signal can't be freed under
task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
about the case when tsk changes cpu/rq under us, this should be OK.

Thanks to Ingo who nacked my previous buggy patch.

Signed-off-by: Oleg Nesterov <[email protected]>
Reported-by: Doug Chapman <[email protected]>

--- K-28/include/linux/sched.h~SIG_RQ_LOCK 2008-11-06 19:12:44.000000000 +0100
+++ K-28/include/linux/sched.h 2008-11-10 13:13:07.000000000 +0100
@@ -247,6 +247,7 @@ extern void init_idle(struct task_struct
extern void init_idle_bootup_task(struct task_struct *idle);

extern int runqueue_is_locked(void);
+extern void task_rq_unlock_wait(struct task_struct *p);

extern cpumask_t nohz_cpu_mask;
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
--- K-28/kernel/sched.c~SIG_RQ_LOCK 2008-11-06 19:12:44.000000000 +0100
+++ K-28/kernel/sched.c 2008-11-10 13:05:09.000000000 +0100
@@ -969,6 +969,14 @@ static struct rq *task_rq_lock(struct ta
}
}

+void task_rq_unlock_wait(struct task_struct *p)
+{
+ struct rq *rq = task_rq(p);
+
+ smp_mb();
+ spin_unlock_wait(&rq->lock);
+}
+
static void __task_rq_unlock(struct rq *rq)
__releases(rq->lock)
{
--- K-28/kernel/exit.c~SIG_RQ_LOCK 2008-11-06 19:11:02.000000000 +0100
+++ K-28/kernel/exit.c 2008-11-10 15:07:22.000000000 +0100
@@ -141,6 +141,11 @@ static void __exit_signal(struct task_st
if (sig) {
flush_sigqueue(&sig->shared_pending);
taskstats_tgid_free(sig);
+ /*
+ * Make sure ->signal can't go away under rq->lock,
+ * see account_group_exec_runtime().
+ */
+ task_rq_unlock_wait(tsk);
__cleanup_signal(sig);
}
}


2008-11-11 10:36:09

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock


* Oleg Nesterov <[email protected]> wrote:

> The patch is ugly, but I don't see the better fix for now. Needs the
> review from Peter/Ingo.

this is indeed too ugly, and if we do it we'll get both this ugliness
and the CPU loop upstream forever. Frank, if you dont have time to fix
this code, then i guess the best thing is to do the full revert that
Peter sent.

Regarding this teardown bug. Stupid question: why cannot the signal
structure live as long as the last user is around? It's a tiny amount
of RAM.

Ingo

2008-11-11 11:58:30

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH] for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock

On 11/11, Ingo Molnar wrote:
>
> * Oleg Nesterov <[email protected]> wrote:
>
> > The patch is ugly, but I don't see the better fix for now. Needs the
> > review from Peter/Ingo.
>
> this is indeed too ugly,

Agreed. It was "unless we find another fix for 2.6.28".

> Regarding this teardown bug. Stupid question: why cannot the signal
> structure live as long as the last user is around? It's a tiny amount
> of RAM.

Well, release_task()->__exit_signal() clears/frees ->signal exactly
because it doesn't (must not) have users any longer. And we have the
code which checks ->signal != NULL to know if the task was already
released or not.

Now scheduler wants to play with ->signal. We can change the code so
that we don't actually free it until the task does the last schedule.
Say, we can free it __from put_task_struct(). But this means we need
another counter in signal_struct (signal_struct->count can't work).
And, until we change the code which checks ->signal != NULL, we need
another pointer in task_struct.


Perhaps this makes sense regardless of this bug, but I don't think
this is 2.6.28 material anyway.

Oleg.

2008-11-11 17:11:19

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH] for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock

On Tue, 2008-11-11 at 11:35 +0100, Ingo Molnar wrote:
> * Oleg Nesterov <[email protected]> wrote:
> > The patch is ugly, but I don't see the better fix for now. Needs the
> > review from Peter/Ingo.
> this is indeed too ugly, and if we do it we'll get both this ugliness
> and the CPU loop upstream forever. Frank, if you dont have time to fix
> this code, then i guess the best thing is to do the full revert that
> Peter sent.

Well, at the moment I'm up to my armpits in alligators. That said,
we're going to have to pull in this code regardless, ugliness and all,
since we're guaranteed to run into the soft lockup bug otherwise. This
means that I'll have strong incentive to come back and readdress the fix
to remove the ugliness and address Peter's concerns. I have no idea
when that will be, however.
--
Frank Mayhar <[email protected]>
Google, Inc.

2008-11-11 17:17:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock


* Frank Mayhar <[email protected]> wrote:

> On Tue, 2008-11-11 at 11:35 +0100, Ingo Molnar wrote:
> > * Oleg Nesterov <[email protected]> wrote:
> > > The patch is ugly, but I don't see the better fix for now. Needs the
> > > review from Peter/Ingo.
> > this is indeed too ugly, and if we do it we'll get both this ugliness
> > and the CPU loop upstream forever. Frank, if you dont have time to fix
> > this code, then i guess the best thing is to do the full revert that
> > Peter sent.
>
> Well, at the moment I'm up to my armpits in alligators. That said,
> we're going to have to pull in this code regardless, ugliness and
> all, since we're guaranteed to run into the soft lockup bug
> otherwise. This means that I'll have strong incentive to come back
> and readdress the fix to remove the ugliness and address Peter's
> concerns. I have no idea when that will be, however.

well, we wont leave buggy code in there for .28 - it could trigger
anytime on any SMP box, no matter how narrow the race is.

I've picked up the spin-wait fix from Oleg, because it fixes the bug.
But we should really fix the fundamental issues here too.

Ingo

2008-11-11 17:29:19

by Frank Mayhar

[permalink] [raw]
Subject: Re: [PATCH] for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock

On Tue, 2008-11-11 at 18:16 +0100, Ingo Molnar wrote:
> * Frank Mayhar <[email protected]> wrote:
>
> > On Tue, 2008-11-11 at 11:35 +0100, Ingo Molnar wrote:
> > > * Oleg Nesterov <[email protected]> wrote:
> > > > The patch is ugly, but I don't see the better fix for now. Needs the
> > > > review from Peter/Ingo.
> > > this is indeed too ugly, and if we do it we'll get both this ugliness
> > > and the CPU loop upstream forever. Frank, if you dont have time to fix
> > > this code, then i guess the best thing is to do the full revert that
> > > Peter sent.
> >
> > Well, at the moment I'm up to my armpits in alligators. That said,
> > we're going to have to pull in this code regardless, ugliness and
> > all, since we're guaranteed to run into the soft lockup bug
> > otherwise. This means that I'll have strong incentive to come back
> > and readdress the fix to remove the ugliness and address Peter's
> > concerns. I have no idea when that will be, however.
>
> well, we wont leave buggy code in there for .28 - it could trigger
> anytime on any SMP box, no matter how narrow the race is.

Sorry, by "we" I meant where I work, not the Linux kernel folks; I guess
it's as true for you guys as it is for us but I'm certainly not in a
position to speak for you.

> I've picked up the spin-wait fix from Oleg, because it fixes the bug.
> But we should really fix the fundamental issues here too.

Agreed, in spades. In fact I plan to track this internally so it
doesn't fall off my radar.
--
Frank Mayhar <[email protected]>
Google, Inc.