Hey Ingo,
So we've been seeing the following trace fairly frequently on our SMP
boxes when running kernbench:
BUG: at kernel/softirq.c:639 __tasklet_action()
Call Trace:
[<ffffffff8106d5da>] dump_trace+0xaa/0x32a
[<ffffffff8106d89b>] show_trace+0x41/0x5c
[<ffffffff8106d8cb>] dump_stack+0x15/0x17
[<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
[<ffffffff81094f76>] tasklet_action+0x27/0x29
[<ffffffff8109530a>] ksoftirqd+0x16c/0x271
[<ffffffff81033d4d>] kthread+0xf5/0x128
[<ffffffff8105ff68>] child_rip+0xa/0x12
Paul also pointed this out a while back: http://lkml.org/lkml/2007/2/25/1
Anyway, I think I finally found the issue. It's a bit hard to explain,
but the idea is this: while __tasklet_action is running the tasklet
function on CPU1, tasklet_schedule() is called on CPU2, and CPU2 is
preempted right after it sets the TASKLET_STATE_SCHED bit.
__tasklet_action on CPU1 can then re-run the function, clear the bit
and unlock the tasklet before CPU2 enters __tasklet_common_schedule.
Once __tasklet_common_schedule locks the tasklet, we will add the
tasklet to the list with the TASKLET_STATE_SCHED *unset*.
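Roughly, the schedule side looks something like this (just a sketch to
show where the window is; typed from memory, not the actual -rt code,
and the _sketch name is made up):

static inline void tasklet_schedule_sketch(struct tasklet_struct *t)
{
	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
		/*
		 * Window: if we are preempted right here, __tasklet_action
		 * on the other CPU can re-run the function, clear SCHED and
		 * drop the RUN lock before we ever get to queue the tasklet
		 * below.
		 */
		__tasklet_schedule(t);	/* -> __tasklet_common_schedule() */
	}
}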
I've verified this race occurs w/ a WARN_ON in
__tasklet_common_schedule().
This fix avoids this race by making sure *after* we've locked the
tasklet that the STATE_SCHED bit is set before adding it to the list.
Does it look ok to you?
thanks
-john
Signed-off-by: John Stultz <[email protected]>
Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c 2007-06-05 18:30:54.000000000 -0700
+++ 2.6-rt/kernel/softirq.c 2007-06-05 18:36:44.000000000 -0700
@@ -544,10 +544,17 @@ static void inline
__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
{
if (tasklet_trylock(t)) {
- WARN_ON(t->next != NULL);
- t->next = head->list;
- head->list = t;
- raise_softirq_irqoff(nr);
+ /* We may have been preempted before tasklet_trylock
+ * and __tasklet_action may have already run.
+ * So double check the sched bit while the takslet
+ * is locked before adding it to the list.
+ */
+ if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
+ WARN_ON(t->next != NULL);
+ t->next = head->list;
+ head->list = t;
+ raise_softirq_irqoff(nr);
+ }
tasklet_unlock(t);
}
}
* john stultz <[email protected]> wrote:
> This fix avoids this race by making sure *after* we've locked the
> tasklet that the STATE_SCHED bit is set before adding it to the list.
>
> Does it look ok to you?
ah - nice!! What would be the worst-case effect of this bug? (besides
the WARN_ON()?) In any case, this fix will be in -rt10. Great work!
Ingo
On 06/06/07, john stultz <[email protected]> wrote:
> --- 2.6-rt.orig/kernel/softirq.c 2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c 2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
> __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
> {
> if (tasklet_trylock(t)) {
> - WARN_ON(t->next != NULL);
> - t->next = head->list;
> - head->list = t;
> - raise_softirq_irqoff(nr);
> + /* We may have been preempted before tasklet_trylock
> + * and __tasklet_action may have already run.
> + * So double check the sched bit while the takslet
s/takslet/tasklet/
--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html
On Wed, 2007-06-06 at 11:45 +0200, Ingo Molnar wrote:
> * john stultz <[email protected]> wrote:
>
> > This fix avoids this race by making sure *after* we've locked the
> > tasklet that the STATE_SCHED bit is set before adding it to the list.
> >
> > Does it look ok to you?
>
> ah - nice!! What would be the worst-case effect of this bug? (besides
> the WARN_ON()?) In any case, this fix will be in -rt10. Great work!
The only side effect I can come up w/ is that you might run a tasklet
3 times if it was only scheduled twice.
thanks
-john
On Tue, 2007-06-05 at 19:17 -0700, john stultz wrote:
> Hey Ingo,
> So we've been seeing the following trace fairly frequently on our SMP
> boxes when running kernbench:
>
> BUG: at kernel/softirq.c:639 __tasklet_action()
>
> Call Trace:
> [<ffffffff8106d5da>] dump_trace+0xaa/0x32a
> [<ffffffff8106d89b>] show_trace+0x41/0x5c
> [<ffffffff8106d8cb>] dump_stack+0x15/0x17
> [<ffffffff81094a97>] __tasklet_action+0xdf/0x12e
> [<ffffffff81094f76>] tasklet_action+0x27/0x29
> [<ffffffff8109530a>] ksoftirqd+0x16c/0x271
> [<ffffffff81033d4d>] kthread+0xf5/0x128
> [<ffffffff8105ff68>] child_rip+0xa/0x12
>
>
> Paul also pointed this out a while back: http://lkml.org/lkml/2007/2/25/1
>
>
> Anyway, I think I finally found the issue. It's a bit hard to explain,
> but the idea is this: while __tasklet_action is running the tasklet
> function on CPU1, tasklet_schedule() is called on CPU2, and CPU2 is
> preempted right after it sets the TASKLET_STATE_SCHED bit.
> __tasklet_action on CPU1 can then re-run the function, clear the bit
> and unlock the tasklet before CPU2 enters __tasklet_common_schedule.
> Once __tasklet_common_schedule locks the tasklet, we will add the
> tasklet to the list with the TASKLET_STATE_SCHED *unset*.
>
> I've verified this race occurs w/ a WARN_ON in
> __tasklet_common_schedule().
>
>
> This fix avoids this race by making sure *after* we've locked the
> tasklet that the STATE_SCHED bit is set before adding it to the list.
>
> Does it look ok to you?
>
> thanks
> -john
>
> Signed-off-by: John Stultz <[email protected]>
>
> Index: 2.6-rt/kernel/softirq.c
> ===================================================================
> --- 2.6-rt.orig/kernel/softirq.c 2007-06-05 18:30:54.000000000 -0700
> +++ 2.6-rt/kernel/softirq.c 2007-06-05 18:36:44.000000000 -0700
> @@ -544,10 +544,17 @@ static void inline
> __tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
> {
> if (tasklet_trylock(t)) {
> - WARN_ON(t->next != NULL);
> - t->next = head->list;
> - head->list = t;
> - raise_softirq_irqoff(nr);
> + /* We may have been preempted before tasklet_trylock
> + * and __tasklet_action may have already run.
> + * So double check the sched bit while the takslet
> + * is locked before adding it to the list.
> + */
> + if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> + WARN_ON(t->next != NULL);
> + t->next = head->list;
> + head->list = t;
> + raise_softirq_irqoff(nr);
> + }
> tasklet_unlock(t);
> }
> }
So while digging into a strange OOM issue we were seeing (which actually
ended up being fixed by Steven's softirq patch), I noticed that the fix
above is incomplete. With only the patch above, we may no longer have
unscheduled tasklets added to the list, but we may end up with scheduled
tasklets that are not on the list (and will stay that way!).
The following additional patch should correct this issue. Since we
weren't actually hitting it, though, the issue is a bit theoretical and
I've not been able to prove it really fixes anything.
thanks
-john
Index: 2.6-rt/kernel/softirq.c
===================================================================
--- 2.6-rt.orig/kernel/softirq.c 2007-06-12 20:30:11.000000000 -0700
+++ 2.6-rt/kernel/softirq.c 2007-06-12 20:37:22.000000000 -0700
@@ -544,6 +544,7 @@
__tasklet_common_schedule(struct tasklet_struct *t, struct tasklet_head *head, unsigned int nr)
{
if (tasklet_trylock(t)) {
+again:
/* We may have been preempted before tasklet_trylock
* and __tasklet_action may have already run.
* So double check the sched bit while the takslet
@@ -554,8 +555,21 @@
t->next = head->list;
head->list = t;
raise_softirq_irqoff(nr);
+ tasklet_unlock(t);
+ } else {
+ /* This is subtle. If we hit the corner case above,
+ * it is possible that we get preempted right here,
+ * and another task has successfully called
+ * tasklet_schedule(), then this function, and
+ * failed on the trylock. Thus we must be sure,
+ * before releasing the tasklet lock, that the
+ * SCHED bit is clear. Otherwise the tasklet
+ * may get its SCHED bit set, but not be added
+ * to the list.
+ */
+ if (!tasklet_tryunlock(t))
+ goto again;
}
- tasklet_unlock(t);
}
}
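For reference, the lock primitives the patch relies on look roughly
like this in my tree (again typed from memory, so the details, and the
TASKLET_STATEF_RUN name, may be slightly off):

static inline int tasklet_trylock(struct tasklet_struct *t)
{
	/* Take the RUN bit if nobody else holds it. */
	return !test_and_set_bit(TASKLET_STATE_RUN, &t->state);
}

static inline int tasklet_tryunlock(struct tasklet_struct *t)
{
	/*
	 * Only the RUN -> 0 transition is allowed: if any other bit
	 * (SCHED, or the PENDING bit used when the tasklet is disabled)
	 * is set, the cmpxchg fails and the caller has to deal with the
	 * extra state first.
	 */
	return cmpxchg(&t->state, TASKLET_STATEF_RUN, 0) == TASKLET_STATEF_RUN;
}

static inline void tasklet_unlock(struct tasklet_struct *t)
{
	smp_mb__before_clear_bit();
	clear_bit(TASKLET_STATE_RUN, &t->state);
}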
john stultz wrote:
>
> The following additional patch should correct this issue. Since we
> weren't actually hitting it, though, the issue is a bit theoretical and
> I've not been able to prove it really fixes anything.
Could you please look at the message below? I sent it privately nearly a month
ago, but I think these problems are not fixed yet.
Oleg.
---------------------------------------------------------------------------
> +__tasklet_action(struct softirq_action *a, struct tasklet_struct *list)
> +{
> + int loops = 1000000;
>
> while (list) {
> struct tasklet_struct *t = list;
>
> list = list->next;
> + /*
> + * Should always succeed - after a tasklet got on the
> + * list (after getting the SCHED bit set from 0 to 1),
> + * nothing but the tasklet softirq it got queued to can
> + * lock it:
> + */
> + if (!tasklet_trylock(t)) {
> + WARN_ON(1);
> + continue;
> + }
> +
> + t->next = NULL;
> +
> + /*
> + * If we cannot handle the tasklet because it's disabled,
> + * mark it as pending. tasklet_enable() will later
> + * re-schedule the tasklet.
> + */
> + if (unlikely(atomic_read(&t->count))) {
> +out_disabled:
> + /* implicit unlock: */
> + wmb();
> + t->state = TASKLET_STATEF_PENDING;
What if tasklet_enable() happens just before this line?
After the next tasklet_schedule() we have all bits set: SCHED, RUN, PENDING.
The next invocation of __tasklet_action() clears SCHED, but tasklet_tryunlock()
below can never succeed because of PENDING.
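(For completeness: I'm assuming tasklet_enable() does something like
the sketch below to honour the PENDING bit, as the quoted comment
("tasklet_enable() will later re-schedule the tasklet") implies. I
don't have the exact -rt code in front of me, so treat this as an
approximation.)

static inline void tasklet_enable_sketch(struct tasklet_struct *t)
{
	if (!atomic_dec_and_test(&t->count))
		return;
	/*
	 * Last enable: if __tasklet_action() deferred the tasklet while
	 * it was disabled, re-schedule it now. In the race above this
	 * runs before PENDING has actually been set, so the bit is left
	 * behind and later makes tasklet_tryunlock() fail forever.
	 */
	if (test_and_clear_bit(TASKLET_STATE_PENDING, &t->state))
		tasklet_schedule(t);
}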
> +again:
> + t->func(t->data);
> +
> + /*
> + * Try to unlock the tasklet. We must use cmpxchg, because
> + * another CPU might have scheduled or disabled the tasklet.
> + * We only allow the STATE_RUN -> 0 transition here.
> + */
> + while (!tasklet_tryunlock(t)) {
> + /*
> + * If it got disabled meanwhile, bail out:
> + */
> + if (atomic_read(&t->count))
> + goto out_disabled;
> + /*
> + * If it got scheduled meanwhile, re-execute
> + * the tasklet function:
> + */
> + if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
> + goto again;
TASKLET_STATE_SCHED could be set by tasklet_kill(); in this case it is not nice
to call t->func() again (but probably harmless).
On Fri, 2007-06-15 at 19:52 +0400, Oleg Nesterov wrote:
> john stultz wrote:
> >
> > The following additional patch should correct this issue. Since we
> > weren't actually hitting it, though, the issue is a bit theoretical and
> > I've not been able to prove it really fixes anything.
>
> Could you please look at the message below? I sent it privately nearly a month
> ago, but I think these problems are not fixed yet.
Hmm. Maybe you sent it to others on the cc list, as I can't find it in
my box. But apologies anyway.
> ---------------------------------------------------------------------------
>
> > +__tasklet_action(struct softirq_action *a, struct tasklet_struct *list)
> > +{
> > + int loops = 1000000;
> >
> > while (list) {
> > struct tasklet_struct *t = list;
> >
> > list = list->next;
> > + /*
> > + * Should always succeed - after a tasklet got on the
> > + * list (after getting the SCHED bit set from 0 to 1),
> > + * nothing but the tasklet softirq it got queued to can
> > + * lock it:
> > + */
> > + if (!tasklet_trylock(t)) {
> > + WARN_ON(1);
> > + continue;
> > + }
> > +
> > + t->next = NULL;
> > +
> > + /*
> > + * If we cannot handle the tasklet because it's disabled,
> > + * mark it as pending. tasklet_enable() will later
> > + * re-schedule the tasklet.
> > + */
> > + if (unlikely(atomic_read(&t->count))) {
> > +out_disabled:
> > + /* implicit unlock: */
> > + wmb();
> > + t->state = TASKLET_STATEF_PENDING;
>
> What if tasklet_enable() happens just before this line?
>
> After the next tasklet_schedule() we have all bits set: SCHED, RUN, PENDING.
> The next invocation of __tasklet_action() clears SCHED, but tasklet_tryunlock()
> below can never succeed because of PENDING.
Yep. I've only been focusing on races in schedule/action, as I've been
hunting issues w/ rcu. But I'll agree that the other state changes look
problematic. I know Paul McKenney was looking at some of the other state
changes and was seeing some potential problems as well.
thanks
-john
On 06/15, john stultz wrote:
>
> On Fri, 2007-06-15 at 19:52 +0400, Oleg Nesterov wrote:
> >
> > Could you please look at the message below? I sent it privately nearly a month
> > ago, but I think these problems are not fixed yet.
>
> Hmm. Maybe you sent it to others on the cc list, as I can't find it in
> my box. But apologies anyway.
checking my mbox... Oops, you are right, sorry!
> > > + if (unlikely(atomic_read(&t->count))) {
> > > +out_disabled:
> > > + /* implicit unlock: */
> > > + wmb();
> > > + t->state = TASKLET_STATEF_PENDING;
> >
> > What if tasklet_enable() happens just before this line?
> >
> > After the next tasklet_schedule() we have all bits set: SCHED, RUN, PENDING.
> > The next invocation of __tasklet_action() clears SCHED, but tasklet_tryunlock()
> > below can never succeed because of PENDING.
>
> Yep. I've only been focusing on races in schedule/action, as I've been
> hunting issues w/ rcu. But I'll agree that the other state changes look
> problematic. I know Paul McKenney was looking at some of the other state
> changes and was seeing some potential problems as well.
OK, thanks. But doesn't this mean your 2nd patch is questionable?
> + } else {
> + /* This is subtle. If we hit the corner case above,
> + * it is possible that we get preempted right here,
> + * and another task has successfully called
> + * tasklet_schedule(), then this function, and
> + * failed on the trylock. Thus we must be sure,
> + * before releasing the tasklet lock, that the
> + * SCHED bit is clear. Otherwise the tasklet
> + * may get its SCHED bit set, but not be added
> + * to the list.
> + */
> + if (!tasklet_tryunlock(t))
> + goto again;
Again, tasklet_tryunlock() can fail because _PENDING was set by __tasklet_action().
In that case __tasklet_common_schedule() goes into an endless loop, no?
Oleg.