We met a ksoftirqd 100% issue, the perf top shows kernel is busy
with tasklet_action(), but no actual action is shown. From dumped
kernel, there's only one disabled tasklet on the tasklet_vec.
tasklet_action might be handled after tasklet is disabled, this will
make disabled tasklet stayed on tasklet_vec. tasklet_action will not
handle disabled tasklet, but place it on the tail of tasklet_vec,
still raise softirq for this tasklet. Things will become worse if
device driver uses tasklet_disable on its device remove/close code.
The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
and make ksoftirqd wakeup even if no tasklets need to be handled.
This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
in tasklet_action(), simply ignore the disabled tasklet and don't raise
the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
it is the same as tasklet_enable(). So only tasklet_enable() needs to be
modified, if tasklet state is changed from disable to enable, use
__tasklet_schedule() to put it on the right vec.
Signed-off-by: Xiaotian Feng <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/interrupt.h | 12 ++++++++++--
kernel/softirq.c | 10 +++++-----
2 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 5e4e617..7e5bb00 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -521,7 +521,8 @@ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
enum
{
TASKLET_STATE_SCHED, /* Tasklet is scheduled for execution */
- TASKLET_STATE_RUN /* Tasklet is running (SMP only) */
+ TASKLET_STATE_RUN, /* Tasklet is running (SMP only) */
+ TASKLET_STATE_HI /* Tasklet is HI_SOFTIRQ */
};
#ifdef CONFIG_SMP
@@ -593,7 +594,14 @@ static inline void tasklet_disable(struct tasklet_struct *t)
static inline void tasklet_enable(struct tasklet_struct *t)
{
smp_mb__before_atomic_dec();
- atomic_dec(&t->count);
+ if (atomic_dec_and_test(&t->count)) {
+ if (!test_bit(TASKLET_STATE_SCHED, &t->state))
+ return;
+ if (test_bit(TASKLET_STATE_HI, &t->state))
+ __tasklet_hi_schedule(t);
+ else
+ __tasklet_schedule(t);
+ }
}
static inline void tasklet_hi_enable(struct tasklet_struct *t)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index cc96bdc..6d77aef 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -431,6 +431,7 @@ void __tasklet_hi_schedule(struct tasklet_struct *t)
*__this_cpu_read(tasklet_hi_vec.tail) = t;
__this_cpu_write(tasklet_hi_vec.tail, &(t->next));
raise_softirq_irqoff(HI_SOFTIRQ);
+ set_bit(TASKLET_STATE_HI, &t->state);
local_irq_restore(flags);
}
@@ -442,6 +443,7 @@ void __tasklet_hi_schedule_first(struct tasklet_struct *t)
t->next = __this_cpu_read(tasklet_hi_vec.head);
__this_cpu_write(tasklet_hi_vec.head, t);
+ set_bit(TASKLET_STATE_HI, &t->state);
__raise_softirq_irqoff(HI_SOFTIRQ);
}
@@ -467,10 +469,9 @@ static void tasklet_action(struct softirq_action *a)
if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
BUG();
t->func(t->data);
- tasklet_unlock(t);
- continue;
- }
+ }
tasklet_unlock(t);
+ continue;
}
local_irq_disable();
@@ -502,10 +503,9 @@ static void tasklet_hi_action(struct softirq_action *a)
if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
BUG();
t->func(t->data);
- tasklet_unlock(t);
- continue;
}
tasklet_unlock(t);
+ continue;
}
local_irq_disable();
--
1.7.9.5
On Fri, 2 Nov 2012 10:48:54 +0800
Xiaotian Feng <[email protected]> wrote:
> We met a ksoftirqd 100% issue, the perf top shows kernel is busy
> with tasklet_action(), but no actual action is shown. From dumped
> kernel, there's only one disabled tasklet on the tasklet_vec.
>
> tasklet_action might be handled after tasklet is disabled, this will
> make disabled tasklet stayed on tasklet_vec. tasklet_action will not
> handle disabled tasklet, but place it on the tail of tasklet_vec,
> still raise softirq for this tasklet. Things will become worse if
> device driver uses tasklet_disable on its device remove/close code.
> The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
> and make ksoftirqd wakeup even if no tasklets need to be handled.
>
> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
> in tasklet_action(), simply ignore the disabled tasklet and don't raise
> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
> it is the same as tasklet_enable(). So only tasklet_enable() needs to be
> modified, if tasklet state is changed from disable to enable, use
> __tasklet_schedule() to put it on the right vec.
gee, I haven't looked at the tasklet code in 100 years. I think I'll
send this in Thomas's direction ;)
The race description seems real and the patch looks sane to me. Are
you sure we can get away with never clearing TASKLET_STATE_HI? For
example, what would happen if code does a tasklet_hi_schedule(t) and
later does a tasklet_schedule(t)?
On Tue, Nov 6, 2012 at 6:52 AM, Andrew Morton <[email protected]> wrote:
> On Fri, 2 Nov 2012 10:48:54 +0800
> Xiaotian Feng <[email protected]> wrote:
>
>> We met a ksoftirqd 100% issue, the perf top shows kernel is busy
>> with tasklet_action(), but no actual action is shown. From dumped
>> kernel, there's only one disabled tasklet on the tasklet_vec.
>>
>> tasklet_action might be handled after tasklet is disabled, this will
>> make disabled tasklet stayed on tasklet_vec. tasklet_action will not
>> handle disabled tasklet, but place it on the tail of tasklet_vec,
>> still raise softirq for this tasklet. Things will become worse if
>> device driver uses tasklet_disable on its device remove/close code.
>> The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
>> and make ksoftirqd wakeup even if no tasklets need to be handled.
>>
>> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
>> in tasklet_action(), simply ignore the disabled tasklet and don't raise
>> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
>> it is the same as tasklet_enable(). So only tasklet_enable() needs to be
>> modified, if tasklet state is changed from disable to enable, use
>> __tasklet_schedule() to put it on the right vec.
>
> gee, I haven't looked at the tasklet code in 100 years. I think I'll
> send this in Thomas's direction ;)
>
> The race description seems real and the patch looks sane to me. Are
> you sure we can get away with never clearing TASKLET_STATE_HI? For
> example, what would happen if code does a tasklet_hi_schedule(t) and
> later does a tasklet_schedule(t)?
hmm, that will be a nightmare...
tasklet_schedule(t)/tasklet_hi_schedule(t) doesn't use list_head, they
simply
make t->next = NULL, then put t on the tail of
tasklet_vec/tasklet_hi_vec. If the code does a tasklet_hi_schedule()
and then a tasklet_schedule, the tasklet will stay on tasklet_vec and
tasklet_hi_vec .... tasklet_hi_action will handle it first and clear
the TASKLET_SCHED_SCHED bit, later, in tasklet_action, it will be
handled again and hit a BUG_ON ...
But if code does a tasklet_hi_schedule(), then tasklet_kil and later
does a tasklet_schedule(), we do need clear the TASKLET_STATE_HI. Also
we need to remove the tasklet_hi_enable() as it is the same as
tasklet_enable() and there's
only one user..
I'll send you V2 patch soon, thanks.
>
>
On Tue, 6 Nov 2012 09:22:16 +0800 Xiaotian Feng <[email protected]> wrote:
> On Tue, Nov 6, 2012 at 6:52 AM, Andrew Morton <[email protected]> wrote:
> > On Fri, 2 Nov 2012 10:48:54 +0800
> > Xiaotian Feng <[email protected]> wrote:
> >
> >> We met a ksoftirqd 100% issue, the perf top shows kernel is busy
> >> with tasklet_action(), but no actual action is shown. From dumped
> >> kernel, there's only one disabled tasklet on the tasklet_vec.
> >>
> >> tasklet_action might be handled after tasklet is disabled, this will
> >> make disabled tasklet stayed on tasklet_vec. tasklet_action will not
> >> handle disabled tasklet, but place it on the tail of tasklet_vec,
> >> still raise softirq for this tasklet. Things will become worse if
> >> device driver uses tasklet_disable on its device remove/close code.
> >> The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
> >> and make ksoftirqd wakeup even if no tasklets need to be handled.
> >>
> >> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
> >> in tasklet_action(), simply ignore the disabled tasklet and don't raise
> >> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
> >> it is the same as tasklet_enable(). So only tasklet_enable() needs to be
> >> modified, if tasklet state is changed from disable to enable, use
> >> __tasklet_schedule() to put it on the right vec.
> >
> > gee, I haven't looked at the tasklet code in 100 years. I think I'll
> > send this in Thomas's direction ;)
> >
> > The race description seems real and the patch looks sane to me. Are
> > you sure we can get away with never clearing TASKLET_STATE_HI? For
> > example, what would happen if code does a tasklet_hi_schedule(t) and
> > later does a tasklet_schedule(t)?
>
> hmm, that will be a nightmare...
> tasklet_schedule(t)/tasklet_hi_schedule(t) doesn't use list_head, they
> simply
> make t->next = NULL, then put t on the tail of
> tasklet_vec/tasklet_hi_vec. If the code does a tasklet_hi_schedule()
> and then a tasklet_schedule, the tasklet will stay on tasklet_vec and
> tasklet_hi_vec .... tasklet_hi_action will handle it first and clear
> the TASKLET_SCHED_SCHED bit, later, in tasklet_action, it will be
> handled again and hit a BUG_ON ...
Well, actually I meant if the caller reuses the tassklet_struct after
its softirq has been run.
> But if code does a tasklet_hi_schedule(), then tasklet_kil and later
> does a tasklet_schedule(), we do need clear the TASKLET_STATE_HI.
That as well ;)
> Also
> we need to remove the tasklet_hi_enable() as it is the same as
> tasklet_enable() and there's
> only one user..
>
> I'll send you V2 patch soon, thanks.
Sounds good.
On Mon, 2012-11-05 at 17:37 -0800, Andrew Morton wrote:
> On Tue, 6 Nov 2012 09:22:16 +0800 Xiaotian Feng <[email protected]> wrote:
>
> > On Tue, Nov 6, 2012 at 6:52 AM, Andrew Morton <[email protected]> wrote:
> > > On Fri, 2 Nov 2012 10:48:54 +0800
> > > Xiaotian Feng <[email protected]> wrote:
> > >
> > >> We met a ksoftirqd 100% issue, the perf top shows kernel is busy
> > >> with tasklet_action(), but no actual action is shown. From dumped
> > >> kernel, there's only one disabled tasklet on the tasklet_vec.
> > >>
> > >> tasklet_action might be handled after tasklet is disabled, this will
> > >> make disabled tasklet stayed on tasklet_vec. tasklet_action will not
> > >> handle disabled tasklet, but place it on the tail of tasklet_vec,
> > >> still raise softirq for this tasklet. Things will become worse if
> > >> device driver uses tasklet_disable on its device remove/close code.
> > >> The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
> > >> and make ksoftirqd wakeup even if no tasklets need to be handled.
> > >>
> > >> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
> > >> in tasklet_action(), simply ignore the disabled tasklet and don't raise
> > >> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
> > >> it is the same as tasklet_enable(). So only tasklet_enable() needs to be
> > >> modified, if tasklet state is changed from disable to enable, use
> > >> __tasklet_schedule() to put it on the right vec.
> > >
> > > gee, I haven't looked at the tasklet code in 100 years. I think I'll
> > > send this in Thomas's direction ;)
> > >
> > > The race description seems real and the patch looks sane to me. Are
> > > you sure we can get away with never clearing TASKLET_STATE_HI? For
> > > example, what would happen if code does a tasklet_hi_schedule(t) and
> > > later does a tasklet_schedule(t)?
> >
> > hmm, that will be a nightmare...
> > tasklet_schedule(t)/tasklet_hi_schedule(t) doesn't use list_head, they
> > simply
> > make t->next = NULL, then put t on the tail of
> > tasklet_vec/tasklet_hi_vec. If the code does a tasklet_hi_schedule()
> > and then a tasklet_schedule, the tasklet will stay on tasklet_vec and
> > tasklet_hi_vec .... tasklet_hi_action will handle it first and clear
> > the TASKLET_SCHED_SCHED bit, later, in tasklet_action, it will be
> > handled again and hit a BUG_ON ...
>
> Well, actually I meant if the caller reuses the tassklet_struct after
> its softirq has been run.
>
> > But if code does a tasklet_hi_schedule(), then tasklet_kil and later
> > does a tasklet_schedule(), we do need clear the TASKLET_STATE_HI.
>
> That as well ;)
>
> > Also
> > we need to remove the tasklet_hi_enable() as it is the same as
> > tasklet_enable() and there's
> > only one user..
> >
> > I'll send you V2 patch soon, thanks.
>
> Sounds good.
Hi all,
I couldn't find the v2 patch of this on linux-kernel but this commit
4660e32 "tasklet: ignore disabled tasklet in tasklet_action()"
BUGS in -next-20121127.
-----------[cut here ]----------
kernel BUG at /home/peter/src/kernels/next/kernel/softirq.c:471!
invalid opcode: 0000 [#1] PREEMPT SMP
....
The registers/stack dump isn't useful so I didn't include it here.
I'm still trying to track down the execution sequence that causes this,
but the high-level trigger is a firewire bus reset.
Hopefully I'll have more information soon.
Regards,
Peter Hurley
PS - My new staging/fwserial driver isn't to blame because it isn't
loaded when this happens ;)
[cc'ing linux-next]
On Wed, 2012-11-28 at 13:00 -0500, Peter Hurley wrote:
> Hi all,
>
> I couldn't find the v2 patch of this on linux-kernel but this commit
>
> 4660e32 "tasklet: ignore disabled tasklet in tasklet_action()"
>
> BUGS in -next-20121127.
>
> -----------[cut here ]----------
> kernel BUG at /home/peter/src/kernels/next/kernel/softirq.c:471!
> invalid opcode: 0000 [#1] PREEMPT SMP
> ....
>
> The registers/stack dump isn't useful so I didn't include it here.
>
> I'm still trying to track down the execution sequence that causes this,
> but the high-level trigger is a firewire bus reset.
>
> Hopefully I'll have more information soon.
>From the original v1 of this patch [where is v2?] ...
On Fri, 2012-11-02 at 10:48 +0800, Xiaotian Feng wrote:
> We met a ksoftirqd 100% issue, the perf top shows kernel is busy
> with tasklet_action(), but no actual action is shown. From dumped
> kernel, there's only one disabled tasklet on the tasklet_vec.
>
> tasklet_action might be handled after tasklet is disabled, this will
> make disabled tasklet stayed on tasklet_vec. tasklet_action will not
> handle disabled tasklet, but place it on the tail of tasklet_vec,
> still raise softirq for this tasklet. Things will become worse if
> device driver uses tasklet_disable on its device remove/close code.
> The disabled tasklet will stay on the vec, frequently __raise_softirq_off()
> and make ksoftirqd wakeup even if no tasklets need to be handled.
>
> This patch introduced a new TASKLET_STATE_HI bit to indicate HI_SOFTIRQ,
> in tasklet_action(), simply ignore the disabled tasklet and don't raise
> the softirq nr. In my previous patch, I remove tasklet_hi_enable() since
> it is the same as tasklet_enable(). So only tasklet_enable() needs to be
> modified, if tasklet state is changed from disable to enable, use
> __tasklet_schedule() to put it on the right vec.
>
> Signed-off-by: Xiaotian Feng <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> include/linux/interrupt.h | 12 ++++++++++--
> kernel/softirq.c | 10 +++++-----
> 2 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 5e4e617..7e5bb00 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -521,7 +521,8 @@ struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
> enum
> {
> TASKLET_STATE_SCHED, /* Tasklet is scheduled for execution */
> - TASKLET_STATE_RUN /* Tasklet is running (SMP only) */
> + TASKLET_STATE_RUN, /* Tasklet is running (SMP only) */
> + TASKLET_STATE_HI /* Tasklet is HI_SOFTIRQ */
> };
>
> #ifdef CONFIG_SMP
> @@ -593,7 +594,14 @@ static inline void tasklet_disable(struct tasklet_struct *t)
> static inline void tasklet_enable(struct tasklet_struct *t)
> {
> smp_mb__before_atomic_dec();
> - atomic_dec(&t->count);
> + if (atomic_dec_and_test(&t->count)) {
> + if (!test_bit(TASKLET_STATE_SCHED, &t->state))
> + return;
> + if (test_bit(TASKLET_STATE_HI, &t->state))
> + __tasklet_hi_schedule(t);
> + else
>
Since this isn't protected by locks, all of the conditions that __were__
met to arrive here (t->count == 0 && t->state & TASKLET_STATE_SCHED) may
no longer be true now because another cpu may have run tasklet_action(),
so now this tasklet will be scheduled when it should not be.
Plus __tasklet_schedule() can't just be called from any cpu that happens
to be calling tasklet_enable(). What you're doing here means that the
tasklet could be scheduled on multiple cpus at the same time -- that's
not going to work.
+ __tasklet_schedule(t);
>