2001-11-10 12:21:58

by Mathijs Mohlmann

[permalink] [raw]
Subject: [PATCH] fix loop with disabled tasklets


I have a questions about the current softirq implementation.

Looking at kernel/softirq.c:418 and futher. When the tasklet is disabled it
is rescheduled. This is fine, but __cpu_raise_softirq is also called. This
means that ksoftirqd will loop until the tasklet is enabled.
Normaly this is no biggy, because user processes still come through.
But during boot -when the tasklet_enable is yet to be called by "the idle
thread"- the system will lockup. This happens during the boot of my sun4m
with the keyboard tasklet.

To demonstrate, i wrote a module. Inserting this will result in 0% idle time.
Inserting this with the below patch, the system simply goes on.

If this really is an issue, this patch fixes it. It does change the current
behavior a bit though. Currently when a tasklet is run while it is already
running on a different cpu, the task is rescheduled. This results in the
tasklet being executed two times. With this patch applied, it will not. (this
is still good according to kernel/include/interrups.h:83 and futher).

i'm not sure about the enable_tasklet bit. I think it will prevent
people from calling tasklet_enable from within an interrupt handler. But then
again, why do you want to do that? Thanx, velco and

Any comments?

me

#define MODULE
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/interrupt.h>

MODULE_LICENSE("GPL");

void
dummy_lockit(unsigned long nill)
{
printk("doing the lockit\n");
return;
}

DECLARE_TASKLET_DISABLED(lockit, dummy_lockit, 0);
int
init_module(void)
{
printk("going to lock\n");
tasklet_schedule(&lockit);
return(0);
}

void
cleanup_module(void)
{
tasklet_enable(&lockit);
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(HZ); /* some time to do dummy_lockit */

printk("bye\n");
}


diff -ruN linux-2.4.14/include/linux/interrupt.h
linux/include/linux/interrupt.h
--- linux-2.4.14/include/linux/interrupt.h Sat Nov 3 11:20:41 2001
+++ linux/include/linux/interrupt.h Sat Nov 10 12:28:14 2001
@@ -185,13 +185,15 @@
static inline void tasklet_enable(struct tasklet_struct *t)
{
smp_mb__before_atomic_dec();
- atomic_dec(&t->count);
+ if (atomic_dec_and_test(&t->count))
+ __cpu_raise_softirq(smp_processor_id(), TASKLET_SOFTIRQ);
}

static inline void tasklet_hi_enable(struct tasklet_struct *t)
{
smp_mb__before_atomic_dec();
- atomic_dec(&t->count);
+ if (atomic_dec_and_test(&t->count))
+ __cpu_raise_softirq(smp_processor_id(), HI_SOFTIRQ);
}

extern void tasklet_kill(struct tasklet_struct *t);
diff -ruN linux-2.4.14/kernel/softirq.c linux/kernel/softirq.c
--- linux-2.4.14/kernel/softirq.c Thu Nov 8 15:58:24 2001
+++ linux/kernel/softirq.c Sat Nov 10 12:27:24 2001
@@ -188,21 +188,19 @@

list = list->next;

- if (tasklet_trylock(t)) {
- if (!atomic_read(&t->count)) {
+ if (!atomic_read(&t->count)) {
+ if (tasklet_trylock(t)) {
if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
BUG();
t->func(t->data);
tasklet_unlock(t);
- continue;
}
- tasklet_unlock(t);
+ continue;
}

local_irq_disable();
t->next = tasklet_vec[cpu].list;
tasklet_vec[cpu].list = t;
- __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
local_irq_enable();
}
}
@@ -222,21 +220,19 @@

list = list->next;

- if (tasklet_trylock(t)) {
- if (!atomic_read(&t->count)) {
+ if (!atomic_read(&t->count)) {
+ if (tasklet_trylock(t)) {
if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
BUG();
t->func(t->data);
tasklet_unlock(t);
- continue;
}
- tasklet_unlock(t);
+ continue;
}

local_irq_disable();
t->next = tasklet_hi_vec[cpu].list;
tasklet_hi_vec[cpu].list = t;
- __cpu_raise_softirq(cpu, HI_SOFTIRQ);
local_irq_enable();
}
}


2001-11-10 13:38:09

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

From: Mathijs Mohlmann <[email protected]>
Date: Sat, 10 Nov 2001 13:21:56 +0100

i'm not sure about the enable_tasklet bit. I think it will prevent
people from calling tasklet_enable from within an interrupt handler. But then
again, why do you want to do that? Thanx, velco and

Any comments?

I've been looking at this and I sent Andrea+Linus private mail on this
to try and work out a fix.

You can't simply stop enabling the softirq when you hit the "locked
tasklet" condition. That could deadlock the tasklet.

What really needs to happen is:

1) If tasklet is scheduled, but disabled, simply ignore it
during tasklet processing. Do not resignal softirq.
But do leave it on the pending lists.

2) When tasklet enable brings t->count back to zero and
tasklet is found to be scheduled, signal a local softirq.

To me, that would be the proper fix. But I still haven't heard back
from Andrea or Linus yet :-)

Franks a lot,
David S. Miller
[email protected]

2001-11-10 15:03:39

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Sat, Nov 10, 2001 at 05:37:20AM -0800, David S. Miller wrote:
> From: Mathijs Mohlmann <[email protected]>
> Date: Sat, 10 Nov 2001 13:21:56 +0100
>
> i'm not sure about the enable_tasklet bit. I think it will prevent
> people from calling tasklet_enable from within an interrupt handler. But then
> again, why do you want to do that? Thanx, velco and
>
> Any comments?
>
> I've been looking at this and I sent Andrea+Linus private mail on this
> to try and work out a fix.
>
> You can't simply stop enabling the softirq when you hit the "locked
> tasklet" condition. That could deadlock the tasklet.
>
> What really needs to happen is:
>
> 1) If tasklet is scheduled, but disabled, simply ignore it
> during tasklet processing. Do not resignal softirq.
> But do leave it on the pending lists.
>
> 2) When tasklet enable brings t->count back to zero and
> tasklet is found to be scheduled, signal a local softirq.
>
> To me, that would be the proper fix. But I still haven't heard back
> from Andrea or Linus yet :-)

just playing with the "softirq_raise" would be much simpler but I can
see only one problem: we don't know what cpu the tasklet is scheduled
into, so we don't know where to signal the local softirq.

So it seems to fix the looping problem of disabled tasklets we should
really dschedule the tasklet in tasklet_disable (and of course to forbid
it to be scheduled when disabled in tasklet_schedule) and later to
reschedule it in tasklet_enable.

The looping of disabled tasklets was intentional though, it's true as
you noted that if somebody disable a tasklet we'll keep wasting time on
it (ksofitrqd or not, we would waste time anyways at every irq etc..,
ksoftirqd just makes it visible with top which is a good thing rather
than having the overhead hided in the irq handler and in the scheduler),
but the idea was that by design disabled tasklets should remain disabled
for a little time.

Infact at the moment it's even impossible to remove a tasklet from the
queue if it remains disabled forever. tasklet_kill will deadlock on a
tasklet that is disabled.

Comments Linus?

Andrea

2001-11-10 15:29:16

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Saturday 10 November 2001 16:03, Andrea Arcangeli wrote:
> just playing with the "softirq_raise" would be much simpler but I can
> see only one problem: we don't know what cpu the tasklet is scheduled
> into, so we don't know where to signal the local softirq.
>
> So it seems to fix the looping problem of disabled tasklets we should
> really dschedule the tasklet in tasklet_disable (and of course to forbid
> it to be scheduled when disabled in tasklet_schedule) and later to
> reschedule it in tasklet_enable.

Or add a cpu field to struct tasklet_struct.

> Infact at the moment it's even impossible to remove a tasklet from the
> queue if it remains disabled forever. tasklet_kill will deadlock on a
> tasklet that is disabled.
I think this is the responsability of the device driver writer (or who ever
uses it). AFAIK there is no defined behavior for this case. I vote for
removing the tasklet without it ever being run.

me

2001-11-10 15:55:52

by Alan

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

> > queue if it remains disabled forever. tasklet_kill will deadlock on a
> > tasklet that is disabled.
> I think this is the responsability of the device driver writer (or who ever
> uses it). AFAIK there is no defined behavior for this case. I vote for
> removing the tasklet without it ever being run.

I don't think the semantics actually matter too much. That would be mostly
down to timing. Surely its sufficient to say something like "when
tasklet_kill returns the tasklet will not be scheduled or execute further"

Do we care if it ran ? And do we want to imply any locking properties we
might have to maintain?

2001-11-10 16:39:25

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Sat, Nov 10, 2001 at 04:29:00PM +0100, Mathijs Mohlmann wrote:
> On Saturday 10 November 2001 16:03, Andrea Arcangeli wrote:
> > just playing with the "softirq_raise" would be much simpler but I can
> > see only one problem: we don't know what cpu the tasklet is scheduled
> > into, so we don't know where to signal the local softirq.
> >
> > So it seems to fix the looping problem of disabled tasklets we should
> > really dschedule the tasklet in tasklet_disable (and of course to forbid
> > it to be scheduled when disabled in tasklet_schedule) and later to
> > reschedule it in tasklet_enable.
>
> Or add a cpu field to struct tasklet_struct.

of course, but that has to be maintained with locking too that
invalidates the simplicity of such approch. My point was that if we have
to play with the tasklet locking I guess we can as well drop the tasklet
from the queue and be more scalable so you can schedule thousand of
disabled tasklets at zero do_softirq cost (that was first Dave
suggestion).

> > Infact at the moment it's even impossible to remove a tasklet from the
> > queue if it remains disabled forever. tasklet_kill will deadlock on a
> > tasklet that is disabled.
> I think this is the responsability of the device driver writer (or who ever
> uses it). AFAIK there is no defined behavior for this case. I vote for
> removing the tasklet without it ever being run.

yes, dropping it only after it run is an implementation detail, it can
delay a bit but it doesn't matter much. If we change the "disabled
tasklet" case (also for tasklet_kill, so you can kill also a disabled
tasklet) it may end to be more handy to also kill a scheduled tasklet
before it run, these will be more implementation details too and I agree
with you and Alan that here the semantics of running the tasklet before
killing it or not doesn't matter and should be considered undefined
behaviour. The only semantics of tasklet_kill are that the tasklet will
stop running after tasklet_kill returned and as Alan noted it depends on
the timing anyways.

btw, the patches I seen floating around checking the ->count before
tasking are racy.

Andrea

2001-11-11 02:32:57

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Saturday 10 November 2001 16:03, Andrea Arcangeli wrote:
> So it seems to fix the looping problem of disabled tasklets we should
> really dschedule the tasklet in tasklet_disable (and of course to forbid
> it to be scheduled when disabled in tasklet_schedule) and later to
> reschedule it in tasklet_enable.

i'm beginning to see my troubles here. (only just now). I have been reading
"writing linux device drivers" and they state some additional properties for
tasklets. These are (page 199-200):
1) a tasklet is guaranteed to run on the cpu that first schedules it. For
better cache behavoir.
2) when a tasklet is disabled, you may still schedule it. It will run asap
after tasklet_enable.
If we also want to enforce these rules we should a) put them in interrupts.h
b) i think we don't escape from using a cpu field and locking it.

Now, i don't know where oreilly got these addition properties from. In fact,
i've been working on the kernel for less then a week and the first two days
were just to setup my cross compiler. I only noticed that my sparcstation LX
didn't boot and that started this of. (oh and i want to see my name in the
changelog someday, just for fun ;).

anyway, don't know were these come from, but i would like to know if we should
enforce them. I'm working on a better patch, which has the above features,
but if we don't support them, no need to wast cycles over them.

This patch i'm working on now removes the tasklet from tasklet_vec[cpu].list
when it is disabled but not after saving smp_processor_id in t->cpu. When the
tasklet is enabled and the tasklet is marked scheduled, the tasklet is added
to the right cpu.
t->cpu is set on first schedule and not reset with additional schedules.

Oh, and i will try to look for races this time ;)

me

2001-11-11 15:59:40

by Momchil Velikov

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets


Mathijs,

You may want to have a look at the following patches (tested by visual
inspection):

--- softirq.c.orig Sun Nov 11 16:42:14 2001
+++ softirq.c Sun Nov 11 16:59:44 2001
@@ -188,22 +188,17 @@ static void tasklet_action(struct softir

list = list->next;

+ if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
+ BUG();
+
if (tasklet_trylock(t)) {
- if (!atomic_read(&t->count)) {
- if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
- BUG();
+ if (!atomic_read(&t->count))
t->func(t->data);
- tasklet_unlock(t);
- continue;
- }
tasklet_unlock(t);
+ continue;
}

- local_irq_disable();
- t->next = tasklet_vec[cpu].list;
- tasklet_vec[cpu].list = t;
- __cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
- local_irq_enable();
+ tasklet_schedule(t);
}
}

@@ -222,22 +217,17 @@ static void tasklet_hi_action(struct sof

list = list->next;

+ if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
+ BUG();
+
if (tasklet_trylock(t)) {
- if (!atomic_read(&t->count)) {
- if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
- BUG();
+ if (!atomic_read(&t->count))
t->func(t->data);
- tasklet_unlock(t);
- continue;
- }
tasklet_unlock(t);
+ continue;
}

- local_irq_disable();
- t->next = tasklet_hi_vec[cpu].list;
- tasklet_hi_vec[cpu].list = t;
- __cpu_raise_softirq(cpu, HI_SOFTIRQ);
- local_irq_enable();
+ tasklet_hi_schedule(t);
}
}


--- interrupt.h.orig Sun Nov 11 16:49:05 2001
+++ interrupt.h Sun Nov 11 17:28:08 2001
@@ -185,13 +185,15 @@ static inline void tasklet_disable(struc
static inline void tasklet_enable(struct tasklet_struct *t)
{
smp_mb__before_atomic_dec();
- atomic_dec(&t->count);
+ if (atomic_dec_and_test(&t->count))
+ tasklet_schedule(t);
}

static inline void tasklet_hi_enable(struct tasklet_struct *t)
{
smp_mb__before_atomic_dec();
- atomic_dec(&t->count);
+ if (atomic_dec_and_test(&t->count))
+ tasklet_hi_schedule(t);
}

extern void tasklet_kill(struct tasklet_struct *t);

Regards,
-velco

2001-11-12 01:12:27

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

I'm just guessing: the scheduler isn't yet functional when
spawn_ksoftirqd is called. This is clearly a sparc32 bug, because
swapn_ksoftirqd is called via the __initcall section and the scheduler
must be fully functional by that time. Unrelated to the fact ksoftirqd
can loop on disabled tasklets, the ksoftirqd startup is only an innocent
trigger for the deadlock.

Mathijs, can you verify that? If my theory is right need_resched isn't
set even if ksoftirqd loops forever. It could be one of those two
possibilities:

1) the timer irq isn't running yet
2) "current" isn't functional

The softirq code shouldn't really be the culprit of the sparc32
deadlock (as Alexey pointed out).

Andrea

2001-11-12 07:43:03

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets



On Mon, 12 Nov 2001, Andrea Arcangeli wrote:
> Mathijs, can you verify that? If my theory is right need_resched isn't
> set even if ksoftirqd loops forever. It could be one of those two
> possibilities:
>
> 1) the timer irq isn't running yet
> 2) "current" isn't functional

well, i'm at work now, no sparc32's here. But during my debugging i made
a piece of code that counted to times a process was selected by schedule.
A custom SysRq key read the values. What i saw during the deadlock was
that there where three processes running allready and scheduling seemed to
work find. keventd got selected everytime i pressed SysRq and ksoftirqd
got selected during idle time.

This (to me) proves that "current" is working just fine. Not quite sure
about the timer irq. I will look at that tonight. But i'm pretty sure
need_resched is set again and again.

Even if the timer irq is working fine, the sun should not enable the
keyboard irq without the tasklet being enabled. Initializing the keyboard
tasklet enabled got the sun to boot just fine for me.

But i feel that this is fixing the symptoms. If this turns out to be the
problem, it should be fixed, but i feel the softirq code needs some good
looking over with respect to disabled tasklets as well. (tasklet_enable
without a tasket_schedule and then a tasklet_kill will loop as well,
wont it?)

me (everything IMVHO)

2001-11-12 07:46:33

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets



On 11 Nov 2001, Momchil Velikov wrote:
> You may want to have a look at the following patches (tested by visual
> inspection):

I like this one. I think it is what Andrea was going for, without the
changes to interrupt.h. If we are going for thisone we should add some
comments to interrupt.h warning about deadlocks etc.

still working on a patch that adds a cpu field to the tasklet_struct.
Looking for races takes me a long time :).

me


2001-11-12 07:59:44

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

From: Mathijs Mohlmann <[email protected]>
Date: Mon, 12 Nov 2001 08:42:27 +0100 (CET)

Even if the timer irq is working fine, the sun should not enable the
keyboard irq without the tasklet being enabled. Initializing the keyboard
tasklet enabled got the sun to boot just fine for me.

They come from the serial port, not from a normal "IRQ".
This is why events arrive so early.

Linus's proposed solution will work just fine and frankly
that's what I'm going to check into my tree. :-) For
reference this is:

1) Kill DECLARE_TASKLET_DISABLED use DECLARE_TASKLET for
keyboard_tasklet.

2) In keyboard tasklet handler check a "keyboard_init_done"
boolean and just return immediately if it is clear.

3) Where we currently do "tasklet_enable(&keyboard_tasklet);"
simply kill that line and check it to
"keyboard_init_done = 1;"

Franks a lot,
David S. Miller
[email protected]

2001-11-12 08:01:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

From: Mathijs Mohlmann <[email protected]>
Date: Mon, 12 Nov 2001 08:46:08 +0100 (CET)

On 11 Nov 2001, Momchil Velikov wrote:
> You may want to have a look at the following patches (tested by visual
> inspection):

I like this one.

I don't like these changes, just make DECLARE_TASKLET_DISABLED
and long-term tasklet_disable()'s be illegal.

There is only one abuser of this (keyboard), and it is easily cured.

Franks a lot,
David S. Miller
[email protected]

2001-11-12 08:03:34

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

From: Andrea Arcangeli <[email protected]>
Date: Mon, 12 Nov 2001 02:11:42 +0100

I'm just guessing: the scheduler isn't yet functional when
spawn_ksoftirqd is called.

The scheduler is fully functional, this isn't what is going wrong.

Look at my other email from last night. At this point in the booting
process, what would we possibly switch to if ksoftirqd is in running
state constantly? No other kernel thread is of a higher priority if
the only things running are the idle threads and ksoftird. This is
basically what I think is happening on sparc32.

Right?

Franks a lot,
David S. Miller
[email protected]

2001-11-12 08:37:10

by Momchil Velikov

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

>>>>> "Mathijs" == Mathijs Mohlmann <[email protected]> writes:

Mathijs> On 11 Nov 2001, Momchil Velikov wrote:
>> You may want to have a look at the following patches (tested by
>> visual inspection):

Mathijs> I like this one. I think it is what Andrea was going for,
Mathijs> without the changes to interrupt.h. If we are going for

In this patch, the first thing is to deschedule the tasklet. So, the
changes to interrupt.h are needed in order to put back the tasklet in
the queue.

Mathijs> thisone we should add some comments to interrupt.h warning
Mathijs> about deadlocks etc.

What deadlocks ? ;)

Mathijs> still working on a patch that adds a cpu field to the
Mathijs> tasklet_struct. Looking for races takes me a long time :).

I think we can ignore the cpu whenever a tasklet is scheduled and
disabled, treating it as not scheduled at all.

Regards,
-velco

2001-11-12 09:12:12

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets


On 12-Nov-2001 Momchil Velikov wrote:
> In this patch, the first thing is to deschedule the tasklet. So, the
> changes to interrupt.h are needed in order to put back the tasklet in
> the queue.
I know, but Andrea suggested not to allow scheduling of disabled tasklets
Also, enableing the tasklet will result in a scheduled tasklet, regardless
whether it was scheduled. Plus, we are not sure if it is scheduled on the
same cpu that did the tasklet_schedule (but i might be the only one who
cares about this ;)

> Mathijs> thisone we should add some comments to interrupt.h warning
> Mathijs> about deadlocks etc.
> What deadlocks ? ;)
well, loops. Dont use tasklet_kill on disabled tasklet or on not scheduled
tasklets.

me


--
me

2001-11-12 09:35:42

by Momchil Velikov

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

>>>>> "Mathijs" == Mathijs Mohlmann <[email protected]> writes:

Mathijs> On 12-Nov-2001 Momchil Velikov wrote:
>> In this patch, the first thing is to deschedule the tasklet. So,
>> the changes to interrupt.h are needed in order to put back the
>> tasklet in the queue.
Mathijs> I know, but Andrea suggested not to allow scheduling of
Mathijs> disabled tasklets Also, enableing the tasklet will result in

Disabled tasklets are not scheduled by enable_tasklet (). A disabled
tasklet may temporarily appear in the queue, but nevertheless
tasklet_action will remove it. It seems gross to traverse the list in
order to remove a tasklet at the first disable.

Mathijs> a scheduled tasklet, regardless whether it was
Mathijs> scheduled.

Hmm, if it isn't scheduled, there is not much sense in disabling it at
all.

Mathijs> Plus, we are not sure if it is scheduled on the
Mathijs> same cpu that did the tasklet_schedule (but i might be the
Mathijs> only one who cares about this ;)

Mathijs> thisone we should add some comments to interrupt.h warning
Mathijs> about deadlocks etc.
>> What deadlocks ? ;)
Mathijs> well, loops. Dont use tasklet_kill on disabled tasklet or on
Mathijs> not scheduled tasklets.

Hmm, if TASKLET_STATE_SCHED is not set tasklet_kill will not deadlock.
And tasklet_kill yields (?). Doesn't that mean that tasklet_action
will be called eventually ?

Regards,
-velco



2001-11-12 09:55:13

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets


On 12-Nov-2001 Momchil Velikov wrote:
> Hmm, if it isn't scheduled, there is not much sense in disabling it at
> all.
DECLARE_TASKLET_DISABLED
tasklet_enable

> Hmm, if TASKLET_STATE_SCHED is not set tasklet_kill will not deadlock.
> And tasklet_kill yields (?). Doesn't that mean that tasklet_action
> will be called eventually ?

oops, i misread tasklet_unlock_wait. So sorry.

me


--
me

2001-11-12 13:58:37

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Mon, Nov 12, 2001 at 08:42:27AM +0100, Mathijs Mohlmann wrote:
>
>
> On Mon, 12 Nov 2001, Andrea Arcangeli wrote:
> > Mathijs, can you verify that? If my theory is right need_resched isn't
> > set even if ksoftirqd loops forever. It could be one of those two
> > possibilities:
> >
> > 1) the timer irq isn't running yet
> > 2) "current" isn't functional
>
> well, i'm at work now, no sparc32's here. But during my debugging i made
> a piece of code that counted to times a process was selected by schedule.
> A custom SysRq key read the values. What i saw during the deadlock was
> that there where three processes running allready and scheduling seemed to
> work find. keventd got selected everytime i pressed SysRq and ksoftirqd

so "current" was _always_ "keventd" every time you press SYSRQ+P? sounds
like you deadlocked in keventd then.

> got selected during idle time.
>
> This (to me) proves that "current" is working just fine. Not quite sure
> about the timer irq. I will look at that tonight. But i'm pretty sure
> need_resched is set again and again.
>
> Even if the timer irq is working fine, the sun should not enable the
> keyboard irq without the tasklet being enabled. Initializing the keyboard

scheduling the tasklet without it being enabled is perfect, there's no
problem at all if the scheduler is functional by the time
spwan_ksoftirqd is executed.

All issues mentioned so far about the softirq should only to deal with
performance, not with deadlocking. If it deadlocks that's a sparc32 bug,
messing the softirq code can only hide the real bug.

> tasklet enabled got the sun to boot just fine for me.
>
> But i feel that this is fixing the symptoms. If this turns out to be the
> problem, it should be fixed, but i feel the softirq code needs some good
> looking over with respect to disabled tasklets as well. (tasklet_enable

As we said several times so far, tasklets should remain disabled only
for a little time during production (aka during the benchmarks), and
it's a matter of the user to tasklet_kill only enabled tasklets (it's
like quitting a function without a local_irq_restore, no surprise if irq
remains stuck if you forget to reenable irq/tasklets).

As Linus said tasklet_disable is the same thing of __cli() (or if you
prefer as Alexey said, it's like local_bh_disable()), you shouldn't __cli()
or local_bh_disable() for a long time, it's worthless to optimize for
very long critical sections, tasklet_disable is just to serialize within
a critical section. The boot case doesn't matter either, it doesn't
matter if at boot we waste a couple of cpu cycles, boot time won't
change anyways because of that (boot time is all cpu idle time waiting
those scsi things).

But again, those softirq issues are only performance issues, they cannot
explain a deadlock in spwan_ksoftirqd, that is a sparc32 bug, period.

So please fix the bug, then we can discuss the rest, but whatever we
change it won't be for the boot process, the only thing that matters is
production, even if ksoftirqd loops for seconds during the boot process
it doesn't matter at all.

> without a tasket_schedule and then a tasklet_kill will loop as well,
> wont it?)
>
> me (everything IMVHO)


Andrea

2001-11-12 14:05:48

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Mon, Nov 12, 2001 at 12:03:05AM -0800, David S. Miller wrote:
> From: Andrea Arcangeli <[email protected]>
> Date: Mon, 12 Nov 2001 02:11:42 +0100
>
> I'm just guessing: the scheduler isn't yet functional when
> spawn_ksoftirqd is called.
>
> The scheduler is fully functional, this isn't what is going wrong.

check ret_from_fork path, sparc32 scheduler is broken and this is why it
deadlocks at boot, it has nothing to do with the softirq code, softirq
code is innocent and it only get bitten by the sparc32 bug.

Andrea

2001-11-12 14:04:27

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Sun, Nov 11, 2001 at 11:59:05PM -0800, David S. Miller wrote:
> From: Mathijs Mohlmann <[email protected]>
> Date: Mon, 12 Nov 2001 08:42:27 +0100 (CET)
>
> Even if the timer irq is working fine, the sun should not enable the
> keyboard irq without the tasklet being enabled. Initializing the keyboard
> tasklet enabled got the sun to boot just fine for me.
>
> They come from the serial port, not from a normal "IRQ".
> This is why events arrive so early.
>
> Linus's proposed solution will work just fine and frankly
> that's what I'm going to check into my tree. :-) For
> reference this is:
>
> 1) Kill DECLARE_TASKLET_DISABLED use DECLARE_TASKLET for
> keyboard_tasklet.
>
> 2) In keyboard tasklet handler check a "keyboard_init_done"
> boolean and just return immediately if it is clear.
>
> 3) Where we currently do "tasklet_enable(&keyboard_tasklet);"
> simply kill that line and check it to
> "keyboard_init_done = 1;"

I recommend to fix the obvious sparc breakage, before changing the
softirq code in any way, to make sure not to hide the scheduler bug.

Ah, and let me bet, the sparc32 scheduler breakage is that it's not
running schedule_tail in ret_from_fork (you must
s/ret_from_smpfork/ret_from_fork/ as well). Since 2.4 it's not a SMP
thing any longer, it is _required_ on UP too or sched yield will break
badly! This of course is a problem not just for spwan_ksoftirqd, but
it's a core bug that triggers all the time with the sparc32 userspace
too.

Andrea

2001-11-12 14:21:18

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Mon, Nov 12, 2001 at 03:04:52PM +0100, Andrea Arcangeli wrote:
> On Mon, Nov 12, 2001 at 12:03:05AM -0800, David S. Miller wrote:
> > From: Andrea Arcangeli <[email protected]>
> > Date: Mon, 12 Nov 2001 02:11:42 +0100
> >
> > I'm just guessing: the scheduler isn't yet functional when
> > spawn_ksoftirqd is called.
> >
> > The scheduler is fully functional, this isn't what is going wrong.
>
> check ret_from_fork path, sparc32 scheduler is broken and this is why it
> deadlocks at boot, it has nothing to do with the softirq code, softirq
> code is innocent and it only get bitten by the sparc32 bug.

real fix looks like this (no idea what PSR_PIL means so not sure if this
really works on UP but certainly the sched_yield breakage is fixed now
and it won't deadlock in the softirq code any longer):

--- 2.4.15pre2aa1/arch/sparc/kernel/entry.S.~1~ Sat Feb 10 02:34:05 2001
+++ 2.4.15pre2aa1/arch/sparc/kernel/entry.S Mon Nov 12 15:17:32 2001
@@ -1466,8 +1466,7 @@
b C_LABEL(ret_sys_call)
ld [%sp + REGWIN_SZ + PT_I0], %o0

-#ifdef CONFIG_SMP
- .globl C_LABEL(ret_from_smpfork)
+ .globl C_LABEL(ret_from_fork)
C_LABEL(ret_from_smpfork):
wr %l0, PSR_ET, %psr
WRITE_PAUSE
@@ -1475,7 +1474,6 @@
mov %g3, %o0
b C_LABEL(ret_sys_call)
ld [%sp + REGWIN_SZ + PT_I0], %o0
-#endif

/* Linux native and SunOS system calls enter here... */
.align 4
--- 2.4.15pre2aa1/arch/sparc/kernel/process.c.~1~ Thu Oct 11 10:41:52 2001
+++ 2.4.15pre2aa1/arch/sparc/kernel/process.c Mon Nov 12 15:18:21 2001
@@ -455,11 +455,7 @@
* allocate the task_struct and kernel stack in
* do_fork().
*/
-#ifdef CONFIG_SMP
-extern void ret_from_smpfork(void);
-#else
-extern void ret_from_syscall(void);
-#endif
+extern void ret_from_smp(void);

int copy_thread(int nr, unsigned long clone_flags, unsigned long sp,
unsigned long unused,
@@ -493,13 +489,8 @@
copy_regwin(new_stack, (((struct reg_window *) regs) - 1));

p->thread.ksp = (unsigned long) new_stack;
-#ifdef CONFIG_SMP
p->thread.kpc = (((unsigned long) ret_from_smpfork) - 0x8);
p->thread.kpsr = current->thread.fork_kpsr | PSR_PIL;
-#else
- p->thread.kpc = (((unsigned long) ret_from_syscall) - 0x8);
- p->thread.kpsr = current->thread.fork_kpsr;
-#endif
p->thread.kwim = current->thread.fork_kwim;

/* This is used for sun4c only */

Andrea

2001-11-12 14:33:59

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On 12 Nov 01 at 15:20, Andrea Arcangeli wrote:

Hi Andrea,
does it compile? It looks to me like that you are using ret_from_fork,
ret_from_smp and ret_from_smpfork interchangably in the patch. You renamed
ret_from_smpfork to ret_from_fork, changed extern ret_from_smpfork
to ret_from_smp, and you still call ret_from_smpfork ;-)

Or do I miss something?
Thanks,
Petr Vandrovec
[email protected]


> --- 2.4.15pre2aa1/arch/sparc/kernel/entry.S.~1~ Sat Feb 10 02:34:05 2001
> +++ 2.4.15pre2aa1/arch/sparc/kernel/entry.S Mon Nov 12 15:17:32 2001
> @@ -1466,8 +1466,7 @@
> b C_LABEL(ret_sys_call)
> ld [%sp + REGWIN_SZ + PT_I0], %o0
>
> -#ifdef CONFIG_SMP
> - .globl C_LABEL(ret_from_smpfork)
> + .globl C_LABEL(ret_from_fork)
> C_LABEL(ret_from_smpfork):
> wr %l0, PSR_ET, %psr
> WRITE_PAUSE
> @@ -1475,7 +1474,6 @@
> mov %g3, %o0
> b C_LABEL(ret_sys_call)
> ld [%sp + REGWIN_SZ + PT_I0], %o0
> -#endif
>
> /* Linux native and SunOS system calls enter here... */
> .align 4
> --- 2.4.15pre2aa1/arch/sparc/kernel/process.c.~1~ Thu Oct 11 10:41:52 2001
> +++ 2.4.15pre2aa1/arch/sparc/kernel/process.c Mon Nov 12 15:18:21 2001
> @@ -455,11 +455,7 @@
> * allocate the task_struct and kernel stack in
> * do_fork().
> */
> -#ifdef CONFIG_SMP
> -extern void ret_from_smpfork(void);
> -#else
> -extern void ret_from_syscall(void);
> -#endif
> +extern void ret_from_smp(void);
>
> int copy_thread(int nr, unsigned long clone_flags, unsigned long sp,
> unsigned long unused,
> @@ -493,13 +489,8 @@
> copy_regwin(new_stack, (((struct reg_window *) regs) - 1));
>
> p->thread.ksp = (unsigned long) new_stack;
> -#ifdef CONFIG_SMP
> p->thread.kpc = (((unsigned long) ret_from_smpfork) - 0x8);
> p->thread.kpsr = current->thread.fork_kpsr | PSR_PIL;
> -#else
> - p->thread.kpc = (((unsigned long) ret_from_syscall) - 0x8);
> - p->thread.kpsr = current->thread.fork_kpsr;
> -#endif
> p->thread.kwim = current->thread.fork_kwim;
>
> /* This is used for sun4c only */

2001-11-12 14:49:10

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Mon, Nov 12, 2001 at 03:33:03PM +0000, Petr Vandrovec wrote:
> On 12 Nov 01 at 15:20, Andrea Arcangeli wrote:
>
> Hi Andrea,
> does it compile? It looks to me like that you are using ret_from_fork,
> ret_from_smp and ret_from_smpfork interchangably in the patch. You renamed

ret_from_smpfork must disappear, to fix any compilation trouble you only
need to replace it with ret_from_fork in any possible place that I
forgotten.

as said I don't know what PSR_PIL means so that is more important part
to check for runtime correctness (compilations details are trivial to
solve instead).

thanks for the review,
Andrea

2001-11-12 17:11:15

by Thorsten Kukuk

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Mon, Nov 12, Andrea Arcangeli wrote:

> On Mon, Nov 12, 2001 at 03:04:52PM +0100, Andrea Arcangeli wrote:
> > On Mon, Nov 12, 2001 at 12:03:05AM -0800, David S. Miller wrote:
> > > From: Andrea Arcangeli <[email protected]>
> > > Date: Mon, 12 Nov 2001 02:11:42 +0100
> > >
> > > I'm just guessing: the scheduler isn't yet functional when
> > > spawn_ksoftirqd is called.
> > >
> > > The scheduler is fully functional, this isn't what is going wrong.
> >
> > check ret_from_fork path, sparc32 scheduler is broken and this is why it
> > deadlocks at boot, it has nothing to do with the softirq code, softirq
> > code is innocent and it only get bitten by the sparc32 bug.
>
> real fix looks like this (no idea what PSR_PIL means so not sure if this
> really works on UP but certainly the sched_yield breakage is fixed now
> and it won't deadlock in the softirq code any longer):

A kernel with Andrea patch boots (after renaming all remaining
ret_from_smpfork to ret_from_fork) on one critical machine from me.
I will test the other this evening.

Thorsten

--
Thorsten Kukuk http://www.suse.de/~kukuk/ [email protected]
SuSE GmbH Deutschherrenstr. 15-19 D-90429 Nuernberg
--------------------------------------------------------------------
Key fingerprint = A368 676B 5E1B 3E46 CFCE 2D97 F8FD 4E23 56C6 FB4B

2001-11-12 19:03:24

by Mathijs Mohlmann

[permalink] [raw]
Subject: Re: [PATCH] fix loop with disabled tasklets

On Monday 12 November 2001 15:20, Andrea Arcangeli wrote:
> real fix looks like this

This works for me. Sorry for my stubbornness earlier. I didn't receive
Linus' comment on this and didn't find anything about it in the archive.
Thought is was an issue.

Thanx Andrea, i learned a lot (humbleness being one :)

me