Hey Ingo,
I've been seeing some odd behavior recently w/ the irq smp_affinity
masks. With a single runaway -rt thread, we were getting hard hangs on
an SMP system.
Clearly, if the runaway -rt thread is running on a cpu that is getting
critical interrupts (scsi, network, etc), a system hang would be
expected. However, I started playing with the /proc/irq/<num>/smp_affinity
masks and got some odd behavior.
Using taskset and a small script that sets all the irq smp_affinity
values, I got the following.
./irqbind.sh 1 ; taskset 1 ./runaway = HANG (as expected)
./irqbind.sh 1 ; taskset 2 ./runaway = NO HANG (as expected)
./irqbind.sh 2 ; taskset 1 ./runaway = NO HANG (as expected)
./irqbind.sh 2 ; taskset 2 ./runaway = HANG (as expected)
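The irqbind.sh script itself wasn't posted; a minimal sketch of what it
presumably does (hypothetical, with the proc root parameterized so it can
be dry-run against a scratch directory without root):

```shell
#!/bin/sh
# irqbind.sh (sketch): write the given CPU mask (hex) into every IRQ's
# smp_affinity file.  IRQ_ROOT defaults to the real proc path; point it
# at a scratch directory to dry-run without root.
irqbind() {
    mask=$1
    for f in "${IRQ_ROOT:-/proc/irq}"/*/smp_affinity; do
        [ -e "$f" ] || continue
        # Some IRQ lines (e.g. the timer) reject affinity writes; ignore.
        { echo "$mask" > "$f"; } 2>/dev/null || :
    done
}
```

As root, `irqbind 3` would then allow CPUs 0 and 1 on every IRQ line.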
Everything is cool. However, I figured we could set the irq smp_affinity
mask to multiple cpus so a runaway thread could not cause a hang.
./irqbind.sh 3 ; taskset 1 ./runaway = Sometimes hang
./irqbind.sh 3 ; taskset 2 ./runaway = Sometimes hang
Basically it appeared that while the irq smp_affinity mask was set for
both cpus, the irq was really being delivered to only one cpu. Looking
at the code, sure enough, that is the case.
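One way to see that from userspace is to check how many per-CPU counters
in /proc/interrupts actually advance for the IRQ in question; a
hypothetical helper (the column layout of /proc/interrupts is assumed:
"IRQ:" followed by one counter per CPU, then chip and handler names):

```shell
# Count how many CPUs show a non-zero interrupt count for one IRQ line,
# reading /proc/interrupts-style text on stdin.
cpus_hit() {
    awk -v irq="$1:" '$1 == irq {
        n = 0
        for (i = 2; i <= NF; i++)
            # counter columns are pure digits; chip/handler names are not
            if ($i ~ /^[0-9]+$/ && $i > 0)
                n++
        print n
    }'
}

# e.g.:  cpus_hit 24 < /proc/interrupts
```

With the smp_affinity mask set to 3, you would hope to see this report 2
after some traffic; in practice it stayed at 1.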
The patch below appears to correct this issue, however it also
repeatedly (on different irqs) causes the following BUG:
BUG: using smp_processor_id() in preemptible [00000000] code: IRQ 24/1518
caller is __do_softirq+0x10/0x51
[<c0263bc3>] debug_smp_processor_id+0x7b/0x88 (8)
[<c0126936>] __do_softirq+0x10/0x51 (12)
[<c0141195>] do_irqd+0xd8/0x10e (12)
[<c013250f>] kthread+0x7c/0xa6 (24)
[<c0132493>] kthread+0x0/0xa6 (20)
[<c0101041>] kernel_thread_helper+0x5/0xb (16)
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<c0263b83>] .... debug_smp_processor_id+0x3b/0x88
.....[<00000000>] .. ( <= rest_init+0x3feffd64/0x3d)
------------------------------
| showing all locks held by: | (IRQ 24/1518 [f7b11070, 60]):
Thoughts?
-john
--- 2.6-rt/kernel/irq/manage.c 2006-05-11 18:37:36.000000000 -0500
+++ dev-rt/kernel/irq/manage.c 2006-05-11 21:09:10.000000000 -0500
@@ -731,10 +731,8 @@
/*
* Did IRQ affinities change?
*/
- if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
- mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
- set_cpus_allowed(current, mask);
- }
+ if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
+ set_cpus_allowed(current, irq_affinity[irq]);
#endif
schedule();
}
* john stultz <[email protected]> wrote:
> - if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
> - mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
> - set_cpus_allowed(current, mask);
> - }
this is intentional, not a bug. The point of the code above is to ensure
that every IRQ handler is executed on one CPU. I.e. the irq threads are
semi-affine - they are strictly affine when executing a handler, but
they may switch CPUs if the affinity mask points them elsewhere.
but ... we might be able to relax that. Potentially some IRQ handlers
might assume per-cpu-ness, but that should be uncovered by
DEBUG_PREEMPT's smp_processor_id() checks.
> + if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
> + set_cpus_allowed(current, irq_affinity[irq]);
> The patch below appears to correct this issue, however it also
> repeatedly(on different irqs) causes the following BUG:
ah. This actually uncovered a real bug. We were calling __do_softirq()
with interrupts enabled (and being preemptible) - which is certainly
bad.
this was hidden before because the smp_processor_id() debugging code
handles tasks bound to a single CPU as per-cpu-safe.
could you check the (totally untested) patch below and see if that fixes
things for you? I've also added your affinity change.
Ingo
Index: linux-rt.q/kernel/irq/manage.c
===================================================================
--- linux-rt.q.orig/kernel/irq/manage.c
+++ linux-rt.q/kernel/irq/manage.c
@@ -717,24 +717,21 @@ static int do_irqd(void * __desc)
if (param.sched_priority > 25)
curr_irq_prio = param.sched_priority - 1;
-// param.sched_priority = 1;
sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
while (!kthread_should_stop()) {
set_current_state(TASK_INTERRUPTIBLE);
do_hardirq(desc);
cond_resched_all();
+ local_irq_disable();
__do_softirq();
-// do_softirq_from_hardirq();
local_irq_enable();
#ifdef CONFIG_SMP
/*
* Did IRQ affinities change?
*/
- if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
- mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
- set_cpus_allowed(current, mask);
- }
+ if (!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
+ set_cpus_allowed(current, irq_affinity[irq]);
#endif
schedule();
}
On Fri, 12 May 2006, Ingo Molnar wrote:
> ah. This actually uncovered a real bug. We were calling __do_softirq()
> with interrupts enabled (and being preemptible) - which is certainly
> bad.
Hmm, I wonder if this is also affecting Mark's problem.
But as I showed, if hardirqs are disabled and we're running PREEMPT (not
PREEMPT_RT), disable_irq can call schedule. This happens in
drivers/net/3c59x.c: it has a watchdog timeout calling disable_irq, which
calls synchronize_irq, which might schedule:
void synchronize_irq(unsigned int irq)
{
        struct irq_desc *desc = irq_desc + irq;

        if (irq >= NR_IRQS)
                return;

        if (hardirq_preemption && !(desc->status & IRQ_NODELAY))
                wait_event(desc->wait_for_handler,
                           !(desc->status & IRQ_INPROGRESS));
        else
                while (desc->status & IRQ_INPROGRESS)
                        cpu_relax();
}
-- Steve
Steven Rostedt wrote:
>
> On Fri, 12 May 2006, Ingo Molnar wrote:
>
>> ah. This actually uncovered a real bug. We were calling __do_softirq()
>> with interrupts enabled (and being preemptible) - which is certainly
>> bad.
>
> Hmm, I wonder if this is also affecting Mark's problem.
>
I thought the same thing when I read it??
Mark
* Mark Hounschell <[email protected]> wrote:
> Steven Rostedt wrote:
> >
> > On Fri, 12 May 2006, Ingo Molnar wrote:
> >
> >> ah. This actually uncovered a real bug. We were calling __do_softirq()
> >> with interrupts enabled (and being preemptible) - which is certainly
> >> bad.
> >
> > Hmm, I wonder if this is also affecting Mark's problem.
>
> I thought the same thing when I read it??
could you try the patch below?
Ingo
----
* Steven Rostedt <[email protected]> wrote:
> > one solution would be to forbid disable_irq() from softirq contexts, and
> > to convert the vortex timeout function to a workqueue and use the
> > *_delayed_work() APIs to drive it - and cross fingers there's not many
> > places to fix.
>
> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in
> disable_irq.
>
> But I must admit, I wouldn't know how to make that change without
> spending more time on it than I have for this.
the simplest fix for now would be to use the _nosync variant in the
vortex timeout function.
Mark, does this fix the problem?
Ingo
Index: linux-rt.q/drivers/net/3c59x.c
===================================================================
--- linux-rt.q.orig/drivers/net/3c59x.c
+++ linux-rt.q/drivers/net/3c59x.c
@@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
if (vp->medialock)
goto leave_media_alone;
- disable_irq(dev->irq);
+ /* hack! */
+ disable_irq_nosync(dev->irq);
old_window = ioread16(ioaddr + EL3_CMD) >> 13;
EL3WINDOW(4);
media_status = ioread16(ioaddr + Wn4_Media);
Ingo Molnar wrote:
> * Mark Hounschell <[email protected]> wrote:
>
>> Steven Rostedt wrote:
>>> On Fri, 12 May 2006, Ingo Molnar wrote:
>>>
>>>> ah. This actually uncovered a real bug. We were calling __do_softirq()
>>>> with interrupts enabled (and being preemptible) - which is certainly
>>>> bad.
>>> Hmm, I wonder if this is also affecting Mark's problem.
>> I thought the same thing when I read it??
>
> could you try the patch below?
>
> Ingo
>
> ----
>
> * Steven Rostedt <[email protected]> wrote:
>
>>> one solution would be to forbid disable_irq() from softirq contexts, and
>>> to convert the vortex timeout function to a workqueue and use the
>>> *_delayed_work() APIs to drive it - and cross fingers there's not many
>>> places to fix.
>> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in
>> disable_irq.
>>
>> But I must admit, I wouldn't know how to make that change without
>> spending more time on it than I have for this.
>
> the simplest fix for now would be to use the _nosync variant in the
> vortex timeout function.
>
> Mark, does this fix the problem?
>
> Ingo
>
> Index: linux-rt.q/drivers/net/3c59x.c
> ===================================================================
> --- linux-rt.q.orig/drivers/net/3c59x.c
> +++ linux-rt.q/drivers/net/3c59x.c
> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>
> if (vp->medialock)
> goto leave_media_alone;
> - disable_irq(dev->irq);
> + /* hack! */
> + disable_irq_nosync(dev->irq);
> old_window = ioread16(ioaddr + EL3_CMD) >> 13;
> EL3WINDOW(4);
> media_status = ioread16(ioaddr + Wn4_Media);
>
I'm trying that right this moment. It may take a while before I know if
it helps.
Mark
Ingo Molnar wrote:
> * Mark Hounschell <[email protected]> wrote:
>
>> Steven Rostedt wrote:
>>> On Fri, 12 May 2006, Ingo Molnar wrote:
>>>
>>>> ah. This actually uncovered a real bug. We were calling __do_softirq()
>>>> with interrupts enabled (and being preemptible) - which is certainly
>>>> bad.
>>> Hmm, I wonder if this is also affecting Mark's problem.
>> I thought the same thing when I read it??
>
> could you try the patch below?
>
> Ingo
>
> ----
>
> * Steven Rostedt <[email protected]> wrote:
>
>>> one solution would be to forbid disable_irq() from softirq contexts, and
>>> to convert the vortex timeout function to a workqueue and use the
>>> *_delayed_work() APIs to drive it - and cross fingers there's not many
>>> places to fix.
>> I prefer the above. Maybe even add a WARN_ON(in_softirq()) in
>> disable_irq.
>>
>> But I must admit, I wouldn't know how to make that change without
>> spending more time on it than I have for this.
>
> the simplest fix for now would be to use the _nosync variant in the
> vortex timeout function.
>
> Mark, does this fix the problem?
>
> Ingo
>
> Index: linux-rt.q/drivers/net/3c59x.c
> ===================================================================
> --- linux-rt.q.orig/drivers/net/3c59x.c
> +++ linux-rt.q/drivers/net/3c59x.c
> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>
> if (vp->medialock)
> goto leave_media_alone;
> - disable_irq(dev->irq);
> + /* hack! */
> + disable_irq_nosync(dev->irq);
> old_window = ioread16(ioaddr + EL3_CMD) >> 13;
> EL3WINDOW(4);
> media_status = ioread16(ioaddr + Wn4_Media);
>
Yes it does.
Mark
On Fri, 12 May 2006, Mark Hounschell wrote:
> >
> > Mark, does this fix the problem?
> >
> > Ingo
> >
> > Index: linux-rt.q/drivers/net/3c59x.c
> > ===================================================================
> > --- linux-rt.q.orig/drivers/net/3c59x.c
> > +++ linux-rt.q/drivers/net/3c59x.c
> > @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
> >
> > if (vp->medialock)
> > goto leave_media_alone;
> > - disable_irq(dev->irq);
> > + /* hack! */
> > + disable_irq_nosync(dev->irq);
> > old_window = ioread16(ioaddr + EL3_CMD) >> 13;
> > EL3WINDOW(4);
> > media_status = ioread16(ioaddr + Wn4_Media);
> >
>
> Yes it does.
>
It fixes it for both "complete preemption" and "normal preemption"?
-- Steve
Steven Rostedt wrote:
> On Fri, 12 May 2006, Mark Hounschell wrote:
>
>>> Mark, does this fix the problem?
>>>
>>> Ingo
>>>
>>> Index: linux-rt.q/drivers/net/3c59x.c
>>> ===================================================================
>>> --- linux-rt.q.orig/drivers/net/3c59x.c
>>> +++ linux-rt.q/drivers/net/3c59x.c
>>> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>>>
>>> if (vp->medialock)
>>> goto leave_media_alone;
>>> - disable_irq(dev->irq);
>>> + /* hack! */
>>> + disable_irq_nosync(dev->irq);
>>> old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>>> EL3WINDOW(4);
>>> media_status = ioread16(ioaddr + Wn4_Media);
>>>
>> Yes it does.
>>
>
>
> It fixes it for both "complete preemption" and "normal preemption"?
>
> -- Steve
>
>
Normal for sure. I just rebooted to test complete.
Mark
Steven Rostedt wrote:
> On Fri, 12 May 2006, Mark Hounschell wrote:
>
>>> Mark, does this fix the problem?
>>>
>>> Ingo
>>>
>>> Index: linux-rt.q/drivers/net/3c59x.c
>>> ===================================================================
>>> --- linux-rt.q.orig/drivers/net/3c59x.c
>>> +++ linux-rt.q/drivers/net/3c59x.c
>>> @@ -1897,7 +1897,8 @@ vortex_timer(unsigned long data)
>>>
>>> if (vp->medialock)
>>> goto leave_media_alone;
>>> - disable_irq(dev->irq);
>>> + /* hack! */
>>> + disable_irq_nosync(dev->irq);
>>> old_window = ioread16(ioaddr + EL3_CMD) >> 13;
>>> EL3WINDOW(4);
>>> media_status = ioread16(ioaddr + Wn4_Media);
>>>
>> Yes it does.
>>
>
>
> It fixes it for both "complete preemption" and "normal preemption"?
>
> -- Steve
>
>
Nope, I still have the complete preemption problem. My box locked up
right away. Had to reboot it.
You guys know best, but I think the problem the above patch fixes is
probably not related to my 'complete preemption' problem or the problem
reported by John. But my 'complete preemption' problem may certainly be
the same as John's.
Mark
On Fri, 2006-05-12 at 07:50 +0200, Ingo Molnar wrote:
> * john stultz <[email protected]> wrote:
> > + if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
> > + set_cpus_allowed(current, irq_affinity[irq]);
>
> > The patch below appears to correct this issue, however it also
> > repeatedly(on different irqs) causes the following BUG:
>
> ah. This actually uncovered a real bug. We were calling __do_softirq()
> with interrupts enabled (and being preemptible) - which is certainly
> bad.
>
> this was hidden before because the smp_processor_id() debugging code
> handles tasks bound to a single CPU as per-cpu-safe.
>
> could you check the (totally untested) patch below and see if that fixes
> things for you? I've also added your affinity change.
Yep, no BUG messages and I get irq affinity behavior that matches what I
echo into the proc interface.
Looks good to me so far. I'll keep running w/ it and let you know if we
see any issues.
thanks
-john
On Fri, 2006-05-12 at 07:50 +0200, Ingo Molnar wrote:
> + if (!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
> + set_cpus_allowed(current, irq_affinity[irq]);
Gah! I introduced a terrible bug there.
Note the semi-colon at the end of the if statement! Sorry about that!
The following patch (which I've tested as well) fixes that.
--- 2.6-rt/kernel/irq/manage.c 2006-05-11 18:37:36.000000000 -0500
+++ dev-rt/kernel/irq/manage.c 2006-05-12 12:55:56.000000000 -0500
@@ -724,6 +724,7 @@
set_current_state(TASK_INTERRUPTIBLE);
do_hardirq(desc);
cond_resched_all();
+ local_irq_disable();
__do_softirq();
// do_softirq_from_hardirq();
local_irq_enable();
@@ -731,10 +732,8 @@
/*
* Did IRQ affinities change?
*/
- if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
- mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
- set_cpus_allowed(current, mask);
- }
+ if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]))
+ set_cpus_allowed(current, irq_affinity[irq]);
#endif
schedule();
}
john stultz wrote:
> --- 2.6-rt/kernel/irq/manage.c 2006-05-11 18:37:36.000000000 -0500
> +++ dev-rt/kernel/irq/manage.c 2006-05-12 12:55:56.000000000 -0500
> @@ -724,6 +724,7 @@
> set_current_state(TASK_INTERRUPTIBLE);
> do_hardirq(desc);
> cond_resched_all();
> + local_irq_disable();
> __do_softirq();
> // do_softirq_from_hardirq();
> local_irq_enable();
> @@ -731,10 +732,8 @@
> /*
> * Did IRQ affinities change?
> */
> - if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
> - mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
> - set_cpus_allowed(current, mask);
> - }
> + if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]))
> + set_cpus_allowed(current, irq_affinity[irq]);
> #endif
> schedule();
> }
Steve,
This patch seems to have made a huge difference in my 'complete
preemption' mode 'pauses' problem.
From many seconds, even minutes at a time, to 5 seconds and less when it
even does happen???
It does still happen however.
Mark
john stultz wrote:
> On Fri, 2006-05-12 at 07:50 +0200, Ingo Molnar wrote:
>> + if (!cpus_equal(current->cpus_allowed, irq_affinity[irq]));
>> + set_cpus_allowed(current, irq_affinity[irq]);
>
> Gah! I introduced a terrible bug there.
>
> Note the semi-colon at the end of the if statement! Sorry about that!
>
> The following patch (which I've tested as well) fixes that.
>
> --- 2.6-rt/kernel/irq/manage.c 2006-05-11 18:37:36.000000000 -0500
> +++ dev-rt/kernel/irq/manage.c 2006-05-12 12:55:56.000000000 -0500
> @@ -724,6 +724,7 @@
> set_current_state(TASK_INTERRUPTIBLE);
> do_hardirq(desc);
> cond_resched_all();
> + local_irq_disable();
> __do_softirq();
> // do_softirq_from_hardirq();
> local_irq_enable();
> @@ -731,10 +732,8 @@
> /*
> * Did IRQ affinities change?
> */
> - if (!cpu_isset(smp_processor_id(), irq_affinity[irq])) {
> - mask = cpumask_of_cpu(any_online_cpu(irq_affinity[irq]));
> - set_cpus_allowed(current, mask);
> - }
> + if(!cpus_equal(current->cpus_allowed, irq_affinity[irq]))
> + set_cpus_allowed(current, irq_affinity[irq]);
> #endif
> schedule();
> }
FYI,
I just looked at rt21 and the first version of this patch seems to be in
it. Not this version.
Mark