2002-09-07 22:07:17

by Zwane Mwaikambo

[permalink] [raw]
Subject: [PATCH][RFC] per isr in_progress markers

Hi Ingo, Robert,
What do you make of the following patch. It is supposed to ease
irq sharing by allowing multiple isrs to be executed, but still not
allowing a specific isr to be run asynchronously. I haven't been able to
test it on SMP proper, only SMP kernel on UP machine and using a shared
network card and sound card concurrently with an interrupt load of
~3000irqs/s

Few questions;
1) Should we set IRQ_PENDING on the way out again if it is found to be
ISR_INPROGRESS?

2) Is the spin_unlock(desc->lock).. handle_IRQ_event() ...
spin_lock(desc->lock) window large enough to allow another cpu in there
to handle another interrupt on that descriptor?

Thanks,
Zwane

Index: linux-2.5.33/include/linux/interrupt.h
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/include/linux/interrupt.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 interrupt.h
--- linux-2.5.33/include/linux/interrupt.h 31 Aug 2002 22:30:51 -0000 1.1.1.1
+++ linux-2.5.33/include/linux/interrupt.h 7 Sep 2002 17:21:40 -0000
@@ -13,6 +13,8 @@
#include <asm/system.h>
#include <asm/ptrace.h>

+#define ISR_INPROGRESS 1 /* ISR currently being executed */
+
struct irqaction {
void (*handler)(int, void *, struct pt_regs *);
unsigned long flags;
@@ -20,6 +22,7 @@
const char *name;
void *dev_id;
struct irqaction *next;
+ unsigned long status;
};


Index: linux-2.5.33/arch/i386/kernel/irq.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 irq.c
--- linux-2.5.33/arch/i386/kernel/irq.c 31 Aug 2002 22:31:11 -0000 1.1.1.1
+++ linux-2.5.33/arch/i386/kernel/irq.c 7 Sep 2002 17:22:26 -0000
@@ -200,21 +200,27 @@
*/
int handle_IRQ_event(unsigned int irq, struct pt_regs * regs, struct irqaction * action)
{
- int status = 1; /* Force the "do bottom halves" bit */
+ int ret = 1; /* Force the "do bottom halves" bit */

if (!(action->flags & SA_INTERRUPT))
local_irq_enable();

+ /* Ease irq sharing by allowing other handlers to be run instead
+ * of blocking all with IRQ_INPROGRESS */
+
do {
- status |= action->flags;
- action->handler(irq, action->dev_id, regs);
+ if (test_and_set_bit(ISR_INPROGRESS, &action->status) == 0) {
+ action->handler(irq, action->dev_id, regs);
+ clear_bit(ISR_INPROGRESS, &action->status);
+ }
+ ret |= action->flags;
action = action->next;
} while (action);
- if (status & SA_SAMPLE_RANDOM)
+ if (ret & SA_SAMPLE_RANDOM)
add_interrupt_randomness(irq);
local_irq_disable();

- return status;
+ return ret;
}

/*
@@ -342,10 +348,13 @@

/*
* If the IRQ is disabled for whatever reason, we cannot
- * use the action we have.
+ * use the action we have. Note that we don't check for
+ * IRQ_INPROGRESS, we allow multiple ISRs from a shared
+ * interrupt to be run concurrently, but still not allowing
+ * the same handler to be run asynchronously.
*/
action = NULL;
- if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))) {
+ if (likely(!(status & IRQ_DISABLED))) {
action = desc->action;
status &= ~IRQ_PENDING; /* we commit to handling */
status |= IRQ_INPROGRESS; /* we are handling it */
@@ -463,6 +472,7 @@
action->mask = 0;
action->name = devname;
action->next = NULL;
+ action->status = 0;
action->dev_id = dev_id;

retval = setup_irq(irq, action);
--
function.linuxpower.ca



2002-09-08 07:45:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Sun, 8 Sep 2002, Zwane Mwaikambo wrote:

> What do you make of the following patch. It is supposed to ease
> irq sharing by allowing multiple isrs to be executed, but still not
> allowing a specific isr to be run asynchronously. I haven't been able to
> test it on SMP proper, only SMP kernel on UP machine and using a shared
> network card and sound card concurrently with an interrupt load of
> ~3000irqs/s

hm, perhaps it could confuse some of the more complex shared-IRQ-aware
device drivers, such as IDE. But your patch is very tempting nevertheless,
it removes much of the disadvantage of sharing interrupt lines. Most of
the handlers on the chain are supposed to be completely independent.

Ingo

2002-09-08 08:05:54

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


> [...] But your patch is very tempting nevertheless, it removes much of
> the disadvantage of sharing interrupt lines. Most of the handlers on the
> chain are supposed to be completely independent.

one big issue are level triggered interrupts - your approach makes no
sense in the way we disable/ack the IRQ line currently:

disable IRQ line
ack APIC
-> call handler
while (work_left) {
ack interrupt on the card [*]
[... full processing ...]
}

if we didnt disable the IRQ line then an additional interrupt would be
triggered when [*] is done.

it could perhaps be handled the following way:

disable IRQ line
ack APIC
-> call handler
while (work_left) {
ack interrupt on the card [*]
enable IRQ line [**]
[... full processing ...]
}

so after [**] is done we could accept new interrupts, and the amount of
time we keep the irq line disabled should be small. Obviously this means
driver level changes.


an additional nit even for edge-triggered interrupts: synchronize_irq()
needs to be aware of the new bit on SMP, now that IRQ_PENDING is not
showing the true 'pending' state anymore. But it's doable. Basically
IRQ_PENDING would be gone completely, and replaced by a more complex set
of bits in the action struct. In the normal unshared case it should be
almost as efficient as the IRQ_PENDING bit.

in fact i'd suggest to also add a desc->pending counter in addition to the
per-action flag, to make it cheaper to determine whether there are any
pending handlers on the chain.

also some other code needs to be updated as well to be aware of the
changed pending-semantics: enable_irq() and probe_irq_on().

Ingo

2002-09-08 10:10:57

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Sun, 8 Sep 2002, Ingo Molnar wrote:

> if we didnt disable the IRQ line then an additional interrupt would be
> triggered when [*] is done.
>
> it could perhaps be handled the following way:
>
> disable IRQ line
> ack APIC
> -> call handler
> while (work_left) {
> ack interrupt on the card [*]
> enable IRQ line [**]
> [... full processing ...]
> }
>
> so after [**] is done we could accept new interrupts, and the amount of
> time we keep the irq line disabled should be small. Obviously this means
> driver level changes.

Ok that definitely would allow for more interrupts to get through, i was
working on the basis that handlers with SA_INTERRUPT set would allow for
for that reenable. About the driver level changes, would this be in regard
to a device with ISR_INPROGRESS triggering an interrupt and thus have one
pending? In that case couldn't we avoid touching the driver and increment
a pending counter on that particular handler?

> an additional nit even for edge-triggered interrupts: synchronize_irq()
> needs to be aware of the new bit on SMP, now that IRQ_PENDING is not
> showing the true 'pending' state anymore. But it's doable. Basically
> IRQ_PENDING would be gone completely, and replaced by a more complex set
> of bits in the action struct. In the normal unshared case it should be
> almost as efficient as the IRQ_PENDING bit.
>
> in fact i'd suggest to also add a desc->pending counter in addition to the
> per-action flag, to make it cheaper to determine whether there are any
> pending handlers on the chain.

Gotcha

> also some other code needs to be updated as well to be aware of the
> changed pending-semantics: enable_irq() and probe_irq_on().

I'll have a look at those too.

Thanks,
Zwane
--
function.linuxpower.ca


2002-09-08 10:29:55

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Sun, 8 Sep 2002, Ingo Molnar wrote:

> hm, perhaps it could confuse some of the more complex shared-IRQ-aware
> device drivers, such as IDE. But your patch is very tempting nevertheless,
> it removes much of the disadvantage of sharing interrupt lines. Most of
> the handlers on the chain are supposed to be completely independent.

iirc IDE is capable of doing its own masking per device(nIEN) and in fact
does even do unconditional sti's in its isr paths. So i would think it
would be one of the not so painful device drivers to take care of.
DISCLAIMER: I am not Andre Hedrick

Zwane
--
function.linuxpower.ca

2002-09-08 13:03:21

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Sun, 8 Sep 2002, Ingo Molnar wrote:

> if we didnt disable the IRQ line then an additional interrupt would be
> triggered when [*] is done.

We should be safe from that since ->end() is called after the handler at
at that stage the irq line is still disabled due to us not ack'ing in
->ack()

> it could perhaps be handled the following way:
>
> disable IRQ line
> ack APIC
> -> call handler
> while (work_left) {
> ack interrupt on the card [*]
> enable IRQ line [**]
> [... full processing ...]
> }
>
> so after [**] is done we could accept new interrupts, and the amount of
> time we keep the irq line disabled should be small. Obviously this means
> driver level changes.

After looking at the code at bit more the following would still work for
level triggered via ioapic. Due to that specific irq line being
effectively disabled until we hit ->end.

desc->handler->ack();
action->handler()
while(work_to_do) {
ack_interrupt_on_card(); <-- unmask_irq_line in here too
do_isr_work();
}
...
desc->handler->end();

So it looks like your previous suggestion of driver modification still
stands, would it be safe then to do a real ack in the isr (for the ioapic
case)?

Zwane

--
function.linuxpower.ca

2002-09-08 16:32:25

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

Here is a newer (untested) patch incorporating Ingo's suggestions as well
as adding an extra request_irq flag so that isrs can use isr_unmask_irq()
to enable their interrupt lines.

Ingo, if i didn't understand you properly please point out where i missed
what you meant.

Thanks,
Zwane

Index: linux-2.5.33/include/linux/interrupt.h
===================================================================
RCS file: /home/zwane/source/cvs_rep/linux-2.5.33/include/linux/interrupt.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 interrupt.h
--- linux-2.5.33/include/linux/interrupt.h 2002/09/08 10:48:54 1.1.1.1
+++ linux-2.5.33/include/linux/interrupt.h 2002/09/08 13:57:57
@@ -13,6 +13,8 @@
#include <asm/system.h>
#include <asm/ptrace.h>

+#define ISR_INPROGRESS 1 /* ISR currently being executed */
+
struct irqaction {
void (*handler)(int, void *, struct pt_regs *);
unsigned long flags;
@@ -20,6 +22,7 @@
const char *name;
void *dev_id;
struct irqaction *next;
+ unsigned long status;
};


Index: linux-2.5.33/include/linux/irq.h
===================================================================
RCS file: /home/zwane/source/cvs_rep/linux-2.5.33/include/linux/irq.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 irq.h
--- linux-2.5.33/include/linux/irq.h 2002/09/08 10:48:53 1.1.1.1
+++ linux-2.5.33/include/linux/irq.h 2002/09/08 13:52:24
@@ -24,13 +24,12 @@
*/
#define IRQ_INPROGRESS 1 /* IRQ handler active - do not enter! */
#define IRQ_DISABLED 2 /* IRQ disabled - do not enter! */
-#define IRQ_PENDING 4 /* IRQ pending - replay on enable */
-#define IRQ_REPLAY 8 /* IRQ has been replayed but not acked yet */
-#define IRQ_AUTODETECT 16 /* IRQ is being autodetected */
-#define IRQ_WAITING 32 /* IRQ not yet seen - for autodetection */
-#define IRQ_LEVEL 64 /* IRQ level triggered */
-#define IRQ_MASKED 128 /* IRQ masked - shouldn't be seen again */
-#define IRQ_PER_CPU 256 /* IRQ is per CPU */
+#define IRQ_REPLAY 4 /* IRQ has been replayed but not acked yet */
+#define IRQ_AUTODETECT 8 /* IRQ is being autodetected */
+#define IRQ_WAITING 16 /* IRQ not yet seen - for autodetection */
+#define IRQ_LEVEL 32 /* IRQ level triggered */
+#define IRQ_MASKED 64 /* IRQ masked - shouldn't be seen again */
+#define IRQ_PER_CPU 128 /* IRQ is per CPU */

/*
* Interrupt controller descriptor. This is all we need
@@ -62,6 +61,7 @@
struct irqaction *action; /* IRQ action list */
unsigned int depth; /* nested irq disables */
spinlock_t lock;
+ unsigned int pending;
} ____cacheline_aligned irq_desc_t;

extern irq_desc_t irq_desc [NR_IRQS];
Index: linux-2.5.33/arch/i386/kernel/irq.c
===================================================================
RCS file: /home/zwane/source/cvs_rep/linux-2.5.33/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 irq.c
--- linux-2.5.33/arch/i386/kernel/irq.c 2002/09/08 10:48:21 1.1.1.1
+++ linux-2.5.33/arch/i386/kernel/irq.c 2002/09/08 15:41:03
@@ -66,7 +66,7 @@
* Controller mappings for all interrupt sources:
*/
irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
- { [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
+ { [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED, 0}};

static void register_irq_proc (unsigned int irq);

@@ -186,11 +186,33 @@
#if CONFIG_SMP
inline void synchronize_irq(unsigned int irq)
{
+ /* We can still synchronize with IRQ_INPROGRESS even
+ * with asynchronous handlers since it encapsulates
+ * the handlers
+ */
while (irq_desc[irq].status & IRQ_INPROGRESS)
cpu_relax();
}
#endif

+/* This should be called from ISRs to enable
+ * their interrupt line once they've acked
+ * the device interrupt they are handling
+ */
+
+void inline isr_unmask_irq(unsigned int irq)
+{
+ irq_desc_t *desc = irq_desc + irq;
+
+ if (desc->action->flags & SA_UNMASKIRQ)
+ desc->handler->enable(irq);
+#if 0
+ /* APICs require an ack too */
+ if (desc->handler->ack == ack_APIC_irq)
+ desc->handler->ack();
+#endif
+}
+
/*
* This should really return information about whether
* we should do bottom half handling etc. Right now we
@@ -200,21 +222,27 @@
*/
int handle_IRQ_event(unsigned int irq, struct pt_regs * regs, struct irqaction * action)
{
- int status = 1; /* Force the "do bottom halves" bit */
+ int ret = 1; /* Force the "do bottom halves" bit */

if (!(action->flags & SA_INTERRUPT))
local_irq_enable();

+ /* Ease irq sharing by allowing other handlers to be run instead
+ * of blocking all with IRQ_INPROGRESS */
+
do {
- status |= action->flags;
- action->handler(irq, action->dev_id, regs);
+ if (test_and_set_bit(ISR_INPROGRESS, &action->status) == 0) {
+ action->handler(irq, action->dev_id, regs);
+ clear_bit(ISR_INPROGRESS, &action->status);
+ }
+ ret |= action->flags;
action = action->next;
} while (action);
- if (status & SA_SAMPLE_RANDOM)
+ if (ret & SA_SAMPLE_RANDOM)
add_interrupt_randomness(irq);
local_irq_disable();

- return status;
+ return ret;
}

/*
@@ -289,9 +317,10 @@
case 1: {
unsigned int status = desc->status & ~IRQ_DISABLED;
desc->status = status;
- if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
+
+ if (!(status & IRQ_REPLAY) && desc->pending) {
desc->status = status | IRQ_REPLAY;
- hw_resend_irq(desc->handler,irq);
+ hw_resend_irq(desc->handler, irq);
}
desc->handler->enable(irq);
/* fall-through */
@@ -338,16 +367,19 @@
WAITING is used by probe to mark irqs that are being tested
*/
status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
- status |= IRQ_PENDING; /* we _want_ to handle it */
+ desc->pending++; /* we _want_ to handle it */

/*
* If the IRQ is disabled for whatever reason, we cannot
- * use the action we have.
+ * use the action we have. Note that we don't check for
+ * IRQ_INPROGRESS, we allow multiple ISRs from a shared
+ * interrupt to be run concurrently, but still not allowing
+ * the same handler to be run asynchronously.
*/
action = NULL;
- if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))) {
+ if (likely(!(status & IRQ_DISABLED))) {
action = desc->action;
- status &= ~IRQ_PENDING; /* we commit to handling */
+ desc->pending--; /* we commit to handling */
status |= IRQ_INPROGRESS; /* we are handling it */
}
desc->status = status;
@@ -376,9 +408,9 @@
handle_IRQ_event(irq, &regs, action);
spin_lock(&desc->lock);

- if (likely(!(desc->status & IRQ_PENDING)))
+ if (likely(!desc->pending))
break;
- desc->status &= ~IRQ_PENDING;
+ desc->pending--;
}
out:
desc->status &= ~IRQ_INPROGRESS;
@@ -424,6 +456,7 @@
*
* SA_SAMPLE_RANDOM The interrupt can be used for entropy
*
+ * SA_UNMASKIRQ Unmask irq line whilst processing
*/

int request_irq(unsigned int irq,
@@ -463,6 +496,7 @@
action->mask = 0;
action->name = devname;
action->next = NULL;
+ action->status = 0;
action->dev_id = dev_id;

retval = setup_irq(irq, action);
@@ -563,14 +597,14 @@
desc = irq_desc + i;

spin_lock_irq(&desc->lock);
- if (!irq_desc[i].action)
- irq_desc[i].handler->startup(i);
+ if (!desc->action)
+ desc->handler->startup(i);
spin_unlock_irq(&desc->lock);
}

/* Wait for longstanding interrupts to trigger. */
for (delay = jiffies + HZ/50; time_after(delay, jiffies); )
- /* about 20ms delay */ barrier();
+ /* about 20ms delay */ cpu_relax();

/*
* enable any unassigned irqs
@@ -584,7 +618,7 @@
if (!desc->action) {
desc->status |= IRQ_AUTODETECT | IRQ_WAITING;
if (desc->handler->startup(i))
- desc->status |= IRQ_PENDING;
+ desc->pending++;
}
spin_unlock_irq(&desc->lock);
}
@@ -593,7 +627,7 @@
* Wait for spurious interrupts to trigger
*/
for (delay = jiffies + HZ/10; time_after(delay, jiffies); )
- /* about 100ms delay */ barrier();
+ /* about 100ms delay */ cpu_relax();

/*
* Now filter out any obviously spurious interrupts
@@ -756,6 +790,14 @@
return -EBUSY;
}

+ if (!(old->flags & new->flags & SA_UNMASKIRQ)) {
+ printk(KERN_DEBUG
+ "Unable to unmask irqs with isrs old:%s new:%s",
+ old->name, new->name);
+ old->flags &= ~SA_UNMASKIRQ;
+ new->flags &= ~SA_UNMASKIRQ;
+ }
+
/* add new interrupt at end of irq queue */
do {
p = &old->next;
Index: linux-2.5.33/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /home/zwane/source/cvs_rep/linux-2.5.33/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 io_apic.c
--- linux-2.5.33/arch/i386/kernel/io_apic.c 2002/09/08 10:48:21 1.1.1.1
+++ linux-2.5.33/arch/i386/kernel/io_apic.c 2002/09/08 14:43:33
@@ -1292,15 +1292,14 @@
#define shutdown_edge_ioapic_irq disable_edge_ioapic_irq

/*
- * Once we have recorded IRQ_PENDING already, we can mask the
+ * Once we have recorded irq pending already, we can mask the
* interrupt for real. This prevents IRQ storms from unhandled
* devices.
*/
static void ack_edge_ioapic_irq(unsigned int irq)
{
balance_irq(irq);
- if ((irq_desc[irq].status & (IRQ_PENDING | IRQ_DISABLED))
- == (IRQ_PENDING | IRQ_DISABLED))
+ if (irq_desc[irq].pending && (irq_desc[irq].status & IRQ_DISABLED))
mask_IO_APIC_irq(irq);
ack_APIC_irq();
}

--
function.linuxpower.ca



2002-09-08 21:56:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Sun, 8 Sep 2002, Zwane Mwaikambo wrote:
>
> Here is a newer (untested) patch incorporating Ingo's suggestions as well
> as adding an extra request_irq flag so that isrs can use isr_unmask_irq()
> to enable their interrupt lines.

Hmm.. I really don't get the point of what this is supposed to actually
help.

Clearly, if the device doesn't share the irq line, this doesn't matter.
Similarly, it shouldn't matter if there is just one device that is active
(ie irq line sharing with some slow device where the interrupt happens
fairly seldom).

As far as I can tell, the only time when this might be an advantage is an
SMP machine with multiple devices sharing an extremely busy irq line. Then
the per-isr in-progress bit allows multiple CPU's to actively handle
several of the devices at the same time.

Or is there some other case where this is helpful?

The reason I don't much like this is:

- bigger SMP machines don't tend to share all that many interrupts
anyway, since they all use IO-APICs and tend to have fairly sparse irq
setups (as opposed to most laptops, which often seem to put every PCI
device on the same irq)

- if both devices really _are_ that actively pushing a lot of interrupts,
it sounds like you actually want to keep the caches hot on one CPU
instead of randomly taking the irq on various CPU's. Have you actually
got performance numbers to show otherwise? That irq lock is going to
bounce back and forth a _lot_, quite possibly undoing any advantage of
the patch.

- on all the cases where this _doesn't_ help, it just potentially makes
the stack depth even deeper.

So I'd really like to understand what the upsides are..

Linus

2002-09-08 22:59:22

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

Linus Torvalds wrote:
> On Sun, 8 Sep 2002, Zwane Mwaikambo wrote:
>
>>Here is a newer (untested) patch incorporating Ingo's suggestions as well
>>as adding an extra request_irq flag so that isrs can use isr_unmask_irq()
>>to enable their interrupt lines.
>
>
> Hmm.. I really don't get the point of what this is supposed to actually
> help.
>
> Clearly, if the device doesn't share the irq line, this doesn't matter.
> Similarly, it shouldn't matter if there is just one device that is active
> (ie irq line sharing with some slow device where the interrupt happens
> fairly seldom).
>
> As far as I can tell, the only time when this might be an advantage is an
> SMP machine with multiple devices sharing an extremely busy irq line. Then
> the per-isr in-progress bit allows multiple CPU's to actively handle
> several of the devices at the same time.


IMO one should seek to avoid sharing an IRQ line at all. I dunno that
you really want to tune for that case, when the user could vastly
improve the situation by manipulating IRQs in BIOS setup or similar
IRQ-distribution methods.

On an SMP box you especially want to distribute irqs to take best
advantage of irq affinity.

Jeff


2002-09-09 06:44:34

by bert hubert

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Sun, Sep 08, 2002 at 03:01:02PM -0700, Linus Torvalds wrote:

> setups (as opposed to most laptops, which often seem to put every PCI
> device on the same irq)

I've always thought that this was a linux problem - any reason *why* laptops
do this?

Regards,

bert hubert

--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO

2002-09-09 09:46:21

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

Hi,
I sent you a bad patch yesterday.

Zwane

Index: linux-2.5.33/arch/i386/kernel/io_apic.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/arch/i386/kernel/io_apic.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 io_apic.c
--- linux-2.5.33/arch/i386/kernel/io_apic.c 31 Aug 2002 22:31:11 -0000 1.1.1.1
+++ linux-2.5.33/arch/i386/kernel/io_apic.c 8 Sep 2002 17:45:32 -0000
@@ -1292,15 +1292,14 @@
#define shutdown_edge_ioapic_irq disable_edge_ioapic_irq

/*
- * Once we have recorded IRQ_PENDING already, we can mask the
+ * Once we have recorded irq pending already, we can mask the
* interrupt for real. This prevents IRQ storms from unhandled
* devices.
*/
static void ack_edge_ioapic_irq(unsigned int irq)
{
balance_irq(irq);
- if ((irq_desc[irq].status & (IRQ_PENDING | IRQ_DISABLED))
- == (IRQ_PENDING | IRQ_DISABLED))
+ if (irq_desc[irq].pending && (irq_desc[irq].status & IRQ_DISABLED))
mask_IO_APIC_irq(irq);
ack_APIC_irq();
}
Index: linux-2.5.33/arch/i386/kernel/irq.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/arch/i386/kernel/irq.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 irq.c
--- linux-2.5.33/arch/i386/kernel/irq.c 31 Aug 2002 22:31:11 -0000 1.1.1.1
+++ linux-2.5.33/arch/i386/kernel/irq.c 8 Sep 2002 22:46:56 -0000
@@ -66,7 +66,7 @@
* Controller mappings for all interrupt sources:
*/
irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
- { [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
+ { [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED, 0}};

static void register_irq_proc (unsigned int irq);

@@ -186,11 +186,29 @@
#if CONFIG_SMP
inline void synchronize_irq(unsigned int irq)
{
+ /* We can still synchronize with IRQ_INPROGRESS even
+ * with asynchronous handlers since it encapsulates
+ * the handlers
+ */
while (irq_desc[irq].status & IRQ_INPROGRESS)
cpu_relax();
}
#endif

+/* This should be called from ISRs to enable
+ * their interrupt line once they've acked
+ * the device interrupt they are handling.
+ */
+
+void isr_unmask_irq(unsigned int irq)
+{
+ irq_desc_t *desc = irq_desc + irq;
+
+ /* check for an IOAPIC wired interrupt */
+ if ((desc->action->flags & SA_UNMASKIRQ) && !memcmp(desc->handler, "IO", 2))
+ desc->handler->ack(irq);
+}
+
/*
* This should really return information about whether
* we should do bottom half handling etc. Right now we
@@ -200,21 +218,27 @@
*/
int handle_IRQ_event(unsigned int irq, struct pt_regs * regs, struct irqaction * action)
{
- int status = 1; /* Force the "do bottom halves" bit */
+ int ret = 1; /* Force the "do bottom halves" bit */

if (!(action->flags & SA_INTERRUPT))
local_irq_enable();

+ /* Ease irq sharing by allowing other handlers to be run instead
+ * of blocking all with IRQ_INPROGRESS */
+
do {
- status |= action->flags;
- action->handler(irq, action->dev_id, regs);
+ if (test_and_set_bit(ISR_INPROGRESS, &action->status) == 0) {
+ action->handler(irq, action->dev_id, regs);
+ clear_bit(ISR_INPROGRESS, &action->status);
+ }
+ ret |= action->flags;
action = action->next;
} while (action);
- if (status & SA_SAMPLE_RANDOM)
+ if (ret & SA_SAMPLE_RANDOM)
add_interrupt_randomness(irq);
local_irq_disable();

- return status;
+ return ret;
}

/*
@@ -289,9 +313,10 @@
case 1: {
unsigned int status = desc->status & ~IRQ_DISABLED;
desc->status = status;
- if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
+
+ if (!(status & IRQ_REPLAY) && desc->pending) {
desc->status = status | IRQ_REPLAY;
- hw_resend_irq(desc->handler,irq);
+ hw_resend_irq(desc->handler, irq);
}
desc->handler->enable(irq);
/* fall-through */
@@ -338,16 +363,19 @@
WAITING is used by probe to mark irqs that are being tested
*/
status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
- status |= IRQ_PENDING; /* we _want_ to handle it */
+ desc->pending++; /* we _want_ to handle it */

/*
* If the IRQ is disabled for whatever reason, we cannot
- * use the action we have.
+ * use the action we have. Note that we don't check for
+ * IRQ_INPROGRESS, we allow multiple ISRs from a shared
+ * interrupt to be run concurrently, but still not allowing
+ * the same handler to be run asynchronously.
*/
action = NULL;
- if (likely(!(status & (IRQ_DISABLED | IRQ_INPROGRESS)))) {
+ if (likely(!(status & IRQ_DISABLED))) {
action = desc->action;
- status &= ~IRQ_PENDING; /* we commit to handling */
+ desc->pending--; /* we commit to handling */
status |= IRQ_INPROGRESS; /* we are handling it */
}
desc->status = status;
@@ -376,9 +404,9 @@
handle_IRQ_event(irq, &regs, action);
spin_lock(&desc->lock);

- if (likely(!(desc->status & IRQ_PENDING)))
+ if (likely(!desc->pending))
break;
- desc->status &= ~IRQ_PENDING;
+ desc->pending--;
}
out:
desc->status &= ~IRQ_INPROGRESS;
@@ -424,6 +452,7 @@
*
* SA_SAMPLE_RANDOM The interrupt can be used for entropy
*
+ * SA_UNMASKIRQ Unmask irq line whilst processing
*/

int request_irq(unsigned int irq,
@@ -463,6 +492,7 @@
action->mask = 0;
action->name = devname;
action->next = NULL;
+ action->status = 0;
action->dev_id = dev_id;

retval = setup_irq(irq, action);
@@ -563,14 +593,14 @@
desc = irq_desc + i;

spin_lock_irq(&desc->lock);
- if (!irq_desc[i].action)
- irq_desc[i].handler->startup(i);
+ if (!desc->action)
+ desc->handler->startup(i);
spin_unlock_irq(&desc->lock);
}

/* Wait for longstanding interrupts to trigger. */
for (delay = jiffies + HZ/50; time_after(delay, jiffies); )
- /* about 20ms delay */ barrier();
+ /* about 20ms delay */ cpu_relax();

/*
* enable any unassigned irqs
@@ -584,7 +614,7 @@
if (!desc->action) {
desc->status |= IRQ_AUTODETECT | IRQ_WAITING;
if (desc->handler->startup(i))
- desc->status |= IRQ_PENDING;
+ desc->pending++;
}
spin_unlock_irq(&desc->lock);
}
@@ -593,7 +623,7 @@
* Wait for spurious interrupts to trigger
*/
for (delay = jiffies + HZ/10; time_after(delay, jiffies); )
- /* about 100ms delay */ barrier();
+ /* about 100ms delay */ cpu_relax();

/*
* Now filter out any obviously spurious interrupts
@@ -756,6 +786,14 @@
return -EBUSY;
}

+ if (!(old->flags & new->flags & SA_UNMASKIRQ)) {
+ printk(KERN_DEBUG
+ "Unable to unmask irqs with isrs old:%s new:%s\n",
+ old->name, new->name);
+ old->flags &= ~SA_UNMASKIRQ;
+ new->flags &= ~SA_UNMASKIRQ;
+ }
+
/* add new interrupt at end of irq queue */
do {
p = &old->next;
Index: linux-2.5.33/include/asm-i386/hw_irq.h
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/include/asm-i386/hw_irq.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 hw_irq.h
--- linux-2.5.33/include/asm-i386/hw_irq.h 31 Aug 2002 22:31:04 -0000 1.1.1.1
+++ linux-2.5.33/include/asm-i386/hw_irq.h 8 Sep 2002 20:29:58 -0000
@@ -43,6 +43,7 @@

extern void mask_irq(unsigned int irq);
extern void unmask_irq(unsigned int irq);
+extern void isr_unmask_irq(unsigned int irq);
extern void disable_8259A_irq(unsigned int irq);
extern void enable_8259A_irq(unsigned int irq);
extern int i8259A_irq_pending(unsigned int irq);
Index: linux-2.5.33/include/asm-i386/signal.h
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/include/asm-i386/signal.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 signal.h
--- linux-2.5.33/include/asm-i386/signal.h 31 Aug 2002 22:31:04 -0000 1.1.1.1
+++ linux-2.5.33/include/asm-i386/signal.h 8 Sep 2002 17:47:11 -0000
@@ -120,6 +120,7 @@
#define SA_PROBE SA_ONESHOT
#define SA_SAMPLE_RANDOM SA_RESTART
#define SA_SHIRQ 0x04000000
+#define SA_UNMASKIRQ SA_NOMASK
#endif

#define SIG_BLOCK 0 /* for blocking signals */
Index: linux-2.5.33/include/linux/interrupt.h
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/include/linux/interrupt.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 interrupt.h
--- linux-2.5.33/include/linux/interrupt.h 31 Aug 2002 22:30:51 -0000 1.1.1.1
+++ linux-2.5.33/include/linux/interrupt.h 8 Sep 2002 17:45:31 -0000
@@ -13,6 +13,8 @@
#include <asm/system.h>
#include <asm/ptrace.h>

+#define ISR_INPROGRESS 1 /* ISR currently being executed */
+
struct irqaction {
void (*handler)(int, void *, struct pt_regs *);
unsigned long flags;
@@ -20,6 +22,7 @@
const char *name;
void *dev_id;
struct irqaction *next;
+ unsigned long status;
};


Index: linux-2.5.33/include/linux/irq.h
===================================================================
RCS file: /build/cvsroot/linux-2.5.33/include/linux/irq.h,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 irq.h
--- linux-2.5.33/include/linux/irq.h 31 Aug 2002 22:30:53 -0000 1.1.1.1
+++ linux-2.5.33/include/linux/irq.h 8 Sep 2002 17:45:32 -0000
@@ -24,13 +24,12 @@
*/
#define IRQ_INPROGRESS 1 /* IRQ handler active - do not enter! */
#define IRQ_DISABLED 2 /* IRQ disabled - do not enter! */
-#define IRQ_PENDING 4 /* IRQ pending - replay on enable */
-#define IRQ_REPLAY 8 /* IRQ has been replayed but not acked yet */
-#define IRQ_AUTODETECT 16 /* IRQ is being autodetected */
-#define IRQ_WAITING 32 /* IRQ not yet seen - for autodetection */
-#define IRQ_LEVEL 64 /* IRQ level triggered */
-#define IRQ_MASKED 128 /* IRQ masked - shouldn't be seen again */
-#define IRQ_PER_CPU 256 /* IRQ is per CPU */
+#define IRQ_REPLAY 4 /* IRQ has been replayed but not acked yet */
+#define IRQ_AUTODETECT 8 /* IRQ is being autodetected */
+#define IRQ_WAITING 16 /* IRQ not yet seen - for autodetection */
+#define IRQ_LEVEL 32 /* IRQ level triggered */
+#define IRQ_MASKED 64 /* IRQ masked - shouldn't be seen again */
+#define IRQ_PER_CPU 128 /* IRQ is per CPU */

/*
* Interrupt controller descriptor. This is all we need
@@ -62,6 +61,7 @@
struct irqaction *action; /* IRQ action list */
unsigned int depth; /* nested irq disables */
spinlock_t lock;
+ unsigned int pending;
} ____cacheline_aligned irq_desc_t;

extern irq_desc_t irq_desc [NR_IRQS];

2002-09-09 09:57:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Sun, 8 Sep 2002, Linus Torvalds wrote:

> As far as I can tell, the only time when this might be an advantage is
> an SMP machine with multiple devices sharing an extremely busy irq line.
> Then the per-isr in-progress bit allows multiple CPU's to actively
> handle several of the devices at the same time.
>
> Or is there some other case where this is helpful?

it could also improve latency of a faster interrupt source that shares its
irq line with a slow (but still frequent) handler. (such as SCSI or ne2k.)
This is both on UP and on SMP.

although this might be less of an issue these days.

Ingo

2002-09-09 10:10:29

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

bert hubert <[email protected]> wrote ..
> On Sun, Sep 08, 2002 at 03:01:02PM -0700, Linus Torvalds wrote:
>
> > setups (as opposed to most laptops, which often seem to put every PCI
> > device on the same irq)
>
> I've always thought that this was a linux problem - any reason *why* laptops
> do this?

Hi Bert,
I'd presume the reason for that would be because the irq/pin mappings end up in a manner that all the devices end up having the same irq assigned to the pin they're using. This would be a BIOS/fw problem, although it can be alleviated with PCI IRQ router support for that particular chipset.

Zwane

2002-09-09 14:30:08

by Robert Love

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, 2002-09-09 at 05:49, [email protected] wrote:

> I sent you a bad patch yesterday.

I think Linus raises some important points wrt SMP performance. At the
same time, I think your patch is very simple and has the possibility to
provide improved performance on SMP _and_ UP when dealing with busy
shared interrupt handlers.

For example, consider two handlers on the same line. Even on UP, we can
find one slower handler blocking the run of another. So I think the
benefit to latency is clear.

I dunno about throughput... Linus's points need to be addressed.

Robert Love

2002-09-09 15:02:06

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Mon, 9 Sep 2002, Ingo Molnar wrote:
>
> On Sun, 8 Sep 2002, Linus Torvalds wrote:
>
> > As far as I can tell, the only time when this might be an advantage is
> > an SMP machine with multiple devices sharing an extremely busy irq line.
> > Then the per-isr in-progress bit allows multiple CPU's to actively
> > handle several of the devices at the same time.
> >
> > Or is there some other case where this is helpful?
>
> it could also improve latency of a faster interrupt source that shares its
> irq line with a slow (but still frequent) handler. (such as SCSI or ne2k.)

Well, it migth also _deprove_ that latency, as taking another interrupt is
a lot more expensive than just walking the list of ISR's on the existing
irq chain.

Particularly on a P4, taking an interrupt is quite expensive.

Remember: you'd be "improving latency" by taking several interrupts
instead of taking just one. And usually, if the system is really under so
much interrupt load that this would be noticeable, you want to try to
_mitigate_ interrupts instead of adding new ones.

I think. I'd like to point out that I just have a gut feel for this, so
I'm definitely not trying to say that I absolutely hate the idea and that
it will never happen. But the thing worries me a bit, and I really would
prefer to have some quantifiable reasons for or against it.

In other words, I kind of understand your concerns, but I've got concerns
of my own. But nobody will argue against numbers..

Linus

2002-09-09 16:16:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

In article <[email protected]>,
bert hubert <[email protected]> wrote:
>On Sun, Sep 08, 2002 at 03:01:02PM -0700, Linus Torvalds wrote:
>
>> setups (as opposed to most laptops, which often seem to put every PCI
>> device on the same irq)
>
>I've always thought that this was a linux problem - any reason *why* laptops
>do this?

I have no idea why, but according to the irq routing information, the
irq lines really often _are_ wired to the same pin on the irq
controller.

(Oh, I'm sure there are cases where Linux ends up using the same irq
even when it isn't necessary, but equally often everything really is on
just irq 9 or something like that).

There may be good reasons for it, but I suspect it's one of those "we
are lazy, and it was just easier to tie those lines together" things at
design time. It might make for one less pin used, and potentially makes
the PIRQ table easier to write. Whatever.

Linus

2002-09-09 17:55:34

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On 9 Sep 2002, Robert Love wrote:

> I dunno about throughput... Linus's points need to be addressed.

Yep agreed, i'm going to try and get numbers.

Thanks,
Zwane
--
function.linuxpower.ca

2002-09-09 18:35:37

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Mon, 9 Sep 2002, Linus Torvalds wrote:

> Remember: you'd be "improving latency" by taking several interrupts
> instead of taking just one. And usually, if the system is really under
> so much interrupt load that this would be noticeable, you want to try to
> _mitigate_ interrupts instead of adding new ones.

There's also the following effect that could generate additional
interrupts: the *same* IRQ source that is currently executing might
generate a (spurious but otherwise harmless) interrupt if we first ACK the
card then ACK the APIC and then do processing. Our current way of masking
interrupts in the IO-APIC at least leaves them pending there until the
handler's main work loop is finished and mitigates irqs.

Ingo

2002-09-09 18:45:54

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Mon, 9 Sep 2002, Ingo Molnar wrote:
>
> There's also the following effect that could generate additional
> interrupts: the *same* IRQ source that is currently executing might
> generate a (spurious but otherwise harmless) interrupt if we first ACK the
> card then ACK the APIC and then do processing. Our current way of masking
> interrupts in the IO-APIC at least leaves them pending there until the
> handler's main work loop is finished and mitigates irqs.

I agree with you, but that is only true for edge-triggered APIC
interrupts, though - for level-triggered ones we will just re-take the
interrupt when we unmask it again.

Which is kind of sad. Is there some fast way to read the status of a
level-trigger irq off the IO-APIC in case it is still pending, and to do
the mitigation even for level-triggered?

(Btw, if there is, that would also allow us to notice the "constantly
screaming PCI interrupt" without help from the low-level isrs)

Linus

2002-09-09 19:09:56

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, 9 Sep 2002, Linus Torvalds wrote:

> I agree with you, but that is only true for edge-triggered APIC
> interrupts, though - for level-triggered ones we will just re-take the
> interrupt when we unmask it again.
>
> Which is kind of sad. Is there some fast way to read the status of a
> level-trigger irq off the IO-APIC in case it is still pending, and to do
> the mitigation even for level-triggered?

perhaps Remote IRR might help there?

> (Btw, if there is, that would also allow us to notice the "constantly
> screaming PCI interrupt" without help from the low-level isrs)

As an aside, i just had an idea for another way to improve interrupt
handling latency. Instead of walking through all the isrs in the chain,
we can have an isr flag wether it was the source of the irq, and if so we
stop right there and not walk through the other isrs. Obviously taking
into account that some devices are dumb and have no real way of
determining.

Zwane

--
function.linuxpower.ca


2002-09-09 19:14:17

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


On Mon, 9 Sep 2002, Zwane Mwaikambo wrote:

> As an aside, i just had an idea for another way to improve interrupt
> handling latency. Instead of walking through all the isrs in the chain,
> we can have an isr flag wether it was the source of the irq, and if so
> we stop right there and not walk through the other isrs. Obviously
> taking into account that some devices are dumb and have no real way of
> determining.

this is something i have a 0.5 MB patch for that touches a few hundred
drivers. I can dust it off if there's demand - it will break almost
nothing because i've done the hard work of adding the default 'no work was
done' bit to every driver's IRQ handler.

Ingo

2002-09-09 19:33:16

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, 9 Sep 2002, Zwane Mwaikambo wrote:

> > Which is kind of sad. Is there some fast way to read the status of a
> > level-trigger irq off the IO-APIC in case it is still pending, and to do
> > the mitigation even for level-triggered?
>
> perhaps Remote IRR might help there?

I don't think so. As far as I understand the I/O APIC operation, you
can't really know the state of an interrupt input when Remote IRR is set
(i.e. an interrupt from the input is being processed). You can only read
a sort of state of an input from the Delivery Status bit when Remote IRR
is cleared.

For the i82489DX you could have read the state of an individual interrupt
request from the IRR register of the local unit handling the IRQ.

--
+ Maciej W. Rozycki, Technical University of Gdansk, Poland +
+--------------------------------------------------------------+
+ e-mail: [email protected], PGP key available +

2002-09-09 19:38:15

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, 9 Sep 2002, Ingo Molnar wrote:

> this is something i have a 0.5 MB patch for that touches a few hundred
> drivers. I can dust it off if there's demand - it will break almost
> nothing because i've done the hard work of adding the default 'no work was
> done' bit to every driver's IRQ handler.

Could you upload and post a URL?

Thanks,
Zwane

--
function.linuxpower.ca

2002-09-09 20:04:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

Ingo Molnar wrote:
>
> On Mon, 9 Sep 2002, Zwane Mwaikambo wrote:
>
> > As an aside, i just had an idea for another way to improve interrupt
> > handling latency. Instead of walking through all the isrs in the chain,
> > we can have an isr flag wether it was the source of the irq, and if so
> > we stop right there and not walk through the other isrs. Obviously
> > taking into account that some devices are dumb and have no real way of
> > determining.
>
> this is something i have a 0.5 MB patch for that touches a few hundred
> drivers. I can dust it off if there's demand - it will break almost
> nothing because i've done the hard work of adding the default 'no work was
> done' bit to every driver's IRQ handler.
>

Does that code re-order the chain dynamically?

(My laptop shares an interrupt between the cardbus controller
and the cardbus ethernet controller. The ethernet controller
generates 1000 interrupts per second. The cardbus controller
generates 2 interrupts per day. yenta_interrupt is really, really
slow).

2002-09-09 20:33:22

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Sun, 2002-09-08 at 11:57, Zwane Mwaikambo wrote:
> iirc IDE is capable of doing its own masking per device(nIEN) and in fact
> does even do unconditional sti's in its isr paths. So i would think it
> would be one of the not so painful device drivers to take care of.

If I remember rightly nIEN doesnt work everywhere. Also many IDE
interfaces may be using legacy IRQ wiring rather than PCI irq lines.

2002-09-09 20:44:56

by Andre Hedrick

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


nIEN is set to allow the device to interrupt the host controller.
Some host controllers have the ablity to block and hold sticky that
transaction to the CPU.

You set nIEN and we need a whole new set of state diagrams for the driver.

nIEN = 0, interrupt driver
nIEN = 1, polling driver

You may not switch this in the middle of the execution of a command block.
If you want to try this, go for it, and leave me off the CC for the mess
you will make of your data.

On 9 Sep 2002, Alan Cox wrote:

> On Sun, 2002-09-08 at 11:57, Zwane Mwaikambo wrote:
> > iirc IDE is capable of doing its own masking per device(nIEN) and in fact
> > does even do unconditional sti's in its isr paths. So i would think it
> > would be one of the not so painful device drivers to take care of.
>
> If I remember rightly nIEN doesnt work everywhere. Also many IDE
> interfaces may be using legacy IRQ wiring rather than PCI irq lines.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

Andre Hedrick
LAD Storage Consulting Group

2002-09-10 07:41:55

by Rogier Wolff

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, Sep 09, 2002 at 11:53:40AM -0700, Linus Torvalds wrote:
>
> (Btw, if there is, that would also allow us to notice the "constantly
> screaming PCI interrupt" without help from the low-level isrs)

OH! That'd be nice: instead of a lockup if a PCI device's interrupt
isn't serviced, you get a nice message and a machine that might
still work!

On the other hand, you have a possibility for disaster if the
threshold isn't set right.

I have written serial drivers where the card will limit the interrupt
rate to max 100 per second. I then build in a detection: if my IRQ
handler gets called more than 10 times in a jiffy, we're in trouble.

Turns out that I left this in "in the field" and some people put the
serial card on the same interrupt line as a SCSI controller. The
scsi controller can generate more than 1000 interrupts per second ->
my driver shuts down....

Something similar may happen if say you net-spray a sligtly
under-powered machine with a Gigabit ethernet card: The GBE card may
indeed have a new packet ready by the time the interrupt tries to
return. Leads to an interesting DOS: just send a bunch of packets
in quick succession and the machine drops off the internet...

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* The Worlds Ecosystem is a stable system. Stable systems may experience *
* excursions from the stable situation. We are currenly in such an *
* excursion: The stable situation does not include humans. ***************

2002-09-10 11:51:07

by Stephan von Krawczynski

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, 9 Sep 2002 21:37:42 +0200 (SAST)
Zwane Mwaikambo <[email protected]> wrote:

> > (Btw, if there is, that would also allow us to notice the "constantly
> > screaming PCI interrupt" without help from the low-level isrs)
>
> As an aside, i just had an idea for another way to improve interrupt
> handling latency. Instead of walking through all the isrs in the chain,
> we can have an isr flag wether it was the source of the irq, and if so we
> stop right there and not walk through the other isrs. Obviously taking
> into account that some devices are dumb and have no real way of
> determining.
>
> Zwane

Hello,

a short note on that: this proved to be a particularly bad idea back in the
amiga-days. All that happened with this idea (Amiga which has basically only 2
usable interrupts has heavy interrupt sharing) is that every good programmer
told the system that it was not the source of the ongoing interrupt - even if
it was - because otherwise you lost interrupts in heavy-load environment.
Shared interrupts _can_ work well, but you have to do short interrupt-routines
and don't mess the thing by over-intelligent (in fact non-atomic) operation.

Regards,
Stephan


>
> --
> function.linuxpower.ca
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2002-09-10 15:01:54

by David Woodhouse

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers


[email protected] said:
> this is something i have a 0.5 MB patch for that touches a few
> hundred drivers. I can dust it off if there's demand - it will break
> almost nothing because i've done the hard work of adding the default
> 'no work was done' bit to every driver's IRQ handler.

Note that you can also survive IRQ storms this way -- if all handlers
returned 'no work was done' and you get over $N irqs per second, disable
that IRQ for a while.

--
dwmw2


2002-09-10 15:27:20

by Alan

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Tue, 2002-09-10 at 16:05, David Woodhouse wrote:
>
> [email protected] said:
> > this is something i have a 0.5 MB patch for that touches a few
> > hundred drivers. I can dust it off if there's demand - it will break
> > almost nothing because i've done the hard work of adding the default
> > 'no work was done' bit to every driver's IRQ handler.
>
> Note that you can also survive IRQ storms this way -- if all handlers
> returned 'no work was done' and you get over $N irqs per second, disable
> that IRQ for a while.

You can do that without touching any drivers and its better that way

Firstly the "no work was done" check is insufficient if work is being
done but the IRQ rate is too high to keep up.

Secondly the check means mangling all the drivers when you can establish
if work was done anyway by checking bh/userspace has also had some run
time.

I'm all for surviving IRQ storms on a level triggered IRQ, but do it
purely on interrupt rate and measured system progress not on some
assumption that the driver knows what its doing.

2002-09-10 17:19:36

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Mon, 9 Sep 2002, Andre Hedrick wrote:

> You may not switch this in the middle of the execution of a command block.
> If you want to try this, go for it, and leave me off the CC for the mess
> you will make of your data.

No worries, it was mentioned as an auxiliary comment on another topic, i
definitely do _not_ want to go anywhere near IDE code last i saw it has
quite a head count ;)

Zwane
--
function.linuxpower.ca

2002-09-10 18:00:27

by Gunther Mayer

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

Zwane Mwaikambo wrote:

>On Mon, 9 Sep 2002, Linus Torvalds wrote:
>
>
>
>>I agree with you, but that is only true for edge-triggered APIC
>>interrupts, though - for level-triggered ones we will just re-take the
>>interrupt when we unmask it again.
>>
>>Which is kind of sad. Is there some fast way to read the status of a
>>level-trigger irq off the IO-APIC in case it is still pending, and to do
>>the mitigation even for level-triggered?
>>
>>
>
>perhaps Remote IRR might help there?
>
>
>
>>(Btw, if there is, that would also allow us to notice the "constantly
>>screaming PCI interrupt" without help from the low-level isrs)
>>
>>
>
>As an aside, i just had an idea for another way to improve interrupt
>handling latency. Instead of walking through all the isrs in the chain,
>we can have an isr flag wether it was the source of the irq, and if so we
>stop right there and not walk through the other isrs.
>
This method is flawed for edge-triggered interrupts: you will miss any
interrupts which come in before you acked the first.


2002-09-10 18:13:02

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers

On Tue, 10 Sep 2002, Gunther Mayer wrote:

> This method is flawed for edge-triggered interrupts: you will miss any
> interrupts which come in before you acked the first.

We ack the PIC before doing the handler walk.

Zwane
--
function.linuxpower.ca

2002-09-10 20:49:26

by Gérard Roudier

[permalink] [raw]
Subject: Re: [PATCH][RFC] per isr in_progress markers



On Mon, 9 Sep 2002, Ingo Molnar wrote:

>
> On Mon, 9 Sep 2002, Zwane Mwaikambo wrote:
>
> > As an aside, i just had an idea for another way to improve interrupt
> > handling latency. Instead of walking through all the isrs in the chain,
> > we can have an isr flag wether it was the source of the irq, and if so
> > we stop right there and not walk through the other isrs. Obviously
> > taking into account that some devices are dumb and have no real way of
> > determining.
>
> this is something i have a 0.5 MB patch for that touches a few hundred
> drivers. I can dust it off if there's demand - it will break almost
> nothing because i've done the hard work of adding the default 'no work was
> done' bit to every driver's IRQ handler.

Level-triggerred (aka level-sensitive) interrupt is a condition. No matter
which device raises the condition first when, in fact, more than one
device did so. IMO, the heuristics that try to associate a CPU interrupt
to a single device source are broken when level triggerred is the concern,
for the simple reason they just rely on false assumptions. (The same
way, 'no work was done' for a device does not means that it is 100% sure
IRQ wasn't triggerred by this device).

Prior to improve something, we should want to estimate if there is really
room for improvement. For example, does Linux have to handle far more
interrupts than really needed ? The 'no work was done' you suggest can
help maintaining such stats and see if the kernel is good or bad in this
area. When applied in a per device manner, it might be accurate on low IRQ
load, but may significantly lie in high IRQ load situation, IMO.

Only having driver isr's returning some 'no work was done' is not enough
to optimize interrupts, in my opinion. What we want to know is if we got
too many interrupts or not. We need more information for this. For
example, drivers could also tell the kernel about the number of real
completions or some number of interrupts that can be considered useful.
For example, if a driver expects 1 interrupt per IOs, it might export to
the kernel the number of completed IOs ...

G?rard.