2013-03-19 15:12:22

by Daniel Vetter

[permalink] [raw]
Subject: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)))

On Tue, Mar 19, 2013 at 10:03 AM, Chris Wilson <[email protected]> wrote:
>> > How about just using:
>> > if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0;
>> > and the existing wait loop?
>>
>> I explicitly wanted to avoid touching GMBUS4 register, as the real cause
>> of the failure is not clear.
>>
>> But, as Yinghai Lu points out, the problem is most likely caused by
>> interrupt disabling not working properly (see his very good point
>> regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out
>> should work .... and it indeed does in my case, hence the (tested) patch
>> below.
>>
>> I think it's a 3.9-rc material, and I am all open to debug this further
>> for 3.10 so that the race is closed and gmbus irqs can be used on Gen4
>> platform properly.
>
> Agreed. Using the IRQ for GMBUS is just a performance feature that can
> be deferred until after we determine the root cause - and hope that the
> failure is somehow peculiar to GMBUS.

Ok, I've merged this patch. But some further investigation points at a
much more severe dragon hiding here: The MSI interrupt for the intel
gfx is commonly in the 40+ range, but the interrupt vector with the
spurious interrupts is 16. Which is the irq of the intel gfx when MSI
is disabled!

So it looks like gmbus on the intel gfx is capable of generating
non-MSI interrupts in parallel to the MSI interrupts (since apparently
gmbus still works, so we get the interrupts we expect). I have no idea
how that could happen. Hence adding a bunch of people with more clue
than me.

For reference below the updated commit message.

Cheers, Daniel

Author: Jiri Kosina <[email protected]>
Date: Tue Mar 19 09:56:57 2013 +0100

drm/i915: stop using GMBUS IRQs on Gen4 chips

Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to
using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
and above.

It turns out though that on many systems this leads to spurious interrupts
being generated, long after the register write to disable the IRQs has been
issued.

Typically this results in the spurious interrupt source getting
disabled:

[ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 9.637915] Pid: 4157, comm: ifup Tainted: GF
3.9.0-rc2-00341-g0863702 #422
[ 9.639484] Call Trace:
[ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7
[ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8
[ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214
[ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c
[ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0
[ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24
[ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf
[ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a
[ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b
[ 9.640731] handlers:
[ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore]
[ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket]
[ 9.640731] Disabling IRQ #16

The really curious thing is now that irq 16 is _not_ the interrupt for
the i915 driver when using MSI, but it _is_ the interrupt when not
using MSI. So by all indications it seems like gmbus is able to
generate a legacy (shared) interrupt in MSI mode on some
configurations. I've tried to reproduce this and the differentiating
thing seems to be that on unaffected systems no other device uses irq
16 (which seems to be the non-MSI intel gfx interrupt on all gm45).

I have no idea how that even can happen.

To avoid tempting this elephant into a rage, just disable gmbus
interrupt support on gen 4.

v2: Improve the commit message with exact details of what's going on.
Also add a comment in the code to warn against this particular
elephant in the room.

Signed-off-by: Jiri Kosina <[email protected]> (v1)
Acked-by: Chris Wilson <[email protected]> (v1)
References: https://lkml.org/lkml/2013/3/8/325
Signed-off-by: Daniel Vetter <[email protected]>
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


2013-03-19 15:38:52

by Alan Stern

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses

On Tue, 19 Mar 2013, Daniel Vetter wrote:

> For reference below the updated commit message.
>
> Cheers, Daniel
>
> Author: Jiri Kosina <[email protected]>
> Date: Tue Mar 19 09:56:57 2013 +0100

>
> drm/i915: stop using GMBUS IRQs on Gen4 chips
>
> Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to
> using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
> and above.
>
> It turns out though that on many systems this leads to spurious interrupts
> being generated, long after the register write to disable the IRQs has been
> issued.
>
> Typically this results in the spurious interrupt source getting
> disabled:
>
> [ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option)
> [ 9.637915] Pid: 4157, comm: ifup Tainted: GF
> 3.9.0-rc2-00341-g0863702 #422
> [ 9.639484] Call Trace:
> [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7
> [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8
> [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214
> [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c
> [ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0
> [ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24
> [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf
> [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a
> [ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b
> [ 9.640731] handlers:
> [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore]
> [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket]
> [ 9.640731] Disabling IRQ #16
>
> The really curious thing is now that irq 16 is _not_ the interrupt for
> the i915 driver when using MSI, but it _is_ the interrupt when not
> using MSI. So by all indications it seems like gmbus is able to
> generate a legacy (shared) interrupt in MSI mode on some
> configurations. I've tried to reproduce this and the differentiating
> thing seems to be that on unaffected systems no other device uses irq
> 16 (which seems to be the non-MSI intel gfx interrupt on all gm45).

That might be misleading. It's possible that the erroneous IRQs _are_
being issued but you're simply not aware of them. If the kernel thinks
that no device is using IRQ 16 then it will leave that IRQ disabled.

Alan Stern

2013-03-19 16:54:53

by Daniel Vetter

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses

On Tue, Mar 19, 2013 at 4:38 PM, Alan Stern <[email protected]> wrote:
> On Tue, 19 Mar 2013, Daniel Vetter wrote:
>
>> For reference below the updated commit message.
>>
>> Cheers, Daniel
>>
>> Author: Jiri Kosina <[email protected]>
>> Date: Tue Mar 19 09:56:57 2013 +0100
>
>>
>> drm/i915: stop using GMBUS IRQs on Gen4 chips
>>
>> Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to
>> using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
>> and above.
>>
>> It turns out though that on many systems this leads to spurious interrupts
>> being generated, long after the register write to disable the IRQs has been
>> issued.
>>
>> Typically this results in the spurious interrupt source getting
>> disabled:
>>
>> [ 9.636345] irq 16: nobody cared (try booting with the "irqpoll" option)
>> [ 9.637915] Pid: 4157, comm: ifup Tainted: GF
>> 3.9.0-rc2-00341-g0863702 #422
>> [ 9.639484] Call Trace:
>> [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7
>> [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8
>> [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214
>> [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c
>> [ 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0
>> [ 9.640731] [<ffffffff8100400e>] handle_irq+0x1a/0x24
>> [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf
>> [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a
>> [ 9.640731] <EOI> [<ffffffff8142f952>] ? system_call_fastpath+0x16/0x1b
>> [ 9.640731] handlers:
>> [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore]
>> [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket]
>> [ 9.640731] Disabling IRQ #16
>>
>> The really curious thing is now that irq 16 is _not_ the interrupt for
>> the i915 driver when using MSI, but it _is_ the interrupt when not
>> using MSI. So by all indications it seems like gmbus is able to
>> generate a legacy (shared) interrupt in MSI mode on some
>> configurations. I've tried to reproduce this and the differentiating
>> thing seems to be that on unaffected systems no other device uses irq
>> 16 (which seems to be the non-MSI intel gfx interrupt on all gm45).
>
> That might be misleading. It's possible that the erroneous IRQs _are_
> being issued but you're simply not aware of them. If the kernel thinks
> that no device is using IRQ 16 then it will leave that IRQ disabled.

I guess I should have phrased it more precisely, but that's exactly
what I expect is happening on my machine: I don't have anything on
irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence
the irq is completely disabled. Which obviously makes it impossible
for me to reproduce the issue. To test that theory, is there a quick
way to force-enable a given interrupt, short of just hacking up a 2nd
dummy irq handler in my driver?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-03-19 18:19:02

by Alan Stern

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo

On Tue, 19 Mar 2013, Daniel Vetter wrote:

> > That might be misleading. It's possible that the erroneous IRQs _are_
> > being issued but you're simply not aware of them. If the kernel thinks
> > that no device is using IRQ 16 then it will leave that IRQ disabled.
>
> I guess I should have phrased it more precisely, but that's exactly
> what I expect is happening on my machine: I don't have anything on
> irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence
> the irq is completely disabled. Which obviously makes it impossible
> for me to reproduce the issue. To test that theory, is there a quick
> way to force-enable a given interrupt, short of just hacking up a 2nd
> dummy irq handler in my driver?

I don't know of any way. In fact, I have been thinking of writing a
test driver module, with a module parameter telling it which IRQ number
to register for. It seems like the sort of thing that would be useful
to have, from time to time.

Alan Stern

2013-03-19 19:14:42

by Yinghai Lu

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses

On Tue, Mar 19, 2013 at 9:54 AM, Daniel Vetter <[email protected]> wrote:

> I guess I should have phrased it more precisely, but that's exactly
> what I expect is happening on my machine: I don't have anything on
> irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence
> the irq is completely disabled. Which obviously makes it impossible
> for me to reproduce the issue. To test that theory, is there a quick
> way to force-enable a given interrupt, short of just hacking up a 2nd
> dummy irq handler in my driver?

You may try to add another request_irq()
after i915_load_modeset_init==>drm_irq_install.
That could install one dummy action for ioapic irq for i915.

Also you may need to add one quirk that does not disable intx during
msi enabling like:
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,
0x2e22,
quirk_msi_intx_disable_bug);

Thanks

Yinghai

2013-03-20 16:29:42

by Jiri Kosina

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo

On Tue, 19 Mar 2013, Alan Stern wrote:

> > > That might be misleading. It's possible that the erroneous IRQs _are_
> > > being issued but you're simply not aware of them. If the kernel thinks
> > > that no device is using IRQ 16 then it will leave that IRQ disabled.
> >
> > I guess I should have phrased it more precisely, but that's exactly
> > what I expect is happening on my machine: I don't have anything on
> > irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence
> > the irq is completely disabled. Which obviously makes it impossible
> > for me to reproduce the issue. To test that theory, is there a quick
> > way to force-enable a given interrupt, short of just hacking up a 2nd
> > dummy irq handler in my driver?
>
> I don't know of any way. In fact, I have been thinking of writing a
> test driver module, with a module parameter telling it which IRQ number
> to register for. It seems like the sort of thing that would be useful
> to have, from time to time.

Ok, so how about this?
Daniel, is it enough to make the problem appear on your system (by
building this into the kernel and booting with dummy-irq.irq=16)?

Thanks.





From: Jiri Kosina <[email protected]>
Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver

This module accepts a single 'irq' parameter, which it should register for.

Its sole purpose is to help with debugging of IRQ sharing problems, by
force-enabling IRQ that would otherwise be disabled.

Suggested-by: Alan Stern <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
drivers/misc/Kconfig | 8 ++++++
drivers/misc/Makefile | 1 +
drivers/misc/dummy-irq.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/dummy-irq.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index e83fdfe..db24b79 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -93,6 +93,14 @@ config ATMEL_TCB_CLKSRC_BLOCK
TC can be used for other purposes, such as PWM generation and
interval timing.

+config DUMMY_IRQ
+ tristate "Dummy IRQ handler"
+ default n
+ ---help---
+ This module accepts a single 'irq' parameter, which it should register for.
+ Its sole purpose is to help with debugging of IRQ sharing problems, by
+ force-enabling IRQ that would otherwise be disabled.
+
config IBM_ASM
tristate "Device driver for IBM RSA service processor"
depends on X86 && PCI && INPUT
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 35a1463..28ff261 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o
obj-$(CONFIG_BMP085) += bmp085.o
obj-$(CONFIG_BMP085_I2C) += bmp085-i2c.o
obj-$(CONFIG_BMP085_SPI) += bmp085-spi.o
+obj-$(CONFIG_DUMMY_IRQ) += dummy-irq.o
obj-$(CONFIG_ICS932S401) += ics932s401.o
obj-$(CONFIG_LKDTM) += lkdtm.o
obj-$(CONFIG_TIFM_CORE) += tifm_core.o
diff --git a/drivers/misc/dummy-irq.c b/drivers/misc/dummy-irq.c
new file mode 100644
index 0000000..4fc13e0
--- /dev/null
+++ b/drivers/misc/dummy-irq.c
@@ -0,0 +1,59 @@
+/*
+ * Dummy IRQ handler driver.
+ *
+ * This module only registers itself as a handler that is specified to it
+ * by the 'irq' parameter.
+ *
+ * The sole purpose of this module is to help with debugging of systems on
+ * which spurious IRQs cause the IRQ to be disabled.
+ *
+ * Copyright (C) 2013 Jiri Kosina
+ */
+
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+
+static int irq;
+
+static irqreturn_t dummy_interrupt(int irq, void *dev_id)
+{
+ static int count = 0;
+
+ if (count == 0) {
+ printk("dummy-irq: interrupt occured on IRQ %d\n", irq);
+ count++;
+ }
+
+ return IRQ_NONE;
+}
+
+static int __init dummy_irq_init(void)
+{
+ if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) {
+ printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq);
+ return -EIO;
+ }
+ printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq);
+ return 0;
+}
+
+static void __exit dummy_irq_exit(void)
+{
+ printk(KERN_INFO "dummy-irq unloaded\n");
+ free_irq(irq, &irq);
+ return;
+}
+
+module_init(dummy_irq_init);
+module_exit(dummy_irq_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jiri Kosina");
+module_param_named(irq, irq, uint, 0444);
+MODULE_PARM_DESC(irq, "The IRQ to register for");

--
Jiri Kosina
SUSE Labs

2013-03-20 17:23:16

by Alan Stern

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo

On Wed, 20 Mar 2013, Jiri Kosina wrote:

> > I don't know of any way. In fact, I have been thinking of writing a
> > test driver module, with a module parameter telling it which IRQ number
> > to register for. It seems like the sort of thing that would be useful
> > to have, from time to time.
>
> Ok, so how about this?
> Daniel, is it enough to make the problem appear on your system (by
> building this into the kernel and booting with dummy-irq.irq=16)?
>
> Thanks.
>
>
>
>
>
> From: Jiri Kosina <[email protected]>
> Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
>
> This module accepts a single 'irq' parameter, which it should register for.
>
> Its sole purpose is to help with debugging of IRQ sharing problems, by
> force-enabling IRQ that would otherwise be disabled.
>
> Suggested-by: Alan Stern <[email protected]>
> Signed-off-by: Jiri Kosina <[email protected]>

This is just what I was thinking of. Three extremely minor
suggestions...

> +static irqreturn_t dummy_interrupt(int irq, void *dev_id)
> +{
> + static int count = 0;
> +
> + if (count == 0) {
> + printk("dummy-irq: interrupt occured on IRQ %d\n", irq);

You probably should put a severity level here. KERN_INFO?

> + count++;
> + }
> +
> + return IRQ_NONE;
> +}
> +
> +static int __init dummy_irq_init(void)
> +{
> + if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) {
> + printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq);
> + return -EIO;
> + }
> + printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq);
> + return 0;
> +}
> +
> +static void __exit dummy_irq_exit(void)
> +{
> + printk(KERN_INFO "dummy-irq unloaded\n");
> + free_irq(irq, &irq);
> + return;

A return statement isn't needed here.

> +}
> +
> +module_init(dummy_irq_init);
> +module_exit(dummy_irq_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Jiri Kosina");
> +module_param_named(irq, irq, uint, 0444);

module_param is good enough when the parameter's name is the same as
the variable's name.

> +MODULE_PARM_DESC(irq, "The IRQ to register for");

Alan Stern

2013-03-20 23:14:36

by Jiri Kosina

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses

On Tue, 19 Mar 2013, Yinghai Lu wrote:

> > I guess I should have phrased it more precisely, but that's exactly
> > what I expect is happening on my machine: I don't have anything on
> > irq16 (i.e. in non-msi mode the gfx interrupt isn't shared) and hence
> > the irq is completely disabled. Which obviously makes it impossible
> > for me to reproduce the issue. To test that theory, is there a quick
> > way to force-enable a given interrupt, short of just hacking up a 2nd
> > dummy irq handler in my driver?
>
> You may try to add another request_irq()
> after i915_load_modeset_init==>drm_irq_install.
> That could install one dummy action for ioapic irq for i915.
>
> Also you may need to add one quirk that does not disable intx during
> msi enabling like:
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL,
> 0x2e22,
> quirk_msi_intx_disable_bug);
>

This seemed to be really promising idea to me, as the DisINTx+ vs INTx+
discrepancy is very good hint, but unfortunately, after applying this:

---
drivers/pci/quirks.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 0369fb6..8508e24 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2643,6 +2643,9 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, 0x1073,
quirk_msi_intx_disable_bug);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATTANSIC, 0x1083,
quirk_msi_intx_disable_bug);
+
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2a42,
+ quirk_msi_intx_disable_bug);
#endif /* CONFIG_PCI_MSI */

/* Allow manual resource allocation for PCI hotplug bridges


The problem is still there ... so the inconsistency between DisINTx+ and
INTx+ is unfortunately not the whole story.

--
Jiri Kosina
SUSE Labs

2013-03-20 23:21:27

by Jiri Kosina

[permalink] [raw]
Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver (was Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo))

On Wed, 20 Mar 2013, Alan Stern wrote:

> > Ok, so how about this?
> > Daniel, is it enough to make the problem appear on your system (by
> > building this into the kernel and booting with dummy-irq.irq=16)?
> >
> > Thanks.
> >
> > From: Jiri Kosina <[email protected]>
> > Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
> >
> > This module accepts a single 'irq' parameter, which it should register for.
> >
> > Its sole purpose is to help with debugging of IRQ sharing problems, by
> > force-enabling IRQ that would otherwise be disabled.
> >
> > Suggested-by: Alan Stern <[email protected]>
> > Signed-off-by: Jiri Kosina <[email protected]>
>
> This is just what I was thinking of. Three extremely minor
> suggestions...

Thanks Alan. Updated version below.

Greg, willing to merge this simple debugging facility?


From: Jiri Kosina <[email protected]>
Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver

This module accepts a single 'irq' parameter, which it should register for.

Its sole purpose is to help with debugging of IRQ sharing problems, by
force-enabling IRQ that would otherwise be disabled.

Suggested-by: Alan Stern <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
---
drivers/misc/Kconfig | 8 ++++++
drivers/misc/Makefile | 1 +
drivers/misc/dummy-irq.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+), 0 deletions(-)
create mode 100644 drivers/misc/dummy-irq.c

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index e83fdfe..69bb79d 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -93,6 +93,14 @@ config ATMEL_TCB_CLKSRC_BLOCK
TC can be used for other purposes, such as PWM generation and
interval timing.

+config DUMMY_IRQ
+ tristate "Dummy IRQ handler"
+ default n
+ ---help---
+ This module accepts a single 'irq' parameter, which it should register for.
+ The sole purpose of this module is to help with debugging of systems on
+ which spurious IRQs would happen on disabled IRQ vector.
+
config IBM_ASM
tristate "Device driver for IBM RSA service processor"
depends on X86 && PCI && INPUT
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 35a1463..28ff261 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -13,6 +13,7 @@ obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o
obj-$(CONFIG_BMP085) += bmp085.o
obj-$(CONFIG_BMP085_I2C) += bmp085-i2c.o
obj-$(CONFIG_BMP085_SPI) += bmp085-spi.o
+obj-$(CONFIG_DUMMY_IRQ) += dummy-irq.o
obj-$(CONFIG_ICS932S401) += ics932s401.o
obj-$(CONFIG_LKDTM) += lkdtm.o
obj-$(CONFIG_TIFM_CORE) += tifm_core.o
diff --git a/drivers/misc/dummy-irq.c b/drivers/misc/dummy-irq.c
new file mode 100644
index 0000000..7014167
--- /dev/null
+++ b/drivers/misc/dummy-irq.c
@@ -0,0 +1,59 @@
+/*
+ * Dummy IRQ handler driver.
+ *
+ * This module only registers itself as a handler that is specified to it
+ * by the 'irq' parameter.
+ *
+ * The sole purpose of this module is to help with debugging of systems on
+ * which spurious IRQs would happen on disabled IRQ vector.
+ *
+ * Copyright (C) 2013 Jiri Kosina
+ */
+
+/*
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+#include <linux/module.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+
+static int irq;
+
+static irqreturn_t dummy_interrupt(int irq, void *dev_id)
+{
+ static int count = 0;
+
+ if (count == 0) {
+ printk(KERN_INFO "dummy-irq: interrupt occured on IRQ %d\n",
+ irq);
+ count++;
+ }
+
+ return IRQ_NONE;
+}
+
+static int __init dummy_irq_init(void)
+{
+ if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) {
+ printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq);
+ return -EIO;
+ }
+ printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq);
+ return 0;
+}
+
+static void __exit dummy_irq_exit(void)
+{
+ printk(KERN_INFO "dummy-irq unloaded\n");
+ free_irq(irq, &irq);
+}
+
+module_init(dummy_irq_init);
+module_exit(dummy_irq_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Jiri Kosina");
+module_param(irq, uint, 0444);
+MODULE_PARM_DESC(irq, "The IRQ to register for");

--
Jiri Kosina
SUSE Labs

2013-03-20 23:37:28

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] dummy-irq: introduce a dummy IRQ handler driver (was Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo))

On Thu, Mar 21, 2013 at 12:21:21AM +0100, Jiri Kosina wrote:
> On Wed, 20 Mar 2013, Alan Stern wrote:
>
> > > Ok, so how about this?
> > > Daniel, is it enough to make the problem appear on your system (by
> > > building this into the kernel and booting with dummy-irq.irq=16)?
> > >
> > > Thanks.
> > >
> > > From: Jiri Kosina <[email protected]>
> > > Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
> > >
> > > This module accepts a single 'irq' parameter, which it should register for.
> > >
> > > Its sole purpose is to help with debugging of IRQ sharing problems, by
> > > force-enabling IRQ that would otherwise be disabled.
> > >
> > > Suggested-by: Alan Stern <[email protected]>
> > > Signed-off-by: Jiri Kosina <[email protected]>
> >
> > This is just what I was thinking of. Three extremely minor
> > suggestions...
>
> Thanks Alan. Updated version below.
>
> Greg, willing to merge this simple debugging facility?

Yes, I'll take it, give me a few days to catch up with my pending queue,
don't worry, it's not lost.

greg k-h

2013-03-21 14:42:49

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] dummy-irq: introduce a dummy IRQ handler driver (was Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo))

On Thu, Mar 21, 2013 at 12:21:21AM +0100, Jiri Kosina wrote:
> On Wed, 20 Mar 2013, Alan Stern wrote:
>
> > > Ok, so how about this?
> > > Daniel, is it enough to make the problem appear on your system (by
> > > building this into the kernel and booting with dummy-irq.irq=16)?
> > >
> > > Thanks.
> > >
> > > From: Jiri Kosina <[email protected]>
> > > Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
> > >
> > > This module accepts a single 'irq' parameter, which it should register for.
> > >
> > > Its sole purpose is to help with debugging of IRQ sharing problems, by
> > > force-enabling IRQ that would otherwise be disabled.
> > >
> > > Suggested-by: Alan Stern <[email protected]>
> > > Signed-off-by: Jiri Kosina <[email protected]>
> >
> > This is just what I was thinking of. Three extremely minor
> > suggestions...
>
> Thanks Alan. Updated version below.
>
> Greg, willing to merge this simple debugging facility?
>
>
> From: Jiri Kosina <[email protected]>
> Subject: [PATCH] dummy-irq: introduce a dummy IRQ handler driver
>
> This module accepts a single 'irq' parameter, which it should register for.
>
> Its sole purpose is to help with debugging of IRQ sharing problems, by
> force-enabling IRQ that would otherwise be disabled.
>
> Suggested-by: Alan Stern <[email protected]>
> Signed-off-by: Jiri Kosina <[email protected]>

Indeed, this is pretty useful and allowed me to quickly reproduce that
phantom irq on my gm45. Thanks to module reloading we can even reset the
kernel's irq disabling logic and so test different tricks quickly without
rebooting. Really useful.

Acked-by: Daniel Vetter <[email protected]>

Thanks, Daniel

> ---
> drivers/misc/Kconfig | 8 ++++++
> drivers/misc/Makefile | 1 +
> drivers/misc/dummy-irq.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 68 insertions(+), 0 deletions(-)
> create mode 100644 drivers/misc/dummy-irq.c
>
> diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
> index e83fdfe..69bb79d 100644
> --- a/drivers/misc/Kconfig
> +++ b/drivers/misc/Kconfig
> @@ -93,6 +93,14 @@ config ATMEL_TCB_CLKSRC_BLOCK
> TC can be used for other purposes, such as PWM generation and
> interval timing.
>
> +config DUMMY_IRQ
> + tristate "Dummy IRQ handler"
> + default n
> + ---help---
> + This module accepts a single 'irq' parameter, which it should register for.
> + The sole purpose of this module is to help with debugging of systems on
> + which spurious IRQs would happen on disabled IRQ vector.
> +
> config IBM_ASM
> tristate "Device driver for IBM RSA service processor"
> depends on X86 && PCI && INPUT
> diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
> index 35a1463..28ff261 100644
> --- a/drivers/misc/Makefile
> +++ b/drivers/misc/Makefile
> @@ -13,6 +13,7 @@ obj-$(CONFIG_ATMEL_TCLIB) += atmel_tclib.o
> obj-$(CONFIG_BMP085) += bmp085.o
> obj-$(CONFIG_BMP085_I2C) += bmp085-i2c.o
> obj-$(CONFIG_BMP085_SPI) += bmp085-spi.o
> +obj-$(CONFIG_DUMMY_IRQ) += dummy-irq.o
> obj-$(CONFIG_ICS932S401) += ics932s401.o
> obj-$(CONFIG_LKDTM) += lkdtm.o
> obj-$(CONFIG_TIFM_CORE) += tifm_core.o
> diff --git a/drivers/misc/dummy-irq.c b/drivers/misc/dummy-irq.c
> new file mode 100644
> index 0000000..7014167
> --- /dev/null
> +++ b/drivers/misc/dummy-irq.c
> @@ -0,0 +1,59 @@
> +/*
> + * Dummy IRQ handler driver.
> + *
> + * This module only registers itself as a handler that is specified to it
> + * by the 'irq' parameter.
> + *
> + * The sole purpose of this module is to help with debugging of systems on
> + * which spurious IRQs would happen on disabled IRQ vector.
> + *
> + * Copyright (C) 2013 Jiri Kosina
> + */
> +
> +/*
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published by
> + * the Free Software Foundation.
> + */
> +#include <linux/module.h>
> +#include <linux/irq.h>
> +#include <linux/interrupt.h>
> +
> +static int irq;
> +
> +static irqreturn_t dummy_interrupt(int irq, void *dev_id)
> +{
> + static int count = 0;
> +
> + if (count == 0) {
> + printk(KERN_INFO "dummy-irq: interrupt occured on IRQ %d\n",
> + irq);
> + count++;
> + }
> +
> + return IRQ_NONE;
> +}
> +
> +static int __init dummy_irq_init(void)
> +{
> + if (request_irq(irq, &dummy_interrupt, IRQF_SHARED, "dummy_irq", &irq)) {
> + printk(KERN_ERR "dummy-irq: cannot register IRQ %d\n", irq);
> + return -EIO;
> + }
> + printk(KERN_INFO "dummy-irq: registered for IRQ %d\n", irq);
> + return 0;
> +}
> +
> +static void __exit dummy_irq_exit(void)
> +{
> + printk(KERN_INFO "dummy-irq unloaded\n");
> + free_irq(irq, &irq);
> +}
> +
> +module_init(dummy_irq_init);
> +module_exit(dummy_irq_exit);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Jiri Kosina");
> +module_param(irq, uint, 0444);
> +MODULE_PARM_DESC(irq, "The IRQ to register for");
>
> --
> Jiri Kosina
> SUSE Labs

--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2013-03-24 19:45:11

by Shawn Starr

[permalink] [raw]
Subject: Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)))

On Tuesday, March 19, 2013 04:12:18 PM Daniel Vetter wrote:
> On Tue, Mar 19, 2013 at 10:03 AM, Chris Wilson <[email protected]>
wrote:
> >> > How about just using:
> >> > if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0;
> >> >
> >> > and the existing wait loop?
> >>
> >> I explicitly wanted to avoid touching GMBUS4 register, as the real cause
> >> of the failure is not clear.
> >>
> >> But, as Yinghai Lu points out, the problem is most likely caused by
> >> interrupt disabling not working properly (see his very good point
> >> regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out
> >> should work .... and it indeed does in my case, hence the (tested) patch
> >> below.
> >>
> >> I think it's a 3.9-rc material, and I am all open to debug this further
> >> for 3.10 so that the race is closed and gmbus irqs can be used on Gen4
> >> platform properly.
> >
> > Agreed. Using the IRQ for GMBUS is just a performance feature that can
> > be deferred until after we determine the root cause - and hope that the
> > failure is somehow peculiar to GMBUS.
>
> Ok, I've merged this patch. But some further investigation points at a
> much more severe dragon hiding here: The MSI interrupt for the intel
> gfx is commonly in the 40+ range, but the interrupt vector with the
> spurious interrupts is 16. Which is the irq of the intel gfx when MSI
> is disabled!
>
> So it looks like gmbus on the intel gfx is capable of generating
> non-MSI interrupts in parallel to the MSI interrupts (since apparently
> gmbus still works, so we get the interrupts we expect). I have no idea
> how that could happen. Hence adding a bunch of people with more clue
> than me.
>

Hello folks,

I am using Linus git master and built an rpm for 3.9.0-rc4 which has Jiri's
patch. I confirm this patch returns the GMA 4500 to working behavior as in
3.8.

Thanks everyone.
Shawn

> For reference below the updated commit message.
>
> Cheers, Daniel
>
> Author: Jiri Kosina <[email protected]>
> Date: Tue Mar 19 09:56:57 2013 +0100
>
> drm/i915: stop using GMBUS IRQs on Gen4 chips
>
> Commit 28c70f162 ("drm/i915: use the gmbus irq for waits") switched to
> using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
> and above.
>
> It turns out though that on many systems this leads to spurious
> interrupts being generated, long after the register write to disable the
> IRQs has been issued.
>
> Typically this results in the spurious interrupt source getting
> disabled:
>
> [ 9.636345] irq 16: nobody cared (try booting with the "irqpoll"
> option) [ 9.637915] Pid: 4157, comm: ifup Tainted: GF
> 3.9.0-rc2-00341-g0863702 #422
> [ 9.639484] Call Trace:
> [ 9.640731] <IRQ> [<ffffffff8109b40d>] __report_bad_irq+0x1d/0xc7
> [ 9.640731] [<ffffffff8109b7db>] note_interrupt+0x15b/0x1e8
> [ 9.640731] [<ffffffff810999f7>] handle_irq_event_percpu+0x1bf/0x214
> [ 9.640731] [<ffffffff81099a88>] handle_irq_event+0x3c/0x5c [
> 9.640731] [<ffffffff8109c139>] handle_fasteoi_irq+0x7a/0xb0 [ 9.640731]
> [<ffffffff8100400e>] handle_irq+0x1a/0x24
> [ 9.640731] [<ffffffff81003d17>] do_IRQ+0x48/0xaf
> [ 9.640731] [<ffffffff8142f1ea>] common_interrupt+0x6a/0x6a
> [ 9.640731] <EOI> [<ffffffff8142f952>] ?
> system_call_fastpath+0x16/0x1b [ 9.640731] handlers:
> [ 9.640731] [<ffffffffa000d771>] usb_hcd_irq [usbcore]
> [ 9.640731] [<ffffffffa0306189>] yenta_interrupt [yenta_socket]
> [ 9.640731] Disabling IRQ #16
>
> The really curious thing is now that irq 16 is _not_ the interrupt for
> the i915 driver when using MSI, but it _is_ the interrupt when not
> using MSI. So by all indications it seems like gmbus is able to
> generate a legacy (shared) interrupt in MSI mode on some
> configurations. I've tried to reproduce this and the differentiating
> thing seems to be that on unaffected systems no other device uses irq
> 16 (which seems to be the non-MSI intel gfx interrupt on all gm45).
>
> I have no idea how that even can happen.
>
> To avoid tempting this elephant into a rage, just disable gmbus
> interrupt support on gen 4.
>
> v2: Improve the commit message with exact details of what's going on.
> Also add a comment in the code to warn against this particular
> elephant in the room.
>
> Signed-off-by: Jiri Kosina <[email protected]> (v1)
> Acked-by: Chris Wilson <[email protected]> (v1)
> References: https://lkml.org/lkml/2013/3/8/325
> Signed-off-by: Daniel Vetter <[email protected]>

2013-03-31 19:55:23

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] dummy-irq: introduce a dummy IRQ handler driver (was Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo))

On Thu, 21 Mar 2013, Daniel Vetter wrote:

> Indeed, this is pretty useful and allowed me to quickly reproduce that
> phantom irq on my gm45. Thanks to module reloading we can even reset the
> kernel's irq disabling logic and so test different tricks quickly without
> rebooting. Really useful.

Daniel,

out of curiosity, have you been able to make some sense of the phantom
legacy IRQs on GM45 systems, or are we just staying with my original
bandaid (disabling GMBUS IRQs), declaring GM45 broken in this respect?

Thanks,

--
Jiri Kosina
SUSE Labs

2013-04-01 18:30:18

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] dummy-irq: introduce a dummy IRQ handler driver (was Re: gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt respo))

On Sun, Mar 31, 2013 at 9:55 PM, Jiri Kosina <[email protected]> wrote:
> On Thu, 21 Mar 2013, Daniel Vetter wrote:
>
>> Indeed, this is pretty useful and allowed me to quickly reproduce that
>> phantom irq on my gm45. Thanks to module reloading we can even reset the
>> kernel's irq disabling logic and so test different tricks quickly without
>> rebooting. Really useful.
>
> Daniel,
>
> out of curiosity, have you been able to make some sense of the phantom
> legacy IRQs on GM45 systems, or are we just staying with my original
> bandaid (disabling GMBUS IRQs), declaring GM45 broken in this respect?

I've played around with dummy-irq on my own gm45 and afaict every
gmbus interrupt generates both an msi and a legacy pci interrupt when
msi is enabled. Everything else (= other interrupt sources) seems to
work as expected, and disabling msi also papers over the issue.
There's also scary comments in our gm45 irq handler that we need msi
to paper over some other races.

So I've decided that I don't want to dwell any longer in this
particular dungeon and that your bandaid (of just disabling this mess)
is the real fix.

Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch