2008-06-02 13:52:53

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 0/7] Boot IRQ quirks and rerouting

These patches are against linux-2.6-tip, auto-x86-next.

When IRQ lines on secondary or higher IO-APICs are masked (as done by
RT and others), many chipsets redirect IRQs on this line to the PIC, and
thereby regularly to the first IO-APIC in the system. This causes
spurious interrupts and can lead to disabled IRQ lines.

Disabling this "boot interrupt" (as it is mostly used to supply all
IRQs to the legacy PIC during boot) is chipset-specific, and not
possible for all chips. This patchset disables the boot interrupt on
chipsets where this is possible and where we know how to do it.

When disabling the boot interrupt is not possible, the patches tell the
IRQ code to always use the redirected interrupt line (on the first
IO-APIC) instead of the "original" line on the secondary (tertiary ...)
IO-APIC. The original line remains masked, and IRQs always appear on
the boot interrupt line on the first IO-APIC instead.

Two new boot parameters control the quirks and are explained in the
patches: nobootirqquirk,
bootirqquirk=0x<vendor>,0x<device>,<quirk_variant>. "noapic" also sets
nobootirqquirk.

All patches are co-authored by Stefan Assmann and Olaf Dabrunz. But
according to Documentation/SubmittingPatches, the patch submission format
allows only for one author.

drivers/acpi/pci_irq.c | 63 ++++++++
drivers/pci/quirks.c | 339 ++++++++++++++++++++++++++++++++++++++++++++-
include/asm-x86/io_apic.h | 4
include/linux/pci.h | 2
include/linux/pci_ids.h | 6
include/linux/pci_quirks.h | 29 +++
6 files changed, 435 insertions(+), 8 deletions(-)

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg


2008-06-02 13:52:14

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 4/7] disable broadcomm legacy boot interrupt generation

From: Olaf Dabrunz <[email protected]>

Add a quirk to disable legacy boot interrupt generation on broadcomm HT1000.

Signed-off-by: Olaf Dabrunz <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
---
drivers/pci/quirks.c | 30 ++++++++++++++++++++++++++++++
1 files changed, 30 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 8f09f8f..e2fed6c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1377,6 +1377,36 @@ int nobootirqquirk_setup(char *str)
__setup("nobootirqquirk", nobootirqquirk_setup);

/*
+ * disabled boot interrupts on HT-1000
+ */
+static void quirk_disable_broadcom_boot_interrupt(struct pci_dev *dev)
+{
+ u32 feature_enable;
+ u32 saved_feature_enable;
+ u8 irq;
+
+ if (nobootirqquirk)
+ return;
+
+ pci_read_config_dword(dev, 0x64,
+ &feature_enable);
+ saved_feature_enable = feature_enable;
+ feature_enable |= (1<<0);
+ pci_write_config_dword(dev, 0x64, feature_enable);
+
+ for (irq = 0x10; irq < 0x10 + 32; irq++) {
+ outb(irq, 0xC00);
+ outb(0x00, 0xC01);
+ }
+
+ pci_write_config_dword(dev, 0x64, saved_feature_enable);
+
+ printk(KERN_INFO "disabled boot interrupts on PCI device 0x%04x:0x%04x\n",
+ dev->vendor, dev->device);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_HT1000SB, quirk_disable_broadcom_boot_interrupt);
+
+/*
* On some chipsets we can disable the generation of legacy INTx boot
* interrupts.
*/
--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 13:52:39

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 6/7] disable pci legacy boot irq quirks on noapic boot

From: Stefan Assmann <[email protected]>

Don't enable pci boot irq quirks when falling back to pic, this would
break interrupt routing.

Signed-off-by: Stefan Assmann <[email protected]>
Signed-off-by: Olaf Dabrunz <[email protected]>
---
include/asm-x86/io_apic.h | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/io_apic.h b/include/asm-x86/io_apic.h
index 5b1292f..461771c 100644
--- a/include/asm-x86/io_apic.h
+++ b/include/asm-x86/io_apic.h
@@ -151,8 +151,11 @@ extern int skip_ioapic_setup;
/* 1 if the timer IRQ uses the '8259A Virtual Wire' mode */
extern int timer_through_8259;

+extern int nobootirqquirk;
+
static inline void disable_ioapic_setup(void)
{
+ nobootirqquirk = 1;
skip_ioapic_setup = 1;
}

--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 13:53:13

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 3/7] disable legacy boot interrupt generation

From: Stefan Assmann <[email protected]>

Add a quirk to disable legacy boot interrupt generation on intel devices
that support disabling it.

This patch benefited from discussions with Alexander Graf, Torsten Duwe,
Ihno Krumreich, Daniel Gollub, Hannes Reinecke. The conclusions we drew
and the patch itself are the authors' responsibility alone.

Signed-off-by: Stefan Assmann <[email protected]>
Signed-off-by: Olaf Dabrunz <[email protected]>
---
drivers/pci/quirks.c | 28 ++++++++++++++++++++++++++++
include/linux/pci_ids.h | 1 +
2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6173be5..8f09f8f 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1377,6 +1377,34 @@ int nobootirqquirk_setup(char *str)
__setup("nobootirqquirk", nobootirqquirk_setup);

/*
+ * On some chipsets we can disable the generation of legacy INTx boot
+ * interrupts.
+ */
+#define PCI_IOAPIC_ALTERNATE_BASE_ADDRESS 0x40
+#define INTEL_6300_ESB_BOOT_INTERRUPT_ENABLE (1<<14)
+static void quirk_disable_intel_boot_interrupt(struct pci_dev *dev)
+{
+ u16 cnf;
+
+ if (nobootirqquirk)
+ return;
+
+ pci_read_config_word(dev, PCI_IOAPIC_ALTERNATE_BASE_ADDRESS, &cnf);
+ /* this disables the boot interrupt even though it reads ENABLE */
+ cnf |= INTEL_6300_ESB_BOOT_INTERRUPT_ENABLE;
+ pci_write_config_word(dev, PCI_IOAPIC_ALTERNATE_BASE_ADDRESS, cnf);
+
+ printk(KERN_INFO "disabled boot interrupt on device 0x%04x:0x%04x\n",
+ dev->vendor, dev->device);
+}
+#undef PCI_IOAPIC_ALTERNATE_BASE_ADDRESS
+/*
+ * IO-APIC1 on 6300ESB generates boot interrupts, see intel order no
+ * 300641-004US, section 5.7.3.
+ */
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ESB_10, quirk_disable_intel_boot_interrupt);
+
+/*
* Boot interrupts on some chipsets cannot be turned off. For these chipsets,
* remap the original interrupt in the linux kernel to the boot interrupt, so
* that a PCI device's interrupt handler is installed on the boot interrupt
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index c675399..b89616d 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2309,6 +2309,7 @@
#define PCI_DEVICE_ID_INTEL_ESB_4 0x25a4
#define PCI_DEVICE_ID_INTEL_ESB_5 0x25a6
#define PCI_DEVICE_ID_INTEL_ESB_9 0x25ab
+#define PCI_DEVICE_ID_INTEL_ESB_10 0x25ac
#define PCI_DEVICE_ID_INTEL_82820_HB 0x2500
#define PCI_DEVICE_ID_INTEL_82820_UP_HB 0x2501
#define PCI_DEVICE_ID_INTEL_82850_HB 0x2530
--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 13:53:27

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 5/7] disable AMD/ATI legacy boot interrupt generation

From: Olaf Dabrunz <[email protected]>

Add quirks for several AMD/ATI chipsets to prevent generation of legacy boot
interrupts.

Signed-off-by: Olaf Dabrunz <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
---
drivers/pci/quirks.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 93 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index e2fed6c..090ce38 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1377,6 +1377,99 @@ int nobootirqquirk_setup(char *str)
__setup("nobootirqquirk", nobootirqquirk_setup);

/*
+ * disable boot interrupts on AMD and ATI chipsets
+ */
+#define PCI_X_MISC 0x40
+#define PCI_X_AMD813X_NIOAMODE (1<<0)
+static void quirk_disable_amd_813x_boot_interrupt(struct pci_dev *dev)
+{
+ u32 pci_x_misc;
+
+ if (nobootirqquirk)
+ return;
+
+ pci_read_config_dword(dev, PCI_X_MISC, &pci_x_misc);
+ pci_x_misc &= ~PCI_X_AMD813X_NIOAMODE;
+ pci_write_config_dword(dev, PCI_X_MISC, pci_x_misc);
+
+ printk(KERN_INFO "disabled boot interrupts on PCI device "
+ "0x%04x:0x%04x\n", dev->vendor, dev->device);
+}
+#undef PCI_X_MISC
+#undef PCI_X_AMD813X_NIOAMODE
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8131_BRIDGE, quirk_disable_amd_813x_boot_interrupt);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8132_BRIDGE, quirk_disable_amd_813x_boot_interrupt);
+
+#define PCI_AMD8111_PCI_IRQ_ROUTING 0x56
+static void quirk_disable_amd_8111_boot_interrupt(struct pci_dev *dev)
+{
+ u16 pci_irq_routing;
+
+ if (nobootirqquirk)
+ return;
+
+ pci_read_config_word(dev, PCI_AMD8111_PCI_IRQ_ROUTING,
+ &pci_irq_routing);
+ if (!pci_irq_routing) {
+ printk(KERN_INFO "boot interrupts on PCI "
+ "device 0x%04x:0x%04x were "
+ "already disabled\n",
+ dev->vendor, dev->device);
+ return;
+ }
+ pci_write_config_word(dev, PCI_AMD8111_PCI_IRQ_ROUTING, 0);
+ printk(KERN_INFO "disabled boot interrupts on PCI device "
+ "0x%04x:0x%04x\n", dev->vendor, dev->device);
+}
+#undef PCI_AMD8111_PCI_IRQ_ROUTING
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8111_SMBUS, quirk_disable_amd_8111_boot_interrupt);
+
+#define PCI_AMD_SB700S_FEATURES_ENABLE 0x64
+#define PCI_AMD_SB700S_PIC_ENABLE (1<<0)
+#define PIC_PCI_INTR_INDEX 0xC00
+#define PIC_PCI_INTR_DATA 0xC01
+#define CLEAR_PIC_IRQ_ROUTING(irq) \
+ outb(irq, PIC_PCI_INTR_INDEX); \
+ outb(0x00, PIC_PCI_INTR_DATA);
+static void quirk_disable_amd_sb700s_boot_interrupt(struct pci_dev *dev)
+{
+ u32 feature_enable;
+ u32 saved_feature_enable;
+
+ if (nobootirqquirk)
+ return;
+
+ pci_read_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
+ &feature_enable);
+ saved_feature_enable = feature_enable;
+ feature_enable |= PCI_AMD_SB700S_PIC_ENABLE;
+ pci_write_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
+ feature_enable);
+
+ CLEAR_PIC_IRQ_ROUTING(0x0);
+ CLEAR_PIC_IRQ_ROUTING(0x1);
+ CLEAR_PIC_IRQ_ROUTING(0x2);
+ CLEAR_PIC_IRQ_ROUTING(0x3);
+ CLEAR_PIC_IRQ_ROUTING(0x4);
+ CLEAR_PIC_IRQ_ROUTING(0x9);
+ CLEAR_PIC_IRQ_ROUTING(0xA);
+ CLEAR_PIC_IRQ_ROUTING(0xB);
+ CLEAR_PIC_IRQ_ROUTING(0xC);
+
+ pci_write_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
+ saved_feature_enable);
+
+ printk(KERN_INFO "disabled boot interrupts on PCI device "
+ "0x%04x:0x%04x\n", dev->vendor, dev->device);
+}
+#undef PCI_AMD_SB700S_FEATURES_ENABLE
+#undef PCI_AMD_SB700S_PIC_ENABLE
+#undef PIC_PCI_INTR_INDEX
+#undef PIC_PCI_INTR_DATA
+#undef CLEAR_PIC_IRQ_ROUTING
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_IXP600_SMBUS, quirk_disable_amd_sb700s_boot_interrupt);
+
+/*
* disabled boot interrupts on HT-1000
*/
static void quirk_disable_broadcom_boot_interrupt(struct pci_dev *dev)
--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 13:52:27

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 1/7] add kernel cmdline option to disable pci-irq quirks

From: Olaf Dabrunz <[email protected]>

Add a switch to disable all boot interrupt quirks, using the parameter
nobootirqquirk.

Signed-off-by: Olaf Dabrunz <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
---
drivers/pci/quirks.c | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index dabb563..6245486 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1363,6 +1363,21 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2609, quirk_intel_pcie_pm);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm);
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260b, quirk_intel_pcie_pm);

+
+#ifdef CONFIG_X86_IO_APIC
+int nobootirqquirk __read_mostly = 0;
+
+int nobootirqquirk_setup(char *str)
+{
+ nobootirqquirk = 1;
+ printk(KERN_INFO "Boot IRQ quirk handling disabled\n");
+
+ return 1;
+}
+__setup("nobootirqquirk", nobootirqquirk_setup);
+#endif /* CONFIG_X86_IO_APIC */
+
+
/*
* Toshiba TC86C001 IDE controller reports the standard 8-byte BAR0 size
* but the PIO transfers won't work if BAR0 falls at the odd 8 bytes.
--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 13:53:42

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 2/7] reroute PCI interrupt to legacy boot interrupt equivalent

From: Stefan Assmann <[email protected]>

Some chipsets (e.g. intel 6700PXH) generate a legacy INTx when the
IRQ entry in the chipset's IO-APIC is masked (as, e.g. the RT kernel
does during interrupt handling). On chipsets where this INTx generation
cannot be disabled, we reroute the valid interrupts to their legacy
equivalent to get rid of spurious interrupts that might otherwise bring
down (vital) interrupt lines through spurious interrupt detection in
note_interrupt().

The patch should eventually simply use another bitfield in the pci_dev
structure for the rerouting variant. This was in another variant of the patch
by Daniel Gollub. It is the cleanest and most simple approach.

This patch benefited from discussions with Alexander Graf, Torsten Duwe,
Ihno Krumreich, Daniel Gollub, Hannes Reinecke. The conclusions we drew
and the patch itself are the authors' responsibility alone.

Signed-off-by: Stefan Assmann <[email protected]>
Signed-off-by: Olaf Dabrunz <[email protected]>
---
drivers/acpi/pci_irq.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
drivers/pci/quirks.c | 51 ++++++++++++++++++++++++++++++++++-
include/linux/pci.h | 2 +
include/linux/pci_ids.h | 4 +++
include/linux/pci_quirks.h | 28 +++++++++++++++++++
5 files changed, 147 insertions(+), 1 deletions(-)
create mode 100644 include/linux/pci_quirks.h

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index 89022a7..ebdfbfe 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -384,6 +384,34 @@ acpi_pci_free_irq(struct acpi_prt_entry *entry,
return irq;
}

+#ifdef CONFIG_X86_IO_APIC
+static int
+bridge_has_boot_interrupt_variant(struct pci_bus *bus)
+{
+ struct pci_bus *bus_it;
+ int i;
+
+ for (bus_it = bus ; bus_it ; bus_it = bus_it->parent) {
+ if (!bus_it->self)
+ return 0;
+
+ printk(KERN_INFO "vendor=%04x device=%04x\n", bus_it->self->vendor,
+ bus_it->self->device);
+ for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
+ if (quirk_bootirq_reroute_devs[i].vendor == 0)
+ return 0;
+
+ if (quirk_bootirq_reroute_devs[i].vendor == bus_it->self->vendor &&
+ quirk_bootirq_reroute_devs[i].device == bus_it->self->device)
+ return quirk_bootirq_reroute_devs[i].quirk_variant;
+ }
+ }
+ return 0;
+}
+
+extern int reroute_to_boot_interrupts;
+#endif /* CONFIG_X86_IO_APIC */
+
/*
* acpi_pci_irq_lookup
* success: return IRQ >= 0
@@ -413,6 +441,41 @@ acpi_pci_irq_lookup(struct pci_bus *bus,
}

ret = func(entry, triggering, polarity, link);
+
+#ifdef CONFIG_X86_IO_APIC
+ /*
+ * Some chipsets (e.g. intel 6700PXH) generate a legacy INTx when the
+ * IRQ entry in the chipset's IO-APIC is masked (as, e.g. the RT kernel
+ * does during interrupt handling). When this INTx generation cannot be
+ * disabled, we reroute these interrupts to their legacy equivalent to
+ * get rid of spurious interrupts.
+ */
+ if (reroute_to_boot_interrupts) {
+ switch (bridge_has_boot_interrupt_variant(bus)) {
+ case 0:
+ /* no rerouting necessary */
+ break;
+
+ case VARIANT_INTEL_BUS_INT_REMAPPING:
+ /*
+ * Remap according to INTx routing table in 6700PXH
+ * specs, intel order number 302628-002, section
+ * 2.15.2. Other chipsets (80332, ...) have the same
+ * mapping and are handled here as well.
+ */
+ printk(KERN_INFO "pci irq %d -> rerouted to legacy "
+ "irq %d\n", ret, (ret % 4) + 16);
+ ret = (ret % 4) + 16;
+ break;
+
+ default:
+ printk(KERN_INFO "not rerouting irq %d to legacy irq: "
+ "unknown mapping\n", ret);
+ break;
+ }
+ }
+#endif /* CONFIG_X86_IO_APIC */
+
return ret;
}

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6245486..6173be5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1375,8 +1375,57 @@ int nobootirqquirk_setup(char *str)
return 1;
}
__setup("nobootirqquirk", nobootirqquirk_setup);
-#endif /* CONFIG_X86_IO_APIC */

+/*
+ * Boot interrupts on some chipsets cannot be turned off. For these chipsets,
+ * remap the original interrupt in the linux kernel to the boot interrupt, so
+ * that a PCI device's interrupt handler is installed on the boot interrupt
+ * line instead.
+ */
+int reroute_to_boot_interrupts;
+EXPORT_SYMBOL(reroute_to_boot_interrupts);
+
+struct quirk_bootirq_reroute_dev
+ quirk_bootirq_reroute_devs[MAX_QUIRK_BOOTIRQ_REROUTE_DEVS];
+EXPORT_SYMBOL(quirk_bootirq_reroute_devs);
+
+static void quirk_reroute_to_boot_interrupts_intel(struct pci_dev *dev)
+{
+ int i;
+
+ if (nobootirqquirk)
+ return;
+
+ for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
+ if (quirk_bootirq_reroute_devs[i].vendor == 0)
+ break;
+
+ if (quirk_bootirq_reroute_devs[i].vendor == dev->vendor &&
+ quirk_bootirq_reroute_devs[i].device == dev->device)
+ return; /* already on list */
+ }
+
+ quirk_bootirq_reroute_devs[i].vendor = dev->vendor;
+ quirk_bootirq_reroute_devs[i].device = dev->device;
+
+ quirk_bootirq_reroute_devs[i].quirk_variant =
+ VARIANT_INTEL_BUS_INT_REMAPPING;
+
+ reroute_to_boot_interrupts = 1;
+
+ printk(KERN_INFO "PCI quirk: reroute interrupts for 0x%04x:0x%04x\n",
+ dev->vendor, dev->device);
+ return;
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80333_0, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80333_1, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ESB2_0, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXH_0, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXH_1, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXHV, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80332_0, quirk_reroute_to_boot_interrupts_intel);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80332_1, quirk_reroute_to_boot_interrupts_intel);
+#endif /* CONFIG_X86_IO_APIC */

/*
* Toshiba TC86C001 IDE controller reports the standard 8-byte BAR0 size
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d18b1dd..6c46cad 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1070,5 +1070,7 @@ static inline void pci_mmcfg_early_init(void) { }
static inline void pci_mmcfg_late_init(void) { }
#endif

+#include <linux/pci_quirks.h>
+
#endif /* __KERNEL__ */
#endif /* LINUX_PCI_H */
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 9b940e6..c675399 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2233,6 +2233,10 @@
#define PCI_DEVICE_ID_INTEL_PXH_0 0x0329
#define PCI_DEVICE_ID_INTEL_PXH_1 0x032A
#define PCI_DEVICE_ID_INTEL_PXHV 0x032C
+#define PCI_DEVICE_ID_INTEL_80332_0 0x0330
+#define PCI_DEVICE_ID_INTEL_80332_1 0x0332
+#define PCI_DEVICE_ID_INTEL_80333_0 0x0370
+#define PCI_DEVICE_ID_INTEL_80333_1 0x0372
#define PCI_DEVICE_ID_INTEL_82375 0x0482
#define PCI_DEVICE_ID_INTEL_82424 0x0483
#define PCI_DEVICE_ID_INTEL_82378 0x0484
diff --git a/include/linux/pci_quirks.h b/include/linux/pci_quirks.h
new file mode 100644
index 0000000..d875b87
--- /dev/null
+++ b/include/linux/pci_quirks.h
@@ -0,0 +1,28 @@
+/*
+ * pci_quirks.h
+ *
+ * PCI quirks defines
+ * Copyright 2008, Olaf Dabrunz
+ */
+
+#ifndef LINUX_PCI_QUIRKS_H
+#define LINUX_PCI_QUIRKS_H
+
+#ifdef CONFIG_X86_IO_APIC
+#define MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS 32
+#define VARIANT_INTEL_BUS_INT_REMAPPING 1
+
+struct quirk_bootirq_reroute_dev {
+ unsigned short vendor;
+ unsigned short device;
+ unsigned short quirk_variant;
+};
+
+/* max number of different quirky devices */
+#define MAX_QUIRK_BOOTIRQ_REROUTE_DEVS 32
+
+extern struct quirk_bootirq_reroute_dev
+ quirk_bootirq_reroute_devs[];
+#endif /* CONFIG_X86_IO_APIC */
+
+#endif /* LINUX_PCI_QUIRKS_H */
--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 13:53:54

by Olaf Dabrunz

[permalink] [raw]
Subject: [PATCH 7/7] bootirqquirk= parameter to enable bootirq quirks for additional chips

From: Olaf Dabrunz <[email protected]>

The existing bootirq quirks and the reroute workaround may work for other chips
where we could not test them. This parameter allows users to apply these to
other chips without the need to re-build the kernel.

This patch was conceived simultaneously by Stefan Assmann, Daniel Gollub and
Olaf Dabrunz. The implementation is the author's.

bootirqquirk=0x<vendor>,0x<device>,<quirk_type>

- quirk type 1 - 32 selects an IRQ reroute algorithm for devices
connected to that PCI bridge (currently only algorithm "1" is
implemented),

- quirk type 33 - x applies one of the known quirks to the PCI
device, currently these:

33 -> quirk_disable_intel_boot_interrupt
34 -> quirk_disable_broadcom_boot_interrupt
35 -> quirk_disable_amd_8111_boot_interrupt
36 -> quirk_disable_amd_813x_boot_interrupt
37 -> quirk_disable_amd_sb700s_boot_interrupt

Signed-off-by: Olaf Dabrunz <[email protected]>
Signed-off-by: Stefan Assmann <[email protected]>
---
drivers/pci/quirks.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 118 insertions(+), 0 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 090ce38..bb7fa65 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1576,6 +1576,113 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXH_1, quirk_re
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXHV, quirk_reroute_to_boot_interrupts_intel);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80332_0, quirk_reroute_to_boot_interrupts_intel);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80332_1, quirk_reroute_to_boot_interrupts_intel);
+
+/*
+ * For additional boot irq quirks save boot time settings or call the quirks
+ * directly.
+ */
+
+#define MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY 32
+struct pci_fixup __start_bc_pci_fixups_early[MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY];
+struct pci_fixup *__end_bc_pci_fixups_early = __start_bc_pci_fixups_early +
+ MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY;
+
+static void (*bootirq_quirk_func[])(struct pci_dev *dev) = {
+ &quirk_disable_intel_boot_interrupt,
+ &quirk_disable_broadcom_boot_interrupt,
+ &quirk_disable_amd_8111_boot_interrupt,
+ &quirk_disable_amd_813x_boot_interrupt,
+ &quirk_disable_amd_sb700s_boot_interrupt,
+};
+
+int __init bootirqquirk_setup(char *str)
+{
+ int ints[4];
+ struct quirk_bootirq_reroute_dev tmp;
+ int i;
+
+ str = get_options(str, ARRAY_SIZE(ints), ints);
+ if (!str)
+ return 0;
+
+ if (nobootirqquirk)
+ return 1;
+
+ /* all parameters are required */
+ if (ints[0] != 3)
+ return 0;
+
+ /* save settings */
+ tmp.vendor = ints[1];
+ tmp.device = ints[2];
+ tmp.quirk_variant = ints[3];
+
+ /*
+ * quirk variants > MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS select quirk
+ * functions
+ */
+ if (tmp.quirk_variant > MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS) {
+ for (i = 0; i < MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY; i++) {
+ if (__start_bc_pci_fixups_early[i].vendor == 0)
+ break;
+
+ if (__start_bc_pci_fixups_early[i].vendor == tmp.vendor &&
+ __start_bc_pci_fixups_early[i].device == tmp.device)
+ return 1; /* already in array */
+ }
+
+ if (i >= MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY) {
+ printk(KERN_INFO "PCI quirk: too many boottime "
+ "configurable early quirk entries when "
+ "trying to add 0x%04x:0x%04x\n",
+ tmp.vendor, tmp.device);
+ return 0;
+ }
+
+ if (tmp.quirk_variant > ARRAY_SIZE(bootirq_quirk_func) +
+ MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS) {
+ printk(KERN_INFO "PCI quirk: no such quirk function: "
+ "%d\n", tmp.quirk_variant);
+ return 0;
+ }
+
+ __start_bc_pci_fixups_early[i].vendor = tmp.vendor;
+ __start_bc_pci_fixups_early[i].device = tmp.device;
+ __start_bc_pci_fixups_early[i].hook =
+ bootirq_quirk_func[tmp.quirk_variant -
+ (MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS + 1)];
+ } else {
+ for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
+ if (quirk_bootirq_reroute_devs[i].vendor == 0)
+ break;
+
+ if (quirk_bootirq_reroute_devs[i].vendor == tmp.vendor &&
+ quirk_bootirq_reroute_devs[i].device == tmp.device)
+ return 1; /* already in array */
+ }
+
+ if (i >= MAX_QUIRK_BOOTIRQ_REROUTE_DEVS) {
+ printk(KERN_INFO "PCI quirk: too many reroute entries when "
+ "trying to add 0x%04x:0x%04x\n",
+ tmp.vendor, tmp.device);
+ return 0;
+ }
+
+ quirk_bootirq_reroute_devs[i].vendor = tmp.vendor;
+ quirk_bootirq_reroute_devs[i].device = tmp.device;
+ quirk_bootirq_reroute_devs[i].quirk_variant = tmp.quirk_variant;
+
+ reroute_to_boot_interrupts = 1;
+ }
+
+ printk(KERN_INFO "PCI quirk: reroute interrupts for 0x%04x:0x%04x\n",
+ tmp.vendor, tmp.device);
+ return 1;
+}
+
+/* bootirqquirk=0x<vendor>,0x<device>,<quirk_type> */
+__setup("bootirqquirk=", bootirqquirk_setup);
+
#endif /* CONFIG_X86_IO_APIC */

/*
@@ -1773,6 +1880,17 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
return;
}
pci_do_fixups(dev, start, end);
+
+ switch(pass) {
+ case pci_fixup_early:
+ start = __start_bc_pci_fixups_early;
+ end = __end_bc_pci_fixups_early;
+ break;
+
+ default:
+ return;
+ }
+ pci_do_fixups(dev, start, end);
}
EXPORT_SYMBOL(pci_fixup_device);

--
1.5.2.4

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-02 16:31:23

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On 02-Jun-08, Olaf Dabrunz wrote:
> These patches are against linux-2.6-tip, auto-x86-next.

And they apply to linux-2.6 as well.

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-03 10:08:32

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

One note on testing:

These patches have been tested several times on machines containing the
PCI bridges that they handle. They solved (most of) the spurious
interrupt issues we had with RT. Some tests have run for several days or
even longer.

Comments?

Regards,

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-03 10:12:27

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Olaf,

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> These patches are against linux-2.6-tip, auto-x86-next.
>
> When IRQ lines on secondary or higher IO-APICs are masked (as done by
> RT and others), many chipsets redirect IRQs on this line to the PIC, and
> thereby regularly to the first IO-APIC in the system. This causes
> spurious interrupts and can lead to disabled IRQ lines.
>
> Disabling this "boot interrupt" (as it is mostly used to supply all
> IRQs to the legacy PIC during boot) is chipset-specific, and not
> possible for all chips. This patchset disables the boot interrupt on
> chipsets where this is possible and where we know how to do it.
>
> When disabling the boot interrupt is not possible, the patches tell the
> IRQ code to always use the redirected interrupt line (on the first
> IO-APIC) instead of the "original" line on the secondary (tertiary ...)
> IO-APIC. The original line remains masked, and IRQs always appear on
> the boot interrupt line on the first IO-APIC instead.

Thanks for doing the research on this problem.

We probably don't want to enable this unconditionally right away, so
we should have a command line option which enables these quirks.

Also is there any interaction with MSI ?

Thanks,
tglx

2008-06-03 10:13:48

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/7] add kernel cmdline option to disable pci-irq quirks

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> From: Olaf Dabrunz <[email protected]>
>
> Add a switch to disable all boot interrupt quirks, using the parameter
> nobootirqquirk.

This should be a parameter in the form of

pci=ioapicquirk

and not a separate and completely unintuitive thing.

Thanks,
tglx

> Signed-off-by: Olaf Dabrunz <[email protected]>
> Signed-off-by: Stefan Assmann <[email protected]>
> ---
> drivers/pci/quirks.c | 15 +++++++++++++++
> 1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index dabb563..6245486 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1363,6 +1363,21 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2609, quirk_intel_pcie_pm);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm);
> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260b, quirk_intel_pcie_pm);
>
> +
> +#ifdef CONFIG_X86_IO_APIC
> +int nobootirqquirk __read_mostly = 0;
> +
> +int nobootirqquirk_setup(char *str)
> +{
> + nobootirqquirk = 1;
> + printk(KERN_INFO "Boot IRQ quirk handling disabled\n");
> +
> + return 1;
> +}
> +__setup("nobootirqquirk", nobootirqquirk_setup);
> +#endif /* CONFIG_X86_IO_APIC */
> +
> +
> /*
> * Toshiba TC86C001 IDE controller reports the standard 8-byte BAR0 size
> * but the PIO transfers won't work if BAR0 falls at the odd 8 bytes.
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 10:38:34

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 2/7] reroute PCI interrupt to legacy boot interrupt equivalent

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> From: Stefan Assmann <[email protected]>
>
> Some chipsets (e.g. intel 6700PXH) generate a legacy INTx when the
> IRQ entry in the chipset's IO-APIC is masked (as, e.g. the RT kernel
> does during interrupt handling). On chipsets where this INTx generation
> cannot be disabled, we reroute the valid interrupts to their legacy
> equivalent to get rid of spurious interrupts that might otherwise bring
> down (vital) interrupt lines through spurious interrupt detection in
> note_interrupt().
>
> The patch should eventually simply use another bitfield in the pci_dev
> structure for the rerouting variant. This was in another variant of the patch
> by Daniel Gollub. It is the cleanest and most simple approach.
>
> This patch benefited from discussions with Alexander Graf, Torsten Duwe,
> Ihno Krumreich, Daniel Gollub, Hannes Reinecke. The conclusions we drew
> and the patch itself are the authors' responsibility alone.
>
> Signed-off-by: Stefan Assmann <[email protected]>
> Signed-off-by: Olaf Dabrunz <[email protected]>
> ---
> drivers/acpi/pci_irq.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
> drivers/pci/quirks.c | 51 ++++++++++++++++++++++++++++++++++-
> include/linux/pci.h | 2 +
> include/linux/pci_ids.h | 4 +++
> include/linux/pci_quirks.h | 28 +++++++++++++++++++
> 5 files changed, 147 insertions(+), 1 deletions(-)
> create mode 100644 include/linux/pci_quirks.h
>
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index 89022a7..ebdfbfe 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -384,6 +384,34 @@ acpi_pci_free_irq(struct acpi_prt_entry *entry,
> return irq;
> }
>
> +#ifdef CONFIG_X86_IO_APIC
> +static int
> +bridge_has_boot_interrupt_variant(struct pci_bus *bus)

one line

> +{
> + struct pci_bus *bus_it;
> + int i;
> +
> + for (bus_it = bus ; bus_it ; bus_it = bus_it->parent) {
> + if (!bus_it->self)
> + return 0;
> +
> + printk(KERN_INFO "vendor=%04x device=%04x\n", bus_it->self->vendor,
> + bus_it->self->device);
> + for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
> + if (quirk_bootirq_reroute_devs[i].vendor == 0)
> + return 0;
> +
> + if (quirk_bootirq_reroute_devs[i].vendor == bus_it->self->vendor &&
> + quirk_bootirq_reroute_devs[i].device == bus_it->self->device)
> + return quirk_bootirq_reroute_devs[i].quirk_variant;
> + }
> + }
> + return 0;
> +}
> +
> +extern int reroute_to_boot_interrupts;

declarations need to be in a header file.

> +#endif /* CONFIG_X86_IO_APIC */
> +
> /*
> * acpi_pci_irq_lookup
> * success: return IRQ >= 0
> @@ -413,6 +441,41 @@ acpi_pci_irq_lookup(struct pci_bus *bus,
> }
>
> ret = func(entry, triggering, polarity, link);
> +
> +#ifdef CONFIG_X86_IO_APIC
> + /*
> + * Some chipsets (e.g. intel 6700PXH) generate a legacy INTx when the
> + * IRQ entry in the chipset's IO-APIC is masked (as, e.g. the RT kernel
> + * does during interrupt handling). When this INTx generation cannot be
> + * disabled, we reroute these interrupts to their legacy equivalent to
> + * get rid of spurious interrupts.
> + */
> + if (reroute_to_boot_interrupts) {
> + switch (bridge_has_boot_interrupt_variant(bus)) {
> + case 0:
> + /* no rerouting necessary */
> + break;
> +
> + case VARIANT_INTEL_BUS_INT_REMAPPING:
> + /*
> + * Remap according to INTx routing table in 6700PXH
> + * specs, intel order number 302628-002, section
> + * 2.15.2. Other chipsets (80332, ...) have the same
> + * mapping and are handled here as well.
> + */

They generate INTA,B,C,D messages. What makes sure, that those are
always routed to IRQ 16,17,18,19 ?

IIRC we have seen those spurious interrupts on IRQ 9 and 12 as well.

> + printk(KERN_INFO "pci irq %d -> rerouted to legacy "
> + "irq %d\n", ret, (ret % 4) + 16);
> + ret = (ret % 4) + 16;
> + break;
> +
> + default:
> + printk(KERN_INFO "not rerouting irq %d to legacy irq: "
> + "unknown mapping\n", ret);
> + break;
> + }
> + }
> +#endif /* CONFIG_X86_IO_APIC */
> +
> return ret;
> }
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6245486..6173be5 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1375,8 +1375,57 @@ int nobootirqquirk_setup(char *str)
> return 1;
> }
> __setup("nobootirqquirk", nobootirqquirk_setup);
> -#endif /* CONFIG_X86_IO_APIC */
>
> +/*
> + * Boot interrupts on some chipsets cannot be turned off. For these chipsets,
> + * remap the original interrupt in the linux kernel to the boot interrupt, so
> + * that a PCI device's interrupt handler is installed on the boot interrupt
> + * line instead.
> + */
> +int reroute_to_boot_interrupts;
> +EXPORT_SYMBOL(reroute_to_boot_interrupts);
> +
> +struct quirk_bootirq_reroute_dev
> + quirk_bootirq_reroute_devs[MAX_QUIRK_BOOTIRQ_REROUTE_DEVS];
> +EXPORT_SYMBOL(quirk_bootirq_reroute_devs);

Please use a consistent name space. This is about ioapic quirks so the
whole stuff should use ioapic_quirk_XXX or something like this.

Also why dont we put this information into the pci_dev structure
instead of having these tables and exports ?

> +static void quirk_reroute_to_boot_interrupts_intel(struct pci_dev *dev)
> +{
> + int i;
> +
> + if (nobootirqquirk)
> + return;

Default should be off.

> + for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
> + if (quirk_bootirq_reroute_devs[i].vendor == 0)
> + break;
> +
> + if (quirk_bootirq_reroute_devs[i].vendor == dev->vendor &&
> + quirk_bootirq_reroute_devs[i].device == dev->device)
> + return; /* already on list */
> + }
> +
> + quirk_bootirq_reroute_devs[i].vendor = dev->vendor;
> + quirk_bootirq_reroute_devs[i].device = dev->device;
> +
> + quirk_bootirq_reroute_devs[i].quirk_variant =
> + VARIANT_INTEL_BUS_INT_REMAPPING;
> +
> + reroute_to_boot_interrupts = 1;
> +
> + printk(KERN_INFO "PCI quirk: reroute interrupts for 0x%04x:0x%04x\n",
> + dev->vendor, dev->device);
> + return;
> +}

> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index d18b1dd..6c46cad 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1070,5 +1070,7 @@ static inline void pci_mmcfg_early_init(void) { }
> static inline void pci_mmcfg_late_init(void) { }
> #endif
>
> +#include <linux/pci_quirks.h>
> +
> #endif /* __KERNEL__ */
> #endif /* LINUX_PCI_H */
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 9b940e6..c675399 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2233,6 +2233,10 @@
> #define PCI_DEVICE_ID_INTEL_PXH_0 0x0329
> #define PCI_DEVICE_ID_INTEL_PXH_1 0x032A
> #define PCI_DEVICE_ID_INTEL_PXHV 0x032C
> +#define PCI_DEVICE_ID_INTEL_80332_0 0x0330
> +#define PCI_DEVICE_ID_INTEL_80332_1 0x0332
> +#define PCI_DEVICE_ID_INTEL_80333_0 0x0370
> +#define PCI_DEVICE_ID_INTEL_80333_1 0x0372
> #define PCI_DEVICE_ID_INTEL_82375 0x0482
> #define PCI_DEVICE_ID_INTEL_82424 0x0483
> #define PCI_DEVICE_ID_INTEL_82378 0x0484

Please move the PCI ids out into a separate patch instead of adding
them gradually.

> diff --git a/include/linux/pci_quirks.h b/include/linux/pci_quirks.h
> new file mode 100644

Do we really need a separate header for this ?

> index 0000000..d875b87
> --- /dev/null
> +++ b/include/linux/pci_quirks.h
> @@ -0,0 +1,28 @@
> +/*
> + * pci_quirks.h
> + *
> + * PCI quirks defines
> + * Copyright 2008, Olaf Dabrunz
> + */
> +
> +#ifndef LINUX_PCI_QUIRKS_H
> +#define LINUX_PCI_QUIRKS_H
> +
> +#ifdef CONFIG_X86_IO_APIC
> +#define MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS 32
> +#define VARIANT_INTEL_BUS_INT_REMAPPING 1

Aside of the horrible names, if there are multiple variants, please
use an enum.

> +
> +struct quirk_bootirq_reroute_dev {
> + unsigned short vendor;
> + unsigned short device;
> + unsigned short quirk_variant;
> +};
> +
> +/* max number of different quirky devices */
> +#define MAX_QUIRK_BOOTIRQ_REROUTE_DEVS 32
> +
> +extern struct quirk_bootirq_reroute_dev
> + quirk_bootirq_reroute_devs[];
> +#endif /* CONFIG_X86_IO_APIC */
> +
> +#endif /* LINUX_PCI_QUIRKS_H */
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 10:40:33

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 3/7] disable legacy boot interrupt generation

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6173be5..8f09f8f 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1377,6 +1377,34 @@ int nobootirqquirk_setup(char *str)
> __setup("nobootirqquirk", nobootirqquirk_setup);
>
> /*
> + * On some chipsets we can disable the generation of legacy INTx boot
> + * interrupts.
> + */
> +#define PCI_IOAPIC_ALTERNATE_BASE_ADDRESS 0x40
> +#define INTEL_6300_ESB_BOOT_INTERRUPT_ENABLE (1<<14)

new line

> +static void quirk_disable_intel_boot_interrupt(struct pci_dev *dev)
> +{
> + u16 cnf;
> +
> + if (nobootirqquirk)
> + return;
> +
> + pci_read_config_word(dev, PCI_IOAPIC_ALTERNATE_BASE_ADDRESS, &cnf);
> + /* this disables the boot interrupt even though it reads ENABLE */

And why cant we use a name for that constant which makes this obvious
w/o a comment ?

> + cnf |= INTEL_6300_ESB_BOOT_INTERRUPT_ENABLE;
> + pci_write_config_word(dev, PCI_IOAPIC_ALTERNATE_BASE_ADDRESS, cnf);
> +
> + printk(KERN_INFO "disabled boot interrupt on device 0x%04x:0x%04x\n",
> + dev->vendor, dev->device);
> +}
> +#undef PCI_IOAPIC_ALTERNATE_BASE_ADDRESS

Why #undef ?

> +/*
> + * IO-APIC1 on 6300ESB generates boot interrupts, see intel order no
> + * 300641-004US, section 5.7.3.
> + */
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_ESB_10, quirk_disable_intel_boot_interrupt);
> +
> +/*
> * Boot interrupts on some chipsets cannot be turned off. For these chipsets,
> * remap the original interrupt in the linux kernel to the boot interrupt, so
> * that a PCI device's interrupt handler is installed on the boot interrupt
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index c675399..b89616d 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2309,6 +2309,7 @@
> #define PCI_DEVICE_ID_INTEL_ESB_4 0x25a4
> #define PCI_DEVICE_ID_INTEL_ESB_5 0x25a6
> #define PCI_DEVICE_ID_INTEL_ESB_9 0x25ab
> +#define PCI_DEVICE_ID_INTEL_ESB_10 0x25ac
> #define PCI_DEVICE_ID_INTEL_82820_HB 0x2500
> #define PCI_DEVICE_ID_INTEL_82820_UP_HB 0x2501
> #define PCI_DEVICE_ID_INTEL_82850_HB 0x2530
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 10:46:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 4/7] disable broadcomm legacy boot interrupt generation

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> /*
> + * disabled boot interrupts on HT-1000
> + */
> +static void quirk_disable_broadcom_boot_interrupt(struct pci_dev *dev)
> +{
> + u32 feature_enable;
> + u32 saved_feature_enable;

u32 a,b;

> + u8 irq;
> +
> + if (nobootirqquirk)
> + return;
> +
> + pci_read_config_dword(dev, 0x64,
> + &feature_enable);

one line.

Can we please have a useful constant for 0x64 ?

> + saved_feature_enable = feature_enable;
> + feature_enable |= (1<<0);

What does (1<<0) ? Please use a #define with a understandable name.

> + pci_write_config_dword(dev, 0x64, feature_enable);

pci_write_config_dword(dev, 0x64, feature_enable | WHATEVERTHISMEANS);

That way you dont need an extra variable.

> + for (irq = 0x10; irq < 0x10 + 32; irq++) {
> + outb(irq, 0xC00);
> + outb(0x00, 0xC01);

#defines for magic port constants please

> + }
> +
> + pci_write_config_dword(dev, 0x64, saved_feature_enable);
> +
> + printk(KERN_INFO "disabled boot interrupts on PCI device 0x%04x:0x%04x\n",
> + dev->vendor, dev->device);
> +}
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_HT1000SB, quirk_disable_broadcom_boot_interrupt);
> +
> +/*
> * On some chipsets we can disable the generation of legacy INTx boot
> * interrupts.
> */
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 10:54:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 5/7] disable AMD/ATI legacy boot interrupt generation

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
>
> /*
> + * disable boot interrupts on AMD and ATI chipsets
> + */
> +#define PCI_X_MISC 0x40
> +#define PCI_X_AMD813X_NIOAMODE (1<<0)

new line before the function and consistent spacing

> +static void quirk_disable_amd_813x_boot_interrupt(struct pci_dev *dev)
> +{
> + u32 pci_x_misc;
> +
> + if (nobootirqquirk)
> + return;
> +
> + pci_read_config_dword(dev, PCI_X_MISC, &pci_x_misc);
> + pci_x_misc &= ~PCI_X_AMD813X_NIOAMODE;
> + pci_write_config_dword(dev, PCI_X_MISC, pci_x_misc);
> +
> + printk(KERN_INFO "disabled boot interrupts on PCI device "
> + "0x%04x:0x%04x\n", dev->vendor, dev->device);
> +}
> +#undef PCI_X_MISC
> +#undef PCI_X_AMD813X_NIOAMODE

why #undef ?

> +
> +#define PCI_AMD8111_PCI_IRQ_ROUTING 0x56
> +static void quirk_disable_amd_8111_boot_interrupt(struct pci_dev *dev)
> +{
> + u16 pci_irq_routing;
> +
> + if (nobootirqquirk)
> + return;
> +
> + pci_read_config_word(dev, PCI_AMD8111_PCI_IRQ_ROUTING,
> + &pci_irq_routing);
> + if (!pci_irq_routing) {
> + printk(KERN_INFO "boot interrupts on PCI "
> + "device 0x%04x:0x%04x were "
> + "already disabled\n",
> + dev->vendor, dev->device);
> + return;
> + }
> + pci_write_config_word(dev, PCI_AMD8111_PCI_IRQ_ROUTING, 0);
> + printk(KERN_INFO "disabled boot interrupts on PCI device "
> + "0x%04x:0x%04x\n", dev->vendor, dev->device);
> +}
> +#undef PCI_AMD8111_PCI_IRQ_ROUTING
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8111_SMBUS, quirk_disable_amd_8111_boot_interrupt);
> +
> +#define PCI_AMD_SB700S_FEATURES_ENABLE 0x64
> +#define PCI_AMD_SB700S_PIC_ENABLE (1<<0)
> +#define PIC_PCI_INTR_INDEX 0xC00
> +#define PIC_PCI_INTR_DATA 0xC01
> +#define CLEAR_PIC_IRQ_ROUTING(irq) \
> + outb(irq, PIC_PCI_INTR_INDEX); \
> + outb(0x00, PIC_PCI_INTR_DATA);

Please use an inline function

> +static void quirk_disable_amd_sb700s_boot_interrupt(struct pci_dev *dev)
> +{
> + u32 feature_enable;
> + u32 saved_feature_enable;
> +
> + if (nobootirqquirk)
> + return;
> +
> + pci_read_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
> + &feature_enable);
> + saved_feature_enable = feature_enable;
> + feature_enable |= PCI_AMD_SB700S_PIC_ENABLE;
> + pci_write_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
> + feature_enable);
> +
> + CLEAR_PIC_IRQ_ROUTING(0x0);
> + CLEAR_PIC_IRQ_ROUTING(0x1);
> + CLEAR_PIC_IRQ_ROUTING(0x2);
> + CLEAR_PIC_IRQ_ROUTING(0x3);
> + CLEAR_PIC_IRQ_ROUTING(0x4);
> + CLEAR_PIC_IRQ_ROUTING(0x9);
> + CLEAR_PIC_IRQ_ROUTING(0xA);
> + CLEAR_PIC_IRQ_ROUTING(0xB);
> + CLEAR_PIC_IRQ_ROUTING(0xC);

int i, irqs[] = {0x0,.....0x0c};

for (i = 0; i < ARRAY_SIZE(irqs); i++) {
outb(...);
outb(...);
}

perhaps ?

> +
> + pci_write_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
> + saved_feature_enable);
> +
> + printk(KERN_INFO "disabled boot interrupts on PCI device "
> + "0x%04x:0x%04x\n", dev->vendor, dev->device);
> +}
> +#undef PCI_AMD_SB700S_FEATURES_ENABLE
> +#undef PCI_AMD_SB700S_PIC_ENABLE
> +#undef PIC_PCI_INTR_INDEX
> +#undef PIC_PCI_INTR_DATA
> +#undef CLEAR_PIC_IRQ_ROUTING

grmbl

> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_IXP600_SMBUS, quirk_disable_amd_sb700s_boot_interrupt);
> +
> +/*
> * disabled boot interrupts on HT-1000
> */
> static void quirk_disable_broadcom_boot_interrupt(struct pci_dev *dev)
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 10:55:41

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 6/7] disable pci legacy boot irq quirks on noapic boot

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:

> From: Stefan Assmann <[email protected]>
>
> Don't enable pci boot irq quirks when falling back to pic, this would
> break interrupt routing.
>
> Signed-off-by: Stefan Assmann <[email protected]>
> Signed-off-by: Olaf Dabrunz <[email protected]>
> ---
> include/asm-x86/io_apic.h | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/include/asm-x86/io_apic.h b/include/asm-x86/io_apic.h
> index 5b1292f..461771c 100644
> --- a/include/asm-x86/io_apic.h
> +++ b/include/asm-x86/io_apic.h
> @@ -151,8 +151,11 @@ extern int skip_ioapic_setup;
> /* 1 if the timer IRQ uses the '8259A Virtual Wire' mode */
> extern int timer_through_8259;
>
> +extern int nobootirqquirk;
> +

declarations in headers please.

> static inline void disable_ioapic_setup(void)
> {
> + nobootirqquirk = 1;

This logic needs to be default disabled anyway.

> skip_ioapic_setup = 1;
> }
>
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 10:56:55

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 7/7] bootirqquirk= parameter to enable bootirq quirks for additional chips

On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> From: Olaf Dabrunz <[email protected]>
>
> The existing bootirq quirks and the reroute workaround may work for other chips
> where we could not test them. This parameter allows users to apply these to
> other chips without the need to re-build the kernel.
>
> This patch was conceived simultaneously by Stefan Assmann, Daniel Gollub and
> Olaf Dabrunz. The implementation is the author's.
>
> bootirqquirk=0x<vendor>,0x<device>,<quirk_type>
>
> - quirk type 1 - 32 selects an IRQ reroute algorithm for devices
> connected to that PCI bridge (currently only algorithm "1" is
> implemented),
>
> - quirk type 33 - x applies one of the known quirks to the PCI
> device, currently these:
>
> 33 -> quirk_disable_intel_boot_interrupt
> 34 -> quirk_disable_broadcom_boot_interrupt
> 35 -> quirk_disable_amd_8111_boot_interrupt
> 36 -> quirk_disable_amd_813x_boot_interrupt
> 37 -> quirk_disable_amd_sb700s_boot_interrupt

Oh no. This can be used as an debug aid when the need arises, but we
dont want to add this to the kernel.

Thanks,
tglx

> Signed-off-by: Olaf Dabrunz <[email protected]>
> Signed-off-by: Stefan Assmann <[email protected]>
> ---
> drivers/pci/quirks.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 118 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 090ce38..bb7fa65 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1576,6 +1576,113 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXH_1, quirk_re
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PXHV, quirk_reroute_to_boot_interrupts_intel);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80332_0, quirk_reroute_to_boot_interrupts_intel);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_80332_1, quirk_reroute_to_boot_interrupts_intel);
> +
> +/*
> + * For additional boot irq quirks save boot time settings or call the quirks
> + * directly.
> + */
> +
> +#define MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY 32
> +struct pci_fixup __start_bc_pci_fixups_early[MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY];
> +struct pci_fixup *__end_bc_pci_fixups_early = __start_bc_pci_fixups_early +
> + MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY;
> +
> +static void (*bootirq_quirk_func[])(struct pci_dev *dev) = {
> + &quirk_disable_intel_boot_interrupt,
> + &quirk_disable_broadcom_boot_interrupt,
> + &quirk_disable_amd_8111_boot_interrupt,
> + &quirk_disable_amd_813x_boot_interrupt,
> + &quirk_disable_amd_sb700s_boot_interrupt,
> +};
> +
> +int __init bootirqquirk_setup(char *str)
> +{
> + int ints[4];
> + struct quirk_bootirq_reroute_dev tmp;
> + int i;
> +
> + str = get_options(str, ARRAY_SIZE(ints), ints);
> + if (!str)
> + return 0;
> +
> + if (nobootirqquirk)
> + return 1;
> +
> + /* all parameters are required */
> + if (ints[0] != 3)
> + return 0;
> +
> + /* save settings */
> + tmp.vendor = ints[1];
> + tmp.device = ints[2];
> + tmp.quirk_variant = ints[3];
> +
> + /*
> + * quirk variants > MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS select quirk
> + * functions
> + */
> + if (tmp.quirk_variant > MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS) {
> + for (i = 0; i < MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY; i++) {
> + if (__start_bc_pci_fixups_early[i].vendor == 0)
> + break;
> +
> + if (__start_bc_pci_fixups_early[i].vendor == tmp.vendor &&
> + __start_bc_pci_fixups_early[i].device == tmp.device)
> + return 1; /* already in array */
> + }
> +
> + if (i >= MAX_BOOTTIME_CONFIG_PCI_FIXUPS_EARLY) {
> + printk(KERN_INFO "PCI quirk: too many boottime "
> + "configurable early quirk entries when "
> + "trying to add 0x%04x:0x%04x\n",
> + tmp.vendor, tmp.device);
> + return 0;
> + }
> +
> + if (tmp.quirk_variant > ARRAY_SIZE(bootirq_quirk_func) +
> + MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS) {
> + printk(KERN_INFO "PCI quirk: no such quirk function: "
> + "%d\n", tmp.quirk_variant);
> + return 0;
> + }
> +
> + __start_bc_pci_fixups_early[i].vendor = tmp.vendor;
> + __start_bc_pci_fixups_early[i].device = tmp.device;
> + __start_bc_pci_fixups_early[i].hook =
> + bootirq_quirk_func[tmp.quirk_variant -
> + (MAX_QUIRK_BOOTIRQ_REROUTE_VARIANTS + 1)];
> + } else {
> + for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
> + if (quirk_bootirq_reroute_devs[i].vendor == 0)
> + break;
> +
> + if (quirk_bootirq_reroute_devs[i].vendor == tmp.vendor &&
> + quirk_bootirq_reroute_devs[i].device == tmp.device)
> + return 1; /* already in array */
> + }
> +
> + if (i >= MAX_QUIRK_BOOTIRQ_REROUTE_DEVS) {
> + printk(KERN_INFO "PCI quirk: too many reroute entries when "
> + "trying to add 0x%04x:0x%04x\n",
> + tmp.vendor, tmp.device);
> + return 0;
> + }
> +
> + quirk_bootirq_reroute_devs[i].vendor = tmp.vendor;
> + quirk_bootirq_reroute_devs[i].device = tmp.device;
> + quirk_bootirq_reroute_devs[i].quirk_variant = tmp.quirk_variant;
> +
> + reroute_to_boot_interrupts = 1;
> + }
> +
> + printk(KERN_INFO "PCI quirk: reroute interrupts for 0x%04x:0x%04x\n",
> + tmp.vendor, tmp.device);
> + return 1;
> +}
> +
> +/* bootirqquirk=0x<vendor>,0x<device>,<quirk_type> */
> +__setup("bootirqquirk=", bootirqquirk_setup);
> +
> #endif /* CONFIG_X86_IO_APIC */
>
> /*
> @@ -1773,6 +1880,17 @@ void pci_fixup_device(enum pci_fixup_pass pass, struct pci_dev *dev)
> return;
> }
> pci_do_fixups(dev, start, end);
> +
> + switch(pass) {
> + case pci_fixup_early:
> + start = __start_bc_pci_fixups_early;
> + end = __end_bc_pci_fixups_early;
> + break;
> +
> + default:
> + return;
> + }
> + pci_do_fixups(dev, start, end);
> }
> EXPORT_SYMBOL(pci_fixup_device);
>
> --
> 1.5.2.4
>
> --
> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>

2008-06-03 15:54:24

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting


On Mon, 2008-06-02 at 14:45 +0200, Olaf Dabrunz wrote:

> These patches are against linux-2.6-tip, auto-x86-next.
>
> When IRQ lines on secondary or higher IO-APICs are masked (as done by
> RT and others), many chipsets redirect IRQs on this line to the PIC, and
> thereby regularly to the first IO-APIC in the system. This causes
> spurious interrupts and can lead to disabled IRQ lines.
>
> Disabling this "boot interrupt" (as it is mostly used to supply all
> IRQs to the legacy PIC during boot) is chipset-specific, and not
> possible for all chips. This patchset disables the boot interrupt on
> chipsets where this is possible and where we know how to do it.
>
> When disabling the boot interrupt is not possible, the patches tell the
> IRQ code to always use the redirected interrupt line (on the first
> IO-APIC) instead of the "original" line on the secondary (tertiary ...)
> IO-APIC. The original line remains masked, and IRQs always appear on
> the boot interrupt line on the first IO-APIC instead.
>
> Two new boot parameters control the quirks and are explained in the
> patches: nobootirqquirk,
> bootirqquirk=0x<vendor>,0x<device>,<quirk_variant>. "noapic" also sets
> nobootirqquirk.
>
> All patches are co-authored by Stefan Assmann and Olaf Dabrunz. But
> according to Documentation/SubmittingPatches, the patch submission format
> allows only for one author.

I like the patch series, insomuch as the intention is good, and it does
away with the special PCI IRQ quirk patches some of us are carrying in
our vendor trees to temporarily workaround this problem[0]. Also, I'm
extremely impressed with the research that went into this, since I
repeatedly tried to get ahold of information about disabling this
unfortunate "feature" of various bridges, without much success.

However, I really am not happy with the implementation as it stands. The
duplicated table of quirks that doesn't really fit in with the existing
PCI quirks infrastructure, the weird naming of the kernel options, and
various other things that Thomas has already mentioned in his reply.
Therefore, I think this needs a bit more reworking before going in.

Thanks!

Jon.

[0] The real fix come when we move IRQ handling in RT to per-device
threads, as is the longer term intention. Then you can quiesse the
device immediately and not the mask/unmask cycle that fails here.

2008-06-03 15:58:53

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 4/7] disable broadcomm legacy boot interrupt generation

On 03-Jun-08, Jon Masters wrote:
> On Mon, 2008-06-02 at 14:45 +0200, Olaf Dabrunz wrote:
>
> > Add a quirk to disable legacy boot interrupt generation on broadcomm HT1000.
>
> This one interests me, because I got access to the Broadcom HT-1000
> datasheet and could not find this feature mentioned in there. I'd
> appreciate knowing how you figured this one out :)

I had a sharp look at it... ;)

Actually, I looked through the datasheet three times, and did not find
anything. Then I looked at the IRQ routing picture until I had an idea
of how it is supposed to work and the solution was becoming clear. The
register descriptions made me believe it could work, and testing
confirmed this.

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-03 16:08:15

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 4/7] disable broadcomm legacy boot interrupt generation


On Tue, 2008-06-03 at 17:59 +0200, Olaf Dabrunz wrote:
> On 03-Jun-08, Jon Masters wrote:
> > On Mon, 2008-06-02 at 14:45 +0200, Olaf Dabrunz wrote:
> >
> > > Add a quirk to disable legacy boot interrupt generation on broadcomm HT1000.
> >
> > This one interests me, because I got access to the Broadcom HT-1000
> > datasheet and could not find this feature mentioned in there. I'd
> > appreciate knowing how you figured this one out :)
>
> I had a sharp look at it... ;)

Me too. It's ***NOT*** mentioned in there :)

> Actually, I looked through the datasheet three times, and did not find
> anything. Then I looked at the IRQ routing picture until I had an idea
> of how it is supposed to work and the solution was becoming clear. The
> register descriptions made me believe it could work, and testing
> confirmed this.

Oh, ok, now that takes some inordinate amount of dedication (on the
level of commitable...I mean dude, WTF? :P). Someone should call the LHC
and let them know you've got a way to detect the undetectable.

Jon.

2008-06-03 16:16:18

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 4/7] disable broadcomm legacy boot interrupt generation

On Mon, 2008-06-02 at 14:45 +0200, Olaf Dabrunz wrote:

> Add a quirk to disable legacy boot interrupt generation on broadcomm HT1000.

This one interests me, because I got access to the Broadcom HT-1000
datasheet and could not find this feature mentioned in there. I'd
appreciate knowing how you figured this one out :)

Jon.

2008-06-03 16:18:07

by Stefan Assmann

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Jon Masters schrieb:
> I like the patch series, insomuch as the intention is good, and it does
> away with the special PCI IRQ quirk patches some of us are carrying in
> our vendor trees to temporarily workaround this problem[0]. Also, I'm
> extremely impressed with the research that went into this, since I
> repeatedly tried to get ahold of information about disabling this
> unfortunate "feature" of various bridges, without much success.
>
In a perfect world these "features" as you call them should be disabled
by the bios, but this is not happening everywhere. Some bioses seem to
have code for that in ACPI others rely on the OS to do the work.
> However, I really am not happy with the implementation as it stands. The
> duplicated table of quirks that doesn't really fit in with the existing
> PCI quirks infrastructure, the weird naming of the kernel options, and
> various other things that Thomas has already mentioned in his reply.
> Therefore, I think this needs a bit more reworking before going in.
>
We're already working on that thanks to Thomas' review and will send an
updated patchset asap. If there's anything you'd like to have changed
feel free to let us know.
> Thanks!
>
> Jon.
>
> [0] The real fix come when we move IRQ handling in RT to per-device
> threads, as is the longer term intention. Then you can quiesse the
> device immediately and not the mask/unmask cycle that fails here.
>
>
>

Thanks :)
Stefan

--
Stefan Assmann | SUSE LINUX Products GmbH
Software Engineer | Maxfeldstr. 5, D-90409 Nuernberg
Mail : [email protected] | GF: Markus Rex, HRB 16746 (AG Nuernberg)

2008-06-03 16:56:19

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On 03-Jun-08, Jon Masters wrote:
> I like the patch series, insomuch as the intention is good, and it does
> away with the special PCI IRQ quirk patches some of us are carrying in
> our vendor trees to temporarily workaround this problem[0]. Also, I'm
> extremely impressed with the research that went into this, since I
> repeatedly tried to get ahold of information about disabling this
> unfortunate "feature" of various bridges, without much success.

Thanks a lot. :)

BTW, your blog was really helpful to get up to speed on this again. :)
(I read about (RT) IRQ problems a looong time ago on lkml... ;))

We also looked through the code and added tests to make sure that
IRQ-handling works correctly and to test a few theories. We became
relatively convinced that the code was correct. Then we thought the
failures were too systematic to be caused by glitches. Always the same
IRQs were triggered, and only when the "original" ones were masked.

We found first answers in the i6700PXH datasheet. After the first
reroute patch worked, the rest was about finding similar answers for
other chipsets.

> However, I really am not happy with the implementation as it stands. The
> duplicated table of quirks that doesn't really fit in with the existing
> PCI quirks infrastructure, the weird naming of the kernel options, and
> various other things that Thomas has already mentioned in his reply.
> Therefore, I think this needs a bit more reworking before going in.

Yep. We are working on it. Thanks for your comments. :)

> Thanks!
>
> Jon.
>
> [0] The real fix come when we move IRQ handling in RT to per-device
> threads, as is the longer term intention. Then you can quiesse the
> device immediately and not the mask/unmask cycle that fails here.

Yes, per-device threads help with the latency. I believe the boot irq
patches here can then make drivers work that do not (yet) use the new
approach. Also, I am not sure so far whether all devices can quiesce
their IRQs on the device without introducing possible loss of IRQs --
but that is just theory (or FUD? BS? oh-oh *duck*), so please don't
mind... :}

Thanks,

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-03 17:00:23

by Stefan Assmann

[permalink] [raw]
Subject: Re: [PATCH 1/7] add kernel cmdline option to disable pci-irq quirks

Thomas Gleixner schrieb:
> On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
>
>> From: Olaf Dabrunz <[email protected]>
>>
>> Add a switch to disable all boot interrupt quirks, using the parameter
>> nobootirqquirk.
>>
>
> This should be a parameter in the form of
>
> pci=ioapicquirk
>
> and not a separate and completely unintuitive thing.
>
> Thanks,
> tglx
>
>
Introducing 2 parameters then:
pci=ioapicreroute to enable rerouting of interrupts to the primary io-apic,
pci=noioapicquirk to disable all sorts of io-apic quirks.

This implies that disabling boot interrupts, on bridges where we know
how to do it, will be enabled by default. I'd favor this solution
because I can't think of any harm this could possibly cause. Boot
interrupts shouldn't happen on systems that run in apic mode.
>> Signed-off-by: Olaf Dabrunz <[email protected]>
>> Signed-off-by: Stefan Assmann <[email protected]>
>> ---
>> drivers/pci/quirks.c | 15 +++++++++++++++
>> 1 files changed, 15 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index dabb563..6245486 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -1363,6 +1363,21 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x2609, quirk_intel_pcie_pm);
>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260a, quirk_intel_pcie_pm);
>> DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x260b, quirk_intel_pcie_pm);
>>
>> +
>> +#ifdef CONFIG_X86_IO_APIC
>> +int nobootirqquirk __read_mostly = 0;
>> +
>> +int nobootirqquirk_setup(char *str)
>> +{
>> + nobootirqquirk = 1;
>> + printk(KERN_INFO "Boot IRQ quirk handling disabled\n");
>> +
>> + return 1;
>> +}
>> +__setup("nobootirqquirk", nobootirqquirk_setup);
>> +#endif /* CONFIG_X86_IO_APIC */
>> +
>> +
>> /*
>> * Toshiba TC86C001 IDE controller reports the standard 8-byte BAR0 size
>> * but the PIO transfers won't work if BAR0 falls at the odd 8 bytes.
>> --
>> 1.5.2.4
>>
>> --
>> Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
>>
>>
Stefan

--
Stefan Assmann | SUSE LINUX Products GmbH
Software Engineer | Maxfeldstr. 5, D-90409 Nuernberg
Mail : [email protected] | GF: Markus Rex, HRB 16746 (AG Nuernberg)

2008-06-03 17:11:00

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Hello Thomas, :)

On 03-Jun-08, Thomas Gleixner wrote:
> Olaf,
>
> On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> > These patches are against linux-2.6-tip, auto-x86-next.
> >
> > When IRQ lines on secondary or higher IO-APICs are masked (as done by
> > RT and others), many chipsets redirect IRQs on this line to the PIC, and
> > thereby regularly to the first IO-APIC in the system. This causes
> > spurious interrupts and can lead to disabled IRQ lines.
> >
> > Disabling this "boot interrupt" (as it is mostly used to supply all
> > IRQs to the legacy PIC during boot) is chipset-specific, and not
> > possible for all chips. This patchset disables the boot interrupt on
> > chipsets where this is possible and where we know how to do it.
> >
> > When disabling the boot interrupt is not possible, the patches tell the
> > IRQ code to always use the redirected interrupt line (on the first
> > IO-APIC) instead of the "original" line on the secondary (tertiary ...)
> > IO-APIC. The original line remains masked, and IRQs always appear on
> > the boot interrupt line on the first IO-APIC instead.
>
> Thanks for doing the research on this problem.
>
> We probably don't want to enable this unconditionally right away, so
> we should have a command line option which enables these quirks.

Is this part of the process (sth like: keep it disabled until we trust
it)?

Otherwise I would suggest to enable the "disable-quirks" by default
(when we use the APICs), and to disable the reroute stuff by default
until people trust it.

The "disable-quirks" are what some chipset documentation recommends to
do in the ACPI "_PIC" method anyway, when APIC mode is selected by the
operating system. I have seen a few BIOSes that do exactly the same in
the "_PIC" method. They are straightforward and simple. I believe they
are safe in APIC mode (and off in PIC mode) and we should expose them to
testing.

The reroute patch is a bit of a hack for the chips where boot interrupts
cannot be switched off. It is a simple approach, and I can think up
(so far theoretical) scenarios where it breaks.

In defense of the reroute patch:

- we have not seen it break during our testing on more than 10 intel
machines (with and without chips on which it is used) and

- so far the rerouting was only necessary for some intel bridges,
and I checked that

1) all these bridges use the same routing for boot interrupts to
PCI IRQs, as in the i6700PXH datasheet, section 2.15.2:

Pin INT

0,4,8,12 -> INTA
1,5,9,13 -> INTB
2,6,10,14 -> INTC
3,7,11,15 -> INTD

(also note that this routing pattern is required by the PCI
specs for _cards_ that add PCI bridges, as the BIOS cannot
otherwise know the routing)

2) all intel ICHxs use the same routing for external PCI IRQs to
the ICHx's IO-APIC, as in the ICH5 datasheet, section 5.9.2:

Direct
from Pin IRQ #

PIRQA# -> 16
PIRQB# -> 17
PIRQC# -> 18
PIRQD# -> 19
PIRQE# -> 20
PIRQF# -> 21
PIRQG# -> 22
PIRQH# -> 23

and as the ICH provides a PIC, it needs to be the first interrupt
controller in the system, unless another chip also provides a PIC.

To break the reroute patch, someone would need to connect the PCI IRQs
from an intel PCI bridge that cannot disable boot interrupts
(6700PX[VH], 8033[23] or 6300ESB) to something other than an intel ICHx.
I have never seen such a board and I doubt there will be many such
boards.

> Also is there any interaction with MSI ?

The docs for the 6700PX[VH], and IIRC also for the 8033[23] or 6300ESB
say that boot irqs are generated when the original ones are masked. So
this is independent of MSI.

We have not many boards where MSI is enabled (as there are quirks for
many chips that switch MSI off). But our testing showed no problems on
MSI-enabled machines.

> Thanks,
> tglx

Thanks, :)

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-03 17:16:15

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 4/7] disable broadcomm legacy boot interrupt generation

On 03-Jun-08, Jon Masters wrote:
>
> On Tue, 2008-06-03 at 17:59 +0200, Olaf Dabrunz wrote:
> > On 03-Jun-08, Jon Masters wrote:
> > > On Mon, 2008-06-02 at 14:45 +0200, Olaf Dabrunz wrote:
> > >
> > > > Add a quirk to disable legacy boot interrupt generation on broadcomm HT1000.
> > >
> > > This one interests me, because I got access to the Broadcom HT-1000
> > > datasheet and could not find this feature mentioned in there. I'd
> > > appreciate knowing how you figured this one out :)
> >
> > I had a sharp look at it... ;)
>
> Me too. It's ***NOT*** mentioned in there :)

That's right. :)

> > Actually, I looked through the datasheet three times, and did not find
> > anything. Then I looked at the IRQ routing picture until I had an idea
> > of how it is supposed to work and the solution was becoming clear. The
> > register descriptions made me believe it could work, and testing
> > confirmed this.
>
> Oh, ok, now that takes some inordinate amount of dedication (on the
> level of commitable...I mean dude, WTF? :P). Someone should call the LHC
> and let them know you've got a way to detect the undetectable.

xD *grin* :)

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-03 23:01:42

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH 2/7] reroute PCI interrupt to legacy boot interrupt equivalent

On Tue, 3 Jun 2008 12:37:58 +0200 (CEST) Thomas Gleixner wrote:

> On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> > From: Stefan Assmann <[email protected]>
> >
> > Some chipsets (e.g. intel 6700PXH) generate a legacy INTx when the
> > IRQ entry in the chipset's IO-APIC is masked (as, e.g. the RT kernel
> > does during interrupt handling). On chipsets where this INTx generation
> > cannot be disabled, we reroute the valid interrupts to their legacy
> > equivalent to get rid of spurious interrupts that might otherwise bring
> > down (vital) interrupt lines through spurious interrupt detection in
> > note_interrupt().
> >
> > The patch should eventually simply use another bitfield in the pci_dev
> > structure for the rerouting variant. This was in another variant of the patch
> > by Daniel Gollub. It is the cleanest and most simple approach.
> >
> > This patch benefited from discussions with Alexander Graf, Torsten Duwe,
> > Ihno Krumreich, Daniel Gollub, Hannes Reinecke. The conclusions we drew
> > and the patch itself are the authors' responsibility alone.
> >
> > Signed-off-by: Stefan Assmann <[email protected]>
> > Signed-off-by: Olaf Dabrunz <[email protected]>
> > ---
> > drivers/acpi/pci_irq.c | 63 ++++++++++++++++++++++++++++++++++++++++++++
> > drivers/pci/quirks.c | 51 ++++++++++++++++++++++++++++++++++-
> > include/linux/pci.h | 2 +
> > include/linux/pci_ids.h | 4 +++
> > include/linux/pci_quirks.h | 28 +++++++++++++++++++
> > 5 files changed, 147 insertions(+), 1 deletions(-)
> > create mode 100644 include/linux/pci_quirks.h
> >
> > diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> > index 89022a7..ebdfbfe 100644
> > --- a/drivers/acpi/pci_irq.c
> > +++ b/drivers/acpi/pci_irq.c
> > @@ -384,6 +384,34 @@ acpi_pci_free_irq(struct acpi_prt_entry *entry,
> > return irq;
> > }
> >
> > +#ifdef CONFIG_X86_IO_APIC
> > +static int
> > +bridge_has_boot_interrupt_variant(struct pci_bus *bus)
>
> one line
>
> > +{
> > + struct pci_bus *bus_it;
> > + int i;
> > +
> > + for (bus_it = bus ; bus_it ; bus_it = bus_it->parent) {

Along with Thomas's comments:

spacing:
for (bus_it = bus; bus_it; bus_it = bus_it->parent) {

and additions to Documentation/kernel-parameters.txt, please.


> > + if (!bus_it->self)
> > + return 0;
> > +
> > + printk(KERN_INFO "vendor=%04x device=%04x\n", bus_it->self->vendor,
> > + bus_it->self->device);
> > + for (i = 0; i < MAX_QUIRK_BOOTIRQ_REROUTE_DEVS; i++) {
> > + if (quirk_bootirq_reroute_devs[i].vendor == 0)
> > + return 0;
> > +
> > + if (quirk_bootirq_reroute_devs[i].vendor == bus_it->self->vendor &&
> > + quirk_bootirq_reroute_devs[i].device == bus_it->self->device)
> > + return quirk_bootirq_reroute_devs[i].quirk_variant;
> > + }
> > + }
> > + return 0;
> > +}


---
~Randy

2008-06-04 02:41:48

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Olaf Dabrunz <[email protected]> writes:

> These patches are against linux-2.6-tip, auto-x86-next.
>
> When IRQ lines on secondary or higher IO-APICs are masked (as done by
> RT and others), many chipsets redirect IRQs on this line to the PIC, and
> thereby regularly to the first IO-APIC in the system. This causes
> spurious interrupts and can lead to disabled IRQ lines.

Sounds like good problem tracking.

> Disabling this "boot interrupt" (as it is mostly used to supply all
> IRQs to the legacy PIC during boot) is chipset-specific, and not
> possible for all chips. This patchset disables the boot interrupt on
> chipsets where this is possible and where we know how to do it.

I know to work around a similar issue we disable the interrupt inputs
on the PIC. Is that not enough? In particular what is enabled and
what is disabled when these interrupts are coming in.

> When disabling the boot interrupt is not possible, the patches tell the
> IRQ code to always use the redirected interrupt line (on the first
> IO-APIC) instead of the "original" line on the secondary (tertiary ...)
> IO-APIC. The original line remains masked, and IRQs always appear on
> the boot interrupt line on the first IO-APIC instead.

In general we should not be disabling interrupts, especially in the
mainline kernel. Artificially increasing interrupt sharing just so
you can be certain of disabling the interrupt line seems to be the
wrong approach.

Much better would be to get everything off of the shared boot
interrupt line, so you can disable that, if that is possible.

For the mainstream kernel I expect we can even teach the drivers
not to call disable_irq. As a function of last resort to deal
with screaming irqs, disable_irq seems reasonable. Using disable_irq
on a regular basis appears to be asking for a trouble (as you have
found).

As for the question about MSI. Good MSI implementations have a mask
bit so we should be able to disable those reliably. For other MSI
implementations we disable both the mask bit and the pci_intx
capability. So there should be no way for the irq to leave the
device. But it hardware is strange sometimes.

That said have you tried clearing the PCI_COMMAND_INTX_DISABLE bit
of the PCI_COMMAND register to mask the boot interrupts? I expect
that would work on quite a lot of modern hardware.

Eric

2008-06-04 09:49:32

by Stefan Assmann

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Eric W. Biederman schrieb:
> Olaf Dabrunz <[email protected]> writes:
>
>> These patches are against linux-2.6-tip, auto-x86-next.
>>
>> When IRQ lines on secondary or higher IO-APICs are masked (as done by
>> RT and others), many chipsets redirect IRQs on this line to the PIC, and
>> thereby regularly to the first IO-APIC in the system. This causes
>> spurious interrupts and can lead to disabled IRQ lines.
>
> Sounds like good problem tracking.
>
>> Disabling this "boot interrupt" (as it is mostly used to supply all
>> IRQs to the legacy PIC during boot) is chipset-specific, and not
>> possible for all chips. This patchset disables the boot interrupt on
>> chipsets where this is possible and where we know how to do it.
>
> I know to work around a similar issue we disable the interrupt inputs
> on the PIC. Is that not enough? In particular what is enabled and
> what is disabled when these interrupts are coming in.

On the chips (ICHx, ...) we saw, the interrupt lines on the PIC also go
to the first IO-APIC. So the boot interrupts go to both the PIC and the
first IO-APIC.

When running in APIC mode all PIC IRQs are disabled, except for the
timer maybe. Boot interrupts still arrive on the first IO-APIC and end
up as being counted as spurious interrupts.

>> When disabling the boot interrupt is not possible, the patches tell the
>> IRQ code to always use the redirected interrupt line (on the first
>> IO-APIC) instead of the "original" line on the secondary (tertiary ...)
>> IO-APIC. The original line remains masked, and IRQs always appear on
>> the boot interrupt line on the first IO-APIC instead.
>
> In general we should not be disabling interrupts, especially in the
> mainline kernel. Artificially increasing interrupt sharing just so
> you can be certain of disabling the interrupt line seems to be the
> wrong approach.
>
> Much better would be to get everything off of the shared boot
> interrupt line, so you can disable that, if that is possible.
>

The lines where the boot interrupts show up are usually hard wired to
the first IO-APIC. It might be possible to move devices that share these
lines to other interrupts but in many cases, especially on older ICHs,
this is not possible.

The wiring of the boot interrupts follows a fixed pattern on most
bridges and generates a PCI IRQ. This ends up on the ICH or equivalent,
where it either is hard-wired to IRQ 16 to 24, or can be routed to
some IRQ through a programmable mapping. An example for the mappings
on intel chips is in another mail in this thread.

> For the mainstream kernel I expect we can even teach the drivers
> not to call disable_irq. As a function of last resort to deal
> with screaming irqs, disable_irq seems reasonable. Using disable_irq
> on a regular basis appears to be asking for a trouble (as you have
> found).

We see these boot interrupts mainly in the RT kernels, which handle
interrupts in threads. To do this, they mask the IRQ until it has been
handled. The masking sets up the conditions on the chip so that boot
interrupts are generated.

When drivers call disable_irq, this is another possible cause for boot
interrupts. We agree that this should probably be fixed.

> As for the question about MSI. Good MSI implementations have a mask
> bit so we should be able to disable those reliably. For other MSI
> implementations we disable both the mask bit and the pci_intx
> capability. So there should be no way for the irq to leave the
> device. But it hardware is strange sometimes.
>
> That said have you tried clearing the PCI_COMMAND_INTX_DISABLE bit
> of the PCI_COMMAND register to mask the boot interrupts? I expect
> that would work on quite a lot of modern hardware.

Some chips (6700PXH, ...) are not PCIe 2.x (IIR the version correctly)
compatible, and they do not support switching off INTx generation.
We tried this anyway for the 6700PXH and it did not work.

Stefan and Olaf

--
Stefan Assmann | SUSE LINUX Products GmbH
Software Engineer | Maxfeldstr. 5, D-90409 Nuernberg
Mail : [email protected] | GF: Markus Rex, HRB 16746 (AG Nuernberg)

2008-06-04 10:05:40

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 7/7] bootirqquirk= parameter to enable bootirq quirks for additional chips

On 03-Jun-08, Thomas Gleixner wrote:
> On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> > From: Olaf Dabrunz <[email protected]>
> >
> > The existing bootirq quirks and the reroute workaround may work for other chips
> > where we could not test them. This parameter allows users to apply these to
> > other chips without the need to re-build the kernel.
> >
> > This patch was conceived simultaneously by Stefan Assmann, Daniel Gollub and
> > Olaf Dabrunz. The implementation is the author's.
> >
> > bootirqquirk=0x<vendor>,0x<device>,<quirk_type>
> >
> > - quirk type 1 - 32 selects an IRQ reroute algorithm for devices
> > connected to that PCI bridge (currently only algorithm "1" is
> > implemented),
> >
> > - quirk type 33 - x applies one of the known quirks to the PCI
> > device, currently these:
> >
> > 33 -> quirk_disable_intel_boot_interrupt
> > 34 -> quirk_disable_broadcom_boot_interrupt
> > 35 -> quirk_disable_amd_8111_boot_interrupt
> > 36 -> quirk_disable_amd_813x_boot_interrupt
> > 37 -> quirk_disable_amd_sb700s_boot_interrupt
>
> Oh no. This can be used as an debug aid when the need arises, but we
> dont want to add this to the kernel.

Yes, I agree. We should rather patch in additional chips as needed.
This is more a helper functionality for early testing.

Regards,

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-04 10:50:32

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Stefan Assmann <[email protected]> writes:

> On the chips (ICHx, ...) we saw, the interrupt lines on the PIC also go
> to the first IO-APIC. So the boot interrupts go to both the PIC and the
> first IO-APIC.
>
> When running in APIC mode all PIC IRQs are disabled, except for the
> timer maybe. Boot interrupts still arrive on the first IO-APIC and end
> up as being counted as spurious interrupts.

So the boot interrupt comes in on one of the legacy irq vectors on
the first ioapic, but it is a native ioapic irq.

What is the reason why you don't simply disable the ioapic vector of
the boot interrupt? Do some devices not use anything else?

> The lines where the boot interrupts show up are usually hard wired to
> the first IO-APIC. It might be possible to move devices that share these
> lines to other interrupts but in many cases, especially on older ICHs,
> this is not possible.

Not what I was suggesting.

As I read the above. It says:
When you can not disable the boot interrupt you don't attempt to use
the non-boot interrupts and use the boot interrupt for everything.

If that is what your patches implement I disagree with that approach
for the mainstream kernel.

> The wiring of the boot interrupts follows a fixed pattern on most
> bridges and generates a PCI IRQ. This ends up on the ICH or equivalent,
> where it either is hard-wired to IRQ 16 to 24, or can be routed to
> some IRQ through a programmable mapping. An example for the mappings
> on intel chips is in another mail in this thread.

I will have to look. I am familiar with the concept but I have
not looked at any of these in detail for a while.

>> For the mainstream kernel I expect we can even teach the drivers
>> not to call disable_irq. As a function of last resort to deal
>> with screaming irqs, disable_irq seems reasonable. Using disable_irq
>> on a regular basis appears to be asking for a trouble (as you have
>> found).
>
> We see these boot interrupts mainly in the RT kernels, which handle
> interrupts in threads. To do this, they mask the IRQ until it has been
> handled. The masking sets up the conditions on the chip so that boot
> interrupts are generated.

Yes. My point being that because we don't do that in the mainstream
kernels we have more robust alternatives in the mainstream kernel.

> Some chips (6700PXH, ...) are not PCIe 2.x (IIR the version correctly)
> compatible, and they do not support switching off INTx generation.
> We tried this anyway for the 6700PXH and it did not work.

So much for that idea then.

Eric

2008-06-04 11:34:17

by Stefan Assmann

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

Eric W. Biederman schrieb:
> Stefan Assmann <[email protected]> writes:
>
>> On the chips (ICHx, ...) we saw, the interrupt lines on the PIC also go
>> to the first IO-APIC. So the boot interrupts go to both the PIC and the
>> first IO-APIC.
>>
>> When running in APIC mode all PIC IRQs are disabled, except for the
>> timer maybe. Boot interrupts still arrive on the first IO-APIC and end
>> up as being counted as spurious interrupts.
>
> So the boot interrupt comes in on one of the legacy irq vectors on
> the first ioapic, but it is a native ioapic irq.
>
> What is the reason why you don't simply disable the ioapic vector of
> the boot interrupt? Do some devices not use anything else?

That's exactly the reason why disabling them is a bad idea, you end up
having devices fixed to IO-APIC pins that boot interrupts are routed to.

>> The lines where the boot interrupts show up are usually hard wired to
>> the first IO-APIC. It might be possible to move devices that share these
>> lines to other interrupts but in many cases, especially on older ICHs,
>> this is not possible.
>
> Not what I was suggesting.
>
> As I read the above. It says:
> When you can not disable the boot interrupt you don't attempt to use
> the non-boot interrupts and use the boot interrupt for everything.
>
> If that is what your patches implement I disagree with that approach
> for the mainstream kernel.

You're right, that behavior is more appropriate for RT, where we see
these boot interrupts because of the way interrupts are handled there.
We're rewriting the patches to use a parameter like pci=ioapicreroute to
trigger this only if the parameter is specified (for the mainstream
kernel).

>> The wiring of the boot interrupts follows a fixed pattern on most
>> bridges and generates a PCI IRQ. This ends up on the ICH or equivalent,
>> where it either is hard-wired to IRQ 16 to 24, or can be routed to
>> some IRQ through a programmable mapping. An example for the mappings
>> on intel chips is in another mail in this thread.
>
> I will have to look. I am familiar with the concept but I have
> not looked at any of these in detail for a while.
>
>>> For the mainstream kernel I expect we can even teach the drivers
>>> not to call disable_irq. As a function of last resort to deal
>>> with screaming irqs, disable_irq seems reasonable. Using disable_irq
>>> on a regular basis appears to be asking for a trouble (as you have
>>> found).
>> We see these boot interrupts mainly in the RT kernels, which handle
>> interrupts in threads. To do this, they mask the IRQ until it has been
>> handled. The masking sets up the conditions on the chip so that boot
>> interrupts are generated.
>
> Yes. My point being that because we don't do that in the mainstream
> kernels we have more robust alternatives in the mainstream kernel.
>
>> Some chips (6700PXH, ...) are not PCIe 2.x (IIR the version correctly)
>> compatible, and they do not support switching off INTx generation.
>> We tried this anyway for the 6700PXH and it did not work.
>
> So much for that idea then.
>
> Eric

Stefan
--
Stefan Assmann | SUSE LINUX Products GmbH
Software Engineer | Maxfeldstr. 5, D-90409 Nuernberg
Mail : [email protected] | GF: Markus Rex, HRB 16746 (AG Nuernberg)

2008-06-04 11:37:11

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On 04-Jun-08, Eric W. Biederman wrote:
> Stefan Assmann <[email protected]> writes:
>
> > On the chips (ICHx, ...) we saw, the interrupt lines on the PIC also go
> > to the first IO-APIC. So the boot interrupts go to both the PIC and the
> > first IO-APIC.
> >
> > When running in APIC mode all PIC IRQs are disabled, except for the
> > timer maybe. Boot interrupts still arrive on the first IO-APIC and end
> > up as being counted as spurious interrupts.
>
> So the boot interrupt comes in on one of the legacy irq vectors on
> the first ioapic, but it is a native ioapic irq.
>
> What is the reason why you don't simply disable the ioapic vector of
> the boot interrupt? Do some devices not use anything else?

Yes, other devices do use the same ioapic irq. Depending on the chipset
and mainboard wiring, these other devices can or cannot be moved to a
different IRQ. Also, this changes with chipset generations and there are
several registers to consider for every chipset to re-route these
devices.

This solution would need to be worked on intesively, it will not work
often enough and also maintenance would cost a lot of time.

On the other hand, we see the reroute patch as a legacy support
workaround. Newer chipsets can usually disable the boot interrupt. But
devices like the 6700PXH are still around a lot.

None of the newer chipsets we looked at needs a workaround like
re-routing.

> > The lines where the boot interrupts show up are usually hard wired to
> > the first IO-APIC. It might be possible to move devices that share these
> > lines to other interrupts but in many cases, especially on older ICHs,
> > this is not possible.
>
> Not what I was suggesting.
>
> As I read the above. It says:
> When you can not disable the boot interrupt you don't attempt to use
> the non-boot interrupts and use the boot interrupt for everything.

Right.

> If that is what your patches implement I disagree with that approach
> for the mainstream kernel.

Amongst all the approaches we discussed before creating the patch, this
was the only working solution.

What would be your suggestion to handle boot interrupts that cannot be
disabled?

> > The wiring of the boot interrupts follows a fixed pattern on most
> > bridges and generates a PCI IRQ. This ends up on the ICH or equivalent,
> > where it either is hard-wired to IRQ 16 to 24, or can be routed to
> > some IRQ through a programmable mapping. An example for the mappings
> > on intel chips is in another mail in this thread.
>
> I will have to look. I am familiar with the concept but I have

Yes, please. :)

> not looked at any of these in detail for a while.
>
> >> For the mainstream kernel I expect we can even teach the drivers
> >> not to call disable_irq. As a function of last resort to deal
> >> with screaming irqs, disable_irq seems reasonable. Using disable_irq
> >> on a regular basis appears to be asking for a trouble (as you have
> >> found).
> >
> > We see these boot interrupts mainly in the RT kernels, which handle
> > interrupts in threads. To do this, they mask the IRQ until it has been
> > handled. The masking sets up the conditions on the chip so that boot
> > interrupts are generated.
>
> Yes. My point being that because we don't do that in the mainstream
> kernels we have more robust alternatives in the mainstream kernel.

The mainline kernel uses masking to disable IRQ lines. The IRQ should
then not appear somewhere else and cause spurious interrupts. This is
the reason why we believe the reroute patch is also beneficial to the
mainline kernel.

That being said, I also would be glad if we had an elegant solution.

> > Some chips (6700PXH, ...) are not PCIe 2.x (IIR the version correctly)
> > compatible, and they do not support switching off INTx generation.
> > We tried this anyway for the 6700PXH and it did not work.
>
> So much for that idea then.
>
> Eric

Thanks a lot, :)

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-04 15:54:03

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Stefan Assmann wrote:

> You're right, that behavior is more appropriate for RT, where we see
> these boot interrupts because of the way interrupts are handled there.
> We're rewriting the patches to use a parameter like pci=ioapicreroute to
> trigger this only if the parameter is specified (for the mainstream
> kernel).

Well, this is clear these chipsets are broken beyond imagination,
negating some 15 years of I/O APIC compatibility where masking an input in
its redirection table is expected not to have any side effects. They
should have used a separate bit for the legacy INTx redirection. That has
been fixed in hardware though and there is nothing we can do about it at
this stage. Hire competent hardware designers the next time.

However I do not feel adding a command line parameter for each of such
workarounds is the right way to do it. We end up with an infinite number
of obscure options nobody or hardly anybody really understands. We should
be removing them rather than adding new ones.

What's the reasoning behind the option in this case? As I understand
there are two cases possible:

1. Secondary, etc. I/O APIC inputs are not masked under any circumstances.
No legacy INTx redirection happens, nothing to be done.

2. Secondary, etc. I/O APIC inputs are to be masked from time to time.
That would cause legacy INTx redirection for the affected chipsets in
situations where an interrupt arrives at a masked I/O APIC input. This
interrupt is delivered to an input of the primary I/O APIC which cannot
be masked because of other devices wired to it.

OK -- that means the interrupt is delivered anyway (and perhaps
discarded in the handler, but that does not matter here), so why to do
the rerouting in the first place? Because we could equally well keep
the original secondary, etc. I/O APIC input for the interrupt unmasked
(if an affected chipset is discovered) to the very same effect, yet
avoid all the hassle and unnecessary sharing. It could be handled very
easily by selecting a different "interrupt controller" setup for the
affected I/O APICs.

Either way I fail to see a reason not to this automatically -- I cannot
imagine a sane user to be able to decide whether to use the option or not
in this case.

That's one. Second -- I have had a look at the relevant chipset
documentation and I fail to see how INTx messages received over PCI
Express are actually converted to interrupt signals. I gather they are
interpreted by the MCH and posted to the ICH "somehow". What does that
"somehow" look like? Does the MCH assert PIRQ lines of the ICH? -- but I
fail to see any available outputs in the MCH that could be used for this
purpose. SERIRQ is obviously not used, because it is not implemented in
the MCH and PCI messages I am assuming are not used either as they are
edge-triggered and therefore preclude sharing and as such could be simply
masked out in the primary I/O APIC permanently.

Knowing this could help understanding the problem better.

Maciej

2008-06-04 16:10:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Maciej W. Rozycki wrote:
> On Wed, 4 Jun 2008, Stefan Assmann wrote:
> What's the reasoning behind the option in this case? As I understand
> there are two cases possible:
>
> 1. Secondary, etc. I/O APIC inputs are not masked under any circumstances.
> No legacy INTx redirection happens, nothing to be done.
>
> 2. Secondary, etc. I/O APIC inputs are to be masked from time to time.
> That would cause legacy INTx redirection for the affected chipsets in
> situations where an interrupt arrives at a masked I/O APIC input. This
> interrupt is delivered to an input of the primary I/O APIC which cannot
> be masked because of other devices wired to it.
>
> OK -- that means the interrupt is delivered anyway (and perhaps
> discarded in the handler, but that does not matter here), so why to do

It does matter. When the interrupt is _not_ handled then it comes back
immediately for ever and after a while the kernel decides to disable
the legacy int, because nobody cares about the interrupt.

What happens is:

irq_disable(secondary);

irq line of the secondary ioapic becomes active

legacy irq happens

repeat:
legacy irq disable
handler runs and discards
legacy irq enable

legacy interrupt is still active due to rerouting
if (count < max)
goto repeat
else
disable legacy irq forever

Thanks,
tglx

2008-06-04 17:19:21

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Thomas Gleixner wrote:

> It does matter. When the interrupt is _not_ handled then it comes back
> immediately for ever and after a while the kernel decides to disable
> the legacy int, because nobody cares about the interrupt.

Mental shortcut, sorry -- making the interrupt be discarded through the
primary rather than the secondary, etc. I/O APIC does not change anything.
Based on the description of the problem, the interupt will just have to be
delivered somewhere, so I see little purpose in complicating the routing
and causing additional sharing just to discard the interrupt elsewhere
anyway. If INTx messages cannot be blocked on the way anywhere, then the
originating I/O APIC should never get its inputs masked and the handler
should take care of the unwanted interrupts there. This is at least my
opinion.

BTW, it could be possible to mask the interrupt by fiddling with the
vector used and the TPR instead -- we have a range of low-priority vectors
which are never used for APIC interrupts, so the TPR may be hardcoded to
some non-zero value for all systems and then for the problematic chipsets
handlers may change vectors to mask or unmask APIC interrupts. This would
have to be verified on actual hardware and be conditional as it is likely
to cause troubles for systems using serial interrupt delivery over the
inter-APIC bus (the vector, delivery mode, etc. are generally not meant to
be changed with the input unmasked for these chips -- which just shows how
braindead the idea of the mask having side effects is). Just a thought.

Maciej

2008-06-04 17:34:21

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Maciej W. Rozycki wrote:
> On Wed, 4 Jun 2008, Thomas Gleixner wrote:
>
> > It does matter. When the interrupt is _not_ handled then it comes back
> > immediately for ever and after a while the kernel decides to disable
> > the legacy int, because nobody cares about the interrupt.
>
> Mental shortcut, sorry -- making the interrupt be discarded through the
> primary rather than the secondary, etc. I/O APIC does not change anything.
> Based on the description of the problem, the interupt will just have to be
> delivered somewhere, so I see little purpose in complicating the routing
> and causing additional sharing just to discard the interrupt elsewhere
> anyway. If INTx messages cannot be blocked on the way anywhere, then the
> originating I/O APIC should never get its inputs masked and the handler
> should take care of the unwanted interrupts there. This is at least my
> opinion.

There is no way to take care of an unwanted interrupt when there is no
handler which knows to deal with the device.

The problem case is mostly preempt-rt, where we receive the interrupt,
mask it and wake up the handler thread. We can not leave it unmasked
for obvious reasons.

> BTW, it could be possible to mask the interrupt by fiddling with the
> vector used and the TPR instead -- we have a range of low-priority vectors
> which are never used for APIC interrupts, so the TPR may be hardcoded to
> some non-zero value for all systems and then for the problematic chipsets
> handlers may change vectors to mask or unmask APIC interrupts. This would
> have to be verified on actual hardware and be conditional as it is likely
> to cause troubles for systems using serial interrupt delivery over the
> inter-APIC bus (the vector, delivery mode, etc. are generally not meant to
> be changed with the input unmasked for these chips -- which just shows how
> braindead the idea of the mask having side effects is). Just a thought.

I tried this already and it results in extremly strange and randomly
changing behaviour up to a full system lockup :(

Thanks,

tglx

2008-06-04 17:53:55

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Thomas Gleixner wrote:

> There is no way to take care of an unwanted interrupt when there is no
> handler which knows to deal with the device.

Of course.

> The problem case is mostly preempt-rt, where we receive the interrupt,
> mask it and wake up the handler thread. We can not leave it unmasked
> for obvious reasons.

Hmm, I can see what the problem is now -- you can mask the input of the
primary I/O APIC in this case (with no side effects), making other sources
at that line suffer, but at least you get away.

Anyway, my other questions remain open. How is the INTx message actually
delivered to the ICH? And why is the command line option needed?

Maciej

2008-06-04 18:36:43

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Maciej W. Rozycki wrote:
> On Wed, 4 Jun 2008, Thomas Gleixner wrote:
>
> > There is no way to take care of an unwanted interrupt when there is no
> > handler which knows to deal with the device.
>
> Of course.
>
> > The problem case is mostly preempt-rt, where we receive the interrupt,
> > mask it and wake up the handler thread. We can not leave it unmasked
> > for obvious reasons.
>
> Hmm, I can see what the problem is now -- you can mask the input of the
> primary I/O APIC in this case (with no side effects), making other sources
> at that line suffer, but at least you get away.
>
> Anyway, my other questions remain open. How is the INTx message actually
> delivered to the ICH?

I have roughly as much clue as you about that. The datashi^Heet is not
really helpful.

> And why is the command line option needed?

We don't want to impose that in the first place, but OTOH to get test
coverage it's probably better to do it unconditionally.

Thanks,

tglx

2008-06-04 18:46:40

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting


On Wed, 2008-06-04 at 11:49 +0200, Stefan Assmann wrote:

> On the chips (ICHx, ...) we saw, the interrupt lines on the PIC also go
> to the first IO-APIC. So the boot interrupts go to both the PIC and the
> first IO-APIC.

Yup, it's system dependent too, but it's a mess in many cases. I've seen
numerous systems falling over - typically, the classical case will be
some IO controller when under heavy load will stop doing interrupts.
Another gotcha is that a lot of the time, these legacy interrupt (boot
interrupts) happen to be shared between e.g. a disk controller and a USB
host controller. On a server system running -RT, it's quite common that
there won't be much going on with the other device, so the problem goes
unnoticed for a long time...but it still bites you in the end.

> When running in APIC mode all PIC IRQs are disabled, except for the
> timer maybe. Boot interrupts still arrive on the first IO-APIC and end
> up as being counted as spurious interrupts.

Yup. I think the good thing is that both SuSE and Red Hat have
identified the same problems, posited the same root cause, and
contemplated similar workarounds...so I think we understand it.

Jon.

2008-06-04 18:52:17

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Thomas Gleixner wrote:

> > Anyway, my other questions remain open. How is the INTx message actually
> > delivered to the ICH?
>
> I have roughly as much clue as you about that. The datashi^Heet is not
> really helpful.

I haven't been following Intel's datasheets recently. They usually do
publish all the necessary bits, but they are scattered around, so you
really have to read them all. Chances are it's mentioned somewhere.
Besides, I have no access to the PCI Express spec and there may be further
interesting bits there. I suppose Intel can be queried directly too,
especially by the interested corporate parties.

> > And why is the command line option needed?
>
> We don't want to impose that in the first place, but OTOH to get test
> coverage it's probably better to do it unconditionally.

I gather with the workarounds disabled affected systems are useless
anyway, so what's the point?

Maciej

2008-06-04 19:00:20

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting


On Wed, 2008-06-04 at 13:33 +0200, Stefan Assmann wrote:
> Eric W. Biederman schrieb:
> > Stefan Assmann <[email protected]> writes:

> > What is the reason why you don't simply disable the ioapic vector of
> > the boot interrupt? Do some devices not use anything else?
>
> That's exactly the reason why disabling them is a bad idea, you end up
> having devices fixed to IO-APIC pins that boot interrupts are routed to.

Usually, it's a *really* ugly combo of devices too. The symptom to this
problem was a large number of test systems that would fall over under
heavy IO after some time. I spent many weeks at the end of last year
trawling through the qla2xxx, megaraid/megasas, and other drivers
looking for the "problem", we even shipped some systems to some of these
hardware folks so they could hook up bus analyzers to see if the
interrupts really were being generated, and a lot more fun.

In the end, because I didn't have the level of information about the
unfortunate implementation of some of these chipsets, I reworked a patch
from Thomas that handles mask/unmask vs. edge->level->edge transition
depending upon PCI quirks for known at-issue bridges. I've been hoping
someone would dig up enough information to actually disable these
horrible "features" so we can get that upstream and into RT.

Meanwhile, I think this really shows that a couple of us who've been
meaning to poke at per-device IRQs should do so once again. This is the
real solution - kill off all those per-IRQ threads, have one per device
and split each IRQ into a "top" and "bottom" (just kidding on the names)
half. Have the first part be *incredibly* small and just return if the
device actually was responsible for the IRQ (and quiesse it), and have
the second part do the heavyweight handling. Since the second part can
run as a thread (and actually sleep for however long, because it's now
per device) - and since we have threaded tasklets and all this on RT
anyway - we can just kill off most of the extra logic in many Linux
interrupt handlers and have just the interrupt thread do the work.

Anyway, that's the bluesky idea (and rejected OLS 2008 paper :P)...

Jon.

2008-06-04 19:21:29

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Jon Masters wrote:

> Meanwhile, I think this really shows that a couple of us who've been
> meaning to poke at per-device IRQs should do so once again. This is the
> real solution - kill off all those per-IRQ threads, have one per device
> and split each IRQ into a "top" and "bottom" (just kidding on the names)
> half. Have the first part be *incredibly* small and just return if the
> device actually was responsible for the IRQ (and quiesse it), and have
> the second part do the heavyweight handling. Since the second part can
> run as a thread (and actually sleep for however long, because it's now
> per device) - and since we have threaded tasklets and all this on RT
> anyway - we can just kill off most of the extra logic in many Linux
> interrupt handlers and have just the interrupt thread do the work.

This is how a lot of our drivers used to work in the old days -- modulo
implementation details. It is still a good idea, but it requires more
care and effort to implement, so people tend to avoid it unless forced by
the design of hardware involved (cf. phylib in the interrupt mode).

Maciej

2008-06-04 20:04:19

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting


On Wed, 2008-06-04 at 20:19 +0100, Maciej W. Rozycki wrote:
> On Wed, 4 Jun 2008, Jon Masters wrote:
>
> > Meanwhile, I think this really shows that a couple of us who've been
> > meaning to poke at per-device IRQs should do so once again. This is the
> > real solution - kill off all those per-IRQ threads, have one per device
> > and split each IRQ into a "top" and "bottom" (just kidding on the names)
> > half. Have the first part be *incredibly* small and just return if the
> > device actually was responsible for the IRQ (and quiesse it), and have
> > the second part do the heavyweight handling. Since the second part can
> > run as a thread (and actually sleep for however long, because it's now
> > per device) - and since we have threaded tasklets and all this on RT
> > anyway - we can just kill off most of the extra logic in many Linux
> > interrupt handlers and have just the interrupt thread do the work.
>
> This is how a lot of our drivers used to work in the old days -- modulo
> implementation details. It is still a good idea, but it requires more
> care and effort to implement, so people tend to avoid it unless forced by
> the design of hardware involved (cf. phylib in the interrupt mode).

I disagree. I think it's now actually the *inverse*. It /used/ to be
harder, because you didn't have a context in which you could do many
things (so you need to schedule some kind of deferred work), but
actually, it'll become a lot more attractive with device threads.

The only real caveat is for performance critical cases (the reasons we
have special softirqs and the like right now) but there will always be
special cases. Still, I'd like it if writing a Linux interrupt handler
came down to registering two functions - one lightweight tiny one, and
one that's just a thread. Much less room for making mistakes.

Jon.

2008-06-04 22:08:49

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Jon Masters wrote:

> I disagree. I think it's now actually the *inverse*. It /used/ to be
> harder, because you didn't have a context in which you could do many
> things (so you need to schedule some kind of deferred work), but
> actually, it'll become a lot more attractive with device threads.

Well, I mean it's easier to do all the handling sequentially in the
hardirq context than split the thing and deal with all the communication,
locking, possible races, etc. so people avoid it unless really forced to.
In principle all the interrupt handlers could be split like this except
those really, really tiny ones or where latency is absolutely critical.
Yet it often does not happen.

> The only real caveat is for performance critical cases (the reasons we
> have special softirqs and the like right now) but there will always be
> special cases. Still, I'd like it if writing a Linux interrupt handler
> came down to registering two functions - one lightweight tiny one, and
> one that's just a thread. Much less room for making mistakes.

The two will have to pass some state between each other, might run
concurrently or in parallel, etc. and require some effort to be written
correctly.

Maciej

2008-06-04 22:32:29

by Jon Masters

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting


On Wed, 2008-06-04 at 23:07 +0100, Maciej W. Rozycki wrote:
> On Wed, 4 Jun 2008, Jon Masters wrote:
>
> > I disagree. I think it's now actually the *inverse*. It /used/ to be
> > harder, because you didn't have a context in which you could do many
> > things (so you need to schedule some kind of deferred work), but
> > actually, it'll become a lot more attractive with device threads.
>
> Well, I mean it's easier to do all the handling sequentially in the
> hardirq context than split the thing and deal with all the communication,
> locking, possible races, etc. so people avoid it unless really forced to.
> In principle all the interrupt handlers could be split like this except
> those really, really tiny ones or where latency is absolutely critical.
> Yet it often does not happen.

I'm really not proposing a return to top/bottom halves...

*). Top level handler is *tiny*. It's job is to get called (along with
every other such function registered for a particular IRQ line) and
determine if its device generated the interrupt, and to acknowledge,
preventing the device from asserting the IRQ line any longer.

The top part is called in hard IRQ context, even on RT.

*). Bottom level is automatically scheduled by the kernel in response to
the top part acknowledging that its device caused the interrupt.

The bottom part is run inside a dedicated kernel thread.

So pretty much everything is in the thread.

Jon.

2008-06-04 23:09:20

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH 0/7] Boot IRQ quirks and rerouting

On Wed, 4 Jun 2008, Jon Masters wrote:

> *). Top level handler is *tiny*. It's job is to get called (along with
> every other such function registered for a particular IRQ line) and
> determine if its device generated the interrupt, and to acknowledge,
> preventing the device from asserting the IRQ line any longer.
>
> The top part is called in hard IRQ context, even on RT.
>
> *). Bottom level is automatically scheduled by the kernel in response to
> the top part acknowledging that its device caused the interrupt.
>
> The bottom part is run inside a dedicated kernel thread.

I see what you mean -- well, with sane hardware that should be doable
with little effort indeed.

Maciej

2008-06-12 14:14:37

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 2/7] reroute PCI interrupt to legacy boot interrupt equivalent

Sorry for the delay. We tried to address everything before we send new
patches and replies, and there were other tasks to do as well.

On 03-Jun-08, Thomas Gleixner wrote:
> On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> > +
> > +#ifdef CONFIG_X86_IO_APIC
> > + /*
> > + * Some chipsets (e.g. intel 6700PXH) generate a legacy INTx when the
> > + * IRQ entry in the chipset's IO-APIC is masked (as, e.g. the RT kernel
> > + * does during interrupt handling). When this INTx generation cannot be
> > + * disabled, we reroute these interrupts to their legacy equivalent to
> > + * get rid of spurious interrupts.
> > + */
> > + if (reroute_to_boot_interrupts) {
> > + switch (bridge_has_boot_interrupt_variant(bus)) {
> > + case 0:
> > + /* no rerouting necessary */
> > + break;
> > +
> > + case VARIANT_INTEL_BUS_INT_REMAPPING:
> > + /*
> > + * Remap according to INTx routing table in 6700PXH
> > + * specs, intel order number 302628-002, section
> > + * 2.15.2. Other chipsets (80332, ...) have the same
> > + * mapping and are handled here as well.
> > + */
>
> They generate INTA,B,C,D messages. What makes sure, that those are
> always routed to IRQ 16,17,18,19 ?

We made sure by following the IRQ routing through the bridges and the
ICHs. For the MCHs in our test machines no special IRQ routing is
documented, and we tested that the IRQs end up on the ICH (the RIRR bit
is set on the ICH's IO-APIC IRQ line). The testing of the rerouting
patch showed that it works.

But the handling of IRQs in the MCHs is not completely clear (as pointed
out by Maciej W. Rozycki). I found out some interesting bits there now,
and a followup on this is in preparation.

We also need to document the other analysis and ideas that went into the
patch.

> IIRC we have seen those spurious interrupts on IRQ 9 and 12 as well.

Was that an AMD/ATI, Broadcomm or some other non-intel chip? (We can
switch off the boot IRQs for the AMD/ATI and Broadcomm chips we saw, so
no re-routing needed there.) I have looked at the intel ICH datasheets
from ICH[0] to ICH9 and all of them route PCI IRQs to IRQs 16 to 24.

> > + printk(KERN_INFO "pci irq %d -> rerouted to legacy "
> > + "irq %d\n", ret, (ret % 4) + 16);
> > + ret = (ret % 4) + 16;
> > + break;
> > +
> > + default:
> > + printk(KERN_INFO "not rerouting irq %d to legacy irq: "
> > + "unknown mapping\n", ret);
> > + break;
> > + }
> > + }
> > +#endif /* CONFIG_X86_IO_APIC */
> > +
> > return ret;
> > }
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 6245486..6173be5 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -1375,8 +1375,57 @@ int nobootirqquirk_setup(char *str)
> > return 1;
> > }
> > __setup("nobootirqquirk", nobootirqquirk_setup);
> > -#endif /* CONFIG_X86_IO_APIC */
> >
> > +/*
> > + * Boot interrupts on some chipsets cannot be turned off. For these chipsets,
> > + * remap the original interrupt in the linux kernel to the boot interrupt, so
> > + * that a PCI device's interrupt handler is installed on the boot interrupt
> > + * line instead.
> > + */
> > +int reroute_to_boot_interrupts;
> > +EXPORT_SYMBOL(reroute_to_boot_interrupts);
> > +
> > +struct quirk_bootirq_reroute_dev
> > + quirk_bootirq_reroute_devs[MAX_QUIRK_BOOTIRQ_REROUTE_DEVS];
> > +EXPORT_SYMBOL(quirk_bootirq_reroute_devs);
>
> Please use a consistent name space. This is about ioapic quirks so the
> whole stuff should use ioapic_quirk_XXX or something like this.
>
> Also why dont we put this information into the pci_dev structure
> instead of having these tables and exports ?

Yes. A variant of the patch using pci_dev was in the queue, and we use
this now.

(The reason we used the extra tables and exports was to avoid touching
pci_dev late in our release cycle. But I also believe the pci_dev
solution is cleaner for upstream. To give proper credit, Daniel Gollub
also suggested this to us.)

Regards,

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2008-06-12 14:14:54

by Olaf Dabrunz

[permalink] [raw]
Subject: Re: [PATCH 5/7] disable AMD/ATI legacy boot interrupt generation

On 03-Jun-08, Thomas Gleixner wrote:
> On Mon, 2 Jun 2008, Olaf Dabrunz wrote:
> >
> > /*
> > + * disable boot interrupts on AMD and ATI chipsets
> > + */
> > +#define PCI_X_MISC 0x40
> > +#define PCI_X_AMD813X_NIOAMODE (1<<0)
>
> new line before the function and consistent spacing
>
> > +static void quirk_disable_amd_813x_boot_interrupt(struct pci_dev *dev)
> > +{
> > + u32 pci_x_misc;
> > +
> > + if (nobootirqquirk)
> > + return;
> > +
> > + pci_read_config_dword(dev, PCI_X_MISC, &pci_x_misc);
> > + pci_x_misc &= ~PCI_X_AMD813X_NIOAMODE;
> > + pci_write_config_dword(dev, PCI_X_MISC, pci_x_misc);
> > +
> > + printk(KERN_INFO "disabled boot interrupts on PCI device "
> > + "0x%04x:0x%04x\n", dev->vendor, dev->device);
> > +}
> > +#undef PCI_X_MISC
> > +#undef PCI_X_AMD813X_NIOAMODE
>
> why #undef ?

To prevent spilling of constants to other code... Probably unneeded...

> > +
> > +#define PCI_AMD8111_PCI_IRQ_ROUTING 0x56
> > +static void quirk_disable_amd_8111_boot_interrupt(struct pci_dev *dev)
> > +{
> > + u16 pci_irq_routing;
> > +
> > + if (nobootirqquirk)
> > + return;
> > +
> > + pci_read_config_word(dev, PCI_AMD8111_PCI_IRQ_ROUTING,
> > + &pci_irq_routing);
> > + if (!pci_irq_routing) {
> > + printk(KERN_INFO "boot interrupts on PCI "
> > + "device 0x%04x:0x%04x were "
> > + "already disabled\n",
> > + dev->vendor, dev->device);
> > + return;
> > + }
> > + pci_write_config_word(dev, PCI_AMD8111_PCI_IRQ_ROUTING, 0);
> > + printk(KERN_INFO "disabled boot interrupts on PCI device "
> > + "0x%04x:0x%04x\n", dev->vendor, dev->device);
> > +}
> > +#undef PCI_AMD8111_PCI_IRQ_ROUTING
> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8111_SMBUS, quirk_disable_amd_8111_boot_interrupt);
> > +
> > +#define PCI_AMD_SB700S_FEATURES_ENABLE 0x64
> > +#define PCI_AMD_SB700S_PIC_ENABLE (1<<0)
> > +#define PIC_PCI_INTR_INDEX 0xC00
> > +#define PIC_PCI_INTR_DATA 0xC01
> > +#define CLEAR_PIC_IRQ_ROUTING(irq) \
> > + outb(irq, PIC_PCI_INTR_INDEX); \
> > + outb(0x00, PIC_PCI_INTR_DATA);
>
> Please use an inline function
>
> > +static void quirk_disable_amd_sb700s_boot_interrupt(struct pci_dev *dev)
> > +{
> > + u32 feature_enable;
> > + u32 saved_feature_enable;
> > +
> > + if (nobootirqquirk)
> > + return;
> > +
> > + pci_read_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
> > + &feature_enable);
> > + saved_feature_enable = feature_enable;
> > + feature_enable |= PCI_AMD_SB700S_PIC_ENABLE;
> > + pci_write_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
> > + feature_enable);
> > +
> > + CLEAR_PIC_IRQ_ROUTING(0x0);
> > + CLEAR_PIC_IRQ_ROUTING(0x1);
> > + CLEAR_PIC_IRQ_ROUTING(0x2);
> > + CLEAR_PIC_IRQ_ROUTING(0x3);
> > + CLEAR_PIC_IRQ_ROUTING(0x4);
> > + CLEAR_PIC_IRQ_ROUTING(0x9);
> > + CLEAR_PIC_IRQ_ROUTING(0xA);
> > + CLEAR_PIC_IRQ_ROUTING(0xB);
> > + CLEAR_PIC_IRQ_ROUTING(0xC);
>
> int i, irqs[] = {0x0,.....0x0c};
>
> for (i = 0; i < ARRAY_SIZE(irqs); i++) {
> outb(...);
> outb(...);
> }
>
> perhaps ?

Yep :)

> > +
> > + pci_write_config_dword(dev, PCI_AMD_SB700S_FEATURES_ENABLE,
> > + saved_feature_enable);
> > +
> > + printk(KERN_INFO "disabled boot interrupts on PCI device "
> > + "0x%04x:0x%04x\n", dev->vendor, dev->device);
> > +}
> > +#undef PCI_AMD_SB700S_FEATURES_ENABLE
> > +#undef PCI_AMD_SB700S_PIC_ENABLE
> > +#undef PIC_PCI_INTR_INDEX
> > +#undef PIC_PCI_INTR_DATA
> > +#undef CLEAR_PIC_IRQ_ROUTING
>
> grmbl

murmur... :)

> > +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_IXP600_SMBUS, quirk_disable_amd_sb700s_boot_interrupt);
> > +
> > +/*
> > * disabled boot interrupts on HT-1000
> > */
> > static void quirk_disable_broadcom_boot_interrupt(struct pci_dev *dev)
> > --
> > 1.5.2.4
> >
> > --
> > Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, N??rnberg
> >

--
Olaf Dabrunz (od/odabrunz), SUSE Linux Products GmbH, Nürnberg

2010-02-16 00:30:53

by Yuhong Bao

[permalink] [raw]
Subject: RE: [PATCH 0/7] Boot IRQ quirks and rerouting


> Well, this is clear these chipsets are broken beyond imagination,
> negating some 15 years of I/O APIC compatibility where masking an input in
> its redirection table is expected not to have any side effects. They
> should have used a separate bit for the legacy INTx redirection. That has
> been fixed in hardware though and there is nothing we can do about it at
> this stage. Hire competent hardware designers the next time.
I think part of it has to do with how chipset vendors integrated the first I/O APIC into the southbridge around 1999-2001 (Intel was first to do so back in 1999, I think), which were necessary if there were any hope of putting them in uniprocessor systems. Unfortunately, the way it was integrated, the interrupt pins that goes outside the southbridge was shared with both the 8259 PICs (via a PIR) AND the first I/O APIC, and there was no way to do otherwise. While that was very convenient for single I/O APIC systems, on multiple I/O APIC systems it can cause boot interrupts to be delivered to the first I/O APIC as well as the PIC which could be very bad, as you have seen.

Yuhong Bao

_________________________________________________________________
Hotmail: Powerful Free email with security by Microsoft.
http://clk.atdmt.com/GBL/go/201469230/direct/01/-