2011-06-26 17:49:00

by Marc MERLIN

Subject: All kinds of irq 16: nobody cared with Sandy Bridge Asus P8H67-M MB and multiple drivers

[please Cc me on replies so that I can see them faster]

I've found various reports on the net, but they usually involve a single
driver or card that's not doing the right thing.

In my case, it happens with multiple cards and drivers, so I'm wondering
whether it could be a motherboard bug, and whether there are Linux kernel
options, other than irqpoll (which does not help), that could shed some
light on this.

I just bought a new Sandy Bridge board:
Manufacturer: ASUSTeK Computer INC.
Product Name: P8H67-M PRO
Vendor: American Megatrends Inc.
Version: 1003
Release Date: 05/10/2011

I suppose the onboard pata could be problematic, but if so why would it only
fail when some special combination of cards share its irq?
03:00.0 IDE interface: VIA Technologies, Inc. Unknown device 0415
pata_via 0000:03:00.0: version 0.3.4
pata_via 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pata_via 0000:03:00.0: setting latency timer to 64

I started with 2.6.36 and now have the same problems with 2.6.39.1.
The MB has 2 PCIe slots, and 2 PCI slots.
3 slots get force routed on irq 16, and the last PCI slot is irq 19.
The onboard IDE is also irq 16.

I have tried 6 different PCI and PCIe cards:
- tg3 (gige)
- rtl8169 (gige)
- e100
- CMI8738-MC6
- sata_sil24 (sil 3132)
- sata_mv (marvell)

Pretty much every combination with more than one PCI card plugged into
a slot routed to irq 16 causes:

irq 16: nobody cared (try booting with the "irqpoll" option)
(...)
handlers:
[<c037171b>] (ata_bmdma_interrupt+0x0/0x170)
[<f8450f1f>] (sil24_interrupt+0x0/0x443 [sata_sil24])
[<f8449d7a>] (e100_intr+0x0/0xa2 [e100])
Disabling IRQ #16

irqpoll makes no difference.

Moving the cards around helps if I only use 2 cards, one in the slot
with irq19 (currently sound card) and one in the slot with irq16.

Putting more than one card on irq 16 (not counting onboard pata) pretty
much always causes the dreaded message.

However, I once also got a very puzzling error with an e100 in the
separate irq 19 slot:
kernel: irq 19: nobody cared (try booting with the "irqpoll" option)
kernel: Pid: 0, comm: swapper Tainted: G W 2.6.36.0-core2smp-volpreempt-noide-hm64-20100724 #1
kernel: Call Trace:
kernel: [<c016c7b2>] __report_bad_irq+0x2e/0x6f
kernel: [<c016c8e6>] note_interrupt+0xf3/0x149
kernel: [<c016b869>] ? handle_IRQ_event+0x1d/0x9c
kernel: [<c016cf10>] handle_fasteoi_irq+0x84/0xa2
kernel: [<c0104683>] handle_irq+0x3b/0x48
kernel: [<c0103ebc>] do_IRQ+0x41/0x9a
kernel: [<c0102e30>] common_interrupt+0x30/0x38
kernel: [<f8776ed2>] ? acpi_idle_enter_bm+0x245/0x281 [processor]
kernel: [<c03688d6>] cpuidle_idle_call+0x77/0xa9
kernel: [<c0101a6b>] cpu_idle+0x8e/0xab
kernel: [<c03e945c>] rest_init+0x58/0x5a
kernel: [<c05cd8c5>] start_kernel+0x318/0x31d
kernel: [<c05cd0c9>] i386_start_kernel+0xc9/0xd0
kernel: handlers:
kernel: [<f85ece3a>] (e100_intr+0x0/0xa0 [e100])
kernel: Disabling IRQ #19
(This was just a test; I don't otherwise use that e100, which works fine elsewhere.)


The configuration below is stable, but adding anything else on irq 16,
apparently regardless of the card, causes issues:
CPU0 CPU1 CPU2 CPU3
0: 151 0 0 0 IO-APIC-edge timer
1: 8 0 0 0 IO-APIC-edge i8042
4: 1315 0 0 0 IO-APIC-edge serial
5: 0 0 0 0 IO-APIC-edge parport0
8: 1 0 0 0 IO-APIC-edge rtc0
9: 0 0 0 0 IO-APIC-fasteoi acpi
16: 1240641 0 0 0 IO-APIC-fasteoi pata_via, sata_sil24
17: 0 0 0 0 IO-APIC-fasteoi xhci_hcd:usb1
18: 11031 0 0 0 IO-APIC-fasteoi eth1
19: 312990 0 0 0 IO-APIC-fasteoi CMI8738-MC6
20: 0 0 0 0 IO-APIC-fasteoi ahci
22: 624 0 0 0 IO-APIC-fasteoi hda_intel
23: 3289687 0 0 0 IO-APIC-fasteoi ehci_hcd:usb2, ehci_hcd:usb3
NMI: 143 41 64 47 Non-maskable interrupts
LOC: 545170 213741 439101 194418 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 143 41 64 47 Performance monitoring interrupts
IWI: 0 0 0 0 IRQ work interrupts
RES: 3479 2084 18780 2195 Rescheduling interrupts
CAL: 874 1979 1417 1741 Function call interrupts
TLB: 691 472 1722 1781 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 4 4 4 4 Machine check polls
ERR: 0
MIS: 0
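
To pick the shared lines out of a table like the one above, something along
these lines might help. It is only a minimal sketch: the function name
shared_irqs is made up for illustration, and it assumes the 2.6-era
/proc/interrupts layout shown here, where the chip and flow type are a single
token such as IO-APIC-fasteoi (newer kernels format this column differently).

```python
import re

def shared_irqs(text):
    """Map each numbered IRQ that has more than one handler to its handler names.

    Assumes the layout shown above: a header row of CPU names, then per-CPU
    counts, a single chip/flow token (e.g. IO-APIC-fasteoi), and a
    comma-separated handler list.
    """
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        m = re.match(r"\s*(\d+):\s+(.*)", line)
        if not m:
            continue                      # NMI:, LOC:, ERR:, ... are not device IRQs
        fields = m.group(2).split()
        if len(fields) < ncpu + 2:
            continue                      # no handler registered on this line
        # everything after the counts and the chip token is the handler list
        handlers = " ".join(fields[ncpu + 1:]).split(", ")
        if len(handlers) > 1:
            shared[int(m.group(1))] = handlers
    return shared
```

It would be called as shared_irqs(open("/proc/interrupts").read()); on the
table above it should report irq 16 (pata_via, sata_sil24) and irq 23 (the two
ehci_hcd controllers).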

Below are various combinations of failures in case they help:

kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
kernel: Pid: 0, comm: swapper Tainted: G W 2.6.36.0-core2smp-volpreempt-noide-hm64-20100724 #1
kernel: Call Trace:
kernel: [<c016c7b2>] __report_bad_irq+0x2e/0x6f
kernel: [<c016c8e6>] note_interrupt+0xf3/0x149
kernel: [<c016b869>] ? handle_IRQ_event+0x1d/0x9c
kernel: [<c016cf10>] handle_fasteoi_irq+0x84/0xa2
kernel: [<c0104683>] handle_irq+0x3b/0x48
kernel: [<c0103ebc>] do_IRQ+0x41/0x9a
kernel: [<c0102e30>] common_interrupt+0x30/0x38
kernel: [<f8506ed2>] ? acpi_idle_enter_bm+0x245/0x281 [processor]
kernel: [<c03688d6>] cpuidle_idle_call+0x77/0xa9
kernel: [<c0101a6b>] cpu_idle+0x8e/0xab
kernel: [<c03e945c>] rest_init+0x58/0x5a
kernel: [<c05cd8c5>] start_kernel+0x318/0x31d
kernel: [<c05cd0c9>] i386_start_kernel+0xc9/0xd0
kernel: handlers:
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<f87a3989>] (mv_interrupt+0x0/0x765 [sata_mv])
kernel: [<c033d3b5>] (ata_bmdma_interrupt+0x0/0x16f)
kernel: [<f87d402a>] (snd_cmipci_interrupt+0x0/0xde [snd_cmipci])
kernel: Disabling IRQ #16


kernel: handlers:
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c033d3b5>] (ata_bmdma_interrupt+0x0/0x16f)
kernel: [<f8865989>] (mv_interrupt+0x0/0x765 [sata_mv])
kernel: [<f8b9102a>] (snd_cmipci_interrupt+0x0/0xde [snd_cmipci])
kernel: Disabling IRQ #16

kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
kernel: handlers:
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<f84fcde6>] (sil24_interrupt+0x0/0x432 [sata_sil24])
kernel: [<c033d3b5>] (ata_bmdma_interrupt+0x0/0x16f)
kernel: [<f8484e3a>] (e100_intr+0x0/0xa0 [e100])
kernel: Disabling IRQ #16

kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
kernel: Pid: 0, comm: swapper Not tainted 2.6.36.0-core2smp-volpreempt-noide-hm64-20100724 #1
kernel: Call Trace:
kernel: [<c016c7b2>] __report_bad_irq+0x2e/0x6f
kernel: [<c016c8e6>] note_interrupt+0xf3/0x149
kernel: [<c016b869>] ? handle_IRQ_event+0x1d/0x9c
kernel: [<c016cf10>] handle_fasteoi_irq+0x84/0xa2
kernel: [<c0104683>] handle_irq+0x3b/0x48
kernel: [<c0103ebc>] do_IRQ+0x41/0x9a
kernel: [<c0102e30>] common_interrupt+0x30/0x38
kernel: [<c03685f9>] ? poll_idle+0x22/0x60
kernel: [<c03688d6>] cpuidle_idle_call+0x77/0xa9
kernel: [<c0101a6b>] cpu_idle+0x8e/0xab
kernel: [<c03e945c>] rest_init+0x58/0x5a
kernel: [<c05cd8c5>] start_kernel+0x318/0x31d
kernel: [<c05cd0c9>] i386_start_kernel+0xc9/0xd0
kernel: handlers:
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c033d3b5>] (ata_bmdma_interrupt+0x0/0x16f)
kernel: [<f8865989>] (mv_interrupt+0x0/0x765 [sata_mv])
kernel: Disabling IRQ #16

kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c033d3b5>] (ata_bmdma_interrupt+0x0/0x16f)
kernel: [<f8865989>] (mv_interrupt+0x0/0x765 [sata_mv])
kernel: [<f8b9102a>] (snd_cmipci_interrupt+0x0/0xde [snd_cmipci])
kernel: Disabling IRQ #16

kernel: irq 19: nobody cared (try booting with the "irqpoll" option)
kernel: Pid: 0, comm: swapper Tainted: G W 2.6.36.0-core2smp-volpreempt-noide-hm64-20100724 #1
kernel: Call Trace:
kernel: [<c016c7b2>] __report_bad_irq+0x2e/0x6f
kernel: [<c016c8e6>] note_interrupt+0xf3/0x149
kernel: [<c016b869>] ? handle_IRQ_event+0x1d/0x9c
kernel: [<c016cf10>] handle_fasteoi_irq+0x84/0xa2
kernel: [<c0104683>] handle_irq+0x3b/0x48
kernel: [<c0103ebc>] do_IRQ+0x41/0x9a
kernel: [<c0102e30>] common_interrupt+0x30/0x38
kernel: [<f8776ed2>] ? acpi_idle_enter_bm+0x245/0x281 [processor]
kernel: [<c03688d6>] cpuidle_idle_call+0x77/0xa9
kernel: [<c0101a6b>] cpu_idle+0x8e/0xab
kernel: [<c03e945c>] rest_init+0x58/0x5a
kernel: [<c05cd8c5>] start_kernel+0x318/0x31d
kernel: [<c05cd0c9>] i386_start_kernel+0xc9/0xd0
kernel: handlers:
kernel: [<f85ece3a>] (e100_intr+0x0/0xa0 [e100])
kernel: Disabling IRQ #19

kernel: handlers:
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<c0284373>] (pcie_pme_irq+0x0/0x6f)
kernel: [<f84fcde6>] (sil24_interrupt+0x0/0x432 [sata_sil24])
kernel: [<c033d3b5>] (ata_bmdma_interrupt+0x0/0x16f)
kernel: [<f8484e3a>] (e100_intr+0x0/0xa0 [e100])
kernel: Disabling IRQ #16

kernel: handlers:
kernel: [<c037171b>] (ata_bmdma_interrupt+0x0/0x170)
kernel: [<f8450f1f>] (sil24_interrupt+0x0/0x443 [sata_sil24])
kernel: [<f8449d7a>] (e100_intr+0x0/0xa2 [e100])
kernel: Disabling IRQ #16

kernel: handlers:
kernel: [<c037171b>] (ata_bmdma_interrupt+0x0/0x170)
kernel: [<f845df1f>] (sil24_interrupt+0x0/0x443 [sata_sil24])
kernel: [<f8d9efbb>] (tg3_interrupt_tagged+0x0/0xa2 [tg3])
kernel: Disabling IRQ #16

kernel: handlers:
kernel: [<c037171b>] (ata_bmdma_interrupt+0x0/0x170)
kernel: [<f85f09c1>] (mv_interrupt+0x0/0x746 [sata_mv])
kernel: [<f87046c1>] (rtl8169_interrupt+0x0/0x2b5 [r8169])
kernel: Disabling IRQ #16
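
To compare the handler lists across reports like those above, a small tally
could be useful. This is a rough sketch, not a tested tool: handler_counts is
a hypothetical helper name, and the regular expressions are written only
against the message format seen in this mail ("handlers:", then
"[<addr>] (symbol+0x.../0x...)" lines, then "Disabling IRQ #N").

```python
import re
from collections import Counter

# "[<c037171b>] (ata_bmdma_interrupt+0x0/0x170)" -> captures the symbol name.
# Call-trace lines have no parentheses around the symbol, so they do not match.
HANDLER = re.compile(r"\[<[0-9a-f]+>\] \((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
DISABLE = re.compile(r"Disabling IRQ #(\d+)")

def handler_counts(log):
    """Return {irq: Counter(handler -> number of reports it was listed in)}."""
    counts = {}
    current = None                        # handlers seen since the last "handlers:"
    for line in log.splitlines():
        if "handlers:" in line:
            current = []
        if current is not None:
            m = HANDLER.search(line)      # also catches a handler on the "handlers:" line
            if m:
                current.append(m.group(1))
        d = DISABLE.search(line)
        if d and current is not None:
            counts.setdefault(int(d.group(1)), Counter()).update(current)
            current = None
    return counts
```

Fed the kernel log, this gives one Counter per disabled IRQ. A handler that
shows up in every report is not necessarily the one at fault, since everything
registered on a shared line is listed whenever that line is disabled.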

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/


2011-06-29 14:12:20

by Bill Davidsen

Subject: Re: All kinds of irq 16: nobody cared with Sandy Bridge Asus P8H67-M MB and multiple drivers

Marc MERLIN wrote:
> [please Cc me on replies so that I can see them faster]
>
> I've found various reports on the net, but they usually involve a single
> driver or card that's not doing the right thing.
>
> In my case, it happens with multiple cards and drivers, so I'm wondering
> whether it could be a motherboard bug, and whether there are Linux kernel
> options, other than irqpoll (which does not help), that could shed some
> light on this.
>
> I just bought a new Sandy Bridge board:
> Manufacturer: ASUSTeK Computer INC.
> Product Name: P8H67-M PRO
> Vendor: American Megatrends Inc.
> Version: 1003
> Release Date: 05/10/2011
>
> I suppose the onboard pata could be problematic, but if so why would it only
> fail when some special combination of cards share its irq?
> 03:00.0 IDE interface: VIA Technologies, Inc. Unknown device 0415
> pata_via 0000:03:00.0: version 0.3.4
> pata_via 0000:03:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
> pata_via 0000:03:00.0: setting latency timer to 64
>
> I started with 2.6.36 and now have the same problems with 2.6.39.1.
> The MB has 2 PCIe slots, and 2 PCI slots.
> 3 slots get force routed on irq 16, and the last PCI slot is irq 19.
> The onboard IDE is also irq 16.
>
> I have tried 6 different PCI and PCIe cards:
> - tg3 (gige)
> - rtl8169 (gige)
> - e100
> - CMI8738-MC6
> - sata_sil24 (sil 3132)
> - sata_mv (marvell)
>
> Pretty much every combination with more than one PCI card plugged into
> a slot routed to irq 16 causes:
>
> irq 16: nobody cared (try booting with the "irqpoll" option)
> (...)
> handlers:
> [<c037171b>] (ata_bmdma_interrupt+0x0/0x170)
> [<f8450f1f>] (sil24_interrupt+0x0/0x443 [sata_sil24])
> [<f8449d7a>] (e100_intr+0x0/0xa2 [e100])
> Disabling IRQ #16
>
> irqpoll makes no difference.
>
> Moving the cards around helps if I only use 2 cards, one in the slot
> with irq19 (currently sound card) and one in the slot with irq16.
>
> Putting more than one card on irq 16 (not counting onboard pata) pretty
> much always causes the dreaded message.
>
> However, I once also got a very puzzling error with an e100 in the
> separate irq 19 slot:
> kernel: irq 19: nobody cared (try booting with the "irqpoll" option)
> kernel: Pid: 0, comm: swapper Tainted: G W 2.6.36.0-core2smp-volpreempt-noide-hm64-20100724 #1

Does this happen with an untainted kernel?

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2011-06-29 15:18:12

by Marc MERLIN

Subject: Re: All kinds of irq 16: nobody cared with Sandy Bridge Asus P8H67-M MB and multiple drivers

I'll put the summary at the top:
Upgrading the MB's BIOS seems to have mostly fixed my problem.

On Wed, Jun 29, 2011 at 09:07:58AM -0400, Bill Davidsen wrote:
> Does this happen with an untainted kernel?

I'm confused as to why the kernel is tainted.

I have no binary modules and with 2.6.39.1 compiled from kernel.org source
with no patches, I still got:
Jun 25 20:05:41 gargamel kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Jun 25 20:05:41 gargamel kernel: Pid: 0, comm: swapper Tainted: G W 2.6.39.1-core2-volpreempt-noide-hm64-20110620 #2

Just to be clear, I am not running vmware.

I checked a recent boot, and saw it went from not tainted to tainted
after an apparent minor problem in the sata_mv driver (from kernel.org).
Would that make sense?

Jun 28 21:59:30 gargamel kernel: Modules linked in: keyspan ati_remote visor pl2303 ftdi_sio pci_hotplug soundcore backlight option processor usb_wwan rtc_cmos thermal_sys rtc_core hwmon parport_pc button snd_page_alloc parport usbserial wmi i2c_i801 evdev rtc_lib sata_mv pcspkr intel_agp intel_gtt agpgart tpm_tis r8169 iTCO_wdt iTCO_vendor_support xhci_hcd ehci_hcd usbcore
Jun 28 21:59:30 gargamel kernel: Pid: 1636, comm: scsi_eh_9 Not tainted 2.6.39.1-core2-volpreempt-noide-hm64-20110620 #2
(...)
Jun 28 21:59:30 gargamel kernel: Modules linked in: keyspan ati_remote visor pl2303 ftdi_sio pci_hotplug soundcore backlight option processor usb_wwan rtc_cmos thermal_sys rtc_core hwmon parport_pc button snd_page_alloc parport usbserial wmi i2c_i801 evdev rtc_lib sata_mv pcspkr intel_agp intel_gtt agpgart tpm_tis r8169 iTCO_wdt iTCO_vendor_support xhci_hcd ehci_hcd usbcore
Jun 28 21:59:30 gargamel kernel: Pid: 1636, comm: scsi_eh_9 Tainted: G W 2.6.39.1-core2-volpreempt-noide-hm64-20110620 #2
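
For reference, the letters after "Tainted:" correspond to bits in
/proc/sys/kernel/tainted. As far as I understand, G just means no proprietary
module is loaded, and W is set once the kernel has issued a warning, which
would fit the flag appearing right after the sata_mv message. Below is a
minimal sketch for decoding the value; decode_taint is a made-up helper name
and the table only covers the lower ten bits, so later flags are ignored.

```python
# Taint bits 0..9 and their letters; higher bits are deliberately left out
# of this sketch and are silently ignored by decode_taint().
TAINT_FLAGS = [
    ("P", "proprietary module was loaded"),
    ("F", "module was force-loaded"),
    ("S", "out-of-spec or SMP-unsafe hardware"),
    ("R", "module was force-unloaded"),
    ("M", "machine check exception occurred"),
    ("B", "bad page reference or unexpected page flags"),
    ("U", "taint requested by userspace"),
    ("D", "kernel oopsed or hit a BUG"),
    ("A", "ACPI table overridden"),
    ("W", "kernel issued a warning (WARN)"),
]

def decode_taint(value):
    """Return (letter, description) for every known taint bit set in value."""
    return [flag for bit, flag in enumerate(TAINT_FLAGS) if value & (1 << bit)]
```

Typical use would be decode_taint(int(open("/proc/sys/kernel/tainted").read())).
A value with only the warning bit set (bit 9) decodes to the single W entry.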

But back to the original question, it looks like a bios upgrade of the
motherboard mostly fixed the problem. The system boots and works fine with
multiple drivers using irq16 (using the same cards that were failing when
used together days ago).

What's interesting though is that at shutdown time, I do see this on the
console:
Stopping domain name service...: bind9 waiting for pid 3967 to die.
Stopping kernel log daemon....
Stopping system log daemon....
Asking all remaining processes to terminate...done.
irq 19: nobody cared (try booting with the "irqpoll" option)
handlers: [<f8ab1213>] (snd_cmipci_interrupt+0x0/0xe5 [snd_cmipci])
Disabling IRQ #19
All processes ended within 4 seconds....done.

And I only have one card using irq19:
19: 0 0 0 0 IO-APIC-fasteoi CMI8738-MC6

This does not happen while I use the card when the system is running.

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

2011-06-29 15:02:08

by Bill Davidsen

Subject: Re: All kinds of irq 16: nobody cared with Sandy Bridge Asus P8H67-M MB and multiple drivers

Marc MERLIN wrote:
> I'll put the summary at the top:
> Upgrading the MB's BIOS seems to have mostly fixed my problem.
>
> On Wed, Jun 29, 2011 at 09:07:58AM -0400, Bill Davidsen wrote:
>
>> Does this happen with an untainted kernel?
>>
>
> I'm confused as to why the kernel is tainted.
>
> I have no binary modules and with 2.6.39.1 compiled from kernel.org source
> with no patches, I still got:
> Jun 25 20:05:41 gargamel kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
> Jun 25 20:05:41 gargamel kernel: Pid: 0, comm: swapper Tainted: G W 2.6.39.1-core2-volpreempt-noide-hm64-20110620 #2
>
> Just to be clear, I am not running vmware.
>
> I checked a recent boot, and saw it went from not tainted to tainted
> after an apparent minor problem in the sata_mv driver (from kernel.org).
> Would that make sense?
>
> Jun 28 21:59:30 gargamel kernel: Modules linked in: keyspan ati_remote visor pl2303 ftdi_sio pci_hotplug soundcore backlight option processor usb_wwan rtc_cmos thermal_sys rtc_core hwmon parport_pc button snd_page_alloc parport usbserial wmi i2c_i801 evdev rtc_lib sata_mv pcspkr intel_agp intel_gtt agpgart tpm_tis r8169 iTCO_wdt iTCO_vendor_support xhci_hcd ehci_hcd usbcore
> Jun 28 21:59:30 gargamel kernel: Pid: 1636, comm: scsi_eh_9 Not tainted 2.6.39.1-core2-volpreempt-noide-hm64-20110620 #2
> (...)
> Jun 28 21:59:30 gargamel kernel: Modules linked in: keyspan ati_remote visor pl2303 ftdi_sio pci_hotplug soundcore backlight option processor usb_wwan rtc_cmos thermal_sys rtc_core hwmon parport_pc button snd_page_alloc parport usbserial wmi i2c_i801 evdev rtc_lib sata_mv pcspkr intel_agp intel_gtt agpgart tpm_tis r8169 iTCO_wdt iTCO_vendor_support xhci_hcd ehci_hcd usbcore
> Jun 28 21:59:30 gargamel kernel: Pid: 1636, comm: scsi_eh_9 Tainted: G W 2.6.39.1-core2-volpreempt-noide-hm64-20110620 #2
>
>
Sounds as if some module isn't properly labeled. Do you load any custom
firmware or something like that?
Most puzzling.

> But back to the original question, it looks like a bios upgrade of the
> motherboard mostly fixed the problem. The system boots and works fine with
> multiple drivers using irq16 (using the same cards that were failing when
> used together days ago).
>
>
Glad the BIOS upgrade fixed it. The problem below is probably caused by
the driver for that IRQ going away while an interrupt is still pending.
Again, "or something like that." ;-)

> What's interesting though is that at shutdown time, I do see this on the
> console:
> Stopping domain name service...: bind9 waiting for pid 3967 to die.
> Stopping kernel log daemon....
> Stopping system log daemon....
> Asking all remaining processes to terminate...done.
> irq 19: nobody cared (try booting with the "irqpoll" option)
> handlers: [<f8ab1213>] (snd_cmipci_interrupt+0x0/0xe5 [snd_cmipci])
> Disabling IRQ #19
> All processes ended within 4 seconds....done.
>
> And I only have one card using irq19:
> 19: 0 0 0 0 IO-APIC-fasteoi CMI8738-MC6
>
> This does not happen while I use the card when the system is running.
>
> Marc
>


--
Bill Davidsen<[email protected]>
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010