LinuxLists.cc - 2.6.18 - another DWARF2

2006-10-17 06:40:08

Subject: 2.6.18 - another DWARF2

just got the following on resume:

[87026.242000] usb 4-2: USB disconnect, address 18
[87026.448000] usb 4-2: new full speed USB device using uhci_hcd and address
20
[87026.613000] usb 4-2: configuration #1 chosen from 1 choice
[87026.704000] IRQ handler type mismatch for IRQ 90
[87026.704000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
[87026.704000] [<c01044cb>] show_trace+0xf/0x11
[87026.705000] [<c01045ce>] dump_stack+0x15/0x17
[87026.705000] [<c01403d7>] setup_irq+0x17d/0x190
[87026.705000] [<c0140466>] request_irq+0x7c/0x98
[87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
[87026.707000] [<c029565c>] dev_open+0x2b/0x62
[87026.708000] [<c0294181>] dev_change_flags+0x47/0xe4
[87026.709000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
[87026.711000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
[87026.712000] [<c016b6af>] do_ioctl+0x1f/0x62
[87026.712000] [<c016b937>] vfs_ioctl+0x245/0x257
[87026.713000] [<c016b995>] sys_ioctl+0x4c/0x67
[87026.714000] [<c0102db3>] syscall_call+0x7/0xb
[87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[87026.715000] Leftover inexact backtrace:
[87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
Error: -16
[87027.026000] usb 4-1: new full speed USB device using uhci_hcd and address
21
[87027.055000] ata1: waiting for device to spin up (7 secs)
[87027.186000] usb 4-1: configuration #1 chosen from 1 choice
[87027.757000] IRQ handler type mismatch for IRQ 98
[87027.757000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
[87027.757000] [<c01044cb>] show_trace+0xf/0x11
[87027.758000] [<c01045ce>] dump_stack+0x15/0x17
[87027.758000] [<c01403d7>] setup_irq+0x17d/0x190
[87027.758000] [<c0140466>] request_irq+0x7c/0x98
[87027.759000] [<c0251745>] e1000_open+0xcd/0x1a4
[87027.760000] [<c029565c>] dev_open+0x2b/0x62
[87027.761000] [<c0294181>] dev_change_flags+0x47/0xe4
[87027.762000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
[87027.764000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
[87027.765000] [<c016b6af>] do_ioctl+0x1f/0x62
[87027.766000] [<c016b937>] vfs_ioctl+0x245/0x257
[87027.766000] [<c016b995>] sys_ioctl+0x4c/0x67
[87027.767000] [<c0102db3>] syscall_call+0x7/0xb
[87027.767000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[87027.767000] Leftover inexact backtrace:
[87027.767000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
Error: -16
[87028.810000] IRQ handler type mismatch for IRQ 106
[87028.810000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
[87028.810000] [<c01044cb>] show_trace+0xf/0x11
[87028.811000] [<c01045ce>] dump_stack+0x15/0x17
[87028.811000] [<c01403d7>] setup_irq+0x17d/0x190
[87028.811000] [<c0140466>] request_irq+0x7c/0x98
[87028.812000] [<c0251745>] e1000_open+0xcd/0x1a4
[87028.813000] [<c029565c>] dev_open+0x2b/0x62
[87028.814000] [<c0294181>] dev_change_flags+0x47/0xe4
[87028.815000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
[87028.817000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
[87028.818000] [<c016b6af>] do_ioctl+0x1f/0x62
[87028.818000] [<c016b937>] vfs_ioctl+0x245/0x257
[87028.819000] [<c016b995>] sys_ioctl+0x4c/0x67
[87028.820000] [<c0102db3>] syscall_call+0x7/0xb
[87028.820000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[87028.820000] Leftover inexact backtrace:
[87028.820000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
Error: -16
[87029.863000] IRQ handler type mismatch for IRQ 114
[87029.863000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
[87029.863000] [<c01044cb>] show_trace+0xf/0x11
[87029.864000] [<c01045ce>] dump_stack+0x15/0x17
[87029.864000] [<c01403d7>] setup_irq+0x17d/0x190
[87029.864000] [<c0140466>] request_irq+0x7c/0x98
[87029.865000] [<c0251745>] e1000_open+0xcd/0x1a4
[87029.866000] [<c029565c>] dev_open+0x2b/0x62
[87029.867000] [<c0294181>] dev_change_flags+0x47/0xe4
[87029.868000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
[87029.870000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
[87029.871000] [<c016b6af>] do_ioctl+0x1f/0x62
[87029.871000] [<c016b937>] vfs_ioctl+0x245/0x257
[87029.872000] [<c016b995>] sys_ioctl+0x4c/0x67
[87029.873000] [<c0102db3>] syscall_call+0x7/0xb
[87029.873000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[87029.873000] Leftover inexact backtrace:
[87029.873000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
Error: -16
[87030.917000] IRQ handler type mismatch for IRQ 122
[87030.917000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
[87030.917000] [<c01044cb>] show_trace+0xf/0x11
[87030.918000] [<c01045ce>] dump_stack+0x15/0x17
[87030.918000] [<c01403d7>] setup_irq+0x17d/0x190
[87030.918000] [<c0140466>] request_irq+0x7c/0x98
[87030.919000] [<c0251745>] e1000_open+0xcd/0x1a4
[87030.920000] [<c029565c>] dev_open+0x2b/0x62
[87030.921000] [<c0294181>] dev_change_flags+0x47/0xe4
[87030.922000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
[87030.924000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
[87030.925000] [<c016b6af>] do_ioctl+0x1f/0x62
[87030.925000] [<c016b937>] vfs_ioctl+0x245/0x257
[87030.926000] [<c016b995>] sys_ioctl+0x4c/0x67
[87030.927000] [<c0102db3>] syscall_call+0x7/0xb
[87030.927000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[87030.927000] Leftover inexact backtrace:
[87030.927000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
Error: -16
[87031.970000] IRQ handler type mismatch for IRQ 130
[87031.970000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
[87031.970000] [<c01044cb>] show_trace+0xf/0x11
[87031.971000] [<c01045ce>] dump_stack+0x15/0x17
[87031.971000] [<c01403d7>] setup_irq+0x17d/0x190
[87031.971000] [<c0140466>] request_irq+0x7c/0x98
[87031.972000] [<c0251745>] e1000_open+0xcd/0x1a4
[87031.973000] [<c029565c>] dev_open+0x2b/0x62
[87031.974000] [<c0294181>] dev_change_flags+0x47/0xe4
[87031.975000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
[87031.977000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
[87031.978000] [<c016b6af>] do_ioctl+0x1f/0x62
[87031.978000] [<c016b937>] vfs_ioctl+0x245/0x257
[87031.979000] [<c016b995>] sys_ioctl+0x4c/0x67
[87031.980000] [<c0102db3>] syscall_call+0x7/0xb
[87031.980000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[87031.980000] Leftover inexact backtrace:
[87031.980000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
Error: -16
[87034.525000] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

full dmesg is at
http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.boot
(boot time) and
http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.out
(runtime)

gruss
mlo
--
Dipl.-Ing. Martin Lorenz

They that can give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.
Benjamin Franklin

please encrypt your mail to me
GnuPG key-ID: F1AAD37D
get it here:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D

ICQ UIN: 33588107

2006-10-17 18:52:19

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: 2.6.18 - another DWARF2

On 10/16/06, Martin Lorenz <[email protected]> wrote:
> just got the following on resume:
>
> [87026.242000] usb 4-2: USB disconnect, address 18
> [87026.448000] usb 4-2: new full speed USB device using uhci_hcd and address
> 20
> [87026.613000] usb 4-2: configuration #1 chosen from 1 choice
> [87026.704000] IRQ handler type mismatch for IRQ 90
> [87026.704000] [<c0103d7c>] show_trace_log_lvl+0x5b/0x16d
> [87026.704000] [<c01044cb>] show_trace+0xf/0x11
> [87026.705000] [<c01045ce>] dump_stack+0x15/0x17
> [87026.705000] [<c01403d7>] setup_irq+0x17d/0x190
> [87026.705000] [<c0140466>] request_irq+0x7c/0x98
> [87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
> [87026.707000] [<c029565c>] dev_open+0x2b/0x62
> [87026.708000] [<c0294181>] dev_change_flags+0x47/0xe4
> [87026.709000] [<c02c7cd1>] devinet_ioctl+0x252/0x556
> [87026.711000] [<c028b5a3>] sock_ioctl+0x19a/0x1be
> [87026.712000] [<c016b6af>] do_ioctl+0x1f/0x62
> [87026.712000] [<c016b937>] vfs_ioctl+0x245/0x257
> [87026.713000] [<c016b995>] sys_ioctl+0x4c/0x67
> [87026.714000] [<c0102db3>] syscall_call+0x7/0xb
> [87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
> [87026.715000] Leftover inexact backtrace:
> [87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
> Error: -16

I'm pretty sure this isn't an e1000 problem. you need to talk to
whoever is maintaining the IRQ subsystem for x86. E1000 is attempting
to register a shared interrupt and someone has already registered that
interrupt unshared.
from your dmesg
[ 0.289935] Intel(R) PRO/1000 Network Driver - version 7.1.9-k4-NAPI
[ 0.290030] Copyright (c) 1999-2006 Intel Corporation.
[ 0.290196] ACPI: PCI Interrupt 0000:02:00.0[A] -> GSI 16 (level,
low) -> IRQ 201
[ 0.291176] PCI: Setting latency timer of device 0000:02:00.0 to 64
[ 0.336827] e1000: 0000:02:00.0: e1000_probe: (PCI
Express:2.5Gb/s:Width x1) 00:16:d3:22:9b:82
[ 0.381316] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[ 0.381553] netconsole: not configured, aborting
[ 0.381759] libata version 2.00 loaded.
[ 0.381815] ahci 0000:00:1f.2: version 2.0
[ 0.381835] ACPI: PCI Interrupt 0000:00:1f.2[B] -> GSI 16 (level,
low) -> IRQ 201
and
[ 7.107822] USB Universal Host Controller Interface driver v3.0
[ 7.107988] ACPI: PCI Interrupt 0000:00:1d.0[A] -> GSI 16 (level,
low) -> IRQ 201

looks like several devices are sharing IRQ 201 (aka GSI 16) and ahci
or usb uhci_hcd is likely the problem, or the (acpi) power management
subsystem.

Hope this helps get the right people involved.

Jesse

2006-10-18 06:35:05

by Martin Lorenz

[permalink] [raw]

Subject: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Tue, Oct 17, 2006 at 11:52:16AM -0700, Jesse Brandeburg wrote:
> On 10/16/06, Martin Lorenz <[email protected]> wrote:
> >just got the following on resume:
> >
> >[87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
> >[87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
> >[87026.715000] Leftover inexact backtrace:
> >[87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
> >Error: -16
>
> I'm pretty sure this isn't an e1000 problem. you need to talk to
> whoever is maintaining the IRQ subsystem for x86. E1000 is attempting
> to register a shared interrupt and someone has already registered that
> interrupt unshared.

interestingly though it always involves e1000 when I see dumps like this.
I already reported more of those :-)
this one dosen't seem to do any harm to system stability. it occurs on every
suspend/resume and I can circumvent it by disabling msi

>
> looks like several devices are sharing IRQ 201 (aka GSI 16) and ahci
> or usb uhci_hcd is likely the problem, or the (acpi) power management
> subsystem.
>
> Hope this helps get the right people involved.

thank you

gruss
mlo
--
Dipl.-Ing. Martin Lorenz

They that can give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.
Benjamin Franklin

please encrypt your mail to me
GnuPG key-ID: F1AAD37D
get it here:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D

ICQ UIN: 33588107

2006-10-19 06:26:17

by Andrew Morton

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Wed, 18 Oct 2006 08:34:31 +0200
Martin Lorenz <[email protected]> wrote:

> On Tue, Oct 17, 2006 at 11:52:16AM -0700, Jesse Brandeburg wrote:
> > On 10/16/06, Martin Lorenz <[email protected]> wrote:
> > >just got the following on resume:
> > >
> > >[87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
> > >[87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
> > >[87026.715000] Leftover inexact backtrace:
> > >[87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
> > >Error: -16
> >
> > I'm pretty sure this isn't an e1000 problem. you need to talk to
> > whoever is maintaining the IRQ subsystem for x86. E1000 is attempting
> > to register a shared interrupt and someone has already registered that
> > interrupt unshared.
>
> interestingly though it always involves e1000 when I see dumps like this.
> I already reported more of those :-)
> this one dosen't seem to do any harm to system stability. it occurs on every
> suspend/resume and I can circumvent it by disabling msi
>
> >
> > looks like several devices are sharing IRQ 201 (aka GSI 16) and ahci
> > or usb uhci_hcd is likely the problem, or the (acpi) power management
> > subsystem.
> >
> > Hope this helps get the right people involved.
>
> thank you

Could we see the /proc/interrupts please, so we can find out where the
clash is happening?

2006-10-19 06:40:10

by Martin Lorenz

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Wed, Oct 18, 2006 at 11:26:03PM -0700, Andrew Morton wrote:
> On Wed, 18 Oct 2006 08:34:31 +0200
> Martin Lorenz <[email protected]> wrote:
>
> > On Tue, Oct 17, 2006 at 11:52:16AM -0700, Jesse Brandeburg wrote:
> > > On 10/16/06, Martin Lorenz <[email protected]> wrote:
> > > >just got the following on resume:
> > > >
> > > >[87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
> > > >[87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
> > > >[87026.715000] Leftover inexact backtrace:
> > > >[87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
> > > >Error: -16
> > >
> > > I'm pretty sure this isn't an e1000 problem. you need to talk to
> > > whoever is maintaining the IRQ subsystem for x86. E1000 is attempting
> > > to register a shared interrupt and someone has already registered that
> > > interrupt unshared.
> >
> > interestingly though it always involves e1000 when I see dumps like this.
> > I already reported more of those :-)
> > this one dosen't seem to do any harm to system stability. it occurs on every
> > suspend/resume and I can circumvent it by disabling msi
> >
> > >
> > > looks like several devices are sharing IRQ 201 (aka GSI 16) and ahci
> > > or usb uhci_hcd is likely the problem, or the (acpi) power management
> > > subsystem.
> > >
> > > Hope this helps get the right people involved.
> >
> > thank you
>
> Could we see the /proc/interrupts please, so we can find out where the
> clash is happening?

here you are

~# cat /proc/interrupts
CPU0 CPU1
0: 76521957 2009390 IO-APIC-edge timer
1: 39599 0 IO-APIC-edge i8042
8: 128 0 IO-APIC-edge rtc
9: 415044 0 IO-APIC-level acpi
12: 862451 0 IO-APIC-edge i8042
58: 68014 326850 PCI-MSI libata
66: 508910 17187 IO-APIC-level sdhci:slot0, uhci_hcd:usb3
74: 3156375 0 IO-APIC-level uhci_hcd:usb2, ohci1394, HDA Intel
82: 134828 0 IO-APIC-level uhci_hcd:usb4, ehci_hcd:usb5
90: 46548 0 PCI-MSI eth0
201: 100133 0 IO-APIC-level uhci_hcd:usb1, yenta, i915@pci:0000:00:02.0
NMI: 0 0
LOC: 78530935 78520308
ERR: 0
MIS: 0

gruss
mlo
--
Dipl.-Ing. Martin Lorenz

They that can give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.
Benjamin Franklin

please encrypt your mail to me
GnuPG key-ID: F1AAD37D
get it here:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D

ICQ UIN: 33588107

2006-10-19 07:01:29

by Andrew Morton

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Thu, 19 Oct 2006 08:39:21 +0200
Martin Lorenz <[email protected]> wrote:

> On Wed, Oct 18, 2006 at 11:26:03PM -0700, Andrew Morton wrote:
> > On Wed, 18 Oct 2006 08:34:31 +0200
> > Martin Lorenz <[email protected]> wrote:
> >
> > > On Tue, Oct 17, 2006 at 11:52:16AM -0700, Jesse Brandeburg wrote:
> > > > On 10/16/06, Martin Lorenz <[email protected]> wrote:
> > > > >just got the following on resume:
> > > > >
> > > > >[87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
> > > > >[87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
> > > > >[87026.715000] Leftover inexact backtrace:
> > > > >[87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate interrupt
> > > > >Error: -16
> > > >
> > > > I'm pretty sure this isn't an e1000 problem. you need to talk to
> > > > whoever is maintaining the IRQ subsystem for x86. E1000 is attempting
> > > > to register a shared interrupt and someone has already registered that
> > > > interrupt unshared.
> > >
> > > interestingly though it always involves e1000 when I see dumps like this.
> > > I already reported more of those :-)
> > > this one dosen't seem to do any harm to system stability. it occurs on every
> > > suspend/resume and I can circumvent it by disabling msi
> > >
> > > >
> > > > looks like several devices are sharing IRQ 201 (aka GSI 16) and ahci
> > > > or usb uhci_hcd is likely the problem, or the (acpi) power management
> > > > subsystem.
> > > >
> > > > Hope this helps get the right people involved.
> > >
> > > thank you
> >
> > Could we see the /proc/interrupts please, so we can find out where the
> > clash is happening?
>
>
> here you are
>
> ~# cat /proc/interrupts
> CPU0 CPU1
> 0: 76521957 2009390 IO-APIC-edge timer
> 1: 39599 0 IO-APIC-edge i8042
> 8: 128 0 IO-APIC-edge rtc
> 9: 415044 0 IO-APIC-level acpi
> 12: 862451 0 IO-APIC-edge i8042
> 58: 68014 326850 PCI-MSI libata
> 66: 508910 17187 IO-APIC-level sdhci:slot0, uhci_hcd:usb3
> 74: 3156375 0 IO-APIC-level uhci_hcd:usb2, ohci1394, HDA Intel
> 82: 134828 0 IO-APIC-level uhci_hcd:usb4, ehci_hcd:usb5
> 90: 46548 0 PCI-MSI eth0
> 201: 100133 0 IO-APIC-level uhci_hcd:usb1, yenta, i915@pci:0000:00:02.0
> NMI: 0 0
> LOC: 78530935 78520308
> ERR: 0
> MIS: 0
>

There are already three interrupt sources on 201, so they are all happy to
share.

It's e1000. Jesse, you fibbed ;)

static int e1000_request_irq(struct e1000_adapter *adapter)
{
...
if (adapter->have_msi)
flags &= ~IRQF_SHARED;

2006-10-19 14:50:38

by Eric W. Biederman

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

Andrew Morton <[email protected]> writes:

> On Thu, 19 Oct 2006 08:39:21 +0200
> Martin Lorenz <[email protected]> wrote:
>
>> On Wed, Oct 18, 2006 at 11:26:03PM -0700, Andrew Morton wrote:
>> > On Wed, 18 Oct 2006 08:34:31 +0200
>> > Martin Lorenz <[email protected]> wrote:
>> >
>> > > On Tue, Oct 17, 2006 at 11:52:16AM -0700, Jesse Brandeburg wrote:
>> > > > On 10/16/06, Martin Lorenz <[email protected]> wrote:
>> > > > >just got the following on resume:
>> > > > >
>> > > > >[87026.706000] [<c0251745>] e1000_open+0xcd/0x1a4
>> > > > >[87026.714000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
>> > > > >[87026.715000] Leftover inexact backtrace:
>> > > > >[87026.715000] e1000: eth0: e1000_request_irq: Unable to allocate
> interrupt
>> > > > >Error: -16
>> > > >
>> > > > I'm pretty sure this isn't an e1000 problem. you need to talk to
>> > > > whoever is maintaining the IRQ subsystem for x86. E1000 is attempting
>> > > > to register a shared interrupt and someone has already registered that
>> > > > interrupt unshared.
>> > >
>> > > interestingly though it always involves e1000 when I see dumps like this.
>> > > I already reported more of those :-)
>> > > this one dosen't seem to do any harm to system stability. it occurs on
> every
>> > > suspend/resume and I can circumvent it by disabling msi
>> > >
>> > > >
>> > > > looks like several devices are sharing IRQ 201 (aka GSI 16) and ahci
>> > > > or usb uhci_hcd is likely the problem, or the (acpi) power management
>> > > > subsystem.
>> > > >
>> > > > Hope this helps get the right people involved.
>> > >
>> > > thank you
>> >
>> > Could we see the /proc/interrupts please, so we can find out where the
>> > clash is happening?
>>
>>
>> here you are
>>
>> ~# cat /proc/interrupts
>> CPU0 CPU1
>> 0: 76521957 2009390 IO-APIC-edge timer
>> 1: 39599 0 IO-APIC-edge i8042
>> 8: 128 0 IO-APIC-edge rtc
>> 9: 415044 0 IO-APIC-level acpi
>> 12: 862451 0 IO-APIC-edge i8042
>> 58: 68014 326850 PCI-MSI libata
>> 66: 508910 17187 IO-APIC-level sdhci:slot0, uhci_hcd:usb3
>> 74: 3156375 0 IO-APIC-level uhci_hcd:usb2, ohci1394, HDA Intel
>> 82: 134828 0 IO-APIC-level uhci_hcd:usb4, ehci_hcd:usb5
>> 90: 46548 0 PCI-MSI eth0
>> 201: 100133 0 IO-APIC-level uhci_hcd:usb1, yenta, i915@pci:0000:00:02.0
>> NMI: 0 0
>> LOC: 78530935 78520308
>> ERR: 0
>> MIS: 0
>>
>
> There are already three interrupt sources on 201, so they are all happy to
> share.
>
> It's e1000. Jesse, you fibbed ;)
>
> static int e1000_request_irq(struct e1000_adapter *adapter)
> {
> ...
> if (adapter->have_msi)
> flags &= ~IRQF_SHARED;

Well MSI irqs can't be shared, as they are edged triggered.
Is the e1000 really trying to use irq 201? That would indicate
a logic failure in the msi irq allocator. It should never allocate
an inuse irq.

I have to ask what is the state in 2.6.19-rc2? I'm wondering if
my turning of the msi irq allocator inside out has fixed this problem.

Does this situation work if MSI is disabled, in 2.6.18?

The backwards msi irq allocator in 2.6.18 is so convoluted it may have
a corner case where it fails, and that is triggering this mess.

If 2.6.19 works with MSI's enabled and 2.6.18 works with MSI disabled
I'm inclined to say I have done all that is reasonable.

Eric

2006-10-19 17:14:31

by Andrew Morton

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Thu, 19 Oct 2006 08:48:10 -0600
[email protected] (Eric W. Biederman) wrote:

> >
> > There are already three interrupt sources on 201, so they are all happy to
> > share.
> >
> > It's e1000. Jesse, you fibbed ;)
> >
> > static int e1000_request_irq(struct e1000_adapter *adapter)
> > {
> > ...
> > if (adapter->have_msi)
> > flags &= ~IRQF_SHARED;
>
> Well MSI irqs can't be shared, as they are edged triggered.
> Is the e1000 really trying to use irq 201? That would indicate
> a logic failure in the msi irq allocator. It should never allocate
> an inuse irq.

It's gone very bad. See
http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.boot
http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.out

> I have to ask what is the state in 2.6.19-rc2? I'm wondering if
> my turning of the msi irq allocator inside out has fixed this problem.

Martin, please try CONFIG_PCI_MSI=n. I'd expect that to fix it (it always does)

> Does this situation work if MSI is disabled, in 2.6.18?
>
> The backwards msi irq allocator in 2.6.18 is so convoluted it may have
> a corner case where it fails, and that is triggering this mess.
>
> If 2.6.19 works with MSI's enabled and 2.6.18 works with MSI disabled
> I'm inclined to say I have done all that is reasonable.

oh, we haven't tried 2.6.19-rc2 yet? Please do that, with CONFIG_PCI_MSI=y.

2006-10-20 08:40:22

by Martin Lorenz

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Thu, Oct 19, 2006 at 10:14:11AM -0700, Andrew Morton wrote:
> On Thu, 19 Oct 2006 08:48:10 -0600
> [email protected] (Eric W. Biederman) wrote:
>
> > >
> > > There are already three interrupt sources on 201, so they are all happy to
> > > share.
> > >
> > > It's e1000. Jesse, you fibbed ;)
> > >
> > > static int e1000_request_irq(struct e1000_adapter *adapter)
> > > {
> > > ...
> > > if (adapter->have_msi)
> > > flags &= ~IRQF_SHARED;
> >
> > Well MSI irqs can't be shared, as they are edged triggered.
> > Is the e1000 really trying to use irq 201? That would indicate
> > a logic failure in the msi irq allocator. It should never allocate
> > an inuse irq.
>
> It's gone very bad. See
> http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.boot
> http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.out
>
> > I have to ask what is the state in 2.6.19-rc2? I'm wondering if
> > my turning of the msi irq allocator inside out has fixed this problem.
>
> Martin, please try CONFIG_PCI_MSI=n. I'd expect that to fix it (it always does)
>
> > Does this situation work if MSI is disabled, in 2.6.18?
> >
> > The backwards msi irq allocator in 2.6.18 is so convoluted it may have
> > a corner case where it fails, and that is triggering this mess.
> >
> > If 2.6.19 works with MSI's enabled and 2.6.18 works with MSI disabled
> > I'm inclined to say I have done all that is reasonable.
>
> oh, we haven't tried 2.6.19-rc2 yet? Please do that, with CONFIG_PCI_MSI=y.

ok...
I compliled and installed the latest git yesterday evening and another one
this morning because of a configuration mistake I made

there are new DWARFs but the one I reported for 2.6.18 seem to be fixed
at least I diden't succeed in triggering it

the new ones are in
http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty.run
http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty.boot

http://www.lorenz.eu.org/~mlo/kernel/interrupts_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty
is there too

http://www.lorenz.eu.org/~mlo/kernel/?C=M;O=D
for a list of files I uploaded

[ 64.655000] kobject_add failed for vcs6 with -EEXIST, don't try to register things with the same name in the same directory.
[ 64.655000] [<c0103bfd>] dump_trace+0x69/0x1af
[ 64.655000] [<c0103d5b>] show_trace_log_lvl+0x18/0x2c
[ 64.656000] [<c01043fa>] show_trace+0xf/0x11
[ 64.656000] [<c01044fd>] dump_stack+0x15/0x17
[ 64.656000] [<c01fbf3d>] kobject_add+0x160/0x189
[ 64.657000] [<c0250fec>] class_device_add+0xa2/0x3d8
[ 64.658000] [<c02513ae>] class_device_create+0x7c/0x9c
[ 64.659000] [<c0237858>] vcs_make_sysfs+0x3c/0x7e
[ 64.659000] [<c023c641>] con_open+0x6f/0x7c
[ 64.660000] [<c023259b>] tty_open+0x179/0x2f0
[ 64.661000] [<c016226e>] chrdev_open+0x124/0x13f
[ 64.662000] [<c015e665>] __dentry_open+0xc7/0x1ab
[ 64.662000] [<c015e7c3>] nameidata_to_filp+0x24/0x33
[ 64.662000] [<c015e804>] do_filp_open+0x32/0x39
[ 64.663000] [<c015e84d>] do_sys_open+0x42/0xc3
[ 64.663000] [<c015e907>] sys_open+0x1c/0x1e
[ 64.664000] [<c0102de7>] syscall_call+0x7/0xb
[ 64.664000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[ 64.664000]
[ 64.664000] Leftover inexact backtrace:
[ 64.664000]
[ 64.664000] =======================
[ 64.666000] kobject_add failed for vcsa6 with -EEXIST, don't try to register things with the same name in the same directory.
[ 64.666000] [<c0103bfd>] dump_trace+0x69/0x1af
[ 64.666000] [<c0103d5b>] show_trace_log_lvl+0x18/0x2c
[ 64.666000] [<c01043fa>] show_trace+0xf/0x11
[ 64.666000] [<c01044fd>] dump_stack+0x15/0x17
[ 64.667000] [<c01fbf3d>] kobject_add+0x160/0x189
[ 64.667000] [<c0250fec>] class_device_add+0xa2/0x3d8
[ 64.668000] [<c02513ae>] class_device_create+0x7c/0x9c
[ 64.668000] [<c0237895>] vcs_make_sysfs+0x79/0x7e
[ 64.669000] [<c023c641>] con_open+0x6f/0x7c
[ 64.670000] [<c023259b>] tty_open+0x179/0x2f0
[ 64.670000] [<c016226e>] chrdev_open+0x124/0x13f
[ 64.670000] [<c015e665>] __dentry_open+0xc7/0x1ab
[ 64.671000] [<c015e7c3>] nameidata_to_filp+0x24/0x33
[ 64.671000] [<c015e804>] do_filp_open+0x32/0x39
[ 64.671000] [<c015e84d>] do_sys_open+0x42/0xc3
[ 64.672000] [<c015e907>] sys_open+0x1c/0x1e
[ 64.673000] [<c0102de7>] syscall_call+0x7/0xb
[ 64.673000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
[ 64.673000]
[ 64.673000] Leftover inexact backtrace:
[ 64.673000]
[ 64.673000] =======================

gruss
mlo
--
Dipl.-Ing. Martin Lorenz

They that can give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.
Benjamin Franklin

please encrypt your mail to me
GnuPG key-ID: F1AAD37D
get it here:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D

ICQ UIN: 33588107

2006-10-20 08:48:01

by Andrew Morton

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Fri, 20 Oct 2006 10:37:41 +0200
Martin Lorenz <[email protected]> wrote:

> On Thu, Oct 19, 2006 at 10:14:11AM -0700, Andrew Morton wrote:
> > On Thu, 19 Oct 2006 08:48:10 -0600
> > [email protected] (Eric W. Biederman) wrote:
> >
> > > >
> > > > There are already three interrupt sources on 201, so they are all happy to
> > > > share.
> > > >
> > > > It's e1000. Jesse, you fibbed ;)
> > > >
> > > > static int e1000_request_irq(struct e1000_adapter *adapter)
> > > > {
> > > > ...
> > > > if (adapter->have_msi)
> > > > flags &= ~IRQF_SHARED;
> > >
> > > Well MSI irqs can't be shared, as they are edged triggered.
> > > Is the e1000 really trying to use irq 201? That would indicate
> > > a logic failure in the msi irq allocator. It should never allocate
> > > an inuse irq.
> >
> > It's gone very bad. See
> > http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.boot
> > http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.18-ie-la-tp-41.5+0813.out
> >
> > > I have to ask what is the state in 2.6.19-rc2? I'm wondering if
> > > my turning of the msi irq allocator inside out has fixed this problem.
> >
> > Martin, please try CONFIG_PCI_MSI=n. I'd expect that to fix it (it always does)
> >
> > > Does this situation work if MSI is disabled, in 2.6.18?
> > >
> > > The backwards msi irq allocator in 2.6.18 is so convoluted it may have
> > > a corner case where it fails, and that is triggering this mess.
> > >
> > > If 2.6.19 works with MSI's enabled and 2.6.18 works with MSI disabled
> > > I'm inclined to say I have done all that is reasonable.
> >
> > oh, we haven't tried 2.6.19-rc2 yet? Please do that, with CONFIG_PCI_MSI=y.
>
> ok...
> I compliled and installed the latest git yesterday evening and another one
> this morning because of a configuration mistake I made

OK, sounds like Eric's MSI changes have helped.

> there are new DWARFs but the one I reported for 2.6.18 seem to be fixed
> at least I diden't succeed in triggering it
>
> the new ones are in
> http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty.run
> http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty.boot

Confused. I see no backtraces in the above.

> http://www.lorenz.eu.org/~mlo/kernel/interrupts_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty
> is there too
>
> http://www.lorenz.eu.org/~mlo/kernel/?C=M;O=D
> for a list of files I uploaded
>
> [ 64.655000] kobject_add failed for vcs6 with -EEXIST, don't try to register things with the same name in the same directory.
> [ 64.655000] [<c0103bfd>] dump_trace+0x69/0x1af
> [ 64.655000] [<c0103d5b>] show_trace_log_lvl+0x18/0x2c
> [ 64.656000] [<c01043fa>] show_trace+0xf/0x11
> [ 64.656000] [<c01044fd>] dump_stack+0x15/0x17
> [ 64.656000] [<c01fbf3d>] kobject_add+0x160/0x189
> [ 64.657000] [<c0250fec>] class_device_add+0xa2/0x3d8
> [ 64.658000] [<c02513ae>] class_device_create+0x7c/0x9c
> [ 64.659000] [<c0237858>] vcs_make_sysfs+0x3c/0x7e
> [ 64.659000] [<c023c641>] con_open+0x6f/0x7c
> [ 64.660000] [<c023259b>] tty_open+0x179/0x2f0
> [ 64.661000] [<c016226e>] chrdev_open+0x124/0x13f
> [ 64.662000] [<c015e665>] __dentry_open+0xc7/0x1ab
> [ 64.662000] [<c015e7c3>] nameidata_to_filp+0x24/0x33
> [ 64.662000] [<c015e804>] do_filp_open+0x32/0x39
> [ 64.663000] [<c015e84d>] do_sys_open+0x42/0xc3
> [ 64.663000] [<c015e907>] sys_open+0x1c/0x1e
> [ 64.664000] [<c0102de7>] syscall_call+0x7/0xb
> [ 64.664000] DWARF2 unwinder stuck at syscall_call+0x7/0xb

hm, people report that occasionally - I don't think anyone knows what's
causing it.

2006-10-20 09:20:29

by Martin Lorenz

[permalink] [raw]

Subject: Re: un/shared IRQ problem (was: Re: 2.6.18 - another DWARF2)

On Fri, Oct 20, 2006 at 01:47:53AM -0700, Andrew Morton wrote:
> On Fri, 20 Oct 2006 10:37:41 +0200
> Martin Lorenz <[email protected]> wrote:
>
> >
> > the new ones are in
> > http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty.run
> > http://www.lorenz.eu.org/~mlo/kernel/dmesg_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty.boot
>
> Confused. I see no backtraces in the above.

only the ones I copied below
so the original issue of this thread seems to be solved in the 2.6.19-rc2

>
> > http://www.lorenz.eu.org/~mlo/kernel/interrupts_2.6.19-rc2-tp-ie-e1-42.5+0737-gce9e3d99-dirty
> > is there too
> >
> > http://www.lorenz.eu.org/~mlo/kernel/?C=M;O=D
> > for a list of files I uploaded
> >
> > [ 64.655000] kobject_add failed for vcs6 with -EEXIST, don't try to register things with the same name in the same directory.
> > [ 64.655000] [<c0103bfd>] dump_trace+0x69/0x1af
> > [ 64.655000] [<c0103d5b>] show_trace_log_lvl+0x18/0x2c
> > [ 64.656000] [<c01043fa>] show_trace+0xf/0x11
> > [ 64.656000] [<c01044fd>] dump_stack+0x15/0x17
> > [ 64.656000] [<c01fbf3d>] kobject_add+0x160/0x189
> > [ 64.657000] [<c0250fec>] class_device_add+0xa2/0x3d8
> > [ 64.658000] [<c02513ae>] class_device_create+0x7c/0x9c
> > [ 64.659000] [<c0237858>] vcs_make_sysfs+0x3c/0x7e
> > [ 64.659000] [<c023c641>] con_open+0x6f/0x7c
> > [ 64.660000] [<c023259b>] tty_open+0x179/0x2f0
> > [ 64.661000] [<c016226e>] chrdev_open+0x124/0x13f
> > [ 64.662000] [<c015e665>] __dentry_open+0xc7/0x1ab
> > [ 64.662000] [<c015e7c3>] nameidata_to_filp+0x24/0x33
> > [ 64.662000] [<c015e804>] do_filp_open+0x32/0x39
> > [ 64.663000] [<c015e84d>] do_sys_open+0x42/0xc3
> > [ 64.663000] [<c015e907>] sys_open+0x1c/0x1e
> > [ 64.664000] [<c0102de7>] syscall_call+0x7/0xb
> > [ 64.664000] DWARF2 unwinder stuck at syscall_call+0x7/0xb
>
> hm, people report that occasionally - I don't think anyone knows what's
> causing it.
>
so I'll simply ignore it :-)

gruss
mlo
--
Dipl.-Ing. Martin Lorenz

They that can give up essential liberty
to obtain a little temporary safety
deserve neither liberty nor safety.
Benjamin Franklin

please encrypt your mail to me
GnuPG key-ID: F1AAD37D
get it here:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xF1AAD37D

ICQ UIN: 33588107