2008-04-15 08:33:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [Bugme-new] [Bug 10455] New: Kernel lockup when iwl3945 modules is loaded after a few hours


(switched to email. Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 15 Apr 2008 01:16:39 -0700 (PDT) [email protected] wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10455
>
> Summary: Kernel lockup when iwl3945 modules is loaded after a few
> hours
> Product: Drivers
> Version: 2.5
> KernelVersion: 2.6.25-rc9
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: network-wireless
> AssignedTo: [email protected]
> ReportedBy: [email protected]
>
>
> Latest working kernel version: (not known)
> Earliest failing kernel version: 2.6.25-rc9

hm, why? Has this machine never run earlier kernels?

> Distribution: Ubuntu Linux 7.10
> Hardware Environment: Acer Aspire laptop model 5633WLMi
> Software Environment: Ubuntu Linux 7.10, 32 bit mode.
> Problem Description: When the iwl3945 module is loaded, the kernel locks up
> after a few hours. This problem does not occur when the iwl3945 module is not
> loaded.
>
> Steps to reproduce: Boot the system.
>
> The following kernel messages look suspicious:
>
> Apr 14 20:16:24 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:16:24 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:18:28 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:18:28 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:20:32 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:20:32 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:22:36 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:22:36 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:24:40 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:24:40 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:26:44 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:26:44 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:28:48 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:28:48 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:30:52 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:30:52 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:32:56 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:32:56 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:35:00 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:35:00 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:37:04 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:37:04 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:39:08 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:39:08 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:41:12 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:41:12 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:43:16 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:43:16 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:45:20 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:45:20 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:47:24 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:47:24 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:49:28 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:49:28 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:51:32 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:51:32 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:53:36 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:53:36 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:55:40 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:55:40 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:57:44 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:57:44 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 20:59:48 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 20:59:48 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 21:01:52 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 21:01:52 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 21:03:56 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 21:03:56 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled
> Apr 14 21:06:00 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> (level, low) -> IRQ 19
> Apr 14 21:06:00 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> disabled

Yes, that does look like an ACPI problem. But one would expect it to kill
the interface immediately, not several hours later.

Can you send the /proc/interrupts contents?



2008-04-15 17:23:07

by Reinette Chatre

[permalink] [raw]
Subject: RE: [Bugme-new] [Bug 10455] New: Kernel lockup when iwl3945 modules is loaded after a few hours

On , Bjorn Helgaas wrote:

> iwl3945 modules is loaded after a few hours
>
>>>> > Apr 14 21:06:00 localhost kernel: ACPI: PCI Interrupt
> 0000:05:00.0[A] -> GSI 19 (level, low) -> IRQ 19
>>>> > Apr 14 21:06:00 localhost kernel: ACPI: PCI interrupt
> for device 0000:05:00.0 disabled
>>>>
>>>> Yes, that does look like an ACPI problem. But one would expect
>>>> it to kill the interface immediately, not several hours later.
>
> I can't tell whether this is an ACPI problem. These messages are from
> pci_enable_device() and pci_disable_device(). So it looks like you're
> seeing an enable/disable pair every 124 seconds.
>
> This could be a problem somewhere in iwl3945_mac_start() that causes
> us to turn around and disable the device immediately. Most of
> the problems
> there have IWL_ERROR() messages associated with them, and you
> apparently aren't seeing those.

(duplicate entry will be found in bugzilla)

Are there any userspace apps managing this device? (eg. wpa_supplicant?)

Could you please load the driver with debugging (modprobe iwl3945
debug=0x43ffb) - that will help us see if iwl3945_mac_start() is being
called
frequently.

Thanks

Reinette

2008-04-15 08:39:19

by Bart Van Assche

[permalink] [raw]
Subject: Re: [Bugme-new] [Bug 10455] New: Kernel lockup when iwl3945 modules is loaded after a few hours

On Tue, Apr 15, 2008 at 10:32 AM, Andrew Morton
<[email protected]> wrote:
>
> > Latest working kernel version: (not known)
> > Earliest failing kernel version: 2.6.25-rc9
>
> hm, why? Has this machine never run earlier kernels?

2.6.22-14 works fine on the same hardware. This is the kernel provided
by Ubuntu.

> > Apr 14 21:06:00 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> > (level, low) -> IRQ 19
> > Apr 14 21:06:00 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> > disabled
>
> Yes, that does look like an ACPI problem. But one would expect it to kill
> the interface immediately, not several hours later.
>
> Can you send the /proc/interrupts contents?

Here it is:

CPU0 CPU1
0: 996478 0 IO-APIC-edge timer
1: 30 0 IO-APIC-edge i8042
8: 7 0 IO-APIC-edge rtc
9: 57710 0 IO-APIC-fasteoi acpi
12: 132 0 IO-APIC-edge i8042
14: 31294 11519 IO-APIC-edge ata_piix
15: 69158 0 IO-APIC-edge ata_piix
16: 85016 0 IO-APIC-fasteoi uhci_hcd:usb4,
i915@pci:0000:00:02.0
18: 15921 0 IO-APIC-fasteoi uhci_hcd:usb3
19: 0 0 IO-APIC-fasteoi uhci_hcd:usb2
21: 79770 60509 IO-APIC-fasteoi eth0
22: 5811 0 IO-APIC-fasteoi HDA Intel
23: 3 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb5
NMI: 0 0 Non-maskable interrupts
LOC: 509 996926 Local timer interrupts
RES: 29774 84400 Rescheduling interrupts
CAL: 62 28154 function call interrupts
TLB: 868 1262 TLB shootdowns
TRM: 0 0 Thermal event interrupts
SPU: 0 0 Spurious interrupts
ERR: 0
MIS: 0

Bart.

2008-04-15 15:17:58

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [Bugme-new] [Bug 10455] New: Kernel lockup when iwl3945 modules is loaded after a few hours

> > > > Apr 14 21:06:00 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19 (level, low) -> IRQ 19
> > > > Apr 14 21:06:00 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0 disabled
> > >
> > > Yes, that does look like an ACPI problem. But one would expect it to kill
> > > the interface immediately, not several hours later.

I can't tell whether this is an ACPI problem. These messages are from
pci_enable_device() and pci_disable_device(). So it looks like you're
seeing an enable/disable pair every 124 seconds.

This could be a problem somewhere in iwl3945_mac_start() that causes us
to turn around and disable the device immediately. Most of the problems
there have IWL_ERROR() messages associated with them, and you apparently
aren't seeing those.


2008-04-15 09:01:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [Bugme-new] [Bug 10455] New: Kernel lockup when iwl3945 modules is loaded after a few hours

On Tue, 15 Apr 2008 10:39:17 +0200 "Bart Van Assche" <[email protected]> wrote:

> On Tue, Apr 15, 2008 at 10:32 AM, Andrew Morton
> <[email protected]> wrote:
> >
> > > Latest working kernel version: (not known)
> > > Earliest failing kernel version: 2.6.25-rc9
> >
> > hm, why? Has this machine never run earlier kernels?
>
> 2.6.22-14 works fine on the same hardware. This is the kernel provided
> by Ubuntu.

OK, thanks, so it's a regression but we don't know if it's a post-2.6.25
regression.

> > > Apr 14 21:06:00 localhost kernel: ACPI: PCI Interrupt 0000:05:00.0[A] -> GSI 19
> > > (level, low) -> IRQ 19
> > > Apr 14 21:06:00 localhost kernel: ACPI: PCI interrupt for device 0000:05:00.0
> > > disabled
> >
> > Yes, that does look like an ACPI problem. But one would expect it to kill
> > the interface immediately, not several hours later.
> >
> > Can you send the /proc/interrupts contents?
>
> Here it is:
>
> CPU0 CPU1
> 0: 996478 0 IO-APIC-edge timer
> 1: 30 0 IO-APIC-edge i8042
> 8: 7 0 IO-APIC-edge rtc
> 9: 57710 0 IO-APIC-fasteoi acpi
> 12: 132 0 IO-APIC-edge i8042
> 14: 31294 11519 IO-APIC-edge ata_piix
> 15: 69158 0 IO-APIC-edge ata_piix
> 16: 85016 0 IO-APIC-fasteoi uhci_hcd:usb4,
> i915@pci:0000:00:02.0
> 18: 15921 0 IO-APIC-fasteoi uhci_hcd:usb3
> 19: 0 0 IO-APIC-fasteoi uhci_hcd:usb2
> 21: 79770 60509 IO-APIC-fasteoi eth0
> 22: 5811 0 IO-APIC-fasteoi HDA Intel
> 23: 3 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb5
> NMI: 0 0 Non-maskable interrupts
> LOC: 509 996926 Local timer interrupts
> RES: 29774 84400 Rescheduling interrupts
> CAL: 62 28154 function call interrupts
> TLB: 868 1262 TLB shootdowns
> TRM: 0 0 Thermal event interrupts
> SPU: 0 0 Spurious interrupts
> ERR: 0
> MIS: 0

hm, OK.

I'd suggest that you raise a separate report against
[email protected] for this one - it does appear
to be unrelated, I think.