2007-01-23 09:00:40

by Lionel Landwerlin

Subject: 2.6.19.2 sky2/acpi crashes

Hi,

I'm running a MacBook with a Marvell Ethernet controller, and I get a
lot of freezes when the Ethernet controller is under a load of about
100K/s. Since I'm running a 2.6.19.2 kernel, I'm able to get some
reports from the kernel. Here they are:

Jan 23 09:30:57 cocoduo kernel: [ 662.920000] NETDEV WATCHDOG: eth0: transmit timed out
Jan 23 09:30:57 cocoduo kernel: [ 662.920000] sky2 eth0: tx timeout
Jan 23 09:30:57 cocoduo kernel: [ 662.920000] sky2 eth0: transmit ring 493 .. 471 report=494 done=494
Jan 23 09:30:57 cocoduo kernel: [ 662.920000] sky2 status report lost?
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] BUG: soft lockup detected on CPU#0!
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [softlockup_tick+155/208] softlockup_tick+0x9b/0xd0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [update_process_times+49/128] update_process_times+0x31/0x80
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [smp_apic_timer_interrupt+145/176] smp_apic_timer_interrupt+0x91/0xb0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [_spin_lock_bh+18/32] _spin_lock_bh+0x12/0x20
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [pg0+946878101/1068803072] sky2_tx_timeout+0xf5/0x1d0 [sky2]
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [dev_watchdog+0/208] dev_watchdog+0x0/0xd0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [dev_watchdog+192/208] dev_watchdog+0xc0/0xd0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [run_timer_softirq+273/400] run_timer_softirq+0x111/0x190
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [__do_softirq+116/240] __do_softirq+0x74/0xf0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [do_softirq+59/80] do_softirq+0x3b/0x50
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [smp_apic_timer_interrupt+150/176] smp_apic_timer_interrupt+0x96/0xb0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [pg0+943208348/1068803072] acpi_processor_idle+0x1fd/0x3b9 [processor]
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [cpu_idle+116/208] cpu_idle+0x74/0xd0
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [start_kernel+872/1072] start_kernel+0x368/0x430
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [unknown_bootoption+0/624] unknown_bootoption+0x0/0x270
Jan 23 09:31:06 cocoduo kernel: [ 672.832000] =======================

Most of the time the keyboard gets locked and the network driver is
down, so I can't get more information.

Here is my hardware configuration:

Apple Macbook 2GHz (x86, not amd64)
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
00:07.0 Performance counters: Intel Corporation Unknown device 27a3 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family) Serial ATA Storage Controller IDE (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 22)
02:00.0 Ethernet controller: Atheros Communications, Inc. Unknown device 001c (rev 01)
03:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

I hope a fix can be released soon.

--
Lionel Landwerlin <[email protected]>


2007-01-23 09:22:54

by Luming Yu

Subject: Re: 2.6.19.2 sky2/acpi crashes

Please try removing the processor module.

On 1/23/07, Lionel Landwerlin <[email protected]> wrote:
> Hi,
>
> I'm running a macbook with a Marvell ethernet controller, and I have a
> lots of freezes when using the ethernet controller under a load of
> ~100K/s. Since I'm running a 2.6.19.2 kernel, I'm able to get some
> report from the kernel. Here they are :
>
> Jan 23 09:30:57 cocoduo kernel: [ 662.920000] NETDEV WATCHDOG: eth0:
> transmit timed out
> Jan 23 09:30:57 cocoduo kernel: [ 662.920000] sky2 eth0: tx timeout
> Jan 23 09:30:57 cocoduo kernel: [ 662.920000] sky2 eth0: transmit ring
> 493 .. 471 report=494 done=494
> Jan 23 09:30:57 cocoduo kernel: [ 662.920000] sky2 status report lost?
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] BUG: soft lockup detected
> on CPU#0!
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [softlockup_tick
> +155/208] softlockup_tick+0x9b/0xd0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [update_process_times
> +49/128] update_process_times+0x31/0x80
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000]
> [smp_apic_timer_interrupt+145/176] smp_apic_timer_interrupt+0x91/0xb0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [apic_timer_interrupt
> +31/36] apic_timer_interrupt+0x1f/0x24
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [_spin_lock_bh+18/32]
> _spin_lock_bh+0x12/0x20
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [pg0
> +946878101/1068803072] sky2_tx_timeout+0xf5/0x1d0 [sky2]
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [dev_watchdog+0/208]
> dev_watchdog+0x0/0xd0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [dev_watchdog+192/208]
> dev_watchdog+0xc0/0xd0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [run_timer_softirq
> +273/400] run_timer_softirq+0x111/0x190
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [__do_softirq+116/240]
> __do_softirq+0x74/0xf0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [do_softirq+59/80]
> do_softirq+0x3b/0x50
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000]
> [smp_apic_timer_interrupt+150/176] smp_apic_timer_interrupt+0x96/0xb0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [apic_timer_interrupt
> +31/36] apic_timer_interrupt+0x1f/0x24
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [pg0
> +943208348/1068803072] acpi_processor_idle+0x1fd/0x3b9 [processor]
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [cpu_idle+116/208]
> cpu_idle+0x74/0xd0
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [start_kernel+872/1072]
> start_kernel+0x368/0x430
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] [unknown_bootoption
> +0/624] unknown_bootoption+0x0/0x270
> Jan 23 09:31:06 cocoduo kernel: [ 672.832000] =======================
>
> As most of the time, the keyboard gets locked and the network driver is
> down, I can get more informations.
>
> Here my hardware configuration :
>
> Apple Macbook 2GHz (x86, not amd64)
> 00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS/940GML and
> 945GT Express Memory Controller Hub (rev 03)
> 00:02.0 VGA compatible controller: Intel Corporation Mobile
> 945GM/GMS/940GML Express Integrated Graphics Controller (rev 03)
> 00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/940GML
> Express Integrated Graphics Controller (rev 03)
> 00:07.0 Performance counters: Intel Corporation Unknown device 27a3 (rev
> 03)
> 00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High
> Definition Audio Controller (rev 02)
> 00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express
> Port 1 (rev 02)
> 00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express
> Port 2 (rev 02)
> 00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> #1 (rev 02)
> 00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> #2 (rev 02)
> 00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> #3 (rev 02)
> 00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI
> #4 (rev 02)
> 00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI
> Controller (rev 02)
> 00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
> 00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface
> Bridge (rev 02)
> 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE
> Controller (rev 02)
> 00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family)
> Serial ATA Storage Controller IDE (rev 02)
> 00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller
> (rev 02)
> 01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E
> Gigabit Ethernet Controller (rev 22)
> 02:00.0 Ethernet controller: Atheros Communications, Inc. Unknown device
> 001c (rev 01)
> 03:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)
>
> I hope some fix could be released soon.
>
> --
> Lionel Landwerlin <[email protected]>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2007-01-23 10:17:14

by Soeren Sonnenburg

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tue, 23 Jan 2007 08:59:28 +0000, Lionel Landwerlin wrote:

> Hi,
>
> I'm running a macbook with a Marvell ethernet controller, and I have a
> lots of freezes when using the ethernet controller under a load of
> ~100K/s. Since I'm running a 2.6.19.2 kernel, I'm able to get some
> report from the kernel. Here they are :

I am also having trouble with the sky2 module. Though I've not yet seen an
oops, the driver stopped working after some heavy traffic (copying some GB
of data). Only rmmod sky2; modprobe sky2 resolved this. (I am also on
2.6.19.2, but I've seen this happen on 2.6.20-rcX too.)

Soeren

2007-01-23 11:12:52

by Andrew Lyon

Subject: Re: 2.6.19.2 sky2/acpi crashes

On 1/23/07, Soeren Sonnenburg <[email protected]> wrote:
> On Tue, 23 Jan 2007 08:59:28 +0000, Lionel Landwerlin wrote:
>
> > Hi,
> >
> > I'm running a macbook with a Marvell ethernet controller, and I have a
> > lots of freezes when using the ethernet controller under a load of
> > ~100K/s. Since I'm running a 2.6.19.2 kernel, I'm able to get some
> > report from the kernel. Here they are :
>
> I am also having trouble with the sky2 module, though I've not yet seen a
> oops, the driver stopped working after some heavy traffic (copying some G
> of data). Only rmmod sky2; modprobe sky2 resolved this. (I am also on
> 2.6.19.2 but I've seen this happen on 2.6.20-rcX too).
>
> Soeren
>

I've also had the same problem with both 2.6.19.2 and 2.6.20-rcX. The
motherboard is a Gigabyte ga-965-ds3. Networking stops completely
under moderate traffic, and I get either the following errors or a complete
lockup:

Jan 21 02:08:04 beast NETDEV WATCHDOG: eth0: transmit timed out
Jan 21 02:08:04 beast sky2 eth0: tx timeout
Jan 21 02:08:04 beast sky2 eth0: transmit ring 475 .. 452 report=475 done=475
Jan 21 02:08:04 beast sky2 hardware hung? flushing
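
The two "transmit ring" reports in this thread end differently ("status report lost?" earlier, "hardware hung? flushing" here). A minimal Python sketch of one reading that is consistent with both logs: the verdict depends on how the report index compares with the start of the ring. The rule is an assumption inferred from these logs and from my recollection of the driver's tx-timeout handler, not something stated in the thread; the "status burst pending" branch in particular does not appear in any log here, and the function and pattern names are hypothetical.

```python
import re

# Matches the driver's debug line, for example:
# "sky2 eth0: transmit ring 475 .. 452 report=475 done=475"
RING_RE = re.compile(
    r"sky2 (?P<dev>\S+): transmit ring (?P<start>\d+) \.\. (?P<end>\d+) "
    r"report=(?P<report>\d+) done=(?P<done>\d+)"
)


def classify_tx_timeout(line):
    """Return (device, verdict) for a 'transmit ring' log line, or None.

    Assumed rule (not confirmed by the thread):
      report != done       -> status burst still pending
      report != ring start -> a status report was lost
      otherwise            -> hardware looks hung
    """
    m = RING_RE.search(line)
    if m is None:
        return None
    start, report, done = (int(m.group(k)) for k in ("start", "report", "done"))
    if report != done:
        verdict = "status burst pending"  # assumed branch, not seen in this thread
    elif report != start:
        verdict = "status report lost?"
    else:
        verdict = "hardware hung? flushing"
    return m.group("dev"), verdict
```

The function expects the log line to be on a single line, so mail-wrapped log lines need to be rejoined before they are passed in.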

At the time I was downloading an ISO image at 850k/sec, so not really a
high network load at all.

rmmod / modprobe does resolve the issue, but more often than not the
box locks up completely instead of printing those errors.

Andy

2007-01-23 12:28:10

by Lionel Landwerlin

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tuesday 23 January 2007 at 17:22 +0800, Luming Yu wrote:
> Please try to remove processor module.

Ok, that's done. Same problem.

To show that I did not forget to remove processor.ko from the initrd
image, I tried to load speedstep_centrino:

Jan 23 13:09:58 cocoduo kernel: [ 105.697279] speedstep_centrino: Unknown symbol acpi_processor_notify_smm
Jan 23 13:09:58 cocoduo kernel: [ 105.697324] speedstep_centrino: Unknown symbol acpi_processor_unregister_performance
Jan 23 13:09:58 cocoduo kernel: [ 105.697411] speedstep_centrino: Unknown symbol acpi_processor_preregister_performance
Jan 23 13:09:58 cocoduo kernel: [ 105.697464] speedstep_centrino: Unknown symbol acpi_processor_register_performance
Jan 23 13:15:51 cocoduo kernel: [ 458.267151] NETDEV WATCHDOG: eth0: transmit timed out
Jan 23 13:15:51 cocoduo kernel: [ 458.267157] sky2 eth0: tx timeout
Jan 23 13:15:51 cocoduo kernel: [ 458.267164] sky2 eth0: transmit ring 421 .. 398 report=422 done=422
Jan 23 13:15:51 cocoduo kernel: [ 458.267166] sky2 status report lost?
Jan 23 13:16:00 cocoduo kernel: [ 466.730622] BUG: soft lockup detected on CPU#0!
Jan 23 13:16:00 cocoduo kernel: [ 466.730644] [softlockup_tick+155/208] softlockup_tick+0x9b/0xd0
Jan 23 13:16:00 cocoduo kernel: [ 466.730656] [update_process_times+49/128] update_process_times+0x31/0x80
Jan 23 13:16:00 cocoduo kernel: [ 466.730666] [smp_apic_timer_interrupt+145/176] smp_apic_timer_interrupt+0x91/0xb0
Jan 23 13:16:00 cocoduo kernel: [ 466.730674] [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Jan 23 13:16:00 cocoduo kernel: [ 466.730684] [_spin_lock_bh+18/32] _spin_lock_bh+0x12/0x20
Jan 23 13:16:00 cocoduo kernel: [ 466.730693] [pg0+945944213/1068803072] sky2_tx_timeout+0xf5/0x1d0 [sky2]
Jan 23 13:16:00 cocoduo kernel: [ 466.730707] [dev_watchdog+0/208] dev_watchdog+0x0/0xd0
Jan 23 13:16:00 cocoduo kernel: [ 466.730712] [dev_watchdog+192/208] dev_watchdog+0xc0/0xd0
Jan 23 13:16:00 cocoduo kernel: [ 466.730718] [run_timer_softirq+273/400] run_timer_softirq+0x111/0x190
Jan 23 13:16:00 cocoduo kernel: [ 466.730728] [__do_softirq+116/240] __do_softirq+0x74/0xf0
Jan 23 13:16:00 cocoduo kernel: [ 466.730734] [do_softirq+59/80] do_softirq+0x3b/0x50
Jan 23 13:16:00 cocoduo kernel: [ 466.730739] [smp_apic_timer_interrupt+150/176] smp_apic_timer_interrupt+0x96/0xb0
Jan 23 13:16:00 cocoduo kernel: [ 466.730746] [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Jan 23 13:16:00 cocoduo kernel: [ 466.730756] [mwait_idle_with_hints+70/96] mwait_idle_with_hints+0x46/0x60
Jan 23 13:16:00 cocoduo kernel: [ 466.730762] [mwait_idle+12/32] mwait_idle+0xc/0x20
Jan 23 13:16:00 cocoduo kernel: [ 466.730766] [cpu_idle+116/208] cpu_idle+0x74/0xd0
Jan 23 13:16:00 cocoduo kernel: [ 466.730773] [start_kernel+872/1072] start_kernel+0x368/0x430
Jan 23 13:16:00 cocoduo kernel: [ 466.730780] [unknown_bootoption+0/624] unknown_bootoption+0x0/0x270
Jan 23 13:16:00 cocoduo kernel: [ 466.730790] =======================


2007-01-23 21:32:26

by Len Brown

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tuesday 23 January 2007 07:27, Lionel Landwerlin wrote:
> On Tuesday 23 January 2007 at 17:22 +0800, Luming Yu wrote:
> > Please try to remove processor module.
>
> Ok, that's done. Same problem.

Any difference with "idle=poll"?
If yes, how about "idle=halt"?

2007-01-24 00:45:49

by Lionel Landwerlin

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tuesday 23 January 2007 at 16:30 -0500, Len Brown wrote:
> On Tuesday 23 January 2007 07:27, Lionel Landwerlin wrote:
> > On Tuesday 23 January 2007 at 17:22 +0800, Luming Yu wrote:
> > > Please try to remove processor module.
> >
> > Ok, that's done. Same problem.
>
> any difference with "idle=poll"?
> if yes, how about "idle=halt"?

idle=poll seems to fix the problem (the CPU fan is running at almost full
speed). Maybe I should run a longer test... For now the test consists of running
about 15 torrents and watching HDTV over the Ethernet device.

idle=halt does not:

Jan 24 01:37:02 cocoduo kernel: [ 1562.672639] NETDEV WATCHDOG: eth0: transmit timed out
Jan 24 01:37:02 cocoduo kernel: [ 1562.672648] sky2 eth0: tx timeout
Jan 24 01:37:02 cocoduo kernel: [ 1562.672656] sky2 eth0: transmit ring 243 .. 222 report=244 done=244
Jan 24 01:37:02 cocoduo kernel: [ 1562.672660] sky2 status report lost?
Jan 24 01:37:11 cocoduo kernel: [ 1571.787958] BUG: soft lockup detected on CPU#0!
Jan 24 01:37:11 cocoduo kernel: [ 1571.787989] [softlockup_tick+155/208] softlockup_tick+0x9b/0xd0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788007] [update_process_times+49/128] update_process_times+0x31/0x80
Jan 24 01:37:11 cocoduo kernel: [ 1571.788020] [smp_apic_timer_interrupt+145/176] smp_apic_timer_interrupt+0x91/0xb0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788032] [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Jan 24 01:37:11 cocoduo kernel: [ 1571.788048] [_spin_lock_bh+15/32] _spin_lock_bh+0xf/0x20
Jan 24 01:37:11 cocoduo kernel: [ 1571.788061] [pg0+946730645/1068803072] sky2_tx_timeout+0xf5/0x1d0 [sky2]
Jan 24 01:37:11 cocoduo kernel: [ 1571.788083] [dev_watchdog+0/208] dev_watchdog+0x0/0xd0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788090] [dev_watchdog+192/208] dev_watchdog+0xc0/0xd0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788099] [run_timer_softirq+273/400] run_timer_softirq+0x111/0x190
Jan 24 01:37:11 cocoduo kernel: [ 1571.788114] [__do_softirq+116/240] __do_softirq+0x74/0xf0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788125] [do_softirq+59/80] do_softirq+0x3b/0x50
Jan 24 01:37:11 cocoduo kernel: [ 1571.788134] [smp_apic_timer_interrupt+150/176] smp_apic_timer_interrupt+0x96/0xb0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788143] [apic_timer_interrupt+31/36] apic_timer_interrupt+0x1f/0x24
Jan 24 01:37:11 cocoduo kernel: [ 1571.788153] [default_idle+0/112] default_idle+0x0/0x70
Jan 24 01:37:11 cocoduo kernel: [ 1571.788166] [default_idle+54/112] default_idle+0x36/0x70
Jan 24 01:37:11 cocoduo kernel: [ 1571.788175] [cpu_idle+116/208] cpu_idle+0x74/0xd0
Jan 24 01:37:11 cocoduo kernel: [ 1571.788184] [start_kernel+872/1072] start_kernel+0x368/0x430
Jan 24 01:37:11 cocoduo kernel: [ 1571.788194] [unknown_bootoption+0/624] unknown_bootoption+0x0/0x270
Jan 24 01:37:11 cocoduo kernel: [ 1571.788208] =======================

--
Lionel Landwerlin <[email protected]>

2007-01-24 01:39:39

by Luming Yu

Subject: Re: 2.6.19.2 sky2/acpi crashes

On 1/24/07, Lionel Landwerlin <[email protected]> wrote:
> On Tuesday 23 January 2007 at 16:30 -0500, Len Brown wrote:
> > On Tuesday 23 January 2007 07:27, Lionel Landwerlin wrote:
> > > On Tuesday 23 January 2007 at 17:22 +0800, Luming Yu wrote:
> > > > Please try to remove processor module.
> > >
> > > Ok, that's done. Same problem.
> >
> > any difference with "idle=poll"?
> > if yes, how about "idle=halt"?
>
> idle=poll seems to fix the problem (cpu fan is running almost at full
> speed). Maybe I should run a longer test... For now it consists to run
> about 15 torrents and watching HDTV through ethernet device.
>
> idle=halt does not :

It sounds like an issue related to processor C-states.
Please file a bug in the ACPI category on bugzilla.kernel.org.

2007-01-24 02:38:17

by Len Brown

Subject: Re: 2.6.19.2 sky2/acpi crashes


> > > Apple Macbook 2GHz (x86, not amd64)

> > > > > Please try to remove processor module.
> > > >
> > > > Ok, that's done. Same problem.
> > >
> > > any difference with "idle=poll"?
> > > if yes, how about "idle=halt"?
> >
> > idle=poll seems to fix the problem (cpu fan is running almost at full
> > speed). Maybe I should run a longer test... For now it consists to run
> > about 15 torrents and watching HDTV through ethernet device.
> >
> > idle=halt does not :
>
> It sounds like issues relative to Processor C state.
> Please enter a bug in ACPI category on bugzilla.kernel.org

Actually, the test above with the processor module removed proved
that it isn't ACPI C-states -- as they will not be available.
You should be able to observe that /proc/acpi/processor/*/power
does not indicate any C-state use when processor is unloaded.

My guess was that some deep C-state with long exit latency
was interfering with the device. Booting without the processor
module should have left you running the native mwait idle.
Booting with idle=halt should have left you running the HLT idle.
Booting with idle=poll gives a busy spin loop that never enters
any hardware power-saving state.

I'm quite puzzled that idle=halt was not sufficient to solve the issue,
because that should be the lowest exit latency idle loop.
So maybe I'm wrong about the cause -- though I can't then
explain why idle=poll helps...

All of the idle selection options cause the kernel to print
a line with the word "idle" in it. Perhaps you could search
your dmesg for "idle" to verify that it is running what we
think it is?
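
One possible way to do that search, as a minimal Python sketch. It assumes the kernel log has already been captured as text (for example by saving the output of dmesg to a file); the function name is hypothetical.

```python
def idle_lines(log_text):
    """Return every log line that mentions 'idle', ignoring case."""
    return [line for line in log_text.splitlines() if "idle" in line.lower()]
```

Note that stack-trace lines naming functions such as cpu_idle or mwait_idle match as well, so the line announcing the selected idle routine has to be picked out of the result by eye.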

-Len


2007-01-24 12:29:12

by Lionel Landwerlin

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tuesday 23 January 2007 at 21:36 -0500, Len Brown wrote:
> > > > Apple Macbook 2GHz (x86, not amd64)
>
> > > > > > Please try to remove processor module.
> > > > >
> > > > > Ok, that's done. Same problem.
> > > >
> > > > any difference with "idle=poll"?
> > > > if yes, how about "idle=halt"?
> > >
> > > idle=poll seems to fix the problem (cpu fan is running almost at full
> > > speed). Maybe I should run a longer test... For now it consists to run
> > > about 15 torrents and watching HDTV through ethernet device.
> > >
> > > idle=halt does not :
> >
> > It sounds like issues relative to Processor C state.
> > Please enter a bug in ACPI category on bugzilla.kernel.org
>
> Actually, the test above with the processor module removed proved
> that it isn't ACPI C-states -- as they will not be available.
> You should be able to observe that /proc/acpi/processor/*/power
> does not indicate any C-state use when processor is unloaded.
>
> My guess was that some deep C-state with long exit latency
> was interfering with the device. booting w/o the processor
> module should have left you running the native mwait idle.
> booting with idle=halt should have left you running the HLT idle.
> booting with idle=poll is a busy spin loop that never enters
> any hardware power saving state.
>
> I'm quite puzzled that idle=halt was not sufficient to solve the issue,
> because that should be the lowest exit latency idle loop.
> So maybe I'm wrong about the cause -- though I can't then
> explain why idle=poll helps...
>
> All of the idle selection options cause the kernel to print
> a line with the word "idle" in it. Perhaps you could search
> your dmesg for "idle" to verify that it is running what we
> think it is?

I have attached the complete log for idle=halt.

I've been running idle=poll for more than an hour now with heavy Ethernet load and no
crash. A crash usually happens within 10~15 min with idle=halt and 4~5 min with no
idle option.

--
Lionel Landwerlin <[email protected]>


Attachments:
halt.txt (46.64 kB)

2007-01-24 13:02:49

by Soeren Sonnenburg

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tue, 23 Jan 2007 11:12:50 +0000, Andrew Lyon wrote:

> On 1/23/07, Soeren Sonnenburg <[email protected]> wrote:
>> On Tue, 23 Jan 2007 08:59:28 +0000, Lionel Landwerlin wrote:
>>
>> > Hi,
>> >
>> > I'm running a macbook with a Marvell ethernet controller, and I have a
>> > lots of freezes when using the ethernet controller under a load of
>> > ~100K/s. Since I'm running a 2.6.19.2 kernel, I'm able to get some
>> > report from the kernel. Here they are :
>>
>> I am also having trouble with the sky2 module, though I've not yet seen a
>> oops, the driver stopped working after some heavy traffic (copying some G
>> of data). Only rmmod sky2; modprobe sky2 resolved this. (I am also on
>> 2.6.19.2 but I've seen this happen on 2.6.20-rcX too).

[...]
> Ive also had the same problem with both 2.6.19.2 and 2.6.20-rcX,
> motherboard is gigabyte ga-965-ds3 , the networking stops completely
> under moderate traffic, I get the following errors or a complete
> lockup:
>
> Jan 21 02:08:04 beast NETDEV WATCHDOG: eth0: transmit timed out
> Jan 21 02:08:04 beast sky2 eth0: tx timeout
> Jan 21 02:08:04 beast sky2 eth0: transmit ring 475 .. 452 report=475 done=475
> Jan 21 02:08:04 beast sky2 hardware hung? flushing
>
> At the time I was downloading a iso image at 850k/sec, so not really a
> high network load at all.
>
> rmmod / modprobe does resolve the issue, but more times than not the
> box locks up completely instead of getting those errors.

I am on a completely different system (MacBook Pro 1,1) with PREEMPT and
CPU frequency scaling on. Not sure how this could be related...

Soeren

2007-01-24 17:47:28

by Stephen Hemminger

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Tue, 23 Jan 2007 09:59:28 +0100
Lionel Landwerlin <[email protected]> wrote:

> Hi,
>
> I'm running a macbook with a Marvell ethernet controller, and I have a
> lots of freezes when using the ethernet controller under a load of
> ~100K/s. Since I'm running a 2.6.19.2 kernel, I'm able to get some
> report from the kernel. Here they are :

Please send sky2 bugs to me <[email protected]> and
[email protected].

> I hope some fix could be released soon.

I get problem reports all the time; unfortunately, so far these have
not been reproducible on the configurations and hardware I have. I am
not denying there is a problem, but if I can't reproduce it, it takes
a long time to fix.

Your problem seems to be missed/lost interrupts. If you display
the contents of /proc/interrupts (i.e. cat /proc/interrupts), it
will show whether level (good), edge (bad) or MSI (good if the hardware works)
interrupts are being used.
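
A minimal Python sketch of that check, reading the text of /proc/interrupts and reporting the interrupt type for one device. It assumes the 2.6-era layout, where the column after the per-CPU counts names the controller (for example IO-APIC-edge, IO-APIC-level, IO-APIC-fasteoi or PCI-MSI), and it assumes that "fasteoi" on the IO-APIC means level-triggered. The function name and the sample table are made up for illustration.

```python
def irq_type(interrupts_text, device):
    """Classify the interrupt used by `device` as 'msi', 'level', 'edge' or 'unknown'.

    Returns None when the device is not listed at all.
    """
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Interrupt rows start with "<irq>:"; the CPU header row does not.
        if not fields or not fields[0].endswith(":"):
            continue
        rest = fields[1:]
        # Drop the per-CPU counters to reach the controller column.
        while rest and rest[0].isdigit():
            rest = rest[1:]
        if not rest:
            continue
        controller = rest[0].lower()
        # Shared lines list several names: "uhci_hcd:usb3, eth0".
        names = [name.rstrip(",") for name in rest[1:]]
        if device not in names:
            continue
        # Test MSI first, since "PCI-MSI-edge" also contains "edge".
        if "msi" in controller:
            return "msi"
        if "level" in controller or "fasteoi" in controller:
            return "level"
        if "edge" in controller:
            return "edge"
        return "unknown"
    return None


# Hypothetical sample, not taken from any machine in this thread.
SAMPLE = """\
           CPU0       CPU1
  0:     104523          0    IO-APIC-edge  timer
 16:      20311          0   IO-APIC-fasteoi   uhci_hcd:usb3, eth0
 17:         12          0   PCI-MSI-edge   eth1
"""

if __name__ == "__main__":
    for dev in ("eth0", "eth1", "timer"):
        print(dev, irq_type(SAMPLE, dev))
```

On a live system the text would come from reading /proc/interrupts, and the device name would be the interface the sky2 driver registered (eth0 in the logs above).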

Some workaround related things to try are:

1) Adding the module parameter "idle_timeout=10" will cause
the driver to poll for status every 10ms. This is obviously a performance
overhead, but it can allow the system to function.

2) Disabling MSI with either "pci=nomsi" on boot cmdline or by using
module parameter "disable_msi=1". Message Signaled Interrupts are good,
but it seems some chipsets don't work right.
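
As a rough illustration of applying the module parameters, the Python sketch below builds the rmmod and modprobe command lines for reloading a module with options. The helper name is hypothetical, and the sketch only constructs the argument lists; actually running them needs root. "pci=nomsi" is a boot parameter and cannot be set this way.

```python
def reload_commands(module, **params):
    """Build argv lists that unload `module` and load it again with parameters."""
    options = [f"{name}={value}" for name, value in sorted(params.items())]
    return [["rmmod", module], ["modprobe", module] + options]


if __name__ == "__main__":
    # Show the commands for the first workaround without executing them.
    for argv in reload_commands("sky2", idle_timeout=10):
        print(" ".join(argv))
```

Each list can be passed to subprocess.run once the interface is no longer needed; the parameters are sorted only so that the output is stable.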


--
Stephen Hemminger <[email protected]>

2007-01-24 19:53:35

by Len Brown

Subject: Re: 2.6.19.2 sky2/acpi crashes

On Wednesday 24 January 2007 07:23, Lionel Landwerlin wrote:
> On Tuesday 23 January 2007 at 21:36 -0500, Len Brown wrote:
> > > > > Apple Macbook 2GHz (x86, not amd64)
> >
> > > > > > > Please try to remove processor module.
> > > > > >
> > > > > > Ok, that's done. Same problem.
> > > > >
> > > > > any difference with "idle=poll"?
> > > > > if yes, how about "idle=halt"?
> > > >
> > > > idle=poll seems to fix the problem (cpu fan is running almost at full
> > > > speed). Maybe I should run a longer test... For now it consists to run
> > > > about 15 torrents and watching HDTV through ethernet device.
> > > >
> > > > idle=halt does not :
> > >
> > > It sounds like issues relative to Processor C state.
> > > Please enter a bug in ACPI category on bugzilla.kernel.org
> >
> > Actually, the test above with the processor module removed proved
> > that it isn't ACPI C-states -- as they will not be available.
> > You should be able to observe that /proc/acpi/processor/*/power
> > does not indicate any C-state use when processor is unloaded.
> >
> > My guess was that some deep C-state with long exit latency
> > was interfering with the device. booting w/o the processor
> > module should have left you running the native mwait idle.
> > booting with idle=halt should have left you running the HLT idle.
> > booting with idle=poll is a busy spin loop that never enters
> > any hardware power saving state.
> >
> > I'm quite puzzled that idle=halt was not sufficient to solve the issue,
> > because that should be the lowest exit latency idle loop.
> > So maybe I'm wrong about the cause -- though I can't then
> > explain why idle=poll helps...
> >
> > All of the idle selection options cause the kernel to print
> > a line with the word "idle" in it. Perhaps you could search
> > your dmesg for "idle" to verify that it is running what we
> > think it is?
>
> Here I join the complete log for idle=halt

It is indeed doing what you asked it to:
Jan 24 01:11:21 cocoduo kernel: [ 30.741018] using halt in idle threads.

> I'm running idle=poll for more 1 hour now with heavy ethernet load, no
> crash. It usualy happens in 10~15mn with idle=halt and 4~5mn with no
> idle option.

I think my guess is wrong. If idle=halt doesn't help, then the failure
doesn't have anything to do with the idle loop and power saving idle states.

I can't explain why idle=poll helps.

-Len