2009-03-29 06:20:45

by Paul Collins

[permalink] [raw]
Subject: soft lockup apparently in ath5k_hw_update_mib_counters (or ioread32?) with 2.6.29

After about two days of uptime with 2.6.29 I got this:

BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
CPU 0:
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
Pid: 0, comm: swapper Not tainted 2.6.29-00003-g0be8685 #163 Macmini2,1
RIP: 0010:[<ffffffff803950f0>] [<ffffffff803950f0>] ioread32+0xf/0x32
RSP: 0018:ffffffff807dfc30 EFLAGS: 00000296
RAX: 0000000000000000 RBX: ffffffff807dfc30 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88007cb51a68 RDI: ffffc200109e8094
RBP: ffffffff8020d033 R08: ffffffff8070a600 R09: 0000000000000064
R10: ffffffff807dfd60 R11: ffff88007cb51980 R12: ffffffff807dfbb0
R13: ffff88007c916000 R14: 000000000076b647 R15: ffffffff8051966c
FS: 0000000000000000(0000) GS:ffffffff807e8000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007feefddbb0d0 CR3: 000000007a74e000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
<IRQ> [<ffffffffa009a884>] ? ath5k_hw_update_mib_counters+0xd9/0x15a [ath5k]
[<ffffffffa00a0707>] ? ath5k_intr+0x24c/0x2bd [ath5k]
[<ffffffff8021dcfb>] ? lapic_timer_broadcast+0x18/0x1a
[<ffffffff8025f22c>] ? tick_handle_oneshot_broadcast+0xf3/0x101
[<ffffffff80278e07>] ? handle_IRQ_event+0x33/0x6a
[<ffffffff8027a248>] ? handle_fasteoi_irq+0x9b/0xe0
[<ffffffff8027a1a3>] ? handle_edge_irq+0x126/0x130
[<ffffffff80519565>] ? __irqentry_text_start+0x75/0xe9
[<ffffffff8020cdd3>] ? ret_from_intr+0x0/0x29
[<ffffffff804a6b19>] ? neigh_timer_handler+0x0/0x31b
[<ffffffff8024a038>] ? round_jiffies_common+0x28/0x5a
[<ffffffff804a593e>] ? neigh_periodic_timer+0x146/0x191
[<ffffffff8024a0b5>] ? round_jiffies+0x18/0x1a
[<ffffffff804a5955>] ? neigh_periodic_timer+0x15d/0x191
[<ffffffff804a57f8>] ? neigh_periodic_timer+0x0/0x191
[<ffffffff8024a4e7>] ? run_timer_softirq+0x19c/0x21d
[<ffffffff8024679a>] ? __do_softirq+0x8f/0x159
[<ffffffff8020d65c>] ? call_softirq+0x1c/0x28
[<ffffffff8020e6f1>] ? do_softirq+0x49/0x94
[<ffffffff8024647c>] ? irq_exit+0x44/0x7e
[<ffffffff805195b8>] ? __irqentry_text_start+0xc8/0xe9
[<ffffffff8020cdd3>] ? ret_from_intr+0x0/0x29
<EOI> [<ffffffff80260174>] ? tick_nohz_restart_sched_tick+0x140/0x149
[<ffffffff80260146>] ? tick_nohz_restart_sched_tick+0x112/0x149
[<ffffffff8020b2d0>] ? cpu_idle+0x98/0x9f
[<ffffffff80501157>] ? rest_init+0x6b/0x6d

Hardware:

ath5k phy0: Atheros AR5414 chip found (MAC: 0xa3, PHY: 0x61)

02:00.0 Ethernet controller [0200]: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter [168c:001c] (rev 01)
Subsystem: Apple Computer Inc. Device [106b:0086]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 256 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at 90100000 (64-bit, non-prefetchable) [size=64K]
Capabilities: <access denied>
Kernel driver in use: ath5k

--
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood


2009-03-29 09:00:20

by Paul Collins

[permalink] [raw]
Subject: Re: soft lockup apparently in ath5k_hw_update_mib_counters (or ioread32?) with 2.6.29

Jiri Slaby <[email protected]> writes:

> On 03/29/2009 08:20 AM, Paul Collins wrote:
>> After about two days of uptime with 2.6.29 I got this:
>>
>> BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
>> CPU 0:
>> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
>> Pid: 0, comm: swapper Not tainted 2.6.29-00003-g0be8685 #163 Macmini2,1
>> RIP: 0010:[<ffffffff803950f0>] [<ffffffff803950f0>] ioread32+0xf/0x32
>
> Huh. I see no reason for this to happen. I suppose this is a regression,
> which kernel worked?

I had a previous instance of the problem with v2.6.29-rc8 after about
six days of uptime. The kernel before that was 2.6.29-rc7ish, ran OK
for a couple of days. But since I don't know how to reproduce it
reliably it's hard to say which kernel is truly good.

> There is nothing like "too many interrupts, giving up for now" in dmesg,
> right?

No, no messages like that. I do however get timing-related errors at
about the time the soft lockup detector indicates the problem began.

With 2.6.29 I got

Mar 29 18:45:48 burly kernel: Clocksource tsc unstable (delta = 532189008 ns)
Mar 29 18:46:11 burly kernel: wlan0: No ProbeResp from current AP 00:1a:70:ee:7c:d6 - assume out of range
Mar 29 18:46:34 burly kernel: BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

and with 2.6.29-rc8 I got

Mar 20 21:19:59 burly kernel: CE: hpet increasing min_delta_ns to 22500 nsec
Mar 20 21:21:02 burly kernel: BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

but maybe these just symptoms.

--
Paul Collins
Wellington, New Zealand

Dag vijandelijk luchtschip de huismeester is dood

2009-03-29 08:29:42

by Jiri Slaby

[permalink] [raw]
Subject: Re: soft lockup apparently in ath5k_hw_update_mib_counters (or ioread32?) with 2.6.29

On 03/29/2009 08:20 AM, Paul Collins wrote:
> After about two days of uptime with 2.6.29 I got this:
>
> BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
> CPU 0:
> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
> Pid: 0, comm: swapper Not tainted 2.6.29-00003-g0be8685 #163 Macmini2,1
> RIP: 0010:[<ffffffff803950f0>] [<ffffffff803950f0>] ioread32+0xf/0x32

Huh. I see no reason for this to happen. I suppose this is a regression,
which kernel worked?

There is nothing like "too many interrupts, giving up for now" in dmesg,
right?

> RSP: 0018:ffffffff807dfc30 EFLAGS: 00000296
> RAX: 0000000000000000 RBX: ffffffff807dfc30 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffff88007cb51a68 RDI: ffffc200109e8094
> RBP: ffffffff8020d033 R08: ffffffff8070a600 R09: 0000000000000064
> R10: ffffffff807dfd60 R11: ffff88007cb51980 R12: ffffffff807dfbb0
> R13: ffff88007c916000 R14: 000000000076b647 R15: ffffffff8051966c
> FS: 0000000000000000(0000) GS:ffffffff807e8000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> CR2: 00007feefddbb0d0 CR3: 000000007a74e000 CR4: 00000000000026e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
> <IRQ> [<ffffffffa009a884>] ? ath5k_hw_update_mib_counters+0xd9/0x15a [ath5k]
> [<ffffffffa00a0707>] ? ath5k_intr+0x24c/0x2bd [ath5k]
> [<ffffffff8021dcfb>] ? lapic_timer_broadcast+0x18/0x1a
> [<ffffffff8025f22c>] ? tick_handle_oneshot_broadcast+0xf3/0x101
> [<ffffffff80278e07>] ? handle_IRQ_event+0x33/0x6a
> [<ffffffff8027a248>] ? handle_fasteoi_irq+0x9b/0xe0
> [<ffffffff8027a1a3>] ? handle_edge_irq+0x126/0x130
> [<ffffffff80519565>] ? __irqentry_text_start+0x75/0xe9
> [<ffffffff8020cdd3>] ? ret_from_intr+0x0/0x29
> [<ffffffff804a6b19>] ? neigh_timer_handler+0x0/0x31b
> [<ffffffff8024a038>] ? round_jiffies_common+0x28/0x5a
> [<ffffffff804a593e>] ? neigh_periodic_timer+0x146/0x191
> [<ffffffff8024a0b5>] ? round_jiffies+0x18/0x1a
> [<ffffffff804a5955>] ? neigh_periodic_timer+0x15d/0x191
> [<ffffffff804a57f8>] ? neigh_periodic_timer+0x0/0x191
> [<ffffffff8024a4e7>] ? run_timer_softirq+0x19c/0x21d
> [<ffffffff8024679a>] ? __do_softirq+0x8f/0x159
> [<ffffffff8020d65c>] ? call_softirq+0x1c/0x28
> [<ffffffff8020e6f1>] ? do_softirq+0x49/0x94
> [<ffffffff8024647c>] ? irq_exit+0x44/0x7e
> [<ffffffff805195b8>] ? __irqentry_text_start+0xc8/0xe9
> [<ffffffff8020cdd3>] ? ret_from_intr+0x0/0x29
> <EOI> [<ffffffff80260174>] ? tick_nohz_restart_sched_tick+0x140/0x149
> [<ffffffff80260146>] ? tick_nohz_restart_sched_tick+0x112/0x149
> [<ffffffff8020b2d0>] ? cpu_idle+0x98/0x9f
> [<ffffffff80501157>] ? rest_init+0x6b/0x6d
>
> Hardware:
>
> ath5k phy0: Atheros AR5414 chip found (MAC: 0xa3, PHY: 0x61)
>
> 02:00.0 Ethernet controller [0200]: Atheros Communications Inc. AR242x 802.11abg Wireless PCI Express Adapter [168c:001c] (rev 01)
> Subsystem: Apple Computer Inc. Device [106b:0086]
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0, Cache Line Size: 256 bytes
> Interrupt: pin A routed to IRQ 17
> Region 0: Memory at 90100000 (64-bit, non-prefetchable) [size=64K]
> Capabilities: <access denied>
> Kernel driver in use: ath5k
>


2009-03-29 13:15:27

by Bob Copeland

[permalink] [raw]
Subject: Re: soft lockup apparently in ath5k_hw_update_mib_counters (or ioread32?) with 2.6.29

On Sun, Mar 29, 2009 at 5:37 AM, Michael Tokarev <[email protected]> wrote=
:
> Paul Collins wrote:
>>
>> Jiri Slaby <[email protected]> writes:
>>
>>> On 03/29/2009 08:20 AM, Paul Collins wrote:
>>>>
>>>> After about two days of uptime with 2.6.29 I got this:
>>>>
>>>> BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>>> Huh. I see no reason for this to happen. I suppose this is a regres=
sion,
>>> which kernel worked?
>>
>> I had a previous instance of the problem with v2.6.29-rc8 after abou=
t
>> six days of uptime. =A0The kernel before that was 2.6.29-rc7ish, ran=
OK
>> for a couple of days. =A0But since I don't know how to reproduce it
>> reliably it's hard to say which kernel is truly good.

I think it's just an interrupt storm because we don't reset MIB
interrupts properly (I think not all devices are read-and-clear). I
already have a patch in wireless-testing to turn off MIB interrupts unt=
il
ANI is working and I CCed stable on it.

http://marc.info/?l=3Dlinux-wireless&m=3D123725924404147&w=3D2

--=20
Bob Copeland %% http://www.bobcopeland.com

2009-03-29 09:37:42

by Michael Tokarev

[permalink] [raw]
Subject: Re: soft lockup apparently in ath5k_hw_update_mib_counters (or ioread32?) with 2.6.29

Paul Collins wrote:
> Jiri Slaby <[email protected]> writes:
>
>> On 03/29/2009 08:20 AM, Paul Collins wrote:
>>> After about two days of uptime with 2.6.29 I got this:
>>>
>>> BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>>> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
>>> CPU 0:
>>> Modules linked in: ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables vfat fat usb_storage sch_sfq i915 drm i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect btusb rfcomm hidp l2cap bluetooth tun cpufreq_stats rpcsec_gss_krb5 nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc fuse cbc aes_x86_64 aes_generic xts gf128mul dm_crypt dm_mod fbcon font bitblit softcursor fb kvm_intel kvm acpi_cpufreq firewire_sbp2 loop snd_hda_intel snd_pcm arc4 snd_seq_midi snd_rawmidi ecb snd_seq_midi_event snd_seq snd_timer ath5k snd_seq_device snd mac80211 soundcore firewire_ohci firewire_core thermal snd_page_alloc cfg80211 i2c_i801 crc_itu_t button processor evdev
>>> Pid: 0, comm: swapper Not tainted 2.6.29-00003-g0be8685 #163 Macmini2,1
>>> RIP: 0010:[<ffffffff803950f0>] [<ffffffff803950f0>] ioread32+0xf/0x32
>> Huh. I see no reason for this to happen. I suppose this is a regression,
>> which kernel worked?
>
> I had a previous instance of the problem with v2.6.29-rc8 after about
> six days of uptime. The kernel before that was 2.6.29-rc7ish, ran OK
> for a couple of days. But since I don't know how to reproduce it
> reliably it's hard to say which kernel is truly good.
>
>> There is nothing like "too many interrupts, giving up for now" in dmesg,
>> right?
>
> No, no messages like that. I do however get timing-related errors at
> about the time the soft lockup detector indicates the problem began.
>
> With 2.6.29 I got
>
> Mar 29 18:45:48 burly kernel: Clocksource tsc unstable (delta = 532189008 ns)
> Mar 29 18:46:11 burly kernel: wlan0: No ProbeResp from current AP 00:1a:70:ee:7c:d6 - assume out of range
> Mar 29 18:46:34 burly kernel: BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>
> and with 2.6.29-rc8 I got
>
> Mar 20 21:19:59 burly kernel: CE: hpet increasing min_delta_ns to 22500 nsec

Just a wild guest, finger-to-the-sky, but how about trying with hpet=disable
kernel option? After all that disasters I've seen with various hpet issues,
every time I see something about it, especially those familiar "CEL hpet
increasing min_delta" messages, I'm starting trembling again ;)

/mjt