2009-12-06 22:12:03

by Alan Stern

Subject: REGRESSION for RT2561/RT61 in 2.6.32

Sorry for the late notice; I haven't tried running new kernels on this
computer in a long time.

When I use 2.6.32 (or indeed 2.6.32-rc1), it works fine up until the
point where the wireless card is ifconfig'ed. Then not long afterward
(a few seconds to a minute) the system hangs.

# lspci -vv -s d.0
00:0d.0 Network controller: RaLink RT2561/RT61 802.11g PCI
Subsystem: Linksys WMP54G ver 4.1
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 11
Region 0: Memory at cfff0000 (32-bit, non-prefetchable) [size=32K]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: rt61pci
Kernel modules: rt61pci

While the system is hung, Alt-SysRq-keys still work but nothing else
shows up when I type (using a VT console, not X). However I don't know
what debugging to do or what to look for.

Under 2.6.31 everything works okay.

Any help appreciated,

Alan Stern



2009-12-08 14:48:07

by Gertjan van Wingerde

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

On Tue, Dec 8, 2009 at 3:27 PM, Alan Stern <[email protected]> wrote:
> On Mon, 7 Dec 2009, Gertjan van Wingerde wrote:
>
>> > While the system is hung, Alt-SysRq-keys still work but nothing else
>> > shows up when I type (using a VT console, not X). However I don't know
>> > what debugging to do or what to look for.
>> >
>> > Under 2.6.31 everything works okay.
>> >
>>
>> Hmmm, that's odd. I haven't seen any other reports on this (on full system hangs that is).
>>
>> It would be very hard to figure out what is going wrong without any form of logging of
>> an oops or something. Is there any way you can find out if any oops information is present?
>
> There is no oops. The system just stops responding to keystrokes other
> than Alt-SysRq-*.
>
>> What does Alt-SysRq-l tell us? What are the CPUs doing when the system is hung?
>
> Unfortunately Alt-SysRq-l wasn't very helpful. The output is below
> (captured using a serial console), starting from when I configured the
> wireless interface. For thoroughness I have also attached the output
> of Alt-SysRq-T in single-user mode, but it doesn't seem to contain
> anything striking.
>
> Could the problem be connected with CONFIG_4KSTACKS (which was enabled
> in this kernel)?

Hmm, the stack overrun seems to be generated as part of the
Alt-SysRq-* handling, not of any rt61pci handling.

I noticed in the Alt-SysRq-T output that powersaving is enabled. We've
seen some strange behaviour on this, so could you disable that with:

iwconfig wlan0 power off

or run a kernel in which CONFIG_CFG80211_DEFAULT_PS is disabled.

Maybe that will fix the problem we are seeing?

Other than that I have no ideas on what is happening here.

---
Gertjan.
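
The two suggestions above (turning power saving off at runtime, or building without CONFIG_CFG80211_DEFAULT_PS) can be checked with a small helper. This is only a sketch: `kconfig_state` is a hypothetical function name, and the location of the kernel config file varies between systems, so the path is passed in rather than assumed.

```shell
# Sketch only: kconfig_state is a hypothetical helper, not part of any kernel tool.
# Prints "y", "m", or "n" for OPTION as recorded in the kernel config FILE.
kconfig_state() {
    file=$1
    option=$2
    # Enabled options appear as OPTION=y (built in) or OPTION=m (module).
    value=$(sed -n "s/^${option}=\([ym]\)\$/\1/p" "$file" | head -n 1)
    # Disabled options appear as "# OPTION is not set" or are absent entirely.
    echo "${value:-n}"
}
```

If `kconfig_state <config-file> CONFIG_CFG80211_DEFAULT_PS` prints `y`, power saving is on by default in that kernel, and `iwconfig wlan0 power off` (as suggested above) would be the runtime way to switch it off for the interface.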

2009-12-08 14:27:39

by Alan Stern

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

On Mon, 7 Dec 2009, Gertjan van Wingerde wrote:

> > While the system is hung, Alt-SysRq-keys still work but nothing else
> > shows up when I type (using a VT console, not X). However I don't know
> > what debugging to do or what to look for.
> >
> > Under 2.6.31 everything works okay.
> >
>
> Hmmm, that's odd. I haven't seen any other reports on this (on full system hangs that is).
>
> It would be very hard to figure out what is going wrong without any form of logging of
> an oops or something. Is there any way you can find out if any oops information is present?

There is no oops. The system just stops responding to keystrokes other
than Alt-SysRq-*.

> What does Alt-SysRq-l tell us? What are the CPUs doing when the system is hung?

Unfortunately Alt-SysRq-l wasn't very helpful. The output is below
(captured using a serial console), starting from when I configured the
wireless interface. For thoroughness I have also attached the output
of Alt-SysRq-T in single-user mode, but it doesn't seem to contain
anything striking.

Could the problem be connected with CONFIG_4KSTACKS (which was enabled
in this kernel)?

Alan Stern


[ 103.120951] rt61pci 0000:00:0d.0: firmware: requesting rt2561s.bin
[ 104.576241] wlan0: direct probe to AP 00:13:46:48:f3:2a (try 1)
[ 104.650501] wlan0: direct probe responded
[ 104.699132] wlan0: authenticate with AP 00:13:46:48:f3:2a (try 1)
[ 104.775338] wlan0: direct probe to AP 00:13:46:48:f3:2a (try 1)
[ 104.848989] wlan0: direct probe responded
[ 104.897641] wlan0: authenticate with AP 00:13:46:48:f3:2a (try 1)
[ 104.976899] wlan0: authenticated
[ 105.016632] wlan0: associate with AP 00:13:46:48:f3:2a (try 1)
[ 105.089350] wlan0: RX AssocResp from 00:13:46:48:f3:2a (capab=0x431 status=0 aid=2)
[ 105.181990] wlan0: associated
[ 169.329375] SysRq : Show backtrace of all active CPUs
[ 169.332017] sending NMI to all CPUs:
[ 169.332017] BUG: unable to handle kernel paging request at ffffb310
[ 169.332017] IP: [<c1010c1f>] __default_send_IPI_dest_field+0x3f/0x5f
[ 169.332017] *pde = 01341067 *pte = 00000000
[ 169.332017] Thread overran stack, or stack corrupted
[ 169.332017] Oops: 0002 [#1] PREEMPT SMP
[ 169.332017] last sysfs file: /sys/devices/pci0000:00/0000:00:0d.0/net/wlan0/broadcast
[ 169.332017] Modules linked in: arc4 ecb rt61pci crc_itu_t rt2x00pci rt2x00lib pcspkr mac80211 cfg80211 ehci_hcd eeprom_93cx6 sis900 mii ohci_hcd evdev floppy processor thermal_sys button usbcore [last unloaded: scsi_wait_scan]
[ 169.332017]
[ 169.332017] Pid: 0, comm: swapper Not tainted (2.6.32 #1) K7S5A
[ 169.332017] EIP: 0060:[<c1010c1f>] EFLAGS: 00010082 CPU: 0
[ 169.332017] EIP is at __default_send_IPI_dest_field+0x3f/0x5f
[ 169.332017] EAX: ffffb310 EBX: 00000800 ECX: 00000800 EDX: 01000000
[ 169.332017] ESI: 01000000 EDI: 00000002 EBP: c1c00e38 ESP: c1c00e2c
[ 169.332017] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 169.332017] Process swapper (pid: 0, ti=c1c00000 task=c12cd760 task.ti=c12b4000)
[ 169.332017] Stack:
[ 169.332017] 00000001 00000002 00000002 c1c00e4c c1010e2d 00000000 0000006c 00000001
[ 169.332017] <0> c1c00e58 c1010c61 00000000 c1c00e68 c1011087 c125f60b c12e6a80 c1c00e70
[ 169.332017] <0> c113677c c1c00e98 c11365c2 c127d944 c1284448 cf879890 00000009 00000082
[ 169.332017] Call Trace:
[ 169.332017] [<c1010e2d>] ? default_send_IPI_mask_logical+0x6c/0x8c
[ 169.332017] [<c1010c61>] ? default_send_IPI_all+0x22/0x60
[ 169.332017] [<c1011087>] ? arch_trigger_all_cpu_backtrace+0x2d/0x4f
[ 169.332017] [<c113677c>] ? sysrq_handle_showallcpus+0x8/0xa
[ 169.332017] [<c11365c2>] ? __handle_sysrq+0x9b/0x112
[ 169.332017] [<c113668e>] ? handle_sysrq+0x1f/0x21
[ 169.332017] [<c1130102>] ? kbd_event+0x303/0x570
[ 169.332017] [<c116128f>] ? input_pass_event+0x57/0x7a
[ 169.332017] [<c11623f6>] ? input_handle_event+0x35b/0x364
[ 169.332017] [<c11624c1>] ? input_event+0x4f/0x62
[ 169.332017] [<c11653a1>] ? atkbd_interrupt+0x454/0x51f
[ 169.332017] [<c115dfae>] ? serio_interrupt+0x33/0x66
[ 169.332017] [<c115f533>] ? i8042_interrupt+0x1d6/0x1e7
[ 169.332017] [<c103e815>] ? add_lock_to_list+0x36/0x95
[ 169.332017] [<c11ca18e>] ? _spin_unlock_irqrestore+0x53/0x60
[ 169.332017] [<c104d63b>] ? handle_IRQ_event+0x1d/0xa2
[ 169.332017] [<c104ec1b>] ? handle_level_irq+0x64/0xbc
[ 169.332017] [<c104ebb7>] ? handle_level_irq+0x0/0xbc
[ 169.332017] <IRQ>
[ 169.332017] [<c1003e82>] ? do_IRQ+0x45/0x9a
[ 169.332017] [<c1026d6a>] ? __do_softirq+0x0/0x10a
[ 169.332017] [<c100308e>] ? common_interrupt+0x2e/0x34
[ 169.332017] [<c1004747>] ? do_softirq+0x63/0xbc
[ 169.332017] [<c1026d6a>] ? __do_softirq+0x0/0x10a
[ 169.332017] [<c1026dbb>] ? __do_softirq+0x51/0x10a
[ 169.332017] [<c1026d6a>] ? __do_softirq+0x0/0x10a
[ 169.332017] <IRQ>
[ 169.332017] [<c1026afb>] ? irq_exit+0x38/0x7a
[ 169.332017] [<c1003ec1>] ? do_IRQ+0x84/0x9a
[ 169.332017] [<c100308e>] ? common_interrupt+0x2e/0x34
[ 169.332017] [<c1001d04>] ? cpu_idle+0x43/0x77
[ 169.332017] [<c1007d38>] ? default_idle+0x35/0x54
[ 169.332017] [<c1001d0a>] ? cpu_idle+0x49/0x77
[ 169.332017] [<c11bc5af>] ? rest_init+0x67/0x69
[ 169.332017] [<c12f47e5>] ? start_kernel+0x2ad/0x2b2
[ 169.332017] [<c12f4091>] ? i386_start_kernel+0x91/0x96
[ 169.332017] Code: 2d c1 ff 90 b8 00 00 00 eb 12 f3 90 a1 50 33 2d c1 8b 80 00 c3 ff ff f6 c4 10 75 ee a1 50 33 2d c1 c1 e6 18 89 f2 2d f0 3c 00 00 <89> 10 a1 50 33 2d c1 89 fa 09 da 80 cf 04 83 ff 02 0f 44 d3 2d
[ 169.332017] EIP: [<c1010c1f>] __default_send_IPI_dest_field+0x3f/0x5f SS:ESP 0068:c1c00e2c
[ 169.332017] CR2: 00000000ffffb310
[ 169.332017] ---[ end trace c3b8ebbecd15b86c ]---
[ 169.332017] Kernel panic - not syncing: Fatal exception in interrupt
[ 169.332017] Pid: 0, comm: swapper Tainted: G D 2.6.32 #1
[ 169.332017] Call Trace:
[ 169.332017] [<c11c818b>] ? printk+0xf/0x11
[ 169.332017] [<c11c80d7>] panic+0x43/0xe8
[ 169.332017] [<c11cb054>] oops_end+0x6e/0x7c
[ 169.332017] [<c1015257>] no_context+0x10c/0x116
[ 169.332017] [<c1015395>] __bad_area_nosemaphore+0x134/0x13c
[ 169.332017] [<c10eef3c>] ? __const_udelay+0x2c/0x2e
[ 169.332017] [<c10eefd4>] ? delay_tsc+0x79/0x91
[ 169.332017] [<c1140efb>] ? io_serial_out+0x0/0x15
[ 169.332017] [<c103e8df>] ? trace_hardirqs_off+0xb/0xd
[ 169.332017] [<c1141732>] ? serial8250_console_write+0xda/0xed
[ 169.332017] [<c10153aa>] bad_area_nosemaphore+0xd/0x10
[ 169.332017] [<c11cc1b5>] do_page_fault+0x11f/0x234
[ 169.332017] [<c11cc096>] ? do_page_fault+0x0/0x234
[ 169.332017] [<c11ca7bb>] error_code+0x6b/0x70
[ 169.332017] [<c11cc096>] ? do_page_fault+0x0/0x234
[ 169.332017] [<c1010c1f>] ? __default_send_IPI_dest_field+0x3f/0x5f
[ 169.332017] [<c1010e2d>] default_send_IPI_mask_logical+0x6c/0x8c
[ 169.332017] [<c1010c61>] default_send_IPI_all+0x22/0x60
[ 169.332017] [<c1011087>] arch_trigger_all_cpu_backtrace+0x2d/0x4f
[ 169.332017] [<c113677c>] sysrq_handle_showallcpus+0x8/0xa
[ 169.332017] [<c11365c2>] __handle_sysrq+0x9b/0x112
[ 169.332017] [<c113668e>] handle_sysrq+0x1f/0x21
[ 169.332017] [<c1130102>] kbd_event+0x303/0x570
[ 169.332017] [<c116128f>] input_pass_event+0x57/0x7a
[ 169.332017] [<c11623f6>] input_handle_event+0x35b/0x364
[ 169.332017] [<c11624c1>] input_event+0x4f/0x62
[ 169.332017] [<c11653a1>] atkbd_interrupt+0x454/0x51f
[ 169.332017] [<c115dfae>] serio_interrupt+0x33/0x66
[ 169.332017] [<c115f533>] i8042_interrupt+0x1d6/0x1e7
[ 169.332017] [<c103e815>] ? add_lock_to_list+0x36/0x95
[ 169.332017] [<c11ca18e>] ? _spin_unlock_irqrestore+0x53/0x60
[ 169.332017] [<c104d63b>] handle_IRQ_event+0x1d/0xa2
[ 169.332017] [<c104ec1b>] handle_level_irq+0x64/0xbc
[ 169.332017] [<c104ebb7>] ? handle_level_irq+0x0/0xbc
[ 169.332017] <IRQ> [<c1003e82>] ? do_IRQ+0x45/0x9a
[ 169.332017] [<c1026d6a>] ? __do_softirq+0x0/0x10a
[ 169.332017] [<c100308e>] ? common_interrupt+0x2e/0x34
[ 169.332017] [<c1004747>] ? do_softirq+0x63/0xbc
[ 169.332017] [<c1026d6a>] ? __do_softirq+0x0/0x10a
[ 169.332017] [<c1026dbb>] ? __do_softirq+0x51/0x10a
[ 169.332017] [<c1026d6a>] ? __do_softirq+0x0/0x10a
[ 169.332017] <IRQ> [<c1026afb>] ? irq_exit+0x38/0x7a
[ 169.332017] [<c1003ec1>] ? do_IRQ+0x84/0x9a
[ 169.332017] [<c100308e>] ? common_interrupt+0x2e/0x34
[ 169.332017] [<c1001d04>] ? cpu_idle+0x43/0x77
[ 169.332017] [<c1007d38>] ? default_idle+0x35/0x54
[ 169.332017] [<c1001d0a>] ? cpu_idle+0x49/0x77
[ 169.332017] [<c11bc5af>] ? rest_init+0x67/0x69
[ 169.332017] [<c12f47e5>] ? start_kernel+0x2ad/0x2b2
[ 169.332017] [<c12f4091>] ? i386_start_kernel+0x91/0x96


Attachments:
cap2.txt (32.48 kB)
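
When reading a capture like the one above, two quick checks are where the fault happened and whether any frame in the trace belongs to the wireless driver. The following is a rough sketch, assuming the serial console output has been saved to a file and is fed in on stdin; both function names are hypothetical, and the first helper only understands the 32-bit x86 "EIP is at" line format seen in this log.

```shell
# Sketch only: both helpers are hypothetical and read a saved console log on stdin.

# Print the faulting symbol+offset from the "EIP is at" line of a 32-bit x86 oops.
oops_fault_site() {
    sed -n 's/.*EIP is at \([^ ]*\).*/\1/p' | head -n 1
}

# Count lines that attribute a symbol to MODULE; symbols that live in a module
# are tagged with a trailing "[module]" in kernel backtraces.
oops_module_frames() {
    grep -c "\[$1\]" || true
}
```

Run against the log above, `oops_fault_site` would report the IPI send routine from the SysRq path, and `oops_module_frames rt61pci` would count how many lines carry an rt61pci tag.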

2009-12-09 21:57:54

by Gertjan van Wingerde

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

On 12/09/09 15:41, Alan Stern wrote:
> On Tue, 8 Dec 2009, Gertjan van Wingerde wrote:
>
>> Hmm, the stack overrun seems to be generated as part of the
>> Alt-SysRq-* handling, not of any rt61pci handling.
>>
>> I noticed in the Alt-SysRq-T output that powersaving is enabled. We've
>> seen some strange behaviour on this, so could you disable that with:
>>
>> iwconfig wlan0 power off
>>
>> or run a kernel in which CONFIG_CFG80211_DEFAULT_PS is disabled.
>>
>> Maybe that will fix the problem we are seeing?
>
> It did indeed! Thank you very much. If you need a testbed to figure
> out what's wrong with the power-saving code, just ask.
>

Great!

Thanks for testing and confirming.

I'll disable powersaving for rt2x00 for now.

And then to find the bugs in there before enabling it again :-(

---
Gertjan.

2009-12-09 22:33:03

by Otavio Salvador

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

Hello,

On Wed, Dec 9, 2009 at 7:57 PM, Gertjan van Wingerde
<[email protected]> wrote:
> Thanks for testing and confirming.
>
> I'll disable powersaving for rt2x00 for now.

It would be nice to have it in 2.6.32.y and 2.6.31.y when it is done.

--
Otavio Salvador O.S. Systems
E-mail: [email protected] http://www.ossystems.com.br
Mobile: +55 53 9981-7854 http://projetos.ossystems.com.br

2009-12-07 02:11:36

by Alan Stern

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

On Sun, 6 Dec 2009, Otavio Salvador wrote:

> Hello Alan,
>
> On Sun, Dec 6, 2009 at 8:12 PM, Alan Stern <[email protected]> wrote:
> > When I use 2.6.32 (or indeed 2.6.32-rc1), it works fine up until the
> > point where the wireless card is ifconfig'ed. Then not long afterward
> > (a few seconds to a minute) the system hangs.
>
> Did you test it with 2.6.32? Since 2.6.32-rc1 many fixes have gone
> into the kernel, and it would be nice if you could test against the
> current kernel.

The same thing happens with both 2.6.32-rc1 and 2.6.32. But not with
2.6.31.

Alan Stern


2009-12-07 22:03:24

by Gertjan van Wingerde

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

On 12/06/09 23:12, Alan Stern wrote:
> Sorry for the late notice; I haven't tried running new kernels on this
> computer in a long time.
>
> When I use 2.6.32 (or indeed 2.6.32-rc1), it works fine up until the
> point where the wireless card is ifconfig'ed. Then not long afterward
> (a few seconds to a minute) the system hangs.
>
> # lspci -vv -s d.0
> 00:0d.0 Network controller: RaLink RT2561/RT61 802.11g PCI
> Subsystem: Linksys WMP54G ver 4.1
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 64, Cache Line Size: 64 bytes
> Interrupt: pin A routed to IRQ 11
> Region 0: Memory at cfff0000 (32-bit, non-prefetchable) [size=32K]
> Capabilities: [40] Power Management version 2
> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 PME-Enable- DSel=0 DScale=0 PME-
> Kernel driver in use: rt61pci
> Kernel modules: rt61pci
>
> While the system is hung, Alt-SysRq-keys still work but nothing else
> shows up when I type (using a VT console, not X). However I don't know
> what debugging to do or what to look for.
>
> Under 2.6.31 everything works okay.
>

Hmmm, that's odd. I haven't seen any other reports on this (on full system hangs that is).

It would be very hard to figure out what is going wrong without any form of logging of
an oops or something. Is there any way you can find out if any oops information is present?

What does Alt-SysRq-l tell us? What are the CPUs doing when the system is hung?

---
Gertjan.

2009-12-09 14:41:03

by Alan Stern

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

On Tue, 8 Dec 2009, Gertjan van Wingerde wrote:

> Hmm, the stack overrun seems to be generated as part of the
> Alt-SysRq-* handling, not of any rt61pci handling.
>
> I noticed in the Alt-SysRq-T output that powersaving is enabled. We've
> seen some strange behaviour on this, so could you disable that with:
>
> iwconfig wlan0 power off
>
> or run a kernel in which CONFIG_CFG80211_DEFAULT_PS is disabled.
>
> Maybe that will fix the problem we are seeing?

It did indeed! Thank you very much. If you need a testbed to figure
out what's wrong with the power-saving code, just ask.

Alan Stern


2009-12-06 22:22:52

by Otavio Salvador

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.32

Hello Alan,

On Sun, Dec 6, 2009 at 8:12 PM, Alan Stern <[email protected]> wrote:
> When I use 2.6.32 (or indeed 2.6.32-rc1), it works fine up until the
> point where the wireless card is ifconfig'ed. Then not long afterward
> (a few seconds to a minute) the system hangs.

Did you test it with 2.6.32? Since 2.6.32-rc1 many fixes have gone
into the kernel, and it would be nice if you could test against the
current kernel.

Cheers,

--
Otavio Salvador O.S. Systems
E-mail: [email protected] http://www.ossystems.com.br
Mobile: +55 53 9981-7854 http://projetos.ossystems.com.br

2010-01-11 15:46:47

by Alan Stern

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.33

On Mon, 11 Jan 2010, Johannes Berg wrote:

> Alan,
>
> > The card doesn't work. Although the firmware load appears to
> > succeed, "iwconfig wlan0" says no wireless extensions present.
>
> You need to set CONFIG_CFG80211_WEXT (it is default y).
>
> > The only clue I can offer is that "rmmod rt61pci" hangs in
> > wiphy_unregister() during the wait_event() loop near the start, because
> > rdev->opencount is equal to -1. This suggests a refcounting imbalance
> > of some sort.
>
> Good point, that's a bug that will go away with CONFIG_CFG80211_WEXT,
> I'll send a fix.

Many thanks. I'll test the patch and the config option tonight and
report back tomorrow.

Alan Stern


2010-01-11 14:54:23

by Alan Stern

Subject: REGRESSION for RT2561/RT61 in 2.6.33

Gertjan:

Sorry to bother you again, but I have run across a new bug affecting
my rt2561s card. It appeared in 2.6.33-rc1 and is still present in
-rc3. (This is with CONFIG_CFG80211_DEFAULT_PS disabled, not that it
should make any difference now.)

The card doesn't work. Although the firmware load appears to
succeed, "iwconfig wlan0" says no wireless extensions present.

The only clue I can offer is that "rmmod rt61pci" hangs in
wiphy_unregister() during the wait_event() loop near the start, because
rdev->opencount is equal to -1. This suggests a refcounting imbalance
of some sort.

Alan Stern


2010-01-11 15:11:44

by Johannes Berg

Subject: Re: REGRESSION for RT2561/RT61 in 2.6.33

Alan,

> The card doesn't work. Although the firmware load appears to
> succeed, "iwconfig wlan0" says no wireless extensions present.

You need to set CONFIG_CFG80211_WEXT (it is default y).

> The only clue I can offer is that "rmmod rt61pci" hangs in
> wiphy_unregister() during the wait_event() loop near the start, because
> rdev->opencount is equal to -1. This suggests a refcounting imbalance
> of some sort.

Good point, that's a bug that will go away with CONFIG_CFG80211_WEXT,
I'll send a fix.

johannes

