2009-10-11 17:50:23

by Tomasz Chmielewski

Subject: ath5k AP kernel panic when client uses SCP

I am able to trigger this kernel panic when a client copies data using SCP to another client using the same AP:

client_1 <---wired---> AP <---wireless---> client_2

The panic happens after transferring around 30-40 MB of data.


The AP is stable with normal traffic (HTTP, HTTPS, IMAPS, text SSH).



The AP is an Asus WL-500gP, a MIPS platform, running kernel 2.6.31.1 and hostapd v0.6.9.

I can reproduce the issue reliably.

Let me know if you need more info here.

[67359.700000] ------------[ cut here ]------------
[67359.710000] WARNING: at net/core/dev.c:1566 0x80280890()
[67359.710000] b44: caps=(0x0, 0x0) len=80 data_len=0 ip_summed=1
[67359.720000] Modules linked in: tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conntrack_ipv4 nf_defrag1
[67359.740000] Call Trace:[<8002df58>] 0x8002df58
[67359.750000] [<8001371c>] 0x8001371c
[67359.750000] [<8001371c>] 0x8001371c
[67359.750000] [<8002cfb0>] 0x8002cfb0
[67359.760000] [<80280890>] 0x80280890
[67359.760000] [<8002d018>] 0x8002d018
[67359.770000] [<80280890>] 0x80280890
[67359.770000] [<80280c9c>] 0x80280c9c
[67359.770000] [<80280c20>] 0x80280c20
[67359.780000] [<802980e0>] 0x802980e0
[67359.780000] [<80298078>] 0x80298078
[67359.780000] [<8030dd3c>] 0x8030dd3c
[67359.790000] [<80284fbc>] 0x80284fbc
[67359.790000] [<80284f18>] 0x80284f18
[67359.790000] [<8030dd3c>] 0x8030dd3c
[67359.800000] [<803089d4>] 0x803089d4
[67359.800000] [<8030893c>] 0x8030893c
[67359.800000] [<8030ea74>] 0x8030ea74
[67359.810000] [<8030dd3c>] 0x8030dd3c
[67359.810000] [<8030dd3c>] 0x8030dd3c
[67359.820000] [<802a52f0>] 0x802a52f0
[67359.820000] [<8030893c>] 0x8030893c
[67359.820000] [<8030893c>] 0x8030893c
[67359.830000] [<8030893c>] 0x8030893c
[67359.830000] [<802a5440>] 0x802a5440
[67359.830000] [<8030893c>] 0x8030893c
[67359.840000] [<803089e8>] 0x803089e8
[67359.840000] [<80308a3c>] 0x80308a3c
[67359.840000] [<8030893c>] 0x8030893c
[67359.850000] [<8030dec8>] 0x8030dec8
[67359.850000] [<803089e8>] 0x803089e8
[67359.850000] [<803089e8>] 0x803089e8
[67359.860000] [<8030efa8>] 0x8030efa8
[67359.860000] [<8030ddb4>] 0x8030ddb4
[67359.870000] [<8030ddb4>] 0x8030ddb4
[67359.870000] [<802a52f0>] 0x802a52f0
[67359.870000] [<803089e8>] 0x803089e8
[67359.880000] [<803089e8>] 0x803089e8
[67359.880000] [<802a5440>] 0x802a5440
[67359.880000] [<806a5858>] 0x806a5858
[67359.890000] [<803089e8>] 0x803089e8
[67359.890000] [<80309978>] 0x80309978
[67359.890000] [<80308af4>] 0x80308af4
[67359.900000] [<80309978>] 0x80309978
[67359.900000] [<802a5440>] 0x802a5440
[67359.900000] [<803089e8>] 0x803089e8
[67359.910000] [<80309ae0>] 0x80309ae0
[67359.910000] [<80309978>] 0x80309978
[67359.920000] [<8030e668>] 0x8030e668
[67359.920000] [<8030e60c>] 0x8030e60c
[67359.920000] [<8030e25c>] 0x8030e25c
[67359.930000] [<80309978>] 0x80309978
[67359.930000] [<8030e25c>] 0x8030e25c
[67359.930000] [<802a5440>] 0x802a5440
[67359.940000] [<80014aa0>] 0x80014aa0
[67359.940000] [<8030e25c>] 0x8030e25c
[67359.940000] [<8030f96c>] 0x8030f96c
[67359.950000] [<8079301c>] 0x8079301c
[67359.950000] [<8030e25c>] 0x8030e25c
[67359.950000] [<802a52f0>] 0x802a52f0
[67359.960000] [<80309978>] 0x80309978
[67359.960000] [<80309978>] 0x80309978
[67359.970000] [<802a5440>] 0x802a5440
[67359.970000] [<8008feec>] 0x8008feec
[67359.970000] [<80309978>] 0x80309978
[67359.980000] [<80309d44>] 0x80309d44
[67359.980000] [<8008feec>] 0x8008feec
[67359.980000] [<8008ffdc>] 0x8008ffdc
[67359.990000] [<80276190>] 0x80276190
[67359.990000] [<80309978>] 0x80309978
[67359.990000] [<800900a4>] 0x800900a4
[67360.000000] [<8027fe3c>] 0x8027fe3c
[67360.000000] [<802964f8>] 0x802964f8
[67360.000000] [<8008feec>] 0x8008feec
[67360.010000] [<802828e8>] 0x802828e8
[67360.010000] [<80276190>] 0x80276190
[67360.020000] [<80283b40>] 0x80283b40
[67360.020000] [<80282a94>] 0x80282a94
[67360.020000] [<80033764>] 0x80033764
[67360.030000] [<8005d0f0>] 0x8005d0f0
[67360.030000] [<800543e0>] 0x800543e0
[67360.030000] [<80033874>] 0x80033874
[67360.040000] [<80033d74>] 0x80033d74
[67360.040000] [<80001844>] 0x80001844
[67360.040000] [<80001844>] 0x80001844
[67360.050000] [<80001a60>] 0x80001a60
[67360.050000] [<800149fc>] 0x800149fc
[67360.050000] [<8000efc8>] 0x8000efc8
[67360.060000] [<8000efc8>] 0x8000efc8
[67360.060000] [<8039c9ec>] 0x8039c9ec
[67360.070000] [<8039c9d0>] 0x8039c9d0
[67360.070000] [<8039c110>] 0x8039c110
[67360.070000]
[67360.070000] ---[ end trace 94ff764c3a95abf9 ]---
[67360.080000] Unhandled kernel unaligned access[#1]:
[67360.080000] Cpu 0
[67360.080000] $ 0 : 00000000 1000dc00 00000001 81445a40
[67360.080000] $ 4 : 05f20d4d 00000000 00000001 00000083
[67360.080000] $ 8 : 00000000 00000083 803d0000 ffffffea
[67360.080000] $12 : 803d0000 00000000 00000000 00000000
[67360.080000] $16 : 0000000c 00000001 8099ee20 81c72000
[67360.080000] $20 : 81d17e00 80330090 8030893c 81c72000
[67360.080000] $24 : 00010720 802ced34
[67360.080000] $28 : 8037c000 8037d880 00000010 80276890
[67360.080000] Hi : 00000000
[67360.080000] Lo : 00000000
[67360.080000] epc : 8006f058 0x8006f058
[67360.080000] Tainted: G W
[67360.080000] ra : 80276890 0x80276890
[67360.080000] Status: 1000dc03 KERNEL EXL IE
[67360.080000] Cause : 00800010
[67360.080000] BadVA : 05f20d4d
[67360.080000] PrId : 00029006 (Broadcom BCM3302)
[67360.080000] Modules linked in: tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conntrack_ipv4 nf_defrag1
[67360.080000] Process swapper (pid: 0, threadinfo=8037c000, task=8037e000, tls=00000000)
[67360.080000] Stack : 8099ee20 80276190 00000000 00000000 8099ee20 803d80a8 8099ee20 802761f0
[67360.080000] 81d17e00 803d80a8 8099ee20 81c72000 81d17e00 80280d50 8099ee20 81404f80
[67360.080000] 8037d900 80da8000 80da8000 81d17e00 81d17e00 00000001 80da8000 8099ee20
[67360.080000] 81c72000 00665332 802980e0 80298078 80da8000 00000002 00000000 8030dd3c
[67360.080000] 80da8000 81d17e00 8099ee20 00000000 803d86e0 80000000 80284fbc 80284f18
[67360.080000] ...
[67360.080000] Call Trace:[<80276190>] 0x80276190
[67360.080000] [<802761f0>] 0x802761f0
[67360.080000] [<80280d50>] 0x80280d50
[67360.080000] [<802980e0>] 0x802980e0
[67360.080000] [<80298078>] 0x80298078
[67360.080000] [<8030dd3c>] 0x8030dd3c
[67360.080000] [<80284fbc>] 0x80284fbc
[67360.080000] [<80284f18>] 0x80284f18
[67360.080000] [<8030dd3c>] 0x8030dd3c
[67360.080000] [<803089d4>] 0x803089d4
[67360.080000] [<8030893c>] 0x8030893c
[67360.080000] [<8030ea74>] 0x8030ea74
[67360.080000] [<8030dd3c>] 0x8030dd3c
[67360.080000] [<8030dd3c>] 0x8030dd3c
[67360.080000] [<802a52f0>] 0x802a52f0
[67360.080000] [<8030893c>] 0x8030893c
[67360.080000] [<8030893c>] 0x8030893c
[67360.080000] [<8030893c>] 0x8030893c
[67360.080000] [<802a5440>] 0x802a5440
[67360.080000] [<8030893c>] 0x8030893c
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<80308a3c>] 0x80308a3c
[67360.080000] [<8030893c>] 0x8030893c
[67360.080000] [<8030dec8>] 0x8030dec8
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<8030efa8>] 0x8030efa8
[67360.080000] [<8030ddb4>] 0x8030ddb4
[67360.080000] [<8030ddb4>] 0x8030ddb4
[67360.080000] [<802a52f0>] 0x802a52f0
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<802a5440>] 0x802a5440
[67360.080000] [<806a5858>] 0x806a5858
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<80308af4>] 0x80308af4
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<802a5440>] 0x802a5440
[67360.080000] [<803089e8>] 0x803089e8
[67360.080000] [<80309ae0>] 0x80309ae0
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<8030e668>] 0x8030e668
[67360.080000] [<8030e60c>] 0x8030e60c
[67360.080000] [<8030e25c>] 0x8030e25c
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<8030e25c>] 0x8030e25c
[67360.080000] [<802a5440>] 0x802a5440
[67360.080000] [<80014aa0>] 0x80014aa0
[67360.080000] [<8030e25c>] 0x8030e25c
[67360.080000] [<8030f96c>] 0x8030f96c
[67360.080000] [<8079301c>] 0x8079301c
[67360.080000] [<8030e25c>] 0x8030e25c
[67360.080000] [<802a52f0>] 0x802a52f0
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<802a5440>] 0x802a5440
[67360.080000] [<8008feec>] 0x8008feec
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<80309d44>] 0x80309d44
[67360.080000] [<8008feec>] 0x8008feec
[67360.080000] [<8008ffdc>] 0x8008ffdc
[67360.080000] [<80276190>] 0x80276190
[67360.080000] [<80309978>] 0x80309978
[67360.080000] [<800900a4>] 0x800900a4
[67360.080000] [<8027fe3c>] 0x8027fe3c
[67360.080000] [<802964f8>] 0x802964f8
[67360.080000] [<8008feec>] 0x8008feec
[67360.080000] [<802828e8>] 0x802828e8
[67360.080000] [<80276190>] 0x80276190
[67360.080000] [<80283b40>] 0x80283b40
[67360.080000] [<80282a94>] 0x80282a94
[67360.080000] [<80033764>] 0x80033764
[67360.080000] [<8005d0f0>] 0x8005d0f0
[67360.080000] [<800543e0>] 0x800543e0
[67360.080000] [<80033874>] 0x80033874
[67360.080000] [<80033d74>] 0x80033d74
[67360.080000] [<80001844>] 0x80001844
[67360.080000] [<80001844>] 0x80001844
[67360.080000] [<80001a60>] 0x80001a60
[67360.080000] [<800149fc>] 0x800149fc
[67360.080000] [<8000efc8>] 0x8000efc8
[67360.080000] [<8000efc8>] 0x8000efc8
[67360.080000] [<8039c9ec>] 0x8039c9ec
[67360.080000] [<8039c9d0>] 0x8039c9d0
[67360.080000] [<8039c110>] 0x8039c110
[67360.080000]
[67360.080000]
[67360.080000] Code: 3c048007 08010471 2484f044 <8c820000> 3042c000 10400003 00803821 0801b8d1 00000000
[67360.080000] Disabling lock debugging due to kernel taint
[67360.560000] Kernel panic - not syncing: Fatal exception in interrupt




--
Tomasz Chmielewski
http://wpkg.org


2009-10-23 13:21:30

by Gábor Stefanik

Subject: Re: ath5k AP kernel panic when client uses SCP

On Fri, Oct 23, 2009 at 3:15 PM, Tomasz Chmielewski <[email protected]> wrote:
> Bob Copeland wrote:
>>
>> CONFIG_DEBUG_INFO is the basic switch. I don't know if MIPS needs
>> CONFIG_FRAME_POINTER but that could help too.
>
> I don't see CONFIG_FRAME_POINTER available.
> I compiled with CONFIG_DEBUG_INFO; let me know if I should enable some other
> DEBUG options as well (see below what's enabled and available).
>
> Before it oopses, I see lots of order 0 page allocation failures (there are
> some extra spaces at the end of each line due to the broken konsole in
> KDE4):
>
> http://www1.wpkg.org/oops2.txt
>
>
> Once, the device hanged without producing an oops (with lots of page
> allocation failures before).
>
>
> No clue about disassembling here, sorry.
>
>
> # zgrep DEBUG /proc/config.gz
>
> # CONFIG_PCI_DEBUG is not set
> # CONFIG_NETFILTER_DEBUG is not set
> # CONFIG_NETFILTER_XT_MATCH_LAYER7_DEBUG is not set
> CONFIG_CFG80211_REG_DEBUG=y
> # CONFIG_CFG80211_DEBUGFS is not set
> CONFIG_LIB80211_DEBUG=y
> # CONFIG_MAC80211_DEBUGFS is not set
> # CONFIG_MAC80211_DEBUG_MENU is not set
> # CONFIG_DEBUG_DRIVER is not set
> # CONFIG_DEBUG_DEVRES is not set
> # CONFIG_MTD_DEBUG is not set
> # CONFIG_SCSI_DEBUG is not set
> # CONFIG_DM_DEBUG is not set
> CONFIG_LIBERTAS_DEBUG=y
> CONFIG_ATH5K_DEBUG=y
> CONFIG_ATH9K_DEBUG=y
> CONFIG_IPW2100_DEBUG=y
> CONFIG_IPW2200_DEBUG=y
> CONFIG_LIBIPW_DEBUG=y
> CONFIG_IWLWIFI_DEBUG=y
> CONFIG_B43_DEBUG=y
> CONFIG_B43LEGACY_DEBUG=y
> CONFIG_ZD1211RW_DEBUG=y
> CONFIG_RT2X00_DEBUG=y
> # CONFIG_HISAX_DEBUG is not set
> # CONFIG_SSB_DEBUG is not set
> # CONFIG_USB_DEBUG is not set
> # CONFIG_USB_STORAGE_DEBUG is not set
> # CONFIG_USB_SERIAL_DEBUG is not set
> # CONFIG_JBD_DEBUG is not set
> # CONFIG_JBD2_DEBUG is not set
> # CONFIG_JFS_DEBUG is not set
> # CONFIG_XFS_DEBUG is not set
> # CONFIG_NTFS_DEBUG is not set
> CONFIG_JFFS2_FS_DEBUG=0
> # CONFIG_CIFS_DEBUG2 is not set
> CONFIG_DEBUG_FS=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_DEBUG_SHIRQ=y
> CONFIG_SCHED_DEBUG=y
> # CONFIG_DEBUG_OBJECTS is not set
> # CONFIG_DEBUG_SLAB is not set
> # CONFIG_DEBUG_RT_MUTEXES is not set
> # CONFIG_DEBUG_SPINLOCK is not set
> # CONFIG_DEBUG_MUTEXES is not set
> # CONFIG_DEBUG_LOCK_ALLOC is not set
> # CONFIG_DEBUG_SPINLOCK_SLEEP is not set
> # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
> # CONFIG_DEBUG_KOBJECT is not set
> CONFIG_DEBUG_INFO=y
> # CONFIG_DEBUG_VM is not set
> # CONFIG_DEBUG_WRITECOUNT is not set
> # CONFIG_DEBUG_MEMORY_INIT is not set
> # CONFIG_DEBUG_LIST is not set
> # CONFIG_DEBUG_SG is not set
> # CONFIG_DEBUG_NOTIFIERS is not set
> # CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
> # CONFIG_DYNAMIC_DEBUG is not set
> # CONFIG_DEBUG_STACK_USAGE is not set
> # CONFIG_RUNTIME_DEBUG is not set
>
>
>> Another option in the absence of debug info is disassembling the
>> Code: section of the oops and trying to find the corresponding
>> code via objdump, but this only really works well if you have some
>> idea which module is causing the error.
>
>
>
> --
> Tomasz Chmielewski
> http://wpkg.org
>

Is CONFIG_KALLSYMS set?

--
Vista: [V]iruses, [I]ntruders, [S]pyware, [T]rojans and [A]dware. :-)

2009-10-24 12:52:49

by Bob Copeland

Subject: Re: ath5k AP kernel panic when client uses SCP

>> I'll recompile...
>
> Is this one more useful?

Much! Thanks.

> [ 516.430000] ------------[ cut here ]------------
> [ 516.430000] WARNING: at net/core/dev.c:1566 skb_gso_segment+0x110/0x298()
> [ 516.440000] b44: caps=(0x0, 0x0) len=52 data_len=0 ip_summed=1

So this looks like the ethernet driver b44 instead of ath5k, I think.
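
For reference, the fields of that warning line can be pulled apart mechanically. The sketch below is only an illustration: the helper name parse_gso_warning is made up, and the CHECKSUM_* numbering is what I believe include/linux/skbuff.h uses in this kernel series (check your own tree). If that numbering is right, ip_summed=1 means the skb arrived marked CHECKSUM_UNNECESSARY, whereas, as far as I recall, the check at this WARN expects CHECKSUM_PARTIAL.

```python
import re

# Assumed CHECKSUM_* values (from memory of include/linux/skbuff.h, 2.6.31 era).
CHECKSUM_NAMES = {
    0: "CHECKSUM_NONE",
    1: "CHECKSUM_UNNECESSARY",
    2: "CHECKSUM_COMPLETE",
    3: "CHECKSUM_PARTIAL",
}

# Matches the detail line printed right after the WARNING header.
WARN_RE = re.compile(
    r"(?P<driver>\w+): caps=\((?P<dev_caps>0x[0-9a-f]+), (?P<sk_caps>0x[0-9a-f]+)\) "
    r"len=(?P<len>\d+) data_len=(?P<data_len>\d+) ip_summed=(?P<ip_summed>\d+)"
)


def parse_gso_warning(line):
    """Return the fields of the warning detail line as a dict, or None."""
    m = WARN_RE.search(line)
    if m is None:
        return None
    ip_summed = int(m.group("ip_summed"))
    return {
        "driver": m.group("driver"),
        "dev_caps": int(m.group("dev_caps"), 16),
        "sk_caps": int(m.group("sk_caps"), 16),
        "len": int(m.group("len")),
        "data_len": int(m.group("data_len")),
        "ip_summed": ip_summed,
        "ip_summed_name": CHECKSUM_NAMES.get(ip_summed, "unknown"),
    }


line = "[ 516.440000] b44: caps=(0x0, 0x0) len=52 data_len=0 ip_summed=1"
info = parse_gso_warning(line)
print(info)
```

The driver name in that line is simply the device the skb was attached to, so it does not by itself say which driver corrupted anything.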

> [ 516.450000] Modules linked in: configs aes_generic tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conn1
> [ 516.480000] Call Trace:
> [ 516.480000] [<80013ac4>] dump_stack+0x8/0x34
> [ 516.480000] [<8002f2a0>] warn_slowpath_common+0x70/0x98
> [ 516.490000] [<8002f308>] warn_slowpath_fmt+0x24/0x30
> [ 516.490000] [<801f0398>] skb_gso_segment+0x110/0x298

...but this is higher up the stack. skb_gso_segment is about a year old
so it would be odd if there were a bug here. Can you do the following?

objdump -S net/core/dev.o, then find the address for skb_gso_segment,
it should look something like this (your numbers and disassembly will
differ):

00004190 <skb_gso_segment>:
*
* It may return NULL if the skb requires no segmentation. This is
* only possible when GSO is used for verifying header integrity.
*/
struct sk_buff *skb_gso_segment(struct sk_buff *skb, int features)
{
    4190:       55                      push   %ebp
    4191:       89 e5                   mov    %esp,%ebp
    4193:       57                      push   %edi
    4194:       56                      push   %esi

Then find whatever 0x4190 (your offset here) + 0x110 is, and pick about
10 lines before and after that and paste it -- the opcodes in the second
column should match up with the code lines below:

> [ 516.650000] Code: 3c048007 08010d50 248436fc <8c820000> 3042c000 10400003 00803821 0801ca7f 00000000

Or you can send me the whole objdump -S output off-list if that's easier.
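
The offset arithmetic described above can be sketched as follows. This is only an illustration: 0x4190 is the example start address from the x86 listing above (the real value in a MIPS build will differ), and the helper names parse_symbol_ref and target_offset are made up here.

```python
import re


def parse_symbol_ref(ref):
    """Split 'name+0xOFF/0xSIZE' from an oops into (name, offset, size)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", ref)
    if m is None:
        raise ValueError("not a symbol+offset/size reference: %r" % ref)
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def target_offset(symbol_start, offset_in_symbol):
    """Address to look for in the objdump -S listing."""
    return symbol_start + offset_in_symbol


name, offset, size = parse_symbol_ref("skb_gso_segment+0x110/0x298")

# Start of <skb_gso_segment> as shown by objdump; 0x4190 is only the example
# value from the listing above and must be replaced with your own.
start = 0x4190
target = target_offset(start, offset)
print("%s: look near %#x in the listing" % (name, target))
```

If objdump reports the function at 00000000, the target is simply the offset itself (0x110 here).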

--
Bob Copeland %% http://www.bobcopeland.com


2009-10-24 16:10:12

by Bob Copeland

Subject: Re: ath5k AP kernel panic when client uses SCP

On Sat, Oct 24, 2009 at 05:27:56PM +0200, Tomasz Chmielewski wrote:
> Hmm, I get:
>
> 00000000 <skb_gso_segment>
>
> So I better leave the files here:
>
> http://www1.wpkg.org/skb_gso_segment/

So, do you only get this WARN() once before things go downhill?

> [ 516.440000] b44: caps=(0x0, 0x0) len=52 data_len=0 ip_summed=1

The actual error is on skb_release_data, this might give a clue if
there's some corruption going on.
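
As a rough cross-check, the faulting word from the "Code:" line of the oops (the one in angle brackets) can be decoded by hand. The sketch below assumes the standard MIPS32 I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit signed immediate) and takes the value of $4 from the register dump of the symbolized oops; decode_itype is a made-up helper name.

```python
OPCODE_LW = 0x23  # MIPS32 "load word"


def decode_itype(word):
    """Decode a MIPS I-type instruction into (opcode, rs, rt, signed imm)."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm


faulting = 0x8C820000          # <8c820000> from the Code: line
opcode, base_reg, dest_reg, offset = decode_itype(faulting)

a0 = 0xE0F2FDEA                # $4 in the register dump, same as BadVA
effective = (a0 + offset) & 0xFFFFFFFF
misaligned = effective % 4 != 0

print("opcode=%#x base=$%d dest=$%d offset=%d" % (opcode, base_reg, dest_reg, offset))
print("effective address %#x, misaligned: %s" % (effective, misaligned))
```

Read this way, the first instruction of put_page is a word load through $4, and $4 holds an address that is not 4-byte aligned, which would be consistent with a bad page pointer being handed over by skb_release_data.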

--
Bob Copeland %% http://www.bobcopeland.com


2009-10-12 08:14:04

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Bob Copeland wrote:
> On Sun, Oct 11, 2009 at 1:49 PM, Tomasz Chmielewski <[email protected]> wrote:
>> The AP is Asus WL-500gP, it's a MIPS platform, running 2.6.31.1 kernel,
>> hostapd v0.6.9.
>>
>> I can reproduce the issue reliably.
>>
>> Let me know if you need more info here.
>
> Yes, please -- it would really be helpful if instead of just the addresses
> we have all the function names in the stack trace. Or at least what is
> at 8006f058 and 80276190. We've had a couple of reports of unaligned
> accesses but I haven't yet seen a useful stack trace.

OK, will try to send it later today.


>> [67359.700000] ------------[ cut here ]------------
>> [67359.710000] WARNING: at net/core/dev.c:1566 0x80280890()
>> [67359.710000] b44: caps=(0x0, 0x0) len=80 data_len=0 ip_summed=1
>> [67359.720000] Modules linked in: tun sch_sfq cls_fw sch_htb ipt_MASQUERADE
>> iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent
>> nf_conntrack_ipv4 nf_defrag1
>
> Did you replace the wireless device? I don't see ath5k in the
> above list; IIRC that AP is originally some broadcom chipset.

The AP device comes with a Broadcom wireless card, which doesn't work well.
So I replaced the Broadcom card with an ath5k card.
ath5k and other modules were probably "eaten" by the terminal/minicom/konsole/whatever.
This is the list of modules which should be loaded when the panic occurs:

# lsmod
Module                  Size  Used by
tun                    12160  2
sch_sfq                 4496  1
cls_fw                  3216  1
sch_htb                12480  1
ipt_MASQUERADE           976  1
iptable_nat             2800  1
nf_nat                 12496  2 ipt_MASQUERADE,iptable_nat
xt_MARK                  912  1
iptable_mangle          1008  1
ipt_ULOG                4144  1
xt_recent               5408  9
nf_conntrack_ipv4       4560  7 iptable_nat,nf_nat
nf_defrag_ipv4           624  1 nf_conntrack_ipv4
xt_state                 784  4
nf_conntrack           46640  5 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4,xt_state
xt_tcpudp               1888  5
iptable_filter           768  1
ip_tables               8848  3 iptable_nat,iptable_mangle,iptable_filter
x_tables                9648  8 ipt_MASQUERADE,iptable_nat,xt_MARK,ipt_ULOG,xt_recent,xt_state,xt_tcpudp,ip_tables
ath5k                 131616  0
mac80211              130208  1 ath5k
ath                     6256  1 ath5k
ohci_hcd               17344  0
uhci_hcd               18048  0
cfg80211               72000  3 ath5k,mac80211,ath


>> [67360.080000] Disabling lock debugging due to kernel taint
>> [67360.560000] Kernel panic - not syncing: Fatal exception in interrupt
>
> Why's it tainted? I can't remember and can't check right now if
> TAINT_WARN counts.

Yes, WARN taints the kernel.
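
For illustration, the "Tainted:" line of the oops can be decoded with a small lookup. The letter table below is written from memory of kernel/panic.c in this kernel series and should be treated as an assumption; decode_tainted is a made-up helper name. "G" is not a taint by itself, it only means no proprietary module is loaded.

```python
# Assumed letter-to-flag mapping (from memory of kernel/panic.c, 2.6.31 era).
TAINT_LETTERS = {
    "P": "TAINT_PROPRIETARY_MODULE",
    "F": "TAINT_FORCED_MODULE",
    "S": "TAINT_UNSAFE_SMP",
    "R": "TAINT_FORCED_RMMOD",
    "M": "TAINT_MACHINE_CHECK",
    "B": "TAINT_BAD_PAGE",
    "U": "TAINT_USER",
    "D": "TAINT_DIE",
    "A": "TAINT_OVERRIDDEN_ACPI_TABLE",
    "W": "TAINT_WARN",
    "C": "TAINT_CRAP",
}


def decode_tainted(line):
    """Return the taint flag names found after 'Tainted:' in an oops line."""
    letters = line.partition(":")[2]
    return [TAINT_LETTERS[c] for c in letters if c in TAINT_LETTERS]


flags = decode_tainted("Tainted: G W")
print(flags)
```

With the line from this oops, the only flag found is the one set by the earlier WARNING.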


--
Tomasz Chmielewski
http://wpkg.org

2009-10-23 13:15:25

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Bob Copeland wrote:
>
> CONFIG_DEBUG_INFO is the basic switch. I don't know if MIPS needs
> CONFIG_FRAME_POINTER but that could help too.

I don't see CONFIG_FRAME_POINTER available.
I compiled with CONFIG_DEBUG_INFO; let me know if I should enable some
other DEBUG options as well (see below what's enabled and available).

Before it oopses, I see lots of order 0 page allocation failures (there
are some extra spaces at the end of each line due to the broken konsole
in KDE4):

http://www1.wpkg.org/oops2.txt


Once, the device hung without producing an oops (after lots of page
allocation failures).


No clue about disassembling here, sorry.


# zgrep DEBUG /proc/config.gz

# CONFIG_PCI_DEBUG is not set
# CONFIG_NETFILTER_DEBUG is not set
# CONFIG_NETFILTER_XT_MATCH_LAYER7_DEBUG is not set
CONFIG_CFG80211_REG_DEBUG=y
# CONFIG_CFG80211_DEBUGFS is not set
CONFIG_LIB80211_DEBUG=y
# CONFIG_MAC80211_DEBUGFS is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_MTD_DEBUG is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_DM_DEBUG is not set
CONFIG_LIBERTAS_DEBUG=y
CONFIG_ATH5K_DEBUG=y
CONFIG_ATH9K_DEBUG=y
CONFIG_IPW2100_DEBUG=y
CONFIG_IPW2200_DEBUG=y
CONFIG_LIBIPW_DEBUG=y
CONFIG_IWLWIFI_DEBUG=y
CONFIG_B43_DEBUG=y
CONFIG_B43LEGACY_DEBUG=y
CONFIG_ZD1211RW_DEBUG=y
CONFIG_RT2X00_DEBUG=y
# CONFIG_HISAX_DEBUG is not set
# CONFIG_SSB_DEBUG is not set
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_SERIAL_DEBUG is not set
# CONFIG_JBD_DEBUG is not set
# CONFIG_JBD2_DEBUG is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_XFS_DEBUG is not set
# CONFIG_NTFS_DEBUG is not set
CONFIG_JFFS2_FS_DEBUG=0
# CONFIG_CIFS_DEBUG2 is not set
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_SCHED_DEBUG=y
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_WRITECOUNT is not set
# CONFIG_DEBUG_MEMORY_INIT is not set
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_RUNTIME_DEBUG is not set


> Another option in the absence of debug info is disassembling the
> Code: section of the oops and trying to find the corresponding
> code via objdump, but this only really works well if you have some
> idea which module is causing the error.



--
Tomasz Chmielewski
http://wpkg.org


2009-10-24 15:28:18

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Bob Copeland wrote:

>> [ 516.430000] ------------[ cut here ]------------
>> [ 516.430000] WARNING: at net/core/dev.c:1566 skb_gso_segment+0x110/0x298()
>> [ 516.440000] b44: caps=(0x0, 0x0) len=52 data_len=0 ip_summed=1
>
> So this looks like the ethernet driver b44 instead of ath5k, I think.

Could be, but!

I can trigger it when doing this kind of SSH traffic:

PC >--100Mbit--> b44 (Asus WL-500gP) ath5k >--wireless--> laptop


When I transfer over wired ethernet only through this device, everything
works properly: no page allocation failures, no warnings, no panics.


>> [ 516.450000] Modules linked in: configs aes_generic tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conn1
>> [ 516.480000] Call Trace:
>> [ 516.480000] [<80013ac4>] dump_stack+0x8/0x34
>> [ 516.480000] [<8002f2a0>] warn_slowpath_common+0x70/0x98
>> [ 516.490000] [<8002f308>] warn_slowpath_fmt+0x24/0x30
>> [ 516.490000] [<801f0398>] skb_gso_segment+0x110/0x298
>
> ...but this is higher up the stack. skb_gso_segment is about a year old
> so it would be odd if there were a bug here. Can you do the following?
>
> objdump -S net/core/dev.o, then find the address for skb_gso_segment,
> it should look something like this (your numbers and disassembly will
> differ):
>
> 00004190 <skb_gso_segment>:
> *
> * It may return NULL if the skb requires no segmentation. This is
> * only possible when GSO is used for verifying header integrity.
> */

Hmm, I get:

00000000 <skb_gso_segment>

So I better leave the files here:

http://www1.wpkg.org/skb_gso_segment/

Let me know if you need anything else.


--
Tomasz Chmielewski
http://wpkg.org

2009-10-24 12:00:20

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Tomasz Chmielewski wrote:
> Gábor Stefanik wrote:
>> On Fri, Oct 23, 2009 at 3:15 PM, Tomasz Chmielewski <[email protected]>
>> wrote:
>>> Bob Copeland wrote:
>>>> CONFIG_DEBUG_INFO is the basic switch. I don't know if MIPS needs
>>>> CONFIG_FRAME_POINTER but that could help too.
>>> I don't see CONFIG_FRAME_POINTER available.
>>> I compiled with CONFIG_DEBUG_INFO; let me know if I should enable
>>> some other
>>> DEBUG options as well (see below what's enabled and available).
>>>
>>> Before it oopses, I see lots of order 0 page allocation failures
>>> (there are
>>> some extra spaces at the end of each line due to the broken konsole in
>>> KDE4):
>>>
>>> http://www1.wpkg.org/oops2.txt
>
>> Is CONFIG_KALLSYMS set?
>
> Good clue:
>
> # zgrep KALL /proc/config.gz
> # CONFIG_KALLSYMS is not set
>
> I'll recompile...

Is this one more useful?

[ 516.430000] ------------[ cut here ]------------
[ 516.430000] WARNING: at net/core/dev.c:1566 skb_gso_segment+0x110/0x298()
[ 516.440000] b44: caps=(0x0, 0x0) len=52 data_len=0 ip_summed=1
[ 516.450000] Modules linked in: configs aes_generic tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conn1
[ 516.480000] Call Trace:
[ 516.480000] [<80013ac4>] dump_stack+0x8/0x34
[ 516.480000] [<8002f2a0>] warn_slowpath_common+0x70/0x98
[ 516.490000] [<8002f308>] warn_slowpath_fmt+0x24/0x30
[ 516.490000] [<801f0398>] skb_gso_segment+0x110/0x298
[ 516.500000] [<801f0728>] dev_hard_start_xmit+0x208/0x364
[ 516.500000] [<80207c5c>] __qdisc_run+0x108/0x2f0
[ 516.510000] [<801f4ad8>] dev_queue_xmit+0x25c/0x3a8
[ 516.510000] [<802785c8>] br_dev_queue_push_xmit+0x98/0xac
[ 516.520000] [<8027e66c>] br_nf_post_routing+0x23c/0x25c
[ 516.520000] [<80214e78>] nf_iterate+0x70/0xf8
[ 516.530000] [<80214fc8>] nf_hook_slow+0x88/0x12c
[ 516.530000] [<80278630>] br_forward_finish+0x54/0x80
[ 516.540000] [<8027dac0>] br_nf_forward_finish+0x114/0x130
[ 516.540000] [<8027eba0>] br_nf_forward_ip+0x300/0x33c
[ 516.550000] [<80214e78>] nf_iterate+0x70/0xf8
[ 516.550000] [<80214fc8>] nf_hook_slow+0x88/0x12c
[ 516.560000] [<802786e8>] __br_forward+0x8c/0xe0
[ 516.560000] [<802796d4>] br_handle_frame_finish+0x168/0x1ac
[ 516.570000] [<8027e260>] br_nf_pre_routing_finish+0x40c/0x450
[ 516.580000] [<8027f564>] br_nf_pre_routing+0x824/0x86c
[ 516.580000] [<80214e78>] nf_iterate+0x70/0xf8
[ 516.580000] [<80214fc8>] nf_hook_slow+0x88/0x12c
[ 516.590000] [<80279938>] br_handle_frame+0x220/0x26c
[ 516.590000] [<801ef944>] netif_receive_skb+0x604/0x7b8
[ 516.600000] [<801f23f0>] process_backlog+0xc8/0x138
[ 516.600000] [<801f365c>] net_rx_action+0xe8/0x26c
[ 516.610000] [<80035a84>] __do_softirq+0xac/0x160
[ 516.610000] [<80035b94>] do_softirq+0x5c/0x94
[ 516.620000] [<80036094>] irq_exit+0x40/0x8c
[ 516.620000] [<80001844>] ret_from_irq+0x0/0x4
[ 516.630000] [<800ab12c>] core_sys_select+0x1d0/0x2d0
[ 516.630000] [<800ab870>] sys_select+0xe4/0x12c
[ 516.640000] [<800035f0>] stack_done+0x20/0x3c
[ 516.640000]
[ 516.640000] ---[ end trace b84c10674d0cf224 ]---
[ 516.650000] Unhandled kernel unaligned access[#1]:
[ 516.650000] Cpu 0
[ 516.650000] $ 0 : 00000000 1000dc00 00000001 81f97a40
[ 516.650000] $ 4 : e0f2fdea 00000000 00000001 00000083
[ 516.650000] $ 8 : 00000000 00000083 80380000 ffffffea
[ 516.650000] $12 : 80380000 00000030 00000020 00000000
[ 516.650000] $16 : 0000000c 00000001 80dfa8e0 81c67000
[ 516.650000] $20 : 81ccda00 8029ac50 80278530 81c67000
[ 516.650000] $24 : 00000204 8023e8ec
[ 516.650000] $28 : 81c14000 81c15750 00000005 801e6398
[ 516.650000] Hi : 00000000
[ 516.650000] Lo : 00000000
[ 516.650000] epc : 80073710 put_page+0x0/0x314
[ 516.650000] Tainted: G W
[ 516.650000] ra : 801e6398 skb_release_data+0xf4/0x160
[ 516.650000] Status: 1000dc03 KERNEL EXL IE
[ 516.650000] Cause : 00800010
[ 516.650000] BadVA : e0f2fdea
[ 516.650000] PrId : 00029006 (Broadcom BCM3302)
[ 516.650000] Modules linked in: configs aes_generic tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conn1
[ 516.650000] Process init (pid: 1, threadinfo=81c14000, task=81c13938, tls=2aada4c0)
[ 516.650000] Stack : 80dfa8e0 801e5c98 00000000 00000000 80dfa8e0 8038b698 80dfa8e0 801e5cf8
[ 516.650000] 81ccda00 8038b698 80dfa8e0 81c67000 81ccda00 801f0858 80dfa8e0 81a40680
[ 516.650000] 81c157d0 81dc6800 81dc6800 81ccda00 81ccda00 00000001 81dc6800 80dfa8e0
[ 516.650000] 81c67000 0000548b 80207c5c 80207bf4 81dc6800 00000002 00000000 8027d934
[ 516.650000] 81dc6800 81ccda00 80dfa8e0 00000000 8038bcd0 80000000 801f4ad8 801f4a34
[ 516.650000] ...
[ 516.650000] Call Trace:
[ 516.650000] [<80073710>] put_page+0x0/0x314
[ 516.650000] [<801e6398>] skb_release_data+0xf4/0x160
[ 516.650000] [<801e5cf8>] __kfree_skb+0x14/0x1c0
[ 516.650000] [<801f0858>] dev_hard_start_xmit+0x338/0x364
[ 516.650000] [<80207c5c>] __qdisc_run+0x108/0x2f0
[ 516.650000] [<801f4ad8>] dev_queue_xmit+0x25c/0x3a8
[ 516.650000] [<802785c8>] br_dev_queue_push_xmit+0x98/0xac
[ 516.650000] [<8027e66c>] br_nf_post_routing+0x23c/0x25c
[ 516.650000] [<80214e78>] nf_iterate+0x70/0xf8
[ 516.650000] [<80214fc8>] nf_hook_slow+0x88/0x12c
[ 516.650000] [<80278630>] br_forward_finish+0x54/0x80
[ 516.650000] [<8027dac0>] br_nf_forward_finish+0x114/0x130
[ 516.650000] [<8027eba0>] br_nf_forward_ip+0x300/0x33c
[ 516.650000] [<80214e78>] nf_iterate+0x70/0xf8
[ 516.650000] [<80214fc8>] nf_hook_slow+0x88/0x12c
[ 516.650000] [<802786e8>] __br_forward+0x8c/0xe0
[ 516.650000] [<802796d4>] br_handle_frame_finish+0x168/0x1ac
[ 516.650000] [<8027e260>] br_nf_pre_routing_finish+0x40c/0x450
[ 516.650000] [<8027f564>] br_nf_pre_routing+0x824/0x86c
[ 516.650000] [<80214e78>] nf_iterate+0x70/0xf8
[ 516.650000] [<80214fc8>] nf_hook_slow+0x88/0x12c
[ 516.650000] [<80279938>] br_handle_frame+0x220/0x26c
[ 516.650000] [<801ef944>] netif_receive_skb+0x604/0x7b8
[ 516.650000] [<801f23f0>] process_backlog+0xc8/0x138
[ 516.650000] [<801f365c>] net_rx_action+0xe8/0x26c
[ 516.650000] [<80035a84>] __do_softirq+0xac/0x160
[ 516.650000] [<80035b94>] do_softirq+0x5c/0x94
[ 516.650000] [<80036094>] irq_exit+0x40/0x8c
[ 516.650000] [<80001844>] ret_from_irq+0x0/0x4
[ 516.650000] [<800ab12c>] core_sys_select+0x1d0/0x2d0
[ 516.650000] [<800ab870>] sys_select+0xe4/0x12c
[ 516.650000] [<800035f0>] stack_done+0x20/0x3c
[ 516.650000]
[ 516.650000]
[ 516.650000] Code: 3c048007 08010d50 248436fc <8c820000> 3042c000 10400003 00803821 0801ca7f 00000000
[ 516.650000] Disabling lock debugging due to kernel taint
[ 516.990000] Kernel panic - not syncing: Fatal exception in interrupt



The "Modules linked in" line above was cut off by the terminal; the full list is:

Modules linked in: tun sch_sfq cls_fw sch_htb ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables ath5k mac80211 ath ohci_hcd cfg80211 uhci_hcd

--
Tomasz Chmielewski
http://wpkg.org

2009-10-11 22:48:10

by Bob Copeland

Subject: Re: ath5k AP kernel panic when client uses SCP

On Sun, Oct 11, 2009 at 1:49 PM, Tomasz Chmielewski <[email protected]> wrote:
> The AP is Asus WL-500gP, it's a MIPS platform, running 2.6.31.1 kernel,
> hostapd v0.6.9.
>
> I can reproduce the issue reliably.
>
> Let me know if you need more info here.

Yes, please -- it would really be helpful if instead of just the addresses
we have all the function names in the stack trace. Or at least what is
at 8006f058 and 80276190. We've had a couple of reports of unaligned
accesses but I haven't yet seen a useful stack trace.
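
If rebuilding with symbol support is not possible, raw addresses can in principle be resolved by hand against the System.map of the exact same build. The following is a minimal sketch of that lookup, not a tested tool: the helper names are made up, and the sample map entries are hypothetical, back-computed from a symbolized trace of a different build, so they will not resolve 8006f058 or 80276190 from the oops above.

```python
import bisect


def load_system_map(text):
    """Parse 'address type name' lines into a sorted list of (addr, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoff' for the closest symbol at or below address."""
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return "%s+%#x" % (name, address - base)


# Hypothetical excerpt; a real System.map comes from the kernel build tree.
SAMPLE_MAP = """\
80073710 T put_page
801e5ce4 T __kfree_skb
801e62a4 T skb_release_data
801f0288 T skb_gso_segment
"""

symbols = load_system_map(SAMPLE_MAP)
print(resolve(symbols, 0x801E6398))
print(resolve(symbols, 0x801F0398))
```

The lookup only gives meaningful names when the map and the running kernel come from the same build; otherwise the nearest symbol is just a guess.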

> [67359.700000] ------------[ cut here ]------------
> [67359.710000] WARNING: at net/core/dev.c:1566 0x80280890()
> [67359.710000] b44: caps=(0x0, 0x0) len=80 data_len=0 ip_summed=1
> [67359.720000] Modules linked in: tun sch_sfq cls_fw sch_htb ipt_MASQUERADE
> iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent
> nf_conntrack_ipv4 nf_defrag1

Did you replace the wireless device? I don't see ath5k in the
above list; IIRC that AP is originally some broadcom chipset.

> [67360.080000] Disabling lock debugging due to kernel taint
> [67360.560000] Kernel panic - not syncing: Fatal exception in interrupt

Why's it tainted? I can't remember and can't check right now if
TAINT_WARN counts.

--
Bob Copeland %% http://www.bobcopeland.com

2009-10-11 17:56:28

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Added linux-net as I used a wrong address in the original mail.


> I am able to trigger this kernel panic when a client copies data using
> SCP to another client using the same AP:
>
> client_1 <---wired---> AP <---wireless---> client_2
>
> The panic happens after transferring around 30-40 MB of data.
>
>
> The AP behaves stable with normal traffic (HTTP, HTTPS, IMAPS, text SSH).
>
>
>
> The AP is Asus WL-500gP, it's a MIPS platform, running 2.6.31.1 kernel,
> hostapd v0.6.9.
>
> I can reproduce the issue reliably.
>
> Let me know if you need more info here.
>
> [67359.700000] ------------[ cut here ]------------
> [67359.710000] WARNING: at net/core/dev.c:1566 0x80280890()
> [67359.710000] b44: caps=(0x0, 0x0) len=80 data_len=0 ip_summed=1
> [67359.720000] Modules linked in: tun sch_sfq cls_fw sch_htb
> ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG
> xt_recent nf_conntrack_ipv4 nf_defrag1
> [67359.740000] Call Trace:[<8002df58>] 0x8002df58
> [67359.750000] [<8001371c>] 0x8001371c
> [67359.750000] [<8001371c>] 0x8001371c
> [67359.750000] [<8002cfb0>] 0x8002cfb0
> [67359.760000] [<80280890>] 0x80280890
> [67359.760000] [<8002d018>] 0x8002d018
> [67359.770000] [<80280890>] 0x80280890
> [67359.770000] [<80280c9c>] 0x80280c9c
> [67359.770000] [<80280c20>] 0x80280c20
> [67359.780000] [<802980e0>] 0x802980e0
> [67359.780000] [<80298078>] 0x80298078
> [67359.780000] [<8030dd3c>] 0x8030dd3c
> [67359.790000] [<80284fbc>] 0x80284fbc
> [67359.790000] [<80284f18>] 0x80284f18
> [67359.790000] [<8030dd3c>] 0x8030dd3c
> [67359.800000] [<803089d4>] 0x803089d4
> [67359.800000] [<8030893c>] 0x8030893c
> [67359.800000] [<8030ea74>] 0x8030ea74
> [67359.810000] [<8030dd3c>] 0x8030dd3c
> [67359.810000] [<8030dd3c>] 0x8030dd3c
> [67359.820000] [<802a52f0>] 0x802a52f0
> [67359.820000] [<8030893c>] 0x8030893c
> [67359.820000] [<8030893c>] 0x8030893c
> [67359.830000] [<8030893c>] 0x8030893c
> [67359.830000] [<802a5440>] 0x802a5440
> [67359.830000] [<8030893c>] 0x8030893c
> [67359.840000] [<803089e8>] 0x803089e8
> [67359.840000] [<80308a3c>] 0x80308a3c
> [67359.840000] [<8030893c>] 0x8030893c
> [67359.850000] [<8030dec8>] 0x8030dec8
> [67359.850000] [<803089e8>] 0x803089e8
> [67359.850000] [<803089e8>] 0x803089e8
> [67359.860000] [<8030efa8>] 0x8030efa8
> [67359.860000] [<8030ddb4>] 0x8030ddb4
> [67359.870000] [<8030ddb4>] 0x8030ddb4
> [67359.870000] [<802a52f0>] 0x802a52f0
> [67359.870000] [<803089e8>] 0x803089e8
> [67359.880000] [<803089e8>] 0x803089e8
> [67359.880000] [<802a5440>] 0x802a5440
> [67359.880000] [<806a5858>] 0x806a5858
> [67359.890000] [<803089e8>] 0x803089e8
> [67359.890000] [<80309978>] 0x80309978
> [67359.890000] [<80308af4>] 0x80308af4
> [67359.900000] [<80309978>] 0x80309978
> [67359.900000] [<802a5440>] 0x802a5440
> [67359.900000] [<803089e8>] 0x803089e8
> [67359.910000] [<80309ae0>] 0x80309ae0
> [67359.910000] [<80309978>] 0x80309978
> [67359.920000] [<8030e668>] 0x8030e668
> [67359.920000] [<8030e60c>] 0x8030e60c
> [67359.920000] [<8030e25c>] 0x8030e25c
> [67359.930000] [<80309978>] 0x80309978
> [67359.930000] [<8030e25c>] 0x8030e25c
> [67359.930000] [<802a5440>] 0x802a5440
> [67359.940000] [<80014aa0>] 0x80014aa0
> [67359.940000] [<8030e25c>] 0x8030e25c
> [67359.940000] [<8030f96c>] 0x8030f96c
> [67359.950000] [<8079301c>] 0x8079301c
> [67359.950000] [<8030e25c>] 0x8030e25c
> [67359.950000] [<802a52f0>] 0x802a52f0
> [67359.960000] [<80309978>] 0x80309978
> [67359.960000] [<80309978>] 0x80309978
> [67359.970000] [<802a5440>] 0x802a5440
> [67359.970000] [<8008feec>] 0x8008feec
> [67359.970000] [<80309978>] 0x80309978
> [67359.980000] [<80309d44>] 0x80309d44
> [67359.980000] [<8008feec>] 0x8008feec
> [67359.980000] [<8008ffdc>] 0x8008ffdc
> [67359.990000] [<80276190>] 0x80276190
> [67359.990000] [<80309978>] 0x80309978
> [67359.990000] [<800900a4>] 0x800900a4
> [67360.000000] [<8027fe3c>] 0x8027fe3c
> [67360.000000] [<802964f8>] 0x802964f8
> [67360.000000] [<8008feec>] 0x8008feec
> [67360.010000] [<802828e8>] 0x802828e8
> [67360.010000] [<80276190>] 0x80276190
> [67360.020000] [<80283b40>] 0x80283b40
> [67360.020000] [<80282a94>] 0x80282a94
> [67360.020000] [<80033764>] 0x80033764
> [67360.030000] [<8005d0f0>] 0x8005d0f0
> [67360.030000] [<800543e0>] 0x800543e0
> [67360.030000] [<80033874>] 0x80033874
> [67360.040000] [<80033d74>] 0x80033d74
> [67360.040000] [<80001844>] 0x80001844
> [67360.040000] [<80001844>] 0x80001844
> [67360.050000] [<80001a60>] 0x80001a60
> [67360.050000] [<800149fc>] 0x800149fc
> [67360.050000] [<8000efc8>] 0x8000efc8
> [67360.060000] [<8000efc8>] 0x8000efc8
> [67360.060000] [<8039c9ec>] 0x8039c9ec
> [67360.070000] [<8039c9d0>] 0x8039c9d0
> [67360.070000] [<8039c110>] 0x8039c110
> [67360.070000]
> [67360.070000] ---[ end trace 94ff764c3a95abf9 ]---
> [67360.080000] Unhandled kernel unaligned access[#1]:
> [67360.080000] Cpu 0
> [67360.080000] $ 0 : 00000000 1000dc00 00000001 81445a40
> [67360.080000] $ 4 : 05f20d4d 00000000 00000001 00000083
> [67360.080000] $ 8 : 00000000 00000083 803d0000 ffffffea
> [67360.080000] $12 : 803d0000 00000000 00000000 00000000
> [67360.080000] $16 : 0000000c 00000001 8099ee20 81c72000
> [67360.080000] $20 : 81d17e00 80330090 8030893c 81c72000
> [67360.080000] $24 : 00010720 802ced34
> [67360.080000] $28 : 8037c000 8037d880 00000010 80276890
> [67360.080000] Hi : 00000000
> [67360.080000] Lo : 00000000
> [67360.080000] epc : 8006f058 0x8006f058
> [67360.080000] Tainted: G W
> [67360.080000] ra : 80276890 0x80276890
> [67360.080000] Status: 1000dc03 KERNEL EXL IE
> [67360.080000] Cause : 00800010
> [67360.080000] BadVA : 05f20d4d
> [67360.080000] PrId : 00029006 (Broadcom BCM3302)
> [67360.080000] Modules linked in: tun sch_sfq cls_fw sch_htb
> ipt_MASQUERADE iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG
> xt_recent nf_conntrack_ipv4 nf_defrag1
> [67360.080000] Process swapper (pid: 0, threadinfo=8037c000,
> task=8037e000, tls=00000000)
> [67360.080000] Stack : 8099ee20 80276190 00000000 00000000 8099ee20
> 803d80a8 8099ee20 802761f0
> [67360.080000] 81d17e00 803d80a8 8099ee20 81c72000 81d17e00
> 80280d50 8099ee20 81404f80
> [67360.080000] 8037d900 80da8000 80da8000 81d17e00 81d17e00
> 00000001 80da8000 8099ee20
> [67360.080000] 81c72000 00665332 802980e0 80298078 80da8000
> 00000002 00000000 8030dd3c
> [67360.080000] 80da8000 81d17e00 8099ee20 00000000 803d86e0
> 80000000 80284fbc 80284f18
> [67360.080000] ...
> [67360.080000] Call Trace:[<80276190>] 0x80276190
> [67360.080000] [<802761f0>] 0x802761f0
> [67360.080000] [<80280d50>] 0x80280d50
> [67360.080000] [<802980e0>] 0x802980e0
> [67360.080000] [<80298078>] 0x80298078
> [67360.080000] [<8030dd3c>] 0x8030dd3c
> [67360.080000] [<80284fbc>] 0x80284fbc
> [67360.080000] [<80284f18>] 0x80284f18
> [67360.080000] [<8030dd3c>] 0x8030dd3c
> [67360.080000] [<803089d4>] 0x803089d4
> [67360.080000] [<8030893c>] 0x8030893c
> [67360.080000] [<8030ea74>] 0x8030ea74
> [67360.080000] [<8030dd3c>] 0x8030dd3c
> [67360.080000] [<8030dd3c>] 0x8030dd3c
> [67360.080000] [<802a52f0>] 0x802a52f0
> [67360.080000] [<8030893c>] 0x8030893c
> [67360.080000] [<8030893c>] 0x8030893c
> [67360.080000] [<8030893c>] 0x8030893c
> [67360.080000] [<802a5440>] 0x802a5440
> [67360.080000] [<8030893c>] 0x8030893c
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<80308a3c>] 0x80308a3c
> [67360.080000] [<8030893c>] 0x8030893c
> [67360.080000] [<8030dec8>] 0x8030dec8
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<8030efa8>] 0x8030efa8
> [67360.080000] [<8030ddb4>] 0x8030ddb4
> [67360.080000] [<8030ddb4>] 0x8030ddb4
> [67360.080000] [<802a52f0>] 0x802a52f0
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<802a5440>] 0x802a5440
> [67360.080000] [<806a5858>] 0x806a5858
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<80308af4>] 0x80308af4
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<802a5440>] 0x802a5440
> [67360.080000] [<803089e8>] 0x803089e8
> [67360.080000] [<80309ae0>] 0x80309ae0
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<8030e668>] 0x8030e668
> [67360.080000] [<8030e60c>] 0x8030e60c
> [67360.080000] [<8030e25c>] 0x8030e25c
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<8030e25c>] 0x8030e25c
> [67360.080000] [<802a5440>] 0x802a5440
> [67360.080000] [<80014aa0>] 0x80014aa0
> [67360.080000] [<8030e25c>] 0x8030e25c
> [67360.080000] [<8030f96c>] 0x8030f96c
> [67360.080000] [<8079301c>] 0x8079301c
> [67360.080000] [<8030e25c>] 0x8030e25c
> [67360.080000] [<802a52f0>] 0x802a52f0
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<802a5440>] 0x802a5440
> [67360.080000] [<8008feec>] 0x8008feec
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<80309d44>] 0x80309d44
> [67360.080000] [<8008feec>] 0x8008feec
> [67360.080000] [<8008ffdc>] 0x8008ffdc
> [67360.080000] [<80276190>] 0x80276190
> [67360.080000] [<80309978>] 0x80309978
> [67360.080000] [<800900a4>] 0x800900a4
> [67360.080000] [<8027fe3c>] 0x8027fe3c
> [67360.080000] [<802964f8>] 0x802964f8
> [67360.080000] [<8008feec>] 0x8008feec
> [67360.080000] [<802828e8>] 0x802828e8
> [67360.080000] [<80276190>] 0x80276190
> [67360.080000] [<80283b40>] 0x80283b40
> [67360.080000] [<80282a94>] 0x80282a94
> [67360.080000] [<80033764>] 0x80033764
> [67360.080000] [<8005d0f0>] 0x8005d0f0
> [67360.080000] [<800543e0>] 0x800543e0
> [67360.080000] [<80033874>] 0x80033874
> [67360.080000] [<80033d74>] 0x80033d74
> [67360.080000] [<80001844>] 0x80001844
> [67360.080000] [<80001844>] 0x80001844
> [67360.080000] [<80001a60>] 0x80001a60
> [67360.080000] [<800149fc>] 0x800149fc
> [67360.080000] [<8000efc8>] 0x8000efc8
> [67360.080000] [<8000efc8>] 0x8000efc8
> [67360.080000] [<8039c9ec>] 0x8039c9ec
> [67360.080000] [<8039c9d0>] 0x8039c9d0
> [67360.080000] [<8039c110>] 0x8039c110
> [67360.080000]
> [67360.080000]
> [67360.080000] Code: 3c048007 08010471 2484f044 <8c820000> 3042c000
> 10400003 00803821 0801b8d1 00000000
> [67360.080000] Disabling lock debugging due to kernel taint
> [67360.560000] Kernel panic - not syncing: Fatal exception in interrupt
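
As a rough reading aid for the dump above, the word marked `<8c820000>` in the Code: line can be decoded as a MIPS I-type instruction and checked against the register values. This is only a sketch under the standard MIPS32 encoding; the variable names are illustrative.

```python
def decode_mips_itype(word):
    """Split a 32-bit MIPS I-type instruction into opcode, rs, rt and signed imm."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit immediate
        imm -= 0x10000
    return opcode, rs, rt, imm

LW_OPCODE = 0x23              # MIPS32 "lw" (load word)

faulting_word = 0x8C820000    # the <...> word from the Code: line
reg4 = 0x05F20D4D             # $4 from the register dump (same value as BadVA)

opcode, rs, rt, imm = decode_mips_itype(faulting_word)
effective = reg4 + imm        # address the load tried to read

print("is lw:", opcode == LW_OPCODE, "base reg:", rs, "dest reg:", rt, "imm:", imm)
print("effective address:", hex(effective), "mod 4 =", effective % 4)
```

The word decodes as a load through $4, and the resulting address is not 4-byte aligned, which is consistent with the reported unaligned access. The address is also outside the 0x80000000 range of every other pointer in the dump, so the pointer itself looks suspect rather than merely misaligned.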


--
Tomasz Chmielewski
http://wpkg.org


2009-10-23 13:24:05

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Gábor Stefanik wrote:
> On Fri, Oct 23, 2009 at 3:15 PM, Tomasz Chmielewski wrote:
>> Bob Copeland wrote:
>>> CONFIG_DEBUG_INFO is the basic switch. I don't know if MIPS needs
>>> CONFIG_FRAME_POINTER but that could help too.
>> I don't see CONFIG_FRAME_POINTER available.
>> I compiled with CONFIG_DEBUG_INFO; let me know if I should enable some other
>> DEBUG options as well (see below what's enabled and available).
>>
>> Before it oopses, I see lots of order 0 page allocation failures (there are
>> some extra spaces at the end of each line due to the broken konsole in
>> KDE4):
>>
>> http://www1.wpkg.org/oops2.txt

> Is CONFIG_KALLSYMS set?

Good clue:

# zgrep KALL /proc/config.gz
# CONFIG_KALLSYMS is not set

I'll recompile...
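
The same check can be scripted for several options at once. The following is a sketch only: it assumes the kernel exposes its configuration at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC), and the `sample` text is a made-up excerpt that mirrors the state described above.

```python
import gzip

def parse_kernel_config(text):
    """Map option name -> value ("y", "m", ...) or None for "is not set"."""
    options = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options

def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return parse_kernel_config(f.read())

# Illustrative excerpt, not the reporter's full config.
sample = """\
CONFIG_DEBUG_INFO=y
# CONFIG_KALLSYMS is not set
"""

opts = parse_kernel_config(sample)
for name in ("CONFIG_DEBUG_INFO", "CONFIG_KALLSYMS", "CONFIG_FRAME_POINTER"):
    state = opts.get(name, "not available")
    print(name, "->", "not set" if state is None else state)
```

An option that does not appear at all (as CONFIG_FRAME_POINTER did not here) is reported separately from one that is present but disabled.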


--
Tomasz Chmielewski
http://wpkg.org


2009-10-25 20:19:35

by Bob Copeland

Subject: Re: ath5k AP kernel panic when client uses SCP

On Sat, Oct 24, 2009 at 2:10 PM, Tomasz Chmielewski wrote:
> Bob Copeland wrote:

> Here is everything the system said after being powered on, including all
> page allocation failures and the final oops:
>
> http://www1.wpkg.org/skb_gso_segment/minicom.txt
>
> (and it looks slightly different than the last time).

In this case an order-0 allocation failed inside ieee80211_skb_resize,
which then called dev_kfree_skb() on the original skb. Off the top
of my head, it's unclear why the subsequent put_page would cause a fault.

This is a little out of my area so take it with a grain of salt :) but
in general order-0 allocation failures are bad news -- it couldn't get
a single free page, and that happened in under 3 minutes. Maybe poke
around in /proc/slabinfo to see where the memory went?
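
One way to do that poking, sketched under the assumption that /proc/slabinfo uses the usual "name active_objs num_objs objsize ..." column layout. The cache names and numbers in `sample` are invented for illustration, and num_objs * objsize is only a rough estimate of the memory each cache holds.

```python
def top_slab_caches(text, limit=5):
    """Rank slab caches by approximate size (num_objs * objsize), largest first."""
    rows = []
    for line in text.splitlines():
        if line.startswith("slabinfo") or line.startswith("#"):
            continue  # skip the version line and the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        name, num_objs, objsize = fields[0], int(fields[2]), int(fields[3])
        rows.append((num_objs * objsize, name))
    rows.sort(reverse=True)
    return rows[:limit]

# Hypothetical sample; on a live system read the text from /proc/slabinfo
# (this may require root on some kernels).
sample = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
example_cache_a 900 1000 2048 2 1
example_cache_b 50 60 192 20 1
example_cache_c 4000 4000 128 30 1
"""

for size, name in top_slab_caches(sample):
    print(f"{name:20s} {size / 1024:10.1f} KiB")
```

Taking two snapshots a minute apart while the SCP transfer runs and comparing the top entries should show which cache is growing.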

--
Bob Copeland %% http://www.bobcopeland.com

2009-10-24 18:10:57

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Bob Copeland wrote:
> On Sat, Oct 24, 2009 at 05:27:56PM +0200, Tomasz Chmielewski wrote:
>> Hmm, I get:
>>
>> 00000000 <skb_gso_segment>
>>
>> So I better leave the files here:
>>
>> http://www1.wpkg.org/skb_gso_segment/
>
> So, do you only get this WARN() once before things go downhill?

Here is everything the system said after being powered on, including all
page allocation failures and the final oops:

http://www1.wpkg.org/skb_gso_segment/minicom.txt

(and it looks slightly different than the last time).
--
Tomasz Chmielewski
http://wpkg.org

2009-10-12 08:13:58

by Tomasz Chmielewski

Subject: Re: ath5k AP kernel panic when client uses SCP

Bob Copeland wrote:
> On Sun, Oct 11, 2009 at 1:49 PM, Tomasz Chmielewski wrote:
>> The AP is Asus WL-500gP, it's a MIPS platform, running 2.6.31.1 kernel,
>> hostapd v0.6.9.
>>
>> I can reproduce the issue reliably.
>>
>> Let me know if you need more info here.
>
> Yes, please -- it would really be helpful if instead of just the addresses
> we have all the function names in the stack trace. Or at least what is
> at 8006f058 and 80276190. We've had a couple of reports of unaligned
> accesses but I haven't yet seen a useful stack trace.

OK, will try to send it later today.


>> [67359.700000] ------------[ cut here ]------------
>> [67359.710000] WARNING: at net/core/dev.c:1566 0x80280890()
>> [67359.710000] b44: caps=(0x0, 0x0) len=80 data_len=0 ip_summed=1
>> [67359.720000] Modules linked in: tun sch_sfq cls_fw sch_htb ipt_MASQUERADE
>> iptable_nat nf_nat xt_MARK iptable_mangle ipt_ULOG xt_recent
>> nf_conntrack_ipv4 nf_defrag1
>
> Did you replace the wireless device? I don't see ath5k in the
> above list; IIRC that AP is originally some broadcom chipset.

The AP ships with a Broadcom wireless card, which doesn't work well,
so I replaced it with an ath5k card.
ath5k and other modules were probably "eaten" by the terminal/minicom/konsole/whatever.
This is the list of modules which should be loaded when the panic occurs:

# lsmod
Module                  Size  Used by
tun                    12160  2
sch_sfq                 4496  1
cls_fw                  3216  1
sch_htb                12480  1
ipt_MASQUERADE           976  1
iptable_nat             2800  1
nf_nat                 12496  2 ipt_MASQUERADE,iptable_nat
xt_MARK                  912  1
iptable_mangle          1008  1
ipt_ULOG                4144  1
xt_recent               5408  9
nf_conntrack_ipv4       4560  7 iptable_nat,nf_nat
nf_defrag_ipv4           624  1 nf_conntrack_ipv4
xt_state                 784  4
nf_conntrack           46640  5 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4,xt_state
xt_tcpudp               1888  5
iptable_filter           768  1
ip_tables               8848  3 iptable_nat,iptable_mangle,iptable_filter
x_tables                9648  8 ipt_MASQUERADE,iptable_nat,xt_MARK,ipt_ULOG,xt_recent,xt_state,xt_tcpudp,ip_tables
ath5k                 131616  0
mac80211              130208  1 ath5k
ath                     6256  1 ath5k
ohci_hcd               17344  0
uhci_hcd               18048  0
cfg80211               72000  3 ath5k,mac80211,ath


>> [67360.080000] Disabling lock debugging due to kernel taint
>> [67360.560000] Kernel panic - not syncing: Fatal exception in interrupt
>
> Why's it tainted? I can't remember and can't check right now if
> TAINT_WARN counts.

Yes, WARN taints the kernel.
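
For reference, the "Tainted: G W" line can be cross-checked against the taint bitmask in /proc/sys/kernel/tainted. The sketch below assumes the bit-to-letter table used by kernels of that era; later kernels define more bits, so treat the table as an assumption and check it against the documentation for the kernel in use.

```python
# (bit, letter, meaning); an assumed table covering bits 0-10 only.
TAINT_FLAGS = [
    (0, "P", "proprietary module loaded"),
    (1, "F", "module was force-loaded"),
    (2, "S", "SMP with CPUs not designed for SMP"),
    (3, "R", "module was force-unloaded"),
    (4, "M", "machine check exception occurred"),
    (5, "B", "bad page state"),
    (6, "U", "taint requested by userspace"),
    (7, "D", "kernel died recently (oops or BUG)"),
    (8, "A", "ACPI table overridden"),
    (9, "W", "a warning (WARN) was issued"),
    (10, "C", "staging driver loaded"),
]

def decode_taint(mask):
    """Return (letter, meaning) for every taint bit set in mask."""
    return [(letter, meaning) for bit, letter, meaning in TAINT_FLAGS
            if mask & (1 << bit)]

# A mask with only the WARN bit set, matching the "W" in the oops.
# "G" is printed when bit 0 (proprietary module) is clear.
warn_only = 1 << 9
for letter, meaning in decode_taint(warn_only):
    print(letter, "-", meaning)
```

On a running system the mask would be read with `int(open("/proc/sys/kernel/tainted").read())`.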


--
Tomasz Chmielewski
http://wpkg.org