2016-11-23 19:31:45

by Jason Cooper

Subject: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

All,

I have a Ubiquiti SR-71 mini-pcie ath9k card in a Globalscale Mirabox
board (Marvell Armada 370 SoC). Every day or so I get a consistent
crash that brings down the whole board. I've attached three oopses I
captured on the serial port.

I looked at the commits from v4.8.6 to v4.9-rc6, and nothing jumped out
at me as "this would fix it". And since it takes a day or so to trigger
the oops, bisecting would be a bit brutal. Does anyone have any insight
into this?

thx,

Jason.

------- oops from v4.2.8 ------------------------------------------
[ 3572.897994] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[ 3572.906134] pgd = c0004000
[ 3572.908891] [00000020] *pgd=00000000
[ 3572.912504] Internal error: Oops: 5 [#1] SMP ARM
[ 3572.917142] Modules linked in: tun ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge ipv6 ath9k ath9k_common ath9k_hw led_class ath
[ 3572.930749] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.2.8 #57
[ 3572.937915] Hardware name: Marvell Armada 370/XP (Device Tree)
[ 3572.943774] task: c06fdd30 ti: c06f8000 task.ti: c06f8000
[ 3572.949208] PC is at ath_cmn_process_fft+0xac/0x4a0 [ath9k_common]
[ 3572.955421] LR is at ath_cmn_process_fft+0xc8/0x4a0 [ath9k_common]
[ 3572.961631] pc : [<bf081f44>] lr : [<bf081f60>] psr: 80000153
[ 3572.961631] sp : c06f9ca0 ip : 00000000 fp : 00000000
[ 3572.973160] r10: dc9a8010 r9 : 00000000 r8 : c06fa4d0
[ 3572.978409] r7 : 0000006c r6 : dd3abfc0 r5 : c06fad88 r4 : 00000000
[ 3572.984965] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000
[ 3572.991522] Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment kernel
[ 3572.998951] Control: 10c5387d Table: 1c894019 DAC: 00000015
[ 3573.004723] Process swapper/0 (pid: 0, stack limit = 0xc06f8220)
[ 3573.010756] Stack: (0xc06f9ca0 to 0xc06fa000)
[ 3573.015137] 9ca0: c0734e31 dfbe3f40 c06f6f40 c06f6f40 c06fae34 c0734184 00989680 00000003
[ 3573.023356] 9cc0: dff6cde0 00000008 c06f9df4 00000069 dca0ab8c c0098b44 00000985 00000fc0
[ 3573.031573] 9ce0: c0733700 dffa0560 00000002 c001b6ac 000000c0 00000000 00000000 00000000
[ 3573.039790] 9d00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3573.048006] 9d20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3573.056222] 9d40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 3573.064438] 9d60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 dc9a8010
[ 3573.072654] 9d80: 00000000 dc983d18 dc983d00 dc983d00 00000002 dd045c90 dca08d40 bf090e64
[ 3573.080872] 9da0: d364463c 00000000 edf3c980 0000033f 0000033f 000001fa dc9a8010 00000000
[ 3573.089090] 9dc0: d3644890 00000000 dc9a8038 dca083e0 dd3abfc0 00000000 dd3abfc0 dca097ac
[ 3573.097306] 9de0: dc9a8038 00000002 00000001 dca083e0 00000100 d364463c 2602006c 018eff80
[ 3573.105522] 9e00: 80808000 00808080 20000001 00000000 00000000 00000000 00000000 00000000
[ 3573.113739] 9e20: 00000000 00000000 c06f9e68 dca08d40 00000001 dc9a8010 dca09708 c06f9e68
[ 3573.121955] 9e40: c06f4294 c06f8000 00000006 bf08e698 bf08e5dc dca096d0 dca096d4 00000000
[ 3573.130172] 9e60: c0735bc0 c0027e4c 00000000 c06fa098 00000100 c06f8000 c06f9e88 40000006
[ 3573.138389] 9e80: c06fa080 c0028030 ed5b3300 0000033f c06fa080 c06f4308 0000000a c0735bc0
[ 3573.146606] 9ea0: c06fa100 0004fe8e c051e380 00200000 c06f8000 c06f5444 00000000 00000000
[ 3573.154823] 9ec0: 00000001 df405000 c06fa484 c0759f80 c06f8000 c00283d8 c06f5444 c005b838
[ 3573.163041] 9ee0: c0341814 00000001 c0759f80 c06f9f20 000003ff c0009410 c034180c c0341814
[ 3573.171257] 9f00: 80000153 ffffffff c06f9f54 ed5b20c4 0000033f ed5b20c4 c06f8000 c0013340
[ 3573.179474] 9f20: 00000000 fffffffa 1f4ed000 dfbe3f40 ecf52c5b 0000033f dfbe34d0 00000001
[ 3573.187691] 9f40: ed5b20c4 0000033f ed5b20c4 c06f8000 0000001a c06f9f68 c034180c c0341814
[ 3573.195908] 9f60: 80000153 ffffffff 00000000 00000000 ed5b20c4 0000033f dfbe34d0 c051e374
[ 3573.204125] 9f80: dfbe34d0 c072a608 c06f64c8 c06fa51c c06f4364 c06fa524 c06f8000 c00535ec
[ 3573.212343] 9fa0: c06f9fa0 c0734dd1 dfffce00 ffffffff dfffce00 c06b5c6c ffffffff ffffffff
[ 3573.220559] 9fc0: 00000000 c06b5670 00000000 c06ea990 00000000 c0735294 c06fa4c0 c06ea98c
[ 3573.228777] 9fe0: c06fee2c 00004059 561f5811 00000000 00000000 0000807c 00000000 00000000
[ 3573.237025] [<bf081f44>] (ath_cmn_process_fft [ath9k_common]) from [<bf090e64>] (ath_rx_tasklet+0xa14/0xafc [ath9k])
[ 3573.247613] [<bf090e64>] (ath_rx_tasklet [ath9k]) from [<bf08e698>] (ath9k_tasklet+0xbc/0x20c [ath9k])
[ 3573.256981] [<bf08e698>] (ath9k_tasklet [ath9k]) from [<c0027e4c>] (tasklet_action+0x7c/0x110)
[ 3573.265641] [<c0027e4c>] (tasklet_action) from [<c0028030>] (__do_softirq+0xf8/0x23c)
[ 3573.273513] [<c0028030>] (__do_softirq) from [<c00283d8>] (irq_exit+0x78/0xb0)
[ 3573.280778] [<c00283d8>] (irq_exit) from [<c005b838>] (__handle_domain_irq+0x60/0xb0)
[ 3573.288651] [<c005b838>] (__handle_domain_irq) from [<c0009410>] (armada_370_xp_handle_irq+0x50/0xbc)
[ 3573.297917] [<c0009410>] (armada_370_xp_handle_irq) from [<c0013340>] (__irq_svc+0x40/0x54)
[ 3573.306306] Exception stack(0xc06f9f20 to 0xc06f9f68)
[ 3573.311385] 9f20: 00000000 fffffffa 1f4ed000 dfbe3f40 ecf52c5b 0000033f dfbe34d0 00000001
[ 3573.319602] 9f40: ed5b20c4 0000033f ed5b20c4 c06f8000 0000001a c06f9f68 c034180c c0341814
[ 3573.327816] 9f60: 80000153 ffffffff
[ 3573.331331] [<c0013340>] (__irq_svc) from [<c0341814>] (cpuidle_enter_state+0xdc/0x248)
[ 3573.339385] [<c0341814>] (cpuidle_enter_state) from [<c00535ec>] (cpu_startup_entry+0x170/0x234)
[ 3573.348218] [<c00535ec>] (cpu_startup_entry) from [<c06b5c6c>] (start_kernel+0x3b0/0x3bc)
[ 3573.356436] Code: e59c9004 e3e0b000 e5938000 ea000002 (e7990102)
[ 3573.362634] ---[ end trace 47b32564cd3160db ]---
[ 3573.367281] Kernel panic - not syncing: Fatal exception in interrupt
[ 3573.373667] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
----------------------------------------------------------------------

------- oops from v4.8.6 ---------------------------------------------
[231233.388543] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[231233.396804] pgd = c0004000
[231233.399614] [00000020] *pgd=00000000
[231233.403311] Internal error: Oops: 17 [#1] SMP ARM
[231233.408124] Modules linked in: ath9k ath9k_common ath9k_hw ath
[231233.414132] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
[231233.420165] Hardware name: Marvell Armada 370/XP (Device Tree)
[231233.426111] task: c0b091c0 task.stack: c0b00000
[231233.430760] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
[231233.437060] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
[231233.443357] pc : [<bf07bec4>] lr : [<bf07bee8>] psr: 80000153
[231233.443357] sp : c0b01cd0 ip : 00000000 fp : 00000000
[231233.455059] r10: c0b034d4 r9 : 00000069 r8 : 0000006c
[231233.460394] r7 : 00000000 r6 : dbef9440 r5 : c0b03da0 r4 : 00000000
[231233.467037] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000
[231233.473682] Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
[231233.481023] Control: 10c5387d Table: 1e614019 DAC: 00000051
[231233.486882] Process swapper/0 (pid: 0, stack limit = 0xc0b00220)
[231233.493001] Stack: (0xc0b01cd0 to 0xc0b02000)
[231233.497466] 1cc0: fffffffa 3b9ac9ff 003c8995 c0b44d30
[231233.505770] 1ce0: dda18010 00000000 c0b01e14 c017ad48 ffffffff dee1ac84 c0b44d30 c0b44b00
[231233.514073] 1d00: 0000099e 00000440 0001bef9 c01176f4 00000002 00000786 00000000 00000000
[231233.522377] 1d20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[231233.530680] 1d40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[231233.538984] 1d60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[231233.547287] 1d80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[231233.555591] 1da0: dda18010 dda18010 00000000 defaa3d8 defaa3c0 defaa3c0 dee18e40 de5b4a18
[231233.563895] 1dc0: dda18010 bf08a9d0 071a3710 00000056 00989680 00000000 071a393a 00000056
[231233.572199] 1de0: 000001fb dee18440 00000000 dbef9440 dee198b0 00000014 dda18038 dee1a97c
[231233.580504] 1e00: dbef9440 00000002 00000001 dee18440 0036f478 071a3710 2422006c 018fff07
[231233.588807] 1e20: 80252900 01168e85 40000000 00000000 00000000 00000000 00000000 00000000
[231233.597110] 1e40: 00000000 00000000 c0a4a2c0 dee18e40 dda18010 dee19808 00000001 c0b00000
[231233.605414] 1e60: c0a4a2c0 c0b02080 40000006 bf088810 dee197d0 dee197d4 00000000 c0b01e88
[231233.613717] 1e80: c0b00000 c0126b88 00000000 00000006 c0b00000 c0b02098 c0b02080 00000100
[231233.622021] 1ea0: c0b02080 c0126d74 df405000 c0b01f30 c0b01ea8 c0b3bf80 0000000a 0160605b
[231233.630324] 1ec0: c0b02100 00200100 df488a20 c0a4d500 00000000 00000000 00000001 df405000
[231233.638629] 1ee0: c0b01f30 00000000 c0b03524 c0127120 c0a4d500 c01635e4 c04b7318 20000153
[231233.646933] 1f00: c0b61040 00000001 c0b61040 c010145c c04b7318 20000153 ffffffff c0b01f64
[231233.655236] 1f20: 0028571c c0b00000 00000000 c010bd8c 00000000 0000d24e 1f194000 dfbe31c0
[231233.663540] 1f40: 37d25f82 0000d24e dfbe2590 00000001 0028571c 0000d24e 00000000 c0b03524
[231233.671843] 1f60: 0d5e8652 c0b01f80 c04b7310 c04b7318 20000153 ffffffff 00000051 00000000
[231233.680148] 1f80: dfbe2590 c0b00000 c0b034d4 00000001 dfbe2590 c0b2e070 c0a4e588 c0b0352c
[231233.688452] 1fa0: c0b03524 c015a240 c0b01fa8 c0b3b0f4 00000000 ffffffff 00000000 c0a00c54
[231233.696756] 1fc0: ffffffff ffffffff 00000000 c0a0068c 00000000 c0a3ba28 c0b3b614 c0b034c0
[231233.705059] 1fe0: c0a3ba24 c0b0a3a8 00004059 561f5811 00000000 0000807c 00000000 00000000
[231233.713401] [<bf07bec4>] (ath_cmn_process_fft [ath9k_common]) from [<bf08a9d0>] (ath_rx_tasklet+0x47c/0xb20 [ath9k])
[231233.724079] [<bf08a9d0>] (ath_rx_tasklet [ath9k]) from [<bf088810>] (ath9k_tasklet+0x1dc/0x218 [ath9k])
[231233.733623] [<bf088810>] (ath9k_tasklet [ath9k]) from [<c0126b88>] (tasklet_action+0x74/0x110)
[231233.742367] [<c0126b88>] (tasklet_action) from [<c0126d74>] (__do_softirq+0xfc/0x218)
[231233.750324] [<c0126d74>] (__do_softirq) from [<c0127120>] (irq_exit+0x7c/0xb4)
[231233.757678] [<c0127120>] (irq_exit) from [<c01635e4>] (__handle_domain_irq+0x60/0xb4)
[231233.765637] [<c01635e4>] (__handle_domain_irq) from [<c010145c>] (armada_370_xp_handle_irq+0x48/0xa8)
[231233.774989] [<c010145c>] (armada_370_xp_handle_irq) from [<c010bd8c>] (__irq_svc+0x6c/0x90)
[231233.783463] Exception stack(0xc0b01f30 to 0xc0b01f78)
[231233.788625] 1f20: 00000000 0000d24e 1f194000 dfbe31c0
[231233.796930] 1f40: 37d25f82 0000d24e dfbe2590 00000001 0028571c 0000d24e 00000000 c0b03524
[231233.805232] 1f60: 0d5e8652 c0b01f80 c04b7310 c04b7318 20000153 ffffffff
[231233.811974] [<c010bd8c>] (__irq_svc) from [<c04b7318>] (cpuidle_enter_state+0x18c/0x2b4)
[231233.820202] [<c04b7318>] (cpuidle_enter_state) from [<c015a240>] (cpu_startup_entry+0x17c/0x220)
[231233.829124] [<c015a240>] (cpu_startup_entry) from [<c0a00c54>] (start_kernel+0x368/0x374)
[231233.837430] Code: e5933000 e1d330b4 e58d3030 ea000002 (e7970102)
[231233.843714] ---[ end trace 0a5139f91be0a117 ]---
[231233.848460] Kernel panic - not syncing: Fatal exception in interrupt
[231233.854937] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
----------------------------------------------------------------------

------- oops from v4.8.6 #2 ------------------------------------------
[42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[42059.311799] pgd = c0004000
[42059.314522] [00000020] *pgd=00000000
[42059.318162] Internal error: Oops: 17 [#1] SMP ARM
[42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
[42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
[42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
[42059.340613] task: c0b091c0 task.stack: c0b00000
[42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
[42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
[42059.357598] pc : [<bf07bec4>] lr : [<bf07bee8>] psr: 80000153
[42059.357598] sp : c0b01cd0 ip : 00000000 fp : 00000000
[42059.369127] r10: c0b034d4 r9 : 00000069 r8 : 0000006c
[42059.374374] r7 : 00000000 r6 : dcfbd340 r5 : c0b03da0 r4 : 00000000
[42059.380930] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000
[42059.387487] Flags: Nzcv IRQs on FIQs off Mode SVC_32 ISA ARM Segment none
[42059.394741] Control: 10c5387d Table: 1ef48019 DAC: 00000051
[42059.400513] Process swapper/0 (pid: 0, stack limit = 0xc0b00220)
[42059.406545] Stack: (0xc0b01cd0 to 0xc0b02000)
[42059.410924] 1cc0: fffffffa 3b9ac9ff 0047c6ce c0b44d30
[42059.419141] 1ce0: de6a0010 00000000 c0b01e14 c017ad48 ffffffff dee0ac84 c0b44d30 c0b44b00
[42059.427357] 1d00: 0000099e 00000340 0001cfbd c01176f4 00000002 00000786 00000080 00000000
[42059.435573] 1d20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[42059.443789] 1d40: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[42059.452005] 1d60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[42059.460220] 1d80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[42059.468437] 1da0: de6a0010 de6a0010 00000000 deec5b58 deec5b40 deec5b40 dee08e40 de631ff0
[42059.476653] 1dc0: de6a0010 bf08a9d0 8918a8c8 00000002 00000000 00000000 8918aaa1 00000002
[42059.484869] 1de0: 00000200 dee08440 00000000 dcfbd340 dee098b0 00000014 de6a0038 dee0a97c
[42059.493085] 1e00: dcfbd340 00000002 00000001 dee08440 00fc2540 8918a8c8 0502006c 018e026c
[42059.501301] 1e20: 80202600 016d5ab7 40000100 00000000 00000000 00000000 00000000 00000000
[42059.509518] 1e40: 00000000 00000000 c0a4a2c0 dee08e40 de6a0010 dee09808 00000001 c0b00000
[42059.517734] 1e60: c0a4a2c0 c0b02080 40000006 bf088810 dee097d0 dee097d4 00000000 c0b01e88
[42059.525950] 1e80: c0b00000 c0126b88 00000000 00000006 c0b00000 c0b02098 c0b02080 00000100
[42059.534166] 1ea0: c0b02080 c0126d74 df405000 c0b01f30 c0b01ea8 c0b3bf80 0000000a 003fb83a
[42059.542382] 1ec0: c0b02100 00200100 df486a20 c0a4d500 00000000 00000000 00000001 df405000
[42059.550599] 1ee0: c0b01f30 00000000 c0b03524 c0127120 c0a4d500 c01635e4 c04b7318 20000153
[42059.558816] 1f00: c0b61040 00000001 c0b61040 c010145c c04b7318 20000153 ffffffff c0b01f64
[42059.567031] 1f20: 006895c7 c0b00000 00000000 c010bd8c 00000000 00002640 1f194000 dfbe31c0
[42059.575248] 1f40: b1708859 00002640 dfbe2590 00000001 006895c7 00002640 00000000 c0b03524
[42059.583464] 1f60: 0d5e8652 c0b01f80 c04b7310 c04b7318 20000153 ffffffff 00000051 00000000
[42059.591681] 1f80: dfbe2590 c0b00000 c0b034d4 00000001 dfbe2590 c0b2e070 c0a4e588 c0b0352c
[42059.599897] 1fa0: c0b03524 c015a240 c0b01fa8 c0b3b0f4 00000000 ffffffff 00000000 c0a00c54
[42059.608114] 1fc0: ffffffff ffffffff 00000000 c0a0068c 00000000 c0a3ba28 c0b3b614 c0b034c0
[42059.616330] 1fe0: c0a3ba24 c0b0a3a8 00004059 561f5811 00000000 0000807c 00000000 00000000
[42059.624586] [<bf07bec4>] (ath_cmn_process_fft [ath9k_common]) from [<bf08a9d0>] (ath_rx_tasklet+0x47c/0xb20 [ath9k])
[42059.635176] [<bf08a9d0>] (ath_rx_tasklet [ath9k]) from [<bf088810>] (ath9k_tasklet+0x1dc/0x218 [ath9k])
[42059.644631] [<bf088810>] (ath9k_tasklet [ath9k]) from [<c0126b88>] (tasklet_action+0x74/0x110)
[42059.653288] [<c0126b88>] (tasklet_action) from [<c0126d74>] (__do_softirq+0xfc/0x218)
[42059.661157] [<c0126d74>] (__do_softirq) from [<c0127120>] (irq_exit+0x7c/0xb4)
[42059.668423] [<c0127120>] (irq_exit) from [<c01635e4>] (__handle_domain_irq+0x60/0xb4)
[42059.676295] [<c01635e4>] (__handle_domain_irq) from [<c010145c>] (armada_370_xp_handle_irq+0x48/0xa8)
[42059.685559] [<c010145c>] (armada_370_xp_handle_irq) from [<c010bd8c>] (__irq_svc+0x6c/0x90)
[42059.693946] Exception stack(0xc0b01f30 to 0xc0b01f78)
[42059.699021] 1f20: 00000000 00002640 1f194000 dfbe31c0
[42059.707237] 1f40: b1708859 00002640 dfbe2590 00000001 006895c7 00002640 00000000 c0b03524
[42059.715453] 1f60: 0d5e8652 c0b01f80 c04b7310 c04b7318 20000153 ffffffff
[42059.722107] [<c010bd8c>] (__irq_svc) from [<c04b7318>] (cpuidle_enter_state+0x18c/0x2b4)
[42059.730247] [<c04b7318>] (cpuidle_enter_state) from [<c015a240>] (cpu_startup_entry+0x17c/0x220)
[42059.739081] [<c015a240>] (cpu_startup_entry) from [<c0a00c54>] (start_kernel+0x368/0x374)
[42059.747299] Code: e5933000 e1d330b4 e58d3030 ea000002 (e7970102)
[42059.753488] ---[ end trace d9b5665c4c165fb1 ]---
[42059.758151] Kernel panic - not syncing: Fatal exception in interrupt
[42059.764541] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
----------------------------------------------------------------------


2016-11-24 12:28:31

by Jason Cooper

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

On Thu, Nov 24, 2016 at 02:06:57PM +0800, [email protected] wrote:
>
> >>Okay, so i was 0, so running UP probably isn't going to help. r7 is
> >>also spec_priv->rfs_chan_spec_scan.
> >>
> >>So, I think the question is... how is this NULL - and has it always
> >>been NULL...
> >
> >The problem appears to be that ath_cmn_process_fft() isn't called that
> >often. When it is, it crashes in ath_cmn_is_fft_buf_full() because
> >spec_priv->rfs_chan_spec_scan is NULL when ATH9K_DEBUGFS=n. :-(
> >
> >I'm running with ATH9K_DEBUGFS=y now. If it goes a couple of days
> >without crashing, I'll gin up a patch.
> >
>
> A similar patch was applied to ath-next branch:
> https://patchwork.kernel.org/patch/9431163/.

Hmm. Ok, I'm giving it a spin on my board with SMP=y and ATH9K_DEBUGFS=n
(so the only change from the known-crashing setup is the patch), and
we'll see how it goes.

Honestly, though, I think the real problem shows up when kernels are
built without ATH9K_DEBUGFS. Did the reporter of the crash say whether
it was enabled on his system?

I'm concerned that there may be other code lurking that secretly depends
on ATH9K_DEBUGFS being enabled.
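To illustrate the concern, here is a minimal C sketch of the pattern. It is not the ath9k code and not the patchwork fix: it models a pointer that is only populated when a debug option creates the relay channel, and a consumer that checks it before indexing into it. The types and model_is_fft_buf_full() are hypothetical stand-ins (the kernel's struct rchan and relay_buf_full() look different), and the online CPU set is passed in as a plain bitmask.

```c
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical stand-in for a per-CPU relay buffer. */
struct model_rchan_buf {
	int full;                        /* stands in for relay_buf_full() */
};

/* Hypothetical stand-in for a relay channel: one buffer slot per CPU. */
struct model_rchan {
	struct model_rchan_buf *buf[MODEL_NR_CPUS];
};

/* Hypothetical stand-in for the spectral-scan private data. */
struct model_spec_priv {
	/* Stays NULL when the debugfs relay channel is never created. */
	struct model_rchan *rfs_chan_spec_scan;
};

/*
 * Returns 1 if the caller should skip the sample: either there is no
 * channel to deliver it to, or every online CPU's buffer is full.
 */
static int model_is_fft_buf_full(const struct model_spec_priv *spec_priv,
				 unsigned int online_mask)
{
	const struct model_rchan *rc = spec_priv->rfs_chan_spec_scan;
	int ret = 0, online = 0;

	if (rc == NULL)                  /* channel never created: skip */
		return 1;

	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		if (!(online_mask & (1u << i)))
			continue;            /* only online CPUs are visited */
		online++;
		ret += rc->buf[i]->full;
	}
	return ret == online;
}
```

Treating a missing channel as "full" is a choice made in this sketch: the caller then drops the sample, since there is nowhere to deliver it. Returning 0 instead would let the caller reach a send path that would need its own NULL check.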

thx,

Jason.

2016-11-23 21:17:53

by Russell King (Oracle)

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

On Wed, Nov 23, 2016 at 08:59:17PM +0000, Jason Cooper wrote:
> As requested on irc:

Thanks.

> 7f0: ea000002 b 800 <ath_cmn_process_fft+0xac>
> 7f4: e7970102 ldr r0, [r7, r2, lsl #2]
> 7f8: ebfffffe bl 0 <relay_buf_full>
> 7fc: e0844000 add r4, r4, r0
> 800: e300a000 movw sl, #0
> 804: e28b2001 add r2, fp, #1
> 808: e340a000 movt sl, #0
> 80c: e3a01004 mov r1, #4
> 810: e1a0000a mov r0, sl
> 814: ebfffffe bl 0 <_find_next_bit_le>
> 818: e5953000 ldr r3, [r5]
> 81c: e1500003 cmp r0, r3
> 820: e1a0b000 mov fp, r0
> 824: e2802008 add r2, r0, #8
> 828: bafffff1 blt 7f4 <ath_cmn_process_fft+0xa0>

Okay, so i was 0, so running UP probably isn't going to help. r7 is
also spec_priv->rfs_chan_spec_scan.

So, I think the question is... how is this NULL - and has it always
been NULL...
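This reading can be checked with plain arithmetic. In the excerpt, "add r2, r0, #8" sets r2 to the CPU index plus 8, and "ldr r0, [r7, r2, lsl #2]" loads from r7 + (r2 << 2). With r7 = 0 and r2 = 8 (CPU 0), that is 0x20, the faulting address in all three oopses. The v4.2.8 oops shows the same picture with r9 as the base register (r9 = 0, r2 = 8). The sketch below assumes 32-bit addresses and 4 KiB pages; the helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on this 32-bit ARM kernel. */
#define MODEL_PAGE_SIZE 4096u

/* Effective address of a register-offset load such as
 * "ldr r0, [r7, r2, lsl #2]": base + (index << shift), wrapping at 32 bits. */
static uint32_t ldr_lsl_addr(uint32_t base, uint32_t index, unsigned int shift)
{
	return base + (index << shift);
}

/* Address the loop reads for a given CPU: the listing sets r2 = cpu + 8,
 * so the slot lives 8 words (0x20 bytes) past the base pointer. */
static uint32_t buf_slot_addr(uint32_t base, uint32_t cpu)
{
	return ldr_lsl_addr(base, cpu + 8, 2);
}

/* The ARM kernel fault path reports any address below one page as a
 * "NULL pointer dereference" rather than a "paging request". */
static int reported_as_null_deref(uint32_t addr)
{
	return addr < MODEL_PAGE_SIZE;
}
```

With a NULL base and CPU 0, buf_slot_addr() yields 0x20, which falls below one page, so the kernel labels it a NULL pointer dereference even though the address itself is not zero.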

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

2016-11-23 20:21:28

by Jason Cooper

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

Hi Russell,

On Wed, Nov 23, 2016 at 07:51:20PM +0000, Russell King - ARM Linux wrote:
> On Wed, Nov 23, 2016 at 07:15:39PM +0000, Jason Cooper wrote:
> > ------- oops from v4.8.6 #2 ------------------------------------------
> > [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> > [42059.311799] pgd = c0004000
> > [42059.314522] [00000020] *pgd=00000000
> > [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> > [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> > [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> > [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> > [42059.340613] task: c0b091c0 task.stack: c0b00000
> > [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> > [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> > [42059.357598] pc : [<bf07bec4>] lr : [<bf07bee8>] psr: 80000153
> > [42059.357598] sp : c0b01cd0 ip : 00000000 fp : 00000000
> > [42059.369127] r10: c0b034d4 r9 : 00000069 r8 : 0000006c
> > [42059.374374] r7 : 00000000 r6 : dcfbd340 r5 : c0b03da0 r4 : 00000000
> > [42059.380930] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000
>
> Well, the good news is that it's reproducible.
>
> It looks like it could be this:
>
> static int
> ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
> {
> for_each_online_cpu(i)
> ret += relay_buf_full(rc->buf[i]);

Ahhh, my config has NR_CPUS=4, but this SoC is uniprocessor. I'm going
to give it a go with SMP=no. This config is a lightly modified
mvebu_v7_defconfig; however, NR_CPUS isn't set in mvebu_v7_defconfig,
only in multi_v7_defconfig.

I suspect the logic ath9k uses to set up the relay buffer(s) doesn't
match the logic in the code you referenced.

If SMP=no fails to fail ( :-P ) then we'll know where to start digging.
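The disassembly of ath_cmn_process_fft posted in this thread bears on the NR_CPUS question. "mov r1, #4" passes a 4-bit bitmap size to _find_next_bit_le(), and "and r0, r0, #15" masks the online map before __sw_hweight32(), both consistent with NR_CPUS=4. A uniprocessor board still has only CPU 0 online, though, so the first (and only) slot the loop reads is index 0 whatever NR_CPUS is. The C sketch below models that loop shape under simplifying assumptions: a single 32-bit online mask, NR_CPUS as the iteration bound (the generated code compares against a value loaded from memory, presumably nr_cpu_ids), and hypothetical model_* helpers in place of find_next_bit() and hweight32().

```c
#include <stdint.h>

#define MODEL_NR_CPUS 4   /* matches "mov r1, #4" and "and r0, r0, #15" */

/* Simplified find_next_bit() over one 32-bit word: index of the first
 * set bit at or after 'start', or 'nbits' if there is none. */
static unsigned int model_find_next_bit(uint32_t mask, unsigned int nbits,
					unsigned int start)
{
	for (unsigned int b = start; b < nbits; b++)
		if (mask & (1u << b))
			return b;
	return nbits;
}

/* Portable popcount, standing in for __sw_hweight32(). */
static unsigned int model_hweight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;          /* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Visit each online CPU, count full buffers, and report "all full" when
 * the count equals the number of online CPUs. *first_cpu receives the
 * first CPU index visited, or -1 if none.
 */
static int model_all_bufs_full(uint32_t online_mask, const int *full,
			       int *first_cpu)
{
	unsigned int ret = 0;

	*first_cpu = -1;
	for (unsigned int cpu = model_find_next_bit(online_mask, MODEL_NR_CPUS, 0);
	     cpu < MODEL_NR_CPUS;
	     cpu = model_find_next_bit(online_mask, MODEL_NR_CPUS, cpu + 1)) {
		if (*first_cpu < 0)
			*first_cpu = (int)cpu;
		ret += (unsigned int)full[cpu];
	}
	return ret == model_hweight32(online_mask & ((1u << MODEL_NR_CPUS) - 1));
}
```

With online_mask = 0x1 (a single online CPU), *first_cpu is always 0, so changing NR_CPUS alone would not change which slot is dereferenced.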

thx,

Jason.

2016-11-23 21:15:25

by Jason Cooper

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

On Wed, Nov 23, 2016 at 07:51:20PM +0000, Russell King - ARM Linux wrote:
> On Wed, Nov 23, 2016 at 07:15:39PM +0000, Jason Cooper wrote:
> > ------- oops from v4.8.6 #2 ------------------------------------------
> > [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> > [42059.311799] pgd = c0004000
> > [42059.314522] [00000020] *pgd=00000000
> > [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> > [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> > [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> > [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> > [42059.340613] task: c0b091c0 task.stack: c0b00000
> > [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> > [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> > [42059.357598] pc : [<bf07bec4>] lr : [<bf07bee8>] psr: 80000153
> > [42059.357598] sp : c0b01cd0 ip : 00000000 fp : 00000000
> > [42059.369127] r10: c0b034d4 r9 : 00000069 r8 : 0000006c
> > [42059.374374] r7 : 00000000 r6 : dcfbd340 r5 : c0b03da0 r4 : 00000000
> > [42059.380930] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000
>
> Well, the good news is that it's reproducible.
>
> It looks like it could be this:
>
> static int
> ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
> {
> for_each_online_cpu(i)
> ret += relay_buf_full(rc->buf[i]);
>
> where i = 8 (r2) and rc->buf is r7. That's just a guess, though:
> there's precious little to go on, because with modern GCCs the Code:
> line no longer tells us much about what's going on unless we have the
> exact object files.
>
> e5933000 ldr r3, [r3]
> e1d330b4 ldrh r3, [r3, #4]
> e58d3030 str r3, [sp, #48] ; 0x30
> ea000002 b 1c <foo+0x1c>
> e7970102 ldr r0, [r7, r2, lsl #2]
>

As requested on irc:


-------------->8--------------------------------------------------------
drivers/net/wireless/ath/ath9k/common-spectral.o: file format elf32-littlearm


Disassembly of section .text:

...

00000754 <ath_cmn_process_fft>:
754: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
758: e24dd0d4 sub sp, sp, #212 ; 0xd4
75c: e1a04002 mov r4, r2
760: e1a06001 mov r6, r1
764: e58d0024 str r0, [sp, #36] ; 0x24
768: e3a01000 mov r1, #0
76c: e58d2018 str r2, [sp, #24]
770: e28d0049 add r0, sp, #73 ; 0x49
774: e3a02087 mov r2, #135 ; 0x87
778: ebfffffe bl 0 <memset>
77c: e5d44007 ldrb r4, [r4, #7]
780: e20430fd and r3, r4, #253 ; 0xfd
784: e3530024 cmp r3, #36 ; 0x24
788: 13540005 cmpne r4, #5
78c: 13a04001 movne r4, #1
790: 03a04000 moveq r4, #0
794: 13a00000 movne r0, #0
798: 0a000001 beq 7a4 <ath_cmn_process_fft+0x50>
79c: e28dd0d4 add sp, sp, #212 ; 0xd4
7a0: e8bd8ff0 pop {r4, r5, r6, r7, r8, r9, sl, fp, pc}
7a4: e59d3018 ldr r3, [sp, #24]
7a8: e1d380b4 ldrh r8, [r3, #4]
7ac: e2489003 sub r9, r8, #3
7b0: e0863009 add r3, r6, r9
7b4: e5d30002 ldrb r0, [r3, #2]
7b8: e2000010 and r0, r0, #16
7bc: e21000ff ands r0, r0, #255 ; 0xff
7c0: 0afffff5 beq 79c <ath_cmn_process_fft+0x48>
7c4: e59d3024 ldr r3, [sp, #36] ; 0x24
7c8: e3005000 movw r5, #0
7cc: e3405000 movt r5, #0
7d0: e3e0b000 mvn fp, #0
7d4: e5932000 ldr r2, [r3]
7d8: e5937004 ldr r7, [r3, #4]
7dc: e5923438 ldr r3, [r2, #1080] ; 0x438
7e0: e58d2010 str r2, [sp, #16]
7e4: e5933000 ldr r3, [r3]
7e8: e1d330b4 ldrh r3, [r3, #4]
7ec: e58d3030 str r3, [sp, #48] ; 0x30
7f0: ea000002 b 800 <ath_cmn_process_fft+0xac>
7f4: e7970102 ldr r0, [r7, r2, lsl #2]
7f8: ebfffffe bl 0 <relay_buf_full>
7fc: e0844000 add r4, r4, r0
800: e300a000 movw sl, #0
804: e28b2001 add r2, fp, #1
808: e340a000 movt sl, #0
80c: e3a01004 mov r1, #4
810: e1a0000a mov r0, sl
814: ebfffffe bl 0 <_find_next_bit_le>
818: e5953000 ldr r3, [r5]
81c: e1500003 cmp r0, r3
820: e1a0b000 mov fp, r0
824: e2802008 add r2, r0, #8
828: bafffff1 blt 7f4 <ath_cmn_process_fft+0xa0>
82c: e59a0000 ldr r0, [sl]
830: e200000f and r0, r0, #15
834: ebfffffe bl 0 <__sw_hweight32>
838: e1540000 cmp r4, r0
83c: 0a000092 beq a8c <ath_cmn_process_fft+0x338>
840: e59d3010 ldr r3, [sp, #16]
844: e5932030 ldr r2, [r3, #48] ; 0x30
848: e5923018 ldr r3, [r2, #24]
84c: e3530001 cmp r3, #1
850: 0a000090 beq a98 <ath_cmn_process_fft+0x344>
854: 3a000119 bcc cc0 <ath_cmn_process_fft+0x56c>
858: e3530002 cmp r3, #2
85c: 1a000110 bne ca4 <ath_cmn_process_fft+0x550>
860: e3003000 movw r3, #0
864: e5921014 ldr r1, [r2, #20]
868: e1a00003 mov r0, r3
86c: e592301c ldr r3, [r2, #28]
870: e3002000 movw r2, #0
874: e3a0b087 mov fp, #135 ; 0x87
878: e1a0c002 mov ip, r2
87c: e1a02000 mov r2, r0
880: e3402000 movt r2, #0
884: e58d2034 str r2, [sp, #52] ; 0x34
888: e1a0200c mov r2, ip
88c: e3a0a08a mov sl, #138 ; 0x8a
890: e3402000 movt r2, #0
894: e58d2044 str r2, [sp, #68] ; 0x44
898: e1d120b4 ldrh r2, [r1, #4]
89c: e3a01080 mov r1, #128 ; 0x80
8a0: e58d1020 str r1, [sp, #32]
8a4: e1520003 cmp r2, r3
8a8: 33a03003 movcc r3, #3
8ac: 23a03002 movcs r3, #2
8b0: e58d3038 str r3, [sp, #56] ; 0x38
8b4: e2483002 sub r3, r8, #2
8b8: e58d3014 str r3, [sp, #20]
8bc: e3530000 cmp r3, #0
8c0: da000071 ble a8c <ath_cmn_process_fft+0x338>
8c4: e3a03000 mov r3, #0
8c8: e28aa002 add sl, sl, #2
8cc: e1a04003 mov r4, r3
8d0: e58d3028 str r3, [sp, #40] ; 0x28
8d4: e1a05004 mov r5, r4
8d8: e24b3001 sub r3, fp, #1
8dc: e1a07006 mov r7, r6
8e0: e58d302c str r3, [sp, #44] ; 0x2c
8e4: e58db01c str fp, [sp, #28]
8e8: e1a03009 mov r3, r9
8ec: e58d8010 str r8, [sp, #16]
8f0: e1a09004 mov r9, r4
8f4: ea00002c b 9ac <ath_cmn_process_fft+0x258>
8f8: e3520007 cmp r2, #7
8fc: e1a05003 mov r5, r3
900: e086b004 add fp, r6, r4
904: 8a00006f bhi ac8 <ath_cmn_process_fft+0x374>
908: e59d202c ldr r2, [sp, #44] ; 0x2c
90c: e1530002 cmp r3, r2
910: a3a09001 movge r9, #1
914: ba0000dd blt c90 <ath_cmn_process_fft+0x53c>
918: e59d101c ldr r1, [sp, #28]
91c: e2812002 add r2, r1, #2
920: e1520005 cmp r2, r5
924: ba000058 blt a8c <ath_cmn_process_fft+0x338>
928: e1510005 cmp r1, r5
92c: aa000092 bge b7c <ath_cmn_process_fft+0x428>
930: e5d7001f ldrb r0, [r7, #31]
934: e5d71020 ldrb r1, [r7, #32]
938: e1500001 cmp r0, r1
93c: 1a000052 bne a8c <ath_cmn_process_fft+0x338>
940: e58d3040 str r3, [sp, #64] ; 0x40
944: e1a01004 mov r1, r4
948: e59d3044 ldr r3, [sp, #68] ; 0x44
94c: e1a0000b mov r0, fp
950: e58d203c str r2, [sp, #60] ; 0x3c
954: e12fff33 blx r3
958: e3500000 cmp r0, #0
95c: e59d203c ldr r2, [sp, #60] ; 0x3c
960: e59d3040 ldr r3, [sp, #64] ; 0x40
964: 1a00008e bne ba4 <ath_cmn_process_fft+0x450>
968: e59d2010 ldr r2, [sp, #16]
96c: e152000a cmp r2, sl
970: da0000c9 ble c9c <ath_cmn_process_fft+0x548>
974: e59d9028 ldr r9, [sp, #40] ; 0x28
978: e2842001 add r2, r4, #1
97c: e0867002 add r7, r6, r2
980: e3590000 cmp r9, #0
984: 13a09000 movne r9, #0
988: 1a000003 bne 99c <ath_cmn_process_fft+0x248>
98c: e59d2020 ldr r2, [sp, #32]
990: e2425002 sub r5, r2, #2
994: e0844005 add r4, r4, r5
998: e2842001 add r2, r4, #1
99c: e1a04002 mov r4, r2
9a0: e59d2014 ldr r2, [sp, #20]
9a4: e1540002 cmp r4, r2
9a8: aa000037 bge a8c <ath_cmn_process_fft+0x338>
9ac: e59d2010 ldr r2, [sp, #16]
9b0: e152000a cmp r2, sl
9b4: e7d62004 ldrb r2, [r6, r4]
9b8: daffffce ble 8f8 <ath_cmn_process_fft+0x1a4>
9bc: e3520007 cmp r2, #7
9c0: e2855001 add r5, r5, #1
9c4: e086b004 add fp, r6, r4
9c8: 8a000002 bhi 9d8 <ath_cmn_process_fft+0x284>
9cc: e59d202c ldr r2, [sp, #44] ; 0x2c
9d0: e1550002 cmp r5, r2
9d4: aaffffcf bge 918 <ath_cmn_process_fft+0x1c4>
9d8: e3590000 cmp r9, #0
9dc: 0affffed beq 998 <ath_cmn_process_fft+0x244>
9e0: e59d201c ldr r2, [sp, #28]
9e4: e1520005 cmp r2, r5
9e8: 1affffe1 bne 974 <ath_cmn_process_fft+0x220>
9ec: ea00007e b bec <ath_cmn_process_fft+0x498>
9f0: e597e000 ldr lr, [r7]
9f4: e24b201f sub r2, fp, #31
9f8: e597c004 ldr ip, [r7, #4]
9fc: e2871021 add r1, r7, #33 ; 0x21
a00: e5973008 ldr r3, [r7, #8]
a04: e28d0068 add r0, sp, #104 ; 0x68
a08: e58de049 str lr, [sp, #73] ; 0x49
a0c: e58dc04d str ip, [sp, #77] ; 0x4d
a10: e597e010 ldr lr, [r7, #16]
a14: e597c014 ldr ip, [r7, #20]
a18: e58d3051 str r3, [sp, #81] ; 0x51
a1c: e597300c ldr r3, [r7, #12]
a20: e58de059 str lr, [sp, #89] ; 0x59
a24: e58dc05d str ip, [sp, #93] ; 0x5d
a28: e58d3055 str r3, [sp, #85] ; 0x55
a2c: e1d7c1bc ldrh ip, [r7, #28]
a30: e5973018 ldr r3, [r7, #24]
a34: e5d7e01f ldrb lr, [r7, #31]
a38: e1cdc6b5 strh ip, [sp, #101] ; 0x65
a3c: e58d3061 str r3, [sp, #97] ; 0x61
a40: e5cde067 strb lr, [sp, #103] ; 0x67
a44: ebfffffe bl 0 <memcpy>
a48: e59d3038 ldr r3, [sp, #56] ; 0x38
a4c: e59d1024 ldr r1, [sp, #36] ; 0x24
a50: e59d0018 ldr r0, [sp, #24]
a54: e58d300c str r3, [sp, #12]
a58: e59d3030 ldr r3, [sp, #48] ; 0x30
a5c: e58d3008 str r3, [sp, #8]
a60: e1cd2fd8 ldrd r2, [sp, #248] ; 0xf8
a64: e1cd20f0 strd r2, [sp]
a68: e28d2049 add r2, sp, #73 ; 0x49
a6c: e59d3034 ldr r3, [sp, #52] ; 0x34
a70: e12fff33 blx r3
a74: e3a01087 mov r1, #135 ; 0x87
a78: e28d0049 add r0, sp, #73 ; 0x49
a7c: ebfffffe bl 0 <__memzero>
a80: e59d1020 ldr r1, [sp, #32]
a84: e28d0049 add r0, sp, #73 ; 0x49
a88: ebfffffe bl 0 <add_device_randomness>
a8c: e3a00001 mov r0, #1
a90: e28dd0d4 add sp, sp, #212 ; 0xd4
a94: e8bd8ff0 pop {r4, r5, r6, r7, r8, r9, sl, fp, pc}
a98: e58d3038 str r3, [sp, #56] ; 0x38
a9c: e3003000 movw r3, #0
aa0: e3002000 movw r2, #0
aa4: e3403000 movt r3, #0
aa8: e3402000 movt r2, #0
aac: e58d3034 str r3, [sp, #52] ; 0x34
ab0: e3a0b03c mov fp, #60 ; 0x3c
ab4: e3a03038 mov r3, #56 ; 0x38
ab8: e58d2044 str r2, [sp, #68] ; 0x44
abc: e3a0a03f mov sl, #63 ; 0x3f
ac0: e58d3020 str r3, [sp, #32]
ac4: eaffff7a b 8b4 <ath_cmn_process_fft+0x160>
ac8: e59db01c ldr fp, [sp, #28]
acc: e153000b cmp r3, fp
ad0: 0a00005e beq c50 <ath_cmn_process_fft+0x4fc>
ad4: e06b5005 rsb r5, fp, r5
ad8: e2855001 add r5, r5, #1
adc: e3550003 cmp r5, #3
ae0: 979ff105 ldrls pc, [pc, r5, lsl #2]
ae4: eaffffd7 b a48 <ath_cmn_process_fft+0x2f4>
ae8: 00000b0c andeq r0, r0, ip, lsl #22
aec: 00000af8 strdeq r0, [r0], -r8
af0: 00000b20 andeq r0, r0, r0, lsr #22
af4: 000009f0 strdeq r0, [r0], -r0 ; <UNPREDICTABLE>
af8: e1a0200b mov r2, fp
afc: e1a01007 mov r1, r7
b00: e28d0049 add r0, sp, #73 ; 0x49
b04: ebfffffe bl 0 <memcpy>
b08: eaffffce b a48 <ath_cmn_process_fft+0x2f4>
b0c: e24b2001 sub r2, fp, #1
b10: e1a01007 mov r1, r7
b14: e28d004a add r0, sp, #74 ; 0x4a
b18: ebfffffe bl 0 <memcpy>
b1c: eaffffc9 b a48 <ath_cmn_process_fft+0x2f4>
b20: e597e000 ldr lr, [r7]
b24: e24b2020 sub r2, fp, #32
b28: e597c004 ldr ip, [r7, #4]
b2c: e2871021 add r1, r7, #33 ; 0x21
b30: e5973008 ldr r3, [r7, #8]
b34: e28d0069 add r0, sp, #105 ; 0x69
b38: e58de04a str lr, [sp, #74] ; 0x4a
b3c: e58dc04e str ip, [sp, #78] ; 0x4e
b40: e597e010 ldr lr, [r7, #16]
b44: e597c014 ldr ip, [r7, #20]
b48: e58d3052 str r3, [sp, #82] ; 0x52
b4c: e597300c ldr r3, [r7, #12]
b50: e58de05a str lr, [sp, #90] ; 0x5a
b54: e58dc05e str ip, [sp, #94] ; 0x5e
b58: e5d7e01f ldrb lr, [r7, #31]
b5c: e1d7c1bc ldrh ip, [r7, #28]
b60: e58d3056 str r3, [sp, #86] ; 0x56
b64: e5973018 ldr r3, [r7, #24]
b68: e1cdc6b6 strh ip, [sp, #102] ; 0x66
b6c: e5cde068 strb lr, [sp, #104] ; 0x68
b70: e58d3062 str r3, [sp, #98] ; 0x62
b74: ebfffffe bl 0 <memcpy>
b78: eaffffb2 b a48 <ath_cmn_process_fft+0x2f4>
b7c: e58d3040 str r3, [sp, #64] ; 0x40
b80: e1a01004 mov r1, r4
b84: e59d3044 ldr r3, [sp, #68] ; 0x44
b88: e1a0000b mov r0, fp
b8c: e58d203c str r2, [sp, #60] ; 0x3c
b90: e12fff33 blx r3
b94: e3500000 cmp r0, #0
b98: e59d203c ldr r2, [sp, #60] ; 0x3c
b9c: e59d3040 ldr r3, [sp, #64] ; 0x40
ba0: 0a00000e beq be0 <ath_cmn_process_fft+0x48c>
ba4: e5d7101f ldrb r1, [r7, #31]
ba8: e5d70020 ldrb r0, [r7, #32]
bac: e59dc01c ldr ip, [sp, #28]
bb0: e15c0005 cmp ip, r5
bb4: d1510000 cmple r1, r0
bb8: 03a01001 moveq r1, #1
bbc: 13a01000 movne r1, #0
bc0: e1520005 cmp r2, r5
bc4: d3a02000 movle r2, #0
bc8: c2012001 andgt r2, r1, #1
bcc: e3520000 cmp r2, #0
bd0: 0a00001a beq c40 <ath_cmn_process_fft+0x4ec>
bd4: e5db2001 ldrb r2, [fp, #1]
bd8: e3520007 cmp r2, #7
bdc: 9affff6d bls 998 <ath_cmn_process_fft+0x244>
be0: e59d201c ldr r2, [sp, #28]
be4: e1520005 cmp r2, r5
be8: 1affff5e bne 968 <ath_cmn_process_fft+0x214>
bec: e58d303c str r3, [sp, #60] ; 0x3c
bf0: e1a02007 mov r2, r7
bf4: e59d3038 ldr r3, [sp, #56] ; 0x38
bf8: e1cd8fd8 ldrd r8, [sp, #248] ; 0xf8
bfc: e59d1024 ldr r1, [sp, #36] ; 0x24
c00: e58d300c str r3, [sp, #12]
c04: e59d3030 ldr r3, [sp, #48] ; 0x30
c08: e1cd80f0 strd r8, [sp]
c0c: e59d0018 ldr r0, [sp, #24]
c10: e58d3008 str r3, [sp, #8]
c14: e59d3034 ldr r3, [sp, #52] ; 0x34
c18: e12fff33 blx r3
c1c: e58d0028 str r0, [sp, #40] ; 0x28
c20: e1a00007 mov r0, r7
c24: e59d1020 ldr r1, [sp, #32]
c28: ebfffffe bl 0 <add_device_randomness>
c2c: e59d3010 ldr r3, [sp, #16]
c30: e153000a cmp r3, sl
c34: e59d303c ldr r3, [sp, #60] ; 0x3c
c38: caffff4d bgt 974 <ath_cmn_process_fft+0x220>
c3c: eaffff92 b a8c <ath_cmn_process_fft+0x338>
c40: e59d202c ldr r2, [sp, #44] ; 0x2c
c44: e1520005 cmp r2, r5
c48: 1affffe4 bne be0 <ath_cmn_process_fft+0x48c>
c4c: eaffffe0 b bd4 <ath_cmn_process_fft+0x480>
c50: e59d3038 ldr r3, [sp, #56] ; 0x38
c54: e59d1024 ldr r1, [sp, #36] ; 0x24
c58: e59d0018 ldr r0, [sp, #24]
c5c: e58d300c str r3, [sp, #12]
c60: e59d3030 ldr r3, [sp, #48] ; 0x30
c64: e58d3008 str r3, [sp, #8]
c68: e1cd2fd8 ldrd r2, [sp, #248] ; 0xf8
c6c: e1cd20f0 strd r2, [sp]
c70: e1a02007 mov r2, r7
c74: e59d3034 ldr r3, [sp, #52] ; 0x34
c78: e12fff33 blx r3
c7c: e1a00007 mov r0, r7
c80: e59d1020 ldr r1, [sp, #32]
c84: ebfffffe bl 0 <add_device_randomness>
c88: e3a00001 mov r0, #1
c8c: eaffff7f b a90 <ath_cmn_process_fft+0x33c>
c90: e59d201c ldr r2, [sp, #28]
c94: e1530002 cmp r3, r2
c98: 0affffd3 beq bec <ath_cmn_process_fft+0x498>
c9c: e59db01c ldr fp, [sp, #28]
ca0: eaffff8b b ad4 <ath_cmn_process_fft+0x380>
ca4: e3000000 movw r0, #0
ca8: e300119a movw r1, #410 ; 0x19a
cac: e3400000 movt r0, #0
cb0: e3a03000 mov r3, #0
cb4: e58d3038 str r3, [sp, #56] ; 0x38
cb8: ebfffffe bl 0 <warn_slowpath_null>
cbc: eaffff76 b a9c <ath_cmn_process_fft+0x348>
cc0: e3a03000 mov r3, #0
cc4: e58d3038 str r3, [sp, #56] ; 0x38
cc8: eaffff73 b a9c <ath_cmn_process_fft+0x348>

2016-11-23 19:26:49

by Kalle Valo

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

Jason Cooper <[email protected]> writes:

> All,
>
> I have a Ubiquiti SR-71 mini-pcie ath9k card in a Globalscale Mirabox
> board (Marvell Armada 370 SoC). Every day or so I get a consistent
> crash that brings down the whole board. I've attached three oops I
> captured on the serial port.
>
> I looked at the commits from v4.8.6 to v4.9-rc6, and nothing jumped out
> at me as "this would fix it". And since it takes a day or so to trigger
> the oops, bisecting would be a bit brutal. Does anyone have any insight
> into this?

Is this a regression, meaning that it didn't crash on older kernels but
crashes on newer ones? Or has it always crashed?

--
Kalle Valo

2016-11-24 06:06:58

by Miaoqing Pan

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8


>> Okay, so i was 0, so running UP probably isn't going to help. r7 is
>> also spec_priv->rfs_chan_spec_scan.
>>
>> So, I think the question is... how is this NULL - and has it always
>> been NULL...
>
> The problem appears to be that ath_cmn_process_fft() isn't called that
> often. When it is, it crashes in ath_cmn_is_fft_buf_full() because
> spec_priv->rfs_chan_spec_scan is NULL when ATH9K_DEBUGFS=n. :-(
>
> I'm running with ATH9K_DEBUGFS=y now. If it goes a couple of days
> without crashing, I'll gin up a patch.
>

A similar patch has been applied to the ath-next branch:
https://patchwork.kernel.org/patch/9431163/.

--
Miaoqing

2016-11-23 21:41:05

by Jason Cooper

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

On Wed, Nov 23, 2016 at 09:17:45PM +0000, Russell King - ARM Linux wrote:
> On Wed, Nov 23, 2016 at 08:59:17PM +0000, Jason Cooper wrote:
> > As requested on irc:
>
> Thanks.
>
> > 7f0: ea000002 b 800 <ath_cmn_process_fft+0xac>
> > 7f4: e7970102 ldr r0, [r7, r2, lsl #2]
> > 7f8: ebfffffe bl 0 <relay_buf_full>
> > 7fc: e0844000 add r4, r4, r0
> > 800: e300a000 movw sl, #0
> > 804: e28b2001 add r2, fp, #1
> > 808: e340a000 movt sl, #0
> > 80c: e3a01004 mov r1, #4
> > 810: e1a0000a mov r0, sl
> > 814: ebfffffe bl 0 <_find_next_bit_le>
> > 818: e5953000 ldr r3, [r5]
> > 81c: e1500003 cmp r0, r3
> > 820: e1a0b000 mov fp, r0
> > 824: e2802008 add r2, r0, #8
> > 828: bafffff1 blt 7f4 <ath_cmn_process_fft+0xa0>
>
> Okay, so i was 0, so running UP probably isn't going to help. r7 is
> also spec_priv->rfs_chan_spec_scan.
>
> So, I think the question is... how is this NULL - and has it always
> been NULL...

The problem appears to be that ath_cmn_process_fft() isn't called that
often. When it is, it crashes in ath_cmn_is_fft_buf_full() because
spec_priv->rfs_chan_spec_scan is NULL when ATH9K_DEBUGFS=n. :-(

I'm running with ATH9K_DEBUGFS=y now. If it goes a couple of days
without crashing, I'll gin up a patch.

thx,

Jason.

2016-11-23 19:51:01

by Jason Cooper

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

Hi Kalle,

On Wed, Nov 23, 2016 at 09:26:42PM +0200, Kalle Valo wrote:
> Jason Cooper <[email protected]> writes:
> > I have a Ubiquiti SR-71 mini-pcie ath9k card in a Globalscale Mirabox
> > board (Marvell Armada 370 SoC). Every day or so I get a consistent
> > crash that brings down the whole board. I've attached three oops I
> > captured on the serial port.
> >
> > I looked at the commits from v4.8.6 to v4.9-rc6, and nothing jumped out
> > at me as "this would fix it". And since it takes a day or so to trigger
> > the oops, bisecting would be a bit brutal. Does anyone have any insight
> > into this?
>
> Is this a regression, meaning that it didn't crash on older kernels but
> crashes on newer ones? Or has it always crashed?

iirc, it's always done this. It's one of my spare wifi backhauls that
spends most of its time in a cardboard box waiting for a task,
collecting dust. Kinda like the toys in Toy Story.

I pulled it out a month or so ago and the behavior started. It had
4.2.8 on it at the time. I upgraded to the latest stable a few weeks ago
(v4.8.6) and I'm getting the same issue.

When I originally set it up, it didn't run long enough for me to recall
if the issue occurred. Best I recall, that was with v4.2.8.

thx,

Jason.

2016-11-24 12:35:39

by Jason Cooper

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

All,

On Wed, Nov 23, 2016 at 09:40:53PM +0000, Jason Cooper wrote:
> I'm running with ATH9K_DEBUGFS=y now. If it goes a couple of days
> without crashing, I'll gin up a patch.

Well, it survived overnight, which it's never done before. :-) I'm
testing the relay_open() NULL patch now.

thx,

Jason.

2016-11-23 19:51:30

by Russell King (Oracle)

Subject: Re: ath9k ARMv7 OOPS in v4.8.6, v4.2.8

On Wed, Nov 23, 2016 at 07:15:39PM +0000, Jason Cooper wrote:
> ------- oops from v4.8.6 #2 ------------------------------------------
> [42059.303625] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> [42059.311799] pgd = c0004000
> [42059.314522] [00000020] *pgd=00000000
> [42059.318162] Internal error: Oops: 17 [#1] SMP ARM
> [42059.322889] Modules linked in: ath9k ath9k_common ath9k_hw ath
> [42059.328809] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.6 #37
> [42059.334755] Hardware name: Marvell Armada 370/XP (Device Tree)
> [42059.340613] task: c0b091c0 task.stack: c0b00000
> [42059.345176] PC is at ath_cmn_process_fft+0xa0/0x578 [ath9k_common]
> [42059.351388] LR is at ath_cmn_process_fft+0xc4/0x578 [ath9k_common]
> [42059.357598] pc : [<bf07bec4>] lr : [<bf07bee8>] psr: 80000153
> [42059.357598] sp : c0b01cd0 ip : 00000000 fp : 00000000
> [42059.369127] r10: c0b034d4 r9 : 00000069 r8 : 0000006c
> [42059.374374] r7 : 00000000 r6 : dcfbd340 r5 : c0b03da0 r4 : 00000000
> [42059.380930] r3 : 00000001 r2 : 00000008 r1 : 00000004 r0 : 00000000

Well, the good news is that it's reproducible.

It looks like it could be this:

static int
ath_cmn_is_fft_buf_full(struct ath_spec_scan_priv *spec_priv)
{
	int i = 0;
	int ret = 0;
	struct rchan *rc = spec_priv->rfs_chan_spec_scan;

	for_each_online_cpu(i)
		ret += relay_buf_full(rc->buf[i]);
	...

where i = 8 (r2) and rc->buf is r7. That's just a guess, though: there's
precious little to go on in the Code: line, and with modern GCCs it's
hard to work out what's going on from it without the exact object files.

e5933000 ldr r3, [r3]
e1d330b4 ldrh r3, [r3, #4]
e58d3030 str r3, [sp, #48] ; 0x30
ea000002 b 1c <foo+0x1c>
e7970102 ldr r0, [r7, r2, lsl #2]

What makes me wonder though is that if i=8, that means you must have a
system with 9 online CPUs, which is probably unlikely - or maybe that's
the problem, for_each_online_cpu() is going wrong...

If it's not that line of code, I don't see what else it would be based
on the output of my compiler - there's only one case in my disassembly
that corresponds with the single code line that we have to go on, and
it's this:

a44: e5983020 ldr r3, [r8, #32]
a48: e793010a ldr r0, [r3, sl, lsl #2] <===
a4c: ebfffffe bl 0 <relay_buf_full>
a50: e0844000 add r4, r4, r0
a54: e59f9434 ldr r9, [pc, #1076]
a58: e28a2001 add r2, sl, #1
a5c: e3a01004 mov r1, #4
a60: e1a00009 mov r0, r9
a64: ebfffffe bl 0 <_find_next_bit_le>
a68: e5953000 ldr r3, [r5]
a6c: e1500003 cmp r0, r3
a70: e1a0a000 mov sl, r0
a74: bafffff2 blt a44 <ath_cmn_process_fft+0xa8>

I'm debating now about whether we need to dump more of the code in the
oops - both before and after the faulting instruction...

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.