Pali Rohár <[email protected]> wrote:
> I hit this crash several times while debugging a PCIe controller. It
> seems to happen when the PCIe kernel code starts link retraining (as
> part of the ASPM code) at the same moment the PCIe link goes down and
> ath9k is probably executing its hw reset procedure.
> Currently I'm not able to reproduce this issue, as it looks to be a
> race condition between link training, ASPM, link down and the reset
> path. And as always, race conditions which depend on several input
> parameters are hard to reproduce, as they depend on precise timing.
> But it is clear that the pointers are NULL in this case and should be
> properly filled, as the same code pattern is used in the ath9k_stop()
> function.
> Anyway, I was able to reproduce this crash by manually triggering the
> ath reset worker prior to bringing the card up. I created a simple
> patch to export the reset functionality via debugfs and used it to
> "simulate" triggering a reset. This proved that the NULL-pointer
> dereference issue is there.
> Function ath9k_hw_reset() dereferences the chan structure pointer, so
> it needs to be a non-NULL pointer.
> Function ath9k_stop() already contains code which sets ah->curchan to a
> valid non-NULL pointer prior to calling the ath9k_hw_reset() function.
> Add the same code pattern to the ath_reset_internal() function to
> prevent a kernel NULL pointer dereference in ath9k_hw_reset().
> This change fixes kernel NULL pointer dereference in ath9k_hw_reset() which
> is caused by calling ath9k_hw_reset() from ath_reset_internal() with NULL
> chan structure.
> [ 45.334305] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
> [ 45.344417] Mem abort info:
> [ 45.347301] ESR = 0x96000005
> [ 45.350448] EC = 0x25: DABT (current EL), IL = 32 bits
> [ 45.356166] SET = 0, FnV = 0
> [ 45.359350] EA = 0, S1PTW = 0
> [ 45.362596] Data abort info:
> [ 45.365756] ISV = 0, ISS = 0x00000005
> [ 45.369735] CM = 0, WnR = 0
> [ 45.372814] user pgtable: 4k pages, 39-bit VAs, pgdp=000000000685d000
> [ 45.379663]  pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> [ 45.388856] Internal error: Oops: 96000005 [#1] SMP
> [ 45.393897] Modules linked in: ath9k ath9k_common ath9k_hw
> [ 45.399574] CPU: 1 PID: 309 Comm: kworker/u4:2 Not tainted 5.12.0-rc2-dirty #785
> [ 45.414746] Workqueue: phy0 ath_reset_work [ath9k]
> [ 45.419713] pstate: 40000005 (nZcv daif -PAN -UAO -TCO BTYPE=--)
> [ 45.425910] pc : ath9k_hw_reset+0xc4/0x1c48 [ath9k_hw]
> [ 45.431234] lr : ath9k_hw_reset+0xc0/0x1c48 [ath9k_hw]
> [ 45.436548] sp : ffffffc0118dbca0
> [ 45.439961] x29: ffffffc0118dbca0 x28: 0000000000000000
> [ 45.445442] x27: ffffff800dee4080 x26: 0000000000000000
> [ 45.450923] x25: ffffff800df9b9d8 x24: 0000000000000000
> [ 45.456404] x23: ffffffc0115f6000 x22: ffffffc008d0d408
> [ 45.461885] x21: ffffff800dee5080 x20: ffffff800df9b9d8
> [ 45.467366] x19: 0000000000000000 x18: 0000000000000000
> [ 45.472846] x17: 0000000000000000 x16: 0000000000000000
> [ 45.478326] x15: 0000000000000010 x14: ffffffffffffffff
> [ 45.483807] x13: ffffffc0918db94f x12: ffffffc011498720
> [ 45.489289] x11: 0000000000000003 x10: ffffffc0114806e0
> [ 45.494770] x9 : ffffffc01014b2ec x8 : 0000000000017fe8
> [ 45.500251] x7 : c0000000ffffefff x6 : 0000000000000001
> [ 45.505733] x5 : 0000000000000000 x4 : 0000000000000000
> [ 45.511213] x3 : 0000000000000000 x2 : ffffff801fece870
> [ 45.516693] x1 : ffffffc00eded000 x0 : 000000000000003f
> [ 45.522174] Call trace:
> [ 45.524695] ath9k_hw_reset+0xc4/0x1c48 [ath9k_hw]
> [ 45.529653] ath_reset_internal+0x1a8/0x2b8 [ath9k]
> [ 45.534696] ath_reset_work+0x2c/0x40 [ath9k]
> [ 45.539198] process_one_work+0x210/0x480
> [ 45.543339] worker_thread+0x5c/0x510
> [ 45.547115] kthread+0x12c/0x130
> [ 45.550445] ret_from_fork+0x10/0x1c
> [ 45.554138] Code: 910922c2 9117e021 95ff0398 b4000294 (b9400a61)
> [ 45.560430] ---[ end trace 566410ba90b50e8b ]---
> [ 45.565193] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [ 45.572282] SMP: stopping secondary CPUs
> [ 45.576331] Kernel Offset: disabled
> [ 45.579924] CPU features: 0x00040002,0000200c
> [ 45.584416] Memory Limit: none
> [ 45.587564] Rebooting in 3 seconds..
> Signed-off-by: Pali Rohár <[email protected]>
> Cc: [email protected]
> Signed-off-by: Kalle Valo <[email protected]>
Patch applied to ath-next branch of ath.git, thanks.
fb312ac5ccb0 ath9k: Fix kernel NULL pointer dereference during ath_reset_internal()