Hi Loic and Mani,

I hate to be the bearer of bad news again :)

I noticed a while ago that commit 020d3b26c07a ("bus: mhi: Early
MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough,
the Dell XPS 13 9310 laptop (also with QCA6390) does not have this
problem; I only see it on the NUC. I do not know what is causing
this difference.

At the moment I'm running my tests with commit 020d3b26c07a reverted and
everything works without problems. Is there a simple way to fix this? Or
maybe we should just revert the commit? The commit log and kernel logs
from a failing case are below.

Kalle

commit 020d3b26c07abe274ac17f64999bbd3bf3342195
Author:     Loic Poulain <[email protected]>
AuthorDate: Fri Mar 5 17:14:01 2021 +0100
Commit:     Manivannan Sadhasivam <[email protected]>
CommitDate: Wed Mar 10 20:11:22 2021 +0530

    bus: mhi: Early MHI resume failure in non M3 state

    MHI suspend/resume are symmetric and balanced procedures. If device is
    not in M3 state on a resume, that means something happened behind our
    back. In this case resume is aborted and error reported, to let the
    controller handle the situation.

    This is mainly requested for system wide suspend-resume operation in
    PCI context which may lead to power-down/reset of the controller which
    will then lose its MHI context. In such cases, PCI driver is supposed
    to recover and reinitialize the device.

    Signed-off-by: Loic Poulain <[email protected]>
    Reviewed-by: Bhaumik Bhatt <[email protected]>
    Reviewed-by: Manivannan Sadhasivam <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Manivannan Sadhasivam <[email protected]>

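
For readers following the thread, the check the commit message describes
can be sketched roughly as below. This is a user-space model, not the
kernel code: the model_ names, the reduced state enum, the struct, and the
choice of -EINVAL as the error value are all assumptions made for
illustration.

```c
#include <errno.h>

/* Reduced stand-in for the MHI device states; only those needed here. */
enum model_mhi_state {
    MODEL_MHI_STATE_RESET, /* device lost its MHI context (e.g. power cut) */
    MODEL_MHI_STATE_M0,    /* fully operational */
    MODEL_MHI_STATE_M3,    /* suspended, MHI context retained */
};

/* Hypothetical controller model: just the state the device reports. */
struct model_mhi_cntrl {
    enum model_mhi_state dev_state;
};

/*
 * Model of a resume entry point with the early check: if the device is
 * not in M3, abort and report an error so the caller can recover.
 * Returns 0 on success, a negative errno value otherwise.
 */
int model_mhi_pm_resume(struct model_mhi_cntrl *cntrl)
{
    if (cntrl->dev_state != MODEL_MHI_STATE_M3)
        return -EINVAL; /* assumed error code */

    cntrl->dev_state = MODEL_MHI_STATE_M0; /* M3 -> M0 transition */
    return 0;
}
```

In this model, a device that comes back from suspend in the reset state
makes the resume call fail immediately instead of attempting the
transition to M0.
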
[ 267.182376] ACPI: PM: Waking up from system sleep state S3
[ 268.192783] ACPI: EC: interrupt unblocked
[ 268.193023] pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
[ 268.204389] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
[ 268.204391] pcieport 0000:00:1c.1: Intel SPT PCH root port ACS workaround enabled
[ 268.205227] pcieport 0000:00:1c.2: Intel SPT PCH root port ACS workaround enabled
[ 269.360336] ACPI: EC: event unblocked
[ 269.367187] usb usb3: root hub lost power or was reset
[ 269.367215] usb usb4: root hub lost power or was reset
[ 269.368584] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
[ 269.455966] nvme nvme0: 8/0/0 default/read/poll queues
[ 272.289737] igb 0000:05:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 272.424084] ath11k_pci 0000:06:00.0: timed out while waiting for wow wakeup completion
[ 272.424091] ath11k_pci 0000:06:00.0: failed to wakeup wow during resume: -110
[ 272.424096] ath11k_pci 0000:06:00.0: failed to resume core: -110
[ 272.424101] PM: dpm_run_callback(): pci_pm_resume+0x0/0x2d0 returns -110
[ 272.424119] ath11k_pci 0000:06:00.0: PM: failed to resume async: error -110
[ 275.432003] ath11k_pci 0000:06:00.0: wmi command 16387 timeout
[ 275.432034] ath11k_pci 0000:06:00.0: failed to send WMI_PDEV_SET_PARAM cmd
[ 275.432088] ath11k_pci 0000:06:00.0: failed to enable PMF QOS: (-11
[ 275.432094] ------------[ cut here ]------------
[ 275.432114] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
[ 275.432144] WARNING: CPU: 3 PID: 3164 at net/mac80211/util.c:2361 ieee80211_reconfig+0x216/0x22a0 [mac80211]
[ 275.432225] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[ 275.432287] CPU: 3 PID: 3164 Comm: kworker/u16:20 Not tainted 5.15.0-rc1 #483
[ 275.432293] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[ 275.432298] Workqueue: events_unbound async_run_entry_fn
[ 275.432307] RIP: 0010:ieee80211_reconfig+0x216/0x22a0 [mac80211]
[ 275.432381] Code: c0 0f 85 4b 1f 00 00 41 c6 87 7c 08 00 00 00 4c 89 ff e8 ed 41 f1 ff 41 89 c5 85 c0 74 13 48 c7 c7 40 bc 7e c0 e8 ef 63 07 e4 <0f> 0b e9 12 ff ff ff 88 5c 24 37 49 8d 47 40 48 89 c2 48 89 44 24
[ 275.432386] RSP: 0000:ffffc90002bc7ab0 EFLAGS: 00010286
[ 275.432394] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 275.432399] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578f48
[ 275.432403] RBP: ffff88810890169a R08: 0000000000000001 R09: ffff888234fe581b
[ 275.432408] R10: ffffed10469fcb03 R11: 0000000000000001 R12: ffff88810890169e
[ 275.432412] R13: 00000000fffffff5 R14: 0000000000000000 R15: ffff888108900e20
[ 275.432417] FS: 0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
[ 275.432421] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 275.432426] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
[ 275.432430] Call Trace:
[ 275.432443] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[ 275.432515] wiphy_resume+0x190/0x370 [cfg80211]
[ 275.432574] ? trace_device_pm_callback_start+0x123/0x1b0
[ 275.432584] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[ 275.432642] dpm_run_callback+0xf4/0x1b0
[ 275.432650] ? trace_device_pm_callback_end+0x1a0/0x1a0
[ 275.432658] ? device_links_read_unlock+0x1b/0x30
[ 275.432665] ? dpm_wait_for_superior+0x256/0x430
[ 275.432679] device_resume+0x3d5/0x980
[ 275.432688] ? dpm_run_callback+0x1b0/0x1b0
[ 275.432693] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[ 275.432701] ? ktime_get+0x214/0x2f0
[ 275.432707] ? trace_hardirqs_on+0x1c/0x120
[ 275.432715] ? recalibrate_cpu_khz+0x10/0x10
[ 275.432726] ? device_resume+0x980/0x980
[ 275.432732] async_resume+0x14/0x30
[ 275.432738] async_run_entry_fn+0x90/0x4f0
[ 275.432750] process_one_work+0x866/0x1460
[ 275.432768] ? pwq_dec_nr_in_flight+0x230/0x230
[ 275.432787] ? worker_thread+0x152/0x1010
[ 275.432798] worker_thread+0x596/0x1010
[ 275.432818] ? process_one_work+0x1460/0x1460
[ 275.432828] kthread+0x322/0x3e0
[ 275.432833] ? _raw_spin_unlock_irq+0x1f/0x30
[ 275.432838] ? set_kthread_struct+0x100/0x100
[ 275.432848] ret_from_fork+0x22/0x30
[ 275.432872] irq event stamp: 977
[ 275.432876] hardirqs last enabled at (985): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
[ 275.432882] hardirqs last disabled at (992): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
[ 275.432888] softirqs last enabled at (402): [<ffffffffc095f878>] ath11k_htc_send+0x668/0xc10 [ath11k]
[ 275.432914] softirqs last disabled at (400): [<ffffffffc095f797>] ath11k_htc_send+0x587/0xc10 [ath11k]
[ 275.432937] ---[ end trace 88fd8120acef327c ]---
[ 275.433884] ------------[ cut here ]------------
[ 275.433888] wlan0: Failed check-sdata-in-driver check, flags: 0x4
[ 275.433917] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:97 drv_remove_interface+0x2cb/0x330 [mac80211]
[ 275.434008] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[ 275.434068] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G W 5.15.0-rc1 #483
[ 275.434074] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[ 275.434079] Workqueue: events_unbound async_run_entry_fn
[ 275.434087] RIP: 0010:drv_remove_interface+0x2cb/0x330 [mac80211]
[ 275.434154] Code: c1 e9 03 80 3c 01 00 75 72 48 8b 83 88 06 00 00 48 8d b3 a8 06 00 00 48 c7 c7 60 2a 7e c0 48 85 c0 48 0f 45 f0 e8 8a 12 16 e4 <0f> 0b eb 90 e8 6c a8 23 e2 e9 e9 fd ff ff e8 62 a8 23 e2 e9 06 fe
[ 275.434159] RSP: 0000:ffffc90002bc7788 EFLAGS: 00010282
[ 275.434167] RAX: 0000000000000000 RBX: ffff8881735a0c40 RCX: 0000000000000000
[ 275.434171] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578ee3
[ 275.434176] RBP: ffff888108900e20 R08: 0000000000000001 R09: ffff888234fe581b
[ 275.434180] R10: ffffed10469fcb03 R11: 0000000000000001 R12: dffffc0000000000
[ 275.434184] R13: ffff8881735a12d8 R14: ffff888108901568 R15: 000000000000000f
[ 275.434189] FS: 0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
[ 275.434194] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 275.434198] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
[ 275.434203] Call Trace:
[ 275.434212] ieee80211_do_stop+0xe27/0x1a20 [mac80211]
[ 275.434291] ? mutex_lock_io_nested+0x1490/0x1490
[ 275.434303] ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
[ 275.434370] ? mark_held_locks+0xa5/0xe0
[ 275.434382] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[ 275.434389] ? __local_bh_enable_ip+0x9d/0xf0
[ 275.434394] ? trace_hardirqs_on+0x1c/0x120
[ 275.434410] ieee80211_stop+0xb2/0x230 [mac80211]
[ 275.434484] __dev_close_many+0x191/0x2a0
[ 275.434491] ? netif_tx_stop_all_queues+0xf0/0xf0
[ 275.434496] ? find_held_lock+0x33/0x110
[ 275.434507] ? __lock_release+0x494/0xa40
[ 275.434518] dev_close_many+0x1c5/0x540
[ 275.434527] ? wait_for_completion_io+0x280/0x280
[ 275.434535] ? dev_get_by_napi_id+0x110/0x110
[ 275.434544] ? wiphy_resume+0x1a5/0x370 [cfg80211]
[ 275.434610] dev_close+0x132/0x1d0
[ 275.434617] ? dev_xdp_attach.constprop.0+0x750/0x750
[ 275.434633] cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
[ 275.434697] wiphy_resume+0x1b2/0x370 [cfg80211]
[ 275.434755] ? trace_device_pm_callback_start+0x123/0x1b0
[ 275.434765] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[ 275.434822] dpm_run_callback+0xf4/0x1b0
[ 275.434830] ? trace_device_pm_callback_end+0x1a0/0x1a0
[ 275.434839] ? device_links_read_unlock+0x1b/0x30
[ 275.434845] ? dpm_wait_for_superior+0x256/0x430
[ 275.434859] device_resume+0x3d5/0x980
[ 275.434868] ? dpm_run_callback+0x1b0/0x1b0
[ 275.434873] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[ 275.434880] ? ktime_get+0x214/0x2f0
[ 275.434886] ? trace_hardirqs_on+0x1c/0x120
[ 275.434893] ? recalibrate_cpu_khz+0x10/0x10
[ 275.434904] ? device_resume+0x980/0x980
[ 275.434910] async_resume+0x14/0x30
[ 275.434916] async_run_entry_fn+0x90/0x4f0
[ 275.434928] process_one_work+0x866/0x1460
[ 275.434946] ? pwq_dec_nr_in_flight+0x230/0x230
[ 275.434965] ? worker_thread+0x152/0x1010
[ 275.434992] worker_thread+0x596/0x1010
[ 275.435013] ? process_one_work+0x1460/0x1460
[ 275.435022] kthread+0x322/0x3e0
[ 275.435027] ? _raw_spin_unlock_irq+0x1f/0x30
[ 275.435032] ? set_kthread_struct+0x100/0x100
[ 275.435042] ret_from_fork+0x22/0x30
[ 275.435065] irq event stamp: 1923
[ 275.435069] hardirqs last enabled at (1931): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
[ 275.435076] hardirqs last disabled at (1938): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
[ 275.435082] softirqs last enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
[ 275.435087] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
[ 275.435093] ---[ end trace 88fd8120acef327d ]---
[ 275.435126] ------------[ cut here ]------------
[ 275.435130] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:36 drv_stop+0x290/0x310 [mac80211]
[ 275.435197] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
[ 275.435256] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G W 5.15.0-rc1 #483
[ 275.435261] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
[ 275.435265] Workqueue: events_unbound async_run_entry_fn
[ 275.435274] RIP: 0010:drv_stop+0x290/0x310 [mac80211]
[ 275.435339] Code: 80 3d 5f f1 29 00 00 75 e2 48 c7 c2 c0 29 7e c0 be 34 01 00 00 48 c7 c7 20 2a 7e c0 c6 05 43 f1 29 00 01 e8 af 64 16 e4 eb c1 <0f> 0b 5b 5d 41 5c 41 5d c3 0f 0b e9 d3 fd ff ff 48 89 ef e8 18 b2
[ 275.435344] RSP: 0000:ffffc90002bc7790 EFLAGS: 00010246
[ 275.435352] RAX: 0000000000000000 RBX: ffff888108900e20 RCX: 0000000000000001
[ 275.435356] RDX: 0000000000000004 RSI: ffffffffa5a021a0 RDI: ffff888145778920
[ 275.435360] RBP: ffff88810890169c R08: 0000000000000001 R09: ffffc90002bc757f
[ 275.435365] R10: ffffc90002bc77a8 R11: 0000000000000001 R12: dffffc0000000000
[ 275.435369] R13: ffff888108900e20 R14: ffff888108901568 R15: 000000000000000f
[ 275.435373] FS: 0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
[ 275.435378] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 275.435382] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
[ 275.435387] Call Trace:
[ 275.435394] ieee80211_do_stop+0x11dd/0x1a20 [mac80211]
[ 275.435472] ? mutex_lock_io_nested+0x1490/0x1490
[ 275.435484] ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
[ 275.435551] ? mark_held_locks+0xa5/0xe0
[ 275.435562] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[ 275.435569] ? __local_bh_enable_ip+0x9d/0xf0
[ 275.435574] ? trace_hardirqs_on+0x1c/0x120
[ 275.435590] ieee80211_stop+0xb2/0x230 [mac80211]
[ 275.435663] __dev_close_many+0x191/0x2a0
[ 275.435670] ? netif_tx_stop_all_queues+0xf0/0xf0
[ 275.435675] ? find_held_lock+0x33/0x110
[ 275.435686] ? __lock_release+0x494/0xa40
[ 275.435697] dev_close_many+0x1c5/0x540
[ 275.435706] ? wait_for_completion_io+0x280/0x280
[ 275.435713] ? dev_get_by_napi_id+0x110/0x110
[ 275.435723] ? wiphy_resume+0x1a5/0x370 [cfg80211]
[ 275.435790] dev_close+0x132/0x1d0
[ 275.435797] ? dev_xdp_attach.constprop.0+0x750/0x750
[ 275.435813] cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
[ 275.435876] wiphy_resume+0x1b2/0x370 [cfg80211]
[ 275.435935] ? trace_device_pm_callback_start+0x123/0x1b0
[ 275.435944] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
[ 275.436018] dpm_run_callback+0xf4/0x1b0
[ 275.436026] ? trace_device_pm_callback_end+0x1a0/0x1a0
[ 275.436035] ? device_links_read_unlock+0x1b/0x30
[ 275.436041] ? dpm_wait_for_superior+0x256/0x430
[ 275.436055] device_resume+0x3d5/0x980
[ 275.436064] ? dpm_run_callback+0x1b0/0x1b0
[ 275.436069] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
[ 275.436076] ? ktime_get+0x214/0x2f0
[ 275.436082] ? trace_hardirqs_on+0x1c/0x120
[ 275.436089] ? recalibrate_cpu_khz+0x10/0x10
[ 275.436100] ? device_resume+0x980/0x980
[ 275.436106] async_resume+0x14/0x30
[ 275.436112] async_run_entry_fn+0x90/0x4f0
[ 275.436124] process_one_work+0x866/0x1460
[ 275.436142] ? pwq_dec_nr_in_flight+0x230/0x230
[ 275.436161] ? worker_thread+0x152/0x1010
[ 275.436172] worker_thread+0x596/0x1010
[ 275.436191] ? process_one_work+0x1460/0x1460
[ 275.436201] kthread+0x322/0x3e0
[ 275.436206] ? _raw_spin_unlock_irq+0x1f/0x30
[ 275.436211] ? set_kthread_struct+0x100/0x100
[ 275.436221] ret_from_fork+0x22/0x30
[ 275.436244] irq event stamp: 2619
[ 275.436248] hardirqs last enabled at (2627): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
[ 275.436254] hardirqs last disabled at (2634): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
[ 275.436260] softirqs last enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
[ 275.436266] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
[ 275.436271] ---[ end trace 88fd8120acef327e ]---
[ 275.438124] PM: dpm_run_callback(): wiphy_resume+0x0/0x370 [cfg80211] returns -11
[ 275.438194] ieee80211 phy0: PM: failed to resume async: error -11
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Hi Kalle,
On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
>
> Hi Loic and Mani,
>
> I hate to be the bearer of bad news again :)
>
> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> problem, I only see the problem on the NUC. I do not know what's causing
> this difference.
I suppose the NUC is cutting PCI-Express power during suspend, while
the laptop maintains PCIe/M.2 power.
>
> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> everything works without problems. Is there a simple way to fix this? Or
> maybe we should just revert the commit? Commit log and kernel logs from
> a failing case below.
Do you have a log of a successful case?
To me, it looks like the device loses power, which is why the MHI resume
fails. Normally the device should be properly recovered/reinitialized.
Before that patch the power loss was simply not detected (or it was
handled at a higher level of the stack).
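
The recovery expected from the controller driver could look roughly like
the following user-space sketch. All names here (sketch_dev,
sketch_pci_resume and the helpers) are hypothetical, and this does not
reflect how ath11k or any MHI PCI driver is actually structured; it only
shows the fallback from a failed MHI resume to a full reinitialization.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model used only for this sketch. */
struct sketch_dev {
    bool in_m3;        /* true if the device kept its MHI context (M3) */
    bool operational;  /* true once the device is usable again */
    int reinit_count;  /* how many full reinitializations were needed */
};

/* Fast path: only valid if the device is still in M3. */
static int sketch_mhi_resume(struct sketch_dev *dev)
{
    if (!dev->in_m3)
        return -EINVAL; /* context lost, caller must recover */

    dev->in_m3 = false;
    dev->operational = true;
    return 0;
}

/* Slow path: power the device up again from scratch. */
static int sketch_full_reinit(struct sketch_dev *dev)
{
    dev->in_m3 = false;
    dev->operational = true;
    dev->reinit_count++;
    return 0;
}

/* Resume callback: try the MHI resume first, fall back to a reinit. */
int sketch_pci_resume(struct sketch_dev *dev)
{
    if (sketch_mhi_resume(dev) == 0)
        return 0;

    return sketch_full_reinit(dev);
}
```

With a fallback of this kind, a power loss during suspend costs a full
reinitialization but does not leave the device unusable after resume.
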
Regards,
Loic
On Thu, Sep 16, 2021 at 12:18:10PM +0200, Loic Poulain wrote:
> Hi Kalle,
>
> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
> >
> > Hi Loic and Mani,
> >
> > I hate to be the bearer of bad news again :)
> >
> > I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> > MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> > ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> > Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> > problem, I only see the problem on the NUC. I do not know what's causing
> > this difference.
>
> I suppose the NUC is cutting PCI-Express power during suspend while
> the laptop maintains PCIe/M2 power.
>
Yes, that could be the case here.
> >
> > At the moment I'm running my tests with commit 020d3b26c07a reverted and
> > everything works without problems. Is there a simple way to fix this? Or
> > maybe we should just revert the commit? Commit log and kernel logs from
> > a failing case below.
>
> Do you have a log of the success case?
>
> To me, the device loses power, that is why MHI resuming is failing.
> Normally the device should be properly recovered/reinitialized. Before
> that patch the power loss was simply not detected (or handled at
> higher stack level).
>
If things seem to work fine without that patch, then it implies that setting M0
state works during resume. I think we should just revert that patch.
Loic, did that patch fix any issue for you, or was it only a cosmetic fix?
Thanks,
Mani
> Regards,
> Loic
>
>
> >
> > Kalle
> >
> > commit 020d3b26c07abe274ac17f64999bbd3bf3342195
> > Author: Loic Poulain <[email protected]>
> > AuthorDate: Fri Mar 5 17:14:01 2021 +0100
> > Commit: Manivannan Sadhasivam <[email protected]>
> > CommitDate: Wed Mar 10 20:11:22 2021 +0530
> >
> > bus: mhi: Early MHI resume failure in non M3 state
> >
> > MHI suspend/resume are symmetric and balanced procedures. If device is
> > not in M3 state on a resume, that means something happened behind our
> > back. In this case resume is aborted and error reported, to let the
> > controller handle the situation.
> >
> > This is mainly requested for system wide suspend-resume operation in
> > PCI context which may lead to power-down/reset of the controller which
> > will then lose its MHI context. In such cases, PCI driver is supposed
> > to recover and reinitialize the device.
> >
> > Signed-off-by: Loic Poulain <[email protected]>
> > Reviewed-by: Bhaumik Bhatt <[email protected]>
> > Reviewed-by: Manivannan Sadhasivam <[email protected]>
> > Link: https://lore.kernel.org/r/[email protected]
> > Signed-off-by: Manivannan Sadhasivam <[email protected]>
> >
> > [ 267.182376] ACPI: PM: Waking up from system sleep state S3
> > [ 268.192783] ACPI: EC: interrupt unblocked
> > [ 268.193023] pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
> > [ 268.204389] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
> > [ 268.204391] pcieport 0000:00:1c.1: Intel SPT PCH root port ACS workaround enabled
> > [ 268.205227] pcieport 0000:00:1c.2: Intel SPT PCH root port ACS workaround enabled
> > [ 269.360336] ACPI: EC: event unblocked
> > [ 269.367187] usb usb3: root hub lost power or was reset
> > [ 269.367215] usb usb4: root hub lost power or was reset
> > [ 269.368584] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
> > [ 269.455966] nvme nvme0: 8/0/0 default/read/poll queues
> > [ 272.289737] igb 0000:05:00.0 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
> > [ 272.424084] ath11k_pci 0000:06:00.0: timed out while waiting for wow wakeup completion
> > [ 272.424091] ath11k_pci 0000:06:00.0: failed to wakeup wow during resume: -110
> > [ 272.424096] ath11k_pci 0000:06:00.0: failed to resume core: -110
> > [ 272.424101] PM: dpm_run_callback(): pci_pm_resume+0x0/0x2d0 returns -110
> > [ 272.424119] ath11k_pci 0000:06:00.0: PM: failed to resume async: error -110
> > [ 275.432003] ath11k_pci 0000:06:00.0: wmi command 16387 timeout
> > [ 275.432034] ath11k_pci 0000:06:00.0: failed to send WMI_PDEV_SET_PARAM cmd
> > [ 275.432088] ath11k_pci 0000:06:00.0: failed to enable PMF QOS: (-11
> > [ 275.432094] ------------[ cut here ]------------
> > [ 275.432114] Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> > [ 275.432144] WARNING: CPU: 3 PID: 3164 at net/mac80211/util.c:2361 ieee80211_reconfig+0x216/0x22a0 [mac80211]
> > [ 275.432225] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> > [ 275.432287] CPU: 3 PID: 3164 Comm: kworker/u16:20 Not tainted 5.15.0-rc1 #483
> > [ 275.432293] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> > [ 275.432298] Workqueue: events_unbound async_run_entry_fn
> > [ 275.432307] RIP: 0010:ieee80211_reconfig+0x216/0x22a0 [mac80211]
> > [ 275.432381] Code: c0 0f 85 4b 1f 00 00 41 c6 87 7c 08 00 00 00 4c 89 ff e8 ed 41 f1 ff 41 89 c5 85 c0 74 13 48 c7 c7 40 bc 7e c0 e8 ef 63 07 e4 <0f> 0b e9 12 ff ff ff 88 5c 24 37 49 8d 47 40 48 89 c2 48 89 44 24
> > [ 275.432386] RSP: 0000:ffffc90002bc7ab0 EFLAGS: 00010286
> > [ 275.432394] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > [ 275.432399] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578f48
> > [ 275.432403] RBP: ffff88810890169a R08: 0000000000000001 R09: ffff888234fe581b
> > [ 275.432408] R10: ffffed10469fcb03 R11: 0000000000000001 R12: ffff88810890169e
> > [ 275.432412] R13: 00000000fffffff5 R14: 0000000000000000 R15: ffff888108900e20
> > [ 275.432417] FS: 0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
> > [ 275.432421] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 275.432426] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
> > [ 275.432430] Call Trace:
> > [ 275.432443] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [ 275.432515] wiphy_resume+0x190/0x370 [cfg80211]
> > [ 275.432574] ? trace_device_pm_callback_start+0x123/0x1b0
> > [ 275.432584] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [ 275.432642] dpm_run_callback+0xf4/0x1b0
> > [ 275.432650] ? trace_device_pm_callback_end+0x1a0/0x1a0
> > [ 275.432658] ? device_links_read_unlock+0x1b/0x30
> > [ 275.432665] ? dpm_wait_for_superior+0x256/0x430
> > [ 275.432679] device_resume+0x3d5/0x980
> > [ 275.432688] ? dpm_run_callback+0x1b0/0x1b0
> > [ 275.432693] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [ 275.432701] ? ktime_get+0x214/0x2f0
> > [ 275.432707] ? trace_hardirqs_on+0x1c/0x120
> > [ 275.432715] ? recalibrate_cpu_khz+0x10/0x10
> > [ 275.432726] ? device_resume+0x980/0x980
> > [ 275.432732] async_resume+0x14/0x30
> > [ 275.432738] async_run_entry_fn+0x90/0x4f0
> > [ 275.432750] process_one_work+0x866/0x1460
> > [ 275.432768] ? pwq_dec_nr_in_flight+0x230/0x230
> > [ 275.432787] ? worker_thread+0x152/0x1010
> > [ 275.432798] worker_thread+0x596/0x1010
> > [ 275.432818] ? process_one_work+0x1460/0x1460
> > [ 275.432828] kthread+0x322/0x3e0
> > [ 275.432833] ? _raw_spin_unlock_irq+0x1f/0x30
> > [ 275.432838] ? set_kthread_struct+0x100/0x100
> > [ 275.432848] ret_from_fork+0x22/0x30
> > [ 275.432872] irq event stamp: 977
> > [ 275.432876] hardirqs last enabled at (985): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
> > [ 275.432882] hardirqs last disabled at (992): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
> > [ 275.432888] softirqs last enabled at (402): [<ffffffffc095f878>] ath11k_htc_send+0x668/0xc10 [ath11k]
> > [ 275.432914] softirqs last disabled at (400): [<ffffffffc095f797>] ath11k_htc_send+0x587/0xc10 [ath11k]
> > [ 275.432937] ---[ end trace 88fd8120acef327c ]---
> > [ 275.433884] ------------[ cut here ]------------
> > [ 275.433888] wlan0: Failed check-sdata-in-driver check, flags: 0x4
> > [ 275.433917] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:97 drv_remove_interface+0x2cb/0x330 [mac80211]
> > [ 275.434008] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> > [ 275.434068] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G W 5.15.0-rc1 #483
> > [ 275.434074] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> > [ 275.434079] Workqueue: events_unbound async_run_entry_fn
> > [ 275.434087] RIP: 0010:drv_remove_interface+0x2cb/0x330 [mac80211]
> > [ 275.434154] Code: c1 e9 03 80 3c 01 00 75 72 48 8b 83 88 06 00 00 48 8d b3 a8 06 00 00 48 c7 c7 60 2a 7e c0 48 85 c0 48 0f 45 f0 e8 8a 12 16 e4 <0f> 0b eb 90 e8 6c a8 23 e2 e9 e9 fd ff ff e8 62 a8 23 e2 e9 06 fe
> > [ 275.434159] RSP: 0000:ffffc90002bc7788 EFLAGS: 00010282
> > [ 275.434167] RAX: 0000000000000000 RBX: ffff8881735a0c40 RCX: 0000000000000000
> > [ 275.434171] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff52000578ee3
> > [ 275.434176] RBP: ffff888108900e20 R08: 0000000000000001 R09: ffff888234fe581b
> > [ 275.434180] R10: ffffed10469fcb03 R11: 0000000000000001 R12: dffffc0000000000
> > [ 275.434184] R13: ffff8881735a12d8 R14: ffff888108901568 R15: 000000000000000f
> > [ 275.434189] FS: 0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
> > [ 275.434194] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 275.434198] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
> > [ 275.434203] Call Trace:
> > [ 275.434212] ieee80211_do_stop+0xe27/0x1a20 [mac80211]
> > [ 275.434291] ? mutex_lock_io_nested+0x1490/0x1490
> > [ 275.434303] ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
> > [ 275.434370] ? mark_held_locks+0xa5/0xe0
> > [ 275.434382] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [ 275.434389] ? __local_bh_enable_ip+0x9d/0xf0
> > [ 275.434394] ? trace_hardirqs_on+0x1c/0x120
> > [ 275.434410] ieee80211_stop+0xb2/0x230 [mac80211]
> > [ 275.434484] __dev_close_many+0x191/0x2a0
> > [ 275.434491] ? netif_tx_stop_all_queues+0xf0/0xf0
> > [ 275.434496] ? find_held_lock+0x33/0x110
> > [ 275.434507] ? __lock_release+0x494/0xa40
> > [ 275.434518] dev_close_many+0x1c5/0x540
> > [ 275.434527] ? wait_for_completion_io+0x280/0x280
> > [ 275.434535] ? dev_get_by_napi_id+0x110/0x110
> > [ 275.434544] ? wiphy_resume+0x1a5/0x370 [cfg80211]
> > [ 275.434610] dev_close+0x132/0x1d0
> > [ 275.434617] ? dev_xdp_attach.constprop.0+0x750/0x750
> > [ 275.434633] cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
> > [ 275.434697] wiphy_resume+0x1b2/0x370 [cfg80211]
> > [ 275.434755] ? trace_device_pm_callback_start+0x123/0x1b0
> > [ 275.434765] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [ 275.434822] dpm_run_callback+0xf4/0x1b0
> > [ 275.434830] ? trace_device_pm_callback_end+0x1a0/0x1a0
> > [ 275.434839] ? device_links_read_unlock+0x1b/0x30
> > [ 275.434845] ? dpm_wait_for_superior+0x256/0x430
> > [ 275.434859] device_resume+0x3d5/0x980
> > [ 275.434868] ? dpm_run_callback+0x1b0/0x1b0
> > [ 275.434873] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [ 275.434880] ? ktime_get+0x214/0x2f0
> > [ 275.434886] ? trace_hardirqs_on+0x1c/0x120
> > [ 275.434893] ? recalibrate_cpu_khz+0x10/0x10
> > [ 275.434904] ? device_resume+0x980/0x980
> > [ 275.434910] async_resume+0x14/0x30
> > [ 275.434916] async_run_entry_fn+0x90/0x4f0
> > [ 275.434928] process_one_work+0x866/0x1460
> > [ 275.434946] ? pwq_dec_nr_in_flight+0x230/0x230
> > [ 275.434965] ? worker_thread+0x152/0x1010
> > [ 275.434992] worker_thread+0x596/0x1010
> > [ 275.435013] ? process_one_work+0x1460/0x1460
> > [ 275.435022] kthread+0x322/0x3e0
> > [ 275.435027] ? _raw_spin_unlock_irq+0x1f/0x30
> > [ 275.435032] ? set_kthread_struct+0x100/0x100
> > [ 275.435042] ret_from_fork+0x22/0x30
> > [ 275.435065] irq event stamp: 1923
> > [ 275.435069] hardirqs last enabled at (1931): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
> > [ 275.435076] hardirqs last disabled at (1938): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
> > [ 275.435082] softirqs last enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
> > [ 275.435087] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
> > [ 275.435093] ---[ end trace 88fd8120acef327d ]---
> > [ 275.435126] ------------[ cut here ]------------
> > [ 275.435130] WARNING: CPU: 3 PID: 3164 at net/mac80211/driver-ops.c:36 drv_stop+0x290/0x310 [mac80211]
> > [ 275.435197] Modules linked in: ath11k_pci ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr ns mos7840 usbserial nvme nvme_core [last unloaded: mhi]
> > [ 275.435256] CPU: 3 PID: 3164 Comm: kworker/u16:20 Tainted: G W 5.15.0-rc1 #483
> > [ 275.435261] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0049.2018.0801.1601 08/01/2018
> > [ 275.435265] Workqueue: events_unbound async_run_entry_fn
> > [ 275.435274] RIP: 0010:drv_stop+0x290/0x310 [mac80211]
> > [ 275.435339] Code: 80 3d 5f f1 29 00 00 75 e2 48 c7 c2 c0 29 7e c0 be 34 01 00 00 48 c7 c7 20 2a 7e c0 c6 05 43 f1 29 00 01 e8 af 64 16 e4 eb c1 <0f> 0b 5b 5d 41 5c 41 5d c3 0f 0b e9 d3 fd ff ff 48 89 ef e8 18 b2
> > [ 275.435344] RSP: 0000:ffffc90002bc7790 EFLAGS: 00010246
> > [ 275.435352] RAX: 0000000000000000 RBX: ffff888108900e20 RCX: 0000000000000001
> > [ 275.435356] RDX: 0000000000000004 RSI: ffffffffa5a021a0 RDI: ffff888145778920
> > [ 275.435360] RBP: ffff88810890169c R08: 0000000000000001 R09: ffffc90002bc757f
> > [ 275.435365] R10: ffffc90002bc77a8 R11: 0000000000000001 R12: dffffc0000000000
> > [ 275.435369] R13: ffff888108900e20 R14: ffff888108901568 R15: 000000000000000f
> > [ 275.435373] FS: 0000000000000000(0000) GS:ffff888234e00000(0000) knlGS:0000000000000000
> > [ 275.435378] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 275.435382] CR2: 0000000000000000 CR3: 0000000101148002 CR4: 00000000003706e0
> > [ 275.435387] Call Trace:
> > [ 275.435394] ieee80211_do_stop+0x11dd/0x1a20 [mac80211]
> > [ 275.435472] ? mutex_lock_io_nested+0x1490/0x1490
> > [ 275.435484] ? ieee80211_del_virtual_monitor+0x1a0/0x1a0 [mac80211]
> > [ 275.435551] ? mark_held_locks+0xa5/0xe0
> > [ 275.435562] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [ 275.435569] ? __local_bh_enable_ip+0x9d/0xf0
> > [ 275.435574] ? trace_hardirqs_on+0x1c/0x120
> > [ 275.435590] ieee80211_stop+0xb2/0x230 [mac80211]
> > [ 275.435663] __dev_close_many+0x191/0x2a0
> > [ 275.435670] ? netif_tx_stop_all_queues+0xf0/0xf0
> > [ 275.435675] ? find_held_lock+0x33/0x110
> > [ 275.435686] ? __lock_release+0x494/0xa40
> > [ 275.435697] dev_close_many+0x1c5/0x540
> > [ 275.435706] ? wait_for_completion_io+0x280/0x280
> > [ 275.435713] ? dev_get_by_napi_id+0x110/0x110
> > [ 275.435723] ? wiphy_resume+0x1a5/0x370 [cfg80211]
> > [ 275.435790] dev_close+0x132/0x1d0
> > [ 275.435797] ? dev_xdp_attach.constprop.0+0x750/0x750
> > [ 275.435813] cfg80211_shutdown_all_interfaces+0x71/0x180 [cfg80211]
> > [ 275.435876] wiphy_resume+0x1b2/0x370 [cfg80211]
> > [ 275.435935] ? trace_device_pm_callback_start+0x123/0x1b0
> > [ 275.435944] ? trace_rdev_return_int+0x1a0/0x1a0 [cfg80211]
> > [ 275.436018] dpm_run_callback+0xf4/0x1b0
> > [ 275.436026] ? trace_device_pm_callback_end+0x1a0/0x1a0
> > [ 275.436035] ? device_links_read_unlock+0x1b/0x30
> > [ 275.436041] ? dpm_wait_for_superior+0x256/0x430
> > [ 275.436055] device_resume+0x3d5/0x980
> > [ 275.436064] ? dpm_run_callback+0x1b0/0x1b0
> > [ 275.436069] ? lockdep_hardirqs_on_prepare.part.0+0x19a/0x350
> > [ 275.436076] ? ktime_get+0x214/0x2f0
> > [ 275.436082] ? trace_hardirqs_on+0x1c/0x120
> > [ 275.436089] ? recalibrate_cpu_khz+0x10/0x10
> > [ 275.436100] ? device_resume+0x980/0x980
> > [ 275.436106] async_resume+0x14/0x30
> > [ 275.436112] async_run_entry_fn+0x90/0x4f0
> > [ 275.436124] process_one_work+0x866/0x1460
> > [ 275.436142] ? pwq_dec_nr_in_flight+0x230/0x230
> > [ 275.436161] ? worker_thread+0x152/0x1010
> > [ 275.436172] worker_thread+0x596/0x1010
> > [ 275.436191] ? process_one_work+0x1460/0x1460
> > [ 275.436201] kthread+0x322/0x3e0
> > [ 275.436206] ? _raw_spin_unlock_irq+0x1f/0x30
> > [ 275.436211] ? set_kthread_struct+0x100/0x100
> > [ 275.436221] ret_from_fork+0x22/0x30
> > [ 275.436244] irq event stamp: 2619
> > [ 275.436248] hardirqs last enabled at (2627): [<ffffffffa2462c4b>] console_trylock_spinning+0x19b/0x1f0
> > [ 275.436254] hardirqs last disabled at (2634): [<ffffffffa2462bfa>] console_trylock_spinning+0x14a/0x1f0
> > [ 275.436260] softirqs last enabled at (1290): [<ffffffffa4c0050a>] __do_softirq+0x50a/0x7d5
> > [ 275.436266] softirqs last disabled at (1283): [<ffffffffa2336935>] __irq_exit_rcu+0xe5/0x120
> > [ 275.436271] ---[ end trace 88fd8120acef327e ]---
> > [ 275.438124] PM: dpm_run_callback(): wiphy_resume+0x0/0x370 [cfg80211] returns -11
> > [ 275.438194] ieee80211 phy0: PM: failed to resume async: error -11
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Manivannan Sadhasivam <[email protected]> writes:
> On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
>> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
>> [email protected]> wrote:
>>
>
> [...]
>
>> > If things seems to work fine without that patch, then it implies that
>> > setting M0
>> > state works during resume. I think we should just revert that patch.
>> >
>> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>>
>>
>> It fixes sdx modem resuming issue, without that we don't know modem needs
>> to be reinitialized.
>>
>
> Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> MHI controller.
What does that mean in practice, do you have any pointers or examples? I
have no clue what you are proposing :)
> If that's too much of work for Kalle, then I'll look into it. But I might get
> time only after Plumbers.
I'm busy, as always, so not sure when I'm able to do it either. I think
we should seriously consider reverting 020d3b26c07a and adding it back
after ath11k is able to handle this new situation.
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
> [email protected]> wrote:
>
[...]
> > If things seems to work fine without that patch, then it implies that
> > setting M0
> > state works during resume. I think we should just revert that patch.
> >
> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>
>
> It fixes sdx modem resuming issue, without that we don't know modem needs
> to be reinitialized.
>
Okay. Then in that case, the recovery mechanism has to be added to the ath11k
MHI controller.
If that's too much work for Kalle, then I'll look into it. But I might get
time only after Plumbers.
Thanks,
Mani
On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
> Manivannan Sadhasivam <[email protected]> writes:
>
> > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> >> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
> >> [email protected]> wrote:
> >>
> >
> > [...]
> >
> >> > If things seems to work fine without that patch, then it implies that
> >> > setting M0
> >> > state works during resume. I think we should just revert that patch.
> >> >
> >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> >>
> >>
> >> It fixes sdx modem resuming issue, without that we don't know modem needs
> >> to be reinitialized.
> >>
> >
> > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> > MHI controller.
>
> What does that mean in practise, do you have any pointers or examples? I
> have no clue what you are proposing :)
>
Take a look at the mhi_pci_recovery_work() function below:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
You need to implement something similar that basically powers up the MHI
endpoint (QCA6390) in case pm_resume() fails. At a minimum, you need to call
the functions below:
# Check if the device is powered on. If yes, then power it down to bring it back
mhi_power_down()
mhi_unprepare_after_power_down()
# Power up the device
mhi_prepare_for_power_up()
mhi_sync_power_up()
This implies that the WLAN device has been powered off during suspend, so the
resume fails and we are bringing the device back to working state.
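Roughly, the call order above could be sketched as below. This is only a
userspace model, not ath11k or MHI code: the four mhi_*() functions are mock
stand-ins that borrow the names of the real host API (their signatures here
are my approximation), struct mhi_controller is reduced to two made-up flags,
and example_mhi_recover() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock controller: only the two flags this sketch needs (assumption). */
struct mhi_controller {
	bool powered;   /* device has been powered up */
	bool prepared;  /* host-side resources are set up */
};

/* Mock stand-ins mimicking the names of the real MHI host API. */
static void mhi_power_down(struct mhi_controller *c, bool graceful)
{
	(void)graceful;
	c->powered = false;
}

static void mhi_unprepare_after_power_down(struct mhi_controller *c)
{
	c->prepared = false;
}

static int mhi_prepare_for_power_up(struct mhi_controller *c)
{
	c->prepared = true;
	return 0;
}

static int mhi_sync_power_up(struct mhi_controller *c)
{
	if (!c->prepared)
		return -1;
	c->powered = true;
	return 0;
}

/* Hypothetical helper a controller driver could run when resume fails. */
static int example_mhi_recover(struct mhi_controller *c)
{
	int ret;

	assert(c != NULL);

	/* If the device is still powered on, take it down first. */
	if (c->powered) {
		mhi_power_down(c, false);
		mhi_unprepare_after_power_down(c);
	}

	/* Then bring it back up from scratch. */
	ret = mhi_prepare_for_power_up(c);
	if (ret)
		return ret;

	ret = mhi_sync_power_up(c);
	if (ret) {
		/* Undo the prepare step if power up did not succeed. */
		mhi_unprepare_after_power_down(c);
		return ret;
	}

	return 0;
}
```

In a real driver this would probably run from the PCI resume path (or from a
work item, as pci_generic does) once mhi_pm_resume() has returned an error.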
> > If that's too much of work for Kalle, then I'll look into it. But I might get
> > time only after Plumbers.
>
> I'm busy, as always, so not sure when I'm able to do it either. I think
> we should seriously consider reverting 020d3b26c07a and adding it back
> after ath11k is able to handle this new situation.
>
Since Loic said that reverting would cause his modem (SDX device) to fail during
resume, this is not possible.
Thanks,
Mani
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
> On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
>> Manivannan Sadhasivam <[email protected]> writes:
>>
>> > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
>> >> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
>> >> [email protected]> wrote:
>> >>
>> >
>> > [...]
>> >
>> >> > If things seems to work fine without that patch, then it implies that
>> >> > setting M0
>> >> > state works during resume. I think we should just revert that patch.
>> >> >
>> >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>> >>
>> >>
>> >> It fixes sdx modem resuming issue, without that we don't know modem needs
>> >> to be reinitialized.
>> >>
>> >
>> > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
>> > MHI controller.
>>
>> What does that mean in practise, do you have any pointers or examples?
>> I
>> have no clue what you are proposing :)
>>
>
> Take a look at the mhi_pci_recovery_work() function below:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
>
> You need to implement something similar that basically powers up the
> MHI
> endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
> call
> below functions:
>
> # Check if the device is powered on. If yes, then power it down to
> bring it back
> mhi_power_down()
> mhi_unprepare_after_power_down()
>
> # Power up the device
> mhi_prepare_for_power_up()
> mhi_sync_power_up()
>
> This implies that the WLAN device has been powered off during suspend,
> so the
> resume fails and we are bringing the device back to working state.
>
This is fine for a platform that doesn't supply power during suspend.
But the NUC keeps its power supply in the suspend state.
The fact that QCA6390 on the NUC works after just reverting this commit also
shows that the NUC keeps its power supply in the suspend state.
The reason for the failure is that the MHI-STATUS register somehow can't be
read in M3 state on the NUC.
Does the MHI spec state that the MHI-STATUS register can be read in M3 state?
>> > If that's too much of work for Kalle, then I'll look into it. But I might get
>> > time only after Plumbers.
>>
>> I'm busy, as always, so not sure when I'm able to do it either. I
>> think
>> we should seriously consider reverting 020d3b26c07a and adding it back
>> after ath11k is able to handle this new situation.
>>
>
> Since Loic said that reverting would cause his modem (SDX device) to
> fail during
> resume, this is not possible.
>
> Thanks,
> Mani
>
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On Thu, Sep 23, 2021 at 04:34:43PM +0800, Carl Huang wrote:
> On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
> > On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
> > > Manivannan Sadhasivam <[email protected]> writes:
> > >
> > > > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> > > >> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
> > > >> [email protected]> wrote:
> > > >>
> > > >
> > > > [...]
> > > >
> > > >> > If things seems to work fine without that patch, then it implies that
> > > >> > setting M0
> > > >> > state works during resume. I think we should just revert that patch.
> > > >> >
> > > >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> > > >>
> > > >>
> > > >> It fixes sdx modem resuming issue, without that we don't know modem needs
> > > >> to be reinitialized.
> > > >>
> > > >
> > > > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> > > > MHI controller.
> > >
> > > What does that mean in practise, do you have any pointers or
> > > examples? I
> > > have no clue what you are proposing :)
> > >
> >
> > Take a look at the mhi_pci_recovery_work() function below:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
> >
> > You need to implement something similar that basically powers up the MHI
> > endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
> > call
> > below functions:
> >
> > # Check if the device is powered on. If yes, then power it down to bring
> > it back
> > mhi_power_down()
> > mhi_unprepare_after_power_down()
> >
> > # Power up the device
> > mhi_prepare_for_power_up()
> > mhi_sync_power_up()
> >
> > This implies that the WLAN device has been powered off during suspend,
> > so the
> > resume fails and we are bringing the device back to working state.
> >
> This is fine for platform which doesn't provide power supply during suspend.
> But NUC has power supply in suspend state.
If the NUC retains power during suspend, then it should work with that commit.
During resume, the device is expected to be in M3 state and that's what the
commit verifies.
If the device is in a different state, then most likely the device has been
power cycled.
> QCA6390 on NUC works after just reverting this commit also proves NUC has
> power supply in
> suspend state.
>
That's because we allowed the device to be in any state during resume, and if
it responded to the M0 transition, it worked.
> The reason is MHI-STATUS register can't be read somehow in M3 state on NUC.
No, that's not correct.
> Does the MHI spec state that MHI-STATUS register can be read in M3 state?
>
Yes, all the MHI registers are accessible in all states. During M3, both MHI
host and device (if supported) will transition to D3 Cold. Then during resume,
host will switch to D0 link state and will also notify the device to enter D0.
To aid debugging, please check which state the device is in during
mhi_pm_resume(). You can use the diff below:
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index fb99e3727155..482d55dd209e 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
 	if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
 		return -EIO;
 
+	dev_info(dev, "Device state: %s\n",
+		 TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
+
 	if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
 		return -EINVAL;
 
Thanks,
Mani
On 2021-09-23 16:59, Manivannan Sadhasivam wrote:
> On Thu, Sep 23, 2021 at 04:34:43PM +0800, Carl Huang wrote:
>> On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
>> > On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
>> > > Manivannan Sadhasivam <[email protected]> writes:
>> > >
>> > > > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
>> > > >> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
>> > > >> [email protected]> wrote:
>> > > >>
>> > > >
>> > > > [...]
>> > > >
>> > > >> > If things seems to work fine without that patch, then it implies that
>> > > >> > setting M0
>> > > >> > state works during resume. I think we should just revert that patch.
>> > > >> >
>> > > >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
>> > > >>
>> > > >>
>> > > >> It fixes sdx modem resuming issue, without that we don't know modem needs
>> > > >> to be reinitialized.
>> > > >>
>> > > >
>> > > > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
>> > > > MHI controller.
>> > >
>> > > What does that mean in practise, do you have any pointers or
>> > > examples? I
>> > > have no clue what you are proposing :)
>> > >
>> >
>> > Take a look at the mhi_pci_recovery_work() function below:
>> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
>> >
>> > You need to implement something similar that basically powers up the MHI
>> > endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
>> > call
>> > below functions:
>> >
>> > # Check if the device is powered on. If yes, then power it down to bring
>> > it back
>> > mhi_power_down()
>> > mhi_unprepare_after_power_down()
>> >
>> > # Power up the device
>> > mhi_prepare_for_power_up()
>> > mhi_sync_power_up()
>> >
>> > This implies that the WLAN device has been powered off during suspend,
>> > so the
>> > resume fails and we are bringing the device back to working state.
>> >
>> This is fine for platform which doesn't provide power supply during
>> suspend.
>> But NUC has power supply in suspend state.
>
> If NUC retains power supply during suspend then it should work with
> that commit.
> During resume, the device is expected to be in M3 state and that's what
> the
> commit verifies.
>
> If the device is in a different state, then most likely the device have
> power
> cycled.
>
But the tricky thing here is that upstream QCA6390 doesn't have a recovery
mechanism to download the firmware again, so QCA6390 has no way to work after
a power cycle.
>> QCA6390 on NUC works after just reverting this commit also proves NUC
>> has
>> power supply in
>> suspend state.
>>
>
> That's because we allowed the device to be in any state during resume
> and if it
> responds to the M0 transition it worked.
>
>> The reason is MHI-STATUS register can't be read somehow in M3 state on
>> NUC.
>
> No, that's not correct.
>
>> Does the MHI spec state that MHI-STATUS register can be read in M3
>> state?
>>
>
> Yes, all the MHI registers are accessible in all states. During M3,
> both MHI
> host and device (if supported) will transition to D3 Cold. Then during
> resume,
> host will switch to D0 link state and will also notify the device to
> enter D0.
>
> For aid debugging, please see the state the device is in during
> mhi_pm_resume().
> You can use below diff:
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index fb99e3727155..482d55dd209e 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>  	if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
>  		return -EIO;
>
> +	dev_info(dev, "Device state: %s\n",
> +		 TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> +
>  	if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
>  		return -EINVAL;
>
>
> Thanks,
> Mani
Hi Carl and Kalle,
On Thu, 23 Sept 2021 at 11:26, Carl Huang <[email protected]> wrote:
>
> On 2021-09-23 16:59, Manivannan Sadhasivam wrote:
> > On Thu, Sep 23, 2021 at 04:34:43PM +0800, Carl Huang wrote:
> >> On 2021-09-17 01:19, Manivannan Sadhasivam wrote:
> >> > On Thu, Sep 16, 2021 at 07:42:02PM +0300, Kalle Valo wrote:
> >> > > Manivannan Sadhasivam <[email protected]> writes:
> >> > >
> >> > > > On Thu, Sep 16, 2021 at 01:18:22PM +0200, Loic Poulain wrote:
> >> > > >> On Thu, 16 Sept 2021 at 13:12, Manivannan Sadhasivam <
> >> > > >> [email protected]> wrote:
> >> > > >>
> >> > > >
> >> > > > [...]
> >> > > >
> >> > > >> > If things seems to work fine without that patch, then it implies that
> >> > > >> > setting M0
> >> > > >> > state works during resume. I think we should just revert that patch.
> >> > > >> >
> >> > > >> > Loic, did that patch fix any issue for you or it was a cosmetic fix only?
> >> > > >>
> >> > > >>
> >> > > >> It fixes sdx modem resuming issue, without that we don't know modem needs
> >> > > >> to be reinitialized.
> >> > > >>
> >> > > >
> >> > > > Okay. Then in that case, the recovery mechanism has to be added to the ath11k
> >> > > > MHI controller.
> >> > >
> >> > > What does that mean in practise, do you have any pointers or
> >> > > examples? I
> >> > > have no clue what you are proposing :)
> >> > >
> >> >
> >> > Take a look at the mhi_pci_recovery_work() function below:
> >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/bus/mhi/pci_generic.c#n610
> >> >
> >> > You need to implement something similar that basically powers up the MHI
> >> > endpoint (QCA6390) in case pm_resume() fails. At minimum, you need to
> >> > call
> >> > below functions:
> >> >
> >> > # Check if the device is powered on. If yes, then power it down to bring
> >> > it back
> >> > mhi_power_down()
> >> > mhi_unprepare_after_power_down()
> >> >
> >> > # Power up the device
> >> > mhi_prepare_for_power_up()
> >> > mhi_sync_power_up()
> >> >
> >> > This implies that the WLAN device has been powered off during suspend,
> >> > so the
> >> > resume fails and we are bringing the device back to working state.
> >> >
> >> This is fine for platform which doesn't provide power supply during
> >> suspend.
> >> But NUC has power supply in suspend state.
> >
> > If NUC retains power supply during suspend then it should work with
> > that commit.
> > During resume, the device is expected to be in M3 state and that's what
> > the
> > commit verifies.
> >
> > If the device is in a different state, then most likely the device have
> > power
> > cycled.
> >
> But the tricky thing here is that upstream QCA6390 doesn't have recovery
> mechanism to download
> firmware again, so QCA6390 has no way to work after a power cycle.
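The recovery sequence quoted above (power down and unprepare if the device
was up, then prepare and power up again) can be sketched in plain userspace
C. This is only an illustration of the call ordering: struct fake_mhi and
all fake_*() functions are hypothetical stubs named after the MHI calls
mentioned in the thread, not the kernel API, and the sketch does not cover
the firmware re-download problem raised for QCA6390.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mhi_controller: only the host-side
 * bookkeeping this sketch needs. */
struct fake_mhi {
	bool prepared;	/* "prepare for power up" has been done */
	bool powered;	/* "sync power up" has completed */
	int resume_ret;	/* what the stubbed resume call returns */
};

/* Stubs standing in for mhi_pm_resume(), mhi_power_down(), and so on. */
static int fake_pm_resume(struct fake_mhi *m) { return m->resume_ret; }
static void fake_power_down(struct fake_mhi *m) { m->powered = false; }
static void fake_unprepare(struct fake_mhi *m) { m->prepared = false; }
static int fake_prepare(struct fake_mhi *m) { m->prepared = true; return 0; }

static int fake_sync_power_up(struct fake_mhi *m)
{
	if (!m->prepared)
		return -EINVAL;	/* powering up without prepare is a bug */
	m->powered = true;
	return 0;
}

/* Try to resume; if that fails, tear down and bring the device up again. */
int fake_resume_or_recover(struct fake_mhi *m)
{
	int ret;

	assert(m);
	ret = fake_pm_resume(m);
	if (!ret)
		return 0;

	/* Resume failed: assume the device lost its MHI context. */
	if (m->powered) {
		fake_power_down(m);
		fake_unprepare(m);
	}

	ret = fake_prepare(m);
	if (ret)
		return ret;
	return fake_sync_power_up(m);
}
```

With resume_ret set to -EINVAL the function walks the full teardown and
power-up path; with resume_ret set to 0 it returns early and touches nothing.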
Maybe a simple quick-fix would be to add a 'force' parameter to the
mhi resume function and discard state testing in case it is forced,
that would allow both ath11k and modem to work for now. Then
investigating what happens on ath11k side.
Thoughts?
Regards,
Loic
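A minimal userspace sketch of the 'force' idea above, assuming a boolean
parameter that bypasses the M3 check. The names sketch_pm_resume,
struct sketch_ctrl and enum sketch_state are made up for illustration; this
is not the MHI core code and not necessarily the shape of the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of the MHI device states discussed in this thread. */
enum sketch_state { SKETCH_RESET, SKETCH_M0, SKETCH_M3 };

/* Hypothetical stand-in for the controller. */
struct sketch_ctrl {
	enum sketch_state dev_state;	/* state reported by the device */
	bool in_error;			/* host PM state machine is in error */
};

int sketch_pm_resume(struct sketch_ctrl *c, bool force)
{
	assert(c);

	if (c->in_error)
		return -EIO;

	if (c->dev_state != SKETCH_M3) {
		/* Modem-style callers get an early failure and can recover. */
		if (!force)
			return -EINVAL;
		/* Forced (ath11k-style) callers skip the state test. */
	}

	c->dev_state = SKETCH_M0;	/* pretend the M0 transition worked */
	return 0;
}
```

A device found in RESET makes the unforced call return -EINVAL, which
matches the failure reported on the NUC, while the forced call proceeds as
it did before commit 020d3b26c07a.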
Loic Poulain <[email protected]> writes:
> Hi Kalle,
>
> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
>>
>> Hi Loic and Mani,
>>
>> I hate to be the bearer of bad news again :)
>>
>> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
>> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
>> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
>> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
>> problem, I only see the problem on the NUC. I do not know what's causing
>> this difference.
>
> I suppose the NUC is current PCI-Express power during suspend while
> the laptop maintains PCIe/M2 power.
Sorry, I'm not able to parse that sentence. Can you elaborate more?
>> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>> everything works without problems. Is there a simple way to fix this? Or
>> maybe we should just revert the commit? Commit log and kernel logs from
>> a failing case below.
>
> Do you have log of success case?
A log from a successful case is at the end of this email, using v5.15-rc1 plus
a revert of commit 020d3b26c07abe27.
> To me, the device loses power, that is why MHI resuming is failing.
> Normally the device should be properly recovered/reinitialized. Before
> that patch the power loss was simply not detected (or handled at
> higher stack level).
Currently in ath11k we always keep the firmware running when in suspend,
this is a workaround due to problems between mac80211 and MHI stack.
IIRC the problem was something related to MHI creating struct device during
resume or something like that.
[ 164.088772] PM: suspend entry (deep)
[ 164.089867] Filesystems sync: 0.000 seconds
[ 164.140383] Freezing user space processes ... (elapsed 0.004 seconds) done.
[ 164.146245] OOM killer disabled.
[ 164.148024] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 164.151767] printk: Suspending console(s) (use no_console_suspend to debug)
[ 164.155767] wlan0: deauthenticating from <SENSORED> by local choice (Reason: 3=DEAUTH_LEAVING)
[ 164.197460] e1000e: EEE TX LPI TIMER: 00000011
[ 164.787849] ACPI: EC: interrupt blocked
[ 164.863887] ACPI: PM: Preparing to enter system sleep state S3
[ 164.898479] ACPI: EC: event blocked
[ 164.898483] ACPI: EC: EC stopped
[ 164.898487] ACPI: PM: Saving platform NVS memory
[ 164.898496] Disabling non-boot CPUs ...
[ 164.910527] numa_remove_cpu cpu 1 node 0: mask now 0,2-7
[ 164.911609] smpboot: CPU 1 is now offline
[ 164.929506] numa_remove_cpu cpu 2 node 0: mask now 0,3-7
[ 164.930593] smpboot: CPU 2 is now offline
[ 164.947111] numa_remove_cpu cpu 3 node 0: mask now 0,4-7
[ 164.948192] smpboot: CPU 3 is now offline
[ 164.965687] numa_remove_cpu cpu 4 node 0: mask now 0,5-7
[ 164.967133] smpboot: CPU 4 is now offline
[ 164.983150] numa_remove_cpu cpu 5 node 0: mask now 0,6-7
[ 164.984211] smpboot: CPU 5 is now offline
[ 164.992047] numa_remove_cpu cpu 6 node 0: mask now 0,7
[ 164.993549] smpboot: CPU 6 is now offline
[ 165.004382] numa_remove_cpu cpu 7 node 0: mask now 0
[ 165.005456] smpboot: CPU 7 is now offline
[ 165.009866] ACPI: PM: Low-level resume complete
[ 165.010106] ACPI: EC: EC started
[ 165.010109] ACPI: PM: Restoring platform NVS memory
[ 165.012344] Enabling non-boot CPUs ...
[ 165.012978] x86: Booting SMP configuration:
[ 165.012984] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 165.014850] numa_add_cpu cpu 1 node 0: mask now 0-1
[ 165.023818] CPU1 is up
[ 165.024455] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 165.026190] numa_add_cpu cpu 2 node 0: mask now 0-2
[ 165.034904] CPU2 is up
[ 165.035479] smpboot: Booting Node 0 Processor 3 APIC 0x6
[ 165.037193] numa_add_cpu cpu 3 node 0: mask now 0-3
[ 165.046102] CPU3 is up
[ 165.046639] smpboot: Booting Node 0 Processor 4 APIC 0x1
[ 165.047005] numa_add_cpu cpu 4 node 0: mask now 0-4
[ 165.058328] CPU4 is up
[ 165.058976] smpboot: Booting Node 0 Processor 5 APIC 0x3
[ 165.059342] numa_add_cpu cpu 5 node 0: mask now 0-5
[ 165.070520] CPU5 is up
[ 165.071192] smpboot: Booting Node 0 Processor 6 APIC 0x5
[ 165.071574] numa_add_cpu cpu 6 node 0: mask now 0-6
[ 165.082952] CPU6 is up
[ 165.083609] smpboot: Booting Node 0 Processor 7 APIC 0x7
[ 165.083980] numa_add_cpu cpu 7 node 0: mask now 0-7
[ 165.095544] CPU7 is up
[ 165.099137] ACPI: PM: Waking up from system sleep state S3
[ 166.045084] ACPI: EC: interrupt unblocked
[ 166.045242] pcieport 0000:00:1c.4: Intel SPT PCH root port ACS workaround enabled
[ 166.056234] pcieport 0000:00:1c.1: Intel SPT PCH root port ACS workaround enabled
[ 166.057410] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
[ 166.057413] pcieport 0000:00:1c.2: Intel SPT PCH root port ACS workaround enabled
[ 167.210794] ACPI: EC: event unblocked
[ 167.258815] nvme nvme0: 8/0/0 default/read/poll queues
[ 167.694965] atkbd serio0: Unknown key released (translated set 2, code 0x7c on isa0060/serio0).
[ 167.695953] OOM killer enabled.
[ 167.697336] atkbd serio0: Use 'setkeycodes 7c <keycode>' to make it known.
[ 167.750241] Restarting tasks ... done.
[ 167.770450] PM: suspend exit
--
https://patchwork.kernel.org/project/linux-wireless/list/
https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On Fri, Sep 24, 2021 at 12:07:41PM +0300, Kalle Valo wrote:
> Manivannan Sadhasivam <[email protected]> writes:
>
> > To aid debugging, please check the state the device is in during mhi_pm_resume().
> > You can use the diff below:
> >
> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> > index fb99e3727155..482d55dd209e 100644
> > --- a/drivers/bus/mhi/core/pm.c
> > +++ b/drivers/bus/mhi/core/pm.c
> > @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
> >  	if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
> >  		return -EIO;
> >
> > +	dev_info(dev, "Device state: %s\n",
> > +		 TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> > +
> >  	if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
> >  		return -EINVAL;
>
> This is what I get with my NUC testbox:
>
> [ 970.488202] ACPI: EC: event unblocked
> [ 970.492484] hpet: Lost 1587 RTC interrupts
> [ 970.492749] mhi mhi0: Device state: RESET
Looks like the MHI device went into RESET state! It also looks to be a
firmware thing. But let's nail this down before adding any workaround in
the MHI stack.
Can you also rebuild the kernel with MHI debug enabled and capture the
logs in the failure case? Sorry if it is too much work for you!
Thanks,
Mani
> [ 970.492805] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On Fri, Sep 24, 2021 at 11:43:55AM +0200, Loic Poulain wrote:
> Hi Kalle,
>
> On Fri, 24 Sept 2021 at 10:36, Kalle Valo <[email protected]> wrote:
> >
> > Loic Poulain <[email protected]> writes:
> >
> > > Hi Kalle,
> > >
> > > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
> > >>
> > >> Hi Loic and Mani,
> > >>
> > >> I hate to be the bearer of bad news again :)
> > >>
> > >> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> > >> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> > >> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> > >> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> > >> problem, I only see the problem on the NUC. I do not know what's causing
> > >> this difference.
> > >
> > > I suppose the NUC is current PCI-Express power during suspend while
> > > the laptop maintains PCIe/M2 power.
> >
> > Sorry, I'm not able to parse that sentence. Can you elaborate more?
>
> Ouch, yes, I wanted to say that the NUC does not maintain the power of
> PCI express during suspend (leading to PCI D3cold state), whereas the
> laptop maintains the power of the M2 card... well, not sure now I see
> your logs.
>
> >
> > >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> > >> everything works without problems. Is there a simple way to fix this? Or
> > >> maybe we should just revert the commit? Commit log and kernel logs from
> > >> a failing case below.
> > >
> > > Do you have log of success case?
> >
> > A log from a successful case in the end of email, using v5.15-rc1 plus
> > revert of commit 020d3b26c07abe27.
> >
> > > To me, the device loses power, that is why MHI resuming is failing.
> > > Normally the device should be properly recovered/reinitialized. Before
> > > that patch the power loss was simply not detected (or handled at
> > > higher stack level).
> >
> > Currently in ath11k we always keep the firmware running when in suspend,
> > this is a workaround due to problems between mac80211 and MHI stack.
> > IIRC the problem was something related MHI creating struct device during
> > resume or something like that.
>
> Could you give a try with the attached patch? It should solve your
> issue without breaking modem support.
>
It will... But we should first try to see what is causing the device to
be in MHI RESET state. We can't add a force resume case without knowing
the root cause.
As a workaround, we could let the resume proceed if the device is in RESET
state, with a comment explaining why. But let's first get the MHI debug logs.
> Regards,
> Loic
Manivannan Sadhasivam <[email protected]> writes:
> To aid debugging, please check the state the device is in during mhi_pm_resume().
> You can use the diff below:
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index fb99e3727155..482d55dd209e 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>  	if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
>  		return -EIO;
>
> +	dev_info(dev, "Device state: %s\n",
> +		 TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> +
>  	if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
>  		return -EINVAL;
This is what I get with my NUC testbox:
[ 970.488202] ACPI: EC: event unblocked
[ 970.492484] hpet: Lost 1587 RTC interrupts
[ 970.492749] mhi mhi0: Device state: RESET
[ 970.492805] ath11k_pci 0000:06:00.0: failed to set mhi state: RESUME(6)
Hi Kalle,
On Fri, 24 Sept 2021 at 10:36, Kalle Valo <[email protected]> wrote:
>
> Loic Poulain <[email protected]> writes:
>
> > Hi Kalle,
> >
> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
> >>
> >> Hi Loic and Mani,
> >>
> >> I hate to be the bearer of bad news again :)
> >>
> >> I noticed already a while ago that commit 020d3b26c07a ("bus: mhi: Early
> >> MHI resume failure in non M3 state"), introduced in v5.13-rc1, broke
> >> ath11k resume on my NUC x86 testbox using QCA6390. Interestingly enough
> >> Dell XPS 13 9310 laptop (with QCA6390 as well) does not have this
> >> problem, I only see the problem on the NUC. I do not know what's causing
> >> this difference.
> >
> > I suppose the NUC is current PCI-Express power during suspend while
> > the laptop maintains PCIe/M2 power.
>
> Sorry, I'm not able to parse that sentence. Can you elaborate more?
Ouch, yes, I wanted to say that the NUC does not maintain the power of
PCI express during suspend (leading to PCI D3cold state), whereas the
laptop maintains the power of the M2 card... well, not sure now I see
your logs.
>
> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> >> everything works without problems. Is there a simple way to fix this? Or
> >> maybe we should just revert the commit? Commit log and kernel logs from
> >> a failing case below.
> >
> > Do you have log of success case?
>
> A log from a successful case in the end of email, using v5.15-rc1 plus
> revert of commit 020d3b26c07abe27.
>
> > To me, the device loses power, that is why MHI resuming is failing.
> > Normally the device should be properly recovered/reinitialized. Before
> > that patch the power loss was simply not detected (or handled at
> > higher stack level).
>
> Currently in ath11k we always keep the firmware running when in suspend,
> this is a workaround due to problems between mac80211 and MHI stack.
> IIRC the problem was something related MHI creating struct device during
> resume or something like that.
Could you give a try with the attached patch? It should solve your
issue without breaking modem support.
Regards,
Loic
(adding the new mhi list, yay)
Hi Loic,
Loic Poulain <[email protected]> writes:
>> Loic Poulain <[email protected]> writes:
>>
>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
>>
>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>> >> everything works without problems. Is there a simple way to fix this? Or
>> >> maybe we should just revert the commit? Commit log and kernel logs from
>> >> a failing case below.
>> >
>> > Do you have log of success case?
>>
>> A log from a successful case in the end of email, using v5.15-rc1 plus
>> revert of commit 020d3b26c07abe27.
>>
>> > To me, the device loses power, that is why MHI resuming is failing.
>> > Normally the device should be properly recovered/reinitialized. Before
>> > that patch the power loss was simply not detected (or handled at
>> > higher stack level).
>>
>> Currently in ath11k we always keep the firmware running when in suspend,
>> this is a workaround due to problems between mac80211 and MHI stack.
>> IIRC the problem was something related MHI creating struct device during
>> resume or something like that.
>
> Could you give a try with the attached patch? It should solve your
> issue without breaking modem support.
Sorry for taking so long, but I now tested your patch on top of
v5.15-rc3 and, as expected, everything works as before with QCA6390 on
NUC x86 testbox.
Tested-by: Kalle Valo <[email protected]>
(adding also mhi list)
Manivannan Sadhasivam <[email protected]> writes:
> On Fri, Sep 24, 2021 at 12:07:41PM +0300, Kalle Valo wrote:
>> Manivannan Sadhasivam <[email protected]> writes:
>>
>> > To aid debugging, please check the state the device is in during mhi_pm_resume().
>> > You can use the diff below:
>> >
>> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
>> > index fb99e3727155..482d55dd209e 100644
>> > --- a/drivers/bus/mhi/core/pm.c
>> > +++ b/drivers/bus/mhi/core/pm.c
>> > @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
>> >  	if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
>> >  		return -EIO;
>> >
>> > +	dev_info(dev, "Device state: %s\n",
>> > +		 TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
>> > +
>> >  	if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
>> >  		return -EINVAL;
>>
>> This is what I get with my NUC testbox:
>>
>> [ 970.488202] ACPI: EC: event unblocked
>> [ 970.492484] hpet: Lost 1587 RTC interrupts
>> [ 970.492749] mhi mhi0: Device state: RESET
>
> Looks like the MHI device went into RESET state! It also looks to be a
> firmware thing. But let's nail this down before adding any workaround in
> the MHI stack.
>
> Can you also rebuild the kernel with MHI debug enabled and capture the
> logs in the failure case?
So what exactly should I do to enable debug messages?
I have this in my Kconfig:
CONFIG_MHI_BUS=m
# CONFIG_MHI_BUS_DEBUG is not set
# CONFIG_MHI_BUS_PCI_GENERIC is not set
And AFAICS CONFIG_MHI_BUS_DEBUG only enables the debugfs interface, I
doubt you meant that.
Kalle Valo <[email protected]> writes:
> (adding the new mhi list, yay)
>
> Hi Loic,
>
> Loic Poulain <[email protected]> writes:
>
>>> Loic Poulain <[email protected]> writes:
>>>
>>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
>>>
>>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>>> >> everything works without problems. Is there a simple way to fix this? Or
>>> >> maybe we should just revert the commit? Commit log and kernel logs from
>>> >> a failing case below.
>>> >
>>> > Do you have log of success case?
>>>
>>> A log from a successful case in the end of email, using v5.15-rc1 plus
>>> revert of commit 020d3b26c07abe27.
>>>
>>> > To me, the device loses power, that is why MHI resuming is failing.
>>> > Normally the device should be properly recovered/reinitialized. Before
>>> > that patch the power loss was simply not detected (or handled at
>>> > higher stack level).
>>>
>>> Currently in ath11k we always keep the firmware running when in suspend,
>>> this is a workaround due to problems between mac80211 and MHI stack.
>>> IIRC the problem was something related MHI creating struct device during
>>> resume or something like that.
>>
>> Could you give a try with the attached patch? It should solve your
>> issue without breaking modem support.
>
> Sorry for taking so long, but I now tested your patch on top of
> v5.15-rc3 and, as expected, everything works as before with QCA6390 on
> NUC x86 testbox.
>
> Tested-by: Kalle Valo <[email protected]>
I doubt we will find enough time to fully debug this mhi issue anytime
soon. Can we commit Loic's patch so that this regression is resolved?
At the moment I'm doing all my regression testing with commit
020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
testing without any hacks.
On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
> Kalle Valo <[email protected]> writes:
>
> > (adding the new mhi list, yay)
> >
> > Hi Loic,
> >
> > Loic Poulain <[email protected]> writes:
> >
> >>> Loic Poulain <[email protected]> writes:
> >>>
> >>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
> >>>
> >>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> >>> >> everything works without problems. Is there a simple way to fix this? Or
> >>> >> maybe we should just revert the commit? Commit log and kernel logs from
> >>> >> a failing case below.
> >>> >
> >>> > Do you have log of success case?
> >>>
> >>> A log from a successful case in the end of email, using v5.15-rc1 plus
> >>> revert of commit 020d3b26c07abe27.
> >>>
> >>> > To me, the device loses power, that is why MHI resuming is failing.
> >>> > Normally the device should be properly recovered/reinitialized. Before
> >>> > that patch the power loss was simply not detected (or handled at
> >>> > higher stack level).
> >>>
> >>> Currently in ath11k we always keep the firmware running when in suspend,
> >>> this is a workaround due to problems between mac80211 and MHI stack.
> >>> IIRC the problem was something related MHI creating struct device during
> >>> resume or something like that.
> >>
> >> Could you give a try with the attached patch? It should solve your
> >> issue without breaking modem support.
> >
> > Sorry for taking so long, but I now tested your patch on top of
> > v5.15-rc3 and, as expected, everything works as before with QCA6390 on
> > NUC x86 testbox.
> >
> > Tested-by: Kalle Valo <[email protected]>
>
> I doubt we will find enough time to fully debug this mhi issue anytime
> soon. Can we commit Loic's patch so that this regression is resolved?
>
Sorry no :( Even though Loic's patch is working, I want to understand the
issue properly so that we could add a proper fix or patch the firmware
if possible.
Let's try to get the debug logs as I requested.
Thanks,
Mani
> At the moment I'm doing all my regression testing with commit
> 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
> testing without any hacks.
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
On Thu, Oct 07, 2021 at 12:55:52PM +0300, Kalle Valo wrote:
> (adding also mhi list)
>
> Manivannan Sadhasivam <[email protected]> writes:
>
> > On Fri, Sep 24, 2021 at 12:07:41PM +0300, Kalle Valo wrote:
> >> Manivannan Sadhasivam <[email protected]> writes:
> >>
> >> > To aid debugging, please check the state the device is in during mhi_pm_resume().
> >> > You can use the diff below:
> >> >
> >> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> >> > index fb99e3727155..482d55dd209e 100644
> >> > --- a/drivers/bus/mhi/core/pm.c
> >> > +++ b/drivers/bus/mhi/core/pm.c
> >> > @@ -898,6 +898,9 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
> >> >  	if (MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))
> >> >  		return -EIO;
> >> >
> >> > +	dev_info(dev, "Device state: %s\n",
> >> > +		 TO_MHI_STATE_STR(mhi_get_mhi_state(mhi_cntrl)));
> >> > +
> >> >  	if (mhi_get_mhi_state(mhi_cntrl) != MHI_STATE_M3)
> >> >  		return -EINVAL;
> >>
> >> This is what I get with my NUC testbox:
> >>
> >> [ 970.488202] ACPI: EC: event unblocked
> >> [ 970.492484] hpet: Lost 1587 RTC interrupts
> >> [ 970.492749] mhi mhi0: Device state: RESET
> >
> > Looks like the MHI device went into RESET state! It also looks to be a
> > firmware thing. But let's nail this down before adding any workaround in
> > the MHI stack.
> >
> > Can you also rebuild the kernel with MHI debug enabled and capture the
> > logs in the failure case?
>
> So what I should exactly do to enable debug messages?
>
> I have this in my Kconfig:
>
> CONFIG_MHI_BUS=m
> # CONFIG_MHI_BUS_DEBUG is not set
> # CONFIG_MHI_BUS_PCI_GENERIC is not set
>
> And AFAICS CONFIG_MHI_BUS_DEBUG only enables the debugfs interface, I
> doubt you meant that.
>
No. You should enable the dev_dbg messages in MHI core by adding the -DDEBUG
flag to the Makefile or by CONFIG_DYNAMIC_DEBUG.
Thanks,
Mani
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
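For context on the -DDEBUG suggestion above: dev_dbg() calls are compiled
out unless DEBUG is defined for the file being built, or
CONFIG_DYNAMIC_DEBUG is enabled and the call sites are switched on at run
time. The userspace sketch below only imitates the compile-time variant;
sketch_dbg and sketch_resume_trace are invented names, not kernel macros.

```c
#include <assert.h>
#include <stdio.h>

/* Same effect as building this file with -DDEBUG. Remove this line and the
 * debug message below disappears from the compiled code. */
#define DEBUG 1

static int sketch_dbg_count;	/* number of debug messages emitted */

#ifdef DEBUG
#define sketch_dbg(...) \
	do { sketch_dbg_count++; fprintf(stderr, __VA_ARGS__); } while (0)
#else
#define sketch_dbg(...) do { } while (0)
#endif

/* Emits one debug line for the given state name and returns how many debug
 * messages have been emitted so far. */
int sketch_resume_trace(const char *state)
{
	assert(state);
	sketch_dbg("mhi sketch: Device state: %s\n", state);
	return sketch_dbg_count;
}
```

With DEBUG defined, each call bumps the counter and prints to stderr;
without it the macro expands to an empty statement, which is why a stock
build shows no MHI debug lines in the log.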
On 21.10.21 12:03, Manivannan Sadhasivam wrote:
> On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
>> Kalle Valo <[email protected]> writes:
>>> (adding the new mhi list, yay)
>>> Loic Poulain <[email protected]> writes:
>>>>> Loic Poulain <[email protected]> writes:
>>>>>> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
>>>>>>> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>>>>>>> everything works without problems. Is there a simple way to fix this? Or
>>>>>>> maybe we should just revert the commit? Commit log and kernel logs from
>>>>>>> a failing case below.
>>>>>>
>>>>>> Do you have log of success case?
>>>>>
>>>>> A log from a successful case in the end of email, using v5.15-rc1 plus
>>>>> revert of commit 020d3b26c07abe27.
>>>>>
>>>>>> To me, the device loses power, that is why MHI resuming is failing.
>>>>>> Normally the device should be properly recovered/reinitialized. Before
>>>>>> that patch the power loss was simply not detected (or handled at
>>>>>> higher stack level).
>>>>>
>>>>> Currently in ath11k we always keep the firmware running when in suspend,
>>>>> this is a workaround due to problems between mac80211 and MHI stack.
>>>>> IIRC the problem was something related MHI creating struct device during
>>>>> resume or something like that.
>>>>
>>>> Could you give a try with the attached patch? It should solve your
>>>> issue without breaking modem support.
>>>
>>> Sorry for taking so long, but I now tested your patch on top of
>>> v5.15-rc3 and, as expected, everything works as before with QCA6390 on
>>> NUC x86 testbox.
>>>
>>> Tested-by: Kalle Valo <[email protected]>
>>
>> I doubt we will find enough time to fully debug this mhi issue anytime
>> soon. Can we commit Loic's patch so that this regression is resolved?
>
> Sorry no :( Even though Loic's patch is working, I want to understand the
> issue properly so that we could add a proper fix or patch the firmware
> if possible.
Lo, this is your Linux kernel regression tracker speaking!
> Let's try to get the debug logs as I requested.
That was 3 weeks ago. Afaics nothing happened since then (except the
other mail about this on the same day in this thread). Or did I miss
anything? And if not: How can we get the ball rolling somehow again to
get this regression finally fixed?
Ciao, Thorsten (carrying his Linux kernel regression tracker hat)
P.S.: I have no personal interest in this issue and watch it using
regzbot. Hence feel free to exclude me on further messages in this
thread, as I'm only posting this mail to hopefully get a status update
and things rolling again.
#regzbot poke
>> At the moment I'm doing all my regression testing with commit
>> 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
>> testing without any hacks.
>>
>> --
>> https://patchwork.kernel.org/project/linux-wireless/list/
>>
>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
>
>
On Thu, Oct 21, 2021 at 03:33:05PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
> > Kalle Valo <[email protected]> writes:
> >
> > > (adding the new mhi list, yay)
> > >
> > > Hi Loic,
> > >
> > > Loic Poulain <[email protected]> writes:
> > >
> > >>> Loic Poulain <[email protected]> writes:
> > >>>
> > >>> > On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
> > >>>
> > >>> >> At the moment I'm running my tests with commit 020d3b26c07a reverted and
> > >>> >> everything works without problems. Is there a simple way to fix this? Or
> > >>> >> maybe we should just revert the commit? Commit log and kernel logs from
> > >>> >> a failing case below.
> > >>> >
> > >>> > Do you have log of success case?
> > >>>
> > >>> A log from a successful case in the end of email, using v5.15-rc1 plus
> > >>> revert of commit 020d3b26c07abe27.
> > >>>
> > >>> > To me, the device loses power, that is why MHI resuming is failing.
> > >>> > Normally the device should be properly recovered/reinitialized. Before
> > >>> > that patch the power loss was simply not detected (or handled at
> > >>> > higher stack level).
> > >>>
> > >>> Currently in ath11k we always keep the firmware running when in suspend,
> > >>> this is a workaround due to problems between mac80211 and MHI stack.
> > >>> IIRC the problem was something related MHI creating struct device during
> > >>> resume or something like that.
> > >>
> > >> Could you give a try with the attached patch? It should solve your
> > >> issue without breaking modem support.
> > >
> > > Sorry for taking so long, but I now tested your patch on top of
> > > v5.15-rc3 and, as expected, everything works as before with QCA6390 on
> > > NUC x86 testbox.
> > >
> > > Tested-by: Kalle Valo <[email protected]>
> >
> > I doubt we will find enough time to fully debug this mhi issue anytime
> > soon. Can we commit Loic's patch so that this regression is resolved?
> >
>
> Sorry no :( Even though Loic's patch is working, I want to understand the
> issue properly so that we could add a proper fix or patch the firmware
> if possible.
>
> Let's try to get the debug logs as I requested.
>
I'm able to reproduce the issue on my NUC. I'm still investigating how to
properly fix this issue. Expect a patch soon.
Thanks,
Mani
> Thanks,
> Mani
>
> > At the moment I'm doing all my regression testing with commit
> > 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
> > testing without any hacks.
> >
> > --
> > https://patchwork.kernel.org/project/linux-wireless/list/
> >
> > https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Hi, this is your Linux kernel regression tracker speaking, this time
looking for a status update.
On 18.11.21 18:41, Manivannan Sadhasivam wrote:
> On Thu, Oct 21, 2021 at 03:33:05PM +0530, Manivannan Sadhasivam wrote:
>> On Tue, Oct 19, 2021 at 03:12:01PM +0300, Kalle Valo wrote:
>>> Kalle Valo <[email protected]> writes:
>>>
>>>> (adding the new mhi list, yay)
>>>>
>>>> Hi Loic,
>>>>
>>>> Loic Poulain <[email protected]> writes:
>>>>
>>>>>> Loic Poulain <[email protected]> writes:
>>>>>>
>>>>>>> On Thu, 16 Sept 2021 at 10:00, Kalle Valo <[email protected]> wrote:
>>>>>>
>>>>>>>> At the moment I'm running my tests with commit 020d3b26c07a reverted and
>>>>>>>> everything works without problems. Is there a simple way to fix this? Or
>>>>>>>> maybe we should just revert the commit? Commit log and kernel logs from
>>>>>>>> a failing case below.
>>>>>>>
>>>>>>> Do you have log of success case?
>>>>>>
>>>>>> A log from a successful case in the end of email, using v5.15-rc1 plus
>>>>>> revert of commit 020d3b26c07abe27.
>>>>>>
>>>>>>> To me, the device loses power, that is why MHI resuming is failing.
>>>>>>> Normally the device should be properly recovered/reinitialized. Before
>>>>>>> that patch the power loss was simply not detected (or handled at
>>>>>>> higher stack level).
>>>>>>
>>>>>> Currently in ath11k we always keep the firmware running when in suspend,
>>>>>> this is a workaround due to problems between mac80211 and MHI stack.
>>>>>> IIRC the problem was something related MHI creating struct device during
>>>>>> resume or something like that.
>>>>>
>>>>> Could you give a try with the attached patch? It should solve your
>>>>> issue without breaking modem support.
>>>>
>>>> Sorry for taking so long, but I now tested your patch on top of
>>>> v5.15-rc3 and, as expected, everything works as before with QCA6390 on
>>>> NUC x86 testbox.
>>>>
>>>> Tested-by: Kalle Valo <[email protected]>
>>>
>>> I doubt we will find enough time to fully debug this mhi issue anytime
>>> soon. Can we commit Loic's patch so that this regression is resolved?
>>>
>>
>> Sorry no :( Even though Loic's patch is working, I want to understand the
>> issue properly so that we could add a proper fix or patch the firmware
>> if possible.
>>
>> Let's try to get the debug logs as I requested.
>
> I'm able to reproduce the issue on my NUC. I'm still investigating on how to
> properly fix this issue. Expect a patch soon.
Was there some progress? This issue was reported 75 days ago and still
is not fixed. From the point of the Linux kernel regression tracker I'd
say: it should not take this long. Looking back at it I wonder if
'revert the culprit and reapply later together with a proper fix'
would have been the better strategy. I wonder if that still would be the
best way forward if no patch is forthcoming soon.
Ciao, Thorsten
#regzbot poke
>>> At the moment I'm doing all my regression testing with commit
>>> 020d3b26c07abe27 reverted. That's a risk, I would prefer to do my
>>> testing without any hacks.
>>>
>>> --
>>> https://patchwork.kernel.org/project/linux-wireless/list/
>>>
>>> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply. That's in everyone's interest, as
what I wrote above might be misleading to everyone reading this; any
suggestion I gave might thus send someone reading this down the
wrong rabbit hole, which none of us wants.
BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CCed on
all further activities wrt this regression.