2022-08-25 11:24:24

by Manikanta Pubbisetty

Subject: [PATCH v2 3/4] ath11k: Enable low power mode when WLAN is not active

Currently, the WLAN chip is powered on once during driver probe and
stays powered even when WLAN is not active. Keeping the chip powered
all the time consumes extra power, which is undesirable on a
battery-operated device. The same applies to non-WoW suspend: the chip
is never put into low power mode when the system is suspended,
resulting in higher battery drain.

As per the recommendation, sending a PDEV suspend WMI command followed
by a QMI MODE OFF command will cease all WLAN activity and put the device
in low power mode. When WLAN interfaces are brought up, sending a QMI
MISSION MODE command is sufficient to bring the chip out of low power
mode. This is a better approach than doing hif_power_down()/hif_power_up()
for every WiFi ON/OFF sequence, since the turnaround time for entering
and exiting low power mode is much shorter: the overhead is just the time
taken to send the QMI MODE OFF and QMI MISSION MODE commands, instead of
going through the entire chip boot and QMI init sequence.

Currently, the changes apply only to WCN6750. They can be extended to
other targets in a future patch.

Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1

Signed-off-by: Manikanta Pubbisetty <[email protected]>
---
drivers/net/wireless/ath/ath11k/ahb.c | 45 ++++++++
drivers/net/wireless/ath/ath11k/core.c | 149 ++++++++++++++++++++++---
drivers/net/wireless/ath/ath11k/core.h | 2 +
drivers/net/wireless/ath/ath11k/hif.h | 12 ++
drivers/net/wireless/ath/ath11k/mac.c | 8 +-
drivers/net/wireless/ath/ath11k/pci.c | 26 +++++
6 files changed, 222 insertions(+), 20 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/ahb.c b/drivers/net/wireless/ath/ath11k/ahb.c
index c47414710138..d5784f319f2d 100644
--- a/drivers/net/wireless/ath/ath11k/ahb.c
+++ b/drivers/net/wireless/ath/ath11k/ahb.c
@@ -189,6 +189,47 @@ static const struct ath11k_pci_ops ath11k_ahb_pci_ops_wcn6750 = {
.window_read32 = ath11k_ahb_window_read32_wcn6750,
};

+static int ath11k_ahb_core_start_wcn6750(struct ath11k_base *ab)
+{
+ /* Initialize the hardware/firmware only for the first PDEV
+ * or during hardware recovery.
+ */
+ if (!test_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags) &&
+ ath11k_core_any_pdevs_on(ab))
+ return 0;
+
+ return ath11k_core_start_device(ab);
+}
+
+static void ath11k_ahb_core_stop_wcn6750(struct ath11k_base *ab)
+{
+ ath11k_core_stop_device(ab);
+}
+
+static int ath11k_ahb_core_start_ipq8074(struct ath11k_base *ab)
+{
+ /* TODO: Currently initializing the hardware/firmware only
+ * during hardware recovery. Support to shutdown/turn-on
+ * the hardware during Wi-Fi OFF/ON will be added later.
+ */
+ if (!test_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags))
+ return 0;
+
+ return ath11k_core_start_device(ab);
+}
+
+static void ath11k_ahb_core_stop_ipq8074(struct ath11k_base *ab)
+{
+ /* TODO: Currently stopping the hardware/firmware only
+ * during driver unload. Support to shutdown/turn-on
+ * the hardware during Wi-Fi OFF/ON will be added later.
+ */
+ if (!test_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags))
+ return;
+
+ ath11k_core_stop_device(ab);
+}
+
static inline u32 ath11k_ahb_read32(struct ath11k_base *ab, u32 offset)
{
return ioread32(ab->mem + offset);
@@ -695,6 +736,8 @@ static const struct ath11k_hif_ops ath11k_ahb_hif_ops_ipq8074 = {
.map_service_to_pipe = ath11k_ahb_map_service_to_pipe,
.power_down = ath11k_ahb_power_down,
.power_up = ath11k_ahb_power_up,
+ .core_start = ath11k_ahb_core_start_ipq8074,
+ .core_stop = ath11k_ahb_core_stop_ipq8074,
};

static const struct ath11k_hif_ops ath11k_ahb_hif_ops_wcn6750 = {
@@ -709,6 +752,8 @@ static const struct ath11k_hif_ops ath11k_ahb_hif_ops_wcn6750 = {
.map_service_to_pipe = ath11k_pcic_map_service_to_pipe,
.power_down = ath11k_ahb_power_down,
.power_up = ath11k_ahb_power_up,
+ .core_start = ath11k_ahb_core_start_wcn6750,
+ .core_stop = ath11k_ahb_core_stop_wcn6750,
};

static int ath11k_core_get_rproc(struct ath11k_base *ab)
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c
index 8cfef7a07a65..88a54b36e178 100644
--- a/drivers/net/wireless/ath/ath11k/core.c
+++ b/drivers/net/wireless/ath/ath11k/core.c
@@ -1175,7 +1175,6 @@ static int ath11k_core_soc_create(struct ath11k_base *ab)
static void ath11k_core_soc_destroy(struct ath11k_base *ab)
{
ath11k_debugfs_soc_destroy(ab);
- ath11k_dp_free(ab);
ath11k_reg_free(ab);
ath11k_qmi_deinit_service(ab);
}
@@ -1231,11 +1230,7 @@ static int ath11k_core_pdev_create(struct ath11k_base *ab)

static void ath11k_core_pdev_destroy(struct ath11k_base *ab)
{
- ath11k_spectral_deinit(ab);
- ath11k_thermal_unregister(ab);
ath11k_mac_unregister(ab);
- ath11k_hif_irq_disable(ab);
- ath11k_dp_pdev_free(ab);
ath11k_debugfs_pdev_destroy(ab);
}

@@ -1367,7 +1362,7 @@ static int ath11k_core_start_firmware(struct ath11k_base *ab,
return ret;
}

-int ath11k_core_qmi_firmware_ready(struct ath11k_base *ab)
+static int ath11k_core_setup_device(struct ath11k_base *ab)
{
int ret;

@@ -1400,17 +1395,44 @@ int ath11k_core_qmi_firmware_ready(struct ath11k_base *ab)
break;
default:
ath11k_info(ab, "invalid crypto_mode: %d\n", ath11k_crypto_mode);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err_dp_free;
}

if (ath11k_frame_mode == ATH11K_HW_TXRX_RAW)
set_bit(ATH11K_FLAG_RAW_MODE, &ab->dev_flags);

+ return 0;
+
+err_dp_free:
+ ath11k_dp_free(ab);
+err_firmware_stop:
+ ath11k_qmi_firmware_stop(ab);
+
+ return ret;
+}
+
+static void ath11k_core_free_device(struct ath11k_base *ab)
+{
+ ath11k_dp_free(ab);
+ ath11k_qmi_firmware_stop(ab);
+}
+
+int ath11k_core_qmi_firmware_ready(struct ath11k_base *ab)
+{
+ int ret;
+
+ ret = ath11k_core_setup_device(ab);
+ if (ret) {
+ ath11k_err(ab, "failed to setup device: %d\n", ret);
+ return ret;
+ }
+
mutex_lock(&ab->core_lock);
ret = ath11k_core_start(ab);
if (ret) {
ath11k_err(ab, "failed to start core: %d\n", ret);
- goto err_dp_free;
+ goto err_core_free;
}

ret = ath11k_core_pdev_create(ab);
@@ -1418,7 +1440,10 @@ int ath11k_core_qmi_firmware_ready(struct ath11k_base *ab)
ath11k_err(ab, "failed to create pdev core: %d\n", ret);
goto err_core_stop;
}
+
ath11k_hif_irq_enable(ab);
+ ath11k_hif_core_stop(ab);
+
mutex_unlock(&ab->core_lock);

return 0;
@@ -1426,11 +1451,9 @@ int ath11k_core_qmi_firmware_ready(struct ath11k_base *ab)
err_core_stop:
ath11k_core_stop(ab);
ath11k_mac_destroy(ab);
-err_dp_free:
- ath11k_dp_free(ab);
+err_core_free:
mutex_unlock(&ab->core_lock);
-err_firmware_stop:
- ath11k_qmi_firmware_stop(ab);
+ ath11k_core_free_device(ab);

return ret;
}
@@ -1708,7 +1731,6 @@ void ath11k_core_deinit(struct ath11k_base *ab)
mutex_lock(&ab->core_lock);

ath11k_core_pdev_destroy(ab);
- ath11k_core_stop(ab);

mutex_unlock(&ab->core_lock);

@@ -1776,37 +1798,128 @@ struct ath11k_base *ath11k_core_alloc(struct device *dev, size_t priv_size,
}
EXPORT_SYMBOL(ath11k_core_alloc);

+static int ath11k_core_suspend_target(struct ath11k_base *ab, u32 suspend_opt)
+{
+ struct ath11k *ar;
+ struct ath11k_pdev *pdev;
+ unsigned long time_left;
+ int ret;
+ int i;
+
+ for (i = 0; i < ab->num_radios; i++) {
+ pdev = &ab->pdevs[i];
+ ar = pdev->ar;
+
+ reinit_completion(&ab->htc_suspend);
+
+ ret = ath11k_wmi_pdev_suspend(ar, suspend_opt, pdev->pdev_id);
+ if (ret) {
+ ath11k_warn(ab, "could not suspend target (%d)\n", ret);
+ return ret;
+ }
+
+ time_left = wait_for_completion_timeout(&ab->htc_suspend, 3 * HZ);
+
+ if (!time_left) {
+ ath11k_warn(ab, "suspend timed out - target pause event never came\n");
+ return -ETIMEDOUT;
+ }
+ }
+
+ return 0;
+}
+
+void ath11k_core_stop_device(struct ath11k_base *ab)
+{
+ ath11k_core_suspend_target(ab, WMI_PDEV_SUSPEND_AND_DISABLE_INTR);
+ ath11k_hif_irq_disable(ab);
+ ath11k_hif_stop(ab);
+
+ if (!test_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags))
+ ath11k_qmi_firmware_stop(ab);
+
+ ath11k_wmi_detach(ab);
+ ath11k_dp_pdev_reo_cleanup(ab);
+ ath11k_spectral_deinit(ab);
+ ath11k_thermal_unregister(ab);
+ ath11k_dp_pdev_free(ab);
+ ath11k_dp_free(ab);
+}
+EXPORT_SYMBOL(ath11k_core_stop_device);
+
+int ath11k_core_any_pdevs_on(struct ath11k_base *ab)
+{
+ struct ath11k_pdev *pdev;
+ struct ath11k *ar;
+ int i;
+
+ for (i = 0; i < ab->num_radios; i++) {
+ pdev = &ab->pdevs[i];
+ ar = pdev->ar;
+ if (!ar)
+ continue;
+
+ if (ar->state == ATH11K_STATE_ON)
+ return true;
+ }
+
+ return false;
+}
+EXPORT_SYMBOL(ath11k_core_any_pdevs_on);
+
int ath11k_core_start_device(struct ath11k_base *ab)
{
int ret;

- if (!test_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags))
- return 0;
+ mutex_lock(&ab->core_lock);

ath11k_hal_srng_deinit(ab);

ret = ath11k_hal_srng_init(ab);
if (ret) {
ath11k_err(ab, "failed to init srng: %d\n", ret);
- return ret;
+ goto err_unlock;
}

clear_bit(ATH11K_FLAG_CRASH_FLUSH, &ab->dev_flags);

- ret = ath11k_core_qmi_firmware_ready(ab);
+ ret = ath11k_core_setup_device(ab);
if (ret) {
- ath11k_err(ab, "failed to init core: %d\n", ret);
+ ath11k_err(ab, "failed to setup device: %d\n", ret);
goto err_hal_srng_deinit;
}

+ ret = ath11k_core_start(ab);
+ if (ret) {
+ ath11k_err(ab, "failed to start core: %d\n", ret);
+ goto err_core_free;
+ }
+
+ ret = ath11k_core_pdev_create(ab);
+ if (ret) {
+ ath11k_err(ab, "failed to create pdev core: %d\n", ret);
+ goto err_core_stop;
+ }
+
+ ath11k_hif_irq_enable(ab);
clear_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags);

+ mutex_unlock(&ab->core_lock);
+
return 0;

+err_core_stop:
+ ath11k_core_stop(ab);
+ ath11k_mac_destroy(ab);
+err_core_free:
+ ath11k_core_free_device(ab);
err_hal_srng_deinit:
ath11k_hal_srng_deinit(ab);
+err_unlock:
+ mutex_unlock(&ab->core_lock);
return ret;
}
+EXPORT_SYMBOL(ath11k_core_start_device);

MODULE_DESCRIPTION("Core module for Qualcomm Atheros 802.11ax wireless LAN cards.");
MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/wireless/ath/ath11k/core.h
index 086cc127c39c..e5436b7a8f5d 100644
--- a/drivers/net/wireless/ath/ath11k/core.h
+++ b/drivers/net/wireless/ath/ath11k/core.h
@@ -1136,6 +1136,8 @@ void ath11k_core_halt(struct ath11k *ar);
int ath11k_core_resume(struct ath11k_base *ab);
int ath11k_core_suspend(struct ath11k_base *ab);
int ath11k_core_start_device(struct ath11k_base *ab);
+void ath11k_core_stop_device(struct ath11k_base *ab);
+int ath11k_core_any_pdevs_on(struct ath11k_base *ab);

const struct firmware *ath11k_core_firmware_request(struct ath11k_base *ab,
const char *filename);
diff --git a/drivers/net/wireless/ath/ath11k/hif.h b/drivers/net/wireless/ath/ath11k/hif.h
index e9366f786fbb..e23b6ed17542 100644
--- a/drivers/net/wireless/ath/ath11k/hif.h
+++ b/drivers/net/wireless/ath/ath11k/hif.h
@@ -29,6 +29,8 @@ struct ath11k_hif_ops {
void (*ce_irq_enable)(struct ath11k_base *ab);
void (*ce_irq_disable)(struct ath11k_base *ab);
void (*get_ce_msi_idx)(struct ath11k_base *ab, u32 ce_id, u32 *msi_idx);
+ int (*core_start)(struct ath11k_base *ab);
+ void (*core_stop)(struct ath11k_base *ab);
};

static inline void ath11k_hif_ce_irq_enable(struct ath11k_base *ab)
@@ -134,4 +136,14 @@ static inline void ath11k_get_ce_msi_idx(struct ath11k_base *ab, u32 ce_id,
else
*msi_data_idx = ce_id;
}
+
+static inline int ath11k_hif_core_start(struct ath11k_base *ab)
+{
+ return ab->hif.ops->core_start(ab);
+}
+
+static inline void ath11k_hif_core_stop(struct ath11k_base *ab)
+{
+ ab->hif.ops->core_stop(ab);
+}
#endif /* _HIF_H_ */
diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c
index 56d63e9d2f1a..0cb5f6b7e90e 100644
--- a/drivers/net/wireless/ath/ath11k/mac.c
+++ b/drivers/net/wireless/ath/ath11k/mac.c
@@ -5705,9 +5705,9 @@ static int ath11k_mac_op_start(struct ieee80211_hw *hw)

ath11k_mac_drain_tx(ar);

- ret = ath11k_core_start_device(ab);
+ ret = ath11k_hif_core_start(ab);
if (ret) {
- ath11k_err(ab, "failed to start device : %d\n", ret);
+ ath11k_err(ab, "failed to start core: %d\n", ret);
return ret;
}

@@ -5868,6 +5868,10 @@ static void ath11k_mac_op_stop(struct ieee80211_hw *hw)
synchronize_rcu();

atomic_set(&ar->num_pending_mgmt_tx, 0);
+
+ /* If all PDEVs on the SoC are down, then power down the device */
+ if (!ath11k_core_any_pdevs_on(ar->ab))
+ ath11k_hif_core_stop(ar->ab);
}

static void
diff --git a/drivers/net/wireless/ath/ath11k/pci.c b/drivers/net/wireless/ath/ath11k/pci.c
index 5bd34a6273d9..b4ad69d06a90 100644
--- a/drivers/net/wireless/ath/ath11k/pci.c
+++ b/drivers/net/wireless/ath/ath11k/pci.c
@@ -680,6 +680,30 @@ static int ath11k_pci_start(struct ath11k_base *ab)
return 0;
}

+static int ath11k_pci_core_start(struct ath11k_base *ab)
+{
+ /* TODO: Currently initializing the hardware/firmware only
+ * during hardware recovery. Support to shutdown/turn-on
+ * the hardware during Wi-Fi OFF/ON will be added later.
+ */
+ if (!test_bit(ATH11K_FLAG_RECOVERY, &ab->dev_flags))
+ return 0;
+
+ return ath11k_core_start_device(ab);
+}
+
+static void ath11k_pci_core_stop(struct ath11k_base *ab)
+{
+ /* TODO: Currently stopping the hardware/firmware only
+ * during driver unload. Support to shutdown/turn-on
+ * the hardware during Wi-Fi OFF/ON will be added later.
+ */
+ if (!test_bit(ATH11K_FLAG_UNREGISTERING, &ab->dev_flags))
+ return;
+
+ ath11k_core_stop_device(ab);
+}
+
static const struct ath11k_hif_ops ath11k_pci_hif_ops = {
.start = ath11k_pci_start,
.stop = ath11k_pcic_stop,
@@ -697,6 +721,8 @@ static const struct ath11k_hif_ops ath11k_pci_hif_ops = {
.ce_irq_enable = ath11k_pci_hif_ce_irq_enable,
.ce_irq_disable = ath11k_pci_hif_ce_irq_disable,
.get_ce_msi_idx = ath11k_pcic_get_ce_msi_idx,
+ .core_start = ath11k_pci_core_start,
+ .core_stop = ath11k_pci_core_stop,
};

static void ath11k_pci_read_hw_version(struct ath11k_base *ab, u32 *major, u32 *minor)
--
2.37.1


2022-08-31 08:08:01

by Kalle Valo

Subject: Re: [PATCH v2 3/4] ath11k: Enable low power mode when WLAN is not active

Manikanta Pubbisetty <[email protected]> writes:

> Currently, WLAN chip is powered once during driver probe and is kept
> ON (powered) always even when WLAN is not active; keeping the chip
> powered ON all the time will consume extra power which is not
> desirable for a battery operated device. Same is the case with non-WoW
> suspend, chip will never be put into low power mode when the system is
> suspended resulting in higher battery drain.
>
> As per the recommendation, sending a PDEV suspend WMI command followed
> by a QMI MODE OFF command will cease all WLAN activity and put the device
> in low power mode. When WLAN interfaces are brought up, sending a QMI
> MISSION MODE command would be sufficient to bring the chip out of low
> power. This is a better approach than doing hif_power_down()/hif_power_up()
> for every WiFi ON/OFF sequence since the turnaround time for entry/exit of
> low power mode is much less. Overhead is just the time taken for sending
> QMI MODE OFF & QMI MISSION MODE commands instead of going through the
> entire chip boot & QMI init sequence.
>
> Currently the changes are applicable only for WCN6750. This can be
> extended to other targets with a future patch.
>
> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>
> Signed-off-by: Manikanta Pubbisetty <[email protected]>

With this patch 3 my regression tests on WCN6855 panic:

[ 109.009427] rmmod ath11k_pci
[ 109.437120] kworker/dying (1560) used greatest stack depth: 24704 bytes left
[ 109.446051] general protection fault, probably for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[ 109.446133] KASAN: null-ptr-deref in range [0x00000000000001f0-0x00000000000001f7]
[ 109.446211] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc2-wt-ath+ #683
[ 109.446315] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
[ 109.446393] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
[ 109.446486] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 aa
[ 109.446557] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
[ 109.446619] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 109.446677] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
[ 109.446734] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
[ 109.446792] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
[ 109.446903] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
[ 109.446961] FS: 0000000000000000(0000) GS:ffff888233600000(0000) knlGS:0000000000000000
[ 109.447020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.447080] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
[ 109.447136] Call Trace:
[ 109.447194] <IRQ>
[ 109.447257] ? ath11k_dp_process_reo_status+0x3ce/0x560 [ath11k]
[ 109.447349] ? ath11k_dp_rx_process_wbm_err+0x17d0/0x17d0 [ath11k]
[ 109.447442] ath11k_dp_service_srng+0x358/0xab0 [ath11k]
[ 109.447536] ath11k_pcic_ext_grp_napi_poll+0x40/0x1d0 [ath11k]
[ 109.447628] __napi_poll.constprop.0+0x97/0x350
[ 109.447690] net_rx_action+0x7bf/0xbb0
[ 109.447749] ? __napi_poll.constprop.0+0x350/0x350
[ 109.447831] ? ktime_get+0x101/0x2f0
[ 109.447911] __do_softirq+0x1ca/0x897
[ 109.447989] __irq_exit_rcu+0xdb/0x180
[ 109.448049] irq_exit_rcu+0x5/0x20
[ 109.448109] sysvec_apic_timer_interrupt+0x8a/0xb0
[ 109.448168] </IRQ>
[ 109.448226] <TASK>
[ 109.448284] asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 109.448343] RIP: 0010:cpuidle_enter_state+0x1ef/0x9a0
[ 109.448407] Code: 00 00 8b 73 04 bf ff ff ff ff e8 8c f0 ff ff 31 ff e8 75 16 47 fe 80 7c 24 08 00 0f 85 a1 01 00 00 e8 75 7d 6a fe fb 45 85 f6 <0f> 88 6f 02 00 00 4d 63 ee 4b 8d 44 6d 00 49 8d 44 85 00 48 8d 7c
[ 109.448477] RSP: 0018:ffffffffb9407db8 EFLAGS: 00000202
[ 109.448540] RAX: 0000000000124987 RBX: ffff888114966400 RCX: 1ffffffff738a3e1
[ 109.448600] RDX: 0000000000000000 RSI: ffffffffb8a71bc0 RDI: ffffffffb8bec9c0
[ 109.448659] RBP: ffffffffb986efa0 R08: 0000000000000001 R09: ffffffffb9c54717
[ 109.448715] R10: fffffbfff738a8e2 R11: 00000000000002ec R12: 000000197b7a6a19
[ 109.448774] R13: 0000000000000002 R14: 0000000000000002 R15: ffff888114966404
[ 109.448871] ? cpuidle_enter_state+0x1eb/0x9a0
[ 109.448947] ? finish_task_switch.isra.0+0x120/0x870
[ 109.449006] cpuidle_enter+0x45/0xa0
[ 109.449065] cpuidle_idle_call+0x274/0x3f0
[ 109.449125] ? arch_cpu_idle_exit+0x30/0x30
[ 109.449187] ? tsc_verify_tsc_adjust+0x86/0x2d0
[ 109.449247] ? lockdep_hardirqs_off+0x90/0xd0
[ 109.449306] do_idle+0xe0/0x140
[ 109.449363] cpu_startup_entry+0x14/0x20
[ 109.449423] rest_init+0x135/0x1e0
[ 109.449482] arch_call_rest_init+0xa/0xb
[ 109.449542] start_kernel+0x391/0x3af
[ 109.449606] secondary_startup_64_no_verify+0xce/0xdb
[ 109.449671] </TASK>
[ 109.449729] Modules linked in: ath11k_pci(-) ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr nvme nvme_core
[ 109.449846] ---[ end trace 0000000000000000 ]---
[ 109.449927] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
[ 109.450073] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 aa
[ 109.450147] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
[ 109.450212] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 109.450275] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
[ 109.450337] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
[ 109.450401] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
[ 109.450465] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
[ 109.450531] FS: 0000000000000000(0000) GS:ffff888233600000(0000) knlGS:0000000000000000
[ 109.450595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 109.450661] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
[ 109.450723] Kernel panic - not syncing: Fatal exception in interrupt
[ 109.450794] Kernel Offset: 0x34e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 109.450837] Rebooting in 10 seconds..

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2022-09-01 08:18:39

by Manikanta Pubbisetty

Subject: Re: [PATCH v2 3/4] ath11k: Enable low power mode when WLAN is not active

On 8/31/2022 1:31 PM, Kalle Valo wrote:
> Manikanta Pubbisetty <[email protected]> writes:
>
>> Currently, WLAN chip is powered once during driver probe and is kept
>> ON (powered) always even when WLAN is not active; keeping the chip
>> powered ON all the time will consume extra power which is not
>> desirable for a battery operated device. Same is the case with non-WoW
>> suspend, chip will never be put into low power mode when the system is
>> suspended resulting in higher battery drain.
>>
>> As per the recommendation, sending a PDEV suspend WMI command followed
>> by a QMI MODE OFF command will cease all WLAN activity and put the device
>> in low power mode. When WLAN interfaces are brought up, sending a QMI
>> MISSION MODE command would be sufficient to bring the chip out of low
>> power. This is a better approach than doing hif_power_down()/hif_power_up()
>> for every WiFi ON/OFF sequence since the turnaround time for entry/exit of
>> low power mode is much less. Overhead is just the time taken for sending
>> QMI MODE OFF & QMI MISSION MODE commands instead of going through the
>> entire chip boot & QMI init sequence.
>>
>> Currently the changes are applicable only for WCN6750. This can be
>> extended to other targets with a future patch.
>>
>> Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-00887-QCAMSLSWPLZ-1
>>
>> Signed-off-by: Manikanta Pubbisetty <[email protected]>
>
> With this patch 3 my regression tests on WCN6855 panic:
>
> [ 109.009427] rmmod ath11k_pci
> [ 109.437120] kworker/dying (1560) used greatest stack depth: 24704 bytes left
> [ 109.446051] general protection fault, probably for non-canonical address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
> [ 109.446133] KASAN: null-ptr-deref in range [0x00000000000001f0-0x00000000000001f7]
> [ 109.446211] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc2-wt-ath+ #683
> [ 109.446315] Hardware name: Intel(R) Client Systems NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339 05/28/2021
> [ 109.446393] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
> [ 109.446486] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 aa
> [ 109.446557] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
> [ 109.446619] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 109.446677] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
> [ 109.446734] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
> [ 109.446792] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
> [ 109.446903] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
> [ 109.446961] FS: 0000000000000000(0000) GS:ffff888233600000(0000) knlGS:0000000000000000
> [ 109.447020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 109.447080] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
> [ 109.447136] Call Trace:
> [ 109.447194] <IRQ>
> [ 109.447257] ? ath11k_dp_process_reo_status+0x3ce/0x560 [ath11k]
> [ 109.447349] ? ath11k_dp_rx_process_wbm_err+0x17d0/0x17d0 [ath11k]
> [ 109.447442] ath11k_dp_service_srng+0x358/0xab0 [ath11k]
> [ 109.447536] ath11k_pcic_ext_grp_napi_poll+0x40/0x1d0 [ath11k]
> [ 109.447628] __napi_poll.constprop.0+0x97/0x350
> [ 109.447690] net_rx_action+0x7bf/0xbb0
> [ 109.447749] ? __napi_poll.constprop.0+0x350/0x350
> [ 109.447831] ? ktime_get+0x101/0x2f0
> [ 109.447911] __do_softirq+0x1ca/0x897
> [ 109.447989] __irq_exit_rcu+0xdb/0x180
> [ 109.448049] irq_exit_rcu+0x5/0x20
> [ 109.448109] sysvec_apic_timer_interrupt+0x8a/0xb0
> [ 109.448168] </IRQ>
> [ 109.448226] <TASK>
> [ 109.448284] asm_sysvec_apic_timer_interrupt+0x16/0x20
> [ 109.448343] RIP: 0010:cpuidle_enter_state+0x1ef/0x9a0
> [ 109.448407] Code: 00 00 8b 73 04 bf ff ff ff ff e8 8c f0 ff ff 31 ff e8 75 16 47 fe 80 7c 24 08 00 0f 85 a1 01 00 00 e8 75 7d 6a fe fb 45 85 f6 <0f> 88 6f 02 00 00 4d 63 ee 4b 8d 44 6d 00 49 8d 44 85 00 48 8d 7c
> [ 109.448477] RSP: 0018:ffffffffb9407db8 EFLAGS: 00000202
> [ 109.448540] RAX: 0000000000124987 RBX: ffff888114966400 RCX: 1ffffffff738a3e1
> [ 109.448600] RDX: 0000000000000000 RSI: ffffffffb8a71bc0 RDI: ffffffffb8bec9c0
> [ 109.448659] RBP: ffffffffb986efa0 R08: 0000000000000001 R09: ffffffffb9c54717
> [ 109.448715] R10: fffffbfff738a8e2 R11: 00000000000002ec R12: 000000197b7a6a19
> [ 109.448774] R13: 0000000000000002 R14: 0000000000000002 R15: ffff888114966404
> [ 109.448871] ? cpuidle_enter_state+0x1eb/0x9a0
> [ 109.448947] ? finish_task_switch.isra.0+0x120/0x870
> [ 109.449006] cpuidle_enter+0x45/0xa0
> [ 109.449065] cpuidle_idle_call+0x274/0x3f0
> [ 109.449125] ? arch_cpu_idle_exit+0x30/0x30
> [ 109.449187] ? tsc_verify_tsc_adjust+0x86/0x2d0
> [ 109.449247] ? lockdep_hardirqs_off+0x90/0xd0
> [ 109.449306] do_idle+0xe0/0x140
> [ 109.449363] cpu_startup_entry+0x14/0x20
> [ 109.449423] rest_init+0x135/0x1e0
> [ 109.449482] arch_call_rest_init+0xa/0xb
> [ 109.449542] start_kernel+0x391/0x3af
> [ 109.449606] secondary_startup_64_no_verify+0xce/0xdb
> [ 109.449671] </TASK>
> [ 109.449729] Modules linked in: ath11k_pci(-) ath11k mac80211 libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr nvme nvme_core
> [ 109.449846] ---[ end trace 0000000000000000 ]---
> [ 109.449927] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
> [ 109.450073] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 aa
> [ 109.450147] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
> [ 109.450212] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 109.450275] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
> [ 109.450337] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
> [ 109.450401] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
> [ 109.450465] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
> [ 109.450531] FS: 0000000000000000(0000) GS:ffff888233600000(0000) knlGS:0000000000000000
> [ 109.450595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 109.450661] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
> [ 109.450723] Kernel panic - not syncing: Fatal exception in interrupt
> [ 109.450794] Kernel Offset: 0x34e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [ 109.450837] Rebooting in 10 seconds..
>

Thanks for the report. From the logs, I see the system crashed during
rmmod ath11k_pci. I ran rmmod & insmod tests several times on WCN6855
before sending the revised version, so some test run before rmmod
probably led to this crash. Could you please let me know which tests
the regression suite covers? I'll try to reproduce the bug.

Thanks,
Manikanta

2022-09-02 13:56:45

by Kalle Valo

Subject: Re: [PATCH v2 3/4] ath11k: Enable low power mode when WLAN is not active

Manikanta Pubbisetty <[email protected]> writes:

> On 8/31/2022 1:31 PM, Kalle Valo wrote:
>
>> With this patch 3 my regression tests on WCN6855 panic:
>>
>> [ 109.009427] rmmod ath11k_pci
>> [ 109.437120] kworker/dying (1560) used greatest stack depth: 24704 bytes left
>> [ 109.446051] general protection fault, probably for non-canonical
>> address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> KASAN
>> [ 109.446133] KASAN: null-ptr-deref in range
>> [0x00000000000001f0-0x00000000000001f7]
>> [ 109.446211] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc2-wt-ath+ #683
>> [ 109.446315] Hardware name: Intel(R) Client Systems
>> NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339
>> 05/28/2021
>> [ 109.446393] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
>> [ 109.446486] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0
>> 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
>> c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2
>> 0f 85 aa
>> [ 109.446557] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
>> [ 109.446619] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> [ 109.446677] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
>> [ 109.446734] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
>> [ 109.446792] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
>> [ 109.446903] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
>> [ 109.446961] FS: 0000000000000000(0000) GS:ffff888233600000(0000)
>> knlGS:0000000000000000
>> [ 109.447020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 109.447080] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
>> [ 109.447136] Call Trace:
>> [ 109.447194] <IRQ>
>> [ 109.447257] ? ath11k_dp_process_reo_status+0x3ce/0x560 [ath11k]
>> [ 109.447349] ? ath11k_dp_rx_process_wbm_err+0x17d0/0x17d0 [ath11k]
>> [ 109.447442] ath11k_dp_service_srng+0x358/0xab0 [ath11k]
>> [ 109.447536] ath11k_pcic_ext_grp_napi_poll+0x40/0x1d0 [ath11k]
>> [ 109.447628] __napi_poll.constprop.0+0x97/0x350
>> [ 109.447690] net_rx_action+0x7bf/0xbb0
>> [ 109.447749] ? __napi_poll.constprop.0+0x350/0x350
>> [ 109.447831] ? ktime_get+0x101/0x2f0
>> [ 109.447911] __do_softirq+0x1ca/0x897
>> [ 109.447989] __irq_exit_rcu+0xdb/0x180
>> [ 109.448049] irq_exit_rcu+0x5/0x20
>> [ 109.448109] sysvec_apic_timer_interrupt+0x8a/0xb0
>> [ 109.448168] </IRQ>
>> [ 109.448226] <TASK>
>> [ 109.448284] asm_sysvec_apic_timer_interrupt+0x16/0x20
>> [ 109.448343] RIP: 0010:cpuidle_enter_state+0x1ef/0x9a0
>> [ 109.448407] Code: 00 00 8b 73 04 bf ff ff ff ff e8 8c f0 ff ff 31
>> ff e8 75 16 47 fe 80 7c 24 08 00 0f 85 a1 01 00 00 e8 75 7d 6a fe fb
>> 45 85 f6 <0f> 88 6f 02 00 00 4d 63 ee 4b 8d 44 6d 00 49 8d 44 85 00
>> 48 8d 7c
>> [ 109.448477] RSP: 0018:ffffffffb9407db8 EFLAGS: 00000202
>> [ 109.448540] RAX: 0000000000124987 RBX: ffff888114966400 RCX: 1ffffffff738a3e1
>> [ 109.448600] RDX: 0000000000000000 RSI: ffffffffb8a71bc0 RDI: ffffffffb8bec9c0
>> [ 109.448659] RBP: ffffffffb986efa0 R08: 0000000000000001 R09: ffffffffb9c54717
>> [ 109.448715] R10: fffffbfff738a8e2 R11: 00000000000002ec R12: 000000197b7a6a19
>> [ 109.448774] R13: 0000000000000002 R14: 0000000000000002 R15: ffff888114966404
>> [ 109.448871] ? cpuidle_enter_state+0x1eb/0x9a0
>> [ 109.448947] ? finish_task_switch.isra.0+0x120/0x870
>> [ 109.449006] cpuidle_enter+0x45/0xa0
>> [ 109.449065] cpuidle_idle_call+0x274/0x3f0
>> [ 109.449125] ? arch_cpu_idle_exit+0x30/0x30
>> [ 109.449187] ? tsc_verify_tsc_adjust+0x86/0x2d0
>> [ 109.449247] ? lockdep_hardirqs_off+0x90/0xd0
>> [ 109.449306] do_idle+0xe0/0x140
>> [ 109.449363] cpu_startup_entry+0x14/0x20
>> [ 109.449423] rest_init+0x135/0x1e0
>> [ 109.449482] arch_call_rest_init+0xa/0xb
>> [ 109.449542] start_kernel+0x391/0x3af
>> [ 109.449606] secondary_startup_64_no_verify+0xce/0xdb
>> [ 109.449671] </TASK>
>> [ 109.449729] Modules linked in: ath11k_pci(-) ath11k mac80211
>> libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr nvme nvme_core
>> [ 109.449846] ---[ end trace 0000000000000000 ]---
>> [ 109.449927] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
>> [ 109.450073] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0
>> 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
>> c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2
>> 0f 85 aa
>> [ 109.450147] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
>> [ 109.450212] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
>> [ 109.450275] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
>> [ 109.450337] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
>> [ 109.450401] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
>> [ 109.450465] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
>> [ 109.450531] FS: 0000000000000000(0000) GS:ffff888233600000(0000)
>> knlGS:0000000000000000
>> [ 109.450595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 109.450661] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
>> [ 109.450723] Kernel panic - not syncing: Fatal exception in interrupt
>> [ 109.450794] Kernel Offset: 0x34e00000 from 0xffffffff81000000
>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> [ 109.450837] Rebooting in 10 seconds..
>
> Thanks for the report. From the logs, I see the system crashed during
> rmmod ath11k_pci. I have run rmmod & insmod tests several times on
> WCN6855 before sending the revised version. Probably some tests before
> rmmod would have led to this crash. Could you please let me know what
> tests does the regression suite covers? I'll try repro the bug.

This is early in the tests, IIRC there was only insmod followed by
rmmod. Do you have KASAN etc enabled?

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2022-09-02 13:58:40

by Manikanta Pubbisetty

Subject: Re: [PATCH v2 3/4] ath11k: Enable low power mode when WLAN is not active

On 9/2/2022 6:48 PM, Kalle Valo wrote:
> Manikanta Pubbisetty <[email protected]> writes:
>
>> On 8/31/2022 1:31 PM, Kalle Valo wrote:
>>
>>> With this patch 3 my regression tests on WCN6855 panic:
>>>
>>> [ 109.009427] rmmod ath11k_pci
>>> [ 109.437120] kworker/dying (1560) used greatest stack depth: 24704 bytes left
>>> [ 109.446051] general protection fault, probably for non-canonical
>>> address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>> KASAN
>>> [ 109.446133] KASAN: null-ptr-deref in range
>>> [0x00000000000001f0-0x00000000000001f7]
>>> [ 109.446211] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.0-rc2-wt-ath+ #683
>>> [ 109.446315] Hardware name: Intel(R) Client Systems
>>> NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339
>>> 05/28/2021
>>> [ 109.446393] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
>>> [ 109.446486] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0
>>> 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
>>> c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2
>>> 0f 85 aa
>>> [ 109.446557] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
>>> [ 109.446619] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> [ 109.446677] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
>>> [ 109.446734] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
>>> [ 109.446792] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
>>> [ 109.446903] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
>>> [ 109.446961] FS: 0000000000000000(0000) GS:ffff888233600000(0000)
>>> knlGS:0000000000000000
>>> [ 109.447020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 109.447080] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
>>> [ 109.447136] Call Trace:
>>> [ 109.447194] <IRQ>
>>> [ 109.447257] ? ath11k_dp_process_reo_status+0x3ce/0x560 [ath11k]
>>> [ 109.447349] ? ath11k_dp_rx_process_wbm_err+0x17d0/0x17d0 [ath11k]
>>> [ 109.447442] ath11k_dp_service_srng+0x358/0xab0 [ath11k]
>>> [ 109.447536] ath11k_pcic_ext_grp_napi_poll+0x40/0x1d0 [ath11k]
>>> [ 109.447628] __napi_poll.constprop.0+0x97/0x350
>>> [ 109.447690] net_rx_action+0x7bf/0xbb0
>>> [ 109.447749] ? __napi_poll.constprop.0+0x350/0x350
>>> [ 109.447831] ? ktime_get+0x101/0x2f0
>>> [ 109.447911] __do_softirq+0x1ca/0x897
>>> [ 109.447989] __irq_exit_rcu+0xdb/0x180
>>> [ 109.448049] irq_exit_rcu+0x5/0x20
>>> [ 109.448109] sysvec_apic_timer_interrupt+0x8a/0xb0
>>> [ 109.448168] </IRQ>
>>> [ 109.448226] <TASK>
>>> [ 109.448284] asm_sysvec_apic_timer_interrupt+0x16/0x20
>>> [ 109.448343] RIP: 0010:cpuidle_enter_state+0x1ef/0x9a0
>>> [ 109.448407] Code: 00 00 8b 73 04 bf ff ff ff ff e8 8c f0 ff ff 31
>>> ff e8 75 16 47 fe 80 7c 24 08 00 0f 85 a1 01 00 00 e8 75 7d 6a fe fb
>>> 45 85 f6 <0f> 88 6f 02 00 00 4d 63 ee 4b 8d 44 6d 00 49 8d 44 85 00
>>> 48 8d 7c
>>> [ 109.448477] RSP: 0018:ffffffffb9407db8 EFLAGS: 00000202
>>> [ 109.448540] RAX: 0000000000124987 RBX: ffff888114966400 RCX: 1ffffffff738a3e1
>>> [ 109.448600] RDX: 0000000000000000 RSI: ffffffffb8a71bc0 RDI: ffffffffb8bec9c0
>>> [ 109.448659] RBP: ffffffffb986efa0 R08: 0000000000000001 R09: ffffffffb9c54717
>>> [ 109.448715] R10: fffffbfff738a8e2 R11: 00000000000002ec R12: 000000197b7a6a19
>>> [ 109.448774] R13: 0000000000000002 R14: 0000000000000002 R15: ffff888114966404
>>> [ 109.448871] ? cpuidle_enter_state+0x1eb/0x9a0
>>> [ 109.448947] ? finish_task_switch.isra.0+0x120/0x870
>>> [ 109.449006] cpuidle_enter+0x45/0xa0
>>> [ 109.449065] cpuidle_idle_call+0x274/0x3f0
>>> [ 109.449125] ? arch_cpu_idle_exit+0x30/0x30
>>> [ 109.449187] ? tsc_verify_tsc_adjust+0x86/0x2d0
>>> [ 109.449247] ? lockdep_hardirqs_off+0x90/0xd0
>>> [ 109.449306] do_idle+0xe0/0x140
>>> [ 109.449363] cpu_startup_entry+0x14/0x20
>>> [ 109.449423] rest_init+0x135/0x1e0
>>> [ 109.449482] arch_call_rest_init+0xa/0xb
>>> [ 109.449542] start_kernel+0x391/0x3af
>>> [ 109.449606] secondary_startup_64_no_verify+0xce/0xdb
>>> [ 109.449671] </TASK>
>>> [ 109.449729] Modules linked in: ath11k_pci(-) ath11k mac80211
>>> libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr nvme nvme_core
>>> [ 109.449846] ---[ end trace 0000000000000000 ]---
>>> [ 109.449927] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810 [ath11k]
>>> [ 109.450073] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0
>>> 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
>>> c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2
>>> 0f 85 aa
>>> [ 109.450147] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
>>> [ 109.450212] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>> [ 109.450275] RDX: 000000000000003e RSI: 0000000000000000 RDI: 00000000000001f4
>>> [ 109.450337] RBP: 0000000000000000 R08: ffff88814c0e1470 R09: ffff88814c0f6678
>>> [ 109.450401] R10: dffffc0000000000 R11: ffffed102981ecd0 R12: 0000000000000000
>>> [ 109.450465] R13: 0000000000000040 R14: 1ffff92000000f81 R15: ffff88814c0e0000
>>> [ 109.450531] FS: 0000000000000000(0000) GS:ffff888233600000(0000)
>>> knlGS:0000000000000000
>>> [ 109.450595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 109.450661] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4: 00000000003706f0
>>> [ 109.450723] Kernel panic - not syncing: Fatal exception in interrupt
>>> [ 109.450794] Kernel Offset: 0x34e00000 from 0xffffffff81000000
>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> [ 109.450837] Rebooting in 10 seconds..
>>
>> Thanks for the report. From the logs, I see the system crashed during
>> rmmod ath11k_pci. I have run rmmod & insmod tests several times on
>> WCN6855 before sending the revised version. Probably some tests before
>> rmmod would have led to this crash. Could you please let me know what
>> tests does the regression suite covers? I'll try repro the bug.
>
> This is early in the tests, IIRC there was only insmod followed by
> rmmod. Do you have KASAN etc enabled?
>

I see. I do not have KASAN enabled. Let me try enabling it.

Thanks,
Manikanta

2022-11-21 11:01:47

by Manikanta Pubbisetty

Subject: Re: [PATCH v2 3/4] ath11k: Enable low power mode when WLAN is not active

On 9/2/2022 6:55 PM, Manikanta Pubbisetty wrote:
> On 9/2/2022 6:48 PM, Kalle Valo wrote:
>> Manikanta Pubbisetty <[email protected]> writes:
>>
>>> On 8/31/2022 1:31 PM, Kalle Valo wrote:
>>>
>>>> With this patch 3 my regression tests on WCN6855 panic:
>>>>
>>>> [  109.009427] rmmod ath11k_pci
>>>> [  109.437120] kworker/dying (1560) used greatest stack depth: 24704
>>>> bytes left
>>>> [ 109.446051] general protection fault, probably for non-canonical
>>>> address 0xdffffc000000003e: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>>>> KASAN
>>>> [ 109.446133] KASAN: null-ptr-deref in range
>>>> [0x00000000000001f0-0x00000000000001f7]
>>>> [  109.446211] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
>>>> 6.0.0-rc2-wt-ath+ #683
>>>> [ 109.446315] Hardware name: Intel(R) Client Systems
>>>> NUC8i7HVK/NUC8i7HVB, BIOS HNKBLi70.86A.0067.2021.0528.1339
>>>> 05/28/2021
>>>> [  109.446393] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810
>>>> [ath11k]
>>>> [ 109.446486] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0
>>>> 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
>>>> c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2
>>>> 0f 85 aa
>>>> [  109.446557] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
>>>> [  109.446619] RAX: dffffc0000000000 RBX: 0000000000000000 RCX:
>>>> 0000000000000000
>>>> [  109.446677] RDX: 000000000000003e RSI: 0000000000000000 RDI:
>>>> 00000000000001f4
>>>> [  109.446734] RBP: 0000000000000000 R08: ffff88814c0e1470 R09:
>>>> ffff88814c0f6678
>>>> [  109.446792] R10: dffffc0000000000 R11: ffffed102981ecd0 R12:
>>>> 0000000000000000
>>>> [  109.446903] R13: 0000000000000040 R14: 1ffff92000000f81 R15:
>>>> ffff88814c0e0000
>>>> [ 109.446961] FS: 0000000000000000(0000) GS:ffff888233600000(0000)
>>>> knlGS:0000000000000000
>>>> [  109.447020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  109.447080] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4:
>>>> 00000000003706f0
>>>> [  109.447136] Call Trace:
>>>> [  109.447194]  <IRQ>
>>>> [  109.447257]  ? ath11k_dp_process_reo_status+0x3ce/0x560 [ath11k]
>>>> [  109.447349]  ? ath11k_dp_rx_process_wbm_err+0x17d0/0x17d0 [ath11k]
>>>> [  109.447442]  ath11k_dp_service_srng+0x358/0xab0 [ath11k]
>>>> [  109.447536]  ath11k_pcic_ext_grp_napi_poll+0x40/0x1d0 [ath11k]
>>>> [  109.447628]  __napi_poll.constprop.0+0x97/0x350
>>>> [  109.447690]  net_rx_action+0x7bf/0xbb0
>>>> [  109.447749]  ? __napi_poll.constprop.0+0x350/0x350
>>>> [  109.447831]  ? ktime_get+0x101/0x2f0
>>>> [  109.447911]  __do_softirq+0x1ca/0x897
>>>> [  109.447989]  __irq_exit_rcu+0xdb/0x180
>>>> [  109.448049]  irq_exit_rcu+0x5/0x20
>>>> [  109.448109]  sysvec_apic_timer_interrupt+0x8a/0xb0
>>>> [  109.448168]  </IRQ>
>>>> [  109.448226]  <TASK>
>>>> [  109.448284]  asm_sysvec_apic_timer_interrupt+0x16/0x20
>>>> [  109.448343] RIP: 0010:cpuidle_enter_state+0x1ef/0x9a0
>>>> [ 109.448407] Code: 00 00 8b 73 04 bf ff ff ff ff e8 8c f0 ff ff 31
>>>> ff e8 75 16 47 fe 80 7c 24 08 00 0f 85 a1 01 00 00 e8 75 7d 6a fe fb
>>>> 45 85 f6 <0f> 88 6f 02 00 00 4d 63 ee 4b 8d 44 6d 00 49 8d 44 85 00
>>>> 48 8d 7c
>>>> [  109.448477] RSP: 0018:ffffffffb9407db8 EFLAGS: 00000202
>>>> [  109.448540] RAX: 0000000000124987 RBX: ffff888114966400 RCX:
>>>> 1ffffffff738a3e1
>>>> [  109.448600] RDX: 0000000000000000 RSI: ffffffffb8a71bc0 RDI:
>>>> ffffffffb8bec9c0
>>>> [  109.448659] RBP: ffffffffb986efa0 R08: 0000000000000001 R09:
>>>> ffffffffb9c54717
>>>> [  109.448715] R10: fffffbfff738a8e2 R11: 00000000000002ec R12:
>>>> 000000197b7a6a19
>>>> [  109.448774] R13: 0000000000000002 R14: 0000000000000002 R15:
>>>> ffff888114966404
>>>> [  109.448871]  ? cpuidle_enter_state+0x1eb/0x9a0
>>>> [  109.448947]  ? finish_task_switch.isra.0+0x120/0x870
>>>> [  109.449006]  cpuidle_enter+0x45/0xa0
>>>> [  109.449065]  cpuidle_idle_call+0x274/0x3f0
>>>> [  109.449125]  ? arch_cpu_idle_exit+0x30/0x30
>>>> [  109.449187]  ? tsc_verify_tsc_adjust+0x86/0x2d0
>>>> [  109.449247]  ? lockdep_hardirqs_off+0x90/0xd0
>>>> [  109.449306]  do_idle+0xe0/0x140
>>>> [  109.449363]  cpu_startup_entry+0x14/0x20
>>>> [  109.449423]  rest_init+0x135/0x1e0
>>>> [  109.449482]  arch_call_rest_init+0xa/0xb
>>>> [  109.449542]  start_kernel+0x391/0x3af
>>>> [  109.449606]  secondary_startup_64_no_verify+0xce/0xdb
>>>> [  109.449671]  </TASK>
>>>> [ 109.449729] Modules linked in: ath11k_pci(-) ath11k mac80211
>>>> libarc4 cfg80211 qmi_helpers qrtr_mhi mhi qrtr nvme nvme_core
>>>> [  109.449846] ---[ end trace 0000000000000000 ]---
>>>> [  109.449927] RIP: 0010:ath11k_dp_process_rxdma_err+0x1c8/0x810
>>>> [ath11k]
>>>> [ 109.450073] Code: d2 48 63 d8 48 8b 4c 24 38 48 8d 04 5b 48 c1 e0
>>>> 04 48 8d bc 01 f4 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
>>>> c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2
>>>> 0f 85 aa
>>>> [  109.450147] RSP: 0018:ffffc90000007b90 EFLAGS: 00010203
>>>> [  109.450212] RAX: dffffc0000000000 RBX: 0000000000000000 RCX:
>>>> 0000000000000000
>>>> [  109.450275] RDX: 000000000000003e RSI: 0000000000000000 RDI:
>>>> 00000000000001f4
>>>> [  109.450337] RBP: 0000000000000000 R08: ffff88814c0e1470 R09:
>>>> ffff88814c0f6678
>>>> [  109.450401] R10: dffffc0000000000 R11: ffffed102981ecd0 R12:
>>>> 0000000000000000
>>>> [  109.450465] R13: 0000000000000040 R14: 1ffff92000000f81 R15:
>>>> ffff88814c0e0000
>>>> [ 109.450531] FS: 0000000000000000(0000) GS:ffff888233600000(0000)
>>>> knlGS:0000000000000000
>>>> [  109.450595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  109.450661] CR2: 000055570fe84cc8 CR3: 0000000152540006 CR4:
>>>> 00000000003706f0
>>>> [  109.450723] Kernel panic - not syncing: Fatal exception in interrupt
>>>> [ 109.450794] Kernel Offset: 0x34e00000 from 0xffffffff81000000
>>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>> [  109.450837] Rebooting in 10 seconds..
>>>
>>> Thanks for the report. From the logs, I see the system crashed during
>>> rmmod ath11k_pci. I have run rmmod & insmod tests several times on
>>> WCN6855 before sending the revised version. Probably some tests before
>>> rmmod would have led to this crash. Could you please let me know what
>>> tests does the regression suite covers? I'll try repro the bug.
>>
>> This is early in the tests, IIRC there was only insmod followed by
>> rmmod. Do you have KASAN etc enabled?
>>
>
> I see. I do not have KASAN enabled. Let me try enabling it.
>

Hi Kalle,

I have been trying to reproduce the above issue on WCN6855 for several
weeks now with KASAN, UBSAN, LOCKDEBUG etc. enabled, but without success.
During this exercise I uncovered and fixed other issues, but I never hit
the above signature. I am not sure whether a particular sequence is
needed to trigger the crash, or whether it is an existing issue that
happened to surface while testing this series.

I will send a V3 of the patch set for review, which fixes the other
issues. I would appreciate your help testing it on your setup to see
whether the above issue still occurs.

Thanks,
Manikanta
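
A note on reading the quoted trace: the "non-canonical address
0xdffffc000000003e" and the KASAN range [0x1f0-0x1f7] appear to be two views
of the same access. The following is a minimal sketch in C of that
arithmetic, assuming the generic-KASAN x86-64 shadow offset of
0xdffffc0000000000 (the value seen in RAX in the dump) and a scale shift of 3.
The helper names mem_to_shadow() and shadow_to_mem() are illustrative only
and are not taken from the kernel sources.

```c
#include <stdint.h>

/* Assumed values for generic KASAN on x86-64; the offset matches RAX in the dump. */
#define SHADOW_OFFSET      0xdffffc0000000000ULL
#define SHADOW_SCALE_SHIFT 3 /* one shadow byte covers 8 bytes of memory */

/* Map a memory address to the shadow byte that KASAN checks before the access. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Map a shadow address back to the start of the 8-byte range it covers. */
static uint64_t shadow_to_mem(uint64_t shadow)
{
	return (shadow - SHADOW_OFFSET) << SHADOW_SCALE_SHIFT;
}
```

With RDI = 0x1f4 from the register dump, mem_to_shadow(0x1f4) gives
0xdffffc000000003e, the faulting address, and shadow_to_mem() of that value
gives 0x1f0, the start of the range KASAN prints. This only suggests that a
field at a small offset from a NULL pointer was being read; it does not by
itself say which pointer was NULL.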