2023-11-17 00:39:50

by Baochen Qiang

Subject: [PATCH] wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early

We are seeing the error below at random when only one MSI
vector is configured:

kernel: ath11k_pci 0000:03:00.0: wmi command 16387 timeout

The reason is that ath11k_pcic_ext_irq_enable() currently
sets ATH11K_FLAG_EXT_IRQ_ENABLED before NAPI is enabled.
This opens a race window: if a CE interrupt arrives after
ATH11K_FLAG_EXT_IRQ_ENABLED is set but before NAPI is
enabled, ath11k_pcic_ext_interrupt_handler() is called as
well, because the IRQ is shared by CE and the data path.
The handler calls disable_irq_nosync() to disable the IRQ
and then napi_schedule(), which does nothing because NAPI
is not enabled yet. As a result,
ath11k_pcic_ext_grp_napi_poll() never runs, so enable_irq()
is never called to re-enable the IRQ. Since the shared IRQ
stays disabled, CE interrupts are no longer delivered
either, and we end up with the error above.

Fix it by setting ATH11K_FLAG_EXT_IRQ_ENABLED only after
all NAPI and IRQ setup is done. With the fix, NAPI is
guaranteed to be enabled by the time
ATH11K_FLAG_EXT_IRQ_ENABLED is set.

Note that the fix also introduces a side effect: if
ath11k_pcic_ext_interrupt_handler() runs after NAPI is
enabled but before ATH11K_FLAG_EXT_IRQ_ENABLED is set, the
handler does nothing this time and the work is postponed
until the next time the IRQ fires.

Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23

Signed-off-by: Baochen Qiang <[email protected]>
---
drivers/net/wireless/ath/ath11k/pcic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
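
For illustration only, the ordering problem described in the commit
message can be sketched as a small user-space model. This is not driver
code: struct model, model_napi_poll(), model_ext_irq_handler() and
model_irq_stuck() are hypothetical names, the poll is assumed to run
synchronously, and the handler logic only follows the description above.

```c
#include <stdbool.h>

/* Toy model of one shared IRQ line; every name here is hypothetical. */
struct model {
	bool ext_irq_enabled;	/* stands in for ATH11K_FLAG_EXT_IRQ_ENABLED */
	bool napi_enabled;	/* stands in for NAPI being enabled */
	bool irq_masked;	/* true after disable_irq_nosync() */
};

/* Models the poll function: the only place that unmasks the IRQ. */
void model_napi_poll(struct model *m)
{
	m->irq_masked = false;		/* enable_irq() */
}

/* Models the ext interrupt handler as described in the commit message. */
void model_ext_irq_handler(struct model *m)
{
	if (!m->ext_irq_enabled)
		return;			/* flag clear: wait for the next IRQ */

	m->irq_masked = true;		/* disable_irq_nosync() */
	if (m->napi_enabled)		/* scheduling is a no-op otherwise */
		model_napi_poll(m);
}

/*
 * Runs the enable sequence with an interrupt arriving between the two
 * steps. Returns true if the IRQ is left masked with nothing to unmask it.
 */
bool model_irq_stuck(bool flag_first)
{
	struct model m = { false, false, false };

	if (flag_first)
		m.ext_irq_enabled = true;	/* old order: flag first */
	else
		m.napi_enabled = true;		/* new order: NAPI first */

	model_ext_irq_handler(&m);		/* shared IRQ fires in the window */

	/* Complete the remaining step of the enable sequence. */
	m.napi_enabled = true;
	m.ext_irq_enabled = true;

	return m.irq_masked;
}
```

In this model, model_irq_stuck(true) corresponds to the old ordering and
leaves the IRQ masked, while model_irq_stuck(false) corresponds to the
new ordering, where the handler returns early and the IRQ stays usable.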

diff --git a/drivers/net/wireless/ath/ath11k/pcic.c b/drivers/net/wireless/ath/ath11k/pcic.c
index 16d1e332193f..e602d4130105 100644
--- a/drivers/net/wireless/ath/ath11k/pcic.c
+++ b/drivers/net/wireless/ath/ath11k/pcic.c
@@ -460,8 +460,6 @@ void ath11k_pcic_ext_irq_enable(struct ath11k_base *ab)
 {
 	int i;
 
-	set_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags);
-
 	for (i = 0; i < ATH11K_EXT_IRQ_GRP_NUM_MAX; i++) {
 		struct ath11k_ext_irq_grp *irq_grp = &ab->ext_irq_grp[i];
 
@@ -471,6 +469,8 @@ void ath11k_pcic_ext_irq_enable(struct ath11k_base *ab)
 		}
 		ath11k_pcic_ext_grp_enable(irq_grp);
 	}
+
+	set_bit(ATH11K_FLAG_EXT_IRQ_ENABLED, &ab->dev_flags);
 }
 EXPORT_SYMBOL(ath11k_pcic_ext_irq_enable);


base-commit: 9a36440d929d134c56030a8492405708a143f580
--
2.25.1


2023-11-17 01:31:57

by Jeff Johnson

Subject: Re: [PATCH] wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early

On 11/16/2023 4:39 PM, Baochen Qiang wrote:
> Signed-off-by: Baochen Qiang <[email protected]>
Acked-by: Jeff Johnson <[email protected]>

2023-11-30 17:10:28

by Kalle Valo

Subject: Re: [PATCH] wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early

Baochen Qiang <[email protected]> wrote:

> Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.23
>
> Signed-off-by: Baochen Qiang <[email protected]>
> Acked-by: Jeff Johnson <[email protected]>
> Signed-off-by: Kalle Valo <[email protected]>

Patch applied to ath-next branch of ath.git, thanks.

5082b3e3027e wifi: ath11k: fix race due to setting ATH11K_FLAG_EXT_IRQ_ENABLED too early

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches