2024-03-29 02:07:49

by Baochen Qiang

[permalink] [raw]
Subject: [PATCH] wifi: ath12k: fix flush failure in recovery scenarios

Commit eaf9f17b861b ("wifi: ath12k: relocate ath12k_dp_pdev_pre_alloc()
call") moves ath12k_dp_pdev_pre_alloc() from ath12k_core_start() to
ath12k_mac_allocate(), resulting in ath12k_mac_flush() failure in
recovery scenarios:

[ 6849.684104] ath12k_pci 0000:04:00.0: pdev 0 successfully recovered
[ 6854.907320] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
[ 6860.027353] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
[ 6865.143385] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0

This is because, with ath12k_dp_pdev_pre_alloc() moved to ath12k_mac_allocate(),
dp->num_tx_pending is not reset due to ATH12K_FLAG_REGISTERED set in
recovery scenarios.

So a possible fix would be to reset that counter at some proper point,
just like the old design. But considering that the counter tracks number
of packets pending to be freed or returned to mac80211, forcefully reset
it might make it hard to expose some real issues. For example if somehow
ath12k fails to free/return some TX packets, we don't know that because
no warnings any more.

That is to say we should not reset that counter during recovery (which is
already done due to above commit), instead should decrease it each time
a packet is freed/returned. Currently almost each related function has
this logic implemented, except ath12k_dp_cc_cleanup(). So add the same
there to fix this issue.

Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4

Signed-off-by: Baochen Qiang <[email protected]>
---
drivers/net/wireless/ath/ath12k/dp.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/wireless/ath/ath12k/dp.c b/drivers/net/wireless/ath/ath12k/dp.c
index 1006eef8ff0c..ddfe1eb82d6b 100644
--- a/drivers/net/wireless/ath/ath12k/dp.c
+++ b/drivers/net/wireless/ath/ath12k/dp.c
@@ -1150,7 +1150,9 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab)
struct ath12k_rx_desc_info *desc_info;
struct ath12k_tx_desc_info *tx_desc_info, *tmp1;
struct ath12k_dp *dp = &ab->dp;
+ struct ath12k_skb_cb *skb_cb;
struct sk_buff *skb;
+ struct ath12k *ar;
int i, j;
u32 pool_id, tx_spt_page;

@@ -1201,6 +1203,11 @@ static void ath12k_dp_cc_cleanup(struct ath12k_base *ab)
if (!skb)
continue;

+ skb_cb = ATH12K_SKB_CB(skb);
+ ar = skb_cb->ar;
+ if (atomic_dec_and_test(&ar->dp.num_tx_pending))
+ wake_up(&ar->dp.tx_empty_waitq);
+
dma_unmap_single(ab->dev, ATH12K_SKB_CB(skb)->paddr,
skb->len, DMA_TO_DEVICE);
dev_kfree_skb_any(skb);

base-commit: fe7e1b830cf3c0272aa4eaf367c4c7b29c169c3d
--
2.25.1



2024-04-01 17:02:20

by Jeff Johnson

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath12k: fix flush failure in recovery scenarios

On 3/28/2024 6:56 PM, Baochen Qiang wrote:
> Commit eaf9f17b861b ("wifi: ath12k: relocate ath12k_dp_pdev_pre_alloc()
> call") moves ath12k_dp_pdev_pre_alloc() from ath12k_core_start() to
> ath12k_mac_allocate(), resulting in ath12k_mac_flush() failure in
> recovery scenarios:
>
> [ 6849.684104] ath12k_pci 0000:04:00.0: pdev 0 successfully recovered
> [ 6854.907320] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
> [ 6860.027353] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
> [ 6865.143385] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
>
> This is because, with ath12k_dp_pdev_pre_alloc() moved to ath12k_mac_allocate(),
> dp->num_tx_pending is not reset due to ATH12K_FLAG_REGISTERED set in
> recovery scenarios.
>
> So a possible fix would be to reset that counter at some proper point,
> just like the old design. But considering that the counter tracks number
> of packets pending to be freed or returned to mac80211, forcefully reset
> it might make it hard to expose some real issues. For example if somehow
> ath12k fails to free/return some TX packets, we don't know that because
> no warnings any more.
>
> That is to say we should not reset that counter during recovery (which is
> already done due to above commit), instead should decrease it each time
> a packet is freed/returned. Currently almost each related function has
> this logic implemented, except ath12k_dp_cc_cleanup(). So add the same
> there to fix this issue.
>
> Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4
>
> Signed-off-by: Baochen Qiang <[email protected]>

Do we have QCN9274 test results?


2024-04-02 00:23:27

by Baochen Qiang

[permalink] [raw]
Subject: Re: [PATCH] wifi: ath12k: fix flush failure in recovery scenarios



On 4/2/2024 1:01 AM, Jeff Johnson wrote:
> On 3/28/2024 6:56 PM, Baochen Qiang wrote:
>> Commit eaf9f17b861b ("wifi: ath12k: relocate ath12k_dp_pdev_pre_alloc()
>> call") moves ath12k_dp_pdev_pre_alloc() from ath12k_core_start() to
>> ath12k_mac_allocate(), resulting in ath12k_mac_flush() failure in
>> recovery scenarios:
>>
>> [ 6849.684104] ath12k_pci 0000:04:00.0: pdev 0 successfully recovered
>> [ 6854.907320] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
>> [ 6860.027353] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
>> [ 6865.143385] ath12k_pci 0000:04:00.0: failed to flush transmit queue 0
>>
>> This is because, with ath12k_dp_pdev_pre_alloc() moved to ath12k_mac_allocate(),
>> dp->num_tx_pending is not reset due to ATH12K_FLAG_REGISTERED set in
>> recovery scenarios.
>>
>> So a possible fix would be to reset that counter at some proper point,
>> just like the old design. But considering that the counter tracks number
>> of packets pending to be freed or returned to mac80211, forcefully reset
>> it might make it hard to expose some real issues. For example if somehow
>> ath12k fails to free/return some TX packets, we don't know that because
>> no warnings any more.
>>
>> That is to say we should not reset that counter during recovery (which is
>> already done due to above commit), instead should decrease it each time
>> a packet is freed/returned. Currently almost each related function has
>> this logic implemented, except ath12k_dp_cc_cleanup(). So add the same
>> there to fix this issue.
>>
>> Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0-03427-QCAHMTSWPL_V1.0_V2.0_SILICONZ-1.15378.4
>>
>> Signed-off-by: Baochen Qiang <[email protected]>
>
> Do we have QCN9274 test results?
I was thinking this also applies to QCN9274 and should not cause any
regression because it is quite a straightforward change.

Anyway, more tests more convincing.

Hi Vasanth, could you help with SSR test on this patch? Considering that
debugfs patches and firmware-assert patch are not included in ath.git,
you may need to find a way to simulate firmware crash to trigger SSR
process. Please let me know if you have any issues in that.

>