2023-06-06 13:44:27

by Ben Greear

[permalink] [raw]
Subject: iwlwifi: tx-fail reported with zero retry count.

It seems that when tx fails (due to TX_STATUS_FAIL_LONG_LIMIT in this case), the
retry-count is sometimes reported as zero. I would expect that it actually must have retried
many times (15 it seems?). At least for fail-long-limit failures, is there a known constant amount of
times the firmware retransmits the frame before giving up? If so, I can just hack that
value into the retry counter in this case.

Debug patch:

[greearb@ben-dt5 iwlwifi]$ git diff
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 1cddc65dd51e..6544dabb092b 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -1929,8 +1929,14 @@ static void iwl_mvm_rx_tx_cmd_single(struct iwl_mvm *mvm,
mvm->ethtool_stats.tx_mpdu_attempts += info->status.rates[0].count;
mvm->ethtool_stats.tx_mpdu_retry += tx_resp->failure_frame;
if (cb.flags & IWL_TX_CB_TXO_USED) {
+ u32 idx = status & TX_STATUS_MSK;
+
mvm->ethtool_stats.txo_tx_mpdu_attempts += info->status.rates[0].count;
mvm->ethtool_stats.txo_tx_mpdu_retry += tx_resp->failure_frame;
+ if (idx != TX_STATUS_SUCCESS) {
+ pr_info("txo tx status failed: %d tx_resp->failure_frame: %d\n",
+ idx, tx_resp->failure_frame);
+ }
}

iwl_mvm_hwrate_to_tx_status(mvm, mvm->fw,


From dmesg:

Jun 06 06:29:40 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:29:43 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:29:48 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:29:49 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:29:52 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:29:57 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 0
Jun 06 06:29:57 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 0
Jun 06 06:29:58 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 0
Jun 06 06:30:01 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:30:02 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
Jun 06 06:30:03 ct523c-0b29 kernel: txo tx status failed: 131 tx_resp->failure_frame: 15
....


Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com


2023-06-06 14:10:12

by Ben Greear

[permalink] [raw]
Subject: Re: iwlwifi: tx-fail reported with zero retry count.

On 6/6/23 6:36 AM, Ben Greear wrote:
> It seems that when tx fails (due to TX_STATUS_FAIL_LONG_LIMIT in this case), the
> retry-count is sometimes reported as zero.  I would expect that it actually must have retried
> many times (15 it seems?).  At least for fail-long-limit failures, is there a known constant amount of
> times the firmware retransmits the frame before giving up?  If so, I can just hack that
> value into the retry counter in this case.
>
> Debug patch:
>
> [greearb@ben-dt5 iwlwifi]$ git diff
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> index 1cddc65dd51e..6544dabb092b 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> @@ -1929,8 +1929,14 @@ static void iwl_mvm_rx_tx_cmd_single(struct iwl_mvm *mvm,
>                 mvm->ethtool_stats.tx_mpdu_attempts += info->status.rates[0].count;
>                 mvm->ethtool_stats.tx_mpdu_retry += tx_resp->failure_frame;
>                 if (cb.flags & IWL_TX_CB_TXO_USED) {
> +                       u32 idx = status & TX_STATUS_MSK;
> +
>                         mvm->ethtool_stats.txo_tx_mpdu_attempts += info->status.rates[0].count;
>                         mvm->ethtool_stats.txo_tx_mpdu_retry += tx_resp->failure_frame;
> +                       if (idx != TX_STATUS_SUCCESS) {
> +                               pr_info("txo tx status failed: %d  tx_resp->failure_frame: %d\n",
> +                                       idx, tx_resp->failure_frame);
> +                       }
>                 }
>
>                 iwl_mvm_hwrate_to_tx_status(mvm, mvm->fw,
>
>
> From dmesg:
>
> Jun 06 06:29:40 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:29:43 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:29:48 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:29:49 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:29:52 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:29:57 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 0
> Jun 06 06:29:57 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 0
> Jun 06 06:29:58 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 0
> Jun 06 06:30:01 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:30:02 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15
> Jun 06 06:30:03 ct523c-0b29 kernel: txo tx status failed: 131  tx_resp->failure_frame: 15

I think it has some other weirdness too. I added printout to show any non-zero failure_frame:

+ if (idx != TX_STATUS_SUCCESS || tx_resp->failure_frame) {
+ pr_info("txo tx status: %d tx_resp->failure_frame: %d\n",
+ idx, tx_resp->failure_frame);
+ }


For a while, it showed the status 131 and failure_frame of 0, then it started reporting lots
of smaller numbers of retries (more like what I would expect to see).
And then it seemed to stick in a case where it was reporting
lots of successes with 15 retries:

Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15
Jun 06 06:44:50 ct523c-0b29 kernel: txo tx status: 1 tx_resp->failure_frame: 15


I think it is quite unlikely that it reliably succeeded on the very last retry
so many times in a row. Maybe the FW has some issues in reporting the failure_frame
count properly?

Thanks,
Ben